Data Storage
All cod nodes have the Abel (the UiO supercomputer cluster) disks and your Abel home area mounted. All files located in /projects are therefore available on the cod nodes; see below.
In addition, the nodes have local disks, currently:
- /node/data for permanent files, e.g. input to your program
- /node/work as a working area for your programs; see below for details
- Do not use your Abel home area: you only have 200 GB, and it is not meant for sharing data with others, e.g. your colleagues.
- Data on /projects/cees is backed up by USIT, but data on /node/data and /node/work is NOT.
- Reading and writing data to and from /node/data and /node/work is much faster and more efficient than to and from /projects/cees.
- DO NOT USE /projects/cees for data needed for (medium to large) analyses; use /node/work on the cod nodes, $SCRATCH for SLURM jobs, or (if we decide this) /work on Abel.
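The choice of working area in the last point could be sketched as a small shell helper. This is only an illustration: the function name pick_workdir is hypothetical, and the fallback to a temporary directory when neither $SCRATCH nor /node/work is available is an assumption, not part of the CEES setup.

```shell
# Hypothetical helper: print a fast working directory for the current machine.
# Order of preference:
#   1. $SCRATCH   (set inside SLURM jobs)
#   2. /node/work (local disk on the cod nodes)
#   3. a fresh temporary directory (assumed fallback for other machines)
pick_workdir() {
    if [ -n "${SCRATCH:-}" ] && [ -d "$SCRATCH" ]; then
        printf '%s\n' "$SCRATCH"
    elif [ -d /node/work ] && [ -w /node/work ]; then
        printf '%s\n' "/node/work"
    else
        mktemp -d
    fi
}
```

A script could then start with something like workdir=$(pick_workdir) and write all intermediate files below "$workdir" rather than to /projects/cees.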
This leads to the following strategy for choosing which disk to use:
- For something short and quick, e.g. less or tar, you can work directly on data in /projects/cees.
- For a long-running program, or one that generates a lot of data over a long time, use the locally attached /data and /work. Once the long-running job is done, you can move the data you want to keep to /projects/cees.
  NOTE: Having your program write a lot over a long time to a file on /projects/cees causes problems for the backup system, as the file may change during backup.
  NOTE: Use compression (gzip, pbzip2) where possible!
- For long-term storage of data you do not need regular access to, please use the NIRD (replacement for NorStore) allocation; see below.
- Long-term storage of data you may need to access: NIRD disk (time to recover the data: intermediate; rsync to the working area).
- Data from finished publications: an appropriate database (e.g. GenBank, SRA), datadryad, figshare, or the NIRD archive.
- Storage of project data (finished analyses) that you need to access occasionally or regularly: /projects/cees/in_progress
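The "work locally, compress, then copy the result to /projects/cees" part of this strategy might look roughly like the sketch below. The function name run_and_archive and the output file name result.txt are hypothetical, and the sketch assumes the program writes its result to standard output.

```shell
# Sketch: run a command in a fast work area, compress its output, and
# copy only the compressed result to the backed-up project area.
# Usage: run_and_archive WORKDIR DESTDIR COMMAND [ARGS...]
run_and_archive() {
    workdir=$1
    destdir=$2
    shift 2
    mkdir -p "$workdir" "$destdir" || return 1
    # The output is written to the work area while the job runs,
    # so the backed-up disk never sees a file that is still changing.
    "$@" > "$workdir/result.txt" || return 1
    gzip -f "$workdir/result.txt" || return 1
    # Copy, do not move: the local copy stays until you delete it yourself.
    cp -p "$workdir/result.txt.gz" "$destdir/"
}

# Example call (paths and program name are placeholders):
#   run_and_archive /node/work/path/to/myjob \
#       /projects/cees/in_progress/path/to/myproject my_program input_file
```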
Access: See getting access.
- Command to get into NIRD:
  ssh login.nird.sigma2.no
- NOTE that you need a separate password for NIRD; you cannot use your UiO password. If you don't know your NIRD password, you can reset it at https://www.metacenter.no/
- When you log in, you will be in the folder /nird/home/username. Our data is stored in /projects/NS9003K.
- Folders there are:
  - 454data, runsIllumina, runsPacbio -> from the Norwegian Sequencing Centre, do not touch
  - projects -> where you can store your files
- In /projects/NS9003K/projects, please use the same folder name as you use on Abel in /projects/cees/in_progress. Add clear README files.
NOTES:
- In general it is wise to copy data first, then delete the original. This prevents accidental data loss. Do not use the 'mv' command for big files!
- Use rsync: with the -a option it preserves permissions and timestamps, and it allows finishing an interrupted copy job without having to copy every file again.
- Use 'screen' (only possible with option 2).
- See the note on tarballs and md5 sums at this page.
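One way to act on the "copy first, then delete" advice is to compare md5 sums of the original and the copy before removing anything. The sketch below assumes the md5sum tool from GNU coreutils is available on both sides; the function names make_manifest and verify_copy are hypothetical.

```shell
# make_manifest DIR: print md5 sums of all files below DIR, with paths
# relative to DIR so the list can be checked against another location.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + )
}

# verify_copy SRC DEST: succeed only if every file below SRC has a copy
# with the same md5 sum at the same relative path below DEST.
verify_copy() {
    make_manifest "$1" | ( cd "$2" && md5sum -c - )
}

# Example (paths are placeholders); delete the original only after the
# check has succeeded:
#   rsync -av folder_to_copy /path/to/destination/
#   verify_copy folder_to_copy /path/to/destination/folder_to_copy \
#       && rm -r folder_to_copy
```

For a copy to NIRD, the output of make_manifest could instead be saved to a file, transferred along with the data, and checked there with md5sum -c.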
When you are logged in on NIRD:
cd /projects/NS9003K/projects/path/to/yourfolder
rsync -av cod5.uio.no:/projects/cees/in_progress/path/to/folder_to_copy .
NOTE: Do not forget the . at the end; it tells rsync to copy into the current folder.
NOTE: Adding a trailing slash / to folder_to_copy will only copy its content, not the whole folder!
When you are logged in on the Abel/cod nodes:
cd /projects/cees/in_progress/path/where/folder_to_copy/is
rsync -av folder_to_copy login.nird.sigma2.no:/projects/NS9003K/projects/path/to/yourfolder
NOTE: Adding a trailing slash / to folder_to_copy will only copy its content, not the whole folder!
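The trailing-slash rule can be tried out safely on local test directories before copying real data. The following is a minimal sketch; the function name trailing_slash_demo and the directory names dest_whole and dest_content are made up for the illustration.

```shell
# trailing_slash_demo DIR: show rsync's trailing-slash rule inside DIR,
# an empty scratch directory (for example one created with mktemp -d).
trailing_slash_demo() {
    demo=$1
    mkdir -p "$demo/folder_to_copy" "$demo/dest_whole" "$demo/dest_content" || return 1
    echo "example" > "$demo/folder_to_copy/a.txt" || return 1

    # No trailing slash: the folder itself is created inside the destination.
    rsync -a "$demo/folder_to_copy" "$demo/dest_whole/" || return 1

    # Trailing slash: only the content of the folder is copied.
    rsync -a "$demo/folder_to_copy/" "$demo/dest_content/" || return 1

    ls "$demo/dest_whole"      # lists: folder_to_copy
    ls "$demo/dest_content"    # lists: a.txt
}

# Example:
#   trailing_slash_demo "$(mktemp -d)"
```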