Data Storage
## Organization of the nodes
All cod nodes have the Abel (the UiO supercomputer cluster) disks and your Abel home area mounted, so all files located in `/projects` are also available on the cod nodes (see below).
In addition, the nodes have local disks, currently:
- `/node/data` for permanent files, e.g. input to your program
- `/node/work` as a working area for your programs (see below for details)
## Data storage
##### Choosing where to work with your data:
- Do not use your Abel home area: you only have 200 GB there, and it is not meant for sharing data with others, e.g. your colleagues
- Data on `/projects/cees` is backed up by USIT, but data on `/node/data` and `/node/work` is NOT
- Reading and writing data to and from `/node/data` and `/node/work` is much faster and more efficient than to and from `/projects/cees`
- DO NOT USE `/projects/cees` for data needed in (medium to large) analyses; use `/node/work` on the cod nodes, `$SCRATCH` for SLURM jobs, or (if we decide this) `/work` on Abel
This leads to the following strategy for choosing which disk to use:
- For something short and quick, e.g. `less` or `tar`, you can work directly on data in `/projects/cees`
- For a long-running program, or one that generates a lot of data over a long time, use the locally attached `/node/data` and `/node/work`. Once the long-running job is done, you can move the data you want to keep to `/projects/cees`.
  NOTE: Having your program write a lot over a long time to a file on `/projects/cees` causes problems for the backup system, as the file may change during the backup.
  NOTE: Use compression (gzip, pbzip2) where possible!
- For long-term storage of data you do not need regular access to, please use the NIRD (the replacement for NorStore) allocation, see below
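The strategy for a long-running job can be sketched as a small bash function. This is a minimal, hypothetical example: `seq` stands in for your real program, and the folder name `myproject` as well as the `WORK_DIR` and `DEST_DIR` variables are made up for illustration, so adapt them to your own setup.

```bash
#!/usr/bin/env bash
# Sketch: run a long job on the fast local disk, then copy only the finished,
# compressed result to the backed-up project area.
# The default folder name "myproject" is hypothetical.
set -eu

run_long_job() {
    local user="${USER:-$(id -un)}"
    local work_dir="${WORK_DIR:-/node/work/$user/myproject}"
    local dest_dir="${DEST_DIR:-/projects/cees/in_progress/myproject}"

    mkdir -p "$work_dir" "$dest_dir"

    # 1. Let the program write its (growing) output to the local disk,
    #    not to /projects/cees. `seq` is a stand-in for your real program.
    seq 1 100000 > "$work_dir/results.txt"

    # 2. Compress the finished output (pbzip2 is an alternative to gzip).
    gzip -f "$work_dir/results.txt"

    # 3. Copy the finished file to /projects/cees in one go.
    cp "$work_dir/results.txt.gz" "$dest_dir/"
}
```

For example, `WORK_DIR=/node/work/$USER/test run_long_job` would use a different local folder. Because the result reaches `/projects/cees` only once, as a finished compressed file, the backup system never sees a file that is still being written.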
## Long-term storage of data
- Long-term storage of data that you generally do not need to access: NIRD tape (time to recover the data: long)
- Long-term storage of data you may need to access: NIRD disk (time to recover the data: intermediate; rsync it to a working area)
- Data from finished publications: an appropriate database (e.g. GenBank, SRA), datadryad, figshare, or the NIRD archive
- Storage of project data (finished analyses) that you need to access occasionally or regularly: `/projects/cees/in_progress`
Access: see getting access.
- Command to get into NIRD:
  `ssh login.nird.sigma2.no`
- NOTE: You need a separate password for NIRD; you cannot use your UiO password. If you don't know your NIRD password, you can reset it at https://www.metacenter.no/
- When you log in, you will be in the folder `/nird/home/username`; our data is stored in `/projects/NS9003K`
- Folders there are:
  - 454data, runsIllumina, runsPacbio -> from the Norwegian Sequencing Centre, do not touch
  - projects -> where you can store your files
- In `/projects/NS9003K/projects`, please use the same folder name as you use on Abel in `/projects/cees/in_progress`. Add clear README files.
NOTES:
- In general, it is wise to copy data first, then delete the original. This prevents accidental data loss. Do not use the `mv` command for big files!
- Use rsync: it preserves permissions and timestamps, and it can finish an interrupted copy job without having to copy every file again.
- Use `screen` (only possible with option 2)
- See the note on tarballs and md5 sums below
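The copy-first-then-delete advice can be wrapped in a small bash function. `safe_move` is a hypothetical helper, not a command installed on the cod nodes; it uses `rsync -a` when available, falls back to `cp -a`, and compares the two copies with `diff -r` before deleting anything.

```bash
#!/usr/bin/env bash
# Sketch: "copy first, then delete the original" instead of mv.
# safe_move is a hypothetical helper, not an installed command.
set -eu

# Usage: safe_move path/to/folder_or_file destination_folder
safe_move() {
    local src="${1%/}" dest_parent="$2"
    local copy
    copy="$dest_parent/$(basename "$src")"

    mkdir -p "$dest_parent"

    # 1. Copy. rsync -a keeps permissions and timestamps and can simply be
    #    re-run to finish an interrupted copy; cp -a is a fallback.
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src" "$dest_parent/" || return 1
    else
        cp -a "$src" "$dest_parent/" || return 1
    fi

    # 2. Check that the copy is identical to the original.
    if ! diff -r "$src" "$copy" >/dev/null; then
        echo "copy of $src differs from the original; nothing deleted" >&2
        return 1
    fi

    # 3. Only now remove the original.
    rm -rf "$src"
}
```

`diff -r` only works when both copies are visible on the same machine; for copies between Abel/cod and NIRD, comparing md5 sums (see the tarball notes below) is the safer check.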
###### Option 1: When you are logged in on NIRD:
`cd /projects/NS9003K/projects/path/to/yourfolder`
`rsync -av cod5.uio.no:/projects/cees/in_progress/path/to/folder_to_copy .`
NOTE: The `.` at the end is the destination, i.e. the current folder.
NOTE: Adding a trailing slash `/` to `folder_to_copy` will only copy its content, not the whole folder!
###### Option 2: When you are logged in on Abel/cod nodes:
`cd /projects/cees/in_progress/path/where/folder_to_copy/is`
`rsync -av folder_to_copy login.nird.sigma2.no:/projects/NS9003K/projects/path/to/yourfolder`
NOTE: Adding a trailing slash `/` to `folder_to_copy` will only copy its content, not the whole folder!
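To see the trailing-slash rule for yourself without touching real data, you can run rsync on throwaway local folders. The sketch below is hypothetical (the folder names are made up) and only assumes that rsync is installed on the machine you run it on.

```bash
#!/usr/bin/env bash
# Sketch: demonstrate rsync's trailing-slash rule on throwaway folders.
set -eu

show_trailing_slash_effect() {
    local tmp
    tmp="$(mktemp -d)"
    mkdir -p "$tmp/folder_to_copy" "$tmp/without_slash" "$tmp/with_slash"
    touch "$tmp/folder_to_copy/file1" "$tmp/folder_to_copy/file2"

    # No trailing slash: the folder itself ends up inside the destination.
    rsync -a "$tmp/folder_to_copy" "$tmp/without_slash"
    # Trailing slash: only the folder's content is copied.
    rsync -a "$tmp/folder_to_copy/" "$tmp/with_slash"

    echo "without slash: $(ls "$tmp/without_slash" | paste -sd ' ' -)"
    echo "with slash:    $(ls "$tmp/with_slash" | paste -sd ' ' -)"
    rm -rf "$tmp"
}
```

Running `show_trailing_slash_effect` prints `without slash: folder_to_copy` and `with slash:    file1 file2`, i.e. the same difference you get in the Option 1 and Option 2 commands.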
- The tape storage is for data that is accessed infrequently, or for duplicating data stored on disk. As only a limited number of files can be stored on tape (100 per 1 TB of tape space), ALWAYS make a compressed tarball of your data first (i.e., before copying to NIRD):
  `tar -cvzf filename.tgz your_folder`
  This collects and compresses `your_folder`, with all files and folders in it (i.e., recursively), into one big file.
  NOTE: Please add a clear README to your tarball! NOTE: This may take a long time; use `screen`!
  TIP: Generate an md5 checksum:
  `md5sum filename.tgz > filename.tgz.md5`
  This lets you check whether two files are identical (really and completely). The program generates a long string that is, in practice, different for each file (a one-byte difference produces an entirely different string).
  NOTE: This takes a long time for large files; use `screen`.
  - Once your file is on NIRD, run the same md5sum command and compare the output. Alternatively, run:
    `md5sum --check filename.tgz.md5`
- To copy the tarball to tape (NOTE: you have to be logged in to NIRD for this):
  `WriteToTape filename.tgz NS9003K projects/yourfolder`
  This copies `filename.tgz` to `/tape/NS9003K/projects/yourfolder`. If you will not keep a copy of the file elsewhere, add the `--replicate` flag:
  `WriteToTape filename.tgz NS9003K projects/yourfolder --replicate`
  In addition to copying to `/tape/NS9003K/projects/yourfolder`, this creates a replica under `/replica/NS9003K/projects/yourfolder` (on a different tape). You are asked to confirm the job and told that you will receive an email once the writing is done (the job is queued, so you can close the NIRD session if you wish).
- To list files stored on tape:
  `lst -la /tape/NS9003K`
- To delete a file from tape:
  `DeleteFromTape /tape/NS9003K/projects/yourfolder/yourfile.tgz`
- To copy a file from tape back to the NIRD disks:
  `cpt /tape/NS9003K/projects/yourfolder/yourfile.tgz /projects/NS9003K/projects/path/to/yourfolder`
- To list the files inside a tarball on tape:
  `tart /tape/NS9003K/projects/yourfolder/yourfile.tgz`
- To replicate a file that is already on tape:
  `MakeReplica /tape/NS9003K/projects/yourfolder/yourfile.tgz`
  This requests a replica of `/tape/NS9003K/projects/yourfolder/yourfile.tgz` at `/replica/NS9003K/projects/yourfolder/yourfile.tgz`
- More commands: see https://www.norstore.no/services/tape-storage
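The tarball and md5sum steps above can be combined into two small bash functions. This is a sketch under assumptions: `make_tape_tarball` and `check_tarball` are hypothetical helper names, the README is assumed to be named `README*`, and the checksum file stores the tarball name relative to its own folder so the check still works after copying both files to NIRD.

```bash
#!/usr/bin/env bash
# Sketch: pack a folder into one tarball with an md5 checksum, and verify a
# copied tarball before sending it to tape. Helper names are hypothetical.
set -eu

# Usage: make_tape_tarball path/to/your_folder
# Creates your_folder.tgz and your_folder.tgz.md5 next to the folder.
make_tape_tarball() {
    local dir name
    dir="$(dirname "${1%/}")"
    name="$(basename "${1%/}")"
    (
        cd "$dir" || exit 1
        if ! ls "$name"/README* >/dev/null 2>&1; then
            echo "warning: no README in $name, please add one" >&2
        fi
        # One compressed file for the whole folder (add -v to list files).
        tar -czf "$name.tgz" "$name" || exit 1
        # Store the checksum with a relative file name, so the check also
        # works after both files have been copied to another machine.
        md5sum "$name.tgz" > "$name.tgz.md5"
    )
}

# Usage: check_tarball path/to/your_folder.tgz
# Returns non-zero if the file is corrupt or differs from the original.
check_tarball() {
    local dir name
    dir="$(dirname "$1")"
    name="$(basename "$1")"
    (
        cd "$dir" || exit 1
        gzip -t "$name" || exit 1              # intact gzip stream?
        md5sum -c "$name.md5" || exit 1        # same as the original? (-c = --check)
        tar -tzf "$name" >/dev/null || exit 1  # can the archive be listed?
    )
}
```

For example, you could run `make_tape_tarball /projects/cees/in_progress/yourfolder` inside `screen` on a cod node, copy `yourfolder.tgz` and `yourfolder.tgz.md5` to NIRD with rsync, define the same functions in your NIRD shell and run `check_tarball yourfolder.tgz` there, and only then use `WriteToTape`.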