Lex Nederbragt edited this page Feb 23, 2018 · 13 revisions

Organization of the nodes

All nodes have the disks of Abel (the UiO supercomputer cluster), including your Abel home area, mounted. This means all files located in /projects are also available on the cod nodes; see below.

In addition, the nodes have local disks, currently:

  • /node/data --> for permanent files, e.g. input to your program
  • /node/work --> working area for your programs; see below for details

Data storage

Choosing where to work with your data:

Work And Store

  • Do not use your Abel home area: you only have 200 GB there, and it is not meant for sharing data with others, e.g. your colleagues
  • Data on /projects/cees is backed up by USIT, but NOT data on /node/data and /node/work
  • Reading and writing data to and from /node/data and /node/work is much faster and more efficient than to /projects/cees
  • DO NOT USE /projects/cees for data needed during (medium to large) analyses; use /node/work on the cod nodes, $SCRATCH for SLURM jobs, or - if we decide this - /work on Abel

This leads to the following strategy for how to choose which disk to use:

  1. For something short and quick, e.g. less or tar, you can work directly on data in /projects/cees
  2. For a long-running program, or one that generates a lot of data over a long time, use the locally attached /node/data and /node/work. Once the long-running job is done, you can move the data you want to keep to /projects/cees

NOTE: Having your program write to a file on /projects/cees over a long time causes problems for the backup system, as the file may change while it is being backed up

NOTE: Use compression (gzip, pbzip2) where possible!

  3. For long-term storage of data you do not need regular access to, please use the NIRD (the replacement for NorStore) allocation; see below
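Point 2 above could look like this in practice; the project and file names (myproject, input.fasta, results/) are hypothetical examples:

```shell
# Work on the fast local disk of a cod node, not on /projects/cees
mkdir -p /node/work/$USER/myproject
cd /node/work/$USER/myproject

# Copy the input from the project area once
cp /projects/cees/in_progress/myproject/input.fasta .

# ... run the long analysis here, writing all output locally ...

# When done, compress the output and move only what you want to keep
gzip results/*.txt
cp -r results /projects/cees/in_progress/myproject/
```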

Long term storage of data

  • Long-term storage of data you may need to access: NIRD disk (time to recover the data: intermediate; rsync to working area)
  • Data from finished publications: an appropriate database (e.g. GenBank, SRA), datadryad, figshare, or the NIRD archive
  • Storage of project data (finished analyses) that you need to access occasionally/regularly: /projects/cees/in_progress

Use of NIRD

Access: see getting access.

Organisation of the area

  • Command to get into NIRD:

    ssh login.nird.sigma2.no

  • NOTE that you need a separate password for NIRD; you cannot use your UiO password. If you don't know your NIRD password, you can reset it at https://www.metacenter.no/

  • When you log in, you will be in the folder /nird/home/username; our data is stored in /projects/NS9003K

  • Folders there are:

    • 454data, **runsIllumina, **runsPacbio -> From the Norwegian Sequencing Centre, do not touch
    • projects -> where you can store your files
  • In /projects/NS9003K/projects, please use the same folder name as you use on Abel in /projects/cees/in_progress. Add clear README files
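For example, to set up a project folder with a README (myproject and the contact address are hypothetical placeholders):

```shell
cd /projects/NS9003K/projects
mkdir myproject

# A short README telling others what the folder holds and who owns it
cat > myproject/README <<'EOF'
Same project as /projects/cees/in_progress/myproject on Abel.
Contact: your.name@example.com
Contents: raw reads, assemblies, ...
EOF
```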

To copy files/folders to NIRD

NOTES:

  • In general it is wise to copy data first, then delete the original; this prevents accidental data loss. Do not use the 'mv' command for big files!

  • Use rsync, it preserves permissions and timestamps, and allows for finishing an interrupted copy job without having to copy every file again.

  • Use 'screen' (only possible with Option 2)

  • See the note on tarballs and md5 sums on this page
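The tarball + md5 approach, sketched (myfolder is a hypothetical folder name):

```shell
# Bundle the folder into one compressed tarball: fewer, bigger files
# transfer and back up more efficiently than thousands of small ones
tar -czf myfolder.tar.gz myfolder/

# Record a checksum so the copy can be verified after transfer
md5sum myfolder.tar.gz > myfolder.tar.gz.md5

# On the receiving end, after copying both files:
md5sum -c myfolder.tar.gz.md5
```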

Option 1:

When you are logged in on NIRD:

cd /projects/NS9003K/projects/path/to/yourfolder
rsync -av cod5.uio.no:/projects/cees/in_progress/path/to/folder_to_copy .

NOTE: The . at the end is the destination: your current folder on NIRD. Adding a trailing slash / to folder_to_copy will only copy its contents, not the folder itself!

Option 2:

When you are logged in on Abel/the cod nodes:

cd /projects/cees/in_progress/path/where/folder_to_copy/is
rsync -av folder_to_copy login.nird.sigma2.no:/projects/NS9003K/projects/path/to/yourfolder

NOTE: Adding a trailing slash / to folder_to_copy will only copy its contents, not the folder itself!