disk quota exceeded in directory ~/.singularity/cache/net when using NCI Gadi HPC Configuration #775

Closed
mathob opened this issue Oct 15, 2024 · 6 comments

mathob commented Oct 15, 2024

I am trying a workflow using the nci_gadi profile but am encountering a disk quota problem at the stage of pulling singularity images. At this point nextflow wants to use the directory ~/.singularity/cache for caching but my home directory filesystem is subject to a stringent disk space restriction. Setting the environment variable NXF_SINGULARITY_CACHEDIR to a directory under /scratch (not subject to the same space constraint) works insofar as images end up being saved there; however the singularity pull commands continue to use the directory ~/.singularity/cache/net to save temporary downloads which accumulate there until eventually I exceed my disk quota.

So my question is: how do I configure nextflow to use a directory other than ~/.singularity/cache/net for singularity pull commands? I had thought that the environment variable SINGULARITY_TMPDIR (or maybe NXF_SINGULARITY_TMPDIR?) would be relevant, but I have not figured out how to make nextflow respect it.

My command is:

nextflow run nf-core/rnaseq --input SampleSheet.csv --outdir ${PWD}/results --gtf ${PWD}/mm10/gencode.vM10.primary_assembly.annotation.gtf --fasta ${PWD}/mm10/GRCm38.primary_assembly.genome.fa --aligner star_rsem -profile nci_gadi,singularity

Some log info is attached (username and project details obfuscated)
log.txt

The actual error message is:

  Failed to pull singularity image
    command: singularity pull  --name depot.galaxyproject.org-singularity-fastqc-0.12.1--hdfd78af_0.img.pulling.1728977625666 https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0 > /dev/null
    status : 255
    hint   : Try and increase singularity.pullTimeout in the config (current is "20m")
    message:
      INFO:    Downloading network image
      INFO:    Cleaning up incomplete download: /home/xxx/[myuser]/.singularity/cache/net/tmp_4260206683
      FATAL:   write /home/xxx/[myuser]/.singularity/cache/net/tmp_4260206683: disk quota exceeded

The ~/.singularity/cache/net directory fills up with files like:

 25M 87cf6042ef882b1562d5793dad2386a6256d3b7a4210d05002fa5b9133989fbf
 61M 9bcc502e675ace760d757418328c6526cd8afac7ea028e6cd2ebd99e73aa0ff9
168M tmp_215370473
128M tmp_3968314656

Thanks for any suggestions!

Matthew

Member

jfy133 commented Oct 15, 2024

Hi @mathob

I'm not sure, and @pontus might have a better idea, but Nextflow also has NXF_SINGULARITY_LIBRARYDIR as a singularity-related variable, in addition to NXF_SINGULARITY_CACHEDIR. Have you tried that?

Contributor

pontus commented Oct 16, 2024

This one is SINGULARITY_CACHEDIR (without the NXF_ prefix), which has a different function from NXF_SINGULARITY_CACHEDIR. SINGULARITY_CACHEDIR tells singularity where it should cache things like OCI layers and, evidently, nowadays also where it writes temporary files, whereas NXF_SINGULARITY_CACHEDIR corresponds to singularity.cacheDir and tells nextflow where it should cache images.

So setting SINGULARITY_CACHEDIR before launching the monitoring process should help, at least for containers resolved to https URLs.
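
A minimal sketch of that setup, assuming a bash shell on the machine where nextflow is launched. The `SCRATCH_BASE` variable, its default path, and the two subdirectory names are placeholders that do not come from this thread; point `SCRATCH_BASE` at a directory under /scratch that is not subject to the home quota.

```shell
# Placeholder location: replace with a directory of your own under /scratch.
SCRATCH_BASE="${SCRATCH_BASE:-/tmp/singularity-demo}"

# Where singularity itself keeps its cache (OCI layers, temporary downloads).
export SINGULARITY_CACHEDIR="$SCRATCH_BASE/singularity-cache"
# Where nextflow stores the pulled images (corresponds to singularity.cacheDir).
export NXF_SINGULARITY_CACHEDIR="$SCRATCH_BASE/nxf-singularity-images"

# Make sure both directories exist before anything tries to write to them.
mkdir -p "$SINGULARITY_CACHEDIR" "$NXF_SINGULARITY_CACHEDIR"

# Then launch nextflow from this same shell so it inherits both variables, e.g.
# nextflow run nf-core/rnaseq ... -profile nci_gadi,singularity
```

The two variables are kept in separate directories here only to make their different roles visible; nothing in this thread says they must differ.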

I don't think nextflow will attempt to cache any docker:// URLs though, meaning that if you run such a pipeline the pull will happen on the compute node and possibly/probably not inherit SINGULARITY_CACHEDIR. If that is a problem, SINGULARITY_CACHEDIR might need to be set in e.g. beforeScript.
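
One way the beforeScript idea could look, as an untested sketch: a small extra config file, written from the shell, that sets `process.beforeScript`. The file name `singularity_cachedir.config` and the `SCRATCH_BASE` path are placeholders, and whether the export takes effect early enough for pulls on the compute node is an assumption that would need checking on the actual system.

```shell
# Placeholder location: replace with a directory of your own under /scratch.
SCRATCH_BASE="${SCRATCH_BASE:-/tmp/singularity-demo}"

# Write an extra nextflow config file; the shell expands $SCRATCH_BASE here,
# so the config ends up containing the literal path.
cat > singularity_cachedir.config <<EOF
// Executed on the host at the start of every task, before the task script.
process.beforeScript = 'export SINGULARITY_CACHEDIR=$SCRATCH_BASE/singularity-cache'
EOF

# Pass it in addition to the profile, e.g.
# nextflow run nf-core/rnaseq ... -profile nci_gadi,singularity -c singularity_cachedir.config
```

This only covers pulls that happen inside tasks; images that nextflow downloads for caching are still pulled by the main process, so the variable also has to be exported before launch.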

If this is a system thing, it sounds like something that might make sense for the profile to do, but it would still be necessary to fix SINGULARITY_CACHEDIR before launching the main process to cover the cases where nextflow downloads for caching.

Contributor

qiyubio commented Oct 16, 2024

Defining export SINGULARITY_CACHEDIR=xxx in ~/.bashrc or adding env { SINGULARITY_CACHEDIR = 'xxx' } in the config file would both avoid the issue.

Contributor

pontus commented Oct 16, 2024

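env will typically be too late: it sets variables for the task environment, whereas the images nextflow caches are pulled by the main process before any task runs.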

Contributor

qiyubio commented Oct 16, 2024

Thanks for closing the knowledge gap. I guess adding setenv in a module file is another option.

Author

mathob commented Oct 22, 2024

Thanks everyone, problem solved by using SINGULARITY_CACHEDIR (which I had confused with NXF_SINGULARITY_CACHEDIR). Thanks @pontus for explaining the difference.

@mathob mathob closed this as completed Oct 22, 2024