Monitor disk space usage of temporary files #3

Open
ChristopherWilks opened this issue Mar 24, 2020 · 0 comments

@ChristopherWilks
Collaborator

This is not so much a problem with the recount-pump workflow itself as with many of the compute environments it runs in.

MARCC (/dev/shm), AWS EC2 (c5d instances with local NVMe drives), and now Stampede2 (/tmp) all require most temporary files to be written to space-constrained filesystems local to the node where the workflow is executing.

Since the temporary file footprint of any given run accession (job) varies widely, from hundreds of MB to many GB, and our job scheduling is primitive, we need to track the temporary disk space usage of a node while the workflow is running on it.

The problem primarily stems from the following aspects of the workflow:

  1. The need to efficiently utilize typical nodes by running concurrent processing jobs on them, which increases the total temporary space needed at any time
  2. Failed jobs leave temporary files behind
  3. Temporary files are typically large (FASTQs/BAMs)

We therefore need a way to 1) track temporary disk space usage and 2) continuously monitor for and remove the temporary files left behind by failed jobs, to at least partially address this issue. A rough sketch of such a monitor is below.
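As a starting point, something along the following lines could poll free space on the temporary filesystem and sweep stale per-job directories when space runs low. The `TEMP_ROOT` path, the per-job directory layout, and the staleness threshold are all assumptions for illustration and would need to match however recount-pump actually lays out its temp files on each system.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: poll free space on the temp filesystem and sweep
stale per-job temporary directories. Paths, naming, and thresholds are
illustrative assumptions, not the actual recount-pump layout."""

import os
import shutil
import time

TEMP_ROOT = "/tmp/recount-pump"   # assumed temp root (e.g. /dev/shm, local NVMe, /tmp)
MIN_FREE_FRACTION = 0.10          # sweep when <10% of the filesystem is free
STALE_SECONDS = 6 * 3600          # assume a dir untouched for 6h belongs to a failed job
POLL_SECONDS = 300

def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def sweep_stale_job_dirs(root, max_age_s):
    """Remove per-job temp directories not modified within `max_age_s` seconds."""
    now = time.time()
    for entry in os.scandir(root):
        if entry.is_dir() and now - entry.stat().st_mtime > max_age_s:
            print(f"removing stale temp dir: {entry.path}")
            shutil.rmtree(entry.path, ignore_errors=True)

def main():
    while True:
        frac = free_fraction(TEMP_ROOT)
        print(f"{TEMP_ROOT}: {frac:.1%} free")
        if frac < MIN_FREE_FRACTION:
            # low on space: treat long-untouched job dirs as leftovers from failed jobs
            sweep_stale_job_dirs(TEMP_ROOT, STALE_SECONDS)
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```

Age-based cleanup is only a heuristic; a safer variant would have each job register its temp directory (e.g. via a PID or heartbeat file) so the sweeper can confirm the owning job is no longer running before deleting.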
