This is not so much a problem with the recount-pump workflow itself as with many of the compute environments it runs in.
MARCC (/dev/shm), AWS EC2 (using c5d instances with local NVMes), and now Stampede2 (/tmp) all require most of the temporary files to be written to space-constrained filesystems local to the node where the workflow is executing.
Since the temporary file size of any given run accession (job) varies widely, from hundreds of MBs to many GBs, and our job scheduling is primitive, we need to track the temporary disk space usage of a given node while the workflow is running on it.
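As a first step, the node's temp filesystem can be probed directly. Below is a minimal sketch in Python using the standard library's `shutil.disk_usage`; `TEMP_ROOT` is a placeholder for whichever node-local filesystem applies (`/dev/shm` on MARCC, the NVMe mount on EC2 c5d instances, `/tmp` on Stampede2).

```python
import shutil

TEMP_ROOT = "/tmp"  # placeholder; depends on the compute environment

def temp_space_report(path=TEMP_ROOT):
    """Return (used_gb, free_gb, pct_used) for the filesystem at path."""
    usage = shutil.disk_usage(path)
    used_gb = usage.used / 1e9
    free_gb = usage.free / 1e9
    pct_used = 100.0 * usage.used / usage.total
    return used_gb, free_gb, pct_used

if __name__ == "__main__":
    used, free, pct = temp_space_report()
    print(f"temp usage: {used:.1f} GB used, {free:.1f} GB free ({pct:.0f}%)")
```

A workflow wrapper could call this before claiming a new job and defer the job when free space drops below a threshold.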
The problem primarily stems from the following aspects of the workflow:
- The need to make efficient use of typical nodes by running multiple processing jobs on them concurrently, which increases the total temporary space needed at any one time
- Failed jobs leave their temporary files behind
- Temporary files are typically large (FASTQs/BAMs)
To at least partially address this, we therefore need a way to 1) track temporary disk space usage and 2) continuously monitor for and remove failed jobs' temporary files. A sketch of such a monitor follows.
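How the monitor identifies failed jobs is an open design question. The sketch below assumes a hypothetical convention where each job writes under a temp directory named `<accession>.<pid>`; a directory whose PID is no longer alive can then be treated as orphaned and swept. Nothing about this naming is currently part of recount-pump; it is one possible implementation.

```python
import os
import shutil
import time

TEMP_ROOT = "/tmp"  # placeholder; /dev/shm on MARCC, NVMe mount on EC2, etc.
POLL_SECS = 60

def pid_alive(pid):
    """True if a process with this PID still exists on this node."""
    try:
        os.kill(pid, 0)  # signal 0 probes for existence without killing
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user

def sweep(temp_root=TEMP_ROOT):
    """Remove per-job temp dirs whose owning process has exited."""
    for entry in os.scandir(temp_root):
        if not entry.is_dir():
            continue
        # hypothetical convention: temp dirs are named "<accession>.<pid>"
        name, _, pid_str = entry.name.rpartition(".")
        if not (name and pid_str.isdigit()):
            continue  # not one of ours; leave it alone
        if not pid_alive(int(pid_str)):
            print(f"removing orphaned temp dir: {entry.path}")
            shutil.rmtree(entry.path, ignore_errors=True)

if __name__ == "__main__":
    while True:
        sweep()
        time.sleep(POLL_SECS)
```

Run alongside the workflow on each node, this reclaims space from failed jobs without touching directories whose jobs are still running.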