
Efficiency of catalogue compression script #129

Open
robjmcgibbon opened this issue Jan 15, 2025 · 0 comments
We have a script for compressing SOAP catalogues. It consists of two parts. First, all the datasets in the catalogue are compressed and written to temporary output files; this part is done in parallel. Then a single process copies all the datasets from the temporary files into the final output file.
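
For illustration, a minimal sketch of that two-part structure might look like the following (file names, the dataset layout, and the gzip filter are placeholders; the real script applies SWIFT's custom lossy filters rather than plain gzip):

```python
# Sketch only: not the actual SOAP compression script.
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

def compress_part(input_file, tmp_file_pattern, dataset_names):
    """Part 1: each rank compresses a subset of datasets into its own temporary file."""
    my_datasets = dataset_names[comm.rank::comm.size]
    with h5py.File(input_file, "r") as fin, \
         h5py.File(tmp_file_pattern.format(rank=comm.rank), "w") as fout:
        for name in my_datasets:
            data = fin[name][...]
            # gzip shown for illustration; SOAP uses SWIFT's lossy filters here
            fout.create_dataset(name, data=data, compression="gzip", shuffle=True)

def gather_part(tmp_files, output_file):
    """Part 2: a single process copies the compressed datasets into the final file."""
    if comm.rank != 0:
        return
    with h5py.File(output_file, "w") as fout:
        for tmp in tmp_files:
            with h5py.File(tmp, "r") as fin:
                for name in fin:
                    # copies the dataset along with its compression filters
                    fin.copy(name, fout)
```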

The script is written with MPI. I did try to get multiple ranks to write to the output file, but there were complications because the datasets use the custom SWIFT lossy compression filters. Option one would be to try to get this working.

A second option would be to run the first part of the script on a compute node (where we can make use of all the cores), and then run the second part on a login node. If we went for this option I would add an argument to the compression script which specifies whether to run the first part, the second part, or both. I'm not sure how many people would actually bother to use this option though, which is why I haven't implemented it.
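
A hypothetical version of that argument could look something like this (the flag name and choices are placeholders, not the script's current interface):

```python
# Sketch of a --phase switch so part 1 can run on a compute node
# and part 2 can be run later on a login node.
import argparse

def parse_phase():
    parser = argparse.ArgumentParser(description="Compress a SOAP catalogue")
    parser.add_argument("--phase", choices=["compress", "combine", "both"],
                        default="both",
                        help="Run the parallel compression step, the serial "
                             "combine step, or both (default)")
    return parser.parse_args().phase

if __name__ == "__main__":
    phase = parse_phase()
    if phase in ("compress", "both"):
        print("Running part 1: parallel dataset compression to temporary files")
    if phase in ("combine", "both"):
        print("Running part 2: serial copy into the final output file")
```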
