
Efficiency of catalogue compression script #129

Open
robjmcgibbon opened this issue Jan 15, 2025 · 0 comments
We have a script for compressing SOAP catalogues. It consists of two parts. First, all the datasets in the catalogue are compressed and written to temporary output files; this part is done in parallel. Then a single process copies all the datasets from the temporary files into the final output file.
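
For illustration, a minimal sketch of that two-part structure might look like the following (file names, the dataset layout, and the gzip filter are placeholders; the real script applies SWIFT's custom lossy filters rather than plain gzip):

```python
# Sketch only: not the actual SOAP compression script.
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

def compress_part(input_file, tmp_file_pattern, dataset_names):
    """Part 1: each rank compresses a subset of datasets into its own temporary file."""
    my_datasets = dataset_names[comm.rank::comm.size]
    with h5py.File(input_file, "r") as fin, \
         h5py.File(tmp_file_pattern.format(rank=comm.rank), "w") as fout:
        for name in my_datasets:
            data = fin[name][...]
            # gzip shown for illustration; SOAP uses SWIFT's lossy filters here
            fout.create_dataset(name, data=data, compression="gzip", shuffle=True)

def gather_part(tmp_files, output_file):
    """Part 2: a single process copies the compressed datasets into the final file."""
    if comm.rank != 0:
        return
    with h5py.File(output_file, "w") as fout:
        for tmp in tmp_files:
            with h5py.File(tmp, "r") as fin:
                for name in fin:
                    # copies the dataset along with its compression filters
                    fin.copy(name, fout)
```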

The script is written with MPI. I did try to get multiple ranks to write to the output file, but there were complications because the datasets use the custom SWIFT lossy compression filters. Option one would be to try to get this working.

A second option would be to run the first part of the script on a compute node (where we can make use of all the cores), and then run the second part on a login node. If we went for this option I would add an argument to the compression script which specifies whether to run the first part, the second part, or both. I'm not sure how many people would actually bother to use this option though, which is why I haven't implemented it.
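
A hypothetical version of that argument could look something like this (the flag name and choices are placeholders, not the script's current interface):

```python
# Sketch of a --phase switch so part 1 can run on a compute node
# and part 2 can be run later on a login node.
import argparse

def parse_phase():
    parser = argparse.ArgumentParser(description="Compress a SOAP catalogue")
    parser.add_argument("--phase", choices=["compress", "combine", "both"],
                        default="both",
                        help="Run the parallel compression step, the serial "
                             "combine step, or both (default)")
    return parser.parse_args().phase

if __name__ == "__main__":
    phase = parse_phase()
    if phase in ("compress", "both"):
        print("Running part 1: parallel dataset compression to temporary files")
    if phase in ("combine", "both"):
        print("Running part 2: serial copy into the final output file")
```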
