
Ergonomic improvements to concatenation scripts #403

Merged
merged 4 commits into from
Jul 8, 2024

Conversation

mabruzzo
Collaborator

@mabruzzo mabruzzo commented Jul 3, 2024

This introduces a few changes to the concatenation scripts. These are all surface-level changes (they don't affect the output format -- that is still something I want to address).

There are essentially 3 changes:

  1. First, I introduced a new script called concat.py that can be used to concatenate more than one kind of dataset at a time (3D cubes or multiple kinds of 2D datasets).
  2. Next, I replaced --concat-outputs with --snaps (this is a subjective improvement).
  • The main difference has to do with specifying ranges of snapshots: with --snaps, a range has the form START:STOP[:STEP] (just like a Python slice). For comparison, with --concat-outputs you would specify a range as FIRST-LAST (LAST is included in the range).
  • You can still use --concat-outputs, but you get a deprecation warning.
  • I definitely want to hear if people dislike this approach.
  3. Finally, I made it possible to automatically detect the number of files that need to be concatenated. You can still specify --num-processes, but you get a warning about it.
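For readers unfamiliar with the slice-style syntax, here is a minimal sketch of how a START:STOP[:STEP] spec could be parsed into a range of snapshot numbers. This is an illustrative helper, not the actual implementation in concat.py; the function name and exact semantics (STOP exclusive, like a Python slice) are assumptions based on the description above.

```python
def parse_snaps(spec: str) -> range:
    """Parse a --snaps spec like "0:101:5" into a range of snapshot numbers.

    Accepts a single number ("7"), START:STOP, or START:STOP:STEP.
    STOP is exclusive, mirroring Python slice semantics.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        # a single snapshot number, e.g. "7"
        n = int(parts[0])
        return range(n, n + 1)
    if len(parts) in (2, 3):
        start, stop = int(parts[0]), int(parts[1])
        step = int(parts[2]) if len(parts) == 3 else 1
        return range(start, stop, step)
    raise ValueError(f"invalid snapshot spec: {spec!r}")
```

For example, `parse_snaps("0:10:3")` yields snapshots 0, 3, 6, 9, whereas the old FIRST-LAST form (e.g. `0-9`) would include the endpoint.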

I also haven't really touched the particle-concatenation script. Based on how code is shared, change number 2 will affect that script. But propagating change 3 would take some additional work.

I'm definitely open to any kind of feedback!

In a subsequent PR, it's my intention to modify things so that we can optionally use multiprocessing or mpi4py to speed up concatenation (I have an idea on how to do that in a minimally invasive manner that won't affect hypothetical dask-usage).

@evaneschneider
Collaborator

This all seems fine to me, but I'd like @helenarichie or @bcaddy to take a look, since I think they are the main ones who have been using the scripts in their existing form.

@helenarichie
Collaborator

helenarichie commented Jul 3, 2024

I think this looks good! I like having the default behavior be that you don't have to specify the number of processes. It might be nice to also do that with the snapshot numbers at some point. I also don't have any issues with changing the way the snapshot numbers are specified.

I am having a little bit of trouble figuring out which arguments are required/what the default behavior is and was wondering if you could clarify. For example, if I run this and my input directory contains 2D and 3D snapshots will it just go ahead and concatenate all of them unless I tell it to omit certain types of output? Sorry if I'm missing something obvious!

It would also probably be good to update the documentation with these changes.

@mabruzzo
Collaborator Author

mabruzzo commented Jul 3, 2024

> I think this looks good! I like having the default behavior be that you don't have to specify the number of processes.

And it is less error-prone!

> It might be nice to also do that with the snapshot numbers at some point.

I think that could definitely be nice! One of my goals here is achieving more sensible default behavior with fewer arguments (it could be cool to do something similar with the -s or -o flag).

My next primary concern is introducing some level of parallelism.

> I am having a little bit of trouble figuring out which arguments are required/what the default behavior is and was wondering if you could clarify. For example, if I run this and my input directory contains 2D and 3D snapshots will it just go ahead and concatenate all of them unless I tell it to omit certain types of output? Sorry if I'm missing something obvious!

The behavior of concat_3d_data.py and concat_2d_data.py is unchanged in this regard.

For concat.py you need to opt in for each type of output with the --kind parameter. Consequently,

  • --kind 3D just does 3D data
  • --kind slice just does slices (--kind slice-xy,xz will omit data from the yz datasets).
  • --kind proj just does axis-aligned projections
  • --kind rot_proj just does rotated projections
  • --kind 3D slice proj rot_proj would do all of the above

Once we add support for concatenating particles, I might support --all-kinds to handle all types of supported datasets.
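The opt-in --kind behavior described above could be sketched as follows. This is a hypothetical illustration of how the argument values might expand into a selection of dataset kinds (including sub-selections like slice-xy,xz); the names and structure are assumptions, not concat.py's actual internals.

```python
# Kinds named in the discussion above; illustrative only.
KNOWN_KINDS = {"3D", "slice", "proj", "rot_proj"}

def expand_kinds(kind_args):
    """Expand a list of --kind arguments into a mapping of
    kind -> sub-selection (or None for "all variants").

    A sub-selection like "slice-xy,xz" limits slices to the
    listed planes (omitting, e.g., the yz datasets).
    """
    selected = {}
    for arg in kind_args:
        base, _, subs = arg.partition("-")
        if base not in KNOWN_KINDS:
            raise ValueError(f"unknown kind: {base}")
        # an empty sub-selection means "all planes/variants"
        selected[base] = subs.split(",") if subs else None
    return selected
```

For instance, `expand_kinds(["3D", "slice-xy,xz"])` would select all 3D data plus only the xy and xz slices.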

> It would also probably be good to update the documentation with these changes.

I'm happy to do that. I might hold off until I have your approval/this is merged.

@bcaddy
Collaborator

bcaddy commented Jul 3, 2024

This is great. I'm totally for it.

@helenarichie
Collaborator

Ah, I see. I had gotten myself a little confused before... I appreciate the clarification!

I think this is good to go!

@evaneschneider
Collaborator

I'll go ahead and merge this, @mabruzzo feel free to update the wiki when you get a chance.

@evaneschneider evaneschneider merged commit 43f381b into cholla-hydro:dev Jul 8, 2024
11 checks passed