parallelizing peak counts on cosmoSLICS #29

Open · 3 tasks done
lbaumo opened this issue Oct 10, 2023 · 1 comment


lbaumo commented Oct 10, 2023

Here are the steps we need to create the peak counts/multiscale peak counts/l1-norm data vectors on SLICS:

  • Write a function that inputs a single file, outputs the data vector

Run the following steps for every cosmology

  • Make a list of paths of all tiles for a given cosmology. This includes each seed, each line of sight, and each bin (we will probably just start with two lines of sight).
  • Make a table with the cosmology number, seed, line of sight, bin, and data vector. For each file, fill in the table. I would use a pool to parallelize over files and send each file to a different thread. Make the number of threads an argument.
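The pool-over-files step above could look something like this minimal sketch. The names `compute_data_vector` and `build_table` are hypothetical, and the per-file computation is a placeholder for the real peak-count code; a `ThreadPool` is used to match the one-file-per-thread description, with the thread count exposed as an argument.

```python
from multiprocessing.pool import ThreadPool

def compute_data_vector(path):
    # Stand-in for the real per-tile computation: in the pipeline this
    # would load the catalog at `path` and return one table row with
    # the cosmology number, seed, line of sight, bin, and data vector.
    return {"path": path, "data_vector": [len(path)]}

def build_table(paths, n_threads=4):
    """Fill one table row per file, sending each file to a thread."""
    with ThreadPool(n_threads) as pool:
        return pool.map(compute_data_vector, paths)
```

`pool.map` preserves the order of the input list, so the resulting rows line up with the file paths regardless of which thread finished first.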

Now, make a script that generates job scripts for every cosmology. This will be a Python script that produces bash scripts. I will provide a sample.
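A Python-generates-bash pattern for this could be sketched as below. The template contents, script names, and the `run_peaks.py` entry point are all assumptions standing in for the sample mentioned above; the real job scripts would carry the cluster's actual scheduler directives.

```python
# Hypothetical template: one SLURM-style bash script per cosmology.
TEMPLATE = """#!/bin/bash
#SBATCH --job-name=peaks_cosmo{cosmo}
python run_peaks.py --cosmo {cosmo} --n-threads {n_threads}
"""

def make_job_scripts(cosmo_ids, n_threads=8):
    # Return a mapping of script filename -> script text; the caller
    # can write each entry to disk and submit it to the scheduler.
    return {
        f"job_cosmo{cosmo}.sh": TEMPLATE.format(cosmo=cosmo, n_threads=n_threads)
        for cosmo in cosmo_ids
    }
```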

lbaumo commented Feb 9, 2024

Here is a list of the functions we need to parallelize the simulation processing pipeline, @AndreasTersenov:

  • process_tile: This function takes a file name and a mass mapping method as input, as well as an optional mass map output name. It will take the catalog and return the three summary statistics. This function contains three functions: make_shear_map, make_mass_map, and summary_statistics.

  • process_footprint: This function will wrap process_tile. It will either take a list of filenames with a given bin, seed, and LOS as input, or take a bin, seed, and LOS and produce the list of files itself. It will then create an output directory if we want to save the mass maps, and use string manipulation to create a list of output map names. Finally, it will call process_tile parallelized over the number of tiles in a footprint and average over the results. The output is a vector of summary statistics with additional columns for the seed and LOS.

  • process_cosmo: This function will loop over the tomographic bins. For each bin, it will loop over seeds and LOS and call process_footprint. The resulting ~10-row table constructed from the loop will be saved in a directory that specifies the run and mass mapping method, with a filename like #cosmo_bin_method_run.npy.
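The nesting of the three functions above can be sketched as a runnable skeleton. The bodies are placeholders (process_tile returns dummy statistics instead of calling the real make_shear_map / make_mass_map / summary_statistics), and the `tiles_by_key` input format is an assumption for illustration.

```python
import numpy as np

def process_tile(filename, method, mass_map_name=None):
    # Placeholder for the per-tile step: shear map -> mass map ->
    # summary statistics. Real code would operate on the catalog in
    # `filename`; here we return dummy statistics so the skeleton runs.
    return np.zeros(3)  # stand-ins for peaks, multiscale peaks, l1-norm

def process_footprint(filenames, method, zbin, seed, los):
    # Wrap process_tile over all tiles in the footprint, average the
    # results, and attach the bookkeeping columns (bin, seed, LOS).
    stats = np.mean([process_tile(f, method) for f in filenames], axis=0)
    return {"bin": zbin, "seed": seed, "los": los, "stats": stats}

def process_cosmo(cosmo, tiles_by_key, method):
    # Loop over bins, then (seed, LOS) pairs, calling process_footprint
    # once per footprint; the rows form the table to be saved as .npy.
    rows = []
    for (zbin, seed, los), files in sorted(tiles_by_key.items()):
        rows.append(process_footprint(files, method, zbin, seed, los))
    return rows
```

In the real pipeline, the per-tile loop inside process_footprint is where the tile-level parallelism would go (e.g. a pool map over `filenames`), while process_cosmo stays a plain loop that each job script runs for one cosmology.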
