Troubleshooting Memory errors #537

Open
falkamelung opened this issue Nov 23, 2023 · 0 comments

Occasionally you may get an Out of Memory error in either minsar (rsmas_insar) or miaplpy. This happens when the estimate of the required memory is too low (the memory requirements for each run_step are set in defaults/job_defaults.cfg). The job_submission.py script uses this file to estimate how many jobs can run simultaneously on one node and creates the run_files accordingly. In this example it assumes there is enough memory to run 15 jobs on the node:

wc -l run_05_miaplpy_unwrap_ifgram_0
15 run_05_miaplpy_unwrap_ifgram_0

If you can't reduce the memory requirement by changing parameters in the *template file (e.g. more looks for ISCE processing, or a smaller miaplpy.subset area for MiaplPy), recreate the job files using a higher value for --numMemoryUnits:

job_submission.py --template $TE/MiamiTsxSMDT36.template run_05_miaplpy_unwrap_ifgram --outdir run_files --numMemoryUnits 2 --writeonly

For minsar, the original job_submission.py command is recorded in the log file; for MiaplPy, it is printed to the screen by miaplpyApp.py --jobfiles.

You can check memory usage on the compute node using free -h.
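
To relate the free -h output to a job count, a rough sanity check is to divide the node's available memory by the memory you expect one job to need. The sketch below is only an illustration: the helper name jobs_that_fit and the per-job figure are hypothetical, and this is not necessarily how job_submission.py computes its estimate. It reads the MemAvailable field of /proc/meminfo, which is the same value free reports in its "available" column.

```shell
# Hypothetical helper (not part of minsar or MiaplPy): estimate how many jobs
# of a given size fit into the memory currently available on this node.
# Usage: jobs_that_fit <per_job_memory_in_MB> [meminfo_file]
jobs_that_fit() {
    per_job_mb=$1
    meminfo=${2:-/proc/meminfo}   # a second argument can point at a saved copy
    # MemAvailable is reported in kB
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
    # Integer division: kB -> MB, then MB per job -> whole number of jobs
    echo $(( avail_kb / 1024 / per_job_mb ))
}
```

For example, running jobs_that_fit 12000 on the compute node prints how many 12000 MB jobs would fit at that moment (12000 is an assumed value; use your own estimate). If the result is well below the number of lines in the run file, the node is likely oversubscribed.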
