Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes temp directory and bigmem issues with kestrel runs #438

Merged
merged 6 commits into from
Apr 16, 2024

Conversation

ChristopherCaradonna
Copy link

@ChristopherCaradonna ChristopherCaradonna commented Feb 29, 2024

Fixes # .

Pull Request Description

This fixes Kestrel run errors related the existing files in the common temp directory on local node storage. Eagle assumes there is always local storage, and sets the TMPDIR to /tmp/scratch/$USER. Kestrel always assumes there is no local disk, and sets TMPDIR to /tmp/scratch. Existing files from other users on /tmp/scratch cause buildstock batch to fail.

This fix ensures the temporary directory is set to /tmp/scratch/$USER, avoiding file conflicts.

Checklist

Not all may apply

  • Code changes (must work)
  • Tests exercising your feature/bug fix (check coverage report on Checks -> BuildStockBatch Tests -> Artifacts)
  • Coverage has increased or at least not decreased. Update minimum_coverage in .github/workflows/coverage.yml as necessary.
  • All other unit and integration tests passing
  • Update validation for project config yaml file changes
  • Update existing documentation
  • Run a small batch run on Kestrel/Eagle to make sure it all works if you made changes that will affect Kestrel/Eagle
  • Add to the changelog_dev.rst file and propose migration text in the pull request

Copy link
Member

@asparke2 asparke2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real solution may be to get HPC to change the file permissions on /tmp/scratch, but this works in the meantime. @rHorsey suggested doing this via ENV in the kestrel shell scripts because then it doesn't affect Eagle.

Copy link

github-actions bot commented Feb 29, 2024

File Coverage
All files 87%
base.py 92%
exc.py 57%
hpc.py 78%
local.py 70%
postprocessing.py 85%
utils.py 92%
cloud/docker_base.py 88%
sampler/base.py 79%
sampler/downselect.py 33%
sampler/precomputed.py 93%
sampler/residential_quota.py 61%
test/shared_testing_stuff.py 85%
test/test_docker.py 33%
test/test_local.py 97%
test/test_validation.py 97%
workflow_generator/base.py 90%
workflow_generator/commercial.py 53%
workflow_generator/residential_hpxml.py 86%

Minimum allowed coverage is 33%

Generated by 🐒 cobertura-action against 1748ec9

Copy link
Member

@nmerket nmerket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Add and entry to the changelog and I'll merge it when you're ready.

nmerket and others added 4 commits March 11, 2024 09:53
Updates postprocessing memory default to require less than the 250,000 MB of memory available on the standard partition nodes because Kestrel only has 8 bigmem nodes.
Specify memory for sampling and simulation jobs to avoid issue with default jobs going into bigmem queue
@nmerket nmerket changed the title Fixes temp directory issues with kestrel runs Fixes temp directory and bigmem issues with kestrel runs Mar 20, 2024
@nmerket
Copy link
Member

nmerket commented Apr 15, 2024

Are we feeling like this is ready to merge? I'd like to get this into a release so people can benefit from these bugfixes.

@asparke2
Copy link
Member

@nmerket yeah, it's good to go, we used it for the Com EUSS runs

@nmerket nmerket merged commit 670ad58 into develop Apr 16, 2024
6 checks passed
@nmerket nmerket deleted the ccaradon/kestrel_tmp_dir_fix branch April 16, 2024 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants