Compute Engine Crash Headache Medicine? #20174

Open
JustinPrivitera opened this issue Jan 6, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@JustinPrivitera
Member

When the compute engine crashes, why do we give up the allocated resources? Why not hold them and ask the user whether they want to relaunch within the existing allocation?

@JustinPrivitera added the enhancement (New feature or request) label on Jan 6, 2025
@JustinPrivitera changed the title from "Compute Engine Headache Medicine?" to "Compute Engine Crash Headache Medicine?" on Jan 6, 2025
@markcmiller86
Member

I think the problem is that engine_par IS the job submitted to batch, so when that process exits, the allocated resources are lost. Instead, we'd want to submit an engine manager of sorts to batch: one that never exits unless requested and that can relaunch engines when they fail. The VisIt Component Launcher (VCL) probably comes close but may require modification to serve this role.
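
To make the engine-manager idea concrete, here is a minimal sketch of a supervising process that stays alive while the engine it launched comes and goes. This is not VisIt or VCL code: `supervise`, `ask_relaunch`, and `max_relaunches` are hypothetical names invented for illustration, and the sketch assumes the engine can be started as an ordinary child process and signals a crash with a nonzero exit code.

```python
import subprocess
from typing import Callable, Sequence


def supervise(
    engine_cmd: Sequence[str],
    ask_relaunch: Callable[[int, int], bool],
    max_relaunches: int = 3,
) -> int:
    """Run engine_cmd as a child process and relaunch it after a failure.

    The supervisor, not the engine, would be the process the batch system
    tracks, so the allocation outlives any single engine crash.
    Returns the number of launches performed.
    """
    launches = 0
    while True:
        launches += 1
        # Blocks until the child engine exits; the supervisor stays alive.
        returncode = subprocess.run(list(engine_cmd)).returncode
        if returncode == 0:
            break  # clean exit: nothing to recover, supervisor may finish
        if launches > max_relaunches:
            break  # too many crashes: stop instead of looping forever
        # Hypothetical callback standing in for "ask the user".
        if not ask_relaunch(returncode, launches):
            break  # user declined to relaunch
    return launches
```

In this arrangement the batch job would be the supervisor itself, so a crash of the child engine does not end the job; only a clean exit, a declined relaunch, or hitting the relaunch limit does.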

@markcmiller86
Member

There are similar remarks/goals in #2214
