Resilience to "soft" errors #5

Open
JavierGOrdonnez opened this issue Sep 18, 2024 · 3 comments

JavierGOrdonnez commented Sep 18, 2024

User story

As a user, I would like to be able to iterate on a failing dakota.in file (and oSPARC template) until it runs successfully.

Current behaviour

The Dakota service returns an error and then stops responding to further inputs (see this line).

A dakota.in file could be iterated on in a local setup to avoid this issue, but that has two drawbacks:

  • A local setup has no access to the underlying oSPARC template, so errors in the template or in the Dakota-template interface cannot be debugged there.
  • Looking ahead, non-expert users should be able to do all debugging and iteration directly in oSPARC, and this process should be as smooth as possible.

By "soft errors" I mean any error of the form:

DakotaService: [osparc-meta-dakota:0.1.0] Traceback (most recent call last):
DakotaService: [osparc-meta-dakota:0.1.0]   File "/docker/dakota-start.py", line 55, in main
DakotaService: [osparc-meta-dakota:0.1.0]     dakota_service.start()
DakotaService: [osparc-meta-dakota:0.1.0]   File "/docker/dakota-start.py", line 120, in start
DakotaService: [osparc-meta-dakota:0.1.0]     self.start_dakota(dakota_conf, self.output0_dir_path)
DakotaService: [osparc-meta-dakota:0.1.0]   File "/docker/dakota-start.py", line 166, in start_dakota
DakotaService: [osparc-meta-dakota:0.1.0]     study.execute()
DakotaService: [osparc-meta-dakota:0.1.0] RuntimeError: Dakota aborted: Unknown error 252
...

i.e. everything that is caught by the except statement mentioned above. Hard errors (e.g. the script failing somewhere else) are out of scope.

Desired behaviour

Such errors should be logged as they are now, but the script should then return to its state at line 55, i.e. DakotaService.start() is executed again.

The same DakotaService object should be reused (so that no new handshake is needed). It should record which input file caused the error, and proceed to execution only when a new dakota.in is sent.
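
To make the desired behaviour concrete, here is a minimal sketch of the retry loop. It is not the actual service code: `RetryingService`, `run_study` and `incoming` are hypothetical names, and `run_study` stands in for whatever wraps `study.execute()` in the real script. The sketch assumes the failure surfaces as a `RuntimeError`, as in the traceback above.

```python
from typing import Callable, Iterable, List


class RetryingService:
    """Hypothetical stand-in for DakotaService; not the real class."""

    def __init__(self, run_study: Callable[[str], None]) -> None:
        # run_study is assumed to raise RuntimeError on a "soft" error.
        self.run_study = run_study
        # Contents of every dakota.in that has already failed.
        self.failed_inputs: List[str] = []

    def start(self, incoming: Iterable[str]) -> str:
        """Consume dakota.in contents until one of them runs successfully."""
        for conf in incoming:
            if conf in self.failed_inputs:
                # Same input that already failed: do not run it again.
                continue
            try:
                self.run_study(conf)
            except RuntimeError as err:
                # Log as before, remember the failing input, keep the object alive.
                print(f"Dakota run failed, waiting for a new input: {err}")
                self.failed_inputs.append(conf)
                continue
            return conf
        raise RuntimeError("input stream ended before any dakota.in succeeded")
```

The same object survives every failed attempt, so nothing in this loop would require a new handshake. Whether the real Dakota Python process can actually be re-run after such an error is a separate question that this sketch does not answer.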

PS: I don't know whether the sidecar copies the Notebook output to the Dakota input repeatedly, or only when that output has changed. This affects how a "new" dakota.in should be detected: by file metadata, a watchdog, or file contents.
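
Of the three options, comparing file contents is the one that works regardless of how often the sidecar copies the file. A rough sketch using only the standard library is below. It polls rather than using a watchdog, and `content_digest` and `wait_for_changed_file` are hypothetical helper names, not part of the service.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional


def content_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def wait_for_changed_file(
    path: Path,
    last_digest: Optional[str],
    poll_seconds: float = 1.0,
    timeout: Optional[float] = None,
) -> str:
    """Block until `path` exists and its contents differ from `last_digest`.

    Returns the new digest. Raises TimeoutError if `timeout` seconds pass
    without a change (no timeout when `timeout` is None).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if path.exists():
            digest = content_digest(path)
            if digest != last_digest:
                return digest
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not change within {timeout} s")
        time.sleep(poll_seconds)
```

A re-copied but identical dakota.in yields the same digest and is ignored, which is what we would want if the sidecar copies the file repeatedly. Modification times would not give that guarantee.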

@wvangeit
Collaborator

Did you try this on your own machine to see whether this behavior actually works in dakota?
I changed the service code but, as I feared, the dakota python process does not seem able to recover from an error.

@JavierGOrdonnez
Author

I can test the dakota-itis wheel, but not the oSPARC service (I don't deploy locally) or the interface with python (which is to be handled by the ParallelRunner). I would be interested to see your setup, and maybe I can investigate it myself as well. Thank you.

@wvangeit
Collaborator

wvangeit commented Sep 20, 2024

It has nothing to do with the service itself though. My question was whether you tried it locally with the dakota wheel. It seems dakota python can't recover from these errors (unless you found a way around it).
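
One generic way around an in-process library that cannot recover is to run each attempt in a fresh child interpreter, so that a crash only takes down the child. This is untested with Dakota and only a sketch: `run_isolated` is a hypothetical helper, and the `script` argument stands in for whatever Python code would set up and execute the study.

```python
import subprocess
import sys


def run_isolated(script: str, timeout: float = 60.0) -> int:
    """Run Python source in a fresh interpreter and return its exit code.

    The parent process stays alive whatever happens in the child, so it
    can wait for a new input and try again.
    """
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the child's stderr, e.g. its traceback.
        print(f"child exited with code {result.returncode}: {result.stderr.strip()}")
    return result.returncode
```

A nonzero return value would then be treated as a soft error by the calling loop. The cost is that any state held by the Dakota process is lost between attempts, which may or may not be acceptable here.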
