When launching an array of jobs on Slurm, Orion cancels all subsequent trials once too many of them have failed, and then displays a message explaining this. However, the message does not make it clear that the user needs to take action at this point by removing the failed (broken) trials from the experiment.
It would help to add a short message such as: "You can call `orion db rm <exp name> --status broken` to correct this. Orion will cancel all further trials in this experiment until this is done."
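As a rough illustration of the request, here is a minimal, self-contained Python sketch of where such a hint could be attached: a check that raises an error carrying the actionable `orion db rm` command once the broken-trial limit is reached. Everything here is hypothetical: `BrokenExperimentError`, `check_broken_trials`, and the `max_broken` threshold name are assumptions for this sketch, not Orion's actual internals.

```python
class BrokenExperimentError(RuntimeError):
    """Hypothetical exception raised once too many trials have broken."""


def check_broken_trials(exp_name: str, n_broken: int, max_broken: int) -> None:
    """Raise with an actionable message once the broken-trial limit is reached.

    `max_broken` is an assumed name standing in for whatever threshold
    Orion uses internally to decide that an experiment is broken.
    """
    if n_broken >= max_broken:
        # Tell the user both *why* trials are being cancelled and *how* to fix it.
        raise BrokenExperimentError(
            f"Experiment '{exp_name}' has {n_broken} broken trials "
            f"(limit: {max_broken}). Orion will cancel all further trials "
            f"in this experiment until this is done: "
            f"you can call `orion db rm {exp_name} --status broken` to correct this."
        )


if __name__ == "__main__":
    try:
        check_broken_trials("my-exp", n_broken=3, max_broken=3)
    except BrokenExperimentError as err:
        print(err)
```

The point of the sketch is only that the remedy lives in the same message as the diagnosis, so a user reading Slurm logs sees the command to run without digging through documentation.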
I now know my way around this problem, but it took me a while to understand what was going on.