Workaround for solver returning NaN #163

Open
wants to merge 1 commit into
base: emmetfrancis/spatial-param
from

Conversation

emmetfrancis
Collaborator

@emmetfrancis emmetfrancis commented Apr 10, 2024

Detect when dolfin.assemble_mixed returns NaN and recompute (in smart.solvers).

Right now, this is only implemented in one place, the residual calculation that usually seemed to cause the issue, but dolfin.assemble_mixed could return NaN in other places as well.

@jorgensd
Collaborator

It would be good to figure out the root cause of this. Do you have a minimal example?

@emmetfrancis
Collaborator Author

emmetfrancis commented Apr 10, 2024

Agreed. I do not have a minimal working example yet, and it is not clear why this sometimes returns NaN, but I was able to narrow it down to these lines. The strangest part is that this seems to occur non-deterministically; Henrik also observed this issue.

I can come back to this in the next couple of days to hopefully put together an MWE; it's a confusing one... In the meantime, feel free to let me know if you have additional thoughts. But I think we can wait to merge this.

@jorgensd
Collaborator

> Agreed. I do not have a minimal working example yet, and it is not clear why this sometimes returns NaN, but I was able to narrow it down to these lines. The strangest part is that this seems to occur non-deterministically; Henrik also observed this issue.

I think there is a map from one mesh to another that is not created in time (where it should be). I’ve seen this before, and I’ve fixed those instances in the latest release of dolfin. Clearly there must be some cases I’ve missed.

@finsberg
Collaborator

I agree with Jørgen that we should try to find the root cause of this. My main concern with this fix is the risk that the program enters an infinite loop it cannot escape. If we go for this solution, there should be a guard against that, e.g. that it tries N times, or for N seconds, before it breaks out of the loop.
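
A guard like that could look roughly like the sketch below. This is not the code in this PR: `assemble_with_retry`, `NaNRetryError`, `max_attempts` and `max_seconds` are hypothetical names, and `assemble` is an assumed zero-argument callable standing in for the `dolfin.assemble_mixed` call, returning a sequence of floats.

```python
import math
import time
from typing import Callable, Sequence


class NaNRetryError(RuntimeError):
    """Raised when the retry budget is spent and the result still has NaN.

    Hypothetical exception name, used only for this sketch.
    """


def assemble_with_retry(
    assemble: Callable[[], Sequence[float]],
    max_attempts: int = 5,
    max_seconds: float = 10.0,
) -> Sequence[float]:
    """Call `assemble` until it returns no NaN, within a bounded budget.

    `assemble` is an assumed stand-in for the real assembly call.
    The loop stops after `max_attempts` calls or once `max_seconds`
    have elapsed, whichever comes first, so it cannot spin forever.
    """
    start = time.monotonic()
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        values = assemble()
        # Accept the result as soon as no entry is NaN.
        if not any(math.isnan(v) for v in values):
            return values
        # Give up early if the time budget is exhausted.
        if time.monotonic() - start > max_seconds:
            break
    raise NaNRetryError(
        f"assembly still returned NaN after {attempts} attempt(s)"
    )
```

Raising instead of returning the NaN result means a persistent failure surfaces at the assembly step rather than later in the solver, which should also help when narrowing down the root cause.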
