cumulative flow states #1819

Merged: 26 commits into main from integrated_flow_states on Sep 19, 2024

Conversation

SouthEndMusic (Collaborator)

Fixes #1808

SouthEndMusic marked this pull request as draft on September 11, 2024 12:04
SouthEndMusic changed the title from "Integrated flow states" to "cumulative flow states" on Sep 11, 2024
SouthEndMusic (Collaborator, Author) commented Sep 16, 2024

Finally, all tests are green again after the refactor. It was kind of a pain to make all the bookkeeping work out (quite a bit changed in that regard).

Findings

These findings are based on running the HWS model as taken from the regression tests.

Water balance

Let's start with the best result: for the HWS model the largest absolute water balance error is 8.83354278613524e-10.

This means that per basin per day, less than a μL is not accounted for.

Note that this balance error is independent of the solver tolerances: the water balance is inherently closed by the problem formulation. The remaining errors are just accumulated floating point errors.
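As a minimal sketch of why the balance closes by construction (hypothetical numbers and names, not Ribasim's actual bookkeeping): the basin storage is reconstructed from the same cumulative flow states that make up the balance terms, so the only mismatch left is floating point rounding.

```
# Hypothetical illustration: storage is derived from the cumulative flow states,
# so the balance error reduces to floating point rounding by construction.
storage0 = 500.0          # m³ at the start of the period
cumulative_in = 1234.567  # m³, sum of the cumulative inflow states of the basin
cumulative_out = 1100.25  # m³, sum of the cumulative outflow states

storage = storage0 + cumulative_in - cumulative_out
balance_error = (storage - storage0) - (cumulative_in - cumulative_out)
@show balance_error       # zero up to accumulated rounding
```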

Profile

The refactor didn't introduce any glaring performance problems in the code:

(profiling screenshot omitted)

States and Jacobian sparsity

The state vector is still a component vector, with these components:

```
julia> only(getfield(model.integrator.u, :axes))
Axis(tabulated_rating_curve = 1:11, pump = 12:40, outlet = 41:101, user_demand_inflow = 102:127, user_demand_outflow = 128:153, linear_resistance = 154:153, manning_resistance = 154:239, evaporation = 240:376, infiltration = 377:513, integral = 514:522)
```

i.e. sorted by node type. This means that we can find convergence bottlenecks in a more fine-grained way than before.
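As a hedged sketch (not Ribasim code) of how this component structure can be used: per-component norms of, say, a residual vector single out the node type that limits convergence. The component sizes below follow the axis printed above; the values are made up.

```
# Sketch: per-node-type diagnostics on a component-structured vector.
using ComponentArrays

resid = ComponentVector(pump = rand(29), outlet = rand(61), evaporation = rand(137))

# Largest residual per node type points to the convergence bottleneck.
for name in propertynames(resid)
    println(name, " => ", maximum(abs, getproperty(resid, name)))
end
```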

Considerations for the states:

  • Each conservative node, and each boundary node that is not exactly integrable, has a state. So FlowBoundary doesn't need a state, and UserDemand needs separate inflow and outflow states due to the time-dependent return factor.
  • Also precipitation and drainage don't need their own states, as these are exactly integrable (precipitation uses a fixed area).
  • evaporation and infiltration now have separate states, a bit more on that later.

The Jacobian sparsity looks like this:

(Jacobian sparsity plot omitted)

The pattern admittedly looks quite cool, but there are now many more states (going from 137 to 522) and more non-zeros in the Jacobian. To give an indication of how the Jacobian sparsity comes about, say we have the following model section.

(diagram of the example model section omitted)

The state for the manning flow depends on the forcing states of the connected basins, but also on the flow states of all nodes connected to these basins. That is because the manning flow depends on the basin levels, which depend on the basin storages, which in turn depend on the cumulative flows over all flow edges connected to the basins.
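A toy illustration of this dependency chain (hypothetical functions, not solve.jl): the derivative of the manning cumulative-flow state is evaluated from both basin levels, which are reconstructed from every cumulative flow state connected to either basin.

```
# Toy dependency chain: cumulative flows -> storage -> level -> manning flow.
storage(storage0, cum_in, cum_out) = storage0 + sum(cum_in) - sum(cum_out)
level(s) = cbrt(s)  # stand-in for the basin storage-to-level relation

function manning_flow(cum_in_a, cum_out_a, cum_in_b, cum_out_b)
    h_a = level(storage(100.0, cum_in_a, cum_out_a))
    h_b = level(storage(100.0, cum_in_b, cum_out_b))
    # Schematic manning-like flow: depends on both levels, hence on all
    # cumulative flow states touching either basin.
    return 0.5 * sign(h_a - h_b) * sqrt(abs(h_a - h_b))
end

manning_flow([10.0, 2.0], [5.0], [3.0], [4.0, 1.0])
```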

Performance

As expected, the performance takes a hit. The runtime of the HWS model:

  • On main: 104.484030 seconds (63.46 M allocations: 10.354 GiB, 0.65% gc time, 0.01% compilation time) (I thought the HWS model was much faster on main?)
  • On this branch: 266.583831 seconds (441.16 M allocations: 104.220 GiB, 1.53% gc time)

I had hoped I could reduce the number of states by clumping evaporation and infiltration together into a single state. The problem with that is that the contribution of these forcings over a timestep cannot then be split exactly into an evaporation and an infiltration contribution. The contribution of this combined-forcing state over a timestep $\Delta t = t_1 - t_0$ would be:

$$ \Delta V = \text{infiltration} \cdot \int_{t_0}^{t_1} \phi\left(d(u(t')); 0.1\right) \text{d}t' + \text{evaporation} \cdot \int_{t_0}^{t_1} A\left(u(t')\right)\phi\left(d(u(t')); 0.1\right)\text{d}t'. $$

where $\phi$ is the reduction factor. Both integrals are unknowns (except when the basin is not in the 'reduction factor regime'), so we cannot compute exactly how $\Delta V$ splits into the infiltration and evaporation contributions. If we consider performance more important than this exact split, we could approximate it. Note that such an approximation does not affect the water balance, because we still know exactly how much water leaves the basin. Maybe that is not so bad, as the reduction factor is a coarse approximation of what happens when a basin dries out anyway.
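If we do accept an approximation, one option (a sketch only, not implemented in this PR) is to split $\Delta V$ in proportion to the instantaneous contributions at the start of the timestep:

$$ \Delta V_\text{infiltration} \approx \Delta V \cdot \frac{\text{infiltration} \cdot \phi\left(d(u(t_0)); 0.1\right)}{\text{infiltration} \cdot \phi\left(d(u(t_0)); 0.1\right) + \text{evaporation} \cdot A\left(u(t_0)\right)\phi\left(d(u(t_0)); 0.1\right)}, \qquad \Delta V_\text{evaporation} = \Delta V - \Delta V_\text{infiltration}, $$

which keeps the total outgoing volume, and hence the water balance, exact.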

visr (Member) commented Sep 16, 2024

I made a plot of the "Test Delwaq coupling" artifact "delwaq_map.nc" variable "ribasim_Continuity", which should be as close to 1 as possible, comparing this branch (new) to the current main (old).

(continuity comparison plot omitted)

So that is very encouraging. (Note the +1 on the y axis, so 0 on the axis corresponds to a continuity of 1.)

```
import pandas as pd
import xarray as xr

# Continuity of the old (main) and new (this branch) runs from the
# "Test Delwaq coupling" artifact.
ds_old = xr.open_dataset(r"main\delwaq_map.nc")
da_old = ds_old["ribasim_Continuity"]

ds_new = xr.open_dataset(r"integrated-flow-states\delwaq_map.nc")
da_new = ds_new["ribasim_Continuity"]

# Select a single node and line up both runs in one DataFrame for plotting.
df_old = da_old.sel(ribasim_nNodes=3).to_dataframe()["ribasim_Continuity"].rename("old")
df_new = da_new.sel(ribasim_nNodes=3).to_dataframe()["ribasim_Continuity"].rename("new")

df = pd.concat([df_old, df_new], axis=1)
df.plot()
```

Huite (Contributor) commented Sep 16, 2024

Looks very promising! The robustness and simplicity with regard to the water balance make me think this is well worth it.

> The pattern admittedly looks quite cool, but there are now many more states (going from 137 to 522) and more non-zeros in the Jacobian.

You can reduce bandwidth with algorithms like: https://en.wikipedia.org/wiki/Cuthill%E2%80%93McKee_algorithm

I'm not sure this is relevant: maybe the solver machinery already does things like this. Also, given the model sizes, a lot of data may fit wholly within CPU caches, so better bandwidth may have no meaningful effect on memory access patterns.

SouthEndMusic (Collaborator, Author) commented Sep 16, 2024

> You can reduce bandwidth with algorithms like: https://en.wikipedia.org/wiki/Cuthill%E2%80%93McKee_algorithm

@Huite I have dabbled in that before. As far as I'm aware, the goal of reducing the bandwidth is to reduce fill-in in the Jacobian factorization. I looked a little into what OrdinaryDiffEq.jl does there, and the factors I found were quite sparse, so there's probably already some hidden cleverness there.

EDIT: This is already part of the matrix factorization: https://dl.acm.org/doi/10.1145/992200.992206

Resolved (now outdated) review threads on: docs/concept/modelconcept.qmd, docs/concept/equations.qmd (×4), core/src/callback.jl, core/src/solve.jl (×4)
SouthEndMusic (Collaborator, Author)

I fixed the flaky tests. It turns out a dictionary was involved in setting up the edge data for the flow table, which messed up the ordering.
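As a minimal illustration of the pitfall (not the actual Ribasim code): iterating a Base `Dict` does not follow insertion order, whereas an `OrderedDict` does.

```
using OrderedCollections

edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
d  = Dict(e => i for (i, e) in enumerate(edges))         # arbitrary, hash-based order
od = OrderedDict(e => i for (i, e) in enumerate(edges))  # insertion order preserved

@show collect(keys(d))
@show collect(keys(od))
```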

SouthEndMusic merged commit e2a66ab into main on Sep 19, 2024
25 of 27 checks passed
SouthEndMusic deleted the integrated_flow_states branch on September 19, 2024 12:07
visr added a commit that referenced this pull request Oct 1, 2024
The coupler is switching to an approach where it doesn't need to reset
the cumulatives to 0. Since resetting complicated the Ribasim code, let's get
rid of it.

The reason for the coupler to move away from resetting cumulatives is
that this possibly broke with #1819.

The dedicated BMI arrays like `vertical_flux_bmi` and
`user_demand.realized_bmi` are removed. The `BMI.get_value_ptr` now just
points to the cumulative state, or if it is not a state (drainage,
precipitation) to `basin.cumulative_drainage` or
`basin.cumulative_precipitation`.

This is breaking because I also renamed these BMI variables to be more
consistent with the current code, where we use the term cumulative
internally.

```
user_demand.realized -> user_demand.inflow
basin.drainage_integrated -> basin.cumulative_drainage
basin.infiltration_integrated -> basin.cumulative_infiltration
```
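A hedged sketch of reading these through the BMI; only the variable names come from the rename list above, while the initialization call and config path are assumptions.

```
import BasicModelInterface as BMI
import Ribasim

# Hypothetical config path; assumes a BMI initialize method for Ribasim.Model.
model = BMI.initialize(Ribasim.Model, "path/to/ribasim.toml")

realized_inflow  = BMI.get_value_ptr(model, "user_demand.inflow")
cum_drainage     = BMI.get_value_ptr(model, "basin.cumulative_drainage")
cum_infiltration = BMI.get_value_ptr(model, "basin.cumulative_infiltration")
```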
visr added a commit that referenced this pull request Oct 7, 2024
With #1819 the state goes further away from 0 with time. That means we rely more on the relative tolerance than the absolute tolerance.

Reducing the default `reltol` by two orders of magnitude is enough to get only negligible leaky Terminals with occasional very brief leaks of 1e-8 m3/s. It is also enough to minimize the infiltration error to acceptable levels as can be seen in #1863 (comment).

For HWS this slows down simulation from 9.1s to 12.3s, which isn't too bad for a two orders of magnitude reduction in relative tolerance.

https://docs.sciml.ai/DiffEqDocs/stable/basics/faq/#What-does-tolerance-mean-and-how-much-error-should-I-expect
visr added a commit that referenced this pull request Oct 8, 2024
With #1819 the state goes further away from 0 with time. That means we
rely more on the relative tolerance than the absolute tolerance, see the
[SciML
docs](https://docs.sciml.ai/DiffEqDocs/stable/basics/faq/#What-does-tolerance-mean-and-how-much-error-should-I-expect)
on the topic. Hence I found that reducing the `abstol` had no effect,
but reducing the default `reltol` by two orders of magnitude is enough
to get only negligible leaky Terminals with occasional very brief leaks
of 1e-8 m3/s. It is also enough to minimize the infiltration error to
acceptable levels as can be seen in
#1863 (comment).

~~For HWS this slows down simulation from 9.1s to 12.3s, which isn't too
bad for a two orders of magnitude reduction in relative tolerance.~~
Rechecking HWS performance gives a speedup! 12.4s for 1e-8, 8.1s for
1e-7, 10.2s for 1e-6. So 1e-7 seems optimal. Not sure what went wrong
last time, but this seems pretty consistent. This is with both absolute
and relative set to the same number.

Fixes #1851
Fixes #1863
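For reference, a generic SciML illustration (not Ribasim's configuration interface) of passing the absolute and relative tolerances discussed above to the solver:

```
using OrdinaryDiffEq

decay!(du, u, p, t) = (du .= -0.5 .* u)
prob = ODEProblem(decay!, [1.0], (0.0, 10.0))

# Tolerance values as discussed above; both set to the same number.
sol = solve(prob, QNDF(); reltol = 1e-7, abstol = 1e-7)
```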
visr mentioned this pull request Oct 10, 2024
Successfully merging this pull request may close issue #1808: Core refactor: integrated flows as states instead of storages.