Correctly interpolate seasons in Grouper #2019

Open
wants to merge 22 commits into
base: main
from

Conversation

saschahofmann
Contributor

@saschahofmann saschahofmann commented Dec 11, 2024

Pull Request Checklist:

What kind of change does this PR introduce?

This PR adds a line to correctly interpolate seasonal values. It also changes the test_timeseries function, which now accepts a calendar argument instead of cftime. Not providing it, or providing None, is equivalent to the previous cftime=False, and calendar='standard' is equivalent to the previous cftime=True. This allows testing different calendar implementations, e.g. 360_day calendars.

@github-actions github-actions bot added the sdba Issues concerning the sdba submodule. label Dec 11, 2024
@saschahofmann
Contributor Author

I just realised that the factor of 1/6 assumes that all seasons have the same length, which is not necessarily true in Gregorian calendars. I am not sure it matters too much, though; at least the function should be smooth.
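
For reference, a minimal sketch of the equal-length assumption behind the 1/6 offset. The helper name `season_index` and its formula are illustrative assumptions, not the code in this PR: the year is split into four equal seasons, and since half a month is 1/6 of a season, subtracting 1/6 places index 0 at mid-January, the centre of DJF. The second helper counts the actual days per season in a Gregorian year to show how much the lengths differ.

```python
import calendar

SEASON_MONTHS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def season_index(dayofyear, year_length=365):
    """Continuous season coordinate assuming four equal-length seasons.

    Hypothetical helper: 0 is the centre of DJF (mid-January), 1 the centre
    of MAM, and so on.
    """
    return dayofyear / year_length * 4 - 1 / 6


def season_lengths(year):
    """Number of days per season, counting the months of one calendar year."""
    return {
        name: sum(calendar.monthrange(year, month)[1] for month in months)
        for name, months in SEASON_MONTHS.items()
    }


if __name__ == "__main__":
    print(season_lengths(2023))
    print(season_index(1), season_index(365))
```

Under this assumption, the first days of January map to a slightly negative index and late December maps above 3, so both ends need cyclic handling when interpolating.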

@saschahofmann
Contributor Author

Just to prove that this leads to a smooth result, same input as in the issue:
image

@Zeitsperre Zeitsperre requested a review from aulemahal December 11, 2024 16:53
@Zeitsperre Zeitsperre added bug Something isn't working standards / conventions Suggestions on ways forward labels Dec 11, 2024

Warning

This Pull Request is coming from a fork and must be manually tagged approved in order to perform additional testing.

@saschahofmann
Contributor Author

Weirdly, and contrary to what I showed yesterday, today I am still getting clear transitions, as if there still wasn't any linear interpolation.

@Zeitsperre
Collaborator

@saschahofmann We recently changed the layout of xclim to use a src structure. It might be worthwhile to try reinstalling the library.

@Zeitsperre Zeitsperre mentioned this pull request Dec 12, 2024
5 tasks
@github-actions github-actions bot added the docs Improvements to documenation label Dec 12, 2024
@saschahofmann
Contributor Author

I reinstalled xclim but I am still getting very similar results to before the "fix". You have any advice on where else I could look?

@Zeitsperre
Collaborator

I reinstalled xclim but I am still getting very similar results to before the "fix". You have any advice on where else I could look?

Could it be that you have obsolete __pycache__ folders still among your cloned folders? @coxipi is looking into recreating your example based on your branch for validation, but if the tests are working as intended on CI, then it's likely a caching/installation issue.
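
A quick way to rule out a stale install, sketched with the standard library only. The helper names are hypothetical; the idea is to check which file Python actually imports for a package and to list leftover `__pycache__` folders in the clone.

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def find_pycache_dirs(root):
    """List every __pycache__ directory below root, sorted by path."""
    return sorted(p for p in Path(root).rglob("__pycache__") if p.is_dir())


if __name__ == "__main__":
    # With a src layout and an editable install, this path should point
    # inside the cloned repository rather than into site-packages.
    print(module_origin("xclim"))
```

If the printed path is not inside the clone, the notebook is running an older installed copy and the edited get_index is never used.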

@coxipi
Contributor

coxipi commented Dec 13, 2024

I managed to install the environment. For some reason I only had the branch "main" when I cloned the fork yesterday.

  • I confirmed that the function has the appropriate modifications inside the notebook I'm using
import inspect
from xclim import sdba
# Print the source of the method that is actually loaded in this session
print(inspect.getsource(sdba.base.Grouper.get_index))
  • I also find that the interpolation is wrong.

I'll try to have a look later. Maybe the interp boolean condition is not triggered properly?

@saschahofmann
Contributor Author

I am pretty sure that the get_index function is updated in my notebook. Either I am wrong in expecting a smoother result (it seems to have changed slightly to what I got earlier) or there is something else going on. I will keep investigating

@coxipi
Contributor

coxipi commented Dec 16, 2024

It's simply that interp can't be "nearest"; otherwise no interpolation takes place. I think our only other option is "linear".

import matplotlib.pyplot as plt
from xclim import sdba

# ref, hist and sim are assumed to be daily DataArrays that are already loaded
QM = sdba.EmpiricalQuantileMapping.train(
    ref, hist, nquantiles=15, group="time.season", kind="+"
)

# Same trained object, adjusted with and without interpolation between seasons
scen = QM.adjust(sim, extrapolation="constant", interp="nearest")
scen_interp = QM.adjust(sim, extrapolation="constant", interp="linear")
outd = {
    "Reference": ref,
    "Model - biased": hist,
    "Model - adjusted - no interp": scen,
    "Model - adjusted - linear interp": scen_interp,
}
# Compare the mean annual cycles
for k, da in outd.items():
    da.groupby("time.dayofyear").mean().plot(label=k)
plt.legend()

image

This doesn't reproduce your figure however. It seems your figure above was matching the reference very well, better than what I have even with the linear interpolation. But it does get rid of obvious discontinuities.
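
To illustrate why "nearest" leaves jumps at season boundaries while "linear" does not, here is a minimal pure-Python sketch. The function names and the numbers are made up for illustration and this is not how xclim implements interp; it only shows the two lookup rules on a cyclic set of four seasonal factors.

```python
import math


def nearest_af(index, factors):
    """Pick the factor of the closest group; the result is piecewise constant."""
    return factors[math.floor(index + 0.5) % len(factors)]


def linear_af(index, factors):
    """Blend the two neighbouring groups, wrapping around the end of the year."""
    n = len(factors)
    lower = math.floor(index)
    weight = index - lower
    return (1 - weight) * factors[lower % n] + weight * factors[(lower + 1) % n]


if __name__ == "__main__":
    # Made-up additive adjustment factors for DJF, MAM, JJA, SON.
    factors = [1.0, 2.0, 3.0, 4.0]
    for index in (-0.25, 0.4, 0.6, 3.5):
        print(index, nearest_af(index, factors), linear_af(index, factors))
```

Between index 0.4 and 0.6 the nearest lookup jumps from the DJF factor to the MAM factor, whereas the linear lookup changes gradually.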

@coxipi
Contributor

coxipi commented Dec 16, 2024

There is clearly something wrong going on. Comparing
hist - scen_month
scen_time - scen_month
scen_season - scen_month
scen_month - scen_month

scen_season is way off

image

@saschahofmann
Contributor Author

@coxipi I think I only mentioned this in the original issue: my analysis is done with QuantileDeltaMapping instead of EmpiricalQuantileMapping. Here is the equivalent chart to yours for that:
image
season still seems kinda weird

@saschahofmann
Contributor Author

saschahofmann commented Dec 18, 2024

A similar trend becomes apparent when looking at the adjusted - historical (now for EmpiricalQuantileMapping)
image

@coxipi
Contributor

coxipi commented Dec 18, 2024

Yes, I have seen similar things by playing with the choice of how get_index works. I feel this should not be this sensitive. Let me try and get this back

@saschahofmann
Contributor Author

Ah of course I saw the OR but not !='nan' 🤦 .

Indeed this resolves EQM:
image

I am quite surprised that there are no more spikes in the nearest; after all, I would have thought that this would kinda be expected.

@coxipi
Contributor

coxipi commented Jan 7, 2025

I am quite surprised that there are no more spikes in the nearest; after all, I would have thought that this would kinda be expected.

I see your point. Maybe the sufficiently high number of quantiles (20 or 50) and the fact that you average over 15 years is enough to make this smooth. If you look directly at the time series, the "nearest" should be less smooth?

Anyways, great work, thanks a lot!

@saschahofmann
Contributor Author

I am also finally getting smooth results for QDM 🎉 :
image

and one of our other checks also finally looks as expected:
image

I officially declare this the greatest detective work since Sherlock Holmes solved the mystery of the Hound of the Baskervilles 😂
Will keep playing around with this but I think this might be ready now!

@coxipi
Contributor

coxipi commented Jan 7, 2025

Ah of course I saw the OR but not !='nan' 🤦 .

I made the same error above when I commented the extrapolate haha... maybe I influenced your reading

@saschahofmann
Contributor Author

I am quite surprised that there are no more spikes in the nearest; after all, I would have thought that this would kinda be expected.

I see your point. Maybe the sufficiently high number of quantiles (20 or 50) and the fact that you average over 15 years is enough to make this smooth. If you look directly at the time series, the "nearest" should be less smooth?

Anyways, great work, thanks a lot!

Hm, I am looking at the timeseries but it looks smooth as well:
image

but I won't complain about a graph looking smoother than expected.

@coxipi
Contributor

coxipi commented Jan 7, 2025

Hum, in the QDM case, the linear interpolation seems to have some issues?

@saschahofmann
Contributor Author

Dang, now I see it too. Somehow I was focusing on the nearest. Let's see.

@saschahofmann
Contributor Author

saschahofmann commented Jan 7, 2025

Ok, here are the linearly interpolated adjustment factors (afs) for QDM:
image

as comparison nearest:
image

I believe this is due to the problem I mentioned in December and better summarised by you

Gonna think about this tomorrow.

@saschahofmann
Contributor Author

Do you have any resources on someone else doing this? The original paper, Cannon et al. (2015), doesn't seem to look at monthly or seasonal adjustments. The only thing I could find was them saying to use a sliding window:

To correct biases in the seasonal cycle, the quantile mapping algorithms are applied to pooled daily data falling within sliding 3-month windows centered on the month of interest. For example, correction of data from December would include days from November to January, correction of data from January would include days from December to February, and so forth. Time-dependent means and empirical quantiles of projected data are calculated over 30-yr sliding windows centered on the year of interest
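
A minimal sketch of the 3-month sliding window described in that passage, in plain Python. The function names and the (month, value) data layout are assumptions made for illustration; this is not the paper's code and not something xclim provides under these names.

```python
def window_months(center, width=3):
    """Months (1-12) in a window of odd width centred on `center`, wrapping the year."""
    if width % 2 != 1:
        raise ValueError("width must be odd")
    half = width // 2
    return [(center - 1 + offset) % 12 + 1 for offset in range(-half, half + 1)]


def pooled_values(daily, center):
    """Pool the values of (month, value) pairs that fall inside the window."""
    months = set(window_months(center))
    return [value for month, value in daily if month in months]


if __name__ == "__main__":
    print(window_months(12))  # window used to correct December
    print(window_months(1))   # window used to correct January
```

Unlike fixed seasons, each month here gets its own pool centred on itself, so neighbouring months share most of their training data.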

@coxipi
Contributor

coxipi commented Jan 8, 2025

No, unfortunately, I'm searching right now. @aulemahal, Sascha fixed one problem in the extrapolation: it was assumed that values of season/month would start at 0, but for seasons with periodic conditions the index can go below zero, so this needed a change.

But now, we have the problem I describe here, e.g.:

To give a concrete example, consider tasmax on Feb.28. Its rank is computed in DJF. But if we included it in MAM, it would be a lower rank. We are saying that we have an interpolated continuous function that we want to apply on those ranks, but these values are still segmented, there are four groups of ranks in a year.

Do you agree this is a problem? Do you know if people explored specifically the use of QDM with seasons in the literature?
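
The Feb. 28 example can be made concrete with a small sketch. The temperatures below are invented purely for illustration, and `empirical_rank` is a hypothetical helper using a simple empirical CDF; the point is only that the same value gets very different ranks depending on the seasonal pool it is compared against.

```python
from bisect import bisect_right


def empirical_rank(value, pool):
    """Fraction of pool values less than or equal to `value`."""
    ordered = sorted(pool)
    return bisect_right(ordered, value) / len(ordered)


if __name__ == "__main__":
    # Made-up daily maximum temperatures in degC, for illustration only.
    djf = [-12.0, -8.0, -5.0, -3.0, -1.0, 0.0, 2.0, 4.0]
    mam = [-2.0, 1.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0]
    feb28 = 3.0
    print(empirical_rank(feb28, djf))  # rank within the winter pool
    print(empirical_rank(feb28, mam))  # rank within the spring pool
```

Interpolating the adjustment factors between seasons does not remove this jump in rank, because the ranks themselves are still computed per season.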

@saschahofmann
Contributor Author

I was also thinking that another option to fix the extrapolation, without needing to change the extrapolate function, is to change the mapping of the seasons to start at 1 (so that cyclic_bounds would add 0 and 5). Not sure which of the two you prefer. I think the current fix might be more robust to future changes because it simply uses the coordinate values.
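
A minimal sketch of the "use the coordinate values" idea, under the assumption that the groups are evenly spaced. `add_cyclic_padding` is a hypothetical helper written for this comment, not xclim's own cyclic-bounds function: it derives the padded coordinates from the existing ones, so it gives -1 and 4 for seasons numbered 0 to 3 and 0 and 5 for seasons numbered 1 to 4.

```python
def add_cyclic_padding(coords, values):
    """Pad a periodic series with one wrapped point on each side.

    The padded coordinates are computed from the existing ones, so nothing
    assumes that numbering starts at 0 or at 1.
    """
    step = coords[1] - coords[0]
    padded_coords = [coords[0] - step, *coords, coords[-1] + step]
    padded_values = [values[-1], *values, values[0]]
    return padded_coords, padded_values


if __name__ == "__main__":
    print(add_cyclic_padding([0, 1, 2, 3], [10, 20, 30, 40]))
    print(add_cyclic_padding([1, 2, 3, 4], [10, 20, 30, 40]))
```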

@aulemahal
Collaborator

Do you know if people explored specifically the use of QDM with seasons in the literature?

I don't think I have ever read such a paper (neither for QDM nor for any other QMs). Maybe @huard remembers if we had sources in mind when implementing it ?

My "fear" is that we implemented it because it was possible, because time.dt.season existed in the code.

@saschahofmann
Contributor Author

saschahofmann commented Jan 8, 2025

The same problem exists for months. The difference in quantiles might be smaller but you can easily see it in the interpolated af:
image

@saschahofmann
Contributor Author

I am moving the discussion to #2048 and will leave the issue #2014 open for now. As discussed in the development meeting I will add warnings that link to that discussion. Where should I place these warnings?

I guess I could raise a logging.warning in `qdm_adjust` when grouping is not None and interp='linear'? I was thinking to also put one in the docs, but I am not sure what would be the right place to do that.
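
One possible shape for such a check, as a hedged sketch. The helper name, its arguments, and the message are hypothetical, and it uses `warnings.warn` from the standard library rather than the `logging.warning` mentioned above; where it would be called from is left open.

```python
import warnings


def warn_if_grouped_linear(group_prop, interp):
    """Warn when linear interpolation is combined with a sub-annual grouping.

    Hypothetical helper: `group_prop` is None when no grouping is used,
    otherwise something like "season" or "month". Returns True if it warned.
    """
    if group_prop is not None and interp == "linear":
        warnings.warn(
            f"Linear interpolation between '{group_prop}' groups may give "
            "unexpected results with this adjustment; see #2048.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


if __name__ == "__main__":
    warn_if_grouped_linear("season", "linear")
```

A `UserWarning` can be filtered or turned into an error by callers, which a log record cannot; that is the only reason for the substitution here.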

Once this is done, I suggest we merge this in since all changes still apply IMO.

@saschahofmann
Contributor Author

I discovered another issue when running QDM with a 360_day calendar. To reproduce you can just convert ref and hist with .convert_calendar('360_day', align_on='year').

One of the issues is related to @aulemahal's PR #2038; here I am using a different approach.
Could someone check changes in base.py get_coordinates and in utils.pyL478?

@saschahofmann
Contributor Author

And another fix for using sdba.Scaling in utils.pyL222

src/xclim/sdba/base.py Outdated
@saschahofmann
Contributor Author

Merged @aulemahal's fix and this is now ready for merging

@Zeitsperre Zeitsperre requested a review from aulemahal January 21, 2025 14:39
Collaborator

@aulemahal aulemahal left a comment

Looks good! Thanks!

I think we can avoid a breaking change in test_timeseries.

@@ -475,9 +477,10 @@ def interp_on_quantiles(
        return out

    if prop not in xq.dims:
        xq = xq.expand_dims({prop: group.get_coordinate()})
        prop_coords = group.get_coordinate(newx)
Collaborator

I don't remember this code enough, but is there a possibility to have prop in xq.dims but not prop in yq.dims ? That would make the code fail as prop_coords is declared only in the first if .
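
The failure mode can be reproduced in isolation. This sketch assumes, based only on the comment above, that a second `if` for yq reuses `prop_coords`; the functions, flags, and coordinate values are stand-ins, not the code in utils.py.

```python
def coords_bound_in_first_branch(xq_has_prop, yq_has_prop):
    """Mimic the pattern under review: the name is bound only in the first `if`."""
    if not xq_has_prop:
        prop_coords = [0, 1, 2, 3]  # stand-in for group.get_coordinate(newx)
    if not yq_has_prop:
        return prop_coords  # fails when the first branch was skipped
    return None


def coords_bound_up_front(xq_has_prop, yq_has_prop):
    """One possible fix: compute the coordinates before either branch."""
    prop_coords = [0, 1, 2, 3]  # stand-in for group.get_coordinate(newx)
    if not yq_has_prop:
        return prop_coords
    return None


if __name__ == "__main__":
    print(coords_bound_up_front(True, False))
    try:
        coords_bound_in_first_branch(True, False)
    except UnboundLocalError as err:
        print("fails:", err)
```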

Comment on lines +211 to +212
calendar : str or None
    Whether to use a calendar. If a calendar is provided, cftime is used.
Collaborator

I approve this change; however, this function is public... Could we avoid the breaking change by simply adding calendar as a new argument, keeping cftime?

To avoid a breaking change and avoid having to pass two arguments in the new state, I suggest we simply ignore cftime when calendar is given ?

I would also mention this change in the changelog.
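
A minimal sketch of that suggestion, assuming both arguments are kept. `resolve_calendar` is a hypothetical helper, not part of xclim: `calendar` wins whenever it is given, and the legacy `cftime` flag is only consulted otherwise.

```python
def resolve_calendar(cftime=False, calendar=None):
    """Return (use_cftime, calendar) from the legacy and the new argument.

    Hypothetical helper: a given `calendar` implies cftime and makes the
    `cftime` flag irrelevant; cftime=True alone maps to the standard calendar.
    """
    if calendar is not None:
        return True, calendar
    if cftime:
        return True, "standard"
    return False, None


if __name__ == "__main__":
    print(resolve_calendar())                    # old default behaviour
    print(resolve_calendar(cftime=True))         # old cftime=True
    print(resolve_calendar(calendar="360_day"))  # new argument
```

Existing calls that pass only cftime keep working, and new calls never need to pass both arguments.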
