Evaluation on Time-Series with all NaN targets #12

Open
liam-sbhoo opened this issue Jan 16, 2025 · 1 comment

@liam-sbhoo (Contributor) commented on Jan 16, 2025

Hey there!

While evaluating on some of the datasets (e.g. electricity/W), I can see in the test data that some time series contain only NaN targets (see the screenshot below). I'd expect the scores on these time series to dominate the overall score.

Any thoughts on this?
Or is there some normalization of the scores that makes the scores on these time series less "disruptive"?

Thank you in advance! 😄

(Screenshot: a test-split time series whose target values are all NaN.)

@cuthalionn (Contributor) commented

Hi @liam-sbhoo,

Thanks for raising this point! I agree that this is a problematic instance that our dataset builder framework did not filter out.
Our evaluation framework already handles and skips similar problematic time-series instances that were originally saved in the dataset:

  1. If the gold target in the prediction horizon contains NaNs (the second list in your example), we ignore those instances through the gluonts eval function.
  2. If the forecast prediction contains NaNs, we again ignore those instances through the gluonts eval function (a sketch of this kind of masking follows below).
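
A minimal sketch of that kind of NaN handling, for illustration only; this is not the actual gluonts implementation, and the helper name and array handling are assumptions:

```python
import numpy as np

def masked_abs_error(target, forecast):
    """Mean absolute error over the prediction horizon, ignoring NaN positions.

    Returns None when no position has both a defined target and a defined
    forecast, so the caller can drop the instance from the aggregate
    instead of averaging over an empty mask.
    """
    target = np.asarray(target, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = ~np.isnan(target) & ~np.isnan(forecast)
    if not mask.any():
        return None  # nothing scoreable: instance is skipped
    return float(np.mean(np.abs(target[mask] - forecast[mask])))

# A series whose gold target is entirely NaN contributes nothing to the score.
print(masked_abs_error([np.nan, np.nan, np.nan], [1.0, 2.0, 3.0]))  # None
print(masked_abs_error([1.0, np.nan, 3.0], [1.0, 2.0, 2.0]))        # 0.5
```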

We do allow missing values in the historical context; however, the instance you found is exceptional because all values in the historical context are missing. I agree this is not ideal and it should preferably have been filtered out of the dataset. After you raised the issue, I checked how many such instances there are in the whole gift_eval benchmark and found only one more in addition to the one you shared (a sketch of that kind of scan follows the list):

  1. electricity/W/short, item MT_178
  2. bitbrains_fast_storage/H/short, item fastStorage_552
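
Roughly, such a scan can look like the sketch below. It assumes GluonTS-style entries, i.e. dicts with an `item_id` and a 1-D `target` array whose last `prediction_length` values form the test horizon; the actual gift_eval loader may expose this differently:

```python
import numpy as np

def all_nan_context_items(dataset, prediction_length):
    """Return item_ids whose historical context is entirely NaN.

    `dataset` is assumed to be an iterable of dicts with "item_id" and a
    1-D "target" array; the last `prediction_length` values are treated
    as the prediction horizon and the rest as the historical context.
    """
    flagged = []
    for entry in dataset:
        target = np.asarray(entry["target"], dtype=float)
        context = target[:-prediction_length] if prediction_length else target
        if context.size and np.isnan(context).all():
            flagged.append(entry["item_id"])
    return flagged

# Toy usage with hand-made entries (real entries come from the dataset loader):
toy = [
    {"item_id": "MT_178", "target": [np.nan] * 10 + [1.0, 2.0]},
    {"item_id": "MT_001", "target": list(range(12))},
]
print(all_nan_context_items(toy, prediction_length=2))  # ['MT_178']
```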

So, to answer your question about whether these instances dominate the results: I believe they do not. The reasons are (1) they make up a very small portion of their respective datasets, and (2) we normalize each model's result on every dataset with seasonal_naive. However, just to be sure, I replicated the results for one model (moirai_small) with and without the problematic instances excluded, for both datasets. You can see the results below:

Electricity/W, Short

| Dataset | Model | eval_metrics/MASE[0.5] | CRPS |
| --- | --- | --- | --- |
| electricity/W/short | s_naive | 2.0897 | 0.09888 |
| electricity/W/short | moirai_small | 2.21046 | 0.10663 |
| electricity/W/short | moirai_small_normalized | 1.05779 | 1.07843 |
| [excluded] electricity/W/short | s_naive | 2.09215 | 0.09835 |
| [excluded] electricity/W/short | moirai_small | 2.21827 | 0.10618 |
| [excluded] electricity/W/short | moirai_small_normalized | 1.06028 | 1.07955 |

bitbrains_fast_storage/H, Short

| Dataset | Model | eval_metrics/MASE[0.5] | CRPS |
| --- | --- | --- | --- |
| bitbrains_fast_storage/H/short | s_naive | 1.29851 | 1.07803 |
| bitbrains_fast_storage/H/short | moirai_small | 1.39726 | 0.60111 |
| bitbrains_fast_storage/H/short | moirai_small_normalized | 1.07605 | 0.5576 |
| [excluded] bitbrains_fast_storage/H/short | s_naive | 1.29851 | 1.07803 |
| [excluded] bitbrains_fast_storage/H/short | moirai_small | 1.38931 | 0.60217 |
| [excluded] bitbrains_fast_storage/H/short | moirai_small_normalized | 1.06993 | 0.55858 |

The difference after normalization with seasonal naive seems to be very small, at least for the Moirai model.
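
For completeness, the `moirai_small_normalized` rows above are consistent with the normalization being a plain ratio of the model's metric to the seasonal_naive metric on the same dataset (an assumption inferred from the numbers, up to rounding of the displayed values):

```python
# Reproducing the electricity/W/short rows under the assumption that
# "normalized" means the model's metric divided by seasonal_naive's.
mase_snaive, mase_moirai = 2.0897, 2.21046
crps_snaive, crps_moirai = 0.09888, 0.10663

print(mase_moirai / mase_snaive)  # ~1.0578, matching 1.05779 in the table
print(crps_moirai / crps_snaive)  # ~1.0784, close to 1.07843 (input rounding)
```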
