Evaluation on Time-Series with all NaN targets #12

Open
liam-sbhoo opened this issue Jan 16, 2025 · 1 comment

@liam-sbhoo (Contributor) commented on Jan 16, 2025

Hey there!

While evaluating on some of the datasets (e.g. electricity/W), I can see in the test data that some time series contain only NaN targets (see the screenshot below). I'd expect the scores on these time series to dominate the overall score.

Any thoughts on this?
Or is there some normalization of the scores that makes the scores on these time series less "disruptive"?

Thank you in advance! 😄

(Screenshot: a test-split time series whose target values are all NaN.)

@cuthalionn (Contributor) commented

Hi @liam-sbhoo,

Thanks for raising this point! I agree that this is a problematic instance that our dataset builder framework did not filter out.
Our evaluation framework already handles and skips similar problematic time-series instances that were originally saved in the dataset:

  1. If the gold target in the prediction horizon contains NaNs (the second list in your example), we ignore those instances through the gluonts eval function.
  2. If the forecast prediction contains NaNs, we again ignore those instances through the gluonts eval function (a sketch of this kind of masking follows below).
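
A minimal sketch of that kind of NaN handling, for illustration only; this is not the actual gluonts implementation, and the helper name and array handling are assumptions:

```python
import numpy as np

def masked_abs_error(target, forecast):
    """Mean absolute error over the prediction horizon, ignoring NaN positions.

    Returns None when no position has both a defined target and a defined
    forecast, so the caller can drop the instance from the aggregate
    instead of averaging over an empty mask.
    """
    target = np.asarray(target, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = ~np.isnan(target) & ~np.isnan(forecast)
    if not mask.any():
        return None  # nothing scoreable: instance is skipped
    return float(np.mean(np.abs(target[mask] - forecast[mask])))

# A series whose gold target is entirely NaN contributes nothing to the score.
print(masked_abs_error([np.nan, np.nan, np.nan], [1.0, 2.0, 3.0]))  # None
print(masked_abs_error([1.0, np.nan, 3.0], [1.0, 2.0, 2.0]))        # 0.5
```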

We do allow missing values in the historical context; however, the instance you found is exceptional because all values in the historical context are missing. I agree this is not ideal and it should preferably have been filtered out of the dataset. After you raised the issue, I checked how many such instances there are in the whole gift_eval benchmark and found only one more in addition to the one you shared (a sketch of that kind of scan follows the list):

  1. electricity/W/short, item MT_178
  2. bitbrains_fast_storage/H/short, item fastStorage_552
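
Roughly, such a scan can look like the sketch below. It assumes GluonTS-style entries, i.e. dicts with an `item_id` and a 1-D `target` array whose last `prediction_length` values form the test horizon; the actual gift_eval loader may expose this differently:

```python
import numpy as np

def all_nan_context_items(dataset, prediction_length):
    """Return item_ids whose historical context is entirely NaN.

    `dataset` is assumed to be an iterable of dicts with "item_id" and a
    1-D "target" array; the last `prediction_length` values are treated
    as the prediction horizon and the rest as the historical context.
    """
    flagged = []
    for entry in dataset:
        target = np.asarray(entry["target"], dtype=float)
        context = target[:-prediction_length] if prediction_length else target
        if context.size and np.isnan(context).all():
            flagged.append(entry["item_id"])
    return flagged

# Toy usage with hand-made entries (real entries come from the dataset loader):
toy = [
    {"item_id": "MT_178", "target": [np.nan] * 10 + [1.0, 2.0]},
    {"item_id": "MT_001", "target": list(range(12))},
]
print(all_nan_context_items(toy, prediction_length=2))  # ['MT_178']
```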

So, to answer your question about whether these instances dominate the results: I believe they do not. The reasons are (1) they make up a very small portion of their respective datasets, and (2) we normalize each model's result on every dataset with seasonal_naive. However, just to be sure, I replicated the results for one model (moirai_small) with and without the problematic instances excluded, for both datasets. You can see the results below:

Electricity/W, Short

| Dataset | Model | eval_metrics/MASE[0.5] | CRPS |
| --- | --- | --- | --- |
| electricity/W/short | s_naive | 2.0897 | 0.09888 |
| electricity/W/short | moirai_small | 2.21046 | 0.10663 |
| electricity/W/short | moirai_small_normalized | 1.05779 | 1.07843 |
| [excluded] electricity/W/short | s_naive | 2.09215 | 0.09835 |
| [excluded] electricity/W/short | moirai_small | 2.21827 | 0.10618 |
| [excluded] electricity/W/short | moirai_small_normalized | 1.06028 | 1.07955 |

bitbrains_fast_storage/H, Short

| Dataset | Model | eval_metrics/MASE[0.5] | CRPS |
| --- | --- | --- | --- |
| bitbrains_fast_storage/H/short | s_naive | 1.29851 | 1.07803 |
| bitbrains_fast_storage/H/short | moirai_small | 1.39726 | 0.60111 |
| bitbrains_fast_storage/H/short | moirai_small_normalized | 1.07605 | 0.5576 |
| [excluded] bitbrains_fast_storage/H/short | s_naive | 1.29851 | 1.07803 |
| [excluded] bitbrains_fast_storage/H/short | moirai_small | 1.38931 | 0.60217 |
| [excluded] bitbrains_fast_storage/H/short | moirai_small_normalized | 1.06993 | 0.55858 |

The difference after normalization with seasonal naive seems to be very small, at least for the Moirai model.
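
For completeness, the `moirai_small_normalized` rows above are consistent with the normalization being a plain ratio of the model's metric to the seasonal_naive metric on the same dataset (an assumption inferred from the numbers, up to rounding of the displayed values):

```python
# Reproducing the electricity/W/short rows under the assumption that
# "normalized" means the model's metric divided by seasonal_naive's.
mase_snaive, mase_moirai = 2.0897, 2.21046
crps_snaive, crps_moirai = 0.09888, 0.10663

print(mase_moirai / mase_snaive)  # ~1.0578, matching 1.05779 in the table
print(crps_moirai / crps_snaive)  # ~1.0784, close to 1.07843 (input rounding)
```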
