Improve performance of __getitem__ of TimeSeriesDataSet #806

denix56 · 2021-12-21T18:39:10Z

Description

Indexing a pandas DataFrame is quite slow compared to numpy because of the additional checks pandas performs.
By replacing the DataFrame of indices with an np.recarray, I was able to improve performance by 5-10%.
A recarray keeps the convenient attribute access we have in pandas while being faster.
Raw numpy arrays are a bit faster than a recarray, but that difference is smaller than the one between pandas and recarray.
I tested on the Demand Forecasting example with gpu=1, 0 workers and pin_memory=True.
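
To illustrate the idea, here is a minimal sketch (not the code from this PR) comparing a single-row lookup on a DataFrame with the same lookup on a recarray built from it. The column names `index_start`, `index_end` and `sequence_length`, the table size, and the helper functions are assumptions made up for this example; timings will vary by machine, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical index table; the column names are illustrative only.
index_df = pd.DataFrame(
    {
        "index_start": np.arange(1000, dtype=np.int64),
        "index_end": np.arange(1000, dtype=np.int64) + 30,
        "sequence_length": np.full(1000, 31, dtype=np.int64),
    }
)

# DataFrame.to_records returns an np.recarray; index=False drops the pandas index.
index_rec = index_df.to_records(index=False)


def lookup_pandas(i):
    # Positional row access builds a Series, which carries pandas overhead.
    row = index_df.iloc[i]
    return int(row.index_start), int(row.index_end)


def lookup_recarray(i):
    # Integer indexing on a recarray yields a record with attribute access.
    row = index_rec[i]
    return int(row.index_start), int(row.index_end)


pandas_time = timeit.timeit(lambda: lookup_pandas(500), number=2000)
recarray_time = timeit.timeit(lambda: lookup_recarray(500), number=2000)
print(f"pandas:   {pandas_time:.4f} s for 2000 lookups")
print(f"recarray: {recarray_time:.4f} s for 2000 lookups")
```

Both helpers return the same values, so the recarray can stand in for the DataFrame wherever only row lookups with attribute access are needed.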

codecov-commenter · 2021-12-28T17:35:41Z

Codecov Report

Merging #806 (eb706f9) into master (0b5892a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #806   +/-   ##
=======================================
  Coverage   89.05%   89.06%           
=======================================
  Files          24       24           
  Lines        3829     3832    +3     
=======================================
+ Hits         3410     3413    +3     
  Misses        419      419

| Flag | Coverage Δ | |
|---|---|---|
| cpu | `89.06% <100.00%> (+<0.01%)` | ⬆️ |
| pytest | `89.06% <100.00%> (+<0.01%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pytorch_forecasting/data/timeseries.py | `93.12% <100.00%> (+0.02%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b5892a...eb706f9.

jdb78 · 2022-02-20T00:04:34Z

I am tempted to merge this. I think we should also run the example notebooks, because things might change there, even if only visually.

denix56 added 3 commits December 21, 2021 19:33

Replace DataFrame of indices with np.recarray

39a28ba

Enable index field (to avoid changes in other files)

46646c0

Fix conversion to numpy, when we have numpy already

08cb636

Remove whitespaces

eb706f9
