Improve performance of __getitem__ of TimeSeriesDataSet #806

Open. denix56 wants to merge 4 commits into base: main

denix56 commented Dec 21, 2021

Description

A pandas DataFrame is quite slow compared to NumPy because of the additional checks it performs.
By replacing it with np.recarray, I was able to improve performance by 5-10%.
A recarray keeps the convenient attribute access of pandas while being faster.
Raw NumPy arrays are slightly faster than a recarray, but that difference is smaller than the one between pandas and recarray.
I tested on Demand Forecasting with gpu=1, 0 workers and pin_memory=True.
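
For readers who want to see the idea in isolation, here is a minimal sketch of the pandas-versus-recarray comparison. It is not the code from this PR: the column names (`time_idx`, `target`, `group`), the window bounds, and the helper functions are hypothetical, and the micro-benchmark only times a single slice-and-read, so its numbers will not match the 5-10% end-to-end figure above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the preprocessed table a dataset indexes into.
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame(
    {
        "time_idx": np.arange(n_rows),
        "target": rng.normal(size=n_rows),
        "group": rng.integers(0, 10, size=n_rows),
    }
)

# DataFrame.to_records returns a np.recarray; index=False drops the index field.
rec = df.to_records(index=False)


def window_pandas(start, stop):
    # Positional slice of the DataFrame, then attribute access to one column.
    return df.iloc[start:stop].target.to_numpy()


def window_recarray(start, stop):
    # Same slice on the recarray; attribute access works the same way.
    return rec[start:stop].target


# Both paths must return the same values.
assert np.array_equal(window_pandas(100, 130), window_recarray(100, 130))

# Rough timing of repeated window lookups (results vary by machine and version).
t_pandas = timeit.timeit(lambda: window_pandas(100, 130), number=2000)
t_rec = timeit.timeit(lambda: window_recarray(100, 130), number=2000)
print(f"pandas: {t_pandas:.4f}s  recarray: {t_rec:.4f}s")
```

The recarray keeps the `obj.column` style of access, so code that reads columns by attribute needs few changes, while the slice itself is a plain NumPy view.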

codecov-commenter commented Dec 28, 2021

Codecov Report

Merging #806 (eb706f9) into master (0b5892a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #806   +/-   ##
=======================================
  Coverage   89.05%   89.06%           
=======================================
  Files          24       24           
  Lines        3829     3832    +3     
=======================================
+ Hits         3410     3413    +3     
  Misses        419      419           
| Flag | Coverage Δ |
|---|---|
| cpu | 89.06% <100.00%> (+<0.01%) ⬆️ |
| pytest | 89.06% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| pytorch_forecasting/data/timeseries.py | 93.12% <100.00%> (+0.02%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0b5892a...eb706f9.

jdb78 (Collaborator) commented Feb 20, 2022

I am tempted to merge this. I think we should also run the example notebooks, because things might change there, even if only visually.
