energy_from_power returns incorrect index for shifted hourly data #370

Open
kandersolar opened this issue May 17, 2023 · 0 comments

TrendAnalysis fails on hourly data whose timestamps are not at the top of the hour. Here is a toy example that raises an error when the input time series is timestamped at the half-hour mark (minute 30) instead of the usual minute 00:

import pandas as pd
import numpy as np
import rdtools

# toy hourly dataset, note that minute=30
times = pd.date_range('2019-01-01 00:30:00', periods=8760*3, freq='H')
df = pd.DataFrame({'pv': 1 - 0.005*np.arange(len(times))/8760,
                   'poa_global': 1000,
                   'temperature_cell': 25},
                  index=times)

ta = rdtools.TrendAnalysis(**df)
ta.filter_params.pop('clip_filter')  # clipping filter doesn't like this toy dataset
ta.sensor_analysis()  # ValueError: Less than two years of data left after filtering

Inspecting the object's attributes shows that the index of pv_energy is misaligned with the original data's index (the timestamps now fall at the top of the hour, minute 00, instead of minute 30):

In [181]: ta.pv_energy[:3]
Out[181]: 
2019-01-01 01:00:00    0.500000
2019-01-01 02:00:00    0.999999
2019-01-01 03:00:00    0.999999
Freq: H, Name: energy_Wh, dtype: float64
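To illustrate why such a shift matters, here is a minimal pandas-only sketch. It is not rdtools' actual filtering code. The series names and values are made up for illustration, and '60min' is used as the hourly frequency string. It only shows that pandas aligns on index labels, so two hourly series offset by 30 minutes share no timestamps:

```python
import pandas as pd

# Hypothetical stand-ins: irradiance stamped at minute 30, energy stamped at minute 00.
irradiance_index = pd.date_range('2019-01-01 00:30:00', periods=3, freq='60min')
energy_index = pd.date_range('2019-01-01 01:00:00', periods=3, freq='60min')

poa = pd.Series(1000.0, index=irradiance_index)
energy = pd.Series(1.0, index=energy_index)

# Arithmetic aligns on the union of both indexes; no label matches, so every value is NaN.
ratio = energy / poa
print(len(ratio))
print(ratio.isna().all())
```

Any step that combines the two series by label would therefore see no valid overlapping points.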

Index misalignment between the energy and the irradiance data breaks the filtering, resulting in the "less than two years" error. The index difference traces back to energy_from_power:

In [189]: rdtools.normalization.energy_from_power(df['pv'])[:3]
Out[189]: 
2019-01-01 01:00:00    0.500000
2019-01-01 02:00:00    0.999999
2019-01-01 03:00:00    0.999999
Freq: H, Name: energy_Wh, dtype: float64

And from there, I think, to the call to _aggregate and the resample it performs:

# series that has same index as desired output
output_dummy = time_series.resample(target_frequency,
                                    closed='right',
                                    label='right').sum()
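The effect of that resample can be reproduced with pandas alone. The sketch below assumes the target frequency is hourly (written here as '60min') and uses a made-up three-point series. It does not call rdtools, so it only demonstrates the resample behavior, not the full _aggregate logic:

```python
import pandas as pd

# Three hourly samples stamped at the half-hour mark, as in the toy dataset above.
times = pd.date_range('2019-01-01 00:30:00', periods=3, freq='60min')
power = pd.Series([1.0, 1.0, 1.0], index=times)

# Same closed/label arguments as the quoted snippet; '60min' stands in for target_frequency.
resampled = power.resample('60min', closed='right', label='right').sum()

# Bins are anchored to the top of the hour, so the 00:30 sample is labeled 01:00.
print(power.index.minute.tolist())
print(resampled.index.minute.tolist())

# The input and output indexes have no timestamps in common.
shared = power.index.intersection(resampled.index)
print(len(shared))
```

With the default bin origin, the output labels land on whole hours regardless of the input's minute offset, which matches the 01:00, 02:00, 03:00 index shown above.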
