
Upload processed data sets to zenodo #21

Open
jokochems opened this issue Oct 11, 2021 · 4 comments

@jokochems
Member

Once we have created data sets, we should upload a zipped version to Zenodo.
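One way to produce that zipped version is Python's standard shutil.make_archive. This is a minimal sketch, not the project's tooling: the helper name zip_data_set and the folder and archive names used with it are hypothetical.

```python
import shutil
from pathlib import Path


def zip_data_set(data_dir: str, archive_base: str) -> Path:
    """Pack a folder of processed data into one .zip archive for upload.

    data_dir and archive_base are hypothetical names chosen by the caller;
    the archive is written to archive_base + ".zip".
    """
    source = Path(data_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"data folder not found: {source}")
    # make_archive zips the contents of root_dir and appends ".zip" to base_name
    archive = shutil.make_archive(archive_base, "zip", root_dir=source)
    return Path(archive)
```

For example, zip_data_set("prepared_data", "pommesdata_prepared") would write pommesdata_prepared.zip to the current working directory, ready to be attached to a Zenodo record.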

jokochems self-assigned this May 27, 2022
jokochems linked a pull request Aug 3, 2022 that will close this issue
jokochems added this to the release-v0.1.0 milestone Aug 18, 2022
@jokochems
Member Author

This should be done once data preparation is final for the PhD thesis. The goal is to have something by fall 2023.

@maurerle
Collaborator

maurerle commented Oct 30, 2023

I think this is quite crucial.
On Zenodo, you can provide different versions of a dataset too, so you could upload the current version and update it once it is finished.

For now, I got it working somehow:

    conda create -v -n pommesdata python=3.10 pandas pip numpy matplotlib
    conda activate pommesdata
    pip install scikit-learn ipykernel statsmodels seaborn geopandas

However, I had problems with the execution path (I just moved the raw data folder up one level). Furthermore, the folder prepared_data and the file raw_data_input/timeseries/when2heat.csv were missing, so I downloaded the 2022 version from here:

    wget https://data.open-power-system-data.org/when2heat/2022-02-22/when2heat.csv
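The same setup steps can be scripted so that the missing folder and file are created on a fresh checkout. This sketch assumes the notebook resolves prepared_data and raw_data_input/timeseries/when2heat.csv relative to a single base folder (given the path problems above, that may not match the repository layout). ensure_inputs is a hypothetical helper; fetch defaults to the standard urllib.request.urlretrieve and can be swapped out.

```python
import urllib.request
from pathlib import Path

# URL as quoted above (2022-02-22 release of the OPSD when2heat data)
WHEN2HEAT_URL = "https://data.open-power-system-data.org/when2heat/2022-02-22/when2heat.csv"


def ensure_inputs(base_dir: str, fetch=urllib.request.urlretrieve) -> Path:
    """Create prepared_data/ and download when2heat.csv if it is absent.

    Assumes both paths are relative to base_dir; fetch is any callable
    taking (url, filename), e.g. a stub in tests.
    """
    base = Path(base_dir)
    # Output folder the notebook expects to exist
    (base / "prepared_data").mkdir(parents=True, exist_ok=True)
    target = base / "raw_data_input" / "timeseries" / "when2heat.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Only download once; later runs reuse the local copy
    if not target.exists():
        fetch(WHEN2HEAT_URL, str(target))
    return target
```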

I had to add .fillna(0) here:

    conv_de_new['commissioned_last'] = conv_de_new['commissioned'].fillna(0).astype(int)
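Note that fillna(0) turns an unknown commissioning year into 0, which later steps may treat as a real year. If the column only needs an integer dtype, pandas' nullable Int64 dtype keeps the gap as <NA> instead. A small sketch on a toy frame (the values are made up; only the column names come from the notebook):

```python
import numpy as np
import pandas as pd

# Toy stand-in for conv_de_new; only the 'commissioned' column matters here
conv_de_new = pd.DataFrame({"commissioned": [1975.0, np.nan, 2010.0]})

# The workaround above: the unknown year silently becomes 0
as_zero = conv_de_new["commissioned"].fillna(0).astype(int)

# Nullable integer dtype: integral floats become ints, NaN becomes <NA>
conv_de_new["commissioned_last"] = conv_de_new["commissioned"].astype("Int64")
```

Whether <NA> is acceptable depends on how commissioned_last is used later: comparisons with <NA> return <NA> rather than False, so the notebook might still need an explicit fill value.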

Next, I got TypeError: agg function failed [how->mean,dtype->object] when running

    FR = tools.load_entsoe_generation_data('FR')
    AT = tools.load_entsoe_generation_data('AT')

because the prepared ENTSO-E data set contains columns that cannot be averaged. I fixed this with:

    df = df.resample("H").mean(numeric_only=True)

(this changed in pandas 2.0; my fault for using newer versions).
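To illustrate the change on toy data (the frame below is made up and is not the prepared ENTSO-E data): since pandas 2.0, resample(...).mean() defaults to numeric_only=False and fails on text columns, while numeric_only=True drops them before averaging. The sketch uses "60min" as the rule because pandas 2.2+ deprecates "H" in favour of "h".

```python
import pandas as pd

# Toy half-hourly frame: one numeric column plus a text column (e.g. a country code)
index = pd.date_range("2021-01-01 00:00", periods=4, freq="30min")
df = pd.DataFrame({"load": [1.0, 2.0, 3.0, 4.0], "area": ["FR"] * 4}, index=index)

# Exclude the text column explicitly; each hourly bin averages two half-hour values
hourly = df.resample("60min").mean(numeric_only=True)
```

Here hourly keeps only the load column, with 1.5 for the 00:00 bin and 3.5 for the 01:00 bin.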

Finally, I got TypeError: can only concatenate str (not "int") to str here:

    eu_fuels = conv_eu['fuel'].unique()

    for fuel in eu_fuels:
        conv_eu.loc[conv_eu['fuel'] == fuel, 
                    :] = conv_eu.loc[conv_eu['fuel'] == fuel, 
                                     :].fillna(conv_eu.loc[conv_eu['fuel'] == fuel, :].mean())

which I don't want to take care of ;)
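A likely cause of this last error: under pandas 2.0, DataFrame.mean() no longer skips text columns by default, so averaging the whole slice tries to add up strings. Passing numeric_only=True to that .mean() call is probably the smallest change. The sketch below does the same fill with a groupby instead. It assumes conv_eu has a fuel column and that non-numeric columns (names and the like) should stay as they are. fill_missing_by_fuel is a hypothetical helper, not part of the repository.

```python
import pandas as pd


def fill_missing_by_fuel(conv_eu: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps in numeric columns with the mean of the row's fuel group.

    Text columns are left untouched, so no strings are ever averaged.
    """
    numeric_cols = list(conv_eu.select_dtypes(include="number").columns)
    filled = conv_eu.copy()
    # Per-fuel means, broadcast back to every row of the respective group
    group_means = filled.groupby("fuel")[numeric_cols].transform("mean")
    filled[numeric_cols] = filled[numeric_cols].fillna(group_means)
    return filled
```

Unlike the loop above, this returns a new frame instead of modifying conv_eu in place, so the result has to be assigned back (conv_eu = fill_missing_by_fuel(conv_eu)).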

So if you could provide a dataset to get the dispatch model running, it would be very nice! :)

At least I am a third of the way through the notebook.

@jokochems
Member Author


Thanks for your experience report, @maurerle.
I know that the dependencies here are somewhat outdated; for time reasons, I decided at some point not to update them. I think I included my pinned dependencies, at least in a feature branch. I expected some problems with the geopandas dependencies, but those seem to have worked fine for you.

I can provide some uploads to Zenodo shortly.

@jokochems
Member Author

jokochems commented Apr 13, 2024

Hey @maurerle, in the course of handing in my PhD thesis, I decided to finally upload a processed data set with investment model input: https://doi.org/10.5281/zenodo.10968672

It was super quick. I could provide data sets for the dispatch-related part as well; I think I will do so shortly.

jokochems reopened this Apr 13, 2024