[LDI] How to acquire and process historical data for further analysis in R? #9

Open
amotl opened this issue Sep 27, 2019 · 3 comments

@amotl
Member

amotl commented Sep 27, 2019

One of our colleagues would like to download the data of 8 LDI sensors located next to official stations, going back to around the beginning of February, for further analysis in, e.g., R.

Making a dashboard using the specific LDI station identifiers is easy, but is there actually any download functionality?

@amotl
Member Author

amotl commented Sep 27, 2019

Introduction

To acquire observations from specific stations, you can use luftdatenpumpe to generate a JSON file, which can then be processed by other tools further down the analysis pipeline.

Ad hoc example

Here is a basic example. To give you an idea of what this could do for you, the output of the command below is available at LDI_BE_7013_10725_13585_2019-09-27T210801Z.json.

luftdatenpumpe readings --network=ldi --reverse-geocode --station=7013,10725,13585 > 'LDI_BE_7013_10725_13585_2019-09-27T210801Z.json'
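R works most comfortably with flat tables, so one option is to turn that JSON into a CSV file before loading it. The following is a minimal sketch, not part of luftdatenpumpe: it assumes the output is a JSON array of objects, possibly nested (the exact field layout is not pinned down here). Nested keys become dotted column names, and the helper names `flatten` and `json_to_csv` are hypothetical.

```python
import csv
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with dots."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Keep lists as JSON text so each value fits into one CSV cell.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def json_to_csv(json_path: str, csv_path: str) -> int:
    """Convert a JSON array of (possibly nested) records into a flat CSV file.

    ASSUMPTION: the top level of the JSON document is a list of objects.
    Returns the number of rows written.
    """
    with open(json_path, encoding="utf-8") as fh:
        records = json.load(fh)
    rows: List[Dict[str, Any]] = [flatten(record) for record in records]
    # Union of all column names in first-seen order; missing cells stay empty.
    columns = list(dict.fromkeys(name for row in rows for name in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `json_to_csv("LDI_BE_7013_10725_13585_2019-09-27T210801Z.json", "ldi.csv")` would produce a file that can be loaded in R with `read.csv("ldi.csv")`.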

Historical data example

The output of luftdatenpumpe --help contains a section named "LDI CSV archive data examples (InfluxDB)".

In short, you will have to download the historical data first by invoking

wget --mirror --continue --no-host-directories --directory-prefix=/var/spool/archive.luftdaten.info --accept-regex='2019-0[2-9]' http://archive.luftdaten.info/

and then process this data by invoking

luftdatenpumpe readings --network=ldi --station=7013,10725,13585 --source=file:///var/spool/archive.luftdaten.info
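If you would rather hand the raw archive rows to R without going through JSON, the mirrored CSV files can also be filtered directly. This is a minimal sketch, not a luftdatenpumpe feature: it assumes each archive file is semicolon-delimited with a header row containing a `location` column, and that the IDs given to `--station` above correspond to those location values. The function `merge_archive_csv` and its `pattern` parameter are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List


def merge_archive_csv(
    archive_root: str,
    locations: Iterable[int],
    out_path: str,
    pattern: str = "*.csv",
) -> int:
    """Collect rows for the given location IDs from a local LDI archive mirror.

    ASSUMPTION: every archive file is semicolon-delimited and has a header
    row with a 'location' column. Returns the number of rows written.
    """
    wanted = {str(location) for location in locations}
    rows: List[Dict[str, str]] = []
    for path in sorted(Path(archive_root).rglob(pattern)):
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh, delimiter=";"):
                if row.get("location") in wanted:
                    # Drop surplus fields that have no header name.
                    row.pop(None, None)
                    rows.append(row)
    # Different sensor types carry different measurement columns, so use
    # the union of all column names and leave missing cells empty.
    columns = list(dict.fromkeys(name for row in rows for name in row))
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Scanning the whole mirror can take a while, so passing a narrower `pattern` keeps it manageable. The resulting comma-separated file loads in R with `read.csv()`.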

@amotl amotl changed the title How to download historical data from LDI? How to acquire and process historical data from LDI? Sep 27, 2019
@amotl amotl changed the title How to acquire and process historical data from LDI? How to acquire and process historical data from LDI for further analysis in R Sep 27, 2019
@amotl amotl changed the title How to acquire and process historical data from LDI for further analysis in R How to acquire and process historical data from LDI for further analysis in R? Sep 27, 2019
@amotl
Member Author

amotl commented Sep 27, 2019

Reading Parquet files from R

Please also note that by-sensor Parquet files are available at http://archive.luftdaten.info/parquet/. While luftdatenpumpe does not yet have an option for ingesting them, we would definitely like to add one as an improvement.

Nevertheless, reading these files directly in R, without using luftdatenpumpe at all, might already be the better option for wrangling the data. You can either use the R package for Apache Arrow (arrow) to read the files, or one of the R packages for the Apache Spark analytics engine (e.g. sparklyr or SparkR) to read them through Spark's machinery.

The discussion at [1] outlines different ways of accessing Parquet files from R.

[1] https://stackoverflow.com/questions/30402253/how-do-i-read-a-parquet-in-r-and-convert-it-to-an-r-dataframe

@amotl
Member Author

amotl commented Nov 22, 2019

We just found this module, which could bridge the gap between Python and R.

rpy2 is an interface to R running embedded in a Python process.

-- https://rpy2.bitbucket.io/

See also https://code.likeagirl.io/walking-the-python-r-bridge-66b63bab0fbd.

@amotl amotl changed the title How to acquire and process historical data from LDI for further analysis in R? [LDI] How to acquire and process historical data for further analysis in R? Dec 9, 2019