Merge pull request #15 from inbo/batch-data-per-year
Batch data per year
wlangera authored Apr 24, 2024
2 parents 45bfa11 + db3ed07 commit 91121d0
Showing 4 changed files with 197 additions and 34 deletions.
16 changes: 12 additions & 4 deletions source/targets/data_preparation/R/path_to_files.R
@@ -1,7 +1,15 @@
-# Path to raw data from SOVON
-path_to_counts_sovon <- function(proj_path, file) {
-  file_path <- file.path(proj_path, "data", "mas", file)
-  return(file_path)
+# Paths to raw data from SOVON
+paths_to_counts_sovon <- function(
+    proj_path,
+    pattern = "qgis_export_sovon_wfs") {
+  # List paths to all files
+  file_paths <- list.files(
+    file.path(proj_path, "data"),
+    pattern = pattern,
+    full.names = TRUE,
+    recursive = TRUE)
+
+  return(file_paths)
 }

 # Path to counting locations
99 changes: 99 additions & 0 deletions source/targets/data_preparation/README.Rmd
@@ -0,0 +1,99 @@
---
title: "Procedure for downloading data from the SOVON WFS service and processing it with a targets pipeline"
author: "Hans Van Calster & Ward Langeraert"
date: "`r Sys.Date()`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Download data via the SOVON WFS service
## QGIS v3.24

1. Create an account via <https://www.vogelatlas.be/user/newuser>

1. Request access rights from Sovon

1. Open QGIS

    1. Set the project CRS to EPSG:28992
    1. Choose `Kaartlagen` --\> `Databronnen beheren` --\> `WFS / OGC API objecten`

1. Click `nieuw` in the dialog window to create a new WFS connection

    1. Use `sovon` as the name and `https://portal.sovon.nl/views/wfs/453/` as the URL
    1. Click OK

1. Connect to the WFS service you just added

    1. Click Verbinden (connect)
    1. Select the layer `ms:viewpoints`
        - Build a query ('Query maken') for the data of the new year, e.g. `SELECT * FROM viewpoints WHERE jaar = 2024`
    1. Tick the checkbox 'Alleen objecten bevragen die het huidige zichtbare bereik overlappen' (only request features that overlap the current view extent)
    1. Click Toevoegen (add)
    1. Enter your username and password
    1. If QGIS asks to transform from EPSG:28992 to another CRS, it is best to cancel

If all goes well, all data you have access to are now downloaded (this can take a while).
These are the so-called bezoekstippen (visit points).

## QGIS v3.26-3.28

1. Create an account via <https://www.vogelatlas.be/user/newuser>

1. Request access rights from Sovon

1. Open QGIS

    1. Set the project CRS to EPSG:28992
    1. Choose `Kaartlagen` --\> `Databronnen beheren` --\> `WFS / OGC API objecten`

1. Click `nieuw` in the dialog window to create a new WFS connection

    1. Use `sovon` as the name and `https://portal.sovon.nl/views/wfs/453/` as the URL
    1. Click OK

1. Connect to the WFS service you just added

    1. Click Verbinden (connect)
    1. Enter your username and password and click OK
    1. Select the layer `ms:viewpoints`
        - Build a query ('Query maken') for the data of the new year, e.g. `SELECT * FROM viewpoints WHERE jaar = 2024`
    1. Tick the checkbox 'Alleen objecten bevragen die het huidige zichtbare bereik overlappen' (only request features that overlap the current view extent)
    1. Change the coordinate reference system to EPSG:28992
    1. Click Toevoegen (add)
    1. If QGIS asks to transform from EPSG:28992 to another CRS, it is best to cancel

If all goes well, all data you have access to are now downloaded (this can take a while).
These are the so-called bezoekstippen (visit points).

# Exporting and storing the data

Once all data have been downloaded, you can export this layer:

1. Make sure all data are visible in the view

1. Select `Kaartlagen` --\> `Opslaan als ...` and save as `.geojson` with file name `<YYYYMMDD_qgis_export_sovon_wfs_JAAR>` and CRS `EPSG:28992`.
    `YYYYMMDD` is the export date, `JAAR` is the year in which the data were collected (see the SQL query).
    Click OK (this can take a while).
    Save the geojson file in the folder `mbag-mas/data/mas`.

1. Save the final export that you want to use for data preparation and later analyses in a folder named `JAAR` under `mbag-mas/source/targets/data_preparation/data`.
    Each folder may contain only one file with the data of that year (see below).
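
The one-file-per-folder rule can be checked before running the pipeline. Below is a minimal sketch in base R, assuming the folder layout described in this README. `count_exports_per_folder()` is a hypothetical helper that is not part of the repository, and the demo files are empty placeholders created in a temporary directory.

```r
# Hypothetical helper (not in the repository): count export files per folder
count_exports_per_folder <- function(data_dir,
                                     pattern = "qgis_export_sovon_wfs") {
  files <- list.files(
    data_dir,
    pattern = pattern,
    full.names = TRUE,
    recursive = TRUE
  )
  # The name of the parent folder identifies the year
  table(basename(dirname(files)))
}

# Demo with empty placeholder files in a temporary directory
demo_dir <- tempfile("data_demo_")
dir.create(file.path(demo_dir, "2023"), recursive = TRUE)
dir.create(file.path(demo_dir, "2024"), recursive = TRUE)
invisible(file.create(
  file.path(demo_dir, "2023", "20240424_qgis_export_sovon_wfs_2023.geojson"),
  file.path(demo_dir, "2024", "20240424_qgis_export_sovon_wfs_2024.geojson")
))

counts <- count_exports_per_folder(demo_dir)

# Stop early when a folder holds more than one export
stopifnot(all(counts == 1))
```

A count above 1 for a folder would mean two exports are picked up for the same period, so the same observations could enter the pipeline twice.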

# Processing the data

We process the data with a pipeline built with the [targets package](https://books.ropensci.org/targets/).
This covers data selection, preparation and the calculation of variables.

We use ["dynamic branching"](https://books.ropensci.org/targets/dynamic.html) in the targets pipeline.
This is a way to define new targets while the pipeline is running.
Here, a new target is created for each file.
Consequently, when a dataset for a new year is added, the pipeline only has to run the calculations for the data of the new year and not redo those for previous years.
The full pipeline looks as follows:

```{r}
targets::tar_glimpse()
```
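
To illustrate dynamic branching on its own, here is a toy pipeline, not the project's pipeline. It is a sketch that assumes the `targets` package is installed. The target names `year`, `counts` and `counts_all` and the placeholder data are hypothetical. The pipeline is written to a temporary directory so that it does not touch a real project.

```r
library(targets)

# Work in a throwaway directory so no real project is modified
demo_dir <- tempfile("targets_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a toy _targets.R file (all target names are hypothetical)
tar_script({
  library(targets)
  list(
    # One element per "dataset"
    tar_target(year, c(2023, 2024)),
    # One branch per element of `year`, each returning a small data frame
    tar_target(
      counts,
      data.frame(year = year, n_files = 1),
      pattern = map(year),
      iteration = "list"
    ),
    # Combine all branches into a single data frame
    tar_target(counts_all, do.call(rbind, counts))
  )
}, ask = FALSE)

tar_make()
counts_all <- tar_read(counts_all)

setwd(old_wd)
```

If a value is added to `year` and `tar_make()` is run again, only the new branch of `counts` and the combining target should need to be rebuilt.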
55 changes: 40 additions & 15 deletions source/targets/data_preparation/_targets.R
@@ -38,24 +38,28 @@ source(file.path(mbag_dir, "source", "R", "predatoren_f.R"))

 # Target list
 list(
-  tarchetypes::tar_file(
-    name = mas_counts_sovon_file,
-    command = path_to_counts_sovon(
-      proj_path = mbag_dir,
-      file = "20230810_qgis_export_sovon_wfs_2023.geojson"
+  tarchetypes::tar_files(
+    name = mas_counts_sovon_files,
+    command = paths_to_counts_sovon(
+      proj_path = target_dir
     )
   ),
   tar_target(
     name = mas_counts_sovon,
     command = sf::st_read(
-      mas_counts_sovon_file
-    )
+      dsn = mas_counts_sovon_files,
+      quiet = TRUE
+    ),
+    pattern = map(mas_counts_sovon_files),
+    iteration = "list"
   ),
   tar_target(
     name = crs_pipeline,
     command = amersfoort_to_lambert72(
       mas_counts_sovon
-    )
+    ),
+    pattern = map(mas_counts_sovon),
+    iteration = "list"
   ),
   tarchetypes::tar_file(
     name = sample_file,
@@ -77,44 +81,65 @@ list(
       x = crs_pipeline,
       y = sample,
       by = dplyr::join_by(plotnaam == pointid)
-    )
+    ),
+    pattern = map(crs_pipeline),
+    iteration = "list"
   ),
   tar_target(
     name = select_time_periods,
     command = select_within_time_periods(
       counts_df = select_sampled_points
-    )
+    ),
+    pattern = map(select_sampled_points),
+    iteration = "list"
   ),
   tar_target(
     name = select_within_radius,
     command = select_within_circle_radius(
       counts_df = select_time_periods,
       radius = 300
-    )
+    ),
+    pattern = map(select_time_periods),
+    iteration = "list"
   ),
   tar_target(
     name = select_species_groups,
     command = dplyr::filter(
       select_within_radius,
       soortgrp %in% 1:2
-    )
+    ),
+    pattern = map(select_within_radius),
+    iteration = "list"
   ),
   tar_target(
     name = remove_double_counts,
     command = process_double_counted_data(
       counts_df = select_species_groups
-    )
+    ),
+    pattern = map(select_species_groups),
+    iteration = "list"
   ),
   tar_target(
     name = remove_subspecies_names,
     command = adjust_subspecies_names_nl(
       counts_df = remove_double_counts
-    )
+    ),
+    pattern = map(remove_double_counts),
+    iteration = "list"
   ),
   tar_target(
-    name = mas_data_clean,
+    name = add_predator_variable,
     command = add_predator_variables(
       counts_df = remove_subspecies_names
+    ),
+    pattern = map(remove_subspecies_names),
+    iteration = "list"
+  ),
+  tar_target(
+    name = mas_data_clean,
+    command = do.call(
+      what = rbind.data.frame,
+      args = c(add_predator_variable, make.row.names = FALSE)
     )
   )
 )
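
The final `mas_data_clean` target binds the per-file branches back into one data frame with `do.call()` and `rbind.data.frame`. The sketch below shows that call in isolation in base R. The list `branches` and its contents are made-up stand-ins for the `add_predator_variable` branches, which in the pipeline arrive as a list because of `iteration = "list"`.

```r
# Toy stand-ins for the per-file branches (hypothetical data)
branches <- list(
  branch_a = data.frame(plotnaam = c("p1", "p2"), jaar = 2023),
  branch_b = data.frame(plotnaam = "p3", jaar = 2024)
)

# Same call shape as in the mas_data_clean target: the list elements become
# the data frames to bind, and make.row.names = FALSE is passed along as a
# named argument to rbind.data.frame
combined <- do.call(
  what = rbind.data.frame,
  args = c(branches, make.row.names = FALSE)
)

print(combined)
```

Appending `make.row.names = FALSE` with `c()` keeps the branch names out of the row names, so the combined table gets plain automatic row numbers.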
61 changes: 46 additions & 15 deletions source/targets/data_preparation/_targets/meta/meta
@@ -1,28 +1,59 @@
name|type|data|command|depend|seed|path|time|size|bytes|format|repository|iteration|parent|children|seconds|warnings|error
.Random.seed|object|93b4d65f2506ee1c|||||||||||||||
add_predator_variable|stem|b1d6101a48661ce9|e1c93a06d9f6d4cb|d65332b68dd4f90d|-1919973047||t19836.3580925959s|448d33d82c53cd6c|1211653|qs|local|vector|||0.11||
.Random.seed|object|9f7a7034c0e37700|||||||||||||||
add_predator_variable|pattern|01c728dfa7f8b9cc|e1c93a06d9f6d4cb||-1919973047||||2253323|qs|local|list||add_predator_variable_cb66b9e8*add_predator_variable_85b65ad8|0.2||
add_predator_variable_85b65ad8|branch|b1d6101a48661ce9|e1c93a06d9f6d4cb|801966e5819e4cf7|-1006700069||t19837.3755483621s|448d33d82c53cd6c|1211653|qs|local|list|add_predator_variable||0.09||
add_predator_variable_cb66b9e8|branch|031187f0a3d3d06d|e1c93a06d9f6d4cb|ff7cec7a334367e8|-758145585||t19837.375545188s|6338640b9116fd0c|1041670|qs|local|list|add_predator_variable||0.11||
add_predator_variable_fab8aebd|branch|b1d6101a48661ce9|e1c93a06d9f6d4cb|6d94daf6feaabdba|-1267021414||t19836.6334798554s|448d33d82c53cd6c|1211653|qs|local|list|add_predator_variable||0.16||
add_predator_variables|function|7467ab35b1f1bd3d|||||||||||||||
adjust_subspecies_names_nl|function|50a4c7e8d7a82397|||||||||||||||
amersfoort_to_lambert72|function|7a05a501641027b3|||||||||||||||
crs_pipeline|stem|24316d2285354371|fc01a4ccb0c4ce92|980a00c69b7059d5|-1580479739||t19836.3456511859s|cd33a8123b9ea861|1405399|qs|local|vector|||0.65||
crs_pipeline|pattern|235a92d58c68e1f8|fc01a4ccb0c4ce92||-1580479739||||3378016|qs|local|list||crs_pipeline_64cad22b*crs_pipeline_1af84150|2.98||
crs_pipeline_1af84150|branch|25cc7b9c3738e95d|fc01a4ccb0c4ce92|e944c1068696cbd8|-1771903272||t19837.3754989254s|2fbb451c02f1c505|1445019|qs|local|list|crs_pipeline||0.5||
crs_pipeline_64cad22b|branch|9ab625d2f93529ae|fc01a4ccb0c4ce92|864fa89216d77cc1|-908645192||t19837.3754907866s|9899be058e361a24|1932997|qs|local|list|crs_pipeline||2.48||
crs_pipeline_916fc15e|branch|24316d2285354371|fc01a4ccb0c4ce92|a8efcc518d143ecd|1475784767||t19836.6334441313s|cd33a8123b9ea861|1405399|qs|local|list|crs_pipeline||0.91||
kraaiachtigen_f|function|32a6f93504fb3f7a|||||||||||||||
mas_counts_sovon|stem|3079d5194a1a13d4|404705b488b2fa7b|88ddfa3cdb0ecee3|222117260||t19831.5074770628s|e5aa51f74688237f|918916|qs|local|vector|||1.44||
mas_counts_sovon_file|stem|8b786fe9ff0cbd2e|6ce5a8aaba143dfe|9c71b69a8c465d4b|-1372158522|C:/R/git_repositories/mbag-mas/data/mas/20230810_qgis_export_sovon_wfs_2023.geojson|t19579.6291970393s|ce6386d2aec493f8|24420334|file|local|vector|||0.22||
mas_data_clean|stem|b1d6101a48661ce9|e1c93a06d9f6d4cb|7114343701e0abce|-1942276229||t19836.3791747591s|448d33d82c53cd6c|1211653|qs|local|vector|||0.11||
mas_counts_sovon|pattern|5fbe6dfeb0d38926|168c3b81e4150fa3||222117260||||2225605|qs|local|list||mas_counts_sovon_91991620*mas_counts_sovon_614b0811|3.13||
mas_counts_sovon_1da3ce1e|branch|3079d5194a1a13d4|168c3b81e4150fa3|6eec518936a1b4f1|881722848||t19836.6334312342s|e5aa51f74688237f|918916|qs|local|list|mas_counts_sovon||1.95||
mas_counts_sovon_614b0811|branch|10642f9094fdec9b|168c3b81e4150fa3|ca9e87ddd0d9c32c|-548042463||t19837.3754597336s|5b3a0e257f2b0824|950517|qs|local|list|mas_counts_sovon||1.31||
mas_counts_sovon_91991620|branch|effc8a3fabb0ef20|168c3b81e4150fa3|c8ebcef82b558ede|337220797||t19837.375443185s|1d54da80af1b45c9|1275088|qs|local|list|mas_counts_sovon||1.82||
mas_counts_sovon_files|pattern|0e998fcd11de47cb|c52338d7124b0a05||1920956259||||57764640|file|local|vector||mas_counts_sovon_files_c804bee4*mas_counts_sovon_files_b858e527|0||
mas_counts_sovon_files_4776d7dc|branch|8b786fe9ff0cbd2e|c52338d7124b0a05|ef46db3751d8e999|-1411769136|C:/R/git_repositories/mbag-mas/source/targets/data_preparation/data/2023/20230810_qgis_export_sovon_wfs_2023.geojson|t19579.6291970393s|ce6386d2aec493f8|24420334|file|local|vector|mas_counts_sovon_files||0||
mas_counts_sovon_files_b858e527|branch|ab8d895f67e61e2f|c52338d7124b0a05|ef46db3751d8e999|1632737556|C:/R/git_repositories/mbag-mas/source/targets/data_preparation/data/2023/20240424_qgis_export_sovon_wfs_2023.geojson|t19837.3619874362s|71c635049da56b6f|24964075|file|local|vector|mas_counts_sovon_files||0||
mas_counts_sovon_files_c804bee4|branch|0a5fba55a0968fbb|c52338d7124b0a05|ef46db3751d8e999|90413319|C:/R/git_repositories/mbag-mas/source/targets/data_preparation/data/2018_2022/20240424_qgis_export_sovon_wfs_2018_2022.geojson|t19837.3739489539s|0fabe1ebcd5e16ff|32800565|file|local|vector|mas_counts_sovon_files||0||
mas_counts_sovon_files_files|stem|2351b7f404e60792|9afdce93aca4c6a2|fad413a8af0d6d43|1663214924||t19837.3804801301s|ef5bb583e5bfa4cd|174|rds|local|vector||mas_counts_sovon_files_files_b080d5d2*mas_counts_sovon_files_files_ab3c34ba|0.22||
mas_data_clean|stem|44e15449852bb5ad|f3f492e13a4a7743|8a3d0f01a50a17eb|-1942276229||t19837.3804850748s|03f66b67abe35641|2222011|qs|local|vector|||0.09||
mbag_dir|object|73add2da6d24990f|||||||||||||||
path_to_counts_sovon|function|b8a0d214f65ee27f|||||||||||||||
path_to_samples|function|5ecc4024b743a014|||||||||||||||
paths_to_counts_sovon|function|3dfadc517ce46981|||||||||||||||
predatoren_f|function|febc8f3bf40ecb7d|||||||||||||||
process_double_counted_data|function|705fe098ec314f30|||||||||||||||
remove_double_counts|stem|33d4032f50d2b5ed|56d403ffe2f51374|bc3a8ebbfcaba53e|-48525810||t19836.3456652496s|d83934c08084e74c|1195980|qs|local|vector|||0.17||
remove_subspecies_names|stem|35f7dbbf56f35d05|6192e8e36bfb46d7|a54bc7e6764336c6|1657860091||t19836.3580890818s|ca0af1b203c1cf95|1195963|qs|local|vector|||0.09||
remove_double_counts|pattern|ffe95c1207105760|56d403ffe2f51374||-48525810||||2224185|qs|local|list||remove_double_counts_f63cd31f*remove_double_counts_c457cfbe|0.39||
remove_double_counts_2183673c|branch|33d4032f50d2b5ed|56d403ffe2f51374|120e9624994e724f|-1583060927||t19836.6334707344s|d83934c08084e74c|1195980|qs|local|list|remove_double_counts||0.25||
remove_double_counts_c457cfbe|branch|33d4032f50d2b5ed|56d403ffe2f51374|ab0b8bcc64a7515b|-1868331755||t19837.3755358011s|d83934c08084e74c|1195980|qs|local|list|remove_double_counts||0.14||
remove_double_counts_f63cd31f|branch|37b50a4fb2d8fcf2|56d403ffe2f51374|01d2ddb2c97905f7|-466127144||t19837.3755321213s|67d14343058855aa|1028205|qs|local|list|remove_double_counts||0.25||
remove_subspecies_names|pattern|c2ee75b4dd7cea9f|6192e8e36bfb46d7||1657860091||||2224142|qs|local|list||remove_subspecies_names_d6384eb7*remove_subspecies_names_b2cf8fa2|0.2||
remove_subspecies_names_8033c49b|branch|35f7dbbf56f35d05|6192e8e36bfb46d7|d6984430fdb56526|425039760||t19836.6334751919s|ca0af1b203c1cf95|1195963|qs|local|list|remove_subspecies_names||0.14||
remove_subspecies_names_b2cf8fa2|branch|35f7dbbf56f35d05|6192e8e36bfb46d7|3f914117ad872cd8|1016185236||t19837.3755419358s|ca0af1b203c1cf95|1195963|qs|local|list|remove_subspecies_names||0.11||
remove_subspecies_names_d6384eb7|branch|1d4a71f600a3d00d|6192e8e36bfb46d7|be2db777c07b1ce5|518042368||t19837.3755388335s|a03bcf07b5ae740d|1028179|qs|local|list|remove_subspecies_names||0.09||
roofvogels_f|function|ff0ef1f62c67a283|||||||||||||||
sample|stem|e4966dd76c186021|7c7940c3ee902e8e|351c5860311905d4|887991846||t19831.545437312s|e3ea31e6dc0d59a4|15016|qs|local|vector|||0.07||
sample_file|stem|12ec54a1802f0ccc|9373cad8a757c886|fce4235ad2580001|1141903364|C:/R/git_repositories/mbag-mas/data/steekproefkaders/steekproef_avimap_mbag_piloot.csv|t19803.5701886462s|60ef8d6db1574934|62070|file|local|vector|||0||
select_sampled_points|stem|f751de663336b945|b659c8f8ce392cde|ecc1e7b45b6a8d0c|-107447084||t19831.5248409952s|d8de74379de70ddb|1187211|qs|local|vector|||0.03||
select_species_groups|stem|f9ee84b7905152e5|3f1e242af19e665d|50097636e83c673b|-1185638444||t19831.5759555089s|9d27dbe475bb2b9d|1194195|qs|local|vector|||0.04||
select_time_periods|stem|e194aeeaa582d6a6|206ce93b3790e30c|4f8cbc5f7ebd8ca9|291324708||t19836.3456566927s|2783a0b288bff409|1201608|qs|local|vector|||0.28||
sample|stem|e4966dd76c186021|7c7940c3ee902e8e|351c5860311905d4|887991846||t19836.6334056857s|e3ea31e6dc0d59a4|15016|qs|local|vector|||0.14||
sample_file|stem|12ec54a1802f0ccc|9373cad8a757c886|fce4235ad2580001|1141903364|C:/R/git_repositories/mbag-mas/data/steekproefkaders/steekproef_avimap_mbag_piloot.csv|t19803.5701886462s|60ef8d6db1574934|62070|file|local|vector|||0.35||
select_sampled_points|pattern|46f373e6f1e29ecd|b659c8f8ce392cde||-107447084||||2222005|qs|local|list||select_sampled_points_798b2ce7*select_sampled_points_88f19b3d|0.08||
select_sampled_points_4f539302|branch|f751de663336b945|b659c8f8ce392cde|a65d89958ab07d0a|-1764201090||t19836.6334476012s|d8de74379de70ddb|1187211|qs|local|list|select_sampled_points||0.05||
select_sampled_points_798b2ce7|branch|80b137acdb5e8b01|b659c8f8ce392cde|bb98d2b8fd3cc0de|-1314208889||t19837.3755014323s|78ad3df6a3489d61|1034794|qs|local|list|select_sampled_points||0.05||
select_sampled_points_88f19b3d|branch|f751de663336b945|b659c8f8ce392cde|fbf7d5e89f0b86db|-648933279||t19837.3755038851s|d8de74379de70ddb|1187211|qs|local|list|select_sampled_points||0.03||
select_species_groups|pattern|daa88c8ce05e71c3|3f1e242af19e665d||-1185638444||||2221551|qs|local|list||select_species_groups_49cc114d*select_species_groups_f7e0a6a0|0.05||
select_species_groups_49cc114d|branch|482c066fc0b0f032|3f1e242af19e665d|40712c34058e5d42|-433748385||t19837.3755250427s|d12ae4d089c90245|1027356|qs|local|list|select_species_groups||0.03||
select_species_groups_9005462e|branch|f9ee84b7905152e5|3f1e242af19e665d|653a67f1416430f0|1024928494||t19836.6334651819s|9d27dbe475bb2b9d|1194195|qs|local|list|select_species_groups||0.03||
select_species_groups_f7e0a6a0|branch|f9ee84b7905152e5|3f1e242af19e665d|18248565ac075282|824669762||t19837.3755272638s|9d27dbe475bb2b9d|1194195|qs|local|list|select_species_groups||0.02||
select_time_periods|pattern|7826a632a7f2449b|206ce93b3790e30c||291324708||||2247642|qs|local|list||select_time_periods_106885d3*select_time_periods_29eeb0e5|0.59||
select_time_periods_00ee06b7|branch|e194aeeaa582d6a6|206ce93b3790e30c|2e587d1043c214a5|1334375700||t19836.6334562646s|2783a0b288bff409|1201608|qs|local|list|select_time_periods||0.52||
select_time_periods_106885d3|branch|06c7e3bf807396d0|206ce93b3790e30c|20ba66030adc44e5|195757004||t19837.3755096047s|4920a3c8cbb29c0c|1046034|qs|local|list|select_time_periods||0.33||
select_time_periods_29eeb0e5|branch|e194aeeaa582d6a6|206ce93b3790e30c|83e2769cddc8e49b|-1029364792||t19837.3755147218s|2783a0b288bff409|1201608|qs|local|list|select_time_periods||0.26||
select_within_circle_radius|function|880681aea06f4662|||||||||||||||
select_within_radius|stem|f9ee84b7905152e5|b43ee3d813fbfa23|f5c01b2557c9822c|-523127267||t19836.3456611804s|9d27dbe475bb2b9d|1194195|qs|local|vector|||0.19||
select_within_radius|pattern|87eb19e22f8fb771|b43ee3d813fbfa23||-523127267||||2221551|qs|local|list||select_within_radius_dcff1891*select_within_radius_2565ae04|0.38||
select_within_radius_2565ae04|branch|f9ee84b7905152e5|b43ee3d813fbfa23|010a05c2966bd8d0|-237587632||t19837.3755230418s|9d27dbe475bb2b9d|1194195|qs|local|list|select_within_radius||0.19||
select_within_radius_730aa92e|branch|f9ee84b7905152e5|b43ee3d813fbfa23|d58b5b6016a3cc70|1596416445||t19836.6334621581s|9d27dbe475bb2b9d|1194195|qs|local|list|select_within_radius||0.29||
select_within_radius_dcff1891|branch|482c066fc0b0f032|b43ee3d813fbfa23|0d8a75fc941e0264|1251980147||t19837.3755189017s|d12ae4d089c90245|1027356|qs|local|list|select_within_radius||0.19||
select_within_time_periods|function|2798499bd63b690b|||||||||||||||
target_dir|object|5289a857edb685f1|||||||||||||||
