Features
- support remote repositories in stream processing, making it possible to review species interaction claims in large data corpora (e.g., GBIF, iDigBio) on regular hardware (e.g., a laptop) #52 globalbioticinteractions/globalbioticinteractions#1030 fyi @seltmann @zedomel
Example 1. Extract all interaction claims found in the GIB (GBIF, iDigBio, BioCASe; see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea) corpus as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
```shell
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio
preston cat --remote "$CONTENT_REPO" "$GIB_VERSION" --no-cache \
| elton stream --data-dir data --remote "$CONTENT_REPO" --no-cache
```
Example 2. Review the interaction claims found in the GIB (GBIF, iDigBio, BioCASe; see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea) corpus as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
```shell
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio
preston cat --remote "$CONTENT_REPO" "$GIB_VERSION" --no-cache \
| elton stream --data-dir data --remote "$CONTENT_REPO" --record-type review --no-cache
```
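Because Example 1 and Example 2 differ only in the `--record-type review` option passed to `elton stream`, both pipelines can be wrapped in a single shell function. This is an illustrative sketch: `stream_gib` is a hypothetical name, not part of preston or elton, and when no record type is given it simply omits `--record-type`, leaving elton's default in effect.

```shell
#!/bin/sh
# Hypothetical helper (not part of preston or elton) wrapping Examples 1 and 2.
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio

# Usage: stream_gib [RECORD_TYPE]
#   stream_gib          same pipeline as Example 1 (no --record-type)
#   stream_gib review   same pipeline as Example 2
stream_gib() {
  # Turn the optional record type into extra elton arguments.
  if [ -n "${1:-}" ]; then
    set -- --record-type "$1"
  else
    set --
  fi
  # Stream content from the remote repository without keeping a local copy.
  preston cat --remote "$CONTENT_REPO" "$GIB_VERSION" --no-cache \
  | elton stream --data-dir data --remote "$CONTENT_REPO" "$@" --no-cache
}
```

For example, `stream_gib > claims.txt` runs the Example 1 pipeline and `stream_gib review > reviews.txt` runs the Example 2 pipeline; the output file names are arbitrary.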
Note that both Example 1 and Example 2 stream content from https://linker.bio. If you'd like to keep the content (many GiB), remove the `--no-cache` options from both the `preston cat` and `elton stream` commands: after an initial "sync/pull" from https://linker.bio, you'll have a copy of a large corpus of biodiversity data available for reproducible offline processing.
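The keep-a-local-copy workflow described above can be sketched as two shell functions. This is a minimal sketch under assumptions worth checking against the preston and elton documentation: that omitting `--no-cache` leaves fetched content in the local `data` directory (assumed to also be preston's default local store), and that later runs without `--remote` read only from that local copy. The names `sync_gib` and `stream_gib_offline` are hypothetical.

```shell
#!/bin/sh
# Hypothetical two-step workflow; function names are illustrative only.
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio

# Step 1 (online, once): the Example 1/2 pipeline without --no-cache, so
# content is kept locally (assumption: preston and elton both store it in ./data).
# Extra arguments, e.g. --record-type review, are passed on to elton stream.
sync_gib() {
  preston cat --remote "$CONTENT_REPO" "$GIB_VERSION" \
  | elton stream --data-dir data --remote "$CONTENT_REPO" "$@"
}

# Step 2 (offline, repeatable): no --remote, so content is expected to come
# from the local copy only (assumption).
stream_gib_offline() {
  preston cat "$GIB_VERSION" \
  | elton stream --data-dir data "$@"
}
```

For example, run `sync_gib > /dev/null` once while online to populate `data`, then `stream_gib_offline --record-type review` as often as needed afterwards.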
Improvements
globalbioticinteractions/globalbioticinteractions#1030