Releases: globalbioticinteractions/elton
0.14.2
Features
- support remote repositories for stream processing to facilitate review of species interaction claims in large data corpora (e.g., GBIF, iDigBio) on regular hardware (e.g., a laptop) #52 globalbioticinteractions/globalbioticinteractions#1030 fyi @seltmann @zedomel
Example 1. Extract all interaction claims found in the GIB (GBIF, iDigBio, BioCASe, see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea ) corpus as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio
preston cat --remote $CONTENT_REPO $GIB_VERSION --no-cache\
| elton stream --data-dir data --remote $CONTENT_REPO --no-cache
Example 2. Review interaction claims found in the GIB (GBIF, iDigBio, BioCASe, see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea ) corpus as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio
preston cat --remote $CONTENT_REPO $GIB_VERSION --no-cache\
| elton stream --data-dir data --remote $CONTENT_REPO --record-type review --no-cache
Note that both Example 1 and Example 2 stream content provided by https://linker.bio . If you'd like to keep the content (>>GiB), remove the --no-cache option and you'll have a copy of a large corpus of biodiversity data available for reproducible offline processing after an initial "sync/pull" from https://linker.bio .
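The cache toggle described in the note can be sketched as a small wrapper. This is only an illustration, not part of elton or preston: the function name gib_pipeline and the KEEP_CACHE variable are hypothetical, and only flags that appear in Examples 1 and 2 are used. The function prints the pipeline instead of running it, so the command can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not part of elton or preston): prints the pipeline from
# Example 1, with or without --no-cache, so it can be inspected before running.
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio

gib_pipeline() {
  # KEEP_CACHE=1 drops --no-cache so that streamed content is kept locally.
  if [ "${KEEP_CACHE:-0}" = "1" ]; then
    cache_flag=""
  else
    cache_flag=" --no-cache"
  fi
  printf '%s\n' "preston cat --remote $CONTENT_REPO $GIB_VERSION$cache_flag | elton stream --data-dir data --remote $CONTENT_REPO$cache_flag"
}

# Default: print the streaming-only variant (with --no-cache).
gib_pipeline
```

Calling the function with KEEP_CACHE=1 set prints the variant without --no-cache; the printed line can then be pasted into a shell or piped to sh to run it.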
Improvements
globalbioticinteractions/globalbioticinteractions#1030
Bugs
0.14.1
Features
- support stream processing of rdf/quads towards #52 and globalbioticinteractions/globalbioticinteractions#1030
- introduce elton tee to copy resources described in rdf/n-quads stream into preston compatible, content addressed data dir.
- introduce --prov-mode to let elton commands (e.g., elton prov, elton interactions, elton names, elton review, elton stream, elton nanopubs, and elton ls --online) detail the processing methods and their inputs/outputs in a machine readable rdf/nquads stream.
Example 1 - Append a tracked interaction dataset and its dependencies to a preston archive.
elton track --prov-mode globalbioticinteractions/template-dataset\
| elton tee\
| preston append\
| tail -1
yielding
<urn:uuid:76ae2794-b9a2-4a27-b235-927377d77370> <http://www.w3.org/ns/prov#endedAtTime> "2025-01-07T23:53:04.402Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:76ae2794-b9a2-4a27-b235-927377d77370> .
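The statement above carries an end time (prov endedAtTime) for an activity. A minimal sketch of pulling that timestamp out of the line, assuming the line was captured in a shell variable; the variable names last_quad and ended_at are hypothetical and the value is copied from the output above.

```shell
#!/bin/sh
# Value copied from the output above; last_quad is a hypothetical variable name.
last_quad='<urn:uuid:76ae2794-b9a2-4a27-b235-927377d77370> <http://www.w3.org/ns/prov#endedAtTime> "2025-01-07T23:53:04.402Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:76ae2794-b9a2-4a27-b235-927377d77370> .'

# The literal is the only double-quoted part of this statement, so splitting on
# double quotes leaves the timestamp in the second field.
ended_at=$(printf '%s\n' "$last_quad" | awk -F'"' '{ print $2 }')
printf '%s\n' "$ended_at"
```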
Example 2 - generate an interaction table from the latest version of a preston archive generated via Example 1.
preston head\
| preston cat\
| elton stream --record-type interaction --data-dir data\
| mlr --itsvlite --oxtab cat\
| tail
yielding
localityName
referenceDoi 10.1007/s13127-011-0039-1
referenceUrl https://doi.org/10.1007/s13127-011-0039-1
referenceCitation Gittenberger, A., Gittenberger, E. (2011). Cryptic, adaptive radiation of endoparasitic snails: sibling species of Leptoconchus (Gastropoda: Coralliophilidae) in corals. Org Divers Evol, 11(1), 21–41. doi:10.1007/s13127-011-0039-1
namespace globalbioticinteractions/template-dataset
citation Jorrit H. Poelen. 2014. Species associations manually extracted from literature.
archiveURI hash://sha256/5b4ee64e7384bdf3d75b1d6617edd5d82124567b4ec52b47920ea332837ff060
lastSeenAt 2025-01-07T23:55:26.792Z
contentHash
eltonVersion 0.14.0-SNAPSHOT
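The key/value layout above (selected with mlr's --oxtab option) makes it easy to pick out individual fields. A minimal sketch, assuming the output was saved to a file; the file name interactions.xtab and the helper xtab_field are hypothetical, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# Sample lines copied from the output above; interactions.xtab is a hypothetical file name.
cat > interactions.xtab <<'EOF'
referenceDoi 10.1007/s13127-011-0039-1
referenceUrl https://doi.org/10.1007/s13127-011-0039-1
namespace globalbioticinteractions/template-dataset
archiveURI hash://sha256/5b4ee64e7384bdf3d75b1d6617edd5d82124567b4ec52b47920ea332837ff060
eltonVersion 0.14.0-SNAPSHOT
EOF

# Hypothetical helper: print the value(s) of one field from key/value lines.
# The key is the first whitespace-delimited token; the rest of the line is the value.
# Lines that have a key but no value are skipped.
xtab_field() {
  awk -v key="$1" '$1 == key && NF > 1 { sub(/^[^ \t]+[ \t]+/, ""); print }' "$2"
}

xtab_field referenceDoi interactions.xtab
xtab_field namespace interactions.xtab
```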
Example 3 - generate a review report of a tracked dataset, append its inputs (datasets)/outputs (review table) to a preston archive, then save the review report to a file review.tsv .
elton track --prov-mode globalbioticinteractions/template-dataset\
| elton tee\
| preston append\
| elton stream --record-type review --data-dir data\
> review.tsv
Improvements
- upgrade to globi-lib v0.27.0 to help improve elton <> preston integration
- upgrade to preston v0.10.2
- allow for separate configuration of work-dir, prov-dir and data-dir.
- reduce repeated cache updates; use hash uris to avoid leaking local paths in prov logs; related to globalbioticinteractions/globalbioticinteractions#1030 #52
Bugs
0.13.9
Features
n/a
Improvements
- upgrade to globi-lib v0.26.6 to pick up the most recent datasets (or dataset configurations) deposited with Zenodo. globalbioticinteractions/globalbioticinteractions#1017
Bugs
n/a
0.13.8
Features
n/a
Improvements
- upgrade to globi-lib v0.26.5 to work towards addressing globalbioticinteractions/globalbioticinteractions#999
- make cache dir configurable; globalbioticinteractions/globalbioticinteractions#999
Bugs
n/a
0.13.7
Features
n/a
Improvements
- upgrade to globi-lib v0.26.4 to help address globalbioticinteractions/globalbioticinteractions#999
- add ability to do streaming reviews, in addition to streaming interaction/name records.
example of creating streaming reports -
using single line globi.json file:
{ "namespace": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "citation": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "format": "dwca", "url": "https://linker.bio/hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf" }
to stream review records via
cat globi.json | elton stream --record-type review > review.tsv
Bugs
n/a
0.13.6
Features
- introduce elton stream to help stream all interactions from a versioned GBIF/iDigBio graph; see Big-Bee-Network/bif#1 and https://github.com/Big-Bee-Network/bif .
example usage:
using single line globi.json file:
{ "namespace": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "citation": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "format": "dwca", "url": "https://linker.bio/hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf" }
to do
cat globi.json | elton stream > interactions.tsv
Note that multi-line json would stream many datasets into the same interactions.tsv .
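The one-configuration-per-line layout noted above can be generated from a list of content hashes. A minimal sketch: the helper name globi_config_line and the file name datasets.json are hypothetical, and the template simply mirrors the single line globi.json shown above (it assumes every dataset is a Darwin Core archive served by https://linker.bio ).

```shell
#!/bin/sh
# Hypothetical helper: print a single-line dataset configuration for one content hash,
# following the globi.json example above (format dwca, content served by linker.bio).
globi_config_line() {
  printf '{ "namespace": "%s", "citation": "%s", "format": "dwca", "url": "https://linker.bio/%s" }\n' "$1" "$1" "$1"
}

# One configuration per line; add further hashes to the list to cover more datasets.
for hash in \
  hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf
do
  globi_config_line "$hash"
done > datasets.json

# Each line of datasets.json is one dataset configuration.
wc -l < datasets.json
```

The resulting lines can then be fed to elton stream on standard input, as in the example above.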
Improvements
- upgrade to globi-lib v0.26.2
Bugs
n/a
0.13.4
Features
n/a
Improvements
- upgrade to globi-lib v0.26.0 related to globalbioticinteractions/globalbioticinteractions#982 for support of primaryKey/foreignKey relations across tables of an interaction dataset.
Bugs
n/a
0.13.3
Features
n/a
Improvements
- upgrade to preston 0.7.8; related to #52
- add "track" as alias for update/pull/sync
- Bump xalan:xalan from 2.7.2 to 2.7.3
- update to globi libs v0.25.17
Bugs
n/a