BIDS 2.0: harmonize TSV columns to be singular (ATM "units" and "migrate_plural_columns") #1821

Draft: wants to merge 3 commits into base: bids-2.0
1 change: 1 addition & 0 deletions src/common-principles.md
@@ -481,6 +481,7 @@ with two exceptions:
 It is RECOMMENDED that the column names in the header of the TSV file are
 written in [`snake_case`](https://en.wikipedia.org/wiki/Snake_case) with the
 first letter in lower case (for example, `variable_name`, not `Variable_name`).
+It is RECOMMENDED that the column names are singular (for example, `variable_name`, not `variable_names`).
 Column names defined in the header MUST be separated with tabs as for the data contents.
 Furthermore, column names MUST NOT be blank (that is, an empty string) and MUST NOT
 be duplicated within a single TSV file.
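As an illustration of the header rules in this hunk, here is a minimal sketch of a header checker. `check_tsv_header` and `SNAKE_CASE` are hypothetical names, not part of `bidsschematools`, and the plural test is a deliberately naive heuristic (an assumption): a trailing `s` is only a hint, since singular names such as `status` also end in `s`.

```python
import re

# Hypothetical helper, not part of bidsschematools: checks a TSV header line
# against the column-name rules quoted in the hunk above.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_tsv_header(header_line: str) -> list[str]:
    """Return a list of human-readable problems found in a TSV header line."""
    problems = []
    columns = header_line.rstrip("\r\n").split("\t")
    seen = set()
    for col in columns:
        if col == "":
            # MUST NOT be blank
            problems.append("error: blank column name")
            continue
        if col in seen:
            # MUST NOT be duplicated within a single TSV file
            problems.append(f"error: duplicated column name {col!r}")
        seen.add(col)
        if not SNAKE_CASE.match(col):
            # RECOMMENDED: snake_case with a lower-case first letter
            problems.append(f"warning: {col!r} is not snake_case")
        # RECOMMENDED: singular. Naive heuristic (assumption), may misfire.
        if col.endswith("s") and not col.endswith("ss"):
            problems.append(f"warning: {col!r} may be plural")
    return problems
```

For example, `check_tsv_header("name\tunits\tname")` reports a possible plural for `units` and a duplicate for `name`, while `check_tsv_header("onset\tduration\ttrial_type")` returns an empty list.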
4 changes: 2 additions & 2 deletions src/schema/objects/columns.yaml
@@ -237,8 +237,8 @@ high_cutoff:
     - type: string
       enum:
         - n/a
-hplc_recovery_fractions:
-  name: hplc_recovery_fractions
+hplc_recovery_fraction:
+  name: hplc_recovery_fraction
   display_name: HPLC recovery fractions
   description: |
     HPLC recovery fractions (the fraction of activity that gets loaded onto the HPLC).
2 changes: 1 addition & 1 deletion src/schema/objects/metadata.yaml
@@ -2023,7 +2023,7 @@ MetaboliteRecoveryCorrectionApplied:
   description: |
     Metabolite recovery correction from the HPLC, for tracers where it changes
     with time postinjection.
-    If `true`, the `hplc_recovery_fractions` column MUST be present in the
+    If `true`, the `hplc_recovery_fraction` column MUST be present in the
     corresponding `*_blood.tsv` file.
   type: boolean
 MiscChannelCount:
4 changes: 2 additions & 2 deletions src/schema/rules/tabular_data/pet.yaml
@@ -15,7 +15,7 @@ Blood:
     metabolite_polar_fraction:
       level: optional
       level_addendum: recommended if `MetaboliteAvail` is `true`
-    hplc_recovery_fractions:
+    hplc_recovery_fraction:
       level: optional
       level_addendum: required if `MetaboliteRecoveryCorrectionApplied` is `true`
     whole_blood_radioactivity:
@@ -48,7 +48,7 @@ BloodMetaboliteCorrection:
     - extension == ".tsv"
     - sidecar.MetaboliteRecoveryCorrectionApplied == true
   columns:
-    hplc_recovery_fractions: required
+    hplc_recovery_fraction: required

 BloodWholeBlood:
   selectors:
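To make the `BloodMetaboliteCorrection` rule concrete, here is a minimal sketch of the check it implies: when the sidecar sets `MetaboliteRecoveryCorrectionApplied` to `true`, the renamed column must appear in the `*_blood.tsv` header. The function name `missing_required_blood_columns` is hypothetical and this is not how the BIDS validator evaluates schema rules; it only mirrors the single condition shown in this hunk.

```python
def missing_required_blood_columns(sidecar: dict, header: list[str]) -> list[str]:
    """Return required column names that are absent from a *_blood.tsv header.

    Hypothetical helper: covers only the one selector/column pair shown in
    the hunk above, using the new singular column name.
    """
    required = []
    # Mirrors the selector `sidecar.MetaboliteRecoveryCorrectionApplied == true`.
    if sidecar.get("MetaboliteRecoveryCorrectionApplied") is True:
        required.append("hplc_recovery_fraction")
    return [col for col in required if col not in header]
```

A dataset that still uses the old plural name would be reported as missing `hplc_recovery_fraction`, which is the case the migration is meant to handle.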
20 changes: 20 additions & 0 deletions tools/schemacode/bidsschematools/migrations.py
@@ -58,6 +58,25 @@ def migrate_participants(dataset_path: Path):
lgr.info(f" - migrated content in {new_file}")


+def migrate_tsv_columns(dataset_path: Path):
+    """
+    Rename some columns in .tsv (and corresponding sidecar .json)
+    """
+    # TODO: ideally here we would not provide file_glob
+    # but rather take schema and deduce which files could have
+    # the column... alternatively - consider all .tsv files and
+    # their .json files (note -- could be above and multiple given
+    # inheritance principle)
+    for col_from, col_to, file_glob in (
+        # https://github.com/bids-standard/bids-2-devel/issues/78
+        ("hplc_recovery_fractions", "hplc_recovery_fraction", "*_blood.*"),
+        # https://github.com/bids-standard/bids-2-devel/issues/15
+        ("units", "unit", "*_channels.*"),  # dependency on migrate_participants
+        # ??? Any other columns to rename for some reason?
+    ):
+        raise NotImplementedError()
+
+
 def migrate_dataset(dataset_path):
     lgr.info(f"Migrating dataset at {dataset_path}")
     dataset_path = Path(dataset_path)
@@ -74,6 +93,7 @@ def migrate_dataset(dataset_path):
     for migration in [
         migrate_participants,
         migrate_version,
+        migrate_tsv_columns,  # depends on migrate_participants
     ]:
         lgr.info(f" - applying migration {migration.__name__}")
         migration(dataset_path)
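Since `migrate_tsv_columns` is still a stub that raises `NotImplementedError`, here is one way the loop body could be filled in, offered as a sketch rather than the PR's intended implementation. The helper names (`rename_tsv_column`, `rename_sidecar_key`, `migrate_tsv_columns_sketch`) are hypothetical. It assumes the rename table uses the plural source name and a leading `*` in both globs, that only top-level sidecar keys need renaming, and that compressed tables can be skipped.

```python
import json
from pathlib import Path

# Rename table modelled on the diff above; the rest is a hypothetical sketch.
RENAMES = (
    ("hplc_recovery_fractions", "hplc_recovery_fraction", "*_blood.*"),
    ("units", "unit", "*_channels.*"),
)


def rename_tsv_column(tsv_path: Path, col_from: str, col_to: str) -> bool:
    """Rename a column in the header line only; return True if the file changed."""
    # Work on bytes so that data rows and line endings are left untouched.
    header, sep, body = tsv_path.read_bytes().partition(b"\n")
    eol = b"\r" if header.endswith(b"\r") else b""
    columns = header.rstrip(b"\r").decode("utf-8").split("\t")
    if col_from not in columns:
        return False
    if col_to in columns:
        raise ValueError(f"{tsv_path}: column {col_to!r} already exists")
    columns[columns.index(col_from)] = col_to
    tsv_path.write_bytes("\t".join(columns).encode("utf-8") + eol + sep + body)
    return True


def rename_sidecar_key(json_path: Path, col_from: str, col_to: str) -> bool:
    """Rename a top-level key in a JSON sidecar, preserving key order."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    if not isinstance(data, dict) or col_from not in data:
        return False
    if col_to in data:
        raise ValueError(f"{json_path}: key {col_to!r} already exists")
    renamed = {(col_to if key == col_from else key): value for key, value in data.items()}
    json_path.write_text(
        json.dumps(renamed, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return True


def migrate_tsv_columns_sketch(dataset_path: Path) -> list[Path]:
    """Apply RENAMES to matching .tsv/.json files; return the changed paths."""
    changed = []
    for col_from, col_to, file_glob in RENAMES:
        for path in sorted(Path(dataset_path).rglob(file_glob)):
            if not path.is_file():
                continue
            if path.suffix == ".tsv":
                did_change = rename_tsv_column(path, col_from, col_to)
            elif path.suffix == ".json":
                did_change = rename_sidecar_key(path, col_from, col_to)
            else:
                did_change = False  # other extensions are not handled here
            if did_change:
                changed.append(path)
    return changed
```

Because `rglob` walks the whole dataset, sidecars placed higher in the hierarchy are picked up as long as their names match the glob, which partly addresses the inheritance concern in the TODO. Deducing the affected files from the schema, as the TODO suggests, is left out of this sketch.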