Include Changes for specification V2.0.0 #105
fix typo in STUDY CONTACTS table section
Make arc.cwl optional
Add data path annotation section to ARC specification
Fix typos in isa-xlsx
Add comment to annotation table
Validation specs
Rework Data Nodes
Add versioning of validation packages to validation_packages.yml
Updated ARC RO-Crate Profile Description
let git do the cqc versioning, add optional version suffix in cqc folder structure
Make some sections in investigation and study metadata sheets optional
Datamap specification
add `arc_specification` key to `validation_packages.yml`
ARC specification.md
Outdated
Each ARC is a directory containing the following elements:

- *Studies* are collections of material and resources used within the investigation.
Metadata that describe the characteristics of material and resources follow the ISA study model. Study-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in a file `isa.study.xlsx`, which MUST exist to specify the input material or data resources. Resources MAY include biological materials (e.g. plant samples, analytical standards) created during the current investigation. Resources MAY further include external data (e.g., knowledge files, results files) that need to be included and cannot be referenced due to external limitations. Resources described in a study file can be the input for one or multiple assays. Further details on `isa.study.xlsx` are specified [below](#study-and-resources). Resource (descriptor) files MUST be placed in a `resources` subdirectory.
Metadata that describe the characteristics of material and resources follow the ISA study model. Study-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in a `isa.study.xlsx` file, which MUST exist to specify the input material or data resources. Resources MAY include biological materials (e.g. plant samples, analytical standards) created during the current investigation. Resources MAY further include external data (e.g., knowledge files, results files) that need to be included and cannot be referenced due to external limitations. Resources described in a study file can be the input for one or multiple assays. Further details on `isa.study.xlsx` are specified [below](#study-and-resources). Resource (descriptor) files MUST be placed in a `resources` subdirectory. Further explications about data entities defined in the study MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in a `isa.datamap.xlsx` file, which SHOULD exist for studies containing data. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).
Metadata that describe the characteristics of material and resources follow the ISA study model.
For me this sentence reads strangely
Yeah I think it was replaced but not cut out. Will throw it out now
ARC specification.md
Outdated
- *Assays* correspond to outcomes of experimental assays or analytical measurements (in the interpretation of the ISA model) and are treated as immutable data. Each assay is a collection of files, together with a corresponding metadata file, stored in a subdirectory of the top-level subdirectory `assays`. Assay-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in a file `isa.assay.xlsx`, which MUST exist for each assay. Further details on `isa.assay.xlsx` are specified [below](#assay-data-and-metadata). Assay data files MUST be placed in a `dataset` subdirectory.
- *Assays* correspond to outcomes of experimental assays or analytical measurements (in the interpretation of the ISA model) and are treated as immutable data. Each assay is a collection of files, together with a corresponding metadata file, stored in a subdirectory of the top-level subdirectory `assays`. Assay-level metadata is stored in [ISA-XLSX](#isa-xlsx-format) format in a `isa.assay.xlsx` file, which MUST exist for each assay. Further details on `isa.assay.xlsx` are specified [below](#assay-data-and-metadata). Assay data files MUST be placed in a `dataset` subdirectory. Further explications about data entities defined in the assay MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in a `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).
Further explications about data entities defined in the assay MAY be stored in ISA-XLSX format in a `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified in the isa-xlsx specification
Mixing MAY and SHOULD? Maybe unify?
I think the way it is currently written poses two questions (and imho in the wrong order).
- Should such metadata be stored?
- If so, where should it be stored?
Suggestion: change the order and use only one keyword (SHOULD):
Further explications about data entities defined in the assay SHOULD exist for each assay, in ISA-XLSX format in a `isa.datamap.xlsx` file. Further details ...
It was meant the following way:
- In general, each assay or study SHOULD contain a `datamap` file
- For every data entity you MAY decide to add additional information to this `datamap` file
But I agree it reads like there are options for how to store additional information.
ARC specification.md
Outdated
\--- resources
\--- protocol [optional / add. payload]
\--- assays
\--- <assay_name>
| isa.assay.xlsx
| isa.datamap.xlsx
[optional]?
@@ -142,12 +156,16 @@ The `study` file MUST follow the [ISA-XLSX study file specification](ISA-XLSX.md

Protocols that are necessary to describe the sample or material creating process can be placed under the protocols directory.

### Assay Data and Metadata
Further explications about data entities defined in the assay MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in a `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).
Same as above (switch between MAY and SHOULD)
All measurement data sets are considered as assays and are considered immutable input data. Assay data MUST be placed into a unique subdirectory of the top-level `assays` subdirectory. All ISA metadata specific to a single assay MUST be annotated in the file `isa.assay.xlsx` at the root of the assay's subdirectory. This workbook MUST contain a single assay that can be organized in one or multiple worksheets.

The `assay` file MUST follow the [ISA-XLSX assay file specification](ISA-XLSX.md#assay-file).

Further explications about data entities defined in the assay MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in a `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).
Same as above (switch between MAY and SHOULD)
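To make the MUST versus SHOULD distinction in the quoted paragraphs concrete, here is a minimal sketch of how a checker could treat the two files differently. The function name `check_assays` and the message wording are hypothetical; only the `assays` directory and the two file names come from the quoted specification text. It is an illustration, not part of the specification or of any existing validation package.

```python
from pathlib import Path


def check_assays(arc_root: Path) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the assay folders under <arc_root>/assays."""
    errors: list[str] = []
    warnings: list[str] = []
    assays_dir = arc_root / "assays"
    if not assays_dir.is_dir():
        return errors, warnings  # no assays directory, nothing to check
    for assay in sorted(p for p in assays_dir.iterdir() if p.is_dir()):
        # MUST: every assay carries its own isa.assay.xlsx, so absence is an error
        if not (assay / "isa.assay.xlsx").is_file():
            errors.append(f"{assay.name}: isa.assay.xlsx is missing")
        # SHOULD: the datamap file is recommended, so absence is only a warning
        if not (assay / "isa.datamap.xlsx").is_file():
            warnings.append(f"{assay.name}: isa.datamap.xlsx is missing")
    return errors, warnings
```

Under this reading, a missing `isa.datamap.xlsx` never makes an ARC invalid; it only produces a hint for the user.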
ARC specification.md
Outdated
`assays/Assay2/isa.assay.xlsx`:

| Input [Data] | Parameter[script file] | Output [Data] |
Component [script file]? 😅
- Data nodes in `isa.assay.xlsx` files: The path MAY be specified relative to the `dataset` sub-folder of the assay
- Data nodes in `isa.study.xlsx` files: The path MAY be specified relative to the `resources` sub-folder of the study

### Examples
Can we also get an example for the folder specific pattern?
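As a rough illustration of the folder specific pattern described in the two bullet points above, the sketch below expands a data node path relative to the metadata file that contains it. The function name `folder_specific_path` and the lookup table are hypothetical. How a tool decides whether a given path uses this pattern at all is not stated in the quoted lines, so the sketch only computes what the path would mean if it does.

```python
from pathlib import PurePosixPath

# Sub-folder that data node paths may be relative to, per metadata file name
# (taken from the two bullet points quoted in this thread).
_CONTEXT_FOLDER = {
    "isa.assay.xlsx": "dataset",
    "isa.study.xlsx": "resources",
}


def folder_specific_path(metadata_file: str, data_node: str) -> str:
    """Expand a folder-specific data node path to a path relative to the ARC root."""
    meta = PurePosixPath(metadata_file)
    if meta.name not in _CONTEXT_FOLDER:
        raise ValueError(f"no folder specific pattern for {meta.name!r}")
    return str(meta.parent / _CONTEXT_FOLDER[meta.name] / data_node)
```

For example, `folder_specific_path("assays/Assay2/isa.assay.xlsx", "table.csv")` yields `assays/Assay2/dataset/table.csv` (the file name `table.csv` is made up for this example).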
| Comment [Answer to everything] |
|--------------------------------|
| forty-two |

## Others

Columns whose headers do not follow any of the formats described above are considered additional payload and are out of the scope of this specification.
Does this mean we now officially support free text columns?
I'd read that as "the tool implementing the standard is free to decide what to do with free text columns"
Yes, just a heads-up that a hard fail on unknown columns is not necessary.
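One way a tool could follow this reading is to sort headers into recognised columns and additional payload instead of raising on the unknown ones. The sketch below is an assumption about tool behaviour, not something the specification prescribes. The function name `split_columns` is hypothetical, and the header kinds in the pattern are only those that appear in this thread, not the complete list from ISA-XLSX.md.

```python
import re

# Illustrative only: header kinds mentioned in this thread, followed by a
# bracketed name. The real specification defines more formats than these.
_KNOWN_HEADER = re.compile(
    r"^(Input|Output|Parameter|Component|Comment)\s*\[[^\]]+\]$"
)


def split_columns(headers: list[str]) -> tuple[list[str], list[str]]:
    """Split headers into (recognised, payload) without failing on unknown ones."""
    recognised: list[str] = []
    payload: list[str] = []
    for header in headers:
        if _KNOWN_HEADER.match(header.strip()):
            recognised.append(header)
        else:
            payload.append(header)  # kept as additional payload, not an error
    return recognised, payload
```

A tool built this way can still warn about payload columns, but it never rejects a table because of them.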
ISA-XLSX.md
Outdated
In the `Datamap Table sheets`, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).

The content of the datamap table MUST be placed in an `xlsx table` whose name starts with `datamapTable`. Each sheet MUST contain at most one such annotation table. Only cells inside this table are considered as part of the formatted metadata.
placed in an `xlsx table` whose name starts with
Start with or equal?
Hmm, wanted to keep it in line with the AnnotationTable. But I agree it's not necessary as there can only be one table. Will change to equals.
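The difference between the two wordings can be sketched as a small lookup over the table names found in one sheet. The function name `find_datamap_table` and the `exact` switch are hypothetical, and treating the comparison as case-sensitive is an assumption. Reading the names out of an actual workbook is left out on purpose.

```python
from typing import Optional

DATAMAP_TABLE = "datamapTable"


def find_datamap_table(table_names: list[str], exact: bool = True) -> Optional[str]:
    """Return the datamap table name in a sheet, or None if there is none.

    exact=True models "name equals datamapTable"; exact=False models the
    earlier "name starts with datamapTable" wording.
    """
    if exact:
        matches = [name for name in table_names if name == DATAMAP_TABLE]
    else:
        matches = [name for name in table_names if name.startswith(DATAMAP_TABLE)]
    if len(matches) > 1:
        # "Each sheet MUST contain at most one such annotation table."
        raise ValueError(f"more than one datamap table in sheet: {matches}")
    return matches[0] if matches else None
```

With the "equals" rule, a table named `datamapTable1` is simply not a datamap table, whereas the "starts with" rule would accept it.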
## Comments

A `Comment` can be used to provide some additional information. Columns headed with `Comment[<comment name>]` MAY appear anywhere in the Annotation Table. The comment always refers to the Annotation Table. The value MUST be free text.
So we must remember Comment location in ARCtrl?
Not necessarily, as the comment column does not refer to a specific other column but always to the table as a whole.
Add validation_summary.json specs, package metadata, and ARC apps
@Freymaurer @kappe-c @kMutagene I made some changes according to your comments. Please check again if your remarks are resolved now!
This PR merges all changes accumulated for the Version 2.0.0 of the specification into the main branch. Afterwards, Version 2.0.0 will be released.
Big Overhaul of data representation by @HLWeil
Mechanisms for ARC Quality Control section by @kMutagene
Various changes to xlsx table representation
Make study section in investigation file optional
Various clarifications and fixes
Input would be very welcome, @kappe-c @floWetzels @ZimmerD @kMutagene @Freymaurer @chgarth @muehlhaus @gdoniparthi