
datatype

Daniel Jacob edited this page Apr 9, 2020 · 5 revisions

Data Type

Whatever the kind of experiment, this approach assumes a design of experiments (DoE) in which individuals, samples or other entities are the main objects of study. It also assumes that dependent variables are observed as the result of the effects of controlled experimental factors. In addition, each object of study usually has its own identifier, and the variables can be quantitative or qualitative.

Data management: promote good practices

First and foremost, it is important to have well-organized data. The files generated during data collection have to be organized according to the entity-attribute relational model. Each entity corresponds to a type of collected data (samples, compounds, ...) with which a set of attributes is associated, i.e. observed or measured variables. Well-organized data means that each variable forms a column, each observation forms a row, and each type of observational unit forms a table, i.e. a file. A link is then established between each subset and the subset from which it was obtained, so that the links can be interpreted as "obtained from". Each column of each data subset must also be associated with a type of experimental data (called a category), in particular the columns holding the identifiers on which the links are based.

Figure 1: Each subset is linked to the subset from which it was obtained, so that the links can be interpreted as "obtained from". Each column of each data subset must be associated with a type of experimental data (called a category), in particular the columns holding the identifiers on which the links are based.
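The organization described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two subsets (`plants`, `samples`), their columns (`PlantID`, `SampleID`, `Treatment`, `FreshWeight`) and the helper functions are hypothetical names chosen for the example, not part of the tool. Each table has one column per variable and one row per observation, and the child subset carries the identifier of the subset it was obtained from.

```python
import csv
import io

# Hypothetical subset "plants": one row per plant, identified by PlantID.
PLANTS_TSV = "PlantID\tTreatment\nP1\tControl\nP2\tStress\n"

# Hypothetical subset "samples": each sample was obtained from a plant,
# so it carries the parent's identifier (PlantID) next to its own (SampleID).
SAMPLES_TSV = (
    "SampleID\tPlantID\tFreshWeight\n"
    "S1\tP1\t12.5\n"
    "S2\tP1\t11.8\n"
    "S3\tP2\t9.4\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_obtained_from(children, parents, key):
    """Attach to each child row the columns of the parent row it was obtained from."""
    parent_by_id = {row[key]: row for row in parents}
    return [{**parent_by_id[child[key]], **child} for child in children]


plants = read_tsv(PLANTS_TSV)
samples = read_tsv(SAMPLES_TSV)
merged = join_obtained_from(samples, plants, key="PlantID")

for row in merged:
    print(row["SampleID"], row["PlantID"], row["Treatment"], row["FreshWeight"])
```

Because the link is carried by an identifier column, the experimental factor (`Treatment`) recorded once per plant can be propagated to every sample obtained from that plant without duplicating it in the samples file.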

Another good practice is to promote non-proprietary formats such as CSV or TSV, which is a necessary step towards "open linked data". An ODAM dataset is therefore a bundle containing a set of TSV files. These TSV files are simple tables holding the data of the dataset.
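As a rough illustration of why a plain-text format such as TSV is convenient, the sketch below writes a small table as tab-separated text and reads it back using only the Python standard library. The column names and values are invented for the example.

```python
import csv
import io

# Hypothetical observations: one dict per row, one key per variable.
rows = [
    {"SampleID": "S1", "FreshWeight": "12.5"},
    {"SampleID": "S2", "FreshWeight": "11.8"},
]

# Write the table as tab-separated text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["SampleID", "FreshWeight"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()

# Read it back: any tool that understands tab-separated text can do the same.
round_trip = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
print(tsv_text)
```

Note that every value comes back as a string; the type of each column has to be recorded elsewhere, which is one role of the metadata files.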

Minimal but relevant structural metadata

Two specific TSV files, namely s_subsets.tsv and a_attributes.tsv, describe the metadata of the dataset, including informational metadata like descriptions of measures, as well as structural metadata like references between tables. The metadata lets non-expert users explore and visualize your data.

Figure 2: s_subsets.tsv, a file that associates each data subset with a key concept corresponding to the main entity of the subset, and records the "obtainedFrom" relations between these concepts.

Figure 3: a_attributes.tsv, a metadata file in which each attribute (concept/variable) is annotated with minimal but relevant metadata.
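To make the role of the two metadata files more concrete, here is a minimal sketch of how they might be read and used. The column names (`subset`, `identifier`, `obtainedFrom`, `file`, `attribute`, `category`, `type`, `description`), the sample rows and the helper functions are all assumptions made for illustration; the actual layout of s_subsets.tsv and a_attributes.tsv is the one shown in Figures 2 and 3.

```python
import csv
import io

# Assumed layout for s_subsets.tsv: one row per subset, with its identifier
# column and the subset it was obtained from (empty for the root subset).
S_SUBSETS = (
    "subset\tidentifier\tobtainedFrom\tfile\n"
    "plants\tPlantID\t\tplants.tsv\n"
    "samples\tSampleID\tplants\tsamples.tsv\n"
    "compounds\tAliquotID\tsamples\tcompounds.tsv\n"
)

# Assumed layout for a_attributes.tsv: one row per attribute, with its category.
A_ATTRIBUTES = (
    "subset\tattribute\tcategory\ttype\tdescription\n"
    "plants\tPlantID\tidentifier\tstring\tPlant identifier\n"
    "plants\tTreatment\tfactor\tstring\tWatering regime\n"
    "samples\tSampleID\tidentifier\tstring\tSample identifier\n"
    "samples\tFreshWeight\tquantitative\tnumeric\tFresh weight in grams\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def lineage(subsets, name):
    """Follow the obtainedFrom links from a subset up to the root subset."""
    parent_of = {row["subset"]: row["obtainedFrom"] for row in subsets}
    chain = [name]
    while parent_of[chain[-1]]:
        chain.append(parent_of[chain[-1]])
    return chain


def attributes_by_category(attributes, subset, category):
    """List the attributes of a subset that belong to a given category."""
    return [
        row["attribute"]
        for row in attributes
        if row["subset"] == subset and row["category"] == category
    ]


subsets = read_tsv(S_SUBSETS)
attributes = read_tsv(A_ATTRIBUTES)

print(lineage(subsets, "compounds"))  # ['compounds', 'samples', 'plants']
print(attributes_by_category(attributes, "samples", "quantitative"))
```

With only these two small tables, a generic tool can work out which files to join and on which identifiers, and which columns are factors or measured variables, without any knowledge specific to the experiment.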

