Purpose, motivation and implementation #1

Open
aidanheerdegen opened this issue Apr 18, 2023 · 2 comments

@aidanheerdegen

First up, props to @dougiesquire for starting the ball rolling.

What is the purpose of this repo?

Central location for all schema information for the ACCESS-NRI organisation.

Why?

Consistent approach across the ACCESS-NRI organisation

Improves productivity, as there is a single source of truth for schema. It also lowers the barrier for those new to the subject area who lack the background knowledge to create their own schema.

It also leads naturally to interoperability: if everyone uses the same schema, and those schema re-use and connect with existing schema, they get the same connectivity "for free".

Such interconnected schema enable building knowledge graphs. A knowledge graph, or semantic network, is a graph-based representation of the connections between objects contained in the schema. A knowledge graph can make it possible to traverse data in ways that were not previously apparent.

Knowledge graphs are a sort of ad hoc ontology.
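
To make the idea concrete, here is a minimal sketch of a knowledge graph stored as subject/predicate/object triples, with a breadth-first search that follows links between objects. Every identifier in it (`dataset:A`, `producedBy`, `org:X` and so on) is a hypothetical placeholder and not part of any ACCESS-NRI schema.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object); none of these names
# come from a real schema.
TRIPLES = [
    ("dataset:A", "producedBy", "model:M1"),
    ("dataset:B", "producedBy", "model:M1"),
    ("model:M1", "maintainedBy", "org:X"),
    ("dataset:B", "derivedFrom", "dataset:A"),
]


def build_graph(triples):
    """Index triples as an adjacency map so links can be followed both ways."""
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        graph[subject].add((predicate, obj))
        graph[obj].add(("inverse:" + predicate, subject))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search returning the nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _predicate, neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "dataset:A", "org:X")
print(path)  # ['dataset:A', 'model:M1', 'org:X']
```

The path from a dataset to the organisation behind it is never stated directly in the triples; it only appears once the links are joined up, which is the kind of traversal a shared schema makes possible.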

Discoverability

Adding schema to web pages in JSON-LD format promotes discovery. JSON-LD is a widely used standard for semantic search and cataloguing on the web.

This can lead to connections with other data providers, which adds value with little specific effort.
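
As a rough sketch of what "adding schema to web pages" could look like in practice, the snippet below serialises a record into a `<script type="application/ld+json">` element, which is the usual way of embedding JSON-LD in HTML. The record contents and the helper name `jsonld_script_tag` are assumptions made for illustration only.

```python
import json

# Hypothetical record; the name, description and URL are placeholders.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "Placeholder description for illustration only.",
    "url": "https://example.org/datasets/example",
}


def jsonld_script_tag(data):
    """Serialise a dict as a JSON-LD <script> element for a page's <head>."""
    body = json.dumps(data, indent=2)
    # Escape "</" so text inside the record cannot close the element early.
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


tag = jsonld_script_tag(record)
print(tag)
```

The resulting element can be dropped into a page template unchanged; crawlers that understand JSON-LD read it without it affecting what the page displays.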

@aidanheerdegen aidanheerdegen changed the title Purpose and motivation Purpose, motivation and implementation Apr 18, 2023
@aidanheerdegen

How?

Format

The standard for schema on the web is RDF. That is what is used by schema.org and Bioschemas. Bioschemas is probably the one we should be following most closely.

An example schema is the Bioschemas Dataset profile, and an example record is:

{
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "http://purl.org/dc/terms/conformsTo": { "@type": "CreativeWork", "@id": "https://bioschemas.org/profiles/Dataset/1.0-RELEASE" },
    "@id": "https://doi.org/10.5281/zenodo.5743204",
    "identifier": "10.5281/zenodo.5743204",
    "name": "RDF version of the data from Choi, JS. et al. Towards a generalized toxicity prediction model for oxide nanomaterials using integrated data from different sources (2018)",
    "description": "This is an RDFied version of the dataset published in Choi, JS., Ha, M.K., Trinh, T.X. et al. Towards a generalized toxicity prediction model for oxide nanomaterials using integrated data from different sources. Sci Rep 8, 6110 (2018). The original dataset publication DOI: https://doi.org/10.1038/s41598-018-24483-z. The Original publication authors: Jang-Sik Choi, My Kieu Ha, Tung Xuan Trinh, Tae Hyun Yoon & Hyung-Gi Byun",
    "license": "https://creativecommons.org/licenses/by/4.0/legalcode",
    "url": "https://zenodo.org/record/5743204",
    "keywords": "oxide, nanomaterial, toxicity, prediction",
    "creator": [
      {
        "@type": "Organization",
        "name": "NanoSolveIT"
      }
    ],
    "datePublished": "2021-11-30",
    "citation": { "@type": "CreativeWork", "@id": "https://doi.org/10.1038/s41598-018-24483-z", "name": "Towards a generalized toxicity prediction model for oxide nanomaterials using integrated data from different sources" }
}
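
To show how a record like this relates to RDF, here is a naive sketch that flattens a trimmed copy of it into subject/predicate/object tuples. It is not a conforming JSON-LD processor: `@context` is ignored, keys are not expanded to full IRIs, and the `to_triples` helper and the `_:bN` labels for nodes without an `@id` are assumptions made for illustration.

```python
import json

# Trimmed copy of the record above (some fields dropped, values unchanged).
RECORD = json.loads("""
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "@id": "https://doi.org/10.5281/zenodo.5743204",
  "identifier": "10.5281/zenodo.5743204",
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode",
  "creator": [{"@type": "Organization", "name": "NanoSolveIT"}],
  "datePublished": "2021-11-30"
}
""")


def to_triples(node, counter=None):
    """Naively flatten a JSON-LD-like dict into (subject, predicate, object).

    Returns the node's subject label and the list of triples found under it.
    """
    counter = counter if counter is not None else [0]
    subject = node.get("@id")
    if subject is None:
        # Nodes without an identifier get a local blank-node style label.
        subject = f"_:b{counter[0]}"
        counter[0] += 1
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                child_subject, child_triples = to_triples(item, counter)
                triples.append((subject, key, child_subject))
                triples.extend(child_triples)
            else:
                triples.append((subject, key, item))
    return subject, triples


subject, triples = to_triples(RECORD)
for triple in triples:
    print(triple)
```

A real pipeline would use a proper JSON-LD library to resolve the context; the sketch only illustrates that each key/value pair in the record is one edge in a graph.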

Other use cases

This is all very well, but how does this map to relational databases like the ones typically used for data indexing?

In a general sense, not very well. However, the reverse mapping, from a SQL DB to RDF, is more straightforward.

If we're working mostly in the relational DB/SQL space, and so want the RDF mapping for interoperability with the wider world, then that will affect how complex we let the schemas become. Alternatively, we could have a strict hierarchy of schema: tightly defined schema at the bottom that are interoperable with SQL, and higher-level schema with more freedom that allow for more connectivity.
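
As an illustration of why the SQL-to-RDF direction is the easier one, the sketch below turns each row of a table into a subject and each non-key column into a predicate, loosely in the spirit of the W3C Direct Mapping but much simplified. The `https://example.org/` namespace, the `dataset` table and its columns are all hypothetical.

```python
import sqlite3

BASE = "https://example.org/"  # hypothetical namespace


def rows_to_ntriples(conn, table, key_column):
    """Emit one N-Triples line per non-key, non-NULL cell in the table."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            # Escape characters that are special inside an N-Triples literal.
            literal = (
                str(value)
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            lines.append(f'{subject} <{BASE}{table}#{column}> "{literal}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT, licence TEXT)"
)
conn.execute("INSERT INTO dataset VALUES (1, 'Example run', NULL)")
lines = rows_to_ntriples(conn, "dataset", "id")
print("\n".join(lines))
```

Every table maps mechanically this way because a row already has a fixed set of columns. Going the other way requires deciding up front which predicates become columns, which is where schema complexity starts to matter.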

@dougiesquire

dougiesquire commented Apr 18, 2023

Thanks for providing these details and context @aidanheerdegen

(Possibly) relevant climate-data examples


2 participants