Biolink Model

A high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.) and their associations.

Biolink Model is designed to standardize types and relational structures in knowledge graphs (KGs), where the KG may be either a property graph or an RDF triple store.

The schema is expressed as a YAML file, which is translated to other formats.

Datamodel

The schema assumes a property graph, where nodes represent individual entities and edges represent relationships between entities. Biolink Model provides a schema for representing both nodes and edges.

The model itself can be divided into a few parts:

  • Entities (subjects and objects)
  • Predicates (relationships between core concepts)
  • Associations (statements including evidence and provenance)
  • Entity Slots (node properties)
  • Edge Slots (edge properties)

Entities

An entity corresponds to a database entity or a concept, represented as a node in a property graph.

All typed entities are a sub-class of NamedThing.

Each entity has:

  • its own unique stable URI
  • mappings to other ontologies (SIO, SO, etc.)
  • list of valid ID prefixes

These entity types are higher level terms that can be used to categorize nodes in a KG.

For more detailed typing, one can use specific terms from an ontology.
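As a rough illustration of the properties listed above, the following Python sketch models an entity type with a URI and a list of valid ID prefixes, then checks whether a node identifier uses one of those prefixes. The `EntityType` class, the `has_valid_prefix` helper, and the sample values (including the URI and the prefix list assigned to the gene type) are assumptions made for this example; they are not taken from the Biolink Model tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityType:
    """Illustrative stand-in for an entity type defined in the model."""
    name: str
    uri: str  # unique, stable URI of the type
    id_prefixes: List[str] = field(default_factory=list)  # valid ID prefixes


def has_valid_prefix(curie: str, entity_type: EntityType) -> bool:
    """Return True if the identifier's prefix is valid for the entity type."""
    prefix, sep, local_id = curie.partition(":")
    # Require a separator and a non-empty local part, e.g. "NCBIGene:1017".
    return bool(sep) and bool(local_id) and prefix in entity_type.id_prefixes


# Assumed values for illustration only.
GENE = EntityType(
    name="gene",
    uri="https://w3id.org/biolink/vocab/Gene",
    id_prefixes=["NCBIGene"],
)

print(has_valid_prefix("NCBIGene:1017", GENE))
print(has_valid_prefix("CHEBI:15377", GENE))
```

A check of this kind could be run over the nodes of a KG to flag identifiers that do not match the prefixes expected for their category.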

Associations

An association is a typed relationship between two entities, usually supported by evidence and provenance. It is represented as an edge (relationship) between two nodes in a property graph.

All edges are a sub-class of Association.

An association connects a subject node and an object node via a relation property. The nature of the association is defined based on the relation property.

Certain associations can have additional properties such as provided_by, has_evidence, and publications.
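To make the shape of an association concrete, here is a minimal Python sketch of an edge record with a subject, a relation, an object, and the optional properties named above. The `Association` dataclass and the `missing_endpoints` helper are hypothetical names for this example, and every identifier in it is a placeholder rather than a real term.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Association:
    """Illustrative edge record; not a class from the Biolink Model tooling."""
    subject: str   # identifier of the subject node
    relation: str  # property that defines the nature of the association
    object: str    # identifier of the object node
    provided_by: Optional[str] = None
    has_evidence: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)


def missing_endpoints(edge: Association, node_ids: Iterable[str]) -> List[str]:
    """Return the subject/object identifiers that are not among the known nodes."""
    known = set(node_ids)
    return [node_id for node_id in (edge.subject, edge.object) if node_id not in known]


# Placeholder identifiers and relation, chosen only for illustration.
edge = Association(
    subject="EX:node1",
    relation="EX:related_to",
    object="EX:node2",
    provided_by="example-source",
    publications=["EX:publication1"],
)

print(missing_endpoints(edge, {"EX:node1"}))
```

Keeping evidence and provenance on the edge itself, rather than on either node, mirrors the idea that they describe the statement and not the entities it connects.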

Slots

The term slot refers collectively to both node and edge properties.

There are two types of slots defined in the model: entity slots (node properties) and edge slots (edge properties).

Browse the Biolink Model to explore all defined entities, associations, and slots.

Identifiers

See Biolink Model JSON-LD context for a list of CURIE prefix mappings.

These include prefix expansions such as:

  "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
  "NCBIGene": "http://www.ncbi.nlm.nih.gov/gene/",
  "NCIT": "http://purl.obolibrary.org/obo/NCIT_",

Note: We do not curate these in Biolink Model. Rather, we take them from upstream sources via PrefixCommons. We specify a priority order of upstream sources for cases where conflicts may occur. See the default_curi_maps tag at the top of the biolink-model.yaml.
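The prefix expansions listed above can be applied mechanically: split the CURIE at the first colon and prepend the expansion for its prefix. The sketch below shows one way to do this in Python, using only the three mappings quoted above. The `expand_curie` function and the sample local identifiers are assumptions made for this example.

```python
# The three prefix expansions quoted above.
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "NCBIGene": "http://www.ncbi.nlm.nih.gov/gene/",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
}


def expand_curie(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand a CURIE such as 'CHEBI:15377' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"No known prefix expansion for {curie!r}")
    return prefix_map[prefix] + local_id


# Sample local identifiers, used only for illustration.
print(expand_curie("CHEBI:15377"))
print(expand_curie("NCBIGene:1017"))
```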

We also specify a small set of top-level prefix overrides via the prefixes tag at the top of the YAML.
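One way to picture the priority order and the overrides described above is a layered lookup in which explicit overrides win over every upstream source, and upstream sources are consulted in priority order. The Python sketch below assumes that sources listed earlier take precedence, which is an interpretation and not something this document states. The `resolve_prefixes` function and all prefix maps in it are hypothetical.

```python
from collections import ChainMap
from typing import Dict, List


def resolve_prefixes(
    overrides: Dict[str, str],
    sources_in_priority_order: List[Dict[str, str]],
) -> Dict[str, str]:
    """Merge prefix maps so that the first map defining a prefix wins.

    ChainMap searches its maps from left to right, so placing the
    overrides first lets them take precedence over every upstream source.
    """
    return dict(ChainMap(overrides, *sources_in_priority_order))


# Hypothetical upstream sources that disagree on the "EX" prefix.
source_a = {"EX": "http://a.example/"}
source_b = {"EX": "http://b.example/", "FOO": "http://b.example/foo/"}

# Hypothetical top-level override for the "FOO" prefix.
overrides = {"FOO": "http://override.example/foo/"}

resolved = resolve_prefixes(overrides, [source_a, source_b])
print(resolved["EX"])
print(resolved["FOO"])
```

Here `source_a` supplies the expansion for `EX` because it comes first in the list, while the override replaces the expansion that `source_b` provides for `FOO`.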

Biolink Model representation

Biolink Model aims at representing knowledge in a graph form regardless of the graph representation used.

Following are some recommendations when attempting to use Biolink Model with each style of representation.

Citing Biolink Model

Unni DR, Moxon SAT, Bada M, Brush M, Bruskiewich R, Caufield JH, Clemons PA, Dancik V, Dumontier M, Fecho K, Glusman G, Hadlock JJ, Harris NL, Joshi A, Putman T, Qin G, Ramsey SA, Shefchek KA, Solbrig H, Soman K, Thessen AE, Haendel MA, Bizon C, Mungall CJ, The Biomedical Data Translator Consortium (2022). Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clin Transl Sci. Wiley; 2022 Jun 6; https://onlinelibrary.wiley.com/doi/10.1111/cts.13302