Morph-KGC is an engine that constructs RDF knowledge graphs from heterogeneous data sources with the R2RML and RML mapping languages. It is built on top of pandas and uses mapping partitions to significantly reduce execution time and memory consumption for large data sources. Its main features are:
- User-friendly mappings with YARRRML.
- Transformation functions with RML-FNML, including Python user-defined functions.
- RDF-star generation with RML-star.
- RML views over tabular data sources and JSON files.
- Integration with RDFLib, Oxigraph and Kafka.
- Optimized to materialize large knowledge graphs.
- Remote data and mapping files.
- Input data formats:
  - Relational databases: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, MariaDB, SQLite.
  - Tabular files: CSV, TSV, Excel, Parquet, Feather, ORC, Stata, SAS, SPSS, ODS.
  - Hierarchical files: JSON, XML.
  - In-memory data structures: Python Dictionaries, DataFrames.
  - Cloud data lake solutions: Databricks.
  - Property graph databases: Neo4j, Kùzu.
Learn quickly with the tutorial in Google Colaboratory!
PyPI is the fastest way to install Morph-KGC:
pip install morph-kgc
We recommend installing Morph-KGC in a virtual environment.
To run the engine from the command line, execute the following:
python3 -m morph_kgc config.ini
Check the documentation to see how to create the configuration INI file. An example INI file is also available.
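As a rough illustration of what such a file can contain, the sketch below writes a one-source `config.ini` with Python's standard `configparser` module and reads it back as a sanity check. The section name and the `mappings` and `db_url` keys come from the example in this README; the paths are placeholders, and `write_config` and `read_config` are hypothetical helper names, not part of Morph-KGC. Note that `configparser` writes `key = value` pairs; this assumes the engine accepts that delimiter as well as the `key: value` form shown in this README.

```python
import configparser
import tempfile
from pathlib import Path


def write_config(path: Path) -> Path:
    """Write a single-source INI file and return its path."""
    config = configparser.ConfigParser()
    # Section name and keys are copied from this README's example;
    # both values are placeholders, not real locations.
    config["DataSource1"] = {
        "mappings": "/path/to/mapping/mapping_file.rml.ttl",
        "db_url": "mysql+pymysql://user:password@localhost:3306/db_name",
    }
    with path.open("w", encoding="utf-8") as handle:
        config.write(handle)
    return path


def read_config(path: Path) -> dict:
    """Parse the file back so syntax mistakes surface before the engine runs."""
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as folder:
        config_path = write_config(Path(folder) / "config.ini")
        print(config_path.read_text(encoding="utf-8"))
        print(read_config(config_path))
```

Once the placeholders are replaced with real paths, the resulting file can be passed to the command shown above.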
It is also possible to run Morph-KGC as a library with RDFLib and Oxigraph:
import morph_kgc
# generate the triples and load them to an RDFLib graph
g_rdflib = morph_kgc.materialize('/path/to/config.ini')
# work with the RDFLib graph
q_res = g_rdflib.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')
# generate the triples and load them to Oxigraph
g_oxigraph = morph_kgc.materialize_oxigraph('/path/to/config.ini')
# work with Oxigraph
q_res = g_oxigraph.query('SELECT DISTINCT ?classes WHERE { ?s a ?classes }')
# the methods above also accept the config as a string
config = """
[DataSource1]
mappings: /path/to/mapping/mapping_file.rml.ttl
db_url: mysql+pymysql://user:password@localhost:3306/db_name
"""
g_rdflib = morph_kgc.materialize(config)
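Because the configuration can be passed as a string, it can also be assembled in code. The following is a minimal sketch under stated assumptions: `render_config`, `section_names` and `materialize_sources` are hypothetical helpers, `DataSource2` and its values are made-up placeholders, and support for more than one section in a single configuration is assumed rather than shown in this README. Only `morph_kgc.materialize` is taken from the example above.

```python
import configparser
from typing import Mapping


def render_config(sources: Mapping[str, Mapping[str, str]]) -> str:
    """Render sections in the same `key: value` style used in this README."""
    lines = []
    for section, options in sources.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key}: {value}" for key, value in options.items())
        lines.append("")  # blank line after each section
    return "\n".join(lines)


def section_names(config_text: str) -> list:
    """Parse the text with the standard library to confirm it is valid INI."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.sections()


def materialize_sources(sources: Mapping[str, Mapping[str, str]]):
    """Pass the rendered string to Morph-KGC; needs the package and real paths."""
    import morph_kgc  # imported lazily so the helpers above work without it

    return morph_kgc.materialize(render_config(sources))


if __name__ == "__main__":
    example = {
        "DataSource1": {
            "mappings": "/path/to/mapping/mapping_file.rml.ttl",
            "db_url": "mysql+pymysql://user:password@localhost:3306/db_name",
        },
        # Hypothetical second source, for illustration only.
        "DataSource2": {
            "mappings": "/path/to/mapping/other_mapping.rml.ttl",
            "db_url": "postgresql+psycopg2://user:password@localhost:5432/other_db",
        },
    }
    text = render_config(example)
    print(text)
    print(section_names(text))
```

Checking the rendered text with `configparser` first is a cheap way to catch a malformed section before any data source is contacted.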
Morph-KGC is available under the Apache License 2.0.
Ontology Engineering Group, Universidad Politécnica de Madrid.
If you use Morph-KGC in your work, please cite the SoftwareX or Semantic Web Journal (SWJ) paper:
@article{arenas2024rmlfnml,
title = {{An RML-FNML module for Python user-defined functions in Morph-KGC}},
author = {Arenas-Guerrero, Julián and Espinoza-Arias, Paola and Bernabé-Diaz, José Antonio and Deshmukh, Prashant and Sánchez-Fernández, José Luis and Corcho, Oscar},
journal = {SoftwareX},
year = {2024},
volume = {26},
pages = {101709},
issn = {2352-7110},
publisher = {Elsevier},
doi = {10.1016/j.softx.2024.101709}
}
@article{arenas2024morph,
title = {{Morph-KGC: Scalable knowledge graph materialization with mapping partitions}},
author = {Arenas-Guerrero, Julián and Chaves-Fraga, David and Toledo, Jhon and Pérez, María S. and Corcho, Oscar},
journal = {Semantic Web},
year = {2024},
volume = {15},
number = {1},
pages = {1--20},
issn = {2210-4968},
publisher = {IOS Press},
doi = {10.3233/SW-223135}
}