When building a dataset from N-Quads, the JsonLdProcessor checks whether every triple is unique. It does this through a pairwise comparison in JsonLdProcessor._compare_rdf_triples().
This means that the number of triple comparisons grows quadratically with the size of the dataset (or at least, of the graph).
To give some metrics: for a 14k-line N-Quads file, all in a single graph, parsing time on my M1 Mac drops from 18.8s with the comparison to 0.7s without it.
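As a rough worked estimate of why the gap is so large: assuming the file holds about n = 14,000 distinct triples in one graph, and that each new triple is compared against every triple already accepted (both are assumptions for illustration, not measurements), the worst-case number of pairwise comparisons is:

```latex
\sum_{k=0}^{n-1} k \;=\; \frac{n(n-1)}{2} \;=\; \frac{14000 \cdot 13999}{2} \;=\; 97{,}993{,}000
```

A hash-based lookup would instead need on the order of n = 14,000 membership checks.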
Given how rarely duplicate triples/quads occur in N-Quads files, and how little impact they have, this is far too expensive.
At the very least, the parser could build an index (a HashMap or dict) to speed up this comparison. But given that the JSON-LD builder that usually follows this step does this too, the comparison could be dropped entirely.
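A minimal sketch of the index idea, not PyLD's actual code. It assumes each triple is a dict with "subject", "predicate" and "object" terms, and each term is a dict with "type", "value" and optional "datatype" and "language" keys. The names make_term, triple_key, dedupe_pairwise and dedupe_indexed are hypothetical helpers introduced only for this illustration.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Term = Dict[str, Any]
Triple = Dict[str, Term]


def make_term(type_: str, value: str, datatype: Optional[str] = None,
              language: Optional[str] = None) -> Term:
    """Build a term dict in the assumed layout (hypothetical helper)."""
    term: Term = {"type": type_, "value": value}
    if datatype is not None:
        term["datatype"] = datatype
    if language is not None:
        term["language"] = language
    return term


def triple_key(triple: Triple) -> Tuple:
    """Turn a triple into a hashable tuple so it can be stored in a set."""
    return tuple(
        (t.get("type"), t.get("value"), t.get("datatype"), t.get("language"))
        for t in (triple["subject"], triple["predicate"], triple["object"])
    )


def dedupe_pairwise(triples: List[Triple]) -> List[Triple]:
    """Quadratic approach: compare each triple with all accepted ones."""
    unique: List[Triple] = []
    for triple in triples:
        if not any(triple_key(triple) == triple_key(u) for u in unique):
            unique.append(triple)
    return unique


def dedupe_indexed(triples: List[Triple]) -> List[Triple]:
    """Linear approach: one set lookup per triple, order preserved."""
    seen: Set[Tuple] = set()
    unique: List[Triple] = []
    for triple in triples:
        key = triple_key(triple)
        if key not in seen:
            seen.add(key)
            unique.append(triple)
    return unique
```

Both functions return the same list for the same input; the indexed version does a constant amount of work per triple on average instead of scanning everything seen so far.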
RinkeHoekstra changed the title from "JsonLdProcessor._compare_rdf_triples() is a massive performance hog in from_rdf" to "JsonLdProcessor._compare_rdf_triples() is a massive performance hog in parse_nquads" on Nov 4, 2022.
Referenced code: pyld/lib/pyld/jsonld.py, line 1634 in 316fbc2