Multi-file import #1537
Signed-off-by: Johannes Kalmbach <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1537 +/- ##
==========================================
+ Coverage 88.22% 88.47% +0.24%
==========================================
Files 360 363 +3
Lines 27095 27533 +438
Branches 3647 3714 +67
==========================================
+ Hits 23905 24360 +455
+ Misses 1953 1941 -12
+ Partials 1237 1232 -5
# Conflicts: # src/index/CMakeLists.txt # src/index/Permutation.h
TODO<joka921> We need dedicated testing with multiple inputs for the Multifile parser.
Reviewed 1-1 with Johannes and committed some minor changes. It looks very good now, and we tested it end-to-end.
Conformance check passed ✅ No test result changes.
Quality Gate passed
So far, the input to QLever's index builder was always a single input stream. Correspondingly, each Qleverfile had a variable `CAT_INPUT_FILES`, which specifies a command that writes the input to standard output, which is then piped into the index builder. Since ad-freiburg/qlever#1537, QLever allows multiple input streams. The streams are parsed concurrently. For each stream separately, the following can be specified: the command that writes the input to standard output, the format, the graph of which this input should become a part, and whether the stream should be parsed in parallel (recommended for very large inputs, but then all the prefix declarations must come at the beginning) or not. In the Qleverfile, multiple input streams and their configuration can now be specified via `MULTI_INPUT_JSON`, which takes a JSON string as value. See `qlever index --help` or `src/qlever/qleverfile.py` for information about the structure of that JSON. Using `CAT_INPUT_FILES` is still supported, but a Qleverfile must use either `MULTI_INPUT_JSON` or `CAT_INPUT_FILES`, not both. The Qleverfile for Wikidata has been updated to use `MULTI_INPUT_JSON` (as a showcase and because it makes sense).
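As a rough illustration of what such a JSON value might look like, the following Python sketch assembles and validates a list of per-stream settings and serializes it to a single-line string. The key names (`cmd`, `format`, `graph`, `parallel`), the file names, and the graph IRI are assumptions made for this example; the authoritative structure is the one documented by `qlever index --help`.

```python
import json

# Assumed key names for illustration only; consult `qlever index --help`
# for the structure that qlever-control actually expects.
REQUIRED_KEYS = {"cmd"}
OPTIONAL_KEYS = {"format", "graph", "parallel"}


def build_multi_input_json(streams):
    """Check each stream spec and serialize the list to one JSON string."""
    for i, spec in enumerate(streams):
        missing = REQUIRED_KEYS - spec.keys()
        unknown = spec.keys() - REQUIRED_KEYS - OPTIONAL_KEYS
        if missing:
            raise ValueError(f"stream {i}: missing keys {sorted(missing)}")
        if unknown:
            raise ValueError(f"stream {i}: unknown keys {sorted(unknown)}")
    return json.dumps(streams)


# Hypothetical inputs: one large compressed Turtle file parsed in parallel,
# and one small N-Triples file assigned to a named graph.
streams = [
    {"cmd": "zcat big.ttl.gz", "format": "ttl", "parallel": "true"},
    {
        "cmd": "cat small.nt",
        "format": "nt",
        "graph": "http://example.org/g",
        "parallel": "false",
    },
]
multi_input_json = build_multi_input_json(streams)
print(multi_input_json)
```

The printed string is the kind of value that could be assigned to `MULTI_INPUT_JSON` in a Qleverfile, provided the key names match what the installed version of `qlever` expects.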
So far, QLever read its input from a single file or from standard input. This made it hard to associate graph information per file. It also caused problems when parallel parsing was activated and a Turtle file did not have all its prefix declarations at the beginning. With this change, QLever can read its input from multiple input streams (files or pipes), and the streams are parsed concurrently. For each stream, it can be specified separately which default graph to use and whether to use the parallel parser or not. Specifying a value for `"parallel-parsing"` in the `.settings.json` file is now deprecated. There will be a corresponding change in https://github.com/ad-freiburg/qlever-control next that enables the convenient control of this new functionality from a `Qleverfile`.
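The idea of parsing several streams concurrently, each with its own default graph, can be sketched in a few lines of Python. This is only a conceptual toy and not QLever's parser (which is written in C++): the function name `parse_stream`, the simplified whitespace-based triple splitting, and the example IRIs are all assumptions for illustration.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def parse_stream(stream, graph):
    """Parse simple 'S P O .' lines and tag each triple with the stream's graph."""
    quads = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        s, p, rest = line.split(None, 2)
        # Drop the terminating '.' of the statement (requires Python 3.9+).
        o = rest.rstrip().removesuffix(".").rstrip()
        quads.append((s, p, o, graph))
    return quads


# Two hypothetical input streams, each with its own default graph.
inputs = [
    (io.StringIO("<a> <b> <c> .\n<a> <b> <d> .\n"), "<http://example.org/g1>"),
    (io.StringIO("<x> <y> <z> .\n"), "<http://example.org/g2>"),
]

# One worker per stream; results are collected in submission order.
with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
    futures = [pool.submit(parse_stream, s, g) for s, g in inputs]
    all_quads = [quad for f in futures for quad in f.result()]

for quad in all_quads:
    print(quad)
```

Because every stream carries its own graph and is parsed independently, a stream whose prefix declarations are scattered throughout the file could simply be parsed sequentially while other streams still use a parallel parser.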