-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-5231: fix poor query performance for hasStatements() in FedX #5232
Conversation
The previous implementation of the FedXConnection was delegating "hasStatements()" to the implementation of "getStatements()", where the latter was actually fetching data from the federation members. For checks hasStatements() checks like {null, rdf:type, null} or even {null, null, null} the implementation is problematic as it would fetch all data matching the pattern from the federation members, only to answer if it actually exists. We now make use of "existence" check on the federation members, and can actually rely on the source selection cache for this. Unit test coverage has been added.
/** | ||
* Returns the accessible federation members in the context of the query. By default this is all federation members. | ||
* | ||
* @param queryInfo | ||
* @return | ||
*/ | ||
protected List<Endpoint> getAccessibleFederationMembers(QueryInfo queryInfo) { | ||
return federationContext.getFederation().getMembers(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain more why this needs to be protected. I didn't see that it's override anywhere in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. We make use of specializations of FederationEvalStrategy to validate new optimizations (before contributing them back to RDF4J) or add additional (special) ones applicable to our use-cases. Without having this method accessible in sub-classes, we would need to override the two methods entirely (get/hasStatementsInternal) as we have a specialization for accessible members, currently validating resilience, but looking ahead also for policies/permissions. Does this help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound good. Can you update the docs to reflect that this is meant to be an extension point for subclasses. If you are unsure if you need to change it in future you can annotate it as experimental if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, done
Btw, as a rule of thumb we usually consider performance issues as bugs and allow them in bug fix releases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can merge when you are ready.
I cancelled the test that was taking too long. It's an issue related to some very specific edge cases when connections and resources are not closed before trying to shutdown the ShaclSail. I have a PR open for trying to fix it, but it's a bit tricky. |
GitHub issue resolved: #5231
The previous implementation of the FedXConnection was delegating "hasStatements()" to the implementation of "getStatements()", where the latter was actually fetching data from the federation members.
For checks hasStatements() checks like {null, rdf:type, null} or even {null, null, null} the implementation is problematic as it would fetch all data matching the pattern from the federation members, only to answer if it actually exists.
We now make use of "existence" check on the federation members, and can actually rely on the source selection cache for this.
Unit test coverage has been added.
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources
to format from the command line)