GH-5231: fix poor query performance for hasStatements() in FedX #5232

aschwarte10 · 2025-01-15T12:13:34Z

GitHub issue resolved: #5231

The previous implementation of FedXConnection delegated "hasStatements()" to the implementation of "getStatements()", which actually fetches data from the federation members.

For hasStatements() checks like {null, rdf:type, null} or even {null, null, null}, this implementation is problematic: it fetches all data matching the pattern from the federation members only to determine whether any such data exists.
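A simplified model of that cost, not the actual FedX code: the Triple record, the Member class and its transferred counter are hypothetical stand-ins for federation members and the data they ship back. The yes/no answer is only computed after every matching statement from every member has been collected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are not RDF4J/FedX types.
public class NaiveHasStatementsSketch {

    // A null pattern position acts as a wildcard.
    record Triple(String s, String p, String o) {
        boolean matches(String ps, String pp, String po) {
            return (ps == null || ps.equals(s))
                    && (pp == null || pp.equals(p))
                    && (po == null || po.equals(o));
        }
    }

    // A federation member that counts how many statements it sends to the caller.
    static class Member {
        final List<Triple> data;
        int transferred = 0;

        Member(List<Triple> data) {
            this.data = data;
        }

        List<Triple> getStatements(String s, String p, String o) {
            List<Triple> out = new ArrayList<>();
            for (Triple t : data) {
                if (t.matches(s, p, o)) {
                    out.add(t);
                    transferred++; // every match is "sent over the wire"
                }
            }
            return out;
        }
    }

    // Delegating approach: collect all matches from all members, then test for emptiness.
    static boolean hasStatementsViaGet(List<Member> members, String s, String p, String o) {
        List<Triple> all = new ArrayList<>();
        for (Member m : members) {
            all.addAll(m.getStatements(s, p, o));
        }
        return !all.isEmpty();
    }

    public static void main(String[] args) {
        Member a = new Member(List.of(new Triple("ex:a", "rdf:type", "ex:T"),
                new Triple("ex:b", "rdf:type", "ex:T")));
        Member b = new Member(List.of(new Triple("ex:c", "rdf:type", "ex:U")));
        boolean exists = hasStatementsViaGet(List.of(a, b), null, "rdf:type", null);
        System.out.println(exists + ", transferred=" + (a.transferred + b.transferred));
    }
}
```

Running main prints "true, transferred=3": three statements are transferred to answer a question that a single match would already settle.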

We now use an "existence" check on the federation members and can rely on the source selection cache for this.
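A minimal sketch of the idea under simplified assumptions: each member answers a cheap existence question (in SPARQL terms, similar to an ASK query), answers are remembered per (member, pattern) in a map that stands in for the source selection cache, and the loop stops at the first member reporting a match. The Member class, hasMatch() and the string cache key are hypothetical; FedX's own source selection cache API is not reproduced here, and cache invalidation is ignored.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExistenceCheckSketch {

    // Hypothetical triple type; a null pattern position acts as a wildcard.
    record Triple(String s, String p, String o) {
        boolean matches(String ps, String pp, String po) {
            return (ps == null || ps.equals(s))
                    && (pp == null || pp.equals(p))
                    && (po == null || po.equals(o));
        }
    }

    // Hypothetical member answering an existence question without shipping the matching data.
    static class Member {
        final String id;
        final List<Triple> data;
        int remoteChecks = 0; // how often this member was actually contacted

        Member(String id, List<Triple> data) {
            this.id = id;
            this.data = data;
        }

        boolean hasMatch(String s, String p, String o) {
            remoteChecks++;
            return data.stream().anyMatch(t -> t.matches(s, p, o));
        }
    }

    // Stand-in for the source selection cache: (member, pattern) -> known answer.
    private final Map<String, Boolean> cache = new HashMap<>();

    private static String key(Member m, String s, String p, String o) {
        return m.id + "|" + s + "|" + p + "|" + o;
    }

    boolean hasStatements(List<Member> members, String s, String p, String o) {
        for (Member m : members) {
            // Use a cached answer if present; otherwise ask the member once and remember it.
            boolean found = cache.computeIfAbsent(key(m, s, p, o), k -> m.hasMatch(s, p, o));
            if (found) {
                return true; // one positive answer is enough; remaining members are skipped
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Member a = new Member("a", List.of());
        Member b = new Member("b", List.of(new Triple("ex:x", "rdf:type", "ex:T")));
        Member c = new Member("c", List.of(new Triple("ex:y", "rdf:type", "ex:U")));
        ExistenceCheckSketch federation = new ExistenceCheckSketch();
        System.out.println(federation.hasStatements(List.of(a, b, c), null, "rdf:type", null)); // true
        System.out.println(federation.hasStatements(List.of(a, b, c), null, "rdf:type", null)); // true, from cache
        System.out.println(a.remoteChecks + " " + b.remoteChecks + " " + c.remoteChecks);     // 1 1 0
    }
}
```

Member c is never contacted because b already answered positively, and the second call is served entirely from the cache.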

Unit test coverage has been added.

PR Author Checklist (see the contributor guidelines for more details):

my pull request is self-contained
I've added tests for the changes I made
I've applied code formatting (you can use mvn process-resources to format from the command line)
I've squashed my commits where necessary
every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change


hmottestad · 2025-01-21T21:07:53Z

.../federation/src/main/java/org/eclipse/rdf4j/federated/evaluation/FederationEvalStrategy.java

+	/**
+	 * Returns the accessible federation members in the context of the query. By default this is all federation members.
+	 *
+	 * @param queryInfo
+	 * @return
+	 */
+	protected List<Endpoint> getAccessibleFederationMembers(QueryInfo queryInfo) {
+		return federationContext.getFederation().getMembers();
+	}


Can you explain in more detail why this needs to be protected? I didn't see it overridden anywhere in the code.

Sure. We use specializations of FederationEvalStrategy to validate new optimizations (before contributing them back to RDF4J) or to add additional, special ones applicable to our use cases. Without this method being accessible in subclasses, we would need to override both methods (getStatementsInternal/hasStatementsInternal) entirely, since we have a specialization for the accessible members: currently for validating resilience, and looking ahead also for policies/permissions. Does this help?
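To illustrate the extension point described above, here is a sketch with hypothetical stand-in types (EndpointStub, QueryInfoStub, BaseStrategy, ResilientStrategy and membersToContact are not RDF4J classes or methods): the base class returns all members from the protected hook, as in the diff, and a subclass overrides only the hook to skip members it considers unreachable, instead of overriding the statement methods wholesale.

```java
import java.util.List;
import java.util.stream.Collectors;

public class AccessibleMembersSketch {

    // Hypothetical stand-ins, not the RDF4J Endpoint/QueryInfo types.
    record EndpointStub(String id, boolean reachable) { }

    record QueryInfoStub(String queryString) { }

    static class BaseStrategy {
        protected final List<EndpointStub> members;

        BaseStrategy(List<EndpointStub> members) {
            this.members = members;
        }

        // Extension point: by default, all federation members are accessible.
        protected List<EndpointStub> getAccessibleFederationMembers(QueryInfoStub queryInfo) {
            return members;
        }

        // Statement-level logic (e.g. a hasStatements/getStatements implementation)
        // goes through the hook, so a subclass only needs to override the hook.
        List<String> membersToContact(QueryInfoStub queryInfo) {
            return getAccessibleFederationMembers(queryInfo).stream()
                    .map(EndpointStub::id)
                    .collect(Collectors.toList());
        }
    }

    // Hypothetical specialization: skip members currently known to be unreachable.
    static class ResilientStrategy extends BaseStrategy {
        ResilientStrategy(List<EndpointStub> members) {
            super(members);
        }

        @Override
        protected List<EndpointStub> getAccessibleFederationMembers(QueryInfoStub queryInfo) {
            return members.stream()
                    .filter(EndpointStub::reachable)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        List<EndpointStub> members = List.of(new EndpointStub("a", true), new EndpointStub("b", false));
        QueryInfoStub q = new QueryInfoStub("ASK { ?s ?p ?o }");
        System.out.println(new BaseStrategy(members).membersToContact(q));      // [a, b]
        System.out.println(new ResilientStrategy(members).membersToContact(q)); // [a]
    }
}
```

The same hook could later filter by policies or permissions without touching the statement-level code.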

Sounds good. Can you update the docs to reflect that this is meant to be an extension point for subclasses? If you are unsure whether you'll need to change it in the future, you can annotate it as experimental.

Good suggestion, done

hmottestad · 2025-01-21T21:09:06Z

Btw, as a rule of thumb we usually consider performance issues as bugs and allow them in bug fix releases.

hmottestad

You can merge when you are ready.

hmottestad · 2025-01-23T11:11:15Z

I cancelled the test that was taking too long. It's an issue related to some very specific edge cases where connections and resources are not closed before trying to shut down the ShaclSail. I have a PR open to try to fix it, but it's a bit tricky.

aschwarte10 requested a review from hmottestad January 15, 2025 12:13

hmottestad reviewed Jan 21, 2025


GH-5231: refine javadoc, add Experimental annotation

1cc4ab8

hmottestad approved these changes Jan 23, 2025


aschwarte10 merged commit dcacf74 into main Jan 24, 2025
8 of 9 checks passed

aschwarte10 deleted the GH-5231-poor-performance-has-statement branch January 24, 2025 07:06
