Greedy query planning for large connected components #1442

Merged
merged 27 commits into from
Oct 10, 2024
2b8b261
Greedy query planning.
joka921 Aug 14, 2024
b6d0371
In the middle of implementing this and that.
joka921 Aug 19, 2024
510c606
in the middle of things.
joka921 Aug 19, 2024
40f8e84
Merge branch 'master' into greedy-query-planner
joka921 Sep 26, 2024
b58e196
Fix the merge.
joka921 Sep 26, 2024
eb14678
Merge remote-tracking branch 'origin/greedy-query-planner' into greed…
joka921 Sep 26, 2024
f37f3d7
Fix a bug... let's try this
joka921 Oct 6, 2024
f48e06f
Merge branch 'master' into greedy-query-planner
joka921 Oct 6, 2024
9e51063
The counting of the subgraphs works, and we have found out that the D…
joka921 Oct 6, 2024
df9d3b4
Correctly handle GROUP BY in a `GRAPH ?var` clause
joka921 Oct 7, 2024
f0e4442
Make the subgraph counting more efficient and make its threshold conf…
joka921 Oct 8, 2024
f23f30f
Fix the code and the tests and do some refactoring.
joka921 Oct 8, 2024
6b7166c
Merge branch 'group-by-in-graph' into greedy-query-planner
joka921 Oct 8, 2024
2ccbc70
Before writing tests.
joka921 Oct 9, 2024
074dbfc
Now we have reasonable numbers for the result of countSubgraphs...
joka921 Oct 9, 2024
466479b
Fix test coverage and comments.
joka921 Oct 9, 2024
e3453d8
Fix test coverage and comments.
joka921 Oct 9, 2024
c417e9a
Small fixes for sonar
joka921 Oct 9, 2024
708d165
Merge branch 'master' into greedy-query-planner
joka921 Oct 9, 2024
198a589
Different parameter.
joka921 Oct 9, 2024
e201c07
small changes from an initial review.
joka921 Oct 9, 2024
777cca7
Changes made during 1-1 with Johannes
Oct 9, 2024
0e78527
Add some additional things.
joka921 Oct 10, 2024
15df99b
Fix codespell
joka921 Oct 10, 2024
943c3e9
A few more minor improvements before merging this
Oct 10, 2024
1d95109
Merge remote-tracking branch 'origin/master' into greedy-query-planner
Oct 10, 2024
20a5538
Remove code duplicate introduced by automatic merge
Oct 10, 2024
77 changes: 69 additions & 8 deletions src/engine/QueryPlanner.cpp
@@ -33,6 +33,7 @@
#include "engine/TransitivePathBase.h"
#include "engine/Union.h"
#include "engine/Values.h"
#include "global/RuntimeParameters.h"
#include "parser/Alias.h"
#include "parser/SparqlParserHelpers.h"

@@ -1193,6 +1194,19 @@
row.insert(row.end(), addedPlans.begin(), addedPlans.end());
}

// _____________________________________________________________________________
size_t QueryPlanner::findUniqueNodeIds(
const std::vector<SubtreePlan>& connectedComponent) {
ad_utility::HashSet<uint64_t> uniqueNodeIds;
// TODO<joka921> Assert that all the node IDs only consist of a single
// element. Or even better: Do a bitwise or and then count the number of set
// bits...
std::ranges::copy(connectedComponent | std::views::transform(
&SubtreePlan::_idsOfIncludedNodes),
std::inserter(uniqueNodeIds, uniqueNodeIds.end()));
return uniqueNodeIds.size();
}

// _____________________________________________________________________________
std::vector<QueryPlanner::SubtreePlan>
QueryPlanner::runDynamicProgrammingOnConnectedComponent(
@@ -1206,16 +1220,12 @@
dpTab.push_back(std::move(connectedComponent));
applyFiltersIfPossible<false>(dpTab.back(), filters);
applyTextLimitsIfPossible(dpTab.back(), textLimits, false);
- ad_utility::HashSet<uint64_t> uniqueNodeIds;
- std::ranges::copy(
- dpTab.back() | std::views::transform(&SubtreePlan::_idsOfIncludedNodes),
- std::inserter(uniqueNodeIds, uniqueNodeIds.end()));
- size_t numSeeds = uniqueNodeIds.size();
+ size_t numSeeds = findUniqueNodeIds(dpTab.back());

for (size_t k = 2; k <= numSeeds; ++k) {
LOG(TRACE) << "Producing plans that unite " << k << " triples."
<< std::endl;
- dpTab.emplace_back(vector<SubtreePlan>());
+ dpTab.emplace_back();
for (size_t i = 1; i * 2 <= k; ++i) {
checkCancellation();
auto newPlans = merge(dpTab[i - 1], dpTab[k - i - 1], tg);
@@ -1230,6 +1240,38 @@
return std::move(dpTab.back());
}

// _____________________________________________________________________________
std::vector<QueryPlanner::SubtreePlan>
QueryPlanner::runGreedyPlanningOnConnectedComponent(
std::vector<SubtreePlan> connectedComponent,
const vector<SparqlFilter>& filters, const TextLimitMap& textLimits,
const TripleGraph& tg) const {
auto& result = connectedComponent;
applyFiltersIfPossible<true>(result, filters);
applyTextLimitsIfPossible(result, textLimits, true);
size_t numSeeds = findUniqueNodeIds(result);

while (numSeeds > 1) {
checkCancellation();
auto newPlans = merge(result, result, tg);
applyFiltersIfPossible<true>(newPlans, filters);
applyTextLimitsIfPossible(newPlans, textLimits, true);
auto smallestIdx = findSmallestExecutionTree(newPlans);
auto& cheapestNewTree = newPlans.at(smallestIdx);
size_t oldSize = result.size();
std::erase_if(result, [&cheapestNewTree](const auto& plan) {
// TODO<joka921> We can also assert some other invariants here.
return (cheapestNewTree._idsOfIncludedNodes & plan._idsOfIncludedNodes) !=
0;
});
result.push_back(std::move(cheapestNewTree));
AD_CORRECTNESS_CHECK(result.size() < oldSize);
numSeeds--;
}
// TODO<joka921> Assert that all seeds are covered by the result.
return std::move(result);
}

// _____________________________________________________________________________
vector<vector<QueryPlanner::SubtreePlan>> QueryPlanner::fillDpTab(
const QueryPlanner::TripleGraph& tg, vector<SparqlFilter> filters,
@@ -1247,9 +1289,13 @@
components[componentIndices.at(i)].push_back(std::move(initialPlans.at(i)));
}
vector<vector<SubtreePlan>> lastDpRowFromComponents;
bool useGreedyPlanning = RuntimeParameters().get<"use-greedy-planning">();
for (auto& component : components | std::views::values) {
- lastDpRowFromComponents.push_back(runDynamicProgrammingOnConnectedComponent(
- std::move(component), filters, textLimits, tg));
+ auto impl = useGreedyPlanning
+ ? &QueryPlanner::runGreedyPlanningOnConnectedComponent
+ : &QueryPlanner::runDynamicProgrammingOnConnectedComponent;
+ lastDpRowFromComponents.push_back(
+ std::invoke(impl, this, std::move(component), filters, textLimits, tg));
checkCancellation();
}
size_t numConnectedComponents = lastDpRowFromComponents.size();
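
Note on the dispatch in `fillDpTab`: the planner picks one of two member functions with identical signatures and calls it via `std::invoke`. A minimal, self-contained sketch of that pattern follows. `ToyPlanner` and its methods are hypothetical and only return a label, and the flag is passed as a plain `bool` here rather than being read from `RuntimeParameters()`.

```cpp
#include <functional>
#include <string>

// Hypothetical planner with two interchangeable strategies.
class ToyPlanner {
 public:
  std::string planExhaustively(int numTriples) const {
    return "dp:" + std::to_string(numTriples);
  }
  std::string planGreedily(int numTriples) const {
    return "greedy:" + std::to_string(numTriples);
  }

  std::string plan(int numTriples, bool useGreedyPlanning) const {
    // Both branches have the same pointer-to-member type, so the conditional
    // operator is well-formed.
    auto impl = useGreedyPlanning ? &ToyPlanner::planGreedily
                                  : &ToyPlanner::planExhaustively;
    // `std::invoke` calls the selected member function on `*this`.
    return std::invoke(impl, this, numTriples);
  }
};
```

Selecting the function once and forwarding a single argument list keeps the two call sites from drifting apart, which appears to be the motivation for this form in the diff.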
@@ -1593,6 +1639,21 @@
lastRow.begin();
};

// _________________________________________________________________________________
size_t QueryPlanner::findSmallestExecutionTree(
const std::vector<SubtreePlan>& lastRow) {
AD_CONTRACT_CHECK(!lastRow.empty());
// TODO<joka921> Precompute the sizes and costs. (or check if they are
// cached).
auto compare = [](const auto& a, const auto& b) {
auto aSize = a.getSizeEstimate(), bSize = b.getSizeEstimate();
auto aCost = a.getCostEstimate(), bCost = b.getCostEstimate();
return std::tie(aSize, aCost) < std::tie(bSize, bCost);
};
return std::min_element(lastRow.begin(), lastRow.end(), compare) -
lastRow.begin();
};
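
Note on `findSmallestExecutionTree`: comparing `std::tie(size, cost)` tuples orders plans by size estimate first and uses the cost estimate only to break ties. The sketch below shows the same comparison on a hypothetical `ToyPlan` with plain data members; the real `SubtreePlan` exposes these values through `getSizeEstimate()` and `getCostEstimate()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for `SubtreePlan`.
struct ToyPlan {
  size_t sizeEstimate;
  size_t costEstimate;
};

// Return the index of the plan with the smallest size estimate; among plans of
// equal size, the one with the smaller cost estimate wins.
// Precondition: `plans` is not empty.
size_t findSmallest(const std::vector<ToyPlan>& plans) {
  auto compare = [](const ToyPlan& a, const ToyPlan& b) {
    return std::tie(a.sizeEstimate, a.costEstimate) <
           std::tie(b.sizeEstimate, b.costEstimate);
  };
  return static_cast<size_t>(
      std::min_element(plans.begin(), plans.end(), compare) - plans.begin());
}
```

For the input `{{10, 5}, {3, 9}, {3, 4}, {7, 1}}` the function returns index 2: two plans share the smallest size 3, and the one with cost 4 beats the one with cost 9.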

// _____________________________________________________________________________
std::vector<QueryPlanner::SubtreePlan> QueryPlanner::createJoinCandidates(
const SubtreePlan& ain, const SubtreePlan& bin,
11 changes: 10 additions & 1 deletion src/engine/QueryPlanner.h
@@ -445,6 +445,11 @@ class QueryPlanner {
const vector<SparqlFilter>& filters, const TextLimitMap& textLimits,
const TripleGraph& tg) const;

std::vector<QueryPlanner::SubtreePlan> runGreedyPlanningOnConnectedComponent(
std::vector<SubtreePlan> connectedComponent,
const vector<SparqlFilter>& filters, const TextLimitMap& textLimits,
const TripleGraph& tg) const;

// Creates a SubtreePlan for the given text leaf node in the triple graph.
// While doing this the TextLimitMetaObjects are created and updated according
// to the text leaf node.
@@ -529,8 +534,12 @@ class QueryPlanner {
* sorting by the cache key when comparing equally cheap indices, else the
* first element that has the minimum index is returned.
*/
- [[nodiscard]] size_t findCheapestExecutionTree(
+ size_t findCheapestExecutionTree(
const std::vector<SubtreePlan>& lastRow) const;
static size_t findSmallestExecutionTree(
const std::vector<SubtreePlan>& lastRow);
static size_t findUniqueNodeIds(
const std::vector<SubtreePlan>& connectedComponent);

/// if this Planner is not associated with a queryExecutionContext we are only
/// in the unit test mode
3 changes: 2 additions & 1 deletion src/global/RuntimeParameters.h
@@ -47,7 +47,8 @@ inline auto& RuntimeParameters() {
SizeT<"lazy-index-scan-max-size-materialization">{1'000'000},
Bool<"use-binsearch-transitive-path">{true},
Bool<"group-by-hash-map-enabled">{false},
- SizeT<"service-max-value-rows">{100}};
+ SizeT<"service-max-value-rows">{100},
+ Bool<"use-greedy-planning">{false}};
}();
return params;
}