Implement lazy join #1524

Open
wants to merge 37 commits into
base: master
37 commits
1840f4c
Make stuff static or const where possible
RobinTF Sep 24, 2024
a287bf3
Cleanup join code
RobinTF Sep 26, 2024
1cc9f8b
Extend BlockZipperJoinImpl to handle undef values correctly
RobinTF Sep 27, 2024
2f3f4fa
Remove redundant semicolons
RobinTF Sep 29, 2024
56d334a
Fix condition
RobinTF Sep 29, 2024
d9bccea
Make first prototype for lazy joins
RobinTF Sep 30, 2024
243ea2b
Add comment and try to fix warnings in clang
RobinTF Sep 30, 2024
465baa2
Add Unit tests for join algorithm
RobinTF Oct 1, 2024
3508a92
Properly call flush
RobinTF Oct 1, 2024
b77230e
Fix testing code
RobinTF Oct 1, 2024
01e2d8e
Add another unit test and fix bug
RobinTF Oct 2, 2024
856c91c
Merge remote-tracking branch 'ad-freiburg/master' into lazy-join
RobinTF Oct 2, 2024
6e5558e
Split up code into dedicated functions
RobinTF Oct 2, 2024
2834a28
Reuse code and fix sonarcloud issues
RobinTF Oct 2, 2024
1a74800
Make join with undef work
RobinTF Oct 2, 2024
7ef6b5b
Simplify functions further
RobinTF Oct 2, 2024
64d404d
Allow lazy generator output of join
RobinTF Oct 3, 2024
ac63c9c
Skip undefined if guaranteed to be undefined
RobinTF Oct 3, 2024
1fdec46
Use contract check instead of correctness check
RobinTF Oct 4, 2024
4baaf81
Add some tests and fix lifetime issues with coroutines
RobinTF Oct 5, 2024
831507b
Merge remote-tracking branch 'ad-freiburg/master' into lazy-join
RobinTF Oct 5, 2024
c6f0525
Add more unit tests
RobinTF Oct 6, 2024
fefe3bb
Generalize interfaces and unify code
RobinTF Oct 6, 2024
5b91fb1
Don't repeat code
RobinTF Oct 6, 2024
4d535a8
Fix typo
RobinTF Oct 6, 2024
9fe9ef2
Adjust tests to cover their intended code
RobinTF Oct 6, 2024
3ac9faf
Optimize code again for coverage
RobinTF Oct 6, 2024
b9f2594
Fix some sonarcloud issues
RobinTF Oct 6, 2024
3c2a473
Use helper function to consume single-step generator
RobinTF Oct 8, 2024
9a379ce
Merge remote-tracking branch 'ad-freiburg/master' into lazy-join
RobinTF Oct 8, 2024
b7536c4
Generalize interface for monostate conversion
RobinTF Oct 8, 2024
1733d27
Allow IndexScan optimizations to yield lazy generators
RobinTF Oct 8, 2024
fd0b633
Remove unused member from join class
RobinTF Oct 8, 2024
0f83805
Improve partial coverage a bit
RobinTF Oct 9, 2024
bc09d1e
Add variants of test cases for greater coverage
RobinTF Oct 9, 2024
3db3229
Merge branch 'master' into lazy-join
RobinTF Oct 18, 2024
e43ff76
Merge remote-tracking branch 'ad-freiburg/master' into lazy-join
RobinTF Oct 18, 2024
5 changes: 5 additions & 0 deletions src/engine/AddCombinedRowToTable.h
@@ -181,6 +181,11 @@ class AddCombinedRowToIdTable {
}
}

+ IdTable& resultTable() & {
+ flush();
+ return resultTable_;
+ }
+
// Move the result out after the last write. The function ensures, that the
// `flush()` is called before doing so.
IdTable&& resultTable() && {
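The new overload relies on ref-qualified member functions: the `&` version is chosen for lvalues and hands out a reference, while the `&&` version is chosen for rvalues and lets the caller move the table out. Both flush first. A minimal sketch of that pattern follows; `BufferedWriter`, `result()` and `demoWriter()` are hypothetical stand-ins using `std::vector<int>`, not the actual `AddCombinedRowToIdTable` class.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered row writer; not the QLever class.
class BufferedWriter {
  std::vector<int> buffer_;  // rows not yet written to the result
  std::vector<int> result_;  // rows already written

 public:
  void add(int row) { buffer_.push_back(row); }

  // Move all buffered rows into the result.
  void flush() {
    result_.insert(result_.end(), buffer_.begin(), buffer_.end());
    buffer_.clear();
  }

  // Chosen for lvalues: flush, then hand out a reference.
  // The writer stays usable afterwards.
  std::vector<int>& result() & {
    flush();
    return result_;
  }

  // Chosen for rvalues: flush, then let the caller move the result out.
  std::vector<int>&& result() && {
    flush();
    return std::move(result_);
  }
};

// Inspect an intermediate result, keep writing, then move the result out.
inline std::vector<int> demoWriter() {
  BufferedWriter writer;
  writer.add(1);
  writer.add(2);
  std::size_t sizeAfterTwo = writer.result().size();  // lvalue overload
  assert(sizeAfterTwo == 2);
  writer.add(3);
  return std::move(writer).result();  // rvalue overload
}
```

The lvalue overload is what makes it possible to look at the table between writes, which a producer that emits results block by block would presumably need.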
445 changes: 310 additions & 135 deletions src/engine/Join.cpp

Large diffs are not rendered by default.

34 changes: 27 additions & 7 deletions src/engine/Join.h
@@ -11,7 +11,7 @@
#include "engine/QueryExecutionTree.h"
#include "util/HashMap.h"
#include "util/HashSet.h"
- #include "util/JoinAlgorithms/JoinAlgorithms.h"
+ #include "util/TypeTraits.h"

class Join : public Operation {
private:
@@ -93,6 +93,26 @@ class Join : public Operation {
void join(const IdTable& a, ColumnIndex jc1, const IdTable& b,
ColumnIndex jc2, IdTable* result) const;

+ // Allows the parameters to be null, which causes them to be ignored.
+ static LocalVocab mergeVocabsIfNecessary(
+ const std::shared_ptr<const Result>& result1,
+ const std::shared_ptr<const Result>& result2);
+
+ template <typename T>
+ ProtoResult monostateGeneratorToResult(
+ bool requestedLaziness, cppcoro::generator<std::monostate> generator,
+ std::shared_ptr<const Result> a, std::shared_ptr<const Result> b,
+ T rowAdder,
+ ad_utility::InvocableWithExactReturnType<IdTable, T&> auto extractTable,
+ std::invocable auto postAction) const;
+
+ static bool couldContainUndef(const auto& blocks, const auto& tree,
+ ColumnIndex joinColumn);
+
+ ProtoResult lazyJoin(std::shared_ptr<const Result> a, ColumnIndex jc1,
+ std::shared_ptr<const Result> b, ColumnIndex jc2,
+ bool requestLaziness) const;
+
Comment on lines +96 to +115
Member

Please add some documentation. What do these functions do?

/**
* @brief Joins IdTables dynA and dynB on join column jc2, returning
* the result in dynRes. Creates a cross product for matching rows by putting
@@ -113,24 +133,24 @@ class Join : public Operation {
virtual string getCacheKeyImpl() const override;

private:
- ProtoResult computeResult([[maybe_unused]] bool requestLaziness) override;
+ ProtoResult computeResult(bool requestLaziness) override;

VariableToColumnMap computeVariableToColumnMap() const override;

// A special implementation that is called when both children are
// `IndexScan`s. Uses the lazy scans to only retrieve the subset of the
// `IndexScan`s that is actually needed without fully materializing them.
- IdTable computeResultForTwoIndexScans();
+ ProtoResult computeResultForTwoIndexScans(bool requestLaziness) const;

// A special implementation that is called when one of the children is an
// `IndexScan`. The argument `scanIsLeft` determines whether the `IndexScan`
// is the left or the right child of this `Join`. This needs to be known to
// determine the correct order of the columns in the result.
template <bool scanIsLeft>
- IdTable computeResultForIndexScanAndIdTable(const IdTable& idTable,
- ColumnIndex joinColTable,
- IndexScan& scan,
- ColumnIndex joinColScan);
+ ProtoResult computeResultForIndexScanAndIdTable(
+ bool requestLaziness, const IdTable& idTable, ColumnIndex joinColTable,
+ std::shared_ptr<IndexScan> scan, ColumnIndex joinColScan,
+ const std::shared_ptr<const Result>& subResult = nullptr) const;

/*
* @brief Combines 2 rows like in a join and inserts the result in the
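The declarations in `Join.h` now thread a `requestLaziness` flag through the join functions. The `Join.cpp` diff is not rendered here, so the following is only a rough sketch of the general idea, not the PR's implementation: a zipper (merge) join over two sorted key columns that either hands out its result in blocks as they fill up, or as a single block at the end. `zipperJoinSketch`, `collectBlocks`, the `blockSize` parameter and the use of `std::vector<int>` in place of `IdTable` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Join two sorted key columns. Every pair of equal keys yields one output row
// (here just the key). Finished blocks are passed to `onBlock`. If
// `requestLaziness` is false, the whole result arrives as one single block.
inline void zipperJoinSketch(
    const std::vector<int>& left, const std::vector<int>& right,
    bool requestLaziness, std::size_t blockSize,
    const std::function<void(std::vector<int>)>& onBlock) {
  std::vector<int> current;
  auto emitRow = [&](int key) {
    current.push_back(key);
    if (requestLaziness && current.size() >= blockSize) {
      onBlock(std::move(current));
      current.clear();  // put the moved-from vector into a known state
    }
  };
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < left.size() && j < right.size()) {
    if (left[i] < right[j]) {
      ++i;
    } else if (right[j] < left[i]) {
      ++j;
    } else {
      // Find the runs of equal keys on both sides and emit their cross product.
      int key = left[i];
      std::size_t iEnd = i;
      std::size_t jEnd = j;
      while (iEnd < left.size() && left[iEnd] == key) ++iEnd;
      while (jEnd < right.size() && right[jEnd] == key) ++jEnd;
      for (std::size_t n = 0; n < (iEnd - i) * (jEnd - j); ++n) emitRow(key);
      i = iEnd;
      j = jEnd;
    }
  }
  // Hand out the remainder; in the non-lazy case this is the complete result.
  if (!current.empty() || !requestLaziness) onBlock(std::move(current));
}

// Convenience wrapper that records every block it receives.
inline std::vector<std::vector<int>> collectBlocks(
    const std::vector<int>& left, const std::vector<int>& right,
    bool requestLaziness, std::size_t blockSize) {
  std::vector<std::vector<int>> blocks;
  zipperJoinSketch(left, right, requestLaziness, blockSize,
                   [&blocks](std::vector<int> b) { blocks.push_back(std::move(b)); });
  return blocks;
}
```

A callback is used here only to keep the sketch free of coroutine machinery; the PR itself declares its lazy pieces in terms of `cppcoro::generator`.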
7 changes: 4 additions & 3 deletions src/index/IndexImpl.cpp
@@ -125,9 +125,10 @@ auto lazyOptionalJoinOnFirstColumn(auto& leftInput, auto& rightInput,
std::make_shared<ad_utility::CancellationHandle<>>(),
BUFFER_SIZE_JOIN_PATTERNS_WITH_OSP, resultCallback};

- ad_utility::zipperJoinForBlocksWithoutUndef(leftInput, rightInput, comparator,
- rowAdder, projection, projection,
- std::true_type{});
+ auto generator = ad_utility::zipperJoinForBlocksWithoutUndef(
+ true, std::ranges::ref_view(leftInput), std::ranges::ref_view(rightInput),
+ comparator, rowAdder, projection, projection, std::true_type{});
+ ad_utility::consumeSingleStepGenerator(generator);
+ rowAdder.flush();
rowAdder.flush();
}
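At this call site the join function now returns a generator, so the join work presumably only happens once that generator is driven, which is why it is consumed before `rowAdder.flush()`. The sketch below illustrates that deferred-execution behavior with a hand-written pull-style object instead of a coroutine. `DeferredSteps`, `makeLazyCopy` and `consumeAll` are hypothetical names; this is not the API of `cppcoro::generator` or of `ad_utility::consumeSingleStepGenerator`, whose exact semantics are not visible in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lazy generator: nothing runs until `step()` is
// called.
class DeferredSteps {
  // Performs one unit of work; returns false once there is nothing left to do.
  std::function<bool()> step_;

 public:
  explicit DeferredSteps(std::function<bool()> step) : step_(std::move(step)) {}
  bool step() { return step_(); }
};

// Create the work lazily: each step appends one element of `rows` to `output`.
// Both vectors must outlive the returned object.
inline DeferredSteps makeLazyCopy(const std::vector<int>& rows,
                                  std::vector<int>& output) {
  std::size_t next = 0;
  return DeferredSteps{[&rows, &output, next]() mutable {
    if (next == rows.size()) return false;
    output.push_back(rows[next++]);
    return true;
  }};
}

// Drive the object to completion and report how many steps did work.
inline std::size_t consumeAll(DeferredSteps& steps) {
  std::size_t count = 0;
  while (steps.step()) ++count;
  return count;
}
```

Creating the object with `makeLazyCopy` leaves `output` untouched; only `consumeAll` fills it. Forgetting to consume the returned object would silently skip the work, which is the failure mode the added consume call in the diff guards against.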
