Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lazy join #1524

Open
wants to merge 37 commits into
base: master
Choose a base branch
from
Open

Implement lazy join #1524

wants to merge 37 commits into from

Conversation

RobinTF
Copy link
Collaborator

@RobinTF RobinTF commented Sep 30, 2024

This PR adds support for lazily computed join operations.

Copy link

codecov bot commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 89.77273% with 54 lines in your changes missing coverage. Please review.

Project coverage is 88.99%. Comparing base (bf36257) to head (e43ff76).

Files with missing lines Patch % Lines
src/util/JoinAlgorithms/JoinAlgorithms.h 86.04% 0 Missing and 36 partials ⚠️
src/engine/Join.cpp 93.12% 5 Missing and 13 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1524      +/-   ##
==========================================
+ Coverage   88.95%   88.99%   +0.03%     
==========================================
  Files         366      366              
  Lines       33256    33644     +388     
  Branches     3729     3774      +45     
==========================================
+ Hits        29582    29940     +358     
+ Misses       2440     2433       -7     
- Partials     1234     1271      +37     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@RobinTF RobinTF marked this pull request as ready for review October 3, 2024 22:50
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first round of reviews, let's discuss the complexity of this, which somewhat surprises me
(I already think my original code is too complex).

Comment on lines 477 to 482
for ([[maybe_unused]] std::monostate& _ : generator) {
// Generator should never yield, just consume everything in one go.
AD_FAIL();
}
auto result = std::move(*rowAdder).resultTable();
result.setColumnSubset(joinColMap.permutationResult());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be a function materializedResultFromLazyInput(std::move(generator), std::move(rowAdder), joinColMap)

src/engine/Join.cpp Outdated Show resolved Hide resolved
Comment on lines +96 to +115
// Allows the parameters to be null, which causes them to be ignored.
static LocalVocab mergeVocabsIfNecessary(
const std::shared_ptr<const Result>& result1,
const std::shared_ptr<const Result>& result2);

template <typename T>
ProtoResult monostateGeneratorToResult(
bool requestedLaziness, cppcoro::generator<std::monostate> generator,
std::shared_ptr<const Result> a, std::shared_ptr<const Result> b,
T rowAdder,
ad_utility::InvocableWithExactReturnType<IdTable, T&> auto extractTable,
std::invocable auto postAction) const;

static bool couldContainUndef(const auto& blocks, const auto& tree,
ColumnIndex joinColumn);

ProtoResult lazyJoin(std::shared_ptr<const Result> a, ColumnIndex jc1,
std::shared_ptr<const Result> b, ColumnIndex jc2,
bool requestLaziness) const;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some documentation, what do those functions do?

if (!result.empty()) {
co_yield result;
}
*localVocab = mergeVocabsIfNecessary(a, b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply merge the local vocabs upfront, and don't need to have them inside the inner generator lambda (the shared pointer magic makes all these things work.

Comment on lines +521 to +523
return convertGenerator(std::move(blockOrBlocks));
} else {
static_assert(ad_utility::alwaysFalse<T>, "Unexpected type");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else { static_assert(std::ranges::range<t>, "message) ... }

Also is range the correct concept, don't you explicitly expect generatorfor the convertGeneratorfunction?

Comment on lines +1075 to +1077
auto begL = fullBlockLeft.get().begin();
auto begR = fullBlockRight.get().begin();
auto addRowIndex = [&begL, &begR, this](auto itFromL, auto itFromR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this change?

return [this, begL = fullBlockLeft.get().begin(),
&subrangeRight](auto itFromL) {
if constexpr (potentiallyHasUndef) {
AD_CORRECTNESS_CHECK(!isUndefined_(subrangeRight.front()) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this might be in a somewhat tight loop, precompute the condition, and check after the handling of the block or what not else, that the lambda wasn't called etc. or just make this check AD_EXPENSIVE_CHECK

if constexpr (potentiallyHasUndef) {
AD_CORRECTNESS_CHECK(rightSide_.undefBlocks_.empty());
}
getNextBlocks(equalToCurrentElLeft, leftSide_, rightSide_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Checks are also related to the part of the code i don't understand yet.

Comment on lines +1270 to +1287
while (side.it_ != side.end_) {
const auto& lBlock = *side.it_;
for (const auto& rBlock : undefBlocks) {
if constexpr (reversed) {
compatibleRowAction_.setInput(rBlock.fullBlock(), lBlock);
} else {
compatibleRowAction_.setInput(lBlock, rBlock.fullBlock());
}
for (size_t i : ad_utility::integerRange(lBlock.size())) {
for (size_t j : rBlock.getIndexRange()) {
hasUndef = true;
if constexpr (reversed) {
compatibleRowAction_.addRow(j, i);
} else {
compatibleRowAction_.addRow(i, j);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really duplicated to what you did for the cartesian thing above, can something be filtered out here?

}
}
++side.it_;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I am stopping the review for now.
I am not yet fully sure why the undef support takes three times the code of the old version.
Maybe we can use a design similar to the fully materialized zipper join for bla with undef where the undef handling is completely templated out.

@sparql-conformance
Copy link

Copy link

sonarcloud bot commented Oct 18, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants