Use one LocalVocab for each block during lazy evaluation #1567

Merged
merged 10 commits into from
Oct 22, 2024

Conversation

RobinTF
Collaborator

@RobinTF RobinTF commented Oct 19, 2024

Lazy operations now yield pairs of (IdTable, LocalVocab for that IdTable). Previously, there was a single (potentially large) local vocab for the complete result, even if the result was lazy. The new structure can dramatically reduce RAM usage for lazy results that contain many local vocab entries. Consider, for example, queries like the following:

SELECT * WHERE {
  ?s ?p ?o 
  BIND( CONCAT(?o, "suffix") as ?bound)
}

Previously, all result strings of the BIND operation were kept in RAM at the same time, even though the complete query (index scan + BIND + export) could be computed lazily.
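To make the memory argument concrete, here is a minimal sketch under simplifying assumptions: `LocalVocab`, `IdTableVocabPair`, `LazyResult`, `lazyBindConcat`, and `exportAndMeasurePeakVocab` are hypothetical stand-ins rather than QLever's real types, and a pull-based `std::function` replaces the actual generator. Each yielded block owns its own vocab, so only one block's strings are alive while the export consumes them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins (not QLever's real classes).
struct LocalVocab {
  std::vector<std::string> words;  // strings created during the query
};

struct IdTableVocabPair {
  std::vector<std::size_t> idTable;  // one column of indices into localVocab
  LocalVocab localVocab;             // owned: freed together with the block
};

// Pull-based stand-in for a lazy result: std::nullopt signals the end.
using LazyResult = std::function<std::optional<IdTableVocabPair>()>;

// Lazily computes BIND(CONCAT(?o, suffix) AS ?bound) block by block. Each
// output block gets a fresh LocalVocab holding only that block's strings.
LazyResult lazyBindConcat(std::vector<std::vector<std::string>> inputBlocks,
                          std::string suffix) {
  return [blocks = std::move(inputBlocks), suffix = std::move(suffix),
          next = std::size_t{0}]() mutable -> std::optional<IdTableVocabPair> {
    if (next == blocks.size()) return std::nullopt;
    IdTableVocabPair out;
    for (const auto& o : blocks[next]) {
      out.idTable.push_back(out.localVocab.words.size());
      out.localVocab.words.push_back(o + suffix);
    }
    ++next;
    return out;
  };
}

// Consumes the result like an export would and returns the largest number
// of local vocab entries that were alive at the same time.
std::size_t exportAndMeasurePeakVocab(LazyResult result,
                                      std::vector<std::string>& exported) {
  std::size_t peak = 0;
  while (auto block = result()) {
    peak = std::max(peak, block->localVocab.words.size());
    for (std::size_t id : block->idTable) {
      assert(id < block->localVocab.words.size());
      exported.push_back(block->localVocab.words[id]);
    }
    // `block` (and its LocalVocab) is destroyed before the next one is pulled.
  }
  return peak;
}
```

For three input blocks with 2, 2, and 1 rows, the peak in this sketch is 2 vocab entries, whereas a single vocab for the whole result would end up holding all 5.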


codecov bot commented Oct 19, 2024

Codecov Report

Attention: Patch coverage is 97.37991% with 6 lines in your changes missing coverage. Please review.

Project coverage is 88.98%. Comparing base (2ebca4d) to head (311ca96).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/engine/Result.h | 75.00% | 0 Missing and 2 partials ⚠️ |
| src/engine/Bind.cpp | 96.00% | 1 Missing ⚠️ |
| src/engine/ExportQueryExecutionTrees.cpp | 97.29% | 0 Missing and 1 partial ⚠️ |
| src/engine/Filter.cpp | 95.65% | 0 Missing and 1 partial ⚠️ |
| src/engine/GroupBy.cpp | 96.55% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1567      +/-   ##
==========================================
+ Coverage   88.97%   88.98%   +0.01%     
==========================================
  Files         368      368              
  Lines       33819    33860      +41     
  Branches     3826     3828       +2     
==========================================
+ Hits        30090    30131      +41     
  Misses       2473     2473              
  Partials     1256     1256              


Member

@joka921 joka921 left a comment


I made a first pass over everything except the tests; the suggestions are mostly refactorings.
Do we really need two structs of [IdTable, LocalVocab] in the Result class?

src/engine/Result.h (resolved)
Comment on lines 97 to 99
void mergeWith(R&& vocabs) {
  auto inserter = std::back_inserter(otherWordSets_);
  for (const auto& vocab : AD_FWD(vocabs)) {
Member


This can take const R& vocabs instead, without AD_FWD, since the loop only reads from vocabs.
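A minimal sketch of the suggested signature, using a hypothetical, heavily simplified `LocalVocab` (not QLever's real class) in which merging shares the other vocabs' word sets through shared pointers. Because the loop only reads from the range, a `const R&` parameter binds to lvalues and temporaries alike, and no forwarding is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified LocalVocab: one primary word set that this vocab
// writes to, plus shared, read-only word sets taken over from other vocabs.
class LocalVocab {
 public:
  using WordSet = std::vector<std::string>;

  void addWord(std::string word) { primary_->push_back(std::move(word)); }

  // The loop only reads from `vocabs`, so `const R&` suffices and no
  // forwarding is needed. Works for any range of LocalVocab.
  template <typename R>
  void mergeWith(const R& vocabs) {
    auto inserter = std::back_inserter(otherWordSets_);
    for (const auto& vocab : vocabs) {
      // Share the other vocab's word sets instead of copying the strings.
      *inserter = vocab.primary_;
      for (const auto& set : vocab.otherWordSets_) *inserter = set;
    }
  }

  std::size_t size() const {
    std::size_t n = primary_->size();
    for (const auto& set : otherWordSets_) n += set->size();
    return n;
  }

 private:
  std::shared_ptr<WordSet> primary_ = std::make_shared<WordSet>();
  std::vector<std::shared_ptr<const WordSet>> otherWordSets_;
};
```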

Member


All the merge/share routines could skip all empty local vocabs.
That is an easy optimization.

Collaborator Author


> All the merge/share routines could skip all empty local vocabs. That is an easy optimization.

Is it, though? To determine whether a vocab is empty, we have to compute its size, and that iterates over the same data structures just to calculate the size.
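A toy model of this cost argument, under the assumption (ours, not stated in the thread) that a vocab is a list of shared word sets and that its size is the sum over those sets. `Vocab`, `merge`, and the visit counter are hypothetical. The counter shows that checking emptiness visits every word set of an input, which is the same work as sharing those sets during the merge.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified vocab: a list of shared, read-only word sets.
struct Vocab {
  using WordSet = std::vector<std::string>;
  std::vector<std::shared_ptr<const WordSet>> wordSets;

  // Computing the size has to visit every word set.
  std::size_t size(std::size_t& visits) const {
    std::size_t n = 0;
    for (const auto& set : wordSets) {
      ++visits;
      n += set->size();
    }
    return n;
  }
};

// Merging shares every word set of every input: one visit per word set.
// With `skipEmpty`, each input's size is computed first.
void merge(Vocab& target, const std::vector<Vocab>& inputs, bool skipEmpty,
           std::size_t& visits) {
  for (const auto& input : inputs) {
    if (skipEmpty && input.size(visits) == 0) continue;
    for (const auto& set : input.wordSets) {
      ++visits;
      target.wordSets.push_back(set);
    }
  }
}
```

In this model the check never saves visits: for a non-empty input it doubles them, and for an empty input it only avoids storing a pointer to an empty word set.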

src/engine/GroupBy.h (outdated, resolved)
src/engine/GroupBy.cpp (outdated, resolved)
src/engine/GroupBy.cpp (outdated, resolved)
src/engine/ExportQueryExecutionTrees.cpp (outdated, resolved)
src/engine/ExportQueryExecutionTrees.cpp (outdated, resolved)
Collaborator Author

RobinTF commented Oct 21, 2024

@joka921

> Do we really need two structs of [IdTable, LocalVocab] in the Result class?

Kind of, yes. Non-lazy results are always accessed through a shared pointer to a const Result, because that Result might be stored in the cache or used elsewhere, so it can't be modified from there. Lazy results, in contrast, yield mutable, owned IdTables from their respective generators, which can be modified because at most a copy of them can end up in the cache. So the two structs encode two different philosophies, and the same applies to the LocalVocab. This could be unified, but it would force every generator to create shared pointers just to conform to the format.
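The two ownership models can be sketched as follows. This is an illustration under stated assumptions, not QLever's code: `IdTable` and `LocalVocab` are simplified stand-ins, and `OwnedPair`, `SharedPair`, and `toShared` are hypothetical names. The owned pair can be mutated freely, while the shared pair only exposes the vocab through a pointer to const.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; QLever's real IdTable and LocalVocab differ.
using IdTable = std::vector<std::vector<std::uint64_t>>;
struct LocalVocab {
  std::vector<std::string> words;
};

// Yielded by a lazy generator: owned and mutable, because at most a copy of
// it can end up in the cache.
struct OwnedPair {
  IdTable idTable;
  LocalVocab localVocab;
};

// Part of a fully materialized result that may be cached and read by several
// consumers at once, so the vocab is only reachable through a const pointer.
struct SharedPair {
  IdTable idTable;
  std::shared_ptr<const LocalVocab> localVocab;
};

// Converting a lazy block into the shared form is one-way: afterwards, no
// one can modify the vocab any more.
SharedPair toShared(OwnedPair&& pair) {
  return {std::move(pair.idTable),
          std::make_shared<LocalVocab>(std::move(pair.localVocab))};
}
```

Unifying the two would mean every generator allocates a `shared_ptr` per yielded block, which is the overhead the reply above wants to avoid.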

Member

@joka921 joka921 left a comment


The GROUP BY probably has a subtle lifetime bug (see my earlier comment); otherwise, there are only a few suggestions.

src/engine/Filter.cpp (outdated, resolved)
src/engine/Bind.cpp (outdated, resolved)
src/engine/Union.cpp (outdated, resolved)
src/engine/Union.cpp (outdated, resolved)
test/FilterTest.cpp (resolved)
@sparql-conformance

sonarcloud bot commented Oct 22, 2024

@@ -533,7 +533,8 @@ Result::Generator GroupBy::computeResultLazily(
       // Reuse buffer if not moved out
       resultTable = std::move(outputPair.idTable_);
       resultTable.clear();
-      currentLocalVocab = LocalVocab{};
+      // Keep last local vocab for next commit.
+      currentLocalVocab = std::move(storedLocalVocabs.back());
Member


I don't think this is necessarily correct.
If your group is very long (e.g. it started 5 IdTables ago), then you also have to store the local vocab that belongs to exactly that group.
So you probably need a local vocab that is associated with the currentGroupBlock and is updated exactly when you call commitRow (because that is when you implicitly update the next group block, isn't it?).
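A toy model of the reviewer's suggestion, with simplifications that are our own assumptions: Ids of local vocab entries are modeled as pointers to strings (so the owning word set must stay alive), the input is already sorted by the group key, and the only aggregate is COUNT(*). `InputBlock`, `OutputBlock`, `countGroups`, and `keysAreBacked` are hypothetical names. The vocab is tracked per group, set when a group starts, and attached to the output block when the row is committed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model (hypothetical, not QLever's types): an Id of a local vocab entry
// is a pointer to the string, so the word set owning it must stay alive.
using WordSet = std::vector<std::string>;  // never modified after creation
using Id = const std::string*;

struct InputBlock {
  std::vector<Id> keys;                  // group keys, sorted across blocks
  std::shared_ptr<const WordSet> vocab;  // owns the strings `keys` point to
};

struct OutputBlock {
  std::vector<std::pair<Id, std::size_t>> rows;        // (key, COUNT(*))
  std::vector<std::shared_ptr<const WordSet>> vocabs;  // keeps keys alive
};

// Lazy GROUP BY with COUNT(*) over sorted blocks. A group may start in an
// earlier block than the one in which it is committed, so the vocab is
// tracked per group instead of being taken from the latest input block.
std::vector<OutputBlock> countGroups(const std::vector<InputBlock>& input) {
  std::vector<OutputBlock> output;
  std::optional<Id> currentKey;
  std::size_t count = 0;
  std::shared_ptr<const WordSet> currentGroupVocab;

  auto commitRow = [&](OutputBlock& out) {
    out.rows.emplace_back(*currentKey, count);
    out.vocabs.push_back(currentGroupVocab);  // vocab of the group's start
  };

  for (const auto& block : input) {
    OutputBlock out;
    for (Id key : block.keys) {
      if (currentKey && **currentKey == *key) {
        ++count;
        continue;
      }
      if (currentKey) commitRow(out);
      // A new group starts here, so its key lives in this block's vocab.
      currentKey = key;
      count = 1;
      currentGroupVocab = block.vocab;
    }
    output.push_back(std::move(out));
  }
  if (currentKey) {
    OutputBlock last;
    commitRow(last);
    output.push_back(std::move(last));
  }
  return output;
}

// True iff every key in `out` points into one of the vocabs attached to it.
bool keysAreBacked(const OutputBlock& out) {
  return std::all_of(out.rows.begin(), out.rows.end(), [&](const auto& row) {
    return std::any_of(out.vocabs.begin(), out.vocabs.end(),
                       [&](const auto& vocab) {
                         return std::any_of(vocab->begin(), vocab->end(),
                                            [&](const std::string& s) {
                                              return &s == row.first;
                                            });
                       });
  });
}
```

With input blocks [a, b] and [b, b, c], the group for b starts in the first block but is committed while processing the second, so in this model the committed row must carry the first block's vocab. Attaching the latest block's vocab instead would leave the key dangling once the first block is released.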

@joka921 joka921 changed the title Make LocalVocab part of generator for lazy results Use one LocalVocab for each block during lazy evaluation Oct 22, 2024
@joka921 joka921 merged commit c70d5e9 into ad-freiburg:master Oct 22, 2024
22 checks passed
@RobinTF RobinTF deleted the local-vocab-generator branch October 22, 2024 19:06

2 participants