Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionally use sequential stored field reader in LuceneSyntheticSourceChangesSnapshot #121636

Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,15 @@ public LuceneSyntheticSourceChangesSnapshot(
this.maxMemorySizeInBytes = maxMemorySizeInBytes > 0 ? maxMemorySizeInBytes : 1;
this.sourceLoader = mapperService.mappingLookup().newSourceLoader(null, SourceFieldMetrics.NOOP);
Set<String> storedFields = sourceLoader.requiredStoredFields();
this.storedFieldLoader = StoredFieldLoader.create(false, storedFields);

// If more than a few ops are requested let's enforce a sequential reader. Typically, thousands of ops may be requested.
// Given how LuceneSyntheticSourceChangesSnapshot accesses stored field, it should always benefit from using sequential reader.
//
// A sequential reader decompresses a block eagerly, so that increasing adjacent doc ids can access stored fields without
// compressing on each StoredFields#document(docId) invocation. The only downside is the last few operations in the request
// seq_no range are at the beginning of a block, which means stored fields for many docs are being decompressed that isn't used.
Copy link
Contributor

@henningandersen henningandersen Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true also for zstd? Or rather would that always decompress the entire thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it both true for zstd or deflate, since both de-compress an entire block.

boolean shouldForceSequentialReader = toSeqNo - fromSeqNo > 10;
this.storedFieldLoader = StoredFieldLoader.create(false, storedFields, shouldForceSequentialReader);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use always force a sequential reader. I think no matter the stored field format, it has benefits given how LuceneSyntheticSourceChangesSnapshot access stored fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure. Does this not depend upon how jumpy the data is, which with index sorting could be very jumpy - and whether there is more than one stored field per doc (not sure that is always true?)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is why then changed to logic not look at index.codec setting anymore.

this.lastSeenSeqNo = fromSeqNo - 1;
}

Expand Down