Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] Minor SnapshotManager refactor followup #4089

Merged

Conversation

scottsand-db
Copy link
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Followup to #4035.

Our SnapshotManager LogSegment construction logic is ... a bit messy, to say the least. This PR makes getLogSegmentForVersion not return an Optional LogSegment.

I also start documenting some of the key steps in constructing a LogSegment. More steps, refactors, and comments will come in future PRs.

How was this patch tested?

Mainly just a refactor. Existing UTs.

Does this PR introduce any user-facing changes?

No.

@scottsand-db scottsand-db self-assigned this Jan 24, 2025
@scottsand-db scottsand-db force-pushed the kernel_snapshot_manager_refactor_3 branch from f9edf00 to 26450b9 Compare January 24, 2025 18:32
@@ -744,12 +735,6 @@ class SnapshotManagerSuite extends AnyFunSuite with MockFileSystemClientUtils {
versionToLoad = Optional.of(17),
expectedErrorMessageContains = "missing log file for version 0"
)
testExpectedError[InvalidTableException](
deltaFileStatuses(15L until 25L) ++ singularCheckpointFileStatuses(Seq(20L)),
startCheckpoint = Optional.of(20),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just throws an IllegalArgumentException when we invoke the log listing util ... which is expected.

)
testExpectedError[RuntimeException](
files,
startCheckpoint = Optional.of(20),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just throws an IllegalArgumentException when we invoke the log listing util ... which is expected.

@scottsand-db scottsand-db force-pushed the kernel_snapshot_manager_refactor_3 branch from 54221a4 to 5cc7b09 Compare January 24, 2025 19:03
Comment on lines 373 to 378
() -> {
logger.info("Loading last checkpoint from the _last_checkpoint file");
return new Checkpointer(logPath)
.readLastCheckpointFile(engine)
.map(x -> x.version);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: in this case (where versionToLoadOpt is empty), didn't we previously default to using findLastCompleteCheckpointBefore if _last_checkpoint didn't exist. Wondering if this is a meaningful change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. We did not do that. findLastCompleteCheckpointBefore is only useful if you can give it a version to search for before. If versionToLoad is empty, then we don't have an end version to compare against.

findLastCompleteCheckpointBefore(engine, logPath, beforeVersion).map(x -> x.version);

// the previous latest complete checkpoint at or before $versionToLoad. //
////////////////////////////////////////////////////////////////////////////////////////////////

final Optional<Long> startCheckpointVersionOpt =
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactor to util

* `versionToLoadOpt` is provided, will use the checkpoint pointed to by the _last_checkpoint
* file.
*/
private Optional<Long> getStartCheckpointVersion(Engine engine, Optional<Long> versionToLoadOpt) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we can also add nice tests for each of these individual getLogSegment util helper methods.

I invision a deep getLogSegment method (as opposed to, basically, getLogSegmentA calls getLogSegmentB which calls getLogSegmentC).

Instead, getLogSegment is 1 large method, and it has little utilities that poke and help out.

Copy link
Collaborator

@allisonport-db allisonport-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome lgtm

@scottsand-db scottsand-db merged commit 57c7c87 into delta-io:master Jan 25, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants