-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Minor SnapshotManager refactor followup #4089
[Kernel] Minor SnapshotManager refactor followup #4089
Conversation
f9edf00
to
26450b9
Compare
@@ -744,12 +735,6 @@ class SnapshotManagerSuite extends AnyFunSuite with MockFileSystemClientUtils { | |||
versionToLoad = Optional.of(17), | |||
expectedErrorMessageContains = "missing log file for version 0" | |||
) | |||
testExpectedError[InvalidTableException]( | |||
deltaFileStatuses(15L until 25L) ++ singularCheckpointFileStatuses(Seq(20L)), | |||
startCheckpoint = Optional.of(20), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this just throws an IllegalArgumentException when we invoke the log listing util ... which is expected.
) | ||
testExpectedError[RuntimeException]( | ||
files, | ||
startCheckpoint = Optional.of(20), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this just throws an IllegalArgumentException when we invoke the log listing util ... which is expected.
54221a4
to
5cc7b09
Compare
() -> { | ||
logger.info("Loading last checkpoint from the _last_checkpoint file"); | ||
return new Checkpointer(logPath) | ||
.readLastCheckpointFile(engine) | ||
.map(x -> x.version); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: in this case (where versionToLoadOpt
is empty), didn't we previously default to using findLastCompleteCheckpointBefore
if _last_checkpoint
didn't exist. Wondering if this is a meaningful change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. We did not do that. findLastCompleteCheckpointBefore
is only useful if you can give it a version to search for before. If versionToLoad is empty, then we don't have an end version to compare against.
delta/kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java
Line 381 in 107b2d3
findLastCompleteCheckpointBefore(engine, logPath, beforeVersion).map(x -> x.version); |
kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java
Outdated
Show resolved
Hide resolved
// the previous latest complete checkpoint at or before $versionToLoad. // | ||
//////////////////////////////////////////////////////////////////////////////////////////////// | ||
|
||
final Optional<Long> startCheckpointVersionOpt = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor to util
* `versionToLoadOpt` is provided, will use the checkpoint pointed to by the _last_checkpoint | ||
* file. | ||
*/ | ||
private Optional<Long> getStartCheckpointVersion(Engine engine, Optional<Long> versionToLoadOpt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we can also add nice tests for each of these individual getLogSegment util helper methods.
I invision a deep getLogSegment method (as opposed to, basically, getLogSegmentA calls getLogSegmentB which calls getLogSegmentC).
Instead, getLogSegment is 1 large method, and it has little utilities that poke and help out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome lgtm
Which Delta project/connector is this regarding?
Description
Followup to #4035.
Our SnapshotManager LogSegment construction logic is ... a bit messy, to say the least. This PR makes
getLogSegmentForVersion
not return an Optional LogSegment.I also start documenting some of the key steps in constructing a LogSegment. More steps, refactors, and comments will come in future PRs.
How was this patch tested?
Mainly just a refactor. Existing UTs.
Does this PR introduce any user-facing changes?
No.