[Kernel] Load the protocol and metadata from the CRC files when available #4077

Open
wants to merge 52 commits into
base: master
Commits
dd9fe7b
[Kernel] Support for loading protocol and metadata from checksum file…
vkorukanti May 15, 2024
df12241
refactor according to the comments
huan233usc Jan 21, 2025
a404846
add logging and fix comments
huan233usc Jan 22, 2025
1e410c5
move comments
huan233usc Jan 22, 2025
1994b2e
update comments
huan233usc Jan 22, 2025
ddc0815
update comments
huan233usc Jan 22, 2025
02067f3
replace static test table with test generated
huan233usc Jan 22, 2025
dec4938
use optional in building crc and use rfii in crc test
huan233usc Jan 22, 2025
6eff126
resolve comments
huan233usc Jan 22, 2025
69c29df
update comments
huan233usc Jan 22, 2025
fec680e
fix test
huan233usc Jan 22, 2025
60981a6
updated param name
huan233usc Jan 22, 2025
97bc44f
update doc
huan233usc Jan 22, 2025
2f5b1a2
fix idention
huan233usc Jan 22, 2025
c7bf343
update the doc to reflect nullness
huan233usc Jan 22, 2025
dbf4490
fix javafmt
huan233usc Jan 22, 2025
f19d48b
check crc info's version, use snapshot for lower bound
huan233usc Jan 22, 2025
25912ed
update comments
huan233usc Jan 22, 2025
44c8445
clean up tests
huan233usc Jan 23, 2025
4122d70
clean unused test methods
huan233usc Jan 23, 2025
017260c
revert unused test methods
huan233usc Jan 23, 2025
ab1115a
clean up unused import
huan233usc Jan 23, 2025
7d89e55
revert unused test methods
huan233usc Jan 23, 2025
2270126
resolve comments
huan233usc Jan 23, 2025
ab3d62e
handle edge case
huan233usc Jan 23, 2025
78bb03c
prefer use crc over checkpoint
huan233usc Jan 23, 2025
9949e90
fix java format
huan233usc Jan 23, 2025
e1532e7
add tests and fix listing bug
huan233usc Jan 23, 2025
69acace
refactor per comments
huan233usc Jan 23, 2025
9993886
handle version is 0
huan233usc Jan 23, 2025
70829a0
merge from latest version
huan233usc Jan 23, 2025
733eaed
fix java format
huan233usc Jan 23, 2025
4202b31
rever accident deleted changes in conflict resolve
huan233usc Jan 23, 2025
37f4251
refactor
huan233usc Jan 23, 2025
7494fd3
fix test attempt 1
huan233usc Jan 23, 2025
3f0e05f
fix test attempt 2
huan233usc Jan 23, 2025
4bb7bad
add tests
huan233usc Jan 24, 2025
e42b70e
fix indent
huan233usc Jan 24, 2025
18ecc74
fix test
huan233usc Jan 24, 2025
7383397
add docs
huan233usc Jan 24, 2025
afd5db1
fix comment
huan233usc Jan 24, 2025
ec3a24b
fix comment
huan233usc Jan 24, 2025
412e0fb
resolve comments
huan233usc Jan 24, 2025
e3d86a9
format java
huan233usc Jan 24, 2025
0dbea3f
update check
huan233usc Jan 24, 2025
588b464
format java
huan233usc Jan 24, 2025
155cf2b
update internal utils to move filter away
huan233usc Jan 24, 2025
8fe533b
fix comment
huan233usc Jan 28, 2025
119d0e0
take while
huan233usc Jan 28, 2025
c189537
adding header
huan233usc Jan 28, 2025
cf54166
fix typo
huan233usc Jan 28, 2025
21ed78f
fix typo
huan233usc Jan 28, 2025
@@ -52,6 +52,9 @@ private FileNames() {}
private static final Pattern CLASSIC_CHECKPOINT_FILE_PATTERN =
Pattern.compile("\\d+\\.checkpoint\\.parquet");

/** Example: 00000000000000000001.crc */
private static final Pattern CHECK_SUM_FILE_PATTERN = Pattern.compile("(\\d+)\\.crc");

/**
* Examples:
*
@@ -69,11 +72,27 @@ private FileNames() {}

public static final String SIDECAR_DIRECTORY = "_sidecars";

private static final Pattern checksumFileRegex = Pattern.compile("(\\d+)\\.crc");
////////////////////////
// Version extractors //
////////////////////////

/** Returns the delta (json format) path for a given delta file. */
public static String deltaFile(Path path, long version) {
return String.format("%s/%020d.json", path, version);
/**
* Get the version of the checkpoint, checksum or delta file. Throws an error if an unexpected
* file type is seen. These unexpected files should be filtered out to ensure forward
* compatibility in cases where new file types are added, but without an explicit protocol
* upgrade.
*/
public static long getFileVersion(Path path) {
if (isCheckpointFile(path.getName())) {
return checkpointVersion(path);
} else if (isCommitFile(path.getName())) {
return deltaVersion(path);
} else if (isChecksumFile(path.getName())) {
return checksumVersion(path);
} else {
throw new IllegalArgumentException(
String.format("Unexpected file type found in transaction log: %s", path));
}
}

/** Returns the version for the given delta path. */
@@ -152,7 +171,7 @@ public static Path topLevelV2CheckpointFile(

/** Returns the path for a V2 sidecar file with a given UUID. */
public static Path v2CheckpointSidecarFile(Path path, String uuid) {
return new Path(String.format("%s/%s/%s.parquet", path.toString(), SIDECAR_DIRECTORY, uuid));
return new Path(String.format("%s/_sidecars/%s.parquet", path.toString(), uuid));
}

/**
@@ -194,32 +213,13 @@ public static boolean isV2CheckpointFile(String path) {
return V2_CHECKPOINT_FILE_PATTERN.matcher(new Path(path).getName()).matches();
}

public static boolean isCommitFile(String fileName) {
String filename = new Path(fileName).getName();
return DELTA_FILE_PATTERN.matcher(filename).matches()
|| UUID_DELTA_FILE_REGEX.matcher(filename).matches();
}

public static boolean isChecksumFile(String checksumFilePath) {
return checksumFileRegex.matcher(new Path(checksumFilePath).getName()).matches();
public static boolean isCommitFile(String path) {
final String fileName = new Path(path).getName();
return DELTA_FILE_PATTERN.matcher(fileName).matches()
|| UUID_DELTA_FILE_REGEX.matcher(fileName).matches();
}

/**
* Get the version of the checkpoint, checksum or delta file. Throws an error if an unexpected
* file type is seen. These unexpected files should be filtered out to ensure forward
* compatibility in cases where new file types are added, but without an explicit protocol
* upgrade.
*/
public static long getFileVersion(Path path) {
if (isCheckpointFile(path.getName())) {
return checkpointVersion(path);
} else if (isCommitFile(path.getName())) {
return deltaVersion(path);
} else if (isChecksumFile(path.getName())) {
return checksumVersion(path);
} else {
throw new IllegalArgumentException(
String.format("Unexpected file type found in transaction log: %s", path));
public static boolean isChecksumFile(String checksumFilePath) {
return CHECK_SUM_FILE_PATTERN.matcher(new Path(checksumFilePath).getName()).matches();
}
}
}
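
For readers skimming the diff, the checksum-file matching and version dispatch can be sketched in isolation roughly as follows. This is a minimal, standalone illustration and not the PR's code: the class name `FileNamesSketch`, the `baseName` helper, and the simplified `DELTA_FILE_PATTERN` are assumptions made for the example, plain strings stand in for the `Path` type, and checkpoint files are left out. Only the `(\\d+)\\.crc` pattern and the error message are taken from the diff above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical standalone sketch; not the Kernel's FileNames class. */
public class FileNamesSketch {
  /** Example: 00000000000000000001.crc (pattern as shown in the diff). */
  private static final Pattern CHECK_SUM_FILE_PATTERN = Pattern.compile("(\\d+)\\.crc");

  /** Example: 00000000000000000001.json (simplified pattern, assumed for this sketch). */
  private static final Pattern DELTA_FILE_PATTERN = Pattern.compile("(\\d+)\\.json");

  /** Returns the last path segment; stands in for Path#getName() in this sketch. */
  static String baseName(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }

  public static boolean isChecksumFile(String path) {
    return CHECK_SUM_FILE_PATTERN.matcher(baseName(path)).matches();
  }

  public static boolean isCommitFile(String path) {
    return DELTA_FILE_PATTERN.matcher(baseName(path)).matches();
  }

  /**
   * Extracts the version from a checksum or commit file name and rejects anything else,
   * so that callers are forced to filter out unknown file types first.
   */
  public static long getFileVersion(String path) {
    String name = baseName(path);
    Matcher crc = CHECK_SUM_FILE_PATTERN.matcher(name);
    if (crc.matches()) {
      return Long.parseLong(crc.group(1));
    }
    Matcher delta = DELTA_FILE_PATTERN.matcher(name);
    if (delta.matches()) {
      return Long.parseLong(delta.group(1));
    }
    throw new IllegalArgumentException(
        String.format("Unexpected file type found in transaction log: %s", path));
  }

  public static void main(String[] args) {
    // The table location below is made up for the example.
    System.out.println(getFileVersion("/tbl/_delta_log/00000000000000000001.crc"));
    System.out.println(isChecksumFile("/tbl/_delta_log/00000000000000000001.json"));
  }
}
```

Because `matches()` requires the whole file name to match, a name such as `00000000000000000001.crc.tmp` is rejected rather than treated as a checksum file, which lines up with the Javadoc's note that unexpected files should be filtered out before `getFileVersion` is called.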
You are viewing a condensed version of this merge commit.