-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate partition scope DLO strategy and persist to DLO table #284
base: main
Are you sure you want to change the base?
Generate partition scope DLO strategy and persist to DLO table #284
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the description with testing done in local docker.
...ark/src/main/java/com/linkedin/openhouse/jobs/spark/DataLayoutStrategyGeneratorSparkApp.java
Outdated
Show resolved
Hide resolved
...t/src/main/java/com/linkedin/openhouse/datalayout/generator/DataLayoutStrategyGenerator.java
Show resolved
Hide resolved
libs/datalayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/TableFileStats.java
Outdated
Show resolved
Hide resolved
Row partition = row.getStruct(2); | ||
if (partition != null) { | ||
for (int i = 0; i < partition.size(); i++) { | ||
partitionValues.add(Objects.toString(partition.get(i))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the string representation look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like "2024-02-01, CA"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should have partCol1=partVal1, partCol2=partVal2, ... ?
...n/java/com/linkedin/openhouse/datalayout/generator/OpenHouseDataLayoutStrategyGenerator.java
Outdated
Show resolved
Hide resolved
|
||
private Optional<DataLayoutStrategy> buildDataLayoutStrategy( | ||
Dataset<Long> fileSizes, int partitionCount, String partitionValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it partitionValue or partitionValues, if there are more than one partition column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partitionValue is the string representation of list of partitionValues.
...va/com/linkedin/openhouse/datalayout/generator/OpenHouseDataLayoutStrategyGeneratorTest.java
Show resolved
Hide resolved
619844f
to
9012af8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jiang95-dev, overall looks great, having a few minor comments
...src/test/java/com/linkedin/openhouse/datalayout/persistence/StrategiesDaoTablePropsTest.java
Outdated
Show resolved
Hide resolved
SparkSession spark, | ||
String outputFqtn, | ||
List<DataLayoutStrategy> strategies, | ||
boolean isPartitionScope) { | ||
if (outputFqtn != null && !strategies.isEmpty()) { | ||
createTableIfNotExists(spark, outputFqtn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create table schema will have partition_id
...n/java/com/linkedin/openhouse/datalayout/generator/OpenHouseDataLayoutStrategyGenerator.java
Outdated
Show resolved
Hide resolved
try (SparkSession spark = getSparkSession()) { | ||
spark.sql("USE openhouse"); | ||
spark.sql( | ||
String.format("CREATE TABLE %s (id INT, data STRING) PARTITIONED BY (id)", testTable)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you do 2 columns, +1 timestamp with day granularity
Summary
Generate partition scope DLO strategy and persist to DLO table.
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.