-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet: Clean up Parquet generic and internal readers #12102
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is great, thanks for splitting into multiple commits, that made it easy to review. We'll need to update revAPI with the new ParquetValueReader.createStructReader
interface which doesn't take in explicit types
protected abstract ParquetValueReader<T> createStructReader( | ||
List<Type> types, List<ParquetValueReader<?>> fieldReaders, Types.StructType structType); | ||
List<ParquetValueReader<?>> fieldReaders, Types.StructType structType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RevAPI is failing due to the new abstract method that implementations will need to implement (which I understand the rationale, the previous types argument were not used). I think we'll need to add the breaking change to revAPI
./gradlew :iceberg-parquet:revapiAcceptBreak --justification "Implementations of ParquetValueReader.createStructReader should not have to pass in explicit types" \
--code "java.method.abstractMethodAdded" \
--new "method org.apache.iceberg.parquet.ParquetValueReader<T> org.apache.iceberg.data.parquet.BaseParquetReaders<T>::createStructReader(java.util.List<org.apache.iceberg.parquet.ParquetValueReader<?>>, org.apache.iceberg.types.Types.StructType)"
This is a refactor that cleans up a few issues I noticed while reviewing #11904 and while working on Parquet variant readers.
ParquetValueReaders
to keep implementations privatetypes
argument fromParquetValueReader.StructReader
subclasses. Some accessible uses had to be left, but most are removed. This allowed removing types from the reader builders as well.TimeUnit
param from factory methods added in Parquet: Add readers and writers for the internal object model #11904setPageSource
LogicalTypeAnnotation
classes to cut down on multi-line method signaturesChanges are grouped into separate commits for easier review.