ISSUE-16094: fix s3 storage parquet structureFormat ingestion (#18660)
This aims to fix S3 ingestion for Parquet files. Currently, the pipeline breaks if it encounters a file in the container that is not valid Parquet. This is a problem because containers may hold non-Parquet files on purpose, for example the _SUCCESS marker files created by Spark. To address this, a failure in a single container no longer fails the whole pipeline: it is counted as a failure, and ingestion moves on to the remaining containers. This is already an improvement, but ideally the ingestion should try a few more files under the given prefix before giving up. Additionally, users could be allowed to specify file patterns to ignore.

Co-authored-by: Abdallah Serghine <[email protected]>
Co-authored-by: Pere Miquel Brull <[email protected]>
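The control flow described above (count a failing container and keep going, rather than aborting the run) can be sketched roughly as follows. This is a minimal, standalone illustration, not the actual OpenMetadata code: `ingest_containers`, `check_parquet`, `IngestionStatus` and `NotParquetError` are hypothetical names, and the validity check only looks for the 4-byte `PAR1` magic that begins and ends every Parquet file, standing in for a real schema read. The `max_attempts` and `ignore_patterns` parameters are also hypothetical and model the follow-ups the message proposes. With their defaults, the sketch corresponds to the behaviour this commit introduces.

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


class NotParquetError(ValueError):
    """Raised when an object does not look like a Parquet file (hypothetical)."""


def check_parquet(key: str, data: bytes) -> None:
    """Cheap stand-in for a real schema read: require the PAR1 header and footer."""
    if len(data) < 8 or not (data.startswith(PARQUET_MAGIC) and data.endswith(PARQUET_MAGIC)):
        raise NotParquetError(f"{key} is not a valid Parquet file")


@dataclass
class IngestionStatus:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # container name -> reason


def ingest_containers(
    containers: Dict[str, Sequence[Tuple[str, bytes]]],
    max_attempts: int = 1,
    ignore_patterns: Sequence[str] = (),
) -> IngestionStatus:
    """Process each container independently; one bad container never aborts the run."""
    status = IngestionStatus()
    for name, objects in containers.items():
        # Hypothetical follow-up: drop keys matching user patterns such as "*_SUCCESS".
        candidates = [
            (key, data)
            for key, data in objects
            if not any(fnmatch.fnmatchcase(key, pat) for pat in ignore_patterns)
        ]
        last_error = "no candidate files"
        for key, data in candidates[:max_attempts]:
            try:
                check_parquet(key, data)
            except NotParquetError as exc:
                last_error = str(exc)
                continue  # try the next file under the prefix while attempts remain
            status.succeeded.append(name)
            break
        else:
            # Every attempt failed (or nothing to try): record it and move on.
            status.failed[name] = last_error
    return status


if __name__ == "__main__":
    containers = {
        "sales": [("sales/_SUCCESS", b""), ("sales/part-0.parquet", b"PAR1....PAR1")],
        "events": [("events/part-0.parquet", b"PAR1....PAR1")],
    }
    print(ingest_containers(containers))
    print(ingest_containers(containers, ignore_patterns=["*_SUCCESS"]))
```

With the defaults, `sales` is recorded as failed because its first object is the `_SUCCESS` marker, while `events` is still ingested. Passing `ignore_patterns=["*_SUCCESS"]` or `max_attempts=2` lets `sales` succeed as well. The `for ... else` keeps the "all attempts failed" path in one place, so an empty container is also reported as a failure instead of raising.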