Dataset support other partitions #4

Open
mahiki opened this issue Aug 22, 2023 · 0 comments


mahiki commented Aug 22, 2023

Dataset currently defines only a rundate partition, in a hive-style layout. This is handy for S3 queries with Athena or Spark, or any other partition-aware file reader. However, Dataset gets in the way when reading or writing across partitions, or when writing a large dataframe partitioned by columns.
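To make the request concrete, here is a rough Python sketch of what writing by additional partition columns could look like in a hive-style layout. This is not the Dataset API: the function names, the `extracts/sales` root, the `region` column, and the `data.csv` file name are all hypothetical and only illustrate the `key=value` directory convention.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_partition_path(root, partitions, filename="data.csv"):
    """Build a hive-style path: root/key1=value1/key2=value2/filename."""
    parts = [f"{key}={value}" for key, value in partitions.items()]
    return str(PurePosixPath(root, *parts, filename))


def split_by_partitions(rows, partition_cols, root):
    """Group row dicts by partition column values, keyed by target path."""
    groups = defaultdict(list)
    for row in rows:
        # The partition key is the ordered tuple of (column, value) pairs.
        key = tuple((col, row[col]) for col in partition_cols)
        # Partition values are encoded in the path, so drop them from the row.
        payload = {k: v for k, v in row.items() if k not in partition_cols}
        groups[key].append(payload)
    return {
        hive_partition_path(root, dict(key)): payload
        for key, payload in groups.items()
    }


rows = [
    {"rundate": "2023-08-22", "region": "us", "amount": 10},
    {"rundate": "2023-08-22", "region": "eu", "amount": 7},
    {"rundate": "2023-08-22", "region": "us", "amount": 3},
]
files = split_by_partitions(rows, ["rundate", "region"], "extracts/sales")
for path, payload in files.items():
    print(path, payload)
```

Each distinct combination of partition values maps to one file path, which is the shape that Athena, Spark, and similar readers expect when pruning partitions.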

Not sure how important this is. Usually the pipeline generates small datasets at the end of a long process, and Dataset is there just to store these artifacts in an organized way rather than scattered all over my laptop.

Probably Prefect will be publishing their Julia API before this becomes a problem.
