Unify `read` and `scan` functions #13040
Labels
A-io
Area: reading and writing data
A-io-cloud
Area: reading/writing to cloud storage
accepted
Ready for implementation
enhancement
New feature or an improvement of an existing feature
stinodego added the enhancement (New feature or an improvement of an existing feature) and accepted (Ready for implementation) labels on Dec 14, 2023
This was referenced Dec 14, 2023
Here are a few open, unaccepted issues that should be addressed during the harmonization of ...
👍 - was about to ask why there is no ...
On that note, a small documentation recommendation: I think it's time to add sub-headers to the IO functions, grouped by type:
csv:
- [polars.read_csv](https://docs.pola.rs/py-polars/html/reference/api/polars.read_csv.html)
- [polars.read_csv_batched](https://docs.pola.rs/py-polars/html/reference/api/polars.read_csv_batched.html)
- [polars.scan_csv](https://docs.pola.rs/py-polars/html/reference/api/polars.scan_csv.html)
- [polars.DataFrame.write_csv](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_csv.html)
- [polars.LazyFrame.sink_csv](https://docs.pola.rs/py-polars/html/reference/api/polars.LazyFrame.sink_csv.html)
ipc:
- [polars.read_ipc](https://docs.pola.rs/py-polars/html/reference/api/polars.read_ipc.html)
- [polars.read_ipc_stream](https://docs.pola.rs/py-polars/html/reference/api/polars.read_ipc_stream.html)
- [polars.scan_ipc](https://docs.pola.rs/py-polars/html/reference/api/polars.scan_ipc.html)
- [polars.read_ipc_schema](https://docs.pola.rs/py-polars/html/reference/api/polars.read_ipc_schema.html)
- [polars.DataFrame.write_ipc](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_ipc.html)
- [polars.DataFrame.write_ipc_stream](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_ipc_stream.html)
- [polars.LazyFrame.sink_ipc](https://docs.pola.rs/py-polars/html/reference/api/polars.LazyFrame.sink_ipc.html)
parquet:
- [polars.read_parquet](https://docs.pola.rs/py-polars/html/reference/api/polars.read_parquet.html)
- [polars.scan_parquet](https://docs.pola.rs/py-polars/html/reference/api/polars.scan_parquet.html)
- [polars.read_parquet_schema](https://docs.pola.rs/py-polars/html/reference/api/polars.read_parquet_schema.html)
- [polars.DataFrame.write_parquet](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_parquet.html)
- [polars.LazyFrame.sink_parquet](https://docs.pola.rs/py-polars/html/reference/api/polars.LazyFrame.sink_parquet.html)
database:
- [polars.read_database](https://docs.pola.rs/py-polars/html/reference/api/polars.read_database.html)
- [polars.read_database_uri](https://docs.pola.rs/py-polars/html/reference/api/polars.read_database_uri.html)
- [polars.DataFrame.write_database](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_database.html)
json:
- [polars.read_json](https://docs.pola.rs/py-polars/html/reference/api/polars.read_json.html)
- [polars.read_ndjson](https://docs.pola.rs/py-polars/html/reference/api/polars.read_ndjson.html)
- [polars.scan_ndjson](https://docs.pola.rs/py-polars/html/reference/api/polars.scan_ndjson.html)
- [polars.DataFrame.write_json](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_json.html)
- [polars.DataFrame.write_ndjson](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_ndjson.html)
- [polars.LazyFrame.sink_ndjson](https://docs.pola.rs/py-polars/html/reference/api/polars.LazyFrame.sink_ndjson.html)
avro:
- [polars.read_avro](https://docs.pola.rs/py-polars/html/reference/api/polars.read_avro.html)
- [polars.DataFrame.write_avro](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_avro.html)
excel:
- [polars.read_excel](https://docs.pola.rs/py-polars/html/reference/api/polars.read_excel.html)
- [polars.read_ods](https://docs.pola.rs/py-polars/html/reference/api/polars.read_ods.html)
- [polars.DataFrame.write_excel](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_excel.html#)
iceberg:
- [polars.scan_iceberg](https://docs.pola.rs/py-polars/html/reference/api/polars.scan_iceberg.html)
delta:
- [polars.scan_delta](https://docs.pola.rs/py-polars/html/reference/api/polars.scan_delta.html)
- [polars.read_delta](https://docs.pola.rs/py-polars/html/reference/api/polars.read_delta.html)
- [polars.DataFrame.write_delta](https://docs.pola.rs/py-polars/html/reference/api/polars.DataFrame.write_delta.html)
...
`read` functions should behave exactly like `scan` functions followed by `collect`. There may be some added or removed parameters for functionality that is (not) relevant in eager mode.

We should take a look at our existing scan functions and make sure they conform to these expectations:

- `scan_parquet`
- `read_ipc` to `scan_ipc`:
  - `storage_options` - `read_csv` and `read_ipc` do not use native `storage_options` configuration keys #17815
- `read_csv` to `scan_csv`:
  - `scan_csv` currently lacks the following:
    - `scan_csv` does not support a list of datatypes in `schema_overrides` #17813
  - `storage_options` - `read_csv` and `read_ipc` do not use native `storage_options` configuration keys #17815
- `scan_ndjson`
- `scan_delta`
- `scan_iceberg` (has no `read` equivalent yet)