Reading a CSV file with the separator parameter set to a non-default value always loads the entire contents into memory. #13655
Labels
A-io-csv (Area: reading/writing CSV files), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)
Reproducible example
Log output
No response
Issue description
When reading a large (18 GB) CSV file using the streaming or batched reading methods, setting the separator parameter to a non-default value can lead to a memory explosion, even though the growth is not reflected in the Windows Task Manager.
Additionally, I speculate that bug #9266 may be related to this issue.
Expected behavior
Streaming or batched reading should process the file in chunks and avoid loading the entire contents into memory, regardless of the separator value.
Installed versions
Polars: 0.20.3
Index type: UInt32
Platform: Windows-10-10.0.18363-SP0
Python: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_manager:
cloudpickle: 2.2.1
connectorx:
deltalake:
fsspec: 2023.4.0
gevent:
hvplot: 0.8.4
matplotlib: 3.7.2
numpy: 1.24.3
openpyxl: 3.0.10
pandas: 2.0.3
pyarrow: 11.0.0
pydantic: 1.10.8
pyiceberg:
pyxlsb:
sqlalchemy: 1.4.39
xlsx2csv:
xlsxwriter: