I am using pandas.read_csv with custom datatypes (float32). My datasets are 5 GB, 50 GB, and 115 GB. What would be the best way to read CSV files for use with FLAML: pandas chunking, Dask, or the native reading features of packages that support them (LightGBM at least)? If I am stuck with pandas, I will try even smaller datatypes (float16). I will keep trying different options, but if someone already knows, it could save a lot of time, since experimenting with big datasets is slow.
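For reference, a minimal sketch of the pandas-chunking option mentioned above: read the CSV in pieces and downcast float columns per chunk, so peak memory stays bounded during parsing. The file name and column names here are placeholders, and whether the concatenated result still fits in memory depends on the dataset.

```python
import pandas as pd

def read_csv_in_chunks(path, chunksize=1_000_000):
    """Read a CSV in chunks, downcasting float64 columns to float32."""
    chunks = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        # Downcast per chunk, before the full frame is assembled,
        # so the float64 copies never coexist for the whole dataset.
        for col in chunk.select_dtypes("float64").columns:
            chunk[col] = chunk[col].astype("float32")
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)
```

Swapping `"float32"` for `"float16"` halves memory again, at the cost of precision that some estimators handle poorly.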
@torronen To use native data loading of libraries like lightgbm, you can modify this example: https://microsoft.github.io/FLAML/docs/Getting-Started#tune-user-defined-function
Make a Dataset from the CSV file instead of from the pandas DataFrame. Does that work for you?