I am using pandas.read_csv with custom datatypes (float32). My datasets are 5 GB, 50 GB, and 115 GB. What would be the best way to read CSV files for use with FLAML: pandas chunking, Dask, or the native reading features of packages that support them (LightGBM at least)? If I am stuck with pandas, I will try even smaller datatypes (float16). I will keep trying different options, but if someone already knows, it could save a lot of time, since experimenting with big datasets is slow.
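For reference, a minimal sketch of the pandas-chunking option mentioned above: read the CSV in pieces and downcast float columns per chunk, so peak memory stays bounded during parsing. The file name and column names here are placeholders, and whether the concatenated result still fits in memory depends on the dataset.

```python
import pandas as pd

def read_csv_in_chunks(path, chunksize=1_000_000):
    """Read a CSV in chunks, downcasting float64 columns to float32."""
    chunks = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        # Downcast per chunk, before the full frame is assembled,
        # so the float64 copies never coexist for the whole dataset.
        for col in chunk.select_dtypes("float64").columns:
            chunk[col] = chunk[col].astype("float32")
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)
```

Swapping `"float32"` for `"float16"` halves memory again, at the cost of precision that some estimators handle poorly.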
@torronen To use native data loading of libraries like lightgbm, you can modify this example: https://microsoft.github.io/FLAML/docs/Getting-Started#tune-user-defined-function
Make a Dataset from the CSV file instead of from the pandas DataFrame. Does that work for you?