Hello.
I have discovered a performance regression in the .loc indexer of pandas 2.0.3 when .loc is used on a large DataFrame with a non-unique index. With more than 4 indexes, the execution time of .loc increases drastically, by about 1000x. I noticed that hi-ml-cpath/environment.yml depends on pandas 2.0.3. I am not sure whether this performance problem in pandas affects this repository. I found some related discussions on GitHub, including #54550 and #54746.
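To check whether the repository is affected, a minimal timing sketch like the following could be run once under pandas 2.0.3 and once under pandas >= 2.1. It assumes the slowdown shows up when a list of labels is passed to .loc on a large, shuffled, non-unique integer index, and it varies the number of labels looked up. The helper names, frame size and key counts are hypothetical and may not match the exact setup described in #54550 or #54746.

```python
import time

import numpy as np
import pandas as pd


def make_non_unique_frame(n_rows: int, n_keys: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical test frame: each label in range(n_keys) repeats, in shuffled order."""
    rng = np.random.default_rng(seed)
    # Every label appears n_rows // n_keys times, so the index is non-unique and unsorted.
    labels = rng.permutation(np.arange(n_rows) % n_keys)
    return pd.DataFrame({"value": rng.random(n_rows)}, index=labels)


def best_loc_time(df: pd.DataFrame, keys: list, repeats: int = 3) -> float:
    """Best wall-clock time in seconds of df.loc[keys] over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        df.loc[keys]
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("pandas", pd.__version__)
    df = make_non_unique_frame(n_rows=200_000, n_keys=1_000)
    # Compare timings for 1..8 looked-up labels between pandas versions.
    for n in range(1, 9):
        keys = list(range(n))
        print(f"{n} keys: {best_loc_time(df, keys):.4f} s")
```

Only the relative timings between the two pandas versions matter here; absolute numbers depend on the machine.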
I also found that hi-ml-cpath/other/slide_image_loading/src/Histopathology/datasets/panda_dataset.py and hi-ml-cpath/src/health_cpath/datasets/panda_tiles_dataset.py use the affected API. Other files may use it as well.
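To find other files that use .loc, a crude text search over the Python sources is probably enough as a first pass. The sketch below is a plain regex scan, not a parser: it flags every line containing ".loc[" (but not ".iloc["), so each hit still needs a manual check for whether the index involved is actually non-unique. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches ".loc[" or ".loc [", but not ".iloc[" (the dot must directly precede "loc").
LOC_PATTERN = re.compile(r"\.loc\s*\[")


def loc_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line of `text` that uses .loc[...]."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if LOC_PATTERN.search(line)
    ]


def find_loc_usages(root: str) -> list[tuple[str, int, str]]:
    """Scan all *.py files under `root` and collect (path, line number, line) hits."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in loc_lines(text):
            hits.append((str(path), lineno, line))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_loc_usages("hi-ml-cpath"):
        print(f"{path}:{lineno}: {line}")
```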
Suggestion
I would recommend considering an upgrade to pandas >= 2.1, or exploring other ways to optimize the performance of .loc.
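One possible workaround, if upgrading is not an option, is to select rows with a boolean mask built from Index.isin instead of passing a list of labels to .loc. This is a sketch that has not been benchmarked against the specific regression in pandas 2.0.3, and the semantics differ slightly: rows come back in their original order rather than grouped by the order of the keys, and missing keys are silently ignored instead of raising KeyError. The helper name is hypothetical.

```python
import pandas as pd


def select_by_labels(df: pd.DataFrame, keys) -> pd.DataFrame:
    """Select all rows whose index label is in `keys` using a boolean mask.

    Unlike df.loc[keys], this keeps the original row order and ignores
    labels that are not present in the index.
    """
    mask = df.index.isin(keys)
    return df.loc[mask]


if __name__ == "__main__":
    df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=[2, 0, 1, 0, 2])
    print(select_by_labels(df, [0, 2]))
```

If callers depend on the key order that df.loc[keys] produces, the result would need to be reordered afterwards.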
Any other workarounds or solutions would be greatly appreciated.
Thank you!