Fix: Slide ids turned into floats in split csv when names consist only of numbers #228
Summary of the Issue
When slide names consist only of numbers, the unequal lengths of the `train`, `val`, and `test` splits introduce `NaN` values when these splits are concatenated into a dataframe by `save_splits()`. Pandas then converts the numeric slide ids to floats, because integer columns have no `NaN` representation. As a result, a `ValueError` as shown in the screenshot will occur (`CLAM/datasets/dataset_generic.py`, line 247 in 3f875f7).
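A minimal pandas sketch of the conversion described above. It assumes `save_splits()` concatenates the per-split slide id Series column-wise with `pd.concat(..., axis=1)`; the ids and split sizes are made up for illustration and are not from the reporter's dataset.

```python
import io

import pandas as pd

# Hypothetical numeric-only slide ids; the split sizes differ on purpose.
train = pd.Series([101, 102, 103])
val = pd.Series([104])
test = pd.Series([105, 106])

# Column-wise concat aligns on the index and pads the shorter splits with NaN.
# Integer dtypes cannot hold NaN, so pandas upcasts the padded columns to float64.
df = pd.concat([train, val, test], ignore_index=True, axis=1)
df.columns = ["train", "val", "test"]
print(df.dtypes)

# Writing the frame to csv then stores the padded ids as "104.0", "105.0", ...
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Only the longest split (`train` here) keeps its integer dtype, because it is the only column that needs no padding.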
Proposed fix
Use `dtype=object` in `save_splits` to prevent the unintended type conversion. In `Generic_WSI_Classification_Dataset.get_split_from_df()`, cast the dtype of the corresponding split column to match that of `self.slide_data['slide_id']`.

This happened when I was working with my own task's dataset csv. I can provide the csv file to reproduce this bug if needed.
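The two proposed changes might look roughly like the sketch below. This is a hedged illustration, not the PR's actual diff: `save_splits_sketch` and `get_split_ids_sketch` are hypothetical stand-ins for the relevant logic in `save_splits()` and `get_split_from_df()`. The sketch assumes `save_splits()` concatenates the splits with `pd.concat(..., axis=1)` and that `self.slide_data['slide_id']` was inferred as an integer dtype. An in-memory `io.StringIO` replaces the real split csv.

```python
import io

import pandas as pd


def save_splits_sketch(splits, column_keys):
    """Hypothetical stand-in for save_splits(): cast each split to object
    before concatenating, so NaN padding cannot upcast the ids to floats."""
    df = pd.concat([s.astype(object) for s in splits], ignore_index=True, axis=1)
    df.columns = column_keys
    return df


def get_split_ids_sketch(all_splits, split_key, slide_ids):
    """Hypothetical stand-in for the lookup in get_split_from_df(): drop the
    NaN padding, then cast the split column to the dtype of slide_ids."""
    split = all_splits[split_key].dropna().reset_index(drop=True)
    return split.astype(slide_ids.dtype)


# Assumed integer slide ids, as pandas infers for numeric-only names.
slide_ids = pd.Series([101, 102, 103, 104, 105, 106], name="slide_id")
splits = [pd.Series([101, 102, 103]), pd.Series([104]), pd.Series([105, 106])]

# Fix 1: the saved csv keeps integer-looking ids ("104", not "104.0").
buf = io.StringIO()
save_splits_sketch(splits, ["train", "val", "test"]).to_csv(buf, index=False)
csv_text = buf.getvalue()

# read_csv still infers float64 for padded columns; fix 2 casts them back.
all_splits = pd.read_csv(io.StringIO(csv_text))
val_ids = get_split_ids_sketch(all_splits, "val", slide_ids)
mask = slide_ids.isin(val_ids.tolist())
print(val_ids.tolist(), int(mask.sum()))
```

The cast in the second helper only helps when `slide_id` is numeric; if the ids were read as strings (for example, zero-padded names), the split csv would instead need to be read with a matching string dtype so that padding is not lost.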