I successfully implemented the instance segmentation function provided in the Micro-SAM repository. However, while using the semantic_sam_trainer function to fine-tune the model for semantic segmentation on custom images, I encountered several issues. Below, I detail the problems, fixes made to the source code, and the remaining unresolved issue.
Issue 1: RuntimeError: "host_softmax" not implemented for 'Bool'
Problem: When computing the Dice Loss using a custom loss function, the following error occurred:
RuntimeError: "host_softmax" not implemented for 'Bool'
This happened because the prediction tensor was treated as a boolean type, but the torch.softmax function requires a floating-point input.
Fix: To address this, I explicitly converted the pred tensor to a floating-point type before applying torch.softmax.
if self.softmax:
    pred = torch.softmax(pred.float(), dim=1)
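For context, here is a minimal sketch of how that cast fits into a softmax-based Dice loss forward pass (the class and variable names are illustrative, not the exact source I modified):

```python
import torch
import torch.nn as nn

class SoftmaxDiceLoss(nn.Module):
    """Illustrative Dice loss; mirrors the structure of the loss used above."""
    def __init__(self, n_classes, softmax=True, eps=1e-6):
        super().__init__()
        self.n_classes = n_classes
        self.softmax = softmax
        self.eps = eps

    def forward(self, pred, target):
        if self.softmax:
            # Cast to float first: softmax is not implemented for bool tensors.
            pred = torch.softmax(pred.float(), dim=1)
        target = target.float()
        # Per-class Dice over the spatial dimensions.
        intersection = (pred * target).sum(dim=(2, 3))
        denominator = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
        dice = (2 * intersection + self.eps) / (denominator + self.eps)
        return 1.0 - dice.mean()
```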
Issue 2: ValueError: Expected input and target of same shape
Problem: When comparing the input tensor against class indices in _one_hot_encoder, the output shape of the tensor was [B, H*num_classes, W] instead of the expected [B, num_classes, H, W]. This caused a mismatch in shapes during the loss computation.
Fix: I modified the _one_hot_encoder function to unsqueeze the tensor along the channel dimension (axis 1) to align the output with the expected shape.
Modified Code:
Before:
temp_prob = input_tensor == i
tensor_list.append(temp_prob)
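After (a sketch of the adjusted lines, assuming the surrounding loop is unchanged):

```python
# Add a channel axis so each per-class mask is [B, 1, H, W] instead of [B, H, W].
temp_prob = (input_tensor == i).unsqueeze(1)
tensor_list.append(temp_prob)
```

Final concatenation (the output variable name is my assumption; concatenating along dim 1 now yields the expected [B, num_classes, H, W]):

```python
output_tensor = torch.cat(tensor_list, dim=1)
```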
Issue 3: RuntimeError: "host_softmax" not implemented for 'Bool' (Recurrence)
Problem: The softmax error reappeared due to the masks tensor being of boolean type. This issue arose inside the _compute_loss function.
Fix: I converted the masks tensor to a floating-point type within _compute_loss.
Modified Code:
masks = masks.float()
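The pattern is the same as in Issue 1; a small self-contained example of the failure and the fix (shapes and names are made up for illustration):

```python
import torch

# Dummy masks as they come out of the data pipeline: a bool tensor.
masks = torch.randint(0, 2, (2, 4, 8, 8), dtype=torch.bool)

# torch.softmax(masks, dim=1) would raise: RuntimeError: "host_softmax" not implemented for 'Bool'
masks = masks.float()  # the cast applied inside _compute_loss
print(torch.softmax(masks, dim=1).dtype)  # torch.float32
```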
Issue 4: AssertionError: Class number out of range
Problem: While running the code with the assumption of 3 classes, the following error occurred:
Assertion `t >= 0 && t < n_classes` failed.
This indicates that one or more pixels in the target tensor had values outside the valid range [0, num_classes-1]; in this case the error was triggered because the run assumed only 3 classes.
Debugging Steps:
Verified that the ground truth masks in the dataset contain 4 classes (see the check below).
Updated the code to handle 4 classes. However, this led to a shape mismatch error in the Dice Loss computation.
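A quick way to check which label values actually reach the loss (a hedged sketch; `masks` stands for whatever label tensor the trainer receives):

```python
import torch

def check_class_range(masks: torch.Tensor, num_classes: int) -> None:
    """Print the unique label values and fail early if any are outside [0, num_classes - 1]."""
    values = torch.unique(masks)
    print("label values in batch:", values.tolist())
    assert 0 <= values.min().item() and values.max().item() < num_classes, \
        f"labels outside [0, {num_classes - 1}]: {values.tolist()}"

# Example: with 3 assumed classes this fails as soon as the 4th label value (3) shows up.
check_class_range(torch.tensor([[0, 1], [2, 3]]), num_classes=4)  # passes with 4 classes
```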
Issue 5: ValueError: Expected input and target of same shape
Problem: After resolving the previous issues, a ValueError was raised:
ValueError: Expected input and target of same shape, got: torch.Size([2, 3, 488, 685]), torch.Size([2, 4, 488, 685]).
This occurred because the model prediction still had only 3 channels, while the one-hot-encoded target now had 4 channels (one per class).
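In other words, the decoder is still configured for 3 output channels while the labels now have 4. A hedged sanity check (function and argument names are mine, not the Micro-SAM API) that can be run before the loss call:

```python
import torch

def check_prediction_matches_target(pred: torch.Tensor, target_one_hot: torch.Tensor, num_classes: int = 4) -> None:
    """Fail early if the model's output channels do not match the one-hot target (background + 3 cell types = 4)."""
    assert pred.shape[1] == num_classes, (
        f"model predicts {pred.shape[1]} classes but labels contain {num_classes}; "
        "the model/trainer must be configured for the same number of classes as the masks"
    )
    assert pred.shape == target_one_hot.shape, (pred.shape, target_one_hot.shape)
```

If that check fails as above, the fix would be constructing the model/trainer with 4 classes rather than 3, so its output shape matches the 4-channel target.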
Before I look into the details and try to reproduce the issues, could you elaborate on the problem statement?
i.e. what are your input images, what are the corresponding labels, and what is the expected outcome?
This would be a good starting point for us to discuss further details!
For the input images I am using PNG images of cells. For the masks I am using Labelme (an annotation software) to label those input images; Labelme saves the annotations as JSON files, which I then convert back into PNG masks. For the labels there should be 4 classes: background and 3 cell types. As for the expected outcome, I am currently trying to get the training phase working, where I pass all the necessary parameters to the SemanticSamTrainer. Once training is finished, I would evaluate the model against a test set and visualize the results.
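Since the masks go through a Labelme JSON to PNG conversion, it may be worth confirming that each exported PNG is a single-channel class-index image whose values are exactly 0..3 (background + 3 cell types); an RGB mask or stray values like 255 would also explain the class-range assertion in Issue 4. A small hedged check (the path is a placeholder):

```python
import numpy as np
from PIL import Image

def inspect_mask(path: str) -> None:
    """Print dtype, shape and unique values of a converted mask PNG."""
    mask = np.array(Image.open(path))
    print("dtype:", mask.dtype, "shape:", mask.shape)   # expected: integer dtype, 2D (H, W)
    print("unique values:", np.unique(mask).tolist())   # expected: [0, 1, 2, 3]

# inspect_mask("path/to/converted_mask.png")  # placeholder path
```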