Extraneous side-effects of bounding boxes on segmentation masks #1030
👋 Hello @metagic, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.

If this is a 🐛 Bug Report, could you please provide a minimum reproducible example (MRE) that demonstrates the issue (e.g., dataset, relevant code snippet, and specific steps)? Screenshots or GIFs can also help us better understand what you are experiencing.

If this is a ❓ Question, we'd appreciate it if you could provide additional details about your segmentation task, including the dataset, model versions (YOLOv11 and YOLOv8), and any relevant training or inference settings that might help us troubleshoot or guide you better.

Regarding your side question about directly obtaining contours/polylines: this is an excellent suggestion! While outlines are not directly output by the model, it would be great to hear more about your workflow requirements. Feel free to share any additional thoughts on this so we can consider improvements 🎉.

An Ultralytics engineer will review and assist you further soon. Thank you for your patience and detailed issue description 🌟!
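For illustration, a minimal sketch of the kind of MRE requested above; the weights file and image path are placeholders for the reporter's own data:

```python
# Hypothetical minimal repro: run a pretrained segmentation model on one image
# and visualize the masks so the clipped contours become visible.
from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")         # placeholder weights
results = model.predict("stones.jpg")  # placeholder image exhibiting the issue
results[0].show()                      # overlay the masks to expose the artifact
```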
@metagic thank you for reporting this segmentation mask observation and including a clear example. Let's break this down:
```python
from ultralytics import YOLO
import cv2  # for downstream contour operations
import numpy as np  # needed for the dtype conversion below

model = YOLO('yolo11n-seg.pt')
results = model.predict('your_image.jpg')

for r in results:
    for mask in r.masks:
        # mask.xy already holds the polygon in pixel coordinates, so no
        # cv2.findContours pass over a binary mask is required
        contour = mask.xy[0].astype(np.int32).reshape(-1, 1, 2)
        # Now use contour with OpenCV operations, e.g. cv2.drawContours
```
The team is actively working on segmentation enhancements, and we appreciate you surfacing this edge case. For immediate needs, the box expansion approach tends to be most effective while maintaining real-time performance. For more details on working with segmentation outputs, see our Segment Task documentation and Mask Processing guide.
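As a concrete illustration of the box expansion approach mentioned above (not an Ultralytics API; `expand_boxes` is a hypothetical helper), one could pad each detection box before cropping or prompting:

```python
import numpy as np

def expand_boxes(xyxy: np.ndarray, ratio: float, img_w: int, img_h: int) -> np.ndarray:
    """Grow each (x1, y1, x2, y2) box by `ratio` per side, clamped to the image."""
    w = xyxy[:, 2] - xyxy[:, 0]
    h = xyxy[:, 3] - xyxy[:, 1]
    pad = np.stack([-w * ratio, -h * ratio, w * ratio, h * ratio], axis=1)
    out = xyxy + pad
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, img_w)   # clamp x coordinates
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, img_h)   # clamp y coordinates
    return out
```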
Thank you Paula! This clears up quite a few things for me. The masks being subject to their bounding boxes also has the (undesired, at least in my case) effect that overlaps are computed from bounding boxes, not object shapes.
@metagic thank you for highlighting this important nuance! You're absolutely correct that overlap computations (like IoU) in standard detection pipelines are based on bounding boxes rather than mask shapes. Let's clarify and offer solutions:
Would you like me to elaborate on any of these approaches? For stone segmentation where precise boundaries are crucial, the mask-IoU post-processing combined with slightly expanded boxes (from our previous discussion) often yields the best accuracy/compute balance. 💎
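For concreteness, a minimal sketch of the mask-IoU computation referred to above, assuming plain boolean NumPy masks extracted from the results:

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean masks with identical shape."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum()) / union if union else 0.0

# e.g. m = results[0].masks.data.cpu().numpy() > 0.5  # (N, H, W) booleans
# mask_iou(m[0], m[1]) then reflects shape overlap rather than box overlap
```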
Thank you Paula for your most helpful advice. I switched to the SAM (2) model and indeed get well drawn contours. Obviously at the cost of speed, but this is less of an issue for the images that will be used as a reference set and thus will only be treated once. For the images taken in the field, I will need a model that performs in real time (at walking speed), but sketchy contours should then be ok. I still struggle with the overlaps, as shown in this example: I'd like to get rid of the strange double contours around some of the stones (best seen around the pink & red contoured stone at the bottom edge) as well as the other overlaps. Your help is most welcome!
Thank you for the update and clear example! For SAM-powered stone segmentation, here's how we can refine those overlapping contours:
```python
from ultralytics import SAM

model = SAM('sam_b.pt')
# Tighter cross-crop NMS (default 0.7) to merge duplicate masks
results = model.generate(im, crop_nms_thresh=0.4)
# Drop tiny fragments, then re-run NMS on the remaining masks
masks, keep = SAM.remove_small_regions(results.masks.data, min_area=100, nms_thresh=0.4)
results.update(masks=masks)
# Alternatively, raise the quality thresholds to suppress unstable masks
results = model.generate(im, conf_thres=0.9, stability_score_thresh=0.92)
```

For field deployment where real-time performance is needed, you might want to:
This hybrid approach balances speed and precision. Would you like me to elaborate on any of these strategies? The team is particularly interested in geological applications and would welcome any insights on your stone segmentation use case! 🪨
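Independent of SAM internals, one generic way to remove such double contours is to let each pixel belong to only one mask, e.g. the highest-confidence one. A minimal sketch, assuming `masks` is an (N, H, W) boolean array already sorted by descending confidence:

```python
import numpy as np

def make_non_overlapping(masks: np.ndarray) -> np.ndarray:
    """Keep each pixel only in the first (highest-confidence) mask claiming it."""
    claimed = np.zeros(masks.shape[1:], dtype=bool)
    out = np.empty_like(masks)
    for i, m in enumerate(masks):
        out[i] = m & ~claimed  # strip pixels already owned by an earlier mask
        claimed |= m
    return out
```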
I very much appreciate your interest and competent advice. The application relates more to an urban than a geological setting. Imagine a blind person crossing Plaza Mayor from calle de Toledo to the Tourist Center with a (CV-enhanced) white cane. The cane needs to assemble stone mosaics, compare them with georeferenced tiles and retrieve its position. This is the use case. Unfortunately, I was not yet successful in applying your code sample. Instead I reverted to some (kludgy) post-processing, just to demonstrate the approach:

The image has some of the "mosaic metrics" that I believe to be relevant in the matching process: angle and distance to (detected) neighboring stones, areas, shapes, aspect ratios, pavement axis. I might use that in a classical CV matching approach, or, probably more efficiently, with a specifically trained AI. Next, I will try out your field deployment suggestion - thank you very much!
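To make the "mosaic metrics" idea concrete, here is a rough sketch of per-stone features computed with OpenCV; all names are illustrative, and `contour` is a polygon in the cv2 format produced earlier in this thread:

```python
import cv2
import numpy as np

def stone_metrics(contour: np.ndarray) -> dict:
    """Area, orientation, and aspect ratio derived from a stone's contour."""
    area = cv2.contourArea(contour)
    (cx, cy), (w, h), angle = cv2.minAreaRect(contour)
    aspect = max(w, h) / min(w, h) if min(w, h) > 0 else 0.0
    return {"centroid": (cx, cy), "area": area, "angle": angle, "aspect": aspect}

def neighbor_geometry(m1: dict, m2: dict) -> tuple:
    """Distance and bearing (degrees) between two stone centroids."""
    dx = m2["centroid"][0] - m1["centroid"][0]
    dy = m2["centroid"][1] - m1["centroid"][1]
    return float(np.hypot(dx, dy)), float(np.degrees(np.arctan2(dy, dx)))
```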
Thank you for sharing this compelling urban navigation use case - what an impactful application of computer vision! 🌆 Your mosaic metric approach shows excellent insight. Let me offer some targeted suggestions to refine the implementation:
```python
from ultralytics import YOLO, SAM

# Field deployment (YOLO11n ~2.7 ms inference)
yolo_model = YOLO('yolo11n-seg.pt')  # ultra-lightweight detector
sam_model = SAM('sam2_b.pt')

# Stream mobile-optimized frames; predict(stream=True) yields one Results object per frame
for yolo_results in yolo_model.predict(source=video_stream, stream=True, imgsz=320):
    # Get YOLO detections with 20% box padding
    # (pad_boxes is a user-defined helper, cf. the expand_boxes sketch above)
    padded_boxes = pad_boxes(yolo_results.boxes.xyxy, padding=0.2)
    # SAM refinement only on high-confidence detections
    sam_results = sam_model(yolo_results.orig_img,
                            bboxes=padded_boxes[yolo_results.boxes.conf > 0.7])
    # Your metric extraction pipeline (user-defined)
    process_mosaic_metrics(sam_results[0].masks)
```

This balances <15 ms latency with SAM's precision on critical detections.
```python
from ultralytics.data.annotator import auto_annotate

auto_annotate(data="mosaic_images/",
              det_model="yolo11s.pt",
              sam_model="sam2_l.pt",        # use the largest SAM for annotation quality
              classes=[stone_category_id],  # your stone class id
              output_dir="annotations/")
```

This creates precise masks to train a specialized YOLO11 model on your stone patterns.
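As a follow-up sketch of that training step (assuming a dataset config, here called `stones.yaml`, pointing at the generated annotations):

```python
from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")  # start from pretrained segmentation weights
model.train(data="stones.yaml", epochs=100, imgsz=640)  # placeholder hyperparameters
```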
```python
# In your SAM initialization
model = SAM('sam2.1_b.pt')
results = model.generate(im, non_overlap_masks=True)  # New in 2.1!
```

This activates built-in anti-overlap logic during segmentation. The team would be fascinated to learn more about your geospatial matching approach as it develops. Your work exemplifies the real-world impact we aim to enable - please keep us updated on progress! 🦯 For additional guidance on SAM2.1's new features, see the updated SAM 2.1 documentation.
Thank you for your kind words and the encouragement - there is still a long way to go and your support is extremely valuable. I'd be glad to keep you up-to-date with the project. I guess Discord would be a good place for that? Do I understand you correctly that where you put 'yolo11n-seg.pt' and 'yolo11s.pt' in your suggestions 1 and 2, you actually expect me to put in my own stone-trained models? The bare YOLO models would not know about the stones. Regarding suggestion 3, I am at a complete loss: I can find neither a generate method on the SAM object nor any documentation regarding "non_overlap_masks". I am running Ultralytics (8.3.83) locally. Thank you for your support and patience.
Thank you for the clarification! Let me address your points systematically:
For immediate needs, the version upgrade should unlock the missing SAM2.1 functionality. Let me know if you encounter any installation hurdles! 🚀
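A quick way to verify the installed version before retrying (upgrade with `pip install -U ultralytics` if it is older than needed):

```python
import ultralytics

print(ultralytics.__version__)  # e.g. '8.3.83' as reported above
ultralytics.checks()            # prints full environment and version diagnostics
```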
Search before asking
Question
I have a segmentation model that is supposed to outline stones. The stones are properly detected, but the masks suffer from what seem to be side-effects of their bounding boxes. See this example:
Some of the contours of the stones are either cut off or extended at the edges of their bounding box. This effect results in poorly drawn contours. I tried both YOLOv11 and YOLOv8 models.
(Side question: why can't we directly get contours / polylines from the segmentation results instead of going through the extra steps to retrieve masks and apply cv.findContours?)
Additional
No response