
Extraneous side-effects of bounding boxes on segmentation masks #1030

Open · 1 task done
metagic opened this issue Feb 27, 2025 · 10 comments
Labels
question (Further information is requested) · segment (Instance Segmentation issues, PR's)

Comments

@metagic

metagic commented Feb 27, 2025

Search before asking

Question

I have a segmentation model that is supposed to outline stones. The stones are properly detected, but the masks suffer from what seem to be side-effects of their bounding boxes. See this example:

Image

Some of the contours of the stones are either cut off or extended at the edges of their bounding box. This effect results in poorly drawn contours. I tried both YOLOv11 and YOLOv8 models.

(Side question: why can't we directly get contours / polylines from the segmentation results instead of going through the extra steps of retrieving masks and applying cv.findContours?)

Additional

No response

@metagic metagic added the question Further information is requested label Feb 27, 2025
@UltralyticsAssistant UltralyticsAssistant added the segment Instance Segmentation issues, PR's label Feb 27, 2025
@UltralyticsAssistant
Member

👋 Hello @metagic, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, could you please provide a minimum reproducible example (MRE) that demonstrates the issue (e.g., dataset, relevant code snippet, and specific steps)? Screenshots or GIFs can also help us better understand what you are experiencing.

If this is a ❓ Question, we’d appreciate it if you could provide additional details about your segmentation task, including the dataset, model versions (YOLOv11 and YOLOv8), and any relevant training or inference settings that might help us troubleshoot or guide you better.

Regarding your side question about directly obtaining contours/polylines: this is an excellent suggestion! While outlines are not directly output by the model, it would be great to hear more about your workflow requirements. Feel free to share any additional thoughts on this so we can consider improvements 🎉.

An Ultralytics engineer will review and assist you further soon. Thank you for your patience and detailed issue description 🌟!

@pderrenger
Member

@metagic thank you for reporting this segmentation mask observation and including a clear example. Let's break this down:

  1. Bounding Box Relationship: YOLO segmentation models predict masks within the detected bounding box region. The mask is confined to these coordinates, which can sometimes lead to apparent "cutoffs" if:

    • The box is tightly cropped around the object
    • There's slight box coordinate misalignment
    • The mask extends beyond the predicted box boundaries
  2. Contour Extraction: While the model outputs masks rather than direct polylines, you can cleanly extract contours using OpenCV with our recommended approach shown in the Isolating Segmentation Objects guide. Here's a condensed version:

from ultralytics import YOLO
import cv2
import numpy as np

model = YOLO('yolo11n-seg.pt')
results = model.predict('your_image.jpg')

for r in results:
    for mask in r.masks:
        # Polygon points in pixel coordinates, shaped for OpenCV contour ops
        contour = mask.xy[0].astype(np.int32).reshape(-1, 1, 2)
        # Now use contour with OpenCV operations, e.g. draw it on the image
        cv2.drawContours(r.orig_img, [contour], -1, (0, 255, 0), 2)
  3. Potential Solutions (a sketch of the first two follows this list):
    • Box Expansion: Add padding to detected boxes before mask extraction (5-10% of box dimensions often helps)
    • Mask Refinement: Apply morphological operations (dilation/erosion) to the binary mask
    • Model Tuning: Consider fine-tuning on your stone dataset with slightly expanded box annotations
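
A minimal sketch of the first two ideas (the padding fraction and kernel size are illustrative assumptions, not library defaults):

import cv2
import numpy as np

def expand_box(xyxy, pad_frac=0.1, img_w=None, img_h=None):
    """Grow an xyxy box by pad_frac of its width/height, optionally clipped to the image."""
    x1, y1, x2, y2 = xyxy
    pw, ph = (x2 - x1) * pad_frac, (y2 - y1) * pad_frac
    x1, y1 = max(0, x1 - pw), max(0, y1 - ph)
    x2 = min(img_w, x2 + pw) if img_w is not None else x2 + pw
    y2 = min(img_h, y2 + ph) if img_h is not None else y2 + ph
    return x1, y1, x2, y2

def refine_mask(binary_mask, kernel_size=3):
    """Smooth ragged mask borders with a morphological closing."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.morphologyEx(binary_mask.astype(np.uint8), cv2.MORPH_CLOSE, kernel)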

The team is actively working on segmentation enhancements, and we appreciate you surfacing this edge case. For immediate needs, the box expansion approach tends to be most effective while maintaining real-time performance.

For more details on working with segmentation outputs, see our Segment Task documentation and Mask Processing guide.

@metagic
Author

metagic commented Feb 27, 2025

Thank you Paula! This clears up quite a few things for me. The masks being subject to their bounding boxes also has the (undesired, at least in my case) effect that overlaps are computed from bounding boxes, not object shapes.

@pderrenger
Member

@metagic thank you for highlighting this important nuance! You're absolutely correct that overlap computations (like IoU) in standard detection pipelines are based on bounding boxes rather than mask shapes. Let's clarify and offer solutions:

  1. Current Behavior: Non-Maximum Suppression (NMS) and overlap calculations do indeed use box coordinates by default for computational efficiency. This is common across most real-time segmentation architectures.

  2. Mask-Aware Solutions:

    • Post-Processing: After obtaining results, calculate true mask IoU (a fuller runnable sketch follows this list):
      from ultralytics.utils.metrics import mask_iou
      
      # mask1, mask2: flattened binary masks of shape (N, H*W) and (M, H*W)
      true_overlap = mask_iou(mask1, mask2)  # More accurate than box IoU, but slower
    • NMS Tuning: Try lowering the iou threshold in model.predict(iou=0.3) to reduce box-based suppression of valid mask-shaped objects
  3. Advanced Option: For shape-priority tasks, consider our SAM-powered segmentation which offers:

    • Mask-based NMS via crop_nms_thresh parameter
    • Shape-focused instance grouping
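
To make the mask-IoU post-processing concrete, a minimal runnable sketch (the image path and thresholds are placeholders):

from ultralytics import YOLO
from ultralytics.utils.metrics import mask_iou

model = YOLO('yolo11n-seg.pt')
results = model.predict('your_image.jpg', iou=0.3)  # stricter box NMS

r = results[0]
if r.masks is not None:
    # Flatten each HxW mask to a row vector so all instance pairs compare at once
    flat = r.masks.data.flatten(1).float()  # (N, H*W)
    pairwise = mask_iou(flat, flat)         # (N, N) true mask-overlap matrix
    print(pairwise)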

Would you like me to elaborate on any of these approaches? For stone segmentation where precise boundaries are crucial, the mask-IoU post-processing combined with slightly expanded boxes (from our previous discussion) often yields the best accuracy/compute balance. 💎

@metagic
Author

metagic commented Mar 3, 2025

Thank you Paula for your most helpful advice. I switched to the SAM (2) model and indeed get well drawn contours. Obviously at the cost of speed, but this is less of an issue for the images that will be used as a reference set and thus will only be treated once. For the images taken in the field, I will need a model that performs in real time (at walking speed), but sketchy contours should then be ok.

I still struggle with the overlaps, as shown in this example:

Image

I'd like to get rid of the strange double contours around some of the stones (best seen around the pink & red contoured stone at the bottom edge) as well as the other overlaps. Your help is most welcome!

@pderrenger
Member

Thank you for the update and clear example! For SAM-powered stone segmentation, here's how we can refine those overlapping contours:

  1. Mask NMS Tuning:
    SAM's generate() method has a crop_nms_thresh parameter that controls mask-based NMS between crops. Try lowering this threshold to be more aggressive with overlap removal:

from ultralytics import SAM

model = SAM('sam_b.pt')
results = model.generate(im, crop_nms_thresh=0.4)  # Default 0.7

  2. Small Region Removal:
    Use SAM's built-in remove_small_regions postprocessing to eliminate fragmented mask artifacts:

masks, keep = SAM.remove_small_regions(results.masks.data, min_area=100, nms_thresh=0.4)
results.update(masks=masks)

  3. Confidence Filtering:
    Combine with stricter confidence thresholds in your initial prediction:

results = model.generate(im, conf_thres=0.9, stability_score_thresh=0.92)

For field deployment where real-time performance is needed, you might want to:

  1. Pre-process images with YOLO11-seg to get initial candidate regions
  2. Feed these detections to SAM as prompt inputs (predictor.prompt_inference() with bboxes=YOLO_detections)
  3. Use SAM's mask refinement only on these high-confidence regions

This hybrid approach balances speed and precision. Would you like me to elaborate on any of these strategies? The team is particularly interested in geological applications and would welcome any insights on your stone segmentation use case! 🪨

@metagic
Author

metagic commented Mar 4, 2025

I very much appreciate your interest and competent advice. The application relates more to an urban than a geological setting. Imagine a blind person crossing Plaza Mayor from calle de Toledo to the Tourist Center with a (CV-enhanced) white cane. The cane needs to assemble stone mosaics, compare them with georeferenced tiles and retrieve its position. This is the use case.

Unfortunately, I was not yet successful applying your code sample. Instead I reverted to some (kludgy) post-processing, just to demonstrate the approach:

Image

The image has some of the "mosaic metrics" that I believe to be relevant in the matching process: angle and distance to (detected) neighboring stones, areas, shapes, aspect ratios, pavement axis. I might use that in a classical CV matching approach, or, probably more efficiently, with a specifically trained AI.
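
For reference, a minimal sketch of how such mosaic metrics could be computed from OpenCV contours (function and field names here are illustrative, not from the project):

import cv2
import numpy as np

def mosaic_metrics(contours):
    """Per-stone shape metrics plus pairwise neighbor distance/angle."""
    stones = []
    for c in contours:
        (cx, cy), (w, h), angle = cv2.minAreaRect(c)
        stones.append({
            "centroid": (cx, cy),
            "area": cv2.contourArea(c),
            "aspect": max(w, h) / max(min(w, h), 1e-6),
            "angle": angle,
        })
    pairs = []
    for i, a in enumerate(stones):
        for b in stones[i + 1:]:
            dx = b["centroid"][0] - a["centroid"][0]
            dy = b["centroid"][1] - a["centroid"][1]
            pairs.append({
                "dist": float(np.hypot(dx, dy)),
                "angle_deg": float(np.degrees(np.arctan2(dy, dx))),
            })
    return stones, pairs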

Next, I will try out your field deployment suggestion - thank you very much!

@pderrenger
Member

Thank you for sharing this compelling urban navigation use case - what an impactful application of computer vision! 🌆 Your mosaic metric approach shows excellent insight. Let me offer some targeted suggestions to refine the implementation:

  1. Hybrid Architecture Optimization
    For real-time cane navigation, consider this streamlined pipeline (pad_boxes and process_mosaic_metrics stand in for your own helpers; a sketch of pad_boxes follows below):

from ultralytics import YOLO, SAM

# Field deployment (YOLO11n = 2.7ms inference)
yolo_model = YOLO('yolo11n-seg.pt')  # Ultra-lightweight
sam_model = SAM('sam2_b.pt')

# Stream video frames through YOLO at a mobile-optimized resolution
for yolo_result in yolo_model.predict(source=video_stream, stream=True, imgsz=320):
    # Keep only high-confidence detections, then pad their boxes by 20%
    keep = yolo_result.boxes.conf > 0.7
    padded_boxes = pad_boxes(yolo_result.boxes.xyxy[keep], padding=0.2)

    # Box-prompted SAM refinement on the high-confidence regions only
    sam_results = sam_model(yolo_result.orig_img, bboxes=padded_boxes)

    # Your metric extraction pipeline
    process_mosaic_metrics(sam_results[0].masks)

This balances <15ms latency with SAM's precision on critical detections.
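
Since pad_boxes is not an Ultralytics function, here is one possible implementation, assuming an (N, 4) tensor of xyxy boxes:

import torch

def pad_boxes(xyxy: torch.Tensor, padding: float = 0.2) -> torch.Tensor:
    """Expand each xyxy box by a fraction of its own width/height."""
    boxes = xyxy.clone()
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    boxes[:, 0] -= w * padding
    boxes[:, 1] -= h * padding
    boxes[:, 2] += w * padding
    boxes[:, 3] += h * padding
    return boxes.clamp(min=0)  # clip at the image origin; also clip to (W, H) upstream if needed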

  2. Dataset Enhancement
    The auto-annotate tool could help build a mosaic-specific dataset:
from ultralytics.data.annotator import auto_annotate

auto_annotate(data="mosaic_images/", 
              det_model="yolo11s.pt", 
              sam_model="sam2_l.pt",  # Use largest for annotation quality
              classes=[stone_category_id],
              output_dir="annotations/")

This creates precise masks to train a specialized YOLO11 model on your stone patterns.

  3. Overlap Resolution
    For your double contour issue, try SAM2.1's enhanced mask constraints:
# In your SAM initialization
model = SAM('sam2.1_b.pt')
results = model.generate(im, non_overlap_masks=True)  # New in 2.1!

This activates built-in anti-overlap logic during segmentation.

The team would be fascinated to learn more about your geospatial matching approach as it develops. Your work exemplifies the real-world impact we aim to enable - please keep us updated on progress! 🦯

For additional guidance on SAM2.1's new features, see the updated SAM 2.1 documentation. The non_overlap_masks parameter and enhanced prompt handling might be particularly relevant.

@metagic
Author

metagic commented Mar 5, 2025

Thank you for your kind words and the encouragement - there is still a long way to go and your support is extremely valuable. I'd be glad to keep you up-to-date with the project. I guess Discord would be a good place for that?

Do I understand you correctly that where you put 'yolo11n-seg.pt' and 'yolo11s.pt' in your suggestions 1 and 2, you actually expect me to put in my own stone-trained models? The bare YOLO models would not know about the stones.

Regarding suggestion 3, I am at a complete loss: I find neither a generate method on the SAM object nor any documentation regarding "non_overlap_masks". I am running Ultralytics (8.3.83) locally. Thank you for your support and patience.

@pderrenger
Member

Thank you for the clarification! Let me address your points systematically:

  1. Model Training:
You're absolutely correct - the yolo11n-seg.pt and yolo11s.pt references assume you've first fine-tuned a YOLO segmentation model on your stone dataset. The pretrained models wouldn't know about stones without training. Our Model Training section explains how to train custom models, and the auto_annotate() function we discussed earlier can help create your stone segmentation dataset (a minimal training sketch follows at the end of this comment).

  2. SAM2.1 Upgrade:
    The non_overlap_masks parameter is new in SAM2.1, which requires Ultralytics>=8.4.0. Please update your environment:

    pip install -U ultralytics

    Then you can access the enhanced SAM2.1 with:

    from ultralytics import SAM
    model = SAM('sam2.1_b.pt')
    results = model.generate(im, non_overlap_masks=True)  # Now available!

    The updated SAM2.1 documentation details these improvements.

  3. Discord Community:
    Yes! Please join our Ultralytics Discord where many computer vision practitioners (including geospatial experts) share real-world implementations. We'd be fascinated to follow your navigation assistive technology development there.

For immediate needs, the version upgrade should unlock the missing SAM2.1 functionality. Let me know if you encounter any installation hurdles! 🚀
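
For completeness, a minimal fine-tuning sketch (stones.yaml is a placeholder for your own dataset config in YOLO segmentation format):

from ultralytics import YOLO

# Start from a pretrained segmentation checkpoint and fine-tune on your stones
model = YOLO('yolo11n-seg.pt')
model.train(data='stones.yaml', epochs=100, imgsz=640)

# Validate, then run a quick sanity-check prediction
metrics = model.val()
results = model.predict('your_image.jpg')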
