
Average Precision and Average Recall metrics reported by COCOeval seem to be incorrect #672

Open
tinybike opened this issue Jul 22, 2024 · 0 comments

I'm not sure if this is the right place to report issues with https://github.com/ppwwyyxx/cocoapi -- that repo doesn't have its own Issues tab, so I'm opening an issue here instead.

I'm confused by how pycocotools computes the average precision and average recall metrics reported in the summary. I'm not sure whether this is actually a bug or I'm just fundamentally misunderstanding how these calculations are done under the hood. So I wrote a minimal test case: two ground-truth boxes and two identical predicted boxes (i.e. perfect overlap), passed into COCOeval:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
 
# Boxes are given as [x1, y1, x2, y2] corners; predictions are identical to the ground truth.
actual_boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]
predicted_boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]
scores = [1.0, 1.0]
coco_actual = COCO()
coco_predicted = COCO()
actual_annotations_list = []
predicted_annotations_list = []
# COCO annotations use [x, y, width, height], so convert from corner coordinates.
for id, box in enumerate(actual_boxes):
    actual_annotations_list.append({
        "id": id,
        "image_id": 1,
        "category_id": 1,
        "bbox": [box[0], box[1], box[2] - box[0], box[3] - box[1]],
        "area": (box[2] - box[0]) * (box[3] - box[1]),
        "iscrowd": 0,
    })
for id, box in enumerate(predicted_boxes):
    predicted_annotations_list.append({
        "id": id,
        "image_id": 1,
        "category_id": 1,
        "bbox": [box[0], box[1], box[2] - box[0], box[3] - box[1]],
        "area": (box[2] - box[0]) * (box[3] - box[1]),
        "iscrowd": 0,
        "score": scores[id],
    })
# Build both COCO datasets in memory (no JSON files) and index them.
coco_actual.dataset = {
    "images": [{"id": 1}],
    "annotations": actual_annotations_list,
    "categories": [{"id": 1, "name": "object"}],
}
coco_actual.createIndex()
coco_predicted.dataset = {
    "images": [{"id": 1}],
    "annotations": predicted_annotations_list,
    "categories": [{"id": 1, "name": "object"}],
}
coco_predicted.createIndex()
# Run the standard bbox evaluation and print the summary table.
coco_eval = COCOeval(coco_actual, coco_predicted, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()

Here is the output:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.252
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.500

I believe both boxes count as "large" (each is 100×100 = 10,000 px², above the 96² = 9,216 area threshold), and the summary shows AP=0.252 and AR=0.500 for them. These numbers do not make sense to me: the predictions are 100% identical to the ground truth, so I'd expect both average precision and average recall to be 1.0, right? Am I misunderstanding something, or is there a bug in how these metrics are calculated?
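For what it's worth, here is a minimal sanity check (just a sketch using pycocotools.mask.iou, which as far as I can tell is the same routine COCOeval.computeIoU calls for bbox IoU) confirming that, after converting to [x, y, w, h], each predicted box has IoU 1.0 with its corresponding ground-truth box:

from pycocotools import mask as maskUtils

# Same boxes as above, given as [x1, y1, x2, y2] corners.
boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]

# Convert to COCO's [x, y, width, height] format.
xywh = [[x1, y1, x2 - x1, y2 - y1] for x1, y1, x2, y2 in boxes]

# iou(dt, gt, iscrowd) expects one iscrowd flag per ground-truth box and
# returns a (num_dt, num_gt) matrix of IoUs.
ious = maskUtils.iou(xywh, xywh, [0] * len(xywh))
print(ious)  # diagonal should be 1.0; off-diagonal 0.0 (the two boxes don't overlap each other)

So at every IoU threshold from 0.50 to 0.95 each detection should match a ground truth, giving 2 TP, 0 FP, 0 FN, and therefore precision = recall = 1.0, which is why the 0.252 / 0.500 summary values surprise me.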
