
Average Precision and Average Recall metrics reported by COCOeval seem to be incorrect #672

Open
tinybike opened this issue Jul 22, 2024 · 0 comments

I'm not sure if this is the right place to report issues with https://github.com/ppwwyyxx/cocoapi -- that repo doesn't have its own Issues tab, so I'm opening an issue here instead.

I'm confused by how pycocotools computes the average precision and average recall metrics reported in the summary. I'm not sure whether this is actually a bug or I'm just fundamentally misunderstanding how these calculations are done under the hood. So I wrote a minimal test case: two ground-truth boxes and two identical predicted boxes (i.e. perfect overlap), passed into COCOeval:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
 
# Boxes are given as [x1, y1, x2, y2] corners; predictions are identical to the ground truth.
actual_boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]
predicted_boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]
scores = [1.0, 1.0]
coco_actual = COCO()
coco_predicted = COCO()
actual_annotations_list = []
predicted_annotations_list = []
# COCO annotations use [x, y, width, height], so convert from corner coordinates.
for id, box in enumerate(actual_boxes):
    actual_annotations_list.append({
        "id": id,
        "image_id": 1,
        "category_id": 1,
        "bbox": [box[0], box[1], box[2] - box[0], box[3] - box[1]],
        "area": (box[2] - box[0]) * (box[3] - box[1]),
        "iscrowd": 0,
    })
for id, box in enumerate(predicted_boxes):
    predicted_annotations_list.append({
        "id": id,
        "image_id": 1,
        "category_id": 1,
        "bbox": [box[0], box[1], box[2] - box[0], box[3] - box[1]],
        "area": (box[2] - box[0]) * (box[3] - box[1]),
        "iscrowd": 0,
        "score": scores[id],
    })
# Build both COCO datasets in memory (no JSON files) and index them.
coco_actual.dataset = {
    "images": [{"id": 1}],
    "annotations": actual_annotations_list,
    "categories": [{"id": 1, "name": "object"}],
}
coco_actual.createIndex()
coco_predicted.dataset = {
    "images": [{"id": 1}],
    "annotations": predicted_annotations_list,
    "categories": [{"id": 1, "name": "object"}],
}
coco_predicted.createIndex()
# Run the standard bbox evaluation and print the summary table.
coco_eval = COCOeval(coco_actual, coco_predicted, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()

Here is the output:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.252
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.500

I believe both boxes count as "large" (each is 100×100 = 10,000 px², above the 96² = 9,216 area threshold), and the summary shows AP=0.252 and AR=0.500 for them. These numbers do not make sense to me: the predictions are 100% identical to the ground truth, so I'd expect both average precision and average recall to be 1.0, right? Am I misunderstanding something, or is there a bug in how these metrics are calculated?
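For what it's worth, here is a minimal sanity check (just a sketch using pycocotools.mask.iou, which as far as I can tell is the same routine COCOeval.computeIoU calls for bbox IoU) confirming that, after converting to [x, y, w, h], each predicted box has IoU 1.0 with its corresponding ground-truth box:

from pycocotools import mask as maskUtils

# Same boxes as above, given as [x1, y1, x2, y2] corners.
boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]

# Convert to COCO's [x, y, width, height] format.
xywh = [[x1, y1, x2 - x1, y2 - y1] for x1, y1, x2, y2 in boxes]

# iou(dt, gt, iscrowd) expects one iscrowd flag per ground-truth box and
# returns a (num_dt, num_gt) matrix of IoUs.
ious = maskUtils.iou(xywh, xywh, [0] * len(xywh))
print(ious)  # diagonal should be 1.0; off-diagonal 0.0 (the two boxes don't overlap each other)

So at every IoU threshold from 0.50 to 0.95 each detection should match a ground truth, giving 2 TP, 0 FP, 0 FN, and therefore precision = recall = 1.0, which is why the 0.252 / 0.500 summary values surprise me.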
