This is a person reidentification model for a general scenario. It takes a whole-body image as input and outputs an embedding vector, so that a pair of images can be matched by the cosine distance between their embeddings. The model is based on the OmniScaleNet backbone with Linear Context Transform (LCT) blocks developed for fast inference. A single reidentification head from the 1/16 scale feature map outputs an embedding vector of 256 floats.
Metric | Value |
---|---|
Market-1501 rank@1 accuracy | 96.2 % |
Market-1501 mAP | 87.7 % |
Pose coverage | Standing upright, parallel to image plane |
Support of occluded pedestrians | YES |
Occlusion coverage | <50% |
GFlops | 1.993 |
MParams | 2.103 |
Source framework | PyTorch* |
The cumulative matching curve (CMC) at rank-1 is the accuracy that denotes the probability of finding a true positive at the top-1 rank. Mean Average Precision (mAP) is the mean of the Average Precision (AP) over all queries. AP is defined as the area under the precision-recall curve.
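The two metrics can be illustrated for a single query with a minimal sketch. The function name `rank1_and_ap` and its arguments are hypothetical, and AP is approximated in the usual way as the mean of the precision values at each rank where a true match is retrieved. The full Market-1501 protocol also filters out some gallery images (for example, same-camera matches), which this sketch omits.

```python
import numpy as np

def rank1_and_ap(distances, is_match):
    """Return (rank-1 hit, average precision) for one query.

    distances: 1-D array of query-to-gallery distances (smaller = more similar).
    is_match:  1-D boolean array, True where the gallery item has the query identity.
    """
    order = np.argsort(distances)                   # best match first
    hits = np.asarray(is_match, dtype=bool)[order]  # relevance in ranked order
    if not hits.any():
        return 0.0, 0.0                             # no true match in the gallery
    rank1 = float(hits[0])                          # 1.0 if the top result is correct
    cum_hits = np.cumsum(hits)                      # true matches seen up to each rank
    ranks = np.arange(1, len(hits) + 1)
    precision_at_hits = cum_hits[hits] / ranks[hits]
    return rank1, float(precision_at_hits.mean())
```

Averaging the first value over all queries gives the rank-1 accuracy, and averaging the second gives mAP.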
The net expects one input image of the shape `1, 3, 256, 128` in the `B, C, H, W` format, where:
- `B` - batch size
- `C` - number of channels
- `H` - image height
- `W` - image width

The expected color order is `BGR`.
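As a rough sketch, an image already decoded as an H x W x 3 BGR array could be brought into this layout as follows. The helper name `to_network_input`, the nearest-neighbour resize, and the `float32` output type without any pixel scaling are assumptions made for illustration; the description above specifies only the shape, layout, and color order.

```python
import numpy as np

def to_network_input(image_bgr, height=256, width=128):
    """Convert an H x W x 3 BGR image into a 1 x 3 x height x width array."""
    src_h, src_w = image_bgr.shape[:2]
    # Nearest-neighbour resize by index lookup (assumption: any resize
    # method can be substituted, e.g. an image library's resize function).
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    resized = image_bgr[rows[:, None], cols[None, :]]  # (height, width, 3)
    chw = resized.transpose(2, 0, 1)                   # HWC -> CHW, BGR order kept
    return chw[np.newaxis, ...].astype(np.float32)     # prepend batch dimension
```

The channel order is left untouched, so an image loaded as RGB would need its channels reversed before this step.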
The net outputs a blob with the `1, 256` shape named `reid_embedding`, which can be compared with other descriptors using the cosine distance.
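The comparison of two descriptors can be sketched as below. The function name `cosine_distance` is illustrative, and the distance is taken as one minus the cosine similarity, which is the common convention; any decision threshold for declaring a match is application-specific and not given here.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two embeddings, e.g. two 1 x 256 outputs."""
    a = np.asarray(a, dtype=np.float64).ravel()  # flatten (1, 256) -> (256,)
    b = np.asarray(b, dtype=np.float64).ravel()
    similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(similarity)
```

Under this convention, embeddings pointing in the same direction give a distance near 0, and orthogonal embeddings give a distance of 1.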
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
- Crossroad Camera C++ Demo
- Multi Camera Multi Target Python* Demo
- Pedestrian Tracker C++ Demo
- Social Distance C++ Demo
[*] Other names and brands may be claimed as the property of others.