An ICNet model for fast semantic segmentation, trained from scratch on the CamVid dataset using the TensorFlow* framework. The trained model has 30% sparsity (the ratio of zeros among all convolution kernel weights). For details about the original floating-point model, see ICNet for Real-Time Semantic Segmentation on High-Resolution Images.
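To make the sparsity definition concrete, here is a minimal sketch that computes the ratio of zeros over a set of convolution kernels. The function name `kernel_sparsity` and the toy kernels are hypothetical and only illustrate the arithmetic. They are not taken from the model or its training code.

```python
import numpy as np

def kernel_sparsity(kernels):
    """Return the fraction of exactly-zero weights across all given kernels."""
    total = sum(k.size for k in kernels)
    zeros = sum(int(np.count_nonzero(k == 0)) for k in kernels)
    return zeros / total

# Two made-up kernels (illustration only): 3 zeros out of 10 weights in total.
kernels = [
    np.array([[0.0, 0.5], [1.2, 0.0]]),         # 4 weights, 2 zeros
    np.array([0.3, 0.0, -0.7, 0.9, 0.1, 0.4]),  # 6 weights, 1 zero
]
sparsity = kernel_sparsity(kernels)
print(f"{sparsity:.0%}")  # 30%
```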
The model input is a blob that consists of a single image of shape `1, 720, 960, 3` in the `BGR` order. The pixel values are integers in the [0, 255] range.
The model output for `icnet-camvid-ava-sparse-30-0001` is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset:
- Sky
- Building
- Pole
- Road
- Pavement
- Tree
- SignSymbol
- Fence
- Vehicle
- Pedestrian
- Bike
- Unlabeled
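As an illustration of how a predicted class map could be summarized, the sketch below counts pixels per class name. It assumes that class indices follow the order of the list above (0 = Sky, ..., 11 = Unlabeled). The source does not state this mapping, so verify it before relying on it. The names `CAMVID_CLASSES` and `class_pixel_counts` are hypothetical.

```python
import numpy as np

# ASSUMPTION: index i corresponds to the i-th entry of the class list above.
CAMVID_CLASSES = [
    "Sky", "Building", "Pole", "Road", "Pavement", "Tree",
    "SignSymbol", "Fence", "Vehicle", "Pedestrian", "Bike", "Unlabeled",
]

def class_pixel_counts(class_map):
    """Count how many pixels were assigned to each class that occurs."""
    counts = np.bincount(class_map.ravel(), minlength=len(CAMVID_CLASSES))
    return {name: int(n) for name, n in zip(CAMVID_CLASSES, counts) if n > 0}

# Tiny 2 x 3 stand-in for a real 720 x 960 prediction map.
toy_map = np.array([[0, 0, 3],
                    [1, 3, 3]])
counts = class_pixel_counts(toy_map)
print(counts)  # {'Sky': 2, 'Building': 1, 'Road': 3}
```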
Metric | Value |
---|---|
GFlops | 75.8180 |
MParams | 26.7043 |
Source framework | TensorFlow* |
The quality metrics were calculated on the CamVid validation dataset. The `unlabeled` class was ignored during metrics calculation.
Metric | Value |
---|---|
mIoU | 75.87% |
`IOU=TP/(TP+FN+FP)`, where:
- `TP` - number of true positive pixels for given class
- `FN` - number of false negative pixels for given class
- `FP` - number of false positive pixels for given class
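The formula above can be sketched in NumPy as follows. This is not the evaluation script used to produce the reported number. It assumes that mIoU is the plain mean of per-class IoU values and that "ignoring" a class means dropping pixels whose ground-truth label is that class. The function name `per_class_iou` and the `ignore_index` parameter are hypothetical.

```python
import numpy as np

def per_class_iou(pred, truth, num_classes, ignore_index=None):
    """Compute IOU = TP / (TP + FN + FP) for every class present in the data."""
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    if ignore_index is not None:
        # ASSUMPTION: pixels labeled with the ignored class are excluded.
        keep = truth != ignore_index
        pred, truth = pred[keep], truth[keep]
    ious = {}
    for c in range(num_classes):
        if c == ignore_index:
            continue
        tp = np.sum((pred == c) & (truth == c))
        fn = np.sum((pred != c) & (truth == c))
        fp = np.sum((pred == c) & (truth != c))
        denom = tp + fn + fp
        if denom > 0:  # skip classes absent from both prediction and truth
            ious[c] = float(tp / denom)
    return ious

# Toy 2 x 3 example with three classes; class 2 plays the ignored role.
truth = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 0, 2]])
ious = per_class_iou(pred, truth, num_classes=3, ignore_index=2)
miou = sum(ious.values()) / len(ious)
print(ious, miou)
```

In the toy example, class 0 has one true positive and one false negative, and class 1 has two true positives and one false positive, so the per-class values are 1/2 and 2/3.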
Image, name: `data`, shape - `1, 720, 960, 3`, format is `B, H, W, C`, where:
- `B` - batch size
- `H` - height
- `W` - width
- `C` - channel

Channel order is `BGR`.
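A minimal sketch of preparing an input blob with the layout described above. It assumes the frame has already been resized to 720 x 960 and keeps the pixel values as `uint8`. The exact data type expected by a given runtime is not stated here, so treat that as an assumption. The helper names `rgb_to_bgr` and `to_input_blob` are hypothetical.

```python
import numpy as np

def rgb_to_bgr(image_rgb):
    """Reverse the channel axis so an RGB image becomes BGR."""
    return image_rgb[..., ::-1]

def to_input_blob(image_bgr):
    """Add the batch axis to an H x W x C BGR image, giving B, H, W, C."""
    if image_bgr.shape != (720, 960, 3):
        raise ValueError(f"expected shape (720, 960, 3), got {image_bgr.shape}")
    return np.ascontiguousarray(image_bgr[np.newaxis, ...])

# Synthetic pure-red RGB frame standing in for a real image.
frame_rgb = np.zeros((720, 960, 3), dtype=np.uint8)
frame_rgb[..., 0] = 255
blob = to_input_blob(rgb_to_bgr(frame_rgb))
print(blob.shape)  # (1, 720, 960, 3)
```

Images loaded with OpenCV's `cv2.imread` are already in BGR order, so the channel swap is only needed for RGB sources.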
Semantic segmentation class prediction map, shape - `1, 720, 960`, output data format is `B, H, W`, where:
- `B` - batch size
- `H` - vertical coordinate of the input pixel
- `W` - horizontal coordinate of the input pixel

Output contains the class prediction result of each pixel.
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
[*] Other names and brands may be claimed as the property of others.