Adding no-kalman tracking mode, class_id aggregation, video tracking examples and other small improvements (#17)
wmuron authored Sep 22, 2021
1 parent 6d52e07 commit 50d8be8
Showing 20 changed files with 674 additions and 214 deletions.
10 changes: 9 additions & 1 deletion Makefile
@@ -20,7 +20,7 @@ env-install:
clean:
autoflake --in-place --remove-unused-variables ./motpy/*.py ./tests/*.py

check:
static-check:
mypy --ignore-missing-imports motpy

demo-mot16:
@@ -29,3 +29,11 @@ demo-mot16:

demo-webcam:
python examples/webcam_face_tracking.py

demo-video:
python examples/detect_and_track_in_video.py \
--video_path=./assets/video.mp4 \
--detect_labels=['car','truck'] \
--tracker_min_iou=0.2 \
--architecture=fasterrcnn \
--device=cuda
97 changes: 65 additions & 32 deletions README.md
@@ -4,57 +4,86 @@ Project is meant to provide a simple yet powerful baseline for multiple object t

![2D tracking preview](assets/mot16_challange.gif)

*video source: https://motchallenge.net/data/MOT16/ - sequence 11*
_video source: <https://motchallenge.net/data/MOT16/> - sequence 11_

## Features:
## Features

- tracking by detection paradigm
- IOU + (optional) feature similarity matching strategy
- Kalman filter used to model object trackers
- each object is modeled as a center point (n-dimensional) and its size (n-dimensional); e.g. 2D position with width and height would be the most popular use case for bounding boxes tracking
- separately configurable system order for object position and size (currently 0th, 1st and 2nd order systems are allowed)
- quite fast, more than realtime performance even on Raspberry Pi
- tracking by detection paradigm
- IOU + (optional) feature similarity matching strategy
- Kalman filter used to model object trackers
- each object is modeled as a center point (n-dimensional) and its size (n-dimensional); e.g. 2D position with width and height would be the most popular use case for bounding boxes tracking
- separately configurable system order for object position and size (currently 0th, 1st and 2nd order systems are allowed)
- quite fast, more than realtime performance even on Raspberry Pi
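
The IOU matching mentioned in the features above uses the standard intersection-over-union formula. A minimal, self-contained sketch of that formula (illustrative only, not motpy's internal implementation):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Thresholding this score (e.g. `--tracker_min_iou=0.2` in the demo targets) decides whether a detection is matched to an existing track.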

## Installation:
## Installation

### Latest release:
### Latest release

```bash
pip install motpy
```

### Develop:
#### Additional installation steps on Raspberry Pi

You might need to install the following dependencies on the RPi platform:

```bash
sudo apt-get install python-scipy
sudo apt install libatlas-base-dev
```

### Develop

```bash
git clone https://github.com/wmuron/motpy
cd motpy
make install-develop # to install editable version of library
make test # to run all tests
```

## Demo
## Example usage

### 2D tracking
### 2D tracking - synthetic example

Run the demo example of tracking N objects in 2D space. In the ideal world it will show a bunch of colorful objects moving on a grey canvas in various directions, sometimes overlapping, sometimes not. Each object is detected from time to time (green box) and once it is tracked by motpy, its track box is drawn in red with an ID above.

```
```bash
make demo
```

![2D tracking preview](assets/2d_multi_object_tracking.gif)
<https://user-images.githubusercontent.com/5874874/134305624-d6358cb1-39f8-4499-8a7b-64745f4795a6.mp4>

### Detect and track objects in the video

- example uses a COCO-trained model provided by torchvision library
- to run this example, you'll have to install `requirements_dev.txt` dependencies (`torch`, `torchvision`, etc.)
- to run on CPU, specify `--device=cpu`

```bash
python examples/detect_and_track_in_video.py \
--video_path=./assets/video.mp4 \
--detect_labels=['car','truck'] \
--tracker_min_iou=0.15 \
--device=cuda
```

<https://user-images.githubusercontent.com/5874874/134303165-b6835c8a-9cfe-486c-b79f-499f638c0a71.mp4>

_video source: <https://www.youtube.com/watch?v=PGMu_Z89Ao8/>, a great YT channel created by J Utah_

### MOT16 challange tracking

1. Download MOT16 dataset from `https://motchallenge.net/data/MOT16/` and extract to `~Downloads/MOT16` directory,
2. Type the command:
```bash
python examples/mot16_challange.py --dataset_root=~/Downloads/MOT16 --seq_id=11
```
This will run a simplified example where a tracker processes artificially corrupted ground-truth bounding boxes from sequence 11; you can preview the expected results in the beginning of the README file.
1. Download MOT16 dataset from `https://motchallenge.net/data/MOT16/` and extract to `~/Downloads/MOT16` directory,
2. Type the command:
```bash
python examples/mot16_challange.py --dataset_root=~/Downloads/MOT16 --seq_id=11
```
This will run a simplified example where a tracker processes artificially corrupted ground-truth bounding boxes from sequence 11; you can preview the expected results in the beginning of the README file.

### Face tracking on webcam

Run the following command to start tracking your own face.

```bash
python examples/webcam_face_tracking.py
```
@@ -104,26 +133,30 @@ model_spec = {
    'r_var_pos': 0.1  # measurement noise
}
tracker = MultiObjectTracker(dt=1 / 10, model_spec=model_spec)
tracker = MultiObjectTracker(dt=0.1, model_spec=model_spec)
```
The simplification used here is that the object position and size can be treated and modeled independently; hence you can use even 2D bounding boxes in 3D space.
Feel free to tune the parameters of the Q and R matrix builders to better fit your use case.
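
Assuming motpy composes the Kalman state from a position block and a size block, each carrying `order + 1` derivatives per dimension, the spec above implies a 6-dimensional state (x, y, vx, vy, w, h). A quick sketch of that arithmetic (illustrative, not motpy internals):

```python
def state_length(order_pos, dim_pos, order_size, dim_size, **_ignored):
    """State length implied by a motpy-style model spec: each tracked
    dimension keeps (order + 1) terms (position, velocity, acceleration, ...)."""
    return dim_pos * (order_pos + 1) + dim_size * (order_size + 1)

spec = {'order_pos': 1, 'dim_pos': 2,   # position is a center point (x, y), under constant velocity
        'order_size': 0, 'dim_size': 2,  # bounding box is 2-dimensional, under constant velocity
        'q_var_pos': 5000., 'r_var_pos': 0.1}
print(state_length(**spec))  # x, y, vx, vy, w, h -> 6
```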
## Tested platforms
- Linux (Ubuntu)
- macOS (Catalina)
- Raspberry Pi (4)
- Linux (Ubuntu)
- macOS (Catalina)
- Raspberry Pi (4)
## Things to do
- [x] Initial version
- [ ] Documentation
- [ ] Performance optimization
- [ ] Multiple object classes support
- [x] Initial version
- [ ] Documentation
- [ ] Performance optimization
- [x] Multiple object classes support via instance-level class_id counting
- [x] Allow tracking without Kalman filter
- [x] Easy to use and configurable example of video processing with off-the-shelf object detector
## References, papers, ideas and acknowledgements
- https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/
- http://elvera.nue.tu-berlin.de/files/1517Bochinski2017.pdf
- https://arxiv.org/abs/1602.00763
- https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/
- http://elvera.nue.tu-berlin.de/files/1517Bochinski2017.pdf
- https://arxiv.org/abs/1602.00763
Binary file removed assets/2d_multi_object_tracking.gif
Binary file not shown.
Binary file added assets/video.mp4
Binary file not shown.
28 changes: 18 additions & 10 deletions examples/2d_multi_object_tracking.py
@@ -1,14 +1,21 @@
import time

import cv2
import motpy
from motpy import ModelPreset, MultiObjectTracker
from motpy.core import setup_logger
from motpy.testing_viz import draw_rectangle, draw_text, image_generator
from motpy.testing_viz import draw_rectangle, draw_track, image_generator
from motpy.utils import ensure_packages_installed

logger = setup_logger(__name__, is_main=True)
ensure_packages_installed(['cv2'])


def demo_tracking_visualization(num_steps: int = 1000, num_objects: int = 10):
logger = setup_logger(__name__, 'DEBUG', is_main=True)


def demo_tracking_visualization(
        model_spec=ModelPreset.constant_acceleration_and_static_box_size_2d.value,
        num_steps: int = 1000,
        num_objects: int = 20):
    gen = image_generator(
        num_steps=num_steps,
        num_objects=num_objects,
@@ -20,23 +27,24 @@ def demo_tracking_visualization(num_steps: int = 1000, num_objects: int = 10):
    dt = 1 / 24
    tracker = MultiObjectTracker(
        dt=dt,
        model_spec=ModelPreset.constant_acceleration_and_static_box_size_2d.value,
        model_spec=model_spec,
        active_tracks_kwargs={'min_steps_alive': 2, 'max_staleness': 6},
        tracker_kwargs={'max_staleness': 12})

    for _ in range(num_steps):
        img, _, detections = next(gen)

        detections = [d for d in detections if d.box is not None]

        t0 = time.time()
        active_tracks = tracker.step(detections=detections)
        elapsed = (time.time() - t0) * 1000.
        logger.debug(f'tracking elapsed time: {elapsed:.3f} ms')

        for track in active_tracks:
            score = track.score if track.score is not None else -1
            img = draw_rectangle(img, track.box, color=(10, 10, 220), thickness=5)
            img = draw_text(img, f'{track.id[:8]}... ({score:.2f})', above_box=track.box)
            draw_track(img, track)

        for det in detections:
            img = draw_rectangle(img, det.box, color=(10, 220, 20), thickness=1)
            draw_rectangle(img, det.box, color=(10, 220, 20), thickness=1)

        cv2.imshow('preview', img)
        # stop the demo by pressing q
107 changes: 107 additions & 0 deletions examples/coco_labels.py
@@ -0,0 +1,107 @@
from typing import Sequence

COCO_LABELS = {0: '__background__',
1: 'person',
2: 'bicycle',
3: 'car',
4: 'motorcycle',
5: 'airplane',
6: 'bus',
7: 'train',
8: 'truck',
9: 'boat',
10: 'traffic light',
11: 'fire hydrant',
12: 'stop sign',
13: 'parking meter',
14: 'bench',
15: 'bird',
16: 'cat',
17: 'dog',
18: 'horse',
19: 'sheep',
20: 'cow',
21: 'elephant',
22: 'bear',
23: 'zebra',
24: 'giraffe',
25: 'backpack',
26: 'umbrella',
27: 'handbag',
28: 'tie',
29: 'suitcase',
30: 'frisbee',
31: 'skis',
32: 'snowboard',
33: 'sports ball',
34: 'kite',
35: 'baseball bat',
36: 'baseball glove',
37: 'skateboard',
38: 'surfboard',
39: 'tennis racket',
40: 'bottle',
41: 'wine glass',
42: 'cup',
43: 'fork',
44: 'knife',
45: 'spoon',
46: 'bowl',
47: 'banana',
48: 'apple',
49: 'sandwich',
50: 'orange',
51: 'broccoli',
52: 'carrot',
53: 'hot dog',
54: 'pizza',
55: 'donut',
56: 'cake',
57: 'chair',
58: 'couch',
59: 'potted plant',
60: 'bed',
61: 'dining table',
62: 'toilet',
63: 'tv',
64: 'laptop',
65: 'mouse',
66: 'remote',
67: 'keyboard',
68: 'cell phone',
69: 'microwave',
70: 'oven',
71: 'toaster',
72: 'sink',
73: 'refrigerator',
74: 'book',
75: 'clock',
76: 'vase',
77: 'scissors',
78: 'teddy bear',
79: 'hair drier',
80: 'toothbrush'}


def get_class_ids(labels) -> Sequence[int]:
    if len(labels) == 0:
        raise ValueError('specify at least one label to detect')

    if isinstance(labels[0], int):
        for class_id in labels:
            if class_id not in COCO_LABELS:
                raise ValueError(f'provided unknown COCO class id: {class_id}')

        return labels
    elif isinstance(labels[0], str):
        inv = {v: k for k, v in COCO_LABELS.items()}
        class_ids = []
        for class_name in labels:
            if class_name not in inv:
                raise ValueError(f'provided unknown COCO class name: {class_name}')

            class_ids.append(inv[class_name])

        return class_ids
    else:
        raise NotImplementedError()
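
The name-or-id dispatch above can be exercised with a miniature, self-contained version of the same pattern (label table trimmed to three entries for illustration; the real helper uses the full `COCO_LABELS` mapping):

```python
labels_by_id = {1: 'person', 3: 'car', 8: 'truck'}  # trimmed stand-in for COCO_LABELS
ids_by_label = {v: k for k, v in labels_by_id.items()}  # inverted lookup, as in get_class_ids

def to_class_ids(labels):
    """Resolve a list of class ids or class names to a list of class ids."""
    if len(labels) == 0:
        raise ValueError('specify at least one label to detect')
    if isinstance(labels[0], int):
        unknown = [c for c in labels if c not in labels_by_id]
        if unknown:
            raise ValueError(f'unknown class ids: {unknown}')
        return list(labels)
    class_ids = []
    for name in labels:
        if name not in ids_by_label:
            raise ValueError(f'unknown class name: {name}')
        class_ids.append(ids_by_label[name])
    return class_ids

print(to_class_ids(['car', 'truck']))  # [3, 8]
```

This is what lets the demo accept `--detect_labels=['car','truck']` while the detector itself works with integer class ids.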
