
Batching on .extract_faces to improve performance and utilize GPU in full #1435

Open · wants to merge 31 commits into base: master
Conversation

galthran-wq (Contributor)

Tickets

#1433
#1101
#1434

What has been done

With this PR, .extract_faces is able to accept a list of images.

How to test

make lint && make test

Benchmarking on detecting 50 faces:
[benchmark chart]

For yolov11n, batch size 20 is 59.27% faster than batch size 1.
For yolov11s, batch size 20 is 29.00% faster than batch size 1.
For yolov11m, batch size 20 is 31.73% faster than batch size 1.
For yolov8, batch size 20 is 12.68% faster than batch size 1.
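For reference, the percentage figures above follow the usual relative-improvement formula; a small illustrative helper (not part of the PR, timings below are made up):

```python
def pct_faster(t_base: float, t_batched: float) -> float:
    """Percent reduction of the batched runtime relative to the batch-size-1 runtime."""
    return (t_base - t_batched) / t_base * 100.0

# Illustrative timings only: 10.0 s at batch size 1 vs. 4.073 s at
# batch size 20 would correspond to a 59.27% speedup.
print(round(pct_faster(10.0, 4.073), 2))
```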

@skyler14

Do you have a branch in your fork that currently combines all the optimizations you've submitted? I'd like to start using them while the approval process is ongoing.

What's been the total speedup you've been able to see?

@galthran-wq (Contributor Author)

I do. You can check
https://github.com/galthran-wq/deepface/tree/master-enhanced

it combines these two PRs with some other small modifications:

  • .represent uses batched detector inference (in Batching on .represent to improve performance and utilize GPU in full #1433 it only does batched embedding, because batched detection is not yet implemented)
  • .represent returns a list of lists of dicts if a batch of images is passed. This is necessary to be able to recover which images the resulting faces correspond to. It might be a good idea to include this change in this PR as well. You can check the test in the fork.
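The list-of-lists return shape described above can be flattened back into (image index, face) pairs with a one-liner; a sketch assuming each inner list holds the face dicts for one input image:

```python
def faces_with_image_index(batched_results):
    """Pair each face dict with the index of the image it came from."""
    return [(i, face) for i, faces in enumerate(batched_results) for face in faces]

# Two input images: one face in the first, two in the second.
batched = [[{"x": 1}], [{"x": 2}, {"x": 3}]]
print(faces_with_image_index(batched))
```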

Not all of the detectors (both in this PR and in the fork) currently implement batching. YOLO, in particular, does, and I've found it to be optimal in terms of performance and inference speed. The only problem is installing both torch and tensorflow with GPU support, but I've managed to do that.

All in all, with the combination of yolov11m and Facenet, both using the GPU, and batch size 100 (the largest I could fit on a 4090), I am seeing around a 15x speed boost, but that is highly dependent on the input images and the GPU (especially its memory size). I've also had a quick peek, and it seems like performance on the CPU is improved as well.

@serengil FYI, I would be happy to contribute the aforementioned modifications once we have progress on the PRs.

@serengil (Owner)

I will review this PR this week, I hope.

@serengil (Owner)

Seems this breaks the unit tests. Must be sorted.

@galthran-wq (Contributor Author)

Should be good now.

@serengil (Owner)

Nope, still failing.

resp.append(facial_area)

return resp
if isinstance(img, np.ndarray):
Owner:

What if img is a (2, 224, 224, 3)-sized numpy array? I mean, many images in a single numpy array.
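One way to handle that case is a simple ndim check before dispatching; a minimal sketch (the function name is hypothetical, not the PR's actual code):

```python
import numpy as np

def split_input(img):
    """Treat a 4-D (N, H, W, C) array as a batch of N images,
    and a 3-D (H, W, C) array as a single image."""
    if isinstance(img, np.ndarray) and img.ndim == 4:
        return [one for one in img]   # list of (H, W, C) views
    return [img]

single = np.zeros((224, 224, 3), dtype=np.uint8)
batch = np.stack([single, single])    # shape (2, 224, 224, 3)
print(len(split_input(single)), len(split_input(batch)))
```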

@serengil (Owner)

You implemented OpenCv, Ssd, Yolo, MtCnn and RetinaFace to accept list inputs.

What if I send a list to YuNet, MediaPipe, FastMtCnn, Dlib or CenterFace?

I assume an exception will be thrown, but users should see a meaningful message.

@serengil (Owner)

@galthran-wq you are pushing too many things; would you please inform me when it is ready?

)


def test_batch_extract_faces_single_image():
Contributor Author:

FYI this might not be the expected behaviour.

Owner:

what do you mean?

def test_batch_extract_faces(detector_backend):
detector_backend_to_rtol = {
"opencv": 0.1,
"mtcnn": 0.2,
Contributor Author:

Turns out batching has a different impact on the results for different backends. It is mostly the same (for almost all the models, the relative error is <1%).
mtcnn and opencv suffer a bit more; I'm not quite sure why (i.e. which features differ exactly).
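The per-backend tolerance check can be expressed with np.allclose; an illustrative comparison (the numbers are made up, not from the PR's tests):

```python
import numpy as np

# Bounding boxes (x, y, w, h) from non-batched vs. batched runs.
unbatched = np.array([100.0, 120.0, 50.0, 50.0])
batched = np.array([101.0, 119.0, 50.0, 51.0])

# Within opencv's looser 10% relative tolerance...
print(np.allclose(batched, unbatched, rtol=0.1))
# ...but not within a strict 0.1% tolerance.
print(np.allclose(batched, unbatched, rtol=0.001))
```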

Owner:

We may consider throwing an exception for those detectors, as they are perhaps not suitable for batching.

@galthran-wq (Contributor Author)

galthran-wq commented Feb 18, 2025

@galthran-wq you are pushing too many things, would you please inform me when it is ready.

It is ready now; I usually push when it's done.

I've just fixed what you've noted:

  • added pseudo-batching (a for loop) for the other models
  • support for batched np array input (ndim=4)
  • changed extract_faces to return a list of lists on batched input
  • a couple more tests
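The pseudo-batching fallback mentioned above amounts to looping over the images for detectors without native batch support; a minimal sketch with a stand-in detector interface (not the PR's exact code):

```python
def detect_faces_batch(detect_one, imgs):
    """Pseudo-batching: apply a single-image detector to each image in turn.
    `detect_one` stands in for a detector's single-image detect method."""
    return [detect_one(img) for img in imgs]

# Dummy single-image detector returning one fixed face box per image.
fake_detect = lambda img: [{"w": img[0], "h": img[1]}]
print(detect_faces_batch(fake_detect, [(10, 20), (30, 40)]))
```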

len(imgs_objs_batch) == 4 and
all(isinstance(obj, list) for obj in imgs_objs_batch)
)
assert all(
Owner:

Please add a comment: the last one has many faces, the others have just one face.

@@ -79,6 +79,144 @@ def test_different_detectors():
logger.info(f"✅ extract_faces for {detector} backend test is done")


@pytest.mark.parametrize("detector_backend", [
Owner:

Please add a unit test case for:

img1 = cv2.imread(img1_path)
img2 = cv2.imread(img2_path)

img = np.stack([img1, img2])

assert len(img.shape) == 4 # Check dimension.
assert img.shape[0] == 2 # Check batch size.

What if the image is a batch in numpy format?

resp = []

detected_face = None
if isinstance(img, np.ndarray):
Owner:

I recommend calling detect_faces itself recursively if the input is a list. This can also be done in Detector.py or detection.py.

That way, we will not change all the detectors.
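The suggestion above amounts to handling the list case once at the dispatch level; a sketch (the function shape and names here are assumptions, not the actual deepface internals):

```python
def detect_faces(detector, img):
    """Dispatch-level batching: recurse on list input so each
    concrete detector only ever sees a single image."""
    if isinstance(img, list):
        return [detect_faces(detector, one) for one in img]
    return detector(img)  # stand-in for the single-image detection path

# Dummy detector to illustrate the dispatch.
fake_detector = lambda img: {"faces": img}
print(detect_faces(fake_detector, ["a", "b"]))
```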

@serengil (Owner)

Is this ready? I can still see my comments unresolved.

@galthran-wq (Contributor Author)

Is this ready? I can still see my comments unresolved.

Almost. Tell me if you think the batched numpy array test is okay, and let's also settle on whether refactoring of the current detect_faces for non-batched detectors is needed (your last comment).

I plan to:

  • add test comments
  • fall back to pseudo-batching on opencv and mtcnn, and make sure the rtol then stays at 0.01%
  • (maybe) refactor the detectors a bit
