
Batching on .extract_faces to improve performance and utilize GPU in full #1435

Open · wants to merge 31 commits into base: master
Conversation

galthran-wq (Contributor)

Tickets

#1433
#1101
#1434

What has been done

With this PR, .extract_faces is able to accept a list of images.

How to test

make lint && make test

Benchmarking on detecting 50 faces:
[benchmark chart]

For yolov11n, batch size 20 is 59.27% faster than batch size 1.
For yolov11s, batch size 20 is 29.00% faster than batch size 1.
For yolov11m, batch size 20 is 31.73% faster than batch size 1.
For yolov8, batch size 20 is 12.68% faster than batch size 1.
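For reference, the percentage figures above follow the usual relative-improvement formula; a small illustrative helper (not part of the PR, timings below are made up):

```python
def pct_faster(t_base: float, t_batched: float) -> float:
    """Percent reduction of the batched runtime relative to the batch-size-1 runtime."""
    return (t_base - t_batched) / t_base * 100.0

# Illustrative timings only: 10.0 s at batch size 1 vs. 4.073 s at
# batch size 20 would correspond to a 59.27% speedup.
print(round(pct_faster(10.0, 4.073), 2))
```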

@skyler14

Do you have a branch in your fork that currently combines all the optimizations you've submitted? I'd like to start using them while the approval process is ongoing.

What's been the total speedup you've been able to see?

@galthran-wq (Contributor Author)

I do. You can check
https://github.com/galthran-wq/deepface/tree/master-enhanced

it combines these two PRs with some other small modifications:

  • .represent uses batched detector inference (in Batching on .represent to improve performance and utilize GPU in full #1433 it only does batched embedding, because batched detection is not yet implemented)
  • .represent returns a list of lists of dicts if a batch of images is passed. This is necessary to be able to recover which images the resulting faces correspond to. It might be a good idea to include this change in this PR as well. You can check the test in the fork.
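The list-of-lists return shape described above can be flattened back into (image index, face) pairs with a one-liner; a sketch assuming each inner list holds the face dicts for one input image:

```python
def faces_with_image_index(batched_results):
    """Pair each face dict with the index of the image it came from."""
    return [(i, face) for i, faces in enumerate(batched_results) for face in faces]

# Two input images: one face in the first, two in the second.
batched = [[{"x": 1}], [{"x": 2}, {"x": 3}]]
print(faces_with_image_index(batched))
```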

Not all of the detectors (both in this PR and in the fork) currently implement batching. YOLO, in particular, does, and I've found it to be optimal in terms of performance and inference speed. The only problem is installing both torch and tensorflow with GPU support, but I've managed to do that.

All in all, with the combination of yolov11m and Facenet, both using the GPU, and batch size 100 (the largest I could fit on a 4090), I am seeing around a 15x speed boost, but that is highly dependent on the input images and the GPU (especially its memory size). I've also had a quick peek, and it seems like performance on the CPU is improved as well.

@serengil FYI, I would be happy to contribute the aforementioned modifications once we have progress on the PRs.

@serengil (Owner)

I will review this PR this week, I hope.

@serengil (Owner)

Seems this breaks the unit tests. Must be sorted.

@galthran-wq (Contributor Author)

Should be good now.

@serengil (Owner)

Nope, still failing.

resp.append(facial_area)

return resp
if isinstance(img, np.ndarray):
Owner:

What if img is a (2, 224, 224, 3)-sized numpy array? I mean, many images in a single numpy array.
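One way to handle that case is a simple ndim check before dispatching; a minimal sketch (the function name is hypothetical, not the PR's actual code):

```python
import numpy as np

def split_input(img):
    """Treat a 4-D (N, H, W, C) array as a batch of N images,
    and a 3-D (H, W, C) array as a single image."""
    if isinstance(img, np.ndarray) and img.ndim == 4:
        return [one for one in img]   # list of (H, W, C) views
    return [img]

single = np.zeros((224, 224, 3), dtype=np.uint8)
batch = np.stack([single, single])    # shape (2, 224, 224, 3)
print(len(split_input(single)), len(split_input(batch)))
```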

@serengil (Owner)

You implemented OpenCv, Ssd, Yolo, MtCnn and RetinaFace to accept list inputs.

What if I send a list to YuNet, MediaPipe, FastMtCnn, Dlib or CenterFace?

I assume an exception will be thrown, but users should see a meaningful message.

@serengil (Owner)

@galthran-wq you are pushing too many things; would you please inform me when it is ready?

)


def test_batch_extract_faces_single_image():
Contributor Author:

FYI this might not be the expected behaviour.

Owner:

what do you mean?

def test_batch_extract_faces(detector_backend):
detector_backend_to_rtol = {
"opencv": 0.1,
"mtcnn": 0.2,
Contributor Author:

Turns out batching has a different impact on the results for different backends. It is mostly the same (for almost all the models, the relative error is <1%).
mtcnn and opencv suffer a bit more; I'm not quite sure why (i.e. which features differ exactly).
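The per-backend tolerance check can be expressed with np.allclose; an illustrative comparison (the numbers are made up, not from the PR's tests):

```python
import numpy as np

# Bounding boxes (x, y, w, h) from non-batched vs. batched runs.
unbatched = np.array([100.0, 120.0, 50.0, 50.0])
batched = np.array([101.0, 119.0, 50.0, 51.0])

# Within opencv's looser 10% relative tolerance...
print(np.allclose(batched, unbatched, rtol=0.1))
# ...but not within a strict 0.1% tolerance.
print(np.allclose(batched, unbatched, rtol=0.001))
```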

Owner:

We may consider throwing an exception for those detectors, as they are perhaps not suitable for batching.

@galthran-wq (Contributor Author)

galthran-wq commented Feb 18, 2025

@galthran-wq you are pushing too many things, would you please inform me when it is ready.

It is ready now; I usually push when it's done.

I've just fixed what you've noted:

  • added pseudo-batching (a for loop) for the other models
  • support for batched np array input (ndim=4)
  • changed extract_faces to return a list of lists on batched input
  • a couple more tests
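The pseudo-batching fallback mentioned above amounts to looping over the images for detectors without native batch support; a minimal sketch with a stand-in detector interface (not the PR's exact code):

```python
def detect_faces_batch(detect_one, imgs):
    """Pseudo-batching: apply a single-image detector to each image in turn.
    `detect_one` stands in for a detector's single-image detect method."""
    return [detect_one(img) for img in imgs]

# Dummy single-image detector returning one fixed face box per image.
fake_detect = lambda img: [{"w": img[0], "h": img[1]}]
print(detect_faces_batch(fake_detect, [(10, 20), (30, 40)]))
```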

len(imgs_objs_batch) == 4 and
all(isinstance(obj, list) for obj in imgs_objs_batch)
)
assert all(
Owner:

Please add a comment: the last one has many faces, the others have just one face.

@@ -79,6 +79,144 @@ def test_different_detectors():
logger.info(f"✅ extract_faces for {detector} backend test is done")


@pytest.mark.parametrize("detector_backend", [
Owner:

Please add a unit test case for:

img1 = cv2.imread(img1_path)
img2 = cv2.imread(img2_path)

img = np.stack([img1, img2])

assert len(img.shape) == 4 # Check dimension.
assert img.shape[0] == 2 # Check batch size.

What if the image is a batch in numpy format?

resp = []

detected_face = None
if isinstance(img, np.ndarray):
Owner:

I recommend calling detect_faces itself recursively if the input is a list. This can also be done in Detector.py or detection.py.

That way, we will not change all the detectors.
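The suggestion above amounts to handling the list case once at the dispatch level; a sketch (the function shape and names here are assumptions, not the actual deepface internals):

```python
def detect_faces(detector, img):
    """Dispatch-level batching: recurse on list input so each
    concrete detector only ever sees a single image."""
    if isinstance(img, list):
        return [detect_faces(detector, one) for one in img]
    return detector(img)  # stand-in for the single-image detection path

# Dummy detector to illustrate the dispatch.
fake_detector = lambda img: {"faces": img}
print(detect_faces(fake_detector, ["a", "b"]))
```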

@serengil (Owner)

Is this ready? I can still see my comments unresolved.

@galthran-wq (Contributor Author)

Is this ready? I can still see my comments unresolved.

Almost. Tell me if you think the batched numpy array test is okay, and let's also settle on whether refactoring of the current detect_faces for non-batched detectors is needed (your last comment).

I plan to:

  • add test comments
  • fall back to pseudo-batching on opencv and mtcnn, and make sure the rtol then stays at 0.01%
  • (maybe) refactor the detectors a bit
