Is it possible to Use DINOv2 for Facial Recognition/Search #455

adalinama · 2024-08-15T20:27:16Z

I have my own dataset with hundreds of thousands of photos of people's faces and was wondering if anyone has done something similar using DINO for facial recognition. I'm hoping to input a photo of a face and use DINO as a backend to gather all photos that contain the same person, without having to do extensive labeling of the dataset. I also hope DINO can help differentiate faces better when clustering similar images together: the solutions I have tried so far either fail to identify the same person accurately across different pictures, or think that all people with glasses and white hair are Steven Spielberg. Not sure if I explained this well, but as an example: sometimes I just want to find one picture that I may have taken with a person several years ago, and I only have one picture. Can I use DINO for this? And what would be a good direction to get started in implementing something like this?

odusseys · 2024-08-21T08:08:20Z

You are going to have a much easier time using models dedicated to face embeddings. DINOv2 is quite slow and gives extremely high-dimensional outputs, which are not really conducive to vector search without some kind of dimensionality reduction.
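
To make the dimensionality-reduction point concrete, here is a minimal sketch of reducing embeddings with PCA and then searching by cosine similarity. It assumes the embeddings have already been extracted into a NumPy array; random clustered vectors stand in for them here, and the function names (`fit_pca`, `project`, `search`) are hypothetical, not part of DINOv2 or any face-embedding library.

```python
import numpy as np

def fit_pca(features, n_components):
    """Return (mean, components) for a PCA projection fitted with SVD."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Project onto the PCA basis and L2-normalise for cosine similarity."""
    reduced = (features - mean) @ components.T
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.clip(norms, 1e-12, None)

def search(query, gallery, top_k=5):
    """Indices of the top_k gallery rows most similar to one query vector."""
    scores = gallery @ query  # cosine similarity, since rows are unit length
    return np.argsort(-scores)[:top_k]

# Stand-in data (assumption): 200 "images" of 4 synthetic identities,
# each a 1536-dimensional vector. Real embeddings would replace this.
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 1536))
labels = np.repeat(np.arange(4), 50)
features = centers[labels] + 0.5 * rng.normal(size=(200, 1536))

mean, components = fit_pca(features, n_components=32)
gallery = project(features, mean, components)
hits = search(gallery[0], gallery, top_k=5)
```

For a gallery of hundreds of thousands of photos, the brute-force dot product in `search` would likely be swapped for an approximate nearest-neighbour index, but the reduce-then-normalise step stays the same.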

zdaiot · 2024-11-21T01:37:13Z

@odusseys Do you have any model to recommend? Thanks

1921134176 · 2024-11-21T08:12:38Z

A simple method is to extract features with backbones of different sizes and then run k-means directly on those features for clustering. If everything goes smoothly, you can see the differences between backbones of different sizes, and then try the backbones with smaller parameter counts to improve efficiency.
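
A rough sketch of that comparison, under stated assumptions: features from each backbone are already extracted into arrays (synthetic vectors stand in for them below), k-means is a plain Lloyd's-algorithm implementation rather than a library call, and a small hand-labeled subset is available to score cluster purity. The `kmeans` and `purity` helpers and the dictionary keys are hypothetical names for illustration.

```python
import numpy as np

def kmeans(features, k, n_iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct random samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign, centroids

def purity(assign, labels):
    """Fraction of samples that carry their cluster's majority label."""
    correct = 0
    for j in np.unique(assign):
        correct += np.bincount(labels[assign == j]).max()
    return correct / len(labels)

# Stand-in features (assumption): 3 identities, 40 images each, at two
# embedding widths standing in for a smaller and a larger backbone.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 40)
stand_in_features = {
    "small backbone (384-d)": rng.normal(size=(3, 384))[labels]
    + 0.5 * rng.normal(size=(120, 384)),
    "base backbone (768-d)": rng.normal(size=(3, 768))[labels]
    + 0.5 * rng.normal(size=(120, 768)),
}

scores = {}
for name, feats in stand_in_features.items():
    assign, _ = kmeans(feats, k=3)
    scores[name] = purity(assign, labels)
```

If the smaller backbone's purity is close to the larger one's on real data, that is a reasonable signal that the cheaper model is enough.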

zdaiot · 2024-11-22T01:32:42Z

@1921134176 Thanks a lot. I also want to ask: DINOv2 blurred identifiable faces in its training data. Does this mean that DINOv2 is not suitable for face tasks?

1921134176 · 2024-11-22T01:41:38Z

In my personal opinion, DINOv2 mainly aims to provide a universal, task-independent feature extraction backbone, and specific downstream adjustments still need to be made for each domain. I am not very familiar with facial recognition, but for remote sensing segmentation we directly freeze the DINOv2 backbone and use the features it extracts to fine-tune downstream applications, which has achieved good results. We use ViT-g.
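
As an illustration of the frozen-backbone idea (not the commenter's actual setup, which was segmentation), here is a minimal sketch that trains only a linear softmax head on precomputed features. Because the backbone is frozen, its outputs can be cached once and treated as fixed inputs. The synthetic features, the 1536-dimensional width (chosen to match ViT-g), and the helper names are all assumptions.

```python
import numpy as np

def train_linear_head(features, labels, n_classes, lr=0.1, epochs=10, seed=0):
    """Full-batch gradient descent on a softmax classifier over fixed features."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    weights = 0.01 * rng.normal(size=(d, n_classes))
    bias = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy loss with respect to the logits.
        grad = (probs - one_hot) / n
        weights -= lr * features.T @ grad
        bias -= lr * grad.sum(axis=0)
    return weights, bias

def predict(features, weights, bias):
    """Class index with the highest logit for each row."""
    return (features @ weights + bias).argmax(axis=1)

# Stand-in for cached frozen-backbone features (assumption): 300 samples,
# 3 classes, 1536 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 1536))
labels = rng.integers(0, 3, size=300)
features = centers[labels] + 0.3 * rng.normal(size=(300, 1536))

weights, bias = train_linear_head(features, labels, n_classes=3)
accuracy = (predict(features, weights, bias) == labels).mean()
```

Only `weights` and `bias` are updated here; nothing flows back into the feature extractor, which is what keeps this kind of training cheap.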

zdaiot · 2024-11-22T02:44:34Z

Thanks a lot. How many pictures did you use for fine-tuning?

1921134176 · 2024-11-23T00:37:14Z

We trained many different task models: the smallest task used 500 samples at 518px and the largest used 500,000 samples at 518px. Generally, training with the backbone frozen for about 10 epochs yields a preliminary usable result. We found that frozen ViT-g has high accuracy from the beginning and converges quickly during downstream training.

zdaiot · 2024-11-25T01:18:49Z

Thanks a lot~
