Is it possible to Use DINOv2 for Facial Recognition/Search #455

Open
adalinama opened this issue Aug 15, 2024 · 8 comments

Comments

@adalinama

I have my own dataset with hundreds of thousands of photos of people's faces and was wondering whether anyone has used DINO for facial recognition in a similar way. I'm hoping to input a photo of a face and use DINO as a backend to gather all photos that contain the same person, without having to do extensive labeling of the dataset. I also hope DINO can help differentiate faces better when clustering similar images together: the solutions I have tried so far either fail to identify the same person accurately across different pictures, or they think that everyone with glasses and white hair is Steven Spielberg. I'm not sure I explained this well, but as an example: sometimes I just want to find a picture I took with a person several years ago, and I only have one picture of them. Can I use DINO for this, and what would be a good direction to get started in implementing something like it?

@odusseys

You are going to have a much easier time using models dedicated to face embeddings. DINOv2 is quite slow and gives extremely high-dimensional outputs, which are not really conducive to vector search without some kind of dimensionality reduction.
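To make the dimensionality-reduction point concrete, here is a minimal sketch of reducing embeddings with PCA and then running a cosine-similarity search. Everything in it is an assumption for illustration: the embeddings are random placeholder vectors (768-dimensional, standing in for whatever model produces them), the target size of 64 is arbitrary, and `fit_pca`, `project`, and `search` are hypothetical helper names. At the scale of hundreds of thousands of photos, a dedicated vector index would likely replace the brute-force dot product used here.

```python
import numpy as np

def fit_pca(features, n_components):
    """Fit a PCA projection with SVD; return (mean, principal components)."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Project onto the PCA subspace and L2-normalise each row."""
    reduced = (features - mean) @ components.T
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.clip(norms, 1e-12, None)

def search(query, index, top_k=5):
    """Return indices of the top_k rows of index most cosine-similar to query."""
    scores = index @ query  # rows are unit length, so this is cosine similarity
    return np.argsort(-scores)[:top_k]

# Placeholder data: random vectors standing in for real image embeddings.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 768)).astype(np.float32)

mean, components = fit_pca(gallery, n_components=64)
index = project(gallery, mean, components)

# Query with a slightly perturbed copy of gallery item 42.
query_raw = gallery[42] + 0.05 * rng.normal(size=768).astype(np.float32)
query = project(query_raw[None, :], mean, components)[0]
hits = search(query, index, top_k=5)
print(hits)
```

The same `mean` and `components` must be reused for every query, since the reduced space is only meaningful relative to the projection fitted on the gallery.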

@zdaiot

zdaiot commented Nov 21, 2024

@odusseys Do you have a model to recommend? Thanks.

@1921134176

A simple method is to extract features with backbones of different sizes and then run k-means directly on those features. If everything goes smoothly, you will see how the clusterings differ between backbone sizes, and you can then try the backbones with fewer parameters to improve efficiency.
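The clustering step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the features are synthetic blobs standing in for real backbone outputs (the 384 and 768 dimensions match the ViT-S/14 and ViT-B/14 embedding sizes, but no model is loaded), the number of clusters is assumed known, and the k-means here uses a deterministic farthest-point initialisation rather than the random or k-means++ initialisation that libraries typically use. All function names are hypothetical.

```python
import numpy as np

def farthest_point_init(features, k):
    """Pick k starting centroids: sample 0, then repeatedly the farthest sample."""
    chosen = [0]
    min_dist = np.linalg.norm(features - features[0], axis=1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return features[chosen].copy()

def kmeans(features, k, n_iters=50):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(features, k)
    for _ in range(n_iters):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

def make_placeholder_features(dim, n_per_identity=100, n_identities=3, seed=0):
    """Synthetic, well-separated blobs standing in for backbone features."""
    rng = np.random.default_rng(seed)
    centers = 10.0 * rng.normal(size=(n_identities, dim))
    true_ids = np.repeat(np.arange(n_identities), n_per_identity)
    features = centers[true_ids] + rng.normal(size=(len(true_ids), dim))
    return features, true_ids

# Compare clusterings from two hypothetical backbone sizes.
results = {}
for name, dim in {"small": 384, "base": 768}.items():
    features, true_ids = make_placeholder_features(dim)
    labels, _ = kmeans(features, k=3)
    results[name] = (labels, true_ids)
    print(name, np.bincount(labels, minlength=3))
```

With real features, the interesting comparison is how often photos of the same person land in the same cluster for each backbone size, which requires at least a small labeled subset to check against.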

@zdaiot

zdaiot commented Nov 22, 2024

@1921134176 Thanks a lot. I'd also like to ask: DINOv2 blurs identifiable faces during training. Does this mean that DINOv2 is not suitable for face tasks?

@1921134176

In my personal opinion, DINOv2 mainly aims to provide a universal, task-independent feature-extraction backbone, and domain-specific downstream adjustments are still needed. I am not very familiar with facial recognition, but for remote sensing segmentation we freeze the DINOv2 backbone and use the features it extracts to fine-tune the downstream applications, which has achieved good results. We use ViT-g.
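As a rough illustration of the frozen-backbone idea, the sketch below trains only a small linear head on top of fixed features. It is not the setup described in the comment: it uses a classification head instead of a segmentation head, the features are synthetic placeholders instead of real backbone outputs, and the 10 epochs and learning rate are arbitrary illustrative values. Because the backbone is frozen, its features can be computed once and reused for every epoch, which is what the precomputed arrays here stand in for.

```python
import numpy as np

def softmax(logits):
    """Numerically stable row-wise softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_linear_head(features, labels, n_classes, epochs=10, lr=0.1):
    """Fit a softmax-regression head with full-batch gradient descent."""
    n, dim = features.shape
    weights = np.zeros((dim, n_classes))
    bias = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features @ weights + bias)
        grad = (probs - one_hot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * features.T @ grad
        bias -= lr * grad.sum(axis=0)
    return weights, bias

def predict(features, weights, bias):
    return (features @ weights + bias).argmax(axis=1)

# Placeholder "frozen" features: synthetic class blobs, not real backbone outputs.
rng = np.random.default_rng(0)
n_classes, dim = 3, 64
centers = 3.0 * rng.normal(size=(n_classes, dim))

def sample(n_per_class):
    labels = np.repeat(np.arange(n_classes), n_per_class)
    return centers[labels] + rng.normal(size=(len(labels), dim)), labels

train_x, train_y = sample(200)
test_x, test_y = sample(50)

weights, bias = train_linear_head(train_x, train_y, n_classes)
accuracy = (predict(test_x, weights, bias) == test_y).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```

Only `weights` and `bias` are updated; nothing that produced the features changes, which keeps training cheap even when the backbone itself is large.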

@zdaiot

zdaiot commented Nov 22, 2024

Thanks a lot. How many pictures did you use for fine-tuning?

@1921134176

We trained models for many different tasks; the smallest task used 500 samples at 518px and the largest used 500,000 samples at 518px. Generally, training with the backbone frozen for about 10 epochs yields a preliminary usable result. We found that the frozen ViT-g has high accuracy from the beginning and converges quickly during downstream training.

@zdaiot

zdaiot commented Nov 25, 2024

Thanks a lot~
