Is it possible to use DINOv2 for facial recognition/search? #455
Comments
You are going to have a much easier time using models dedicated to face embeddings. DINOv2 is quite slow and produces very high-dimensional outputs, which are not well suited to vector search without some kind of dimensionality reduction.
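If you do want to try DINOv2 embeddings for search, PCA is one common way to do the dimensionality reduction mentioned above. The following is a minimal NumPy sketch, assuming the embeddings have already been extracted into an (N, D) array; the random 1536-dimensional data and the choice of 128 components are placeholder assumptions, not values from this thread.

```python
import numpy as np


def fit_pca(embeddings: np.ndarray, n_components: int):
    """Fit PCA on an (N, D) embedding matrix; return (mean, components)."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(embeddings: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project onto the principal directions, then L2-normalize each row."""
    reduced = (embeddings - mean) @ components.T
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.clip(norms, 1e-12, None)


# Placeholder data standing in for real backbone embeddings (assumption).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 1536)).astype(np.float32)
mean, components = fit_pca(gallery, n_components=128)
reduced = project(gallery, mean, components)
print(reduced.shape)  # (1000, 128)
```

Fit the PCA on a representative sample of the gallery, then apply the same `mean` and `components` to query embeddings so that queries and gallery live in the same reduced space.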
@odusseys Do you have any models to recommend? Thanks!
A simple approach is to extract features with backbones of different sizes and then cluster them directly with k-means. If everything goes smoothly, you will see how the backbones of different sizes differ, and you can then try to pick a backbone with fewer parameters to improve efficiency.
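The clustering step described above could be sketched with plain Lloyd's k-means in NumPy. This assumes features from one backbone are already in an (N, D) array; the synthetic three-identity data is a stand-in, and the farthest-point initialization is a simple deterministic choice made for this sketch (k-means++ or scikit-learn's `KMeans` are common alternatives).

```python
import numpy as np


def farthest_point_init(features, k, seed=0):
    """Pick k initial centroids: one random point, then repeatedly the point
    farthest from all centroids chosen so far."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(features)))]
    min_dist = ((features - features[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        idx = int(min_dist.argmax())
        chosen.append(idx)
        min_dist = np.minimum(min_dist, ((features - features[idx]) ** 2).sum(axis=1))
    return features[chosen].copy()


def kmeans(features, k, n_iters=100, seed=0):
    """Lloyd's k-means on an (N, D) feature matrix; returns (labels, centroids)."""
    centroids = farthest_point_init(features, k, seed)
    for _ in range(n_iters):
        # (N, k) squared Euclidean distances from every point to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster emptied.
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical stand-in for one backbone's features: 3 synthetic "identities".
rng = np.random.default_rng(1)
centers = np.eye(3, 16) * 20.0
true_ids = np.repeat(np.arange(3), 100)
feats = centers[true_ids] + rng.normal(size=(300, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize, common for embeddings
labels, _ = kmeans(feats, k=3)
```

Running `kmeans` on features from each backbone size (for example the ViT-S/14 and ViT-g/14 variants) and inspecting which images land together is one way to carry out the comparison suggested above.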
@1921134176 Thanks a lot. I'd also like to ask: DINOv2 blurred identifiable faces in its training data. Does this mean that DINOv2 is not suitable for face tasks?
In my opinion, DINOv2 mainly aims to provide a universal, task-independent feature-extraction backbone, and specific downstream adaptation still needs to be done for each domain. I am not very familiar with facial recognition, but for remote sensing segmentation we freeze the DINOv2 backbone and train downstream models on the features it extracts, which has given good results. We use ViT-g.
Thanks a lot! How many pictures did you use for fine-tuning?
We trained models for many different tasks: the smallest used 500 samples at 518 px, and the largest used 500,000 samples at 518 px. Generally, about 10 epochs of training with the backbone frozen gives a usable preliminary result. We found that a frozen ViT-g is accurate from the start and converges quickly when training downstream applications.
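Because the backbone stays frozen, its features can be computed once and cached, and only a small head is trained on top of them. The NumPy sketch below illustrates that pattern with a softmax-regression classification head trained for 10 epochs. The commenter's actual task is segmentation, so the head type, the learning rate, and the synthetic 4-class features are all assumptions made for illustration.

```python
import numpy as np


def train_linear_head(features, labels, n_classes, epochs=10, lr=0.5, batch_size=32, seed=0):
    """Train a softmax-regression head on fixed (frozen-backbone) features
    with minibatch SGD; the features themselves are never updated."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            logits = features[idx] @ w + b
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            grad = (probs - onehot[idx]) / len(idx)  # d(mean cross-entropy)/d(logits)
            w -= lr * features[idx].T @ grad
            b -= lr * grad.sum(axis=0)
    return w, b


def predict(features, w, b):
    return (features @ w + b).argmax(axis=1)


# Hypothetical cached features: 500 samples, 4 classes, 32 dims.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=500)
feats = np.eye(4, 32)[labels] * 5.0 + rng.normal(size=(500, 32))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
w, b = train_linear_head(feats, labels, n_classes=4, epochs=10)
accuracy = (predict(feats, w, b) == labels).mean()
```

Caching features this way makes each epoch cheap, since the expensive backbone forward pass is done once rather than once per epoch.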
Thanks a lot~
I have my own dataset of hundreds of thousands of photos of people's faces, and I was wondering whether anyone has done something similar using DINO for facial recognition. I'm hoping to input a photo of a face and use DINO as a backend to gather all photos that contain the same person, without having to label the dataset extensively. I also hope DINO can tell faces apart better when clustering similar images together: the solutions I have tried so far have all failed to identify the same person accurately across different pictures, or they decide that everyone with glasses and white hair is Steven Spielberg.

Not sure if I explained this well, but, for example, sometimes I just want to find one picture I may have taken with a person several years ago, and I only have one picture of them. Can I use DINO for this, and what is a good direction to get started implementing something like this?
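For the single-photo lookup described above, one starting point is nearest-neighbor search by cosine similarity over precomputed embeddings. This is a minimal sketch, assuming each gallery photo (ideally a cropped face) has already been embedded by some model, such as a dedicated face-embedding model or DINOv2. The `search` function, the 0.6 threshold, and the synthetic data are hypothetical; the threshold in particular would need tuning on real data.

```python
import numpy as np


def l2_normalize(x):
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)


def search(query, gallery, top_k=5, threshold=0.6):
    """Return (index, cosine similarity) pairs for the top_k gallery embeddings
    most similar to the query, keeping only those at or above threshold."""
    q = l2_normalize(query)
    g = l2_normalize(gallery)
    sims = g @ q  # cosine similarity for every gallery image, shape (N,)
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order if sims[i] >= threshold]


# Synthetic stand-in: 1000 unrelated embeddings plus 3 noisy views of one "person".
rng = np.random.default_rng(0)
dim = 256
person = rng.normal(size=dim)
gallery = rng.normal(size=(1000, dim))
for i in (10, 500, 900):
    gallery[i] = person + 0.3 * rng.normal(size=dim)
query = person + 0.3 * rng.normal(size=dim)
matches = search(query, gallery, top_k=5, threshold=0.6)
```

A brute-force matrix product like this may still be workable at hundreds of thousands of images, but an approximate nearest-neighbor library such as FAISS becomes attractive as the gallery grows.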