Speaker Recognition is a tool that identifies the speaker in a one-second sound clip with high accuracy, using TensorFlow and a Convolutional Neural Network (CNN). It comes with a pre-trained model for 6 speakers.
It is based on the Keras speaker recognition example: https://keras.io/examples/audio/speaker_recognition_using_cnn/
- add new speakers and re-train the model
- analyze audio files
- list trained speakers
To add a new speaker, prepare about 1,000 different audio clips, each one second long. A 30-minute recording of the speaker, trimmed and cut with Audacity, is enough to produce them. Then select the folder containing the sound clips and train the model. Training takes 30 minutes or more, depending on your computer.
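
If you would rather script the cutting step than do it by hand in Audacity, the sketch below shows one possible way to split a recording into one-second clips. It is not part of this tool: the function name `split_into_one_second_clips` and the `clip_0000.wav` naming scheme are made up for illustration, and it assumes the recording is an uncompressed WAV file that has already been trimmed of silence and noise.

```python
import wave
from pathlib import Path


def split_into_one_second_clips(recording_path, output_dir):
    """Split a WAV recording into consecutive one-second WAV clips.

    Hypothetical helper, not part of this tool. Returns the list of clip
    paths written. A trailing remainder shorter than one second is dropped.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    clip_paths = []

    with wave.open(str(recording_path), "rb") as source:
        # The frame rate is the number of frames in one second of audio.
        frames_per_clip = source.getframerate()
        total_clips = source.getnframes() // frames_per_clip

        for index in range(total_clips):
            frames = source.readframes(frames_per_clip)
            clip_path = output_dir / f"clip_{index:04d}.wav"  # assumed naming scheme
            with wave.open(str(clip_path), "wb") as clip:
                # Copy the audio format of the source recording.
                clip.setnchannels(source.getnchannels())
                clip.setsampwidth(source.getsampwidth())
                clip.setframerate(frames_per_clip)
                clip.writeframes(frames)
            clip_paths.append(clip_path)

    return clip_paths
```

For example, calling `split_into_one_second_clips("recording.wav", "clips/new_speaker")` (both paths are placeholders) on a 30-minute recording would write up to 1,800 clips into that folder, which can then be selected for training.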