TweetyNet Missing Annotations #153
Comments
Did you use the spectrogram and local score array output function to see exactly what the local score arrays look like?
Specifically, the spectrogram_visualization() function might give you some insight. Keep in mind that this TweetyNet model was trained on South American xeno-canto clips, so it is likely just missing African birds. Also, better training datasets have become available since this model was trained, and your team should consider looking into them: the data science team's annotations of nearly 2000 audio clips, and the dataset annotated by COSMOS students last summer. Sam should know where these are located.
I recommended creating this issue because the system didn't throw an error, such as a zero-division error, when there were zero detections. If the model doesn't detect a bird, an error should be thrown that the user sees, so a file that doesn't appear to have been processed is worth looking into. As for TweetyNet, wasn't it trained on bird-vox and another European bird dataset? Which South American datasets was TweetyNet trained on?
I think the easiest way to accomplish this would be to add a parameter that acts as a "failure report": construct a set of the clip names as the clips are iterated through in generate_automated_labels(). Once you have the output dataframe, create a second set from its FILE NAME column. Subtracting the set of successful clips from the set of all clips then gives you the failed clips.
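A minimal sketch of that set-difference idea, written as a standalone helper rather than as a change to generate_automated_labels() itself. The function name find_failed_clips and its arguments are hypothetical and not part of PyHa; the only assumption taken from the thread is that the output dataframe has a FILE NAME column holding clip names.

```python
def find_failed_clips(all_clip_names, annotated_clip_names):
    """Return the clips that were iterated over but produced no annotations.

    all_clip_names: iterable of every clip name that was processed.
    annotated_clip_names: iterable of clip names that appear in the output,
        e.g. the FILE NAME column of the annotation dataframe (duplicates are fine).
    """
    all_clips = set(all_clip_names)
    successful_clips = set(annotated_clip_names)
    # Set difference: clips that were processed but never showed up in the output.
    return sorted(all_clips - successful_clips)


if __name__ == "__main__":
    # Hypothetical clip names, for illustration only.
    processed = ["clip_a.wav", "clip_b.wav", "clip_c.wav"]
    annotated = ["clip_a.wav", "clip_a.wav", "clip_c.wav"]
    for name in find_failed_clips(processed, annotated):
        print(f"No annotations produced for {name}")
```

With a real run, the second argument could be the FILE NAME column of the returned dataframe, and the first could be a directory listing of the audio folder. Returning a sorted list rather than a set keeps the report stable between runs.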
When running TweetyNet in PyHa through generate_automated_labels() from the IsoAutio module, some files produce neither annotations nor errors.
We found this issue when running it on the BirdCLEF2023 training data.
We used the following parameters when running TweetyNet:
isolation_parameters_tweety = { "model" : "tweetynet", "tweety_output": True, "verbose" : True }
When we called generate_automated_labels(), we passed in the default parameters.
I've uploaded all of the .wav files that produced neither annotations nor errors to the shared E4E Google Drive, in the folder 'BirdCLEF2023_Missing_Files_Issue'.