Issue statement:
As a bioacoustician or data scientist training a new model (e.g. as part of a DCLDE workshop), I want to know the sample sizes for annotated SRKW signals available in open, labeled Orcasound data. The data archives are documented in Orcasound's orcadata wiki, but as of summer 2023 there is neither a quantification of the training and test data set sizes nor a visualization of the data distribution (e.g. by pod).
Proposed solution:
1. Quantify the Pod.Cast annotations of SRKW calls in rounds 2-10 of the Pod.Cast labeled data archive, providing both the total number of hours reviewed (broken down by aggregate duration of true-positive vs. true-negative data) and the number of validated true positives (confirmed calls). Visualize the call distribution by SRKW pod (J, K, L, J+K, K+L, and J+L).
2. Quantify the reviewed OrcaAL samples, ideally after completing the annotation of the initial bioacoustic bout. Then publish the resulting training data set for development of multi-class models, along with the best-performing model.
3. Quantify the reviewed OrcaHello candidate detections (60-second samples). Visualize the distribution of reviewed samples by species (querying the API for tags like humpback and Bigg's KW) and by SRKW pod (using other available tags).
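As a rough illustration of the per-pod tally proposed above, here is a minimal sketch. It assumes each annotation is a dict with a `pod` field; that field name, the `tally_by_pod` helper, and the example records are hypothetical and do not reflect the actual Pod.Cast or OrcaHello schemas.

```python
from collections import Counter

# Pod categories listed in the proposed solution.
POD_CATEGORIES = ["J", "K", "L", "J+K", "K+L", "J+L"]

def tally_by_pod(annotations):
    """Count annotations per pod category; unrecognized labels go to 'other'."""
    counts = Counter({pod: 0 for pod in POD_CATEGORIES})
    for record in annotations:
        pod = record.get("pod")  # hypothetical field name
        counts[pod if pod in POD_CATEGORIES else "other"] += 1
    return counts

# Made-up placeholder records, not real Orcasound annotations.
example = [{"pod": "J"}, {"pod": "J"}, {"pod": "K+L"}, {"pod": None}]
counts = tally_by_pod(example)

# Simple text bar chart of the distribution.
for pod in POD_CATEGORIES + ["other"]:
    print(f"{pod:>5} {'#' * counts[pod]} ({counts[pod]})")
```

The same counts could be passed to any plotting library to produce the proposed visualization.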
Regarding #3, as of July 24, 2023, the OrcaHello dashboard filtered for All data shows that expert moderators David Bain, Val Veirs, and Scott Veirs had reviewed ~3900 minutes (65 hours) out of 5912 minute-long candidate detections (~100 hours total data duration), including 578 minutes (~9.5 hours) of Orcasound data confirmed to contain SRKW calls (~10% true positives, 90% false positives). Estimating the deployment duration as roughly 4 years over 3 locations, the reviewed false positive rate is less than about 1 per day per node, well below the target of no more than 10 false positives per day per node.
Of all the reviewed candidates, 578 were confirmed to contain at least one SRKW signal.
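The arithmetic behind these estimates can be reproduced with a short sketch. The counts are the ones quoted above; the treatment of every reviewed-but-unconfirmed candidate as a false positive and the 365.25-day year are simplifying assumptions. The text does not state which denominator the quoted ~10% uses, so the sketch reports the confirmed share both ways.

```python
# Figures quoted above (OrcaHello dashboard, July 24, 2023).
total_candidates = 5912   # minute-long candidate detections
reviewed = 3900           # approximate number reviewed by moderators
confirmed_srkw = 578      # reviewed candidates confirmed to contain SRKW calls

# Assumptions: roughly 4 years of deployment at 3 locations, and every
# reviewed candidate that was not confirmed counts as a false positive.
years, nodes = 4, 3
node_days = years * 365.25 * nodes

false_positives = reviewed - confirmed_srkw
summary = {
    "reviewed_hours": reviewed / 60,
    "total_hours": total_candidates / 60,
    "confirmed_hours": confirmed_srkw / 60,
    "confirmed_share_of_reviewed": confirmed_srkw / reviewed,
    "confirmed_share_of_all": confirmed_srkw / total_candidates,
    "false_positives_per_node_day": false_positives / node_days,
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the false positive rate stays below 1 per day per node, consistent with the estimate above.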