Pod.Cast 🎱 🐋 | Annotation system

License: MIT

Developed by Prakruti Gogia, Akash Mahajan and Nithya Govindarajan during Microsoft AI4Earth & OneWeek hackathons. (this is volunteer-driven & is not an official product)

For a general introduction to the Pod.Cast project, initiated in 2019, and its relationship to other AI for Orcas efforts, please read the Pod.Cast project general overview at

Technical overview: this tool is a prototype Flask-based web app for labelling unlabelled bioacoustic recordings while viewing predictions from a model. It is useful for setting up quick-and-dirty labelling sessions that don't need advanced features such as automated model inference, user access roles, interfacing with other backends, or gamification.

(See prediction-explorer for a related tool to quickly visualize & browse model predictions on a set of audio files. This runs locally)

Screenshot of Pod.Cast annotation UI

  • Each page/session gets a unique URL (via the sessionid URL param) that you can share if you find something interesting
  • Refer to the instructions on the page for how to edit model predictions or create annotations
  • The progress bar tracks the current "round" of unlabelled sessions for which annotations have been submitted
  • If you aren't sure, or want to see a new one, skip & refresh loads a random (un-annotated) session without submitting anything

Dataset Creation

This tool has been used in an active learning style to create & release new training & test sets at orcadata/wiki.

  • To do so, a candidate 2-3hr window is identified, with likely activity (reported by sighting networks / Orcasound listeners). Data is processed from Orcasound's S3 archives as follows:
    • Format conversion (HLS -> concatenated wav file)
    • Audio is split into 1-minute easily browsable "sessions"
    • Data to use for labelling/training is prioritized as follows:
      • Candidates are selected for labelling using predictions from an ML model, using a mid-low threshold (tuned for high recall). This helps discard data & prioritize labelling effort.
  • Each round generates new labelled data that improves models trained on this data, making them more robust to varied acoustic conditions at different hydrophone nodes.
  • Held-out test sets have also been created in a similar fashion as accuracy and robustness benchmarks.
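
The splitting and prioritization steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function names, the per-session list of confidence scores, and the threshold value of 0.3 are all assumptions made for the example (the text only says a "mid-low threshold" is used).

```python
from typing import Dict, List, Tuple

SESSION_SECONDS = 60  # the README describes 1-minute sessions


def session_bounds(total_seconds: float,
                   session_seconds: int = SESSION_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover a recording in fixed-size chunks."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        # The last chunk may be shorter than a full session.
        bounds.append((start, min(start + session_seconds, total_seconds)))
        start += session_seconds
    return bounds


def select_candidates(session_scores: Dict[str, List[float]],
                      threshold: float = 0.3) -> List[str]:
    """Keep sessions whose peak model confidence reaches the threshold.

    `threshold` is a hypothetical value; a low setting favours recall,
    so fewer sessions with real activity are discarded.
    Sessions are returned with the highest peak confidence first.
    """
    peaks = {sid: max(scores) for sid, scores in session_scores.items() if scores}
    kept = [sid for sid, peak in peaks.items() if peak >= threshold]
    return sorted(kept, key=lambda sid: peaks[sid], reverse=True)
```

Sorting by peak confidence is one possible way to prioritize labelling effort; other orderings (for example, by the number of detections per session) would fit the same description.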

Flowchart of feedback loop between model & human listeners


This prototype is a single-page application with a simple Flask backend that interfaces with Azure blob storage. For simplicity and ease of access, this version also uses blob storage as a sort of database: each JSON file acts as a single entry, and separate containers act as tables/collections. For now, this hack makes it easy to do quick-and-dirty viewing/editing in Azure Storage Explorer (or any equivalent blob viewer for S3 etc.).

Architecture diagram showing API interactions between frontend, backend & blob storage

Backend API:

GET /fetch/session/roundid

Scans the getcontainer blob for an unlabelled session, randomly picks & returns a {sessionid=X} response. The sessionid is simply the name of the corresponding X.JSON file on the blob. Updates/resets internal global variable backend_state that contains info for the progress bar.
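
As a rough sketch of the selection step, the logic might look like the following. It assumes (this is not stated above) that a session counts as labelled once a blob with the same name exists in postcontainer; `fetch_session` and its arguments are hypothetical names, and plain lists of blob names stand in for the real storage calls.

```python
import random
from typing import Dict, Iterable, Optional


def _stem(blob_name: str) -> str:
    """Strip a trailing .json/.JSON extension to get the session id."""
    return blob_name[:-5] if blob_name.lower().endswith(".json") else blob_name


def fetch_session(get_names: Iterable[str],
                  post_names: Iterable[str],
                  rng: Optional[random.Random] = None) -> Optional[Dict[str, str]]:
    """Pick a random session that is in getcontainer but not yet in postcontainer.

    Returns a {"sessionid": X} dict, or None when nothing is left to label.
    """
    rng = rng or random.Random()
    labelled = {_stem(name) for name in post_names}  # assumption: same name means labelled
    unlabelled = sorted(_stem(n) for n in get_names if _stem(n) not in labelled)
    if not unlabelled:
        return None
    return {"sessionid": rng.choice(unlabelled)}
```

For example, `fetch_session(["a.json", "b.json"], ["a.json"])` can only return `{"sessionid": "b"}`, since `a` already has a submission.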

GET /load/session/roundid/sessionid

GET Azure blob wav

Fetches the corresponding JSON file from the getcontainer blob (for an example, see example-load.json). The JSON contains backend_state for the progress bar and a uri that points the client directly to the corresponding audio file in blob storage.

POST /submit/session/roundid/sessionid

Writes a JSON to the postcontainer blob. (For an example, see example-submit.json, which has the same schema). Also updates internal global variable backend_state that contains info for the progress bar.
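
A minimal sketch of the submit step, with a plain dict standing in for postcontainer. The fields of `backend_state` shown here (`labelled`, `total`) are assumptions for illustration; the real backend's state and the submitted JSON schema (see example-submit.json) may differ.

```python
import json
from typing import Any, Dict

# Hypothetical stand-in for the backend's global progress state.
backend_state: Dict[str, int] = {"labelled": 0, "total": 0}


def submit_session(post_container: Dict[str, str],
                   sessionid: str,
                   annotations: Dict[str, Any]) -> Dict[str, int]:
    """Store the submitted JSON under <sessionid>.json and update progress.

    `post_container` maps blob names to JSON text, mimicking a blob container.
    Re-submitting a session overwrites it without counting it twice.
    """
    blob_name = f"{sessionid}.json"
    is_new = blob_name not in post_container
    post_container[blob_name] = json.dumps(annotations)
    if is_new:
        backend_state["labelled"] += 1
    return dict(backend_state)  # a copy, so callers cannot mutate the global
```

Keeping one JSON blob per session means a submission can be inspected or corrected directly in a blob viewer, which matches the "blob storage as database" approach described earlier.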

Client logic:

Primary logic is defined in main.js.

  • fetchUrl, dataUrl and postUrl in index.html define the API endpoints above
  • The client first checks for the sessionid URL parameter & runs loadSession or fetchAndLoadSession as appropriate
  • This is done on page load and when a submit/skip button is clicked
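
The branching described above can be illustrated as follows. The real client is JavaScript (main.js); this Python sketch only mirrors the decision, and `choose_action` is a hypothetical helper rather than a function in the repo.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def choose_action(page_url: str) -> Tuple[str, Optional[str]]:
    """Decide which client routine to run based on the sessionid URL parameter."""
    query = parse_qs(urlparse(page_url).query)
    session_ids = query.get("sessionid")
    if session_ids:
        # A specific session was requested (e.g. via a shared link).
        return ("loadSession", session_ids[0])
    # No session in the URL: ask the backend for a random unlabelled one.
    return ("fetchAndLoadSession", None)
```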

Use & setup

Setup & local debugging

  1. Create an isolated python environment, and pip install --upgrade pip && pip install -r requirements.txt. (Python 3.6.8 has been tested, though recent versions should likely work as dependencies are quite simple)

  2. Set the environment variable and FLASK_ENV=development. If you haven't made your own CREDS file yet, see step 3. Once that's done, start the server from this directory with python -m flask run, and browse to the link shown in the terminal (e.g. in your browser; Edge and Chrome are tested).

  3. The CREDS.yaml specifies how the backend authenticates with blob storage & the specific container names to use. The provided file is a template and should be replaced:

Note that when you run this locally, you will still be connecting & writing to the actual blob storage specified in CREDS.yaml so be careful.

Using your own blob storage

This assumes you have already created an Azure Storage account & know how to view & access it using Azure Storage Explorer.

  1. Enable a CORS rule to the account. In short, setting this allows a browser client to directly make a request to the blob storage to retrieve a *.wav file.

Screenshot of Azure Storage explorer showing CORS permissions

  2. Make sure you have 3 containers:
    • audiocontainer: *.wav audio files (~1 min duration, as each file forms one page/session)
    • getcontainer: model predictions in JSON format (see example-load.json), one for each *.wav file
    • postcontainer: destination for user-submitted annotations in JSON format (see example-submit.json)

  3. Enable public read-only access to blobs in audiocontainer (select the "blobs" option). Along with #1, this is required for the browser to directly retrieve *.wav files.

Screenshot of Azure Storage explorer to set public access level
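
With public read access enabled at the blob level, each audio file becomes reachable at a URL of the standard Azure form `https://<account>.blob.core.windows.net/<container>/<blob>`. The helper below is a small sketch of building such a URL; the function name and the example account name are hypothetical, and whether the uri field in the load JSON is built exactly this way is an assumption.

```python
from urllib.parse import quote


def public_blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the public HTTPS URL for a blob in an Azure Storage account.

    The blob name is percent-encoded so names with spaces still form a valid URL.
    """
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


# Example with a made-up account name:
# public_blob_url("myaccount", "audiocontainer", "session1.wav")
```

Opening such a URL in a browser is a quick way to confirm the access level: if the file does not download, the access setting (or the CORS rule, for requests made from the page) likely still needs adjusting.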

Deployment to Azure App Service

Prerequisite: Install Azure CLI

  1. Authenticate and set up your local environment to use the right subscription
az login 
az account list --output table 
az account set --subscription SUBSCRIPTIONID
  2. In the root directory of your application, create a deployment config file at .azure/config. This contains details about your resource group, the App Service plan to use, etc. (An example file is at .azure/config)

  3. Now run the following commands to deploy the app. The first command packages your local directory into a *.zip and deploys the app on Azure. If an app with the name given in the deployment config file already exists, it is updated; otherwise a new app is created. The second command only needs to be run the first time, to register the entry point of the app. (see note below)

az webapp up --sku B1 --dryrun
az webapp config set -g mldev -n aifororcas-podcast --startup-file "gunicorn --bind= --timeout 600 podcast_server:app"

This deployment example is loosely based on the Quickstart. We make a change to the startup command to register the different name of our app file (FYI some more details about the CLI commands used here are at: az-webapp-up, configuring-python-app)


This code uses a fork of audio-annotator for the frontend code. audio-annotator uses wavesurfer.js for rendering and playing audio. Please refer to the respective references for more info on the core functions/classes used in this repo. (Note: the wavesurfer.js version used here is older than the current docs).

Icons used in readme flowcharts were made by Prosymbols from