
models openai whisper large v3

github-actions[bot] edited this page Sep 27, 2024 · 11 revisions

openai-whisper-large-v3

Overview

Whisper is a deep-learning model for automatic speech recognition (ASR) and speech translation. It was trained on a large amount of data drawn from many sources and languages, and it generalizes to a wide range of tasks and domains without fine-tuning.

Whisper large-v3 has the same architecture as the previous large models, except for a few minor differences:

The spectrogram input uses 128 Mel frequency bins instead of 80.
A new language token was added for Cantonese.
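To make the first change concrete, the sketch below does only the shape arithmetic for one input chunk. It assumes the front-end settings of the open-source Whisper preprocessing code (16 kHz audio, a 400-sample window, a 160-sample hop, 30-second chunks); it does not compute a real spectrogram.

```python
# Front-end constants assumed from the open-source Whisper preprocessing code.
SAMPLE_RATE = 16_000    # Hz; audio is resampled to 16 kHz mono
N_FFT = 400             # 25 ms analysis window (not needed for the shape below)
HOP_LENGTH = 160        # 10 ms hop between frames
CHUNK_LENGTH_S = 30     # audio is padded or trimmed to 30-second chunks


def log_mel_shape(n_mels: int) -> tuple[int, int]:
    """Return (mel_bins, frames) for one padded 30-second input chunk."""
    n_samples = SAMPLE_RATE * CHUNK_LENGTH_S  # 480,000 samples per chunk
    n_frames = n_samples // HOP_LENGTH        # 3,000 frames per chunk
    return (n_mels, n_frames)


print(log_mel_shape(80))   # earlier large models: (80, 3000)
print(log_mel_shape(128))  # large-v3: (128, 3000)
```

The number of frames stays the same; only the feature dimension grows, so features prepared for the 80-bin models cannot be fed to large-v3 as-is.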

The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of audio pseudo-labeled with Whisper large-v2. The model was trained for 2.0 epochs over this mixed dataset.
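In round numbers, and assuming each epoch passes over the full mixture once (ignoring any filtering or re-weighting the card does not describe), the training budget works out to:

$$
1\,\text{M} + 4\,\text{M} = 5\,\text{M hours}, \qquad 2.0\ \text{epochs} \times 5\,\text{M hours} = 10\,\text{M hours of audio processed}
$$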

The large-v3 model improves on Whisper large-v2 across a wide variety of languages, with a 10% to 20% reduction in errors compared to large-v2.
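The card does not name the error metric. Speech recognition results are usually reported as word error rate (WER), and a "10% to 20%" reduction is most naturally read as relative. Under both of those assumptions, the sketch below computes WER as word-level edit distance and the relative reduction between two models. The WER values in the usage lines are hypothetical, for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (only one DP row is kept at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def relative_reduction(old: float, new: float) -> float:
    """Fractional error reduction of `new` relative to `old` (0.10 means 10%)."""
    return (old - new) / old


print(round(word_error_rate("four score and seven years ago",
                            "four score and seven year ago"), 4))  # 0.1667
# Hypothetical WERs, not measured results:
print(round(relative_reduction(0.10, 0.085), 2))  # 0.15, i.e. a 15% reduction
```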

| Size | Parameters | English-only | Multilingual |
|---|---|---|---|
| large-v3 | 1550 M | x | ✓ |
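The 1550 M parameter count gives a rough lower bound on memory when picking a compute SKU. The sketch below counts the weights only, assuming 4 bytes per parameter in float32 and 2 in float16. Activations, the decoder cache, and runtime overhead come on top of this.

```python
PARAMS = 1_550_000_000  # "1550 M" from the model size table
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Approximate size of the model weights alone, in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(dtype, round(weight_memory_gb(PARAMS, dtype), 1))
# float32 6.2
# float16 3.1
```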

The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

Inference samples

| Inference type | Python sample (Notebook) | CLI with YAML |
|---|---|---|
| Real time | asr-online-endpoint.ipynb | asr-online-endpoint.sh |
| Batch | asr-batch-endpoint.ipynb | coming soon |

Sample inputs and outputs (for real-time inference)

Sample input

{
   "input_data": {
       "audio": ["https://www2.cs.uic.edu/~i101/SoundFiles/gettysburg.wav", "https://www2.cs.uic.edu/~i101/SoundFiles/preamble.wav"],
       "language": ["en", "en"]
   }
}
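For the real-time case, the sample input can be sent as a JSON POST to the deployed endpoint. The sketch below only builds the request with the standard library and does not send it. SCORING_URI and API_KEY are hypothetical placeholders (take the real values from your endpoint), and the check that the two lists have equal length is an assumption based on the parallel lists in the sample input.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your endpoint's scoring URI and key.
SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<endpoint-key>"


def build_asr_request(audio_urls: list[str], languages: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request with the sample payload shape."""
    if len(audio_urls) != len(languages):
        # Assumption: one language code per audio file, in the same order.
        raise ValueError("audio and language lists must have the same length")
    payload = {"input_data": {"audio": audio_urls, "language": languages}}
    return urllib.request.Request(
        SCORING_URI,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


req = build_asr_request(
    ["https://www2.cs.uic.edu/~i101/SoundFiles/gettysburg.wav"], ["en"]
)
# Sending it would look like:
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
```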

Sample output

[
    {
        "text":"four score and seven years ago our fathers brought forth on this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal now we are engaged in a great civil war testing whether that nation or any nation so conceived and so dedicated can long endure"
    },
    {
        "text":" we the people of the united states in order to form a more perfect union establish justice insure domestic tranquillity provide for the common defense promote the general welfare and secure the blessings of liberty to ourselves and our posterity do ordain and establish this constitution for the united states of america"
    }
]
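The response is a JSON list with one object per input file. The sketch below maps each input URL to its transcript, assuming the results come back in the same order as the inputs (as in the sample). It also strips surrounding whitespace, since the second sample transcript begins with a space.

```python
import json


def pair_transcripts(audio_urls: list[str], response_body: str) -> dict[str, str]:
    """Map each input URL to its transcript, assuming outputs follow input order."""
    results = json.loads(response_body)
    if len(results) != len(audio_urls):
        raise ValueError("expected one result object per input audio file")
    # Strip the leading/trailing whitespace some transcripts carry.
    return {url: item["text"].strip() for url, item in zip(audio_urls, results)}


body = '[{"text": "four score and seven years ago"}, {"text": " we the people"}]'
print(pair_transcripts(["gettysburg.wav", "preamble.wav"], body))
# {'gettysburg.wav': 'four score and seven years ago', 'preamble.wav': 'we the people'}
```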

Version: 4

Tags

huggingface_model_id : openai/whisper-large-v3
hiddenlayerscanned
author : OpenAI
inference_compute_allow_list : ['Standard_DS5_v2', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
Featured
Preview
license : mit
SharedComputeCapacityEnabled
task : automatic-speech-recognition

View in Studio: https://ml.azure.com/registries/azureml/models/openai-whisper-large-v3/version/4

License: mit

Properties

SharedComputeCapacityEnabled: True

SHA:

inference-min-sku-spec: 6|0|56|112

inference-recommended-sku: Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

languages: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su
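Before sending a request, it can help to check the language field against this list. The sketch below copies the codes from the languages property and reports any that are not in it. It assumes the endpoint's language values are these same codes, as the sample input's "en" suggests.

```python
# Language codes copied from the "languages" property.
SUPPORTED_LANGUAGES = frozenset(
    "en zh de es ru ko fr ja pt tr pl ca nl ar sv it id hi fi vi he uk el ms cs "
    "ro da hu ta no th ur hr bg lt la mi ml cy sk te fa lv bn sr az sl kn et mk "
    "br eu is hy ne mn bs kk sq sw gl mr pa si km sn yo so af oc ka be tg sd gu "
    "am yi lo uz fo ht ps tk nn mt sa lb my bo tl mg as tt haw ln ha ba jw su".split()
)


def unsupported_languages(languages: list[str]) -> list[str]:
    """Return the codes not in the supported list (an empty list means all are OK)."""
    return [code for code in languages if code not in SUPPORTED_LANGUAGES]


print(unsupported_languages(["en", "en"]))  # []
print(unsupported_languages(["en", "xx"]))  # ['xx']
```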
