Models

All models

ALLaM-2-7b-instruct

Description

ALLaM is a series of powerful language models designed to advance Arabic Language Technology (ALT) developed by the National Center for Artificial Intelligence (NCAI) at the Saudi Data and AI Authority (SDAIA). ALLaM-2-7b-instruct is traine...

analyze-conversations

The "Analyze Conversations" is a standard model that utilizes Azure AI Language to perform various analyses on text-based conversations. Azure AI Language hosts pre-trained, task-oriented, and optimized conversation-focused ML models, including various summarization aspects, PII entity extraction...
analyze-documents

The "Analyze Documents" is a standard model that utilizes Azure AI Language to perform various analyses on text-based documents. Azure AI Language hosts pre-trained, task-oriented, and optimized document-focused ML models, such as summarization, sentiment analysis, entity extraction, etc.

...

ask-wikipedia

The "Ask Wikipedia" is a Q&A model that employs GPT3.5 to answer questions using information sourced from Wikipedia, ensuring more grounded responses. This process involves identifying the relevant Wikipedia link and extracting its contents. These contents are then used as an augmented prompt, en...
AutoML-Image-Classification

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Image-Instance-Segmentation

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Image-Object-Detection

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Named-Entity-Recognition

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
AutoML-Text-Classification

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased product...
bert-base-cased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
bert-base-uncased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate input...
bert-large-cased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
bert-large-uncased

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inpu...
BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and Vision Transformer as the i...
Bleu-Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1] |
| What is this metric? | Measures how closely the generated text matches a reference text based on n-gram overlap. |
| How does it work? | The BLEU score calculates the geometric mean of the precision of n-grams between the model-generated text and ...
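
The geometric mean of n-gram precisions described above can be sketched in a few lines of Python. This is a simplified illustration, not the evaluator's actual implementation: whitespace tokenization, a single reference, `max_n=4`, and the standard BLEU brevity penalty are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate_text, reference_text, max_n=4):
    """Geometric mean of 1..max_n precisions times a brevity penalty (assumed)."""
    candidate = candidate_text.lower().split()
    reference = reference_text.lower().split()
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, any zero precision drives the geometric mean to zero.
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * geo_mean
```

Because no smoothing is applied, short texts that share no 4-gram with the reference score 0 here; production implementations usually offer smoothing options.
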
bring-your-own-data-chat-qna

The "Bring Your Own Data Chat QnA" is a pre-trained chat model, enhanced by GPT3.5, that leverages your personally indexed data and chat history to deliver more concrete and relevant answers. It involves processing the raw query through an embedding procedure, followed by a "Vector Search" to pin...
bring-your-own-data-qna

The "Bring your own data QnA" is a pre-trained Q&A model, enhanced by GPT3.5, that leverages your personally indexed data to deliver more concrete and relevant answers. It involves processing the raw query through an embedding procedure, followed by a "Vector Search" to pinpoint the most pertinen...
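
The embed-then-search step described for the two "Bring Your Own Data" flows can be illustrated with a toy example. This is only a sketch under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the index is an in-memory list, and all names are hypothetical rather than part of the flow.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a sparse bag-of-words vector.
    A real flow would call an embedding model instead."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:top_k]


docs = [
    "refund policy allows returns within 30 days",
    "shipping takes five business days",
    "our office is closed on holidays",
]
best = vector_search("what is the refund policy", docs)
```

Here `best` holds the refund-policy document, since it shares the most query terms; the retrieved text would then be passed to the model as context.
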
bytetrack_yolox_x_crowdhuman_mot17-private-half

bytetrack_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos. Most methods obtai...
camembert-base

CamemBERT is a state-of-the-art language model for French based on the RoBERTa model.

It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains.

Training Details

Training Data

OSCAR or Open...

chat-quality-safety-eval

The chat quality and safety evaluation flow evaluates chat systems by leveraging state-of-the-art Large Language Models (LLMs) to measure the quality and safety of your LLM responses. Utilizing a GPT model to assist with measurements aims to achieve a high agreement with human evaluatio...
chat-with-wikipedia

The "Chat with Wikipedia" is a pre-trained chat model with GPT3.5: it combines conversation history and information from Wikipedia to make the answer more grounded. It involves finding a relevant Wikipedia link and getting page contents for the question. It can remember previous interactions and ...
classification-accuracy-eval

The "Classification Accuracy Evaluation" is a model designed to assess the effectiveness of a data classification system. It involves matching each prediction against the ground truth, subsequently assigning a "Correct" or "Incorrect" score. The cumulative results are then leveraged to generate p...
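
The grading step described above, matching each prediction against the ground truth and aggregating the results, might look like the following sketch. The case-insensitive, whitespace-trimmed comparison and the function names are assumptions made for illustration, not the flow's documented behavior.

```python
def grade_predictions(predictions, ground_truths):
    """Label each prediction "Correct" or "Incorrect" against its ground truth.
    Normalization (strip + lowercase) is an assumption of this sketch."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    return [
        "Correct" if pred.strip().lower() == truth.strip().lower() else "Incorrect"
        for pred, truth in zip(predictions, ground_truths)
    ]


def accuracy(grades):
    """Fraction of grades that are "Correct"; 0.0 for an empty list."""
    if not grades:
        return 0.0
    return grades.count("Correct") / len(grades)


grades = grade_predictions(["cat", "Dog", "bird"], ["cat", "dog", "fish"])
score = accuracy(grades)
```
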
Coherence-Evaluator

| | |
| -- | -- |
| Score range | Integer [1-5]: where 1 is bad and 5 is good |
| What is this metric? | Measures how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. |
| How does it work? | The coherence measure assesses the abi...
compvis-stable-diffusion-v1-4

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The Stable-Diffusion-v1-4 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 225k steps at resolution 5...
count-cars

The "Count Cars" is a model designed for accurately quantifying the number of specific vehicles – particularly red cars – in given images. Utilizing the advanced capabilities of Azure OpenAI GPT-4 Turbo with Vision, this system meticulously analyzes each image, identifies and counts red cars, out...
CxrReportGen

Overview

The CxrReportGen model utilizes a multimodal architecture, integrating a BiomedCLIP image encoder with a Phi-3-Mini text encoder to accurately interpret complex medical imaging studies of chest X-rays. CxrReportGen follows the same framework as **[MAIRA-2](https://www.microsoft.com/e...

databricks-dolly-v2-12b

Databricks' dolly-v2-12b is an instruction-following large language model trained on the Databricks machine learning platform and licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine-tuning records [databricks-dolly-15k](https://github.com/d...
Deci-DeciCoder-1b

The Model Card for DeciCoder 1B provides details about a 1 billion parameter decoder-only code completion model developed by Deci. The model was trained on Python, Java, and JavaScript subsets of Starcoder Training Dataset and uses Grouped Query Attention with a context window of 2048 tokens. It ...
deci-decidiffusion-v1-0

DeciDiffusion 1.0 is an 820 million parameter latent diffusion model designed for text-to-image conversion. Trained initially on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset, the model's training involved advanced techniques to improve speed, training performance, and achieve su...
Deci-DeciLM-7B

DeciLM-7B is a decoder-only text generation model with 7.04 billion parameters, released by Deci under the Apache 2.0 license. It is the top-performing 7B base language model on the Open LLM Leaderboard and uses variable Grouped-Query Attention (GQA) to achieve a superior balance between accuracy...
Deci-DeciLM-7B-instruct

DeciLM-7B-instruct is a model for short-form instruction following, built by LoRA fine-tuning on the SlimOrca dataset. It is a derivative of the recently released DeciLM-7B language model, a pre-trained, high-efficiency generative text model with 7 billion parameters. DeciLM-7B-instruct is one of...
deepset-minilm-uncased-squad2

Training Details

Hyperparameters

seed = 42
batch_size = 12
n_epochs = 4
base_LM_model = "microsoft/MiniLM-L12-H384-uncased"
max_seq_len = 384
learning_rate = 4e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
grad_acc_steps = 4

Evaluation Res...

deepset-roberta-base-squad2

This is the roberta-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering.

Training Details

Hype...

deformable_detr_twostage_refine_r50_16x2_50e_coco

deformable_detr_twostage_refine_r50_16x2_50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain <a href="https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...
detect-defects

The "Detect Defects" is a model designed for meticulous examination of images. It operates by employing GPT-4 Turbo with Vision to compare a test image against a reference image. Each analysis focuses on identifying variances or anomalies, classifying them as defects. This methodical comparison e...
distilbert-base-cased

DistilBERT, a transformers model, is designed to be smaller and quicker than BERT. It underwent pretraining on the same dataset in a self-supervised manner, utilizing the BERT base model as a reference. This entails training solely on raw texts, without human annotation, thus enabling the utiliza...
distilbert-base-cased-distilled-squad

The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://...
distilbert-base-uncased

DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lot...
distilbert-base-uncased-distilled-squad

The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxi...
distilbert-base-uncased-finetuned-sst-2-english

The DistilBERT base uncased finetuned SST-2 model is a fine-tuned checkpoint of DistilBERT-base-uncased, trained on SST-2. This model reaches an accuracy of 91.3 on the dev set (for comparison, the BERT bert-base-uncased version reaches an accuracy ...
distilgpt2

DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has 82 million parameters, was developed using knowledge distillation and was designed to be a faster, li...
distilroberta-base

distilroberta-base is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT. The code for the distillation process can be found [here](https://github.com/hugg...
ECI-Evaluator

Definition

Election Critical Information (ECI) refers to any content related to elections, including voting processes, candidate information, and election results. The ECI evaluator uses the Azure AI Safety Evaluation service to assess the generated responses for ECI without a disclaimer.

#...

F1Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1] |
| What is this metric? | Measures the ratio of the number of shared words between the model generation and the ground truth answers. |
| How does it work? | The F1-score computes the ratio of the number of shared words between the model generation ...
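
A word-overlap F1 of the kind described above can be sketched as follows. Lowercasing and whitespace tokenization are assumptions of this example, and `token_f1` is a hypothetical name; the evaluator's own normalization may differ.

```python
from collections import Counter


def token_f1(prediction, ground_truth):
    """Harmonic mean of precision and recall over shared words."""
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    # Multiset intersection: a word shared twice counts twice.
    shared = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(pred_tokens)
    recall = shared / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

For instance, the prediction "the cat" against the ground truth "the cat sat down" has precision 1.0 and recall 0.5, so the F1 score is about 0.67.
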
facebook-bart-large-cnn

BART is a transformer model that combines a bidirectional encoder similar to BERT with an autoregressive decoder akin to GPT. It is trained using two main techniques: (1) corrupting text with a chosen noising function, and (2) training a model to reconstruct the original text.

When fine-tuned fo...

facebook-deit-base-patch16-224

DeiT (Data-efficient image Transformers) is an image transformer that does not require very large amounts of data for training. This is achieved through a novel distillation procedure using a teacher-student strategy, which results in high throughput and accuracy. DeiT is pre-trained and fine-tuned o...
facebook-dinov2-base-imagenet1k-1-layer

Vision Transformer (base-sized model) trained using DINOv2

Vision Transformer (ViT) model trained using the DINOv2 method. It was introduced in the paper DINOv2: Learning Robust Visual Features without Supervision by Oquab et al. and first released...

Facebook-DinoV2-Image-Embeddings-ViT-Base

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion with the DinoV2 method.

Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token ...
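
The patch sequence described above can be illustrated with plain Python lists. This sketch only covers splitting and sequence length; the linear embedding is omitted, and the image size and patch size used in the example are illustrative values, not the configuration of this model.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping
    square patches, scanned left to right, top to bottom."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def sequence_length(height, width, patch_size):
    """Number of tokens seen by the encoder: one per patch plus one [CLS] token."""
    return (height // patch_size) * (width // patch_size) + 1


# A 4x4 toy image with pixel values 0..15, split into 2x2 patches.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_patches = split_into_patches(toy_image, 2)
```

With a hypothetical 224x224 input and 16x16 patches, `sequence_length(224, 224, 16)` gives 197 tokens (196 patches plus the [CLS] token).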

Facebook-DinoV2-Image-Embeddings-ViT-Giant

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion with the DinoV2 method.

Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token ...

facebook-sam-vit-base

The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
facebook-sam-vit-huge

The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
facebook-sam-vit-large

The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 bi...
finiteautomata-bertweet-base-sentiment-analysis

Repository: https://github.com/finiteautomata/pysentimiento/

The model was trained on the SemEval 2017 corpus (~40k tweets). The base model is BERTweet, a RoBERTa model trained on English tweets.

Uses `POS...

Fluency-Evaluator

| | |
| -- | -- |
| Score range | Integer [1-5]: where 1 is bad and 5 is good |
| What is this metric? | Measures the grammatical proficiency of a generative AI's predicted answer. |
| How does it work? | The fluency measure assesses the extent to which the generated text conforms to grammatical...
Gleu-Score-Evaluator

| | |
| -- | -- |
| Score range | Float [0-1] |
| What is this metric? | Measures the degree of overlap between the generated text and both the reference text and source text, balancing between precision and recall. |
| How does it work? | The GLEU score is computed by averaging the precision and...
google-vit-base-patch16-224

The Vision Transformer (ViT) model, as introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al., underwent pre-training on ImageNet-21k with a resolution of 224x224. Su...
gpt2

GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generat...
gpt2-large

GPT-2 Large is the 774M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.

Training Details

See the [associated paper](https://d4mucfpksywv.cloudfront.net/bet...

gpt2-medium

GPT-2 Medium is the 355M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective.

Training Details

See the [associated paper](https://d4mucfpksywv.c...

Groundedness-Evaluator

| | |
| -- | -- |
| Score range | Integer [1-5]: where 1 is bad and 5 is good |
| What is this metric? | Measures how well the model's generated answers align with information from the source data (user-defined context). |
| How does it work? | The groundedness measure assesses the correspondenc...
Hate-and-Unfairness-Evaluator

Definition

Hateful and unfair content refers to any language pertaining to hate toward or unfair representations of individuals and social groups along factors including but not limited to race, ethnicity, nationality, gender, sexual orientation, religion, immigration status, ability, persona...

how-to-use-functions-with-GPT-chat-API

The "Use Functions with Chat Models" is a chat model that illustrates how to employ the LLM tool's Chat API with external functions, thereby expanding the capabilities of GPT models. The Chat Completion API includes an optional 'functions' parameter, which can be used to stipulate function specificati...
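
As a rough illustration of the 'functions' parameter, the sketch below defines one function specification and dispatches a function call locally. No API request is made: `get_current_weather`, its canned return value, and the `REGISTRY` mapping are hypothetical, and the `function_call` dictionary is written by hand to mimic the name-plus-JSON-arguments shape a chat model returns.

```python
import json


def get_current_weather(location, unit="celsius"):
    """Hypothetical local function; returns canned data for illustration."""
    return {"location": location, "temperature": 21, "unit": unit}


# Function specification: a name, a description, and a JSON Schema for the arguments.
FUNCTIONS = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

# Maps function names the model may request to local callables (assumption).
REGISTRY = {"get_current_weather": get_current_weather}


def run_function_call(function_call):
    """Execute a function call of the form {"name": ..., "arguments": "<JSON string>"}."""
    name = function_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown function: {name}")
    arguments = json.loads(function_call["arguments"])
    return REGISTRY[name](**arguments)


result = run_function_call(
    {"name": "get_current_weather", "arguments": '{"location": "Paris"}'}
)
```

In a full flow, `FUNCTIONS` would be passed along with the chat messages, and `result` would be sent back to the model as a follow-up message so it can compose the final answer.
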
Indirect-Attack-Evaluator

Definition

Indirect attacks, also known as cross-domain prompt injected attacks (XPIA), occur when jailbreak attacks are injected into the context of a document or source, which may result in altered, unexpected behavior.

Indirect attack evaluations are broken down into three subcategories: ...

Jean-Baptiste-camembert-ner

Summary: camembert-ner is a NER model fine-tuned from CamemBERT on the Wikiner-fr dataset and validated on email/chat data. It shows better performance on entities that do not start with an uppercase letter. The model has four entity classes (MISC, PER, ORG and LOC) plus O for tokens outside any entity. The model can be loaded using Hugging...
Llama-2-13b

Model Details

Note: Use of this model is governed by the Meta license. Click on View License above.