models facebook deit base patch16 224

facebook-deit-base-patch16-224

Overview

Description: This model, DeiT (Data-efficient image Transformer), is a more efficiently trained Vision Transformer (ViT). The Vision Transformer is a transformer encoder model that is pre-trained and fine-tuned on a large collection of images in a supervised fashion. Each image is presented to the model as a sequence of fixed-size patches, which are linearly embedded. Absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder.

Pre-training lets the model learn an inner representation of images that can be used to extract features for downstream tasks. For example, given a dataset of labeled images, a standard classifier can be trained by placing a linear layer on top of the pre-trained encoder. The last hidden state of the [CLS] token can be used as a representation of the entire image.

> The above summary was generated using ChatGPT. Review the original-model-card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

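
The patch-sequence idea described above can be illustrated with a minimal sketch. It assumes the 16-pixel patch size and 224-pixel input resolution implied by the model name, uses a toy single-channel image, and omits the linear embedding and position embeddings entirely. The helper names `split_into_patches` and `sequence_length` are hypothetical and are not part of any library.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D image (list of rows) into non-overlapping, flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def sequence_length(image_size=224, patch_size=16, extra_tokens=1):
    """Number of tokens the encoder sees: all patches plus the [CLS] token."""
    per_side = image_size // patch_size
    return per_side * per_side + extra_tokens


# Toy single-channel 224x224 image filled with zeros (illustrative only).
image = [[0] * 224 for _ in range(224)]
patches = split_into_patches(image, 16)

print(len(patches))       # number of patches
print(len(patches[0]))    # values per flattened single-channel patch
print(sequence_length())  # patches plus one [CLS] token
```

With these assumed sizes, a 224x224 image yields a 14x14 grid of patches, so the encoder processes a sequence of 196 patch tokens plus the [CLS] token.
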
### Inference samples

|Inference type|Python sample (Notebook)|CLI with YAML|
|--|--|--|
|Real time|image-classification-online-endpoint.ipynb|image-classification-online-endpoint.sh|
|Batch|image-classification-batch-endpoint.ipynb|image-classification-batch-endpoint.sh|

### Finetuning samples

|Task|Use case|Dataset|Python sample (Notebook)|CLI with YAML|
|---|--|--|--|--|
|Image Multi-class classification|Image Multi-class classification|fridgeObjects|fridgeobjects-multiclass-classification.ipynb|fridgeobjects-multiclass-classification.sh|
|Image Multi-label classification|Image Multi-label classification|multilabel fridgeObjects|fridgeobjects-multilabel-classification.ipynb|fridgeobjects-multilabel-classification.sh|

### Model Evaluation

|Task|Use case|Dataset|Python sample (Notebook)|
|---|--|--|--|
|Image Multi-class classification|Image Multi-class classification|fridgeObjects|image-multiclass-classification.ipynb|
|Image Multi-label classification|Image Multi-label classification|multilabel fridgeObjects|image-multilabel-classification.ipynb|

### Sample inputs and outputs (for real-time inference)

#### Sample input

```json
{
  "input_data": {
    "columns": [
      "image"
    ],
    "index": [0, 1],
    "data": ["image1", "image2"]
  }
}
```

Note: The "image1" and "image2" strings should be base64-encoded images or publicly accessible URLs.

#### Sample output

```json
[
  {
    "probs": [0.91, 0.09],
    "labels": ["can", "carton"]
  },
  {
    "probs": [0.1, 0.9],
    "labels": ["can", "carton"]
  }
]
```

#### Model inference - visualization for a sample image

mc visualization
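
As a rough illustration of the sample input and output formats above, the following sketch builds a request payload from raw image bytes or URL strings and picks the most probable label from a response. It uses only the Python standard library. The helper names `build_request` and `top_label`, the placeholder bytes, and the example URL are hypothetical; sending the payload to an actual endpoint is not shown.

```python
import base64
import json


def build_request(images):
    """Build the real-time scoring payload from raw image bytes or URL strings."""
    data = []
    for item in images:
        if isinstance(item, bytes):
            # Raw image bytes are sent as a base64-encoded string.
            data.append(base64.b64encode(item).decode("utf-8"))
        else:
            # Anything else is assumed to be a publicly accessible URL.
            data.append(item)
    return {
        "input_data": {
            "columns": ["image"],
            "index": list(range(len(data))),
            "data": data,
        }
    }


def top_label(prediction):
    """Return the (label, probability) pair with the highest probability."""
    probs = prediction["probs"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return prediction["labels"][best], probs[best]


# Placeholder inputs: fake bytes and a made-up URL, for illustration only.
payload = build_request([b"fake image bytes", "https://example.com/carton.jpg"])
print(json.dumps(payload, indent=2))

# The sample output shown in this section, reused as test data.
sample_output = [
    {"probs": [0.91, 0.09], "labels": ["can", "carton"]},
    {"probs": [0.1, 0.9], "labels": ["can", "carton"]},
]
print([top_label(p) for p in sample_output])
```

In practice the bytes would come from reading an image file in binary mode. The `index` list simply numbers the rows in `data`, matching the `[0, 1]` shown in the sample input.
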

Version: 6
