models facebook sam vit base
The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.
The SAM model is made up of the following modules:
- The `VisionEncoder`: a ViT-based image encoder. It computes the image embeddings using attention on patches of the image. Relative positional embedding is used.
- The `PromptEncoder`: generates embeddings for points and bounding boxes.
- The `MaskDecoder`: a two-way transformer which performs cross attention in both directions between the image embedding and the point embeddings. The outputs are fed to the `Neck`.
- The `Neck`: predicts the output masks based on the contextualized masks produced by the `MaskDecoder`.
See here for an overview of the dataset.
apache-2.0
Inference type | Python sample (Notebook) | CLI with YAML |
---|---|---|
Real time | mask-generation-online-endpoint.ipynb | mask-generation-online-endpoint.sh |
Batch | mask-generation-batch-endpoint.ipynb | mask-generation-batch-endpoint.sh |
{
"input_data": {
"columns": [
"image",
"input_points",
"input_boxes",
"input_labels",
"multimask_output"
],
"index": [0],
"data": [["image1", "", "[[650, 900, 1000, 1250]]", "", false]]
},
"params": {}
}
Note: the "image1" string should be either a base64-encoded image or a publicly accessible URL.
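The sample input above can also be assembled programmatically. The following is a minimal sketch, assuming the image is available as raw bytes and that prompts are passed as JSON strings, as in the sample. `build_request` is a hypothetical helper name, not part of any SDK, and the placeholder bytes stand in for a real image file.

```python
import base64
import json


def build_request(image_bytes, input_boxes=None, input_points=None,
                  input_labels=None, multimask_output=False):
    """Build a payload shaped like the sample input.

    Hypothetical helper for illustration; not part of any SDK.
    Each prompt is serialized as a JSON string, and an omitted
    prompt becomes an empty string, mirroring the sample row.
    """
    def as_field(value):
        return "" if value is None else json.dumps(value)

    # Base64-encode the raw image bytes, as the note above requires.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    row = [
        image_b64,
        as_field(input_points),
        as_field(input_boxes),
        as_field(input_labels),
        multimask_output,
    ]
    return {
        "input_data": {
            "columns": ["image", "input_points", "input_boxes",
                        "input_labels", "multimask_output"],
            "index": [0],
            "data": [row],
        },
        "params": {},
    }


# Placeholder bytes; in practice, read them from an image file.
payload = build_request(b"placeholder image bytes",
                        input_boxes=[[650, 900, 1000, 1250]])
body = json.dumps(payload)  # JSON text to send as the request body
```

Keeping the column order in one place avoids a mismatch between `columns` and the values in each `data` row.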
[
{
"predictions": [
{
"mask_per_prediction": [
{
"encoded_binary_mask": "encoded_binary_mask1",
"iou_score": 0.85
}
]
}
]
}
]
Note: the "encoded_binary_mask1" string is in base64 format.
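A response of the shape shown in the sample output can be unpacked along these lines. This is a sketch under stated assumptions: `best_mask` is a hypothetical helper, the response body is assumed to be valid JSON matching the sample, and the decoded bytes are returned as-is because the sample does not state which image format the mask uses.

```python
import base64
import json


def best_mask(response_text, row=0, prediction=0):
    """Return (mask_bytes, iou_score) for the highest-scoring mask.

    Hypothetical helper for illustration. It assumes the nesting
    response[row]["predictions"][prediction]["mask_per_prediction"]
    from the sample output.
    """
    response = json.loads(response_text)
    masks = response[row]["predictions"][prediction]["mask_per_prediction"]
    top = max(masks, key=lambda m: m["iou_score"])
    # The mask string is base64; decode it back to raw bytes.
    return base64.b64decode(top["encoded_binary_mask"]), top["iou_score"]


# Synthetic response for illustration only; not real model output.
sample = json.dumps([{"predictions": [{"mask_per_prediction": [
    {"encoded_binary_mask": base64.b64encode(b"mask-a").decode("ascii"),
     "iou_score": 0.85},
    {"encoded_binary_mask": base64.b64encode(b"mask-b").decode("ascii"),
     "iou_score": 0.60},
]}]}])
mask_bytes, score = best_mask(sample)
```

Selecting by `iou_score` is mainly useful when `multimask_output` is true and several candidate masks come back for one prompt.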
Version: 3
Preview
author : facebook
huggingface_model_id : facebook/sam-vit-base
license : apache-2.0
task : image-segmentation
training_dataset : SA-1B
SharedComputeCapacityEnabled
inference_compute_allow_list : ['Standard_DS5_v2', 'Standard_D8a_v4', 'Standard_D8as_v4', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_FX4mds', 'Standard_FX12mds', 'Standard_F16s_v2', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E4s_v3', 'Standard_E8s_v3', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
View in Studio: https://ml.azure.com/registries/azureml/models/facebook-sam-vit-base/version/3
License: apache-2.0
SharedComputeCapacityEnabled: True
SHA: b5fc59950038394bae73f549a55a9b46bc6f3d96
inference-min-sku-spec: 4|0|32|64
inference-recommended-sku: Standard_DS5_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_FX4mds, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2