fix

huggingface · Mar 4, 2025 · 5d52e83
1 parent 729bb0f
commit 5d52e83
Showing 1 changed file with 4 additions and 3 deletions.
diff --git a/aya-vision.md b/aya-vision.md
@@ -68,10 +68,11 @@ Model merging enhances the generative capabilities of our final model that leads
 
 Multimodal model merging also enables our Aya Vision models to excel in text-only tasks as measured in mArenaHard datasets compared with the other leading vision-language models. 
 
-![stages](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/aya-vision/image-11.png)
-Overview of the training pipeline for Aya Vision
+| ![stages](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/aya-vision/image-11.png) |
+| :--: |
+| Overview of the training pipeline for Aya Vision |
 
-Scaling up to 32B
+## Scaling up to 32B
 
 Finally, we scale our recipe from 8B to 32B, resulting in the state-of-the-art open-weight multilingual vision-language model – Aya Vision 32B which shows significant improvements in win rates due to the stronger initialization of the text-backbone, and outperforms models more than 2x of its size, such as Llama-3.2 90B Vision, Molmo 72B, and Qwen2.5-VL 72B by win rates ranging from 49% to 63% on AyaVisionBench and 52% to 72% on mWildVision average across 23 languages.