ALLaM-2-7b-instruct
ALLaM is a series of powerful language models designed to advance Arabic Language Technology (ALT), developed by the National Center for Artificial Intelligence (NCAI) at the Saudi Data and AI Authority (SDAIA). ALLaM-2-7b-instruct is trained from scratch. Our pretraining-from-scratch recipe consists of two steps: training on 4T English tokens, followed by training on 1.2T mixed Arabic/English tokens. This retains the English capabilities of the model without catastrophic forgetting, effectively transferring knowledge from one language distribution to another.
- Developed by: National Center for Artificial Intelligence at SDAIA
- Model type: Autoregressive Transformer
- Language(s): Arabic, English
- License: Please see the LICENSE file
- Input: Text
- Output: Text
ALLaM is specifically designed to expedite the research and development of ALT through Large Language Models (LLMs). It serves as one of the foundational elements for building product offerings, as well as facilitating experimental initiatives.
ALLaM is a family of LLMs specially trained for Arabic. The main two paths followed for pretraining are:
- ALLaM-1: Continued training from open-source models
- ALLaM-2: Pretraining models from scratch
For this release, we are providing our instruction-tuned 7B parameter generative model pretrained from scratch.
Some parameters for this model are provided in the following table:
Size | Context Length | Pretraining Tokens | Instructions | Preference Pairs |
---|---|---|---|---|
7B parameters | 4096 tokens | 4T(en) + 1.2T(en+ar) | 7M | 260K |
ALLaM is a generative model that comes with inherent uncertainties. Trials cannot encompass every possible use case, so predicting ALLaM's responses in every context is not possible; on occasion, it may produce incorrect or biased outputs. Developers must conduct thorough safety evaluations and make specific adjustments to ensure the model is suitable for the intended purposes.
ALLaM models, like other large language models, carry certain risks and limitations. They can reflect biases present in their training data; despite efforts to mitigate these biases, they may still manifest in the model's outputs. Developers must assess outputs in their context and implement additional safeguards as needed. In addition, developers should implement robust content filtering and safety mechanisms to prevent the production of harmful content, and consider further mitigation strategies depending on the specific use case. Like all generative models, ALLaM models are inherently uncertain in their outputs; they may produce text that appears coherent, confident, and reliable but is incorrect, inaccurate, or outdated. Developers should perform rigorous safety tests tailored to their specific application.
The output generated by this model is not considered a statement of NCAI, SDAIA, or any other organization.
ALLaM-2-7b-instruct is pretrained on a total of 4T English tokens plus 1.2T mixed Arabic/English tokens. Our training codebase is built on NVIDIA/Megatron-LM. Average MFU during training was ~42%, and the model was trained in bf16 mixed precision.
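For context on the MFU figure above, model FLOPs utilization is commonly estimated as achieved training FLOPs divided by peak hardware FLOPs, using roughly 6 FLOPs per parameter per token. The sketch below only illustrates that arithmetic with hypothetical throughput and hardware numbers; it is not ALLaM's actual training accounting.

```python
# Illustrative MFU (Model FLOPs Utilization) arithmetic.
# The throughput and peak-FLOPs figures are hypothetical placeholders,
# not the actual ALLaM training configuration.

def estimate_mfu(n_params: float, tokens_per_second: float, peak_flops_per_second: float) -> float:
    """Approximate MFU using the common ~6 FLOPs per parameter per token training estimate."""
    achieved_flops_per_second = 6 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second

# Example: a 7B-parameter model on a hypothetical cluster of 200 GPUs,
# each with a 312 TFLOP/s bf16 peak, at a hypothetical 600k tokens/sec.
mfu = estimate_mfu(
    n_params=7e9,
    tokens_per_second=6.0e5,
    peak_flops_per_second=200 * 312e12,
)
print(f"Estimated MFU: {mfu:.1%}")  # ~40% with these placeholder numbers
```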
Massive Multitask Language Understanding (MMLU) is a collection of multiple-choice evaluation questions sourced from various academic levels (elementary to college). The questions typically cover the humanities, STEM, or the social sciences. It was originally an English dataset, but other variants were developed for Arabic:
- Original English MMLU (MMLU-en): A collection of 14,079 original English questions spanning 57 domains.
- Translated Arabic MMLU (MMLU-ar-trans): An English-to-Arabic machine translation of the original English MMLU.
- Natural Arabic MMLU (Arabic MMLU): A collection of 14,575 original Arabic questions spanning 40 domains.

In addition to MMLU, we evaluate on the following Arabic benchmarks:
- Exams Arabic (Exams Ar): A multiple-choice question dataset with 537 samples, covering several domains, e.g., Islamic studies, science, humanities, and physics.
- Arabic Cultural Alignment (araCA): This dataset was generated by gpt-3.5-turbo and contains 8,710 True/False questions from 58 different areas.
- Education and Training Evaluation Commission (ETEC): An Arabic multiple-choice question evaluation dataset collected by the ALLaM team in collaboration with the Saudi ETEC. It covers different levels of education (from elementary to post-college level) with a total of 1,188 test samples. This dataset is not publicly available and is accessible only to our evaluation team, to prevent accidental contamination.
- IEN: Questions selected from the Ministry of Education (MOE) IEN platform, arranged by grade, topic, and difficulty level. They cover the Saudi curriculum from 1st grade through high school. The dataset contains 10,435 questions in MCQ format and 10,910 in True/False format.
- AraMath: A collection of 6,000 multi-domain Arabic multiple-choice math questions in MMLU style, of which 600 are test samples. The collection focuses on word problems and how they can be formulated as math equations.
We evaluated all models using our own evaluation pipeline to ensure fair comparison.
Model | MMLU-en (0-shot) | MMLU-ar-trans (0-shot) | Arabic MMLU (0-shot) | Exams Ar (5-shot) | araCA-ACVA (5-shot) | ETEC (0-shot) | IEN-MCQ | IEN-TF | ARAMATH |
---|---|---|---|---|---|---|---|---|---|
Llama2 13B chat | 53.8 | 28.7 | 35.8 | 22.9 | 60.1 | 30.4 | 36.9 | 70.6 | 22.3 |
AceGPT 13B chat | 54.63 | 37.2 | 52.6 | 42.6 | 67.7 | 37.3 | 63.08 | 75.82 | 52.60 |
Jais 13B | 49.45 | 41.43 | 54.8 | 46.93 | 70.68 | 48.68 | 69.28 | 51.49 | 25.30 |
Jais 30B chat (v1) | 54.49 | 44.0 | 60.4 | 48.6 | 71.1 | 48.5 | 69.1 | 68.1 | 26 |
Jais 30B chat (v3) | 57.57 | 30.15 | 62.37 | 51.21 | 70.05 | 38.53 | 45.1 | 48.7 | 32.50 |
GPT-4 (API Generation) | 80.78 | 64.1 | 73.45 | 61 | 78 | 72 | 86.66 | 86.41 | 52 |
GPT-4o (API Generation) | 82.89 | 69.34 | 66.83 | 61.82 | 72.51 | 80.07 | 91.75 | 88.16 | 58.83 |
Claude Opus (API Generation) | 81.8 | 65.87 | 56.8 | 54.37 | 71.11 | 78.54 | 89.17 | 82.27 | 52.33 |
ALLaM-2-7b-instruct | 63.65 | 51.5 | 69.15 | 52.9 | 78.2 | 67.0 | 90.8 | 85.6 | 41.5 |
Multi-turn bench (MT-bench): A challenging multi-turn benchmark that uses GPT-4 as a judge. MT-bench comprises 80 questions from 8 domains. Each question is presented to the model, and the responses are submitted to GPT-4, which assigns a score to each response. The judge returns separate scores for the first and second turns.
This dataset was automatically translated to Arabic, then manually verified and culturally aligned.
Model | AR Average | AR Turn 1 | AR Turn 2 | EN Average | EN Turn 1 | EN Turn 2 |
---|---|---|---|---|---|---|
AceGPT 13B chat | 5.44 | 6.76 | 4.12 | 6.33 | 7.01 | 5.64 |
Jais 13B chat | 4.18 | 4.77 | 3.96 | 4.72 | 5.07 | 4.36 |
Jais 30B chat (v1) | 3.89 | 4.13 | 3.64 | 3.54 | 4.13 | 2.95 |
Jais 30B chat (v3) | 5.86 | 6.25 | 5.47 | 6.28 | 6.78 | 5.78 |
ALLaM-2-7b-instruct | 6.6 | 6.73 | 6.46 | 7.14 | 7.58 | 6.7 |
It is important to note that this model is optimized to function without a predefined system prompt. While ALLaM does not come with a default system prompt, it does provide the flexibility to add a custom system prompt.
For instance, a well-crafted system prompt could be: “You are ALLaM, a bilingual English and Arabic AI assistant.”
System prompts can also be in Arabic: "أنت علام، مساعد ذكاء اصطناعي مطور من الهيئة السعودية للبيانات والذكاء الاصطناعي، تجيب على الأسئلة بطريقة مفيدة مع مراعاة القيم الثقافية العربية." (which translates to: "You are ALLaM, an AI assistant developed by the Saudi Data and AI Authority; you answer questions helpfully while taking Arabic cultural values into consideration.")
Alternatively, users can get creative with their prompts, such as: “You are an AI assistant who responds to everything like a pirate.”
The system prompt is integrated into the tokenizer config and applied via the tokenizer's apply_chat_template() method.
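For local experimentation with Hugging Face transformers, a minimal sketch is shown below. The model path is a placeholder for wherever the ALLaM-2-7b-instruct weights live on your system, and the generation parameters mirror the sample request that follows; the system message is optional, since the model also works without one.

```python
# Minimal sketch of local inference with Hugging Face transformers.
# "path/to/ALLaM-2-7b-instruct" is a placeholder: point it at your local copy
# of the model weights (or the corresponding model ID, if one is available to you).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/ALLaM-2-7b-instruct"  # placeholder, not an official ID
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# The system message is optional; ALLaM is designed to work without one.
messages = [
    {"role": "system", "content": "You are ALLaM, a bilingual English and Arabic AI assistant."},
    {"role": "user", "content": "How do I prepare a cup of tea?"},
]

# apply_chat_template() builds the prompt string defined in the tokenizer config.
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The JSON blocks below show the equivalent request and response for a deployed endpoint, first in English and then in Arabic.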
{
  "input_data": {
    "input_string": [
      {
        "role": "user",
        "content": "How do I prepare a cup of tea?"
      }
    ],
    "parameters": {
      "temperature": 0.6,
      "top_p": 0.9,
      "do_sample": true,
      "max_new_tokens": 200
    }
  }
}
{
"output": " Preparing a cup of tea can be simple if you follow these steps:\n\n1. Gather your supplies: You will need loose tea leaves or tea bags, hot water, a kettle or a teapot, a tea infuser or a tea bag, a cup or mug, and a saucer or coaster for your cup.\n\n2. Choose your tea: Decide whether you want to use loose tea leaves or tea bags. If using loose tea, select the type of tea you prefer, such as black, green, white, or herbal.\n\n3. Measure your tea: If using loose tea, measure out one teaspoon of tea leaves per 8 ounces of water. If using tea bags, one bag per cup is usually sufficient.\n\n4. Heat your water: Fill your kettle or teapot with fresh, cold water and place it on the stove or use an electric kettle to"
}
{
  "input_data": {
    "input_string": [
      {
        "role": "user",
        "content": "كيف أجهز كوب شاهي؟"
      }
    ],
    "parameters": {
      "temperature": 0.6,
      "top_p": 0.9,
      "do_sample": true,
      "max_new_tokens": 200
    }
  }
}
{
"output": " لإعداد كوب من الشاي اللذيذ، يمكنك اتباع الخطوات التالية:\n\nالمكونات:\n1. أوراق الشاي (حسب النوع والكمية المفضلة لديك)\n2. ماء (حسب الكمية المطلوبة)\n3. حليب (اختياري)\n4. سكر (اختياري)\n5. إبريق الشاي أو كوب للخلط (إذا كنت تفضل الخلط مباشرة في الكوب)\n\nالخطوات:\n1. اغلي الماء في غلاية ماء أو على البوتاجاز.\n2. اختر أوراق الشاي المناسبة حسب تفضيلك. الشاي الأسود مناسب للشاي التقليدي، لكن يمكنك تجربة أنواع أخرى مثل الشاي الأخضر أو العشبي.\n3. ضع كمية الشاي المناسبة في الإبريق أو الكوب الذي ستستخدمه للخلط.\n4. صب الماء المغلي فوق أوراق الشاي واترك الشاي ينقع لمدة تتراوح بين 2 إلى 5 دقائق (حسب نوع الشاي والقوام المرغوب فيه).\n5. إذا كنت تفضل الحليب والسكر، أضفهما إلى الشاي الآن وقلب جيدًا حتى يذوب الحليب والسكر.\n"
}
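When the model is deployed as an Azure ML managed online endpoint, the request format shown above can be sent with a plain HTTPS call. In the sketch below, the scoring URL and key are placeholders for your own deployment.

```python
# Minimal sketch of calling a deployed endpoint using the request format shown above.
# ENDPOINT_URL and API_KEY are placeholders for your own deployment's scoring URI and key.
import json
import urllib.request

ENDPOINT_URL = "https://<your-endpoint>.inference.ml.azure.com/score"  # placeholder
API_KEY = "<your-endpoint-key>"                                        # placeholder

payload = {
    "input_data": {
        "input_string": [
            {"role": "user", "content": "How do I prepare a cup of tea?"}
        ],
        "parameters": {
            "temperature": 0.6,
            "top_p": 0.9,
            "do_sample": True,
            "max_new_tokens": 200,
        },
    }
}

request = urllib.request.Request(
    ENDPOINT_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
with urllib.request.urlopen(request) as response:
    # The endpoint returns a JSON object with an "output" field, as in the samples above.
    print(json.loads(response.read())["output"])
```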
If you find this work helpful or use any part of it, please include the following citation:
@misc{bari2024allamlargelanguagemodels,
title={ALLaM: Large Language Models for Arabic and English},
author={M Saiful Bari and Yazeed Alnumay and Norah A. Alzahrani and Nouf M. Alotaibi and Hisham A. Alyahya and Sultan AlRashed and Faisal A. Mirza and Shaykhah Z. Alsubaie and Hassan A. Alahmed and Ghadah Alabduljabbar and Raghad Alkhathran and Yousef Almushayqih and Raneem Alnajim and Salman Alsubaihi and Maryam Al Mansour and Majed Alrubaian and Ali Alammari and Zaki Alawami and Abdulmohsen Al-Thubaity and Ahmed Abdelali and Jeril Kuriakose and Abdalghani Abujabal and Nora Al-Twairesh and Areeb Alowisheq and Haidar Khan},
year={2024},
eprint={2407.15390},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.15390},
}
Version: 2
displayName: ALLaM-2-7b-instruct
summary: ALLaM is a series of powerful language models designed to advance Arabic Language Technology.
disable-batch: true
maxOutputTokens: 4096
hiddenlayerscanned: true
languages: AR, EN
inputModalities: text
outputModalities: text
SharedComputeCapacityEnabled
Featured
license: apache-2.0
task: chat-completion
author: SDAIA
inference_supported_envs: ['vllm']
inference_compute_allow_list: ['Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_ND40rs_v2', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_NC96ads_A100_v4']
View in Studio: https://ml.azure.com/registries/azureml/models/ALLaM-2-7b-instruct/version/2
License: apache-2.0
SharedComputeCapacityEnabled: True
languages: AR, EN
inference-min-sku-spec: 12|1|220|64
inference-recommended-sku: Standard_NC12s_v3, Standard_NC24s_v3, Standard_ND40rs_v2, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_NC96ads_A100_v4