
ALLaM-2-7b-instruct

Overview

Description

ALLaM is a series of powerful language models designed to advance Arabic Language Technology (ALT), developed by the National Center for Artificial Intelligence (NCAI) at the Saudi Data and AI Authority (SDAIA). ALLaM-2-7b-instruct is trained from scratch. Our from-scratch pretraining recipe consists of two steps: training on 4T English tokens, followed by training on 1.2T mixed Arabic/English tokens. This retains the model's English capabilities without catastrophic forgetting, effectively transferring knowledge from one language distribution to another.

  • Developed by: National Center for Artificial Intelligence at SDAIA
  • Model type: Autoregressive Transformer
  • Language(s): Arabic, English
  • License: Please see the LICENSE file
  • Input: Text
  • Output: Text

Intended Use

ALLaM is specifically designed to expedite the research and development of ALT through Large Language Models (LLMs). It serves as a foundation for building product offerings and for facilitating experimental initiatives.

Model Details

ALLaM is a family of LLMs specially trained for Arabic. The two main paths followed for pretraining are:

  • ALLaM-1: Continued training from open-source models
  • ALLaM-2: Pretraining models from scratch

For this release, we are providing our instruction-tuned, 7B-parameter generative model pretrained from scratch.

Some parameters for this model are provided in the following table:

| Size | Context Length | Pretraining Tokens | Instructions | Preference Pairs |
|------|----------------|--------------------|--------------|------------------|
| 7B parameters | 4096 tokens | 4T (en) + 1.2T (en+ar) | 7M | 260K |

Ethical Considerations and Limitations

ALLaM is a generative model and comes with inherent uncertainties. Testing cannot encompass every possible use case, so ALLaM's responses cannot be predicted in every context, and it may on occasion produce incorrect or biased outputs. Developers must conduct thorough safety evaluations and make adjustments specific to their intended purpose before deploying the model.

Responsible AI Considerations

ALLaM models, like other large language models, carry certain risks and limitations. They can reflect biases present in their training data; despite efforts to mitigate these biases, they may still manifest in the model's outputs. Developers must assess outputs in context and implement additional safeguards as needed, including robust content filtering and safety mechanisms to prevent the production of harmful content, along with further mitigation strategies suited to the specific use case. Like all generative models, ALLaM models are inherently uncertain in their outputs: they may produce text that appears coherent, confident, and reliable but is incorrect, inaccurate, or outdated. Developers should perform rigorous safety tests tailored to their specific application.

The output generated by this model is not considered a statement of NCAI, SDAIA, or any other organization.

Training Details

ALLaM-2-7b-instruct is pretrained on a total of 5.2 trillion tokens of English and Arabic (4T English, then 1.2T mixed Arabic/English). Our training codebase is built on NVIDIA/Megatron-LM. Average MFU during training was ~42%, and we trained the model in bf16-mixed precision.
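
For context, MFU (Model FLOPs Utilization) is the ratio of the FLOPs a training run actually achieves to the hardware's theoretical peak. The sketch below shows the standard back-of-the-envelope calculation; the throughput, GPU count, and peak-FLOPs figures are illustrative assumptions, not ALLaM training statistics.

```python
# Illustrative MFU (Model FLOPs Utilization) calculation for a dense
# transformer. All hardware numbers are assumptions for illustration,
# not actual ALLaM training statistics.

n_params = 7e9               # 7B parameters
tokens_per_sec = 3.0e6       # assumed aggregate training throughput
n_gpus = 1024                # assumed number of accelerators
peak_flops_per_gpu = 312e12  # e.g., A100 bf16 peak: 312 TFLOP/s

# Standard approximation: ~6 FLOPs per parameter per token for the
# combined forward and backward pass.
achieved_flops = 6 * n_params * tokens_per_sec
peak_flops = n_gpus * peak_flops_per_gpu

mfu = achieved_flops / peak_flops
print(f"MFU = {mfu:.1%}")  # ~39% under these assumed numbers
```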

Evaluation

Automatic Benchmarks

Massive Multitask Language Understanding (MMLU) is a collection of multiple-choice evaluation questions sourced from various academic levels (elementary through college). The questions typically relate to the humanities, STEM, or the social sciences. MMLU was originally an English dataset; several Arabic variants have since been developed:

Exams Arabic (Exams Ar): A multiple-choice question dataset with 537 samples, covering several domains, e.g., Islamic studies, science, humanities, and physics.

Arabic Cultural Alignment (araCA): A dataset generated by gpt-3.5-turbo, containing 8,710 true/false questions from 58 different areas.

Education and Training Evaluation Commission (ETEC): An Arabic multiple-choice evaluation dataset collected by the ALLaM team in collaboration with the Saudi ETEC. It covers different levels of education (from elementary to post-college), with a total of 1,188 test samples. This dataset is not publicly available and is accessible only to our evaluation team, to prevent accidental contamination.

IEN: Questions selected from the Saudi Ministry of Education's IEN platform, arranged by grade, topic, and difficulty level. They cover the entire Saudi curriculum from 1st grade through high school. The dataset contains 10,435 questions in MCQ format and 10,910 in true/false format.

AraMath: A collection of 6,000 multiple-choice math questions in Arabic (600 of which are test samples), presented in MMLU style. The collection focuses on word problems and how they can be formulated as math equations.

We evaluated all models using our own evaluation pipeline to ensure fair comparison.
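
The pipeline itself is not published on this page. As a rough illustration of how 0-shot multiple-choice accuracy is commonly computed, the hedged sketch below scores each answer option by its log-likelihood under the model and picks the highest; the model identifier and scoring details are assumptions, not the actual ALLaM evaluation code.

```python
# Hedged sketch of a common MCQ scoring scheme: choose the option to
# which the model assigns the highest log-likelihood. Illustrative
# only; this is not the ALLaM team's evaluation pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ALLaM-AI/ALLaM-7B-Instruct-preview"  # assumed identifier
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def option_logprob(question: str, option: str) -> float:
    """Sum of log-probabilities of the option tokens given the question."""
    prompt_len = tok(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Predictions for token i come from position i-1, so shift by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = prompt_len - 1  # score only the option tokens
    return logprobs[start:].gather(1, targets[start:, None]).sum().item()

question = "What is the capital of Saudi Arabia?"
options = ["Riyadh", "Jeddah", "Dammam", "Abha"]
print(max(options, key=lambda o: option_logprob(question, o)))
```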

| Model | MMLU-en (0-shot) | MMLU-ar-trans (0-shot) | Arabic MMLU (0-shot) | Exams Ar (5-shot) | araCA-ACVA (5-shot) | ETEC (0-shot) | IEN-MCQ | IEN-TF | AraMath |
|-------|------------------|------------------------|----------------------|-------------------|---------------------|---------------|---------|--------|---------|
| Llama2 13B chat | 53.8 | 28.7 | 35.8 | 22.9 | 60.1 | 30.4 | 36.9 | 70.6 | 22.3 |
| AceGPT 13B chat | 54.63 | 37.2 | 52.6 | 42.6 | 67.7 | 37.3 | 63.08 | 75.82 | 52.60 |
| Jais 13B | 49.45 | 41.43 | 54.8 | 46.93 | 70.68 | 48.68 | 69.28 | 51.49 | 25.30 |
| Jais 30B chat (v1) | 54.49 | 44.0 | 60.4 | 48.6 | 71.1 | 48.5 | 69.1 | 68.1 | 26 |
| Jais 30B chat (v3) | 57.57 | 30.15 | 62.37 | 51.21 | 70.05 | 38.53 | 45.1 | 48.7 | 32.50 |
| GPT-4 (API Generation) | 80.78 | 64.1 | 73.45 | 61 | 78 | 72 | 86.66 | 86.41 | 52 |
| GPT-4o (API Generation) | 82.89 | 69.34 | 66.83 | 61.82 | 72.51 | 80.07 | 91.75 | 88.16 | 58.83 |
| Claude Opus (API Generation) | 81.8 | 65.87 | 56.8 | 54.37 | 71.11 | 78.54 | 89.17 | 82.27 | 52.33 |
| ALLaM-2-7b-instruct | 63.65 | 51.5 | 69.15 | 52.9 | 78.2 | 67.0 | 90.8 | 85.6 | 41.5 |

MT-bench

Multi-turn bench (MT-bench): A challenging multi-turn benchmark that uses GPT-4 as a judge. MT-bench comprises 80 questions from 8 domains. Each question is presented to the model, and the responses are submitted to GPT-4, which assigns a score to each response; the first and second turns are scored separately.

This dataset was automatically translated to Arabic, then manually verified and culturally aligned.

| Model | AR Average | AR Turn 1 | AR Turn 2 | EN Average | EN Turn 1 | EN Turn 2 |
|-------|------------|-----------|-----------|------------|-----------|-----------|
| AceGPT 13B chat | 5.44 | 6.76 | 4.12 | 6.33 | 7.01 | 5.64 |
| Jais 13B chat | 4.18 | 4.77 | 3.96 | 4.72 | 5.07 | 4.36 |
| Jais 30B chat (v1) | 3.89 | 4.13 | 3.64 | 3.54 | 4.13 | 2.95 |
| Jais 30B chat (v3) | 5.86 | 6.25 | 5.47 | 6.28 | 6.78 | 5.78 |
| ALLaM-2-7b-instruct | 6.6 | 6.73 | 6.46 | 7.14 | 7.58 | 6.7 |

Inference samples

System Prompt

Note that this model is optimized to function without a predefined system prompt. ALLaM does not ship with a default system prompt, but it provides the flexibility to add a custom one.

For instance, a well-crafted system prompt could be: “You are ALLaM, a bilingual English and Arabic AI assistant.”

System prompts can also be written in Arabic: "أنت علام، مساعد ذكاء اصطناعي مطور من الهيئة السعودية للبيانات والذكاء الاصطناعي، تجيب على الأسئلة بطريقة مفيدة مع مراعاة القيم الثقافية العربية." (roughly: "You are ALLaM, an AI assistant developed by the Saudi Data and AI Authority; you answer questions helpfully while taking Arab cultural values into account.")

Alternatively, users can get creative with their prompts, such as: “You are an AI assistant who responds to everything like a pirate.”

The system prompt is injected through the chat template stored in the tokenizer config (applied via the tokenizer's apply_chat_template() method).
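
The minimal sketch below shows this in practice with the Hugging Face transformers API; the model identifier is an assumption for illustration.

```python
# Minimal sketch: applying a custom system prompt via the chat template
# stored in the tokenizer config. The model identifier is assumed.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ALLaM-AI/ALLaM-7B-Instruct-preview")

messages = [
    {"role": "system", "content": "You are ALLaM, a bilingual English and Arabic AI assistant."},
    {"role": "user", "content": "How do I prepare a cup of tea?"},
]

# Omit the system message entirely to run without a system prompt,
# which is how the model is optimized to operate by default.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```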

Sample inputs and outputs (for real-time inference)

English

Sample input

{
    "input_data": {
        "input_string": [
            {
                "role": "user",
                "content": "How do I prepare a cup of tea?"
            }
        ],
        "parameters": {
            "temperature": 0.6,
            "top_p": 0.9,
            "do_sample": true,
            "max_new_tokens": 200
        }
    }
}

Sample output

{
    "output": " Preparing a cup of tea can be simple if you follow these steps:\n\n1. Gather your supplies: You will need loose tea leaves or tea bags, hot water, a kettle or a teapot, a tea infuser or a tea bag, a cup or mug, and a saucer or coaster for your cup.\n\n2. Choose your tea: Decide whether you want to use loose tea leaves or tea bags. If using loose tea, select the type of tea you prefer, such as black, green, white, or herbal.\n\n3. Measure your tea: If using loose tea, measure out one teaspoon of tea leaves per 8 ounces of water. If using tea bags, one bag per cup is usually sufficient.\n\n4. Heat your water: Fill your kettle or teapot with fresh, cold water and place it on the stove or use an electric kettle to"
}
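
To send the payload above to a deployed real-time endpoint, a request along the following lines should work. This is a hedged sketch: the endpoint URL and key are placeholders obtained from your own Azure ML deployment.

```python
# Hedged sketch of calling an Azure ML real-time (online) endpoint with
# the sample payload above. ENDPOINT_URL and API_KEY are placeholders.
import json
import urllib.request

ENDPOINT_URL = "https://<your-endpoint>.inference.ml.azure.com/score"  # placeholder
API_KEY = "<your-endpoint-key>"  # placeholder

payload = {
    "input_data": {
        "input_string": [
            {"role": "user", "content": "How do I prepare a cup of tea?"}
        ],
        "parameters": {
            "temperature": 0.6,
            "top_p": 0.9,
            "do_sample": True,
            "max_new_tokens": 200,
        },
    }
}

req = urllib.request.Request(
    ENDPOINT_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["output"])
```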

Arabic

Sample input

{
    "input_data": {
        "input_string": [
            {
                "role": "user",
                "content": "كيف أجهز كوب شاهي؟"
            }
        ],
        "parameters": {
            "temperature": 0.6,
            "top_p": 0.9,
            "do_sample": true,
            "max_new_tokens": 200
        }
    }
}

Sample output

{
    "output": " لإعداد كوب من الشاي اللذيذ، يمكنك اتباع الخطوات التالية:\n\nالمكونات:\n1. أوراق الشاي (حسب النوع والكمية المفضلة لديك)\n2. ماء (حسب الكمية المطلوبة)\n3. حليب (اختياري)\n4. سكر (اختياري)\n5. إبريق الشاي أو كوب للخلط (إذا كنت تفضل الخلط مباشرة في الكوب)\n\nالخطوات:\n1. اغلي الماء في غلاية ماء أو على البوتاجاز.\n2. اختر أوراق الشاي المناسبة حسب تفضيلك. الشاي الأسود مناسب للشاي التقليدي، لكن يمكنك تجربة أنواع أخرى مثل الشاي الأخضر أو العشبي.\n3. ضع كمية الشاي المناسبة في الإبريق أو الكوب الذي ستستخدمه للخلط.\n4. صب الماء المغلي فوق أوراق الشاي واترك الشاي ينقع لمدة تتراوح بين 2 إلى 5 دقائق (حسب نوع الشاي والقوام المرغوب فيه).\n5. إذا كنت تفضل الحليب والسكر، أضفهما إلى الشاي الآن وقلب جيدًا حتى يذوب الحليب والسكر.\n"
}

Citation

If you found this work helpful or used any part of it, please include the following citation:

@misc{bari2024allamlargelanguagemodels,
      title={ALLaM: Large Language Models for Arabic and English}, 
      author={M Saiful Bari and Yazeed Alnumay and Norah A. Alzahrani and Nouf M. Alotaibi and Hisham A. Alyahya and Sultan AlRashed and Faisal A. Mirza and Shaykhah Z. Alsubaie and Hassan A. Alahmed and Ghadah Alabduljabbar and Raghad Alkhathran and Yousef Almushayqih and Raneem Alnajim and Salman Alsubaihi and Maryam Al Mansour and Majed Alrubaian and Ali Alammari and Zaki Alawami and Abdulmohsen Al-Thubaity and Ahmed Abdelali and Jeril Kuriakose and Abdalghani Abujabal and Nora Al-Twairesh and Areeb Alowisheq and Haidar Khan},
      year={2024},
      eprint={2407.15390},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.15390}, 
}

Version: 2

Tags

  • displayName: ALLaM-2-7b-instruct
  • summary: ALLaM is a series of powerful language models designed to advance Arabic Language Technology.
  • disable-batch: true
  • maxOutputTokens: 4096
  • hiddenlayerscanned: true
  • languages: AR, EN
  • inputModalities: text
  • outputModalities: text
  • SharedComputeCapacityEnabled
  • Featured
  • license: apache-2.0
  • task: chat-completion
  • author: SDAIA
  • inference_supported_envs: ['vllm']
  • inference_compute_allow_list: ['Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_ND40rs_v2', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_NC96ads_A100_v4']

View in Studio: https://ml.azure.com/registries/azureml/models/ALLaM-2-7b-instruct/version/2

License: apache-2.0

Properties

SharedComputeCapacityEnabled: True

languages: AR, EN

inference-min-sku-spec: 12|1|220|64

inference-recommended-sku: Standard_NC12s_v3, Standard_NC24s_v3, Standard_ND40rs_v2, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_NC96ads_A100_v4
