naazahrani committed on
Commit 54ea89a · verified · 1 Parent(s): 0e95d3d

Adding evaluation results

Files changed (1)
  1. README.md +31 -21
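The README text in this diff points to the per-benchmark scores stored as JSON under the repo's `evaluation/` folder. As a minimal sketch (assuming the `huggingface_hub` package is installed and that the folder holds plain JSON files; file names are discovered at runtime rather than assumed), one way to pull a score file locally:

```python
# Sketch: list and load one of the evaluation JSON files referenced in the README.
# Assumes huggingface_hub is installed; file names under evaluation/ are discovered
# at runtime rather than hard-coded.
import json
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "ALLaM-AI/ALLaM-7B-Instruct-preview"

# Enumerate JSON files stored under evaluation/ in the model repo.
eval_files = [
    f for f in list_repo_files(repo_id)
    if f.startswith("evaluation/") and f.endswith(".json")
]

# Download the first file to the local cache and parse it.
local_path = hf_hub_download(repo_id=repo_id, filename=eval_files[0])
with open(local_path, encoding="utf-8") as fh:
    scores = json.load(fh)

print(eval_files[0], "->", list(scores))
```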
README.md CHANGED
@@ -123,31 +123,41 @@ All models were evaluated using our proprietary evaluation pipeline and [LM Eval
123
 
124
  The evaluation scores of ALLaM can be found in JSON format [here](https://huggingface.co/ALLaM-AI/ALLaM-7B-Instruct-preview/tree/main/evaluation).
125
 
126
- | model | EXAMS (ar) 5 Shot | ACVA 5 Shot | ETECH 0 Shot | MOE-IEN-MCQ 0 Shot | MOE-IEN-TF 0 Shot | SDAIA MCQs 0 Shot | ArabicMMLU 0 Shot | AraMath 5 Shot | OpenAI MMLU-ar 0 Shot | GAT 0 Shot |
127
- |:----------------------------------|------------------:|--------------:|--------------:|---------------------:|--------------------:|-------------------:|--------------------:|--------------------:|--------------------:|-----------------------------:|
128
- | Qwen2.5-72B-Instruct | 60.71 | 79.92 | 79.92 | 89.25 | 87.23 | 79.3 | 74.1 | 92.17 | 73.59 | 59.54 |
129
- | Llama-3.1-70B-Instruct | 60.34 | 77.07 | 72.3 | 85.02 | 70.21 | 76.79 | 71.46 | 85.17 | 69.88 | 42.36 |
130
- | jais-adapted-70b-chat | 54.75 | 73.33 | 59.57 | 76 | 56.97 | 69.39 | 65.74 | 52.17 | 56.82 | 39.15 |
131
- | jais-family-30b-8k-chat | 50.28 | 74.47 | 55.71 | 73.02 | 72.14 | 65.31 | 63.11 | 50.33 | 50.9 | 36.44 |
132
- | jais-family-30b-16k-chat | 49.72 | 60.08 | 27.64 | 40.56 | 60.03 | 26.98 | 62.04 | 46.5 | 50.98 | 34.85 |
133
- | AceGPT-v2-8B-Chat | 51.96 | 72.69 | 56.71 | 77.02 | 75.85 | 68.44 | 57.02 | 40 | 49.99 | 36.15 |
134
- | jais-family-6p7b-chat | 46.93 | 73.8 | 48.31 | 61.55 | 68.14 | 60.66 | 56.15 | 30.83 | 44.96 | 31.71 |
135
- | jais-adapted-7b-chat | 40.6 | 70.44 | 40.96 | 58.69 | 68.63 | 51.38 | 49.75 | 24.17 | 38.54 | 29.68 |
136
- | jais-adapted-13b-chat | 48.23 | 67.78 | 47.89 | 70.75 | 41.54 | 61.92 | 56.42 | 41.17 | 46.83 | 33.4 |
137
- | Qwen2.5-14B-Instruct | 57.54 | 75.04 | 73.41 | 83.76 | 71.04 | 73.59 | 69.36 | 91.17 | 63.8 | 51.7 |
138
- | Mistral-7B-Instruct-v0.3 | 33.71 | 61.21 | 33.83 | 53.9 | 65.38 | 44.1 | 45.27 | 24.33 | 32.32 | 26.65 |
139
- | falcon-mamba-7b-instruct | 28.49 | 63.52 | 34.62 | 47.22 | 71.34 | 39.9 | 39.27 | 31.67 | 28.45 | 29.69 |
140
- | Mistral-Nemo-Instruct-2407 | 47.49 | 76.92 | 51.43 | 70.23 | 71.73 | 61.1 | 55.97 | 43.33 | 46.15 | 25.44 |
141
- | Qwen2.5-7B-Instruct | 50.65 | 78.17 | 64.11 | 78.31 | 75.17 | 68.7 | 61.54 | 60.5 | 56.1 | 41.42 |
142
- | Llama-3.1-8B-Instruct | 54 | 70.54 | 51.9 | 70.01 | 76.99 | 62.42 | 56.53 | 42.83 | 44.67 | 30.76 |
143
- | jais-family-13b-chat | 45.07 | 71.18 | 46.83 | 60.92 | 50.87 | 54.83 | 58.14 | 41.67 | 47.73 | 31.72 |
144
- | Mistral-Small-Instruct-2409 | 38.73 | 68.93 | 44.03 | 62.16 | 75.87 | 52.51 | 50.43 | 46.33 | 39.63 | 28.82 |
145
- | ALLaM-7B-Instruct-preview | 51.58 | 76.33 | 66.81 | 91.54 | 85.57 | 73.9 | 67.78 | 65.5 | 55.91 | 44.53 |
146
 
147
  #### English Benchmarks
148
 
149
 
150
- | model | AGIEval 0 Shot | Arc-(challenge) 0 Shot | GPQA (main) 0 Shot | Hendrycks ethics 0 Shot | Winogrande 0 Shot | HellaSwag 0 Shot | TriviaQa 5 Shot | MMLU Pro 5 Shot | Minerva Math 4 Shot | MMLU 0 Shot | TruthfulQA-mc2 0 Shot | IFEval (prompt_level strict) 0 Shot | IFEval (inst_level strict`) 0 Shot | GSM8k 5 Shot |
151
  |:----------------------------------|-----------------:|-----------------------:|--------------------------:|--------------------------:|--------------------:|-------------------:|------------------:|------------------:|----------------------:|--------------:|------------------------:|----------------------------------:|--------------------------------:|---------------:|
152
  | Qwen2.5-72B-Instruct | 71.09 | 63.48 | 25.67 | 78.33 | 76.24 | 87.41 | 70.9 | 62.77 | 54.04 | 83.44 | 69.54 | 67.47 | 76.86 | 93.25 |
153
  | Llama-3.1-70B-Instruct | 52.6 | 63.05 | 27.01 | 80.28 | 79.08 | 84.67 | 82.09 | 59 | 49.18 | 82.36 | 59.92 | 70.98 | 79.74 | 88.4 |
 
123
 
124
  The evaluation scores of ALLaM can be found in JSON format [here](https://huggingface.co/ALLaM-AI/ALLaM-7B-Instruct-preview/tree/main/evaluation).
125
 
126
+
127
+
128
+ | Model | ETEC <br>0 shot | IEN-MCQ <br>0 shot | IEN-TF <br>0 shot | AraPro <br>0 shot | AraMath <br>5 shot | ARIFEval <br>(prompt strict) <br>0 shot | ARIFEval <br>(inst strict) <br>0 shot | ExamsAR <br>5 shot | ACVA <br>5 shot | ArabicMMLU <br>0 shot | OpenAI MMLU <br>0 shot | GAT <br>0 shot |
129
+ |:----------------------------|:---------|:-----------------|:----------------|:----------------|:-----------------|:-----------------------------------|:---------------------------------|:------------------|:--------------|:--------------------|:--------------------|:-----------------------------|
130
+ | ALLaM-7B-Instruct-preview | 66.67 | **91.77** | 82.95 | 69.71 | 66.78 | 31.34 | 67.65 | 51.58 | 76.33 | 67.78 | 55.91 | 44.53 |
131
+ | AceGPT-v2-8B-Chat | 35.67 | 53.59 | 63.4 | 43.85 | 27.11 | 30.41 | 64.03 | 51.96 | 72.69 | 57.02 | 49.99 | 36.15 |
132
+ | jais-family-6p7b-chat | 49.28 | 68.43 | 71.78 | 57.61 | 40.0 | 35.82 | 70.58 | 46.93 | 73.8 | 56.15 | 44.96 | 31.71 |
133
+ | jais-family-13b-chat | 53.31 | 74.88 | 68.76 | 62.79 | 41.49 | 16.6 | 54.95 | 45.07 | 71.18 | 58.14 | 47.73 | 31.72 |
134
+ | jais-family-30b-8k-chat | 68.84 | 79.6 | 78.81 | 70.49 | 70.91 | **70.9** | **88.6** | 50.28 | 74.47 | 63.11 | 50.9 | 36.44 |
135
+ | jais-family-30b-16k-chat | 45.68 | 59.23 | 71.7 | 52.51 | 34.38 | 51.87 | 79.11 | 49.72 | 60.08 | 62.04 | 50.98 | 34.85 |
136
+ | jais-adapted-7b-chat | 40.96 | 60.64 | 63.66 | 47.73 | 44.46 | 51.12 | 78.16 | 40.6 | 70.44 | 49.75 | 38.54 | 29.68 |
137
+ | jais-adapted-13b-chat | 72.18 | 80.51 | 77.64 | 69.11 | 82.81 | 68.66 | 86.76 | 48.23 | 67.78 | 56.42 | 46.83 | 33.4 |
138
+ | jais-adapted-70b-chat | 37.52 | 52.65 | 57.63 | 41.47 | 56.53 | 8.58 | 47.92 | 54.75 | 73.33 | 65.74 | 56.82 | 39.15 |
139
+ | Qwen2.5-7B-Instruct | 40.49 | 57.38 | 67.18 | 50.59 | 28.43 | 14.93 | 54.27 | 50.65 | 78.17 | 61.54 | 56.1 | 41.42 |
140
+ | Qwen2.5-14B-Instruct | 78.33 | 84.93 | 81.92 | 71.81 | 91.9 | 56.9 | 82.87 | 57.54 | 75.04 | 69.36 | 63.8 | 51.7 |
141
+ | Qwen2.5-72B-Instruct | 64.81 | 81.6 | 80.35 | 67.19 | 64.46 | 25.75 | 63.41 | 60.71 | **79.92** | **74.1** | **73.59** | **59.54** |
142
+ | Mistral-7B-Instruct-v0.3 | **78.7** | 86.88 | **86.62** | **74.69** | **92.89** | 67.72 | 87.51 | 34.08 | 60.25 | 45.27 | 32.3 | 26.65 |
143
+ | Mistral-Small-Instruct-2409 | 53.52 | 72.76 | 70.65 | 61.27 | 33.39 | 16.79 | 54.68 | 38.73 | 68.93 | 50.43 | 39.63 | 28.82 |
144
+ | Mistral-Nemo-Instruct-2407 | 56.81 | 74.51 | 76.47 | 64.59 | 45.62 | 27.05 | 65.05 | 47.49 | 76.92 | 55.97 | 46.15 | 25.44 |
145
+ | falcon-mamba-7b-instruct | 64.12 | 66.38 | 78.46 | 64.63 | 71.74 | 28.17 | 65.19 | 28.49 | 63.52 | 39.27 | 28.45 | 29.69 |
146
+ | Llama-3.1-8B-Instruct | 48.65 | 62.95 | 68.68 | 57.53 | 26.61 | 17.16 | 54.27 | 54.0 | 70.54 | 56.53 | 44.67 | 30.76 |
147
+ | Llama-3.3-70B-Instruct | 45.47 | 46.22 | 63.92 | 54.31 | 25.29 | 13.99 | 52.97 | **65.74** | 76.93 | 72.01 | 70.25 | 44.12 |
148
+
149
+ Closed-model evaluations:
150
+
151
+ | Model | ETEC <br>0 shot | IEN-MCQ <br>0 shot | IEN-TF <br>0 shot | AraPro <br>0 shot | AraMath <br>5 shot | ARIFEval <br>(prompt strict) <br>0 shot | ARIFEval <br>(inst strict) <br>0 shot | ExamsAR <br>5 shot | ACVA <br>5 shot | ArabicMMLU <br>0 shot | OpenAI MMLU <br>0 shot | GAT <br>0 shot |
152
+ |:---------------------------------------|:--------------|:-----------------|:----------------|:----------------|:-----------------|:----------------------------------|:--------------------------------|:-----------------|:-----------------------|:--------------------|:---------------------|:----------------------|
153
+ | GPT-4o (API Generation) | 79.39 | **92.03** | 88.97 | 80.86 | 83.47 | 70.9 | 88.12 | 61.82 | 72.51 | 79.02 | **76.5** | 62.65 |
154
+ | Claude 3.5 Sonnet (API Generation, Oct) | **85.9** | 86.17 | **89.42** | **81.46** | 79.83 | 53.73 | 80.14 | **62.38** | **80.42** | 69.5 | 66.4 | **68.89** |
155
+ | Gemini 1.5 Pro | 83.31 | 88.28 | 85.44 | 76.22 | **94.88** | **74.81** | **90.17** | 58.1 | 75.17 | **82.0** | 64.8 | 59.14 |
156
 
157
  #### English Benchmarks
158
 
159
 
160
+ | Model | AGIEval 0 Shot | ARC (challenge) 0 Shot | GPQA (main) 0 Shot | Hendrycks <br>ethics 0 Shot | Winogrande 0 Shot | HellaSwag 0 Shot | TriviaQA 5 Shot | MMLU Pro <br>5 Shot | Minerva Math <br>4 Shot | MMLU 0 Shot | TruthfulQA <br>(mc2) 0 Shot | IFEval <br>(prompt strict) <br>0 Shot | IFEval <br>(inst strict) <br>0 Shot | GSM8k 5 Shot |
161
  |:----------------------------------|-----------------:|-----------------------:|--------------------------:|--------------------------:|--------------------:|-------------------:|------------------:|------------------:|----------------------:|--------------:|------------------------:|----------------------------------:|--------------------------------:|---------------:|
162
  | Qwen2.5-72B-Instruct | 71.09 | 63.48 | 25.67 | 78.33 | 76.24 | 87.41 | 70.9 | 62.77 | 54.04 | 83.44 | 69.54 | 67.47 | 76.86 | 93.25 |
163
  | Llama-3.1-70B-Instruct | 52.6 | 63.05 | 27.01 | 80.28 | 79.08 | 84.67 | 82.09 | 59 | 49.18 | 82.36 | 59.92 | 70.98 | 79.74 | 88.4 |