Upload 3 files
#2 opened by pvn251
- README.md +7 -7
- adapter_config.json +1 -3
- adapter_model.safetensors +1 -1
README.md
CHANGED
@@ -107,7 +107,7 @@ We evaluated the model against baselines on binary answerability classification
 | | Precision | Recall | F1 | Precision | Recall | F1 | | |
 | BigBird (pre-trained embeddings) w/ MLP | 49.2 | 68.5 | 57.3 | 48 | 29.2 | 36.3 | 48.9 | 46.8 |
 | llama2-7b as classifier (Full SFT) | 72.2 | 71 | 71.6 | 71.4 | 72.6 | 72 | 71.8 | 71.8 |
-
+| Granite 3.2-8b LoRA | 87.9 | 69 | 77.3 | 74.4 | 90.5 | 81.7 | 79.7 | 79.5 |
 
 
 - Multi-turn Setting (MT-RAG Benchmark): In this setting, the model is given the full multi-turn conversation history along with the supporting documents. This benchmark evaluates the model's ability to assess answerability when the final user query can also depend on prior turns for context.
@@ -118,7 +118,7 @@ We evaluated the model against baselines on binary answerability classification
 | | Precision | Recall | F1 | Precision | Recall | F1 | | |
 | BigBird (pre-trained embeddings) w/ MLP | 69.6 | 77.6 | 73.4 | 70.1 | 60.8 | 65.2 | 69.8 | 69.6 |
 | llama2-7b as classifier (Full SFT) | 86.9 | 89.4 | 88.2 | 87.3 | 84.5 | 85.9 | 87.1 | 87.1 |
-
+| Granite 3.2-8b LoRA | 89.1 | 93.7 | 91.3 | 92.3 | 86.7 | 89.4 | 90.5 | 90.5 |
 
 
 ### Comparing LoRA Adapter vs. Vanilla Granite for Answer Quality
@@ -138,13 +138,13 @@ We compare the performance of Granite 3.2-8b Instruct vs. Granite 3.2-8b LoRA ad
 This score rewards the model for correctly abstaining on unanswerable queries (full credit) and for providing faithful answers on answerable queries (partial credit based on RAGAS Faithfulness). No credit is given for incorrect or unfaithful predictions.
 
 
-The LoRA adapter achieves a
+The LoRA adapter achieves a 17\% lift on this metric - rewarding the model for correctly abstaining on unanswerable queries and for being faithful when it chooses to answer.
 
-| | F1 Score Unanswerable | F1 Score Answerable | Recall Unanswerable | Recall Answerable | Ragas Faithfulness (on Truly Answerable) | Joint Answerability-Faithfulness Score |
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| Granite 3.2-8b Instruct | 14 | 76 | 8 | 97 | 75 | 50 |
-| Granite 3.2-8b LoRA | 47 | 77 | 37 | 88 | 70 | 57 |
+| | F1 Score Unanswerable | F1 Score Answerable | Recall Unanswerable | Recall Answerable | Joint Answerability-Faithfulness Score |
+|:---:|:---:|:---:|:---:|:---:|:---:|
+| Granite 3.2-8b Instruct | 14 | 76 | 8 | 97 | 50 |
+| Granite 3.2-8b LoRA | 68 | 82 | 64 | 85 | 67 |
 
 ## Model Card Authors
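The tables in these hunks report per-class precision/recall/F1 for the answerable and unanswerable classes, plus two unlabeled aggregate columns. A minimal sketch of how such numbers are typically computed with scikit-learn; the label lists are illustrative placeholders rather than the benchmark data, and treating the aggregate columns as weighted averages is an assumption:

```python
# Sketch only: per-class and weighted metrics for binary answerability
# classification. y_true / y_pred are made-up placeholders.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["answerable", "unanswerable", "answerable", "unanswerable"]
y_pred = ["answerable", "answerable", "answerable", "unanswerable"]

# Per-class precision/recall/F1, one entry per label
p, r, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=["answerable", "unanswerable"], zero_division=0
)

# Aggregate columns (assumed here to be weighted averages)
wp, wr, wf1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(p, r, f1, wp, wr, wf1)
```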
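The new 17% figure refers to the Joint Answerability-Faithfulness Score described in the last hunk: full credit for correctly abstaining on unanswerable queries, RAGAS faithfulness as partial credit on answered answerable queries, and zero otherwise. A short sketch of one plausible per-example reading of that rule; the function name and the mean aggregation are assumptions, not the authors' code:

```python
# Sketch of the joint answerability-faithfulness score as described in
# the README text above. One plausible reading, not the authors' code.
def joint_score(is_answerable: bool, answered: bool, faithfulness: float) -> float:
    if not is_answerable:                    # unanswerable query
        return 1.0 if not answered else 0.0  # full credit for abstaining
    if answered:                             # answerable and answered
        return faithfulness                  # partial credit in [0, 1] (RAGAS)
    return 0.0                               # answerable but abstained: no credit

# Corpus-level score: mean per-example credit (aggregation is an assumption)
examples = [(False, False, 0.0), (True, True, 0.7), (True, False, 0.0)]
print(sum(joint_score(*e) for e in examples) / len(examples))
```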
adapter_config.json
CHANGED
@@ -29,7 +29,5 @@
 ],
 "task_type": "CAUSAL_LM",
 "use_dora": false,
-"use_rslora": false,
-
-"model_type": "granite"
+"use_rslora": false
 }
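For context on this change: adapter_config.json is a standard PEFT LoraConfig file, and "model_type" is not one of its fields, which is presumably why it was dropped. A minimal loading sketch; the adapter repo id is a placeholder, and the base checkpoint is assumed from the Granite 3.2-8b Instruct model named in the README:

```python
# Sketch: loading a LoRA adapter whose adapter_config.json matches the
# diff above (task_type=CAUSAL_LM, use_dora=false, use_rslora=false).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "ibm-granite/granite-3.2-8b-instruct"  # assumed base checkpoint
ADAPTER = "your-org/your-answerability-lora"  # placeholder adapter repo id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)  # reads adapter_config.json
```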
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b177dadd8b111877d385b1d48722dc61213cb18620e41cc2ddd66ec8df64243c
 size 94404160
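The safetensors change only swaps the Git LFS pointer, which records the blob's sha256 digest and byte size rather than the weights themselves. A quick sketch for confirming a local download matches the new pointer; the file path is assumed:

```python
# Sketch: verify a downloaded adapter_model.safetensors against the
# sha256 recorded in the Git LFS pointer above.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

expected = "b177dadd8b111877d385b1d48722dc61213cb18620e41cc2ddd66ec8df64243c"
assert sha256_of("adapter_model.safetensors") == expected  # pointer size: 94404160 bytes
```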