Files changed (3)
  1. README.md +7 -7
  2. adapter_config.json +1 -3
  3. adapter_model.safetensors +1 -1
README.md CHANGED
@@ -107,7 +107,7 @@ We evaluated the model against baselines on binary answerability classification
  | | Precision | Recall | F1 | Precision | Recall | F1 | | |
  | BigBird (pre-trained embeddings) w/ MLP | 49.2 | 68.5 | 57.3 | 48 | 29.2 | 36.3 | 48.9 | 46.8 |
  | llama2-7b as classifier (Full SFT) | 72.2 | 71 | 71.6 | 71.4 | 72.6 | 72 | 71.8 | 71.8 |
- | Granite 3.2-8b LoRA | 84.2 | 68 | 75.2 | 73.1 | 87.2 | 79.5 | 77.6 | 77.4 |
+ | Granite 3.2-8b LoRA | 87.9 | 69 | 77.3 | 74.4 | 90.5 | 81.7 | 79.7 | 79.5 |


  - Multi-turn Setting (MT-RAG Benchmark): In this setting, the model is given the full multi-turn conversation history along with the supporting documents. This benchmark evaluates the model's ability to assess answerability when the final user query can also depend on prior turns for context.
@@ -118,7 +118,7 @@ We evaluated the model against baselines on binary answerability classification
  | | Precision | Recall | F1 | Precision | Recall | F1 | | |
  | BigBird (pre-trained embeddings) w/ MLP | 69.6 | 77.6 | 73.4 | 70.1 | 60.8 | 65.2 | 69.8 | 69.6 |
  | llama2-7b as classifier (Full SFT) | 86.9 | 89.4 | 88.2 | 87.3 | 84.5 | 85.9 | 87.1 | 87.1 |
- | Granite 3.2-8b LoRA | 85.4 | 89.3 | 87.3 | 87 | 82.4 | 84.6 | 86.1 | 86.1 |
+ | Granite 3.2-8b LoRA | 89.1 | 93.7 | 91.3 | 92.3 | 86.7 | 89.4 | 90.5 | 90.5 |


  ### Comparing LoRA Adapter vs. Vanilla Granite for Answer Quality
@@ -138,13 +138,13 @@ We compare the performance of Granite 3.2-8b Instruct vs. Granite 3.2-8b LoRA ad
  This score rewards the model for correctly abstaining on unanswerable queries (full credit) and for providing faithful answers on answerable queries (partial credit based on RAGAS Faithfulness). No credit is given for incorrect or unfaithful predictions.


- The LoRA adapter achieves a 7\% lift on this metric - rewarding the model for correctly abstaining on unanswerable queries and for being faithful when it chooses to answer.
+ The LoRA adapter achieves a 17\% lift on this metric - rewarding the model for correctly abstaining on unanswerable queries and for being faithful when it chooses to answer.

- | | F1 Score Unanswerable | F1 Score Answerable | Recall Unanswerable | Recall Answerable | Ragas Faithfulness (on Truly Answerable) | Joint Answerability-Faithfulness Score |
- |:--------------------------------:|:----------------------------------:|:--------------------------------:|:------------------------------:|:----------------------------:|:--------------------------------------------------------:|:------------------------------------------------------:|
- | Granite 3.2-8b Instruct | 14 | 76 | 8 | 97 | 75 | 50 |
- | Granite 3.2-8b LoRA | 47 | 77 | 37 | 88 | 70 | 57 |

+ | | F1 Score Unanswerable | F1 Score Answerable | Recall Unanswerable | Recall Answerable | Joint Answerability-Faithfulness Score |
+ |:--------------------------------:|:----------------------------------:|:--------------------------------:|:------------------------------:|:----------------------------:|:------------------------------------------------------:|
+ | Granite 3.2-8b Instruct | 14 | 76 | 8 | 97 | 50 |
+ | Granite 3.2-8b LoRA | 68 | 82 | 64 | 85 | 67 |

  ## Model Card Authors

 
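The Joint Answerability-Faithfulness Score referenced in the hunk above combines an abstention check with RAGAS Faithfulness. As a reading aid only, here is a minimal Python sketch of that scoring rule as the model card describes it; the function name, record format, and example values are assumptions for illustration, not the evaluation code behind the reported numbers.

```python
# Illustrative sketch of the scoring rule described in the model card text above.
# Assumed inputs: one record per query, with a gold answerability label, the model's
# abstain/answer decision, and a RAGAS faithfulness score in [0, 1] when it answered.

def joint_answerability_faithfulness(records):
    """records: iterable of dicts with keys 'answerable' (bool, gold label),
    'abstained' (bool, model abstained), 'faithfulness' (float, RAGAS score)."""
    credits = []
    for r in records:
        if not r["answerable"] and r["abstained"]:
            credits.append(1.0)                          # full credit: correct abstention
        elif r["answerable"] and not r["abstained"]:
            credits.append(r.get("faithfulness", 0.0))   # partial credit: faithful answer
        else:
            credits.append(0.0)                          # no credit: wrong or missed abstention
    return sum(credits) / len(credits)

# Example: correct abstention, faithful answer, missed abstention -> (1.0 + 0.8 + 0.0) / 3
print(joint_answerability_faithfulness([
    {"answerable": False, "abstained": True},
    {"answerable": True, "abstained": False, "faithfulness": 0.8},
    {"answerable": False, "abstained": False},
]))
```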
adapter_config.json CHANGED
@@ -29,7 +29,5 @@
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
- "use_rslora": false,
-
- "model_type": "granite"
+ "use_rslora": false
  }
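The adapter_config.json change above only drops the stray `model_type` key (and a trailing comma plus blank line) while keeping `use_rslora: false`, so loading behaviour is unchanged. For context, a minimal sketch of how a LoRA adapter with this config is typically attached to the base model via `peft`; the repo IDs below are placeholders, not taken from this PR.

```python
# Minimal loading sketch (assumption: repo IDs are placeholders; the actual base model
# and adapter repo names are not stated in this diff).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "ibm-granite/granite-3.2-8b-instruct"   # assumed base checkpoint
ADAPTER = "path/to/this-adapter-repo"                 # placeholder for this adapter repo

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")

# peft reads task_type, use_dora, and use_rslora from adapter_config.json,
# so the cleaned-up config above is picked up automatically.
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()
```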
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dddf86fe562228b248556ee9807b84656da6f66e1a13bb50b418d0e616fc3a23
+ oid sha256:b177dadd8b111877d385b1d48722dc61213cb18620e41cc2ddd66ec8df64243c
  size 94404160