Upload 3 files
#2 opened by pvn251
- README.md +7 -7
- adapter_config.json +1 -3
- adapter_model.safetensors +1 -1
README.md
CHANGED
@@ -107,7 +107,7 @@ We evaluated the model against baselines on binary answerability classification
 | | Precision | Recall | F1 | Precision | Recall | F1 | | |
 | BigBird (pre-trained embeddings) w/ MLP | 49.2 | 68.5 | 57.3 | 48 | 29.2 | 36.3 | 48.9 | 46.8 |
 | llama2-7b as classifier (Full SFT) | 72.2 | 71 | 71.6 | 71.4 | 72.6 | 72 | 71.8 | 71.8 |
-
+| Granite 3.2-8b LoRA | 87.9 | 69 | 77.3 | 74.4 | 90.5 | 81.7 | 79.7 | 79.5 |
 
 
 - Multi-turn Setting (MT-RAG Benchmark): In this setting, the model is given the full multi-turn conversation history along with the supporting documents. This benchmark evaluates the model's ability to assess answerability when the final user query can also depend on prior turns for context.
@@ -118,7 +118,7 @@ We evaluated the model against baselines on binary answerability classification
 | | Precision | Recall | F1 | Precision | Recall | F1 | | |
 | BigBird (pre-trained embeddings) w/ MLP | 69.6 | 77.6 | 73.4 | 70.1 | 60.8 | 65.2 | 69.8 | 69.6 |
 | llama2-7b as classifier (Full SFT) | 86.9 | 89.4 | 88.2 | 87.3 | 84.5 | 85.9 | 87.1 | 87.1 |
-
+| Granite 3.2-8b LoRA | 89.1 | 93.7 | 91.3 | 92.3 | 86.7 | 89.4 | 90.5 | 90.5 |
 
 
 ### Comparing LoRA Adapter vs. Vanilla Granite for Answer Quality
@@ -138,13 +138,13 @@ We compare the performance of Granite 3.2-8b Instruct vs. Granite 3.2-8b LoRA ad
 This score rewards the model for correctly abstaining on unanswerable queries (full credit) and for providing faithful answers on answerable queries (partial credit based on RAGAS Faithfulness). No credit is given for incorrect or unfaithful predictions.
 
 
-The LoRA adapter achieves a
+The LoRA adapter achieves a 17\% lift on this metric - rewarding the model for correctly abstaining on unanswerable queries and for being faithful when it chooses to answer.
 
-| | F1 Score Unanswerable | F1 Score Answerable | Recall Unanswerable | Recall Answerable | Ragas Faithfulness (on Truly Answerable) | Joint Answerability-Faithfulness Score |
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| Granite 3.2-8b Instruct | 14 | 76 | 8 | 97 | 75 | 50 |
-| Granite 3.2-8b LoRA | 47 | 77 | 37 | 88 | 70 | 57 |
+| | F1 Score Unanswerable | F1 Score Answerable | Recall Unanswerable | Recall Answerable | Joint Answerability-Faithfulness Score |
+|:---:|:---:|:---:|:---:|:---:|:---:|
+| Granite 3.2-8b Instruct | 14 | 76 | 8 | 97 | 50 |
+| Granite 3.2-8b LoRA | 68 | 82 | 64 | 85 | 67 |
 
 ## Model Card Authors
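The tables in these hunks report per-class precision/recall/F1 for the answerable and unanswerable classes, plus two unlabeled aggregate columns. A minimal sketch of how such numbers are typically computed with scikit-learn; the label lists are illustrative placeholders rather than the benchmark data, and treating the aggregate columns as weighted averages is an assumption:

```python
# Sketch only: per-class and weighted metrics for binary answerability
# classification. y_true / y_pred are made-up placeholders.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["answerable", "unanswerable", "answerable", "unanswerable"]
y_pred = ["answerable", "answerable", "answerable", "unanswerable"]

# Per-class precision/recall/F1, one entry per label
p, r, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=["answerable", "unanswerable"], zero_division=0
)

# Aggregate columns (assumed here to be weighted averages)
wp, wr, wf1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(p, r, f1, wp, wr, wf1)
```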
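The new 17% figure refers to the Joint Answerability-Faithfulness Score described in the last hunk: full credit for correctly abstaining on unanswerable queries, RAGAS faithfulness as partial credit on answered answerable queries, and zero otherwise. A short sketch of one plausible per-example reading of that rule; the function name and the mean aggregation are assumptions, not the authors' code:

```python
# Sketch of the joint answerability-faithfulness score as described in
# the README text above. One plausible reading, not the authors' code.
def joint_score(is_answerable: bool, answered: bool, faithfulness: float) -> float:
    if not is_answerable:                    # unanswerable query
        return 1.0 if not answered else 0.0  # full credit for abstaining
    if answered:                             # answerable and answered
        return faithfulness                  # partial credit in [0, 1] (RAGAS)
    return 0.0                               # answerable but abstained: no credit

# Corpus-level score: mean per-example credit (aggregation is an assumption)
examples = [(False, False, 0.0), (True, True, 0.7), (True, False, 0.0)]
print(sum(joint_score(*e) for e in examples) / len(examples))
```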
adapter_config.json
CHANGED
@@ -29,7 +29,5 @@
 ],
 "task_type": "CAUSAL_LM",
 "use_dora": false,
-"use_rslora": false,
-
-"model_type": "granite"
+"use_rslora": false
 }
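For context on this change: adapter_config.json is a standard PEFT LoraConfig file, and "model_type" is not one of its fields, which is presumably why it was dropped. A minimal loading sketch; the adapter repo id is a placeholder, and the base checkpoint is assumed from the Granite 3.2-8b Instruct model named in the README:

```python
# Sketch: loading a LoRA adapter whose adapter_config.json matches the
# diff above (task_type=CAUSAL_LM, use_dora=false, use_rslora=false).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "ibm-granite/granite-3.2-8b-instruct"  # assumed base checkpoint
ADAPTER = "your-org/your-answerability-lora"  # placeholder adapter repo id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)  # reads adapter_config.json
```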
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b177dadd8b111877d385b1d48722dc61213cb18620e41cc2ddd66ec8df64243c
 size 94404160
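The safetensors change only swaps the Git LFS pointer, which records the blob's sha256 digest and byte size rather than the weights themselves. A quick sketch for confirming a local download matches the new pointer; the file path is assumed:

```python
# Sketch: verify a downloaded adapter_model.safetensors against the
# sha256 recorded in the Git LFS pointer above.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

expected = "b177dadd8b111877d385b1d48722dc61213cb18620e41cc2ddd66ec8df64243c"
assert sha256_of("adapter_model.safetensors") == expected  # pointer size: 94404160 bytes
```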