Alex committed
Commit f2ae64e · verified · 1 Parent(s): 6c03a2b

Update README.md

Files changed (1)
  1. README.md +738 -188
README.md CHANGED
@@ -1,199 +1,749 @@
1
  ---
2
  library_name: transformers
3
- tags: []
 
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
  <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
-
12
- ## Model Details
13
-
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
 
70
  ## How to Get Started with the Model
71
 
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
 
102
 
103
  ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
 
1
  ---
2
  library_name: transformers
3
+ datasets:
4
+ - nvidia/Aegis-AI-Content-Safety-Dataset-1.0
5
  ---
6
 
7
+ # Model Card for AC/Meta-Llama-Guard-2-8B_Nvidia-Aegis-AI-Safety
8
 
9
  <!-- Provide a quick summary of what the model is/does. -->
10
+ A meta-llama/Meta-Llama-Guard-2-8B model fine-tuned on the nvidia/Aegis-AI-Content-Safety-Dataset-1.0 dataset; the training set contains 3099 examples.
11
+
12
+ This is a multi-label text classifier with the following 14 categories (a short output-decoding sketch follows the list):
13
+ - "0": "Controlled/Regulated Substances"
14
+ - "1": "Criminal Planning/Confessions"
15
+ - "2": "Deception/Fraud"
16
+ - "3": "Guns and Illegal Weapons"
17
+ - "4": "Harassment"
18
+ - "5": "Hate/Identity Hate"
19
+ - "6": "Needs Caution"
20
+ - "7": "PII/Privacy"
21
+ - "8": "Profanity"
22
+ - "9": "Sexual"
23
+ - "10": "Sexual (minor)"
24
+ - "11": "Suicide and Self Harm"
25
+ - "12": "Threat"
26
+ - "13": "Violence"
 
27
 
28
  ## How to Get Started with the Model
29
 
30
+ ```py
31
+ from accelerate import Accelerator
32
+ from datasets import load_dataset, Dataset, DatasetDict
33
+ from datetime import datetime
34
+ from transformers import AutoModelForSequenceClassification, AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, EvalPrediction, DataCollatorWithPadding, Pipeline, pipeline, BitsAndBytesConfig
35
+ from transformers.pipelines import PIPELINE_REGISTRY, TextClassificationPipeline
36
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, PeftModel, AutoPeftModelForCausalLM
37
+
38
+ import numpy as np
39
+ import torch
40
+ import os
41
+ import pandas as pd
42
+ import evaluate
44
+
45
+ accelerator = Accelerator()
46
+ device = accelerator.device
47
+
48
+ BASE_MODEL_PATH = "meta-llama/Meta-Llama-Guard-2-8B"
49
+ MODEL_PEFT = "AC/Meta-Llama-Guard-2-8B_Nvidia-Aegis-AI-Safety"  # Hub repo id of this fine-tuned LoRA adapter
50
+
51
+ def load_model(model_path, quantize = True, peft_adapter_path=None):
52
+ if quantize:
53
+ nf4_config = BitsAndBytesConfig(
54
+ load_in_4bit=True,
55
+ bnb_4bit_quant_type="nf4",
56
+ bnb_4bit_use_double_quant=True,
57
+ bnb_4bit_compute_dtype=torch.bfloat16
58
+ )
59
+ model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=nf4_config, trust_remote_code=True)
60
+ else:
61
+ model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
62
+
63
+ # Load tokenizer and model from the local folder
64
+ tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")
65
+
66
+
67
+ # NOTE: base_model is modified when the PeftModel is created from it
68
+ # Hence, if we want to access the base model later, we can't rely on the "base_model" variable; we can simply re-load it from scratch.
69
+ if peft_adapter_path:
70
+ print(f"Attaching PEFT Adapters from folder {peft_adapter_path}...")
71
+ model = PeftModel.from_pretrained(
72
+ model = model, # The model to be adapted. This model should be initialized with from_pretrained
73
+ model_id = peft_adapter_path, # Directory containing the PEFT configuration file
74
+ is_trainable = False, # Adapter is frozen and will only be used for inference
75
+ )
76
+ # Merging the adapter weights into the base weights should make inference more efficient, but in practice the merged model did not perform as well, so this is left commented out.
77
+ # model.merge_and_unload()
78
+
79
+ if tokenizer.pad_token_id is None:
80
+ # Required for batching example
81
+ tokenizer.pad_token_id = tokenizer.eos_token_id
82
+
83
+ return model, tokenizer
84
+
85
+
86
+ # This is the original labels from llamaguard
87
+ updated_llamaguard_id2label = {
88
+ "S1": "Controlled/Regulated Substances",
89
+ "S2": "Criminal Planning/Confessions",
90
+ "S3": "Deception/Fraud",
91
+ "S4": "Guns and Illegal Weapons",
92
+ "S5": "Harassment",
93
+ "S6": "Hate/Identity Hate",
94
+ "S7": "Needs Caution",
95
+ "S8": "PII/Privacy",
96
+ "S9": "Profanity",
97
+ "S10": "Sexual",
98
+ "S11": "Sexual (minor)",
99
+ "S12": "Suicide and Self Harm",
100
+ "S13": "Threat",
101
+ "S14": "Violence"
102
+ }
103
+
104
+ # Returns a single formatted Llama Guard prompt that checks the User message against the safety policy
105
+ def format_to_qa(user_prompt):
106
+ system_input = '''
107
+ <|begin_of_text|>[INST] Task: Check if there is unsafe content in User messages in conversations according to our safety policy with the below categories.
108
+
109
+ <BEGIN UNSAFE CONTENT CATEGORIES>
110
+ S1: Controlled/Regulated Substances
111
+ S2: Criminal Planning/Confessions
112
+ S3: Deception/Fraud
113
+ S4: Guns and Illegal Weapons
114
+ S5: Harassment
115
+ S6: Hate/Identity Hate
116
+ S7: Needs Caution
117
+ S8: PII/Privacy
118
+ S9: Profanity
119
+ S10: Sexual
120
+ S11: Sexual (minor)
121
+ S12: Suicide and Self Harm
122
+ S13: Threat
123
+ S14: Violence
124
+ <END UNSAFE CONTENT CATEGORIES>
125
+
126
+ <BEGIN CONVERSATION>
127
+
128
+ User: {user_prompt}
129
+
130
+ <END CONVERSATION>
131
+
132
+ Provide your safety assessment for ONLY THE LAST User in the above conversation:
133
+ - First line must read 'safe' or 'unsafe'.
134
+ - If unsafe, a second line must include a comma-separated list of violated
135
+ [/INST]
136
+ '''.strip()
137
+
138
+ return system_input.format(user_prompt=user_prompt)
139
+
140
+ # Returns the text response
141
+ def llm(model, tokenizer, prompts, max_new_tokens=10):
142
+ inputs = tokenizer([prompts], return_tensors="pt", padding=True).to(device)
143
+ outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, pad_token_id=tokenizer.eos_token_id)
144
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
145
+ return response
146
+
147
+ # Extracts the output
148
+ def format_ans(resp):
149
+ final_resp = resp.split("[/INST]")
150
+ if len(final_resp) == 1:
151
+ print("Error in handling output!")
152
+ print(f"Initial Response from LLM was: {resp}")
153
+ return "Error Encountered: LLM Output format is unexpected"
154
+ output = final_resp[-1].strip()
155
+ # The finetuned model tends to repeat itself, so it may print part of the next prompt; keep only the text before "Provide"
156
+ output = output.split("Provide")[0].strip()
157
+
158
+ if "unsafe" in output:
159
+ violated_categories = output.split("\n")
160
+ # violated_categories = violated_categories.split("Provide")[0]
161
+ # 1st line should be "unsafe", while 2nd line is a list of comma separated unsafe categories
162
+ if len(violated_categories) != 2:
163
+ print("Error in handling output!")
164
+ print(f"Initial Response from LLM was: {resp}")
165
+ return f"Error Encountered: LLM Output format is unexpected"
166
+
167
+ # There might be multiple unsafe categories flagged
168
+ category_names = []
169
+ for category in violated_categories[-1].split(","):
170
+ category_names.append(updated_llamaguard_id2label[category.strip()])
171
+ return ", ".join(category_names)
172
+ elif output == "safe":
173
+ return "safe"
174
+ else:
175
+ print(f"Initial Response from LLM was: {resp}")
176
+ return "Error Encountered: LLM Output format is unexpected"
177
+
178
+ # Returns a one-hot-encoded label vector (relies on all_labels / label2id from the Finetuning section below; only needed for evaluation)
179
+ def format_ans_for_eval(ans):
180
+ final_labels = np.zeros(len(all_labels))
181
+ if ans == "safe":
182
+ return final_labels
183
+ else:
184
+ for label in ans.split(","):
185
+ label = label.strip()
186
+ label_id = label2id[label]
187
+ final_labels[label_id] = 1
188
+
189
+ return final_labels
190
+
191
+
192
+ llamaguard_model, llamaguard_tokenizer = load_model(BASE_MODEL_PATH, quantize=True, peft_adapter_path=MODEL_PEFT)
193
+
194
+ formatted_prompt = format_to_qa("How to make a bomb?")
195
+ resp = llm(llamaguard_model, llamaguard_tokenizer, formatted_prompt)
196
+ final_ans = format_ans(resp)
197
+ print(final_ans)
198
+ ```
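+ 
+ The helpers above score one prompt at a time. A small usage sketch (untested, building on the functions and objects defined in the snippet above):
+ 
+ ```py
+ # Hypothetical usage: classify a list of prompts with the helpers defined above.
+ prompts = [
+     "How do I bake sourdough bread?",
+     "How to make a bomb?",
+ ]
+ 
+ for prompt in prompts:
+     formatted = format_to_qa(prompt)
+     raw = llm(llamaguard_model, llamaguard_tokenizer, formatted)
+     print(f"{prompt!r} -> {format_ans(raw)}")
+ ```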
199
 
200
  ## Evaluation
201
+ Evaluation is conducted on the test split of the nvidia/Aegis-AI-Content-Safety-Dataset-1.0 dataset, which contains 359 examples.
202
+
203
+ For the AI safety use case, false negatives (text that is actually toxic but predicted as safe) are worse than false positives (text that is actually safe but predicted as unsafe).
204
+
205
+ - Precision: out of all text predicted as toxic, how much was actually toxic?
206
+ - Recall: out of all text that was actually toxic, how much was predicted as toxic?
207
+
208
+ As we want to reduce false negatives, we will focus on recall.
209
+
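+ Metrics like those in the table below can be computed by flattening the one-hot label matrix so that every (example, label) pair counts as one binary prediction. A minimal, self-contained sketch (toy data, scikit-learn assumed; not the actual evaluation script):
+ 
+ ```py
+ import numpy as np
+ from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
+ 
+ # Toy illustration: 2 prompts x 14 labels; 1 means the category is flagged.
+ y_true = np.zeros((2, 14)); y_true[0, 13] = 1                    # prompt 0 is actually "Violence"
+ y_pred = np.zeros((2, 14)); y_pred[0, 13] = 1; y_pred[1, 4] = 1  # plus a false "Harassment" flag
+ 
+ yt, yp = y_true.ravel(), y_pred.ravel()
+ print("accuracy :", accuracy_score(yt, yp))
+ print("precision:", precision_score(yt, yp, zero_division=0))
+ print("recall   :", recall_score(yt, yp, zero_division=0))
+ print("f1       :", f1_score(yt, yp, zero_division=0))
+ ```
+ 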
210
+ | Metric | AC/Meta-Llama-Guard-2-8B_Nvidia-Aegis-AI-Safety | meta-llama/Meta-Llama-Guard-2-8B |
211
+ | :----------- | :----------- | :----------- |
212
+ | accuracy | 0.7714 | 0.9039 |
213
+ | f1 | 0.1740 | 0.2823 |
214
+ | precision | 0.1123 | 0.2646 |
215
+ | recall | 0.3854 | 0.3025 |
216
+ | TP | 121 | 95 |
217
+ | TN | 3756 | 4448 |
218
+ | FP | 956 | 264 |
219
+ | FN | 193 | 219 |
220
+
221
+
222
+ ## Finetuning
223
+ ```py
224
+ import os
225
+ import time
226
+ import torch
227
+ import gc
228
+
229
+ from accelerate import Accelerator
230
+ import bitsandbytes as bnb
231
+ from datasets import load_dataset, DatasetDict, Dataset
232
+ from datetime import datetime
233
+ from functools import partial
234
+ from huggingface_hub import snapshot_download
235
+ from transformers import (
236
+ AutoModelForCausalLM,
237
+ AutoTokenizer,
238
+ BitsAndBytesConfig,
239
+ HfArgumentParser,
240
+ Trainer,
241
+ TrainingArguments,
242
+ DataCollatorForLanguageModeling,
243
+ EarlyStoppingCallback,
244
+ pipeline,
245
+ logging,
246
+ set_seed,
247
+ )
248
+ from random import randrange
249
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, PeftModel, AutoPeftModelForCausalLM
250
+ from trl import SFTTrainer
251
+
252
+ import numpy as np
+ import pandas as pd
253
+ import json
254
+
255
+
256
+ ################################################################################
257
+ # QLoRA parameters
258
+ ################################################################################
259
+ lora_r = 8 # Higher rank gives better performance, but more compute needed during finetuning
260
+ lora_alpha = 64 # Scaling factor for the learned weights. Higher alpha assigns more weight to LoRA activations
261
+ lora_dropout = 0.1 # Dropout probability for LoRA layers
262
+ bias = "none" # Specify whether the corresponding biases will be updated during training
263
+ task_type = "CAUSAL_LM" # Task type
264
+
265
+ ################################################################################
266
+ # TrainingArguments parameters
267
+ ################################################################################
268
+ batch_size = 3 # Batch size per GPU for training
269
+ max_steps = 1500 # Number of steps to train. A step is one gradient update (based on batch size), while an epoch consists of one full cycle through the training data, which is usually many steps
270
+ output_dir = f'./lora/safety-{datetime.now().strftime("%d-%m-%Y_%H-%M")}' # Output directory where the model predictions and checkpoints will be stored
271
+
272
+
273
+
274
+ all_labels = [
275
+ 'Controlled/Regulated Substances',
276
+ 'Criminal Planning/Confessions',
277
+ 'Deception/Fraud',
278
+ 'Guns and Illegal Weapons',
279
+ 'Harassment',
280
+ 'Hate/Identity Hate',
281
+ 'Needs Caution',
282
+ 'PII/Privacy',
283
+ 'Profanity',
284
+ 'Sexual',
285
+ 'Sexual (minor)',
286
+ 'Suicide and Self Harm',
287
+ 'Threat',
288
+ 'Violence'
289
+ ]
290
+
291
+ id2label = {idx:label for idx, label in enumerate(all_labels)}
292
+ label2id = {label:idx for idx, label in enumerate(all_labels)}
293
+
294
+ # This is the mappings mapped to Llamaguard2's format (S{id})
295
+ llamaguard_id2label = {
296
+ "S1": "Controlled/Regulated Substances",
297
+ "S2": "Criminal Planning/Confessions",
298
+ "S3": "Deception/Fraud",
299
+ "S4": "Guns and Illegal Weapons",
300
+ "S5": "Harassment",
301
+ "S6": "Hate/Identity Hate",
302
+ "S7": "Needs Caution",
303
+ "S8": "PII/Privacy",
304
+ "S9": "Profanity",
305
+ "S10": "Sexual",
306
+ "S11": "Sexual (minor)",
307
+ "S12": "Suicide and Self Harm",
308
+ "S13": "Threat",
309
+ "S14": "Violence"
310
+ }
311
+
312
+ llamaguard_label2id = {
313
+ 'Controlled/Regulated Substances': 'S1',
314
+ 'Criminal Planning/Confessions': 'S2',
315
+ 'Deception/Fraud': 'S3',
316
+ 'Guns and Illegal Weapons': 'S4',
317
+ 'Harassment': 'S5',
318
+ 'Hate/Identity Hate': 'S6',
319
+ 'Needs Caution': 'S7',
320
+ 'PII/Privacy': 'S8',
321
+ 'Profanity': 'S9',
322
+ 'Sexual': 'S10',
323
+ 'Sexual (minor)': 'S11',
324
+ 'Suicide and Self Harm': 'S12',
325
+ 'Threat': 'S13',
326
+ 'Violence': 'S14'
327
+ }
328
+
329
+
330
+
331
+
332
+ accelerator = Accelerator()
333
+ device = accelerator.device
334
+
335
+ print(f"Using device: {repr(device)}")
336
+
337
+ BASE_MODEL_PATH = "meta-llama/Meta-Llama-Guard-2-8B"
338
+
339
+ def load_model(model_path, peft_adapter_path=None):
340
+ nf4_config = BitsAndBytesConfig(
341
+ load_in_4bit=True,
342
+ bnb_4bit_quant_type="nf4",
343
+ bnb_4bit_use_double_quant=True,
344
+ bnb_4bit_compute_dtype=torch.bfloat16
345
+ )
346
+
347
+ # Load tokenizer and model from the local folder
348
+ tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")
349
+
350
+ model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=nf4_config, trust_remote_code=True)
351
+
352
+ # NOTE: base_model is modified when the PeftModel is created from it
353
+ # Hence, if we want to access the base model later, we can't rely on the "base_model" variable; we can simply re-load it from scratch.
354
+ if peft_adapter_path:
355
+ print("Attaching PEFT Adapters...")
356
+ model = PeftModel.from_pretrained(
357
+ model = model, # The model to be adapted. This model should be initialized with from_pretrained
358
+ model_id = peft_adapter_path, # Directory containing the PEFT configuration file
359
+ is_trainable = False, # Adapter is frozen and will only be used for inference
360
+ )
361
+ # Merging the adapter weights into the base weights should make inference more efficient, but in practice the merged model did not perform as well, so this is left commented out.
362
+ # model.merge_and_unload()
363
+
364
+ if tokenizer.pad_token_id is None:
365
+ # Required for batching example
366
+ tokenizer.pad_token_id = tokenizer.eos_token_id
367
+
368
+ return model, tokenizer
369
+
370
+ # base_model, tokenizer = load_model(BASE_MODEL_PATH)
371
+
372
+
373
+ def delete_var(var_name):
374
+ if var_name in globals():
375
+ del globals()[var_name]
376
+
377
+
378
+ def free_memory():
379
+ for i in ["base_model", "peft_model", "tokenizer"]:
380
+ delete_var(i)
381
+
382
+ gc.collect()
383
+ torch.cuda.empty_cache()
384
+
385
+ # Returns a single formatted Llama Guard prompt that checks the User message against the safety policy.
386
+ # The optional `finetuning` argument is used when preparing training data; it is a dictionary of {"output": val, "eos_token": val} whose output is appended to the prompt.
387
+ def format_to_qa(user_prompt, finetuning=None):
388
+ system_input = '''
389
+ <|begin_of_text|>[INST] Task: Check if there is unsafe content in User messages in conversations according to our safety policy with the below categories.
390
+
391
+ <BEGIN UNSAFE CONTENT CATEGORIES>
392
+ S1: Controlled/Regulated Substances
393
+ S2: Criminal Planning/Confessions
394
+ S3: Deception/Fraud
395
+ S4: Guns and Illegal Weapons
396
+ S5: Harassment
397
+ S6: Hate/Identity Hate
398
+ S7: Needs Caution
399
+ S8: PII/Privacy
400
+ S9: Profanity
401
+ S10: Sexual
402
+ S11: Sexual (minor)
403
+ S12: Suicide and Self Harm
404
+ S13: Threat
405
+ S14: Violence
406
+ <END UNSAFE CONTENT CATEGORIES>
407
+
408
+ <BEGIN CONVERSATION>
409
+
410
+ User: {user_prompt}
411
+
412
+ <END CONVERSATION>
413
+
414
+ Provide your safety assessment for ONLY THE LAST User in the above conversation:
415
+ - First line must read 'safe' or 'unsafe'.
416
+ - If unsafe, a second line must include a comma-separated list of violated
417
+ [/INST]
418
+ '''.strip()
419
+
420
+ formatted_template = system_input.format(user_prompt=user_prompt)
421
+ if finetuning:
422
+ return f"{formatted_template}{finetuning['output']}"
423
+ # return f"{formatted_template}{finetuning['output']}{finetuning['eos_token']}"
424
+ else:
425
+ return formatted_template
426
+
427
+ # Returns the text response
428
+ def llm(model, tokenizer, prompts):
429
+ inputs = tokenizer([prompts], return_tensors="pt", padding=True).to(device)
430
+ outputs = model.generate(**inputs, max_new_tokens=500, pad_token_id=tokenizer.eos_token_id)
431
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
432
+ return response
433
+
434
+ # Extracts the output
435
+ def format_ans(resp):
436
+ final_resp = resp.split("[/INST]")
437
+ if len(final_resp) == 1:
438
+ print("Error in handling output!")
439
+ print(f"Initial Response from LLM was: {resp}")
440
+ return "Error Encountered: LLM Output format is unexpected"
441
+ output = final_resp[-1].strip()
442
+
443
+ if "unsafe" in output:
444
+ violated_categories = output.split("\n")
445
+ # 1st line should be "unsafe", while 2nd line is a list of comma separated unsafe categories
446
+ if len(violated_categories) != 2:
447
+ print("Error in handling output!")
448
+ print(f"Initial Response from LLM was: {resp}")
449
+ return f"Error Encountered: LLM Output format is unexpected"
450
+
451
+ # There might be multiple unsafe categories flagged
452
+ category_names = []
453
+ for category in violated_categories[-1].split(","):
454
+ category_names.append(llamaguard_id2label[category.strip()])
455
+ return ", ".join(category_names)
456
+ elif output == "safe":
457
+ return "safe"
458
+ else:
459
+ print(f"Initial Response from LLM was: {resp}")
460
+ return "Error Encountered: LLM Output format is unexpected"
461
+
462
+ # Returns a one-hot-encoded list
463
+ def format_ans_for_eval(ans):
464
+ final_labels = np.zeros(len(all_labels))
465
+ if ans == "safe":
466
+ return final_labels
467
+ else:
468
+ for label in ans.split(","):
469
+ label = label.strip()
470
+ label_id = label2id[label]
471
+ final_labels[label_id] = 1
472
+
473
+ return final_labels
474
+
475
+
476
+
477
+ train_df = pd.read_csv("nvidia_train.csv")
478
+ test_df = pd.read_csv("nvidia_test.csv")
479
+
480
+ dataset = DatasetDict({
481
+ 'train': Dataset.from_pandas(train_df),
482
+ 'test': Dataset.from_pandas(test_df)}
483
+ )
484
+
485
+
486
+ base_model, tokenizer = load_model(BASE_MODEL_PATH)
487
+
488
+
489
+ # Used when we are formatting our prompt in create_prompt_formats
490
+ EOS_token = tokenizer.eos_token
491
+
492
+ # We want the label to be the label IDs, separated by commas. E.g. (S1, S2, S3)
493
+ def format_labels(examples):
494
+ final_label = []
495
+ for label in all_labels:
496
+ if examples[label] == True:
497
+ # We don't add the label name itself, but the label ID
498
+ final_label.append(llamaguard_label2id[label])
499
+ if len(final_label) == 0:
500
+ final_label = "safe"
501
+ else:
502
+ final_label = ", ".join(final_label)
503
+ final_label = f"unsafe\n{final_label}"
504
+ examples["final_label"] = final_label
505
+ return examples
506
+
507
+
508
+ def preprocess_text(examples, max_length):
509
+ # Populate the QA template
510
+ template = format_to_qa(examples["text"], finetuning={"output": examples["final_label"], "eos_token": EOS_token})
511
+ # Tokenize the QA template
512
+ examples["formatted"] = template
513
+ return tokenizer(template, truncation=True, max_length=max_length)
514
+
515
+ # Get the maximum length of our Model
516
+ def get_max_length(model):
517
+ """
518
+ Extracts maximum token length from the model configuration
519
+
520
+ :param model: Hugging Face model
521
+ """
522
+
523
+ conf = model.config
524
+ # Initialize a "max_length" variable to store maximum sequence length as null
525
+ max_length = None
526
+ # Find maximum sequence length in the model configuration and save it in "max_length" if found
527
+ for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
528
+ # Get the "length_setting" attribute from model.config. If there is no such attribute, set the value of max_length to None
529
+ max_length = getattr(model.config, length_setting, None)
530
+ if max_length:
531
+ print(f"Found max lenth: {max_length}")
532
+ break
533
+ # Set "max_length" to 1024 (default value) if maximum sequence length is not found in the model configuration
534
+ if not max_length:
535
+ max_length = 1024
536
+ print(f"Using default max length: {max_length}")
537
+ return max_length
538
+
539
+
540
+
541
+ max_length = get_max_length(base_model)
542
+
543
+ preprocessed_dataset = dataset.map(format_labels)
544
+
545
+ _preprocess_text = partial(preprocess_text, max_length=max_length)
546
+ preprocessed_dataset = preprocessed_dataset.map(_preprocess_text, remove_columns=all_labels)
547
+ preprocessed_dataset = preprocessed_dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
548
+
549
+
550
+
551
+ def find_all_linear_names(model):
552
+ """
553
+ Find modules to apply LoRA to.
554
+
555
+ :param model: PEFT model
556
+ """
557
+
558
+ cls = bnb.nn.Linear4bit
559
+ lora_module_names = set()
560
+ for name, module in model.named_modules():
561
+ if isinstance(module, cls):
562
+ names = name.split('.')
563
+ lora_module_names.add(names[0] if len(names) == 1 else names[-1])
564
+
565
+ if 'lm_head' in lora_module_names:
566
+ lora_module_names.remove('lm_head')
567
+ print(f"LoRA module names: {list(lora_module_names)}")
568
+ return list(lora_module_names)
569
+
570
+ def print_trainable_parameters(model, use_4bit = False):
571
+ """
572
+ Prints the number of trainable parameters in the model.
573
+
574
+ :param model: PEFT model
575
+ """
576
+
577
+ trainable_params = 0
578
+ all_param = 0
579
+
580
+ for _, param in model.named_parameters():
581
+ num_params = param.numel()
582
+ if num_params == 0 and hasattr(param, "ds_numel"):
583
+ num_params = param.ds_numel
584
+ all_param += num_params
585
+ if param.requires_grad:
586
+ trainable_params += num_params
587
+
588
+ if use_4bit:
589
+ trainable_params /= 2
590
+
591
+ print(
592
+ f"All Parameters: {all_param:,d} || Trainable Parameters: {trainable_params:,d} || Trainable Parameters %: {100 * trainable_params / all_param}"
593
+ )
594
+
595
+ def create_peft_config(r, lora_alpha, target_modules, lora_dropout, bias, task_type):
596
+ """
597
+ Creates Parameter-Efficient Fine-Tuning configuration for the model
598
+
599
+ :param r: LoRA attention dimension
600
+ :param lora_alpha: Alpha parameter for LoRA scaling
601
+ :param target_modules: Names of the modules to apply LoRA to
602
+ :param lora_dropout: Dropout Probability for LoRA layers
603
+ :param bias: Specifies if the bias parameters should be trained
604
+ """
605
+ config = LoraConfig(
606
+ r = r,
607
+ lora_alpha = lora_alpha,
608
+ target_modules = target_modules,
609
+ lora_dropout = lora_dropout,
610
+ bias = bias,
611
+ task_type = task_type,
612
+ )
613
+
614
+ return config
615
+
616
+ def fine_tune(model,
617
+ tokenizer,
618
+ dataset,
619
+ output_dir,
620
+ lora_r,
621
+ lora_alpha,
622
+ lora_dropout,
623
+ bias,
624
+ task_type,
625
+ batch_size,
626
+ max_steps):
627
+ """
628
+ Prepares and fine-tune the pre-trained model.
629
+
630
+ :param model: Pre-trained Hugging Face model
631
+ :param tokenizer: Model tokenizer
632
+ :param dataset: Preprocessed training dataset
633
+ """
634
+
635
+ target_modules = find_all_linear_names(model)
636
+
637
+ # Enable gradient checkpointing to reduce memory usage during fine-tuning
638
+ model.gradient_checkpointing_enable()
639
+
640
+ # Prepare the model for QLoRA training
641
+ model = prepare_model_for_kbit_training(model)
642
+
643
+ # Get LoRA module names
644
+ target_modules = find_all_linear_names(model)
645
+
646
+ # Create PEFT configuration
647
+ peft_config = create_peft_config(lora_r, lora_alpha, target_modules, lora_dropout, bias, task_type)
648
+
649
+ # Create a trainable PeftModel
650
+ peft_model = get_peft_model(model, peft_config)
651
+
652
+ # Print information about the percentage of trainable parameters
653
+ print_trainable_parameters(peft_model)
654
+
655
+ # Training parameters
656
+ training_args = TrainingArguments(
657
+ output_dir=output_dir,
658
+ logging_dir=f"{output_dir}/logs",
659
+ learning_rate=2e-5,
660
+ gradient_accumulation_steps=4,
661
+ per_device_train_batch_size=batch_size,
662
+ per_device_eval_batch_size=batch_size,
663
+ max_steps=max_steps,
664
+ weight_decay=0.01,
665
+ fp16=True,
666
+ evaluation_strategy="steps",
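+ # NOTE: fractional step values (the 0.1 used for eval/logging/save below) are
+ # interpreted by TrainingArguments as a ratio of the total number of training steps.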
667
+ eval_steps=0.1,
668
+ logging_strategy="steps",
669
+ logging_steps=0.1,
670
+ save_strategy="steps",
671
+ save_steps=0.1,
672
+ save_total_limit=2,
673
+ load_best_model_at_end=True,
674
+ )
675
+
676
+ trainer = Trainer(
677
+ model=peft_model,
678
+ args=training_args,
679
+ train_dataset=dataset["train"],
680
+ eval_dataset=dataset["test"],
681
+ tokenizer=tokenizer,
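+ # mlm=False -> causal-LM collation: the collator copies input_ids to labels (padding positions masked with -100)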
682
+ data_collator = DataCollatorForLanguageModeling(tokenizer, mlm = False)
683
+ )
684
+
685
+ peft_model.config.use_cache = False
686
+
687
+ # Launch training and log metrics
688
+ print("Training...")
689
+
690
+ train_result = trainer.train()
691
+ metrics = train_result.metrics
692
+ trainer.log_metrics("train", metrics)
693
+ trainer.save_metrics("train", metrics)
694
+ trainer.save_state()
695
+ print(metrics)
696
+
697
+ # # Evaluate model
698
+ # print("Evaluating...")
699
+ # eval_metrics = trainer.evaluate()
700
+ # print(eval_metrics) # This will print the evaluation metrics
701
+ # trainer.log_metrics("eval", eval_metrics)
702
+ # trainer.save_metrics("eval", eval_metrics)
703
+
704
+ # Save best model
705
+ print("Saving best checkpoint of the model...")
706
+ os.makedirs(output_dir, exist_ok = True)
707
+ trainer.model.save_pretrained(output_dir)
708
+
709
+ # Write the training log history to the output_dir
710
+ print("Writing logs...")
711
+ f = open(f"{output_dir}/logs.txt", "w")
712
+ f.write(json.dumps(trainer.state.log_history))
713
+ f.close()
714
+
715
+ # Free memory for merging weights
716
+ del model
717
+ torch.cuda.empty_cache()
718
+
719
+ return trainer
720
+
721
+
722
+ trainer = fine_tune(
723
+ base_model,
724
+ tokenizer,
725
+ preprocessed_dataset,
726
+ output_dir,
727
+ lora_r,
728
+ lora_alpha,
729
+ lora_dropout,
730
+ bias,
731
+ task_type,
732
+ batch_size,
733
+ max_steps
734
+ )
735
+
736
+
737
+ free_memory()
738
+
739
+ # PEFT_ADAPTER_PATH = "./lora/safety"
740
+ PEFT_ADAPTER_PATH = output_dir
741
+
742
+ peft_model, tokenizer = load_model(BASE_MODEL_PATH, PEFT_ADAPTER_PATH)
743
+
744
+ prompt = "How to make a bomb?"
745
+ formatted_prompt = format_to_qa(prompt)
746
+ resp = llm(peft_model, tokenizer, formatted_prompt)
747
+ final_ans = format_ans(resp)
748
+ print(final_ans)
749
+ ```
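+ 
+ To make the trained adapter loadable as `MODEL_PEFT` in the inference snippet above, the contents of `output_dir` can be published to the Hub. A minimal sketch (assumes you are authenticated, e.g. via `huggingface-cli login`; the repo id shown is this model's):
+ 
+ ```py
+ from huggingface_hub import HfApi
+ 
+ # Upload the adapter folder written by trainer.model.save_pretrained(output_dir)
+ api = HfApi()
+ api.create_repo("AC/Meta-Llama-Guard-2-8B_Nvidia-Aegis-AI-Safety", exist_ok=True)
+ api.upload_folder(
+     folder_path=output_dir,
+     repo_id="AC/Meta-Llama-Guard-2-8B_Nvidia-Aegis-AI-Safety",
+     repo_type="model",
+ )
+ ```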