Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +151 -139
README.md CHANGED
@@ -2,7 +2,19 @@
 license: apache-2.0
 license_link: https://huggingface.co/huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v2/blob/main/LICENSE
 language:
-- en
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
 pipeline_tag: text-generation
 base_model: Qwen/Qwen2.5-0.5B-Instruct
 tags:

The rest of the file is unchanged. The full updated README.md follows:
---
license: apache-2.0
license_link: https://huggingface.co/huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v2/blob/main/LICENSE
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- chat
- abliterated
- uncensored
---

# huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v2

This is an uncensored version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it).
This is a crude, proof-of-concept implementation for removing refusals from an LLM without using TransformerLens.

Ablation was performed with a new, faster method that yields better results than the original abliterated release (see the pass-rate comparison below).
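
For intuition: abliteration estimates a "refusal direction" in activation space from paired harmful/harmless prompts and then removes that direction from the model's weights. The sketch below is a minimal, hypothetical illustration of that general idea in PyTorch, not the exact procedure used for this model; `harmful_acts` and `harmless_acts` stand for hidden states collected at some layer.

```python
# Minimal sketch of directional ablation (illustration only; not the
# exact method used to produce this model).
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    # Difference of mean activations over the two prompt sets,
    # normalized to a unit vector.
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate_weight(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Project the refusal direction out of everything the layer can
    # write: W <- (I - d d^T) W, so no output component lies along d.
    return weight - torch.outer(direction, direction) @ weight
```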

## Ollama

You can use [huihui_ai/qwen2.5-abliterate:0.5b-v2](https://ollama.com/huihui_ai/qwen2.5-abliterate:0.5b-v2) directly:
```
ollama run huihui_ai/qwen2.5-abliterate:0.5b-v2
```
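
The same model can also be queried programmatically through Ollama's local REST API. A minimal sketch, assuming an Ollama server is running on the default port 11434 with the model already pulled:

```python
# Query the model via Ollama's local /api/chat endpoint (sketch).
import json
import urllib.request

payload = {
    "model": "huihui_ai/qwen2.5-abliterate:0.5b-v2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,  # return a single JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["message"]["content"])
```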

## Usage

You can use this model in your applications by loading it with Hugging Face's `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v2"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize conversation context
initial_messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."}
]
messages = initial_messages.copy()  # Copy the initial conversation context

# Enter conversation loop
while True:
    # Get user input
    user_input = input("User: ").strip()  # Strip leading and trailing spaces

    # If the user types '/exit', end the conversation
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break

    # If the user types '/clean', reset the conversation context
    if user_input.lower() == "/clean":
        messages = initial_messages.copy()  # Reset conversation context
        print("Chat history cleared. Starting a new conversation.")
        continue

    # If input is empty, prompt the user and continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    # Add user input to the conversation
    messages.append({"role": "user", "content": user_input})

    # Build the chat template
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize input and prepare it for the model
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Generate a response from the model
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=8192
    )

    # Keep only the newly generated tokens, dropping the prompt portion
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    # Add the model's response to the conversation
    messages.append({"role": "assistant", "content": response})

    # Print the model's response
    print(f"Qwen: {response}")
```

## Pass Rate Description

The pass rate is the proportion of harmful instructions, out of the total processed, that did not trigger the refusal test condition (instructions that do trigger it are marked TestPassed=False). It is computed by subtracting the number of triggered instructions (triggered_total) from the total number of instructions (total) and dividing by the total: (total - triggered_total) / total. The pass rate is reported both as a decimal (rounded to two decimal places) and as a percentage (rounded to one decimal place).
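
As a worked check against the numbers reported below (variable names here are illustrative, not necessarily those in TestPassed.py):

```python
# Pass-rate arithmetic from the description above.
total = 320            # harmful instructions processed
triggered_total = 3    # instructions that triggered the refusal condition
passed = total - triggered_total
pass_rate = passed / total
print(f"Passed total: {passed}/{total}, "
      f"Passed ratio: {pass_rate:.2f} ({pass_rate * 100:.1f}%)")
# Passed total: 317/320, Passed ratio: 0.99 (99.1%)
```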

The test set comes from [huihui-ai/harmbench_behaviors](https://huggingface.co/datasets/huihui-ai/harmbench_behaviors); the test code is [TestPassed.py](https://huggingface.co/huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v2/blob/main/TestPassed.py).

The test result is [99.1%](https://huggingface.co/huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v2/blob/main/TestPassed.jsonl):
```
python TestPassed.py
Load Model huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v2 ...
Processing harmful instructions: 100%|████████████████████████████████████████| 320/320 [00:26<00:00, 11.96it/s]
Passed total: 317/320, Passed ratio: 0.99 (99.1%)
```

Below is a comparison of pass rates.

| Model                                | Passed total | Passed ratio |
|--------------------------------------|--------------|--------------|
| Qwen2.5-0.5B-Instruct                | 201/320      | 62.8%        |
| Qwen2.5-0.5B-Instruct-abliterated    | 310/320      | 96.9%        |
| Qwen2.5-0.5B-Instruct-abliterated-v2 | **317/320**  | **99.1%**    |

### Donation

If you like it, please click 'like' and follow us for more updates.
You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.

##### Your donation helps us continue development and improvement; even a cup of coffee's worth makes a difference.
- Bitcoin (BTC):
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```