ValueFX9507
/

Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4

@@ -53,10 +53,31 @@ You See See You Fuck Good JOB!
   - Training in progress...
 ## Training Results
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650762d0eac45ee2e420a38b/N2aA5Pjj1uB6U76SJZb78.png)
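 ## MGRPO and Layer Propagation:
 - **Algorithm change**: The original GRPO performs preference learning only through an ORM strategy, which cannot judge the quality of generated literary content. Inspired by **Shanghai Jiao Tong University** PhDs **Wei XX** and **Hong XX**, this training run uses a logic-based algorithm to address the difficulty of designing an ORM strategy for literary text, and adds a second pass that returns a revised prompt to summarize errors. Each adjustment goes through two complete propagation passes, i.e. GRPO+GRPO, hence the provisional name MGRPO.
 - **Architecture change**: We experimented with changing how Transformers propagate, training with looped processing inside layers. Inspired by Universal Transformers and recent latent-space work, some layers are activated recurrently during training, with gradient clipping used to avoid exploding gradients. Tests show improved model performance; further work is still being tested.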
@@ -113,45 +134,6 @@ You See See You Fuck Good JOB!
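 ## No "aha" moment encountered so far
-## Note
-⚠ **The official example template must be followed strictly**:
-**The think tags and their content must be removed from the context sent back to the model. Otherwise the model will not reply correctly!**
-Few front ends currently support this, so modifying the front-end code manually is recommended. Reference code: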
-```
-msg.role === 'assistant' ? {
-...msg,
-content: msg.content.replace(/<think>[\s\S]*?<\/think>/gi, '')
-}
-```
-**Official template reference**
-```
-{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<｜User｜>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<｜Assistant｜><｜tool▁calls▁begin｜><｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{{'<｜tool▁calls▁end｜><｜end▁of▁sentence｜>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<｜tool▁outputs▁end｜>' + message['content'] + '<｜end▁of▁sentence｜>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<｜Assistant｜>' + content + '<｜end▁of▁sentence｜>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<｜tool▁outputs▁begin｜><｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool 
%}{{'<｜tool▁outputs▁end｜>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<｜Assistant｜>'}}{% endif %}
-```
-**Official documentation**
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/650762d0eac45ee2e420a38b/0CwMdbDffZQJz_-WZrhwH.png)
-[Direct link](https://api-docs.deepseek.com/zh-cn/guides/reasoning_model)
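-## Achievements
-🔥 **After training**:
-1. **Chain of thought**: improved logical and associative ability
-2. **Spontaneous thinking**: the chain of thought emerges spontaneously during training and provides the best line of approach
-3. **Expanded domain vocabulary**: in "in-depth" role-play dialogue, the relevant vocabulary is markedly larger, compensating for insufficient pretraining data in the original weights
-4. **Fewer refusals**: refusals are reduced, but because the model was trained by a company, some safety behavior is retained
-5. **Better literary quality**: reinforcement learning mainly improved literary quality, making the output read more like a novel
-## Model Highlights
-🔥 **Four-stage evolution architecture**:
-1. **Incremental pretraining**: injects 0.1T tokens of novels to improve text coherence and understanding of more scenarios
-2. **Tifa-COT-SFT cold start**: teaches the model thinking strategies, improving logical performance and contextual association
-3. **MGRPO**: improves the GRPO algorithm to address GRPO's inability to reward role-play data, introducing multiple rewards to improve model quality
-4. **Anti-repetition DPO**: uses DPO to prevent the model from repeating itself and to strengthen political safety.
-💡 **Engineering innovations**:
-- Improved the GRPO algorithm so it can be used to train on literary content
-- Improved the feedback strategy; a front-loaded vector confirmation method improves training performance
-- Improved the Transformers propagation path during training to unlock the model's deeper potential
 ## Model Details
 | Attribute | Specification |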
@@ -175,6 +157,28 @@ content: msg.content.replace(/<think>[\s\S]*?<\/think>/gi, '')
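 - Short-text instant Q&A
 - Scenarios requiring strict factual accuracy
 ## Precautions
 ⚠️ The data used for this model includes copyrighted novel content and data derived from Tifa models. Please comply with the following:
 1. Comply with apache-2.0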
@@ -186,22 +190,22 @@ content: msg.content.replace(/<think>[\s\S]*?<\/think>/gi, '')
 **Best practices**:
 ```python
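 # Enable role-play mode
-prompt = """<system>Entering the Tifa role engine...</system>
-<user>You are now the wandering warrior Chu Ye, standing on a rooftop in Chang'an</user>
 <think>
 Convey the character's proud, aloof temperament
 Add environmental description characteristic of wuxia
 Keep the dialogue cold and stern in style
 </think>
-<Chu Ye>"""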
 ```
 **Recommended parameters**:
 ```python
 generation_config = {
-    "temperature": 0.4,
     "top_p": 0.6,
-    "repetition_penalty": 1.17,
     "max_new_tokens": 1536,
     "do_sample": True
 }

   - Training in progress...
 ## Training Results
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650762d0eac45ee2e420a38b/N2aA5Pjj1uB6U76SJZb78.png)
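+## Achievements
+🔥 **After training**:
+1. **Chain of thought**: improved logical and associative ability
+2. **Spontaneous thinking**: the chain of thought emerges spontaneously during training and provides the best line of approach
+3. **Expanded domain vocabulary**: in "in-depth" role-play dialogue, the relevant vocabulary is markedly larger, compensating for insufficient pretraining data in the original weights
+4. **Fewer refusals**: refusals are reduced, but because the model was trained by a company, some safety behavior is retained
+5. **Better literary quality**: reinforcement learning mainly improved literary quality, making the output read more like a novel
+## Model Highlights
+🔥 **Four-stage evolution architecture**:
+1. **Incremental pretraining**: injects 0.1T tokens of novels to improve text coherence and understanding of more scenarios
+2. **Tifa-COT-SFT cold start**: teaches the model thinking strategies, improving logical performance and contextual association
+3. **MGRPO**: improves the GRPO algorithm to address GRPO's inability to reward role-play data, introducing multiple rewards to improve model quality
+4. **Anti-repetition DPO**: uses DPO to prevent the model from repeating itself and to strengthen political safety.
+💡 **Engineering innovations**:
+- Improved the GRPO algorithm so it can be used to train on literary content
+- Improved the feedback strategy; a front-loaded vector confirmation method improves training performance
+- Improved the Transformers propagation path during training to unlock the model's deeper potential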
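 ## MGRPO and Layer Propagation:
 - **Algorithm change**: The original GRPO performs preference learning only through an ORM strategy, which cannot judge the quality of generated literary content. Inspired by **Shanghai Jiao Tong University** PhDs **Wei XX** and **Hong XX**, this training run uses a logic-based algorithm to address the difficulty of designing an ORM strategy for literary text, and adds a second pass that returns a revised prompt to summarize errors. Each adjustment goes through two complete propagation passes, i.e. GRPO+GRPO, hence the provisional name MGRPO.
 - **Architecture change**: We experimented with changing how Transformers propagate, training with looped processing inside layers. Inspired by Universal Transformers and recent latent-space work, some layers are activated recurrently during training, with gradient clipping used to avoid exploding gradients. Tests show improved model performance; further work is still being tested.
 ## No "aha" moment encountered so far
 ## Model Details
 | Attribute | Specification |
 - Short-text instant Q&A
 - Scenarios requiring strict factual accuracy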
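+## Note
+⚠ **The official example template must be followed strictly**:
+**The think tags and their content must be removed from the context sent back to the model. Otherwise the model will not reply correctly!**
+Few front ends currently support this, so modifying the front-end code manually is recommended. Reference code: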
+```
+msg.role === 'assistant' ? {
+...msg,
+content: msg.content.replace(/<think>[\s\S]*?<\/think>/gi, '')
+}
+```
+**Official template reference**
+```
+{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<｜User｜>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<｜Assistant｜><｜tool▁calls▁begin｜><｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{{'<｜tool▁calls▁end｜><｜end▁of▁sentence｜>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<｜tool▁outputs▁end｜>' + message['content'] + '<｜end▁of▁sentence｜>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<｜Assistant｜>' + content + '<｜end▁of▁sentence｜>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<｜tool▁outputs▁begin｜><｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool 
%}{{'<｜tool▁outputs▁end｜>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<｜Assistant｜>'}}{% endif %}
+```
+**Official documentation**
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/650762d0eac45ee2e420a38b/0CwMdbDffZQJz_-WZrhwH.png)
+[Direct link](https://api-docs.deepseek.com/zh-cn/guides/reasoning_model)
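
For clients written in Python rather than JavaScript, the rule in this note (drop the think block from earlier assistant turns before sending the history back) could be sketched as follows. This is only an illustration: the function name `strip_think` and the message-dict shape (`role` / `content` keys) are assumptions, not part of the model card, and the extra whitespace trimming is an addition that the JavaScript snippet does not perform.

```python
import re

# Matches one <think>...</think> span, newlines included, case-insensitively
# (the same pattern as the JavaScript snippet in this note).
THINK_BLOCK = re.compile(r"<think>[\s\S]*?</think>", re.IGNORECASE)


def strip_think(messages):
    """Return a new list where assistant turns have their think blocks removed.

    `messages` is assumed to be a list of dicts with "role" and "content" keys.
    Non-assistant turns are passed through unchanged.
    """
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and isinstance(msg.get("content"), str):
            # Remove the think block, then trim leftover surrounding whitespace.
            msg = {**msg, "content": THINK_BLOCK.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "<think>plan a greeting</think>Hi there."},
    ]
    print(strip_think(history)[1]["content"])
```

The function builds new dicts instead of mutating the input, so the full reply (with its think block) can still be shown in the UI while only the cleaned history is sent to the model.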
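 ## Precautions
 ⚠️ The data used for this model includes copyrighted novel content and data derived from Tifa models. Please comply with the following:
 1. Comply with apache-2.0
 **Best practices**: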
 ```python
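 # Enable role-play mode
+prompt = """You are a little girl / You are character XX...
+<user>I walk through the door and see you rush up to greet me</user>
 <think>
 Convey the character's proud, aloof temperament
 Add environmental description characteristic of wuxia
 Keep the dialogue cold and stern in style
 </think>
+I see XX come in..."""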
 ```
 **Recommended parameters**:
 ```python
 generation_config = {
+    "temperature": 0.75,
     "top_p": 0.6,
+    "repetition_penalty": 1.08,
     "max_new_tokens": 1536,
     "do_sample": True
 }