HectorHe committed (verified) · Commit 4c4f337 · 1 Parent(s): c62da6f

End of training
Files changed (3)
  1. README.md +3 -1
  2. config.json +1 -1
  3. training.log +2 -0
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
 base_model: Qwen/Qwen1.5-MoE-A2.7B
+datasets: HectorHe/math7k
 library_name: transformers
 model_name: Qwen1.5-MOE-aux-free-sft-math7k-1e-4-gamma
 tags:
 - generated_from_trainer
+- open-r1
 - trl
 - sft
 licence: license
@@ -11,7 +13,7 @@ licence: license
 
 # Model Card for Qwen1.5-MOE-aux-free-sft-math7k-1e-4-gamma
 
-This model is a fine-tuned version of [Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B).
+This model is a fine-tuned version of [Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) on the [HectorHe/math7k](https://huggingface.co/datasets/HectorHe/math7k) dataset.
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
config.json CHANGED
@@ -33,7 +33,7 @@
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.51.0",
-  "use_cache": false,
+  "use_cache": true,
   "use_sliding_window": false,
   "vocab_size": 151936
 }
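The `use_cache` flip above re-enables the key/value cache, which is commonly disabled during training (e.g. alongside gradient checkpointing) and turned back on for inference. As a toy illustration of what the cache saves (a hypothetical sketch, not this repo's code), compare the number of projection calls with and without caching during greedy decoding:

```python
# Toy model of autoregressive decoding: `project` stands in for the
# key/value projection of an attention layer (hypothetical, for illustration).

def decode(tokens, project, use_cache):
    """Return the key/value list seen at each step and the number of projection calls."""
    calls = 0
    cache = []
    per_step = []
    for step in range(1, len(tokens) + 1):
        if use_cache:
            # With the KV cache, only the newest token is projected.
            cache.append(project(tokens[step - 1]))
            calls += 1
            per_step.append(list(cache))
        else:
            # Without it, every past token is re-projected at every step.
            kv = [project(t) for t in tokens[:step]]
            calls += step
            per_step.append(kv)
    return per_step, calls

project = lambda t: 2 * t
cached, cached_calls = decode([1, 2, 3], project, use_cache=True)
uncached, uncached_calls = decode([1, 2, 3], project, use_cache=False)
assert cached == uncached                        # identical outputs either way
assert (cached_calls, uncached_calls) == (3, 6)  # linear vs. quadratic work
```

The outputs are identical; caching only changes how much work is repeated, which is why it is safe to leave `use_cache: true` in the exported config.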
training.log CHANGED
@@ -439,3 +439,5 @@ weight_decay=0.0,
 2025-09-15 02:07:56 - INFO - __main__ - model.layers.21.mlp: 60 experts, range=[-0.0312, 0.0312]
 2025-09-15 02:07:56 - INFO - __main__ - model.layers.22.mlp: 60 experts, range=[-0.0312, 0.0312]
 2025-09-15 02:07:56 - INFO - __main__ - model.layers.23.mlp: 60 experts, range=[-0.0312, 0.0312]
+2025-09-15 02:09:40 - INFO - __main__ - Model saved to /tmp/data/Qwen1.5-MOE/aux_free_sft/math7k/1e-4-gamma
+2025-09-15 02:09:40 - INFO - __main__ - Pushing to hub...