---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- qwen3
- code
- math
- reasoning
- medical
- instruction
- if
datasets:
- nvidia/OpenCodeReasoning
- unsloth/OpenMathReasoning-mini
- patrickfleith/instruction-freak-reasoning
- FreedomIntelligence/medical-o1-reasoning-SFT
- Malikeh1375/medical-question-answering-datasets
- Myashka/SO-Python_QA-filtered-2023-no_code-tanh_score
- ArdentTJ/t1_daily_conversations
---

![The Imitation Game](qwen3-moe.jpg)

"*We are all experts at something, but we’re all also beginners at something else.*"

— *The Imitation Game (2014)*

# Qwen3 2.4B A0.6B

This is a Mixture of Experts (MoE) Qwen3 model with 2.4B total parameters, built from 4 experts of 0.6B parameters each; only 0.6B parameters are active per request. The expert models are listed below. The goal is to deliver more accurate results with higher efficiency and lower memory usage.

## Expert Models:

### Key Training Parameters (SFTConfig)

* `per_device_train_batch_size = 2`
* `gradient_accumulation_steps = 4`
* `warmup_steps = 5`
* `num_train_epochs = 1`
* `learning_rate = 2e-5`
* `optim = "adamw_8bit"`
* `weight_decay = 0.01`
* `seed = 3407`

A sketch of how these parameters assemble into a `trl` `SFTConfig` is given in the Training Sketch section at the end of this card.

### Coding: [suayptalha/Qwen3-0.6B-Code-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Code-Expert)

This model was fully fine-tuned in BF16 on the first 20k rows of the `nvidia/OpenCodeReasoning` dataset for 1 epoch.

### Math: [suayptalha/Qwen3-0.6B-Math-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Math-Expert)

This model was fully fine-tuned in BF16 on the entire `unsloth/OpenMathReasoning-mini` dataset for 1 epoch.

### Medical: [suayptalha/Qwen3-0.6B-Medical-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Medical-Expert)

This model was fully fine-tuned in BF16 on the first 20k rows of the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset for 1 epoch.

### Instruction Following: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)

The `Qwen/Qwen3-0.6B` model was used directly for this expert; no fine-tuning was applied.

## Router Model:

The router model is available [here](https://huggingface.co/suayptalha/MoE-Router-v2). It is a version of `distilbert/distilbert-base-uncased` fine-tuned on 7 different datasets, and it routes each incoming prompt to the most suitable expert. A conceptual sketch of this prompt-level routing is given in the Routing Sketch section at the end of this card.

## Usage:

```py
import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download the repository (weights plus the custom MoE modeling code).
local_dir = snapshot_download(repo_id="suayptalha/Qwen3-2.4B-A0.6B")

model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    trust_remote_code=True,  # required: the MoE wrapper ships as custom code
)
tokenizer = AutoTokenizer.from_pretrained(local_dir)

model.to(device)
model.eval()

prompt = "I have pain in my chest, what should I do?"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # enables temperature/top_p sampling
        temperature=0.6,
        top_p=0.95,
    )

output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```

## License:

This project is licensed under the Apache License 2.0. See the [LICENSE](./LICENSE) file for details.

## Support:

Buy Me A Coffee
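## Training Sketch:

A minimal sketch of how the key training parameters listed above could assemble into `trl`'s `SFTConfig`, shown for the Coding expert. The parameter values come from this card; everything else is an assumption: the `nvidia/OpenCodeReasoning` config/split names, the `to_text` column mapping (hypothetical), the `bf16` flag, and the use of `SFTTrainer` with `processing_class` (recent `trl` versions) are not specified by the card.

```py
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base = "Qwen/Qwen3-0.6B"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(base)

# "First 20k rows" per the card; the config/split names are assumptions.
dataset = load_dataset("nvidia/OpenCodeReasoning", "split_0", split="split_0[:20000]")

def to_text(example):
    # Hypothetical column mapping: adjust to the dataset's actual schema.
    return {"text": example["input"] + "\n" + example["output"]}

config = SFTConfig(
    output_dir="qwen3-0.6b-code-expert",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs=1,
    learning_rate=2e-5,
    optim="adamw_8bit",
    weight_decay=0.01,
    seed=3407,
    bf16=True,  # "fully fine-tuned with BF16" per the card (assumed flag)
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset.map(to_text),
    processing_class=tokenizer,
)
trainer.train()
```

The same recipe would apply to the Math and Medical experts with their respective datasets swapped in.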
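## Routing Sketch:

For intuition only: a conceptual sketch of the prompt-level routing described above, assuming the router is a standard sequence classifier. This is not the repository's actual remote-code implementation, and the `id2expert` label mapping below is hypothetical.

```py
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Load the router (a DistilBERT-based classifier, per this card).
router_id = "suayptalha/MoE-Router-v2"
router_tok = AutoTokenizer.from_pretrained(router_id)
router = AutoModelForSequenceClassification.from_pretrained(router_id).eval()

# Hypothetical label -> expert mapping; the real mapping lives in the
# repository's custom modeling code.
id2expert = {
    0: "suayptalha/Qwen3-0.6B-Code-Expert",
    1: "suayptalha/Qwen3-0.6B-Math-Expert",
    2: "suayptalha/Qwen3-0.6B-Medical-Expert",
    3: "Qwen/Qwen3-0.6B",  # instruction-following expert
}

prompt = "I have pain in my chest, what should I do?"

# 1) Classify the prompt to pick an expert.
with torch.no_grad():
    logits = router(**router_tok(prompt, return_tensors="pt")).logits
expert_repo = id2expert[int(logits.argmax(dim=-1))]

# 2) Generate with the selected 0.6B expert only.
expert_tok = AutoTokenizer.from_pretrained(expert_repo)
expert = AutoModelForCausalLM.from_pretrained(expert_repo).eval()

text = expert_tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = expert_tok(text, return_tensors="pt")
with torch.no_grad():
    out = expert.generate(
        **inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95
    )
print(expert_tok.decode(out[0], skip_special_tokens=True))
```

Because only one expert runs per prompt, memory and compute stay close to a single 0.6B model, which is where the "A0.6B" in the name comes from.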