# Custom DeepSeek-R1 (4 Layers)
**⚠️ For Testing Purposes Only**

This is a modified version of DeepSeek-R1 with randomly initialized weights, intended for architecture experiments.
## Key Modifications
- Reduced to 4 layers (the original DeepSeek-R1 has 61 layers)
- Contains:
  - Layers 1–3: MLA (Multi-head Latent Attention)
  - Layer 4: MoE (Mixture of Experts)
- All weights randomly initialized (not performance-optimized); a construction sketch follows this list
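As a rough illustration of how a model like this can be assembled, here is a minimal sketch that truncates a standard DeepSeek-R1 configuration and builds the model with random weights. It assumes the repo follows the stock DeepSeek-V3/R1 config conventions (`num_hidden_layers` and `first_k_dense_replace` are the usual config keys); the base repo name and exact steps are assumptions, not necessarily what was used here.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Start from the official DeepSeek-R1 config (only config.json is fetched).
config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-R1",
    trust_remote_code=True,  # only needed on transformers versions without native DeepSeek-V3 support
)

config.num_hidden_layers = 4      # truncate from 61 layers to 4
config.first_k_dense_replace = 3  # layers 1-3 keep dense FFNs; MoE starts at layer 4

# from_config() builds the architecture with randomly initialized weights;
# no pretrained checkpoint is loaded.
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
```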
## Usage
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")
```
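A quick smoke test (a hypothetical snippet, assuming the repo bundles a standard tokenizer); since the weights are random, expect incoherent output:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")
inputs = tok("Hello, world", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16)
print(tok.decode(out[0], skip_special_tokens=True))  # gibberish: weights are random
```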