# Custom DeepSeek-R1 (4 Layers)

> ⚠️ **For testing purposes only.** This is a modified version of DeepSeek-R1 with randomly initialized weights, intended for architecture experiments.

## Key Modifications

- Reduced to 4 layers (the original DeepSeek-R1 has 61)
- Layer layout:
  - Layers 1–3: MLA (Multi-head Latent Attention)
  - Layer 4: MoE (Mixture of Experts)
- All weights are randomly initialized (not performance-optimized); see the config-inspection sketch below
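
A quick way to verify this layout without downloading the weights is to read the checkpoint's config. This is a minimal sketch that assumes the repo ships a standard DeepSeek-V3-style `config.json`; the field names `num_hidden_layers`, `first_k_dense_replace`, and `n_routed_experts` come from the upstream DeepSeek-V3 config and are an assumption for this checkpoint:

```python
from transformers import AutoConfig

# Loads only the config; no weight files are downloaded.
# trust_remote_code=True is needed if the repo ships custom modeling code.
config = AutoConfig.from_pretrained(
    "MollyHexapotato/custom-deepseek-r1-4L",
    trust_remote_code=True,
)

# Field names follow the upstream DeepSeek-V3 config (an assumption here);
# adjust if this checkpoint names them differently.
print("layers:", config.num_hidden_layers)  # expected: 4
print("dense layers before MoE:", getattr(config, "first_k_dense_replace", "n/a"))
print("routed experts:", getattr(config, "n_routed_experts", "n/a"))
```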

## Usage

```python
from transformers import AutoModelForCausalLM

# trust_remote_code=True may be required if the repo ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")
```
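
Because the weights are random, any generation will be gibberish; the snippet below is only useful as a smoke test that the forward pass runs end to end. It is a sketch that assumes the repo also includes tokenizer files and enough memory for the full checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "MollyHexapotato/custom-deepseek-r1-4L"

# Assumes tokenizer files are present in the repo.
tokenizer = AutoTokenizer.from_pretrained(repo)

# 15.1B params in BF16 is roughly 30 GB of weights.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Random weights produce nonsense tokens; we only check that generate() completes.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```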

## Model Details

- Format: Safetensors
- Model size: 15.1B parameters
- Tensor type: BF16 (≈30 GB of weights)