---
language: en
license: apache-2.0
tags:
- test
- custom-architecture
- deepseek
---

# Custom DeepSeek-R1 (4 Layers)

⚠️ **For Testing Purposes Only**

This is a modified version of DeepSeek-R1 with **random weights**, used for architecture experiments.

## Key Modifications

- Reduced to **4 layers** (original: 32+ layers)
- Contains:
  - First 3 layers: **MLA** (Multi-head Latent Attention)
  - Layer 4: **MoE** (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)

## Usage

```python
from transformers import AutoModelForCausalLM

# trust_remote_code=True may be required if the repo ships custom modeling code
model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")
```
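For intuition about what the MoE layer does, the core idea is top-k expert routing: a router scores each token, and only the k highest-scoring expert sub-networks process it, with their outputs mixed by the renormalized router probabilities. A minimal, framework-free sketch of that idea (illustrative only; the function names, toy experts, and scores below are assumptions, not this repo's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_scores, top_k=2):
    """Route a token to its top_k experts and mix their outputs
    by the renormalized router probabilities."""
    probs = softmax(router_scores)
    # Indices of the top_k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Weighted combination of only the selected experts' outputs.
    return sum(probs[i] / norm * experts[i](token) for i in top)

# Toy scalar "experts" standing in for FFN sub-networks.
experts = [lambda x: x * 2, lambda x: x + 10, lambda x: x * x]
out = moe_forward(3.0, experts, router_scores=[0.1, 2.0, 0.5], top_k=2)
```

With `top_k=1` this collapses to picking the single best expert; larger `top_k` trades compute for a smoother mixture. The real model learns the router scores per token rather than taking them as inputs.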