---
language: en
license: apache-2.0
tags:
- test
- custom-architecture
- deepseek
---

# Custom DeepSeek-R1 (4 Layers)

⚠️ **For Testing Purposes Only**

This is a modified version of DeepSeek-R1 with **random weights**, used for architecture experiments.

## Key Modifications

- Reduced to **4 layers** (original: 32+ layers)
- Contains:
  - First 3 layers: **MLA** (Multi-head Latent Attention)
  - Layer 4: **MoE** (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)

## Usage

```python
from transformers import AutoModelForCausalLM

# trust_remote_code=True may be required if the repo ships custom modeling code
model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")
```
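For intuition about what the MoE layer does, the core idea is top-k expert routing: a router scores each token, and only the k highest-scoring expert sub-networks process it, with their outputs mixed by the renormalized router probabilities. A minimal, framework-free sketch of that idea (illustrative only; the function names, toy experts, and scores below are assumptions, not this repo's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_scores, top_k=2):
    """Route a token to its top_k experts and mix their outputs
    by the renormalized router probabilities."""
    probs = softmax(router_scores)
    # Indices of the top_k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Weighted combination of only the selected experts' outputs.
    return sum(probs[i] / norm * experts[i](token) for i in top)

# Toy scalar "experts" standing in for FFN sub-networks.
experts = [lambda x: x * 2, lambda x: x + 10, lambda x: x * x]
out = moe_forward(3.0, experts, router_scores=[0.1, 2.0, 0.5], top_k=2)
```

With `top_k=1` this collapses to picking the single best expert; larger `top_k` trades compute for a smoother mixture. The real model learns the router scores per token rather than taking them as inputs.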