---
language: en
license: apache-2.0
tags:
  - test
  - custom-architecture
  - deepseek
---

# Custom DeepSeek-R1 (4 Layers)

## ⚠️ For Testing Purposes Only

This is a modified version of DeepSeek-R1 with randomly initialized weights, intended solely for architecture experiments. It is not a trained model and will not produce meaningful output.

## Key Modifications

- Reduced to 4 layers (the original model has 32+ layers)
- Layer layout:
  - Layers 1–3: MLA (Multi-head Latent Attention)
  - Layer 4: MoE (Mixture of Experts)
- All weights randomly initialized (not trained, not performance-optimized)
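To make the layer layout concrete, here is a minimal, illustrative sketch of the top-k gating idea behind an MoE block, in plain Python. The expert count, gate weights, and scalar "experts" are made up for the example; this is not the DeepSeek-R1 implementation, only a toy demonstration of how a router selects and mixes a few experts per input.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of gate logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Route scalar input x to the top_k experts by gate score and
    return the gate-weighted sum of their outputs (toy example)."""
    scores = softmax([w * x for w in gate_weights])
    # Rank experts by gate probability and keep the top_k.
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the selected gates so they sum to 1.
    total = sum(scores[i] for i in chosen)
    return sum(scores[i] / total * experts[i](x) for i in chosen)

# Four toy "experts", each a simple scalar function (hypothetical).
experts = [lambda x, k=k: k * x for k in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [0.1, 0.2, 0.3, 0.4]

# Routes x to the two highest-scoring experts (here: 4.0x and 3.0x),
# so the result lands between their individual outputs.
y = moe_forward(1.0, gate_weights, experts, top_k=2)
```

In a real MoE transformer layer the router and experts are learned linear layers operating on token hidden states, but the select-then-mix structure is the same.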

## Usage

```python
from transformers import AutoModelForCausalLM

# Depending on how the custom architecture is registered on the Hub,
# trust_remote_code=True may be required.
model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")
```