# Llama 3.1 8B UltraLong 1M Instruct (6-bit MLX)


This model was converted to MLX format from `nvidia/Llama-3.1-8B-UltraLong-1M-Instruct` using mlx-lm version 0.22.5.

## Model Details

- Maximum context window: 1M tokens
- Quantization: 6-bit

For more details, please refer to the UltraLong paper on arXiv.

## Use with mlx

```shell
pip install -U mlx-lm
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
```
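The model can also be driven from Python via the mlx-lm API. The snippet below is a minimal sketch assuming mlx-lm (≥ 0.22) is installed; note that loading this checkpoint downloads several GB of weights, and the chat-template step mirrors what the `mlx_lm.generate` CLI applies for instruct models.

```python
# Minimal sketch of generating with the mlx-lm Python API.
# Assumes mlx-lm is installed; loading the checkpoint downloads the weights.
from mlx_lm import load, generate

MODEL_ID = "TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit"

if __name__ == "__main__":
    model, tokenizer = load(MODEL_ID)

    # Wrap the user message in the model's chat template so the
    # instruct-tuned model sees a properly formatted conversation.
    messages = [{"role": "user", "content": "Your big prompt"}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    response = generate(model, tokenizer, prompt=prompt, max_tokens=65536, verbose=True)
```

Sampling parameters such as temperature are set via a sampler in recent mlx-lm versions rather than passed to `generate` directly, so they are omitted here.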
