MLX version of Llama 3.1 8B UltraLong 1M Instruct
This model was converted to MLX format from nvidia/Llama-3.1-8B-UltraLong-1M-Instruct using mlx-lm version 0.22.5.
Maximum context window: 1M tokens
For more details, please refer to arXiv.
pip install -U mlx-lm
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
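The model can also be driven from Python via mlx-lm's load/generate API instead of the CLI. A minimal sketch, assuming the same repo id as the command above; the prompt and max_tokens values are illustrative:

```python
# Minimal sketch of the mlx-lm Python API.
# Requires Apple Silicon; downloads the 6-bit quantized weights on first use.
from mlx_lm import load, generate

model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit")

# Wrap the user message in the model's chat template before generating.
messages = [{"role": "user", "content": "Your big prompt"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```

Note that loading the full 1M-token context requires substantial unified memory; shorter max_tokens values keep generation practical on smaller machines.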