metadata
pipeline_tag: text-generation
base_model: nvidia/Llama-3.1-8B-UltraLong-1M-Instruct
base_model_relation: quantized
tags:
- chat
- 6bit
- apple
- long-context
license: cc-by-nc-4.0
language:
- en
- fr
- es
- de
- it
- hi
- ru
library_name: mlx
Llama 3.1 8B UltraLong 1M Instruct 6-bit MLX
MLX version of Llama 3.1 8B UltraLong 1M Instruct
This model is a 6-bit quantized MLX conversion of nvidia/Llama-3.1-8B-UltraLong-1M-Instruct,
created with mlx-lm version 0.22.5.
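A conversion like this one can be reproduced with the mlx-lm Python API. The sketch below is an assumption about the exact settings: only the 6-bit quantization is known from this repo, so the output path is illustrative and the quantization group size is left at the mlx-lm default.

```python
from mlx_lm import convert

# Hypothetical reproduction of this repo's conversion: download the original
# weights, quantize them to 6 bits, and write the MLX model locally.
# The group size actually used by the author is not documented, so the
# mlx-lm default is kept here.
convert(
    hf_path="nvidia/Llama-3.1-8B-UltraLong-1M-Instruct",
    mlx_path="Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit",
    quantize=True,
    q_bits=6,
)
```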
Model Details
Maximum context window: 1M tokens
For more details, please refer to the UltraLong paper on arXiv.
Use with mlx
pip install -U mlx-lm
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
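The model can also be used from Python with mlx-lm's load/generate helpers. This is a minimal sketch following the standard mlx-lm usage pattern; the prompt text and max_tokens value are placeholders.

```python
from mlx_lm import load, generate

# Download (or load from cache) the 6-bit MLX weights and tokenizer.
model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit")

prompt = "Your big prompt"

# Wrap the prompt in the model's chat template (Llama 3.1 Instruct ships one).
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```

Long inputs up to the 1M-token window go through the same API; max_tokens=512 is just a small default for a quick test.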