metadata
pipeline_tag: text-generation
base_model: nvidia/Llama-3.1-8B-UltraLong-1M-Instruct
base_model_relation: quantized
tags:
- chat
- 6bit
- apple
- long-context
license: cc-by-nc-4.0
language:
- en
- fr
- es
- de
- it
- hi
- ru
library_name: mlx
Llama 3.1 8B UltraLong 1M Instruct 6-bit MLX
MLX version of Llama 3.1 8B UltraLong 1M Instruct
This model is a 6-bit quantized MLX conversion of nvidia/Llama-3.1-8B-UltraLong-1M-Instruct,
created with mlx-lm version 0.22.5.
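A conversion like this one can be reproduced with the mlx-lm Python API. The sketch below is an assumption about the exact settings: only the 6-bit quantization is known from this repo, so the output path is illustrative and the quantization group size is left at the mlx-lm default.

```python
from mlx_lm import convert

# Hypothetical reproduction of this repo's conversion: download the original
# weights, quantize them to 6 bits, and write the MLX model locally.
# The group size actually used by the author is not documented, so the
# mlx-lm default is kept here.
convert(
    hf_path="nvidia/Llama-3.1-8B-UltraLong-1M-Instruct",
    mlx_path="Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit",
    quantize=True,
    q_bits=6,
)
```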
Model Details
Maximum context window: 1M tokens
For more details, please refer to the UltraLong paper on arXiv.
Use with mlx
pip install -U mlx-lm
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
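The model can also be used from Python with mlx-lm's load/generate helpers. This is a minimal sketch following the standard mlx-lm usage pattern; the prompt text and max_tokens value are placeholders.

```python
from mlx_lm import load, generate

# Download (or load from cache) the 6-bit MLX weights and tokenizer.
model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit")

prompt = "Your big prompt"

# Wrap the prompt in the model's chat template (Llama 3.1 Instruct ships one).
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```

Long inputs up to the 1M-token window go through the same API; max_tokens=512 is just a small default for a quick test.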