MLX version of Llama 3.1 8B UltraLong 1M Instruct
This model was converted to MLX format from nvidia/Llama-3.1-8B-UltraLong-1M-Instruct using mlx-lm version 0.22.5.
Maximum context window: 1M tokens
For more details, please refer to arXiv.
pip install -U mlx-lm
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
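The model can also be driven from Python via mlx-lm's load/generate API instead of the CLI. A minimal sketch, assuming the same repo id as the command above; the prompt and max_tokens values are illustrative:

```python
# Minimal sketch of the mlx-lm Python API.
# Requires Apple Silicon; downloads the 6-bit quantized weights on first use.
from mlx_lm import load, generate

model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-6bit")

# Wrap the user message in the model's chat template before generating.
messages = [{"role": "user", "content": "Your big prompt"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```

Note that loading the full 1M-token context requires substantial unified memory; shorter max_tokens values keep generation practical on smaller machines.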