llama32-3b-instruct / runner.sh

Commit History

fix(float16): Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in the CLI, for example: --dtype=half.
78963b9

yusufs committed on
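
The fix above amounts to overriding the data type when the script launches vLLM. A minimal sketch of the idea, assuming runner.sh starts vLLM's OpenAI-compatible server (the entrypoint and model name here are illustrative, not taken from the script):

    # Tesla T4 is compute capability 7.5, so bfloat16 is unavailable; force float16.
    python -m vllm.entrypoints.openai.api_server \
        --model meta-llama/Llama-3.2-3B-Instruct \
        --dtype=half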

fix(runner.sh): use sail/Sailor2-3B-Chat
8132d1f

yusufs committed on

docs(add-comment): add comment
22ac900

yusufs committed on

feat(runner.sh): DeepSeek-R1-Distill-Qwen-32B d66bcfc2f3fd52799f95943264f32ba15ca0003d
148829b

yusufs committed on

feat(runner.sh): --trust-remote-code
1530e6e

yusufs committed on
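
--trust-remote-code lets vLLM execute the custom Python modeling and configuration code shipped inside a model repository, which some of the models added here (for example deepseek-ai/DeepSeek-V3) may require. A hedged sketch of how the flag is passed; the surrounding arguments are illustrative:

    # Permit execution of the model repo's custom code when loading.
    python -m vllm.entrypoints.openai.api_server \
        --model deepseek-ai/DeepSeek-V3 \
        --trust-remote-code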

feat(runner.sh): add deepseek-ai/DeepSeek-R1 and deepseek-ai/DeepSeek-V3
57f9fa5

yusufs committed on

feat(runner.sh): enable only prefix caching and disable request logging
c0cde8e

yusufs committed on

feat(runner.sh): --enable-chunked-prefill and --enable-prefix-caching for faster generation
8c5a84b

yusufs committed on
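
Both switches above are boolean flags on the vLLM server: prefix caching reuses KV-cache blocks across requests that share a prompt prefix, and chunked prefill splits long prompt prefills into pieces so decode steps are not starved. A sketch under the same assumptions as above, with MODEL_ID standing for whatever model the script selects:

    # Reuse cached KV blocks for shared prefixes and chunk long prefills.
    python -m vllm.entrypoints.openai.api_server \
        --model "${MODEL_ID}" \
        --enable-prefix-caching \
        --enable-chunked-prefill

The later commit c0cde8e then drops --enable-chunked-prefill, keeping only prefix caching and turning off request logging (presumably via --disable-log-requests).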

fix(runner.sh): enable eager mode (disabling CUDA graphs)
5bd7bc7

yusufs committed on

fix(runner.sh): --enforce-eager does not support values
cb15911

yusufs committed on
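
The fix above reflects that --enforce-eager is a value-less boolean switch: passing it disables CUDA graph capture (eager mode), and forms like --enforce-eager=true are rejected by the argument parser. To get CUDA graphs back, the flag is simply omitted; MODEL_ID below is illustrative:

    # Eager mode: skip CUDA graph capture.
    python -m vllm.entrypoints.openai.api_server --model "${MODEL_ID}" --enforce-eager

    # CUDA graphs enabled: omit the flag entirely; --enforce-eager=false is not accepted.
    python -m vllm.entrypoints.openai.api_server --model "${MODEL_ID}"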

fix(runner.sh): explicitly disabling enforce_eager
266e7dd

yusufs committed on

fix(runner.sh): disable eager mode so it uses CUDA graphs (for parallel and faster processing)
6bb48e9

yusufs committed on

feat(runner.sh): add specific task and code revision
dc19c1d

yusufs committed on
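
Pinning a task and revision makes the launch reproducible: --task fixes what the engine serves (e.g. generate), while --revision and --code-revision pin the model weights and any remote code to specific repository commits. A sketch reusing the revision hash from the DeepSeek-R1-Distill-Qwen-32B entry above; how the script actually wires these values together is an assumption:

    # Pin task, weights revision, and remote-code revision for reproducible startup.
    python -m vllm.entrypoints.openai.api_server \
        --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
        --task generate \
        --revision d66bcfc2f3fd52799f95943264f32ba15ca0003d \
        --code-revision d66bcfc2f3fd52799f95943264f32ba15ca0003d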

feat(runner.sh): using MODEL_ID only
490e6a3

yusufs committed on

feat(runner.sh): using runner.sh to select the LLM at run time
69c6372

yusufs committed on
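
Taken together, the history describes a launcher that chooses the model at container start instead of hard-coding it. A minimal sketch of what such a runner.sh could look like, assuming the model is picked from a MODEL_ID environment variable; the default value and the exact set of flags are illustrative, not the script's real contents:

    #!/usr/bin/env bash
    set -euo pipefail

    # Select the LLM at run time; the default shown here is only an example.
    MODEL_ID="${MODEL_ID:-meta-llama/Llama-3.2-3B-Instruct}"

    exec python -m vllm.entrypoints.openai.api_server \
        --model "${MODEL_ID}" \
        --dtype=half \
        --enable-prefix-caching \
        --disable-log-requests \
        --trust-remote-code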