fix(float16): Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in the CLI, for example: --dtype=half. 78963b9 yusufs committed on Apr 16
feat(runner.sh): DeepSeek-R1-Distill-Qwen-32B d66bcfc2f3fd52799f95943264f32ba15ca0003d 148829b yusufs committed on Jan 28
feat(runner.sh): add deepseek-ai/DeepSeek-R1 and deepseek-ai/DeepSeek-V3 57f9fa5 yusufs committed on Jan 28
feat(runner.sh): only enable prefix caching and disable request logging c0cde8e yusufs committed on Jan 28
feat(runner.sh): --enable-chunked-prefill and --enable-prefix-caching for faster generation 8c5a84b yusufs committed on Jan 28
fix(runner.sh): disable eager mode so that CUDA graphs are used (for parallel and faster processing) 6bb48e9 yusufs committed on Jan 20
feat(runner.sh): use runner.sh to select the LLM at run time 69c6372 yusufs committed on Dec 26, 2024