# Qwen3-Embedding-0.6B ONNX for TEI
This is an ONNX version of Qwen/Qwen3-Embedding-0.6B optimized for Text Embeddings Inference (TEI).
## Model Details
- Base Model: Qwen/Qwen3-Embedding-0.6B
- Format: ONNX with external data (`model.onnx` + `model.onnx_data`)
- Pooling: Mean pooling (built into the ONNX graph)
- Embedding Dimension: 1024
- Max Sequence Length: 32768 tokens
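The 1024-dimensional vectors this model returns are typically compared with cosine similarity. A minimal sketch (plain NumPy; the random vectors below are stand-ins for real embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for real 1024-dim embeddings returned by the model.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=1024)
emb_b = rng.normal(size=1024)

score = cosine_similarity(emb_a, emb_b)  # always in [-1, 1]
```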
## Usage with TEI
```bash
docker run --gpus all -p 8080:80 -v $PWD:/data \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx
```
For CPU inference:
```bash
docker run -p 8080:80 -v $PWD:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx
```
## Conversion Details
This model was converted from the original PyTorch model to ONNX format with:
- Consolidated external data for TEI compatibility
- Mean pooling integrated into the ONNX graph
- Optimized for CPU inference
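The mean pooling baked into the graph averages the last-layer hidden states over non-padding tokens. It is equivalent to this NumPy sketch (array names and shapes are illustrative):

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    hidden_states:  (batch, seq_len, hidden)  last-layer outputs
    attention_mask: (batch, seq_len)          1 = real token, 0 = padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Toy example: batch of 1, seq_len 3 (last position is padding), hidden size 2.
h = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
m = np.array([[1, 1, 0]])
pooled = mean_pool(h, m)  # -> [[2.0, 3.0]]; the padded token is ignored
```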
## Original Model

See the original model card at [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) for:
- Model architecture details
- Training information
- Benchmark results
- Citation information
## License
Apache 2.0 (same as the original model)