Qwen3-Embedding-0.6B ONNX for TEI

This is an ONNX version of Qwen/Qwen3-Embedding-0.6B optimized for Text Embeddings Inference (TEI).

Model Details

  • Base Model: Qwen/Qwen3-Embedding-0.6B
  • Format: ONNX with external data (model.onnx + model.onnx_data)
  • Pooling: Mean pooling (built into the ONNX graph)
  • Embedding Dimension: 1024
  • Max Sequence Length: 32768 tokens

Usage with TEI

docker run --gpus all -p 8080:80 -v $PWD:/data \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx

For CPU inference:

docker run -p 8080:80 -v $PWD:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx
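Once either container is running, you can query it over HTTP. A minimal example against TEI's standard /embed endpoint (assuming the port mapping 8080:80 from the commands above; the response is a JSON array with one 1024-dimensional vector per input string):

```shell
# Request an embedding from the local TEI server.
curl 127.0.0.1:8080/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is deep learning?"}'
```

You can also pass a list of strings as "inputs" to embed a batch in one request.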

Conversion Details

This model was converted from the original PyTorch model to ONNX format with:

  • Consolidated external data for TEI compatibility
  • Mean pooling integrated into the ONNX graph
  • Optimized for CPU inference
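Because pooling is baked into the exported graph, TEI can serve the model without applying any pooling of its own. The operation is equivalent to a masked mean over the token embeddings; a minimal NumPy sketch of that computation (illustrative only, not the exported graph itself):

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean over the sequence axis.

    hidden_states:  (batch, seq_len, hidden) last-layer token embeddings
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)          # sum real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # avoid divide-by-zero
    return summed / counts

# Toy example: batch of 1, seq_len 3 (last position padded), hidden dim 2
h = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
m = np.array([[1, 1, 0]])
print(mean_pool(h, m))  # padded token is ignored -> [[2. 3.]]
```

The padded position contributes nothing to the sum or the count, so only real tokens shape the embedding.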

Original Model

See the original model card at Qwen/Qwen3-Embedding-0.6B for:

  • Model architecture details
  • Training information
  • Benchmark results
  • Citation information

License

Apache 2.0 (same as the original model)
