Error on loading in vLLM, what am I doing wrong?

#1
by djdeniro - opened
Loading safetensors checkpoint shards:   7% 2/27 [00:05<01:20,  3.21s/it](Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600] WorkerProc failed to start.
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600] Traceback (most recent call last):
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 574, in worker_main
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     worker = WorkerProc(*args, **kwargs)
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 440, in __init__
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     self.worker.load_model()
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2373, in load_model
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     self.model = model_loader.load_model(
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]                  ^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     self.load_weights(model, model_config)
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     loaded_weights = model.load_weights(
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]                      ^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 701, in load_weights
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     return loader.load_weights(weights)
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     autoloaded_weights = set(self._load_module("", self.module, weights))
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     yield from self._load_module(prefix,
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     loaded_params = module_load_weights(weights)
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 538, in load_weights
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     success = weight_loader(param,
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]               ^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/moe_wna16.py", line 466, in moe_wna16_weight_loader
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     return weight_loader(param,
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]            ^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1396, in weight_loader
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     self._load_model_weight_or_group_weight_scale(
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1073, in _load_model_weight_or_group_weight_scale
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     self._load_w13(shard_id=shard_id,
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1116, in _load_w13
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600]     expert_data.copy_(loaded_weight)
vllm-1  | (Worker_PP0_TP1 pid=418) ERROR 09-13 21:35:07 [multiproc_executor.py:600] RuntimeError: The size of tensor a (2048) must match the size of tensor b (4096) at non-singleton dimension 1
My docker-compose.yml:

version: '3.8'

services:
  vllm:
    tty: true
    restart: unless-stopped
    ports:
      - 8007:8000
    image: rocm/vllm-dev:nightly_main_20250913
    shm_size: '256g'
    volumes:
      - /mnt/tb_disk/llm:/app/models
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
      - /dev/mem:/dev/mem
    environment:
      - HIP_VISIBLE_DEVICES=1,5,2,3,4,7,6,0
      - VLLM_USE_V1=1
      - VLLM_CUSTOM_OPS=all
      - NCCL_DEBUG=ERROR
      - VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
      - PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
      - VLLM_ROCM_USE_AITER=0
      - NCCL_P2P_DISABLE=1
      - SAFETENSORS_FAST_GPU=1
      - PYTORCH_TUNABLEOP_ENABLED
    command: |
      sh -c '
      vllm serve /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix \
        --gpu-memory-utilization 0.965 \
        --max-model-len 32768  \
        --tensor-parallel-size 2 \
        -pp 4 \
        --enable-auto-tool-choice \
        --disable-log-requests \
        --enable-chunked-prefill \
        --tool-call-parser qwen3_coder   \
        --max-num-seqs 8 \
        --max-num-batched-tokens 4096
       '
volumes: {}
QuantTrio org

I suggest adding the --enable-expert-parallel parameter after --tp 2 --pp 4. This flag enables expert parallelism for MoE. You’re welcome to test it and share your latest results — please include the full log in your feedback.
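For example (I cannot test this on ROCm myself, so treat it as a sketch), the serve command from your compose file would become something like:

vllm serve /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix \
  --tensor-parallel-size 2 \
  -pp 4 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.965 \
  --max-model-len 32768 \
  --max-num-seqs 8 \
  --max-num-batched-tokens 4096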

I put the log here on the ROCm/vllm GitHub:

https://github.com/ROCm/vllm/issues/696

If possible, please create a GPTQ-INT4 quant without mixing in int8; I hope it will work well with ROCm, thank you!

Here is the full log:

 ✔ Container vllm-7-vllm-1  Recreated                                                                                                                  0.3s 
Attaching to vllm-1
INFO 09-18 17:26:29 [__init__.py:216] Automatically detected platform rocm.
WARNING 09-18 17:26:51 [__init__.py:1764] argument '--disable-log-requests' is deprecated and replaced with '--enable-log-requests'. This will be removed in v0.12.0.
(APIServer pid=7) INFO 09-18 17:26:51 [api_server.py:1814] vLLM API server version 0.10.2rc3.dev180+g2a4d6412e
(APIServer pid=7) INFO 09-18 17:26:51 [utils.py:328] non-default args: {'model_tag': '/app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'model': '/app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix', 'trust_remote_code': True, 'max_model_len': 65536, 'served_model_name': ['Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8'], 'tensor_parallel_size': 8, 'gpu_memory_utilization': 0.965, 'max_num_seqs': 8}
(APIServer pid=7) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.

(APIServer pid=7) INFO 09-18 17:27:23 [__init__.py:707] Resolved architecture: Qwen3MoeForCausalLM
(APIServer pid=7) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=7) INFO 09-18 17:27:23 [__init__.py:1766] Using max model len 65536
(APIServer pid=7) INFO 09-18 17:27:23 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 09-18 17:27:29 [__init__.py:216] Automatically detected platform rocm.
(EngineCore_DP0 pid=281) INFO 09-18 17:27:50 [core.py:648] Waiting for init message from front-end.
(EngineCore_DP0 pid=281) INFO 09-18 17:27:50 [core.py:75] Initializing a V1 LLM engine (v0.10.2rc3.dev180+g2a4d6412e) with config: model='/app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix', speculative_config=None, tokenizer='/app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=gptq, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":16,"local_cache_dir":null}
(EngineCore_DP0 pid=281) WARNING 09-18 17:27:50 [multiproc_worker_utils.py:273] Reducing Torch parallelism from 128 threads to 1 to avoid unnecess(EngineCore_DP0 pid=281) INFO 09-18 17:27:50 [core.py:75] Initializing a V1 LLM engine (v0.10.2rc3.dev180+g2a4d6412e) with config: model='/app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix', speculative_config=None, tokenizer='/app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=gptq, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":16,"local_cache_dir":null}
(EngineCore_DP0 pid=281) INFO 09-18 17:27:50 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3, 4, 5, 6, 7], buffer_handle=(8, 16777216, 10, 'psm_1eee8047'), local_subscribe_addr='ipc:///tmp/31218087-e190-494b-99f1-e648dcdd3640', remote_subscribe_addr=None, remote_addr_ipv6=False)
(EngineCore_DP0 pid=281) WARNING 09-18 17:27:50 [multiproc_worker_utils.py:273] Reducing Torch parallelism from 128 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 09-18 17:27:55 [__init__.py:216] Automatically detected platform rocm.
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_3891382a'), local_subscribe_addr='ipc:///tmp/57528fd2-f248-448b-8663-2ed5e57cceb9', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_d6b41d99'), local_subscribe_addr='ipc:///tmp/72e938d8-ecef-40fb-98be-b8ec37c6a953', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_ac6d0a6a'), local_subscribe_addr='ipc:///tmp/95dea33a-0b6c-45d6-8e14-5e5b92d322b4', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_8dcfe175'), local_subscribe_addr='ipc:///tmp/dc82cf23-3291-46e2-a3ef-cafef17ee764', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_a8c47ca3'), local_subscribe_addr='ipc:///tmp/004a6afc-84b6-4811-846a-0b5f7c99d48b', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_c81f9a48'), local_subscribe_addr='ipc:///tmp/08be96b6-4d62-42c8-a48a-40fc1d2c7662', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_b34f8ba8'), local_subscribe_addr='ipc:///tmp/aa406cb3-bcd1-4ce9-bcf1-85f84cd0c0c6', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:17 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_af8f8b2b'), local_subscribe_addr='ipc:///tmp/f2b0edc8-a99e-4315-9b24-c08c8b306d1b', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:18 [__init__.py:1439] Found nccl from library librccl.so.1
INFO 09-18 17:28:18 [pynccl.py:70] vLLM is using nccl==2.22.3
INFO 09-18 17:28:21 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_118bae48'), local_subscribe_addr='ipc:///tmp/0baccec2-b9f5-4316-9236-f9ce1c657ac4', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 1 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 4 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 4, EP rank 4
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 2 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 3 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 5 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 5, EP rank 5
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 6 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 6, EP rank 6
INFO 09-18 17:28:21 [parallel_state.py:1206] rank 7 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 7, EP rank 7
(Worker_TP4 pid=421) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP0 pid=417) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP1 pid=418) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP7 pid=424) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP3 pid=420) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP2 pid=419) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP5 pid=422) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP6 pid=423) INFO 09-18 17:28:21 [gpu_model_runner.py:2450] Starting to load model /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix...
(Worker_TP4 pid=421) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP7 pid=424) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP3 pid=420) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP2 pid=419) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP0 pid=417) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP1 pid=418) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP5 pid=422) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP6 pid=423) INFO 09-18 17:28:21 [gpu_model_runner.py:2482] Loading model from scratch...
(Worker_TP4 pid=421) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP4 pid=421) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
(Worker_TP7 pid=424) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP7 pid=424) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
(Worker_TP3 pid=420) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP3 pid=420) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
(Worker_TP2 pid=419) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP2 pid=419) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
(Worker_TP5 pid=422) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP5 pid=422) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
(Worker_TP1 pid=418) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP1 pid=418) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
(Worker_TP6 pid=423) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP6 pid=423) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
(Worker_TP0 pid=417) INFO 09-18 17:28:21 [rocm.py:245] Using Triton Attention backend on V1 engine.
(Worker_TP0 pid=417) INFO 09-18 17:28:21 [triton_attn.py:266] Using vllm unified attention for TritonAttentionImpl
Loading safetensors checkpoint shards:   7% 2/27 [00:22<04:46, 11.48s/it](Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 214, in load_model
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2483, in load_model
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model = model_loader.load_model(
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.load_weights(model, model_config)
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_weights = model.load_weights(
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 702, in load_weights
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return loader.load_weights(weights)
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 214, in load_model
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     yield from self._load_module(prefix,
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2483, in load_model
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model = model_loader.load_model(
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_params = module_load_weights(weights)
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 538, in load_weights
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.load_weights(model, model_config)
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     success = weight_loader(param,
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_lo(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_weights = model.load_weights(
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/moe_wna16.py", line 466, in moe_wna16_weight_loader
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return weight_loader(param,
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 702, in load_weights
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return loader.load_weights(weights)
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1451, in weight_loader
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_model_weight_or_group_weight_scale(
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1128, in _load_model_weight_or_group_weight_scale
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_w13(shard_id=shard_id,
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 214, in load_model
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 214, in load_model
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1171, in _load_w13
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     expert_data.copy_(loaded_weight)
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2483, in load_model
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2483, in load_model
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     yield from self._load_module(prefix,
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597] RuntimeError: The size of tensor a (2048) must match the size of tensor b (4096) at non-singleton dimension 1
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model = model_loader.load_model(
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.model = model_loader.load_model(
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_params = module_load_weights(weights)
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.load_weights(model, model_config)
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self.load_weights(model, model_config)
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 538, in load_weights
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     success = weight_loader(param,
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_weights = model.load_weights(
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_weights = model.load_weights(
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]               ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/moe_wna16.py", line 466, in moe_wna16_weight_loader
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 702, in load_weights
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 702, in load_weights
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return loader.load_weights(weights)
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return weight_loader(param,
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return loader.load_weights(weights)
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/u
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1451, in weight_loader
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_model_weight_or_group_weight_scale(
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1128, in _load_model_weight_or_group_weight_scale
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_w13(shard_id=shard_id,
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     yield from self._load_module(prefix,
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     yield from self._load_module(prefix,
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1171, in _load_w13
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     expert_data.copy_(loaded_weight)
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_params = module_load_weights(weights)
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     loaded_params = module_load_weights(weights)
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597] RuntimeError: The size of tensor a (2048) must match the size of tensor b (4096) at non-singleton dimension 1
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 538, in load_weights
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 538, in load_weights
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     success = weight_loader(param,
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     success = weight_loader(param,
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]               ^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]               ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/moe_wna16.py", line 466, in moe_wna16_weight_loader
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/moe_wna16.py", line 466, in moe_wna16_weight_loader
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return weight_loader(param,
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     return weight_loader(param,
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1451, in weight_loader
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1451, in weight_loader
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_model_weight_or_group_weight_scale(
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_model_weight_or_group_weight_scale(
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1128, in _load_model_weight_or_group_weight_scale
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1128, in _load_model_weight_or_group_weight_scale
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_w13(shard_id=shard_id,
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     self._load_w13(shard_id=shard_id,
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1171, in _load_w13
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1171, in _load_w13
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     expert_data.copy_(loaded_weight)
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]     expert_data.copy_(loaded_weight)
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597] RuntimeError: The size of tensor a (2048) must match the size of tensor b (4096) at non-singleton dimension 1
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597] RuntimeError: The size of tensor a (2048) must match the size of tensor b (4096) at non-singleton dimension 1
Loading safetensors checkpoint shards:   7% 2/27 [00:24<05:02, 12.10s/it]
(Worker_TP2 pid=419) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1 pid=418) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP3 pid=420) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=417) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP6 pid=423) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=417) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
(Worker_TP3 pid=420) ERROR 09-18 17:28:46 [multiproc_executor.py:597]               ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=418) ERROR 09-18 17:28:46 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP2 pid=419) ERROR 09-18 17:28:46 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
(Worker_TP4 pid=421) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP7 pid=424) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP5 pid=422) INFO 09-18 17:28:46 [multiproc_executor.py:558] Parent process exited, terminating worker
[rank0]:[W918 17:28:46.977770380 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712] EngineCore failed to start.
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712] Traceback (most recent call last):
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 703, in run_engine_core
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 502, in __init__
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 81, in __init__
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 55, in __init__
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]     self._init_executor()
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712]     raise e from None
(EngineCore_DP0 pid=281) ERROR 09-18 17:28:50 [core.py:712] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=281) Process EngineCore_DP0:
(EngineCore_DP0 pid=281) Traceback (most recent call last):
(EngineCore_DP0 pid=281)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=281)     self.run()
(EngineCore_DP0 pid=281)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=281)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=281)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 716, in run_engine_core
(EngineCore_DP0 pid=281)     raise e
(EngineCore_DP0 pid=281)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 703, in run_engine_core
(EngineCore_DP0 pid=281)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=281)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=281)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 502, in __init__
(EngineCore_DP0 pid=281)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=281)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 81, in __init__
(EngineCore_DP0 pid=281)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=281)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=281)   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 55, in __init__
(EngineCore_DP0 pid=281)     self._init_executor()
(EngineCore_DP0 pid=281)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=281)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=281)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=281)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=281)     raise e from None
(EngineCore_DP0 pid=281) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=7) Traceback (most recent call last):
(APIServer pid=7)   File "/usr/local/bin/vllm", line 7, in <module>
(APIServer pid=7)     sys.exit(main())
(APIServer pid=7)              ^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=7)     args.dispatch_function(args)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=7)     uvloop.run(run_server(args))
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=7)     return __asyncio.run(
(APIServer pid=7)            ^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=7)     return runner.run(main)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=7)     return await main
(APIServer pid=7)            ^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1859, in run_server
(APIServer pid=7)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1879, in run_server_worker
(APIServer pid=7)     async with build_async_engine_client(
(APIServer pid=7)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=7)     return await anext(self.gen)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)     return runner.run(main)on3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 174, in build_async_engine_client
(APIServer pid=7)     async with build_async_engine_client_from_engine_args(
(APIServer pid=7)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 216, in build_async_engine_client_from_engine_args
(APIServer pid=7)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=7)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 1595, in inner
(APIServer pid=7)     return fn(*args, **kwargs)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=7)     return cls(
(APIServer pid=7)            ^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=7)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=7)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=7)     return AsyncMPClient(*client_args)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=7)     super().__init__(
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=7)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=7)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=7)     next(self.gen)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=7)     wait_for_engine_startup(
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=7)     return self._loop.run_until_complete(task)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 174, in build_async_engine_client
(APIServer pid=7)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=7) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
vllm-1 exited with code 1

I’ve just checked the ROCm/vllm version, and it includes the necessary code for Mixed GPTQ quantization. However, since our team does not have access to AMD ROCm devices, I can only provide the following suggestions:

1. Change HIP_VISIBLE_DEVICES=1,5,2,3,4,7,6,0 to HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7.

2. In vllm serve, use --tp 8 with --enable-expert-parallel instead of -pp 4.
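Untested on our side since we have no ROCm hardware, but with those two changes the serve command would look roughly like this (the other flags from your compose file kept as they are):

vllm serve /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.965 \
  --max-model-len 32768 \
  --max-num-seqs 8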

Let me know if these steps help.

Got the same error. I also tried:
-pp 8 -tp 1
-pp 4 -tp 2
-pp 2 -tp 4
-pp 1 -tp 8

Reordering HIP_VISIBLE_DEVICES is what lets me put the main load on the first two R9700 cards, but I also changed it to the order you advised.

Loading safetensors checkpoint shards:   7% 2/27 [00:01<00:16,  1.52it/s]
vllm-1  | (Worker_PP4 pid=416) INFO 09-19 10:08:45 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-1  | (Worker_PP2 pid=414) INFO 09-19 10:08:45 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-1  | (Worker_PP3 pid=415) INFO 09-19 10:08:45 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-1  | (Worker_PP6 pid=418) INFO 09-19 10:08:45 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-1  | (Worker_PP0 pid=412) INFO 09-19 10:08:45 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-1  | (Worker_PP1 pid=413) INFO 09-19 10:08:45 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-1  | (Worker_PP5 pid=417) INFO 09-19 10:08:45 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     self.load_weights(model, model_config)
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     loaded_weights = model.load_weights(
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]                      ^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 702, in load_weights
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     return loader.load_weights(weights)
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     autoloaded_weights = set(self._load_module("", self.module, weights))
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     yield from self._load_module(prefix,
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     loaded_params = module_load_weights(weights)
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_moe.py", line 538, in load_weights
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     success = weight_loader(param,
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]               ^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/moe_wna16.py", line 477, in moe_wna16_weight_loader
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     return weight_loader(param,
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1488, in weight_loader
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     self._load_model_weight_or_group_weight_scale(
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1165, in _load_model_weight_or_group_weight_scale
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     self._load_w13(shard_id=shard_id,
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1208, in _load_w13
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597]     expert_data.copy_(loaded_weight)
vllm-1  | (Worker_PP0 pid=412) ERROR 09-19 10:08:45 [multiproc_executor.py:597] RuntimeError: The size of tensor a (2048) must match the size of tensor b (4096) at non-singleton dimension 1

Maybe hidden_act or hidden_size needs to be set up to solve the problem.

It looks like the loader tries to load a hidden_size of 4096 but the layer only has 2048.
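One quick way to see what the checkpoint itself declares (assuming the model path from the compose file above, and that config.json carries the usual Qwen3-MoE and GPTQ fields) is:

grep -E '"(hidden_size|moe_intermediate_size|num_experts|bits|group_size)"' \
  /app/models/models/vllm/Qwen3-235B-A22B-Instruct-2507-GPTQ-Int4-Int8Mix/config.json

That at least shows whether the 4096 in the error comes from hidden_size in the checkpoint config.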
