CVE-2026-53923 - Vulnerability Details

vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
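The arithmetic behind the description can be sketched in a few lines. This is only a model of the narrowing conversion, not the vLLM kernel code: the helper names and the tensor size are hypothetical, and it assumes the usual two's-complement wraparound when a 64-bit count is stored in a 32-bit `int`.

```python
import ctypes


def truncate_to_int32(n: int) -> int:
    """Mimic a C-style narrowing conversion from int64_t to a 32-bit int."""
    # ctypes performs no overflow checking, so the value wraps modulo 2**32.
    return ctypes.c_int32(n).value


def elements_left_unwritten(total_elements: int) -> int:
    """Count output elements skipped when a loop runs over the truncated size."""
    processed = truncate_to_int32(total_elements)
    # A negative count means the processing loop would never execute.
    processed = max(processed, 0)
    return total_elements - processed


# Hypothetical element count chosen to exceed the 32-bit range.
total = 2**32 + 1024
print(truncate_to_int32(total))        # 1024
print(elements_left_unwritten(total))  # 4294967296
```

In this model, an output buffer sized for `total` elements would have only its first 1024 elements written; the rest would keep whatever the allocator handed back, which is the disclosure path the description outlines.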

Attack Vector: Network

Attack Complexity: Low

Privileges Required: None

Attack Requirements: None

User Interaction: Passive

Vulnerable System Confidentiality Impact: Low

Vulnerable System Integrity Impact: Low

Vulnerable System Availability Impact: None

Subsequent System Confidentiality Impact: None

Subsequent System Integrity Impact: None

Subsequent System Availability Impact: None

No CVSS v3.1

No CVSS v3.0

No CVSS v2

This CVE is not in the KEV list.

No EPSS score available.

Key SSVC decision points have not yet been added.

Default status is the baseline for the product; each version can override it (e.g. patched versions are marked unaffected).

Vendor	Product	Default status
vllm-project	vllm	affected

Versions

Version	Status	Constraints
`>= 0.5.5, < 0.23.1rc0`	affected	—
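A deployment can be checked against this range with a short script. The sketch below is a simplified, standard-library-only comparison that understands only `X.Y.Z` and `X.Y.ZrcN` version strings; the function names are hypothetical, and other suffixes (such as post-releases or local build tags) are rejected rather than guessed at.

```python
import re

# Accepts "X.Y.Z" with an optional "rcN" suffix; anything else is rejected.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")


def version_key(version: str):
    """Turn a version string into a tuple that sorts release candidates first."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, patch, rc = match.groups()
    # A final release ranks above every release candidate of the same X.Y.Z.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def is_affected(version: str) -> bool:
    """True if the version falls in the range >= 0.5.5, < 0.23.1rc0."""
    return version_key("0.5.5") <= version_key(version) < version_key("0.23.1rc0")


for candidate in ("0.5.4", "0.5.5", "0.23.0", "0.23.1rc0", "0.23.1"):
    print(candidate, is_affected(candidate))
```

For real inventories, a full PEP 440 parser such as `packaging.version.Version` is likely the safer choice, since it handles the suffixes this sketch refuses.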


Advisories

Source	ID	Title
GitHub GHSA	GHSA-5jv2-g5wq-cmr4	vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving

Fixes

Solution

No solution given by the vendor.

Workaround

No workaround given by the vendor.

References

Link	Providers
https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e
https://github.com/vllm-project/vllm/pull/44971
https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4

History

Mon, 22 Jun 2026 22:45:00 +0000

Type	Values Removed	Values Added
Description		vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Title		vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
Weaknesses		CWE-200 CWE-681
References		https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e https://github.com/vllm-project/vllm/pull/44971 https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4
Metrics		cvssV4_0 `{'score': 5.3, 'vector': 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N'}`
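The vector string in the Metrics row encodes the same values listed in the metric breakdown on this page. As a minimal sketch, it can be split into its components as follows; the function names are hypothetical, and the label tables cover only the metrics and values that appear in this record, not the full CVSS v4.0 specification.

```python
METRIC_LABELS = {
    "AV": "Attack Vector",
    "AC": "Attack Complexity",
    "AT": "Attack Requirements",
    "PR": "Privileges Required",
    "UI": "User Interaction",
    "VC": "Vulnerable System Confidentiality Impact",
    "VI": "Vulnerable System Integrity Impact",
    "VA": "Vulnerable System Availability Impact",
    "SC": "Subsequent System Confidentiality Impact",
    "SI": "Subsequent System Integrity Impact",
    "SA": "Subsequent System Availability Impact",
}

# Only the values present in this record are spelled out; others pass through.
VALUE_LABELS = {"N": "None", "L": "Low", "P": "Passive"}
AV_VALUE_LABELS = {"N": "Network"}  # "N" means Network for Attack Vector


def parse_cvss4_vector(vector: str) -> dict:
    """Split a CVSS v4.0 vector string into {metric: value} abbreviations."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:4.0":
        raise ValueError(f"not a CVSS v4.0 vector: {vector!r}")
    metrics = {}
    for part in parts:
        key, _, value = part.partition(":")
        metrics[key] = value
    return metrics


def describe(vector: str) -> dict:
    """Expand abbreviations into the labels used in the metric breakdown."""
    described = {}
    for key, value in parse_cvss4_vector(vector).items():
        table = AV_VALUE_LABELS if key == "AV" else VALUE_LABELS
        described[METRIC_LABELS.get(key, key)] = table.get(value, value)
    return described


vector = "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N"
for label, value in describe(vector).items():
    print(f"{label}: {value}")
```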


MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published: 2026-06-22T21:55:42.001Z

Updated: 2026-06-22T21:55:42.001Z

Reserved: 2026-06-11T15:46:12.316Z

Link: CVE-2026-53923

Vulnrichment

No data.

NVD

No data.

Redhat

No data.

OpenCVE Enrichment

Updated: 2026-06-22T23:30:05Z
