Ecosyste.ms: Advisories

An open API service providing security vulnerability metadata for many open source software ecosystems.

Security Advisories: GSA_kwCzR0hTQS14cjdyLWY4eHEtdmZ2ds4AA4-u

runc vulnerable to container breakout through process.cwd trickery and leaked fds

Impact

In runc 1.1.11 and earlier, due to an internal file descriptor leak, an attacker could cause a newly-spawned container process (from runc exec) to have a working directory in the host filesystem namespace, allowing for a container escape by giving access to the host filesystem ("attack 2"). The same attack could be used by a malicious image to allow a container process to gain access to the host filesystem through runc run ("attack 1"). Variants of attacks 1 and 2 could be also be used to overwrite semi-arbitrary host binaries, allowing for complete container escapes ("attack 3a" and "attack 3b").

Strictly speaking, while attack 3a is the most severe from a CVSS perspective, attacks 2 and 3b are arguably more dangerous in practice because they allow for a breakout from inside a container as opposed to requiring a user execute a malicious image. The reason attacks 1 and 3a are scored higher is because being able to socially engineer users is treated as a given for UI:R vectors, despite attacks 2 and 3b requiring far more minimal user interaction (just reasonable runc exec operations on a container the attacker has access to). In any case, all four attacks can lead to full control of the host system.

Attack 1: process.cwd "mis-configuration"

In runc 1.1.11 and earlier, several file descriptors were inadvertently leaked internally within runc into runc init, including a handle to the host's /sys/fs/cgroup (this leak was added in v1.0.0-rc93). If the container was configured to have process.cwd set to /proc/self/fd/7/ (the actual fd can change depending on file opening order in runc), the resulting pid1 process will have a working directory in the host mount namespace and thus the spawned process can access the entire host filesystem. This alone is not an exploit against runc, however a malicious image could make any innocuous-looking non-/ path a symlink to /proc/self/fd/7/ and thus trick a user into starting a container whose binary has access to the host filesystem.

Furthermore, prior to runc 1.1.12, runc also did not verify that the final working directory was inside the container's mount namespace after calling chdir(2) (as we have already joined the container namespace, it was incorrectly assumed there would be no way to chdir outside the container after pivot_root(2)).

The CVSS score for this attack is CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:N (8.2, high severity).

Note that this attack requires a privileged user to be tricked into running a malicious container image. It should be noted that when using higher-level runtimes (such as Docker or Kubernetes), this exploit can be considered critical as it can be done remotely by anyone with the rights to start a container image (and can be exploited from within Dockerfiles using ONBUILD in the case of Docker).

Attack 2: runc exec container breakout

(This is a modification of attack 1, constructed to allow for a process inside a container to break out.)

The same fd leak and lack of verification of the working directory in attack 1 also apply to runc exec. If a malicious process inside the container knows that some administrative process will call runc exec with the --cwd argument and a given path, in most cases they can replace that path with a symlink to /proc/self/fd/7/. Once the container process has executed the container binary, PR_SET_DUMPABLE protections no longer apply and the attacker can open /proc/$exec_pid/cwd to get access to the host filesystem.

runc exec defaults to a cwd of / (which cannot be replaced with a symlink), so this attack depends on the attacker getting a user (or some administrative process) to use --cwd and figuring out what path the target working directory is. Note that if the target working directory is a parent of the program binary being executed, the attacker might be unable to replace the path with a symlink (the execve will fail in most cases, unless the host filesystem layout specifically matches the container layout in specific ways and the attacker knows which binary the runc exec is executing).

The CVSS score for this attack is CVSS:3.1/AV:L/AC:H/PR:L/UI:R/S:C/C:H/I:H/A:N (7.2, high severity).

Attacks 3a and 3b: process.args host binary overwrite attack

(These are modifications of attacks 1 and 2, constructed to overwrite a host binary by using execve to bring a magic-link reference into the container.)

Attacks 1 and 2 can be adapted to overwrite a host binary by using a path like /proc/self/fd/7/../../../bin/bash as the process.args binary argument, causing a host binary to be executed by a container process. The /proc/$pid/exe handle can then be used to overwrite the host binary, as seen in CVE-2019-5736 (note that the same #! trick can be used to avoid detection as an attacker). As the overwritten binary could be something like /bin/bash, as soon as a privileged user executes the target binary on the host, the attacker can pivot to gain full access to the host.

For the purposes of CVSS scoring:

As mentioned in attack 1, while 3b is scored lower it is more dangerous in practice as it doesn't require a user to run a malicious image.

Patches

runc 1.1.12 has been released, and includes patches for this issue. Note that there are four separate fixes applied:

Other Runtimes

We have discovered that several other container runtimes are either potentially vulnerable to similar attacks, or do not have sufficient protection against attacks of this nature. We recommend other container runtime authors look at our patches and make sure they at least add a getcwd() != ENOENT check as well as consider whether close_range(3, UINT_MAX, CLOSE_RANGE_CLOEXEC) before executing their equivalent of runc init is appropriate.

Workarounds

For attacks 1 and 2, only permit containers (and runc exec) to use a process.cwd of /. It is not possible for / to be replaced with a symlink (the path is resolved from within the container's mount namespace, and you cannot change the root of a mount namespace or an fs root to a symlink).

For attacks 1 and 3a, only permit users to run trusted images.

For attack 3b, there is no practical workaround other than never using runc exec because any binary you try to execute with runc exec could end up being a malicious binary target.

See Also

Credits

Thanks to Rory McNamara from Snyk for discovering and disclosing the original vulnerability (attack 1) to Docker, @lifubang from acmcoder for discovering how to adapt the attack to overwrite host binaries (attack 3a), and Aleksa Sarai from SUSE for discovering how to adapt the attacks to work as container breakouts using runc exec (attacks 2 and 3b).

Permalink: https://github.com/advisories/GHSA-xr7r-f8xq-vfvv
JSON: https://advisories.ecosyste.ms/api/v1/advisories/GSA_kwCzR0hTQS14cjdyLWY4eHEtdmZ2ds4AA4-u
Source: GitHub Advisory Database
Origin: Unspecified
Severity: High
Classification: General
Published: 12 months ago
Updated: 11 months ago


CVSS Score: 8.6
CVSS vector: CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H

EPSS Percentage: 0.01379
EPSS Percentile: 0.86004

Identifiers: GHSA-xr7r-f8xq-vfvv, CVE-2024-21626
References: Repository: https://github.com/opencontainers/runc
Blast Radius: 38.1

Affected Packages

go:github.com/opencontainers/runc
Dependent packages: 7,425
Dependent repositories: 27,022
Downloads:
Affected Version Ranges: >= 1.0.0-rc93, <= 1.1.11
Fixed in: 1.1.12
All affected versions: 1.0.0, 1.0.0-rc93, 1.0.0-rc94, 1.0.0-rc95, 1.0.1, 1.0.2, 1.0.3, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8, 1.1.9, 1.1.10, 1.1.11
All unaffected versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.8, 0.0.9, 0.1.0, 0.1.1, 1.1.12, 1.1.13, 1.1.14, 1.1.15, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4