System hardening leads to CVE-2015-3341 and fun with DTrace

Here at FreeAgent, security is a never-ending project. We are never finished, and we are always looking for ways to harden our platform, ensuring data is kept safe. Security should not be taken for granted. It is not just about technical mitigations or fancy enterprise firewalls; developing the right set of processes and procedures is equally important.

One important aspect is testing your security controls. You need to be sure they do what you think they do!

This is how we stumbled on CVE-2015-3341¹, a privilege escalation bug in the Illumos kernel that affected all the derived distributions including SmartOS and OmniOS. I contacted Oracle who confirmed Solaris 10 was unaffected (no pfexecd) and that Solaris 11.2 is fixed.

RBAC, Privileges and Profiles

We were looking to harden our services such that if a malicious attacker could gain control of a process through a vulnerability, their actions would be limited. This is the general idea behind the least privilege model and not running services as root.

Our issue is that gaining access as a regular user grants too much freedom to an attacker. There are too many lines of code and numerous avenues for an attacker to pivot, or escalate privileges and cause more damage as a regular Unix user.

We wanted to limit an attacker's access and scope for further exploitation using RBAC (Role-Based Access Control) and privileges, both of which are built into the Illumos kernel.

RBAC allows you to create roles and assign authorizations and privileges to them; these roles can in turn be assigned to users. Briefly, privileges allow fine-grained control over the actions of processes, essentially breaking up all the superpowers of the root user.

For example, there is the PRIV_NET_PRIVADDR privilege, which allows a process to bind to a privileged port (lower than 1024). There is no need to give a process all of the uid=0 superpowers if it only needs to bind to port 80.
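To make the idea concrete, here is a minimal sketch in C of a privilege check modelled as a bitmask. It is only an illustration: the MODEL_* flags and model_can_bind are hypothetical names invented for this example, and the kernel's real privilege sets are not laid out this way.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit flags standing in for individual privileges.
 * These are NOT the kernel's definitions, just a toy model.
 */
enum {
    MODEL_NET_PRIVADDR = 1 << 0,
    MODEL_PROC_EXEC    = 1 << 1,
    MODEL_PROC_FORK    = 1 << 2
};

/*
 * Decide whether a process holding the given privileges may bind to a
 * port. Ports below 1024 need the (modelled) net_privaddr privilege;
 * nothing else about the process matters for this decision.
 */
static bool model_can_bind(uint32_t privs, unsigned port)
{
    if (port >= 1024)
        return true;
    return (privs & MODEL_NET_PRIVADDR) != 0;
}
```

In this model a process holding only MODEL_NET_PRIVADDR can bind to port 80, while a process with no privileges at all can still bind to port 8080 but not to port 80.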

Locking Down

By default a normal Unix user has certain privileges. These are:

file_link_any
file_read
file_write
net_access
proc_exec
proc_fork
proc_info
proc_session

See the man page for privileges on an Illumos-based system for more information.

This set of privileges, among other things, allows a user to call the exec system call to execute binaries, to create sockets to access the network, and to list running processes on the system. If our application process was compromised and the attacker gained a shell, they could launch commands and access the network, generally causing more mischief.

Stripping away some of these privileges in Illumos involves not applying the Basic Solaris User profile. Normally a Unix user can execute any command via the exec system call, but without the Basic Solaris User profile this is no longer the case. In fact, with no other configuration the user cannot exec any program, with the caveat that the user must be using a profile shell and/or a process with PRIV_PFEXEC set.

You can see how this could be useful to lock down access. We can launch our application and control not only which binaries it can run, but also what privileges they can run with.

Testing

Without going into the setup details, we configured a mock application. Then we mimicked a break-in and tested it from an attacker's perspective. It was during this testing that we discovered it only worked intermittently: sometimes I would be restricted and couldn’t run even the most basic of commands, other times I had full access.

For example, in the following session we su to a different user with a profile that should disallow almost all exec system calls. The first time it works correctly and we get “Permission denied” errors. The ls and ifconfig commands are denied. The second time we switch to the user, all commands are allowed and we can run ls and ifconfig.

[root@web1-dev ~]# su - deploy
-bash: /opt/local/bin/grep: Permission denied
-bash: /opt/local/bin/id: Permission denied
-bash: [: -eq: unary operator expected
-bash: /opt/local/bin/id: Permission denied
-bash: /opt/local/bin/ls: Permission denied
-bash: /opt/local/bin/id: Permission denied
-bash: /opt/local/bin/id: Permission denied
[deploy@web1-dev(vmlocal):~]$ ls
-bash: /opt/local/bin/ls: Permission denied
[deploy@web1-dev(vmlocal):~]$ ifconfig
-bash: /usr/sbin/ifconfig: Permission denied
[deploy@web1-dev(vmlocal):~]$ exit
logout
[root@web1-dev ~]# su - deploy
-bash: /opt/local/bin/id: Permission denied
[deploy@web1-dev(vmlocal):~]$ ls
[deploy@web1-dev(vmlocal):~]$ ifconfig
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
net0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet 192.168.224.203 netmask ffffff00 broadcast 192.168.224.255
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128

Given the lack of clear documentation, this is where we started code-diving and using the awesome tool DTrace to figure out what was happening. Once you start using DTrace, you can’t stop! Every problem looks like a problem ready to be solved with some DTrace foo. It’s totally addictive.

Bug find #1

OK, so I have a hunch at this stage: I must be missing some configuration. I start looking at the code and using DTrace to print out the kernel function calls to get an idea of the code path taken. I want to see if I can figure out what configuration I need to tweak.

It turns out that the kernel exec function executes a function called pfexec_call which calls out to a userspace daemon called pfexecd.

The pfexecd daemon checks the exec_attr database. This is where the configuration that allows or denies the exec and sets the privileges it should run with is stored. Great! Let’s DTrace that then:

#!/usr/sbin/dtrace -s

/* Shows the path to a binary passed to pfexec_call in the kernel exec system call and the return code after the call
   Useful when configuring restrictive profiles to see what is allowed/disallowed or debugging RBAC in general */

#pragma D option quiet

fbt::pfexec_call:entry
{
  /* Path passed to pfexec_call */
  printf("\nPath: %s ", stringof(args[1]->pn_buf));
}

fbt::pfexec_call:return
{
  /* Value returned by pfexec_call (args[1] in an fbt return probe) */
  printf("Returned %d", args[1]);
}

Using the above I can see that when things are working as expected, the pfexec_call returns 13 (EACCES). When I can execute binaries that I shouldn’t be able to, it returns -1.

Looking at what actually happens when -1 is returned, we end up falling all the way through the rest of the function and allowing the exec call. A return value of -1 is not treated as an error code on which to reject access.

This is consistent with the behaviour we have seen. The return code from pfexec_call is checked and only if it is greater than 0 is permission denied. As can be seen in the following snippet, a return code of -1 does not trigger the permission denied code path.

error = pfexec_call(p->p_cred, &resolvepn, &args.pfcred,
    &args.scrubenv);

/* Returning errno in case we're not allowed to execute. */
if (error > 0) {
    if (dir != NULL)
        VN_RELE(dir);
    pn_free(&resolvepn);
    VN_RELE(vp);
    goto out;
}
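The effect of that comparison can be sketched in a few lines of standalone C. This is a model of the check only, not the Illumos source; exec_denied is a hypothetical helper name, and rc stands for the value returned by pfexec_call.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the caller's decision: the exec is rejected only when the
 * return code is a positive errno value, as in "if (error > 0)".
 */
static bool exec_denied(int rc)
{
    return rc > 0;
}
```

With this model, exec_denied(EACCES) is true, while both exec_denied(0) and exec_denied(-1) are false, so a return code of -1 is indistinguishable from success as far as the permission check is concerned.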

pfexec_call is implemented in klpd.c and calls out to the userspace pfexecd daemon. So we know the issue lies somewhere in klpd.c or pfexecd.

Looking at the implementation of pfexec_call, one bug is evident and appears to be a typo. After making the call to pfexecd, the in-kernel pfexec_call function checks the reply for sanity, making sure the reply is the expected size and that the pointers to the privilege sets in the reply are at expected offsets.

prp = (pfexec_reply_t *)da.rbuf;
/*
 * Check the size of the result and the alignment of the
 * privilege sets.
 */
if (da.rsize < sizeof (pr) ||
    prp->pfr_ioff > da.rsize - sizeof (priv_set_t) ||
    prp->pfr_loff > da.rsize - sizeof (priv_set_t) ||
    (prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) != 0 ||
    (prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) != 0)
    goto out;

Notice that in the alignment check (the last two conditions), prp->pfr_loff is checked twice and the alignment of prp->pfr_ioff is never checked.

So that is bug number 1, but it is not the cause of our issue. Without a fix though, a corrupt reply where prp->pfr_ioff is invalid could make it through the sanity check.
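For illustration, the intended check can be sketched as standalone C in which both offsets go through the same helper, so neither can be skipped by a copy-and-paste slip. The types below (chunk_t, set_t, reply_offsets_t) are hypothetical stand-ins for priv_chunk_t, priv_set_t and the offset fields of pfexec_reply_t, with sizes assumed for the example; this is not the patch that went into Illumos.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the kernel headers. */
typedef uint32_t chunk_t;                  /* models priv_chunk_t */
typedef struct { chunk_t c[3]; } set_t;    /* models priv_set_t (size assumed) */

typedef struct {
    uint32_t ioff;   /* models pfr_ioff */
    uint32_t loff;   /* models pfr_loff */
} reply_offsets_t;

/* One offset is valid if a whole set fits at it and it is chunk-aligned. */
static bool offset_ok(uint32_t off, size_t rsize)
{
    if (rsize < sizeof (set_t))
        return false;
    if (off > rsize - sizeof (set_t))
        return false;
    return (off & (sizeof (chunk_t) - 1)) == 0;
}

/* Both offsets are validated through the same helper. */
static bool reply_offsets_ok(const reply_offsets_t *r, size_t rsize)
{
    return offset_ok(r->ioff, rsize) && offset_ok(r->loff, rsize);
}
```

With this shape, a reply whose ioff is misaligned is rejected in exactly the same way as one whose loff is misaligned.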

Bug find #2

Looking closer at pfexec_call we spot a second problem. At the beginning of the function, err is initialised to -1:

int pfexec_call(const cred_t *cr, struct pathname *rpnp, cred_t **pfcr,
    boolean_t *scrub)
{
    klpd_reg_t *pfd;
    pfexec_arg_t *pap;
    pfexec_reply_t pr, *prp;
    door_arg_t da;
    int dres;
    cred_t *ncr = NULL;
    int err = -1;
    priv_set_t *iset;
    priv_set_t *lset;
    zone_t *myzone = crgetzone(CRED());
    size_t pasize = PFEXEC_ARG_SIZE(MAXPATHLEN);

If the alignment/size checks of the reply from pfexecd fail, in other words we have a corrupt reply from pfexecd, we hit the goto out statement and jump to the end of the function. The function then returns err, which is still set to its default of -1. It is never set to an error value that would cause the calling code to reject the exec call.

So bug number 2: a bad pfexecd reply allows access regardless of the configuration.
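The control flow can be modelled in a few lines of standalone C. This is a simplified sketch, not the kernel function: model_pfexec_call and its parameters are hypothetical, and passing EACCES as the initial value is just one possible way to fail closed, not necessarily how the issue was fixed upstream.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified model of the flow described above.
 *   reply_ok    - did the daemon's reply pass the sanity check?
 *   allowed     - the daemon's verdict, only meaningful if reply_ok
 *   initial_err - the value err is initialised with
 */
static int model_pfexec_call(bool reply_ok, bool allowed, int initial_err)
{
    int err = initial_err;

    if (!reply_ok)
        goto out;               /* err keeps whatever it started with */

    err = allowed ? 0 : EACCES;
out:
    return err;
}
```

With an initial value of -1, a corrupt reply produces -1, which the caller's "greater than 0" check does not treat as a denial. With an initial value of EACCES in this model, the same corrupt reply produces EACCES and the exec is rejected.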

Again using DTrace, I essentially wired up printfs to various functions called in pfexec_call before and after this sanity check. Sure enough, when access was allowed and it should have been denied, none of the function calls after this sanity check were executed, but the calls before it were. So we are pretty sure we are in fact hitting this goto out statement and not correctly setting an error code.

The remaining question then is why is the pfexecd reply invalid and failing the sanity check?

Bug find #3

I may be missing something here but I couldn’t figure out if it was possible to use DTrace to check the values in this sanity check in the kernel easily. If you know a way, let me know! mdb might have been the way to go.

I ended up applying the following tweak and building a new kernel:

if (da.rsize < sizeof (pr) ||
    prp->pfr_ioff > da.rsize - sizeof (priv_set_t) ||
    prp->pfr_loff > da.rsize - sizeof (priv_set_t) ||
    (prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) != 0 ||
    (prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) != 0) {

        /* DEBUG */
        cmn_err(CE_NOTE, "pfexec_call: Size of alignment of priv set incorrect");
        cmn_err(CE_NOTE, "da.rsize %lu < sizeof(pr) %lu", da.rsize, (unsigned long) sizeof (pr));
        cmn_err(CE_NOTE, "prp->pfr_ioff %u > %lu", prp->pfr_ioff, da.rsize - sizeof (priv_set_t));
        cmn_err(CE_NOTE, "prp->pfr_loff %u > %lu", prp->pfr_loff, da.rsize - sizeof (priv_set_t));
        cmn_err(CE_NOTE, "prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = %lu", (unsigned long) (prp->pfr_loff & (sizeof (priv_chunk_t) - 1)));
        cmn_err(CE_NOTE, "prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = %lu", (unsigned long) (prp->pfr_loff & (sizeof (priv_chunk_t) - 1)));
        goto out;
}

Now, running through the test again, we get some useful debug output!

2015-04-14T13:39:58.108382+00:00 hyp2 genunix: [ID 560343 kern.notice] NOTICE: pfexec_call: Size of alignment of priv set incorrect
2015-04-14T13:39:58.108414+00:00 hyp2 genunix: [ID 882263 kern.notice] NOTICE: da.rsize 48 < sizeof(pr) 48
2015-04-14T13:39:58.108417+00:00 hyp2 genunix: [ID 931590 kern.notice] NOTICE: prp->pfr_ioff 0 > 36
2015-04-14T13:39:58.108419+00:00 hyp2 genunix: [ID 931638 kern.notice] NOTICE: prp->pfr_loff 48 > 36
2015-04-14T13:39:58.108420+00:00 hyp2 genunix: [ID 369643 kern.notice] NOTICE: prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = 0
2015-04-14T13:39:58.108422+00:00 hyp2 genunix: [ID 369643 kern.notice] NOTICE: prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = 0
2015-04-14T13:39:58.114119+00:00 hyp2 genunix: [ID 560343 kern.notice] NOTICE: pfexec_call: Size of alignment of priv set incorrect
2015-04-14T13:39:58.114150+00:00 hyp2 genunix: [ID 882263 kern.notice] NOTICE: da.rsize 48 < sizeof(pr) 48
2015-04-14T13:39:58.114153+00:00 hyp2 genunix: [ID 931590 kern.notice] NOTICE: prp->pfr_ioff 0 > 36
2015-04-14T13:39:58.114155+00:00 hyp2 genunix: [ID 931638 kern.notice] NOTICE: prp->pfr_loff 48 > 36
2015-04-14T13:39:58.114157+00:00 hyp2 genunix: [ID 369643 kern.notice] NOTICE: prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = 0
2015-04-14T13:39:58.114159+00:00 hyp2 genunix: [ID 369643 kern.notice] NOTICE: prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = 0
2015-04-14T13:39:59.675967+00:00 hyp2 genunix: [ID 560343 kern.notice] NOTICE: pfexec_call: Size of alignment of priv set incorrect
2015-04-14T13:39:59.676006+00:00 hyp2 genunix: [ID 882263 kern.notice] NOTICE: da.rsize 48 < sizeof(pr) 48
2015-04-14T13:39:59.676010+00:00 hyp2 genunix: [ID 931590 kern.notice] NOTICE: prp->pfr_ioff 0 > 36
2015-04-14T13:39:59.676012+00:00 hyp2 genunix: [ID 931638 kern.notice] NOTICE: prp->pfr_loff 48 > 36
2015-04-14T13:39:59.676014+00:00 hyp2 genunix: [ID 369643 kern.notice] NOTICE: prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = 0
2015-04-14T13:39:59.676016+00:00 hyp2 genunix: [ID 369643 kern.notice] NOTICE: prp->pfr_loff & (sizeof (priv_chunk_t) - 1)) = 0

Bingo! It is this sanity check that is failing and causing commands to be executed when they should be denied. Specifically, we can now see that prp->pfr_loff is incorrect: it is 48 and should never be larger than 36.
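To see which clause fires, the logged numbers can be plugged into a small standalone model of the check. The function below is a hypothetical helper written for this walkthrough. The set size of 12 follows from the log (48 - 36), while the chunk size of 4 is an assumption; the ioff alignment clause is written as it was presumably intended rather than with the typo.

```c
#include <stddef.h>

/*
 * Returns the 1-based number of the first clause of the sanity check
 * that rejects the reply, or 0 when the reply passes. Assumes
 * rsize >= set_size whenever the first clause passes.
 */
static int first_failing_clause(size_t rsize, size_t pr_size,
    size_t set_size, size_t chunk_size, size_t ioff, size_t loff)
{
    if (rsize < pr_size)
        return 1;                       /* reply too small */
    if (ioff > rsize - set_size)
        return 2;                       /* pfr_ioff out of bounds */
    if (loff > rsize - set_size)
        return 3;                       /* pfr_loff out of bounds */
    if ((ioff & (chunk_size - 1)) != 0)
        return 4;                       /* pfr_ioff misaligned */
    if ((loff & (chunk_size - 1)) != 0)
        return 5;                       /* pfr_loff misaligned */
    return 0;
}
```

Calling first_failing_clause(48, 48, 12, 4, 0, 48) with the values from the log returns 3: the size clause passes (48 is not less than 48), pfr_ioff passes (0 is not greater than 36), and pfr_loff fails because 48 is greater than 36.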

Root cause

Digging further, why is the reply from pfexecd malformed?

Following the same process as before, I used DTrace to poke the pfexecd daemon and watch its execution, which helped make sense of the code. When I knew I was in the right area, I made a small patch to pfexecd that printed some useful information.

Previously we had identified that the alignment check was failing because the pfexecd reply contained invalid data. This was compounded by the fact a suitable error code never got set, resulting in the exec being allowed.

Looking at the debug output from the patched pfexecd helps narrow down the problem:

at function start res->pfr_loff 0
at function start res->pfr_allowed B
mysz: 48
at function start res->pfr_loff 48
at function start res->pfr_allowed B
pfr_allowed is false, returning
at ret: res->pfr_loff = 0
at ret: res->pfr_ioff = 0
pfr_allowed is false, returning
at ret: res->pfr_loff = 48
at ret: res->pfr_ioff = 0
mysz: 48
at function start res->pfr_loff 48
at function start res->pfr_allowed B
pfr_allowed is false, returning
at ret: res->pfr_loff = 48
at ret: res->pfr_ioff = 0
mysz: 48

In some of the calls to pfexec_callback, the pfexecd function that handles the call from the kernel, res->pfr_loff is already 48. The invalid value is there before the function really starts to do anything meaningful.

This is the source of the bug. If we look at the start of the function, we can see the memory for the reply is allocated using alloca. The alloca function allocates space in the stack frame of the caller.

At no point does the code in the function then set res->pfr_loff to any value before returning.

So the issue is that the memory allocated by alloca has not been initialised to sensible defaults, and is essentially full of junk data.
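As a sketch of the general remedy, the snippet below zeroes an alloca'd structure before any field is used. It is a standalone illustration with a hypothetical reply_t layout and function name, and it assumes zeroing is an acceptable default. It is not the pfexecd source, nor necessarily the patch that was applied upstream.

```c
#include <alloca.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply layout, loosely modelled on the fields discussed above. */
typedef struct {
    uint32_t allowed;
    uint32_t ioff;
    uint32_t loff;
} reply_t;

/*
 * Builds a "denied" reply on the stack and returns the loff value a
 * receiver would see. alloca does not clear the memory it hands out,
 * so the memset is what makes every field well defined.
 */
static uint32_t denied_reply_loff(void)
{
    reply_t *res = alloca(sizeof (*res));

    memset(res, 0, sizeof (*res));  /* without this, loff is whatever was on the stack */
    res->allowed = 0;
    return res->loff;
}
```

Without the memset, the value read back from loff would be indeterminate, which matches the intermittent behaviour seen in the debug output: sometimes the stale stack contents happen to be 0, sometimes 48.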

This is the smoking gun and bug number 3.

Fixed

The good news is the issue is now fixed in Illumos and SmartOS. The people at Joyent were incredibly helpful in confirming the issue and getting a patch added to Illumos, so many thanks!

Conclusion

We have some more work to do in this area but it looks like RBAC can be used to harden services against intrusion.

If there is any interest in this we may follow up with a further blog post about it.

This process was a good reminder that whatever security controls you rely on, you should test them, both internally through review and externally through penetration tests. And test regularly! Software is not a static entity: an otherwise innocuous update or feature addition could also add a bug that can be exploited to circumvent your defences.

Trying to find the root cause of the issue validated one of our reasons for selecting SmartOS: a lot of engineering effort has been expended in making the operating system observable and (relatively) easy to debug. Great tools like DTrace and the MDB debugger, alongside the well-commented code, provide insight when things inevitably go wrong.

¹ The CVE number was requested and allocated to Illumos for the identification and tracking of this issue. It appears the CVE information was never publicly released, but the issue has been fixed.

Grinding Gears

Tales of code crunching from the FreeAgent Engineering team