OS-5886: pollheads are susceptible to use-after-free

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2017-01-06T19:05:09.000Z
Updated at:2022-08-05T18:26:13.230Z

People

Created by:Former user
Reported by:Former user

Description

While investigating various aspects of devpoll, it became clear that certain combinations of operations against a pollhead are not provably correct. The simplest case I could find was pollhead_delete() (for, say, a POLLREMOVE operation) racing against a pollhead_clean() and the subsequent freeing of that same pollhead.

I wrote up a contrived (and clumsy) test program to induce this behavior:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <poll.h>

/*
 * Compile as: gcc -o pollhead -pthread -lsocket pollhead.c
 */

int pfd, sockfd, started;
struct pollfd pollev;

pthread_mutex_t lock;

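void *racer(void *args) {

        /* Tell main() that this thread is up and running. */
        started = 1;
        /* Block until main() drops the lock, just before its close(). */
        pthread_mutex_lock(&lock);
        pthread_mutex_unlock(&lock);
        /* Issue the POLLREMOVE that races against close(sockfd). */
        pollev.fd = sockfd;
        pollev.events = POLLREMOVE;
        pollev.revents = 0;
        write(pfd, &pollev, sizeof (pollev));
        return (NULL);
}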


int main(void) {
        pthread_t thread_race;

        sockfd = socket(AF_UNIX, SOCK_DGRAM, 0);
        pfd = open("/dev/poll", O_RDWR, 0);

        /* Register the socket with /dev/poll so it gains a polldat. */
        pollev.fd = sockfd;
        pollev.events = POLLIN;
        pollev.revents = 0;
        write(pfd, &pollev, sizeof (pollev));

        /* Hold the lock so racer() stalls until we are ready to close. */
        pthread_mutex_init(&lock, NULL);
        pthread_mutex_lock(&lock);
        started = 0;
        pthread_create(&thread_race, NULL, racer, NULL);

        while (!started) {
                usleep(100000);
        }

        /* Release racer() and immediately close the socket under it. */
        pthread_mutex_unlock(&lock);
        close(sockfd);

        pthread_join(thread_race, NULL);
        return (0);
}

With 'kmem_flags/W 2' executed in kmdb during boot (which enables the 0xdeadbeef fill of freed kmem buffers), running the above program under this dtrace invocation (perhaps several times) should induce a panic:

dtrace -w -n 'pollnotify:return /pid == $target/ { trace(timestamp); chill(100000000); } pollhead_delete:entry /pid == $target/ { trace(timestamp); chill(200000000) } pollhead_clean:entry /pid == $target/ { trace(timestamp) } syscall::write:,syscall::close: /pid == $target/ { trace(timestamp) }' -c ./pollhead

The use-after-free is made apparent by the 0xdeadbeef pattern in %rax and %rbx:

panic[cpu2]/thread=ffffff02646ca860:
BAD TRAP: type=d (#gp General protection) rp=ffffff0008f28960 addr=ffffff0298b84b78


pollhead:
#gp General protection
addr=0xffffff0298b84b78
pid=10239, pc=0xfffffffffba6ebc4, sp=0xffffff0008f28a50, eflags=0x10212
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: dd2190
cr3: 11b752000
cr8: c

        rdi: fffffffffbcb8720 rsi: ffffff02aa792108 rdx: ffffff02646ca860
        rcx:                8  r8: ffffff024fb15580  r9: ffffff02500f83e0
        rax: deadbeefdeadbeef rbx: deadbeefdeadbf17 rbp: ffffff0008f28a80
        r10: fffffffffb856bf4 r11: ffffff025e180440 r12: ffffff02aa792108
        r13: fffffffffbcb8720 r14: ffffff03c85a39fc r15: ffffff0298b84b78
        fsb:                0 gsb: ffffff024fb2a580  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                d err:                0 rip: fffffffffba6ebc4
         cs:               30 rfl:            10212 rsp: ffffff0008f28a50
         ss:               38

ffffff0008f28840 unix:real_mode_stop_cpu_stage2_end+b1c3 ()
ffffff0008f28950 unix:trap+a70 ()
ffffff0008f28960 unix:_cmntrap+e6 ()
ffffff0008f28a80 genunix:pollhead_delete+54 ()
ffffff0008f28b50 poll:dpwrite+712 ()
ffffff0008f28b80 genunix:cdev_write+2d ()
ffffff0008f28c60 specfs:spec_write+4c1 ()
ffffff0008f28d00 genunix:fop_write+f3 ()
ffffff0008f28dd0 genunix:write+250 ()
ffffff0008f28e00 genunix:write32+1e ()
ffffff0008f28eb0 genunix:dtrace_systrace_syscall32+f5 ()
ffffff0008f28f10 unix:brand_sys_sysenter+1d3 ()

dumping to /dev/zvol/dsk/zones/dump, offset 65536, content: kernel
> $C
ffffff0008f28a80 pollhead_delete+0x54(ffffff029ca31068, ffffff02aa792108)
ffffff0008f28b50 dpwrite+0x712(2c00000000, ffffff0008f28d30, ffffff02633a28d8)
ffffff0008f28b80 cdev_write+0x2d(2c00000000, ffffff0008f28d30, ffffff02633a28d8)
ffffff0008f28c60 spec_write+0x4c1(ffffff02949fb800, ffffff0008f28d30, 0, ffffff02633a28d8, 0)
ffffff0008f28d00 fop_write+0xf3(ffffff02949fb800, ffffff0008f28d30, 0, ffffff02633a28d8, 0)
ffffff0008f28dd0 write+0x250(4, 8061610, 8)
ffffff0008f28e00 write32+0x1e(4, 8061610, 8)
ffffff0008f28eb0 dtrace_systrace_syscall32+0xf5(4, 8061610, 8, 646ca860, 1, 0, 8f28f10, fb8010d6)
ffffff0008f28f10 _sys_sysenter_post_swapgs+0x153()
> pollhead_delete+0x54::dis
pollhead_delete+0x2d:           addq   $0xfffffffffbcb8620,%r13 <plocks>
pollhead_delete+0x34:           movq   %rsi,%r12
pollhead_delete+0x37:           movq   %r13,%rdi
pollhead_delete+0x3a:           call   -0x20d1ff        <mutex_enter>
pollhead_delete+0x3f:           movq   (%rbx),%rax
pollhead_delete+0x42:           testq  %rax,%rax
pollhead_delete+0x45:           jne    +0x16    <pollhead_delete+0x5d>
pollhead_delete+0x47:           jmp    +0x2a    <pollhead_delete+0x73>
pollhead_delete+0x49:           nopl   0x0(%rax)
pollhead_delete+0x50:           leaq   0x28(%rax),%rbx
pollhead_delete+0x54:           movq   0x28(%rax),%rax
pollhead_delete+0x58:           testq  %rax,%rax
pollhead_delete+0x5b:           je     +0x16    <pollhead_delete+0x73>
pollhead_delete+0x5d:           cmpq   %rax,%r12
pollhead_delete+0x60:           jne    -0x12    <pollhead_delete+0x50>
pollhead_delete+0x62:           movq   0x28(%r12),%rax
pollhead_delete+0x67:           movq   %rax,(%rbx)
pollhead_delete+0x6a:           movq   $0x0,0x28(%r12)
pollhead_delete+0x73:           movq   %r13,%rdi
pollhead_delete+0x76:           call   -0x20d11b        <mutex_exit>
pollhead_delete+0x7b:           movq   -0x28(%rbp),%rbx
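
The register state can be checked against the disassembly with a little arithmetic. The sketch below is illustrative only: the helper names are made up for this note, and it assumes 48-bit (4-level) virtual addressing, which is an assumption about the machine rather than something stated in the dump. It shows that %rbx is %rax plus the 0x28 offset from the leaq at +0x50, and that the address used by the movq at +0x54 is non-canonical, which would explain a #gp rather than a page fault.

```c
#include <assert.h>
#include <stdint.h>

/* Offset used by the faulting instruction: movq 0x28(%rax),%rax */
#define	NEXT_OFFSET	0x28ULL

/* Effective address that the faulting load would have used. */
static uint64_t
fault_address(uint64_t rax)
{
	return (rax + NEXT_OFFSET);
}

/*
 * Assuming 48-bit virtual addresses, an address is canonical only if
 * bits 63..47 are all zeros or all ones.
 */
static int
is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return (top == 0 || top == 0x1ffff);
}
```

Feeding in the %rax value from the panic (deadbeefdeadbeef) yields the %rbx value (deadbeefdeadbf17), while an ordinary kernel address from the same dump, such as ffffff0298b84b78, passes the canonical check.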

The issue is that access to the polldat_t is not adequately synchronized. In this instance, the POLLREMOVE action is allowed to progress as far as loading pd_php from the polldat_t and passing it to pollhead_delete(). At that point the dtrace chill() takes hold, allowing the close(2) of the socket to progress. Since pollhead_delete() has not yet begun its work, the pollhead_clean() operation is allowed to complete (grabbing and releasing the pollhead lock), which removes the polldat_t association from the pollhead, after which the pollhead_t is freed. Once the chill() completes, pollhead_delete() resumes against the now-stale pollhead_t pointer, inducing the panic.
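
The ordering described above can be replayed deterministically in userland. This is a toy model rather than kernel code: the model_* types, the ph_freed flag (standing in for a real free), and the helper names are all hypothetical, and the locking is omitted entirely since the point is that the pd_php load happens outside any lock.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
typedef struct model_polldat model_polldat_t;

typedef struct model_pollhead {
	model_polldat_t	*ph_list;	/* head of the polldat list */
	int		ph_freed;	/* set in place of a real free */
} model_pollhead_t;

struct model_polldat {
	model_pollhead_t	*pd_php;	/* back pointer to pollhead */
	model_polldat_t		*pd_next;	/* next polldat on the list */
};

/* Step 1 (POLLREMOVE path): load pd_php with no lock held. */
static model_pollhead_t *
remover_load_php(model_polldat_t *pdp)
{
	return (pdp->pd_php);
}

/* Step 2 (close path): detach every polldat, then "free" the pollhead. */
static void
closer_clean_and_free(model_pollhead_t *php)
{
	while (php->ph_list != NULL) {
		model_polldat_t *pdp = php->ph_list;

		php->ph_list = pdp->pd_next;
		pdp->pd_next = NULL;
		pdp->pd_php = NULL;
	}
	php->ph_freed = 1;
}

/* Step 3: the remover resumes. Returns 1 if its pointer is stale. */
static int
remover_uses_stale(model_pollhead_t *saved)
{
	return (saved != NULL && saved->ph_freed);
}

/* Replay the steps in the order the chill() calls enforce. */
static int
replay_race(void)
{
	model_pollhead_t ph = { NULL, 0 };
	model_polldat_t pd = { &ph, NULL };
	model_pollhead_t *saved;

	ph.ph_list = &pd;
	saved = remover_load_php(&pd);	/* remover stalls after this */
	closer_clean_and_free(&ph);	/* close(2) runs to completion */
	return (remover_uses_stale(saved));
}

/* Contrast: if the close path finishes first, the remover sees NULL. */
static int
replay_serialized(void)
{
	model_pollhead_t ph = { NULL, 0 };
	model_polldat_t pd = { &ph, NULL };

	ph.ph_list = &pd;
	closer_clean_and_free(&ph);
	return (remover_uses_stale(remover_load_php(&pd)));
}
```

In this model replay_race() reports a stale pointer, while replay_serialized() does not, because the detached polldat no longer points at the pollhead. The window is between the unlocked pd_php load and the point where pollhead_delete() would take the pollhead lock.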

Comments

Comment by Dan McDonald
Created at 2022-07-25T19:13:03.974Z

AKA https://www.illumos.org/issues/13700


Comment by Dan McDonald
Created at 2022-08-05T18:26:13.230Z

13700 is now upstream.