• Varad Gautam's avatar
    ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry · a11ddb37
    Varad Gautam authored
    do_mq_timedreceive calls wq_sleep with a stack local address.  The
    sender (do_mq_timedsend) uses this address to later call pipelined_send.
    
    This leads to a very hard to trigger race where a do_mq_timedreceive
    call might return and leave do_mq_timedsend to rely on an invalid
    address, causing the following crash:
    
      RIP: 0010:wake_q_add_safe+0x13/0x60
      Call Trace:
       __x64_sys_mq_timedsend+0x2a9/0x490
       do_syscall_64+0x80/0x680
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7f5928e40343
    
    The race occurs as:
    
    1. do_mq_timedreceive calls wq_sleep with the address of `struct
       ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
       holds a valid `struct ext_wait_queue *` as long as the stack has not
       been overwritten.
    
    2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
       do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
       __pipelined_op.
    
    3. Sender calls __pipelined_op::smp_store_release(&this->state,
       STATE_READY).  Here is where the race window begins.  (`this` is
       `ewq_addr`.)
    
    4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
       will see `state == STATE_READY` and break.
    
    5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
       to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
       stack.  (Although the address may not get overwritten until another
       function happens to touch it, which means it can persist around for an
       indefinite time.)
    
    6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
       `struct ext_wait_queue *`, and uses it to find a task_struct to pass to
       the wake_q_add_safe call.  In the lucky case where nothing has
       overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
       In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
       bogus address as the receiver's task_struct causing the crash.
    
    do_mq_timedsend::__pipelined_op() should not dereference `this` after
    setting STATE_READY, as the receiver counterpart is now free to return.
    Change __pipelined_op to call wake_q_add_safe on the receiver's
    task_struct returned by get_task_struct, instead of dereferencing `this`
    which sits on the receiver's stack.
    
    As Manfred pointed out, the race potentially also exists in
    ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare.  Fix
    those in the same way.
    
    Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
    Fixes: c5b2cbdb ("ipc/mqueue.c: update/document memory barriers")
    Fixes: 8116b54e ("ipc/sem.c: document and update memory barriers")
    Fixes: 0d97a82b ("ipc/msg.c: update and document memory barriers")
    Signed-off-by: default avatarVarad Gautam <varad.gautam@suse.com>
    Reported-by: default avatarMatthias von Faber <matthias.vonfaber@aox-tech.de>
    Acked-by: default avatarDavidlohr Bueso <dbueso@suse.de>
    Acked-by: default avatarManfred Spraul <manfred@colorfullife.com>
    Cc: Christian Brauner <christian.brauner@ubuntu.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    a11ddb37
sem.c 62.7 KB