• Douglas Anderson's avatar
    USB: core: Avoid race of async_completed() w/ usbdev_release() · 2ca01b2e
    Douglas Anderson authored
    BugLink: http://bugs.launchpad.net/bugs/1718195
    
    commit ed62ca2f upstream.
    
    While running reboot tests w/ a specific set of USB devices (and
    slub_debug enabled), I found that once every few hours my device would
    be crashed with a stack that looked like this:
    
    [   14.012445] BUG: spinlock bad magic on CPU#0, modprobe/2091
    [   14.012460]  lock: 0xffffffc0cb055978, .magic: ffffffc0, .owner: cryption contexts: %lu/%lu
    [   14.012460] /1025536097, .owner_cpu: 0
    [   14.012466] CPU: 0 PID: 2091 Comm: modprobe Not tainted 4.4.79 #352
    [   14.012468] Hardware name: Google Kevin (DT)
    [   14.012471] Call trace:
    [   14.012483] [<....>] dump_backtrace+0x0/0x160
    [   14.012487] [<....>] show_stack+0x20/0x28
    [   14.012494] [<....>] dump_stack+0xb4/0xf0
    [   14.012500] [<....>] spin_dump+0x8c/0x98
    [   14.012504] [<....>] spin_bug+0x30/0x3c
    [   14.012508] [<....>] do_raw_spin_lock+0x40/0x164
    [   14.012515] [<....>] _raw_spin_lock_irqsave+0x64/0x74
    [   14.012521] [<....>] __wake_up+0x2c/0x60
    [   14.012528] [<....>] async_completed+0x2d0/0x300
    [   14.012534] [<....>] __usb_hcd_giveback_urb+0xc4/0x138
    [   14.012538] [<....>] usb_hcd_giveback_urb+0x54/0xf0
    [   14.012544] [<....>] xhci_irq+0x1314/0x1348
    [   14.012548] [<....>] usb_hcd_irq+0x40/0x50
    [   14.012553] [<....>] handle_irq_event_percpu+0x1b4/0x3f0
    [   14.012556] [<....>] handle_irq_event+0x4c/0x7c
    [   14.012561] [<....>] handle_fasteoi_irq+0x158/0x1c8
    [   14.012564] [<....>] generic_handle_irq+0x30/0x44
    [   14.012568] [<....>] __handle_domain_irq+0x90/0xbc
    [   14.012572] [<....>] gic_handle_irq+0xcc/0x18c
    
    Investigation using kgdb() found that the wait queue that was passed
    into wake_up() had been freed (it was filled with slub_debug poison).
    
    I analyzed and instrumented the code and reproduced.  My current
    belief is that this is happening:
    
    1. async_completed() is called (from IRQ).  Moves "as" onto the
       completed list.
    2. On another CPU, proc_reapurbnonblock_compat() calls
       async_getcompleted().  Blocks on spinlock.
    3. async_completed() releases the lock; keeps running; gets blocked
       midway through wake_up().
    4. proc_reapurbnonblock_compat() => async_getcompleted() gets the
       lock; removes "as" from completed list and frees it.
    5. usbdev_release() is called.  Frees "ps".
    6. async_completed() finally continues running wake_up().  ...but
       wake_up() has a pointer to the freed "ps".
    
    The instrumentation that led me to believe this was based on adding
    some trace_printk() calls in a select few functions and then using
    kdb's "ftdump" at crash time.  The trace follows (NOTE: in the trace
    below I cheated a little bit and added a udelay(1000) in
    async_completed() after releasing the spinlock because I wanted it to
    trigger quicker):
    
    <...>-2104   0d.h2 13759034us!: async_completed at start: as=ffffffc0cc638200
    mtpd-2055    3.... 13759356us : async_getcompleted before spin_lock_irqsave
    mtpd-2055    3d..1 13759362us : async_getcompleted after list_del_init: as=ffffffc0cc638200
    mtpd-2055    3.... 13759371us+: proc_reapurbnonblock_compat: free_async(ffffffc0cc638200)
    mtpd-2055    3.... 13759422us+: async_getcompleted before spin_lock_irqsave
    mtpd-2055    3.... 13759479us : usbdev_release at start: ps=ffffffc0cc042080
    mtpd-2055    3.... 13759487us : async_getcompleted before spin_lock_irqsave
    mtpd-2055    3.... 13759497us!: usbdev_release after kfree(ps): ps=ffffffc0cc042080
    <...>-2104   0d.h2 13760294us : async_completed before wake_up(): as=ffffffc0cc638200
    
    To fix this problem we can just move the wake_up() under the ps->lock.
    There should be no issues there that I'm aware of.
    Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
    Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
    Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@canonical.com>
    2ca01b2e
devio.c 59.7 KB