    mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release · b749ecfa
    Roman Gushchin authored
    Karsten reported the following panic in __free_slab() happening on a s390x
    machine:
    
      Unable to handle kernel pointer dereference in virtual kernel address space
      Failing address: 0000000000000000 TEID: 0000000000000483
      Fault in home space mode while using kernel ASCE.
      AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
      Oops: 0004 ilc:3 [#1] PREEMPT SMP
      Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
      Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
      Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
      Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
                 0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
                 0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
                 000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
      Krnl Code: 00000000003cada6: e310a1400004        lg      %r1,320(%r10)
                 00000000003cadac: c0e50046c286        brasl   %r14,ca32b8
                #00000000003cadb2: a7f4fe36            brc     15,3caa1e
                >00000000003cadb6: e32060800024        stg     %r2,128(%r6)
                 00000000003cadbc: a7f4fd9e            brc     15,3ca8f8
                 00000000003cadc0: c0e50046790c        brasl   %r14,c99fd8
                 00000000003cadc6: a7f4fe2c            brc     15,3caa1e
                 00000000003cadca: ecb1ffff00d9        aghik   %r11,%r1,-1
      Call Trace:
      (<00000000003cabcc> __free_slab+0x49c/0x6b0)
       <00000000001f5886> rcu_core+0x5a6/0x7e0
       <0000000000ca2dea> __do_softirq+0xf2/0x5c0
       <0000000000152644> irq_exit+0x104/0x130
       <000000000010d222> do_IRQ+0x9a/0xf0
       <0000000000ca2344> ext_int_handler+0x130/0x134
       <0000000000103648> enabled_wait+0x58/0x128
      (<0000000000103634> enabled_wait+0x44/0x128)
       <0000000000103b00> arch_cpu_idle+0x40/0x58
       <0000000000ca0544> default_idle_call+0x3c/0x68
       <000000000018eaa4> do_idle+0xec/0x1c0
       <000000000018ee0e> cpu_startup_entry+0x36/0x40
       <000000000122df34> arch_call_rest_init+0x5c/0x88
       <0000000000000000> 0x0
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       <00000000003ca8f4> __free_slab+0x1c4/0x6b0
      Kernel panic - not syncing: Fatal exception in interrupt
    
    The kernel panics on an attempt to dereference the NULL memcg pointer.
    When shutdown_cache() is called from the kmem_cache_destroy() context, a
    memcg kmem_cache might have empty slab pages in a partial list, which are
    still charged to the memory cgroup.
    
    These pages are released by free_partial() at the beginning of
    shutdown_cache(): either directly or by scheduling an RCU-delayed work
    (if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag).  The latter case
    is when the reported panic can happen: memcg_unlink_cache() is called
    immediately after shrinking partial lists, without waiting for scheduled
    RCU works.  It sets the kmem_cache->memcg_params.memcg pointer to NULL,
    and the following attempt to dereference it by __free_slab() from the
    RCU work context causes the panic.
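
    The ordering problem can be sketched in a small userspace model.  This
    is not kernel code: struct memcg, struct cache, struct deferred_free and
    all function names below are hypothetical stand-ins, and the RCU work is
    reduced to a single flag that is "run" by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct memcg { int charged_pages; };
struct cache { struct memcg *memcg; };

/* One deferred "free slab page" callback, standing in for an RCU work. */
struct deferred_free { struct cache *cache; bool pending; };

/*
 * Models __free_slab(): uncharge the page from the cache's memcg.
 * Returns false where the real kernel would dereference NULL and panic.
 */
static bool free_slab_page(struct cache *c)
{
    if (c->memcg == NULL)
        return false;
    c->memcg->charged_pages--;
    return true;
}

/* Models the grace period ending: run the callback if one is pending. */
static bool run_deferred(struct deferred_free *d)
{
    assert(d->cache != NULL);
    if (!d->pending)
        return true;
    d->pending = false;
    return free_slab_page(d->cache);
}

/* Buggy ordering: the pointer is cleared before the deferred free runs. */
static bool shutdown_buggy(void)
{
    struct memcg m = { .charged_pages = 1 };
    struct cache c = { .memcg = &m };
    struct deferred_free d = { .cache = &c, .pending = true }; /* free_partial() */

    c.memcg = NULL;          /* memcg_unlink_cache() runs immediately */
    return run_deferred(&d); /* the RCU work fires later and finds NULL */
}
```

    In this model shutdown_buggy() returns false, which corresponds to the
    NULL dereference in the reported oops.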
    
    To fix the issue, let's postpone the release of the memcg pointer to
    destroy_memcg_params().  It's called from a separate work context by
    slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
    This guarantees that all scheduled page release RCU works complete
    before the memcg pointer is zeroed.
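
    A userspace sketch of the fixed ordering, under the same caveat: the
    types and helpers (struct pending_frees, queue_free(), drain_all(),
    shutdown_fixed()) are hypothetical, and drain_all() only imitates the
    effect of a full RCU barrier by running every queued callback before
    returning.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct memcg { int charged_pages; };
struct cache { struct memcg *memcg; };

/* Toy queue of deferred page frees for one cache. */
struct pending_frees { struct cache *cache; int count; int faults; };

/* Models scheduling one RCU-delayed page release. */
static void queue_free(struct pending_frees *q)
{
    assert(q->count < MAX_PENDING);
    q->count++;
}

/* Imitates an RCU barrier: returns only after every queued callback ran. */
static void drain_all(struct pending_frees *q)
{
    while (q->count > 0) {
        q->count--;
        if (q->cache->memcg == NULL)
            q->faults++;                     /* would be the panic */
        else
            q->cache->memcg->charged_pages--; /* uncharge the page */
    }
}

/* Fixed ordering: returns 1 if every page was uncharged without a fault. */
static int shutdown_fixed(int empty_pages)
{
    struct memcg m = { .charged_pages = empty_pages };
    struct cache c = { .memcg = &m };
    struct pending_frees q = { .cache = &c, .count = 0, .faults = 0 };

    for (int i = 0; i < empty_pages; i++)
        queue_free(&q);   /* free_partial() schedules the releases */

    /* The unlink step no longer clears c.memcg at this point. */
    drain_all(&q);        /* barrier in the destroy work function */
    c.memcg = NULL;       /* destroy_memcg_params() clears it last */

    return q.faults == 0 && m.charged_pages == 0;
}
```

    Because the pointer is cleared only after the drain, every queued
    release in the model still sees a valid memcg.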
    
    Big thanks to Karsten for the perfect report containing all necessary
    information, his help with the analysis of the problem and testing of the
    fix.
    
    Link: http://lkml.kernel.org/r/20191010160549.1584316-1-guro@fb.com
    Fixes: fb2f2b0a ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
    Signed-off-by: Roman Gushchin <guro@fb.com>
    Reported-by: Karsten Graul <kgraul@linux.ibm.com>
    Tested-by: Karsten Graul <kgraul@linux.ibm.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Cc: Karsten Graul <kgraul@linux.ibm.com>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
slab_common.c 44.1 KB