• Roman Gushchin's avatar
    mm: memcg: synchronize objcg lists with a dedicated spinlock · 0764db9b
    Roman Gushchin authored
    Alexander reported a circular lock dependency revealed by the mmap1 ltp
    test:
    
      LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
              WARNING: possible circular locking dependency detected
              5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
              ------------------------------------------------------
              mmap1/202299 is trying to acquire lock:
              00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
              but task is already holding lock:
              00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
              which lock already depends on the new lock.
              the existing dependency chain (in reverse order) is:
              -> #1 (&sighand->siglock){-.-.}-{2:2}:
                     __lock_acquire+0x604/0xbd8
                     lock_acquire.part.0+0xe2/0x238
                     lock_acquire+0xb0/0x200
                     _raw_spin_lock_irqsave+0x6a/0xd8
                     __lock_task_sighand+0x90/0x190
                     cgroup_freeze_task+0x2e/0x90
                     cgroup_migrate_execute+0x11c/0x608
                     cgroup_update_dfl_csses+0x246/0x270
                     cgroup_subtree_control_write+0x238/0x518
                     kernfs_fop_write_iter+0x13e/0x1e0
                     new_sync_write+0x100/0x190
                     vfs_write+0x22c/0x2d8
                     ksys_write+0x6c/0xf8
                     __do_syscall+0x1da/0x208
                     system_call+0x82/0xb0
              -> #0 (css_set_lock){..-.}-{2:2}:
                     check_prev_add+0xe0/0xed8
                     validate_chain+0x736/0xb20
                     __lock_acquire+0x604/0xbd8
                     lock_acquire.part.0+0xe2/0x238
                     lock_acquire+0xb0/0x200
                     _raw_spin_lock_irqsave+0x6a/0xd8
                     obj_cgroup_release+0x4a/0xe0
                     percpu_ref_put_many.constprop.0+0x150/0x168
                     drain_obj_stock+0x94/0xe8
                     refill_obj_stock+0x94/0x278
                     obj_cgroup_charge+0x164/0x1d8
                     kmem_cache_alloc+0xac/0x528
                     __sigqueue_alloc+0x150/0x308
                     __send_signal+0x260/0x550
                     send_signal+0x7e/0x348
                     force_sig_info_to_task+0x104/0x180
                     force_sig_fault+0x48/0x58
                     __do_pgm_check+0x120/0x1f0
                     pgm_check_handler+0x11e/0x180
              other info that might help us debug this:
               Possible unsafe locking scenario:
                     CPU0                    CPU1
                     ----                    ----
                lock(&sighand->siglock);
                                             lock(css_set_lock);
                                             lock(&sighand->siglock);
                lock(css_set_lock);
               *** DEADLOCK ***
              2 locks held by mmap1/202299:
               #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
               #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
              stack backtrace:
              CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
              Hardware name: IBM 3906 M04 704 (LPAR)
              Call Trace:
                dump_stack_lvl+0x76/0x98
                check_noncircular+0x136/0x158
                check_prev_add+0xe0/0xed8
                validate_chain+0x736/0xb20
                __lock_acquire+0x604/0xbd8
                lock_acquire.part.0+0xe2/0x238
                lock_acquire+0xb0/0x200
                _raw_spin_lock_irqsave+0x6a/0xd8
                obj_cgroup_release+0x4a/0xe0
                percpu_ref_put_many.constprop.0+0x150/0x168
                drain_obj_stock+0x94/0xe8
                refill_obj_stock+0x94/0x278
                obj_cgroup_charge+0x164/0x1d8
                kmem_cache_alloc+0xac/0x528
                __sigqueue_alloc+0x150/0x308
                __send_signal+0x260/0x550
                send_signal+0x7e/0x348
                force_sig_info_to_task+0x104/0x180
                force_sig_fault+0x48/0x58
                __do_pgm_check+0x120/0x1f0
                pgm_check_handler+0x11e/0x180
              INFO: lockdep is turned off.
    
    In this example a slab allocation from __send_signal() caused a
    refilling and draining of a percpu objcg stock, resulted in a releasing
    of another non-related objcg.  Objcg release path requires taking the
    css_set_lock, which is used to synchronize objcg lists.
    
    This can create a circular dependency with the sighandler lock, which is
    taken with the locked css_set_lock by the freezer code (to freeze a
    task).
    
    In general it seems that using css_set_lock to synchronize objcg lists
    makes any slab allocations and deallocation with the locked css_set_lock
    and any intervened locks risky.
    
    To fix the problem and make the code more robust let's stop using
    css_set_lock to synchronize objcg lists and use a new dedicated spinlock
    instead.
    
    Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
    Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
    Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
    Reported-by: default avatarAlexander Egorenkov <egorenar@linux.ibm.com>
    Tested-by: default avatarAlexander Egorenkov <egorenar@linux.ibm.com>
    Reviewed-by: default avatarWaiman Long <longman@redhat.com>
    Acked-by: default avatarTejun Heo <tj@kernel.org>
    Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
    Reviewed-by: default avatarJeremy Linton <jeremy.linton@arm.com>
    Tested-by: default avatarJeremy Linton <jeremy.linton@arm.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    0764db9b
memcontrol.c 192 KB