• Daisuke Nishimura's avatar
    memcg: fix deadlock between cpuset and memcg · dfe076b0
    Daisuke Nishimura authored
    Commit b1dd693e ("memcg: avoid deadlock between move charge and
    try_charge()") can cause another deadlock about mmap_sem on task migration
    if cpuset and memcg are mounted onto the same mount point.
    
    After the commit, cgroup_attach_task() has sequence like:
    
    cgroup_attach_task()
      ss->can_attach()
        cpuset_can_attach()
        mem_cgroup_can_attach()
          down_read(&mmap_sem)        (1)
      ss->attach()
        cpuset_attach()
          mpol_rebind_mm()
            down_write(&mmap_sem)     (2)
            up_write(&mmap_sem)
          cpuset_migrate_mm()
            do_migrate_pages()
              down_read(&mmap_sem)
              up_read(&mmap_sem)
        mem_cgroup_move_task()
          mem_cgroup_clear_mc()
            up_read(&mmap_sem)
    
    We can cause deadlock at (2) because we've already aquire the mmap_sem at (1).
    
    But the commit itself is necessary to fix deadlocks which have existed
    before the commit like:
    
    Ex.1)
                    move charge             |        try charge
      --------------------------------------+------------------------------
        mem_cgroup_can_attach()             |  down_write(&mmap_sem)
          mc.moving_task = current          |    ..
          mem_cgroup_precharge_mc()         |  __mem_cgroup_try_charge()
            mem_cgroup_count_precharge()    |    prepare_to_wait()
              down_read(&mmap_sem)          |    if (mc.moving_task)
              -> cannot aquire the lock     |    -> true
                                            |      schedule()
                                            |      -> move charge should wake it up
    
    Ex.2)
                    move charge             |        try charge
      --------------------------------------+------------------------------
        mem_cgroup_can_attach()             |
          mc.moving_task = current          |
          mem_cgroup_precharge_mc()         |
            mem_cgroup_count_precharge()    |
              down_read(&mmap_sem)          |
              ..                            |
              up_read(&mmap_sem)            |
                                            |  down_write(&mmap_sem)
        mem_cgroup_move_task()              |    ..
          mem_cgroup_move_charge()          |  __mem_cgroup_try_charge()
            down_read(&mmap_sem)            |    prepare_to_wait()
            -> cannot aquire the lock       |    if (mc.moving_task)
                                            |    -> true
                                            |      schedule()
                                            |      -> move charge should wake it up
    
    This patch fixes all of these problems by:
    1. revert the commit.
    2. To fix the Ex.1, we set mc.moving_task after mem_cgroup_count_precharge()
       has released the mmap_sem.
    3. To fix the Ex.2, we use down_read_trylock() instead of down_read() in
       mem_cgroup_move_charge() and, if it has failed to aquire the lock, cancel
       all extra charges, wake up all waiters, and retry trylock.
    Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Reported-by: default avatarBen Blum <bblum@andrew.cmu.edu>
    Cc: Miao Xie <miaox@cn.fujitsu.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Paul Menage <menage@google.com>
    Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    dfe076b0
memcontrol.c 128 KB