• Yang Shi's avatar
    mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct · 88aa7cc6
    Yang Shi authored
    mmap_sem is on the hot path of kernel, and it very contended, but it is
    abused too.  It is used to protect arg_start|end and evn_start|end when
    reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
    sense since those proc files just expect to read 4 values atomically and
    not related to VM, they could be set to arbitrary values by C/R.
    
    And, the mmap_sem contention may cause unexpected issue like below:
    
    INFO: task ps:14018 blocked for more than 120 seconds.
           Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
     ps              D    0 14018      1 0x00000004
     Call Trace:
       schedule+0x36/0x80
       rwsem_down_read_failed+0xf0/0x150
       call_rwsem_down_read_failed+0x18/0x30
       down_read+0x20/0x40
       proc_pid_cmdline_read+0xd9/0x4e0
       __vfs_read+0x37/0x150
       vfs_read+0x96/0x130
       SyS_read+0x55/0xc0
       entry_SYSCALL_64_fastpath+0x1a/0xc5
    
    Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
    for them to mitigate the abuse of mmap_sem.
    
    So, introduce a new spinlock in mm_struct to protect the concurrent
    access to arg_start|end, env_start|end and others, as well as replace
    write map_sem to read to protect the race condition between prctl and
    sys_brk which might break check_data_rlimit(), and makes prctl more
    friendly to other VM operations.
    
    This patch just eliminates the abuse of mmap_sem, but it can't resolve
    the above hung task warning completely since the later
    access_remote_vm() call needs acquire mmap_sem.  The mmap_sem
    scalability issue will be solved in the future.
    
    [yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
      Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
    Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
    Reviewed-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
    Acked-by: default avatarMichal Hocko <mhocko@suse.com>
    Cc: Alexey Dobriyan <adobriyan@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mateusz Guzik <mguzik@redhat.com>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    88aa7cc6
base.c 85.6 KB