    mm: vmalloc: remove a global vmap_blocks xarray · 062eacf5
    Uladzislau Rezki (Sony) authored
    The global vmap_blocks xarray can be contended under heavy usage of
    the vm_map_ram()/vm_unmap_ram() APIs.  lock_stat shows that the
    "vmap_blocks.xa_lock" lock is second on the list of most contended
    locks:
    
    <snip>
    ----------------------------------------
    class name con-bounces contentions ...
    ----------------------------------------
    vmap_area_lock:         2554079 2554276 ...
      --------------
      vmap_area_lock        1297948  [<00000000dd41cbaa>] alloc_vmap_area+0x1c7/0x910
      vmap_area_lock        1256330  [<000000009d927bf3>] free_vmap_block+0x4a/0xe0
      vmap_area_lock              1  [<00000000c95c05a7>] find_vm_area+0x16/0x70
      --------------
      vmap_area_lock        1738590  [<00000000dd41cbaa>] alloc_vmap_area+0x1c7/0x910
      vmap_area_lock         815688  [<000000009d927bf3>] free_vmap_block+0x4a/0xe0
      vmap_area_lock              1  [<00000000c1d619d7>] __get_vm_area_node+0xd2/0x170
    
    vmap_blocks.xa_lock:    862689  862698 ...
      -------------------
      vmap_blocks.xa_lock   378418    [<00000000625a5626>] vm_map_ram+0x359/0x4a0
      vmap_blocks.xa_lock   484280    [<00000000caa2ef03>] xa_erase+0xe/0x30
      -------------------
      vmap_blocks.xa_lock   576226    [<00000000caa2ef03>] xa_erase+0xe/0x30
      vmap_blocks.xa_lock   286472    [<00000000625a5626>] vm_map_ram+0x359/0x4a0
    ...
    <snip>
    
    This is the result of running vm_map_ram()/vm_unmap_ram() in
    a loop. The test creates 64 threads (on a 64-CPU system) and
    each one maps/unmaps 1 page.
    
    After this change the "xa_lock" contention can be considered noise
    under the same test conditions:
    
    <snip>
    ...
    &xa->xa_lock#1:         10333 10394 ...
      --------------
      &xa->xa_lock#1        5349      [<00000000bbbc9751>] xa_erase+0xe/0x30
      &xa->xa_lock#1        5045      [<0000000018def45d>] vm_map_ram+0x3a4/0x4f0
      --------------
      &xa->xa_lock#1        7326      [<0000000018def45d>] vm_map_ram+0x3a4/0x4f0
      &xa->xa_lock#1        3068      [<00000000bbbc9751>] xa_erase+0xe/0x30
    ...
    <snip>
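
    The message above does not spell out what replaces the global xarray.
    As an illustration only, one common way to cut this kind of contention
    is to split a single locked index into several independently locked
    shards and pick the shard from the address. The userspace sketch below
    assumes that approach; BLOCK_SIZE, NR_SHARDS, NR_SLOTS and all function
    names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (1UL << 20) /* hypothetical block size */
#define NR_SHARDS  4           /* hypothetical shard count */
#define NR_SLOTS   64          /* hypothetical per-shard capacity */

/* One shard: a small table of block addresses behind its own lock. */
struct shard {
	pthread_mutex_t lock;
	uintptr_t keys[NR_SLOTS]; /* 0 marks a free slot */
	size_t nr;                /* number of occupied slots */
};

static struct shard shards[NR_SHARDS];

static void shards_init(void)
{
	for (size_t i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_init(&shards[i].lock, NULL);
		for (size_t j = 0; j < NR_SLOTS; j++)
			shards[i].keys[j] = 0;
		shards[i].nr = 0;
	}
}

/* Addresses in the same block always map to the same shard. */
static size_t addr_to_shard(uintptr_t addr)
{
	return (addr / BLOCK_SIZE) % NR_SHARDS;
}

/* Store addr in its shard; returns 0 on success, -1 if full or addr is 0. */
static int shard_store(uintptr_t addr)
{
	struct shard *s = &shards[addr_to_shard(addr)];
	int ret = -1;

	if (addr == 0)
		return -1;

	pthread_mutex_lock(&s->lock); /* only this shard is serialized */
	for (size_t i = 0; i < NR_SLOTS; i++) {
		if (s->keys[i] == 0) {
			s->keys[i] = addr;
			s->nr++;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Erase addr from its shard; returns 0 on success, -1 if not present. */
static int shard_erase(uintptr_t addr)
{
	struct shard *s = &shards[addr_to_shard(addr)];
	int ret = -1;

	if (addr == 0)
		return -1;

	pthread_mutex_lock(&s->lock);
	for (size_t i = 0; i < NR_SLOTS; i++) {
		if (s->keys[i] == addr) {
			assert(s->nr > 0);
			s->keys[i] = 0;
			s->nr--;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

    Call shards_init() once before any store or erase. Threads that touch
    blocks in different shards never take the same lock, which is the
    effect the second lock_stat listing is meant to show for xa_lock.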
    
    Running test_vmalloc.sh run_test_mask=1024 nr_threads=64 nr_pages=5
    shows a throughput improvement of roughly 8 percent for the
    vm_map_ram() and vm_unmap_ram() APIs.
    
    This patch does not fix the vmap_area_lock/free_vmap_area_lock and
    purge_vmap_area_lock bottlenecks; those require a separate rework.
    
    Link: https://lkml.kernel.org/r/20230330190639.431589-1-urezki@gmail.com
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>