    locking/rwsem: Remove reader optimistic spinning · 617f3ef9
    Waiman Long authored
    Reader optimistic spinning is helpful when the reader critical section
    is short and there aren't that many readers around. It also improves
    the chance that a reader can get the lock as writer optimistic spinning
    disproportionally favors writers much more than readers.
    
    Since commit d3681e26 ("locking/rwsem: Wake up almost all readers
    in wait queue"), all the waiting readers are woken up so that they can
    all get the read lock and run in parallel. When the number of contending
    readers is large, allowing reader optimistic spinning will likely cause
    reader fragmentation where multiple smaller groups of readers can get
    the read lock in a sequential manner separated by writers. That reduces
    reader parallelism.
    
    One possible way to address that drawback is to limit the number of
    readers (preferably one) that can do optimistic spinning. These readers
    act as representatives of all the waiting readers in the wait queue as
    they will wake up all those waiting readers once they get the lock.
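
    A minimal userspace sketch of such a cap, assuming C11 atomics. The
    names MAX_READER_SPINNERS, reader_spinners, reader_spin_begin() and
    reader_spin_end() are hypothetical; they are not taken from the kernel
    or from the limit-rspin patch and only illustrate admitting a bounded
    number of spinning readers:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cap; the text above suggests "preferably one". */
#define MAX_READER_SPINNERS 1

/* Hypothetical counter of readers currently spinning (not a kernel field). */
static atomic_int reader_spinners;

/* Try to become a spinning reader; returns false once the cap is reached. */
static bool reader_spin_begin(void)
{
	int cur = atomic_load(&reader_spinners);

	while (cur < MAX_READER_SPINNERS) {
		/* On failure, cur is reloaded and the cap is re-checked. */
		if (atomic_compare_exchange_weak(&reader_spinners, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Called when the spinning reader gets the lock or gives up spinning. */
static void reader_spin_end(void)
{
	atomic_fetch_sub(&reader_spinners, 1);
}
```

    In this sketch, a reader for which reader_spin_begin() returns false
    would go straight to the wait queue instead of spinning.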
    
    Alternatively, as reader optimistic lock stealing has already enhanced
    fairness to readers, it may be easier to just remove reader optimistic
    spinning and simplify the optimistic spinning code as a result.
    
    Performance measurements (locking throughput kops/s) using a locking
    microbenchmark with 50/50 reader/writer distribution and turbo-boost
    disabled were done on a 2-socket Cascade Lake system (48-core 96-thread)
    to see the impacts of these changes:
    
      1) Vanilla     - 5.10-rc3 kernel
      2) Before      - 5.10-rc3 kernel with previous patches in this series
      3) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
      4) no-rspin    - 5.10-rc3 kernel with reader spinning disabled
    
      # of threads  CS Load   Vanilla  Before   limit-rspin   no-rspin
      ------------  -------   -------  ------   -----------   --------
           2            1      5,185    5,662      5,214       5,077
           4            1      5,107    4,983      5,188       4,760
           8            1      4,782    4,564      4,720       4,628
          16            1      4,680    4,053      4,567       3,402
          32            1      4,299    1,115      1,118       1,098
          64            1      3,218      983      1,001         957
          96            1      1,938      944        957         930
    
           2           20      2,008    2,128      2,264       1,665
           4           20      1,390    1,033      1,046       1,101
           8           20      1,472    1,155      1,098       1,213
          16           20      1,332    1,077      1,089       1,122
          32           20        967      914        917         980
          64           20        787      874        891         858
          96           20        730      836        847         844
    
           2          100        372      356        360         355
           4          100        492      425        434         392
           8          100        533      537        529         538
          16          100        548      572        568         598
          32          100        499      520        527         537
          64          100        466      517        526         512
          96          100        406      497        506         509
    
    The column "CS Load" represents the number of pause instructions issued
    in the locking critical section. A CS load of 1 is extremely short and
    is not likely in real situations. Loads of 20 (moderate) and 100 (long)
    are more realistic.
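
    As an illustration of what a given CS load means, here is a minimal
    sketch of a critical-section body that issues cs_load pause
    instructions. The helper names cpu_pause() and critical_section() are
    hypothetical, and this is not the microbenchmark's actual code; in the
    benchmark, such work would sit between down_read()/up_read() or
    down_write()/up_write():

```c
/* One spin-wait hint; falls back to a compiler barrier off x86. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/*
 * Hypothetical critical-section body: issue cs_load pause instructions
 * and return how many were issued.
 */
static unsigned int critical_section(unsigned int cs_load)
{
	unsigned int issued = 0;

	while (issued < cs_load) {
		cpu_pause();
		issued++;
	}
	return issued;
}
```

    With this sketch, critical_section(1) corresponds to the shortest load
    in the table and critical_section(100) to the longest.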
    
    It can be seen that the previous patches in this series have reduced
    performance in general, except in highly contended cases with moderate
    or long critical sections, where performance improves a bit. This change
    is mostly caused by the "Prevent potential lock starvation" patch, which
    reduces reader optimistic spinning and hence reduces reader fragmentation.
    
    The patch that further limits reader optimistic spinning doesn't seem to
    have much impact on overall performance, as shown in the benchmark data.
    
    The patch that disables reader optimistic spinning shows reduced
    performance in lightly loaded cases, but comparable or slightly better
    performance with heavier contention.
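
    To put numbers on that comparison, the following sketch computes the
    relative change from "Before" to "no-rspin" for a few rows copied from
    the table above. The helper pct_change(), the sample struct and
    print_changes() are illustrative names, not part of the benchmark:

```c
#include <stdio.h>

/* Relative change from 'before' to 'after', in percent. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

struct sample {
	int threads;
	int cs_load;
	double before;		/* kops/s, "Before" column */
	double no_rspin;	/* kops/s, "no-rspin" column */
};

/* A few rows copied from the benchmark table. */
static const struct sample samples[] = {
	{  2,  20, 2128, 1665 },
	{ 16,   1, 4053, 3402 },
	{ 16, 100,  572,  598 },
	{ 96,  20,  836,  844 },
};

/* Print the Before -> no-rspin change for each sampled row. */
static void print_changes(void)
{
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%2d threads, CS load %3d: %+.1f%%\n",
		       samples[i].threads, samples[i].cs_load,
		       pct_change(samples[i].before, samples[i].no_rspin));
}
```

    Negative values mean no-rspin is slower than Before for that row, and
    positive values mean it is faster.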
    
    This patch just removes reader optimistic spinning for now. As readers
    are not going to do optimistic spinning anymore, we don't need to
    consider if the OSQ is empty or not when doing lock stealing.
    
    Signed-off-by: Waiman Long <longman@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
    Link: https://lkml.kernel.org/r/20201121041416.12285-6-longman@redhat.com