    Merge tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux · 4de520f1
    Linus Torvalds authored
    Pull io_uring futex support from Jens Axboe:
     "This adds support for using futexes through io_uring - first futex
      wake and wait, and then the vectored variant of waiting, futex waitv.
    
      For all of wait/wake/waitv, we support the bitset variant, as the
      'normal' variants can be easily implemented on top of that.
    
      PI and requeue are not supported through io_uring, just the
      above-mentioned parts. This may change in the future, but in the spirit of
      keeping this small (and based on what people have been asking for),
      this is what we currently have.
    
      Wake support is pretty straightforward; most of the thought has gone
      into the wait side to avoid needing to offload wait operations to a
      blocking context. Instead, we rely on the usual callbacks to retry and
      post a completion event, when appropriate.
    
      As far as I can recall, the first request for futex support with
      io_uring came from Andres Freund, working on postgres. His aio rework
      of postgres was one of the early adopters of io_uring, and futex
      support was a natural extension for that. This is relevant from both a
      usability point of view, as well as for efficiency and performance. In
      Andres's words, for the former:
    
         Futex wait support in io_uring makes it a lot easier to avoid
         deadlocks in concurrent programs that have their own buffer pool:
         Obviously pages in the application buffer pool have to be locked
         during IO. If the initiator of IO A needs to wait for a held lock
         B, the holder of lock B might wait for the IO A to complete. The
         ability to wait for a lock and IO completions at the same time
         provides an efficient way to avoid such deadlocks.
    
      and in terms of efficiency, even without unlocking the full potential
      yet, Andres says:
    
         Futex wake support in io_uring is useful because it allows for more
         efficient directed wakeups. For some "locks" postgres has queues
         implemented in userspace, with wakeup logic that cannot easily be
         implemented with FUTEX_WAKE_BITSET on a single "futex word"
         (imagine waiting for journal flushes to have completed up to a
         certain point).
    
         Thus a "lock release" sometimes needs to wake up many processes in a
         row. A quick-and-dirty conversion to doing these wakeups via
         io_uring led to a 3% throughput increase, with 12% fewer context
         switches, albeit in a fairly extreme workload."
    
    * tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux:
      io_uring: add support for vectored futex waits
      futex: make the vectored futex operations available
      futex: make futex_parse_waitv() available as a helper
      futex: add wake_data to struct futex_q
      io_uring: add support for futex wake and wait
      futex: abstract out a __futex_wake_mark() helper
      futex: factor out the futex wake handling
      futex: move FUTEX2_VALID_MASK to futex.h
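
    The pull message says the 'normal' wait and wake variants can be
    implemented on top of the bitset variants. A minimal sketch of that
    idea follows, using the plain futex(2) syscall rather than io_uring so
    that it runs on kernels without this series. The helper names
    (futex_raw, futex_wait_any, futex_wake_any, futex_self_check) are
    hypothetical and not part of this merge; the approach assumed here is
    simply passing FUTEX_BITSET_MATCH_ANY as the mask.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrapper around the raw futex(2) syscall (glibc exports none).
 * Hypothetical helper name, for illustration only. */
long futex_raw(uint32_t *uaddr, int op, uint32_t val,
               const struct timespec *timeout, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, val3);
}

/* A "normal" wait expressed as a bitset wait whose mask matches every
 * waker. Sleeps only if *uaddr still equals 'expected'. */
long futex_wait_any(uint32_t *uaddr, uint32_t expected)
{
    return futex_raw(uaddr, FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG,
                     expected, NULL, FUTEX_BITSET_MATCH_ANY);
}

/* A "normal" wake expressed as a bitset wake whose mask matches every
 * waiter. Returns the number of waiters woken, at most 'nr'. */
long futex_wake_any(uint32_t *uaddr, uint32_t nr)
{
    return futex_raw(uaddr, FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG,
                     nr, NULL, FUTEX_BITSET_MATCH_ANY);
}

/* Single-threaded sanity check; neither call blocks. */
void futex_self_check(void)
{
    uint32_t word = 1;

    /* Nobody is waiting on 'word', so nothing is woken. */
    assert(futex_wake_any(&word, 1) == 0);

    /* 'expected' (0) differs from the current value (1), so the kernel
     * refuses to sleep and reports EAGAIN. */
    errno = 0;
    assert(futex_wait_any(&word, 0) == -1 && errno == EAGAIN);
}
```

    A waker that wants to target only some waiters would pass a narrower
    mask in place of FUTEX_BITSET_MATCH_ANY; a wake then only reaches
    waiters whose own mask shares at least one bit with it.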