1. 15 Mar, 2021 5 commits
    • Pavel Begunkov's avatar
      io_uring: fix concurrent parking · 9e138a48
      Pavel Begunkov authored
      If io_sq_thread_park() of one task got rescheduled right after
      set_bit(), before it gets back to mutex_lock() there can happen
      park()/unpark() by another task with SQPOLL locking again and
      continuing running never seeing that first set_bit(SHOULD_PARK),
      so won't even try to put the mutex down for parking.
      
      It will get parked eventually when SQPOLL drops the lock for reschedule,
      but may be problematic and will get in the way of further fixes.
      
      Account number of tasks waiting for parking with a new atomic variable
      park_pending and adjust SHOULD_PARK accordingly. It doesn't entirely
      replaces SHOULD_PARK bit with this atomic var because it's convenient
      to have it as a bit in the state and will help to do optimisations
      later.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9e138a48
    • Pavel Begunkov's avatar
      io_uring: halt SQO submission on ctx exit · f6d54255
      Pavel Begunkov authored
      io_sq_thread_finish() is called in io_ring_ctx_free(), so SQPOLL task is
      potentially running submitting new requests. It's not a disaster because
      of using a "try" variant of percpu_ref_get, but is far from nice.
      
      Remove ctx from the sqd ctx list earlier, before cancellation loop, so
      SQPOLL can't find it and so won't submit new requests.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f6d54255
    • Pavel Begunkov's avatar
      io_uring: replace sqd rw_semaphore with mutex · 09a6f4ef
      Pavel Begunkov authored
      The only user of read-locking of sqd->rw_lock is sq_thread itself, which
      is by definition alone, so we don't really need rw_semaphore, but mutex
      will do. Replace it with a mutex, and kill read-to-write upgrading and
      extra task_work handling in io_sq_thread().
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      09a6f4ef
    • Pavel Begunkov's avatar
      io_uring: fix complete_post use ctx after free · 180f829f
      Pavel Begunkov authored
      If io_req_complete_post() put not a final ref, we can't rely on the
      request's ctx ref, and so ctx may potentially be freed while
      complete_post() is in io_cqring_ev_posted()/etc.
      
      In that case get an additional ctx reference, and put it in the end, so
      protecting following io_cqring_ev_posted(). And also prolong ctx
      lifetime until spin_unlock happens, as we do with mutexes, so added
      percpu_ref_get() doesn't race with ctx free.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      180f829f
    • Pavel Begunkov's avatar
      io_uring: fix ->flags races by linked timeouts · efe814a4
      Pavel Begunkov authored
      It's racy to modify req->flags from a not owning context, e.g. linked
      timeout calling req_set_fail_links() for the master request might race
      with that request setting/clearing flags while being executed
      concurrently. Just remove req_set_fail_links(prev) from
      io_link_timeout_fn(), io_async_find_and_cancel() and functions down the
      line take care of setting the fail bit.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      efe814a4
  2. 14 Mar, 2021 1 commit
    • Jens Axboe's avatar
      io_uring: convert io_buffer_idr to XArray · 9e15c3a0
      Jens Axboe authored
      Like we did for the personality idr, convert the IO buffer idr to use
      XArray. This avoids a use-after-free on removal of entries, since idr
      doesn't like doing so from inside an iterator, and it nicely reduces
      the amount of code we need to support this feature.
      
      Fixes: 5a2e745d ("io_uring: buffer registration infrastructure")
      Cc: stable@vger.kernel.org
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: yangerkun <yangerkun@huawei.com>
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9e15c3a0
  3. 13 Mar, 2021 2 commits
  4. 12 Mar, 2021 5 commits
    • Pavel Begunkov's avatar
      io_uring: fix OP_ASYNC_CANCEL across tasks · 58f99373
      Pavel Begunkov authored
      IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for current task.
      If it fails go over tctx_list and try it out for every single tctx.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      58f99373
    • Pavel Begunkov's avatar
      io_uring: cancel sqpoll via task_work · 521d6a73
      Pavel Begunkov authored
      1) The first problem is io_uring_cancel_sqpoll() ->
      io_uring_cancel_task_requests() basically doing park(); park(); and so
      hanging.
      
      2) Another one is more subtle, when the master task is doing cancellations,
      but SQPOLL task submits in-between the end of the cancellation but
      before finish() requests taking a ref to the ctx, and so eternally
      locking it up.
      
      3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and
      same io_uring_cancel_sqpoll() from the owner task, they race for
      tctx->wait events. And there probably more of them.
      
      Instead do SQPOLL cancellations from within SQPOLL task context via
      task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal
      park()/unpark() during cancellation, which is ugly, subtle and anyway
      doesn't allow to do io_run_task_work() properly.
      
      io_uring_cancel_sqpoll() is called only from SQPOLL task context and
      under sqd locking, so all parking is removed from there. And so,
      io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by
      SQPOLL task, and that spare us from some headache.
      
      Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
      which is not used anymore.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      521d6a73
    • Pavel Begunkov's avatar
      io_uring: prevent racy sqd->thread checks · 26984fbf
      Pavel Begunkov authored
      SQPOLL thread to which we're trying to attach may be going away, it's
      not nice but a more serious problem is if io_sq_offload_create() sees
      sqd->thread==NULL, and tries to init it with a new thread. There are
      tons of ways it can be exploited or fail.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      26984fbf
    • Pavel Begunkov's avatar
      io_uring: remove useless ->startup completion · 0df8ea60
      Pavel Begunkov authored
      We always do complete(&sqd->startup) almost right after sqd->thread
      creation, either in the success path or in io_sq_thread_finish(). It's
      specifically created not started for us to be able to set some stuff
      like sqd->thread and io_uring_alloc_task_context() before following
      right after wake_up_new_task().
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0df8ea60
    • Pavel Begunkov's avatar
      io_uring: cancel deferred requests in try_cancel · e1915f76
      Pavel Begunkov authored
      As io_uring_cancel_files() and others let SQO to run between
      io_uring_try_cancel_requests(), SQO may generate new deferred requests,
      so it's safer to try to cancel them in it.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e1915f76
  5. 11 Mar, 2021 2 commits
    • Jens Axboe's avatar
      io_uring: perform IOPOLL reaping if canceler is thread itself · d052d1d6
      Jens Axboe authored
      We bypass IOPOLL completion polling (and reaping) for the SQPOLL thread,
      but if it's the thread itself invoking cancelations, then we still need
      to perform it or no one will.
      
      Fixes: 9936c7c2 ("io_uring: deduplicate core cancellations sequence")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d052d1d6
    • Jens Axboe's avatar
      io_uring: force creation of separate context for ATTACH_WQ and non-threads · 5c2469e0
      Jens Axboe authored
      Earlier kernels had SQPOLL threads that could share across anything, as
      we grabbed the context we needed on a per-ring basis. This is no longer
      the case, so only allow attaching directly if we're in the same thread
      group. That is the common use case. For non-group tasks, just setup a
      new context and thread as we would've done if sharing wasn't set. This
      isn't 100% ideal in terms of CPU utilization for the forked and share
      case, but hopefully that isn't much of a concern. If it is, there are
      plans in motion for how to improve that. Most importantly, we want to
      avoid app side regressions where sharing worked before and now doesn't.
      With this patch, functionality is equivalent to previous kernels that
      supported IORING_SETUP_ATTACH_WQ with SQPOLL.
      Reported-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5c2469e0
  6. 10 Mar, 2021 15 commits
  7. 07 Mar, 2021 10 commits