1. 20 May, 2020 2 commits
    • Xiaoguang Wang's avatar
      io_uring: reset -EBUSY error when io sq thread is waken up · d4ae271d
      Xiaoguang Wang authored
      In io_sq_thread(), currently if we get an -EBUSY error and go to sleep,
      we will won't clear it again, which will result in io_sq_thread() will
      never have a chance to submit sqes again. Below test program test.c
      can reveal this bug:
      
      int main(int argc, char *argv[])
      {
              struct io_uring ring;
              int i, fd, ret;
              struct io_uring_sqe *sqe;
              struct io_uring_cqe *cqe;
              struct iovec *iovecs;
              void *buf;
              struct io_uring_params p;
      
              if (argc < 2) {
                      printf("%s: file\n", argv[0]);
                      return 1;
              }
      
              memset(&p, 0, sizeof(p));
              p.flags = IORING_SETUP_SQPOLL;
              ret = io_uring_queue_init_params(4, &ring, &p);
              if (ret < 0) {
                      fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                      return 1;
              }
      
              fd = open(argv[1], O_RDONLY | O_DIRECT);
              if (fd < 0) {
                      perror("open");
                      return 1;
              }
      
              iovecs = calloc(10, sizeof(struct iovec));
              for (i = 0; i < 10; i++) {
                      if (posix_memalign(&buf, 4096, 4096))
                              return 1;
                      iovecs[i].iov_base = buf;
                      iovecs[i].iov_len = 4096;
              }
      
              ret = io_uring_register_files(&ring, &fd, 1);
              if (ret < 0) {
                      fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
                      return ret;
              }
      
              for (i = 0; i < 10; i++) {
                      sqe = io_uring_get_sqe(&ring);
                      if (!sqe)
                              break;
      
                      io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
                      sqe->flags |= IOSQE_FIXED_FILE;
      
                      ret = io_uring_submit(&ring);
                      sleep(1);
                      printf("submit %d\n", i);
              }
      
              for (i = 0; i < 10; i++) {
                      io_uring_wait_cqe(&ring, &cqe);
                      printf("receive: %d\n", i);
                      if (cqe->res != 4096) {
                              fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
                              ret = 1;
                      }
                      io_uring_cqe_seen(&ring, cqe);
              }
      
              close(fd);
              io_uring_queue_exit(&ring);
              return 0;
      }
      sudo ./test testfile
      above command will hang on the tenth request, to fix this bug, when io
      sq_thread is waken up, we reset the variable 'ret' to be zero.
      Suggested-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d4ae271d
    • Jens Axboe's avatar
      io_uring: don't add non-IO requests to iopoll pending list · b532576e
      Jens Axboe authored
      We normally disable any commands that aren't specifically poll commands
      for a ring that is setup for polling, but we do allow buffer provide and
      remove commands to support buffer selection for polled IO. Once a
      request is issued, we add it to the poll list to poll for completion. But
      we should not do that for non-IO commands, as those request complete
      inline immediately and aren't pollable. If we do, we can leave requests
      on the iopoll list after they are freed.
      
      Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b532576e
  2. 19 May, 2020 1 commit
  3. 18 May, 2020 1 commit
    • Jens Axboe's avatar
      io_uring: cancel work if task_work_add() fails · e3aabf95
      Jens Axboe authored
      We currently move it to the io_wqe_manager for execution, but we cannot
      safely do so as we may lack some of the state to execute it out of
      context. As we cancel work anyway when the ring/task exits, just mark
      this request as canceled and io_async_task_func() will do the right
      thing.
      
      Fixes: aa96bf8a ("io_uring: use io-wq manager as backup task if task is exiting")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e3aabf95
  4. 17 May, 2020 4 commits
    • Jens Axboe's avatar
      io_uring: remove dead check in io_splice() · 948a7749
      Jens Axboe authored
      We checked for 'force_nonblock' higher up, so it's definitely false
      at this point. Kill the check, it's a remnant of when we tried to do
      inline splice without always punting to async context.
      
      Fixes: 2fb3e822 ("io_uring: punt splice async because of inode mutex")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      948a7749
    • Pavel Begunkov's avatar
      io_uring: fix FORCE_ASYNC req preparation · bd2ab18a
      Pavel Begunkov authored
      As for other not inlined requests, alloc req->io for FORCE_ASYNC reqs,
      so they can be prepared properly.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bd2ab18a
    • Pavel Begunkov's avatar
      io_uring: don't prepare DRAIN reqs twice · 650b5481
      Pavel Begunkov authored
      If req->io is not NULL, it's already prepared. Don't do it again,
      it's dangerous.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      650b5481
    • Jens Axboe's avatar
      io_uring: initialize ctx->sqo_wait earlier · 583863ed
      Jens Axboe authored
      Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated,
      instead of deferring it to the offload setup. This fixes a syzbot
      reported lockdep complaint, which is really due to trying to wake_up
      on an uninitialized wait queue:
      
      RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319
      RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b
      RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260
      R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       assign_lock_key kernel/locking/lockdep.c:913 [inline]
       register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
       __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234
       lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
       __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122
       io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160
       io_poll_remove_all fs/io_uring.c:4357 [inline]
       io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305
       io_uring_create fs/io_uring.c:7843 [inline]
       io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x441319
      Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      
      Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      583863ed
  5. 13 May, 2020 1 commit
    • Jens Axboe's avatar
      io_uring: polled fixed file must go through free iteration · 9d9e88a2
      Jens Axboe authored
      When we changed the file registration handling, it became important to
      iterate the bulk request freeing list for fixed files as well, or we
      miss dropping the fixed file reference. If not, we're leaking references,
      and we'll get a kworker stuck waiting for file references to disappear.
      
      This also means we can remove the special casing of fixed vs non-fixed
      files, we need to iterate for both and we can just rely on
      __io_req_aux_free() doing io_put_file() instead of doing it manually.
      
      Fixes: 05589553 ("io_uring: refactor file register/unregister/update handling")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9d9e88a2
  6. 09 May, 2020 1 commit
  7. 07 May, 2020 2 commits
  8. 05 May, 2020 1 commit
  9. 04 May, 2020 1 commit
  10. 03 May, 2020 4 commits
  11. 02 May, 2020 8 commits
    • Linus Torvalds's avatar
      Merge tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 743f0573
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
      
       - prevent the intel_pstate driver from printing excessive diagnostic
         messages in some cases (Chris Wilson)
      
       - make the hibernation restore kernel freeze kernel threads as well as
         user space tasks (Dexuan Cui)
      
       - fix the ACPI device PM disagnostic messages to include the correct
         power state name (Kai-Heng Feng).
      
      * tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: ACPI: Output correct message on target power state
        PM: hibernate: Freeze kernel threads in software_resume()
        cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
      743f0573
    • Rafael J. Wysocki's avatar
      Merge branches 'pm-cpufreq' and 'pm-sleep' · a5383996
      Rafael J. Wysocki authored
      * pm-cpufreq:
        cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
      
      * pm-sleep:
        PM: hibernate: Freeze kernel threads in software_resume()
      a5383996
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f66ed1eb
      Linus Torvalds authored
      Pull iomap fix from Darrick Wong:
       "Hoist the check for an unrepresentable FIBMAP return value into
        ioctl_fibmap.
      
        The internal kernel function can handle 64-bit values (and is needed
        to fix a regression on ext4 + jbd2). It is only the userspace ioctl
        that is so old that it cannot deal"
      
      * tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fibmap: Warn and return an error in case of block > INT_MAX
      f66ed1eb
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 29a47f45
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
        Stable fixes:
         - fix handling of backchannel binding in BIND_CONN_TO_SESSION
      
        Bugfixes:
         - Fix a credential use-after-free issue in pnfs_roc()
         - Fix potential posix_acl refcnt leak in nfs3_set_acl
         - defer slow parts of rpc_free_client() to a workqueue
         - Fix an Oopsable race in __nfs_list_for_each_server()
         - Fix trace point use-after-free race
         - Regression: the RDMA client no longer responds to server disconnect
           requests
         - Fix return values of xdr_stream_encode_item_{present, absent}
         - _pnfs_return_layout() must always wait for layoutreturn completion
      
        Cleanups:
         - Remove unreachable error conditions"
      
      * tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFS: Fix a race in __nfs_list_for_each_server()
        NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
        SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
        NFSv4: Remove unreachable error condition due to rpc_run_task()
        SUNRPC: Remove unreachable error condition
        xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
        xprtrdma: Fix trace point use-after-free race
        xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
        nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
        NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
        NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
      29a47f45
    • Linus Torvalds's avatar
      Merge tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma · ed6889db
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "Core:
         - Documentation typo fixes
         - fix the channel indexes
         - dmatest: fixes for process hang and iterations
      
        Drivers:
         - hisilicon: build error fix without PCI_MSI
         - ti-k3: deadlock fix
         - uniphier-xdmac: fix for reg region
         - pch: fix data race
         - tegra: fix clock state"
      
      * tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: dmatest: Fix process hang when reading 'wait' parameter
        dmaengine: dmatest: Fix iteration non-stop logic
        dmaengine: tegra-apb: Ensure that clock is enabled during of DMA synchronization
        dmaengine: fix channel index enumeration
        dmaengine: mmp_tdma: Reset channel error on release
        dmaengine: mmp_tdma: Do not ignore slave config validation errors
        dmaengine: pch_dma.c: Avoid data race between probe and irq handler
        dt-bindings: dma: uniphier-xdmac: switch to single reg region
        include/linux/dmaengine: Typos fixes in API documentation
        dmaengine: xilinx_dma: Add missing check for empty list
        dmaengine: ti: k3-psil: fix deadlock on error path
        dmaengine: hisilicon: Fix build error without PCI_MSI
      ed6889db
    • Linus Torvalds's avatar
      Merge tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio · 690e2aba
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - copy_*_user validity check for new vfio_dma_rw interface (Yan Zhao)
      
       - Fix a potential math overflow (Yan Zhao)
      
       - Use follow_pfn() for calculating PFNMAPs (Sean Christopherson)
      
      * tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
        vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
        vfio: checking of validity of user vaddr in vfio_dma_rw
      690e2aba
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 42eb62d4
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "Add -fasynchronous-unwind-tables to the vDSO CFLAGS"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: vdso: Add -fasynchronous-unwind-tables to cflags
      42eb62d4
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block · cf018530
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix for statx not grabbing the file table, making AT_EMPTY_PATH fail
      
       - Cover a few cases where async poll can handle retry, eliminating the
         need for an async thread
      
       - fallback request busy/free fix (Bijan)
      
       - syzbot reported SQPOLL thread exit fix for non-preempt (Xiaoguang)
      
       - Fix extra put of req for sync_file_range (Pavel)
      
       - Always punt splice async. We'll improve this for 5.8, but wanted to
         eliminate the inode mutex lock from the non-blocking path for 5.7
         (Pavel)
      
      * tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
        io_uring: punt splice async because of inode mutex
        io_uring: check non-sync defer_list carefully
        io_uring: fix extra put in sync_file_range()
        io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()
        io_uring: use proper references for fallback_req locking
        io_uring: only force async punt if poll based retry can't handle it
        io_uring: enable poll retry for any file with ->read_iter / ->write_iter
        io_uring: statx must grab the file table for valid fd
      cf018530
  12. 01 May, 2020 14 commits