1. 21 Sep, 2020 3 commits
    • Douglas Gilbert's avatar
      tools/io_uring: fix compile breakage · 72f04da4
      Douglas Gilbert authored
      It would seem none of the kernel continuous integration does this:
          $ cd tools/io_uring
          $ make
      
      Otherwise it may have noticed:
         cc -Wall -Wextra -g -D_GNU_SOURCE   -c -o io_uring-bench.o
      	 io_uring-bench.c
      io_uring-bench.c:133:12: error: static declaration of ‘gettid’
      	 follows non-static declaration
        133 | static int gettid(void)
            |            ^~~~~~
      In file included from /usr/include/unistd.h:1170,
                       from io_uring-bench.c:27:
      /usr/include/x86_64-linux-gnu/bits/unistd_ext.h:34:16: note:
      	 previous declaration of ‘gettid’ was here
         34 | extern __pid_t gettid (void) __THROW;
            |                ^~~~~~
      make: *** [<builtin>: io_uring-bench.o] Error 1
      
      The problem on Ubuntu 20.04 (with lk 5.9.0-rc5) is that unistd.h
      already defines gettid(). So prefix the local definition with
      "lk_".
      Signed-off-by: default avatarDouglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      72f04da4
    • Jens Axboe's avatar
      io_uring: don't use retry based buffered reads for non-async bdev · f5cac8b1
      Jens Axboe authored
      Some block devices, like dm, bubble back -EAGAIN through the completion
      handler. We check for this in io_read(), but don't honor it for when
      we have copied the iov. Return -EAGAIN for this case before retrying,
      to force punt to io-wq.
      
      Fixes: bcf5a063 ("io_uring: support true async buffered reads, if file provides it")
      Reported-by: default avatarZorro Lang <zlang@redhat.com>
      Tested-by: default avatarZorro Lang <zlang@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f5cac8b1
    • Jens Axboe's avatar
      io_uring: don't re-setup vecs/iter in io_resumit_prep() is already there · 8f3d7496
      Jens Axboe authored
      If we already have mapped the necessary data for retry, then don't set
      it up again. It's a pointless operation, and we leak the iovec if it's
      a large (non-stack) vec.
      
      Fixes: b63534c4 ("io_uring: re-issue block requests that failed because of resources")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8f3d7496
  2. 14 Sep, 2020 2 commits
  3. 13 Sep, 2020 1 commit
    • Jens Axboe's avatar
      io_uring: grab any needed state during defer prep · 202700e1
      Jens Axboe authored
      Always grab work environment for deferred links. The assumption that we
      will be running it always from the task in question is false, as exiting
      tasks may mean that we're deferring this one to a thread helper. And at
      that point it's too late to grab the work environment.
      
      Fixes: debb85f4 ("io_uring: factor out grab_env() from defer_prep()")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      202700e1
  4. 05 Sep, 2020 3 commits
  5. 02 Sep, 2020 2 commits
  6. 01 Sep, 2020 1 commit
  7. 27 Aug, 2020 3 commits
  8. 26 Aug, 2020 1 commit
    • Jens Axboe's avatar
      io_uring: make offset == -1 consistent with preadv2/pwritev2 · 0fef9483
      Jens Axboe authored
      The man page for io_uring generally claims were consistent with what
      preadv2 and pwritev2 accept, but turns out there's a slight discrepancy
      in how offset == -1 is handled for pipes/streams. preadv doesn't allow
      it, but preadv2 does. This currently causes io_uring to return -EINVAL
      if that is attempted, but we should allow that as documented.
      
      This change makes us consistent with preadv2/pwritev2 for just passing
      in a NULL ppos for streams if the offset is -1.
      
      Cc: stable@vger.kernel.org # v5.7+
      Reported-by: default avatarBenedikt Ames <wisp3rwind@posteo.eu>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0fef9483
  9. 25 Aug, 2020 4 commits
  10. 23 Aug, 2020 2 commits
  11. 20 Aug, 2020 3 commits
    • Pavel Begunkov's avatar
      io_uring: kill extra iovec=NULL in import_iovec() · 867a23ea
      Pavel Begunkov authored
      If io_import_iovec() returns an error, return iovec is undefined and
      must not be used, so don't set it to NULL when failing.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      867a23ea
    • Pavel Begunkov's avatar
      io_uring: comment on kfree(iovec) checks · f261c168
      Pavel Begunkov authored
      kfree() handles NULL pointers well, but io_{read,write}() checks it
      because of performance reasons. Leave a comment there for those who are
      tempted to patch it.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f261c168
    • Pavel Begunkov's avatar
      io_uring: fix racy req->flags modification · bb175342
      Pavel Begunkov authored
      Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and
      io_cqring_overflow_flush() are racy, because they might be called
      asynchronously.
      
      REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can
      be guaranteed that requests _currently_ marked inflight can't be
      overflown, the problem will be solved with removing the flag
      altogether.
      
      That's how the patch works, it removes inflight status of a request
      in io_cqring_fill_event() whenever it should be thrown into CQ-overflow
      list. That's Ok to do, because no opcode specific handling can be done
      after io_cqring_fill_event(), the same assumption as with "struct
      io_completion" patches.
      And it already have a good place for such cleanups, which is
      io_clean_op(). A nice side effect of this is removing this inflight
      check from the hot path.
      
      note on synchronisation: now __io_cqring_fill_event() may be taking two
      spinlocks simultaneously, completion_lock and inflight_lock. It's fine,
      because we never do that in reverse order, and CQ-overflow of inflight
      requests shouldn't happen often.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bb175342
  12. 19 Aug, 2020 1 commit
    • Jens Axboe's avatar
      io_uring: use system_unbound_wq for ring exit work · fc666777
      Jens Axboe authored
      We currently use system_wq, which is unbounded in terms of number of
      workers. This means that if we're exiting tons of rings at the same
      time, then we'll briefly spawn tons of event kworkers just for a very
      short blocking time as the rings exit.
      
      Use system_unbound_wq instead, which has a sane cap on the concurrency
      level.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fc666777
  13. 18 Aug, 2020 1 commit
    • Jens Axboe's avatar
      io_uring: cleanup io_import_iovec() of pre-mapped request · 8452fd0c
      Jens Axboe authored
      io_rw_prep_async() goes through a dance of clearing req->io, calling
      the iovec import, then re-setting req->io. Provide an internal helper
      that does the right thing without needing state tweaked to get there.
      
      This enables further cleanups in io_read, io_write, and
      io_resubmit_prep(), but that's left for another time.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8452fd0c
  14. 16 Aug, 2020 8 commits
    • Jens Axboe's avatar
      io_uring: get rid of kiocb_wait_page_queue_init() · 3b2a4439
      Jens Axboe authored
      The 5.9 merge moved this function io_uring, which means that we don't
      need to retain the generic nature of it. Clean up this part by removing
      redundant checks, and just inlining the small remainder in
      io_rw_should_retry().
      
      No functional changes in this patch.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3b2a4439
    • Jens Axboe's avatar
      io_uring: find and cancel head link async work on files exit · b711d4ea
      Jens Axboe authored
      Commit f254ac04 ("io_uring: enable lookup of links holding inflight files")
      only handled 2 out of the three head link cases we have, we also need to
      lookup and cancel work that is blocked in io-wq if that work has a link
      that's holding a reference to the files structure.
      
      Put the "cancel head links that hold this request pending" logic into
      io_attempt_cancel(), which will to through the motions of finding and
      canceling head links that hold the current inflight files stable request
      pending.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b711d4ea
    • Linus Torvalds's avatar
      Linux 5.9-rc1 · 9123e3a7
      Linus Torvalds authored
      9123e3a7
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block · 2cc3c4b3
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A few differerent things in here.
      
        Seems like syzbot got some more io_uring bits wired up, and we got a
        handful of reports and the associated fixes are in here.
      
        General fixes too, and a lot of them marked for stable.
      
        Lastly, a bit of fallout from the async buffered reads, where we now
        more easily trigger short reads. Some applications don't really like
        that, so the io_read() code now handles short reads internally, and
        got a cleanup along the way so that it's now easier to read (and
        documented). We're now passing tests that failed before"
      
      * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
        io_uring: short circuit -EAGAIN for blocking read attempt
        io_uring: sanitize double poll handling
        io_uring: internally retry short reads
        io_uring: retain iov_iter state over io_read/io_write calls
        task_work: only grab task signal lock when needed
        io_uring: enable lookup of links holding inflight files
        io_uring: fail poll arm on queue proc failure
        io_uring: hold 'ctx' reference around task_work queue + execute
        fs: RWF_NOWAIT should imply IOCB_NOIO
        io_uring: defer file table grabbing request cleanup for locked requests
        io_uring: add missing REQ_F_COMP_LOCKED for nested requests
        io_uring: fix recursive completion locking on oveflow flush
        io_uring: use TWA_SIGNAL for task_work uncondtionally
        io_uring: account locked memory before potential error case
        io_uring: set ctx sq/cq entry count earlier
        io_uring: Fix NULL pointer dereference in loop_rw_iter()
        io_uring: add comments on how the async buffered read retry works
        io_uring: io_async_buf_func() need not test page bit
      2cc3c4b3
    • Mike Rapoport's avatar
      parisc: fix PMD pages allocation by restoring pmd_alloc_one() · 6f6aea7e
      Mike Rapoport authored
      Commit 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one()
      and pmd_free_one()") converted parisc to use generic version of
      pmd_alloc_one() but it missed the fact that parisc uses order-1 pages for
      PMD.
      
      Restore the original version of pmd_alloc_one() for parisc, just use
      GFP_PGTABLE_KERNEL that implies __GFP_ZERO instead of GFP_KERNEL and
      memset.
      
      Fixes: 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()")
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Reviewed-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.eeSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6f6aea7e
    • Linus Torvalds's avatar
      Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block · 4b6c093e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes on the block side of things:
      
         - Discard granularity fix (Coly)
      
         - rnbd cleanups (Guoqing)
      
         - md error handling fix (Dan)
      
         - md sysfs fix (Junxiao)
      
         - Fix flush request accounting, which caused an IO slowdown for some
           configurations (Ming)
      
         - Properly propagate loop flag for partition scanning (Lennart)"
      
      * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
        block: fix double account of flush request's driver tag
        loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
        rnbd: no need to set bi_end_io in rnbd_bio_map_kern
        rnbd: remove rnbd_dev_submit_io
        md-cluster: Fix potential error pointer dereference in resize_bitmaps()
        block: check queue's limits.discard_granularity in __blkdev_issue_discard()
        md: get sysfs entry after redundancy attr group create
      4b6c093e
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d84835b1
      Linus Torvalds authored
      Pull RISC-V fix from Palmer Dabbelt:
       "I collected a single fix during the merge window: we managed to break
        the early trap setup on !MMU, this fixes it"
      
      * tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Setup exception vector for nommu platform
      d84835b1
    • Linus Torvalds's avatar
      Merge tag 'sh-for-5.9' of git://git.libc.org/linux-sh · 5bbec3cf
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
        changes to arch/sh"
      
      * tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
        sh: landisk: Add missing initialization of sh_io_port_base
        sh: bring syscall_set_return_value in line with other architectures
        sh: Add SECCOMP_FILTER
        sh: Rearrange blocks in entry-common.S
        sh: switch to copy_thread_tls()
        sh: use the generic dma coherent remap allocator
        sh: don't allow non-coherent DMA for NOMMU
        dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
        sh: unexport register_trapped_io and match_trapped_io_handler
        sh: don't include <asm/io_trapped.h> in <asm/io.h>
        sh: move the ioremap implementation out of line
        sh: move ioremap_fixed details out of <asm/io.h>
        sh: remove __KERNEL__ ifdefs from non-UAPI headers
        sh: sort the selects for SUPERH alphabetically
        sh: remove -Werror from Makefiles
        sh: Replace HTTP links with HTTPS ones
        arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
        sh: stacktrace: Remove stacktrace_ops.stack()
        sh: machvec: Modernize printing of kernel messages
        sh: pci: Modernize printing of kernel messages
        ...
      5bbec3cf
  15. 15 Aug, 2020 5 commits
    • Jens Axboe's avatar
      io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56
      Jens Axboe authored
      One case was missed in the short IO retry handling, and that's hitting
      -EAGAIN on a blocking attempt read (eg from io-wq context). This is a
      problem on sockets that are marked as non-blocking when created, they
      don't carry any REQ_F_NOWAIT information to help us terminate them
      instead of perpetually retrying.
      
      Fixes: 227c0c96 ("io_uring: internally retry short reads")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f91daf56
    • Jens Axboe's avatar
      io_uring: sanitize double poll handling · d4e7cd36
      Jens Axboe authored
      There's a bit of confusion on the matching pairs of poll vs double poll,
      depending on if the request is a pure poll (IORING_OP_POLL_ADD) or
      poll driven retry.
      
      Add io_poll_get_double() that returns the double poll waitqueue, if any,
      and io_poll_get_single() that returns the original poll waitqueue. With
      that, remove the argument to io_poll_remove_double().
      
      Finally ensure that wait->private is cleared once the double poll handler
      has run, so that remove knows it's already been seen.
      
      Cc: stable@vger.kernel.org # v5.8
      Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d4e7cd36
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · 713eee84
      Linus Torvalds authored
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
       "Fixes:
         - Fixes for 'perf bench numa'.
      
         - Always memset source before memcpy in 'perf bench mem'.
      
         - Quote CC and CXX for their arguments to fix build in environments
           using those variables to pass more than just the compiler names.
      
         - Fix module symbol processing, addressing regression detected via
           "perf test".
      
         - Allow multiple probes in record+script_probe_vfs_getname.sh 'perf
           test' entry.
      
        Improvements:
         - Add script to autogenerate socket family name id->string table from
           copy of kernel header, used so far in 'perf trace'.
      
         - 'perf ftrace' improvements to provide similar options for this
           utility so that one can go from 'perf record', 'perf trace', etc to
           'perf ftrace' just by changing the name of the subcommand.
      
         - Prefer new "sched:sched_waking" trace event when it exists in 'perf
           sched' post processing.
      
         - Update POWER9 metrics to utilize other metrics.
      
         - Fall back to querying debuginfod if debuginfo not found locally.
      
        Miscellaneous:
         - Sync various kvm headers with kernel sources"
      
      * tag 'perf-tools-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (40 commits)
        perf ftrace: Make option description initials all capital letters
        perf build-ids: Fall back to debuginfod query if debuginfo not found
        perf bench numa: Remove dead code in parse_nodes_opt()
        perf stat: Update POWER9 metrics to utilize other metrics
        perf ftrace: Add change log
        perf: ftrace: Add set_tracing_options() to set all trace options
        perf ftrace: Add option --tid to filter by thread id
        perf ftrace: Add option -D/--delay to delay tracing
        perf: ftrace: Allow set graph depth by '--graph-opts'
        perf ftrace: Add support for trace option tracing_thresh
        perf ftrace: Add option 'verbose' to show more info for graph tracer
        perf ftrace: Add support for tracing option 'irq-info'
        perf ftrace: Add support for trace option funcgraph-irqs
        perf ftrace: Add support for trace option sleep-time
        perf ftrace: Add support for tracing option 'func_stack_trace'
        perf tools: Add general function to parse sublevel options
        perf ftrace: Add option '--inherit' to trace children processes
        perf ftrace: Show trace column header
        perf ftrace: Add option '-m/--buffer-size' to set per-cpu buffer size
        perf ftrace: Factor out function write_tracing_file_int()
        ...
      713eee84
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 50f6c7db
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes and small updates all around the place:
      
         - Fix mitigation state sysfs output
      
         - Fix an FPU xstate/sxave code assumption bug triggered by
           Architectural LBR support
      
         - Fix Lightning Mountain SoC TSC frequency enumeration bug
      
         - Fix kexec debug output
      
         - Fix kexec memory range assumption bug
      
         - Fix a boundary condition in the crash kernel code
      
         - Optimize porgatory.ro generation a bit
      
         - Enable ACRN guests to use X2APIC mode
      
         - Reduce a __text_poke() IRQs-off critical section for the benefit of
           PREEMPT_RT"
      
      * tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/alternatives: Acquire pte lock with interrupts enabled
        x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
        x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
        x86/purgatory: Don't generate debug info for purgatory.ro
        x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
        kexec_file: Correctly output debugging information for the PT_LOAD ELF header
        kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
        x86/crash: Correct the address boundary of function parameters
        x86/acrn: Remove redundant chars from ACRN signature
        x86/acrn: Allow ACRN guest to use X2APIC mode
      50f6c7db
    • Linus Torvalds's avatar
      Merge tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1195d58f
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Two fixes: fix a new tracepoint's output value, and fix the formatting
        of show-state syslog printouts"
      
      * tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/debug: Fix the alignment of the show-state debug output
        sched: Fix use of count for nr_running tracepoint
      1195d58f