1. 26 Nov, 2019 27 commits
    • io_uring: drain next sqe instead of shadowing · 1b4a51b6
      Pavel Begunkov authored
      There's an issue with the shadow drain logic in that we drop the
      completion lock after deciding to defer a request, then re-grab it later
      and assume that the state is still the same. In the meantime, someone
      else completing a request could have found and issued it. This can cause
      a stall in the queue, by having a shadow request inserted that nobody is
      going to drain.
      
      Additionally, if we fail allocating the shadow request, we simply ignore
      the drain.
      
      Instead of using a shadow request, defer the next request/link instead.
      This also has the following advantages:
      
      - removes semi-duplicated code
      - doesn't allocate memory for shadows
      - works better if only the head is marked for drain
      - doesn't need complex synchronisation
      
      On the flip side, it removes the shadow->seq ==
      last_drain_in_in_link->seq optimization. That shouldn't be a common
      case, and can always be added back, if needed.
      
      Fixes: 4fe2c963 ("io_uring: add support for link with drain")
      Cc: Jackie Liu <liuyun01@kylinos.cn>
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
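The "defer the next request/link" idea in this commit message can be pictured with a loose user-space model. This is a sketch under stated assumptions, not the kernel code: the flag names and values, `struct ring_ctx`, `struct request`, and `queue_head()` are all hypothetical, chosen only to show a drain request being carried over to the following submission instead of being represented by a separately allocated shadow request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the names echo the commit text, the values are made up. */
enum {
    REQ_IO_DRAIN   = 1u << 0, /* request must wait for all earlier requests */
    REQ_DRAIN_LINK = 1u << 1  /* a member of this link asked for a drain */
};

struct ring_ctx {
    bool drain_next; /* true when the next queued request/link must drain */
};

struct request {
    unsigned flags;
};

/*
 * Called once per standalone request or link head, in submission order.
 * Returns true if the request has to be deferred behind earlier requests.
 */
bool queue_head(struct ring_ctx *ctx, struct request *head)
{
    assert(ctx != NULL && head != NULL);

    /* A pending drain is applied to whatever is submitted next. */
    if (ctx->drain_next) {
        head->flags |= REQ_IO_DRAIN;
        ctx->drain_next = false;
    }

    /* A drain requested inside this link is handed to the following head. */
    ctx->drain_next = (head->flags & REQ_DRAIN_LINK) != 0;

    return (head->flags & REQ_IO_DRAIN) != 0;
}
```

In this model no extra object is allocated and no lock is dropped between deciding to drain and marking the request, which matches the advantages the message lists.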
    • io_uring: close lookup gap for dependent next work · b76da70f
      Jens Axboe authored
      When we find new work to process within the work handler, we queue the
      linked timeout before we have issued the new work. This can be
      problematic for very short timeouts, as we have a window where the new
      work isn't visible.
      
      Allow the work handler to store a callback function for this in the work
      item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that
      is set, then io-wq will call the callback when it has set up the new work
      item.
      Reported-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
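The ordering described above (publish the new work item first, then run a caller-supplied callback) might look roughly like the following. It is a minimal sketch, not io-wq itself: `WORK_HAS_CB` is a stand-in for the `IO_WQ_WORK_CB` flag named in the message, and `struct work`, `struct worker`, `worker_assign()`, and the probe callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WORK_HAS_CB (1u << 0) /* hypothetical stand-in for IO_WQ_WORK_CB */

struct work {
    unsigned flags;
    void (*cb)(struct work *); /* only meaningful when WORK_HAS_CB is set */
    void *cb_data;
};

struct worker {
    struct work *cur; /* work item that lookups (e.g. a timeout) can find */
};

/* Make the new work item visible first, then run the stored callback. */
void worker_assign(struct worker *w, struct work *next)
{
    assert(w != NULL && next != NULL);
    w->cur = next;
    if (next->flags & WORK_HAS_CB)
        next->cb(next);
}

/* Example callback: records whether the work was already visible when it ran. */
struct timeout_probe {
    struct worker *owner;
    int saw_work;
};

void arm_linked_timeout(struct work *work)
{
    struct timeout_probe *probe = work->cb_data;
    probe->saw_work = (probe->owner->cur == work);
}
```

Because the callback runs only after `w->cur` is set, even a very short timeout armed from the callback would find the work item it is meant to cancel.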
    • io_uring: allow finding next link independent of req reference count · 4d7dd462
      Jens Axboe authored
      We currently try and start the next link when we put the request, and
      only if we were going to free it. This means that the optimization to
      continue executing requests from the same context often fails, as we're
      not putting the final reference.
      
      Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the
      next request more efficiently.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: io_allocate_scq_urings() should return a sane state · eb065d30
      Jens Axboe authored
      We currently rely on ring destruction to clean things up in case of
      failure, but io_allocate_scq_urings() can leave things half initialized
      if only part of it fails.
      
      Be nice and either return with everything set up on success, or return an
      error with things nicely cleaned up.
      
      Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
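The "everything set up, or an error with everything cleaned up" contract is a common C pattern. The sketch below illustrates it in user space with `calloc()`; it is not the body of `io_allocate_scq_urings()`, and `struct ring_pair`, `ring_pair_init()`, and `ring_pair_destroy()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct ring_pair {
    void *sq; /* submission-side buffer */
    void *cq; /* completion-side buffer */
};

/*
 * Returns 0 with both buffers allocated, or -ENOMEM with both pointers NULL.
 * Sizes are assumed to be non-zero.
 */
int ring_pair_init(struct ring_pair *rp, size_t sq_bytes, size_t cq_bytes)
{
    assert(rp != NULL);
    rp->sq = NULL;
    rp->cq = NULL;

    rp->sq = calloc(1, sq_bytes);
    if (!rp->sq)
        return -ENOMEM;

    rp->cq = calloc(1, cq_bytes);
    if (!rp->cq) {
        /* Undo the first allocation so the caller never sees a half-built pair. */
        free(rp->sq);
        rp->sq = NULL;
        return -ENOMEM;
    }
    return 0;
}

/* Safe to call on a pair that failed to initialize: free(NULL) is a no-op. */
void ring_pair_destroy(struct ring_pair *rp)
{
    free(rp->sq);
    free(rp->cq);
    rp->sq = NULL;
    rp->cq = NULL;
}
```

With this shape the caller does not need a separate teardown path to recover from a partial failure.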
    • io_uring: Always REQ_F_FREE_SQE for allocated sqe · bbad27b2
      Pavel Begunkov authored
      Always mark requests with allocated sqe and deallocate it in
      __io_free_req(). It's easier to follow and doesn't add edge cases.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: io_fail_links() should only consider first linked timeout · 5d960724
      Jens Axboe authored
      We currently clear the linked timeout field if we cancel such a timeout,
      but we should only attempt to cancel if it's the first one we see.
      Others should simply be freed like other requests, as they haven't
      been started yet.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: Fix leaking linked timeouts · 09fbb0a8
      Pavel Begunkov authored
      Let's take a dependent link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT
      
      1. submission stage: submission references for REQ and LINK_TIMEOUT
      are dropped. So the references are, respectively, (1,1,2)
      
      2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all
      linked timeouts will call cancel_timeout() and drop 1 reference.
      So, references after: (0,0,1). That's a leak.
      
      Make it treat only the first linked timeout as such, and pass others
      through __io_double_put_req().
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
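The reference arithmetic in this message can be replayed with plain integers. The sketch below is only a model of the counts given above, not kernel code: `fail_links_old()` and `fail_links_fixed()` are hypothetical, and it assumes (from its name) that `__io_double_put_req()` drops two references.

```c
#include <assert.h>
#include <stddef.h>

/* refs[0] is REQ; refs[1..n-1] are the linked timeouts that follow it. */

/* Behaviour described as leaking: every linked timeout loses one reference. */
void fail_links_old(int *refs, size_t n)
{
    assert(refs != NULL && n >= 1);
    refs[0] -= 1; /* io_put(REQ) */
    for (size_t i = 1; i < n; i++)
        refs[i] -= 1; /* cancel path drops a single reference */
}

/*
 * Behaviour after the fix: only the first linked timeout is treated as a
 * started timeout (one reference dropped); later ones are assumed to lose
 * both references through a double put.
 */
void fail_links_fixed(int *refs, size_t n)
{
    assert(refs != NULL && n >= 1);
    refs[0] -= 1; /* io_put(REQ) */
    for (size_t i = 1; i < n; i++)
        refs[i] -= (i == 1) ? 1 : 2;
}
```

Starting from (1,1,2), the old path ends at (0,0,1), which is the leak described in step 2, while the fixed path ends at (0,0,0).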
    • io_uring: remove redundant check · f70193d6
      Pavel Begunkov authored
      Pass any IORING_OP_LINK_TIMEOUT request further, where it will
      eventually fail in io_issue_sqe().
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: break links for failed defer · d3b35796
      Pavel Begunkov authored
      If io_req_defer() failed, it needs to cancel a dependent link.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: remove extra space characters · b2e9c7d6
      Dan Carpenter authored
      These lines are indented an extra space character.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: wait for io_wq_create() to setup necessary workers · b60fda60
      Jens Axboe authored
      We currently have a race where if setup is really slow, we can be
      calling io_wq_destroy() before we're done setting up. This will cause
      the caller to get stuck waiting for the manager to set things up, but
      the manager already exited.
      
      Fix this by doing a sync setup of the manager. This also fixes the case
      where if we failed creating workers, we'd also get stuck.
      
      In practice this race window was really small, as we already wait for
      the manager to start. Hence someone would have to call io_wq_destroy()
      after the task has started, but before it started the first loop. The
      reported test case forked tons of these, which is why it became an
      issue.
      
      Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
      Fixes: 771b53d0 ("io-wq: small threadpool implementation for io_uring")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: request cancellations should break links · fba38c27
      Jens Axboe authored
      We currently don't explicitly break links if a request is cancelled, but
      we should. Add explicit link breakage for all types of request
      cancellations that we support.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: correct poll cancel and linked timeout expiration completion · b0dd8a41
      Jens Axboe authored
      Currently a poll request fills a completion entry of 0, even if it got
      cancelled. This is odd, and it makes it harder to support with chains.
      Ensure that it returns -ECANCELED in the completion events if it got
      cancelled, and furthermore ensure that the linked timeout that triggered
      it completes with -ETIME if we did indeed trigger the completions
      through a timeout.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
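From the consumer's side, the result codes described above let a poll that was cut short by its linked timeout be told apart from other outcomes. The helper below is a hypothetical sketch operating on plain result integers rather than real completion entries; `poll_hit_linked_timeout()` is not an io_uring or liburing function.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Given the result codes of a poll request and of the timeout linked to it,
 * report whether the poll ended because the linked timeout expired:
 * the poll completes with -ECANCELED and the timeout with -ETIME.
 */
bool poll_hit_linked_timeout(int poll_res, int timeout_res)
{
    return poll_res == -ECANCELED && timeout_res == -ETIME;
}
```

Before this change a cancelled poll reported 0, so a check like this could not distinguish a cancellation from a normal completion.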
    • io_uring: remove dead REQ_F_SEQ_PREV flag · e0e328c4
      Jens Axboe authored
      With the conversion to io-wq, we no longer use that flag. Kill it.
      
      Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix sequencing issues with linked timeouts · 94ae5e77
      Jens Axboe authored
      We have an issue with timeout links that are deeper in the submit chain,
      because we only handle it upfront, not from later submissions. Move the
      prep + issue of the timeout link to the async work prep handler, and do
      it normally for non-async queue. If we validate and prepare the timeout
      links upfront when we first see them, there's nothing stopping us from
      supporting any sort of nesting.
      
      Fixes: 2665abfd ("io_uring: add support for linked SQE timeouts")
      Reported-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: make req->timeout be dynamically allocated · ad8a48ac
      Jens Axboe authored
      There are a few reasons for this:
      
      - As a prep to improving the linked timeout logic
      - io_timeout is the biggest member in the io_kiocb opcode union
      
      This also enables a few cleanups, like unifying the timer setup between
      IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT, and not needing multiple
      arguments to the link/prep helpers.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: make io_double_put_req() use normal completion path · 978db57e
      Jens Axboe authored
      If we don't use the normal completion path, we may skip killing links
      that should be errored and freed. Add __io_double_put_req() for use
      within the completion path itself, other calls should just use
      io_double_put_req().
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: cleanup return values from the queueing functions · 0e0702da
      Jens Axboe authored
      __io_queue_sqe(), io_queue_sqe(), io_queue_link_head() all return 0/err,
      but the caller doesn't care since the errors are handled inline. Clean
      these up and just make them void.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: io_async_cancel() should pass in 'nxt' request pointer · 95a5bbae
      Jens Axboe authored
      If we have a linked request, this enables us to pass it back directly
      without having to go through async context.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 9c91e6a5
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
       "A lot of changes this time around, details below.
      
        From the next cycle onwards, we'll switch the EDAC tree to topic
        branches (instead of a single edac-for-next branch) which should make
        the changes handling more flexible, hopefully. We'll see.
      
        Summary:
      
         - Rework error logging functions to accept a count of errors
           parameter (Hanna Hawa)
      
         - Part one of substantial EDAC core + ghes_edac driver cleanup
           (Robert Richter)
      
         - Print additional useful logging information in skx_* (Tony Luck)
      
         - Improve amd64_edac hw detection + cleanups (Yazen Ghannam)
      
         - Misc cleanups, fixes and code improvements"
      
      * tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (35 commits)
        EDAC/altera: Use the Altera System Manager driver
        EDAC/altera: Cleanup the ECC Manager
        EDAC/altera: Use fast register IO for S10 IRQs
        EDAC/ghes: Do not warn when incrementing refcount on 0
        EDAC/Documentation: Describe CPER module definition and DIMM ranks
        EDAC: Unify the mc_event tracepoint call
        EDAC/ghes: Remove intermediate buffer pvt->detail_location
        EDAC/ghes: Fix grain calculation
        EDAC/ghes: Use standard kernel macros for page calculations
        EDAC: Remove misleading comment in struct edac_raw_error_desc
        EDAC/mc: Reduce indentation level in edac_mc_handle_error()
        EDAC/mc: Remove needless zero string termination
        EDAC/mc: Do not BUG_ON() in edac_mc_alloc()
        EDAC: Introduce an mci_for_each_dimm() iterator
        EDAC: Remove EDAC_DIMM_OFF() macro
        EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function
        EDAC/amd64: Get rid of the ECC disabled long message
        EDAC/ghes: Fix locking and memory barrier issues
        EDAC/amd64: Check for memory before fully initializing an instance
        EDAC/amd64: Use cached data when checking for ECC
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 752272f1
      Linus Torvalds authored
      Pull KVM updates from Paolo Bonzini:
       "ARM:
         - data abort report and injection
         - steal time support
         - GICv4 performance improvements
         - vgic ITS emulation fixes
         - simplify FWB handling
         - enable halt polling counters
         - make the emulated timer PREEMPT_RT compliant
      
        s390:
         - small fixes and cleanups
         - selftest improvements
         - yield improvements
      
        PPC:
         - add capability to tell userspace whether we can single-step the
           guest
         - improve the allocation of XIVE virtual processor IDs
         - rewrite interrupt synthesis code to deliver interrupts in virtual
           mode when appropriate.
         - minor cleanups and improvements.
      
        x86:
         - XSAVES support for AMD
         - more accurate report of nested guest TSC to the nested hypervisor
         - retpoline optimizations
         - support for nested 5-level page tables
         - PMU virtualization optimizations, and improved support for nested
           PMU virtualization
         - correct latching of INITs for nested virtualization
         - IOAPIC optimization
         - TSX_CTRL virtualization for more TAA happiness
         - improved allocation and flushing of SEV ASIDs
         - many bugfixes and cleanups"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
        kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
        KVM: x86: Grab KVM's srcu lock when setting nested state
        KVM: x86: Open code shared_msr_update() in its only caller
        KVM: Fix jump label out_free_* in kvm_init()
        KVM: x86: Remove a spurious export of a static function
        KVM: x86: create mmu/ subdirectory
        KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page
        KVM: x86: remove set but not used variable 'called'
        KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
        KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
        KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
        KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
        KVM: x86: do not modify masked bits of shared MSRs
        KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
        KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
        KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
        KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
        KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
        KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
        KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
        ...
    • Merge tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 3f3c8be9
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
      
       - a small series to remove the build constraint of Xen x86 MCE handling
         to 64-bit only
      
       - a bunch of minor cleanups
      
      * tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: Fix Kconfig indentation
        xen/mcelog: also allow building for 32-bit kernels
        xen/mcelog: add PPIN to record when available
        xen/mcelog: drop __MC_MSR_MCGCAP
        xen/gntdev: Use select for DMA_SHARED_BUFFER
        xen: mm: make xen_mm_init static
        xen: mm: include <xen/xen-ops.h> for missing declarations
    • Merge tag 'mips_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 2981dcf3
      Linus Torvalds authored
      Pull MIPS updates from Paul Burton:
       "The main MIPS changes for 5.5:
      
         - Atomics-related code sees some rework & cleanup, most notably
           allowing Loongson LL/SC errata workarounds to be more bulletproof &
           their correctness to be checked at build time.
      
         - Command line setup code is simplified somewhat, resolving various
           corner cases.
      
         - MIPS kernels can now be built with kcov code coverage support.
      
         - We can now build with CONFIG_FORTIFY_SOURCE=y.
      
         - Miscellaneous cleanups.
      
        And some platform specific changes:
      
         - We now disable some broken TLB functionality on certain Ingenic
           systems, and JZ4780 systems gain some devicetree nodes to support
           more devices.
      
         - Loongson support sees a number of cleanups, and we gain initial
           support for Loongson 3A R4 systems.
      
         - We gain support for MediaTek MT7688-based GARDENA Smart Gateway
           systems.
      
         - SGI IP27 (Origin 2*) see a number of fixes, cleanups &
           simplifications.
      
         - SGI IP30 (Octane) systems are now supported"
      
      * tag 'mips_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (107 commits)
        MIPS: SGI-IP27: Enable ethernet phy on second Origin 200 module
        MIPS: PCI: Fix fake subdevice ID for IOC3
        MIPS: Ingenic: Disable abandoned HPTLB function.
        MIPS: PCI: remember nasid changed by set interrupt affinity
        MIPS: SGI-IP27: Fix crash, when CPUs are disabled via nr_cpus parameter
        mips: add support for folded p4d page tables
        mips: drop __pXd_offset() macros that duplicate pXd_index() ones
        mips: fix build when "48 bits virtual memory" is enabled
        MIPS: math-emu: Reuse name array in debugfs_fpuemu()
        MIPS: allow building with kcov coverage
        MIPS: Loongson64: Drop setup_pcimap
        MIPS: Loongson2ef: Convert to early_printk_8250
        MIPS: Drop CPU_SUPPORTS_UNCACHED_ACCELERATED
        MIPS: Loongson{2ef, 32, 64} convert to generic fw cmdline
        MIPS: Drop pmon.h
        MIPS: Loongson: Unify LOONGSON3/LOONGSON64 Kconfig usage
        MIPS: Loongson: Rename LOONGSON1 to LOONGSON32
        MIPS: Loongson: Fix return value of loongson_hwmon_init
        MIPS: add support for SGI Octane (IP30)
        MIPS: PCI: make phys_to_dma/dma_to_phys for pci-xtalk-bridge common
        ...
    • Merge tag 'm68k-for-v5.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 5ef30d74
      Linus Torvalds authored
      Pull m68k updates from Geert Uytterhoeven:
      
       - Atari Falcon IDE platform driver conversion for module autoload
      
       - defconfig updates (including enablement of Amiga ICY I2C)
      
       - small fixes and cleanups
      
      * tag 'm68k-for-v5.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k/atari: Convert Falcon IDE drivers to platform drivers
        m68k: defconfig: Enable ICY I2C and LTC2990 on Amiga
        m68k: defconfig: Update defconfigs for v5.4-rc1
        m68k: q40: Fix info-leak in rtc_ioctl
        nubus: Remove cast to void pointer
    • Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 28fcb77b
      Linus Torvalds authored
      Pull RAS updates from Borislav Petkov:
      
       - Fully reworked thermal throttling notifications, there should be no
         more spamming of dmesg (Srinivas Pandruvada and Benjamin Berg)
      
       - More enablement for the Intel-compatible CPUs Zhaoxin (Tony W
         Wang-oc)
      
       - PPIN support for Icelake (Tony Luck)
      
      * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce/therm_throt: Optimize notifications of thermal throttle
        x86/mce: Add Xeon Icelake to list of CPUs that support PPIN
        x86/mce: Lower throttling MCE messages' priority to warning
        x86/mce: Add Zhaoxin LMCE support
        x86/mce: Add Zhaoxin CMCI support
        x86/mce: Add Zhaoxin MCE support
        x86/mce/amd: Make disable_err_thresholding() static
    • Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 63c2291f
      Linus Torvalds authored
      Pull x86 microcode updates from Borislav Petkov:
       "This converts the late loading method to load the microcode in
        parallel (vs sequentially currently). The patch remained in linux-next
        for the maximum amount of time so that any potential and hard to debug
        fallout be minimized.
      
        Now cloud folks have their milliseconds back but all the normal people
        should use early loading anyway :-)"
      
      * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode/intel: Issue the revision updated message only on the BSP
        x86/microcode: Update late microcode in parallel
        x86/microcode/amd: Fix two -Wunused-but-set-variable warnings
    • Merge tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · ea1f56fa
      Linus Torvalds authored
      Pull s390 updates from Vasily Gorbik:
      
       - Adjust PMU device drivers registration to avoid WARN_ON and a few other
         perf improvements.
      
       - Enhance tracing in vfio-ccw.
      
       - A few stack unwinder fixes and improvements, convert get_wchan custom
         stack unwinding to generic api usage.
      
       - Fixes for mm helpers issues uncovered with tests validating
         architecture page table helpers.
      
       - Fix noexec bit handling when hardware doesn't support it.
      
       - Fix memleak and unsigned value compared with zero bugs in crypto
         code. Minor code simplification.
      
       - Fix crash during kdump with kasan enabled kernel.
      
       - Switch bug and alternatives from asm to asm_inline to improve
         inlining decisions.
      
       - Use 'depends on cc-option' for MARCH and TUNE options in Kconfig, add
         z13s and z14 ZR1 to TUNE descriptions.
      
       - Minor head64.S simplification.
      
       - Fix physical to logical CPU map for SMT.
      
       - Several cleanups in qdio code.
      
       - Other minor cleanups and fixes all over the code.
      
      * tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
        s390/cpumf: Adjust registration of s390 PMU device drivers
        s390/smp: fix physical to logical CPU map for SMT
        s390/early: move access registers setup in C code
        s390/head64: remove unnecessary vdso_per_cpu_data setup
        s390/early: move control registers setup in C code
        s390/kasan: support memcpy_real with TRACE_IRQFLAGS
        s390/crypto: Fix unsigned variable compared with zero
        s390/pkey: use memdup_user() to simplify code
        s390/pkey: fix memory leak within _copy_apqns_from_user()
        s390/disassembler: don't hide instruction addresses
        s390/cpum_sf: Assign error value to err variable
        s390/cpum_sf: Replace function name in debug statements
        s390/cpum_sf: Use consistant debug print format for sampling
        s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
        s390: add error handling to perf_callchain_kernel
        s390: always inline current_stack_pointer()
        s390/mm: add mm_pxd_folded() checks to pxd_free()
        s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
        s390/mm: simplify page table helpers for large entries
        s390/mm: make pmd/pud_bad() report large entries as bad
        ...
  2. 25 Nov, 2019 13 commits
    • Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 4ba380f6
      Linus Torvalds authored
      Pull arm64 updates from Catalin Marinas:
       "Apart from the arm64-specific bits (core arch and perf, new arm64
        selftests), it touches the generic cow_user_page() (reviewed by
        Kirill) together with a macro for x86 to preserve the existing
        behaviour on this architecture.
      
        Summary:
      
         - On ARMv8 CPUs without hardware updates of the access flag, avoid
           failing cow_user_page() on PFN mappings if the pte is old. The
           patches introduce an arch_faults_on_old_pte() macro, defined as
           false on x86. When true, cow_user_page() makes the pte young before
           attempting __copy_from_user_inatomic().
      
         - Convert the synchronous exception handling paths in
           arch/arm64/kernel/entry.S to C.
      
         - FTRACE_WITH_REGS support for arm64.
      
         - ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
      
         - Several kselftest cases specific to arm64, together with a
           MAINTAINERS update for these files (moved to the ARM64 PORT entry).
      
         - Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
           instructions under certain conditions.
      
         - Workaround for Cortex-A57 and A72 errata where the CPU may
           speculatively execute an AT instruction and associate a VMID with
           the wrong guest page tables (corrupting the TLB).
      
         - Perf updates for arm64: additional PMU topologies on HiSilicon
           platforms, support for CCN-512 interconnect, AXI ID filtering in
           the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
      
         - GICv3 optimisation to avoid a heavy barrier when accessing the
           ICC_PMR_EL1 register.
      
         - ELF HWCAP documentation updates and clean-up.
      
         - SMC calling convention conduit code clean-up.
      
         - KASLR diagnostics printed during boot
      
         - NVIDIA Carmel CPU added to the KPTI whitelist
      
         - Some arm64 mm clean-ups: use generic free_initrd_mem(), remove
           stale macro, simplify calculation in __create_pgd_mapping(), typos.
      
         - Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
           endianness to help with allmodconfig"
      
      * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
        arm64: Kconfig: add a choice for endianness
        kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous"
        arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE
        MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry
        arm64: kaslr: Check command line before looking for a seed
        arm64: kaslr: Announce KASLR status on boot
        kselftest: arm64: fake_sigreturn_misaligned_sp
        kselftest: arm64: fake_sigreturn_bad_size
        kselftest: arm64: fake_sigreturn_duplicated_fpsimd
        kselftest: arm64: fake_sigreturn_missing_fpsimd
        kselftest: arm64: fake_sigreturn_bad_size_for_magic0
        kselftest: arm64: fake_sigreturn_bad_magic
        kselftest: arm64: add helper get_current_context
        kselftest: arm64: extend test_init functionalities
        kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
        kselftest: arm64: mangle_pstate_invalid_daif_bits
        kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
        kselftest: arm64: extend toplevel skeleton Makefile
        drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform
        arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
        ...
    • Merge tag 'linux-kselftest-5.5-rc1-kunit' of... · e25645b1
      Linus Torvalds authored
      Merge tag 'linux-kselftest-5.5-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest KUnit support from Shuah Khan:
       "This adds KUnit, a lightweight unit testing and mocking framework for
        the Linux kernel from Brendan Higgins.
      
        KUnit is not an end-to-end testing framework. It is currently
        supported on UML and sub-systems can write unit tests and run them in
        UML env. KUnit documentation is included in this update.
      
        In addition, this KUnit update adds 3 new KUnit tests:
      
         - proc sysctl test from Iurii Zaikin
      
         - the 'list' doubly linked list test from David Gow
      
         - ext4 tests for decoding extended timestamps from Iurii Zaikin
      
        In the future KUnit will be linked to Kselftest framework to provide a
        way to trigger KUnit tests from user-space"
      
      * tag 'linux-kselftest-5.5-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (23 commits)
        lib/list-test: add a test for the 'list' doubly linked list
        ext4: add kunit test for decoding extended timestamps
        Documentation: kunit: Fix verification command
        kunit: Fix '--build_dir' option
        kunit: fix failure to build without printk
        MAINTAINERS: add proc sysctl KUnit test to PROC SYSCTL section
        kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()
        MAINTAINERS: add entry for KUnit the unit testing framework
        Documentation: kunit: add documentation for KUnit
        kunit: defconfig: add defconfigs for building KUnit tests
        kunit: tool: add Python wrappers for running KUnit tests
        kunit: test: add tests for KUnit managed resources
        kunit: test: add the concept of assertions
        kunit: test: add tests for kunit test abort
        kunit: test: add support for test abort
        objtool: add kunit_try_catch_throw to the noreturn list
        kunit: test: add initial tests
        lib: enable building KUnit in lib/
        kunit: test: add the concept of expectations
        kunit: test: add assertion printing library
        ...
    • Merge tag 'linux-kselftest-5.5-rc1-fixes' of... · db7d2754
      Linus Torvalds authored
      Merge tag 'linux-kselftest-5.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fixes from Shuah Khan:
       "This consists of several fixes to tests and framework.
      
        Masami Hiramatsu fixed several tests to build and run correctly on arm
        and other 32bit architectures"
      
      * tag 'linux-kselftest-5.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: sync: Fix cast warnings on arm
        selftests: net: Fix printf format warnings on arm
        selftests: net: Use size_t and ssize_t for counting file size
        selftests: vm: Build/Run 64bit tests only on 64bit arch
        selftests: proc: Make va_max 1MB
        kselftest: Fix NULL INSTALL_PATH for TARGETS runlist
        selftests: Move kselftest_module.sh into kselftest/
        selftests: gen_kselftest_tar.sh: Do not clobber kselftest/
        selftests: breakpoints: Fix a typo of function name
        selftests: Fix O= and KBUILD_OUTPUT handling for relative paths
      db7d2754
    • Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · 1c1ff483
      Linus Torvalds authored
      Pull fsverity updates from Eric Biggers:
       "Expose the fs-verity bit through statx()"
      
      * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        docs: fs-verity: mention statx() support
        f2fs: support STATX_ATTR_VERITY
        ext4: support STATX_ATTR_VERITY
        statx: define STATX_ATTR_VERITY
        docs: fs-verity: document first supported kernel version
      1c1ff483
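
The statx() exposure described above can be sketched from user space as follows. This is a minimal illustration, not code from the merged series: the helper name `verity_status` is hypothetical, the fallback `#define` assumes the UAPI value of `STATX_ATTR_VERITY`, and it assumes a glibc that provides the `statx()` wrapper.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* statx(), struct statx, STATX_BASIC_STATS */

/* Fallback for userspace headers that predate the flag (assumed UAPI value). */
#ifndef STATX_ATTR_VERITY
#define STATX_ATTR_VERITY 0x00100000
#endif

/*
 * Hypothetical helper.
 * Returns  1 if the file reports the fs-verity attribute,
 *          0 if the bit is clear or the filesystem does not report it,
 *         -1 if statx() itself fails (errno is left set by statx()).
 */
int verity_status(const char *path)
{
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
		return -1;

	/* stx_attributes_mask says which attribute bits the filesystem supports. */
	if (!(stx.stx_attributes_mask & STATX_ATTR_VERITY))
		return 0;

	return (stx.stx_attributes & STATX_ATTR_VERITY) ? 1 : 0;
}
```

Checking `stx_attributes_mask` first distinguishes "not a verity file" from "this filesystem or kernel does not report the bit", although this sketch folds both into a return value of 0.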
    • Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · ea4b71bc
      Linus Torvalds authored
      Pull fscrypt updates from Eric Biggers:
      
       - Add the IV_INO_LBLK_64 encryption policy flag which modifies the
         encryption to be optimized for UFS inline encryption hardware.
      
       - For AES-128-CBC, use the crypto API's implementation of ESSIV (which
         was added in 5.4) rather than doing ESSIV manually.
      
       - A few other cleanups.
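
As a rough illustration of what the IV_INO_LBLK_64 flag implies: the merge message does not spell out the IV layout, so the sketch below relies on the layout given in the fscrypt documentation (inode number in bits 32-63, logical block number in bits 0-31, both limited to 32 bits). The helper name `iv_ino_lblk_64` is hypothetical and this is not the kernel implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack the inode number into bits 32-63 and the
 * logical block number into bits 0-31 of a 64-bit IV value.
 * Returns 0 on success, -1 if either input does not fit in 32 bits.
 */
int iv_ino_lblk_64(uint64_t ino, uint64_t lblk, uint64_t *iv)
{
	if (ino > UINT32_MAX || lblk > UINT32_MAX)
		return -1;

	*iv = (ino << 32) | lblk;
	return 0;
}
```

Because the IV depends only on the inode and block numbers, it fits in the 64-bit IV field that inline encryption hardware typically accepts, which is the optimization the flag targets.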
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        f2fs: add support for IV_INO_LBLK_64 encryption policies
        ext4: add support for IV_INO_LBLK_64 encryption policies
        fscrypt: add support for IV_INO_LBLK_64 policies
        fscrypt: avoid data race on fscrypt_mode::logged_impl_name
        docs: ioctl-number: document fscrypt ioctl numbers
        fscrypt: zeroize fscrypt_info before freeing
        fscrypt: remove struct fscrypt_ctx
        fscrypt: invoke crypto API for ESSIV handling
      ea4b71bc
    • Merge tag 'affs-for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · ae36607b
      Linus Torvalds authored
      Pull AFFS updates from David Sterba:
       "A minor bugfix and cleanup for AFFS"
      
      * tag 'affs-for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        affs: fix a memory leak in affs_remount
        affs: Replace binary semaphores with mutexes
      ae36607b
    • Merge tag 'for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 97d0bf96
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "User visible changes:
         - new block group profiles: RAID1 with 3 and 4 copies
             - RAID1 in btrfs has always had 2 copies; this adds support for
               3 and 4
             - this is an incompat feature (named RAID1C34)
             - the recommended use of RAID1C3 is as a replacement for the
               RAID6 profile on metadata, which gives more reliable
               resiliency against the loss of or damage to 2 devices
      
         - support for new checksums
             - per-filesystem, set at mkfs time
             - fast hash (crc32c successor): xxhash, 64bit digest
             - strong hashes (both 256bit): sha256 (slower, FIPS), blake2b
               (faster)
             - the blake2b module goes via the crypto tree, btrfs.ko has a
               soft dependency
      
         - speed up lseek, don't take inode locks unnecessarily, this can
           speed up parallel SEEK_CUR/SEEK_SET/SEEK_END by 80%
      
         - send:
             - allow clone operations within the same file
             - limit maximum number of sent clone references to avoid slow
               backref walking
      
         - error message improvements: device scan prints process name and PID
      
        Core changes:
         - cleanups
             - remove unique workqueue helpers, used to provide a way to avoid
               deadlocks in the workqueue code, now done in a simpler way
             - remove lots of indirect function calls in compression code
             - extent IO tree code moved out of extent_io.c
             - cleanup backup superblock handling at mount time
             - transaction life cycle documentation and cleanups
             - locking code cleanups, annotations and documentation
             - add more cold, const, pure function attributes
             - removal of unused or redundant struct members or variables
      
         - new tree-checker sanity tests
             - try to detect missing INODE_ITEM, cross-reference checks of
               DIR_ITEM, DIR_INDEX, INODE_REF, and XATTR_* items
      
         - remove own bio scheduling code (used to avoid checksum submissions
           being stuck behind other IO), replaced by cgroup controller-based
           code to allow better control and avoid priority inversions in cases
           where the custom and cgroup scheduling disagreed
      
        Fixes:
         - avoid getting stuck during cyclic writebacks
      
         - fix trimming of ranges crossing block group boundaries
      
         - fix rename exchange on subvolumes, all involved subvolumes need to
           be recorded in the transaction"
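
The resiliency claim for the new RAID1 profiles comes down to simple arithmetic, sketched below. Both function names are hypothetical, and the capacity estimate assumes equal-sized devices and ignores chunk allocation granularity, so treat it as an approximation rather than what btrfs will report.

```c
#include <stdint.h>

/* Device losses that a profile keeping `copies` full replicas can survive. */
unsigned int tolerated_device_losses(unsigned int copies)
{
	return copies > 0 ? copies - 1 : 0;
}

/*
 * Rough usable capacity: every byte is stored `copies` times, so usable
 * space is about raw capacity divided by the number of copies.
 */
uint64_t approx_usable_bytes(uint64_t raw_bytes, unsigned int copies)
{
	return copies > 0 ? raw_bytes / copies : 0;
}
```

With 3 copies the profile survives the loss of 2 devices, the same fault tolerance as RAID6, at the cost of storing each block three times.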
      
      * tag 'for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (137 commits)
        btrfs: drop bdev argument from submit_extent_page
        btrfs: remove extent_map::bdev
        btrfs: drop bio_set_dev where not needed
        btrfs: get bdev directly from fs_devices in submit_extent_page
        btrfs: record all roots for rename exchange on a subvol
        Btrfs: fix block group remaining RO forever after error during device replace
        btrfs: scrub: Don't check free space before marking a block group RO
        btrfs: change btrfs_fs_devices::rotating to bool
        btrfs: change btrfs_fs_devices::seeding to bool
        btrfs: rename btrfs_block_group_cache
        btrfs: block-group: Reuse the item key from caller of read_one_block_group()
        btrfs: block-group: Refactor btrfs_read_block_groups()
        btrfs: document extent buffer locking
        btrfs: access eb::blocking_writers according to ACCESS_ONCE policies
        btrfs: set blocking_writers directly, no increment or decrement
        btrfs: merge blocking_writers branches in btrfs_tree_read_lock
        btrfs: drop incompat bit for raid1c34 after last block group is gone
        btrfs: add incompat for raid1 with 3, 4 copies
        btrfs: add support for 4-copy replication (raid1c4)
        btrfs: add support for 3-copy replication (raid1c3)
        ...
      97d0bf96
    • Merge tag 'mtd/for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 1b88176b
      Linus Torvalds authored
      Pull MTD updates from Miquel Raynal:
       "MTD core:
         - drop inactive maintainers, update the repositories and add IRC
           channel
         - debugfs functions improvements
         - initialize more structure parameters
         - misc fixes reported by robots
      
        MTD devices:
         - spear_smi: Fixed Write Burst mode
         - new Intel IXP4xx flash probing hook
      
        Raw NAND core:
         - useless extra checks dropped
         - update the detection of the bad block markers position
      
        Raw NAND controller drivers:
         - Cadence: new driver
         - Brcmnand: support for flash-dma v0 + fixes
         - Denali: drop support for the legacy controller/chip DT representation
         - superfluous dev_err() calls removed
      
        SPI NOR core changes:
         - introduce 'struct spi_nor_controller_ops'
         - clean the Register Operations methods
         - use dev_dbg instead of dev_err for low level info
         - fix retlen handling in sst_write()
         - fix silent truncations in spi_nor_read and spi_nor_read_raw()
         - fix the clearing of QE bit on lock()/unlock()
         - rework the disabling of the block write protection
         - rework the Quad Enable methods
         - make sure nor->spimem and nor->controller_ops are mutually exclusive
         - set default Quad Enable method for ISSI flashes
         - add support for a few flashes
      
        SPI NOR controller drivers changes:
         - intel-spi:
            - support chips without software sequencer
            - add support for Intel Cannon Lake and Intel Comet Lake-H flashes
      
        CFI core changes:
         - code cleanups related to useless initializers and coding style issues
         - fix for a possible double free problem in cfi_cmdset_0002
         - improved HyperFlash error reporting and handling in cfi_cmdset_0002 core"
      
      * tag 'mtd/for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (73 commits)
        mtd: devices: fix mchp23k256 read and write
        mtd: no need to check return value of debugfs_create functions
        mtd: spi-nor: Set default Quad Enable method for ISSI flashes
        mtd: spi-nor: Add support for is25wp256
        mtd: spi-nor: Add support for w25q256jw
        mtd: spi-nor: Move condition to avoid a NULL check
        mtd: spi-nor: Make sure nor->spimem and nor->controller_ops are mutually exclusive
        mtd: spi-nor: Rename Quad Enable methods
        mtd: spi-nor: Merge spansion Quad Enable methods
        mtd: spi-nor: Rename CR_QUAD_EN_SPAN to SR2_QUAD_EN_BIT1
        mtd: spi-nor: Extend the SR Read Back test
        mtd: spi-nor: Rework the disabling of block write protection
        mtd: spi-nor: Fix clearing of QE bit on lock()/unlock()
        mtd: cfi_cmdset_0002: fix delayed error detection on HyperFlash
        mtd: cfi_cmdset_0002: only check errors when ready in cfi_check_err_status()
        mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
        mtd: cfi_cmdset_*: kill useless 'ret' variable initializers
        mtd: cfi_util: use DIV_ROUND_UP() in cfi_udelay()
        mtd: spi-nor: Print debug message when the read back test fails
        mtd: spi-nor: Check all the bits written, not just the BP ones
        ...
      1b88176b
    • Merge tag 'for-5.5/dm-changes' of... · eeee2827
      Linus Torvalds authored
      Merge tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper updates from Mike Snitzer:
      
       - Fix DM core to disallow stacking request-based DM on partitions.
      
       - Fix DM raid target to properly resync raidset even if bitmap needed
         additional pages.
      
       - Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
         IO and crypt workqueues.
      
       - Fix DM integrity metadata layout that was aligned on 128K boundary
         rather than the intended 4K boundary (removes 124K of wasted space
         for each metadata block).
      
       - Improve the DM thin, cache and clone targets to use spin_lock_irq
         rather than spin_lock_irqsave where possible.
      
       - Fix DM thin single thread performance that was lost due to needless
         workqueue wakeups.
      
       - Fix DM zoned target performance that was lost due to excessive
         backing device checks.
      
       - Add ability to trigger write failure with the DM dust test target.
      
       - Fix whitespace indentation in drivers/md/Kconfig.
      
       - Various small fixes and cleanups (e.g. use struct_size, fix
         uninitialized variable, variable renames, etc).
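
The "124K of wasted space" figure in the DM integrity item can be checked with a short alignment calculation. This sketch assumes a hypothetical metadata run that occupies exactly 4K; the helper name `alignment_padding` is illustrative and is not taken from dm-integrity.

```c
#include <stdint.h>

/* Padding needed to round `size` up to a multiple of `align` (align > 0). */
uint64_t alignment_padding(uint64_t size, uint64_t align)
{
	uint64_t rem = size % align;

	return rem ? align - rem : 0;
}
```

Under that assumption a 4K run padded to a 128K boundary wastes 128K - 4K = 124K, while the same run aligned to 4K wastes nothing.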
      
      * tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
        Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
        dm: Fix Kconfig indentation
        dm thin: wakeup worker only when deferred bios exist
        dm integrity: fix excessive alignment of metadata runs
        dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout
        dm zoned: reduce overhead of backing device checks
        dm dust: add limited write failure mode
        dm dust: change ret to r in dust_map_read and dust_map
        dm dust: change result vars to r
        dm cache: replace spin_lock_irqsave with spin_lock_irq
        dm bio prison: replace spin_lock_irqsave with spin_lock_irq
        dm thin: replace spin_lock_irqsave with spin_lock_irq
        dm clone: add bucket_lock_irq/bucket_unlock_irq helpers
        dm clone: replace spin_lock_irqsave with spin_lock_irq
        dm writecache: handle REQ_FUA
        dm writecache: fix uninitialized variable warning
        dm stripe: use struct_size() in kmalloc()
        dm raid: streamline rs_get_progress() and its raid_status() caller side
        dm raid: simplify rs_setup_recovery call chain
        dm raid: to ensure resynchronization, perform raid set grow in preresume
        ...
      eeee2827
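
The struct_size() cleanup mentioned in this pull replaces open-coded size arithmetic for structures that end in a flexible array. A user-space sketch of the same idea follows; `struct stripe_set` and `flex_size` are hypothetical names, and the overflow builtins assume GCC or Clang. Like the kernel helper, it saturates to `SIZE_MAX` on overflow so that the following allocation fails instead of being undersized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with a trailing flexible array member. */
struct stripe_set {
	unsigned int count;
	uint64_t offsets[];
};

/*
 * Bytes needed for a header of `header` bytes followed by `n` elements of
 * `elem` bytes each. Returns SIZE_MAX if the computation would overflow.
 */
size_t flex_size(size_t header, size_t elem, size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, n, &bytes) ||
	    __builtin_add_overflow(bytes, header, &bytes))
		return SIZE_MAX;

	return bytes;
}
```

A caller would pass the result straight to the allocator, for example `malloc(flex_size(sizeof(struct stripe_set), sizeof(uint64_t), n))`, and treat a NULL return as the failure path.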
    • Merge tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block · 7e5192b9
      Linus Torvalds authored
      Pull disk revalidation updates from Jens Axboe:
       "This continues the work that Jan Kara started to thoroughly clean up
        and consolidate how we handle rescans and revalidations"
      
      * tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block:
        block: move clearing bd_invalidated into check_disk_size_change
        block: remove (__)blkdev_reread_part as an exported API
        block: fix bdev_disk_changed for non-partitioned devices
        block: move rescan_partitions to fs/block_dev.c
        block: merge invalidate_partitions into rescan_partitions
        block: refactor rescan_partitions
      7e5192b9
    • Merge tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block · 464a47f4
      Linus Torvalds authored
      Pull zoned block device update from Jens Axboe:
       "Enhancements and improvements to the zoned device support"
      
      * tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block:
        scsi: sd_zbc: Remove set but not used variable 'buflen'
        block: rework zone reporting
        scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
        null_blk: Add zone_nr_conv to features
        null_blk: clean up report zones
        null_blk: clean up the block device operations
        block: Remove partition support for zoned block devices
        block: Simplify report zones execution
        block: cleanup the !zoned case in blk_revalidate_disk_zones
        block: Enhance blk_revalidate_disk_zones()
      464a47f4
    • Merge tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block · 323264ee
      Linus Torvalds authored
      Pull additional block driver updates from Jens Axboe:
       "Here's another block driver update, done to avoid conflicts with the
        zoned changes coming next.
      
        This contains:
      
         - Prepare SCSI sd for zone open/close/finish support
      
         - Small NVMe pull request
              - hwmon support (Akinobu)
              - add new co-maintainer (Christoph)
              - work-around for a discard issue on non-conformant drives
                (Eduard)
      
         - Small nbd leak fix"
      
      * tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block:
        nbd: prevent memory leak
        nvme: hwmon: add quirk to avoid changing temperature threshold
        nvme: hwmon: provide temperature min and max values for each sensor
        nvmet: add another maintainer
        nvme: Discard workaround for non-conformant devices
        nvme: Add hardware monitoring support
        scsi: sd_zbc: add zone open, close, and finish support
      323264ee
    • Merge tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block · 2d539430
      Linus Torvalds authored
      Pull block driver updates from Jens Axboe:
       "Here are the main block driver updates for 5.5. Nothing major in here,
        mostly just fixes. This contains:
      
         - a set of bcache changes via Coly
      
         - MD changes from Song
      
         - loop unmap write-zeroes fix (Darrick)
      
         - spelling fixes (Geert)
      
         - zoned additions and cleanups to null_blk/dm (Ajay)
      
         - allow null_blk online submit queue changes (Bart)
      
         - NVMe changes via Keith, nothing major here either"
      
      * tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits)
        Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()"
        drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
        drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
        bcache: don't export symbols
        bcache: remove the extra cflags for request.o
        bcache: at least try to shrink 1 node in bch_mca_scan()
        bcache: add idle_max_writeback_rate sysfs interface
        bcache: add code comments in bch_btree_leaf_dirty()
        bcache: fix deadlock in bcache_allocator
        bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()
        bcache: deleted code comments for dead code in bch_data_insert_keys()
        bcache: add more accurate error messages in read_super()
        bcache: fix static checker warning in bcache_device_free()
        bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
        bcache: fix fifo index swapping condition in journal_pin_cmp()
        md/raid10: prevent access of uninitialized resync_pages offset
        md: avoid invalid memory access for array sb->dev_roles
        md/raid1: avoid soft lockup under high load
        null_blk: add zone open, close, and finish support
        dm: add zone open, close and finish support
        ...
      2d539430