1. 25 May, 2018 1 commit
  2. 20 May, 2018 14 commits
    • Linux 4.17-rc6 · 771c577c
      Linus Torvalds authored
      771c577c
    • Merge branch 'parisc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 6fd5a36a
      Linus Torvalds authored
      Pull parisc fixlets from Helge Deller:
       "Three small section mismatch fixes, one of them was found by 0-day
        test infrastructure"
      
      * 'parisc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Move ccio_cujo20_fixup() into init section
        parisc: Move setup_profiling_timer() out of init section
        parisc: Move find_pa_parent_type() out of init section
      6fd5a36a
    • Merge tag 'for-4.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · e5e03ad9
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "We've accumulated some fixes during the last week, some of them were
        in the works for a longer time but there are some newer ones too.
      
        Most of the fixes have a reproducer and fix user visible problems,
        also candidates for stable kernels. They IMHO qualify for a late rc,
        though I did not expect that many"
      
      * tag 'for-4.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix crash when trying to resume balance without the resume flag
        btrfs: Fix delalloc inodes invalidation during transaction abort
        btrfs: Split btrfs_del_delalloc_inode into 2 functions
        btrfs: fix reading stale metadata blocks after degraded raid1 mounts
        btrfs: property: Set incompat flag if lzo/zstd compression is set
        Btrfs: fix duplicate extents after fsync of file with prealloc extents
        Btrfs: fix xattr loss after power failure
        Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting
      e5e03ad9
    • Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · 132ce5d4
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
      
       - Łukasz Stelmach spotted a couple of issues with the decompressor.
      
       - a couple of kdump fixes found while testing kdump
      
       - replace some perl with shell code
      
       - resolve SIGFPE breakage
      
       - kprobes fixes
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: fix kill( ,SIGFPE) breakage
        ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
        ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
        ARM: 8770/1: kprobes: Prohibit probing on optimized_callback
        ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
        ARM: replace unnecessary perl with sed and the shell $(( )) operator
        ARM: kexec: record parent context registers for non-crash CPUs
        ARM: kexec: fix kdump register saving on panic()
        ARM: 8758/1: decompressor: restore r1 and r2 just before jumping to the kernel
        ARM: 8753/1: decompressor: add a missing parameter to the addruart macro
      132ce5d4
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8a6bd2f4
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "An unfortunately larger set of fixes, but a large portion is
        selftests:
      
         - Fix the missing clusterid initialization for x2apic cluster
           management which caused boot failures due to IPIs being sent to the
           wrong cluster
      
         - Drop TS_COMPAT when a 64bit executable is exec()'ed from a compat
           task
      
         - Wrap access to __supported_pte_mask in __startup_64() where clang
           compile fails due to a non PC relative access being generated.
      
         - Two fixes for 5 level paging fallout in the decompressor:
      
            - Handle GOT correctly for paging_prepare() and
              cleanup_trampoline()
      
            - Fix the page table handling in cleanup_trampoline() to avoid
              page table corruption.
      
         - Stop special casing protection key 0 as this is inconsistent with
           the manpage and also inconsistent with the allocation map handling.
      
         - Override the protection key when moving away from PROT_EXEC to
           prevent inaccessible memory.
      
         - Fix and update the protection key selftests to address breakage and
           to cover the above issue
      
         - Add a MOV SS self test"
      
      [ Part of the x86 fixes were in the earlier core pull due to dependencies ]
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
        x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
        x86/apic/x2apic: Initialize cluster ID properly
        x86/boot/compressed/64: Fix moving page table out of trampoline memory
        x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
        x86/pkeys: Do not special case protection key 0
        x86/pkeys/selftests: Add a test for pkey 0
        x86/pkeys/selftests: Save off 'prot' for allocations
        x86/pkeys/selftests: Fix pointer math
        x86/pkeys: Override pkey when moving away from PROT_EXEC
        x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
        x86/pkeys/selftests: Add PROT_EXEC test
        x86/pkeys/selftests: Factor out "instruction page"
        x86/pkeys/selftests: Allow faults on unknown keys
        x86/pkeys/selftests: Avoid printf-in-signal deadlocks
        x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
        x86/pkeys/selftests: Stop using assert()
        x86/pkeys/selftests: Give better unexpected fault error messages
        x86/selftests: Add mov_to_ss test
        x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
        x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
        ...
      8a6bd2f4
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b9aad922
      Linus Torvalds authored
      Pull UP timer fix from Thomas Gleixner:
       "Work around the for_each_cpu() oddity on UP kernels in the tick
        broadcast code which causes boot failures because the CPU0 bit is
        always reported as set independent of the cpumask content"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tick/broadcast: Use for_each_cpu() specially on UP kernels
      b9aad922
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 441cab96
      Linus Torvalds authored
      Pull scheduler fixlets from Thomas Gleixner:
       "Three trivial fixlets for the scheduler:
      
         - move print_rt_rq() and print_dl_rq() declarations to the right
           place
      
         - make grub_reclaim() static
      
         - fix the bogus documentation reference in Kconfig"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix documentation file path
        sched/deadline: Make the grub_reclaim() function static
        sched/debug: Move the print_rt_rq() and print_dl_rq() declarations to kernel/sched/sched.h
      441cab96
    • Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 74cce52f
      Linus Torvalds authored
      Pull RAS fix from Thomas Gleixner:
       "Fix a regression in the new AMD SMCA code which issues an SMP function
        call from the early interrupt disabled region of CPU hotplug. To avoid
        that, use cached block addresses which can be used directly"
      
      * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/MCE/AMD: Cache SMCA MISC block addresses
      74cce52f
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 95bcce4d
      Linus Torvalds authored
      Pull perf tooling fixes from Thomas Gleixner:
      
       - fix segfault when processing unknown threads in cs-etm
      
       - fix "perf test inet_pton" on s390 failing due to missing inline
      
       - display all available events on 'perf annotate --stdio'
      
       - add missing newline when parsing an empty BPF program
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tools: Add missing newline when parsing empty BPF proggie
        perf cs-etm: Remove redundant space
        perf cs-etm: Support unknown_thread in cs_etm_auxtrace
        perf annotate: Display all available events on --stdio
        perf test: "probe libc's inet_pton" fails on s390 due to missing inline
      95bcce4d
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b65f455
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "Two fixes to address shortcomings of the rwsem/percpu-rwsem lock
        debugging code which emits false positive warnings when the rwsem is
        anonymously locked and unlocked"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN
        locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag
      4b65f455
    • Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 056ad121
      Linus Torvalds authored
      Pull EFI fixes from Thomas Gleixner:
      
       - Use explicitly sized type for the romimage pointer in the 32bit EFI
         protocol struct so a 64bit kernel does not expand it to 64bit. Ditto
         for the 64bit struct to avoid the reverse issue on 32bit kernels.
      
       - Handle randomized text offset correctly in the ARM64 EFI stub to avoid
         unaligned data resulting in stack corruption and other hard to
         diagnose wreckage.
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/libstub/arm64: Handle randomized TEXT_OFFSET
        efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
      056ad121
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 583dbad3
      Linus Torvalds authored
      Pull core fixes from Thomas Gleixner:
      
       - Unbreak the BPF compilation which got broken by the unconditional
         requirement of asm-goto, which is not supported by clang.
      
       - Prevent probing on exception masking instructions in uprobes and
         kprobes to avoid the issues of the delayed exceptions instead of
         having an ugly workaround.
      
       - Prevent a double free_page() in the error path of do_kexec_load()
      
       - A set of objtool updates addressing various issues mostly related to
         switch tables and the noreturn detection for recursive sibling calls
      
       - Header sync for tools.
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Detect RIP-relative switch table references, part 2
        objtool: Detect RIP-relative switch table references
        objtool: Support GCC 8 switch tables
        objtool: Support GCC 8's cold subfunctions
        objtool: Fix "noreturn" detection for recursive sibling calls
        objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h
        x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation
        uprobes/x86: Prohibit probing on MOV SS instruction
        kprobes/x86: Prohibit probing on exception masking instructions
        x86/kexec: Avoid double free_page() upon do_kexec_load() failure
      583dbad3
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 203ec2fe
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A handful of fixes. I've been queuing them up a bit too long so the
        list is longer than it otherwise would have been spread out across a
        few -rcs.
      
        In general, it's a scattering of fixes across several platforms,
        nothing truly serious enough to point out.
      
        There's a slightly larger batch of them for the Davinci platforms due
        to work to bring them back to life after some time, so there's a
        handful of regressions, some of them going back very far, others more
        recent.
      
        There's also a few patches fixing DT on Renesas platforms since they
        changed some bindings without remaining backwards compatible,
        splitting up describing LVDS as a proper bridge instead of having it
        as part of the display unit.
      
        We could push for them to be backwards compatible with old device
        trees, but it's likely to regress eventually if nobody's actually
        using said compatibility"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (36 commits)
        ARM: davinci: board-dm646x-evm: set VPIF capture card name
        ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF
        ARM: davinci: dm646x: fix timer interrupt generation
        ARM: keystone: fix platform_domain_notifier array overrun
        arm64: dts: exynos: Fix interrupt type for I2S1 device on Exynos5433
        ARM: dts: imx51-zii-rdu1: fix touchscreen bindings
        firmware: arm_scmi: Use after free in scmi_create_protocol_device()
        ARM: dts: cygnus: fix irq type for arm global timer
        Revert "ARM: dts: logicpd-som-lv: Fix pinmux controller references"
        tee: check shm references are consistent in offset/size
        tee: shm: fix use-after-free via temporarily dropped reference
        ARM: dts: imx7s: Pass the 'fsl,sec-era' property
        ARM: dts: tegra20: Revert "Fix ULPI regression on Tegra20"
        ARM: dts: correct missing "compatible" entry for ti81xx SoCs
        ARM: OMAP1: ams-delta: fix deferred_fiq handler
        arm64: tegra: Make BCM89610 PHY interrupt as active low
        ARM: davinci: fix GPIO lookup for I2C
        ARM: dts: logicpd-som-lv: Fix pinmux controller references
        ARM: dts: logicpd-som-lv: Fix Audio Mute
        ARM: dts: logicpd-som-lv: Fix WL127x Startup Issues
        ...
      203ec2fe
    • Merge tag 'tegra-for-4.17-fixes-2' of... · 709f490d
      Olof Johansson authored
      Merge tag 'tegra-for-4.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into fixes
      
      arm64: tegra: Device tree fixes for v4.17
      
      This contains a one-line update to the device tree of the Tegra186 P3310
      processor module, fixing the polarity of the PHY interrupt. Originally,
      this was queued to go into v4.18, but the PHY ID matching patch has now
      found its way into v4.17-rc5, which means that the PHY driver will know
      how to identify the PHY on this board and try to use the interrupt. This
      will unfortunately cause networking to break on P3310, hence why I think
      this should go into v4.17.
      
      * tag 'tegra-for-4.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
        arm64: tegra: Make BCM89610 PHY interrupt as active low
      Signed-off-by: Olof Johansson <olof@lixom.net>
      709f490d
  3. 19 May, 2018 25 commits
    • ARM: fix kill( ,SIGFPE) breakage · 92d44a42
      Russell King authored
      Commit 7771c664 ("signal/arm: Document conflicts with SI_USER and
      SIGFPE") broke the siginfo structure for userspace triggered signals,
      causing the strace testsuite to regress.  Fix this by eliminating
      the FPE_FIXME definition (which is at the root of the breakage) and
      use FPE_FLTINV instead for the case where the hardware appears to be
      reporting nonsense.
      
      Fixes: 7771c664 ("signal/arm: Document conflicts with SI_USER and SIGFPE")
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      92d44a42
    • Merge tag 'dmaengine-fix-4.17-rc6' of git://git.infradead.org/users/vkoul/slave-dma · 0b449a44
      Linus Torvalds authored
      Pull dmaengine fix from Vinod Koul:
      
       - qcom bam runtime_pm fix
      
       - email update for Vinod
      
      * tag 'dmaengine-fix-4.17-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: qcom: bam_dma: check if the runtime pm enabled
        dmaengine: Update email address for Vinod
      0b449a44
    • mmap: relax file size limit for regular files · 423913ad
      Linus Torvalds authored
      Commit be83bbf8 ("mmap: introduce sane default mmap limits") was
      introduced to catch problems in various ad-hoc character device drivers
      doing mmap and getting the size limits wrong.  In the process, it used
      "known good" limits for the normal cases of mapping regular files and
      block device drivers.
      
      It turns out that the "s_maxbytes" limit was less "known good" than I
      thought.  In particular, /proc doesn't set it, but exposes one regular
      file to mmap: /proc/vmcore.  As a result, that file got limited to the
      default MAX_INT s_maxbytes value.
      
      This went unnoticed for a while, because apparently the only thing that
      needs it is the s390 kernel zfcpdump, but there might be other tools
      that use this too.
      
      Vasily suggested just changing s_maxbytes for all of /proc, which isn't
      wrong, but makes me nervous at this stage.  So instead, just make the
      new mmap limit always be MAX_LFS_FILESIZE for regular files, which won't
      affect anything else.  It wasn't the regular file case I was worried
      about.
      
      I'd really prefer for maxsize to have been per-inode, but that is not
      how things are today.
      
      Fixes: be83bbf8 ("mmap: introduce sane default mmap limits")
      Reported-by: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      423913ad
    • x86/MCE/AMD: Cache SMCA MISC block addresses · 78ce2410
      Borislav Petkov authored
      ... into a global, two-dimensional array and service subsequent reads from
      that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs
      disabled).
      
      In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage
      of the bank->blocks pointer.
      
      Fixes: 27bd5950 ("x86/mce/AMD: Get address from already initialized block")
      Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yazen Ghannam <yazen.ghannam@amd.com>
      Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
      78ce2410
    • ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions · 0d73c3f8
      Masami Hiramatsu authored
      Since do_undefinstr() uses get_user to get the undefined
      instruction, it can be called before kprobes processes
      recursive check. This can cause an infinite recursive
      exception.
      Prohibit probing on get_user functions.
      
      Fixes: 24ba613c ("ARM kprobes: core code")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      0d73c3f8
    • ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr · eb0146da
      Masami Hiramatsu authored
      Prohibit kprobes on do_undefinstr because kprobes on
      arm is implemented by undefined instruction. This means
      if we probe do_undefinstr(), it can cause infinite
      recursive exception.
      
      Fixes: 24ba613c ("ARM kprobes: core code")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      eb0146da
    • ARM: 8770/1: kprobes: Prohibit probing on optimized_callback · 70948c05
      Masami Hiramatsu authored
      Prohibit probing on optimized_callback() because
      it is called from kprobes itself. If we put a kprobes
      on it, that will cause a recursive call loop.
      Mark it NOKPROBE_SYMBOL.
      
      Fixes: 0dc016db ("ARM: kprobes: enable OPTPROBES for ARM 32")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      70948c05
    • ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed · 69af7e23
      Masami Hiramatsu authored
      Since get_kprobe_ctlblk() uses smp_processor_id() to access
      per-cpu variable, it hits smp_processor_id sanity check as below.
      
      [    7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
      [    7.007859] caller is debug_smp_processor_id+0x20/0x24
      [    7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1
      [    7.008890] Hardware name: Generic DT based system
      [    7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24)
      [    7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98)
      [    7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c)
      [    7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24)
      [    7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4)
      [    7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0)
      
      To fix this issue, call get_kprobe_ctlblk() right after
      irq-disabled since that disables preemption.
      
      Fixes: 0dc016db ("ARM: kprobes: enable OPTPROBES for ARM 32")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      69af7e23
    • ARM: replace unnecessary perl with sed and the shell $(( )) operator · 6cea14f5
      Russell King authored
      You can build a kernel in a cross compiling environment that doesn't
      have perl in the $PATH. Commit 429f7a06 broke that for 32 bit
      ARM. Fix it.
      
      As reported by Stephen Rothwell, it appears that the symbols can be
      either part of the BSS section or absolute symbols depending on the
      binutils version.  When they're an absolute symbol, the $(( ))
      operator errors out and the build fails.  Fix this as well.
      
      Fixes: 429f7a06 ("ARM: decompressor: fix BSS size calculation")
      Reported-by: Rob Landley <rob@landley.net>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Rob Landley <rob@landley.net>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      6cea14f5
    • ARM: kexec: record parent context registers for non-crash CPUs · 1c37963b
      Russell King authored
      How we got to machine_crash_nonpanic_core() (iow, from an IPI, etc) is
      not interesting for debugging a crash.  The more interesting context
      is the parent context prior to the IPI being received.
      
      Record the parent context register state rather than the register state
      in machine_crash_nonpanic_core(), which is more relevant to the failing
      condition.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      1c37963b
    • ARM: kexec: fix kdump register saving on panic() · 2d7b3c64
      Russell King authored
      When a panic() occurs, the kexec code uses smp_send_stop() to stop
      the other CPUs, but this results in the CPU register state not being
      saved, and gdb is unable to inspect the state of other CPUs.
      
      Commit 0ee59413 ("x86/panic: replace smp_send_stop() with kdump
      friendly version in panic path") addressed the issue on x86, but
      ignored other architectures.  Address the issue on ARM by splitting
      out the crash stop implementation to crash_smp_send_stop() and
      adding the necessary protection.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      2d7b3c64
    • ARM: 8758/1: decompressor: restore r1 and r2 just before jumping to the kernel · f2ae9de0
      Łukasz Stelmach authored
      The hypervisor setup before __enter_kernel destroys the value
      stored in r1. The value needs to be restored just before the jump.
      
      Fixes: 6b52f7bd ("ARM: hyp-stub: Use r1 for the soft-restart address")
      Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      f2ae9de0
    • ARM: 8753/1: decompressor: add a missing parameter to the addruart macro · e07e3c33
      Łukasz Stelmach authored
      In commit 639da5ee ("ARM: add an extra temp register to the low
      level debugging addruart macro") an additional temporary register was
      added to the addruart macro, but the decompressor code wasn't updated.
      
      Fixes: 639da5ee ("ARM: add an extra temp register to the low level debugging addruart macro")
      Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      e07e3c33
    • x86/mm: Drop TS_COMPAT on 64-bit exec() syscall · acf46020
      Dmitry Safonov authored
      The x86 mmap() code selects the mmap base for an allocation depending on
      the bitness of the syscall. For 64bit syscalls it selects mm->mmap_base and
      for 32bit mm->mmap_compat_base.
      
      exec() calls mmap() which in turn uses in_compat_syscall() to check whether
      the mapping is for a 32bit or a 64bit task. The decision is made on the
      following criteria:
      
        ia32    child->thread.status & TS_COMPAT
         x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
        ia64    !ia32 && !x32
      
      __set_personality_x32() was dropping TS_COMPAT flag, but
      set_personality_64bit() has kept compat syscall flag making
      in_compat_syscall() return true during the first exec() syscall.
      
      This has user-visible effects, as mentioned by Alexey:
      1) It breaks ASAN
      $ gcc -fsanitize=address wrap.c -o wrap-asan
      $ ./wrap32 ./wrap-asan true
      ==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
      ==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
      ==1217==Process memory map follows:
              0x000000400000-0x000000401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
              0x000000600000-0x000000601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
              0x000000601000-0x000000602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
              0x0000f7dbd000-0x0000f7de2000   /lib64/ld-2.27.so
              0x0000f7fe2000-0x0000f7fe3000   /lib64/ld-2.27.so
              0x0000f7fe3000-0x0000f7fe4000   /lib64/ld-2.27.so
              0x0000f7fe4000-0x0000f7fe5000
              0x7fed9abff000-0x7fed9af54000
              0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
      [snip]
      
      2) It doesn't seem to be great for security if an attacker always knows
      that ld.so is going to be mapped into the first 4GB in this case
      (the same thing happens for PIEs as well).
      
      The testcase:
      $ cat wrap.c
      #include <unistd.h>

      int main(int argc, char *argv[]) {
        execvp(argv[1], &argv[1]);
        return 127;
      }
      
      $ gcc wrap.c -o wrap
      $ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
      AT_BASE:         0x7f63b8309000
      AT_BASE:         0x7faec143c000
      AT_BASE:         0x7fbdb25fa000
      
      $ gcc -m32 wrap.c -o wrap32
      $ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
      AT_BASE:         0xf7eff000
      AT_BASE:         0xf7cee000
      AT_BASE:         0x7f8b9774e000
      
      Fixes: 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
      Fixes: ada26481 ("x86/mm: Make in_compat_syscall() work during exec")
      Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
      Bisected-by: Alexander Monakov <amonakov@ispras.ru>
      Investigated-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Alexander Monakov <amonakov@ispras.ru>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com
      acf46020
    • objtool: Detect RIP-relative switch table references, part 2 · 7dec80cc
      Josh Poimboeuf authored
      With the following commit:
      
        fd35c88b ("objtool: Support GCC 8 switch tables")
      
      I added a "can't find switch jump table" warning, to stop covering up
      silent failures if add_switch_table() can't find anything.
      
      That warning found yet another bug in the objtool switch table detection
      logic.  For cases 1 and 2 (as described in the comments of
      find_switch_table()), the find_symbol_containing() check doesn't adjust
      the offset for RIP-relative switch jumps.
      
      Incidentally, this bug was already fixed for case 3 with:
      
        6f5ec299 ("objtool: Detect RIP-relative switch table references")
      
      However, that commit missed the fix for cases 1 and 2.
      
      The different cases are now starting to look more and more alike.  So
      fix the bug by consolidating them into a single case, by checking the
      original dynamic jump instruction in the case 3 loop.
      
      This also simplifies the code and makes it more robust against future
      switch table detection issues -- of which I'm sure there will be many...
      
      Switch table detection has been the most fragile area of objtool, by
      far.  I long for the day when we'll have a GCC plugin for annotating
      switch tables.  Linus asked me to delay such a plugin due to the
      flakiness of the plugin infrastructure in older versions of GCC, so this
      rickety code is what we're stuck with for now.  At least the code is now
      a little simpler than it was.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/f400541613d45689086329432f3095119ffbc328.1526674218.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7dec80cc
    • Mark Rutland's avatar
      efi/libstub/arm64: Handle randomized TEXT_OFFSET · 4f74d72a
      Mark Rutland authored
      When CONFIG_RANDOMIZE_TEXT_OFFSET=y, TEXT_OFFSET is an arbitrary
      multiple of PAGE_SIZE in the interval [0, 2MB).
      
      The EFI stub does not account for the potential misalignment of
      TEXT_OFFSET relative to EFI_KIMG_ALIGN, and produces a randomized
      physical offset which is always a round multiple of EFI_KIMG_ALIGN.
      This may cause statically allocated objects whose alignment exceeds
      PAGE_SIZE to appear misaligned in memory. This has been observed to
      result in spurious stack overflow reports and failure to make use of
      the IRQ stacks, and theoretically could result in a number of other
      issues.
      
      We can OR in the low bits of TEXT_OFFSET to ensure that we have the
      necessary offset (and hence preserve the misalignment of TEXT_OFFSET
      relative to EFI_KIMG_ALIGN), so let's do that.
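The arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions, not the stub's actual code: the constants are example values (the real PAGE_SIZE, EFI_KIMG_ALIGN and randomization window depend on the configuration), and the function name and the seed parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Example values only; the real ones depend on the kernel configuration. */
#define EXAMPLE_PAGE_SIZE	0x1000u		/* 4 KiB pages */
#define EXAMPLE_KIMG_ALIGN	0x10000u	/* stands in for EFI_KIMG_ALIGN */
#define EXAMPLE_WINDOW		0x200000u	/* 2 MiB randomization window */

/*
 * Pick a physical offset inside the window from a random seed. The
 * random part is always a multiple of EXAMPLE_KIMG_ALIGN; ORing in the
 * low bits of text_offset keeps text_offset's misalignment relative to
 * EXAMPLE_KIMG_ALIGN in the final offset.
 */
uint32_t example_image_offset(uint32_t seed, uint32_t text_offset)
{
	uint32_t mask = (EXAMPLE_WINDOW - 1) & ~(EXAMPLE_KIMG_ALIGN - 1);
	uint32_t offset = seed & mask;

	/* text_offset is expected to be a multiple of the page size. */
	assert(text_offset % EXAMPLE_PAGE_SIZE == 0);

	offset |= text_offset % EXAMPLE_KIMG_ALIGN;
	return offset;
}
```

Because the random part has its low bits clear, the OR behaves like an addition here: the result modulo EXAMPLE_KIMG_ALIGN always equals text_offset modulo EXAMPLE_KIMG_ALIGN.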
      Reported-by: Kim Phillips <kim.phillips@arm.com>
      Tested-by: Kim Phillips <kim.phillips@arm.com>
      [ardb: clarify comment and commit log, drop unneeded parens]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 6f26b367 ("arm64: kaslr: increase randomization granularity")
      Link: http://lkml.kernel.org/r/20180518140841.9731-2-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4f74d72a
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 73fcb1a3
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "10 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        hfsplus: stop workqueue when fill_super() failed
        mm: don't allow deferred pages with NEED_PER_CPU_KM
        MAINTAINERS: add Q: entry to kselftest for patchwork project
        radix tree: fix multi-order iteration race
        radix tree test suite: multi-order iteration race
        radix tree test suite: add item_delete_rcu()
        radix tree test suite: fix compilation issue
        radix tree test suite: fix mapshift build target
        include/linux/mm.h: add new inline function vmf_error()
        lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly
      73fcb1a3
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v4.17-3' of git://git.infradead.org/linux-platform-drivers-x86 · 10a2f874
      Linus Torvalds authored
      Pull x86 platform driver fix from Darren Hart:
       "Remove the last of the "select DELL_SMBIOS" references in the Kconfig"
      
      * tag 'platform-drivers-x86-v4.17-3' of git://git.infradead.org/linux-platform-drivers-x86:
        platform/x86: DELL_WMI use depends on instead of select for DELL_SMBIOS
      10a2f874
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · f65cfecf
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
      
       - a modified revert of a patch that made new choices come out for a
         couple stm32 clk drivers that really always need to be there when
         that particular machine is compiled in
      
       - boot fix on i.MX for Stefan who noticed odd behavior from the
         critical flag patch that came in during the merge window
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: stm32: fix: stm32 clock drivers are not compiled by default
        clk: imx6ull: use OSC clock during AXI rate change
      f65cfecf
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 6d16db00
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "A bunch of driver bugfixes and a MAINTAINERS addition"
      
      * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        MAINTAINERS: add entry for STM32 I2C driver
        i2c: viperboard: return message count on master_xfer success
        i2c: pmcmsp: fix error return from master_xfer
        i2c: pmcmsp: return message count on master_xfer success
        i2c: designware: fix poll-after-enable regression
        eeprom: at24: fix retrieving the at24_chip_data structure
        i2c: core: ACPI: Log device not acking errors at dbg loglevel
        i2c: core: ACPI: Improve OpRegion read errors
      6d16db00
    • Tetsuo Handa's avatar
      hfsplus: stop workqueue when fill_super() failed · 66072c29
      Tetsuo Handa authored
      syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1].  This
      is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
      
      As far as I can see, it is hfsplus_mark_mdb_dirty() from
      hfsplus_new_inode() in hfsplus_fill_super() that calls
      queue_delayed_work().  Therefore, I assume that hfsplus_new_inode() does
      not fail if queue_delayed_work() was called, and the out_put_hidden_dir
      label is the appropriate location to call cancel_delayed_work_sync().
      
      [1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
      
      Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66072c29
    • Pavel Tatashin's avatar
      mm: don't allow deferred pages with NEED_PER_CPU_KM · ab1e8d89
      Pavel Tatashin authored
      It is unsafe to do virtual to physical translations before mm_init() is
      called if struct page is needed in order to determine the memory section
      number (see SECTION_IN_PAGE_FLAGS).  This is because only in mm_init()
      we initialize struct pages for all the allocated memory when deferred
      struct pages are used.
      
      My recent fix in commit c9e97a19 ("mm: initialize pages on demand
      during boot") exposed this problem, because it greatly reduced the
      number of pages that are initialized before mm_init(), but the problem
      existed even before my fix, as Fengguang Wu found.
      
      Below is a more detailed explanation of the problem.
      
      We initialize struct pages in four places:
      
      1. Early in boot a small set of struct pages is initialized to fill the
         first section, and lower zones.
      
      2. During mm_init() we initialize "struct pages" for all the memory that
         is allocated, i.e. reserved in memblock.
      
      3. Using on-demand logic when pages are allocated after mm_init call
         (when memblock is finished)
      
      4. After smp_init(), when the rest of the free deferred pages are initialized.
      
      The problem occurs if we try to do a va to phys translation of memory
      between steps 1 and 2.  Because we have not yet initialized struct pages
      for all the reserved pages, it is inherently unsafe to do va to phys if
      the translation itself requires access to "struct page", as in the case
      of this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP
      
      The following path exposes the problem:
      
        start_kernel()
         trap_init()
          setup_cpu_entry_areas()
           setup_cpu_entry_area(cpu)
            get_cpu_gdt_paddr(cpu)
             per_cpu_ptr_to_phys(addr)
              pcpu_addr_to_page(addr)
               virt_to_page(addr)
                pfn_to_page(__pa(addr) >> PAGE_SHIFT)
      
      We disable this path by not allowing NEED_PER_CPU_KM with deferred
      struct pages feature.
      
      The problems are discussed in these threads:
        http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com
        http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com
        http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com
      
      Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Dennis Zhou <dennisszhou@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab1e8d89
    • Shuah Khan (Samsung OSG)'s avatar
      MAINTAINERS: add Q: entry to kselftest for patchwork project · f3d8d3cf
      Shuah Khan (Samsung OSG) authored
      A new patchwork project is created to track kselftest patches.  Update
      the kselftest entry in the MAINTAINERS file adding 'Q:' entry:
      
        https://patchwork.kernel.org/project/linux-kselftest/list/
      
      Link: http://lkml.kernel.org/r/20180515164427.12201-1-shuah@kernel.org
      Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f3d8d3cf
    • Ross Zwisler's avatar
      radix tree: fix multi-order iteration race · 9f418224
      Ross Zwisler authored
      Fix a race in the multi-order iteration code which causes the kernel to
      hit a GP fault.  This was first seen with a production v4.15 based
      kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
      order 9 PMD DAX entries.
      
      The race has to do with how we tear down multi-order sibling entries
      when we are removing an item from the tree.  Remember for example that
      an order 2 entry looks like this:
      
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
      
      where 'entry' is in some slot in the struct radix_tree_node, and the
      three slots following 'entry' contain sibling pointers which point back
      to 'entry.'
      
      When we delete 'entry' from the tree, we call:
      
        radix_tree_delete()
          radix_tree_delete_item()
            __radix_tree_delete()
              replace_slot()
      
      replace_slot() first removes the siblings in order from the first to the
      last, and then replaces 'entry' with NULL.  This means that for a
      brief period of time we end up with one or more of the siblings removed,
      so:
      
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
      
      This causes an issue if you have a reader iterating over the slots in
      the tree via radix_tree_for_each_slot() while only under
      rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
      mm/filemap.c.
      
      The issue is that when __radix_tree_next_slot() => skip_siblings() tries
      to skip over the sibling entries in the slots, it currently does so with
      an exact match on the slot directly preceding our current slot.
      Normally this works:
      
                                            V preceding slot
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                                    ^ current slot
      
      This lets you find the first sibling, and you skip them all in order.
      
      But in the case where one of the siblings is NULL, that slot is skipped
      and then our sibling detection is interrupted:
      
                                                   V preceding slot
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                          ^ current slot
      
      This means that the sibling pointers aren't recognized since they point
      all the way back to 'entry', so we think that they are normal internal
      radix tree pointers.  This causes us to think we need to walk down to a
      struct radix_tree_node starting at the address of 'entry'.
      
      In a real running kernel this will crash the thread with a GP fault when
      you try and dereference the slots in your broken node starting at
      'entry'.
      
      We fix this race by fixing the way that skip_siblings() detects sibling
      nodes.  Instead of testing against the preceding slot we instead look
      for siblings via is_sibling_entry() which compares against the position
      of the struct radix_tree_node.slots[] array.  This ensures that sibling
      entries are properly identified, even if they are no longer contiguous
      with the 'entry' they point to.
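The difference between the two detection strategies can be shown with a simplified userspace model. This is a sketch under stated assumptions, not the kernel implementation: the node has only four slots, sibling pointers are stored as plain untagged pointers to the slot holding 'entry', and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAP_SIZE 4	/* slots per node in this model */

struct model_node {
	void *slots[MODEL_MAP_SIZE];
};

/*
 * Old approach: slot 'cur' is treated as a sibling only if it points
 * exactly at the slot the iterator considers to be preceding it.
 */
bool sibling_of_preceding(const struct model_node *node,
			  size_t preceding, size_t cur)
{
	return node->slots[cur] == (void *)&node->slots[preceding];
}

/*
 * New approach: a value is a sibling if it points anywhere inside this
 * node's own slots[] array, no matter what the neighbouring slots hold.
 */
bool sibling_in_node(const struct model_node *node, const void *value)
{
	uintptr_t p = (uintptr_t)value;
	uintptr_t first = (uintptr_t)&node->slots[0];
	uintptr_t end = (uintptr_t)&node->slots[MODEL_MAP_SIZE];

	return p >= first && p < end;
}
```

In the racy state [entry][NULL][sibling][sibling], the preceding slot seen by the iterator is the NULL slot at index 1, so sibling_of_preceding(node, 1, 2) is false and the sibling would be mistaken for a child node pointer. sibling_in_node() still reports it as a sibling, because the pointer lands inside the node's slots[] array.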
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
      Fixes: 148deab2 ("radix-tree: improve multiorder iterators")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f418224
    • Ross Zwisler's avatar
      radix tree test suite: multi-order iteration race · fd8f58c4
      Ross Zwisler authored
      Add a test which shows a race in the multi-order iteration code.  This
      test reliably hits the race in under a second on my machine, and is the
      result of a real bug report against a production v4.15 based kernel
      (4.15.6-300.fc27.x86_64).  With a real kernel this issue is hit
      when using order 9 PMD DAX radix tree entries.
      
      The race has to do with how we tear down multi-order sibling entries
      when we are removing an item from the tree.  Remember that an order 2
      entry looks like this:
      
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
      
      where 'entry' is in some slot in the struct radix_tree_node, and the
      three slots following 'entry' contain sibling pointers which point back
      to 'entry.'
      
      When we delete 'entry' from the tree, we call:
      
        radix_tree_delete()
          radix_tree_delete_item()
            __radix_tree_delete()
              replace_slot()
      
      replace_slot() first removes the siblings in order from the first to the
      last, and then replaces 'entry' with NULL.  This means that for a
      brief period of time we end up with one or more of the siblings removed,
      so:
      
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
      
      This causes an issue if you have a reader iterating over the slots in
      the tree via radix_tree_for_each_slot() while only under
      rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
      mm/filemap.c.
      
      The issue is that when __radix_tree_next_slot() => skip_siblings() tries
      to skip over the sibling entries in the slots, it currently does so with
      an exact match on the slot directly preceding our current slot.
      Normally this works:
      
                                            V preceding slot
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                                    ^ current slot
      
      This lets you find the first sibling, and you skip them all in order.
      
      But in the case where one of the siblings is NULL, that slot is skipped
      and then our sibling detection is interrupted:
      
                                                   V preceding slot
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                          ^ current slot
      
      This means that the sibling pointers aren't recognized since they point
      all the way back to 'entry', so we think that they are normal internal
      radix tree pointers.  This causes us to think we need to walk down to a
      struct radix_tree_node starting at the address of 'entry'.
      
      In a real running kernel this will crash the thread with a GP fault when
      you try and dereference the slots in your broken node starting at
      'entry'.
      
      In the radix tree test suite this will be caught by the address
      sanitizer:
      
        ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address
        0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
        READ of size 8 at 0x60c0008ae400 thread T3
            #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
            #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
            #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
            #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
            #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd8f58c4