1. 19 Oct, 2017 2 commits
  2. 11 Oct, 2017 3 commits
    • Tony Nguyen's avatar
      PCI: Restore ARI Capable Hierarchy before setting numVFs · ff26449e
      Tony Nguyen authored
      In the restore path, we previously read PCI_SRIOV_VF_OFFSET and
      PCI_SRIOV_VF_STRIDE before restoring PCI_SRIOV_CTRL_ARI:
      
        pci_restore_state
          pci_restore_iov_state
            sriov_restore_state
              pci_iov_set_numvfs
                pci_read_config_word(... PCI_SRIOV_VF_OFFSET, &iov->offset)
                pci_read_config_word(... PCI_SRIOV_VF_STRIDE, &iov->stride)
              pci_write_config_word(... PCI_SRIOV_CTRL, iov->ctrl)
      
      But per SR-IOV r1.1, sec 3.3.3.5, the device can use PCI_SRIOV_CTRL_ARI to
      determine PCI_SRIOV_VF_OFFSET and PCI_SRIOV_VF_STRIDE.  Therefore, this
      path, which is used for suspend/resume and AER recovery, can corrupt
      iov->offset and iov->stride.
      
      Since the iov state is associated with the device, not the driver, if we
      reload the driver, it will use the the corrupted data, which may cause
      crashes like this:
      
        kernel BUG at drivers/pci/iov.c:157!
        RIP: 0010:pci_iov_add_virtfn+0x2eb/0x350
        Call Trace:
         pci_enable_sriov+0x353/0x440
         ixgbe_pci_sriov_configure+0xd5/0x1f0 [ixgbe]
         sriov_numvfs_store+0xf7/0x170
         dev_attr_store+0x18/0x30
         sysfs_kf_write+0x37/0x40
         kernfs_fop_write+0x120/0x1b0
         vfs_write+0xb5/0x1a0
         SyS_write+0x55/0xc0
      
      Restore PCI_SRIOV_CTRL_ARI before calling pci_iov_set_numvfs(), then
      restore the rest of PCI_SRIOV_CTRL (which may set PCI_SRIOV_CTRL_VFE)
      afterwards.
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      [bhelgaas: changelog, add comment, also clear ARI if necessary]
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      CC: Emil Tantilov <emil.s.tantilov@intel.com>
      ff26449e
    • Stuart Hayes's avatar
      PCI: Create SR-IOV virtfn/physfn links before attaching driver · 27d61629
      Stuart Hayes authored
      When creating virtual functions, create the "virtfn%u" and "physfn" links
      in sysfs *before* attaching the driver instead of after.  When we attach
      the driver to the new virtual network interface first, there is a race when
      the driver attaches to the new sends out an "add" udev event, and the
      network interface naming software (biosdevname or systemd, for example)
      tries to look at these links.
      Signed-off-by: default avatarStuart Hayes <stuart.w.hayes@gmail.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      27d61629
    • Filippo Sironi's avatar
      PCI: Expose SR-IOV offset, stride, and VF device ID via sysfs · 7dfca152
      Filippo Sironi authored
      Expose the SR-IOV device offset, stride, and VF device ID via sysfs to make
      it easier for userspace applications to consume them.
      Signed-off-by: default avatarFilippo Sironi <sironi@amazon.de>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      7dfca152
  3. 05 Oct, 2017 4 commits
  4. 01 Oct, 2017 9 commits
    • Linus Torvalds's avatar
      Linux 4.14-rc3 · 9e66317d
      Linus Torvalds authored
      9e66317d
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 368f8998
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "This contains the following fixes and improvements:
      
         - Avoid dereferencing an unprotected VMA pointer in the fault signal
           generation code
      
         - Fix inline asm call constraints for GCC 4.4
      
         - Use existing register variable to retrieve the stack pointer
           instead of forcing the compiler to create another indirect access
           which results in excessive extra 'mov %rsp, %<dst>' instructions
      
         - Disable branch profiling for the memory encryption code to prevent
           an early boot crash
      
         - Fix a sparse warning caused by casting the __user annotation in
           __get_user_asm_u64() away
      
         - Fix an off by one error in the loop termination of the error patch
           in the x86 sysfs init code
      
         - Add missing CPU IDs to various Intel specific drivers to enable the
           functionality on recent hardware
      
         - More (init) constification in the numachip code"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm: Use register variable to get stack pointer value
        x86/mm: Disable branch profiling in mem_encrypt.c
        x86/asm: Fix inline asm call constraints for GCC 4.4
        perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
        perf/x86/intel/rapl: Add missing CPU IDs
        perf/x86/msr: Add missing CPU IDs
        perf/x86/intel/cstate: Add missing CPU IDs
        x86: Don't cast away the __user in __get_user_asm_u64()
        x86/sysfs: Fix off-by-one error in loop termination
        x86/mm: Fix fault error path using unsafe vma pointer
        x86/numachip: Add const and __initconst to numachip2_clockevent
      368f8998
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c42ed9f9
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "This adds a new timer wheel function which is required for the
        conversion of the timer callback function from the 'unsigned long
        data' argument to 'struct timer_list *timer'. This conversion has two
        benefits:
      
         1) It makes struct timer_list smaller
      
         2) Many callers hand in a pointer to the timer or to the structure
            containing the timer, which happens via type casting both at setup
            and in the callback. This change gets rid of the typecasts.
      
        Once the conversion is complete, which is planned for 4.15, the old
        setup function and the intermediate typecast in the new setup function
        go away along with the data field in struct timer_list.
      
        Merging this now into mainline allows a smooth queueing of the actual
        conversion in the affected maintainer trees without creating
        dependencies"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        um/time: Fixup namespace collision
        timer: Prepare to change timer callback argument type
      c42ed9f9
    • Linus Torvalds's avatar
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 82513545
      Linus Torvalds authored
      Pull smp/hotplug fixes from Thomas Gleixner:
       "This addresses the fallout of the new lockdep mechanism which covers
        completions in the CPU hotplug code.
      
        The lockdep splats are false positives, but there is no way to
        annotate that reliably. The solution is to split the completions for
        CPU up and down, which requires some reshuffling of the failure
        rollback handling as well"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp/hotplug: Hotplug state fail injection
        smp/hotplug: Differentiate the AP completion between up and down
        smp/hotplug: Differentiate the AP-work lockdep class between up and down
        smp/hotplug: Callback vs state-machine consistency
        smp/hotplug: Rewrite AP state machine core
        smp/hotplug: Allow external multi-instance rollback
        smp/hotplug: Add state diagram
      82513545
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7e103ace
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "The scheduler pull request comes with the following updates:
      
         - Prevent a divide by zero issue by validating the input value of
           sysctl_sched_time_avg
      
         - Make task state printing consistent all over the place and have
           explicit state characters for IDLE and PARKED so they wont be
           displayed as 'D' state which confuses tools"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/sysctl: Check user input value of sysctl_sched_time_avg
        sched/debug: Add explicit TASK_PARKED printing
        sched/debug: Ignore TASK_IDLE for SysRq-W
        sched/debug: Add explicit TASK_IDLE printing
        sched/tracing: Use common task-state helpers
        sched/tracing: Fix trace_sched_switch task-state printing
        sched/debug: Remove unused variable
        sched/debug: Convert TASK_state to hex
        sched/debug: Implement consistent task-state printing
      7e103ace
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1c6f705b
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
      
       - Prevent a division by zero in the perf aux buffer handling
      
       - Sync kernel headers with perf tool headers
      
       - Fix a build failure in the syscalltbl code
      
       - Make the debug messages of perf report --call-graph work correctly
      
       - Make sure that all required perf files are in the MANIFEST for
         container builds
      
       - Fix the atrr.exclude kernel handling so it respects the
         perf_event_paranoid and the user permissions
      
       - Make perf test on s390x work correctly
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/aux: Only update ->aux_wakeup in non-overwrite mode
        perf test: Fix vmlinux failure on s390x part 2
        perf test: Fix vmlinux failure on s390x
        perf tools: Fix syscalltbl build failure
        perf report: Fix debug messages with --call-graph option
        perf evsel: Fix attr.exclude_kernel setting for default cycles:p
        tools include: Sync kernel ABI headers with tooling headers
        perf tools: Get all of tools/{arch,include}/ in the MANIFEST
      1c6f705b
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1de47f3c
      Linus Torvalds authored
      Pull  locking fixes from Thomas Gleixner:
       "Two fixes for locking:
      
         - Plug a hole the pi_stat->owner serialization which was changed
           recently and failed to fixup two usage sites.
      
         - Prevent reordering of the rwsem_has_spinner() check vs the
           decrement of rwsem count in up_write() which causes a missed
           wakeup"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rwsem-xadd: Fix missed wakeup due to reordering of load
        futex: Fix pi_state->owner serialization
      1de47f3c
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3d9d62b9
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
      
       - Add a missing NULL pointer check in free_irq()
      
       - Fix a memory leak/memory corruption in the generic irq chip
      
       - Add missing rcu annotations for radix tree access
      
       - Use ffs instead of fls when extracting data from a chip register in
         the MIPS GIC irq driver
      
       - Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
         end up at the target CPU and not at CPU0
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irq/generic-chip: Don't replace domain's name
        irqdomain: Add __rcu annotations to radix tree accessors
        irqchip/mips-gic: Use effective affinity to unmask
        irqchip/mips-gic: Fix shifts to extract register fields
        genirq: Check __free_irq() return value for NULL
      3d9d62b9
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 156069f8
      Linus Torvalds authored
      Pull objtool fixes from Thomas Gleixner:
       "Two small fixes for objtool:
      
         - Support frame pointer setup via 'lea (%rsp), %rbp' which was not
           yet supported and caused build warnings
      
         - Disable unreacahble warnings for GCC4.4 and older to avoid false
           positives caused by the compiler itself"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Support unoptimized frame pointer setup
        objtool: Skip unreachable warnings for GCC 4.4 and older
      156069f8
  5. 30 Sep, 2017 4 commits
  6. 29 Sep, 2017 18 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 99637e42
      Linus Torvalds authored
      Pull waitid fix from Al Viro:
       "Fix infoleak in waitid()"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix infoleak in waitid(2)
      99637e42
    • Linus Torvalds's avatar
      Merge branch 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 5ba88cd6
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "We've collected a bunch of isolated fixes, for crashes, user-visible
        behaviour or missing bits from other subsystem cleanups from the past.
      
        The overall number is not small but I was not able to make it
        significantly smaller. Most of the patches are supposed to go to
        stable"
      
      * 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: log csums for all modified extents
        Btrfs: fix unexpected result when dio reading corrupted blocks
        btrfs: Report error on removing qgroup if del_qgroup_item fails
        Btrfs: skip checksum when reading compressed data if some IO have failed
        Btrfs: fix kernel oops while reading compressed data
        Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block
        Btrfs: do not backup tree roots when fsync
        btrfs: remove BTRFS_FS_QUOTA_DISABLING flag
        btrfs: propagate error to btrfs_cmp_data_prepare caller
        btrfs: prevent to set invalid default subvolid
        Btrfs: send: fix error number for unknown inode types
        btrfs: fix NULL pointer dereference from free_reloc_roots()
        btrfs: finish ordered extent cleaning if no progress is found
        btrfs: clear ordered flag on cleaning up ordered extents
        Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO
        Btrfs: do not reset bio->bi_ops while writing bio
        Btrfs: use the new helper wbc_to_write_flags
      5ba88cd6
    • Linus Torvalds's avatar
      Merge tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · 7b5ef823
      Linus Torvalds authored
      Pull MD fixes from Shaohua Li:
       "A few fixes for MD. Mainly fix a problem introduced in 4.13, which we
        retry bio for some code paths but not all in some situations"
      
      * tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        md/raid5: cap worker count
        dm-raid: fix a race condition in request handling
        md: fix a race condition for flush request handling
        md: separate request handling
      7b5ef823
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 93b5533a
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - fix CONFIG_PCI=n build error (introduced in v4.14-rc1) (Geert
         Uytterhoeven)
      
       - fix a race in sysfs driver_override store/show (Nicolai Stange)
      
      * tag 'pci-v4.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Fix race condition with driver_override
        PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build
      93b5533a
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.14-rc3' of git://people.freedesktop.org/~airlied/linux · a3583202
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Regular fixes pull, some amdkfd, amdgpu, etnaviv, sun4i, qxl, tegra
        fixes.
      
        I've got an outstanding pull for i915 but it wasn't on an rc2 base so
        I wanted to ship these out first, I might get to it before rc3 or I
        might not"
      
      * tag 'drm-fixes-for-v4.14-rc3' of git://people.freedesktop.org/~airlied/linux:
        drm/tegra: trace: Fix path to include
        qxl: fix framebuffer unpinning
        drm/sun4i: cec: Enable back CEC-pin framework
        drm/amdkfd: Print event limit messages only once per process
        drm/amdkfd: Fix kernel-queue wrapping bugs
        drm/amdkfd: Fix incorrect destroy_mqd parameter
        drm/radeon: disable hard reset in hibernate for APUs
        drm/amdgpu: revert tile table update for oland
        etnaviv: fix gem object list corruption
        etnaviv: fix submit error path
        qxl: fix primary surface handling
        drm/amdkfd: check for null dev to avoid a null pointer dereference
      a3583202
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 35dbba31
      Linus Torvalds authored
      Pull IOMMU fixes from Joerg Roedel:
      
       - A comment fix for 'struct iommu_ops'
      
       - Format string fixes for AMD IOMMU, unfortunatly I missed that during
         review.
      
       - Limit mediatek physical addresses to 32 bit for v7s to fix a warning
         triggered in io-page-table code.
      
       - Fix dma-sync in io-pgtable-arm-v7s code
      
      * tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu: Fix comment for iommu_ops.map_sg
        iommu/amd: pr_err() strings should end with newlines
        iommu/mediatek: Limit the physical address in 32bit for v7s
        iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA
      35dbba31
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 06482600
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - SPsel register initialisation on reset as the architecture defines
         its state as unknown
      
       - Use READ_ONCE when dereferencing pmd_t pointers to avoid race
         conditions in page_vma_mapped_walk() (or fast GUP) with concurrent
         modifications of the page table
      
       - Avoid invoking the mm fault handling code for kernel addresses (check
         against TASK_SIZE) which would otherwise result in calling
         might_sleep() in atomic context
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: fault: Route pte translation faults via do_translation_fault
        arm64: mm: Use READ_ONCE when dereferencing pointer to pte table
        arm64: Make sure SPsel is always set
      06482600
    • Linus Torvalds's avatar
      Merge tag 'for-linus-4.14c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 9f2a5128
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - avoid a warning when compiling with clang
      
       - consider read-only bits in xen-pciback when writing to a BAR
      
       - fix a boot crash of pv-domains
      
      * tag 'for-linus-4.14c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping
        xen-pciback: relax BAR sizing write value check
        x86/xen: clean up clang build warning
      9f2a5128
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 42057e18
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "Mixed bugfixes. Perhaps the most interesting one is a latent bug that
        was finally triggered by PCID support"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm/x86: Handle async PF in RCU read-side critical sections
        KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
        KVM: VMX: use cmpxchg64
        KVM: VMX: simplify and fix vmx_vcpu_pi_load
        KVM: VMX: avoid double list add with VT-d posted interrupts
        KVM: VMX: extract __pi_post_block
        KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception
        KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
      42057e18
    • Al Viro's avatar
      fix infoleak in waitid(2) · 6c85501f
      Al Viro authored
      kernel_waitid() can return a PID, an error or 0.  rusage is filled in the first
      case and waitid(2) rusage should've been copied out exactly in that case, *not*
      whenever kernel_waitid() has not returned an error.  Compat variant shares that
      braino; none of kernel_wait4() callers do, so the below ought to fix it.
      Reported-and-tested-by: default avatarAlexander Potapenko <glider@google.com>
      Fixes: ce72a16f ("wait4(2)/waitid(2): separate copying rusage to userland")
      Cc: stable@vger.kernel.org # v4.13
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6c85501f
    • Andrey Ryabinin's avatar
      x86/asm: Use register variable to get stack pointer value · 196bd485
      Andrey Ryabinin authored
      Currently we use current_stack_pointer() function to get the value
      of the stack pointer register. Since commit:
      
        f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      
      ... we have a stack register variable declared. It can be used instead of
      current_stack_pointer() function which allows to optimize away some
      excessive "mov %rsp, %<dst>" instructions:
      
       -mov    %rsp,%rdx
       -sub    %rdx,%rax
       -cmp    $0x3fff,%rax
       -ja     ffffffff810722fd <ist_begin_non_atomic+0x2d>
      
       +sub    %rsp,%rax
       +cmp    $0x3fff,%rax
       +ja     ffffffff810722fa <ist_begin_non_atomic+0x2a>
      
      Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer
      and use it instead of the removed function.
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      196bd485
    • Tom Lendacky's avatar
      x86/mm: Disable branch profiling in mem_encrypt.c · bc829ee3
      Tom Lendacky authored
      Some routines in mem_encrypt.c are called very early in the boot process,
      e.g. sme_encrypt_kernel(). When CONFIG_TRACE_BRANCH_PROFILING=y is defined
      the resulting branch profiling associated with the check to see if SME is
      active results in a kernel crash. Disable branch profiling for
      mem_encrypt.c by defining DISABLE_BRANCH_PROFILING before including any
      header files.
      Reported-by: default avatarkernel test robot <lkp@01.org>
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170929162419.6016.53390.stgit@tlendack-t1.amdoffice.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bc829ee3
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-4.14-20170928' of... · 1addcd55
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-4.14-20170928' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      - Fix syscalltbl build failure (Akemi Yagi)
      
      - Fix attr.exclude_kernel setting for default cycles:p, this time for
        !root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo)
      
      - Sync kernel ABI headers with tooling headers (Ingo Molnar)
      
      - Remove misleading debug messages with --call-graph option (Mengting Zhang)
      
      - Revert vmlinux symbol resolution patches for s390x (Thomas Richter)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1addcd55
    • Linus Torvalds's avatar
      Merge branch 'fixes-v4.14-rc3' of... · 95d3652e
      Linus Torvalds authored
      Merge branch 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
      
      Pull keys fixes from James Morris:
       "Notable here is a rewrite of big_key crypto by Jason Donenfeld to
        address some issues in the original code.
      
        From Jason's commit log:
         "This started out as just replacing the use of crypto/rng with
          get_random_bytes_wait, so that we wouldn't use bad randomness at
          boot time. But, upon looking further, it appears that there were
          even deeper underlying cryptographic problems, and that this seems
          to have been committed with very little crypto review. So, I rewrote
          the whole thing, trying to keep to the conventions introduced by the
          previous author, to fix these cryptographic flaws."
      
        There has been positive review of the new code by Eric Biggers and
        Herbert Xu, and it passes basic testing via the keyutils test suite.
        Eric also manually tested it.
      
        Generally speaking, we likely need to improve the amount of crypto
        review for kernel crypto users including keys (I'll post a note
        separately to ksummit-discuss)"
      
      * 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        security/keys: rewrite all of big_key crypto
        security/keys: properly zero out sensitive key material in big_key
        KEYS: use kmemdup() in request_key_auth_new()
        KEYS: restrict /proc/keys by credentials at open time
        KEYS: reset parent each time before searching key_user_tree
        KEYS: prevent KEYCTL_READ on negative key
        KEYS: prevent creating a different user's keyrings
        KEYS: fix writing past end of user-supplied buffer in keyring_read()
        KEYS: fix key refcount leak in keyctl_read_key()
        KEYS: fix key refcount leak in keyctl_assume_authority()
        KEYS: don't revoke uninstantiated key in request_key_auth_new()
        KEYS: fix cred refcount leak in request_key_auth_new()
      95d3652e
    • Will Deacon's avatar
      arm64: fault: Route pte translation faults via do_translation_fault · 760bfb47
      Will Deacon authored
      We currently route pte translation faults via do_page_fault, which elides
      the address check against TASK_SIZE before invoking the mm fault handling
      code. However, this can cause issues with the path walking code in
      conjunction with our word-at-a-time implementation because
      load_unaligned_zeropad can end up faulting in kernel space if it reads
      across a page boundary and runs into a page fault (e.g. by attempting to
      read from a guard region).
      
      In the case of such a fault, load_unaligned_zeropad has registered a
      fixup to shift the valid data and pad with zeroes, however the abort is
      reported as a level 3 translation fault and we dispatch it straight to
      do_page_fault, despite it being a kernel address. This results in calling
      a sleeping function from atomic context:
      
        BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
        in_atomic(): 0, irqs_disabled(): 0, pid: 10290
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        [...]
        [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
        [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
        [<ffffff8e016977f0>] do_page_fault+0x140/0x330
        [<ffffff8e01681328>] do_mem_abort+0x54/0xb0
        Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
        [...]
        [<ffffff8e016844fc>] el1_da+0x18/0x78
        [<ffffff8e017f399c>] path_parentat+0x44/0x88
        [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
        [<ffffff8e017f5044>] filename_create+0x4c/0x128
        [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
        [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
        Code: 36380080 d5384100 f9400800 9402566d (d4210000)
        ---[ end trace 2d01889f2bca9b9f ]---
      
      Fix this by dispatching all translation faults to do_translation_faults,
      which avoids invoking the page fault logic for faults on kernel addresses.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarAnkit Jain <ankijain@codeaurora.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      760bfb47
    • Will Deacon's avatar
      arm64: mm: Use READ_ONCE when dereferencing pointer to pte table · f069faba
      Will Deacon authored
      On kernels built with support for transparent huge pages, different CPUs
      can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
      and they must take care to use READ_ONCE to avoid value tearing or caching
      of stale values by the compiler. Unfortunately, these functions call into
      our pgtable macros, which don't use READ_ONCE, and compiler caching has
      been observed to cause the following crash during ext4 writeback:
      
      PC is at check_pte+0x20/0x170
      LR is at page_vma_mapped_walk+0x2e0/0x540
      [...]
      Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
      Call trace:
      [<ffff000008233328>] check_pte+0x20/0x170
      [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
      [<ffff000008234adc>] page_mkclean_one+0xac/0x278
      [<ffff000008234d98>] rmap_walk_file+0xf0/0x238
      [<ffff000008236e74>] rmap_walk+0x64/0xa0
      [<ffff0000082370c8>] page_mkclean+0x90/0xa8
      [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
      [<ffff00000832f984>] mpage_submit_page+0x34/0x98
      [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
      [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
      [<ffff00000833530c>] ext4_writepages+0x484/0xe30
      [<ffff0000081f6ab4>] do_writepages+0x44/0xe8
      [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
      [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
      [<ffff000008324310>] ext4_sync_file+0x80/0x4b8
      [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
      [<ffff0000082332b4>] SyS_msync+0x194/0x1e8
      
      This is because page_vma_mapped_walk loads the PMD twice before calling
      pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
      due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
      (where it sees a valid table pointer due to a concurrent pmd_populate).
      However, the compiler inlines everything and caches the first value in
      a register, which is subsequently used in pte_offset_phys which returns
      a junk pointer that is later dereferenced when attempting to access the
      relevant pte.
      
      This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
      that a stale value is not used. Whilst this is a point fix for a known
      failure (and simple to backport), a full fix moving all of our page table
      accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
      page_vma_mapped_walk is in the works for a future kernel release.
      
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Timur Tabi <timur@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Fixes: f27176cf ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
      Tested-by: default avatarRichard Ruigrok <rruigrok@codeaurora.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      f069faba
    • Boqun Feng's avatar
      kvm/x86: Handle async PF in RCU read-side critical sections · b862789a
      Boqun Feng authored
      Sasha Levin reported a WARNING:
      
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      ...
      | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      | 1.10.1-1ubuntu1 04/01/2014
      | Call Trace:
      ...
      | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
      | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
      | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
      | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
      | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
      | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
      | __schedule+0x201/0x2240 kernel/sched/core.c:3292
      | schedule+0x113/0x460 kernel/sched/core.c:3421
      | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
      | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
      | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
      | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
      | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
      | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
      | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
      | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
      | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
      | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
      | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
      | sprintf+0xbe/0xf0 lib/vsprintf.c:2386
      | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
      | get_link fs/namei.c:1047 [inline]
      | link_path_walk+0x1041/0x1490 fs/namei.c:2127
      ...
      
      This happened when the host hit a page fault, and delivered it as in an
      async page fault, while the guest was in an RCU read-side critical
      section.  The guest then tries to reschedule in kvm_async_pf_task_wait(),
      but rcu_preempt_note_context_switch() would treat the reschedule as a
      sleep in RCU read-side critical section, which is not allowed (even in
      preemptible RCU).  Thus the WARN.
      
      To cure this, make kvm_async_pf_task_wait() go to the halt path if the
      PF happens in a RCU read-side critical section.
      Reported-by: default avatarSasha Levin <levinsasha928@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b862789a
    • Wanpeng Li's avatar
      KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume · 305d0ab4
      Wanpeng Li authored
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
       CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0+ #17
       RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
       Call Trace:
        ? emulator_read_emulated+0x15/0x20 [kvm]
        ? segmented_read+0xae/0xf0 [kvm]
        vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
        ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
        x86_emulate_instruction+0x733/0x810 [kvm]
        vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
        ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
        kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x23/0xc2
      
      A nested #PF is triggered during L0 emulating instruction for L2. However, it
      doesn't consider we should not break L1's vmlauch/vmresme. This patch fixes
      it by queuing the #PF exception instead ,requesting an immediate VM exit from
      L2 and keeping the exception for L1 pending for a subsequent nested VM exit.
      
      This should actually work all the time, making vmx_inject_page_fault_nested
      totally unnecessary.  However, that's not working yet, so this patch can work
      around the issue in the meanwhile.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      305d0ab4