1. 27 Jun, 2019 4 commits
    • Rafael J. Wysocki's avatar
      PCI: PM/ACPI: Refresh all stale power state data in pci_pm_complete() · b51033e0
      Rafael J. Wysocki authored
      In pci_pm_complete() there are checks to decide whether or not to
      resume devices that were left in runtime-suspend during the preceding
      system-wide transition into a sleep state.  They involve checking the
      current power state of the device and comparing it with the power
      state of it set before the preceding system-wide transition, but the
      platform component of the device's power state is not handled
      correctly in there.
      
      Namely, on platforms with ACPI, the device power state information
      needs to be updated with care, so that the reference counters of
      power resources used by the device (if any) are set to ensure that
      the refreshed power state of it will be maintained going forward.
      
      To that end, introduce a new ->refresh_state() platform PM callback
      for PCI devices, for asking the platform to refresh the device power
      state data and ensure that the corresponding power state will be
      maintained going forward, make it invoke acpi_device_update_power()
      (for devices with ACPI PM) on platforms with ACPI and make
      pci_pm_complete() use it, through a new pci_refresh_power_state()
      wrapper function.
      
      Fixes: a0d2a959 (PCI: Avoid unnecessary resume after direct-complete)
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      b51033e0
    • Mika Westerberg's avatar
      PCI / ACPI: Add _PR0 dependent devices · 53b22f90
      Mika Westerberg authored
      If otherwise unrelated PCI devices share ACPI power resources turning
      them on causes the devices to enter D0uninitialized power state which may
      cause problems.
      
      For example in Intel Ice Lake two root ports (RP0 and RP1), Thunderbolt
      controller (NHI) and xHCI controller all share power resources as can be
      ween in the topology below where power resources are marked with []:
      
        Host bridge
          |
          +- RP0 ---\
          +- RP1 ---|--+--> [TBT]
          +- NHI --/   |
          |            |
          |            v
          +- xHCI --> [D3C]
      
      In a situation where all devices sharing the power resources are in
      D3cold (the power resources are turned off) and for example the
      Thunderbolt controller is runtime resumed resulting that the power
      resources are turned on. This means that the other devices sharing them
      (RP0, RP1 and xHCI) are transitioned into D0uninitialized state. If they
      were configured to trigger wake (PME) on a certain event that
      configuration gets lost after reset so we would need to re-initialize
      them to get the wakeup working as expected again. To do so we would need
      to runtime resume all of them to make sure their registers get restored
      properly before we can runtime suspend them again.
      
      Since we just added concept of "_PR0 dependent device" we can solve this
      by calling the relevant add/remove functions when the PCI device is bind
      to its ACPI representation. If it has power resources the PCI device
      will be added as dependent device to them and runtime resumed whenever
      they are physically turned on. This should make sure PCI core can
      reconfigure wakes after the device is transitioned into D0uninitialized.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      53b22f90
    • Mika Westerberg's avatar
      ACPI / PM: Introduce concept of a _PR0 dependent device · 4533771c
      Mika Westerberg authored
      If there are shared power resources between otherwise unrelated devices
      turning them on causes the other devices sharing them to be powered up
      as well. In case of PCI devices go into D0uninitialized state meaning
      that if they were configured to trigger wake that configuration is lost
      at this point.
      
      For this reason introduce a concept of "_PR0 dependent device" that can
      be added to any ACPI device that has power resources. The dependent
      device will be included in a list of dependent devices for all power
      resources returned by the ACPI device's _PR0 (assuming it has one).
      Whenever a power resource having dependent devices is turned physically
      on (its _ON method is called) we runtime resume all of them to allow
      their driver or in case of PCI the PCI core to re-initialize the device
      and its wake configuration.
      
      This adds two functions that can be used to add and remove these
      dependent devices. Note the dependent device does not necessary need
      share power resources so this functionality can be used to add "software
      dependencies" as well if needed.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4533771c
    • Mika Westerberg's avatar
      PCI / ACPI: Use cached ACPI device state to get PCI device power state · 83a16e3f
      Mika Westerberg authored
      The ACPI power state returned by acpi_device_get_power() may depend on
      the configuration of ACPI power resources in the system which may change
      any time after acpi_device_get_power() has returned, unless the
      reference counters of the ACPI power resources in question are set to
      prevent that from happening. Thus it is invalid to use acpi_device_get_power()
      in acpi_pci_get_power_state() the way it is done now and the value of
      the ->power.state field in the corresponding struct acpi_device objects
      (which reflects the ACPI power resources reference counting, among other
      things) should be used instead.
      
      As an example where this becomes an issue is Intel Ice Lake where the
      Thunderbolt controller (NHI), two PCIe root ports (RP0 and RP1) and xHCI
      all share the same power resources. The following picture with power
      resources marked with [] shows the topology:
      
        Host bridge
          |
          +- RP0 ---\
          +- RP1 ---|--+--> [TBT]
          +- NHI --/   |
          |            |
          |            v
          +- xHCI --> [D3C]
      
      Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
      of the devices in question returns either TBT or D3C or both.
      
      Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now
      since the TBT power resource is still on when the root ports are runtime
      suspended their dev->current_state is set to D3hot. When NHI is runtime
      suspended TBT is finally turned off but state of the root ports remain
      to be D3hot. Now when the xHCI is runtime suspended D3C gets also turned
      off. PCI core thus has power states of these devices cached in their
      dev->current_state as follows:
      
        RP0 -> D3hot
        RP1 -> D3hot
        NHI -> D3cold
        xHCI -> D3cold
      
      If the user now runs lspci for instance, the result is all 1's like in
      the below output (00:07.0 is the first root port, RP0):
      
      00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
          !!! Unknown header type 7f
          Kernel driver in use: pcieport
      
      In short the hardware state is not in sync with the software state
      anymore. The exact same thing happens with the PME polling thread which
      ends up bringing the root ports back into D0 after they are runtime
      suspended.
      
      For this reason, modify acpi_pci_get_power_state() so that it uses the
      ACPI device power state that was cached by the ACPI core. This makes the
      PCI device power state match the ACPI device power state regardless of
      state of the shared power resources which may still be on at this point.
      
      Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@linux.intel.comSigned-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      83a16e3f
  2. 24 Jun, 2019 1 commit
  3. 17 Jun, 2019 4 commits
    • Mika Westerberg's avatar
      PCI: Do not poll for PME if the device is in D3cold · 000dd531
      Mika Westerberg authored
      PME polling does not take into account that a device that is directly
      connected to the host bridge may go into D3cold as well. This leads to a
      situation where the PME poll thread reads from a config space of a
      device that is in D3cold and gets incorrect information because the
      config space is not accessible.
      
      Here is an example from Intel Ice Lake system where two PCIe root ports
      are in D3cold (I've instrumented the kernel to log the PMCSR register
      contents):
      
        [   62.971442] pcieport 0000:00:07.1: Check PME status, PMCSR=0xffff
        [   62.971504] pcieport 0000:00:07.0: Check PME status, PMCSR=0xffff
      
      Since 0xffff is interpreted so that PME is pending, the root ports will
      be runtime resumed. This repeats over and over again essentially
      blocking all runtime power management.
      
      Prevent this from happening by checking whether the device is in D3cold
      before its PME status is read.
      
      Fixes: 71a83bd7 ("PCI/PM: add runtime PM support to PCIe port")
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: default avatarLukas Wunner <lukas@wunner.de>
      Cc: 3.6+ <stable@vger.kernel.org> # v3.6+
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      000dd531
    • Mika Westerberg's avatar
      PCI: Add missing link delays required by the PCIe spec · c2bf1fc2
      Mika Westerberg authored
      Currently Linux does not follow PCIe spec regarding the required delays
      after reset. A concrete example is a Thunderbolt add-in-card that
      consists of a PCIe switch and two PCIe endpoints:
      
        +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
                                        +-01.0-[04-36]-- DS hotplug port
                                        +-02.0-[37]----00.0 xHCI controller
                                        \-04.0-[38-6b]-- DS hotplug port
      
      The root port (1b.0) and the PCIe switch downstream ports are all PCIe
      gen3 so they support 8GT/s link speeds.
      
      We wait for the PCIe hierarchy to enter D3cold (runtime):
      
        pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
      
      When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
      PCIe switch is put to reset and its power is re-applied. This means that
      we must follow the rules in PCIe 4.0 section 6.6.1.
      
      For the PCIe gen3 ports we are dealing with here, the following applies:
      
        With a Downstream Port that supports Link speeds greater than 5.0
        GT/s, software must wait a minimum of 100 ms after Link training
        completes before sending a Configuration Request to the device
        immediately below that Port. Software can determine when Link training
        completes by polling the Data Link Layer Link Active bit or by setting
        up an associated interrupt (see Section 6.7.3.3).
      
      Translating this into the above topology we would need to do this (DLLLA
      stands for Data Link Layer Link Active):
      
        pcieport 0000:00:1b.0: wait for 100ms after DLLLA is set before access to 0000:01:00.0
        pcieport 0000:02:00.0: wait for 100ms after DLLLA is set before access to 0000:03:00.0
        pcieport 0000:02:02.0: wait for 100ms after DLLLA is set before access to 0000:37:00.0
      
      I've instrumented the kernel with additional logging so we can see the
      actual delays the kernel performs:
      
        pcieport 0000:00:1b.0: power state changed by ACPI to D0
        pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
        pcieport 0000:00:1b.0: waking up bus
        pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
        pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
        ...
        pcieport 0000:00:1b.0: PME# disabled
        pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        ...
        pcieport 0000:01:00.0: PME# disabled
        pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        ...
        pcieport 0000:02:00.0: PME# disabled
        pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        ...
        pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
        pcieport 0000:02:01.0: PME# disabled
        pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        ...
        pcieport 0000:02:02.0: PME# disabled
        pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        ...
        pcieport 0000:02:04.0: PME# disabled
        pcieport 0000:02:01.0: PME# enabled
        pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
        pcieport 0000:02:04.0: PME# enabled
        pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
        thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)
        ...
        thunderbolt 0000:03:00.0: PME# disabled
        xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
        ...
        xhci_hcd 0000:37:00.0: PME# disabled
      
      For the switch upstream port (01:00.0) we wait for 100ms but not taking
      into account the DLLLA requirement. We then wait 10ms for D3hot -> D0
      transition of the root port and the two downstream hotplug ports. This
      means that we deviate from what the spec requires.
      
      Performing the same check for system sleep (s2idle) transitions we can
      see following when resuming from s2idle:
      
        pcieport 0000:00:1b.0: power state changed by ACPI to D0
        pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
        ...
        pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        ...
        pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        pcieport 0000:02:02.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
        pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        pcieport 0000:02:02.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
        pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
        pcieport 0000:02:02.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
        pcieport 0000:02:01.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
        pcieport 0000:02:02.0: restoring config space at offset 0x20 (was 0x0, writing 0x73f073f0)
        pcieport 0000:02:04.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
        pcieport 0000:02:01.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
        pcieport 0000:02:00.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
        pcieport 0000:02:02.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
        pcieport 0000:02:04.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
        pcieport 0000:02:01.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1ff10001)
        pcieport 0000:02:00.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
        pcieport 0000:02:02.0: restoring config space at offset 0x18 (was 0x0, writing 0x373702)
        pcieport 0000:02:04.0: restoring config space at offset 0x24 (was 0x10001, writing 0x49f12001)
        pcieport 0000:02:01.0: restoring config space at offset 0x20 (was 0x0, writing 0x73e05c00)
        pcieport 0000:02:00.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
        pcieport 0000:02:04.0: restoring config space at offset 0x20 (was 0x0, writing 0x89f07400)
        pcieport 0000:02:01.0: restoring config space at offset 0x1c (was 0x101, writing 0x5151)
        pcieport 0000:02:00.0: restoring config space at offset 0x20 (was 0x0, writing 0x8a008a00)
        pcieport 0000:02:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
        pcieport 0000:02:04.0: restoring config space at offset 0x1c (was 0x101, writing 0x6161)
        pcieport 0000:02:01.0: restoring config space at offset 0x18 (was 0x0, writing 0x360402)
        pcieport 0000:02:00.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
        pcieport 0000:02:04.0: restoring config space at offset 0x18 (was 0x0, writing 0x6b3802)
        pcieport 0000:02:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
        pcieport 0000:02:00.0: restoring config space at offset 0x18 (was 0x0, writing 0x30302)
        pcieport 0000:02:01.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
        pcieport 0000:02:04.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
        pcieport 0000:02:00.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
        pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
        pcieport 0000:02:04.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
        pcieport 0000:02:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
        xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
        ...
        thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)
      
      This is even worse. None of the mandatory delays are performed. If this
      would be S3 instead of s2idle then according to PCI FW spec 3.2 section
      4.6.8.  there is a specific _DSM that allows the OS to skip the delays
      but this platform does not provide the _DSM and does not go to S3 anyway
      so no firmware is involved that could already handle these delays.
      
      In this particular Intel Coffee Lake platform these delays are not
      actually needed because there is an additional delay as part of the ACPI
      power resource that is used to turn on power to the hierarchy but since
      that additional delay is not required by any of standards (PCIe, ACPI)
      it is not present in the Intel Ice Lake, for example where missing the
      mandatory delays causes pciehp to start tearing down the stack too early
      (links are not yet trained).
      
      For this reason, change the PCIe portdrv PM resume hooks so that they
      perform the mandatory delays before the downstream component gets
      resumed. We perform the delays before port services are resumed because
      otherwise pciehp might find that the link is not up (even if it is just
      training) and tears-down the hierarchy.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c2bf1fc2
    • Rafael J. Wysocki's avatar
      PCI: PM: Replace pci_dev_keep_suspended() with two functions · 0c7376ad
      Rafael J. Wysocki authored
      The code in pci_dev_keep_suspended() is relatively hard to follow due
      to the negative checks in it and in its callers and the function has
      a possible side-effect (disabling the PME) which doesn't really match
      its role.
      
      For this reason, move the PME disabling from pci_dev_keep_suspended()
      to a separate function and change the semantics (and name) of the
      rest of it, so that 'true' is returned when the device needs to be
      resumed (and not the other way around).  Change the callers of
      pci_dev_keep_suspended() accordingly.
      
      While at it, make the code flow in pci_pm_poweroff() reflect the
      pci_pm_suspend() more closely to avoid arbitrary differences between
      them.
      
      This is a cosmetic change with no intention to alter behavior.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      0c7376ad
    • Rafael J. Wysocki's avatar
      PCI: PM: Avoid resuming devices in D3hot during system suspend · 234f223d
      Rafael J. Wysocki authored
      The current code resumes devices in D3hot during system suspend if
      the target power state for them is D3cold, but that is not necessary
      in general.  It only is necessary to do that if the platform firmware
      requires the device to be resumed, but that should be covered by
      the platform_pci_need_resume() check anyway, so rework
      pci_dev_keep_suspended() to avoid returning 'false' for devices
      in D3hot which need not be resumed due to platform firmware
      requirements.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      234f223d
  4. 16 Jun, 2019 4 commits
    • Linus Torvalds's avatar
      Linux 5.2-rc5 · 9e0babf2
      Linus Torvalds authored
      9e0babf2
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 963172d9
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "The accumulated fixes from this and last week:
      
         - Fix vmalloc TLB flush and map range calculations which lead to
           stale TLBs, spurious faults and other hard to diagnose issues.
      
         - Use fault_in_pages_writable() for prefaulting the user stack in the
           FPU code as it's less fragile than the current solution
      
         - Use the PF_KTHREAD flag when checking for a kernel thread instead
           of current->mm as the latter can give the wrong answer due to
           use_mm()
      
         - Compute the vmemmap size correctly for KASLR and 5-Level paging.
           Otherwise this can end up with a way too small vmemmap area.
      
         - Make KASAN and 5-level paging work again by making sure that all
           invalid bits are masked out when computing the P4D offset. This
           worked before but got broken recently when the LDT remap area was
           moved.
      
         - Prevent a NULL pointer dereference in the resource control code
           which can be triggered with certain mount options when the
           requested resource is not available.
      
         - Enforce ordering of microcode loading vs. perf initialization on
           secondary CPUs. Otherwise perf tries to access a non-existing MSR
           as the boot CPU marked it as available.
      
         - Don't stop the resource control group walk early otherwise the
           control bitmaps are not updated correctly and become inconsistent.
      
         - Unbreak kgdb by returning 0 on success from
           kgdb_arch_set_breakpoint() instead of an error code.
      
         - Add more Icelake CPU model defines so depending changes can be
           queued in other trees"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback
        x86/kasan: Fix boot with 5-level paging and KASAN
        x86/fpu: Don't use current->mm to check for a kthread
        x86/kgdb: Return 0 from kgdb_arch_set_breakpoint()
        x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled
        x86/resctrl: Don't stop walking closids when a locksetup group is found
        x86/fpu: Update kernel's FPU state before using for the fsave header
        x86/mm/KASLR: Compute the size of the vmemmap section properly
        x86/fpu: Use fault_in_pages_writeable() for pre-faulting
        x86/CPU: Add more Icelake model numbers
        mm/vmalloc: Avoid rare case of flushing TLB with weird arguments
        mm/vmalloc: Fix calculation of direct map addr range
      963172d9
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · efba92d5
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "A set of small fixes:
      
         - Repair the ktime_get_coarse() functions so they actually deliver
           what they are supposed to: tick granular time stamps. The current
           code missed to add the accumulated nanoseconds part of the
           timekeeper so the resulting granularity was 1 second.
      
         - Prevent the tracer from infinitely recursing into time getter
           functions in the arm architectured timer by marking these functions
           notrace
      
         - Fix a trivial compiler warning caused by wrong qualifier ordering"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timekeeping: Repair ktime_get_coarse*() granularity
        clocksource/drivers/arm_arch_timer: Don't trace count reader functions
        clocksource/drivers/timer-ti-dm: Change to new style declaration
      efba92d5
    • Linus Torvalds's avatar
      Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f763cf8e
      Linus Torvalds authored
      Pull RAS fixes from Thomas Gleixner:
       "Two small fixes for RAS:
      
         - Use a proper search algorithm to find the correct element in the
           CEC array. The replacement was a better choice than fixing the
           crash causes by the original search function with horrible duct
           tape.
      
         - Move the timer based decay function into thread context so it can
           actually acquire the mutex which protects the CEC array to prevent
           corruption"
      
      * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        RAS/CEC: Convert the timer callback to a workqueue
        RAS/CEC: Fix binary search function
      f763cf8e
  5. 15 Jun, 2019 12 commits
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v5.2-3' of git://git.infradead.org/linux-platform-drivers-x86 · e01e060f
      Linus Torvalds authored
      Pull x86 platform driver fixes from Andy Shevchenko:
      
       - fix a couple of Mellanox driver enumeration issues
      
       - fix ASUS laptop regression with backlight
      
       - fix Dell computers that got a wrong mode (tablet versus laptop) after
         resume
      
      * tag 'platform-drivers-x86-v5.2-3' of git://git.infradead.org/linux-platform-drivers-x86:
        platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow
        platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration
        platform/x86: intel-vbtn: Report switch events when event wakes device
        platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
      e01e060f
    • Linus Torvalds's avatar
      Merge tag 'usb-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · ff39074b
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB driver fixes for 5.2-rc5
      
        Nothing major, just some small gadget fixes, usb-serial new device
        ids, a few new quirks, and some small fixes for some regressions that
        have been found after the big 5.2-rc1 merge.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'usb-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: Make sure an alt mode exist before getting its partner
        usb: gadget: udc: lpc32xx: fix return value check in lpc32xx_udc_probe()
        usb: gadget: dwc2: fix zlp handling
        usb: dwc2: Set actual frame number for completed ISOC transfer for none DDMA
        usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
        usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
        usb: phy: mxs: Disable external charger detect in mxs_phy_hw_init()
        usb: dwc2: Fix DMA cache alignment issues
        usb: dwc2: host: Fix wMaxPacketSize handling (fix webcam regression)
        USB: Fix chipmunk-like voice when using Logitech C270 for recording audio.
        USB: usb-storage: Add new ID to ums-realtek
        usb: typec: ucsi: ccg: fix memory leak in do_flash
        USB: serial: option: add Telit 0x1260 and 0x1261 compositions
        USB: serial: pl2303: add Allied Telesis VT-Kit3
        USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
      ff39074b
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · fa1827d7
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "One fix for a regression introduced by our 32-bit KASAN support, which
        broke booting on machines with "bootx" early debugging enabled.
      
        A fix for a bug which broke kexec on 32-bit, introduced by changes to
        the 32-bit STRICT_KERNEL_RWX support in v5.1.
      
        Finally two fixes going to stable for our THP split/collapse handling,
        discovered by Nick. The first fixes random crashes and/or corruption
        in guests under sufficient load.
      
        Thanks to: Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu
        Malaterre"
      
      * tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
        powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()
        powerpc/64s: Fix THP PMD collapse serialisation
        powerpc: Fix kexec failure on book3s/32
      fa1827d7
    • Linus Torvalds's avatar
      Merge tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 6a71398c
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Out of range read of stack trace output
      
       - Fix for NULL pointer dereference in trace_uprobe_create()
      
       - Fix to a livepatching / ftrace permission race in the module code
      
       - Fix for NULL pointer dereference in free_ftrace_func_mapper()
      
       - A couple of build warning clean ups
      
      * tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
        module: Fix livepatch/ftrace module text permissions race
        tracing/uprobe: Fix obsolete comment on trace_uprobe_create()
        tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()
        tracing: Make two symbols static
        tracing: avoid build warning with HAVE_NOP_MCOUNT
        tracing: Fix out-of-range read in trace_stack_print()
      6a71398c
    • Borislav Petkov's avatar
      x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback · 78f4e932
      Borislav Petkov authored
      Adric Blake reported the following warning during suspend-resume:
      
        Enabling non-boot CPUs ...
        x86: Booting SMP configuration:
        smpboot: Booting Node 0 Processor 1 APIC 0x2
        unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
         at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
        Call Trace:
         intel_set_tfa
         intel_pmu_cpu_starting
         ? x86_pmu_dead_cpu
         x86_pmu_starting_cpu
         cpuhp_invoke_callback
         ? _raw_spin_lock_irqsave
         notify_cpu_starting
         start_secondary
         secondary_startup_64
        microcode: sig=0x806ea, pf=0x80, revision=0x96
        microcode: updated to revision 0xb4, date = 2019-04-01
        CPU1 is up
      
      The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
      by microcode. The log above shows that the microcode loader callback
      happens after the PMU restoration, leading to the conjecture that
      because the microcode hasn't been updated yet, that MSR is not present
      yet, leading to the #GP.
      
      Add a microcode loader-specific hotplug vector which comes before
      the PERF vectors and thus executes earlier and makes sure the MSR is
      present.
      
      Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort")
      Reported-by: default avatarAdric Blake <promarbler14@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: x86@kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
      78f4e932
    • Linus Torvalds's avatar
      Merge branch 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 0011572c
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "This has an unusually high density of tricky fixes:
      
         - task_get_css() could deadlock when it races against a dying cgroup.
      
         - cgroup.procs didn't list thread group leaders with live threads.
      
           This could mislead readers to think that a cgroup is empty when
           it's not. Fixed by making PROCS iterator include dead tasks. I made
           a couple mistakes making this change and this pull request contains
           a couple follow-up patches.
      
         - When cpusets run out of online cpus, it updates cpusmasks of member
           tasks in bizarre ways. Joel improved the behavior significantly"
      
      * 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cpuset: restore sanity to cpuset_cpus_allowed_fallback()
        cgroup: Fix css_task_iter_advance_css_set() cset skip condition
        cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
        cgroup: Include dying leaders with live threads in PROCS iterations
        cgroup: Implement css_task_iter_skip()
        cgroup: Call cgroup_release() before __exit_signal()
        docs cgroups: add another example size for hugetlb
        cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
      0011572c
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drm · 6aa7a22b
      Linus Torvalds authored
      Pull drm fixes from Daniel Vetter:
       "Nothing unsettling here, also not aware of anything serious still
        pending.
      
        The edid override regression fix took a bit longer since this seems to
        be an area with an overabundance of bad options. But the fix we have
        now seems like a good path forward.
      
        Next week it should be back to Dave.
      
        Summary:
      
         - fix regression on amdgpu on SI
      
         - fix edid override regression
      
         - driver fixes: amdgpu, i915, mediatek, meson, panfrost
      
         - fix writecombine for vmap in gem-shmem helper (used by panfrost)
      
         - add more panel quirks"
      
      * tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drm: (25 commits)
        drm/amdgpu: return 0 by default in amdgpu_pm_load_smu_firmware
        drm/amdgpu: Fix bounds checking in amdgpu_ras_is_supported()
        drm: add fallback override/firmware EDID modes workaround
        drm/edid: abstract override/firmware EDID retrieval
        drm/i915/perf: fix whitelist on Gen10+
        drm/i915/sdvo: Implement proper HDMI audio support for SDVO
        drm/i915: Fix per-pixel alpha with CCS
        drm/i915/dmc: protect against reading random memory
        drm/i915/dsi: Use a fuzzy check for burst mode clock check
        drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
        drm/panfrost: Require the simple_ondemand governor
        drm/panfrost: make devfreq optional again
        drm/gem_shmem: Use a writecombine mapping for ->vaddr
        drm: panel-orientation-quirks: Add quirk for GPD MicroPC
        drm: panel-orientation-quirks: Add quirk for GPD pocket2
        drm/meson: fix G12A primary plane disabling
        drm/meson: fix primary plane disabling
        drm/meson: fix G12A HDMI PLL settings for 4K60 1000/1001 variations
        drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable()
        drm/mediatek: clear num_pipes when unbind driver
        ...
      6aa7a22b
    • Linus Torvalds's avatar
      Merge tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 40665244
      Linus Torvalds authored
      Pull gfs2 fix from Andreas Gruenbacher:
       "Fix rounding error in gfs2_iomap_page_prepare"
      
      * tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Fix rounding error in gfs2_iomap_page_prepare
      40665244
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 1ed1fa5f
      Linus Torvalds authored
      Pull SCSI fix from James Bottomley:
       "A single bug fix for hpsa.
      
        The user visible consequences aren't clear, but the ioaccel2 raid
        acceleration may misfire on the malformed request assuming the payload
        is big enough to require chaining (more than 31 sg entries)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: hpsa: correct ioaccel2 chaining
      1ed1fa5f
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190614' of git://git.kernel.dk/linux-block · 7b103151
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Remove references to old schedulers for the scheduler switching and
         blkio controller documentation (Andreas)
      
       - Kill duplicate check for report zone for null_blk (Chaitanya)
      
       - Two bcache fixes (Coly)
      
       - Ensure that mq-deadline is selected if zoned block device is enabled,
         as we need that to support them (Damien)
      
       - Fix io_uring memory leak (Eric)
      
       - ps3vram fallout from LBDAF removal (Geert)
      
       - Redundant blk-mq debugfs debugfs_create return check cleanup (Greg)
      
       - Extend NOPLM quirk for ST1000LM024 drives (Hans)
      
       - Remove error path warning that can now trigger after the queue
         removal/addition fixes (Ming)
      
      * tag 'for-linus-20190614' of git://git.kernel.dk/linux-block:
        block/ps3vram: Use %llu to format sector_t after LBDAF removal
        libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
        bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
        bcache: fix stack corruption by PRECEDING_KEY()
        blk-mq: remove WARN_ON(!q->elevator) from blk_mq_sched_free_requests
        blkio-controller.txt: Remove references to CFQ
        block/switching-sched.txt: Update to blk-mq schedulers
        null_blk: remove duplicate check for report zone
        blk-mq: no need to check return value of debugfs_create functions
        io_uring: fix memory leak of UNIX domain socket inode
        block: force select mq-deadline for zoned block devices
      7b103151
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 5dcedf46
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "I2C has two simple but wanted driver fixes for you"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: pca-platform: Fix GPIO lookup code
        i2c: acorn: fix i2c warning
      5dcedf46
    • Casey Schaufler's avatar
      Smack: Restore the smackfsdef mount option and add missing prefixes · 6e7739fc
      Casey Schaufler authored
      The 5.1 mount system rework changed the smackfsdef mount option to
      smackfsdefault.  This fixes the regression by making smackfsdef treated
      the same way as smackfsdefault.
      
      Also fix the smack_param_specs[] to have "smack" prefixes on all the
      names.  This isn't visible to a user unless they either:
      
       (a) Try to mount a filesystem that's converted to the internal mount API
           and that implements the ->parse_monolithic() context operation - and
           only then if they call security_fs_context_parse_param() rather than
           security_sb_eat_lsm_opts().
      
           There are no examples of this upstream yet, but nfs will probably want
           to do this for nfs2 or nfs3.
      
       (b) Use fsconfig() to configure the filesystem - in which case
           security_fs_context_parse_param() will be called.
      
      This issue is that smack_sb_eat_lsm_opts() checks for the "smack" prefix
      on the options, but smack_fs_context_parse_param() does not.
      
      Fixes: c3300aaf ("smack: get rid of match_token()")
      Fixes: 2febd254 ("smack: Implement filesystem context security hooks")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarJose Bollo <jose.bollo@iot.bzh>
      Signed-off-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6e7739fc
  6. 14 Jun, 2019 15 commits
    • Wei Li's avatar
      ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper() · 04e03d9a
      Wei Li authored
      The mapper may be NULL when called from register_ftrace_function_probe()
      with probe->data == NULL.
      
      This issue can be reproduced as follow (it may be covered by compiler
      optimization sometime):
      
      / # cat /sys/kernel/debug/tracing/set_ftrace_filter
      #### all functions enabled ####
      / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
      [  206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  206.952402] Mem abort info:
      [  206.952819]   ESR = 0x96000006
      [  206.955326]   Exception class = DABT (current EL), IL = 32 bits
      [  206.955844]   SET = 0, FnV = 0
      [  206.956272]   EA = 0, S1PTW = 0
      [  206.956652] Data abort info:
      [  206.957320]   ISV = 0, ISS = 0x00000006
      [  206.959271]   CM = 0, WnR = 0
      [  206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
      [  206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
      [  206.964953] Internal error: Oops: 96000006 [#1] SMP
      [  206.971122] Dumping ftrace buffer:
      [  206.973677]    (ftrace buffer empty)
      [  206.975258] Modules linked in:
      [  206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
      [  206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
      [  206.978955] Hardware name: linux,dummy-virt (DT)
      [  206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
      [  206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
      [  206.980874] lr : ftrace_count_free+0x68/0x80
      [  206.982539] sp : ffff0000182f3ab0
      [  206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
      [  206.983632] x27: ffff000013054b40 x26: 0000000000000001
      [  206.984000] x25: ffff00001385f000 x24: 0000000000000000
      [  206.984394] x23: ffff000013453000 x22: ffff000013054000
      [  206.984775] x21: 0000000000000000 x20: ffff00001385fe28
      [  206.986575] x19: ffff000013872c30 x18: 0000000000000000
      [  206.987111] x17: 0000000000000000 x16: 0000000000000000
      [  206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
      [  206.987850] x13: 000000000017430e x12: 0000000000000580
      [  206.988251] x11: 0000000000000000 x10: cccccccccccccccc
      [  206.988740] x9 : 0000000000000000 x8 : ffff000013917550
      [  206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
      [  206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
      [  206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
      [  206.991771] x1 : 0000000000000000 x0 : 0000000000000000
      [  206.992557] Call trace:
      [  206.993101]  free_ftrace_func_mapper+0x2c/0x118
      [  206.994827]  ftrace_count_free+0x68/0x80
      [  206.995238]  release_probe+0xfc/0x1d0
      [  206.995555]  register_ftrace_function_probe+0x4a8/0x868
      [  206.995923]  ftrace_trace_probe_callback.isra.4+0xb8/0x180
      [  206.996330]  ftrace_dump_callback+0x50/0x70
      [  206.996663]  ftrace_regex_write.isra.29+0x290/0x3a8
      [  206.997157]  ftrace_filter_write+0x44/0x60
      [  206.998971]  __vfs_write+0x64/0xf0
      [  206.999285]  vfs_write+0x14c/0x2f0
      [  206.999591]  ksys_write+0xbc/0x1b0
      [  206.999888]  __arm64_sys_write+0x3c/0x58
      [  207.000246]  el0_svc_common.constprop.0+0x408/0x5f0
      [  207.000607]  el0_svc_handler+0x144/0x1c8
      [  207.000916]  el0_svc+0x8/0xc
      [  207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
      [  207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
      [  207.010126] Kernel panic - not syncing: Fatal exception
      [  207.011322] SMP: stopping secondary CPUs
      [  207.013956] Dumping ftrace buffer:
      [  207.014595]    (ftrace buffer empty)
      [  207.015632] Kernel Offset: disabled
      [  207.017187] CPU features: 0x002,20006008
      [  207.017985] Memory Limit: none
      [  207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.comSigned-off-by: default avatarWei Li <liwei391@huawei.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      04e03d9a
    • Josh Poimboeuf's avatar
      module: Fix livepatch/ftrace module text permissions race · 9f255b63
      Josh Poimboeuf authored
      It's possible for livepatch and ftrace to be toggling a module's text
      permissions at the same time, resulting in the following panic:
      
        BUG: unable to handle page fault for address: ffffffffc005b1d9
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0003) - permissions violation
        PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
        Oops: 0003 [#1] PREEMPT SMP PTI
        CPU: 1 PID: 453 Comm: insmod Tainted: G           O  K   5.2.0-rc1-a188339c #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
        RIP: 0010:apply_relocate_add+0xbe/0x14c
        Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
        RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
        RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
        RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
        RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
        R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
        R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
        FS:  00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         klp_init_object_loaded+0x10f/0x219
         ? preempt_latency_start+0x21/0x57
         klp_enable_patch+0x662/0x809
         ? virt_to_head_page+0x3a/0x3c
         ? kfree+0x8c/0x126
         patch_init+0x2ed/0x1000 [livepatch_test02]
         ? 0xffffffffc0060000
         do_one_initcall+0x9f/0x1c5
         ? kmem_cache_alloc_trace+0xc4/0xd4
         ? do_init_module+0x27/0x210
         do_init_module+0x5f/0x210
         load_module+0x1c41/0x2290
         ? fsnotify_path+0x3b/0x42
         ? strstarts+0x2b/0x2b
         ? kernel_read+0x58/0x65
         __do_sys_finit_module+0x9f/0xc3
         ? __do_sys_finit_module+0x9f/0xc3
         __x64_sys_finit_module+0x1a/0x1c
         do_syscall_64+0x52/0x61
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The above panic occurs when loading two modules at the same time with
      ftrace enabled, where at least one of the modules is a livepatch module:
      
      CPU0					CPU1
      klp_enable_patch()
        klp_init_object_loaded()
          module_disable_ro()
          					ftrace_module_enable()
      					  ftrace_arch_code_modify_post_process()
      				    	    set_all_modules_text_ro()
            klp_write_object_relocations()
              apply_relocate_add()
      	  *patches read-only code* - BOOM
      
      A similar race exists when toggling ftrace while loading a livepatch
      module.
      
      Fix it by ensuring that the livepatch and ftrace code patching
      operations -- and their respective permissions changes -- are protected
      by the text_mutex.
      
      Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.comReported-by: default avatarJohannes Erdfelt <johannes@erdfelt.com>
      Fixes: 444d13ff ("modules: add ro_after_init support")
      Acked-by: default avatarJessica Yu <jeyu@kernel.org>
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.com>
      Reviewed-by: default avatarMiroslav Benes <mbenes@suse.cz>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      9f255b63
    • Eiichi Tsukata's avatar
      tracing/uprobe: Fix obsolete comment on trace_uprobe_create() · a4158345
      Eiichi Tsukata authored
      Commit 0597c49c ("tracing/uprobes: Use dyn_event framework for
      uprobe events") cleaned up the usage of trace_uprobe_create(), and the
      function has been no longer used for removing uprobe/uretprobe.
      
      Link: http://lkml.kernel.org/r/20190614074026.8045-2-devel@etsukata.comReviewed-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarEiichi Tsukata <devel@etsukata.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      a4158345
    • Eiichi Tsukata's avatar
      tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create() · f01098c7
      Eiichi Tsukata authored
      Just like the case of commit 8b05a3a7 ("tracing/kprobes: Fix NULL
      pointer dereference in trace_kprobe_create()"), writing an incorrectly
      formatted string to uprobe_events can trigger NULL pointer dereference.
      
      Reporeducer:
      
        # echo r > /sys/kernel/debug/tracing/uprobe_events
      
      dmesg:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 8000000079d12067 P4D 8000000079d12067 PUD 7b7ab067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP PTI
        CPU: 0 PID: 1903 Comm: bash Not tainted 5.2.0-rc3+ #15
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
        RIP: 0010:strchr+0x0/0x30
        Code: c0 eb 0d 84 c9 74 18 48 83 c0 01 48 39 d0 74 0f 0f b6 0c 07 3a 0c 06 74 ea 19 c0 83 c8 01 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 <0f> b6 07 89 f2 40 38 f0 75 0e eb 13 0f b6 47 01 48 83 c
        RSP: 0018:ffffb55fc0403d10 EFLAGS: 00010293
      
        RAX: ffff993ffb793400 RBX: 0000000000000000 RCX: ffffffffa4852625
        RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000000
        RBP: ffffb55fc0403dd0 R08: ffff993ffb793400 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
        R13: ffff993ff9cc1668 R14: 0000000000000001 R15: 0000000000000000
        FS:  00007f30c5147700(0000) GS:ffff993ffda00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000007b628000 CR4: 00000000000006f0
        Call Trace:
         trace_uprobe_create+0xe6/0xb10
         ? __kmalloc_track_caller+0xe6/0x1c0
         ? __kmalloc+0xf0/0x1d0
         ? trace_uprobe_create+0xb10/0xb10
         create_or_delete_trace_uprobe+0x35/0x90
         ? trace_uprobe_create+0xb10/0xb10
         trace_run_command+0x9c/0xb0
         trace_parse_run_command+0xf9/0x1eb
         ? probes_open+0x80/0x80
         __vfs_write+0x43/0x90
         vfs_write+0x14a/0x2a0
         ksys_write+0xa2/0x170
         do_syscall_64+0x7f/0x200
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Link: http://lkml.kernel.org/r/20190614074026.8045-1-devel@etsukata.com
      
      Cc: stable@vger.kernel.org
      Fixes: 0597c49c ("tracing/uprobes: Use dyn_event framework for uprobe events")
      Reviewed-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarEiichi Tsukata <devel@etsukata.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      f01098c7
    • YueHaibing's avatar
      tracing: Make two symbols static · ff585c5b
      YueHaibing authored
      Fix sparse warnings:
      
      kernel/trace/trace.c:6927:24: warning:
       symbol 'get_tracing_log_err' was not declared. Should it be static?
      kernel/trace/trace.c:8196:15: warning:
       symbol 'trace_instance_dir' was not declared. Should it be static?
      
      Link: http://lkml.kernel.org/r/20190614153210.24424-1-yuehaibing@huawei.comAcked-by: default avatarTom Zanussi <tom.zanussi@linux.intel.com>
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      ff585c5b
    • Vasily Gorbik's avatar
      tracing: avoid build warning with HAVE_NOP_MCOUNT · cbdaeaf0
      Vasily Gorbik authored
      Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
      and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
      testing CC_USING_* defines) to avoid conditional compilation and fix
      the following gcc 9 warning on s390:
      
      kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
      but not used [-Wunused-function]
      
      Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours
      
      Fixes: 2f4df001 ("tracing: Add -mcount-nop option support")
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      cbdaeaf0
    • Eiichi Tsukata's avatar
      tracing: Fix out-of-range read in trace_stack_print() · becf33f6
      Eiichi Tsukata authored
      Puts range check before dereferencing the pointer.
      
      Reproducer:
      
        # echo stacktrace > trace_options
        # echo 1 > events/enable
        # cat trace > /dev/null
      
      KASAN report:
      
        ==================================================================
        BUG: KASAN: use-after-free in trace_stack_print+0x26b/0x2c0
        Read of size 8 at addr ffff888069d20000 by task cat/1953
      
        CPU: 0 PID: 1953 Comm: cat Not tainted 5.2.0-rc3+ #5
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
        Call Trace:
         dump_stack+0x8a/0xce
         print_address_description+0x60/0x224
         ? trace_stack_print+0x26b/0x2c0
         ? trace_stack_print+0x26b/0x2c0
         __kasan_report.cold+0x1a/0x3e
         ? trace_stack_print+0x26b/0x2c0
         kasan_report+0xe/0x20
         trace_stack_print+0x26b/0x2c0
         print_trace_line+0x6ea/0x14d0
         ? tracing_buffers_read+0x700/0x700
         ? trace_find_next_entry_inc+0x158/0x1d0
         s_show+0xea/0x310
         seq_read+0xaa7/0x10e0
         ? seq_escape+0x230/0x230
         __vfs_read+0x7c/0x100
         vfs_read+0x16c/0x3a0
         ksys_read+0x121/0x240
         ? kernel_write+0x110/0x110
         ? perf_trace_sys_enter+0x8a0/0x8a0
         ? syscall_slow_exit_work+0xa9/0x410
         do_syscall_64+0xb7/0x390
         ? prepare_exit_to_usermode+0x165/0x200
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f867681f910
        Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 04
        RSP: 002b:00007ffdabf23488 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
        RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f867681f910
        RDX: 0000000000020000 RSI: 00007f8676cde000 RDI: 0000000000000003
        RBP: 00007f8676cde000 R08: ffffffffffffffff R09: 0000000000000000
        R10: 0000000000000871 R11: 0000000000000246 R12: 00007f8676cde000
        R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000ec0
      
        Allocated by task 1214:
         save_stack+0x1b/0x80
         __kasan_kmalloc.constprop.0+0xc2/0xd0
         kmem_cache_alloc+0xaf/0x1a0
         getname_flags+0xd2/0x5b0
         do_sys_open+0x277/0x5a0
         do_syscall_64+0xb7/0x390
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 1214:
         save_stack+0x1b/0x80
         __kasan_slab_free+0x12c/0x170
         kmem_cache_free+0x8a/0x1c0
         putname+0xe1/0x120
         do_sys_open+0x2c5/0x5a0
         do_syscall_64+0xb7/0x390
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        The buggy address belongs to the object at ffff888069d20000
         which belongs to the cache names_cache of size 4096
        The buggy address is located 0 bytes inside of
         4096-byte region [ffff888069d20000, ffff888069d21000)
        The buggy address belongs to the page:
        page:ffffea0001a74800 refcount:1 mapcount:0 mapping:ffff88806ccd1380 index:0x0 compound_mapcount: 0
        flags: 0x100000000010200(slab|head)
        raw: 0100000000010200 dead000000000100 dead000000000200 ffff88806ccd1380
        raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
      
        Memory state around the buggy address:
         ffff888069d1ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         ffff888069d1ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        >ffff888069d20000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                           ^
         ffff888069d20080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
         ffff888069d20100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ==================================================================
      
      Link: http://lkml.kernel.org/r/20190610040016.5598-1-devel@etsukata.com
      
      Fixes: 4285f2fc ("tracing: Remove the ULONG_MAX stack trace hackery")
      Signed-off-by: default avatarEiichi Tsukata <devel@etsukata.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      becf33f6
    • Andreas Gruenbacher's avatar
      gfs2: Fix rounding error in gfs2_iomap_page_prepare · 2741b672
      Andreas Gruenbacher authored
      The pos and len arguments to the iomap page_prepare callback are not
      block aligned, so we need to take that into account when computing the
      number of blocks.
      
      Fixes: d0a22a4b ("gfs2: Fix iomap write page reclaim deadlock")
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      2741b672
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 72a20cee
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Here are some arm64 fixes for -rc5.
      
        The only non-trivial change (in terms of the diffstat) is fixing our
        SVE ptrace API for big-endian machines, but the majority of this is
        actually the addition of much-needed comments and updates to the
        documentation to try to avoid this mess biting us again in future.
      
        There are still a couple of small things on the horizon, but nothing
        major at this point.
      
        Summary:
      
         - Fix broken SVE ptrace API when running in a big-endian configuration
      
         - Fix performance regression due to off-by-one in TLBI range checking
      
         - Fix build regression when using Clang"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/sve: Fix missing SVE/FPSIMD endianness conversions
        arm64: tlbflush: Ensure start/end of address range are aligned to stride
        arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS
      72a20cee
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · fd6b99fa
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/devm_memremap_pages: fix final page put race
        PCI/P2PDMA: track pgmap references per resource, not globally
        lib/genalloc: introduce chunk owners
        PCI/P2PDMA: fix the gen_pool_add_virt() failure path
        mm/devm_memremap_pages: introduce devm_memunmap_pages
        drivers/base/devres: introduce devm_release_action()
        mm/vmscan.c: fix trying to reclaim unevictable LRU page
        coredump: fix race condition between collapse_huge_page() and core dumping
        mm/mlock.c: change count_mm_mlocked_page_nr return type
        mm: mmu_gather: remove __tlb_reset_range() for force flush
        fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
        mm/vmscan.c: fix recent_rotated history
        mm/mlock.c: mlockall error for flag MCL_ONFAULT
        scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE
        mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
        mm: memcontrol: don't batch updates of local VM stats and events
      fd6b99fa
    • Daniel Vetter's avatar
      Merge branch 'drm-fixes-5.2' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · e14c5873
      Daniel Vetter authored
      Fixes for 5.2:
      - Extend previous vce fix for resume to uvd and vcn
      - Fix bounds checking in ras debugfs interface
      - Fix a regression on SI using amdgpu
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190613021856.3307-1-alexander.deucher@amd.com
      e14c5873
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · c78ad1be
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - three fixes for Intel VT-d to fix a potential dead-lock, a formatting
         fix and a bit setting fix
      
       - one fix for the ARM-SMMU to make it work on some platforms with
         sub-optimal SMMU emulation
      
      * tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/arm-smmu: Avoid constant zero in TLBI writes
        iommu/vt-d: Set the right field for Page Walk Snoop
        iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock
        iommu: Add missing new line for dma type
      c78ad1be
    • Linus Torvalds's avatar
      Merge tag 'gpio-v5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 7617c9a0
      Linus Torvalds authored
      Pull GPIO fix from Linus Walleij:
       "A single fix for the PCA953x driver affecting some fringe variants of
        the chip"
      
      * tag 'gpio-v5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: pca953x: hack to fix 24 bit gpio expanders
      7617c9a0
    • Linus Torvalds's avatar
      Merge tag 'sound-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · bcb46a0e
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "It might feel like deja vu to receive a bulk of changes at rc5, and it
        happens again; we've got a collection of fixes for ASoC. Most of fixes
        are targeted for the newly merged SOF (Sound Open Firmware) stuff and
        the relevant fixes for Intel platforms.
      
        Other than that, there are a few regression fixes for the recent ASoC
        core changes and HD-audio quirk, as well as a couple of FireWire fixes
        and for other ASoC codecs"
      
      * tag 'sound-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (54 commits)
        Revert "ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops"
        ALSA: ice1712: Check correct return value to snd_i2c_sendbytes (EWS/DMX 6Fire)
        ALSA: oxfw: allow PCM capture for Stanton SCS.1m
        ALSA: firewire-motu: fix destruction of data for isochronous resources
        ASoC: Intel: sst: fix kmalloc call with wrong flags
        ASoC: core: Fix deadlock in snd_soc_instantiate_card()
        SoC: rt274: Fix internal jack assignment in set_jack callback
        ALSA: hdac: fix memory release for SST and SOF drivers
        ASoC: SOF: Intel: hda: use the defined ppcap functions
        ASoC: core: move DAI pre-links initiation to snd_soc_instantiate_card
        ASoC: Intel: cht_bsw_rt5672: fix kernel oops with platform_name override
        ASoC: Intel: cht_bsw_nau8824: fix kernel oops with platform_name override
        ASoC: Intel: bytcht_es8316: fix kernel oops with platform_name override
        ASoC: Intel: cht_bsw_max98090: fix kernel oops with platform_name override
        ASoC: sun4i-i2s: Add offset to RX channel select
        ASoC: sun4i-i2s: Fix sun8i tx channel offset mask
        ASoC: max98090: remove 24-bit format support if RJ is 0
        ASoC: da7219: Fix build error without CONFIG_I2C
        ASoC: SOF: Intel: hda: Fix COMPILE_TEST build error
        ASoC: SOF: fix DSP oops definitions in FW ABI
        ...
      bcb46a0e
    • Andrey Ryabinin's avatar
      x86/kasan: Fix boot with 5-level paging and KASAN · f3176ec9
      Andrey Ryabinin authored
      Since commit d52888aa ("x86/mm: Move LDT remap out of KASLR region on
      5-level paging") kernel doesn't boot with KASAN on 5-level paging machines.
      The bug is actually in early_p4d_offset() and introduced by commit
      12a8cc7f ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
      
      early_p4d_offset() tries to convert pgd_val(*pgd) value to a physical
      address. This doesn't make sense because pgd_val() already contains the
      physical address.
      
      It did work prior to commit d52888aa because the result of
      "__pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK" was the same as "pgd_val(*pgd)
      & PTE_PFN_MASK". __pa_nodebug() just set some high bits which were masked
      out by applying PTE_PFN_MASK.
      
      After the change of the PAGE_OFFSET offset in commit d52888aa
      __pa_nodebug(pgd_val(*pgd)) started to return a value with more high bits
      set and PTE_PFN_MASK wasn't enough to mask out all of them. So it returns a
      wrong not even canonical address and crashes on the attempt to dereference
      it.
      
      Switch back to pgd_val() & PTE_PFN_MASK to cure the issue.
      
      Fixes: 12a8cc7f ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
      Reported-by: default avatarKirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Cc: stable@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20190614143149.2227-1-aryabinin@virtuozzo.com
      f3176ec9