Commits · ad9001f2f41198784b0423646450ba2cb24793a3 · Kirill Smelkov / linux

20 Nov, 2019 24 commits

PCI/PM: Add missing link delays required by the PCIe spec · ad9001f2

Mika Westerberg authored Nov 12, 2019

Currently Linux does not follow PCIe spec regarding the required delays
after reset. A concrete example is a Thunderbolt add-in-card that consists
of a PCIe switch and two PCIe endpoints:

  +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
                                  +-01.0-[04-36]-- DS hotplug port
                                  +-02.0-[37]----00.0 xHCI controller
                                  \-04.0-[38-6b]-- DS hotplug port

The root port (1b.0) and the PCIe switch downstream ports are all PCIe Gen3
so they support 8GT/s link speeds.

We wait for the PCIe hierarchy to enter D3cold (runtime):

  pcieport 0000:00:1b.0: power state changed by ACPI to D3cold

When it wakes up from D3cold, according to the PCIe 5.0 section 5.8 the
PCIe switch is put to reset and its power is re-applied. This means that we
must follow the rules in PCIe 5.0 section 6.6.1.

For the PCIe Gen3 ports we are dealing with here, the following applies:

  With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
  software must wait a minimum of 100 ms after Link training completes
  before sending a Configuration Request to the device immediately below
  that Port. Software can determine when Link training completes by polling
  the Data Link Layer Link Active bit or by setting up an associated
  interrupt (see Section 6.7.3.3).

Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):

  0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
  0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
  0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0

I've instrumented the kernel with some additional logging so we can see the
actual delays performed:

  pcieport 0000:00:1b.0: power state changed by ACPI to D0
  pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
  pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
  pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
  pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms

For the switch upstream port (01:00.0 reachable through 00:1b.0 root port)
we wait for 100 ms but not taking into account the DLLLA requirement. We
then wait 10 ms for D3hot -> D0 transition of the root port and the two
downstream hotplug ports. This means that we deviate from what the spec
requires.

Performing the same check for system sleep (s2idle) transitions it turns
out to be even worse. None of the mandatory delays are performed. If this
would be S3 instead of s2idle then according to PCI FW spec 3.2 section
4.6.8. there is a specific _DSM that allows the OS to skip the delays but
this platform does not provide the _DSM and does not go to S3 anyway so no
firmware is involved that could already handle these delays.

On this particular platform these delays are not actually needed because
there is an additional delay as part of the ACPI power resource that is
used to turn on power to the hierarchy but since that additional delay is
not required by any of standards (PCIe, ACPI) it is not present in the
Intel Ice Lake, for example where missing the mandatory delays causes
pciehp to start tearing down the stack too early (links are not yet
trained). Below is an example how it looks like when this happens:

  pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
  pcieport 0000:87:04.0: PME# disabled
  pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
  pcieport 0000:86:00.0: Refused to change power state, currently in D3
  pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
  pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
  ...

There is also one reported case (see the bugzilla link below) where the
missing delay causes xHCI on a Titan Ridge controller fail to runtime
resume when USB-C dock is plugged. This does not involve pciehp but instead
the PCI core fails to runtime resume the xHCI device:

  pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
  pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
  xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
  xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
  xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
  ...

Add a new function pci_bridge_wait_for_secondary_bus() that is called on
PCI core resume and runtime resume paths accordingly if the bridge entered
D3cold (and thus went through reset).

This is second attempt to add the missing delays. The previous solution in
c2bf1fc2 ("PCI: Add missing link delays required by the PCIe spec") was
reverted because of two issues it caused:

  1. One system become unresponsive after S3 resume due to PME service
     spinning in pcie_pme_work_fn(). The root port in question reports that
     the xHCI sent PME but the xHCI device itself does not have PME status
     set. The PME status bit is never cleared in the root port resulting
     the indefinite loop in pcie_pme_work_fn().

  2. Slows down resume if the root/downstream port does not support Data
     Link Layer Active Reporting because pcie_wait_for_link_delay() waits
     1100 ms in that case.

This version should avoid the above issues because we restrict the delay to
happen only if the port went into D3cold.

Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
Link: https://lore.kernel.org/r/20191112091617.70282-3-mika.westerberg@linux.intel.comReported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

ad9001f2

PCI/PM: Add pcie_wait_for_link_delay() · 4827d638

Mika Westerberg authored Nov 12, 2019

Add pcie_wait_for_link_delay(). Similar to pcie_wait_for_link() but allows
passing custom activation delay in milliseconds.

Link: https://lore.kernel.org/r/20191112091617.70282-2-mika.westerberg@linux.intel.comSigned-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

4827d638

PCI/PM: Return error when changing power state from D3cold · 327ccbbc

Bjorn Helgaas authored Aug 01, 2019

pci_raw_set_power_state() uses the Power Management capability to change a
device's power state. The capability is in config space, which is
accessible in D0, D1, D2, and D3hot, but not in D3cold.

If we call pci_raw_set_power_state() on a device that's in D3cold, config
reads fail and return ~0 data, which we erroneously interpreted as "the
device is in D3hot", leading to messages like this:

pcieport 0000:03:00.0: Refused to change power state, currently in D3

The PCI_PM_CTRL has several RsvdP fields, so ~0 is never a valid register
value. If we get that value, print a more informative message and return
an error.

Changing the power state of a device from D3cold must be done by a platform
power management method or some other non-config space mechanism.

Link: https://lore.kernel.org/r/20190822200551.129039-4-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

327ccbbc

PCI/PM: Decode D3cold power state correctly · e43f15ea

Bjorn Helgaas authored Aug 02, 2019

Use pci_power_name() to print pci_power_t correctly.  This changes:

  "state 0" or "D0"   to   "D0"
  "state 1" or "D1"   to   "D1"
  "state 2" or "D2"   to   "D2"
  "state 3" or "D3"   to   "D3hot"
  "state 4" or "D4"   to   "D3cold"

Changes dmesg logging only, no other functional change intended.

Link: https://lore.kernel.org/r/20190822200551.129039-3-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

e43f15ea

PCI/PM: Fold __pci_complete_power_transition() into its caller · 9c77e63b

Rafael J. Wysocki authored Nov 05, 2019

Because pci_set_power_state() has become the only caller of
__pci_complete_power_transition(), there is no need for the latter to
be a separate function any more, so fold it into the former, drop a
redundant check and reduce the number of lines of code somewhat.

Code rearrangement, no intentional functional impact.

Link: https://lore.kernel.org/r/15576968.k611qn3UU0@kreacherSigned-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

9c77e63b

PCI/PM: Avoid exporting __pci_complete_power_transition() · d6aa37cd

Rafael J. Wysocki authored Nov 05, 2019

Notice that radeon_set_suspend(), which is the only caller of
__pci_complete_power_transition() outside of pci.c, really only
cares about the pci_platform_power_transition() invoked by it,
so export the latter instead of it, update the radeon driver to
call pci_platform_power_transition() directly and make
__pci_complete_power_transition() static.

Code rearrangement, no intentional functional impact.

Link: https://lore.kernel.org/r/1731661.ykamz2Tiuf@kreacherSigned-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

d6aa37cd

PCI/PM: Fold __pci_start_power_transition() into its caller · dc2256b0

Rafael J. Wysocki authored Nov 05, 2019

Because pci_power_up() has become the only caller of
__pci_start_power_transition(), there is no need for the latter to
be a separate function any more, so fold it into the former, drop a
redundant check and reduce the number of lines of code somewhat.

Code rearrangement, no intentional functional impact.

Link: https://lore.kernel.org/r/3458080.lsoDbfkST9@kreacherSigned-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

dc2256b0

PCI/PM: Use pci_power_up() in pci_set_power_state() · adfac8f6

Rafael J. Wysocki authored Nov 05, 2019

Make it explicitly clear that the code to put devices into D0 in
pci_set_power_state() and in pci_pm_default_resume_early() is the
same by making the latter use pci_power_up() for transitions into D0.

Code rearrangement, no intentional functional impact.

Link: https://lore.kernel.org/r/2520019.OZ1nXS5aSj@kreacherSigned-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

adfac8f6

PCI/PM: Move power state update away from pci_power_up() · 81cfa590

Rafael J. Wysocki authored Nov 05, 2019

Move the invocation of pci_update_current_state() from pci_power_up() to
pci_pm_default_resume_early(), which is the only caller of that function.

Preparatory change, no functional impact.

Link: https://lore.kernel.org/r/37482337.udjOGdOKNb@kreacherSigned-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

81cfa590

PCI/PM: Remove unused pci_driver.suspend_late() hook · 1a1daf09

Bjorn Helgaas authored Oct 31, 2019

The struct pci_driver.suspend_late() hook is one of the legacy PCI power
management callbacks, and there are no remaining users of it. Remove it.

Link: https://lore.kernel.org/r/20191101204558.210235-7-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1a1daf09

PCI/PM: Remove unused pci_driver.resume_early() hook · 89cdbc35

Bjorn Helgaas authored Oct 31, 2019

The struct pci_driver.resume_early() hook is one of the legacy PCI power
management callbacks, and there are no remaining users of it. Remove it.

Link: https://lore.kernel.org/r/20191101204558.210235-6-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

89cdbc35

xen-platform: Convert to generic power management · 77b84bb3

Bjorn Helgaas authored Nov 01, 2019

Convert xen-platform from the legacy PCI power management callbacks to the
generic operations. This is one step towards removing support for the
legacy PCI callbacks.

The generic .resume_noirq() operation is called by pci_pm_resume_noirq() at
the same point the legacy PCI .resume_early() callback was, so this patch
should not change the xen-platform behavior.

Link: https://lore.kernel.org/r/20191101204558.210235-5-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: KarimAllah Ahmed <karahmed@amazon.de>

77b84bb3

PCI/PM: Simplify pci_set_power_state() · baef7f8e

Bjorn Helgaas authored Oct 16, 2019

Check for the PCI_DEV_FLAGS_NO_D3 quirk early, before calling
__pci_start_power_transition().  This way all the cases where we don't need
to do anything at all are checked up front.

This doesn't fix anything because if the caller requested D3hot or D3cold,
__pci_start_power_transition() is a no-op.  But calling it is pointless and
makes the code harder to analyze.

Link: https://lore.kernel.org/r/20191101204558.210235-4-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

baef7f8e

PCI/PM: Expand PM reset messages to mention D3hot (not just D3) · 993cc6d1

Bjorn Helgaas authored Oct 28, 2019

pci_pm_reset() resets a device by putting it in D3hot and bringing it back
to D0. Clarify related messages to mention "D3hot" explicitly instead of
just "D3".

Link: https://lore.kernel.org/r/20191101204558.210235-3-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

993cc6d1

PCI/PM: Apply D2 delay as milliseconds, not microseconds · 7e24bc34

Bjorn Helgaas authored Oct 23, 2019

PCI_PM_D2_DELAY is defined as 200, which is milliseconds, but previously we
used udelay(), which only waited for 200 microseconds. Use msleep()
instead so we wait the correct amount of time. See PCIe r5.0, sec 5.9.

Link: https://lore.kernel.org/r/20191101204558.210235-2-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

7e24bc34

PCI/PM: Use pci_WARN() to include device information · 12bcae44

Bjorn Helgaas authored Oct 07, 2019

Add and use pci_WARN() wrappers so warnings include device information.

Link: https://lore.kernel.org/r/20191017212851.54237-3-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

12bcae44

PCI/PM: Use PCI dev_printk() wrappers for consistency · 6941a0c2

Bjorn Helgaas authored Oct 07, 2019

Use the PCI dev_printk() wrappers for consistency with the rest of the PCI
core. No functional change intended.

Link: https://lore.kernel.org/r/20191017212851.54237-2-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

6941a0c2

PCI/PM: Wrap long lines in documentation · b64cf7a1

Bjorn Helgaas authored Oct 08, 2019

Documentation/power/pci.rst is wrapped to fit in 80 columns, but directory
structure changes made a few lines longer. Wrap them so they all fit in 80
columns again.

Link: https://lore.kernel.org/r/20191014230016.240912-7-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

b64cf7a1

PCI/PM: Note that PME can be generated from D0 · 85a9b050

Bjorn Helgaas authored Oct 08, 2019

Per PCIe r5.0 sec 7.5.2.1, PME may be generated from D0, so update
Documentation/power/pci.rst to reflect that.

Link: https://lore.kernel.org/r/20191016194450.68959-1-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

85a9b050

PCI/PM: Make power management op coding style consistent · 6da2f2cc

Bjorn Helgaas authored Oct 14, 2019

Some of the power management ops use this style:

  struct device_driver *drv = dev->driver;
  if (drv && drv->pm && drv->pm->prepare(dev))
    drv->pm->prepare(dev);

while others use this:

  const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
  if (pm && pm->runtime_resume)
    pm->runtime_resume(dev);

Convert the first style to the second so they're all consistent.  Remove
local "error" variables when unnecessary.  No functional change intended.

Link: https://lore.kernel.org/r/20191014230016.240912-6-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

6da2f2cc

PCI/PM: Run resume fixups before disabling wakeup events · f7b32a86

Bjorn Helgaas authored Oct 12, 2019

pci_pm_resume() and pci_pm_restore() call pci_pm_default_resume(), which
runs resume fixups before disabling wakeup events:

  static void pci_pm_default_resume(struct pci_dev *pci_dev)
  {
    pci_fixup_device(pci_fixup_resume, pci_dev);
    pci_enable_wake(pci_dev, PCI_D0, false);
  }

pci_pm_runtime_resume() does both of these, but in the opposite order:

  pci_enable_wake(pci_dev, PCI_D0, false);
  pci_fixup_device(pci_fixup_resume, pci_dev);

We should always use the same ordering unless there's a reason to do
otherwise.  Change pci_pm_runtime_resume() to call pci_pm_default_resume()
instead of open-coding this, so the fixups are always done before disabling
wakeup events.

pci_pm_default_resume() is called from pci_pm_runtime_resume(), which is
under #ifdef CONFIG_PM.  If SUSPEND and HIBERNATION are disabled, PM_SLEEP
is disabled also, so move pci_pm_default_resume() from #ifdef
CONFIG_PM_SLEEP to #ifdef CONFIG_PM.

Link: https://lore.kernel.org/r/20191014230016.240912-5-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

f7b32a86

PCI/PM: Clear PCIe PME Status even for legacy power management · ec6a75ef

Bjorn Helgaas authored Oct 10, 2019

Previously, pci_pm_resume_noirq() cleared the PME Status bit in the Root
Status register only if the device had no driver or the driver did not
implement legacy power management. It should clear PME Status regardless
of what sort of power management the driver supports, so do this before
checking for legacy power management.

This affects Root Ports and Root Complex Event Collectors, for which the
usual driver is the PCIe portdrv, which implements new power management, so
this change is just on principle, not to fix any actual defects.

Fixes: a39bd851 ("PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver")
Link: https://lore.kernel.org/r/20191014230016.240912-4-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ec6a75ef

PCI/PM: Correct pci_pm_thaw_noirq() documentation · dc68b406

Bjorn Helgaas authored Oct 14, 2019

According to the documentation, pci_pm_thaw_noirq() did not put the device
into the full-power state and restore its standard configuration registers.
This is incorrect, so update the documentation to match the code.

Link: https://lore.kernel.org/r/20191014230016.240912-3-helgaas@kernel.orgSigned-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

dc68b406

PCI/PM: Always return devices to D0 when thawing · f2c33cca

Dexuan Cui authored Aug 14, 2019

pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
configuration registers, but previously it only did that for devices whose
drivers implemented the new power management ops.

Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
devices, creating a hibernation image, thawing devices, writing the image,
and powering off.  The fact that thawing did not return devices with legacy
power management to D0 caused errors, e.g., in this path:

  pci_pm_thaw_noirq
    if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
      return pci_legacy_resume_early(dev)   # ... legacy PM skips the rest
    pci_set_power_state(pci_dev, PCI_D0)
    pci_restore_state(pci_dev)
  pci_pm_thaw
    if (pci_has_legacy_pm_support(pci_dev))
      pci_legacy_resume
	drv->resume
	  mlx4_resume
	    ...
	      pci_enable_msix_range
	        ...
		  if (dev->current_state != PCI_D0)  # <---
		    return -EINVAL;

which caused these warnings:

  mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
  PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
  PM: Device a6d1:00:02.0 failed to thaw: error -95

Return devices to D0 and restore config registers for all devices, not just
those whose drivers support new power management.

[bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(),
update comment, add stable tag, commit log]
Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COMSigned-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org	# v4.13+

f2c33cca

20 Oct, 2019 6 commits

Linux 5.4-rc4 · 7d194c21
Linus Torvalds authored Oct 20, 2019

7d194c21

Merge tag 'kbuild-fixes-v5.4-2' of... · e2ab4ef8

Linus Torvalds authored Oct 20, 2019

Merge tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild fixes from Masahiro Yamada:

 - fix a bashism of setlocalversion

 - do not use the too new --sort option of tar

* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kheaders: substituting --sort in archive creation
  scripts: setlocalversion: fix a bashism
  kbuild: update comment about KBUILD_ALLDIRS

e2ab4ef8

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4fe34d61

Linus Torvalds authored Oct 20, 2019

Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - Prevent a NULL pointer dereference in the X2APIC code in case of a
     CPU hotplug failure.

   - Prevent boot failures on HP superdome machines by invalidating the
     level2 kernel pagetable entries outside of the kernel area as
     invalid so BIOS reserved space won't be touched unintentionally.

     Also ensure that memory holes are rounded up to the next PMD
     boundary correctly.

   - Enable X2APIC support on Hyper-V to prevent boot failures.

   - Set the paravirt name when running on Hyper-V for consistency

   - Move a function under the appropriate ifdef guard to prevent build
     warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
  x86/hyperv: Set pv_info.name to "Hyper-V"
  x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
  x86/hyperv: Make vapic support x2apic mode
  x86/boot/64: Round memory hole size up to next PMD page
  x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area

4fe34d61

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 81c4bc31

Linus Torvalds authored Oct 20, 2019

Pull irq fixes from Thomas Gleixner:
 "A small set of irq chip driver fixes and updates:

   - Update the SIFIVE PLIC interrupt driver to use the fasteoi handler
     to address the shortcomings of the existing flow handling which was
     prone to lose interrupts

   - Use the proper limit for GIC interrupt line numbers

   - Add retrigger support for the recently merged Anapurna Labs Fabric
     interrupt controller to make it complete

   - Enable the ATMEL AIC5 interrupt controller driver on the new
     SAM9X60 SoC"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/sifive-plic: Switch to fasteoi flow
  irqchip/gic-v3: Fix GIC_LINE_NR accessor
  irqchip/atmel-aic5: Add support for sam9x60 irqchip
  irqchip/al-fic: Add support for irq retrigger

81c4bc31

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 188768f3

Linus Torvalds authored Oct 20, 2019

Pull hrtimer fixlet from Thomas Gleixner:
 "A single commit annotating the lockcless access to timer->base with
  READ_ONCE() and adding the WRITE_ONCE() counterparts for completeness"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Annotate lockless access to timer->base

188768f3

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 589f1222

Linus Torvalds authored Oct 20, 2019

Pull stop-machine fix from Thomas Gleixner:
 "A single fix, amending stop machine with WRITE/READ_ONCE() to address
  the fallout of KCSAN"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine: Avoid potential race behaviour

589f1222

19 Oct, 2019 10 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 531e93d1

Linus Torvalds authored Oct 19, 2019

Pull networking fixes from David Miller:
 "I was battling a cold after some recent trips, so quite a bit piled up
  meanwhile, sorry about that.

  Highlights:

   1) Fix fd leak in various bpf selftests, from Brian Vazquez.

   2) Fix crash in xsk when device doesn't support some methods, from
      Magnus Karlsson.

   3) Fix various leaks and use-after-free in rxrpc, from David Howells.

   4) Fix several SKB leaks due to confusion of who owns an SKB and who
      should release it in the llc code. From Eric Biggers.

   5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.

   6) Jumbo packets don't work after resume on r8169, as the BIOS resets
      the chip into non-jumbo mode during suspend. From Heiner Kallweit.

   7) Corrupt L2 header during MPLS push, from Davide Caratti.

   8) Prevent possible infinite loop in tc_ctl_action, from Eric
      Dumazet.

   9) Get register bits right in bcmgenet driver, based upon chip
      version. From Florian Fainelli.

  10) Fix mutex problems in microchip DSA driver, from Marek Vasut.

  11) Cure race between route lookup and invalidation in ipv4, from Wei
      Wang.

  12) Fix performance regression due to false sharing in 'net'
      structure, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
  net: reorder 'struct net' fields to avoid false sharing
  net: dsa: fix switch tree list
  net: ethernet: dwmac-sun8i: show message only when switching to promisc
  net: aquantia: add an error handling in aq_nic_set_multicast_list
  net: netem: correct the parent's backlog when corrupted packet was dropped
  net: netem: fix error path for corrupted GSO frames
  macb: propagate errors when getting optional clocks
  xen/netback: fix error path of xenvif_connect_data()
  net: hns3: fix mis-counting IRQ vector numbers issue
  net: usb: lan78xx: Connect PHY before registering MAC
  vsock/virtio: discard packets if credit is not respected
  vsock/virtio: send a credit update when buffer size is changed
  mlxsw: spectrum_trap: Push Ethernet header before reporting trap
  net: ensure correct skb->tstamp in various fragmenters
  net: bcmgenet: reset 40nm EPHY on energy detect
  net: bcmgenet: soft reset 40nm EPHYs before MAC init
  net: phy: bcm7xxx: define soft_reset for 40nm EPHY
  net: bcmgenet: don't set phydev->link from MAC
  net: Update address for MediaTek ethernet driver in MAINTAINERS
  ipv4: fix race condition between route lookup and invalidation
  ...

531e93d1

net: reorder 'struct net' fields to avoid false sharing · 2a06b898

Eric Dumazet authored Oct 18, 2019

Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.

Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls)

So cpus might have to reload a contended cache line in
net_hash_mix(net) calls.

We need to reorder 'struct net' fields to move @hash_mix
in a read mostly cache line.

We move in the first cache line fields that can be
dirtied often.

We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.

Fixes: 355b9855 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a06b898

net: dsa: fix switch tree list · 50c7d2ba

Vivien Didelot authored Oct 18, 2019

If there are multiple switch trees on the device, only the last one
will be listed, because the arguments of list_add_tail are swapped.

Fixes: 83c0afae ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50c7d2ba

net: ethernet: dwmac-sun8i: show message only when switching to promisc · 05908d72

Mans Rullgard authored Oct 18, 2019

Printing the info message every time more than the max number of mac
addresses are requested generates unnecessary log spam.  Showing it only
when the hw is not already in promiscous mode is equally informative
without being annoying.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05908d72

net: aquantia: add an error handling in aq_nic_set_multicast_list · 3d00cf2f

Chenwandun authored Oct 18, 2019

add an error handling in aq_nic_set_multicast_list, it may not
work when hw_multicast_list_set error; and at the same time
it will remove gcc Wunused-but-set-variable warning.
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Reviewed-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d00cf2f

Merge branch 'netem-fix-further-issues-with-packet-corruption' · 70873837

David S. Miller authored Oct 19, 2019

Jakub Kicinski says:

====================
net: netem: fix further issues with packet corruption

This set is fixing two more issues with the netem packet corruption.

First patch (which was previously posted) avoids NULL pointer dereference
if the first frame gets freed due to allocation or checksum failure.
v2 improves the clarity of the code a little as requested by Cong.

Second patch ensures we don't return SUCCESS if the frame was in fact
dropped. Thanks to this commit message for patch 1 no longer needs the
"this will still break with a single-frame failure" disclaimer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

70873837

net: netem: correct the parent's backlog when corrupted packet was dropped · e0ad032e

Jakub Kicinski authored Oct 18, 2019

If packet corruption failed we jump to finish_segs and return
NET_XMIT_SUCCESS. Seeing success will make the parent qdisc
increment its backlog, that's incorrect - we need to return
NET_XMIT_DROP.

Fixes: 6071bd1a ("netem: Segment GSO packets on enqueue")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0ad032e

net: netem: fix error path for corrupted GSO frames · a7fa12d1

Jakub Kicinski authored Oct 18, 2019

To corrupt a GSO frame we first perform segmentation.  We then
proceed using the first segment instead of the full GSO skb and
requeue the rest of the segments as separate packets.

If there are any issues with processing the first segment we
still want to process the rest, therefore we jump to the
finish_segs label.

Commit 177b8007 ("net: netem: fix backlog accounting for
corrupted GSO frames") started using the pointer to the first
segment in the "rest of segments processing", but as mentioned
above the first segment may had already been freed at this point.

Backlog corrections for parent qdiscs have to be adjusted.

Fixes: 177b8007 ("net: netem: fix backlog accounting for corrupted GSO frames")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7fa12d1

macb: propagate errors when getting optional clocks · bd310aca

Michael Tretter authored Oct 18, 2019

The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver
marks clock as not available if it receives an error when trying to get
a clock. This is wrong, because a clock controller might return
-EPROBE_DEFER if a clock is not available, but will eventually become
available.

In these cases, the driver would probe successfully but will never be
able to adjust the clocks, because the clocks were not available during
probe, but became available later.

For example, the clock controller for the ZynqMP is implemented in the
PMU firmware and the clocks are only available after the firmware driver
has been probed.

Use devm_clk_get_optional() in instead of devm_clk_get() to get the
optional clock and propagate all errors to the calling function.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd310aca

xen/netback: fix error path of xenvif_connect_data() · 3d5c1a03

Juergen Gross authored Oct 18, 2019

xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().

Remove the superfluous module_put().

Fixes: 279f438e ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d5c1a03