Commits · 11c1a8e1f4fb697c3d504b1951f476e2e0b3c076 · Kirill Smelkov / linux

15 Aug, 2018 10 commits

Bjorn Helgaas authored Aug 15, 2018

  - Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)

* pci/msi:
  PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips

11c1a8e1

Merge branch 'pci/misc' · a40f72db

Bjorn Helgaas authored Aug 15, 2018

  - Mark fall-through switch cases before enabling -Wimplicit-fallthrough
    (Gustavo A. R. Silva)

  - Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig)

  - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied
    (Heiner Kallweit)

  - Unify PCI and DMA direction #defines (Shunyong Yang)

  - Add PCI_DEVICE_DATA() macro (Andy Shevchenko)

  - Check for VPD completion before checking for timeout (Bert Kenward)

  - Limit Netronome NFP5000 config space size to work around erratum (Jakub
    Kicinski)

* pci/misc:
  PCI: Limit config space size for Netronome NFP5000
  PCI/VPD: Check for VPD access completion before checking for timeout
  PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
  PCI: Unify PCI and normal DMA direction definitions
  PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler
  PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core
  PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough

# Conflicts:
#	drivers/pci/hotplug/pciehp_ctrl.c

a40f72db

Merge branch 'pci/hotplug' · c0638a45

Bjorn Helgaas authored Aug 15, 2018

  - Simplify SHPC existence/permission checks (Bjorn Helgaas)

  - Remove hotplug sample skeleton driver (Lukas Wunner)

  - Convert pciehp to threaded IRQ handling (Lukas Wunner)

  - Improve pciehp tolerance of missed events and initially unstable links
    (Lukas Wunner)

  - Clear spurious pciehp events on resume (Lukas Wunner)

  - Add pciehp runtime PM support, including for Thunderbolt controllers
    (Lukas Wunner)

  - Support interrupts from pciehp bridges in D3hot (Lukas Wunner)

* pci/hotplug:
  PCI: pciehp: Deduplicate presence check on probe & resume
  PCI: pciehp: Avoid implicit fallthroughs in switch statements
  PCI: Whitelist Thunderbolt ports for runtime D3
  PCI: Whitelist native hotplug ports for runtime D3
  PCI: sysfs: Resume to D0 on function reset
  PCI: pciehp: Resume parent to D0 on config space access
  PCI: pciehp: Resume to D0 on enable/disable
  PCI: pciehp: Support interrupts sent from D3hot
  PCI: pciehp: Obey compulsory command delay after resume
  PCI: pciehp: Clear spurious events earlier on resume
  PCI: portdrv: Deduplicate PM callback iterator
  PCI: pciehp: Avoid slot access during reset
  PCI: pciehp: Always enable occupied slot on probe
  PCI: pciehp: Become resilient to missed events
  PCI: pciehp: Tolerate initially unstable link
  PCI: pciehp: Declare pciehp_enable/disable_slot() static
  PCI: pciehp: Drop enable/disable lock
  PCI: pciehp: Enable/disable exclusively from IRQ thread
  PCI: pciehp: Track enable/disable status
  PCI: pciehp: Publish to user space last on probe
  PCI: hotplug: Demidlayer registration with the core
  PCI: pciehp: Drop slot workqueue
  PCI: pciehp: Handle events synchronously
  PCI: pciehp: Stop blinking on slot enable failure
  PCI: pciehp: Convert to threaded polling
  PCI: pciehp: Convert to threaded IRQ
  PCI: pciehp: Document struct slot and struct controller
  PCI: pciehp: Declare pciehp_unconfigure_device() void
  PCI: pciehp: Drop unnecessary NULL pointer check
  PCI: pciehp: Fix unprotected list iteration in IRQ handler
  PCI: pciehp: Fix use-after-free on unplug
  PCI: hotplug: Don't leak pci_slot on registration failure
  PCI: hotplug: Delete skeleton driver
  PCI: shpchp: Separate existence of SHPC and permission to use it

c0638a45

Merge branch 'pci/enumeration' · a8bcb5e5

Bjorn Helgaas authored Aug 15, 2018

  - Work around IDT switch ACS Source Validation erratum (James
    Puthukattukaran)

  - Emit diagnostics for all cases of PCIe Link downtraining (Links
    operating slower than they're capable of) (Alexandru Gagniuc)

  - Skip VFs when configuring Max Payload Size (Myron Stowe)

  - Reduce Root Port Max Payload Size if necessary when hot-adding a device
    below it (Myron Stowe)

* pci/enumeration:
  PCI: Match Root Port's MPS to endpoint's MPSS as necessary
  PCI: Skip MPS logic for Virtual Functions (VFs)
  PCI: Check for PCIe Link downtraining
  PCI: Workaround IDT switch ACS Source Validation erratum

a8bcb5e5

Merge branch 'pci/dpc' · 1ca358a8

Bjorn Helgaas authored Aug 15, 2018

  - Defer DPC event handling to work queue (Keith Busch)

  - Use threaded IRQ for DPC bottom half (Keith Busch)

  - Print AER status while handling DPC events (Keith Busch)

* pci/dpc:
  PCI/DPC: Remove indirection waiting for inactive link
  PCI/DPC: Use threaded IRQ for bottom half handling
  PCI/DPC: Print AER status in DPC event handling
  PCI/DPC: Remove rp_pio_status from dpc struct
  PCI/DPC: Defer event handling to work queue
  PCI/DPC: Leave interrupts enabled while handling event

1ca358a8

Merge branch 'pci/aspm' · 187dacce

Bjorn Helgaas authored Aug 15, 2018

  - Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
    Shevchenko)

  - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)

* pci/aspm:
  PCI: Remove unnecessary include of <linux/pci-aspm.h>
  iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
  ath9k: Remove unnecessary include of <linux/pci-aspm.h>
  igb: Remove unnecessary include of <linux/pci-aspm.h>
  PCI/ASPM: Convert to use sysfs_match_string() helper

187dacce

Merge branch 'pci/aer' · 3c3ab37f

Bjorn Helgaas authored Aug 15, 2018

  - Decode AER errors with names similar to "lspci" (Tyler Baicar)

  - Expose AER statistics in sysfs (Rajat Jain)

  - Clear AER status bits selectively based on the type of recovery (Oza
    Pawandeep)

  - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
    Gagniuc)

  - Don't clear AER status bits if we're using the "Firmware-First"
    strategy where firmware owns the registers (Alexandru Gagniuc)

* pci/aer:
  PCI/AER: Don't clear AER bits if error handling is Firmware-First
  PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
  PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
  PCI/AER: Clear device status bits during ERR_COR handling
  PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
  PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
  PCI/AER: Factor out ERR_NONFATAL status bit clearing
  PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
  PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
  PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
  PCI/AER: Add sysfs attributes for rootport cumulative stats
  PCI/AER: Add sysfs attributes to provide AER stats and breakdown
  PCI/AER: Define aer_stats structure for AER capable devices
  PCI/AER: Move internal declarations to drivers/pci/pci.h
  PCI/AER: Adopt lspci names for AER error decoding
  PCI/AER: Expose internal API for obtaining AER information

# Conflicts:
#	drivers/pci/pci.h

3c3ab37f

Merge branch 'for-linus' · af863d18

Bjorn Helgaas authored Aug 15, 2018

* for-linus:
  PCI: Fix is_added/is_busmaster race condition
  PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
  PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
  PCI: v3-semi: Fix I/O space page leak
  PCI: mediatek: Fix I/O space page leak
  PCI: faraday: Fix I/O space page leak
  PCI: aardvark: Fix I/O space page leak
  PCI: designware: Fix I/O space page leak
  PCI: versatile: Fix I/O space page leak
  PCI: xgene: Fix I/O space page leak
  PCI: OF: Fix I/O space page leak
  PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled
  PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
  nfp: stop limiting VFs to 0
  PCI/IOV: Reset total_VFs limit after detaching PF driver
  PCI: faraday: Add missing of_node_put()
  PCI: xilinx-nwl: Add missing of_node_put()
  PCI: xilinx: Add missing of_node_put()
  PCI: endpoint: Use after free in pci_epf_unregister_driver()
  PCI: controller: dwc: Do not let PCIE_DW_PLAT_HOST default to yes
  PCI: rcar: Clean up PHY init on failure
  PCI: rcar: Shut the PHY down in failpath
  PCI: controller: Move PCI_DOMAINS selection to arch Kconfig
  PCI: Initialize endpoint library before controllers
  PCI: shpchp: Manage SHPC unconditionally on non-ACPI systems

af863d18

PCI/AER: Don't clear AER bits if error handling is Firmware-First · 45687f96

Alexandru Gagniuc authored Jul 17, 2018

If the platform requests Firmware-First error handling, firmware is
responsible for reading and clearing AER status bits.  If OSPM also clears
them, we may miss errors.  See ACPI v6.2, sec 18.3.2.5 and 18.4.

This race is mostly of theoretical significance, as it is not easy to
reasonably demonstrate it in testing.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status()
and pci_aer_clear_fatal_status()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

45687f96

PCI: Limit config space size for Netronome NFP5000 · 2538fb89

Jakub Kicinski authored Aug 14, 2018

Like the NFP4000 and NFP6000, the NFP5000 as an erratum where reading/
writing to PCI config space addresses above 0x600 can cause the NFP to
generate PCIe completion timeouts.

Limit the NFP5000's PF's config space size to 0x600 bytes as is already
done for the NFP4000 and NFP6000.

The NFP5000's VF is 0x6003 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
device ID as the NFP6000's VF.  Thus, its config space is already limited
by the existing use of quirk_nfp6000().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tony Egan <tony.egan@netronome.com>

2538fb89

14 Aug, 2018 5 commits

PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips · 923aa4c3

Heiner Kallweit authored Aug 05, 2018

If flag IRQCHIP_ONESHOT_SAFE isn't set for an irqchip and we have a
threaded interrupt with no primary handler, flag IRQF_ONESHOT needs to be
set for the interrupt, causing some overhead in the threaded interrupt
handler. For more detailed explanation also check following comment in
__setup_irq():

The interrupt was requested with handler = NULL, so we use the default
primary handler for it. But it does not have the oneshot flag set. In
combination with level interrupts this is deadly, because the default
primary handler just wakes the thread, then the irq lines is reenabled,
but the device still has the level irq asserted. Rinse and repeat....

While this works for edge type interrupts, we play it safe and reject
unconditionally because we can't say for sure which type this interrupt
really has. The type flags are unreliable as the underlying chip
implementation can override them.

Another comment in __setup_irq() gives a hint already that this
overhead can be avoided for PCI-MSI:

Some irq chips like MSI based interrupts are per se one shot safe. Check
the chip flags, so we can avoid the unmask dance at the end of the
threaded handler for those.

Following this let's mark all PCI-MSI irqchips as oneshot-safe.

See also discussion here:
https://lkml.kernel.org/r/alpine.DEB.2.21.1808032136490.1658@nanos.tec.linutronix.deSigned-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

923aa4c3

PCI/VPD: Check for VPD access completion before checking for timeout · 6eaf2781

Bert Kenward authored Jul 26, 2018

Previously we checked the timeout before checking the VPD access completion
bit. On a very heavily loaded system this can cause VPD access to timeout.
Check the completion bit before checking the timeout.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

6eaf2781

PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry · b72ae8ca

Andy Shevchenko authored Jul 29, 2018

There are a lot of examples in the kernel where PCI_VDEVICE() is used and
still looks not so convenient due to additional driver_data field attached.

Introduce PCI_DEVICE_DATA() macro to fully describe device ID entry in
shortest possible form. For example,

  before:

    { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_MRFLD),
      (kernel_ulong_t) &dwc3_pci_mrfld_properties, },

  after:

    { PCI_DEVICE_DATA(INTEL, MRFLD, &dwc3_pci_mrfld_properties) },

Drivers can be converted later on in independent way.

While here, remove the unused macro with the same name from Ralink wireless
driver.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org>	# for rt2x00

b72ae8ca

PCI: Match Root Port's MPS to endpoint's MPSS as necessary · 9f0e8935

Myron Stowe authored Aug 13, 2018

In commit 27d868b5 ("PCI: Set MPS to match upstream bridge"), we made
sure every device's MPS setting matches its upstream bridge, making it more
likely that a hot-added device will work in a system with an optimized MPS
configuration.

Recently I've started encountering systems where the endpoint device's MPSS
capability is less than its Root Port's current MPS value, thus the
endpoint is not capable of matching its upstream bridge's MPS setting (see:
bugzilla via "Link:" below).  This leaves the system vulnerable - the
upstream Root Port could respond with larger TLPs than the device can
handle, and the device will consider them to be 'Malformed'.

One could use the "pci=pcie_bus_safe" kernel parameter to work around the
issue, but that forces a user to supply a kernel parameter to get the
system to function reliably and may end up limiting MPS settings of other
unrelated, sub-topologies which could benefit from maintaining their larger
values.

Augment Keith's approach to include tuning down a Root Port's MPS setting
when its hot-added endpoint device is not capable of matching it.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>

9f0e8935

PCI: Skip MPS logic for Virtual Functions (VFs) · 3dbe97ef

Myron Stowe authored Aug 13, 2018

PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for
VFs.  Just prior to the table it states:

  "PF and VF functionality is defined in Section 7.5.3.4 except where
   noted in Table 9-16.  For VF fields marked 'RsvdP', the PF setting
   applies to the VF."

All of which implies that with respect to Max_Payload_Size Supported
(MPSS), MPS, and MRRS values, we should not be paying any attention to the
VF's fields, but rather only to the PF's.  Only looking at the PF's fields
also logically makes sense as it's the sole physical interface to the PCIe
bus.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Fixes: 27d868b5 ("PCI: Set MPS to match upstream bridge")
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.3+
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>

3dbe97ef

10 Aug, 2018 1 commit

PCI: Check for PCIe Link downtraining · 2d1ce5ec

Alexandru Gagniuc authored Aug 06, 2018

When both ends of a PCIe Link are capable of a higher bandwidth than is
currently in use, the Link is said to be "downtrained".  A downtrained Link
may indicate hardware or configuration problems in the system, but it's
hard to identify such Links from userspace.

Refactor pcie_print_link_status() so it continues to always print PCIe
bandwidth information, as several NIC drivers desire.

Add a new internal __pcie_print_link_status() to emit a message only when a
device's bandwidth is constrained by the fabric and call it from the PCI
core for all devices, which identifies all downtrained Links.  It also
emits messages for a few cases that are technically not downtrained, such
as a x4 device in an open-ended x1 slot.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: changelog, move __pcie_print_link_status() declaration to
drivers/pci/, rename pcie_check_upstream_link() to
pcie_report_downtraining()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

2d1ce5ec

06 Aug, 2018 5 commits

PCI: Remove unnecessary include of <linux/pci-aspm.h> · ce29af2a

Bjorn Helgaas authored Jul 25, 2018

Several PCI core files include pci-aspm.h even though they don't need
anything provided by that file.  Remove the unnecessary includes of it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

ce29af2a

iwlwifi: Remove unnecessary include of <linux/pci-aspm.h> · 2b2654b8

Bjorn Helgaas authored Jul 25, 2018

This part of the iwlwifi driver doesn't need anything provided by
pci-aspm.h, so remove the unnecessary include of it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Luca Coelho <luciano.coelho@intel.com>

2b2654b8

ath9k: Remove unnecessary include of <linux/pci-aspm.h> · a78613c0

Bjorn Helgaas authored Jul 25, 2018

The ath9k driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>

a78613c0

igb: Remove unnecessary include of <linux/pci-aspm.h> · f5ddcf71

Bjorn Helgaas authored Jul 25, 2018

The igb driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

f5ddcf71

PCI/ASPM: Convert to use sysfs_match_string() helper · 36131ce9

Andy Shevchenko authored Aug 06, 2018

The sysfs_match_string() helper returns index of the matching string in an
array.  Use it in pcie_aspm_set_policy() to simplify the code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bhelgaas: squash sysfs_match_string() fix into original patch for issue
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

36131ce9

31 Jul, 2018 16 commits

PCI: Unify PCI and normal DMA direction definitions · 546c596c

Shunyong Yang authored Jul 18, 2018

Current DMA direction definitions in pci-dma-compat.h and dma-direction.h
are mirrored in value.  Unify them to enhance readability and avoid
possible inconsistency.
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Joey Zheng <yu.zheng@hxt-semitech.com>

546c596c

PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition · 944d5859

Bjorn Helgaas authored Jul 31, 2018

PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
under #ifdef CONFIG_ACPI_APEI, and again at the top level.  This looks like
my merge error from these commits:

  fd3362cb ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
  41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c")

Remove the duplicate PCI_EXP_AER_FLAGS definition.

Fixes: 41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>

944d5859

PCI: pciehp: Deduplicate presence check on probe & resume · 4e6a1335

Lukas Wunner authored Jul 28, 2018

On driver probe and on resume from system sleep, pciehp checks the
Presence Detect State bit in the Slot Status register to bring up an
occupied slot or bring down an unoccupied slot. Both code paths are
identical, so deduplicate them per Mika's request.

On probe, an additional check is performed to disable power of an
unoccupied slot. This can e.g. happen if power was enabled by BIOS.
It cannot happen once pciehp has taken control, hence is not necessary
on resume: The Slot Control register is set to the same value that it
had on suspend by pci_restore_state(), so if the slot was occupied,
power is enabled and if it wasn't, power is disabled. Should occupancy
have changed during the system sleep transition, power is adjusted by
bringing up or down the slot per the paragraph above.

To allow for deduplication of the presence check, move the power check
to pcie_init(). This seems safer anyway, because right now it is
performed while interrupts are already enabled, and although I can't
think of a scenario where pciehp_power_off_slot() and the IRQ thread
collide, it does feel brittle.

However this means that pcie_init() may now write to the Slot Control
register before the IRQ is requested. If both the CCIE and HPIE bits
happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
of polling the Command Completed bit) and eventually emit a timeout
message. Additionally, if a level-triggered INTx interrupt is used,
the user may see a spurious interrupt splat. Avoid by disabling
interrupts before disabling power. (Normally the HPIE and CCIE bits
should be clear on probe, but conceivably they may already have been
set e.g. by BIOS.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

4e6a1335

PCI: pciehp: Avoid implicit fallthroughs in switch statements · 8bb46b07

Lukas Wunner authored Jul 28, 2018

Per Mika's request, add an explicit break to the last case of switch
statements everywhere in pciehp to be more defensive towards future
amendments.

Per Gustavo's request, mark all non-empty implicit fallthroughs with a
comment to silence warnings triggered by -Wimplicit-fallthrough=2.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

8bb46b07

PCI: Fix is_added/is_busmaster race condition · 44bda4b7

Hari Vyas authored Jul 03, 2018

When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.

When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.

A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master().  As these fields are not
handled atomically, that clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state.  This avoids the
race because is_added no longer shares a memory location with is_busmaster.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>

44bda4b7

PCI: Whitelist Thunderbolt ports for runtime D3 · 47a8e237

Lukas Wunner authored Jul 19, 2018

Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W.
This requires that runtime D3 is allowed on its PCIe ports, so whitelist
them.

The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports
is unnecessary on Thunderbolt because we know that even the oldest
controller, Light Ridge (2010), is able to suspend its ports to D3 just
fine -- specifically including its hotplug ports.  And the power saving
should be afforded to machines even if their BIOS predates 2015.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andreas Noever <andreas.noever@gmail.com>

47a8e237

PCI: Whitelist native hotplug ports for runtime D3 · eb3b5bf1

Lukas Wunner authored Jul 19, 2018

Previously we blacklisted PCIe hotplug ports for runtime D3 because:

(a) Ports handled by the firmware must not be transitioned to D3 by the
    OS behind the firmware's back:
    https://bugzilla.kernel.org/show_bug.cgi?id=53811

(b) Ports handled natively by the OS lacked runtime D3 support in the
    pciehp driver.

We've just rectified the latter, so allow users to manually enable and
test it by passing pcie_port_pm=force on the command line.  Vendors are
thus put in a position to validate hotplug ports for runtime D3 and
perhaps we can someday enable it by default, but with a BIOS cutoff date.

Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in
2017 and encountered Hardware Error NMIs, so this feature clearly cannot
be enabled for everyone yet:
https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03

While at it, remove an erroneous code comment I added with 97a90aee
("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which
claims that parents of a hotplug port must stay awake lest interrupts
cannot be delivered.  That has turned out to be wrong at least for
Thunderbolt hotplug ports.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>

eb3b5bf1

PCI: sysfs: Resume to D0 on function reset · 82c3fbff

Lukas Wunner authored Jul 19, 2018

When performing a function reset via sysfs, the device's config space is
accessed in places such as pcie_flr() and its MMIO space is accessed e.g.
in reset_ivb_igd(), so ensure accessibility by resuming the device to D0.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>

82c3fbff

PCI: pciehp: Resume parent to D0 on config space access · 4417aa45

Lukas Wunner authored Jul 19, 2018

Ensure accessibility of a hotplug port's config space when accessed via
sysfs by resuming its parent to D0.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>

4417aa45

PCI: pciehp: Resume to D0 on enable/disable · 83503074

Lukas Wunner authored Jul 19, 2018

pciehp's IRQ thread ensures accessibility of the port by runtime resuming
its parent to D0.  However when the slot is enabled/disabled, the port
itself needs to be in D0 because its secondary bus is accessed in:

    pciehp_check_link_status(),
    pciehp_configure_device() (both called from board_added())
and
    pciehp_unconfigure_device() (called from remove_board()).

Thus, acquire a runtime PM ref on enable/disablement of the slot.

Yinghai Lu additionally discovered that some SkyLake servers feature a
Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
which requires the port to be in D0 when invoking

    pciehp_power_on_slot() (likewise called from board_added()).

If slot power is turned on while in D3hot, link training later fails:
https://lkml.kernel.org/r/20170205073454.GA253@wunner.de

The spec is silent about such a requirement, but it seems prudent to
assume that any hotplug port with a Power Controller may need this.

The present commit holds a runtime PM ref whenever slot power is turned
on and off, but it doesn't keep the port in D0 as long as slot power is
on.  If vendors determine that's necessary, they need to amend pciehp to
acquire a runtime PM ref in pciehp_power_on_slot() and release one in
pciehp_power_off_slot().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>

83503074

PCI: pciehp: Support interrupts sent from D3hot · 6b08c385

Lukas Wunner authored Jul 28, 2018

If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?

It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.

If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.

That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.

Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)

PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."

This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)

(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>

6b08c385

PCI: pciehp: Obey compulsory command delay after resume · 469e764c

Lukas Wunner authored Jul 19, 2018

Upon resume from system sleep, the Slot Control register is written via:

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()
        pci_restore_pcie_state()

PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that
targets any portion of the Port's Slot Control register, [...] software
must wait for [the] command to complete before issuing the next command".

pciehp currently fails to enforce that rule after the above-mentioned
write.  Fix it.

(Moving restoration of the Slot Control register to pciehp doesn't seem
to make sense because the other PCIe hotplug drivers may need it as
well.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

469e764c

PCI: pciehp: Clear spurious events earlier on resume · 79037824

Lukas Wunner authored Jul 19, 2018

Thunderbolt hotplug ports that were occupied before system sleep resume
with their downstream link in "off" state.  Only after the Thunderbolt
controller has reestablished the PCIe tunnels does the link go up.
As a result, a spurious Presence Detect Changed and/or Data Link Layer
State Changed event occurs.

The events are not immediately acted upon because tunnel reestablishment
happens in the ->resume_noirq phase, when interrupts are still disabled.
Also, notification of events may initially be disabled in the Slot
Control register when coming out of system sleep and is reenabled in the
->resume_noirq phase through:

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()
        pci_restore_pcie_state()

It is not guaranteed that the events are acted upon at all:  PCIe r4.0,
sec 6.7.3.4 says that "a port may optionally send an MSI when there are
hot-plug events that occur while interrupt generation is disabled, and
interrupt generation is subsequently enabled."  Note the "optionally".

If an MSI is sent, pciehp will gratuitously turn the slot off and back
on once the ->resume_early phase has commenced.

If an MSI is not sent, the extant, unacknowledged events in the Slot
Status register will prevent future notification of presence or link
changes.

Commit 13c65840 ("PCI: pciehp: Clear Presence Detect and Data Link
Layer Status Changed on resume") fixed the latter by clearing the events
in the ->resume phase.  Move this to the ->resume_noirq phase to also
fix the gratuitous disable/enablement of the slot.

The commit further restored the Slot Control register in the ->resume
phase, but that's dispensable because as shown above it's already been
done in the ->resume_noirq phase.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>

79037824

PCI: portdrv: Deduplicate PM callback iterator · 6ccb127b

Lukas Wunner authored Jul 19, 2018

Replace suspend_iter() and resume_iter() with a single function pm_iter()
to allow addition of port service callbacks for further power management
phases without having to add another iterator each time.

No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

6ccb127b

PCI: pciehp: Avoid slot access during reset · 5b3f7b7d

Lukas Wunner authored Jul 28, 2018

The ->reset_slot callback introduced by commits:

  2e35afae ("PCI: pciehp: Add reset_slot() method") and
  06a8d89a ("PCI: pciehp: Disable link notification across slot reset")

disables notification of Presence Detect Changed and Data Link Layer
State Changed events for the duration of a secondary bus reset.

However a bus reset not only triggers these events, but may also clear
the Presence Detect State bit in the Slot Status register and the Data
Link Layer Link Active bit in the Link Status register momentarily.
According to Sinan Kaya:

 "I know for a fact that bus reset clears the Data Link Layer Active bit
  as soon as link goes down.  It gets set again following link up.
  Presence detect depends on the HW implementation.  QDT root ports
  don't change presence detect for instance since nobody actually
  removed the card.  If an implementation supports in-band presence
  detect, the answer is yes.  As soon as the link goes down, presence
  detect bit will get cleared until recovery."
  https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org

  In-band presence detect is also covered in Table 4-15 in PCIe r4.0,
  sec 4.2.6.

pciehp should therefore ensure that any parts of the driver that access
those bits do not run concurrently to a bus reset.  The only precaution
the commits took to that effect was to halt interrupt polling.  They
made no effort to drain the slot workqueue, cancel an outstanding
Attention Button work, or block slot enable/disable requests via sysfs
and in the ->probe hook.

Now that pciehp is converted to enable/disable the slot exclusively from
the IRQ thread, the only places accessing the two above-mentioned bits
are the IRQ thread and the ->probe hook.  Add locking to serialize them
with a bus reset.  This obviates the need to halt interrupt polling.
Do not add locking to the ->get_adapter_status sysfs callback to afford
users unfettered access to that bit.  Use an rw_semaphore in lieu of a
regular mutex to allow parallel execution of the non-reset code paths
accessing the critical bits, i.e. the IRQ thread and the ->probe hook.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rajat Jain <rajatja@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Sinan Kaya <okaya@kernel.org>

5b3f7b7d

PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler · f7368a55

Heiner Kallweit authored Jul 30, 2018

If we have a threaded interrupt with the handler being NULL, then
request_threaded_irq() -> __setup_irq() will complain and bail out if the
IRQF_ONESHOT flag isn't set.  Therefore check for the handler being NULL
and set IRQF_ONESHOT in this case.

This change is needed to migrate the mei_me driver to
pci_alloc_irq_vectors() and pci_request_irq().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

f7368a55

30 Jul, 2018 1 commit

PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core · a8651194

Christoph Hellwig authored Jul 30, 2018

There is nothing arch-specific about PCI or dma-debug, so call
dma_debug_add_bus() from the PCI core just after registering the bus type.

Most of dma-debug is already generic; this just adds reporting of pending
dma-allocations on driver unload for arches other than powerpc, sh, and
x86.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

a8651194

27 Jul, 2018 1 commit

PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE · a54e43f9

Dan Carpenter authored Jul 27, 2018

IB_WIN_SIZE is larger than INT_MAX so we need to cast it to u64.

Fixes: 9af6bcb1 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

a54e43f9

26 Jul, 2018 1 commit

PCI/AER: Work around use-after-free in pcie_do_fatal_recovery() · bd91b56c

Thomas Tai authored Jul 26, 2018

When an fatal error is received by a non-bridge device, the device is
removed, and pci_stop_and_remove_bus_device() deallocates the device
structure.  The freed device structure is used by subsequent code to send
uevents and print messages.

Hold a reference on the device until we're finished using it.  This is not
an ideal fix because pcie_do_fatal_recovery() should not use the device at
all after removing it, but that's too big a project for right now.

Fixes: 7e9084b3 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices")
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
[bhelgaas: changelog, reduce get/put coverage]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

bd91b56c