- 23 Jun, 2023 36 commits
-
-
Ying Hsu authored
In a setup where a Thunderbolt hub connects to Ethernet and a display through USB Type-C, users may experience a hung task timeout when they remove the cable between the PC and the Thunderbolt hub. This is because the igb_down function is called multiple times when the Thunderbolt hub is unplugged. For example, the igb_io_error_detected triggers the first call, and the igb_remove triggers the second call. The second call to igb_down will block at napi_synchronize. Here's the call trace: __schedule+0x3b0/0xddb ? __mod_timer+0x164/0x5d3 schedule+0x44/0xa8 schedule_timeout+0xb2/0x2a4 ? run_local_timers+0x4e/0x4e msleep+0x31/0x38 igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4] __igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4] igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4] __dev_close_many+0x95/0xec dev_close_many+0x6e/0x103 unregister_netdevice_many+0x105/0x5b1 unregister_netdevice_queue+0xc2/0x10d unregister_netdev+0x1c/0x23 igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4] pci_device_remove+0x3f/0x9c device_release_driver_internal+0xfe/0x1b4 pci_stop_bus_device+0x5b/0x7f pci_stop_bus_device+0x30/0x7f pci_stop_bus_device+0x30/0x7f pci_stop_and_remove_bus_device+0x12/0x19 pciehp_unconfigure_device+0x76/0xe9 pciehp_disable_slot+0x6e/0x131 pciehp_handle_presence_or_link_change+0x7a/0x3f7 pciehp_ist+0xbe/0x194 irq_thread_fn+0x22/0x4d ? irq_thread+0x1fd/0x1fd irq_thread+0x17b/0x1fd ? irq_forced_thread_fn+0x5f/0x5f kthread+0x142/0x153 ? __irq_get_irqchip_state+0x46/0x46 ? kthread_associate_blkcg+0x71/0x71 ret_from_fork+0x1f/0x30 In this case, igb_io_error_detected detaches the network interface and requests a PCIE slot reset, however, the PCIE reset callback is not being invoked and thus the Ethernet connection breaks down. As the PCIE error in this case is a non-fatal one, requesting a slot reset can be avoided. This patch fixes the task hung issue and preserves Ethernet connection by ignoring non-fatal PCIE errors. Signed-off-by: Ying Hsu <yinghsu@chromium.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230620174732.4145155-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Rasmus Villemoes says: ==================== net: dsa: microchip: fix writes to phy registers >= 0x10 Patch 1 is just a simplification, technically unrelated to the other two patches. But it would be a bit inconsistent to have the new ksz_prmw32() introduced in patch 2 use ksz_rmw32() while leaving ksz_prmw8() as-is. The actual fix is of course patch 3. I can definitely see some weird behaviour on our ksz9567 when writing to phy registers 0x1e and 0x1f (with phytool from userspace), though it does not seem that the effect is always to write zeroes to the buddy register as the errata sheet says would be the case. In our case, the switch is connected via i2c; I hope somebody with other switches and/or the SPI variants can test this. ==================== Link: https://lore.kernel.org/r/20230620113855.733526-1-linux@rasmusvillemoes.dkSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rasmus Villemoes authored
According to the errata sheets for ksz9477 and ksz9567, writes to the PHY registers 0x10-0x1f (i.e. those located at addresses 0xN120 to 0xN13f) must be done as a 32 bit write to the 4-byte aligned address containing the register, hence requires a RMW in order not to change the adjacent PHY register. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230620113855.733526-4-linux@rasmusvillemoes.dkSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rasmus Villemoes authored
This will be used in a subsequent patch fixing an errata for writes to certain PHY registers. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://lore.kernel.org/r/20230620113855.733526-3-linux@rasmusvillemoes.dkSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rasmus Villemoes authored
Implement ksz_prmw8() in terms of ksz_rmw8(), just as all the other ksz_pX are implemented in terms of ksz_X. No functional change. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://lore.kernel.org/r/20230620113855.733526-2-linux@rasmusvillemoes.dkSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Przemek suggests that I shouldn't accuse GCC of witchcraft, there is a simpler explanation for why we need manual define. scripts/headers_install.sh modifies the guard, removing _UAPI. That's why including a kernel header from the tree and from /usr leads to duplicate definitions. This also solves the mystery of why I needed to include the header conditionally. I had the wrong guards for most cases but ethtool. Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20230621231719.2728928-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Zhengchao Shao authored
Half a year passed since commit 049fe536 ("net: txgbe: Add operations to interact with firmware") was submitted, the buffer in txgbe_calc_eeprom_checksum was not used. So remove it and the related branch codes. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306200242.FXsHokaJ-lkp@intel.com/Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230620062519.1575298-1-shaozhengchao@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Russell King says: ==================== Add and use helper for PCS negotiation modes Earlier this month, I proposed a helper for deciding whether a PCS should use inband negotiation modes or not. There was some discussion around this topic, and I believe there was no disagreement about providing the helper. The initial discussion can be found at: https://lore.kernel.org/r/ZGIkGmyL8yL1q1zp@shell.armlinux.org.uk Subsequently, I posted a RFC series back in May: https://lore.kernel.org/r/ZGzhvePzPjJ0v2En@shell.armlinux.org.uk that added a helper, phylink_pcs_neg_mode() which PCS drivers could use to parse the state, and updated a bunch of drivers to use it. I got a couple of bits of feedback to it, including some ACKs. However, I've decided to take this slightly further and change the "mode" parameter to both the pcs_config() and pcs_link_up() methods when a PCS driver opts in to this (by setting "neg_mode" in the phylink_pcs structure.) If this is not set, we default to the old behaviour. That said, this series converts all the PCS implementations I can find currently in net-next. Doing this has the added benefit that the negotiation mode parameter is also available to the pcs_link_up() function, which can now know whether inband negotiation was in fact enabled or not at pcs_config() time. It has been posted as RFC at: https://lore.kernel.org/r/ZIh/CLQ3z89g0Ua0@shell.armlinux.org.uk and received one reply, thanks Elad, which is a similar amount of interest to previous postings. Let's post it as non-RFC and see whether we get more reaction. ==================== Link: https://lore.kernel.org/r/ZIxQIBfO9dH5xFlg@shell.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update macb's embedded PCS drivers to use neg_mode, even though it makes no use of it or the "mode" argument. This makes the driver consistent with converted drivers. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8Eo-00EaGX-KJ@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update mt7530's embedded PCS driver to use neg_mode, even though it makes no use of it or the "mode" argument. This makes the driver consistent with converted drivers. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8Ej-00EaGR-Fk@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update B53's embedded PCS driver to use neg_mode, even though it makes no use of it or the "mode" argument. This makes the driver consistent with converted drivers. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1qA8Ee-00EaGL-Az@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update Sparx5's embedded PCS driver to use neg_mode rather than the mode argument. As there is no pcs_link_up() method, this only affects the pcs_config() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8EZ-00EaGF-6F@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update qca8k's embedded PCS driver to use neg_mode rather than the mode argument. As there is no pcs_link_up() method, this only affects the pcs_config() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8EU-00EaG9-1l@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update prestera's embedded PCS driver to use neg_mode rather than the mode argument. As there is no pcs_link_up() method, this only affects the pcs_config() method. Acked-by: Elad Nachman <enachman@marvell.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8EO-00EaG3-TR@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update mvpp2's embedded PCS drivers to use neg_mode rather than the mode argument, remembering to update the ACPI path as well. As there are no pcs_link_up() methods, this only affects the two pcs_config() methods. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8EJ-00EaFx-P6@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update mvneta's embedded PCS driver to use neg_mode rather than the mode argument. As there is no pcs_link_up() method, this only affects the pcs_config() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8EE-00EaFr-Kx@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update lan966x's embedded PCS driver to use neg_mode rather than the mode argument. As there is no pcs_link_up() method, this only affects the pcs_config() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8E9-00EaFl-GN@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update the Lynx PCS driver to use neg_mode rather than the mode argument. This ensures that the link_up() method will always program the speed and duplex when negotiation is disabled. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8E4-00EaFf-Bf@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update the Lynxi PCS driver to use neg_mode rather than the mode argument. This ensures that the link_up() method will always program the speed and duplex when negotiation is disabled. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8Dz-00EaFY-5A@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Update xpcs to use neg_mode to configure whether inband negotiation should be used. We need to update sja1105 as well as that directly calls into the XPCS driver's config function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8Dt-00EaFS-W9@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Convert fman_dtsec, xilinx_axienet and pcs-lynx to pass the neg_mode into phylink_mii_c22_pcs_config(). Where appropriate, drivers are updated to have neg_mode passed into their pcs_config() and pcs_link_up() functions. For other drivers, we just hoist the call to phylink_pcs_neg_mode() to their pcs_config() method out of phylink_mii_c22_pcs_config(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8Do-00EaFM-Ra@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Use phylink_pcs_neg_mode() for phylink_mii_c22_pcs_config(). This results in no functional change. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8Dj-00EaFG-Mt@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
PCS have to work out whether they should enable PCS negotiation by looking at the "mode" and "interface" arguments, and the Autoneg bit in the advertising mask. This leads to some complex logic, so lets pull that out into phylink and instead pass a "neg_mode" argument to the PCS configuration and link up methods, instead of the "mode" argument. In order to transition drivers, add a "neg_mode" flag to the phylink PCS structure to PCS can indicate whether they want to be passed the neg_mode or the old mode argument. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8De-00EaFA-Ht@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Yueh-Shun Li says: ==================== Fix comment typos about "transmit" Fix typos about "transmit" missing the first "s" found by searching with keyword "tram" in the first 7 patches. ==================== Link: https://lore.kernel.org/r/20230622012627.15050-1-shamrocklee@posteo.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yueh-Shun Li authored
Spell "retransmit" properly. Found by searching for keyword "tranm". Signed-off-by: Yueh-Shun Li <shamrocklee@posteo.net> Link: https://lore.kernel.org/r/20230622012627.15050-7-shamrocklee@posteo.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yueh-Shun Li authored
Spell "transmissions" properly. Found by searching for keyword "tranm". Signed-off-by: Yueh-Shun Li <shamrocklee@posteo.net> Link: https://lore.kernel.org/r/20230622012627.15050-6-shamrocklee@posteo.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yueh-Shun Li authored
Spell "transmission" properly. Found by searching for keyword "tranm". Signed-off-by: Yueh-Shun Li <shamrocklee@posteo.net> Link: https://lore.kernel.org/r/20230622012627.15050-3-shamrocklee@posteo.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
Cross-merge networking fixes after downstream PR. Conflicts: tools/testing/selftests/net/fcnal-test.sh d7a2fc14 ("selftests: net: fcnal-test: check if FIPS mode is enabled") dd017c72 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.") https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/ https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Paolo Abeni: "Including fixes from ipsec, bpf, mptcp and netfilter. Current release - regressions: - netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain - eth: mlx5e: - fix scheduling of IPsec ASO query while in atomic - free IRQ rmap and notifier on kernel shutdown Current release - new code bugs: - phy: manual remove LEDs to ensure correct ordering Previous releases - regressions: - mptcp: fix possible divide by zero in recvmsg() - dsa: revert "net: phy: dp83867: perform soft reset and retain established link" Previous releases - always broken: - sched: netem: acquire qdisc lock in netem_change() - bpf: - fix verifier id tracking of scalars on spill - fix NULL dereference on exceptions - accept function names that contain dots - netfilter: disallow element updates of bound anonymous sets - mptcp: ensure listener is unhashed before updating the sk status - xfrm: - add missed call to delete offloaded policies - fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets - selftests: fixes for FIPS mode - dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling - eth: sfc: use budget for TX completions Misc: - wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0" * tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits) revert "net: align SO_RCVMARK required privileges with SO_MARK" net: wwan: iosm: Convert single instance struct member to flexible array sch_netem: acquire qdisc lock in netem_change() selftests: forwarding: Fix race condition in mirror installation wifi: mac80211: report all unusable beacon frames mptcp: ensure listener is unhashed before updating the sk status mptcp: drop legacy code around RX EOF mptcp: consolidate fallback and non fallback state machine mptcp: fix possible list corruption on passive MPJ mptcp: fix possible divide by zero in recvmsg() mptcp: handle correctly disconnect() failures bpf: Force kprobe multi expected_attach_type for kprobe_multi link bpf/btf: Accept function names that contain dots Revert "net: phy: dp83867: perform soft reset and retain established link" net: mdio: fix the wrong parameters netfilter: nf_tables: Fix for deleting base chains with payload netfilter: nfnetlink_osf: fix module autoload netfilter: nf_tables: drop module reference after updating chain netfilter: nf_tables: disallow timeout for anonymous sets netfilter: nf_tables: disallow updates of anonymous sets ...
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "ARM: - Correctly save/restore PMUSERNR_EL0 when host userspace is using PMU counters directly - Fix GICv2 emulation on GICv3 after the locking rework - Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and document why Generic: - Avoid setting page table entries pointing to a deleted memslot if a host page table entry is changed concurrently with the deletion" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Avoid illegal stage2 mapping on invalid memory slot KVM: arm64: Use raw_smp_processor_id() in kvm_pmu_probe_armpmu() KVM: arm64: Restore GICv2-on-GICv3 functionality KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fix from Michael Ellerman: - Disable IRQs when switching mm in exit_lazy_flush_tlb() called from exit_mmap() Thanks to Nicholas Piggin and Sachin Sant. * tag 'powerpc-6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs enabled
-
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pciLinus Torvalds authored
Pull pci fix from Bjorn Helgaas: - Transfer Intel LGM GW PCIe maintenance from Rahul Tanwar to Chuanhua Lei (Zhu YiXin) * tag 'pci-v6.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: MAINTAINERS: Add Chuanhua Lei as Intel LGM GW PCIe maintainer
-
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds authored
Pull MMC fixes from Ulf Hansson: - Fix support for deferred probing for several host drivers - litex_mmc: Use async probe as it's common for all mmc hosts - meson-gx: Fix bug when scheduling while atomic - mmci_stm32: Fix max busy timeout calculation - sdhci-msm: Disable broken 64-bit DMA on MSM8916 * tag 'mmc-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: usdhi60rol0: fix deferred probing mmc: sunxi: fix deferred probing mmc: sh_mmcif: fix deferred probing mmc: sdhci-spear: fix deferred probing mmc: sdhci-acpi: fix deferred probing mmc: owl: fix deferred probing mmc: omap_hsmmc: fix deferred probing mmc: omap: fix deferred probing mmc: mvsdio: fix deferred probing mmc: mtk-sd: fix deferred probing mmc: meson-gx: fix deferred probing mmc: bcm2835: fix deferred probing mmc: litex_mmc: set PROBE_PREFER_ASYNCHRONOUS mmc: meson-gx: remove redundant mmc_request_done() call from irq context mmc: mmci: stm32: fix max busy timeout calculation mmc: sdhci-msm: Disable broken 64-bit DMA on MSM8916
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fix from Hans de Goede: "One small fix for an AMD PMF driver issue which is causing issues for users of just released AMD laptop models" * tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/amd/pmf: Register notify handler only if SPS is enabled
-
git://git.kernel.dk/linuxLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: "A fix for a race condition with poll removal and linked timeouts, and then a few followup fixes/tweaks for the msg_control patch from last week. Not super important, particularly the sparse fixup, as it was broken before that recent commit. But let's get it sorted for real for this release, rather than just have it broken a bit differently" * tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux: io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr io_uring/net: disable partial retries for recvmsg with cmsg io_uring/net: clear msg_controllen on partial sendmsg retry io_uring/poll: serialize poll linked timer start with poll removal
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cgroup fixes from Tejun Heo: "It's late but here are two bug fixes. Both fix problems which can be severe but are very confined in scope. The risk to most use cases should be minimal. - Fix for an old bug which triggers if a cgroup subsystem is remounted to a different hierarchy while someone is reading its cgroup.procs/tasks file. The risk is pretty low given how seldom cgroup subsystems are moved across hierarchies. - We moved cpus_read_lock() outside of cgroup internal locks a while ago but forgot to update the legacy_freezer leading to lockdep triggers. Fixed" * tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Do not corrupt task iteration when rebinding subsystem cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
-
- 22 Jun, 2023 4 commits
-
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.4, take #4 - Correctly save/restore PMUSERNR_EL0 when host userspace is using PMU counters directly - Fix GICv2 emulation on GICv3 after the locking rework - Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and document why...
-
Gavin Shan authored
We run into guest hang in edk2 firmware when KSM is kept as running on the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash device (TYPE_PFLASH_CFI01) during the operation of sector erasing or buffered write. The status is returned by reading the memory region of the pflash device and the read request should have been forwarded to QEMU and emulated by it. Unfortunately, the read request is covered by an illegal stage2 mapping when the guest hang issue occurs. The read request is completed with QEMU bypassed and wrong status is fetched. The edk2 firmware runs into an infinite loop with the wrong status. The illegal stage2 mapping is populated due to same page sharing by KSM at (C) even the associated memory slot has been marked as invalid at (B) when the memory slot is requested to be deleted. It's notable that the active and inactive memory slots can't be swapped when we're in the middle of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count is elevated, and kvm_swap_active_memslots() will busy loop until it reaches to zero again. Besides, the swapping from the active to the inactive memory slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(), corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots(). CPU-A CPU-B ----- ----- ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION) kvm_vm_ioctl_set_memory_region kvm_set_memory_region __kvm_set_memory_region kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE) kvm_invalidate_memslot kvm_copy_memslot kvm_replace_memslot kvm_swap_active_memslots (A) kvm_arch_flush_shadow_memslot (B) same page sharing by KSM kvm_mmu_notifier_invalidate_range_start : kvm_mmu_notifier_change_pte kvm_handle_hva_range __kvm_handle_hva_range kvm_set_spte_gfn (C) : kvm_mmu_notifier_invalidate_range_end Fix the issue by skipping the invalid memory slot at (C) to avoid the illegal stage2 mapping so that the read request for the pflash's status is forwarded to QEMU and emulated by it. In this way, the correct pflash's status can be returned from QEMU to break the infinite loop in the edk2 firmware. We tried a git-bisect and the first problematic commit is cd4c7183 (" KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this, clean_dcache_guest_page() is called after the memory slots are iterated in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called before the iteration on the memory slots before this commit. This change literally enlarges the racy window between kvm_mmu_notifier_change_pte() and memory slot removal so that we're able to reproduce the issue in a practical test case. However, the issue exists since commit d5d8184d ("KVM: ARM: Memory virtualization setup"). Cc: stable@vger.kernel.org # v3.9+ Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup") Reported-by: Shuai Hu <hshuai@redhat.com> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Message-Id: <20230615054259.14911-1-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni authored
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net This is v3, including a crash fix for patch 01/14. The following patchset contains Netfilter/IPVS fixes for net: 1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock. 2) Fix chain binding transaction logic, add a bound flag to rule transactions. Remove incorrect logic in nft_data_hold() and nft_data_release(). 3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing the set/chain as a follow up to 1240eb93 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE") 4) Drop map element references from preparation phase instead of set destroy path, otherwise bogus EBUSY with transactions such as: flush chain ip x y delete chain ip x w where chain ip x y contains jump/goto from set elements. 5) Pipapo set type does not regard generation mask from the walk iteration. 6) Fix reference count underflow in set element reference to stateful object. 7) Several patches to tighten the nf_tables API: - disallow set element updates of bound anonymous set - disallow unbound anonymous set/chain at the end of transaction. - disallow updates of anonymous set. - disallow timeout configuration for anonymous sets. 8) Fix module reference leak in chain updates. 9) Fix nfnetlink_osf module autoload. 10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as in iptables-nft. This Netfilter batch is larger than usual at this stage, I am aware we are fairly late in the -rc cycle, if you prefer to route them through net-next, please let me know. netfilter pull request 23-06-21 * tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: Fix for deleting base chains with payload netfilter: nfnetlink_osf: fix module autoload netfilter: nf_tables: drop module reference after updating chain netfilter: nf_tables: disallow timeout for anonymous sets netfilter: nf_tables: disallow updates of anonymous sets netfilter: nf_tables: reject unbound chain set before commit phase netfilter: nf_tables: reject unbound anonymous set before commit phase netfilter: nf_tables: disallow element updates of bound anonymous sets netfilter: nf_tables: fix underflow in object reference counter netfilter: nft_set_pipapo: .walk does not deal with generations netfilter: nf_tables: drop map element references from preparation phase netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain netfilter: nf_tables: fix chain binding transaction logic ipvs: align inner_mac_header for encapsulation ==================== Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Maciej Żenczykowski authored
This reverts commit 1f86123b ("net: align SO_RCVMARK required privileges with SO_MARK") because the reasoning in the commit message is not really correct: SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which sets the socket mark and does require privs. Additionally incoming skb->mark may already be visible if sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled. Furthermore, it is easier to block the getsockopt via bpf (either cgroup setsockopt hook, or via syscall filters) then to unblock it if it requires CAP_NET_RAW/ADMIN. On Android the socket mark is (among other things) used to store the network identifier a socket is bound to. Setting it is privileged, but retrieving it is not. We'd like unprivileged userspace to be able to read the network id of incoming packets (where mark is set via iptables [to be moved to bpf])... An alternative would be to add another sysctl to control whether setting SO_RCVMARK is privilged or not. (or even a MASK of which bits in the mark can be exposed) But this seems like over-engineering... Note: This is a non-trivial revert, due to later merged commit e42c7bee ("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()") which changed both 'ns_capable' into 'sockopt_ns_capable' calls. Fixes: 1f86123b ("net: align SO_RCVMARK required privileges with SO_MARK") Cc: Larysa Zaremba <larysa.zaremba@intel.com> Cc: Simon Horman <simon.horman@corigine.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eyal Birger <eyal.birger@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Patrick Rohr <prohr@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-