- 30 Jun, 2023 1 commit
-
-
Pawel Dembicki authored
Switch in MAXLEN register stores the maximum size of a data frame. The MTU size is 18 bytes smaller than the frame size. The current settings are causing problems with packet forwarding. This patch fixes the MTU settings to proper values. Fixes: fb77ffc6 ("net: dsa: vsc73xx: make the MTU configurable") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20230628194327.1765644-1-paweldembicki@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 29 Jun, 2023 33 commits
-
-
Nick Child authored
All ibmvnic resets, make a call to netdev_tx_reset_queue() when re-opening the device. netdev_tx_reset_queue() resets the num_queued and num_completed byte counters. These stats are used in Byte Queue Limit (BQL) algorithms. The difference between these two stats tracks the number of bytes currently sitting on the physical NIC. ibmvnic increases the number of queued bytes though calls to netdev_tx_sent_queue() in the drivers xmit function. When, VIOS reports that it is done transmitting bytes, the ibmvnic device increases the number of completed bytes through calls to netdev_tx_completed_queue(). It is important to note that the driver batches its transmit calls and num_queued is increased every time that an skb is added to the next batch, not necessarily when the batch is sent to VIOS for transmission. Unlike other reset types, a NON FATAL reset will not flush the sub crq tx buffers. Therefore, it is possible for the batched skb array to be partially full. So if there is call to netdev_tx_reset_queue() when re-opening the device, the value of num_queued (0) would not account for the skb's that are currently batched. Eventually, when the batch is sent to VIOS, the call to netdev_tx_completed_queue() would increase num_completed to a value greater than the num_queued. This causes a BUG_ON crash: ibmvnic 30000002: Firmware reports error, cause: adapter problem. Starting recovery... ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ------------[ cut here ]------------ kernel BUG at lib/dynamic_queue_limits.c:27! Oops: Exception in kernel mode, sig: 5 [....] NIP dql_completed+0x28/0x1c0 LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic] Call Trace: ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable) ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic] __handle_irq_event_percpu+0x98/0x270 ---[ end trace ]--- Therefore, do not reset the dql stats when performing a NON_FATAL reset. Fixes: 0d973388 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Martin Habets authored
On systems without MAE permission efx->mae is not initialised, and trying to lookup an mport results in a NULL pointer dereference. Fixes: 25414b2a ("sfc: add devlink port support for ef100") Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Cherry-pick what looks like safe fixes from the bluetooth net-next PR. The other changes will have to wait for 6.6 Link: https://lore.kernel.org/all/20230627191004.2586540-1-luiz.dentz@gmail.com/Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Luiz Augusto von Dentz authored
The ISO Interval on CIS Established Event uses 1.25 ms slots: BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2304: Time = N * 1.25 ms In addition to that this always update the QoS settings based on CIS Established Event. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiapeng Chong authored
Use kmemdup rather than duplicating its implementation. ./net/bluetooth/hci_conn.c:1880:7-14: WARNING opportunity for kmemdup. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5597Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthew Anderson authored
Adding the device ID from the Asus Ally gets the bluetooth working on the device. Signed-off-by: Matthew Anderson <ruinairas1992@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ivan Orlov authored
Now that the driver core allows for struct class to be in read-only memory, move the bt_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at load time. Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: linux-bluetooth@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Luiz Augusto von Dentz authored
This rework sync_interval to be sync_factor as having sync_interval in the order of seconds is sometimes not disarable. Wit sync_factor the application can tell how many SDU intervals it wants to send an announcement with PA, the EA interval is set to 2 times that so a factor of 24 of BIG SDU interval of 10ms would look like the following: < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 Handle: 0x01 Properties: 0x0000 Min advertising interval: 480.000 msec (0x0300) Max advertising interval: 480.000 msec (0x0300) Channel map: 37, 38, 39 (0x07) Own address type: Random (0x01) Peer address type: Public (0x00) Peer address: 00:00:00:00:00:00 (OUI 00-00-00) Filter policy: Allow Scan Request from Any, Allow Connect Request from Any (0x00) TX power: Host has no preference (0x7f) Primary PHY: LE 1M (0x01) Secondary max skip: 0x00 Secondary PHY: LE 2M (0x02) SID: 0x00 Scan request notifications: Disabled (0x00) < HCI Command: LE Set Periodic Advertising Parameters (0x08|0x003e) plen 7 Handle: 1 Min interval: 240.00 msec (0x00c0) Max interval: 240.00 msec (0x00c0) Properties: 0x0000 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Luiz Augusto von Dentz authored
When receiving a scan response there is no way to know if the remote device is connectable or not, so when it cannot be merged don't make any assumption and instead just mark it with a new flag defined as MGMT_DEV_FOUND_SCAN_RSP so userspace can tell it is a standalone SCAN_RSP. Link: https://lore.kernel.org/linux-bluetooth/CABBYNZ+CYMsDSPTxBn09Js3BcdC-x7vZFfyLJ3ppZGGwJKmUTw@mail.gmail.com/ Fixes: c70a7e4c ("Bluetooth: Add support for Not Connectable flag for Device Found events") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pauli Virtanen authored
If the event has error status, return right error code and don't show incorrect "response malformed" messages. Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pauli Virtanen authored
When reconfiguring CIG after disconnection of the last CIS, LE Remove CIG shall be sent before LE Set CIG Parameters. Otherwise, it fails because CIG is in the inactive state and not configurable (Core v5.3 Vol 6 Part B Sec. 4.5.14.3). This ordering is currently wrong under suitable timing conditions, because LE Remove CIG is sent via the hci_sync queue and may be delayed, but Set CIG Parameters is via hci_send_cmd. Make the ordering well-defined by sending also Set CIG Parameters via hci_sync. Fixes: 26afbd82 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Johan Hovold authored
A recent commit restored the original (and still documented) semantics for the HCI_QUIRK_USE_BDADDR_PROPERTY quirk so that the device address is considered invalid unless an address is provided by firmware. This specifically means that this flag must only be set for devices with invalid addresses, but the Broadcom driver has so far been setting this flag unconditionally. Fortunately the driver already checks for invalid addresses during setup and sets the HCI_QUIRK_INVALID_BDADDR flag. Use this flag to indicate when the address can be overridden by firmware (long term, this should probably just always be allowed). Fixes: 6945795b ("Bluetooth: fix use-bdaddr-property quirk") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/lkml/ecef83c8-497f-4011-607b-a63c24764867@samsung.comSigned-off-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sungwoo Kim authored
l2cap_sock_release(sk) frees sk. However, sk's children are still alive and point to the already free'd sk's address. To fix this, l2cap_sock_release(sk) also cleans sk's children. ================================================================== BUG: KASAN: use-after-free in l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 Read of size 8 at addr ffff888104617aa8 by task kworker/u3:0/276 CPU: 0 PID: 276 Comm: kworker/u3:0 Not tainted 6.2.0-00001-gef397bd4d5fb-dirty #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: hci2 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x72/0x95 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x175/0x478 mm/kasan/report.c:417 kasan_report+0xb1/0x130 mm/kasan/report.c:517 l2cap_sock_ready_cb+0xb7/0x100 net/bluetooth/l2cap_sock.c:1650 l2cap_chan_ready+0x10e/0x1e0 net/bluetooth/l2cap_core.c:1386 l2cap_config_req+0x753/0x9f0 net/bluetooth/l2cap_core.c:4480 l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:5739 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:6509 [inline] l2cap_recv_frame+0xe2e/0x43c0 net/bluetooth/l2cap_core.c:7788 l2cap_recv_acldata+0x6ed/0x7e0 net/bluetooth/l2cap_core.c:8506 hci_acldata_packet net/bluetooth/hci_core.c:3813 [inline] hci_rx_work+0x66e/0xbc0 net/bluetooth/hci_core.c:4048 process_one_work+0x4ea/0x8e0 kernel/workqueue.c:2289 worker_thread+0x364/0x8e0 kernel/workqueue.c:2436 kthread+0x1b9/0x200 kernel/kthread.c:376 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308 </TASK> Allocated by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] __kasan_kmalloc+0x82/0x90 mm/kasan/common.c:383 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0x140 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x113/0x1f0 net/core/sock.c:2040 sk_alloc+0x36/0x3c0 net/core/sock.c:2093 l2cap_sock_alloc.constprop.0+0x39/0x1c0 net/bluetooth/l2cap_sock.c:1852 l2cap_sock_create+0x10d/0x220 net/bluetooth/l2cap_sock.c:1898 bt_sock_create+0x183/0x290 net/bluetooth/af_bluetooth.c:132 __sock_create+0x226/0x380 net/socket.c:1518 sock_create net/socket.c:1569 [inline] __sys_socket_create net/socket.c:1606 [inline] __sys_socket_create net/socket.c:1591 [inline] __sys_socket+0x112/0x200 net/socket.c:1639 __do_sys_socket net/socket.c:1652 [inline] __se_sys_socket net/socket.c:1650 [inline] __x64_sys_socket+0x40/0x50 net/socket.c:1650 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 288: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:523 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free mm/kasan/common.c:200 [inline] __kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook mm/slub.c:1807 [inline] slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x88/0x1f0 mm/slub.c:3800 sk_prot_free net/core/sock.c:2076 [inline] __sk_destruct+0x347/0x430 net/core/sock.c:2168 sk_destruct+0x9c/0xb0 net/core/sock.c:2183 __sk_free+0x82/0x220 net/core/sock.c:2194 sk_free+0x7c/0xa0 net/core/sock.c:2205 sock_put include/net/sock.h:1991 [inline] l2cap_sock_kill+0x256/0x2b0 net/bluetooth/l2cap_sock.c:1257 l2cap_sock_release+0x1a7/0x220 net/bluetooth/l2cap_sock.c:1428 __sock_release+0x80/0x150 net/socket.c:650 sock_close+0x19/0x30 net/socket.c:1368 __fput+0x17a/0x5c0 fs/file_table.c:320 task_work_run+0x132/0x1c0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x113/0x120 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:296 do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc The buggy address belongs to the object at ffff888104617800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 680 bytes inside of 1024-byte region [ffff888104617800, ffff888104617c00) The buggy address belongs to the physical page: page:00000000dbca6a80 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888104614000 pfn:0x104614 head:00000000dbca6a80 order:2 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 flags: 0x200000000010200(slab|head|node=0|zone=2) raw: 0200000000010200 ffff888100041dc0 ffffea0004212c10 ffffea0004234b10 raw: ffff888104614000 0000000000080002 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888104617980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888104617a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888104617b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888104617b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Ack: This bug is found by FuzzBT with a modified Syzkaller. Other contributors are Ruoyu Wu and Hui Peng. Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Johan Hovold authored
Devices that lack persistent storage for the device address can indicate this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller to be marked as unconfigured until user space has set a valid address. The related HCI_QUIRK_USE_BDADDR_PROPERTY was later added to similarly indicate that the device lacks a valid address but that one may be specified in the devicetree. As is clear from commit 7a0e5b15 ("Bluetooth: Add quirk for reading BD_ADDR from fwnode property") that added and documented this quirk and commits like de79a9df ("Bluetooth: btqcomsmd: use HCI_QUIRK_USE_BDADDR_PROPERTY"), the device address of controllers with this flag should be treated as invalid until user space has had a chance to configure the controller in case the devicetree property is missing. As it does not make sense to allow controllers with invalid addresses, restore the original semantics, which also makes sure that the implementation is consistent (e.g. get_missing_options() indicates that the address must be set) and matches the documentation (including comments in the code, such as, "In case any of them is set, the controller has to start up as unconfigured."). Fixes: e668eb1e ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Johan Hovold authored
Devices that lack persistent storage for the device address can indicate this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller to be marked as unconfigured until user space has set a valid address. Once configured, the device address must be set on every setup for controllers with HCI_QUIRK_NON_PERSISTENT_SETUP to avoid marking the controller as unconfigured and requiring the address to be set again. Fixes: 740011cf ("Bluetooth: Add new quirk for non-persistent setup settings") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Zhengping Jiang authored
Fix potential use-after-free in l2cap_le_command_rej. Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Min-Hua Chen authored
Use le32_to_cpu for ver.soc_id to fix the following sparse warning. drivers/bluetooth/btqca.c:640:24: sparse: warning: restricted __le32 degrades to integer Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Gora authored
This device is an Inspire branded BT 5.1 USB dongle with a Realtek RTL8761BU chip using the "Best Buy China" vendor ID. The device table is as follows: T: Bus=01 Lev=01 Prnt=02 Port=09 Cnt=01 Dev#= 7 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=6655 ProdID=8771 Rev=02.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00E04C239987 C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms Signed-off-by: Dan Gora <dan.gora@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Gora authored
Add missing MODULE_FIRMWARE declarations for firmware referenced in btrtl.c. Signed-off-by: Dan Gora <dan.gora@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Heider authored
Fixes a bug where on the M1 mac mini initramfs-tools fails to include the necessary firmware into the initrd. Fixes: c4dab506 ("tg3: Download 57766 EEE service patch firmware") Signed-off-by: Tobias Heider <me@tobhe.de> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/ZJt7LKzjdz8+dClx@tobhe.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Vladimir Oltean says: ==================== Fix PTP received on wrong port with bridged SJA1105 DSA Since the changes were made to tag_8021q to support imprecise RX for bridged ports, the tag_sja1105 driver still prefers the source port information deduced from the VLAN headers for link-local traffic, even though the switch can theoretically do better and report the precise source port. The problem is that the tagger doesn't know when to trust one source of information over another, because the INCL_SRCPT option (to "tag" link local frames) is sometimes enabled and sometimes it isn't. The first patch makes the switch provide the hardware tag for link local traffic under all circumstances, and the second patch makes the tagger always use that hardware tag as primary source of information for link local packets. ==================== Link: https://lore.kernel.org/r/20230627094207.3385231-1-vladimir.oltean@nxp.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vladimir Oltean authored
Currently the sja1105 tagging protocol prefers using the source port information from the VLAN header if that is available, falling back to the INCL_SRCPT option if it isn't. The VLAN header is available for all frames except for META frames initiated by the switch (containing RX timestamps), and thus, the "if (is_link_local)" branch is practically dead. The tag_8021q source port identification has become more loose ("imprecise") and will report a plausible rather than exact bridge port, when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local traffic always needs to know the precise source port. With incorrect source port reporting, for example PTP traffic over 2 bridged ports will all be seen on sockets opened on the first such port, which is incorrect. Now that the tagging protocol has been changed to make link-local frames always contain source port information, we can reverse the order of the checks so that we always give precedence to that information (which is always precise) in lieu of the tag_8021q VID which is only precise for a standalone port. Fixes: d7f9787a ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID") Fixes: 91495f21 ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vladimir Oltean authored
Link-local traffic on bridged SJA1105 ports is sometimes tagged by the hardware with source port information (when the port is under a VLAN aware bridge). The tag_8021q source port identification has become more loose ("imprecise") and will report a plausible rather than exact bridge port, when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local traffic always needs to know the precise source port. Modify the driver logic (and therefore: the tagging protocol itself) to always include the source port information with link-local packets, regardless of whether the port is standalone, under a VLAN-aware or VLAN-unaware bridge. This makes it possible for the tagging driver to give priority to that information over the tag_8021q VLAN header. The big drawback with INCL_SRCPT is that it makes it impossible to distinguish between an original MAC DA of 01:80:C2:XX:YY:ZZ and 01:80:C2:AA:BB:ZZ, because the tagger just patches MAC DA bytes 3 and 4 with zeroes. Only if PTP RX timestamping is enabled, the switch will generate a META follow-up frame containing the RX timestamp and the original bytes 3 and 4 of the MAC DA. Those will be used to patch up the original packet. Nonetheless, in the absence of PTP RX timestamping, we have to live with this limitation, since it is more important to have the more precise source port information for link-local traffic. Fixes: d7f9787a ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID") Fixes: 91495f21 ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Vladimir Oltean says: ==================== Fix PTP packet drops with ocelot-8021q DSA tag protocol Changes in v2: - Distinguish between L2 and L4 PTP packets v1 at: https://lore.kernel.org/netdev/20230626154003.3153076-1-vladimir.oltean@nxp.com/ Patch 3/3 fixes an issue with the ocelot/felix driver, where it would drop PTP traffic on RX unless hardware timestamping for that packet type was enabled. Fixing that requires the driver to know whether it had previously configured the hardware to timestamp PTP packets on that port. But it cannot correctly determine that today using the existing code structure, so patches 1/3 and 2/3 fix the control path of the code such that ocelot->ports[port]->trap_proto faithfully reflects whether that configuration took place. ==================== Link: https://lore.kernel.org/r/20230627163114.3561597-1-vladimir.oltean@nxp.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vladimir Oltean authored
The driver implements a workaround for the fact that it doesn't have an IRQ source to tell it whether PTP frames are available through the extraction registers, for those frames to be processed and passed towards the network stack. That workaround is to configure the switch, through felix_hwtstamp_set() -> felix_update_trapping_destinations(), to create two copies of PTP packets: one sent over Ethernet to the DSA master, and one to be consumed through the aforementioned CPU extraction queue registers. The reason why we want PTP packets to be consumed through the CPU extraction registers in the first place is because we want to see their hardware RX timestamp. With tag_8021q, that is only visible that way, and it isn't visible with the copy of the packet that's transmitted over Ethernet. The problem with the workaround implementation is that it drops the packet received over Ethernet, in expectation of its copy being present in the CPU extraction registers. However, if felix_hwtstamp_set() hasn't run (aka PTP RX timestamping is disabled), the driver will drop the original PTP frame and there will be no copy of it in the CPU extraction registers. So, the network stack will simply not see any PTP frame. Look at the port's trapping configuration to see whether the driver has previously enabled the CPU extraction registers. If it hasn't, just don't RX timestamp the frame and let it be passed up the stack by DSA, which is perfectly fine. Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vladimir Oltean authored
In a future change, the driver will need to determine whether PTP RX timestamping is enabled on a port (including whether traps were set up on that port in particular) and that is currently not possible. The driver supports different RX filters (L2, L4) and kinds of TX timestamping (one-step, two-step) on its ports, but it saves all configuration in a single struct hwtstamp_config that is global to the switch. So, the latest timestamping configuration on one port (including a request to disable timestamping) affects what gets reported for all ports, even though the configuration itself is still individual to each port. The port timestamping configurations are only coupled because of the common structure, so replace the hwtstamp_config with a mask of trapped protocols saved per port. We also have the ptp_cmd to distinguish between one-step and two-step PTP timestamping, so with those 2 bits of information we can fully reconstruct a descriptive struct hwtstamp_config for each port, during the SIOCGHWTSTAMP ioctl. Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support") Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vladimir Oltean authored
PTP RX timestamping should be enabled when the user requests it, not by default. If it is enabled by default, it can be problematic when the ocelot driver is a DSA master, and it sidesteps what DSA tries to avoid through __dsa_master_hwtstamp_validate(). Additionally, after the change which made ocelot trap PTP packets only to the CPU at ocelot_hwtstamp_set() time, it is no longer even true that RX timestamping is enabled by default, because until ocelot_hwtstamp_set() is called, the PTP traps are actually not set up. So the rx_filter field of ocelot->hwtstamp_config reflects an incorrect reality. Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets") Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Florian Westphal says: ==================== net/sched: act_ipt bug fixes v3: prefer skb_header() helper in patch 2. No other changes. I've retained Acks and RvB-Tags of v2. While checking if netfilter could be updated to replace selected instances of NF_DROP with kfree_skb_reason+NF_STOLEN to improve debugging info via drop monitor I found that act_ipt is incompatible with such an approach. Moreover, it lacks multiple sanity checks to avoid certain code paths that make assumptions that the tc layer doesn't meet, such as header sanity checks, availability of skb_dst, skb_nfct() and so on. act_ipt test in the tc selftest still pass with this applied. I think that we should consider removal of this module, while this should take care of all problems, its ipv4 only and I don't think there are any netfilter targets that lack a native tc equivalent, even when ignoring bpf. ==================== Link: https://lore.kernel.org/r/20230627123813.3036-1-fw@strlen.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Florian Westphal authored
xtables relies on skb being owned by ip stack, i.e. with ipv4 check in place skb->cb is supposed to be IPCB. I don't see an immediate problem (REJECT target cannot be used anymore now that PRE/POSTROUTING hook validation has been fixed), but better be safe than sorry. A much better patch would be to either mark act_ipt as "depends on BROKEN" or remove it altogether. I plan to do this for -next in the near future. This tc extension is broken in the sense that tc lacks an equivalent of NF_STOLEN verdict. With NF_STOLEN, target function takes complete ownership of skb, caller cannot dereference it anymore. ACT_STOLEN cannot be used for this: it has a different meaning, caller is allowed to dereference the skb. At this time NF_STOLEN won't be returned by any targets as far as I can see, but this may change in the future. It might be possible to work around this via list of allowed target extensions known to only return DROP or ACCEPT verdicts, but this is error prone/fragile. Existing selftest only validates xt_LOG and act_ipt is restricted to ipv4 so I don't think this action is used widely. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Florian Westphal authored
Netfilter targets make assumptions on the skb state, for example iphdr is supposed to be in the linear area. This is normally done by IP stack, but in act_ipt case no such checks are made. Some targets can even assume that skb_dst will be valid. Make a minimum effort to check for this: - Don't call the targets eval function for non-ipv4 skbs. - Don't call the targets eval function for POSTROUTING emulation when the skb has no dst set. v3: use skb_protocol helper (Davide Caratti) Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Florian Westphal authored
Looks like "tc" hard-codes "mangle" as the only supported table name, but on kernel side there are no checks. This is wrong. Not all xtables targets are safe to call from tc. E.g. "nat" targets assume skb has a conntrack object assigned to it. Normally those get called from netfilter nat core which consults the nat table to obtain the address mapping. "tc" userspace either sets PRE or POSTROUTING as hook number, but there is no validation of this on kernel side, so update netlink policy to reject bogus numbers. Some targets may assume skb_dst is set for input/forward hooks, so prevent those from being used. act_ipt uses the hook number in two places: 1. the state hook number, this is fine as-is 2. to set par.hook_mask The latter is a bit mask, so update the assignment to make xt_check_target() to the right thing. Followup patch adds required checks for the skb/packet headers before calling the targets evaluation function. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Chengfeng Ye authored
As &net->sctp.addr_wq_lock is also acquired by the timer sctp_addr_wq_timeout_handler() in protocal.c, the same lock acquisition at sctp_auto_asconf_init() seems should disable irq since it is called from sctp_accept() under process context. Possible deadlock scenario: sctp_accept() -> sctp_sock_migrate() -> sctp_auto_asconf_init() -> spin_lock(&net->sctp.addr_wq_lock) <timer interrupt> -> sctp_addr_wq_timeout_handler() -> spin_lock_bh(&net->sctp.addr_wq_lock); (deadlock here) This flaw was found using an experimental static analysis tool we are developing for irq-related deadlock. The tentative patch fix the potential deadlock by spin_lock_bh(). Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Fixes: 34e5b011 ("sctp: delay auto_asconf init until binding the first addr") Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230627120340.19432-1-dg573847474@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Moritz Fischer authored
dev_set_rx_mode() grabs a spin_lock, and the lan743x implementation proceeds subsequently to go to sleep using readx_poll_timeout(). Introduce a helper wrapping the readx_poll_timeout_atomic() function and use it to replace the calls to readx_polL_timeout(). Fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver") Cc: stable@vger.kernel.org Cc: Bryan Whitehead <bryan.whitehead@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: Moritz Fischer <moritzf@google.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230627035000.1295254-1-moritzf@google.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
- 28 Jun, 2023 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds authored
Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" * tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linuxLinus Torvalds authored
Pull sysctl updates from Luis Chamberlain: "The changes for sysctl are in line with prior efforts to stop usage of deprecated routines which incur recursion and also make it hard to remove the empty array element in each sysctl array declaration. The most difficult user to modify was parport which required a bit of re-thinking of how to declare shared sysctls there, Joel Granados has stepped up to the plate to do most of this work and eventual removal of register_sysctl_table(). That work ended up saving us about 1465 bytes according to bloat-o-meter. Since we gained a few bloat-o-meter karma points I moved two rather small sysctl arrays from kernel/sysctl.c leaving us only two more sysctl arrays to move left. Most changes have been tested on linux-next for about a month. The last straggler patches are a minor parport fix, changes to the sysctl kernel selftest so to verify correctness and prevent regressions for the future change he made to provide an alternative solution for the special sysctl mount point target which was using the now deprecated sysctl child element. This is all prep work to now finally be able to remove the empty array element in all sysctl declarations / registrations which is expected to save us a bit of bytes all over the kernel. That work will be tested early after v6.5-rc1 is out" * tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: replace child with an enumeration sysctl: Remove debugging dump_stack test_sysclt: Test for registering a mount point test_sysctl: Add an option to prevent test skip test_sysctl: Add an unregister sysctl test test_sysctl: Group node sysctl test under one func test_sysctl: Fix test metadata getters parport: plug a sysctl register leak sysctl: move security keys sysctl registration to its own file sysctl: move umh sysctl registration to its own file signal: move show_unhandled_signals sysctl to its own file sysctl: remove empty dev table sysctl: Remove register_sysctl_table sysctl: Refactor base paths registrations sysctl: stop exporting register_sysctl_table parport: Removed sysctl related defines parport: Remove register_sysctl_table from parport_default_proc_register parport: Remove register_sysctl_table from parport_device_proc_register parport: Remove register_sysctl_table from parport_proc_register parport: Move magic number "15" to a define
-
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linuxLinus Torvalds authored
Pull module updates from Luis Chamberlain: "The changes queued up for modules are pretty tame, mostly code removal of moving of code. Only two minor functional changes are made, the only one which stands out is Sebastian Andrzej Siewior's simplification of module reference counting by removing preempt_disable() and that has been tested on linux-next for well over a month without no regressions. I'm now, I guess, also a kitchen sink for some kallsyms changes" [ There was a mis-communication about the concurrent module load changes that I had expected to come through Luis despite me authoring the patch. So some of the module updates were left hanging in the email ether, and I just committed them separately. It's my bad - I should have made it more clear that I expected my own patches to come through the module tree too. Now they missed linux-next, but hopefully that won't cause any issues - Linus ] * tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: make kallsyms_show_value() as generic function kallsyms: move kallsyms_show_value() out of kallsyms.c kallsyms: remove unsed API lookup_symbol_attrs kallsyms: remove unused arch_get_kallsym() helper module: Remove preempt_disable() from module reference counting.
-
Linus Torvalds authored
This is the new-and-improved attempt at avoiding huge memory load spikes when the user space boot sequence tries to load hundreds (or even thousands) of redundant duplicate modules in parallel. See commit 9828ed3f ("module: error out early on concurrent load of the same module file") for background and an earlier failed attempt that was reverted. That earlier attempt just said "concurrently loading the same module is silly, just open the module file exclusively and return -ETXTBSY if somebody else is already loading it". While it is true that concurrent module loads of the same module is silly, the reason that earlier attempt then failed was that the concurrently loaded module would often be a prerequisite for another module. Thus failing to load the prerequisite would then cause cascading failures of the other modules, rather than just short-circuiting that one unnecessary module load. At the same time, we still really don't want to load the contents of the same module file hundreds of times, only to then wait for an eventually successful load, and have everybody else return -EEXIST. As a result, this takes another approach, and treats concurrent module loads from the same file as "idempotent" in the inode. So if one module load is ongoing, we don't start a new one, but instead just wait for the first one to complete and return the same return value as it did. So unlike the first attempt, this does not return early: the intent is not to speed up the boot, but to avoid a thundering herd problem in allocating memory (both physical and virtual) for a module more than once. Also note that this does change behavior: it used to be that when you had concurrent loads, you'd have one "winner" that would return success, and everybody else would return -EEXIST. In contrast, this idempotent logic goes all Oprah on the problem, and says "You are a winner! And you are a winner! We are ALL winners". But since there's no possible actual real semantic difference between "you loaded the module" and "somebody else already loaded the module", this is more of a feel-good change than an actual honest-to-goodness semantic change. Of course, any true Johnny-come-latelies that don't get caught in the concurrency filter will still return -EEXIST. It's no different from not even getting a seat at an Oprah taping. That's life. See the long thread on the kernel mailing list about this all, which includes some numbers for memory use before and after the patch. Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/Reviewed-by: Johan Hovold <johan@kernel.org> Tested-by: Johan Hovold <johan@kernel.org> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Rudi Heitbaum <rudi@heitbaum..com> Tested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
This will simplify the next step, where we can then key off the inode to do one idempotent module load. Let's do the obvious re-organization in one step, and then the new code in another. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds authored
Pull MMC updates from Ulf Hansson: "MMC core: - Allow synchronous detection of (e)MMC/SD/SDIO cards - Fixup error check for ioctls for SPI hosts - Disable broken SD-Cache support for Kingston Canvas Go Plus from 2019 - Disable broken eMMC-Trim support for Kingston EMMC04G-M627 - Disable broken eMMC-Trim support for Micron MTFC4GACAJCN-1M MMC host: - bcm2835: Convert DT bindings to YAML - mmci: - Enable asynchronous probe - Transform the ux500 HW-busy detection into a proper state machine - Add support for SW busy-end timeouts for the ux500 variants - mmci_stm32: - Add support for sdm32 variant revision v3.0 used on STM32MP25 - Improve the tuning sequence - mtk-sd: Tune polling-period to improve performance - sdhci: Fixup DMA configuration for 64-bit DMA mode - sdhci-bcm-kona: Convert DT bindings to YAML - sdhci-msm: - Switch to use the new ICE API - Add support for the SC8280XP/IPQ6018/QDU1000/QRU1000 variants - sdhci-pci-gli: - Add support SD Express cards for GL9767 - Add support for the Genesys Logic GL9767 variant" * tag 'mmc-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (42 commits) dt-bindings: mmc: fsl-imx-esdhc: Add imx6ul support mmc: mmci: Add support for SW busy-end timeouts mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019 mmc: core: disable TRIM on Kingston EMMC04G-M627 mmc: mmci: stm32: add delay block support for STM32MP25 mmc: mmci: stm32: prepare other delay block support mmc: mmci: stm32: manage block gap hardware flow control mmc: mmci: Add support for sdmmc variant revision v3.0 mmc: mmci: add stm32_idmabsize_align parameter dt-bindings: mmc: mmci: Add st,stm32mp25-sdmmc2 compatible mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M mmc: mmci: Break out a helper function mmc: mmci: Use a switch statement machine mmc: mmci: Use state machine state as exit condition mmc: mmci: Retry the busy start condition mmc: mmci: Make busy complete state machine explicit mmc: mmci: Break out error check in busy detect mmc: mmci: Stash status while waiting for busy mmc: mmci: Unwind big if() clause mmc: mmci: Clear busy_status when starting command ...
-