1. 16 Feb, 2024 3 commits
    • Tobias Waldekranz's avatar
      net: bridge: switchdev: Ensure deferred event delivery on unoffload · f7a70d65
      Tobias Waldekranz authored
      When unoffloading a device, it is important to ensure that all
      relevant deferred events are delivered to it before it disassociates
      itself from the bridge.
      
      Before this change, this was true for the normal case when a device
      maps 1:1 to a net_bridge_port, i.e.
      
         br0
         /
      swp0
      
      When swp0 leaves br0, the call to switchdev_deferred_process() in
      del_nbp() makes sure to process any outstanding events while the
      device is still associated with the bridge.
      
      In the case when the association is indirect though, i.e. when the
      device is attached to the bridge via an intermediate device, like a
      LAG...
      
          br0
          /
        lag0
        /
      swp0
      
      ...then detaching swp0 from lag0 does not cause any net_bridge_port to
      be deleted, so there was no guarantee that all events had been
      processed before the device disassociated itself from the bridge.
      
      Fix this by always synchronously processing all deferred events before
      signaling completion of unoffloading back to the driver.
      
      Fixes: 4e51bf44 ("net: bridge: move the switchdev object replay helpers to "push" mode")
      Signed-off-by: default avatarTobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7a70d65
    • Tobias Waldekranz's avatar
      net: bridge: switchdev: Skip MDB replays of deferred events on offload · dc489f86
      Tobias Waldekranz authored
      Before this change, generation of the list of MDB events to replay
      would race against the creation of new group memberships, either from
      the IGMP/MLD snooping logic or from user configuration.
      
      While new memberships are immediately visible to walkers of
      br->mdb_list, the notification of their existence to switchdev event
      subscribers is deferred until a later point in time. So if a replay
      list was generated during a time that overlapped with such a window,
      it would also contain a replay of the not-yet-delivered event.
      
      The driver would thus receive two copies of what the bridge internally
      considered to be one single event. On destruction of the bridge, only
      a single membership deletion event was therefore sent. As a
      consequence of this, drivers which reference count memberships (at
      least DSA), would be left with orphan groups in their hardware
      database when the bridge was destroyed.
      
      This is only an issue when replaying additions. While deletion events
      may still be pending on the deferred queue, they will already have
      been removed from br->mdb_list, so no duplicates can be generated in
      that scenario.
      
      To a user this meant that old group memberships, from a bridge in
      which a port was previously attached, could be reanimated (in
      hardware) when the port joined a new bridge, without the new bridge's
      knowledge.
      
      For example, on an mv88e6xxx system, create a snooping bridge and
      immediately add a port to it:
      
          root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
          > ip link set dev x3 up master br0
      
      And then destroy the bridge:
      
          root@infix-06-0b-00:~$ ip link del dev br0
          root@infix-06-0b-00:~$ mvls atu
          ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
          DEV:0 Marvell 88E6393X
          33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
          root@infix-06-0b-00:~$
      
      The two IPv6 groups remain in the hardware database because the
      port (x3) is notified of the host's membership twice: once via the
      original event and once via a replay. Since only a single delete
      notification is sent, the count remains at 1 when the bridge is
      destroyed.
      
      Then add the same port (or another port belonging to the same hardware
      domain) to a new bridge, this time with snooping disabled:
      
          root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
          > ip link set dev x3 up master br1
      
      All multicast, including the two IPv6 groups from br0, should now be
      flooded, according to the policy of br1. But instead the old
      memberships are still active in the hardware database, causing the
      switch to only forward traffic to those groups towards the CPU (port
      0).
      
      Eliminate the race in two steps:
      
      1. Grab the write-side lock of the MDB while generating the replay
         list.
      
      This prevents new memberships from showing up while we are generating
      the replay list. But it leaves the scenario in which a deferred event
      was already generated, but not delivered, before we grabbed the
      lock. Therefore:
      
      2. Make sure that no deferred version of a replay event is already
         enqueued to the switchdev deferred queue, before adding it to the
         replay list, when replaying additions.
      
      Fixes: 4f2673b3 ("net: bridge: add helper to replay port and host-joined mdb entries")
      Signed-off-by: default avatarTobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc489f86
    • Alexander Gordeev's avatar
      net/iucv: fix the allocation size of iucv_path_table array · b4ea9b6a
      Alexander Gordeev authored
      iucv_path_table is a dynamically allocated array of pointers to
      struct iucv_path items. Yet, its size is calculated as if it was
      an array of struct iucv_path items.
      Signed-off-by: default avatarAlexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: default avatarAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4ea9b6a
  2. 15 Feb, 2024 28 commits
  3. 14 Feb, 2024 9 commits
    • Linus Torvalds's avatar
      Merge tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 1f3a3e2a
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A few regular fixes and one fix for space reservation regression since
        6.7 that users have been reporting:
      
         - fix over-reservation of metadata chunks due to not keeping proper
           balance between global block reserve and delayed refs reserve; in
           practice this leaves behind empty metadata block groups, the
           workaround is to reclaim them by using the '-musage=1' balance
           filter
      
         - other space reservation fixes:
            - do not delete unused block group if it may be used soon
            - do not reserve space for checksums for NOCOW files
      
         - fix extent map assertion failure when writing out free space inode
      
         - reject encoded write if inode has nodatasum flag set
      
         - fix chunk map leak when loading block group zone info"
      
      * tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: don't refill whole delayed refs block reserve when starting transaction
        btrfs: zoned: fix chunk map leak when loading block group zone info
        btrfs: reject encoded write if inode has nodatasum flag set
        btrfs: don't reserve space for checksums when writing to nocow files
        btrfs: add new unused block groups to the list of unused block groups
        btrfs: do not delete unused block group if it may be used soon
        btrfs: add and use helper to check if block group is used
        btrfs: don't drop extent_map for free space inode on write error
      1f3a3e2a
    • Linus Torvalds's avatar
      Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of... · 91f842ff
      Linus Torvalds authored
      Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull KUnit fix from Shuah Khan:
       "One important fix to unregister kunit_bus when KUnit module is
        unloaded.
      
        Not doing so causes an error when KUnit module tries to re-register
        the bus when it gets reloaded"
      
      * tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kunit: device: Unregister the kunit_bus on shutdown
      91f842ff
    • Felix Fietkau's avatar
      netfilter: nf_tables: fix bidirectional offload regression · 84443741
      Felix Fietkau authored
      Commit 8f84780b ("netfilter: flowtable: allow unidirectional rules")
      made unidirectional flow offload possible, while completely ignoring (and
      breaking) bidirectional flow offload for nftables.
      Add the missing flag that was left out as an exercise for the reader :)
      
      Cc: Vlad Buslov <vladbu@nvidia.com>
      Fixes: 8f84780b ("netfilter: flowtable: allow unidirectional rules")
      Reported-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      84443741
    • Kyle Swenson's avatar
      netfilter: nat: restore default DNAT behavior · 0f1ae282
      Kyle Swenson authored
      When a DNAT rule is configured via iptables with different port ranges,
      
      iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 -m tcp --dport 32000:32010
      -j DNAT --to-destination 192.168.0.10:21000-21010
      
      we seem to be DNATing to some random port on the LAN side. While this is
      expected if --random is passed to the iptables command, it is not
      expected without passing --random.  The expected behavior (and the
      observed behavior prior to the commit in the "Fixes" tag) is the traffic
      will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision
      with that destination.  In that case, we expect the traffic to be
      instead DNAT'd to 192.168.0.10:21001, so on so forth until the end of
      the range.
      
      This patch intends to restore the behavior observed prior to the "Fixes"
      tag.
      
      Fixes: 6ed5943f ("netfilter: nat: remove l4 protocol port rovers")
      Signed-off-by: default avatarKyle Swenson <kyle.swenson@est.tech>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0f1ae282
    • Pablo Neira Ayuso's avatar
      netfilter: nft_set_pipapo: fix missing : in kdoc · f6374a82
      Pablo Neira Ayuso authored
      Add missing : in kdoc field names.
      
      Fixes: 8683f4b9 ("nft_set_pipapo: Prepare for vectorised implementation: helpers")
      Reported-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      f6374a82
    • Sasha Neftin's avatar
      igc: Remove temporary workaround · 55ea9899
      Sasha Neftin authored
      PHY_CONTROL register works as defined in the IEEE 802.3 specification
      (IEEE 802.3-2008 22.2.4.1). Tidy up the temporary workaround.
      
      User impact: PHY can now be powered down when the ethernet link is down.
      
      Testing hints: ip link set down <device> (or just disconnect the
      ethernet cable).
      
      Oldest tested NVM version is: 1045:740.
      
      Fixes: 5586838f ("igc: Add code for PHY support")
      Signed-off-by: default avatarSasha Neftin <sasha.neftin@intel.com>
      Reviewed-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      55ea9899
    • Kunwu Chan's avatar
      igb: Fix string truncation warnings in igb_set_fw_version · c56d0558
      Kunwu Chan authored
      Commit 1978d3ea ("intel: fix string truncation warnings")
      fixes '-Wformat-truncation=' warnings in igb_main.c by using kasprintf.
      
      drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning:‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=]
       3092 |                                  "%d.%d, 0x%08x, %d.%d.%d",
            |                                                     ^~
      drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
       3092 |                                  "%d.%d, 0x%08x, %d.%d.%d",
            |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
      drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note:‘snprintf’ output between 23 and 43 bytes into a destination of size 32
      
      kasprintf() returns a pointer to dynamically allocated memory
      which can be NULL upon failure.
      
      Fix this warning by using a larger space for adapter->fw_version,
      and then fall back and continue to use snprintf.
      
      Fixes: 1978d3ea ("intel: fix string truncation warnings")
      Signed-off-by: default avatarKunwu Chan <chentao@kylinos.cn>
      Cc: Kunwu Chan <kunwu.chan@hotmail.com>
      Suggested-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      c56d0558
    • Maxime Jayat's avatar
      can: netlink: Fix TDCO calculation using the old data bittiming · 2aa0a5e6
      Maxime Jayat authored
      The TDCO calculation was done using the currently applied data bittiming,
      instead of the newly computed data bittiming, which means that the TDCO
      had an invalid value unless setting the same data bittiming twice.
      
      Fixes: d99755f7 ("can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)")
      Signed-off-by: default avatarMaxime Jayat <maxime.jayat@mobile-devices.fr>
      Reviewed-by: default avatarVincent Mailhol <mailhol.vincent@wanadoo.fr>
      Link: https://lore.kernel.org/all/40579c18-63c0-43a4-8d4c-f3a6c1c0b417@munic.io
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      2aa0a5e6
    • Oleksij Rempel's avatar
      can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER) · efe7cf82
      Oleksij Rempel authored
      Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...)
      modifies jsk->filters while receiving packets.
      
      Following trace was seen on affected system:
       ==================================================================
       BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
       Read of size 4 at addr ffff888012144014 by task j1939/350
      
       CPU: 0 PID: 350 Comm: j1939 Tainted: G        W  OE      6.5.0-rc5 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
       Call Trace:
        print_report+0xd3/0x620
        ? kasan_complete_mode_report_info+0x7d/0x200
        ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
        kasan_report+0xc2/0x100
        ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
        __asan_load4+0x84/0xb0
        j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
        j1939_sk_recv+0x20b/0x320 [can_j1939]
        ? __kasan_check_write+0x18/0x20
        ? __pfx_j1939_sk_recv+0x10/0x10 [can_j1939]
        ? j1939_simple_recv+0x69/0x280 [can_j1939]
        ? j1939_ac_recv+0x5e/0x310 [can_j1939]
        j1939_can_recv+0x43f/0x580 [can_j1939]
        ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
        ? raw_rcv+0x42/0x3c0 [can_raw]
        ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
        can_rcv_filter+0x11f/0x350 [can]
        can_receive+0x12f/0x190 [can]
        ? __pfx_can_rcv+0x10/0x10 [can]
        can_rcv+0xdd/0x130 [can]
        ? __pfx_can_rcv+0x10/0x10 [can]
        __netif_receive_skb_one_core+0x13d/0x150
        ? __pfx___netif_receive_skb_one_core+0x10/0x10
        ? __kasan_check_write+0x18/0x20
        ? _raw_spin_lock_irq+0x8c/0xe0
        __netif_receive_skb+0x23/0xb0
        process_backlog+0x107/0x260
        __napi_poll+0x69/0x310
        net_rx_action+0x2a1/0x580
        ? __pfx_net_rx_action+0x10/0x10
        ? __pfx__raw_spin_lock+0x10/0x10
        ? handle_irq_event+0x7d/0xa0
        __do_softirq+0xf3/0x3f8
        do_softirq+0x53/0x80
        </IRQ>
        <TASK>
        __local_bh_enable_ip+0x6e/0x70
        netif_rx+0x16b/0x180
        can_send+0x32b/0x520 [can]
        ? __pfx_can_send+0x10/0x10 [can]
        ? __check_object_size+0x299/0x410
        raw_sendmsg+0x572/0x6d0 [can_raw]
        ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
        ? apparmor_socket_sendmsg+0x2f/0x40
        ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
        sock_sendmsg+0xef/0x100
        sock_write_iter+0x162/0x220
        ? __pfx_sock_write_iter+0x10/0x10
        ? __rtnl_unlock+0x47/0x80
        ? security_file_permission+0x54/0x320
        vfs_write+0x6ba/0x750
        ? __pfx_vfs_write+0x10/0x10
        ? __fget_light+0x1ca/0x1f0
        ? __rcu_read_unlock+0x5b/0x280
        ksys_write+0x143/0x170
        ? __pfx_ksys_write+0x10/0x10
        ? __kasan_check_read+0x15/0x20
        ? fpregs_assert_state_consistent+0x62/0x70
        __x64_sys_write+0x47/0x60
        do_syscall_64+0x60/0x90
        ? do_syscall_64+0x6d/0x90
        ? irqentry_exit+0x3f/0x50
        ? exc_page_fault+0x79/0xf0
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
       Allocated by task 348:
        kasan_save_stack+0x2a/0x50
        kasan_set_track+0x29/0x40
        kasan_save_alloc_info+0x1f/0x30
        __kasan_kmalloc+0xb5/0xc0
        __kmalloc_node_track_caller+0x67/0x160
        j1939_sk_setsockopt+0x284/0x450 [can_j1939]
        __sys_setsockopt+0x15c/0x2f0
        __x64_sys_setsockopt+0x6b/0x80
        do_syscall_64+0x60/0x90
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
       Freed by task 349:
        kasan_save_stack+0x2a/0x50
        kasan_set_track+0x29/0x40
        kasan_save_free_info+0x2f/0x50
        __kasan_slab_free+0x12e/0x1c0
        __kmem_cache_free+0x1b9/0x380
        kfree+0x7a/0x120
        j1939_sk_setsockopt+0x3b2/0x450 [can_j1939]
        __sys_setsockopt+0x15c/0x2f0
        __x64_sys_setsockopt+0x6b/0x80
        do_syscall_64+0x60/0x90
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Reported-by: default avatarSili Luo <rootlab@huawei.com>
      Suggested-by: default avatarSili Luo <rootlab@huawei.com>
      Acked-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/all/20231020133814.383996-1-o.rempel@pengutronix.deSigned-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      efe7cf82