1. 12 Apr, 2024 22 commits
  2. 11 Apr, 2024 18 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 94426ed2
      Jakub Kicinski authored
      Cross-merge networking fixes after downstream PR.
      
      Conflicts:
      
      net/unix/garbage.c
        47d8ac01 ("af_unix: Fix garbage collector racing against connect()")
        4090fa37 ("af_unix: Replace garbage collection algorithm.")
      
      Adjacent changes:
      
      drivers/net/ethernet/broadcom/bnxt/bnxt.c
        faa12ca2 ("bnxt_en: Reset PTP tx_avail after possible firmware reset")
        b3d0083c ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
      
      drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
        7ac10c7d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()")
        194fad5b ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions")
      
      drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
        958f56e4 ("net/mlx5e: Un-expose functions in en.h")
        49e6c938 ("net/mlx5e: RSS, Block XOR hash with over 128 channels")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 2ae9a897
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth.
      
        Current release - new code bugs:
      
         - netfilter: complete validation of user input
      
         - mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
      
        Previous releases - regressions:
      
         - core: fix u64_stats_init() for lockdep when used repeatedly in one
           file
      
         - ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
      
         - bluetooth: fix memory leak in hci_req_sync_complete()
      
         - batman-adv: avoid infinite loop trying to resize local TT
      
         - drv: geneve: fix header validation in geneve[6]_xmit_skb
      
         - drv: bnxt_en: fix possible memory leak in
           bnxt_rdma_aux_device_init()
      
         - drv: mlx5: offset comp irq index in name by one
      
         - drv: ena: avoid double-free clearing stale tx_info->xdpf value
      
         - drv: pds_core: fix pdsc_check_pci_health deadlock
      
        Previous releases - always broken:
      
         - xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
      
         - bluetooth: fix setsockopt not validating user input
      
         - af_unix: clear stale u->oob_skb.
      
         - nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
      
         - drv: virtio_net: fix guest hangup on invalid RSS update
      
         - drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
      
         - dsa: mt7530: trap link-local frames regardless of ST Port State"
      
      * tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
        net: ena: Set tx_info->xdpf value to NULL
        net: ena: Fix incorrect descriptor free behavior
        net: ena: Wrong missing IO completions check order
        net: ena: Fix potential sign extension issue
        af_unix: Fix garbage collector racing against connect()
        net: dsa: mt7530: trap link-local frames regardless of ST Port State
        Revert "s390/ism: fix receive message buffer allocation"
        net: sparx5: fix wrong config being used when reconfiguring PCS
        net/mlx5: fix possible stack overflows
        net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
        net/mlx5e: RSS, Block XOR hash with over 128 channels
        net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
        net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
        net/mlx5e: Fix mlx5e_priv_init() cleanup flow
        net/mlx5e: RSS, Block changing channels number when RXFH is configured
        net/mlx5: Correctly compare pkt reformat ids
        net/mlx5: Properly link new fs rules into the tree
        net/mlx5: offset comp irq index in name by one
        net/mlx5: Register devlink first under devlink lock
        net/mlx5: E-switch, store eswitch pointer before registering devlink_param
        ...
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ab4319fd
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "The most important fix is the sg one because the regression it fixes
        (spurious warning and use after final put) is already backported to
        stable.
      
        The next biggest impact is the target fix for wrong credentials used
        to load a module because it's affecting new kernels installed on
        selinux based distributions.
      
        The other three fixes are an obvious off by one and SATA protocol
        issues"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
        scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
        scsi: hisi_sas: Handle the NCQ error returned by D2H frame
        scsi: target: Fix SELinux error when systemd-modules loads the target module
        scsi: sg: Avoid race in error handling & drop bogus warn
    • Merge tag 'loongarch-fixes-6.9-1' of... · 5de6b467
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
      
       - make {virt, phys, page, pfn} translation work with KFENCE for
         LoongArch (otherwise NVMe and virtio-blk cannot work with KFENCE
         enabled)
      
       - update dts files for Loongson-2K series to make devices work
         correctly
      
       - fix a build error
      
      * tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors
        LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNET
        LoongArch: Update dts for Loongson-2K2000 to support PCI-MSI
        LoongArch: Update dts for Loongson-2K2000 to support ISA/LPC
        LoongArch: Update dts for Loongson-2K1000 to support ISA/LPC
        LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCE
        LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE
        mm: Move lowmem_page_address() a little later
    • Merge tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs · e1dc191d
      Linus Torvalds authored
      Pull more bcachefs fixes from Kent Overstreet:
       "Notable user impacting bugs
      
         - On multi device filesystems, recovery was looping in
           btree_trans_too_many_iters(). This checks if a transaction has
           touched too many btree paths (because of iteration over many keys),
           and issues a restart to drop unneeded paths.
      
           But it's now possible for some paths to exceed the previous limit
           without iteration in the interior btree update path, since the
           transaction commit will do alloc updates for every old and new
           btree node, and during journal replay we don't use the btree write
           buffer for locking reasons and thus those updates use btree paths
           when they wouldn't normally.
      
         - Fix a corner case in rebalance when moving extents on a
           durability=0 device. This wouldn't be hit when a device was
           formatted with durability=0 since in that case we'll only use it as
           a write through cache (only cached extents will live on it), but
           durability can now be changed on an existing device.
      
         - bch2_get_acl() could rarely forget to handle a transaction restart;
           this manifested as the occasional missing acl that came back after
           dropping caches.
      
         - Fix a major performance regression on high iops multithreaded write
           workloads (only since 6.9-rc1); a previous fix for a deadlock in
           the interior btree update path to check the journal watermark
           introduced a dependency on the state of btree write buffer flushing
           that we didn't want.
      
         - Assorted other repair paths and recovery fixes"
      
      * tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs: (25 commits)
        bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
        bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
        bcachefs: Fix a race in btree_update_nodes_written()
        bcachefs: btree_node_scan: Respect member.data_allowed
        bcachefs: Don't scan for btree nodes when we can reconstruct
        bcachefs: Fix check_topology() when using node scan
        bcachefs: fix eytzinger0_find_gt()
        bcachefs: fix bch2_get_acl() transaction restart handling
        bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list
        bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()
        bcachefs: Rename struct field swap to prevent macro naming collision
        MAINTAINERS: Add entry for bcachefs documentation
        Documentation: filesystems: Add bcachefs toctree
        bcachefs: JOURNAL_SPACE_LOW
        bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE
        bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems
        bcachefs: fix rand_delete unit test
        bcachefs: fix ! vs ~ typo in __clear_bit_le64()
        bcachefs: Fix rebalance from durability=0 device
        bcachefs: Print shutdown journal sequence number
        ...
    • Merge tag 'tag-chrome-platform-fixes-for-v6.9-rc4' of... · 346668f0
      Linus Torvalds authored
      Merge tag 'tag-chrome-platform-fixes-for-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
      
      Pull chrome platform fix from Tzung-Bi Shih:
       "Fix a NULL pointer dereference"
      
      * tag 'tag-chrome-platform-fixes-for-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
        platform/chrome: cros_ec_uart: properly fix race condition
    • Merge branch 'mptcp-add-last-time-fields-in-mptcp_info' · a55b39e8
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: add last time fields in mptcp_info
      
      These patches from Geliang add support for the "last time" field in
      MPTCP Info, and verify that the counters look valid.
      
      Patch 1 adds these counters: last_data_sent, last_data_recv and
      last_ack_recv. They are available in the MPTCP Info, so exposed via
      getsockopt(MPTCP_INFO) and the Netlink Diag interface.
      
      Patch 2 adds a test in diag.sh MPTCP selftest, to check that the
      counters have moved by at least 250ms, after having waited twice that
      time.
      
      v1: https://lore.kernel.org/r/20240405-upstream-net-next-20240405-mptcp-last-time-info-v1-0-52dc49453649@kernel.org
      ====================
      
      Link: https://lore.kernel.org/r/20240410-upstream-net-next-20240405-mptcp-last-time-info-v2-0-f95bd6b33e51@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: test last time mptcp_info · 22724c89
      Geliang Tang authored
      This patch adds a new helper chk_msk_info() to show the counters in
      mptcp_info of the given info, and check that the timestamps move
      forward. Use it to show newly added last_data_sent, last_data_recv
      and last_ack_recv in mptcp_info in chk_last_time_info().
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240410-upstream-net-next-20240405-mptcp-last-time-info-v2-2-f95bd6b33e51@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: add last time fields in mptcp_info · 18d82cde
      Geliang Tang authored
      This patch adds "last time" fields last_data_sent, last_data_recv and
      last_ack_recv to struct mptcp_sock to record the last time data was
      sent, data was received and an ack was received. They are all
      initialized to tcp_jiffies32 in __mptcp_init_sock(), and updated to
      tcp_jiffies32 when data is sent in __subflow_push_pending(), data is
      received in __mptcp_move_skbs_from_subflow(), and an ack is received
      in ack_update_msk().
      
      Similar to tcpi_last_data_sent, tcpi_last_data_recv and tcpi_last_ack_recv
      exposed with TCP, this patch exposes the last time "an action happened" for
      MPTCP in mptcp_info, named mptcpi_last_data_sent, mptcpi_last_data_recv and
      mptcpi_last_ack_recv, calculated in mptcp_diag_fill_info() as the time
      deltas between now and the newly added last time fields in mptcp_sock.
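
The delta computation described above can be sketched in userspace C. This is only an illustration under stated assumptions: `ms_since()` and `HZ_ASSUMED` are hypothetical names, the tick rate of 250 is an arbitrary choice, and the code does not reproduce mptcp_diag_fill_info(). It shows why subtracting two 32-bit jiffies snapshots stays correct across counter wraparound.

```c
#include <stdint.h>

/* Assumed tick rate for this sketch; the real value depends on kernel config. */
#define HZ_ASSUMED 250u

/*
 * Milliseconds elapsed between a stored 32-bit jiffies snapshot and "now".
 * Unsigned subtraction is performed modulo 2^32, so the delta is still
 * correct when the counter has wrapped between the two samples.
 */
static uint64_t ms_since(uint32_t now_jiffies, uint32_t last_jiffies)
{
    uint32_t delta = now_jiffies - last_jiffies;

    /* Widen before multiplying so the conversion itself cannot overflow. */
    return (uint64_t)delta * 1000u / HZ_ASSUMED;
}
```

With these assumptions, a snapshot taken 250 ticks ago reports 1000 ms, and a snapshot taken just before the counter wrapped still yields a small positive delta.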
      
      Since msk->last_ack_recv needs to be protected by mptcp_data_lock/unlock,
      and lock_sock_fast can sleep and be quite slow, move the entire
      mptcp_data_lock/unlock block after the lock/unlock_sock_fast block.
      Then mptcpi_last_data_sent and mptcpi_last_data_recv are set in
      lock/unlock_sock_fast block, while mptcpi_last_ack_recv is set in
      mptcp_data_lock/unlock block, which is protected by a spinlock and
      should not block for too long.
      
      Also add three reserved bytes to struct mptcp_info to avoid holes in
      this structure, which is exposed to userspace.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/446
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240410-upstream-net-next-20240405-mptcp-last-time-info-v2-1-f95bd6b33e51@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch mana-ib-flex of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git · 0e36c21d
      Jakub Kicinski authored
      Erick Archer says:
      
      ====================
      mana: Add flex array to struct mana_cfg_rx_steer_req_v2 (part)
      
      The "struct mana_cfg_rx_steer_req_v2" uses a dynamically sized set of
      trailing elements. Specifically, it uses a "mana_handle_t" array. So,
      use the kernel's preferred way of declaring a flexible array [1].
      
      At the same time, prepare for the coming implementation by GCC and Clang
      of the __counted_by attribute. Flexible array members annotated with
      __counted_by can have their accesses bounds-checked at run-time via
      CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
      strcpy/memcpy-family functions).
      
      Also, avoid the open-coded arithmetic in the memory allocator functions
      [2] using the "struct_size" macro.
      
      Moreover, use the "offsetof" helper to get the indirect table offset
      instead of the "sizeof" operator and avoid the open-coded arithmetic in
      pointers using the new flex member. This new structure member also allows
      us to remove the "req_indir_tab" variable since it is no longer needed.
      
      Now, it is also possible to use the "flex_array_size" helper to compute
      the size of these trailing elements in the "memcpy" function.
      
      Specifically, the first commit adds the flex member and the patches 2 and
      3 refactor the consumers of the "struct mana_cfg_rx_steer_req_v2".
      
      This code was detected with the help of Coccinelle, and audited and
      modified manually. The Coccinelle script used to detect this code pattern
      is the following:
      
      virtual report
      
      @rule1@
      type t1;
      type t2;
      identifier i0;
      identifier i1;
      identifier i2;
      identifier ALLOC =~ "kmalloc|kzalloc|kmalloc_node|kzalloc_node|vmalloc|vzalloc|kvmalloc|kvzalloc";
      position p1;
      @@
      
      i0 = sizeof(t1) + sizeof(t2) * i1;
      ...
      i2 = ALLOC@p1(..., i0, ...);
      
      @script:python depends on report@
      p1 << rule1.p1;
      @@
      
      msg = "WARNING: verify allocation on line %s" % (p1[0].line)
      coccilib.report.print_report(p1[0],msg)
      
      Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [1]
      Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2]
      
      v1: https://lore.kernel.org/linux-hardening/AS8PR02MB7237974EF1B9BAFA618166C38B382@AS8PR02MB7237.eurprd02.prod.outlook.com/
      v2: https://lore.kernel.org/linux-hardening/AS8PR02MB723729C5A63F24C312FC9CD18B3F2@AS8PR02MB7237.eurprd02.prod.outlook.com/
      ====================
      
      Link: https://lore.kernel.org/r/AS8PR02MB72374BD1B23728F2E3C3B1A18B022@AS8PR02MB7237.eurprd02.prod.outlook.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mana: Avoid open coded arithmetic · a68292eb
      Erick Archer authored
      This is an effort to get rid of all multiplications from allocation
      functions in order to prevent integer overflows [1][2].
      
      As the "req" variable is a pointer to "struct mana_cfg_rx_steer_req_v2"
      and this structure ends in a flexible array:
      
      struct mana_cfg_rx_steer_req_v2 {
              [...]
              mana_handle_t indir_tab[] __counted_by(num_indir_entries);
      };
      
      the preferred way in the kernel is to use the struct_size() helper to
      do the arithmetic instead of the calculation "size + size * count" in
      the kzalloc() function.
      
      Moreover, use the "offsetof" helper to get the indirect table offset
      instead of the "sizeof" operator and avoid the open-coded arithmetic in
      pointers using the new flex member. This new structure member also allows
      us to remove the "req_indir_tab" variable since it is no longer needed.
      
      Now, it is also possible to use the "flex_array_size" helper to compute
      the size of these trailing elements in the "memcpy" function.
      
      This way, the code is more readable and safer.
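
As a rough userspace illustration of the pattern this commit message describes, the sketch below allocates a structure ending in a flexible array with an overflow-checked size instead of open-coded "size + size * count". Everything here is an assumption made for illustration: `struct steer_req`, `handle_t`, `steer_req_size()` and `steer_req_alloc()` are hypothetical stand-ins, not the MANA driver's definitions, and the GCC/Clang overflow builtins take the place of the kernel's struct_size() helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t handle_t; /* hypothetical stand-in for mana_handle_t */

/* Hypothetical request layout: fixed header plus a flexible array member. */
struct steer_req {
    uint16_t num_indir_entries;
    uint32_t indir_tab_offset;
    handle_t indir_tab[];
};

/*
 * Overflow-checked "header + count * element" size. On overflow it
 * saturates to SIZE_MAX, a size no allocator can satisfy.
 */
static size_t steer_req_size(size_t count)
{
    size_t bytes;

    if (__builtin_mul_overflow(count, sizeof(handle_t), &bytes) ||
        __builtin_add_overflow(bytes, sizeof(struct steer_req), &bytes))
        return SIZE_MAX;
    return bytes;
}

/* Allocate a request and copy in a non-NULL table of `count` entries. */
static struct steer_req *steer_req_alloc(const handle_t *tab, uint16_t count)
{
    size_t size = steer_req_size(count);
    struct steer_req *req;

    if (size == SIZE_MAX)
        return NULL;
    req = calloc(1, size);
    if (!req)
        return NULL;

    req->num_indir_entries = count;
    /* offsetof() names the array member directly; no pointer arithmetic. */
    req->indir_tab_offset = offsetof(struct steer_req, indir_tab);
    /* count is 16-bit, so this product cannot overflow size_t. */
    memcpy(req->indir_tab, tab, count * sizeof(req->indir_tab[0]));
    return req;
}
```

The caller releases the result with free(). Returning a saturated size rather than a wrapped one means a bad count turns into a failed allocation instead of an undersized buffer.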
      
      This code was detected with the help of Coccinelle, and audited and
      modified manually.
      
      Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
      Link: https://github.com/KSPP/linux/issues/160 [2]
      Signed-off-by: Erick Archer <erick.archer@outlook.com>
      Link: https://lore.kernel.org/r/AS8PR02MB7237A21355C86EC0DCC0D83B8B022@AS8PR02MB7237.eurprd02.prod.outlook.com
      Reviewed-by: Justin Stitt <justinstitt@google.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • RDMA/mana_ib: Prefer struct_size over open coded arithmetic · 29b8e13a
      Erick Archer authored
      This is an effort to get rid of all multiplications from allocation
      functions in order to prevent integer overflows [1][2].
      
      As the "req" variable is a pointer to "struct mana_cfg_rx_steer_req_v2"
      and this structure ends in a flexible array:
      
      struct mana_cfg_rx_steer_req_v2 {
              [...]
              mana_handle_t indir_tab[] __counted_by(num_indir_entries);
      };
      
      the preferred way in the kernel is to use the struct_size() helper to
      do the arithmetic instead of the calculation "size + size * count" in
      the kzalloc() function.
      
      Moreover, use the "offsetof" helper to get the indirect table offset
      instead of the "sizeof" operator and avoid the open-coded arithmetic in
      pointers using the new flex member. This new structure member also allows
      us to remove the "req_indir_tab" variable since it is no longer needed.
      
      This way, the code is more readable and safer.
      
      This code was detected with the help of Coccinelle, and audited and
      modified manually.
      
      Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
      Link: https://github.com/KSPP/linux/issues/160 [2]
      Signed-off-by: Erick Archer <erick.archer@outlook.com>
      Link: https://lore.kernel.org/r/AS8PR02MB72375EB06EE1A84A67BE722E8B022@AS8PR02MB7237.eurprd02.prod.outlook.com
      Reviewed-by: Long Li <longli@microsoft.com>
      Reviewed-by: Justin Stitt <justinstitt@google.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • net: mana: Add flex array to struct mana_cfg_rx_steer_req_v2 · bfec4e18
      Erick Archer authored
      The "struct mana_cfg_rx_steer_req_v2" uses a dynamically sized set of
      trailing elements. Specifically, it uses a "mana_handle_t" array. So,
      use the kernel's preferred way of declaring a flexible array [1].
      
      At the same time, prepare for the coming implementation by GCC and Clang
      of the __counted_by attribute. Flexible array members annotated with
      __counted_by can have their accesses bounds-checked at run-time via
      CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
      strcpy/memcpy-family functions).
      
      This is a preparatory step before refactoring the two consumers of this structure.
      
       drivers/infiniband/hw/mana/qp.c
       drivers/net/ethernet/microsoft/mana/mana_en.c
      
      The ultimate goal is to avoid the open-coded arithmetic in the memory
      allocator functions [2] using the "struct_size" macro.
      
      Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [1]
      Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2]
      Signed-off-by: Erick Archer <erick.archer@outlook.com>
      Link: https://lore.kernel.org/r/AS8PR02MB7237E2900247571C9CB84C678B022@AS8PR02MB7237.eurprd02.prod.outlook.com
      Reviewed-by: Long Li <longli@microsoft.com>
      Reviewed-by: Justin Stitt <justinstitt@google.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • Merge branch 'ena-driver-bug-fixes' · 4e1ad31c
      Paolo Abeni authored
      David Arinzon says:
      
      ====================
      ENA driver bug fixes
      
      From: David Arinzon <darinzon@amazon.com>
      
      This patchset contains multiple bug fixes for the
      ENA driver.
      ====================
      
      Link: https://lore.kernel.org/r/20240410091358.16289-1-darinzon@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ena: Set tx_info->xdpf value to NULL · 36a1ca01
      David Arinzon authored
      The patch mentioned in the `Fixes` tag removed the explicit assignment
      of tx_info->xdpf to NULL with the justification that there's no need
      to set tx_info->xdpf to NULL and tx_info->num_of_bufs to 0 in case
      of a mapping error. Both values won't be used once the mapping function
      returns an error, and their values would be overridden by the next
      transmitted packet.
      
      While both values do indeed get overridden in the next transmission
      call, the value of tx_info->xdpf is also used to check whether a TX
      descriptor's transmission has been completed (i.e. a completion for it
      was polled).
      
      An example scenario:
      1. Mapping failed, tx_info->xdpf wasn't set to NULL
      2. A VF reset occurred leading to IO resource destruction and
         a call to ena_free_tx_bufs() function
      3. Although the descriptor whose mapping failed was freed by the
         transmission function, it still passes the check
           if (!tx_info->skb)
      
         (skb and xdp_frame are in a union)
      4. The xdp_frame associated with the descriptor is freed twice
      
      This patch restores the assignment of NULL to tx_info->xdpf so that the
      cleaning function knows that the descriptor is already freed.
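
The scenario above can be modelled with a small userspace sketch. It is a simplified illustration only: `struct frame`, `struct tx_info`, `xmit_map_failed()` and `free_tx_bufs()` are hypothetical stand-ins that mirror the roles described in the commit message, not the ENA driver's code, and a counter replaces the real free so the double release can be observed safely.

```c
#include <stddef.h>

/* Hypothetical stand-in for an XDP frame; counts how often it is released. */
struct frame {
    int free_calls;
};

/* Hypothetical per-descriptor state; a non-NULL xdpf means "still in flight". */
struct tx_info {
    struct frame *xdpf;
};

static void frame_release(struct frame *f)
{
    f->free_calls++;
}

/*
 * Transmit path when DMA mapping fails: the frame is released immediately.
 * clear_on_error selects whether the pointer is reset afterwards, which is
 * the assignment the patch restores.
 */
static void xmit_map_failed(struct tx_info *info, struct frame *f,
                            int clear_on_error)
{
    info->xdpf = f;
    frame_release(f);
    if (clear_on_error)
        info->xdpf = NULL;
}

/* Teardown path: release every descriptor that still looks in flight. */
static void free_tx_bufs(struct tx_info *info)
{
    if (!info->xdpf)
        return;
    frame_release(info->xdpf);
    info->xdpf = NULL;
}
```

Without the reset, teardown sees a stale pointer and releases the frame a second time. With the reset, teardown skips the descriptor.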
      
      Fixes: 504fd6a5 ("net: ena: fix DMA mapping function issues in XDP")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ena: Fix incorrect descriptor free behavior · bf02d9fe
      David Arinzon authored
      ENA has two types of TX queues:
      - queues which only process TX packets arriving from the network stack
      - queues which only process TX packets forwarded to it by XDP_REDIRECT
        or XDP_TX instructions
      
      The ena_free_tx_bufs() function cycles through all descriptors in a TX queue
      and unmaps + frees every descriptor that hasn't been acknowledged yet
      by the device (uncompleted TX transactions).
      The function assumes that the processed TX queue is necessarily from
      the first category listed above and ends up using napi_consume_skb()
      for descriptors belonging to an XDP specific queue.
      
      This patch solves a bug in which, in case of a VF reset, the
      descriptors aren't freed correctly, leading to crashes.
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ena: Wrong missing IO completions check order · f7e41718
      David Arinzon authored
      Missing IO completions check is called every second (HZ jiffies).
      This commit fixes several issues with this check:
      
      1. Duplicate queues check:
         Max of 4 queues are scanned on each check due to monitor budget.
         Once reaching the budget, this check exits under the assumption that
         the next check will continue to scan the remainder of the queues,
         but in practice, next check will first scan the last already scanned
         queue which is not necessary and may cause the full queue scan to
         last a couple of seconds longer.
         The fix is to start every check with the next queue to scan.
         For example, on 8 IO queues:
         Bug: [0,1,2,3], [3,4,5,6], [6,7]
         Fix: [0,1,2,3], [4,5,6,7]
      
      2. Unbalanced queues check:
         In case the number of active IO queues is not a multiple of budget,
         there will be checks which don't utilize the full budget
         because the full scan exits when reaching the last queue id.
         The fix is to run every TX completion check with exact queue budget
         regardless of the queue id.
         For example, on 7 IO queues:
         Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
         Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
         The budget may be lowered in case the number of IO queues is less
         than the budget (4) to make sure there are no duplicate queues on
         the same check.
         For example, on 3 IO queues:
         Bug: [0,1,2,0], [1,2,0,1]
         Fix: [0,1,2], [0,1,2]
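
The fixed scan order in the examples above can be reproduced with a short userspace sketch. The names `struct monitor`, `monitor_check()` and `MONITOR_BUDGET` are hypothetical and chosen for illustration. The driver's own implementation is not shown in this log, so this only models the behaviour the commit message describes: a fixed budget per check, wrapping around the queue ids, and a lowered budget when there are fewer queues than the budget.

```c
#include <stddef.h>

#define MONITOR_BUDGET 4 /* queues scanned per check, per the text above */

/* Hypothetical monitor state: remembers where the next check starts. */
struct monitor {
    unsigned int num_queues;
    unsigned int next_qid;
};

/*
 * Run one check. The visited queue ids are written to out[] and their
 * count is returned. The budget is capped at num_queues so a single
 * check never visits the same queue twice.
 */
static size_t monitor_check(struct monitor *m, unsigned int out[MONITOR_BUDGET])
{
    unsigned int budget = m->num_queues < MONITOR_BUDGET ?
                          m->num_queues : MONITOR_BUDGET;
    unsigned int qid = m->next_qid;
    size_t n = 0;

    while (budget--) {
        out[n++] = qid;
        qid = (qid + 1) % m->num_queues; /* wrap past the last queue id */
    }
    m->next_qid = qid; /* the next check resumes after the last scanned queue */
    return n;
}
```

Starting from queue 0 with 7 queues, three consecutive calls visit [0,1,2,3], [4,5,6,0] and [1,2,3,4], matching the "Fix" lines above.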
      
      Fixes: 1738cd3e ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: Amit Bernstein <amitbern@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ena: Fix potential sign extension issue · 713a8519
      David Arinzon authored
      In a multiplication, small unsigned types are promoted to a larger
      signed type (int), and the product may overflow it.
      If the result of such a multiplication has its MSB set, it is sign
      extended with '1's when it is converted to a wider type.
      This changes the multiplication result.
      
      Code example of the phenomenon:
      -------------------------------
      u16 x, y;
      size_t z1, z2;
      
      x = y = 0xffff;
      printk("x=%x y=%x\n",x,y);
      
      z1 = x*y;
      z2 = (size_t)x*y;
      
      printk("z1=%lx z2=%lx\n", z1, z2);
      
      Output:
      -------
      x=ffff y=ffff
      z1=fffffffffffe0001 z2=fffe0001
      
      The expected result of ffff*ffff is fffe0001, and without the
      explicit casting to avoid the unwanted sign extension we got
      fffffffffffe0001.
      
      This commit adds an explicit casting to avoid the sign extension
      issue.
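
For readers who want to reproduce the two values outside the kernel, here is a portable userspace sketch. It rests on a few assumptions: the helper names `mul_promoted()` and `mul_widened()` are hypothetical, a 32-bit int and a 64-bit result type are assumed, and the sign extension is emulated with explicit bit operations so the demonstration does not itself rely on signed overflow.

```c
#include <stdint.h>

/*
 * Emulates x * y for two 16-bit operands on a platform with 32-bit int:
 * the product is formed in 32 bits, and widening it afterwards copies
 * bit 31 into the upper half of the 64-bit result.
 */
static uint64_t mul_promoted(uint16_t x, uint16_t y)
{
    uint32_t low32 = (uint32_t)x * (uint32_t)y; /* well-defined, modulo 2^32 */

    if (low32 & UINT32_C(0x80000000))
        return UINT64_C(0xffffffff00000000) | low32; /* sign-extended */
    return low32;
}

/* Widening one operand first keeps the whole computation unsigned. */
static uint64_t mul_widened(uint16_t x, uint16_t y)
{
    return (uint64_t)x * y;
}
```

For x = y = 0xffff, mul_promoted() returns 0xfffffffffffe0001 and mul_widened() returns 0xfffe0001, the same pair as in the output above.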
      
      Fixes: 689b2bda ("net: ena: add functions for handling Low Latency Queues in ena_com")
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>