1. 06 Oct, 2023 1 commit
    • Merge tag 'for-linus-2023100502' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 19fbf677
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - power management fix for intel-ish-hid (Srinivas Pandruvada)
      
       - power management fix for hid-nintendo (Martino Fontana)
      
       - error handling fixes for nvidia-shield (Christophe JAILLET)
      
       - memory leak fix for hid-sony (Christophe JAILLET)
      
       - fix for slab out-of-bound write in hid-holtek (Ma Ke)
      
       - other assorted smaller fixes and device ID / quirk entry additions
      
      * tag 'for-linus-2023100502' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: Add quirk to ignore the touchscreen battery on HP ENVY 15-eu0556ng
        HID: intel-ish-hid: ipc: Disable and reenable ACPI GPE bit
        HID: sony: remove duplicate NULL check before calling usb_free_urb()
        HID: nintendo: reinitialize USB Pro Controller after resuming from suspend
        HID: nvidia-shield: Fix some missing function calls() in the probe error handling path
        HID: nvidia-shield: Fix a missing led_classdev_unregister() in the probe error handling path
        HID: multitouch: Add required quirk for Synaptics 0xcd7e device
        HID: nvidia-shield: Select POWER_SUPPLY Kconfig option
        HID: holtek: fix slab-out-of-bounds Write in holtek_kbd_input_event
        HID: nvidia-shield: add LEDS_CLASS dependency
        HID: logitech-hidpp: Add Bluetooth ID for the Logitech M720 Triathlon mouse
        HID: steelseries: Fix signedness bug in steelseries_headset_arctis_1_fetch_battery()
        HID: sony: Fix a potential memory leak in sony_probe()
      19fbf677
  2. 05 Oct, 2023 27 commits
  3. 04 Oct, 2023 12 commits
    • tcp: fix delayed ACKs for MSS boundary condition · 4720852e
      Neal Cardwell authored
      This commit fixes poor delayed ACK behavior that can cause poor TCP
      latency in a particular boundary condition: when an application makes
      a TCP socket write that is an exact multiple of the MSS size.
      
      The problem is that there is a painful boundary discontinuity in the
      current delayed ACK behavior. With the current delayed ACK behavior,
      we have:
      
      (1) If an app reads data when > 1*MSS is unacknowledged, then
          tcp_cleanup_rbuf() ACKs immediately because of:
      
           tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
      
      (2) If an app reads all received data, and the packets were < 1*MSS,
          and either (a) the app is not ping-pong or (b) we received two
          packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately because
          of:
      
           ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
            ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
             !inet_csk_in_pingpong_mode(sk))) &&
      
      (3) *However*: if an app reads exactly 1*MSS of data,
          tcp_cleanup_rbuf() does not send an immediate ACK. This is true
          even if the app is not ping-pong and the 1*MSS of data had the PSH
          bit set, suggesting the sending application completed an
          application write.
      
      Thus if the app is not ping-pong, we have this painful case where
      >1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a
      write whose last skb is an exact multiple of 1*MSS can get a 40ms
      delayed ACK. This means that any app that transfers data in one
      direction and takes care to align write size or packet size with MSS
      can suffer this problem. With receive zero copy making 4KB MSS values
      more common, it is becoming more common to have application writes
      naturally align with MSS, and more applications are likely to
      encounter this delayed ACK problem.
      
      The fix in this commit is to refine the delayed ACK heuristics with a
      simple check: immediately ACK a received 1*MSS skb with PSH bit set if
      the app reads all data. Why? If an skb has a len of exactly 1*MSS and
      has the PSH bit set then it is likely the end of an application
      write. So more data may not be arriving soon, and yet the data sender
      may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we
      set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send
      an ACK immediately if the app reads all of the data and is not
      ping-pong. Note that this logic is also executed for the case where
      len > MSS, but in that case this logic does not matter (and does not
      hurt) because tcp_cleanup_rbuf() will always ACK immediately if the
      app reads data and there is more than an MSS of unACKed data.
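
The cases above can be modelled in user space. The following is a minimal sketch, not the kernel code: `struct ack_model`, `model_rcv_skb()`, `model_ack_on_read()`, and the flag values are hypothetical names chosen for illustration, the sub-MSS bookkeeping is simplified, and the other conditions checked by tcp_cleanup_rbuf() (such as window updates) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the kernel defines its own. */
enum {
	MODEL_ACK_PUSHED  = 1 << 0,
	MODEL_ACK_PUSHED2 = 1 << 1,
};

struct ack_model {
	uint32_t rcv_mss;     /* estimated sender MSS */
	uint32_t unacked;     /* models tp->rcv_nxt - tp->rcv_wup */
	unsigned int pending; /* models icsk->icsk_ack.pending */
	bool pingpong;        /* models inet_csk_in_pingpong_mode(sk) */
};

/* Account for one received skb of `len` bytes. */
static void model_rcv_skb(struct ack_model *m, uint32_t len, bool psh,
			  bool with_fix)
{
	m->unacked += len;
	if (len < m->rcv_mss) {
		/* Sub-MSS packet: the first sets PUSHED, a second sets PUSHED2. */
		if (m->pending & MODEL_ACK_PUSHED)
			m->pending |= MODEL_ACK_PUSHED2;
		m->pending |= MODEL_ACK_PUSHED;
	} else if (with_fix && psh) {
		/* The refinement: a full-sized skb with PSH likely ends a write. */
		m->pending |= MODEL_ACK_PUSHED;
	}
}

/* Decide whether an application read triggers an immediate ACK. */
static bool model_ack_on_read(const struct ack_model *m, bool all_data_read)
{
	if (m->unacked > m->rcv_mss)
		return true;                    /* case (1) */
	if (!all_data_read)
		return false;
	if (m->pending & MODEL_ACK_PUSHED2)
		return true;                    /* case (2b) */
	/* case (2a), and case (3) once the fix sets PUSHED */
	return (m->pending & MODEL_ACK_PUSHED) && !m->pingpong;
}
```

With `rcv_mss` set to 1000 and ping-pong mode off, a single 1000-byte skb with PSH gets no immediate ACK on read when `with_fix` is false, and gets one when it is true.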
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Cc: Xin Guo <guoxin0309@gmail.com>
      Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4720852e
    • tcp: fix quick-ack counting to count actual ACKs of new data · 059217c1
      Neal Cardwell authored
      This commit fixes quick-ack counting so that it only considers that a
      quick-ack has been provided if we are sending an ACK that newly
      acknowledges data.
      
      The code was erroneously using the number of data segments in outgoing
      skbs when deciding how many quick-ack credits to remove. This logic
      does not make sense, and could cause poor performance in
      request-response workloads, like RPC traffic, where requests or
      responses can be multi-segment skbs.
      
      When a TCP connection decides to send N quick-acks, that is to
      accelerate the cwnd growth of the congestion control module
      controlling the remote endpoint of the TCP connection. That quick-ack
      decision is purely about the incoming data and outgoing ACKs. It has
      nothing to do with the outgoing data or the size of outgoing data.
      
      And in particular, an ACK only serves the intended purpose of allowing
      the remote congestion control to grow the congestion window quickly if
      the ACK is ACKing or SACKing new data.
      
      The fix is simple: only count packets as serving the goal of the
      quickack mechanism if they are ACKing/SACKing new data. We can tell
      whether this is the case by checking inet_csk_ack_scheduled(), since
      we schedule an ACK exactly when we are ACKing/SACKing new data.
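
The difference between the two counting rules can be sketched in user space. This is an illustrative model under stated assumptions, not the kernel implementation: `struct quickack_model` and both `model_dec_quickack_*()` helpers are hypothetical names, and `ack_scheduled` stands in for the result of inet_csk_ack_scheduled().

```c
#include <stdbool.h>

struct quickack_model {
	unsigned int quick;  /* remaining quick-ack credits */
	bool ack_scheduled;  /* models inet_csk_ack_scheduled(sk) */
};

/* Old rule: charge one credit per data segment in the outgoing skb. */
static void model_dec_quickack_old(struct quickack_model *q, unsigned int segs)
{
	q->quick = segs >= q->quick ? 0 : q->quick - segs;
}

/* Fixed rule: charge one credit only if this packet ACKs new data. */
static void model_dec_quickack_new(struct quickack_model *q)
{
	unsigned int pkts = q->ack_scheduled ? 1 : 0;

	q->quick = pkts >= q->quick ? 0 : q->quick - pkts;
}
```

Starting from 16 credits, sending one 10-segment skb leaves 6 credits under the old rule. Under the fixed rule it leaves 15 if an ACK was scheduled and 16 otherwise, regardless of how many segments the skb carries.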
      
      Fixes: fc6415bc ("[TCP]: Fix quick-ack decrementing with TSO.")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20231001151239.1866845-1-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      059217c1
    • Merge tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · c56e67f3
      Jakub Kicinski authored
      Florian Westphal says:
      
      ====================
      netfilter patches for net
      
      The first patch resolves a regression in vlan header matching, which
      has been broken since the 6.5 release.  From myself.
      
      Second patch fixes an ancient problem with sctp connection tracking in
      case INIT_ACK packets are delayed.  This comes with a selftest, both
      patches from Xin Long.
      
      Patch 4 extends the existing nftables audit selftest, from
      Phil Sutter.
      
      Patch 5, also from Phil, avoids a situation where nftables
      would emit an audit record twice. This was broken since 5.13 days.
      
      Patch 6, from myself, avoids spurious insertion failure if we encounter an
      overlapping but expired range during element insertion with the
      'nft_set_rbtree' backend. This problem exists since 6.2.
      
      * tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
        netfilter: nf_tables: Deduplicate nft_register_obj audit logs
        selftests: netfilter: Extend nft_audit.sh
        selftests: netfilter: test for sctp collision processing in nf_conntrack
        netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp
        netfilter: nft_payload: rebuild vlan header on h_proto access
      ====================
      
      Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c56e67f3
    • page_pool: fix documentation typos · 513dbc10
      Randy Dunlap authored
      Correct grammar for better readability.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Link: https://lore.kernel.org/r/20231001003846.29541-1-rdunlap@infradead.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      513dbc10
    • tipc: fix a potential deadlock on &tx->lock · 08e50cf0
      Chengfeng Ye authored
      It seems that tipc_crypto_key_revoke() could be invoked by the
      workqueue function tipc_crypto_work_rx() in process context and by
      timer/rx callbacks in softirq context, so the lock acquisition on
      &tx->lock should use spin_lock_bh() to prevent a possible deadlock.
      
      This flaw was found by an experimental static analysis tool I am
      developing for irq-related deadlock.
      
      tipc_crypto_work_rx() <workqueue>
      --> tipc_crypto_key_distr()
      --> tipc_bcast_xmit()
      --> tipc_bcbase_xmit()
      --> tipc_bearer_bc_xmit()
      --> tipc_crypto_xmit()
      --> tipc_ehdr_build()
      --> tipc_crypto_key_revoke()
      --> spin_lock(&tx->lock)
      <timer interrupt>
         --> tipc_disc_timeout()
         --> tipc_bearer_xmit_skb()
         --> tipc_crypto_xmit()
         --> tipc_ehdr_build()
         --> tipc_crypto_key_revoke()
         --> spin_lock(&tx->lock) <deadlock here>
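
The call chain above can be illustrated with a toy single-CPU model. This is only a sketch under simplifying assumptions, not kernel code: `struct model_cpu`, `model_softirq_revoke()`, and `model_process_revoke()` are hypothetical, and the model assumes a softirq tries to run exactly once while the process-context path holds the lock.

```c
#include <stdbool.h>

struct model_cpu {
	bool lock_held;   /* models tx->lock being held on this CPU */
	bool bh_disabled; /* models softirqs being disabled on this CPU */
};

/*
 * Softirq path (timer/rx callback). Returns false if it would spin on a
 * lock that its own CPU already holds, i.e. a self-deadlock.
 */
static bool model_softirq_revoke(const struct model_cpu *cpu)
{
	if (cpu->bh_disabled)
		return true;  /* softirq is deferred, nothing runs now */
	return !cpu->lock_held;
}

/*
 * Process-context path (workqueue). Takes the lock, lets a softirq try to
 * fire inside the critical section, then releases. Returns false on deadlock.
 */
static bool model_process_revoke(struct model_cpu *cpu, bool use_bh_variant)
{
	bool ok;

	cpu->bh_disabled = use_bh_variant; /* spin_lock_bh() vs spin_lock() */
	cpu->lock_held = true;
	ok = model_softirq_revoke(cpu);
	cpu->lock_held = false;
	cpu->bh_disabled = false;
	return ok;
}
```

In this model the plain-lock variant reports a deadlock, while the bottom-half-disabling variant does not, because the softirq cannot interrupt the critical section.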
      Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Fixes: fc1b6d6d ("tipc: introduce TIPC encryption & authentication")
      Link: https://lore.kernel.org/r/20230927181414.59928-1-dg573847474@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      08e50cf0
    • net: stmmac: dwmac-stm32: fix resume on STM32 MCU · 6f195d6b
      Ben Wolsieffer authored
      The STM32MP1 keeps clk_rx enabled during suspend, and therefore the
      driver does not enable the clock in stm32_dwmac_init() if the device was
      suspended. The problem is that this same code runs on STM32 MCUs, which
      do disable clk_rx during suspend, causing the clock to never be
      re-enabled on resume.
      
      This patch adds a variant flag to indicate that clk_rx remains enabled
      during suspend, and uses this to decide whether to enable the clock in
      stm32_dwmac_init() if the device was suspended.
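
The decision described above can be sketched as a small predicate. This is a hedged illustration only: `struct dwmac_variant_model`, its field name, and both helper functions are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical per-variant description; the field name is an assumption. */
struct dwmac_variant_model {
	bool clk_rx_stays_on_in_suspend; /* true for STM32MP1, false for STM32 MCU */
};

/* Behaviour before the patch: never enable clk_rx when resuming. */
static bool model_enable_clk_rx_old(bool resuming)
{
	return !resuming;
}

/*
 * Behaviour after the patch: skip the enable only when resuming on a
 * variant that kept clk_rx running across suspend.
 */
static bool model_enable_clk_rx_new(const struct dwmac_variant_model *v,
				    bool resuming)
{
	return !(resuming && v->clk_rx_stays_on_in_suspend);
}
```

For an MCU-style variant that turns clk_rx off in suspend, the old predicate returns false on resume (the clock stays off), while the new one returns true.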
      
      This approach fixes this specific bug with limited opportunity for
      unintended side-effects, but I have a follow up patch that will refactor
      the clock configuration and hopefully make it less error prone.
      
      Fixes: 6528e02c ("net: ethernet: stmmac: add adaptation for stm32mp157c.")
      Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20230927175749.1419774-1-ben.wolsieffer@hefring.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6f195d6b
    • HID: nvidia-shield: Select POWER_SUPPLY Kconfig option · 0c0faa29
      Rahul Rameshbabu authored
      Battery information reported by the driver depends on the power supply
      subsystem. Select the required subsystem when the HID_NVIDIA_SHIELD Kconfig
      option is enabled.
      
      Fixes: 3ab196f8 ("HID: nvidia-shield: Add battery support for Thunderstrike")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      0c0faa29
    • ipv4: Set offload_failed flag in fibmatch results · 0add5c59
      Benjamin Poirier authored
      Due to a small omission, the offload_failed flag is missing from ipv4
      fibmatch results. Make sure it is set correctly.
      
      The issue can be witnessed using the following commands:
      echo "1 1" > /sys/bus/netdevsim/new_device
      ip link add dummy1 up type dummy
      ip route add 192.0.2.0/24 dev dummy1
      echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/fib/fail_route_offload
      ip route add 198.51.100.0/24 dev dummy1
      ip route
      	# 192.0.2.0/24 has rt_trap
      	# 198.51.100.0/24 has rt_offload_failed
      ip route get 192.0.2.1 fibmatch
      	# Result has rt_trap
      ip route get 198.51.100.1 fibmatch
      	# Result differs from the route shown by `ip route`, it is missing
      	# rt_offload_failed
      ip link del dev dummy1
      echo 1 > /sys/bus/netdevsim/del_device
      
      Fixes: 36c5100e ("IPv4: Add "offload failed" indication to routes")
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20230926182730.231208-1-bpoirier@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0add5c59
    • Merge tag 'linux-kselftest-fixes-6.6-rc5' of... · ba7d997a
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fix from Shuah Khan:
       "One single fix to Makefile to fix the incorrect TARGET name for uevent
        test"
      
      * tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: Fix wrong TARGET in kselftest top level Makefile
      ba7d997a
    • Merge tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 72897b29
      Jakub Kicinski authored
      Johannes Berg says:
      
      ====================
      
      Quite a collection of fixes this time, really too many
      to list individually. Many stack fixes, even rfkill
      (found by simulation and the new eevdf scheduler)!
      
      Also a bigger maintainers file cleanup, to remove old
      and redundant information.
      
      * tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (32 commits)
        wifi: iwlwifi: mvm: Fix incorrect usage of scan API
        wifi: mac80211: Create resources for disabled links
        wifi: cfg80211: avoid leaking stack data into trace
        wifi: mac80211: allow transmitting EAPOL frames with tainted key
        wifi: mac80211: work around Cisco AP 9115 VHT MPDU length
        wifi: cfg80211: Fix 6GHz scan configuration
        wifi: mac80211: fix potential key leak
        wifi: mac80211: fix potential key use-after-free
        wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling
        wifi: brcmfmac: Replace 1-element arrays with flexible arrays
        wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet
        wifi: rtw88: rtw8723d: Fix MAC address offset in EEPROM
        rfkill: sync before userspace visibility/changes
        wifi: mac80211: fix mesh id corruption on 32 bit systems
        wifi: cfg80211: add missing kernel-doc for cqm_rssi_work
        wifi: cfg80211: fix cqm_config access race
        wifi: iwlwifi: mvm: Fix a memory corruption issue
        wifi: iwlwifi: Ensure ack flag is properly cleared.
        wifi: iwlwifi: dbg_ini: fix structure packing
        iwlwifi: mvm: handle PS changes in vif_cfg_changed
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20230927095835.25803-2-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      72897b29
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1eb3dee1
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-10-02
      
      We've added 11 non-merge commits during the last 12 day(s) which contain
      a total of 12 files changed, 176 insertions(+), 41 deletions(-).
      
      The main changes are:
      
      1) Fix BPF verifier to reset backtrack_state masks on global function
         exit as otherwise subsequent precision tracking would reuse them,
         from Andrii Nakryiko.
      
      2) Several sockmap fixes for available bytes accounting,
         from John Fastabend.
      
      3) Reject sk_msg egress redirects to non-TCP sockets given this
         is only supported for TCP sockets today, from Jakub Sitnicki.
      
      4) Fix a syzkaller splat in bpf_mprog when hitting maximum program
         limits with BPF_F_BEFORE directive, from Daniel Borkmann
         and Nikolay Aleksandrov.
      
      5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust
         size_index for selecting a bpf_mem_cache, from Hou Tao.
      
      6) Fix arch_prepare_bpf_trampoline return code for s390 JIT,
         from Song Liu.
      
      7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off,
         from Leon Hwang.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Use kmalloc_size_roundup() to adjust size_index
        selftest/bpf: Add various selftests for program limits
        bpf, mprog: Fix maximum program check on mprog attachment
        bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets
        bpf, sockmap: Add tests for MSG_F_PEEK
        bpf, sockmap: Do not inc copied_seq when PEEK flag set
        bpf: tcp_read_skb needs to pop skb regardless of seq
        bpf: unconditionally reset backtrack_state masks on global func exit
        bpf: Fix tr dereferencing
        selftests/bpf: Check bpf_cubic_acked() is called via struct_ops
        s390/bpf: Let arch_prepare_bpf_trampoline return program size
      ====================
      
      Link: https://lore.kernel.org/r/20231002113417.2309-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1eb3dee1
    • netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure · 08738827
      Florian Westphal authored
      nft_rbtree_gc_elem() walks back and removes the end interval element that
      comes before the expired element.
      
      There is a small chance that we've cached this element as 'rbe_ge'.
      If this happens, we hold and test a pointer that has been queued for
      freeing.
      
      It also causes spurious insertion failures:
      
      $ cat test-testcases-sets-0044interval_overlap_0.1/testout.log
      Error: Could not process rule: File exists
      add element t s {  0 -  2 }
                         ^^^^^^
      Failed to insert  0 -  2 given:
      table ip t {
              set s {
                      type inet_service
                      flags interval,timeout
                      timeout 2s
                      gc-interval 2s
              }
      }
      
      The set (rbtree) is empty. The 'failure' doesn't happen on next attempt.
      
      The reason is that when we try to insert, the tree may hold an expired
      element that collides with the range we're adding.
      While we do evict/erase this element, we can trip over this check:
      
      if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
            return -ENOTEMPTY;
      
      rbe_ge was erased by the synchronous gc, so we should not have done
      this check.  The next attempt won't find it, so a retry results in
      successful insertion.
      
      Restart in-kernel to avoid such spurious errors.
      
      Such restarts are rare, unless userspace intentionally adds very large
      numbers of elements with very short timeouts while setting a huge
      gc interval.
      
      Even in this case, this cannot loop forever, because on each retry an
      existing element has been removed.
      
      As the caller is holding the transaction mutex, it's impossible
      for a second entity to add more expiring elements to the tree.
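
The restart logic and its termination argument can be sketched with a flat array standing in for the rbtree. This is a user-space illustration under stated assumptions, not the kernel code: every `model_*` name is hypothetical, the use of -EAGAIN as the restart signal is an assumption of this sketch, and the error codes are chosen for illustration rather than taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX 8

struct model_elem {
	unsigned int start, end; /* inclusive range */
	bool expired;
	bool present;
};

struct model_set {
	struct model_elem elems[MODEL_MAX];
};

static bool model_overlaps(const struct model_elem *e, unsigned int start,
			   unsigned int end)
{
	return e->start <= end && start <= e->end;
}

/*
 * One insertion attempt. If an expired element overlaps the new range,
 * erase it (synchronous gc) and ask the caller to restart, because the
 * state seen so far is stale.
 */
static int model_insert_once(struct model_set *s, unsigned int start,
			     unsigned int end)
{
	size_t i, free_slot = MODEL_MAX;

	for (i = 0; i < MODEL_MAX; i++) {
		struct model_elem *e = &s->elems[i];

		if (!e->present) {
			if (free_slot == MODEL_MAX)
				free_slot = i;
			continue;
		}
		if (!model_overlaps(e, start, end))
			continue;
		if (e->expired) {
			e->present = false;
			return -EAGAIN; /* restart the walk */
		}
		return -EEXIST;         /* live overlap: genuine failure */
	}
	if (free_slot == MODEL_MAX)
		return -ENOSPC;
	s->elems[free_slot] = (struct model_elem){
		.start = start, .end = end, .expired = false, .present = true,
	};
	return 0;
}

/* Retry in place; each restart has removed one element, so this terminates. */
static int model_insert(struct model_set *s, unsigned int start,
			unsigned int end, unsigned int *retries)
{
	int err;

	*retries = 0;
	while ((err = model_insert_once(s, start, end)) == -EAGAIN)
		(*retries)++;
	return err;
}
```

With two expired elements overlapping the new range, the insert succeeds after two restarts instead of surfacing a spurious error to the caller. The retry count is bounded by the number of elements in the set, which mirrors the argument that no one else can add elements while the caller holds the transaction mutex.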
      
      After this it also becomes feasible to remove the async gc worker
      and perform all garbage collection from the commit path.
      
      Fixes: c9e6978e ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      08738827