1. 28 Feb, 2024 26 commits
  2. 27 Feb, 2024 14 commits
    • netlink: use kvmalloc() in netlink_alloc_large_skb() · f8cbf6bd
      Eric Dumazet authored
      This is a followup of commit 234ec0b6 ("netlink: fix potential
      sleeping issue in mqueue_flush_file"), because vfree_atomic()
      overhead is unfortunate for medium-sized allocations.
      
      1) If the allocation is smaller than PAGE_SIZE, do not bother
         with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
         while NLMSG_GOODSIZE is smaller than 8KB.
      
      2) Use kvmalloc(), which might allocate one high-order page
         instead of using vmalloc() if memory is not too fragmented.
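
A minimal userspace model of the size check in point 1, assuming a hypothetical helper pick_alloc_path() that only decides which path a request takes. The real netlink_alloc_large_skb() builds an skb and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two allocation paths discussed above. */
enum alloc_path {
	ALLOC_REGULAR,  /* ordinary allocation, vmalloc() never involved */
	ALLOC_KVMALLOC  /* kvmalloc(): may get a high-order page, else vmalloc() */
};

/* Point 1: requests smaller than one page skip the vmalloc()-capable path. */
static enum alloc_path pick_alloc_path(size_t size, size_t page_size)
{
	assert(page_size > 0);
	if (size < page_size)
		return ALLOC_REGULAR;
	return ALLOC_KVMALLOC;
}
```

For example, an 8000-byte request takes the kvmalloc() path with 4 KB pages, but the regular path on an architecture with 64 KB pages.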
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: fix accessing vnic_info before allocating it · c4b04a80
      Alexander Lobakin authored
      bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration
      block, but allocates it much later. As a result, the following crash
      happens on my setup:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000090
       fbcon: Taking over console
       #PF: supervisor write access in kernel mode
       #PF: error_code (0x0002) - not-present page
       PGD 12f382067 P4D 0
       Oops: 0002 [#1] PREEMPT SMP NOPTI
       CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ #49
       Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023
       RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en]
       Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 0c 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 0d 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74
       RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000
       RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
       R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540
       FS:  000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000090 CR3: 0000000109efc002 CR4: 0000000000771ef0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
      
       Call Trace:
       <TASK>
       ? __die_body+0x68/0xb0
       ? page_fault_oops+0x3a6/0x400
       ? exc_page_fault+0x7a/0x1b0
       ? asm_exc_page_fault+0x26/0x30
       ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en]
       ? bnxt_alloc_mem+0x1389/0x1910 [bnxt_en]
       __bnxt_open_nic+0x198/0xa50 [bnxt_en]
       ? bnxt_hwrm_if_change+0x287/0x3d0 [bnxt_en]
       bnxt_open+0xeb/0x1b0 [bnxt_en]
       __dev_open+0x12e/0x1f0
       __dev_change_flags+0xb0/0x200
       dev_change_flags+0x25/0x60
       do_setlink+0x463/0x1260
       ? sock_def_readable+0x14/0xc0
       ? rtnl_getlink+0x4b9/0x590
       ? __nla_validate_parse+0x91/0xfa0
       rtnl_newlink+0xbac/0xe40
       <...>
      
      Instead of creating a variable, dereference the first array member
      directly, since it's used only once in the code.
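
A userspace sketch of this bug class and the fix, using hypothetical stand-in types (dev_priv, vnic_info) rather than the real bnxt structures: any pointer derived from the array must be taken only after the array has been allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real bnxt_en types. */
struct vnic_info {
	unsigned int flags;
};

struct dev_priv {
	struct vnic_info *vnic_info; /* array, NULL until allocated */
	size_t nr_vnics;
};

#define VNIC_DEFAULT 0
#define VNIC_FLAG_EXAMPLE 0x1u /* hypothetical flag bit */

/*
 * Buggy shape (not compiled here):
 *
 *     struct vnic_info *vnic = &bp->vnic_info[VNIC_DEFAULT];
 *     ...allocate bp->vnic_info...
 *     vnic->flags |= VNIC_FLAG_EXAMPLE;
 *
 * 'vnic' was computed while bp->vnic_info was still NULL, so the write
 * lands near address 0 and faults. The fixed shape below allocates first
 * and indexes the array directly at its single point of use.
 */
static int alloc_mem(struct dev_priv *bp)
{
	assert(bp != NULL && bp->nr_vnics > 0);

	bp->vnic_info = calloc(bp->nr_vnics, sizeof(*bp->vnic_info));
	if (!bp->vnic_info)
		return -1;

	bp->vnic_info[VNIC_DEFAULT].flags |= VNIC_FLAG_EXAMPLE;
	return 0;
}
```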
      
      Fixes: ef4ee64e ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index")
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20240226144911.1297336-1-aleksander.lobakin@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: netdevsim: be less selective for FW for the devlink test · b819a848
      Jakub Kicinski authored
      Commit 6151ff9c ("selftests: netdevsim: use suitable existing dummy
      file for flash test") introduced a nice trick to the devlink flashing
      test. Instead of the user having to create a file under /lib/firmware,
      we just pick the first one that already exists.
      
      Sadly, in AWS Linux there are no files directly under /lib/firmware,
      only in subdirectories. Don't limit the search to -maxdepth 1.
      We can use the %P print format to get the correct path for files
      inside subdirectories:
      
      $ find /lib/firmware -type f -printf '%P\n' | head -1
      intel-ucode/06-1a-05
      
      The full path is /lib/firmware/intel-ucode/06-1a-05
      
      This works with GNU find; busybox find doesn't support -printf at all,
      so we're not making it worse.
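
The same "first regular file at any depth, printed relative to the starting point" behaviour can be sketched in C with POSIX nftw(). This is an illustration only, not part of the selftest, and the helper name first_file_rel() is hypothetical. As with find, the walk follows the order in which the filesystem returns directory entries, so "first" is not sorted.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

static char first_rel[4096]; /* result: path relative to the root */
static size_t root_len;      /* length of the starting-point prefix */

/* nftw() callback: stop at the first regular file, at any depth. */
static int visit(const char *fpath, const struct stat *sb, int typeflag,
		 struct FTW *ftwbuf)
{
	const char *rel;

	(void)ftwbuf;
	if (typeflag != FTW_F || !S_ISREG(sb->st_mode))
		return 0;           /* not a regular file: keep walking */

	rel = fpath + root_len;
	while (*rel == '/')
		rel++;              /* strip the separator, like find's %P */
	snprintf(first_rel, sizeof(first_rel), "%s", rel);
	return 1;                   /* non-zero return stops nftw() */
}

/* Like `find ROOT -type f -printf '%P\n' | head -1`; NULL if none found. */
static const char *first_file_rel(const char *root)
{
	assert(root != NULL);
	first_rel[0] = '\0';
	root_len = strlen(root);
	if (nftw(root, visit, 16, FTW_PHYS) != 1)
		return NULL;
	return first_rel;
}
```

FTW_PHYS keeps nftw() from following symlinks, which matches find's default of not following them.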
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240224050658.930272-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: stmmac: mmc_core: Drop interrupt registers from stats · d0dc1e42
      Jesper Nilsson authored
      The MMC IPC interrupt status and interrupt mask registers are
      of little use as Ethernet statistics, but incrementing counters
      based on the current interrupt and interrupt mask registers
      makes them actively misleading.
      
      For example, if the interrupt mask is set to 0x08420842,
      the current code will increment by that amount each iteration,
      leading to the following sequence of nonsense:
      
      mmc_rx_ipc_intr_mask: 969816526
      mmc_rx_ipc_intr_mask: 1108361744
      
      These registers have been included in the Ethernet statistics
      since the first version of MMC back in 2011 (commit 1c901a46).
      That commit also mentions the MMC interrupts as
      "something to add later (if actually useful)".
      
      If the registers are actually useful, they should probably
      be part of the Ethernet register dump instead of statistics,
      but for now, drop the counters for mmc_rx_ipc_intr and
      mmc_rx_ipc_intr_mask completely.
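
The two values in the example are consistent with the counter simply accumulating the mask on every read, starting from zero. That starting point is an assumption, and the driver's actual counter width and read path are not modelled here. A small model:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the misleading pattern: a register snapshot (here the interrupt
 * mask) is added to a "counter" on every statistics read, so the counter
 * grows by the mask value each time instead of counting events.
 */
static uint64_t accumulate_snapshot(uint64_t counter, uint32_t reg_value)
{
	assert(counter <= UINT64_MAX - reg_value); /* no wraparound in this model */
	return counter + reg_value;
}

/* Counter value after n reads, assuming it started at 0. */
static uint64_t after_n_reads(uint32_t mask, unsigned int n)
{
	uint64_t counter = 0;

	for (unsigned int i = 0; i < n; i++)
		counter = accumulate_snapshot(counter, mask);
	return counter;
}
```

after_n_reads(0x08420842, 7) is 969816526 and after_n_reads(0x08420842, 8) is 1108361744, so the quoted values correspond to the 7th and 8th reads; the step between them is exactly 0x08420842 (138545218).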
      
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240223-stmmac_stats-v3-1-5d483c2a071a@axis.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: adi: adin1110: Reduce the MDIO_TRDONE poll interval · 2322467a
      Ciprian Regus authored
      In order to do a clause 22 access to the PHY registers of the ADIN1110,
      we have to write the MDIO frame to the ADIN1110_MDIOACC register, and
      then poll the MDIO_TRDONE bit (for a 1) in the same register. The
      device will set this bit to 1 once the internal MDIO transaction is
      done. In practice, this bit takes ~50 - 60 us to be set.
      
      The first attempt to poll the bit is right after the ADIN1110_MDIOACC
      register is written, so it will always be read as 0. The next check will
      only be done after 10 ms, which will result in the MDIO transactions
      taking a long time to complete. Reduce this polling interval to 100 us.
      Since this interval is short enough, switch the poll function to
      readx_poll_timeout_atomic() instead.
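
A simplified, simulated-time model of the read-then-wait loop described above. The helper poll_done_time_us() is hypothetical; it mirrors the general shape of readx_poll_timeout_atomic() (read, check, then delay) but is not its implementation. It assumes the bit is set 55 us after the frame is written, within the ~50 - 60 us reported above, and an arbitrary 100 ms timeout.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simulated clock: the "hardware" sets MDIO_TRDONE done_after_us after the
 * frame is written. Returns the time (in us) at which the loop first sees
 * the bit set, or -1 if timeout_us elapses first.
 */
static long poll_done_time_us(long interval_us, long done_after_us,
			      long timeout_us)
{
	long now = 0;

	assert(interval_us > 0);
	for (;;) {
		bool done = now >= done_after_us; /* read MDIO_TRDONE */

		if (done)
			return now;
		if (now >= timeout_us)
			return -1;
		now += interval_us;               /* wait before the next read */
	}
}
```

With the old 10 ms interval the transaction is observed complete at 10000 us, even though the hardware finished after about 55 us; with a 100 us interval it is observed at 100 us.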
      
      Reviewed-by: Nuno Sa <nuno.sa@analog.com>
      Signed-off-by: Ciprian Regus <ciprian.regus@analog.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20240223162129.154114-1-ciprian.regus@analog.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-ipa-don-t-abort-system-suspend' · 58cc8640
      Paolo Abeni authored
      Alex Elder says:
      
      ====================
      net: ipa: don't abort system suspend
      
      Currently the IPA code aborts an in-progress system suspend if an
      IPA interrupt arrives before the suspend completes.  There is no
      need to do that though, because the IPA driver handles a forced
      suspend correctly, quiescing any hardware activity before finally
      turning off clocks and interconnects.
      
      This series drops the call to pm_wakeup_dev_event() if an IPA
      SUSPEND interrupt arrives during system suspend.  Doing this
      makes the two remaining IPA power flags unnecessary, and allows
      some additional code to be cleaned up--and best of all, removed.
      The result is much simpler (and I'm really glad not to be using
      these flags any more).
      
      The first patch implements the main change.  The second and
      third remove the flags that were used to determine whether to
      call pm_wakeup_dev_event().  The next two remove a function that
      becomes a trivial wrapper, and the last one just avoids writing
      a register unnecessarily.
      
      Note that the first two patches will have checkpatch warnings,
      because checkpatch disagrees with my compiler on what to do when
      a block contains only a semicolon.  I went with what the compiler
      recommends.
      
      clang says: warning: suggest braces around empty body
      checkpatch: WARNING: braces {} are not necessary for single statement blocks
      
      ====================
      
      Link: https://lore.kernel.org/r/20240223133930.582041-1-elder@linaro.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: don't bother zeroing an already zero register · f9345952
      Alex Elder authored
      In ipa_interrupt_suspend_clear_all(), if the SUSPEND_INFO register
      read contains no set bits, there's no interrupt condition to clear.
      Skip the write to the clear register in that case.
      
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: kill ipa_power_suspend_handler() · 423df2e0
      Alex Elder authored
      Now that ipa_power_suspend_handler() is a trivial wrapper around
      ipa_interrupt_suspend_clear_all(), we can open-code it in the one
      place it's used, and get rid of the function.
      
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: move ipa_interrupt_suspend_clear_all() up · ef63ca78
      Alex Elder authored
      The next patch makes ipa_interrupt_suspend_clear_all() static,
      calling it only within "ipa_interrupt.c".  Move its definition
      higher in the file so no declaration is needed.
      
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: kill the IPA_POWER_FLAG_RESUMED flag · dae5d6e8
      Alex Elder authored
      The IPA_POWER_FLAG_RESUMED was originally used to avoid calling
      pm_wakeup_dev_event() more than once when handling a SUSPEND
      interrupt.  This call is no longer made, so there's no need for the
      flag; get rid of it.
      
      That leaves no more IPA power flags usefully defined, so just get
      rid of the bitmap in the IPA power structure and the definition of
      the ipa_power_flag enumerated type.
      
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: kill IPA_POWER_FLAG_SYSTEM · 54f19ff7
      Alex Elder authored
      The SYSTEM IPA power flag is set, cleared, and tested.  But nothing
      happens based on its value when tested, so it serves no purpose.
      Get rid of this flag.
      
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: don't bother aborting system resume · 4b2274d3
      Alex Elder authored
      The IPA interrupt can fire if there is data to be delivered to a GSI
      channel that is suspended.  This condition occurs in three scenarios.
      
      First, runtime power management automatically suspends the IPA
      hardware after half a second of inactivity.  This has nothing
      to do with system suspend, so a SYSTEM IPA power flag is used to
      avoid calling pm_wakeup_dev_event() when runtime suspended.
      
      Second, if the system is suspended, the receipt of an IPA interrupt
      should trigger a system resume.  Configuring the IPA interrupt for
      wakeup accomplishes this.
      
      Finally, if system suspend is underway and the IPA interrupt fires,
      we currently call pm_wakeup_dev_event() to abort the system suspend.
      
      The IPA driver correctly handles quiescing the hardware before
      suspending it, so there's really no need to abort a suspend in
      progress in the third case.  We can simply quiesce and suspend
      things, and be done.
      
      Incoming data can still wake the system after it's suspended.
      The IPA interrupt has wakeup mode enabled, so if it fires *after*
      we've suspended, it will trigger a wakeup (if not disabled via
      sysfs).
      
      Stop calling pm_wakeup_dev_event() to abort a system suspend in
      progress in ipa_power_suspend_handler().
      
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: phy: simplify genphy_c45_ethtool_set_eee · b38061fe
      Heiner Kallweit authored
      Simplify the function, no functional change intended.
      
      - Remove the unneeded variable unsupp; I think the code is even more
        readable now.
      - Move setting phydev->eee_enabled out of the if clause
      - Simplify return value handling
      
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/442277c7-7431-4542-80b5-1d3d691714d7@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mptcp-various-small-improvements' · 55a72460
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: various small improvements
      
      This series brings various small improvements to MPTCP and its
      selftests:
      
      Patch 1 prints an error if there are duplicated subtest names. It is
      important to have unique (sub)tests names in TAP, because some CI
      environments drop (sub)tests with duplicated names.
      
      Patch 2 is a preparation for patches 3 and 4, which check the protocol
      in tcp_sk() and mptcp_sk() with DEBUG_NET, only in code from net/mptcp/.
      We recently had the case where an MPTCP socket was wrongly treated as a
      TCP one, and fuzzers and static checkers never spotted the issue. These
      checks should prevent such issues in the future.
      
      Patches 5 to 7 are some cleanup for the MPTCP selftests. These patches
      are not supposed to change the behaviour.
      
      Patch 8 sets the poll timeout in diag selftest to the same value as the
      one used in the other selftests.
      ====================
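
Patch 1's idea, flagging repeated subtest names before a TAP consumer silently drops them, can be illustrated with a small C helper. The helper name report_duplicate_names() is hypothetical, and this is not the selftests' code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Print an error for every name that repeats an earlier one and return
 * how many such repeats were found. Quadratic, which is fine for the
 * small number of subtests in a single script.
 */
static int report_duplicate_names(const char *const names[], size_t n)
{
	int dups = 0;

	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < i; j++) {
			if (strcmp(names[i], names[j]) == 0) {
				fprintf(stderr, "error: duplicated subtest name: %s\n",
					names[i]);
				dups++;
				break; /* report each later occurrence once */
			}
		}
	}
	return dups;
}
```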
      
      Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-0-b6c8a10396bd@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>