1. 14 Jul, 2022 12 commits
    • e1000e: Enable GPT clock before sending message to CSME · b49feacb
      Sasha Neftin authored
      On corporate (CSME) ADL systems, the Ethernet Controller may stop working
      ("HW unit hang") after exiting from the s0ix state. The reason is that
      CSME misses the message sent by the host. Enabling the dynamic GPT clock
      solves this problem. This clock is cleared upon HW initialization.
      
      Fixes: 3e55d231 ("e1000e: Add handshake with the CSME to support S0ix")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
      Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • selftests/net: test nexthop without gw · cd72e61b
      Nicolas Dichtel authored
      This test implements the scenario described in the commit
      "ip: fix dflt addr selection for connected nexthop".
      The test configures a nexthop object with an output device only (no gateway
      address) and a route that uses this nexthop. The goal is to check if the
      kernel selects a valid source address.
      
      Link: https://lore.kernel.org/netdev/20220712095545.10947-1-nicolas.dichtel@6wind.com/
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Link: https://lore.kernel.org/r/20220713114853.29406-2-nicolas.dichtel@6wind.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ip: fix dflt addr selection for connected nexthop · 747c1430
      Nicolas Dichtel authored
      When a nexthop is added without a gw address, the default scope was set
      to 'host'. Thus, when a source address is selected, 127.0.0.1 may be chosen
      but is then rejected when the route is used.
      
      When using a route without a nexthop id, the scope can be configured in the
      route, thus the problem doesn't exist.
      
      To explain more deeply: when a user creates a nexthop, they cannot specify
      the scope. To create it, the function nh_create_ipv4() calls fib_check_nh()
      with scope set to 0. fib_check_nh() calls fib_check_nh_nongw(), which was
      setting scope to 'host'. Then, nh_create_ipv4() calls
      fib_info_update_nhc_saddr() with scope set to 'host'. The src addr is
      chosen before the route is inserted.
      
      When a 'standard' route (i.e. without a reference to a nexthop) is added,
      fib_create_info() calls fib_info_update_nhc_saddr() with the scope set by
      the user. iproute2 sets the scope to 'link' by default.
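
The effect of the scope on source address selection can be illustrated with a simplified model. This is only a sketch, not the kernel's implementation: `select_source` is a hypothetical stand-in for the kernel's address selection, and the address table mirrors the reproducer that follows. The numeric values are the RT_SCOPE_* constants from the rtnetlink UAPI header. The sketch assumes a candidate is accepted when its own scope value is no greater than the requested one, and that devices other than the output device are scanned in table order.

```python
# Simplified model, NOT kernel code: why a 'host' scope can yield 127.0.0.1.
RT_SCOPE_UNIVERSE = 0
RT_SCOPE_LINK = 253
RT_SCOPE_HOST = 254

# (device, address, address scope) in netns "foo" from the reproducer.
# veth0 carries no address, so the search falls through to other devices.
ADDRESSES = [
    ("lo", "127.0.0.1", RT_SCOPE_HOST),
    ("eth0", "192.168.0.1", RT_SCOPE_UNIVERSE),
]


def select_source(out_dev, requested_scope, addresses=ADDRESSES):
    """Return the first address whose scope value does not exceed the request.

    Addresses on the output device are tried first; otherwise every other
    device is scanned in table order (an assumption of this sketch).
    """
    on_dev = [a for a in addresses if a[0] == out_dev]
    others = [a for a in addresses if a[0] != out_dev]
    for _dev, addr, scope in on_dev + others:
        if scope <= requested_scope:
            return addr
    return None


print(select_source("veth0", RT_SCOPE_HOST))  # prints 127.0.0.1
print(select_source("veth0", RT_SCOPE_LINK))  # prints 192.168.0.1
```

With scope 'host' the loopback address qualifies and is picked first; with scope 'link' it is skipped, which matches the `src 192.168.0.1` seen when the route is added without a nexthop id.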
      
      Here is a way to reproduce the problem:
      ip netns add foo
      ip -n foo link set lo up
      ip netns add bar
      ip -n bar link set lo up
      sleep 1
      
      ip -n foo link add name eth0 type dummy
      ip -n foo link set eth0 up
      ip -n foo address add 192.168.0.1/24 dev eth0
      
      ip -n foo link add name veth0 type veth peer name veth1 netns bar
      ip -n foo link set veth0 up
      ip -n bar link set veth1 up
      
      ip -n bar address add 192.168.1.1/32 dev veth1
      ip -n bar route add default dev veth1
      
      ip -n foo nexthop add id 1 dev veth0
      ip -n foo route add 192.168.1.1 nhid 1
      
      Try to get/use the route:
      > $ ip -n foo route get 192.168.1.1
      > RTNETLINK answers: Invalid argument
      > $ ip netns exec foo ping -c1 192.168.1.1
      > ping: connect: Invalid argument
      
      Try without nexthop group (iproute2 sets scope to 'link' by dflt):
      ip -n foo route del 192.168.1.1
      ip -n foo route add 192.168.1.1 dev veth0
      
      Try to get/use the route:
      > $ ip -n foo route get 192.168.1.1
      > 192.168.1.1 dev veth0 src 192.168.0.1 uid 0
      >     cache
      > $ ip netns exec foo ping -c1 192.168.1.1
      > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
      > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.039 ms
      >
      > --- 192.168.1.1 ping statistics ---
      > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
      > rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms
      
      CC: stable@vger.kernel.org
      Fixes: 597cfe4f ("nexthop: Add support for IPv4 nexthops")
      Reported-by: Edwin Brossette <edwin.brossette@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Link: https://lore.kernel.org/r/20220713114853.29406-1-nicolas.dichtel@6wind.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: atlantic: remove aq_nic_deinit() when resume · 2e15c51f
      Chia-Lin Kao (AceLan) authored
      aq_nic_deinit() has already been called while suspending, so we don't have
      to call it again on resume.
      Actually, calling it again leads to another hang issue when resuming from
      S3.
      
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992345] Call Trace:
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992346] <TASK>
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992348] aq_nic_deinit+0xb4/0xd0 [atlantic]
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992356] aq_pm_thaw+0x7f/0x100 [atlantic]
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992362] pci_pm_resume+0x5c/0x90
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992366] ? pci_pm_thaw+0x80/0x80
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992368] dpm_run_callback+0x4e/0x120
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992371] device_resume+0xad/0x200
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992373] async_resume+0x1e/0x40
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992374] async_run_entry_fn+0x33/0x120
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992377] process_one_work+0x220/0x3c0
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992380] worker_thread+0x4d/0x3f0
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992382] ? process_one_work+0x3c0/0x3c0
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992384] kthread+0x12a/0x150
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992386] ? set_kthread_struct+0x40/0x40
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992387] ret_from_fork+0x22/0x30
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992391] </TASK>
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992392] ---[ end trace 1ec8c79604ed5e0d ]---
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992394] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
      Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992397] atlantic 0000:02:00.0: PM: failed to resume async: error -110
      
      Fixes: 1809c30b ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
      Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
      Link: https://lore.kernel.org/r/20220713111224.1535938-2-acelan.kao@canonical.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: atlantic: remove deep parameter on suspend/resume functions · 0f332507
      Chia-Lin Kao (AceLan) authored
      The commit below claims that the atlantic NIC requires a device reset on pm
      op, and set deep to true for all suspend/resume functions:
      commit 1809c30b ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
      So, we can remove the deep parameter from the suspend/resume functions
      without any functional change.
      
      Fixes: 1809c30b ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
      Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
      Link: https://lore.kernel.org/r/20220713111224.1535938-1-acelan.kao@canonical.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • sfc: fix kernel panic when creating VF · ada74c55
      Íñigo Huguet authored
      When creating VFs, a kernel panic can happen when calling
      efx_ef10_try_update_nic_stats_vf.
      
      When releasing a DMA coherent buffer, sometimes, I don't know in what
      specific circumstances, it has to unmap memory with vunmap. It is
      disallowed to do that in IRQ context or with BH disabled. Otherwise, we
      hit this line in vunmap, causing the crash:
        BUG_ON(in_interrupt());
      
      This patch reenables BH to release the buffer.
      
      Log messages when the bug is hit:
       kernel BUG at mm/vmalloc.c:2727!
       invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G          I      --------- ---  5.14.0-119.el9.x86_64 #1
       Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
       RIP: 0010:vunmap+0x2e/0x30
       ...skip...
       Call Trace:
        __iommu_dma_free+0x96/0x100
        efx_nic_free_buffer+0x2b/0x40 [sfc]
        efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc]
        efx_ef10_update_stats_vf+0x18/0x40 [sfc]
        efx_start_all+0x15e/0x1d0 [sfc]
        efx_net_open+0x5a/0xe0 [sfc]
        __dev_open+0xe7/0x1a0
        __dev_change_flags+0x1d7/0x240
        dev_change_flags+0x21/0x60
        ...skip...
      
      Fixes: d7788196 ("sfc: DMA the VF stats only when requested")
      Reported-by: Ma Yuying <yuma@redhat.com>
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Acked-by: Edward Cree <ecree.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'seg6-fix-skb-checksum-for-srh-encapsulation-insertion' · cc91b09b
      Paolo Abeni authored
      Andrea Mayer says:
      
      ====================
      seg6: fix skb checksum for SRH encapsulation/insertion
      
      The Linux kernel supports Segment Routing Header (SRH)
      encapsulation/insertion operations by providing the capability to: i)
      encapsulate a packet in an outer IPv6 header with a specified SRH; ii)
      insert a specified SRH directly after the IPv6 header of the packet.
      Note that the insertion operation is also referred to as 'injection'.
      
      The two operations are respectively supported by seg6_do_srh_encap() and
      seg6_do_srh_inline(), which operate on the skb associated to the packet as
      needed (e.g. adding the necessary headers and initializing them, while
      taking care to recalculate the skb checksum).
      
      seg6_do_srh_encap() and seg6_do_srh_inline() do not initialize the payload
      length of the IPv6 header, which is carried out by the caller functions.
      However, this approach causes the corruption of the skb checksum which
      needs to be updated only after initialization of headers is completed
      (thanks to Paolo Abeni for detecting this issue).
      
      The patchset fixes the skb checksum corruption by moving the IPv6 header
      payload length initialization from the callers of seg6_do_srh_encap() and
      seg6_do_srh_inline() directly into these functions.
      
      This patchset is organized as follows:
       - patch 1/3, seg6: fix skb checksum evaluation in SRH
         encapsulation/insertion;
          (* SRH encapsulation/insertion available since v4.10)
      
       - patch 2/3, seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps
         behaviors;
          (* SRv6 End.B6 and End.B6.Encaps behaviors available since v4.14)
      
       - patch 3/3, seg6: bpf: fix skb checksum in bpf_push_seg6_encap();
          (* bpf IPv6 Segment Routing helpers available since v4.18)
      
      ====================
      
      Link: https://lore.kernel.org/r/20220712175837.16267-1-andrea.mayer@uniroma2.it
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • seg6: bpf: fix skb checksum in bpf_push_seg6_encap() · 4889fbd9
      Andrea Mayer authored
      Both helper functions bpf_lwt_seg6_action() and bpf_lwt_push_encap() use
      bpf_push_seg6_encap() to encapsulate the packet in an IPv6 header with a
      Segment Routing Header (SRH) or insert an SRH between the IPv6 header and
      the payload.
      To achieve this result, such helper functions rely on bpf_push_seg6_encap()
      which, in turn, leverages seg6_do_srh_{encap,inline}() to perform the
      required operation (i.e. encap/inline).
      
      This patch removes the initialization of the IPv6 header payload length
      from bpf_push_seg6_encap(), as it is now handled properly by
      seg6_do_srh_{encap,inline}() to prevent corruption of the skb checksum.
      
      Fixes: fe94cc29 ("bpf: Add IPv6 Segment Routing helpers")
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors · f048880f
      Andrea Mayer authored
      The SRv6 End.B6 and End.B6.Encaps behaviors rely on functions
      seg6_do_srh_{encap,inline}() to, respectively: i) encapsulate the
      packet within an outer IPv6 header with the specified Segment Routing
      Header (SRH); ii) insert the specified SRH directly after the IPv6
      header of the packet.
      
      This patch removes the initialization of the IPv6 header payload length
      from the input_action_end_b6{_encap}() functions, as it is now handled
      properly by seg6_do_srh_{encap,inline}() to avoid corruption of the skb
      checksum.
      
      Fixes: 140f04c3 ("ipv6: sr: implement several seg6local actions")
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • seg6: fix skb checksum evaluation in SRH encapsulation/insertion · df8386d1
      Andrea Mayer authored
      Support for SRH encapsulation and insertion was introduced with
      commit 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and
      injection with lwtunnels"), through the seg6_do_srh_encap() and
      seg6_do_srh_inline() functions, respectively.
      The former encapsulates the packet in an outer IPv6 header along with
      the SRH, while the latter inserts the SRH between the IPv6 header and
      the payload. Then, the headers are initialized/updated according to the
      operating mode (i.e., encap/inline).
      Finally, the skb checksum is calculated to reflect the changes applied
      to the headers.
      
      The IPv6 payload length ('payload_len') is not initialized
      within seg6_do_srh_{inline,encap}(); instead, this is deferred to
      seg6_do_srh(), i.e. the caller of seg6_do_srh_{inline,encap}().
      However, this operation invalidates the skb checksum, since the
      'payload_len' is updated only after the checksum is evaluated.
      
      To solve this issue, the initialization of the IPv6 payload length is
      moved from seg6_do_srh() directly into the seg6_do_srh_{inline,encap}()
      functions and before the skb checksum update takes place.
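
The ordering problem can be shown with a small model. The sketch below is not the kernel code: it uses a plain 16-bit ones' complement sum as a stand-in for the skb checksum, and `csum16`, `build_header` and its `set_len_first` flag are hypothetical names. The 40-byte buffer follows the IPv6 fixed header layout, where the payload length occupies bytes 4 and 5.

```python
import struct


def csum16(data: bytes) -> int:
    """16-bit ones' complement sum over data (even length assumed)."""
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def build_header(payload_len: int, set_len_first: bool):
    """Return (header, recorded_sum) for a mostly zero 40-byte IPv6-like header."""
    hdr = bytearray(40)
    hdr[0] = 0x60  # version 6 in the top nibble
    if set_len_first:
        struct.pack_into("!H", hdr, 4, payload_len)  # fixed order
    recorded = csum16(bytes(hdr))                    # checksum update
    if not set_len_first:
        struct.pack_into("!H", hdr, 4, payload_len)  # old order: too late
    return bytes(hdr), recorded


hdr, recorded = build_header(1280, set_len_first=False)
print(recorded == csum16(hdr))  # prints False: the recorded sum is stale

hdr, recorded = build_header(1280, set_len_first=True)
print(recorded == csum16(hdr))  # prints True: the sum covers the final header
```

Any field written after the sum is recorded leaves the recorded value out of step with the bytes on the wire, which is why the initialization has to happen before the checksum update.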
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/all/20220705190727.69d532417be7438b15404ee1@uniroma2.it
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · f46a5a9c
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-07-12
      
      This series contains updates to ice driver only.
      
      Paul fixes detection of E822 devices for firmware update and changes NVM
      read for snapshot creation to be done in chunks as some systems cannot
      read the entire NVM in the allotted time.
      ====================
      
      Link: https://lore.kernel.org/r/20220712164829.7275-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sfc: fix use after free when disabling sriov · ebe41da5
      Íñigo Huguet authored
      Use after free is detected by kfence when disabling sriov. What was read
      after being freed was vf->pci_dev: it was freed from pci_disable_sriov
      and later read in efx_ef10_sriov_free_vf_vports, called from
      efx_ef10_sriov_free_vf_vswitching.
      
      Set the pointer to NULL at release time to avoid reading it later.
      
      Reproducer and dmesg log (note that kfence doesn't detect it every time):
      $ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
      $ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
      
       BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
      
       Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224):
        efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
        efx_ef10_pci_sriov_disable+0x38/0x70 [sfc]
        efx_pci_sriov_configure+0x24/0x40 [sfc]
        sriov_numvfs_store+0xfe/0x140
        kernfs_fop_write_iter+0x11c/0x1b0
        new_sync_write+0x11f/0x1b0
        vfs_write+0x1eb/0x280
        ksys_write+0x5f/0xe0
        do_syscall_64+0x5c/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k
      
       allocated by task 6771 on cpu 10 at 3137.860196s:
        pci_alloc_dev+0x21/0x60
        pci_iov_add_virtfn+0x2a2/0x320
        sriov_enable+0x212/0x3e0
        efx_ef10_sriov_configure+0x67/0x80 [sfc]
        efx_pci_sriov_configure+0x24/0x40 [sfc]
        sriov_numvfs_store+0xba/0x140
        kernfs_fop_write_iter+0x11c/0x1b0
        new_sync_write+0x11f/0x1b0
        vfs_write+0x1eb/0x280
        ksys_write+0x5f/0xe0
        do_syscall_64+0x5c/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       freed by task 6771 on cpu 12 at 3170.991309s:
        device_release+0x34/0x90
        kobject_cleanup+0x3a/0x130
        pci_iov_remove_virtfn+0xd9/0x120
        sriov_disable+0x30/0xe0
        efx_ef10_pci_sriov_disable+0x57/0x70 [sfc]
        efx_pci_sriov_configure+0x24/0x40 [sfc]
        sriov_numvfs_store+0xfe/0x140
        kernfs_fop_write_iter+0x11c/0x1b0
        new_sync_write+0x11f/0x1b0
        vfs_write+0x1eb/0x280
        ksys_write+0x5f/0xe0
        do_syscall_64+0x5c/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 3c5eb876 ("sfc: create vports for VFs and assign random MAC addresses")
      Reported-by: Yanghang Liu <yanghliu@redhat.com>
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 13 Jul, 2022 27 commits
  3. 12 Jul, 2022 1 commit
    • ice: change devlink code to read NVM in blocks · 7b6f9462
      Paul M Stillwell Jr authored
      When creating a snapshot of the NVM the driver needs to read the entire
      contents from the NVM and store it. The NVM reads are protected by a lock
      that is shared between the driver and the firmware.
      
      If the driver takes too long to read the entire NVM (which can happen on
      some systems) then the firmware could reclaim the lock and cause subsequent
      reads from the driver to fail.
      
      We could fix this by increasing the timeout that we pass to the firmware,
      but we could end up in the same situation again if the system is slow.
      Instead have the driver break the reading of the NVM into blocks that are
      small enough that we have confidence that the read will complete within the
      timeout time, but large enough not to cause significant AQ overhead.
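
The block-wise approach can be sketched as follows. This is an illustrative model rather than the driver code: `FakeNvm`, `read_in_blocks` and the block size are hypothetical, and the sketch assumes the lock is taken and released around each block so that no single hold has to cover the whole read.

```python
class FakeNvm:
    """Hypothetical stand-in for an NVM guarded by a shared lock."""

    def __init__(self, contents: bytes):
        self.contents = contents
        self.locked = False
        self.acquisitions = 0

    def acquire(self):
        assert not self.locked
        self.locked = True
        self.acquisitions += 1

    def release(self):
        self.locked = False

    def read(self, offset: int, length: int) -> bytes:
        assert self.locked, "read attempted without holding the lock"
        return self.contents[offset:offset + length]


def read_in_blocks(nvm: FakeNvm, total: int, block_size: int) -> bytes:
    """Read `total` bytes, holding the lock for only one block at a time."""
    out = bytearray()
    offset = 0
    while offset < total:
        length = min(block_size, total - offset)
        nvm.acquire()
        try:
            out += nvm.read(offset, length)
        finally:
            nvm.release()  # give the lock back between blocks
        offset += length
    return bytes(out)


nvm = FakeNvm(bytes(range(256)) * 40)  # 10240 bytes of sample data
snapshot = read_in_blocks(nvm, len(nvm.contents), block_size=4096)
print(snapshot == nvm.contents, nvm.acquisitions)  # prints True 3
```

A larger block size means fewer lock round trips but a longer hold per block, which is the trade-off the commit message describes between timeout safety and AQ overhead.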
      
      Fixes: dce730f1 ("ice: add a devlink region for dumping NVM contents")
      Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>