1. 07 Mar, 2023 1 commit
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 757b56a6
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-03-06
      
      We've added 8 non-merge commits during the last 7 day(s) which contain
      a total of 9 files changed, 64 insertions(+), 18 deletions(-).
      
      The main changes are:
      
      1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier,
         that is, keep resolving such instances instead of bailing out,
         from Lorenz Bauer.
      
      2) Fix BPF test framework with regards to xdp_frame info misplacement
         in the "live packet" code, from Alexander Lobakin.
      
      3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX,
         from Liu Jian.
      
      4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n,
         from Randy Dunlap.
      
      5) Several BPF doc fixes with either broken links or external instead
         of internal doc links, from Bagas Sanjaya.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: check that modifier resolves after pointer
        btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
        bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
        bpf, doc: Link to submitting-patches.rst for general patch submission info
        bpf, doc: Do not link to docs.kernel.org for kselftest link
        bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
        riscv, bpf: Fix patch_text implicit declaration
        bpf, docs: Fix link to BTF doc
      ====================
      
      Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 06 Mar, 2023 13 commits
    • net: tls: fix device-offloaded sendpage straddling records · e539a105
      Jakub Kicinski authored
      Adrien reports that incorrect data is transmitted when a single
      page straddles multiple records. We would transmit the same
      data in all iterations of the loop.
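
      To make the failure mode concrete, here is a minimal userspace sketch,
      assuming a hypothetical push_records() helper that splits part of one
      page into back-to-back records of at most rec_max bytes. It is not the
      tls_device code; it only shows that the source offset has to advance
      on every iteration, which is what went wrong when the same data was
      sent in each pass.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical model: copy `len` bytes starting at `page + off` into
       * consecutive records of at most `rec_max` bytes, written back to
       * back into `out`. Returns the number of records produced. */
      static size_t push_records(const unsigned char *page, size_t off,
                                 size_t len, size_t rec_max,
                                 unsigned char *out)
      {
          size_t nrec = 0;

          assert(rec_max > 0);
          while (len > 0) {
              size_t chunk = len < rec_max ? len : rec_max;

              /* Copy from the *current* position in the page. Reusing the
               * original offset on every pass would resend the same bytes
               * in each record, the symptom described above. */
              memcpy(out, page + off, chunk);
              out += chunk;
              off += chunk;   /* advance the source offset */
              len -= chunk;
              nrec++;
          }
          return nrec;
      }
      ```

      For example, 7 bytes with rec_max = 3 produce three records (3 + 3 + 1
      bytes) whose concatenation equals the original bytes.
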
      Reported-by: Adrien Moulin <amoulin@corp.free.fr>
      Link: https://lore.kernel.org/all/61481278.42813558.1677845235112.JavaMail.zimbra@corp.free.fr
      Fixes: c1318b39 ("tls: Add opt-in zerocopy mode of sendfile()")
      Tested-by: Adrien Moulin <amoulin@corp.free.fr>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
      Link: https://lore.kernel.org/r/20230304192610.3818098-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: mtk_eth_soc: fix RX data corruption issue · 193250ac
      Daniel Golle authored
      Fix a data corruption issue with SerDes-connected PHYs operating at
      1.25 Gbps, where we previously observed about 30% packet loss while
      the bad packet counter kept increasing.
      
      As almost all boards with MediaTek MT7622 or MT7986 use either the MT7531
      switch IC operating at 3.125Gbps SerDes rate or single-port PHYs using
      rate-adaptation to 2500Base-X mode, this issue was only exposed when
      we started using SFP modules operating at 1.25 Gbps on the
      BananaPi R3 board.
      
      The fix is to set bit 12, which disables the RX FIFO clear function,
      when setting up MAC MCR. The MediaTek SDK made the same change, stating:
      "If without this patch, kernel might receive invalid packets that are
      corrupted by GMAC."[1]
      
      [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
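
      A minimal sketch of the register change, assuming a hypothetical macro
      name MAC_MCR_RX_FIFO_CLR_DIS for bit 12 and a plain 32-bit value
      standing in for the MAC MCR register; the driver's real macro names
      and register accessors may differ.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define BIT_U32(n) ((uint32_t)1u << (n))
      /* Hypothetical name for MCR bit 12: disables the RX FIFO clear function. */
      #define MAC_MCR_RX_FIFO_CLR_DIS BIT_U32(12)

      /* Return the MCR value with the RX FIFO clear function disabled,
       * leaving every other configured bit untouched. */
      static uint32_t mcr_disable_rx_fifo_clear(uint32_t mcr)
      {
          return mcr | MAC_MCR_RX_FIFO_CLR_DIS;
      }
      ```

      Because the bit is ORed in, the rest of the MCR configuration chosen
      during MAC setup is preserved, and applying it twice is harmless.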
      
      Fixes: 42c03844 ("net-next: mediatek: add support for MediaTek MT7622 SoC")
      Tested-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/138da2735f92c8b6f8578ec2e5a794ee515b665f.1677937317.git.daniel@makrotopia.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: smsc: fix link up detection in forced irq mode · 58aac3a2
      Heiner Kallweit authored
      Currently, link up can't be detected in forced mode if polling
      isn't used. The only link-up interrupt source we have is aneg
      complete, which isn't applicable in forced mode. Therefore, we
      have to use energy-on as the link-up indicator.
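
      A rough sketch of that choice, assuming hypothetical flag names
      IRQ_ANEG_COMPLETE and IRQ_ENERGY_ON rather than the SMSC PHY's real
      register bits:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical interrupt-source flags; not the SMSC register layout. */
      enum irq_src {
          IRQ_ANEG_COMPLETE = 1u << 0,
          IRQ_ENERGY_ON     = 1u << 1,
      };

      /* Pick the interrupt source that can signal "link up". Aneg complete
       * only fires when autonegotiation runs, so in forced mode energy-on
       * is the only usable link-up indicator. */
      static unsigned int link_up_irq_source(bool autoneg_enabled)
      {
          return autoneg_enabled ? IRQ_ANEG_COMPLETE : IRQ_ENERGY_ON;
      }
      ```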
      
      Fixes: 73654945 ("net: phy: smsc: skip ENERGYON interrupt if disabled")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fix resolving VAR after DATASEC' · 32dfc59e
      Martin KaFai Lau authored
      Lorenz Bauer says:
      
      ====================
      
      See the first patch for a detailed explanation.
      
      v2:
      - Move RESOLVE_TBD assignment out of the loop (Martin)
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • selftests/bpf: check that modifier resolves after pointer · dfdd608c
      Lorenz Bauer authored
      Add a regression test that ensures that a VAR pointing at a
      modifier which follows a PTR (or STRUCT or ARRAY) is resolved
      correctly by the datasec validator.
      Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR · 9b459804
      Lorenz Bauer authored
      btf_datasec_resolve contains a bug that causes the following BTF
      to fail loading:
      
          [1] DATASEC a size=2 vlen=2
              type_id=4 offset=0 size=1
              type_id=7 offset=1 size=1
          [2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
          [3] PTR (anon) type_id=2
          [4] VAR a type_id=3 linkage=0
          [5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
          [6] TYPEDEF td type_id=5
          [7] VAR b type_id=6 linkage=0
      
      This error message is printed during btf_check_all_types:
      
          [1] DATASEC a size=2 vlen=2
              type_id=7 offset=1 size=1 Invalid type
      
      By tracing btf_*_resolve we can pinpoint the problem:
      
          btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0
              btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0
                  btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0
              btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0
          btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22
      
      The last invocation of btf_datasec_resolve should invoke btf_var_resolve
      by means of env_stack_push, instead it returns EINVAL. The reason is that
      env_stack_push is never executed for the second VAR.
      
          if (!env_type_is_resolve_sink(env, var_type) &&
              !env_type_is_resolved(env, var_type_id)) {
              env_stack_set_next_member(env, i + 1);
              return env_stack_push(env, var_type, var_type_id);
          }
      
      env_type_is_resolve_sink() changes its behaviour based on resolve_mode.
      For RESOLVE_PTR, we can simplify the if condition to the following:
      
          (btf_type_is_modifier() || btf_type_is_ptr()) && !env_type_is_resolved()
      
      Since we're dealing with a VAR the clause evaluates to false. This is
      not sufficient to trigger the bug however. The log output and EINVAL
      are only generated if btf_type_id_size() fails.
      
          if (!btf_type_id_size(btf, &type_id, &type_size)) {
              btf_verifier_log_vsi(env, v->t, vsi, "Invalid type");
              return -EINVAL;
          }
      
      Most types are sized, so for example a VAR referring to an INT is not a
      problem. The bug is only triggered if a VAR points at a modifier. Since
      we skipped btf_var_resolve that modifier was also never resolved, which
      means that btf_resolved_type_id returns 0 aka VOID for the modifier.
      This in turn causes btf_type_id_size to return NULL, triggering EINVAL.
      
      To summarise, the following conditions are necessary:
      
      - VAR pointing at PTR, STRUCT, UNION or ARRAY
      - Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or
        TYPE_TAG
      
      The fix is to reset resolve_mode to RESOLVE_TBD before attempting to
      resolve a VAR from a DATASEC.
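
      The mechanism can be reproduced with a small toy model. Everything in
      it (enum kind, struct env, is_sink(), resolve(), resolve_datasec()) is
      a hypothetical simplification of btf.c. Because the kernel resolver is
      stack-based rather than recursive, the sketch resets the mode before
      each VAR, whereas the patch does the reset outside the member loop of
      btf_datasec_resolve() (see the v2 note in the merge above).

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Toy model only: hypothetical stand-ins for BTF kinds and the resolver. */
      enum kind { K_INT, K_PTR, K_TYPEDEF, K_VAR };
      enum mode { RESOLVE_TBD, RESOLVE_PTR };

      struct type { enum kind kind; int target; bool resolved; };
      struct env  { enum mode mode; struct type *types; };

      /* Shaped like env_type_is_resolve_sink(): in RESOLVE_PTR mode only
       * modifiers and pointers are descended into, so a VAR is a "sink". */
      static bool is_sink(const struct env *env, const struct type *t)
      {
          if (env->mode == RESOLVE_PTR)
              return t->kind != K_TYPEDEF && t->kind != K_PTR;
          return t->kind == K_INT;   /* RESOLVE_TBD: only INT needs no work */
      }

      /* Resolve a type and what it refers to. Meeting a PTR switches the
       * mode to RESOLVE_PTR; nothing in this model switches it back. */
      static void resolve(struct env *env, int id)
      {
          struct type *t = &env->types[id];

          if (t->resolved)
              return;                /* INT entries are created resolved */
          if (t->kind == K_PTR && env->mode == RESOLVE_TBD)
              env->mode = RESOLVE_PTR;
          resolve(env, t->target);
          t->resolved = true;
      }

      /* Walk the DATASEC members (VAR ids). With reset_mode set, the mode is
       * forced back to RESOLVE_TBD before each VAR, modelling the fix. */
      static bool resolve_datasec(struct env *env, const int *vars, int n,
                                  bool reset_mode)
      {
          for (int i = 0; i < n; i++) {
              struct type *var = &env->types[vars[i]];

              if (reset_mode)
                  env->mode = RESOLVE_TBD;
              if (!is_sink(env, var) && !var->resolved)
                  resolve(env, vars[i]);
              /* Stand-in for btf_type_id_size(): an unresolved modifier
               * behind the VAR has no usable size, so validation fails. */
              if (!env->types[var->target].resolved)
                  return false;
          }
          return true;
      }
      ```

      With the ids from the example above (VAR 4 -> PTR 3, VAR 7 -> TYPEDEF
      6), resolve_datasec() returns false without the reset and true with it.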
      
      Fixes: 1dc92851 ("bpf: kernel side support for BTF Var and DataSec")
      Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES · 294635a8
      Alexander Lobakin authored
      &xdp_buff and &xdp_frame are bound in a way that
      
      xdp_buff->data_hard_start == xdp_frame
      
      It's always the case and e.g. xdp_convert_buff_to_frame() relies on
      this.
      IOW, the following:
      
      	for (u32 i = 0; i < 0xdead; i++) {
      		xdpf = xdp_convert_buff_to_frame(&xdp);
      		xdp_convert_frame_to_buff(xdpf, &xdp);
      	}
      
      shouldn't ever modify @xdpf's contents or the pointer itself.
      However, the "live packet" code wrongly treats &xdp_frame as part of its
      context, placed *before* data_hard_start. With such a layout,
      data_hard_start is sizeof(*xdpf) bytes off to the right and no longer
      points to the XDP frame.
      
      Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
      places and praying that there are no more miscalcs left somewhere in the
      code, unionize ::frm with ::data in a flex array, so that both start
      pointing to the actual data_hard_start and the XDP frame actually becomes
      part of it, i.e. part of the headroom, not the context.
      A nice side effect is that the maximum frame size for this mode gets
      increased by 40 bytes, as xdp_buff::frame_sz includes everything from
      data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
      info.
      Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
      hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
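
      The layout difference can be sketched in plain C. struct frame_stub,
      struct head_buggy and struct head_fixed are hypothetical stand-ins for
      &xdp_frame and &xdp_page_head, and the fixed variant simply places the
      frame at the start of the data area instead of using the union
      described above; it is meant to show the same data_hard_start == frame
      property, not to reproduce the patch.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for struct xdp_frame; only its size matters. */
      struct frame_stub { void *data; unsigned int len, headroom; };

      /* Buggy layout: the frame is part of the context and sits *before*
       * data[], i.e. before data_hard_start, so data_hard_start != frame. */
      struct head_buggy {
          void *ctx;                  /* hypothetical context field */
          struct frame_stub frm;
          unsigned char data[];       /* data_hard_start */
      };

      /* Fixed layout: the frame occupies the first bytes of data[], i.e. it
       * is part of the headroom, so data_hard_start == frame by construction. */
      struct head_fixed {
          void *ctx;                  /* hypothetical context field */
          unsigned char data[];       /* data_hard_start, frame lives here */
      };

      static struct frame_stub *fixed_frame(struct head_fixed *head)
      {
          return (struct frame_stub *)head->data;
      }
      ```

      In the buggy layout the two pointers differ by exactly the frame size,
      matching the "sizeof(*xdpf) off to the right" description; in the fixed
      one the frame is counted as headroom, which is also why the usable
      frame size grows by sizeof(*xdpf).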
      
      Minor: align `&head->data` with how `head->frm` is assigned for
      consistency.
      Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
      clarity.
      
      (was found while testing XDP traffic generator on ice, which calls
       xdp_convert_frame_to_buff() for each XDP frame)
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf, doc: Link to submitting-patches.rst for general patch submission info · b7abcd9c
      Bagas Sanjaya authored
      The link for general patch submission information refers to the index
      page of the "Working with the kernel development community" section of
      the kernel docs, whereas it should point to
      Documentation/process/submitting-patches.rst instead.
      
      Fix it by replacing the index target with the appropriate doc.
      
      Fixes: 54222838 ("bpf, doc: convert bpf_devel_QA.rst to use RST formatting")
      Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230228074523.11493-3-bagasdotme@gmail.com
    • bpf, doc: Do not link to docs.kernel.org for kselftest link · 32db18d6
      Bagas Sanjaya authored
      The question on how to run BPF selftests has a reference link to the
      kernel selftest documentation (Documentation/dev-tools/kselftest.rst).
      However, it uses an external link to the documentation at
      kernel.org/docs (aka docs.kernel.org) instead, which requires Internet
      access.

      Fix this by replacing it with an internal link, using the :doc:
      directive while keeping the anchor text.
      
      Fixes: b7a27c3a ("bpf, doc: howto use/run the BPF selftests")
      Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230228074523.11493-2-bagasdotme@gmail.com
    • bnxt_en: Fix the double free during device removal · 89b59a84
      Selvin Xavier authored
      The following warning was reported by KASAN during driver unload:
      
      ==================================================================
      BUG: KASAN: double-free in bnxt_remove_one+0x103/0x200 [bnxt_en]
      Free of addr ffff88814e8dd4c0 by task rmmod/17469
      CPU: 47 PID: 17469 Comm: rmmod Kdump: loaded Tainted: G S                 6.2.0-rc7+ #2
      Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.3.10 08/15/2019
      Call Trace:
       <TASK>
       dump_stack_lvl+0x33/0x46
       print_report+0x17b/0x4b3
       ? __call_rcu_common.constprop.79+0x27e/0x8c0
       ? __pfx_free_object_rcu+0x10/0x10
       ? __virt_addr_valid+0xe3/0x160
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       kasan_report_invalid_free+0x64/0xd0
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       __kasan_slab_free+0x179/0x1c0
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       __kmem_cache_free+0x194/0x350
       bnxt_remove_one+0x103/0x200 [bnxt_en]
       pci_device_remove+0x62/0x110
       device_release_driver_internal+0xf6/0x1c0
       driver_detach+0x76/0xe0
       bus_remove_driver+0x89/0x160
       pci_unregister_driver+0x26/0x110
       ? strncpy_from_user+0x188/0x1c0
       bnxt_exit+0xc/0x24 [bnxt_en]
       __x64_sys_delete_module+0x21f/0x390
       ? __pfx___x64_sys_delete_module+0x10/0x10
       ? __pfx_mem_cgroup_handle_over_high+0x10/0x10
       ? _raw_spin_lock+0x87/0xe0
       ? __pfx__raw_spin_lock+0x10/0x10
       ? __audit_syscall_entry+0x185/0x210
       ? ktime_get_coarse_real_ts64+0x51/0x80
       ? syscall_trace_enter.isra.18+0x126/0x1a0
       do_syscall_64+0x37/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7effcb6fd71b
      Code: 73 01 c3 48 8b 0d 6d 17 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 17 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffeada270b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 00005623660e0750 RCX: 00007effcb6fd71b
      RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005623660e07b8
      RBP: 0000000000000000 R08: 00007ffeada26031 R09: 0000000000000000
      R10: 00007effcb771280 R11: 0000000000000206 R12: 00007ffeada272e0
      R13: 00007ffeada28bc4 R14: 00005623660e02a0 R15: 00005623660e0750
       </TASK>
      
      Auxiliary device structures are freed in bnxt_aux_dev_release(), so avoid
      calling kfree() from bnxt_remove_one().

      Also, set bp->edev to NULL before freeing the auxiliary private structure.
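
      The ownership rule can be modelled in a few lines. struct aux_priv,
      struct adapter, aux_dev_release() and remove_one() are hypothetical
      stand-ins, and free_calls is a counter used only to make the single
      free observable: the release callback is the only place that frees,
      and it clears the back-pointer first.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical stand-ins for the adapter and auxiliary private data. */
      struct aux_priv { int id; };
      struct adapter  { struct aux_priv *edev; };

      static int free_calls;   /* counts frees, for illustration only */

      static void free_aux_priv(struct aux_priv *p)
      {
          free_calls++;
          free(p);
      }

      /* Release callback: the single owner of the auxiliary data. The
       * back-pointer is cleared before freeing so nothing can reach it. */
      static void aux_dev_release(struct adapter *bp)
      {
          struct aux_priv *p = bp->edev;

          bp->edev = NULL;
          free_aux_priv(p);
      }

      /* Device removal: no free of its own; in the driver the release
       * callback runs as part of tearing down the auxiliary device. */
      static void remove_one(struct adapter *bp)
      {
          aux_dev_release(bp);
      }
      ```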
      
      Fixes: d80d88b0 ("bnxt_en: Add auxiliary driver support")
      Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Avoid order-5 memory allocation for TPA data · accd7e23
      Michael Chan authored
      The driver needs to keep track of all the possible concurrent TPA (GRO/LRO)
      completions on the aggregation ring.  On P5 chips, the maximum number
      of concurrent TPA is 256 and the amount of memory we allocate is order-5
      on systems using 4K pages.  Memory allocation failure has been reported:
      
      NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
      CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1
      Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022
      Call Trace:
       dump_stack+0x57/0x6e
       warn_alloc.cold.120+0x7b/0xdd
       ? _cond_resched+0x15/0x30
       ? __alloc_pages_direct_compact+0x15f/0x170
       __alloc_pages_slowpath.constprop.108+0xc58/0xc70
       __alloc_pages_nodemask+0x2d0/0x300
       kmalloc_order+0x24/0xe0
       kmalloc_order_trace+0x19/0x80
       bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en]
       ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en]
       __bnxt_open_nic+0x12e/0x780 [bnxt_en]
       bnxt_open+0x10b/0x240 [bnxt_en]
       __dev_open+0xe9/0x180
       __dev_change_flags+0x1af/0x220
       dev_change_flags+0x21/0x60
       do_setlink+0x35c/0x1100
      
      Instead of allocating this big chunk of memory and dividing it up for the
      concurrent TPA instances, allocate each small chunk separately for each
      TPA instance.  This will reduce it to order-0 allocations.
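
      The arithmetic and the allocation pattern can be sketched as follows,
      assuming, purely for illustration, a 512-byte per-instance record (the
      real struct size is not given here). order_for() mirrors the idea of
      the kernel's get_order() for 4K pages, and alloc_per_instance() is a
      hypothetical helper, with calloc() standing in for the kernel
      allocator.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>

      #define SKETCH_PAGE_SIZE 4096UL

      /* Smallest order such that (PAGE_SIZE << order) >= size. */
      static unsigned int order_for(size_t size)
      {
          unsigned int order = 0;

          while ((SKETCH_PAGE_SIZE << order) < size)
              order++;
          return order;
      }

      /* Allocate `n` separate zeroed chunks of `each` bytes instead of one
       * n * each block. On failure, free what was allocated and return -1. */
      static int alloc_per_instance(void **slots, size_t n, size_t each)
      {
          for (size_t i = 0; i < n; i++) {
              slots[i] = calloc(1, each);
              if (!slots[i]) {
                  while (i--)
                      free(slots[i]);
                  return -1;
              }
          }
          return 0;
      }
      ```

      With the assumed 512-byte record, a single 256 × 512 = 131072-byte
      block needs order 5 with 4K pages, while each 512-byte piece needs
      only order 0.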
      
      Fixes: 79632e9b ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.")
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylib: get rid of unnecessary locking · f4b47a2e
      Russell King (Oracle) authored
      The locking in phy_probe() and phy_remove() does very little to prevent
      any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA
      warnings. Remove it.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.2.0-dirty #1108 Tainted: G        W   E
      ------------------------------------------------------
      ip/415 is trying to acquire lock:
      ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy]
      
      but task is already holding lock:
      ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}-{3:3}:
             __lock_acquire+0x35c/0x6c0
             lock_acquire.part.0+0xcc/0x220
             lock_acquire+0x68/0x84
             __mutex_lock+0x8c/0x414
             mutex_lock_nested+0x34/0x40
             rtnl_lock+0x24/0x30
             sfp_bus_add_upstream+0x34/0x150
             phy_sfp_probe+0x4c/0x94 [libphy]
             mv3310_probe+0x148/0x184 [marvell10g]
             phy_probe+0x8c/0x200 [libphy]
             call_driver_probe+0xbc/0x15c
             really_probe+0xc0/0x320
             __driver_probe_device+0x84/0x120
             driver_probe_device+0x44/0x120
             __device_attach_driver+0xc4/0x160
             bus_for_each_drv+0x80/0xe0
             __device_attach+0xb0/0x1f0
             device_initial_probe+0x1c/0x2c
             bus_probe_device+0xa4/0xb0
             device_add+0x360/0x53c
             phy_device_register+0x60/0xa4 [libphy]
             fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio]
             fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio]
             of_mdiobus_register+0x140/0x340 [of_mdio]
             orion_mdio_probe+0x298/0x3c0 [mvmdio]
             platform_probe+0x70/0xe0
             call_driver_probe+0x34/0x15c
             really_probe+0xc0/0x320
             __driver_probe_device+0x84/0x120
             driver_probe_device+0x44/0x120
             __driver_attach+0x104/0x210
             bus_for_each_dev+0x78/0xdc
             driver_attach+0x2c/0x3c
             bus_add_driver+0x184/0x240
             driver_register+0x80/0x13c
             __platform_driver_register+0x30/0x3c
             xt_compat_calc_jump+0x28/0xa4 [x_tables]
             do_one_initcall+0x50/0x1b0
             do_init_module+0x50/0x1fc
             load_module+0x684/0x744
             __do_sys_finit_module+0xc4/0x140
             __arm64_sys_finit_module+0x28/0x34
             invoke_syscall+0x50/0x120
             el0_svc_common.constprop.0+0x6c/0x1b0
             do_el0_svc+0x34/0x44
             el0_svc+0x48/0xf0
             el0t_64_sync_handler+0xb8/0xc0
             el0t_64_sync+0x1a0/0x1a4
      
      -> #0 (&dev->lock){+.+.}-{3:3}:
             check_prev_add+0xb4/0xc80
             validate_chain+0x414/0x47c
             __lock_acquire+0x35c/0x6c0
             lock_acquire.part.0+0xcc/0x220
             lock_acquire+0x68/0x84
             __mutex_lock+0x8c/0x414
             mutex_lock_nested+0x34/0x40
             phy_attach_direct+0x17c/0x3a0 [libphy]
             phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink]
             phylink_fwnode_phy_connect+0x48/0x60 [phylink]
             mvpp2_open+0xec/0x2e0 [mvpp2]
             __dev_open+0x104/0x214
             __dev_change_flags+0x1d4/0x254
             dev_change_flags+0x2c/0x7c
             do_setlink+0x254/0xa50
             __rtnl_newlink+0x430/0x514
             rtnl_newlink+0x58/0x8c
             rtnetlink_rcv_msg+0x17c/0x560
             netlink_rcv_skb+0x64/0x150
             rtnetlink_rcv+0x20/0x30
             netlink_unicast+0x1d4/0x2b4
             netlink_sendmsg+0x1a4/0x400
             ____sys_sendmsg+0x228/0x290
             ___sys_sendmsg+0x88/0xec
             __sys_sendmsg+0x70/0xd0
             __arm64_sys_sendmsg+0x2c/0x40
             invoke_syscall+0x50/0x120
             el0_svc_common.constprop.0+0x6c/0x1b0
             do_el0_svc+0x34/0x44
             el0_svc+0x48/0xf0
             el0t_64_sync_handler+0xb8/0xc0
             el0t_64_sync+0x1a0/0x1a4
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock(&dev->lock);
                                     lock(rtnl_mutex);
        lock(&dev->lock);
      
       *** DEADLOCK ***
      
      Fixes: 298e54fa ("net: phy: add core phylib sfp support")
      Reported-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: add to set device wake up flag when stmmac init phy · a9334b70
      Rongguang Wei authored
      When the MAC does not support PMT, the driver checks the PHY's WoL
      capability and sets the device wakeup capability in stmmac_init_phy().
      If WoL is then enabled through ethtool, the driver enables the device
      wakeup flag, so device_may_wakeup() returns true.

      However, if the PHY's WoL is enabled directly, e.g. by the BIOS, the
      driver does not know about it and does not set the device wakeup flag.
      phy_suspend() may then fail like this:
      
      [   32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16
      [   32.409065] PM: Device stmmac-1:00 failed to suspend: error -16
      [   32.409067] PM: Some devices failed to suspend, or early wake event detected
      
      Fix this by setting the device wakeup enable flag according to the
      result of the PHY's get_wol function.
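
      A minimal userspace model of the flag logic. struct wol_report, struct
      wake_state and init_phy_wakeup() are hypothetical; in the kernel the
      natural counterparts would be the device_set_wakeup_capable() and
      device_set_wakeup_enable() helpers, but this sketch does not claim to
      reproduce the patch.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical stand-ins: what the PHY reports via its get_wol
       * callback, and the device's wakeup flags. */
      struct wol_report { unsigned int supported, wolopts; };
      struct wake_state { bool capable, enabled; };

      /* Without MAC PMT, derive both wakeup flags from the PHY report, so
       * WoL already enabled outside the driver (e.g. by the BIOS) is also
       * reflected in the "enabled" flag. */
      static void init_phy_wakeup(struct wake_state *dev, bool mac_has_pmt,
                                  const struct wol_report *wol)
      {
          if (mac_has_pmt)
              return;                          /* MAC handles WoL itself */
          dev->capable = wol->supported != 0;
          dev->enabled = wol->wolopts != 0;    /* the step the fix adds */
      }
      ```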
      
      v2: add a Fixes tag.
      
      Fixes: 1d8e5b0f ("net: stmmac: Support WOL with phy")
      Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 03 Mar, 2023 13 commits
    • bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() · d900f3d2
      Liu Jian authored
      When the buffer length of the recvmsg system call is 0, we get the
      following soft lockup:
      
      watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149]
      CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:remove_wait_queue+0xb/0xc0
      Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20
      RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768
      RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040
      RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7
      R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800
      R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0
      FS:  00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_msg_wait_data+0x279/0x2f0
       tcp_bpf_recvmsg_parser+0x3c6/0x490
       inet_recvmsg+0x280/0x290
       sock_recvmsg+0xfc/0x120
       ____sys_recvmsg+0x160/0x3d0
       ___sys_recvmsg+0xf0/0x180
       __sys_recvmsg+0xea/0x1a0
       do_syscall_64+0x3f/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The logic in tcp_bpf_recvmsg_parser is as follows:
      
      msg_bytes_ready:
      	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
      	if (!copied) {
      		/* wait for data */
      		goto msg_bytes_ready;
      	}
      
      In this case, "copied" is always 0, so the loop never terminates.

      According to the Linux system call man page, 0 should be returned in this
      case. Therefore, in tcp_bpf_recvmsg_parser(), return directly if the
      length is 0. Also fix several other functions with the same problem.
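
      A minimal userspace model of the fix. struct msg_queue, queue_copy()
      and recvmsg_sketch() are hypothetical stand-ins for the psock queue,
      sk_msg_recvmsg() and tcp_bpf_recvmsg_parser(); max_waits exists only
      so the sketch cannot hang, whereas the real code sleeps until data
      arrives.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical queue standing in for the psock ingress data. */
      struct msg_queue { const char *buf; size_t avail; };

      /* Copy up to `len` bytes. Returns 0 both for an empty queue and for
       * len == 0, so "copied == 0" alone cannot mean "no data yet". */
      static size_t queue_copy(struct msg_queue *q, char *dst, size_t len)
      {
          size_t n = len < q->avail ? len : q->avail;

          memcpy(dst, q->buf, n);
          q->buf += n;
          q->avail -= n;
          return n;
      }

      static long recvmsg_sketch(struct msg_queue *q, char *dst, size_t len,
                                 int max_waits)
      {
          if (len == 0)
              return 0;              /* the fix: recvmsg(len = 0) returns 0 */

          for (int waits = 0; waits <= max_waits; waits++) {
              size_t copied = queue_copy(q, dst, len);

              if (copied)
                  return (long)copied;
              /* no data yet: the kernel would wait here, then retry */
          }
          return -1;                 /* the real code would keep waiting */
      }
      ```

      Without the early return, a zero-length read on a non-empty queue would
      spin until max_waits runs out, which is the loop in the trace above.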
      
      Fixes: 1f5be6b3 ("udp: Implement udp_bpf_recvmsg() for sockmap")
      Fixes: 9825d866 ("af_unix: Implement unix_dgram_bpf_recvmsg()")
      Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com
    • Merge branch 'nfp-ipsec-csum' · 52812526
      David S. Miller authored
      Simon Horman says:
      
      ====================
      nfp: fix incorrect IPsec checksum handling
      
      This short series resolves two problems with IPsec checksum handling
      in the nfp driver.
      
      * PATCH 1/3, 2/3: Correct setting of checksum flags.
        One patch for each of the nfd3 and nfdk datapaths.
      
      * Patch 3/3: Correct configuration of NETIF_F_CSUM_MASK
        so that the stack does not unnecessarily calculate csums for
        IPsec offload packets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: fix esp-tx-csum-offload doesn't take effect · 1cf78d4c
      Huanhuan Wang authored
      When esp-tx-csum-offload is set to on, the protocol stack shouldn't
      calculate the IPsec offload packet's csum, but it does, because the
      `.ndo_features_check` callback incorrectly masked the NETIF_F_CSUM_MASK bits.
      
      Fixes: 57f273ad ("nfp: add framework to support ipsec offloading")
      Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: fix incorrectly set csum flag for nfdk path · 8b46168c
      Huanhuan Wang authored
      The csum flags of IPsec packets are set repeatedly. Therefore, the csum
      flag setting for IPsec and non-IPsec packets needs to be distinguished.

      As the IPv6 header does not have a checksum field, the l3-csum flag does
      not need to be set in the IPv6 case.
      
      Fixes: 436396f2 ("nfp: support IPsec offloading for NFP3800")
      Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: fix incorrectly set csum flag for nfd3 path · 3e04419c
      Huanhuan Wang authored
      The csum flags of IPsec packets are set repeatedly. Therefore, the csum
      flag setting for IPsec and non-IPsec packets needs to be distinguished.

      As the IPv6 header does not have a checksum field, the l3-csum flag does
      not need to be set in the IPv6 case.

      The L4-csum flag covers both the TCP and the UDP csum flags. We shouldn't
      set both for the same packet; instead, set the L4-csum flag according
      to whether the transport layer is TCP or UDP.
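
      The flag rules above can be sketched as a small pure function. The flag
      names and bit values (CSUM_L3, CSUM_TCP, CSUM_UDP) and ipsec_csum_flags()
      are hypothetical, not the nfp descriptor encoding.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical descriptor flag bits; not the nfp hardware encoding. */
      #define CSUM_L3  (1u << 0)
      #define CSUM_TCP (1u << 1)
      #define CSUM_UDP (1u << 2)

      enum l4_proto { L4_TCP, L4_UDP, L4_OTHER };

      /* Checksum-offload flags for one IPsec packet: L3 csum only for IPv4
       * (IPv6 has no header checksum), and at most one L4 flag, chosen by
       * the actual transport protocol. */
      static unsigned int ipsec_csum_flags(bool is_ipv4, enum l4_proto proto)
      {
          unsigned int flags = 0;

          if (is_ipv4)
              flags |= CSUM_L3;
          if (proto == L4_TCP)
              flags |= CSUM_TCP;
          else if (proto == L4_UDP)
              flags |= CSUM_UDP;
          return flags;
      }
      ```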
      
      Fixes: 57f273ad ("nfp: add framework to support ipsec offloading")
      Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: copy last block omitted in ice_get_module_eeprom() · 84cba184
      Petr Oros authored
      ice_get_module_eeprom() has been broken since commit e9c9692c ("ice:
      Reimplement module reads used by ethtool"). In this refactor,
      ice_get_module_eeprom() reads the EEPROM in blocks of 8 bytes,
      but the condition meant to prevent a buffer overflow skips the
      last block, so the last block of a read always contains zeros.
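
      The sketch below shows, in standalone C, a block-wise copy that keeps
      the short last block without overrunning the caller's buffer. It is an
      illustration only, not the ice driver code. read_block() and
      get_module_eeprom() here are hypothetical helpers. The "EEPROM" is
      synthetic: every byte holds the low 8 bits of its own address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8

/* Hypothetical block reader over a synthetic EEPROM in which every byte
 * holds the low 8 bits of its own address. Always returns BLOCK_SIZE bytes. */
static void read_block(size_t addr, uint8_t out[BLOCK_SIZE])
{
	for (size_t j = 0; j < BLOCK_SIZE; j++)
		out[j] = (uint8_t)(addr + j);
}

/* Copy len bytes starting at offset into data, one 8-byte block at a time. */
static void get_module_eeprom(size_t offset, size_t len, uint8_t *data)
{
	uint8_t value[BLOCK_SIZE];

	for (size_t i = 0; i < len; i += BLOCK_SIZE) {
		/* The last block may be short: copy only what fits in data. */
		size_t chunk = (len - i < BLOCK_SIZE) ? len - i : BLOCK_SIZE;

		assert(chunk > 0 && chunk <= BLOCK_SIZE);
		read_block(offset + i, value);
		memcpy(data + i, value, chunk);
	}
}
```

      With this loop, a 12-byte read at offset 0x90 fills all 12 bytes, and a
      1-byte read (as newer ethtool issues for the identifier) returns that
      byte instead of zero.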
      
      The bug was uncovered by upstream ethtool commit 9538f384b535
      ("netlink: eeprom: Defer page requests to individual parsers").
      After this commit, ethtool reads a block of length 1
      to read the SFF-8024 identifier value.
      
      unpatched driver:
      $ ethtool -m enp65s0f0np0 offset 0x90 length 8
      Offset          Values
      ------          ------
      0x0090:         00 00 00 00 00 00 00 00
      $ ethtool -m enp65s0f0np0 offset 0x90 length 12
      Offset          Values
      ------          ------
      0x0090:         00 00 01 a0 4d 65 6c 6c 00 00 00 00
      $
      
      $ ethtool -m enp65s0f0np0
      Offset          Values
      ------          ------
      0x0000:         11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0010:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0030:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0040:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0060:         00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00
      0x0070:         00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      patched driver:
      $ ethtool -m enp65s0f0np0 offset 0x90 length 8
      Offset          Values
      ------          ------
      0x0090:         00 00 01 a0 4d 65 6c 6c
      $ ethtool -m enp65s0f0np0 offset 0x90 length 12
      Offset          Values
      ------          ------
      0x0090:         00 00 01 a0 4d 65 6c 6c 61 6e 6f 78
      $ ethtool -m enp65s0f0np0
          Identifier                                : 0x11 (QSFP28)
          Extended identifier                       : 0x00
          Extended identifier description           : 1.5W max. Power consumption
          Extended identifier description           : No CDR in TX, No CDR in RX
          Extended identifier description           : High Power Class (> 3.5 W) not enabled
          Connector                                 : 0x23 (No separable connector)
          Transceiver codes                         : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
          Transceiver type                          : 40G Ethernet: 40G Base-CR4
          Transceiver type                          : 25G Ethernet: 25G Base-CR CA-N
          Encoding                                  : 0x05 (64B/66B)
          BR, Nominal                               : 25500Mbps
          Rate identifier                           : 0x00
          Length (SMF,km)                           : 0km
          Length (OM3 50um)                         : 0m
          Length (OM2 50um)                         : 0m
          Length (OM1 62.5um)                       : 0m
          Length (Copper or Active cable)           : 1m
          Transmitter technology                    : 0xa0 (Copper cable unequalized)
          Attenuation at 2.5GHz                     : 4db
          Attenuation at 5.0GHz                     : 5db
          Attenuation at 7.0GHz                     : 7db
          Attenuation at 12.9GHz                    : 10db
          ........
          ....
      
      Fixes: e9c9692c ("ice: Reimplement module reads used by ethtool")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'net-tools-ynl-fixes' · 8f632a0a
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      tools: ynl: fix subset use and change default value for attrs/ops
      
      Fix a problem in subsetting, which will become apparent when
      the devlink family comes after the merge window. Even though none
      of the existing families need this, we don't want someone to
      get "inspired" by the current, incorrect code when using specs
      in other languages.

      Change the default value for the first attr/op. This is a slight
      behavior change, so it needs to go in now. The diffstat of the last
      patch should serve as the clearest justification.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski
      netlink: specs: update for codegen enumerating from 1 · bcec7171
      Jakub Kicinski authored
      Now that the codegen rules have been changed, we can update
      the specs to reflect the new default.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski
      tools: ynl: use 1 as the default for first entry in attrs/ops · ad4fafcd
      Jakub Kicinski authored
      Pretty much all families either use value: 1 for, or reserve as
      unspec, the first entry in the attribute set and the first operation.
      Make this the default. Update the documentation (the doc for
      operation values just refers back to the doc for attrs,
      so only the attrs doc is updated).
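
      The C model below shows how the new default numbering could behave. It
      assumes that an entry without an explicit value takes the previous
      entry's value plus one, and that the first entry now defaults to 1.
      The real generator is the Python ynl tooling. struct spec_entry,
      VALUE_UNSET and assign_values() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VALUE_UNSET (-1)	/* no explicit "value:" in the spec */

/* Hypothetical model of one attr/op entry from a YAML spec. */
struct spec_entry {
	const char *name;
	int value;
};

/* Assign implicit values: the first entry defaults to 1, leaving 0 free
 * for an "unspec" entry, and each later implicit entry follows the
 * previous one. An explicit value overrides the default and the
 * numbering continues from it. */
static void assign_values(struct spec_entry *entries, size_t n)
{
	int next = 1;

	for (size_t i = 0; i < n; i++) {
		if (entries[i].value == VALUE_UNSET)
			entries[i].value = next;
		assert(entries[i].value >= 0);
		next = entries[i].value + 1;
	}
}
```

      Under these assumptions, a spec listing three attributes with no
      explicit values numbers them 1, 2 and 3 rather than 0, 1 and 2.
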
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski
      tools: ynl: fully inherit attrs in subsets · 7cf93538
      Jakub Kicinski authored
      To avoid having to repeat the entire definition of an attribute
      (including the value), use the Attr object from the original set.
      In fact, this is already the documented expectation.
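
      The C sketch below illustrates the idea. A subset stores references to
      the original attribute objects instead of copies, so every field,
      including the value, is inherited. The struct layout and helper names
      are hypothetical and do not mirror the Python ynl classes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute definition in a full attribute set. */
struct attr {
	const char *name;
	int value;
	const char *type;
};

/* Find an attribute by name in the original (full) set. */
static const struct attr *find_attr(const struct attr *set, size_t n,
				    const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(set[i].name, name) == 0)
			return &set[i];
	return NULL;
}

/* Build a subset as pointers to the original Attr objects, so every field,
 * including the value, is inherited instead of being re-declared.
 * Names not found in the full set are skipped. Returns the subset size. */
static size_t build_subset(const struct attr *set, size_t n,
			   const char *const *names, size_t n_names,
			   const struct attr **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n_names; i++) {
		const struct attr *a = find_attr(set, n, names[i]);

		if (a != NULL)
			out[count++] = a;
	}
	assert(count <= n_names);
	return count;
}
```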
      
      Fixes: be5bea1c ("net: add basic C code generators for Netlink")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski
      Merge tag 'ieee802154-for-net-2023-03-02' of... · ad93bab6
      Jakub Kicinski authored
      Merge tag 'ieee802154-for-net-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
      
      Stefan Schmidt says:
      
      ====================
      ieee802154 for net 2023-03-02
      
      Two small fixes this time.
      
      Alexander Aring fixed a potential negative array access in the ca8210
      driver.
      
      Miquel Raynal fixed a crash that could have been triggered through
      the extended netlink API for 802154. The affected code only arrived
      in this merge window. Found by syzkaller.
      
      * tag 'ieee802154-for-net-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
        ieee802154: Prevent user from crashing the host
        ca8210: fix mac_len negative array access
      ====================
      
      Link: https://lore.kernel.org/r/20230302153032.1312755-1-stefan@datenfreihafen.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Shigeru Yoshida
      net: caif: Fix use-after-free in cfusbl_device_notify() · 9781e98a
      Shigeru Yoshida authored
      syzbot reported a use-after-free in cfusbl_device_notify() [1], which
      causes a stack trace like the one below:
      
      BUG: KASAN: use-after-free in cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
      Read of size 8 at addr ffff88807ac4e6f0 by task kworker/u4:6/1214
      
      CPU: 0 PID: 1214 Comm: kworker/u4:6 Not tainted 5.19.0-rc3-syzkaller-00146-g92f20ff7 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1945
       call_netdevice_notifiers_extack net/core/dev.c:1983 [inline]
       call_netdevice_notifiers net/core/dev.c:1997 [inline]
       netdev_wait_allrefs_any net/core/dev.c:10227 [inline]
       netdev_run_todo+0xbc0/0x10f0 net/core/dev.c:10341
       default_device_exit_batch+0x44e/0x590 net/core/dev.c:11334
       ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
       cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
       </TASK>
      
      When unregistering a net device, unregister_netdevice_many_notify()
      sets the device's reg_state to NETREG_UNREGISTERING, calls notifiers
      with NETDEV_UNREGISTER, and adds the device to the todo list.
      
      Later on, devices in the todo list are processed by netdev_run_todo().
      netdev_run_todo() waits for each device's reference count to drop to 1
      while rebroadcasting the NETDEV_UNREGISTER notification.

      When cfusbl_device_notify() is called with NETDEV_UNREGISTER multiple
      times, the parent device might already have been freed, which can
      cause a use-after-free. Processing NETDEV_UNREGISTER multiple times
      also unbalances the module's reference count.

      This patch fixes the issue by accepting only the first NETDEV_UNREGISTER
      notification.
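
      The standalone C model below shows the "handle only the first
      UNREGISTER" idea with a per-device flag. The event names, struct
      link_state and its counters are hypothetical, and the actual patch may
      detect the repeat differently (for example, from the device's
      registration state).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical notifier events; the kernel's NETDEV_* values differ. */
enum netdev_event { EV_REGISTER, EV_UNREGISTER };

/* Hypothetical per-device notifier state. */
struct link_state {
	bool unregistered;	/* set once the first UNREGISTER is handled */
	int module_refs;	/* models the module reference count */
	int teardowns;		/* how many times teardown actually ran */
};

static void device_notify(struct link_state *st, enum netdev_event ev)
{
	switch (ev) {
	case EV_REGISTER:
		st->module_refs++;
		break;
	case EV_UNREGISTER:
		/* netdev_run_todo() may rebroadcast UNREGISTER while waiting
		 * for references to drop; ignore every repeat. */
		if (st->unregistered)
			break;
		st->unregistered = true;
		st->teardowns++;
		st->module_refs--;
		assert(st->module_refs >= 0);
		break;
	}
}
```

      Repeated UNREGISTER events leave the teardown count at one and the
      module reference count balanced.
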
      
      Fixes: 7ad65bf6 ("caif: Add support for CAIF over CDC NCM USB interface")
      CC: sjur.brandeland@stericsson.com <sjur.brandeland@stericsson.com>
      Reported-by: syzbot+b563d33852b893653a9e@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?id=c3bfd8e2450adab3bffe4d80821fbbced600407f [1]
      Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
      Link: https://lore.kernel.org/r/20230301163913.391304-1-syoshida@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Yuiko Oshino
      net: lan78xx: fix accessing the LAN7800's internal phy specific registers from the MAC driver · e57cf363
      Yuiko Oshino authored
      Move the register accesses specific to the LAN7800 internal PHY
      (PHY ID 0x0007c132) to the PHY driver (microchip.c).

      This fixes the error reported by Enguerrand de Ribaucourt in December
      2022: "Some operations during the cable switch workaround modify the
      register LAN88XX_INT_MASK of the PHY. However, this register is specific
      to the LAN8835 PHY. For instance, if a DP8322I PHY is connected to the
      LAN7801, that register (0x19) corresponds to the LED and MAC address
      configuration, resulting in inappropriate behavior."
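
      The C model below illustrates the architectural point. Only the driver
      that knows the internal PHY's register layout installs a callback, and
      the MAC-side workaround calls it only if present, so a foreign PHY's
      register 0x19 is left alone. The ops-table design, struct phy, and the
      value written to the mask register are all hypothetical. The real fix
      moves the logic into microchip.c in its own way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAN7800_INTERNAL_PHY_ID 0x0007c132u	/* from the commit message */
#define INT_MASK_REG 0x19u			/* register 0x19, per the report */

/* Hypothetical PHY model: a small register file plus driver-supplied ops. */
struct phy {
	uint32_t phy_id;
	uint16_t regs[32];
	/* Set only by the driver that knows this PHY's register layout. */
	void (*mask_interrupts)(struct phy *phy);
};

/* Lives in the PHY driver: register 0x19 is an interrupt mask only here. */
static void internal_phy_mask_interrupts(struct phy *phy)
{
	assert(phy->phy_id == LAN7800_INTERNAL_PHY_ID);
	phy->regs[INT_MASK_REG] = 0;	/* hypothetical "all masked" value */
}

static void phy_driver_probe(struct phy *phy)
{
	if (phy->phy_id == LAN7800_INTERNAL_PHY_ID)
		phy->mask_interrupts = internal_phy_mask_interrupts;
}

/* MAC-side workaround: never touches PHY-specific registers directly. */
static void mac_cable_switch_workaround(struct phy *phy)
{
	if (phy->mask_interrupts != NULL)
		phy->mask_interrupts(phy);
}
```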
      
      I did not test with the DP8322I PHY, but I tested with an EVB-LAN7800
      with the internal PHY.
      
      Fixes: 14437e3f ("lan78xx: workaround of forced 100 Full/Half duplex mode error")
      Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230301154307.30438-1-yuiko.oshino@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 02 Mar, 2023 6 commits
  5. 01 Mar, 2023 7 commits
    • Pablo Neira Ayuso
      netfilter: nft_quota: copy content when cloning expression · aabef97a
      Pablo Neira Ayuso authored
      If the ruleset contains consumed quotas, restore them accordingly.
      Otherwise, listing the ruleset after restoration shows them as never used.

      Restore the user-defined quota and flags too.
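
      The C sketch below shows what "copy content when cloning" means. The
      clone duplicates the consumed counter, the quota and the flags instead
      of starting from a zeroed state. struct quota_state and quota_clone()
      are hypothetical. In the kernel, the consumed counter is an atomic, so
      the real code must copy it with atomic accessors rather than a plain
      struct assignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical quota state, kept outside the expression data. */
struct quota_state {
	uint64_t quota;		/* user-defined byte quota */
	uint32_t flags;		/* user-defined flags */
	uint64_t consumed;	/* bytes consumed so far */
};

/* Clone the state by copying every field; a zeroed clone would make a
 * restored ruleset report the quota as never used. */
static struct quota_state *quota_clone(const struct quota_state *src)
{
	struct quota_state *dst;

	assert(src != NULL);
	dst = malloc(sizeof(*dst));
	if (dst == NULL)
		return NULL;
	*dst = *src;
	return dst;
}
```
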
      
      Fixes: ed0a0c60 ("netfilter: nft_quota: move stateful fields out of expression data")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Pablo Neira Ayuso
      netfilter: nft_last: copy content when cloning expression · 860e8742
      Pablo Neira Ayuso authored
      If the ruleset contains last-use timestamps, restore them accordingly.
      Otherwise, listing the ruleset after restoration shows them as never used.
      
      Fixes: 33a24de3 ("netfilter: nft_last: move stateful fields out of expression data")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Hangbin Liu
      selftests: nft_nat: ensuring the listening side is up before starting the client · 2067e7a0
      Hangbin Liu authored
      The test_local_dnat_portonly() function starts the client as soon as
      it puts the listening side in the background. This can lead to a race
      condition where the server is not yet listening. To ensure that the
      server is up and running before the client starts, a delay is added
      to test_local_dnat_portonly().
      
      Before the fix:
        # ./nft_nat.sh
        PASS: netns routing/connectivity: ns0-rthlYrBU can reach ns1-rthlYrBU and ns2-rthlYrBU
        PASS: ping to ns1-rthlYrBU was ip NATted to ns2-rthlYrBU
        PASS: ping to ns1-rthlYrBU OK after ip nat output chain flush
        PASS: ipv6 ping to ns1-rthlYrBU was ip6 NATted to ns2-rthlYrBU
        2023/02/27 04:11:03 socat[6055] E connect(5, AF=2 10.0.1.99:2000, 16): Connection refused
        ERROR: inet port rewrite
      
      After the fix:
        # ./nft_nat.sh
        PASS: netns routing/connectivity: ns0-9sPJV6JJ can reach ns1-9sPJV6JJ and ns2-9sPJV6JJ
        PASS: ping to ns1-9sPJV6JJ was ip NATted to ns2-9sPJV6JJ
        PASS: ping to ns1-9sPJV6JJ OK after ip nat output chain flush
        PASS: ipv6 ping to ns1-9sPJV6JJ was ip6 NATted to ns2-9sPJV6JJ
        PASS: inet port rewrite without l3 address
      
      Fixes: 282e5f8f ("netfilter: nat: really support inet nat without l3 address")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Horatiu Vultur
      net: lan966x: Fix port police support using tc-matchall · 81563d85
      Horatiu Vultur authored
      When the policer was removed from the port, the driver tried to
      remove it using the police id rather than the actual police index.
      The police id is the identifier of the policer, while the police
      index is the position in HW where the policer is located.
      The port police id can be any number, whereas the port police index
      is derived from the chip port number.
      Fix this by deleting the policer located in HW at the police index
      rather than at the police id.
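
      The standalone C model below shows the distinction. The HW slot is
      computed from the chip port, and deletion uses that slot, never the
      user-chosen police id. The base slot POL_PORT_BASE, the port count and
      the boolean policer table are hypothetical stand-ins for the lan966x
      hardware layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 8		/* hypothetical port count */
#define POL_PORT_BASE 0x40u	/* hypothetical first HW slot for port policers */

/* Hypothetical HW policer table: true means the slot is in use. */
static bool hw_policer_used[POL_PORT_BASE + NUM_PORTS];

/* The HW position is derived from the chip port, not from the user's id. */
static uint32_t port_police_index(uint32_t chip_port)
{
	assert(chip_port < NUM_PORTS);
	return POL_PORT_BASE + chip_port;
}

static void port_police_add(uint32_t chip_port, uint32_t police_id)
{
	(void)police_id;	/* user-chosen id: any number, not a HW slot */
	hw_policer_used[port_police_index(chip_port)] = true;
}

static void port_police_del(uint32_t chip_port, uint32_t police_id)
{
	(void)police_id;
	/* Delete by HW index; indexing by police_id was the bug. */
	hw_policer_used[port_police_index(chip_port)] = false;
}
```

      Using a large police id such as 12345 as an index here would point far
      outside the table, which is why the index must come from the port.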
      
      Fixes: 5390334b ("net: lan966x: Add port police support using tc-matchall")
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      net/sched: flower: fix fl_change() error recovery path · dfd2f0eb
      Eric Dumazet authored
      The two "goto errout;" paths in fl_change() became wrong
      after the cited commit.

      We must not call __fl_put() until the net pointer
      has been set in tcf_exts_init_ex().
      
      This is a minimal fix. We might in the future validate TCA_FLOWER_FLAGS
      before we allocate @fnew.
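
      The C model below shows the ordering rule. Failures that happen before
      the net pointer is set must take a plain-free path, and only failures
      after it is set may use the full put. struct filter, exts_init(),
      filter_put() and the two error labels are hypothetical stand-ins for
      the cls_flower code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical network namespace with a reference count. */
struct netns {
	int refcount;
};

/* Hypothetical filter: exts_net stays NULL until exts_init() runs. */
struct filter {
	struct netns *exts_net;
};

static void exts_init(struct filter *f, struct netns *net)
{
	f->exts_net = net;
	net->refcount++;
}

/* Plays the role of __fl_put(): only valid once exts_net has been set. */
static void filter_put(struct filter *f)
{
	assert(f->exts_net != NULL);
	f->exts_net->refcount--;
	free(f);
}

/* Create a filter; the two flags simulate failures before and after
 * exts_init(). */
static int filter_change(struct netns *net, bool fail_early, bool fail_late,
			 struct filter **out)
{
	struct filter *f;
	int err = -EINVAL;

	*out = NULL;
	f = calloc(1, sizeof(*f));
	if (f == NULL)
		return -ENOMEM;

	if (fail_early)
		goto errout_free;	/* exts_net not set: must not call put */

	exts_init(f, net);
	if (fail_late)
		goto errout_put;	/* exts_net set: full put is safe */

	*out = f;
	return 0;

errout_put:
	filter_put(f);
	return err;
errout_free:
	free(f);
	return err;
}
```
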
      
      BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:72 [inline]
      BUG: KASAN: null-ptr-deref in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
      BUG: KASAN: null-ptr-deref in refcount_read include/linux/refcount.h:147 [inline]
      BUG: KASAN: null-ptr-deref in __refcount_add_not_zero include/linux/refcount.h:152 [inline]
      BUG: KASAN: null-ptr-deref in __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
      BUG: KASAN: null-ptr-deref in refcount_inc_not_zero include/linux/refcount.h:245 [inline]
      BUG: KASAN: null-ptr-deref in maybe_get_net include/net/net_namespace.h:269 [inline]
      BUG: KASAN: null-ptr-deref in tcf_exts_get_net include/net/pkt_cls.h:260 [inline]
      BUG: KASAN: null-ptr-deref in __fl_put net/sched/cls_flower.c:513 [inline]
      BUG: KASAN: null-ptr-deref in __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508
      Read of size 4 at addr 000000000000014c by task syz-executor548/5082
      
      CPU: 0 PID: 5082 Comm: syz-executor548 Not tainted 6.2.0-syzkaller-05251-g5b7c4cab #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
      print_report mm/kasan/report.c:420 [inline]
      kasan_report+0xec/0x130 mm/kasan/report.c:517
      check_region_inline mm/kasan/generic.c:183 [inline]
      kasan_check_range+0x141/0x190 mm/kasan/generic.c:189
      instrument_atomic_read include/linux/instrumented.h:72 [inline]
      atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
      refcount_read include/linux/refcount.h:147 [inline]
      __refcount_add_not_zero include/linux/refcount.h:152 [inline]
      __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
      refcount_inc_not_zero include/linux/refcount.h:245 [inline]
      maybe_get_net include/net/net_namespace.h:269 [inline]
      tcf_exts_get_net include/net/pkt_cls.h:260 [inline]
      __fl_put net/sched/cls_flower.c:513 [inline]
      __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508
      fl_change+0x101b/0x4ab0 net/sched/cls_flower.c:2341
      tc_new_tfilter+0x97c/0x2290 net/sched/cls_api.c:2310
      rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
      netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
      sock_sendmsg_nosec net/socket.c:722 [inline]
      sock_sendmsg+0xde/0x190 net/socket.c:745
      ____sys_sendmsg+0x334/0x900 net/socket.c:2504
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2558
      __sys_sendmmsg+0x18f/0x460 net/socket.c:2644
      __do_sys_sendmmsg net/socket.c:2673 [inline]
      __se_sys_sendmmsg net/socket.c:2670 [inline]
      __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2670
      
      Fixes: 08a0063d ("net/sched: flower: Move filter handle initialization earlier")
      Reported-by: syzbot+baabf3efa7c1e57d28b2@syzkaller.appspotmail.com
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      ila: do not generate empty messages in ila_xlat_nl_cmd_get_mapping() · 693aa2c0
      Eric Dumazet authored
      ila_xlat_nl_cmd_get_mapping() generates an empty skb,
      triggering a recent sanity check [1].

      Instead, return an error code so that user space
      receives it.
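
      The C sketch below shows the pattern. The return code starts as an
      error rather than 0, so a request that finds nothing returns that
      error instead of sending an empty message. struct reply,
      fill_mapping() and get_mapping() are hypothetical, and -ESRCH is an
      assumed error code chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reply buffer standing in for the netlink skb. */
struct reply {
	size_t len;
};

/* Hypothetical: append the attributes of a found mapping. */
static int fill_mapping(struct reply *msg)
{
	msg->len += 32;		/* arbitrary attribute payload size */
	return 0;
}

/* Reply to a GET request, or return an error when there is nothing to send. */
static int get_mapping(bool found, struct reply *msg, bool *sent)
{
	int ret = -ESRCH;	/* assumed error code; the point is "not 0" */

	msg->len = 0;
	*sent = false;
	if (found)
		ret = fill_mapping(msg);
	if (ret < 0)
		return ret;	/* drop msg; user space gets the error */

	assert(msg->len > 0);	/* never send an empty message */
	*sent = true;
	return 0;
}
```
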
      
      [1]
      skb_assert_len
      WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 skb_assert_len include/linux/skbuff.h:2527 [inline]
      WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
      Modules linked in:
      CPU: 0 PID: 5923 Comm: syz-executor269 Not tainted 6.2.0-syzkaller-18300-g2ebd1fbb946d #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
      pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : skb_assert_len include/linux/skbuff.h:2527 [inline]
      pc : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
      lr : skb_assert_len include/linux/skbuff.h:2527 [inline]
      lr : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
      sp : ffff80001e0d6c40
      x29: ffff80001e0d6e60 x28: dfff800000000000 x27: ffff0000c86328c0
      x26: dfff800000000000 x25: ffff0000c8632990 x24: ffff0000c8632a00
      x23: 0000000000000000 x22: 1fffe000190c6542 x21: ffff0000c8632a10
      x20: ffff0000c8632a00 x19: ffff80001856e000 x18: ffff80001e0d5fc0
      x17: 0000000000000000 x16: ffff80001235d16c x15: 0000000000000000
      x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000001
      x11: ff80800008353a30 x10: 0000000000000000 x9 : 21567eaf25bfb600
      x8 : 21567eaf25bfb600 x7 : 0000000000000001 x6 : 0000000000000001
      x5 : ffff80001e0d6558 x4 : ffff800015c74760 x3 : ffff800008596744
      x2 : 0000000000000001 x1 : 0000000100000000 x0 : 000000000000000e
      Call trace:
      skb_assert_len include/linux/skbuff.h:2527 [inline]
      __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
      dev_queue_xmit include/linux/netdevice.h:3033 [inline]
      __netlink_deliver_tap_skb net/netlink/af_netlink.c:307 [inline]
      __netlink_deliver_tap+0x45c/0x6f8 net/netlink/af_netlink.c:325
      netlink_deliver_tap+0xf4/0x174 net/netlink/af_netlink.c:338
      __netlink_sendskb net/netlink/af_netlink.c:1283 [inline]
      netlink_sendskb+0x6c/0x154 net/netlink/af_netlink.c:1292
      netlink_unicast+0x334/0x8d4 net/netlink/af_netlink.c:1380
      nlmsg_unicast include/net/netlink.h:1099 [inline]
      genlmsg_unicast include/net/genetlink.h:433 [inline]
      genlmsg_reply include/net/genetlink.h:443 [inline]
      ila_xlat_nl_cmd_get_mapping+0x620/0x7d0 net/ipv6/ila/ila_xlat.c:493
      genl_family_rcv_msg_doit net/netlink/genetlink.c:968 [inline]
      genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
      genl_rcv_msg+0x938/0xc1c net/netlink/genetlink.c:1065
      netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2574
      genl_rcv+0x38/0x50 net/netlink/genetlink.c:1076
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x800/0xae0 net/netlink/af_netlink.c:1942
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg net/socket.c:734 [inline]
      ____sys_sendmsg+0x558/0x844 net/socket.c:2479
      ___sys_sendmsg net/socket.c:2533 [inline]
      __sys_sendmsg+0x26c/0x33c net/socket.c:2562
      __do_sys_sendmsg net/socket.c:2571 [inline]
      __se_sys_sendmsg net/socket.c:2569 [inline]
      __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2569
      __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
      invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
      el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
      do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
      el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
      el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
      el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
      irq event stamp: 136484
      hardirqs last enabled at (136483): [<ffff800008350244>] __up_console_sem+0x60/0xb4 kernel/printk/printk.c:345
      hardirqs last disabled at (136484): [<ffff800012358d60>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:405
      softirqs last enabled at (136418): [<ffff800008020ea8>] softirq_handle_end kernel/softirq.c:414 [inline]
      softirqs last enabled at (136418): [<ffff800008020ea8>] __do_softirq+0xd4c/0xfa4 kernel/softirq.c:600
      softirqs last disabled at (136371): [<ffff80000802b4a4>] ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80
      ---[ end trace 0000000000000000 ]---
      skb len=0 headroom=0 headlen=0 tailroom=192
      mac=(0,0) net=(0,-1) trans=-1
      shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
      csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
      hash(0x0 sw=0 l4=0) proto=0x0010 pkttype=6 iif=0
      dev name=nlmon0 feat=0x0000000000005861
      
      Fixes: 7f00feaf ("ila: Add generic ILA translation facility")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Pedro Tammela
      net/sched: act_connmark: handle errno on tcf_idr_check_alloc · fb073904
      Pedro Tammela authored
      Smatch reports that 'ci' can be used uninitialized.
      The current code ignores the errno returned by tcf_idr_check_alloc(),
      which leads to incorrect use of 'ci'. Handle the errno properly.
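
      The standalone C model below shows the three-way result handling. It
      assumes the convention that 0 means a new index was reserved, 1 means
      an existing action was found, and a negative value is an errno. The
      caller must return on the errno before touching 'ci'.
      idr_check_alloc_stub(), the table and connmark_init() are hypothetical
      and only mimic that convention. 'ci' starts as NULL here only to keep
      the model warning-free.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ACTIONS 4

/* Hypothetical action object. */
struct action {
	int mark;
};

/* Hypothetical index table standing in for the action IDR. */
static struct action *table[MAX_ACTIONS];

/* Returns 0 if the index was reserved for a new action, 1 if an existing
 * action was found (stored in *a), or a negative errno on failure. */
static int idr_check_alloc_stub(unsigned int index, struct action **a)
{
	if (index >= MAX_ACTIONS)
		return -ENOSPC;
	if (table[index] != NULL) {
		*a = table[index];
		return 1;
	}
	return 0;
}

static int connmark_init(unsigned int index, struct action *fresh,
			 struct action **out)
{
	struct action *ci = NULL;
	int ret = idr_check_alloc_stub(index, &ci);

	if (ret < 0)
		return ret;	/* propagate errno; ci is not valid here */
	if (ret == 0) {
		table[index] = fresh;	/* create a new action */
		ci = fresh;
	}
	assert(ci != NULL);
	*out = ci;
	return ret;
}
```
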
      
      Fixes: 288864ef ("net/sched: act_connmark: transition to percpu stats and rcu")
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>