1. 19 Apr, 2023 3 commits
    • netfilter: conntrack: fix wrong ct->timeout value · 73db1b8f
      Tzung-Bi Shih authored
      (struct nf_conn)->timeout is an interval before the conntrack is
      confirmed.  Once confirmed, it becomes a timestamp.
      
      Two problems are observed with the timeout of an unconfirmed conntrack:
      - When set by calling ctnetlink_change_timeout(), `nfct_time_stamp`
        is wrongly added to `ct->timeout` twice.
      - When read by calling ctnetlink_dump_timeout(), `nfct_time_stamp`
        is wrongly subtracted.
      
      Call Trace:
       <TASK>
       dump_stack_lvl
       ctnetlink_dump_timeout
       __ctnetlink_glue_build
       ctnetlink_glue_build
       __nfqnl_enqueue_packet
       nf_queue
       nf_hook_slow
       ip_mc_output
       ? __pfx_ip_finish_output
       ip_send_skb
       ? __pfx_dst_output
       udp_send_skb
       udp_sendmsg
       ? __pfx_ip_generic_getfrag
       sock_sendmsg
      
      Separate the 2 cases in:
      - Setting `ct->timeout` in __nf_ct_set_timeout().
      - Getting `ct->timeout` in ctnetlink_dump_timeout().
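
The interval-versus-timestamp distinction can be sketched in userspace C. This is a simplified model under stated assumptions, not the kernel code: `struct fake_conn`, `fake_set_timeout()` and `fake_dump_timeout()` are hypothetical stand-ins, and the `now` parameter plays the role of `nfct_time_stamp`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nf_conn: only the fields needed here. */
struct fake_conn {
    bool confirmed;   /* models the IPS_CONFIRMED status bit */
    uint32_t timeout; /* interval if unconfirmed, timestamp if confirmed */
};

/* Store a timeout given as an interval; add `now` only once confirmed. */
static void fake_set_timeout(struct fake_conn *ct, uint32_t interval,
                             uint32_t now)
{
    assert(ct != NULL);
    ct->timeout = ct->confirmed ? now + interval : interval;
}

/* Report the remaining interval; subtract `now` only once confirmed. */
static uint32_t fake_dump_timeout(const struct fake_conn *ct, uint32_t now)
{
    int32_t left;

    assert(ct != NULL);
    if (!ct->confirmed)
        return ct->timeout;
    /* Signed difference so an expired entry reports 0, not a huge value. */
    left = (int32_t)(ct->timeout - now);
    return left > 0 ? (uint32_t)left : 0;
}
```

In this model an unconfirmed entry round-trips its interval unchanged, while a confirmed entry stores `now + interval` and reports the time left.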
      
      Pablo appends:
      
      Update ctnetlink to set up the timeout _after_ the IPS_CONFIRMED flag is
      set, otherwise conntrack creation via ctnetlink breaks.
      
      Note that the problem described in this patch has existed since the
      introduction of the nfnetlink_queue conntrack support; select a
      sufficiently old Fixes: tag so that -stable kernels pick up this fix.
      
      Fixes: a4b4766c ("netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info")
      Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert() · 2cdaa3ee
      Pablo Neira Ayuso authored
      e6d57e9f ("netfilter: conntrack: fix rmmod double-free race")
      consolidates IPS_CONFIRMED bit set in nf_conntrack_hash_check_insert().
      However, this breaks ctnetlink:
      
       # conntrack -I -p tcp --timeout 123 --src 1.2.3.4 --dst 5.6.7.8 --state ESTABLISHED --sport 1 --dport 4 -u SEEN_REPLY
       conntrack v1.4.6 (conntrack-tools): Operation failed: Device or resource busy
      
      This is a partial revert of the aforementioned commit to restore
      IPS_CONFIRMED.
      
      Fixes: e6d57e9f ("netfilter: conntrack: fix rmmod double-free race")
      Reported-by: Stéphane Graber <stgraber@stgraber.org>
      Tested-by: Stéphane Graber <stgraber@stgraber.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 92e8c732
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Unbreak br_netfilter physdev match support, from Florian Westphal.
      
      2) Use GFP_KERNEL_ACCOUNT for stateful/policy objects, from Chen Aotian.
      
      3) Use IS_ENABLED() in nf_reset_trace(), from Florian Westphal.
      
      4) Fix validation of catch-all set element.
      
      5) Tighten requirements for catch-all set elements.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
        netfilter: nf_tables: validate catch-all set elements
        netfilter: nf_tables: fix ifdef to also consider nf_tables=m
        netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT
        netfilter: br_netfilter: fix recent physdev match breakage
      ====================
      
      Link: https://lore.kernel.org/r/20230418145048.67270-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 18 Apr, 2023 8 commits
    • mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next() · c0e73276
      Nikita Zhandarovich authored
      Function mlxfw_mfa2_tlv_multi_get() returns NULL if 'tlv' in
      question does not pass checks in mlxfw_mfa2_tlv_payload_get(). This
      behaviour may lead to NULL pointer dereference in 'multi->total_len'.
      Fix this issue by testing mlxfw_mfa2_tlv_multi_get()'s return value
      against NULL.
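
The pattern described above, testing a getter's return value before dereferencing it, can be sketched as follows. The names `struct fake_multi`, `fake_multi_get()` and `fake_total_len()` are hypothetical and only loosely mirror the mlxfw functions; the `valid` flag stands in for the payload checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the multi TLV; only total_len matters here. */
struct fake_multi {
    uint16_t total_len;
};

/* Hypothetical getter: returns NULL when the TLV fails validation. */
static const struct fake_multi *fake_multi_get(const struct fake_multi *tlv,
                                               int valid)
{
    return valid ? tlv : NULL;
}

/* Returns 0 and stores the length on success, -1 if the TLV is invalid. */
static int fake_total_len(const struct fake_multi *tlv, int valid,
                          uint16_t *len)
{
    const struct fake_multi *multi;

    assert(len != NULL);
    multi = fake_multi_get(tlv, valid);
    if (!multi) /* the check that prevents the NULL dereference */
        return -1;
    *len = multi->total_len;
    return 0;
}
```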
      
      Found by Linux Verification Center (linuxtesting.org) with static
      analysis tool SVACE.
      
      Fixes: 410ed13c ("Add the mlxfw module for Mellanox firmware flash process")
      Co-developed-by: Natalia Petrova <n.petrova@fintech.ru>
      Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20230417120718.52325-1-n.zhandarovich@fintech.ru
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'bnxt_en-bug-fixes' · 28e63d01
      Paolo Abeni authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes
      
      This small series contains 2 fixes.  The first one fixes the PTP
      initialization logic on older chips to avoid logging a warning.  The
      second one fixes a potential NULL pointer dereference in the driver's
      aux bus unload path.
      ====================
      
      Link: https://lore.kernel.org/r/20230417065819.122055-1-michael.chan@broadcom.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • bnxt_en: Fix a possible NULL pointer dereference in unload path · 4f4e54b1
      Kalesh AP authored
      In the driver unload path, the driver currently checks the valid
      BNXT_FLAG_ROCE_CAP flag in bnxt_rdma_aux_device_uninit() before
      proceeding.  This is flawed because the flag may not be set initially
      during driver load.  It may be set later after the NVRAM setting is
      changed followed by a firmware reset.  Relying on the
      BNXT_FLAG_ROCE_CAP flag may crash in bnxt_rdma_aux_device_uninit() if
      the aux device was never initialized:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      PGD 8ae6aa067 P4D 0
      Oops: 0000 [#1] SMP NOPTI
      CPU: 39 PID: 42558 Comm: rmmod Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-348.el8.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021
      RIP: 0010:device_del+0x1b/0x410
      Code: 89 a5 50 03 00 00 4c 89 a5 58 03 00 00 eb 89 0f 1f 44 00 00 41 56 41 55 41 54 4c 8d a7 80 00 00 00 55 53 48 89 fb 48 83 ec 18 <48> 8b 2f 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0
      RSP: 0018:ff7f82bf469a7dc8 EFLAGS: 00010292
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000000
      RBP: ff31b7cd114b0ac0 R08: 0000000000000000 R09: ffffffff935c3400
      R10: ff31b7cd45bc3440 R11: 0000000000000001 R12: 0000000000000080
      R13: ffffffffc1069f40 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007fc9903ce740(0000) GS:ff31b7d4ffac0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000992fee004 CR4: 0000000000773ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       bnxt_rdma_aux_device_uninit+0x1f/0x30 [bnxt_en]
       bnxt_remove_one+0x2f/0x1f0 [bnxt_en]
       pci_device_remove+0x3b/0xc0
       device_release_driver_internal+0x103/0x1f0
       driver_detach+0x54/0x88
       bus_remove_driver+0x77/0xc9
       pci_unregister_driver+0x2d/0xb0
       bnxt_exit+0x16/0x2c [bnxt_en]
       __x64_sys_delete_module+0x139/0x280
       do_syscall_64+0x5b/0x1a0
       entry_SYSCALL_64_after_hwframe+0x65/0xca
      RIP: 0033:0x7fc98f3af71b
      
      Fix this by modifying the check inside bnxt_rdma_aux_device_uninit()
      to check for bp->aux_priv instead.  We also need to make some changes
      in bnxt_rdma_aux_device_init() to make sure that bp->aux_priv is set
      only when the aux device is fully initialized.
      
      Fixes: d80d88b0 ("bnxt_en: Add auxiliary driver support")
      Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • bnxt_en: Do not initialize PTP on older P3/P4 chips · e8b51a1a
      Michael Chan authored
      The driver does not support PTP on these older chips and it is assuming
      that firmware on these older chips will not return the
      PORT_MAC_PTP_QCFG_RESP_FLAGS_HWRM_ACCESS flag in __bnxt_hwrm_ptp_qcfg(),
      causing the function to abort quietly.
      
      But newer firmware now sets this flag and so __bnxt_hwrm_ptp_qcfg()
      will proceed further.  Eventually it will fail in bnxt_ptp_init() ->
      bnxt_map_ptp_regs() because there is no code to support the older chips.
      The driver will then complain:
      
      "PTP initialization failed.\n"
      
      Fix it so that we abort quietly earlier without going through the
      unnecessary steps and alarming the user with the warning log.
      
      Fixes: ae5c42f0 ("bnxt_en: Get PTP hardware capability from firmware")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements · d4eb7e39
      Pablo Neira Ayuso authored
      If NFT_SET_ELEM_CATCHALL is set, userspace must not provide a set element
      key; bail out with -EINVAL if this requirement is not met.
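
A minimal sketch of such an attribute check, under stated assumptions: `struct fake_elem_attrs`, `FAKE_ELEM_CATCHALL` and `fake_check_elem_attrs()` are hypothetical names, the flag value is illustrative, and rejecting the converse case (no catch-all flag and no key) is an assumption about the intended symmetry rather than something the message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_ELEM_CATCHALL 0x2u /* illustrative flag bit, an assumption */

/* Hypothetical summary of what userspace sent for one set element. */
struct fake_elem_attrs {
    unsigned int flags; /* element flags */
    bool has_key;       /* true if a key attribute was provided */
};

/* Returns 0 if flags and key presence are consistent, -EINVAL otherwise. */
static int fake_check_elem_attrs(const struct fake_elem_attrs *attrs)
{
    bool catchall;

    assert(attrs != NULL);
    catchall = (attrs->flags & FAKE_ELEM_CATCHALL) != 0;

    if (catchall && attrs->has_key)   /* catch-all must carry no key */
        return -EINVAL;
    if (!catchall && !attrs->has_key) /* assumed: regular elements need one */
        return -EINVAL;
    return 0;
}
```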
      
      Fixes: aaa31047 ("netfilter: nftables: add catch-all set element support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • cxgb4: fix use after free bugs caused by circular dependency problem · e50b9b9e
      Duoming Zhou authored
      The flower_stats_timer can schedule flower_stats_work and
      flower_stats_work can also arm the flower_stats_timer. The
      process is shown below:
      
      ----------- timer schedules work ------------
      ch_flower_stats_cb() //timer handler
        schedule_work(&adap->flower_stats_work);
      
      ----------- work arms timer ------------
      ch_flower_stats_handler() //workqueue callback function
        mod_timer(&adap->flower_stats_timer, ...);
      
      When the cxgb4 device is detaching, the timer and workqueue
      could still be rearmed. The process is shown below:
      
        (cleanup routine)           | (timer and workqueue routine)
      remove_one()                  |
        free_some_resources()       | ch_flower_stats_cb() //timer
          cxgb4_cleanup_tc_flower() |   schedule_work()
            del_timer_sync()        |
                                    | ch_flower_stats_handler() //workqueue
                                    |   mod_timer()
            cancel_work_sync()      |
        kfree(adapter) //FREE       | ch_flower_stats_cb() //timer
                                    |   adap->flower_stats_work //USE
      
      This patch changes del_timer_sync() to timer_shutdown_sync(),
      which could prevent rearming of the timer from the workqueue.
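
The difference between the two calls can be illustrated with a small userspace model. This is only a sketch: `struct fake_timer` and the `fake_*()` helpers are hypothetical and do not reproduce the kernel's timer implementation. They model only the property relied on here, namely that a timer that has been shut down ignores later attempts to rearm it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a timer; not struct timer_list. */
struct fake_timer {
    bool armed;    /* timer is pending */
    bool shutdown; /* once set, the timer can never be armed again */
};

/* Models mod_timer(): arming is silently refused after shutdown. */
static bool fake_mod_timer(struct fake_timer *t)
{
    assert(t != NULL);
    if (t->shutdown)
        return false;
    t->armed = true;
    return true;
}

/* Models del_timer_sync(): cancels the timer but allows later rearming. */
static void fake_del_timer_sync(struct fake_timer *t)
{
    assert(t != NULL);
    t->armed = false;
}

/* Models timer_shutdown_sync(): cancels the timer and blocks rearming. */
static void fake_timer_shutdown_sync(struct fake_timer *t)
{
    assert(t != NULL);
    t->armed = false;
    t->shutdown = true;
}
```

In the model, a work handler calling `fake_mod_timer()` after `fake_del_timer_sync()` succeeds in rearming the timer, whereas after `fake_timer_shutdown_sync()` the same call is a no-op.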
      
      Fixes: e0f911c8 ("cxgb4: fetch stats for offloaded tc flower flows")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Link: https://lore.kernel.org/r/20230415081227.7463-1-duoming@zju.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • netfilter: nf_tables: validate catch-all set elements · d46fc894
      Pablo Neira Ayuso authored
      A catch-all set element might jump/goto to a chain that uses expressions
      that require validation.
      
      Fixes: aaa31047 ("netfilter: nftables: add catch-all set element support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ice: document RDMA devlink parameters · 1a2bd3bd
      Jacob Keller authored
      Commit e523af4e ("net/ice: Add support for enable_iwarp and enable_roce
      devlink param") added support for the enable_roce and enable_iwarp
      parameters in the ice driver. It didn't document these parameters in the
      ice devlink documentation file. Add this documentation, including a note
      about the mutual exclusion between the two modes.
      
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230414162614.571861-1-jacob.e.keller@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 17 Apr, 2023 4 commits
    • netfilter: nf_tables: fix ifdef to also consider nf_tables=m · c55c0e91
      Florian Westphal authored
      nftables can be built as a module, so fix the preprocessor conditional
      accordingly.
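
Why a plain `#ifdef` misses a modular build can be shown with a small sketch. It assumes the usual Kconfig convention that an option set to `=m` defines the `_MODULE` variant of the symbol and leaves the plain symbol undefined. `CONFIG_FAKE_FEATURE` and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumption for illustration: the feature is configured as a module (=m),
 * so only the _MODULE symbol is defined. */
#define CONFIG_FAKE_FEATURE_MODULE 1

/* A plain #ifdef only sees the built-in (=y) case. */
#ifdef CONFIG_FAKE_FEATURE
#define FAKE_IFDEF_CHECK 1
#else
#define FAKE_IFDEF_CHECK 0
#endif

/* Checking both symbols covers =y and =m, as IS_ENABLED() does. */
#if defined(CONFIG_FAKE_FEATURE) || defined(CONFIG_FAKE_FEATURE_MODULE)
#define FAKE_ENABLED_CHECK 1
#else
#define FAKE_ENABLED_CHECK 0
#endif

static int feature_seen_by_ifdef(void)
{
    return FAKE_IFDEF_CHECK;
}

static int feature_seen_by_enabled_check(void)
{
    return FAKE_ENABLED_CHECK;
}

/* Usage example: with =m, only the second check sees the feature. */
static void fake_usage(void)
{
    assert(!feature_seen_by_ifdef());
    assert(feature_seen_by_enabled_check());
}
```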
      
      Fixes: 478b360a ("netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n")
      Reported-by: Florian Fainelli <f.fainelli@gmail.com>
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net/sched: clear actions pointer in miss cookie init fail · 338469d6
      Pedro Tammela authored
      Palash reports a UAF when using a modified version of syzkaller[1].
      
      When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()',
      a call to 'tcf_exts_destroy()' is made to free up the tcf_exts
      resources.
      In flower, '__fl_put()' is called when 'tcf_exts_init_ex()' fails;
      it then calls 'tcf_exts_destroy()' again, which triggers a UAF since the
      already freed tcf_exts action pointer is lingering in the struct.
      
      Before the offending patch, this was not an issue since there was no
      case where the tcf_exts action pointer could linger. Therefore, restore
      the old semantic by clearing the action pointer in case of a failure to
      initialize the miss_cookie.
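
The "clear the pointer on failure" idea can be sketched in userspace C. This is a simplified model, not the cls_api code: `struct fake_exts`, `fake_exts_init()` and `fake_exts_destroy()` are hypothetical, and the `cookie_fails` parameter simulates a failed miss-cookie allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tcf_exts. */
struct fake_exts {
    int *actions; /* owned allocation; NULL when nothing is held */
};

/* Safe to call repeatedly: free(NULL) is a no-op and the pointer is reset. */
static void fake_exts_destroy(struct fake_exts *e)
{
    assert(e != NULL);
    free(e->actions);
    e->actions = NULL;
}

/* Returns 0 on success, -1 on failure with no dangling pointer left. */
static int fake_exts_init(struct fake_exts *e, int cookie_fails)
{
    assert(e != NULL);
    e->actions = calloc(4, sizeof(*e->actions));
    if (!e->actions)
        return -1;
    if (cookie_fails) {
        free(e->actions);
        e->actions = NULL; /* without this, a later destroy double-frees */
        return -1;
    }
    return 0;
}
```

With the pointer cleared, a caller that runs its generic teardown path after a failed init calls `fake_exts_destroy()` on a NULL pointer, which is harmless.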
      
      [1] https://github.com/cmu-pasta/linux-kernel-enriched-corpus
      
      v1->v2: Fix compilation on configs without tc actions (kernel test robot)
      
      Fixes: 80cd22c3 ("net/sched: cls_api: Support hardware miss to tc action")
      Reported-by: Palash Oswal <oswalpalash@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: Fix use-after-free due to selftest_work · a80bb8e7
      Ding Hui authored
      The use-after-free scenario is as follows:
      
      When the NIC is down and the user sets a MAC address or VLAN tag on a VF,
      xxx_set_vf_mac() or xxx_set_vf_vlan() invokes efx_net_stop()
      and efx_net_open(). Since netif_running() is false, the port is not
      started and port_enabled stays false, but selftest_work is still
      scheduled in efx_net_open().
      
      If the device is removed before selftest_work runs, efx_stop_port()
      is not called since the NIC is down, and efx is then freed.
      A UAF in run_timer_softirq() soon follows:
      
      [ 1178.907941] ==================================================================
      [ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90
      [ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0
      [ 1178.907950]
      [ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
      [ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022
      [ 1178.907955] Call Trace:
      [ 1178.907956]  <IRQ>
      [ 1178.907960]  dump_stack+0x71/0xab
      [ 1178.907963]  print_address_description+0x6b/0x290
      [ 1178.907965]  ? run_timer_softirq+0xdea/0xe90
      [ 1178.907967]  kasan_report+0x14a/0x2b0
      [ 1178.907968]  run_timer_softirq+0xdea/0xe90
      [ 1178.907971]  ? init_timer_key+0x170/0x170
      [ 1178.907973]  ? hrtimer_cancel+0x20/0x20
      [ 1178.907976]  ? sched_clock+0x5/0x10
      [ 1178.907978]  ? sched_clock_cpu+0x18/0x170
      [ 1178.907981]  __do_softirq+0x1c8/0x5fa
      [ 1178.907985]  irq_exit+0x213/0x240
      [ 1178.907987]  smp_apic_timer_interrupt+0xd0/0x330
      [ 1178.907989]  apic_timer_interrupt+0xf/0x20
      [ 1178.907990]  </IRQ>
      [ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370
      
      If the NIC is not actually brought up, there is no need to schedule
      selftest_work, so move the efx_selftest_async_start() call
      into efx_start_all(); the work is then canceled when the NIC is brought down.
      
      Fixes: dd40781e ("sfc: Run event/IRQ self-test asynchronously when interface is brought up")
      Fixes: e340be92 ("sfc: add ndo_set_vf_mac() function for EF10")
      Debugged-by: Huang Cun <huangcun@sangfor.com.cn>
      Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
      Suggested-by: Martin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: bugfix overflow inside xdp_linearize_page() · 853618d5
      Xuan Zhuo authored
      Here we copy the data from the original buf to the new page, but we
      do not check whether the copy may overflow.
      
      If the size received (including vnethdr) is greater than 3840
      (PAGE_SIZE - VIRTIO_XDP_HEADROOM), the memcpy will overflow.
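
The missing bounds check can be sketched as below. This is a simplified userspace model: `fake_linearize()` and the `FAKE_*` constants are hypothetical, with the page size and headroom chosen to reproduce the 3840-byte limit quoted above. It ignores any tailroom the real function may also need to reserve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_PAGE_SIZE    4096u
#define FAKE_XDP_HEADROOM 256u /* 4096 - 256 = 3840 bytes of usable space */

/* Copy len bytes of src into page after the headroom.
 * Returns 0 on success, -1 if the data would run past the page. */
static int fake_linearize(unsigned char *page, const unsigned char *src,
                          size_t len)
{
    assert(page != NULL && src != NULL);
    if (len > FAKE_PAGE_SIZE - FAKE_XDP_HEADROOM)
        return -1; /* refuse instead of overflowing the page */
    memcpy(page + FAKE_XDP_HEADROOM, src, len);
    return 0;
}
```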
      
      This is entirely possible when the MTU is large, such
      as 4096. In our test environment, this causes a crash. Since the crash is
      caused by the overwritten memory, the trace is meaningless, so I do not include it.
      
      Fixes: 72979a6c ("virtio_net: xdp, add slowpath case for non contiguous buffers")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 14 Apr, 2023 1 commit
    • net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg · 30379334
      Gwangun Jung authored
      If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device.
      The MTU of the loopback device can be set up to 2^31-1.
      As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX.
      
      Due to the invalid lmax value, an index is generated that exceeds the QFQ_MAX_INDEX(=24) value, causing out-of-bounds read/write errors.
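
One way to guard against this, sketched under assumptions and not necessarily how the patch does it, is to validate lmax regardless of whether it came from the netlink attribute or from the device MTU. `fake_pick_lmax()` is hypothetical, and the bounds `FAKE_QFQ_MIN_LMAX` and `FAKE_QFQ_MAX_LMAX` are assumed values chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_QFQ_MIN_LMAX 512u       /* assumed lower bound */
#define FAKE_QFQ_MAX_LMAX (1u << 16) /* assumed upper bound */

/* Choose lmax from the attribute if present, else from the device MTU,
 * then range-check it in both cases.
 * Returns 0 and stores the value on success, -EINVAL if out of range. */
static int fake_pick_lmax(bool has_attr, uint32_t attr_lmax,
                          uint32_t dev_mtu, uint32_t *out)
{
    uint32_t lmax;

    assert(out != NULL);
    lmax = has_attr ? attr_lmax : dev_mtu;
    if (lmax < FAKE_QFQ_MIN_LMAX || lmax > FAKE_QFQ_MAX_LMAX)
        return -EINVAL;
    *out = lmax;
    return 0;
}
```

With this shape, an MTU of 2^31-1 on the fallback path is rejected the same way an oversized attribute value would be.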
      
      The following report shows an out-of-bounds access:
      
      [   84.582666] BUG: KASAN: slab-out-of-bounds in qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313)
      [   84.583267] Read of size 4 at addr ffff88810f676948 by task ping/301
      [   84.583686]
      [   84.583797] CPU: 3 PID: 301 Comm: ping Not tainted 6.3.0-rc5 #1
      [   84.584164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      [   84.584644] Call Trace:
      [   84.584787]  <TASK>
      [   84.584906] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
      [   84.585108] print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
      [   84.585570] kasan_report (mm/kasan/report.c:538)
      [   84.585988] qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313)
      [   84.586599] qfq_enqueue (net/sched/sch_qfq.c:1255)
      [   84.587607] dev_qdisc_enqueue (net/core/dev.c:3776)
      [   84.587749] __dev_queue_xmit (./include/net/sch_generic.h:186 net/core/dev.c:3865 net/core/dev.c:4212)
      [   84.588763] ip_finish_output2 (./include/net/neighbour.h:546 net/ipv4/ip_output.c:228)
      [   84.589460] ip_output (net/ipv4/ip_output.c:430)
      [   84.590132] ip_push_pending_frames (./include/net/dst.h:444 net/ipv4/ip_output.c:126 net/ipv4/ip_output.c:1586 net/ipv4/ip_output.c:1606)
      [   84.590285] raw_sendmsg (net/ipv4/raw.c:649)
      [   84.591960] sock_sendmsg (net/socket.c:724 net/socket.c:747)
      [   84.592084] __sys_sendto (net/socket.c:2142)
      [   84.593306] __x64_sys_sendto (net/socket.c:2150)
      [   84.593779] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      [   84.593902] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      [   84.594070] RIP: 0033:0x7fe568032066
      [   84.594192] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c09[ 84.594796] RSP: 002b:00007ffce388b4e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      
      Code starting with the faulting instruction
      ===========================================
      [   84.595047] RAX: ffffffffffffffda RBX: 00007ffce388cc70 RCX: 00007fe568032066
      [   84.595281] RDX: 0000000000000040 RSI: 00005605fdad6d10 RDI: 0000000000000003
      [   84.595515] RBP: 00005605fdad6d10 R08: 00007ffce388eeec R09: 0000000000000010
      [   84.595749] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
      [   84.595984] R13: 00007ffce388cc30 R14: 00007ffce388b4f0 R15: 0000001d00000001
      [   84.596218]  </TASK>
      [   84.596295]
      [   84.596351] Allocated by task 291:
      [   84.596467] kasan_save_stack (mm/kasan/common.c:46)
      [   84.596597] kasan_set_track (mm/kasan/common.c:52)
      [   84.596725] __kasan_kmalloc (mm/kasan/common.c:384)
      [   84.596852] __kmalloc_node (./include/linux/kasan.h:196 mm/slab_common.c:967 mm/slab_common.c:974)
      [   84.596979] qdisc_alloc (./include/linux/slab.h:610 ./include/linux/slab.h:731 net/sched/sch_generic.c:938)
      [   84.597100] qdisc_create (net/sched/sch_api.c:1244)
      [   84.597222] tc_modify_qdisc (net/sched/sch_api.c:1680)
      [   84.597357] rtnetlink_rcv_msg (net/core/rtnetlink.c:6174)
      [   84.597495] netlink_rcv_skb (net/netlink/af_netlink.c:2574)
      [   84.597627] netlink_unicast (net/netlink/af_netlink.c:1340 net/netlink/af_netlink.c:1365)
      [   84.597759] netlink_sendmsg (net/netlink/af_netlink.c:1942)
      [   84.597891] sock_sendmsg (net/socket.c:724 net/socket.c:747)
      [   84.598016] ____sys_sendmsg (net/socket.c:2501)
      [   84.598147] ___sys_sendmsg (net/socket.c:2557)
      [   84.598275] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2586)
      [   84.598399] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      [   84.598520] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      [   84.598688]
      [   84.598744] The buggy address belongs to the object at ffff88810f674000
      [   84.598744]  which belongs to the cache kmalloc-8k of size 8192
      [   84.599135] The buggy address is located 2664 bytes to the right of
      [   84.599135]  allocated 7904-byte region [ffff88810f674000, ffff88810f675ee0)
      [   84.599544]
      [   84.599598] The buggy address belongs to the physical page:
      [   84.599777] page:00000000e638567f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10f670
      [   84.600074] head:00000000e638567f order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      [   84.600330] flags: 0x200000000010200(slab|head|node=0|zone=2)
      [   84.600517] raw: 0200000000010200 ffff888100043180 dead000000000122 0000000000000000
      [   84.600764] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000
      [   84.601009] page dumped because: kasan: bad access detected
      [   84.601187]
      [   84.601241] Memory state around the buggy address:
      [   84.601396]  ffff88810f676800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   84.601620]  ffff88810f676880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   84.601845] >ffff88810f676900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   84.602069]                                               ^
      [   84.602243]  ffff88810f676980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   84.602468]  ffff88810f676a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   84.602693] ==================================================================
      [   84.602924] Disabling lock debugging due to kernel taint
      
      Fixes: 3015f3d2 ("pkt_sched: enable QFQ to support TSO/GSO")
      Reported-by: Gwangun Jung <exsociety@gmail.com>
      Signed-off-by: Gwangun Jung <exsociety@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 13 Apr, 2023 24 commits
    • Merge tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 829cca4d
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf, and bluetooth.
      
        Not all that quiet given spring celebrations, but "current" fixes are
        thinning out, which is encouraging. One outstanding regression in the
        mlx5 driver when using old FW, not blocking but we're pushing for a
        fix.
      
        Current release - new code bugs:
      
         - eth: enetc: workaround for unresponsive pMAC after receiving
           express traffic
      
        Previous releases - regressions:
      
         - rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
           pid/seq fields 0 for backward compatibility
      
        Previous releases - always broken:
      
         - sctp: fix a potential overflow in sctp_ifwdtsn_skip
      
         - mptcp:
            - use mptcp_schedule_work instead of open-coding it and make the
              worker check stricter, to avoid scheduling work on closed
              sockets
            - fix NULL pointer dereference on fastopen early fallback
      
         - skbuff: fix memory corruption due to a race between skb coalescing
           and releasing clones confusing page_pool reference counting
      
         - bonding: fix neighbor solicitation validation on backup slaves
      
         - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp
      
         - bpf: arm64: fixed a BTI error on returning to patched function
      
         - openvswitch: fix race on port output leading to inf loop
      
         - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
           returning a different errno than expected
      
         - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove
      
         - Bluetooth: fix printing errors if LE Connection times out
      
         - Bluetooth: assorted UaF, deadlock and data race fixes
      
         - eth: macb: fix memory corruption in extended buffer descriptor mode
      
        Misc:
      
         - adjust the XDP Rx flow hash API to also include the protocol layers
           over which the hash was computed"
      
      * tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
        selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
        mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
        veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
        mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
        xdp: rss hash types representation
        selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
        skbuff: Fix a race between coalescing and releasing SKBs
        net: macb: fix a memory corruption in extended buffer descriptor mode
        selftests: add the missing CONFIG_IP_SCTP in net config
        udp6: fix potential access to stale information
        selftests: openvswitch: adjust datapath NL message declaration
        selftests: mptcp: userspace pm: uniform verify events
        mptcp: fix NULL pointer dereference on fastopen early fallback
        mptcp: stricter state check in mptcp_worker
        mptcp: use mptcp_schedule_work instead of open-coding it
        net: enetc: workaround for unresponsive pMAC after receiving express traffic
        sctp: fix a potential overflow in sctp_ifwdtsn_skip
        net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
        rtnetlink: Restore RTM_NEW/DELLINK notification behavior
        net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
        ...
    • Merge tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 4413ad01
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fix interaction between fw_devlink and DT overlays causing devices to
         not be probed
      
       - Fix the compatible string for loongson,cpu-interrupt-controller
      
      * tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        treewide: Fix probing of devices in DT overlays
        dt-bindings: interrupt-controller: loongarch: Fix mismatched compatible
    • Merge tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 531f27ad
      Linus Torvalds authored
      Pull pin control fix from Linus Walleij:
       "This is just a revert of the AMD fix, because the fix broke some
        laptops. We are working on a proper solution"
      
      * tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        Revert "pinctrl: amd: Disable and mask interrupts on resume"
    • Merge tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm · f1be7b6c
      Linus Torvalds authored
      Pull drm fixes from Daniel Vetter:
      
       - two fbcon regressions
      
       - amdgpu: dp mst, smu13
      
       - i915: dual link dsi for tgl+
      
       - armada, nouveau, drm/sched, fbmem
      
      * tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm:
        fbcon: set_con2fb_map needs to set con2fb_map!
        fbcon: Fix error paths in set_con2fb_map
        drm/amd/pm: correct the pcie link state check for SMU13
        drm/amd/pm: correct SMU13.0.7 max shader clock reporting
        drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings
        drm/amd/display: Pass the right info to drm_dp_remove_payload
        drm/armada: Fix a potential double free in an error handling path
        fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
        drm/nouveau/fb: add missing sysmen flush callbacks
        drm/i915/dsi: fix DSS CTL register offsets for TGL+
        drm/scheduler: Fix UAF race in drm_sched_entity_push_job()
      f1be7b6c
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d0f89c4c
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-04-13
      
      We've added 6 non-merge commits during the last 1 day(s) which contain
      a total of 14 files changed, 205 insertions(+), 38 deletions(-).
      
      The main changes are:
      
      1) One late straggler fix on the XDP hints side which fixes
         bpf_xdp_metadata_rx_hash kfunc API before the release goes out
         in order to provide information on the RSS hash type,
         from Jesper Dangaard Brouer.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
        mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
        veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
        mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
        xdp: rss hash types representation
        selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
      ====================
      
      Link: https://lore.kernel.org/r/20230413192939.10202-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d0f89c4c
    • Daniel Vetter's avatar
      Merge tag 'drm-misc-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · cab29322
      Daniel Vetter authored
      Short summary of fixes pull:
      
       * armada: Fix double free
       * fb: Clear FB_ACTIVATE_KD_TEXT in ioctl
       * nouveau: Add missing callbacks
       * scheduler: Fix use-after-free error
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      From: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230413184233.GA8148@linux-uq9g
      cab29322
    • Alexei Starovoitov's avatar
      Merge branch 'XDP-hints: change RX-hash kfunc bpf_xdp_metadata_rx_hash' · b65ef48c
      Alexei Starovoitov authored
      Jesper Dangaard Brouer says:
      
      ====================
      
      Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value,
      but doesn't provide information on the RSS hash type (part of 6.3-rc).
      
      This patchset proposes changing the function call signature by adding
      a pointer value argument that provides the RSS hash type.
      
      The patchset also removes all bpf_printk calls from the xdp_hw_metadata
      program that we expect driver developers to use. Instead, counters are
      introduced for relaying e.g. skip and fail info.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      b65ef48c
    • Jesper Dangaard Brouer's avatar
      0f26b74e
    • Jesper Dangaard Brouer's avatar
      mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type · 9123397a
      Jesper Dangaard Brouer authored
      Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
      via matching individual Completion Queue Entry (CQE) status bits.
      
      Fixes: ab46182d ("net/mlx4_en: Support RX XDP metadata")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/168132893562.340624.12779118462402031248.stgit@firesoul
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      9123397a
    • Jesper Dangaard Brouer's avatar
      veth: bpf_xdp_metadata_rx_hash add xdp rss hash type · 96b1a098
      Jesper Dangaard Brouer authored
      Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type.
      
      The veth driver currently only supports XDP-hints based on the SKB code path.
      The SKB has lost information about the RSS hash type by compressing
      the information down to a single bitfield, skb->l4_hash, that only knows
      whether this was an L4 hash value.
      
      In preparation for veth, the xdp_rss_hash_type has an L4 indication
      bit that allows us to return a meaningful L4 indication when working
      with SKB-based packets.
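
A minimal sketch of how a single L4 flag can still yield a useful type indication. All names here (`sketch_skb`, `SKETCH_RSS_TYPE_L4_ANY`, `sketch_rx_hash`) are hypothetical stand-ins for illustration and are not the kernel's definitions; only the idea of mapping `l4_hash` to an L4 indication comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type values; the real enum is defined by the XDP core. */
enum sketch_rss_type {
	SKETCH_RSS_TYPE_NONE   = 0,
	SKETCH_RSS_TYPE_L4_ANY = 1 << 0, /* "some L4 hash", protocol unknown */
};

/* Hypothetical stand-in for the two SKB fields that matter here. */
struct sketch_skb {
	uint32_t hash;
	bool l4_hash; /* the only hash-type information the SKB retains */
};

/* Report the hash plus the best type indication the SKB can still give. */
static int sketch_rx_hash(const struct sketch_skb *skb, uint32_t *hash,
			  enum sketch_rss_type *rss_type)
{
	*hash = skb->hash;
	*rss_type = skb->l4_hash ? SKETCH_RSS_TYPE_L4_ANY
				 : SKETCH_RSS_TYPE_NONE;
	return 0;
}
```

The caller cannot learn whether the hash covered TCP or UDP, only that an L4 header took part in it, which is exactly the information the SKB still carries.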
      
      Fixes: 306531f0 ("veth: Support RX XDP metadata")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/168132893055.340624.16209448340644513469.stgit@firesoul
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      96b1a098
    • Jesper Dangaard Brouer's avatar
      mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type · 67f245c2
      Jesper Dangaard Brouer authored
      Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
      via mapping table.
      
      The mlx5 hardware can also identify and RSS hash IPSEC.  This indicates
      that the hash includes the SPI (Security Parameters Index) as part of the
      IPSEC hash.
      
      Extend xdp core enum xdp_rss_hash_type with IPSEC hash type.
      
      Fixes: bc8d405b ("net/mlx5e: Support RX XDP metadata")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      67f245c2
    • Jesper Dangaard Brouer's avatar
      xdp: rss hash types representation · 0cd917a4
      Jesper Dangaard Brouer authored
      The RSS hash type specifies what portion of packet data NIC hardware used
      when calculating RSS hash value. The RSS types are focused on Internet
      traffic protocols at OSI layers L3 and L4. L2 traffic (e.g. ARP) often gets
      hash value zero and no RSS type. L3 is focused on IPv4 vs. IPv6, and L4
      primarily on TCP vs. UDP, but some hardware supports SCTP.
      
      Hardware RSS types are encoded differently for each hardware NIC. Most
      hardware represents the RSS hash type as a number. Determining L3 vs. L4 often
      requires a mapping table, as there often isn't a pattern or sorting
      according to OSI layer.
      
      The patch introduces an XDP RSS hash type (enum xdp_rss_hash_type) that
      contains both BITs for the L3/L4 types and combinations to be used by
      drivers for their mapping tables. The enum xdp_rss_type_bits gets exposed
      to BPF via BTF, and it is up to the BPF programmer to match using these
      defines.
      
      This proposal changes the kfunc API bpf_xdp_metadata_rx_hash() by adding
      a pointer value argument for providing the RSS hash type.
      Change the signature for all xmo_rx_hash calls in drivers to make it compile.
      
      The RSS type implementations for each driver come as separate patches.
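
The split between per-layer bits, combined types, and a driver mapping table can be sketched as follows. The bit positions, the `SK_*` names, and the hardware codes in `sketch_hw_map` are all assumptions made up for this illustration; the real enum values and each driver's table are defined by the patches themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-layer bits (the real bit layout may differ). */
enum sketch_rss_bits {
	SK_BIT_L3_IPV4 = 1 << 0,
	SK_BIT_L3_IPV6 = 1 << 1,
	SK_BIT_L4      = 1 << 2, /* set whenever an L4 header fed the hash */
	SK_BIT_L4_TCP  = 1 << 3,
	SK_BIT_L4_UDP  = 1 << 4,
};

/* Combinations that a driver mapping table can point at. */
enum sketch_rss_type {
	SK_TYPE_NONE     = 0,
	SK_TYPE_L3_IPV4  = SK_BIT_L3_IPV4,
	SK_TYPE_L3_IPV6  = SK_BIT_L3_IPV6,
	SK_TYPE_IPV4_TCP = SK_BIT_L3_IPV4 | SK_BIT_L4 | SK_BIT_L4_TCP,
	SK_TYPE_IPV6_UDP = SK_BIT_L3_IPV6 | SK_BIT_L4 | SK_BIT_L4_UDP,
};

/* Hypothetical NIC encoding: the numbers follow no layer ordering. */
static const enum sketch_rss_type sketch_hw_map[] = {
	[0] = SK_TYPE_NONE,
	[1] = SK_TYPE_IPV6_UDP,
	[2] = SK_TYPE_L3_IPV4,
	[3] = SK_TYPE_IPV4_TCP,
	[4] = SK_TYPE_L3_IPV6,
};

/* Driver side: translate a hardware code, unknown codes map to NONE. */
static enum sketch_rss_type sketch_hw_to_type(unsigned int hw_code)
{
	size_t n = sizeof(sketch_hw_map) / sizeof(sketch_hw_map[0]);

	return hw_code < n ? sketch_hw_map[hw_code] : SK_TYPE_NONE;
}

/* Consumer side: match on individual bits instead of exact values. */
static int sketch_is_l4(enum sketch_rss_type t)
{
	return (t & SK_BIT_L4) != 0;
}

static int sketch_is_ipv6(enum sketch_rss_type t)
{
	return (t & SK_BIT_L3_IPV6) != 0;
}
```

Because every combined type is built from the same bits, a consumer can test "is this an L4 hash?" with one mask, regardless of which NIC produced the value.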
      
      Fixes: 3d76a4d3 ("bpf: XDP metadata RX kfuncs")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      0cd917a4
    • Jesper Dangaard Brouer's avatar
      selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters · e8163b98
      Jesper Dangaard Brouer authored
      The tool xdp_hw_metadata can be used by driver developers
      implementing XDP-hints metadata kfuncs.
      
      Remove all bpf_printk calls, as the tool already transfers all the
      XDP-hints related information via metadata area to AF_XDP
      userspace process.
      
      Add counters for providing remaining information about failure and
      skipped packet events.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      e8163b98
    • Daniel Vetter's avatar
      fbcon: set_con2fb_map needs to set con2fb_map! · fffb0b52
      Daniel Vetter authored
      I got really badly confused in d443d938 ("fbcon: move more common
      code into fb_open()") because we set the con2fb_map before the failure
      points, which didn't look good.
      
      But in trying to fix that I moved the assignment into the wrong path -
      we need to do it for _all_ vc we take over, not just the first one
      (which additionally requires the call to con2fb_acquire_newinfo).
      
      I've figured this out because of a KASAN bug report, where the
      fbcon_registered_fb and fbcon_display arrays went out of sync in
      fbcon_mode_deleted() because the con2fb_map pointed at the old
      fb_info, but the modes and everything was updated for the new one.
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Acked-by: Helge Deller <deller@gmx.de>
      Tested-by: Xingyuan Mo <hdthky0@gmail.com>
      Fixes: d443d938 ("fbcon: move more common code into fb_open()")
      Reported-by: Xingyuan Mo <hdthky0@gmail.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Xingyuan Mo <hdthky0@gmail.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v5.19+
      fffb0b52
    • Daniel Vetter's avatar
      fbcon: Fix error paths in set_con2fb_map · edf79dd2
      Daniel Vetter authored
      This is a regression introduced in b07db395 ("fbcon: Ditch error
      handling for con2fb_release_oldinfo"). I failed to realize what the if
      (!err) checks. The mentioned commit was dropping the
      con2fb_release_oldinfo() return value but the if (!err) was also
      checking whether the con2fb_acquire_newinfo() function call above
      failed or not.
      
      Fix this with an early return statement.
      
      Note that there's still a difference compared to the original state of
      the code, the below lines are now also skipped on error:
      
      	if (!search_fb_in_map(info_idx))
      		info_idx = newidx;
      
      These are only needed when we've actually thrown out an old fb_info
      from the console mappings, which only happens later on.
      
      Also move the fbcon_add_cursor_work() call into the same if block,
      it's all protected by console_lock so doesn't matter when we set up
      the blinking cursor delayed work anyway. This further simplifies the
      control flow and allows us to ditch the found local variable.
      
      v2: Clarify commit message (Javier)
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Acked-by: Helge Deller <deller@gmx.de>
      Tested-by: Xingyuan Mo <hdthky0@gmail.com>
      Fixes: b07db395 ("fbcon: Ditch error handling for con2fb_release_oldinfo")
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Xingyuan Mo <hdthky0@gmail.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v5.19+
      edf79dd2
    • Liang Chen's avatar
      skbuff: Fix a race between coalescing and releasing SKBs · 0646dc31
      Liang Chen authored
      Commit 1effe8ca ("skbuff: fix coalescing for page_pool fragment
      recycling") allowed coalescing to proceed with non page pool page and page
      pool page when @from is cloned, i.e.
      
      to->pp_recycle    --> false
      from->pp_recycle  --> true
      skb_cloned(from)  --> true
      
      However, it actually requires skb_cloned(@from) to hold true until
      coalescing finishes in this situation. If the other cloned SKB is
      released while the merging is in progress, from_shinfo->nr_frags will be
      set to 0 toward the end of the function, causing the increment of the frag
      page _refcount to be unexpectedly skipped, resulting in inconsistent
      reference counts. Later, when SKB(@to) is released, it frees the page
      directly even though the page pool page is still in use, leading to
      use-after-free or double-free errors. So it should be prohibited.
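
One conservative way to express that prohibition is to refuse any mix of page pool and non page pool SKBs, regardless of the clone state. The sketch below uses a hypothetical `sketch_skb` structure and `sketch_can_coalesce()` helper for illustration; it is not the kernel's struct sk_buff or its coalescing function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in carrying only the two flags discussed above. */
struct sketch_skb {
	bool pp_recycle; /* pages come from a page pool */
	bool cloned;     /* another SKB currently shares the data */
};

/*
 * Decide whether @from may be merged into @to.
 *
 * The clone state of @from is deliberately ignored: it can change while
 * the merge is running, so it cannot justify mixing page types.
 */
static bool sketch_can_coalesce(const struct sketch_skb *to,
				const struct sketch_skb *from)
{
	if (to->pp_recycle != from->pp_recycle)
		return false;

	return true;
}
```

With this rule the combination listed above (to->pp_recycle false, from->pp_recycle true, @from cloned) is rejected, so the frag page reference counts never depend on a clone that may disappear mid-merge.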
      
      The double-free error message below prompted us to investigate:
      BUG: Bad page state in process swapper/1  pfn:0e0d1
      page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
      index:0x2 pfn:0xe0d1
      flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
      raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
      page dumped because: nonzero _refcount
      
      CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
      Call Trace:
       <IRQ>
      dump_stack_lvl+0x32/0x50
      bad_page+0x69/0xf0
      free_pcp_prepare+0x260/0x2f0
      free_unref_page+0x20/0x1c0
      skb_release_data+0x10b/0x1a0
      napi_consume_skb+0x56/0x150
      net_rx_action+0xf0/0x350
      ? __napi_schedule+0x79/0x90
      __do_softirq+0xc8/0x2b1
      __irq_exit_rcu+0xb9/0xf0
      common_interrupt+0x82/0xa0
      </IRQ>
      <TASK>
      asm_common_interrupt+0x22/0x40
      RIP: 0010:default_idle+0xb/0x20
      
      Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
      Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230413090353.14448-1-liangchen.linux@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0646dc31
    • Roman Gushchin's avatar
      net: macb: fix a memory corruption in extended buffer descriptor mode · e8b74453
      Roman Gushchin authored
      For quite some time we were chasing a bug which looked like a sudden
      permanent failure of networking and mmc on some of our devices.
      The bug was very sensitive to any software changes and even more to
      any kernel debug options.
      
      Finally we got a setup where the problem was reproducible with
      CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma:
      
      [   16.992082] ------------[ cut here ]------------
      [   16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes]
      [   17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900
      [   17.018977] Modules linked in: xxxxx
      [   17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28
      [   17.045345] Hardware name: xxxxx
      [   17.049528] pstate: 60000005 (nZCv daif -PAN -UAO)
      [   17.054322] pc : check_unmap+0x6a0/0x900
      [   17.058243] lr : check_unmap+0x6a0/0x900
      [   17.062163] sp : ffffffc010003c40
      [   17.065470] x29: ffffffc010003c40 x28: 000000004000c03c
      [   17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800
      [   17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8
      [   17.081407] x23: 0000000000000000 x22: ffffffc010a08750
      [   17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000
      [   17.092032] x19: 0000000875e3e244 x18: 0000000000000010
      [   17.097343] x17: 0000000000000000 x16: 0000000000000000
      [   17.102647] x15: ffffff8879e4a988 x14: 0720072007200720
      [   17.107959] x13: 0720072007200720 x12: 0720072007200720
      [   17.113261] x11: 0720072007200720 x10: 0720072007200720
      [   17.118565] x9 : 0720072007200720 x8 : 000000000000022d
      [   17.123869] x7 : 0000000000000015 x6 : 0000000000000098
      [   17.129173] x5 : 0000000000000000 x4 : 0000000000000000
      [   17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370
      [   17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000
      [   17.145082] Call trace:
      [   17.147524]  check_unmap+0x6a0/0x900
      [   17.151091]  debug_dma_unmap_page+0x88/0x90
      [   17.155266]  gem_rx+0x114/0x2f0
      [   17.158396]  macb_poll+0x58/0x100
      [   17.161705]  net_rx_action+0x118/0x400
      [   17.165445]  __do_softirq+0x138/0x36c
      [   17.169100]  irq_exit+0x98/0xc0
      [   17.172234]  __handle_domain_irq+0x64/0xc0
      [   17.176320]  gic_handle_irq+0x5c/0xc0
      [   17.179974]  el1_irq+0xb8/0x140
      [   17.183109]  xiic_process+0x5c/0xe30
      [   17.186677]  irq_thread_fn+0x28/0x90
      [   17.190244]  irq_thread+0x208/0x2a0
      [   17.193724]  kthread+0x130/0x140
      [   17.196945]  ret_from_fork+0x10/0x20
      [   17.200510] ---[ end trace 7240980785f81d6f ]---
      
      [  237.021490] ------------[ cut here ]------------
      [  237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b
      [  237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240
      [  237.041802] Modules linked in: xxxxx
      [  237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.4.0 #28
      [  237.068941] Hardware name: xxxxx
      [  237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO)
      [  237.077900] pc : add_dma_entry+0x214/0x240
      [  237.081986] lr : add_dma_entry+0x214/0x240
      [  237.086072] sp : ffffffc010003c30
      [  237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00
      [  237.094683] x27: 0000000000000180 x26: ffffff8878e387c0
      [  237.099987] x25: 0000000000000002 x24: 0000000000000000
      [  237.105290] x23: 000000000000003b x22: ffffffc010a0fa00
      [  237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600
      [  237.115897] x19: 00000000ffffffef x18: 0000000000000010
      [  237.121201] x17: 0000000000000000 x16: 0000000000000000
      [  237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720
      [  237.131807] x13: 0720072007200720 x12: 0720072007200720
      [  237.137111] x11: 0720072007200720 x10: 0720072007200720
      [  237.142415] x9 : 0720072007200720 x8 : 0000000000000259
      [  237.147718] x7 : 0000000000000001 x6 : 0000000000000000
      [  237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001
      [  237.158325] x3 : 0000000000000006 x2 : 0000000000000007
      [  237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000
      [  237.168932] Call trace:
      [  237.171373]  add_dma_entry+0x214/0x240
      [  237.175115]  debug_dma_map_page+0xf8/0x120
      [  237.179203]  gem_rx_refill+0x190/0x280
      [  237.182942]  gem_rx+0x224/0x2f0
      [  237.186075]  macb_poll+0x58/0x100
      [  237.189384]  net_rx_action+0x118/0x400
      [  237.193125]  __do_softirq+0x138/0x36c
      [  237.196780]  irq_exit+0x98/0xc0
      [  237.199914]  __handle_domain_irq+0x64/0xc0
      [  237.204000]  gic_handle_irq+0x5c/0xc0
      [  237.207654]  el1_irq+0xb8/0x140
      [  237.210789]  arch_cpu_idle+0x40/0x200
      [  237.214444]  default_idle_call+0x18/0x30
      [  237.218359]  do_idle+0x200/0x280
      [  237.221578]  cpu_startup_entry+0x20/0x30
      [  237.225493]  rest_init+0xe4/0xf0
      [  237.228713]  arch_call_rest_init+0xc/0x14
      [  237.232714]  start_kernel+0x47c/0x4a8
      [  237.236367] ---[ end trace 7240980785f81d70 ]---
      
      Lars was fast to find an explanation: according to the datasheet
      bit 2 of the rx buffer descriptor entry has a different meaning in the
      extended mode:
        Address [2] of beginning of buffer, or
        in extended buffer descriptor mode (DMA configuration register [28] = 1),
        indicates a valid timestamp in the buffer descriptor entry.
      
      The macb driver didn't mask this bit while getting an address and it
      eventually caused a memory corruption and a dma failure.
      
      The problem is resolved by explicitly clearing the problematic bit
      if hw timestamping is used.
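
The masking can be sketched like this. The macro and function names are hypothetical, and treating bits 0 and 1 of the descriptor word as flag bits is an assumption of this sketch; only the dual meaning of bit 2 comes from the datasheet text quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption of this sketch: the two lowest bits are flag bits. */
#define SKETCH_RX_FLAG_MASK 0x3u
/* Bit 2: address bit normally, "timestamp valid" in extended mode. */
#define SKETCH_RX_TS_VALID  (1u << 2)

/* Extract the buffer address from the address word of an RX descriptor. */
static uint32_t sketch_rx_buf_addr(uint32_t desc_word, bool hw_timestamping)
{
	uint32_t addr = desc_word & ~SKETCH_RX_FLAG_MASK;

	/* In extended mode bit 2 is status, not part of the address. */
	if (hw_timestamping)
		addr &= ~SKETCH_RX_TS_VALID;

	return addr;
}
```

For a descriptor word of 0x75e3e244 this returns 0x75e3e240 when hardware timestamping is in use. Without the extra mask the address would be off by 4, which is consistent with the device address ending in ...244 in the DMA-API warning above.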
      
      Fixes: 7b429614 ("net: macb: Add support for PTP timestamps in DMA descriptors")
      Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
      Co-developed-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20230412232144.770336-1-roman.gushchin@linux.dev
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e8b74453
    • Xin Long's avatar
      selftests: add the missing CONFIG_IP_SCTP in net config · 3a0385be
      Xin Long authored
      The selftest sctp_vrf needs CONFIG_IP_SCTP set in config
      when building the kernel, so add it.
      
      Fixes: a61bd7b9 ("selftests: add a selftest for sctp vrf")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Link: https://lore.kernel.org/r/61dddebc4d2dd98fe7fb145e24d4b2430e42b572.1681312386.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3a0385be
    • Eric Dumazet's avatar
      udp6: fix potential access to stale information · 1c5950fc
      Eric Dumazet authored
      lena wang reported an issue caused by udpv6_sendmsg()
      mangling msg->msg_name and msg->msg_namelen, which
      are later read from ____sys_sendmsg() :
      
      	/*
      	 * If this is sendmmsg() and sending to current destination address was
      	 * successful, remember it.
      	 */
      	if (used_address && err >= 0) {
      		used_address->name_len = msg_sys->msg_namelen;
      		if (msg_sys->msg_name)
      			memcpy(&used_address->name, msg_sys->msg_name,
      			       used_address->name_len);
      	}
      
      udpv6_sendmsg() wants to pretend the remote address family
      is AF_INET in order to call udp_sendmsg().
      
      A fix would be to modify the address in-place, instead
      of using a local variable, but this could have other side effects.
      
      Instead, restore initial values before we return from udpv6_sendmsg().
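
The save-and-restore pattern can be sketched as follows. `sketch_msghdr`, `sketch_send_v4()` and `sketch_send_v6_mapped()` are hypothetical stand-ins, not the kernel's struct msghdr or its UDP send functions; the sketch only shows that the caller-visible fields are put back before returning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields the caller reads afterwards. */
struct sketch_msghdr {
	void *msg_name;
	int msg_namelen;
};

/* Stand-in for the IPv4 path; returns the address length it was given. */
static int sketch_send_v4(struct sketch_msghdr *msg)
{
	return msg->msg_namelen;
}

/* Send via the IPv4 path with a rewritten address, then undo the rewrite. */
static int sketch_send_v6_mapped(struct sketch_msghdr *msg,
				 void *v4_name, int v4_len)
{
	void *saved_name = msg->msg_name;
	int saved_len = msg->msg_namelen;
	int err;

	/* Pretend the remote address is an IPv4 one for the callee. */
	msg->msg_name = v4_name;
	msg->msg_namelen = v4_len;
	err = sketch_send_v4(msg);

	/* Restore the initial values so the caller never sees stale data. */
	msg->msg_name = saved_name;
	msg->msg_namelen = saved_len;

	return err;
}
```

Restoring on the way out keeps the rewrite local to the callee, so code that later copies msg_name using msg_namelen works with the values the caller originally supplied.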
      
      Fixes: c71d8ebe ("net: Fix security_socket_sendmsg() bypass problem.")
      Reported-by: lena wang <lena.wang@mediatek.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Maciej Żenczykowski <maze@google.com>
      Link: https://lore.kernel.org/r/20230412130308.1202254-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1c5950fc
    • Aaron Conole's avatar
      selftests: openvswitch: adjust datapath NL message declaration · 306dc213
      Aaron Conole authored
      The netlink message for creating a new datapath takes an array
      of ports for the PID creation.  This shouldn't cause much of an issue,
      but correct it for future cases where we need to decode
      datapath information that could include the per-cpu PID map.
      
      Fixes: 25f16c87 ("selftests: add openvswitch selftest suite")
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Link: https://lore.kernel.org/r/20230412115828.3991806-1-aconole@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      306dc213
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-more-fixes-for-6-3' · ecfcc6fb
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: more fixes for 6.3
      
      Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
      edge cases. It fixes issues that can be visible from v5.11.
      
      Patch 2 makes sure the MPTCP worker doesn't try to manipulate
      disconnected sockets. This is also a fix for an issue that can be
      visible from v5.11.
      
      Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
      and an early fallback is done. A fix for v6.2.
      
      Patch 4 improves the stability of the userspace PM selftest for a
      subtest added in v6.2.
      ====================
      
      Link: https://lore.kernel.org/r/20230411-upstream-net-20230411-mptcp-fixes-v1-0-ca540f3ef986@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ecfcc6fb
    • Matthieu Baerts's avatar
      selftests: mptcp: userspace pm: uniform verify events · 711ae788
      Matthieu Baerts authored
      Simply adding a "sleep" before checking something is usually not a good
      idea because the time that has been picked may be too short or too
      long. It is best to wait for events with a timeout.
      
      In this selftest, 'sleep 0.5' is used more than 40 times. It is always
      used before calling a 'verify_*' function, except for
      verify_listener_events, which was added later.
      
      In the end, using all these 'sleep 0.5' seems to work: the slow CIs
      don't complain so far. Also, because it doesn't take too much time, we
      can just add two more 'sleep 0.5' to be consistent with what is done before
      calling a 'verify_*' function. For the same reasons, we can also delay a bigger
      refactoring to replace all these 'sleep 0.5' by functions waiting for
      events instead of waiting for a fixed time and hoping for the best.
      
      Fixes: 6c73008a ("selftests: mptcp: listener test for userspace PM")
      Cc: stable@vger.kernel.org
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      711ae788
    • Paolo Abeni's avatar
      mptcp: fix NULL pointer dereference on fastopen early fallback · c0ff6f6d
      Paolo Abeni authored
      In case of early fallback to TCP, subflow_syn_recv_sock() deletes
      the subflow context before returning the newly allocated sock to
      the caller.
      
      The fastopen path does not cope with the above, as it unconditionally
      dereferences the subflow context.
      
      Fixes: 36b122ba ("mptcp: add subflow_v(4,6)_send_synack()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c0ff6f6d
    • Paolo Abeni's avatar
      mptcp: stricter state check in mptcp_worker · d6a04437
      Paolo Abeni authored
      As reported by Christoph, the mptcp protocol can run the
      worker when the relevant msk socket is in an unexpected state:
      
      connect()
      // incoming reset + fastclose
      // the mptcp worker is scheduled
      mptcp_disconnect()
      // msk is now CLOSED
      listen()
      mptcp_worker()
      
      Leading to the following splat:
      
      divide error: 0000 [#1] PREEMPT SMP
      CPU: 1 PID: 21 Comm: kworker/1:0 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:__tcp_select_window+0x22c/0x4b0 net/ipv4/tcp_output.c:3018
      RSP: 0018:ffffc900000b3c98 EFLAGS: 00010293
      RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff8214ce97 RDI: 0000000000000004
      RBP: 000000000000ffd7 R08: 0000000000000004 R09: 0000000000010000
      R10: 000000000000ffd7 R11: ffff888005afa148 R12: 000000000000ffd7
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000405270 CR3: 000000003011e006 CR4: 0000000000370ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_select_window net/ipv4/tcp_output.c:262 [inline]
       __tcp_transmit_skb+0x356/0x1280 net/ipv4/tcp_output.c:1345
       tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
       tcp_send_active_reset+0x13e/0x320 net/ipv4/tcp_output.c:3459
       mptcp_check_fastclose net/mptcp/protocol.c:2530 [inline]
       mptcp_worker+0x6c7/0x800 net/mptcp/protocol.c:2705
       process_one_work+0x3bd/0x950 kernel/workqueue.c:2390
       worker_thread+0x5b/0x610 kernel/workqueue.c:2537
       kthread+0x138/0x170 kernel/kthread.c:376
       ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
       </TASK>
      
      This change addresses the issue by explicitly checking for bad states
      before running the mptcp worker.
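
A minimal sketch of such a state gate follows. The enum, its values, and the choice of CLOSED and LISTEN as the "bad" states are assumptions drawn from the sequence shown above (disconnect leaves the msk CLOSED, then listen() is called); the actual set of states is defined by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket states; the values are illustrative only. */
enum sketch_state {
	SKETCH_ESTABLISHED = 1,
	SKETCH_CLOSED      = 2,
	SKETCH_LISTEN      = 3,
};

/* States in which the worker must not touch the socket (assumed set). */
#define SKETCH_BAD_STATES \
	((1u << SKETCH_CLOSED) | (1u << SKETCH_LISTEN))

/* Return true when the worker body may run for a socket in @state. */
static bool sketch_worker_may_run(enum sketch_state state)
{
	return ((1u << state) & SKETCH_BAD_STATES) == 0;
}
```

The worker would evaluate this check first, under the socket lock, and bail out early when it fails, so a work item scheduled before mptcp_disconnect() becomes a no-op by the time it runs.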
      
      Fixes: e16163b6 ("mptcp: refactor shutdown and close")
      Cc: stable@vger.kernel.org
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/374
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Tested-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d6a04437