1. 04 Apr, 2024 14 commits
    • netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() · 24225011
      Ziyang Xuan authored
      nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
      run concurrently with __nft_flowtable_type_get() within
      nf_tables_newflowtable(). There is no protection when iterating over
      the nf_tables_flowtables list in __nft_flowtable_type_get(), so there
      is a potential data-race on nf_tables_flowtables list entries.
      
      Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
      in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
      nft_flowtable_type_get() to protect the entire type query process.
      
      Fixes: 3b49e2e9 ("netfilter: nf_tables: add flow table netlink frontend")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: reject new basechain after table flag update · 994209dd
      Pablo Neira Ayuso authored
      When dormant flag is toggled, hooks are disabled in the commit phase by
      iterating over current chains in table (existing and new).
      
      The following configuration allows for an inconsistent state:
      
        add table x
        add chain x y { type filter hook input priority 0; }
        add table x { flags dormant; }
        add chain x w { type filter hook input priority 1; }
      
      which triggers the following warning when trying to unregister chain w
      which is already unregistered.
      
      [  127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
      [...]
      [  127.322519] Call Trace:
      [  127.322521]  <TASK>
      [  127.322524]  ? __warn+0x9f/0x1a0
      [  127.322531]  ? __nf_unregister_net_hook+0x21a/0x260
      [  127.322537]  ? report_bug+0x1b1/0x1e0
      [  127.322545]  ? handle_bug+0x3c/0x70
      [  127.322552]  ? exc_invalid_op+0x17/0x40
      [  127.322556]  ? asm_exc_invalid_op+0x1a/0x20
      [  127.322563]  ? kasan_save_free_info+0x3b/0x60
      [  127.322570]  ? __nf_unregister_net_hook+0x6a/0x260
      [  127.322577]  ? __nf_unregister_net_hook+0x21a/0x260
      [  127.322583]  ? __nf_unregister_net_hook+0x6a/0x260
      [  127.322590]  ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
      [  127.322655]  nft_table_disable+0x75/0xf0 [nf_tables]
      [  127.322717]  nf_tables_commit+0x2571/0x2620 [nf_tables]
      
      Fixes: 179d9ba5 ("netfilter: nf_tables: fix table flag updates")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: flush pending destroy work before exit_net release · 24cea967
      Pablo Neira Ayuso authored
      Similar to 2c9f0293 ("netfilter: nf_tables: flush pending destroy
      work before netlink notifier"), this addresses a race between exit_net
      and the destroy workqueue.
      
      The trace below shows an element to be released via destroy workqueue
      while exit_net path (triggered via module removal) has already released
      the set that is used in such transaction.
      
      [ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
      [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
      [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [ 1360.547984] Call Trace:
      [ 1360.547991]  <TASK>
      [ 1360.547998]  dump_stack_lvl+0x53/0x70
      [ 1360.548014]  print_report+0xc4/0x610
      [ 1360.548026]  ? __virt_addr_valid+0xba/0x160
      [ 1360.548040]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
      [ 1360.548054]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548176]  kasan_report+0xae/0xe0
      [ 1360.548189]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548312]  nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548447]  ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
      [ 1360.548577]  ? _raw_spin_unlock_irq+0x18/0x30
      [ 1360.548591]  process_one_work+0x2f1/0x670
      [ 1360.548610]  worker_thread+0x4d3/0x760
      [ 1360.548627]  ? __pfx_worker_thread+0x10/0x10
      [ 1360.548640]  kthread+0x16b/0x1b0
      [ 1360.548653]  ? __pfx_kthread+0x10/0x10
      [ 1360.548665]  ret_from_fork+0x2f/0x50
      [ 1360.548679]  ? __pfx_kthread+0x10/0x10
      [ 1360.548690]  ret_from_fork_asm+0x1a/0x30
      [ 1360.548707]  </TASK>
      
      [ 1360.548719] Allocated by task 192061:
      [ 1360.548726]  kasan_save_stack+0x20/0x40
      [ 1360.548739]  kasan_save_track+0x14/0x30
      [ 1360.548750]  __kasan_kmalloc+0x8f/0xa0
      [ 1360.548760]  __kmalloc_node+0x1f1/0x450
      [ 1360.548771]  nf_tables_newset+0x10c7/0x1b50 [nf_tables]
      [ 1360.548883]  nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
      [ 1360.548909]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
      [ 1360.548927]  netlink_unicast+0x367/0x4f0
      [ 1360.548935]  netlink_sendmsg+0x34b/0x610
      [ 1360.548944]  ____sys_sendmsg+0x4d4/0x510
      [ 1360.548953]  ___sys_sendmsg+0xc9/0x120
      [ 1360.548961]  __sys_sendmsg+0xbe/0x140
      [ 1360.548971]  do_syscall_64+0x55/0x120
      [ 1360.548982]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      [ 1360.548994] Freed by task 192222:
      [ 1360.548999]  kasan_save_stack+0x20/0x40
      [ 1360.549009]  kasan_save_track+0x14/0x30
      [ 1360.549019]  kasan_save_free_info+0x3b/0x60
      [ 1360.549028]  poison_slab_object+0x100/0x180
      [ 1360.549036]  __kasan_slab_free+0x14/0x30
      [ 1360.549042]  kfree+0xb6/0x260
      [ 1360.549049]  __nft_release_table+0x473/0x6a0 [nf_tables]
      [ 1360.549131]  nf_tables_exit_net+0x170/0x240 [nf_tables]
      [ 1360.549221]  ops_exit_list+0x50/0xa0
      [ 1360.549229]  free_exit_list+0x101/0x140
      [ 1360.549236]  unregister_pernet_operations+0x107/0x160
      [ 1360.549245]  unregister_pernet_subsys+0x1c/0x30
      [ 1360.549254]  nf_tables_module_exit+0x43/0x80 [nf_tables]
      [ 1360.549345]  __do_sys_delete_module+0x253/0x370
      [ 1360.549352]  do_syscall_64+0x55/0x120
      [ 1360.549360]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      (gdb) list *__nft_release_table+0x473
      0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
      11349           list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
      11350                   list_del(&flowtable->list);
      11351                   nft_use_dec(&table->use);
      11352                   nf_tables_flowtable_destroy(flowtable);
      11353           }
      11354           list_for_each_entry_safe(set, ns, &table->sets, list) {
      11355                   list_del(&set->list);
      11356                   nft_use_dec(&table->use);
      11357                   if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
      11358                           nft_map_deactivate(&ctx, set);
      (gdb)
      
      [ 1360.549372] Last potentially related work creation:
      [ 1360.549376]  kasan_save_stack+0x20/0x40
      [ 1360.549384]  __kasan_record_aux_stack+0x9b/0xb0
      [ 1360.549392]  __queue_work+0x3fb/0x780
      [ 1360.549399]  queue_work_on+0x4f/0x60
      [ 1360.549407]  nft_rhash_remove+0x33b/0x340 [nf_tables]
      [ 1360.549516]  nf_tables_commit+0x1c6a/0x2620 [nf_tables]
      [ 1360.549625]  nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink]
      [ 1360.549647]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
      [ 1360.549671]  netlink_unicast+0x367/0x4f0
      [ 1360.549680]  netlink_sendmsg+0x34b/0x610
      [ 1360.549690]  ____sys_sendmsg+0x4d4/0x510
      [ 1360.549697]  ___sys_sendmsg+0xc9/0x120
      [ 1360.549706]  __sys_sendmsg+0xbe/0x140
      [ 1360.549715]  do_syscall_64+0x55/0x120
      [ 1360.549725]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      Fixes: 0935d558 ("netfilter: nf_tables: asynchronous release")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path · 0d459e2f
      Pablo Neira Ayuso authored
      The commit mutex should not be released during the critical section
      between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC
      worker could collect expired objects and get the released commit lock
      within the same GC sequence.
      
      nf_tables_module_autoload() temporarily releases the mutex to load
      module dependencies, then it goes back to replay the transaction again.
      Move it to the end of the abort phase, after nft_gc_seq_end() is called.
      
      Cc: stable@vger.kernel.org
      Fixes: 72034434 ("netfilter: nf_tables: GC transaction race with abort path")
      Reported-by: Kuan-Ting Chen <hexrabbit@devco.re>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: release batch on table validation from abort path · a45e6889
      Pablo Neira Ayuso authored
      Unlike early commit path stage which triggers a call to abort, an
      explicit release of the batch is required on abort, otherwise mutex is
      released and commit_list remains in place.
      
      Add WARN_ON_ONCE to ensure commit_list is empty from the abort path
      before releasing the mutex.
      
      After this patch, commit_list is always assumed to be empty before
      grabbing the mutex, therefore
      
        03c1f1ef ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()")
      
      only needs to release the pending modules for registration.
      
      Cc: stable@vger.kernel.org
      Fixes: c0391b6a ("netfilter: nf_tables: missing validation from the abort path")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Revert "tg3: Remove residual error handling in tg3_suspend" · 72076fc9
      Paolo Abeni authored
      This reverts commit 9ab4ad29.
      
      I ran out of coffee and applied it to the wrong tree. The blame is on me.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tg3: Remove residual error handling in tg3_suspend · 9ab4ad29
      Nikita Kiryushin authored
      As of now, tg3_power_down_prepare always ends with success, but
      the error handling code from the former tg3_set_power_state call is still here.

      This code became unreachable in commit c866b7ea ("tg3: Do not use
      legacy PCI power management").

      Remove the (now unreachable) error handling code for simplification and change
      tg3_power_down_prepare to a void function as its result is no longer checked.
      Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240401191418.361747-1-kiryushin@ancud.ru
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mana: Fix Rx DMA datasize and skb_over_panic · c0de6ab9
      Haiyang Zhang authored
      mana_get_rxbuf_cfg() aligns the RX buffer's DMA datasize to be a
      multiple of 64. So a packet slightly bigger than mtu+14, say 1536,
      can be received and cause skb_over_panic.
      
      Sample dmesg:
      [ 5325.237162] skbuff: skb_over_panic: text:ffffffffc043277a len:1536 put:1536 head:ff1100018b517000 data:ff1100018b517100 tail:0x700 end:0x6ea dev:<NULL>
      [ 5325.243689] ------------[ cut here ]------------
      [ 5325.245748] kernel BUG at net/core/skbuff.c:192!
      [ 5325.247838] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [ 5325.258374] RIP: 0010:skb_panic+0x4f/0x60
      [ 5325.302941] Call Trace:
      [ 5325.304389]  <IRQ>
      [ 5325.315794]  ? skb_panic+0x4f/0x60
      [ 5325.317457]  ? asm_exc_invalid_op+0x1f/0x30
      [ 5325.319490]  ? skb_panic+0x4f/0x60
      [ 5325.321161]  skb_put+0x4e/0x50
      [ 5325.322670]  mana_poll+0x6fa/0xb50 [mana]
      [ 5325.324578]  __napi_poll+0x33/0x1e0
      [ 5325.326328]  net_rx_action+0x12e/0x280
      
      As discussed internally, this alignment is not necessary. To fix
      this bug, remove it from the code. So oversized packets will be
      marked as CQE_RX_TRUNCATED by NIC, and dropped.
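
The size arithmetic behind the panic can be checked with a small userspace sketch. It assumes an MTU of 1500, which is consistent with the sample dmesg (end 0x6ea minus the 0x100 data offset leaves 1514 bytes of room, while 1536 bytes were put). The helper names are hypothetical and are not the driver's functions.

```c
#include <stdint.h>

#define ETH_HDR_LEN 14u /* the "+14" in the commit message */

/* Round x up to the next multiple of a (a must be a power of two). */
uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical model of the RX DMA datasize for a given MTU.
 * with_alignment != 0 models the behaviour before the fix. */
uint32_t rx_datasize(uint32_t mtu, int with_alignment)
{
    uint32_t size = mtu + ETH_HDR_LEN;

    return with_alignment ? align_up(size, 64) : size;
}
```

With the alignment, `rx_datasize(1500, 1)` is 1536, so the NIC may deliver up to 1536 bytes into a buffer sized for 1514. Without it, the datasize is 1514 and a longer packet is truncated by the NIC instead.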
      
      Cc: stable@vger.kernel.org
      Fixes: 2fbbd712 ("net: mana: Enable RX path to handle various MTU sizes")
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Link: https://lore.kernel.org/r/1712087316-20886-1-git-send-email-haiyangz@microsoft.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: fix lockdep splat in qdisc_tree_reduce_backlog() · 7eb32236
      Eric Dumazet authored
      qdisc_tree_reduce_backlog() is called with the qdisc lock held,
      not RTNL.
      
      We must use qdisc_lookup_rcu() instead of qdisc_lookup()
      
      syzbot reported:
      
      WARNING: suspicious RCU usage
      6.1.74-syzkaller #0 Not tainted
      -----------------------------
      net/sched/sch_api.c:305 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      3 locks held by udevd/1142:
        #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
        #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
        #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: net_tx_action+0x64a/0x970 net/core/dev.c:5282
        #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
        #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: net_tx_action+0x754/0x970 net/core/dev.c:5297
        #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
        #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
        #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: qdisc_tree_reduce_backlog+0x84/0x580 net/sched/sch_api.c:792
      
      stack backtrace:
      CPU: 1 PID: 1142 Comm: udevd Not tainted 6.1.74-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      Call Trace:
       <TASK>
        [<ffffffff85b85f14>] __dump_stack lib/dump_stack.c:88 [inline]
        [<ffffffff85b85f14>] dump_stack_lvl+0x1b1/0x28f lib/dump_stack.c:106
        [<ffffffff85b86007>] dump_stack+0x15/0x1e lib/dump_stack.c:113
        [<ffffffff81802299>] lockdep_rcu_suspicious+0x1b9/0x260 kernel/locking/lockdep.c:6592
        [<ffffffff84f0054c>] qdisc_lookup+0xac/0x6f0 net/sched/sch_api.c:305
        [<ffffffff84f037c3>] qdisc_tree_reduce_backlog+0x243/0x580 net/sched/sch_api.c:811
        [<ffffffff84f5b78c>] pfifo_tail_enqueue+0x32c/0x4b0 net/sched/sch_fifo.c:51
        [<ffffffff84fbcf63>] qdisc_enqueue include/net/sch_generic.h:833 [inline]
        [<ffffffff84fbcf63>] netem_dequeue+0xeb3/0x15d0 net/sched/sch_netem.c:723
        [<ffffffff84eecab9>] dequeue_skb net/sched/sch_generic.c:292 [inline]
        [<ffffffff84eecab9>] qdisc_restart net/sched/sch_generic.c:397 [inline]
        [<ffffffff84eecab9>] __qdisc_run+0x249/0x1e60 net/sched/sch_generic.c:415
        [<ffffffff84d7aa96>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125
        [<ffffffff84d85d29>] net_tx_action+0x7c9/0x970 net/core/dev.c:5313
        [<ffffffff85e002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:616
        [<ffffffff81568bca>] invoke_softirq kernel/softirq.c:447 [inline]
        [<ffffffff81568bca>] __irq_exit_rcu+0xca/0x230 kernel/softirq.c:700
        [<ffffffff81568ae9>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:712
        [<ffffffff85b89f52>] sysvec_apic_timer_interrupt+0x42/0x90 arch/x86/kernel/apic/apic.c:1107
        [<ffffffff85c00ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:656
      
      Fixes: d636fc5d ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20240402134133.2352776-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping · de99e1ea
      Horatiu Vultur authored
      There are 2 issues with the blamed commit.
      1. When the phy is initialized, it enables the disabling of UDPv4
         checksums. The UDPv6 checksum is already enabled by default. So when
         1-step is configured, it clears these flags.
      2. If 2-step is configured after 1-step, 1-step remains configured
         because the flag is not cleared. So the sync frames still have
         origin timestamps set.

      Fix this by first reading the value of the register and then
      changing only bit 12, which determines whether the timestamp is
      inserted in the frame, without changing any other bits.
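
A read-modify-write of a single bit can be sketched as follows. This is a userspace model only: the register is a plain 16-bit value, and the macro and function names are hypothetical. Only the bit position (12) comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define ONE_STEP_TS_INSERT (1u << 12) /* bit 12, per the commit message */

/* Given the value read from the register, return the value to write back.
 * Only bit 12 changes; every other bit keeps what the hardware reported. */
uint16_t set_one_step(uint16_t reg, bool enable)
{
    if (enable)
        return (uint16_t)(reg | ONE_STEP_TS_INSERT);
    return (uint16_t)(reg & ~ONE_STEP_TS_INSERT);
}
```

Switching from 1-step back to 2-step then clears bit 12 explicitly, and unrelated flags such as the checksum settings are never overwritten.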
      
      Fixes: ece19502 ("net: phy: micrel: 1588 support for LAN8814 phy")
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
      Link: https://lore.kernel.org/r/20240402071634.2483524-1-horatiu.vultur@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: fix rx queue priority assignment · b3da86d4
      Piotr Wejman authored
      The driver should ensure that the same priority is not mapped to multiple
      rx queues. From DesignWare Cores Ethernet Quality-of-Service
      Databook, section 17.1.29 MAC_RxQ_Ctrl2:
      "[...]The software must ensure that the content of this field is
      mutually exclusive to the PSRQ fields for other queues, that is,
      the same priority is not mapped to multiple Rx queues[...]"
      
      Previously rx_queue_priority() function was:
      - clearing all priorities from a queue
      - adding new priorities to that queue
      After this patch it will:
      - first assign new priorities to a queue
      - then remove those priorities from all other queues
      - keep other priorities previously assigned to that queue
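
The new ordering can be modelled in userspace as below. This is a sketch under assumptions: each queue's PSRQ field is treated as an 8-bit mask of priorities, the queue count of 4 is arbitrary, and the function name is hypothetical rather than the driver's.

```c
#include <stdint.h>

#define NUM_RX_QUEUES 4 /* arbitrary size for the sketch */

/* psrq[q] is a bitmask of the priorities routed to Rx queue q. */
void model_rx_queue_priority(uint8_t psrq[NUM_RX_QUEUES],
                             unsigned int queue, uint8_t prio)
{
    unsigned int q;

    /* Assign the new priorities, keeping those already on this queue. */
    psrq[queue] = (uint8_t)(psrq[queue] | prio);

    /* Remove them from every other queue so the mapping stays exclusive. */
    for (q = 0; q < NUM_RX_QUEUES; q++)
        if (q != queue)
            psrq[q] = (uint8_t)(psrq[q] & ~prio);
}
```

Starting from masks {0x03, 0x0c, 0, 0}, assigning 0x06 to queue 2 leaves {0x01, 0x08, 0x06, 0}, so no priority bit appears in two queues.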
      
      Fixes: a8f5102a ("net: stmmac: TX and RX queue priority configuration")
      Fixes: 2142754f ("net: stmmac: Add MAC related callbacks for XGMAC2")
      Signed-off-by: Piotr Wejman <piotrwejman90@gmail.com>
      Link: https://lore.kernel.org/r/20240401192239.33942-1-piotrwejman90@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: txgbe: fix i2c dev name cannot match clkdev · c644920c
      Duanqiang Wen authored
      txgbe clkdev shortened clk_name, so the i2c_dev info_name
      also needs to be shortened. Otherwise, i2c_dev cannot initialize
      its clock.
      
      Fixes: e30cef00 ("net: txgbe: fix clk_name exceed MAX_DEV_ID limits")
      Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
      Link: https://lore.kernel.org/r/20240402021843.126192-1-duanqiangwen@net-swift.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-fec-fix-to-suspend-resume-with-mac_managed_pm' · 22c5e0bc
      Jakub Kicinski authored
      John Ernberg says:
      
      ====================
      net: fec: Fix to suspend / resume with mac_managed_pm
      
      Since the introduction of mac_managed_pm in the FEC driver there were some
      discrepancies regarding power management of the PHY.
      
      This failed on our board that has a permanently powered Microchip LAN8700R
      attached to the FEC. Although the root cause of the failure can be traced
      back to f166f890 ("net: ethernet: fec: Replace interrupt driven MDIO
      with polled IO") and probably even before that, we only started noticing
      the problem going from 5.10 to 6.1.
      
      557d5dc8 ("net: fec: use mac-managed PHY PM") is actually a fix
      for most of the power management sequencing problems that came with
      power managing the MDIO bus, which for the FEC meant adding a race
      between FEC resume (and phy_start() if netif was running) and PHY resume.
      
      That it worked before for us was probably just luck...
      
      Thanks to Wei's response to my report at [1] I was able to pick up his
      patch and start honing in on the remaining missing details.
      
      [1]: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/
      
      v3: https://lore.kernel.org/netdev/20240306133734.4144808-1-john.ernberg@actia.se/
      v2: https://lore.kernel.org/netdev/20240229105256.2903095-1-john.ernberg@actia.se/
      v1: https://lore.kernel.org/netdev/20240212105010.2258421-1-john.ernberg@actia.se/
      ====================
      
      Link: https://lore.kernel.org/r/20240328155909.59613-1-john.ernberg@actia.se
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fec: Set mac_managed_pm during probe · cbc17e78
      Wei Fang authored
      Setting mac_managed_pm during interface up is too late.
      
      In situations where the link is not brought up yet and the system
      suspends, the regular PHY power management will run. Since the FEC
      ETHEREN control bit is cleared (automatically) on suspend, the
      controller is off on resume. When the regular PHY power management
      resume path runs in this context, it will write to the MII_DATA
      register but nothing will be transmitted on the MDIO bus.
      
      This can be observed by the following log:
      
          fec 5b040000.ethernet eth0: MDIO read timeout
          Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
          Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
      
      The data written will however remain in the MII_DATA register.
      
      When the link later is set to administrative up it will trigger a call to
      fec_restart() which will restore the MII_SPEED register. This triggers the
      quirk explained in f166f890 ("net: ethernet: fec: Replace interrupt
      driven MDIO with polled IO") causing an extra MII_EVENT.
      
      This extra event desynchronizes all the MDIO register reads, causing them
      to complete too early. All reads then return 0 because
      fec_enet_mdio_wait() returns too early.

      When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads cause
      the PHY to be initialized incorrectly and the PHY will not transmit any
      ethernet signal in this state. It cannot be brought out of this state
      without a power cycle of the PHY.
      
      Fixes: 557d5dc8 ("net: fec: use mac-managed PHY PM")
      Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      [jernberg: commit message]
      Signed-off-by: John Ernberg <john.ernberg@actia.se>
      Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 03 Apr, 2024 7 commits
    • net: bcmgenet: Reset RBUF on first open · 0a6380cb
      Phil Elwell authored
      If the RBUF logic is not reset when the kernel starts then there
      may be some data left over from any network boot loader. If the
      64-byte packet headers are enabled then this can be fatal.
      
      Extend bcmgenet_dma_disable to perform the reset, but not when
      called from bcmgenet_resume in order to preserve a wake packet.
      
      N.B. This different handling of resume is just based on a hunch -
      why else wouldn't one reset the RBUF as well as the TBUF? If this
      isn't the case then it's easy to change the patch to make the RBUF
      reset unconditional.
      
      See: https://github.com/raspberrypi/linux/issues/3850
      See: https://github.com/raspberrypi/firmware/issues/1882
      Signed-off-by: Phil Elwell <phil@raspberrypi.com>
      Signed-off-by: Maarten Vanraes <maarten@rmail.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Add array index check · ef15ddee
      Aleksandr Mishin authored
      In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach
      values (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array.
      Fix this bug by adding 'iter' value check.
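
The kind of check described can be sketched in userspace as below. Only MAX_LMAC_COUNT = 8 and the example index 14 come from the commit message. The table and function name are hypothetical, and the real patch may handle an out-of-range index differently (for example by skipping the entry).

```c
#include <stdbool.h>

#define MAX_LMAC_COUNT 8 /* array size quoted in the commit message */

/* Store value at table[iter] only when iter is a valid index.
 * Returns false and writes nothing for an out-of-range index. */
bool lmac_table_set(int table[MAX_LMAC_COUNT], unsigned int iter, int value)
{
    if (iter >= MAX_LMAC_COUNT)
        return false;
    table[iter] = value;
    return true;
}
```

An index of 14 is rejected before the array is touched, while indices 0 to 7 are stored as usual.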
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 91c6945e ("octeontx2-af: cn10k: Add RPM MAC support")
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: mlx5: Add Tariq Toukan · c53fe72c
      Tariq Toukan authored
      Add myself as mlx5 core and EN maintainer.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Acked-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20240401184347.53884-1-tariqt@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: Fix infinite recursion in fib6_dump_done(). · d21d4060
      Kuniyuki Iwashima authored
      syzkaller reported infinite recursive calls of fib6_dump_done() during
      netlink socket destruction.  [1]
      
      From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
      the response was generated.  The following recvmmsg() resumed the dump
      for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
      to the fault injection.  [0]
      
        12:01:34 executing program 3:
        r0 = socket$nl_route(0x10, 0x3, 0x0)
        sendmsg$nl_route(r0, ... snip ...)
        recvmmsg(r0, ... snip ...) (fail_nth: 8)
      
      Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
      of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3].  syzkaller stopped
      receiving the response halfway through, and finally netlink_sock_destruct()
      called nlk_sk(sk)->cb.done().
      
      fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it
      is still not NULL.  fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by
      nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling
      itself recursively and hitting the stack guard page.
      
      To avoid the issue, let's set the destructor after kzalloc().
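
The callback chain described above can be reproduced with a small userspace model. Everything here is a simplification: the struct fields stand in for cb.done, cb.args[3] and the walker pointer, all names are hypothetical, and a depth limit replaces the real stack overflow so the sketch terminates.

```c
#include <stddef.h>

struct cb;
typedef int (*done_fn)(struct cb *);

struct cb {
    done_fn done;   /* models nlk_sk(sk)->cb.done */
    done_fn saved;  /* models nlk_sk(sk)->cb.args[3] */
    int has_walker; /* models a successfully allocated walker */
    int depth;      /* how many times the destructor has run */
};

#define DEPTH_LIMIT 16 /* stop the model instead of hitting the guard page */

/* Models fib6_dump_done(): restore the saved callback, then call it. */
int model_dump_done(struct cb *cb)
{
    if (++cb->depth >= DEPTH_LIMIT)
        return -1; /* the real code would keep recursing */
    cb->done = cb->saved;
    return cb->done ? cb->done(cb) : 0;
}

/* Models one dump call. alloc_ok says whether the allocation succeeds;
 * hook_after_alloc selects the fixed ordering. */
int model_dump(struct cb *cb, int alloc_ok, int hook_after_alloc)
{
    if (cb->has_walker)
        return 0;
    if (!hook_after_alloc) { /* old order: hook the destructor first */
        cb->saved = cb->done;
        cb->done = model_dump_done;
    }
    if (!alloc_ok)
        return -1; /* allocation failed */
    if (hook_after_alloc) {  /* fixed order: hook only after success */
        cb->saved = cb->done;
        cb->done = model_dump_done;
    }
    cb->has_walker = 1;
    return 0;
}
```

In the old order, a failed call followed by a successful one saves the destructor into its own "saved" slot, so invoking `done` recurses until the limit. In the fixed order the same sequence leaves the saved slot NULL and the destructor runs exactly once.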
      
      [0]:
      FAULT_INJECTION: forcing a failure.
      name failslab, interval 1, probability 0, space 0, times 0
      CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91-dirty #11
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl (lib/dump_stack.c:117)
       should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
       should_failslab (mm/slub.c:3733)
       kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
       inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
       rtnl_dump_all (net/core/rtnetlink.c:4029)
       netlink_dump (net/netlink/af_netlink.c:2269)
       netlink_recvmsg (net/netlink/af_netlink.c:1988)
       ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
       ___sys_recvmsg (net/socket.c:2846)
       do_recvmmsg (net/socket.c:2943)
       __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)
      
      [1]:
      BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb)
      stack guard page: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91-dirty #11
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: events netlink_sock_destruct_work
      RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570)
      Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff
      RSP: 0018:ffffc9000d980000 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3
      RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358
      RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000
      R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68
      FS:  0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0
      PKRU: 55555554
      Call Trace:
       <#DF>
       </#DF>
       <TASK>
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       ...
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
       netlink_sock_destruct (net/netlink/af_netlink.c:401)
       __sk_destruct (net/core/sock.c:2177 (discriminator 2))
       sk_destruct (net/core/sock.c:2224)
       __sk_free (net/core/sock.c:2235)
       sk_free (net/core/sock.c:2246)
       process_one_work (kernel/workqueue.c:3259)
       worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
       kthread (kernel/kthread.c:388)
       ret_from_fork (arch/x86/kernel/process.c:153)
       ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
      Modules linked in:
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240401211003.25274-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d · 5d872c9f
      Heiner Kallweit authored
      On some boards with this chip version the BIOS is buggy and fails
      to reset the PHY page selector. This results in the PHY ID read
      accessing registers on a different page, returning a more or
      less random value. Fix this by resetting the page selector first.
      
      Fixes: f1e911d5 ("r8169: add basic phylib support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/64f2055e-98b8-45ec-8568-665e3d54d4e6@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5d872c9f
    • Marco Pinna's avatar
      vsock/virtio: fix packet delivery to tap device · b32a09ea
      Marco Pinna authored
      Commit 82dfb540 ("VSOCK: Add virtio vsock vsockmon hooks") added
      virtio_transport_deliver_tap_pkt() for handing packets to the
      vsockmon device. However, in virtio_transport_send_pkt_work(),
      the function is called before actually sending the packet (i.e.
      before placing it in the virtqueue with virtqueue_add_sgs() and checking
      whether it returned successfully).
      Queuing the packet in the virtqueue can fail, even multiple times.
      However, in virtio_transport_deliver_tap_pkt() we deliver the packet
      to the monitoring tap interface only the first time we call it.
      This certainly avoids seeing the same packet replicated multiple times
      in the monitoring interface, but it can show the packet sent with the
      wrong timestamp or even before we succeed in queuing it in the virtqueue.
      
      Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
      and making sure it returned successfully.
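
      The reordering can be illustrated with a small user-space model. This is
      only a sketch, not the driver code: FlakyQueue, send_pkt_work() and the
      tap list are hypothetical stand-ins for the virtqueue,
      virtio_transport_send_pkt_work() and the vsockmon device.

```python
class FlakyQueue:
    """Hypothetical stand-in for a virtqueue that rejects the first N adds."""

    def __init__(self, failures):
        self.failures = failures
        self.items = []

    def add(self, pkt):
        # Return False while the queue is "full", mimicking a failed add.
        if self.failures > 0:
            self.failures -= 1
            return False
        self.items.append(pkt)
        return True


def send_pkt_work(pkt, queue, tap, max_attempts=10):
    """Retry queuing pkt; mirror it to the tap only after the add succeeds."""
    for _ in range(max_attempts):
        if queue.add(pkt):
            tap.append(pkt)  # delivered once, after the packet is really queued
            return True
    return False
```

      In this model a packet that is never queued never reaches the tap, and a
      packet that is queued on a later attempt is seen by the tap at that
      moment rather than at the first attempt.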
      
      Fixes: 82dfb540 ("VSOCK: Add virtio vsock vsockmon hooks")
      Cc: stable@vge.kernel.org
      Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b32a09ea
    • Duoming Zhou's avatar
      ax25: fix use-after-free bugs caused by ax25_ds_del_timer · fd819ad3
      Duoming Zhou authored
      When the ax25 device is detaching, ax25_dev_device_down()
      calls ax25_ds_del_timer() to clean up the slave_timer. If
      the timer handler is running at that point, ax25_ds_del_timer(),
      which calls del_timer(), returns immediately without waiting
      for the handler to finish. As a result, use-after-free bugs
      can occur. One scenario is shown below:
      
            (Thread 1)          |      (Thread 2)
                                | ax25_ds_timeout()
      ax25_dev_device_down()    |
        ax25_ds_del_timer()     |
          del_timer()           |
        ax25_dev_put() //FREE   |
                                |  ax25_dev-> //USE
      
      In order to mitigate bugs, when the device is detaching, use
      timer_shutdown_sync() to stop the timer.
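
      The difference between cancelling a timer and cancelling it
      synchronously can be modelled in user space. The sketch below is an
      analogy only, not kernel code: Device, timer_handler(), device_down()
      and run() are hypothetical names, threading.Timer.cancel() plays the
      role of del_timer(), and the extra join() plays the role of the waiting
      done by timer_shutdown_sync().

```python
import threading
import time


class Device:
    """Hypothetical stand-in for ax25_dev; `freed` models the final put."""

    def __init__(self):
        self.freed = False
        self.used_after_free = False


def timer_handler(dev, started):
    started.set()
    time.sleep(0.05)  # handler is still running while teardown begins
    if dev.freed:
        dev.used_after_free = True  # models the ax25_dev-> access after free


def device_down(dev, timer, wait_for_handler):
    timer.cancel()  # like del_timer(): no effect on a handler already running
    if wait_for_handler:
        timer.join()  # the "_sync" part: wait until the handler has returned
    dev.freed = True


def run(wait_for_handler):
    dev = Device()
    started = threading.Event()
    timer = threading.Timer(0.0, timer_handler, args=(dev, started))
    timer.start()
    started.wait()  # make sure teardown races with a running handler
    device_down(dev, timer, wait_for_handler)
    timer.join()
    return dev.used_after_free
```

      With wait_for_handler=True the handler always finishes before the device
      is marked freed. The model does not cover the other property of
      timer_shutdown_sync(), namely that the timer cannot be rearmed afterwards.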
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fd819ad3
  3. 02 Apr, 2024 7 commits
  4. 29 Mar, 2024 12 commits
    • Jakub Kicinski's avatar
      Merge tag 'for-net-2024-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 365af7ac
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Bluetooth: Fix TOCTOU in HCI debugfs implementation
       - Bluetooth: hci_event: set the conn encrypted before conn establishes
       - Bluetooth: qca: fix device-address endianness
       - Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync
      
      * tag 'for-net-2024-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: Fix TOCTOU in HCI debugfs implementation
        Bluetooth: hci_event: set the conn encrypted before conn establishes
        Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync
        Bluetooth: qca: fix device-address endianness
        Bluetooth: add quirk for broken address properties
        arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
        dt-bindings: bluetooth: add 'qcom,local-bd-address-broken'
        Revert "Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT"
      ====================
      
      Link: https://lore.kernel.org/r/20240329140453.2016486-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      365af7ac
    • Jakub Kicinski's avatar
      Merge branch 'tcp-fix-bind-regression-and-more-tests' · ec7ef3ea
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      tcp: Fix bind() regression and more tests.
      
      bhash2 has not been well tested for the IPV6_V6ONLY option.
      
      This series fixes two regressions around IPV6_V6ONLY, one of which
      has been there since the introduction of bhash2, and the other was
      introduced by a recent change.
      
      Also, this series adds as many tests as possible to catch regressions
      easily.  The baseline is 28044fc1~, which is the pre-bhash2 commit.
      
       Tested on 28044fc1~:
        # PASSED: 132 / 132 tests passed.
        # Totals: pass:132 fail:0 xfail:0 xpass:0 skip:0 error:0
      
       net.git:
        # FAILED: 125 / 132 tests passed.
        # Totals: pass:125 fail:7 xfail:0 xpass:0 skip:0 error:0
      
       With this series:
        # PASSED: 132 / 132 tests passed.
        # Totals: pass:132 fail:0 xfail:0 xpass:0 skip:0 error:0
      
      v1: https://lore.kernel.org/netdev/20240325181923.48769-1-kuniyu@amazon.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240326204251.51301-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ec7ef3ea
    • Kuniyuki Iwashima's avatar
      selftest: tcp: Add bind() tests for SO_REUSEADDR/SO_REUSEPORT. · 7679f096
      Kuniyuki Iwashima authored
      This patch adds two tests using SO_REUSEADDR and SO_REUSEPORT and
      defines errno for each test case.
      
      SO_REUSEADDR/SO_REUSEPORT is set for the two per-fixture bind()
      calls.
      
      The notable pattern is the pair of v6only [::] and plain [::].
      The two sockets are put into the same tb2, where a per-bucket v6only
      flag would be useless for detecting bind() conflicts.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-9-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7679f096
    • Kuniyuki Iwashima's avatar
      selftest: tcp: Add bind() tests for IPV6_V6ONLY. · d37f2f72
      Kuniyuki Iwashima authored
      bhash2 was not well tested for IPv6-only sockets.
      
      This patch adds test cases where we set IPV6_V6ONLY for per-fixture
      bind() calls if variant->ipv6_only[i] is true.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-8-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d37f2f72
    • Kuniyuki Iwashima's avatar
      selftest: tcp: Add more bind() calls. · f40742c2
      Kuniyuki Iwashima authored
      In addition to the two addresses defined in the fixtures, this patch
      adds 6 more bind() calls:
      
        * 0.0.0.0
        * 127.0.0.1
        * ::
        * ::1
        * ::ffff:0.0.0.0
        * ::ffff:127.0.0.1
      
      The first two per-fixture bind() calls control how inet_bind2_bucket
      is created, and the remaining 6 bind() calls cover as many conflicting
      patterns as possible.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-7-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f40742c2
    • Kuniyuki Iwashima's avatar
      selftest: tcp: Add v4-v4 and v6-v6 bind() conflict tests. · 5e9e9afd
      Kuniyuki Iwashima authored
      We don't have bind() conflict tests for the same protocol pairs.
      
      Let's add them except for the same address pair, which will be
      covered by the following patch adding 6 more bind() calls for
      each test case.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-6-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5e9e9afd
    • Kuniyuki Iwashima's avatar
      selftest: tcp: Define the reverse order bind() tests explicitly. · 6f9bc755
      Kuniyuki Iwashima authored
      Currently, bind_wildcard.c calls bind() twice for two addresses and
      checks the pre-defined errno against the 2nd call.  Also, the two
      bind() calls are swapped to cover various patterns of how bind buckets
      are created.
      
      However, only testing two addresses is insufficient to detect regressions.
      So, we will add more bind() calls, and then, we need to define a different
      errno for each bind() per test case.
      
      As preparation, let's define the reverse order bind() test cases as
      fixtures.
      
      No functional changes are intended.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-5-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6f9bc755
    • Kuniyuki Iwashima's avatar
      selftest: tcp: Make bind() selftest flexible. · c48baf56
      Kuniyuki Iwashima authored
      Currently, bind_wildcard.c tests only (IPv4, IPv6) pairs, but we will
      add more tests for the same protocol pairs.
      
      This patch makes it possible by changing the address pointer to void.
      
      No functional changes are intended.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-4-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c48baf56
    • Kuniyuki Iwashima's avatar
      tcp: Fix bind() regression for v6-only wildcard and v4(-mapped-v6) non-wildcard addresses. · d91ef1e1
      Kuniyuki Iwashima authored
      Jianguo Wu reported another bind() regression introduced by bhash2.
      
      When bind() is called for the following 3 addresses on the same port,
      the 3rd call should fail but now succeeds.
      
        1. 0.0.0.0 or ::ffff:0.0.0.0
        2. [::] w/ IPV6_V6ONLY
        3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
      
      The first two bind() calls create tb2 buckets like this:
      
        bhash2 -> tb2(:: w/ IPV6_V6ONLY) -> tb2(0.0.0.0)
      
      The 3rd bind() will match the IPv6-only wildcard address bucket
      in inet_bind2_bucket_match_addr_any(); however, no conflicting socket
      exists in that bucket.  So, inet_bhash2_conflict() returns false,
      and thus inet_bhash2_addr_any_conflict() returns false as well.
      
      As a result, the 3rd bind() bypasses conflict check, which should be
      done against the IPv4 wildcard address bucket.
      
      So, in inet_bhash2_addr_any_conflict(), we must iterate over all buckets.
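
      The reasoning can be checked with a toy model of the bucket list. This
      is a simplified sketch, not the kernel logic: Sock, Bucket and the two
      addr_any_conflict_*() helpers are hypothetical, and the model ignores
      ports, SO_REUSEADDR/SO_REUSEPORT and everything else except the
      IPV6_V6ONLY flag.

```python
from dataclasses import dataclass, field


@dataclass
class Sock:
    family: str          # "v4" or "v6", kept only to label the socket
    v6only: bool = False


@dataclass
class Bucket:
    addr: str            # "::" or "0.0.0.0" for wildcard buckets
    socks: list = field(default_factory=list)


def conflicts_with_v4(bucket):
    """A v4 (or v4-mapped-v6) bind conflicts unless the owner is v6-only."""
    return any(not s.v6only for s in bucket.socks)


def is_wildcard(bucket):
    return bucket.addr in ("::", "0.0.0.0")


def addr_any_conflict_first_match(buckets):
    """Behaviour before the fix, as described: stop at the first wildcard bucket."""
    for b in buckets:
        if is_wildcard(b):
            return conflicts_with_v4(b)
    return False


def addr_any_conflict_all(buckets):
    """Behaviour after the fix, as described: check every wildcard bucket."""
    return any(is_wildcard(b) and conflicts_with_v4(b) for b in buckets)


# State after the first two bind() calls: the v6-only [::] bucket comes first.
buckets = [
    Bucket("::", [Sock("v6", v6only=True)]),
    Bucket("0.0.0.0", [Sock("v4")]),
]
```

      In this model the first-match variant reports no conflict for the 3rd
      bind(), while the variant that visits every wildcard bucket finds the
      0.0.0.0 socket and reports one.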
      
      Note that we cannot add an ipv6_only flag to inet_bind2_bucket, as it
      would confuse the following pattern.
      
        1. [::] w/ SO_REUSE{ADDR,PORT} and IPV6_V6ONLY
        2. [::] w/ SO_REUSE{ADDR,PORT}
        3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
      
      The first bind() would create a bucket with ipv6_only flag true,
      the second bind() would add the [::] socket into the same bucket,
      and the third bind() could succeed based on the wrong assumption
      that ipv6_only bucket would not conflict with v4(-mapped-v6) address.
      
      Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
      Diagnosed-by: Jianguo Wu <wujianguo106@163.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-3-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d91ef1e1
    • Kuniyuki Iwashima's avatar
      tcp: Fix bind() regression for v6-only wildcard and v4-mapped-v6 non-wildcard addresses. · ea111449
      Kuniyuki Iwashima authored
      Commit 5e07e672 ("tcp: Use bhash2 for v4-mapped-v6 non-wildcard
      address.") introduced bind() regression for v4-mapped-v6 address.
      
      When we bind() the following two addresses on the same port, the 2nd
      bind() should succeed but fails now.
      
        1. [::] w/ IPV6_V6ONLY
        2. ::ffff:127.0.0.1
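
      The two-step sequence above can be tried from user space. The following
      is a minimal sketch, assuming a Linux host with IPv6 enabled;
      bind_sequence() is a hypothetical helper, not part of the selftest. It
      lets the kernel pick a free port for the first bind() and reuses it for
      the second.

```python
import errno
import socket


def bind_sequence(steps):
    """Bind each (family, address, v6only) step on one shared port.

    Returns one entry per step: None on success, else the errno value.
    """
    results, socks, port = [], [], 0
    for family, addr, v6only in steps:
        try:
            s = socket.socket(family, socket.SOCK_STREAM)
            socks.append(s)
            if family == socket.AF_INET6:
                s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, int(v6only))
            s.bind((addr, port))
            port = s.getsockname()[1]  # reuse the port the kernel picked
            results.append(None)
        except OSError as e:
            results.append(e.errno)
    for s in socks:
        s.close()
    return results


steps = [
    (socket.AF_INET6, "::", True),                 # 1. [::] with IPV6_V6ONLY
    (socket.AF_INET6, "::ffff:127.0.0.1", False),  # 2. v4-mapped-v6 address
]
results = bind_sequence(steps)
# Print symbolic errno names where available, e.g. EADDRINUSE.
print([r if r is None else errno.errorcode.get(r, r) for r in results])
```

      According to the description above, the second entry should be None
      (success) on a kernel with this fix, while an affected kernel is
      expected to report a conflict for it.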
      
      After the change, v4-mapped-v6 uses bhash2 instead of bhash to
      detect conflict faster, but I forgot to add a necessary change.
      
      During the 2nd bind(), inet_bind2_bucket_match_addr_any() returns
      the tb2 bucket of [::], and inet_bhash2_conflict() finally calls
      inet_bind_conflict(), which returns true, meaning conflict.
      
        inet_bhash2_addr_any_conflict
        |- inet_bind2_bucket_match_addr_any  <-- return [::] bucket
        `- inet_bhash2_conflict
           `- __inet_bhash2_conflict <-- checks IPV6_V6ONLY for AF_INET
              |                          but not for v4-mapped-v6 address
              `- inet_bind_conflict  <-- does not check address
      
      inet_bind_conflict() does not check socket addresses because
      __inet_bhash2_conflict() is expected to do so.
      
      However, __inet_bhash2_conflict() checks the IPV6_V6ONLY attribute
      only for AF_INET sockets, and not for v4-mapped-v6 addresses.
      
      As a result, v4-mapped-v6 address conflicts with v6-only wildcard
      address.
      
      To avoid that, let's add the missing test to use bhash2 for
      v4-mapped-v6 address.
      
      Fixes: 5e07e672 ("tcp: Use bhash2 for v4-mapped-v6 non-wildcard address.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240326204251.51301-2-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ea111449
    • Eric Dumazet's avatar
      erspan: make sure erspan_base_hdr is present in skb->head · 17af4205
      Eric Dumazet authored
      syzbot reported a problem in ip6erspan_rcv() [1]
      
      The issue is that ip6erspan_rcv() (and erspan_rcv()) no longer make
      sure erspan_base_hdr is present in the skb linear part (skb->head)
      before getting the @ver field from it.
      
      Add the missing pskb_may_pull() calls.
      
      v2: Reload iph pointer in erspan_rcv() after pskb_may_pull()
          because skb->head might have changed.
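
      The idea behind the check can be shown with a toy packet buffer. This is
      a user-space sketch under stated assumptions, not kernel code: Skb,
      may_pull() and erspan_version() are hypothetical names, the base header
      is assumed to be 4 bytes on the wire, and the version is assumed to sit
      in the top 4 bits of its first byte.

```python
ERSPAN_BASE_HDR_LEN = 4  # assumed size of the base header on the wire


class Skb:
    """Toy packet buffer: a linear part plus paged fragments (hypothetical)."""

    def __init__(self, head, frags=()):
        self.head = bytearray(head)
        self.frags = [bytes(f) for f in frags]

    def may_pull(self, n):
        """Ensure at least n bytes are in the linear part, in the spirit of pskb_may_pull()."""
        while len(self.head) < n and self.frags:
            need = n - len(self.head)
            frag = self.frags.pop(0)
            self.head += frag[:need]
            if frag[need:]:
                self.frags.insert(0, frag[need:])
        return len(self.head) >= n


def erspan_version(skb, offset):
    """Return the ERSPAN version nibble, or None if the header is truncated."""
    if not skb.may_pull(offset + ERSPAN_BASE_HDR_LEN):
        return None  # drop: the base header is not fully present
    return skb.head[offset] >> 4  # version: top 4 bits of the first byte
```

      Reading the version without the may_pull() step would look at bytes that
      are not in the linear part. In the kernel the pull may also move
      skb->head, which is why the v2 note reloads the iph pointer afterwards.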
      
      [1]
      
       BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
       BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2756 [inline]
       BUG: KMSAN: uninit-value in ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
       BUG: KMSAN: uninit-value in gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
        pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
        pskb_may_pull include/linux/skbuff.h:2756 [inline]
        ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
        gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
        ip6_protocol_deliver_rcu+0x1d4c/0x2ca0 net/ipv6/ip6_input.c:438
        ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
        ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
        dst_input include/net/dst.h:460 [inline]
        ip6_rcv_finish+0x955/0x970 net/ipv6/ip6_input.c:79
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ipv6_rcv+0xde/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5652
        netif_receive_skb_internal net/core/dev.c:5738 [inline]
        netif_receive_skb+0x58/0x660 net/core/dev.c:5798
        tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1549
        tun_get_user+0x5566/0x69e0 drivers/net/tun.c:2002
        tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
        call_write_iter include/linux/fs.h:2108 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xb63/0x1520 fs/read_write.c:590
        ksys_write+0x20f/0x4c0 fs/read_write.c:643
        __do_sys_write fs/read_write.c:655 [inline]
        __se_sys_write fs/read_write.c:652 [inline]
        __x64_sys_write+0x93/0xe0 fs/read_write.c:652
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:3804 [inline]
        slab_alloc_node mm/slub.c:3845 [inline]
        kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
        __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
        alloc_skb include/linux/skbuff.h:1318 [inline]
        alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
        sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
        tun_alloc_skb drivers/net/tun.c:1525 [inline]
        tun_get_user+0x209a/0x69e0 drivers/net/tun.c:1846
        tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
        call_write_iter include/linux/fs.h:2108 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xb63/0x1520 fs/read_write.c:590
        ksys_write+0x20f/0x4c0 fs/read_write.c:643
        __do_sys_write fs/read_write.c:655 [inline]
        __se_sys_write fs/read_write.c:652 [inline]
        __x64_sys_write+0x93/0xe0 fs/read_write.c:652
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      CPU: 1 PID: 5045 Comm: syz-executor114 Not tainted 6.9.0-rc1-syzkaller-00021-g96249052 #0
      
      Fixes: cb73ee40 ("net: ip_gre: use erspan key field for tunnel lookup")
      Reported-by: syzbot+1c1cf138518bf0c53d68@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/netdev/000000000000772f2c0614b66ef7@google.com/
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/20240328112248.1101491-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      17af4205
    • Atlas Yu's avatar
      r8169: skip DASH fw status checks when DASH is disabled · 5e864d90
      Atlas Yu authored
      On devices that support DASH, the current code in the "rtl_loop_wait" function
      raises false alarms when DASH is disabled. This occurs because the function
      attempts to wait for the DASH firmware to be ready, even though it's not
      relevant in this case.
      
      r8169 0000:0c:00.0 eth0: RTL8168ep/8111ep, 38:7c:76:49:08:d9, XID 502, IRQ 86
      r8169 0000:0c:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
      r8169 0000:0c:00.0 eth0: DASH disabled
      ...
      r8169 0000:0c:00.0 eth0: rtl_ep_ocp_read_cond == 0 (loop: 30, delay: 10000).
      
      This patch modifies the driver start/stop functions to skip checking the DASH
      firmware status when DASH is explicitly disabled. This prevents unnecessary
      delays and false alarms.
      
      The patch has been tested on several ThinkStation P8/PX workstations.
      
      Fixes: 0ab0c45d ("r8169: add handling DASH when DASH is disabled")
      Signed-off-by: Atlas Yu <atlas.yu@canonical.com>
      Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/20240328055152.18443-1-atlas.yu@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5e864d90