1. 02 Nov, 2022 10 commits
    • David S. Miller's avatar
      Merge branch 'misdn-fixes' · ba9169f5
      David S. Miller authored
      Yang Yingliang says:
      
      ====================
      two fixes for mISDN
      
      This patchset fixes two issues when device_add() returns error.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba9169f5
    • Yang Yingliang's avatar
      isdn: mISDN: netjet: fix wrong check of device registration · bf00f542
      Yang Yingliang authored
      The class is set in mISDN_register_device(), but if device_add() returns
      error, it will lead to delete a device without added, fix this by using
      device_is_registered() to check if the device is registered.
      
      Fixes: a900845e ("mISDN: Add support for Traverse Technologies NETJet PCI cards")
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf00f542
    • Yang Yingliang's avatar
      mISDN: fix possible memory leak in mISDN_register_device() · e7d1d4d9
      Yang Yingliang authored
      Afer commit 1fa5ae85 ("driver core: get rid of struct device's
      bus_id string array"), the name of device is allocated dynamically,
      add put_device() to give up the reference, so that the name can be
      freed in kobject_cleanup() when the refcount is 0.
      
      Set device class before put_device() to avoid null release() function
      WARN message in device_release().
      
      Fixes: 1fa5ae85 ("driver core: get rid of struct device's bus_id string array")
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7d1d4d9
    • Zhang Qilong's avatar
      rose: Fix NULL pointer dereference in rose_send_frame() · e97c089d
      Zhang Qilong authored
      The syzkaller reported an issue:
      
      KASAN: null-ptr-deref in range [0x0000000000000380-0x0000000000000387]
      CPU: 0 PID: 4069 Comm: kworker/0:15 Not tainted 6.0.0-syzkaller-02734-g0326074f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Workqueue: rcu_gp srcu_invoke_callbacks
      RIP: 0010:rose_send_frame+0x1dd/0x2f0 net/rose/rose_link.c:101
      Call Trace:
       <IRQ>
       rose_transmit_clear_request+0x1d5/0x290 net/rose/rose_link.c:255
       rose_rx_call_request+0x4c0/0x1bc0 net/rose/af_rose.c:1009
       rose_loopback_timer+0x19e/0x590 net/rose/rose_loopback.c:111
       call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474
       expire_timers kernel/time/timer.c:1519 [inline]
       __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790
       __run_timers kernel/time/timer.c:1768 [inline]
       run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803
       __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
       [...]
       </IRQ>
      
      It triggers NULL pointer dereference when 'neigh->dev->dev_addr' is
      called in the rose_send_frame(). It's the first occurrence of the
      `neigh` is in rose_loopback_timer() as `rose_loopback_neigh', and
      the 'dev' in 'rose_loopback_neigh' is initialized sa nullptr.
      
      It had been fixed by commit 3b3fd068
      ("rose: Fix Null pointer dereference in rose_send_frame()") ever.
      But it's introduced by commit 3c53cd65
      ("rose: check NULL rose_loopback_neigh->loopback") again.
      
      We fix it by add NULL check in rose_transmit_clear_request(). When
      the 'dev' in 'neigh' is NULL, we don't reply the request and just
      clear it.
      
      syzkaller don't provide repro, and I provide a syz repro like:
      r0 = syz_init_net_socket$bt_sco(0x1f, 0x5, 0x2)
      ioctl$sock_inet_SIOCSIFFLAGS(r0, 0x8914, &(0x7f0000000180)={'rose0\x00', 0x201})
      r1 = syz_init_net_socket$rose(0xb, 0x5, 0x0)
      bind$rose(r1, &(0x7f00000000c0)=@full={0xb, @dev, @null, 0x0, [@null, @null, @netrom, @netrom, @default, @null]}, 0x40)
      connect$rose(r1, &(0x7f0000000240)=@short={0xb, @dev={0xbb, 0xbb, 0xbb, 0x1, 0x0}, @remote={0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x1}, 0x1, @netrom={0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0x0, 0x0}}, 0x1c)
      
      Fixes: 3c53cd65 ("rose: check NULL rose_loopback_neigh->loopback")
      Signed-off-by: default avatarZhang Qilong <zhangqilong3@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e97c089d
    • Florian Westphal's avatar
      netlink: introduce bigendian integer types · ecaf75ff
      Florian Westphal authored
      Jakub reported that the addition of the "network_byte_order"
      member in struct nla_policy increases size of 32bit platforms.
      
      Instead of scraping the bit from elsewhere Johannes suggested
      to add explicit NLA_BE types instead, so do this here.
      
      NLA_POLICY_MAX_BE() macro is removed again, there is no need
      for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing.
      
      NLA_BE64 can be added later.
      
      Fixes: 08724ef6 ("netlink: introduce NLA_POLICY_MAX_BE")
      Reported-by: default avatarJakub Kicinski <kuba@kernel.org>
      Suggested-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.deSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ecaf75ff
    • Horatiu Vultur's avatar
      net: lan966x: Fix unmapping of received frames using FDMA · fc57062f
      Horatiu Vultur authored
      When lan966x was receiving a frame, then it was building the skb and
      after that it was calling dma_unmap_single with frame size as the
      length. This actually has 2 issues:
      1. It is using a length to map and a different length to unmap.
      2. When the unmap was happening, the data was sync for cpu but it could
         be that this will overwrite what build_skb was initializing.
      
      The fix for these two problems is to change the order of operations.
      First to sync the frame for cpu, then to build the skb and in the end to
      unmap using the correct size but without sync the frame again for cpu.
      
      Fixes: c8349639 ("net: lan966x: Add FDMA functionality")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Link: https://lore.kernel.org/r/20221031133421.1283196-1-horatiu.vultur@microchip.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fc57062f
    • Jakub Kicinski's avatar
      Merge branch 'net-lan966x-fixes-for-when-mtu-is-changed' · 70644f72
      Jakub Kicinski authored
      Horatiu Vultur says:
      
      ====================
      net: lan966x: Fixes for when MTU is changed
      
      There were multiple problems in different parts of the driver when
      the MTU was changed.
      The first problem was that the HW was missing to configure the correct
      value, it was missing ETH_HLEN and ETH_FCS_LEN. The second problem was
      when vlan filtering was enabled/disabled, the MRU was not adjusted
      corretly. While the last issue was that the FDMA was calculated wrongly
      the correct maximum MTU.
      ====================
      
      Link: https://lore.kernel.org/r/20221030213636.1031408-1-horatiu.vultur@microchip.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      70644f72
    • Horatiu Vultur's avatar
      net: lan966x: Fix FDMA when MTU is changed · 872ad758
      Horatiu Vultur authored
      When MTU is changed, FDMA is required to calculate what is the maximum
      size of the frame that it can received. So it can calculate what is the
      page order needed to allocate for the received frames.
      The first problem was that, when the max MTU was calculated it was
      reading the value from dev and not from HW, so in this way it was
      missing L2 header + the FCS.
      The other problem was that once the skb is created using
      __build_skb_around, it would reserve some space for skb_shared_info.
      So if we received a frame which size is at the limit of the page order
      then the creating will failed because it would not have space to put all
      the data.
      
      Fixes: 2ea1cbac ("net: lan966x: Update FDMA to change MTU.")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      872ad758
    • Horatiu Vultur's avatar
      net: lan966x: Adjust maximum frame size when vlan is enabled/disabled · 25f28bb1
      Horatiu Vultur authored
      When vlan filtering is enabled/disabled, it is required to adjust the
      maximum received frame size that it can received. When vlan filtering is
      enabled, it would all to receive extra 4 bytes, that are the vlan tag.
      So the maximum frame size would be 1522 with a vlan tag. If vlan
      filtering is disabled then the maximum frame size would be 1518
      regardless if there is or not a vlan tag.
      
      Fixes: 6d2c186a ("net: lan966x: Add vlan support.")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      25f28bb1
    • Horatiu Vultur's avatar
      net: lan966x: Fix the MTU calculation · 486c2922
      Horatiu Vultur authored
      When the MTU was changed, the lan966x didn't take in consideration
      the L2 header and the FCS. So the HW was configured with a smaller
      value than what was desired. Therefore the correct value to configure
      the HW would be new_mtu + ETH_HLEN + ETH_FCS_LEN.
      The vlan tag is not considered here, because at the time when the
      blamed commit was added, there was no vlan filtering support. The
      vlan fix will be part of the next patch.
      
      Fixes: d28d6d2e ("net: lan966x: add port module support")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      486c2922
  2. 01 Nov, 2022 3 commits
    • Christophe JAILLET's avatar
      sfc: Fix an error handling path in efx_pci_probe() · 6c412da5
      Christophe JAILLET authored
      If an error occurs after the first kzalloc() the corresponding memory
      allocation is never freed.
      
      Add the missing kfree() in the error handling path, as already done in the
      remove() function.
      
      Fixes: 7e773594 ("sfc: Separate efx_nic memory from net_device memory")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: default avatarMartin Habets <habetsm.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/dc114193121c52c8fa3779e49bdd99d4b41344a9.1667077009.git.christophe.jaillet@wanadoo.frSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6c412da5
    • Ziyang Xuan's avatar
      net: tun: fix bugs for oversize packet when napi frags enabled · 363a5328
      Ziyang Xuan authored
      Recently, we got two syzkaller problems because of oversize packet
      when napi frags enabled.
      
      One of the problems is because the first seg size of the iov_iter
      from user space is very big, it is 2147479538 which is bigger than
      the threshold value for bail out early in __alloc_pages(). And
      skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc
      reserves without __GFP_NOWARN flag. Thus we got a warning as following:
      
      ========================================================
      WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
      ...
      Call trace:
       __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
       __alloc_pages_node include/linux/gfp.h:550 [inline]
       alloc_pages_node include/linux/gfp.h:564 [inline]
       kmalloc_large_node+0x94/0x350 mm/slub.c:4038
       __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
       __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
       pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
       __skb_grow include/linux/skbuff.h:2779 [inline]
       tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
       tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
       tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
      
      The other problem is because odd IPv6 packets without NEXTHDR_NONE
      extension header and have big packet length, it is 2127925 which is
      bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in
      ipv6_gro_receive(), network_header offset and transport_header offset
      are all bigger than U16_MAX. That would trigger skb->network_header
      and skb->transport_header overflow error, because they are all '__u16'
      type. Eventually, it would affect the value for __skb_push(skb, value),
      and make it be a big value. After __skb_push() in ipv6_gro_receive(),
      skb->data would less than skb->head, an out of bounds memory bug occurred.
      That would trigger the problem as following:
      
      ==================================================================
      BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260
      ...
      Call trace:
       dump_backtrace+0xd8/0x130
       show_stack+0x1c/0x50
       dump_stack_lvl+0x64/0x7c
       print_address_description.constprop.0+0xbc/0x2e8
       print_report+0x100/0x1e4
       kasan_report+0x80/0x120
       __asan_load8+0x78/0xa0
       eth_type_trans+0x100/0x260
       napi_gro_frags+0x164/0x550
       tun_get_user+0xda4/0x1270
       tun_chr_write_iter+0x74/0x130
       do_iter_readv_writev+0x130/0x1ec
       do_iter_write+0xbc/0x1e0
       vfs_writev+0x13c/0x26c
      
      To fix the problems, restrict the packet size less than
      (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved
      skb space in napi_alloc_skb() because transport_header is an offset from
      skb->head. Add len check in tun_napi_alloc_frags() simply.
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      363a5328
    • Rick Lindsley's avatar
      ibmvnic: change maintainers for vnic driver · e230d36f
      Rick Lindsley authored
      Changed maintainers for vnic driver, since Dany has new responsibilities.
      Also added Nick Child as reviewer.
      Signed-off-by: default avatarRick Lindsley <ricklind@us.ibm.com>
      Link: https://lore.kernel.org/r/20221028203509.4070154-1-ricklind@us.ibm.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e230d36f
  3. 31 Oct, 2022 2 commits
  4. 30 Oct, 2022 5 commits
  5. 29 Oct, 2022 10 commits
  6. 28 Oct, 2022 2 commits
    • Radhey Shyam Pandey's avatar
      net: emaclite: update reset_lock member documentation · 8fdf3f6a
      Radhey Shyam Pandey authored
      Instead of generic description, mention what reset_lock actually
      protects i.e. lock to serialize xmit and tx_timeout execution.
      Signed-off-by: default avatarRadhey Shyam Pandey <radhey.shyam.pandey@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8fdf3f6a
    • Chen Zhongjin's avatar
      net: dsa: Fix possible memory leaks in dsa_loop_init() · 633efc8b
      Chen Zhongjin authored
      kmemleak reported memory leaks in dsa_loop_init():
      
      kmemleak: 12 new suspected memory leaks
      
      unreferenced object 0xffff8880138ce000 (size 2048):
        comm "modprobe", pid 390, jiffies 4295040478 (age 238.976s)
        backtrace:
          [<000000006a94f1d5>] kmalloc_trace+0x26/0x60
          [<00000000a9c44622>] phy_device_create+0x5d/0x970
          [<00000000d0ee2afc>] get_phy_device+0xf3/0x2b0
          [<00000000dca0c71f>] __fixed_phy_register.part.0+0x92/0x4e0
          [<000000008a834798>] fixed_phy_register+0x84/0xb0
          [<0000000055223fcb>] dsa_loop_init+0xa9/0x116 [dsa_loop]
          ...
      
      There are two reasons for memleak in dsa_loop_init().
      
      First, fixed_phy_register() create and register phy_device:
      
      fixed_phy_register()
        get_phy_device()
          phy_device_create() # freed by phy_device_free()
        phy_device_register() # freed by phy_device_remove()
      
      But fixed_phy_unregister() only calls phy_device_remove().
      So the memory allocated in phy_device_create() is leaked.
      
      Second, when mdio_driver_register() fail in dsa_loop_init(),
      it just returns and there is no cleanup for phydevs.
      
      Fix the problems by catching the error of mdio_driver_register()
      in dsa_loop_init(), then calling both fixed_phy_unregister() and
      phy_device_free() to release phydevs.
      Also add a function for phydevs cleanup to avoid duplacate.
      
      Fixes: 98cd1552 ("net: dsa: Mock-up driver")
      Signed-off-by: default avatarChen Zhongjin <chenzhongjin@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      633efc8b
  7. 27 Oct, 2022 8 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 23758867
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from 802.15.4 (Zigbee et al).
      
        Current release - regressions:
      
         - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1
      
        Current release - new code bugs:
      
         - mptcp: fix abba deadlock on fastopen
      
         - eth: stmmac: rk3588: allow multiple gmac controllers in one system
      
        Previous releases - regressions:
      
         - ip: rework the fix for dflt addr selection for connected nexthop
      
         - net: couple more fixes for misinterpreting bits in struct page
           after the signature was added
      
        Previous releases - always broken:
      
         - ipv6: ensure sane device mtu in tunnels
      
         - openvswitch: switch from WARN to pr_warn on a user-triggerable path
      
         - ethtool: eeprom: fix null-deref on genl_info in dump
      
         - ieee802154: more return code fixes for corner cases in
           dgram_sendmsg
      
         - mac802154: fix link-quality-indicator recording
      
         - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack
           offload
      
         - eth: fec: limit register access on i.MX6UL
      
         - eth: bcm4908_enet: update TX stats after actual transmission
      
         - can: rcar_canfd: improve IRQ handling for RZ/G2L
      
        Misc:
      
         - genetlink: piggy back on the newly added resv_op_start to enforce
           more sanity checks on new commands"
      
      * tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
        net: enetc: survive memory pressure without crashing
        kcm: do not sense pfmemalloc status in kcm_sendpage()
        net: do not sense pfmemalloc status in skb_append_pagefrags()
        net/mlx5e: Fix macsec sci endianness at rx sa update
        net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
        net/mlx5e: Fix macsec rx security association (SA) update/delete
        net/mlx5e: Fix macsec coverity issue at rx sa update
        net/mlx5: Fix crash during sync firmware reset
        net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
        net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
        net/mlx5e: TC, Reject forwarding from internal port to internal port
        net/mlx5: Fix possible use-after-free in async command interface
        net/mlx5: ASO, Create the ASO SQ with the correct timestamp format
        net/mlx5e: Update restore chain id for slow path packets
        net/mlx5e: Extend SKB room check to include PTP-SQ
        net/mlx5: DR, Fix matcher disconnect error flow
        net/mlx5: Wait for firmware to enable CRS before pci_restore_state
        net/mlx5e: Do not increment ESN when updating IPsec ESN state
        netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
        netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed
        ...
      23758867
    • Linus Torvalds's avatar
      Merge tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 7dd257d0
      Linus Torvalds authored
      Pull execve fixes from Kees Cook:
      
       - Fix an ancient signal action copy race (Bernd Edlinger)
      
       - Fix a memory leak in ELF loader, when under memory pressure (Li
         Zetao)
      
      * tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        fs/binfmt_elf: Fix memory leak in load_elf_binary()
        exec: Copy oldsighand->action under spin-lock
      7dd257d0
    • Linus Torvalds's avatar
      Merge tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 2eb72d85
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - Fix older Clang vs recent overflow KUnit test additions (Nick
         Desaulniers, Kees Cook)
      
       - Fix kern-doc visibility for overflow helpers (Kees Cook)
      
      * tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        overflow: Refactor test skips for Clang-specific issues
        overflow: disable failing tests for older clang versions
        overflow: Fix kern-doc markup for functions
      2eb72d85
    • Linus Torvalds's avatar
      Merge tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 7f9a7cd6
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "A bunch of patches addressing issues in the vivid driver and adding
        new checks in V4L2 to validate the input parameters from some ioctls"
      
      * tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: vivid.rst: loop_video is set on the capture devnode
        media: vivid: set num_in/outputs to 0 if not supported
        media: vivid: drop GFP_DMA32
        media: vivid: fix control handler mutex deadlock
        media: videodev2.h: V4L2_DV_BT_BLANKING_HEIGHT should check 'interlaced'
        media: v4l2-dv-timings: add sanity checks for blanking values
        media: vivid: dev->bitmap_cap wasn't freed in all cases
        media: vivid: s_fbuf: add more sanity checks
      7f9a7cd6
    • Linus Torvalds's avatar
      Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · 200204f5
      Linus Torvalds authored
      Pull fscrypt fix from Eric Biggers:
       "Fix a memory leak that was introduced by a change that went into -rc1"
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        fscrypt: fix keyring memory leak on mount failure
      200204f5
    • Vladimir Oltean's avatar
      net: enetc: survive memory pressure without crashing · 84ce1ca3
      Vladimir Oltean authored
      Under memory pressure, enetc_refill_rx_ring() may fail, and when called
      during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
      checked for.
      
      An extreme case of memory pressure will result in exactly zero buffers
      being allocated for the RX ring, and in such a case it is expected that
      hardware drops all RX packets due to lack of buffers.
      
      This does not happen, because the reset-default value of the consumer
      and produces index is 0, and this makes the ENETC think that all buffers
      have been initialized and that it owns them (when in reality none were).
      
      The hardware guide explains this best:
      
      | Configure the receive ring producer index register RBaPIR with a value
      | of 0. The producer index is initially configured by software but owned
      | by hardware after the ring has been enabled. Hardware increments the
      | index when a frame is received which may consume one or more BDs.
      | Hardware is not allowed to increment the producer index to match the
      | consumer index since it is used to indicate an empty condition. The ring
      | can hold at most RBLENR[LENGTH]-1 received BDs.
      |
      | Configure the receive ring consumer index register RBaCIR. The
      | consumer index is owned by software and updated during operation of the
      | of the BD ring by software, to indicate that any receive data occupied
      | in the BD has been processed and it has been prepared for new data.
      | - If consumer index and producer index are initialized to the same
      |   value, it indicates that all BDs in the ring have been prepared and
      |   hardware owns all of the entries.
      | - If consumer index is initialized to producer index plus N, it would
      |   indicate N BDs have been prepared. Note that hardware cannot start if
      |   only a single buffer is prepared due to the restrictions described in
      |   (2).
      | - Software may write consumer index to match producer index anytime
      |   while the ring is operational to indicate all received BDs prior have
      |   been processed and new BDs prepared for hardware.
      
      Normally, the value of rx_ring->rcir (consumer index) is brought in sync
      with the rx_ring->next_to_use software index, but this only happens if
      page allocation ever succeeded.
      
      When PI==CI==0, the hardware appears to receive frames and write them to
      DMA address 0x0 (?!), then set the READY bit in the BD.
      
      The enetc_clean_rx_ring() function (and its XDP derivative) is naturally
      not prepared to handle such a condition. It will attempt to process
      those frames using the rx_swbd structure associated with index i of the
      RX ring, but that structure is not fully initialized (enetc_new_page()
      does all of that). So what happens next is undefined behavior.
      
      To operate using no buffer, we must initialize the CI to PI + 1, which
      will block the hardware from advancing the CI any further, and drop
      everything.
      
      The issue was seen while adding support for zero-copy AF_XDP sockets,
      where buffer memory comes from user space, which can even decide to
      supply no buffers at all (example: "xdpsock --txonly"). However, the bug
      is present also with the network stack code, even though it would take a
      very determined person to trigger a page allocation failure at the
      perfect time (a series of ifup/ifdown under memory pressure should
      eventually reproduce it given enough retries).
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarClaudiu Manoil <claudiu.manoil@nxp.com>
      Link: https://lore.kernel.org/r/20221027182925.3256653-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      84ce1ca3
    • Eric Dumazet's avatar
      kcm: do not sense pfmemalloc status in kcm_sendpage() · ee15e1f3
      Eric Dumazet authored
      Similar to changes done in TCP in blamed commit.
      We should not sense pfmemalloc status in sendpage() methods.
      
      Fixes: 32614006 ("tcp: TX zerocopy should not sense pfmemalloc status")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221027040637.1107703-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ee15e1f3
    • Eric Dumazet's avatar
      net: do not sense pfmemalloc status in skb_append_pagefrags() · 228ebc41
      Eric Dumazet authored
      skb_append_pagefrags() is used by af_unix and udp sendpage()
      implementation so far.
      
      In commit 32614006 ("tcp: TX zerocopy should not sense
      pfmemalloc status") we explained why we should not sense
      pfmemalloc status for pages owned by user space.
      
      We should also use skb_fill_page_desc_noacc()
      in skb_append_pagefrags() to avoid following KCSAN report:
      
      BUG: KCSAN: data-race in lru_add_fn / skb_append_pagefrags
      
      write to 0xffffea00058fc1c8 of 8 bytes by task 17319 on cpu 0:
      __list_add include/linux/list.h:73 [inline]
      list_add include/linux/list.h:88 [inline]
      lruvec_add_folio include/linux/mm_inline.h:323 [inline]
      lru_add_fn+0x327/0x410 mm/swap.c:228
      folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246
      lru_add_drain_cpu+0x73/0x250 mm/swap.c:669
      lru_add_drain+0x21/0x60 mm/swap.c:773
      free_pages_and_swap_cache+0x16/0x70 mm/swap_state.c:311
      tlb_batch_pages_flush mm/mmu_gather.c:59 [inline]
      tlb_flush_mmu_free mm/mmu_gather.c:256 [inline]
      tlb_flush_mmu+0x5b2/0x640 mm/mmu_gather.c:263
      tlb_finish_mmu+0x86/0x100 mm/mmu_gather.c:363
      exit_mmap+0x190/0x4d0 mm/mmap.c:3098
      __mmput+0x27/0x1b0 kernel/fork.c:1185
      mmput+0x3d/0x50 kernel/fork.c:1207
      copy_process+0x19fc/0x2100 kernel/fork.c:2518
      kernel_clone+0x166/0x550 kernel/fork.c:2671
      __do_sys_clone kernel/fork.c:2812 [inline]
      __se_sys_clone kernel/fork.c:2796 [inline]
      __x64_sys_clone+0xc3/0xf0 kernel/fork.c:2796
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffffea00058fc1c8 of 8 bytes by task 17325 on cpu 1:
      page_is_pfmemalloc include/linux/mm.h:1817 [inline]
      __skb_fill_page_desc include/linux/skbuff.h:2432 [inline]
      skb_fill_page_desc include/linux/skbuff.h:2453 [inline]
      skb_append_pagefrags+0x210/0x600 net/core/skbuff.c:3974
      unix_stream_sendpage+0x45e/0x990 net/unix/af_unix.c:2338
      kernel_sendpage+0x184/0x300 net/socket.c:3561
      sock_sendpage+0x5a/0x70 net/socket.c:1054
      pipe_to_sendpage+0x128/0x160 fs/splice.c:361
      splice_from_pipe_feed fs/splice.c:415 [inline]
      __splice_from_pipe+0x222/0x4d0 fs/splice.c:559
      splice_from_pipe fs/splice.c:594 [inline]
      generic_splice_sendpage+0x89/0xc0 fs/splice.c:743
      do_splice_from fs/splice.c:764 [inline]
      direct_splice_actor+0x80/0xa0 fs/splice.c:931
      splice_direct_to_actor+0x305/0x620 fs/splice.c:886
      do_splice_direct+0xfb/0x180 fs/splice.c:974
      do_sendfile+0x3bf/0x910 fs/read_write.c:1255
      __do_sys_sendfile64 fs/read_write.c:1323 [inline]
      __se_sys_sendfile64 fs/read_write.c:1309 [inline]
      __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1309
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x0000000000000000 -> 0xffffea00058fc188
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 17325 Comm: syz-executor.0 Not tainted 6.1.0-rc1-syzkaller-00158-g440b7895-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
      
      Fixes: 32614006 ("tcp: TX zerocopy should not sense pfmemalloc status")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221027040346.1104204-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      228ebc41