1. 13 May, 2023 10 commits
    • ping: Convert hlist_nulls to plain hlist. · f1b5dfe6
      Kuniyuki Iwashima authored
      Since its introduction in commit c319b4d7 ("net: ipv4: add IPPROTO_ICMP
      socket kind"), the ping socket has neither used SLAB_TYPESAFE_BY_RCU nor
      checked the nulls marker in loops.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
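For readers unfamiliar with the term, the following is a minimal userspace sketch of what a "nulls marker" is: instead of ending a list with NULL, the last pointer holds an odd value that encodes a small integer. A lookup that never inspects that value, as the commit message says of the ping code, gains nothing from the nulls variant. The helper names are hypothetical and only model the encoding; they are not the kernel's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; these helper names are illustrative. */

/* Build an end-of-list marker: low bit set, value stored in the upper bits. */
static inline uintptr_t make_nulls_marker(unsigned long value)
{
	return (uintptr_t)1 | ((uintptr_t)value << 1);
}

/* Real node pointers are at least 2-byte aligned, so an odd value can
 * only be an end-of-list marker. */
static inline bool is_nulls_marker(uintptr_t ptr)
{
	return (ptr & 1) != 0;
}

/* Recover the integer stored in a marker. */
static inline unsigned long nulls_marker_value(uintptr_t ptr)
{
	assert(is_nulls_marker(ptr));
	return (unsigned long)(ptr >> 1);
}
```

A lockless reader would typically compare the recovered value with the bucket it started in, to detect that it was moved to another chain mid-walk. Code that does not perform that check can use a plain NULL-terminated list.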
    • Merge branch 'skb_frag_fill_page_desc' · d5e7d196
      David S. Miller authored
      Yunsheng Lin says:
      
      ====================
      net: introduce skb_frag_fill_page_desc()
      
      Most users use __skb_frag_set_page()/skb_frag_off_set()/
      skb_frag_size_set() to fill the page desc for a skb frag.
      It does not make much sense to call __skb_frag_set_page()
      without calling skb_frag_off_set(), as the offset may depend
      on whether the page is a head page or a tail page, so add
      skb_frag_fill_page_desc() to fill the page desc for a skb
      frag.
      
      In the future, we can make sure the page in the frag is the
      head page of a compound page or a base page. If it is not, we
      may warn about that, convert the tail page to the head page,
      and update the offset accordingly. If we see such a warning,
      we also fix the caller to fill the head page in the frag.
      When the fixing is done, we may remove the warning and the
      conversion.
      
      In this way, we can remove the compound_head() calls or use
      page_ref_*(), as in the cases below:
      https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L881
      https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3383
      
      It may also make it easier to convert the net stack to use folios.
      
      V1: repost with all the ack/review tags included.
      RFC: remove a local variable as pointed out by Simon.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove __skb_frag_set_page() · 278fda0d
      Yunsheng Lin authored
      The remaining users calling __skb_frag_set_page() with
      page being NULL seem to be doing defensive programming,
      as shinfo->nr_frags is already decremented, so remove
      them.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce and use skb_frag_fill_page_desc() · b51f4113
      Yunsheng Lin authored
      Most users use __skb_frag_set_page()/skb_frag_off_set()/
      skb_frag_size_set() to fill the page desc for a skb frag.
      
      Introduce skb_frag_fill_page_desc() to do that.
      
      net/bpf/test_run.c does not call skb_frag_off_set() to
      set the offset. The "copy_from_user(page_address(page), ...)"
      call and the fact that 'shinfo' is part of the 'data'
      kzalloc'ed in bpf_test_init() suggest that it assumes the
      offset is initialized to zero, so call
      skb_frag_fill_page_desc() with a zero offset for this case.
      
      Also, skb_frag_set_page() is not used anymore, so remove
      it.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
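The idea of replacing three separate setters with one fill helper can be sketched in userspace as follows. The struct and function names (`page_model`, `frag_model`, `frag_fill_page_desc`) are simplified stand-ins invented for this illustration, not the kernel's definitions. The point is only that page, offset and size are written together, so a caller cannot set the page and forget the offset.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; names are assumptions. */
struct page_model {
	int id;
};

struct frag_model {
	struct page_model *page;
	unsigned int offset;
	unsigned int size;
};

/* Fill the whole page descriptor of a frag in a single call. */
static inline void frag_fill_page_desc(struct frag_model *frag,
				       struct page_model *page,
				       unsigned int off, unsigned int size)
{
	frag->page = page;
	frag->offset = off;
	frag->size = size;
}
```

A caller like the test_run.c case described above would pass an explicit offset of zero rather than relying on zero-initialized memory.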
    • selftests: net: vxlan: Add tests for vxlan nolocalbypass option. · 305c0418
      Vladimir Nikishkin authored
      Add test to make sure that the localbypass option is on by default.
      
      Add test to change vxlan localbypass to nolocalbypass and check
      that packets are delivered to userspace.
      Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vxlan: Add nolocalbypass option to vxlan. · 69474a8a
      Vladimir Nikishkin authored
      If a packet needs to be encapsulated towards a local destination IP, the
      packet will undergo a "local bypass" and be injected into the Rx path as
      if it was received by the target VXLAN device without undergoing
      encapsulation. If such a device does not exist, the packet will be
      dropped.
      
      There are scenarios where we do not want to perform such a bypass, but
      instead want the packet to be encapsulated and locally received by a
      user space program for post-processing.
      
      To that end, add a new VXLAN device attribute that controls whether a
      "local bypass" is performed or not. Default to performing a bypass to
      maintain existing behavior.
      Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
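The behavior described in this commit message can be summarized as a small decision function. This is a userspace sketch under stated assumptions: the enum, struct and function names are hypothetical and do not correspond to the driver's actual code. Only the three outcomes come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for the transmit decision. */
enum tx_action {
	TX_BYPASS_TO_LOCAL_VXLAN, /* inject into the Rx path, no encapsulation */
	TX_DROP,                  /* bypass selected but no target device exists */
	TX_ENCAPSULATE            /* regular encapsulation path */
};

/* Hypothetical inputs to the decision. */
struct tx_ctx {
	bool dst_is_local;        /* destination IP is a local address */
	bool localbypass;         /* device attribute; on by default */
	bool local_target_exists; /* a matching local VXLAN device exists */
};

static enum tx_action vxlan_tx_decide(const struct tx_ctx *c)
{
	/* Remote destinations, or nolocalbypass, always encapsulate. */
	if (!c->dst_is_local || !c->localbypass)
		return TX_ENCAPSULATE;
	return c->local_target_exists ? TX_BYPASS_TO_LOCAL_VXLAN : TX_DROP;
}
```

With `localbypass` cleared, a packet to a local destination is encapsulated and can be received by a user space program, which is the scenario the new attribute is meant to enable.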
    • Merge branch 'broadcom-phy-wol' · 7eef636e
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      Support for Wake-on-LAN for Broadcom PHYs
      
      This patch series adds support for Wake-on-LAN to the Broadcom PHY
      driver. Specifically the BCM54210E/B50212E are capable of supporting
      Wake-on-LAN using an external pin typically wired up to a system's GPIO.
      
      These PHYs operate a programmable Ethernet MAC destination address
      comparator which will fire an interrupt whenever a match is received.
      Because of that, it was necessary to introduce patch #1 which allows the
      PHY driver's ->suspend() routine to be called unconditionally. This is
      necessary in our case because we need a hook point into the device
      suspend/resume flow to enable the wake-up interrupt as late as possible.
      
      Patch #2 adds support to the Broadcom PHY library and driver for
      Wake-on-LAN proper with WAKE_UCAST, WAKE_MCAST, WAKE_BCAST,
      WAKE_MAGIC and WAKE_MAGICSECURE. Note that WAKE_FILTER is supportable;
      however, this will require further discussion and will be submitted as
      an RFC series later on.
      
      Patch #3 updates the GENET driver to defer to the PHY for Wake-on-LAN if
      the PHY supports it, thus allowing the MAC to be powered down to
      conserve power.
      
      Changes in v3:
      
      - collected Reviewed-by tags
      - explicitly use return 0 in bcm54xx_phy_probe() (Paolo)
      
      Changes in v2:
      
      - introduce PHY_ALWAYS_CALL_SUSPEND and only have the Broadcom PHY
        driver set this flag to minimize changes to the suspend flow to only
        drivers that need it
      
      - corrected possibly uninitialized variable in bcm54xx_set_wakeup_irq
        (Simon)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: Add support for PHY-based Wake-on-LAN · 7e400ff3
      Florian Fainelli authored
      If available, interrogate the PHY to find out whether we can use it for
      Wake-on-LAN. This can be a more power efficient way of implementing
      that feature, especially when the MAC is powered off in low power
      states.
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: broadcom: Add support for Wake-on-LAN · 8baddaa9
      Florian Fainelli authored
      Add support for WAKE_UCAST, WAKE_MCAST, WAKE_BCAST, WAKE_MAGIC and
      WAKE_MAGICSECURE. This is only supported with the BCM54210E and
      compatible Ethernet PHYs. Using either the in-band interrupt or an
      out-of-band GPIO interrupt is supported.
      
      Broadcom PHYs will generate a Wake-on-LAN level low interrupt on LED4 as
      soon as one of the supported patterns is being matched. That includes
      generating such an interrupt even if the PHY is operated during normal
      modes. If WAKE_UCAST is selected, this could lead to the LED4 interrupt
      firing up for every packet being received which is absolutely
      undesirable from a performance point of view.
      
      Because the Wake-on-LAN configuration can be set long before the system
      is actually put to sleep, we cannot have an interrupt service routine to
      clear on read the interrupt status register and ensure that new packet
      matches will be detected.
      
      It is desirable to enable the Wake-on-LAN interrupt as late as possible
      during the system suspend process such that we limit the number of
      interrupts to be handled by the system, but also conversely feed into
      Linux's system suspend way of dealing with interrupts in and around
      the points of no return.
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
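As an illustration of the wake options listed in this commit, the sketch below checks a requested option mask against the set the message says is supported. The WAKE_* values mirror the bits in the Linux ethtool UAPI header and are redefined locally so the example builds without kernel headers. `SKETCH_SUPPORTED_WOL` and `sketch_validate_wolopts()` are hypothetical names, and the real driver's validation may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copies of the ethtool wake option bits (see lead-in). */
#define WAKE_PHY         (1u << 0)
#define WAKE_UCAST       (1u << 1)
#define WAKE_MCAST       (1u << 2)
#define WAKE_BCAST       (1u << 3)
#define WAKE_ARP         (1u << 4)
#define WAKE_MAGIC       (1u << 5)
#define WAKE_MAGICSECURE (1u << 6)
#define WAKE_FILTER      (1u << 7)

/* Hypothetical mask: the options named as supported in the commit text. */
#define SKETCH_SUPPORTED_WOL \
	(WAKE_UCAST | WAKE_MCAST | WAKE_BCAST | WAKE_MAGIC | WAKE_MAGICSECURE)

/* Return 0 if every requested bit is supported, -EOPNOTSUPP otherwise. */
static int sketch_validate_wolopts(uint32_t wolopts)
{
	if (wolopts & ~SKETCH_SUPPORTED_WOL)
		return -EOPNOTSUPP;
	return 0;
}
```

Under this sketch a request for WAKE_FILTER is rejected, matching the cover letter's note that WAKE_FILTER is left for a later series.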
    • net: phy: Allow drivers to always call into ->suspend() · a7e34480
      Florian Fainelli authored
      A few PHY drivers are currently attempting to not suspend the PHY when
      Wake-on-LAN is enabled, however that code is not currently executing at
      all due to an early check in phy_suspend().
      
      This prevents PHY drivers from making appropriate decisions and
      putting the hardware into a low power state if desired.
      
      In order to allow PHY drivers to opt into having their ->suspend
      routine called, add a PHY_ALWAYS_CALL_SUSPEND bit which can be
      set. A boolean that tracks whether the PHY or the attached MAC has
      Wake-on-LAN enabled is also provided for convenience.
      
      If phydev::wol_enabled is set, the PHY shall not prevent its own
      Wake-on-LAN detection logic from working and shall not prevent the
      Ethernet MAC from receiving packets for matching.
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
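The opt-in described above can be modeled in a few lines. This is a userspace sketch, not the kernel function: `struct phy_model`, `phy_model_suspend()` and the bit value of `SKETCH_ALWAYS_CALL_SUSPEND` are assumptions made for illustration. It shows the early check being skipped only for drivers that set the flag, and the convenience boolean being computed from the PHY and MAC Wake-on-LAN state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bit; the real PHY_ALWAYS_CALL_SUSPEND value may differ. */
#define SKETCH_ALWAYS_CALL_SUSPEND 0x1u

/* Hypothetical stand-in for the PHY device and driver state. */
struct phy_model {
	unsigned int drv_flags; /* driver flags */
	bool phy_wol;           /* PHY has Wake-on-LAN configured */
	bool mac_wol;           /* attached MAC has Wake-on-LAN configured */
	bool wol_enabled;       /* tracked for the driver's convenience */
	int suspend_calls;      /* counts calls into the driver's suspend hook */
};

static int phy_model_suspend(struct phy_model *phy)
{
	phy->wol_enabled = phy->phy_wol || phy->mac_wol;

	/* Early exit: with WoL enabled, skip suspend unless the driver opted in. */
	if (phy->wol_enabled && !(phy->drv_flags & SKETCH_ALWAYS_CALL_SUSPEND))
		return -EBUSY;

	phy->suspend_calls++; /* stands in for calling the driver's ->suspend() */
	return 0;
}
```

A driver that opts in is then responsible for honoring `wol_enabled` inside its own suspend routine, as the last paragraph of the commit message requires.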
  2. 12 May, 2023 24 commits
  3. 11 May, 2023 6 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · bc88ba0c
      Jakub Kicinski authored
      Cross-merge networking fixes. No conflicts.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6e27831b
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter.
      
        Current release - regressions:
      
         - mtk_eth_soc: fix NULL pointer dereference
      
        Previous releases - regressions:
      
         - core:
            - skb_partial_csum_set() fix against transport header magic value
            - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
            - annotate sk->sk_err write from do_recvmmsg()
            - add vlan_get_protocol_and_depth() helper
      
         - netlink: annotate accesses to nlk->cb_running
      
         - netfilter: always release netdev hooks from notifier
      
        Previous releases - always broken:
      
         - core: deal with most data-races in sk_wait_event()
      
         - netfilter: fix possible bug_on with enable_hooks=1
      
         - eth: bonding: fix send_peer_notif overflow
      
         - eth: xpcs: fix incorrect number of interfaces
      
         - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb
      
         - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"
      
      * tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
        af_unix: Fix data races around sk->sk_shutdown.
        af_unix: Fix a data race of sk->sk_receive_queue->qlen.
        net: datagram: fix data-races in datagram_poll()
        net: mscc: ocelot: fix stat counter register values
        ipvlan:Fix out-of-bounds caused by unclear skb->cb
        docs: networking: fix x25-iface.rst heading & index order
        gve: Remove the code of clearing PBA bit
        tcp: add annotations around sk->sk_shutdown accesses
        net: add vlan_get_protocol_and_depth() helper
        net: pcs: xpcs: fix incorrect number of interfaces
        net: deal with most data-races in sk_wait_event()
        net: annotate sk->sk_err write from do_recvmmsg()
        netlink: annotate accesses to nlk->cb_running
        kselftest: bonding: add num_grat_arp test
        selftests: forwarding: lib: add netns support for tc rule handle stats get
        Documentation: bonding: fix the doc of peer_notif_delay
        bonding: fix send_peer_notif overflow
        net: ethernet: mtk_eth_soc: fix NULL pointer dereference
        selftests: nft_flowtable.sh: check ingress/egress chain too
        selftests: nft_flowtable.sh: monitor result file sizes
        ...
    • Merge tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 691e1eee
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
      
       - fix some unused-variable warning in mtk-mdp3
      
       - ignore unused suspend operations in nxp
      
       - some driver fixes in rcar-vin
      
      * tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: platform: mtk-mdp3: work around unused-variable warning
        media: nxp: ignore unused suspend operations
        media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
        media: rcar-vin: Fix NV12 size alignment
        media: rcar-vin: Gen3 can not scale NV12
    • Merge branch 'net-mvneta-reduce-size-of-tso-header-allocation' · 285b2a46
      Paolo Abeni authored
      Russell King says:
      
      ====================
      net: mvneta: reduce size of TSO header allocation
      
      With reference to
      https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/
      https://github.com/openwrt/openwrt/pull/12375#issuecomment-1528842334
      
      It appears that mvneta attempts an order-6 allocation for the TSO
      header memory. While this succeeds early on in the system's lifetime,
      trying order-6 allocations later can result in failure due to memory
      fragmentation.
      
      Firstly, the reason it's so large is that we take the number of
      transmit descriptors, and allocate a TSO header buffer for each, and
      each TSO header is 256 bytes. The driver uses a simple mechanism to
      determine the address - it uses the transmit descriptor index as an
      index into the TSO header memory.
      
      	(The first obvious question is: do there need to be this
      	many? Won't each TSO header always have at least one bit
      	of data to go with it? In other words, wouldn't the maximum
      	number of TSO headers that a ring could accept be the number
      	of ring entries divided by 2?)
      
      There is no real need for this memory to be an order-6 allocation,
      since nothing in hardware requires this buffer to be contiguous.
      
      Therefore, this series splits this order-6 allocation up into 32
      order-1 allocations (8k pages on 4k page platforms), each giving
      32 TSO headers per page.
      
      In order to do this, these patches:
      
      1) fix a horrible transmit path error-cleanup bug - the existing
         code unmaps from the first descriptor that was allocated at
         interface bringup, not the first descriptor that the packet
         is using, resulting in the wrong descriptors being unmapped.
      
      2) since xdp support was added, we now have buf->type which indicates
         what this transmit buffer contains. Use this to mark TSO header
         buffers.
      
      3) get rid of IS_TSO_HEADER(), instead using buf->type to determine
         whether this transmit buffer needs to be DMA-unmapped.
      
      4) move tso_build_hdr() into mvneta_tso_put_hdr() to keep all the
         TSO header building code together.
      
      5) split the TSO header allocation into chunks of order-1 pages.
      
      This has now been tested by the Turris folk and has been found to fix
      the allocation error.
      ====================
      
      Link: https://lore.kernel.org/r/ZFtuhJOC03qpASt2@shell.armlinux.org.uk
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mvneta: allocate TSO header DMA memory in chunks · 33f4cefb
      Russell King (Oracle) authored
      Now that we no longer need to check whether the DMA address is within
      the TSO header DMA memory range for the queue, we can allocate the TSO
      header DMA memory in chunks rather than one contiguous order-6 chunk,
      which can stress the kernel's memory subsystems to allocate.
      
      Instead, use order-1 (8k) allocations, which will result in 32 order-1
      chunks, each containing 32 TSO headers.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
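The arithmetic behind the chunked layout can be checked with a short sketch. The 256-byte header size, the 4k page size and the order-1 chunk come from the messages above. The ring size of 1024 descriptors is inferred from those figures (32 chunks of 32 headers) rather than stated outright. The macro, struct and function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stddef.h>

#define TSO_HDR_SIZE   256u                          /* bytes per TSO header */
#define PAGE_SIZE_4K   4096u                         /* 4k page platforms */
#define CHUNK_ORDER    1u                            /* order-1 allocation */
#define CHUNK_BYTES    (PAGE_SIZE_4K << CHUNK_ORDER) /* 8192 bytes */
#define HDRS_PER_CHUNK (CHUNK_BYTES / TSO_HDR_SIZE)  /* 32 headers */

/* Location of one TSO header inside the chunked allocation. */
struct hdr_slot {
	unsigned int chunk;  /* which order-1 chunk */
	unsigned int offset; /* byte offset inside that chunk */
};

/* Map a transmit descriptor index to its header slot. */
static struct hdr_slot tso_hdr_slot(unsigned int desc_index)
{
	struct hdr_slot s;

	s.chunk = desc_index / HDRS_PER_CHUNK;
	s.offset = (desc_index % HDRS_PER_CHUNK) * TSO_HDR_SIZE;
	return s;
}

/* Number of chunks needed to give every descriptor a header. */
static unsigned int chunks_needed(unsigned int ring_size)
{
	return (ring_size + HDRS_PER_CHUNK - 1) / HDRS_PER_CHUNK;
}
```

For a 1024-entry ring this yields 32 chunks of 8 KiB each, the same 256 KiB in total as the single order-6 allocation, but without requiring 64 physically contiguous pages.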
    • net: mvneta: move tso_build_hdr() into mvneta_tso_put_hdr() · d41eb555
      Russell King (Oracle) authored
      Move tso_build_hdr() into mvneta_tso_put_hdr() so that all the TSO
      header building code is in one place.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>