1. 19 Nov, 2019 10 commits
    • David S. Miller's avatar
      Merge branch 'page_pool-followup-changes-to-restore-tracepoint-features' · 6960f7e3
      David S. Miller authored
      Jesper Dangaard says:
      
      ====================
      page_pool: followup changes to restore tracepoint features
      
      This patchset is a followup to Jonathan patch, that do not release
      pool until inflight == 0. That changed page_pool to be responsible for
      its own delayed destruction instead of relying on xdp memory model.
      
      As the page_pool maintainer, I'm promoting the use of tracepoint to
      troubleshoot and help driver developers verify correctness when
      converting at driver to use page_pool. The role of xdp:mem_disconnect
      have changed, which broke my bpftrace tools for shutdown verification.
      With these changes, the same capabilities are regained.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6960f7e3
    • Jesper Dangaard Brouer's avatar
      page_pool: extend tracepoint to also include the page PFN · 832ccf6f
      Jesper Dangaard Brouer authored
      The MM tracepoint for page free (called kmem:mm_page_free) doesn't provide
      the page pointer directly, instead it provides the PFN (Page Frame Number).
      This is annoying when writing a page_pool leak detector in BPF.
      
      This patch change page_pool tracepoints to also provide the PFN.
      The page pointer is still provided to allow other kinds of
      troubleshooting from BPF.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      832ccf6f
    • Jesper Dangaard Brouer's avatar
      page_pool: add destroy attempts counter and rename tracepoint · 7c9e6942
      Jesper Dangaard Brouer authored
      When Jonathan change the page_pool to become responsible to its
      own shutdown via deferred work queue, then the disconnect_cnt
      counter was removed from xdp memory model tracepoint.
      
      This patch change the page_pool_inflight tracepoint name to
      page_pool_release, because it reflects the new responsability
      better.  And it reintroduces a counter that reflect the number of
      times page_pool_release have been tried.
      
      The counter is also used by the code, to only empty the alloc
      cache once.  With a stuck work queue running every second and
      counter being 64-bit, it will overrun in approx 584 billion
      years. For comparison, Earth lifetime expectancy is 7.5 billion
      years, before the Sun will engulf, and destroy, the Earth.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c9e6942
    • Jesper Dangaard Brouer's avatar
      xdp: remove memory poison on free for struct xdp_mem_allocator · c491eae8
      Jesper Dangaard Brouer authored
      When looking at the details I realised that the memory poison in
      __xdp_mem_allocator_rcu_free doesn't make sense. This is because the
      SLUB allocator uses the first 16 bytes (on 64 bit), for its freelist,
      which overlap with members in struct xdp_mem_allocator, that were
      updated.  Thus, SLUB already does the "poisoning" for us.
      
      I still believe that poisoning memory make sense in other cases.
      Kernel have gained different use-after-free detection mechanism, but
      enabling those is associated with a huge overhead. Experience is that
      debugging facilities can change the timing so much, that that a race
      condition will not be provoked when enabled. Thus, I'm still in favour
      of poisoning memory where it makes sense.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c491eae8
    • Russell King's avatar
      net: phy: avoid matching all-ones clause 45 PHY IDs · b95e86d8
      Russell King authored
      We currently match clause 45 PHYs using any ID read from a MMD marked
      as present in the "Devices in package" registers 5 and 6.  However,
      this is incorrect.  45.2 says:
      
        "The definition of the term package is vendor specific and could be
         a chip, module, or other similar entity."
      
      so a package could be more or less than the whole PHY - a PHY could be
      made up of several modules instantiated onto a single chip such as the
      Marvell 88x3310, or some of the MMDs could be disabled according to
      chip configuration, such as the Broadcom 84881.
      
      In the case of Broadcom 84881, the "Devices in package" registers
      contain 0xc000009b, meaning that there is a PHYXS present in the
      package, but all registers in MMD 4 return 0xffff.  This leads to our
      matching code incorrectly binding this PHY to one of our generic PHY
      drivers.
      
      This patch changes the way we determine whether to attempt to match a
      MMD identifier, or use it to request a module - if the identifier is
      all-ones, then we skip over it. When reading the identifiers, we
      initialise phydev->c45_ids.device_ids to all-ones, only reading the
      device ID if the "Devices in package" registers indicates we should.
      
      This avoids the generic drivers incorrectly matching on a PHY ID of
      0xffffffff.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b95e86d8
    • David S. Miller's avatar
      Merge branch 'Add-support-for-SFPs-behind-PHYs' · e64dbb1a
      David S. Miller authored
      Russell King says:
      
      ====================
      Add support for SFPs behind PHYs
      
      This series adds partial support for SFP cages connected to PHYs,
      specifically optical SFPs.
      
      We add core infrastructure to phylib for this, and arrange for
      minimal code in the PHY driver - currently, this is code to verify
      that the module is one that we can support for Marvell 10G PHYs.
      
      v2: add yaml binding patch
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e64dbb1a
    • Russell King's avatar
      net: phy: marvell10g: add SFP+ support · 36023da1
      Russell King authored
      Add support for SFP+ cages to the Marvell 10G PHY driver. This is
      slightly complicated by the way phylib works in that we need to use
      a multi-step process to attach the SFP bus, and we also need to track
      the phylink state machine to know when the module's transmit disable
      signal should change state.
      
      With appropriate DT changes, this allows the SFP+ canges on the
      Macchiatobin platform to be functional.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      36023da1
    • Russell King's avatar
      net: phy: add core phylib sfp support · 298e54fa
      Russell King authored
      Add core phylib help for supporting SFP sockets on PHYs.  This provides
      a mechanism to inform the SFP layer about PHY up/down events, and also
      unregister the SFP bus when the PHY is going away.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      298e54fa
    • Russell King's avatar
      dt-bindings: net: add ethernet controller and phy sfp property · fb3d8bcd
      Russell King authored
      Document the missing sfp property for ethernet controllers (which
      has existed for some time) which is being extended to ethernet PHYs.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb3d8bcd
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 99638e9d
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next:
      
      1) Wildcard support for the net,iface set from Kristian Evensen.
      
      2) Offload support for matching on the input interface.
      
      3) Simplify matching on vlan header fields.
      
      4) Add nft_payload_rebuild_vlan_hdr() function to rebuild the vlan
         header from the vlan sk_buff metadata.
      
      5) Pass extack to nft_flow_cls_offload_setup().
      
      6) Add C-VLAN matching support.
      
      7) Use time64_t in xt_time to fix y2038 overflow, from Arnd Bergmann.
      
      8) Use time_t in nft_meta to fix y2038 overflow, also from Arnd.
      
      9) Add flow_action_entry_next() helper function to flowtable offload
         infrastructure.
      
      10) Add IPv6 support to the flowtable offload infrastructure.
      
      11) Support for input interface matching from postrouting,
          from Phil Sutter.
      
      12) Missing check for ndo callback in flowtable offload, from wenxu.
      
      13) Remove conntrack parameter from flow_offload_fill_dir(), from wenxu.
      
      14) Do not pass flow_rule object for rule removal, cookie is sufficient
          to achieve this.
      
      15) Release flow_rule object in case of error from the offload commit
          path.
      
      16) Undo offload ruleset updates if transaction fails.
      
      17) Check for error when binding flowtable callbacks, from wenxu.
      
      18) Always unbind flowtable callbacks when unregistering hooks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      99638e9d
  2. 17 Nov, 2019 5 commits
  3. 16 Nov, 2019 25 commits
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5ffaf037
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc fixes: a handful of AUX event handling related fixes, a Sparse
        fix and two ABI fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Fix missing static inline on perf_cgroup_switch()
        perf/core: Consistently fail fork on allocation failures
        perf/aux: Disallow aux_output for kernel events
        perf/core: Reattach a misplaced comment
        perf/aux: Fix the aux_output group inheritance fix
        perf/core: Disallow uncore-cgroup events
      5ffaf037
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8be636dd
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix memory leak in xfrm_state code, from Steffen Klassert.
      
       2) Fix races between devlink reload operations and device
          setup/cleanup, from Jiri Pirko.
      
       3) Null deref in NFC code, from Stephan Gerhold.
      
       4) Refcount fixes in SMC, from Ursula Braun.
      
       5) Memory leak in slcan open error paths, from Jouni Hogander.
      
       6) Fix ETS bandwidth validation in hns3, from Yonglong Liu.
      
       7) Info leak on short USB request answers in ax88172a driver, from
          Oliver Neukum.
      
       8) Release mem region properly in ep93xx_eth, from Chuhong Yuan.
      
       9) PTP config timestamp flags validation, from Richard Cochran.
      
      10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer.
      
      11) Missing free_netdev() in gemini driver, from Chuhong Yuan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
        ipmr: Fix skb headroom in ipmr_get_route().
        net: hns3: cleanup of stray struct hns3_link_mode_mapping
        net/smc: fix fastopen for non-blocking connect()
        rds: ib: update WR sizes when bringing up connection
        net: gemini: add missed free_netdev
        net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
        seg6: fix skb transport_header after decap_and_validate()
        seg6: fix srh pointer in get_srh()
        net: stmmac: Use the correct style for SPDX License Identifier
        octeontx2-af: Use the correct style for SPDX License Identifier
        ptp: Extend the test program to check the external time stamp flags.
        mlx5: Reject requests to enable time stamping on both edges.
        igb: Reject requests that fail to enable time stamping on both edges.
        dp83640: Reject requests to enable time stamping on both edges.
        mv88e6xxx: Reject requests to enable time stamping on both edges.
        ptp: Introduce strict checking of external time stamp options.
        renesas: reject unsupported external timestamp flags
        mlx5: reject unsupported external timestamp flags
        igb: reject unsupported external timestamp flags
        dp83640: reject unsupported external timestamp flags
        ...
      8be636dd
    • kbuild test robot's avatar
      mscc.c: fix semicolon.cocci warnings · 1e8795b1
      kbuild test robot authored
      drivers/net/phy/mscc.c:1683:3-4: Unneeded semicolon
      
       Remove unneeded semicolon.
      
      Generated by: scripts/coccinelle/misc/semicolon.cocci
      
      Fixes: 75a1ccfe ("mscc.c: Add support for additional VSC PHYs")
      CC: Bryan Whitehead <Bryan.Whitehead@microchip.com>
      Signed-off-by: default avatarkbuild test robot <lkp@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e8795b1
    • Eric Dumazet's avatar
      selftests: net: avoid ptl lock contention in tcp_mmap · 597b01ed
      Eric Dumazet authored
      tcp_mmap is used as a reference program for TCP rx zerocopy,
      so it is important to point out some potential issues.
      
      If multiple threads are concurrently using getsockopt(...
      TCP_ZEROCOPY_RECEIVE), there is a chance the low-level mm
      functions compete on shared ptl lock, if vma are arbitrary placed.
      
      Instead of letting the mm layer place the chunks back to back,
      this patch enforces an alignment so that each thread uses
      a different ptl lock.
      
      Performance measured on a 100 Gbit NIC, with 8 tcp_mmap clients
      launched at the same time :
      
      $ for f in {1..8}; do ./tcp_mmap -H 2002:a05:6608:290:: & done
      
      In the following run, we reproduce the old behavior by requesting no alignment :
      
      $ tcp_mmap -sz -C $((128*1024)) -a 4096
      received 32768 MB (100 % mmap'ed) in 9.69532 s, 28.3516 Gbit
        cpu usage user:0.08634 sys:3.86258, 120.511 usec per MB, 171839 c-switches
      received 32768 MB (100 % mmap'ed) in 25.4719 s, 10.7914 Gbit
        cpu usage user:0.055268 sys:21.5633, 659.745 usec per MB, 9065 c-switches
      received 32768 MB (100 % mmap'ed) in 28.5419 s, 9.63069 Gbit
        cpu usage user:0.057401 sys:23.8761, 730.392 usec per MB, 14987 c-switches
      received 32768 MB (100 % mmap'ed) in 28.655 s, 9.59268 Gbit
        cpu usage user:0.059689 sys:23.8087, 728.406 usec per MB, 18509 c-switches
      received 32768 MB (100 % mmap'ed) in 28.7808 s, 9.55074 Gbit
        cpu usage user:0.066042 sys:23.4632, 718.056 usec per MB, 24702 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8259 s, 9.5358 Gbit
        cpu usage user:0.056547 sys:23.6628, 723.858 usec per MB, 23518 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8808 s, 9.51767 Gbit
        cpu usage user:0.059357 sys:23.8515, 729.703 usec per MB, 14691 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8879 s, 9.51534 Gbit
        cpu usage user:0.047115 sys:23.7349, 725.769 usec per MB, 21773 c-switches
      
      New behavior (automatic alignment based on Hugepagesize),
      we can see the system overhead being dramatically reduced.
      
      $ tcp_mmap -sz -C $((128*1024))
      received 32768 MB (100 % mmap'ed) in 13.5339 s, 20.3103 Gbit
        cpu usage user:0.122644 sys:3.4125, 107.884 usec per MB, 168567 c-switches
      received 32768 MB (100 % mmap'ed) in 16.0335 s, 17.1439 Gbit
        cpu usage user:0.132428 sys:3.55752, 112.608 usec per MB, 188557 c-switches
      received 32768 MB (100 % mmap'ed) in 17.5506 s, 15.6621 Gbit
        cpu usage user:0.155405 sys:3.24889, 103.891 usec per MB, 226652 c-switches
      received 32768 MB (100 % mmap'ed) in 19.1924 s, 14.3222 Gbit
        cpu usage user:0.135352 sys:3.35583, 106.542 usec per MB, 207404 c-switches
      received 32768 MB (100 % mmap'ed) in 22.3649 s, 12.2906 Gbit
        cpu usage user:0.142429 sys:3.53187, 112.131 usec per MB, 250225 c-switches
      received 32768 MB (100 % mmap'ed) in 22.5336 s, 12.1986 Gbit
        cpu usage user:0.140654 sys:3.61971, 114.757 usec per MB, 253754 c-switches
      received 32768 MB (100 % mmap'ed) in 22.5483 s, 12.1906 Gbit
        cpu usage user:0.134035 sys:3.55952, 112.718 usec per MB, 252997 c-switches
      received 32768 MB (100 % mmap'ed) in 22.6442 s, 12.139 Gbit
        cpu usage user:0.126173 sys:3.71251, 117.147 usec per MB, 253728 c-switches
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Arjun Roy <arjunroy@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      597b01ed
    • Heiner Kallweit's avatar
      r8169: load firmware for RTL8168fp/RTL8117 · 229c1e0d
      Heiner Kallweit authored
      Load Realtek-provided firmware for RTL8168fp/RTL8117. Unlike the
      firmware for other chip versions which is for the PHY, firmware for
      RTL8168fp/RTL8117 is for the MAC.
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      229c1e0d
    • Heiner Kallweit's avatar
      r8169: improve conditional firmware loading for RTL8168d · 718af5bc
      Heiner Kallweit authored
      Using constant MII_EXPANSION is misleading here because register 0x06
      has a different meaning on page 0x0005. Here a proprietary PHY
      parameter is read by writing the parameter id to register 0x05 on page
      0x0005, followed by reading the parameter value from register 0x06.
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      718af5bc
    • Russell King's avatar
      net: phylink: update to use phy_support_asym_pause() · 725ea4bf
      Russell King authored
      Use phy_support_asym_pause() rather than open-coding it.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      725ea4bf
    • Guillaume Nault's avatar
      ipmr: Fix skb headroom in ipmr_get_route(). · 7901cd97
      Guillaume Nault authored
      In route.c, inet_rtm_getroute_build_skb() creates an skb with no
      headroom. This skb is then used by inet_rtm_getroute() which may pass
      it to rt_fill_info() and, from there, to ipmr_get_route(). The later
      might try to reuse this skb by cloning it and prepending an IPv4
      header. But since the original skb has no headroom, skb_push() triggers
      skb_under_panic():
      
      skbuff: skb_under_panic: text:00000000ca46ad8a len:80 put:20 head:00000000cd28494e data:000000009366fd6b tail:0x3c end:0xec0 dev:veth0
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:108!
      invalid opcode: 0000 [#1] SMP KASAN PTI
      CPU: 6 PID: 587 Comm: ip Not tainted 5.4.0-rc6+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      RIP: 0010:skb_panic+0xbf/0xd0
      Code: 41 a2 ff 8b 4b 70 4c 8b 4d d0 48 c7 c7 20 76 f5 8b 44 8b 45 bc 48 8b 55 c0 48 8b 75 c8 41 54 41 57 41 56 41 55 e8 75 dc 7a ff <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
      RSP: 0018:ffff888059ddf0b0 EFLAGS: 00010286
      RAX: 0000000000000086 RBX: ffff888060a315c0 RCX: ffffffff8abe4822
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88806c9a79cc
      RBP: ffff888059ddf118 R08: ffffed100d9361b1 R09: ffffed100d9361b0
      R10: ffff88805c68aee3 R11: ffffed100d9361b1 R12: ffff88805d218000
      R13: ffff88805c689fec R14: 000000000000003c R15: 0000000000000ec0
      FS:  00007f6af184b700(0000) GS:ffff88806c980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffc8204a000 CR3: 0000000057b40006 CR4: 0000000000360ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       skb_push+0x7e/0x80
       ipmr_get_route+0x459/0x6fa
       rt_fill_info+0x692/0x9f0
       inet_rtm_getroute+0xd26/0xf20
       rtnetlink_rcv_msg+0x45d/0x630
       netlink_rcv_skb+0x1a5/0x220
       rtnetlink_rcv+0x15/0x20
       netlink_unicast+0x305/0x3a0
       netlink_sendmsg+0x575/0x730
       sock_sendmsg+0xb5/0xc0
       ___sys_sendmsg+0x497/0x4f0
       __sys_sendmsg+0xcb/0x150
       __x64_sys_sendmsg+0x48/0x50
       do_syscall_64+0xd2/0xac0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Actually the original skb used to have enough headroom, but the
      reserve_skb() call was lost with the introduction of
      inet_rtm_getroute_build_skb() by commit 404eb77e ("ipv4: support
      sport, dport and ip_proto in RTM_GETROUTE").
      
      We could reserve some headroom again in inet_rtm_getroute_build_skb(),
      but this function shouldn't be responsible for handling the special
      case of ipmr_get_route(). Let's handle that directly in
      ipmr_get_route() by calling skb_realloc_headroom() instead of
      skb_clone().
      
      Fixes: 404eb77e ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7901cd97
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-next-2019-11-15' of... · 50bef719
      David S. Miller authored
      Merge tag 'wireless-drivers-next-2019-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for v5.5
      
      Second set of patches for v5.5. Nothing special this time, smaller
      features to various drivers and of course fixes all over.
      
      Major changes:
      
      iwlwifi
      
      * update scan FW API
      
      * bump the supported FW API version
      
      * add debug dump collection on assert in WoWLAN
      
      * enable adaptive dwell on P2P interfaces
      
      ath10k
      
      * request for PM_QOS_CPU_DMA_LATENCY to improve firmware initialisation time
      
      qtnfmac
      
      * add support for getting/setting transmit power
      
      * handle MIC failure event from firmware
      
      rtl8xxxu
      
      * add support for Edimax EW-7611ULB
      
      wil6210
      
      * add SPDX license identifiers
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50bef719
    • Salil Mehta's avatar
      net: hns3: cleanup of stray struct hns3_link_mode_mapping · b696083d
      Salil Mehta authored
      This patch cleans-up the stray left over code. It has no
      functionality impact.
      Signed-off-by: default avatarSalil Mehta <salil.mehta@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b696083d
    • Ursula Braun's avatar
      net/smc: fix fastopen for non-blocking connect() · 8204df72
      Ursula Braun authored
      FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
      TCP native during connection start, the FASTOPEN setsockopts trigger
      this fallback, if the SMC-socket is still in state SMC_INIT.
      But if a FASTOPEN setsockopt is called after a non-blocking connect(),
      this is broken, and fallback does not make sense.
      This change complements
      commit cd206360 ("net/smc: avoid fallback in case of non-blocking connect")
      and fixes the syzbot reported problem "WARNING in smc_unhash_sk".
      
      Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
      Fixes: e1bbdd57 ("net/smc: reduce sock_put() for fallback sockets")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8204df72
    • Matteo Croce's avatar
      bonding: symmetric ICMP transmit · df98be06
      Matteo Croce authored
      A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports
      to balance packets between slaves. With some network errors, we receive an ICMP
      error packet by the remote host or a router. If sent by a router, the source IP
      can differ from the remote host one. Additionally the ICMP protocol has no port
      numbers, so a layer3+4 bonding will get a different hash than the previous one.
      These two conditions could let the packet go through a different interface than
      the other packets of the same flow:
      
          # tcpdump -qltnni veth0 |sed 's/^/0: /' &
          # tcpdump -qltnni veth1 |sed 's/^/1: /' &
          # hping3 -2 192.168.0.2 -p 9
          0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
      
      An ICMP error packet contains the header of the packet which caused the network
      error, so inspect it and match the flow against it, so we can send the ICMP via
      the same interface of the previous packet in the flow.
      Move the IP and port dissect code into a generic function bond_flow_ip() and if
      we are dissecting an ICMP error packet, call it again with the adjusted offset.
      
          # hping3 -2 192.168.0.2 -p 9
          1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0
          0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0
          0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
      Signed-off-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df98be06
    • Horatiu Vultur's avatar
      net: mscc: ocelot: omit error check from of_get_phy_mode · 4214fa1e
      Horatiu Vultur authored
      The commit 0c65b2b9 ("net: of_get_phy_mode: Change API to solve
      int/unit warnings") updated the function of_get_phy_mode declaration.
      Now it returns an error code and in case the node doesn't contain the
      property 'phy-mode' or 'phy-connection-type' it returns -EINVAL and would
      set the phy_interface_t to PHY_INTERFACE_MODE_NA.
      
      Ocelot VSC7514 has 4 internal phys which have the phy interface
      PHY_INTERFACE_MODE_NA. So because of_get_phy_mode would assign
      PHY_INTERFACE_MODE_NA to phy_mode when there is an error, there is no need
      to add the error check.
      
      Updates for v2:
       - drop error check because of_get_phy_mode already assigns phy_interface
         to PHY_INTERFACE_MODE in case of error.
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4214fa1e
    • Alexander Lobakin's avatar
      net: core: allow fast GRO for skbs with Ethernet header in head · 8aef998d
      Alexander Lobakin authored
      Commit 78d3fd0b ("gro: Only use skb_gro_header for completely
      non-linear packets") back in May'09 (v2.6.31-rc1) has changed the
      original condition '!skb_headlen(skb)' to
      'skb->mac_header == skb->tail' in gro_reset_offset() saying: "Since
      the drivers that need this optimisation all provide completely
      non-linear packets" (note that this condition has become the current
      'skb_mac_header(skb) == skb_tail_pointer(skb)' later with commmit
      ced14f68 ("net: Correct comparisons and calculations using
      skb->tail and skb-transport_header") without any functional changes).
      
      For now, we have the following rough statistics for v5.4-rc7:
      1) napi_gro_frags: 14
      2) napi_gro_receive with skb->head containing (most of) payload: 83
      3) napi_gro_receive with skb->head containing all the headers: 20
      4) napi_gro_receive with skb->head containing only Ethernet header: 2
      
      With the current condition, fast GRO with the usage of
      NAPI_GRO_CB(skb)->frag0 is available only in the [1] case.
      Packets pushed by [2] and [3] go through the 'slow' path, but
      it's not a problem for them as they already contain all the needed
      headers in skb->head, so pskb_may_pull() only moves skb->data.
      
      The layout of skbs in the fourth [4] case at the moment of
      dev_gro_receive() is identical to skbs that have come through [1],
      as napi_frags_skb() pulls Ethernet header to skb->head. The only
      difference is that the mentioned condition is always false for them,
      because skb_put() and friends irreversibly alter the tail pointer.
      They also go through the 'slow' path, but now every single
      pskb_may_pull() in every single .gro_receive() will call the *really*
      slow __pskb_pull_tail() to pull headers to head. This significantly
      decreases the overall performance for no visible reasons.
      
      The only two users of method [4] is:
      * drivers/staging/qlge
      * drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq)
      
      Note that in case with wireless drivers we can't use [1]
      (napi_gro_frags()) at least for now and mac80211 stack always
      performs pushes and pulls anyways, so performance hit is inavoidable.
      
      At the moment of v2.6.31 the mentioned change was necessary (that's
      why I don't add the "Fixes:" tag), but it became obsolete since
      skb_gro_mac_header() has gone in commit a50e233c ("net-gro:
      restore frag0 optimization"), so we can simply revert the condition
      in gro_reset_offset() to allow skbs from [4] go through the 'fast'
      path just like in case [1].
      
      This was tested on a 600 MHz MIPS CPU and a custom driver and this
      patch gave boosts up to 40 Mbps to method [4] in both directions
      comparing to net-next, which made overall performance relatively
      close to [1] (without it, [4] is the slowest).
      
      v2:
      - Add more references and explanations to commit message
      - Fix some typos ibid
      - No functional changes
      Signed-off-by: default avatarAlexander Lobakin <alobakin@dlink.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8aef998d
    • Dag Moxnes's avatar
      rds: ib: update WR sizes when bringing up connection · a36e629e
      Dag Moxnes authored
      Currently WR sizes are updated from rds_ib_sysctl_max_send_wr and
      rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result,
      a connection being down while rds_ib_sysctl_max_send_wr or
      rds_ib_sysctl_max_recv_wr are updated, will not update the sizes when
      it comes back up.
      
      Move resizing of WRs to rds_ib_setup_qp so that connections will be setup
      with the most current WR sizes.
      Signed-off-by: default avatarDag Moxnes <dag.moxnes@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a36e629e
    • Chuhong Yuan's avatar
      net: gemini: add missed free_netdev · 18d647ae
      Chuhong Yuan authored
      This driver forgets to free allocated netdev in remove like
      what is done in probe failure.
      Add the free to fix it.
      Signed-off-by: default avatarChuhong Yuan <hslester96@gmail.com>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18d647ae
    • David S. Miller's avatar
      Merge branch 'bnx2x-Remove-function-casts' · f92e88db
      David S. Miller authored
      Kees Cook says:
      
      ====================
      bnx2x: Remove function casts
      
      In order to make the entire kernel usable under Clang's Control Flow
      Integrity protections, function prototype casts need to be avoided
      because this will trip CFI checks at runtime (i.e. a mismatch between
      the caller's expected function prototype and the destination function's
      prototype). Many of these cases can be found with -Wcast-function-type,
      which found that bnx2x had a bunch of needless (or at least confusing)
      function casts. This series removes them all.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f92e88db
    • Kees Cook's avatar
      bnx2x: Remove hw_reset_t function casts · 548e5ffe
      Kees Cook authored
      All .rw_reset callbacks except bnx2x_84833_hw_reset_phy() use a
      void return type. No callers of .hw_reset check a return value and
      bnx2x_84833_hw_reset_phy() unconditionally returns 0. Remove all
      hw_reset_t casts and fix the return type to void.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      548e5ffe
    • Kees Cook's avatar
      bnx2x: Remove format_fw_ver_t function casts · 26658f6b
      Kees Cook authored
      The return values for format_fw_ver_t callbacks are supposed to be
      "int", not "u8". Ultimately, the top-level caller doesn't actually check
      the return value at all, but just clean this all up anyway and fix the
      prototypes so that casts are no longer needed.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26658f6b
    • Kees Cook's avatar
      bnx2x: Remove config_init_t function casts · 3e19d1f2
      Kees Cook authored
      No callers of .config_init check return values. Remove the casting and
      change all callbacks to have the correct function prototype.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e19d1f2
    • Kees Cook's avatar
      bnx2x: Remove read_status_t function casts · 2c855d73
      Kees Cook authored
      The function casts for .read_status callbacks end up casting some int
      return values to u8. This seems to be bug-prone (-EINVAL being returned
      into something that appears to be true/false), but fixing the function
      prototypes doesn't change the existing behavior. Fix the return values
      to remove the casts.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c855d73
    • Kees Cook's avatar
      bnx2x: Drop redundant callback function casts · 86c1fe88
      Kees Cook authored
      NULL is already "void *" so it will auto-cast in assignments and
      initializers. Additionally, all the callbacks for .link_reset,
      .config_loopback, .set_link_led, and .phy_specific_func are already
      correct. No casting is needed for these, so remove them.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86c1fe88
    • Po Liu's avatar
      enetc: update TSN Qbv PSPEED set according to adjust link speed · 2e47cb41
      Po Liu authored
      ENETC has a register PSPEED to indicate the link speed of hardware.
      It is need to update accordingly. PSPEED field needs to be updated
      with the port speed for QBV scheduling purposes. Or else there is
      chance for gate slot not free by frame taking the MAC if PSPEED and
      phy speed not match. So update PSPEED when link adjust. This is
      implement by the adjust_link.
      Signed-off-by: default avatarPo Liu <Po.Liu@nxp.com>
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2e47cb41
    • Po Liu's avatar
      enetc: Configure the Time-Aware Scheduler via tc-taprio offload · 34c6adf1
      Po Liu authored
      ENETC supports in hardware for time-based egress shaping according
      to IEEE 802.1Qbv. This patch implement the Qbv enablement by the
      hardware offload method qdisc tc-taprio method.
      Also update cbdr writeback to up level since control bd ring may
      writeback data to control bd ring.
      Signed-off-by: default avatarPo Liu <Po.Liu@nxp.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34c6adf1
    • Jonathan Lemon's avatar
      page_pool: do not release pool until inflight == 0. · c3f812ce
      Jonathan Lemon authored
      The page pool keeps track of the number of pages in flight, and
      it isn't safe to remove the pool until all pages are returned.
      
      Disallow removing the pool until all pages are back, so the pool
      is always available for page producers.
      
      Make the page pool responsible for its own delayed destruction
      instead of relying on XDP, so the page pool can be used without
      the xdp memory model.
      
      When all pages are returned, free the pool and notify xdp if the
      pool is registered with the xdp memory system.  Have the callback
      perform a table walk since some drivers (cpsw) may share the pool
      among multiple xdp_rxq_info.
      
      Note that the increment of pages_state_release_cnt may result in
      inflight == 0, resulting in the pool being released.
      
      Fixes: d956a048 ("xdp: force mem allocator removal and periodic warning")
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarIlias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3f812ce