1. 17 Apr, 2021 12 commits
    • net: enetc: fix buffer leaks with XDP_TX enqueue rejections · 92ff9a6e
      Vladimir Oltean authored
      If the TX ring is congested, enetc_xdp_tx() returns false for the
      current XDP frame (represented as an array of software BDs).
      
      This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd
      from software BDs freshly cleaned from the RX ring. The issue is that we
      scrub the RX software BDs too soon, more precisely before we know that
      we can enqueue the TX BDs successfully into the TX ring.
      
      If we can't enqueue them (and enetc_xdp_tx returns false), we call
      enetc_xdp_drop which attempts to recycle the buffers held by the RX
      software BDs. But because we scrubbed those RX BDs already, two things
      happen:
      
      (a) we leak their memory
      (b) we populate the RX software BD ring with an all-zero rx_swbd
          structure, which makes the buffer refill path allocate more memory.
      
      enetc_refill_rx_ring
      -> if (unlikely(!rx_swbd->page))
         -> enetc_new_page
      
      That is a recipe for fast OOM.
      
      Fixes: 7ed2bc80 ("net: enetc: add support for XDP_TX")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
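The ordering problem described in this commit can be sketched in plain C. This is a minimal userspace model, not the driver's code: the structs, the ring size and the helper names (`tx_enqueue`, `xdp_tx_frame`) are hypothetical. It only shows the idea of scrubbing the RX software BDs after the TX enqueue is known to have succeeded, so that a rejected frame can still be recycled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the driver's software BDs. */
struct rx_swbd { void *page; };
struct tx_swbd { void *page; };

#define TX_RING_SIZE 4

struct tx_ring {
	struct tx_swbd swbd[TX_RING_SIZE];
	int used;
};

/* Returns false when the ring cannot hold all n BDs (congestion). */
static bool tx_enqueue(struct tx_ring *tx, const struct tx_swbd *arr, int n)
{
	if (tx->used + n > TX_RING_SIZE)
		return false;
	memcpy(&tx->swbd[tx->used], arr, n * sizeof(*arr));
	tx->used += n;
	return true;
}

/*
 * Build the TX BDs from the RX BDs without touching the RX side, and
 * scrub the RX BDs only once the enqueue has succeeded. On failure the
 * RX BDs still own their pages, so the caller can recycle them.
 */
static bool xdp_tx_frame(struct tx_ring *tx, struct rx_swbd *rx, int n)
{
	struct tx_swbd arr[TX_RING_SIZE];
	int i;

	assert(n >= 0);
	if (n > TX_RING_SIZE)
		return false;
	for (i = 0; i < n; i++)
		arr[i].page = rx[i].page;
	if (!tx_enqueue(tx, arr, n))
		return false;		/* rx[] untouched: nothing leaked */
	memset(rx, 0, n * sizeof(*rx));	/* ownership moved to the TX ring */
	return true;
}
```

With a congested ring, `xdp_tx_frame()` returns false and every `rx[i].page` is still valid, which is the property the drop path relies on.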
    • net: enetc: handle the invalid XDP action the same way as XDP_DROP · 975acc83
      Vladimir Oltean authored
      When the XDP program returns an invalid action, we should free the RX
      buffer.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: use dedicated TX rings for XDP · 7eab503b
      Vladimir Oltean authored
      It is possible for one CPU to perform TX hashing (see netdev_pick_tx)
      between the 8 ENETC TX rings, and the TX hashing to select TX queue 1.
      
      At the same time, it is possible for the other CPU to already use TX
      ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual
      exclusion between XDP and the network stack, we run into an issue
      because the ENETC TX procedure is not reentrant.
      
      The obvious approach would be to just make XDP take the lock of the
      network stack's TX queue corresponding to the ring it's about to enqueue
      in.
      
      For XDP_REDIRECT, this is quite straightforward, a lock at the beginning
      and end of enetc_xdp_xmit() should do the trick.
      
      But for XDP_TX, it's a bit more complicated. For one, we do TX batching
      all by ourselves for frames with the XDP_TX verdict. This is something
      we would like to keep the way it is, for performance reasons. But
      batching means that the network stack's lock should be kept from the
      first enqueued XDP_TX frame and until we ring the doorbell. That is
      mostly fine, except for cases when in the same NAPI loop we have mixed
      XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while
      we are holding the lock from the RX NAPI, then bam, deadlock. The naive
      answer could be 'just flush the XDP_TX frames first, then release the
      network stack's TX queue lock, then call xdp_do_flush_map()'. But even
      xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT
      frames, so unless we unlock/relock the TX queue around xdp_do_redirect(),
      there simply isn't any clean way to protect XDP_TX from concurrent
      network stack .ndo_start_xmit() on another CPU.
      
      So we need to take a different approach, and that is to reserve two
      rings for the sole use of XDP. We leave TX rings
      0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we
      pick the XDP rings from the end of the priv->tx_ring array.
      
      We make an effort to keep the mapping done by enetc_alloc_msix() which
      decides which CPU handles the TX completions of which TX ring in its
      NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the
      XDP TX ring of CPU 1 is handled by TX ring 7.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
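The ring numbering in this commit message (8 TX rings, CPU 0 using ring 6 and CPU 1 using ring 7) follows from taking one ring per CPU off the end of the array. A minimal sketch of that arithmetic, with hypothetical helper names that are not taken from the driver:

```c
#include <assert.h>

/*
 * Hypothetical helper: index of the TX ring reserved for XDP on a given
 * CPU, assuming one XDP ring per CPU taken from the end of the array.
 */
static int xdp_tx_ring_index(int num_tx_rings, int num_cpus, int cpu)
{
	assert(num_cpus > 0 && num_cpus <= num_tx_rings);
	assert(cpu >= 0 && cpu < num_cpus);
	return num_tx_rings - num_cpus + cpu;
}

/* Number of rings left to the network stack (rings 0..n-1). */
static int stack_tx_rings(int num_tx_rings, int num_cpus)
{
	assert(num_cpus > 0 && num_cpus <= num_tx_rings);
	return num_tx_rings - num_cpus;
}
```

For 8 rings and 2 CPUs this leaves rings 0..5 to the stack and gives rings 6 and 7 to XDP, matching the example in the message.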
    • net: enetc: increase TX ring size · ee3e875f
      Vladimir Oltean authored
      Now that commit d6a2829e ("net: enetc: increase RX ring default
      size") has increased the RX ring size, it is quite easy to congest the
      TX rings when the traffic is predominantly XDP_TX, as the RX ring is
      quite a bit larger than the TX one.
      
      Since we bit the bullet and did the expensive thing already (larger RX
      rings consume more memory pages), it seems quite foolish to keep the TX
      rings small. So make them equally sized with RX.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: remove unneeded xdp_do_flush_map() · a6369fe6
      Vladimir Oltean authored
      xdp_do_redirect already contains:
      -> dev_map_enqueue
         -> __xdp_enqueue
            -> bq_enqueue
               -> bq_xmit_all // if we have more than 16 frames
      
      So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK
      is 128.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
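The argument in this commit is about thresholds: a bulk queue that flushes itself at 16 frames never lets a counter reach 128 between flushes. The sketch below models that behaviour in userspace. The struct and the names `bq_add`, `bq_flush` and `flushes_after` are illustrative assumptions, not the kernel's functions; only the value 16 comes from the commit message.

```c
#include <assert.h>

#define BULK_SIZE 16	/* the threshold quoted in the commit message */

struct bulk_queue {
	int count;	/* frames currently batched */
	int flushes;	/* how many times the batch was transmitted */
};

static void bq_flush(struct bulk_queue *bq)
{
	if (bq->count) {
		bq->flushes++;
		bq->count = 0;
	}
}

/* Flush a full batch before queueing the next frame. */
static void bq_add(struct bulk_queue *bq)
{
	assert(bq->count <= BULK_SIZE);
	if (bq->count == BULK_SIZE)
		bq_flush(bq);
	bq->count++;
}

/* Enqueue n frames and report how many flushes happened on the way. */
static int flushes_after(int n)
{
	struct bulk_queue bq = { 0, 0 };

	while (n-- > 0)
		bq_add(&bq);
	return bq.flushes;
}
```

In this model the batch never holds more than 16 frames, so a driver-side check that waits for 128 pending redirected frames would not fire.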
    • net: enetc: stop XDP NAPI processing when build_skb() fails · 8f50d8bb
      Vladimir Oltean authored
      When the code path below fails:
      
      enetc_clean_rx_ring_xdp // XDP_PASS
      -> enetc_build_skb
         -> enetc_map_rx_buff_to_skb
            -> build_skb
      
      enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't
      strong enough to actually break the NAPI poll loop, just the switch/case
      statement for XDP actions. So we increment rx_frm_cnt and go to the next
      frames minding our own business.
      
      Instead let's do what the skb NAPI poll function does, and break the
      loop now, waiting for the memory pressure to go away. Otherwise the next
      calls to build_skb() are likely to fail too.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
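The pitfall described here is general C behaviour: a `break` inside a `switch` that sits inside a loop leaves only the `switch`. The following sketch contrasts the two shapes. The function names and the `build_ok` array (1 when building the skb would succeed for that frame) are hypothetical and only model the control flow, not the driver.

```c
#include <assert.h>

enum action { ACT_DROP, ACT_PASS };

/*
 * Problematic shape: 'break' leaves only the switch, so the loop
 * carries on and the failed frame is still counted.
 */
static int poll_break_switch_only(const int *build_ok, int n)
{
	enum action act = ACT_PASS;
	int i, frames = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		switch (act) {
		case ACT_PASS:
			if (!build_ok[i])
				break;	/* exits the switch, not the loop */
			break;
		default:
			break;
		}
		frames++;
	}
	return frames;
}

/* Fixed shape: leave the loop as soon as building the skb fails. */
static int poll_stop_on_failure(const int *build_ok, int n)
{
	enum action act = ACT_PASS;
	int i, frames = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		switch (act) {
		case ACT_PASS:
			if (!build_ok[i])
				goto out;
			break;
		default:
			break;
		}
		frames++;
	}
out:
	return frames;
}
```

With a failure on the third of four frames, the first variant still reports four processed frames while the second stops after two.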
    • net: enetc: recycle buffers for frames with RX errors · 672f9a21
      Vladimir Oltean authored
      When receiving a frame with errors, currently we do nothing with it (we
      don't construct an skb or an xdp_buff), we just exit the NAPI poll loop.
      
      Let's put the buffer back into the RX ring (similar to XDP_DROP).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: rename the buffer reuse helpers · 6b04830d
      Vladimir Oltean authored
      enetc_put_xdp_buff has nothing to do with XDP, frankly, it is just a
      helper to populate the recycle end of the shadow RX BD ring
      (next_to_alloc) with a given buffer.
      
      On the other hand, enetc_put_rx_buff plays more tricks than its name
      would suggest.
      
      So let's rename enetc_put_rx_buff into enetc_flip_rx_buff to reflect the
      half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff
      into enetc_put_rx_buff which suggests a more garden-variety operation.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
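To illustrate the distinction the new names draw, here is a rough userspace sketch of a "flip" versus a "put". It assumes a 4096-byte page split into two halves. The struct and both functions are simplified, hypothetical stand-ins and leave out what a real driver would also handle, such as page reference counts and DMA mapping.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ   4096u			/* assumed page size */
#define HALF_PAGE (PAGE_SZ / 2)

/* Hypothetical, simplified software RX BD. */
struct rx_buf {
	void *page;
	unsigned int page_offset;	/* 0 or HALF_PAGE */
};

/*
 * "Flip": point the BD at the other half of the same page, so the half
 * just handed out is not overwritten by the next received frame.
 */
static void flip_rx_buff(struct rx_buf *buf)
{
	assert(buf->page_offset == 0 || buf->page_offset == HALF_PAGE);
	buf->page_offset ^= HALF_PAGE;
}

/* "Put": place a buffer, as is, into a slot of the shadow ring. */
static void put_rx_buff(struct rx_buf *slot, const struct rx_buf *buf)
{
	*slot = *buf;
}
```

The "put" is the garden-variety operation, while the "flip" carries the half-page reuse trick, which is the split the rename is meant to convey.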
    • net: enetc: remove redundant clearing of skb/xdp_frame pointer in TX conf path · e9e49ae8
      Vladimir Oltean authored
      Later in enetc_clean_tx_ring we have:
      
      		/* Scrub the swbd here so we don't have to do that
      		 * when we reuse it during xmit
      		 */
      		memset(tx_swbd, 0, sizeof(*tx_swbd));
      
      So these assignments are unnecessary.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · bc45f524
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2021-04-16
      
      This series contains updates to igb and igc drivers.
      
      Ederson adjusts Tx buffer distributions in Qav mode to improve
      TSN-aware traffic for igb. He also enables PPS support and auxiliary PHC
      functions for igc.
      
      Grzegorz checks that the MTA register was properly written and
      retries if not for igb.
      
      Sasha adds reporting of EEE low power idle counters to ethtool and fixes
      a return value being overwritten through looping for igc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target() · 1e3d976d
      Gustavo A. R. Silva authored
      Fix the following out-of-bounds warning:
      
      net/core/flow_dissector.c:835:3: warning: 'memcpy' offset [33, 48] from the object at 'flow_keys' is out of the bounds of referenced subobject 'ipv6_src' with type '__u32[4]' {aka 'unsigned int[4]'} at offset 16 [-Warray-bounds]
      
      The problem is that the original code is trying to copy data into a
      couple of struct members adjacent to each other in a single call to
      memcpy().  So, the compiler legitimately complains about it. As these
      are just a couple of members, fix this by copying each one of them in
      separate calls to memcpy().
      
      This helps with the ongoing efforts to globally enable -Warray-bounds
      and get us closer to being able to tighten the FORTIFY_SOURCE routines
      on memcpy().
      
      Link: https://github.com/KSPP/linux/issues/109
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
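The fix pattern described in this commit, one memcpy() per struct member instead of a single call spanning two adjacent members, can be sketched as follows. The two structs and the function name are hypothetical and only mirror the shape of the problem; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structs, each with two adjacent address members. */
struct src_keys { uint32_t ipv6_src[4]; uint32_t ipv6_dst[4]; };
struct dst_keys { uint32_t src[4]; uint32_t dst[4]; };

/*
 * One memcpy() per member: each copy stays within the bounds of the
 * subobject it names, which is what -Warray-bounds checks.
 */
static void copy_addrs(struct dst_keys *to, const struct src_keys *from)
{
	assert(sizeof(to->src) == sizeof(from->ipv6_src));
	assert(sizeof(to->dst) == sizeof(from->ipv6_dst));
	memcpy(to->src, from->ipv6_src, sizeof(to->src));
	memcpy(to->dst, from->ipv6_dst, sizeof(to->dst));
}
```

The copied bytes are the same as with a single 32-byte copy starting at the first member. Only the way the bounds are expressed to the compiler changes.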
    • Merge branch 'ethtool-stats' · 1c86514d
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      ethtool: add uAPI for reading standard stats
      
      Continuing the effort of providing a unified access method
      to standard stats, and explicitly tying the definitions to
      the standards, this series adds an API for general stats
      which do not fit into more targeted control APIs.
      
      There is nothing clever here, just a netlink API for dumping
      statistics defined by standards and RFCs which today end up
      in ethtool -S under infinite variations of names.
      
      This series adds basic IEEE stats (for PHY, MAC, Ctrl frames)
      and RMON stats. AFAICT other RFCs only duplicate the IEEE
      stats.
      
      This series does _not_ add a netlink API to read driver-defined
      stats. There seems to be little to gain from moving that part
      to netlink.
      
      The netlink message format is very simple, and aims to allow
      adding stats and groups with no changes to user tooling (which
      IIUC is expected for ethtool).
      
      On user space side we can re-use -S, and make it dump
      standard stats if --groups are defined.
      
      $ ethtool -S eth0 --groups eth-phy eth-mac eth-ctrl rmon
      Stats for eth0:
      eth-phy-SymbolErrorDuringCarrier: 0
      eth-mac-FramesTransmittedOK: 0
      eth-mac-FrameTooLongErrors: 0
      eth-ctrl-MACControlFramesTransmitted: 0
      eth-ctrl-MACControlFramesReceived: 1
      eth-ctrl-UnsupportedOpcodesReceived: 0
      rmon-etherStatsUndersizePkts: 0
      rmon-etherStatsJabbers: 0
      rmon-rx-etherStatsPkts64Octets: 1
      rmon-rx-etherStatsPkts128to255Octets: 0
      rmon-rx-etherStatsPkts1024toMaxOctets: 1
      rmon-tx-etherStatsPkts64Octets: 1
      rmon-tx-etherStatsPkts128to255Octets: 0
      rmon-tx-etherStatsPkts1024toMaxOctets: 1
      
      v1:
      
      Driver support for mlxsw, mlx5 and bnxt included.
      
      Compared to the RFC I went ahead with wrapping the stats into
      a 1:1 nest. Now IDs of stats can start from 0, at a cost of
      slightly "careful" u64 alignment handling.
      
      v2:
      
      Add missing kdoc in patch 5.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Apr, 2021 28 commits
    • mlx5: implement ethtool standard stats · b572ec9f
      Jakub Kicinski authored
      Add support for PHY/MAC/Ctrl/RMON stats.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt: implement ethtool standard stats · 782bc00a
      Jakub Kicinski authored
      Most of the names seem to strongly correlate with names from
      the standard and RFC. Whether ..+good_frames are indeed Frames..OK
      I'm the least sure of.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: implement ethtool standard stats · c1912ab0
      Jakub Kicinski authored
      mlxsw has nicely grouped stats, add support for standard uAPI.
      I'm guessing the register access part. Compile tested only.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add interface to read RMON stats · a8b06e9d
      Jakub Kicinski authored
      Most devices maintain RMON (RFC 2819) stats - particularly
      the "histogram" of packets received by size. Unlike other
      RFCs which duplicate IEEE stats, the short/oversized frame
      counters in RMON don't seem to match IEEE stats 1-to-1 either,
      so expose those, too. Do not expose basic packet, CRC errors
      etc - those are already otherwise covered.
      
      Because the standard defines packet ranges only up to 1518, and
      everything above that should theoretically be "oversized",
      devices often create their own ranges.
      
      Going beyond what the RFC defines - expose the "histogram"
      in the Tx direction (assume for now that the ranges will
      be the same).
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
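As a small illustration of the size "histogram" this commit refers to, the sketch below maps a frame length to one of the six ranges defined by RFC 2819 (64, 65-127, 128-255, 256-511, 512-1023, 1024-1518 octets). The table layout and the function name `rmon_bucket` are assumptions for this example. Devices that define their own ranges above 1518 would need a different table.

```c
#include <assert.h>

/* Size ranges of the etherStatsPkts*Octets counters in RFC 2819. */
static const struct { int low, high; } rmon_ranges[] = {
	{ 64, 64 }, { 65, 127 }, { 128, 255 },
	{ 256, 511 }, { 512, 1023 }, { 1024, 1518 },
};
#define RMON_NUM_RANGES \
	((int)(sizeof(rmon_ranges) / sizeof(rmon_ranges[0])))

/*
 * Returns the bucket index for a frame length, or -1 when the length
 * falls outside every range (undersized or oversized frames).
 */
static int rmon_bucket(int len)
{
	int i;

	assert(len >= 0);
	for (i = 0; i < RMON_NUM_RANGES; i++)
		if (len >= rmon_ranges[i].low && len <= rmon_ranges[i].high)
			return i;
	return -1;
}
```

A 1519-byte frame falls outside the table, which is the gap that device-specific ranges fill.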
    • ethtool: add interface to read standard MAC Ctrl stats · bfad2b97
      Jakub Kicinski authored
      A number of devices maintain the standard-based MAC control
      counters for control frames. Add an API for those.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add interface to read standard MAC stats · ca224454
      Jakub Kicinski authored
      Most of the MAC statistics are included in
      struct rtnl_link_stats64, but some fields
      are aggregated. Besides it's good to expose
      these clearly hardware stats separately.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add a new command for reading standard stats · f09ea6fb
      Jakub Kicinski authored
      Add an interface for reading standard stats, including
      stats which don't have a corresponding control interface.
      
      Start with IEEE 802.3 PHY stats. There seems to be only
      one stat to expose there.
      
      Define API to not require user space changes when new
      stats or groups are added. Groups are based on bitset,
      stats have a string set associated.
      
      v1: wrap stats in a nest
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: ethtool: document standard statistics · ddc78b36
      Jakub Kicinski authored
      Add documentation for ETHTOOL_MSG_STATS_GET.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: networking: extend the statistics documentation · f117c48c
      Jakub Kicinski authored
      Make the lack of expectations for switching NICs explicit,
      describe the new stats.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Fix out-of-bounds warning in sctp_process_asconf_param() · e5272ad4
      Gustavo A. R. Silva authored
      Fix the following out-of-bounds warning:
      
      net/sctp/sm_make_chunk.c:3150:4: warning: 'memcpy' offset [17, 28] from the object at 'addr' is out of the bounds of referenced subobject 'v4' with type 'struct sockaddr_in' at offset 0 [-Warray-bounds]
      
      This helps with the ongoing efforts to globally enable -Warray-bounds
      and get us closer to being able to tighten the FORTIFY_SOURCE routines
      on memcpy().
      
      Link: https://github.com/KSPP/linux/issues/109
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2021-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 03e481e8
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2021-04-16
      
      This patchset introduces updates to mlx5e netdev driver.
      
      1) Tariq refactors TLS offloads and adds resiliency against RX resync
         failures
      
      2) Maxim reduces code duplication by unifying the channels reset flow,
         regardless of whether channels are closed or open

      3) Aya enhances TX/RX health reporter diagnostics to expose the
         internal clock time-stamping format
      
      4) Moshe adds support for ethtool extended link state, to show the reason
         for link down
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'gianfar-mq-polling' · 70c18375
      David S. Miller authored
      Claudiu Manoil says:
      
      ====================
      net: gianfar: Drop GFAR_MQ_POLLING support
      
      Drop long time obsolete "per NAPI multi-queue" support in gianfar,
      and related (and undocumented) device tree properties.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and fsl,tx-bit-map properties · 221e8c12
      Claudiu Manoil authored
      These are very old properties that were used by the "gianfar" ethernet
      driver.  They don't have documented bindings and are obsolete.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Drop GFAR_MQ_POLLING support · 8eda54c5
      Claudiu Manoil authored
      Gianfar used to enable all 8 Rx queues (DMA rings) per
      ethernet device, even though the controller can only
      support 2 interrupt lines at most.  This meant that
      multiple Rx queues would have to be grouped per NAPI poll
      routine, and the CPU would have to split the budget and
      service them in a round robin manner.  The overhead of
      this scheme proved to outweigh the potential benefits.
      The alternative was to introduce the "Single Queue" polling
      mode, supporting one Rx queue per NAPI, which became the
      default packet processing option and helped improve the
      performance of the driver.
      MQ_POLLING also relies on undocumented device tree properties
      to specify how to map the 8 Rx and Tx queues to a given
      interrupt line (aka "interrupt group").  Using module parameters
      to enable this mode wasn't an option either.  Long story short,
      MQ_POLLING became obsolete, now it is just dead code, and no
      one asked for it so far.
      For the Tx queues, multi-queue support (more than 1 Tx queue
      per CPU) could be revisited by adding tc MQPRIO support, but
      again, one has to consider that there are only 2 interrupt lines.
      So the NAPI poll routine would have to service multiple Tx rings.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
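The budget-splitting scheme that this commit removes can be pictured with a toy model. This is an assumption-laden sketch, not the gianfar code: it divides the NAPI budget evenly between the queues grouped under one NAPI instance, and both function names are made up for the example.

```c
#include <assert.h>

/*
 * Hypothetical sketch: one NAPI instance servicing several Rx queues
 * has to divide its budget between them.
 */
static int budget_per_queue(int budget, int num_queues)
{
	assert(num_queues > 0);
	return budget / num_queues;
}

/*
 * Round-robin poll: pending[q] frames wait on queue q, and each queue
 * gets at most its share of the budget. Returns the frames processed.
 */
static int poll_round_robin(int *pending, int num_queues, int budget)
{
	int share = budget_per_queue(budget, num_queues);
	int q, done = 0;

	for (q = 0; q < num_queues; q++) {
		int n = pending[q] < share ? pending[q] : share;

		pending[q] -= n;
		done += n;
	}
	return done;
}
```

In this model, with a budget of 64 spread over 8 queues, a single busy queue is serviced for only 8 frames per poll even when the other 7 queues are idle, which hints at why one queue per NAPI can perform better.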
    • veth: check for NAPI instead of xdp_prog before xmit of XDP frame · 0e672f30
      Toke Høiland-Jørgensen authored
      The recent patch that tied enabling of veth NAPI to the GRO flag also has
      the nice side effect that a veth device can be the target of an
      XDP_REDIRECT without an XDP program needing to be loaded on the peer
      device. However, the patch adding this extra NAPI mode didn't actually
      change the check in veth_xdp_xmit() to also look at the new NAPI pointer,
      so let's fix that.
      
      Fixes: 6788fa154546 ("veth: allow enabling NAPI even without XDP")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mld: fix suspicious RCU usage in __ipv6_dev_mc_dec() · aa8caa76
      Taehee Yoo authored
      __ipv6_dev_mc_dec() internally uses sleepable functions, so its caller
      must not hold atomic locks. But its caller, addrconf_verify_rtnl(),
      acquires rcu_read_lock_bh(), so this warning is triggered in
      __ipv6_dev_mc_dec().
      
      Test commands:
          ip netns add A
          ip link add veth0 type veth peer name veth1
          ip link set veth1 netns A
          ip link set veth0 up
          ip netns exec A ip link set veth1 up
          ip a a 2001:db8::1/64 dev veth0 valid_lft 2 preferred_lft 1
      
      Splat looks like:
      ============================
      WARNING: suspicious RCU usage
      5.12.0-rc6+ #515 Not tainted
      -----------------------------
      kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side
      critical section!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      4 locks held by kworker/4:0/1997:
       #0: ffff88810bd72d48 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at:
      process_one_work+0x761/0x1440
       #1: ffff888105c8fe00 ((addr_chk_work).work){+.+.}-{0:0}, at:
      process_one_work+0x795/0x1440
       #2: ffffffffb9279fb0 (rtnl_mutex){+.+.}-{3:3}, at:
      addrconf_verify_work+0xa/0x20
       #3: ffffffffb8e30860 (rcu_read_lock_bh){....}-{1:2}, at:
      addrconf_verify_rtnl+0x23/0xc60
      
      stack backtrace:
      CPU: 4 PID: 1997 Comm: kworker/4:0 Not tainted 5.12.0-rc6+ #515
      Workqueue: ipv6_addrconf addrconf_verify_work
      Call Trace:
       dump_stack+0xa4/0xe5
       ___might_sleep+0x27d/0x2b0
       __mutex_lock+0xc8/0x13f0
       ? lock_downgrade+0x690/0x690
       ? __ipv6_dev_mc_dec+0x49/0x2a0
       ? mark_held_locks+0xb7/0x120
       ? mutex_lock_io_nested+0x1270/0x1270
       ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
       ? _raw_spin_unlock_irqrestore+0x47/0x50
       ? trace_hardirqs_on+0x41/0x120
       ? __wake_up_common_lock+0xc9/0x100
       ? __wake_up_common+0x620/0x620
       ? memset+0x1f/0x40
       ? netlink_broadcast_filtered+0x2c4/0xa70
       ? __ipv6_dev_mc_dec+0x49/0x2a0
       __ipv6_dev_mc_dec+0x49/0x2a0
       ? netlink_broadcast_filtered+0x2f6/0xa70
       addrconf_leave_solict.part.64+0xad/0xf0
       ? addrconf_join_solict.part.63+0xf0/0xf0
       ? nlmsg_notify+0x63/0x1b0
       __ipv6_ifa_notify+0x22c/0x9c0
       ? inet6_fill_ifaddr+0xbe0/0xbe0
       ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
       ? __local_bh_enable_ip+0xa5/0xf0
       ? ipv6_del_addr+0x347/0x870
       ipv6_del_addr+0x3b1/0x870
       ? addrconf_ifdown+0xfe0/0xfe0
       ? rcu_read_lock_any_held.part.27+0x20/0x20
       addrconf_verify_rtnl+0x8a9/0xc60
       addrconf_verify_work+0xf/0x20
       process_one_work+0x84c/0x1440
      
      To avoid this problem, drop the lock with rcu_read_unlock_bh() for
      a short time. RCU is used here to prevent ifp (struct inet6_ifaddr *)
      from being freed while it is in use. But ifp will not be freed even
      after rcu_read_unlock_bh(), because in6_ifa_hold(ifp) is called
      before the unlock. So this is safe.
      
      Fixes: 63ed8de4 ("mld: add mc_lock for protecting per-interface mld data")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipa-fw-names' · d8214c7a
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: allow different firmware names
      
      Add the ability to define a "firmware-name" property in the IPA DT
      node, specifying an alternate name to use for the firmware file.
      Used only if the AP (Trust Zone) does early IPA initialization.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: optionally define firmware name via DT · 9ce062ba
      Alex Elder authored
      IPA initialization includes loading some firmware.  This step is
      done either by the modem or by the AP under Trust Zone.  If the
      AP loads firmware, the name of the firmware file is currently
      hard-coded ("ipa_fws.mdt").
      
      Add the ability to specify the relative path of the firmware file to
      use in a property in the Device Tree IPA node.  If the property is
      not found (or if any other error occurs attempting to get it), fall
      back to using a default relative path.
      
      Use the "old" fixed name as the default.  Rename the symbol that
      represents this default to emphasize its purpose.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: qcom,ipa: add firmware-name property · d8604b20
      Alex Elder authored
      Add a new optional firmware-name property to the IPA DT node.  It
      is used only if the modem is not doing early initialization (i.e.,
      if the modem-init property is not present).  Its value is the name
      of the firmware file to use; if it's not specified, a default name
      ("ipa_fws.mdt") is used.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: page_to_skb() use build_skb when there's sufficient tailroom · fb32856b
      Xuan Zhuo authored
      In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
      can use build_skb to create the skb directly, with no need to allocate
      additional space. It also saves a 'frags slot', which is very friendly
      to GRO.

      Here, if the payload of the received packet is too small (less than
      GOOD_COPY_LEN), we still choose to copy it directly to the space obtained
      from napi_alloc_skb, so we can reuse these pages.
      
      Testing Machine:
          The four queues of the network card are bound to the cpu1.
      
      Test command:
          for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done
      
      The size of the UDP packet is 1000, so with this patch there
      will always be enough tailroom to use build_skb. The sent UDP packets
      will be discarded because there is no port to receive them. With the
      machine's softirq load at 100%, we observe the receive rate reported by
      sar -n DEV 1:
      
      no build_skb:  956864.00 rxpck/s
      build_skb:    1158465.00 rxpck/s
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Suggested-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
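The decision described in this commit can be summarised as a three-way choice. The sketch below is a userspace model under stated assumptions: the values of `GOOD_COPY_LEN` and `SHINFO_SIZE` are placeholders, the exact boundary comparison is a guess, and `choose_rx_path` is a hypothetical name rather than anything in virtio-net.

```c
#include <assert.h>
#include <stddef.h>

#define GOOD_COPY_LEN 128	/* assumed small-packet threshold */
#define SHINFO_SIZE   320	/* assumed space needed for skb_shared_info */

enum rx_path { RX_COPY, RX_BUILD_SKB, RX_ADD_FRAG };

/*
 * buf_size: bytes available in the receive buffer;
 * headroom: bytes in front of the packet; len: packet length.
 */
static enum rx_path choose_rx_path(size_t buf_size, size_t headroom,
				   size_t len)
{
	size_t tailroom;

	assert(headroom + len <= buf_size);
	tailroom = buf_size - headroom - len;

	if (len <= GOOD_COPY_LEN)
		return RX_COPY;		/* small packet: copy, reuse the page */
	if (tailroom >= SHINFO_SIZE)
		return RX_BUILD_SKB;	/* shared info fits behind the data */
	return RX_ADD_FRAG;		/* otherwise attach the page as a frag */
}
```

Under these assumed sizes, a 1000-byte packet in a 2048-byte buffer leaves ample tailroom and takes the build_skb path, in line with the test setup described in the message.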
    • net: Add Qcom WWAN control driver · fa588eba
      Loic Poulain authored
      The MHI WWAN control driver allows MHI QCOM-based modems to expose
      different modem control protocols/ports via the WWAN framework, so that
      userspace modem tools or daemons (e.g. ModemManager) can control WWAN
      config and state (APN config, SMS, provider selection...). A QCOM-based
      modem can expose one or several of the following protocols:
      - AT: Well known AT commands interactive protocol (microcom, minicom...)
      - MBIM: Mobile Broadband Interface Model (libmbim, mbimcli)
      - QMI: QCOM MSM/Modem Interface (libqmi, qmicli)
      - QCDM: QCOM Modem diagnostic interface (libqcdm)
      - FIREHOSE: XML-based protocol for Modem firmware management
              (qmi-firmware-update)
      
      Note that this patch is mostly a rework of the earlier MHI UCI
      attempt, which was a generic interface for accessing the MHI bus from
      userspace. As suggested, this new version is WWAN specific and is
      dedicated to only exposing channels used for controlling a modem, and
      for which related open-source userspace support exists.
      Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add a WWAN subsystem · 9a44c1cc
      Loic Poulain authored
      This change introduces initial support for a WWAN framework. Given the
      complexity and heterogeneity of existing WWAN hardware and interfaces,
      there is no strict definition of what a WWAN device is and how it should
      be represented. It's often a collection of multiple devices that perform
      the global WWAN feature (netdev, tty, chardev, etc).
      
      One usual way to expose modem controls and configuration is via high
      level protocols such as the well known AT command protocol, MBIM or
      QMI. The USB modems started to expose them as character devices, and
      user daemons such as ModemManager learnt to use them.
      
      This initial version adds the concept of a WWAN port, which is a logical
      pipe to a modem control protocol. The protocols are exposed raw to
      userspace via a character device, allowing straightforward support in
      existing tools (ModemManager, ofono...). The WWAN core takes care of the
      generic part, including character device management, and relies on port
      driver operations to receive/submit protocol data.
      
      Since the different devices exposing protocols for the same WWAN hardware
      do not necessarily know about each other (e.g. two different USB
      interfaces, PCI/MHI channel devices...) and can be created/removed in
      different orders, the WWAN core ensures that all WWAN ports contributing
      to the 'whole' WWAN feature are grouped under the same virtual WWAN
      device, relying on the provided parent device (e.g. mhi controller,
      USB device). It's a 'trick' I copied from Johannes's earlier WWAN
      subsystem proposal.
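
The grouping by parent device described above can be sketched in plain C. This is a minimal illustration, not the WWAN core's code: the names `wwan_dev_sketch`, `sketch_add_port` and `sketch_remove_port` are hypothetical, the table is a fixed-size array, and there is no locking. It only shows the idea that ports sharing a parent land on the same virtual device, whatever order they arrive in, and that the device goes away with its last port.

```c
#include <stddef.h>

#define MAX_WWAN_DEVS 8 /* arbitrary table size, for the sketch only */

/* Hypothetical stand-in for a virtual WWAN device. */
struct wwan_dev_sketch {
    const void *parent;      /* opaque parent device (e.g. USB device, MHI controller) */
    unsigned int port_count; /* 0 means the slot is unused */
};

static struct wwan_dev_sketch devs[MAX_WWAN_DEVS];

/*
 * Register a port for the given parent. Reuses the virtual device that
 * already exists for this parent, or claims a free slot for a new one.
 * Returns the device index, or -1 if the table is full.
 */
static int sketch_add_port(const void *parent)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_WWAN_DEVS; i++) {
        if (devs[i].port_count && devs[i].parent == parent) {
            devs[i].port_count++;
            return i;
        }
        if (!devs[i].port_count && free_slot < 0)
            free_slot = i;
    }

    if (free_slot < 0)
        return -1;

    devs[free_slot].parent = parent;
    devs[free_slot].port_count = 1;
    return free_slot;
}

/* Drop one port; the virtual device is released with its last port. */
static void sketch_remove_port(int idx)
{
    if (idx < 0 || idx >= MAX_WWAN_DEVS || !devs[idx].port_count)
        return;
    if (--devs[idx].port_count == 0)
        devs[idx].parent = NULL;
}
```

Keying the lookup on the parent pointer is what lets independent port drivers register without knowing about each other.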
      
      This initial version is purposely minimalist: it essentially moves
      the generic part of the previously proposed mhi_wwan_ctrl driver into
      a common WWAN framework, but the implementation is open and flexible
      enough to allow extension for further drivers.
      Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a44c1cc
    • Stefan Chulski's avatar
      net: mvpp2: Add parsing support for different IPv4 IHL values · 4ad29b1a
      Stefan Chulski authored
      Add parser entries for different IPv4 IHL values.
      Each entry will set the L4 header offset according to the IPv4 IHL field.
      The L3 header offset is set during the parsing of the IPv4 protocol.
      
      Because parser support for IP header length > 20 was missing, RX IPv4
      checksum HW offload fails for such frames and skb->ip_summed is set to
      CHECKSUM_NONE (the checksum is then computed by the network stack).
      This patch adds RX IPv4 checksum HW offload capability for frames with
      IP header length > 20.
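
The relationship between the IHL field and the L4 header offset can be illustrated with a few lines of C. This is only a sketch of the arithmetic, not the mvpp2 code, which programs hardware parser entries rather than computing the offset in software. The function name `ipv4_l4_offset` is hypothetical. IHL is the low nibble of the first IPv4 header byte and counts 32-bit words, so valid values run from 5 (20 bytes, no options) to 15 (60 bytes).

```c
#include <stdint.h>

/*
 * Given the first byte of an IPv4 header (version in the high nibble,
 * IHL in the low nibble), return the offset of the L4 header in bytes
 * from the start of the IPv4 header, or -1 if the byte is not a valid
 * IPv4 version/IHL combination.
 */
static int ipv4_l4_offset(uint8_t ver_ihl)
{
    unsigned int version = ver_ihl >> 4;
    unsigned int ihl = ver_ihl & 0x0f;

    if (version != 4 || ihl < 5)
        return -1;

    return (int)(ihl * 4); /* IHL is expressed in 32-bit words */
}
```

A header with IHL = 5 puts the L4 header at byte 20; any larger IHL means IP options are present, which is the case the patch adds parser entries for.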
      
      v1 --> v2
      - Improve commit message.
      Suggested-by: Dana Vardi <danat@marvell.com>
      Signed-off-by: Stefan Chulski <stefanc@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ad29b1a
    • David S. Miller's avatar
      Merge branch 'r8152--new-chips' · af1fa6b6
      David S. Miller authored
      Hayes Wang says:
      
      ====================
      r8152: support new chips
      
      Support new RTL8153 and RTL8156 series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af1fa6b6
    • Hayes Wang's avatar
      r8152: search the configuration of vendor mode · c2198943
      Hayes Wang authored
      The vendor mode is not always at config #1, so it is necessary to
      set the correct configuration number.
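
One way to picture the search is a loop over the device's configurations that picks the one whose first interface uses the vendor-specific class. The sketch below assumes that selection criterion and uses a simplified, hypothetical `cfg_sketch` structure instead of the kernel's USB descriptor types. Only the class code 0xff (vendor specific) comes from the USB specification.

```c
#include <stddef.h>
#include <stdint.h>

#define USB_CLASS_VENDOR_SPEC 0xff /* vendor-specific interface class code */

/* Hypothetical, flattened view of one USB configuration. */
struct cfg_sketch {
    uint8_t config_value;     /* bConfigurationValue of this configuration */
    uint8_t first_intf_class; /* bInterfaceClass of its first interface */
};

/*
 * Return the configuration value of the first configuration whose first
 * interface is vendor specific, or -1 if none is found.
 */
static int find_vendor_config(const struct cfg_sketch *cfgs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cfgs[i].first_intf_class == USB_CLASS_VENDOR_SPEC)
            return cfgs[i].config_value;
    }
    return -1;
}
```

Searching rather than hard-coding configuration 1 keeps the driver correct on devices that list another configuration (for example a CDC one) first.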
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c2198943
    • Hayes Wang's avatar
      r8152: support PHY firmware for RTL8156 series · 4a51b0e8
      Hayes Wang authored
      Support new firmware type and method for RTL8156 series.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a51b0e8
    • Hayes Wang's avatar
      r8152: support new chips · 195aae32
      Hayes Wang authored
      Support RTL8153C, RTL8153D, RTL8156A, and RTL8156B. The RTL8156A
      and RTL8156B are 2.5G Ethernet chips.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      195aae32
    • Hayes Wang's avatar
      r8152: add help function to change mtu · 67ce1a80
      Hayes Wang authored
      Different chips may have different requirements when changing the MTU.
      Therefore, add a new helper function to rtl_ops to change the MTU.
      Besides, reset the tx/rx after changing the MTU.
      
      Additionally, add mtu_to_size() and size_to_mtu() macros to simplify
      the code.
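
The commit message does not show the macro bodies. A plausible sketch, assuming the frame size is the MTU plus a VLAN-tagged Ethernet header and the frame check sequence, is given below. The two constants are defined locally for the example, with the values Linux uses for `VLAN_ETH_HLEN` (18) and `ETH_FCS_LEN` (4); the driver's actual definitions may differ.

```c
/* Local definitions for the sketch; values match the Linux headers. */
#define VLAN_ETH_HLEN 18 /* Ethernet header (14) plus one VLAN tag (4) */
#define ETH_FCS_LEN    4 /* frame check sequence */

/* Assumed conversion between MTU and on-wire frame size. */
#define mtu_to_size(m) ((m) + VLAN_ETH_HLEN + ETH_FCS_LEN)
#define size_to_mtu(s) ((s) - VLAN_ETH_HLEN - ETH_FCS_LEN)
```

Under these assumptions a standard 1500-byte MTU corresponds to a 1522-byte frame, and the two macros are inverses of each other.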
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67ce1a80