1. 30 Jun, 2016 13 commits
    • Seymour, Shane M's avatar
      tcp: increase size at which tcp_bound_to_half_wnd bounds to > TCP_MSS_DEFAULT · 2631b79f
      Seymour, Shane M authored
      In previous commit 01f83d69
      the following comments were added:
      
      "When peer uses tiny windows, there is no use in packetizing to sub-MSS
      pieces for the sake of SWS or making sure there are enough packets in
      the pipe for fast recovery."
      
      The test should be > TCP_MSS_DEFAULT not >= 512. This allows low end
      devices that send an MSS of 536 (TCP_MSS_DEFAULT) to see better network
      performance by sending it 536 bytes of data at a time instead of bounding
      to half window size (268). Other network stacks work this way, e.g. HP-UX.
      Signed-off-by: default avatarShane Seymour <shane.seymour@hpe.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2631b79f
    • Andrey Vagin's avatar
      tcp: add an ability to dump and restore window parameters · b1ed4c4f
      Andrey Vagin authored
      We found that sometimes a restored tcp socket doesn't work.
      
      A reason of this bug is incorrect window parameters and in this case
      tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The
      other side drops packets with this seq, because seq is less than
      tp->rcv_nxt ( tcp_sequence() ).
      
      Data from a send queue is sent only if there is enough space in a
      window, so when we restore unacked data, we need to expand a window to
      fit this data.
      
      This was in a first version of this patch:
      "tcp: extend window to fit all restored unacked data in a send queue"
      
      Then Alexey recommended me to restore window parameters instead of
      adjusted them according with data in a sent queue. This sounds resonable.
      
      rcv_wnd has to be restored, because it was reported to another side
      and the offered window is never shrunk.
      One of reasons why we need to restore snd_wnd was described above.
      
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1ed4c4f
    • David S. Miller's avatar
      Merge branch 'bridge-igmp-stats' · 641f7e40
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: add support for IGMP/MLD stats
      
      This patchset adds support for the new IFLA_STATS_LINK_XSTATS_SLAVE
      attribute which can be used with RTM_GETSTATS in order to export per-slave
      statistics. It works by passing the attribute to the linkxstats callback
      and if the callback user supports it - it should dump that slave's stats.
      This is much more scalable and permits us to request only a single port's
      statistics instead of dumping everything every time.
      The second patch adds support for per-port IGMP/MLD statistics and uses
      the new API to export them for the bridge and its ports. The stats are
      made in a very lightweight manner, the normal fast-path is not affected
      at all and the flood paths (br_flood/br_multicast_flood) are only affected
      if the packet is IGMP and the IGMP stats have been enabled using cache-hot
      data for the check.
      
      v2: Patch 01 is new, patch 02 has been reworked to use the new API, also
      in addition counters for IGMP/MLD parse errors have been added and members
      are added for per-port multicast traffic stats. The multicast counting has
      been slightly optimized (moved the br_multicast_count inside the IPv4/6
      IGMP functions after the checks for IGMP traffic) to avoid one conditional
      that was on all of the multicast traffic path (both IGMP and other).
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      641f7e40
    • Nikolay Aleksandrov's avatar
      net: bridge: add support for IGMP/MLD stats and export them via netlink · 1080ab95
      Nikolay Aleksandrov authored
      This patch adds stats support for the currently used IGMP/MLD types by the
      bridge. The stats are per-port (plus one stat per-bridge) and per-direction
      (RX/TX). The stats are exported via netlink via the new linkxstats API
      (RTM_GETSTATS). In order to minimize the performance impact, a new option
      is used to enable/disable the stats - multicast_stats_enabled, similar to
      the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
      lookups and checks, we make use of the current "igmp" member of the bridge
      private skb->cb region to record the type on Rx (both host-generated and
      external packets pass by multicast_rcv()). We can do that since the igmp
      member was used as a boolean and all the valid IGMP/MLD types are positive
      values. The normal bridge fast-path is not affected at all, the only
      affected paths are the flooding ones and since we make use of the IGMP/MLD
      type, we can quickly determine if the packet should be counted using
      cache-hot data (cb's igmp member). We add counters for:
      * IGMP Queries
      * IGMP Leaves
      * IGMP v1/v2/v3 reports
      
      * MLD Queries
      * MLD Leaves
      * MLD v1/v2 reports
      
      These are invaluable when monitoring or debugging complex multicast setups
      with bridges.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1080ab95
    • Nikolay Aleksandrov's avatar
      net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute · 80e73cc5
      Nikolay Aleksandrov authored
      This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute
      which allows to export per-slave statistics if the master device supports
      the linkxstats callback. The attribute is passed down to the linkxstats
      callback and it is up to the callback user to use it (an example has been
      added to the only current user - the bridge). This allows us to query only
      specific slaves of master devices like bridge ports and export only what
      we're interested in instead of having to dump all ports and searching only
      for a single one. This will be used to export per-port IGMP/MLD stats and
      also per-port vlan stats in the future, possibly other statistics as well.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80e73cc5
    • David S. Miller's avatar
      Merge branch 'bpf-helper-improvements' · 545c321b
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      BPF helper improvements
      
      This set adds various BPF helper improvements, that is, cleaning
      up and adding BPF_F_CURRENT_CPU flag for tracing helper, allowing
      for preemption checks on bpf_get_smp_processor_id() helper, and
      adding two new helpers bpf_skb_change_{proto, type} for tc related
      programs. For further details please see individual patches.
      
      Note, this set requires -net to be merged into -net-next tree first.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      545c321b
    • Daniel Borkmann's avatar
      bpf: add bpf_skb_change_type helper · d2485c42
      Daniel Borkmann authored
      This work adds a helper for changing skb->pkt_type in a controlled way.
      We only allow a subset of possible values and can extend that in future
      should other use cases come up. Doing this as a helper has the advantage
      that errors can be handeled gracefully and thus helper kept extensible.
      
      It's a write counterpart to pkt_type member we can already read from
      struct __sk_buff context. Major use case is to change incoming skbs to
      PACKET_HOST in a programmatic way instead of having to recirculate via
      redirect(..., BPF_F_INGRESS), for example.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2485c42
    • Daniel Borkmann's avatar
      bpf: add bpf_skb_change_proto helper · 6578171a
      Daniel Borkmann authored
      This patch adds a minimal helper for doing the groundwork of changing
      the skb->protocol in a controlled way. Currently supported is v4 to
      v6 and vice versa transitions, which allows f.e. for a minimal, static
      nat64 implementation where applications in containers that still
      require IPv4 can be transparently operated in an IPv6-only environment.
      For example, host facing veth of the container can transparently do
      the transitions in a programmatic way with the help of clsact qdisc
      and cls_bpf.
      
      Idea is to separate concerns for keeping complexity of the helper
      lower, which means that the programs utilize bpf_skb_change_proto(),
      bpf_skb_store_bytes() and bpf_lX_csum_replace() to get the job done,
      instead of doing everything in a single helper (and thus partially
      duplicating helper functionality). Also, bpf_skb_change_proto()
      shouldn't need to deal with raw packet data as this is done by other
      helpers.
      
      bpf_skb_proto_6_to_4() and bpf_skb_proto_4_to_6() unclone the skb to
      operate on a private one, push or pop additionally required header
      space and migrate the gso/gro meta data from the shared info. We do
      mark the gso type as dodgy so that headers are checked and segs
      recalculated by the gso/gro engine. The gso_size target is adapted
      as well. The flags argument added is currently reserved and can be
      used for future extensions.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6578171a
    • Daniel Borkmann's avatar
      bpf: don't use raw processor id in generic helper · 80b48c44
      Daniel Borkmann authored
      Use smp_processor_id() for the generic helper bpf_get_smp_processor_id()
      instead of the raw variant. This allows for preemption checks when we
      have DEBUG_PREEMPT, and otherwise uses the raw variant anyway. We only
      need to keep the raw variant for socket filters, but we can reuse the
      helper that is already there from cBPF side.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80b48c44
    • Daniel Borkmann's avatar
      bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_read · 6816a7ff
      Daniel Borkmann authored
      Follow-up commit to 1e33759c ("bpf, trace: add BPF_F_CURRENT_CPU
      flag for bpf_perf_event_output") to add the same functionality into
      bpf_perf_event_read() helper. The split of index into flags and index
      component is also safe here, since such large maps are rejected during
      map allocation time.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6816a7ff
    • Daniel Borkmann's avatar
      bpf, trace: fetch current cpu only once · d7931330
      Daniel Borkmann authored
      We currently have two invocations, which is unnecessary. Fetch it only
      once and use the smp_processor_id() variant, so we also get preemption
      checks along with it when DEBUG_PREEMPT is set.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7931330
    • Daniel Borkmann's avatar
      bpf: minor cleanups on fd maps and helpers · 1ca1cc98
      Daniel Borkmann authored
      Some minor cleanups: i) Remove the unlikely() from fd array map lookups
      and let the CPU branch predictor do its job, scenarios where there is not
      always a map entry are very well valid. ii) Move the attribute type check
      in the bpf_perf_event_read() helper a bit earlier so it's consistent wrt
      checks with bpf_perf_event_output() helper as well. iii) remove some
      comments that are self-documenting in kprobe_prog_is_valid_access() and
      therefore make it consistent to tp_prog_is_valid_access() as well.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ca1cc98
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ee58b571
      David S. Miller authored
      Several cases of overlapping changes, except the packet scheduler
      conflicts which deal with the addition of the free list parameter
      to qdisc_enqueue().
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee58b571
  2. 29 Jun, 2016 27 commits
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · e7bdea77
      Linus Torvalds authored
      Pull NFS client bugfixes from Anna Schumaker:
       "Stable bugfixes:
         - Fix _cancel_empty_pagelist
         - Fix a double page unlock
         - Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
         - Fix another OPEN_DOWNGRADE bug
      
        Other bugfixes:
         - Ensure we handle delegation errors in nfs4_proc_layoutget()
         - Layout stateids start out as being invalid
         - Add sparse lock annotations for pnfs_find_alloc_layout
         - Handle bad delegation stateids in nfs4_layoutget_handle_exception
         - Fix up O_DIRECT results
         - Fix potential use after free of state in nfs4_do_reclaim.
         - Mark the layout stateid invalid when all segments are removed
         - Don't let readdirplus revalidate an inode that was marked as stale
         - Fix potential race in nfs_fhget()
         - Fix an unused variable warning"
      
      * tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFS: Fix another OPEN_DOWNGRADE bug
        make nfs_atomic_open() call d_drop() on all ->open_context() errors.
        NFS: Fix an unused variable warning
        NFS: Fix potential race in nfs_fhget()
        NFS: Don't let readdirplus revalidate an inode that was marked as stale
        NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed
        NFS: Fix a double page unlock
        pnfs_nfs: fix _cancel_empty_pagelist
        nfs4: Fix potential use after free of state in nfs4_do_reclaim.
        NFS: Fix up O_DIRECT results
        NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception
        NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout
        NFSv4.1/pnfs: Layout stateids start out as being invalid
        NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
      e7bdea77
    • Linus Torvalds's avatar
      Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit · 89a82a92
      Linus Torvalds authored
      Pull audit fixes from Paul Moore:
       "Two small patches to fix audit problems in 4.7-rcX: the first fixes a
        potential kref leak, the second removes some header file noise.
      
        The first is an important bug fix that really should go in before 4.7
        is released, the second is not critical, but falls into the very-nice-
        to-have category so I'm including in the pull request.
      
        Both patches are straightforward, self-contained, and pass our
        testsuite without problem"
      
      * 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
        audit: move audit_get_tty to reduce scope and kabi changes
        audit: move calcs after alloc and check when logging set loginuid
      89a82a92
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 32826ac4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "I've been traveling so this accumulates more than week or so of bug
        fixing.  It perhaps looks a little worse than it really is.
      
         1) Fix deadlock in ath10k driver, from Ben Greear.
      
         2) Increase scan timeout in iwlwifi, from Luca Coelho.
      
         3) Unbreak STP by properly reinjecting STP packets back into the
            stack.  Regression fix from Ido Schimmel.
      
         4) Mediatek driver fixes (missing malloc failure checks, leaking of
            scratch memory, wrong indexing when mapping TX buffers, etc.) from
            John Crispin.
      
         5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
            Sowa.
      
         6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.
      
         7) Fix netlink notifications in ovs for tunnels, delete link messages
            are never emitted because of how the device registry state is
            handled.  From Nicolas Dichtel.
      
         8) Conntrack module leaks kmemcache on unload, from Florian Westphal.
      
         9) Prevent endless jump loops in nft rules, from Liping Zhang and
            Pablo Neira Ayuso.
      
        10) Not early enough spinlock initialization in mlx4, from Eric
            Dumazet.
      
        11) Bind refcount leak in act_ipt, from Cong WANG.
      
        12) Missing RCU locking in HTB scheduler, from Florian Westphal.
      
        13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
            barrier, using heap for SG and IV, and erroneous use of async flag
            when allocating AEAD conext.)
      
        14) RCU handling fix in TIPC, from Ying Xue.
      
        15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
            SIT driver, from Simon Horman.
      
        16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.
      
        17) Fix potential deadlock in team enslave, from Ido Schimmel.
      
        18) Memory leak in KCM procfs handling, from Jiri Slaby.
      
        19) ESN generation fix in ipv4 ESP, from Herbert Xu.
      
        20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
            WANG.
      
        21) Use after free in netem, from Eric Dumazet.
      
        22) Uninitialized last assert time in multicast router code, from Tom
            Goff.
      
        23) Skip raw sockets in sock_diag destruction broadcast, from Willem
            de Bruijn.
      
        24) Fix link status reporting in thunderx, from Sunil Goutham.
      
        25) Limit resegmentation of retransmit queue so that we do not
            retransmit too large GSO frames.  From Eric Dumazet.
      
        26) Delay bpf program release after grace period, from Daniel
            Borkmann"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
        openvswitch: fix conntrack netlink event delivery
        qed: Protect the doorbell BAR with the write barriers.
        neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
        e1000e: keep VLAN interfaces functional after rxvlan off
        cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
        qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
        bpf, perf: delay release of BPF prog after grace period
        net: bridge: fix vlan stats continue counter
        tcp: do not send too big packets at retransmit time
        ibmvnic: fix to use list_for_each_safe() when delete items
        net: thunderx: Fix TL4 configuration for secondary Qsets
        net: thunderx: Fix link status reporting
        net/mlx5e: Reorganize ethtool statistics
        net/mlx5e: Fix number of PFC counters reported to ethtool
        net/mlx5e: Prevent adding the same vxlan port
        net/mlx5e: Check for BlueFlame capability before allocating SQ uar
        net/mlx5e: Change enum to better reflect usage
        net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
        net/mlx5: Update command strings
        net: marvell: Add separate config ANEG function for Marvell 88E1111
        ...
      32826ac4
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 653c574a
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "Another two bug fixes for 4.7:
      
         - The revert of patch which removed boot information for systems
           using an intermediate boot kernel, e.g. the SLES12 grub setup.
      
         - A fix for an incorrect inline assembly constraint that causes
           broken code to be generated with gcc 4.8.5"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: fix test_fp_ctl inline assembly contraints
        Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
      653c574a
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 00bf377d
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Here are a bunch of fixes for pin control.  Just drivers and a
        MAINTAINERS fixup:
      
         - Driver fixes for i.MX, single register, Tegra and BayTrail.
      
         - MAINTAINERS entry for the documentation"
      
      * tag 'pinctrl-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: baytrail: Fix mingled clock pins
        MAINTAINERS: belong Documentation/pinctrl.txt properly
        pinctrl: tegra: Fix build dependency
        gpio: tegra: Make lockdep class file-scoped
        pinctrl: single: Fix missing flush of posted write for a wakeirq
        pinctrl: imx: Do not treat a PIN without MUX register as an error
      00bf377d
    • Linus Torvalds's avatar
      Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 52827f38
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "Three fix patches.  Two are for cgroup / css init failure path.  The
        last one makes css_set_lock irq-safe as the deadline scheduler ends up
        calling put_css_set() from irq context"
      
      * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: Disable IRQs while holding css_set_lock
        cgroup: set css->id to -1 during init
        cgroup: remove redundant cleanup in css_create
      52827f38
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2016-06-29-v2' of... · 751ad819
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2016-06-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Just two small fixes
       * fix mesh peer link counter, decrement wasn't always done at all
       * fix ethertype (length) for packets without RFC 1042 or bridge
         tunnel header
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      751ad819
    • David S. Miller's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 6f30e8b0
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-06-27
      
      This series contains updates to i40e and i40evf only.
      
      Mitch provides several changes, first adds functions to enable and disable
      VSI on a VEB, which allows for configuration of limited promiscuous mode
      specifically for bridging purposes.  Sets the RSS Hash Enable registers by
      default now that VF RSS is configured by the PF driver.  Fixed a issue
      where we could overflow the buffer, by checking the address count and bail
      out of the loop at the appropriate time.  Removed the need for a reset
      when the device enters limited promiscuous mode, since this was causing
      heartburn for people who were using VFs and bridging.
      
      Catherine adds a call to set the client interface down when we put the VSI
      down.  Fixed an issue where RSS queues was being limited to the number
      of CPUs, so if a user wants to use more queues than CPUs, we want to
      trust they know what they are doing and let them.
      
      Greg cleans up the driver suspend routine to ensure we are calling
      synchronize_irq() before freeing IRQ vectors and explicitly free the other
      causes interrupt resources and shut down the MSIX interrupt.
      
      Serey fixes i40e_set_settings() to not fail when a Direct Attach (DA)
      cable is used.
      
      Avinash fixes a supported link bug by removing code which was not allowing
      100BaseT to show up in the supported link modes for 10GBaseT PHYs.
      
      Shannon adds a bit of information to the error messages to help determine
      the source of error by adding VSI info to macaddr messages.
      
      Tushar Dave fixes error received when turning off TSO on some systems,
      which was caused by enabling FD_SB without checking availability of
      MSIx vectors, so add the check.
      
      Neerav fixes a possible panic when LLDP/DCBX change happens and the
      driver tried to notify the client(s) for each of the PF VSIs, which would
      panic when it reached a VSI that did not have any netdev associated with
      it.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6f30e8b0
    • Philippe Reynes's avatar
      net: ethernet: lpc_eth: use phy_ethtool_{get|set}_link_ksettings · cb90d3e1
      Philippe Reynes authored
      There are two generics functions phy_ethtool_{get|set}_link_ksettings,
      so we can use them instead of defining the same code in the driver.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb90d3e1
    • Philippe Reynes's avatar
      net: ethernet: lpc_eth: use phydev from struct net_device · f786f356
      Philippe Reynes authored
      The private structure contain a pointer to phydev, but the structure
      net_device already contain such pointer. So we can remove the pointer
      phydev in the private structure, and update the driver to use the
      one contained in struct net_device.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f786f356
    • Samuel Gauthier's avatar
      openvswitch: fix conntrack netlink event delivery · d913d3a7
      Samuel Gauthier authored
      Only the first and last netlink message for a particular conntrack are
      actually sent. The first message is sent through nf_conntrack_confirm when
      the conntrack is committed. The last one is sent when the conntrack is
      destroyed on timeout. The other conntrack state change messages are not
      advertised.
      
      When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
      is called for each packet, from the postrouting hook, which in turn calls
      nf_ct_deliver_cached_events to send the state change netlink messages.
      
      This commit fixes the problem by calling nf_ct_deliver_cached_events in the
      non-commit case as well.
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      CC: Joe Stringer <joestringer@nicira.com>
      CC: Justin Pettit <jpettit@nicira.com>
      CC: Andy Zhou <azhou@nicira.com>
      CC: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarSamuel Gauthier <samuel.gauthier@6wind.com>
      Acked-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d913d3a7
    • Sudarsana Reddy Kalluru's avatar
      qed: Protect the doorbell BAR with the write barriers. · 34c7bb47
      Sudarsana Reddy Kalluru authored
      SPQ doorbell is currently protected with the compilation barrier. Under the
      stress scenarios, we may get into a state where (due to the weak ordering)
      several ramrod doorbells were written to the BAR with an out-of-order
      producer values. Need to change the barrier type to a write barrier to make
      sure that the write buffer is flushed after each doorbell.
      Signed-off-by: default avatarSudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34c7bb47
    • Eric Dumazet's avatar
      net_sched: netem: do not call qdisc_drop() with a NULL skb · 8a6e9c67
      Eric Dumazet authored
      If skb_unshare() fails, we call qdisc_drop() with a NULL skb, which
      is no longer supported.
      
      Fixes: 520ac30f ("net_sched: drop packets after root qdisc lock is released")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8a6e9c67
    • David Barroso's avatar
      neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() · b560f03d
      David Barroso authored
      neigh_xmit() expects to be called inside an RCU-bh read side critical
      section, and while one of its two current callers gets this right, the
      other one doesn't.
      
      More specifically, neigh_xmit() has two callers, mpls_forward() and
      mpls_output(), and while both callers call neigh_xmit() under
      rcu_read_lock(), this provides sufficient protection for neigh_xmit()
      only in the case of mpls_forward(), as that is always called from
      softirq context and therefore doesn't need explicit BH protection,
      while mpls_output() can be called from process context with softirqs
      enabled.
      
      When mpls_output() is called from process context, with softirqs
      enabled, we can be preempted by a softirq at any time, and RCU-bh
      considers the completion of a softirq as signaling the end of any
      pending read-side critical sections, so if we do get a softirq
      while we are in the part of neigh_xmit() that expects to be run inside
      an RCU-bh read side critical section, we can end up with an unexpected
      RCU grace period running right in the middle of that critical section,
      making things go boom.
      
      This patch fixes this impedance mismatch in the callee, by making
      neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
      expects to be treated as an RCU-bh read side critical section, as this
      seems a safer option than fixing it in the callers.
      
      Fixes: 4fd3d7d9 ("neigh: Add helper function neigh_xmit")
      Signed-off-by: default avatarDavid Barroso <dbarroso@fastly.com>
      Signed-off-by: default avatarLennert Buytenhek <lbuytenhek@fastly.com>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: default avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b560f03d
    • David S. Miller's avatar
      Merge branch 'qed-static-checker' · c973f24d
      David S. Miller authored
      Sudarsana Reddy Kalluru says:
      
      ====================
      qed*: Fix the static checker warnings.
      
      The patch series addresses the static checker warnings introduced by the
      earlier patches related to qed/qede coalesce configuration support.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c973f24d
    • Sudarsana Reddy Kalluru's avatar
      qede: Fix the static checker warnings. · d2890dea
      Sudarsana Reddy Kalluru authored
      Static checker warnings:
      drivers/net/ethernet/qlogic/qede/qede_ethtool.c:435 qede_get_coalesce()
      warn: passing casted pointer '&coal->rx_coalesce_usecs' to
      'edev->ops->common->get_coalesce()' 32 vs 16.
      
      The u32 pointer is being typecasted to u16 which may fail for big-endian
      platforms.
      
      Fixes: d552fa84 ("qede: Add support for coalescing config read/update.")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2890dea
    • Sudarsana Reddy Kalluru's avatar
      qed: Fix static checker warnings. · 51d99880
      Sudarsana Reddy Kalluru authored
      Static checker warnings:
      drivers/net/ethernet/qlogic/qed/qed_int.c:2450 qed_init_cau_sb_entry()
      warn: always true condition '(cdev->rx_coalesce_usecs <= 255) =>
      (0-255 <= 255)'
      drivers/net/ethernet/qlogic/qed/qed_int.c:2511 qed_int_cau_conf_sb()
      warn: always true condition '(p_hwfn->cdev->rx_coalesce_usecs <= 255)
      => (0-255 <= 255)'
      ..
      
      The data types for rx/tx_coalesce_usecs should be u16.
      
      Fixes: commit 722003ac ("qed: Add support for coalescing config read/update.")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51d99880
    • Jarod Wilson's avatar
      e1000e: keep VLAN interfaces functional after rxvlan off · 889ad456
      Jarod Wilson authored
      I've got a bug report about an e1000e interface, where a VLAN interface is
      set up on top of it:
      
      $ ip link add link ens1f0 name ens1f0.99 type vlan id 99
      $ ip link set ens1f0 up
      $ ip link set ens1f0.99 up
      $ ip addr add 192.168.99.92 dev ens1f0.99
      
      At this point, I can ping another host on vlan 99, ip 192.168.99.91.
      However, if I do the following:
      
      $ ethtool -K ens1f0 rxvlan off
      
      Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on
      again. I'm not sure if this is actually intended behavior, or if there's a
      lack of software VLAN stripping fallback, or what, but things continue to
      work if I simply don't call e1000e_vlan_strip_disable() if there are
      active VLANs (plagiarizing a function from the e1000 driver here) on the
      interface.
      
      Also slipped a related-ish fix to the kerneldoc text for
      e1000e_vlan_strip_disable here...
      Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      889ad456
    • Philippe Reynes's avatar
      net: ethernet: mvpp2: use phy_ethtool_{get|set}_link_ksettings · fb773e97
      Philippe Reynes authored
      There are two generics functions phy_ethtool_{get|set}_link_ksettings,
      so we can use them instead of defining the same code in the driver.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb773e97
    • Philippe Reynes's avatar
      net: ethernet: mvpp2: use phydev from struct net_device · 8e07269d
      Philippe Reynes authored
      The private structure contain a pointer to phydev, but the structure
      net_device already contain such pointer. So we can remove the pointer
      phydev in the private structure, and update the driver to use the
      one contained in struct net_device.
      Signed-off-by: default avatarPhilippe Reynes <tremyfr@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8e07269d
    • Felix Fietkau's avatar
      cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header · c041778c
      Felix Fietkau authored
      The PDU length of incoming LLC frames is set to the total skb payload size
      in __ieee80211_data_to_8023() of net/wireless/util.c which incorrectly
      includes the length of the IEEE 802.11 header.
      
      The resulting LLC frame header has a too large PDU length, causing the
      llc_fixup_skb() function of net/llc/llc_input.c to reject the incoming
      skb, effectively breaking STP.
      
      Solve the problem by properly substracting the IEEE 802.11 frame header size
      from the PDU length, allowing the LLC processor to pick up the incoming
      control messages.
      
      Special thanks to Gerry Rozema for tracking down the regression and proposing
      a suitable patch.
      
      Fixes: 2d1c304c ("cfg80211: add function for 802.3 conversion with separate output buffer")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarGerry Rozema <gerryr@rozeware.com>
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      c041778c
    • Dan Carpenter's avatar
      qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag() · 5b4d10f5
      Dan Carpenter authored
      There is a static checker warning here "warn: mask and shift to zero"
      and the code sets "ring" to zero every time.  From looking at how
      QLCNIC_FETCH_RING_ID() is used in qlcnic_83xx_process_rcv_ring() the
      qlcnic_83xx_hndl() should be removed.
      
      Fixes: 4be41e92 ('qlcnic: 83xx data path routines')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b4d10f5
    • Daniel Borkmann's avatar
      bpf, perf: delay release of BPF prog after grace period · ceb56070
      Daniel Borkmann authored
      Commit dead9f29 ("perf: Fix race in BPF program unregister") moved
      destruction of BPF program from free_event_rcu() callback to __free_event(),
      which is problematic if used with tail calls: if prog A is attached as
      trace event directly, but at the same time present in a tail call map used
      by another trace event program elsewhere, then we need to delay destruction
      via RCU grace period since it can still be in use by the program doing the
      tail call (the prog first needs to be dropped from the tail call map, then
      trace event with prog A attached destroyed, so we get immediate destruction).
      
      Fixes: dead9f29 ("perf: Fix race in BPF program unregister")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Jann Horn <jann@thejh.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ceb56070
    • Nikolay Aleksandrov's avatar
      net: bridge: fix vlan stats continue counter · 565ce8f3
      Nikolay Aleksandrov authored
      I made a dumb off-by-one mistake when I added the vlan stats counter
      dumping code. The increment should happen before the check, not after
      otherwise we miss one entry when we continue dumping.
      
      Fixes: a60c0903 ("bridge: netlink: export per-vlan stats")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      565ce8f3
    • Eric Dumazet's avatar
      tcp: do not send too big packets at retransmit time · a3d2e9f8
      Eric Dumazet authored
      Arjun reported a bug in TCP stack and bisected it to a recent commit.
      
      In case where we process SACK, we can coalesce multiple skbs
      into fat ones (tcp_shift_skb_data()), to lower write queue
      overhead, because we do not expect to retransmit these packets.
      
      However, SACK reneging can happen, forcing the sender to retransmit
      all these packets. If skb->len is above 64KB, we then send buggy
      IP packets that could hang TSO engine on cxgb4.
      
      Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
      so that we cook packets of optimal size vs TCP/pacing.
      
      Thanks to Arjun for reporting the bug and running the tests !
      
      Fixes: 10d3be56 ("tcp-tso: do not split TSO packets at retransmit time")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarArjun V <arjun@chelsio.com>
      Tested-by: default avatarArjun V <arjun@chelsio.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a3d2e9f8
    • Wei Yongjun's avatar
      ibmvnic: fix to use list_for_each_safe() when delete items · 96183182
      Wei Yongjun authored
      Since we will remove items off the list using list_del() we need
      to use a safe version of the list_for_each() macro aptly named
      list_for_each_safe().
      Signed-off-by: default avatarWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96183182
    • Richard Alpe's avatar
      tipc: rename udp_port in struct udp_media_addr · bc3a334c
      Richard Alpe authored
      Context implies that port in struct "udp_media_addr" is referring
      to a UDP port.
      Signed-off-by: default avatarRichard Alpe <richard.alpe@ericsson.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc3a334c