1. 17 Feb, 2020 1 commit
    • Florian Westphal's avatar
      netfilter: conntrack: allow insertion of clashing entries · 6a757c07
      Florian Westphal authored
      This patch further relaxes the need to drop an skb due to a clash with
      an existing conntrack entry.
      
      Current clash resolution handles the case where the clash occurs between
      two identical entries (distinct nf_conn objects with same tuples), i.e.:
      
                          Original                        Reply
      existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      
      ... existing handling will discard the unconfirmed clashing entry and
      makes skb->_nfct point to the existing one.  The skb can then be
      processed normally just as if the clash would not have existed in the
      first place.
      
      For other clashes, the skb needs to be dropped.
      This frequently happens with DNS resolvers that send A and AAAA queries
      back-to-back when NAT rules are present that cause packets to get
      different DNAT transformations applied, for example:
      
      -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353
      -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353
      
      In this case the A or AAAA query is dropped which incurs a costly
      delay during name resolution.
      
      This patch also allows this collision type:
                             Original                   Reply
      existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.7:5353
      
      In this case, clash is in original direction -- the reply direction
      is still unique.
      
      The change makes it so that when the 2nd colliding packet is received,
      the clashing conntrack is tagged with new IPS_NAT_CLASH_BIT, gets a fixed
      1 second timeout and is inserted in the reply direction only.
      
      The entry is hidden from 'conntrack -L', it will time out quickly
      and it can be early dropped because it will never progress to the
      ASSURED state.
      
      To avoid special-casing the delete code path to special case
      the ORIGINAL hlist_nulls node, a new helper, "hlist_nulls_add_fake", is
      added so hlist_nulls_del() will work.
      
      Example:
      
            CPU A:                               CPU B:
      1.  10.2.3.4:42 -> 10.8.8.8:53 (A)
      2.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
      3.  Apply DNAT, reply changed to 10.0.0.6
      4.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
      5.                                         Apply DNAT, reply changed to 10.0.0.7
      6. confirm/commit to conntrack table, no collisions
      7.                                         commit clashing entry
      
      Reply comes in:
      
      10.2.3.4:42 <- 10.0.0.6:5353 (A)
       -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
      10.2.3.4:42 <- 10.0.0.7:5353 (AAAA)
       -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
          The conntrack entry is deleted from table, as it has the NAT_CLASH
          bit set.
      
      In case of a retransmit from ORIGINAL dir, all further packets will get
      the DNAT transformation to 10.0.0.6.
      
      I tried to come up with other solutions but they all have worse
      problems.
      
      Alternatives considered were:
      1.  Confirm ct entries at allocation time, not in postrouting.
       a. will cause uneccesarry work when the skb that creates the
          conntrack is dropped by ruleset.
       b. in case nat is applied, ct entry would need to be moved in
          the table, which requires another spinlock pair to be taken.
       c. breaks the 'unconfirmed entry is private to cpu' assumption:
          we would need to guard all nfct->ext allocation requests with
          ct->lock spinlock.
      
      2. Make the unconfirmed list a hash table instead of a pcpu list.
         Shares drawback c) of the first alternative.
      
      3. Document this is expected and force users to rearrange their
         ruleset (e.g. by using "-m cluster" instead of "-m statistics").
         nft has the 'jhash' expression which can be used instead of 'numgen'.
      
         Major drawback: doesn't fix what I consider a bug, not very realistic
         and I believe its reasonable to have the existing rulesets to 'just
         work'.
      
      4. Document this is expected and force users to steer problematic
         packets to the same CPU -- this would serialize the "allocate new
         conntrack entry/nat table evaluation/perform nat/confirm entry", so
         no race can occur.  Similar drawback to 3.
      
      Another advantage of this patch compared to 1) and 2) is that there are
      no changes to the hot path; things are handled in the udp tracker and
      the clash resolution path.
      
      Cc: rcu@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      6a757c07
  2. 11 Feb, 2020 3 commits
  3. 07 Feb, 2020 21 commits
    • Florian Westphal's avatar
      netfilter: flowtable: skip offload setup if disabled · a7da92c2
      Florian Westphal authored
      nftables test case
      tests/shell/testcases/flowtable/0001flowtable_0
      
      results in a crash. After the refactor, if we leave early via
      nf_flowtable_hw_offload(), then "struct flow_block_offload" is left
      in an uninitialized state, but later users assume its initialised.
      
      Fixes: a7965d58 ("netfilter: flowtable: add nf_flow_table_offload_cmd()")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a7da92c2
    • Cong Wang's avatar
      netfilter: xt_hashlimit: limit the max size of hashtable · 8d0015a7
      Cong Wang authored
      The user-specified hashtable size is unbound, this could
      easily lead to an OOM or a hung task as we hold the global
      mutex while allocating and initializing the new hashtable.
      
      Add a max value to cap both cfg->size and cfg->max, as
      suggested by Florian.
      
      Reported-and-tested-by: syzbot+adf6c6c2be1c3a718121@syzkaller.appspotmail.com
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8d0015a7
    • Cong Wang's avatar
      netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put() · c4a3922d
      Cong Wang authored
      It is unnecessary to hold hashlimit_mutex for htable_destroy()
      as it is already removed from the global hashtable and its
      refcount is already zero.
      
      Also, switch hinfo->use to refcount_t so that we don't have
      to hold the mutex until it reaches zero in htable_put().
      
      Reported-and-tested-by: syzbot+adf6c6c2be1c3a718121@syzkaller.appspotmail.com
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c4a3922d
    • David S. Miller's avatar
      Merge branch 'stmmac-fixes' · 259039fa
      David S. Miller authored
      Ong Boon Leong says:
      
      ====================
      net: stmmac: general fixes for Ethernet functionality
      
      1/5: It ensures that the previous value of GMAC_VLAN_TAG register is
           read first before for updating the register.
      
      2/5: Similar to 2/6 patch but it is a fix for XGMAC_VLAN_TAG register
           as requested by Jose Abreu.
      
      3/5: It ensures the GMAC IP v4.xx and above behaves correctly to:-
             ip link set <devname> multicast off|on
      
      4/5: Added similar IFF_MULTICAST flag for xgmac2, similar to 4/6.
      
      5/5: It ensures PCI platform data is using plat->phy_interface.
      
      Changes from v4:-
         patch 1/6 - this patch is dropped now and will take the input on
                     handling return value from netif_set_real_num_rx|
                     tx_queues() in future patch series.
      
      v3:-
         patch 1/6 - add rtnl_lock() and rtnl_unlock() for stmmac_hw_setup()
                     called inside stmmac_resume()
         patch 3/6 - Added new patch to fix XGMAC_VLAN_TAG register writting
      
      v2:-
         patch 1/5 - added control for rtnl_lock() & rtnl_unlock() to ensure
                     they are used forstmmac_resume()
         patch 4/5 - added IFF_MULTICAST flag check for xgmac to ensure
                     multicast works correctly.
      
      v1:-
       - Drop v1 patches (1/7, 3/7 & 4/7) that are not valid.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      259039fa
    • Voon Weifeng's avatar
      net: stmmac: update pci platform data to use phy_interface · 909c1dde
      Voon Weifeng authored
      The recent patch to support passive mode converter did not take care the
      phy interface configuration in PCI platform data. Hence, converting all
      the PCI platform data from plat->interface to plat->phy_interface as the
      default mode is meant for PHY.
      
      Fixes: 0060c878 ("net: stmmac: implement support for passive mode converters via dt")
      Signed-off-by: default avatarVoon Weifeng <weifeng.voon@intel.com>
      Tested-by: default avatarTan, Tee Min <tee.min.tan@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      909c1dde
    • Tan, Tee Min's avatar
      net: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter · 2f633d58
      Tan, Tee Min authored
      Without checking for IFF_MULTICAST flag, it is wrong to assume multicast
      filtering is always enabled. By checking against IFF_MULTICAST, now
      the driver behaves correctly when the multicast support is toggled by below
      command:-
        ip link set <devname> multicast off|on
      
      Fixes: 0efedbf1 ("net: stmmac: xgmac: Fix XGMAC selftests")
      Signed-off-by: default avatarTan, Tee Min <tee.min.tan@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2f633d58
    • Verma, Aashish's avatar
      net: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter · 2ba31cd9
      Verma, Aashish authored
      Without checking for IFF_MULTICAST flag, it is wrong to assume multicast
      filtering is always enabled. By checking against IFF_MULTICAST, now
      the driver behaves correctly when the multicast support is toggled by below
      command:-
        ip link set <devname> multicast off|on
      
      Fixes: 477286b5 ("stmmac: add GMAC4 core support")
      Signed-off-by: default avatarVerma, Aashish <aashishx.verma@intel.com>
      Tested-by: default avatarTan, Tee Min <tee.min.tan@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2ba31cd9
    • Ong Boon Leong's avatar
      net: stmmac: xgmac: fix incorrect XGMAC_VLAN_TAG register writting · 907a0768
      Ong Boon Leong authored
      We should always do a read of current value of XGMAC_VLAN_TAG instead of
      directly overwriting the register value.
      
      Fixes: 3cd1cfcb ("net: stmmac: Implement VLAN Hash Filtering in XGMAC")
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      907a0768
    • Tan, Tee Min's avatar
      net: stmmac: fix incorrect GMAC_VLAN_TAG register writting in GMAC4+ · 9eeeb3c9
      Tan, Tee Min authored
      It should always do a read of current value of GMAC_VLAN_TAG instead of
      directly overwriting the register value.
      
      Fixes: c1be0022 ("net: stmmac: Add VLAN HASH filtering support in GMAC4+")
      Signed-off-by: default avatarTan, Tee Min <tee.min.tan@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9eeeb3c9
    • Haiyang Zhang's avatar
      hv_netvsc: Fix XDP refcnt for synthetic and VF NICs · 184367dc
      Haiyang Zhang authored
      The caller of XDP_SETUP_PROG has already incremented refcnt in
      __bpf_prog_get(), so drivers should only increment refcnt by
      num_queues - 1.
      
      To fix the issue, update netvsc_xdp_set() to add the correct number
      to refcnt.
      
      Hold a refcnt in netvsc_xdp_set()’s other caller, netvsc_attach().
      
      And, do the same in netvsc_vf_setxdp(). Otherwise, every time when VF is
      removed and added from the host side, the refcnt will be decreased by one,
      which may cause page fault when unloading xdp program.
      
      Fixes: 351e1581 ("hv_netvsc: Add XDP support")
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      184367dc
    • David S. Miller's avatar
      Merge branch 'taprio-Some-fixes' · 6910fe95
      David S. Miller authored
      Vinicius Costa Gomes says:
      
      ====================
      taprio: Some fixes
      
      Changes from v3:
        - Replaced ENOTSUPP error code with EOPNOTSUPP (Jakub Kicinski);
        - Added the missing policy validation for the flags netlink argument
          (Jakub Kicinski);
        - Fixed the destroy() flow to also destroy the priority to traffic
          class mapping (David Miller);
        - Fixed dropping packets when taprio offloading is used together
          with ETF offloading (more on this below);
      
      Changes from v2:
        - Squashed commits 2/3 and 3/3 into a single one (I think a single
          commit is going to be easier to review);
        - Removed an "improvement" that was causing changes in user visible
          behavior;
      
      Changes from v1:
        - Fixed ignoring the 'flags' argument when adding a new
          instance (Vladimir Oltean);
        - Changed the order of commits;
      
      Updated cover letter:
      
      One bit that might need some attention is the fix for not dropping all
      packets when taprio and ETF offloading are used, patch 5/5. The
      behavior when the fix is applied is that packets that have a 'txtime'
      that would fall outside of their transmission window are now dropped
      by taprio. The question that might be raised is: should taprio be
      responsible for dropping these packets, or should it be handled lower
      in the stack?
      
      My opinion is: taprio has all the information, and it's able to give
      feeback to the user. Lower in the stack, those packets might go into
      the void, and the only feedback could be a hard to find counter
      increasing.
      
      Patch 1/5: Reported by Po Liu, is more of a improvement of usability for
      drivers implementing offloading features, now they can rely on the
      value of dev->num_tc, instead of going through some hops to get this
      value.
      
      Patch 2/5: Use 'q->flags' as the source of truth for the offloading
      flags. Tries to solidify the current behavior, while avoiding going
      into invalid states, one of which was causing a "rcu stall" (more
      information in the commit message).
      
      Patch 3/5: Adds the missing netlink attribute validation for
      TCA_TAPRIO_ATTR_FLAGS.
      
      Patch 4/5: Replaces the usage of netdev_set_num_tc() with
      netdev_reset_tc() in taprio_destroy(), taprio_destroy() is called when
      applying a configuration fails, making sure that the device traffic
      class configuration goes back to the default state.
      
      @Vladimir: If possible, I would appreciate your Ack on patch 2/5. I
      have been looking at this code for so long that I might have missed
      something obvious (and my growing dislike for the word 'flags' may be
      affecting my judgement :-).
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6910fe95
    • Vinicius Costa Gomes's avatar
      taprio: Fix dropping packets when using taprio + ETF offloading · bfabd41d
      Vinicius Costa Gomes authored
      When using taprio offloading together with ETF offloading, configured
      like this, for example:
      
      $ tc qdisc replace dev $IFACE parent root handle 100 taprio \
        	num_tc 4 \
              map 2 2 1 0 3 2 2 2 2 2 2 2 2 2 2 2 \
      	queues 1@0 1@1 1@2 1@3 \
      	base-time $BASE_TIME \
      	sched-entry S 01 1000000 \
      	sched-entry S 0e 1000000 \
      	flags 0x2
      
      $ tc qdisc replace dev $IFACE parent 100:1 etf \
           	offload delta 300000 clockid CLOCK_TAI
      
      During enqueue, it works out that the verification added for the
      "txtime" assisted mode is run when using taprio + ETF offloading, the
      only thing missing is initializing the 'next_txtime' of all the cycle
      entries. (if we don't set 'next_txtime' all packets from SO_TXTIME
      sockets are dropped)
      
      Fixes: 4cfd5779 ("taprio: Add support for txtime-assist mode")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bfabd41d
    • Vinicius Costa Gomes's avatar
      taprio: Use taprio_reset_tc() to reset Traffic Classes configuration · 7c16680a
      Vinicius Costa Gomes authored
      When destroying the current taprio instance, which can happen when the
      creation of one fails, we should reset the traffic class configuration
      back to the default state.
      
      netdev_reset_tc() is a better way because in addition to setting the
      number of traffic classes to zero, it also resets the priority to
      traffic classes mapping to the default value.
      
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c16680a
    • Vinicius Costa Gomes's avatar
      taprio: Add missing policy validation for flags · 49c684d7
      Vinicius Costa Gomes authored
      netlink policy validation for the 'flags' argument was missing.
      
      Fixes: 4cfd5779 ("taprio: Add support for txtime-assist mode")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49c684d7
    • Vinicius Costa Gomes's avatar
      taprio: Fix still allowing changing the flags during runtime · a9d62274
      Vinicius Costa Gomes authored
      Because 'q->flags' starts as zero, and zero is a valid value, we
      aren't able to detect the transition from zero to something else
      during "runtime".
      
      The solution is to initialize 'q->flags' with an invalid value, so we
      can detect if 'q->flags' was set by the user or not.
      
      To better solidify the behavior, 'flags' handling is moved to a
      separate function. The behavior is:
       - 'flags' if unspecified by the user, is assumed to be zero;
       - 'flags' cannot change during "runtime" (i.e. a change() request
       cannot modify it);
      
      With this new function we can remove taprio_flags, which should reduce
      the risk of future accidents.
      
      Allowing flags to be changed was causing the following RCU stall:
      
      [ 1730.558249] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
      [ 1730.558258] rcu: 	  6-...0: (190 ticks this GP) idle=922/0/0x1 softirq=25580/25582 fqs=16250
      [ 1730.558264] 		  (detected by 2, t=65002 jiffies, g=33017, q=81)
      [ 1730.558269] Sending NMI from CPU 2 to CPUs 6:
      [ 1730.559277] NMI backtrace for cpu 6
      [ 1730.559277] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G            E     5.5.0-rc6+ #35
      [ 1730.559278] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019
      [ 1730.559278] RIP: 0010:__hrtimer_run_queues+0xe2/0x440
      [ 1730.559278] Code: 48 8b 43 28 4c 89 ff 48 8b 75 c0 48 89 45 c8 e8 f4 bb 7c 00 0f 1f 44 00 00 65 8b 05 40 31 f0 68 89 c0 48 0f a3 05 3e 5c 25 01 <0f> 82 fc 01 00 00 48 8b 45 c8 48 89 df ff d0 89 45 c8 0f 1f 44 00
      [ 1730.559279] RSP: 0018:ffff9970802d8f10 EFLAGS: 00000083
      [ 1730.559279] RAX: 0000000000000006 RBX: ffff8b31645bff38 RCX: 0000000000000000
      [ 1730.559280] RDX: 0000000000000000 RSI: ffffffff9710f2ec RDI: ffffffff978daf0e
      [ 1730.559280] RBP: ffff9970802d8f68 R08: 0000000000000000 R09: 0000000000000000
      [ 1730.559280] R10: 0000018336d7944e R11: 0000000000000001 R12: ffff8b316e39f9c0
      [ 1730.559281] R13: ffff8b316e39f940 R14: ffff8b316e39f998 R15: ffff8b316e39f7c0
      [ 1730.559281] FS:  0000000000000000(0000) GS:ffff8b316e380000(0000) knlGS:0000000000000000
      [ 1730.559281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1730.559281] CR2: 00007f1105303760 CR3: 0000000227210005 CR4: 00000000003606e0
      [ 1730.559282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1730.559282] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1730.559282] Call Trace:
      [ 1730.559282]  <IRQ>
      [ 1730.559283]  ? taprio_dequeue_soft+0x2d0/0x2d0 [sch_taprio]
      [ 1730.559283]  hrtimer_interrupt+0x104/0x220
      [ 1730.559283]  ? irqtime_account_irq+0x34/0xa0
      [ 1730.559283]  smp_apic_timer_interrupt+0x6d/0x230
      [ 1730.559284]  apic_timer_interrupt+0xf/0x20
      [ 1730.559284]  </IRQ>
      [ 1730.559284] RIP: 0010:cpu_idle_poll+0x35/0x1a0
      [ 1730.559285] Code: 88 82 ff 65 44 8b 25 12 7d 73 68 0f 1f 44 00 00 e8 90 c3 89 ff fb 65 48 8b 1c 25 c0 7e 01 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 be a8 a8 00 85 c0 75 ed e8 75 48 84 ff
      [ 1730.559285] RSP: 0018:ffff997080137ea8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
      [ 1730.559285] RAX: 0000000000000001 RBX: ffff8b316bc3c580 RCX: 0000000000000000
      [ 1730.559286] RDX: 0000000000000001 RSI: 000000002819aad9 RDI: ffffffff978da730
      [ 1730.559286] RBP: ffff997080137ec0 R08: 0000018324a6d387 R09: 0000000000000000
      [ 1730.559286] R10: 0000000000000400 R11: 0000000000000001 R12: 0000000000000006
      [ 1730.559286] R13: ffff8b316bc3c580 R14: 0000000000000000 R15: 0000000000000000
      [ 1730.559287]  ? cpu_idle_poll+0x20/0x1a0
      [ 1730.559287]  ? cpu_idle_poll+0x20/0x1a0
      [ 1730.559287]  do_idle+0x4d/0x1f0
      [ 1730.559287]  ? complete+0x44/0x50
      [ 1730.559288]  cpu_startup_entry+0x1b/0x20
      [ 1730.559288]  start_secondary+0x142/0x180
      [ 1730.559288]  secondary_startup_64+0xb6/0xc0
      [ 1776.686313] nvme nvme0: I/O 96 QID 1 timeout, completion polled
      
      Fixes: 4cfd5779 ("taprio: Add support for txtime-assist mode")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9d62274
    • Vinicius Costa Gomes's avatar
      taprio: Fix enabling offload with wrong number of traffic classes · 5652e63d
      Vinicius Costa Gomes authored
      If the driver implementing taprio offloading depends on the value of
      the network device number of traffic classes (dev->num_tc) for
      whatever reason, it was going to receive the value zero. The value was
      only set after the offloading function is called.
      
      So, moving setting the number of traffic classes to before the
      offloading function is called fixes this issue. This is safe because
      this only happens when taprio is instantiated (we don't allow this
      configuration to be changed without first removing taprio).
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Reported-by: default avatarPo Liu <po.liu@nxp.com>
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Acked-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5652e63d
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port · de34d708
      Florian Fainelli authored
      The 7445 switch clocking profiles do not allow us to run the IMP port at
      2Gb/sec in a way that it is reliable and consistent. Make sure that the
      setting is only applied to the 7278 family.
      
      Fixes: 8f1880cb ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de34d708
    • Florian Fainelli's avatar
      net: dsa: b53: Always use dev->vlan_enabled in b53_configure_vlan() · df373702
      Florian Fainelli authored
      b53_configure_vlan() is called by the bcm_sf2 driver upon setup and
      indirectly through resume as well. During the initial setup, we are
      guaranteed that dev->vlan_enabled is false, so there is no change in
      behavior, however during suspend, we may have enabled VLANs before, so we
      do want to restore that setting.
      
      Fixes: dad8d7c6 ("net: dsa: b53: Properly account for VLAN filtering")
      Fixes: 967dd82f ("net: dsa: b53: Add support for Broadcom RoboSwitch")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df373702
    • Dejin Zheng's avatar
      net: stmmac: fix a possible endless loop · 7d10f077
      Dejin Zheng authored
      It forgot to reduce the value of the variable retry in a while loop
      in the ethqos_configure() function. It may cause an endless loop and
      without timeout.
      
      Fixes: a7c30e62 ("net: stmmac: Add driver for Qualcomm ethqos")
      Signed-off-by: default avatarDejin Zheng <zhengdejin5@gmail.com>
      Acked-by: default avatarVinod Koul <vkoul@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d10f077
    • David Howells's avatar
      rxrpc: Fix call RCU cleanup using non-bh-safe locks · 963485d4
      David Howells authored
      rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
      put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
      number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
      local->client_conns_lock.  RCU callbacks, however, are normally called from
      softirq context, which can cause lockdep to notice the locking
      inconsistency.
      
      To get lockdep to detect this, it's necessary to have the connection
      cleaned up on the put at the end of the last of its calls, though normally
      the clean up is deferred.  This can be induced, however, by starting a call
      on an AF_RXRPC socket and then closing the socket without reading the
      reply.
      
      Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
      workqueue if in softirq-mode and defer the destruction to process context.
      
      Note that another way to fix this could be to add a bunch of bh-disable
      annotations to the spinlocks concerned - and there might be more than just
      those two - but that means spending more time with BHs disabled.
      
      Note also that some of these places were covered by bh-disable spinlocks
      belonging to the rxrpc_transport object, but these got removed without the
      _bh annotation being retained on the next lock in.
      
      Fixes: 999b69f8 ("rxrpc: Kill the client connection bundle concept")
      Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
      Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      963485d4
    • David Howells's avatar
      rxrpc: Fix service call disconnection · b39a934e
      David Howells authored
      The recent patch that substituted a flag on an rxrpc_call for the
      connection pointer being NULL as an indication that a call was disconnected
      puts the set_bit in the wrong place for service calls.  This is only a
      problem if a call is implicitly terminated by a new call coming in on the
      same connection channel instead of a terminating ACK packet.
      
      In such a case, rxrpc_input_implicit_end_call() calls
      __rxrpc_disconnect_call(), which is now (incorrectly) setting the
      disconnection bit, meaning that when rxrpc_release_call() is later called,
      it doesn't call rxrpc_disconnect_call() and so the call isn't removed from
      the peer's error distribution list and the list gets corrupted.
      
      KASAN finds the issue as an access after release on a call, but the
      position at which it occurs is confusing as it appears to be related to a
      different call (the call site is where the latter call is being removed
      from the error distribution list and either the next or pprev pointer
      points to a previously released call).
      
      Fix this by moving the setting of the flag from __rxrpc_disconnect_call()
      to rxrpc_disconnect_call() in the same place that the connection pointer
      was being cleared.
      
      Fixes: 5273a191 ("rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b39a934e
  4. 06 Feb, 2020 13 commits
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2020-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f798a5a0
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2020-02-06
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.19:
       ('net/mlx5: IPsec, Fix esp modify function attribute')
       ('net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx')
      
      For -stable v5.4:
         ('net/mlx5: Deprecate usage of generic TLS HW capability bit')
         ('net/mlx5: Fix deadlock in fs_core')
      
      For -stable v5.5:
         ('net/mlx5e: TX, Error completion is for last WQE in batch')
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f798a5a0
    • Tariq Toukan's avatar
      net/mlx5: Deprecate usage of generic TLS HW capability bit · 61c00cca
      Tariq Toukan authored
      Deprecate the generic TLS cap bit, use the new TX-specific
      TLS cap bit instead.
      
      Fixes: a12ff35e ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      61c00cca
    • Tariq Toukan's avatar
      net/mlx5e: TX, Error completion is for last WQE in batch · b57e66ad
      Tariq Toukan authored
      For a cyclic work queue, when not requesting a completion per WQE,
      a single CQE might indicate the completion of several WQEs.
      However, in case some WQE in the batch causes an error, then an error
      completion is issued, breaking the batch, and pointing to the offending
      WQE in the wqe_counter field.
      
      Hence, WQE-specific error CQE handling (like printing, breaking, etc...)
      should be performed only for the last WQE in batch.
      
      Fixes: 130c7b46 ("net/mlx5e: TX, Dump WQs wqe descriptors on CQE with error events")
      Fixes: fd9b4be8 ("net/mlx5e: RX, Support multiple outstanding UMR posts")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reviewed-by: default avatarAya Levin <ayal@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      b57e66ad
    • Raed Salem's avatar
      net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx · 08db2cf5
      Raed Salem authored
      SA context is allocated at mlx5_fpga_ipsec_create_sa_ctx,
      however the counterpart mlx5_fpga_ipsec_delete_sa_ctx function
      nullifies sa_ctx pointer without freeing the memory allocated,
      hence the memory leak.
      
      Fix by free SA context when the SA is released.
      
      Fixes: d6c4f029 ("net/mlx5: Refactor accel IPSec code")
      Signed-off-by: default avatarRaed Salem <raeds@mellanox.com>
      Reviewed-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      08db2cf5
    • Raed Salem's avatar
      net/mlx5: IPsec, Fix esp modify function attribute · 0dc2c534
      Raed Salem authored
      The function mlx5_fpga_esp_validate_xfrm_attrs is wrongly used
      with negative negation as zero value indicates success but it
      used as failure return value instead.
      
      Fix by remove the unary not negation operator.
      
      Fixes: 05564d0a ("net/mlx5: Add flow-steering commands for FPGA IPSec implementation")
      Signed-off-by: default avatarRaed Salem <raeds@mellanox.com>
      Reviewed-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      0dc2c534
    • Maor Gottlieb's avatar
      net/mlx5: Fix deadlock in fs_core · c1948390
      Maor Gottlieb authored
      free_match_list could be called when the flow table is already
      locked. We need to pass this notation to tree_put_node.
      
      It fixes the following lockdep warnning:
      
      [ 1797.268537] ============================================
      [ 1797.276837] WARNING: possible recursive locking detected
      [ 1797.285101] 5.5.0-rc5+ #10 Not tainted
      [ 1797.291641] --------------------------------------------
      [ 1797.299917] handler10/9296 is trying to acquire lock:
      [ 1797.307885] ffff889ad399a0a0 (&node->lock){++++}, at:
      tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.319694]
      [ 1797.319694] but task is already holding lock:
      [ 1797.330904] ffff889ad399a0a0 (&node->lock){++++}, at:
      nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
      [ 1797.344707]
      [ 1797.344707] other info that might help us debug this:
      [ 1797.356952]  Possible unsafe locking scenario:
      [ 1797.356952]
      [ 1797.368333]        CPU0
      [ 1797.373357]        ----
      [ 1797.378364]   lock(&node->lock);
      [ 1797.384222]   lock(&node->lock);
      [ 1797.390031]
      [ 1797.390031]  *** DEADLOCK ***
      [ 1797.390031]
      [ 1797.403003]  May be due to missing lock nesting notation
      [ 1797.403003]
      [ 1797.414691] 3 locks held by handler10/9296:
      [ 1797.421465]  #0: ffff889cf2c5a110 (&block->cb_lock){++++}, at:
      tc_setup_cb_add+0x70/0x250
      [ 1797.432810]  #1: ffff88a030081490 (&comp->sem){++++}, at:
      mlx5_devcom_get_peer_data+0x4c/0xb0 [mlx5_core]
      [ 1797.445829]  #2: ffff889ad399a0a0 (&node->lock){++++}, at:
      nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
      [ 1797.459913]
      [ 1797.459913] stack backtrace:
      [ 1797.469436] CPU: 1 PID: 9296 Comm: handler10 Kdump: loaded Not
      tainted 5.5.0-rc5+ #10
      [ 1797.480643] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS
      2.4.3 01/17/2017
      [ 1797.491480] Call Trace:
      [ 1797.496701]  dump_stack+0x96/0xe0
      [ 1797.502864]  __lock_acquire.cold.63+0xf8/0x212
      [ 1797.510301]  ? lockdep_hardirqs_on+0x250/0x250
      [ 1797.517701]  ? mark_held_locks+0x55/0xa0
      [ 1797.524547]  ? quarantine_put+0xb7/0x160
      [ 1797.531422]  ? lockdep_hardirqs_on+0x17d/0x250
      [ 1797.538913]  lock_acquire+0xd6/0x1f0
      [ 1797.545529]  ? tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.553701]  down_write+0x94/0x140
      [ 1797.560206]  ? tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.568464]  ? down_write_killable_nested+0x170/0x170
      [ 1797.576925]  ? del_hw_flow_group+0xde/0x1f0 [mlx5_core]
      [ 1797.585629]  tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.593891]  ? free_match_list.part.25+0x147/0x170 [mlx5_core]
      [ 1797.603389]  free_match_list.part.25+0xe0/0x170 [mlx5_core]
      [ 1797.612654]  _mlx5_add_flow_rules+0x17e2/0x20b0 [mlx5_core]
      [ 1797.621838]  ? lock_acquire+0xd6/0x1f0
      [ 1797.629028]  ? esw_get_prio_table+0xb0/0x3e0 [mlx5_core]
      [ 1797.637981]  ? alloc_insert_flow_group+0x420/0x420 [mlx5_core]
      [ 1797.647459]  ? try_to_wake_up+0x4c7/0xc70
      [ 1797.654881]  ? lock_downgrade+0x350/0x350
      [ 1797.662271]  ? __mutex_unlock_slowpath+0xb1/0x3f0
      [ 1797.670396]  ? find_held_lock+0xac/0xd0
      [ 1797.677540]  ? mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
      [ 1797.686467]  mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
      [ 1797.695134]  ? _mlx5_add_flow_rules+0x20b0/0x20b0 [mlx5_core]
      [ 1797.704270]  ? irq_exit+0xa5/0x170
      [ 1797.710764]  ? retint_kernel+0x10/0x10
      [ 1797.717698]  ? mlx5_eswitch_set_rule_source_port.isra.9+0x122/0x230
      [mlx5_core]
      [ 1797.728708]  mlx5_eswitch_add_offloaded_rule+0x465/0x6d0 [mlx5_core]
      [ 1797.738713]  ? mlx5_eswitch_get_prio_range+0x30/0x30 [mlx5_core]
      [ 1797.748384]  ? mlx5_fc_stats_work+0x670/0x670 [mlx5_core]
      [ 1797.757400]  mlx5e_tc_offload_fdb_rules.isra.27+0x24/0x90 [mlx5_core]
      [ 1797.767665]  mlx5e_tc_add_fdb_flow+0xaf8/0xd40 [mlx5_core]
      [ 1797.776886]  ? mlx5e_encap_put+0xd0/0xd0 [mlx5_core]
      [ 1797.785562]  ? mlx5e_alloc_flow.isra.43+0x18c/0x1c0 [mlx5_core]
      [ 1797.795353]  __mlx5e_add_fdb_flow+0x2e2/0x440 [mlx5_core]
      [ 1797.804558]  ? mlx5e_tc_update_neigh_used_value+0x8c0/0x8c0
      [mlx5_core]
      [ 1797.815093]  ? wait_for_completion+0x260/0x260
      [ 1797.823272]  mlx5e_configure_flower+0xe94/0x1620 [mlx5_core]
      [ 1797.832792]  ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
      [ 1797.842096]  ? down_read+0x11a/0x2e0
      [ 1797.849090]  ? down_write+0x140/0x140
      [ 1797.856142]  ? mlx5e_rep_indr_setup_block_cb+0xc0/0xc0 [mlx5_core]
      [ 1797.866027]  tc_setup_cb_add+0x11a/0x250
      [ 1797.873339]  fl_hw_replace_filter+0x25e/0x320 [cls_flower]
      [ 1797.882385]  ? fl_hw_destroy_filter+0x1c0/0x1c0 [cls_flower]
      [ 1797.891607]  fl_change+0x1d54/0x1fb6 [cls_flower]
      [ 1797.899772]  ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
      [cls_flower]
      [ 1797.910728]  ? lock_downgrade+0x350/0x350
      [ 1797.918187]  ? __radix_tree_lookup+0xa5/0x130
      [ 1797.926046]  ? fl_set_key+0x1590/0x1590 [cls_flower]
      [ 1797.934611]  ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
      [cls_flower]
      [ 1797.945673]  tc_new_tfilter+0xcd1/0x1240
      [ 1797.953138]  ? tc_del_tfilter+0xb10/0xb10
      [ 1797.960688]  ? avc_has_perm_noaudit+0x92/0x320
      [ 1797.968721]  ? avc_has_perm_noaudit+0x1df/0x320
      [ 1797.976816]  ? avc_has_extended_perms+0x990/0x990
      [ 1797.985090]  ? mark_lock+0xaa/0x9e0
      [ 1797.991988]  ? match_held_lock+0x1b/0x240
      [ 1797.999457]  ? match_held_lock+0x1b/0x240
      [ 1798.006859]  ? find_held_lock+0xac/0xd0
      [ 1798.014045]  ? symbol_put_addr+0x40/0x40
      [ 1798.021317]  ? rcu_read_lock_sched_held+0xd0/0xd0
      [ 1798.029460]  ? tc_del_tfilter+0xb10/0xb10
      [ 1798.036810]  rtnetlink_rcv_msg+0x4d5/0x620
      [ 1798.044236]  ? rtnl_bridge_getlink+0x460/0x460
      [ 1798.052034]  ? lockdep_hardirqs_on+0x250/0x250
      [ 1798.059837]  ? match_held_lock+0x1b/0x240
      [ 1798.067146]  ? find_held_lock+0xac/0xd0
      [ 1798.074246]  netlink_rcv_skb+0xc6/0x1f0
      [ 1798.081339]  ? rtnl_bridge_getlink+0x460/0x460
      [ 1798.089104]  ? netlink_ack+0x440/0x440
      [ 1798.096061]  netlink_unicast+0x2d4/0x3b0
      [ 1798.103189]  ? netlink_attachskb+0x3f0/0x3f0
      [ 1798.110724]  ? _copy_from_iter_full+0xda/0x370
      [ 1798.118415]  netlink_sendmsg+0x3ba/0x6a0
      [ 1798.125478]  ? netlink_unicast+0x3b0/0x3b0
      [ 1798.132705]  ? netlink_unicast+0x3b0/0x3b0
      [ 1798.139880]  sock_sendmsg+0x94/0xa0
      [ 1798.146332]  ____sys_sendmsg+0x36c/0x3f0
      [ 1798.153251]  ? copy_msghdr_from_user+0x165/0x230
      [ 1798.160941]  ? kernel_sendmsg+0x30/0x30
      [ 1798.167738]  ___sys_sendmsg+0xeb/0x150
      [ 1798.174411]  ? sendmsg_copy_msghdr+0x30/0x30
      [ 1798.181649]  ? lock_downgrade+0x350/0x350
      [ 1798.188559]  ? rcu_read_lock_sched_held+0xd0/0xd0
      [ 1798.196239]  ? __fget+0x21d/0x320
      [ 1798.202335]  ? do_dup2+0x2a0/0x2a0
      [ 1798.208499]  ? lock_downgrade+0x350/0x350
      [ 1798.215366]  ? __fget_light+0xd6/0xf0
      [ 1798.221808]  ? syscall_trace_enter+0x369/0x5d0
      [ 1798.229112]  __sys_sendmsg+0xd3/0x160
      [ 1798.235511]  ? __sys_sendmsg_sock+0x60/0x60
      [ 1798.242478]  ? syscall_trace_enter+0x233/0x5d0
      [ 1798.249721]  ? syscall_slow_exit_work+0x280/0x280
      [ 1798.257211]  ? do_syscall_64+0x1e/0x2e0
      [ 1798.263680]  do_syscall_64+0x72/0x2e0
      [ 1798.269950]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: default avatarMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      c1948390
    • Florian Fainelli's avatar
      net: systemport: Avoid RBUF stuck in Wake-on-LAN mode · 263a425a
      Florian Fainelli authored
      After a number of suspend and resume cycles, it is possible for the RBUF
      to be stuck in Wake-on-LAN mode, despite the MPD enable bit being
      cleared which instructed the RBUF to exit that mode.
      
      Avoid creating that problematic condition by clearing the RX_EN and
      TX_EN bits in the UniMAC prior to disable the Magic Packet Detector
      logic which is guaranteed to make the RBUF exit Wake-on-LAN mode.
      
      Fixes: 83e82f4c ("net: systemport: add Wake-on-LAN support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      263a425a
    • Heiner Kallweit's avatar
      r8169: fix performance regression related to PCIe max read request size · 21b5f672
      Heiner Kallweit authored
      It turned out that on low performance systems the original change can
      cause lower tx performance. On a N3450-based mini-PC tx performance
      in iperf3 was reduced from 950Mbps to ~900Mbps. Therefore effectively
      revert the original change, just use pcie_set_readrq() now instead of
      changing the PCIe capability register directly.
      
      Fixes: 2df49d36 ("r8169: remove fiddling with the PCIe max read request size")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      21b5f672
    • Dan Carpenter's avatar
      net: sched: prevent a use after free · 7a02ea65
      Dan Carpenter authored
      The bug is that we call kfree_skb(skb) and then pass "skb" to
      qdisc_pkt_len(skb) on the next line, which is a use after free.
      Also Cong Wang points out that it's better to delay the actual
      frees until we drop the rtnl lock so we should use rtnl_kfree_skbs()
      instead of kfree_skb().
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a02ea65
    • Qian Cai's avatar
      skbuff: fix a data race in skb_queue_len() · 86b18aaa
      Qian Cai authored
      sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
      
       read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
        unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
      				 net/unix/af_unix.c:1761
        ____sys_sendmsg+0x33e/0x370
        ___sys_sendmsg+0xa6/0xf0
        __sys_sendmsg+0x69/0xf0
        __x64_sys_sendmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
        __skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
        __skb_try_recv_datagram+0xbe/0x220
        unix_dgram_recvmsg+0xee/0x850
        ____sys_recvmsg+0x1fb/0x210
        ___sys_recvmsg+0xa2/0xf0
        __sys_recvmsg+0x66/0xf0
        __x64_sys_recvmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Since only the read is operating as lockless, it could introduce a logic
      bug in unix_recvq_full() due to the load tearing. Fix it by adding
      a lockless variant of skb_queue_len() and unix_recvq_full() where
      READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
      the commit d7d16a89 ("net: add skb_queue_empty_lockless()").
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86b18aaa
    • Lorenzo Bianconi's avatar
      net: mvneta: move rx_dropped and rx_errors in per-cpu stats · c35947b8
      Lorenzo Bianconi authored
      Move rx_dropped and rx_errors counters in mvneta_pcpu_stats in order to
      avoid possible races updating statistics
      
      Fixes: 562e2f46 ("net: mvneta: Improve the buffer allocation method for SWBM")
      Fixes: dc35a10f ("net: mvneta: bm: add support for hardware buffer management")
      Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c35947b8
    • Devulapally Shiva Krishna's avatar
      cxgb4: Added tls stats prints. · 45a8317e
      Devulapally Shiva Krishna authored
      Added debugfs entry to show the tls stats.
      Signed-off-by: default avatarDevulapally Shiva Krishna <shiva@chelsio.com>
      Signed-off-by: default avatarVinay Kumar Yadav <vinay.yadav@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45a8317e
    • Florian Westphal's avatar
      mptcp: fix use-after-free for ipv6 · b0519de8
      Florian Westphal authored
      Turns out that when we accept a new subflow, the newly created
      inet_sk(tcp_sk)->pinet6 points at the ipv6_pinfo structure of the
      listener socket.
      
      This wasn't caught by the selftest because it closes the accepted fd
      before the listening one.
      
      adding a close(listenfd) after accept returns is enough:
       BUG: KASAN: use-after-free in inet6_getname+0x6ba/0x790
       Read of size 1 at addr ffff88810e310866 by task mptcp_connect/2518
       Call Trace:
        inet6_getname+0x6ba/0x790
        __sys_getpeername+0x10b/0x250
        __x64_sys_getpeername+0x6f/0xb0
      
      also alter test program to exercise this.
      Reported-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0519de8
  5. 05 Feb, 2020 2 commits
    • Sudarsana Reddy Kalluru's avatar
      qed: Fix timestamping issue for L2 unicast ptp packets. · 0202d293
      Sudarsana Reddy Kalluru authored
      commit cedeac9d ("qed: Add support for Timestamping the unicast
      PTP packets.") handles the timestamping of L4 ptp packets only.
      This patch adds driver changes to detect/timestamp both L2/L4 unicast
      PTP packets.
      
      Fixes: cedeac9d ("qed: Add support for Timestamping the unicast PTP packets.")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0202d293
    • David S. Miller's avatar
      Merge branch 'macb-TSO-bug-fixes' · 83576e32
      David S. Miller authored
      Harini Katakam says:
      
      ====================
      macb: TSO bug fixes
      
      An IP errata was recently discovered when testing TSO enabled versions
      with perf test tools where a false amba error is reported by the IP.
      Some ways to reproduce would be to use iperf or applications with payload
      descriptor sizes very close to 16K. Once the error is observed TXERR (or
      bit 6 of ISR) will be constantly triggered leading to a series of tx path
      error handling and clean up. Workaround the same by limiting this size to
      0x3FC0 as recommended by Cadence. There was no performance impact on 1G
      system that I tested with.
      
      Note on patch 1: The alignment code may be unused but leaving it there
      in case anyone is using UFO.
      
      Added Fixes tag to patch 1.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      83576e32