1. 18 Jun, 2016 2 commits
    • tipc: fix socket timer deadlock · f1d048f2
      Jon Paul Maloy authored
      We sometimes observe a 'deadly embrace' type deadlock occurring
      between mutually connected sockets on the same node. This happens
      when the one-hour peer supervision timers happen to expire
      simultaneously in both sockets.
      
      The scenario is as follows:
      
      CPU 1:                          CPU 2:
      --------                        --------
      tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
        lock(sk1.slock)                 lock(sk2.slock)
        msg_create(probe)               msg_create(probe)
        unlock(sk1.slock)               unlock(sk2.slock)
        tipc_node_xmit_skb()            tipc_node_xmit_skb()
          tipc_node_xmit()                tipc_node_xmit()
            tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
              lock(sk2.slock)                 lock(sk1.slock)
              filter_rcv()                    filter_rcv()
                tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
                  msg_create(probe_rsp)           msg_create(probe_rsp)
                  tipc_sk_respond()               tipc_sk_respond()
                    tipc_node_xmit_skb()            tipc_node_xmit_skb()
                      tipc_node_xmit()                tipc_node_xmit()
                        tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                          lock(sk1.slock)                 lock(sk2.slock)
                          ===> DEADLOCK                   ===> DEADLOCK
      
      Further analysis reveals that there are three different locations in the
      socket code where tipc_sk_respond() is called within the context of the
      socket lock, with ensuing risk of similar deadlocks.
      
      We now solve this by passing a buffer queue along with all upcalls where
      sk_lock.slock may potentially be held. Response or rejected message
      buffers are accumulated into this queue instead of being sent out
      directly, and only sent once we know we are safely outside the slock
      context.
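
The deferred-send idea described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: a simple flag stands in for sk_lock.slock, the run is single-threaded, and every name (`sock_sim`, `xmit_queue`, `sk_rcv_locked`, `deliver`, `sk_timeout`) is hypothetical rather than taken from net/tipc.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the net/tipc API. */
#define XMITQ_MAX 8

struct msg {
    int dest;      /* index of the destination socket */
    int is_probe;  /* 1 = probe, 0 = probe response */
};

struct xmit_queue {
    struct msg items[XMITQ_MAX];
    size_t len;
};

struct sock_sim {
    int locked;          /* plays the role of sk_lock.slock */
    int responses_seen;
};

static int locks_held;   /* number of simulated slocks currently held */

static void sk_lock(struct sock_sim *sk)
{
    assert(!sk->locked);
    sk->locked = 1;
    locks_held++;
}

static void sk_unlock(struct sock_sim *sk)
{
    assert(sk->locked);
    sk->locked = 0;
    locks_held--;
}

/* Runs with the socket lock held: replies are queued, never sent directly. */
static void sk_rcv_locked(struct sock_sim *sk, struct msg in, int sender,
                          struct xmit_queue *xmitq)
{
    if (in.is_probe) {
        struct msg rsp = { .dest = sender, .is_probe = 0 };

        assert(xmitq->len < XMITQ_MAX);
        xmitq->items[xmitq->len++] = rsp;
    } else {
        sk->responses_seen++;
    }
}

/* Deliver one message, then flush queued replies once the lock is dropped. */
static void deliver(struct sock_sim *socks, struct msg m, int sender)
{
    struct xmit_queue xmitq = { .len = 0 };
    struct sock_sim *sk = &socks[m.dest];
    size_t i;

    assert(locks_held == 0);  /* transmit only outside any socket lock */

    sk_lock(sk);
    sk_rcv_locked(sk, m, sender, &xmitq);
    sk_unlock(sk);

    for (i = 0; i < xmitq.len; i++)
        deliver(socks, xmitq.items[i], m.dest);
}

/* Timer handler: create the probe under the lock, send it after unlocking. */
static void sk_timeout(struct sock_sim *socks, int self, int peer)
{
    struct msg probe;

    sk_lock(&socks[self]);
    probe.dest = peer;
    probe.is_probe = 1;
    sk_unlock(&socks[self]);

    deliver(socks, probe, self);
}
```

In this sketch the assertion at the top of `deliver()` checks the property the commit message aims for: no socket lock is held at the moment a response is transmitted.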
      Reported-by: GUNA <gbalasun@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 695ef16c
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree,
      they are rather small patches but fixing several outstanding bugs in
      nf_conntrack and nf_tables, as well as minor problems with missing
      SYNPROXY header uapi installation:
      
      1) Oneliner not to leak conntrack kmemcache on module removal, this
         problem was introduced in the previous merge window, patch from
         Florian Westphal.
      
      2) Two fixes for insufficient ruleset loop validation, one due to
         incorrect flag check in nf_tables_bind_set() and another related to
         silly wrong generation mask logic from the walk path, from Liping
         Zhang.
      
      3) Fix double-free of anonymous sets on error, this fix simplifies the
         code to let the abort path take care of releasing the set object,
         also from Liping Zhang.
      
      4) The introduction of helper function for transactions broke the skip
         inactive rules logic from the nft_do_chain(), again from Liping
         Zhang.
      
      5) Two patches to install uapi xt_SYNPROXY.h header and calm down
         kbuild robot due to missing #include <linux/types.h>.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 Jun, 2016 4 commits
  3. 16 Jun, 2016 14 commits
    • mlx4e: Do not attempt to offload VXLAN ports that are unrecognized · a547224d
      Alexander Duyck authored
      The mlx4e driver does not support more than one port for VXLAN offload.
      As such, expecting the hardware to offload other ports is invalid, since
      it appears that the parsing logic is used to perform Tx checksum and
      segmentation offloads.  Use the vxlan_port number to determine in which
      cases we can apply the offload and in which cases we cannot.
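
A minimal sketch of the port check described above, assuming a single configured VXLAN port per device. The struct and function names are hypothetical and do not reflect the mlx4e driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state: one offloaded VXLAN UDP port, 0 = none. */
struct dev_sim {
    uint16_t vxlan_port;   /* host byte order, for simplicity */
};

/* Allow the offload only when the outer UDP destination port is the one
 * port the (simulated) hardware has been told to parse as VXLAN. */
static bool can_offload_vxlan(const struct dev_sim *dev, uint16_t udp_dport)
{
    assert(dev != NULL);
    return dev->vxlan_port != 0 && udp_dport == dev->vxlan_port;
}
```

Packets for any other UDP port would then take whatever non-offloaded path the driver provides.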
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfc: avoid -Wtype-limits warning · 17471c7b
      Arnd Bergmann authored
      When building with -Wextra, we get a harmless warning from the
      EFX_EXTRACT_OWORD32 macro:
      
      ethernet/sfc/farch.c: In function 'efx_farch_test_registers':
      ethernet/sfc/farch.c:119:30: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
      ethernet/sfc/farch.c:124:144: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
      ethernet/sfc/farch.c:124:392: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
      ethernet/sfc/farch.c:124:731: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
      
      The macro and the caller are both correct, but we can avoid the
      warning by changing the index variable to a signed type.
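
To illustrate the kind of warning involved, here is a small sketch with a hypothetical range-check macro. It is not the real EFX_EXTRACT_OWORD32 definition; it only shows why a signed index keeps the `< 0` comparison meaningful to the compiler.

```c
#include <assert.h>

/* Hypothetical macro: true when idx lies outside [low, high]. */
#define OUT_OF_RANGE(idx, low, high) ((idx) < (low) || (idx) > (high))

/* Count how many of the indices 0..n-1 fall inside the range [0, 3]. */
static int count_in_range(int n)
{
    int hits = 0;
    int i;  /* signed on purpose: with an unsigned i, "(i) < (0)" is always
             * false and GCC's -Wtype-limits reports exactly that */

    assert(n >= 0);
    for (i = 0; i < n; i++)
        if (!OUT_OF_RANGE(i, 0, 3))
            hits++;
    return hits;
}
```

The behaviour is the same for either index type as long as the values are non-negative, which is why the warning is described as harmless.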
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Bert Kenward <bkenward@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'r8152-fixes' · 13eab83f
      David S. Miller authored
      Hayes Wang says:
      
      ====================
      r8152: fix known issues
      
      These patches fix some known issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: correct the rx early size · a59e6d81
      hayeswang authored
      The rx early size should be
      
      	(agg_buf_sz - packet size) / 8
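
The formula above, written as a small helper. The function name and the sample sizes used to exercise it are assumptions for illustration, not values from the r8152 driver.

```c
#include <assert.h>
#include <stdint.h>

/* rx early size = (agg_buf_sz - packet size) / 8, using integer division. */
static uint32_t rx_early_size(uint32_t agg_buf_sz, uint32_t packet_size)
{
    assert(packet_size <= agg_buf_sz);  /* avoid unsigned wrap-around */
    return (agg_buf_sz - packet_size) / 8;
}
```

For example, with an assumed 16384-byte aggregation buffer and a 1522-byte packet, the helper computes (16384 - 1522) / 8 and truncates the result to an integer.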
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: reset the bmu · 93fe9b18
      hayeswang authored
      Reset the BMU to clear the rx/tx fifo. This prevents unexpected
      data from remaining in the hw.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: disable MAC clock speed down · 4e384ac1
      hayeswang authored
      Disable MAC clock speed down. It may cause the first control
      transfer to contain the wrong data when the power state changes
      from U1 to U0.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-fixes' · 8c08c732
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf fixes
      
      Fixes for two bpf bugs:
      1st bug reported by Sasha Goldshtein here:
      https://github.com/iovisor/bcc/issues/570
      2nd discovered by Daniel Borkmann by manual code analysis.
      See patches for details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, trace: check event type in bpf_perf_event_read · ad572d17
      Alexei Starovoitov authored
      Similar to bpf_perf_event_output(), the bpf_perf_event_read() helper
      needs to check the type of the perf_event before reading the counter.
      
      Fixes: a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix matching of data/data_end in verifier · 19de99f7
      Alexei Starovoitov authored
      The ctx structure passed into bpf programs is different depending on bpf
      program type. The verifier incorrectly marked ctx->data and ctx->data_end
      access based on ctx offset only. That caused loads in tracing programs
      int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }
      to be incorrectly marked as PTR_TO_PACKET which later caused verifier
      to reject the program that was actually valid in tracing context.
      Fix this by doing program type specific matching of ctx offsets.
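
The idea of program-type-specific matching can be sketched as follows. Everything here is a simplified assumption: the enums, the two context layouts, and `classify_ctx_load()` are hypothetical stand-ins and do not mirror the verifier's real data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical program and register types for illustration only. */
enum prog_type_sim { PROG_NET_SIM, PROG_TRACING_SIM };
enum reg_type_sim { SCALAR_SIM, PTR_TO_PACKET_SIM, PTR_TO_PACKET_END_SIM };

/* Hypothetical context layout of a networking program. A tracing program
 * would receive a different structure (such as saved registers), so the
 * same byte offset means something else there. */
struct net_ctx_sim {
    unsigned int len;
    unsigned int data;
    unsigned int data_end;
};

/* Classify a context load by program type first, and by offset second. */
static enum reg_type_sim classify_ctx_load(enum prog_type_sim type, size_t off)
{
    if (type != PROG_NET_SIM)
        return SCALAR_SIM;  /* no packet pointers outside networking ctx */
    if (off == offsetof(struct net_ctx_sim, data))
        return PTR_TO_PACKET_SIM;
    if (off == offsetof(struct net_ctx_sim, data_end))
        return PTR_TO_PACKET_END_SIM;
    return SCALAR_SIM;
}
```

Matching on the offset alone, without the program-type check, would make a tracing program's load at the same offset look like a packet pointer, which is the misclassification described above.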
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Reported-by: Sasha Goldshtein <goldshtn@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: fix error handler · e582615a
      Eric Dumazet authored
      1) gre_parse_header() can be called from gre_err().

         At this point the transport header points to the ICMP header, not
         the inner header.

      2) We cannot really change the transport header, as ipgre_err() will
         later assume that it still points to the ICMP header (using
         icmp_hdr()).

      3) The pskb_may_pull() logic in gre_parse_header() only works if we
         are interested in the zone pointed to by skb->data.

      4) As Jiri explained in commit b7f8fe25 ("gre: do not pull header in
         ICMP error processing"), we should not pull headers in the error
         handler.

      So this fix:

      A) Changes gre_parse_header() to use skb->data instead of
         skb_transport_header().

      B) Adds an nhs parameter to gre_parse_header() so that we can skip the
         IP header that has not been pulled in the error path.
         This offset is 0 for the normal receive path.

      C) Removes obsolete IPv6 includes.
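
A rough userspace sketch of parsing relative to the start of the data plus an offset, in the spirit of the nhs parameter described above. The function `gre_hdr_len_at()` is hypothetical; only the GRE flag bits (checksum, key, sequence) follow the standard header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRE_CSUM 0x8000u  /* checksum present */
#define GRE_KEY  0x2000u  /* key present */
#define GRE_SEQ  0x1000u  /* sequence number present */

/* Return the GRE header length found at data + nhs, or -1 if the buffer is
 * too short. nhs is the number of bytes to skip before the GRE header:
 * 0 when data already points at it, or the length of an unpulled IP header
 * in an error path. */
static int gre_hdr_len_at(const uint8_t *data, size_t len, size_t nhs)
{
    size_t hdr_len = 4;  /* flags/version (2 bytes) + protocol (2 bytes) */
    uint16_t flags;

    assert(data != NULL);
    if (len < nhs + hdr_len)
        return -1;

    flags = (uint16_t)((data[nhs] << 8) | data[nhs + 1]);
    if (flags & GRE_CSUM)
        hdr_len += 4;
    if (flags & GRE_KEY)
        hdr_len += 4;
    if (flags & GRE_SEQ)
        hdr_len += 4;

    if (len < nhs + hdr_len)
        return -1;
    return (int)hdr_len;
}
```

The caller never moves a header pointer here; it only tells the parser how far into the buffer the GRE header starts.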
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG · daddef76
      Jason A. Donenfeld authored
      The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
      case was added with 2c94b537 ("net: Implement net_dbg_ratelimited() for
      CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
      usual definition of the dynamic_pr_debug macro, but alter it by adding a
      call to "net_ratelimit()" in the if statement. This is, in fact, the
      correct approach.
      
      However, while doing this, the author of the commit forgot to surround
      fmt by pr_fmt, resulting in unprefixed log messages appearing in the
      console. So, this commit adds back the pr_fmt(fmt) invocation, making
      net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
      DYNAMIC_DEBUG cases, and bringing parity with the behavior of
      dynamic_pr_debug as well.
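
The effect of wrapping fmt in pr_fmt() can be shown with a small userspace imitation. The "mymod: " prefix, the `log_buf` buffer, and the two flags standing in for net_ratelimit() and the dynamic debug check are all assumptions for illustration. The macros are not the kernel's definitions and rely on the GNU `##__VA_ARGS__` extension.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical module prefix; kernel sources define pr_fmt per file. */
#define pr_fmt(fmt) "mymod: " fmt

static char log_buf[128];      /* captures the last "logged" message */
static int ratelimit_ok = 1;   /* stand-in for net_ratelimit() */
static int debug_enabled = 1;  /* stand-in for the dynamic debug check */

/* Variant that forgets pr_fmt(): the message is logged without a prefix. */
#define dbg_ratelimited_unprefixed(fmt, ...)                                \
    do {                                                                    \
        if (debug_enabled && ratelimit_ok)                                  \
            snprintf(log_buf, sizeof(log_buf), fmt, ##__VA_ARGS__);         \
    } while (0)

/* Variant that applies pr_fmt(fmt): the prefix is concatenated at compile
 * time because both pieces are string literals. */
#define dbg_ratelimited(fmt, ...)                                           \
    do {                                                                    \
        if (debug_enabled && ratelimit_ok)                                  \
            snprintf(log_buf, sizeof(log_buf), pr_fmt(fmt), ##__VA_ARGS__); \
    } while (0)
```

Calling `dbg_ratelimited("drop %d\n", 3)` in this sketch stores "mymod: drop 3\n", while the unprefixed variant stores only "drop 3\n".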
      
      Fixes: 2c94b537 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Tim Bingham <tbingham@akamai.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: skfb: remove obsolete -I cflag · 4a183670
      Arnd Bergmann authored
      The skfp driver has been moved to drivers/net/fddi/skfp a long time
      ago, but we still attempt to include headers from the old location,
      which causes a warning when building with W=1:
      
      cc1: error: /git/arm-soc/drivers/net/skfp: No such file or directory [-Werror=missing-include-dirs]
      cc1: error: drivers/net/skfp: No such file or directory [-Werror=missing-include-dirs]
      
      Clearly this include directive is not needed any more, so we can
      just remove it now.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate uninitialized variable warning · c91522f8
      Ying Xue authored
      net/tipc/link.c: In function ‘tipc_link_timeout’:
      net/tipc/link.c:744:28: warning: ‘mtyp’ may be used uninitialized in this function [-Wuninitialized]
      
      Fixes: 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix suspicious RCU usage · 66d95b67
      Ying Xue authored
      When running the tipcTS & tipcTC test suite, the following complaint appears:
      
      [   56.926168] ===============================
      [   56.926169] [ INFO: suspicious RCU usage. ]
      [   56.926171] 4.7.0-rc1+ #160 Not tainted
      [   56.926173] -------------------------------
      [   56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage!
      [   56.926175]
      [   56.926175] other info that might help us debug this:
      [   56.926175]
      [   56.926177]
      [   56.926177] rcu_scheduler_active = 1, debug_locks = 1
      [   56.926179] 3 locks held by swapper/4/0:
      [   56.926180]  #0:  (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340
      [   56.926203]  #1:  (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc]
      [   56.926212]  #2:  (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
      [   56.926218]
      [   56.926218] stack backtrace:
      [   56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ #160
      [   56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   56.926224]  0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0
      [   56.926227]  0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120
      [   56.926230]  ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88
      [   56.926234] Call Trace:
      [   56.926235]  <IRQ>  [<ffffffff813c4423>] dump_stack+0x67/0x94
      [   56.926250]  [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120
      [   56.926256]  [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc]
      [   56.926261]  [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc]
      [   56.926266]  [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
      [   56.926273]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926278]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926283]  [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc]
      [   56.926288]  [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340
      [   56.926291]  [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340
      [   56.926296]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926300]  [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390
      [   56.926306]  [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130
      [   56.926316]  [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2
      [   56.926323]  [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0
      [   56.926327]  [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60
      [   56.926331]  [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90
      [   56.926333]  <EOI>  [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0
      [   56.926340]  [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0
      [   56.926342]  [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20
      [   56.926345]  [<ffffffff810adf0f>] default_idle_call+0x2f/0x50
      [   56.926347]  [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0
      [   56.926353]  [<ffffffff81040ad9>] start_secondary+0xf9/0x100
      
      The warning appears because rtnl_dereference() is wrongly used in
      tipc_l2_send_msg() under RCU read lock protection. The proper call
      here is rcu_dereference_rtnl().
      
      Fixes: 5b7066c3 ("tipc: stricter filtering of packets in bearer layer")
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 15 Jun, 2016 19 commits
  5. 14 Jun, 2016 1 commit
    • udp reuseport: fix packet of same flow hashed to different socket · d1e37288
      Su, Xuemin authored
      There is a corner case in which udp packets belonging to the same
      flow are hashed to different sockets when hslot->count changes from 10
      to 11:
      
      1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash,
      and always passes 'daddr' to udp_ehashfn().
      
      2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2,
      but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to
      INADDR_ANY instead of some specific addr.
      
      That means when hslot->count changes from 10 to 11, the hash calculated by
      udp_ehashfn() also changes, and the udp packets belonging to the same
      flow will be hashed to a different socket.
      
      This is easily reproduced:
      1) Create 10 udp sockets and bind all of them to 0.0.0.0:40000.
      2) From the same host send udp packets to 127.0.0.1:40000, record the
      socket index which receives the packets.
      3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096
      is 40000 + UDP_HASH_SIZE(4096), which puts the new socket into the
      same hslot as the aforementioned 10 sockets and changes hslot->count
      from 10 to 11.
      4) From the same host send udp packets to 127.0.0.1:40000, and the socket
      index which receives the packets will be different from the one recorded
      in step 2.
      This should not happen as the socket bound to 0.0.0.0:44096 should not
      change the behavior of the sockets bound to 0.0.0.0:40000.
      
      It's the same case for IPv6, and this patch also fixes that.
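
The hash instability described above can be modelled with a toy hash. This is a sketch under stated assumptions: `toy_ehash()` is a made-up stand-in (the kernel's udp_ehashfn() is a keyed hash), all sockets are assumed bound to INADDR_ANY, and `hash_after()` shows just one plausible way to keep the hash stable, namely always hashing the packet's real destination address. It is not claimed to be the patch's exact change.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_ANY_SIM   0u   /* stands in for INADDR_ANY */
#define SLOT_THRESHOLD 10u  /* the hslot->count threshold from the text */

/* Toy flow hash over the 4-tuple; not the kernel's algorithm. */
static uint32_t toy_ehash(uint32_t laddr, uint16_t lport,
                          uint32_t faddr, uint16_t fport)
{
    uint32_t h = laddr;

    h = h * 31u + lport;
    h = h * 31u + faddr;
    h = h * 31u + fport;
    return h;
}

/* Behaviour as described: above the threshold the lookup may hash with the
 * wildcard address instead of the packet's destination address. */
static uint32_t hash_before(unsigned int slot_count, uint32_t daddr,
                            uint16_t dport, uint32_t saddr, uint16_t sport)
{
    uint32_t laddr = slot_count > SLOT_THRESHOLD ? ADDR_ANY_SIM : daddr;

    return toy_ehash(laddr, dport, saddr, sport);
}

/* Assumed remedy: the hash input no longer depends on the slot count. */
static uint32_t hash_after(unsigned int slot_count, uint32_t daddr,
                           uint16_t dport, uint32_t saddr, uint16_t sport)
{
    (void)slot_count;
    return toy_ehash(daddr, dport, saddr, sport);
}
```

With the same 4-tuple, `hash_before()` gives different values for slot counts 10 and 11, whereas `hash_after()` gives the same value for both, so a socket chosen from the hash would not change when an unrelated socket joins the slot.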
      Signed-off-by: Su, Xuemin <suxm@chinanetcenter.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>