1. 10 Feb, 2017 · 40 commits
    • svc: Avoid garbage replies when pc_func() returns rpc_drop_reply · 824c2230
      Chuck Lever authored
      commit 0533b130 upstream.
      
      If an RPC program does not set vs_dispatch and pc_func() returns
      rpc_drop_reply, the server sends a reply anyway, consisting of a single
      word that contains the value RPC_DROP_REPLY (in network byte order, of
      course). This is a nonsense RPC message.
      
      Fixes: 9e701c61 ("svcrpc: simpler request dropping")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • iwlwifi: pcie: fix access to scratch buffer · 8d83a538
      Sara Sharon authored
      commit d5d0689a upstream.
      
      This fixes a pretty ancient bug that hasn't manifested itself
      until now.
      The scratchbuf for the command queue is allocated for only 32 slots
      but is accessed with the queue write pointer, which can be
      up to 256.
      Since the scratch buf size was 16 and there are up to 256 TFDs
      (16 × 256 = 4096 bytes, a single 4 KiB page), we never crossed a page
      boundary when accessing the scratch buffer. However, when we attempted
      to increase the size of the scratch buffer, a panic quickly followed as
      soon as an access crossed a page boundary.
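      A user-space sketch of the indexing problem described above, not the driver's code: the constants and the helper `cmd_scratch_slot()` are hypothetical, and the actual fix may compute the slot differently. It shows why a write pointer of up to 255 has to be reduced to one of the 32 allocated slots before it is used as an index.

```c
#include <assert.h>

/* Hypothetical sizes taken from the commit text: 32 scratch slots for
 * the command queue, but a write pointer that can reach 256. */
#define CMD_QUEUE_SLOTS 32u
#define TFD_QUEUE_SIZE  256u

/* Map a queue write pointer to a scratch-buffer slot. Indexing with the
 * raw write pointer (the bug) can run past slot 31; masking keeps the
 * index in range because CMD_QUEUE_SLOTS is a power of two. */
static unsigned int cmd_scratch_slot(unsigned int write_ptr)
{
    assert(write_ptr < TFD_QUEUE_SIZE);
    return write_ptr & (CMD_QUEUE_SLOTS - 1);
}
```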
      Signed-off-by: Sara Sharon <sara.sharon@intel.com>
      Fixes: 38c0f334 ("iwlwifi: use coherent DMA memory for command header")
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipvs: count pre-established TCP states as active · d53b6609
      Michal Kubecek authored
      commit be2cef49 upstream.
      
      Some users observed that the "least connection" distribution algorithm
      doesn't handle bursts of TCP connections from reconnecting clients well
      after a node or network failure.
      
      This is because the algorithm counts an active connection as worth 256
      inactive ones, whereas for TCP, "active" only means TCP connections in
      ESTABLISHED state. In case of a connection burst, new connections are
      handled before previous ones have finished the three-way handshake, so
      all are still counted as "inactive", i.e. cheap ones. They become
      "active" quickly, but by that time all of them are already assigned to
      one real server (or a few), resulting in a highly unbalanced distribution.
      
      Address this by counting the "pre-established" states as "active".
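      A toy model of the "least connection" choice, assuming the weighting stated above (one active connection counts as 256 inactive ones). `struct real_server`, `lc_overhead()` and `lc_pick()` are hypothetical names, not the IPVS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-server counters. */
struct real_server {
    unsigned int active;
    unsigned int inactive;
};

/* Load score: one active connection weighs as much as 256 inactive. */
static unsigned int lc_overhead(const struct real_server *rs)
{
    return (rs->active << 8) + rs->inactive;
}

/* Pick the server with the lowest score ("least connection"). */
static size_t lc_pick(const struct real_server *servers, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (lc_overhead(&servers[i]) < lc_overhead(&servers[best]))
            best = i;
    return best;
}
```

      For example, take one server with 1 active connection and another with 100 half-open connections. Under the old accounting they score 256 and 100, so new connections keep going to the second server. Counting the half-open connections as active scores the second server 25600, so the first one is picked instead.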
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: disable fragment reassembly if high_thresh is set to zero · 5a0b77dc
      Michal Kubecek authored
      commit 30759219 upstream.
      
      Before commit 6d7b857d ("net: use lib/percpu_counter API for
      fragmentation mem accounting"), setting the high threshold to 0 prevented
      fragment reassembly, as the first fragment would always be evicted before
      the second could be added to the queue. While inefficient, some users
      apparently relied on it.
      
      Since the commit mentioned above, a percpu counter is used for
      reassembly memory accounting, and the high batch size avoids taking the
      slow path in most common scenarios. As a result, a whole full-sized packet
      can be reassembled without the percpu counter's main counter changing its
      value, so that even with high_thresh set to 0, fragmented packets can
      still be reassembled and processed.
      
      Add explicit checks preventing reassembly if high threshold is zero.
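      A simplified model of why a zero high_thresh stopped working. The batch value, the CPU count and all names are hypothetical, and the real percpu_counter and inet_frag code differ in detail. A 64 KiB packet's worth of fragments stays in the per-CPU delta, so the main counter still reads 0 and `total > high_thresh` never triggers. The explicit `high_thresh == 0` test restores the old behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy batched per-CPU counter: each CPU accumulates a local delta and
 * folds it into the shared total only once it reaches DEMO_BATCH.
 * DEMO_BATCH and DEMO_NR_CPUS are hypothetical values. */
#define DEMO_NR_CPUS 4
#define DEMO_BATCH   130000L

struct batched_counter {
    long total;               /* main counter read on the fast path */
    long local[DEMO_NR_CPUS]; /* per-CPU deltas not yet folded in */
};

static void counter_add(struct batched_counter *c, int cpu, long delta)
{
    c->local[cpu] += delta;
    if (c->local[cpu] >= DEMO_BATCH || c->local[cpu] <= -DEMO_BATCH) {
        c->total += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Fast-path threshold check that looks only at the main counter. */
static bool over_limit(const struct batched_counter *c, long high_thresh)
{
    return c->total > high_thresh;
}

/* With the fix: high_thresh == 0 means "no reassembly" explicitly. */
static bool may_queue_fragment(const struct batched_counter *c,
                               long high_thresh)
{
    if (high_thresh == 0)
        return false;
    return !over_limit(c, high_thresh);
}
```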
      
      [mk] backport to 3.12
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • mISDN: Fixing missing validation in base_sock_bind() · eec89a77
      Emrah Demir authored
      commit b8216468 upstream.
      
      Add the missing validation to base_sock_bind() in mISDN/socket.c.
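      The commit message does not say which check was added, so this is only a generic sketch of validating a bind() address before reading it. `struct demo_sockaddr` and `demo_bind()` are hypothetical and do not reflect the real `struct sockaddr_mISDN` layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical address layout, for illustration only. */
struct demo_sockaddr {
    unsigned short family;
    unsigned char  dev;
    unsigned char  channel;
};

/* Reject an address that is missing or too short to hold the fields
 * about to be read, before touching any of them. */
static int demo_bind(const void *addr, size_t addr_len,
                     unsigned char *dev_out)
{
    const struct demo_sockaddr *sa = addr;

    if (addr == NULL || addr_len < sizeof(*sa))
        return -EINVAL;
    *dev_out = sa->dev;
    return 0;
}
```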
      Signed-off-by: Emrah Demir <ed@abdsec.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • mISDN: Support DR6 indication in mISDNipac driver · 54fbda50
      Maciej S. Szmigiero authored
      commit 1e1589ad upstream.
      
      According to figure 39 in the PEB3086 data sheet, version 1.4, this
      indication replaces DR when the layer 1 transition source state is F6.
      
      This fixes mISDN layer 1 getting stuck in F6 state in TE mode on
      Dialogic Diva 2.02 card (and possibly others) when NT deactivates it.
      Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Acked-by: Karsten Keil <keil@b1-systems.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: ratelimit warnings about dst entry refcount underflow or overflow · 49c201c1
      Konstantin Khlebnikov authored
      commit 8bf4ada2 upstream.
      
      The kernel generates a lot of warnings when a dst entry reference counter
      overflows and becomes negative. That bug was seen several times on
      machines with outdated 3.10.y kernels. Most likely it's already fixed
      upstream. Anyway, that flood completely kills the machine and makes
      further debugging impossible.
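      A minimal interval/burst limiter showing the idea behind rate-limited warnings: at most `burst` messages per `interval`, with the rest counted and dropped. The struct and function are hypothetical and only loosely modelled on the kernel's ratelimit helpers. Time is passed in explicitly to keep the sketch deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rate-limit state. */
struct demo_ratelimit {
    long interval; /* window length, in arbitrary time units */
    int  burst;    /* messages allowed per window */
    long begin;    /* start of the current window */
    int  printed;  /* messages emitted in the current window */
    int  missed;   /* messages suppressed in the current window */
};

/* Returns true if the caller may emit its message now. */
static bool ratelimit_ok(struct demo_ratelimit *rs, long now)
{
    if (now - rs->begin >= rs->interval) {
        rs->begin = now; /* open a new window */
        rs->printed = 0;
        rs->missed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}
```

      With an interval of 5 and a burst of 10, a flood of 1000 warnings at time 0 produces 10 messages and 990 suppressed ones. The next window opens at time 5.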
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • bonding: Fix bonding crash · 745db354
      Mahesh Bandewar authored
      commit 24b27fc4 upstream.
      
      The following steps will crash the kernel:
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
                type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does a lot of work before checking whether the device being
      enslaved is busy.
      
      In this case, when the notifier call chain sends notifications,
      bond_netdev_event() assumes that rx_handler/rx_handler_data is
      registered, while bond_enslave() hasn't progressed far enough to
      register the rx_handler for the new slave.
      
      This patch adds an rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
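      A sketch of the "check first" ordering, assuming (as the commit text says) that an already-registered rx_handler means the device is busy. `struct demo_netdev`, `demo_enslave()` and the two stub handlers are hypothetical stand-ins, not the bonding driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device: a non-NULL rx_handler means another upper
 * device (e.g. a macvlan) has already claimed it. */
struct demo_netdev {
    const char *name;
    void (*rx_handler)(void);
    int enslaved;
};

static void demo_macvlan_rx(void) { }
static void demo_bond_rx(void) { }

/* Reject a busy device before any other enslave work is done, so no
 * notifier ever sees a half-configured slave. */
static int demo_enslave(struct demo_netdev *slave, void (*bond_rx)(void))
{
    if (slave->rx_handler != NULL)
        return -EBUSY;
    /* ... the rest of the enslave setup would go here ... */
    slave->rx_handler = bond_rx;
    slave->enslaved = 1;
    return 0;
}
```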
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tcp: take care of truncations done by sk_filter() · 56325d9f
      Eric Dumazet authored
      commit ac6e7800 upstream.
      
      With syzkaller's help, Marco Grassi found a bug in the TCP stack,
      crashing in tcp_collapse().
      
      The root cause is that sk_filter() can truncate the incoming skb,
      but the TCP stack was not really expecting this to happen.
      It probably was expecting a simple DROP or ACCEPT behavior.
      
      We first need to make sure no part of the TCP header can be removed.
      Then we need to adjust TCP_SKB_CB(skb)->end_seq.
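      A simplified sketch of the two steps above, assuming that end_seq equals seq plus the SYN and FIN flags plus the payload length (the usual TCP sequence-space accounting). `struct demo_seg` and `demo_apply_trim()` are hypothetical; the real code works on skbs and TCP_SKB_CB().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a received TCP segment. */
struct demo_seg {
    uint32_t seq;         /* sequence number of the first byte */
    uint32_t end_seq;     /* seq + SYN + FIN + payload length */
    unsigned int len;     /* bytes in the buffer, header included */
    unsigned int hdr_len; /* TCP header length */
    bool syn, fin;
};

/* After a filter trims the buffer to new_len bytes: drop the segment if
 * any part of the TCP header would be lost, otherwise recompute end_seq
 * from the payload that is actually left. */
static bool demo_apply_trim(struct demo_seg *s, unsigned int new_len)
{
    if (new_len < s->hdr_len)
        return false;
    if (new_len < s->len)
        s->len = new_len;
    s->end_seq = s->seq + s->syn + s->fin + (s->len - s->hdr_len);
    return true;
}
```

      A 120-byte segment with a 20-byte header starting at sequence 1000 has end_seq 1100. If a filter trims it to 70 bytes, end_seq becomes 1050. A trim to 10 bytes, which cuts into the header, causes a drop.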
      
      Many thanks to syzkaller team and Marco for giving us a reproducer.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Marco Grassi <marco.gra@gmail.com>
      Reported-by: Vladis Dronov <vdronov@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tcp: fix wrong checksum calculation on MTU probing · d318f82f
      Douglas Caetano dos Santos authored
      commit 2fe664f1 upstream.
      
      With TCP MTU probing enabled and offload TX checksumming disabled,
      tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
      into the probe's SKB had an odd length. This was caused by the direct use
      of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
      fragment being copied, if needed. When this fragment was not the last, a
      subsequent call used the previous checksum without considering this
      padding.
      
      The effect was a connection stalled in one direction, as even
      retransmissions wouldn't solve the problem, because the checksum was never
      recalculated for the full SKB length.
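      A user-space model of the arithmetic involved, not the kernel's skb_copy_and_csum_bits()/csum_block_add() code. It computes a 16-bit ones' complement sum with bytes paired big-endian. When a block starts at an odd offset, its partial sum must be byte-swapped before it is added. That swap is what gets lost when the next fragment simply continues from the previous checksum. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate a ones' complement sum, pairing bytes big-endian; an odd
 * trailing byte is implicitly padded with a zero byte. */
static uint32_t csum_partial_be(const uint8_t *p, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8; /* the padding */
    return sum;
}

/* Fold carries back into the low 16 bits (end-around carry). */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Add the sum of a block that starts at byte `offset` of the packet.
 * At an odd offset the block's bytes sit in the opposite halves of each
 * 16-bit word, so its folded sum is byte-swapped first. */
static uint32_t csum_block_add_be(uint32_t sum, uint32_t block,
                                  size_t offset)
{
    uint16_t b = csum_fold(block);

    if (offset & 1)
        b = (uint16_t)(b << 8 | b >> 8);
    return sum + b;
}
```

      Take the bytes 12 34 56 | 78 9a bc, split after the third byte. Combining the two partial sums with the odd-offset swap gives 0x0369, the same result as summing the whole buffer. Continuing the second sum directly from the first gives 0x9ccf.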
      Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tcp: fix overflow in __tcp_retransmit_skb() · 93522d31
      Eric Dumazet authored
      commit ffb4d6c8 upstream.
      
      If a TCP socket gets a large write queue, an overflow can happen
      in a test in __tcp_retransmit_skb() preventing all retransmits.
      
      The flow then stalls and resets after timeouts.
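      The commit message does not show the faulty test, so this is a hedged sketch that assumes it compares allocated memory against `min(queued + queued / 4, sndbuf)`. Computed in a 32-bit int, that sum overflows once the write queue passes roughly 1.7 GB, and signed overflow is undefined behaviour in C. Widening to 64 bits keeps the limit correct. `may_retransmit()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical admission test: allow a retransmit only while allocated
 * memory stays within min(queued + queued / 4, sndbuf). The sum is done
 * in 64 bits so a huge write queue cannot wrap it negative. */
static bool may_retransmit(int32_t wmem_alloc, int32_t wmem_queued,
                           int32_t sndbuf)
{
    int64_t limit = (int64_t)wmem_queued + (wmem_queued >> 2);

    if (limit > sndbuf)
        limit = sndbuf;
    return wmem_alloc <= limit;
}
```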
      
      Tested:
      
      sysctl -w net.core.wmem_max=1000000000
      netperf -H dest -- -s 1000000000
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tcp: properly scale window in tcp_v[46]_reqsk_send_ack() · 1c50d3ae
      Eric Dumazet authored
      commit 20a2b49f upstream.
      
      When sending an ACK in SYN_RECV state, we must scale the offered
      window if the wscale option was negotiated and accepted.
      
      Tested:
       The following packetdrill test demonstrates the issue:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      // Establish a connection.
      +0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0>
      +0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7>
      
      +0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100>
      // check that window is properly scaled !
      +0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100>
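      The window value that the test checks can be reproduced by hand. With window scaling negotiated, the 16-bit window field carries the offered window shifted right by the scale factor this side advertised (RFC 7323). A minimal sketch, with `tcp_window_field()` as a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Value placed in the 16-bit TCP window field for a given offered
 * window and the window scale this side advertised. */
static uint16_t tcp_window_field(uint32_t window, unsigned int wscale)
{
    assert(wscale <= 14); /* RFC 7323 caps the shift at 14 */
    uint32_t field = window >> wscale;
    return field > 0xffff ? 0xffff : (uint16_t)field;
}
```

      Here 28960 >> 7 = 226, which matches the `win 226` expected in the last packetdrill line. Sending the unscaled 28960 would make the peer read a window 128 times too large.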
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tcp: fix use after free in tcp_xmit_retransmit_queue() · 13403121
      Eric Dumazet authored
      commit bb1fceca upstream.
      
      When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
      tail of the write queue using tcp_add_write_queue_tail().
      
      Then it attempts to copy user data into this fresh skb.
      
      If the copy fails, we undo the work and remove the fresh skb.
      
      Unfortunately, this undo does not revert the change made to
      tp->highest_sack, and we can leave a dangling pointer (to a freed skb).
      
      Later, tcp_xmit_retransmit_queue() can dereference this pointer and
      access freed memory. For regular kernels where memory is not unmapped,
      this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
      returning garbage instead of tp->snd_nxt, but with various debug
      features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
      
      This bug was found by Marco Grassi thanks to syzkaller.
      
      Fixes: 6859d494 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
      Reported-by: Marco Grassi <marco.gra@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net/irda: handle iriap_register_lsap() allocation failure · 942878ee
      Vegard Nossum authored
      commit 5ba092ef upstream.
      
      If iriap_register_lsap() fails to allocate memory, self->lsap is
      set to NULL. However, none of the callers handle the failure and
      irlmp_connect_request() will happily dereference it:
      
          iriap_register_lsap: Unable to allocated LSAP!
          ================================================================================
          UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
          member access within null pointer of type 'struct lsap_cb'
          CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
           0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
           ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
           ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
          Call Trace:
           [<ffffffff82344f40>] dump_stack+0xac/0xfc
           [<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a
           [<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411
           [<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970
           [<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160
           [<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0
           [<ffffffff83b78904>] iriap_do_client_event+0x94/0x120
           [<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0
           [<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630
           [<ffffffff83ba90c8>] irda_connect+0x828/0x12d0
           [<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340
           [<ffffffff833c7e09>] SyS_connect+0x9/0x10
           [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
           [<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25
          ================================================================================
      
      The bug seems to have been around since forever.
      
      There are more problems with missing error checks in iriap_init() (and
      indeed all of irda_init()), but that's a bigger problem that needs
      very careful review and testing. This patch will fix the most serious
      bug (as it's easily reached from unprivileged userspace).
      
      I have tested my patch with a reproducer.
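      A sketch of the error-handling pattern the fix is about: an allocation helper that may leave the LSAP pointer NULL, and a caller that checks it instead of letting a later function dereference it. All names are hypothetical stand-ins for the IrIAP structures, and the failure is injected through a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the IrIAP instance and its LSAP. */
struct demo_lsap { int slsap_sel; };
struct demo_iriap { struct demo_lsap *lsap; };

/* Allocation that can fail, leaving self->lsap NULL. */
static void demo_register_lsap(struct demo_iriap *self, int fail)
{
    self->lsap = fail ? NULL : calloc(1, sizeof(*self->lsap));
}

/* The caller checks for the failure right away and reports it, rather
 * than carrying on with a NULL pointer. */
static int demo_open(struct demo_iriap *self, int fail)
{
    demo_register_lsap(self, fail);
    if (self->lsap == NULL)
        return -ENOMEM;
    return 0;
}
```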
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ip6_tunnel: disable caching when the traffic class is inherited · 68836e49
      Paolo Abeni authored
      commit b5c2d495 upstream.
      
      If an ip6 tunnel is configured to inherit the traffic class from
      the inner header, the dst_cache must be disabled or it will foul
      the policy routing.
      
      The issue has apparently been present since at least Linux 2.6.12-rc2.
      Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
      Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() · b030cd1a
      Eli Cooper authored
      commit 23f4ffed upstream.
      
      skb->cb may contain data from previous layers. In the observed scenario,
      the garbage data was misinterpreted as IP6CB(skb)->frag_max_size, so
      small packets sent through the tunnel were mistakenly fragmented.
      
      This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
      which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
      these tunnels sets IP6CB(skb)->flags; otherwise the clearing would need
      to be done earlier.
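      A sketch of why stale control-buffer contents matter: each layer overlays its own struct on the same scratch bytes, so anything a previous layer left there is read back as frag_max_size unless the tunnel clears it. `struct demo_pkt`, `struct demo_ip6cb` and the helpers are hypothetical simplifications of skb->cb and IP6CB().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet with a per-layer scratch area, like skb->cb. */
struct demo_pkt {
    unsigned char cb[48];
};

/* The IPv6 layer's view of cb (a stand-in for IP6CB). */
struct demo_ip6cb {
    uint16_t frag_max_size;
    uint16_t flags;
};

/* Read frag_max_size out of cb; memcpy avoids alignment and aliasing
 * problems in portable C. */
static uint16_t demo_frag_max_size(const struct demo_pkt *p)
{
    struct demo_ip6cb cb;

    memcpy(&cb, p->cb, sizeof(cb));
    return cb.frag_max_size;
}

/* Tunnel transmit: clear whatever an earlier layer left in cb before
 * the packet reaches the IPv6 output path. */
static void demo_tunnel_xmit(struct demo_pkt *p)
{
    memset(p->cb, 0, sizeof(p->cb));
    /* ... build the outer header, route and send ... */
}
```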
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eli Cooper <elicooper@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped · c3a924e1
      Eric Dumazet authored
      commit 990ff4d8 upstream.
      
      While fuzzing kernel with syzkaller, Andrey reported a nasty crash
      in inet6_bind() caused by DCCP lacking a required method.
      
      Fixes: ab1e0a13 ("[SOCK] proto: Add hashinfo member to struct proto")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: dccp: fix out of bound access in dccp_v6_err() · bd380617
      Eric Dumazet authored
      commit 1aa9d1a0 upstream.
      
      dccp_v6_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmpv6_notify() are more than enough.
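      A hedged sketch of reading only what the ICMPv6 error actually quotes. The first 4 bytes of a DCCP header hold the source and destination ports, which is all that is needed to look up the socket, and a length check guards the read. `demo_quoted_ports()` is a hypothetical helper, not the kernel's pskb_may_pull()-based code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the ports from a quoted DCCP (or TCP) header, reading nothing
 * beyond the `avail` bytes that were actually quoted. */
static int demo_quoted_ports(const uint8_t *hdr, size_t avail,
                             uint16_t *sport, uint16_t *dport)
{
    if (avail < 4)
        return -1; /* too little quoted: ignore the error */
    *sport = (uint16_t)(hdr[0] << 8 | hdr[1]);
    *dport = (uint16_t)(hdr[2] << 8 | hdr[3]);
    return 0;
}
```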
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: correctly add local routes when lo goes up · 14ba02f9
      Nicolas Dichtel authored
      commit a220445f upstream.
      
      The goal of the patch is to fix this scenario:
       ip link add dummy1 type dummy
       ip link set dummy1 up
       ip link set lo down ; ip link set lo up
      
      After that sequence, the local route to the link layer address of dummy1 is
      not there anymore.
      
      When the loopback is set down, all local routes are deleted by
      addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
      exists, because the corresponding idev has a reference on it. After the rcu
      grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
      set obsolete to DST_OBSOLETE_DEAD.
      
      In this case, init_loopback() is called before dst_rcu_free(), so
      obsolete is still set to something <= 0 and the function doesn't add the
      route again. To avoid that race, let's check the rt6 refcnt instead.
      
      Fixes: 25fb6ca4 ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
      Fixes: a881ae1f ("ipv6: don't call addrconf_dst_alloc again when enable lo")
      Fixes: 33d99113 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
      Reported-by: Francesco Santoro <francesco.santoro@6wind.com>
      Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com>
      CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
      CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Weilong Chen <chenweilong@huawei.com>
      CC: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() · 1735aaba
      Lance Richardson authored
      commit db32e4e4 upstream.
      
      Similar to commit 3be07244 ("ip6_gre: fix flowi6_proto value in
      xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup.
      
      Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value.
      This affected output route lookup for packets sent on an ip6gretap device
      in cases where routing was dependent on the value of flowi6_proto.
      
      Since the correct proto is already set in the tunnel flowi6 template via
      commit 252f3f5a ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit
      path."), simply delete the line setting the incorrect flowi6_proto value.
      Suggested-by: Jiri Benc <jbenc@redhat.com>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: Lance Richardson <lrichard@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: fix rtnl locking in setsockopt for anycast and multicast · b64222a9
      Sabrina Dubroca authored
      commit a9ed4a29 upstream.
      
      Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
      triggers the assertion in addrconf_join_solict()/addrconf_leave_solict().
      
      ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
      take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
      ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
      calling ipv6_dev_mc_inc/dec.
      
      This patch moves ASSERT_RTNL() up a level in the call stack.
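      A single-threaded model of moving the locking rule up one level: the socket-level entry point takes the lock, and the device-level helper only asserts that it is held. The lock is a plain flag standing in for RTNL and ASSERT_RTNL(), and every name is hypothetical.

```c
#include <assert.h>

/* Flag standing in for "RTNL is held" (single-threaded model only). */
static int demo_rtnl_held;

static void demo_rtnl_lock(void)   { assert(!demo_rtnl_held); demo_rtnl_held = 1; }
static void demo_rtnl_unlock(void) { assert(demo_rtnl_held);  demo_rtnl_held = 0; }
#define DEMO_ASSERT_RTNL() assert(demo_rtnl_held)

static int demo_ac_users;

/* Device-level helper: states its locking rule as an assertion. */
static void demo_dev_ac_inc(void)
{
    DEMO_ASSERT_RTNL();
    demo_ac_users++;
}

/* Socket-level entry point (the setsockopt path): takes the lock
 * itself before calling down. */
static void demo_sock_ac_join(void)
{
    demo_rtnl_lock();
    demo_dev_ac_inc();
    demo_rtnl_unlock();
}
```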
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reported-by: Tommi Rantala <tt.rantala@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org> # 3.10.y: b7b1bfce: ipv6: split dad and rs timers
      Cc: <stable@vger.kernel.org> # 3.10.y: c15b1cca: ipv6: move dad to workqueue
      Cc: <stable@vger.kernel.org> # 3.10.y
      [Mike Manning <mmanning@brocade.com>: resolved minor conflicts in addrconf.c]
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: addrconf: fix dev refcont leak when DAD failed · 04355d67
      Wei Yongjun authored
      commit 751eb6b6 upstream.
      
      In general, when DAD detects a duplicate IPv6 address, ifp->state
      will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
      delayed work; the call tree should look like this:
      
      ndisc_recv_ns
        -> addrconf_dad_failure        <- missing ifp put
           -> addrconf_mod_dad_work
             -> schedule addrconf_dad_work()
               -> addrconf_dad_stop()  <- missing ifp hold before call it
      
      addrconf_dad_failure() is called with an ifp reference held but never
      puts it. addrconf_dad_work() calls addrconf_dad_stop() without holding
      an extra reference. This will not cause any issue normally.
      
      But the race between addrconf_dad_failure() and addrconf_dad_work()
      may cause an ifp refcount leak, so the netdevice cannot be unregistered;
      dmesg shows the following messages:
      
      IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
      ...
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Cc: stable@vger.kernel.org
      Fixes: c15b1cca ("ipv6: move DAD and addrconf_verify processing to workqueue")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org> # 3.10.y
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: move DAD and addrconf_verify processing to workqueue · 835b474b
      Hannes Frederic Sowa authored
      commit c15b1cca upstream.
      
      addrconf_join_solict and addrconf_join_anycast may cause actions which
      need rtnl locked, especially on first address creation.
      
      A new DAD state is introduced which defers the initial DAD processing
      to a workqueue.
      
      To get the rtnl lock, we need to push the code paths which depend on
      those calls up to workqueues, specifically addrconf_verify and the DAD
      processing.
      
      (v2)
      addrconf_dad_failure needs to be queued up to the workqueue, too. This
      patch introduces a new DAD state and stops the DAD processing in the
      workqueue (this is because of the possible ipv6_del_addr processing
      which removes the solicited multicast address from the device).
      
      addrconf_verify_lock is removed, too. After the transition it is not
      needed any more.
      
      As we are not processing in bottom half anymore, we need to be a bit more
      careful about disabling bottom halves when we take spinlocks which are
      also used in bh context.
      
      Relevant backtrace:
      [  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
      [  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
      [  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
      [  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
      [  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
      [  541.031152] Call Trace:
      [  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
      [  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
      [  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
      [  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
      [  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
      [  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
      [  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
      [  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
      [  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
      [  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
      [  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
      [  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
      [  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
      [  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
      [  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
      [  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
      [  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
      [  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
      [  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
      
      Hunks and backtrace stolen from a patch by Stephen Hemminger.
      Reported-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org> # 3.10.y: b7b1bfce: ipv6: split dad and rs timers
      Cc: <stable@vger.kernel.org> # 3.10.y
      [Mike Manning <mmanning@brocade.com>: resolved minor conflicts in addrconf.c]
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: split duplicate address detection and router solicitation timer · 973d5956
      Hannes Frederic Sowa authored
      commit b7b1bfce upstream.
      
      This patch splits the timers for duplicate address detection and router
      solicitations apart. The router solicitations timer goes into inet6_dev
      and the dad timer stays in inet6_ifaddr.
      
      The reason behind this patch is to reduce the number of unneeded router
      solicitations sent out by the host if additional link-local addresses
      are created. Currently we send out an RS for every link-local address on
      an interface.
      
      If the RS timer fires, we pick a source address with ipv6_get_lladdr. This
      change could hurt people adding additional link-local addresses and
      specifying these addresses in the radvd clients section, because we
      no longer guarantee that we use every ll address as the source address in
      router solicitations.
      
      Cc: Flavio Leitner <fleitner@redhat.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: David Stevens <dlstevens@us.ibm.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reviewed-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org> # 3.10.y
      [Mike Manning <mmanning@brocade.com>: resolved conflicts with 36bddb]
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: don't call fib6_run_gc() until routing is ready · af80b973
      Michal Kubeček authored
      commit 2c861cc6 upstream.
      
      When loading the ipv6 module, ndisc_init() is called before
      ip6_route_init(). As the former registers a handler calling
      fib6_run_gc(), this opens a window to run the garbage collector
      before necessary data structures are initialized. If a network
      device is initialized in this window, adding a MAC address to it
      triggers a NETDEV_CHANGEADDR event, leading to a crash in
      fib6_clean_all().
      
      Take the event handler registration out of ndisc_init() into a
      separate function ndisc_late_init() and move it after
      ip6_route_init().
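
The ordering problem can be modelled in userspace, assuming hypothetical sketch_* functions standing in for ndisc_init(), ip6_route_init() and ndisc_late_init(): the event handler only becomes reachable after the routing state it depends on exists, so an early address-change event finds no handler instead of running GC on uninitialized tables.

```c
#include <stdbool.h>

/* Hypothetical model: a netdev notifier that runs GC, which needs the
 * routing tables to be initialized first. */
static bool route_tables_ready;
static void (*netdev_notifier)(void);
static int gc_runs, gc_crashes;

static void sketch_fib6_run_gc(void)
{
	if (!route_tables_ready) {
		gc_crashes++;   /* stands in for the fib6_clean_all() crash */
		return;
	}
	gc_runs++;
}

/* After the fix, early init no longer registers the handler. */
static void sketch_ndisc_init(void) { }
static void sketch_ip6_route_init(void) { route_tables_ready = true; }
static void sketch_ndisc_late_init(void) { netdev_notifier = sketch_fib6_run_gc; }

/* A NETDEV_CHANGEADDR-style event that may arrive at any point. */
static void sketch_change_addr_event(void)
{
	if (netdev_notifier)
		netdev_notifier();
}

void sketch_ipv6_init(void)
{
	sketch_ndisc_init();
	sketch_change_addr_event();   /* early event: no handler yet, harmless */
	sketch_ip6_route_init();
	sketch_ndisc_late_init();     /* handler registered after routing init */
	sketch_change_addr_event();   /* now GC runs on initialized tables */
}
```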
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org> # 3.10.y
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      af80b973
    • Joe Perches
      stddef.h: move offsetofend inside #ifndef/#endif guard, neaten · 349759be
      Joe Perches authored
      commit 8c7fbe57 upstream.
      
      Commit 38764884 ("include/stddef.h: Move offsetofend() from vfio.h
      to a generic kernel header") added offsetofend outside the normal
      include #ifndef/#endif guard.  Move it inside.
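
A minimal userspace sketch of the intended layout, assuming a made-up guard name (SKETCH_STDDEF_H) in place of the real header's: offsetofend() is defined between the #ifndef and the #endif, like the header's other macros. The macro body uses the temporary-free form from the vfio rework that also appears in this list; the kernel header's exact text may differ.

```c
#include <stddef.h>   /* offsetof */

#ifndef SKETCH_STDDEF_H   /* hypothetical guard name */
#define SKETCH_STDDEF_H

/*
 * offsetofend(TYPE, MEMBER): offset of the first byte after MEMBER.
 * sizeof does not evaluate its operand, so naming MEMBER through a null
 * TYPE pointer needs no temporary object.
 */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

#endif /* SKETCH_STDDEF_H */

/* Example structure for exercising the macro. */
struct sketch_hdr {
	unsigned char  type;
	unsigned short len;
	unsigned int   flags;
};
```

A second inclusion of such a header is skipped as a whole, so every definition in it, offsetofend() included, is processed once per translation unit.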
      
      Miscellanea:
      
      o remove unnecessary blank line
      o standardize offsetof macros whitespace style
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [wt: backported only for ipv6 out-of-bounds fix]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      349759be
    • Denys Vlasenko
      include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header · 1ddb7944
      Denys Vlasenko authored
      commit 38764884 upstream.
      
      Suggested by Andy.
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1425912738-559-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [wt: backported only for ipv6 out-of-bounds fix]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      1ddb7944
    • Gavin Shan
      drivers/vfio: Rework offsetofend() · 6cc73a1c
      Gavin Shan authored
      commit b13460b9 upstream.
      
      The macro offsetofend() introduces an unnecessary temporary variable
      "tmp". The patch avoids it and saves a bit of stack memory.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      [wt: backported only for ipv6 out-of-bounds fix]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      6cc73a1c
    • Scot Doyle
      vt: clear selection before resizing · 5812a9bc
      Scot Doyle authored
      commit 009e39ae upstream.
      
      When resizing a vt its selection may exceed the new size, resulting in
      an invalid memory access [1]. Clear the selection before resizing.
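
A hypothetical model of the hazard (struct and helper names are invented for illustration, not the vt code): selection offsets are indices into the old cols*rows buffer, so they have to be dropped before the geometry shrinks.

```c
#include <stdbool.h>

/* Hypothetical console: the selection is a [start, end] range of cell
 * indices into a cols*rows screen buffer. */
struct sketch_vc {
	unsigned int cols, rows;
	bool sel_active;
	unsigned int sel_start, sel_end;
};

static void sketch_clear_selection(struct sketch_vc *vc)
{
	vc->sel_active = false;
	vc->sel_start = vc->sel_end = 0;
}

/* True if an active selection indexes past the end of the buffer. */
bool sketch_selection_out_of_bounds(const struct sketch_vc *vc)
{
	return vc->sel_active && vc->sel_end >= vc->cols * vc->rows;
}

void sketch_resize(struct sketch_vc *vc, unsigned int cols, unsigned int rows)
{
	/* Clear first: the selection indices refer to the old geometry. */
	sketch_clear_selection(vc);
	vc->cols = cols;
	vc->rows = rows;
}
```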
      
      [1] http://lkml.kernel.org/r/CACT4Y+acDTwy4umEvf5ROBGiRJNrxHN4Cn5szCXE5Jw-d1B=Xw@mail.gmail.com
      Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      5812a9bc
    • Jiri Slaby
      tty: vt, fix bogus division in csi_J · 902fd8d5
      Jiri Slaby authored
      commit 42acfc66 upstream.
      
      In csi_J(3), the third parameter of scr_memsetw (vc_screenbuf_size) is
      inappropriately divided by 2. But scr_memsetw expects a size in bytes,
      not a count of words, because it divides the size by 2 on its own
      before doing the actual memset-by-words.
      
      So remove the bogus division.
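
The size-versus-count mismatch is easy to see with a userspace helper written on the assumption that it behaves as described above (byte size in, halved internally). The name sketch_scr_memsetw is hypothetical; it is not the kernel function itself.

```c
#include <stdint.h>

/* Modeled on a generic scr_memsetw(): the third argument is a size in
 * BYTES, which the helper converts to a count of 16-bit cells itself. */
void sketch_scr_memsetw(uint16_t *s, uint16_t c, unsigned int count)
{
	count /= 2;            /* bytes -> 16-bit words */
	while (count--)
		*s++ = c;
}
```

Passing sizeof(buf) fills all eight cells of a uint16_t buf[8]; passing sizeof(buf) / 2, the equivalent of the bogus division, fills only the first four.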
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Petr Písař <ppisar@redhat.com>
      Fixes: f8df13e0 (tty: Clean console safely)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      902fd8d5
    • Dmitry Vyukov
      tty: limit terminal size to 4M chars · 35059401
      Dmitry Vyukov authored
      commit 32b2921e upstream.
      
      Size of kmalloc() in vc_do_resize() is controlled by user.
      Too large kmalloc() size triggers WARNING message on console.
      Put a reasonable upper bound on terminal size to prevent WARNINGs.
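
A sketch of such a bound, assuming a cap of 4M character cells as in the title. The constant name and the exact check used in vc_do_resize() are assumptions, not the kernel's code. Computing the product in 64 bits keeps huge user-supplied dimensions from wrapping around before the comparison.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical cap mirroring the "4M chars" bound in the title. */
#define SKETCH_MAX_SCREEN_CHARS (4u << 20)

/* Returns 0 if a cols x rows console may be allocated, -EINVAL otherwise.
 * The product is computed in 64 bits so it cannot overflow. */
int sketch_check_resize(unsigned int cols, unsigned int rows)
{
	uint64_t chars = (uint64_t)cols * rows;

	if (chars > SKETCH_MAX_SCREEN_CHARS)
		return -EINVAL;
	return 0;
}
```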
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      CC: David Rientjes <rientjes@google.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: syzkaller@googlegroups.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      35059401
    • Peter Hurley
      tty: Prevent ldisc drivers from re-using stale tty fields · 67de5f0a
      Peter Hurley authored
      commit dd42bf11 upstream.
      
      Line discipline drivers may mistakenly misuse ldisc-related fields
      when initializing. For example, a failure to initialize tty->receive_room
      in the N_GIGASET_M101 line discipline was recently found and fixed [1].
      Now, the N_X25 line discipline has been discovered accessing the previous
      line discipline's already-freed private data [2].
      
      Harden the ldisc interface against misuse by initializing relevant
      tty fields before instancing the new line discipline.
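
A minimal sketch of the hardening, with a hypothetical trimmed-down tty holding only disc_data and receive_room (the two fields named above) and an invented careless open() that would misuse stale data. Resetting the fields before calling the new ldisc's open() turns a use of freed memory into a harmless NULL.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down tty holding only ldisc-owned fields. */
struct sketch_tty {
	void *disc_data;            /* private data of the current ldisc */
	unsigned int receive_room;  /* bytes the ldisc can currently accept */
};

/* A careless open() that trusts whatever disc_data it finds; it reports
 * -1 if it would have used a previous ldisc's (possibly freed) data. */
int sketch_careless_open(struct sketch_tty *tty)
{
	return tty->disc_data == NULL ? 0 : -1;
}

/* Switch ldisc: reset fields the old ldisc may have left behind, then
 * let the new ldisc initialize them in its open(). */
int sketch_set_ldisc(struct sketch_tty *tty,
		     int (*new_open)(struct sketch_tty *))
{
	tty->disc_data = NULL;
	tty->receive_room = 0;
	return new_open(tty);
}
```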
      
      [1]
          commit fd98e941
          Author: Tilman Schmidt <tilman@imap.cc>
          Date:   Tue Jul 14 00:37:13 2015 +0200
      
          isdn/gigaset: reset tty->receive_room when attaching ser_gigaset
      
      [2] Report from Sasha Levin <sasha.levin@oracle.com>
          [  634.336761] ==================================================================
          [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
          [  634.339558] Read of size 4 by task syzkaller_execu/8981
          [  634.340359] =============================================================================
          [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
          ...
          [  634.405018] Call Trace:
          [  634.405277] dump_stack (lib/dump_stack.c:52)
          [  634.405775] print_trailer (mm/slub.c:655)
          [  634.406361] object_err (mm/slub.c:662)
          [  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
          [  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
          [  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
          [  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
          [  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
          [  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
          [  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
          [  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
          [  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)
      
      Cc: Tilman Schmidt <tilman@imap.cc>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [wt: adjust context]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      67de5f0a
    • Peter Zijlstra
      perf: Tighten (and fix) the grouping condition · ac74acf2
      Peter Zijlstra authored
      commit c3c87e77 upstream.
      
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage: when we create an event for
      CPU Y and then migrate it to form a group on CPU X, the code gets
      confused about where the counter is programmed. I also triggered this
      in practice via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means the same task and/or the same CPU.
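
The tightened rule can be sketched as a predicate, assuming a simplified event that carries only a CPU number and a context pointer (the struct and function are hypothetical, not perf's data structures): an event may join a group only when both match the leader.

```c
#include <stdbool.h>

/* Hypothetical event descriptor: ctx identifies the task or per-CPU
 * context the event would be installed in. */
struct sketch_event {
	int cpu;
	const void *ctx;
};

/* Allow grouping only if the event can be co-scheduled with the leader:
 * same CPU binding and same context (same task, or both per-CPU). */
bool sketch_can_group(const struct sketch_event *leader,
		      const struct sketch_event *event)
{
	if (leader->cpu != event->cpu)
		return false;
	if (leader->ctx != event->ctx)
		return false;
	return true;
}
```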
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      ac74acf2
    • Arnaldo Carvalho de Melo
      perf symbols: Fixup symbol sizes before picking best ones · d636f64a
      Arnaldo Carvalho de Melo authored
      commit 432746f8 upstream.
      
      When we call symbol__fixup_duplicate() we use algorithms to pick the
      "best" symbols for cases where there are various functions/aliases to an
      address, and those check zero size symbols, which, before calling
      symbol__fixup_end() are _all_ symbols in a just parsed kallsyms file.
      
      So first fixup the end, then fixup the duplicates.
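
Why the order matters can be shown with a simplified model, loosely based on perf's end fixup and one of its duplicate-choosing heuristics (prefer a symbol with non-zero size). The names and the tie-break rule are assumptions; perf applies several more heuristics.

```c
#include <stddef.h>

/* Hypothetical symbol as parsed from kallsyms: only the start address is
 * known, so end == start (zero size) until it is fixed up. */
struct sketch_sym {
	unsigned long start, end;
	const char *name;
};

/* A zero-sized symbol ends where the next higher-addressed symbol starts.
 * Assumes syms is sorted by start address. */
void sketch_fixup_end(struct sketch_sym *syms, size_t n, unsigned long last_end)
{
	for (size_t i = 0; i + 1 < n; i++) {
		if (syms[i].end == syms[i].start && syms[i + 1].start != syms[i].start)
			syms[i].end = syms[i + 1].start;
	}
	if (n > 0 && syms[n - 1].end == syms[n - 1].start)
		syms[n - 1].end = last_end;
}

/* One heuristic: prefer a symbol with non-zero size; on a tie keep a. */
const struct sketch_sym *sketch_choose_best(const struct sketch_sym *a,
					    const struct sketch_sym *b)
{
	unsigned long sa = a->end - a->start, sb = b->end - b->start;

	if (sa == 0 && sb > 0)
		return b;
	return a;
}
```

Before sketch_fixup_end() two aliases at the same address both have size zero, so the size heuristic cannot tell them apart; afterwards the later alias spans up to the next symbol and is picked.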
      
      Found while trying to figure out why 'perf test vmlinux' failed, see the
      output of 'perf test -v vmlinux' to see cases where the symbols picked
      as best for vmlinux don't match the ones picked for kallsyms.
      
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 694bf407 ("perf symbols: Add some heuristics for choosing the best duplicate symbol")
      Link: http://lkml.kernel.org/n/tip-rxqvdgr0mqjdxee0kf8i2ufn@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      d636f64a
    • Karl Beldan
      mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl · 44e3b0c7
      Karl Beldan authored
      commit f6d7c1b5 upstream.
      
      This fixes subpage writes when using 4-bit HW ECC.
      
      There have been numerous reports about ECC errors with devices using
      this driver for a while.  Also, the 4-bit ECC has been reported as
      broken with subpages in [1], and with 16-bit NANDs in the driver and in
      mach* board files, both in mainline and in the vendor BSPs.
      
      What I saw with 4-bit ECC on a 16-bit NAND (on an LCDK), which got me
      to try reinitializing the ECC engine:
      - R/W on whole pages properly generates/checks the RS code
      - when writing only the 1st subpage of a blank page, the subpage is
        written correctly and the RS code is generated properly; re-reading
        the same page, the HW detects an ECC error; reading the same page
        again, no ECC error is detected
      
      Note that the ECC engine is already reinitialized in the 1-bit case.
      
      Tested on my LCDK with UBI+UBIFS using subpages.
      This could potentially get rid of the issue worked around in [1].
      
      [1] 28c015a9 ("mtd: davinci-nand: disable subpage write for keystone-nand")
      
      Fixes: 6a4123e5 ("mtd: nand: davinci_nand, 4-bit ECC for smallpage")
      Signed-off-by: Karl Beldan <kbeldan@baylibre.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      44e3b0c7
    • Dan Carpenter
      mtd: pmcmsp-flash: Allocating too much in init_msp_flash() · 1b47a57f
      Dan Carpenter authored
      commit 79ad07d4 upstream.
      
      There is a cut and paste issue here.  The bug is that we are allocating
      more memory than necessary for msp_maps.  We should be allocating enough
      space for a map_info struct (144 bytes) but we instead allocate enough
      for an mtd_info struct (1840 bytes).  It's a small waste.
      
      The other part of this is not harmful: when we allocated msp_flash,
      we allocated enough space for a map_info pointer instead of an
      mtd_info pointer.  But since pointers are the same size, it works out
      fine.
      
      Anyway, I decided to clean up all three allocations a bit to make them
      a bit more consistent and clear.
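
A common idiom that avoids this class of cut-and-paste bug is to size each allocation from the pointer being assigned (sizeof(*ptr)). A userspace sketch, with made-up struct layouts that only reproduce the 144- and 1840-byte sizes quoted above; the helper is hypothetical and covers two of the allocations, not the driver's actual code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins sized like the structs in the commit message. */
struct sketch_map_info { char payload[144]; };
struct sketch_mtd_info { char payload[1840]; };

/* Each element size is taken from the pointer being assigned, so the
 * allocation cannot silently use the wrong struct type. */
int sketch_alloc_maps(size_t n, struct sketch_map_info **maps,
		      struct sketch_mtd_info ***flash)
{
	*maps = calloc(n, sizeof(**maps));    /* n map_info structs */
	*flash = calloc(n, sizeof(**flash));  /* n mtd_info pointers */
	if (!*maps || !*flash) {
		free(*maps);
		free(*flash);
		*maps = NULL;
		*flash = NULL;
		return -1;
	}
	return 0;
}
```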
      
      Fixes: 68aa0fa8 ('[MTD] PMC MSP71xx flash/rootfs mappings')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      1b47a57f
    • Brian Norris
      mtd: blkdevs: fix potential deadlock + lockdep warnings · bdd7043b
      Brian Norris authored
      commit f3c63795 upstream.
      
      Commit 073db4a5 ("mtd: fix: avoid race condition when accessing
      mtd->usecount") fixed a race condition but due to poor ordering of the
      mutex acquisition, introduced a potential deadlock.
      
      The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
      will delete one or more MTDs, along with any corresponding mtdblock
      devices. This could potentially race with an acquisition of the block
      device as follows.
      
       -> blktrans_open()
          ->  mutex_lock(&dev->lock);
          ->  mutex_lock(&mtd_table_mutex);
      
       -> del_mtd_device()
          ->  mutex_lock(&mtd_table_mutex);
          ->  blktrans_notify_remove() -> del_mtd_blktrans_dev()
             ->  mutex_lock(&dev->lock);
      
      This is a classic (potential) ABBA deadlock, which can be fixed by
      making the A->B ordering consistent everywhere. There was no real
      purpose to the ordering in the original patch, AFAIR, so this shouldn't
      be a problem. This ordering was actually already present in
      del_mtd_blktrans_dev(), for one, where the function tried to ensure that
      its caller already held mtd_table_mutex before it acquired &dev->lock:
      
              if (mutex_trylock(&mtd_table_mutex)) {
                      mutex_unlock(&mtd_table_mutex);
                      BUG();
              }
      
      So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
      we always acquire mtd_table_mutex first.
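
The fix amounts to one global lock order. A toy checker in the spirit of lockdep (hypothetical; real lockdep builds a dependency graph across all tasks rather than inspecting one task's held locks) makes that order explicit by ranking mtd_table_mutex below dev->lock and flagging any acquisition that goes against the ranks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ranks: mtd_table_mutex must be taken before dev->lock. */
enum sketch_rank { RANK_MTD_TABLE = 1, RANK_DEV_LOCK = 2 };

struct sketch_task {
	int held[4];      /* ranks of held locks, in acquisition order */
	size_t nheld;
	bool violation;   /* set if locks were taken out of rank order */
};

/* Taking a lock whose rank is not above the most recently acquired held
 * lock is flagged, since it permits an ABBA cycle with another task. */
void sketch_lock(struct sketch_task *t, int rank)
{
	if (t->nheld > 0 && t->held[t->nheld - 1] >= rank)
		t->violation = true;
	if (t->nheld < 4)
		t->held[t->nheld++] = rank;
}

/* Releases the most recently acquired lock (LIFO order assumed). */
void sketch_unlock(struct sketch_task *t)
{
	if (t->nheld > 0)
		t->nheld--;
}

/* Fixed blktrans_open()-like path: mtd_table_mutex, then dev->lock. */
void sketch_open_path(struct sketch_task *t)
{
	sketch_lock(t, RANK_MTD_TABLE);
	sketch_lock(t, RANK_DEV_LOCK);
	sketch_unlock(t);
	sketch_unlock(t);
}

/* Old ordering, kept only to show what the checker flags. */
void sketch_old_open_path(struct sketch_task *t)
{
	sketch_lock(t, RANK_DEV_LOCK);
	sketch_lock(t, RANK_MTD_TABLE);
	sketch_unlock(t);
	sketch_unlock(t);
}
```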
      
      Snippets of the lockdep output follow:
      
        # modprobe -r m25p80
        [   53.419251]
        [   53.420838] ======================================================
        [   53.427300] [ INFO: possible circular locking dependency detected ]
        [   53.433865] 4.3.0-rc6 #96 Not tainted
        [   53.437686] -------------------------------------------------------
        [   53.444220] modprobe/372 is trying to acquire lock:
        [   53.449320]  (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.457271]
        [   53.457271] but task is already holding lock:
        [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
        [   53.471321]
        [   53.471321] which lock already depends on the new lock.
        [   53.471321]
        [   53.479856]
        [   53.479856] the existing dependency chain (in reverse order) is:
        [   53.487660]
        -> #1 (mtd_table_mutex){+.+.+.}:
        [   53.492331]        [<c043fc5c>] blktrans_open+0x34/0x1a4
        [   53.497879]        [<c01afce0>] __blkdev_get+0xc4/0x3b0
        [   53.503364]        [<c01b0bb8>] blkdev_get+0x108/0x320
        [   53.508743]        [<c01713c0>] do_dentry_open+0x218/0x314
        [   53.514496]        [<c0180454>] path_openat+0x4c0/0xf9c
        [   53.519959]        [<c0182044>] do_filp_open+0x5c/0xc0
        [   53.525336]        [<c0172758>] do_sys_open+0xfc/0x1cc
        [   53.530716]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.536375]
        -> #0 (&new->lock){+.+...}:
        [   53.540587]        [<c063f124>] mutex_lock_nested+0x38/0x3cc
        [   53.546504]        [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.552606]        [<c043f164>] blktrans_notify_remove+0x7c/0x84
        [   53.558891]        [<c04399f0>] del_mtd_device+0x74/0x100
        [   53.564544]        [<c043c670>] del_mtd_partitions+0x80/0xc8
        [   53.570451]        [<c0439aa0>] mtd_device_unregister+0x24/0x48
        [   53.576637]        [<c046ce6c>] spi_drv_remove+0x1c/0x34
        [   53.582207]        [<c03de0f0>] __device_release_driver+0x88/0x114
        [   53.588663]        [<c03de19c>] device_release_driver+0x20/0x2c
        [   53.594843]        [<c03dd9e8>] bus_remove_device+0xd8/0x108
        [   53.600748]        [<c03dacc0>] device_del+0x10c/0x210
        [   53.606127]        [<c03dadd0>] device_unregister+0xc/0x20
        [   53.611849]        [<c046d878>] __unregister+0x10/0x20
        [   53.617211]        [<c03da868>] device_for_each_child+0x50/0x7c
        [   53.623387]        [<c046eae8>] spi_unregister_master+0x58/0x8c
        [   53.629578]        [<c03e12f0>] release_nodes+0x15c/0x1c8
        [   53.635223]        [<c03de0f8>] __device_release_driver+0x90/0x114
        [   53.641689]        [<c03de900>] driver_detach+0xb4/0xb8
        [   53.647147]        [<c03ddc78>] bus_remove_driver+0x4c/0xa0
        [   53.652970]        [<c00cab50>] SyS_delete_module+0x11c/0x1e4
        [   53.658976]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.664621]
        [   53.664621] other info that might help us debug this:
        [   53.664621]
        [   53.672979]  Possible unsafe locking scenario:
        [   53.672979]
        [   53.679169]        CPU0                    CPU1
        [   53.683900]        ----                    ----
        [   53.688633]   lock(mtd_table_mutex);
        [   53.692383]                                lock(&new->lock);
        [   53.698306]                                lock(mtd_table_mutex);
        [   53.704658]   lock(&new->lock);
        [   53.707946]
        [   53.707946]  *** DEADLOCK ***
      
      Fixes: 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount")
      Reported-by: Felipe Balbi <balbi@ti.com>
      Tested-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      bdd7043b
    • Mark Bloch
      IB/cm: Mark stale CM id's whenever the mad agent was unregistered · aa19a889
      Mark Bloch authored
      commit 9db0ff53 upstream.
      
      When a CM id object has a port assigned to it, the cm-id has asked to
      go through that specific port, but if the port was removed (hot-unplug
      event) the cm-id was not updated. To fix that, the port keeps a list
      of all the cm-ids that plan to go through it, and whenever the port
      is removed it marks all of them
      as invalid.
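
A minimal sketch of the bookkeeping described above, using invented structures rather than ib_cm's real ones: binding a cm-id adds it to its port's list, removing the port marks every listed id stale, and the send path refuses stale ids instead of touching the removed port's MAD agent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cm-id, linked into the list of the port it goes through. */
struct sketch_cm_id {
	bool stale;
	struct sketch_cm_id *next;
};

struct sketch_port {
	struct sketch_cm_id *ids;   /* cm-ids planning to use this port */
};

void sketch_port_bind(struct sketch_port *port, struct sketch_cm_id *id)
{
	id->stale = false;
	id->next = port->ids;
	port->ids = id;
}

/* Port removal (e.g. hot-unplug): mark every bound cm-id invalid. */
void sketch_port_remove(struct sketch_port *port)
{
	for (struct sketch_cm_id *id = port->ids; id; id = id->next)
		id->stale = true;
	port->ids = NULL;
}

/* Returns 0 if a message could be sent, -1 if the id's port is gone. */
int sketch_send(const struct sketch_cm_id *id)
{
	return id->stale ? -1 : 0;
}
```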
      
      This commit fixes a kernel panic that occurs when running traffic
      between guests and force-rebooting a guest mid-traffic:
      
       Call Trace:
        [<ffffffff815271fa>] ? panic+0xa7/0x16f
        [<ffffffff8152b534>] ? oops_end+0xe4/0x100
        [<ffffffff8104a00b>] ? no_context+0xfb/0x260
        [<ffffffff81084db2>] ? del_timer_sync+0x22/0x30
        [<ffffffff8104a295>] ? __bad_area_nosemaphore+0x125/0x1e0
        [<ffffffff81084240>] ? process_timeout+0x0/0x10
        [<ffffffff8104a363>] ? bad_area_nosemaphore+0x13/0x20
        [<ffffffff8104aabf>] ? __do_page_fault+0x31f/0x480
        [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
        [<ffffffffa0752675>] ? free_msg+0x55/0x70 [mlx5_core]
        [<ffffffffa0753434>] ? cmd_exec+0x124/0x840 [mlx5_core]
        [<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
        [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
        [<ffffffff8152a815>] ? page_fault+0x25/0x30
        [<ffffffffa024da25>] ? cm_alloc_msg+0x35/0xc0 [ib_cm]
        [<ffffffffa024e821>] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm]
        [<ffffffffa024f836>] ? cm_destroy_id+0x176/0x320 [ib_cm]
        [<ffffffffa024fb00>] ? ib_destroy_cm_id+0x10/0x20 [ib_cm]
        [<ffffffffa034f527>] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib]
        [<ffffffffa034f590>] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib]
        [<ffffffffa034f5a5>] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib]
        [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
        [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
        [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
        [<ffffffff8109aef6>] ? kthread+0x96/0xa0
        [<ffffffff8100c20a>] ? child_rip+0xa/0x20
        [<ffffffff8109ae60>] ? kthread+0x0/0xa0
        [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      Fixes: a977049d ("[PATCH] IB: Add the kernel CM implementation")
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      aa19a889
    • Tariq Toukan
      IB/uverbs: Fix leak of XRC target QPs · 52aac91d
      Tariq Toukan authored
      commit 5b810a24 upstream.
      
      The real QP is destroyed when its ref count reaches zero, but for
      XRC target QPs this call was missed, which caused QP leaks.

      Let's call destroy for all flows.
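
A hypothetical sketch of the intended rule (not the uverbs code): every handle, including XRC target QPs opened via ib_open_qp(), drops a reference on the shared real QP, and the last put destroys it regardless of QP type.

```c
#include <stdbool.h>

/* Hypothetical shared QP: several handles hold references on one
 * underlying "real" QP. */
struct sketch_real_qp {
	int refcount;
	bool destroyed;
};

/* Release one handle; the final reference destroys the real QP on every
 * path, with no special case that skips XRC target QPs. */
void sketch_release_handle(struct sketch_real_qp *qp)
{
	if (--qp->refcount == 0)
		qp->destroyed = true;
}
```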
      
      Fixes: 0e0ec7e0 ('RDMA/core: Export ib_open_qp() to share XRC...')
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      52aac91d
    • Matan Barak
      IB/mlx4: Fix create CQ error flow · 1aecb8e4
      Matan Barak authored
      commit 593ff73b upstream.
      
      Currently, if ib_copy_to_udata fails, the CQ is neither deleted from
      the radix tree nor removed from the HW (HW2SW).
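
The usual shape of a fix for this kind of error flow is a goto-based unwind that undoes each completed setup step. A sketch with boolean stand-ins for the radix-tree entry and hardware ownership (names hypothetical, not the mlx4 driver's):

```c
#include <stdbool.h>

/* Hypothetical CQ state: which setup steps have taken effect. */
struct sketch_cq {
	bool in_tree;   /* stands in for the radix-tree entry */
	bool hw_owned;  /* stands in for SW2HW ownership */
};

int sketch_create_cq(struct sketch_cq *cq, bool copy_to_user_fails)
{
	cq->hw_owned = true;          /* hand the CQ to HW (SW2HW) */
	cq->in_tree = true;           /* insert into the radix tree */

	if (copy_to_user_fails)
		goto err_unwind;
	return 0;

err_unwind:
	/* Undo in reverse order of setup. */
	cq->in_tree = false;          /* delete from the radix tree */
	cq->hw_owned = false;         /* take it back from HW (HW2SW) */
	return -1;
}
```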
      
      Fixes: 225c7b1f ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      1aecb8e4