1. 03 Mar, 2016 40 commits
    • ext4: fix bh->b_state corruption · 7c3d1424
      Jan Kara authored
      commit ed8ad838 upstream.
      
      ext4 can update bh->b_state non-atomically in _ext4_get_block() and
      ext4_da_get_block_prep(). Usually this is fine, since bh is just
      temporary on-stack storage for mapping information, but in some cases it
      can be a live bh attached to a page. In that case a non-atomic
      update of bh->b_state can race with an atomic update, which then gets
      lost. Usually, when we are mapping a bh and thus updating bh->b_state
      non-atomically, nobody else touches the bh, so things work out fine,
      but there is one case to worry about especially: ext4_finish_bio() uses
      BH_Uptodate_Lock on the first bh in the page to synchronize handling of
      PageWriteback state. So when blocksize < pagesize, we can be atomically
      modifying bh->b_state of a buffer that actually isn't under IO and thus
      can race, e.g., with delalloc trying to map that buffer. The result is
      that we can mistakenly set or clear the BH_Uptodate_Lock bit, corrupting
      PageWriteback state or missing an unlock of BH_Uptodate_Lock.
      
      Fix the problem by always updating bh->b_state bits atomically.
      Reported-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      [NB: Backported to 4.4.2]
      Acked-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
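      The lost update described above can be modelled in user space. This is a minimal sketch that uses C11 <stdatomic.h> as a stand-in for the kernel's set_bit() family. The bit names and helpers are hypothetical. The "other CPU" is a callback run between the load and the store, so the bad interleaving is deterministic.

```c
#include <stdatomic.h>

/* Hypothetical bit assignments, for illustration only. */
#define SKETCH_BH_MAPPED      (1UL << 0)
#define SKETCH_BH_UPTODATE_LK (1UL << 1)

typedef void (*other_cpu_fn)(atomic_ulong *state);

/* Models another CPU atomically taking the uptodate-lock bit. */
static void other_cpu_sets_lock(atomic_ulong *state)
{
    atomic_fetch_or(state, SKETCH_BH_UPTODATE_LK);
}

/* Buggy pattern: plain read-modify-write of the whole word, split into
 * a load and a store so the race window is explicit. */
static unsigned long nonatomic_set_mapped(atomic_ulong *state, other_cpu_fn other)
{
    unsigned long old = atomic_load(state);      /* read the word */
    other(state);                                /* concurrent update lands here */
    atomic_store(state, old | SKETCH_BH_MAPPED); /* stale write-back drops it */
    return atomic_load(state);
}

/* Fixed pattern: each update is one atomic RMW on a single bit, so the
 * order in which the two CPUs run no longer matters. */
static unsigned long atomic_set_mapped(atomic_ulong *state, other_cpu_fn other)
{
    other(state);
    atomic_fetch_or(state, SKETCH_BH_MAPPED);
    return atomic_load(state);
}
```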
    • sctp: Fix port hash table size computation · 1c2efb14
      Neil Horman authored
      [ Upstream commit d9749fb5 ]
      
      Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
      its size computation, observing that the current method never guaranteed
      that the hashsize (measured in number of entries) would be a power of two,
      which the input hash function for that table requires.  The root cause of
      the problem is that two values need to be computed: first, the allocation
      order of the required storage, as passed to __get_free_pages(), and second,
      the number of entries for the hash table.  Both need to be powers of two,
      but for different reasons, and the existing code simply computes one order
      value and uses it as the basis for both, which is wrong (i.e. it assumes
      that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still a power of two when
      it's not).

      To fix this, we change the logic slightly.  We start by computing a goal
      allocation order (which is limited by the maximum size hash table we want
      to support).  Then we attempt to allocate a table of that size, decreasing
      the order until a successful allocation is made.  Then, with the resulting
      successful order, we compute the number of buckets that the hash table
      supports, which we then round down to the nearest power of two, giving us
      the number of entries the table actually supports.

      I've tested this locally here, using non-debug and spinlock-debug kernels,
      and the number of entries in the hashtable consistently works out to be a
      power of two in all cases.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      CC: Dmitry Vyukov <dvyukov@google.com>
      CC: Vladislav Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
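      The allocate-then-round-down strategy can be sketched in plain C. This is a minimal sketch with several assumptions. A 4096-byte page is assumed. sketch_try_alloc() is a hypothetical stand-in for __get_free_pages() that "fails" above a chosen order. The 24-byte bucket size is arbitrary and only there to show that the bucket count is generally not a power of two.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size */

/* Largest power of two <= n (0 for n == 0). */
static unsigned long rounddown_pow2(unsigned long n)
{
    unsigned long p = 1;

    if (n == 0)
        return 0;
    while (p <= n / 2)
        p <<= 1;
    return p;
}

/* Hypothetical allocator stand-in: orders above max_ok_order fail. */
static int sketch_try_alloc(int order, int max_ok_order)
{
    return order <= max_ok_order;
}

/* Returns the number of hash entries (a power of two) and stores the
 * order actually allocated in *out_order (-1 if every order failed). */
static unsigned long sketch_hash_entries(int goal_order, int max_ok_order,
                                         size_t bucket_size, int *out_order)
{
    int order = goal_order;

    /* Step down from the goal order until an allocation succeeds. */
    while (order >= 0 && !sketch_try_alloc(order, max_ok_order))
        order--;
    *out_order = order;
    if (order < 0)
        return 0;

    /* The bucket count that fits is not necessarily a power of two,
     * so round it down separately from the allocation order. */
    return rounddown_pow2((SKETCH_PAGE_SIZE << order) / bucket_size);
}
```

      With a goal order of 3 and allocations failing above order 1, the sketch settles on order 1. That gives 8192 / 24 = 341 buckets, which are rounded down to 256 entries.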
    • unix_diag: fix incorrect sign extension in unix_lookup_by_ino · 82f26aa4
      Dmitry V. Levin authored
      [ Upstream commit b5f05492 ]
      
      The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
      __u32, but unix_lookup_by_ino's argument ino has type int, which is not
      a problem by itself.
      However, when ino is compared with the sock_i_ino return value of type
      unsigned long, ino is sign extended to signed long, and this results
      in an incorrect comparison on 64-bit architectures for inode numbers
      greater than INT_MAX.
      
      This bug was found by strace test suite.
      
      Fixes: 5d3cae8b ("unix_diag: Dumping exact socket core")
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
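      The sign-extension pitfall can be reproduced in a few lines. This is a sketch, assuming an LP64 platform where unsigned long is 64 bits. The helper names are hypothetical. Converting an out-of-range unsigned int to int is implementation-defined in C, but common compilers wrap it to a negative value, which is the case shown here.

```c
/* Buggy shape: ino is an int. Converted to unsigned long for the
 * comparison, a negative int becomes 0xFFFFFFFF8xxxxxxx on LP64. */
static int ino_matches_int(int ino, unsigned long sock_ino)
{
    return sock_ino == (unsigned long)ino;
}

/* Fixed shape: keep the value unsigned so it is zero-extended. */
static int ino_matches_uint(unsigned int ino, unsigned long sock_ino)
{
    return sock_ino == ino;
}
```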
    • tipc: unlock in error path · 4ac39c3e
      Insu Yun authored
      [ Upstream commit b53ce3e7 ]
      
      The broadcast lock must be released with tipc_bcast_unlock() in the error path.
      Signed-off-by: Insu Yun <wuninsu@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtnl: RTM_GETNETCONF: fix wrong return value · b7c2e2ac
      Anton Protopopov authored
      [ Upstream commit a97eb33f ]
      
      An error response to an RTM_GETNETCONF request can return the positive
      error value EINVAL in struct nlmsgerr, which can mislead userspace, since
      netlink error codes are expected to be negative errno values.
      Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
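      The sketch below shows why the sign matters to the receiver. It uses a simplified, hypothetical stand-in for struct nlmsgerr that keeps only the error field. The decoder follows the common convention: 0 is a plain acknowledgement, a negative value is an errno, and a positive value fits neither case.

```c
#include <errno.h>

/* Simplified stand-in for struct nlmsgerr: only the error field. */
struct sketch_nlmsgerr {
    int error; /* 0 = ACK, otherwise a negative errno */
};

/* Typical userspace decoding: returns 0 for an ACK, the positive errno
 * for an error, and -1 when the value breaks the convention. */
static int sketch_decode(const struct sketch_nlmsgerr *e)
{
    if (e->error == 0)
        return 0;
    if (e->error < 0)
        return -e->error;
    return -1; /* positive value: not a valid netlink error */
}
```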
    • IFF_NO_QUEUE: Fix for drivers not calling ether_setup() · 9b87f63b
      Phil Sutter authored
      [ Upstream commit a813104d ]
      
      My implementation around the IFF_NO_QUEUE driver flag assumed that drivers
      leaving tx_queue_len untouched (specifically: not setting it to zero)
      would make it possible to assign a regular qdisc to them without having
      to worry about setting tx_queue_len to a useful value. This was only
      partially true: I overlooked that some drivers don't call ether_setup()
      and therefore do not initialize tx_queue_len to the default value of 1000.
      Consequently, removing the workarounds in place for that case in qdisc
      implementations which cared about it (namely pfifo, bfifo, gred, htb,
      plug and sfb) leads to problems with these specific interface types and
      qdiscs.
      
      Luckily, there's already a sanitization point for drivers setting
      tx_queue_len to zero, which can be reused to assign the fallback value
      most qdisc implementations used, which is 1.
      
      Fixes: 348e3435 ("net: sched: drop all special handling of tx_queue_len == 0")
      Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
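      The fallback can be illustrated with a heavily simplified, hypothetical device struct. This sketch only shows the effect described in the text: a zero tx_queue_len is sanitized to 1, so a pfifo-like qdisc that derives its limit from the device still gets a usable value. Where exactly the kernel performs this sanitization is not modelled here.

```c
/* Hypothetical, heavily simplified device struct. */
struct sketch_netdev {
    unsigned int tx_queue_len;
};

/* Sanitization step: replace a zero queue length with 1, the fallback
 * most qdisc implementations used on their own. */
static void sketch_sanitize_tx_queue_len(struct sketch_netdev *dev)
{
    if (dev->tx_queue_len == 0)
        dev->tx_queue_len = 1;
}

/* A pfifo-like qdisc deriving its packet limit from the device. */
static unsigned int sketch_pfifo_limit(const struct sketch_netdev *dev)
{
    return dev->tx_queue_len;
}
```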
    • tcp/dccp: fix another race at listener dismantle · 9653359e
      Eric Dumazet authored
      [ Upstream commit 7716682c ]
      
      Ilya reported following lockdep splat:
      
      kernel: =========================
      kernel: [ BUG: held lock freed! ]
      kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
      kernel: -------------------------
      kernel: swapper/5/0 is freeing memory
      ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
      kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      kernel: 4 locks held by swapper/5/0:
      kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
      netif_receive_skb_internal+0x4b/0x1f0
      kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
      ip_local_deliver_finish+0x3f/0x380
      kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
      sk_clone_lock+0x19b/0x440
      kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      
      To properly fix this issue, inet_csk_reqsk_queue_add() needs
      to tell its callers whether the child has been queued
      into the accept queue.

      We also need to make sure the listener is still there before
      calling sk->sk_data_ready(), by holding a reference on it,
      since the reference carried by the child can disappear as
      soon as the child is put on the accept queue.
      Reported-by: Ilya Dryomov <idryomov@gmail.com>
      Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • route: check and remove route cache when we get route · 54d77a22
      Xin Long authored
      [ Upstream commit deed49df ]
      
      Since garbage collection of IPv4 routes was removed, a cached route has
      no chance of being removed, and even after it has expired it can still
      be used, because no code checks its expiry time.

      Fix this issue by checking whether the cached route has expired, and
      removing it if so, when we get a route.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
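      The check-on-lookup approach can be sketched as follows. The struct and helper names are hypothetical. sketch_time_after() mirrors the wrap-safe comparison used by the kernel's time_after() macro, and an expires value of 0 is assumed to mean "never expires".

```c
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical cached route; expires == 0 means "never expires". */
struct sketch_rt {
    unsigned long expires;
    bool valid;
};

/* On lookup, drop an expired cache entry instead of returning it. */
static struct sketch_rt *sketch_get_cached_route(struct sketch_rt *rt,
                                                 unsigned long now)
{
    if (rt == NULL || !rt->valid)
        return NULL;
    if (rt->expires && sketch_time_after(now, rt->expires)) {
        rt->valid = false; /* remove the stale entry */
        return NULL;
    }
    return rt;
}
```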
    • net_sched fix: reclassification needs to consider ether protocol changes · d4775ea0
      Jamal Hadi Salim authored
      [ Upstream commit 619fe326 ]
      
      Actions could change the etherproto, in particular with ethernet
      tunnelled data. Typically such actions, after peeling the outer header,
      will ask for the packet to be reclassified. We then need to restart
      the classification with the new proto header.
      
      Example setup used to catch this:
      sudo tc qdisc add dev $ETH ingress
      sudo $TC filter add dev $ETH parent ffff: pref 1 protocol 802.1Q \
      u32 match u32 0 0 flowid 1:1 \
      action  vlan pop reclassify
      
      Fixes: 3b3ae880 ("net: sched: consolidate tc_classify{,_compat}")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pppoe: fix reference counting in PPPoE proxy · 26fd5ed6
      Guillaume Nault authored
      [ Upstream commit 29e73269 ]
      
      Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
      This is already handled correctly in the error path.
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • l2tp: Fix error creating L2TP tunnels · e9f13d3f
      Mark Tomlinson authored
      [ Upstream commit 853effc5 ]
      
      A previous commit (33f72e6f) added netlink notifications when tunnels
      are created, modified or deleted. If the notification returned an error,
      this error was returned from the tunnel function. If there were no
      listeners, the error code ESRCH was returned, even though having no
      listeners is not an error. Other calls to this and other similar
      notification functions either ignore the error code or filter ESRCH.
      This patch checks for ESRCH and does not flag it as an error.
      Reviewed-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
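      The filtering rule fits in one line. This is a sketch with a hypothetical helper name, assuming the notification path reports failures as negative errno values.

```c
#include <errno.h>

/* Hypothetical filter for a netlink notification result: -ESRCH only
 * means "no listeners", which is not a failure of the tunnel operation. */
static int sketch_filter_notify_err(int ret)
{
    return ret == -ESRCH ? 0 : ret;
}
```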
    • net/mlx4_en: Avoid changing dev->features directly in run-time · 1cabc3e3
      Eugenia Emantayev authored
      [ Upstream commit 925ab1aa ]
      
      Changing dev->features manually at run time is forbidden. Currently, the
      driver does this to make sure that GSO_UDP_TUNNEL is advertised only when
      a VXLAN tunnel is set. However, since the stack actually intersects the
      features with hw_enc_features, we can safely revert to advertising the
      features early, when registering the netdevice.
      
      Fixes: f4a1edd5 ('net/mlx4_en: Advertize encapsulation offloads [...]')
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
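      The intersection argument can be shown with plain bitmasks. The feature bits below are hypothetical, not the kernel's NETIF_F_* values. The sketch only shows that a bit advertised early in features has no effect on encapsulated traffic unless hw_enc_features also contains it.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define SK_F_GSO_UDP_TUNNEL (1ULL << 0)
#define SK_F_TSO            (1ULL << 1)
#define SK_F_RXCSUM         (1ULL << 2)

/* Offloads usable on encapsulated traffic: the device features masked
 * by hw_enc_features. */
static uint64_t sketch_enc_offloads(uint64_t features, uint64_t hw_enc_features)
{
    return features & hw_enc_features;
}
```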
    • net/mlx4_en: Choose time-stamping shift value according to HW frequency · 7675c3c6
      Eugenia Emantayev authored
      [ Upstream commit 31c128b6 ]
      
      Previously, the shift value used for time-stamping was constant and didn't
      depend on the HW chip frequency. Change that to take the frequency into account
      and calculate the maximal value in cycles per wraparound of ten seconds. This
      time slot was chosen since it gives good accuracy in time synchronization.

      Algorithm for shift value calculation:
       * Round up the maximal value in cycles to the nearest power of two

       * Calculate the maximal multiplier by dividing a value with all 64 bits
         set by the above result

       * Then, invert the function clocksource_khz2mult() to get the shift from
         the maximal mult value
      
      Fixes: ec693d47 ('net/mlx4_en: Add HW timestamping (TS) support')
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
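      The three steps above can be sketched in user-space C under a few assumptions. The HW frequency is taken in MHz. The forward mapping is mult = (1000000 << shift) / khz; the kernel's clocksource_khz2mult() additionally rounds to nearest, which this sketch omits. The helper names and the 427 MHz example frequency are hypothetical.

```c
#include <stdint.h>

#define SKETCH_WRAP_AROUND_SEC 10ULL /* ten-second wraparound, per the text */

static int sketch_ilog2(uint64_t v) /* floor(log2(v)) for v > 0 */
{
    int r = -1;

    while (v) {
        v >>= 1;
        r++;
    }
    return r;
}

static uint64_t sketch_roundup_pow2(uint64_t v) /* smallest 2^k >= v */
{
    uint64_t p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/* Forward direction, without the kernel's round-to-nearest. */
static uint64_t sketch_khz2mult(uint64_t khz, unsigned shift)
{
    return (1000000ULL << shift) / khz;
}

static unsigned sketch_freq_to_shift(uint32_t freq_mhz)
{
    uint64_t freq_khz = (uint64_t)freq_mhz * 1000;
    uint64_t max_cycles = freq_khz * 1000 * SKETCH_WRAP_AROUND_SEC;
    /* Step 1: round the cycle count up to a power of two. */
    uint64_t max_cycles_rounded = sketch_roundup_pow2(max_cycles);
    /* Step 2: largest mult with max_cycles_rounded * mult in 64 bits. */
    uint64_t max_mult = UINT64_MAX / max_cycles_rounded;
    /* Step 3: invert mult = (10^6 << shift) / khz for the largest shift. */
    return (unsigned)sketch_ilog2(max_mult * freq_khz / 1000000);
}
```

      At 427 MHz, ten seconds is 4.27e9 cycles, which rounds up to 2^32. That gives max_mult = 2^32 - 1 and a shift of 30; a shift of 31 would push the multiplier past max_mult.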
    • net/mlx4_en: Count HW buffer overrun only once · 3b7abf72
      Amir Vadai authored
      [ Upstream commit 281e8b2f ]
      
      RdropOvflw counts overruns of the HW buffer and therefore should
      be used for rx_fifo_errors only.

      Currently the RdropOvflw counter is mistakenly also reported in
      rx_missed_errors and rx_over_errors, which makes the device's
      total dropped-packet accounting show wrong results.

      Fix that: use it for rx_fifo_errors only.
      
      Fixes: c27a02cd ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
      Signed-off-by: Amir Vadai <amir@vadai.me>
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • qmi_wwan: add "4G LTE usb-modem U901" · 617d22dd
      Bjørn Mork authored
      [ Upstream commit aac8d3c2 ]
      
      Thomas reports:
      
      T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=05c6 ProdID=6001 Rev=00.00
      S:  Manufacturer=USB Modem
      S:  Product=USB Modem
      S:  SerialNumber=1234567890ABCDEF
      C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
      Reported-by: Thomas Schäfer <tschaefer@t-online.de>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: md5: release request socket instead of listener · a4b84d5e
      Eric Dumazet authored
      [ Upstream commit 72923555 ]
      
      If tcp_v4_inbound_md5_hash() returns an error, we must release
      the refcount on the request socket, not on the listener.
      
      The bug was added for IPv4 only.
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tipc: fix premature addition of node to lookup table · 692925fe
      Jon Paul Maloy authored
      [ Upstream commit d5c91fb7 ]
      
      In commit 52666986 ("tipc: let broadcast packet reception
      use new link receive function") we introduced a new per-node
      broadcast reception link instance. This link is created at the
      moment the node itself is created. Unfortunately, the allocation
      is done after the node instance has already been added to the node
      lookup hash table. This creates a potential race condition, where
      arriving broadcast packets are able to find and access the node
      before it has been fully initialized, and before the above mentioned
      link has been created. The result is occasional crashes in the function
      tipc_bcast_rcv(), which is trying to access the not-yet existing link.
      
      We fix this by deferring the addition of the node instance until after
      it has been fully initialized in the function tipc_node_create().
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • af_unix: Guard against other == sk in unix_dgram_sendmsg · 1bd36785
      Rainer Weikusat authored
      [ Upstream commit a5527dda ]
      
      The unix_dgram_sendmsg routine uses the following test

      if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {

      to determine if sk and other are in an n:1 association (either
      established via connect or by using sendto to send messages to an
      unrelated socket identified by address). This isn't correct, as the
      specified address could have been bound to the sending socket itself, or
      this socket could have been connected to itself at the time of
      unix_peer_get() but disconnected before unix_state_lock(other). In
      both cases, the if-block would be entered despite other == sk, which
      might either block the sender unintentionally or lead to trying to unlock
      the same spin lock twice for a non-blocking send. Add an other != sk
      check to guard against this.
      
      Fixes: 7d267278 ("unix: avoid use-after-free in ep_remove_wait_queue")
      Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
      Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
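      The effect of the extra guard can be checked with a minimal, hypothetical socket model. In this model, peer stands in for unix_peer() and recvq_full for unix_recvq_full(). The scenario is a socket that was disconnected from itself while its own queue is full.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal socket model. */
struct sketch_sock {
    struct sketch_sock *peer;
    bool recvq_full;
};

/* Original test: true when sk must wait for room in other's queue. */
static bool sketch_must_wait_old(const struct sketch_sock *sk,
                                 const struct sketch_sock *other)
{
    return other->peer != sk && other->recvq_full;
}

/* Fixed test: a socket never waits on itself. */
static bool sketch_must_wait_new(const struct sketch_sock *sk,
                                 const struct sketch_sock *other)
{
    return other != sk && other->peer != sk && other->recvq_full;
}
```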
    • af_unix: Don't set err in unix_stream_read_generic unless there was an error · 2f46f069
      Rainer Weikusat authored
      [ Upstream commit 1b92ee3d ]
      
      The present unix_stream_read_generic contains various code sequences of
      the form
      
      err = -EDISASTER;
      if (<test>)
      	goto out;
      
      This has the unfortunate side effect of possibly causing the error code
      to bleed through to the final
      
      out:
      	return copied ? : err;
      
      and then to be wrongly returned if no data was copied because the caller
      didn't supply a data buffer, as demonstrated by the program available at
      
      http://pad.lv/1540731
      
      Change it such that err is only set if an error condition was detected.
      
      Fixes: 3822b5c2 ("af_unix: Revert 'lock_interruptible' in stream receive code")
      Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
      Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
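      The err-bleeding pattern and its fix can be contrasted in two small functions. Both are hypothetical reader sketches: fail simulates the error test, and len == 0 models a caller that supplied no data buffer. The original code uses the GNU "copied ? : err" shorthand; standard C's "copied ? copied : err" is used here instead.

```c
#include <errno.h>
#include <stddef.h>

/* Buggy shape: err is set speculatively before the test. */
static int sketch_read_buggy(size_t len, int fail)
{
    int copied = 0;
    int err;

    err = -EINVAL;          /* set before the test ... */
    if (fail)
        goto out;           /* ... but only meaningful on this path */
    copied = (int)len;      /* len == 0: no data buffer supplied */
out:
    return copied ? copied : err; /* stale -EINVAL leaks when copied == 0 */
}

/* Fixed shape: err is only set once an error is actually detected. */
static int sketch_read_fixed(size_t len, int fail)
{
    int copied = 0;
    int err = 0;

    if (fail) {
        err = -EINVAL;
        goto out;
    }
    copied = (int)len;
out:
    return copied ? copied : err;
}
```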
    • ipv4: fix memory leaks in ip_cmsg_send() callers · 6b567a1a
      Eric Dumazet authored
      [ Upstream commit 91948309 ]
      
      Dmitry reported memory leaks of IP options allocated in
      ip_cmsg_send() when/if this function returns an error.
      
      Callers are responsible for the freeing.
      
      Many thanks to Dmitry for the report and diagnostic.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bonding: Fix ARP monitor validation · 07cc96fb
      Jay Vosburgh authored
      [ Upstream commit 21a75f09 ]
      
      The current logic in bond_arp_rcv will accept an incoming ARP for
      validation if (a) the receiving slave is either "active" (which includes
      the currently active slave, or the current ARP slave) or, (b) there is a
      currently active slave, and it has received an ARP since it became active.
      For case (b), the receiving slave isn't the currently active slave, and is
      receiving the original broadcast ARP request, not an ARP reply from the
      target.
      
      This logic can fail if there is no currently active slave.  In
      this situation, the ARP probe logic cycles through all slaves, assigning
      each in turn as the "current_arp_slave" for one arp_interval, then setting
      that one as "active," and sending an ARP probe from that slave.  The
      current logic expects the ARP reply to arrive on the sending
      current_arp_slave, however, due to switch FDB updating delays, the reply
      may be directed to another slave.

      This can arise if the bonding slaves and switch are working, but
      the ARP target is not responding.  When the ARP target recovers, a
      condition may result wherein the ARP target host replies faster than the
      switch can update its forwarding table, causing each ARP reply to be sent
      to the previous current_arp_slave.  This will never pass the logic in
      bond_arp_rcv, as neither of the above conditions (a) or (b) are met.

      Some experimentation on a LAN shows ARP reply round trips in the
      200 usec range, but my available switches never update their FDB in less
      than 4000 usec.

      This patch changes the logic in bond_arp_rcv to additionally
      accept an ARP reply for validation on any slave if there is a current ARP
      slave and it sent an ARP probe during the previous arp_interval.
      
      Fixes: aeea64ac ("bonding: don't trust arp requests unless active slave really works")
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
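      The old and new acceptance rules can be written as boolean predicates. The struct below is a hypothetical snapshot of the conditions the text names, not the bonding driver's data structures. The test scenario is the one described: no active slave, and a reply arriving on a different slave right after a probe.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the conditions consulted on ARP receive. */
struct sketch_arp_state {
    bool slave_is_active;           /* (a) receiving slave is "active" */
    bool have_curr_active;          /* (b) a currently active slave exists */
    bool curr_active_got_arp;       /*     and it received an ARP since then */
    bool have_curr_arp_slave;       /* new: a current ARP slave exists */
    bool arp_slave_probed_recently; /*      and it probed in the last interval */
    bool is_reply;                  /* the packet is an ARP reply */
};

static bool sketch_accept_old(const struct sketch_arp_state *s)
{
    return s->slave_is_active ||
           (s->have_curr_active && s->curr_active_got_arp);
}

/* Additionally accept a reply on any slave after a recent probe. */
static bool sketch_accept_new(const struct sketch_arp_state *s)
{
    return sketch_accept_old(s) ||
           (s->is_reply && s->have_curr_arp_slave &&
            s->arp_slave_probed_recently);
}
```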
    • bpf: fix branch offset adjustment on backjumps after patching ctx expansion · a34f2f9f
      Daniel Borkmann authored
      [ Upstream commit a1b14d27 ]
      
      When ctx access is used, the kernel often needs to expand/rewrite
      instructions, so after that patching, branch offsets have to be
      adjusted for both forward and backward jumps in the new eBPF program,
      but for backward jumps it fails to account for the delta. For
      example, if the expansion happens exactly on the insn that sits at
      the jump target, it doesn't fix up the back jump offset.
      
      Analysis on what the check in adjust_branches() is currently doing:
      
        /* adjust offset of jmps if necessary */
        if (i < pos && i + insn->off + 1 > pos)
          insn->off += delta;
        else if (i > pos && i + insn->off + 1 < pos)
          insn->off -= delta;
      
      First condition (forward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- i/insn            insns[1] <--- i/insn
        insns[2] <--- pos               insns[P] <--- pos
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- target_X          insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- target_X
                                        insns[5]
      
      The first case is if we cross the pos boundary and the jump instruction
      was before pos. This is handled correctly. I.e. if i == pos, then this
      would mean our jump that we currently check was the patchlet itself
      that we just injected. Since such patchlets are self-contained and
      have no awareness of any insns before or after the patched one, the
      delta is correctly not adjusted. Also, for the second part of this
      condition, i + insn->off + 1 == pos means we jump to that newly patched
      instruction, so no offset adjustment is needed. That part is correct.
      
      Second condition (backward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- target_X          insns[1] <--- target_X
        insns[2] <--- pos <-- target_Y  insns[P] <--- pos <-- target_Y
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- i/insn            insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- i/insn
                                        insns[5]
      
      Second interesting case is where we cross pos-boundary and the jump
      instruction was after pos. Backward jump with i == pos would be
      impossible and pose a bug somewhere in the patchlet, so the first
      condition checking i > pos is okay only by itself. However, i +
      insn->off + 1 < pos does not always work as intended to trigger the
      adjustment. It works when jump targets would be far off where the
      delta wouldn't matter. But, for example, where the fixed insn->off
      before pointed to pos (target_Y), it now points to pos + delta, so
      that additional room needs to be taken into account for the check.
      This means that i) both tests here need to be adjusted into pos + delta,
      and ii) for the second condition, the test needs to be <= as pos
      itself can be a target in the backjump, too.
      
      Fixes: 9bac3d6d ("bpf: allow extended BPF programs access skb fields")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
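      The old and corrected checks can be compared on the backward-jump example above. This is a simplified sketch, not the verifier's code. The instruction struct is hypothetical, and the array is assumed to already be in the patched layout: insn pos expanded into 1 + delta insns, later indices shifted by delta, and offsets not yet adjusted.

```c
#include <stdbool.h>

/* Hypothetical instruction: jump target = index + off + 1. */
struct sketch_insn {
    bool is_jmp;
    int off;
};

static void sketch_adjust_branches(struct sketch_insn *insns, int len,
                                   int pos, int delta, bool fixed)
{
    for (int i = 0; i < len; i++) {
        struct sketch_insn *insn = &insns[i];

        if (!insn->is_jmp)
            continue;
        if (i < pos && i + insn->off + 1 > pos)
            insn->off += delta;          /* forward jump over the patch */
        else if (fixed ? (i > pos + delta && i + insn->off + 1 <= pos + delta)
                       : (i > pos && i + insn->off + 1 < pos))
            insn->off -= delta;          /* backward jump over the patch */
    }
}
```

      In the test, a 6-insn program has insn 2 expanded by delta = 2. The forward jump at 1 and the backward jump (originally at 4, now at 6) that targets pos are both fixed up only by the corrected check.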
    • flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen · b083b36c
      Alexander Duyck authored
      [ Upstream commit 461547f3 ]
      
      This patch fixes an issue with unaligned accesses when using
      eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
      When we are only checking the length, we don't need to look at the flow
      label, so we can reorder the checks to first check whether we are
      supposed to gather the flow label and only then make the call to
      actually get it.

      v2:  Updated patch so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
           cause us to check for the flow label.
      Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Copy inner L3 and L4 headers as unaligned on GRE TEB · e3865b8b
      Alexander Duyck authored
      [ Upstream commit 78565208 ]
      
      This patch corrects the unaligned accesses seen on GRE TEB tunnels when
      generating hash keys.  Specifically, it forces the use of skb_copy_bits()
      when the GRE inner headers will be unaligned due to NET_IP_ALIGN being a
      non-zero value.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
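      The offset arithmetic below shows where the misalignment comes from. It assumes a 14-byte outer Ethernet header, an IPv4 header without options, a basic 4-byte GRE header and a 14-byte inner Ethernet header. With a 2-byte NET_IP_ALIGN the inner IP header then starts at offset 54, which is not 4-byte aligned. sketch_load_u32() uses memcpy() as a user-space analogue of copying the bytes out (the idea behind skb_copy_bits()) instead of dereferencing a misaligned pointer. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header sizes: no IP options, basic 4-byte GRE header. */
#define SK_ETH_HLEN  14
#define SK_IPV4_HLEN 20
#define SK_GRE_HLEN  4

/* Offset of the inner IPv4 header when the frame starts net_ip_align
 * bytes into the buffer. */
static size_t sketch_inner_l3_offset(size_t net_ip_align)
{
    return net_ip_align + SK_ETH_HLEN + SK_IPV4_HLEN + SK_GRE_HLEN + SK_ETH_HLEN;
}

/* Alignment-safe 32-bit load: copy the bytes instead of casting. */
static uint32_t sketch_load_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```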
    • sctp: translate network order to host order when users get a hmacid · 2038fb6f
      Xin Long authored
      [ Upstream commit 7a84bd46 ]
      
      Commit ed5a377d ("sctp: translate host order to network order when
      setting a hmacid") corrected the hmacid byte order when setting a hmacid,
      but the same issue also exists when getting a hmacid.

      We fix it by converting hmacids to host order when users get them with
      getsockopt.

      Fixes: ed5a377d ("sctp: translate host order to network order when setting a hmacid")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
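      The read path is the mirror image of the set path. This is a minimal sketch with a hypothetical helper: identifiers stored in network order are converted with ntohs() before being copied out to the getsockopt() caller.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert stored HMAC identifiers (network order)
 * to host order for the caller. */
static void sketch_hmacids_to_host(const uint16_t *wire, uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = ntohs(wire[i]);
}
```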
    • enic: increment devcmd2 result ring in case of timeout · ff914007
      Sandeep Pillai authored
      [ Upstream commit ca7f41a4 ]
      
      Firmware posts the devcmd result in the result ring. On timeout, the
      driver does not increment the current result pointer, yet the firmware
      may still post the result after the timeout has occurred. During the
      next devcmd, the driver would then read the result of the previous
      devcmd.
      
      Fix this by incrementing the result pointer even on timeout.
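The fix can be sketched as a consumer that advances its ring index whether or not the result arrived in time. This is a hypothetical model, not the enic driver's code. `struct result_ring`, `RING_SIZE` and `ring_wait_result()` are invented names, and the single-shot check stands in for the driver's polling loop.

```c
#include <errno.h>
#include <stdbool.h>

#define RING_SIZE 4 /* hypothetical ring size */

/* Hypothetical result ring: "firmware" fills slots, the driver consumes. */
struct result_ring {
    bool posted[RING_SIZE];
    int value[RING_SIZE];
    unsigned int next; /* slot the driver expects the next result in */
};

static int ring_wait_result(struct result_ring *r, int *out)
{
    unsigned int slot = r->next;

    /* Advance unconditionally, even if this command times out, so a late
     * result for it is never read as the result of the next command. */
    r->next = (r->next + 1) % RING_SIZE;
    if (!r->posted[slot])
        return -ETIMEDOUT; /* a real driver would poll for a while first */
    r->posted[slot] = false;
    *out = r->value[slot];
    return 0;
}
```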
      
      Fixes: 373fb087 ("enic: add devcmd2")
      Signed-off-by: Sandeep Pillai <sanpilla@cisco.com>
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Siva Reddy Kallam
      tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs · 98673eb0
      Siva Reddy Kallam authored
      [ Upstream commit b7d98729 ]
      
      tg3_tso_bug() can hit a condition where the entire tx ring is not big
      enough to segment the GSO packet. For example, if MSS is very small,
      gso_segs can exceed the tx ring size. Hitting this condition causes a
      tx timeout.
      
      tg3_tso_bug() is called to handle TSO and DMA hardware bugs.
      For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet.
      For DMA bugs, we can still fall back to linearizing the SKB and let the
      hardware transmit the TSO packet.
      
      This patch adds a function, tg3_tso_bug_gso_check(), that checks whether
      there are enough tx descriptors for GSO before tg3_tso_bug() is called.
      The caller then handles the error appropriately: it either drops or
      linearizes the SKB.
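Below is a minimal sketch of such a pre-check. The worst-case estimate of three descriptors per GSO segment is an assumption for illustration. `gso_fits_tx_ring()` and `DESC_PER_SEG_WORST` are hypothetical names, not the tg3 code.

```c
#include <stdbool.h>

/* Hypothetical worst-case tx descriptors consumed per GSO segment. */
#define DESC_PER_SEG_WORST 3u

/* Return true if software segmentation of the packet can fit in the ring. */
static bool gso_fits_tx_ring(unsigned int gso_segs, unsigned int tx_ring_free)
{
    return gso_segs * DESC_PER_SEG_WORST <= tx_ring_free;
}
```

When the check fails, the caller can take the drop path for TSO bugs or the linearize path for DMA bugs.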
      
      v2: Corrected patch description to avoid confusion.
      Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Acked-by: Prashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hans Westgaard Ry
      net:Add sysctl_max_skb_frags · 1bec5f40
      Hans Westgaard Ry authored
      [ Upstream commit 5f74f82e ]
      
      Devices may limit the number of fragments per skb they support. The
      current code uses a constant (MAX_SKB_FRAGS) as the maximum number of
      fragments an skb can hold and use. When scatter/gather is enabled and
      traffic consists of many small messages, the stack uses the maximum
      number of fragments and may thereby exceed the limit of certain devices.
      This patch introduces a global variable, sysctl_max_skb_frags, as the
      maximum number of fragments.
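A runtime-tunable limit bounded by a compile-time ceiling can be sketched as follows. All names and the ceiling value are hypothetical: `max_frags`, `MAX_FRAGS_BUILD`, `set_max_frags()` and `can_add_frag()` only model the idea behind sysctl_max_skb_frags.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_FRAGS_BUILD 17 /* hypothetical compile-time ceiling */

/* Runtime limit, tunable downward for devices with smaller limits. */
static int max_frags = MAX_FRAGS_BUILD;

/* Sysctl-style setter: accept only values in [1, MAX_FRAGS_BUILD]. */
static int set_max_frags(int val)
{
    if (val < 1 || val > MAX_FRAGS_BUILD)
        return -EINVAL;
    max_frags = val;
    return 0;
}

/* May another fragment be appended to an skb that already has nr_frags? */
static bool can_add_frag(int nr_frags)
{
    return nr_frags < max_frags;
}
```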
      Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
      Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric Dumazet
      tcp: do not drop syn_recv on all icmp reports · 2679161c
      Eric Dumazet authored
      [ Upstream commit 9cf74903 ]
      
      Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
      were leading to RST.
      
      This is of course incorrect.
      
      Only a specific list of ICMP messages should be able to drop a SYN_RECV.
      
      For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
      not hold a dst per SYN_RECV pseudo request.
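Such an allow-list can be sketched as a predicate over ICMPv4 type and code. The numeric values come from RFC 792. The exact set of aborting messages shown here is an assumption for illustration, and `icmp_aborts_syn_recv()` is a hypothetical name.

```c
#include <stdbool.h>

/* ICMPv4 type and code values from RFC 792. */
enum {
    ICMPT_DEST_UNREACH = 3,
    ICMPT_REDIRECT = 5,
    ICMPT_TIME_EXCEEDED = 11,
    ICMPT_PARAMETERPROB = 12,
};
enum { ICMPC_NET_UNREACH = 0, ICMPC_HOST_UNREACH = 1 };

/* Only listed reports abort a SYN_RECV request; all others, including
 * REDIRECT, are ignored. The list itself is assumed for this sketch. */
static bool icmp_aborts_syn_recv(int type, int code)
{
    return type == ICMPT_PARAMETERPROB ||
           type == ICMPT_TIME_EXCEEDED ||
           (type == ICMPT_DEST_UNREACH &&
            (code == ICMPC_NET_UNREACH || code == ICMPC_HOST_UNREACH));
}
```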
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Reported-by: Petr Novopashenniy <pety@rusnet.ru>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hannes Frederic Sowa
      unix: correctly track in-flight fds in sending process user_struct · 3ba9b9f2
      Hannes Frederic Sowa authored
      [ Upstream commit 415e3d3e ]
      
      The commit referenced in the Fixes tag incorrectly accounted the number
      of in-flight fds over a unix domain socket to the original opener of
      the file descriptor. This allows another process to arbitrarily deplete
      the original file opener's resource limit for the maximum number of
      open files. Instead, the sending process and its struct cred should be
      credited.
      
      To do so, we add a reference counted struct user_struct pointer to the
      scm_fp_list and use it to account for the number of inflight unix fds.
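The accounting change can be modeled with a reference-counted per-user object pinned by the fd list. This is a hypothetical user-space model, not the kernel's scm code. `struct user_acct`, `struct fp_list`, `fp_list_send()` and `fp_list_release()` are invented names. The original opener's counter is never touched.

```c
#include <stddef.h>

/* Hypothetical per-user accounting object, reference counted. */
struct user_acct {
    int refcnt;
    int unix_inflight; /* fds this user currently has in flight */
};

/* Hypothetical fd list carried by one SCM_RIGHTS message. */
struct fp_list {
    int count;
    struct user_acct *user; /* the sender, pinned for the list's lifetime */
};

static struct user_acct *acct_get(struct user_acct *u) { u->refcnt++; return u; }
static void acct_put(struct user_acct *u) { u->refcnt--; }

/* Sending: charge the in-flight fds to the sending process's user. */
static void fp_list_send(struct fp_list *fpl, struct user_acct *sender, int nfds)
{
    fpl->count = nfds;
    fpl->user = acct_get(sender);
    sender->unix_inflight += nfds;
}

/* Receive or garbage collection: uncharge the same user, drop the ref. */
static void fp_list_release(struct fp_list *fpl)
{
    fpl->user->unix_inflight -= fpl->count;
    acct_put(fpl->user);
    fpl->user = NULL;
    fpl->count = 0;
}
```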
      
      Fixes: 712f4aad ("unix: properly account for FDs passed over unix sockets")
      Reported-by: David Herrmann <dh.herrmann@gmail.com>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric Dumazet
      ipv6: fix a lockdep splat · 4c890233
      Eric Dumazet authored
      [ Upstream commit 44c3d0c1 ]
      
      Silence a lockdep false positive about rcu_dereference() being
      used in the wrong context.
      
      The first one should use rcu_dereference_protected(), as we hold the
      spinlock.
      
      The second one should be a plain assignment, as no barrier is needed.
      
      Fixes: 18367681 ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • subashab@codeaurora.org
      ipv6: addrconf: Fix recursive spin lock call · cdbc6682
      subashab@codeaurora.org authored
      [ Upstream commit 16186a82 ]
      
      An RCU stall with the following backtrace was seen on a system with
      forwarding, optimistic_dad and use_optimistic set. To reproduce,
      set these flags and allow IPv6 autoconf.
      
      This occurs because the device write_lock is acquired while the
      read_lock is already held. Backtrace:
      
      INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
       g=3992 c=3991 q=4471)
      <6> Task dump for CPU 1:
      <2> kworker/1:0     R  running task    12168    15   2 0x00000002
      <2> Workqueue: ipv6_addrconf addrconf_dad_work
      <6> Call trace:
      <2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
      <2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
      <2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
      <2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
      <2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
      <2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
      <2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
      <2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
      <2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
      <2> [<ffffffc0000bb40c>] kthread+0xe0/0xec
      
      v2: do addrconf_dad_kick inside read lock and then acquire write
      lock for ipv6_ifa_notify as suggested by Eric
      
      Fixes: 7fd2561e ("net: ipv6: Add a sysctl to make optimistic
      addresses useful candidates")
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Erik Kline <ek@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Paolo Abeni
      ipv6/udp: use sticky pktinfo egress ifindex on connect() · e1c4e14b
      Paolo Abeni authored
      [ Upstream commit 1cdda918 ]
      
      Currently, the egress interface index specified via IPV6_PKTINFO
      is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
      can be subverted when the user space application calls connect()
      before sendmsg().
      Fix it by properly initializing flowi6_oif in connect() before
      performing the route lookup.
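The egress-interface selection can be sketched as follows. `struct sock6_state` and `connect_flow_oif()` are hypothetical. The precedence shown, where an explicit device binding wins over the sticky IPV6_PKTINFO ifindex, is an assumption for illustration. An ifindex of 0 means "any interface".

```c
/* Hypothetical subset of per-socket IPv6 state. */
struct sock6_state {
    int bound_dev_if;       /* device binding, 0 if unset */
    int sticky_pktinfo_oif; /* ifindex from sticky IPV6_PKTINFO, 0 if unset */
};

/* Egress ifindex to place in the flow before the connect() route lookup;
 * the sticky pktinfo value is used rather than ignored. */
static int connect_flow_oif(const struct sock6_state *s)
{
    if (s->bound_dev_if)
        return s->bound_dev_if;
    return s->sticky_pktinfo_oif;
}
```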
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Paolo Abeni
      ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() · e8e729cc
      Paolo Abeni authored
      [ Upstream commit 6f21c96a ]
      
      The current implementation of ip6_dst_lookup_tail() basically ignores
      the egress ifindex match. If the saddr is set, ip6_route_output()
      purposefully ignores flowi6_oif, due to commit d46a9d67 ("net: ipv6:
      Dont add RT6_LOOKUP_F_IFACE flag if saddr set"). If the saddr is 'any',
      the first route lookup in ip6_dst_lookup_tail() fails, but upon failure
      a second lookup is performed with the saddr set, thus ignoring the
      ifindex constraint.
      
      This commit adds an output route lookup function variant, which
      allows the caller to specify lookup flags, and modifies
      ip6_dst_lookup_tail() to enforce the ifindex match on the second
      lookup via said helper.
      
      ip6_route_output() now becomes a static inline function built on
      top of ip6_route_output_flags(); as a side effect, out-of-tree
      modules now need a GPL license to access the output route lookup
      functionality.
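A lookup that takes caller-supplied flags can be sketched over a toy route table. Everything here is hypothetical. `LOOKUP_F_IFACE` only mirrors the role of RT6_LOOKUP_F_IFACE, and `route_output_flags()` is not the kernel's ip6_route_output_flags().

```c
#include <stddef.h>

#define LOOKUP_F_IFACE 0x1 /* hypothetical flag: oif must match */

struct route { int ifindex; const char *name; };

/* With LOOKUP_F_IFACE and a non-zero oif, only routes on that interface
 * qualify; without the flag the first route wins regardless of oif. */
static const struct route *route_output_flags(const struct route *tbl,
                                              size_t n, int oif, int flags)
{
    for (size_t i = 0; i < n; i++) {
        if ((flags & LOOKUP_F_IFACE) && oif && tbl[i].ifindex != oif)
            continue;
        return &tbl[i];
    }
    return NULL;
}
```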
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric Dumazet
      tcp: beware of alignments in tcp_get_info() · 87e40d8d
      Eric Dumazet authored
      [ Upstream commit ff5d7497 ]
      
      With some combinations of user provided flags in netlink command,
      it is possible to call tcp_get_info() with a buffer that is not 8-bytes
      aligned.
      
      This matters on some architectures, so we need to use put_unaligned()
      to store the u64 fields.
      
      The current iproute2 package does not trigger this particular issue.
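In user space, `memcpy()` into the destination gives the same guarantee put_unaligned() provides for a u64 store. The sketch below assumes that substitution. `put_unaligned_u64()` is a hypothetical name, not the kernel macro.

```c
#include <stdint.h>
#include <string.h>

/*
 * Store a u64 at an address that may not be 8-byte aligned. A direct
 * *(uint64_t *)p = val can trap or be slow on some architectures when
 * p is misaligned; memcpy() has no alignment requirement.
 */
static void put_unaligned_u64(uint64_t val, void *p)
{
    memcpy(p, &val, sizeof(val));
}
```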
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ido Schimmel
      switchdev: Require RTNL mutex to be held when sending FDB notifications · ba50e6d9
      Ido Schimmel authored
      [ Upstream commit 4f2c6ae5 ]
      
      When switchdev drivers process FDB notifications from the underlying
      device, they resolve the netdev to which the entry points and notify
      the bridge using the switchdev notifier.
      
      However, since the RTNL mutex is not held there is nothing preventing
      the netdev from disappearing in the middle, which will cause
      br_switchdev_event() to dereference a non-existing netdev.
      
      Make switchdev drivers hold the lock at the beginning of the
      notification processing session and release it once it ends, after
      notifying the bridge.
      
      Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
      when RTNL mutex is held.
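The locking pattern can be sketched with a pthread mutex standing in for RTNL. Both device removal and notification processing serialize on the mutex, so the device cannot vanish between lookup and use. `rtnl`, `netdevs`, `unregister_dev()` and `process_fdb_event()` are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RTNL mutex and a netdev table. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static const char *netdevs[4]; /* NULL slot = no such device */

static void unregister_dev(int idx)
{
    pthread_mutex_lock(&rtnl); /* removal also takes the mutex */
    netdevs[idx] = NULL;
    pthread_mutex_unlock(&rtnl);
}

/* Resolve the netdev and notify the bridge within one locked session. */
static int process_fdb_event(int idx, const char **notified)
{
    int err = 0;

    pthread_mutex_lock(&rtnl);
    if (netdevs[idx] == NULL)
        err = -ENODEV;
    else
        *notified = netdevs[idx]; /* the bridge notification uses it here */
    pthread_mutex_unlock(&rtnl);
    return err;
}
```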
      
      Fixes: 03bf0c28 ("switchdev: introduce switchdev notifier")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Joe Stringer
      inet: frag: Always orphan skbs inside ip_defrag() · 649dc6c3
      Joe Stringer authored
      [ Upstream commit 8282f274 ]
      
      Later parts of the stack (including fragmentation) expect that there is
      never a socket attached to a frag in a frag_list; however, this
      invariant was not enforced on all defrag paths. This could lead to the
      BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
      end of this commit message.
      
      While an skb_orphan() call could be added to openvswitch to fix this
      particular error, the head and tail of the frags list are already
      orphaned indirectly inside ip_defrag(), so it seems like the remaining
      fragments should all be orphaned in all circumstances.
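Enforcing the invariant on every fragment can be sketched with a toy skb type. `struct fake_skb`, `orphan()` and `orphan_all()` are hypothetical. `orphan()` is only loosely modeled on skb_orphan(): it runs the destructor and detaches the socket. The counting destructor exists only for the demo.

```c
#include <stddef.h>

/* Hypothetical minimal skb: list link, owning socket, destructor. */
struct fake_skb {
    struct fake_skb *next;
    void *sk;
    void (*destructor)(struct fake_skb *skb);
};

static int destructor_calls; /* counts destructor runs, for the demo */
static void count_destructor(struct fake_skb *skb) { (void)skb; destructor_calls++; }

/* Run the destructor (if any) and detach the socket. */
static void orphan(struct fake_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}

/* Orphan every fragment, not just the head and tail. */
static void orphan_all(struct fake_skb *head)
{
    for (struct fake_skb *s = head; s; s = s->next)
        orphan(s);
}
```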
      
      kernel BUG at net/ipv4/ip_output.c:586!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
       [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
       [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
       [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
       [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
       [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
       [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
       [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
       [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
       [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
       [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
       [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
       [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
       [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
       [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
       [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
       [<ffffffff816dfa06>] icmp_echo+0x36/0x70
       [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
       [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
       [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
       [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
       [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
       [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
       [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
       [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
       [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
       [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
       [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
       [<ffffffff8165e678>] process_backlog+0x78/0x230
       [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
       [<ffffffff8165e355>] net_rx_action+0x155/0x400
       [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
       [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
       [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106b88e>] do_softirq+0x4e/0x60
       [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
       [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
       [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
       [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
       [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
       [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
       [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
       [<ffffffff81204886>] ? __fget_light+0x66/0x90
       [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
       [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
       [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
       66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
      RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
       RSP <ffff88006d603170>
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Parthasarathy Bhuvaragan
      tipc: fix connection abort during subscription cancel · c57e51ff
      Parthasarathy Bhuvaragan authored
      [ Upstream commit 4d5cfcba ]
      
      In commit 7fe8097c ("tipc: fix nullpointer bug when subscribing
      to events"), we terminate the connection if the subscription
      creation fails.
      In the same commit, the subscription creation result was based on
      the value of the subscription pointer (set in the function) instead
      of the return code.
      
      Unfortunately, the same function tipc_subscrp_create() handles
      subscription cancel request. For a subscription cancellation request,
      the subscription pointer cannot be set. Thus if a subscriber has
      several subscriptions and cancels any of them, the connection is
      terminated.
      
      In this commit, we terminate the connection based on the return value
      of tipc_subscrp_create().
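The difference between deciding on the output pointer and deciding on the return code can be sketched as follows. `subscr_handle()` is a hypothetical handler shared by create and cancel requests. It is not tipc_subscrp_create(), though it mirrors the situation described: a cancel succeeds without producing a subscription.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sub { int id; };

/* Hypothetical handler for both create and cancel requests. */
static int subscr_handle(bool cancel, bool fail, struct sub *storage,
                         struct sub **out)
{
    *out = NULL;
    if (fail)
        return -EINVAL;
    if (cancel)
        return 0; /* nothing created, but not an error */
    storage->id = 1;
    *out = storage;
    return 0;
}

/* Buggy decision: keyed on the pointer, so a cancel kills the connection. */
static bool should_terminate_buggy(bool cancel, bool fail)
{
    struct sub s, *p;

    subscr_handle(cancel, fail, &s, &p);
    return p == NULL;
}

/* Fixed decision: keyed on the return code. */
static bool should_terminate_fixed(bool cancel, bool fail)
{
    struct sub s, *p;

    return subscr_handle(cancel, fail, &s, &p) < 0;
}
```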
      Fixes: commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Russell King
      net: dsa: fix mv88e6xxx switches · 7f76933d
      Russell King authored
      [ Upstream commit db0e51af ]
      
      Since commit 76e398a6 ("net: dsa: use switchdev obj for VLAN add/del
      ops"), the Marvell 88E6xxx switch has been unable to pass traffic
      between ports - any received traffic is discarded by the switch.
      Taking a port out of bridge mode and configuring a vlan on it also
      allows the port to start passing traffic.
      
      With the debugfs files re-instated to allow debug of this issue by
      comparing the register settings between the working and non-working
      case, the reason becomes clear:
      
           GLOBAL GLOBAL2 SERDES   0    1    2    3    4    5    6
      - 7:  1111    707f    2001     2    2    2    2    2    0    2
      + 7:  1111    707f    2001     1    1    1    1    1    0    1
      
      Register 7 for the ports is the default vlan tag register, and in the
      non-working setup, it has been set to 2, despite vlan 2 not being
      configured.  This causes the switch to drop all packets coming in to
      these ports.  The working setup has the default vlan tag register set
      to 1, which is the default vlan when none is configured.
      
      Inspection of the code reveals why.  The code prior to this commit
      was:
      
      -		for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
      ...
      -			if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID)
      -				err = ds->drv->port_pvid_set(ds, p->port, vid);
      
      but the new code is:
      
      +	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
      ...
      +	}
      ...
      +	if (pvid)
      +		err = _mv88e6xxx_port_pvid_set(ds, port, vid);
      
      Because the loop exits with vid == vlan->vid_end + 1, the new code
      always sets the default vlan one higher than the old code did.
      
      Fix this.
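The off-by-one is reproduced below in isolation. For a single-VID range {1, 1}, the buggy pattern yields 2 and the fixed one yields 1, matching the register 7 values shown above. Using vid_end explicitly is assumed here as one way to fix it. `struct vlan_range`, `pvid_after_loop()` and `pvid_fixed()` are hypothetical names, and VIDs are assumed to stay within the 12-bit VLAN range.

```c
/* Hypothetical VLAN range as passed in by switchdev. */
struct vlan_range { unsigned short vid_begin, vid_end; };

/* Buggy pattern: uses the loop variable after the loop has finished. */
static unsigned short pvid_after_loop(const struct vlan_range *v)
{
    unsigned short vid;

    for (vid = v->vid_begin; vid <= v->vid_end; ++vid) {
        /* per-VID programming elided */
    }
    return vid; /* one past the range: vid_end + 1 */
}

/* Fixed pattern: name the last VID of the range explicitly. */
static unsigned short pvid_fixed(const struct vlan_range *v)
{
    return v->vid_end;
}
```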
      
      Fixes: 76e398a6 ("net: dsa: use switchdev obj for VLAN add/del ops")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marcelo Ricardo Leitner
      sctp: allow setting SCTP_SACK_IMMEDIATELY by the application · 293c41f8
      Marcelo Ricardo Leitner authored
      [ Upstream commit 27f7ed2b ]
      
      This patch extends commit b93d6471 ("sctp: implement the sender side
      for SACK-IMMEDIATELY extension"), as it didn't whitelist
      SCTP_SACK_IMMEDIATELY in sctp_msghdr_parse(), causing the flag to be
      treated as invalid and -EINVAL to be returned to the application.
      
      Note that the actual handling of the flag is already there in
      sctp_datamsg_from_user().
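Below is a minimal sketch of a flag whitelist check. The bit values are assumptions chosen for illustration, not the uapi definitions. `check_send_flags()` and the `F_*` names are hypothetical. The test shows a mask missing the SACK-IMMEDIATELY bit rejecting it, and the extended mask accepting it.

```c
#include <errno.h>

/* Flag bits assumed for this sketch only. */
#define F_UNORDERED        (1u << 0)
#define F_ADDR_OVER        (1u << 1)
#define F_ABORT            (1u << 2)
#define F_SACK_IMMEDIATELY (1u << 3)
#define F_EOF              (1u << 9)

/* Whitelist of send flags; any bit outside it yields -EINVAL. */
#define ALLOWED_FLAGS (F_UNORDERED | F_ADDR_OVER | F_ABORT | F_EOF | \
                       F_SACK_IMMEDIATELY)

static int check_send_flags(unsigned int flags, unsigned int allowed)
{
    return (flags & ~allowed) ? -EINVAL : 0;
}
```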
      
      https://tools.ietf.org/html/rfc7053#section-7
      
      Fixes: b93d6471 ("sctp: implement the sender side for SACK-IMMEDIATELY extension")
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>