1. 15 Jan, 2017 8 commits
    • Paul Blakey's avatar
      net/sched: cls_flower: Fix missing addr_type in classify · 9b4a34ff
      Paul Blakey authored
      [ Upstream commit 0df0f207 ]
      
      Since we now use a non zero mask on addr_type, we are matching on its
      value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
      failed in SW/classify path since its value was zero.
      This patch sets the proper value of addr_type for encapsulated packets.
      
      Fixes: 970bfcd0 ('net/sched: cls_flower: Use mask for addr_type')
      Signed-off-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reviewed-by: default avatarHadar Hen Zion <hadarh@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b4a34ff
    • Florian Fainelli's avatar
      net: stmmac: Fix race between stmmac_drv_probe and stmmac_open · 99f40c6b
      Florian Fainelli authored
      [ Upstream commit 57016590 ]
      
      There is currently a small window during which the network device registered by
      stmmac can be made visible, yet all resources, including and clock and MDIO bus
      have not had a chance to be set up, this can lead to the following error to
      occur:
      
      [  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                      stmmac_dvr_probe: warning: cannot get CSR clock
      [  473.919382] stmmaceth 0000:01:00.0: no reset control found
      [  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
      [  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
      [  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
      [  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
      [  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
                      Enable RX Mitigation via HW Watchdog Timer
      [  473.921395] libphy: PHY stmmac-1:00 not found
      [  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
      [  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
                      PHY (error: -19)
      [  473.959710] libphy: stmmac: probed
      [  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
                      (stmmac-1:00) active
      [  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
                      (stmmac-1:01)
      [  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
                      (stmmac-1:02)
      [  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
                      (stmmac-1:03)
      
      Fix this by making sure that register_netdev() is the last thing being done,
      which guarantees that the clock and the MDIO bus are available.
      
      Fixes: 4bfcbd7a ("stmmac: Move the mdio_register/_unregister in probe/remove")
      Reported-by: default avatarKweh, Hock Leong <hock.leong.kweh@intel.com>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99f40c6b
    • Daniel Borkmann's avatar
      net, sched: fix soft lockup in tc_classify · 09babe4c
      Daniel Borkmann authored
      [ Upstream commit 628185cf ]
      
      Shahar reported a soft lockup in tc_classify(), where we run into an
      endless loop when walking the classifier chain due to tp->next == tp
      which is a state we should never run into. The issue only seems to
      trigger under load in the tc control path.
      
      What happens is that in tc_ctl_tfilter(), thread A allocates a new
      tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
      with it. In that classifier callback we had to unlock/lock the rtnl
      mutex and returned with -EAGAIN. One reason why we need to drop there
      is, for example, that we need to request an action module to be loaded.
      
      This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
      after we loaded and found the requested action, we need to redo the
      whole request so we don't race against others. While we had to unlock
      rtnl in that time, thread B's request was processed next on that CPU.
      Thread B added a new tp instance successfully to the classifier chain.
      When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
      and destroying its tp instance which never got linked, we goto replay
      and redo A's request.
      
      This time when walking the classifier chain in tc_ctl_tfilter() for
      checking for existing tp instances we had a priority match and found
      the tp instance that was created and linked by thread B. Now calling
      again into tp->ops->change() with that tp was successful and returned
      without error.
      
      tp_created was never cleared in the second round, thus kernel thinks
      that we need to link it into the classifier chain (once again). tp and
      *back point to the same object due to the match we had earlier on. Thus
      for thread B's already public tp, we reset tp->next to tp itself and
      link it into the chain, which eventually causes the mentioned endless
      loop in tc_classify() once a packet hits the data path.
      
      Fix is to clear tp_created at the beginning of each request, also when
      we replay it. On the paths that can cause -EAGAIN we already destroy
      the original tp instance we had and on replay we really need to start
      from scratch. It seems that this issue was first introduced in commit
      12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
      and avoid kernel panic when we use cls_cgroup").
      
      Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
      Reported-by: default avatarShahar Klein <shahark@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarShahar Klein <shahark@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09babe4c
    • Dave Jones's avatar
      ipv6: handle -EFAULT from skb_copy_bits · ee99e2bc
      Dave Jones authored
      [ Upstream commit a98f9175 ]
      
      By setting certain socket options on ipv6 raw sockets, we can confuse the
      length calculation in rawv6_push_pending_frames triggering a BUG_ON.
      
      RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
      RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
      RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
      RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
      RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
      R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
      R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
      
      Call Trace:
       [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
       [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
       [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
       [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
       [<ffffffff816da27e>] SyS_sendto+0xe/0x10
       [<ffffffff81002910>] do_syscall_64+0x50/0xa0
       [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
      
      Reproducer:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      
      #define LEN 504
      
      int main(int argc, char* argv[])
      {
      	int fd;
      	int zero = 0;
      	char buf[LEN];
      
      	memset(buf, 0, LEN);
      
      	fd = socket(AF_INET6, SOCK_RAW, 7);
      
      	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
      	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
      
      	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
      }
      Signed-off-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee99e2bc
    • Willem de Bruijn's avatar
      inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets · d36a1cb1
      Willem de Bruijn authored
      [ Upstream commit 39b2dd76 ]
      
      Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
      the packet. For sockets that have transport headers pulled, transport
      offset can be negative. Use signed comparison to avoid overflow.
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Reported-by: default avatarNisar Jagabar <njagabar@cloudmark.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d36a1cb1
    • Xin Long's avatar
      sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null · ed3cc329
      Xin Long authored
      [ Upstream commit 08abb795 ]
      
      Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
      when it failed to find a transport by sctp_addrs_lookup_transport.
      
      This patch is to fix it by moving up rcu_read_unlock right before checking
      transport and also to remove the out path.
      
      Fixes: 1cceda78 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed3cc329
    • David Ahern's avatar
      net: vrf: Drop conntrack data after pass through VRF device on Tx · 8b8fbe5c
      David Ahern authored
      [ Upstream commit eb63ecc1 ]
      
      Locally originated traffic in a VRF fails in the presence of a POSTROUTING
      rule. For example,
      
          $ iptables -t nat -A POSTROUTING -s 11.1.1.0/24  -j MASQUERADE
          $ ping -I red -c1 11.1.1.3
          ping: Warning: source address might be selected on device other than red.
          PING 11.1.1.3 (11.1.1.3) from 11.1.1.2 red: 56(84) bytes of data.
          ping: sendmsg: Operation not permitted
      
      Worse, the above causes random corruption resulting in a panic in random
      places (I have not seen a consistent backtrace).
      
      Call nf_reset to drop the conntrack info following the pass through the
      VRF device.  The nf_reset is needed on Tx but not Rx because of the order
      in which NF_HOOK's are hit: on Rx the VRF device is after the real ingress
      device and on Tx it is is before the real egress device. Connection
      tracking should be tied to the real egress device and not the VRF device.
      
      Fixes: 8f58336d ("net: Add ethernet header for pass through VRF device")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b8fbe5c
    • David Ahern's avatar
      net: vrf: Fix NAT within a VRF · d4a0b2e4
      David Ahern authored
      [ Upstream commit a0f37efa ]
      
      Connection tracking with VRF is broken because the pass through the VRF
      device drops the connection tracking info. Removing the call to nf_reset
      allows DNAT and MASQUERADE to work across interfaces within a VRF.
      
      Fixes: 73e20b76 ("net: vrf: Add support for PREROUTING rules on vrf device")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4a0b2e4
  2. 12 Jan, 2017 32 commits