1. 03 Oct, 2017 2 commits
  2. 02 Oct, 2017 13 commits
  3. 01 Oct, 2017 8 commits
    • tipc: use only positive error codes in messages · aad06212
      Parthasarathy Bhuvaragan authored
      In commit e3a77561 ("tipc: split up function tipc_msg_eval()"),
      we updated the function tipc_msg_lookup_dest() to set the error
      codes to negative values on destination lookup failures. Thus when
      the function sets the error code to -TIPC_ERR_NO_NAME, it is inserted
      into the 4-bit error field of the message header as 0xf instead of
      TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
      
      In this commit, we set only positive error codes.
      
      Fixes: e3a77561 ("tipc: split up function tipc_msg_eval()")
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
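The truncation described in this commit message can be reproduced in plain C. This is only a sketch: `pack_err_field_ex()` and `ERR_FIELD_MASK_EX` are hypothetical names, and the assumption that the error field is simply the low 4 bits of a word is made for illustration; only the value `TIPC_ERR_NO_NAME == 1` comes from the message above.

```c
#include <stdint.h>

/* Value given in the commit message: TIPC_ERR_NO_NAME is 1. */
#define TIPC_ERR_NO_NAME 1
/* Assumption for this sketch: the error field is the low 4 bits. */
#define ERR_FIELD_MASK_EX 0xfu

/* Hypothetical helper: keep only what fits in a 4-bit field. */
static inline uint32_t pack_err_field_ex(int err)
{
	/* A negative value converts to 0xffffffff...; masking leaves 0xf. */
	return (uint32_t)err & ERR_FIELD_MASK_EX;
}
```

Passing `-TIPC_ERR_NO_NAME` yields `0xf`, while passing `TIPC_ERR_NO_NAME` yields `1`, which is why only positive codes are safe to store.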
    • ppp: fix __percpu annotation · 5a59a3a0
      Guillaume Nault authored
      Move sparse annotation right after pointer type.
      
      Fixes sparse warning:
          drivers/net/ppp/ppp_generic.c:1422:13: warning: incorrect type in initializer (different address spaces)
          drivers/net/ppp/ppp_generic.c:1422:13:    expected void const [noderef] <asn:3>*__vpp_verify
          drivers/net/ppp/ppp_generic.c:1422:13:    got int *<noident>
          ...
      
      Fixes: e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp devices")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'udp-fix-early-demux-for-mcast-packets' · 230583c1
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      udp: fix early demux for mcast packets
      
      Currently the early demux callbacks do not perform source address validation.
      This is not an issue for TCP or UDP unicast, where the early demux
      is only allowed for connected sockets and the source address is validated
      for the first packet and never changes.
      
      The UDP protocol currently also allows early demux for unconnected multicast
      sockets, and we do not currently perform any validation for them after
      the first packet lands on the socket: beyond ignoring the rp_filter - if
      enabled - any kind of martian source is also allowed.
      
      This series addresses the issue by allowing the early demux callback to return an
      error code, and by performing the proper checks for unconnected UDP multicast
      sockets before leveraging the rx dst cache.
      
      Alternatively we could disable the early demux for unconnected mcast sockets,
      but that would cause a significant performance regression - around 50% - while with
      this series, with full rp_filter in place, we keep the regression to a more
      moderate level.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: perform source validation for mcast early demux · bc044e8d
      Paolo Abeni authored
      The UDP early demux can leverage the rx dst cache even for
      multicast unconnected sockets.
      
      In such a scenario the ipv4 source address is validated only on
      the first packet in the given flow. After that, when we fetch
      the dst entry from the socket rx cache, we stop enforcing
      the rp_filter and we even start accepting any kind of martian
      addresses.
      
      Disabling the dst cache for unconnected multicast sockets would
      cause a large performance regression, nearly halving the
      max ingress throughput.
      
      Instead we factor out a route helper to completely validate an
      skb source address for multicast packets and we call it from
      the UDP early demux for mcast packets landing on unconnected
      sockets, after successfully fetching the related cached dst entry.
      
      This still gives a measurable, but limited performance
      regression:
      
      		rp_filter = 0		rp_filter = 1
      edmux disabled:	1182 Kpps		1127 Kpps
      edmux before:	2238 Kpps		2238 Kpps
      edmux after:	2037 Kpps		2019 Kpps
      
      The above figures are on top of current net tree.
      Applying the net-next commit 6e617de8 ("net: avoid a full
      fib lookup when rp_filter is disabled.") the delta with
      rp_filter == 0 will decrease even more.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • IPv4: early demux can return an error code · 7487449c
      Paolo Abeni authored
      Currently no error is emitted, but this infrastructure will
      be used by the next patch to allow source address validation
      for mcast sockets.
      Since early demux can do a route lookup and an ipv4 route
      lookup can return an error code, this is consistent with the
      current ipv4 route infrastructure.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
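The idea of an early demux hook that reports failure can be sketched in user-space C. Everything here is hypothetical (`struct packet_ex`, `early_demux_fn`, `rcv_finish_ex()`); it only illustrates the pattern of a callback returning an int that the caller propagates, not the actual kernel signatures.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a received packet. */
struct packet_ex {
	bool source_ok;	/* result of some source address check */
};

/* Hypothetical callback type: 0 on success, negative errno on failure. */
typedef int (*early_demux_fn)(struct packet_ex *pkt);

/* Example callback that rejects packets failing source validation. */
static inline int demux_with_validation_ex(struct packet_ex *pkt)
{
	return pkt->source_ok ? 0 : -EINVAL;
}

/* Caller: stop processing as soon as the callback reports an error. */
static inline int rcv_finish_ex(struct packet_ex *pkt, early_demux_fn early_demux)
{
	if (early_demux) {
		int err = early_demux(pkt);

		if (err)
			return err;	/* packet would be dropped here */
	}
	return 0;	/* continue with normal input processing */
}
```

With a callback that previously returned nothing, the caller had no way to learn that validation failed; returning an int keeps the hook optional while letting errors surface.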
    • ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · d41bb33b
      Xin Long authored
      When updating the mtu in the tx path, the code currently does not
      account for ARPHRD_ETHER tunnel devices such as ip6gre_tap, for which
      it should also subtract the Ethernet header to get the correct mtu.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
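The arithmetic behind this fix can be illustrated as follows. This is a sketch under assumptions: the function name `tx_path_mtu_ex()` is hypothetical, and the constants are local copies of the usual Linux values (ARPHRD_ETHER is 1, the Ethernet header is 14 bytes) rather than the kernel's own definitions.

```c
/* Local copies of well-known values, suffixed _EX to mark them as
 * illustration-only rather than the kernel's own macros. */
#define ARPHRD_ETHER_EX	1	/* Ethernet hardware type */
#define ETH_HLEN_EX	14	/* Ethernet header length in bytes */

/* Hypothetical helper: for an Ethernet-type tunnel device the inner
 * Ethernet header also consumes room, so it is subtracted from the mtu.
 * Assumes mtu is at least ETH_HLEN_EX. */
static inline unsigned int tx_path_mtu_ex(unsigned int mtu,
					  unsigned short dev_type)
{
	if (dev_type == ARPHRD_ETHER_EX)
		mtu -= ETH_HLEN_EX;
	return mtu;
}
```

For example, a computed mtu of 1400 becomes 1386 for a tap-style device and stays 1400 for other device types.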
    • ip6_gre: ip6gre_tap device should keep dst · 2d40557c
      Xin Long authored
      The patch 'ip_gre: ipgre_tap device should keep dst' fixed
      an issue where the ipgre_tap mtu couldn't be updated in the tx path.
      
      The same fix is needed for ip6gre_tap as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: ipgre_tap device should keep dst · d51711c0
      Xin Long authored
      Without keeping dst, the tunnel will not update any mtu/pmtu info,
      since it does not have a dst on the skb.
      
      Reproducer:
        client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server
      
      After reducing eth1's mtu on the client, performance dropped to 0.
      
      This patch calls netif_keep_dst in gre_tap_init, as ipgre does.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 30 Sep, 2017 1 commit
    • netlink: do not proceed if dump's start() errs · fef0035c
      Jason A. Donenfeld authored
      Drivers that use the start method for netlink dumping rely on dumpit not
      being called if start fails. For example, ila_xlat.c allocates memory
      and assigns it to cb->args[0] in its start() function. It might fail to
      do that and return -ENOMEM instead. However, even when returning an
      error, dumpit will be called, which, in the example above, quickly
      dereferences the memory in cb->args[0], which will OOPS the kernel. This
      is but one example of how this goes wrong.
      
      Since start() has always been a function with an int return type, it
      therefore makes sense to use it properly, rather than ignoring it. This
      patch thus returns early and does not call dumpit() when start() fails.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
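The control flow described above, returning early when start() fails so that dumpit() never sees uninitialized state, can be sketched like this. The types and names (`struct dump_cb_ex`, `run_dump_ex()`, the two example callbacks) are hypothetical and much simpler than the real netlink callback structure.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified dump control block. */
struct dump_cb_ex {
	void *state;				/* set up by start() */
	int (*start)(struct dump_cb_ex *cb);	/* optional */
	int (*dumpit)(struct dump_cb_ex *cb);
};

static int dumpit_calls_ex;	/* counts dumpit invocations in this sketch */

/* Example start() that fails to allocate its state. */
static inline int failing_start_ex(struct dump_cb_ex *cb)
{
	cb->state = NULL;
	return -ENOMEM;
}

/* Example dumpit() that would dereference cb->state in real code. */
static inline int counting_dumpit_ex(struct dump_cb_ex *cb)
{
	(void)cb;
	dumpit_calls_ex++;
	return 0;
}

/* Honor start()'s return value instead of ignoring it. */
static inline int run_dump_ex(struct dump_cb_ex *cb)
{
	if (cb->start) {
		int err = cb->start(cb);

		if (err)
			return err;	/* dumpit() is never reached */
	}
	return cb->dumpit(cb);
}
```

With `failing_start_ex` installed, `run_dump_ex()` returns `-ENOMEM` and the dumpit counter stays at zero.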
  5. 29 Sep, 2017 1 commit
  6. 28 Sep, 2017 15 commits
    • net: Set sk_prot_creator when cloning sockets to the right proto · 9d538fa6
      Christoph Paasch authored
      sk->sk_prot and sk->sk_prot_creator can differ when the app uses
      IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
      That is why sk_prot_creator exists: it makes sure that sk_prot_free()
      does the kmem_cache_free() on the right kmem_cache slab.
      
      Now, if such a socket gets transformed back to a listening socket (using
      connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
      sk_clone_lock() when a new connection comes in. But sk_prot_creator will
      still point to the IPv6 kmem_cache (as everything got copied in
      sk_clone_lock()). When freeing, we will thus put this
      memory back into the IPv6 kmem_cache although it was allocated in the
      IPv4 cache. I have seen memory corruption happening because of this.
      
      With slub-debugging and MEMCG_KMEM enabled this gives the warning
      	"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"
      
      A C-program to trigger this:
      
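      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int main(void)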
      {
              int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
              int new_fd, newest_fd, client_fd;
              struct sockaddr_in6 bind_addr;
              struct sockaddr_in bind_addr4, client_addr1, client_addr2;
              struct sockaddr unsp;
              int val;
      
              memset(&bind_addr, 0, sizeof(bind_addr));
              bind_addr.sin6_family = AF_INET6;
              bind_addr.sin6_port = ntohs(42424);
      
              memset(&client_addr1, 0, sizeof(client_addr1));
              client_addr1.sin_family = AF_INET;
              client_addr1.sin_port = ntohs(42424);
              client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&client_addr2, 0, sizeof(client_addr2));
              client_addr2.sin_family = AF_INET;
              client_addr2.sin_port = ntohs(42421);
              client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&unsp, 0, sizeof(unsp));
              unsp.sa_family = AF_UNSPEC;
      
              bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));
      
              listen(fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1));
              new_fd = accept(fd, NULL, NULL);
              close(fd);
      
              val = AF_INET;
              setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val));
      
              connect(new_fd, &unsp, sizeof(unsp));
      
              memset(&bind_addr4, 0, sizeof(bind_addr4));
              bind_addr4.sin_family = AF_INET;
              bind_addr4.sin_port = ntohs(42421);
              bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4));
      
              listen(new_fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2));
      
              newest_fd = accept(new_fd, NULL, NULL);
              close(new_fd);
      
              close(client_fd);
              close(new_fd);
      }
      
      As far as I can see, this bug has been there since the beginning of the
      git-days.
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
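The mismatch described in this message can be modelled with two small structs. This is a sketch with hypothetical types (`struct proto_ex`, `struct sock_ex`) and a hypothetical `clone_sock_ex()`; it assumes, based on the commit title, that the fix amounts to pointing the clone's creator at the proto it was actually allocated from.

```c
/* Hypothetical stand-ins: a proto owns a slab cache, identified by name. */
struct proto_ex {
	const char *slab_name;
};

struct sock_ex {
	const struct proto_ex *prot;		/* current protocol ops */
	const struct proto_ex *prot_creator;	/* cache used for freeing */
};

/* Clone a socket. The struct copy carries over the parent's creator,
 * which may be stale after an address-family transformation, so the
 * creator is reset to the proto the clone is allocated from. */
static inline void clone_sock_ex(struct sock_ex *newsk, const struct sock_ex *sk)
{
	*newsk = *sk;				/* copies prot_creator too */
	newsk->prot_creator = sk->prot;		/* free to the matching cache */
}
```

If the parent has `prot` pointing at a "TCP" proto and `prot_creator` still pointing at "TCPv6", the clone ends up with both fields pointing at "TCP".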
    • net: dsa: mv88e6xxx: lock mutex when freeing IRQs · b32ca44a
      Vivien Didelot authored
      mv88e6xxx_g2_irq_free locks the registers mutex, but not
      mv88e6xxx_g1_irq_free, which results in a stack trace from
      assert_reg_lock when unloading the mv88e6xxx module. Fix this.
      
      Fixes: 3460a577 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: only test po->has_vnet_hdr once in packet_snd · da7c9561
      Willem de Bruijn authored
      Packet socket option po->has_vnet_hdr can be updated concurrently with
      other operations if no ring is attached.
      
      Do not test the option twice in packet_snd, as the value may change in
      between calls. A race on setsockopt disable may cause a packet > mtu
      to be sent without having GSO options set.
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
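The "test once" pattern from this message can be sketched as reading the flag into a local and deriving every later decision from that snapshot. The struct and function names below (`struct packet_sock_ex`, `struct send_plan_ex`, `plan_send_ex()`) are hypothetical; the sketch does not reproduce packet_snd itself.

```c
#include <stdbool.h>

/* Hypothetical socket state; the flag may be changed by another thread. */
struct packet_sock_ex {
	volatile bool has_vnet_hdr;
};

/* Decisions that must agree with each other for one send call. */
struct send_plan_ex {
	bool parse_vnet_hdr;	/* consume a header from the message */
	bool allow_oversize;	/* permit a packet larger than the mtu */
};

static inline struct send_plan_ex plan_send_ex(const struct packet_sock_ex *po)
{
	/* Read the shared flag exactly once ... */
	bool has_vnet_hdr = po->has_vnet_hdr;
	struct send_plan_ex plan;

	/* ... and base both decisions on the same snapshot. */
	plan.parse_vnet_hdr = has_vnet_hdr;
	plan.allow_oversize = has_vnet_hdr;
	return plan;
}
```

Testing `po->has_vnet_hdr` separately for each decision would let a concurrent setsockopt make the two answers disagree, which is the race the message describes.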
    • packet: in packet_do_bind, test fanout with bind_lock held · 4971613c
      Willem de Bruijn authored
      Once a socket has po->fanout set, it remains a member of the group
      until it is destroyed. The prot_hook must be constant and identical
      across sockets in the group.
      
      If fanout_add races with packet_do_bind between the test of po->fanout
      and taking the lock, the bind call may make type or dev inconsistent
      with that of the fanout group.
      
      Hold po->bind_lock when testing po->fanout to avoid this race.
      
      I had to introduce artificial delay (local_bh_enable) to actually
      observe the race.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac4: Re-enable MAC Rx before powering down · 1579f678
      Ed Blake authored
      Re-enable the MAC receiver by setting CONFIG_RE before powering down,
      as instructed in section 6.3.5.1 of [1].  Without this the MAC fails
      to receive WoL packets and never wakes up.
      
      [1] DWC Ethernet QoS Databook 4.10a October 2014
      Signed-off-by: Ed Blake <ed.blake@sondrel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwc-qos: Add suspend / resume support · 06d7a1b9
      Ed Blake authored
      Add hook to stmmac_pltfr_pm_ops for suspend / resume handling.
      Signed-off-by: Ed Blake <ed.blake@sondrel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Fix network device registration order · e804441c
      Florian Fainelli authored
      We cannot register the network device first, then set its carrier
      off, and only then connect it to a PHY. Doing so leaves a window
      during which the carrier is at best inconsistent, and at worst
      the device is not usable without a down/up sequence, since the network
      device is visible to user space with possibly no PHY device attached.
      
      Re-order the steps so that they make logical sense. This fixes some
      devices where the port was not usable after e.g. an unbind then bind
      of the driver.
      
      Fixes: 0071f56e ("dsa: Register netdev before phy")
      Fixes: 91da11f8 ("net: Distributed Switch Architecture protocol support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Allow dsa and cpu ports in multiple vlans · db06ae41
      Andrew Lunn authored
      Ports with the same VLAN must all be in the same bridge. However the
      CPU and DSA ports need to be in multiple VLANs spread over multiple
      bridges. So exclude them when performing this test.
      
      Fixes: b2f81d30 ("net: dsa: add CPU and DSA ports as VLAN members")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inetpeer: fix RCU lookup() again · 35f493b8
      Eric Dumazet authored
      My prior fix was not complete, as we were dereferencing a pointer
      three times per node, not twice as I initially thought.
      
      Fixes: 4cc5b44b ("inetpeer: fix RCU lookup()")
      Fixes: b145425f ("inetpeer: remove AVL implementation in favor of RB tree")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mvpp2-various-fixes' · 2d3924c2
      David S. Miller authored
      Antoine Tenart says:
      
      ====================
      net: mvpp2: various fixes
      
      This series contains 3 fixes for the Marvell PPv2 driver.
      
      Since v1:
        - Removed one patch about dma masks as it would need a better fix.
        - Added one fix about the MAC Tx clock source selection.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: do not select the internal source clock · c7dfc8c8
      Antoine Tenart authored
      This patch stops the internal MAC Tx clock from being enabled, as the
      internal clock isn't used. The definition used for the bit controlling
      this behaviour is also renamed, since it was wrongly named (bit 4 of
      GMAC_CTRL_2_REG).
      
      Fixes: 3919357f ("net: mvpp2: initialize the GMAC when using a port")
      Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix port list indexing · 6bf69a1d
      Yan Markman authored
      The private port_list array holds pointers to mvpp2_port
      instances. This list is allocated according to the number of ports
      enabled in the device tree, but the pointers are set using the port-id
      property. If only a single port is enabled, the port_list array will be
      of size 1, but when registering the port, if its id is not 0 the driver
      will crash. Other crashes were encountered in various situations.
      
      This fixes the issue by using an index that is not tied to the value
      of the port-id property.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
      Signed-off-by: Yan Markman <ymarkman@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
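The indexing problem can be sketched with a small array sized by the number of enabled ports. All names here (`struct port_ex`, `struct priv_ex`, `register_port_ex()`) are hypothetical; the sketch assumes the fix stores each port at the next free slot instead of at `port_list[id]`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct port_ex {
	int id;				/* value of the port-id property */
};

struct priv_ex {
	struct port_ex **port_list;	/* sized by number of enabled ports */
	int port_count;			/* capacity of port_list */
	int next_index;			/* next free slot */
};

/* Store the port at a running index rather than at port_list[port->id],
 * which would overrun the array whenever id >= port_count. */
static inline int register_port_ex(struct priv_ex *priv, struct port_ex *port)
{
	if (priv->next_index >= priv->port_count)
		return -1;		/* no room left */
	priv->port_list[priv->next_index++] = port;
	return 0;
}
```

With a single enabled port whose id is 2, the port lands in slot 0 of a one-element array instead of being written two slots past its end.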
    • net: mvpp2: fix parsing fragmentation detection · aff3da39
      Stefan Chulski authored
      Parsing fragmentation detection failed due to wrongly configured
      parser TCAM entries. Some traffic was marked as fragmented in the RX
      descriptor even though it wasn't IP fragmented. The hardware also
      failed to calculate checksums, which led to software checksumming and
      caused performance degradation.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
      Signed-off-by: Stefan Chulski <stefanc@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: bail out from tun_get_user() if the skb is empty · 2580c4c1
      Alexander Potapenko authored
      KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
      skb->data[0] in the case the skb is empty (i.e. skb->len is 0):
      
      ================================================
      BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
      CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
      ...
       __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
       tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
      ...
      origin:
      ...
       kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
       slab_alloc_node mm/slub.c:2732
       __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:903
       alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
       sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
       tun_alloc_skb drivers/net/tun.c:1144
       tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
      ================================================
      
      Make sure tun_get_user() doesn't touch skb->data[0] unless there is
      actual data.
      
      C reproducer below:
      ==========================
          // autogenerated by syzkaller (http://github.com/google/syzkaller)
      
          #define _GNU_SOURCE
      
          #include <fcntl.h>
          #include <linux/if_tun.h>
          #include <netinet/ip.h>
          #include <net/if.h>
          #include <string.h>
          #include <sys/ioctl.h>
          #include <unistd.h>
      
          int main()
          {
            int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
            int tun_fd = open("/dev/net/tun", O_RDWR);
            struct ifreq req;
            memset(&req, 0, sizeof(struct ifreq));
            strcpy((char*)&req.ifr_name, "gre0");
            req.ifr_flags = IFF_UP | IFF_MULTICAST;
            ioctl(tun_fd, TUNSETIFF, &req);
            ioctl(sock, SIOCSIFFLAGS, "gre0");
            write(tun_fd, "hi", 0);
            return 0;
          }
      ==========================
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
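The guard this commit asks for, not touching the first data byte unless there is data, can be sketched in user-space C. The function name `first_byte_ip_version_ex()` is hypothetical, and the assumption that the first byte is inspected for an IP version nibble is made for illustration only.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: return the IP version encoded in the high nibble
 * of the first byte, or -EINVAL when the buffer is empty. The length
 * check comes first so that data[0] is never read for an empty buffer. */
static inline int first_byte_ip_version_ex(const unsigned char *data, size_t len)
{
	if (len == 0)
		return -EINVAL;
	return data[0] >> 4;
}
```

A zero-length write, like the `write(tun_fd, "hi", 0)` in the reproducer above, maps to the `len == 0` branch and is rejected before any byte is examined.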
    • net/mlx5: Fix wrong indentation in enable SRIOV code · 353f59f4
      Or Gerlitz authored
      Smatch is screaming:
      
      drivers/net/ethernet/mellanox/mlx5/core/sriov.c:112
      	mlx5_device_enable_sriov() warn: inconsistent indenting
      
      Fix that.
      
      Fixes: 7ecf6d8f ('IB/mlx5: Restore IB guid/policy for virtual functions')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>