1. 03 Feb, 2018 28 commits
  2. 31 Jan, 2018 12 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.114 · 49fe90b8
      Greg Kroah-Hartman authored
      49fe90b8
    • Ben Hutchings's avatar
      nfsd: auth: Fix gid sorting when rootsquash enabled · 3f84339b
      Ben Hutchings authored
      commit 19952667 upstream.
      
      Commit bdcf0a42 ("kernel: make groups_sort calling a responsibility
      group_info allocators") appears to break nfsd rootsquash in a pretty
      major way.
      
      It adds a call to groups_sort() inside the loop that copies/squashes
      gids, which means the valid gids are sorted along with the following
      garbage.  The net result is that the highest numbered valid gids are
      replaced with any lower-valued garbage gids, possibly including 0.
      
      We should sort only once, after filling in all the gids.
      
      Fixes: bdcf0a42 ("kernel: make groups_sort calling a responsibility ...")
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Acked-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Wolfgang Walter <linux@stwm.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f84339b
    • Dan Streetman's avatar
      net: tcp: close sock if net namespace is exiting · edaafa80
      Dan Streetman authored
      
      [ Upstream commit 4ee806d5 ]
      
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold a reference to their interface, which are all transferred to the
      namespace's loopback interface when the real interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811Signed-off-by: default avatarDan Streetman <ddstreet@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edaafa80
    • Eric Dumazet's avatar
      flow_dissector: properly cap thoff field · d35cd5e2
      Eric Dumazet authored
      
      [ Upstream commit d0c081b4 ]
      
      syzbot reported yet another crash [1] that is caused by
      insufficient validation of DODGY packets.
      
      Two bugs are happening here to trigger the crash.
      
      1) Flow dissection leaves with incorrect thoff field.
      
      2) skb_probe_transport_header() sets transport header to this invalid
      thoff, even if pointing after skb valid data.
      
      3) qdisc_pkt_len_init() reads out-of-bound data because it
      trusts tcp_hdrlen(skb)
      
      Possible fixes :
      
      - Full flow dissector validation before injecting bad DODGY packets in
      the stack.
       This approach was attempted here : https://patchwork.ozlabs.org/patch/
      861874/
      
      - Have more robust functions in the core.
        This might be needed anyway for stable versions.
      
      This patch fixes the flow dissection issue.
      
      [1]
      CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:355 [inline]
       kasan_report+0x23b/0x360 mm/kasan/report.c:413
       __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
       __tcp_hdrlen include/linux/tcp.h:35 [inline]
       tcp_hdrlen include/linux/tcp.h:40 [inline]
       qdisc_pkt_len_init net/core/dev.c:3160 [inline]
       __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
       packet_snd net/packet/af_packet.c:2943 [inline]
       packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
       sock_sendmsg_nosec net/socket.c:628 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:638
       sock_write_iter+0x31a/0x5d0 net/socket.c:907
       call_write_iter include/linux/fs.h:1776 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Fixes: 34fad54c ("net: __skb_flow_dissect() must cap its return value")
      Fixes: a6e544b0 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d35cd5e2
    • Jim Westfall's avatar
      ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY · cab84514
      Jim Westfall authored
      
      [ Upstream commit cd9ff4de ]
      
      Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
      to avoid making an entry for every remote ip the device needs to talk to.
      
      This used the be the old behavior but became broken in a263b309
      (ipv4: Make neigh lookups directly in output packet path) and later removed
      in 0bb4087c (ipv4: Fix neigh lookup keying over loopback/point-to-point
      devices) because it was broken.
      Signed-off-by: default avatarJim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cab84514
    • Jim Westfall's avatar
      net: Allow neigh contructor functions ability to modify the primary_key · 29837a4a
      Jim Westfall authored
      
      [ Upstream commit 096b9854 ]
      
      Use n->primary_key instead of pkey to account for the possibility that a neigh
      constructor function may have modified the primary_key value.
      Signed-off-by: default avatarJim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29837a4a
    • Neil Horman's avatar
      vmxnet3: repair memory leak · 0d9bcadb
      Neil Horman authored
      
      [ Upstream commit 848b1598 ]
      
      with the introduction of commit
      b0eb57cb, it appears that rq->buf_info
      is improperly handled.  While it is heap allocated when an rx queue is
      setup, and freed when torn down, an old line of code in
      vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
      being set to NULL prior to its being freed, causing a memory leak, which
      eventually exhausts the system on repeated create/destroy operations
      (for example, when  the mtu of a vmxnet3 interface is changed
      frequently.
      
      Fix is pretty straight forward, just move the NULL set to after the
      free.
      
      Tested by myself with successful results
      
      Applies to net, and should likely be queued for stable, please
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-By: boyang@redhat.com
      CC: boyang@redhat.com
      CC: Shrikrishna Khare <skhare@vmware.com>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      CC: David S. Miller <davem@davemloft.net>
      Acked-by: default avatarShrikrishna Khare <skhare@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d9bcadb
    • Xin Long's avatar
      sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf · 50194c3f
      Xin Long authored
      
      [ Upstream commit a0ff6600 ]
      
      After commit cea0cc80 ("sctp: use the right sk after waking up from
      wait_buf sleep"), it may change to lock another sk if the asoc has been
      peeled off in sctp_wait_for_sndbuf.
      
      However, the asoc's new sk could be already closed elsewhere, as it's in
      the sendmsg context of the old sk that can't avoid the new sk's closing.
      If the sk's last one refcnt is held by this asoc, later on after putting
      this asoc, the new sk will be freed, while under it's own lock.
      
      This patch is to revert that commit, but fix the old issue by returning
      error under the old sk's lock.
      
      Fixes: cea0cc80 ("sctp: use the right sk after waking up from wait_buf sleep")
      Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50194c3f
    • Xin Long's avatar
      sctp: do not allow the v4 socket to bind a v4mapped v6 address · 23f521bc
      Xin Long authored
      
      [ Upstream commit c5006b8a ]
      
      The check in sctp_sockaddr_af is not robust enough to forbid binding a
      v4mapped v6 addr on a v4 socket.
      
      The worse thing is that v4 socket's bind_verify would not convert this
      v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
      socket bound a v6 addr.
      
      This patch is to fix it by doing the common sa.sa_family check first,
      then AF_INET check for v4mapped v6 addrs.
      
      Fixes: 7dab83de ("sctp: Support ipv6only AF_INET6 sockets.")
      Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23f521bc
    • Francois Romieu's avatar
      r8169: fix memory corruption on retrieval of hardware statistics. · d6611191
      Francois Romieu authored
      
      [ Upstream commit a78e9366 ]
      
      Hardware statistics retrieval hurts in tight invocation loops.
      
      Avoid extraneous write and enforce strict ordering of writes targeted to
      the tally counters dump area address registers.
      Signed-off-by: default avatarFrancois Romieu <romieu@fr.zoreil.com>
      Tested-by: default avatarOliver Freyermuth <o.freyermuth@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6611191
    • Guillaume Nault's avatar
      pppoe: take ->needed_headroom of lower device into account on xmit · a49558eb
      Guillaume Nault authored
      
      [ Upstream commit 02612bb0 ]
      
      In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
      was probably fine before the introduction of ->needed_headroom in
      commit f5184d26 ("net: Allow netdevices to specify needed head/tailroom").
      
      But now, virtual devices typically advertise the size of their overhead
      in dev->needed_headroom, so we must also take it into account in
      skb_reserve().
      Allocation size of skb is also updated to take dev->needed_tailroom
      into account and replace the arbitrary 32 bytes with the real size of
      a PPPoE header.
      
      This issue was discovered by syzbot, who connected a pppoe socket to a
      gre device which had dev->header_ops->create == ipgre_header and
      dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
      headroom, and dev_hard_header() crashed when ipgre_header() tried to
      prepend its header to skb->data.
      
      skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
      head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:104!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
      4.15.0-rc7-next-20180115+ #97
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
      RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
      RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
      RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
      RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
      R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
      FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        skb_under_panic net/core/skbuff.c:114 [inline]
        skb_push+0xce/0xf0 net/core/skbuff.c:1714
        ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
        dev_hard_header include/linux/netdevice.h:2723 [inline]
        pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        sock_write_iter+0x31a/0x5d0 net/socket.c:909
        call_write_iter include/linux/fs.h:1775 [inline]
        do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
        do_iter_write+0x154/0x540 fs/read_write.c:932
        vfs_writev+0x18a/0x340 fs/read_write.c:977
        do_writev+0xfc/0x2a0 fs/read_write.c:1012
        SYSC_writev fs/read_write.c:1085 [inline]
        SyS_writev+0x27/0x30 fs/read_write.c:1082
        entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
      interfaces, but reserving space for ->needed_headroom is a more
      fundamental issue that needs to be addressed first.
      
      Same problem exists for __pppoe_xmit(), which also needs to take
      dev->needed_headroom into account in skb_cow_head().
      
      Fixes: f5184d26 ("net: Allow netdevices to specify needed head/tailroom")
      Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a49558eb
    • Eric Dumazet's avatar
      net: qdisc_pkt_len_init() should be more robust · f50fc5f4
      Eric Dumazet authored
      
      [ Upstream commit 7c68d1a6 ]
      
      Without proper validation of DODGY packets, we might very well
      feed qdisc_pkt_len_init() with invalid GSO packets.
      
      tcp_hdrlen() might access out-of-bound data, so let's use
      skb_header_pointer() and proper checks.
      
      Whole story is described in commit d0c081b4 ("flow_dissector:
      properly cap thoff field")
      
      We have the goal of validating DODGY packets earlier in the stack,
      so we might very well revert this fix in the future.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f50fc5f4