1. 02 Jul, 2019 10 commits
    • Davide Caratti's avatar
    • Cong Wang's avatar
      idr: introduce idr_for_each_entry_continue_ul() · d39d7149
      Cong Wang authored
      Similarly, other callers of idr_get_next_ul() suffer the same
      overflow bug as they don't handle it properly either.
      
      Introduce idr_for_each_entry_continue_ul() to help these callers
      iterate from a given ID.
      
      cls_flower needs more care here because it still has overflow when
      does arg->cookie++, we have to fold its nested loops into one
      and remove the arg->cookie++.
      
      Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr")
      Fixes: 12d6066c ("net/mlx5: Add flow counters idr")
      Reported-by: default avatarLi Shuang <shuali@redhat.com>
      Cc: Davide Caratti <dcaratti@redhat.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Chris Mi <chrism@mellanox.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Tested-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d39d7149
    • Cong Wang's avatar
      idr: fix overflow case for idr_for_each_entry_ul() · e33d2b74
      Cong Wang authored
      idr_for_each_entry_ul() is buggy as it can't handle overflow
      case correctly. When we have an ID == UINT_MAX, it becomes an
      infinite loop. This happens when running on 32-bit CPU where
      unsigned long has the same size with unsigned int.
      
      There is no better way to fix this than casting it to a larger
      integer, but we can't just 64 bit integer on 32 bit CPU. Instead
      we could just use an additional integer to help us to detect this
      overflow case, that is, adding a new parameter to this macro.
      Fortunately tc action is its only user right now.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Reported-by: default avatarLi Shuang <shuali@redhat.com>
      Tested-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chris Mi <chrism@mellanox.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e33d2b74
    • David S. Miller's avatar
      Merge branch 'vsock-virtio-fixes' · eb1f5c02
      David S. Miller authored
      Stefano Garzarella says:
      
      ====================
      vsock/virtio: several fixes in the .probe() and .remove()
      
      During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
      before registering the driver", Stefan pointed out some possible issues
      in the .probe() and .remove() callbacks of the virtio-vsock driver.
      
      This series tries to solve these issues:
      - Patch 1 adds RCU critical sections to avoid use-after-free of
        'the_virtio_vsock' pointer.
      - Patch 2 stops workers before to call vdev->config->reset(vdev) to
        be sure that no one is accessing the device.
      - Patch 3 moves the works flush at the end of the .remove() to avoid
        use-after-free of 'vsock' object.
      
      v2:
      - Patch 1: use RCU to protect 'the_virtio_vsock' pointer
      - Patch 2: no changes
      - Patch 3: flush works only at the end of .remove()
      - Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
        allocated.
      
      v1: https://patchwork.kernel.org/cover/10964733/
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb1f5c02
    • Stefano Garzarella's avatar
      vsock/virtio: fix flush of works during the .remove() · 0d20e56e
      Stefano Garzarella authored
      This patch moves the flush of works after vdev->config->del_vqs(vdev),
      because we need to be sure that no workers run before to free the
      'vsock' object.
      
      Since we stopped the workers using the [tx|rx|event]_run flags,
      we are sure no one is accessing the device while we are calling
      vdev->config->reset(vdev), so we can safely move the workers' flush.
      
      Before the vdev->config->del_vqs(vdev), workers can be scheduled
      by VQ callbacks, so we must flush them after del_vqs(), to avoid
      use-after-free of 'vsock' object.
      Suggested-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d20e56e
    • Stefano Garzarella's avatar
      vsock/virtio: stop workers during the .remove() · 17dd1367
      Stefano Garzarella authored
      Before to call vdev->config->reset(vdev) we need to be sure that
      no one is accessing the device, for this reason, we add new variables
      in the struct virtio_vsock to stop the workers during the .remove().
      
      This patch also add few comments before vdev->config->reset(vdev)
      and vdev->config->del_vqs(vdev).
      Suggested-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Suggested-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17dd1367
    • Stefano Garzarella's avatar
      vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock · 9c7a5582
      Stefano Garzarella authored
      Some callbacks used by the upper layers can run while we are in the
      .remove(). A potential use-after-free can happen, because we free
      the_virtio_vsock without knowing if the callbacks are over or not.
      
      To solve this issue we move the assignment of the_virtio_vsock at the
      end of .probe(), when we finished all the initialization, and at the
      beginning of .remove(), before to release resources.
      For the same reason, we do the same also for the vdev->priv.
      
      We use RCU to be sure that all callbacks that use the_virtio_vsock
      ended before freeing it. This is not required for callbacks that
      use vdev->priv, because after the vdev->config->del_vqs() we are sure
      that they are ended and will no longer be invoked.
      
      We also take the mutex during the .remove() to avoid that .probe() can
      run while we are resetting the device.
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9c7a5582
    • Taehee Yoo's avatar
      vxlan: do not destroy fdb if register_netdevice() is failed · 7c31e54a
      Taehee Yoo authored
      __vxlan_dev_create() destroys FDB using specific pointer which indicates
      a fdb when error occurs.
      But that pointer should not be used when register_netdevice() fails because
      register_netdevice() internally destroys fdb when error occurs.
      
      This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev
      internally.
      Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan
      dev.
      
      vxlan_fdb_insert() is called after calling register_netdevice().
      This routine can avoid situation that ->ndo_uninit() destroys fdb entry
      in error path of register_netdevice().
      Hence, error path of __vxlan_dev_create() routine can have an opportunity
      to destroy default fdb entry by hand.
      
      Test command
          ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \
      	    dev enp0s9 dstport 4789
      
      Splat looks like:
      [  213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ #256
      [  213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan]
      [  213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d
      [  213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202
      [  213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000
      [  213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0
      [  213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000
      [  213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200
      [  213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0
      [  213.402178] FS:  00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
      [  213.402178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0
      [  213.402178] Call Trace:
      [  213.402178]  __vxlan_dev_create+0x3a9/0x7d0 [vxlan]
      [  213.402178]  ? vxlan_changelink+0x740/0x740 [vxlan]
      [  213.402178]  ? rcu_read_unlock+0x60/0x60 [vxlan]
      [  213.402178]  ? __kasan_kmalloc.constprop.3+0xa0/0xd0
      [  213.402178]  vxlan_newlink+0x8d/0xc0 [vxlan]
      [  213.402178]  ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan]
      [  213.554119]  ? __netlink_ns_capable+0xc3/0xf0
      [  213.554119]  __rtnl_newlink+0xb75/0x1180
      [  213.554119]  ? rtnl_link_unregister+0x230/0x230
      [ ... ]
      
      Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
      Suggested-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c31e54a
    • Marcelo Ricardo Leitner's avatar
      sctp: fix error handling on stream scheduler initialization · 4d141581
      Marcelo Ricardo Leitner authored
      It allocates the extended area for outbound streams only on sendmsg
      calls, if they are not yet allocated.  When using the priority
      stream scheduler, this initialization may imply into a subsequent
      allocation, which may fail.  In this case, it was aborting the stream
      scheduler initialization but leaving the ->ext pointer (allocated) in
      there, thus in a partially initialized state.  On a subsequent call to
      sendmsg, it would notice the ->ext pointer in there, and trip on
      uninitialized stuff when trying to schedule the data chunk.
      
      The fix is undo the ->ext initialization if the stream scheduler
      initialization fails and avoid the partially initialized state.
      
      Although syzkaller bisected this to commit 4ff40b86 ("sctp: set
      chunk transport correctly when it's a new asoc"), this bug was actually
      introduced on the commit I marked below.
      
      Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Tested-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d141581
    • Cong Wang's avatar
      netrom: fix a memory leak in nr_rx_frame() · c8c8218e
      Cong Wang authored
      When the skb is associated with a new sock, just assigning
      it to skb->sk is not sufficient, we have to set its destructor
      to free the sock properly too.
      
      Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8c8218e
  2. 01 Jul, 2019 5 commits
  3. 30 Jun, 2019 6 commits
  4. 29 Jun, 2019 9 commits
    • Baruch Siach's avatar
      net: dsa: mv88e6xxx: wait after reset deactivation · 7b75e49d
      Baruch Siach authored
      Add a 1ms delay after reset deactivation. Otherwise the chip returns
      bogus ID value. This is observed with 88E6390 (Peridot) chip.
      Signed-off-by: default avatarBaruch Siach <baruch@tkos.co.il>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b75e49d
    • Guilherme G. Piccoli's avatar
      bnx2x: Prevent ptp_task to be rescheduled indefinitely · 3c91f25c
      Guilherme G. Piccoli authored
      Currently bnx2x ptp worker tries to read a register with timestamp
      information in case of TX packet timestamping and in case it fails,
      the routine reschedules itself indefinitely. This was reported as a
      kworker always at 100% of CPU usage, which was narrowed down to be
      bnx2x ptp_task.
      
      By following the ioctl handler, we could narrow down the problem to
      an NTP tool (chrony) requesting HW timestamping from bnx2x NIC with
      RX filter zeroed; this isn't reproducible for example with ptp4l
      (from linuxptp) since this tool requests a supported RX filter.
      It seems NIC FW timestamp mechanism cannot work well with
      RX_FILTER_NONE - driver's PTP filter init routine skips a register
      write to the adapter if there's not a supported filter request.
      
      This patch addresses the problem of bnx2x ptp thread's everlasting
      reschedule by retrying the register read 10 times; between the read
      attempts the thread sleeps for an increasing amount of time starting
      in 1ms to give FW some time to perform the timestamping. If it still
      fails after all retries, we bail out in order to prevent an unbound
      resource consumption from bnx2x.
      
      The patch also adds an ethtool statistic for accounting the skipped
      TX timestamp packets and it reduces the priority of timestamping
      error messages to prevent log flooding. The code was tested using
      both linuxptp and chrony.
      Reported-and-tested-by: default avatarPrzemyslaw Hausman <przemyslaw.hausman@canonical.com>
      Suggested-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarGuilherme G. Piccoli <gpiccoli@canonical.com>
      Acked-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3c91f25c
    • Eric Dumazet's avatar
      igmp: fix memory leak in igmpv3_del_delrec() · e5b1c6c6
      Eric Dumazet authored
      im->tomb and/or im->sources might not be NULL, but we
      currently overwrite their values blindly.
      
      Using swap() will make sure the following call to kfree_pmc(pmc)
      will properly free the psf structures.
      
      Tested with the C repro provided by syzbot, which basically does :
      
       socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
       setsockopt(3, SOL_IP, IP_ADD_MEMBERSHIP, "\340\0\0\2\177\0\0\1\0\0\0\0", 12) = 0
       ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=0}) = 0
       setsockopt(3, SOL_IP, IP_MSFILTER, "\340\0\0\2\177\0\0\1\1\0\0\0\1\0\0\0\377\377\377\377", 20) = 0
       ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=IFF_UP}) = 0
       exit_group(0)                    = ?
      
      BUG: memory leak
      unreferenced object 0xffff88811450f140 (size 64):
        comm "softirq", pid 0, jiffies 4294942448 (age 32.070s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
          00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000c7bad083>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<00000000c7bad083>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<00000000c7bad083>] slab_alloc mm/slab.c:3326 [inline]
          [<00000000c7bad083>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
          [<000000009acc4151>] kmalloc include/linux/slab.h:547 [inline]
          [<000000009acc4151>] kzalloc include/linux/slab.h:742 [inline]
          [<000000009acc4151>] ip_mc_add1_src net/ipv4/igmp.c:1976 [inline]
          [<000000009acc4151>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2100
          [<000000004ac14566>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2484
          [<0000000052d8f995>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:959
          [<000000004ee1e21f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1248
          [<0000000066cdfe74>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2618
          [<000000009383a786>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3126
          [<00000000d8ac0c94>] __sys_setsockopt+0x98/0x120 net/socket.c:2072
          [<000000001b1e9666>] __do_sys_setsockopt net/socket.c:2083 [inline]
          [<000000001b1e9666>] __se_sys_setsockopt net/socket.c:2080 [inline]
          [<000000001b1e9666>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2080
          [<00000000420d395e>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
          [<000000007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 24803f38 ("igmp: do not remove igmp souce list info when set link down")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Hangbin Liu <liuhangbin@gmail.com>
      Reported-by: syzbot+6ca1abd0db68b5173a4f@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5b1c6c6
    • David S. Miller's avatar
      Merge branch 'Sub-ns-increment-fixes-in-Macb-PTP' · c09fedd6
      David S. Miller authored
      Harini Katakam says:
      
      ====================
      Sub ns increment fixes in Macb PTP
      
      The subns increment register fields are not captured correctly in the
      driver. Fix the same and also increase the subns incr resolution.
      
      Sub ns resolution was increased to 24 bits in r1p06f2 version. To my
      knowledge, this PTP driver, with its current BD time stamp
      implementation, is only useful to that version or above. So, I have
      increased the resolution unconditionally. Please let me know if there
      is any IP versions incompatible with this - there is no register to
      obtain this information from.
      
      Changes from RFC:
      None
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c09fedd6
    • Harini Katakam's avatar
      net: macb: Fix SUBNS increment and increase resolution · 7ad342bc
      Harini Katakam authored
      The subns increment register has 24 bits as follows:
      RegBit[15:0] = Subns[23:8]; RegBit[31:24] = Subns[7:0]
      
      Fix the same in the driver and increase sub ns resolution to the
      best capable, 24 bits. This should be the case on all GEM versions
      that this PTP driver supports.
      Signed-off-by: default avatarHarini Katakam <harini.katakam@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ad342bc
    • Harini Katakam's avatar
      net: macb: Add separate definition for PPM fraction · a8ee4dc1
      Harini Katakam authored
      The scaled ppm parameter passed to _adjfine() contains a 16 bit
      fraction. This just happens to be the same as SUBNSINCR_SIZE now.
      Hence define this separately.
      Signed-off-by: default avatarHarini Katakam <harini.katakam@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8ee4dc1
    • Jiunn Chang's avatar
      packet: Fix undefined behavior in bit shift · 79293f49
      Jiunn Chang authored
      Shifting signed 32-bit value by 31 bits is undefined.  Changing most
      significant bit to unsigned.
      
      Changes included in v2:
        - use subsystem specific subject lines
        - CC required mailing lists
      Signed-off-by: default avatarJiunn Chang <c0d1n61at3@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79293f49
    • Florian Westphal's avatar
      net: make skb_dst_force return true when dst is refcounted · b60a7738
      Florian Westphal authored
      netfilter did not expect that skb_dst_force() can cause skb to lose its
      dst entry.
      
      I got a bug report with a skb->dst NULL dereference in netfilter
      output path.  The backtrace contains nf_reinject(), so the dst might have
      been cleared when skb got queued to userspace.
      
      Other users were fixed via
      if (skb_dst(skb)) {
      	skb_dst_force(skb);
      	if (!skb_dst(skb))
      		goto handle_err;
      }
      
      But I think its preferable to make the 'dst might be cleared' part
      of the function explicit.
      
      In netfilter case, skb with a null dst is expected when queueing in
      prerouting hook, so drop skb for the other hooks.
      
      v2:
       v1 of this patch returned true in case skb had no dst entry.
       Eric said:
         Say if we have two skb_dst_force() calls for some reason
         on the same skb, only the first one will return false.
      
       This now returns false even when skb had no dst, as per Erics
       suggestion, so callers might need to check skb_dst() first before
       skb_dst_force().
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b60a7738
    • Xin Long's avatar
      sctp: not bind the socket in sctp_connect · 9b6c0887
      Xin Long authored
      Now when sctp_connect() is called with a wrong sa_family, it binds
      to a port but doesn't set bp->port, then sctp_get_af_specific will
      return NULL and sctp_connect() returns -EINVAL.
      
      Then if sctp_bind() is called to bind to another port, the last
      port it has bound will leak due to bp->port is NULL by then.
      
      sctp_connect() doesn't need to bind ports, as later __sctp_connect
      will do it if bp->port is NULL. So remove it from sctp_connect().
      While at it, remove the unnecessary sockaddr.sa_family len check
      as it's already done in sctp_inet_connect.
      
      Fixes: 644fbdea ("sctp: fix the issue that flags are ignored when using kernel_connect")
      Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b6c0887
  5. 28 Jun, 2019 10 commits