1. 02 Oct, 2020 7 commits
    • Coly Li's avatar
      tcp: use sendpage_ok() to detect misused .sendpage · cf83a17e
      Coly Li authored
      commit a10674bf ("tcp: detecting the misuse of .sendpage for Slab
      objects") adds the checks for Slab pages, but the pages don't have
      page_count are still missing from the check.
      
      Network layer's sendpage method is not designed to send page_count 0
      pages neither, therefore both PageSlab() and page_count() should be
      both checked for the sending page. This is exactly what sendpage_ok()
      does.
      
      This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused
      .sendpage, to make the code more robust.
      
      Fixes: a10674bf ("tcp: detecting the misuse of .sendpage for Slab objects")
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf83a17e
    • Coly Li's avatar
      nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() · 7d4194ab
      Coly Li authored
      Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
      send slab pages. But for pages allocated by __get_free_pages() without
      __GFP_COMP, which also have refcount as 0, they are still sent by
      kernel_sendpage() to remote end, this is problematic.
      
      The new introduced helper sendpage_ok() checks both PageSlab tag and
      page_count counter, and returns true if the checking page is OK to be
      sent by kernel_sendpage().
      
      This patch fixes the page checking issue of nvme_tcp_try_send_data()
      with sendpage_ok(). If sendpage_ok() returns true, send this page by
      kernel_sendpage(), otherwise use sock_no_sendpage to handle this page.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Vlastimil Babka <vbabka@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d4194ab
    • Coly Li's avatar
      net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send · 7b62d31d
      Coly Li authored
      If a page sent into kernel_sendpage() is a slab page or it doesn't have
      ref_count, this page is improper to send by the zero copy sendpage()
      method. Otherwise such page might be unexpected released in network code
      path and causes impredictable panic due to kernel memory management data
      structure corruption.
      
      This path adds a WARN_ON() on the sending page before sends it into the
      concrete zero-copy sendpage() method, if the page is improper for the
      zero-copy sendpage() method, a warning message can be observed before
      the consequential unpredictable kernel panic.
      
      This patch does not change existing kernel_sendpage() behavior for the
      improper page zero-copy send, it just provides hint warning message for
      following potential panic due the kernel memory heap corruption.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Cc: Cong Wang <amwang@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Sridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b62d31d
    • Coly Li's avatar
      net: introduce helper sendpage_ok() in include/linux/net.h · c381b079
      Coly Li authored
      The original problem was from nvme-over-tcp code, who mistakenly uses
      kernel_sendpage() to send pages allocated by __get_free_pages() without
      __GFP_COMP flag. Such pages don't have refcount (page_count is 0) on
      tail pages, sending them by kernel_sendpage() may trigger a kernel panic
      from a corrupted kernel heap, because these pages are incorrectly freed
      in network stack as page_count 0 pages.
      
      This patch introduces a helper sendpage_ok(), it returns true if the
      checking page,
      - is not slab page: PageSlab(page) is false.
      - has page refcount: page_count(page) is not zero
      
      All drivers who want to send page to remote end by kernel_sendpage()
      may use this helper to check whether the page is OK. If the helper does
      not return true, the driver should try other non sendpage method (e.g.
      sock_no_sendpage()) to handle the page.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Vlastimil Babka <vbabka@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c381b079
    • Petko Manolov's avatar
      net: usb: pegasus: Proper error handing when setting pegasus' MAC address · f30e25a9
      Petko Manolov authored
      v2:
      
      If reading the MAC address from eeprom fail don't throw an error, use randomly
      generated MAC instead.  Either way the adapter will soldier on and the return
      type of set_ethernet_addr() can be reverted to void.
      
      v1:
      
      Fix a bug in set_ethernet_addr() which does not take into account possible
      errors (or partial reads) returned by its helpers.  This can potentially lead to
      writing random data into device's MAC address registers.
      Signed-off-by: default avatarPetko Manolov <petko.manolov@konsulko.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f30e25a9
    • Mauro Carvalho Chehab's avatar
      net: core: document two new elements of struct net_device · a93bdcb9
      Mauro Carvalho Chehab authored
      As warned by "make htmldocs", there are two new struct elements
      that aren't documented:
      
      	../include/linux/netdevice.h:2159: warning: Function parameter or member 'unlink_list' not described in 'net_device'
      	../include/linux/netdevice.h:2159: warning: Function parameter or member 'nested_level' not described in 'net_device'
      
      Fixes: 1fc70edb ("net: core: add nested_level variable in net_device")
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a93bdcb9
    • Johannes Berg's avatar
      netlink: fix policy dump leak · a95bc734
      Johannes Berg authored
      If userspace doesn't complete the policy dump, we leak the
      allocated state. Fix this.
      
      Fixes: d07dcf9a ("netlink: add infrastructure to expose policies to userspace")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a95bc734
  2. 01 Oct, 2020 2 commits
  3. 30 Sep, 2020 12 commits
  4. 29 Sep, 2020 16 commits
  5. 28 Sep, 2020 3 commits
    • Manivannan Sadhasivam's avatar
      net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks · a7809ff9
      Manivannan Sadhasivam authored
      The rcu read locks are needed to avoid potential race condition while
      dereferencing radix tree from multiple threads. The issue was identified
      by syzbot. Below is the crash report:
      
      =============================
      WARNING: suspicious RCU usage
      5.7.0-syzkaller #0 Not tainted
      -----------------------------
      include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by kworker/u4:1/21:
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline]
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241
       #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243
      
      stack backtrace:
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: qrtr_ns_handler qrtr_ns_worker
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1e9/0x30e lib/dump_stack.c:118
       radix_tree_deref_slot include/linux/radix-tree.h:176 [inline]
       ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline]
       qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674
       process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268
       worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414
       kthread+0x353/0x380 kernel/kthread.c:268
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com
      Signed-off-by: default avatarManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7809ff9
    • David S. Miller's avatar
      Merge branch 'net-core-fix-a-lockdep-splat-in-the-dev_addr_list' · 0ba56b89
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      net: core: fix a lockdep splat in the dev_addr_list.
      
      This patchset is to avoid lockdep splat.
      
      When a stacked interface graph is changed, netif_addr_lock() is called
      recursively and it internally calls spin_lock_nested().
      The parameter of spin_lock_nested() is 'dev->lower_level',
      this is called subclass.
      The problem of 'dev->lower_level' is that while 'dev->lower_level' is
      being used as a subclass of spin_lock_nested(), its value can be changed.
      So, spin_lock_nested() would be called recursively with the same
      subclass value, the lockdep understands a deadlock.
      In order to avoid this, a new variable is needed and it is going to be
      used as a parameter of spin_lock_nested().
      The first and second patch is a preparation patch for the third patch.
      In the third patch, the problem will be fixed.
      
      The first patch is to add __netdev_upper_dev_unlink().
      An existed netdev_upper_dev_unlink() is renamed to
      __netdev_upper_dev_unlink(). and netdev_upper_dev_unlink()
      is added as an wrapper of this function.
      
      The second patch is to add the netdev_nested_priv structure.
      netdev_walk_all_{ upper | lower }_dev() pass both private functions
      and "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      In order to make it easier to expand common variables and functions,
      this new netdev_nested_priv structure is added.
      
      The third patch is to add a new variable 'nested_level'
      into the net_device structure.
      This variable will be used as a parameter of spin_lock_nested() of
      dev->addr_list_lock.
      Due to this variable, it can avoid lockdep splat.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ba56b89
    • Taehee Yoo's avatar
      net: core: add nested_level variable in net_device · 1fc70edb
      Taehee Yoo authored
      This patch is to add a new variable 'nested_level' into the net_device
      structure.
      This variable will be used as a parameter of spin_lock_nested() of
      dev->addr_list_lock.
      
      netif_addr_lock() can be called recursively so spin_lock_nested() is
      used instead of spin_lock() and dev->lower_level is used as a parameter
      of spin_lock_nested().
      But, dev->lower_level value can be updated while it is being used.
      So, lockdep would warn a possible deadlock scenario.
      
      When a stacked interface is deleted, netif_{uc | mc}_sync() is
      called recursively.
      So, spin_lock_nested() is called recursively too.
      At this moment, the dev->lower_level variable is used as a parameter of it.
      dev->lower_level value is updated when interfaces are being unlinked/linked
      immediately.
      Thus, After unlinking, dev->lower_level shouldn't be a parameter of
      spin_lock_nested().
      
          A (macvlan)
          |
          B (vlan)
          |
          C (bridge)
          |
          D (macvlan)
          |
          E (vlan)
          |
          F (bridge)
      
          A->lower_level : 6
          B->lower_level : 5
          C->lower_level : 4
          D->lower_level : 3
          E->lower_level : 2
          F->lower_level : 1
      
      When an interface 'A' is removed, it releases resources.
      At this moment, netif_addr_lock() would be called.
      Then, netdev_upper_dev_unlink() is called recursively.
      Then dev->lower_level is updated.
      There is no problem.
      
      But, when the bridge module is removed, 'C' and 'F' interfaces
      are removed at once.
      If 'F' is removed first, a lower_level value is like below.
          A->lower_level : 5
          B->lower_level : 4
          C->lower_level : 3
          D->lower_level : 2
          E->lower_level : 1
          F->lower_level : 1
      
      Then, 'C' is removed. at this moment, netif_addr_lock() is called
      recursively.
      The ordering is like this.
      C(3)->D(2)->E(1)->F(1)
      At this moment, the lower_level value of 'E' and 'F' are the same.
      So, lockdep warns a possible deadlock scenario.
      
      In order to avoid this problem, a new variable 'nested_level' is added.
      This value is the same as dev->lower_level - 1.
      But this value is updated in rtnl_unlock().
      So, this variable can be used as a parameter of spin_lock_nested() safely
      in the rtnl context.
      
      Test commands:
         ip link add br0 type bridge vlan_filtering 1
         ip link add vlan1 link br0 type vlan id 10
         ip link add macvlan2 link vlan1 type macvlan
         ip link add br3 type bridge vlan_filtering 1
         ip link set macvlan2 master br3
         ip link add vlan4 link br3 type vlan id 10
         ip link add macvlan5 link vlan4 type macvlan
         ip link add br6 type bridge vlan_filtering 1
         ip link set macvlan5 master br6
         ip link add vlan7 link br6 type vlan id 10
         ip link add macvlan8 link vlan7 type macvlan
      
         ip link set br0 up
         ip link set vlan1 up
         ip link set macvlan2 up
         ip link set br3 up
         ip link set vlan4 up
         ip link set macvlan5 up
         ip link set br6 up
         ip link set vlan7 up
         ip link set macvlan8 up
         modprobe -rv bridge
      
      Splat looks like:
      [   36.057436][  T744] WARNING: possible recursive locking detected
      [   36.058848][  T744] 5.9.0-rc6+ #728 Not tainted
      [   36.059959][  T744] --------------------------------------------
      [   36.061391][  T744] ip/744 is trying to acquire lock:
      [   36.062590][  T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30
      [   36.064922][  T744]
      [   36.064922][  T744] but task is already holding lock:
      [   36.066626][  T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.068851][  T744]
      [   36.068851][  T744] other info that might help us debug this:
      [   36.070731][  T744]  Possible unsafe locking scenario:
      [   36.070731][  T744]
      [   36.072497][  T744]        CPU0
      [   36.073238][  T744]        ----
      [   36.074007][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.075290][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.076590][  T744]
      [   36.076590][  T744]  *** DEADLOCK ***
      [   36.076590][  T744]
      [   36.078515][  T744]  May be due to missing lock nesting notation
      [   36.078515][  T744]
      [   36.080491][  T744] 3 locks held by ip/744:
      [   36.081471][  T744]  #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490
      [   36.083614][  T744]  #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.085942][  T744]  #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80
      [   36.088400][  T744]
      [   36.088400][  T744] stack backtrace:
      [   36.089772][  T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728
      [   36.091364][  T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   36.093630][  T744] Call Trace:
      [   36.094416][  T744]  dump_stack+0x77/0x9b
      [   36.095385][  T744]  __lock_acquire+0xbc3/0x1f40
      [   36.096522][  T744]  lock_acquire+0xb4/0x3b0
      [   36.097540][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.098657][  T744]  ? rtmsg_ifinfo+0x1f/0x30
      [   36.099711][  T744]  ? __dev_notify_flags+0xa5/0xf0
      [   36.100874][  T744]  ? rtnl_is_locked+0x11/0x20
      [   36.101967][  T744]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   36.103230][  T744]  _raw_spin_lock_bh+0x38/0x70
      [   36.104348][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.105461][  T744]  dev_set_rx_mode+0x19/0x30
      [   36.106532][  T744]  dev_set_promiscuity+0x36/0x50
      [   36.107692][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.108929][  T744]  dev_set_promiscuity+0x1e/0x50
      [   36.110093][  T744]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   36.111415][  T744]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   36.112728][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.113967][  T744]  ? __hw_addr_sync_one+0x23/0x50
      [   36.115135][  T744]  __dev_set_rx_mode+0x68/0x90
      [   36.116249][  T744]  dev_uc_sync+0x70/0x80
      [   36.117244][  T744]  dev_uc_add+0x50/0x60
      [   36.118223][  T744]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   36.119470][  T744]  __dev_open+0xd6/0x170
      [   36.120470][  T744]  __dev_change_flags+0x181/0x1d0
      [   36.121644][  T744]  dev_change_flags+0x23/0x60
      [   36.122741][  T744]  do_setlink+0x30a/0x11e0
      [   36.123778][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.124929][  T744]  ? __nla_validate_parse.part.6+0x45/0x8e0
      [   36.126309][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.127457][  T744]  __rtnl_newlink+0x546/0x8e0
      [   36.128560][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.129623][  T744]  ? deactivate_slab.isra.85+0x6a1/0x850
      [   36.130946][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.132102][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.133176][  T744]  ? is_bpf_text_address+0x5/0xe0
      [   36.134364][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.135445][  T744]  ? rcu_read_lock_sched_held+0x32/0x60
      [   36.136771][  T744]  ? kmem_cache_alloc_trace+0x2d8/0x380
      [   36.138070][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.139164][  T744]  rtnl_newlink+0x47/0x70
      [ ... ]
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1fc70edb