1. 08 Oct, 2020 1 commit
    • Daniel Borkmann's avatar
      bpf: Fix scalar32_min_max_or bounds tracking · 5b9fbeb7
      Daniel Borkmann authored
      Simon reported an issue with the current scalar32_min_max_or() implementation.
      That is, compared to the other 32 bit subreg tracking functions, the code in
      scalar32_min_max_or() stands out that it's using the 64 bit registers instead
      of 32 bit ones. This leads to bounds tracking issues, for example:
      
        [...]
        8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        8: (79) r1 = *(u64 *)(r0 +0)
         R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        9: (b7) r0 = 1
        10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        10: (18) r2 = 0x600000002
        12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        12: (ad) if r1 < r2 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: (95) exit
        14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        14: (25) if r1 > 0x0 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: (95) exit
        16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        16: (47) r1 |= 0
        17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        [...]
      
      The bound tests on the map value force the upper unsigned bound to be 25769803777
      in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By
      using OR they are truncated and thus result in the range [1,1] for the 32 bit reg
      tracker. This is incorrect given the only thing we know is that the value must be
      positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the
      subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes
      sense, for example, for the case where we update dst_reg->s32_{min,max}_value in
      the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as
      we know that these are positive. Previously, in the else branch the 64 bit values
      of umin_value=1 and umax_value=32212254719 were used and latter got truncated to
      be 1 as upper bound there. After the fix the subreg range is now correct:
      
        [...]
        8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        8: (79) r1 = *(u64 *)(r0 +0)
         R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
        9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        9: (b7) r0 = 1
        10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
        10: (18) r2 = 0x600000002
        12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        12: (ad) if r1 < r2 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        13: (95) exit
        14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        14: (25) if r1 > 0x0 goto pc+1
         R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        15: (95) exit
        16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        16: (47) r1 |= 0
        17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
        [...]
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Reported-by: default avatarSimon Scannell <scannell.smn@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5b9fbeb7
  2. 01 Oct, 2020 1 commit
  3. 30 Sep, 2020 12 commits
  4. 29 Sep, 2020 16 commits
  5. 28 Sep, 2020 8 commits
    • Manivannan Sadhasivam's avatar
      net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks · a7809ff9
      Manivannan Sadhasivam authored
      The rcu read locks are needed to avoid potential race condition while
      dereferencing radix tree from multiple threads. The issue was identified
      by syzbot. Below is the crash report:
      
      =============================
      WARNING: suspicious RCU usage
      5.7.0-syzkaller #0 Not tainted
      -----------------------------
      include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by kworker/u4:1/21:
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline]
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241
       #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243
      
      stack backtrace:
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: qrtr_ns_handler qrtr_ns_worker
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1e9/0x30e lib/dump_stack.c:118
       radix_tree_deref_slot include/linux/radix-tree.h:176 [inline]
       ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline]
       qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674
       process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268
       worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414
       kthread+0x353/0x380 kernel/kthread.c:268
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com
      Signed-off-by: default avatarManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7809ff9
    • David S. Miller's avatar
      Merge branch 'net-core-fix-a-lockdep-splat-in-the-dev_addr_list' · 0ba56b89
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      net: core: fix a lockdep splat in the dev_addr_list.
      
      This patchset is to avoid lockdep splat.
      
      When a stacked interface graph is changed, netif_addr_lock() is called
      recursively and it internally calls spin_lock_nested().
      The parameter of spin_lock_nested() is 'dev->lower_level',
      this is called subclass.
      The problem of 'dev->lower_level' is that while 'dev->lower_level' is
      being used as a subclass of spin_lock_nested(), its value can be changed.
      So, spin_lock_nested() would be called recursively with the same
      subclass value, the lockdep understands a deadlock.
      In order to avoid this, a new variable is needed and it is going to be
      used as a parameter of spin_lock_nested().
      The first and second patch is a preparation patch for the third patch.
      In the third patch, the problem will be fixed.
      
      The first patch is to add __netdev_upper_dev_unlink().
      An existed netdev_upper_dev_unlink() is renamed to
      __netdev_upper_dev_unlink(). and netdev_upper_dev_unlink()
      is added as an wrapper of this function.
      
      The second patch is to add the netdev_nested_priv structure.
      netdev_walk_all_{ upper | lower }_dev() pass both private functions
      and "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      In order to make it easier to expand common variables and functions,
      this new netdev_nested_priv structure is added.
      
      The third patch is to add a new variable 'nested_level'
      into the net_device structure.
      This variable will be used as a parameter of spin_lock_nested() of
      dev->addr_list_lock.
      Due to this variable, it can avoid lockdep splat.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ba56b89
    • Taehee Yoo's avatar
      net: core: add nested_level variable in net_device · 1fc70edb
      Taehee Yoo authored
      This patch is to add a new variable 'nested_level' into the net_device
      structure.
      This variable will be used as a parameter of spin_lock_nested() of
      dev->addr_list_lock.
      
      netif_addr_lock() can be called recursively so spin_lock_nested() is
      used instead of spin_lock() and dev->lower_level is used as a parameter
      of spin_lock_nested().
      But, dev->lower_level value can be updated while it is being used.
      So, lockdep would warn a possible deadlock scenario.
      
      When a stacked interface is deleted, netif_{uc | mc}_sync() is
      called recursively.
      So, spin_lock_nested() is called recursively too.
      At this moment, the dev->lower_level variable is used as a parameter of it.
      dev->lower_level value is updated when interfaces are being unlinked/linked
      immediately.
      Thus, After unlinking, dev->lower_level shouldn't be a parameter of
      spin_lock_nested().
      
          A (macvlan)
          |
          B (vlan)
          |
          C (bridge)
          |
          D (macvlan)
          |
          E (vlan)
          |
          F (bridge)
      
          A->lower_level : 6
          B->lower_level : 5
          C->lower_level : 4
          D->lower_level : 3
          E->lower_level : 2
          F->lower_level : 1
      
      When an interface 'A' is removed, it releases resources.
      At this moment, netif_addr_lock() would be called.
      Then, netdev_upper_dev_unlink() is called recursively.
      Then dev->lower_level is updated.
      There is no problem.
      
      But, when the bridge module is removed, 'C' and 'F' interfaces
      are removed at once.
      If 'F' is removed first, a lower_level value is like below.
          A->lower_level : 5
          B->lower_level : 4
          C->lower_level : 3
          D->lower_level : 2
          E->lower_level : 1
          F->lower_level : 1
      
      Then, 'C' is removed. at this moment, netif_addr_lock() is called
      recursively.
      The ordering is like this.
      C(3)->D(2)->E(1)->F(1)
      At this moment, the lower_level value of 'E' and 'F' are the same.
      So, lockdep warns a possible deadlock scenario.
      
      In order to avoid this problem, a new variable 'nested_level' is added.
      This value is the same as dev->lower_level - 1.
      But this value is updated in rtnl_unlock().
      So, this variable can be used as a parameter of spin_lock_nested() safely
      in the rtnl context.
      
      Test commands:
         ip link add br0 type bridge vlan_filtering 1
         ip link add vlan1 link br0 type vlan id 10
         ip link add macvlan2 link vlan1 type macvlan
         ip link add br3 type bridge vlan_filtering 1
         ip link set macvlan2 master br3
         ip link add vlan4 link br3 type vlan id 10
         ip link add macvlan5 link vlan4 type macvlan
         ip link add br6 type bridge vlan_filtering 1
         ip link set macvlan5 master br6
         ip link add vlan7 link br6 type vlan id 10
         ip link add macvlan8 link vlan7 type macvlan
      
         ip link set br0 up
         ip link set vlan1 up
         ip link set macvlan2 up
         ip link set br3 up
         ip link set vlan4 up
         ip link set macvlan5 up
         ip link set br6 up
         ip link set vlan7 up
         ip link set macvlan8 up
         modprobe -rv bridge
      
      Splat looks like:
      [   36.057436][  T744] WARNING: possible recursive locking detected
      [   36.058848][  T744] 5.9.0-rc6+ #728 Not tainted
      [   36.059959][  T744] --------------------------------------------
      [   36.061391][  T744] ip/744 is trying to acquire lock:
      [   36.062590][  T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30
      [   36.064922][  T744]
      [   36.064922][  T744] but task is already holding lock:
      [   36.066626][  T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.068851][  T744]
      [   36.068851][  T744] other info that might help us debug this:
      [   36.070731][  T744]  Possible unsafe locking scenario:
      [   36.070731][  T744]
      [   36.072497][  T744]        CPU0
      [   36.073238][  T744]        ----
      [   36.074007][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.075290][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.076590][  T744]
      [   36.076590][  T744]  *** DEADLOCK ***
      [   36.076590][  T744]
      [   36.078515][  T744]  May be due to missing lock nesting notation
      [   36.078515][  T744]
      [   36.080491][  T744] 3 locks held by ip/744:
      [   36.081471][  T744]  #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490
      [   36.083614][  T744]  #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.085942][  T744]  #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80
      [   36.088400][  T744]
      [   36.088400][  T744] stack backtrace:
      [   36.089772][  T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728
      [   36.091364][  T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   36.093630][  T744] Call Trace:
      [   36.094416][  T744]  dump_stack+0x77/0x9b
      [   36.095385][  T744]  __lock_acquire+0xbc3/0x1f40
      [   36.096522][  T744]  lock_acquire+0xb4/0x3b0
      [   36.097540][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.098657][  T744]  ? rtmsg_ifinfo+0x1f/0x30
      [   36.099711][  T744]  ? __dev_notify_flags+0xa5/0xf0
      [   36.100874][  T744]  ? rtnl_is_locked+0x11/0x20
      [   36.101967][  T744]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   36.103230][  T744]  _raw_spin_lock_bh+0x38/0x70
      [   36.104348][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.105461][  T744]  dev_set_rx_mode+0x19/0x30
      [   36.106532][  T744]  dev_set_promiscuity+0x36/0x50
      [   36.107692][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.108929][  T744]  dev_set_promiscuity+0x1e/0x50
      [   36.110093][  T744]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   36.111415][  T744]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   36.112728][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.113967][  T744]  ? __hw_addr_sync_one+0x23/0x50
      [   36.115135][  T744]  __dev_set_rx_mode+0x68/0x90
      [   36.116249][  T744]  dev_uc_sync+0x70/0x80
      [   36.117244][  T744]  dev_uc_add+0x50/0x60
      [   36.118223][  T744]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   36.119470][  T744]  __dev_open+0xd6/0x170
      [   36.120470][  T744]  __dev_change_flags+0x181/0x1d0
      [   36.121644][  T744]  dev_change_flags+0x23/0x60
      [   36.122741][  T744]  do_setlink+0x30a/0x11e0
      [   36.123778][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.124929][  T744]  ? __nla_validate_parse.part.6+0x45/0x8e0
      [   36.126309][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.127457][  T744]  __rtnl_newlink+0x546/0x8e0
      [   36.128560][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.129623][  T744]  ? deactivate_slab.isra.85+0x6a1/0x850
      [   36.130946][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.132102][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.133176][  T744]  ? is_bpf_text_address+0x5/0xe0
      [   36.134364][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.135445][  T744]  ? rcu_read_lock_sched_held+0x32/0x60
      [   36.136771][  T744]  ? kmem_cache_alloc_trace+0x2d8/0x380
      [   36.138070][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.139164][  T744]  rtnl_newlink+0x47/0x70
      [ ... ]
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1fc70edb
    • Taehee Yoo's avatar
      net: core: introduce struct netdev_nested_priv for nested interface infrastructure · eff74233
      Taehee Yoo authored
      Functions related to nested interface infrastructure such as
      netdev_walk_all_{ upper | lower }_dev() pass both private functions
      and "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      In order to make it easier to expand common variables and functions,
      this new netdev_nested_priv structure is added.
      
      In the following patch, a new member variable will be added into this
      struct to fix the lockdep issue.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eff74233
    • Taehee Yoo's avatar
      net: core: add __netdev_upper_dev_unlink() · fe8300fd
      Taehee Yoo authored
      The netdev_upper_dev_unlink() has to work differently according to flags.
      This idea is the same with __netdev_upper_dev_link().
      
      In the following patches, new flags will be added.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe8300fd
    • Cong Wang's avatar
      net_sched: remove a redundant goto chain check · 1aad8049
      Cong Wang authored
      All TC actions call tcf_action_check_ctrlact() to validate
      goto chain, so this check in tcf_action_init_1() is actually
      redundant. Remove it to save troubles of leaking memory.
      
      Fixes: e49d8c22 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()")
      Reported-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Suggested-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1aad8049
    • Nikolay Aleksandrov's avatar
      net: bridge: fdb: don't flush ext_learn entries · f2f3729f
      Nikolay Aleksandrov authored
      When a user-space software manages fdb entries externally it should
      set the ext_learn flag which marks the fdb entry as externally managed
      and avoids expiring it (they're treated as static fdbs). Unfortunately
      on events where fdb entries are flushed (STP down, netlink fdb flush
      etc) these fdbs are also deleted automatically by the bridge. That in turn
      causes trouble for the managing user-space software (e.g. in MLAG setups
      we lose remote fdb entries on port flaps).
      These entries are completely externally managed so we should avoid
      automatically deleting them, the only exception are offloaded entries
      (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as
      before.
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2f3729f
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · a4be47af
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2020-09-28
      
      1) Fix a build warning in ip_vti if CONFIG_IPV6 is not set.
         From YueHaibing.
      
      2) Restore IPCB on espintcp before handing the packet to xfrm
         as the information there is still needed.
         From Sabrina Dubroca.
      
      3) Fix pmtu updating for xfrm interfaces.
         From Sabrina Dubroca.
      
      4) Some xfrm state information was not cloned with xfrm_do_migrate.
         Fixes to clone the full xfrm state, from Antony Antony.
      
      5) Use the correct address family in xfrm_state_find. The struct
         flowi must always be interpreted along with the original
         address family. This got lost over the years.
         Fix from Herbert Xu.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4be47af
  6. 27 Sep, 2020 2 commits