    kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id()
    The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id(),
    which acquires kernfs_idr_lock, a non-raw, non-IRQ-safe lock. This can lead
    to deadlocks because bpf_cgroup_from_id() can be called from any BPF program,
    including ones that attach to functions which are holding the scheduler rq
    lock.
    
    Consider the following BPF program:
    
      SEC("fentry/__set_cpus_allowed_ptr_locked")
      int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p,
    	       struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf)
      {
    	  struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id);
    
    	  if (cgrp) {
    		  bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name);
    		  bpf_cgroup_release(cgrp);
    	  }
    	  return 0;
      }
    
    __set_cpus_allowed_ptr_locked() is called with the rq lock held, and the
    above BPF program calls bpf_cgroup_from_id() in that context, leading to the
    following lockdep warning:
    
      =====================================================
      WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
      6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted
      -----------------------------------------------------
      repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70
    
      and this task is already holding:
      ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0
      which would create a new lock dependency:
       (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
      ...
       Possible interrupt unsafe locking scenario:
    
             CPU0                    CPU1
             ----                    ----
        lock(kernfs_idr_lock);
                                     local_irq_disable();
                                     lock(&rq->__lock);
                                     lock(kernfs_idr_lock);
        <Interrupt>
          lock(&rq->__lock);

                     *** DEADLOCK ***
      ...
      Call Trace:
       dump_stack_lvl+0x55/0x70
       dump_stack+0x10/0x20
       __lock_acquire+0x781/0x2a40
       lock_acquire+0xbf/0x1f0
       _raw_spin_lock+0x2f/0x40
       kernfs_find_and_get_node_by_id+0x1e/0x70
       cgroup_get_from_id+0x21/0x240
       bpf_cgroup_from_id+0xe/0x20
       bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a
       bpf_trampoline_6442545632+0x4f/0x1000
       __set_cpus_allowed_ptr_locked+0x5/0x5a0
       sched_setaffinity+0x1b3/0x290
       __x64_sys_sched_setaffinity+0x4f/0x60
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x46/0x4e
    
    Let's fix it by protecting kernfs_node and kernfs_root with RCU and making
    kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of
    kernfs_idr_lock.
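
    To illustrate the shape of such a lookup (a minimal sketch under stated
    assumptions, not the actual patch: the ino_idr field name, the count
    refcount, and the way the 64-bit id is decoded here are approximations),
    both the IDR lookup and the reference acquisition can run under
    rcu_read_lock() without taking any spinlock:

      struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
							 u64 id)
      {
	      struct kernfs_node *kn;

	      rcu_read_lock();

	      /* idr_find() may be called under rcu_read_lock(); no kernfs_idr_lock */
	      kn = idr_find(&root->ino_idr, (u32)id);
	      if (!kn)
		      goto err_unlock;

	      /*
	       * The node may already be on its way out; only succeed if a
	       * reference can still be taken, i.e. the refcount is non-zero.
	       */
	      if (!atomic_inc_not_zero(&kn->count))
		      goto err_unlock;

	      rcu_read_unlock();
	      return kn;

      err_unlock:
	      rcu_read_unlock();
	      return NULL;
      }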
    
    This adds an rcu_head to kernfs_node, making it larger by 16 bytes on 64-bit.
    Combined with the preceding rearrange patch, the net increase is 8 bytes.
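
    As a minimal sketch of how that rcu_head can be used (the rcu member,
    callback, and cache names below are illustrative assumptions, not
    necessarily what the patch does), the final free of a node is deferred
    past an RCU grace period so that rcu_read_lock() holders in
    kernfs_find_and_get_node_by_id() never dereference freed memory:

      static void kernfs_free_rcu(struct rcu_head *rcu)
      {
	      struct kernfs_node *kn = container_of(rcu, struct kernfs_node, rcu);

	      kmem_cache_free(kernfs_node_cache, kn);
      }

      /* in the release path, replacing the immediate kmem_cache_free(kn): */
      call_rcu(&kn->rcu, kernfs_free_rcu);
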
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Andrea Righi <andrea.righi@canonical.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>