1. 20 May, 2023 4 commits
    • udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state · e4fe1bf1
      Aditi Ghag authored
      This is a preparatory commit to remove the field. The field was
      previously shared between the proc fs and BPF UDP socket iterators. As
      the follow-up commits will decouple the implementations of the two
      iterators, remove the field. For the BPF socket iterator, filtering of
      sockets is expected to be done in BPF programs; a sketch of such
      filtering follows at the end of this entry.
      Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20230519225157.760788-5-aditi.ghag@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
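      As a minimal sketch of the BPF-side filtering referred to above (the
      program name, the port value, and the exact fields used are
      illustrative, not code from this series), a UDP iterator program can
      simply return early for sockets it is not interested in:
      ```
      // SPDX-License-Identifier: GPL-2.0
      /* Hypothetical sketch: a BPF UDP iterator program that does its own
       * filtering instead of relying on kernel-side afinfo state.
       */
      #include "vmlinux.h"
      #include <bpf/bpf_helpers.h>
      #include <bpf/bpf_tracing.h>
      #include <bpf/bpf_endian.h>

      #ifndef AF_INET
      #define AF_INET 2
      #endif

      char _license[] SEC("license") = "GPL";

      SEC("iter/udp")
      int dump_udp(struct bpf_iter__udp *ctx)
      {
              struct seq_file *seq = ctx->meta->seq;
              struct udp_sock *udp_sk = ctx->udp_sk;
              struct inet_sock *inet;

              /* The iterator calls the program once more with a NULL socket
               * at the end of the walk.
               */
              if (!udp_sk)
                      return 0;

              inet = &udp_sk->inet;

              /* Example filter: only report IPv4 sockets bound to port 53. */
              if (inet->sk.sk_family != AF_INET ||
                  inet->inet_sport != bpf_htons(53))
                      return 0;

              BPF_SEQ_PRINTF(seq, "%08x:%04x\n",
                             bpf_ntohl(inet->inet_rcv_saddr),
                             bpf_ntohs(inet->inet_sport));
              return 0;
      }
      ```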
    • bpf: udp: Encapsulate logic to get udp table · 7625d2e9
      Aditi Ghag authored
      This is a preparatory commit that encapsulates the logic to get the
      udp table in the iterator inside udp_get_table_afinfo, and renames the
      function to `udp_get_table_seq` accordingly; see the sketch after this
      entry.
      Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20230519225157.760788-4-aditi.ghag@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
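      As a rough illustration of the shape such a helper can take (a sketch
      only, under the assumption that the iterator state still carries a
      bpf_seq_afinfo pointer at this point in the series; field names are
      approximate and not quoted from the patch):
      ```
      /* Hypothetical sketch of the encapsulated lookup: resolve which UDP
       * hash table a seq_file iterator should walk, instead of open-coding
       * the afinfo lookup at every call site.
       */
      static struct udp_table *udp_get_table_seq(struct seq_file *seq,
                                                 struct net *net)
      {
              const struct udp_iter_state *st = seq->private;
              const struct udp_seq_afinfo *afinfo;

              /* The BPF iterator is not backed by a /proc entry, so it
               * always walks the per-netns table.
               */
              if (st->bpf_seq_afinfo)
                      return net->ipv4.udp_table;

              /* /proc/net/udp{,6} readers: a table attached to the proc
               * entry (if any) takes precedence over the per-netns default.
               */
              afinfo = pde_data(file_inode(seq->file));
              return afinfo->udp_table ? : net->ipv4.udp_table;
      }
      ```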
    • udp: seq_file: Helper function to match socket attributes · f44b1c51
      Aditi Ghag authored
      This is a preparatory commit that refactors the code matching socket
      attributes in the iterators into a helper function, and uses it in the
      proc fs iterator; a sketch of such a helper follows after this entry.
      Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20230519225157.760788-3-aditi.ghag@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
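      A sketch of such a matching helper (the exact name and the helpers it
      leans on are assumptions here, not quoted from the patch): it checks
      the socket's network namespace and address family against what the
      seq_file is iterating.
      ```
      /* Hypothetical sketch of the shared matching logic. */
      static bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
      {
              unsigned short family = seq_file_family(seq);
              struct net *net = seq_file_net(seq);

              /* AF_UNSPEC means "any family", as a BPF iterator would use. */
              return net_eq(sock_net(sk), net) &&
                     (family == AF_UNSPEC || sk->sk_family == family);
      }
      ```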
    • bpf: tcp: Avoid taking fast sock lock in iterator · 9378096e
      Aditi Ghag authored
      This is a preparatory commit that replaces `lock_sock_fast` with
      `lock_sock`, so that BPF programs executed from the TCP sockets
      iterator can destroy TCP sockets using the bpf_sock_destroy kfunc
      (implemented in follow-up commits).
      
      Previously, the BPF TCP iterator acquired the sock lock with BH
      disabled. This led to scenarios where the sockets hash table bucket
      lock could be acquired with BH enabled in some paths and disabled in
      others. In such a situation, the kernel issued a lockdep warning, since
      in the BH-enabled path the same bucket lock *might* be acquired again
      in softirq context (BH disabled), which could lead to a deadlock.
      Since bpf_sock_destroy also runs in process context, the potential
      deadlock warning is likely a false alarm.
      
      Here is a snippet of annotated stack trace that motivated this change:
      
      ```
      
      Possible interrupt unsafe locking scenario:
      
            CPU0                    CPU1
            ----                    ----
       lock(&h->lhash2[i].lock);
                                    local_bh_disable();
                                    lock(&h->lhash2[i].lock);
      kernel imagined possible scenario:
        local_bh_disable();  /* Possible softirq */
        lock(&h->lhash2[i].lock);
      *** Potential Deadlock ***
      
      process context:
      
      lock_acquire+0xcd/0x330
      _raw_spin_lock+0x33/0x40
      ------> Acquire (bucket) lhash2.lock with BH enabled
      __inet_hash+0x4b/0x210
      inet_csk_listen_start+0xe6/0x100
      inet_listen+0x95/0x1d0
      __sys_listen+0x69/0xb0
      __x64_sys_listen+0x14/0x20
      do_syscall_64+0x3c/0x90
      entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      bpf_sock_destroy run from iterator:
      
      lock_acquire+0xcd/0x330
      _raw_spin_lock+0x33/0x40
      ------> Acquire (bucket) lhash2.lock with BH disabled
      inet_unhash+0x9a/0x110
      tcp_set_state+0x6a/0x210
      tcp_abort+0x10d/0x200
      bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9
      bpf_iter_run_prog+0x1ff/0x340
      ------> lock_sock_fast that acquires sock lock with BH disabled
      bpf_iter_tcp_seq_show+0xca/0x190
      bpf_seq_read+0x177/0x450
      
      ```
      
      Also, Yonghong reported a deadlock for non-listening TCP sockets that
      this change resolves. Previously, `lock_sock_fast` held the sock spin
      lock with BH disabled, and the same lock was then acquired again in
      `tcp_abort`:
      
      ```
      watchdog: BUG: soft lockup - CPU#0 stuck for 86s! [test_progs:2331]
      RIP: 0010:queued_spin_lock_slowpath+0xd8/0x500
      Call Trace:
       <TASK>
       _raw_spin_lock+0x84/0x90
       tcp_abort+0x13c/0x1f0
       bpf_prog_88539c5453a9dd47_iter_tcp6_client+0x82/0x89
       bpf_iter_run_prog+0x1aa/0x2c0
       ? preempt_count_sub+0x1c/0xd0
       ? from_kuid_munged+0x1c8/0x210
       bpf_iter_tcp_seq_show+0x14e/0x1b0
       bpf_seq_read+0x36c/0x6a0
      
      bpf_iter_tcp_seq_show
         lock_sock_fast
           __lock_sock_fast
             spin_lock_bh(&sk->sk_lock.slock);
             /* Fast path return with bottom halves disabled and
              * sock::sk_lock.slock held.
              */
      
       ...
       tcp_abort
         local_bh_disable();
         spin_lock(&((sk)->sk_lock.slock)); // from bh_lock_sock(sk)
      
      ```
      
      With the switch to `lock_sock`, it calls `spin_unlock_bh` before returning:
      
      ```
      lock_sock
          lock_sock_nested
             spin_lock_bh(&sk->sk_lock.slock);
             :
             spin_unlock_bh(&sk->sk_lock.slock);
      ```
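      In code, the change boils down to swapping the fast-lock pair for the
      regular one in `bpf_iter_tcp_seq_show`; roughly (a sketch of the hunk,
      not the exact upstream diff):
      ```
      /* Before: returns with BH disabled for the duration of the BPF prog. */
      slow = lock_sock_fast(sk);
      ret = tcp_prog_seq_show(prog, &meta, v, uid);
      unlock_sock_fast(sk, slow);

      /* After: lock_sock() re-enables BH before returning, so the BPF prog
       * (and bpf_sock_destroy) runs in plain process context.
       */
      lock_sock(sk);
      ret = tcp_prog_seq_show(prog, &meta, v, uid);
      release_sock(sk);
      ```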
      Acked-by: Yonghong Song <yhs@meta.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20230519225157.760788-2-aditi.ghag@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
  2. 19 May, 2023 3 commits
  3. 17 May, 2023 32 commits
  4. 16 May, 2023 1 commit