1. 31 Jul, 2023 1 commit
    • bpf, cpumap: Make sure kthread is running before map update returns · 640a6045
      Hou Tao authored
      The following warning was reported when running stress-mode enabled
      xdp_redirect_cpu with some RT threads:
      
        ------------[ cut here ]------------
        WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135
        CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
        Workqueue: events cpu_map_kthread_stop
        RIP: 0010:put_cpu_map_entry+0xda/0x220
        ......
        Call Trace:
         <TASK>
         ? show_regs+0x65/0x70
         ? __warn+0xa5/0x240
         ......
         ? put_cpu_map_entry+0xda/0x220
         cpu_map_kthread_stop+0x41/0x60
         process_one_work+0x6b0/0xb80
         worker_thread+0x96/0x720
         kthread+0x1a5/0x1f0
         ret_from_fork+0x3a/0x70
         ret_from_fork_asm+0x1b/0x30
         </TASK>
      
      The root cause is the same as in commit 43690164 ("bpf: cpumap: Fix memory
      leak in cpu_map_update_elem"). The kthread is stopped prematurely by
      kthread_stop() in cpu_map_kthread_stop(), so kthread() never calls
      cpu_map_kthread_run(), but the XDP program has already queued some
      frames or skbs into the ptr_ring. So when __cpu_map_ring_cleanup() checks
      the ptr_ring, it finds that it was not emptied and reports a warning.
      
      An alternative fix is to use __cpu_map_ring_cleanup() to drop these
      pending frames or skbs when kthread_stop() returns -EINTR, but it may
      confuse the user, because these frames or skbs have already been handled
      correctly by the XDP program. So instead of dropping these frames or skbs,
      just make sure the per-cpu kthread is running before
      __cpu_map_entry_alloc() returns.
      
      After applying the fix, the error handling for kthread_stop() becomes
      unnecessary because it will always return 0, so just remove it.
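
      The idea of not returning from allocation until the worker is running
      can be sketched in userspace C with pthreads. This is only an analogy
      under stated assumptions: struct completion, entry_alloc(), entry_free()
      and the pending/drained counters are hypothetical names, not the cpumap
      code, and the message above does not say which primitive the patch uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a one-shot "event happened" flag. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_init(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void completion_signal(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical map entry: one worker thread plus a count of queued items. */
struct entry {
    pthread_t worker;
    struct completion running; /* signalled once the worker has started */
    struct completion stop;    /* signalled when the entry is torn down */
    int pending;               /* items queued by producers */
    int drained;               /* items the worker consumed before exiting */
};

static void *worker_fn(void *arg)
{
    struct entry *e = arg;

    completion_signal(&e->running); /* tell the allocator we are running */
    completion_wait(&e->stop);      /* wait for the stop request */
    e->drained = e->pending;        /* drain everything queued so far */
    e->pending = 0;
    return NULL;
}

/* Returns 0 only after the worker thread is known to be running. */
int entry_alloc(struct entry *e)
{
    e->pending = 0;
    e->drained = 0;
    completion_init(&e->running);
    completion_init(&e->stop);
    if (pthread_create(&e->worker, NULL, worker_fn, e) != 0)
        return -1;
    completion_wait(&e->running);
    return 0;
}

/* Stops the worker; it is guaranteed to have run, so nothing is left queued. */
void entry_free(struct entry *e)
{
    completion_signal(&e->stop);
    pthread_join(e->worker, NULL);
    assert(e->pending == 0); /* userspace analogue of the "ring not empty" warning */
}
```

      Because entry_alloc() returns only after the worker has signalled that
      it started, a later entry_free() always finds a worker that drains the
      queue, so the emptiness check at teardown cannot fire.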
      
      Fixes: 6710e112 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Pu Lehui <pulehui@huawei.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
  2. 27 Jul, 2023 1 commit
  3. 26 Jul, 2023 3 commits
    • Merge branch 'bpf-disable-preemption-in-perf_event_output-helpers-code' · aa89592f
      Alexei Starovoitov authored
      Jiri Olsa says:
      
      ====================
      bpf: Disable preemption in perf_event_output helpers code
      
      hi,
      we got a report of a kernel crash [1][3] within the bpf_event_output helper.
      
      The reason is the nesting protection code in bpf_event_output that expects
      disabled preemption, which is not guaranteed for programs executed by
      bpf_prog_run_array_cg.
      
      I managed to reproduce it on the tracing side, where we have the same problem
      in bpf_perf_event_output. The reproducer [2] just creates a busy uprobe
      and calls the bpf_perf_event_output helper a lot.
      
      v3 changes:
        - added acks and fixed 'Fixes' tag style [Hou Tao]
        - added Closes tag to patch 2
      
      v2 changes:
        - I changed 'Fixes' commits to where I saw we switched from preempt_disable
          to migrate_disable, but I'm not completely sure about the patch 2, because
          it was tricky to find, would be nice if somebody could check on that
      
      thanks,
      jirka
      
      [1] https://github.com/cilium/cilium/issues/26756
      [2] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf_output_fix_reproducer&id=8054dcc634121b884c7c331329d61d93351d03b5
      [3] slack:
          [66194.378161] BUG: kernel NULL pointer dereference, address: 0000000000000001
          [66194.378324] #PF: supervisor instruction fetch in kernel mode
          [66194.378447] #PF: error_code(0x0010) - not-present page
          ...
          [66194.378692] Oops: 0010 [#1] PREEMPT SMP NOPTI
          ...
          [66194.380666]  <TASK>
          [66194.380775]  ? perf_output_sample+0x12a/0x9a0
          [66194.380902]  ? finish_task_switch.isra.0+0x81/0x280
          [66194.381024]  ? perf_event_output+0x66/0xa0
          [66194.381148]  ? bpf_event_output+0x13a/0x190
          [66194.381270]  ? bpf_event_output_data+0x22/0x40
          [66194.381391]  ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
          [66194.381519]  ? xa_load+0x87/0xe0
          [66194.381635]  ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
          [66194.381759]  ? release_sock+0x3e/0x90
          [66194.381876]  ? sk_setsockopt+0x1a1/0x12f0
          [66194.381996]  ? udp_pre_connect+0x36/0x50
          [66194.382114]  ? inet_dgram_connect+0x93/0xa0
          [66194.382233]  ? __sys_connect+0xb4/0xe0
          [66194.382353]  ? udp_setsockopt+0x27/0x40
          [66194.382470]  ? __pfx_udp_push_pending_frames+0x10/0x10
          [66194.382593]  ? __sys_setsockopt+0xdf/0x1a0
          [66194.382713]  ? __x64_sys_connect+0xf/0x20
          [66194.382832]  ? do_syscall_64+0x3a/0x90
          [66194.382949]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
          [66194.383077]  </TASK>
      ---
      ====================
      
      Link: https://lore.kernel.org/r/20230725084206.580930-1-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Disable preemption in bpf_event_output · d62cc390
      Jiri Olsa authored
      We received a report [1] of a kernel crash, which is caused by
      using nesting protection without disabled preemption.
      
      bpf_event_output can be called by programs executed by the
      bpf_prog_run_array_cg function, which disables migration but
      keeps preemption enabled.
      
      This can cause a task to be preempted by another one inside the
      nesting protection, eventually leading to two tasks using the same
      perf_sample_data buffer and causing crashes like:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000001
        #PF: supervisor instruction fetch in kernel mode
        #PF: error_code(0x0010) - not-present page
        ...
        ? perf_output_sample+0x12a/0x9a0
        ? finish_task_switch.isra.0+0x81/0x280
        ? perf_event_output+0x66/0xa0
        ? bpf_event_output+0x13a/0x190
        ? bpf_event_output_data+0x22/0x40
        ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
        ? xa_load+0x87/0xe0
        ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
        ? release_sock+0x3e/0x90
        ? sk_setsockopt+0x1a1/0x12f0
        ? udp_pre_connect+0x36/0x50
        ? inet_dgram_connect+0x93/0xa0
        ? __sys_connect+0xb4/0xe0
        ? udp_setsockopt+0x27/0x40
        ? __pfx_udp_push_pending_frames+0x10/0x10
        ? __sys_setsockopt+0xdf/0x1a0
        ? __x64_sys_connect+0xf/0x20
        ? do_syscall_64+0x3a/0x90
        ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      Fix this by disabling preemption in bpf_event_output.
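
      Why a nesting counter needs non-preemptible sections can be shown with
      a small deterministic model in userspace C. It is a sketch under stated
      assumptions, not the kernel code: struct cpu_state, slot_enter(),
      slot_exit(), MAX_NEST and the task ids are hypothetical, and the
      "preemption" is just a hand-written ordering of calls on one modelled CPU.

```c
#include <assert.h>

#define MAX_NEST 3 /* hypothetical number of per-CPU buffers */

/* Hypothetical model of one CPU's nesting state. */
struct cpu_state {
    int nest_level;      /* current nesting depth */
    int owner[MAX_NEST]; /* task id using each buffer slot, 0 = free */
    int collisions;      /* times a slot was handed out while still in use */
};

/* Take the slot for the next nesting level; returns its index or -1. */
static int slot_enter(struct cpu_state *cpu, int task)
{
    int level = ++cpu->nest_level;

    if (level > MAX_NEST) {
        cpu->nest_level--;
        return -1;
    }
    if (cpu->owner[level - 1] != 0)
        cpu->collisions++; /* another task still uses this buffer */
    cpu->owner[level - 1] = task;
    return level - 1;
}

/* Release the slot and drop one nesting level. */
static void slot_exit(struct cpu_state *cpu, int slot, int task)
{
    assert(slot >= 0 && slot < MAX_NEST);
    if (cpu->owner[slot] == task)
        cpu->owner[slot] = 0;
    cpu->nest_level--;
}

/*
 * Preemption enabled: task 1 enters, task 2 preempts it and enters,
 * task 1 resumes and leaves first, then task 3 enters while task 2
 * is still inside. Exits are no longer last-in, first-out.
 */
int run_preemptible(void)
{
    struct cpu_state cpu = {0};
    int a = slot_enter(&cpu, 1); /* level 1, slot 0 */
    int b = slot_enter(&cpu, 2); /* level 2, slot 1 */
    int c;

    slot_exit(&cpu, a, 1);       /* level drops to 1, task 2 still holds slot 1 */
    c = slot_enter(&cpu, 3);     /* level 2 again: slot 1 is reused */
    slot_exit(&cpu, c, 3);
    slot_exit(&cpu, b, 2);
    return cpu.collisions;
}

/* Preemption disabled: each task finishes its section before the next starts. */
int run_nonpreemptible(void)
{
    struct cpu_state cpu = {0};
    int task;

    for (task = 1; task <= 3; task++) {
        int s = slot_enter(&cpu, task);
        slot_exit(&cpu, s, task);
    }
    return cpu.collisions;
}
```

      In the preemptible ordering, task 3 is handed the slot that task 2 is
      still using, which mirrors two tasks sharing one perf_sample_data
      buffer. When sections cannot interleave this way, the counter always
      maps each active user to its own slot.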
      
      [1] https://github.com/cilium/cilium/issues/26756
      Cc: stable@vger.kernel.org
      Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
      Closes: https://github.com/cilium/cilium/issues/26756
      Fixes: 2a916f2f ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
      Acked-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Disable preemption in bpf_perf_event_output · f2c67a3e
      Jiri Olsa authored
      The nesting protection in bpf_perf_event_output relies on disabled
      preemption, which is guaranteed for kprobes and tracepoints.
      
      However, bpf_perf_event_output can also be called from uprobe context
      through the bpf_prog_run_array_sleepable function, which disables
      migration but keeps preemption enabled.
      
      This can cause a task to be preempted by another one inside the nesting
      protection, eventually leading to two tasks using the same
      perf_sample_data buffer and causing crashes like:
      
        kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
        BUG: unable to handle page fault for address: ffffffff82be3eea
        ...
        Call Trace:
         ? __die+0x1f/0x70
         ? page_fault_oops+0x176/0x4d0
         ? exc_page_fault+0x132/0x230
         ? asm_exc_page_fault+0x22/0x30
         ? perf_output_sample+0x12b/0x910
         ? perf_event_output+0xd0/0x1d0
         ? bpf_perf_event_output+0x162/0x1d0
         ? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
         ? __uprobe_perf_func+0x12b/0x540
         ? uprobe_dispatcher+0x2c4/0x430
         ? uprobe_notify_resume+0x2da/0xce0
         ? atomic_notifier_call_chain+0x7b/0x110
         ? exit_to_user_mode_prepare+0x13e/0x290
         ? irqentry_exit_to_user_mode+0x5/0x30
         ? asm_exc_int3+0x35/0x40
      
      Fix this by disabling preemption in bpf_perf_event_output.
      
      Cc: stable@vger.kernel.org
      Fixes: 8c7dcb84 ("bpf: implement sleepable uprobes by chaining gps")
      Acked-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. 25 Jul, 2023 8 commits
  5. 24 Jul, 2023 15 commits
  6. 23 Jul, 2023 1 commit
  7. 22 Jul, 2023 1 commit
  8. 21 Jul, 2023 3 commits
  9. 20 Jul, 2023 7 commits