1. 31 Jul, 2023 3 commits
    • Merge branch 'Two fixes for cpu-map' · 4c9fbff5
      Martin KaFai Lau authored
      Hou Tao says:
      
      ====================
      
      The patchset fixes two reported warnings in cpu-map when running
      xdp_redirect_cpu and some RT threads concurrently. Patch #1 fixes
      the warning in __cpu_map_ring_cleanup() when kthread is stopped
      prematurely. Patch #2 fixes the warning in __xdp_return() when
      there are pending skbs in ptr_ring.
      
      Please see individual patches for more details. And comments are always
      welcome.
      
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf, cpumap: Handle skb as well when clean up ptr_ring · 7c62b75c
      Hou Tao authored
      The following warning was reported when running xdp_redirect_cpu with
      both skb-mode and stress-mode enabled:
      
        ------------[ cut here ]------------
        Incorrect XDP memory type (-2128176192) usage
        WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
        Modules linked in:
        CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G  6.5.0-rc2+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
        Workqueue: events __cpu_map_entry_free
        RIP: 0010:__xdp_return+0x1e4/0x4a0
        ......
        Call Trace:
         <TASK>
         ? show_regs+0x65/0x70
         ? __warn+0xa5/0x240
         ? __xdp_return+0x1e4/0x4a0
         ......
         xdp_return_frame+0x4d/0x150
         __cpu_map_entry_free+0xf9/0x230
         process_one_work+0x6b0/0xb80
         worker_thread+0x96/0x720
         kthread+0x1a5/0x1f0
         ret_from_fork+0x3a/0x70
         ret_from_fork_asm+0x1b/0x30
         </TASK>
      
      The reason for the warning is twofold. One is that the kthread
      cpu_map_kthread_run() is stopped prematurely. The other is that
      __cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in
      ptr_ring as XDP frames.
      
      The prematurely stopped kthread is fixed by the preceding patch, after
      which ptr_ring will be empty when __cpu_map_ring_cleanup() is called.
      But as the comment in __cpu_map_ring_cleanup() says, handle and free
      skbs in ptr_ring as well to "catch any broken behaviour gracefully".
      
      Fixes: 11941f8a ("bpf: cpumap: Implement generic cpumap")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
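The cleanup described in the commit above can be illustrated with a small user-space sketch. It assumes that skb entries are told apart from XDP frame entries by a tag in the lowest pointer bit; that tagging scheme, and every name below (`fake_skb`, `fake_frame`, `tag_skb`, `ring_cleanup`), is an assumption of this sketch rather than something stated in the commit message. The point it shows is only that a drain loop has to pick the release routine that matches each entry's type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct sk_buff and struct xdp_frame. */
struct fake_skb   { int freed; };
struct fake_frame { int returned; };

/* Assumption of this sketch: bit 0 of a queued pointer marks an skb. */
#define SKB_TAG ((uintptr_t)1)

/* Mark a pointer as an skb before it is queued. */
static void *tag_skb(struct fake_skb *skb)
{
    /* The object must be at least 2-byte aligned so bit 0 is free. */
    assert(((uintptr_t)skb & SKB_TAG) == 0);
    return (void *)((uintptr_t)skb | SKB_TAG);
}

static int is_skb(const void *entry)
{
    return ((uintptr_t)entry & SKB_TAG) != 0;
}

/*
 * Drain whatever is left in the ring, releasing each entry with the
 * routine that matches its type. Returns the number of entries drained.
 */
static size_t ring_cleanup(void **ring, size_t n)
{
    size_t drained = 0;

    for (size_t i = 0; i < n; i++) {
        void *entry = ring[i];

        if (!entry)
            continue;

        if (is_skb(entry)) {
            /* Strip the tag before touching the object. */
            struct fake_skb *skb =
                (struct fake_skb *)((uintptr_t)entry & ~SKB_TAG);
            skb->freed = 1;
        } else {
            ((struct fake_frame *)entry)->returned = 1;
        }
        ring[i] = NULL;
        drained++;
    }
    return drained;
}
```

Treating a tagged entry as a frame would dereference a pointer that is off by one byte, which is the kind of type confusion the reported warning points at.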
    • bpf, cpumap: Make sure kthread is running before map update returns · 640a6045
      Hou Tao authored
      The following warning was reported when running stress-mode enabled
      xdp_redirect_cpu with some RT threads:
      
        ------------[ cut here ]------------
        WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135
        CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
        Workqueue: events cpu_map_kthread_stop
        RIP: 0010:put_cpu_map_entry+0xda/0x220
        ......
        Call Trace:
         <TASK>
         ? show_regs+0x65/0x70
         ? __warn+0xa5/0x240
         ......
         ? put_cpu_map_entry+0xda/0x220
         cpu_map_kthread_stop+0x41/0x60
         process_one_work+0x6b0/0xb80
         worker_thread+0x96/0x720
         kthread+0x1a5/0x1f0
         ret_from_fork+0x3a/0x70
         ret_from_fork_asm+0x1b/0x30
         </TASK>
      
      The root cause is the same as commit 43690164 ("bpf: cpumap: Fix memory
      leak in cpu_map_update_elem"). The kthread is stopped prematurely by
      kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call
      cpu_map_kthread_run() at all but XDP program has already queued some
      frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks
      the ptr_ring, it will find it was not emptied and report a warning.
      
      An alternative fix is to use __cpu_map_ring_cleanup() to drop these
      pending frames or skbs when kthread_stop() returns -EINTR, but it may
      confuse the user, because these frames or skbs have been handled
      correctly by XDP program. So instead of dropping these frames or skbs,
      just make sure the per-cpu kthread is running before
      __cpu_map_entry_alloc() returns.
      
      After applying the fix, the error handling for kthread_stop() will be
      unnecessary because it will always return 0, so just remove it.
      
      Fixes: 6710e112 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Pu Lehui <pulehui@huawei.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
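The "make sure the worker is running before the allocation returns" idea from the commit above can be sketched in user space with POSIX threads. This is a loose analogy, not the kernel code: a pthread always runs its start routine, whereas the commit describes a kthread that could be stopped before it ever did. All names here (`struct entry`, `entry_alloc`, `entry_enqueue`, `entry_free`) are hypothetical. The handshake on `running` plays the role of a completion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a map entry and its worker. */
struct entry {
    pthread_t       worker;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            running;   /* set by the worker once it has started */
    bool            stop;      /* set by entry_free() to request exit */
    int             pending;   /* items queued for the worker */
    int             processed; /* items the worker has consumed */
};

static void *worker_fn(void *arg)
{
    struct entry *e = arg;

    pthread_mutex_lock(&e->lock);
    e->running = true;                  /* announce that the worker is live */
    pthread_cond_broadcast(&e->cond);

    /* Exit only when asked to stop AND nothing is left to process. */
    while (!e->stop || e->pending > 0) {
        if (e->pending > 0) {
            e->pending--;
            e->processed++;
        } else {
            pthread_cond_wait(&e->cond, &e->lock);
        }
    }
    pthread_mutex_unlock(&e->lock);
    return NULL;
}

/* Returns 0 only after the worker is known to be running. */
static int entry_alloc(struct entry *e)
{
    e->running = false;
    e->stop = false;
    e->pending = 0;
    e->processed = 0;
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);

    if (pthread_create(&e->worker, NULL, worker_fn, e) != 0) {
        pthread_cond_destroy(&e->cond);
        pthread_mutex_destroy(&e->lock);
        return -1;
    }

    pthread_mutex_lock(&e->lock);
    while (!e->running)                 /* wait for the start handshake */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
    return 0;
}

static void entry_enqueue(struct entry *e, int n)
{
    assert(n >= 0);
    pthread_mutex_lock(&e->lock);
    e->pending += n;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Stops the worker and returns the number of items left unprocessed. */
static int entry_free(struct entry *e)
{
    int left;

    pthread_mutex_lock(&e->lock);
    e->stop = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);

    pthread_join(e->worker, NULL);
    left = e->pending;
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
    return left;
}
```

Because the worker is live before any producer can see the entry, nothing queued later can be stranded when the entry is torn down, so the teardown path needs no "worker never ran" error branch.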
  2. 27 Jul, 2023 1 commit
  3. 26 Jul, 2023 3 commits
    • Merge branch 'bpf-disable-preemption-in-perf_event_output-helpers-code' · aa89592f
      Alexei Starovoitov authored
      Jiri Olsa says:
      
      ====================
      bpf: Disable preemption in perf_event_output helpers code
      
      hi,
      we got a report of a kernel crash [1][3] within the bpf_event_output helper.
      
      The reason is the nesting protection code in bpf_event_output that expects
      disabled preemption, which is not guaranteed for programs executed by
      bpf_prog_run_array_cg.
      
      I managed to reproduce it on the tracing side, where we have the same
      problem in bpf_perf_event_output. The reproducer [2] just creates a busy
      uprobe and calls the bpf_perf_event_output helper a lot.
      
      v3 changes:
        - added acks and fixed 'Fixes' tag style [Hou Tao]
        - added Closes tag to patch 2
      
      v2 changes:
        - I changed 'Fixes' commits to where I saw we switched from preempt_disable
          to migrate_disable, but I'm not completely sure about the patch 2, because
          it was tricky to find, would be nice if somebody could check on that
      
      thanks,
      jirka
      
      [1] https://github.com/cilium/cilium/issues/26756
      [2] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf_output_fix_reproducer&id=8054dcc634121b884c7c331329d61d93351d03b5
      [3] slack:
          [66194.378161] BUG: kernel NULL pointer dereference, address: 0000000000000001
          [66194.378324] #PF: supervisor instruction fetch in kernel mode
          [66194.378447] #PF: error_code(0x0010) - not-present page
          ...
          [66194.378692] Oops: 0010 [#1] PREEMPT SMP NOPTI
          ...
          [66194.380666]  <TASK>
          [66194.380775]  ? perf_output_sample+0x12a/0x9a0
          [66194.380902]  ? finish_task_switch.isra.0+0x81/0x280
          [66194.381024]  ? perf_event_output+0x66/0xa0
          [66194.381148]  ? bpf_event_output+0x13a/0x190
          [66194.381270]  ? bpf_event_output_data+0x22/0x40
          [66194.381391]  ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
          [66194.381519]  ? xa_load+0x87/0xe0
          [66194.381635]  ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
          [66194.381759]  ? release_sock+0x3e/0x90
          [66194.381876]  ? sk_setsockopt+0x1a1/0x12f0
          [66194.381996]  ? udp_pre_connect+0x36/0x50
          [66194.382114]  ? inet_dgram_connect+0x93/0xa0
          [66194.382233]  ? __sys_connect+0xb4/0xe0
          [66194.382353]  ? udp_setsockopt+0x27/0x40
          [66194.382470]  ? __pfx_udp_push_pending_frames+0x10/0x10
          [66194.382593]  ? __sys_setsockopt+0xdf/0x1a0
          [66194.382713]  ? __x64_sys_connect+0xf/0x20
          [66194.382832]  ? do_syscall_64+0x3a/0x90
          [66194.382949]  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
          [66194.383077]  </TASK>
      ---
      ====================
      
      Link: https://lore.kernel.org/r/20230725084206.580930-1-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Disable preemption in bpf_event_output · d62cc390
      Jiri Olsa authored
      We received a report [1] of a kernel crash, which is caused by
      using nesting protection without disabled preemption.
      
      The bpf_event_output can be called by programs executed by the
      bpf_prog_run_array_cg function, which disables migration but
      keeps preemption enabled.
      
      This can cause a task to be preempted by another one inside the
      nesting protection and eventually lead to two tasks using the same
      perf_sample_data buffer, causing crashes like:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000001
        #PF: supervisor instruction fetch in kernel mode
        #PF: error_code(0x0010) - not-present page
        ...
        ? perf_output_sample+0x12a/0x9a0
        ? finish_task_switch.isra.0+0x81/0x280
        ? perf_event_output+0x66/0xa0
        ? bpf_event_output+0x13a/0x190
        ? bpf_event_output_data+0x22/0x40
        ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
        ? xa_load+0x87/0xe0
        ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
        ? release_sock+0x3e/0x90
        ? sk_setsockopt+0x1a1/0x12f0
        ? udp_pre_connect+0x36/0x50
        ? inet_dgram_connect+0x93/0xa0
        ? __sys_connect+0xb4/0xe0
        ? udp_setsockopt+0x27/0x40
        ? __pfx_udp_push_pending_frames+0x10/0x10
        ? __sys_setsockopt+0xdf/0x1a0
        ? __x64_sys_connect+0xf/0x20
        ? do_syscall_64+0x3a/0x90
        ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      Fix this by disabling preemption in bpf_event_output.
      
      [1] https://github.com/cilium/cilium/issues/26756
      Cc: stable@vger.kernel.org
      Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
      Closes: https://github.com/cilium/cilium/issues/26756
      Fixes: 2a916f2f ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
      Acked-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
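The failure mode described in the commit above can be replayed deterministically with a small model. This is a sketch under stated assumptions: one CPU holds a nesting counter and one buffer per nesting level, and a helper claims the buffer matching the level it observes on entry. The names (`cpu_state`, `helper_enter`, `helper_exit`) are hypothetical and the "tasks" are just integers. It shows how an interleaving that is only possible with preemption enabled hands the same buffer to two tasks.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEST 3

/* Hypothetical model of one CPU: a nesting counter plus one buffer per level. */
struct cpu_state {
    int nest_level;
    int owner[MAX_NEST];  /* task id using each buffer, 0 means free */
    int collisions;       /* times a buffer was handed out while in use */
};

/* Enter the helper: bump the level and claim the matching buffer slot. */
static int helper_enter(struct cpu_state *cpu, int task)
{
    int level = ++cpu->nest_level;

    if (level > MAX_NEST) {
        cpu->nest_level--;
        return -1;
    }
    if (cpu->owner[level - 1] != 0)
        cpu->collisions++;            /* someone else is still using it */
    cpu->owner[level - 1] = task;
    return level - 1;
}

/* Leave the helper: release the slot if still owned, then drop the level. */
static void helper_exit(struct cpu_state *cpu, int task, int slot)
{
    assert(slot >= 0 && slot < MAX_NEST);
    if (cpu->owner[slot] == task)
        cpu->owner[slot] = 0;
    cpu->nest_level--;
}

/* Interleaving that needs preemption: exits do not mirror entries. */
static int run_preemptible(void)
{
    struct cpu_state cpu = {0};
    int a, b, c;

    a = helper_enter(&cpu, 1);  /* task 1 takes slot 0 */
    b = helper_enter(&cpu, 2);  /* task 2 preempts task 1, takes slot 1 */
    helper_exit(&cpu, 1, a);    /* task 1 resumes and leaves, level is 1 */
    c = helper_enter(&cpu, 3);  /* task 3 gets slot 1, still used by task 2 */
    helper_exit(&cpu, 3, c);
    helper_exit(&cpu, 2, b);
    return cpu.collisions;
}

/* With preemption disabled each task runs the helper to completion. */
static int run_non_preemptible(void)
{
    struct cpu_state cpu = {0};

    for (int task = 1; task <= 3; task++) {
        int slot = helper_enter(&cpu, task);
        helper_exit(&cpu, task, slot);
    }
    return cpu.collisions;
}
```

The counter scheme is only sound when entries and exits are strictly nested on a CPU, which is what disabling preemption around the helper guarantees.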
    • bpf: Disable preemption in bpf_perf_event_output · f2c67a3e
      Jiri Olsa authored
      The nesting protection in bpf_perf_event_output relies on disabled
      preemption, which is guaranteed for kprobes and tracepoints.
      
      However, bpf_perf_event_output can also be called from uprobes context
      through the bpf_prog_run_array_sleepable function, which disables
      migration but keeps preemption enabled.
      
      This can cause a task to be preempted by another one inside the nesting
      protection and eventually lead to two tasks using the same
      perf_sample_data buffer, causing crashes like:
      
        kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
        BUG: unable to handle page fault for address: ffffffff82be3eea
        ...
        Call Trace:
         ? __die+0x1f/0x70
         ? page_fault_oops+0x176/0x4d0
         ? exc_page_fault+0x132/0x230
         ? asm_exc_page_fault+0x22/0x30
         ? perf_output_sample+0x12b/0x910
         ? perf_event_output+0xd0/0x1d0
         ? bpf_perf_event_output+0x162/0x1d0
         ? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
         ? __uprobe_perf_func+0x12b/0x540
         ? uprobe_dispatcher+0x2c4/0x430
         ? uprobe_notify_resume+0x2da/0xce0
         ? atomic_notifier_call_chain+0x7b/0x110
         ? exit_to_user_mode_prepare+0x13e/0x290
         ? irqentry_exit_to_user_mode+0x5/0x30
         ? asm_exc_int3+0x35/0x40
      
      Fix this by disabling preemption in bpf_perf_event_output.
      
      Cc: stable@vger.kernel.org
      Fixes: 8c7dcb84 ("bpf: implement sleepable uprobes by chaining gps")
      Acked-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. 25 Jul, 2023 8 commits
  5. 24 Jul, 2023 15 commits
  6. 23 Jul, 2023 1 commit
  7. 22 Jul, 2023 1 commit
  8. 21 Jul, 2023 3 commits
  9. 20 Jul, 2023 5 commits
    • Merge tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 57f1f9dd
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from BPF, netfilter, bluetooth and CAN.
      
        Current release - regressions:
      
         - eth: r8169: multiple fixes for PCIe ASPM-related problems
      
         - vrf: fix RCU lockdep splat in output path
      
        Previous releases - regressions:
      
         - gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set
      
         - dsa: mv88e6xxx: do a final check before timing out when polling
      
         - nf_tables: fix sleep in atomic in nft_chain_validate
      
        Previous releases - always broken:
      
         - sched: fix undoing tcf_bind_filter() in multiple classifiers
      
         - bpf, arm64: fix BTI type used for freplace attached functions
      
         - can: gs_usb: fix time stamp counter initialization
      
         - nft_set_pipapo: fix improper element removal (leading to UAF)
      
        Misc:
      
         - net: support STP on bridge in non-root netns, STP prevents packet
           loops so not supporting it results in freezing systems of
           unsuspecting users, and in turn very upset noises being made
      
         - fix kdoc warnings
      
         - annotate various bits of TCP state to prevent data races"
      
      * tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
        net: phy: prevent stale pointer dereference in phy_init()
        tcp: annotate data-races around fastopenq.max_qlen
        tcp: annotate data-races around icsk->icsk_user_timeout
        tcp: annotate data-races around tp->notsent_lowat
        tcp: annotate data-races around rskq_defer_accept
        tcp: annotate data-races around tp->linger2
        tcp: annotate data-races around icsk->icsk_syn_retries
        tcp: annotate data-races around tp->keepalive_probes
        tcp: annotate data-races around tp->keepalive_intvl
        tcp: annotate data-races around tp->keepalive_time
        tcp: annotate data-races around tp->tsoffset
        tcp: annotate data-races around tp->tcp_tx_delay
        Bluetooth: MGMT: Use correct address for memcpy()
        Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014
        Bluetooth: SCO: fix sco_conn related locking and validity issues
        Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
        Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
        Bluetooth: coredump: fix building with coredump disabled
        Bluetooth: ISO: fix iso_conn related locking and validity issues
        Bluetooth: hci_event: call disconnect callback before deleting conn
        ...
    • Merge tag 'for-net-2023-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 75d42b35
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix building with coredump disabled
       - Fix use-after-free in hci_remove_adv_monitor
       - Use RCU for hci_conn_params and iterate safely in hci_sync
       - Fix locking issues on ISO and SCO
       - Fix bluetooth on Intel Macbook 2014
      
      * tag 'for-net-2023-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: MGMT: Use correct address for memcpy()
        Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014
        Bluetooth: SCO: fix sco_conn related locking and validity issues
        Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
        Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
        Bluetooth: coredump: fix building with coredump disabled
        Bluetooth: ISO: fix iso_conn related locking and validity issues
        Bluetooth: hci_event: call disconnect callback before deleting conn
        Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync
      ====================
      
      Link: https://lore.kernel.org/r/20230720190201.446469-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'nf-23-07-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 9b39f758
      Jakub Kicinski authored
      Florian Westphal says:
      
      ====================
      Netfilter fixes for net:
      
      The following patchset contains Netfilter fixes for net:
      
      1. Fix spurious -EEXIST error from userspace due to
         padding holes, this was broken since 4.9 days
         when 'ignore duplicate entries on insert' feature was
         added.
      
      2. Fix a sched-while-atomic bug, present since 5.19.
      
      3. Properly remove elements if they lack an "end range".
         nft userspace always sets an end range attribute, even
         when it's the same as the start, but the abi doesn't
         have such a restriction. Always broken since it was
         added in 5.6, all three from myself.
      
      4 + 5: Bound chain needs to be skipped in netns release
         and on rule flush paths, from Pablo Neira.
      
      * tag 'nf-23-07-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: skip bound chain on rule flush
        netfilter: nf_tables: skip bound chain in netns release path
        netfilter: nft_set_pipapo: fix improper element removal
        netfilter: nf_tables: can't schedule in nft_chain_validate
        netfilter: nf_tables: fix spurious set element insertion failure
      ====================
      
      Link: https://lore.kernel.org/r/20230720165143.30208-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: prevent stale pointer dereference in phy_init() · 1c613bea
      Vladimir Oltean authored
      mdio_bus_init() and phy_driver_register() both have error paths, and if
      those are ever hit, ethtool will have a stale pointer to the
      phy_ethtool_phy_ops stub structure, which references memory from a
      module that failed to load (phylib).
      
      It is probably hard to force an error in this code path even manually,
      but the error teardown path of phy_init() should be the same as
      phy_exit(), which is now simply not the case.
      
      Fixes: 55d8f053 ("net: phy: Register ethtool PHY operations")
      Link: https://lore.kernel.org/netdev/ZLaiJ4G6TaJYGJyU@shell.armlinux.org.uk/
      Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230720000231.1939689-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
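The principle in the commit above, that the error path of an init function should undo exactly what the exit function undoes, is commonly written as a goto-based unwind. The following is a generic sketch, not the phylib code: the three steps, the `fail_step` knob, and all function names are hypothetical and exist only to make the unwind order visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state standing in for three registration steps. */
static bool ops_registered, bus_registered, driver_registered;
static int fail_step;  /* 0: no failure, 1: bus init fails, 2: driver fails */

static int bus_init(void)
{
    if (fail_step == 1)
        return -1;
    bus_registered = true;
    return 0;
}

static void bus_exit(void)
{
    assert(bus_registered);
    bus_registered = false;
}

static int driver_register(void)
{
    if (fail_step == 2)
        return -1;
    driver_registered = true;
    return 0;
}

static void driver_unregister(void)
{
    assert(driver_registered);
    driver_registered = false;
}

static int subsystem_init(void)
{
    int rc;

    ops_registered = true;       /* published first, so undone last */

    rc = bus_init();
    if (rc)
        goto err_ops;

    rc = driver_register();
    if (rc)
        goto err_bus;

    return 0;

err_bus:
    bus_exit();
err_ops:
    ops_registered = false;      /* leave no stale reference behind */
    return rc;
}

/* Teardown in reverse order of init, same steps as the error labels. */
static void subsystem_exit(void)
{
    driver_unregister();
    bus_exit();
    ops_registered = false;
}
```

Each error label undoes one step and falls through to the next, so a failure at any point leaves the same state as a full `subsystem_exit()` would.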
    • Merge branch 'tcp-add-missing-annotations' · 7998c0ad
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      tcp: add missing annotations
      
      This series was inspired by one syzbot (KCSAN) report.
      
      do_tcp_getsockopt() does not lock the socket, we need to
      annotate most of the reads there (and other places as well).
      
      This is a first round, another series will come later.
      ====================
      
      Link: https://lore.kernel.org/r/20230719212857.3943972-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
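Annotating a lockless read, as the series above describes for do_tcp_getsockopt(), usually means pairing a marked read with a marked write. The sketch below uses simplified user-space approximations of READ_ONCE() and WRITE_ONCE(); the real kernel macros do more checking, and these versions rely on the GNU `__typeof__` extension. The `fake_sock` structure and both accessor functions are hypothetical.

```c
#include <assert.h>

/*
 * Simplified approximations of the kernel macros (assumption of this
 * sketch): a volatile access stops the compiler from tearing, fusing,
 * or re-reading the load or store.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical socket with a single tunable field. */
struct fake_sock {
    int keepalive_time;
};

/* Writer side: in the real code this would run with the socket locked. */
static void set_keepalive_time(struct fake_sock *sk, int val)
{
    assert(val >= 0);
    WRITE_ONCE(sk->keepalive_time, val);
}

/* Reader side: no lock held, so the read is marked explicitly. */
static int get_keepalive_time(struct fake_sock *sk)
{
    return READ_ONCE(sk->keepalive_time);
}
```

The markings do not add ordering or atomicity beyond a single access; they document that the race is intentional and keep tools such as KCSAN from reporting it.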