    net: sched: fix memory leak in tcindex_set_parms · 399ab7fe
    Hawkins Jiawei authored
    Syzkaller reports a memory leak as follows:
    ====================================
    BUG: memory leak
    unreferenced object 0xffff88810c287f00 (size 256):
      comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
        [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
        [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
        [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
        [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
        [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
        [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
        [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
        [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
        [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
        [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
        [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
        [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
        [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
        [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
        [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
        [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
        [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
        [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
        [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
        [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
        [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
        [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
        [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
        [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
    ====================================
    
    The kernel uses tcindex_change() to change the properties of
    an existing filter.
    
    The problem is that, during this change, if `old_r` is
    retrieved from `p->perfect`, the kernel uses
    tcindex_alloc_perfect_hash() to allocate new filter results
    and then uses tcindex_filter_result_init() to clear the old
    filter result without destroying its tcf_exts structure,
    which triggers the memory leak above.
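
The leak pattern described above can be illustrated with a small userspace
model. This is only a sketch: every `model_*` name and the `live_allocs`
counter are hypothetical stand-ins, not kernel code, and the assumption that
the init helper zeroes the result before re-initializing its extensions is
ours, based on the description of "clearing" the old result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct model_exts {
    void *actions;              /* models the array owned by a tcf_exts */
};

struct model_result {
    struct model_exts exts;
};

static int live_allocs;         /* outstanding model allocations */

/* Models an exts init helper: allocates the array the exts owns. */
static int model_exts_init(struct model_exts *e)
{
    assert(e != NULL);
    e->actions = calloc(32, sizeof(void *));
    if (!e->actions)
        return -1;
    live_allocs++;
    return 0;
}

/* Models an exts destroy helper: releases the owned array. */
static void model_exts_destroy(struct model_exts *e)
{
    if (e->actions) {
        free(e->actions);
        e->actions = NULL;
        live_allocs--;
    }
}

/*
 * Leaky pattern (assumed): wipe the result, then initialize it again.
 * The pointer to the previously allocated array is overwritten without
 * being freed, so that allocation becomes unreachable.
 */
static int model_result_init_leaky(struct model_result *r)
{
    memset(r, 0, sizeof(*r));   /* old r->exts.actions is lost here */
    return model_exts_init(&r->exts);
}
```

Calling `model_result_init_leaky()` on a result that was already initialized
leaves `live_allocs` one higher than the number of reachable arrays, which is
the userspace analogue of the unreferenced object in the report.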
    
    More specifically, according to tcindex_lookup(), there are only
    two sources for `old_r`: it is retrieved either from `p->perfect`
    or from `p->h`.
    
      * If `old_r` is retrieved from `p->perfect`, the kernel uses
    tcindex_alloc_perfect_hash() to allocate new filter results.
    `r` is then assigned `cp->perfect + handle`, which is newly
    allocated. So the condition `old_r && old_r != r` is true in
    this situation, and the kernel uses tcindex_filter_result_init()
    to clear the old filter result without destroying its
    tcf_exts structure.
    
      * If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
    according to tcindex_lookup(). Because `cp->h` is copied directly
    from `p->h` and `p->perfect` is NULL, `r` is assigned
    `tcindex_lookup(cp, handle)`, whose value should be the same as
    `old_r`. So the condition `old_r && old_r != r` is false in this
    situation, and the kernel does not call
    tcindex_filter_result_init() to clear the old filter result.
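
The two cases can be checked with a simplified userspace model of the lookup
and of the `old_r && old_r != r` test. The `model_*` types and functions are
hypothetical, and the lookup is deliberately reduced (for example, it does not
check whether a perfect-hash slot is set), so treat this as a sketch of the
pointer reasoning only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the tcindex structures. */
struct model_result { int classid; };

struct model_node {
    unsigned key;
    struct model_result result;
    struct model_node *next;
};

struct model_data {
    struct model_result *perfect;   /* perfect-hash array, or NULL */
    struct model_node **h;          /* chained hash table, or NULL */
    unsigned hash;                  /* number of slots */
};

/* Simplified lookup: prefer the perfect array, else walk the hash chain. */
static struct model_result *model_lookup(const struct model_data *p,
                                         unsigned key)
{
    if (p->perfect)
        return key < p->hash ? &p->perfect[key] : NULL;
    if (p->h)
        for (struct model_node *f = p->h[key % p->hash]; f; f = f->next)
            if (f->key == key)
                return &f->result;
    return NULL;
}

/*
 * Returns 1 if `old_r && old_r != r` holds for the copy `cp`, 0 if it does
 * not, and -1 if the model allocation fails.
 */
static int model_old_r_differs(const struct model_data *p, unsigned handle)
{
    struct model_data cp = { .perfect = NULL, .h = p->h, .hash = p->hash };
    struct model_result *old_r, *r;
    int differs;

    assert(p->hash > 0);
    old_r = model_lookup(p, handle);
    if (p->perfect) {
        assert(handle < p->hash);
        /* The copy gets a freshly allocated perfect array. */
        cp.perfect = calloc(cp.hash, sizeof(*cp.perfect));
        if (!cp.perfect)
            return -1;
    }
    r = cp.perfect ? cp.perfect + handle : model_lookup(&cp, handle);
    differs = old_r && old_r != r;
    free(cp.perfect);
    return differs;
}
```

With a perfect array, `r` points into the new allocation and the condition
holds; with a shared hash table, both lookups return the same node and the
condition does not hold.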
    
    So the kernel uses tcindex_filter_result_init() to clear the old
    filter result only when `old_r` is retrieved from `p->perfect`,
    which triggers the memory leak above.
    
    Since the tc_filter_wq workqueue is already used to destroy the
    old tcindex_data, through tcindex_partial_destroy_work(), at the
    end of tcindex_set_parms(), this patch fixes the memory leak by
    removing the code that clears the old filter result and
    delegating that cleanup to the tc_filter_wq workqueue.
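
The delegation idea can be sketched in userspace as follows. All `model_*`
names are hypothetical, and the "deferred" pointer merely stands in for work
queued on tc_filter_wq; the real teardown runs asynchronously in the kernel
and is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel definitions. */
struct model_exts { void *actions; };
struct model_result { struct model_exts exts; };
struct model_data {
    struct model_result *perfect;   /* perfect-hash array of results */
    unsigned hash;                  /* number of initialized results */
};

static int live_allocs;             /* outstanding exts arrays */

/* Models the deferred teardown: destroy every exts, then free the data. */
static void model_partial_destroy_work(struct model_data *old)
{
    for (unsigned i = 0; i < old->hash; i++) {
        free(old->perfect[i].exts.actions);
        live_allocs--;
    }
    free(old->perfect);
    free(old);
}

/* Allocates model data with `hash` results, each owning one exts array. */
static struct model_data *model_data_alloc(unsigned hash)
{
    struct model_data *p;

    assert(hash > 0);
    p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->perfect = calloc(hash, sizeof(*p->perfect));
    if (!p->perfect) {
        free(p);
        return NULL;
    }
    for (unsigned i = 0; i < hash; i++) {
        void *a = calloc(32, sizeof(void *));
        if (!a) {
            p->hash = i;            /* only tear down what was set up */
            model_partial_destroy_work(p);
            return NULL;
        }
        p->perfect[i].exts.actions = a;
        live_allocs++;
    }
    p->hash = hash;
    return p;
}

/*
 * Models the fixed flow: build the replacement and hand the old data, left
 * untouched, to deferred cleanup instead of clearing its results in place.
 */
static struct model_data *model_set_parms(struct model_data *old,
                                          struct model_data **deferred)
{
    struct model_data *cp = model_data_alloc(old->hash);

    if (!cp)
        return NULL;
    *deferred = old;                /* stands in for queuing on tc_filter_wq */
    return cp;
}
```

Because the old results are never wiped before the deferred teardown runs,
each exts array is still reachable when `model_partial_destroy_work()` frees
it, and `live_allocs` returns to the count owned by the new data.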
    
    Note that this patch does not introduce any other issues. If
    `old_r` is retrieved from `p->perfect`, this patch just
    delegates clearing of the old filter result to the
    tc_filter_wq workqueue; if `old_r` is retrieved from `p->h`,
    the kernel never reaches the code that clears the old filter
    result, so removing it has no effect.
    
    [Thanks to the suggestions from Jakub Kicinski, Cong Wang,
    Paolo Abeni and Dmitry Vyukov]
    
    Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()")
    Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
    Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
    Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
    Cc: Cong Wang <cong.wang@bytedance.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>