1. 18 Apr, 2023 35 commits
  2. 16 Apr, 2023 5 commits
    • Andrew Morton's avatar
    • Peter Xu's avatar
      Revert "userfaultfd: don't fail on unrecognized features" · 2ff559f3
      Peter Xu authored
      This is a proposal to revert commit 914eedcb.
      
      I found this when writing a simple UFFDIO_API test to be the first unit
      test in this set.  Two things breaks with the commit:
      
        - UFFDIO_API check was lost and missing.  According to man page, the
        kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa.  This
        check is needed if the api version will be extended in the future, or
        user app won't be able to identify which is a new kernel.
      
        - Feature flags checks were removed, which means UFFDIO_API with a
        feature that does not exist will also succeed.  According to the man
        page, we should (and it makes sense) to reject ioctl(UFFDIO_API) if
        unknown features passed in.
      
      Link: https://lore.kernel.org/r/20220722201513.1624158-1-axelrasmussen@google.com
      Link: https://lkml.kernel.org/r/20230412163922.327282-2-peterx@redhat.com
      Fixes: 914eedcb ("userfaultfd: don't fail on unrecognized features")
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2ff559f3
    • Baokun Li's avatar
      writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs · 1ba1199e
      Baokun Li authored
      KASAN report null-ptr-deref:
      ==================================================================
      BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
      Write of size 8 at addr 0000000000000000 by task sync/943
      CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
      Call Trace:
       <TASK>
       dump_stack_lvl+0x7f/0xc0
       print_report+0x2ba/0x340
       kasan_report+0xc4/0x120
       kasan_check_range+0x1b7/0x2e0
       __kasan_check_write+0x24/0x40
       bdi_split_work_to_wbs+0x5c5/0x7b0
       sync_inodes_sb+0x195/0x630
       sync_inodes_one_sb+0x3a/0x50
       iterate_supers+0x106/0x1b0
       ksys_sync+0x98/0x160
      [...]
      ==================================================================
      
      The race that causes the above issue is as follows:
      
                 cpu1                     cpu2
      -------------------------|-------------------------
      inode_switch_wbs
       INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
       queue_rcu_work(isw_wq, &isw->work)
       // queue_work async
        inode_switch_wbs_work_fn
         wb_put_many(old_wb, nr_switched)
          percpu_ref_put_many
           ref->data->release(ref)
           cgwb_release
            queue_work(cgwb_release_wq, &wb->release_work)
            // queue_work async
             &wb->release_work
             cgwb_release_workfn
                                  ksys_sync
                                   iterate_supers
                                    sync_inodes_one_sb
                                     sync_inodes_sb
                                      bdi_split_work_to_wbs
                                       kmalloc(sizeof(*work), GFP_ATOMIC)
                                       // alloc memory failed
              percpu_ref_exit
               ref->data = NULL
               kfree(data)
                                       wb_get(wb)
                                        percpu_ref_get(&wb->refcnt)
                                         percpu_ref_get_many(ref, 1)
                                          atomic_long_add(nr, &ref->data->count)
                                           atomic64_add(i, v)
                                           // trigger null-ptr-deref
      
      bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
      wbs.  If the allocation of new work fails, the on-stack fallback will be
      used and the reference count of the current wb is increased afterwards. 
      If cgroup writeback membership switches occur before getting the reference
      count and the current wb is released as old_wd, then calling wb_get() or
      wb_put() will trigger the null pointer dereference above.
      
      This issue was introduced in v4.3-rc7 (see fix tag1).  Both
      sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
      bdi_split_work_to_wbs() can trigger this issue.  For scenarios called via
      sync_inodes_sb(), originally commit 7fc5854f ("writeback: synchronize
      sync(2) against cgroup writeback membership switches") reduced the
      possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
      fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
      inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
      thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
      and the issue becomes easily reproducible again.
      
      To solve this problem, percpu_ref_exit() is called under RCU protection to
      avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs(). 
      Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
      and skip the current wb if wb_tryget() fails because the wb has already
      been shutdown.
      
      Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
      Fixes: b817525a ("writeback: bdi_writeback iteration must not skip dying ones")
      Signed-off-by: default avatarBaokun Li <libaokun1@huawei.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Hou Tao <houtao1@huawei.com>
      Cc: yangerkun <yangerkun@huawei.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      1ba1199e
    • Peng Zhang's avatar
      maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug · 1f5f12ec
      Peng Zhang authored
      In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
      node_count field of the new node, but the node may not be a new node.  It
      may be a node that existed before and node_count has a value, setting it
      to 0 will cause a memory leak.  At this time, mas->alloc->total will be
      greater than the actual number of nodes in the linked list, which may
      cause many other errors.  For example, out-of-bounds access in
      mas_pop_node(), and mas_pop_node() may return addresses that should not be
      used.  Fix it by initializing node_count only for new nodes.
      
      Also, by the way, an if-else statement was removed to simplify the code.
      
      Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com
      Fixes: 54a611b6 ("Maple Tree: add new data structure")
      Signed-off-by: default avatarPeng Zhang <zhangpeng.00@bytedance.com>
      Reviewed-by: default avatarLiam R. Howlett <Liam.Howlett@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      1f5f12ec
    • Steve Chou's avatar
      tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used · 92357568
      Steve Chou authored
      When using cull option with 'tg' flag, the fprintf is using pid instead
      of tgid. It should use tgid instead.
      
      Link: https://lkml.kernel.org/r/20230411034929.2071501-1-steve_chou@pesi.com.tw
      Fixes: 9c8a0a8e ("tools/vm/page_owner_sort.c: support for user-defined culling rules")
      Signed-off-by: default avatarSteve Chou <steve_chou@pesi.com.tw>
      Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      92357568