• Baokun Li's avatar
    cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie() · 5d8f8057
    Baokun Li authored
    We got the following issue in our fault injection stress test:
    
    ==================================================================
    BUG: KASAN: slab-use-after-free in cachefiles_withdraw_cookie+0x4d9/0x600
    Read of size 8 at addr ffff888118efc000 by task kworker/u78:0/109
    
    CPU: 13 PID: 109 Comm: kworker/u78:0 Not tainted 6.8.0-dirty #566
    Call Trace:
     <TASK>
     kasan_report+0x93/0xc0
     cachefiles_withdraw_cookie+0x4d9/0x600
     fscache_cookie_state_machine+0x5c8/0x1230
     fscache_cookie_worker+0x91/0x1c0
     process_one_work+0x7fa/0x1800
     [...]
    
    Allocated by task 117:
     kmalloc_trace+0x1b3/0x3c0
     cachefiles_acquire_volume+0xf3/0x9c0
     fscache_create_volume_work+0x97/0x150
     process_one_work+0x7fa/0x1800
     [...]
    
    Freed by task 120301:
     kfree+0xf1/0x2c0
     cachefiles_withdraw_cache+0x3fa/0x920
     cachefiles_put_unbind_pincount+0x1f6/0x250
     cachefiles_daemon_release+0x13b/0x290
     __fput+0x204/0xa00
     task_work_run+0x139/0x230
     do_exit+0x87a/0x29b0
     [...]
    ==================================================================
    
    Following is the process that triggers the issue:
    
               p1                |             p2
    ------------------------------------------------------------
                                  fscache_begin_lookup
                                   fscache_begin_volume_access
                                    fscache_cache_is_live(fscache_cache)
    cachefiles_daemon_release
     cachefiles_put_unbind_pincount
      cachefiles_daemon_unbind
       cachefiles_withdraw_cache
        fscache_withdraw_cache
         fscache_set_cache_state(cache, FSCACHE_CACHE_IS_WITHDRAWN);
        cachefiles_withdraw_objects(cache)
        fscache_wait_for_objects(fscache)
          atomic_read(&fscache_cache->object_count) == 0
                                  fscache_perform_lookup
                                   cachefiles_lookup_cookie
                                    cachefiles_alloc_object
                                     refcount_set(&object->ref, 1);
                                     object->volume = volume
                                     fscache_count_object(vcookie->cache);
                                      atomic_inc(&fscache_cache->object_count)
        cachefiles_withdraw_volumes
         cachefiles_withdraw_volume
          fscache_withdraw_volume
          __cachefiles_free_volume
           kfree(cachefiles_volume)
                                  fscache_cookie_state_machine
                                   cachefiles_withdraw_cookie
                                    cache = object->volume->cache;
                                    // cachefiles_volume UAF !!!
    
    After setting FSCACHE_CACHE_IS_WITHDRAWN, wait for all the cookie lookups
    to complete first, and then wait for fscache_cache->object_count == 0 to
    avoid the cookie exiting after the volume has been freed and triggering
    the above issue. Therefore call fscache_withdraw_volume() before calling
    cachefiles_withdraw_objects().
    
    This way, after setting FSCACHE_CACHE_IS_WITHDRAWN, only the following two
    cases will occur:
    1) fscache_begin_lookup fails in fscache_begin_volume_access().
    2) fscache_withdraw_volume() will ensure that fscache_count_object() has
       been executed before calling fscache_wait_for_objects().
    
    Fixes: fe2140e2 ("cachefiles: Implement volume support")
    Suggested-by: default avatarHou Tao <houtao1@huawei.com>
    Signed-off-by: default avatarBaokun Li <libaokun1@huawei.com>
    Link: https://lore.kernel.org/r/20240628062930.2467993-4-libaokun@huaweicloud.comSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
    5d8f8057
volume.c 3.28 KB