    mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind · fa422b35
    Shiyang Ruan authored
    Currently, if we suddenly remove a PMEM device (by calling unbind) that
    contains FSDAX while programs are still accessing data on this device,
    e.g.:
    ```
     $FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 &
     # $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 &
     echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
    ```
    the system can end up in an unacceptable state:
      1. the device is gone but the mount point still exists, and umount
           fails with "target is busy"
      2. programs hang and cannot be killed
      3. the kernel may crash with a NULL pointer dereference
    
    To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let the filesystem
    know that we are going to remove the whole device, and make sure all
    related processes are notified so that they can exit gracefully.
    
    This patch is inspired by Dan's "mm, dax, pmem: Introduce
    dev_pagemap_failure()"[1].  With the help of the dax_holder and
    ->notify_failure() mechanism, the pmem driver is able to ask the
    filesystem on it to unmap all files in use, and to notify the processes
    that are using those files.
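
    The holder/callback relationship described above can be pictured with a
    small userspace model.  This is only a sketch: every sketch_* name and
    the flag value are made up for illustration and are not the kernel's
    definitions; only the idea (the driver forwards an offset, a length and
    a flag to whoever holds the device) comes from the text.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder value for illustration; not the kernel's definition. */
    #define SKETCH_MF_MEM_PRE_REMOVE (1 << 0)

    struct sketch_dax_dev;

    /* Hypothetical callback table that a holder (the filesystem) registers. */
    struct sketch_holder_ops {
        int (*notify_failure)(struct sketch_dax_dev *dev, uint64_t offset,
                              uint64_t len, int mf_flags);
    };

    struct sketch_dax_dev {
        void *holder;                        /* e.g. the mounted filesystem */
        const struct sketch_holder_ops *ops; /* NULL when nobody holds it */
    };

    /* Driver side: forward the event to the holder, if there is one. */
    static int sketch_holder_notify_failure(struct sketch_dax_dev *dev,
                                            uint64_t offset, uint64_t len,
                                            int mf_flags)
    {
        if (!dev->ops || !dev->ops->notify_failure)
            return -EOPNOTSUPP;
        return dev->ops->notify_failure(dev, offset, len, mf_flags);
    }

    /* Filesystem side: a toy handler that only records what it was told. */
    struct sketch_fs {
        int saw_pre_remove;
        uint64_t offset, len;
    };

    static int sketch_fs_notify_failure(struct sketch_dax_dev *dev,
                                        uint64_t offset, uint64_t len,
                                        int mf_flags)
    {
        struct sketch_fs *fs = dev->holder;

        assert(fs != NULL);
        fs->saw_pre_remove = !!(mf_flags & SKETCH_MF_MEM_PRE_REMOVE);
        fs->offset = offset;
        fs->len = len;
        return 0;
    }

    static const struct sketch_holder_ops sketch_fs_ops = {
        .notify_failure = sketch_fs_notify_failure,
    };
    ```

    Calling sketch_holder_notify_failure(&dev, 0, UINT64_MAX,
    SKETCH_MF_MEM_PRE_REMOVE) on a device whose ops point at sketch_fs_ops
    models a whole-device notification; a device without a holder gets
    -EOPNOTSUPP in this model.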
    
    Call trace:
    trigger unbind
     -> unbind_store()
      -> ... (skip)
       -> devres_release_all()
        -> kill_dax()
         -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
          -> xfs_dax_notify_failure()
          `-> freeze_super()             // freeze (kernel call)
          `-> do xfs rmap
          ` -> mf_dax_kill_procs()
          `  -> collect_procs_fsdax()    // all associated processes
          `  -> unmap_and_kill()
          ` -> invalidate_inode_pages2_range() // drop file's cache
          `-> thaw_super()               // thaw (both kernel & user call)
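
    The ordering in the trace above (freeze, rmap query, kill processes,
    drop cache, thaw) can be modelled in userspace as follows.  This is a
    sketch under stated assumptions: the sketch_* names are hypothetical,
    and the function only records the order of the steps instead of doing
    any real work.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define SKETCH_MAX_STEPS 8

    /* Hypothetical stand-in for the superblock of the mounted filesystem. */
    struct sketch_sb {
        bool frozen;
        const char *steps[SKETCH_MAX_STEPS]; /* recorded order of operations */
        size_t nr_steps;
    };

    static void sketch_record(struct sketch_sb *sb, const char *step)
    {
        assert(sb->nr_steps < SKETCH_MAX_STEPS);
        sb->steps[sb->nr_steps++] = step;
    }

    /* Models only the pre-remove path of the call trace. */
    static int sketch_handle_pre_remove(struct sketch_sb *sb)
    {
        sb->frozen = true;               /* no new dax mappings from here on */
        sketch_record(sb, "freeze");

        sketch_record(sb, "rmap");       /* find the files using the device */
        sketch_record(sb, "kill_procs"); /* unmap and notify their processes */
        sketch_record(sb, "invalidate"); /* drop the files' cached pages */

        sb->frozen = false;
        sketch_record(sb, "thaw");
        return 0;
    }
    ```

    The point of the model is that the freeze brackets everything else:
    nothing between "freeze" and "thaw" can race with a new mapping being
    created.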
    
    Introduce MF_MEM_PRE_REMOVE to let the filesystem know this is a remove
    event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
    new dax mappings from being created.  Do not shut down the filesystem
    directly if the configuration is not supported, or if the failure range
    includes a metadata area.  Make sure all files and processes (not only the
    current process) are handled correctly.  Also drop the cache of associated
    files before the pmem is removed.
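
    One possible reading of the metadata rule above, written as a tiny
    decision function.  This is an interpretation, not the patch's code: the
    sketch_* names are hypothetical, and the assumption that a non-remove
    failure hitting metadata asks for a shutdown is inferred from the
    wording "do not shut down ... directly".

    ```c
    #include <assert.h>
    #include <stdbool.h>

    enum sketch_action {
        SKETCH_HANDLE_FILE,   /* unmap the file and notify its processes */
        SKETCH_SKIP,          /* leave the extent alone, keep scanning */
        SKETCH_WANT_SHUTDOWN, /* mark the filesystem for shutdown */
    };

    /* Decide what to do with one extent found in the affected range. */
    static enum sketch_action sketch_classify(bool pre_remove, bool is_metadata)
    {
        if (!is_metadata)
            return SKETCH_HANDLE_FILE;
        /* Assumption: on a remove event, metadata is not treated as failed. */
        return pre_remove ? SKETCH_SKIP : SKETCH_WANT_SHUTDOWN;
    }

    /* Walks the cases described in the lead-in. */
    static void sketch_self_check(void)
    {
        assert(sketch_classify(true, false) == SKETCH_HANDLE_FILE);
        assert(sketch_classify(false, false) == SKETCH_HANDLE_FILE);
        assert(sketch_classify(true, true) == SKETCH_SKIP);
        assert(sketch_classify(false, true) == SKETCH_WANT_SHUTDOWN);
    }
    ```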
    
    [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
    [2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/

    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
xfs_notify_failure.c 8.37 KB