    fuse: allow filesystems to have precise control over data cache · ad2ba64d
    Kirill Smelkov authored
    On networked filesystems file data can be changed externally.  FUSE
    provides notification messages that let a filesystem tell the kernel
    that metadata or a data region of a file needs to be invalidated in the
    local page cache.  That provides the basis for filesystem implementations
    to invalidate the kernel cache explicitly, based on observed
    filesystem-specific events.
    
    FUSE also has an "automatic" invalidation mode(*), in which the kernel
    automatically invalidates the data cache of a file if it sees an mtime
    change.  It also automatically invalidates the whole data cache of a file
    if it sees the file size change.
    
    The automatic mode has a corresponding capability, FUSE_AUTO_INVAL_DATA.
    However, probably for historical reasons, that capability controls only
    whether an mtime change results in automatic invalidation.  A change in
    file size always results in invalidating the whole data cache of the
    file, regardless of whether FUSE_AUTO_INVAL_DATA was negotiated(+).
    
    The filesystem I am writing[1] represents data arrays stored in a
    networked database as local files suitable for mmap.  It is a read-only
    filesystem: changes to data are committed externally via database
    interfaces, and the filesystem only glues data into contiguous file
    streams suitable for mmap and traditional array processing.  The files
    are big, starting from hundreds of gigabytes.  The files change
    regularly, and frequently by data being appended to their end.  The size
    of the files thus changes frequently.
    
    If a file was accessed locally and some part of its data got into the
    page cache, we want that data to stay cached unless there is memory
    pressure or the corresponding part of the file was actually changed.
    However, the current FUSE behaviour, when it sees a file size change, is
    to invalidate the whole file.  The data cache of the file is thus
    completely lost even on a small size change, even though the filesystem
    server is careful to accurately translate database changes into FUSE
    invalidation messages to the kernel.
    
    Let's fix it: if a filesystem, through the new FUSE_EXPLICIT_INVAL_DATA
    capability, indicates to the kernel that it is fully responsible for data
    cache invalidation, then the kernel won't invalidate the file's data cache
    on size change and will only truncate that cache to the new size in case
    the size decreased.
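
    The rules described in this message can be summarised as a small
    decision function.  What follows is only an illustrative model, not the
    kernel code: the names inval_mode, cache_action and on_attr_update are
    hypothetical, writeback mode is left out, and the sketch assumes that
    FUSE_AUTO_INVAL_DATA and FUSE_EXPLICIT_INVAL_DATA are not in effect at
    the same time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of which invalidation capability was negotiated. */
enum inval_mode {
    INVAL_DEFAULT,   /* neither capability negotiated */
    INVAL_AUTO,      /* FUSE_AUTO_INVAL_DATA negotiated */
    INVAL_EXPLICIT   /* FUSE_EXPLICIT_INVAL_DATA negotiated */
};

/* What the model says should happen to the file's cached data. */
struct cache_action {
    bool truncate_to_new_size; /* drop only cached data past the new size */
    bool invalidate_all;       /* drop the whole data cache of the file */
};

/*
 * Decide the cache action when the kernel observes new attributes.
 * Assumption: writeback mode is not modelled here.
 */
static struct cache_action
on_attr_update(enum inval_mode mode, uint64_t old_size, uint64_t new_size,
               bool mtime_changed)
{
    struct cache_action act = { false, false };

    if (old_size != new_size) {
        if (mode == INVAL_EXPLICIT) {
            /* Filesystem is responsible for invalidation: keep the
             * cache and only cut it when the file shrank. */
            act.truncate_to_new_size = (new_size < old_size);
        } else {
            /* Size change invalidates everything, with or without
             * FUSE_AUTO_INVAL_DATA. */
            act.invalidate_all = true;
        }
    } else if (mode == INVAL_AUTO && mtime_changed) {
        /* Only the automatic mode reacts to an mtime change. */
        act.invalidate_all = true;
    }

    return act;
}
```

    For example, under this model an append that grows a file from 100 to
    200 bytes yields no cache action at all with INVAL_EXPLICIT, while the
    same change with INVAL_DEFAULT or INVAL_AUTO drops the whole cache.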
    
    (*) see 72d0d248 "fuse: add FUSE_AUTO_INVAL_DATA init flag",
    eed2179e "fuse: invalidate inode mapping if mtime changes"
    
    (+) in writeback mode the kernel does not invalidate the data cache on
    file size change, but neither does it allow the filesystem to set the
    size due to an external event (see 8373200b "fuse: Trust kernel i_size
    only")
    
    [1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20

    Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
inode.c 33.7 KB