1. 15 Oct, 2018 6 commits
    • fuse: enable caching of symlinks · 5571f1e6
      Dan Schatzberg authored
      FUSE file reads are cached in the page cache, but symlink reads are
      not. This patch enables FUSE READLINK operations to be cached which
      can improve performance of some FUSE workloads.
      
      In particular, I'm working on a FUSE filesystem for access to source
      code and discovered that about a 10% improvement to build times is
      achieved with this patch (there are a lot of symlinks in the source
      tree).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: only invalidate atime in direct read · 9a2eb24d
      Miklos Szeredi authored
      After sending a synchronous READ request from __fuse_direct_read() we only
      need to invalidate atime; none of the other attributes should be changed by
      a read().
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: don't need GETATTR after every READ · 802dc049
      Miklos Szeredi authored
      If 'auto_inval_data' mode is active, then fuse_file_read_iter() will call
      fuse_update_attributes(), which will check the attribute validity and send
      a GETATTR request if some of the attributes are no longer valid.  The page
      cache is then invalidated if the size or mtime have changed.
      
      Then, if a READ request was sent and reply received (which is the case if
      the data wasn't cached yet, or if the file is opened for O_DIRECT), the
      atime attribute is invalidated.
      
      This will result in the next read() also triggering a GETATTR, ...
      
      This can be fixed by only sending GETATTR if the mode or size are invalid;
      we don't need to do a refresh if only atime is invalid.
      
      More generally, none of the callers of fuse_update_attributes() need an
      up-to-date atime value, so for now just remove STATX_ATIME from the request
      mask when attributes are updated for internal use.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: allow fine grained attr cache invaldation · 2f1e8196
      Miklos Szeredi authored
      This patch adds the infrastructure for more fine grained attribute
      invalidation.  Currently only 'atime' is invalidated separately.
      
      The use of this infrastructure is extended to the statx(2) interface, which
      for now means that if only 'atime' is invalid and STATX_ATIME is not
      specified in the mask argument, then no GETATTR request will be generated.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
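The fine-grained invalidation described in 2f1e8196 and 802dc049 can be pictured as a bitmask of stale attributes that is compared against the caller's request mask. The following is a minimal userspace sketch under that assumption, not the kernel code: the `ATTR_*` constants, `struct attr_cache`, and all function names are hypothetical, and attribute timeouts are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute bits, standing in for the kernel's STATX_* flags. */
enum {
    ATTR_MODE  = 1u << 0,
    ATTR_SIZE  = 1u << 1,
    ATTR_MTIME = 1u << 2,
    ATTR_ATIME = 1u << 3,
    ATTR_ALL   = ATTR_MODE | ATTR_SIZE | ATTR_MTIME | ATTR_ATIME,
};

/* Hypothetical per-inode state: which cached attributes are known stale. */
struct attr_cache {
    uint32_t inval_mask;
};

/* After a READ reply only atime may have changed, so mark just that bit. */
static void invalidate_atime(struct attr_cache *c)
{
    c->inval_mask |= ATTR_ATIME;
}

/* A refresh is needed only if at least one requested attribute is stale. */
static bool needs_getattr(const struct attr_cache *c, uint32_t request_mask)
{
    return (c->inval_mask & request_mask) != 0;
}

/* Internal callers do not need atime, so they drop it from the mask. */
static bool needs_getattr_internal(const struct attr_cache *c)
{
    return needs_getattr(c, ATTR_ALL & ~ATTR_ATIME);
}
```

In this model a read that only invalidates atime no longer forces a refresh on the next internal attribute update, while a caller that explicitly asks for atime still triggers one.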
    • bitops: protect variables in bit_clear_unless() macro · edfa8728
      Miklos Szeredi authored
      Unprotected naming of local variables within bit_clear_unless() can easily
      lead to using the wrong scope.
      
      Noticed this by code review after having hit this issue in set_mask_bits()
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 85ad1d13 ("md: set MD_CHANGE_PENDING in a atomic region")
      Cc: Guoqing Jiang <gqjiang@suse.com>
    • bitops: protect variables in set_mask_bits() macro · 18127429
      Miklos Szeredi authored
      Unprotected naming of local variables within the set_mask_bits() can easily
      lead to using the wrong scope.
      
      Noticed this when "set_mask_bits(&foo->bar, 0, mask)" behaved as no-op.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 00a1a053 ("ext4: atomically set inode->i_flags in ext4_set_inode_flags()")
      Cc: Theodore Ts'o <tytso@mit.edu>
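The scoping problem behind the no-op `set_mask_bits(&foo->bar, 0, mask)` can be reproduced in userspace. The sketch below is a simplified, non-atomic model (no cmpxchg loop) that relies on GNU C statement expressions and `__typeof__`; the macro and function names are hypothetical, and the suffixed local names in the second macro are only one possible way to protect them.

```c
#include <assert.h>

/*
 * Simplified model of an unprotected macro: its locals are named "mask"
 * and "bits", the same names a caller is likely to use for its own variables.
 */
#define SET_MASK_BITS_UNPROTECTED(ptr, _mask, _bits)                 \
({                                                                   \
    const __typeof__(*(ptr)) mask = (_mask), bits = (_bits);         \
    __typeof__(*(ptr)) newval = (*(ptr) & ~mask) | bits;             \
    *(ptr) = newval;                                                 \
    newval;                                                          \
})

/* Same logic, with suffixed locals that callers are unlikely to collide with. */
#define SET_MASK_BITS_PROTECTED(ptr, mask, bits)                     \
({                                                                   \
    const __typeof__(*(ptr)) mask__ = (mask), bits__ = (bits);       \
    __typeof__(*(ptr)) new__ = (*(ptr) & ~mask__) | bits__;          \
    *(ptr) = new__;                                                  \
    new__;                                                           \
})

static unsigned int demo_unprotected(void)
{
    unsigned int flags = 0x1;
    unsigned int mask = 0x4; /* caller's variable shares the macro local's name */

    /* Expands to "mask = (0), bits = (mask)": bits reads the macro's own mask. */
    (void)SET_MASK_BITS_UNPROTECTED(&flags, 0, mask);
    return flags;
}

static unsigned int demo_protected(void)
{
    unsigned int flags = 0x1;
    unsigned int mask = 0x4;

    /* Here "(mask)" still refers to the caller's variable. */
    (void)SET_MASK_BITS_PROTECTED(&flags, 0, mask);
    return flags;
}
```

In the unprotected variant the caller's `mask` is shadowed by the macro's local of the same name before it is read, so `bits` becomes 0 and `flags` is left unchanged.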
  2. 01 Oct, 2018 9 commits
    • fuse: realloc page array · e52a8250
      Miklos Szeredi authored
      Writeback caching currently allocates requests with the maximum number of
      possible pages, while the actual number of pages per request depends on a
      couple of factors that cannot be determined when the request is allocated
      (whether page is already under writeback, whether page is contiguous with
      previous pages already added to a request).
      
      This patch allows such requests to start with no page allocation (all pages
      inline) and grow the page array on demand.
      
      If the max_pages tunable remains the default value, then this will mean
      just one allocation that is the same size as before.  If the tunable is
      larger, then this adds at most 3 additional memory allocations (which is
      generously compensated by the improved performance from the larger
      request).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
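The allocation counts quoted in this commit message are consistent with a growth policy that jumps straight to the default size and then doubles up to the configured limit. The sketch below models only that arithmetic. The default of 32 pages, the doubling rule, and the names `next_capacity()` and `allocations_to_reach()` are assumptions of this example, not taken from the commit message.

```c
#include <assert.h>

/* Assumed default request size in pages (an assumption of this sketch). */
#define DEFAULT_MAX_PAGES 32u

/* Next array capacity: double, but at least the default, at most the limit. */
static unsigned int next_capacity(unsigned int cur, unsigned int limit)
{
    unsigned int want = cur * 2;

    if (want < DEFAULT_MAX_PAGES)
        want = DEFAULT_MAX_PAGES;
    if (want > limit)
        want = limit;
    return want;
}

/* Count the allocations needed to grow from no heap array up to the limit. */
static unsigned int allocations_to_reach(unsigned int limit)
{
    unsigned int cap = 0; /* request starts with inline pages only */
    unsigned int count = 0;

    while (cap < limit) {
        cap = next_capacity(cap, limit);
        count++;
    }
    return count;
}
```

Under these assumptions a limit equal to the default costs a single allocation, and a limit of 256 pages costs four (32, 64, 128, 256), which is one plus three additional allocations.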
    • fuse: add max_pages to init_out · 5da784cc
      Constantine Shulyupin authored
      Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to
      improve performance.
      
      Old RFC with detailed description of the problem and many fixes by Mitsuo
      Hayasaka (mitsuo.hayasaka.hu@hitachi.com):
       - https://lkml.org/lkml/2012/7/5/136
      
      We've encountered performance degradation and fixed it on a big and complex
      virtual environment.
      
      Environment to reproduce degradation and improvement:
      
      1. Add lag to user mode FUSE
      Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in
      passthrough_fh.c
      
      2. Patch user-mode FUSE with a configurable max_pages parameter. The patch
      will be provided later.

      3. Run the test script and perform the test on tmpfs:
      fuse_test()
      {
             cd /tmp
             mkdir -p fusemnt
             passthrough_fh -o max_pages=$1 /tmp/fusemnt
             grep fuse /proc/self/mounts
             dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \
                    count=1K bs=1M 2>&1 | grep -v records
             rm fusemnt/tmp/tmp
             killall passthrough_fh
      }
      
      Test results:
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0
      1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0
      1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s
      
      Obviously with bigger lag the difference between 'before' and 'after'
      will be more significant.
      
      Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136),
      observed improvement from 400-550 to 520-740.
      Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: allocate page array more efficiently · 8a7aa286
      Miklos Szeredi authored
      When allocating page array for a request the array for the page pointers
      and the array for page descriptors are allocated by two separate kmalloc()
      calls.  Merge these into one allocation.
      
      Also instead of initializing the request and the page arrays with memset(),
      use the zeroing allocation variants.
      
      Reserved requests never carry pages (page array size is zero). Make that
      explicit by initializing the page array pointers to NULL and make sure the
      assumption remains true by adding a WARN_ON().
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
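Merging the two arrays into one zeroed allocation can be illustrated in userspace with `calloc()`. This is a sketch of the general technique only: `struct page_desc` and `alloc_page_arrays()` are hypothetical names, and plain `void *` pointers stand in for page pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-page descriptor. */
struct page_desc {
    unsigned int length;
    unsigned int offset;
};

/*
 * Allocate the pointer array and the descriptor array as one zeroed block.
 * The descriptors start right after the last pointer, so a single free()
 * of the returned pointer releases both arrays.
 */
static void **alloc_page_arrays(size_t npages, struct page_desc **descs)
{
    void **pages = calloc(npages, sizeof(void *) + sizeof(struct page_desc));

    *descs = pages ? (struct page_desc *)(pages + npages) : NULL;
    return pages;
}
```

Because the zeroing allocator is used, no separate `memset()` of either array is needed after the call.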
    • fuse: reduce size of struct fuse_inode · ab2257e9
      Miklos Szeredi authored
      Do this by grouping fields used for cached writes and putting them into a
      union with fields used for cached readdir (with obviously no overlap, since
      we don't have hybrid objects).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
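The space saving from such a union can be seen with a small stand-in structure. The field names and types below are hypothetical and chosen only to give the two groups a realistic size. They do not reproduce struct fuse_inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fields used only for cached writes (regular files). */
struct write_state {
    unsigned long writectr;
    void *queued_writes;
    void *writepages;
};

/* Hypothetical fields used only for cached readdir (directories). */
struct readdir_state {
    int cached;
    unsigned long long size;
    unsigned long long pos;
    unsigned long long version;
};

/* Layout with both groups stored side by side. */
struct inode_separate {
    unsigned long long nodeid;
    struct write_state write;
    struct readdir_state rdc;
};

/* Layout with the groups overlapped: an inode is a file or a directory. */
struct inode_union {
    unsigned long long nodeid;
    union {
        struct write_state write;
        struct readdir_state rdc;
    };
};

/* Bytes saved per inode by overlapping the two groups. */
static size_t bytes_saved(void)
{
    return sizeof(struct inode_separate) - sizeof(struct inode_union);
}
```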
    • fuse: use iversion for readdir cache verification · 261aaba7
      Miklos Szeredi authored
      Use the internal iversion counter to make sure modifications of the
      directory through this filesystem are not missed by the mtime check (due to
      mtime granularity).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
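Combined with the mtime check from 7118883b, the verification amounts to comparing a snapshot taken when the cache was filled against the directory's current state, and only when reading from position zero. A minimal sketch, with hypothetical structure and function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the directory attributes used for verification. */
struct dir_snapshot {
    int64_t  mtime_sec;
    long     mtime_nsec;
    uint64_t iversion; /* bumped on every change made through this filesystem */
};

/*
 * Decide whether the cached entries may still be used.
 * The check is only made at the beginning of the directory; later
 * positions keep using the cache regardless.
 */
static bool readdir_cache_valid(const struct dir_snapshot *cached,
                                const struct dir_snapshot *current,
                                uint64_t pos)
{
    if (pos != 0)
        return true;
    if (cached->mtime_sec != current->mtime_sec ||
        cached->mtime_nsec != current->mtime_nsec)
        return false;
    /* Catches changes that happened within one mtime granule. */
    return cached->iversion == current->iversion;
}
```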
    • fuse: use mtime for readdir cache verification · 7118883b
      Miklos Szeredi authored
      Store the modification time of the directory in the cache, obtained before
      starting to fill the cache.
      
      When reading the cache, verify that the directory hasn't changed, by
      checking if current modification time is the same as the one stored in the
      cache.
      
      This only needs to be done when the current file position is at the
      beginning of the directory, as mandated by POSIX.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: add readdir cache version · 3494927e
      Miklos Szeredi authored
      Allow the cache to be invalidated when page(s) have gone missing.  In this
      case increment the version of the cache and reset to an empty state.
      
      Add a version number to the directory stream in struct fuse_file as well,
      indicating the version of the cache it's supposed to be reading.  If the
      cache version doesn't match the stream's version, then reset the stream to
      the beginning of the cache.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
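The versioning scheme can be sketched as two counters, one on the cache and one on each open directory stream. The structures and function names below are hypothetical and omit locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-directory cache state. */
struct readdir_cache {
    uint64_t version; /* incremented whenever the cache is reset */
    size_t   size;    /* bytes of cached entries */
};

/* Hypothetical per-open-file stream state. */
struct dir_stream {
    uint64_t version;   /* cache version this stream was reading */
    size_t   cache_off; /* offset of the stream within the cache */
};

/* Pages went missing: bump the version and return to the empty state. */
static void cache_reset(struct readdir_cache *c)
{
    c->version++;
    c->size = 0;
}

/* Before reading, rewind the stream if the cache changed underneath it. */
static void stream_sync(struct dir_stream *s, const struct readdir_cache *c)
{
    if (s->version != c->version) {
        s->version = c->version;
        s->cache_off = 0;
    }
}
```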
    • fuse: allow using readdir cache · 5d7bc7e8
      Miklos Szeredi authored
      The cache is only used if it's completed, not while it's still being
      filled; this constraint could be lifted later, if it turns out to be
      useful.
      
      Introduce state in struct fuse_file that indicates the position within the
      cache.  After a seek, reset the position to the beginning of the cache and
      search the cache for the current position.  If the current position is not
      found in the cache, then fall back to uncached readdir.
      
      It can also happen that page(s) disappear from the cache, in which case we
      must also fall back to uncached readdir.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: allow caching readdir · 69e34551
      Miklos Szeredi authored
      This patch just adds the cache filling functions, which are invoked if
      FOPEN_CACHE_DIR flag is set in the OPENDIR reply.
      
      Cache reading and cache invalidation are added by subsequent patches.
      
      The directory cache uses the page cache.  Directory entries are packed into
      a page in the same format as in the READDIR reply.  A page only contains
      whole entries, the space at the end of the page is cleared.  The page is
      locked while being modified.
      
      Multiple parallel readdirs on the same directory can fill the cache; the
      only constraint is that continuity must be maintained (d_off of last entry
      points to position of current entry).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
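The packing rule (whole entries only, cleared tail) can be shown with a fixed-size buffer standing in for a page. This is an illustrative sketch: the 4096-byte page size, `struct cache_page`, and the function names are assumptions, and entries are treated as opaque byte records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096u /* assumed page size */

/* Hypothetical stand-in for one page of the directory cache. */
struct cache_page {
    unsigned char data[PAGE_BYTES];
    size_t used;
};

/* Start with an all-zero page so the unused tail is always cleared. */
static void page_init(struct cache_page *p)
{
    memset(p, 0, sizeof(*p));
}

/*
 * Append one whole entry. If it does not fit in the remaining space,
 * nothing is written and the caller is expected to continue in a new page.
 */
static bool page_add_entry(struct cache_page *p, const void *rec, size_t len)
{
    if (len > PAGE_BYTES - p->used)
        return false;
    memcpy(p->data + p->used, rec, len);
    p->used += len;
    return true;
}
```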
  3. 28 Sep, 2018 16 commits
  4. 23 Sep, 2018 7 commits
  5. 22 Sep, 2018 1 commit
    • block: use nanosecond resolution for iostat · b57e99b4
      Omar Sandoval authored
      Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
      updating properly on 4.18. This is because we started using ktime to
      track elapsed time, and we convert nanoseconds to jiffies when we update
      the partition counter. However, this gets rounded down, so any I/Os that
      take less than a jiffy are not accounted for. Previously in this case,
      the value of jiffies would sometimes increment while we were doing I/O,
      so at least some I/Os were accounted for.
      
      Let's convert the stats to use nanoseconds internally. We still report
      milliseconds as before, now more accurately than ever. The value is
      still truncated to 32 bits for backwards compatibility.
      
      Fixes: 522a7775 ("block: consolidate struct request timestamp fields")
      Cc: stable@vger.kernel.org
      Reported-by: Klaus Kusche <klaus.kusche@computerix.info>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
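The rounding loss described in this commit is easy to reproduce with plain integer arithmetic. The sketch below compares per-I/O conversion to jiffies with accumulation in nanoseconds. The tick rate of 250 Hz and the function names are assumptions of this example, and all I/Os are taken to have the same duration for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define HZ            250u          /* assumed tick rate: one jiffy = 4 ms */
#define NSEC_PER_SEC  1000000000ull
#define NSEC_PER_MSEC 1000000ull

/* Convert each I/O to whole jiffies before accumulating; report ms. */
static uint32_t busy_ms_via_jiffies(uint64_t io_ns, unsigned int count)
{
    uint64_t jiffies = 0;

    for (unsigned int i = 0; i < count; i++)
        jiffies += io_ns / (NSEC_PER_SEC / HZ); /* rounds down, often to 0 */
    return (uint32_t)(jiffies * (1000u / HZ));
}

/* Accumulate nanoseconds and convert once when reporting; report ms. */
static uint32_t busy_ms_via_ns(uint64_t io_ns, unsigned int count)
{
    uint64_t total_ns = 0;

    for (unsigned int i = 0; i < count; i++)
        total_ns += io_ns;
    return (uint32_t)(total_ns / NSEC_PER_MSEC); /* truncated to 32 bits */
}
```

With 1000 I/Os of 100 microseconds each, the first variant reports 0 ms of busy time while the second reports 100 ms.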
  6. 21 Sep, 2018 1 commit