1. 15 Oct, 2018 3 commits
  2. 01 Oct, 2018 9 commits
    • Miklos Szeredi's avatar
      fuse: realloc page array · e52a8250
      Miklos Szeredi authored
      Writeback caching currently allocates requests with the maximum number of
      possible pages, while the actual number of pages per request depends on a
      couple of factors that cannot be determined when the request is allocated
      (whether page is already under writeback, whether page is contiguous with
      previous pages already added to a request).
      
      This patch allows such requests to start with no page allocation (all pages
      inline) and grow the page array on demand.
      
      If the max_pages tunable remains the default value, then this will mean
      just one allocation that is the same size as before.  If the tunable is
      larger, then this adds at most 3 additional memory allocations (which is
      generously compensated by the improved performance from the larger
      request).
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      e52a8250
    • Constantine Shulyupin's avatar
      fuse: add max_pages to init_out · 5da784cc
      Constantine Shulyupin authored
      Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to
      improve performance.
      
      Old RFC with detailed description of the problem and many fixes by Mitsuo
      Hayasaka (mitsuo.hayasaka.hu@hitachi.com):
       - https://lkml.org/lkml/2012/7/5/136
      
      We've encountered performance degradation and fixed it on a big and complex
      virtual environment.
      
      Environment to reproduce degradation and improvement:
      
      1. Add lag to user mode FUSE
      Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in
      passthrough_fh.c
      
      2. patch UM fuse with configurable max_pages parameter. The patch will be
      provided latter.
      
      3. run test script and perform test on tmpfs
      fuse_test()
      {
      
             cd /tmp
             mkdir -p fusemnt
             passthrough_fh -o max_pages=$1 /tmp/fusemnt
             grep fuse /proc/self/mounts
             dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \
      		count=1K bs=1M 2>&1 | grep -v records
             rm fusemnt/tmp/tmp
             killall passthrough_fh
      }
      
      Test results:
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0
      1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0
      1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s
      
      Obviously with bigger lag the difference between 'before' and 'after'
      will be more significant.
      
      Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136),
      observed improvement from 400-550 to 520-740.
      Signed-off-by: default avatarConstantine Shulyupin <const@MakeLinux.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      5da784cc
    • Miklos Szeredi's avatar
      fuse: allocate page array more efficiently · 8a7aa286
      Miklos Szeredi authored
      When allocating page array for a request the array for the page pointers
      and the array for page descriptors are allocated by two separate kmalloc()
      calls.  Merge these into one allocation.
      
      Also instead of initializing the request and the page arrays with memset(),
      use the zeroing allocation variants.
      
      Reserved requests never carry pages (page array size is zero). Make that
      explicit by initializing the page array pointers to NULL and make sure the
      assumption remains true by adding a WARN_ON().
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      8a7aa286
    • Miklos Szeredi's avatar
      fuse: reduce size of struct fuse_inode · ab2257e9
      Miklos Szeredi authored
      Do this by grouping fields used for cached writes and putting them into a
      union with fileds used for cached readdir (with obviously no overlap, since
      we don't have hybrid objects).
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      ab2257e9
    • Miklos Szeredi's avatar
      fuse: use iversion for readdir cache verification · 261aaba7
      Miklos Szeredi authored
      Use the internal iversion counter to make sure modifications of the
      directory through this filesystem are not missed by the mtime check (due to
      mtime granularity).
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      261aaba7
    • Miklos Szeredi's avatar
      fuse: use mtime for readdir cache verification · 7118883b
      Miklos Szeredi authored
      Store the modification time of the directory in the cache, obtained before
      starting to fill the cache.
      
      When reading the cache, verify that the directory hasn't changed, by
      checking if current modification time is the same as the one stored in the
      cache.
      
      This only needs to be done when the current file position is at the
      beginning of the directory, as mandated by POSIX.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      7118883b
    • Miklos Szeredi's avatar
      fuse: add readdir cache version · 3494927e
      Miklos Szeredi authored
      Allow the cache to be invalidated when page(s) have gone missing.  In this
      case increment the version of the cache and reset to an empty state.
      
      Add a version number to the directory stream in struct fuse_file as well,
      indicating the version of the cache it's supposed to be reading.  If the
      cache version doesn't match the stream's version, then reset the stream to
      the beginning of the cache.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      3494927e
    • Miklos Szeredi's avatar
      fuse: allow using readdir cache · 5d7bc7e8
      Miklos Szeredi authored
      The cache is only used if it's completed, not while it's still being
      filled; this constraint could be lifted later, if it turns out to be
      useful.
      
      Introduce state in struct fuse_file that indicates the position within the
      cache.  After a seek, reset the position to the beginning of the cache and
      search the cache for the current position.  If the current position is not
      found in the cache, then fall back to uncached readdir.
      
      It can also happen that page(s) disappear from the cache, in which case we
      must also fall back to uncached readdir.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      5d7bc7e8
    • Miklos Szeredi's avatar
      fuse: allow caching readdir · 69e34551
      Miklos Szeredi authored
      This patch just adds the cache filling functions, which are invoked if
      FOPEN_CACHE_DIR flag is set in the OPENDIR reply.
      
      Cache reading and cache invalidation are added by subsequent patches.
      
      The directory cache uses the page cache.  Directory entries are packed into
      a page in the same format as in the READDIR reply.  A page only contains
      whole entries, the space at the end of the page is cleared.  The page is
      locked while being modified.
      
      Multiple parallel readdirs on the same directory can fill the cache; the
      only constraint is that continuity must be maintained (d_off of last entry
      points to position of current entry).
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      69e34551
  3. 28 Sep, 2018 16 commits
  4. 23 Sep, 2018 7 commits
  5. 22 Sep, 2018 1 commit
    • Omar Sandoval's avatar
      block: use nanosecond resolution for iostat · b57e99b4
      Omar Sandoval authored
      Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
      updating properly on 4.18. This is because we started using ktime to
      track elapsed time, and we convert nanoseconds to jiffies when we update
      the partition counter. However, this gets rounded down, so any I/Os that
      take less than a jiffy are not accounted for. Previously in this case,
      the value of jiffies would sometimes increment while we were doing I/O,
      so at least some I/Os were accounted for.
      
      Let's convert the stats to use nanoseconds internally. We still report
      milliseconds as before, now more accurately than ever. The value is
      still truncated to 32 bits for backwards compatibility.
      
      Fixes: 522a7775 ("block: consolidate struct request timestamp fields")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarKlaus Kusche <klaus.kusche@computerix.info>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b57e99b4
  6. 21 Sep, 2018 4 commits
    • Greg Kroah-Hartman's avatar
      Merge tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 10dc890d
      Greg Kroah-Hartman authored
      Linus writes:
        "Pin control fixes for v4.19:
         - Two fixes for the Intel pin controllers than cause
           problems on laptops."
      
      * tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: intel: Do pin translation in other GPIO operations as well
        pinctrl: cannonlake: Fix gpio base for GPP-E
      10dc890d
    • Greg Kroah-Hartman's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a27fb6d9
      Greg Kroah-Hartman authored
      Paolo writes:
        "It's mostly small bugfixes and cleanups, mostly around x86 nested
         virtualization.  One important change, not related to nested
         virtualization, is that the ability for the guest kernel to trap
         CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
         now masked by default.  This is because the feature is detected
         through an MSR; a very bad idea that Intel seems to like more and
         more.  Some applications choke if the other fields of that MSR are
         not initialized as on real hardware, hence we have to disable the
         whole MSR by default, as was the case before Linux 4.12."
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
        KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
        kvm: selftests: Add platform_info_test
        KVM: x86: Control guest reads of MSR_PLATFORM_INFO
        KVM: x86: Turbo bits in MSR_PLATFORM_INFO
        nVMX x86: Check VPID value on vmentry of L2 guests
        nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
        KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
        KVM: VMX: check nested state and CR4.VMXE against SMM
        kvm: x86: make kvm_{load|put}_guest_fpu() static
        x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
        KVM: VMX: use preemption timer to force immediate VMExit
        KVM: VMX: modify preemption timer bit only when arming timer
        KVM: VMX: immediately mark preemption timer expired only for zero value
        KVM: SVM: Switch to bitmap_zalloc()
        KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
        kvm: selftests: use -pthread instead of -lpthread
        KVM: x86: don't reset root in kvm_mmu_setup()
        kvm: mmu: Don't read PDPTEs when paging is not enabled
        x86/kvm/lapic: always disable MMIO interface in x2APIC mode
        KVM: s390: Make huge pages unavailable in ucontrol VMs
        ...
      a27fb6d9
    • Greg Kroah-Hartman's avatar
      Merge tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs · 0eba8697
      Greg Kroah-Hartman authored
      Richard writes:
        "This pull request contains fixes for UBIFS:
         - A wrong UBIFS assertion in mount code
         - Fix for a NULL pointer deref in mount code
         - Revert of a bad fix for xattrs"
      
      * tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs:
        Revert "ubifs: xattr: Don't operate on deleted inodes"
        ubifs: drop false positive assertion
        ubifs: Check for name being NULL while mounting
      0eba8697
    • Greg Kroah-Hartman's avatar
      Merge tag 'for-linus-20180920' of git://git.kernel.dk/linux-block · 211b100a
      Greg Kroah-Hartman authored
      Jens writes:
        "Storage fixes for 4.19-rc5
      
        - Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)
      
        - NVMe pull request from Christoph, and a single ANA log page fix
          (Hannes)
      
        - Regression fix for libata qd32 support, where we trigger an illegal
          active command transition. This fixes a CD-ROM detection issue that
          was reported, but could also trigger premature completion of the
          internal tag (me)"
      
      * tag 'for-linus-20180920' of git://git.kernel.dk/linux-block:
        floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
        libata: mask swap internal and hardware tag
        nvme: count all ANA groups for ANA Log page
      211b100a