1. 01 Feb, 2013 8 commits
    • Btrfs: reduce CPU contention while waiting for delayed extent operations · bb721703
      Chris Mason authored
      We batch up operations to the extent allocation tree, which allows
      us to deal with the recursive nature of using the extent allocation
      tree to allocate extents to the extent allocation tree.
      
      It also provides a mechanism to sort and collect extent
      operations, which makes it much more efficient to record extents
      that are close together.
      
      The delayed extent operations must all be finished before the
      running transaction commits, so we have code that makes sure to run a
      few of the batched operations when closing our transaction handles.
      
      This creates a great deal of contention for the locks in the
      delayed extent operation tree, and also contention for the lock on the
      extent allocation tree itself.  All the extra contention just slows
      down the operations and doesn't get things done any faster.
      
      This commit changes things to use a wait queue instead.  As processes
      want to run the delayed operations, one of them races in and gets
      permission to hit the tree, and the others step back and wait for
      progress to be made.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
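The wait-queue idea in this commit message can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel code: the class `DelayedOpRunner`, its `pending` counter, and the `batch` parameter are hypothetical names chosen for illustration, and a `threading.Condition` stands in for the kernel wait queue.

```python
import threading


class DelayedOpRunner:
    """Toy model: one caller at a time runs batched operations; others wait."""

    def __init__(self, pending):
        self.pending = pending      # operations still queued (hypothetical counter)
        self.batches_run = 0
        self._running = False
        self._cond = threading.Condition()

    def run_some(self, batch=4):
        """Return True if this caller ran a batch, False if it only waited."""
        with self._cond:
            if self._running:
                # Someone else already has permission: step back and wait
                # for progress instead of fighting over the tree locks.
                self._cond.wait_for(lambda: not self._running)
                return False
            self._running = True

        # Only the single winner gets here, so the "tree" work is uncontended.
        done = min(batch, self.pending)

        with self._cond:
            self.pending -= done
            self.batches_run += 1
            self._running = False
            self._cond.notify_all()  # wake the waiters: progress was made
        return True
```

Callers that lose the race return without touching the shared state at all, which is the point of the change: waiting is cheaper than contending for locks that only one caller can usefully hold.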
    • Btrfs: reduce lock contention on extent buffer locks · 242e18c7
      Chris Mason authored
      The extent buffers have a refs_lock which we use to coordinate freeing
      the extent buffer with operations on the radix tree.  On tree roots and
      other extent buffers that are very cache hot, this can be highly contended.
      
      These are also the extent buffers that are basically pinned in memory.
      This commit adds code to cmpxchg our way through the ref modifications,
      and as long as the result of the reference change is still pinned in
      RAM, we skip the expensive spinlock.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
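A rough model of the lock-free fast path described above, written in Python for illustration only. Python has no hardware compare-and-exchange, so the `AtomicInt` helper below merely simulates one with an internal lock. The threshold `PINNED_REFS` and all class and method names are assumptions, not taken from the commit.

```python
import threading

PINNED_REFS = 3  # hypothetical threshold: above this the buffer counts as pinned


class AtomicInt:
    """Stand-in for a hardware atomic; the guard lock only models atomicity."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def cmpxchg(self, expected, new):
        """Store `new` only if the value equals `expected`; return the old value."""
        with self._guard:
            old = self._value
            if old == expected:
                self._value = new
            return old

    def add(self, delta):
        with self._guard:
            self._value += delta
            return self._value


class ExtentBuffer:
    def __init__(self, refs):
        self.refs = AtomicInt(refs)
        self.refs_lock = threading.Lock()  # the "expensive" lock
        self.slow_path_hits = 0
        self.freed = False

    def put(self):
        """Drop one reference, taking refs_lock only when the buffer may be freed."""
        while True:
            old = self.refs.load()
            if old <= PINNED_REFS:
                break                      # not clearly pinned: use the slow path
            if self.refs.cmpxchg(old, old - 1) == old:
                return                     # fast path: no lock taken
            # Lost a race with another updater: reload and retry.
        with self.refs_lock:
            self.slow_path_hits += 1
            if self.refs.add(-1) == 0:
                self.freed = True
```

Starting from five references, the first two `put()` calls succeed on the fast path, and only the drops that could lead to freeing the buffer take `refs_lock`.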
    • Btrfs: fix cluster alignment for mount -o ssd · 8de972b4
      Chris Mason authored
      With the new raid56 code, we want to make sure we're
      properly aligning our allocation clusters with -o ssd.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: add a plugging callback to raid56 writes · 6ac0f488
      Chris Mason authored
      Buffered writes and DIRECT_IO writes will often break up
      big contiguous changes to the file into sub-stripe writes.
      
      This adds a plugging callback to gather those smaller writes into full
      stripe writes.
      
      Example on flash:
      
      fio job to do 64K writes in batches of 3 (which makes a full stripe):
      
      With plugging: 450MB/s
      Without plugging: 220MB/s
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
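The gathering step can be sketched as follows. This is a simplified Python model, not the kernel callback: it assumes three data disks with 64K per disk (matching the fio example above), that plugged writes do not overlap, and the function name `unplug` is hypothetical.

```python
from collections import defaultdict

STRIPE_LEN = 64 * 1024                  # per-disk stripe size
DATA_DISKS = 3                          # assumption, matching "batches of 3"
FULL_STRIPE = STRIPE_LEN * DATA_DISKS   # bytes of data in one full stripe


def unplug(writes):
    """Group plugged (offset, length) writes by full stripe.

    Returns (full, partial): sorted lists of full-stripe indices that were
    completely covered, and those that were only partly covered.
    Assumes the writes do not overlap each other.
    """
    covered = defaultdict(int)
    for offset, length in writes:
        # Split a write that crosses a full-stripe boundary.
        while length > 0:
            stripe = offset // FULL_STRIPE
            room = (stripe + 1) * FULL_STRIPE - offset
            n = min(length, room)
            covered[stripe] += n
            offset += n
            length -= n
    full = sorted(s for s, n in covered.items() if n == FULL_STRIPE)
    partial = sorted(s for s, n in covered.items() if n != FULL_STRIPE)
    return full, partial
```

Stripes that end up in `full` can be written with freshly computed parity and no reads, while those in `partial` still need a read/modify/write cycle.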
    • Btrfs: Add a stripe cache to raid56 · 4ae10b3a
      Chris Mason authored
      The stripe cache allows us to avoid extra read/modify/write cycles
      by caching the pages we read off the disk.  Pages are cached when:
      
      * They are read in during a read/modify/write cycle
      
      * They are written during a read/modify/write cycle
      
      * They are involved in a parity rebuild
      
      Pages are not cached if we're doing a full stripe write.  We're
      assuming that a full stripe write won't be followed by another
      partial stripe write any time soon.
      
      This provides a substantial boost in performance for workloads that
      synchronously modify adjacent offsets in the file, and for the parity
      rebuild use case in general.
      
      The size of the stripe cache isn't tunable (yet) and is set at 1024
      entries.
      
      Example on flash: dd if=/dev/zero of=/mnt/xxx bs=4K oflag=direct
      
      Without the stripe cache: 2.1MB/s
      With the stripe cache: 21MB/s
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
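The caching rules listed above can be modeled in a few lines of Python. This is a sketch under assumptions: the commit message does not state the eviction policy, so least-recently-used eviction is assumed here, and dropping a stale entry on a full stripe write is a design choice of this example. The class and method names are hypothetical.

```python
from collections import OrderedDict


class StripeCache:
    """Toy stripe cache with a fixed number of entries (LRU eviction assumed)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._entries = OrderedDict()   # stripe number -> cached pages

    def lookup(self, stripe):
        """Return cached pages for a stripe, or None on a miss."""
        pages = self._entries.get(stripe)
        if pages is not None:
            self._entries.move_to_end(stripe)   # mark as recently used
        return pages

    def insert(self, stripe, pages, full_stripe_write=False):
        """Cache pages from a read/modify/write or rebuild, not a full write."""
        if full_stripe_write:
            # Full stripe writes are not cached; any older copy is now stale.
            self._entries.pop(stripe, None)
            return
        self._entries[stripe] = pages
        self._entries.move_to_end(stripe)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used
```

A later partial write to a cached stripe can then use `lookup()` to skip the read half of the read/modify/write cycle.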
    • Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse authored
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair CRC errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet).
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
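For readers unfamiliar with the read/modify/write cycle, here is a minimal Python sketch of RAID5-style XOR parity. It is a generic illustration, not the Btrfs implementation: the helper names are hypothetical, blocks are tiny byte strings, and RAID6's second parity is not covered.

```python
def xor_blocks(*blocks):
    """XOR equally sized byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def read_modify_write(stripe, index, new_block):
    """Replace one data block of a stripe and recompute its parity.

    `stripe` holds the data blocks as read from disk. Returns the updated
    data blocks and the new parity block.
    """
    data = list(stripe)
    data[index] = new_block
    return data, xor_blocks(*data)


def rebuild(data_with_hole, parity):
    """Recover the single missing data block (marked None) from parity."""
    present = [b for b in data_with_hole if b is not None]
    return xor_blocks(parity, *present)
```

The exposure mentioned in the commit message follows directly from this: if the new data block reaches disk but the new parity does not (or the reverse), `rebuild()` would later return wrong contents for any block in that stripe.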
    • Btrfs: add rw argument to merge_bio_hook() · 64a16701
      David Woodhouse authored
      We'll want to merge writes so they can fill a full RAID[56] stripe, but
      not necessarily reads.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • btrfs: don't try to notify udev about missing devices · 3c911608
      Eric Sandeen authored
      If we remove a missing device, bdev is null, and if we
      send that off to btrfs_kobject_uevent we'll panic.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  2. 19 Dec, 2012 1 commit
  3. 18 Dec, 2012 1 commit
  4. 17 Dec, 2012 30 commits