  1. 17 Dec, 2015 9 commits
    • Btrfs: add free space tree mount option · 70f6d82e
      Omar Sandoval authored
      Now we can finally hook up everything so we can actually use the free space
      tree. The free space tree is enabled by passing the space_cache=v2 mount
      option. On the first mount with this option set, the free space tree
      will be created and the FREE_SPACE_TREE read-only compat bit will be
      set. Any time the filesystem is mounted from then on, we must use the
      free space tree. The clear_cache option will also clear the free space
      tree.
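The mount-time behaviour described above can be modelled with a short Python sketch. The `Filesystem` class, the `mount()` function, and the bit value are hypothetical names chosen for illustration, and the assumption that clear_cache also drops the compat bit is a guess made for this model, not something the commit message states.

```python
FREE_SPACE_TREE = 1 << 0  # assumed bit position, for illustration only


class Filesystem:
    """Toy stand-in for the on-disk state this commit message talks about."""

    def __init__(self):
        self.compat_ro_flags = 0
        self.free_space_tree = None  # None means "no tree on disk"


def mount(fs, options):
    """Return True if this mount uses the free space tree."""
    opts = set(options)
    has_tree = bool(fs.compat_ro_flags & FREE_SPACE_TREE)

    # clear_cache also clears the free space tree (assumption: the bit goes too).
    if "clear_cache" in opts and has_tree:
        fs.free_space_tree = None
        fs.compat_ro_flags &= ~FREE_SPACE_TREE
        has_tree = False

    # First mount with space_cache=v2: create the tree and set the bit.
    if "space_cache=v2" in opts and not has_tree:
        fs.free_space_tree = {}
        fs.compat_ro_flags |= FREE_SPACE_TREE
        has_tree = True

    # Once the bit is set, every later mount uses the tree, option or not.
    return has_tree
```

In this model, mounting once with `["space_cache=v2"]` makes every later `mount(fs, [])` return `True` until a mount with `["clear_cache"]` removes the tree.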
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: wire up the free space tree to the extent tree · 1e144fb8
      Omar Sandoval authored
      The free space tree is updated in tandem with the extent tree. There are
      only a handful of places where we need to hook in:
      
      1. Block group creation
      2. Block group deletion
      3. Delayed refs (extent creation and deletion)
      4. Block group caching
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: add free space tree sanity tests · 7c55ee0c
      Omar Sandoval authored
      This tests the operations on the free space tree trying to exercise all
      of the main cases for both formats. Between this and xfstests, the free
      space tree should have pretty good coverage.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: implement the free space B-tree · a5ed9182
      Omar Sandoval authored
      The free space cache has turned out to be a scalability bottleneck on
      large, busy filesystems. When the cache for a lot of block groups needs
      to be written out, we can get extremely long commit times; if this
      happens in the critical section, things are especially bad because we
      block new transactions from happening.
      
      The main problem with the free space cache is that it has to be written
      out in its entirety and is managed in an ad hoc fashion. Using a B-tree
      to store free space fixes this: updates can be done as needed and we get
      all of the benefits of using a B-tree: checksumming, RAID handling,
      well-understood behavior.
      
      With the free space tree, we get commit times that are about the same as
      the no cache case with load times slower than the free space cache case
      but still much faster than the no cache case. Free space is represented
      with extents until it becomes more space-efficient to use bitmaps,
      giving us similar space overhead to the free space cache.
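As a rough illustration of the extent-versus-bitmap trade-off described above, the following Python sketch compares the space each format would need for one block group. The 25-byte per-item overhead, the 4096-byte sector size, the 256-byte bitmap chunk, and all function names are assumptions made for this example; they are not the thresholds the kernel actually uses.

```python
ITEM_OVERHEAD_BYTES = 25   # assumed fixed cost of one leaf item (key + header)
SECTORSIZE = 4096          # assumed: one bitmap bit covers one sector
BITMAP_CHUNK_BYTES = 256   # assumed payload size of one bitmap item


def extent_format_bytes(extent_count):
    """Extents are bare keys, so each costs only the item overhead."""
    return extent_count * ITEM_OVERHEAD_BYTES


def bitmap_format_bytes(bg_length):
    """Bitmaps cost one bit per sector plus overhead for each bitmap item."""
    nbits = bg_length // SECTORSIZE
    payload = (nbits + 7) // 8
    nitems = (payload + BITMAP_CHUNK_BYTES - 1) // BITMAP_CHUNK_BYTES
    return payload + nitems * ITEM_OVERHEAD_BYTES


def preferred_format(bg_length, extent_count):
    """Pick whichever representation is smaller for this block group."""
    if extent_format_bytes(extent_count) > bitmap_format_bytes(bg_length):
        return "bitmaps"
    return "extents"
```

Under these assumed sizes, a 1 GiB block group switches to bitmaps only once its free space is fragmented into well over a thousand extents, which is why lightly fragmented block groups stay in the cheaper extent format.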
      
      The operations on the free space tree are: adding and removing free
      space, handling the creation and deletion of block groups, and loading
      the free space for a block group. We can also create the free space tree
      by walking the extent tree and clear the free space tree.
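The operations listed above can be sketched with a small in-memory model in Python. This is a simplified illustration that tracks only the extent format, and it assumes a new block group starts out entirely free. The class and method names are hypothetical and do not correspond to the kernel's functions.

```python
import bisect


class FreeSpaceTreeModel:
    """Per-block-group sorted lists of free (start, length) extents."""

    def __init__(self):
        self.block_groups = {}  # block group start -> sorted extent list

    def add_block_group(self, bg_start, bg_length):
        # Assumption: a freshly created block group is entirely free.
        self.block_groups[bg_start] = [(bg_start, bg_length)]

    def remove_block_group(self, bg_start):
        del self.block_groups[bg_start]

    def load_free_space(self, bg_start):
        return list(self.block_groups[bg_start])

    def add_free_space(self, bg_start, start, length):
        """Insert a free range, merging with adjacent free extents."""
        extents = self.block_groups[bg_start]
        end = start + length
        i = bisect.bisect_left(extents, (start, 0))
        if i > 0 and sum(extents[i - 1]) == start:   # merge with left neighbour
            i -= 1
            start = extents[i][0]
            del extents[i]
        if i < len(extents) and extents[i][0] == end:  # merge with right neighbour
            end += extents[i][1]
            del extents[i]
        extents.insert(i, (start, end - start))

    def remove_free_space(self, bg_start, start, length):
        """Carve an allocated range out of the free extent that contains it."""
        extents = self.block_groups[bg_start]
        end = start + length
        i = bisect.bisect_right(extents, (start, float("inf"))) - 1
        if i < 0 or end > sum(extents[i]):
            raise ValueError("range is not entirely free")
        ext_start, ext_len = extents.pop(i)
        ext_end = ext_start + ext_len
        if end < ext_end:                 # keep the tail that is still free
            extents.insert(i, (end, ext_end - end))
        if ext_start < start:             # keep the head that is still free
            extents.insert(i, (ext_start, start - ext_start))
```

Allocating an extent maps to `remove_free_space()` and freeing one maps to `add_free_space()`, so the model stays in step with a sequence of extent creations and deletions.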
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: introduce the free space B-tree on-disk format · 208acb8c
      Omar Sandoval authored
      The on-disk format for the free space tree is straightforward. Each
      block group is represented in the free space tree by a free space info
      item that stores accounting information: whether the free space for this
      block group is stored as bitmaps or extents and how many extents of free
      space exist for this block group (regardless of which format is being
      used in the tree). Extents are (start, FREE_SPACE_EXTENT, length) keys
      with no corresponding item, and bitmaps instead have the
      FREE_SPACE_BITMAP type and have a bitmap item attached, which is just an
      array of bytes.
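To make the layout above concrete, here is a minimal Python sketch that encodes one block group in either format. The key for the info item, the single bitmap covering the whole block group, the 4096-byte sector size, and the bit order within a byte are assumptions of this sketch, and the key type values are placeholder strings rather than the real numeric constants.

```python
from dataclasses import dataclass

# Placeholder key types; the real on-disk values are numeric constants.
FREE_SPACE_INFO = "FREE_SPACE_INFO"
FREE_SPACE_EXTENT = "FREE_SPACE_EXTENT"
FREE_SPACE_BITMAP = "FREE_SPACE_BITMAP"


@dataclass(frozen=True)
class Key:
    objectid: int
    type: str
    offset: int


def encode_block_group(bg_start, bg_length, free_extents, use_bitmaps,
                       sectorsize=4096):
    """Return a dict mapping keys to items for one block group."""
    items = {
        # Info item: accounting data (key layout here is an assumption).
        Key(bg_start, FREE_SPACE_INFO, bg_length): {
            "extent_count": len(free_extents),
            "uses_bitmaps": use_bitmaps,
        }
    }
    if not use_bitmaps:
        # Extents are (start, FREE_SPACE_EXTENT, length) keys with no item.
        for start, length in free_extents:
            items[Key(start, FREE_SPACE_EXTENT, length)] = None
        return items

    # Bitmap format: one bit per sector, set when the sector is free.
    nbits = bg_length // sectorsize
    bitmap = bytearray((nbits + 7) // 8)
    for start, length in free_extents:
        first = (start - bg_start) // sectorsize
        for bit in range(first, first + length // sectorsize):
            bitmap[bit // 8] |= 1 << (bit % 8)
    items[Key(bg_start, FREE_SPACE_BITMAP, bg_length)] = bytes(bitmap)
    return items
```

Note that the extent count in the info item is the same in both formats, matching the statement that it is tracked regardless of which format is in use.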
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: refactor caching_thread() · 73fa48b6
      Omar Sandoval authored
      We're also going to load the free space tree from caching_thread(), so
      we should refactor some of the common code.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: add helpers for read-only compat bits · 1abfbcdf
      Omar Sandoval authored
      We're finally going to add one of these for the free space tree, so
      let's add the same nice helpers that we have for the incompat bits.
      While we're at it, also add helpers to clear the bits.
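A minimal Python sketch of what set, clear, and test helpers for a read-only compat bit look like, together with the usual meaning of such a bit: an implementation that does not recognise it may still mount read-only but must refuse a read-write mount. The helper names, the `SuperBlock` class, and the bit value are hypothetical and only mirror the idea, not the kernel macros.

```python
FREE_SPACE_TREE = 1 << 0  # assumed bit position, for illustration only


class SuperBlock:
    def __init__(self):
        self.compat_ro_flags = 0


def set_compat_ro(sb, flag):
    sb.compat_ro_flags |= flag


def clear_compat_ro(sb, flag):
    sb.compat_ro_flags &= ~flag


def has_compat_ro(sb, flag):
    return bool(sb.compat_ro_flags & flag)


def can_mount(sb, supported_flags, read_only):
    """Unknown read-only compat bits only block read-write mounts."""
    unknown = sb.compat_ro_flags & ~supported_flags
    return read_only or unknown == 0
```

For example, after `set_compat_ro(sb, FREE_SPACE_TREE)`, a caller with `supported_flags=0` gets `False` from `can_mount(sb, 0, read_only=False)` but `True` with `read_only=True`.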
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: add extent buffer bitmap sanity tests · 0f331229
      Omar Sandoval authored
      Sanity test the extent buffer bitmap operations (test, set, and clear)
      against the equivalent standard kernel operations.
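The testing idea above, checking a byte-array bitmap against a trusted reference, can be sketched in Python as follows. The function names, the bit order within a byte, and the use of a Python `set` as the reference implementation are all assumptions of this sketch; the actual tests compare against the kernel's own bitmap routines.

```python
import random


def bitmap_set(buf, start, nbits):
    """Set nbits bits starting at bit index start (bit i lives in byte i // 8)."""
    for bit in range(start, start + nbits):
        buf[bit // 8] |= 1 << (bit % 8)


def bitmap_clear(buf, start, nbits):
    """Clear nbits bits starting at bit index start."""
    for bit in range(start, start + nbits):
        buf[bit // 8] &= ~(1 << (bit % 8)) & 0xFF


def bitmap_test(buf, bit):
    """Return True if the given bit is set."""
    return bool(buf[bit // 8] & (1 << (bit % 8)))


def sanity_check(nbits=256, rounds=200, seed=0):
    """Apply random set/clear ranges and compare with a reference set."""
    rng = random.Random(seed)
    buf = bytearray(nbits // 8)
    reference = set()
    for _ in range(rounds):
        start = rng.randrange(nbits)
        length = rng.randint(1, nbits - start)
        if rng.random() < 0.5:
            bitmap_set(buf, start, length)
            reference.update(range(start, start + length))
        else:
            bitmap_clear(buf, start, length)
            reference.difference_update(range(start, start + length))
        if any(bitmap_test(buf, b) != (b in reference) for b in range(nbits)):
            return False
    return True
```

Ranges that straddle byte boundaries are the interesting case here, since that is where an off-by-one in the masking would show up.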
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: add extent buffer bitmap operations · 3e1e8bb7
      Omar Sandoval authored
      These are going to be used for the free space tree bitmap items.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  2. 02 Nov, 2015 1 commit
  3. 01 Nov, 2015 7 commits
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 95fc00a4
      Linus Torvalds authored
      Pull memremap fix from Dan Williams:
       "The new memremap() api introduced in the 4.3 cycle to unify/replace
        ioremap_cache() and ioremap_wt() is mishandling the highmem case.
        This patch has received a build success notification from a
        0day-kbuild-robot run and has received an ack from Ard"
      
      From the commit message:
       "The impact of this bug is low for now since the pmem driver is the
        only user of memremap(), but this is important to fix before more
        conversions to memremap arrive in 4.4"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        memremap: fix highmem support
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ca04d396
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "This set of updates contains:
      
         - Another bugfix for the pathologic vm86 machinery.  Clear
           thread.vm86 on fork to prevent corrupting the parent state.  This
           comes along with an update to the vm86 selftest case
      
         - Fix another corner case in the ioapic setup code which causes a
           boot crash on some oddball systems
      
         - Fix the fallout from the dma allocation consolidation work, which
           leads to a NULL pointer dereference when the allocation code is
           called with a NULL device"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/vm86: Set thread.vm86 to NULL on fork/clone
        selftests/x86: Add a fork() to entry_from_vm86 to catch fork bugs
        x86/ioapic: Prevent NULL pointer dereference in setup_ioapic_dest()
        x86/dma-mapping: Fix arch_dma_alloc_attrs() oops with NULL dev
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f5eab267
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "The last round of minimalistic fixes for clocksource drivers:
      
         - Prevent multiple shutdown of the sh_mtu2 clocksource
      
         - Annotate a bunch of clocksource/schedclock functions with notrace
           to prevent an annoying ftrace recursion issue"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/sh_mtu2: Fix multiple shutdown call issue
        clocksource/drivers/digicolor: Prevent ftrace recursion
        clocksource/drivers/fsl_ftm_timer: Prevent ftrace recursion
        clocksource/drivers/vf_pit_timer: Prevent ftrace recursion
        clocksource/drivers/prima2: Prevent ftrace recursion
        clocksource/drivers/samsung_pwm_timer: Prevent ftrace recursion
        clocksource/drivers/pistachio: Prevent ftrace recursion
        clocksource/drivers/arm_global_timer: Prevent ftrace recursion
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4bf690d7
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "The last two one-liners for 4.3 from the irqchip space:
      
         - Regression fix for armada SoC which addresses the fallout from the
           set_irq_flags() cleanup
      
         - Add the missing propagation of the irq_set_type() callback to the
           parent interrupt controller of the tegra interrupt chip"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/tegra: Propagate IRQ type setting to parent
        irqchip/armada-370-xp: Fix regression by clearing IRQ_NOAUTOEN
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 56ef9db2
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "This should be our final batch of fixes for 4.3:
      
         - A patch from Sudeep Holla that fixes annotation of wakeup sources
           properly, old unused format seems to have spread through copying.
      
         - Two patches from Tony for OMAP.  One dealing with MUSB setup
           problems due to runtime PM being enabled too early on the parent
           device.  The other fixes IRQ numbering for OMAP1"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        usb: musb: omap2430: Fix regression caused by driver core change
        ARM: OMAP1: fix incorrect INT_DMA_LCD
        ARM: dts: fix gpio-keys wakeup-source property
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 060b85b0
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is three essential bug fixes for various SCSI parts.
      
        The only affected users are SCSI multi-path via device handler
        (basically all the enterprise) and mvsas users.  The dh bugs are an
        async entanglement in boot resulting in a serious WARN_ON trip and a
        use after free on remove leading to a crash with strict memory
        accounting.  The mvsas bug manifests as a null deref oops but only on
        abort sequences; however, these can commonly occur with SATA attached
        devices, hence the fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi_dh: don't try to load a device handler during async probing
        scsi_dh: fix use-after-free when removing scsi device
        mvsas: Fix NULL pointer dereference in mvs_slot_task_free
    • Merge tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md · af7eba01
      Linus Torvalds authored
      Pull md bug fixes from Neil Brown:
       "Two more bug fixes for md.
      
        One bugfix for a list corruption in raid5 because of incorrect
        locking.
      
        The other is for possible data corruption when a recovering device is failed,
        removed, and re-added.
      
        Both tagged for -stable"
      
      * tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md:
        Revert "md: allow a partially recovered device to be hot-added to an array."
        md/raid5: fix locking in handle_stripe_clean_event()
  4. 31 Oct, 2015 12 commits
  5. 30 Oct, 2015 4 commits
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 9b971e77
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Apologies for this being so late, but we've uncovered a few nasty
        issues on arm64 which didn't settle down until yesterday and the fixes
        all look suitable for 4.3.  Of the four patches, three of them are
        Cc'd to stable, with the remaining patch fixing an issue that only
        took effect during the merge window.
      
        Summary:
      
         - Fix corruption in SWP emulation when STXR fails due to contention
         - Fix MMU re-initialisation when resuming from a low-power state
         - Fix stack unwinding code to match what ftrace expects
         - Fix relocation code in the EFI stub when DRAM base is not 2MB aligned"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/efi: do not assume DRAM base is aligned to 2 MB
        Revert "ARM64: unwind: Fix PC calculation"
        arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap
        arm64: compat: fix stxr failure case in SWP emulation
    • Merge tag 'please-pull-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 7c0f488f
      Linus Torvalds authored
      Pull ia64 kcmp syscall from Tony Luck:
       "Missed adding the kcmp() syscall a long time ago.  Now it seems that
        it is essential to build systemd"
      
      * tag 'please-pull-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        [IA64] Wire up kcmp syscall
    • md/raid5: fix locking in handle_stripe_clean_event() · b8a9d66d
      Roman Gushchin authored
      After commit 566c09c5 ("raid5: relieve lock contention in get_active_stripe()")
      __find_stripe() is called under conf->hash_locks + hash.
      But handle_stripe_clean_event() calls remove_hash() under
      conf->device_lock.
      
      Under some circumstances the hash chain can become circular,
      and we get an infinite loop with interrupts disabled and the hash
      lock held in __find_stripe(). This leads to a hard lockup on multiple
      CPUs and a subsequent system crash.
      
      I was able to reproduce this behavior on raid6 over 6 ssd disks.
      The devices_handle_discard_safely option should be set to enable trim
      support. The following script was used:
      
      for i in `seq 1 32`; do
          dd if=/dev/zero of=large$i bs=10M count=100 &
      done
      
      neilb: original was against a 3.x kernel.  I forward-ported
        to 4.3-rc.  This version is suitable for any kernel since
        Commit: 59fc630b ("RAID5: batch adjacent full stripe write")
        (v4.1+).  I'll post a version for earlier kernels to stable.
      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Fixes: 566c09c5 ("raid5: relieve lock contention in get_active_stripe()")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <stable@vger.kernel.org> # 3.13 - 4.2
    • rbd: require stable pages if message data CRCs are enabled · bae818ee
      Ronny Hegewald authored
      rbd requires stable pages, as it computes a CRC of the page data before
      it is sent to the OSDs.
      
      But since kernel 3.9 (patch 1d1d1a76
      "mm: only enforce stable page writes if the backing device requires
      it") it is not assumed anymore that block devices require stable pages.
      
      This patch sets the necessary flag to get stable pages back for rbd.
      
      In a ceph installation that provides multiple ext4 formatted rbd
      devices "bad crc" messages appeared regularly (ca 1 message every 1-2
      minutes on every OSD that provided the data for the rbd) in the
      OSD-logs before this patch. After this patch these messages are pretty
      much gone (only ca 1-2 / month / OSD).
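The failure mode described above can be reproduced in miniature: if a page changes between the moment its checksum is computed and the moment its data goes out, the receiver sees a mismatch. The Python sketch below uses `zlib.crc32` as a stand-in checksum, and the function names are hypothetical. It illustrates why stable pages matter and does not model rbd's actual messaging code.

```python
import zlib


def send_with_crc(page, modify_in_flight):
    """Checksum a page, then copy it out, optionally dirtying it in between."""
    crc = zlib.crc32(page)       # checksum is taken first
    if modify_in_flight:
        page[0] ^= 0xFF          # page is rewritten while under writeback
    payload = bytes(page)        # data is transmitted afterwards
    return payload, crc


def receiver_accepts(payload, crc):
    """The receiving side recomputes the checksum over what it got."""
    return zlib.crc32(payload) == crc
```

With a stable page the receiver's check passes; when the page is modified after the checksum was taken, the same check fails, which corresponds to the "bad crc" messages reported in the OSD logs.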
      
      Cc: stable@vger.kernel.org # 3.9+, needs backporting
      Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de>
      [idryomov@gmail.com: require stable pages only in crc case, changelog]
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  6. 29 Oct, 2015 6 commits
  7. 28 Oct, 2015 1 commit