1. 11 Jan, 2016 1 commit
  2. 07 Jan, 2016 21 commits
    • Btrfs: Check metadata redundancy on balance · ee592d07
      Sam Tygier authored
      When converting a filesystem via balance, check that the metadata mode
      is at least as redundant as the data mode. For example, give a warning
      when:
      -dconvert=raid1 -mconvert=single
      Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk>
      [ minor message reformatting ]
      Signed-off-by: David Sterba <dsterba@suse.com>
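The check described above can be sketched as a comparison of redundancy ranks. This is a minimal illustration, not the kernel code: the `profile` enum, `tolerated_failures()` and `should_warn()` are hypothetical names, and the ranking (how many device failures each profile survives) is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical profile names; the kernel uses block group flag bits instead. */
enum profile { P_SINGLE, P_DUP, P_RAID0, P_RAID1, P_RAID10, P_RAID5, P_RAID6 };

/* Assumed ranking: number of device failures each profile can survive. */
static int tolerated_failures(enum profile p)
{
    switch (p) {
    case P_RAID1:
    case P_RAID10:
    case P_RAID5:
        return 1;
    case P_RAID6:
        return 2;
    default:            /* single, dup and raid0 survive no device loss */
        return 0;
    }
}

/* True when the requested conversion leaves metadata less redundant than data. */
static bool should_warn(enum profile data_target, enum profile meta_target)
{
    return tolerated_failures(meta_target) < tolerated_failures(data_target);
}
```

With this ranking, `should_warn(P_RAID1, P_SINGLE)` is true, which corresponds to the `-dconvert=raid1 -mconvert=single` case from the message.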
    • btrfs: statfs: report zero available if metadata are exhausted · ca8a51b3
      David Sterba authored
      There is one ENOSPC case that's very confusing: Available is
      greater than zero but no file operation succeeds (besides removing
      files). This happens when the metadata are exhausted and there's no
      possibility to allocate another chunk.
      
      In this scenario it's normal that there's still some space in the data
      chunk and the calculation in df reflects that in the Avail value.
      
      To at least give some clue about the ENOSPC situation, let statfs report
      zero value in Avail, even if there's still data space available.
      
      Current:
        /dev/sdb1             4.0G  3.3G  719M  83% /mnt/test
      
      New:
        /dev/sdb1             4.0G  3.3G     0 100% /mnt/test
      
      We calculate the remaining metadata space minus global reserve. If this
      is (supposedly) smaller than zero, there's no space. But this does not
      hold in practice: the exhausted state happens while there's still some
      positive delta. So we apply some guesswork and compare the delta to a 4M
      threshold. (Practically observed delta was 2M.)
      
      We probably cannot calculate the exact threshold value because this
      depends on the internal reservations requested by various operations, so
      some operations that consume little metadata will succeed even if the
      Avail is zero. But this is better than the other way around.
      Signed-off-by: David Sterba <dsterba@suse.com>
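The threshold heuristic from the message can be sketched as follows. The function name and parameters are hypothetical; only the 4M threshold and the "remaining metadata minus global reserve" delta come from the text. The sketch also ignores whether another chunk could still be allocated.

```c
#include <stdint.h>

#define METADATA_THRESHOLD (4ULL * 1024 * 1024)   /* 4M, from the commit message */

/*
 * Return the value to report as Avail.
 * data_avail: free space computed from the data chunks
 * meta_free:  unused metadata space (hypothetical input)
 * global_rsv: size of the global reserve (hypothetical input)
 */
static uint64_t reported_avail(uint64_t data_avail, uint64_t meta_free,
                               uint64_t global_rsv)
{
    /* Clamp at zero so an already overdrawn reserve does not wrap around. */
    uint64_t delta = meta_free > global_rsv ? meta_free - global_rsv : 0;

    if (delta < METADATA_THRESHOLD)
        return 0;            /* metadata effectively exhausted */
    return data_avail;
}
```

With the observed 2M delta the function reports 0 even though data space remains, which matches the "New" df line in the message.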
    • btrfs: preallocate path for snapshot creation at ioctl time · 8546b570
      David Sterba authored
      We can also preallocate btrfs_path that's used during pending snapshot
      creation and avoid another late ENOMEM failure.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: allocate root item at snapshot ioctl time · b0c0ea63
      David Sterba authored
      The actual snapshot creation is delayed until transaction commit. If we
      cannot get enough memory for the root item there, we have to fail the
      whole transaction commit which is bad. So we'll allocate the memory at
      the ioctl call and pass it along with the pending_snapshot struct. The
      potential ENOMEM will be returned to the caller of snapshot ioctl.
      Signed-off-by: David Sterba <dsterba@suse.com>
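The idea of moving allocations from commit time to ioctl time can be illustrated in userspace C. This is only a sketch: the struct layouts and the functions `snapshot_ioctl()` and `commit_snapshot()` are hypothetical stand-ins, and `calloc`/`free` replace the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures named in the commits. */
struct root_item { char payload[64]; };
struct pending_snapshot {
    struct root_item *root_item;   /* allocated up front, consumed at commit */
};

/* "ioctl" step: every allocation happens here, so ENOMEM reaches the caller. */
static int snapshot_ioctl(struct pending_snapshot **out)
{
    struct pending_snapshot *ps = calloc(1, sizeof(*ps));

    if (!ps)
        return -ENOMEM;
    ps->root_item = calloc(1, sizeof(*ps->root_item));
    if (!ps->root_item) {
        free(ps);
        return -ENOMEM;
    }
    *out = ps;
    return 0;
}

/* "commit" step: only touches memory that already exists, so no ENOMEM here. */
static void commit_snapshot(struct pending_snapshot *ps)
{
    ps->root_item->payload[0] = 1;   /* placeholder for filling in the root item */
    free(ps->root_item);
    free(ps);
}
```

The design point is that a failure in `snapshot_ioctl()` affects only the one caller, whereas a failure inside the commit step would abort work belonging to everyone in the transaction.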
    • btrfs: do an allocation earlier during snapshot creation · a1ee7362
      David Sterba authored
      We can allocate pending_snapshot earlier and do not have to do cleanup
      in case of failure.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: use smaller type for btrfs_path locks · 4fb72bf2
      David Sterba authored
      The values of btrfs_path::locks are 0 to 4, so they fit into a u8. Let's see:
      
      * overall size of btrfs_path drops down from 136 to 112 (-24 bytes),
      * better packing in a slab page +6 objects
      * the whole structure now fits to 2 cachelines
      * slight decrease in code size:
      
         text    data     bss     dec     hex filename
       938731   43670   23144 1005545   f57e9 fs/btrfs/btrfs.ko.before
       938203   43670   23144 1005017   f55d9 fs/btrfs/btrfs.ko.after
      
      (and the generated assembly does not change much)
      
      The main purpose is to decrease the size of the structure without
      affecting performance. Byte access usually behaves well across
      arches, and the locks are not accessed frequently and are sometimes
      just compared to zero.
      
      Note for further size reduction attempts: the slots could be made u16
      but this might generate worse code on some arches (non-byte and non-int
      access). Also the range of operations on slots is wider compared to
      locks and the potential performance drop should be evaluated first.
      Signed-off-by: David Sterba <dsterba@suse.com>
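The size effect can be reproduced with a simplified structure. The two structs below are illustrative approximations, not the real btrfs_path (the actual structure uses bitfields for its flags), and `objects_per_page()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVEL 8   /* assumed number of tree levels tracked per path */

/* Simplified layout with int-sized lock slots (not the real btrfs_path). */
struct path_before {
    void *nodes[MAX_LEVEL];
    int slots[MAX_LEVEL];
    int locks[MAX_LEVEL];
    uint8_t reada;
    uint8_t lowest_level;
    uint8_t flags;
};

/* Same members, but locks shrunk to one byte each. */
struct path_after {
    void *nodes[MAX_LEVEL];
    int slots[MAX_LEVEL];
    uint8_t locks[MAX_LEVEL];
    uint8_t reada;
    uint8_t lowest_level;
    uint8_t flags;
};

/* How many objects of a given size fit into one slab page. */
static size_t objects_per_page(size_t object_size, size_t page_size)
{
    return page_size / object_size;
}
```

On a typical LP64 target these simplified structs come out at 136 and 112 bytes, and `objects_per_page()` gives 30 versus 36 objects for a 4096-byte page, which is consistent with the +6 quoted in the message.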
    • btrfs: use smaller type for btrfs_path lowest_level · 7853f15b
      David Sterba authored
      The level is 0..7, we can use smaller type. The size of btrfs_path is now
      136 bytes from 144, which is +2 objects that fit into a 4k slab.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: use smaller type for btrfs_path reada · dccabfad
      David Sterba authored
      The possible values for reada are all positive and bounded, we can later
      save some bytes by storing it in u8.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: cleanup, use enum values for btrfs_path reada · e4058b54
      David Sterba authored
      Replace the integers by enums for better readability. The value 2 does
      not have any meaning since a7175319
      "Btrfs: do less aggressive btree readahead" (2009-01-22).
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: constify static arrays · 4d4ab6d6
      David Sterba authored
      There are a few statically initialized arrays that can be made const.
      The remaining (like file_system_type, sysfs attributes or prop handlers)
      do not allow that due to type mismatch when passed to the APIs or
      because the structures are modified through other members.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: constify remaining structs with function pointers · 20e5506b
      David Sterba authored
      * struct extent_io_ops
      * struct btrfs_free_space_op
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs tests: replace whole ops structure for free space tests · 28f0779a
      David Sterba authored
      Preparatory work for making btrfs_free_space_op constant. In
      test_steal_space_from_bitmap_to_extent, we substitute use_bitmap with
      our own version, thus preventing constification. We can rework it so we
      replace the whole structure with the correct function pointers.
      Signed-off-by: David Sterba <dsterba@suse.com>
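Why overriding a single callback blocks constification, and how swapping the whole table avoids that, can be shown with a small sketch. All names below are hypothetical miniatures; they only mirror the shape of an ops table with function pointers.

```c
#include <stdbool.h>

/* Hypothetical miniature of an ops table; names are illustrative only. */
struct free_space_op {
    bool (*use_bitmap)(int bytes);
    int  (*recalc_thresholds)(int bytes);
};

struct free_space_ctl {
    const struct free_space_op *op;   /* const: members cannot be patched */
};

static bool default_use_bitmap(int bytes) { return bytes < 4096; }
static int  default_recalc(int bytes)     { return bytes / 2; }
static bool test_use_bitmap(int bytes)    { (void)bytes; return false; }

static const struct free_space_op default_ops = {
    .use_bitmap = default_use_bitmap,
    .recalc_thresholds = default_recalc,
};

/* Test variant: a complete table that reuses the default where possible. */
static const struct free_space_op test_ops = {
    .use_bitmap = test_use_bitmap,
    .recalc_thresholds = default_recalc,
};

/* Swap the whole table instead of writing to ctl->op->use_bitmap. */
static void use_test_ops(struct free_space_ctl *ctl)    { ctl->op = &test_ops; }
static void use_default_ops(struct free_space_ctl *ctl) { ctl->op = &default_ops; }
```

Assigning to `ctl->op->use_bitmap` would not compile once the table is const, while repointing `ctl->op` stays legal, so both tables can live in read-only data.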
    • btrfs: don't use slab cache for struct btrfs_delalloc_work · 100d5702
      David Sterba authored
      Although we prefer to use separate caches for various structs, it seems
      better not to do that for struct btrfs_delalloc_work. Objects of this
      type are allocated rarely, when transaction commit calls
      btrfs_start_delalloc_roots, requesting delayed iputs.
      
      The objects are temporary (with some IO involved) but still allocated
      and freed within __start_delalloc_inodes. Memory allocation failure is
      handled.
      
      The slab cache is empty most of the time (observed on several systems),
      so if we need to allocate a new slab object, the first one has to
      allocate a full page. In a potential case of low memory conditions this
      might fail with higher probability compared to using the generic slab
      caches.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: drop duplicate prefix from scrub workqueues · 0de270fa
      David Sterba authored
      The helper btrfs_alloc_workqueue will add the "btrfs-" prefix.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: handle invalid num_stripes in sys_array · f5cdedd7
      David Sterba authored
      We can handle the special case of num_stripes == 0 directly inside
      btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to
      catch other unhandled cases where we fail to validate external data.
      
      A crafted or corrupted image crashes at mount time:
      
      BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0
      BTRFS info (device loop0): disk space caching is enabled
      BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
      Kernel panic - not syncing: BUG!
      CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25
      Stack:
       637af890 60062489 602aeb2e 604192ba
       60387961 00000011 637af8a0 6038a835
       637af9c0 6038776b 634ef32b 00000000
      Call Trace:
       [<6001c86d>] show_stack+0xfe/0x15b
       [<6038a835>] dump_stack+0x2a/0x2c
       [<6038776b>] panic+0x13e/0x2b3
       [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff
       [<601cfbbe>] open_ctree+0x192d/0x27af
       [<6019c2c1>] btrfs_mount+0x8f5/0xb9a
       [<600bc9a7>] mount_fs+0x11/0xf3
       [<600d5167>] vfs_kern_mount+0x75/0x11a
       [<6019bcb0>] btrfs_mount+0x2e4/0xb9a
       [<600bc9a7>] mount_fs+0x11/0xf3
       [<600d5167>] vfs_kern_mount+0x75/0x11a
       [<600d710b>] do_mount+0xa35/0xbc9
       [<600d7557>] SyS_mount+0x95/0xc8
       [<6001e884>] handle_syscall+0x6b/0x8e
      Reported-by: Jiri Slaby <jslaby@suse.com>
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      CC: stable@vger.kernel.org	# 3.19+
      Signed-off-by: David Sterba <dsterba@suse.com>
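The pattern of validating external data before it reaches an asserting helper can be sketched as below. The sizes and the functions `chunk_item_size()` and `read_sys_chunk()` are hypothetical placeholders, and returning -EIO is an assumption of this sketch rather than something stated in the message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder sizes; the real on-disk structures are defined in the kernel. */
#define CHUNK_HEADER_SIZE 48u
#define STRIPE_SIZE       32u

/* Size helper that is only meaningful for num_stripes >= 1. */
static size_t chunk_item_size(uint16_t num_stripes)
{
    return CHUNK_HEADER_SIZE + (size_t)num_stripes * STRIPE_SIZE;
}

/*
 * Validate the untrusted stripe count before using it.
 * Returns 0 and stores the item size, or -EIO for a corrupted entry.
 */
static int read_sys_chunk(uint16_t num_stripes, size_t *item_size)
{
    if (num_stripes == 0)
        return -EIO;          /* reject instead of tripping an assertion later */
    *item_size = chunk_item_size(num_stripes);
    return 0;
}
```

A corrupted image then produces a mount error for the caller to report instead of a panic.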
    • btrfs: better packing of btrfs_delayed_extent_op · 35b3ad50
      David Sterba authored
      btrfs_delayed_extent_op can be packed in a better way, it's 40 bytes now
      and has 8 unused bytes. Reducing the level type to u8 makes it possible
      to squeeze it to the padding byte after key. The bitfields were switched
      to bool as there's space to store the full byte without increasing the
      whole structure, besides that the generated assembly is smaller.
      
      struct btrfs_delayed_extent_op {
      	struct btrfs_disk_key      key;                  /*     0    17 */
      	u8                         level;                /*    17     1 */
      	bool                       update_key;           /*    18     1 */
      	bool                       update_flags;         /*    19     1 */
      	bool                       is_data;              /*    20     1 */
      
      	/* XXX 3 bytes hole, try to pack */
      
      	u64                        flags_to_set;         /*    24     8 */
      
      	/* size: 32, cachelines: 1, members: 6 */
      	/* sum members: 29, holes: 1, sum holes: 3 */
      	/* last cacheline: 32 bytes */
      };
      
      The final size is 32 bytes which gives +26 objects per slab page.
      
         text	   data	    bss	    dec	    hex	filename
       938811	  43670	  23144	1005625	  f5839	fs/btrfs/btrfs.ko.before
       938747	  43670	  23144	1005561	  f57f9	fs/btrfs/btrfs.ko.after
      Signed-off-by: David Sterba <dsterba@suse.com>
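The packing can be checked with a compile-time assertion. In this sketch `struct disk_key` is a hypothetical 17-byte stand-in for btrfs_disk_key, the "before" layout is an assumption reconstructed from the description (int level and bitfields placed after the u64), and the packed attribute uses GCC/Clang syntax.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 17-byte key, packed as in the layout dump (GCC/Clang attribute syntax). */
struct disk_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
} __attribute__((packed));

/* Assumed earlier layout: int level and bitfields after the u64. */
struct extent_op_before {
    struct disk_key key;
    uint64_t flags_to_set;
    int level;
    unsigned int update_key:1;
    unsigned int update_flags:1;
    unsigned int is_data:1;
};

/* Layout from the commit message: small members fill the gap after key. */
struct extent_op_after {
    struct disk_key key;
    uint8_t level;
    bool update_key;
    bool update_flags;
    bool is_data;
    uint64_t flags_to_set;
};

_Static_assert(sizeof(struct disk_key) == 17, "key must stay 17 bytes");
_Static_assert(sizeof(struct extent_op_after) == 32, "expected 32-byte layout");
```

The slab arithmetic follows directly: 4096 / 32 is 128 objects per page against 4096 / 40, which is 102, a difference of 26.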
    • btrfs: put delayed item hook into inode · 8089fe62
      David Sterba authored
      Inodes for delayed iput allocate a trivial helper structure, let's place
      the list hook directly into the inode and save a kmalloc (killing a
      __GFP_NOFAIL as a bonus) at the cost of increasing size of btrfs_inode.
      
      The inode can be put into the delayed_iputs list more than once and we
      have to keep the count. This means we can't use the list_splice to
      process a bunch of inodes because we'd lose track of the count if the
      inode is put into the delayed iputs again while it's processed.
      Signed-off-by: David Sterba <dsterba@suse.com>
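The counting scheme can be sketched in userspace with a minimal list implementation. Everything here is hypothetical: `struct inode_sketch`, its `refs` field standing in for the inode reference count, and the list helpers, which only imitate the style of the kernel's list_head.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int  list_empty(const struct list_head *h) { return h->next == h; }
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}
static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical inode with the list hook and a repeat counter embedded. */
struct inode_sketch {
    struct list_head delayed_iput;
    int delayed_iput_count;   /* extra puts requested while already queued */
    int refs;                 /* stand-in for the inode reference count */
};

static void add_delayed_iput(struct list_head *queue, struct inode_sketch *ino)
{
    if (list_empty(&ino->delayed_iput))
        list_add_tail(&ino->delayed_iput, queue);
    else
        ino->delayed_iput_count++;   /* already queued: remember one more put */
}

/* Process entries one at a time so a re-queued inode is never lost. */
static void run_delayed_iputs(struct list_head *queue)
{
    while (!list_empty(queue)) {
        struct list_head *pos = queue->next;
        struct inode_sketch *ino = (struct inode_sketch *)
            ((char *)pos - offsetof(struct inode_sketch, delayed_iput));

        if (ino->delayed_iput_count > 0)
            ino->delayed_iput_count--;   /* stays queued for another round */
        else
            list_del_init(pos);
        ino->refs--;                     /* stand-in for iput() */
    }
}
```

Because the hook lives in the inode, queueing needs no allocation; the counter is what makes a second or third request for the same inode safe.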
    • btrfs: Support convert to -d dup for btrfs-convert · c5ca8781
      Zhao Lei authored
      Since we will add support for -d dup for non-mixed filesystems,
      the kernel needs to support converting to this raid type.
      
      This patch removes the limitation for the above case.
      
      Tested by the following script
      (combination of dup conversion with fsck):
      
      export TEST_DEV='/dev/vdc'
      export TEST_DIR='/var/ltf/tester/mnt'
      
      do_dup_test()
      {
          local m_from="$1"
          local d_from="$2"
          local m_to="$3"
          local d_to="$4"
      
          echo "Convert from -m $m_from -d $d_from to -m $m_to -d $d_to"
      
          umount "$TEST_DIR" &>/dev/null
          ./mkfs.btrfs -f -m "$m_from" -d "$d_from" "$TEST_DEV" >/dev/null || return 1
          mount "$TEST_DEV" "$TEST_DIR" || return 1
      
          cp -a /sbin/* "$TEST_DIR"
      
          [[ "$m_from" != "$m_to" ]] && {
              ./btrfs balance start -f -mconvert="$m_to" "$TEST_DIR" || return 1
          }
      
          [[ "$d_from" != "$d_to" ]] && {
              local opt=()
              [[ "$d_to" == single ]] && opt+=("-f")
              ./btrfs balance start "${opt[@]}" -dconvert="$d_to" "$TEST_DIR" || return 1
          }
      
          umount "$TEST_DIR" || return 1
          ./btrfsck "$TEST_DEV" || return 1
          echo
      
          return 0
      }
      
      test_all()
      {
          for m_from in single dup; do
          for d_from in single dup; do
          for m_to in single dup; do
          for d_to in single dup; do
          do_dup_test "$m_from" "$d_from" "$m_to" "$d_to" || return 1
          done
          done
          done
          done
      }
      
      test_all
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: igrab inode in writepage · be7bd730
      Josef Bacik authored
      We hit this panic on a few of our boxes this week where we have an
      ordered_extent with a NULL inode.  We do an igrab() of the inode in writepages,
      but weren't doing it in writepage which can be called directly from the VM on
      dirty pages.  If the inode has been unlinked then we could have I_FREEING set
      which means igrab() would return NULL and we get this panic.  Fix this by trying
      to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
      and return AOP_WRITEPAGE_ACTIVATE; so the VM knows it wasn't successful.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
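The control flow of the fix can be mocked in userspace. None of the names below are the kernel API: `grab_inode()` only imitates the behaviour attributed to igrab() in the message, and `WRITEPAGE_ACTIVATE` is a stand-in value for AOP_WRITEPAGE_ACTIVATE.

```c
#include <stddef.h>

#define WRITEPAGE_ACTIVATE 1   /* stand-in for AOP_WRITEPAGE_ACTIVATE */

/* Hypothetical miniature types; none of this is the kernel's real API. */
struct inode_sketch { int freeing; int refs; };
struct page_sketch  { struct inode_sketch *host; int dirty; };

/* Mimics igrab(): refuses to hand out a reference to an inode being freed. */
static struct inode_sketch *grab_inode(struct inode_sketch *inode)
{
    if (inode->freeing)
        return NULL;
    inode->refs++;
    return inode;
}

static int writepage_sketch(struct page_sketch *page)
{
    struct inode_sketch *inode = grab_inode(page->host);

    if (!inode) {
        page->dirty = 1;             /* redirty so the page is retried later */
        return WRITEPAGE_ACTIVATE;   /* tell the caller nothing was written */
    }
    page->dirty = 0;                 /* placeholder for the actual writeback */
    inode->refs--;                   /* drop the reference taken above */
    return 0;
}
```

The point of the early return is that no later code ever runs with a NULL inode pointer.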
    • Btrfs: add missing brelse when superblock checksum fails · b2acdddf
      Anand Jain authored
      Looks like an oversight; call brelse() when the checksum fails. Further down the
      code, in the non error path, we do call brelse() and so we don't see
      brelse() in the goto error paths.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 30 Dec, 2015 2 commits
  4. 23 Dec, 2015 6 commits
  5. 21 Dec, 2015 2 commits
    • Btrfs: fix unprotected list operations at btrfs_write_dirty_block_groups · e44081ef
      Filipe Manana authored
      We call btrfs_write_dirty_block_groups() in the critical section of a
      transaction's commit, when no other tasks can join the transaction and
      add more block groups to the transaction's list of dirty block groups,
      so we are not taking the dirty block groups spinlock when checking for the
      list's emptiness, grabbing its first element or deleting elements from
      it.
      
      However there's a special and rare case where we can have a concurrent
      task adding elements to this list. We trigger writeback for space
      caches before at btrfs_start_dirty_block_groups() and in past iterations
      of the loop at btrfs_write_dirty_block_groups(), this means that when
      the writeback finishes (which happens asynchronously) it creates a
      task for the endio free space work queue that executes
      btrfs_finish_ordered_io() - this function is able to join the transaction,
      through btrfs_join_transaction_nolock(), and update the free space cache's
      inode item in the root tree, which can result in COWing nodes of this tree
      and therefore allocation of a new block group can happen, which gets added
      to the transaction's list of dirty block groups while the transaction
      commit task is operating on it concurrently.
      
      So fix this by taking the dirty block groups spinlock before doing
      operations on the dirty block groups list at
      btrfs_write_dirty_block_groups().
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
    • Linux 4.4-rc6 · 4ef76753
      Linus Torvalds authored
  6. 20 Dec, 2015 7 commits
    • Merge tag 'rtc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 9f7e4327
      Linus Torvalds authored
      Pull RTC fixes from Alexandre Belloni:
       "Late fixes for the RTC subsystem for 4.4:
      
        A fix for a nasty hardware bug in rk808 and an initialization
        reordering in da9063 to fix a possible crash"
      
      * tag 'rtc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        rtc: da9063: fix access ordering error during RTC interrupt at system power on
        rtc: rk808: Compensate for Rockchip calendar deviation on November 31st
    • rtc: da9063: fix access ordering error during RTC interrupt at system power on · 77535ace
      Steve Twiss authored
      This fix alters the ordering of the IRQ and device registrations in the RTC
      driver probe function. This change will apply to the RTC driver that supports
      both DA9063 and DA9062 PMICs.
      
      A problem could occur with the existing RTC driver if:
      
      A system is started from a cold boot using the PMIC RTC IRQ to initiate a
      power on operation. For instance, if an RTC alarm is used to start a
      platform from power off.
      The existing driver IRQ is requested before the device has been properly
      registered.
      i.e.
          ret = devm_request_threaded_irq()
      comes before
          rtc->rtc_dev = devm_rtc_device_register();
      
      In this case, the interrupt can be called before the device has been
      registered and the handler can be called immediately. The IRQ handler
      da9063_alarm_event() contains the function call
      
          rtc_update_irq(rtc->rtc_dev, 1, RTC_IRQF | RTC_AF);
      
      which in turn tries to access the unavailable rtc->rtc_dev.
      
      The fix is to reorder the functions inside the RTC probe. The IRQ is
      requested after the RTC device resource has been registered so that
      get_irq_byname is the last thing to happen.
      Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
    • rtc: rk808: Compensate for Rockchip calendar deviation on November 31st · f076ef44
      Julius Werner authored
      In A.D. 1582 Pope Gregory XIII found that the existing Julian calendar
      insufficiently represented reality, and changed the rules about
      calculating leap years to account for this. Similarly, in A.D. 2013
      Rockchip hardware engineers found that the new Gregorian calendar still
      contained flaws, and that the month of November should be counted up to
      31 days instead. Unfortunately it takes a long time for calendar changes
      to gain widespread adoption, and just like more than 300 years went by
      before the last Protestant nation implemented Greg's proposal, we will
      have to wait a while until all religions and operating system kernels
      acknowledge the inherent advantages of the Rockchip system. Until then
      we need to translate dates read from (and written to) Rockchip hardware
      back to the Gregorian format.
      
      This patch works by defining Jan 1st, 2016 as the arbitrary anchor date
      on which Rockchip and Gregorian calendars are in sync. From that we can
      translate arbitrary later dates back and forth by counting the number
      of November/December transitions since the anchor date to determine the
      offset between the calendars. We choose this method (rather than trying
      to regularly "correct" the date stored in hardware) since it's the only
      way to ensure perfect time-keeping even if the system may be shut down
      for an unknown number of years. The drawback is that other software
      reading the same hardware (e.g. mainboard firmware) must use the same
      translation convention (including the same anchor date) to be able to
      read and write correct timestamps from/to the RTC.
      Signed-off-by: Julius Werner <jwerner@chromium.org>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
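The counting step described above can be sketched as a small helper. The function names and the calling convention (Gregorian year and 1-based month, dates on or after the anchor) are assumptions of this sketch; only the Jan 1st, 2016 anchor and the one-day-per-transition drift come from the message.

```c
#include <stdint.h>

#define ANCHOR_YEAR     2016    /* calendars agree on Jan 1st of this year */
#define SECONDS_PER_DAY 86400

/*
 * Number of November/December transitions between the anchor date and the
 * given Gregorian year and month (month is 1..12), for dates on or after
 * the anchor.
 */
static int nov_dec_transitions(int year, int month)
{
    return (year - ANCHOR_YEAR) + (month > 11 ? 1 : 0);
}

/* Size of the offset between the two calendars, in seconds. */
static int64_t calendar_offset_seconds(int year, int month)
{
    return (int64_t)nov_dec_transitions(year, month) * SECONDS_PER_DAY;
}
```

For example, any date in December 2016 has seen one transition, so the calendars differ by one day there, and by two days from December 2017 on. Because the offset is derived purely from the date, it stays correct however long the system was powered off.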
    • Merge tag 'tty-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 69c37a92
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some tty/serial driver fixes for 4.4-rc6 that resolve some
        reported problems.  All of these have been in linux-next.  The details
        are in the shortlog"
      
      * tag 'tty-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: Fix GPF in flush_to_ldisc()
        serial: earlycon: Add missing spinlock initialization
        serial: sh-sci: Fix length of scatterlist
        n_tty: Fix poll() after buffer-limited eof push read
        serial: 8250_uniphier: fix dl_read and dl_write functions
    • Merge tag 'usb-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 24b0d5e7
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some USB and PHY fixes for 4.4-rc6.  All of them resolve some
        reported problems.  Full details in the shortlog"
      
      * tag 'usb-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: fix invalid memory access in hub_activate()
        USB: ipaq.c: fix a timeout loop
        phy: core: Get a refcount to phy in devm_of_phy_get_by_index()
        phy: cygnus: pcie: add missing of_node_put
        phy: miphy365x: add missing of_node_put
        phy: miphy28lp: add missing of_node_put
        phy: rockchip-usb: add missing of_node_put
        phy: berlin-sata: add missing of_node_put
        phy: mt65xx-usb3: add missing of_node_put
        phy: brcmstb-sata: add missing of_node_put
        phy: sun9i-usb: add USB dependency
    • Merge tag 'md/4.4-rc5-fixes' of git://neil.brown.name/md · 3a87711e
      Linus Torvalds authored
      Pull md fixes from Neil Brown:
       "Four fixes for md:
      
         - two recently introduced regressions fixed.
         - one older bug in RAID10 - tagged for -stable since 4.2
         - one minor sysfs api improvement"
      
      * tag 'md/4.4-rc5-fixes' of git://neil.brown.name/md:
        Fix remove_and_add_spares removes drive added as spare in slot_store
        md: fix bug due to nested suspend
        MD: change journal disk role to disk 0
        md/raid10: fix data corruption and crash during resync
    • Merge tag 'powerpc-4.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 35b3154e
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       - Partial revert of "powerpc: Individual System V IPC system calls"
       - pr_warn_once on unsupported OPAL_MSG type from Stewart
       - Fix deadlock in opal-irqchip introduced by "Fix double endian
         conversion" from Alistair
      
      * tag 'powerpc-4.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"
        powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type
        Partial revert of "powerpc: Individual System V IPC system calls"
  7. 19 Dec, 2015 1 commit