An error occurred fetching the project authors.
  1. 20 Sep, 2017 2 commits
  2. 07 Jun, 2017 2 commits
    • Brian Foster's avatar
      xfs: fix indlen accounting error on partial delalloc conversion · 53c44c23
      Brian Foster authored
      commit 0daaecac upstream.
      
      The delalloc -> real block conversion path uses an incorrect
      calculation in the case where the middle part of a delalloc extent
      is being converted. This is documented as a rare situation because
      XFS generally attempts to maximize contiguity by converting as much
      of a delalloc extent as possible.
      
      If this situation does occur, the indlen reservation for the two new
      delalloc extents left behind by the conversion of the middle range
      is calculated and compared with the original reservation. If more
      blocks are required, the delta is allocated from the global block
      pool. This delta value can be characterized as the difference
      between the new total requirement (temp + temp2) and the currently
      available reservation minus those blocks that have already been
      allocated (startblockval(PREV.br_startblock) - allocated).
      
      The problem is that the current code does not account for previously
      allocated blocks correctly. It subtracts the current allocation
      count from the (new - old) delta rather than the old indlen
      reservation. This means that more indlen blocks than have been
      allocated end up stashed in the remaining extents and free space
      accounting is broken as a result.
      
      Fix up the calculation to subtract the allocated block count from
      the original extent indlen and thus correctly allocate the
      reservation delta based on the difference between the new total
      requirement and the unused blocks from the original reservation.
      Also remove a bogus assert that contradicts the fact that the new
      indlen reservation can be larger than the original indlen
      reservation.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53c44c23
    • Christoph Hellwig's avatar
      xfs: fix integer truncation in xfs_bmap_remap_alloc · 99226b89
      Christoph Hellwig authored
      commit 52813fb1 upstream.
      
      bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
      xfs_aglock_t, which truncates the value to 32 bits.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99226b89
  3. 08 Apr, 2017 7 commits
    • Christoph Hellwig's avatar
      xfs: try any AG when allocating the first btree block when reflinking · d5dbd1c9
      Christoph Hellwig authored
      commit 2fcc319d upstream.
      
      When a reflink operation causes the bmap code to allocate a btree block
      we're currently doing single-AG allocations due to having ->firstblock
      set and then try any higher AG due a little reflink quirk we've put in
      when adding the reflink code.  But given that we do not have a minleft
      reservation of any kind in this AG we can still not have any space in
      the same or higher AG even if the file system has enough free space.
      To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back
      path instead.
      
      [And yes, we need to redo this properly instead of piling hacks over
       hacks.  I'm working on that, but it's not going to be a small series.
       In the meantime this fixes the customer reported issue]
      
      Also add a warning for failing allocations to make it easier to debug.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5dbd1c9
    • Brian Foster's avatar
      xfs: use iomap new flag for newly allocated delalloc blocks · da617af8
      Brian Foster authored
      commit f65e6fad upstream.
      
      Commit fa7f138a ("xfs: clear delalloc and cache on buffered write
      failure") fixed one regression in the iomap error handling code and
      exposed another. The fundamental problem is that if a buffered write
      is a rewrite of preexisting delalloc blocks and the write fails, the
      failure handling code can punch out preexisting blocks with valid
      file data.
      
      This was reproduced directly by sub-block writes in the LTP
      kernel/syscalls/write/write03 test. A first 100 byte write allocates
      a single block in a file. A subsequent 100 byte write fails and
      punches out the block, including the data successfully written by
      the previous write.
      
      To address this problem, update the ->iomap_begin() handler to
      distinguish newly allocated delalloc blocks from preexisting
      delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
      ->iomap_end() handler to decide when a failed or short write should
      punch out delalloc blocks.
      
      This introduces the subtle requirement that ->iomap_begin() should
      never combine newly allocated delalloc blocks with existing blocks
      in the resulting iomap descriptor. This can occur when a new
      delalloc reservation merges with a neighboring extent that is part
      of the current write, for example. Therefore, drop the
      post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
      just return the record inserted into the fork. This ensures only new
      blocks are returned and thus that preexisting delalloc blocks are
      always handled as "found" blocks and not punched out on a failed
      rewrite.
      Reported-by: default avatarXiong Zhou <xzhou@redhat.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da617af8
    • Christoph Hellwig's avatar
      xfs: tune down agno asserts in the bmap code · a2402936
      Christoph Hellwig authored
      commit 410d17f6 upstream.
      
      In various places we currently assert that xfs_bmap_btalloc allocates
      from the same as the firstblock value passed in, unless it's either
      NULLAGNO or the dop_low flag is set.  But the reflink code does not
      fully follow this convention as it passes in firstblock purely as
      a hint for the allocator without actually having previous allocations
      in the transaction, and without having a minleft check on the current
      AG, leading to the assert firing on a very full and heavily used
      file system.  As even the reflink code only allocates from equal or
      higher AGs for now we can simply the check to always allow for equal
      or higher AGs.
      
      Note that we need to eventually split the two meanings of the firstblock
      value.  At that point we can also allow the reflink code to allocate
      from any AG instead of limiting it in any way.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2402936
    • Brian Foster's avatar
      xfs: split indlen reservations fairly when under reserved · c251c6c2
      Brian Foster authored
      commit 75d65361 upstream.
      
      Certain workoads that punch holes into speculative preallocation can
      cause delalloc indirect reservation splits when the delalloc extent is
      split in two. If further splits occur, an already short-handed extent
      can be split into two in a manner that leaves zero indirect blocks for
      one of the two new extents. This occurs because the shortage is large
      enough that the xfs_bmap_split_indlen() algorithm completely drains the
      requested indlen of one of the extents before it honors the existing
      reservation.
      
      This ultimately results in a warning from xfs_bmap_del_extent(). This
      has been observed during file copies of large, sparse files using 'cp
      --sparse=always.'
      
      To avoid this problem, update xfs_bmap_split_indlen() to explicitly
      apply the reservation shortage fairly between both extents. This smooths
      out the overall indlen shortage and defers the situation where we end up
      with a delalloc extent with zero indlen reservation to extreme
      circumstances.
      Reported-by: default avatarPatrick Dung <mpatdung@gmail.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c251c6c2
    • Brian Foster's avatar
      xfs: handle indlen shortage on delalloc extent merge · 2d7c1c7f
      Brian Foster authored
      commit 0e339ef8 upstream.
      
      When a delalloc extent is created, it can be merged with pre-existing,
      contiguous, delalloc extents. When this occurs,
      xfs_bmap_add_extent_hole_delay() merges the extents along with the
      associated indirect block reservations. The expectation here is that the
      combined worst case indlen reservation is always less than or equal to
      the indlen reservation for the individual extents.
      
      This is not always the case, however, as existing extents can less than
      the expected indlen reservation if the extent was previously split due
      to a hole punch. If a new extent merges with such an extent, the total
      indlen requirement may be larger than the sum of the indlen reservations
      held by both extents.
      
      xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen
      reservation is always available and assigns it to the merged extent
      without consideration for the indlen held by the pre-existing extent. As
      a result, the subsequent xfs_mod_fdblocks() call can attempt an
      unintentional allocation rather than a free (indicated by an ASSERT()
      failure). Further, if the allocation happens to fail in this context,
      the failure goes unhandled and creates a filesystem wide block
      accounting inconsistency.
      
      Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the
      indlen reservation assigned to the merged extent to the sum of the
      indlen reservations held by each of the individual extents.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d7c1c7f
    • Darrick J. Wong's avatar
      xfs: allow unwritten extents in the CoW fork · 8370826f
      Darrick J. Wong authored
      commit 05a630d7 upstream.
      
      In the data fork, we only allow extents to perform the following state
      transitions:
      
      delay -> real <-> unwritten
      
      There's no way to move directly from a delalloc reservation to an
      /unwritten/ allocated extent.  However, for the CoW fork we want to be
      able to do the following to each extent:
      
      delalloc -> unwritten -> written -> remapped to data fork
      
      This will help us to avoid a race in the speculative CoW preallocation
      code between a first thread that is allocating a CoW extent and a second
      thread that is remapping part of a file after a write.  In order to do
      this, however, we need two things: first, we have to be able to
      transition from da to unwritten, and second the function that converts
      between real and unwritten has to be made aware of the cow fork.  Do
      both of those things.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8370826f
    • Darrick J. Wong's avatar
      xfs: filter out obviously bad btree pointers · efab3ae2
      Darrick J. Wong authored
      commit d5a91bae upstream.
      
      Don't let anybody load an obviously bad btree pointer.  Since the values
      come from disk, we must return an error, not just ASSERT.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efab3ae2
  4. 04 Feb, 2017 3 commits
  5. 12 Jan, 2017 9 commits
  6. 20 Oct, 2016 5 commits
  7. 05 Oct, 2016 8 commits
    • Darrick J. Wong's avatar
      xfs: try other AGs to allocate a BMBT block · 90e2056d
      Darrick J. Wong authored
      Prior to the introduction of reflink, allocating a block and mapping
      it into a file was performed in a single transaction with a single
      block reservation, and the allocator was supposed to find enough
      blocks to allocate the extent and any BMBT blocks that might be
      necessary (unless we're low on space).
      
      However, due to the way copy on write works, allocation and mapping
      have been split into two transactions, which means that we must be
      able to handle the case where we allocate an extent for CoW but that
      AG runs out of free space before the blocks can be mapped into a file,
      and the mapping requires a new BMBT block.  When this happens, look in
      one of the other AGs for a BMBT block instead of taking the FS down.
      
      The same applies to the functions that convert a data fork to extents
      and later btree format.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      90e2056d
    • Darrick J. Wong's avatar
      xfs: create a separate cow extent size hint for the allocator · f7ca3522
      Darrick J. Wong authored
      Create a per-inode extent size allocator hint for copy-on-write.  This
      hint is separate from the existing extent size hint so that CoW can
      take advantage of the fragmentation-reducing properties of extent size
      hints without disabling delalloc for regular writes.
      
      The extent size hint that's fed to the allocator during a copy on
      write operation is the greater of the cowextsize and regular extsize
      hint.
      
      During reflink, if we're sharing the entire source file to the entire
      destination file and the destination file doesn't already have a
      cowextsize hint, propagate the source file's cowextsize hint to the
      destination file.
      
      Furthermore, zero the bulkstat buffer prior to setting the fields
      so that we don't copy kernel memory contents into userspace.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      f7ca3522
    • Darrick J. Wong's avatar
      xfs: store in-progress CoW allocations in the refcount btree · 174edb0e
      Darrick J. Wong authored
      Due to the way the CoW algorithm in XFS works, there's an interval
      during which blocks allocated to handle a CoW can be lost -- if the FS
      goes down after the blocks are allocated but before the block
      remapping takes place.  This is exacerbated by the cowextsz hint --
      allocated reservations can sit around for a while, waiting to get
      used.
      
      Since the refcount btree doesn't normally store records with refcount
      of 1, we can use it to record these in-progress extents.  In-progress
      blocks cannot be shared because they're not user-visible, so there
      shouldn't be any conflicts with other programs.  This is a better
      solution than holding EFIs during writeback because (a) EFIs can't be
      relogged currently, (b) even if they could, EFIs are bound by
      available log space, which puts an unnecessary upper bound on how much
      CoW we can have in flight, and (c) we already have a mechanism to
      track blocks.
      
      At mount time, read the refcount records and free anything we find
      with a refcount of 1 because those were in-progress when the FS went
      down.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      174edb0e
    • Darrick J. Wong's avatar
      xfs: support removing extents from CoW fork · 4862cfe8
      Darrick J. Wong authored
      Create a helper method to remove extents from the CoW fork without
      any of the side effects (rmapbt/bmbt updates) of the regular extent
      deletion routine.  We'll eventually use this to clear out the CoW fork
      during ioend processing.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      4862cfe8
    • Darrick J. Wong's avatar
      xfs: support allocating delayed extents in CoW fork · 60b4984f
      Darrick J. Wong authored
      Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
      allocation extents in the CoW fork to real allocations, and wire this
      up all the way back to xfs_iomap_write_allocate().  In a subsequent
      patch, we'll modify the writepage handler to call this.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      60b4984f
    • Darrick J. Wong's avatar
      xfs: support bmapping delalloc extents in the CoW fork · be51f811
      Darrick J. Wong authored
      Allow the creation of delayed allocation extents in the CoW fork.  In
      a subsequent patch we'll wire up iomap_begin to actually do this via
      reflink helper functions.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      be51f811
    • Darrick J. Wong's avatar
      xfs: introduce the CoW fork · 3993baeb
      Darrick J. Wong authored
      Introduce a new in-core fork for storing copy-on-write delalloc
      reservations and allocated extents that are in the process of being
      written out.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      3993baeb
    • Darrick J. Wong's avatar
      xfs: return work remaining at the end of a bunmapi operation · 4453593b
      Darrick J. Wong authored
      Return the range of file blocks that bunmapi didn't free.  This hint
      is used by CoW and reflink to figure out what part of an extent
      actually got freed so that it can set up the appropriate atomic
      remapping of just the freed range.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      4453593b
  8. 04 Oct, 2016 3 commits
  9. 03 Oct, 2016 1 commit