1. 09 Feb, 2017 33 commits
  2. 04 Feb, 2017 7 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.8 · c8ea2f3b
      Greg Kroah-Hartman authored
      c8ea2f3b
    • Darrick J. Wong's avatar
      xfs: fix bmv_count confusion w/ shared extents · b5b4d4a9
      Darrick J. Wong authored
      commit c364b6d0 upstream.
      
      In a bmapx call, bmv_count is the total size of the array, including the
      zeroth element that userspace uses to supply the search key.  The output
      array starts at offset 1 so that we can set up the user for the next
      invocation.  Since we now can split an extent into multiple bmap records
      due to shared/unshared status, we have to be careful that we don't
      overflow the output array.
      
      In the original patch f86f4037 ("xfs: teach get_bmapx about shared
      extents and the CoW fork") I used cur_ext (the output index) to check
      for overflows, albeit with an off-by-one error.  Since nexleft no longer
      describes the number of unfilled slots in the output, we can rip all
      that out and use cur_ext for the overflow check directly.
      
      Failure to do this causes heap corruption in bmapx callers such as
      xfs_io and xfs_scrub.  xfs/328 can reproduce this problem.
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5b4d4a9
    • Darrick J. Wong's avatar
      xfs: clear _XBF_PAGES from buffers when readahead page · 5d44dd54
      Darrick J. Wong authored
      commit 2aa6ba7b upstream.
      
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      availalble memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d44dd54
    • Christoph Hellwig's avatar
      xfs: extsize hints are not unlikely in xfs_bmap_btalloc · 29f96b7e
      Christoph Hellwig authored
      commit 493611eb upstream.
      
      With COW files they are the hotpath, just like for files with the
      extent size hint attribute.  We really shouldn't micro-manage anything
      but failure cases with unlikely.
      
      Additionally Arnd Bergmann recently reported that one of these two
      unlikely annotations causes link failures together with an upcoming
      kernel instrumentation patch, so let's get rid of it ASAP.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29f96b7e
    • Brian Foster's avatar
      xfs: remove racy hasattr check from attr ops · aab858da
      Brian Foster authored
      commit 5a93790d upstream.
      
      xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
      away a lock cycle in cases where the fork does not exist or is otherwise
      empty. This check is not safe, however, because an attribute fork short
      form to extent format conversion includes a transient state that causes
      the xfs_inode_hasattr() check to fail. Specifically,
      xfs_attr_shortform_to_leaf() creates an empty extent format attribute
      fork and then adds the existing shortform attributes to it.
      
      This means that lookup of an existing xattr can spuriously return
      -ENOATTR when racing against a setxattr that causes the associated
      format conversion. This was originally reproduced by an untar on a
      particularly configured glusterfs volume, but can also be reproduced on
      demand with properly crafted xattr requests.
      
      The format conversion occurs under the exclusive ilock. xfs_attr_get()
      and xfs_attr_remove() already have the proper locking and checks further
      down in the functions to handle this situation correctly. Drop the
      unlocked checks to avoid the spurious failure and rely on the existing
      logic.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aab858da
    • Darrick J. Wong's avatar
      xfs: verify dirblocklog correctly · 29094164
      Darrick J. Wong authored
      commit 83d230eb upstream.
      
      sb_dirblklog is added to sb_blocklog to compute the directory block size
      in bytes.  Therefore, we must compare the sum of both those values
      against XFS_MAX_BLOCKSIZE_LOG, not just dirblklog.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29094164
    • Christoph Hellwig's avatar
      xfs: fix COW writeback race · 214d55ef
      Christoph Hellwig authored
      commit d2b3964a upstream.
      
      Due to the way how xfs_iomap_write_allocate tries to convert the whole
      found extents from delalloc to real space we can run into a race
      condition with multiple threads doing writes to this same extent.
      For the non-COW case that is harmless as the only thing that can happen
      is that we call xfs_bmapi_write on an extent that has already been
      converted to a real allocation.  For COW writes where we move the extent
      from the COW to the data fork after I/O completion the race is, however,
      not quite as harmless.  In the worst case we are now calling
      xfs_bmapi_write on a region that contains hole in the COW work, which
      will trip up an assert in debug builds or lead to file system corruption
      in non-debug builds.  This seems to be reproducible with workloads of
      small O_DSYNC write, although so far I've not managed to come up with
      a with an isolated reproducer.
      
      The fix for the issue is relatively simple:  tell xfs_bmapi_write
      that we are only asked to convert delayed allocations and skip holes
      in that case.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      214d55ef