1. 26 Jan, 2020 6 commits
  2. 24 Jan, 2020 2 commits
  3. 20 Jan, 2020 1 commit
  4. 16 Jan, 2020 8 commits
    • xfs: check log iovec size to make sure it's plausibly a buffer log format · 8a6453a8
      Darrick J. Wong authored
      When log recovery is processing buffer log items, we should check that
      the incoming iovec actually describes a region of memory large enough to
      contain the log format and the dirty map.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
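The check described in 8a6453a8 can be pictured with a small userspace sketch. This is not the kernel code: struct blf_hdr is a simplified stand-in with the same field widths as the pahole listing in this log, and blf_iovec_plausible() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the buffer log format (not the kernel struct). */
struct blf_hdr {
	uint16_t type;
	uint16_t size;
	uint16_t flags;
	uint16_t len;
	int64_t  blkno;
	uint32_t map_size;	/* number of 32-bit words in data_map */
	uint32_t data_map[];	/* dirty bitmap, variable length */
};

/* The fixed part ends where the dirty map begins. */
static_assert(offsetof(struct blf_hdr, data_map) == 20,
	      "fixed header is expected to be 20 bytes");

/*
 * Hypothetical helper: return true if a region of iov_len bytes is large
 * enough to hold the fixed header plus the dirty map that the header
 * claims to carry.
 */
bool blf_iovec_plausible(const void *iov_base, size_t iov_len)
{
	const struct blf_hdr *blf = iov_base;
	size_t hdr = offsetof(struct blf_hdr, data_map);

	if (iov_base == NULL || iov_len < hdr)
		return false;	/* too small to even read map_size */
	return iov_len >= hdr + (size_t)blf->map_size * sizeof(uint32_t);
}
```

The order matters: the length is compared against the fixed header before map_size is read, so an undersized region is rejected without touching memory past its end.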
    • xfs: make struct xfs_buf_log_format have a consistent size · b7df5e92
      Darrick J. Wong authored
      Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
      end of struct xfs_buf_log_format.  This makes the size consistent so
      that we can check it in xfs_ondisk.h, and will be needed once we start
      logging attribute values.
      
      On amd64 we get the following pahole:
      
      struct xfs_buf_log_format {
              short unsigned int         blf_type;       /*     0     2 */
              short unsigned int         blf_size;       /*     2     2 */
              short unsigned int         blf_flags;      /*     4     2 */
              short unsigned int         blf_len;        /*     6     2 */
              long long int              blf_blkno;      /*     8     8 */
              unsigned int               blf_map_size;   /*    16     4 */
              unsigned int               blf_data_map[16]; /*    20    64 */
              /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */
      
              /* size: 88, cachelines: 2, members: 7 */
              /* padding: 4 */
              /* last cacheline: 24 bytes */
      };
      
      But on i386 we get the following:
      
      struct xfs_buf_log_format {
              short unsigned int         blf_type;       /*     0     2 */
              short unsigned int         blf_size;       /*     2     2 */
              short unsigned int         blf_flags;      /*     4     2 */
              short unsigned int         blf_len;        /*     6     2 */
              long long int              blf_blkno;      /*     8     8 */
              unsigned int               blf_map_size;   /*    16     4 */
              unsigned int               blf_data_map[16]; /*    20    64 */
              /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */
      
              /* size: 84, cachelines: 2, members: 7 */
              /* last cacheline: 20 bytes */
      };
      
      Notice how the amd64 compiler inserts 4 bytes of padding to the end of
      the structure to ensure 8-byte alignment.  Prior to "xfs: fix memory
      corruption during remote attr value buffer invalidation" we would try to
      write to blf_data_map[17], which is harmless on amd64 but really bad on
      i386.
      
      This shouldn't cause any changes in the ondisk logging formats because
      the log code writes out the log vectors with the appropriate size for
      the log item's map_size, and log recovery treats the data_map array as a
      VLA.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
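The padding difference in b7df5e92 can be reproduced outside the kernel. The sketch below copies the field layout from the pahole listings into two illustrative structs (blf_old and blf_new are made-up names) and assumes the usual 2/4/8-byte sizes for short, int and long long.

```c
#include <assert.h>
#include <stddef.h>

/* Layout from the pahole output: 16 map words, 84 bytes of members. */
struct blf_old {
	unsigned short blf_type;
	unsigned short blf_size;
	unsigned short blf_flags;
	unsigned short blf_len;
	long long      blf_blkno;
	unsigned int   blf_map_size;
	unsigned int   blf_data_map[16];
};

/* Same layout with one more map word, which occupies the former padding. */
struct blf_new {
	unsigned short blf_type;
	unsigned short blf_size;
	unsigned short blf_flags;
	unsigned short blf_len;
	long long      blf_blkno;
	unsigned int   blf_map_size;
	unsigned int   blf_data_map[17];
};

/* 20 + 17 * 4 = 88 is a multiple of both 4 and 8, so no ABI needs padding. */
static_assert(sizeof(struct blf_new) == 88, "size should not depend on the ABI");

/* Bytes the compiler appended after the last member of the old layout. */
size_t blf_old_tail_padding(void)
{
	return sizeof(struct blf_old) -
	       (offsetof(struct blf_old, blf_data_map) + 16 * sizeof(unsigned int));
}
```

With long long aligned to 8 bytes (amd64) blf_old_tail_padding() comes out as 4, with 4-byte alignment (i386) it comes out as 0, which matches the two listings.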
    • xfs: complain if anyone tries to create a too-large buffer log item · c3d5f0c2
      Darrick J. Wong authored
      Complain if someone calls xfs_buf_item_init on a buffer that is larger
      than the dirty bitmap can handle, or tries to log a region that's past
      the end of the dirty bitmap.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
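A rough model of the two complaints in c3d5f0c2, assuming each dirty-bitmap bit covers a 128-byte chunk and each map word holds 32 bits. Those constants are assumptions chosen to agree with the 16-word map and 64k limit mentioned elsewhere in this log, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_BYTES   128u	/* assumed bytes covered by one bitmap bit */
#define BITS_PER_WORD 32u	/* bits in one dirty-map word */

/* Largest buffer, in bytes, that a map of map_words words can describe. */
size_t dirty_map_capacity(size_t map_words)
{
	return map_words * BITS_PER_WORD * CHUNK_BYTES;
}

/* Init-time check: can the whole buffer be tracked by the map? */
bool buffer_fits_dirty_map(size_t buf_bytes, size_t map_words)
{
	return buf_bytes <= dirty_map_capacity(map_words);
}

/* Log-time check: does the byte range [first, last] stay inside the map? */
bool region_fits_dirty_map(size_t first, size_t last, size_t map_words)
{
	assert(first <= last);
	return last / CHUNK_BYTES < map_words * BITS_PER_WORD;
}
```

Under these assumptions a 16-word map covers exactly 64k, so a 68k buffer fails the init-time check, while a 17-word map covers 68k.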
    • xfs: clean up xfs_buf_item_get_format return value · c64dd49b
      Darrick J. Wong authored
      The only thing that can cause a nonzero return from
      xfs_buf_item_get_format is if the kmem_alloc fails, which it can't.
      Get rid of all the unnecessary error handling.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: streamline xfs_attr3_leaf_inactive · 0bb9d159
      Darrick J. Wong authored
      Now that we know we don't have to take a transaction to stale the incore
      buffers for a remote value, get rid of the unnecessary memory allocation
      in the leaf walker and call the rmt_stale function directly.  Flatten
      the loop while we're at it.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix memory corruption during remote attr value buffer invalidation · e8db2aaf
      Darrick J. Wong authored
      While running generic/103, I observed what looks like memory corruption
      and (with slub debugging turned on) a slub redzone warning on i386 when
      inactivating an inode with a 64k remote attr value.
      
      On a v5 filesystem, maximally sized remote attr values require one block
      more than 64k worth of space to hold both the value and the remote
      attribute value header (64 bytes).  On a 4k block filesystem this results in a 68k
      buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
      that even though we'll never use more than 65,600 bytes of this buffer,
      XFS_MAX_BLOCKSIZE is 64k.
      
      This is a problem because the definition of struct xfs_buf_log_format
      allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
      invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
      the dirty map, writing right off the end of the log item and corrupting
      memory.  We've gotten away with this on x86_64 for years because the
      compiler inserts a u32 padding on the end of struct xfs_buf_log_format.
      
      Fortunately for us, remote attribute values are written to disk with
      xfs_bwrite(), which is to say that they are not logged.  Fix the problem
      by removing all places where we could end up creating a buffer log item
      for a remote attribute value and leave a note explaining why.  Next,
      replace the open-coded buffer invalidation with a call to the helper we
      created in the previous patch that does better checking for bad metadata
      before marking the buffer stale.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
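The buffer sizes quoted in e8db2aaf follow from rounding 64k plus a 64-byte header up to whole filesystem blocks. The sketch below only reproduces that arithmetic from the message; it is not how the kernel computes the buffer length, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round bytes up to the next multiple of block_size. */
size_t round_up_to_block(size_t bytes, size_t block_size)
{
	assert(block_size > 0);
	return (bytes + block_size - 1) / block_size * block_size;
}

/* Buffer needed for a value plus one header, in whole blocks. */
size_t remote_value_buffer_bytes(size_t value_len, size_t hdr_len,
				 size_t block_size)
{
	return round_up_to_block(value_len + hdr_len, block_size);
}
```

For a 65,536-byte value and a 64-byte header this gives 69,632 bytes (68k) with 4k blocks and 131,072 bytes (128k) with 64k blocks. Both exceed the 64k that the dirty bitmap was sized for.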
    • xfs: refactor remote attr value buffer invalidation · 8edbb26b
      Darrick J. Wong authored
      Hoist the code that invalidates remote extended attribute value buffers
      into a separate helper function.  This prepares us for a memory
      corruption fix in the next patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read · 7b53b868
      Christoph Hellwig authored
      Direct I/O reads can also be used with RWF_NOWAIT & co.  Fix the inode
      locking in xfs_file_dio_aio_read to take IOCB_NOWAIT into account.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  5. 15 Jan, 2020 1 commit
    • xfs: Add __packed to xfs_dir2_sf_entry_t definition · ca78eee7
      Vincenzo Frascino authored
      xfs_check_ondisk_structs() verifies that the sizes of the data types
      used by xfs are correct via the XFS_CHECK_STRUCT_SIZE() macro.
      
      Since structure padding can vary depending on the ABI (e.g. on
      ARM OABI structures are padded to a multiple of 32 bits), the
      xfs_dir2_sf_entry_t size check can break compilation with the
      assertion below:
      
      In file included from linux/include/linux/string.h:6,
                       from linux/include/linux/uuid.h:12,
                       from linux/fs/xfs/xfs_linux.h:10,
                       from linux/fs/xfs/xfs.h:22,
                       from linux/fs/xfs/xfs_super.c:7:
      In function ‘xfs_check_ondisk_structs’,
          inlined from ‘init_xfs_fs’ at linux/fs/xfs/xfs_super.c:2025:2:
      linux/include/linux/compiler.h:350:38:
          error: call to ‘__compiletime_assert_107’ declared with attribute
          error: XFS: sizeof(xfs_dir2_sf_entry_t) is wrong, expected 3
          _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
      
      Restore the correct behavior by adding __packed to the structure definition.
      
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
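The effect of ca78eee7 can be sketched with a struct modeled on the short-form directory entry header: three single-byte fields followed by a flexible name array. The struct name sf_entry is illustrative, and the GCC/Clang attribute is spelled out here where the kernel uses its __packed macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modeled on the short-form entry header; packed so no ABI can pad it. */
struct sf_entry {
	uint8_t namelen;	/* length of the name that follows */
	uint8_t offset[2];	/* saved offset */
	uint8_t name[];		/* name bytes, variable length */
} __attribute__((packed));

/* Compile-time size check in the spirit of XFS_CHECK_STRUCT_SIZE(). */
static_assert(sizeof(struct sf_entry) == 3, "sf_entry header must be 3 bytes");

/* Bytes occupied by the fixed part of one entry. */
size_t sf_entry_header_size(void)
{
	return sizeof(struct sf_entry);
}
```

On most ABIs the assertion holds even without the attribute, since every member is a single byte. The attribute matters on ABIs that round every structure up to a larger multiple, as the message describes for ARM OABI.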
  6. 14 Jan, 2020 3 commits
    • xfs: fix s_maxbytes computation on 32-bit kernels · 932befe3
      Darrick J. Wong authored
      I observed a hang in generic/308 while running fstests on an i686 kernel.
      The hang occurred when trying to purge the pagecache on a large sparse
      file that had a page created past MAX_LFS_FILESIZE, which caused an
      integer overflow in the pagecache xarray and resulted in an infinite
      loop.
      
      I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in
      commit 0cc3b0ec ("Clarify (and fix) MAX_LFS_FILESIZE macros") so
      that it is now one page short of the maximum page index on 32-bit
      kernels.  Because the XFS function to compute max offset open-codes the
      2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform
      any sanity checking of s_maxbytes, the code in generic/308 can create a
      page above the pagecache's limit and kaboom.
      
      Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and
      aborting the mount with a warning if our assumptions ever break.  I have
      no answer for why this seems to have been broken for years and nobody
      noticed.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
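The gap described in 932befe3 can be worked through numerically for a 32-bit kernel with 4k pages. Both formulas below are assumptions based on a reading of the message, not quotations of kernel code: the older open-coded limit is taken to be (PAGE_SIZE << 32) - 1, and the newer MAX_LFS_FILESIZE is taken to be a 32-bit ULONG_MAX shifted left by PAGE_SHIFT. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2005-era style limit on a 32-bit kernel: (PAGE_SIZE << 32) - 1. */
uint64_t old_style_max_bytes(unsigned page_shift)
{
	return ((UINT64_C(1) << page_shift) << 32) - 1;
}

/* Assumed newer limit on a 32-bit kernel: ULONG_MAX << PAGE_SHIFT. */
uint64_t new_style_max_bytes(unsigned page_shift)
{
	return (uint64_t)UINT32_MAX << page_shift;
}

/* Page index of the last byte of a file that is max_bytes long. */
uint64_t last_page_index(uint64_t max_bytes, unsigned page_shift)
{
	assert(max_bytes > 0);
	return (max_bytes - 1) >> page_shift;
}
```

With a page shift of 12 the older formula allows a last page index of 0xFFFFFFFF, the largest value a 32-bit unsigned long can hold, so computing "index + 1" wraps to zero. The newer formula stops at 0xFFFFFFFE, one page short of that.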
    • xfs: truncate should remove all blocks, not just to the end of the page cache · 4bbb04ab
      Darrick J. Wong authored
      xfs_itruncate_extents_flags() is supposed to unmap every block in a file
      from EOF onwards.  Oddly, it uses s_maxbytes as the upper limit to the
      bunmapi range, even though s_maxbytes reflects the highest offset the
      pagecache can support, not the highest offset that XFS supports.
      
      The result of this confusion is that if you create a 20T file on a
      64-bit machine, mount the filesystem on a 32-bit machine, and remove the
      file, we leak everything above 16T.  Fix this by capping the bunmapi
      request at the maximum possible block offset, not s_maxbytes.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
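The leak in 4bbb04ab is plain arithmetic once the unmap range is capped too low. The sketch below assumes 4k blocks and treats the cap as a block count. The function names are hypothetical, and the large cap used in the usage note (2^54 - 1) is an illustrative value, not a quotation of the kernel's constant.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count to filesystem blocks (block size = 1 << block_shift). */
uint64_t bytes_to_blocks(uint64_t bytes, unsigned block_shift)
{
	return bytes >> block_shift;
}

/*
 * Blocks that stay mapped when a whole-file unmap stops at cap_blocks
 * instead of covering every block the file actually has.
 */
uint64_t blocks_left_behind(uint64_t file_blocks, uint64_t cap_blocks)
{
	return file_blocks > cap_blocks ? file_blocks - cap_blocks : 0;
}
```

For a 20T file and a 16T cap this leaves 1,073,741,824 four-kilobyte blocks (4T) behind. With a cap of 2^54 - 1 blocks nothing is left.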
    • xfs: introduce XFS_MAX_FILEOFF · a5084865
      Darrick J. Wong authored
      Introduce a new #define for the maximum supported file block offset.
      We'll use this in the next patch to make it more obvious that we're
      doing some operation for all possible inode fork mappings after a given
      offset.  We can't use ULLONG_MAX here because bunmapi uses that to
      detect when it's done.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  7. 09 Jan, 2020 6 commits
  8. 07 Jan, 2020 1 commit
  9. 06 Jan, 2020 2 commits
  10. 29 Dec, 2019 5 commits
  11. 28 Dec, 2019 4 commits
  12. 27 Dec, 2019 1 commit