1. 23 Sep, 2020 8 commits
    • Gao Xiang's avatar
      xfs: clean up calculation of LR header blocks · 0c771b99
      Gao Xiang authored
      Let's use DIV_ROUND_UP() to calculate log record header
      blocks as what did in xlog_get_iclog_buffer_size() and
      wrap up a common helper for log recovery.
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarGao Xiang <hsiangkao@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      0c771b99
    • Gao Xiang's avatar
      xfs: avoid LR buffer overrun due to crafted h_len · f692d09e
      Gao Xiang authored
      Currently, crafted h_len has been blocked for the log
      header of the tail block in commit a70f9fe5 ("xfs:
      detect and handle invalid iclog size set by mkfs").
      
      However, each log record could still have crafted h_len
      and cause log record buffer overrun. So let's check
      h_len vs buffer size for each log record as well.
      Signed-off-by: default avatarGao Xiang <hsiangkao@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      f692d09e
    • Darrick J. Wong's avatar
      xfs: don't release log intent items when recovery fails · 384ff09b
      Darrick J. Wong authored
      Nowadays, log recovery will call ->release on the recovered intent items
      if recovery fails.  Therefore, it's redundant to release them from
      inside the ->recover functions when they're about to return an error.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      384ff09b
    • Darrick J. Wong's avatar
      xfs: attach inode to dquot in xfs_bui_item_recover · 2dbf872c
      Darrick J. Wong authored
      In the bmap intent item recovery code, we must be careful to attach the
      inode to its dquots (if quotas are enabled) so that a change in the
      shape of the bmap btree doesn't cause the quota counters to be
      incorrect.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      2dbf872c
    • Darrick J. Wong's avatar
      xfs: log new intent items created as part of finishing recovered intent items · 93293bcb
      Darrick J. Wong authored
      During a code inspection, I found a serious bug in the log intent item
      recovery code when an intent item cannot complete all the work and
      decides to requeue itself to get that done.  When this happens, the
      item recovery creates a new incore deferred op representing the
      remaining work and attaches it to the transaction that it allocated.  At
      the end of _item_recover, it moves the entire chain of deferred ops to
      the dummy parent_tp that xlog_recover_process_intents passed to it, but
      fail to log a new intent item for the remaining work before committing
      the transaction for the single unit of work.
      
      xlog_finish_defer_ops logs those new intent items once recovery has
      finished dealing with the intent items that it recovered, but this isn't
      sufficient.  If the log is forced to disk after a recovered log item
      decides to requeue itself and the system goes down before we call
      xlog_finish_defer_ops, the second log recovery will never see the new
      intent item and therefore has no idea that there was more work to do.
      It will finish recovery leaving the filesystem in a corrupted state.
      
      The same logic applies to /any/ deferred ops added during intent item
      recovery, not just the one handling the remaining work.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      93293bcb
    • Darrick J. Wong's avatar
      xfs: check dabtree node hash values when loading child blocks · e581c939
      Darrick J. Wong authored
      When xchk_da_btree_block is loading a non-root dabtree block, we know
      that the parent block had to have a (hashval, address) pointer to the
      block that we just loaded.  Check that the hashval in the parent matches
      the block we just loaded.
      
      This was found by fuzzing nbtree[3].hashval = ones in xfs/394.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      e581c939
    • Darrick J. Wong's avatar
      xfs: don't free rt blocks when we're doing a REMAP bunmapi call · 8df0fa39
      Darrick J. Wong authored
      When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent
      to be unmapped from the given file fork without the extent being freed.
      We do this for non-rt files, but we forgot to do this for realtime
      files.  So far this isn't a big deal since nobody makes a bunmapi call
      to a rt file with the REMAP flag set, but don't leave a logic bomb.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      8df0fa39
    • Chandan Babu R's avatar
      xfs: Set xfs_buf's b_ops member when zeroing bitmap/summary files · c54e14d1
      Chandan Babu R authored
      In xfs_growfs_rt(), we enlarge bitmap and summary files by allocating
      new blocks for both files. For each of the new blocks allocated, we
      allocate an xfs_buf, zero the payload, log the contents and commit the
      transaction. Hence these buffers will eventually find themselves
      appended to list at xfs_ail->ail_buf_list.
      
      Later, xfs_growfs_rt() loops across all of the new blocks belonging to
      the bitmap inode to set the bitmap values to 1. In doing so, it
      allocates a new transaction and invokes the following sequence of
      functions,
        - xfs_rtfree_range()
          - xfs_rtmodify_range()
            - xfs_rtbuf_get()
              We pass '&xfs_rtbuf_ops' as the ops pointer to xfs_trans_read_buf().
              - xfs_trans_read_buf()
      	  We find the xfs_buf of interest in per-ag hash table, invoke
      	  xfs_buf_reverify() which ends up assigning '&xfs_rtbuf_ops' to
      	  xfs_buf->b_ops.
      
      On the other hand, if xfs_growfs_rt_alloc() had allocated a few blocks
      for the bitmap inode and returned with an error, all the xfs_bufs
      corresponding to the new bitmap blocks that have been allocated would
      continue to be on xfs_ail->ail_buf_list list without ever having a
      non-NULL value assigned to their b_ops members. An AIL flush operation
      would then trigger the following warning message to be printed on the
      console,
      
        XFS (loop0): _xfs_buf_ioapply: no buf ops on daddr 0x58 len 8
        00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        CPU: 3 PID: 449 Comm: xfsaild/loop0 Not tainted 5.8.0-rc4-chandan-00038-g4d8c2b9de9ab-dirty #37
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        Call Trace:
         dump_stack+0x57/0x70
         _xfs_buf_ioapply+0x37c/0x3b0
         ? xfs_rw_bdev+0x1e0/0x1e0
         ? xfs_buf_delwri_submit_buffers+0xd4/0x210
         __xfs_buf_submit+0x6d/0x1f0
         xfs_buf_delwri_submit_buffers+0xd4/0x210
         xfsaild+0x2c8/0x9e0
         ? __switch_to_asm+0x42/0x70
         ? xfs_trans_ail_cursor_first+0x80/0x80
         kthread+0xfe/0x140
         ? kthread_park+0x90/0x90
         ret_from_fork+0x22/0x30
      
      This message indicates that the xfs_buf had its b_ops member set to
      NULL.
      
      This commit fixes the issue by assigning "&xfs_rtbuf_ops" to b_ops
      member of each of the xfs_bufs logged by xfs_growfs_rt_alloc().
      Signed-off-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      c54e14d1
  2. 21 Sep, 2020 2 commits
    • Chandan Babu R's avatar
      xfs: Set xfs_buf type flag when growing summary/bitmap files · 72cc9513
      Chandan Babu R authored
      The following sequence of commands,
      
        mkfs.xfs -f -m reflink=0 -r rtdev=/dev/loop1,size=10M /dev/loop0
        mount -o rtdev=/dev/loop1 /dev/loop0 /mnt
        xfs_growfs  /mnt
      
      ... causes the following call trace to be printed on the console,
      
      XFS: Assertion failed: (bip->bli_flags & XFS_BLI_STALE) || (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF), file: fs/xfs/xfs_buf_item.c, line: 331
      Call Trace:
       xfs_buf_item_format+0x632/0x680
       ? kmem_alloc_large+0x29/0x90
       ? kmem_alloc+0x70/0x120
       ? xfs_log_commit_cil+0x132/0x940
       xfs_log_commit_cil+0x26f/0x940
       ? xfs_buf_item_init+0x1ad/0x240
       ? xfs_growfs_rt_alloc+0x1fc/0x280
       __xfs_trans_commit+0xac/0x370
       xfs_growfs_rt_alloc+0x1fc/0x280
       xfs_growfs_rt+0x1a0/0x5e0
       xfs_file_ioctl+0x3fd/0xc70
       ? selinux_file_ioctl+0x174/0x220
       ksys_ioctl+0x87/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x3e/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurs because the buffer being formatted has the value of
      XFS_BLFT_UNKNOWN_BUF assigned to the 'type' subfield of
      bip->bli_formats->blf_flags.
      
      This commit fixes the issue by assigning one of XFS_BLFT_RTSUMMARY_BUF
      and XFS_BLFT_RTBITMAP_BUF to the 'type' subfield of
      bip->bli_formats->blf_flags before committing the corresponding
      transaction.
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      72cc9513
    • Brian Foster's avatar
      xfs: drop extra transaction roll from inode extent truncate · 6dd379c7
      Brian Foster authored
      The inode extent truncate path unmaps extents from the inode block
      mapping, finishes deferred ops to free the associated extents and
      then explicitly rolls the transaction before processing the next
      extent. The latter extent roll is spurious as xfs_defer_finish()
      always returns a clean transaction and automatically relogs inodes
      attached to the transaction (with lock_flags == 0). This can
      unnecessarily increase the number of log ticket regrants that occur
      during a long running truncate operation. Remove the explicit
      transaction roll.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      6dd379c7
  3. 16 Sep, 2020 30 commits