1. 07 Nov, 2019 1 commit
    • Darrick J. Wong's avatar
      xfs: periodically yield scrub threads to the scheduler · 5d1116d4
      Darrick J. Wong authored
      Christoph Hellwig complained about the following soft lockup warning
      when running scrub after generic/175 when preemption is disabled and
      slub debugging is enabled:
      
      watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [xfs_scrub:161]
      Modules linked in:
      irq event stamp: 41692326
      hardirqs last  enabled at (41692325): [<ffffffff8232c3b7>] _raw_0
      hardirqs last disabled at (41692326): [<ffffffff81001c5a>] trace0
      softirqs last  enabled at (41684994): [<ffffffff8260031f>] __do_e
      softirqs last disabled at (41684987): [<ffffffff81127d8c>] irq_e0
      CPU: 3 PID: 16189 Comm: xfs_scrub Not tainted 5.4.0-rc3+ #30
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.124
      RIP: 0010:_raw_spin_unlock_irqrestore+0x39/0x40
      Code: 89 f3 be 01 00 00 00 e8 d5 3a e5 fe 48 89 ef e8 ed 87 e5 f2
      RSP: 0018:ffffc9000233f970 EFLAGS: 00000286 ORIG_RAX: ffffffffff3
      RAX: ffff88813b398040 RBX: 0000000000000286 RCX: 0000000000000006
      RDX: 0000000000000006 RSI: ffff88813b3988c0 RDI: ffff88813b398040
      RBP: ffff888137958640 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00042b0c00
      R13: 0000000000000001 R14: ffff88810ac32308 R15: ffff8881376fc040
      FS:  00007f6113dea700(0000) GS:ffff88813bb80000(0000) knlGS:00000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6113de8ff8 CR3: 000000012f290000 CR4: 00000000000006e0
      Call Trace:
       free_debug_processing+0x1dd/0x240
       __slab_free+0x231/0x410
       kmem_cache_free+0x30e/0x360
       xchk_ag_btcur_free+0x76/0xb0
       xchk_ag_free+0x10/0x80
       xchk_bmap_iextent_xref.isra.14+0xd9/0x120
       xchk_bmap_iextent+0x187/0x210
       xchk_bmap+0x2e0/0x3b0
       xfs_scrub_metadata+0x2e7/0x500
       xfs_ioc_scrub_metadata+0x4a/0xa0
       xfs_file_ioctl+0x58a/0xcd0
       do_vfs_ioctl+0xa0/0x6f0
       ksys_ioctl+0x5b/0x90
       __x64_sys_ioctl+0x11/0x20
       do_syscall_64+0x4b/0x1a0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      If preemption is disabled, all metadata buffers needed to perform the
      scrub are already in memory, and there are a lot of records to check,
      it's possible that the scrub thread will run for an extended period of
      time without sleeping for IO or any other reason.  Then the watchdog
      timer or the RCU stall timeout can trigger, producing the backtrace
      above.
      
      To fix this problem, call cond_resched() from the scrub thread so that
      we back out to the scheduler whenever necessary.
      Reported-by: default avatarChristoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      5d1116d4
  2. 06 Nov, 2019 2 commits
  3. 05 Nov, 2019 20 commits
  4. 04 Nov, 2019 3 commits
  5. 03 Nov, 2019 9 commits
  6. 31 Oct, 2019 1 commit
    • Dave Chinner's avatar
      xfs: properly serialise fallocate against AIO+DIO · 249bd908
      Dave Chinner authored
      AIO+DIO can extend the file size on IO completion, and it holds
      no inode locks while the IO is in flight. Therefore, a race
      condition exists in file size updates if we do something like this:
      
      aio-thread			fallocate-thread
      
      lock inode
      submit IO beyond inode->i_size
      unlock inode
      .....
      				lock inode
      				break layouts
      				if (off + len > inode->i_size)
      					new_size = off + len
      				.....
      				inode_dio_wait()
      				<blocks>
      .....
      completes
      inode->i_size updated
      inode_dio_done()
      ....
      				<wakes>
      				<does stuff no long beyond EOF>
      				if (new_size)
      					xfs_vn_setattr(inode, new_size)
      
      
      Yup, that attempt to extend the file size in the fallocate code
      turns into a truncate - it removes the whatever the aio write
      allocated and put to disk, and reduced the inode size back down to
      where the fallocate operation ends.
      
      Fundamentally, xfs_file_fallocate()  not compatible with racing
      AIO+DIO completions, so we need to move the inode_dio_wait() call
      up to where the lock the inode and break the layouts.
      
      Secondly, storing the inode size and then using it unchecked without
      holding the ILOCK is not safe; we can only do such a thing if we've
      locked out and drained all IO and other modification operations,
      which we don't do initially in xfs_file_fallocate.
      
      It should be noted that some of the fallocate operations are
      compound operations - they are made up of multiple manipulations
      that may zero data, and so we may need to flush and invalidate the
      file multiple times during an operation. However, we only need to
      lock out IO and other space manipulation operations once, as that
      lockout is maintained until the entire fallocate operation has been
      completed.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      249bd908
  7. 29 Oct, 2019 4 commits