1. 21 Aug, 2020 2 commits
    • Dave Chinner's avatar
      xfs: validate cached inodes are free when allocated · 42c59d54
      Dave Chinner authored
      [ Upstream commit afca6c5b ]
      
      A recent fuzzed filesystem image cached random dcache corruption
      when the reproducer was run. This often showed up as panics in
      lookup_slow() on a null inode->i_ops pointer when doing pathwalks.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      ....
      Call Trace:
       lookup_slow+0x44/0x60
       walk_component+0x3dd/0x9f0
       link_path_walk+0x4a7/0x830
       path_lookupat+0xc1/0x470
       filename_lookup+0x129/0x270
       user_path_at_empty+0x36/0x40
       path_listxattr+0x98/0x110
       SyS_listxattr+0x13/0x20
       do_syscall_64+0xf5/0x280
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      but had many different failure modes including deadlocks trying to
      lock the inode that was just allocated or KASAN reports of
      use-after-free violations.
      
      The cause of the problem was a corrupt INOBT on a v4 fs where the
      root inode was marked as free in the inobt record. Hence when we
      allocated an inode, it chose the root inode to allocate, found it in
      the cache and re-initialised it.
      
      We recently fixed a similar inode allocation issue caused by inobt
      record corruption problem in xfs_iget_cache_miss() in commit
      ee457001 ("xfs: catch inode allocation state mismatch
      corruption"). This change adds similar checks to the cache-hit path
      to catch it, and turns the reproducer into a corruption shutdown
      situation.
      Reported-by: default avatarWen Xu <wen.xu@gatech.edu>
      Signed-Off-By: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      [darrick: fix typos in comment]
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      42c59d54
    • Dave Chinner's avatar
      xfs: catch inode allocation state mismatch corruption · 6de46e2c
      Dave Chinner authored
      [ Upstream commit ee457001 ]
      
      We recently came across a V4 filesystem causing memory corruption
      due to a newly allocated inode being setup twice and being added to
      the superblock inode list twice. From code inspection, the only way
      this could happen is if a newly allocated inode was not marked as
      free on disk (i.e. di_mode wasn't zero).
      
      Running the metadump on an upstream debug kernel fails during inode
      allocation like so:
      
      XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inod=
      e.c, line: 838
       ------------[ cut here ]------------
      kernel BUG at fs/xfs/xfs_message.c:114!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/0=
      1/2014
      RIP: 0010:assfail+0x28/0x30
      RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
      RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
      RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
      RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
      R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
      FS:  00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000=
      000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
      Call Trace:
       xfs_ialloc+0x383/0x570
       xfs_dir_ialloc+0x6a/0x2a0
       xfs_create+0x412/0x670
       xfs_generic_create+0x1f7/0x2c0
       ? capable_wrt_inode_uidgid+0x3f/0x50
       vfs_mkdir+0xfb/0x1b0
       SyS_mkdir+0xcf/0xf0
       do_syscall_64+0x73/0x1a0
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Extracting the inode number we crashed on from an event trace and
      looking at it with xfs_db:
      
      xfs_db> inode 184452204
      xfs_db> p
      core.magic = 0x494e
      core.mode = 0100644
      core.version = 2
      core.format = 2 (extents)
      core.nlinkv2 = 1
      core.onlink = 0
      .....
      
      Confirms that it is not a free inode on disk. xfs_repair
      also trips over this inode:
      
      .....
      zero length extent (off = 0, fsbno = 0) in ino 184452204
      correcting nextents for inode 184452204
      bad attribute fork in inode 184452204, would clear attr fork
      bad nblocks 1 for inode 184452204, would reset to 0
      bad anextents 1 for inode 184452204, would reset to 0
      imap claims in-use inode 184452204 is free, would correct imap
      would have cleared inode 184452204
      .....
      disconnected inode 184452204, would move to lost+found
      
      And so we have a situation where the directory structure and the
      inobt thinks the inode is free, but the inode on disk thinks it is
      still in use. Where this corruption came from is not possible to
      diagnose, but we can detect it and prevent the kernel from oopsing
      on lookup. The reproducer now results in:
      
      $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
      mkdir: cannot create directory =E2=80=98/mnt/scratch/00=E2=80=99: File ex=
      ists
      mkdir: cannot create directory =E2=80=98/mnt/scratch/01=E2=80=99: File ex=
      ists
      mkdir: cannot create directory =E2=80=98/mnt/scratch/03=E2=80=99: Structu=
      re needs cleaning
      mkdir: cannot create directory =E2=80=98/mnt/scratch/04=E2=80=99: Input/o=
      utput error
      mkdir: cannot create directory =E2=80=98/mnt/scratch/05=E2=80=99: Input/o=
      utput error
      ....
      
      And this corruption shutdown:
      
      [   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not=
       marked free on disk
      [   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 =
      of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
      [   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #=
      443
      [   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO=
      S 1.10.2-1 04/01/2014
      [   54.852859] Call Trace:
      [   54.853531]  dump_stack+0x85/0xc5
      [   54.854385]  xfs_trans_cancel+0x197/0x1c0
      [   54.855421]  xfs_create+0x425/0x670
      [   54.856314]  xfs_generic_create+0x1f7/0x2c0
      [   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
      [   54.858586]  vfs_mkdir+0xfb/0x1b0
      [   54.859458]  SyS_mkdir+0xcf/0xf0
      [   54.860254]  do_syscall_64+0x73/0x1a0
      [   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
      [   54.862492] RIP: 0033:0x7fb73bddf547
      [   54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000=
      000000000053
      [   54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73=
      bddf547
      [   54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffda=
      a55449a
      [   54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a=
      8670dd0
      [   54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 000000000=
      00001ff
      [   54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffda=
      a553500
      [   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1=
      024 of file fs/xfs/xfs_trans.c.  Return address = ffffffff814cd050
      [   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutt=
      ing down filesystem
      [   54.884597] XFS (loop0): Please umount the filesystem and rectify the =
      problem(s)
      
      Note that this crash is only possible on v4 filesystemsi or v5
      filesystems mounted with the ikeep mount option. For all other V5
      filesystems, this problem cannot occur because we don't read inodes
      we are allocating from disk - we simply overwrite them with the new
      inode information.
      Signed-Off-By: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Tested-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6de46e2c
  2. 31 Jul, 2020 38 commits