    Btrfs: fix invalid page accesses in extent_same (dedup) ioctl · df567e6d
    Filipe Manana authored
    commit e0bd70c6 upstream.
    
    In the extent_same ioctl we get the pages for the source and target
    ranges and unlock them immediately afterwards, which is incorrect
    because later we map them (with kmap_atomic) and access their contents
    in btrfs_cmp_data(). By the time we do that, the pages might have been
    relocated or removed from memory, which leads to an invalid memory
    access. This issue is detected on a kernel with CONFIG_DEBUG_PAGEALLOC=y,
    which produces a trace like the following:
    
    [186736.677437] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [186736.680382] Modules linked in: btrfs dm_flakey dm_mod ppdev xor raid6_pq sha256_generic hmac drbg ansi_cprng acpi_cpufreq evdev sg aesni_intel aes_x86_64 parport_pc ablk_helper tpm_tis psmouse parport i2c_piix4 tpm cryptd i2c_core lrw processor button serio_raw pcspkr gf128mul glue_helper loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
    [186736.681319] CPU: 13 PID: 10222 Comm: duperemove Tainted: G        W       4.4.0-rc6-btrfs-next-18+ #1
    [186736.681319] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
    [186736.681319] task: ffff880132600400 ti: ffff880362284000 task.ti: ffff880362284000
    [186736.681319] RIP: 0010:[<ffffffff81264d00>]  [<ffffffff81264d00>] memcmp+0xb/0x22
    [186736.681319] RSP: 0018:ffff880362287d70  EFLAGS: 00010287
    [186736.681319] RAX: 000002c002468acf RBX: 0000000012345678 RCX: 0000000000000000
    [186736.681319] RDX: 0000000000001000 RSI: 0005d129c5cf9000 RDI: 0005d129c5cf9000
    [186736.681319] RBP: ffff880362287d70 R08: 0000000000000000 R09: 0000000000001000
    [186736.681319] R10: ffff880000000000 R11: 0000000000000476 R12: 0000000000001000
    [186736.681319] R13: ffff8802f91d4c88 R14: ffff8801f2a77830 R15: ffff880352e83e40
    [186736.681319] FS:  00007f27b37fe700(0000) GS:ffff88043dda0000(0000) knlGS:0000000000000000
    [186736.681319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [186736.681319] CR2: 00007f27a406a000 CR3: 0000000217421000 CR4: 00000000001406e0
    [186736.681319] Stack:
    [186736.681319]  ffff880362287ea0 ffffffffa048d0bd 000000000009f000 0000000000001000
    [186736.681319]  0100000000000000 ffff8801f2a77850 ffff8802f91d49b0 ffff880132600400
    [186736.681319]  00000000000004f8 ffff8801c1efbe41 0000000000000000 0000000000000038
    [186736.681319] Call Trace:
    [186736.681319]  [<ffffffffa048d0bd>] btrfs_ioctl+0x24cb/0x2731 [btrfs]
    [186736.681319]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
    [186736.681319]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
    [186736.681319]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
    [186736.681319]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
    [186736.681319]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
    [186736.681319]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
    [186736.681319] Code: 0a 3c 6e 74 0d 3c 79 74 04 3c 59 75 0c c6 06 01 eb 03 c6 06 00 31 c0 eb 05 b8 ea ff ff ff 5d c3 55 31 c9 48 89 e5 48 39 d1 74 13 <0f> b6 04 0f 44 0f b6 04 0e 48 ff c1 44 29 c0 74 ea eb 02 31 c0
    
    (gdb) list *(btrfs_ioctl+0x24cb)
    0x5e0e1 is in btrfs_ioctl (fs/btrfs/ioctl.c:2972).
    2967                    dst_addr = kmap_atomic(dst_page);
    2968
    2969                    flush_dcache_page(src_page);
    2970                    flush_dcache_page(dst_page);
    2971
    2972                    if (memcmp(addr, dst_addr, cmp_len))
    2973                            ret = BTRFS_SAME_DATA_DIFFERS;
    2974
    2975                    kunmap_atomic(addr);
    2976                    kunmap_atomic(dst_addr);
    
    So fix this by keeping the pages locked while their contents are
    compared, and by respecting the same locking order used everywhere
    else: get and lock the pages first, then lock the range in the inode's
    io tree (as done, for example, in __btrfs_buffered_write() and
    extent_readpages()). If an ordered extent is found after locking the
    range in the io tree, unlock the range, unlock the pages, wait for the
    ordered extent to complete, and repeat the entire locking process until
    no overlapping ordered extents are found.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>