1. 25 Apr, 2017 9 commits
    • Brian Foster's avatar
      xfs: prevent multi-fsb dir readahead from reading random blocks · cb52ee33
      Brian Foster authored
      Directory block readahead uses a complex iteration mechanism to map
      between high-level directory blocks and underlying physical extents.
      This mechanism attempts to traverse the higher-level dir blocks in a
      manner that handles multi-fsb directory blocks and simultaneously
      maintains a reference to the corresponding physical blocks.
      
      This logic doesn't handle certain (discontiguous) physical extent
      layouts correctly with multi-fsb directory blocks. For example,
      consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
      block size and a directory with the following extent layout:
      
       EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
         0: [0..7]:          88..95            0 (88..95)             8
         1: [8..15]:         80..87            0 (80..87)             8
         2: [16..39]:        168..191          0 (168..191)          24
         3: [40..63]:        5242952..5242975  1 (72..95)            24
      
      Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
      entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
      extent 2 is larger than the directory block size, the readahead code
      erroneously assumes the block is contiguous and issues a readahead
      based on the physical mapping of the first fsb of the dirblk. This
      results in read verifier failure and a spurious corruption or crc
      failure, depending on the filesystem format.
      
      Further, the subsequent readahead code responsible for walking
      through the physical table doesn't correctly advance the physical
      block reference for dirblk 2. Instead of advancing two physical
      filesystem blocks, the first iteration of the loop advances 1 block
      (correctly), but the subsequent iteration advances 2 more physical
      blocks because the next physical extent (extent 3, above) happens to
      cover more than dirblk 2. At this point, the higher-level directory
      block walking is completely off the rails of the actual physical
      layout of the directory for the respective mapping table.
      
      Update the contiguous dirblock logic to consider the current offset
      in the physical extent to avoid issuing directory readahead to
      unrelated blocks. Also, update the mapping table advancing code to
      consider the current offset within the current dirblock to avoid
      advancing the mapping reference too far beyond the dirblock.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      cb52ee33
    • Eric Sandeen's avatar
      xfs: handle array index overrun in xfs_dir2_leaf_readbuf() · 023cc840
      Eric Sandeen authored
      Carlos had a case where "find" seemed to start spinning
      forever and never return.
      
      This was on a filesystem with non-default multi-fsb (8k)
      directory blocks, and a fragmented directory with extents
      like this:
      
      0:[0,133646,2,0]
      1:[2,195888,1,0]
      2:[3,195890,1,0]
      3:[4,195892,1,0]
      4:[5,195894,1,0]
      5:[6,195896,1,0]
      6:[7,195898,1,0]
      7:[8,195900,1,0]
      8:[9,195902,1,0]
      9:[10,195908,1,0]
      10:[11,195910,1,0]
      11:[12,195912,1,0]
      12:[13,195914,1,0]
      ...
      
      i.e. the first extent is a contiguous 2-fsb dir block, but
      after that it is fragmented into 1 block extents.
      
      At the top of the readdir path, we allocate a mapping array
      which (for this filesystem geometry) can hold 10 extents; see
      the assignment to map_info->map_size.  During readdir, we are
      therefore able to map extents 0 through 9 above into the array
      for readahead purposes.  If we count by 2, we see that the last
      mapped index (9) is the first block of a 2-fsb directory block.
      
      At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
      more readahead; the outer loop assumes one full dir block is
      processed each loop iteration, and an inner loop that ensures
      that this is so by advancing to the next extent until a full
      directory block is mapped.
      
      The problem is that this inner loop may step past the last
      extent in the mapping array as it tries to reach the end of
      the directory block.  This will read garbage for the extent
      length, and as a result the loop control variable 'j' may
      become corrupted and never fail the loop conditional.
      
      The number of valid mappings we have in our array is stored
      in map->map_valid, so stop this inner loop based on that limit.
      
      There is an ASSERT at the top of the outer loop for this
      same condition, but we never made it out of the inner loop,
      so the ASSERT never fired.
      
      Huge appreciation for Carlos for debugging and isolating
      the problem.
      Debugged-and-analyzed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Tested-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarBill O'Donnell <billodo@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      023cc840
    • Chandan Rajendra's avatar
      iomap_dio_rw: Prevent reading file data beyond iomap_dio->i_size · a008c31c
      Chandan Rajendra authored
      On a ppc64 machine executing overlayfs/019 with xfs as the lower and
      upper filesystem causes the following call trace,
      
      WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420
      Modules linked in:
      CPU: 2 PID: 8034 Comm: fsstress Tainted: G             L  4.11.0-rc5-next-20170405 #100
      task: c000000631314880 task.stack: c0000003915d4000
      NIP: c00000000035a72c LR: c00000000035a6f4 CTR: c00000000035a660
      REGS: c0000003915d7570 TRAP: 0700   Tainted: G             L   (4.11.0-rc5-next-20170405)
      MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
        CR: 24004284  XER: 00000000
      CFAR: c0000000006f7190 SOFTE: 1
      GPR00: c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600
      GPR04: 000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960
      GPR08: c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0
      GPR12: 0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff
      GPR16: 000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8
      GPR20: 0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff
      GPR24: c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000
      GPR28: 0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff
      NIP [c00000000035a72c] .iomap_dio_actor+0xcc/0x420
      LR [c00000000035a6f4] .iomap_dio_actor+0x94/0x420
      Call Trace:
      [c0000003915d77f0] [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable)
      [c0000003915d78f0] [c00000000035b9f4] .iomap_apply+0xf4/0x1f0
      [c0000003915d79d0] [c00000000035c320] .iomap_dio_rw+0x230/0x420
      [c0000003915d7ae0] [c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160
      [c0000003915d7b80] [c000000000512d24] .xfs_file_read_iter+0x104/0x130
      [c0000003915d7c10] [c0000000002d6234] .__vfs_read+0x114/0x1a0
      [c0000003915d7cf0] [c0000000002d7a8c] .vfs_read+0xac/0x1a0
      [c0000003915d7d90] [c0000000002d96b8] .SyS_read+0x58/0x100
      [c0000003915d7e30] [c00000000000b8e0] system_call+0x38/0xfc
      Instruction dump:
      78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288
      2f890004 419e00a0 2f890001 419e02a8 <0fe00000> 3b80fffb 38210100 7f83e378
      
      The above problem can also be recreated on a regular xfs filesystem
      using the command,
      
      $ fsstress -d /mnt -l 1000 -n 1000 -p 1000
      
      The reason for the call trace is,
      1. When 'reserving' blocks for delayed allocation , XFS reserves more
         blocks (i.e. past file's current EOF) than required. This is done
         because XFS assumes that userspace might write more data and hence
         'reserving' more blocks might lead to the file's new data being
         stored contiguously on disk.
      2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would
         then cover the prealloc-ed EOF blocks in addition to the regular blocks.
      3. When flushing the dirty blocks to disk, we only flush data till the
         file's EOF. But before writing out the dirty data, we allocate blocks
         on the disk for holding the file's new data. This allocation includes
         the blocks that are part of the 'prealloc EOF blocks'.
      4. Later, when the last reference to the inode is being closed, XFS frees the
         unused 'prealloc EOF blocks' in xfs_inactive().
      
      In step 3 above, When allocating space on disk for the delayed allocation
      range, the space allocator might sometimes allocate less blocks than
      required. If such an allocation ends right at the current EOF of the
      file, We will not be able to clear the "delayed allocation" flag for the
      'prealloc EOF blocks', since we won't have dirty buffer heads associated
      with that range of the file.
      
      In such a situation if a Direct I/O read operation is performed on file
      range [X, Y] (where X < EOF and Y > EOF), we flush dirty data in the
      range [X, Y] and invalidate page cache for that range (Refer to
      iomap_dio_rw()). Later for performing the Direct I/O read, XFS obtains
      the extent items (which are still cached in memory) for the file
      range. When doing so we are not supposed to get an extent item with
      IOMAP_DELALLOC flag set, since the previous "flush" operation should
      have converted any delayed allocation data in the range [X, Y]. Hence we
      end up hitting a WARN_ON_ONCE(1) statement in iomap_dio_actor().
      
      This commit fixes the bug by preventing the read operation from going
      beyond iomap_dio->i_size.
      Reported-by: default avatarSanthosh G <santhog4@linux.vnet.ibm.com>
      Signed-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      a008c31c
    • Christoph Hellwig's avatar
      xfs: remove bmap block allocation retries · 7590632a
      Christoph Hellwig authored
      Now that reflink operations don't set the firstblock value we don't
      need the workarounds for non-NULL firstblock values without a prior
      allocation.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      7590632a
    • Christoph Hellwig's avatar
      xfs: remove xfs_bmap_remap_alloc · bf8eadba
      Christoph Hellwig authored
      The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar
      to what we do in the space allocator.  But the reflink code doesn't touch
      the allocation btree unlike the normal space allocator, so we couldn't
      care less about the state of the AGFL.
      
      So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in
      the caller.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      bf8eadba
    • Christoph Hellwig's avatar
      xfs: introduce xfs_bmapi_remap · 6ebd5a44
      Christoph Hellwig authored
      Add a new helper to be used for reflink extent list additions instead of
      funneling them through xfs_bmapi_write and overloading the firstblock
      member in struct xfs_bmalloca and struct xfs_alloc_args.
      
      With some small changes to xfs_bmap_remap_alloc this also means we do
      not need a xfs_bmalloca structure for this case at all.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      6ebd5a44
    • Christoph Hellwig's avatar
      xfs: pass individual arguments to xfs_bmap_add_extent_hole_real · 6d04558f
      Christoph Hellwig authored
      For the reflink case we'd much rather pass the required arguments than
      faking up a struct xfs_bmalloca.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      6d04558f
    • Christoph Hellwig's avatar
      xfs: remove attr fork handling in xfs_bmap_finish_one · 39e07daa
      Christoph Hellwig authored
      We never do COW operations for the attr fork, so don't pretend we handle
      them.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      39e07daa
    • Christoph Hellwig's avatar
      xfs: fix integer truncation in xfs_bmap_remap_alloc · 52813fb1
      Christoph Hellwig authored
      bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
      xfs_aglock_t, which truncates the value to 32 bits.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      52813fb1
  2. 12 Apr, 2017 3 commits
  3. 06 Apr, 2017 2 commits
  4. 03 Apr, 2017 16 commits
  5. 28 Mar, 2017 1 commit
    • Darrick J. Wong's avatar
      xfs: rework the inline directory verifiers · 005c5db8
      Darrick J. Wong authored
      The inline directory verifiers should be called on the inode fork data,
      which means after iformat_local on the read side, and prior to
      ifork_flush on the write side.  This makes the fork verifier more
      consistent with the way buffer verifiers work -- i.e. they will operate
      on the memory buffer that the code will be reading and writing directly.
      
      Furthermore, revise the verifier function to return -EFSCORRUPTED so
      that we don't flood the logs with corruption messages and assert
      notices.  This has been a particular problem with xfs/348, which
      triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
      kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
      that, at least not in a verifier.
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      ---
      v2: get the inode d_ops the proper way
      v3: describe the bug that this patch fixes; no code changes
      005c5db8
  6. 26 Mar, 2017 8 commits
    • Linus Torvalds's avatar
      Linux 4.11-rc4 · c02ed2e7
      Linus Torvalds authored
      c02ed2e7
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 0dc82fa5
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "A smattering of different small fixes for some random driver
        subsystems. Nothing all that major, just resolutions for reported
        issues and bugs.
      
        All have been in linux-next with no reported issues"
      
      * tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
        extcon: int3496: Set the id pin to direction-input if necessary
        extcon: int3496: Use gpiod_get instead of gpiod_get_index
        extcon: int3496: Add dependency on X86 as it's Intel specific
        extcon: int3496: Add GPIO ACPI mapping table
        extcon: int3496: Rename GPIO pins in accordance with binding
        vmw_vmci: handle the return value from pci_alloc_irq_vectors correctly
        ppdev: fix registering same device name
        parport: fix attempt to write duplicate procfiles
        auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches
        Drivers: hv: vmbus: Don't leak memory when a channel is rescinded
        Drivers: hv: vmbus: Don't leak channel ids
        Drivers: hv: util: don't forget to init host_ts.lock
        Drivers: hv: util: move waiting for release to hv_utils_transport itself
        vmbus: remove hv_event_tasklet_disable/enable
        vmbus: use rcu for per-cpu channel list
        mei: don't wait for os version message reply
        mei: fix deadlock on mei reset
        intel_th: pci: Add Gemini Lake support
        intel_th: pci: Add Denverton SOC support
        intel_th: Don't leak module refcount on failure to activate
        ...
      0dc82fa5
    • Linus Torvalds's avatar
      Merge tag 'driver-core-4.11-rc4' of... · 9e54ef9d
      Linus Torvalds authored
      Merge tag 'driver-core-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fix from Greg KH:
       "Here is a single kernfs fix for 4.11-rc4 that resolves a reported
        issue.
      
        It has been in linux-next with no reported issues"
      
      * tag 'driver-core-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()
      9e54ef9d
    • Linus Torvalds's avatar
      Merge tag 'tty-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · f1638fc6
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are some tty and serial driver fixes for 4.11-rc4.
      
        One of these fix a long-standing issue in the ldisc code that was
        found by Dmitry Vyukov with his great fuzzing work. The other fixes
        resolve other reported issues, and there is one revert of a patch in
        4.11-rc1 that wasn't correct.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'tty-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: fix data race in tty_ldisc_ref_wait()
        tty: don't panic on OOM in tty_set_ldisc()
        Revert "tty: serial: pl011: add ttyAMA for matching pl011 console"
        tty: acpi/spcr: QDF2400 E44 checks for wrong OEM revision
        serial: 8250_dw: Fix breakage when HAVE_CLK=n
        serial: 8250_dw: Honor clk_round_rate errors in dw8250_set_termios
      f1638fc6
    • Linus Torvalds's avatar
      Merge tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 53b4d591
      Linus Torvalds authored
      Pull IIO driver fixes from Greg KH:
       "Here are some small IIO driver fixes for 4.11-rc4 that resolve a
        number of tiny reported issues. All of these have been in linux-next
        for a while with no reported issues"
      
      * tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: imu: st_lsm6dsx: fix FIFO_CTRL2 overwrite during watermark configuration
        iio: adc: ti_am335x_adc: fix fifo overrun recovery
        iio: sw-device: Fix config group initialization
        iio: magnetometer: ak8974: remove incorrect __exit markups
        iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
      53b4d591
    • Linus Torvalds's avatar
      Merge tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · e431e0e4
      Linus Torvalds authored
      Pull USB/PHY fixes from Greg KH:
       "Here are a number of small USB and PHY driver fixes for 4.11-rc4.
      
        Nothing major here, just an bunch of small fixes, and a handfull of
        good fixes from Johan for devices with crazy descriptors. There are a
        few new device ids in here as well.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
        usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held
        usb: gadget: udc: remove pointer dereference after free
        usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
        usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
        usb: gadget: acm: fix endianness in notifications
        usb: dwc3: gadget: delay unmap of bounced requests
        USB: serial: qcserial: add Dell DW5811e
        usb: hub: Fix crash after failure to read BOS descriptor
        ACM gadget: fix endianness in notifications
        USB: usbtmc: fix probe error path
        USB: usbtmc: add missing endpoint sanity check
        USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
        usb: musb: fix possible spinlock deadlock
        usb: musb: dsps: fix iounmap in error and exit paths
        usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
        usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
        uwb: i1480-dfu: fix NULL-deref at probe
        uwb: hwa-rc: fix NULL-deref at probe
        USB: wusbcore: fix NULL-deref at probe
        USB: uss720: fix NULL-deref at probe
        ...
      e431e0e4
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 42234bf8
      Linus Torvalds authored
      Pull more powerpc fixes from Michael Ellerman:
       "These are all pretty minor. The fix for idle wakeup would be a bad bug
        but has not been observed in practice.
      
        The update to the gcc-plugins docs was Cc'ed to Kees and Jon, Kees
        OK'ed it going via powerpc and I didn't hear from Jon.
      
         - cxl: Route eeh events to all slices for pci_channel_io_perm_failure state
      
         - powerpc/64s: Fix idle wakeup potential to clobber registers
      
         - Revert "powerpc/64: Disable use of radix under a hypervisor"
      
         - gcc-plugins: update architecture list in documentation
      
        Thanks to: Andrew Donnellan, Nicholas Piggin, Paul Mackerras, Vaibhav
        Jain"
      
      * tag 'powerpc-4.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        gcc-plugins: update architecture list in documentation
        Revert "powerpc/64: Disable use of radix under a hypervisor"
        powerpc/64s: Fix idle wakeup potential to clobber registers
        cxl: Route eeh events to all slices for pci_channel_io_perm_failure state
      42234bf8
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 1c23de63
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Fix a memory leak on an error path, and two races when modifying
        inodes relating to the inline_data and metadata checksum features"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix two spelling nits
        ext4: lock the xattr block before checksuming it
        jbd2: don't leak memory if setting up journal fails
        ext4: mark inode dirty after converting inline directory
      1c23de63
  7. 25 Mar, 2017 1 commit