1. 05 Mar, 2016 12 commits
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · a7c9b603
      Linus Torvalds authored
      Pull libnvdimm fix from Dan Williams:
       "One straggling fix for NVDIMM support.
      
        The KVM/QEMU enabling for NVDIMMs has recently reached the point where
        it is able to accept some ACPI _DSM requests from a guest VM.  However
        they immediately found that the 4.5-rc kernel is unusable because the
        kernel's 'nfit' driver fails to load upon seeing a valid "not
        supported" response from the virtual BIOS for an address range scrub
        command.
      
        It is not mandatory that a platform implement address range scrubbing,
        so this fix from Vishal properly treats the 'not supported' response
        as 'skip scrubbing and continue loading the driver'"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        nfit: Continue init even if ARS commands are unimplemented
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c12f83c3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two fairly simple fixes.
      
        One is a regression with ipr firmware loading caused by one of the
        trivial patches in the last merge window which failed to strip the \n
        from the file name string, so now the firmware loader no longer works
        leading to a lot of unhappy ipr users; fix by stripping the \n.
      
        The second is a memory leak within SCSI: the BLKPREP_INVALID state
        was introduced by a recent fix but we forgot to account for it correctly
        when freeing state, resulting in memory leakage.  Add the correct
        state freeing in scsi_prep_return()"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        ipr: Fix regression when loading firmware
        SCSI: Free resources when we return BLKPREP_INVALID
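The ipr regression described above comes down to a trailing newline left in a firmware file name. The following is a minimal C sketch of that kind of fix. The helper name `strip_newline` is hypothetical and the code is not taken from the ipr driver.

```c
#include <string.h>

/*
 * Hypothetical helper: cut the string at the first '\n', if there is one.
 * strcspn() returns the length of the leading part that contains no '\n',
 * so the write lands either on the newline or on the existing terminator.
 */
static char *strip_newline(char *name)
{
    name[strcspn(name, "\n")] = '\0';
    return name;
}
```

Calling `strip_newline()` on a writable buffer holding a name followed by a newline leaves just the name, and a buffer without a newline is left unchanged.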
    • Merge branch 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · fab3e94a
      Linus Torvalds authored
      Pull libata fixes from Tejun Heo:
       "Assorted fixes for libata drivers.
      
         - Turns out HDIO_GET_32BIT ioctl was subtly broken all along.
      
         - Recent update to ahci external port handling was incorrectly
           marking hotpluggable ports as external making userland handle
           devices connected to those ports incorrectly.
      
         - ahci_xgene needs its own irq handler to work around a hardware
           erratum.  libahci updated to allow irq handler override.
      
         - Misc driver specific updates"
      
      * 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ata: ahci: don't mark HotPlugCapable Ports as external/removable
        ahci: Workaround for ThunderX Errata#22536
        libata: Align ata_device's id on a cacheline
        Adding Intel Lewisburg device IDs for SATA
        pata-rb532-cf: get rid of the irq_to_gpio() call
        libata: fix HDIO_GET_32BIT ioctl
        ahci_xgene: Implement the workaround to fix the missing of the edge interrupt for the HOST_IRQ_STAT.
        ata: Remove the AHCI_HFLAG_EDGE_IRQ support from libahci.
        libahci: Implement the capability to override the generic ahci interrupt handler.
    • Merge branch 'for-linus2' of git://git.kernel.dk/linux-block · e5322c54
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Round 2 of this.  I cut back to the bare necessities, the patch is
        still larger than it usually would be at this time, due to the number
        of NVMe fixes in there.  This pull request contains:
      
         - The 4 core fixes from Ming, that fix both problems with exceeding
           the virtual boundary limit in case of merging, and the gap checking
           for cloned bio's.
      
         - NVMe fixes from Keith and Christoph:
      
              - Regression on larger user commands, causing problems with
                reading log pages (for instance). This touches both NVMe,
                and the block core since that is now generally utilized also
                for these types of commands.
      
              - Hot removal fixes.
      
              - User exploitable issue with passthrough IO commands, if !length
                is given, causing us to fault on writing to the zero
                page.
      
              - Fix for a hang under error conditions
      
         - And finally, the current series regression for umount with cgroup
           writeback, where the final flush would happen async and hence open
           up a window after umount where the device wasn't consistent.  fsck
           right after umount would show this.  From Tejun"
      
      * 'for-linus2' of git://git.kernel.dk/linux-block:
        block: support large requests in blk_rq_map_user_iov
        block: fix blk_rq_get_max_sectors for driver private requests
        nvme: fix max_segments integer truncation
        nvme: set queue limits for the admin queue
        writeback: flush inode cgroup wb switches instead of pinning super_block
        NVMe: Fix 0-length integrity payload
        NVMe: Don't allow unsupported flags
        NVMe: Move error handling to failed reset handler
        NVMe: Simplify device reset failure
        NVMe: Fix namespace removal deadlock
        NVMe: Use IDA for namespace disk naming
        NVMe: Don't unmap controller registers on reset
        block: merge: get the 1st and last bvec via helpers
        block: get the 1st and last bvec via helpers
        block: check virt boundary in bio_will_gap()
        block: bio: introduce helpers to get the 1st and last bvec
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · bdf9d297
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "Additional 4.5-rc6 fixes.
      
        I have four patches today.  I had previously thought I had submitted
        two of them last week, but they were accidentally skipped :-(.
      
         - One fix to an error path in the core
         - One fix for RoCE in the core
         - Two related fixes for the core/mlx5"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/core: Use GRH when the path hop-limit > 0
        IB/{core, mlx5}: Fix input len in vendor part of create_qp/srq
        IB/mlx5: Avoid using user-index for SRQs
        IB/core: Fix missed clean call in registration path
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 638c201e
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This contains one i915 patch twice: I merged it locally for testing
        and then pulled some stuff in on top, and then Jani sent it to me.
        I didn't think it was worth redoing all the merges of what I had
        tested.
      
        Summary:
      
         - amdgpu/radeon fixes for some more power management and VM races.
      
         - Two i915 fixes, one for a recent regression, the other a power
           management fix for skylake.
      
         - Two tegra dma mask fixes for a regression.
      
         - One ast fix for a typo I made transcribing the userspace driver,
           that I'd like to get into stable so I don't forget about it"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        gpu: host1x: Set DMA ops on device creation
        gpu: host1x: Set DMA mask
        drm/amdgpu: return from atombios_dp_get_dpcd only when error
        drm/amdgpu/cz: remove commented out call to enable vce pg
        drm/amdgpu/powerplay/cz: enable/disable vce dpm independent of vce pg
        drm/amdgpu/cz: enable/disable vce dpm even if vce pg is disabled
        drm/amdgpu/gfx8: specify which engine to wait before vm flush
        drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well
        drm/amd/powerplay: send event to notify powerplay all modules are initialized.
        drm/amd/powerplay: export AMD_PP_EVENT_COMPLETE_INIT task to amdgpu.
        drm/radeon/pm: update current crtc info after setting the powerstate
        drm/amdgpu/pm: update current crtc info after setting the powerstate
        drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)
        drm/i915/skl: Fix power domain suspend sequence
        drm/ast: Fix incorrect register check for DRAM width
        drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)
    • Merge tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b80e8e28
      Linus Torvalds authored
      Pull power management and ACPI fixes from Rafael Wysocki:
       "Two build fixes for cpufreq drivers (including one for breakage
        introduced recently) and a fix for a graph tracer crash when used over
        suspend-to-RAM on x86.
      
        Specifics:
      
         - Prevent the graph tracer from crashing when used over suspend-to-
           RAM on x86 by pausing it before invoking do_suspend_lowlevel() and
           un-pausing it when that function has returned (Todd Brandt).
      
         - Fix build issues in the qoriq and mediatek cpufreq drivers related
           to broken dependencies on THERMAL (Arnd Bergmann)"
      
      * tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / sleep / x86: Fix crash on graph trace through x86 suspend
        cpufreq: mediatek: allow building as a module
        cpufreq: qoriq: allow building as module with THERMAL=m
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · ed385c7a
      Linus Torvalds authored
      Pull arm64 fix from Will Deacon:
       "Arm64 fix for -rc7.  Without it, our struct page array can overflow
        the vmemmap region on systems with a large PHYS_OFFSET.
      
        Nothing else on the radar at the moment, so hopefully that's it for
        4.5 from us.
      
        Summary: Ensure struct page array fits within vmemmap area"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: vmemmap: use virtual projection of linear region
    • Merge tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd · c51797d2
      Linus Torvalds authored
      Pull jffs2 fixes from David Woodhouse:
       "This contains two important JFFS2 fixes marked for stable:
      
         - a lock ordering problem between the page lock and the internal
           f->sem mutex, which was causing occasional deadlocks in garbage
           collection
      
         - a scan failure causing moved directories to sometimes end up
           appearing to have hard links.
      
        There are also a couple of trivial MAINTAINERS file updates"
      
      * tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd:
        MAINTAINERS: add maintainer entry for FREESCALE GPMI NAND driver
        Fix directory hardlinks from deleted directories
        jffs2: Fix page lock / f->sem deadlock
        Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
        MAINTAINERS: update Han's email
    • Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 2cdcb2b5
      Linus Torvalds authored
      Pull btrfs fix from Chris Mason:
       "Filipe nailed down a problem where tree log replay would do some work
        that orphan code wasn't expecting to be done yet, leading to BUG_ON"
      
      * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix loading of orphan roots leading to BUG_ON
    • Merge tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 78baab7a
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "A feature was added in 4.3 that allowed users to filter trace points
        on a task's "comm" field.  But this prevented filtering on a comm field
        that is within a trace event (like sched_migrate_task).
      
        When trying to filter on when a program migrated, this change
        prevented the filtering of the sched_migrate_task.
      
        To fix this, the event fields are examined first, and then the extra
        fields like "comm" and "cpu" are examined.  Also, instead of testing
        to assign the comm filter function based on the field's name, the
        generic comm field is given a new filter type (FILTER_COMM).  When
        this field is used to filter, the type is checked.  The same is done
        for the cpu filter field.
      
        Two new special filter types are added: "COMM" and "CPU".  This allows
        users to still filter on the task's comm for events that have "comm" as
        one of their fields, in cases where users would like to filter
        sched_migrate_task on the comm of the task that called the event, and
        not the comm of the task that is being migrated"
      
      * tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Do not have 'comm' filter override event 'comm' field
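The lookup order described in the tracing fix (event fields first, then the generic fields such as "comm" and "cpu") can be sketched in plain C. This is a simplified user-space model, not the kernel's filter code: the `field_source` enum and the `find_field` and `name_in` helpers are hypothetical names introduced here for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Where a filter field name was resolved (hypothetical model). */
enum field_source { FIELD_NONE, FIELD_EVENT, FIELD_GENERIC };

/* Generic fields available on every event, as named in the text above. */
static const char *const generic_fields[] = { "comm", "cpu" };

/* Return 1 if name occurs in list[0..n-1], 0 otherwise. */
static int name_in(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Check the event's own fields first, then fall back to generic fields. */
static enum field_source find_field(const char *name,
                                    const char *const *event_fields,
                                    size_t n_event_fields)
{
    if (name_in(name, event_fields, n_event_fields))
        return FIELD_EVENT;
    if (name_in(name, generic_fields,
                sizeof(generic_fields) / sizeof(generic_fields[0])))
        return FIELD_GENERIC;
    return FIELD_NONE;
}
```

With this ordering, an event that declares its own "comm" field resolves "comm" to that field, while an event without one still resolves it to the generic per-task field.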
    • nfit: Continue init even if ARS commands are unimplemented · 6e2452df
      Vishal Verma authored
      If firmware doesn't implement any of the ARS commands, take that to
      mean that ARS is unsupported, and continue to initialize regions without
      bad block lists. We cannot make the assumption that ARS commands will be
      unconditionally supported on all NVDIMMs.
      Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Acked-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
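The behavior this commit describes (treat "not supported" as "skip scrubbing and keep loading") can be modeled with a small C sketch. Everything here is hypothetical: the `ars_status` values do not correspond to real _DSM status codes, and treating any other failure as fatal is an assumption made only for the illustration.

```c
#include <stdbool.h>

/* Hypothetical status codes; real _DSM status values are not modeled. */
enum ars_status { ARS_OK, ARS_NOT_SUPPORTED, ARS_FAILED };

struct region_init {
    bool loaded;        /* did region initialization complete? */
    bool has_badblocks; /* was a bad block list built from scrub results? */
};

static struct region_init init_region(enum ars_status status)
{
    struct region_init r = { false, false };

    switch (status) {
    case ARS_OK:
        r.loaded = true;
        r.has_badblocks = true;
        break;
    case ARS_NOT_SUPPORTED:
        /* Scrubbing is optional: continue without a bad block list. */
        r.loaded = true;
        break;
    case ARS_FAILED:
        /* Assumption for this sketch: other failures still abort init. */
        break;
    }
    return r;
}
```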
  2. 04 Mar, 2016 6 commits
  3. 03 Mar, 2016 22 commits
    • Btrfs: fix loading of orphan roots leading to BUG_ON · 909c3a22
      Filipe Manana authored
      When looking for orphan roots during mount we can end up hitting a
      BUG_ON() (at root-tree.c:btrfs_find_orphan_roots()) if a log tree is
      replayed and qgroups are enabled. This is because after a log tree is
      replayed, a transaction commit is made, which triggers qgroup extent
      accounting which in turn does backref walking which ends up reading and
      inserting all roots in the radix tree fs_info->fs_root_radix, including
      orphan roots (deleted snapshots). So after the log tree is replayed, when
      finding orphan roots we hit the BUG_ON with the following trace:
      
      [118209.182438] ------------[ cut here ]------------
      [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
      [118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
      processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
      virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
      [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W       4.5.0-rc5-btrfs-next-24+ #1
      [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
      [118209.186318] RIP: 0010:[<ffffffffa04237d7>]  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
      [118209.186318] RSP: 0018:ffff8800af34faa8  EFLAGS: 00010246
      [118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
      [118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
      [118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
      [118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
      [118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
      [118209.186318] FS:  00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
      [118209.186318] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
      [118209.186318] Stack:
      [118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
      [118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
      [118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
      [118209.186318] Call Trace:
      [118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
      [118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
      [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
      [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
      [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
      [118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
      [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
      [118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
      [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
      [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
      [118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
      [118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
      [118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
      4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
      [118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
      [118209.186318]  RSP <ffff8800af34faa8>
      [118209.230735] ---[ end trace 83938f987d85d477 ]---
      
      So fix this by not treating the error -EEXIST, returned when attempting
      to insert a root already inserted by the backref walking code, as an error.
      
      The following test case for xfstests reproduces the bug:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            _cleanup_flakey
            cd /
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_dm_target flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        _run_btrfs_util_prog quota enable $SCRATCH_MNT
      
        # Create 2 directories with one file in one of them.
        # We use these just to trigger a transaction commit later, moving the file from
        # directory a to directory b and doing an fsync against directory a.
        mkdir $SCRATCH_MNT/a
        mkdir $SCRATCH_MNT/b
        touch $SCRATCH_MNT/a/f
        sync
      
        # Create our test file with 2 4K extents.
        $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
      
        # Create a snapshot and delete it. This doesn't really delete the snapshot
        # immediately, just makes it inaccessible and invisible to user space, the
        # snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
        # which is woken up at the next transaction commit.
        # A root orphan item is inserted into the tree of tree roots, so that if a
        # power failure happens before the dedicated kernel thread does the snapshot
        # deletion, the next time the filesystem is mounted it resumes the snapshot
        # deletion.
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
        _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
      
        # Now overwrite half of the extents we wrote before. Because we made a snapshot
        # before, which isn't really deleted yet (since no transaction commit happened
        # after we did the snapshot delete request), the non overwritten extents get
        # referenced twice, once by the default subvolume and once by the snapshot.
        $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
      
        # Now move file f from directory a to directory b and fsync directory a.
        # The fsync on the directory a triggers a transaction commit (because a file
        # was moved from it to another directory) and the file fsync leaves a log tree
        # with file extent items to replay.
        mv $SCRATCH_MNT/a/f $SCRATCH_MNT/b/f
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
      
        echo "File digest before power failure:"
        md5sum $SCRATCH_MNT/foobar | _filter_scratch
      
        # Now simulate a power failure and mount the filesystem to replay the log tree.
        # After the log tree was replayed, we used to hit a BUG_ON() when processing
        # the root orphan item for the deleted snapshot. This is because when processing
        # an orphan root the code expected to be the first code inserting the root into
        # the fs_info->fs_root_radix radix tree, while in reality it was the second
        # caller attempting to do it - the first caller was the transaction commit that
        # took place after replaying the log tree, when updating the qgroup counters.
        _flakey_drop_and_remount
      
        echo "File digest after power failure:"
        # Must match what we got before the power failure.
        md5sum $SCRATCH_MNT/foobar | _filter_scratch
      
        _unmount_flakey
        status=0
        exit
      
      Fixes: 2d9e9776 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
      Cc: stable@vger.kernel.org  # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
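The fix described in this commit is to stop treating -EEXIST from the root insertion as a failure. A minimal user-space sketch of that pattern follows. The `root_table` structure and both function names are hypothetical stand-ins for the radix tree and the btrfs code, which are not reproduced here.

```c
#include <errno.h>

#define MAX_ROOTS 8

/* Toy stand-in for the fs_root_radix tree: a small array of root ids. */
struct root_table {
    unsigned long ids[MAX_ROOTS];
    int count;
};

/* Insert id; returns 0, -EEXIST if already present, -ENOMEM if full. */
static int table_insert(struct root_table *t, unsigned long id)
{
    for (int i = 0; i < t->count; i++)
        if (t->ids[i] == id)
            return -EEXIST;
    if (t->count == MAX_ROOTS)
        return -ENOMEM;
    t->ids[t->count++] = id;
    return 0;
}

/* An earlier caller may already have inserted the root: not an error. */
static int load_orphan_root(struct root_table *t, unsigned long id)
{
    int ret = table_insert(t, id);

    if (ret == -EEXIST)
        ret = 0;
    return ret;
}
```

Other error codes still propagate to the caller, so only the "someone else got there first" case is tolerated.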
    • block: support large requests in blk_rq_map_user_iov · 4d6af73d
      Christoph Hellwig authored
      This patch adds support for larger requests in blk_rq_map_user_iov by
      allowing it to build multiple bios for a request.  This functionality
      used to exist for the non-vectored blk_rq_map_user in the past, and
      this patch reuses the existing functionality for it on the unmap side,
      which stuck around.  Thanks to the iov_iter API, supporting multiple
      bios is fairly trivial, as we can just iterate the iov until we've
      consumed the whole iov_iter.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
      Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: fix blk_rq_get_max_sectors for driver private requests · f2101842
      Christoph Hellwig authored
      Driver private request types should not get the artificial cap for the
      FS requests.  This is important to use the full device capabilities
      for internal commands or NVMe pass through commands.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
      Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      
      Updated by me to use an explicit check for the one command type that
      does support extended checking, instead of relying on the ordering
      of the enum command values - as suggested by Keith.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nvme: fix max_segments integer truncation · 45686b61
      Christoph Hellwig authored
      The block layer uses an unsigned short for max_segments.  The way we
      calculate the value for NVMe tends to generate very large 32-bit values,
      which after integer truncation may lead to a zero value instead of
      the desired outcome.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
      Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
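The truncation described in this commit is easy to reproduce outside the kernel. The sketch below assumes a 16-bit unsigned short (modeled with `uint16_t`) and shows how a large 32-bit value can narrow to zero. Clamping before narrowing is shown as one possible remedy; the commit message does not say how the actual patch avoids the problem, and both function names are hypothetical.

```c
#include <stdint.h>

/* Plain narrowing keeps only the low 16 bits, so 0x10000 becomes 0. */
static uint16_t narrow_truncating(uint32_t v)
{
    return (uint16_t)v;
}

/* Clamp to the largest representable value before narrowing. */
static uint16_t narrow_clamped(uint32_t v)
{
    return (uint16_t)(v > UINT16_MAX ? UINT16_MAX : v);
}
```

For an input of 0x10000, the truncating version yields 0 while the clamped version yields 65535. Values that already fit in 16 bits pass through both unchanged.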
    • nvme: set queue limits for the admin queue · da35825d
      Christoph Hellwig authored
      Factor out a helper to set all the device specific queue limits and apply
      them to the admin queue in addition to the I/O queues.  Without this the
      command size on the admin queue is arbitrarily low, and the missing
      other limitations are just minefields waiting for victims.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
      Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • writeback: flush inode cgroup wb switches instead of pinning super_block · a1a0e23e
      Tejun Heo authored
      If cgroup writeback is in use, inodes can be scheduled for
      asynchronous wb switching.  Before 5ff8eaac ("writeback: keep
      superblock pinned during cgroup writeback association switches"), this
      could race with umount leading to super_block being destroyed while
      inodes are pinned for wb switching.  5ff8eaac fixed it by bumping
      s_active while wb switches are in flight; however, this allowed
      in-flight wb switches to make umounts asynchronous when the userland
      expected synchronous behavior - e.g. fsck immediately following umount may
      fail because the device is still busy.
      
      This patch removes the problematic super_block pinning and instead
      makes generic_shutdown_super() flush in-flight wb switches.  wb
      switches are now executed on a dedicated isw_wq so that they can be
      flushed and isw_nr_in_flight keeps track of the number of in-flight wb
      switches so that flushing can be avoided in most cases.
      
      v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
          check in inode_switch_wbs() as Jan and Al suggested.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Tahsin Erdogan <tahsin@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
      Fixes: 5ff8eaac ("writeback: keep superblock pinned during cgroup writeback association switches")
      Cc: stable@vger.kernel.org #v4.5
      Reviewed-by: Jan Kara <jack@suse.cz>
      Tested-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: Fix 0-length integrity payload · e9fc63d6
      Keith Busch authored
      A user could send a passthrough IO command with a metadata pointer to a
      namespace without metadata. With metadata length of 0, kmalloc returns
      ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as
      the bio's integrity payload, which causes an access fault on completion.
      
      This patch ignores the user's metadata buffer if the namespace format
      does not support separate metadata.
      Reported-by: Stephen Bates <stephen.bates@microsemi.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
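The pitfall in this commit is that a zero-length allocation yields a non-NULL sentinel, so a plain NULL check is not enough. The user-space sketch below models that behavior. `FAKE_ZERO_SIZE_PTR`, `fake_alloc`, and `use_user_metadata` are hypothetical names, and the guard on the namespace metadata size is only an illustration of the approach the message describes, not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Stand-in for the kernel's ZERO_SIZE_PTR: a non-NULL sentinel that must
 * never be dereferenced. The value used here is illustrative only.
 */
#define FAKE_ZERO_SIZE_PTR ((void *)16)

/* Model of an allocator that returns the sentinel for zero-length requests. */
static void *fake_alloc(size_t len)
{
    return len == 0 ? FAKE_ZERO_SIZE_PTR : malloc(len);
}

/*
 * Only honor a user-supplied metadata buffer when the namespace format
 * actually carries separate metadata (ns_meta_len != 0).
 */
static bool use_user_metadata(const void *user_meta, size_t ns_meta_len)
{
    return user_meta != NULL && ns_meta_len != 0;
}
```

Because `fake_alloc(0)` is non-NULL, code that only tests the returned pointer would go on to use it. Checking the metadata length up front avoids ever making the zero-length allocation.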
    • NVMe: Don't allow unsupported flags · 63088ec7
      Keith Busch authored
      The command flags can change the meaning of other fields in the command
      that the driver is not prepared to handle. Specifically, the user could
      pass through an SGL flag, causing the controller to misinterpret the PRP
      list the driver created, potentially corrupting memory or data.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
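Rejecting flags the driver does not understand can be expressed as a small validation step before a passthrough command is built. The sketch below is a simplified model: `struct passthru_cmd` and `validate_passthru` are hypothetical, and rejecting every non-zero flag value is an assumption made for the illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced view of a user-supplied passthrough command. */
struct passthru_cmd {
    uint8_t opcode;
    uint8_t flags;
};

/* Refuse any command that sets flags the driver is not prepared to handle. */
static int validate_passthru(const struct passthru_cmd *cmd)
{
    if (cmd->flags != 0)
        return -EINVAL;
    return 0;
}
```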
    • NVMe: Move error handling to failed reset handler · 69d9a99c
      Keith Busch authored
      This moves failed queue handling out of the namespace removal path and
      into the reset failure path, fixing a hanging condition if the controller
      fails or the link goes down during del_gendisk. Previously the driver had to see
      the controller as degraded prior to calling del_gendisk to set up the
      queues to fail. But, if the controller happened to fail after this,
      there was no task to end outstanding requests.
      
      On failure, all namespace states are set to dead. This makes the capacity
      revalidate to 0, and ends all new requests with an error status.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: Simplify device reset failure · f58944e2
      Keith Busch authored
      A reset failure schedules the device to unbind from the driver through
      the pci driver's remove. This cleans up all initialization, so there is
      no need to duplicate the potentially racy cleanup.
      
      To help understand why a reset failed, the status is logged with the
      existing warning message.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: Fix namespace removal deadlock · 646017a6
      Keith Busch authored
      This patch makes nvme namespace removal lockless. It is up to the caller
      to ensure no active namespace scanning is occurring. To ensure no scan
      work occurs, the nvme pci driver adds a removing state to the controller
      device to avoid queueing scan work during removal. The work is flushed
      after setting the state, so no new scan work can be queued.
      
      The lockless removal allows the driver to clean up a namespace
      request_queue if the controller fails during removal. Previously this
      could deadlock trying to acquire the namespace mutex in order to handle
      such events.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: Use IDA for namespace disk naming · 075790eb
      Keith Busch authored
      A namespace may be detached from a controller, but a user may be holding
      a reference to it. Attaching a new namespace with the same NSID will create
      duplicate names when using the NSID to name the disk.
      
      This patch uses an IDA that is released only when the last reference is
      released instead of using the namespace ID.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
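The naming scheme described in "NVMe: Use IDA for namespace disk naming" can be modeled outside the kernel. What follows is a minimal Python sketch, not the driver's code: `IdAllocator` is a toy stand-in for the kernel IDA, and the `Namespace` class, its reference counting, the starting ID of 1, and the `nvme<ctrl>n<instance>` name format are assumptions made for illustration.

```python
import heapq


class IdAllocator:
    """Toy stand-in for an IDA: always hands out the smallest free ID."""

    def __init__(self, start=1):
        self._next = start   # smallest ID that has never been handed out
        self._free = []      # min-heap of IDs that were released

    def get(self):
        if self._free:
            return heapq.heappop(self._free)
        value = self._next
        self._next += 1
        return value

    def put(self, value):
        heapq.heappush(self._free, value)


class Namespace:
    """Hypothetical namespace object whose disk name comes from the allocator."""

    def __init__(self, ida, ctrl_instance, nsid):
        self._ida = ida
        self.nsid = nsid
        self.instance = ida.get()
        self.disk_name = f"nvme{ctrl_instance}n{self.instance}"
        self._refs = 1       # reference held by the controller

    def get_ref(self):
        self._refs += 1

    def put_ref(self):
        self._refs -= 1
        if self._refs == 0:
            # Only the last reference gives the ID (and so the name) back.
            self._ida.put(self.instance)


ida = IdAllocator()
old = Namespace(ida, ctrl_instance=0, nsid=1)
old.get_ref()                 # a user opens the disk
old.put_ref()                 # the controller detaches the namespace
new = Namespace(ida, ctrl_instance=0, nsid=1)   # same NSID is attached again
old.put_ref()                 # the user finally closes the old disk
reused = Namespace(ida, ctrl_instance=0, nsid=3)
```

In this model the re-attached namespace gets a name distinct from the one the user still holds, even though both share NSID 1, and the old name becomes available again only after the last reference is dropped.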
    • NVMe: Don't unmap controller registers on reset · b00a726a
      Keith Busch authored
      Unmapping the registers on reset or shutdown is not necessary. Keeping
      the mapping simplifies reset handling.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: merge: get the 1st and last bvec via helpers · e827091c
      Ming Lei authored
      This patch applies the two introduced helpers to
      figure out the 1st and last bvec.
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: get the 1st and last bvec via helpers · 25e71a99
      Ming Lei authored
      This patch applies the two introduced helpers to
      figure out the 1st and last bvec, fixing the
      original approach, which breaks after bio splitting.
      
      Cc: stable@vger.kernel.org
      Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: check virt boundary in bio_will_gap() · e0af2917
      Ming Lei authored
      The following patch changes how the last bvec is
      determined, at a small cost, so return immediately
      if the queue has no virt boundary limit. In practice,
      most devices do not have this limit.
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
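The early return described in "block: check virt boundary in bio_will_gap()" can be illustrated with a simplified model. This Python sketch is an assumption-laden illustration rather than the kernel function: the `Bvec` tuple, the function name `gap_to_prev`, and the exact gap rule (a nonzero next offset, or a previous segment that does not end on the boundary) are chosen here to show the shape of the check.

```python
from collections import namedtuple

# Hypothetical, simplified segment descriptor: byte offset and length in a page.
Bvec = namedtuple("Bvec", "offset length")


def gap_to_prev(virt_boundary_mask, prev, next_offset):
    """Return True if placing a segment at next_offset after prev leaves a gap.

    virt_boundary_mask is 0 when the queue has no virt boundary limit.
    """
    if not virt_boundary_mask:
        # Cheap exit: without a virt boundary limit no gap is possible,
        # so the more expensive lookup of the last bvec is never needed.
        return False
    prev_end = prev.offset + prev.length
    return next_offset != 0 or (prev_end & virt_boundary_mask) != 0


PAGE_MASK_4K = 4096 - 1
full_page = Bvec(offset=0, length=4096)
half_page = Bvec(offset=0, length=2048)
```

With a mask of 0 the function answers immediately, which mirrors the commit's point that most queues skip the costlier work entirely.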
    • block: bio: introduce helpers to get the 1st and last bvec · 7bcd79ac
      Ming Lei authored
      The bio passed to bio_will_gap() may be fast cloned from an upper
      layer (dm, md, bcache, fs, ...), or come from bio splitting in the
      block core.
      
      Unfortunately bio_will_gap() just figures out the last bvec via
      'bi_io_vec[prev->bi_vcnt - 1]' directly, which is obviously wrong
      for such bios.
      
      This patch introduces two helpers to get the first and last
      bvec of a bio, which fixes the issue.
      
      Cc: stable@vger.kernel.org
      Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
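To see why indexing 'bi_io_vec[bi_vcnt - 1]' goes wrong for a split or cloned bio, consider a toy model in which several bios share one vector table and differ only in their iterator. This Python sketch is not the kernel helpers: `Bvec`, `BvecIter`, `Bio`, `first_bvec`, `last_bvec`, and `naive_last` are hypothetical names, and the iterator fields (index, remaining bytes, bytes already consumed in the current vector) are a simplification assumed for illustration.

```python
from collections import namedtuple

Bvec = namedtuple("Bvec", "page offset length")
BvecIter = namedtuple("BvecIter", "idx size done")


class Bio:
    """Toy bio: a shared vector table plus an iterator describing its range."""

    def __init__(self, vecs, it):
        self.vecs = vecs
        self.vcnt = len(vecs)
        self.iter = it


def naive_last(bio):
    # The approach the commit calls wrong: ignores the iterator entirely.
    return bio.vecs[bio.vcnt - 1]


def first_bvec(bio):
    it = bio.iter
    bv = bio.vecs[it.idx]
    length = min(it.size, bv.length - it.done)
    return Bvec(bv.page, bv.offset + it.done, length)


def last_bvec(bio):
    # Walk the iterator's byte range until the remaining bytes fit in one vector.
    idx, remaining, done = bio.iter
    while True:
        bv = bio.vecs[idx]
        avail = bv.length - done
        if remaining <= avail:
            return Bvec(bv.page, bv.offset + done, remaining)
        remaining -= avail
        idx += 1
        done = 0


vecs = [Bvec(page=p, offset=0, length=4096) for p in range(3)]
# Split a 12288-byte bio at byte 6144; both halves share the same table.
front = Bio(vecs, BvecIter(idx=0, size=6144, done=0))
rest = Bio(vecs, BvecIter(idx=1, size=6144, done=2048))
```

Here `naive_last(front)` points at page 2, which the front half never touches, while `last_bvec(front)` stops in the middle of page 1 where the split occurred.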
    • Merge tag 'pci-v4.5-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · e3c2ef41
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "Freescale Layerscape host bridge driver:
          Fix MSG TLP drop setting (Minghuan Lian)
      
        TI Keystone host bridge driver:
          Fix MSI code that retrieves struct pcie_port pointer (Murali Karicheri)"
      
      * tag 'pci-v4.5-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: layerscape: Fix MSG TLP drop setting
        PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · c2687cf9
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       - ARM/MIPS: Fixes for ioctls when copy_from_user returns nonzero
       - x86: Small fix for Skylake TSC scaling
       - x86: Improved fix for last week's missed hardware breakpoint bug
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: x86: Update tsc multiplier on change.
        mips/kvm: fix ioctl error handling
        arm/arm64: KVM: Fix ioctl error handling
        KVM: x86: fix root cause for missed hardware breakpoints
    • Merge tag 'gpio-v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 4237b2e6
      Linus Torvalds authored
      Pull late GPIO fix from Linus Walleij:
       "Regressions never arrive when you want them to, so here is a late fix
        for the Renesas RCAR GPIO driver.  It only affects that driver on the
        very specific Renesas platforms:
      
         - Fix a runtime PM suspend/resume bug in the RCAR driver"
      
      * tag 'gpio-v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: rcar: Add Runtime PM handling for interrupts
    • Merge tag 'iommu-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 19eab220
      Linus Torvalds authored
      Pull IOMMU fixes from Joerg Roedel:
       "One fix for Intel VT-d:
      
         - Use BUS_NOTIFY_REMOVED_DEVICE notifier to unbind a device from its
           domain _after_ it has been unbound from its driver.  This fixes a
           BUG_ON being triggered in the PCI hotplug path.
      
        And three for AMD IOMMU:
      
         - Add a workaround for a hardware issue with ATS in use
      
         - Fix ATS enable/disable balance when a device is removed
      
         - Fix a boot warning being triggered when the system has IOMMU
           performance counters and PCI device 00:00.0 is not covered by the
           IOMMU"
      
      * tag 'iommu-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
        iommu/amd: Detach device from domain before removal
        iommu/amd: Apply workaround for ATS write permission check
        iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · f4bd9822
      Linus Torvalds authored
      Pull minor virtio/vhost fixes from Michael Tsirkin:
       "This fixes two minor bugs: error handling in vhost, and capability
        processing in virtio"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vhost: fix error path in vhost_init_used()
        virtio-pci: read the right virtio_pci_notify_cap field