1. 29 Nov, 2018 5 commits
    • Jens Axboe's avatar
      nvme: implement mq_ops->commit_rqs() hook · 04f3eafd
      Jens Axboe authored
      Split the command submission and the SQ doorbell ring, and add the
      doorbell ring as our ->commit_rqs() hook. This allows a list of
      requests to be issued, with nvme only writing the SQ update when
      it's necessary. This is more efficient if we have lists of requests
      to issue, particularly on virtualized hardware, where writing the
      SQ doorbell is more expensive than on real hardware. For those cases,
      performance increases of 2-3x have been observed.
      
      The use case for this is plugged IO, where blk-mq flushes a batch of
      requests at the time.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      04f3eafd
    • Jens Axboe's avatar
      blk-mq: add mq_ops->commit_rqs() · d666ba98
      Jens Axboe authored
      blk-mq passes information to the hardware about any given request being
      the last that we will issue in this sequence. The point is that hardware
      can defer costly doorbell type writes to the last request. But if we run
      into errors issuing a sequence of requests, we may never send the request
      with bd->last == true set. For that case, we need a hook that tells the
      hardware that nothing else is coming right now.
      
      For failures returned by the drivers ->queue_rq() hook, the driver is
      responsible for flushing pending requests, if it uses bd->last to
      optimize that part. This works like before, no changes there.
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d666ba98
    • Jens Axboe's avatar
      block: improve logic around when to sort a plug list · ce5b009c
      Jens Axboe authored
      Only do it if we have requests for multiple queues in the same
      plug.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ce5b009c
    • Dan Carpenter's avatar
      blk-mq: Add a NULL check in blk_mq_free_map_and_requests() · 4e6db0f2
      Dan Carpenter authored
      I recently found some code which called blk_mq_free_map_and_requests()
      with a NULL set->tags pointer.  I fixed the caller, but it seems like a
      good idea to add a NULL check here as well.  Now we can call:
      
      	blk_mq_free_tag_set(set);
      	blk_mq_free_tag_set(set);
      
      twice in a row and it's harmless.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4e6db0f2
    • Dan Carpenter's avatar
      ataflop: fix error handling in atari_floppy_init() · 49379e6d
      Dan Carpenter authored
      Smatch complains that there is an off by one if the allocation fails in:
      
      	DMABuffer = atari_stram_alloc(BUFFER_SIZE+512, "ataflop");
      
      In that situation, "i" would be point to one element beyond the end of
      the unit[] array.
      
      There is a second bug because the error handling calls
      blk_mq_free_tag_set(&unit[i].tag_set); regardless of whether
      "disk->queue" is NULL or non-NULL.  So if blk_mq_init_sq_queue() fails,
      then that means unit[i].tag_set->tags is NULL and it leads to an Oops.
      
      It's easiest to call put_disk() before the goto to clean up the partial
      iteration.  Then the earlier unit[] elements are fully allocated so we
      can remove the checks whether "disk->queue" is NULL and the code is
      simpler.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      49379e6d
  2. 28 Nov, 2018 4 commits
    • Weiping Zhang's avatar
      block: add io timeout to sysfs · 65cd1d13
      Weiping Zhang authored
      Give a interface to adjust io timeout(ms) by device.
      Signed-off-by: default avatarWeiping Zhang <zhangweiping@didiglobal.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      65cd1d13
    • Yufen Yu's avatar
      block: use rcu_work instead of call_rcu to avoid sleep in softirq · 94a2c3a3
      Yufen Yu authored
      We recently got a stack by syzkaller like this:
      
      BUG: sleeping function called from invalid context at mm/slab.h:361
      in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
      INFO: lockdep is turned off.
      CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
       0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
       0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
       0000000000000000 0000000000000001 0000000000000004 0000000000000000
      Call Trace:
       <IRQ>  [<ffffffff81cb2194>] __dump_stack lib/dump_stack.c:15 [inline]
       <IRQ>  [<ffffffff81cb2194>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
       [<ffffffff8129a981>] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
       [<ffffffff8129ac33>] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
       [<ffffffff81794c13>] slab_pre_alloc_hook mm/slab.h:361 [inline]
       [<ffffffff81794c13>] slab_alloc_node mm/slub.c:2610 [inline]
       [<ffffffff81794c13>] slab_alloc mm/slub.c:2692 [inline]
       [<ffffffff81794c13>] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
       [<ffffffff81cbe9a7>] kmalloc include/linux/slab.h:479 [inline]
       [<ffffffff81cbe9a7>] kzalloc include/linux/slab.h:623 [inline]
       [<ffffffff81cbe9a7>] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
       [<ffffffff81cbf84f>] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
       [<ffffffff81cbb5b9>] kobject_cleanup lib/kobject.c:633 [inline]
       [<ffffffff81cbb5b9>] kobject_release+0x229/0x440 lib/kobject.c:675
       [<ffffffff81cbb0a2>] kref_sub include/linux/kref.h:73 [inline]
       [<ffffffff81cbb0a2>] kref_put include/linux/kref.h:98 [inline]
       [<ffffffff81cbb0a2>] kobject_put+0x72/0xd0 lib/kobject.c:692
       [<ffffffff8216f095>] put_device+0x25/0x30 drivers/base/core.c:1237
       [<ffffffff81c4cc34>] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
       [<ffffffff813c08bc>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
       [<ffffffff813c08bc>] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
       [<ffffffff813c08bc>] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
       [<ffffffff813c08bc>] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
       [<ffffffff813c08bc>] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
       [<ffffffff8120f509>] __do_softirq+0x299/0xe20 kernel/softirq.c:273
       [<ffffffff81210496>] invoke_softirq kernel/softirq.c:350 [inline]
       [<ffffffff81210496>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
       [<ffffffff82c2cd7b>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
       [<ffffffff82c2cd7b>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
       [<ffffffff82c2bc25>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
       <EOI>  [<ffffffff814cbf40>] ? audit_kill_trees+0x180/0x180
       [<ffffffff8187d2f7>] fd_install+0x57/0x80 fs/file.c:626
       [<ffffffff8180989e>] do_sys_open+0x45e/0x550 fs/open.c:1043
       [<ffffffff818099c2>] SYSC_open fs/open.c:1055 [inline]
       [<ffffffff818099c2>] SyS_open+0x32/0x40 fs/open.c:1050
       [<ffffffff82c299e1>] entry_SYSCALL_64_fastpath+0x1e/0x9a
      
      In softirq context, we call rcu callback function delete_partition_rcu_cb(),
      which may allocate memory by kzalloc with GFP_KERNEL flag. If the
      allocation cannot be satisfied, it may sleep. However, That is not allowed
      in softirq contex.
      
      Although we found this problem on linux 4.4, the latest kernel version
      seems to have this problem as well. And it is very similar to the
      previous one:
      	https://lkml.org/lkml/2018/7/9/391
      
      Fix it by using RCU workqueue, which allows sleep.
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Signed-off-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      94a2c3a3
    • Jens Axboe's avatar
      blk-mq: fix failure to decrement plug count on single rq removal · 4711b573
      Jens Axboe authored
      If we yank a 'same_queue_rq' request off the plug list, we should
      also decrement the cached request count.
      
      Fixes: 5f0ed774 ("block: sum requests in the plug structure")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4711b573
    • Young Xiao's avatar
      sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN · a11f6ca9
      Young Xiao authored
      __vdc_tx_trigger should only loop on EAGAIN a finite
      number of times.
      
      See commit adddc32d ("sunvnet: Do not spin in an
      infinite loop when vio_ldc_send() returns EAGAIN") for detail.
      Signed-off-by: default avatarYoung Xiao <YangX92@hotmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a11f6ca9
  3. 26 Nov, 2018 10 commits
  4. 21 Nov, 2018 2 commits
  5. 20 Nov, 2018 6 commits
  6. 19 Nov, 2018 4 commits
  7. 18 Nov, 2018 9 commits
    • Jens Axboe's avatar
      Merge tag 'v4.20-rc3' into for-4.21/block · a78b03bc
      Jens Axboe authored
      Merge in -rc3 to resolve a few conflicts, but also to get a few
      important fixes that have gone into mainline since the block
      4.21 branch was forked off (most notably the SCSI queue issue,
      which is both a conflict AND needed fix).
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a78b03bc
    • Jens Axboe's avatar
      floppy: remove now unused 'flags' variable · fce15a60
      Jens Axboe authored
      With the locking removed, it's unused. Kill it.
      
      Fixes: 503f620f ("floppy: remove queue_lock around floppy_end_request")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fce15a60
    • Linus Torvalds's avatar
      Linux 4.20-rc3 · 9ff01193
      Linus Torvalds authored
      9ff01193
    • Linus Torvalds's avatar
      Merge tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 25e19c1f
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "A small batch of fixes for v4.20-rc3.
      
        The overflow continuation fix addresses something that has been broken
        for several releases. Arguably it could wait even longer, but it's a
        one line fix and this finishes the last of the known address range
        scrub bug reports. The revert addresses a lockdep regression. The unit
        tests are not critical to fix, but no reason to hold this fix back.
      
        Summary:
      
         - Address Range Scrub overflow continuation handling has been broken
           since it was initially merged. It was only recently that error
           injection and platform-BIOS support enabled this corner case to be
           exercised.
      
         - The recent attempt to provide more isolation for the kernel Address
           Range Scrub state machine from userapace initiated sessions
           triggers a lockdep report. Revert and try again at the next merge
           window.
      
         - Fix a kasan reported buffer overflow in libnvdimm unit test
           infrastrucutre (nfit_test)"
      
      * tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        Revert "acpi, nfit: Further restrict userspace ARS start requests"
        acpi, nfit: Fix ARS overflow continuation
        tools/testing/nvdimm: Fix the array size for dimm devices.
      25e19c1f
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · c67a98c0
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
        mm, page_alloc: check for max order in hot path
        scripts/spdxcheck.py: make python3 compliant
        tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
        lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
        mm/vmstat.c: fix NUMA statistics updates
        mm/gup.c: fix follow_page_mask() kerneldoc comment
        ocfs2: free up write context when direct IO failed
        scripts/faddr2line: fix location of start_kernel in comment
        mm: don't reclaim inodes with many attached pages
        mm, memory_hotplug: check zone_movable in has_unmovable_pages
        mm/swapfile.c: use kvzalloc for swap_info_struct allocation
        MAINTAINERS: update OMAP MMC entry
        hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
        kernel/sched/psi.c: simplify cgroup_move_task()
        z3fold: fix possible reclaim races
      c67a98c0
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 03582f33
      Linus Torvalds authored
      Pull scheduler fix from Ingo Molnar:
       "Fix an exec() related scalability/performance regression, which was
        caused by incorrectly calculating load and migrating tasks on exec()
        when they shouldn't be"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix cpu_util_wake() for 'execl' type workloads
      03582f33
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b53e27f6
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Fix uncore PMU enumeration for CofeeLake CPUs"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Support CoffeeLake 8th CBOX
        perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
      b53e27f6
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 743a4863
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Misc fixes: two warning splat fixes, a leak fix and persistent memory
        allocation fixes for ARM"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi: Permit calling efi_mem_reserve_persistent() from atomic context
        efi/arm: Defer persistent reservations until after paging_init()
        efi/arm/libstub: Pack FDT after populating it
        efi/arm: Revert deferred unmap of early memmap mapping
        efi: Fix debugobjects warning on 'efi_rts_work'
      743a4863
    • Linus Torvalds's avatar
      Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm · cfaa9f02
      Linus Torvalds authored
      Pull ARM spectre updates from Russell King:
       "These are the currently known final bits that resolve the Spectre
        issues. big.Little systems used to be sufficiently identical in that
        there were no differences between individual CPUs in the system that
        mattered to the kernel. With the advent of the Spectre problem, the
        CPUs now have differences in how the workaround is applied.
      
        As a result of previous Spectre patches, these systems ended up
        reporting quite a lot of:
      
           "CPUx: Spectre v2: incorrect context switching function, system vulnerable"
      
        messages due to the action of the big.Little switcher causing the CPUs
        to be re-initialised regularly. This series resolves that issue by
        making the CPU vtable unique to each CPU.
      
        However, since this is used very early, before per-cpu is setup,
        per-cpu can't be used. We also have a problem that two of the methods
        are not called from preempt-safe paths, but thankfully these remain
        identical between all CPUs in the system. To make sure, we validate
        that these are identical during boot"
      
      * 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: spectre-v2: per-CPU vtables to work around big.Little systems
        ARM: add PROC_VTABLE and PROC_TABLE macros
        ARM: clean up per-processor check_bugs method call
        ARM: split out processor lookup
        ARM: make lookup_processor_type() non-__init
      cfaa9f02