1. 07 Jun, 2017 40 commits
    • Israel Rukshin's avatar
      RDMA/srp: Fix NULL deref at srp_destroy_qp() · 68c98967
      Israel Rukshin authored
      commit 95c2ef50 upstream.
      
      If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq
      may be NULL.
      Calling directly to ib_destroy_qp() is sufficient because
      no work requests were posted on the created qp.
      
      Fixes: 9294000d ("IB/srp: Drain the send queue before destroying a QP")
      Signed-off-by: default avatarIsrael Rukshin <israelr@mellanox.com>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>--
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68c98967
    • Michal Hocko's avatar
      mm: consider memblock reservations for deferred memory initialization sizing · dbab0232
      Michal Hocko authored
      commit 864b9a39 upstream.
      
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      the reason is that the managed memory is too low (only 110MB) while the
      rest of the the 50GB is still waiting for the deferred intialization to
      be done.  update_defer_init estimates the initial memoty to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is than added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation whihch we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.czSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Tested-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbab0232
    • James Morse's avatar
      mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified · 1cc89263
      James Morse authored
      commit 9a291a7c upstream.
      
      KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
      FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
      finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
      special case.  (check_user_page_hwpoison())
      
      When huge pages are involved, this doesn't work so well.
      get_user_pages() calls follow_hugetlb_page(), which stops early if it
      receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
      -EFAULT to the caller.  The step to map this to -EHWPOISON based on the
      FOLL_ flags is missing.  The hwpoison special case is skipped, and
      -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
      
      Instead, move this VM_FAULT_ to errno mapping code into a header file
      and use it from faultin_page() and follow_hugetlb_page().
      
      With this, KVM works as expected.
      
      This isn't a problem for arm64 today as we haven't enabled
      MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
      too, so I think this should be a fix.  This doesn't apply earlier than
      stable's v4.11.1 due to all sorts of cleanup.
      
      [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
      suggested.
        Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.comSigned-off-by: default avatarJames Morse <james.morse@arm.com>
      Acked-by: default avatarPunit Agrawal <punit.agrawal@arm.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1cc89263
    • Yisheng Xie's avatar
      mlock: fix mlock count can not decrease in race condition · f814bf46
      Yisheng Xie authored
      commit 70feee0e upstream.
      
      Kefeng reported that when running the follow test, the mlock count in
      meminfo will increase permanently:
      
       [1] testcase
       linux:~ # cat test_mlockal
       grep Mlocked /proc/meminfo
        for j in `seq 0 10`
        do
       	for i in `seq 4 15`
       	do
       		./p_mlockall >> log &
       	done
       	sleep 0.2
       done
       # wait some time to let mlock counter decrease and 5s may not enough
       sleep 5
       grep Mlocked /proc/meminfo
      
       linux:~ # cat p_mlockall.c
       #include <sys/mman.h>
       #include <stdlib.h>
       #include <stdio.h>
      
       #define SPACE_LEN	4096
      
       int main(int argc, char ** argv)
       {
      	 	int ret;
      	 	void *adr = malloc(SPACE_LEN);
      	 	if (!adr)
      	 		return -1;
      
      	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
      	 	printf("mlcokall ret = %d\n", ret);
      
      	 	ret = munlockall();
      	 	printf("munlcokall ret = %d\n", ret);
      
      	 	free(adr);
      	 	return 0;
      	 }
      
      In __munlock_pagevec() we should decrement NR_MLOCK for each page where
      we clear the PageMlocked flag.  Commit 1ebb7cc6 ("mm: munlock: batch
      NR_MLOCK zone state updates") has introduced a bug where we don't
      decrement NR_MLOCK for pages where we clear the flag, but fail to
      isolate them from the lru list (e.g.  when the pages are on some other
      cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
      accounting gets permanently disrupted by this.
      
      Fix it by counting the number of page whose PageMlock flag is cleared.
      
      Fixes: 1ebb7cc6 (" mm: munlock: batch NR_MLOCK zone state updates")
      Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.comSigned-off-by: default avatarYisheng Xie <xieyisheng1@huawei.com>
      Reported-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Tested-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: zhongjiang <zhongjiang@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f814bf46
    • Punit Agrawal's avatar
      mm/migrate: fix refcount handling when !hugepage_migration_supported() · a0189db3
      Punit Agrawal authored
      commit 30809f55 upstream.
      
      On failing to migrate a page, soft_offline_huge_page() performs the
      necessary update to the hugepage ref-count.
      
      But when !hugepage_migration_supported() , unmap_and_move_hugepage()
      also decrements the page ref-count for the hugepage.  The combined
      behaviour leaves the ref-count in an inconsistent state.
      
      This leads to soft lockups when running the overcommitted hugepage test
      from mce-tests suite.
      
        Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
        soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
        INFO: rcu_preempt detected stalls on CPUs/tasks:
         Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
          (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
          thugetlb_overco R  running task        0  2715   2685 0x00000008
          Call trace:
            dump_backtrace+0x0/0x268
            show_stack+0x24/0x30
            sched_show_task+0x134/0x180
            rcu_print_detail_task_stall_rnp+0x54/0x7c
            rcu_check_callbacks+0xa74/0xb08
            update_process_times+0x34/0x60
            tick_sched_handle.isra.7+0x38/0x70
            tick_sched_timer+0x4c/0x98
            __hrtimer_run_queues+0xc0/0x300
            hrtimer_interrupt+0xac/0x228
            arch_timer_handler_phys+0x3c/0x50
            handle_percpu_devid_irq+0x8c/0x290
            generic_handle_irq+0x34/0x50
            __handle_domain_irq+0x68/0xc0
            gic_handle_irq+0x5c/0xb0
      
      Address this by changing the putback_active_hugepage() in
      soft_offline_huge_page() to putback_movable_pages().
      
      This only triggers on systems that enable memory failure handling
      (ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
      (!ARCH_ENABLE_HUGEPAGE_MIGRATION).
      
      I imagine this wasn't triggered as there aren't many systems running
      this configuration.
      
      [akpm@linux-foundation.org: remove dead comment, per Naoya]
      Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.comReported-by: default avatarManoj Iyer <manoj.iyer@canonical.com>
      Tested-by: default avatarManoj Iyer <manoj.iyer@canonical.com>
      Suggested-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarPunit Agrawal <punit.agrawal@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0189db3
    • Ross Zwisler's avatar
      dax: fix race between colliding PMD & PTE entries · bdaac1fe
      Ross Zwisler authored
      commit e2093926 upstream.
      
      We currently have two related PMD vs PTE races in the DAX code.  These
      can both be easily triggered by having two threads reading and writing
      simultaneously to the same private mapping, with the key being that
      private mapping reads can be handled with PMDs but private mapping
      writes are always handled with PTEs so that we can COW.
      
      Here is the first race:
      
        CPU 0					CPU 1
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
          handle_pte_fault()
            passes check for pmd_devmap()
      
      					(private mapping read)
      					__handle_mm_fault()
      					  create_huge_pmd()
      					    dax_iomap_pmd_fault() inserts PMD
      
            dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
            			  installed in our page tables at this spot.
      
      Here's the second race:
      
        CPU 0					CPU 1
      
        (private mapping read)
        __handle_mm_fault()
          passes check for pmd_none()
          create_huge_pmd()
            dax_iomap_pmd_fault() inserts PMD
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
      					(private mapping read)
      					__handle_mm_fault()
      					  passes check for pmd_none()
      					  create_huge_pmd()
      
          handle_pte_fault()
            dax_iomap_pte_fault() inserts PTE
      					    dax_iomap_pmd_fault() inserts PMD,
      					       but we already have a PTE at
      					       this spot.
      
      The core of the issue is that while there is isolation between faults to
      the same range in the DAX fault handlers via our DAX entry locking,
      there is no isolation between faults in the code in mm/memory.c.  This
      means for instance that this code in __handle_mm_fault() can run:
      
      	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
      		ret = create_huge_pmd(&vmf);
      
      But by the time we actually get to run the fault handler called by
      create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
      fault has installed a normal PMD here as a parent.  This is the cause of
      the 2nd race.  The first race is similar - there is the following check
      in handle_pte_fault():
      
      	} else {
      		/* See comment in pte_alloc_one_map() */
      		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
      			return 0;
      
      So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
      will bail and retry the fault.  This is correct, but there is nothing
      preventing the PMD from being installed after this check but before we
      actually get to the DAX PTE fault handlers.
      
      In my testing these races result in the following types of errors:
      
        BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
        BUG: non-zero nr_ptes on freeing mm: 15
      
      Fix this issue by having the DAX fault handlers verify that it is safe
      to continue their fault after they have taken an entry lock to block
      other racing faults.
      
      [ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
        Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
      Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: default avatarPawel Lebioda <pawel.lebioda@intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Xiong Zhou <xzhou@redhat.com>
      Cc: Eryu Guan <eguan@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdaac1fe
    • Ross Zwisler's avatar
      mm: avoid spurious 'bad pmd' warning messages · b16e6ab5
      Ross Zwisler authored
      commit d0f0931d upstream.
      
      When the pmd_devmap() checks were added by 5c7fb56e ("mm, dax:
      dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
      pages, they were all added to the end of if() statements after existing
      pmd_trans_huge() checks.  So, things like:
      
        -       if (pmd_trans_huge(*pmd))
        +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
      
      When further checks were added after pmd_trans_unstable() checks by
      commit 7267ec00 ("mm: postpone page table allocation until we have
      page to map") they were also added at the end of the conditional:
      
        +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
      
      This ordering is fine for pmd_trans_huge(), but doesn't work for
      pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
      check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
      pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
      do end up doing the right thing, but only after spamming dmesg with
      suspicious looking messages:
      
        mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
      
      Reorder these checks in a helper so that pmd_devmap() is checked first,
      avoiding the error messages, and add a comment explaining why the
      ordering is important.
      
      Fixes: commit 7267ec00 ("mm: postpone page table allocation until we have page to map")
      Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Xiong Zhou <xzhou@redhat.com>
      Cc: Eryu Guan <eguan@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b16e6ab5
    • Tetsuo Handa's avatar
      mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once · de12c73f
      Tetsuo Handa authored
      commit c288983d upstream.
      
      Roman Gushchin has reported that the OOM killer can trivially selects
      next OOM victim when a thread doing memory allocation from page fault
      path was selected as first OOM victim.
      
          allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
          allocate cpuset=/ mems_allowed=0
          CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          Call Trace:
           oom_kill_process+0x219/0x3e0
           out_of_memory+0x11d/0x480
           __alloc_pages_slowpath+0xc84/0xd40
           __alloc_pages_nodemask+0x245/0x260
           alloc_pages_vma+0xa2/0x270
           __handle_mm_fault+0xca9/0x10c0
           handle_mm_fault+0xf3/0x210
           __do_page_fault+0x240/0x4e0
           trace_do_page_fault+0x37/0xe0
           do_async_page_fault+0x19/0x70
           async_page_fault+0x28/0x30
          ...
          Out of memory: Kill process 492 (allocate) score 899 or sacrifice child
          Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB
          allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
          allocate cpuset=/ mems_allowed=0
          CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          Call Trace:
           __alloc_pages_slowpath+0xd32/0xd40
           __alloc_pages_nodemask+0x245/0x260
           alloc_pages_vma+0xa2/0x270
           __handle_mm_fault+0xca9/0x10c0
           handle_mm_fault+0xf3/0x210
           __do_page_fault+0x240/0x4e0
           trace_do_page_fault+0x37/0xe0
           do_async_page_fault+0x19/0x70
           async_page_fault+0x28/0x30
          ...
          oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
          ...
          allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
          allocate cpuset=/ mems_allowed=0
          CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          Call Trace:
           oom_kill_process+0x219/0x3e0
           out_of_memory+0x11d/0x480
           pagefault_out_of_memory+0x68/0x80
           mm_fault_error+0x8f/0x190
           ? handle_mm_fault+0xf3/0x210
           __do_page_fault+0x4b2/0x4e0
           trace_do_page_fault+0x37/0xe0
           do_async_page_fault+0x19/0x70
           async_page_fault+0x28/0x30
          ...
          Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
          Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB
      
      There is a race window that the OOM reaper completes reclaiming the
      first victim's memory while nothing but mutex_trylock() prevents the
      first victim from calling out_of_memory() from pagefault_out_of_memory()
      after memory allocation for page fault path failed due to being selected
      as an OOM victim.
      
      This is a side effect of commit 9a67f648 ("mm: consolidate
      GFP_NOFAIL checks in the allocator slowpath") because that commit
      silently changed the behavior from
      
          /* Avoid allocations with no watermarks from looping endlessly */
      
      to
      
          /*
           * Give up allocations without trying memory reserves if selected
           * as an OOM victim
           */
      
      in __alloc_pages_slowpath() by moving the location to check TIF_MEMDIE
      flag.  I have noticed this change but I didn't post a patch because I
      thought it is an acceptable change other than noise by warn_alloc()
      because !__GFP_NOFAIL allocations are allowed to fail.  But we
      overlooked that failing memory allocation from page fault path makes
      difference due to the race window explained above.
      
      While it might be possible to add a check to pagefault_out_of_memory()
      that prevents the first victim from calling out_of_memory() or remove
      out_of_memory() from pagefault_out_of_memory(), changing
      pagefault_out_of_memory() does not suppress noise by warn_alloc() when
      allocating thread was selected as an OOM victim.  There is little point
      with printing similar backtraces and memory information from both
      out_of_memory() and warn_alloc().
      
      Instead, if we guarantee that current thread can try allocations with no
      watermarks once when current thread looping inside
      __alloc_pages_slowpath() was selected as an OOM victim, we can follow "who
      can use memory reserves" rules and suppress noise by warn_alloc() and
      prevent memory allocations from page fault path from calling
      pagefault_out_of_memory().
      
      If we take the comment literally, this patch would do
      
        -    if (test_thread_flag(TIF_MEMDIE))
        -        goto nopage;
        +    if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC))
        +        goto nopage;
      
      because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is
      given.  But if I recall correctly (I couldn't find the message), the
      condition is meant to apply to only OOM victims despite the comment.
      Therefore, this patch preserves TIF_MEMDIE check.
      
      Fixes: 9a67f648 ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath")
      Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jpSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarRoman Gushchin <guro@fb.com>
      Tested-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de12c73f
    • Takashi Iwai's avatar
      ALSA: usb: Fix a typo in Tascam US-16x08 mixer element · af03bb0c
      Takashi Iwai authored
      commit 617163fc upstream.
      
      A mixer element created in a quirk for Tascam US-16x08 contains a
      typo: it should be "EQ MidLow Q" instead of "EQ MidQLow Q".
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875
      Fixes: d2bb390a ("ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk")
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af03bb0c
    • Takashi Iwai's avatar
      Revert "ALSA: usb-audio: purge needless variable length array" · 0be9a9a4
      Takashi Iwai authored
      commit 64188cfb upstream.
      
      This reverts commit 89b593c3 ("ALSA: usb-audio: purge needless
      variable length array").  The patch turned out to cause a severe
      regression, triggering an Oops at snd_usb_ctl_msg().  It was overseen
      that snd_usb_ctl_msg() writes back the response to the given buffer,
      while the patch changed it to a read-only const buffer.  (One should
      always double-check when an extra pointer cast is present...)
      
      As a simple fix, just revert the affected commit.  It was merely a
      cleanup.  Although it brings VLA again, it's clearer as a fix.  We'll
      address the VLA later in another patch.
      
      Fixes: 89b593c3 ("ALSA: usb-audio: purge needless variable length array")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0be9a9a4
    • Alexander Tsoy's avatar
      ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 · ffb97b00
      Alexander Tsoy authored
      commit 1fc2e41f upstream.
      
      This model is actually called 92XXM2-8 in Windows driver. But since pin
      configs for M22 and M28 are identical, just reuse M22 quirk.
      
      Fixes external microphone (tested) and probably docking station ports
      (not tested).
      Signed-off-by: default avatarAlexander Tsoy <alexander@tsoy.me>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffb97b00
    • Takashi Iwai's avatar
      ALSA: hda - No loopback on ALC299 codec · 0c4afdc6
      Takashi Iwai authored
      commit fa16b69f upstream.
      
      ALC299 has no loopback mixer, but the driver still tries to add a beep
      control over the mixer NID which leads to the error at accessing it.
      This patch fixes it by properly declaring mixer_nid=0 for this codec.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195775
      Fixes: 28f1f9b2 ("ALSA: hda/realtek - Add new codec ID ALC299")
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c4afdc6
    • Nicolas Iooss's avatar
      pcmcia: remove left-over %Z format · d5bc54d0
      Nicolas Iooss authored
      commit ff5a2016 upstream.
      
      Commit 5b5e0928 ("lib/vsprintf.c: remove %Z support") removed some
      usages of format %Z but forgot "%.2Zx".  This makes clang 4.0 reports a
      -Wformat-extra-args warning because it does not know about %Z.
      
      Replace %Z with %z.
      
      Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.orgSigned-off-by: default avatarNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Harald Welte <laforge@gnumonks.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5bc54d0
    • Lyude's avatar
      drm/radeon: Unbreak HPD handling for r600+ · 26097709
      Lyude authored
      commit 3d18e337 upstream.
      
      We end up reading the interrupt register for HPD5, and then writing it
      to HPD6 which on systems without anything using HPD5 results in
      permanently disabling hotplug on one of the display outputs after the
      first time we acknowledge a hotplug interrupt from the GPU.
      
      This code is really bad. But for now, let's just fix this. I will
      hopefully have a large patch series to refactor all of this soon.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarLyude <lyude@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26097709
    • Alex Deucher's avatar
      drm/radeon/ci: disable mclk switching for high refresh rates (v2) · 6b693bbf
      Alex Deucher authored
      commit 58d7e3e4 upstream.
      
      Even if the vblank period would allow it, it still seems to
      be problematic on some cards.
      
      v2: fix logic inversion (Nils)
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b693bbf
    • Alex Deucher's avatar
      drm/amd/powerplay/smu7: disable mclk switching for high refresh rates · d6ba1a44
      Alex Deucher authored
      commit 2275a3a2 upstream.
      
      Even if the vblank period would allow it, it still seems to
      be problematic on some cards.
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6ba1a44
    • Alex Deucher's avatar
      drm/amd/powerplay/smu7: add vblank check for mclk switching (v2) · 662dbfcc
      Alex Deucher authored
      commit 09be4a52 upstream.
      
      Check to make sure the vblank period is long enough to support
      mclk switching.
      
      v2: drop needless initial assignment (Nils)
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Reviewed-by: default avatarRex Zhu <Rex.Zhu@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      662dbfcc
    • Ming Lei's avatar
      nvme: avoid to use blk_mq_abort_requeue_list() · a8aa8a0c
      Ming Lei authored
      commit 986f75c8 upstream.
      
      NVMe may add request into requeue list simply and not kick off the
      requeue if hw queues are stopped. Then blk_mq_abort_requeue_list()
      is called in both nvme_kill_queues() and nvme_ns_remove() for
      dealing with this issue.
      
      Unfortunately blk_mq_abort_requeue_list() is absolutely a
      race maker, for example, one request may be requeued during
      the aborting. So this patch just calls blk_mq_kick_requeue_list() in
      nvme_kill_queues() to handle this issue like what nvme_start_queues()
      does. Now all requests in requeue list when queues are stopped will be
      handled by blk_mq_kick_requeue_list() when queues are restarted, either
      in nvme_start_queues() or in nvme_kill_queues().
      Reported-by: default avatarZhang Yi <yizhan@redhat.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8aa8a0c
    • Ming Lei's avatar
      nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() · 20c03f45
      Ming Lei authored
      commit 806f026f upstream.
      
      Inside nvme_kill_queues(), we have to start hw queues for
      draining requests in sw queues, .dispatch list and requeue list,
      so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues()
      which only run queues if queues are stopped, but the queues may have
      been started already, for example nvme_start_queues() is called in reset work
      function.
      
      blk_mq_start_hw_queues() run hw queues in current context, instead
      of running asynchronously like before. Given nvme_kill_queues() is
      run from either remove context or reset worker context, both are fine
      to run hw queue directly. And the mutex of namespaces_mutex isn't a
      problem too becasue nvme_start_freeze() runs hw queue in this way
      already.
      Reported-by: default avatarZhang Yi <yizhan@redhat.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20c03f45
    • Marta Rybczynska's avatar
      nvme-rdma: support devices with queue size < 32 · 0fe9c551
      Marta Rybczynska authored
      commit 0544f549 upstream.
      
      In the case of small NVMe-oF queue size (<32) we may enter a deadlock
      caused by the fact that the IB completions aren't sent waiting for 32
      and the send queue will fill up.
      
      The error is seen as (using mlx5):
      [ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
      [ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12
      
      This patch changes the way the signaling is done so that it depends on
      the queue depth now. The magic define has been removed completely.
      Signed-off-by: default avatarMarta Rybczynska <marta.rybczynska@kalray.eu>
      Signed-off-by: default avatarSamuel Jones <sjones@kalray.eu>
      Acked-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0fe9c551
    • Jason Gerecke's avatar
      HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference · f88d3d6e
      Jason Gerecke authored
      commit 2ac97f0f upstream.
      
      The following Smatch complaint was generated in response to commit
      2a6cdbdd ("HID: wacom: Introduce new 'touch_input' device"):
      
          drivers/hid/wacom_wac.c:1586 wacom_tpc_irq()
                   error: we previously assumed 'wacom->touch_input' could be null (see line 1577)
      
      The 'touch_input' and 'pen_input' variables point to the 'struct input_dev'
      used for relaying touch and pen events to userspace, respectively. If a
      device does not have a touch interface or pen interface, the associated
      input variable is NULL. The 'wacom_tpc_irq()' function is responsible for
      forwarding input reports to a more-specific IRQ handler function. An
      unknown report could theoretically be mistaken as e.g. a touch report
      on a device which does not have a touch interface. This can be prevented
      by only calling the pen/touch functions are called when the pen/touch
      pointers are valid.
      
      Fixes: 2a6cdbdd ("HID: wacom: Introduce new 'touch_input' device")
      Signed-off-by: default avatarJason Gerecke <jason.gerecke@wacom.com>
      Reviewed-by: default avatarPing Cheng <ping.cheng@wacom.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f88d3d6e
    • Bryant G. Ly's avatar
      ibmvscsis: Fix the incorrect req_lim_delta · 8d975ebd
      Bryant G. Ly authored
      commit 75dbf2d3 upstream.
      
      The current code is not correctly calculating the req_lim_delta.
      
      We want to make sure vscsi->credit is always incremented when
      we do not send a response for the scsi op. Thus for the case where
      there is a successfully aborted task we need to make sure the
      vscsi->credit is incremented.
      
      v2 - Moves the original location of the vscsi->credit increment
      to a better spot. Since if we increment credit, the next command
      we send back will have increased req_lim_delta. But we probably
      shouldn't be doing that until the aborted cmd is actually released.
      Otherwise the client will think that it can send a new command, and
      we could find ourselves short of command elements. Not likely, but could
      happen.
      
      This patch depends on both:
      commit 25e78531 ("ibmvscsis: Do not send aborted task response")
      commit 98883f1b ("ibmvscsis: Clear left-over abort_cmd pointers")
      Signed-off-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d975ebd
    • Bryant G. Ly's avatar
      ibmvscsis: Clear left-over abort_cmd pointers · e920be83
      Bryant G. Ly authored
      commit 98883f1b upstream.
      
      With the addition of ibmvscsis->abort_cmd pointer within
      commit 25e78531 ("ibmvscsis: Do not send aborted task response"),
      make sure to explicitly NULL these pointers when clearing
      DELAY_SEND flag.
      
      Do this for two cases, when getting the new new ibmvscsis
      descriptor in ibmvscsis_get_free_cmd() and before posting
      the response completion in ibmvscsis_send_messages().
      Signed-off-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e920be83
    • Artem Savkov's avatar
      scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get() · 1fb66c6a
      Artem Savkov authored
      commit 0648a07c upstream.
      
      rdac_failover_get references struct rdac_controller as
      ctlr->ms_sdev->handler_data->ctlr for no apparent reason. Besides being
      inefficient this also introduces a null-pointer dereference as
      send_mode_select() sets ctlr->ms_sdev to NULL before calling
      rdac_failover_get():
      
      [   18.432550] device-mapper: multipath service-time: version 0.3.0 loaded
      [   18.436124] BUG: unable to handle kernel NULL pointer dereference at 0000000000000790
      [   18.436129] IP: send_mode_select+0xca/0x560
      [   18.436129] PGD 0
      [   18.436130] P4D 0
      [   18.436130]
      [   18.436132] Oops: 0000 [#1] SMP
      [   18.436133] Modules linked in: dm_service_time sd_mod dm_multipath amdkfd amd_iommu_v2 radeon(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx drm serio_raw scsi_transport_fc bnx2 i2c_core dm_mirror dm_region_hash dm_log dm_mod
      [   18.436143] CPU: 4 PID: 443 Comm: kworker/u16:2 Not tainted 4.12.0-rc1.1.el7.test.x86_64 #1
      [   18.436144] Hardware name: IBM BladeCenter LS22 -[79013SG]-/Server Blade, BIOS -[L8E164AUS-1.07]- 05/25/2011
      [   18.436145] Workqueue: kmpath_rdacd send_mode_select
      [   18.436146] task: ffff880225116a40 task.stack: ffffc90002bd8000
      [   18.436148] RIP: 0010:send_mode_select+0xca/0x560
      [   18.436148] RSP: 0018:ffffc90002bdbda8 EFLAGS: 00010246
      [   18.436149] RAX: 0000000000000000 RBX: ffffc90002bdbe08 RCX: ffff88017ef04a80
      [   18.436150] RDX: ffffc90002bdbe08 RSI: ffff88017ef04a80 RDI: ffff8802248e4388
      [   18.436151] RBP: ffffc90002bdbe48 R08: 0000000000000000 R09: ffffffff81c104c0
      [   18.436151] R10: 00000000000001ff R11: 000000000000035a R12: ffffc90002bdbdd8
      [   18.436152] R13: ffff8802248e4390 R14: ffff880225152800 R15: ffff8802248e4400
      [   18.436153] FS:  0000000000000000(0000) GS:ffff880227d00000(0000) knlGS:0000000000000000
      [   18.436154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   18.436154] CR2: 0000000000000790 CR3: 000000042535b000 CR4: 00000000000006e0
      [   18.436155] Call Trace:
      [   18.436159]  ? rdac_activate+0x14e/0x150
      [   18.436161]  ? refcount_dec_and_test+0x11/0x20
      [   18.436162]  ? kobject_put+0x1c/0x50
      [   18.436165]  ? scsi_dh_activate+0x6f/0xd0
      [   18.436168]  process_one_work+0x149/0x360
      [   18.436170]  worker_thread+0x4d/0x3c0
      [   18.436172]  kthread+0x109/0x140
      [   18.436173]  ? rescuer_thread+0x380/0x380
      [   18.436174]  ? kthread_park+0x60/0x60
      [   18.436176]  ret_from_fork+0x2c/0x40
      [   18.436177] Code: 49 c7 46 20 00 00 00 00 4c 89 ef c6 07 00 0f 1f 40 00 45 31 ed c7 45 b0 05 00 00 00 44 89 6d b4 4d 89 f5 4c 8b 75 a8 49 8b 45 20 <48> 8b b0 90 07 00 00 48 8b 56 10 8b 42 10 48 8d 7a 28 85 c0 0f
      [   18.436192] RIP: send_mode_select+0xca/0x560 RSP: ffffc90002bdbda8
      [   18.436192] CR2: 0000000000000790
      [   18.436198] ---[ end trace 40f3e4dca1ffabdd ]---
      [   18.436199] Kernel panic - not syncing: Fatal exception
      [   18.436222] Kernel Offset: disabled
      [-- MARK -- Thu May 18 11:45:00 2017]
      
      Fixes: 32782557 scsi_dh_rdac: switch to scsi_execute_req_flags()
      Signed-off-by: default avatarArtem Savkov <asavkov@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fb66c6a
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 14ba7893
      Nicholas Bellinger authored
      commit 25cdda95 upstream.
      
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing within iscsi_target_do_login() in iscsi_np
      context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
      but doesn't invoke schedule_delayed_work().
      
      The original NULL pointer dereference case reported by MNC is now handled
      by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
      transitioning to FFP to determine when the socket has already closed,
      or iscsi_target_start_negotiation() if the login needs to exchange
      more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
      closed.  For both of these cases, the cleanup up of remaining connection
      resources will occur in iscsi_target_start_negotiation() from iscsi_np
      process context once the failure is detected.
      
      Finally, to handle to case where iscsi_target_sk_state_change() is
      called after the initial PDU procesing is complete, it now invokes
      conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
      existing iscsi_target_sk_check_close() checks detect connection failure.
      For this case, the cleanup of remaining connection resources will occur
      in iscsi_target_do_login_rx() from delayed workqueue process context
      once the failure is detected.
      Reported-by: default avatarMike Christie <mchristi@redhat.com>
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Tested-by: default avatarMike Christie <mchristi@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reported-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Varun Prakash <varun@chelsio.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14ba7893
    • Jiang Yi's avatar
      iscsi-target: Always wait for kthread_should_stop() before kthread exit · c732f308
      Jiang Yi authored
      commit 5e0cf5e6 upstream.
      
      There are three timing problems in the kthread usages of iscsi_target_mod:
      
       - np_thread of struct iscsi_np
       - rx_thread and tx_thread of struct iscsi_conn
      
      In iscsit_close_connection(), it calls
      
       send_sig(SIGINT, conn->tx_thread, 1);
       kthread_stop(conn->tx_thread);
      
      In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
      SIGINT the kthread will exit without checking the return value of
      kthread_should_stop().
      
      So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
      and kthread_stop(...), the kthread_stop() will try to stop an already
      stopped kthread.
      
      This is invalid according to the documentation of kthread_stop().
      
      (Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
       early iscsi_target_rx_thread failure case - nab)
      Signed-off-by: default avatarJiang Yi <jiangyilism@gmail.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c732f308
    • Long Li's avatar
      scsi: zero per-cmd private driver data for each MQ I/O · a168ac5b
      Long Li authored
      commit 1bad6c4a upstream.
      
      In lower layer driver's (LLD) scsi_host_template, the driver may
      optionally ask SCSI to allocate its private driver memory for each
      command, by specifying cmd_size. This memory is allocated at the end of
      scsi_cmnd by SCSI.  Later when SCSI queues a command, the LLD can use
      scsi_cmd_priv to get to its private data.
      
      Some LLD, e.g. hv_storvsc, doesn't clear its private data before use. In
      this case, the LLD may get to stale or uninitialized data in its private
      driver memory. This may result in unexpected driver and hardware
      behavior.
      
      Fix this problem by also zeroing the private driver memory before
      passing them to LLD.
      Signed-off-by: default avatarLong Li <longli@microsoft.com>
      Reviewed-by: default avatarBart Van Assche <Bart.VanAssche@sandisk.com>
      Reviewed-by: default avatarKY Srinivasan <kys@microsoft.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a168ac5b
    • Srinath Mannam's avatar
      mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read · 21f8aa4c
      Srinath Mannam authored
      commit f5f968f2 upstream.
      
      The stingray SDHCI hardware supports ACMD12 and automatically
      issues after multi block transfer completed.
      
      If ACMD12 in SDHCI is disabled, spurious tx done interrupts are seen
      on multi block read command with below error message:
      
      Got data interrupt 0x00000002 even though no data
      operation was in progress.
      
      This patch uses SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 to enable
      ACM12 support in SDHCI hardware and suppress spurious interrupt.
      Signed-off-by: default avatarSrinath Mannam <srinath.mannam@broadcom.com>
      Reviewed-by: default avatarRay Jui <ray.jui@broadcom.com>
      Reviewed-by: default avatarScott Branden <scott.branden@broadcom.com>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Fixes: b580c52d ("mmc: sdhci-iproc: add IPROC SDHCI driver")
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21f8aa4c
    • Benjamin Tissoires's avatar
      Revert "ACPI / button: Change default behavior to lid_init_state=open" · 4c5681af
      Benjamin Tissoires authored
      commit 878d8db0 upstream.
      
      Revert commit 77e9a4aa (ACPI / button: Change default behavior to
      lid_init_state=open) which changed the kernel's behavior on laptops
      that boot with closed lids and expect the lid switch state to be
      reported accurately by the kernel.
      
      If you boot or resume your laptop with the lid closed on a docking
      station while using an external monitor connected to it, both internal
      and external displays will light on, while only the external should.
      
      There is a design choice in gdm to only provide the greeter on the
      internal display when lit on, so users only see a gray area on the
      external monitor. Also, the cursor will not show up as it's by
      default on the internal display too.
      
      To "fix" that, users have to open the laptop once and close it once
      again to sync the state of the switch with the hardware state.
      
      Even if the "method" operation mode implementation can be buggy on
      some platforms, the "open" choice is worse.  It breaks docking
      stations basically and there is no way to have a user-space hwdb to
      fix that.
      
      On the contrary, it's rather easy in user-space to have a hwdb
      with the problematic platforms. Then,  libinput (1.7.0+) can fix
      the state of the lid switch for us: you need to set the udev
      property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.
      
      When libinput detects internal keyboard events, it will overwrite the
      state of the switch to open, making it reliable again.  Given that
      logind only checks the lid switch value after a timeout, we can
      assume the user will use the internal keyboard before this timeout
      expires.
      
      For example, such a hwdb entry is:
      
      libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
       LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
      
      Link: https://bugzilla.gnome.org/show_bug.cgi?id=782380Signed-off-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c5681af
    • Lv Zheng's avatar
      ACPICA: Tables: Fix regression introduced by a too early mechanism enabling · 5d0e4205
      Lv Zheng authored
      commit 2ea65321 upstream.
      
      In the Linux kernel, acpi_get_table() "clones" haven't been fully
      balanced by acpi_put_table() invocations.  In upstream ACPICA, due to
      the design change, there are also unbalanced acpi_get_table_by_index()
      invocations requiring special care.
      
      acpi_get_table() reference counting mismatches may occor due to that
      and printing error messages related to them is not useful at this
      point.  The strict balanced validation count check should only be
      enabled after confirming that all invocations are safe and aligned
      with their designed purposes.
      
      Thus this patch removes the error value returned by acpi_tb_get_table()
      in that case along with the accompanying error message to fix the
      issue.
      
      Fixes: 174cc718 (ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel)
      Reported-by: default avatarAnush Seetharaman <anush.seetharaman@intel.com>
      Reported-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d0e4205
    • Dan Williams's avatar
      ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service · 34211cbf
      Dan Williams authored
      commit 0de0e198 upstream.
      
      Reading an ACPI table through the /sys/firmware/acpi/tables interface
      more than 65,536 times leads to the following log message:
      
       ACPI Error: Table ffff88033595eaa8, Validation count is zero after increment
        (20170119/tbutils-423)
      
      ...and the table being unavailable until the next reboot. Add the
      missing acpi_put_table() so the table ->validation_count is decremented
      after each read.
      Reported-by: default avatarAnush Seetharaman <anush.seetharaman@intel.com>
      Fixes: 174cc718 "ACPICA: Tables: Back port acpi_get_table_with_size() ..."
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34211cbf
    • Vishal Verma's avatar
      acpi, nfit: Fix the memory error check in nfit_handle_mce() · 93da4e6c
      Vishal Verma authored
      commit fc08a470 upstream.
      
      The check for an MCE being a memory error in the NFIT mce handler was
      bogus. Use the new mce_is_memory_error() helper to detect the error
      properly.
      Reported-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20170519093915.15413-3-bp@alien8.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93da4e6c
    • Borislav Petkov's avatar
      x86/MCE: Export memory_error() · 9183980a
      Borislav Petkov authored
      commit 2d1f4061 upstream.
      
      Export the function which checks whether an MCE is a memory error to
      other users so that we can reuse the logic. Drop the boot_cpu_data use,
      while at it, as mce.cpuvendor already has the CPU vendor in there.
      
      Integrate a piece from a patch from Vishal Verma
      <vishal.l.verma@intel.com> to export it for modules (nfit).
      
      The main reason we're exporting it is that the nfit handler
      nfit_handle_mce() needs to detect a memory error properly before doing
      its recovery actions.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9183980a
    • Lv Zheng's avatar
      Revert "ACPI / button: Remove lid_init_state=method mode" · 8f8dca3c
      Lv Zheng authored
      commit f369fdf4 upstream.
      
      This reverts commit ecb10b69.
      
      The only expected ACPI control method lid device's usage model is
      
       1. Listen to the lid notification,
       2. Evaluate _LID after being notified by BIOS,
       3. Suspend the system (if users configure to do so) after seeing "close".
      
      It's not ensured that BIOS will notify OS after boot/resume, and
      it's not ensured that BIOS will always generate "open" event upon
      opening the lid.
      
      But there are 2 wrong usage models:
      
       1. When the lid device is responsible for suspend/resume the system,
          userspace requires to see "open" event to be paired with "close" after
          the system is resumed, or it will suspend the system again.
      
       2. When an external monitor connects to the laptop attached docks,
          userspace requires to see "close" event after the system is resumed so
          that it can determine whether the internal display should remain dark
          and the external display should be lit on.
      
      After we made default kernel behavior to be suitable for usage model 1,
      users of usage model 2 start to report regressions for such behavior
      change.
      
      Reversion of button.lid_init_state=method doesn't actually reverts to old
      default behavior as doing so can enter a regression loop, but facilitates
      users to work the reported regressions around with
      button.lid_init_state=method.
      
      Fixes: ecb10b69 (ACPI / button: Remove lid_init_state=method mode)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1430259Tested-by: default avatarSteffen Weber <steffen.weber@gmail.com>
      Tested-by: default avatarJulian Wiedmann <julian.wiedmann@jwi.name>
      Reported-by: default avatarJoachim Frieben <jfrieben@hotmail.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f8dca3c
    • Herbert Xu's avatar
      crypto: skcipher - Add missing API setkey checks · f5eef8d2
      Herbert Xu authored
      commit 9933e113 upstream.
      
      The API setkey checks for key sizes and alignment went AWOL during the
      skcipher conversion.  This patch restores them.
      
      Fixes: 4e6c3df4 ("crypto: skcipher - Add low-level skcipher...")
      Reported-by: default avatarBaozeng <sploving1@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5eef8d2
    • Sebastian Reichel's avatar
      i2c: i2c-tiny-usb: fix buffer not being DMA capable · 2da75188
      Sebastian Reichel authored
      commit 5165da59 upstream.
      
      Since v4.9 i2c-tiny-usb generates the below call trace
      and longer works, since it can't communicate with the
      USB device. The reason is, that since v4.9 the USB
      stack checks, that the buffer it should transfer is DMA
      capable. This was a requirement since v2.2 days, but it
      usually worked nevertheless.
      
      [   17.504959] ------------[ cut here ]------------
      [   17.505488] WARNING: CPU: 0 PID: 93 at drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x37c/0x570
      [   17.506545] transfer buffer not dma capable
      [   17.507022] Modules linked in:
      [   17.507370] CPU: 0 PID: 93 Comm: i2cdetect Not tainted 4.11.0-rc8+ #10
      [   17.508103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [   17.509039] Call Trace:
      [   17.509320]  ? dump_stack+0x5c/0x78
      [   17.509714]  ? __warn+0xbe/0xe0
      [   17.510073]  ? warn_slowpath_fmt+0x5a/0x80
      [   17.510532]  ? nommu_map_sg+0xb0/0xb0
      [   17.510949]  ? usb_hcd_map_urb_for_dma+0x37c/0x570
      [   17.511482]  ? usb_hcd_submit_urb+0x336/0xab0
      [   17.511976]  ? wait_for_completion_timeout+0x12f/0x1a0
      [   17.512549]  ? wait_for_completion_timeout+0x65/0x1a0
      [   17.513125]  ? usb_start_wait_urb+0x65/0x160
      [   17.513604]  ? usb_control_msg+0xdc/0x130
      [   17.514061]  ? usb_xfer+0xa4/0x2a0
      [   17.514445]  ? __i2c_transfer+0x108/0x3c0
      [   17.514899]  ? i2c_transfer+0x57/0xb0
      [   17.515310]  ? i2c_smbus_xfer_emulated+0x12f/0x590
      [   17.515851]  ? _raw_spin_unlock_irqrestore+0x11/0x20
      [   17.516408]  ? i2c_smbus_xfer+0x125/0x330
      [   17.516876]  ? i2c_smbus_xfer+0x125/0x330
      [   17.517329]  ? i2cdev_ioctl_smbus+0x1c1/0x2b0
      [   17.517824]  ? i2cdev_ioctl+0x75/0x1c0
      [   17.518248]  ? do_vfs_ioctl+0x9f/0x600
      [   17.518671]  ? vfs_write+0x144/0x190
      [   17.519078]  ? SyS_ioctl+0x74/0x80
      [   17.519463]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      [   17.519959] ---[ end trace d047c04982f5ac50 ]---
      Signed-off-by: default avatarSebastian Reichel <sebastian.reichel@collabora.co.uk>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: default avatarTill Harbaum <till@harbaum.org>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2da75188
    • Ard Biesheuvel's avatar
      drivers/tty: 8250: only call fintek_8250_probe when doing port I/O · e4bab31c
      Ard Biesheuvel authored
      commit 4c4fc909 upstream.
      
      Commit fa01e2ca ("serial: 8250: Integrate Fintek into 8250_base")
      modified the probing logic for PNP0501 devices, to remove a collision
      between the generic 16550A driver and the Fintek driver, which reused
      the same ACPI _HID.
      
      The Fintek device probe is now incorporated into the common 8250 probe
      path, and gets called for all discovered 16550A compatible devices,
      including ones that are MMIO mapped rather than IO mapped. However,
      the Fintek driver assumes the port base is a I/O address, and proceeds
      to probe some arbitrary offsets above it.
      
      This is generally a wrong thing to do, but on ARM systems (having no
      native port I/O), this may result in faulting accesses of completely
      unrelated MMIO regions in the PCI I/O space. Given that this is at
      serial probe time, this results in hard to diagnose crashes at boot.
      
      So let's restrict the Fintek probe to devices that we know are using
      port I/O in the first place.
      
      Fixes: fa01e2ca ("serial: 8250: Integrate Fintek into 8250_base")
      Suggested-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarRicardo Ribalda <ricardo.ribalda@gmail.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4bab31c
    • Johan Hovold's avatar
      serdev: fix tty-port client deregistration · 84ac7693
      Johan Hovold authored
      commit aee5da78 upstream.
      
      The port client data must be set when registering the serdev controller
      or client deregistration will fail (and the serdev devices are left
      registered and allocated) if the port was never opened in between.
      
      Make sure to clear the port client data on any probe errors to avoid a
      use-after-free when the client is later deregistered unconditionally
      (e.g. in a tty-port deregistration helper).
      
      Also move port client operation initialisation to registration. Note
      that the client ops must be restored on failed probe.
      
      Fixes: bed35c6d ("serdev: add a tty port controller driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84ac7693
    • Johan Hovold's avatar
      Revert "tty_port: register tty ports with serdev bus" · 427fa8e3
      Johan Hovold authored
      commit d3ba126a upstream.
      
      This reverts commit 8ee3fde0.
      
      The new serdev bus hooked into the tty layer in
      tty_port_register_device() by registering a serdev controller instead of
      a tty device whenever a serdev client is present, and by deregistering
      the controller in the tty-port destructor. This is broken in several
      ways:
      
      Firstly, it leads to a NULL-pointer dereference whenever a tty driver
      later deregisters its devices as no corresponding character device will
      exist.
      
      Secondly, far from every tty driver uses tty-port refcounting (e.g.
      serial core) so the serdev devices might never be deregistered or
      deallocated.
      
      Thirdly, deregistering at tty-port destruction is too late as the
      underlying device and structures may be long gone by then. A port is not
      released before an open tty device is closed, something which a
      registered serdev client can prevent from ever happening. A driver
      callback while the device is gone typically also leads to crashes.
      
      Many tty drivers even keep their ports around until the driver is
      unloaded (e.g. serial core), something which even if a late callback
      never happens, leads to leaks if a device is unbound from its driver and
      is later rebound.
      
      The right solution here is to add a new tty_port_unregister_device()
      helper and to never call tty_device_unregister() whenever the port has
      been claimed by serdev, but since this requires modifying just about
      every tty driver (and multiple subsystems) it will need to be done
      incrementally.
      
      Reverting the offending patch is the first step in fixing the broken
      lifetime assumptions. A follow-up patch will add a new pair of
      tty-device registration helpers, which a vetted tty driver can use to
      support serdev (initially serial core). When every tty driver uses the
      serdev helpers (at least for deregistration), we can add serdev
      registration to tty_port_register_device() again.
      
      Note that this also fixes another issue with serdev, which currently
      allocates and registers a serdev controller for every tty device
      registered using tty_port_device_register() only to immediately
      deregister and deallocate it when the corresponding OF node or serdev
      child node is missing. This should be addressed before enabling serdev
      for hot-pluggable buses.
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      427fa8e3
    • Jeremy Kerr's avatar
      powerpc/spufs: Fix hash faults for kernel regions · baa4d411
      Jeremy Kerr authored
      commit d75e4919 upstream.
      
      Commit ac29c640 ("powerpc/mm: Replace _PAGE_USER with
      _PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
      introduced check_pte_access() which denied kernel access to
      non-_PAGE_PRIVILEGED pages.
      
      However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
      for spufs' kernel accesses, so the DMAs required to establish SPE
      memory no longer work.
      
      This change adds _PAGE_PRIVILEGED to the hash fault handler for
      kernel accesses.
      
      Fixes: ac29c640 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
      Signed-off-by: default avatarJeremy Kerr <jk@ozlabs.org>
      Reported-by: default avatarSombat Tragolgosol <sombat3960@gmail.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baa4d411