1. 03 Feb, 2016 40 commits
    • crypto: af_alg - Disallow bind/setkey/... after accept(2) · 0571ba52
      Herbert Xu authored
      [ Upstream commit c840ac6a ]
      
      Each af_alg parent socket obtained by socket(2) corresponds to a
      tfm object once bind(2) has succeeded.  An accept(2) call on that
      parent socket creates a context which then uses the tfm object.
      
      Therefore as long as any child sockets created by accept(2) exist
      the parent socket must not be modified or freed.
      
      This patch guarantees this by using locks and a reference count
      on the parent socket.  Any attempt to modify the parent socket will
      fail with EBUSY.
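
      Seen from userspace, the new rule can be illustrated with a minimal
      sketch using the AF_ALG socket API (error handling omitted; the key and
      algorithm are arbitrary examples): once accept(2) has produced a child
      socket, a further ALG_SET_KEY on the parent is expected to fail with
      EBUSY.

        #include <stdio.h>
        #include <unistd.h>
        #include <sys/socket.h>
        #include <linux/if_alg.h>

        #ifndef SOL_ALG
        #define SOL_ALG 279
        #endif

        int main(void)
        {
                struct sockaddr_alg sa = {
                        .salg_family = AF_ALG,
                        .salg_type   = "skcipher",
                        .salg_name   = "cbc(aes)",
                };
                unsigned char key[16] = { 0 };

                int tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);   /* parent: the tfm */
                bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));
                setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)); /* ok: no children yet */

                int opfd = accept(tfmfd, NULL, 0);               /* child uses the tfm */

                /* With this patch applied, modifying the parent while a child
                 * socket exists fails with EBUSY. */
                if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)) < 0)
                        perror("setsockopt after accept");

                close(opfd);
                close(tfmfd);
                return 0;
        }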
      
      Cc: stable@vger.kernel.org
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • printk: do cond_resched() between lines while outputting to consoles · bc24ac15
      Tejun Heo authored
      [ Upstream commit 8d91f8b1 ]
      
      @console_may_schedule tracks whether console_sem was acquired through
      lock or trylock.  If the former, we're inside a sleepable context and
      console_conditional_schedule() performs cond_resched().  This allows
      console drivers which use console_lock for synchronization to yield
      while performing time-consuming operations such as scrolling.
      
      However, the actual console outputting is performed while holding
      irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
      before starting outputting lines.  Also, only a few drivers call
      console_conditional_schedule() to begin with.  This means that when a
      lot of lines need to be output by console_unlock(), for example on a
      console registration, the task doing console_unlock() may not yield for
      a long time on a non-preemptible kernel.
      
      If this happens with a slow console device, for example a serial
      console, the outputting task may occupy the CPU for a very long time,
      long enough to trigger softlockup and/or RCU stall warnings, which in
      turn pile up more messages, sometimes enough to trigger the next cycle
      of warnings and incapacitate the system.
      
      Fix it by making console_unlock() insert cond_resched() between lines if
      @console_may_schedule.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Calvin Owens <calvinowens@fb.com>
      Acked-by: Jan Kara <jack@suse.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • kernel/panic.c: turn off locks debug before releasing console lock · 0e19e24c
      Vitaly Kuznetsov authored
      [ Upstream commit 7625b3a0 ]
      
      Commit 08d78658 ("panic: release stale console lock to always get the
      logbuf printed out") introduced an unwanted bad unlock balance report when
      panic() is called directly and not from an OOPS (e.g. from out_of_memory()).
      The difference is that in the OOPS case we disable lock debugging in
      oops_enter(), while nothing does that on a direct panic() call.
      
      Fixes: 08d78658 ("panic: release stale console lock to always get the logbuf printed out")
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • panic: release stale console lock to always get the logbuf printed out · 94ecf566
      Vitaly Kuznetsov authored
      [ Upstream commit 08d78658 ]
      
      In some cases we may end up killing the CPU holding the console lock
      while still having valuable data in logbuf. E.g. I'm observing the
      following:
      
      - A crash is happening on one CPU and console_unlock() is being called on
        some other.
      
      - console_unlock() tries to print out the buffer before releasing the lock
        and on slow console it takes time.
      
      - in the meanwhile crashing CPU does lots of printk()-s with valuable data
        (which go to the logbuf) and sends IPIs to all other CPUs.
      
      - console_unlock() finishes printing the previous chunk and enables
        interrupts before trying to print out the rest; the CPU catches the
        IPI and never releases the console lock.
      
      This is not the only possible case: in VT/fb subsystems we have many other
      console_lock()/console_unlock() users.  Non-masked interrupts (or
      receiving NMI in case of extreme slowness) will have the same result.
      Getting the whole console buffer printed out on crash should be top
      priority.
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • memcg: only free spare array when readers are done · 2c641f5b
      Martijn Coenen authored
      [ Upstream commit 6611d8d7 ]
      
      A spare array holding mem cgroup threshold events is kept around to make
      sure we can always safely deregister an event and have an array to store
      the new set of events in.
      
      In the scenario where we're going from 1 to 0 registered events, the
      pointer to the primary array containing 1 event is copied to the spare
      slot, and then the spare slot is freed because no events are left.
      However, it is freed before calling synchronize_rcu(), which means
      readers may still be accessing threshold->primary after it is freed.
      
      Fixed by only freeing after synchronize_rcu().
      Signed-off-by: Martijn Coenen <maco@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: soft-offline: check return value in second __get_any_page() call · 4ae0f4a6
      Naoya Horiguchi authored
      [ Upstream commit d96b339f ]
      
      I saw the following BUG_ON triggered in a testcase where a process calls
      madvise(MADV_SOFT_OFFLINE) on THPs, while a background process repeatedly
      runs the migratepages command on the first process (ping-ponging it among
      different NUMA nodes):
      
         Soft offlining page 0x60000 at 0x700000600000
         __get_any_page: 0x60000 free buddy page
         page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
         flags: 0x1fffc0000000000()
         page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
         ------------[ cut here ]------------
         kernel BUG at /src/linux-dev/include/linux/mm.h:342!
         invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
         Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
         CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
         Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
         RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
         RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
         Call Trace:
           put_hwpoison_page+0x4e/0x80
           soft_offline_page+0x501/0x520
           SyS_madvise+0x6bc/0x6f0
           entry_SYSCALL_64_fastpath+0x12/0x6a
         Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
         RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
          RSP <ffff88007c213e00>
      
      The root cause resides in get_any_page(), which retries taking a refcount
      on the page to be soft-offlined.  This function calls put_hwpoison_page(),
      expecting that the target page has been put back on the LRU list, but the
      page can also have been freed to the buddy allocator, so the second check
      needs to handle that case.
      
      Fixes: af8fae7c ("mm/memory-failure.c: clean up soft_offline_page()")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>	[3.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • zram: try vmalloc() after kmalloc() · 7332411b
      Kyeongdon Kim authored
      [ Upstream commit d913897a ]
      
      When using LZ4 multi compression streams for zram swap, we saw page
      allocation failure messages while running tests, not just once but a
      few times (2-5 per test).  Some of the failing cases kept retrying an
      order-3 allocation.

      To set up the per-stream compression private data, we have to call
      kzalloc() with an order-2/3 allocation at runtime (lzo/lz4).  If no
      order-2/3 memory is available at that moment, the page allocation fails.
      This patch falls back to vmalloc() when kmalloc() fails, which prevents
      the page allocation failure warning.

      With this change we no longer see the warning message in testing, and it
      also reduces process startup latency by about 60-120ms in each case.
      
      For reference a call trace :
      
          Binder_1: page allocation failure: order:3, mode:0x10c0d0
          CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
          Call trace:
            dump_backtrace+0x0/0x270
            show_stack+0x10/0x1c
            dump_stack+0x1c/0x28
            warn_alloc_failed+0xfc/0x11c
            __alloc_pages_nodemask+0x724/0x7f0
            __get_free_pages+0x14/0x5c
            kmalloc_order_trace+0x38/0xd8
            zcomp_lz4_create+0x2c/0x38
            zcomp_strm_alloc+0x34/0x78
            zcomp_strm_multi_find+0x124/0x1ec
            zcomp_strm_find+0xc/0x18
            zram_bvec_rw+0x2fc/0x780
            zram_make_request+0x25c/0x2d4
            generic_make_request+0x80/0xbc
            submit_bio+0xa4/0x15c
            __swap_writepage+0x218/0x230
            swap_writepage+0x3c/0x4c
            shrink_page_list+0x51c/0x8d0
            shrink_inactive_list+0x3f8/0x60c
            shrink_lruvec+0x33c/0x4cc
            shrink_zone+0x3c/0x100
            try_to_free_pages+0x2b8/0x54c
            __alloc_pages_nodemask+0x514/0x7f0
            __get_free_pages+0x14/0x5c
            proc_info_read+0x50/0xe4
            vfs_read+0xa0/0x12c
            SyS_read+0x44/0x74
          DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
               0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
      
      [minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
      [sergey.senozhatsky@gmail.com: tweak comments and styles]
      Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • zram/zcomp: use GFP_NOIO to allocate streams · 664ecf4f
      Sergey Senozhatsky authored
      [ Upstream commit 3d5fe03a ]
      
      We can end up allocating a new compression stream with GFP_KERNEL from
      within the IO path, which may result in nested (recursive) IO
      operations.  That can introduce problems if the IO path in question is a
      reclaimer holding locks on which the nested IOs will deadlock.
      
      Allocate streams and working memory using GFP_NOIO flag, forbidding
      recursive IO and FS operations.
      
      An example:
      
        inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
        git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
        {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x8da/0x117b
           lock_acquire+0x10c/0x1a7
           start_this_handle+0x52d/0x555
           jbd2__journal_start+0xb4/0x237
           __ext4_journal_start_sb+0x108/0x17e
           ext4_dirty_inode+0x32/0x61
           __mark_inode_dirty+0x16b/0x60c
           iput+0x11e/0x274
           __dentry_kill+0x148/0x1b8
           shrink_dentry_list+0x274/0x44a
           prune_dcache_sb+0x4a/0x55
           super_cache_scan+0xfc/0x176
           shrink_slab.part.14.constprop.25+0x2a2/0x4d3
           shrink_zone+0x74/0x140
           kswapd+0x6b7/0x930
           kthread+0x107/0x10f
           ret_from_fork+0x3f/0x70
        irq event stamp: 138297
        hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
        hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
        softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
        softirqs last disabled at (137813):  irq_exit+0x41/0x95
      
                     other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0
               ----
          lock(jbd2_handle);
          <Interrupt>
            lock(jbd2_handle);
      
                      *** DEADLOCK ***
        5 locks held by git/20158:
         #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
         #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
         #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
         #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
         #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
      
                     stack backtrace:
        CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
        Call Trace:
          dump_stack+0x4c/0x6e
          mark_lock+0x384/0x56d
          mark_held_locks+0x5f/0x76
          lockdep_trace_alloc+0xb2/0xb5
          kmem_cache_alloc_trace+0x32/0x1e2
          zcomp_strm_alloc+0x25/0x73 [zram]
          zcomp_strm_multi_find+0xe7/0x173 [zram]
          zcomp_strm_find+0xc/0xe [zram]
          zram_bvec_rw+0x2ca/0x7e0 [zram]
          zram_make_request+0x1fa/0x301 [zram]
          generic_make_request+0x9c/0xdb
          submit_bio+0xf7/0x120
          ext4_io_submit+0x2e/0x43
          ext4_bio_write_page+0x1b7/0x300
          mpage_submit_page+0x60/0x77
          mpage_map_and_submit_buffers+0x10f/0x21d
          ext4_writepages+0xc8c/0xe1b
          do_writepages+0x23/0x2c
          __filemap_fdatawrite_range+0x84/0x8b
          filemap_flush+0x1c/0x1e
          ext4_alloc_da_blocks+0xb8/0x117
          ext4_rename+0x132/0x6dc
          ? mark_held_locks+0x5f/0x76
          ext4_rename2+0x29/0x2b
          vfs_rename+0x540/0x636
          SyS_renameat2+0x359/0x44d
          SyS_rename+0x1e/0x20
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
      [minchan@kernel.org: add stable mark]
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data · 39e51d96
      Ravi Bangoria authored
      [ Upstream commit 3caeaa56 ]
      
      While recording guest samples on the host using perf kvm record, an
      'unprocessable samples' error is reported, although the samples are
      recorded properly.  While generating a report using perf kvm report, no
      samples are processed and the same error is reported.  We have seen this
      behaviour with upstream perf (4.4-rc3) on x86 and ppc64 hardware.

      The reason for the failure is that fetching the machine from the rb_tree
      of machines fails.  While tracing the bug, we found that this code was
      incorrectly refactored in commit 54245fdc ("perf session: Remove
      wrappers to machines__find").

      This patch changes the behaviour so that if the machine cannot be found
      on the first attempt, a machine node is created and added to the rb_tree,
      so that subsequent lookups for the same machine succeed.  This was in
      fact the behaviour before the refactoring in the aforementioned commit.
      
      This patch is generated from acme perf/core branch.
      
      Below is an example that demonstrates the behaviour before and after
      applying the patch.
      
      Before applying patch:
      [Note: One needs to run guest before recording data in host]
      
        ravi@ravi-bangoria:~$ ./perf kvm record -a
        Warning:
        5903 unprocessable samples recorded.
        Do you have a KVM guest running and not using 'perf kvm'?
        [ perf record: Captured and wrote 1.409 MB perf.data.guest (285 samples) ]
      
        ravi@ravi-bangoria:~$ ./perf kvm report --stdio
        Warning:
        5903 unprocessable samples recorded.
        Do you have a KVM guest running and not using 'perf kvm'?
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 285  of event 'cycles'
        # Event count (approx.): 88715406
        #
        # Overhead  Command  Shared Object  Symbol
        # ........  .......  .............  ......
        #
      
        # (For a higher level overview, try: perf report --sort comm,dso)
        #
      
      After applying patch:
      
        ravi@ravi-bangoria:~$ ./perf kvm record -a
        [ perf record: Captured and wrote 1.188 MB perf.data.guest (17 samples) ]
      
        ravi@ravi-bangoria:~$ ./perf kvm report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 17  of event 'cycles'
        # Event count (approx.): 700746
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ......................
        #
            34.19%  :5758    [unknown]         [g] 0xffffffff818682ab
            22.79%  :5758    [unknown]         [g] 0xffffffff812dc7f8
            22.79%  :5758    [unknown]         [g] 0xffffffff818650d0
            14.83%  :5758    [unknown]         [g] 0xffffffff8161a1b6
             2.49%  :5758    [unknown]         [g] 0xffffffff818692bf
             0.48%  :5758    [unknown]         [g] 0xffffffff81869253
             0.05%  :5758    [unknown]         [g] 0xffffffff81869250
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # v3.19+
      Fixes: 54245fdc ("perf session: Remove wrappers to machines__find")
      Link: http://lkml.kernel.org/r/1449471302-11283-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ocfs2/dlm: ignore cleaning the migration mle that is inuse · 8467278d
      xuejiufei authored
      [ Upstream commit bef5502d ]
      
      We have found that the migration source triggers a BUG because the mle
      refcount is already zero before the put when the target goes down during
      migration.  The situation is as follows:
      
      dlm_migrate_lockres
        dlm_add_migration_mle
        dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detects that the target is down and cleans the
        migration mle. Now the refcount is 1.
      
      dlm_migrate_lockres is then woken and puts the mle twice when it finds
      that the target has gone down, which triggers the BUG with the following
      message:

        "ERROR: bad mle: ".
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • scripts/bloat-o-meter: fix python3 syntax error · c91d339d
      Sergey Senozhatsky authored
      [ Upstream commit 72214a24 ]
      
      In Python3+ print is a function so the old syntax is not correct
      anymore:
      
        $ ./scripts/bloat-o-meter vmlinux.o vmlinux.o.old
          File "./scripts/bloat-o-meter", line 61
            print "add/remove: %s/%s grow/shrink: %s/%s up/down: %s/%s (%s)" % \
                                                                           ^
        SyntaxError: invalid syntax
      
      Fix by calling print as a function.
      
      Tested on python 2.7.11, 3.5.1
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • dma-debug: switch check from _text to _stext · 68cc9774
      Laura Abbott authored
      [ Upstream commit ea535e41 ]
      
      In include/asm-generic/sections.h:
      
        /*
         * Usage guidelines:
         * _text, _data: architecture specific, don't use them in
         *               arch-independent code
         * [_stext, _etext]: contains .text.* sections, may also contain
         *                   .rodata.* and/or .init.* sections
         */
      
      _text is not guaranteed across architectures.  Architectures such as ARM
      may reuse parts which are not actually text and erroneously trigger a bug.
      Switch to using _stext which is guaranteed to contain text sections.
      
      Came out of https://lkml.kernel.org/g/<567B1176.4000106@redhat.com>
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • m32r: fix m32104ut_defconfig build fail · a5254ac0
      Sudip Mukherjee authored
      [ Upstream commit 601f1db6 ]
      
      The build of m32104ut_defconfig for the m32r arch had been failing for a
      long time with these errors:
      
        ERROR: "memory_start" [fs/udf/udf.ko] undefined!
        ERROR: "memory_end" [fs/udf/udf.ko] undefined!
        ERROR: "memory_end" [drivers/scsi/sg.ko] undefined!
        ERROR: "memory_start" [drivers/scsi/sg.ko] undefined!
        ERROR: "memory_end" [drivers/i2c/i2c-dev.ko] undefined!
        ERROR: "memory_start" [drivers/i2c/i2c-dev.ko] undefined!
      
      As done on other architectures, export the symbols to fix the errors.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • cifs_dbg() outputs an uninitialized buffer in cifs_readdir() · c5882812
      Vasily Averin authored
      [ Upstream commit 01b9b0b2 ]
      
      In some cases tmp_buf may not be filled in cifs_filldir() and can stay
      uninitialized, so printing it with the "%s" format specifier can leak the
      contents of kernel memory.  If the old content of this buffer does not
      contain a '\0', the access beyond the end of the allocated object can
      crash the host.
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Steve French <sfrench@localhost.localdomain>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • cifs: fix race between call_async() and reconnect() · 99b79b15
      Rabin Vincent authored
      [ Upstream commit 820962dc ]
      
      cifs_call_async() queues the MID to the pending list and calls
      smb_send_rqst().  If smb_send_rqst() performs a partial send, it sets
      the tcpStatus to CifsNeedReconnect and returns an error code to
      cifs_call_async().  In this case, cifs_call_async() removes the MID
      from the list and returns to the caller.
      
      However, cifs_call_async() releases the server mutex _before_ removing
      the MID.  This means that a cifs_reconnect() can race with this function
      and manage to remove the MID from the list and delete the entry before
      cifs_call_async() calls cifs_delete_mid().  This leads to various
      crashes due to the use after free in cifs_delete_mid().
      
      Task1				Task2
      
      cifs_call_async():
       - rc = -EAGAIN
       - mutex_unlock(srv_mutex)
      
      				cifs_reconnect():
      				 - mutex_lock(srv_mutex)
      				 - mutex_unlock(srv_mutex)
      				 - list_delete(mid)
      				 - mid->callback()
      				 	cifs_writev_callback():
      				 		- mutex_lock(srv_mutex)
      						- delete(mid)
      				 		- mutex_unlock(srv_mutex)
      
       - cifs_delete_mid(mid) <---- use after free
      
      Fix this by removing the MID in cifs_call_async() before releasing the
      srv_mutex.  Also hold the srv_mutex in cifs_reconnect() until the MIDs
      are moved out of the pending list.
      Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
      Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <sfrench@localhost.localdomain>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • cifs: Ratelimit kernel log messages · ca7342a8
      Jamie Bainbridge authored
      [ Upstream commit ec7147a9 ]
      
      Under some conditions, CIFS can repeatedly call the cifs_dbg() logging
      wrapper. If this is done rapidly enough, the console framebuffer can soft
      lockup or report an "rcu_sched self-detected stall". Apply the built-in
      log ratelimiters to prevent such hangs.
      Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • sparc64: fix incorrect sign extension in sys_sparc64_personality · 5301c064
      Dmitry V. Levin authored
      [ Upstream commit 525fd5a9 ]
      
      The value returned by sys_personality has type "long int".
      It is saved to a variable of type "int", which is not a problem
      yet because the type of task_struct->personality is "unsigned int".
      The problem is the sign extension from "int" to "long int"
      that happens on return from sys_sparc64_personality.
      
      For example, a userspace call to personality((unsigned) -EINVAL) will
      result in any subsequent personality call, including the absolutely
      harmless read-only personality(0xffffffff) call, failing with
      errno set to EINVAL.
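
      A minimal userspace illustration of that scenario (a hypothetical test
      program, not part of the patch):

        #include <stdio.h>
        #include <errno.h>
        #include <sys/personality.h>

        int main(void)
        {
                /* Set a persona whose low 32 bits look like -EINVAL. */
                personality((unsigned int)-22);

                /* Harmless read-only query; on an affected sparc64 kernel the
                 * sign-extended return value is reported as a failure with
                 * errno set to EINVAL. */
                errno = 0;
                int cur = personality(0xffffffff);
                printf("personality(0xffffffff) = %#x, errno = %d\n", cur, errno);
                return 0;
        }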
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mmc: core: Enable tuning according to the actual timing · 2a048a24
      Carlo Caione authored
      [ Upstream commit e10c3219 ]
      
      While in sdhci_execute_tuning() the choice whether or not to enable
      tuning is based on the actual timing, in mmc_sdio_init_uhs_card() the
      check is based on the capability of the card.

      This difference causes issues with some SDIO cards in DDR50 mode, where
      CMD19 is wrongly issued.
      
      With this patch we modify the check in both
      mmc_(sd|sdio)_init_uhs_card() functions to take the proper decision
      only according to the actual timing specification.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Carlo Caione <carlo@endlessm.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mmc: core: enable CMD19 tuning for DDR50 mode · 6e1e9bd0
      Weijun Yang authored
      [ Upstream commit 4324f6de ]
      
      As SD Specifications Part1 Physical Layer Specification Version
      3.01 says, CMD19 tuning is available for unlocked cards in transfer
      state of 1.8V signaling mode. The small difference between v3.00
      and 3.01 spec means that CMD19 tuning is also available for DDR50
      mode.
      Signed-off-by: Weijun Yang <york.yang@csr.com>
      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mmc: mmci: fix an ages old detection error · 8a55af06
      Linus Walleij authored
      [ Upstream commit 0bcb7efd ]
      
      commit 4956e109 ("ARM: 6244/1: mmci: add variant data and default
      MCICLOCK support") added variant data for ARM, U300 and Ux500 variants.
      The Nomadik NHK8815/8820 variant was erroneously labeled as a U300
      variant, and when the proper Nomadik variant was later introduced in
      commit 34fd4213 ("ARM: 7378/1: mmci: add support for the Nomadik MMCI
      variant") this was not fixed. Let's say this fixes the latter commit, as
      there was no proper Nomadik support until then.
      
      Cc: stable@vger.kernel.org
      Fixes: 34fd4213 ("ARM: 7378/1: mmci: add support for the Nomadik...")
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • dmaengine: dw: fix cyclic transfer callbacks · 66eba3da
      Mans Rullgard authored
      [ Upstream commit 2895b2ca ]
      
      Cyclic transfer callbacks rely on block completion interrupts which were
      disabled in commit ff7b05f2 ("dmaengine/dw_dmac: Don't handle block
      interrupts").  This re-enables block interrupts so the cyclic callbacks
      can work.  Other transfer types are not affected as they set the INT_EN
      bit only on the last block.
      
      Fixes: ff7b05f2 ("dmaengine/dw_dmac: Don't handle block interrupts")
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • dmaengine: dw: fix cyclic transfer setup · 13aedd78
      Mans Rullgard authored
      [ Upstream commit df3bb8a0 ]
      
      Commit 61e183f8 ("dmaengine/dw_dmac: Reconfigure interrupt and
      chan_cfg register on resume") moved some channel initialisation to
      a new function which must be called before starting a transfer.
      
      This updates dw_dma_cyclic_start() to use dwc_dostart() like the other
      modes, thus ensuring dwc_initialize() gets called and removing some code
      duplication.
      
      Fixes: 61e183f8 ("dmaengine/dw_dmac: Reconfigure interrupt and chan_cfg register on resume")
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • KVM: PPC: Fix ONE_REG AltiVec support · 064ee908
      Greg Kurz authored
      [ Upstream commit b4d7f161 ]
      
      The get and set operations got exchanged by mistake when moving the
      code from book3s.c to powerpc.c.
      
      Fixes: 3840edc8
      Cc: stable@vger.kernel.org # 3.18+
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • drm/i915: Restore inhibiting the load of the default context · 12060a1c
      Chris Wilson authored
      [ Upstream commit 06ef83a7 ]
      
      Following a GPU reset, we may leave the context in a poorly defined
      state, and reloading from that context will leave the GPU flummoxed. For
      secondary contexts, this will lead to that context being banned - but
      currently it is also causing the default context to become banned,
      leading to turmoil in the shared state.
      
      This is a regression from
      
      commit 6702cf16 [v4.1]
      Author: Ben Widawsky <benjamin.widawsky@intel.com>
      Date:   Mon Mar 16 16:00:58 2015 +0000
      
          drm/i915: Initialize all contexts
      
      which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
      default context.
      
      v2: Mark the global default context as uninitialized on GPU reset so
      that the context-local workarounds are reloaded upon re-enabling.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michel Thierry <michel.thierry@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1448630935-27377-1-git-send-email-chris@chris-wilson.co.uk
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: stable@vger.kernel.org
      [danvet: This seems to fix a gpu hang on after the first resume,
      resulting in any future suspend operation failing with -EIO because
      the gpu seems to be in a funky state. Somehow this patch fixes that.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      (cherry picked from commit 42f1cae8)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • parisc: Fix __ARCH_SI_PREAMBLE_SIZE · 35a317f4
      Helge Deller authored
      [ Upstream commit e60fc5aa ]
      
      On a 64bit kernel build the compiler aligns the _sifields union in the
      struct siginfo_t on a 64bit address. The __ARCH_SI_PREAMBLE_SIZE define
      compensates for this alignment and thus fixes the wait testcase of the
      strace package.
      
      The symptom of a wrong __ARCH_SI_PREAMBLE_SIZE value is that the
      _sigchld.si_stime field is not copied, and thus has uninitialized values
      after a copy_siginfo().
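
      The alignment effect can be seen with a toy layout (an illustrative
      struct only, not the real siginfo_t): on a 64-bit build the union below
      is 8-byte aligned, so the preamble in front of it is 4 ints, not 3.

        #include <stdio.h>
        #include <stddef.h>

        struct toy_siginfo {
                int si_signo;
                int si_errno;
                int si_code;
                union {
                        long _align;                          /* forces 8-byte alignment */
                        struct { int si_pid; long si_stime; } _sigchld;
                } _sifields;
        };

        int main(void)
        {
                /* Prints 16 (4 ints) on a 64-bit build, 12 (3 ints) on 32-bit. */
                printf("_sifields offset = %zu (%zu ints of preamble)\n",
                       offsetof(struct toy_siginfo, _sifields),
                       offsetof(struct toy_siginfo, _sifields) / sizeof(int));
                return 0;
        }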
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • virtio_balloon: fix race between migration and ballooning · c1a7154c
      Minchan Kim authored
      [ Upstream commit 21ea9fb6 ]
      
      In balloon_page_dequeue, pages_lock should cover the whole loop
      (i.e. the list_for_each_entry_safe). Otherwise, the cursor page could
      be isolated by compaction, and the list_del done by isolation could
      poison page->lru.{prev,next}, so the loop could finally access a wrong
      address like this. This patch fixes the bug.
      
      general protection fault: 0000 [#1] SMP
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
      RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
      RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
      RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
      RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
      RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
      R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
      R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
      FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
      Stack:
       0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
       0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
       ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
      Call Trace:
       [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
       [<ffffffff812c8bc7>] balloon+0x217/0x2a0
       [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
       [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
       [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
       [<ffffffff8105b6e9>] kthread+0xc9/0xe0
       [<ffffffff8105b620>] ? kthread_park+0x60/0x60
       [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
       [<ffffffff8105b620>] ? kthread_park+0x60/0x60
      Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
      RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
       RSP <ffff8800a7fefdc0>
      ---[ end trace 43cf28060d708d5f ]---
      Kernel panic - not syncing: Fatal exception
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • virtio_balloon: fix race by fill and leak · 3bbe9868
      Minchan Kim authored
      [ Upstream commit f68b992b ]
      
      While working on compaction, I encountered a bug with ballooning.

      With repeated inflate and deflate cycles, guest memory (i.e.
      MemTotal in /proc/meminfo) decreases and is never recovered.

      The reason is that balloon_lock doesn't cover release_pages_balloon,
      so struct virtio_balloon fields can be overwritten by a racing
      fill_balloon (e.g. vb->*pfns can be critical).

      This patch fixes it in my testing.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • virtio_ballon: change stub of release_pages_by_pfn · 7daaedba
      Denis V. Lunev authored
      [ Upstream commit b4d34037 ]
      
      and rename it to release_pages_balloon. The function originally took
      arrays of pfns and now it takes a pointer to struct virtio_balloon.
      This change is necessary to conditionally call adjust_managed_page_count
      in the next patch.
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • Input: elantech - mark protocols v2 and v3 as semi-mt · 1fc9a4dd
      Benjamin Tissoires authored
      [ Upstream commit 6544a1df ]
      
      When using protocol v2 or v3 hardware, elantech uses the function
      elantech_report_semi_mt_data() to report data. These devices are rather
      creepy because if num_finger is 3, (x2,y2) is (0,0). Yes, only one valid
      touch is reported.
      
      Anyway, userspace (libinput) is now confused by these (0,0) touches,
      and detect them as palm, and rejects them.
      
      Commit 3c0213d1 ("Input: elantech - fix semi-mt protocol for v3 HW")
      was sufficient for xf86-input-synaptics, and for libinput before it gained
      palm rejection. Now we need to actually tell libinput that this device is
      a semi-mt one and that it should not rely on the actual values of the two
      touches.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • clocksource/drivers/vt8500: Increase the minimum delta · 27853751
      Roman Volkov authored
      [ Upstream commit f9eccf24 ]
      
      The vt8500 clocksource driver declares itself as capable of handling a
      minimum delay of 4 cycles by passing that value into
      clockevents_config_and_register(). However, vt8500_timer_set_next_event()
      requires the passed cycles value to be at least 16. The impact is that
      userspace hangs in nanosleep() calls with small delay intervals.
      
      This problem is reproducible in Linux 4.2 starting from:
      c6eb3f70 ('hrtimer: Get rid of hrtimer softirq')
      
      From Russell King, more detailed explanation:
      
      "It's a speciality of the StrongARM/PXA hardware. It takes a certain
      number of OSCR cycles for the value written to hit the compare registers.
      So, if a very small delta is written (eg, the compare register is written
      with a value of OSCR + 1), the OSCR will have incremented past this value
      before it hits the underlying hardware. The result is, that you end up
      waiting a very long time for the OSCR to wrap before the event fires.
      
      So, we introduce a check in set_next_event() to detect this and return
      -ETIME if the calculated delta is too small, which causes the generic
      clockevents code to retry after adding the min_delta specified in
      clockevents_config_and_register() to the current time value.
      
      min_delta must be sufficient that we don't re-trip the -ETIME check - if
      we do, we will return -ETIME, forward the next event time, try to set it,
      return -ETIME again, and basically lock the system up. So, min_delta
      must be larger than the check inside set_next_event(). A factor of two
      was chosen to ensure that this situation would never occur.
      
      The PXA code worked on PXA systems for years, and I'd suggest no one
      changes this mechanism without access to a wide range of PXA systems,
      otherwise they're risking breakage."
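
      The retry interaction can be sketched as a tiny self-contained
      simulation (hypothetical numbers and function names, not the real
      clockevents code): with min_delta below the hardware threshold the
      retry loop never terminates, while a factor-of-two margin always
      satisfies the check.

        #include <stdio.h>
        #include <errno.h>

        #define HW_MIN_CYCLES 16     /* threshold checked in set_next_event() */

        static int set_next_event(unsigned long delta)
        {
                return delta < HW_MIN_CYCLES ? -ETIME : 0;
        }

        /* Simplified model of the generic retry: on -ETIME, push the event
         * min_delta cycles ahead and try again. */
        static int program_event(unsigned long delta, unsigned long min_delta)
        {
                int tries = 0;

                while (set_next_event(delta) == -ETIME) {
                        delta = min_delta;
                        if (++tries > 10)
                                return -1;   /* would retry forever on real hardware */
                }
                return 0;
        }

        int main(void)
        {
                printf("min_delta=4:  %s\n", program_event(4, 4)  ? "stuck" : "ok");
                printf("min_delta=32: %s\n", program_event(4, 32) ? "stuck" : "ok");
                return 0;
        }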
      
      Cc: stable@vger.kernel.org
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: Alexey Charkov <alchark@gmail.com>
      Signed-off-by: Roman Volkov <rvolkov@v1ros.org>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • xfs: handle dquot buffer readahead in log recovery correctly · e52b0623
      Dave Chinner authored
      [ Upstream commit 7d6a13f0 ]
      
      When we do dquot readahead in log recovery, we do not use a verifier
      as the underlying buffer may not have dquots in it. e.g. the
      allocation operation hasn't yet been replayed. Hence we do not want
      to fail recovery because we detect an operation to be replayed has
      not been run yet. This problem was addressed for inodes in commit
      d8914002 ("xfs: inode buffers may not be valid during recovery
      readahead") but the problem was not recognised to exist for dquots
      and their buffers as the dquot readahead did not have a verifier.
      
      The result of not using a verifier is that when the buffer is then
      next read to replay a dquot modification, the dquot buffer verifier
      will only be attached to the buffer if *readahead is not complete*.
      Hence we can read the buffer, replay the dquot changes and then add
      it to the delwri submission list without it having a verifier
      attached to it. This then generates warnings in xfs_buf_ioapply(),
      which catches and warns about this case.
      
      Fix this and make it handle the same readahead verifier error cases
      as for inode buffers by adding a new readahead verifier that has a
      write operation as well as a read operation that marks the buffer as
      not done if any corruption is detected.  Also make sure we don't run
      readahead if the dquot buffer has been marked as cancelled by
      recovery.
      
      This will result in readahead either succeeding and the buffer
      having a valid write verifier, or readahead failing and the buffer
      state requiring the subsequent read to resubmit the IO with the new
      verifier.  In either case, this will result in the buffer always
      ending up with a valid write verifier on it.
      
      Note: we also need to fix the inode buffer readahead error handling
      to mark the buffer with EIO. Brian noticed the code I copied from
      there wrong during review, so fix it at the same time. Add comments
      linking the two functions that handle readahead verifier errors
      together so we don't forget this behavioural link in future.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • xfs: inode recovery readahead can race with inode buffer creation · 848235ac
      Dave Chinner authored
      [ Upstream commit b79f4a1c ]
      
      When we do inode readahead in log recovery, we can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification (i.e. setting b_error = -EIO at
      the same time as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned can not be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • s390: fix normalization bug in exception table sorting · 2826e094
      Ard Biesheuvel authored
      [ Upstream commit bcb7825a ]
      
      The normalization pass in the sorting routine of the relative exception
      table serves two purposes:
      - it ensures that the address fields of the exception table entries are
        fully ordered, so that no ambiguities arise between entries with
        identical instruction offsets (i.e., when two instructions that are
        exactly 8 bytes apart each have an exception table entry associated with
        them)
      - it ensures that the offsets of both the instruction and the fixup fields
        of each entry are relative to their final location after sorting.
      
      Commit eb608fb3 ("s390/exceptions: switch to relative exception table
      entries") ported the relative exception table format from x86, but modified
      the sorting routine to only normalize the instruction offset field and not
      the fixup offset field. The result is that the fixup offset of each entry
      will be relative to the original location of the entry before sorting,
      likely leading to crashes when those entries are dereferenced.
      
      Fixes: eb608fb3 ("s390/exceptions: switch to relative exception table entries")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • drm/nouveau/kms: take mode_config mutex in connector hotplug path · ccafc9df
      Ben Skeggs authored
      [ Upstream commit 0a882cad ]
      
      fdo#93634
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • uml: flush stdout before forking · 498b3468
      Vegard Nossum authored
      [ Upstream commit 0754fb29 ]
      
      I was seeing some really weird behaviour where piping UML's output
      somewhere would cause output to get duplicated:
      
        $ ./vmlinux | head -n 40
        Checking that ptrace can change system call numbers...Core dump limits :
                soft - 0
                hard - NONE
        OK
        Checking syscall emulation patch for ptrace...Core dump limits :
                soft - 0
                hard - NONE
        OK
        Checking advanced syscall emulation patch for ptrace...Core dump limits :
                soft - 0
                hard - NONE
        OK
        Core dump limits :
                soft - 0
                hard - NONE
      
      This is because these tests do a fork() which duplicates the non-empty
      stdout buffer, then glibc flushes the duplicated buffer as each child
      exits.
      
      A simple workaround is to flush before forking.
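      
      The effect is easy to reproduce outside UML with a stand-alone program
      (sketch below, not UML code): when stdout is piped it is fully
      buffered, so unflushed printf() output is copied into the child by
      fork() and written out a second time when the child exits; flushing
      before the fork avoids the duplication.
      
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/wait.h>
        #include <unistd.h>
      
        int main(void)
        {
                printf("Checking something...");  /* stays buffered when stdout is a pipe */
      
                fflush(stdout);                   /* the workaround: comment this out and
                                                   * "./demo | cat" prints the line twice */
                pid_t pid = fork();
                if (pid == 0)
                        exit(0);                  /* child exit flushes its inherited copy */
      
                waitpid(pid, NULL, 0);
                printf("OK\n");
                return 0;
        }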
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      498b3468
    • Vegard Nossum's avatar
      uml: fix hostfs mknod() · 4e623240
      Vegard Nossum authored
      [ Upstream commit 9f2dfda2 ]
      
      An inverted return value check in hostfs_mknod() caused the function
      to return success even though it had already handled the call as an
      error (and cleaned up).
      
      It resulted in the following segfault when trying to bind() a named
      unix socket:
      
        Pid: 198, comm: a.out Not tainted 4.4.0-rc4
        RIP: 0033:[<0000000061077df6>]
        RSP: 00000000daae5d60  EFLAGS: 00010202
        RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208
        RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600
        RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000
        R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000
        R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88
        Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6
        CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1
        Stack:
         e027d620 dfc54208 0000006f da981398
         61bee000 0000c1ed daae5de0 0000006e
         e027d620 dfcd4208 00000005 6092a460
        Call Trace:
         [<60dedc67>] SyS_bind+0xf7/0x110
         [<600587be>] handle_syscall+0x7e/0x80
         [<60066ad7>] userspace+0x3e7/0x4e0
         [<6006321f>] ? save_registers+0x1f/0x40
         [<6006c88e>] ? arch_prctl+0x1be/0x1f0
         [<60054985>] fork_handler+0x85/0x90
      
      Let's also get rid of the "cosmic ray protection" while we're at it.
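      
      For illustration only (this is not the hostfs_mknod() code, and
      do_mknod()/cleanup()/finish() are stand-in names), the bug class looks
      like this: the inverted test routes the success case through the error
      cleanup path, which then still reports success to the caller:
      
        err = do_mknod(name, mode, dev);   /* 0 on success, -errno on failure     */
        if (!err)                          /* BUG: inverted; should be "if (err)"  */
                goto out_cleanup;          /* success is torn down here ...        */
      
        return finish(inode, name);        /* ... and failure is "finished" here   */
      
       out_cleanup:
        cleanup(inode, name);
        return err;                        /* err == 0, so the caller sees success */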
      
      Fixes: e9193059 "hostfs: fix races in dentry_name() and inode_name()"
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4e623240
    • Mikulas Patocka's avatar
      dm snapshot: fix hung bios when copy error occurs · 27f8627c
      Mikulas Patocka authored
      [ Upstream commit 385277bf ]
      
      When there is an error copying a chunk dm-snapshot can incorrectly hold
      associated bios indefinitely, resulting in hung IO.
      
      The function copy_callback sets pe->error if there was an error copying
      the chunk, and then calls complete_exception.  complete_exception calls
      pending_complete on error, otherwise it calls commit_exception with
      commit_callback (and commit_callback calls pending_complete).
      
      The persistent exception store (dm-snap-persistent.c) assumes that calls
      to prepare_exception and commit_exception are paired.
      persistent_prepare_exception increases ps->pending_count and
      persistent_commit_exception decreases it.
      
      If there is a copy error, persistent_prepare_exception is called but
      persistent_commit_exception is not.  This results in the variable
      ps->pending_count never returning to zero and that causes some pending
      exceptions (and their associated bios) to be held forever.
      
      Fix this by unconditionally calling commit_exception regardless of
      whether the copy was successful.  A new "valid" parameter is added to
      commit_exception -- when the copy fails this parameter is set to zero so
      that the chunk that failed to copy (and all following chunks) is not
      recorded in the snapshot store.  Also, remove commit_callback now that
      it is merely a wrapper around pending_complete.
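      
      A rough sketch of the pairing contract and the fix (types and names
      simplified; the real functions in dm-snap-persistent.c and dm-snap.c
      take more arguments):
      
        struct pstore { int pending_count; int valid; };
        struct pending_exception { struct pstore *store; int error; };
      
        extern void flush_pending_exceptions(struct pstore *ps);
        extern void pending_complete(struct pending_exception *pe, int success);
      
        static void prepare_exception(struct pstore *ps)
        {
                ps->pending_count++;            /* must be matched by a commit */
        }
      
        static void commit_exception(struct pstore *ps, int valid)
        {
                if (!valid)
                        ps->valid = 0;          /* don't record the failed chunk
                                                 * (or anything after it)       */
                if (!--ps->pending_count)
                        flush_pending_exceptions(ps);   /* release queued bios */
        }
      
        static void complete_exception(struct pending_exception *pe)
        {
                /* The fix: always commit, passing !pe->error as "valid",
                 * instead of skipping the commit when the copy failed.   */
                commit_exception(pe->store, !pe->error);
                pending_complete(pe, !pe->error);
        }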
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      27f8627c
    • Mike Christie's avatar
      scsi: add Synology to 1024 sector blacklist · 9d8246cd
      Mike Christie authored
      [ Upstream commit 9055082f ]
      
      Another iSCSI target that cannot handle large IOs, but does not tell us
      a limit.
      
      The Synology iSCSI targets report:
      
      Block limits VPD page (SBC):
        Write same no zero (WSNZ): 0
        Maximum compare and write length: 0 blocks
        Optimal transfer length granularity: 0 blocks
        Maximum transfer length: 0 blocks
        Optimal transfer length: 0 blocks
        Maximum prefetch length: 0 blocks
        Maximum unmap LBA count: 0
        Maximum unmap block descriptor count: 0
        Optimal unmap granularity: 0
        Unmap granularity alignment valid: 0
        Unmap granularity alignment: 0
        Maximum write same length: 0x0 blocks
      
      and the size of the command it can handle seems to depend on how much
      memory it can allocate at the time. This results in IO errors when
      handling large IOs. This patch just has us use the old default of 1024
      sectors for this target by adding it to the SCSI blacklist. We do not
      have good contacts with this vendor, so I have not been able to get it
      fixed on their side.
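      
      For reference, the change amounts to one more row in the static
      blacklist in drivers/scsi/scsi_devinfo.c; the vendor/model strings
      below are a best guess for illustration, not a quote from the patch:
      
        /* BLIST_MAX_1024 caps the transfer size at the old 1024-sector default
         * for any device whose INQUIRY vendor/model matches the entry. */
        {"SYNOLOGY", "iSCSI Storage", NULL, BLIST_MAX_1024},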
      
      I posted this a long while back, but it was not merged. This version
      just resolves the merge/patch failures in the original version.
      Reported-by: default avatarAncoron Luciferis <ancoron.luciferis@googlemail.com>
      Reported-by: default avatarMichael Meyers <steltek@tcnnet.com>
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Cc: <stable@vger.kernel.org> # 4.1+
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9d8246cd
    • Jeff Layton's avatar
      locks: fix unlock when fcntl_setlk races with a close · 2e21928b
      Jeff Layton authored
      [ Upstream commit 7f3697e2 ]
      
      Dmitry reported that he was able to reproduce the WARN_ON_ONCE that
      fires in locks_free_lock_context when the flc_posix list isn't empty.
      
      The problem turns out to be that we're basically rebuilding the
      file_lock from scratch in fcntl_setlk when we discover that the setlk
      has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END,
      then we may end up with fl_start and fl_end values that differ from
      when the lock was initially set, if the file position or length of the
      file has changed in the interim.
      
      Fix this by just reusing the same lock request structure and simply
      overriding its fl_type value with F_UNLCK as appropriate. That ensures
      that we really are unlocking the lock that was initially set.
      
      While we're there, make sure that we do pop a WARN_ON_ONCE if the
      removal ever fails. Also return -EBADF in this event, since that's
      what we would have returned if the close had happened earlier.
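      
      Roughly, the fixed tail of fcntl_setlk() looks like the sketch below
      (simplified; the names approximate the fs/locks.c helpers): the request
      that just took the lock is reused verbatim, and only its type is
      flipped to F_UNLCK:
      
        if (!error && file_lock->fl_type != F_UNLCK) {
                struct file *f;
      
                /* Did a racing close() drop the fd we just locked through? */
                spin_lock(&current->files->file_lock);
                f = fcheck(fd);
                spin_unlock(&current->files->file_lock);
      
                if (f != filp) {
                        /* Same fl_start/fl_end as the lock we just set --
                         * only the type changes, so we unlock exactly the
                         * range we locked. */
                        file_lock->fl_type = F_UNLCK;
                        error = do_lock_file_wait(filp, cmd, file_lock);
                        WARN_ON_ONCE(error);
                        error = -EBADF;
                }
        }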
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Fixes: c293621b (stale POSIX lock handling)
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Acked-by: default avatar"J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2e21928b
    • Emmanuel Grumbach's avatar
      iwlwifi: pcie: properly configure the debug buffer size for 8000 · d9d42b67
      Emmanuel Grumbach authored
      [ Upstream commit 62d7476d ]
      
      The 8000 device family has a new debug engine that needs to be
      configured differently from the 7000's.
      The debug engine's DMA works in chunks of memory and the
      size of the buffer really means the start of the last
      chunk. Since one chunk is 256-byte long, we should
      configure the device to write to buffer_size - 256.
      This fixes a situation where the device would write to memory it is
      not allowed to access.
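      
      In other words (the register and helper names below are made up for
      illustration, not the driver's actual API), the programmed end address
      has to be pulled in by one 256-byte chunk:
      
        #define MON_DMA_CHUNK_SIZE      256
      
        /* Program the start of the last chunk the engine may write,
         * not the first byte past the buffer. */
        end_addr = buf_base_addr + buf_size - MON_DMA_CHUNK_SIZE;
        write_debug_reg(trans, DBG_BUFF_END_ADDR, end_addr);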
      
      CC: <stable@vger.kernel.org> [4.1+]
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d9d42b67