31 Mar, 2016 30 commits
  30 Mar, 2016 10 commits
    • md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list · f29cb2b6
      NeilBrown authored
      commit 550da24f upstream.
      
      break_stripe_batch_list breaks up a batch and copies some flags from
      the batch head to the members, preserving others.
      
      It doesn't preserve or copy STRIPE_PREREAD_ACTIVE.  This is not
      normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
      stripe_head is added to a batch, and is not set on stripe_heads
      already in a batch.
      
      However there is no locking to ensure one thread doesn't set the flag
      after it has just been cleared in another.  This does occasionally happen.
      
      md/raid5 maintains a count of the number of stripe_heads with
      STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes.  When
      break_stripe_batch_list inadvertently clears STRIPE_PREREAD_ACTIVE,
      this count becomes incorrect and will never again return to zero.
      
      md/raid5 delays the handling of some stripe_heads until
      preread_active_stripes becomes zero.  So when the above-mentioned race
      happens, those stripe_heads become blocked and never progress,
      resulting in writes to the array hanging.
      
      So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
      in the members of a batch.
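
      In the spirit of drivers/md/raid5.c, a minimal sketch of the
      preserved-flag handling (the real function copies and clears several
      more flags; FLAGS_COPIED_FROM_HEAD is a hypothetical mask standing in
      for them):

        /* Sketch: when splitting a batch, keep each member's own
         * STRIPE_PREREAD_ACTIVE bit so conf->preread_active_stripes,
         * which counts those bits, stays balanced. */
        struct stripe_head *sh, *next;

        list_for_each_entry_safe(sh, next, &head_sh->batch_list, batch_list) {
                unsigned long preread = sh->state & (1UL << STRIPE_PREREAD_ACTIVE);

                list_del_init(&sh->batch_list);
                /* copy selected flags from the head, never touching the
                 * member's preread bit */
                sh->state = (head_sh->state & FLAGS_COPIED_FROM_HEAD) | preread;
                /* ... hand sh back to normal handling ... */
        }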
      
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
      URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
      URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz
      Reported-by: Martin Svec <martin.svec@zoner.cz> (and others)
      Tested-by: Tom Weber <linux@junkyard.4t2.com>
      Fixes: 1b956f7a ("md/raid5: be more selective about distributing flags across batch.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • be2iscsi: set the boot_kset pointer to NULL in case of failure · 8f6b413c
      Maurizio Lombardi authored
      commit 84bd6499 upstream.
      
      In beiscsi_setup_boot_info(), the boot_kset pointer should be set to
      NULL in case of failure; otherwise an invalid pointer dereference may
      occur later.
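
      A sketch of the intended error path (boot_attr_group and the "boot"
      kset name are illustrative; the real function creates more sysfs
      objects):

        /* Sketch: on failure, unregister the kset and reset the pointer so
         * later teardown code testing phba->boot_kset cannot chase a stale
         * pointer. */
        phba->boot_kset = kset_create_and_add("boot", NULL,
                                              &phba->shost->shost_dev.kobj);
        if (!phba->boot_kset)
                return -ENOMEM;

        if (sysfs_create_group(&phba->boot_kset->kobj, &boot_attr_group))
                goto free_kset;
        return 0;

        free_kset:
                kset_unregister(phba->boot_kset);
                phba->boot_kset = NULL;  /* the fix: no dangling pointer */
                return -ENOMEM;
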
      Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs · 5e731d13
      Bjorn Helgaas authored
      commit b8941571 upstream.
      
      The Home Agent and PCU PCI devices in Broadwell-EP have a non-BAR register
      where a BAR should be.  We don't know what the side effects of sizing the
      "BAR" would be, and we don't know what address space the "BAR" might appear
      to describe.
      
      Mark these devices as having non-compliant BARs so the PCI core doesn't
      touch them.
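
      Such quirks are typically wired up as early fixups that set
      dev->non_compliant_bars before the core sizes the BARs; a sketch
      follows (the device IDs are illustrative, not taken from the patch):

        #include <linux/pci.h>

        /* Sketch: mark the device before the PCI core reads/sizes its BARs. */
        static void pci_bdwep_bar(struct pci_dev *dev)
        {
                dev->non_compliant_bars = 1;
        }
        /* Illustrative Broadwell-EP Home Agent / PCU device IDs */
        DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6f60, pci_bdwep_bar);
        DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, pci_bdwep_bar);
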
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Tested-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • bcache: fix cache_set_flush() NULL pointer dereference on OOM · bbe44fdc
      Eric Wheeler authored
      commit f8b11260 upstream.
      
      When bch_cache_set_alloc() fails to kzalloc the cache_set, the
      asynchronous closure handling tries to dereference a cache_set that
      hadn't yet been allocated inside of cache_set_flush() which is called
      by __cache_set_unregister() during cleanup.  This appears to happen only
      during an OOM condition on bcache_register.
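
      A simplified sketch of the guard (teardown body elided):

        /* Sketch: cache_set_flush() runs from the closure machinery during
         * __cache_set_unregister(); bail out early if the cache_set never
         * finished allocating instead of dereferencing unset state. */
        static void cache_set_flush(struct closure *cl)
        {
                struct cache_set *c = container_of(cl, struct cache_set, caching);

                if (!c)
                        return;         /* bch_cache_set_alloc() failed */

                /* ... flush btree nodes, release caches, etc. ... */
        }
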
      Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • bcache: cleaned up error handling around register_cache() · b82c6a96
      Eric Wheeler authored
      commit 9b299728 upstream.
      
      Fix a NULL pointer dereference by changing register_cache() to return
      an int instead of void.  This allows it to return -ENOMEM or -ENODEV
      and enables the upper layers to handle the OOM case without NULL
      pointer issues.
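
      A sketch of the reworked flow (simplified; cache_alloc() stands in
      for the step that fails under OOM):

        /* Sketch: register_cache() now propagates errors instead of
         * swallowing them. */
        static int register_cache(struct cache_sb *sb, struct page *sb_page,
                                  struct block_device *bdev, struct cache *ca)
        {
                int ret = cache_alloc(sb, ca);  /* may return -ENOMEM */

                if (ret)
                        return ret;
                /* ... kobject setup, register_cache_set(), ... */
                return 0;
        }

        /* The caller in register_bcache() can then report and unwind: */
        if (register_cache(sb, sb_page, bdev, ca) != 0)
                goto err;  /* logs "error opening %s: ..." and cleans up */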
      
      See this thread:
        http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3521
      
      Fixes this error:
        gargamel:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
      
        bcache: register_cache() error opening sdh2: cannot allocate memory
        BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
        IP: [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]
        PGD 120dff067 PUD 1119a3067 PMD 0
        Oops: 0000 [#1] SMP
        Modules linked in: veth ip6table_filter ip6_tables
        (...)
        CPU: 4 PID: 3371 Comm: kworker/4:3 Not tainted 4.4.2-amd64-i915-volpreempt-20160213bc1 #3
        Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
        Workqueue: events cache_set_flush [bcache]
        task: ffff88020d5dc280 ti: ffff88020b6f8000 task.ti: ffff88020b6f8000
        RIP: 0010:[<ffffffffc05a7e8d>]  [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]
      Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Tested-by: Marc MERLIN <marc@merlins.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • bcache: fix race of writeback thread starting before complete initialization · cbea8298
      Eric Wheeler authored
      commit 07cc6ef8 upstream.
      
      The bch_writeback_thread might BUG_ON in read_dirty() if
      dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed
      its related initialization.  This patch holds dc->writeback_lock
      (down_write) until initialization is complete, thus preventing
      bch_writeback_thread from proceeding prematurely.
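
      A sketch of the ordering, assuming the setup runs in the cached_dev
      writeback start path:

        /* Sketch: take the lock before the writeback thread can make
         * progress, and release it only once dirty-sector state exists. */
        down_write(&dc->writeback_lock);
        bch_sectors_dirty_init(dc);     /* populate dirty counts first */
        /* ... remaining writeback setup ... */
        up_write(&dc->writeback_lock);  /* now the thread may proceed */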
      
      See this thread:
        http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453
      Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Tested-by: Marc MERLIN <marc@merlins.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • sched/cputime: Fix steal_account_process_tick() to always return jiffies · 1f69a59b
      Chris Friesen authored
      commit f9c904b7 upstream.
      
      The callers of steal_account_process_tick() expect it to return
      whether a jiffy should be considered stolen or not.
      
      Currently the return value of steal_account_process_tick() is in
      units of cputime, which vary between either jiffies or nsecs
      depending on CONFIG_VIRT_CPU_ACCOUNTING_GEN.
      
      If cputime has nsecs granularity and there is a tiny amount of
      stolen time (a few nsecs, say) then we will consider the entire
      tick stolen and will not account the tick on user/system/idle,
      causing /proc/stat to show invalid data.
      
      The fix is to change steal_account_process_tick() to accumulate
      the stolen time and only account it once it's worth a jiffy.
      
      (Thanks to Frederic Weisbecker for suggestions to fix a bug in my
      first version of the patch.)
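
      A sketch close to the upstream approach: keep rq->prev_steal_time in
      nanoseconds, account only whole jiffies, and carry the sub-jiffy
      remainder into the next tick.

        static unsigned long steal_account_process_tick(void)
        {
        #ifdef CONFIG_PARAVIRT
                if (static_key_false(&paravirt_steal_enabled)) {
                        unsigned long steal_jiffies;
                        u64 steal;

                        steal = paravirt_steal_clock(smp_processor_id());
                        steal -= this_rq()->prev_steal_time;

                        /* steal is in nsecs but the caller expects jiffies:
                         * report the whole-jiffy part now, leave the
                         * remainder for the next round. */
                        steal_jiffies = nsecs_to_jiffies(steal);
                        this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);

                        account_steal_time(jiffies_to_cputime(steal_jiffies));
                        return steal_jiffies;
                }
        #endif
                return 0;
        }
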
      Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/56DBBDB8.40305@mail.usask.ca
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi · 2a659dd5
      Kan Liang authored
      commit c3d266c8 upstream.
      
      This patch tries to fix a PEBS warning found in my stress test. The
      following perf command can easily trigger the pebs warning or spurious
      NMI error on Skylake/Broadwell/Haswell platforms:
      
        sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a
      
      Also the NMI watchdog must be enabled.
      
      For this case, the number of events is larger than the number of
      counters, so perf has to multiplex.
      
      In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedules out
      the old events, rotates the context, schedules in the new events and
      finally calls perf_pmu_enable().
      
      If the old events include a precise event, MSR_IA32_PEBS_ENABLE
      should be cleared by perf_pmu_disable(), and it should stay 0 until
      perf_pmu_enable() is called and the new event is a precise event.
      
      However, there is a corner case which can restore PEBS_ENABLE to a
      stale value during this window.  In perf_pmu_disable(), GLOBAL_CTRL is
      set to 0 to stop overflows and subsequent PMIs.  But there may be a
      pending PMI from an earlier overflow, which cannot be stopped.  So even
      though GLOBAL_CTRL is cleared, the kernel may still receive a PMI.  At
      the end of the PMI handler, __intel_pmu_enable_all() will be called,
      which restores the stale values if the old events haven't been
      scheduled out.
      
      Once the stale PEBS value is set, it cannot be corrected if the new
      events are non-precise, because pebs_enabled will be set to 0 and
      x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE setting.  As
      a result, a subsequent NMI with the stale PEBS_ENABLE triggers the
      PEBS warning.
      
      The pending PMI after enabled=0 becomes harmless if the NMI handler
      does not change the state.  This patch checks cpuc->enabled in the PMI
      handler and only restores the state when the PMU is active.
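
      The check itself is small; a sketch of the tail of
      intel_pmu_handle_irq() (in 4.2-era trees the file is
      arch/x86/kernel/cpu/perf_event_intel.c):

        /* Sketch: only restore PMU state if the PMU was enabled when the
         * PMI arrived; a PMI landing while the PMU is disabled must not
         * re-arm stale PEBS/GLOBAL_CTRL values. */
        if (cpuc->enabled)
                __intel_pmu_enable_all(0, true);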
      
      Here is the dump:
      
        Call Trace:
         <NMI>  [<ffffffff813c3a2e>] dump_stack+0x63/0x85
         [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0
         [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20
         [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320
         [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330
         [<ffffffff811f2f11>] ?  unmap_kernel_range_noflush+0x11/0x20
         [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0
         [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170
         [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50
         [<ffffffff810310b9>] nmi_handle+0x69/0x120
         [<ffffffff810316f6>] default_do_nmi+0xe6/0x100
         [<ffffffff810317f2>] do_nmi+0xe2/0x130
         [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         <<EOE>>  <IRQ>  [<ffffffff81006df8>] ?  x86_perf_event_set_period+0xd8/0x180
         [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100
         [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300
         [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10
         [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280
         [<ffffffff8119c970>] ?  __perf_install_in_context+0xc0/0xc0
         [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280
         [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190
         [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60
         [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50
         [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0
         <EOI>  [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff81123de5>] ?  smp_call_function_single+0xd5/0x130
         [<ffffffff81123ddb>] ?  smp_call_function_single+0xcb/0x130
         [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff8119765a>] event_function_call+0x10a/0x120
         [<ffffffff8119c660>] ? ctx_resched+0x90/0x90
         [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30
         [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
         [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70
         [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0
         [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
         [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0
         [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0
         [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0
         [<ffffffff81036d29>] ? sched_clock+0x9/0x10
         [<ffffffff8124a919>] SyS_ioctl+0x79/0x90
         [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
        ---[ end trace aef202839fe9a71d ]---
        Uhhuh. NMI received for unknown reason 2d on CPU 2.
        Do you have a strange power saving mode enabled?
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com
      [ Fixed various typos and other small details. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [ kamal: backport to 4.2-stable: files renamed ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2 · bd4ed8b4
      Jiri Olsa authored
      commit e72daf3f upstream.
      
      Using PEBS buffers larger than PAGE_SIZE makes the WRMSR to
      PERF_GLOBAL_CTRL in intel_pmu_enable_all() mysteriously hang on Core2.
      As a workaround, fall back to a single-page PEBS buffer there.
      
      The hard lockup is easily triggered by running 'perf test attr'
      repeatedly. Most of the time it gets stuck on a sample session with
      small periods.
      
        # perf test attr -vv
        14: struct perf_event_attr setup                             :
        --- start ---
        ...
          'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1
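
      A sketch of the workaround's shape (the names follow the upstream
      change, but the model check shown is illustrative):

        /* Sketch: make the PEBS buffer size a per-PMU parameter. */
        x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;    /* default */

        switch (boot_cpu_data.x86_model) {
        case 15:        /* Core2 "Merom"/"Conroe" (illustrative) */
                x86_pmu.pebs_buffer_size = PAGE_SIZE;   /* avoid the hang */
                break;
        }

        /* alloc_pebs_buffer() then allocates exactly that many bytes: */
        buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL,
                              cpu_to_node(cpu));
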
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [ kamal: backport to 4.2-stable: files renamed ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • perf/core: Fix perf_sched_count derailment · 34812019
      Alexander Shishkin authored
      commit 927a5570 upstream.
      
      The error path in perf_event_open() is such that asking for a sampling
      event on a PMU that doesn't generate interrupts will end up dropping
      the perf_sched_count even though it hasn't been incremented for this
      event yet.
      
      Given a sufficient amount of these calls, we'll end up disabling
      scheduler's jump label even though we'd still have active events in the
      system, thereby facilitating the arrival of the infernal regions upon us.
      
      I'm fixing this by moving account_event() inside perf_event_alloc().
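
      A sketch of the restructuring (simplified): account only after the
      last failure point inside perf_event_alloc(), so error paths unwind
      symmetrically.

        static struct perf_event *
        perf_event_alloc(struct perf_event_attr *attr /* , ... */)
        {
                struct perf_event *event;

                /* ... allocate and initialize; any failure jumps to an
                 * err_* label that frees only what was set up so far ... */

                /* All failure points are behind us: account exactly once,
                 * so perf_sched_count can never be decremented without a
                 * matching increment. */
                account_event(event);

                return event;
        }
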
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1456917854-29427-1-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>