1. 12 May, 2016 7 commits
    • sched/nohz: Fix affine unpinned timers mess · 44496922
      Wanpeng Li authored
      The following commit:
      
        9642d18e ("nohz: Affine unpinned timers to housekeepers")
      
      intended to affine unpinned timers to housekeepers:
      
        unpinned timers (full dynticks, idle)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (full dynticks, busy)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (housekeeper, idle)     =>   nearest busy housekeeper (otherwise, fall back to itself)
      
      However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check modified the
      intention to:
      
        unpinned timers (full dynticks, idle)   =>   any housekeeper (no matter the CPU topology)
        unpinned timers (full dynticks, busy)   =>   any housekeeper (no matter the CPU topology)
        unpinned timers (housekeeper, idle)     =>   any busy CPU (otherwise, fall back to any housekeeper)
      
      This patch fixes it by checking whether there is a busy housekeeper nearby,
      and otherwise falling back to any housekeeper or to the CPU itself. After the patch:
      
        unpinned timers (full dynticks, idle)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (full dynticks, busy)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (housekeeper, idle)     =>   nearest busy housekeeper (otherwise, fall back to itself)
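
      The intended selection order can be sketched as a small userspace model.
      This is only an illustration, not the kernel code: struct cpu_model,
      pick_timer_target(), the by_distance array (standing in for a walk up the
      scheduler domains) and the any_housekeeper argument are all hypothetical
      names. The first check, that a busy housekeeper keeps its own timers, is
      an assumption that the table above does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU; not a kernel structure. */
struct cpu_model {
    bool idle;
    bool housekeeping;
};

/*
 * Pick the CPU that should run an unpinned timer queued on `self`.
 * `by_distance` lists other CPU ids ordered from nearest to farthest.
 */
static int pick_timer_target(const struct cpu_model *cpus, int self,
                             const int *by_distance, size_t n_others,
                             int any_housekeeper)
{
    /* Assumption: a busy housekeeper simply keeps its own timers. */
    if (!cpus[self].idle && cpus[self].housekeeping)
        return self;

    /* Prefer the nearest CPU that is both busy and a housekeeper. */
    for (size_t k = 0; k < n_others; k++) {
        int i = by_distance[k];

        if (i == self)
            continue;
        if (!cpus[i].idle && cpus[i].housekeeping)
            return i;
    }

    /*
     * No busy housekeeper nearby: an idle housekeeper falls back to
     * itself, a full-dynticks CPU falls back to any housekeeper.
     */
    return cpus[self].housekeeping ? self : any_housekeeper;
}
```

      Note that the inner test looks at the candidate CPU i for both
      properties; per the description above, the broken check mixed i and cpu.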
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Fixed the changelog. ]
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 9642d18e ("nohz: Affine unpinned timers to housekeepers")
      Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix fairness issue on migration · 2f950354
      Peter Zijlstra authored
      Pavan reported that in the presence of very light tasks (or cgroups)
      the placement of migrated tasks can cause severe fairness issues.
      
      The problem is that enqueue_entity() places the task before it updates
      time, so it can place the task far in the past (remember that
      light tasks will shoot virtual time forward at a high speed, so in
      relation to the pre-existing light task, we can land far in the past).
      
      This is done because update_curr() needs the current task, and we
      might be placing the current task.
      
      The obvious solution is to differentiate between the current and any
      other task; placing the current before we update time, and placing any
      other task after, such that !curr tasks end up at the current moment
      in time, and not in the past.
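
      A toy model of the ordering problem, assuming a run queue whose virtual
      "now" is a single min_vruntime value and whose not-yet-accounted progress
      is pending_delta. struct rq_model, update_time() and enqueue_model() are
      hypothetical names for illustration and do not mirror the real
      enqueue_entity() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified run queue. */
struct rq_model {
    uint64_t min_vruntime;  /* the queue's virtual "now" */
    uint64_t pending_delta; /* virtual time not yet accounted */
};

/* Stand-in for the time update: fold pending progress into "now". */
static void update_time(struct rq_model *rq)
{
    rq->min_vruntime += rq->pending_delta;
    rq->pending_delta = 0;
}

/*
 * Return the vruntime at which an incoming task is placed.
 * With fixed_order == false every task is placed before the update
 * (the problematic ordering); with fixed_order == true only the
 * current task is placed before it, everything else after.
 */
static uint64_t enqueue_model(struct rq_model *rq, bool is_curr,
                              bool fixed_order)
{
    uint64_t vruntime;

    if (is_curr || !fixed_order) {
        vruntime = rq->min_vruntime; /* placed at the stale "now" */
        update_time(rq);
    } else {
        update_time(rq);
        vruntime = rq->min_vruntime; /* placed at the fresh "now" */
    }
    return vruntime;
}
```

      With min_vruntime = 100 and pending_delta = 50, the old ordering places a
      migrated task at 100 while the queue moves on to 150; the fixed ordering
      places it at 150.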
      
      This commit re-introduces the previously reverted commit:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      ... which is now safe to do, after we've also fixed another
      underlying bug first, in:
      
        sched/fair: Prepare to fix fairness problems on migration
      
      and cleaned up other details in the migration code:
      
        sched/core: Kill sched_class::task_waking
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Kill sched_class::task_waking to clean up the migration logic · 59efa0ba
      Peter Zijlstra authored
      With sched_class::task_waking being called only when we do
      set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
      and eliminate sched_class::task_waking entirely.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Prepare to fix fairness problems on migration · b5179ac7
      Peter Zijlstra authored
      Mike reported that our recent attempt to fix migration problems:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      broke interactivity and the signal starve test. We reverted that
      commit and now let's try it again more carefully, with some other
      underlying problems fixed first.
      
      One problem is that I assumed ENQUEUE_WAKING was only set when we do a
      cross-cpu wakeup (migration), which isn't true. This means we now
      destroy the vruntime history of tasks and wakeup-preemption suffers.
      
      Cure this by making my assumption true: only call
      sched_class::task_waking() when we do a cross-cpu wakeup. This also avoids
      the indirect call in the case we do a local wakeup.
      Reported-by: Mike Galbraith <mgalbraith@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Fixes: 3a47d512 ("sched/fair: Fix fairness issue on migration")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Move record_wakee() · c58d25f3
      Peter Zijlstra authored
      Since I want to make ->task_waking() conditional on the task getting
      migrated, we cannot use it to call record_wakee().
      
      Move it to select_task_rq_fair(), which gets called in almost all the
      same conditions. The only exception is if the woken task (@p) is
      bound to a single CPU (as per the nr_cpus_allowed test in select_task_rq()).
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge branch 'smp/hotplug' into sched/core, to resolve conflicts · 4eb86765
      Ingo Molnar authored
      Conflicts:
      	kernel/sched/core.c
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 11 May, 2016 1 commit
  3. 10 May, 2016 10 commits
    • Merge tag 'pci-v4.6-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · c5114626
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "Since v4.5, we've WARNed during resume if a PCI device, including a
        Thunderbolt device, was added while we were suspended.  A change we
        merged for v4.6-rc1 turned that warning into a system hang.  These
        enumeration patches from Lukas Wunner fix this issue:
      
         - Fix BUG on device attach failure
         - Do not treat EPROBE_DEFER as device attach failure"
      
      * tag 'pci-v4.6-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Do not treat EPROBE_DEFER as device attach failure
        PCI: Fix BUG on device attach failure
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7ec02e3b
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Two topology corner case fixes, and a MAINTAINERS file update for
        mmiotrace maintenance"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n
        MAINTAINERS: Add mmiotrace entry
        x86/topology: Handle CPUID bogosity gracefully
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ac244065
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "A UP kernel cpufreq fix and a rt/dl scheduler corner case fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/rt, sched/dl: Don't push if task's scheduling class was changed
        sched/fair: Fix !CONFIG_SMP kernel cpufreq governor breakage
    • sched/rt, sched/dl: Don't push if task's scheduling class was changed · 13b5ab02
      Xunlei Pang authored
      We got this warning:
      
          WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0
          [...]
          Call Trace:
      
          dump_stack+0x63/0x87
          __warn+0xd1/0xf0
          warn_slowpath_null+0x1d/0x20
          set_task_cpu+0x1af/0x1c0
          push_dl_task.part.34+0xea/0x180
          push_dl_tasks+0x17/0x30
          __balance_callback+0x45/0x5c
          __sched_setscheduler+0x906/0xb90
          SyS_sched_setattr+0x150/0x190
          do_syscall_64+0x62/0x110
          entry_SYSCALL64_slow_path+0x25/0x25
      
      This corresponds to:
      
          WARN_ON_ONCE(p->state == TASK_RUNNING &&
                   p->sched_class == &fair_sched_class &&
                   (p->on_rq && !task_on_rq_migrating(p)))
      
      It happens because in find_lock_later_rq(), the task whose scheduling
      class was changed to fair class is still pushed away as if it were
      a deadline task ...
      
      So, check in find_lock_later_rq() after double_lock_balance(), if the
      scheduling class of the deadline task was changed, break and retry.
      
      Apply the same logic to RT tasks.
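
      The revalidate-after-relock pattern described above can be modelled in
      userspace as follows. This is a sketch under stated assumptions:
      struct task_model, relock_fn and may_push_dl_task() are hypothetical
      names, and the relock callback merely stands in for
      double_lock_balance() dropping and retaking the lock, during which
      another CPU may change the task.

```c
#include <stdbool.h>

enum class_model { CLASS_FAIR, CLASS_RT, CLASS_DL };

/* Hypothetical, simplified task state. */
struct task_model {
    enum class_model cls;
    bool queued;
};

/* Stand-in for a lock operation that may briefly release the lock. */
typedef void (*relock_fn)(struct task_model *t);

/* Nothing changed while the lock was dropped. */
static void relock_quiet(struct task_model *t)
{
    (void)t;
}

/* The task was switched to the fair class while the lock was dropped. */
static void relock_class_changed(struct task_model *t)
{
    t->cls = CLASS_FAIR;
}

/*
 * Return true only if the task may still be pushed as a deadline
 * task after the lock has been retaken.
 */
static bool may_push_dl_task(struct task_model *t, relock_fn relock)
{
    if (t->cls != CLASS_DL)
        return false;

    relock(t);

    /* Re-check everything that could have changed in the window. */
    if (t->cls != CLASS_DL || !t->queued)
        return false; /* caller should break out and retry */

    return true;
}
```

      The point of the sketch is only that the class check must be repeated
      after the lock is retaken; checking it once beforehand is not enough.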
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n · 8d415ee2
      Thomas Gleixner authored
      Josef reported that the uncore driver trips over with CONFIG_SMP=n because
      x86_max_cores is 16 instead of 12.
      
      The reason is that for SMP=n the extended topology detection is a NOOP and
      the cache leaf is used to determine the number of cores. That's wrong in two
      respects:
      
      1) The cache leaf enumerates the maximum addressable number of cores in the
         package, which is obviously not correct.
      
      2) UP has no business with topology bits at all.
      
      Make intel_num_cpu_cores() return 1 for CONFIG_SMP=n
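
      A minimal sketch of the idea, assuming the core count is derived from
      bits 31:26 of EAX of CPUID leaf 4 (maximum addressable core IDs minus
      one). num_cpu_cores_model() and its smp_enabled parameter are
      hypothetical; the real function would use a compile-time CONFIG_SMP test
      rather than a runtime flag.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the core-count decision. `leaf4_eax` is the EAX value a
 * CPUID leaf 4 query would return; no CPUID instruction is executed.
 */
static unsigned int num_cpu_cores_model(uint32_t leaf4_eax, bool smp_enabled)
{
    /* A uniprocessor build has no business with topology bits. */
    if (!smp_enabled)
        return 1;

    /* Cache type 0 in the low five bits means the leaf is not valid. */
    if ((leaf4_eax & 0x1f) == 0)
        return 1;

    /* Bits 31:26 hold the maximum addressable core IDs minus one. */
    return (leaf4_eax >> 26) + 1;
}
```

      For a leaf value with 15 in bits 31:26 this yields 16 on an SMP build,
      matching the addressable maximum rather than the 12 cores present, and 1
      when SMP is disabled.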
      Reported-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team <Kernel-team@fb.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/761b4a2a-0332-7954-f030-c6639f949612@fb.com
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 2d0bd953
      Linus Torvalds authored
      Pull libnvdimm build fix from Dan Williams:
       "A build fix for the usage of HPAGE_SIZE in the last libnvdimm pull
        request.
      
        I have taken note that the kbuild robot build success test does not
        include results for alpha_allmodconfig.  Thanks to Guenter for the
        report.  It's tagged for -stable since the original fix will land
        there and cause build problems"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure
    • perf/core: Change the default paranoia level to 2 · 0161028b
      Andy Lutomirski authored
      Allowing unprivileged kernel profiling lets any user follow kernel
      control flow and dump kernel registers.  This most likely allows trivial
      kASLR bypassing, and it may allow other mischief as well.  (Off the top
      of my head, the PERF_SAMPLE_REGS_INTR output during /dev/urandom reads
      could be quite interesting.)
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'akpm' (patches from Andrew) · 5c56b563
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "2 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        zsmalloc: fix zs_can_compact() integer overflow
        Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""
    • zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
      Sergey Senozhatsky authored
      zs_can_compact() has two race conditions in its core calculation:
      
      unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
      				zs_stat_get(class, OBJ_USED);
      
      1) classes are not locked, so the numbers of allocated and used
         objects can change by the concurrent ops happening on other CPUs
      2) shrinker invokes it from preemptible context
      
      Depending on the circumstances, thus, OBJ_ALLOCATED can become
      less than OBJ_USED, which can result in either very high or
      negative `total_scan' value calculated later in do_shrink_slab().
      
      do_shrink_slab() has some logic to prevent those cases:
      
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
      
      However, due to the way `total_scan' is calculated, not every
      shrinker->count_objects() overflow can be spotted and handled.
      To demonstrate the latter, I added some debugging code to do_shrink_slab()
      (x86_64) and the results were:
      
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 92679974445502
       vmscan: resulting total_scan: 92679974445502
      [..]
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 22634041808232578
       vmscan: resulting total_scan: 22634041808232578
      
      Even though shrinker->count_objects() has returned an overflowed value,
      the resulting `total_scan' is positive, and, what is more worrisome, it
      is insanely huge. This value is getting used later on in
      shrinker->scan_objects() loop:
      
              while (total_scan >= batch_size ||
                     total_scan >= freeable) {
                      unsigned long ret;
                      unsigned long nr_to_scan = min(batch_size, total_scan);
      
                      shrinkctl->nr_to_scan = nr_to_scan;
                      ret = shrinker->scan_objects(shrinker, shrinkctl);
                      if (ret == SHRINK_STOP)
                              break;
                      freed += ret;
      
                      count_vm_events(SLABS_SCANNED, nr_to_scan);
                      total_scan -= nr_to_scan;
      
                      cond_resched();
              }
      
      `total_scan >= batch_size' is true for a very, very long time and
      `total_scan >= freeable' is also true for quite some time, because
      `freeable < 0' and `total_scan' is large enough, for example,
      22634041808232578. The only break condition, in the given scheme of
      things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
      bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
      
      To fix the issue, take a pool stat snapshot and use it instead of
      racy zs_stat_get() calls.
      
      Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>        [4.3+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan"" · 1e92a61c
      Robin Humble authored
      This reverts the 4.6-rc1 commit 7e2bc81d ("proc/base: make prompt
      shell start from new line after executing "cat /proc/$pid/wchan"")
      because it breaks /proc/$PID/wchan formatting in ps and top.
      
      Revert also because the patch is inconsistent - it adds a newline at the
      end of only the '0' wchan, and does not add a newline when
      /proc/$PID/wchan contains a symbol name.
      
      e.g.
      $ ps -eo pid,stat,wchan,comm
      PID STAT WCHAN  COMMAND
      ...
      1189 S    -      dbus-launch
      1190 Ssl  0
      dbus-daemon
      1198 Sl   0
      lightdm
      1299 Ss   ep_pol systemd
      1301 S    -      (sd-pam)
      1304 Ss   wait   sh
      Signed-off-by: Robin Humble <plaguedbypenguins@gmail.com>
      Cc: Minfei Huang <mnfhuang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 09 May, 2016 13 commits
  5. 08 May, 2016 2 commits
  6. 07 May, 2016 7 commits