- 28 Oct, 2014 20 commits
-
-
Wanpeng Li authored
As Kirill mentioned (https://lkml.org/lkml/2013/1/29/118): | If rq has already had 2 or more pushable tasks and we try to add a | pinned task then call of push_rt_task will just waste a time. Just switched pinned task is not able to be pushed. If the rq has had several dl tasks before they have already been considered as candidates to be pushed (or pulled). This patch implements the same behavior as rt class which introduced by commit 10447917 ("sched/rt: Do not try to push tasks if pinned task switches to RT"). Suggested-by: Kirill V Tkhai <tkhai@yandex.ru> Acked-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413938203-224610-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Oleg Nesterov authored
task_preempt_count() is pointless if preemption counter is per-cpu, currently this is x86 only. It is only valid if the task is not running, and even in this case the only info it can provide is the state of PREEMPT_ACTIVE bit. Change its single caller to check p->on_rq instead, this should be the same if p->state != TASK_RUNNING, and kill this helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Oleg Nesterov authored
Both callers of finish_task_switch() need to recalculate this_rq() and pass it as an argument, plus __schedule() does this again after context_switch(). It would be simpler to call this_rq() once in finish_task_switch() and return the this rq to the callers. Note: probably "int cpu" in __schedule() should die; it is not used and both rcu_note_context_switch() and wq_worker_sleeping() do not really need this argument. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141009193232.GB5408@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Oleg Nesterov authored
finish_task_switch() enables preemption, so post_schedule(rq) can be called on the wrong (and even dead) CPU. Afaics, nothing really bad can happen, but in this case we can wrongly clear rq->post_schedule on that CPU. And this simply looks wrong in any case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141008193644.GA32055@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Oleg Nesterov authored
task_preempt_count() has nothing to do with the actual preempt counter, thread_info->saved_preempt_count is only valid right after switch_to(). __trace_sched_switch_state() can use preempt_count(), prev is still the current task when trace_sched_switch() is called. Signed-off-by: Oleg Nesterov <oleg@redhat.com> [ Added BUG_ON(). ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20141007195108.GB28002@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Rik van Riel authored
In pseudo-interleaved numa_groups, all tasks try to relocate to the group's preferred_nid. When a group is spread across multiple NUMA nodes, this can lead to tasks swapping their location with other tasks inside the same group, instead of swapping location with tasks from other NUMA groups. This can keep NUMA groups from converging. Examining all nodes, when dealing with a task in a pseudo-interleaved NUMA group, avoids this problem. Note that only CPUs in nodes that improve the task or group score are examined, so the loop isn't too bad. Tested-by: Vinod Chegu <chegu_vinod@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Vinod Chegu" <chegu_vinod@hp.com> Cc: mgorman@suse.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141009172747.0d97c38c@annuminas.surriel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Rik van Riel authored
On systems with complex NUMA topologies, the node scoring is adjusted to allow workloads to converge on nodes that are near each other. The way a task group's preferred nid is determined needs to be adjusted, in order for the preferred_nid to be consistent with group_weight scoring. This ensures that we actually try to converge workloads on adjacent nodes. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mgorman@suse.de Cc: chegu_vinod@hp.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413530994-9732-6-git-send-email-riel@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Rik van Riel authored
In order to do task placement on systems with complex NUMA topologies, it is necessary to count the faults on nodes nearby the node that is being examined for a potential move. In case of a system with a backplane interconnect, we are dealing with groups of NUMA nodes; each of the nodes within a group is the same number of hops away from nodes in other groups in the system. Optimal placement on this topology is achieved by counting all nearby nodes equally. When comparing nodes A and B at distance N, nearby nodes are those at distances smaller than N from nodes A or B. Placement strategy on a system with a glueless mesh NUMA topology needs to be different, because there are no natural groups of nodes determined by the hardware. Instead, when dealing with two nodes A and B at distance N, N >= 2, there will be intermediate nodes at distance < N from both nodes A and B. Good placement can be achieved by right shifting the faults on nearby nodes by the number of hops from the node being scored. In this context, a nearby node is any node less than the maximum distance in the system away from the node. Those nodes are skipped for efficiency reasons, there is no real policy reason to do so. Placement policy on directly connected NUMA systems is not affected. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: mgorman@suse.de Cc: chegu_vinod@hp.com Link: http://lkml.kernel.org/r/1413530994-9732-5-git-send-email-riel@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Rik van Riel authored
Preparatory patch for adding NUMA placement on systems with complex NUMA topology. Also fix a potential divide by zero in group_weight() Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mgorman@suse.de Cc: chegu_vinod@hp.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413530994-9732-4-git-send-email-riel@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Rik van Riel authored
Smaller NUMA systems tend to have all NUMA nodes directly connected to each other. This includes the degenerate case of a system with just one node, ie. a non-NUMA system. Larger systems can have two kinds of NUMA topology, which affects how tasks and memory should be placed on the system. On glueless mesh systems, nodes that are not directly connected to each other will bounce traffic through intermediary nodes. Task groups can be run closer to each other by moving tasks from a node to an intermediary node between it and the task's preferred node. On NUMA systems with backplane controllers, the intermediary hops are incapable of running programs. This creates "islands" of nodes that are at an equal distance to anywhere else in the system. Each kind of topology requires a slightly different placement algorithm; this patch provides the mechanism to detect the kind of NUMA topology of a system. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> [ Changed to use kernel/sched/sched.h ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: mgorman@suse.de Cc: chegu_vinod@hp.com Link: http://lkml.kernel.org/r/1413530994-9732-3-git-send-email-riel@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Rik van Riel authored
Export some information that is necessary to do placement of tasks on systems with multi-level NUMA topologies. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mgorman@suse.de Cc: chegu_vinod@hp.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413530994-9732-2-git-send-email-riel@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Kirill Tkhai authored
1) switched_to_dl() check is wrong. We reschedule only if rq->curr is deadline task, and we do not reschedule if it's a lower priority task. But we must always preempt a task of other classes. 2) dl_task_timer(): Policy does not change in case of priority inheritance. rt_mutex_setprio() changes prio, while policy remains old. So we lose some balancing logic in dl_task_timer() and switched_to_dl() when we check policy instead of priority. Boosted task may be rq->curr. (I didn't change switched_from_dl() because no check is necessary there at all). I've looked at this place(switched_to_dl) several times and even fixed this function, but found just now... I suppose some performance tests may work better after this. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413909356.19914.128.camel@tkhaiSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Chen Hanxiao authored
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1412674147-8941-1-git-send-email-chenhanxiao@cn.fujitsu.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Oleg Nesterov authored
preempt_schedule_context() does preempt_enable_notrace() at the end and this can call the same function again; exception_exit() is heavy and it is quite possible that need-resched is true again. 1. Change this code to dec preempt_count() and check need_resched() by hand. 2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid the enable/disable dance around __schedule(). But in this case we need to move into sched/core.c. 3. Cosmetic, but x86 forgets to declare this function. This doesn't really matter because it is only called by asm helpers, still it make sense to add the declaration into asm/preempt.h to match preempt_schedule(). Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Kirill Tkhai authored
File /proc/sys/kernel/numa_balancing_scan_size_mb allows writing of zero. This bash command reproduces problem: $ while :; do echo 0 > /proc/sys/kernel/numa_balancing_scan_size_mb; \ echo 256 > /proc/sys/kernel/numa_balancing_scan_size_mb; done divide error: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 24112 Comm: bash Not tainted 3.17.0+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88013c852600 ti: ffff880037a68000 task.ti: ffff880037a68000 RIP: 0010:[<ffffffff81074191>] [<ffffffff81074191>] task_scan_min+0x21/0x50 RSP: 0000:ffff880037a6bce0 EFLAGS: 00010246 RAX: 0000000000000a00 RBX: 00000000000003e8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013c852600 RBP: ffff880037a6bcf0 R08: 0000000000000001 R09: 0000000000015c90 R10: ffff880239bf6c00 R11: 0000000000000016 R12: 0000000000003fff R13: ffff88013c852600 R14: ffffea0008d1b000 R15: 0000000000000003 FS: 00007f12bb048700(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000001505678 CR3: 0000000234770000 CR4: 00000000000006f0 Stack: ffff88013c852600 0000000000003fff ffff880037a6bd18 ffffffff810741d1 ffff88013c852600 0000000000003fff 000000000002bfff ffff880037a6bda8 ffffffff81077ef7 ffffea0008a56d40 0000000000000001 0000000000000001 Call Trace: [<ffffffff810741d1>] task_scan_max+0x11/0x40 [<ffffffff81077ef7>] task_numa_fault+0x1f7/0xae0 [<ffffffff8115a896>] ? migrate_misplaced_page+0x276/0x300 [<ffffffff81134a4d>] handle_mm_fault+0x62d/0xba0 [<ffffffff8103e2f1>] __do_page_fault+0x191/0x510 [<ffffffff81030122>] ? native_smp_send_reschedule+0x42/0x60 [<ffffffff8106dc00>] ? check_preempt_curr+0x80/0xa0 [<ffffffff8107092c>] ? wake_up_new_task+0x11c/0x1a0 [<ffffffff8104887d>] ? do_fork+0x14d/0x340 [<ffffffff811799bb>] ? get_unused_fd_flags+0x2b/0x30 [<ffffffff811799df>] ? __fd_install+0x1f/0x60 [<ffffffff8103e67c>] do_page_fault+0xc/0x10 [<ffffffff8150d322>] page_fault+0x22/0x30 RIP [<ffffffff81074191>] task_scan_min+0x21/0x50 RSP <ffff880037a6bce0> ---[ end trace 9a826d16936c04de ]--- Also fix race in task_scan_min (it depends on compiler behaviour). Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: David Rientjes <rientjes@google.com> Cc: Jens Axboe <axboe@fb.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1413455977.24793.78.camel@tkhaiSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Yasuaki Ishimatsu authored
While offling node by hot removing memory, the following divide error occurs: divide error: 0000 [#1] SMP [...] Call Trace: [...] handle_mm_fault [...] ? try_to_wake_up [...] ? wake_up_state [...] __do_page_fault [...] ? do_futex [...] ? put_prev_entity [...] ? __switch_to [...] do_page_fault [...] page_fault [...] RIP [<ffffffff810a7081>] task_numa_fault RSP <ffff88084eb2bcb0> The issue occurs as follows: 1. When page fault occurs and page is allocated from node 1, task_struct->numa_faults_buffer_memory[] of node 1 is incremented and p->numa_faults_locality[] is also incremented as follows: o numa_faults_buffer_memory[] o numa_faults_locality[] NR_NUMA_HINT_FAULT_TYPES | 0 | 1 | ---------------------------------- ---------------------- node 0 | 0 | 0 | remote | 0 | node 1 | 0 | 1 | locale | 1 | ---------------------------------- ---------------------- 2. node 1 is offlined by hot removing memory. 3. When page fault occurs, fault_types[] is calculated by using p->numa_faults_buffer_memory[] of all online nodes in task_numa_placement(). But node 1 was offline by step 2. So the fault_types[] is calculated by using only p->numa_faults_buffer_memory[] of node 0. So both of fault_types[] are set to 0. 4. The values(0) of fault_types[] pass to update_task_scan_period(). 5. numa_faults_locality[1] is set to 1. So the following division is calculated. static void update_task_scan_period(struct task_struct *p, unsigned long shared, unsigned long private){ ... ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared)); } 6. But both of private and shared are set to 0. So divide error occurs here. The divide error is rare case because the trigger is node offline. This patch always increments denominator for avoiding divide error. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/54475703.8000505@jp.fujitsu.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Kirill Tkhai authored
Unlocked access to dst_rq->curr in task_numa_compare() is racy. If curr task is exiting this may be a reason of use-after-free: task_numa_compare() do_exit() ... current->flags |= PF_EXITING; ... release_task() ... ~~delayed_put_task_struct()~~ ... schedule() rcu_read_lock() ... cur = ACCESS_ONCE(dst_rq->curr) ... ... rq->curr = next; ... context_switch() ... finish_task_switch() ... put_task_struct() ... __put_task_struct() ... free_task_struct() task_numa_assign() ... get_task_struct() ... As noted by Oleg: <<The lockless get_task_struct(tsk) is only safe if tsk == current and didn't pass exit_notify(), or if this tsk was found on a rcu protected list (say, for_each_process() or find_task_by_vpid()). IOW, it is only safe if release_task() was not called before we take rcu_read_lock(), in this case we can rely on the fact that delayed_put_pid() can not drop the (potentially) last reference until rcu_read_unlock(). And as Kirill pointed out task_numa_compare()->task_numa_assign() path does get_task_struct(dst_rq->curr) and this is not safe. The task_struct itself can't go away, but rcu_read_lock() can't save us from the final put_task_struct() in finish_task_switch(); this reference goes away without rcu gp>> The patch provides simple check of PF_EXITING flag. If it's not set, this guarantees that call_rcu() of delayed_put_task_struct() callback hasn't happened yet, so we can safely do get_task_struct() in task_numa_assign(). Locked dst_rq->lock protects from concurrency with the last schedule(). Reusing or unmapping of cur's memory may happen without it. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413962231.19914.130.camel@tkhaiSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Juri Lelli authored
dl_task_timer() is racy against several paths. Daniel noticed that the replenishment timer may experience a race condition against an enqueue_dl_entity() called from rt_mutex_setprio(). With his own words: rt_mutex_setprio() resets p->dl.dl_throttled. So the pattern is: start_dl_timer() throttled = 1, rt_mutex_setprio() throlled = 0, sched_switch() -> enqueue_task(), dl_task_timer-> enqueue_task() throttled is 0 => BUG_ON(on_dl_rq(dl_se)) fires as the scheduling entity is already enqueued on the -deadline runqueue. As we do for the other races, we just bail out in the replenishment timer code. Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: vincent@legout.info Cc: Dario Faggioli <raistlin@linux.it> Cc: Michael Trimarchi <michael@amarulasolutions.com> Cc: Fabio Checconi <fchecconi@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414142198-18552-5-git-send-email-juri.lelli@arm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Juri Lelli authored
In the deboost path, right after the dl_boosted flag has been reset, we can currently end up replenishing using -deadline parameters of a !SCHED_DEADLINE entity. This of course causes a bug, as those parameters are empty. In the case depicted above it is safe to simply bail out, as the deboosted task is going to be back to its original scheduling class anyway. Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: vincent@legout.info Cc: Dario Faggioli <raistlin@linux.it> Cc: Michael Trimarchi <michael@amarulasolutions.com> Cc: Fabio Checconi <fchecconi@gmail.com> Link: http://lkml.kernel.org/r/1414142198-18552-4-git-send-email-juri.lelli@arm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Kirill Tkhai authored
The race may happen when somebody is changing task_group of a forking task. Child's cgroup is the same as parent's after dup_task_struct() (there just memory copying). Also, cfs_rq and rt_rq are the same as parent's. But if parent changes its task_group before it's called cgroup_post_fork(), we do not reflect this situation on child. Child's cfs_rq and rt_rq remain the same, while child's task_group changes in cgroup_post_fork(). To fix this we introduce fork() method, which calls sched_move_task() directly. This function changes sched_task_group on appropriate (also its logic has no problem with freshly created tasks, so we shouldn't introduce something special; we are able just to use it). Possibly, this decides the Burke Libbey's problem: https://lkml.org/lkml/2014/10/24/456Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414405105.19914.169.camel@tkhaiSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 26 Oct, 2014 4 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds authored
Pull ARM SoC fixes from Olof Johansson: "Another week, another small batch of fixes. Most of these make zynq, socfpga and sunxi platforms work a bit better: - due to new requirements for regulators, DWMMC on socfpga broke past v3.17 - SMP spinup fix for socfpga - a few DT fixes for zynq - another option (FIXED_REGULATOR) for sunxi is needed that used to be selected by other options but no longer is. - a couple of small DT fixes for at91 - ...and a couple for i.MX" * tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: imx28-evk: Let i2c0 run at 100kHz ARM: i.MX6: Fix "emi" clock name typo ARM: multi_v7_defconfig: enable CONFIG_MMC_DW_ROCKCHIP ARM: sunxi_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE ARM: dts: socfpga: Add a 3.3V fixed regulator node ARM: dts: socfpga: Fix SD card detect ARM: dts: socfpga: rename gpio nodes ARM: at91/dt: sam9263: fix PLLB frequencies power: reset: at91-reset: fix power down register MAINTAINERS: add atmel ssc driver maintainer entry arm: socfpga: fix fetching cpu1start_addr for SMP ARM: zynq: DT: trivial: Fix mc node ARM: zynq: DT: Add cadence watchdog node ARM: zynq: DT: Add missing reference for memory-controller ARM: zynq: DT: Add missing reference for ADC ARM: zynq: DT: Add missing address for L2 pl310 ARM: zynq: DT: Remove 222 MHz OPP ARM: zynq: DT: Fix GEM register area size
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull vfs updates from Al Viro: "overlayfs merge + leak fix for d_splice_alias() failure exits" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: overlayfs: embed middle into overlay_readdir_data overlayfs: embed root into overlay_readdir_data overlayfs: make ovl_cache_entry->name an array instead of pointer overlayfs: don't hold ->i_mutex over opening the real directory fix inode leaks on d_splice_alias() failure exits fs: limit filesystem stacking depth overlay: overlay filesystem documentation overlayfs: implement show_options overlayfs: add statfs support overlay filesystem shmem: support RENAME_WHITEOUT ext4: support RENAME_WHITEOUT vfs: add RENAME_WHITEOUT vfs: add whiteout support vfs: export check_sticky() vfs: introduce clone_private_mount() vfs: export __inode_permission() to modules vfs: export do_splice_direct() to modules vfs: add i_op->dentry_open()
-
Olof Johansson authored
Merge tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes Merge "ARM: imx: fixes for 3.18" from Shawn Guo: The i.MX fixes for 3.18: - Revert one patch which increases I2C bus frequency on imx28-evk - Fix a typo on imx6q EIM clock name * tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx28-evk: Let i2c0 run at 100kHz ARM: i.MX6: Fix "emi" clock name typo Signed-off-by: Olof Johansson <olof@lixom.net>
-
- 25 Oct, 2014 6 commits
-
-
Fabio Estevam authored
Commit 78b81f46 ("ARM: dts: imx28-evk: Run I2C0 at 400kHz") caused issues when doing the following sequence in loop: - Boot the kernel - Perform audio playback - Reboot the system via 'reboot' command In many times the audio card cannot be probed, which causes playback to fail. After restoring to the original i2c0 frequency of 100kHz there is no such problem anymore. This reverts commit 78b81f46. Cc: <stable@vger.kernel.org> # 3.16+ Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
-
Steve Longerbeam authored
Fix a typo error, the "emi" names refer to the eim clocks. The change fixes typo in EIM and EIM_SLOW pre-output dividers and selectors clock names. Notably EIM_SLOW clock itself is named correctly. Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com> [vladimir_zapolskiy@mentor.com: ported to v3.17] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Sascha Hauer <kernel@pengutronix.de> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
-
Al Viro authored
same story... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
no sense having it a pointer - all instances have it pointing to local variable in the same stack frame Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
just use it to serialize the assignment Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 24 Oct, 2014 10 commits
-
-
git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds authored
Pull MIPS fixes from Ralf Baechle: "This is the first round of fixes and tying up loose ends for MIPS. - plenty of fixes for build errors in specific obscure configurations - remove redundant code on the Lantiq platform - removal of a useless SEAD I2C driver that was causing a build issue - fix an earlier TLB exeption handler fix to also work on Octeon. - fix ISA level dependencies in FPU emulator's instruction decoding. - don't hardcode kernel command line in Octeon software emulator. - fix an earlier fix for the Loondson 2 clock setting" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: SEAD3: Fix I2C device registration. MIPS: SEAD3: Nuke PIC32 I2C driver. MIPS: ftrace: Fix a microMIPS build problem MIPS: MSP71xx: Fix build error MIPS: Malta: Do not build the malta-amon.c file if CMP is not enabled MIPS: Prevent compiler warning from cop2_{save,restore} MIPS: Kconfig: Add missing MIPS_CPS dependencies to PM and cpuidle MIPS: idle: Remove leftover __pastwait symbol and its references MIPS: Sibyte: Include the swarm subdir to the sb1250 LittleSur builds MIPS: ptrace.h: Add a missing include MIPS: ath79: Fix compilation error when CONFIG_PCI is disabled MIPS: MSP71xx: Remove compilation error when CONFIG_MIPS_MT is present MIPS: Octeon: Remove special case for simulator command line. MIPS: tlbex: Properly fix HUGE TLB Refill exception handler MIPS: loongson2_cpufreq: Fix CPU clock rate setting mismerge pci: pci-lantiq: remove duplicate check on resource MIPS: Lasat: Add missing CONFIG_PROC_FS dependency to PICVUE_PROC MIPS: cp1emu: Fix ISA restrictions for cop1x_op instructions
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 fixes from Catalin Marinas: - enable 48-bit VA space now that KVM has been fixed, together with a couple of fixes for pgd allocation alignment and initial memblock current_limit. There is still a dependency on !ARM_SMMU which needs to be updated as it uses the page table manipulation macros of the host kernel - eBPF fixes following changes/conflicts during the merging window - Compat types affecting compat_elf_prpsinfo - Compilation error on UP builds - ASLR fix when /proc/sys/kernel/randomize_va_space == 0 - DT definitions for CLCD support on ARMv8 model platform * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix memblock current_limit with 64K pages and 48-bit VA arm64: ASLR: Don't randomise text when randomise_va_space == 0 arm64: vexpress: Add CLCD support to the ARMv8 model platform arm64: Fix compilation error on UP builds Documentation/arm64/memory.txt: fix typo net: bpf: arm64: minor fix of type in jited arm64: bpf: add 'load 64-bit immediate' instruction arm64: bpf: add 'shift by register' instructions net: bpf: arm64: address randomize and write protect JIT code arm64: mm: Correct fixmap pagetable types arm64: compat: fix compat types affecting struct compat_elf_prpsinfo arm64: Align less than PAGE_SIZE pgds naturally arm64: Allow 48-bits VA space without ARM_SMMU
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds authored
Pull two sparc fixes from David Miller: 1) Fix boots with gcc-4.9 compiled sparc64 kernels. 2) Add missing __get_user_pages_fast() on sparc64 to fix hangs on futexes used in transparent hugepage areas. It's really idiotic to have a weak symbolled fallback that just returns zero, and causes this kind of bug. There should be no backup implementation and the link should fail if the architecture fails to provide __get_user_pages_fast() and supports transparent hugepages. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Implement __get_user_pages_fast(). sparc64: Fix register corruption in top-most kernel stack frame during boot.
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "This is a pretty large update. I think it is roughly as big as what I usually had for the _whole_ rc period. There are a few bad bugs where the guest can OOPS or crash the host. We have also started looking at attack models for nested virtualization; bugs that usually result in the guest ring 0 crashing itself become more worrisome if you have nested virtualization, because the nested guest might bring down the non-nested guest as well. For current uses of nested virtualization these do not really have a security impact, but you never know and bugs are bugs nevertheless. A lot of these bugs are in 3.17 too, resulting in a large number of stable@ Ccs. I checked that all the patches apply there with no conflicts" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: vfio: fix unregister kvm_device_ops of vfio KVM: x86: Wrong assertion on paging_tmpl.h kvm: fix excessive pages un-pinning in kvm_iommu_map error path. KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag KVM: x86: Emulator does not decode clflush well KVM: emulate: avoid accessing NULL ctxt->memopp KVM: x86: Decoding guest instructions which cross page boundary may fail kvm: x86: don't kill guest on unknown exit reason kvm: vmx: handle invvpid vm exit gracefully KVM: x86: Handle errors when RIP is set during far jumps KVM: x86: Emulator fixes for eip canonical checks on near branches KVM: x86: Fix wrong masking on relative jump/call KVM: x86: Improve thread safety in pit KVM: x86: Prevent host from panicking on shared MSR writes. KVM: x86: Check non-canonical addresses upon WRMSR
-
Linus Torvalds authored
Merge tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Fix regression in xen_clocksource_read() which caused all Xen guests to crash early in boot. - Several fixes for super rare race conditions in the p2m. - Assorted other minor fixes. * tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Allocate memory for physdev_pci_device_add's optarr x86/xen: panic on bad Xen-provided memory map x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read() x86/xen: avoid race in p2m handling x86/xen: delay construction of mfn_list_list x86/xen: avoid writing to freed memory after race in p2m handling xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "Here are a chunk of small fixes since rc1: two PCM core fixes, one is a long-standing annoyance about lockdep and another is an ARM64 mmap fix. The rest are a HD-audio HDMI hotplug notification fix, a fix for missing NULL termination in Realtek codec quirks and a few new device/codec-specific quirks as usual" * tag 'sound-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Add missing terminating entry to SND_HDA_PIN_QUIRK macro ALSA: pcm: Fix false lockdep warnings ALSA: hda - Fix inverted LED gpio setup for Lenovo Ideapad ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug ALSA: usb-audio: Add support for Steinberg UR22 USB interface ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume ALSA: pcm: use the same dma mmap codepath both for arm and arm64
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/randomLinus Torvalds authored
Pull /dev/random updates from Ted Ts'o: "This adds a memzero_explicit() call which is guaranteed not to be optimized away by GCC. This is important when we are wiping cryptographically sensitive material" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: crypto: memzero_explicit - make sure to clear out sensitive data random: add and use memzero_explicit() for clearing data
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull ACPI and power management updates from Rafael Wysocki: "This is material that didn't make it to my 3.18-rc1 pull request for various reasons, mostly related to timing and travel (LinuxCon EU / LPC) plus a couple of fixes for recent bugs. The only really new thing here is the PM QoS class for memory bandwidth, but it is simple enough and users of it will be added in the next cycle. One major change in behavior is that platform devices enumerated by ACPI will use 32-bit DMA mask by default. Also included is an ACPICA update to a new upstream release, but that's mostly cleanups, changes in tools and similar. The rest is fixes and cleanups mostly. Specifics: - Fix for a recent PCI power management change that overlooked the fact that some IRQ chips might not be able to configure PCIe PME for system wakeup from Lucas Stach. - Fix for a bug introduced in 3.17 where acpi_device_wakeup() is called with a wrong ordering of arguments from Zhang Rui. - A bunch of intel_pstate driver fixes (all -stable candidates) from Dirk Brandewie, Gabriele Mazzotta and Pali Rohár. - Fixes for a rather long-standing problem with the OOM killer and the freezer that frozen processes killed by the OOM do not actually release any memory until they are thawed, so OOM-killing them is rather pointless, with a couple of cleanups on top (Michal Hocko, Cong Wang, Rafael J Wysocki). - ACPICA update to upstream release 20140926, inlcuding mostly cleanups reducing differences between the upstream ACPICA and the kernel code, tools changes (acpidump, acpiexec) and support for the _DDN object (Bob Moore, Lv Zheng). - New PM QoS class for memory bandwidth from Tomeu Vizoso. - Default 32-bit DMA mask for platform devices enumerated by ACPI (this change is mostly needed for some drivers development in progress targeted at 3.19) from Heikki Krogerus. - ACPI EC driver cleanups, mostly related to debugging, from Lv Zheng. - cpufreq-dt driver updates from Thomas Petazzoni. - powernv cpuidle driver update from Preeti U Murthy" * tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits) intel_pstate: Correct BYT VID values. intel_pstate: Fix BYT frequency reporting intel_pstate: Don't lose sysfs settings during cpu offline cpufreq: intel_pstate: Reflect current no_turbo state correctly cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy PCI / PM: handle failure to enable wakeup on PCIe PME ACPI: invoke acpi_device_wakeup() with correct parameters PM / freezer: Clean up code after recent fixes PM: convert do_each_thread to for_each_process_thread OOM, PM: OOM killed task shouldn't escape PM suspend freezer: remove obsolete comments in __thaw_task() freezer: Do not freeze tasks killed by OOM killer ACPI / platform: provide default DMA mask cpuidle: powernv: Populate cpuidle state details by querying the device-tree cpufreq: cpufreq-dt: adjust message related to regulators cpufreq: cpufreq-dt: extend with platform_data cpufreq: allow driver-specific data ACPI / EC: Cleanup coding style. ACPI / EC: Refine event/query debugging messages. ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linuxLinus Torvalds authored
Pull thermal management updates from Zhang Rui: "Sorry that I missed the merge window as there is a bug found in the last minute, and I have to fix it and wait for the code to be tested in linux-next tree for a few days. Now the buggy patch has been dropped entirely from my next branch. Thus I hope those changes can still be merged in 3.18-rc2 as most of them are platform thermal driver changes. Specifics: - introduce ACPI INT340X thermal drivers. Newer laptops and tablets may have thermal sensors and other devices with thermal control capabilities that are exposed for the OS to use via the ACPI INT340x device objects. Several drivers are introduced to expose the temperature information and cooling ability from these objects to user-space via the normal thermal framework. From: Lu Aaron, Lan Tianyu, Jacob Pan and Zhang Rui. - introduce a new thermal governor, which just uses a hysteresis to switch abruptly on/off a cooling device. This governor can be used to control certain fan devices that can not be throttled but just switched on or off. From: Peter Feuerer. - introduce support for some new thermal interrupt functions on i.MX6SX, in IMX thermal driver. From: Anson, Huang. - introduce tracing support on thermal framework. From: Punit Agrawal. - small fixes in OF thermal and thermal step_wise governor" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits) Thermal: int340x thermal: select ACPI fan driver Thermal: int3400_thermal: use acpi_thermal_rel parsing APIs Thermal: int340x_thermal: expose acpi thermal relationship tables Thermal: introduce int3403 thermal driver Thermal: introduce INT3402 thermal driver Thermal: move the KELVIN_TO_MILLICELSIUS macro to thermal.h ACPI / Fan: support INT3404 thermal device ACPI / Fan: add ACPI 4.0 style fan support ACPI / fan: convert to platform driver ACPI / fan: use acpi_device_xxx_power instead of acpi_bus equivelant ACPI / fan: remove no need check for device pointer ACPI / fan: remove unused macro Thermal: int3400 thermal: register to thermal framework Thermal: int3400 thermal: add capability to detect supporting UUIDs Thermal: introduce int3400 thermal driver ACPI: add ACPI_TYPE_LOCAL_REFERENCE support to acpi_extract_package() ACPI: make acpi_create_platform_device() an external API thermal: step_wise: fix: Prevent from binary overflow when trend is dropping ACPI: introduce ACPI int340x thermal scan handler thermal: Added Bang-bang thermal governor ...
-
Catalin Marinas authored
With 48-bit VA space, the 64K page configuration uses 3 levels instead of 2 and PUD_SIZE != PMD_SIZE. Since with 64K pages we only cover PMD_SIZE with the initial swapper_pg_dir populated in head.S, the memblock current_limit needs to be set accordingly in map_mem() to avoid allocating unmapped memory. The memblock current_limit is progressively increased as more blocks are mapped. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-