- 27 Jan, 2015 3 commits
-
-
Thomas Graf authored
[ Upstream commit a18e6a18 ] Each mmap Netlink frame contains a status field which indicates whether the frame is unused, reserved, contains data or needs to be skipped. Both loads and stores may not be reordeded and must complete before the status field is changed and another CPU might pick up the frame for use. Use an smp_mb() to cover needs of both types of callers to netlink_set_status(), callers which have been reading data frame from the frame, and callers which have been filling or releasing and thus writing to the frame. - Example code path requiring a smp_rmb(): memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len); netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED); - Example code path requiring a smp_wmb(): hdr->nm_uid = from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid); hdr->nm_gid = from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid); netlink_frame_flush_dcache(hdr); netlink_set_status(hdr, NL_MMAP_STATUS_VALID); Fixes: f9c228 ("netlink: implement memory mapped recvmsg()") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Miller authored
[ Upstream commit 4682a035 ] Checking the file f_count and the nlk->mapped count is not completely sufficient to prevent the mmap'd area contents from changing from under us during netlink mmap sendmsg() operations. Be careful to sample the header's length field only once, because this could change from under us as well. Fixes: 5fd96123 ("netlink: implement memory mapped sendmsg()") Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Timo Teräs authored
[ Upstream commit 8a0033a9 ] The NBMA GRE tunnels temporarily push GRE header that contain the per-packet NBMA destination on the skb via header ops early in xmit path. It is the later pulled before the real GRE header is constructed. The inner mac was thus set differently in nbma case: the GRE header has been pushed by neighbor layer, and mac header points to beginning of the temporary gre header (set by dev_queue_xmit). Now that the offloads expect mac header to point to the gre payload, fix the xmit patch to: - pull first the temporary gre header away - and reset mac header to point to gre payload This fixes tso to work again with nbma tunnels. Fixes: 14051f04 ("gre: Use inner mac length when computing tunnel length") Signed-off-by: Timo Teräs <timo.teras@iki.fi> Cc: Tom Herbert <therbert@google.com> Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 16 Jan, 2015 37 commits
-
-
Greg Kroah-Hartman authored
-
Linus Torvalds authored
commit 690eac53 upstream. Commit fee7e49d ("mm: propagate error from stack expansion even for guard page") made sure that we return the error properly for stack growth conditions. It also theorized that counting the guard page towards the stack limit might break something, but also said "Let's see if anybody notices". Somebody did notice. Apparently android-x86 sets the stack limit very close to the limit indeed, and including the guard page in the rlimit check causes the android 'zygote' process problems. So this adds the (fairly trivial) code to make the stack rlimit check be against the actual real stack size, rather than the size of the vma that includes the guard page. Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org> Cc: Jay Foad <jay.foad@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit fee7e49d upstream. Jay Foad reports that the address sanitizer test (asan) sometimes gets confused by a stack pointer that ends up being outside the stack vma that is reported by /proc/maps. This happens due to an interaction between RLIMIT_STACK and the guard page: when we do the guard page check, we ignore the potential error from the stack expansion, which effectively results in a missing guard page, since the expected stack expansion won't have been done. And since /proc/maps explicitly ignores the guard page (commit d7824370: "mm: fix up some user-visible effects of the stack guard page"), the stack pointer ends up being outside the reported stack area. This is the minimal patch: it just propagates the error. It also effectively makes the guard page part of the stack limit, which in turn measn that the actual real stack is one page less than the stack limit. Let's see if anybody notices. We could teach acct_stack_growth() to allow an extra page for a grow-up/grow-down stack in the rlimit test, but I don't want to add more complexity if it isn't needed. Reported-and-tested-by: Jay Foad <jay.foad@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlastimil Babka authored
commit 9e5e3661 upstream. Charles Shirron and Paul Cassella from Cray Inc have reported kswapd stuck in a busy loop with nothing left to balance, but kswapd_try_to_sleep() failing to sleep. Their analysis found the cause to be a combination of several factors: 1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait 2. The process has been killed (by OOM in this case), but has not yet been scheduled to remove itself from the waitqueue and die. 3. kswapd checks for throttled processes in prepare_kswapd_sleep(): if (waitqueue_active(&pgdat->pfmemalloc_wait)) { wake_up(&pgdat->pfmemalloc_wait); return false; // kswapd will not go to sleep } However, for a process that was already killed, wake_up() does not remove the process from the waitqueue, since try_to_wake_up() checks its state first and returns false when the process is no longer waiting. 4. kswapd is running on the same CPU as the only CPU that the process is allowed to run on (through cpus_allowed, or possibly single-cpu system). 5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd encounters no voluntary preemption points and repeatedly fails prepare_kswapd_sleep(), blocking the process from running and removing itself from the waitqueue, which would let kswapd sleep. So, the source of the problem is that we prevent kswapd from going to sleep until there are processes waiting on the pfmemalloc_wait queue, and a process waiting on a queue is guaranteed to be removed from the queue only when it gets scheduled. This was done to make sure that no process is left sleeping on pfmemalloc_wait when kswapd itself goes to sleep. However, it isn't necessary to postpone kswapd sleep until the pfmemalloc_wait queue actually empties. To prevent processes from being left sleeping, it's actually enough to guarantee that all processes waiting on pfmemalloc_wait queue have been woken up by the time we put kswapd to sleep. This patch therefore fixes this issue by substituting 'wake_up' with 'wake_up_all' and removing 'return false' in the code snippet from prepare_kswapd_sleep() above. Note that if any process puts itself in the queue after this waitqueue_active() check, or after the wake up itself, it means that the process will also wake up kswapd - and since we are under prepare_to_wait(), the wake up won't be missed. Also we update the comment prepare_kswapd_sleep() to hopefully more clearly describe the races it is preventing. Fixes: 5515061d ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Krzysztof Kozlowski authored
commit 2836766a upstream. Sleep in atomic context happened on Trats2 board after inserting or removing SD card because mmc_gpio_get_cd() was called under spin lock. Fix this by moving card detection earlier, before acquiring spin lock. The mmc_gpio_get_cd() call does not have to be protected by spin lock because it does not access any sdhci internal data. The sdhci_do_get_cd() call access host flags (SDHCI_DEVICE_DEAD). After moving it out side of spin lock it could theoretically race with driver removal but still there is no actual protection against manual card eject. Dmesg after inserting SD card: [ 41.663414] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1511 [ 41.670469] in_atomic(): 1, irqs_disabled(): 128, pid: 30, name: kworker/u8:1 [ 41.677580] INFO: lockdep is turned off. [ 41.681486] irq event stamp: 61972 [ 41.684872] hardirqs last enabled at (61971): [<c0490ee0>] _raw_spin_unlock_irq+0x24/0x5c [ 41.693118] hardirqs last disabled at (61972): [<c04907ac>] _raw_spin_lock_irq+0x18/0x54 [ 41.701190] softirqs last enabled at (61648): [<c0026fd4>] __do_softirq+0x234/0x2c8 [ 41.708914] softirqs last disabled at (61631): [<c00273a0>] irq_exit+0xd0/0x114 [ 41.716206] Preemption disabled at:[< (null)>] (null) [ 41.721500] [ 41.722985] CPU: 3 PID: 30 Comm: kworker/u8:1 Tainted: G W 3.18.0-rc5-next-20141121 #883 [ 41.732111] Workqueue: kmmcd mmc_rescan [ 41.735945] [<c0014d2c>] (unwind_backtrace) from [<c0011c80>] (show_stack+0x10/0x14) [ 41.743661] [<c0011c80>] (show_stack) from [<c0489d14>] (dump_stack+0x70/0xbc) [ 41.750867] [<c0489d14>] (dump_stack) from [<c0228b74>] (gpiod_get_raw_value_cansleep+0x18/0x30) [ 41.759628] [<c0228b74>] (gpiod_get_raw_value_cansleep) from [<c03646e8>] (mmc_gpio_get_cd+0x38/0x58) [ 41.768821] [<c03646e8>] (mmc_gpio_get_cd) from [<c036d378>] (sdhci_request+0x50/0x1a4) [ 41.776808] [<c036d378>] (sdhci_request) from [<c0357934>] (mmc_start_request+0x138/0x268) [ 41.785051] [<c0357934>] (mmc_start_request) from [<c0357cc8>] (mmc_wait_for_req+0x58/0x1a0) [ 41.793469] [<c0357cc8>] (mmc_wait_for_req) from [<c0357e68>] (mmc_wait_for_cmd+0x58/0x78) [ 41.801714] [<c0357e68>] (mmc_wait_for_cmd) from [<c0361c00>] (mmc_io_rw_direct_host+0x98/0x124) [ 41.810480] [<c0361c00>] (mmc_io_rw_direct_host) from [<c03620f8>] (sdio_reset+0x2c/0x64) [ 41.818641] [<c03620f8>] (sdio_reset) from [<c035a3d8>] (mmc_rescan+0x254/0x2e4) [ 41.826028] [<c035a3d8>] (mmc_rescan) from [<c003a0e0>] (process_one_work+0x180/0x3f4) [ 41.833920] [<c003a0e0>] (process_one_work) from [<c003a3bc>] (worker_thread+0x34/0x4b0) [ 41.841991] [<c003a3bc>] (worker_thread) from [<c003fed8>] (kthread+0xe4/0x104) [ 41.849285] [<c003fed8>] (kthread) from [<c000f268>] (ret_from_fork+0x14/0x2c) [ 42.038276] mmc0: new high speed SDHC card at address 1234 Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: 94144a46 ("mmc: sdhci: add get_cd() implementation") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stefan Roese authored
commit 4302a596 upstream. When used via spidev with more than one messages to tranfer via SPI_IOC_MESSAGE the current implementation would return with -EINVAL, since bits_per_word and speed_hz are set in all transfer structs. And in the 2nd loop status will stay at -EINVAL as its not overwritten again via fsl_spi_setup_transfer(). This patch changes this behavious by first checking if one of the messages uses different settings. If this is the case the function will return with -EINVAL. If not, the messages are transferred correctly. Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Mark Brown <broonie@linaro.org> Cc: Esben Haabendal <esbenhaabendal@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Olsa authored
commit f61ff6c0 upstream. Linus reported perf report command being interrupted due to processing of 'out of order' event, with following error: Timestamp below last timeslice flush 0x5733a8 [0x28]: failed to process type: 3 I could reproduce the issue and in my case it was caused by one CPU (mmap) being behind during record and userspace mmap reader seeing the data after other CPUs data were already stored. This is expected under some circumstances because we need to limit the number of events that we queue for reordering when we receive a PERF_RECORD_FINISHED_ROUND or when we force flush due to memory pressure. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1417016371-30249-1-git-send-email-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> [zhangzhiqiang: backport to 3.10: - adjust context - commit f61ff6c0 struct events_stats was defined in tools/perf/util/event.h while 3.10 stable defined in tools/perf/util/hist.h. - 3.10 stable there is no pr_oe_time() which used for debug. - After the above adjustments, becomes same to the original patch: https://github.com/torvalds/linux/commit/f61ff6c06dc8f32c7036013ad802c899ec590607 ] Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Olsa authored
commit 9fc81d87 upstream. We allow PMU driver to change the cpu on which the event should be installed to. This happened in patch: e2d37cd2 ("perf: Allow the PMU driver to choose the CPU on which to install events") This patch also forces all the group members to follow the currently opened events cpu if the group happened to be moved. This and the change of event->cpu in perf_install_in_context() function introduced in: 0cda4c02 ("perf: Introduce perf_pmu_migrate_context()") forces group members to change their event->cpu, if the currently-opened-event's PMU changed the cpu and there is a group move. Above behaviour causes problem for breakpoint events, which uses event->cpu to touch cpu specific data for breakpoints accounting. By changing event->cpu, some breakpoints slots were wrongly accounted for given cpu. Vinces's perf fuzzer hit this issue and caused following WARN on my setup: WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150() Can't find any breakpoint slot [...] This patch changes the group moving code to keep the event's original cpu. Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vince@deater.net> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Olsa authored
commit af91568e upstream. The uncore_collect_events functions assumes that event group might contain only uncore events which is wrong, because it might contain any type of events. This bug leads to uncore framework touching 'not' uncore events, which could end up all sorts of bugs. One was triggered by Vince's perf fuzzer, when the uncore code touched breakpoint event private event space as if it was uncore event and caused BUG: BUG: unable to handle kernel paging request at ffffffff82822068 IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250 ... The code in uncore_assign_events() function was looking for event->hw.idx data while the event was initialized as a breakpoint with different members in event->hw union. This patch forces uncore_collect_events() to collect only uncore events. Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chris Mason authored
commit 6f896054 upstream. Commit 1d52c78a (Btrfs: try not to ENOSPC on log replay) added a check to skip delayed inode updates during log replay because it confuses the enospc code. But the delayed processing will end up ignoring delayed refs from log replay because the inode itself wasn't put through the delayed code. This can end up triggering a warning at commit time: WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34() Which is repeated for each commit because we never process the delayed inode ref update. The fix used here is to change btrfs_delayed_delete_inode_ref to return an error if we're currently in log replay. The caller will do the ref deletion immediately and everything will work properly. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lorenzo Pieralisi authored
commit f43c2718 upstream. On arm64 the TTBR0_EL1 register is set to either the reserved TTBR0 page tables on boot or to the active_mm mappings belonging to user space processes, it must never be set to swapper_pg_dir page tables mappings. When a CPU is booted its active_mm is set to init_mm even though its TTBR0_EL1 points at the reserved TTBR0 page mappings. This implies that when __cpu_suspend is triggered the active_mm can point at init_mm even if the current TTBR0_EL1 register contains the reserved TTBR0_EL1 mappings. Therefore, the mm save and restore executed in __cpu_suspend might turn out to be erroneous in that, if the current->active_mm corresponds to init_mm, on resume from low power it ends up restoring in the TTBR0_EL1 the init_mm mappings that are global and can cause speculation of TLB entries which end up being propagated to user space. This patch fixes the issue by checking the active_mm pointer before restoring the TTBR0 mappings. If the current active_mm == &init_mm, the code sets the TTBR0_EL1 to the reserved TTBR0 mapping instead of switching back to the active_mm, which is the expected behaviour corresponding to the TTBR0_EL1 settings when __cpu_suspend was entered. Fixes: 95322526 ("arm64: kernel: cpu_{suspend/resume} implementation") Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Laura Abbott authored
commit c3684fbb upstream. The function cpu_resume currently lives in the .data section. There's no reason for it to be there since we can use relative instructions without a problem. Move a few cpu_resume data structures out of the assembly file so the .data annotation can be dropped completely and cpu_resume ends up in the read only text section. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lorenzo Pieralisi authored
commit 714f5992 upstream. CPU suspend is the standard kernel interface to be used to enter low-power states on ARM64 systems. Current cpu_suspend implementation by default assumes that all low power states are losing the CPU context, so the CPU registers must be saved and cleaned to DRAM upon state entry. Furthermore, the current cpu_suspend() implementation assumes that if the CPU suspend back-end method returns when called, this has to be considered an error regardless of the return code (which can be successful) since the CPU was not expected to return from a code path that is different from cpu_resume code path - eg returning from the reset vector. All in all this means that the current API does not cope well with low-power states that preserve the CPU context when entered (ie retention states), since first of all the context is saved for nothing on state entry for those states and a successful state entry can return as a normal function return, which is considered an error by the current CPU suspend implementation. This patch refactors the cpu_suspend() API so that it can be split in two separate functionalities. The arm64 cpu_suspend API just provides a wrapper around CPU suspend operation hook. A new function is introduced (for architecture code use only) for states that require context saving upon entry: __cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) __cpu_suspend() saves the context on function entry and calls the so called suspend finisher (ie fn) to complete the suspend operation. The finisher is not expected to return, unless it fails in which case the error is propagated back to the __cpu_suspend caller. The API refactoring results in the following pseudo code call sequence for a suspending CPU, when triggered from a kernel subsystem: /* * int cpu_suspend(unsigned long idx) * @idx: idle state index */ { -> cpu_suspend(idx) |---> CPU operations suspend hook called, if present |--> if (retention_state) |--> direct suspend back-end call (eg PSCI suspend) else |--> __cpu_suspend(idx, &back_end_finisher); } By refactoring the cpu_suspend API this way, the CPU operations back-end has a chance to detect whether idle states require state saving or not and can call the required suspend operations accordingly either through simple function call or indirectly through __cpu_suspend() which carries out state saving and suspend finisher dispatching to complete idle state entry. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lorenzo Pieralisi authored
commit 18ab7db6 upstream. Suspend init function must be marked as __init, since it is not needed after the kernel has booted. This patch moves the cpu_suspend_init() function to the __init section. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rafael J. Wysocki authored
commit 1b1f3e16 upstream. If an ACPI device object whose _STA returns 0 (not present and not functional) has _PR0 or _PS0, its power_manageable flag will be set and acpi_bus_init_power() will return 0 for it. Consequently, if such a device object is passed to the ACPI device PM functions, they will attempt to carry out the requested operation on the device, although they should not do that for devices that are not present. To fix that problem make acpi_bus_init_power() return an error code for devices that are not present which will cause power_manageable to be cleared for them as appropriate in acpi_bus_get_power_flags(). However, the lists of power resources should not be freed for the device in that case, so modify acpi_bus_get_power_flags() to keep those lists even if acpi_bus_init_power() returns an error. Accordingly, when deciding whether or not the lists of power resources need to be freed, acpi_free_power_resources_lists() should check the power.flags.power_resources flag instead of flags.power_manageable, so make that change too. Furthermore, if acpi_bus_attach() sees that flags.initialized is unset for the given device, it should reset the power management settings of the device and re-initialize them from scratch instead of relying on the previous settings (the device may have appeared after being not present previously, for example), so make it use the 'valid' flag of the D0 power state as the initial value of flags.power_manageable for it and call acpi_bus_init_power() to discover its current power state. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Petazzoni authored
commit e5535545 upstream. Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada 38x and Armada XP requires a certain number of conditions: - On Armada 370, the cache policy must be set to write-allocate. - On Armada 375, 38x and XP, the cache policy must be set to write-allocate, the pages must be mapped with the shareable attribute, and the SMP bit must be set Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none of these conditions are met. With Armada 370, the situation is worse: since the processor is single core, regardless of whether CONFIG_SMP or !CONFIG_SMP is used, the cache policy will be set to write-back by the kernel and not write-allocate. Since solving this problem turns out to be quite complicated, and we don't want to let users with a mainline kernel known to have infrequent but existing data corruptions, this commit proposes to simply disable hardware I/O coherency in situations where it is known not to work. And basically, the is_smp() function of the kernel tells us whether it is OK to enable hardware I/O coherency or not, so this commit slightly refactors the coherency_type() function to return COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate type of the coherency fabric in the other case. Thanks to this, the I/O coherency fabric will no longer be used at all in !CONFIG_SMP configurations. It will continue to be used in CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x (which are multiple cores processors), but will no longer be used on Armada 370 (which is a single core processor). In the process, it simplifies the implementation of the coherency_type() function, and adds a missing call to of_node_put(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: e60304f8 ("arm: mvebu: Add hardware I/O Coherency support") Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Machek authored
commit 4bf9636c upstream. Commit 9fc2105a ("ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo") breaks audio in python, and probably elsewhere, with message FATAL: cannot locate cpu MHz in /proc/cpuinfo I'm not the first one to hit it, see for example https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/ https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1 Reading original changelog, I have to say "Stop breaking working setups. You know who you are!". Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nishanth Menon authored
commit 9008d83f upstream. Commit 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to re-use it for OMAP5") Moved logic generic for OMAP5+ as part of the init routine by introducing omap4_pm_init. However, the patch left the powerdomain initial setup, an unused omap4430 es1.0 check and a spurious log "Power Management for TI OMAP4." in the original code. Remove the duplicate code which is already present in omap4_pm_init from omap4_init_static_deps. As part of this change, also move the u-boot version print out of the static dependency function to the omap4_pm_init function. Fixes: 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to re-use it for OMAP5") Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tomasz Figa authored
commit 5e794de5 upstream. The PWM block is required for system clock source so it must be always enabled. This patch fixes boot issues on SMDK6410 which did not have the node enabled explicitly for other purposes. Fixes: eeb93d02 ("clocksource: of: Respect device tree node status") Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lokesh Vutla authored
commit be668835 upstream. OMAP wdt driver supports only ti,omap3-wdt compatible. In DRA7 dt wdt compatible property is defined as ti,omap4-wdt by mistake instead of ti,omap3-wdt. Correcting the typo. Fixes: 6e58b8f1 ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board") Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Luca Abeni authored
commit 269ad801 upstream. The dl_runtime_exceeded() function is supposed to ckeck if a SCHED_DEADLINE task must be throttled, by checking if its current runtime is <= 0. However, it also checks if the scheduling deadline has been missed (the current time is larger than the current scheduling deadline), further decreasing the runtime if this happens. This "double accounting" is wrong: - In case of partitioned scheduling (or single CPU), this happens if task_tick_dl() has been called later than expected (due to small HZ values). In this case, the current runtime is also negative, and replenish_dl_entity() can take care of the deadline miss by recharging the current runtime to a value smaller than dl_runtime - In case of global scheduling on multiple CPUs, scheduling deadlines can be missed even if the task did not consume more runtime than expected, hence penalizing the task is wrong This patch fix this problem by throttling a SCHED_DEADLINE task only when its runtime becomes negative, and not modifying the runtime Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.itSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Luca Abeni authored
commit 6a503c3b upstream. According to global EDF, tasks should be migrated between runqueues without checking if their scheduling deadlines and runtimes are valid. However, SCHED_DEADLINE currently performs such a check: a migration happens doing: deactivate_task(rq, next_task, 0); set_task_cpu(next_task, later_rq->cpu); activate_task(later_rq, next_task, 0); which ends up calling dequeue_task_dl(), setting the new CPU, and then calling enqueue_task_dl(). enqueue_task_dl() then calls enqueue_dl_entity(), which calls update_dl_entity(), which can modify scheduling deadline and runtime, breaking global EDF scheduling. As a result, some of the properties of global EDF are not respected: for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on two cores can have unbounded response times for the third task even if 30/80+40/80+120/170 = 1.5809 < 2 This can be fixed by invoking update_dl_entity() only in case of wakeup, or if this is a new SCHED_DEADLINE task. Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.itSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johannes Berg authored
commit 7b990789 upstream. The change from \d+ to .+ inside __aligned() means that the following structure: struct test { u8 a __aligned(2); u8 b __aligned(2); }; essentially gets modified to struct test { u8 a; }; for purposes of kernel-doc, thus dropping a struct member, which in turns causes warnings and invalid kernel-doc generation. Fix this by replacing the catch-all (".") with anything that's not a semicolon ("[^;]"). Fixes: 9dc30918 ("scripts/kernel-doc: handle struct member __aligned without numbers") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Nishanth Menon <nm@ti.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ryusuke Konishi authored
commit 705304a8 upstream. Same story as in commit 41080b5a ("nfsd race fixes: ext2") (similar ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead of insert_inode_locked() and a bug of a check for dead inodes needs to be fixed. If nilfs_iget() is called from nfsd after nilfs_new_inode() calls insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode. If nilfs_iget() is called before nilfs_new_inode() calls insert_inode_locked4(), it will create an in-core inode and read its data from the on-disk inode. But, nilfs_iget() will find i_nlink equals zero and fail at nilfs_read_inode_common(), which will lead it to call iget_failed() and cleanly fail. However, this sanity check doesn't work as expected for reused on-disk inodes because they leave a non-zero value in i_mode field and it hinders the test of i_nlink. This patch also fixes the issue by removing the test on i_mode that nilfs2 doesn't need. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian Norris authored
commit 68f29815 upstream. The torture test should quit once it actually induces an error in the flash. This step was accidentally removed during refactoring. Without this fix, the torturetest just continues infinitely, or until the maximum cycle count is reached. e.g.: ... [ 7619.218171] mtd_test: error -5 while erasing EB 100 [ 7619.297981] mtd_test: error -5 while erasing EB 100 [ 7619.377953] mtd_test: error -5 while erasing EB 100 [ 7619.457998] mtd_test: error -5 while erasing EB 100 [ 7619.537990] mtd_test: error -5 while erasing EB 100 ... Fixes: 6cf78358 ("mtd: mtd_torturetest: use mtd_test helpers") Signed-off-by: Brian Norris <computersforpeace@gmail.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yan, Zheng authored
commit 00bd8edb upstream. send_mds_reconnect() may call discard_cap_releases() after all release messages have been dropped by cleanup_cap_releases() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com> Cc: Markus Blank-Burian <burian@muenster.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 021b77be upstream. Probably this code was syncing a lot more often then intended because the do_sync variable wasn't set to zero. Fixes: c62988ec ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Benjamin Coddington authored
commit 5a64e569 upstream. Fix a bug where nfsd4_encode_components_esc() includes the esc_end char as an additional string encoding. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: e7a0444a "nfsd: add IPv6 addr escaping to fs_location hosts" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rasmus Villemoes authored
commit ef17af2a upstream. Bugs similar to the one in acbbe6fb (kcmp: fix standard comparison bug) are in rich supply. In this variant, the problem is that struct xdr_netobj::len has type unsigned int, so the expression o1->len - o2->len _also_ has type unsigned int; it has completely well-defined semantics, and the result is some non-negative integer, which is always representable in a long long. But this means that if the conditional triggers, we are guaranteed to return a positive value from compare_blob. In this case it could be fixed by - res = o1->len - o2->len; + res = (long long)o1->len - (long long)o2->len; but I'd rather eliminate the usually broken 'return a - b;' idiom. Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
commit 04a258c1 upstream. When build with Debug the following crash is sometimes observed: Call Trace: [<ffffffff812b9600>] string+0x40/0x100 [<ffffffff812bb038>] vsnprintf+0x218/0x5e0 [<ffffffff810baf7d>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff812bb4c1>] vscnprintf+0x11/0x30 [<ffffffff8107a2f0>] vprintk+0xd0/0x5c0 [<ffffffffa0051ea0>] ? vmbus_process_rescind_offer+0x0/0x110 [hv_vmbus] [<ffffffff8155c71c>] printk+0x41/0x45 [<ffffffffa004ebac>] vmbus_device_unregister+0x2c/0x40 [hv_vmbus] [<ffffffffa0051ecb>] vmbus_process_rescind_offer+0x2b/0x110 [hv_vmbus] ... This happens due to the following race: between 'if (channel->device_obj)' check in vmbus_process_rescind_offer() and pr_debug() in vmbus_device_unregister() the device can disappear. Fix the issue by taking an additional reference to the device before proceeding to vmbus_device_unregister(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian Riesch authored
commit 8bfbe2de upstream. Commit 19e2ad6a ("n_tty: Remove overflow tests from receive_buf() path") moved the increment of read_head into the arguments list of read_buf_addr(). Function calls represent a sequence point in C. Therefore read_head is incremented before the character c is placed in the buffer. Since the circular read buffer is a lock-less design since commit 6d76bd26 ("n_tty: Make N_TTY ldisc receive path lockless"), this creates a race condition that leads to communication errors. This patch modifies the code to increment read_head _after_ the data is placed in the buffer and thus fixes the race for non-SMP machines. To fix the problem for SMP machines, memory barriers must be added in a separate patch. Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Robert Baldyga authored
commit 1ff383a4 upstream. This patch adds waiting until transmit buffer and shifter will be empty before clock disabling. Without this fix it's possible to have clock disabled while data was not transmited yet, which causes unproper state of TX line and problems in following data transfers. Signed-off-by: Robert Baldyga <r.baldyga@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (Red Hat) authored
commit aee4e5f3 upstream. When recording the state of a task for the sched_switch tracepoint a check of task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is because, technically, a task being preempted is really in the TASK_RUNNING state, and that is what should be recorded when tracing a sched_switch, even if the task put itself into another state (it hasn't scheduled out in that state yet). But with the change to use per_cpu preempt counts, the task_thread_info(p)->preempt_count is no longer used, and instead task_preempt_count(p) is used. The problem is that this does not use the current preempt count but a stale one from a previous sched_switch. The task_preempt_count(p) uses saved_preempt_count and not preempt_count(). But for tracing sched_switch, if p is current, we really want preempt_count(). I hit this bug when I was tracing sleep and the call from do_nanosleep() scheduled out in the "RUNNING" state. sleep-4290 [000] 537272.259992: sched_switch: sleep:4290 [120] R ==> swapper/0:0 [120] sleep-4290 [000] 537272.260015: kernel_stack: <stack trace> => __schedule (ffffffff8150864a) => schedule (ffffffff815089f8) => do_nanosleep (ffffffff8150b76c) => hrtimer_nanosleep (ffffffff8108d66b) => SyS_nanosleep (ffffffff8108d750) => return_to_handler (ffffffff8150e8e5) => tracesys_phase2 (ffffffff8150c844) After a bit of hair pulling, I found that the state was really TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE set and caused the sched_switch tracepoint to show it as RUNNING. Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.homeAcked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 01028747 "sched: Create more preempt_count accessors" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tejun Heo authored
commit 9c6ac78e upstream. After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and tests inode->i_state locklessly to see whether it already has all the necessary I_DIRTY bits set. The comment above the barrier doesn't contain any useful information - memory barriers can't ensure "changes are seen by all cpus" by itself. And it sure enough was broken. Please consider the following scenario. CPU 0 CPU 1 ------------------------------------------------------------------------------- enters __writeback_single_inode() grabs inode->i_lock tests PAGECACHE_TAG_DIRTY which is clear enters __set_page_dirty() grabs mapping->tree_lock sets PAGECACHE_TAG_DIRTY releases mapping->tree_lock leaves __set_page_dirty() enters __mark_inode_dirty() smp_mb() sees I_DIRTY_PAGES set leaves __mark_inode_dirty() clears I_DIRTY_PAGES releases inode->i_lock Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem to lead to an immediately critical problem because requeue_inode() later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when deciding whether the inode needs to be requeued for IO and there are enough unintentional memory barriers inbetween, so while the inode ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the IO list. The lack of explicit barrier may also theoretically affect the other I_DIRTY bits which deal with metadata dirtiness. There is no guarantee that a strong enough barrier exists between I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied inode. Filesystem inode writeout path likely has enough stuff which can behave as full barrier but it's theoretically possible that the writeout may not see all the updates from ->dirty_inode(). Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note that I_DIRTY_PAGES needs a special treatment as it always needs to be cleared to be interlocked with the lockless test on __mark_inode_dirty() side. It's cleared unconditionally and reinstated after smp_mb() if the mapping still has dirty pages. Also add comments explaining how and why the barriers are paired. Lightly tested. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Oliver Neukum authored
commit d908f847 upstream. If probe() fails not only the attributes need to be removed but also the memory freed. Reported-by: Ahmed Tamrawi <ahmedtamrawi@gmail.com> Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit 5fabcb4c upstream. We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition() with a user passed in partno value. If we pass in 0x7fffffff, the new target in disk_expand_part_tbl() overflows the 'int' and we access beyond the end of ptbl->part[] and even write to it when we do the rcu_assign_pointer() to assign the new partition. Reported-by: David Ramos <daramos@stanford.edu> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steev Klimaszewski authored
commit 007487f1 upstream. Currently we enable Exynos devices in the multi v7 defconfig, however, when testing on my ODROID-U3, I noticed that USB was not working. Enabling this option causes USB to work, which enables networking support as well since the ODROID-U3 has networking on the USB bus. [arnd] Support for odroid-u3 was added in 3.10, so it would be nice to backport this fix at least that far. Signed-off-by: Steev Klimaszewski <steev@gentoo.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-