1. 27 Jan, 2015 14 commits
  2. 16 Jan, 2015 26 commits
    • Linux 3.14.29 · a2ab9187
      Greg Kroah-Hartman authored
      a2ab9187
    • mm: Don't count the stack guard page towards RLIMIT_STACK · 1bec714a
      Linus Torvalds authored
      commit 690eac53 upstream.
      
      Commit fee7e49d ("mm: propagate error from stack expansion even for
      guard page") made sure that we return the error properly for stack
      growth conditions.  It also theorized that counting the guard page
      towards the stack limit might break something, but also said "Let's see
      if anybody notices".
      
      Somebody did notice.  Apparently android-x86 sets the stack limit very
      close to the limit indeed, and including the guard page in the rlimit
      check causes the android 'zygote' process problems.
      
      So this adds the (fairly trivial) code to make the stack rlimit check be
      against the actual real stack size, rather than the size of the vma that
      includes the guard page.
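
      Roughly, the rlimit check in acct_stack_growth() ends up looking like
      this (a sketch of the idea, not the literal hunk):

        /* charge only the real stack, not the guard page, to RLIMIT_STACK */
        actual_size = size;
        if (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN))
                actual_size -= PAGE_SIZE;
        if (actual_size > rlimit(RLIMIT_STACK))
                return -ENOMEM;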
      Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org>
      Cc: Jay Foad <jay.foad@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bec714a
    • mm: propagate error from stack expansion even for guard page · 11e4f3bf
      Linus Torvalds authored
      commit fee7e49d upstream.
      
      Jay Foad reports that the address sanitizer test (asan) sometimes gets
      confused by a stack pointer that ends up being outside the stack vma
      that is reported by /proc/maps.
      
      This happens due to an interaction between RLIMIT_STACK and the guard
      page: when we do the guard page check, we ignore the potential error
      from the stack expansion, which effectively results in a missing guard
      page, since the expected stack expansion won't have been done.
      
      And since /proc/maps explicitly ignores the guard page (commit
      d7824370: "mm: fix up some user-visible effects of the stack guard
      page"), the stack pointer ends up being outside the reported stack area.
      
      This is the minimal patch: it just propagates the error.  It also
      effectively makes the guard page part of the stack limit, which in turn
      means that the actual real stack is one page less than the stack limit.
      
      Let's see if anybody notices.  We could teach acct_stack_growth() to
      allow an extra page for a grow-up/grow-down stack in the rlimit test,
      but I don't want to add more complexity if it isn't needed.
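
      Condensed, the guard page check now propagates the expansion result
      instead of dropping it (a sketch; the real code also handles abutting
      mappings):

        if ((vma->vm_flags & VM_GROWSDOWN) && address == vma->vm_start)
                return expand_downwards(vma, address - PAGE_SIZE);
        if ((vma->vm_flags & VM_GROWSUP) && address + PAGE_SIZE == vma->vm_end)
                return expand_upwards(vma, address + PAGE_SIZE);
        return 0;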
      Reported-and-tested-by: Jay Foad <jay.foad@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      11e4f3bf
    • mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed · 18d9304b
      Vlastimil Babka authored
      commit 9e5e3661 upstream.
      
      Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
      stuck in a busy loop with nothing left to balance, but
      kswapd_try_to_sleep() failing to sleep.  Their analysis found the cause
      to be a combination of several factors:
      
      1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
      
      2. The process has been killed (by OOM in this case), but has not yet been
         scheduled to remove itself from the waitqueue and die.
      
      3. kswapd checks for throttled processes in prepare_kswapd_sleep():
      
              if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
                      wake_up(&pgdat->pfmemalloc_wait);
                      return false; // kswapd will not go to sleep
              }
      
         However, for a process that was already killed, wake_up() does not remove
         the process from the waitqueue, since try_to_wake_up() checks its state
         first and returns false when the process is no longer waiting.
      
      4. kswapd is running on the same CPU as the only CPU that the process is
         allowed to run on (through cpus_allowed, or possibly single-cpu system).
      
      5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
         encounters no voluntary preemption points and repeatedly fails
         prepare_kswapd_sleep(), blocking the process from running and removing
         itself from the waitqueue, which would let kswapd sleep.
      
      So, the source of the problem is that we prevent kswapd from going to
      sleep until there are processes waiting on the pfmemalloc_wait queue,
      and a process waiting on a queue is guaranteed to be removed from the
      queue only when it gets scheduled.  This was done to make sure that no
      process is left sleeping on pfmemalloc_wait when kswapd itself goes to
      sleep.
      
      However, it isn't necessary to postpone kswapd sleep until the
      pfmemalloc_wait queue actually empties.  To prevent processes from being
      left sleeping, it's actually enough to guarantee that all processes
      waiting on pfmemalloc_wait queue have been woken up by the time we put
      kswapd to sleep.
      
      This patch therefore fixes this issue by substituting 'wake_up' with
      'wake_up_all' and removing 'return false' in the code snippet from
      prepare_kswapd_sleep() above.  Note that if any process puts itself in
      the queue after this waitqueue_active() check, or after the wake up
      itself, it means that the process will also wake up kswapd - and since
      we are under prepare_to_wait(), the wake up won't be missed.  Also we
      update the comment in prepare_kswapd_sleep() to hopefully more clearly
      describe the races it is preventing.
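
      The snippet quoted above therefore becomes, roughly:

              if (waitqueue_active(&pgdat->pfmemalloc_wait))
                      wake_up_all(&pgdat->pfmemalloc_wait);
              /* no 'return false': kswapd may sleep even if the queue was
                 active, since every waiter has been woken by now */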
      
      Fixes: 5515061d ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      18d9304b
    • mmc: sdhci: Fix sleep in atomic after inserting SD card · b36cd20d
      Krzysztof Kozlowski authored
      commit 2836766a upstream.
      
      Sleep in atomic context happened on Trats2 board after inserting or
      removing SD card because mmc_gpio_get_cd() was called under spin lock.
      
      Fix this by moving card detection earlier, before acquiring spin lock.
      The mmc_gpio_get_cd() call does not have to be protected by spin lock
      because it does not access any sdhci internal data.
      The sdhci_do_get_cd() call accesses host flags (SDHCI_DEVICE_DEAD). After
      moving it outside of the spin lock it could theoretically race with driver
      removal, but there is still no actual protection against manual card
      eject.
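
      The resulting ordering in the request path is roughly (a sketch, not
      the exact hunk):

        /* card detect may sleep in gpiolib, so do it before taking the lock */
        present = mmc_gpio_get_cd(host->mmc);

        spin_lock_irqsave(&host->lock, flags);
        /* ... the rest of the request handling runs under the spin lock ... */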
      
      Dmesg after inserting SD card:
      [   41.663414] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1511
      [   41.670469] in_atomic(): 1, irqs_disabled(): 128, pid: 30, name: kworker/u8:1
      [   41.677580] INFO: lockdep is turned off.
      [   41.681486] irq event stamp: 61972
      [   41.684872] hardirqs last  enabled at (61971): [<c0490ee0>] _raw_spin_unlock_irq+0x24/0x5c
      [   41.693118] hardirqs last disabled at (61972): [<c04907ac>] _raw_spin_lock_irq+0x18/0x54
      [   41.701190] softirqs last  enabled at (61648): [<c0026fd4>] __do_softirq+0x234/0x2c8
      [   41.708914] softirqs last disabled at (61631): [<c00273a0>] irq_exit+0xd0/0x114
      [   41.716206] Preemption disabled at:[<  (null)>]   (null)
      [   41.721500]
      [   41.722985] CPU: 3 PID: 30 Comm: kworker/u8:1 Tainted: G        W      3.18.0-rc5-next-20141121 #883
      [   41.732111] Workqueue: kmmcd mmc_rescan
      [   41.735945] [<c0014d2c>] (unwind_backtrace) from [<c0011c80>] (show_stack+0x10/0x14)
      [   41.743661] [<c0011c80>] (show_stack) from [<c0489d14>] (dump_stack+0x70/0xbc)
      [   41.750867] [<c0489d14>] (dump_stack) from [<c0228b74>] (gpiod_get_raw_value_cansleep+0x18/0x30)
      [   41.759628] [<c0228b74>] (gpiod_get_raw_value_cansleep) from [<c03646e8>] (mmc_gpio_get_cd+0x38/0x58)
      [   41.768821] [<c03646e8>] (mmc_gpio_get_cd) from [<c036d378>] (sdhci_request+0x50/0x1a4)
      [   41.776808] [<c036d378>] (sdhci_request) from [<c0357934>] (mmc_start_request+0x138/0x268)
      [   41.785051] [<c0357934>] (mmc_start_request) from [<c0357cc8>] (mmc_wait_for_req+0x58/0x1a0)
      [   41.793469] [<c0357cc8>] (mmc_wait_for_req) from [<c0357e68>] (mmc_wait_for_cmd+0x58/0x78)
      [   41.801714] [<c0357e68>] (mmc_wait_for_cmd) from [<c0361c00>] (mmc_io_rw_direct_host+0x98/0x124)
      [   41.810480] [<c0361c00>] (mmc_io_rw_direct_host) from [<c03620f8>] (sdio_reset+0x2c/0x64)
      [   41.818641] [<c03620f8>] (sdio_reset) from [<c035a3d8>] (mmc_rescan+0x254/0x2e4)
      [   41.826028] [<c035a3d8>] (mmc_rescan) from [<c003a0e0>] (process_one_work+0x180/0x3f4)
      [   41.833920] [<c003a0e0>] (process_one_work) from [<c003a3bc>] (worker_thread+0x34/0x4b0)
      [   41.841991] [<c003a3bc>] (worker_thread) from [<c003fed8>] (kthread+0xe4/0x104)
      [   41.849285] [<c003fed8>] (kthread) from [<c000f268>] (ret_from_fork+0x14/0x2c)
      [   42.038276] mmc0: new high speed SDHC card at address 1234
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 94144a46 ("mmc: sdhci: add get_cd() implementation")
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b36cd20d
    • spi: fsl: Fix problem with multi message transfers · 2937d5ac
      Stefan Roese authored
      commit 4302a596 upstream.
      
      When used via spidev with more than one message to transfer via
      SPI_IOC_MESSAGE, the current implementation would return with
      -EINVAL, since bits_per_word and speed_hz are set in all
      transfer structs. And in the 2nd loop, status will stay at
      -EINVAL as it's not overwritten again via fsl_spi_setup_transfer().
      
      This patch changes this behaviour by first checking if one of
      the messages uses different settings. If this is the case,
      the function will return with -EINVAL. If not, the messages
      are transferred correctly.
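
      A sketch of the added up-front check (not the literal hunk):

        /* allow per-transfer settings only if they are all the same */
        first = list_first_entry(&m->transfers, struct spi_transfer,
                                 transfer_list);
        list_for_each_entry(t, &m->transfers, transfer_list) {
                if (t->bits_per_word != first->bits_per_word ||
                    t->speed_hz != first->speed_hz)
                        return -EINVAL;
        }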
      Signed-off-by: Stefan Roese <sr@denx.de>
      Signed-off-by: Mark Brown <broonie@linaro.org>
      Cc: Esben Haabendal <esbenhaabendal@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2937d5ac
    • perf session: Do not fail on processing out of order event · ceaefcdf
      Jiri Olsa authored
      commit f61ff6c0 upstream.
      
      Linus reported the perf report command being interrupted due to the
      processing of an 'out of order' event, with the following error:
      
        Timestamp below last timeslice flush
        0x5733a8 [0x28]: failed to process type: 3
      
      I could reproduce the issue and in my case it was caused by one CPU
      (mmap) being behind during record and userspace mmap reader seeing the
      data after other CPUs data were already stored.
      
      This is expected under some circumstances because we need to limit the
      number of events that we queue for reordering when we receive a
      PERF_RECORD_FINISHED_ROUND or when we force flush due to memory
      pressure.
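
      Conceptually, the queueing code changes from failing to counting (a
      sketch; field names are approximate):

        if (timestamp < s->ordered_samples.last_flush) {
                /* was: warn "Timestamp below last timeslice flush" and fail */
                s->stats.nr_unordered_events++;
        }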
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1417016371-30249-1-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      [zhangzhiqiang: backport to 3.10:
       - adjust context
       - in commit f61ff6c0, struct events_stats was defined in tools/perf/util/event.h,
         while in 3.10 stable it is defined in tools/perf/util/hist.h.
       - in 3.10 stable there is no pr_oe_time(), which is used for debug.
       - After the above adjustments, this becomes the same as the original patch:
         https://github.com/torvalds/linux/commit/f61ff6c06dc8f32c7036013ad802c899ec590607
      ]
      Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ceaefcdf
    • perf: Fix events installation during moving group · c8af8989
      Jiri Olsa authored
      commit 9fc81d87 upstream.
      
      We allow PMU driver to change the cpu on which the event
      should be installed to. This happened in patch:
      
        e2d37cd2 ("perf: Allow the PMU driver to choose the CPU on which to install events")
      
      This patch also forces all the group members to follow
      the currently opened events cpu if the group happened
      to be moved.
      
      This and the change of event->cpu in perf_install_in_context()
      function introduced in:
      
        0cda4c02 ("perf: Introduce perf_pmu_migrate_context()")
      
      forces group members to change their event->cpu,
      if the currently-opened-event's PMU changed the cpu
      and there is a group move.
      
      The above behaviour causes a problem for breakpoint events, which
      use event->cpu to touch cpu specific data for breakpoints
      accounting. By changing event->cpu, some breakpoint slots were
      wrongly accounted for a given cpu.
      
      Vince's perf fuzzer hit this issue and caused the following
      WARN on my setup:
      
         WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150()
         Can't find any breakpoint slot
         [...]
      
      This patch changes the group moving code to keep the event's
      original cpu.
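
      In the perf_event_open() move_group path this boils down to installing
      each event on its own cpu instead of the syscall's cpu argument,
      roughly (a sketch):

        if (move_group) {
                perf_install_in_context(ctx, group_leader, group_leader->cpu);
                list_for_each_entry(sibling, &group_leader->sibling_list,
                                    group_entry) {
                        /* keep the sibling's original cpu, not 'cpu' */
                        perf_install_in_context(ctx, sibling, sibling->cpu);
                }
        }
        perf_install_in_context(ctx, event, event->cpu);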
      Reported-by: Vince Weaver <vince@deater.net>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8af8989
    • perf/x86/intel/uncore: Make sure only uncore events are collected · ec9c772a
      Jiri Olsa authored
      commit af91568e upstream.
      
      The uncore_collect_events() function assumes that an event group
      contains only uncore events, which is wrong, because it
      might contain any type of events.
      
      This bug leads to the uncore framework touching non-uncore events,
      which could end up causing all sorts of bugs.
      
      One was triggered by Vince's perf fuzzer, when the uncore code
      touched breakpoint event private event space as if it was uncore
      event and caused BUG:
      
         BUG: unable to handle kernel paging request at ffffffff82822068
         IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250
         ...
      
      The code in uncore_assign_events() function was looking for
      event->hw.idx data while the event was initialized as a
      breakpoint with different members in event->hw union.
      
      This patch forces uncore_collect_events() to collect only uncore
      events.
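
      A sketch of the collection loop after the change (the is_uncore_event()
      helper name is approximate):

        list_for_each_entry(event, &leader->sibling_list, group_entry) {
                /* skip group siblings that do not belong to an uncore PMU */
                if (!is_uncore_event(event) ||
                    event->state <= PERF_EVENT_STATE_OFF)
                        continue;
                box->event_list[n++] = event;
        }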
      Reported-by: Vince Weaver <vince@deater.net>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec9c772a
    • Btrfs: don't delay inode ref updates during log replay · 33417386
      Chris Mason authored
      commit 6f896054 upstream.
      
      Commit 1d52c78a (Btrfs: try not to ENOSPC on log replay) added a
      check to skip delayed inode updates during log replay because it
      confuses the enospc code.  But the delayed processing will end up
      ignoring delayed refs from log replay because the inode itself wasn't
      put through the delayed code.
      
      This can end up triggering a warning at commit time:
      
      WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
      
      Which is repeated for each commit because we never process the delayed
      inode ref update.
      
      The fix used here is to change btrfs_delayed_delete_inode_ref to return
      an error if we're currently in log replay.  The caller will do the ref
      deletion immediately and everything will work properly.
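
      The change is essentially an early bail-out in
      btrfs_delayed_delete_inode_ref() (a sketch; the exact flag name may
      differ between kernel versions):

        /*
         * We don't do delayed inode updates during log replay, so don't
         * queue delayed inode refs either; let the caller delete the ref
         * directly.
         */
        if (BTRFS_I(inode)->root->fs_info->log_root_recovering)
                return -EAGAIN;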
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      33417386
    • arm64: kernel: fix __cpu_suspend mm switch on warm-boot · 852cacf6
      Lorenzo Pieralisi authored
      commit f43c2718 upstream.
      
      On arm64 the TTBR0_EL1 register is set either to the reserved TTBR0
      page tables on boot or to the active_mm mappings belonging to user space
      processes; it must never be set to the swapper_pg_dir page table mappings.
      
      When a CPU is booted its active_mm is set to init_mm even though its
      TTBR0_EL1 points at the reserved TTBR0 page mappings. This implies
      that when __cpu_suspend is triggered the active_mm can point at
      init_mm even if the current TTBR0_EL1 register contains the reserved
      TTBR0_EL1 mappings.
      
      Therefore, the mm save and restore executed in __cpu_suspend might
      turn out to be erroneous in that, if the current->active_mm corresponds
      to init_mm, on resume from low power it ends up restoring in the
      TTBR0_EL1 the init_mm mappings that are global and can cause speculation
      of TLB entries which end up being propagated to user space.
      
      This patch fixes the issue by checking the active_mm pointer before
      restoring the TTBR0 mappings. If the current active_mm == &init_mm,
      the code sets the TTBR0_EL1 to the reserved TTBR0 mapping instead of
      switching back to the active_mm, which is the expected behaviour
      corresponding to the TTBR0_EL1 settings when __cpu_suspend was entered.
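
      The resume path in __cpu_suspend() therefore becomes, roughly:

        /* never load init_mm/swapper mappings into TTBR0_EL1 */
        if (mm == &init_mm)
                cpu_set_reserved_ttbr0();
        else
                cpu_switch_mm(mm->pgd, mm);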
      
      Fixes: 95322526 ("arm64: kernel: cpu_{suspend/resume} implementation")
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      852cacf6
    • arm64: Move cpu_resume into the text section · 219591c5
      Laura Abbott authored
      commit c3684fbb upstream.
      
      The function cpu_resume currently lives in the .data section.
      There's no reason for it to be there since we can use relative
      instructions without a problem. Move a few cpu_resume data
      structures out of the assembly file so the .data annotation
      can be dropped completely and cpu_resume ends up in the read
      only text section.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: Kees Cook <keescook@chromium.org>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      219591c5
    • arm64: kernel: refactor the CPU suspend API for retention states · 0e42d84b
      Lorenzo Pieralisi authored
      commit 714f5992 upstream.
      
      CPU suspend is the standard kernel interface to be used to enter
      low-power states on ARM64 systems. The current cpu_suspend implementation
      by default assumes that all low power states lose the CPU context,
      so the CPU registers must be saved and cleaned to DRAM upon state
      entry. Furthermore, the current cpu_suspend() implementation assumes
      that if the CPU suspend back-end method returns when called, this has
      to be considered an error regardless of the return code (which can be
      successful), since the CPU was not expected to return from a code path
      other than the cpu_resume code path - eg returning from the reset vector.
      
      All in all this means that the current API does not cope well with low-power
      states that preserve the CPU context when entered (ie retention states),
      since first of all the context is saved for nothing on state entry for
      those states and a successful state entry can return as a normal function
      return, which is considered an error by the current CPU suspend
      implementation.
      
      This patch refactors the cpu_suspend() API so that it can be split in
      two separate functionalities. The arm64 cpu_suspend API just provides
      a wrapper around CPU suspend operation hook. A new function is
      introduced (for architecture code use only) for states that require
      context saving upon entry:
      
      __cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
      
      __cpu_suspend() saves the context on function entry and calls the
      so called suspend finisher (ie fn) to complete the suspend operation.
      The finisher is not expected to return, unless it fails in which case
      the error is propagated back to the __cpu_suspend caller.
      
      The API refactoring results in the following pseudo code call sequence for a
      suspending CPU, when triggered from a kernel subsystem:
      
      /*
       * int cpu_suspend(unsigned long idx)
       * @idx: idle state index
       */
      {
      -> cpu_suspend(idx)
      	|---> CPU operations suspend hook called, if present
      		|--> if (retention_state)
      			|--> direct suspend back-end call (eg PSCI suspend)
      		     else
      			|--> __cpu_suspend(idx, &back_end_finisher);
      }
      
      By refactoring the cpu_suspend API this way, the CPU operations back-end
      has a chance to detect whether idle states require state saving or not
      and can call the required suspend operations accordingly either through
      simple function call or indirectly through __cpu_suspend() which carries out
      state saving and suspend finisher dispatching to complete idle state entry.
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e42d84b
    • arm64: kernel: add missing __init section marker to cpu_suspend_init · 5ef30fef
      Lorenzo Pieralisi authored
      commit 18ab7db6 upstream.
      
      Suspend init function must be marked as __init, since it is not needed
      after the kernel has booted. This patch moves the cpu_suspend_init()
      function to the __init section.
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ef30fef
    • ACPI / PM: Fix PM initialization for devices that are not present · 47abb28e
      Rafael J. Wysocki authored
      commit 1b1f3e16 upstream.
      
      If an ACPI device object whose _STA returns 0 (not present and not
      functional) has _PR0 or _PS0, its power_manageable flag will be set
      and acpi_bus_init_power() will return 0 for it.  Consequently, if
      such a device object is passed to the ACPI device PM functions, they
      will attempt to carry out the requested operation on the device,
      although they should not do that for devices that are not present.
      
      To fix that problem make acpi_bus_init_power() return an error code
      for devices that are not present which will cause power_manageable to
      be cleared for them as appropriate in acpi_bus_get_power_flags().
      However, the lists of power resources should not be freed for the
      device in that case, so modify acpi_bus_get_power_flags() to keep
      those lists even if acpi_bus_init_power() returns an error.
      Accordingly, when deciding whether or not the lists of power
      resources need to be freed, acpi_free_power_resources_lists()
      should check the power.flags.power_resources flag instead of
      flags.power_manageable, so make that change too.
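
      A sketch of the idea in acpi_bus_init_power() (the exact condition in
      the actual patch may differ):

        /*
         * _STA reports the device as not present and not functional, so
         * report an error and let the caller clear power_manageable.
         */
        if (!device->status.present && !device->status.functional)
                return -ENXIO;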
      
      Furthermore, if acpi_bus_attach() sees that flags.initialized is
      unset for the given device, it should reset the power management
      settings of the device and re-initialize them from scratch instead
      of relying on the previous settings (the device may have appeared
      after being not present previously, for example), so make it use
      the 'valid' flag of the D0 power state as the initial value of
      flags.power_manageable for it and call acpi_bus_init_power() to
      discover its current power state.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      47abb28e
    • ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP · 8d33f514
      Thomas Petazzoni authored
      commit e5535545 upstream.
      
      Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada
      38x and Armada XP requires a certain number of conditions:
      
       - On Armada 370, the cache policy must be set to write-allocate.
      
       - On Armada 375, 38x and XP, the cache policy must be set to
         write-allocate, the pages must be mapped with the shareable
         attribute, and the SMP bit must be set
      
      Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions
      are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none
      of these conditions are met. With Armada 370, the situation is worse:
      since the processor is single core, regardless of whether CONFIG_SMP
      or !CONFIG_SMP is used, the cache policy will be set to write-back by
      the kernel and not write-allocate.
      
      Since solving this problem turns out to be quite complicated, and we
      don't want to leave users of a mainline kernel exposed to infrequent
      but real data corruption, this commit proposes to simply disable
      hardware I/O coherency in situations where it is known not to work.
      
      And basically, the is_smp() function of the kernel tells us whether it
      is OK to enable hardware I/O coherency or not, so this commit slightly
      refactors the coherency_type() function to return
      COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate
      type of the coherency fabric in the other case.
      
      Thanks to this, the I/O coherency fabric will no longer be used at all
      in !CONFIG_SMP configurations. It will continue to be used in
      CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x
      (which are multiple cores processors), but will no longer be used on
      Armada 370 (which is a single core processor).
      
      In the process, it simplifies the implementation of the
      coherency_type() function, and adds a missing call to of_node_put().
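
      The refactored helper then reads roughly like this (a sketch):

        static int coherency_type(void)
        {
                /*
                 * Hardware I/O coherency is only safe when the conditions
                 * above are met, which in practice means an is_smp() kernel.
                 */
                if (!is_smp())
                        return COHERENCY_FABRIC_TYPE_NONE;

                /* ... otherwise return the fabric type matched from the DT ... */
        }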
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Fixes: e60304f8 ("arm: mvebu: Add hardware I/O Coherency support")
      Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.com
      Signed-off-by: Jason Cooper <jason@lakedaemon.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      8d33f514
    • Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo" · 3f4ddf1a
      Pavel Machek authored
      commit 4bf9636c upstream.
      
      Commit 9fc2105a ("ARM: 7830/1: delay: don't bother reporting
      bogomips in /proc/cpuinfo") breaks audio in python, and probably
      elsewhere, with message
      
        FATAL: cannot locate cpu MHz in /proc/cpuinfo
      
      I'm not the first one to hit it, see for example
      
        https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
        https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1
      
      Reading original changelog, I have to say "Stop breaking working setups.
      You know who you are!".
      Signed-off-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f4ddf1a
    • ARM: OMAP4: PM: Only do static dependency configuration in omap4_init_static_deps · dab35042
      Nishanth Menon authored
      commit 9008d83f upstream.
      
      Commit 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to
      re-use it for OMAP5") moved logic generic for OMAP5+ into the init
      routine by introducing omap4_pm_init. However, the patch left the
      powerdomain initial setup, an unused omap4430 es1.0 check and a
      spurious log "Power Management for TI OMAP4." in the original code.
      
      Remove the duplicate code which is already present in omap4_pm_init from
      omap4_init_static_deps.
      
      As part of this change, also move the u-boot version print out of the
      static dependency function to the omap4_pm_init function.
      
      Fixes: 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to re-use it for OMAP5")
      Signed-off-by: Nishanth Menon <nm@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dab35042
    • ARM: dts: Enable PWM node by default for s3c64xx · 8be47453
      Tomasz Figa authored
      commit 5e794de5 upstream.
      
      The PWM block is required for the system clock source, so it must always
      be enabled. This patch fixes boot issues on SMDK6410, which did not have
      the node enabled explicitly for other purposes.
      
      Fixes: eeb93d02 ("clocksource: of: Respect device tree node status")
      Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
      Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8be47453
    • ARM: dts: DRA7: wdt: Fix compatible property for watchdog node · 9feeb8f3
      Lokesh Vutla authored
      commit be668835 upstream.
      
      The OMAP wdt driver supports only the ti,omap3-wdt compatible. In the
      DRA7 dts, the wdt compatible property is defined as ti,omap4-wdt by
      mistake instead of ti,omap3-wdt. Correcting the typo.
      
      Fixes: 6e58b8f1 ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board")
      Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9feeb8f3
    • sched/deadline: Avoid double-accounting in case of missed deadlines · cae817ad
      Luca Abeni authored
      commit 269ad801 upstream.
      
      The dl_runtime_exceeded() function is supposed to check if
      a SCHED_DEADLINE task must be throttled, by checking if its
      current runtime is <= 0. However, it also checks if the
      scheduling deadline has been missed (the current time is
      larger than the current scheduling deadline), further
      decreasing the runtime if this happens.
      This "double accounting" is wrong:
      
      - In case of partitioned scheduling (or single CPU), this
        happens if task_tick_dl() has been called later than expected
        (due to small HZ values). In this case, the current runtime is
        also negative, and replenish_dl_entity() can take care of the
        deadline miss by recharging the current runtime to a value smaller
        than dl_runtime
      
      - In case of global scheduling on multiple CPUs, scheduling
        deadlines can be missed even if the task did not consume more
        runtime than expected, hence penalizing the task is wrong
      
      This patch fixes this problem by throttling a SCHED_DEADLINE task
      only when its runtime becomes negative, and not modifying the runtime.
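
      After the change the check is essentially just (a sketch):

        static int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
        {
                /*
                 * A missed deadline is handled by replenish_dl_entity(), so
                 * only actual runtime exhaustion throttles the task.
                 */
                return (dl_se->runtime <= 0);
        }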
      Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Juri Lelli <juri.lelli@gmail.com>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cae817ad
    • sched/deadline: Fix migration of SCHED_DEADLINE tasks · 678c8bb7
      Luca Abeni authored
      commit 6a503c3b upstream.
      
      According to global EDF, tasks should be migrated between runqueues
      without checking if their scheduling deadlines and runtimes are valid.
      However, SCHED_DEADLINE currently performs such a check:
      a migration happens doing:
      
      	deactivate_task(rq, next_task, 0);
      	set_task_cpu(next_task, later_rq->cpu);
      	activate_task(later_rq, next_task, 0);
      
      which ends up calling dequeue_task_dl(), setting the new CPU, and then
      calling enqueue_task_dl().
      
      enqueue_task_dl() then calls enqueue_dl_entity(), which calls
      update_dl_entity(), which can modify scheduling deadline and runtime,
      breaking global EDF scheduling.
      
      As a result, some of the properties of global EDF are not respected:
      for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on
      two cores can have unbounded response times for the third task even
      if 30/80+40/80+120/170 = 1.5809 < 2
      
      This can be fixed by invoking update_dl_entity() only in case of
      wakeup, or if this is a new SCHED_DEADLINE task.
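
      Roughly, the enqueue path becomes (a sketch):

        /* only a wakeup, or a brand new task, may move the deadline forward */
        if (dl_se->dl_new || (flags & ENQUEUE_WAKEUP))
                update_dl_entity(dl_se, pi_se);
        else if (flags & ENQUEUE_REPLENISH)
                replenish_dl_entity(dl_se, pi_se);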
      Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Juri Lelli <juri.lelli@gmail.com>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      678c8bb7
    • scripts/kernel-doc: don't eat struct members with __aligned · 11d1b5db
      Johannes Berg authored
      commit 7b990789 upstream.
      
      The change from \d+ to .+ inside __aligned() means that the following
      structure:
      
        struct test {
              u8 a __aligned(2);
              u8 b __aligned(2);
        };
      
      essentially gets modified to
      
        struct test {
              u8 a;
        };
      
      for purposes of kernel-doc, thus dropping a struct member, which in
      turn causes warnings and invalid kernel-doc generation.
      
      Fix this by replacing the catch-all (".") with anything that's not a
      semicolon ("[^;]").
      
      Fixes: 9dc30918 ("scripts/kernel-doc: handle struct member __aligned without numbers")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Cc: Nishanth Menon <nm@ti.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      11d1b5db
    • nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races · 1cde1254
      Ryusuke Konishi authored
      commit 705304a8 upstream.
      
      Same story as in commit 41080b5a ("nfsd race fixes: ext2") (similar
      ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
      of insert_inode_locked() and a bug of a check for dead inodes needs to
      be fixed.
      
      If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
      insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
      the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.
      
      If nilfs_iget() is called before nilfs_new_inode() calls
      insert_inode_locked4(), it will create an in-core inode and read its
      data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
      zero and fail at nilfs_read_inode_common(), which will lead it to call
      iget_failed() and cleanly fail.
      
      However, this sanity check doesn't work as expected for reused on-disk
      inodes because they leave a non-zero value in i_mode field and it
      hinders the test of i_nlink.  This patch also fixes the issue by
      removing the test on i_mode that nilfs2 doesn't need.
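
      The dead-inode test thus shrinks to something like (a sketch; the exact
      error code may differ):

        /* a reused on-disk inode keeps a stale i_mode, so test only i_nlink */
        if (inode->i_nlink == 0)
                return -ESTALE;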
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1cde1254
    • mtd: tests: abort torturetest on erase errors · b9c2571e
      Brian Norris authored
      commit 68f29815 upstream.
      
      The torture test should quit once it actually induces an error in the
      flash. This step was accidentally removed during refactoring.
      
      Without this fix, the torturetest just continues infinitely, or until
      the maximum cycle count is reached. e.g.:
      
         ...
         [ 7619.218171] mtd_test: error -5 while erasing EB 100
         [ 7619.297981] mtd_test: error -5 while erasing EB 100
         [ 7619.377953] mtd_test: error -5 while erasing EB 100
         [ 7619.457998] mtd_test: error -5 while erasing EB 100
         [ 7619.537990] mtd_test: error -5 while erasing EB 100
         ...
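
      The fix is essentially to bail out of the torture loop on the first
      erase failure (a sketch; the helper name is approximate):

                err = mtdtest_erase_good_eraseblocks(mtd, bad_ebs, eb, ebcnt);
                if (err)
                        goto out; /* stop: the flash has actually failed */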
      
      Fixes: 6cf78358 ("mtd: mtd_torturetest: use mtd_test helpers")
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9c2571e
    • ceph: fix null pointer dereference in discard_cap_releases() · 92a34c87
      Yan, Zheng authored
      commit 00bd8edb upstream.
      
      send_mds_reconnect() may call discard_cap_releases() after all
      release messages have been dropped by cleanup_cap_releases().
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      Cc: Markus Blank-Burian <burian@muenster.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      92a34c87