1. 09 Dec, 2017 6 commits
    • mm, oom_reaper: gather each vma to prevent leaking TLB entry · ee23ae91
      Wang Nan authored
      commit 687cb088 upstream.
      
      tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
      space.  In this case, tlb->fullmm is true.  Some archs, such as arm64,
      don't flush the TLB when tlb->fullmm is true:
      
        commit 5a7862e8 ("arm64: tlbflush: avoid flushing when fullmm == 1").
      
      This leaks TLB entries.
      
      Will clarifies his patch:
       "Basically, we tag each address space with an ASID (PCID on x86) which
        is resident in the TLB. This means we can elide TLB invalidation when
        pulling down a full mm because we won't ever assign that ASID to
        another mm without doing TLB invalidation elsewhere (which actually
        just nukes the whole TLB).
      
        I think that means that we could potentially not fault on a kernel
        uaccess, because we could hit in the TLB"
      
      There can be a window between complete_signal() sending an IPI to other
      cores and all threads sharing this mm actually being kicked off those
      cores.  In this window, the oom reaper may call tlb_flush_mmu_tlbonly()
      to flush the TLB and then free pages.  However, due to the above problem,
      the TLB entries are not really flushed on arm64.  Other threads can
      still access these pages through the stale TLB entries.  Moreover, a
      copy_to_user() can also write to these pages without generating a page
      fault, causing use-after-free bugs.
      
      This patch gathers each vma instead of gathering the full vm space.  In
      this case tlb->fullmm is not true.  The behavior of the oom reaper becomes
      similar to munmapping before do_exit, which should be safe for all archs.
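
      The difference between the two gathering strategies can be illustrated
      with a toy model. This is only a sketch: ToyTLB, reap() and the
      arch_skips_fullmm_flush flag are hypothetical names made up for the
      illustration, not kernel APIs, and the model ignores ASIDs, IPIs and
      page freeing entirely.

```python
# Toy model only: every name here is hypothetical, not a kernel API.
class ToyTLB:
    """Maps virtual page -> physical page for a single address space."""

    def __init__(self):
        self.entries = {}

    def fill(self, vpage, ppage):
        self.entries[vpage] = ppage

    def flush_range(self, start, end, fullmm, arch_skips_fullmm_flush):
        # Models an arch that elides invalidation when fullmm is set,
        # on the assumption that the whole address space is going away.
        if fullmm and arch_skips_fullmm_flush:
            return
        for vpage in [v for v in self.entries if start <= v < end]:
            del self.entries[vpage]


def reap(tlb, vmas, per_vma, arch_skips_fullmm_flush=True):
    """Flush for every VMA; return the TLB entries that survive."""
    if per_vma:
        # Patched behaviour: one gather per VMA, so fullmm is false.
        for start, end in vmas:
            tlb.flush_range(start, end, False, arch_skips_fullmm_flush)
    else:
        # Old behaviour: one gather covering the whole address space.
        tlb.flush_range(0, float("inf"), True, arch_skips_fullmm_flush)
    return dict(tlb.entries)


def make_tlb():
    tlb = ToyTLB()
    tlb.fill(0x10, 0xA0)
    tlb.fill(0x20, 0xB0)
    return tlb


vmas = [(0x10, 0x18), (0x20, 0x28)]
stale_full = reap(make_tlb(), vmas, per_vma=False)
stale_per_vma = reap(make_tlb(), vmas, per_vma=True)
print(len(stale_full), len(stale_per_vma))
```

      In this model the full-mm gather leaves both entries in place, while
      the per-vma gather leaves none, which is the property the patch relies
      on.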
      
      Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [backported to 4.9 stable tree]
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "crypto: caam - get rid of tasklet" · 0de12a77
      Horia Geantă authored
      commit 2b163b5b upstream.
      
      This reverts commit 66d2e202.
      
      Quoting from Russell's findings:
      https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg21136.html
      
      [quote]
      Okay, I've re-tested, using a different way of measuring, because using
      openssl speed is impractical for off-loaded engines.  I've decided to
      use this way to measure the performance:
      
      dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5
      
      For the threaded IRQs case gives:
      
      0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
      0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
      0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
      	=> 5.36s => 25.0MB/s
      
      and the tasklet case:
      
      0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
      0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
      0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
      	=> 4.95 => 27.1MB/s
      
      which corresponds to an 8% slowdown for the threaded IRQ case.  So,
      tasklets are indeed faster than threaded IRQs.
      
      [...]
      
      I think I've proven from the above that this patch needs to be reverted
      due to the performance regression, and that there _is_ most definitely
      a detrimental effect of switching from tasklets to threaded IRQs.
      [/quote]
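
      The quoted averages and throughput figures can be re-derived from the
      elapsed times above. The sketch below assumes that "MB/s" means 10^6
      bytes per second, which is the reading that reproduces the quoted
      25.0 and 27.1; the variable and function names are illustrative only.

```python
# 128 blocks of 1048576 bytes, as in the dd command quoted above.
TOTAL_BYTES = 128 * 1048576

# Elapsed wall-clock times (seconds) copied from the quoted runs.
threaded_irq = [5.30, 5.18, 5.61]
tasklet = [4.83, 5.16, 4.87]


def mean(values):
    return sum(values) / len(values)


def throughput_mb_s(seconds):
    # Assumption: MB = 10**6 bytes.
    return TOTAL_BYTES / seconds / 1e6


threaded_mean = mean(threaded_irq)
tasklet_mean = mean(tasklet)

# Relative increase in elapsed time for threaded IRQs versus tasklets.
slowdown_pct = (threaded_mean / tasklet_mean - 1) * 100

print(round(threaded_mean, 2), round(throughput_mb_s(threaded_mean), 1))
print(round(tasklet_mean, 2), round(throughput_mb_s(tasklet_mean), 1))
print(round(slowdown_pct))
```

      Measured this way, the slowdown is the ratio of the two mean elapsed
      times, which rounds to the 8% quoted above.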
      Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume() · cffc01d2
      Stefan Agner authored
      commit 9fd99f4f upstream.
      
      The resume helpers wait for a vblank to occur, hence IRQs need
      to be enabled. This avoids the following warning during resume:
        WARNING: CPU: 0 PID: 314 at drivers/gpu/drm/drm_atomic_helper.c:1249 drm_atomic_helper_wait_for_vblanks.part.1+0x284/0x288
        [CRTC:28:crtc-0] vblank wait timed out
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/fsl-dcu: avoid disabling pixel clock twice on suspend · 48f4d1f7
      Stefan Agner authored
      commit 9306e996 upstream.
      
      With commit 0a70c998 ("drm/fsl-dcu: enable pixel clock when
      enabling CRTC") the pixel clock is controlled by the CRTC code.
      Disabling the pixel clock in suspend leads to a warning due to
      the second clk_disable_unprepare call:
        WARNING: CPU: 0 PID: 359 at drivers/clk/clk.c:594 clk_core_disable+0x8c/0x90
      
      Remove clk_disable_unprepare call for pixel clock to avoid
      unbalanced clock disable on suspend.
      
      Fixes: 0a70c998 ("drm/fsl-dcu: enable pixel clock when enabling CRTC")
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bcache: recover data from backing when data is clean · 9db9b5f2
      Rui Hua authored
      commit e393aa24 upstream.
      
      When we send a read request and hit clean data in the cache device, there
      is a situation called cache read race in bcache (see the comment at the
      tail of cache_look_up(); the following explanation is copied from there):
      The bucket we're reading from might be reused while our bio is in flight,
      and we could then end up reading the wrong data. We guard against this
      by checking (in bch_cache_read_endio()) if the pointer is stale again;
      if so, we treat it as an error (s->iop.error = -EINTR) and reread from
      the backing device (but we don't pass that error up anywhere)
      
      Note that a cache read race happens under normal circumstances, not
      as a result of SSD failure; it is counted and shown in
      /sys/fs/bcache/XXX/internal/cache_read_races.
      
      Without this patch, in writeback mode we never reread from the backing
      device when a cache read race happens until the whole cache device is
      clean, because the condition
      (s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
      cached_dev_read_error(). In this situation, s->iop.error (= -EINTR) is
      passed up, and the user finally receives -EINTR when its bio ends.
      This is not appropriate and looks weird to upper-layer applications.
      
      In this patch, we use s->read_dirty_data to judge whether the read
      request hit dirty data in the cache device; it is safe to reread data
      from the backing device when the read request hit clean data. This not
      only handles the cache read race, but also recovers data when a read
      request from the cache device fails.
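
      The change in the recovery decision can be sketched as two predicates.
      The "before" check mirrors the condition quoted above; the "after"
      check is only an assumed shape based on this description (the exact
      expression in the patch is not shown here), and both function names
      are hypothetical.

```python
def may_reread_before(recoverable, cache_has_dirty):
    # Mirrors the quoted condition:
    #   s->recoverable && (dc && !atomic_read(&dc->has_dirty))
    # (the dc != NULL part is assumed true in this model).
    return recoverable and not cache_has_dirty


def may_reread_after(recoverable, read_dirty_data):
    # Assumed shape of the new check: the decision depends on whether
    # this particular request hit dirty data, not on the whole cache.
    return recoverable and not read_dirty_data


# Writeback mode, cache device holds some dirty data, but this read
# hit only clean data and then lost a cache read race.
before = may_reread_before(recoverable=True, cache_has_dirty=True)
after = may_reread_after(recoverable=True, read_dirty_data=False)
print(before, after)
```

      In this model the old check refuses the reread (so -EINTR would reach
      the caller), while the per-request check allows it. A request that did
      hit dirty data is still refused by the new check, so stale data is not
      returned.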
      
      [edited by mlyle to fix up whitespace, commit log title, comment
      spelling]
      
      Fixes: d59b2379 ("bcache: only permit to recovery read error when cache device is clean")
      Signed-off-by: Hua Rui <huarui.dev@gmail.com>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bcache: only permit to recovery read error when cache device is clean · 322e659a
      Coly Li authored
      commit d59b2379 upstream.
      
      When bcache does read I/Os, for example in writeback or writethrough mode,
      if a read request on the cache device fails, bcache will try to recover
      the request by reading from the cached device. If the data on the cached
      device is not in sync with the cache device, the requester will get stale
      data.
      
      For critical storage systems such as databases, providing stale data from
      recovery may result in application-level data corruption, which is
      unacceptable.
      
      With this patch, for a failed read request in writeback or writethrough
      mode, a recoverable read request is only recovered when the cache device
      is clean. That is to say, all data on the cached device is up to date.
      
      For other cache modes in bcache, read requests never hit
      cached_dev_read_error(), so they don't need this patch.
      
      Please note, because the cache mode can be switched arbitrarily at run
      time, a writethrough mode might have been switched from a writeback mode.
      Therefore checking dc->has_dirty in writethrough mode still makes sense.
      
      Changelog:
      v4: Fix parens error pointed out by Michael Lyle.
      v3: Per the response from Kent Overstreet, recovering stale data is a
          bug to fix, and an option to permit it is unnecessary. So in this
          version the sysfs file is removed.
      v2: rename sysfs entry from allow_stale_data_on_failure  to
          allow_stale_data_on_failure, and fix the confusing commit log.
      v1: initial patch posted.
      
      [small change to patch comment spelling by mlyle]
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Reported-by: Arne Wolf <awolf@lenovo.com>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Nix <nix@esperi.org.uk>
      Cc: Kai Krakow <hurikhan77@gmail.com>
      Cc: Eric Wheeler <bcache@lists.ewheeler.net>
      Cc: Junhui Tang <tang.junhui@zte.com.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 05 Dec, 2017 34 commits