1. 29 Apr, 2018 23 commits
  2. 24 Apr, 2018 17 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.96 · 5cd35f3e
      Greg Kroah-Hartman authored
      5cd35f3e
    • Wanpeng Li's avatar
      block/mq: fix potential deadlock during cpu hotplug · 8d7f1fde
      Wanpeng Li authored
      commit 51d638b1 upstream.
      
      This can be triggered by hot-unplug one cpu.
      
      ======================================================
       [ INFO: possible circular locking dependency detected ]
       4.11.0+ #17 Not tainted
       -------------------------------------------------------
       step_after_susp/2640 is trying to acquire lock:
        (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110
      
       but task is already holding lock:
        (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (cpu_hotplug.lock){+.+.+.}:
              lock_acquire+0x11c/0x230
              __mutex_lock+0x92/0x990
              mutex_lock_nested+0x1b/0x20
              get_online_cpus+0x64/0x80
              blk_mq_init_allocated_queue+0x3a0/0x4e0
              blk_mq_init_queue+0x3a/0x60
              loop_add+0xe5/0x280
              loop_init+0x124/0x177
              do_one_initcall+0x53/0x1c0
              kernel_init_freeable+0x1e3/0x27f
              kernel_init+0xe/0x100
              ret_from_fork+0x31/0x40
      
       -> #0 (all_q_mutex){+.+...}:
              __lock_acquire+0x189a/0x18a0
              lock_acquire+0x11c/0x230
              __mutex_lock+0x92/0x990
              mutex_lock_nested+0x1b/0x20
              blk_mq_queue_reinit_work+0x18/0x110
              blk_mq_queue_reinit_dead+0x1c/0x20
              cpuhp_invoke_callback+0x1f2/0x810
              cpuhp_down_callbacks+0x42/0x80
              _cpu_down+0xb2/0xe0
              freeze_secondary_cpus+0xb6/0x390
              suspend_devices_and_enter+0x3b3/0xa40
              pm_suspend+0x129/0x490
              state_store+0x82/0xf0
              kobj_attr_store+0xf/0x20
              sysfs_kf_write+0x45/0x60
              kernfs_fop_write+0x135/0x1c0
              __vfs_write+0x37/0x160
              vfs_write+0xcd/0x1d0
              SyS_write+0x58/0xc0
              do_syscall_64+0x8f/0x710
              return_from_SYSCALL_64+0x0/0x7a
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(cpu_hotplug.lock);
                                      lock(all_q_mutex);
                                      lock(cpu_hotplug.lock);
         lock(all_q_mutex);
      
        *** DEADLOCK ***
      
       8 locks held by step_after_susp/2640:
        #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
        #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
        #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
        #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
        #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
        #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
        #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
        #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0
      
       stack backtrace:
       CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
       Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
       Call Trace:
        dump_stack+0x99/0xce
        print_circular_bug+0x1fa/0x270
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        ? lock_acquire+0x11c/0x230
        ? blk_mq_queue_reinit_work+0x18/0x110
        ? blk_mq_queue_reinit_work+0x18/0x110
        __mutex_lock+0x92/0x990
        ? blk_mq_queue_reinit_work+0x18/0x110
        ? kmem_cache_free+0x2cb/0x330
        ? anon_transport_class_unregister+0x20/0x20
        ? blk_mq_queue_reinit_work+0x110/0x110
        mutex_lock_nested+0x1b/0x20
        ? mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        ? __flow_cache_shrink+0x160/0x160
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        ? rcu_read_lock_sched_held+0x79/0x80
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        ? rcu_read_lock_sched_held+0x79/0x80
        ? rcu_sync_lockdep_assert+0x2f/0x60
        ? __sb_start_write+0xd9/0x1c0
        ? vfs_write+0x1ad/0x1d0
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        ? rcu_read_lock_sched_held+0x79/0x80
        do_syscall_64+0x8f/0x710
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL64_slow_path+0x25/0x25
      
      The cpu hotplug path will hold cpu_hotplug.lock and then reinit all exiting
      queues for blk mq w/ all_q_mutex, however, blk_mq_init_allocated_queue() will
      contend these two locks in the inversion order. This is due to commit eabe0659
      (blk/mq: Cure cpu hotplug lock inversion), it fixes a cpu hotplug lock inversion
      issue because of hotplug rework, however the hotplug rework is still work-in-progress
      and lives in a -tip branch and mainline cannot yet trigger that splat. The commit
      breaks the linus's tree in the merge window, so this patch reverts the lock order
      and avoids to splat linus's tree.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Cc: Thierry Escande <thierry.escande@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d7f1fde
    • Greg Thelen's avatar
      writeback: safer lock nesting · 18484eb9
      Greg Thelen authored
      commit 2e898e4c upstream.
      
      lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
      the page's memcg is undergoing move accounting, which occurs when a
      process leaves its memcg for a new one that has
      memory.move_charge_at_immigrate set.
      
      unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
      the given inode is switching writeback domains.  Switches occur when
      enough writes are issued from a new domain.
      
      This existing pattern is thus suspicious:
          lock_page_memcg(page);
          unlocked_inode_to_wb_begin(inode, &locked);
          ...
          unlocked_inode_to_wb_end(inode, locked);
          unlock_page_memcg(page);
      
      If both inode switch and process memcg migration are both in-flight then
      unlocked_inode_to_wb_end() will unconditionally enable interrupts while
      still holding the lock_page_memcg() irq spinlock.  This suggests the
      possibility of deadlock if an interrupt occurs before unlock_page_memcg().
      
          truncate
          __cancel_dirty_page
          lock_page_memcg
          unlocked_inode_to_wb_begin
          unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                          <interrupt>
                                          end_page_writeback
                                          test_clear_page_writeback
                                          lock_page_memcg
                                          <deadlock>
          unlock_page_memcg
      
      Due to configuration limitations this deadlock is not currently possible
      because we don't mix cgroup writeback (a cgroupv2 feature) and
      memory.move_charge_at_immigrate (a cgroupv1 feature).
      
      If the kernel is hacked to always claim inode switching and memcg
      moving_account, then this script triggers lockup in less than a minute:
      
        cd /mnt/cgroup/memory
        mkdir a b
        echo 1 > a/memory.move_charge_at_immigrate
        echo 1 > b/memory.move_charge_at_immigrate
        (
          echo $BASHPID > a/cgroup.procs
          while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
          done
        ) &
        while true; do
          sync
        done &
        sleep 1h &
        SLEEP=$!
        while true; do
          echo $SLEEP > a/cgroup.procs
          echo $SLEEP > b/cgroup.procs
        done
      
      The deadlock does not seem possible, so it's debatable if there's any
      reason to modify the kernel.  I suggest we should to prevent future
      surprises.  And Wang Long said "this deadlock occurs three times in our
      environment", so there's more reason to apply this, even to stable.
      Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
      see "[PATCH for-4.4] writeback: safer lock nesting"
      https://lkml.org/lkml/2018/4/11/146
      
      Wang Long said "this deadlock occurs three times in our environment"
      
      [gthelen@google.com: v4]
        Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
      [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
      Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
      Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
      Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Reported-by: default avatarWang Long <wanglong19@meituan.com>
      Acked-by: default avatarWang Long <wanglong19@meituan.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>	[v4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [natechancellor: Adjust context due to lack of b93b0163]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18484eb9
    • Amir Goldstein's avatar
      fanotify: fix logic of events on child · 71f24a91
      Amir Goldstein authored
      commit 54a307ba upstream.
      
      When event on child inodes are sent to the parent inode mark and
      parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
      will not be delivered to the listener process. However, if the same
      process also has a mount mark, the event to the parent inode will be
      delivered regadless of the mount mark mask.
      
      This behavior is incorrect in the case where the mount mark mask does
      not contain the specific event type. For example, the process adds
      a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
      and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).
      
      A modify event on a file inside that directory (and inside that mount)
      should not create a FAN_MODIFY event, because neither of the marks
      requested to get that event on the file.
      
      Fixes: 1968f5ee ("fanotify: use both marks when possible")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      [natechancellor: Fix small conflict due to lack of 3cd5eca8]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71f24a91
    • Matthew Wilcox's avatar
      mm/filemap.c: fix NULL pointer in page_cache_tree_insert() · f4c86fa0
      Matthew Wilcox authored
      commit abc1be13 upstream.
      
      f2fs specifies the __GFP_ZERO flag for allocating some of its pages.
      Unfortunately, the page cache also uses the mapping's GFP flags for
      allocating radix tree nodes.  It always masked off the __GFP_HIGHMEM
      flag, and masks off __GFP_ZERO in some paths, but not all.  That causes
      radix tree nodes to be allocated with a NULL list_head, which causes
      backtraces like:
      
        __list_del_entry+0x30/0xd0
        list_lru_del+0xac/0x1ac
        page_cache_tree_insert+0xd8/0x110
      
      The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through
      if they are ever used.  Fix them all by using GFP_RECLAIM_MASK at the
      innermost location, and remove it from earlier in the callchain.
      
      Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org
      Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
      Signed-off-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Reported-by: default avatarChris Fries <cfries@google.com>
      Debugged-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4c86fa0
    • Ian Kent's avatar
      autofs: mount point create should honour passed in mode · 858052b1
      Ian Kent authored
      commit 1e630665 upstream.
      
      The autofs file system mkdir inode operation blindly sets the created
      directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
      cause selinux dac_override denials.
      
      But the function also checks if the caller is the daemon (as no-one else
      should be able to do anything here) so there's no point in not honouring
      the passed in mode, allowing the daemon to set appropriate mode when
      required.
      
      Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.netSigned-off-by: default avatarIan Kent <raven@themaw.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      858052b1
    • Al Viro's avatar
      Don't leak MNT_INTERNAL away from internal mounts · 3f68e22e
      Al Viro authored
      commit 16a34adb upstream.
      
      We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
      their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
      somewhere in a new namespace and exiting yields a stack overflow.
      
      Cc: stable@kernel.org
      Reported-by: default avatarAlexander Aring <aring@mojatatu.com>
      Bisected-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: default avatarAlexander Aring <aring@mojatatu.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f68e22e
    • Al Viro's avatar
      rpc_pipefs: fix double-dput() · 463f8459
      Al Viro authored
      commit 4a3877c4 upstream.
      
      if we ever hit rpc_gssd_dummy_depopulate() dentry passed to
      it has refcount equal to 1.  __rpc_rmpipe() drops it and
      dput() done after that hits an already freed dentry.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      463f8459
    • Al Viro's avatar
      orangefs_kill_sb(): deal with allocation failures · e2210c54
      Al Viro authored
      commit 65903842 upstream.
      
      orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
      oops in that case.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2210c54
    • Al Viro's avatar
      hypfs_kill_super(): deal with failed allocations · f3ba3eaa
      Al Viro authored
      commit a24cd490 upstream.
      
      hypfs_fill_super() might fail to allocate sbi; hypfs_kill_super()
      should not oops on that.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3ba3eaa
    • Al Viro's avatar
      jffs2_kill_sb(): deal with failed allocations · 02d20e67
      Al Viro authored
      commit c66b23c2 upstream.
      
      jffs2_fill_super() might fail to allocate jffs2_sb_info;
      jffs2_kill_sb() must survive that.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02d20e67
    • Jan Kara's avatar
      udf: Fix leak of UTF-16 surrogates into encoded strings · 8c72cf48
      Jan Kara authored
      commit 44f06ba8 upstream.
      
      OSTA UDF specification does not mention whether the CS0 charset in case
      of two bytes per character encoding should be treated in UTF-16 or
      UCS-2. The sample code in the standard does not treat UTF-16 surrogates
      in any special way but on systems such as Windows which work in UTF-16
      internally, filenames would be treated as being in UTF-16 effectively.
      In Linux it is more difficult to handle characters outside of Base
      Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
      characters only. Just make sure we don't leak UTF-16 surrogates into the
      resulting string when loading names from the filesystem for now.
      
      CC: stable@vger.kernel.org # >= v4.6
      Reported-by: default avatarMingye Wang <arthur200126@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c72cf48
    • Michael Ellerman's avatar
      powerpc/lib: Fix off-by-one in alternate feature patching · 22578e22
      Michael Ellerman authored
      commit b8858581 upstream.
      
      When we patch an alternate feature section, we have to adjust any
      relative branches that branch out of the alternate section.
      
      But currently we have a bug if we have a branch that points to past
      the last instruction of the alternate section, eg:
      
        FTR_SECTION_ELSE
        1:     b       2f
               or      6,6,6
        2:
        ALT_FTR_SECTION_END(...)
               nop
      
      This will result in a relative branch at 1 with a target that equals
      the end of the alternate section.
      
      That branch does not need adjusting when it's moved to the non-else
      location. Currently we do adjust it, resulting in a branch that goes
      off into the link-time location of the else section, which is junk.
      
      The fix is to not patch branches that have a target == end of the
      alternate section.
      
      Fixes: d20fe50a ("KVM: PPC: Book3S HV: Branch inside feature section")
      Fixes: 9b1a735d ("powerpc: Add logic to patch alternative feature sections")
      Cc: stable@vger.kernel.org # v2.6.27+
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22578e22
    • Michael Neuling's avatar
      powerpc/eeh: Fix enabling bridge MMIO windows · 73de98fb
      Michael Neuling authored
      commit 13a83eac upstream.
      
      On boot we save the configuration space of PCIe bridges. We do this so
      when we get an EEH event and everything gets reset that we can restore
      them.
      
      Unfortunately we save this state before we've enabled the MMIO space
      on the bridges. Hence if we have to reset the bridge when we come back
      MMIO is not enabled and we end up taking an PE freeze when the driver
      starts accessing again.
      
      This patch forces the memory/MMIO and bus mastering on when restoring
      bridges on EEH. Ideally we'd do this correctly by saving the
      configuration space writes later, but that will have to come later in
      a larger EEH rewrite. For now we have this simple fix.
      
      The original bug can be triggered on a boston machine by doing:
        echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
      On boston, this PHB has a PCIe switch on it.  Without this patch,
      you'll see two EEH events, 1 expected and 1 the failure we are fixing
      here. The second EEH event causes the anything under the PHB to
      disappear (i.e. the i40e eth).
      
      With this patch, only 1 EEH event occurs and devices properly recover.
      
      Fixes: 652defed ("powerpc/eeh: Check PCIe link after reset")
      Cc: stable@vger.kernel.org # v3.11+
      Reported-by: default avatarPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Acked-by: default avatarRussell Currey <ruscur@russell.cc>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73de98fb
    • Matt Redfearn's avatar
      MIPS: memset.S: Fix clobber of v1 in last_fixup · 95a9a529
      Matt Redfearn authored
      commit c96eebf0 upstream.
      
      The label .Llast_fixup\@ is jumped to on page fault within the final
      byte set loop of memset (on < MIPSR6 architectures). For some reason, in
      this fault handler, the v1 register is randomly set to a2 & STORMASK.
      This clobbers v1 for the calling function. This can be observed with the
      following test code:
      
      static int __init __attribute__((optimize("O0"))) test_clear_user(void)
      {
        register int t asm("v1");
        char *test;
        int j, k;
      
        pr_info("\n\n\nTesting clear_user\n");
        test = vmalloc(PAGE_SIZE);
      
        for (j = 256; j < 512; j++) {
          t = 0xa5a5a5a5;
          if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
              pr_err("clear_user (%px %d) returned %d\n", test + PAGE_SIZE - 256, j, k);
          }
          if (t != 0xa5a5a5a5) {
             pr_err("v1 was clobbered to 0x%x!\n", t);
          }
        }
      
        return 0;
      }
      late_initcall(test_clear_user);
      
      Which demonstrates that v1 is indeed clobbered (MIPS64):
      
      Testing clear_user
      v1 was clobbered to 0x1!
      v1 was clobbered to 0x2!
      v1 was clobbered to 0x3!
      v1 was clobbered to 0x4!
      v1 was clobbered to 0x5!
      v1 was clobbered to 0x6!
      v1 was clobbered to 0x7!
      
      Since the number of bytes that could not be set is already contained in
      a2, the andi placing a value in v1 is not necessary and actively
      harmful in clobbering v1.
      Reported-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19109/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95a9a529
    • Matt Redfearn's avatar
      MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup · 6cd712bf
      Matt Redfearn authored
      commit daf70d89 upstream.
      
      The __clear_user function is defined to return the number of bytes that
      could not be cleared. From the underlying memset / bzero implementation
      this means setting register a2 to that number on return. Currently if a
      page fault is triggered within the memset_partial block, the value
      loaded into a2 on return is meaningless.
      
      The label .Lpartial_fixup\@ is jumped to on page fault. In order to work
      out how many bytes failed to copy, the exception handler should find how
      many bytes left in the partial block (andi a2, STORMASK), add that to
      the partial block end address (a2), and subtract the faulting address to
      get the remainder. Currently it incorrectly subtracts the partial block
      start address (t1), which has additionally been clobbered to generate a
      jump target in memset_partial. Fix this by adding the block end address
      instead.
      
      This issue was found with the following test code:
            int j, k;
            for (j = 0; j < 512; j++) {
              if ((k = clear_user(NULL, j)) != j) {
                 pr_err("clear_user (NULL %d) returned %d\n", j, k);
              }
            }
      Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).
      Suggested-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19108/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cd712bf
    • Matt Redfearn's avatar
      MIPS: memset.S: EVA & fault support for small_memset · 7177f0b3
      Matt Redfearn authored
      commit 8a8158c8 upstream.
      
      The MIPS kernel memset / bzero implementation includes a small_memset
      branch which is used when the region to be set is smaller than a long (4
      bytes on 32bit, 8 bytes on 64bit). The current small_memset
      implementation uses a simple store byte loop to write the destination.
      There are 2 issues with this implementation:
      
      1. When EVA mode is active, user and kernel address spaces may overlap.
      Currently the use of the sb instruction means kernel mode addressing is
      always used and an intended write to userspace may actually overwrite
      some critical kernel data.
      
      2. If the write triggers a page fault, for example by calling
      __clear_user(NULL, 2), instead of gracefully handling the fault, an OOPS
      is triggered.
      
      Fix these issues by replacing the sb instruction with the EX() macro,
      which will emit EVA compatible instuctions as required. Additionally
      implement a fault fixup for small_memset which sets a2 to the number of
      bytes that could not be cleared (as defined by __clear_user).
      Reported-by: default avatarChuanhua Lei <chuanhua.lei@intel.com>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/18975/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7177f0b3