1. 16 Aug, 2016 40 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.6.7 · 096998b1
      Greg Kroah-Hartman authored
      096998b1
    • Vegard Nossum's avatar
      ext4: fix reference counting bug on block allocation error · ce1afec0
      Vegard Nossum authored
      commit 554a5ccc upstream.
      
      If we hit this error when mounted with errors=continue or
      errors=remount-ro:
      
          EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
      
      then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
      continue. However, ext4_mb_release_context() is the wrong thing to call
      here since we are still actually using the allocation context.
      
      Instead, just error out. We could retry the allocation, but there is a
      possibility of getting stuck in an infinite loop instead, so this seems
      safer.
      
      [ Fixed up so we don't return EAGAIN to userspace. --tytso ]
      
      Fixes: 8556e8f3 ("ext4: Don't allow new groups to be added during block allocation")
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce1afec0
    • Vegard Nossum's avatar
      ext4: short-cut orphan cleanup on error · d123a55f
      Vegard Nossum authored
      commit c65d5c6c upstream.
      
      If we encounter a filesystem error during orphan cleanup, we should stop.
      Otherwise, we may end up in an infinite loop where the same inode is
      processed again and again.
      
          EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
          EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
          Aborting journal on device loop0-8.
          EXT4-fs (loop0): Remounting filesystem read-only
          EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
          EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
          EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
          [...]
      
      See-also: c9eb13a9 ("ext4: fix hang when processing corrupted orphaned inode list")
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d123a55f
    • Theodore Ts'o's avatar
      ext4: validate s_reserved_gdt_blocks on mount · 7dac74db
      Theodore Ts'o authored
      commit 5b9554dc upstream.
      
      If s_reserved_gdt_blocks is extremely large, it's possible for
      ext4_init_block_bitmap(), which is called when ext4 sets up an
      uninitialized block bitmap, to corrupt random kernel memory.  Add the
      same checks which e2fsck has --- it must never be larger than
      blocksize / sizeof(__u32) --- and then add a backup check in
      ext4_init_block_bitmap() in case the superblock gets modified after
      the file system is mounted.
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dac74db
    • Vegard Nossum's avatar
      ext4: don't call ext4_should_journal_data() on the journal inode · 1bf87f85
      Vegard Nossum authored
      commit 6a7fd522 upstream.
      
      If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
      to call ext4_should_journal_data() before superblock options and flags
      are fully set up.  In that case, the iput() on the journal inode can
      end up causing a BUG().
      
      Work around this problem by reordering the tests so we only call
      ext4_should_journal_data() after we know it's not the journal inode.
      
      Fixes: 2d859db3 ("ext4: fix data corruption in inodes with journalled data")
      Fixes: 2b405bfa ("ext4: fix data=journal fast mount/umount hang")
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bf87f85
    • Jan Kara's avatar
      ext4: fix deadlock during page writeback · 4034e22c
      Jan Kara authored
      commit 646caa9c upstream.
      
      Commit 06bd3c36 (ext4: fix data exposure after a crash) uncovered a
      deadlock in ext4_writepages() which was previously much harder to hit.
      After this commit xfstest generic/130 reproduces the deadlock on small
      filesystems.
      
      The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
      and marks current inode handle as synchronous. That subsequently results
      in ext4_journal_stop() called from ext4_writepages() to block waiting for
      transaction commit while still holding page locks, reference to io_end,
      and some prepared bio in mpd structure each of which can possibly block
      transaction commit from completing and thus results in deadlock.
      
      Fix the problem by releasing page locks, io_end reference, and
      submitting prepared bio before calling ext4_journal_stop().
      
      [ Changed to defer the call to ext4_journal_stop() only if the handle
        is synchronous.  --tytso ]
      Reported-and-tested-by: default avatarEryu Guan <eguan@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4034e22c
    • Vegard Nossum's avatar
      ext4: check for extents that wrap around · f69cb089
      Vegard Nossum authored
      commit f70749ca upstream.
      
      An extent with lblock = 4294967295 and len = 1 will pass the
      ext4_valid_extent() test:
      
      	ext4_lblk_t last = lblock + len - 1;
      
      	if (len == 0 || lblock > last)
      		return 0;
      
      since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
      the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
      
      We can simplify it by removing the - 1 altogether and changing the test
      to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
      lblock and it fails, and if len > 0 then lblock + len > lblock in order
      to pass (i.e. it doesn't overflow).
      
      Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
      Fixes: 2f974865 ("ext4: check for zero length extent explicitly")
      Cc: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarPhil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f69cb089
    • Thomas Petazzoni's avatar
      serial: mvebu-uart: free the IRQ in ->shutdown() · 7fbb3336
      Thomas Petazzoni authored
      commit c2c1659b upstream.
      
      As suggested by the serial port infrastructure documentation, the IRQ is
      requested in ->startup(). However, it is never freed in the ->shutdown()
      hook.
      
      With simple systems that open the serial port once for all and always
      have at least one process that keep the serial port opened, there was no
      problem. But with a more complicated system (*cough* systemd *cough*),
      the serial port is opened/closed many times, which at some point no
      processes having the serial port open at all. Due to this ->startup()
      gets called again, tries to request_irq() again, which fails.
      
      Fixes: 30530791 ("serial: mvebu-uart: initial support for Armada-3700 serial port")
      Cc: Ofer Heifetz <oferh@marvell.com>
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fbb3336
    • Herbert Xu's avatar
      crypto: scatterwalk - Fix test in scatterwalk_done · c56d9bf4
      Herbert Xu authored
      commit 5f070e81 upstream.
      
      When there is more data to be processed, the current test in
      scatterwalk_done may prevent us from calling pagedone even when
      we should.
      
      In particular, if we're on an SG entry spanning multiple pages
      where the last page is not a full page, we will incorrectly skip
      calling pagedone on the second last page.
      
      This patch fixes this by adding a separate test for whether we've
      reached the end of a page.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c56d9bf4
    • Herbert Xu's avatar
      crypto: gcm - Filter out async ghash if necessary · da4a784e
      Herbert Xu authored
      commit b30bdfa8 upstream.
      
      As it is if you ask for a sync gcm you may actually end up with
      an async one because it does not filter out async implementations
      of ghash.
      
      This patch fixes this by adding the necessary filter when looking
      for ghash.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da4a784e
    • Andreas Herrmann's avatar
      Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency" · 3e52fe17
      Andreas Herrmann authored
      commit da7d3abe upstream.
      
      This reverts commit 790d849b.
      
      Using a v4.7-rc7 kernel on a HP ProLiant triggered following messages
      
       pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
       cpufreq: ondemand governor failed, too long transition latency of HW, fallback to performance governor
      
      The last line was shown for each CPU in the system.
      Testing v4.5 (where commit 790d849b was integrated) triggered
      similar messages. Same behaviour on a 2nd HP Proliant system.
      
      So commit 790d849b (cpufreq: pcc-cpufreq: update default value of
      cpuinfo_transition_latency) causes the system to use performance
      governor which, I guess, was not the intention of the patch.
      
      Enabling debug output in pcc-cpufreq provides following verbose output:
      
       pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
       pcc_get_offset: for CPU 0: pcc_cpu_data input_offset: 0x44, pcc_cpu_data output_offset: 0x48
       init: policy->max is 2800000, policy->min is 1200000
       get: get_freq for CPU 0
       get: SUCCESS: (virtual) output_offset for cpu 0 is 0xffffc9000d7c0048, contains a value of: 0xff06. Speed is: 168000 MHz
       cpufreq: ondemand governor failed, too long transition latency of HW, fallback to performance governor
       target: CPU 0 should go to target freq: 2800000 (virtual) input_offset is 0xffffc9000d7c0044
       target: was SUCCESSFUL for cpu 0
      
      I am asking to revert 790d849b to re-enable usage of ondemand
      governor with pcc-cpufreq.
      
      Fixes: 790d849b (cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency)
      Signed-off-by: default avatarAndreas Herrmann <aherrmann@suse.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e52fe17
    • Wei Fang's avatar
      fs/dcache.c: avoid soft-lockup in dput() · ec686b01
      Wei Fang authored
      commit 47be6184 upstream.
      
      We triggered soft-lockup under stress test which
      open/access/write/close one file concurrently on more than
      five different CPUs:
      
      WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
      ...
      [<ffffffc0003986f8>] dput+0x100/0x298
      [<ffffffc00038c2dc>] terminate_walk+0x4c/0x60
      [<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8
      [<ffffffc00038f780>] filename_lookup+0x38/0xf0
      [<ffffffc000391180>] user_path_at_empty+0x78/0xd0
      [<ffffffc0003911f4>] user_path_at+0x1c/0x28
      [<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230
      
      ->d_lock trylock may failed many times because of concurrently
      operations, and dput() may execute a long time.
      
      Fix this by replacing cpu_relax() with cond_resched().
      dput() used to be sleepable, so make it sleepable again
      should be safe.
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec686b01
    • Michal Hocko's avatar
      Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" · 8d80b5e1
      Michal Hocko authored
      commit 4e390b2b upstream.
      
      This reverts commit f9054c70 ("mm, mempool: only set __GFP_NOMEMALLOC
      if there are free elements").
      
      There has been a report about OOM killer invoked when swapping out to a
      dm-crypt device.  The primary reason seems to be that the swapout out IO
      managed to completely deplete memory reserves.  Ondrej was able to
      bisect and explained the issue by pointing to f9054c70 ("mm,
      mempool: only set __GFP_NOMEMALLOC if there are free elements").
      
      The reason is that the swapout path is not throttled properly because
      the md-raid layer needs to allocate from the generic_make_request path
      which means it allocates from the PF_MEMALLOC context.  dm layer uses
      mempool_alloc in order to guarantee a forward progress which used to
      inhibit access to memory reserves when using page allocator.  This has
      changed by f9054c70 ("mm, mempool: only set __GFP_NOMEMALLOC if
      there are free elements") which has dropped the __GFP_NOMEMALLOC
      protection when the memory pool is depleted.
      
      If we are running out of memory and the only way forward to free memory
      is to perform swapout we just keep consuming memory reserves rather than
      throttling the mempool allocations and allowing the pending IO to
      complete up to a moment when the memory is depleted completely and there
      is no way forward but invoking the OOM killer.  This is less than
      optimal.
      
      The original intention of f9054c70 was to help with the OOM
      situations where the oom victim depends on mempool allocation to make a
      forward progress.  David has mentioned the following backtrace:
      
        schedule
        schedule_timeout
        io_schedule_timeout
        mempool_alloc
        __split_and_process_bio
        dm_request
        generic_make_request
        submit_bio
        mpage_readpages
        ext4_readpages
        __do_page_cache_readahead
        ra_submit
        filemap_fault
        handle_mm_fault
        __do_page_fault
        do_page_fault
        page_fault
      
      We do not know more about why the mempool is depleted without being
      replenished in time, though.  In any case the dm layer shouldn't depend
      on any allocations outside of the dedicated pools so a forward progress
      should be guaranteed.  If this is not the case then the dm should be
      fixed rather than papering over the problem and postponing it to later
      by accessing more memory reserves.
      
      mempools are a mechanism to maintain dedicated memory reserves to
      guaratee forward progress.  Allowing them an unbounded access to the
      page allocator memory reserves is going against the whole purpose of
      this mechanism.
      
      Bisected by Ondrej Kozina.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20160721145309.GR26379@dhcp22.suse.czSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarOndrej Kozina <okozina@redhat.com>
      Reviewed-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarNeilBrown <neilb@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Ondrej Kozina <okozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d80b5e1
    • Wei Fang's avatar
      fuse: fix wrong assignment of ->flags in fuse_send_init() · dac176f0
      Wei Fang authored
      commit 9446385f upstream.
      
      FUSE_HAS_IOCTL_DIR should be assigned to ->flags, it may be a typo.
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 69fe05c9 ("fuse: add missing INIT flags")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dac176f0
    • Maxim Patlasov's avatar
      fuse: fuse_flush must check mapping->flags for errors · d34c0dab
      Maxim Patlasov authored
      commit 9ebce595 upstream.
      
      fuse_flush() calls write_inode_now() that triggers writeback, but actual
      writeback will happen later, on fuse_sync_writes(). If an error happens,
      fuse_writepage_end() will set error bit in mapping->flags. So, we have to
      check mapping->flags after fuse_sync_writes().
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 4d99ff8f ("fuse: Turn writeback cache on")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d34c0dab
    • Alexey Kuznetsov's avatar
      fuse: fsync() did not return IO errors · 2bfc3723
      Alexey Kuznetsov authored
      commit ac7f052b upstream.
      
      Due to implementation of fuse writeback filemap_write_and_wait_range() does
      not catch errors. We have to do this directly after fuse_sync_writes()
      Signed-off-by: default avatarAlexey Kuznetsov <kuznet@virtuozzo.com>
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 4d99ff8f ("fuse: Turn writeback cache on")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bfc3723
    • Josh Poimboeuf's avatar
      x86/power/64: Fix hibernation return address corruption · 75b52369
      Josh Poimboeuf authored
      commit 4ce827b4 upstream.
      
      In kernel bug 150021, a kernel panic was reported when restoring a
      hibernate image.  Only a picture of the oops was reported, so I can't
      paste the whole thing here.  But here are the most interesting parts:
      
        kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
        BUG: unable to handle kernel paging request at ffff8804615cfd78
        ...
        RIP: ffff8804615cfd78
        RSP: ffff8804615f0000
        RBP: ffff8804615cfdc0
        ...
        Call Trace:
         do_signal+0x23
         exit_to_usermode_loop+0x64
         ...
      
      The RIP is on the same page as RBP, so it apparently started executing
      on the stack.
      
      The bug was bisected to commit ef0f3ed5 (x86/asm/power: Create
      stack frames in hibernate_asm_64.S), which in retrospect seems quite
      dangerous, since that code saves and restores the stack pointer from a
      global variable ('saved_context').
      
      There are a lot of moving parts in the hibernate save and restore paths,
      so I don't know exactly what caused the panic.  Presumably, a FRAME_END
      was executed without the corresponding FRAME_BEGIN, or vice versa.  That
      would corrupt the return address on the stack and would be consistent
      with the details of the above panic.
      
      [ rjw: One major problem is that by the time the FRAME_BEGIN in
        restore_registers() is executed, the stack pointer value may not
        be valid any more.  Namely, the stack area pointed to by it
        previously may have been overwritten by some image memory contents
        and that page frame may now be used for whatever different purpose
        it had been allocated for before hibernation.  In that case, the
        FRAME_BEGIN will corrupt that memory. ]
      
      Instead of doing the frame pointer save/restore around the bounds of the
      affected functions, just do it around the call to swsusp_save().
      
      That has the same effect of ensuring that if swsusp_save() sleeps, the
      frame pointers will be correct.  It's also a much more obviously safe
      way to do it than the original patch.  And objtool still doesn't report
      any warnings.
      
      Fixes: ef0f3ed5 (x86/asm/power: Create stack frames in hibernate_asm_64.S)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021Reported-by: default avatarAndre Reinke <andre.reinke@mailbox.org>
      Tested-by: default avatarAndre Reinke <andre.reinke@mailbox.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75b52369
    • Borislav Petkov's avatar
      x86/microcode: Fix suspend to RAM with builtin microcode · 4d501510
      Borislav Petkov authored
      commit 4b703305 upstream.
      
      Usually, after we have found the proper microcode blob for the current
      machine, we stash it away for later use with save_microcode_in_initrd().
      
      However, with builtin microcode which doesn't come from the initrd, we
      don't call that function because CONFIG_BLK_DEV_INITRD=n and even if
      set, we don't have a valid initrd.
      
      In order to fix this, let's make save_microcode_in_initrd() an
      fs_initcall which runs before rootfs_initcall() as this was the time it
      was called previously through:
      
       rootfs_initcall(populate_rootfs)
       |-> free_initrd()
           |-> free_initrd_mem()
               |-> save_microcode_in_initrd()
      
      Also, we make it run independently from initrd functionality being
      present or not.
      
      And since it is called in the microcode loader only now, we can also
      make it static.
      Reported-and-tested-by: default avatarJim Bos <jim876@xs4all.nl>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1465225850-7352-3-git-send-email-bp@alien8.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d501510
    • Vladimir Davydov's avatar
      radix-tree: account nodes to memcg only if explicitly requested · 0ac814ee
      Vladimir Davydov authored
      commit 05eb6e72 upstream.
      
      Radix trees may be used not only for storing page cache pages, so
      unconditionally accounting radix tree nodes to the current memory cgroup
      is bad: if a radix tree node is used for storing data shared among
      different cgroups we risk pinning dead memory cgroups forever.
      
      So let's only account radix tree nodes if it was explicitly requested by
      passing __GFP_ACCOUNT to INIT_RADIX_TREE.  Currently, we only want to
      account page cache entries, so mark mapping->page_tree so.
      
      Fixes: 58e698af ("radix-tree: account radix_tree_node to memory cgroup")
      Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.comSigned-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ac814ee
    • Fabian Frederick's avatar
      sysv, ipc: fix security-layer leaking · efd09342
      Fabian Frederick authored
      commit 9b24fef9 upstream.
      
      Commit 53dad6d3 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
      to receive rcu freeing function but used generic ipc_rcu_free() instead
      of msg_rcu_free() which does security cleaning.
      
      Running LTP msgsnd06 with kmemleak gives the following:
      
        cat /sys/kernel/debug/kmemleak
      
        unreferenced object 0xffff88003c0a11f8 (size 8):
          comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
          hex dump (first 8 bytes):
            1b 00 00 00 01 00 00 00                          ........
          backtrace:
            kmemleak_alloc+0x23/0x40
            kmem_cache_alloc_trace+0xe1/0x180
            selinux_msg_queue_alloc_security+0x3f/0xd0
            security_msg_queue_alloc+0x2e/0x40
            newque+0x4e/0x150
            ipcget+0x159/0x1b0
            SyS_msgget+0x39/0x40
            entry_SYSCALL_64_fastpath+0x13/0x8f
      
      Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
      only use ipc_rcu_free in case of security allocation failure in newary()
      
      Fixes: 53dad6d3 ("ipc: fix race with LSMs")
      Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.beSigned-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efd09342
    • Vegard Nossum's avatar
      block: fix use-after-free in seq file · 199e5c22
      Vegard Nossum authored
      commit 77da1605 upstream.
      
      I got a KASAN report of use-after-free:
      
          ==================================================================
          BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
          Read of size 8 by task trinity-c1/315
          =============================================================================
          BUG kmalloc-32 (Not tainted): kasan: bad access detected
          -----------------------------------------------------------------------------
      
          Disabling lock debugging due to kernel taint
          INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
                  ___slab_alloc+0x4f1/0x520
                  __slab_alloc.isra.58+0x56/0x80
                  kmem_cache_alloc_trace+0x260/0x2a0
                  disk_seqf_start+0x66/0x110
                  traverse+0x176/0x860
                  seq_read+0x7e3/0x11a0
                  proc_reg_read+0xbc/0x180
                  do_loop_readv_writev+0x134/0x210
                  do_readv_writev+0x565/0x660
                  vfs_readv+0x67/0xa0
                  do_preadv+0x126/0x170
                  SyS_preadv+0xc/0x10
                  do_syscall_64+0x1a1/0x460
                  return_from_SYSCALL_64+0x0/0x6a
          INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
                  __slab_free+0x17a/0x2c0
                  kfree+0x20a/0x220
                  disk_seqf_stop+0x42/0x50
                  traverse+0x3b5/0x860
                  seq_read+0x7e3/0x11a0
                  proc_reg_read+0xbc/0x180
                  do_loop_readv_writev+0x134/0x210
                  do_readv_writev+0x565/0x660
                  vfs_readv+0x67/0xa0
                  do_preadv+0x126/0x170
                  SyS_preadv+0xc/0x10
                  do_syscall_64+0x1a1/0x460
                  return_from_SYSCALL_64+0x0/0x6a
      
          CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
           ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
           ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
           ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
          Call Trace:
           [<ffffffff81d6ce81>] dump_stack+0x65/0x84
           [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
           [<ffffffff814704ff>] object_err+0x2f/0x40
           [<ffffffff814754d1>] kasan_report_error+0x221/0x520
           [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
           [<ffffffff83888161>] klist_iter_exit+0x61/0x70
           [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
           [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
           [<ffffffff8151f812>] seq_read+0x4b2/0x11a0
           [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
           [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
           [<ffffffff814b4c45>] do_readv_writev+0x565/0x660
           [<ffffffff814b8a17>] vfs_readv+0x67/0xa0
           [<ffffffff814b8de6>] do_preadv+0x126/0x170
           [<ffffffff814b92ec>] SyS_preadv+0xc/0x10
      
      This problem can occur in the following situation:
      
      open()
       - pread()
          - .seq_start()
             - iter = kmalloc() // succeeds
             - seqf->private = iter
          - .seq_stop()
             - kfree(seqf->private)
       - pread()
          - .seq_start()
             - iter = kmalloc() // fails
          - .seq_stop()
             - class_dev_iter_exit(seqf->private) // boom! old pointer
      
      As the comment in disk_seqf_stop() says, stop is called even if start
      failed, so we need to reinitialise the private pointer to NULL when seq
      iteration stops.
      
      An alternative would be to set the private pointer to NULL when the
      kmalloc() in disk_seqf_start() fails.
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      199e5c22
    • David Howells's avatar
      x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace · fe128ba3
      David Howells authored
      commit f7d66562 upstream.
      
      x86_64 needs to use compat_sys_keyctl for 32-bit userspace rather than
      calling sys_keyctl(). The latter will work in a lot of cases, thereby
      hiding the issue.
      Reported-by: default avatarStephan Mueller <smueller@chronox.de>
      Tested-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keyrings@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Link: http://lkml.kernel.org/r/146961615805.14395.5581949237156769439.stgit@warthog.procyon.org.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe128ba3
    • Vladimir Davydov's avatar
      mm: memcontrol: fix memcg id ref counter on swap charge move · 65eee9e0
      Vladimir Davydov authored
      commit 615d66c3 upstream.
      
      Since commit 73f576c0 ("mm: memcontrol: fix cgroup creation failure
      after many small jobs") swap entries do not pin memcg->css.refcnt
      directly.  Instead, they pin memcg->id.ref.  So we should adjust the
      reference counters accordingly when moving swap charges between cgroups.
      
      Fixes: 73f576c0 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
      Link: http://lkml.kernel.org/r/9ce297c64954a42dc90b543bc76106c4a94f07e8.1470219853.git.vdavydov@virtuozzo.comSigned-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65eee9e0
    • Vladimir Davydov's avatar
      mm: memcontrol: fix swap counter leak on swapout from offline cgroup · 8d87f55b
      Vladimir Davydov authored
      commit 1f47b61f upstream.
      
      An offline memory cgroup might have anonymous memory or shmem left
      charged to it and no swap.  Since only swap entries pin the id of an
      offline cgroup, such a cgroup will have no id and so an attempt to
      swapout its anon/shmem will not store memory cgroup info in the swap
      cgroup map.  As a result, memcg->swap or memcg->memsw will never get
      uncharged from it and any of its ascendants.
      
      Fix this by always charging swapout to the first ancestor cgroup that
      hasn't released its id yet.
      
      [hannes@cmpxchg.org: add comment to mem_cgroup_swapout]
      [vdavydov@virtuozzo.com: use WARN_ON_ONCE() in mem_cgroup_id_get_online()]
        Link: http://lkml.kernel.org/r/20160803123445.GJ13263@esperanza
      Fixes: 73f576c0 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
      Link: http://lkml.kernel.org/r/5336daa5c9a32e776067773d9da655d2dc126491.1470219853.git.vdavydov@virtuozzo.comSigned-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d87f55b
    • Naveen N. Rao's avatar
      powerpc/bpf/jit: Disable classic BPF JIT on ppc64le · d71e050d
      Naveen N. Rao authored
      commit 844e3be4 upstream.
      
      Classic BPF JIT was never ported completely to work on little endian
      powerpc. However, it can be enabled and will crash the system when used.
      As such, disable use of BPF JIT on ppc64le.
      
      Fixes: 7c105b63 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
      Reported-by: default avatarThadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: default avatarThadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      d71e050d
    • Gavin Shan's avatar
      powerpc/eeh: Fix invalid cached PE primary bus · 77bf0465
      Gavin Shan authored
      commit a3aa256b upstream.
      
      The PE primary bus cannot be got from its child devices when having
      full hotplug in error recovery. The PE primary bus is cached, which
      is done in commit <05ba75f8> ("powerpc/eeh: Fix stale cached primary
      bus"). In eeh_reset_device(), the flag (EEH_PE_PRI_BUS) is cleared
      before the PCI hot remove. eeh_pe_bus_get() then returns NULL as the
      PE primary bus in pnv_eeh_reset() and it crashes the kernel eventually.
      
      This fixes the issue by clearing the flag (EEH_PE_PRI_BUS) before the
      PCI hot add. With it, the PowerNV EEH reset backend (pnv_eeh_reset())
      can get valid PE primary bus through eeh_pe_bus_get().
      
      Fixes: 67086e32 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE")
      Reported-by: default avatarPridhiviraj Paidipeddi <ppaiddipe@in.ibm.com>
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      77bf0465
    • Alden Tondettar's avatar
      udf: Prevent stack overflow on corrupted filesystem mount · 1e80f40d
      Alden Tondettar authored
      commit a47241cd upstream.
      
      Presently, a corrupted or malicious UDF filesystem containing a very large
      number (or cycle) of Logical Volume Integrity Descriptor extent
      indirections may trigger a stack overflow and kernel panic in
      udf_load_logicalvolint() on mount.
      
      Replace the unnecessary recursion in udf_load_logicalvolint() with
      simple iteration. Set an arbitrary limit of 1000 indirections (which would
      have almost certainly overflowed the stack without this fix), and treat
      such cases as if there were no LVID.
      Signed-off-by: default avatarAlden Tondettar <alden.tondettar@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e80f40d
    • Toshi Kani's avatar
      x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386 · 1537fac9
      Toshi Kani authored
      commit 1886297c upstream.
      
      The following BUG_ON() crash was reported on QEMU/i386:
      
        kernel BUG at arch/x86/mm/physaddr.c:79!
        Call Trace:
        phys_mem_access_prot_allowed
        mmap_mem
        ? mmap_region
        mmap_region
        do_mmap
        vm_mmap_pgoff
        SyS_mmap_pgoff
        do_int80_syscall_32
        entry_INT80_32
      
      after commit:
      
        edfe63ec ("x86/mtrr: Fix Xorg crashes in Qemu sessions")
      
      PAT is now set to disabled state when MTRRs are disabled.
      Thus, reactivating the __pa(high_memory) check in
      phys_mem_access_prot_allowed().
      
      When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(),
      which in turn calls slow_virt_to_phys() for 'high_memory'.
      Because 'high_memory' is set to (the max direct mapped virt
      addr + 1), it is not a valid virtual address.  Hence,
      slow_virt_to_phys() returns 0 and hit the BUG_ON.  Using
      __pa_nodebug() instead of __pa() will fix this BUG_ON.
      
      However, this code block, originally written for Pentiums and
      earlier, is no longer adequate since a 32-bit Xen guest has
      MTRRs disabled and supports ZONE_HIGHMEM.  In this setup,
      this code sets UC attribute for accessing RAM in high memory
      range.
      
      Delete this code block as it has been unused for a long time.
      Reported-by: default avatarkernel test robot <ying.huang@linux.intel.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1460403360-25441-1-git-send-email-toshi.kani@hpe.com
      Link: https://lkml.org/lkml/2016/4/1/608Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1537fac9
    • Toshi Kani's avatar
      x86/pat: Document the PAT initialization sequence · d855dbb7
      Toshi Kani authored
      commit b6350c21 upstream.
      
      Update PAT documentation to describe how PAT is initialized under
      various configurations.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: elliott@hpe.com
      Cc: konrad.wilk@oracle.com
      Cc: paul.gortmaker@windriver.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1458769323-24491-8-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d855dbb7
    • Toshi Kani's avatar
      x86/xen, pat: Remove PAT table init code from Xen · 4ef71f44
      Toshi Kani authored
      commit 88ba2811 upstream.
      
      Xen supports PAT without MTRRs for its guests.  In order to
      enable WC attribute, it was necessary for xen_start_kernel()
      to call pat_init_cache_modes() to update PAT table before
      starting guest kernel.
      
      Now that the kernel initializes PAT table to the BIOS handoff
      state when MTRR is disabled, this Xen-specific PAT init code
      is no longer necessary.  Delete it from xen_start_kernel().
      
      Also change __init_cache_modes() to a static function since
      PAT table should not be tweaked by other modules.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: elliott@hpe.com
      Cc: paul.gortmaker@windriver.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ef71f44
    • Toshi Kani's avatar
      x86/mtrr: Fix PAT init handling when MTRR is disabled · d362fc52
      Toshi Kani authored
      commit ad025a73 upstream.
      
      get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled.
      This results in calling pat_init() on BSP only since APs do not call
      pat_init() when MTRR is disabled.  This inconsistency between BSP
      and APs leads to undefined behavior.
      
      Make BSP's calling condition to pat_init() consistent with AP's,
      mtrr_ap_init() and mtrr_aps_init().
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: elliott@hpe.com
      Cc: konrad.wilk@oracle.com
      Cc: paul.gortmaker@windriver.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1458769323-24491-6-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d362fc52
    • Toshi Kani's avatar
      x86/mtrr: Fix Xorg crashes in Qemu sessions · 8095bbc2
      Toshi Kani authored
      commit edfe63ec upstream.
      
      A Xorg failure on qemu32 was reported as a regression [1] caused by
      commit 9cd25aac ("x86/mm/pat: Emulate PAT when it is disabled").
      
      This patch fixes the Xorg crash.
      
      Negative effects of this regression were the following two failures [2]
      in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
      triggered by the fact that its virtual CPU does not support MTRRs.
      
       #1. copy_process() failed in the check in reserve_pfn_range()
      
          copy_process
           copy_mm
            dup_mm
             dup_mmap
              copy_page_range
               track_pfn_copy
                reserve_pfn_range
      
       A WC map request was tracked as WC in memtype, which set a PTE as
       UC (pgprot) per __cachemode2pte_tbl[].  This led to this error in
       reserve_pfn_range() called from track_pfn_copy(), which obtained
       a pgprot from a PTE.  It converts pgprot to page_cache_mode, which
       does not necessarily result in the original page_cache_mode since
       __cachemode2pte_tbl[] redirects multiple types to UC.
      
       #2. error path in copy_process() then hit WARN_ON_ONCE in
           untrack_pfn().
      
           x86/PAT: Xorg:509 map pfn expected mapping type uncached-
           minus for [mem 0xfd000000-0xfdffffff], got write-combining
            Call Trace:
           dump_stack
           warn_slowpath_common
           ? untrack_pfn
           ? untrack_pfn
           warn_slowpath_null
           untrack_pfn
           ? __kunmap_atomic
           unmap_single_vma
           ? pagevec_move_tail_fn
           unmap_vmas
           exit_mmap
           mmput
           copy_process.part.47
           _do_fork
           SyS_clone
           do_syscall_32_irqs_on
           entry_INT80_32
      
      These negative effects are caused by two separate bugs, but they
      can be addressed in separate patches.  Fixing the pat_init() issue
      described below addresses the root cause, and avoids Xorg to hit
      these cases.
      
      When the CPU does not support MTRRs, MTRR does not call pat_init(),
      which leaves PAT enabled without initializing PAT.  This pat_init()
      issue is a long-standing issue, but manifested as issue #1 (and then
      hit issue #2) with the above-mentioned commit because the memtype
      now tracks cache attribute with 'page_cache_mode'.
      
      This pat_init() issue existed before the commit, but we used pgprot
      in memtype.  Hence, we did not have issue #1 before.  But WC request
      resulted in WT in effect because WC pgrot is actually WT when PAT
      is not initialized.  This is not how it was designed to work.  When
      PAT is set to disable properly, WC is converted to UC.  The use of
      WT can result in a system crash if the target range does not support
      WT.  Fortunately, nobody ran into such issue before.
      
      To fix this pat_init() issue, PAT code has been enhanced to provide
      pat_disable() interface.  Call this interface when MTRRs are disabled.
      By setting PAT to disable properly, PAT bypasses the memtype check,
      and avoids issue #1.
      
        [1]: https://lkml.org/lkml/2016/3/3/828
        [2]: https://lkml.org/lkml/2016/3/4/775Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: elliott@hpe.com
      Cc: konrad.wilk@oracle.com
      Cc: paul.gortmaker@windriver.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8095bbc2
    • Toshi Kani's avatar
      x86/mm/pat: Replace cpu_has_pat with boot_cpu_has() · 22173e3b
      Toshi Kani authored
      commit d63dcf49 upstream.
      
      Borislav Petkov suggested:
      
       > Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast
       > paths static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX
       > ugliness.
      
      Replace the use of cpu_has_pat on init paths with boot_cpu_has().
      Suggested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: konrad.wilk@oracle.com
      Cc: paul.gortmaker@windriver.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1458769323-24491-4-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22173e3b
    • Toshi Kani's avatar
      x86/mm/pat: Add pat_disable() interface · 69bf9ad9
      Toshi Kani authored
      commit 224bb1e5 upstream.
      
      In preparation for fixing a regression caused by:
      
        9cd25aac ("x86/mm/pat: Emulate PAT when it is disabled")
      
      ... PAT needs to provide an interface that prevents the OS from
      initializing the PAT MSR.
      
      PAT MSR initialization must be done on all CPUs using the specific
      sequence of operations defined in the Intel SDM.  This requires MTRRs
      to be enabled since pat_init() is called as part of MTRR init
      from mtrr_rendezvous_handler().
      
      Make pat_disable() as the interface that prevents the OS from
      initializing the PAT MSR.  MTRR will call this interface when it
      cannot provide the SDM-defined sequence to initialize PAT.
      
      This also assures that pat_disable() called from pat_bsp_init()
      will set the PAT table properly when CPU does not support PAT.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: konrad.wilk@oracle.com
      Cc: paul.gortmaker@windriver.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1458769323-24491-3-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69bf9ad9
    • Toshi Kani's avatar
      x86/mm/pat: Add support of non-default PAT MSR setting · d809898a
      Toshi Kani authored
      commit 02f037d6 upstream.
      
      In preparation for fixing a regression caused by:
      
        9cd25aac ("x86/mm/pat: Emulate PAT when it is disabled")'
      
      ... PAT needs to support a case that PAT MSR is initialized with a
      non-default value.
      
      When pat_init() is called and PAT is disabled, it initializes the
      PAT table with the BIOS default value. Xen, however, sets PAT MSR
      with a non-default value to enable WC. This causes inconsistency
      between the PAT table and PAT MSR when PAT is set to disable on Xen.
      
      Change pat_init() to handle the PAT disable cases properly.  Add
      init_cache_modes() to handle two cases when PAT is set to disable.
      
       1. CPU supports PAT: Set PAT table to be consistent with PAT MSR.
       2. CPU does not support PAT: Set PAT table to be consistent with
          PWT and PCD bits in a PTE.
      
      Note, __init_cache_modes(), renamed from pat_init_cache_modes(),
      will be changed to a static function in a later patch.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: elliott@hpe.com
      Cc: konrad.wilk@oracle.com
      Cc: paul.gortmaker@windriver.com
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d809898a
    • Theodore Ts'o's avatar
      random: strengthen input validation for RNDADDTOENTCNT · 0a66d3a8
      Theodore Ts'o authored
      commit 86a574de upstream.
      
      Don't allow RNDADDTOENTCNT or RNDADDENTROPY to accept a negative
      entropy value.  It doesn't make any sense to subtract from the entropy
      counter, and it can trigger a warning:
      
      random: negative entropy/overflow: pool input count -40000
      ------------[ cut here ]------------
      WARNING: CPU: 3 PID: 6828 at drivers/char/random.c:670[<      none
       >] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
      Modules linked in:
      CPU: 3 PID: 6828 Comm: a.out Not tainted 4.7.0-rc4+ #4
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffffffff880b58e0 ffff88005dd9fcb0 ffffffff82cc838f ffffffff87158b40
       fffffbfff1016b1c 0000000000000000 0000000000000000 ffffffff87158b40
       ffffffff83283dae 0000000000000009 ffff88005dd9fcf8 ffffffff8136d27f
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82cc838f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51
       [<ffffffff8136d27f>] __warn+0x19f/0x1e0 kernel/panic.c:516
       [<ffffffff8136d48c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:551
       [<ffffffff83283dae>] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
       [<     inline     >] credit_entropy_bits_safe drivers/char/random.c:734
       [<ffffffff8328785d>] random_ioctl+0x21d/0x250 drivers/char/random.c:1546
       [<     inline     >] vfs_ioctl fs/ioctl.c:43
       [<ffffffff8185316c>] do_vfs_ioctl+0x18c/0xff0 fs/ioctl.c:674
       [<     inline     >] SYSC_ioctl fs/ioctl.c:689
       [<ffffffff8185405f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:680
       [<ffffffff86a995c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
      arch/x86/entry/entry_64.S:207
      ---[ end trace 5d4902b2ba842f1f ]---
      
      This was triggered using the test program:
      
      // autogenerated by syzkaller (http://github.com/google/syzkaller)
      
      int main() {
              int fd = open("/dev/random", O_RDWR);
              int val = -5000;
              ioctl(fd, RNDADDTOENTCNT, &val);
              return 0;
      }
      
      It's harmless in that (a) only root can trigger it, and (b) after
      complaining the code never does let the entropy count go negative, but
      it's better to simply not allow this userspace from passing in a
      negative entropy value altogether.
      
      Google-Bug-Id: #29575089
      Reported-By: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a66d3a8
    • Axel Lin's avatar
      regulator: qcom_smd: Remove list_voltage callback for rpm_smps_ldo_ops_fixed · 141c47d9
      Axel Lin authored
      commit 43160ffd upstream.
      
      Use regulator_list_voltage_linear_range in rpm_smps_ldo_ops_fixed is
      wrong because it is used for fixed regulator without any linear range.
      The rpm_smps_ldo_ops_fixed is used for pm8941_lnldo which has fixed_uV
      set and n_voltages = 1. In this case, regulator_list_voltage() can return
      rdev->desc->fixed_uV without .list_voltage implementation.
      
      Fixes: 3bfbb4d1 ("regulator: qcom_smd: add list_voltage callback")
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      141c47d9
    • John Johansen's avatar
    • Mike Marciniszyn's avatar
      IB/hfi1: Fix deadlock with txreq allocation slow path · eb860ec1
      Mike Marciniszyn authored
      commit 2aee309d upstream.
      
      A failure in the get_txreq() inline will result in a
      slow path retry using __get_txreq().
      
      __get_txreq() attempts to procure the qp s_lock, which
      is already held in all callers.
      
      Fix by deleting the s_lock maintenance in __get_txreq()
      and add sparse syntax hooks to future proof the code.
      
      Cc: Stable <stable@vger.kernel.org> # 4.6+
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb860ec1
    • Mike Marciniszyn's avatar
      IB/hfi1: Correct issues with sc5 computation · 6ff49f18
      Mike Marciniszyn authored
      commit 896ce45d upstream.
      
      There are several computatations of the sc in the
      ud receive routine.
      
      Besides the code duplication, all are wrong when the
      sc is greater than 15.   In that case the code incorrectly
      or's a 1 into the computed sc instead of 1 shifted left
      by 4.
      
      Fix precomputed sc5 by using an already implemented routine
      hdr2sc() and deleting flawed duplicated code.
      
      Cc: Stable <stable@vger.kernel.org> # 4.6+
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ff49f18