1. 15 Feb, 2018 17 commits
  2. 14 Feb, 2018 2 commits
    • Linus Torvalds's avatar
      Merge tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 6556677a
      Linus Torvalds authored
      Pull gfs2 fix from Bob Peterson:
       "Fix regressions in the gfs2 iomap for block_map implementation we
        recently discovered in commit 3974320c"
      
      * tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Fixes to "Implement iomap for block_map"
      6556677a
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 694a20da
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "A larger batch of fixes than we'd like. Roughly 1/3 fixes for new
        code, 1/3 fixes for stable and 1/3 minor things.
      
        There's four commits fixing bugs when using 16GB huge pages on hash,
        caused by some of the preparatory changes for pkeys.
      
        Two fixes for bugs in the enhanced IRQ soft masking for local_t, one
        of which broke KVM in some circumstances.
      
        Four fixes for Power9. The most bizarre being a bug where futexes
        stopped working because a NULL pointer dereference didn't trap during
        early boot (it aliased the kernel mapping). A fix for memory hotplug
        when using the Radix MMU, and a fix for live migration of guests using
        the Radix MMU.
      
        Two fixes for hotplug on pseries machines. One where we weren't
        correctly updating NUMA info when CPUs are added and removed. And the
        other fixes crashes/hangs seen when doing memory hot remove during
        boot, which is apparently a thing people do.
      
        Finally a handful of build fixes for obscure configs and other minor
        fixes.
      
        Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin
        Ian King, Daniel Henrique Barboza, Florian Weimer, Guenter Roeck,
        Harish, Laurent Vivier, Madhavan Srinivasan, Mauricio Faria de
        Oliveira, Nathan Fontenot, Nicholas Piggin, Sam Bobroff"
      
      * tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        selftests/powerpc: Fix to use ucontext_t instead of struct ucontext
        powerpc/kdump: Fix powernv build break when KEXEC_CORE=n
        powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug
        powerpc/mm/hash64: Zero PGD pages on allocation
        powerpc/mm/hash64: Store the slot information at the right offset for hugetlb
        powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled
        powerpc/mm: Fix crashes with 16G huge pages
        powerpc/mm: Flush radix process translations when setting MMU type
        powerpc/vas: Don't set uses_vas for kernel windows
        powerpc/pseries: Enable RAS hotplug events later
        powerpc/mm/radix: Split linear mapping on hot-unplug
        powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID
        ocxl: fix signed comparison with less than zero
        powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking
        powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro
        powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove
      694a20da
  3. 13 Feb, 2018 21 commits
    • Andreas Gruenbacher's avatar
      gfs2: Fixes to "Implement iomap for block_map" · 49edd5bf
      Andreas Gruenbacher authored
      It turns out that commit 3974320c "Implement iomap for block_map"
      introduced a few bugs that trigger occasional failures with xfstest
      generic/476:
      
      In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
      beyond the end of the allocated metadata (height > ip->i_height).
      There, we can end up calling hole_size with a metapath that doesn't
      match the current metadata tree, which doesn't make sense.  After
      untangling the code at do_alloc, fix this by checking if the block we
      are looking for is within the range of allocated metadata.
      
      In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
      for reading stuffed files: this is handled separately.  Make sure we
      don't truncate iomap->length for reads beyond the end of the file; in
      that case, the entire range counts as a hole.
      
      Finally, revert to taking a bitmap write lock when doing allocations.
      It's unclear why that change didn't lead to any failures during testing.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      49edd5bf
    • Linus Torvalds's avatar
      Merge tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips · 61f14c01
      Linus Torvalds authored
      Pull MIPS fix from James Hogan:
       "A single change (and associated DT binding update) to allow the
        address of the MIPS Cluster Power Controller (CPC) to be chosen by DT,
        which allows SMP to work on generic MIPS kernels where the bootloader
        hasn't configured the CPC address (i.e. the new Ranchu platform)"
      
      * tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
        MIPS: CPC: Map registers using DT in mips_cpc_default_phys_base()
        dt-bindings: Document mti,mips-cpc binding
      61f14c01
    • Tony Luck's avatar
      x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages · fd0e786d
      Tony Luck authored
      In the following commit:
      
        ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      
      ... we added code to memory_failure() to unmap the page from the
      kernel 1:1 virtual address space to avoid speculative access to the
      page logging additional errors.
      
      But memory_failure() may not always succeed in taking the page offline,
      especially if the page belongs to the kernel.  This can happen if
      there are too many corrected errors on a page and either mcelog(8)
      or drivers/ras/cec.c asks to take a page offline.
      
      Since we remove the 1:1 mapping early in memory_failure(), we can
      end up with the page unmapped, but still in use. On the next access
      the kernel crashes :-(
      
      There are also various debug paths that call memory_failure() to simulate
      occurrence of an error. Since there is no actual error in memory, we
      don't need to map out the page for those cases.
      
      Revert most of the previous attempt and keep the solution local to
      arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
      
      	1) there is a real error
      	2) memory_failure() succeeds.
      
      All of this only applies to 64-bit systems. 32-bit kernel doesn't map
      all of memory into kernel space. It isn't worth adding the code to unmap
      the piece that is mapped because nobody would run a 32-bit kernel on a
      machine that has recoverable machine checks.
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert (Persistent Memory) <elliott@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org #v4.14
      Fixes: ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fd0e786d
    • Tycho Andersen's avatar
      locking/semaphore: Update the file path in documentation · 2dd6fd2e
      Tycho Andersen authored
      While reading this header I noticed that the locking stuff has moved to
      kernel/locking/*, so update the path in semaphore.h to point to that.
      Signed-off-by: default avatarTycho Andersen <tycho@tycho.ws>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180201114119.1090-1-tycho@tycho.wsSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2dd6fd2e
    • Will Deacon's avatar
      locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit() · 61e02392
      Will Deacon authored
      A test_and_{}_bit() operation fails if the value of the bit is such that
      the modification does not take place. For example, if test_and_set_bit()
      returns 1. In these cases, follow the behaviour of cmpxchg and allow the
      operation to be unordered. This also applies to test_and_set_bit_lock()
      if the lock is found to be be taken already.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1518528619-20049-1-git-send-email-will.deacon@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      61e02392
    • Will Deacon's avatar
      locking/qspinlock: Ensure node->count is updated before initialising node · 11dc1322
      Will Deacon authored
      When queuing on the qspinlock, the count field for the current CPU's head
      node is incremented. This needn't be atomic because locking in e.g. IRQ
      context is balanced and so an IRQ will return with node->count as it
      found it.
      
      However, the compiler could in theory reorder the initialisation of
      node[idx] before the increment of the head node->count, causing an
      IRQ to overwrite the initialised node and potentially corrupt the lock
      state.
      
      Avoid the potential for this harmful compiler reordering by placing a
      barrier() between the increment of the head node->count and the subsequent
      node initialisation.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      11dc1322
    • Will Deacon's avatar
      locking/qspinlock: Ensure node is initialised before updating prev->next · 95bcade3
      Will Deacon authored
      When a locker ends up queuing on the qspinlock locking slowpath, we
      initialise the relevant mcs node and publish it indirectly by updating
      the tail portion of the lock word using xchg_tail. If we find that there
      was a pre-existing locker in the queue, we subsequently update their
      ->next field to point at our node so that we are notified when it's our
      turn to take the lock.
      
      This can be roughly illustrated as follows:
      
        /* Initialise the fields in node and encode a pointer to node in tail */
        tail = initialise_node(node);
      
        /*
         * Exchange tail into the lockword using an atomic read-modify-write
         * operation with release semantics
         */
        old = xchg_tail(lock, tail);
      
        /* If there was a pre-existing waiter ... */
        if (old & _Q_TAIL_MASK) {
      	prev = decode_tail(old);
      	smp_read_barrier_depends();
      
      	/* ... then update their ->next field to point to node.
      	WRITE_ONCE(prev->next, node);
        }
      
      The conditional update of prev->next therefore relies on the address
      dependency from the result of xchg_tail ensuring order against the
      prior initialisation of node. However, since the release semantics of
      the xchg_tail operation apply only to the write portion of the RmW,
      then this ordering is not guaranteed and it is possible for the CPU
      to return old before the writes to node have been published, consequently
      allowing us to point prev->next to an uninitialised node.
      
      This patch fixes the problem by making the update of prev->next a RELEASE
      operation, which also removes the reliance on dependency ordering.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1518528177-19169-2-git-send-email-will.deacon@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      95bcade3
    • Arnd Bergmann's avatar
      x86/error_inject: Make just_return_func() globally visible · 01684e72
      Arnd Bergmann authored
      With link time optimizations enabled, I get a link failure:
      
        ./ccLbOEHX.ltrans19.ltrans.o: In function `override_function_with_return':
        <artificial>:(.text+0x7f3): undefined reference to `just_return_func'
      
      Marking the symbol .globl makes it work as expected.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 540adea3 ("error-injection: Separate error-injection from kprobe")
      Link: http://lkml.kernel.org/r/20180202145634.200291-3-arnd@arndb.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      01684e72
    • mike.travis@hpe.com's avatar
      x86/platform/UV: Fix GAM Range Table entries less than 1GB · c25d99d2
      mike.travis@hpe.com authored
      The latest UV platforms include the new ApachePass NVDIMMs into the
      UV address space.  This has introduced address ranges in the Global
      Address Map Table that are less than the previous lowest range, which
      was 2GB.  Fix the address calculation so it accommodates address ranges
      from bytes to exabytes.
      Signed-off-by: default avatarMike Travis <mike.travis@hpe.com>
      Reviewed-by: default avatarAndrew Banman <andrew.banman@hpe.com>
      Reviewed-by: default avatarDimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c25d99d2
    • Progyan Bhattacharya's avatar
      x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore · 74eb816b
      Progyan Bhattacharya authored
      The file was generated by make command and should not be in the source tree.
      Signed-off-by: default avatarProgyan Bhattacharya <progyanb@acm.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      74eb816b
    • Masayoshi Mizuma's avatar
      x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU · 295cc7eb
      Masayoshi Mizuma authored
      When a physical CPU is hot-removed, the following warning messages
      are shown while the uncore device is removed in uncore_pci_remove():
      
        WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
        uncore_pci_remove+0xf1/0x110
        ...
        CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
        Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        ...
        Call Trace:
        pci_device_remove+0x36/0xb0
        device_release_driver_internal+0x145/0x210
        pci_stop_bus_device+0x76/0xa0
        pci_stop_root_bus+0x44/0x60
        acpi_pci_root_remove+0x1f/0x80
        acpi_bus_trim+0x54/0x90
        acpi_bus_trim+0x2e/0x90
        acpi_device_hotplug+0x2bc/0x4b0
        acpi_hotplug_work_fn+0x1a/0x30
        process_one_work+0x141/0x340
        worker_thread+0x47/0x3e0
        kthread+0xf5/0x130
      
      When uncore_pci_remove() runs, it tries to get the package ID to
      clear the value of uncore_extra_pci_dev[].dev[] by using
      topology_phys_to_logical_pkg(). The warning messesages are
      shown because topology_phys_to_logical_pkg() returns -1.
      
        arch/x86/events/intel/uncore.c:
        static void uncore_pci_remove(struct pci_dev *pdev)
        {
        ...
                phys_id = uncore_pcibus_to_physid(pdev->bus);
        ...
                        pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
                        for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                                if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                        uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                        break;
                                }
                        }
                        WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
      
      topology_phys_to_logical_pkg() tries to find
      cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
      
        arch/x86/kernel/smpboot.c:
        int topology_phys_to_logical_pkg(unsigned int phys_pkg)
        {
                int cpu;
      
                for_each_possible_cpu(cpu) {
                        struct cpuinfo_x86 *c = &cpu_data(cpu);
      
                        if (c->initialized && c->phys_proc_id == phys_pkg)
                                return c->logical_proc_id;
                }
                return -1;
        }
      
      However, the phys_proc_id was already set to 0 by remove_siblinginfo()
      when the CPU was offlined.
      
      So, topology_phys_to_logical_pkg() cannot find the correct
      logical_proc_id and always returns -1.
      
      As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
      messages are shown.
      
      What is worse is that the bogus 'pkg' index results in two bugs:
      
       - We dereference uncore_extra_pci_dev[] with a negative index
       - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
      
      To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
      
      This should not cause any problems, because ->phys_proc_id is not
      used after it is hot-removed and it is re-set while hot-adding.
      Signed-off-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yasu.isimatu@gmail.com
      Cc: <stable@vger.kernel.org>
      Fixes: 30bb9811 ("x86/topology: Avoid wasting 128k for package id array")
      Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      295cc7eb
    • Harish's avatar
      selftests/powerpc: Fix to use ucontext_t instead of struct ucontext · ecdf06e1
      Harish authored
      With glibc 2.26 'struct ucontext' is removed to improve POSIX
      compliance, which breaks powerpc/alignment_handler selftest. Fix the
      test by using ucontext_t. Tested on ppc, works with older glibc
      versions as well.
      
      Fixes the following:
        alignment_handler.c: In function ‘sighandler’:
        alignment_handler.c:68:5: error: dereferencing pointer to incomplete type ‘struct ucontext’
          ucp->uc_mcontext.gp_regs[PT_NIP] += 4;
      Signed-off-by: default avatarHarish <harish@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ecdf06e1
    • Guenter Roeck's avatar
      powerpc/kdump: Fix powernv build break when KEXEC_CORE=n · 91096175
      Guenter Roeck authored
      If KEXEC_CORE is not enabled, powernv builds fail as follows.
      
        arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
        arch/powerpc/platforms/powernv/smp.c:236:4: error:
        	implicit declaration of function 'crash_ipi_callback'
      
      Add dummy function calls, similar to kdump_in_progress(), to solve the
      problem.
      
      Fixes: 4145f358 ("powernv/kdump: Fix cases where the kdump kernel can get HMI's")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      91096175
    • Guenter Roeck's avatar
      powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug · 82343484
      Guenter Roeck authored
      Commit e67e02a5 ("powerpc/pseries: Fix cpu hotplug crash with
      memoryless nodes") adds an unconditional call to
      find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR
      is enabled. This results in the following build error if this is not
      the case.
      
        arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
        arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
        			undefined reference to `.find_and_online_cpu_nid'
      
      Follow the guideline provided by similar functions and provide a dummy
      function if CONFIG_PPC_SPLPAR is not enabled. This also moves the
      external function declaration into an include file where it should be.
      
      Fixes: e67e02a5 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      [mpe: Change subject to emphasise the build fix]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      82343484
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash64: Zero PGD pages on allocation · fc5c2f4a
      Aneesh Kumar K.V authored
      On powerpc we allocate page table pages from slab caches of different
      sizes. Currently we have a constructor that zeroes out the objects when
      we allocate them for the first time.
      
      We expect the objects to be zeroed out when we free the the object
      back to slab cache. This happens in the unmap path. For hugetlb pages
      we call huge_pte_get_and_clear() to do that.
      
      With the current configuration of page table size, both PUD and PGD
      level tables are allocated from the same slab cache. At the PUD level,
      we use the second half of the table to store the slot information. But
      we never clear that when unmapping.
      
      When such a freed object is then allocated for a PGD page, the second
      half of the page table page will not be zeroed as expected. This
      results in a kernel crash.
      
      Fix it by always clearing PGD pages when they're allocated.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Change log wording and formatting, add whitespace]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      fc5c2f4a
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash64: Store the slot information at the right offset for hugetlb · ff31e105
      Aneesh Kumar K.V authored
      The hugetlb pte entries are at the PMD and PUD level, so we can't use
      PTRS_PER_PTE to find the second half of the page table. Use the right
      offset for PUD/PMD to get to the second half of the table.
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ff31e105
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled · 4a7aa4fe
      Aneesh Kumar K.V authored
      We use the second half of the page table to store slot information, so we must
      allocate it always if hugetlb is possible.
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      4a7aa4fe
    • Aneesh Kumar K.V's avatar
      powerpc/mm: Fix crashes with 16G huge pages · fae22116
      Aneesh Kumar K.V authored
      To support memory keys, we moved the hash pte slot information to the
      second half of the page table. This was ok with PTE entries at level
      4 (PTE page) and level 3 (PMD). We already allocate larger page table
      pages at those levels to accomodate extra details. For level 4 we
      already have the extra space which was used to track 4k hash page
      table entry details and at level 3 the extra space was allocated to
      track the THP details.
      
      With hugetlbfs PTE, we used this extra space at the PMD level to store
      the slot details. But we also support hugetlbfs PTE at PUD level for
      16GB pages and PUD level page didn't allocate extra space. This
      resulted in memory corruption.
      
      Fix this by allocating extra space at PUD level when HUGETLB is
      enabled.
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      fae22116
    • Alexey Kardashevskiy's avatar
      powerpc/mm: Flush radix process translations when setting MMU type · 62e984dd
      Alexey Kardashevskiy authored
      Radix guests do normally invalidate process-scoped translations when a
      new pid is allocated but migrated guests do not invalidate these so
      migrated guests crash sometime, especially easy to reproduce with
      migration happening within first 10 seconds after the guest boot start
      on the same machine.
      
      This adds the "Invalidate process-scoped translations" flush to fix
      radix guests migration.
      
      Fixes: 2ee13be3 ("KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00")
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Tested-by: default avatarLaurent Vivier <lvivier@redhat.com>
      Tested-by: default avatarDaniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      62e984dd
    • Nicholas Piggin's avatar
      powerpc/vas: Don't set uses_vas for kernel windows · b00b6289
      Nicholas Piggin authored
      cp_abort is only required for user windows, because kernel context
      must not be preempted between a copy/paste pair.
      
      Without this patch, the init task gets used_vas set when it runs the
      nx842_powernv_init initcall, which opens windows for kernel usage.
      
      used_vas is then never cleared anywhere, so it gets propagated into
      all other tasks. It's a property of the address space, so it should
      really be cleared when a new mm is created (or in dup_mmap if the
      mmaps are marked as VM_DONTCOPY). For now we seem to have no such
      driver, so leave that for another patch.
      
      Fixes: 6c8e6bb2 ("powerpc/vas: Add support for user receive window")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: default avatarSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      b00b6289
    • Sam Bobroff's avatar
      powerpc/pseries: Enable RAS hotplug events later · c9dccf1d
      Sam Bobroff authored
      Currently if the kernel receives a memory hot-unplug event early
      enough, it may get stuck in an infinite loop in
      dissolve_free_huge_pages(). This appears as a stall just after:
      
        pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY
      
      It appears to be caused by "minimum_order" being uninitialized, due to
      init_ras_IRQ() executing before hugetlb_init().
      
      To correct this, extract the part of init_ras_IRQ() that enables
      hotplug event processing and place it in the machine_late_initcall
      phase, which is guaranteed to be after hugetlb_init() is called.
      Signed-off-by: default avatarSam Bobroff <sam.bobroff@au1.ibm.com>
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      [mpe: Reorder the functions to make the diff readable]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c9dccf1d