1. 05 Jan, 2019 7 commits
    • Merge tag 'for-4.21' of git://git.armlinux.org.uk/~rmk/linux-arm · 1205b623
      Linus Torvalds authored
      Pull ARM updates from Russell King:
       "Included in this update:
      
         - Florian Fainelli noticed that userspace segfaults caused by the
           lack of kernel-userspace helpers were hard to diagnose; we now
           issue a warning when userspace tries to use the helpers but the
           kernel has them disabled.
      
         - Ben Dooks wants compatibility for the old ATAG serial number with
           DT systems.
      
         - Some cleanup of assembly by Nicolas Pitre.
      
         - User accessors optimisation from Vincent Whitchurch.
      
         - More robust kdump on SMP systems from Yufen Wang.
      
         - Sebastian Andrzej Siewior noticed problems with the SMP "boot_lock"
           on RT kernels, and so we convert the Versatile series of platforms
           to use a raw spinlock instead, consolidating the Versatile
           implementation. We entirely remove the boot_lock on OMAP systems,
           where it's unnecessary. Further patches for other systems will be
           submitted for the following merge window.
      
         - Start switching old StrongARM-11x0 systems to use gpiolib rather
           than their private GPIO implementation - mostly PCMCIA bits.
      
         - ARM Kconfig cleanups.
      
         - Clean up a mostly harmless mistake in the recent Spectre patch in
           4.20 (which had the effect that data that can be placed into the
           init sections was incorrectly always placed in the rodata section)"
      
      * tag 'for-4.21' of git://git.armlinux.org.uk/~rmk/linux-arm: (25 commits)
        ARM: omap2: remove unnecessary boot_lock
        ARM: versatile: rename and comment SMP implementation
        ARM: versatile: convert boot_lock to raw
        ARM: vexpress/realview: consolidate immitation CPU hotplug
        ARM: fix the cockup in the previous patch
        ARM: sa1100/cerf: switch to using gpio_led_register_device()
        ARM: sa1100/assabet: switch to using gpio leds
        ARM: sa1100/assabet: add gpio keys support for right-hand two buttons
        ARM: sa1111: remove legacy GPIO interfaces
        pcmcia: sa1100*: remove redundant bvd1/bvd2 setting
        ARM: pxa/lubbock: switch PCMCIA to MAX1600 library
        ARM: pxa/mainstone: switch PCMCIA to MAX1600 library and gpiod APIs
        ARM: sa1100/neponset: switch PCMCIA to MAX1600 library and gpiod APIs
        ARM: sa1100/jornada720: switch PCMCIA to gpiod APIs
        pcmcia: add MAX1600 library
        ARM: sa1100: explicitly register sa11x0-pcmcia devices
        ARM: 8813/1: Make aligned 2-byte getuser()/putuser() atomic on ARMv6+
        ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS
        ARM: 8811/1: always list both ldrd/strd registers explicitly
        ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
        ...
    • Merge tag 'csky-for-linus-4.21' of git://github.com/c-sky/csky-linux · 9ee3b3f4
      Linus Torvalds authored
      Pull arch/csky updates from Guo Ren:
       "Here are three main features (cpu_hotplug, basic ftrace, basic perf)
        and some bugfixes:
      
        Features:
         - Add CPU-hotplug support for SMP
         - Add ftrace with function trace and function graph trace
         - Add Perf support
         - Add EM_CSKY_OLD 39
         - optimize kernel panic print.
         - remove syscall_exit_work
      
        Bugfixes:
         - fix abiv2 mmap(... O_SYNC) failure
         - fix gdb coredump error
         - remove vdsp implement for kernel
         - fix qemu failure to bootup sometimes
         - fix ftrace call-graph panic
         - fix device tree node reference leak
         - remove meaningless header-y
         - fix save hi,lo,dspcr regs in switch_stack
         - remove unused members in processor.h"
      
      * tag 'csky-for-linus-4.21' of git://github.com/c-sky/csky-linux:
        csky: Add perf support for C-SKY
        csky: Add EM_CSKY_OLD 39
        clocksource/drivers/c-sky: fixup ftrace call-graph panic
        csky: ftrace call graph supported.
        csky: basic ftrace supported
        csky: remove unused members in processor.h
        csky: optimize kernel panic print.
        csky: stacktrace supported.
        csky: CPU-hotplug supported for SMP
        clocksource/drivers/c-sky: fixup qemu fail to bootup sometimes.
        csky: fixup save hi,lo,dspcr regs in switch_stack.
        csky: remove syscall_exit_work
        csky: fixup remove vdsp implement for kernel.
        csky: bugfix gdb coredump error.
        csky: fixup abiv2 mmap(... O_SYNC) failed.
        csky: define syscall_get_arch()
        elf-em.h: add EM_CSKY
        csky: remove meaningless header-y
        csky: Don't leak device tree node reference
    • Merge branch 'akpm' (patches from Andrew) · a6598110
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
      
       - procfs updates
      
       - various misc bits
      
       - lib/ updates
      
       - epoll updates
      
       - autofs
      
       - fatfs
      
       - a few more MM bits
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
        mm/page_io.c: fix polled swap page in
        checkpatch: add Co-developed-by to signature tags
        docs: fix Co-Developed-by docs
        drivers/base/platform.c: kmemleak ignore a known leak
        fs: don't open code lru_to_page()
        fs/: remove caller signal_pending branch predictions
        mm/: remove caller signal_pending branch predictions
        arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
        kernel/sched/: remove caller signal_pending branch predictions
        kernel/locking/mutex.c: remove caller signal_pending branch predictions
        mm: select HAVE_MOVE_PMD on x86 for faster mremap
        mm: speed up mremap by 20x on large regions
        mm: treewide: remove unused address argument from pte_alloc functions
        initramfs: cleanup incomplete rootfs
        scripts/gdb: fix lx-version string output
        kernel/kcov.c: mark write_comp_data() as notrace
        kernel/sysctl: add panic_print into sysctl
        panic: add options to print system info when panic happens
        bfs: extra sanity checking and static inode bitmap
        exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
        ...
    • ia64: fix compile without swiotlb · 3fed6ae4
      Christoph Hellwig authored
      Some non-generic ia64 configs don't build swiotlb, and thus should not
      pull in the generic non-coherent DMA infrastructure.
      
      Fixes: 68c60834 ("swiotlb: remove dma_mark_clean")
      Reported-by: Tony Luck <tony.luck@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: re-introduce non-generic memcpy_{to,from}io · 170d13ca
      Linus Torvalds authored
      This has been broken forever, and nobody ever really noticed because
      it's purely a performance issue.
      
      Long long ago, in commit 6175ddf0 ("x86: Clean up mem*io functions")
      Brian Gerst simplified the memory copies to and from iomem, since on
      x86, the instructions to access iomem are exactly the same as the
      regular instructions.
      
      That is technically true, and things worked, and nobody said anything.
      Besides, back then the regular memcpy was pretty simple and worked fine.
      
      Nobody noticed except for David Laight, that is.  David has been testing a
      TLP monitor he was writing for an FPGA, and has been occasionally
      complaining about how memcpy_toio() writes things one byte at a time.
      
      Which is completely unacceptable from a performance standpoint, even if
      it happens to technically work.
      
      The reason it's writing one byte at a time is because while it's
      technically true that accesses to iomem are the same as accesses to
      regular memory on x86, the _granularity_ (and ordering) of accesses
      matter to iomem in ways that they don't matter to regular cached memory.
      
      In particular, when ERMS is set, we default to using "rep movsb" for
      larger memory copies.  That is indeed perfectly fine for real memory,
      since the whole point is that the CPU is going to do cacheline
      optimizations and execute the memory copy efficiently for cached
      memory.
      
      With iomem? Not so much.  With iomem, "rep movsb" will indeed work, but
      it will copy things one byte at a time. Slowly and ponderously.
      
      Now, originally, back in 2010 when commit 6175ddf0 was done, we
      didn't use ERMS, and this was much less noticeable.
      
      Our normal memcpy() was simpler in other ways too.
      
      Because in fact, it's not just about using the string instructions.  Our
      memcpy() these days does things like "read and write overlapping values"
      to handle the last bytes of the copy.  Again, for normal memory,
      overlapping accesses aren't an issue.  For iomem? It can be.
      
      So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and
      memset_io() functions.  It doesn't particularly optimize them, but it
      tries to at least not be horrid, or do overlapping accesses.  In fact,
      this uses the existing __inline_memcpy() function that we still had
      lying around that uses our very traditional "rep movsl" loop followed by
      movsw/movsb for the final bytes.
      
      Somebody may decide to try to improve on it, but if we've gone almost a
      decade with only one person really ever noticing and complaining, maybe
      it's not worth worrying about further, once it's not _completely_ broken?
      
      Reported-by: David Laight <David.Laight@aculab.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
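The copy strategy described in the message above (whole 32-bit words first, then a 16-bit and an 8-bit access for the final bytes, with no access overlapping another) can be sketched in portable userspace C. This is only an illustration under assumptions: `copy_words_then_tail` is a hypothetical name, it is not the kernel's `__inline_memcpy()`, and fixed-size `memcpy` calls stand in for the individual `movsl`/`movsw`/`movsb` accesses. The return value counts the accesses so the granularity is visible.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): copy n bytes using 4-byte
 * chunks, then at most one 2-byte chunk, then at most one single byte.
 * No byte is written twice. Returns the number of chunk copies made.
 */
static size_t copy_words_then_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t accesses = 0;

    /* Bulk of the copy: one 32-bit access per iteration. */
    while (n >= 4) {
        memcpy(d, s, 4);
        d += 4;
        s += 4;
        n -= 4;
        accesses++;
    }

    /* Tail: a 16-bit access if two or three bytes remain. */
    if (n >= 2) {
        memcpy(d, s, 2);
        d += 2;
        s += 2;
        n -= 2;
        accesses++;
    }

    /* Tail: a final single byte if the length was odd. */
    if (n == 1) {
        *d = *s;
        accesses++;
    }

    return accesses;
}
```

For an 11-byte copy this makes four accesses (4 + 4 + 2 + 1 bytes), where a byte-at-a-time copy would make eleven. Real device memory would additionally need volatile accessors so the compiler cannot merge or split the accesses; that part is deliberately left out of this sketch.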
    • Use __put_user_goto in __put_user_size() and unsafe_put_user() · a959dc88
      Linus Torvalds authored
      This actually enables the __put_user_goto() functionality in
      unsafe_put_user().
      
      For an example of the effect of this, this is the code generated for the
      
              unsafe_put_user(signo, &infop->si_signo, Efault);
      
      in the waitid() system call:
      
              movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_2]
      
      It's just one single store instruction, along with generating an
      exception table entry pointing to the Efault label in case that
      instruction faults.
      
      Before, we would generate this:
      
              xorl    %edx, %edx
              movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_3]
              testl   %edx, %edx
              jne     .L309
      
      with the exception table generated for that 'mov' instruction causing us
      to jump to a stub that set %edx to -EFAULT and then jumped back to the
      'testl' instruction.
      
      So not only do we now get rid of the extra code in the normal sequence,
      we also avoid unnecessarily keeping that extra error register live
      across it all.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
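The difference between the two calling conventions discussed above can be illustrated in plain C, without any inline assembly. This is a sketch under loud assumptions: `put_err`, `put_goto`, `fill_old` and `fill_new` are hypothetical names, and a NULL pointer check stands in for a faulting user address (the kernel uses an exception table entry, not a comparison). It only shows why the label form needs no error variable kept live and no test after each store.

```c
#include <errno.h>
#include <stddef.h>

/* Old style (hypothetical): failure is reported through an error
 * variable that the caller has to test after every store. */
#define put_err(val, ptr, err)          \
    do {                                \
        if ((ptr) == NULL)              \
            (err) = -EFAULT;            \
        else                            \
            *(ptr) = (val);             \
    } while (0)

/* New style (hypothetical): failure jumps straight to the caller's
 * label, so the success path is just the store. */
#define put_goto(val, ptr, label)       \
    do {                                \
        if ((ptr) == NULL)              \
            goto label;                 \
        *(ptr) = (val);                 \
    } while (0)

/* Two stores with an explicit error check after each one. */
static int fill_old(int *signo_out, int *code_out, int signo, int code)
{
    int err = 0;

    put_err(signo, signo_out, err);
    if (err)
        return err;
    put_err(code, code_out, err);
    if (err)
        return err;
    return 0;
}

/* The same two stores sharing one error label. */
static int fill_new(int *signo_out, int *code_out, int signo, int code)
{
    put_goto(signo, signo_out, Efault);
    put_goto(code, code_out, Efault);
    return 0;

Efault:
    return -EFAULT;
}
```

Both functions return 0 on success and -EFAULT when either destination is NULL; `fill_new` simply has less code on the path where nothing goes wrong.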
    • x86 uaccess: Introduce __put_user_goto · 4a789213
      Linus Torvalds authored
      This is finally the actual reason for the odd error handling in the
      "unsafe_get/put_user()" functions, introduced over three years ago.
      
      Using a "jump to error label" interface is somewhat odd, but very
      convenient as a programming interface, and more importantly, it fits
      very well with simply making the target be the exception handler address
      directly from the inline asm.
      
      The reason it took over three years to actually do this? We need "asm
      goto" support for it, which only became the default on x86 last year.
      It's now been a year that we've forced asm goto support (see commit
      e501ce95 "x86: Force asm-goto"), and so let's just do it here too.
      
      [ Side note: this commit was originally done back in 2016. The above
        commentary about timing is obviously about it only now getting merged
        into my real upstream tree     - Linus ]
      
      Sadly, gcc still only supports "asm goto" with asms that do not have any
      outputs, so we are limited to only the put_user case for this.  Maybe in
      several more years we can do the get_user case too.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
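As a rough illustration of the "asm goto" construct the message refers to, the sketch below branches from inside an inline asm statement directly to a C label. It is not the kernel's `__put_user_goto`: `is_nonzero` is a hypothetical function, there is no exception table involved, and the asm path is only compiled for GCC-compatible compilers on x86, with a plain C fallback elsewhere. Note that the statement has inputs, clobbers and a label list but no outputs, matching the restriction described above.

```c
/*
 * Hypothetical example: return 1 if x is non-zero, 0 otherwise.
 * On x86 with a GCC-compatible compiler the branch is emitted from
 * inline asm and targets the C label directly.
 */
static int is_nonzero(int x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ goto("testl %0, %0\n\t"
                 "jnz %l[nonzero]"
                 : /* no outputs */
                 : "r"(x)
                 : "cc"
                 : nonzero);
    return 0;

nonzero:
    return 1;
#else
    /* Portable fallback for other compilers and architectures. */
    return x != 0;
#endif
}
```

Calling `is_nonzero(0)` yields 0 and any other argument yields 1. Because the result is carried by control flow rather than by an output operand, the same trick works for a store that either succeeds or jumps to an error label, but not for a load that must also return a value.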
  2. 04 Jan, 2019 33 commits