An error occurred fetching the project authors.
  1. 20 Oct, 2020 1 commit
  2. 15 Aug, 2020 1 commit
    • Xiaoming Ni's avatar
      all arch: remove system call sys_sysctl · 88db0aa2
      Xiaoming Ni authored
      Since commit 61a47c1a ("sysctl: Remove the sysctl system call"),
      sys_sysctl is actually unavailable: any input can only return an error.
      
      We have been warning about people using the sysctl system call for years
      and believe there are no more users.  Even if there are users of this
      interface if they have not complained or fixed their code by now they
      probably are not going to, so there is no point in warning them any
      longer.
      
      So completely remove sys_sysctl on all architectures.
      
      [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
       Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.comSigned-off-by: default avatarXiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: chenzefeng <chenzefeng2@huawei.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kars de Jong <jongk@linux-m68k.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Sven Schnelle <svens@stackframe.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      88db0aa2
  3. 12 Aug, 2020 1 commit
  4. 27 Jul, 2020 1 commit
    • Al Viro's avatar
      introduction of regset ->get() wrappers, switching ELF coredumps to those · b4e9c954
      Al Viro authored
      Two new helpers: given a process and regset, dump into a buffer.
      regset_get() takes a buffer and size, regset_get_alloc() takes size
      and allocates a buffer.
      
      Return value in both cases is the amount of data actually dumped in
      case of success or -E...  on error.
      
      In both cases the size is capped by regset->n * regset->size, so
      ->get() is called with offset 0 and size no more than what regset
      expects.
      
      binfmt_elf.c callers of ->get() are switched to using those; the other
      caller (copy_regset_to_user()) will need some preparations to switch.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b4e9c954
  5. 24 Jul, 2020 1 commit
    • Thomas Gleixner's avatar
      entry: Provide generic syscall entry functionality · 142781e1
      Thomas Gleixner authored
      On syscall entry certain work needs to be done:
      
         - Establish state (lockdep, context tracking, tracing)
         - Conditional work (ptrace, seccomp, audit...)
      
      This code is needlessly duplicated and  different in all
      architectures.
      
      Provide a generic version based on the x86 implementation which has all the
      RCU and instrumentation bits right.
      
      As interrupt/exception entry from user space needs parts of the same
      functionality, provide a function for this as well.
      
      syscall_enter_from_user_mode() and irqentry_enter_from_user_mode() must be
      called right after the low level ASM entry. The calling code must be
      non-instrumentable. After the functions returns state is correct and the
      subsequent functions can be instrumented.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lkml.kernel.org/r/20200722220519.513463269@linutronix.de
      142781e1
  6. 04 Jul, 2020 1 commit
  7. 24 Jun, 2020 1 commit
  8. 19 May, 2020 1 commit
    • David Howells's avatar
      pipe: Add general notification queue support · c73be61c
      David Howells authored
      Make it possible to have a general notification queue built on top of a
      standard pipe.  Notifications are 'spliced' into the pipe and then read
      out.  splice(), vmsplice() and sendfile() are forbidden on pipes used for
      notifications as post_one_notification() cannot take pipe->mutex.  This
      means that notifications could be posted in between individual pipe
      buffers, making iov_iter_revert() difficult to effect.
      
      The way the notification queue is used is:
      
       (1) An application opens a pipe with a special flag and indicates the
           number of messages it wishes to be able to queue at once (this can
           only be set once):
      
      	pipe2(fds, O_NOTIFICATION_PIPE);
      	ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
      
       (2) The application then uses poll() and read() as normal to extract data
           from the pipe.  read() will return multiple notifications if the
           buffer is big enough, but it will not split a notification across
           buffers - rather it will return a short read or EMSGSIZE.
      
           Notification messages include a length in the header so that the
           caller can split them up.
      
      Each message has a header that describes it:
      
      	struct watch_notification {
      		__u32	type:24;
      		__u32	subtype:8;
      		__u32	info;
      	};
      
      The type indicates the source (eg. mount tree changes, superblock events,
      keyring changes, block layer events) and the subtype indicates the event
      type (eg. mount, unmount; EIO, EDQUOT; link, unlink).  The info field
      indicates a number of things, including the entry length, an ID assigned to
      a watchpoint contributing to this buffer and type-specific flags.
      
      Supplementary data, such as the key ID that generated an event, can be
      attached in additional slots.  The maximum message size is 127 bytes.
      Messages may not be padded or aligned, so there is no guarantee, for
      example, that the notification type will be on a 4-byte bounary.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      c73be61c
  9. 15 May, 2020 1 commit
  10. 31 Jan, 2020 1 commit
    • Dmitry Vyukov's avatar
      kcov: ignore fault-inject and stacktrace · 43e76af8
      Dmitry Vyukov authored
      Don't instrument 3 more files that contain debugging facilities and
      produce large amounts of uninteresting coverage for every syscall.
      
      The following snippets are sprinkled all over the place in kcov traces
      in a debugging kernel.  We already try to disable instrumentation of
      stack unwinding code and of most debug facilities.  I guess we did not
      use fault-inject.c at the time, and stacktrace.c was somehow missed (or
      something has changed in kernel/configs).  This change both speeds up
      kcov (kernel doesn't need to store these PCs, user-space doesn't need to
      process them) and frees trace buffer capacity for more useful coverage.
      
        should_fail
        lib/fault-inject.c:149
        fail_dump
        lib/fault-inject.c:45
      
        stack_trace_save
        kernel/stacktrace.c:124
        stack_trace_consume_entry
        kernel/stacktrace.c:86
        stack_trace_consume_entry
        kernel/stacktrace.c:89
        ... a hundred frames skipped ...
        stack_trace_consume_entry
        kernel/stacktrace.c:93
        stack_trace_consume_entry
        kernel/stacktrace.c:86
      
      Link: http://lkml.kernel.org/r/20200116111449.217744-1-dvyukov@gmail.comSigned-off-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      43e76af8
  11. 16 Nov, 2019 2 commits
  12. 11 Nov, 2019 1 commit
  13. 30 Sep, 2019 1 commit
  14. 06 Sep, 2019 1 commit
  15. 04 Sep, 2019 1 commit
    • Masahiro Yamada's avatar
      kbuild: add $(BASH) to run scripts with bash-extension · 858805b3
      Masahiro Yamada authored
      CONFIG_SHELL falls back to sh when bash is not installed on the system,
      but nobody is testing such a case since bash is usually installed.
      So, shell scripts invoked by CONFIG_SHELL are only tested with bash.
      
      It makes it difficult to test whether the hashbang #!/bin/sh is real.
      For example, #!/bin/sh in arch/powerpc/kernel/prom_init_check.sh is
      false. (I fixed it up)
      
      Besides, some shell scripts invoked by CONFIG_SHELL use bash-extension
      and #!/bin/bash is specified as the hashbang, while CONFIG_SHELL may
      not always be set to bash.
      
      Probably, the right thing to do is to introduce BASH, which is bash by
      default, and always set CONFIG_SHELL to sh. Replace $(CONFIG_SHELL)
      with $(BASH) for bash scripts.
      
      If somebody tries to add bash-extension to a #!/bin/sh script, it will
      be caught in testing because /bin/sh is a symlink to dash on some major
      distributions.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      858805b3
  16. 05 Aug, 2019 1 commit
  17. 03 Aug, 2019 1 commit
  18. 24 May, 2019 1 commit
  19. 15 May, 2019 1 commit
  20. 29 Apr, 2019 1 commit
    • Joel Fernandes (Google)'s avatar
      Provide in-kernel headers to make extending kernel easier · 43d8ce9d
      Joel Fernandes (Google) authored
      Introduce in-kernel headers which are made available as an archive
      through proc (/proc/kheaders.tar.xz file). This archive makes it
      possible to run eBPF and other tracing programs that need to extend the
      kernel for tracing purposes without any dependency on the file system
      having headers.
      
      A github PR is sent for the corresponding BCC patch at:
      https://github.com/iovisor/bcc/pull/2312
      
      On Android and embedded systems, it is common to switch kernels but not
      have kernel headers available on the file system. Further once a
      different kernel is booted, any headers stored on the file system will
      no longer be useful. This is an issue even well known to distros.
      By storing the headers as a compressed archive within the kernel, we can
      avoid these issues that have been a hindrance for a long time.
      
      The best way to use this feature is by building it in. Several users
      have a need for this, when they switch debug kernels, they do not want to
      update the filesystem or worry about it where to store the headers on
      it. However, the feature is also buildable as a module in case the user
      desires it not being part of the kernel image. This makes it possible to
      load and unload the headers from memory on demand. A tracing program can
      load the module, do its operations, and then unload the module to save
      kernel memory. The total memory needed is 3.3MB.
      
      By having the archive available at a fixed location independent of
      filesystem dependencies and conventions, all debugging tools can
      directly refer to the fixed location for the archive, without concerning
      with where the headers on a typical filesystem which significantly
      simplifies tooling that needs kernel headers.
      
      The code to read the headers is based on /proc/config.gz code and uses
      the same technique to embed the headers.
      
      Other approaches were discussed such as having an in-memory mountable
      filesystem, but that has drawbacks such as requiring an in-kernel xz
      decompressor which we don't have today, and requiring usage of 42 MB of
      kernel memory to host the decompressed headers at anytime. Also this
      approach is simpler than such approaches.
      Reviewed-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43d8ce9d
  21. 03 Apr, 2019 1 commit
    • Peter Zijlstra's avatar
      x86/uaccess, kcov: Disable stack protector · 40ea9729
      Peter Zijlstra authored
      New tooling noticed this mishap:
      
        kernel/kcov.o: warning: objtool: write_comp_data()+0x138: call to __stack_chk_fail() with UACCESS enabled
        kernel/kcov.o: warning: objtool: __sanitizer_cov_trace_pc()+0xd9: call to __stack_chk_fail() with UACCESS enabled
      
      All the other instrumentation (KASAN,UBSAN) also have stack protector
      disabled.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      40ea9729
  22. 08 Mar, 2019 1 commit
  23. 06 Jan, 2019 1 commit
  24. 07 Dec, 2018 1 commit
    • Arnd Bergmann's avatar
      y2038: futex: Move compat implementation into futex.c · 04e7712f
      Arnd Bergmann authored
      We are going to share the compat_sys_futex() handler between 64-bit
      architectures and 32-bit architectures that need to deal with both 32-bit
      and 64-bit time_t, and this is easier if both entry points are in the
      same file.
      
      In fact, most other system call handlers do the same thing these days, so
      let's follow the trend here and merge all of futex_compat.c into futex.c.
      
      In the process, a few minor changes have to be done to make sure everything
      still makes sense: handle_futex_death() and futex_cmpxchg_enabled() become
      local symbol, and the compat version of the fetch_robust_entry() function
      gets renamed to compat_fetch_robust_entry() to avoid a symbol clash.
      
      This is intended as a purely cosmetic patch, no behavior should
      change.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      04e7712f
  25. 19 Nov, 2018 1 commit
  26. 04 Sep, 2018 1 commit
    • Alexander Popov's avatar
      x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls · afaef01c
      Alexander Popov authored
      The STACKLEAK feature (initially developed by PaX Team) has the following
      benefits:
      
      1. Reduces the information that can be revealed through kernel stack leak
         bugs. The idea of erasing the thread stack at the end of syscalls is
         similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
         crypto, which all comply with FDP_RIP.2 (Full Residual Information
         Protection) of the Common Criteria standard.
      
      2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
         CVE-2010-2963). That kind of bugs should be killed by improving C
         compilers in future, which might take a long time.
      
      This commit introduces the code filling the used part of the kernel
      stack with a poison value before returning to userspace. Full
      STACKLEAK feature also contains the gcc plugin which comes in a
      separate commit.
      
      The STACKLEAK feature is ported from grsecurity/PaX. More information at:
        https://grsecurity.net/
        https://pax.grsecurity.net/
      
      This code is modified from Brad Spengler/PaX Team's code in the last
      public patch of grsecurity/PaX based on our understanding of the code.
      Changes or omissions from the original code are ours and don't reflect
      the original grsecurity/PaX code.
      
      Performance impact:
      
      Hardware: Intel Core i7-4770, 16 GB RAM
      
      Test #1: building the Linux kernel on a single core
              0.91% slowdown
      
      Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
              4.2% slowdown
      
      So the STACKLEAK description in Kconfig includes: "The tradeoff is the
      performance impact: on a single CPU system kernel compilation sees a 1%
      slowdown, other systems and workloads may vary and you are advised to
      test this feature on your expected workload before deploying it".
      Signed-off-by: default avatarAlexander Popov <alex.popov@linux.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      afaef01c
  27. 17 Jul, 2018 1 commit
    • Masahiro Yamada's avatar
      kbuild: move bin2c back to scripts/ from scripts/basic/ · c417fbce
      Masahiro Yamada authored
      Commit 8370edea ("bin2c: move bin2c in scripts/basic") moved bin2c
      to the scripts/basic/ directory, incorrectly stating "Kexec wants to
      use bin2c and it wants to use it really early in the build process.
      See arch/x86/purgatory/ code in later patches."
      
      Commit bdab125c ("Revert "kexec/purgatory: Add clean-up for
      purgatory directory"") and commit d6605b6b ("x86/build: Remove
      unnecessary preparation for purgatory") removed the redundant
      purgatory build magic entirely.
      
      That means that the move of bin2c was unnecessary in the first place.
      
      fixdep is the only host program that deserves to sit in the
      scripts/basic/ directory.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      c417fbce
  28. 14 Jun, 2018 1 commit
    • Christoph Hellwig's avatar
      dma-mapping: move all DMA mapping code to kernel/dma · cf65a0f6
      Christoph Hellwig authored
      Currently the code is split over various files with dma- prefixes in the
      lib/ and drives/base directories, and the number of files keeps growing.
      Move them into a single directory to keep the code together and remove
      the file name prefixes.  To match the irq infrastructure this directory
      is placed under the kernel/ directory.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      cf65a0f6
  29. 06 Jun, 2018 1 commit
    • Mathieu Desnoyers's avatar
      rseq: Introduce restartable sequences system call · d7822b1e
      Mathieu Desnoyers authored
      Expose a new system call allowing each thread to register one userspace
      memory area to be used as an ABI between kernel and user-space for two
      purposes: user-space restartable sequences and quick access to read the
      current CPU number value from user-space.
      
      * Restartable sequences (per-cpu atomics)
      
      Restartables sequences allow user-space to perform update operations on
      per-cpu data without requiring heavy-weight atomic operations.
      
      The restartable critical sections (percpu atomics) work has been started
      by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
      critical sections. [1] [2] The re-implementation proposed here brings a
      few simplifications to the ABI which facilitates porting to other
      architectures and speeds up the user-space fast path.
      
      Here are benchmarks of various rseq use-cases.
      
      Test hardware:
      
      arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
      x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading
      
      The following benchmarks were all performed on a single thread.
      
      * Per-CPU statistic counter increment
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:                344.0                 31.4          11.0
      x86-64:                15.3                  2.0           7.7
      
      * LTTng-UST: write event 32-bit header, 32-bit payload into tracer
                   per-cpu buffer
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:               2502.0                 2250.0         1.1
      x86-64:               117.4                   98.0         1.2
      
      * liburcu percpu: lock-unlock pair, dereference, read/compare word
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:                751.0                 128.5          5.8
      x86-64:                53.4                  28.6          1.9
      
      * jemalloc memory allocator adapted to use rseq
      
      Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
      rseq 2016 implementation):
      
      The production workload response-time has 1-2% gain avg. latency, and
      the P99 overall latency drops by 2-3%.
      
      * Reading the current CPU number
      
      Speeding up reading the current CPU number on which the caller thread is
      running is done by keeping the current CPU number up do date within the
      cpu_id field of the memory area registered by the thread. This is done
      by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
      current thread. Upon return to user-space, a notify-resume handler
      updates the current CPU value within the registered user-space memory
      area. User-space can then read the current CPU number directly from
      memory.
      
      Keeping the current cpu id in a memory area shared between kernel and
      user-space is an improvement over current mechanisms available to read
      the current CPU number, which has the following benefits over
      alternative approaches:
      
      - 35x speedup on ARM vs system call through glibc
      - 20x speedup on x86 compared to calling glibc, which calls vdso
        executing a "lsl" instruction,
      - 14x speedup on x86 compared to inlined "lsl" instruction,
      - Unlike vdso approaches, this cpu_id value can be read from an inline
        assembly, which makes it a useful building block for restartable
        sequences.
      - The approach of reading the cpu id through memory mapping shared
        between kernel and user-space is portable (e.g. ARM), which is not the
        case for the lsl-based x86 vdso.
      
      On x86, yet another possible approach would be to use the gs segment
      selector to point to user-space per-cpu data. This approach performs
      similarly to the cpu id cache, but it has two disadvantages: it is
      not portable, and it is incompatible with existing applications already
      using the gs segment selector for other purposes.
      
      Benchmarking various approaches for reading the current CPU number:
      
      ARMv7 Processor rev 4 (v7l)
      Machine model: Cubietruck
      - Baseline (empty loop):                                    8.4 ns
      - Read CPU from rseq cpu_id:                               16.7 ns
      - Read CPU from rseq cpu_id (lazy register):               19.8 ns
      - glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
      - getcpu system call:                                     234.9 ns
      
      x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
      - Baseline (empty loop):                                    0.8 ns
      - Read CPU from rseq cpu_id:                                0.8 ns
      - Read CPU from rseq cpu_id (lazy register):                0.8 ns
      - Read using gs segment selector:                           0.8 ns
      - "lsl" inline assembly:                                   13.0 ns
      - glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
      - getcpu system call:                                      53.9 ns
      
      - Speed (benchmark taken on v8 of patchset)
      
      Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
      expectations, that enabling CONFIG_RSEQ slightly accelerates the
      scheduler:
      
      Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
      2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
      saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
      kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
      restartable sequences series applied.
      
      * CONFIG_RSEQ=n
      
      avg.:      41.37 s
      std.dev.:   0.36 s
      
      * CONFIG_RSEQ=y
      
      avg.:      40.46 s
      std.dev.:   0.33 s
      
      - Size
      
      On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
      567 bytes, and the data size increase of vmlinux is 5696 bytes.
      
      [1] https://lwn.net/Articles/650333/
      [2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdfSigned-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-api@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
      Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
      Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
      d7822b1e
  30. 16 May, 2018 1 commit
  31. 13 Jan, 2018 1 commit
    • Masami Hiramatsu's avatar
      error-injection: Support fault injection framework · 4b1a29a7
      Masami Hiramatsu authored
      Support in-kernel fault-injection framework via debugfs.
      This allows you to inject a conditional error to specified
      function using debugfs interfaces.
      
      Here is the result of test script described in
      Documentation/fault-injection/fault-injection.txt
      
        ===========
        # ./test_fail_function.sh
        1+0 records in
        1+0 records out
        1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0227404 s, 46.1 MB/s
        btrfs-progs v4.4
        See http://btrfs.wiki.kernel.org for more information.
      
        Label:              (null)
        UUID:               bfa96010-12e9-4360-aed0-42eec7af5798
        Node size:          16384
        Sector size:        4096
        Filesystem size:    1001.00MiB
        Block group profiles:
          Data:             single            8.00MiB
          Metadata:         DUP              58.00MiB
          System:           DUP              12.00MiB
        SSD detected:       no
        Incompat features:  extref, skinny-metadata
        Number of devices:  1
        Devices:
           ID        SIZE  PATH
            1  1001.00MiB  /dev/loop2
      
        mount: mount /dev/loop2 on /opt/tmpmnt failed: Cannot allocate memory
        SUCCESS!
        ===========
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4b1a29a7
  32. 02 Nov, 2017 1 commit
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  33. 09 Sep, 2017 2 commits
    • Luis R. Rodriguez's avatar
      kmod: move #ifdef CONFIG_MODULES wrapper to Makefile · 0ce2c202
      Luis R. Rodriguez authored
      The entire file is now conditionally compiled only when CONFIG_MODULES is
      enabled, and this this is a bool.  Just move this conditional to the
      Makefile as its easier to read this way.
      
      Link: http://lkml.kernel.org/r/20170810180618.22457-5-mcgrof@kernel.orgSigned-off-by: default avatarLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0ce2c202
    • Luis R. Rodriguez's avatar
      kmod: split out umh code into its own file · 23558693
      Luis R. Rodriguez authored
      Patch series "kmod: few code cleanups to split out umh code"
      
      The usermode helper has a provenance from the old usb code which first
      required a usermode helper.  Eventually this was shoved into kmod.c and
      the kernel's modprobe calls was converted over eventually to share the
      same code.  Over time the list of usermode helpers in the kernel has grown
      -- so kmod is just but one user of the API.
      
      This series is a simple logical cleanup which acknowledges the code
      evolution of the usermode helper and shoves the UMH API into its own
      dedicated file.  This way users of the API can later just include umh.h
      instead of kmod.h.
      
      Note despite the diff state the first patch really is just a code shove,
      no functional changes are done there.  I did use git format-patch -M to
      generate the patch, but in the end the split was not enough for git to
      consider it a rename hence the large diffstat.
      
      I've put this through 0-day and it gives me their machine compilation
      blessings with all tests as OK.
      
      This patch (of 4):
      
      There's a slew of usermode helper users and kmod is just one of them.
      Split out the usermode helper code into its own file to keep the logic and
      focus split up.
      
      This change provides no functional changes.
      
      Link: http://lkml.kernel.org/r/20170810180618.22457-2-mcgrof@kernel.orgSigned-off-by: default avatarLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      23558693
  34. 17 Aug, 2017 1 commit
    • Mathieu Desnoyers's avatar
      membarrier: Provide expedited private command · 22e4ebb9
      Mathieu Desnoyers authored
      Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built
      from all runqueues for which current thread's mm is the same as the
      thread calling sys_membarrier. It executes faster than the non-expedited
      variant (no blocking). It also works on NOHZ_FULL configurations.
      
      Scheduler-wise, it requires a memory barrier before and after context
      switching between processes (which have different mm). The memory
      barrier before context switch is already present. For the barrier after
      context switch:
      
      * Our TSO archs can do RELEASE without being a full barrier. Look at
        x86 spin_unlock() being a regular STORE for example.  But for those
        archs, all atomics imply smp_mb and all of them have atomic ops in
        switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
        barrier.
      
      * From all weakly ordered machines, only ARM64 and PPC can do RELEASE,
        the rest does indeed do smp_mb(), so there the spin_unlock() is a full
        barrier and we're good.
      
      * ARM64 has a very heavy barrier in switch_to(), which suffices.
      
      * PPC just removed its barrier from switch_to(), but appears to be
        talking about adding something to switch_mm(). So add a
        smp_mb__after_unlock_lock() for now, until this is settled on the PPC
        side.
      
      Changes since v3:
      - Properly document the memory barriers provided by each architecture.
      
      Changes since v2:
      - Address comments from Peter Zijlstra,
      - Add smp_mb__after_unlock_lock() after finish_lock_switch() in
        finish_task_switch() to add the memory barrier we need after storing
        to rq->curr. This is much simpler than the previous approach relying
        on atomic_dec_and_test() in mmdrop(), which actually added a memory
        barrier in the common case of switching between userspace processes.
      - Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
        kernel, rather than having the whole membarrier system call returning
        -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
        Adapt the CMD_QUERY mask accordingly.
      
      Changes since v1:
      - move membarrier code under kernel/sched/ because it uses the
        scheduler runqueue,
      - only add the barrier when we switch from a kernel thread. The case
        where we switch from a user-space thread is already handled by
        the atomic_dec_and_test() in mmdrop().
      - add a comment to mmdrop() documenting the requirement on the implicit
        memory barrier.
      
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Boqun Feng <boqun.feng@gmail.com>
      CC: Andrew Hunter <ahh@google.com>
      CC: Maged Michael <maged.michael@gmail.com>
      CC: gromer@google.com
      CC: Avi Kivity <avi@scylladb.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarDave Watson <davejwatson@fb.com>
      22e4ebb9
  35. 12 Jul, 2017 1 commit
  36. 09 May, 2017 1 commit
    • Hari Bathini's avatar
      crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE · 692f66f2
      Hari Bathini authored
      Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
      reuse crashkernel parameter for fadump", v4.
      
      Traditionally, kdump is used to save vmcore in case of a crash.  Some
      architectures like powerpc can save vmcore using architecture specific
      support instead of kexec/kdump mechanism.  Such architecture specific
      support also needs to reserve memory, to be used by dump capture kernel.
      crashkernel parameter can be a reused, for memory reservation, by such
      architecture specific infrastructure.
      
      This patchset removes dependency with CONFIG_KEXEC for crashkernel
      parameter and vmcoreinfo related code as it can be reused without kexec
      support.  Also, crashkernel parameter is reused instead of
      fadump_reserve_mem to reserve memory for fadump.
      
      The first patch moves crashkernel parameter parsing and vmcoreinfo
      related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE.  The
      second patch reuses the definitions of append_elf_note() & final_note()
      functions under CONFIG_CRASH_CORE in IA64 arch code.  The third patch
      removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
      in powerpc.  The next patch reuses crashkernel parameter for reserving
      memory for fadump, instead of the fadump_reserve_mem parameter.  This
      has the advantage of using all syntaxes crashkernel parameter supports,
      for fadump as well.  The last patch updates fadump kernel documentation
      about use of crashkernel parameter.
      
      This patch (of 5):
      
      Traditionally, kdump is used to save vmcore in case of a crash.  Some
      architectures like powerpc can save vmcore using architecture specific
      support instead of kexec/kdump mechanism.  Such architecture specific
      support also needs to reserve memory, to be used by dump capture kernel.
      crashkernel parameter can be a reused, for memory reservation, by such
      architecture specific infrastructure.
      
      But currently, code related to vmcoreinfo and parsing of crashkernel
      parameter is built under CONFIG_KEXEC_CORE.  This patch introduces
      CONFIG_CRASH_CORE and moves the above mentioned code under this config,
      allowing code reuse without dependency on CONFIG_KEXEC.  There is no
      functional change with this patch.
      
      Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.comSigned-off-by: default avatarHari Bathini <hbathini@linux.vnet.ibm.com>
      Acked-by: default avatarDave Young <dyoung@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      692f66f2
  37. 27 Dec, 2016 1 commit
  38. 15 Dec, 2016 1 commit