1. 11 Aug, 2015 1 commit
    • David Howells's avatar
      sign-file: Document dependency on OpenSSL devel libraries · f81977b4
      David Howells authored
      
      The revised sign-file program is no longer a script that wraps the openssl
      program, but now rather a program that makes use of the OpenSSL's crypto
      library.  This means that to build the sign-file program, the kernel build
      process now has a dependency on openssl-devel in addition to openssl.
      
      Document this in Kconfig and in module-signing.txt.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f81977b4
  2. 07 Aug, 2015 6 commits
  3. 04 Jul, 2015 1 commit
  4. 01 Jul, 2015 1 commit
    • Ingo Molnar's avatar
      printk: Increase maximum CONFIG_LOG_BUF_SHIFT from 21 to 25 · fb39f98d
      Ingo Molnar authored
      
      So I tried to some kernel debugging that produced a ton of kernel messages
      on a big box, and wanted to save them all: but CONFIG_LOG_BUF_SHIFT maxes
      out at 21 (2 MB).
      
      Increase it to 25 (32 MB).
      
      This does not affect any existing config or defaults.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fb39f98d
  5. 26 Jun, 2015 1 commit
    • Iago López Galeiras's avatar
      fs, proc: introduce CONFIG_PROC_CHILDREN · 2e13ba54
      Iago López Galeiras authored
      Commit 81841161 ("fs, proc: introduce /proc/<pid>/task/<tid>/children
      entry") introduced the children entry for checkpoint restore and the
      file is only available on kernels configured with CONFIG_EXPERT and
      CONFIG_CHECKPOINT_RESTORE.
      
      This is available in most distributions (Fedora, Debian, Ubuntu, CoreOS)
      because they usually enable CONFIG_EXPERT and CONFIG_CHECKPOINT_RESTORE.
      But Arch does not enable CONFIG_EXPERT or CONFIG_CHECKPOINT_RESTORE.
      
      However, the children proc file is useful outside of checkpoint restore.
      I would like to use it in rkt.  The rkt process exec() another program
      it does not control, and that other program will fork()+exec() a child
      process.  I would like to find the pid of the child process from an
      external tool without iterating in /proc over all processes to find
      which one has a parent pid equal to rkt.
      
      This commit introduces CONFIG_PROC_CHILDREN and makes
      CONFIG_CHECKPOINT_RESTORE select it.  This allows enabling
      /proc/<pid>/task/<...
      2e13ba54
  6. 23 Jun, 2015 1 commit
  7. 02 Jun, 2015 1 commit
    • Tejun Heo's avatar
      writeback: add {CONFIG|BDI_CAP|FS}_CGROUP_WRITEBACK · 89e9b9e0
      Tejun Heo authored
      
      cgroup writeback requires support from both bdi and filesystem sides.
      Add BDI_CAP_CGROUP_WRITEBACK and FS_CGROUP_WRITEBACK to indicate
      support and enable BDI_CAP_CGROUP_WRITEBACK on block based bdi's by
      default.  Also, define CONFIG_CGROUP_WRITEBACK which is enabled if
      both MEMCG and BLK_CGROUP are enabled.
      
      inode_cgwb_enabled() which determines whether a given inode's both bdi
      and fs support cgroup writeback is added.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      89e9b9e0
  8. 28 May, 2015 1 commit
  9. 27 May, 2015 10 commits
  10. 08 May, 2015 1 commit
  11. 15 Apr, 2015 1 commit
    • Iulia Manda's avatar
      kernel: conditionally support non-root users, groups and capabilities · 2813893f
      Iulia Manda authored
      There are a lot of embedded systems that run most or all of their
      functionality in init, running as root:root.  For these systems,
      supporting multiple users is not necessary.
      
      This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
      non-root users, non-root groups, and capabilities optional.  It is enabled
      under CONFIG_EXPERT menu.
      
      When this symbol is not defined, UID and GID are zero in any possible case
      and processes always have all capabilities.
      
      The following syscalls are compiled out: setuid, setregid, setgid,
      setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
      getgroups, setfsuid, setfsgid, capget, capset.
      
      Also, groups.c is compiled out completely.
      
      In kernel/capability.c, capable function was moved in order to avoid
      adding two ifdef blocks.
      
      This change saves about 25 KB on a defconfig build.  The most minimal
      kernels have total text sizes in the high hundreds of kB rather than
      low MB.  (The 25k goes ...
      2813893f
  12. 11 Apr, 2015 1 commit
  13. 02 Apr, 2015 1 commit
    • Ingo Molnar's avatar
      bpf: Fix the build on BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable · e1abf2cc
      Ingo Molnar authored
      
      So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only
      dependency, it also depends on the tracing infrastructure and on kprobes,
      without which it will fail to build with:
      
        In file included from kernel/trace/bpf_trace.c:14:0:
        kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’:
        kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’
          unsigned int val = current->trace_recursion;
        [...]
      
      It took quite some time to trigger this build failure, because right now
      BPF_SYSCALL is very obscure, depends on CONFIG_EXPERT. So also make BPF_SYSCALL
      more configurable, not just under CONFIG_EXPERT.
      
      If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing
      gateway as well.
      
      We might want to make this an interactive option later on, although
      I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of
      an indicator that the user wants BPF support.
      
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e1abf2cc
  14. 26 Feb, 2015 1 commit
    • Paul E. McKenney's avatar
      rcu: Add Kconfig option to expedite grace periods during boot · ee42571f
      Paul E. McKenney authored
      
      This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
      that emulates a very early boot rcu_expedite_gp().  A late-boot
      call to rcu_end_inkernel_boot() will provide the corresponding
      rcu_unexpedite_gp().  The late-boot call to rcu_end_inkernel_boot()
      should be made just before init is spawned.
      
      According to Arjan:
      
      > To show the boot time, I'm using the timestamp of the "Write protecting"
      > line, that's pretty much the last thing we print prior to ring 3 execution.
      >
      > A kernel with default RCU behavior (inside KVM, only virtual devices)
      > looks like this:
      >
      > [    0.038724] Write protecting the kernel read-only data: 10240k
      >
      > a kernel with expedited RCU (using the command line option, so that I
      > don't have to recompile between measurements and thus am completely
      > oranges-to-oranges)
      >
      > [    0.031768] Write protecting the kernel read-only data: 10240k
      >
      > which, in percentage, is an 18% improvement.
      Reported-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      ee42571f
  15. 14 Feb, 2015 1 commit
  16. 16 Jan, 2015 1 commit
    • Paul E. McKenney's avatar
      rcu: Optionally run grace-period kthreads at real-time priority · a94844b2
      Paul E. McKenney authored
      
      Recent testing has shown that under heavy load, running RCU's grace-period
      kthreads at real-time priority can improve performance (according to 0day
      test robot) and reduce the incidence of RCU CPU stall warnings.  However,
      most systems do just fine with the default non-realtime priorities for
      these kthreads, and it does not make sense to expose the entire user
      base to any risk stemming from this change, given that this change is
      of use only to a few users running extremely heavy workloads.
      
      Therefore, this commit allows users to specify realtime priorities
      for the grace-period kthreads, but leaves them running SCHED_OTHER
      by default.  The realtime priority may be specified at build time
      via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
      rcutree.kthread_prio parameter.  Either way, 0 says to continue the
      default SCHED_OTHER behavior and values from 1-99 specify that priority
      of SCHED_FIFO behavior.  Note that a value of 0 is not permitted when
      the RCU_BOOST Kconfig parameter is specified.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a94844b2
  17. 08 Jan, 2015 1 commit
  18. 07 Jan, 2015 1 commit
  19. 06 Jan, 2015 2 commits
    • Pranith Kumar's avatar
      rcu: Make SRCU optional by using CONFIG_SRCU · 83fe27ea
      Pranith Kumar authored
      
      SRCU is not necessary to be compiled by default in all cases. For tinification
      efforts not compiling SRCU unless necessary is desirable.
      
      The current patch tries to make compiling SRCU optional by introducing a new
      Kconfig option CONFIG_SRCU which is selected when any of the components making
      use of SRCU are selected.
      
      If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
      
         text    data     bss     dec     hex filename
         2007       0       0    2007     7d7 kernel/rcu/srcu.o
      
      Size of arch/powerpc/boot/zImage changes from
      
         text    data     bss     dec     hex filename
       831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
       829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after
      
      so the savings are about ~2000 bytes.
      Signed-off-by: default avatarPranith Kumar <bobby.prani@gmail.com>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Josh Triplett <josh@joshtriplett.org>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-...
      83fe27ea
    • Lai Jiangshan's avatar
      rcu: Remove "select IRQ_WORK" from config TREE_RCU · 5a43b88e
      Lai Jiangshan authored
      The 48a7639c
      
       ("rcu: Make callers awaken grace-period kthread")
      removed the irq_work_queue(), so the TREE_RCU doesn't need
      irq work any more.  This commit therefore updates RCU's Kconfig and
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5a43b88e
  20. 11 Dec, 2014 6 commits
    • Andy Lutomirski's avatar
      init: allow CONFIG_INIT_FALLBACK=n to disable defaults if init= fails · 6ef4536e
      Andy Lutomirski authored
      
      If a user puts init=/whatever on the command line and /whatever can't be
      run, then the kernel will try a few default options before giving up.  If
      init=/whatever came from a bootloader prompt, then this is unexpected but
      probably harmless.  On the other hand, if it comes from a script (e.g.  a
      tool like virtme or perhaps a future kselftest script), then the fallbacks
      are likely to exist, but they'll do the wrong thing.  For example, they
      might unexpectedly invoke systemd.
      
      This adds a config option CONFIG_INIT_FALLBACK.  If unset, then a failure
      to run the specified init= process be fatal.
      
      The tentative plan is to remove CONFIG_INIT_FALLBACK for 3.20.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Cc: Rob Landley <rob@landley.net>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Shuah Khan <shuah.kh@samsung.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6ef4536e
    • Johannes Weiner's avatar
      mm: move page->mem_cgroup bad page handling into generic code · 9edad6ea
      Johannes Weiner authored
      
      Now that the external page_cgroup data structure and its lookup is
      gone, let the generic bad_page() check for page->mem_cgroup sanity.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9edad6ea
    • Aneesh Kumar K.V's avatar
      mm/numa balancing: rearrange Kconfig entry · 6f7c97e8
      Aneesh Kumar K.V authored
      
      Add the default enable config option after the NUMA_BALANCING option so
      that it appears related in the nconfig interface.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6f7c97e8
    • Johannes Weiner's avatar
      kernel: res_counter: remove the unused API · 5b1efc02
      Johannes Weiner authored
      
      All memory accounting and limiting has been switched over to the
      lockless page counters.  Bye, res_counter!
      
      [akpm@linux-foundation.org: update Documentation/cgroups/memory.txt]
      [mhocko@suse.cz: ditch the last remainings of res_counter]
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b1efc02
    • Johannes Weiner's avatar
      mm: hugetlb_cgroup: convert to lockless page counters · 71f87bee
      Johannes Weiner authored
      
      Abandon the spinlock-protected byte counters in favor of the unlocked
      page counters in the hugetlb controller as well.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      71f87bee
    • Johannes Weiner's avatar
      mm: memcontrol: lockless page counters · 3e32cb2e
      Johannes Weiner authored
      
      Memory is internally accounted in bytes, using spinlock-protected 64-bit
      counters, even though the smallest accounting delta is a page.  The
      counter interface is also convoluted and does too many things.
      
      Introduce a new lockless word-sized page counter API, then change all
      memory accounting over to it.  The translation from and to bytes then only
      happens when interfacing with userspace.
      
      The removed locking overhead is noticable when scaling beyond the per-cpu
      charge caches - on a 4-socket machine with 144-threads, the following test
      shows the performance differences of 288 memcgs concurrently running a
      page fault benchmark:
      
      vanilla:
      
         18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
               1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
                  24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
           1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
      50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
       8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
       1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
           1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )
      
           132.474343877 seconds time elapsed                                          ( +-  0.21% )
      
      lockless:
      
         12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
                 832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
                  15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
           1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
      32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
       9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
       2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
           1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )
      
            91.369330729 seconds time elapsed                                          ( +-  0.45% )
      
      On top of improved scalability, this also gets rid of the icky long long
      types in the very heart of memcg, which is great for 32 bit and also makes
      the code a lot more readable.
      
      Notable differences between the old and new API:
      
      - res_counter_charge() and res_counter_charge_nofail() become
        page_counter_try_charge() and page_counter_charge() resp. to match
        the more common kernel naming scheme of try_do()/do()
      
      - res_counter_uncharge_until() is only ever used to cancel a local
        counter and never to uncharge bigger segments of a hierarchy, so
        it's replaced by the simpler page_counter_cancel()
      
      - res_counter_set_limit() is replaced by page_counter_limit(), which
        expects its callers to serialize against themselves
      
      - res_counter_memparse_write_strategy() is replaced by
        page_counter_limit(), which rounds down to the nearest page size -
        rather than up.  This is more reasonable for explicitely requested
        hard upper limits.
      
      - to keep charging light-weight, page_counter_try_charge() charges
        speculatively, only to roll back if the result exceeds the limit.
        Because of this, a failing bigger charge can temporarily lock out
        smaller charges that would otherwise succeed.  The error is bounded
        to the difference between the smallest and the biggest possible
        charge size, so for memcg, this means that a failing THP charge can
        send base page charges into reclaim upto 2MB (4MB) before the limit
        would have been reached.  This should be acceptable.
      
      [akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
      [akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3e32cb2e