1. 12 Mar, 2015 8 commits
    • Paul E. McKenney's avatar
      rcu: Add diagnostics to grace-period cleanup · 5c60d25f
      Paul E. McKenney authored
      At grace-period initialization time, RCU checks that all quiescent
      states were really reported for the previous grace period.  Now that
      grace-period cleanup has been split out of grace-period initialization,
      this commit also performs those checks at grace-period cleanup time.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5c60d25f
    • Paul E. McKenney's avatar
      rcutorture: Default to grace-period-initialization delays · 186bea5d
      Paul E. McKenney authored
      Given that CPU-hotplug events are now applied only at the starts of
      grace periods, it makes sense to unconditionally enable slow grace-period
      initialization for rcutorture testing.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      186bea5d
    • Paul E. McKenney's avatar
      rcu: Handle outgoing CPUs on exit from idle loop · 88428cc5
      Paul E. McKenney authored
      This commit informs RCU of an outgoing CPU just before that CPU invokes
      arch_cpu_idle_dead() during its last pass through the idle loop (via a
      new CPU_DYING_IDLE notifier value).  This change means that RCU need not
      deal with outgoing CPUs passing through the scheduler after informing
      RCU that they are no longer online.  Note that removing the CPU from
      the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
      and orphaning callbacks is still done at CPU_DEAD time, the reason being
      that at CPU_DEAD time we have another CPU that can adopt them.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      88428cc5
    • Paul E. McKenney's avatar
      cpu: Make CPU-offline idle-loop transition point more precise · 528a25b0
      Paul E. McKenney authored
      This commit uses a per-CPU variable to make the CPU-offline code path
      through the idle loop more precise, so that the outgoing CPU is
      guaranteed to make it into the idle loop before it is powered off.
      This commit is in preparation for putting the RCU offline-handling
      code on this code path, which will eliminate the magic one-jiffy
      wait that RCU uses as the maximum time for an outgoing CPU to get
      all the way through the scheduler.
      
      The magic one-jiffy wait for incoming CPUs remains a separate issue.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      528a25b0
    • Paul E. McKenney's avatar
      rcu: Eliminate ->onoff_mutex from rcu_node structure · c1990689
      Paul E. McKenney authored
      Because that RCU grace-period initialization need no longer exclude
      CPU-hotplug operations, this commit eliminates the ->onoff_mutex and
      its uses.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c1990689
    • Paul E. McKenney's avatar
      rcu: Process offlining and onlining only at grace-period start · 0aa04b05
      Paul E. McKenney authored
      Races between CPU hotplug and grace periods can be difficult to resolve,
      so the ->onoff_mutex is used to exclude the two events.  Unfortunately,
      this means that it is impossible for an outgoing CPU to perform the
      last bits of its offlining from its last pass through the idle loop,
      because sleeplocks cannot be acquired in that context.
      
      This commit avoids these problems by buffering online and offline events
      in a new ->qsmaskinitnext field in the leaf rcu_node structures.  When a
      grace period starts, the events accumulated in this mask are applied to
      the ->qsmaskinit field, and, if needed, up the rcu_node tree.  The special
      case of all CPUs corresponding to a given leaf rcu_node structure being
      offline while there are still elements in that structure's ->blkd_tasks
      list is handled using a new ->wait_blkd_tasks field.  In this case,
      propagating the offline bits up the tree is deferred until the beginning
      of the grace period after all of the tasks have exited their RCU read-side
      critical sections and removed themselves from the list, at which point
      the ->wait_blkd_tasks flag is cleared.  If one of that leaf rcu_node
      structure's CPUs comes back online before the list empties, then the
      ->wait_blkd_tasks flag is simply cleared.
      
      This of course means that RCU's notion of which CPUs are offline can be
      out of date.  This is OK because RCU need only wait on CPUs that were
      online at the time that the grace period started.  In addition, RCU's
      force-quiescent-state actions will handle the case where a CPU goes
      offline after the grace period starts.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0aa04b05
    • Paul E. McKenney's avatar
      rcu: Move rcu_report_unblock_qs_rnp() to common code · cc99a310
      Paul E. McKenney authored
      The rcu_report_unblock_qs_rnp() function is invoked when the
      last task blocking the current grace period exits its outermost
      RCU read-side critical section.  Previously, this was called only
      from rcu_read_unlock_special(), and was therefore defined only when
      CONFIG_RCU_PREEMPT=y.  However, this function will be invoked even when
      CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at
      the beginnings of RCU grace periods.  The reason for this change is that
      the last task on a given leaf rcu_node structure's ->blkd_tasks list
      might well exit its RCU read-side critical section between the time that
      recent CPU-hotplug operations were applied and when the new grace period
      was initialized.  This situation could result in RCU waiting forever on
      that leaf rcu_node structure, because if all that structure's CPUs were
      already offline, there would be no quiescent-state events to drive that
      structure's part of the grace period.
      
      This commit therefore moves rcu_report_unblock_qs_rnp() to common code
      that is built unconditionally so that the quiescent-state-forcing code
      can clean up after this situation, avoiding the grace-period stall.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      cc99a310
    • Paul E. McKenney's avatar
      rcu: Rework preemptible expedited bitmask handling · 8eb74b2b
      Paul E. McKenney authored
      Currently, the rcu_node tree ->expmask bitmasks are initially set to
      reflect the online CPUs.  This is pointless, because only the CPUs
      preempted within RCU read-side critical sections by the preceding
      synchronize_sched_expedited() need to be tracked.  This commit therefore
      instead sets up these bitmasks based on the state of the ->blkd_tasks
      lists.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8eb74b2b
  2. 11 Mar, 2015 12 commits
    • Paul E. McKenney's avatar
      rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs · 999c2863
      Paul E. McKenney authored
      Offline CPUs cannot safely invoke trace events, but such CPUs do execute
      within rcu_cpu_notify().  Therefore, this commit removes the trace events
      from rcu_cpu_notify().  These trace events are for utilization, against
      which rcu_cpu_notify() execution time should be negligible.
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      999c2863
    • Paul E. McKenney's avatar
      rcutorture: Enable slow grace-period initializations · b6505dea
      Paul E. McKenney authored
      This commit sets CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y, but leaves the
      default time zero.  This can be overridden by passing the
      "--bootargs rcutree.gp_init_delay=1" argument to kvm.sh.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b6505dea
    • Paul E. McKenney's avatar
      rcu: Provide diagnostic option to slow down grace-period initialization · 37745d28
      Paul E. McKenney authored
      Grace-period initialization normally proceeds quite quickly, so
      that it is very difficult to reproduce races against grace-period
      initialization.  This commit therefore allows grace-period
      initialization to be artificially slowed down, increasing
      race-reproduction probability.  A pair of new Kconfig parameters are
      provided, CONFIG_RCU_TORTURE_TEST_SLOW_INIT to enable the slowdowns, and
      CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY to specify the number of jiffies
      of slowdown to apply.  A boot-time parameter named rcutree.gp_init_delay
      allows boot-time delay to be specified.  By default, no delay will be
      applied even if CONFIG_RCU_TORTURE_TEST_SLOW_INIT is set.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      37745d28
    • Paul E. McKenney's avatar
      rcu: Detect stalls caused by failure to propagate up rcu_node tree · 237a0f21
      Paul E. McKenney authored
      If all CPUs have passed through quiescent states, then stalls might be
      due to starvation of the grace-period kthread or to failure to propagate
      the quiescent states up the rcu_node combining tree.  The current stall
      warning messages do not differentiate, so this commit adds a printout
      of the root rcu_node structure's ->qsmask field.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      237a0f21
    • Paul E. McKenney's avatar
      18c629ea
    • Paul E. McKenney's avatar
      rcu: Simplify sync_rcu_preempt_exp_init() · c8aead6a
      Paul E. McKenney authored
      This commit eliminates a boolean and associated "if" statement by
      rearranging the code.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c8aead6a
    • Paul E. McKenney's avatar
    • Paul E. McKenney's avatar
      rcu: Consolidate offline-CPU callback initialization · b33078b6
      Paul E. McKenney authored
      Currently, both rcu_cleanup_dead_cpu() and rcu_send_cbs_to_orphanage()
      initialize the outgoing CPU's callback list.  However, only
      rcu_cleanup_dead_cpu() invokes rcu_send_cbs_to_orphanage(), and
      it does so unconditionally, which means that only one of these
      initializations is required.  This commit therefore consolidates the
      callback-list initialization with the rest of the callback handling in
      rcu_send_cbs_to_orphanage().
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b33078b6
    • Paul E. McKenney's avatar
      metag: Use common outgoing-CPU-notification code · 490ab882
      Paul E. McKenney authored
      This commit removes the open-coded CPU-offline notification with new
      common code.  This change avoids calling scheduler code using RCU from
      an offline CPU that RCU is ignoring.  This commit is compatible with
      the existing code in not checking for timeout during a prior offline
      for a given CPU.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: <linux-metag@vger.kernel.org>
      490ab882
    • Paul E. McKenney's avatar
      blackfin: Use common outgoing-CPU-notification code · a17b4b74
      Paul E. McKenney authored
      This commit removes the open-coded CPU-offline notification with new
      common code.  This change avoids calling scheduler code using RCU from
      an offline CPU that RCU is ignoring.  This commit is compatible with
      the existing code in not checking for timeout during a prior offline
      for a given CPU.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: <adi-buildroot-devel@lists.sourceforge.net>
      a17b4b74
    • Paul E. McKenney's avatar
      x86: Use common outgoing-CPU-notification code · 2a442c9c
      Paul E. McKenney authored
      This commit removes the open-coded CPU-offline notification with new
      common code.  Among other things, this change avoids calling scheduler
      code using RCU from an offline CPU that RCU is ignoring.  It also allows
      Xen to notice at online time that the CPU did not go offline correctly.
      Note that Xen has the surviving CPU carry out some cleanup operations,
      so if the surviving CPU times out, these cleanup operations might have
      been carried out while the outgoing CPU was still running.  It might
      therefore be unwise to bring this CPU back online, and this commit
      avoids doing so.
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <x86@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: <xen-devel@lists.xenproject.org>
      2a442c9c
    • Paul E. McKenney's avatar
      smpboot: Add common code for notification from dying CPU · 8038dad7
      Paul E. McKenney authored
      RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
      (They -can- use SRCU, but not RCU.)  This means that any use of RCU
      during or after the call to arch_cpu_idle_dead().  Unfortunately,
      commit 2ed53c0d added a complete() call, which will contain RCU
      read-side critical sections if there is a task waiting to be awakened.
      
      Which, as it turns out, there almost never is.  In my qemu/KVM testing,
      the to-be-awakened task is not yet asleep more than 99.5% of the time.
      In current mainline, failure is even harder to reproduce, requiring a
      virtualized environment that delays the outgoing CPU by at least three
      jiffies between the time it exits its stop_machine() task at CPU_DYING
      time and the time it calls arch_cpu_idle_dead() from the idle loop.
      However, this problem really can occur, especially in virtualized
      environments, and therefore really does need to be fixed
      
      This suggests moving back to the polling loop, but using a much shorter
      wait, with gentle exponential backoff instead of the old 100-millisecond
      wait.  Most of the time, the loop will exit without waiting at all,
      and almost all of the remaining uses will wait only five microseconds.
      If the outgoing CPU is preempted, a loop will wait one jiffy, then
      increase the wait by a factor of 11/10ths, rounding up.  As before, there
      is a five-second timeout.
      
      This commit therefore provides common-code infrastructure to do the
      dying-to-surviving CPU handoff in a safe manner.  This code also
      provides an indication at CPU-online of whether the CPU to be onlined
      previously timed out on offline.  The new cpu_check_up_prepare() function
      returns -EBUSY if this CPU previously took more than five seconds to
      go offline, or -EAGAIN if it has not yet managed to go offline.  The
      rationale for -EAGAIN is that it might still be preempted, so an additional
      wait might well find it correctly offlined.  Architecture-specific code
      can decide how to handle these conditions.  Systems in which CPUs take
      themselves completely offline might respond to an -EBUSY return as if
      it was a zero (success) return.  Systems in which the surviving CPU must
      take some action might take it at this time, or might simply mark the
      other CPU as unusable.
      
      Note that architectures that take the easy way out and simply pass the
      -EBUSY and -EAGAIN upwards will change the sysfs API.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      [ paulmck: Fixed state machine for architectures that don't check earlier
        CPU-hotplug results as suggested by James Hogan. ]
      8038dad7
  3. 23 Feb, 2015 3 commits
    • Linus Torvalds's avatar
      Linux 4.0-rc1 · c517d838
      Linus Torvalds authored
      .. after extensive statistical analysis of my G+ polling, I've come to
      the inescapable conclusion that internet polls are bad.
      
      Big surprise.
      
      But "Hurr durr I'ma sheep" trounced "I like online polls" by a 62-to-38%
      margin, in a poll that people weren't even supposed to participate in.
      Who can argue with solid numbers like that? 5,796 votes from people who
      can't even follow the most basic directions?
      
      In contrast, "v4.0" beat out "v3.20" by a slimmer margin of 56-to-44%,
      but with a total of 29,110 votes right now.
      
      Now, arguably, that vote spread is only about 3,200 votes, which is less
      than the almost six thousand votes that the "please ignore" poll got, so
      it could be considered noise.
      
      But hey, I asked, so I'll honor the votes.
      c517d838
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · feaf2229
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Ext4 bug fixes.
      
        We also reserved code points for encryption and read-only images (for
        which the implementation is mostly just the reserved code point for a
        read-only feature :-)"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix indirect punch hole corruption
        ext4: ignore journal checksum on remount; don't fail
        ext4: remove duplicate remount check for JOURNAL_CHECKSUM change
        ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize
        ext4: support read-only images
        ext4: change to use setup_timer() instead of init_timer()
        ext4: reserve codepoints used by the ext4 encryption feature
        jbd2: complain about descriptor block checksum errors
      feaf2229
    • Linus Torvalds's avatar
      Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · be5e6616
      Linus Torvalds authored
      Pull more vfs updates from Al Viro:
       "Assorted stuff from this cycle.  The big ones here are multilayer
        overlayfs from Miklos and beginning of sorting ->d_inode accesses out
        from David"
      
      * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
        autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
        procfs: fix race between symlink removals and traversals
        debugfs: leave freeing a symlink body until inode eviction
        Documentation/filesystems/Locking: ->get_sb() is long gone
        trylock_super(): replacement for grab_super_passive()
        fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
        Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
        VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
        SELinux: Use d_is_positive() rather than testing dentry->d_inode
        Smack: Use d_is_positive() rather than testing dentry->d_inode
        TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
        Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
        Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
        VFS: Split DCACHE_FILE_TYPE into regular and special types
        VFS: Add a fallthrough flag for marking virtual dentries
        VFS: Add a whiteout dentry type
        VFS: Introduce inode-getting helpers for layered/unioned fs environments
        Infiniband: Fix potential NULL d_inode dereference
        posix_acl: fix reference leaks in posix_acl_create
        autofs4: Wrong format for printing dentry
        ...
      be5e6616
  4. 22 Feb, 2015 17 commits