An error occurred fetching the project authors.
  1. 07 Jul, 2016 2 commits
    • Thomas Gleixner's avatar
      timers: Remove the deprecated mod_timer_pinned() API · 177ec0a0
      Thomas Gleixner authored
      We switched all users to initialize the timers as pinned and call
      mod_timer(). Remove the now unused timer API function.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      177ec0a0
    • Thomas Gleixner's avatar
      timers: Make 'pinned' a timer property · e675447b
      Thomas Gleixner authored
      We want to move the timer migration logic from a 'push' to a 'pull' model.
      
      Under the current 'push' model pinned timers are handled via
      a runtime API variant: mod_timer_pinned().
      
      The 'pull' model requires us to store the pinned attribute of a timer
      in the timer_list structure itself, as a new TIMER_PINNED bit in
      timer->flags.
      
      This flag must be set at initialization time and the timer APIs
      recognize the flag.
      
      This patch:
      
       - Implements the new flag and associated new-style initialization
         methods
      
       - makes mod_timer() recognize new-style pinned timers,
      
       - and adds some migration helper facility to allow
         step by step conversion of old-style to new-style
         pinned timers.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e675447b
  2. 20 May, 2016 2 commits
  3. 25 Mar, 2016 1 commit
  4. 17 Mar, 2016 1 commit
    • John Stultz's avatar
      timer: convert timer_slack_ns from unsigned long to u64 · da8b44d5
      John Stultz authored
      This patchset introduces a /proc/<pid>/timerslack_ns interface which
      would allow controlling processes to be able to set the timerslack value
      on other processes in order to save power by avoiding wakeups (Something
      Android currently does via out-of-tree patches).
      
      The first patch tries to fix the internal timer_slack_ns usage which was
      defined as a long, which limits the slack range to ~4 seconds on 32bit
      systems.  It converts it to a u64, which provides the same basically
      unlimited slack (500 years) on both 32bit and 64bit machines.
      
      The second patch introduces the /proc/<pid>/timerslack_ns interface
      which allows the full 64bit slack range for a task to be read or set on
      both 32bit and 64bit machines.
      
      With these two patches, on a 32bit machine, after setting the slack on
      bash to 10 seconds:
      
      $ time sleep 1
      
      real    0m10.747s
      user    0m0.001s
      sys     0m0.005s
      
      The first patch is a little ugly, since I had to chase the slack delta
      arguments through a number of functions converting them to u64s.  Let me
      know if it makes sense to break that up more or not.
      
      Other than that things are fairly straightforward.
      
      This patch (of 2):
      
      The timer_slack_ns value in the task struct is currently a unsigned
      long.  This means that on 32bit applications, the maximum slack is just
      over 4 seconds.  However, on 64bit machines, its much much larger (~500
      years).
      
      This disparity could make application development a little (as well as
      the default_slack) to a u64.  This means both 32bit and 64bit systems
      have the same effective internal slack range.
      
      Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
      the interface as a unsigned long, so we preserve that limitation on
      32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
      long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
      actually larger then what can be stored by an unsigned long.
      
      This patch also modifies hrtimer functions which specified the slack
      delta as a unsigned long.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      da8b44d5
  5. 04 Nov, 2015 1 commit
    • Tejun Heo's avatar
      timers: Use proper base migration in add_timer_on() · 22b886dd
      Tejun Heo authored
      Regardless of the previous CPU a timer was on, add_timer_on()
      currently simply sets timer->flags to the new CPU.  As the caller must
      be seeing the timer as idle, this is locally fine, but the timer
      leaving the old base while unlocked can lead to race conditions as
      follows.
      
      Let's say timer was on cpu 0.
      
        cpu 0					cpu 1
        -----------------------------------------------------------------------------
        del_timer(timer) succeeds
      					del_timer(timer)
      					  lock_timer_base(timer) locks cpu_0_base
        add_timer_on(timer, 1)
          spin_lock(&cpu_1_base->lock)
          timer->flags set to cpu_1_base
          operates on @timer			  operates on @timer
      
      This triggered with mod_delayed_work_on() which contains
      "if (del_timer()) add_timer_on()" sequence eventually leading to the
      following oops.
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
        ...
        Workqueue: wqthrash wqthrash_workfunc [wqthrash]
        task: ffff8800172ca680 ti: ffff8800172d0000 task.ti: ffff8800172d0000
        RIP: 0010:[<ffffffff810ca6e9>]  [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
        ...
        Call Trace:
         [<ffffffff810cb0b4>] del_timer+0x44/0x60
         [<ffffffff8106e836>] try_to_grab_pending+0xb6/0x160
         [<ffffffff8106e913>] mod_delayed_work_on+0x33/0x80
         [<ffffffffa0000081>] wqthrash_workfunc+0x61/0x90 [wqthrash]
         [<ffffffff8106dba8>] process_one_work+0x1e8/0x650
         [<ffffffff8106e05e>] worker_thread+0x4e/0x450
         [<ffffffff810746af>] kthread+0xef/0x110
         [<ffffffff8185980f>] ret_from_fork+0x3f/0x70
      
      Fix it by updating add_timer_on() to perform proper migration as
      __mod_timer() does.
      Reported-and-tested-by: default avatarJeff Layton <jlayton@poochiereds.net>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Chris Worley <chris.worley@primarydata.com>
      Cc: bfields@fieldses.org
      Cc: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: kernel-team@fb.com
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20151029103113.2f893924@tlielax.poochiereds.net
      Link: http://lkml.kernel.org/r/20151104171533.GI5749@mtj.duckdns.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      22b886dd
  6. 11 Oct, 2015 1 commit
  7. 22 Sep, 2015 1 commit
  8. 18 Aug, 2015 1 commit
    • Eric Dumazet's avatar
      timer: Write timer->flags atomically · d0023a14
      Eric Dumazet authored
      lock_timer_base() cannot prevent the following :
      
      CPU1 ( in __mod_timer()
      timer->flags |= TIMER_MIGRATING;
      spin_unlock(&base->lock);
      base = new_base;
      spin_lock(&base->lock);
      // The next line clears TIMER_MIGRATING
      timer->flags &= ~TIMER_BASEMASK;
                                        CPU2 (in lock_timer_base())
                                        see timer base is cpu0 base
                                        spin_lock_irqsave(&base->lock, *flags);
                                        if (timer->flags == tf)
                                             return base; // oops, wrong base
      timer->flags |= base->cpu // too late
      
      We must write timer->flags in one go, otherwise we can fool other cpus.
      
      Fixes: bc7a34b8 ("timer: Reduce timer migration overhead if disabled")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jon Christopherson <jon@jons.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: xen-devel@lists.xen.org
      Cc: david.vrabel@citrix.com
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Link: http://lkml.kernel.org/r/1439831928.32680.11.camel@edumazet-glaptop2.roam.corp.google.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d0023a14
  9. 26 Jun, 2015 1 commit
    • Thomas Gleixner's avatar
      timer: Fix hotplug regression · 24bfcb10
      Thomas Gleixner authored
      The recent timer wheel rework removed the get/put_cpu_var() pair in
      the hotplug migration code, which results in:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: hib.sh/2845
      ...
      [<ffffffff810d4fa3>] timer_cpu_notify+0x53/0x12
      
      That hunk is a leftover from an earlier iteration and went unnoticed
      so far.
      
      Restore the previous code which was obviously correct.
      
      Fixes: 0eeda71b 'timer: Replace timer base by a cpu index'
      Reported-and_tested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      24bfcb10
  10. 19 Jun, 2015 7 commits
    • Thomas Gleixner's avatar
      timer: Minimize nohz off overhead · 683be13a
      Thomas Gleixner authored
      If nohz is disabled on the kernel command line the [hr]timer code
      still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
      pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
      avoid the overhead.
      
      Before:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.73%  hog       [.] main
        15.36%  [kernel]  [k] _raw_spin_lock_irqsave
         9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.61%  [kernel]  [k] lock_timer_base.isra.38
         6.42%  [kernel]  [k] mod_timer
         3.90%  [kernel]  [k] detach_if_pending
         3.76%  [kernel]  [k] del_timer
         2.41%  [kernel]  [k] internal_add_timer
         1.39%  [kernel]  [k] __internal_add_timer
         0.76%  [kernel]  [k] timerfn
      
      We probably should have a cached value for nohz full in the per cpu
      bases as well to avoid the cpumask check. The base cache line is hot
      already, the cpumask not necessarily.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      683be13a
    • Thomas Gleixner's avatar
      timer: Reduce timer migration overhead if disabled · bc7a34b8
      Thomas Gleixner authored
      Eric reported that the timer_migration sysctl is not really nice
      performance wise as it needs to check at every timer insertion whether
      the feature is enabled or not. Further the check does not live in the
      timer code, so we have an extra function call which checks an extra
      cache line to figure out that it is disabled.
      
      We can do better and store that information in the per cpu (hr)timer
      bases. I pondered to use a static key, but that's a nightmare to
      update from the nohz code and the timer base cache line is hot anyway
      when we select a timer base.
      
      The old logic enabled the timer migration unconditionally if
      CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
      line.
      
      With this modification, we start off with migration disabled. The user
      visible sysctl is still set to enabled. If the kernel switches to NOHZ
      migration is enabled, if the user did not disable it via the sysctl
      prior to the switch. If nohz=off is on the kernel command line,
      migration stays disabled no matter what.
      
      Before:
        47.76%  hog       [.] main
        14.84%  [kernel]  [k] _raw_spin_lock_irqsave
         9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.71%  [kernel]  [k] mod_timer
         6.24%  [kernel]  [k] lock_timer_base.isra.38
         3.76%  [kernel]  [k] detach_if_pending
         3.71%  [kernel]  [k] del_timer
         2.50%  [kernel]  [k] internal_add_timer
         1.51%  [kernel]  [k] get_nohz_timer_target
         1.28%  [kernel]  [k] __internal_add_timer
         0.78%  [kernel]  [k] timerfn
         0.48%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      bc7a34b8
    • Thomas Gleixner's avatar
      timer: Stats: Simplify the flags handling · c74441a1
      Thomas Gleixner authored
      Simplify the handling of the flag storage for the timer statistics. No
      intermediate storage anymore. Just hand over the flags field.
      
      I left the printout of 'deferrable' for now because changing this
      would be an ABI update and I have no idea how strong people feel about
      that. OTOH, I wonder whether we should kill the whole timer stats
      stuff because all of that information can be retrieved via ftrace/perf
      as well.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      c74441a1
    • Thomas Gleixner's avatar
      timer: Replace timer base by a cpu index · 0eeda71b
      Thomas Gleixner authored
      Instead of storing a pointer to the per cpu tvec_base we can simply
      cache a CPU index in the timer_list and use that to get hold of the
      correct per cpu tvec_base. This is only used in lock_timer_base() and
      the slightly larger code is peanuts versus the spinlock operation and
      the d-cache foot print of the timer wheel.
      
      Aside of that this allows to get rid of following nuisances:
      
       - boot_tvec_base
      
         That statically allocated 4k bss data is just kept around so the
         timer has a home when it gets statically initialized. It serves no
         other purpose.
      
         With the CPU index we assign the timer to CPU0 at static
         initialization time and therefor can avoid the whole boot_tvec_base
         dance.  That also simplifies the init code, which just can use the
         per cpu base.
      
         Before:
           text	   data	    bss	    dec	    hex	filename
          17491	   9201	   4160	  30852	   7884	../build/kernel/time/timer.o
         After:
           text	   data	    bss	    dec	    hex	filename
          17440	   9193	      0	  26633	   6809	../build/kernel/time/timer.o
      
       - Overloading the base pointer with various flags
      
         The CPU index has enough space to hold the flags (deferrable,
         irqsafe) so we can get rid of the extra masking and bit fiddling
         with the base pointer.
      
      As a benefit we reduce the size of struct timer_list on 64 bit
      machines. 4 - 8 bytes, a size reduction up to 15% per struct timer_list,
      which is a real win as we have tons of them embedded in other structs.
      
      This changes also the newly added deferrable printout of the timer
      start trace point to capture and print all timer->flags, which allows
      us to decode the target cpu of the timer as well.
      
      We might have used bitfields for this, but that would change the
      static initializers and the init function for no value to accomodate
      big endian bitfields.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Badhri Jagan Sridharan <Badhri@google.com>
      Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      0eeda71b
    • Thomas Gleixner's avatar
      timer: Use hlist for the timer wheel hash buckets · 1dabbcec
      Thomas Gleixner authored
      This reduces the size of struct tvec_base by 50% and results in
      slightly smaller code as well.
      
      Before:
         struct tvec_base: size: 8256, cachelines: 129
      
         text	   data	    bss	    dec	    hex	filename
        17698	  13297	   8256	  39251	   9953	../build/kernel/time/timer.o
      
      After:
        struct tvec_base: 4160, cachelines: 65
      
         text	   data	    bss	    dec	    hex	filename
        17491	   9201	   4160	  30852	   7884	../build/kernel/time/timer.o
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      1dabbcec
    • Thomas Gleixner's avatar
      timer: Remove FIFO "guarantee" · 1bd04bf6
      Thomas Gleixner authored
      The FIFO guarantee is only there if two timers are queued into the
      same bucket at the same jiffie on the same cpu:
      
       - The slack value depends on the delta between expiry and enqueue
         time, so the resulting expiry time can be different for timers
         which are queued in different jiffies.
      
       - Timers which are queued into the secondary array end up after a
         later queued timer which was queued into the primary array due to
         cascading.
      
       - Timers can end up on different cpus due to the NOHZ target moving
         around. Obviously there is no guarantee of expiry ordering between
         cpus.
      
      So anything which relies on FIFO behaviour of the timer wheel is
      broken already.
      
      This is a preparatory patch for converting the timer wheel to hlist
      which reduces the memory foot print of the wheel by 50%.
      
      It's a seperate patch so any (unlikely to happen) regression caused by
      this can be identified clearly.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Cc: George Spelvin <linux@horizon.com>
      Link: http://lkml.kernel.org/r/20150526224511.757520403@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      1bd04bf6
    • Thomas Gleixner's avatar
      timers: Sanitize catchup_timer_jiffies() usage · 3bb475a3
      Thomas Gleixner authored
      catchup_timer_jiffies() has been applied blindly to several functions
      without looking for possible better ways to do it.
      
      1) internal_add_timer()
      
         Move the update to base->all_timers before we actually insert the
         timer into the wheel.
      
      2) detach_if_pending()
      
         Again the update to base->all_timers allows us to explicitely do
         the timer_jiffies update in place, if this was the last timer which
         got removed.
      
      3) __run_timers()
      
         We only check on entry, which is silly, because base->timer_jiffies
         can be behind - especially on NOHZ kernels - and if there is a
         single deferrable timer somewhere between base->timer_jiffies and
         jiffies we expire it and then loop until base->timer_jiffies ==
         jiffies.
      
         Move it into the loop.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224511.662994644@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      3bb475a3
  11. 22 May, 2015 1 commit
  12. 05 May, 2015 1 commit
  13. 22 Apr, 2015 4 commits
  14. 02 Apr, 2015 3 commits
  15. 04 Nov, 2014 1 commit
  16. 13 Sep, 2014 1 commit
    • Frederic Weisbecker's avatar
      irq_work: Force raised irq work to run on irq work interrupt · 76a33061
      Frederic Weisbecker authored
      The nohz full kick, which restarts the tick when any resource depend
      on it, can't be executed anywhere given the operation it does on timers.
      If it is called from the scheduler or timers code, chances are that
      we run into a deadlock.
      
      This is why we run the nohz full kick from an irq work. That way we make
      sure that the kick runs on a virgin context.
      
      However if that's the case when irq work runs in its own dedicated
      self-ipi, things are different for the big bunch of archs that don't
      support the self triggered way. In order to support them, irq works are
      also handled by the timer interrupt as fallback.
      
      Now when irq works run on the timer interrupt, the context isn't blank.
      More precisely, they can run in the context of the hrtimer that runs the
      tick. But the nohz kick cancels and restarts this hrtimer and cancelling
      an hrtimer from itself isn't allowed. This is why we run in an endless
      loop:
      
      	Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
      	CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
      	Workqueue: btrfs-endio-write normal_work_helper [btrfs]
      	 ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
      	 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
      	 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
      	Call Trace:
      	 <NMI[<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
      	 [<ffffffff8a7ef928>] panic+0xd4/0x207
      	 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
      	 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
      	 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
      	 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
      	 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
      	 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
      	 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
      	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
      	 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
      	 [<ffffffff8a008268>] do_nmi+0xb8/0x100
      	 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
      	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
      	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
      	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
      	 <<EOE><IRQ[<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
      	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
      	 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
      	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
      	 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
      	 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
      	 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
      	 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
      	 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
      	 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
      	 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
      	 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
      	 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
      	 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
      	 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
      	 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
      	 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
      	 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
      	 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
      	 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
      	 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80
      
      To fix this we force non-lazy irq works to run on irq work self-IPIs
      when available. That ability of the arch to trigger irq work self IPIs
      is available with arch_irq_work_has_interrupt().
      Reported-by: default avatarCatalin Iacob <iacobcatalin@gmail.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      76a33061
  17. 26 Aug, 2014 1 commit
  18. 23 Jun, 2014 3 commits
    • Viresh Kumar's avatar
      timer: Kick dynticks targets on mod_timer*() calls · 9f6d9baa
      Viresh Kumar authored
      When a timer is enqueued or modified on a dynticks target, that CPU
      must re-evaluate the next tick to service that timer.
      
      The tick re-evaluation is performed by an IPI kick on the target.
      Now while we correctly call wake_up_nohz_cpu() from add_timer_on(), the
      mod_timer*() API family doesn't support so well dynticks targets.
      
      The reason for this is likely that __mod_timer() isn't supposed to
      select an idle target for a timer, unless that target is the current
      CPU, in which case a dynticks idle kick isn't actually needed.
      
      But there is a small race window lurking behind that assumption: the
      elected target has all the time to turn dynticks idle between the call
      to get_nohz_timer_target() and the locking of its base. Hence a risk
      that we enqueue a timer on a dynticks idle destination without kicking
      it. As a result, the timer might be serviced too late in the future.
      
      Also a target elected by __mod_timer() can be in full dynticks mode
      and thus require to be kicked as well. And unlike idle dynticks, this
      concern both local and remote targets.
      
      To fix this whole issue, lets centralize the dynticks kick to
      internal_add_timer() so that it is well handled for all sort of timer
      enqueue. Even timer migration is concerned so that a full dynticks target
      is correctly kicked as needed when timers are migrating to it.
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1403393357-2070-3-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      9f6d9baa
    • Viresh Kumar's avatar
      timer: Store cpu-number in struct tvec_base · d6f93829
      Viresh Kumar authored
      Timers are serviced by the tick. But when a timer is enqueued on a
      dynticks target, we need to kick it in order to make it reconsider the
      next tick to schedule to correctly handle the timer's expiring time.
      
      Now while this kick is correctly performed for add_timer_on(), the
      mod_timer*() family has been a bit neglected.
      
      To prepare for fixing this, we need internal_add_timer() to be able to
      resolve the CPU target associated to a timer's object 'base' so that the
      kick can be centralized there.
      
      This can't be passed as an argument as not all the callers know the CPU
      number of a timer's base. So lets store it in the struct tvec_base to
      resolve the CPU without much overhead. It is set once for good at every
      CPU's first boot.
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1403393357-2070-2-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      d6f93829
    • Thomas Gleixner's avatar
      time/timers: Move all time(r) related files into kernel/time · 5cee9645
      Thomas Gleixner authored
      Except for Kconfig.HZ. That needs a separate treatment.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      5cee9645
  19. 30 Apr, 2014 1 commit
  20. 20 Mar, 2014 2 commits
  21. 04 Mar, 2014 2 commits
  22. 25 Feb, 2014 2 commits