1. 23 Dec, 2013 4 commits
    • tick/timekeeping: Call update_wall_time outside the jiffies lock · 47a1b796
      John Stultz authored
      Since the xtime lock was split into the timekeeping lock and
      the jiffies lock, we no longer need to call update_wall_time()
      while holding the jiffies lock.
      
      Thus, this patch splits update_wall_time() out from do_timer().
      
      This allows us to get away from calling clock_was_set_delayed()
      in update_wall_time() and instead use the standard clock_was_set()
      call that previously would deadlock, as it causes the jiffies lock
      to be acquired.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • timekeeping: Avoid possible deadlock from clock_was_set_delayed · 6fdda9a9
      John Stultz authored
      As part of normal operations, the hrtimer subsystem frequently calls
      into the timekeeping code, creating a locking order of
        hrtimer locks -> timekeeping locks
      
      clock_was_set_delayed() was supposed to allow us to avoid deadlocks
      between the timekeeping and hrtimer subsystems, so that we could
      notify the hrtimer subsystem that the time had changed while holding
      the timekeeping locks. This was done by scheduling delayed work
      that would run later once we were out of the timekeeping code.
      
      But unfortunately the lock chains are complex enough that in
      scheduling delayed work, we end up eventually trying to grab
      an hrtimer lock.
      
      Sasha Levin noticed this in testing when the new seqlock lockdep
      enablement triggered the following (somewhat abbreviated) message:
      
      [  251.100221] ======================================================
      [  251.100221] [ INFO: possible circular locking dependency detected ]
      [  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
      [  251.101967] -------------------------------------------------------
      [  251.101967] kworker/10:1/4506 is trying to acquire lock:
      [  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
      [  251.101967]
      [  251.101967] but task is already holding lock:
      [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] which lock already depends on the new lock.
      [  251.101967]
      [  251.101967]
      [  251.101967] the existing dependency chain (in reverse order) is:
      [  251.101967]
      -> #5 (hrtimer_bases.lock#11){-.-...}:
      [snipped]
      -> #4 (&rt_b->rt_runtime_lock){-.-...}:
      [snipped]
      -> #3 (&rq->lock){-.-.-.}:
      [snipped]
      -> #2 (&p->pi_lock){-.-.-.}:
      [snipped]
      -> #1 (&(&pool->lock)->rlock){-.-...}:
      [  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
      [  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
      [  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
      [  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
      [  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
      [  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
      [  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
      [  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
      [  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
      [  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
      [  251.101967]
      -> #0 (timekeeper_seq){----..}:
      [snipped]
      [  251.101967] other info that might help us debug this:
      [  251.101967]
      [  251.101967] Chain exists of:
        timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
      
      [  251.101967]  Possible unsafe locking scenario:
      [  251.101967]
      [  251.101967]        CPU0                    CPU1
      [  251.101967]        ----                    ----
      [  251.101967]   lock(hrtimer_bases.lock#11);
      [  251.101967]                                lock(&rt_b->rt_runtime_lock);
      [  251.101967]                                lock(hrtimer_bases.lock#11);
      [  251.101967]   lock(timekeeper_seq);
      [  251.101967]
      [  251.101967]  *** DEADLOCK ***
      [  251.101967]
      [  251.101967] 3 locks held by kworker/10:1/4506:
      [  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] stack backtrace:
      [  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
      [  251.101967] Workqueue: events clock_was_set_work
      
      So the best solution is to avoid calling clock_was_set_delayed() while
      holding the timekeeping lock, and instead use a flag variable to
      decide whether we should call clock_was_set() once we've released the locks.
      
      This works for the case here, where do_adjtimex() was the deadlock
      trigger point. Unfortunately, in update_wall_time() we still hold
      the jiffies lock, which would deadlock with the IPI triggered by
      clock_was_set(), preventing us from calling it even after we drop the
      timekeeping lock. So instead call clock_was_set_delayed() at that point.
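      
      A minimal userspace sketch of the "decide under the lock, notify
      after dropping it" pattern described above. Every name here
      (tk_lock(), notify_clock_was_set(), adjust_time()) is a hypothetical
      stand-in, not the kernel's code, and the "lock" only records whether
      it is held so the notifier can assert it never runs under it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the timekeeping lock: it only records
 * whether it is held, so misuse trips an assert instead of deadlocking. */
static bool tk_lock_held;
static int notifications;

static void tk_lock(void)   { assert(!tk_lock_held); tk_lock_held = true; }
static void tk_unlock(void) { assert(tk_lock_held);  tk_lock_held = false; }

/* Stand-in for clock_was_set(): it takes other locks internally,
 * so it must never be called with the timekeeping lock held. */
static void notify_clock_was_set(void)
{
    assert(!tk_lock_held);
    notifications++;
}

/* Stand-in for do_adjtimex(): record the decision under the lock,
 * act on it only after the lock has been released. */
static int adjust_time(long delta_ns, long *offset_ns)
{
    bool clock_set = false;

    tk_lock();
    if (delta_ns != 0) {
        *offset_ns += delta_ns;
        clock_set = true;          /* remember, but do not notify yet */
    }
    tk_unlock();

    if (clock_set)
        notify_clock_was_set();    /* safe: lock already dropped */
    return 0;
}
```

      The flag is a plain local variable, so nothing about the notification
      leaks into the locked region.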
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • timekeeping: Fix potential lost pv notification of time change · 5258d3f2
      John Stultz authored
      In 780427f0 (Indicate that clock was set in the pvclock
      gtod notifier), logic was added to pass a CLOCK_WAS_SET
      notification to the pvclock notifier chain.
      
      While that patch added an action flag returned from
      accumulate_nsecs_to_secs(), it only uses the returned value
      in one location, and not in the logarithmic accumulation.
      
      This means if a leap second triggered during the logarithmic
      accumulation (which is most likely where it would happen),
      the notification that the clock was set would not make it to
      the pv notifiers.
      
      This patch extends logarithmic_accumulation() to pass down
      that action flag so proper notification will occur.
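      
      A simplified sketch of the fix, assuming nanosecond "cycles" and a
      toy leap-second rule (both hypothetical; the struct and signatures
      are not the kernel's). Each chunk's result from
      accumulate_nsecs_to_secs() is OR-ed into a flag the caller passes
      down, so a clock step inside the logarithmic loop is not dropped:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, heavily simplified timekeeper. */
struct tk_sketch {
    uint64_t xtime_sec;
    uint64_t xtime_nsec;
    uint64_t leap_at_sec;   /* toy rule: repeat this second once; 0 = none */
};

/* Moves whole seconds out of xtime_nsec. Returns 1 if the clock was
 * stepped (here: the toy leap second was inserted), else 0. */
static unsigned int accumulate_nsecs_to_secs(struct tk_sketch *tk)
{
    unsigned int clock_set = 0;

    while (tk->xtime_nsec >= NSEC_PER_SEC) {
        tk->xtime_nsec -= NSEC_PER_SEC;
        tk->xtime_sec++;
        if (tk->leap_at_sec && tk->xtime_sec == tk->leap_at_sec) {
            tk->xtime_sec--;        /* insert a leap second */
            tk->leap_at_sec = 0;
            clock_set = 1;
        }
    }
    return clock_set;
}

/* Accumulates one chunk of (interval << shift) if it fits, OR-ing the
 * step flag into *clock_set so the caller sees it. */
static uint64_t logarithmic_accumulation(struct tk_sketch *tk, uint64_t offset,
                                         uint64_t interval, int shift,
                                         unsigned int *clock_set)
{
    uint64_t chunk = interval << shift;

    if (offset < chunk)
        return offset;
    offset -= chunk;
    tk->xtime_nsec += chunk;
    *clock_set |= accumulate_nsecs_to_secs(tk);
    return offset;
}

/* Toy update_wall_time(): returns the combined clock_set flag. Any
 * remainder smaller than one interval is left unaccumulated here. */
static unsigned int update_wall_time_sketch(struct tk_sketch *tk,
                                            uint64_t offset, uint64_t interval)
{
    unsigned int clock_set = 0;
    int shift = 0;

    assert(interval > 0);
    while (shift < 32 && (interval << (shift + 1)) <= offset)
        shift++;                    /* largest chunk that fits */

    while (offset >= interval) {
        offset = logarithmic_accumulation(tk, offset, interval, shift,
                                          &clock_set);
        if (shift > 0 && offset < (interval << shift))
            shift--;
    }
    return clock_set;
}
```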
      
      This patch also renames the variable action -> clock_set
      per Ingo's suggestion.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: <xen-devel@lists.xen.org>
      Cc: stable <stable@vger.kernel.org> #3.11+
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • timekeeping: Fix lost updates to tai adjustment · f55c0760
      John Stultz authored
      Since 48cdc135 (Implement a shadow timekeeper), we have to
      call timekeeping_update() after any adjustment to the timekeeping
      structure in order to make sure that any adjustments to the structure
      persist.
      
      Unfortunately, the updates to the tai offset via adjtimex do not
      trigger this update, causing adjustments to the tai offset to be
      made and then over-written by the previous value at the next
      update_wall_time() call.
      
      This patch resolves the issue by calling timekeeping_update()
      right after setting the tai offset.
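      
      A toy model of why the write was lost, assuming (with hypothetical,
      simplified names) that the periodic update advances a shadow copy and
      then publishes it over the live timekeeper, while other writers modify
      the live copy and must mirror it back into the shadow:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, minimal timekeeper with a shadow copy. */
struct tk_sketch {
    long long xtime_sec;
    int tai_offset;
};

static struct tk_sketch real_tk;    /* what readers see */
static struct tk_sketch shadow_tk;  /* what the periodic update works on */

/* Stand-in for timekeeping_update() with mirroring: copy the live
 * timekeeper into the shadow so the next tick starts from it. */
static void timekeeping_update_sketch(void)
{
    memcpy(&shadow_tk, &real_tk, sizeof(shadow_tk));
}

/* Stand-in for update_wall_time(): advance the shadow, then publish it. */
static void update_wall_time_sketch(void)
{
    shadow_tk.xtime_sec++;
    memcpy(&real_tk, &shadow_tk, sizeof(real_tk));
}

/* Sets the TAI offset; 'fixed' selects whether the shadow is synced. */
static void set_tai_offset_sketch(int tai, int fixed)
{
    real_tk.tai_offset = tai;
    if (fixed)
        timekeeping_update_sketch();
}
```

      Without the sync, the next tick copies the stale shadow value over
      the new offset.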
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Signed-off-by: John Stultz <john.stultz@linaro.org>
  2. 10 Dec, 2013 1 commit
    • Merge branch 'timers/posix-timers-for-tip-v2' of... · 0e6601ee
      Ingo Molnar authored
      Merge branch 'timers/posix-timers-for-tip-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core
      
      Pull posix cpu timer changes for v3.14 from Frederic Weisbecker:
      
       * Remove dying thread/process timers caching that was complicating the code
         for no significant win.
      
       * Remove early task reference release on dying timer sample read. Again it was
         not worth the code complication. The other timer's resources aren't released
         until timer_delete() is called anyway (or when the whole process dies).
      
       * Remove leftover arguments in reaped target cleanup
      
       * Consolidate some timer sampling code
      
       * Remove use of tasklist lock
      
       * Robustify sighand locking against exec and exit by using the safer
         lock_task_sighand() API instead of sighand raw locking.
      
       * Convert some unnecessary BUG_ON() to WARN_ON()
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 09 Dec, 2013 10 commits
    • posix-timers: Convert abuses of BUG_ON to WARN_ON · 531f64fd
      Frederic Weisbecker authored
      The posix cpu timers code makes heavy use of BUG_ON(),
      but none of these cases concern fatal issues that require
      stopping the machine. So let's just warn the user when
      some internal state slips out of our hands.
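      
      A userspace sketch of the difference, using hypothetical stand-ins
      for the kernel macros: BUG_ON() stops everything, while WARN_ON()
      reports and evaluates to the condition, so the caller can bail out
      gracefully. sample_timer_sketch() is illustrative only:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for BUG_ON(): report, then abort. */
#define BUG_ON_SKETCH(cond)                                         \
    do {                                                            \
        if (cond) {                                                 \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);  \
            abort();                                                \
        }                                                           \
    } while (0)

/* Hypothetical stand-in for WARN_ON(): report, then return the
 * condition so it can be used in an if(). */
static int warn_on_sketch(int cond, const char *file, int line)
{
    if (cond)
        fprintf(stderr, "WARNING at %s:%d\n", file, line);
    return cond;
}
#define WARN_ON_SKETCH(cond) warn_on_sketch(!!(cond), __FILE__, __LINE__)

/* An unexpected NULL target is reported and turned into an error
 * instead of halting the machine. */
static int sample_timer_sketch(const int *target, int *out)
{
    if (WARN_ON_SKETCH(target == NULL))
        return -1;
    *out = *target;
    return 0;
}
```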
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Remove remaining uses of tasklist_lock · e73d84e3
      Frederic Weisbecker authored
      The remaining uses of tasklist_lock were mostly about synchronizing
      against sighand modifications, getting coherent and safe group samples
      and also thread/process wide timers list handling.
      
      All of this is already safely synchronizable with the target's
      sighand lock. Let's use it on these places instead.
      
      Also update the comments about locking.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Use sighand lock instead of tasklist_lock on timer deletion · 3d7a1427
      Frederic Weisbecker authored
      Timer deletion doesn't need the tasklist lock.
      We need to protect against:
      
      * concurrent access to the lists p->cputime_expires and
        p->sighand->cputime_expires
      
      * task reaping that may also delete the timer list entry
      
      * timer firing
      
      We already hold the timer lock which protects us against concurrent
      timer firing.
      
      The rest only needs the target's sighand to be locked.
      So hold it and drop the use of tasklist_lock there.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Use sighand lock instead of tasklist_lock for task clock sample · 50875788
      Frederic Weisbecker authored
      There is no need for the tasklist_lock just to take a process
      wide clock sample.
      
      All we need is to get a coherent sample that doesn't race with
      exit() and exec():
      
      * exit() may be concurrently reaping a task and flushing its time
      
      * sighand is unstable under exit() and exec(), and the latter can
        also change the group leader
      
      To protect against these, locking the target's sighand is enough.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Consolidate posix_cpu_clock_get() · 33ab0fec
      Frederic Weisbecker authored
      Consolidate the clock sampling common code used for both local
      and remote targets.
      
      Note that this introduces a tiny user ABI change: if a
      PID is passed to clock_gettime() along with the clockid,
      we used to forbid a process wide clock sample when that
      PID doesn't belong to a group leader. Now after this patch
      we allow process wide clock samples if that PID belongs to
      the current task, even if the current task is not the
      group leader.
      
      But local process wide clock samples are allowed if PID == 0
      (current task) even if the current task is not the group leader.
      So in the end this should be no big deal, as this actually harmonizes
      the behaviour when the remote sample is actually a local one.
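      
      For context, a userspace sketch of how a PID travels alongside a
      CPU-time clockid: POSIX clock_getcpuclockid() returns a clockid for
      a given process, which is then passed to clock_gettime(). The helper
      read_process_cputime() is hypothetical; note that
      clock_getcpuclockid() reports failure by returning an error number
      rather than setting errno:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Reads the process-wide CPU clock of 'pid' through its clockid.
 * Returns 0 on success, an error number from clock_getcpuclockid(),
 * or -1 if clock_gettime() fails. */
static int read_process_cputime(pid_t pid, struct timespec *ts)
{
    clockid_t cid;
    int err = clock_getcpuclockid(pid, &cid);

    if (err != 0)
        return err;
    return clock_gettime(cid, ts) == 0 ? 0 : -1;
}
```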
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Remove useless clock sample on timers cleanup · af82eb3c
      Frederic Weisbecker authored
      Commit a0b2062b ("posix_timers: fix racy timer delta caching on task exit")
      forgot to remove the arguments used for timer caching.
      
      Fix this leftover.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Remove dead task special case · a3222f88
      Frederic Weisbecker authored
      Now that we've removed all the optimizations that could
      result in NULL timer targets, we can remove all the
      associated special case handling.
      
      Also add some warnings on NULL targets to spot any possible
      leftover.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Cleanup reaped target handling · e26d70d2
      Frederic Weisbecker authored
      When a timer's target is seen to be buried, for example on calls
      to timer_gettime(), the posix cpu timers code behaves a bit
      like a garbage collector and releases the reference to the
      task early.
      
      Then again, this optimization complicates the code for little
      value: it's up to the user to release the timer and its associated
      resources by calling timer_delete() after it buries the target
      tasks.
      
      Remove this to simplify the code.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Remove dead process posix cpu timers caching · d430b917
      Frederic Weisbecker authored
      Now that we removed dead thread posix cpu timers caching,
      let's remove the dead process wide version. This caching
      is similar to the per thread version but it should be even
      more rare:
      
      * If the process is dead, we are not reading its timers
      status from a thread belonging to its group since they
      are all dead. So this caching only concerns remote process
      timer reads. Now posix cpu timers using itimers or timer_settime()
      can't do remote process timers anyway so it's not even clear if there
      is actually a user for this caching.
      
      * Unlike per thread timers caching, this only applies to
      zombie targets. Buried targets' process wide timers return
      0 values. But then again, timer_gettime() can't read remote
      process timers, so if the process is dead, there can't be
      any reader left anyway.
      
      Then again this caching seems to complicate the code for
      corner cases that are probably not worth it. So let's get
      rid of it.
      
      Also remove the sample snapshot on dying process timer
      that is now useless, as suggested by Kosaki.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: Remove dead thread posix cpu timers caching · 724a3713
      Frederic Weisbecker authored
      When a task is exiting or has exited, its posix cpu timers
      don't tick anymore and won't elapse further. It's too late
      for them to expire.
      
      So any further call to timer_gettime() on these timers will
      return the same remaining expiry time.
      
      The current code optimizes this by caching the remaining delta
      and storing it where we used to save the absolute expiration time.
      This way, the future calls to timer_gettime() won't need to
      compute the difference between the absolute expiration time and
      the current time anymore.
      
      Now this optimization doesn't seem to bring much value. Computing
      the timer remaining delta is not very costly. Fetching the timer
      value OTOH can be costly in two ways:
      
      * CPUCLOCK_SCHED read requires locking the target's rq. But some
      optimizations are on the way to make task_sched_runtime() not hold
      the rq lock of a non-running target.
      
      * CPUCLOCK_VIRT/CPUCLOCK_PROF read simply consists of fetching
      current->utime/current->stime except when the system uses full
      dynticks cputime accounting. The latter requires a per task lock
      in order to correctly compute user and system time. But once the
      target is dead, this lock shouldn't be contended anyway.
      
      All in all, this caching doesn't seem to be justified.
      Given that it complicates the code significantly for
      few wins, let's remove it on single thread timers.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
  4. 04 Dec, 2013 1 commit
    • Merge branch 'timers/core-v2' of... · a934a56e
      Ingo Molnar authored
      Merge branch 'timers/core-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core
      
      Pull dynticks updates from Frederic Weisbecker:
      
        * Fix a bug where posix cpu timers requeued due to interval got ignored on full
          dynticks CPUs (not a regression though as it only impacts full dynticks and the
          bug is there since we merged full dynticks).
      
        * Optimizations and cleanups on the use of per CPU APIs to improve code readability,
          performance and debuggability in the nohz subsystem;
      
        * Optimize posix cpu timers by sparing the stub workqueue queueing when full dynticks is off
      
        * Rename some functions to extend with *_this_cpu() suffix for clarity
      
        * Refine the naming of some context tracking subsystem state accessors
      
        * Trivial spelling fix by Paul Gortmaker
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 02 Dec, 2013 20 commits
    • Merge branch 'leds-fixes-for-3.13' of... · dea4f48a
      Linus Torvalds authored
      Merge branch 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
      
      Pull LED subsystem bugfix from Bryan Wu.
      
      * 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
        leds: pwm: Fix for deferred probe in DT booted mode
    • leds: pwm: Fix for deferred probe in DT booted mode · aa1a6d6d
      Peter Ujfalusi authored
      We need to make sure that the error code from devm_of_pwm_get() is the one
      the module returns in case of failure.
      Restructure the code to make this possible for the DT booted case.
      With this patch the driver can ask for deferred probing when the board is
      booted with DT.
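      
      A userspace sketch of why the error code matters, with hypothetical
      stand-ins for the PWM lookup and the probe function (not the
      driver's actual code). Masking the lookup's -EPROBE_DEFER with a
      generic code means the driver core never retries the probe:

```c
#include <assert.h>
#include <errno.h>

/* EPROBE_DEFER is kernel-internal (517 in include/linux/errno.h);
 * defined here only so the sketch builds in userspace. */
#define EPROBE_DEFER 517

/* Hypothetical: becomes nonzero once the PWM provider has probed. */
static int pwm_provider_ready;

/* Stand-in for devm_of_pwm_get(): fails with -EPROBE_DEFER until the
 * provider is available. */
static int get_pwm_sketch(int *pwm)
{
    if (!pwm_provider_ready)
        return -EPROBE_DEFER;
    *pwm = 1;
    return 0;
}

/* Problematic shape: the lookup error is replaced by a generic code. */
static int led_probe_masking_sketch(void)
{
    int pwm;
    if (get_pwm_sketch(&pwm) != 0)
        return -ENODEV;
    return 0;
}

/* Fixed shape: hand the lookup's own error code back to the caller. */
static int led_probe_propagating_sketch(void)
{
    int pwm;
    int ret = get_pwm_sketch(&pwm);
    if (ret)
        return ret;
    return 0;
}
```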
      This fixes, for example, the omap4-sdp board's keyboard backlight LED.
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Bryan Wu <cooloney@gmail.com>
    • uio: we cannot mmap unaligned page contents · b6550287
      Linus Torvalds authored
      In commit 7314e613 ("Fix a few incorrectly checked
      [io_]remap_pfn_range() calls") the uio driver started more properly
      checking the passed-in user mapping arguments against the size of the
      actual uio driver data.
      
      That in turn exposed that some driver authors apparently didn't realize
      that mmap can only work at page granularity, and had tried to use it
      with smaller mappings, with the new size check catching that out.
      
      So since it's not just the user mmap() arguments that can be confused,
      make the uio mmap code also verify that the uio driver has the memory
      allocated at page boundaries in order for mmap to work.  If the device
      memory isn't properly aligned, we return
      
        [ENODEV]
          The fildes argument refers to a file whose type is not supported by mmap().
      
      as per the open group documentation on mmap.
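      
      A sketch of the two checks in userspace C, assuming 4 KiB pages.
      uio_mmap_check_sketch() and its parameters are hypothetical stand-ins
      for the driver region's address and size and the length of the
      user's mapping, not the uio driver's actual code:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uint64_t)4096)          /* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Returns 0 if the region can be mapped, else a negative errno. */
static int uio_mmap_check_sketch(uint64_t mem_addr, uint64_t mem_size,
                                 uint64_t map_len)
{
    /* mmap works in whole pages, so the region must start on a page. */
    if (mem_addr & ~SKETCH_PAGE_MASK)
        return -ENODEV;
    /* The user may not map more than the driver actually provides. */
    if (map_len > mem_size)
        return -EINVAL;
    return 0;
}
```

      A sub-page region fails the size check even when it is aligned,
      since any user mapping covers at least one full page.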
      Reported-by: Holger Brunck <holger.brunck@keymile.com>
      Acked-by: Greg KH <gregkh@linuxfoundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • posix-timers: Fix full dynticks CPUs kick on timer rescheduling · c925077c
      Frederic Weisbecker authored
      A posix CPU timer can be rearmed while it is firing or after it is
      notified with a signal. This can happen for example with timers that
      were set with a non zero interval in timer_settime().
      
      This rearming can happen in two places:
      
      1) On timer firing time, which happens on the target's tick. If the timer
      can't trigger a signal because it is ignored, it reschedules itself
      to honour the timer interval.
      
      2) On signal handling from the timer's notification target. This one
      can be a different task than the timer's target itself. Once the
      signal is notified, the notification target rearms the timer, again
      to honour the timer interval.
      
      When a timer is rearmed, we need to notify the full dynticks CPUs
      such that they restart their tick in case they are running tasks that
      may have a share in elapsing this timer.
      
      Now the 1st case above handles full dynticks CPUs with a call to
      posix_cpu_timer_kick_nohz() from the posix cpu timer firing code. But
      the second case ignores the fact that some CPUs may run non-idle tasks
      with their tick off. As a result, when a timer is rescheduled after its signal
      notification, the full dynticks CPUs may completely ignore it and not
      tick on the timer as expected.
      
      This patch fixes this bug by handling both cases in one. All we need
      is to move the kick to the rearming common code in posix_cpu_timer_schedule().
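      
      A minimal sketch of the shape of this fix, with hypothetical names:
      both rearm paths funnel through one common function (the role the
      commit gives posix_cpu_timer_schedule()), and the kick lives there,
      so neither caller can forget it. A counter stands in for the IPI:

```c
#include <assert.h>

/* Hypothetical counter standing in for the IPI that restarts the tick
 * on full dynticks CPUs. */
static int nohz_kicks;

static void kick_nohz_full_cpus_sketch(void) { nohz_kicks++; }

/* Common rearm path: advance the expiry and always kick. */
static void rearm_timer_sketch(unsigned long long *expires,
                               unsigned long long interval)
{
    assert(interval > 0);
    *expires += interval;
    kick_nohz_full_cpus_sketch();
}

/* Case 1: the timer fires on the target's tick, but its signal is ignored. */
static void on_tick_signal_ignored(unsigned long long *expires,
                                   unsigned long long interval)
{
    rearm_timer_sketch(expires, interval);
}

/* Case 2: the notification target rearms after handling the signal. */
static void on_signal_handled(unsigned long long *expires,
                              unsigned long long interval)
{
    rearm_timer_sketch(expires, interval);
}
```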
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Olivier Langlois <olivier@olivierlanglois.net>
    • posix-timers: Spare workqueue if there is no full dynticks CPU to kick · d4283c65
      Frederic Weisbecker authored
      After a posix cpu timer is set, a workqueue is scheduled in order to
      kick the full dynticks CPUs and let them restart their tick if
      necessary in case the task they are running is concerned by the
      new timer.
      
      This kick is implemented by way of IPIs, which require interrupts
      to be enabled, hence the need for a workqueue to raise them because
      the posix cpu timer set path has interrupts disabled.
      
      Now if there is no full dynticks CPU on the system, the workqueue is
      still scheduled but it simply won't send any IPI and return immediately.
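      
      The guard this calls for can be sketched as follows. All names are
      hypothetical; the boolean is assumed here to play the role of the
      kernel's tick_nohz_full_enabled() check, and a counter stands in
      for queueing the IPI-raising work:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: whether any CPU runs in full dynticks mode
 * (fixed at boot), and a counter for queued kick work. */
static bool nohz_full_enabled_sketch;
static int kick_work_queued;

static void queue_kick_work_sketch(void) { kick_work_queued++; }

/* After arming a timer, only queue the work when some full dynticks
 * CPU could actually need the kick. */
static void after_timer_set_sketch(void)
{
    if (nohz_full_enabled_sketch)
        queue_kick_work_sketch();
}
```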
      
      So let's spare that workqueue when it is not needed.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • context_tracking: Rename context_tracking_active() to context_tracking_cpu_is_enabled() · d0df09eb
      Frederic Weisbecker authored
      We currently have a confusing pair of API names: the existing
      context_tracking_active() and context_tracking_is_enabled().
      
      Let's keep the latter one, context_tracking_is_enabled(), for the global
      context tracking state check and use context_tracking_cpu_is_enabled()
      for the local state check.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • context_tracking: Wrap static key check into more intuitive function name · 58135f57
      Frederic Weisbecker authored
      Use a function with a meaningful name to check the global context
      tracking state. static_key_false() is a bit confusing for reviewers.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • trivial: fix spelling in CONTEXT_TRACKING_FORCE help text · 99c8b1ea
      Paul Gortmaker authored
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • nohz: Convert a few places to use local per cpu accesses · e8fcaa5c
      Frederic Weisbecker authored
      A few functions use remote per CPU access APIs when they
      deal with local values.
      
      Just do the right conversion to improve performance, code
      readability and debug checks.
      
      While at it, let's extend some of these function names with the *_this_cpu()
      suffix in order to display their purpose more clearly.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a45299e7
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       - Correction of fuzzy and fragile IRQ_RETVAL macro
       - IRQ related resume fix affecting only XEN
       - ARM/GIC fix for chained GIC controllers
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip: Gic: fix boot for chained gics
        irq: Enable all irqs unconditionally in irq_resume
        genirq: Correct fuzzy and fragile IRQ_RETVAL() definition
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a0b57ca3
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Various smaller fixlets, all over the place"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/doc: Fix generation of device-drivers
        sched: Expose preempt_schedule_irq()
        sched: Fix a trivial typo in comments
        sched: Remove unused variable in 'struct sched_domain'
        sched: Avoid NULL dereference on sd_busy
        sched: Check sched_domain before computing group power
        MAINTAINERS: Update file patterns in the lockdep and scheduler entries
    • Linus Torvalds
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e321ae4c
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc kernel and tooling fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tools lib traceevent: Fix conversion of pointer to integer of different size
        perf/trace: Properly use u64 to hold event_id
        perf: Remove fragile swevent hlist optimization
        ftrace, perf: Avoid infinite event generation loop
        tools lib traceevent: Fix use of multiple options in processing field
        perf header: Fix possible memory leaks in process_group_desc()
        perf header: Fix bogus group name
        perf tools: Tag thread comm as overriden
    • Linus Torvalds
      Merge tag 'stable/for-linus-3.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · bcc2f9b7
      Linus Torvalds authored
      Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
       "Fixes to patches that went in this merge window along with a latent
        bug:
         - Fix lazy flushing in case m2p override fails.
         - Fix module compile issues with ARM/Xen
         - Add missing call to DMA map page for Xen SWIOTLB for ARM"
      
      * tag 'stable/for-linus-3.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/gnttab: leave lazy MMU mode in the case of a m2p override failure
        xen/arm: p2m_init and p2m_lock should be static
        arm/xen: Export phys_to_mach to fix Xen module link errors
        swiotlb-xen: add missing xen_dma_map_page call
    • Linus Torvalds
      Merge tag 'spi-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · aeac8103
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A smattering of driver specific fixes here, including a bunch for a
        long standing common pattern in the error handling paths, and a fix
        for an embarrassing thinko in the new devm master registration code"
      
      * tag 'spi-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi/pxa2xx: Restore private register bits.
        spi/qspi: Fix qspi remove path.
        spi/qspi: cleanup pm_runtime error check.
        spi/qspi: set correct platform drvdata in ti_qspi_probe()
        spi/pxa2xx: add new ACPI IDs
        spi: core: invert success test in devm_spi_register_master
        spi: spi-mxs: fix reference leak to master in mxs_spi_remove()
        spi: bcm63xx: fix reference leak to master in bcm63xx_spi_remove()
        spi: txx9: fix reference leak to master in txx9spi_remove()
        spi: mpc512x: fix reference leak to master in mpc512x_psc_spi_do_remove()
        spi: rspi: use platform drvdata correctly in rspi_remove()
        spi: bcm2835: fix reference leak to master in bcm2835_spi_remove()
    • Linus Torvalds
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5fc92de3
      Linus Torvalds authored
      Pull networking updates from David Miller:
       "Here is a pile of bug fixes that accumulated while I was in Europe"
      
       1) In fixing kernel leaks to userspace during copying of socket
          addresses, we broke a case that used to work, namely the user
          providing a buffer larger than the in-kernel generic socket address
          structure.  This broke Ruby amongst other things.  Fix from Dan
          Carpenter.
      
       2) Fix regression added by byte queue limit support in 8139cp driver,
          from Yang Yingliang.
      
       3) The addition of MSG_SENDPAGE_NOTLAST buggered up a few sendpage
          implementations, they should just treat it the same as MSG_MORE.
          Fix from Richard Weinberger and Shawn Landden.
      
       4) Handle icmpv4 errors received on ipv6 SIT tunnels correctly, from
          Oussama Ghorbel.  In particular we should send an ICMPv6 unreachable
          in such situations.
      
       5) Fix some regressions in the recent genetlink fixes, in particular
          get the pmcraid driver to use the new safer interfaces correctly.
          From Johannes Berg.
      
       6) macvtap was converted to use a per-cpu set of statistics, but some
          code was still bumping tx_dropped elsewhere.  From Jason Wang.
      
       7) Fix build failure of xen-netback due to missing include on some
          architectures, from Andy Whitecroft.
      
       8) macvtap double counts received packets in statistics, fix from Vlad
          Yasevich.
      
       9) Fix various cases of using *_STATS_BH() when *_STATS() is more
          appropriate.  From Eric Dumazet and Hannes Frederic Sowa.
      
      10) Pktgen ipsec mode doesn't update the ipv4 header length and checksum
          properly after encapsulation.  Fix from Fan Du.
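      Item 1 can be illustrated with a small sketch, assuming the fix clamps an
      oversized user-supplied address length to the size of the generic socket
      address structure instead of rejecting it. namelen_clamp() is a hypothetical
      helper, not the kernel function, and its rejection of negative lengths is an
      assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: validate a user-supplied address length.
 * Negative lengths are rejected (an assumption of this sketch). Lengths
 * larger than the generic socket address structure are clamped rather
 * than rejected, so callers passing a big buffer keep working while the
 * kernel-side copy never exceeds sizeof(struct sockaddr_storage). */
static int namelen_clamp(int *namelen)
{
	assert(namelen != NULL);
	if (*namelen < 0)
		return -EINVAL;
	if ((size_t)*namelen > sizeof(struct sockaddr_storage))
		*namelen = (int)sizeof(struct sockaddr_storage);
	return 0;
}
```

      Clamping keeps the bound on how much the kernel reads or writes, which was the
      point of the original leak fix, without turning a large user buffer into an error.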
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
        net/mlx4_en: Remove selftest TX queues empty condition
        {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation
        virtio_net: make all RX paths handle erors consistently
        virtio_net: fix error handling for mergeable buffers
        virtio_net: Fixed a trivial typo (fitler --> filter)
        netem: fix gemodel loss generator
        netem: fix loss 4 state model
        netem: missing break in ge loss generator
        net/hsr: Support iproute print_opt ('ip -details ...')
        net/hsr: Very small fix of comment style.
        MAINTAINERS: Added net/hsr/ maintainer
        ipv6: fix possible seqlock deadlock in ip6_finish_output2
        ixgbe: Make ixgbe_identify_qsfp_module_generic static
        ixgbe: turn NETIF_F_HW_L2FW_DOFFLOAD off by default
        ixgbe: ixgbe_fwd_ring_down needs to be static
        e1000: fix possible reset_task running after adapter down
        e1000: fix lockdep warning in e1000_reset_task
        e1000: prevent oops when adapter is being closed and reset simultaneously
        igb: Fixed Wake On LAN support
        inet: fix possible seqlock deadlocks
        ...
    • Linus Torvalds
      vfs: fix subtle use-after-free of pipe_inode_info · b0d8d229
      Linus Torvalds authored
      The pipe code was trying (and failing) to be very careful about freeing
      the pipe info only after the last access, with a pattern like:
      
              spin_lock(&inode->i_lock);
              if (!--pipe->files) {
                      inode->i_pipe = NULL;
                      kill = 1;
              }
              spin_unlock(&inode->i_lock);
              __pipe_unlock(pipe);
              if (kill)
                      free_pipe_info(pipe);
      
      where the final freeing is done last.
      
      HOWEVER.  The above is actually broken, because while the freeing is
      done at the end, if we have two racing processes releasing the pipe
      inode info, the one that *doesn't* free it will decrement the ->files
      count, and unlock the inode i_lock, but then still use the
      "pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
      
      This is *very* hard to trigger in practice, since the race window is
      very small, and adding debug options seems to just hide it by slowing
      things down.
      
      Simon originally reported this way back in July as an Oops in
      kmem_cache_allocate due to a single bit corruption (caused by the final
      "spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
      allocation that had re-used the free'd pipe-info). It has taken this long
      to figure out.
      
      Since the 'pipe->files' accesses aren't even protected by the pipe lock
      (we very much use the inode lock for that), the simple solution is to
      just drop the pipe lock early.  And since there were two users of this
      pattern, create a helper function for it.
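
      The reordering can be sketched in userspace C, with pthread mutexes standing in
      for the pipe lock and inode->i_lock. struct demo_pipe, struct demo_inode and
      demo_pipe_release() are hypothetical simplifications, and the helper name
      put_pipe_info() is used here for illustration. The caller drops the pipe lock
      first, and only the releaser that brings ->files to zero touches the structure
      afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for pipe_inode_info. */
struct demo_pipe {
	pthread_mutex_t mutex;  /* plays the role of the pipe lock */
	int files;              /* protected by the inode's i_lock, not by mutex */
};

/* Hypothetical stand-in for the inode. */
struct demo_inode {
	pthread_mutex_t i_lock;
	struct demo_pipe *i_pipe;
};

static struct demo_pipe *alloc_demo_pipe(int files)
{
	struct demo_pipe *pipe = malloc(sizeof(*pipe));

	if (!pipe)
		return NULL;
	pthread_mutex_init(&pipe->mutex, NULL);
	pipe->files = files;
	return pipe;
}

static void demo_inode_init(struct demo_inode *inode, struct demo_pipe *pipe)
{
	pthread_mutex_init(&inode->i_lock, NULL);
	inode->i_pipe = pipe;
}

/* Drop one reference. Must be called WITHOUT the pipe lock held: after
 * the count is decremented and i_lock released, a releaser that did not
 * reach zero never touches 'pipe' again. Returns true if it freed pipe. */
static bool put_pipe_info(struct demo_inode *inode, struct demo_pipe *pipe)
{
	bool kill = false;

	pthread_mutex_lock(&inode->i_lock);
	if (!--pipe->files) {
		inode->i_pipe = NULL;
		kill = true;
	}
	pthread_mutex_unlock(&inode->i_lock);

	if (kill) {
		pthread_mutex_destroy(&pipe->mutex);
		free(pipe);
	}
	return kill;
}

/* Release path: finish work under the pipe lock, unlock it, and only then
 * drop the reference (the broken pattern unlocked the pipe last). */
static bool demo_pipe_release(struct demo_inode *inode, struct demo_pipe *pipe)
{
	pthread_mutex_lock(&pipe->mutex);
	/* ... per-release bookkeeping would go here ... */
	pthread_mutex_unlock(&pipe->mutex);
	return put_pipe_info(inode, pipe);
}
```

      With two openers, the first demo_pipe_release() only decrements the count; the
      second frees the pipe, by which point every other releaser has already unlocked it.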
      
      Introduced by commit ba5bb147 ("pipe: take allocation and freeing of
      pipe_inode_info out of ->i_mutex").
      Reported-by: Simon Kirby <sim@hostway.ca>
      Reported-by: Ian Applegate <ia@cloudflare.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org   # v3.10+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Eugenia Emantayev
      net/mlx4_en: Remove selftest TX queues empty condition · 833846e8
      Eugenia Emantayev authored
      Remove waiting for TX queues to become empty during selftest.
      This check serves no purpose and might put the driver into
      an infinite loop.
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fan.du
      {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation · 3868204d
      fan.du authored
      commit a553e4a6 ("[PKTGEN]: IPSEC support")
      tried to support IPsec ESP transport transformation for pktgen, but actually
      this doesn't work at all, for two reasons (the original transformed packet has
      a bad IPv4 checksum value as well as a wrong auth value, as reported by Wireshark):
      
      - After transformation, the IPv4 header total length needs to be updated,
        because the encrypted payload's length is NOT the same as that of the plain text.
      
      - After transformation, the IPv4 checksum needs to be recalculated because the
        payload has been changed.
      
      With this patch, and pktgen armed with the configuration below, Wireshark is able
      to decrypt ESP packets generated by pktgen without any IPv4 checksum error or
      auth value error.
      
      pgset "flag IPSEC"
      pgset "flows 1"
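
      The two header fix-ups above can be sketched in plain C on a raw header buffer:
      write the new total length, zero the checksum field, and recompute the RFC 1071
      one's-complement checksum over the header. ipv4_fixup_after_transform() is a
      hypothetical name; inside the kernel, helpers such as ip_send_check() perform
      the checksum step on a struct iphdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over an IPv4 header of 'len' bytes.
 * 16-bit words are read big-endian (network byte order). */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)                       /* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical post-transformation fix-up: store the new total length
 * (bytes 2-3), zero the checksum field (bytes 10-11), then recompute it
 * over IHL * 4 bytes. */
static void ipv4_fixup_after_transform(uint8_t *hdr, uint16_t new_tot_len)
{
	size_t ihl_bytes = (size_t)(hdr[0] & 0x0f) * 4;
	uint16_t csum;

	hdr[2] = (uint8_t)(new_tot_len >> 8);
	hdr[3] = (uint8_t)(new_tot_len & 0xff);
	hdr[10] = 0;
	hdr[11] = 0;
	csum = ipv4_header_checksum(hdr, ihl_bytes);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}
```

      For the commonly used example header 45 00 00 73 00 00 40 00 40 11 .. ..
      c0 a8 00 01 c0 a8 00 c7, the recomputed checksum is 0xb861, and checksumming the
      finished header (checksum field included) gives 0.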
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Michael S. Tsirkin
      virtio_net: make all RX paths handle erors consistently · f121159d
      Michael S. Tsirkin authored
      The mergeable receive path now handles errors internally.
      Do the same for the big and small packet paths; otherwise
      the logic is too hard to follow.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Michael S. Tsirkin
      virtio_net: fix error handling for mergeable buffers · 8fc3b9e9
      Michael S. Tsirkin authored
      Eric Dumazet noticed that if we encounter an error
      when processing a mergeable buffer, we don't
      dequeue all of the buffers from this packet;
      the result is almost certainly a loss of networking.
      
      Jason Wang noticed that we also leak a page and that we don't decrement
      the rq buf count, so we won't repost buffers (a resource leak).
      
      Fix both issues.
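
      A userspace sketch of both fixes, assuming a simple FIFO ring of heap-allocated
      buffers: when one buffer of a multi-buffer packet fails, the packet's remaining
      buffers are still dequeued, freed and subtracted from the posted-buffer count.
      struct demo_rq, rq_post(), rq_get_buf() and receive_mergeable_demo() are
      hypothetical and far simpler than the virtio_net code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_RING_SIZE 16

/* Hypothetical receive queue: a FIFO of posted, heap-allocated buffers. */
struct demo_rq {
	void *bufs[DEMO_RING_SIZE];
	unsigned int head;   /* next buffer to dequeue */
	unsigned int tail;   /* next free slot to post into */
	int num;             /* posted-buffer count used to decide refills */
};

/* Post one fresh buffer to the ring. */
static bool rq_post(struct demo_rq *rq)
{
	void *buf;

	if (rq->tail - rq->head == DEMO_RING_SIZE)
		return false;                    /* ring full */
	buf = malloc(64);
	if (!buf)
		return false;
	rq->bufs[rq->tail % DEMO_RING_SIZE] = buf;
	rq->tail++;
	rq->num++;
	return true;
}

/* Dequeue the oldest posted buffer, or NULL if the ring is empty. */
static void *rq_get_buf(struct demo_rq *rq)
{
	void *buf;

	if (rq->head == rq->tail)
		return NULL;
	buf = rq->bufs[rq->head % DEMO_RING_SIZE];
	rq->head++;
	return buf;
}

/* Consume one packet spread over num_buffers buffers; fail_at simulates an
 * error on that buffer index (-1 for none). Returns true if delivered. */
static bool receive_mergeable_demo(struct demo_rq *rq, int num_buffers,
				   int fail_at)
{
	int i;

	for (i = 0; i < num_buffers; i++) {
		void *buf = rq_get_buf(rq);

		if (!buf)
			return false;            /* ring ran dry mid-packet */
		rq->num--;                   /* keep the count accurate */
		free(buf);                   /* sketch only: a driver would build an skb */
		if (i == fail_at)
			goto err_drain;
	}
	return true;

err_drain:
	/* Without this loop, the packet's remaining buffers would stay queued
	 * and be misread as the start of the next packet. */
	while (++i < num_buffers) {
		void *buf = rq_get_buf(rq);

		if (!buf)
			break;
		rq->num--;
		free(buf);
	}
	assert(rq->num >= 0);
	return false;
}
```

      After a failed three-buffer packet, the next packet still starts at the correct
      buffer and the count reflects what is actually posted, so refilling works.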
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael Dalton <mwdalton@google.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 01 Dec, 2013 4 commits