1. 30 Oct, 2014 6 commits
  2. 15 Oct, 2014 17 commits
  3. 09 Oct, 2014 14 commits
    • Linux 3.10.57 · f41c15f2
      Greg Kroah-Hartman authored
    • cpufreq: ondemand: Change the calculation of target frequency · bed53965
      Stratos Karafotis authored
      commit dfa5bb62 upstream.
      
      The ondemand governor calculates load in terms of frequency and
      increases it only if load_freq is greater than up_threshold
      multiplied by the current or average frequency.  This appears to
      produce oscillations of frequency between min and max because,
      for example, a relatively small load can easily saturate minimum
      frequency and lead the CPU to the max.  Then, it will decrease
      back to the min due to small load_freq.
      
      Change the calculation method of load and target frequency on the
      basis of the following two observations:
      
       - Load computation should not depend on the current or average
         measured frequency.  For example, absolute load of 80% at 100MHz
         is not necessarily equivalent to 8% at 1000MHz in the next
         sampling interval.
      
       - It should be possible to increase the target frequency to any
         value present in the frequency table proportional to the absolute
         load, rather than to the max only, so that:
      
         Target frequency = C * load
      
         where we take C = policy->cpuinfo.max_freq / 100.
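
         As a rough user-space sketch of this proportional selection (not the
         governor code itself; the frequency table and helper names below are
         hypothetical), picking the lowest table frequency at or above
         C * load looks like this:

         #include <stdio.h>

         /* Hypothetical cpufreq table in kHz, sorted ascending. */
         static const unsigned int freq_table[] = {
                 800000, 1200000, 1600000, 2400000, 3400000
         };
         #define NFREQS (sizeof(freq_table) / sizeof(freq_table[0]))

         /* Lowest table frequency >= C * load, with C = max_freq / 100. */
         static unsigned int target_freq(unsigned int max_freq, unsigned int load)
         {
                 unsigned int wanted = max_freq / 100 * load;
                 unsigned int i;

                 for (i = 0; i < NFREQS; i++)
                         if (freq_table[i] >= wanted)
                                 return freq_table[i];
                 return freq_table[NFREQS - 1];
         }

         int main(void)
         {
                 /* 40% absolute load on a 3.4 GHz part picks a middle frequency. */
                 printf("%u kHz\n", target_freq(3400000, 40));
                 return 0;
         }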
      
      Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
      Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
      increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
      that middle frequencies are used more, with this patch.  Highest
      and lowest frequencies were used less by ~9%.
      
      [rjw: We have run multiple other tests on kernels with this
       change applied and in the vast majority of cases it turns out
       that the resulting performance improvement also leads to reduced
       consumption of energy.  The change is additionally justified by
       the overall simplification of the code in question.]
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cpufreq: Fix wrong time unit conversion · 35c23914
      Andreas Schwab authored
      commit a857c0b9 upstream.
      
      The time spent by a CPU under a given frequency is stored in jiffies unit
      in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of
      the frequency.
      
      This is what is displayed in the following file on the right column:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 19835820
           2300000 3172
           [...]
      
      Now cpufreq converts this jiffies unit delta to clock_t before returning it
      to the user as in the above file. And that conversion is achieved using the API
      cputime64_to_clock_t().
      
      Although it accidentally works on traditional tick-based cputime accounting,
      where cputime_t maps directly to jiffies, it doesn't work with other types of
      cputime accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can
      map to nsecs or any granularity preferred by the architecture.
      
      For example we get a buggy zero delta on full dyntick configurations:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 0
           2300000 0
           [...]
      
      Fix this by using the proper jiffies_64 to clock_t conversion.
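
      Conceptually the conversion is just a rescale from the kernel tick rate
      (HZ) to the fixed rate user space sees (USER_HZ, typically 100); a minimal
      user-space sketch with an assumed HZ value:

      #include <stdio.h>
      #include <stdint.h>

      #define HZ      1000   /* assumed kernel tick rate (jiffies per second) */
      #define USER_HZ  100   /* clock_t ticks per second exposed to user space */

      /* Rescale a jiffies count to clock_t ticks. */
      static uint64_t jiffies_to_user_ticks(uint64_t jiffies)
      {
        return jiffies * USER_HZ / HZ;
      }

      int main(void)
      {
        /* 5000 jiffies at HZ=1000 is 5 seconds, i.e. 500 clock_t ticks. */
        printf("%llu\n", (unsigned long long)jiffies_to_user_ticks(5000));
        return 0;
      }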
      Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org>
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nl80211: clear skb cb before passing to netlink · 7dd31112
      Johannes Berg authored
      commit bd8c78e7 upstream.
      
      In testmode and vendor command reply/event SKBs we use the
      skb cb data to store nl80211 parameters between allocation
      and sending. This causes the code for CONFIG_NETLINK_MMAP
      to get confused, because it takes ownership of the skb cb
      data when the SKB is handed off to netlink, and it doesn't
      explicitly clear it.
      
      Clear the skb cb explicitly when we're done and before it
      gets passed to netlink to avoid this issue.
      Reported-by: Assaf Azulay <assaf.azulay@intel.com>
      Reported-by: David Spinadel <david.spinadel@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drbd: fix regression 'out of mem, failed to invoke fence-peer helper' · 6353c97a
      Lars Ellenberg authored
      commit bbc1c5e8 upstream.
      
      Since linux kernel 3.13, kthread_run() internally uses
      wait_for_completion_killable().  We sometimes call kthread_run() while we
      still have a signal pending (one we used to kick our threads out of
      potentially blocking network functions), causing kthread_run() to mistake
      it for a new fatal signal and fail.
      
      Fix: flush_signals() before kthread_run().
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • jiffies: Fix timeval conversion to jiffies · 00790d45
      Andrew Hunter authored
      commit d78c9300 upstream.
      
      timeval_to_jiffies tried to round a timeval up to an integral number
      of jiffies, but the logic for doing so was incorrect: intervals
      corresponding to exactly N jiffies would become N+1. This manifested
      itself particularly when repeatedly stopping/starting an itimer:
      
      setitimer(ITIMER_PROF, &val, NULL);
      setitimer(ITIMER_PROF, NULL, &val);
      
      would add a full tick to val, _even if it was exactly representable in
      terms of jiffies_ (say, the result of a previous rounding.)  Doing
      this repeatedly would cause unbounded growth in val.  So fix the math.
      
      Here's what was wrong with the conversion: we essentially computed
      (eliding seconds)
      
      jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)
      
      by using scaling arithmetic, which took the best approximation of
      NSEC_PER_USEC/TICK_NSEC with a denominator of 2^USEC_JIFFIE_SC,
      i.e. x/(2^USEC_JIFFIE_SC), and computed:
      
      jiffies = (usec * x) >> USEC_JIFFIE_SC
      
      and rounded this calculation up in the intermediate form (since we
      can't necessarily exactly represent TICK_NSEC in usec.) But the
      scaling arithmetic is a (very slight) *over*approximation of the true
      value; that is, instead of dividing by (1 usec/ 1 jiffie), we
      effectively divided by (1 usec/1 jiffie)-epsilon (rounding
      down). This would normally be fine, but we want to round timeouts up,
      and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
      would be fine if our division was exact, but dividing this by the
      slightly smaller factor was equivalent to adding just _over_ 1 to the
      final result (instead of just _under_ 1, as desired.)
      
      In particular, with HZ=1000, we consistently computed that 10000 usec
      was 11 jiffies; the same was true for any exact multiple of
      TICK_NSEC.
      
      We could possibly still round in the intermediate form, adding
      something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
      convert usec->nsec, round in nanoseconds, and then convert using
      time*spec*_to_jiffies.  This adds one constant multiplication, and is
      not observably slower in microbenchmarks on recent x86 hardware.
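
      The over-rounding is easy to reproduce in user space with illustrative
      constants (not the kernel's exact scale factor or reciprocal):

      #include <stdio.h>
      #include <stdint.h>

      #define USEC_PER_JIFFIE 1000ULL      /* HZ=1000 */
      #define NSEC_PER_JIFFIE 1000000ULL
      #define SC 20                        /* made-up scaling shift */

      int main(void)
      {
        uint64_t usec = 10000;   /* exactly 10 jiffies at HZ=1000 */

        /* Old style: divide via a rounded-up scaled reciprocal, then round up
           again by adding 2^SC - 1 before the shift.  The reciprocal already
           over-approximates, so exact multiples overshoot by one jiffy. */
        uint64_t recip = ((1ULL << SC) + USEC_PER_JIFFIE - 1) / USEC_PER_JIFFIE;
        uint64_t old_j = (usec * recip + (1ULL << SC) - 1) >> SC;

        /* New style: convert to nsec first and round up there, where one
           jiffy is exactly representable. */
        uint64_t new_j = (usec * 1000 + NSEC_PER_JIFFIE - 1) / NSEC_PER_JIFFIE;

        printf("old: %llu  new: %llu\n",
               (unsigned long long)old_j, (unsigned long long)new_j);
        /* prints: old: 11  new: 10 */
        return 0;
      }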
      
      Tested: the following program:
      
      #include <stdio.h>
      #include <sys/time.h>

      int main() {
        struct itimerval zero = {{0, 0}, {0, 0}};
        /* Initially set to 10 ms. */
        struct itimerval initial = zero;
        initial.it_interval.tv_usec = 10000;
        setitimer(ITIMER_PROF, &initial, NULL);
        /* Save and restore several times. */
        for (size_t i = 0; i < 10; ++i) {
          struct itimerval prev;
          setitimer(ITIMER_PROF, &zero, &prev);
          /* on old kernels, this goes up by TICK_USEC every iteration */
          printf("previous value: %ld %ld %ld %ld\n",
                 prev.it_interval.tv_sec, prev.it_interval.tv_usec,
                 prev.it_value.tv_sec, prev.it_value.tv_usec);
          setitimer(ITIMER_PROF, &prev, NULL);
        }
        return 0;
      }
      
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: Paul Turner <pjt@google.com>
      Reported-by: Aaron Jacobs <jacobsa@google.com>
      Signed-off-by: Andrew Hunter <ahh@google.com>
      [jstultz: Tweaked to apply to 3.17-rc]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid5: disable 'DISCARD' by default due to safety concerns. · 06905ff8
      NeilBrown authored
      commit 8e0e99ba upstream.
      
      It has come to my attention (thanks Martin) that 'discard_zeroes_data'
      is only a hint.  Some devices in some cases don't do what it
      says on the label.
      
      The use of DISCARD in RAID5 depends on reads from discarded regions
      being predictably zero.  If a write to a previously discarded region
      performs a read-modify-write cycle it assumes that the parity block
      was consistent with the data blocks.  If all were zero, this would
      be the case.  If some are and some aren't this would not be the case.
      This could lead to data corruption after a device failure when
      data needs to be reconstructed from the parity.
      
      As we cannot trust 'discard_zeroes_data', ignore it by default
      and so disallow DISCARD on all raid4/5/6 arrays.
      
      As many devices are trustworthy, and as there are benefits to using
      DISCARD, add a module parameter to over-ride this caution and cause
      DISCARD to work if discard_zeroes_data is set.
      
      If a site wants to enable DISCARD on some arrays but not on others, it
      should select DISCARD support at the filesystem level and set the
      raid456 module parameter:
          raid456.devices_handle_discard_safely=Y
      
      As this is a data-safety issue, I believe this patch is suitable for
      -stable.
      DISCARD support for RAID456 was added in 3.7.
      
      Cc: Shaohua Li <shli@kernel.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Fixes: 620125f2
      Signed-off-by: NeilBrown <neilb@suse.de>
      [bwh: Backported to 3.10: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • media: vb2: fix VBI/poll regression · f5d34b7c
      Hans Verkuil authored
      commit 58d75f4b upstream.
      
      The recent conversion of saa7134 to vb2 uncovered a poll() bug that
      broke the teletext applications alevt and mtt. These applications
      expect that calling poll() without having called VIDIOC_STREAMON will
      cause poll() to return POLLERR. That did not happen in vb2.
      
      This patch fixes that behavior. It also fixes what should happen when
      poll() is called after STREAMON but before any buffers have been
      queued. In that case poll() will also return POLLERR, but only for
      capture queues, since output queues will always return POLLOUT
      anyway in that situation.
      
      This brings the vb2 behavior in line with the old videobuf behavior.
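
      The expected behavior is easy to check from user space (the device node
      path below is just an example):

      #include <fcntl.h>
      #include <poll.h>
      #include <stdio.h>
      #include <unistd.h>

      int main(void)
      {
        /* Open a VBI capture node but do not call VIDIOC_STREAMON or queue
           any buffers. */
        int fd = open("/dev/vbi0", O_RDONLY | O_NONBLOCK);
        if (fd < 0) {
          perror("open");
          return 1;
        }

        /* With this fix, poll() on a capture queue reports POLLERR here,
           matching the old videobuf behavior that alevt and mtt rely on. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 100) > 0 && (pfd.revents & POLLERR))
          printf("POLLERR before STREAMON, as expected\n");

        close(fd);
        return 0;
      }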
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: numa: Do not mark PTEs pte_numa when splitting huge pages · f35407ac
      Mel Gorman authored
      commit abc40bd2 upstream.
      
      This patch reverts 1ba6e0b5 ("mm: numa: split_huge_page: transfer the
      NUMA type from the pmd to the pte"). If a huge page is being split due
      to a protection change and the tail will be in a PROT_NONE vma, then
      NUMA hinting PTEs are temporarily created in the protected VMA.
      
       VM_RW|VM_PROTNONE
      |-----------------|
            ^
            split here
      
      In the specific case above, it should get fixed up by change_pte_range()
      but there is a window of opportunity for weirdness to happen. Similarly,
      if a huge page is shrunk and split during a protection update but before
      pmd_numa is cleared then a pte_numa can be left behind.
      
      Instead of adding complexity trying to deal with the case, this patch
      will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults
      will not be triggered, which is marginal in comparison to the complexity
      of dealing with the corner cases during THP split.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, thp: move invariant bug check out of loop in __split_huge_page_map · 183c062c
      Waiman Long authored
      commit f8303c25 upstream.
      
      In __split_huge_page_map(), the check for page_mapcount(page) is
      invariant within the for loop.  Because the macro is implemented using
      atomic_read(), the redundant check cannot be optimized away by the
      compiler, leading to an unnecessary read of the page structure.
      
      This patch moves the invariant bug check out of the loop so that it will
      be done only once.  On a 3.16-rc1 based kernel, a microbenchmark that
      broke up 1000 transparent huge pages using munmap() ran in 38,245us
      with the patch and 38,548us without it.  The performance gain is
      about 1%.
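
      The shape of the change is a plain loop-invariant hoist; a generic
      user-space sketch (not the kernel code; the names are made up):

      #include <assert.h>
      #include <stdatomic.h>

      /* The checked counter is read atomically, so a check inside the loop
         costs one atomic read per iteration that the compiler cannot elide.
         Checking the invariant once, before the loop, is enough. */
      static void map_ptes(atomic_int *mapcount, int *pte, int n)
      {
        int i;

        assert(atomic_load(mapcount) >= 1);

        for (i = 0; i < n; i++)
          pte[i] = i;   /* stand-in for the per-PTE setup work */
      }

      int main(void)
      {
        atomic_int mapcount = 1;
        int ptes[512];

        map_ptes(&mapcount, ptes, 512);
        return 0;
      }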
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Fix infinite spin in reading buffer · 78a3db11
      Steven Rostedt (Red Hat) authored
      commit 24607f11 upstream.
      
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
      be called by any number of readers. The iterator continues down when
      it's not interrupted by a consuming reader. If a consuming read
      occurs, the iterator starts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
      Commit 651e22f2 changed what was saved by the cache_read when
      the rb_iter_reset() occurred, making the iterator never match the cache.
      Then if the iterator calls rb_iter_reset(), it will go into an
      infinite loop by checking if the cache doesn't match, doing the reset
      and retrying, just to see that the cache still doesn't match! This
      should never happen, as the reset is supposed to set the cache to the
      current value and there are locks that keep a consuming reader from
      having access to the data.
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu · 0000372a
      Josh Triplett authored
      commit 62b4d204 upstream.
      
      commit 03b8c7b6 ("futex: Allow
      architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
      HAVE_FUTEX_CMPXCHG symbol right below FUTEX.  This placed it right in
      the middle of the options for the EXPERT menu.  However,
      HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
      placing items in the EXPERT menu, and displays the remaining several
      EXPERT items (starting with EPOLL) directly in the General Setup menu.
      
      Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
      HAVE_FUTEX_CMPXCHG itself depend on FUTEX.  With this change, the
      subsequent items display as part of the EXPERT menu again; the EMBEDDED
      menu now appears as the next top-level item in the General Setup menu,
      which makes General Setup much shorter and more usable.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf: fix perf bug in fork() · bee870fc
      Peter Zijlstra authored
      commit 6c72e350 upstream.
      
      Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
      calling perf_event_free_task() when failing sched_fork() we will not yet
      have done the memset() on ->perf_event_ctxp[] and will therefore try to
      'free' the inherited contexts, which are still in use by the parent
      process.  This is bad.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udf: Avoid infinite loop when processing indirect ICBs · 07d209bd
      Jan Kara authored
      commit c03aa9f6 upstream.
      
      We did not implement any bound on the number of indirect ICBs we follow
      when loading an inode. Thus a corrupted medium could cause the kernel to
      go into an infinite loop, possibly causing a stack overflow.

      Fix the possible stack overflow by removing recursion from
      __udf_read_inode() and limiting the number of indirect ICBs we follow
      to avoid infinite loops.
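
      The general pattern of the fix, an iterative walk with a cap on how many
      indirect links are followed, looks roughly like this (the types and the
      limit below are made up for illustration):

      #include <stddef.h>

      #define MAX_INDIRECT 16   /* illustrative cap, not the kernel's value */

      struct icb {
        int is_indirect;
        struct icb *next;
      };

      /* Iterative instead of recursive: a corrupted medium that chains or
         loops indirect ICBs can no longer overflow the stack or spin forever. */
      static struct icb *resolve_icb(struct icb *icb)
      {
        int hops;

        for (hops = 0; icb && icb->is_indirect; hops++) {
          if (hops >= MAX_INDIRECT)
            return NULL;        /* refuse suspiciously long chains */
          icb = icb->next;
        }
        return icb;
      }

      int main(void)
      {
        struct icb leaf = { 0, NULL };
        struct icb indirect = { 1, &leaf };

        return resolve_icb(&indirect) == &leaf ? 0 : 1;
      }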
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 05 Oct, 2014 3 commits