1. 06 Aug, 2014 40 commits
    • Nicolas Pitre's avatar
      ARM: 7670/1: fix the memset fix · 2c58922a
      Nicolas Pitre authored
      commit 418df63a upstream.
      
      Commit 455bd4c4 ("ARM: 7668/1: fix memset-related crashes caused by
      recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
      with the memset return value.  However the memset itself became broken
      by that patch for misaligned pointers.
      
      This fixes the above by branching over the entry code from the
      misaligned fixup code to avoid reloading the original pointer.
      
      Also, because the function entry alignment is wrong in the Thumb mode
      compilation, that fixup code is moved to the end.
      
      While at it, the entry instructions are slightly reworked to help dual
      issue pipelines.
      Signed-off-by: default avatarNicolas Pitre <nico@linaro.org>
      Tested-by: default avatarAlexander Holler <holler@ahsoftware.de>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2c58922a
    • Ivan Djelic's avatar
      ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations · fe7b4c33
      Ivan Djelic authored
      commit 455bd4c4 upstream.
      
      Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
      assumptions about the implementation of memset and similar functions.
      The current ARM optimized memset code does not return the value of
      its first argument, as is usually expected from standard implementations.
      
      For instance in the following function:
      
      void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
      {
      	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
      	waiter->magic = waiter;
      	INIT_LIST_HEAD(&waiter->list);
      }
      
      compiled as:
      
      800554d0 <debug_mutex_lock_common>:
      800554d0:       e92d4008        push    {r3, lr}
      800554d4:       e1a00001        mov     r0, r1
      800554d8:       e3a02010        mov     r2, #16 ; 0x10
      800554dc:       e3a01011        mov     r1, #17 ; 0x11
      800554e0:       eb04426e        bl      80165ea0 <memset>
      800554e4:       e1a03000        mov     r3, r0
      800554e8:       e583000c        str     r0, [r3, #12]
      800554ec:       e5830000        str     r0, [r3]
      800554f0:       e5830004        str     r0, [r3, #4]
      800554f4:       e8bd8008        pop     {r3, pc}
      
      GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
      register/memory corruptions.
      
      This patch fixes the return value of the assembly version of memset.
      It adds a 'mov' instruction and merges an additional load+store into
      existing load/store instructions.
      For ease of review, here is a breakdown of the patch into 4 simple steps:
      
      Step 1
      ======
      Perform the following substitutions:
      ip -> r8, then
      r0 -> ip,
      and insert 'mov ip, r0' as the first statement of the function.
      At this point, we have a memset() implementation returning the proper result,
      but corrupting r8 on some paths (the ones that were using ip).
      
      Step 2
      ======
      Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
      
      save r8:
      -       str     lr, [sp, #-4]!
      +       stmfd   sp!, {r8, lr}
      
      and restore r8 on both exit paths:
      -       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
      +       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
      (...)
              tst     r2, #16
              stmneia ip!, {r1, r3, r8, lr}
      -       ldr     lr, [sp], #4
      +       ldmfd   sp!, {r8, lr}
      
      Step 3
      ======
      Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
      
      save r8:
      -       stmfd   sp!, {r4-r7, lr}
      +       stmfd   sp!, {r4-r8, lr}
      
      and restore r8 on both exit paths:
              bgt     3b
      -       ldmeqfd sp!, {r4-r7, pc}
      +       ldmeqfd sp!, {r4-r8, pc}
      (...)
              tst     r2, #16
              stmneia ip!, {r4-r7}
      -       ldmfd   sp!, {r4-r7, lr}
      +       ldmfd   sp!, {r4-r8, lr}
      
      Step 4
      ======
      Rewrite register list "r4-r7, r8" as "r4-r8".
      Signed-off-by: default avatarIvan Djelic <ivan.djelic@parrot.com>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarDirk Behme <dirk.behme@gmail.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      fe7b4c33
    • Naoya Horiguchi's avatar
      mm: hugetlb: fix copy_hugetlb_page_range() · e6a5e9f6
      Naoya Horiguchi authored
      commit 0253d634 upstream.
      
      Commit 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry") changed the order of
      huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
      in some workloads like hugepage-backed heap allocation via libhugetlbfs.
      This patch fixes it.
      
      The test program for the problem is shown below:
      
        $ cat heap.c
        #include <unistd.h>
        #include <stdlib.h>
        #include <string.h>
      
        #define HPS 0x200000
      
        int main() {
        	int i;
        	char *p = malloc(HPS);
        	memset(p, '1', HPS);
        	for (i = 0; i < 5; i++) {
        		if (!fork()) {
        			memset(p, '2', HPS);
        			p = malloc(HPS);
        			memset(p, '3', HPS);
        			free(p);
        			return 0;
        		}
        	}
        	sleep(1);
        	free(p);
        	return 0;
        }
      
        $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
      
      Fixes 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry"), so is applicable to -stable kernels which
      include it.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: default avatarGuillaume Morin <guillaume@morinfr.org>
      Suggested-by: default avatarGuillaume Morin <guillaume@morinfr.org>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e6a5e9f6
    • Markus F.X.J. Oberhumer's avatar
      crypto: testmgr - update LZO compression test vectors · 6156e7c0
      Markus F.X.J. Oberhumer authored
      commit 0ec73820 upstream.
      
      Update the LZO compression test vectors according to the latest compressor
      version.
      Signed-off-by: default avatarMarkus F.X.J. Oberhumer <markus@oberhumer.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6156e7c0
    • Julian Anastasov's avatar
      ipvs: stop tot_stats estimator only under CONFIG_SYSCTL · 25cc3a3e
      Julian Anastasov authored
      [ Upstream commit 9802d21e ]
      
      The tot_stats estimator is started only when CONFIG_SYSCTL
      is defined. But it is stopped without checking CONFIG_SYSCTL.
      Fix the crash by moving ip_vs_stop_estimator into
      ip_vs_control_net_cleanup_sysctl.
      
      The change is needed after commit 14e40546
      ("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39.
      Reported-by: default avatarJet Chen <jet.chen@intel.com>
      Tested-by: default avatarJet Chen <jet.chen@intel.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      25cc3a3e
    • Roland Dreier's avatar
      x86, ioremap: Speed up check for RAM pages · b3ff2867
      Roland Dreier authored
      commit c81c8a1e upstream.
      
      In __ioremap_caller() (the guts of ioremap), we loop over the range of
      pfns being remapped and checks each one individually with page_is_ram().
      For large ioremaps, this can be very slow.  For example, we have a
      device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
      seconds -- sometimes long enough to trigger the soft lockup detector!
      
      Internally, page_is_ram() calls walk_system_ram_range() on a single
      page.  Instead, we can make a single call to walk_system_ram_range()
      from __ioremap_caller(), and do our further checks only for any RAM
      pages that we find.  For the common case of MMIO, this saves an enormous
      amount of work, since the range being ioremapped doesn't intersect
      system RAM at all.
      
      With this change, ioremap on our 256 GiB BAR takes less than 1 second.
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.orgSigned-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b3ff2867
    • Mikulas Patocka's avatar
      sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue · 30b605d6
      Mikulas Patocka authored
      commit fd1232b2 upstream.
      
      This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
      returns QUEUE FULL status.
      
      When the controller encounters an error (including QUEUE FULL or BUSY
      status), it aborts all not yet submitted requests in the function
      sym_dequeue_from_squeue.
      
      This function aborts them with DID_SOFT_ERROR.
      
      If the disk has full tag queue, the request that caused the overflow is
      aborted with QUEUE FULL status (and the scsi midlayer properly retries
      it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
      the following requests with DID_SOFT_ERROR --- for them, the midlayer
      does just a few retries and then signals the error up to sd.
      
      The result is that disk returning QUEUE FULL causes request failures.
      
      The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
      (rebranded ST336607LC) with command queue 48 or 64 tags.  The disk has
      64 tags, but under some access patterns it return QUEUE FULL when there
      are less than 64 pending tags.  The SCSI specification allows returning
      QUEUE FULL anytime and it is up to the host to retry.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30b605d6
    • Dan Carpenter's avatar
      applicom: dereferencing NULL on error path · 12489e9c
      Dan Carpenter authored
      commit 8bab797c upstream.
      
      This is a static checker fix.  The "dev" variable is always NULL after
      the while statement so we would be dereferencing a NULL pointer here.
      
      Fixes: 819a3eba ('[PATCH] applicom: fix error handling')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      12489e9c
    • H. Peter Anvin's avatar
      x86-32, espfix: Remove filter for espfix32 due to race · 6806fa8b
      H. Peter Anvin authored
      commit 246f2d2e upstream.
      
      It is not safe to use LAR to filter when to go down the espfix path,
      because the LDT is per-process (rather than per-thread) and another
      thread might change the descriptors behind our back.  Fortunately it
      is always *safe* (if a bit slow) to go down the espfix path, and a
      32-bit LDT stack segment is extremely rare.
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.comSigned-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6806fa8b
    • Jiang Liu's avatar
      score: normalize global variables exported by vmlinux.lds · 1280d204
      Jiang Liu authored
      commit ae49b83d upstream.
      
      Generate mandatory global variables _sdata in file vmlinux.lds.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1280d204
    • Michael Cree's avatar
      alpha: add io{read,write}{16,32}be functions · 1df72e0d
      Michael Cree authored
      commit 25534eb7 upstream.
      
      These functions are used in some PCI drivers with big-endian
      MMIO space.
      
      Admittedly it is almost certain that no one this side of the
      Moon would use such a card in an Alpha but it does get us
      closer to being able to build allyesconfig or allmodconfig,
      and it enables the Debian default generic config to build.
      Tested-by: default avatarRaúl Porcel <armin76@gentoo.org>
      Signed-off-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1df72e0d
    • Ben Hutchings's avatar
      score: Add missing #include <linux/export.h> · 951c03ac
      Ben Hutchings authored
      There is no upstream commit for this, as arch/score/kernel/init_task.c
      has been replaced by generic code and <linux/export.h> is included
      indirectly by arch/score/mm/init.c.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      951c03ac
    • Lennox Wu's avatar
      Score: The commit is for compiling successfully. The modifications include: 1.... · ef2706a9
      Lennox Wu authored
      Score: The commit is for compiling successfully. The modifications include: 1. Kconfig of Score: we don't support ioremap 2. Missed headfile including 3. There are some errors in other people's commit not checked by us, we fix it now 3.1 arch/score/kernel/entry.S: wrong instructions 3.2 arch/score/kernel/process.c : just some typos
      
      commit 5fbbf8a1 upstream.
      Signed-off-by: default avatarLennox Wu <lennox.wu@gmail.com>
      [bwh: Backported to 3.2:
       - Drop addition of 'select HAVE_GENERIC_HARDIRQS' which was not removed here
       - Drop inapplicale change to copy_thread()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ef2706a9
    • Fengguang Wu's avatar
      unicore32: select generic atomic64_t support · 3e1ad261
      Fengguang Wu authored
      commit 82e54a6a upstream.
      
      It's required for the core fs/namespace.c and many other basic features.
      Signed-off-by: default avatarGuan Xuetao <gxt@mprc.pku.edu.cn>
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3e1ad261
    • Guan Xuetao's avatar
      unicore32: add ioremap_nocache definition · 86e93708
      Guan Xuetao authored
      commit a50e4213 upstream.
      
      Bugfix for following error messages:
      lib/iomap.c: In function 'pci_iomap':
      lib/iomap.c:274: error: implicit declaration of function 'ioremap_nocache'
      lib/iomap.c:274: warning: return makes pointer from integer without a cast
      
      Also see commit <f1ecc698>
        it will hide the ioremap_nocache function for systems with an MMU
      Signed-off-by: default avatarGuan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      86e93708
    • Hugh Dickins's avatar
      shmem: fix splicing from a hole while it's punched · a9883115
      Hugh Dickins authored
      commit b1a36650 upstream.
      
      shmem_fault() is the actual culprit in trinity's hole-punch starvation,
      and the most significant cause of such problems: since a page faulted is
      one that then appears page_mapped(), needing unmap_mapping_range() and
      i_mmap_mutex to be unmapped again.
      
      But it is not the only way in which a page can be brought into a hole in
      the radix_tree while that hole is being punched; and Vlastimil's testing
      implies that if enough other processors are busy filling in the hole,
      then shmem_undo_range() can be kept from completing indefinitely.
      
      shmem_file_splice_read() is the main other user of SGP_CACHE, which can
      instantiate shmem pagecache pages in the read-only case (without holding
      i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
      silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
      ought to be safe, but might bring surprises - not a change to be rushed.
      
      shmem_read_mapping_page_gfp() is an internal interface used by
      drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
      shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
      called internally by the kernel (perhaps for a stacking filesystem,
      which might rely on holes to be reserved): it's unclear whether it could
      be provoked to keep hole-punch busy or not.
      
      We could apply the same umbrella as now used in shmem_fault() to
      shmem_file_splice_read() and the others; but it looks ugly, and use over
      a range raises questions - should it actually be per page? can these get
      starved themselves?
      
      The origin of this part of the problem is my v3.1 commit d0823576
      ("mm: pincer in truncate_inode_pages_range"), once it was duplicated
      into shmem.c.  It seemed like a nice idea at the time, to ensure
      (barring RCU lookup fuzziness) that there's an instant when the entire
      hole is empty; but the indefinitely repeated scans to ensure that make
      it vulnerable.
      
      Revert that "enhancement" to hole-punch from shmem_undo_range(), but
      retain the unproblematic rescanning when it's truncating; add a couple
      of comments there.
      
      Remove the "indices[0] >= end" test: that is now handled satisfactorily
      by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
      to be worth avoiding here.
      
      But if we do not always loop indefinitely, we do need to handle the case
      of swap swizzled back to page before shmem_free_swap() gets it: add a
      retry for that case, as suggested by Konstantin Khlebnikov; and for the
      case of page swizzled back to swap, as suggested by Johannes Weiner.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Suggested-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a9883115
    • Hugh Dickins's avatar
      shmem: fix faulting into a hole, not taking i_mutex · de21fd42
      Hugh Dickins authored
      commit 8e205f77 upstream.
      
      Commit f00cdc6d ("shmem: fix faulting into a hole while it's
      punched") was buggy: Sasha sent a lockdep report to remind us that
      grabbing i_mutex in the fault path is a no-no (write syscall may already
      hold i_mutex while faulting user buffer).
      
      We tried a completely different approach (see following patch) but that
      proved inadequate: good enough for a rational workload, but not good
      enough against trinity - which forks off so many mappings of the object
      that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
      into serious starvation when concurrent faults force the puncher to fall
      back to single-page unmap_mapping_range() searches of the i_mmap tree.
      
      So return to the original umbrella approach, but keep away from i_mutex
      this time.  We really don't want to bloat every shmem inode with a new
      mutex or completion, just to protect this unlikely case from trinity.
      So extend the original with wait_queue_head on stack at the hole-punch
      end, and wait_queue item on the stack at the fault end.
      
      This involves further use of i_lock to guard against the races: lockdep
      has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
      i_lock around wake_up_bit(), which is comparable to what we do here.
      i_lock is more convenient, but we could switch to shmem's info->lock.
      
      This issue has been tagged with CVE-2014-4171, which will require commit
      f00cdc6d and this and the following patch to be backported: we
      suggest to 3.1+, though in fact the trinity forkbomb effect might go
      back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
      not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
      Anyone running trinity on 3.0 and earlier? I don't think we need care.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      de21fd42
    • Hugh Dickins's avatar
      shmem: fix faulting into a hole while it's punched · f159cc25
      Hugh Dickins authored
      commit f00cdc6d upstream.
      
      Trinity finds that mmap access to a hole while it's punched from shmem
      can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
      from completing, until the reader chooses to stop; with the puncher's
      hold on i_mutex locking out all other writers until it can complete.
      
      It appears that the tmpfs fault path is too light in comparison with its
      hole-punching path, lacking an i_data_sem to obstruct it; but we don't
      want to slow down the common case.
      
      Extend shmem_fallocate()'s existing range notification mechanism, so
      shmem_fault() can refrain from faulting pages into the hole while it's
      punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
      faulting when not).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f159cc25
    • Dave Chinner's avatar
      xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near · d9892580
      Dave Chinner authored
      commit e3a746f5 upstream.
      
      The current cursor is reallocated when retrying the allocation, so
      the existing cursor needs to be destroyed in both the restart and
      the failure cases.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Tested-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d9892580
    • Dave Chinner's avatar
      xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near · 381687bd
      Dave Chinner authored
      commit 76d09538 upstream.
      
      When we fail to find an matching extent near the requested extent
      specification during a left-right distance search in
      xfs_alloc_ag_vextent_near, we fail to free the original cursor that
      we used to look up the XFS_BTNUM_CNT tree and hence leak it.
      Reported-by: default avatarChris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      381687bd
    • Mathias Krause's avatar
      netfilter: ipt_ULOG: fix info leaks · 0368fea2
      Mathias Krause authored
      commit 278f2b3e upstream.
      
      The ulog messages leak heap bytes by the means of padding bytes and
      incompletely filled string arrays. Fix those by memset(0)'ing the
      whole struct before filling it.
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0368fea2
    • Martin Schwidefsky's avatar
      s390/ptrace: fix PSW mask check · 438127dd
      Martin Schwidefsky authored
      commit dab6cf55 upstream.
      
      The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect.
      For the default user_mode=home address space layout the psw_user_bits
      variable has the home space address-space-control bits set. But the
      PSW_MASK_USER contains PSW_MASK_ASC, the ptrace validity check for the
      PSW mask will therefore always fail.
      
      Fixes CVE-2014-3534
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      438127dd
    • Thomas Gleixner's avatar
      nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off · c4b4c3c5
      Thomas Gleixner authored
      commit 0e576acb upstream.
      
      If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.
      
      If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
      command line option "nohz=off" or not enabled due to missing hardware
      support, then tick_nohz_get_sleep_length() returns 0. That happens
      because ts->sleep_length is never set in that case.
      
      Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive.
      Reported-by: default avatarMichal Hocko <mhocko@suse.cz>
      Reported-by: default avatarBorislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c4b4c3c5
    • Michal Schmidt's avatar
      rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 · ea43a736
      Michal Schmidt authored
      commit e5eca6d4 upstream.
      
      When running RHEL6 userspace on a current upstream kernel, "ip link"
      fails to show VF information.
      
      The reason is a kernel<->userspace API change introduced by commit
      88c5b5ce ("rtnetlink: Call nlmsg_parse() with correct header length"),
      after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
      in the netlink request.
      
      iproute2 adjusted for the API change in its commit 63338dca4513
      ("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").
      
      The problem has been noticed before:
      http://marc.info/?l=linux-netdev&m=136692296022182&w=2
      (Subject: Re: getting VF link info seems to be broken in 3.9-rc8)
      
      We can do better than tell those with old userspace to upgrade. We can
      recognize the old iproute2 in the kernel by checking the netlink message
      length. Even when including the IFLA_EXT_MASK attribute, its netlink
      message is shorter than struct ifinfomsg.
      
      With this patch "ip link" shows VF information in both old and new
      iproute2 versions.
      Signed-off-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ea43a736
    • Eric Dumazet's avatar
      ipv4: fix buffer overflow in ip_options_compile() · 22310565
      Eric Dumazet authored
      [ Upstream commit 10ec9472 ]
      
      There is a benign buffer overflow in ip_options_compile spotted by
      AddressSanitizer[1] :
      
      Its benign because we always can access one extra byte in skb->head
      (because header is followed by struct skb_shared_info), and in this case
      this byte is not even used.
      
      [28504.910798] ==================================================================
      [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
      [28504.913170] Read of size 1 by thread T15843:
      [28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
      [28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
      [28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.924722]
      [28504.925106] Allocated by thread T15843:
      [28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
      [28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.934018]
      [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
      [28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
      [28504.937144]
      [28504.937474] Memory state around the buggy address:
      [28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.945573]                         ^
      [28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.099804] Legend:
      [28505.100269]  f - 8 freed bytes
      [28505.100884]  r - 8 redzone bytes
      [28505.101649]  . - 8 allocated bytes
      [28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
      [28505.103637] ==================================================================
      
      [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernelSigned-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      22310565
    • Ben Hutchings's avatar
      dns_resolver: Null-terminate the right string · 0d604e94
      Ben Hutchings authored
      [ Upstream commit 640d7efe ]
      
      *_result[len] is parsed as *(_result[len]) which is not at all what we
      want to touch here.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Fixes: 84a7c0b1 ("dns_resolver: assure that dns_query() result is null-terminated")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d604e94
    • Manuel Schölling's avatar
      dns_resolver: assure that dns_query() result is null-terminated · bba87648
      Manuel Schölling authored
      [ Upstream commit 84a7c0b1 ]
      
      dns_query() credulously assumes that keys are null-terminated and
      returns a copy of a memory block that is off by one.
      Signed-off-by: default avatarManuel Schölling <manuel.schoelling@gmx.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      bba87648
    • Sowmini Varadhan's avatar
      sunvnet: clean up objects created in vnet_new() on vnet_exit() · 8b9b0927
      Sowmini Varadhan authored
      [ Upstream commit a4b70a07 ]
      
      Nothing cleans up the objects created by
      vnet_new(), they are completely leaked.
      
      vnet_exit(), after doing the vio_unregister_driver() to clean
      up ports, should call a helper function that iterates over vnet_list
      and cleans up those objects. This includes unregister_netdevice()
      as well as free_netdev().
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Reviewed-by: default avatarKarl Volz <karl.volz@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8b9b0927
    • Daniel Borkmann's avatar
      net: sctp: fix information leaks in ulpevent layer · 2a3fda71
      Daniel Borkmann authored
      [ Upstream commit 8f2e5ae4 ]
      
      While working on some other SCTP code, I noticed that some
      structures shared with user space are leaking uninitialized
      stack or heap buffer. In particular, struct sctp_sndrcvinfo
      has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
      remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
      putting this into cmsg. But also struct sctp_remote_error
      contains a 2 bytes hole that we don't fill but place into a skb
      through skb_copy_expand() via sctp_ulpevent_make_remote_error().
      
      Both structures are defined by the IETF in RFC6458:
      
      * Section 5.3.2. SCTP Header Information Structure:
      
        The sctp_sndrcvinfo structure is defined below:
      
        struct sctp_sndrcvinfo {
          uint16_t sinfo_stream;
          uint16_t sinfo_ssn;
          uint16_t sinfo_flags;
          <-- 2 bytes hole  -->
          uint32_t sinfo_ppid;
          uint32_t sinfo_context;
          uint32_t sinfo_timetolive;
          uint32_t sinfo_tsn;
          uint32_t sinfo_cumtsn;
          sctp_assoc_t sinfo_assoc_id;
        };
      
      * 6.1.3. SCTP_REMOTE_ERROR:
      
        A remote peer may send an Operation Error message to its peer.
        This message indicates a variety of error conditions on an
        association. The entire ERROR chunk as it appears on the wire
        is included in an SCTP_REMOTE_ERROR event. Please refer to the
        SCTP specification [RFC4960] and any extensions for a list of
        possible error formats. An SCTP error notification has the
        following format:
      
        struct sctp_remote_error {
          uint16_t sre_type;
          uint16_t sre_flags;
          uint32_t sre_length;
          uint16_t sre_error;
          <-- 2 bytes hole  -->
          sctp_assoc_t sre_assoc_id;
          uint8_t  sre_data[];
        };
      
      Fix this by setting both to 0 before filling them out. We also
      have other structures shared between user and kernel space in
      SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
      copy that buffer over from user space first and thus don't need
      to care about it in that cases.
      
      While at it, we can also remove lengthy comments copied from
      the draft, instead, we update the comment with the correct RFC
      number where one can look it up.
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2a3fda71
    • Andrey Utkin's avatar
      appletalk: Fix socket referencing in skb · 7961c1a1
      Andrey Utkin authored
      [ Upstream commit 36beddc2 ]
      
      Setting just skb->sk without taking its reference and setting a
      destructor is invalid. However, in the places where this was done, skb
      is used in a way not requiring skb->sk setting. So dropping the setting
      of skb->sk.
      Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441Reported-by: default avatarEd Martin <edman007@edman007.com>
      Signed-off-by: default avatarAndrey Utkin <andrey.krieger.utkin@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7961c1a1
    • dingtianhong's avatar
      igmp: fix the problem when mc leave group · 00fcf0cf
      dingtianhong authored
      [ Upstream commit 52ad353a ]
      
      The problem was triggered by these steps:
      
      1) create socket, bind and then setsockopt for add mc group.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
         setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
      
      2) drop the mc group for this socket.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
         setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
      
      3) and then drop the socket, I found the mc group was still used by the dev:
      
         netstat -g
      
         Interface       RefCnt Group
         --------------- ------ ---------------------
         eth2		   1	  255.0.0.37
      
      Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
      to be released for the netdev when drop the socket, but this process was broken when
      route default is NULL, the reason is that:
      
      The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
      is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
      then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
      the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
      released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
      when dropping the socket, the mc_list is NULL and the dev still keep this group.
      
      v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
      	so I add the checking for the in_dev before polling the mc_list, make sure when
      	we remove the mc group, dec the refcnt to the real dev which was using the mc address.
      	The problem would never happened again.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      00fcf0cf
    • Li RongQing's avatar
      8021q: fix a potential memory leak · 96f641a7
      Li RongQing authored
      [ Upstream commit 916c1689 ]
      
      skb_cow called in vlan_reorder_header does not free the skb when it failed,
      and vlan_reorder_header returns NULL to reset original skb when it is called
      in vlan_untag, lead to a memory leak.
      Signed-off-by: default avatarLi RongQing <roy.qing.li@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      96f641a7
    • Neal Cardwell's avatar
      tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb · 94fb7252
      Neal Cardwell authored
      [ Upstream commit 2cd0d743 ]
      
      If there is an MSS change (or misbehaving receiver) that causes a SACK
      to arrive that covers the end of an skb but is less than one MSS, then
      tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
      the skb ("Round if necessary..."), then chopping all bytes off the skb
      and creating a zero-byte skb in the write queue.
      
      This was visible now because the recently simplified TLP logic in
      bef1909e ("tcp: fixing TLP's FIN recovery") could find that 0-byte
      skb at the end of the write queue, and now that we do not check that
      skb's length we could send it as a TLP probe.
      
      Consider the following example scenario:
      
       mss: 1000
       skb: seq: 0 end_seq: 4000  len: 4000
       SACK: start_seq: 3999 end_seq: 4000
      
      The tcp_match_skb_to_sack() code will compute:
      
       in_sack = false
       pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
       new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
       new_len += mss = 4000
      
      Previously we would find the new_len > skb->len check failing, so we
      would fall through and set pkt_len = new_len = 4000 and chop off
      pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
      afterward in the write queue.
      
      With this new commit, we notice that the new new_len >= skb->len check
      succeeds, so that we return without trying to fragment.
      
      Fixes: adb92db8 ("tcp: Make SACK code to split only at mss boundaries")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      94fb7252
    • Gavin Guo's avatar
      usb: Check if port status is equal to RxDetect · 23596265
      Gavin Guo authored
      commit bb86cf56 upstream.
      
      When using USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller
      [1022:7814], the second hotplugging will experience the USB 3.0 pen
      drive is recognized as high-speed device. After bisecting the kernel,
      I found the commit number 41e7e056
      (USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing
      some experiments, the bug can be fixed by avoiding executing the function
      hub_usb3_port_disable(). Because the port status with [AMD] FCH USB
      XHCI Controlleris [1022:7814] is already in RxDetect
      (I tried printing out the port status before setting to Disabled state),
      it's reasonable to check the port status before really executing
      hub_usb3_port_disable().
      
      Fixes: 41e7e056 (USB: Allow USB 3.0 ports to be disabled.)
      Signed-off-by: default avatarGavin Guo <gavin.guo@canonical.com>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: use hub device as context for dev_dbg(),
       as hub ports are not devices in their own right]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      23596265
    • Alex Deucher's avatar
      drm/radeon: avoid leaking edid data · 4f9bb3eb
      Alex Deucher authored
      commit 0ac66eff upstream.
      
      In some cases we fetch the edid in the detect() callback
      in order to determine what sort of monitor is connected.
      If that happens, don't fetch the edid again in the get_modes()
      callback or we will leak the edid.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      4f9bb3eb
    • Guenter Roeck's avatar
      hwmon: (adt7470) Fix writes to temperature limit registers · f8bac169
      Guenter Roeck authored
      commit de12d6f4 upstream.
      
      Temperature limit registers are signed. Limits therefore need
      to be clamped to (-128, 127) degrees C and not to (0, 255)
      degrees C.
      
      Without this fix, writing a limit of 128 degrees C sets the
      actual limit to -128 degrees C.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarAxel Lin <axel.lin@ingics.com>
      [bwh: Backported to 3.2: driver was using SENSORS_LIMIT(), which we can
       replace with clamp_val()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f8bac169
    • Peter Zijlstra's avatar
      locking/mutex: Disable optimistic spinning on some architectures · e87c95f8
      Peter Zijlstra authored
      commit 4badad35 upstream.
      
      The optimistic spin code assumes regular stores and cmpxchg() play nice;
      this is found to not be true for at least: parisc, sparc32, tile32,
      metag-lock1, arc-!llsc and hexagon.
      
      There is further wreckage, but this in particular seemed easy to
      trigger, so blacklist this.
      
      Opt in for known good archs.
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: John David Anglin <dave.anglin@bell.net>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2:
       - Adjust context
       - Drop arm64 change]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e87c95f8
    • Mateusz Guzik's avatar
      sched: Fix possible divide by zero in avg_atom() calculation · 8f88141b
      Mateusz Guzik authored
      commit b0ab99e7 upstream.
      
      proc_sched_show_task() does:
      
        if (nr_switches)
      	do_div(avg_atom, nr_switches);
      
      nr_switches is unsigned long and do_div truncates it to 32 bits, which
      means it can test non-zero on e.g. x86-64 and be truncated to zero for
      division.
      
      Fix the problem by using div64_ul() instead.
      
      As a side effect calculations of avg_atom for big nr_switches are now correct.
      Signed-off-by: default avatarMateusz Guzik <mguzik@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8f88141b
    • Alex Shi's avatar
      include/linux/math64.h: add div64_ul() · 79e70e9d
      Alex Shi authored
      commit c2853c8d upstream.
      
      There is div64_long() to handle the s64/long division, but no mocro do
      u64/ul division.  It is necessary in some scenarios, so add this
      function.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      79e70e9d
    • Martin Lau's avatar
      ring-buffer: Fix polling on trace_pipe · 0be53320
      Martin Lau authored
      commit 97b8ee84 upstream.
      
      ring_buffer_poll_wait() should always put the poll_table to its wait_queue
      even there is immediate data available.  Otherwise, the following epoll and
      read sequence will eventually hang forever:
      
      1. Put some data to make the trace_pipe ring_buffer read ready first
      2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
      3. epoll_wait()
      4. read(trace_pipe_fd) till EAGAIN
      5. Add some more data to the trace_pipe ring_buffer
      6. epoll_wait() -> this epoll_wait() will block forever
      
      ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
        ring_buffer_poll_wait() returns immediately without adding poll_table,
        which has poll_table->_qproc pointing to ep_poll_callback(), to its
        wait_queue.
      ~ During the epoll_wait() call in step 3 and step 6,
        ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
        because the poll_table->_qproc is NULL and it is how epoll works.
      ~ When there is new data available in step 6, ring_buffer does not know
        it has to call ep_poll_callback() because it is not in its wait queue.
        Hence, block forever.
      
      Other poll implementation seems to call poll_wait() unconditionally as the very
      first thing to do.  For example, tcp_poll() in tcp.c.
      
      Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com
      
      Fixes: 2a2cc8f7 "ftrace: allow the event pipe to be polled"
      Reviewed-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarMartin Lau <kafai@fb.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      [bwh: Backported to 3.2: the poll implementation looks rather different
       but does have a conditional return before and after the poll_wait() call;
       delete the return before it.]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0be53320