1. 16 Jan, 2013 1 commit
    • Heiko Carstens's avatar
      s390/time: fix sched_clock() overflow · ed4f2094
      Heiko Carstens authored
      Converting a 64 Bit TOD format value to nanoseconds means that the value
      must be divided by 4.096. In order to achieve that we multiply with 125
      and divide by 512.
      When used within sched_clock() this triggers an overflow after appr.
      417 days. Resulting in a sched_clock() return value that is much smaller
      than previously and therefore may cause all sort of weird things in
      subsystems that rely on a monotonic sched_clock() behaviour.
      
      To fix this implement a tod_to_ns() helper function which converts TOD
      values without overflow and call this function from both places that
      open coded the conversion: sched_clock() and kvm_s390_handle_wait().
      
      Cc: stable@kernel.org
      Reviewed-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      ed4f2094
  2. 12 Jan, 2013 2 commits
  3. 11 Jan, 2013 27 commits
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · b719f430
      Linus Torvalds authored
      Pull a hwmon patch from Guenter Roeck:
       "Fix build error in vexpress driver"
      
      * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (vexpress) Fix build error seen if CONFIG_OF_DEVICE is not set
      b719f430
    • Linus Torvalds's avatar
      Merge branch 'akpm' (incoming fixes from Andrew) · c727b4c6
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "The audit fixes have been floating around for a while - Al and Eric
        aren't responding to either myself or Kees so I asked Kees to
        re-review them and here they are."
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
        lib/rbtree.c: avoid the use of non-static __always_inline
        MAINTAINERS: Omar had moved
        mm: compaction: partially revert capture of suitable high-order page
        linux/audit.h: move ptrace.h include to kernel header
        kernel/audit.c: avoid negative sleep durations
        audit: catch possible NULL audit buffers
        audit: create explicit AUDIT_SECCOMP event type
        MAINTAINERS: fix a status pattern
        MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h
        mm: thp: acquire the anon_vma rwsem for write during split
        mm: mmap: annotate vm_lock_anon_vma locking properly for lockdep
        lockdep, rwsem: provide down_write_nest_lock()
        arch/mn10300/Kconfig: select CONFIG_GENERIC_ATOMIC64
        mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment
        mm: use aligned zone start for pfn_to_bitidx calculation
        fs/exec.c: work around icc miscompilation
        mm: compaction: fix echo 1 > compact_memory return error issue
        mm: memblock: fix wrong memmove size in memblock_merge_regions()
        drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function
        mm: migrate: check page_count of THP before migrating
        ...
      c727b4c6
    • Michel Lespinasse's avatar
      lib/rbtree.c: avoid the use of non-static __always_inline · 3cb7a563
      Michel Lespinasse authored
      lib/rbtree.c declared __rb_erase_color() as __always_inline void, and
      then exported it with EXPORT_SYMBOL.
      
      This was because __rb_erase_color() must be exported for augmented
      rbtree users, but it must also be inlined into rb_erase() so that the
      dummy callback can get optimized out of that call site.
      
      (Actually with a modern compiler, none of the dummy callback functions
      should even be generated as separate text functions).
      
      The above usage is legal C, but it was unusual enough for some compilers
      to warn about it.  This change makes things more explicit, with a static
      __always_inline ____rb_erase_color function for use in rb_erase(), and a
      separate non-inline __rb_erase_color function for use in
      rb_erase_augmented call sites.
      Signed-off-by: default avatarMichel Lespinasse <walken@google.com>
      Reported-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3cb7a563
    • Chen Gang's avatar
      MAINTAINERS: Omar had moved · a8906b0b
      Chen Gang authored
      Signed-off-by: default avatarChen Gang <gang.chen@asianux.com>
      Cc: Omar Ramirez Luna <omar.ramirez@ti.com>
      Cc: Omar Ramirez Luna <omar.ramirez@copitl.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a8906b0b
    • Mel Gorman's avatar
      mm: compaction: partially revert capture of suitable high-order page · 8fb74b9f
      Mel Gorman authored
      Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
      waiting for POLLIN on a local TCP socket.  It was easier to trigger if
      there was disk IO and dirty pages at the same time and he bisected it to
      commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
      immediately when it is made available").
      
      The intention of that patch was to improve high-order allocations under
      memory pressure after changes made to reclaim in 3.6 drastically hurt
      THP allocations but the approach was flawed.  For Eric, the problem was
      that page->pfmemalloc was not being cleared for captured pages leading
      to a poor interaction with swap-over-NFS support causing the packets to
      be dropped.  However, I identified a few more problems with the patch
      including the fact that it can increase contention on zone->lock in some
      cases which could result in async direct compaction being aborted early.
      
      In retrospect the capture patch took the wrong approach.  What it should
      have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
      was allocating for THP and avoided races that way.  While the patch was
      showing to improve allocation success rates at the time, the benefit is
      marginal given the relative complexity and it should be revisited from
      scratch in the context of the other reclaim-related changes that have
      taken place since the patch was first written and tested.  This patch
      partially reverts commit 1fb3f8ca ("mm: compaction: capture a
      suitable high-order page immediately when it is made available").
      Reported-and-tested-by: default avatarEric Wong <normalperson@yhbt.net>
      Tested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8fb74b9f
    • Mike Frysinger's avatar
      linux/audit.h: move ptrace.h include to kernel header · c0a3a20b
      Mike Frysinger authored
      While the kernel internals want pt_regs (and so it includes
      linux/ptrace.h), the user version of audit.h does not need it.  So move
      the include out of the uapi version.
      
      This avoids issues where people want the audit defines and userland
      ptrace api.  Including both the kernel ptrace and the userland ptrace
      headers can easily lead to failure.
      Signed-off-by: default avatarMike Frysinger <vapier@gentoo.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c0a3a20b
    • Andrew Morton's avatar
      kernel/audit.c: avoid negative sleep durations · 82919919
      Andrew Morton authored
      audit_log_start() performs the same jiffies comparison in two places.
      If sufficient time has elapsed between the two comparisons, the second
      one produces a negative sleep duration:
      
        schedule_timeout: wrong timeout value fffffffffffffff0
        Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
        Call Trace:
          schedule_timeout+0x305/0x340
          audit_log_start+0x311/0x470
          audit_log_exit+0x4b/0xfb0
          __audit_syscall_exit+0x25f/0x2c0
          sysret_audit+0x17/0x21
      
      Fix it by performing the comparison a single time.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      82919919
    • Kees Cook's avatar
      audit: catch possible NULL audit buffers · 0644ec0c
      Kees Cook authored
      It's possible for audit_log_start() to return NULL.  Handle it in the
      various callers.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Julien Tinnes <jln@google.com>
      Cc: Will Drewry <wad@google.com>
      Cc: Steve Grubb <sgrubb@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0644ec0c
    • Kees Cook's avatar
      audit: create explicit AUDIT_SECCOMP event type · 7b9205bd
      Kees Cook authored
      The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
      could only kill a process.  While we still want to make sure an audit
      record is forced on a kill, this should use a separate record type since
      seccomp mode 2 introduces other behaviors.
      
      In the case of "handled" behaviors (process wasn't killed), only emit a
      record if the process is under inspection.  This change also fixes
      userspace examination of seccomp audit events, since it was considered
      malformed due to missing fields of the AUDIT_ANOM_ABEND event type.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Julien Tinnes <jln@google.com>
      Acked-by: default avatarWill Drewry <wad@chromium.org>
      Acked-by: default avatarSteve Grubb <sgrubb@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7b9205bd
    • Zhang Yanfei's avatar
      MAINTAINERS: fix a status pattern · 56ca9d98
      Zhang Yanfei authored
      Change MAINTAINED to Maintained.
      Signed-off-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      56ca9d98
    • Zhang Yanfei's avatar
      MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h · 8fc8b12b
      Zhang Yanfei authored
      This file was moved to arch/arm/mach-omap2/omap=5Fhwmod.h by commit
      2a296c8f ("ARM: OMAP: Make plat/omap=5Fhwmod.h local to
      mach-omap2").
      Signed-off-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8fc8b12b
    • Mel Gorman's avatar
      mm: thp: acquire the anon_vma rwsem for write during split · 062f1af2
      Mel Gorman authored
      Zhouping Liu reported the following against 3.8-rc1 when running a mmap
      testcase from LTP.
      
        mapcount 0 page_mapcount 3
        ------------[ cut here ]------------
        kernel BUG at mm/huge_memory.c:1798!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables bnep bluetooth rfkill iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_mod cdc_ether iTCO_wdt i7core_edac coretemp usbnet iTCO_vendor_support mii crc32c_intel edac_core lpc_ich shpchp ioatdma mfd_core i2c_i801 pcspkr serio_raw bnx2 microcode dca vhost_net tun macvtap macvlan kvm_intel kvm uinput mgag200 sr_mod cdrom i2c_algo_bit sd_mod drm_kms_helper crc_t10dif ata_generic pata_acpi ttm ata_piix drm libata i2c_core megaraid_sas
        CPU 1
        Pid: 23217, comm: mmap10 Not tainted 3.8.0-rc1mainline+ #17 IBM IBM System x3400 M3 Server -[7379I08]-/69Y4356
        RIP: __split_huge_page+0x677/0x6d0
        RSP: 0000:ffff88017a03fc08  EFLAGS: 00010293
        RAX: 0000000000000003 RBX: ffff88027a6c22e0 RCX: 00000000000034d2
        RDX: 000000000000748b RSI: 0000000000000046 RDI: 0000000000000246
        RBP: ffff88017a03fcb8 R08: ffffffff819d2440 R09: 000000000000054a
        R10: 0000000000aaaaaa R11: 00000000ffffffff R12: 0000000000000000
        R13: 00007f4f11a00000 R14: ffff880179e96e00 R15: ffffea0005c08000
        FS:  00007f4f11f4a740(0000) GS:ffff88017bc20000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00000037e9ebb404 CR3: 000000017a436000 CR4: 00000000000007e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process mmap10 (pid: 23217, threadinfo ffff88017a03e000, task ffff880172dd32e0)
        Stack:
         ffff88017a540ec8 ffff88017a03fc20 ffffffff816017b5 ffff88017a03fc88
         ffffffff812fa014 0000000000000000 ffff880279ebd5c0 00000000f4f11a4c
         00000007f4f11f49 00000007f4f11a00 ffff88017a540ef0 ffff88017a540ee8
        Call Trace:
          split_huge_page+0x68/0xb0
          __split_huge_page_pmd+0x134/0x330
          split_huge_page_pmd_mm+0x51/0x60
          split_huge_page_address+0x3b/0x50
          __vma_adjust_trans_huge+0x9c/0xf0
          vma_adjust+0x684/0x750
          __split_vma.isra.28+0x1fa/0x220
          do_munmap+0xf9/0x420
          vm_munmap+0x4e/0x70
          sys_munmap+0x2b/0x40
          system_call_fastpath+0x16/0x1b
      
      Alexander Beregalov and Alex Xu reported similar bugs and Hillf Danton
      identified that commit 5a505085 ("mm/rmap: Convert the struct
      anon_vma::mutex to an rwsem") and commit 4fc3f1d6 ("mm/rmap,
      migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable")
      were likely the problem.  Reverting these commits was reported to solve
      the problem for Alexander.
      
      Despite the reason for these commits, NUMA balancing is not the direct
      source of the problem.  split_huge_page() expects the anon_vma lock to
      be exclusive to serialise the whole split operation.  Ordinarily it is
      expected that the anon_vma lock would only be required when updating the
      avcs but THP also uses the anon_vma rwsem for collapse and split
      operations where the page lock or compound lock cannot be used (as the
      page is changing from base to THP or vice versa) and the page table
      locks are insufficient.
      
      This patch takes the anon_vma lock for write to serialise against parallel
      split_huge_page as THP expected before the conversion to rwsem.
      Reported-and-tested-by: default avatarZhouping Liu <zliu@redhat.com>
      Reported-by: default avatarAlexander Beregalov <a.beregalov@gmail.com>
      Reported-by: default avatarAlex Xu <alex_y_xu@yahoo.ca>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      062f1af2
    • Jiri Kosina's avatar
      mm: mmap: annotate vm_lock_anon_vma locking properly for lockdep · 572043c9
      Jiri Kosina authored
      Commit 5a505085 ("mm/rmap: Convert the struct anon_vma::mutex to an
      rwsem") turned anon_vma mutex to rwsem.
      
      However, the properly annotated nested locking in mm_take_all_locks()
      has been converted from
      
      	mutex_lock_nest_lock(&anon_vma->root->mutex, &mm->mmap_sem);
      
      to
      
      	down_write(&anon_vma->root->rwsem);
      
      which is incomplete, and causes the false positive report from lockdep
      below.
      
      Annotate the fact that mmap_sem is used as an outter lock to serialize
      taking of all the anon_vma rwsems at once no matter the order, using the
      down_write_nest_lock() primitive.
      
      This patch fixes this lockdep report:
      
       =============================================
       [ INFO: possible recursive locking detected ]
       3.8.0-rc2-00036-g5f738967 #171 Not tainted
       ---------------------------------------------
       qemu-kvm/2315 is trying to acquire lock:
        (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0
      
       but task is already holding lock:
        (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&anon_vma->rwsem);
         lock(&anon_vma->rwsem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       4 locks held by qemu-kvm/2315:
        #0:  (&mm->mmap_sem){++++++}, at: do_mmu_notifier_register+0xfc/0x170
        #1:  (mm_all_locks_mutex){+.+...}, at: mm_take_all_locks+0x36/0x1b0
        #2:  (&mapping->i_mmap_mutex){+.+...}, at: mm_take_all_locks+0xc9/0x1b0
        #3:  (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0
      
       stack backtrace:
       Pid: 2315, comm: qemu-kvm Not tainted 3.8.0-rc2-00036-g5f738967 #171
       Call Trace:
         print_deadlock_bug+0xf2/0x100
         validate_chain+0x4f6/0x720
         __lock_acquire+0x359/0x580
         lock_acquire+0x121/0x190
         down_write+0x3f/0x70
         mm_take_all_locks+0x149/0x1b0
         do_mmu_notifier_register+0x68/0x170
         mmu_notifier_register+0xe/0x10
         kvm_create_vm+0x22b/0x330 [kvm]
         kvm_dev_ioctl+0xf8/0x1a0 [kvm]
         do_vfs_ioctl+0x9d/0x350
         sys_ioctl+0x91/0xb0
         system_call_fastpath+0x16/0x1b
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      572043c9
    • Jiri Kosina's avatar
      lockdep, rwsem: provide down_write_nest_lock() · 1b963c81
      Jiri Kosina authored
      down_write_nest_lock() provides a means to annotate locking scenario
      where an outer lock is guaranteed to serialize the order nested locks
      are being acquired.
      
      This is analogoue to already existing mutex_lock_nest_lock() and
      spin_lock_nest_lock().
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1b963c81
    • Andrew Morton's avatar
      arch/mn10300/Kconfig: select CONFIG_GENERIC_ATOMIC64 · fef6c12e
      Andrew Morton authored
      mn10300 doesn't provide its own atomic64 implementation, so it should pull
      in the generic one.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fef6c12e
    • Max Filippov's avatar
      mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment · 10d73e65
      Max Filippov authored
      Currently free_all_bootmem_core ignores that node_min_pfn may be not
      multiple of BITS_PER_LONG.  Eg commit 6dccdcbe ("mm: bootmem: fix
      checking the bitmap when finally freeing bootmem") shifts vec by lower
      bits of start instead of lower bits of idx.  Also
      
        if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL)
      
      assumes that vec bit 0 corresponds to start pfn, which is only true when
      node_min_pfn is a multiple of BITS_PER_LONG.  Also loop in the else
      clause can double-free pages (e.g.  with node_min_pfn == start == 1,
      map[0] == ~0 on 32-bit machine page 32 will be double-freed).
      
      This bug causes the following message during xtensa kernel boot:
      
        bootmem::free_all_bootmem_core nid=0 start=1 end=8000
        BUG: Bad page state in process swapper  pfn:00001
        page:d04bd020 count:0 mapcount:-127 mapping:  (null) index:0x2
        page flags: 0x0()
        Call Trace:
          bad_page+0x8c/0x9c
          free_pages_prepare+0x5e/0x88
          free_hot_cold_page+0xc/0xa0
          __free_pages+0x24/0x38
          __free_pages_bootmem+0x54/0x56
          free_all_bootmem_core$part$11+0xeb/0x138
          free_all_bootmem+0x46/0x58
          mem_init+0x25/0xa4
          start_kernel+0x11e/0x25c
          should_never_return+0x0/0x3be7
      
      The fix is the following:
       - always align vec so that its bit 0 corresponds to start
       - provide BITS_PER_LONG bits in vec, if those bits are available in the
         map
       - don't free pages past next start position in the else clause.
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Prasad Koya <prasad.koya@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      10d73e65
    • Laura Abbott's avatar
      mm: use aligned zone start for pfn_to_bitidx calculation · c060f943
      Laura Abbott authored
      The current calculation in pfn_to_bitidx assumes that (pfn -
      zone->zone_start_pfn) >> pageblock_order will return the same bit for
      all pfn in a pageblock.  If zone_start_pfn is not aligned to
      pageblock_nr_pages, this may not always be correct.
      
      Consider the following with pageblock order = 10, zone start 2MB:
      
        pfn     | pfn - zone start | (pfn - zone start) >> page block order
        ----------------------------------------------------------------
        0x26000 | 0x25e00	   |  0x97
        0x26100 | 0x25f00	   |  0x97
        0x26200 | 0x26000	   |  0x98
        0x26300 | 0x26100	   |  0x98
      
      This means that calling {get,set}_pageblock_migratetype on a single page
      will not set the migratetype for the full block.  Fix this by rounding
      down zone_start_pfn when doing the bitidx calculation.
      
      For our use case, the effects of this bug were mostly tied to the fact
      that CMA allocations would either take a long time or fail to happen.
      Depending on the driver using CMA, this could result in anything from
      visual glitches to application failures.
      Signed-off-by: default avatarLaura Abbott <lauraa@codeaurora.org>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c060f943
    • Xi Wang's avatar
      fs/exec.c: work around icc miscompilation · 6d92d4f6
      Xi Wang authored
      The tricky problem is this check:
      
      	if (i++ >= max)
      
      icc (mis)optimizes this check as:
      
      	if (++i > max)
      
      The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).
      
      This is "allowed" by the C standard, assuming i++ never overflows,
      because signed integer overflow is undefined behavior.  This
      optimization effectively reverts the previous commit 362e6663
      ("exec.c, compat.c: fix count(), compat_count() bounds checking") that
      tries to fix the check.
      
      This patch simply moves ++ after the check.
      Signed-off-by: default avatarXi Wang <xi.wang@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6d92d4f6
    • Jason Liu's avatar
      mm: compaction: fix echo 1 > compact_memory return error issue · 7964c06d
      Jason Liu authored
      when run the folloing command under shell, it will return error
      
        sh/$ echo 1 > /proc/sys/vm/compact_memory
        sh/$ sh: write error: Bad address
      
      After strace, I found the following log:
      
        ...
        write(1, "1\n", 2)               = 3
        write(1, "", 4294967295)         = -1 EFAULT (Bad address)
        write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address
        ) = 31
      
      This tells system return 3(COMPACT_COMPLETE) after write data to
      compact_memory.
      
      The fix is to make the system just return 0 instead 3(COMPACT_COMPLETE)
      from sysctl_compaction_handler after compaction_nodes finished.
      Signed-off-by: default avatarJason Liu <r64343@freescale.com>
      Suggested-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7964c06d
    • Lin Feng's avatar
      mm: memblock: fix wrong memmove size in memblock_merge_regions() · c0232ae8
      Lin Feng authored
      The memmove span covers from (next+1) to the end of the array, and the
      index of next is (i+1), so the index of (next+1) is (i+2).  So the size
      of remaining array elements is (type->cnt - (i + 2)).
      
      Since the remaining elements of the memblock array are move forward by
      one element and there is only one additional element caused by this bug.
      So there won't be any write overflow here but read overflow.  It may
      read one more element out of the array address if the array happens to
      be full.  Commonly it doesn't matter at all but if the array happens to
      be located at the end a memblock, it may cause a invalid read operation
      for the physical address doesn't exist.
      
      There are 2 *happens to be* here, so I think the probability is quite
      low, I don't know if any guy is haunted by this bug before.
      
      Mostly I think it's user-invisible.
      Signed-off-by: default avatarLin Feng <linfeng@cn.fujitsu.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c0232ae8
    • Maxime Ripard's avatar
      drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function · 552f0cc7
      Maxime Ripard authored
      This was leading to a strange behaviour when using the fbcon driver on
      top of this one: the letters were in the right order, but each letter
      had a vertical symmetry.
      
      This was because the addressing was right for the byte, but the
      addressing of each individual bit was inverted.
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Cc: Brian Lilly <brian@crystalfontz.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: Thomas Petazzoni <thomas@free-electrons.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      552f0cc7
    • Mel Gorman's avatar
      mm: migrate: check page_count of THP before migrating · 04fa5d6a
      Mel Gorman authored
      Hugh Dickins pointed out that migrate_misplaced_transhuge_page() does
      not check page_count before migrating like base page migration and
      khugepage.  He could not see why this was safe and he is right.
      
      The potential impact of the bug is avoided due to the limitations of
      NUMA balancing.  The page_mapcount() check ensures that only a single
      address space is using this page and as THPs are typically private it
      should not be possible for another address space to fault it in
      parallel.  If the address space has one associated task then it's
      difficult to have both a GUP pin and be referencing the page at the same
      time.  If there are multiple tasks then a buggy scenario requires that
      another thread be accessing the page while the direct IO is in flight.
      This is dodgy behaviour as there is a possibility of corruption with or
      without THP migration.  It would be
      
      While we happen to be safe for the most part it is shoddy to depend on
      such "safety" so this patch checks the page count similar to anonymous
      pages.  Note that this does not mean that the page_mapcount() check can
      go away.  If we were to remove the page_mapcount() check the the THP
      would have to be unmapped from all referencing PTEs, replaced with
      migration PTEs and restored properly afterwards.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reported-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      04fa5d6a
    • Andrew Morton's avatar
      drivers/rtc/rtc-da9055.c: fix cross-section reference · 0a1af1d6
      Andrew Morton authored
      Fix the warning
      
        WARNING: drivers/rtc/rtc-da9055.o(.text+0xa71): Section mismatch in reference from the function da9055_rtc_probe() to the function .init.text:da9055_rtc_device_init()
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0a1af1d6
    • David Decotigny's avatar
      lib: cpu_rmap: avoid flushing all workqueues · 896f97ea
      David Decotigny authored
      In some cases, free_irq_cpu_rmap() is called while holding a lock (eg
      rtnl).  This can lead to deadlocks, because it invokes
      flush_scheduled_work() which ends up waiting for whole system workqueue
      to flush, but some pending works might try to acquire the lock we are
      already holding.
      
      This commit uses reference-counting to replace
      irq_run_affinity_notifiers().  It also removes
      irq_run_affinity_notifiers() altogether.
      
      [akpm@linux-foundation.org: eliminate free_cpu_rmap, rename cpu_rmap_reclaim() to cpu_rmap_release(), propagate kref_put() retval from cpu_rmap_put()]
      Signed-off-by: default avatarDavid Decotigny <decot@googlers.com>
      Reviewed-by: default avatarBen Hutchings <bhutchings@solarflare.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Acked-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      896f97ea
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 93ccb391
      Linus Torvalds authored
      Pull NFS client bugfix from Trond Myklebust:
      
      - Fix a socket lock leak in net/sunrpc/xprt.c
      
      * tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        SUNRPC: Ensure we release the socket write lock if the rpc_task exits early
      93ccb391
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 52b820d9
      Linus Torvalds authored
      Pull intel DRM fixes from Dave Airlie:
       "Just intel fixes, including getting the Ironlake systems back to the
        state they were in for 3.6."
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/i915: Revert shrinker changes from "Track unbound pages"
        drm/i915: Use pixel size for computing linear offsets into a sprite
        drm/i915: Add DEBUG messages to all intel_create_user_framebuffer error paths
        drm/i915: The sprite scaler on Ironlake also support YUV planes
        drm: Only evict the blocks required to create the requested hole
        drm/i915: Treat crtc->mode.clock == 0 as disabled
        Revert "drm/i915: no lvds quirk for Zotac ZDBOX SD ID12/ID13"
        drm/i915; Only increment the user-pin-count after successfully pinning the bo
      52b820d9
    • Mel Gorman's avatar
      mm: compaction: Partially revert capture of suitable high-order page · 47ecfcb7
      Mel Gorman authored
      Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
      waiting for POLLIN on a local TCP socket.  It was easier to trigger if
      there was disk IO and dirty pages at the same time and he bisected it to
      commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
      immediately when it is made available").
      
      The intention of that patch was to improve high-order allocations under
      memory pressure after changes made to reclaim in 3.6 drastically hurt
      THP allocations but the approach was flawed.  For Eric, the problem was
      that page->pfmemalloc was not being cleared for captured pages leading
      to a poor interaction with swap-over-NFS support causing the packets to
      be dropped.  However, I identified a few more problems with the patch
      including the fact that it can increase contention on zone->lock in some
      cases which could result in async direct compaction being aborted early.
      
      In retrospect the capture patch took the wrong approach.  What it should
      have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
      was allocating for THP and avoided races that way.  While the patch was
      showing to improve allocation success rates at the time, the benefit is
      marginal given the relative complexity and it should be revisited from
      scratch in the context of the other reclaim-related changes that have
      taken place since the patch was first written and tested.  This patch
      partially reverts commit 1fb3f8ca "mm: compaction: capture a suitable
      high-order page immediately when it is made available".
      Reported-and-tested-by: default avatarEric Wong <normalperson@yhbt.net>
      Tested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      47ecfcb7
  4. 10 Jan, 2013 10 commits
    • Randy Dunlap's avatar
      seq_file: fix new kernel-doc warnings · 254adaa4
      Randy Dunlap authored
      Fix kernel-doc warnings in fs/seq_file.c:
      
        Warning(fs/seq_file.c:304): No description found for parameter 'whence'
        Warning(fs/seq_file.c:304): Excess function parameter 'origin' description in 'seq_lseek'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      254adaa4
    • Randy Dunlap's avatar
      pci: fix iov.c kernel-doc warnings · 2094f167
      Randy Dunlap authored
      Fix kernel-doc warning in iov.c:
      
        Warning(drivers/pci/iov.c:752): No description found for parameter 'numvfs'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Sorry-by: default avatarDon Dutile <ddutile@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2094f167
    • Randy Dunlap's avatar
      audit: fix auditfilter.c kernel-doc warnings · bfbbd96c
      Randy Dunlap authored
      Fix new kernel-doc warning in auditfilter.c:
      
        Warning(kernel/auditfilter.c:1157): Excess function parameter 'uid' description in 'audit_receive_filter'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: linux-audit@redhat.com (subscribers-only)
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bfbbd96c
    • Randy Dunlap's avatar
      nfs: fix sunrpc/clnt.c kernel-doc warnings · 7144bca6
      Randy Dunlap authored
      Fix new kernel-doc warnings in clnt.c:
      
        Warning(net/sunrpc/clnt.c:561): No description found for parameter 'flavor'
        Warning(net/sunrpc/clnt.c:561): Excess function parameter 'auth' description in 'rpc_clone_client_set_auth'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: linux-nfs@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7144bca6
    • Dave Airlie's avatar
      Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel · 82ba789f
      Dave Airlie authored
      Daniel writes:
      "Pretty much all just major fixes:
      - 2 pieces of duct-tape for the ilk bug.
      - Sprite regression fixes from Chris.
      - OOPS fix for a div-by-zero from Chris, regression due to the modeset
        rework in 3.7, now brought to light by a benign change in 3.8.
      - Fix interrupted bo pinning, used to work around CS coherency issues on
        i830/i845 (kernel also has a w/a newly in 3.8, but pinning is more efficient if
        possible)."
      82ba789f
    • Linus Torvalds's avatar
      Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86 · ecf02a60
      Linus Torvalds authored
      Pull x86 platform driver bugfixes from Matthew Garrett.
      
      * 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
        asus-laptop: Fix potential invalid pointer dereference
        Update MAINTAINERS entry
        asus-laptop: Do not call HWRS on init
        sony-laptop: fix SNC buffer calls when SN06 returns Integers
        samsung-laptop: Add quirk for broken acpi_video backlight on N250P
        acer-wmi: add Aspire 5741G touchpad toggle key
        acer-wmi: change to emit touchpad on off key
        acer-wmi: fix obj is NULL but dereferenced
        MAINTAINERS: change the mail address of acer-wmi/msi-laptop maintainer
      ecf02a60
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · ccae663c
      Linus Torvalds authored
      Pull KVM bugfixes from Marcelo Tosatti.
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: use dynamic percpu allocations for shared msrs area
        KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV
        powerpc: Corrected include header path in kvm_para.h
        Add rcu user eqs exception hooks for async page fault
      ccae663c
    • Linus Torvalds's avatar
      Merge tag 'trace-3.8-rc2-regression-fix' of... · 4ffd4ebf
      Linus Torvalds authored
      Merge tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing regression fix from Steven Rostedt:
       "A change that came in this merge window broke the writing to the
        trace_options file.  It causes garbage to be read during the compare
        of option names, and breaks setting options via the trace_options
        file, although options can still be set via the options/<option>
        files."
      
      * tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix regression of trace_options file setting
      4ffd4ebf
    • Daniel Vetter's avatar
      drm/i915: Revert shrinker changes from "Track unbound pages" · 93927ca5
      Daniel Vetter authored
      This partially reverts
      
      commit 6c085a72
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon Aug 20 11:40:46 2012 +0200
      
          drm/i915: Track unbound pages
      
      Closer inspection of that patch revealed a bunch of unrelated changes
      in the shrinker:
      - The shrinker count is now in pages instead of objects.
      - For counting the shrinkable objects the old code only looked at the
        inactive list, the new code looks at all bounds objects (including
        pinned ones). That is obviously in addition to the new unbound list.
      - The shrinker cound is no longer scaled with
        sysctl_vfs_cache_pressure. Note though that with the default tuning
        value of vfs_cache_pressue = 100 this doesn't affect the shrinker
        behaviour.
      - When actually shrinking objects, the old code first dropped
        purgeable objects, then normal (inactive) objects. Only then did it,
        in a last-ditch effort idle the gpu and evict everything. The new
        code omits the intermediate step of evicting normal inactive
        objects.
      
      Safe for the first change, which seems benign, and the shrinker count
      scaling, which is a bit a different story, the endresult of all these
      changes is that the shrinker is _much_ more likely to fall back to the
      last-ditch resort of idling the gpu and evicting everything.  The old
      code could only do that if something else evicted lots of objects
      meanwhile (since without any other changes the nr_to_scan will be
      smaller than the object count).
      
      Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only
      dentry/inode object caches should scale their shrinker counts with
      vfs_cache_pressure. Originally I've had that change reverted, too. But
      Chris Wilson insisted that it's too bogus and shouldn't again see the
      light of day.
      
      Hence revert all these other changes and restore the old shrinker
      behaviour, with the minor adjustment that we now first scan the
      unbound list, then the inactive list for each object category
      (purgeable or normal).
      
      A similar patch has been tested by a few people affected by the gen4/5
      hangs which started to appear in 3.7, which some people bisected to
      the "drm/i915: Track unbound pages" commit. But just disabling the
      unbound logic alone didn't change things at all.
      
      Note that this patch doesn't fix the referenced bugs, it only hides
      the underlying bug(s) well enough to restore pre-3.7 behaviour. The
      key to achieve that is to massively reduce the likelyhood of going
      into a full gpu stall and evicting everything.
      
      v2: Reword commit message a bit, taking Chris Wilson's comment into
      account.
      
      v3: On Chris Wilson's insistency, do not reinstate the rather bogus
      vfs_cache_pressure change.
      Tested-by: default avatarGreg KH <gregkh@linuxfoundation.org>
      Tested-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
      References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
      References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
      References: https://bugs.freedesktop.org/show_bug.cgi?id=57136
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      Acked-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      93927ca5
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 7be72c39
      Linus Torvalds authored
      Pull s390 patches from Martin Schwidefsky:
       "Add the finit_module system call, fix the irq statistics in
        /proc/stat, fix a s390dbf lockdep problem, a patch revert for a
        problem that is not 100% understood yet, and a few patches to
        fix warnings."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: define read*_relaxed functions
        s390/topology: export cpu_topology
        s390/pm: export pm_power_off
        s390/pci: define isa_dma_bridge_buggy
        s390/3215: partially revert tty close handling fix
        s390/irq: count cpu restart events
        s390/irq: remove split irq fields from /proc/stat
        s390/irq: enable irq sum accounting for /proc/stat again
        s390/syscalls: wire up finit_module syscall
        s390/pci: remove dead code
        s390/smp: fix section mismatch for smp_add_present_cpu()
        s390/debug: Fix s390dbf lockdep problem in debug_(un)register_view()
      7be72c39