1. 19 Feb, 2014 2 commits
    • x86: tsc: Add missing Baytrail frequency to the table · 3e11e818
      Mika Westerberg authored
      Intel Baytrail is based on Silvermont core so MSR_FSB_FREQ[2:0] == 0 means
      that the CPU reference clock runs at 83.3MHz. Add this missing frequency to
      the table.
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-2-git-send-email-mika.westerberg@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
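As an illustration of the table lookup described in the commit above, here is a minimal user-space sketch. The names `ref_clock_khz_table` and `ref_clock_khz()` are hypothetical and are not the kernel's identifiers. Only entry 0 (83.3 MHz) is taken from the commit message; the other entries are left at zero because this log does not state them.

```c
#include <stdint.h>

/* Number of encodings of a 3-bit field such as MSR_FSB_FREQ[2:0]. */
#define FSB_FREQ_ENTRIES 8

/*
 * Hypothetical lookup table in kHz. Only entry 0 (83.3 MHz) comes from the
 * commit message; a zero means "not filled in by this sketch".
 */
static const uint32_t ref_clock_khz_table[FSB_FREQ_ENTRIES] = {
    83300, 0, 0, 0, 0, 0, 0, 0
};

/* Return the reference clock in kHz, or 0 if the encoding is unknown here. */
static uint32_t ref_clock_khz(uint64_t msr_fsb_freq)
{
    /* Only bits [2:0] select the table entry; higher bits are ignored. */
    return ref_clock_khz_table[msr_fsb_freq & 0x7];
}
```

A zero result is what a caller would have to treat as "frequency unknown", which is the situation the next commit guards against.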
    • x86, tsc: Fallback to normal calibration if fast MSR calibration fails · 5f0e0309
      Thomas Gleixner authored
      If we cannot calibrate the TSC via MSR-based calibration,
      try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
      to the caller. This value then gets propagated further to the clockevents
      code, resulting in a division-by-zero oops like the one below:
      
       divide error: 0000 [#1] PREEMPT SMP
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
       task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
       RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
       RSP: 0000:ffff880075507e58  EFLAGS: 00010246
       RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
       RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
       R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
       R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
       Stack:
        ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
        ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
        ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
       Call Trace:
        [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
        [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
        [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
        [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
        [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
        [<ffffffff8177c91e>] kernel_init+0xe/0x120
        [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
      
      Prevent this from happening by:
       1) Modifying try_msr_calibrate_tsc() to return the calibration value,
          or zero if it fails.
       2) Checking this return value in native_calibrate_tsc() and, in case
          of zero, falling back to the normal non-MSR based calibration.
      
      [mw: Added subject and changelog]
      Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
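The two-step fix described in the commit above follows a common pattern: the fast path reports failure by returning zero, and the caller falls back to a slower path. The sketch below shows that pattern in plain C. The function names and frequency values are hypothetical stand-ins, not the kernel's try_msr_calibrate_tsc() or native_calibrate_tsc().

```c
/* Hypothetical calibration routine: returns a frequency in kHz, or 0 on failure. */
typedef unsigned long (*calibrate_fn)(void);

/* Try the fast method first; use the slow one only when the fast one fails. */
static unsigned long calibrate_with_fallback(calibrate_fn fast, calibrate_fn slow)
{
    unsigned long khz = fast();

    if (khz != 0)
        return khz;     /* fast calibration succeeded */
    return slow();      /* zero must never reach the caller's divisions */
}

/* Stand-ins for real calibration routines (values are made up). */
static unsigned long fast_fails(void) { return 0; }
static unsigned long fast_works(void) { return 1333000; }
static unsigned long slow_works(void) { return 1332850; }
```

With these stand-ins, `calibrate_with_fallback(fast_fails, slow_works)` yields the slow path's value instead of the zero that led to the divide error.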
  2. 18 Feb, 2014 2 commits
  3. 17 Feb, 2014 32 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 87eeff79
      Linus Torvalds authored
      Pull Ceph fixes from Sage Weil:
       "We have some patches fixing up ACL support issues from Zheng and
        Guangliang and a mount option to enable/disable this support.  (These
        fixes were somewhat delayed by the Chinese holiday.)
      
        There is also a small fix for cached readdir handling when directories
        are fragmented"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        ceph: fix __dcache_readdir()
        ceph: add acl, noacl options for cephfs mount
        ceph: make ceph_forget_all_cached_acls() static inline
        ceph: add missing init_acl() for mkdir() and atomic_open()
        ceph: fix ceph_set_acl()
        ceph: fix ceph_removexattr()
        ceph: remove xattr when null value is given to setxattr()
        ceph: properly handle XATTR_CREATE and XATTR_REPLACE
    • Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 · 351a7934
      Linus Torvalds authored
      Pull CIFS fixes from Steve French:
       "Three cifs fixes, the most important fixing the problem with passing
        bogus pointers with writev (CVE-2014-0069).
      
        Two additional cifs fixes are still in review (including the fix for
        an append problem which Al also discovered)"
      
      * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Fix too big maxBuf size for SMB3 mounts
        cifs: ensure that uncached writes handle unmapped areas correctly
        [CIFS] Fix cifsacl mounts over smb2 to not call cifs
    • FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree · 7026f192
      David Howells authored
      When FS-Cache allocates an object, the following sequence of events can
      occur:
      
       -->fscache_alloc_object()
          -->cachefiles_alloc_object() [via cache->ops->alloc_object]
          <--[returns new object]
          -->fscache_attach_object()
          <--[failed]
          -->cachefiles_put_object() [via cache->ops->put_object]
             -->fscache_object_destroy()
                -->fscache_objlist_remove()
                   -->rb_erase() to remove the object from fscache_object_list.
      
      resulting in a crash in the rbtree code.
      
      The problem is that the object is only added to fscache_object_list on
      the success path of fscache_attach_object() where it calls
      fscache_objlist_add().
      
      So if fscache_attach_object() fails, the object won't have been added to
      the objlist rbtree.  We do, however, unconditionally try to remove the
      object from the tree.
      
      Thanks to NeilBrown for finding this and suggesting this solution.
      Reported-by: NeilBrown <neilb@suse.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: (a customer of) NeilBrown <neilb@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
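The FS-Cache commit above describes removing an object from a container it was never added to. One general way to make removal safe is to record membership on the object and make removal a no-op when the object is not linked. The sketch below shows that idea with a plain doubly linked list rather than the kernel rbtree; all names are hypothetical, and this log does not confirm that the actual patch is implemented this way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object that may or may not be on a tracking list. */
struct obj {
    struct obj *prev, *next;
    bool on_list;               /* set only by obj_list_add() */
};

struct obj_list {
    struct obj head;            /* sentinel node */
    size_t count;
};

static void obj_list_init(struct obj_list *l)
{
    l->head.prev = l->head.next = &l->head;
    l->head.on_list = false;
    l->count = 0;
}

static void obj_init(struct obj *o)
{
    o->prev = o->next = NULL;
    o->on_list = false;
}

static void obj_list_add(struct obj_list *l, struct obj *o)
{
    o->next = l->head.next;
    o->prev = &l->head;
    l->head.next->prev = o;
    l->head.next = o;
    o->on_list = true;
    l->count++;
}

/* Remove the object only if it was added; report whether anything happened. */
static bool obj_list_remove(struct obj_list *l, struct obj *o)
{
    if (!o->on_list)
        return false;           /* never added: nothing to unlink */
    o->prev->next = o->next;
    o->next->prev = o->prev;
    obj_init(o);
    l->count--;
    return true;
}
```

The failure path in the commit corresponds to calling `obj_list_remove()` on an object that only went through `obj_init()`, which here returns false instead of touching dangling links.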
    • reiserfs: fix utterly brain-damaged indentation. · 416e2abd
      Dave Jones authored
      This has been this way for years, and every time I stumble across it I
      lose my lunch.  After coming across it for the nth time in the Coverity
      results, I had to overcome the bystander effect and do something about
      it.
      
      This ignores the 79 column limit in favor of making it look like C
      instead of gibberish.
      
      The correct thing to do here would be to lose some of the indentation by
      breaking this function up into several smaller ones.  I might do that at
      some point if I have the stomach to look at this again.
      
      (Also some of those overlong ternary operations would likely be more
      readable as regular if's)
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'dma-buf-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf · 60f76eab
      Linus Torvalds authored
      Pull dma-buf fix from Sumit Semwal:
       "Just some debugfs output updates.
      
        There's another patch related to dma-buf, but it'll get upstreamed via
        Greg KH's pull request"
      
      * tag 'dma-buf-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
        dma-buf: update debugfs output
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 · 2b250395
      Linus Torvalds authored
      Pull AVR32 fixes from Hans-Christian Egtvedt.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
        avr32: add generic vga.h to Kbuild
        avr32: add generic ioremap_wc() definition in io.h
        avr32: Makefile: add '-D__linux__' flag for gcc-4.4.7 use
        avr32: fix missing module.h causing build failure in mimc200/fram.c
    • ceph: fix __dcache_readdir() · 4d5f5df6
      Yan, Zheng authored
      If a directory is fragmented, readdir() reads its dirfrags one by one.
      After reading all dirfrags, the corresponding dentries are sorted in
      (frag_t, off) order in the dcache. If the dentries of a directory are all
      cached, __dcache_readdir() can use the cached dentries to satisfy the
      readdir syscall. But when checking whether a given dentry is after the
      position of readdir, __dcache_readdir() compares the numerical value of
      frag_t directly. This is wrong; it should use ceph_frag_compare().
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
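To show why comparing raw frag_t values can give the wrong order, here is a small sketch. It assumes a frag encoding with the split depth in the top 8 bits and the fragment value in the low 24 bits, and a comparison that orders by value first and depth second. Both the layout and the helper names are assumptions made for this illustration and should be checked against the Ceph headers.

```c
#include <stdint.h>

/* Assumed layout: split depth in the top 8 bits, value in the low 24 bits. */
static uint32_t frag_bits(uint32_t f)  { return f >> 24; }
static uint32_t frag_value(uint32_t f) { return f & 0xffffffu; }

/* Order by value first, then by depth; returns -1, 0 or 1. */
static int frag_compare(uint32_t a, uint32_t b)
{
    if (frag_value(a) != frag_value(b))
        return frag_value(a) < frag_value(b) ? -1 : 1;
    if (frag_bits(a) != frag_bits(b))
        return frag_bits(a) < frag_bits(b) ? -1 : 1;
    return 0;
}
```

Under these assumptions, `0x01800000` (depth 1, value 0x800000) is numerically smaller than `0x02400000` (depth 2, value 0x400000), yet `frag_compare()` places it after, because the depth bits dominate a plain integer comparison.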
    • ceph: add acl, noacl options for cephfs mount · 45195e42
      Sage Weil authored
      Make the 'acl' option dependent on having ACL support compiled in.  Make
      the 'noacl' option work even without it so that one can always ask it to
      be off and not error out on mount when it is not supported.
      Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
    • Guangliang Zhao's avatar
    • Yan, Zheng's avatar
    • ceph: fix ceph_set_acl() · 7a92d647
      Yan, Zheng authored
      If the acl is equivalent to the file mode permission bits, ceph_set_acl()
      needs to remove any existing acl xattr. Use __ceph_setxattr() to
      handle both the setting and removing cases for the acl xattr; it doesn't
      return -ENODATA when there is no acl xattr.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    • ceph: fix ceph_removexattr() · 524186ac
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    • ceph: remove xattr when null value is given to setxattr() · bcdfeb2e
      Yan, Zheng authored
      For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
      to distinguish null value case from the zero-length value case.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    • ceph: properly handle XATTR_CREATE and XATTR_REPLACE · fbc0b970
      Yan, Zheng authored
      Return -EEXIST if XATTR_CREATE is set and the xattr already exists.
      Return -ENODATA if XATTR_REPLACE is set but the xattr does not exist.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
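The two rules in the commit above can be written as a small flag check. The sketch below is a user-space illustration, not the Ceph code: `check_xattr_flags()` and the `SKETCH_` constants are hypothetical names, with the constants mirroring the usual Linux values of XATTR_CREATE (0x1) and XATTR_REPLACE (0x2).

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for XATTR_CREATE / XATTR_REPLACE (hypothetical names). */
#define SKETCH_XATTR_CREATE  0x1
#define SKETCH_XATTR_REPLACE 0x2

/* Return 0 if the set operation may proceed, or a negative errno. */
static int check_xattr_flags(bool exists, int flags)
{
    if ((flags & SKETCH_XATTR_CREATE) && exists)
        return -EEXIST;     /* create-only, but the xattr is already there */
    if ((flags & SKETCH_XATTR_REPLACE) && !exists)
        return -ENODATA;    /* replace-only, but there is nothing to replace */
    return 0;
}
```

With no flags set, the check always passes, so a plain set either creates or overwrites the attribute.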
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · f2a77abd
      Linus Torvalds authored
      Pull powerpc fixes from Ben Herrenschmidt:
       "Here are some more powerpc fixes for 3.14
      
        The main one is a nasty issue with the NUMA balancing support which
        requires a small generic change and the addition of a new accessor to
        set _PAGE_NUMA.  Both have been reviewed and acked by Mel and Rik.
      
        The changelog should have plenty of details but basically, without
        this fix, we get random user segfaults and/or corruptions due to
        missing TLB/hash flushes.  Aneesh's series of 3 patches fixes it.
      
        We have some vDSO vs.  perf fixes from Anton, some small EEH fixes
        from Gavin, a ppc32 regression vs the stack overflow detector, and a
        fix for the way we handle PCIe host bridge speed settings on pseries
        (which is needed for proper operations of AMD graphics cards on
        Power8)"
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc/eeh: Disable EEH on reboot
        powerpc/eeh: Cleanup on eeh_subsystem_enabled
        powerpc/powernv: Rework EEH reset
        powerpc: Use unstripped VDSO image for more accurate profiling data
        powerpc: Link VDSOs at 0x0
        mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit
        mm: Dirty accountable change only apply to non prot numa case
        powerpc/mm: Add new "set" flag argument to pte/pmd update function
        powerpc/pseries: Add Gen3 definitions for PCIE link speed
        powerpc/pseries: Fix regression on PCI link speed
        powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
    • printk: fix syslog() overflowing user buffer · e4178d80
      Linus Torvalds authored
      This is not a buffer overflow in the traditional sense: we don't
      overflow any *kernel* buffers, but we do mis-count the amount of data we
      copy back to user space for the SYSLOG_ACTION_READ_ALL case.
      
      In particular, if the user buffer is too small to hold everything, and
      *if* there is a continuation line at just the right place, we can end up
      giving the user more data than he asked for.
      
      The reason is that we first count up the number of bytes all the log
      records contain, then we walk the records again until we've skipped the
      records at the beginning that won't fit, and then we walk the rest of
      the records and copy them to the user space buffer.
      
      And in between that "skip the initial records that won't fit" and the
      "copy the records that *will* fit to user space", we reset the 'prev'
      variable that contained the record information for the last record not
      copied.  That meant that when we started copying to user space, we now
      had a different character count than what we had originally calculated
      in the first record walk-through.
      
      The fix is to simply not clear the 'prev' flags value (in both cases
      where we had the same logic: syslog_print_all and kmsg_dump_get_buffer:
      the latter is used for pstore-like dumping)
      Reported-and-tested-by: Debabrata Banerjee <dbanerje@akamai.com>
      Acked-by: Kay Sievers <kay@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • avr32: add generic vga.h to Kbuild · d7668f9d
      Chen Gang authored
      The generic "vga.h" needs to be added, or the build fails with
      allmodconfig. The related error:
      
          CC [M]  drivers/gpu/drm/drm_irq.o
        In file included from include/linux/vgaarb.h:34,
                         from drivers/gpu/drm/drm_irq.c:42:
        include/video/vga.h:22:21: error: asm/vga.h: No such file or directory
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
      Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
    • avr32: add generic ioremap_wc() definition in io.h · 1bbce4f3
      Chen Gang authored
      A generic ioremap_wc() is needed, or compilation fails with
      allmodconfig. The related error:
      
          CC [M]  drivers/gpu/drm/drm_bufs.o
        drivers/gpu/drm/drm_bufs.c: In function 'drm_addmap_core':
        drivers/gpu/drm/drm_bufs.c:217: error: implicit declaration of function 'ioremap_wc'
        drivers/gpu/drm/drm_bufs.c:218: warning: assignment makes pointer from integer without a cast
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
      Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
    • avr32: Makefile: add '-D__linux__' flag for gcc-4.4.7 use · 8d80390c
      Chen Gang authored
      The avr32 cross compiler does not define '__linux__' internally, which
      causes build issues with allmodconfig.
      
      The related error:
      
          CC [M]  fs/coda/psdev.o
        In file included from include/linux/coda.h:64,
                         from fs/coda/psdev.c:45:
        include/uapi/linux/coda.h:221: error: expected specifier-qualifier-list before 'u_quad_t'
      
      The related toolchain version (downloaded only, not re-compiled):
      
        [root@gchen linux-next]# /upstream/toolchain/download/avr32-gnu-toolchain-linux_x86/bin/avr32-gcc -v
        Using built-in specs.
        Target: avr32
        Configured with: /data2/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/src/gcc/configure --target=avr32 --host=i686-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-languages=c,c++ --disable-nls --disable-libssp --disable-libstdcxx-pch --with-dwarf2 --enable-version-specific-runtime-libs --disable-shared --enable-doc --with-mpfr-lib=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/lib --with-mpfr-include=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/include --with-gmp=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --with-mpc=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-__cxa_atexit --disable-shared --with-newlib --with-pkgversion=AVR_32_bit_GNU_Toolchain_3.4.2_435 --with-bugurl=http://www.atmel.com/avr
        Thread model: single
        gcc version 4.4.7 (AVR_32_bit_GNU_Toolchain_3.4.2_435)
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
      Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
      Cc: stable@vger.kernel.org
    • avr32: fix missing module.h causing build failure in mimc200/fram.c · 5745d6a4
      Paul Gortmaker authored
      Causing this:
      
      In file included from arch/avr32/boards/mimc200/fram.c:13:
      include/linux/miscdevice.h:51: error: field 'list' has incomplete type
      include/linux/miscdevice.h:55: error: expected specifier-qualifier-list before 'mode_t'
      arch/avr32/boards/mimc200/fram.c:42: error: 'THIS_MODULE' undeclared here (not in a function)
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
      Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: stable@vger.kernel.org
    • ext4: don't leave i_crtime.tv_sec uninitialized · 19ea8060
      Theodore Ts'o authored
      If the i_crtime field is not present in the inode, don't leave the
      field uninitialized.
      
      Fixes: ef7f3835 ("ext4: Add nanosecond timestamps")
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • powerpc/eeh: Disable EEH on reboot · 66f9af83
      Gavin Shan authored
      We may detect EEH errors during reboot, particularly in the kexec
      path, but it's impossible for device drivers and the EEH core to handle
      or recover from them properly.

      The patch registers a reboot notifier for EEH and disables the EEH
      subsystem during reboot. That means the EEH errors are going to be
      cleared by a hardware reset or by the second kernel during the early
      stage of PCI probe.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/eeh: Cleanup on eeh_subsystem_enabled · 2ec5a0ad
      Gavin Shan authored
      The patch cleans up the variable eeh_subsystem_enabled so that it need
      not be referenced directly from outside. Instead, we will use the
      functions eeh_enabled() and eeh_set_enable() to operate on the variable.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Rework EEH reset · 5b2e198e
      Gavin Shan authored
      When doing a reset in order to recover the affected PE, we issue a
      hot reset on the PE's primary bus if it's not the root bus. Otherwise, we
      issue a hot or fundamental reset on the root port or PHB accordingly.
      For the latter case, we didn't cover the situation where the PE only
      includes the root port, which potentially causes a kernel crash upon
      an EEH error to the PE.
      
      The patch reworks the logic of EEH reset to improve the code
      readability and also avoid the kernel crash.
      
      Cc: stable@vger.kernel.org
      Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Use unstripped VDSO image for more accurate profiling data · 24b659a1
      Anton Blanchard authored
      We are seeing a lot of hits in the VDSO that are not resolved by perf.
      A while(1) gettimeofday() loop shows the issue:
      
      27.64%  [vdso]  [.] 0x000000000000060c
      22.57%  [vdso]  [.] 0x0000000000000628
      16.88%  [vdso]  [.] 0x0000000000000610
      12.39%  [vdso]  [.] __kernel_gettimeofday
       6.09%  [vdso]  [.] 0x00000000000005f8
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We are using a stripped VDSO image which means only symbols with
      relocation info can be resolved. There isn't a lot of point to
      stripping the VDSO, the debug info is only about 1kB:
      
      4680 arch/powerpc/kernel/vdso64/vdso64.so
      5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg
      
      By using the unstripped image, we can resolve all the symbols in the
      VDSO and the perf profile data looks much better:
      
      76.53%  [vdso]  [.] __do_get_tspec
      12.20%  [vdso]  [.] __kernel_gettimeofday
       5.05%  [vdso]  [.] __get_datapage
       3.20%  test    [.] main
       2.92%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Link VDSOs at 0x0 · a0a4419e
      Anton Blanchard authored
      perf is failing to resolve symbols in the VDSO. A while (1)
      gettimeofday() loop shows:
      
      93.99%  [vdso]  [.] 0x00000000000005e0
       3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.81%  test    [.] main
      
      The reason for this is that we are linking our VDSO shared libraries
      at 1MB, which is a little weird. Even though this is uncommon, Alan
      points out that it is valid and we should probably fix perf userspace.
      
      Regardless, I can't see a reason why we are doing this. The code
      is all position independent and we never rely on the VDSO ending
      up at 1M (and we never place it there on 64bit tasks).
      
      Changing our link address to 0x0 fixes perf VDSO symbol resolution:
      
      73.18%  [vdso]  [.] 0x000000000000060c
      12.39%  [vdso]  [.] __kernel_gettimeofday
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We still have some local symbol resolution issues that will be
      fixed in a subsequent patch.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit · 56eecdb9
      Aneesh Kumar K.V authored
      Archs like ppc64 don't do a tlb flush in the set_pte/pmd functions when using
      a hash table MMU for various reasons (the flush is handled as part of
      the PTE modification when necessary).

      ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.

      Additionally ppc64 requires the tlb flushing to be batched within ptl locks.
      
      The reason to do that is to ensure that the hash page table is in sync with
      linux page table.
      
      We track the hpte index in linux pte and if we clear them without flushing
      hash and drop the ptl lock, we can have another cpu update the pte and can
      end up with duplicate entry in the hash table, which is fatal.
      
      We also want to keep set_pte_at() simpler by not requiring it to do a hash
      flush, for performance reasons. We do that by assuming that set_pte_at() is
      never *ever* called on a PTE that is already valid.
      
      This was the case until the NUMA code went in which broke that assumption.
      
      Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
      way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
      using set_pte_at() and a powerpc specific one using the appropriate
      mechanism needed to keep the hash table in sync.
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • Aneesh Kumar K.V's avatar
    • powerpc/mm: Add new "set" flag argument to pte/pmd update function · 88247e8d
      Aneesh Kumar K.V authored
      pte_update() is a powerpc-ism used to change the bits of a PTE
      when the access permission is being restricted (a flush is
      potentially needed).
      
      It uses atomic operations when needed and handles the hash
      synchronization on hash based processors.
      
      It is currently only used to clear PTE bits and so the current
      implementation doesn't provide a way to also set PTE bits.
      
      The new _PAGE_NUMA bit, when set, is actually restricting access
      so it must use that function too, so this change adds the ability
      for pte_update() to also set bits.
      
      We will use this later to set the _PAGE_NUMA bit.
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
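The idea of an update helper that can clear and set bits in one atomic step can be sketched in portable C11. This is only an analogy to the commit above: `pte_update_sketch()` is a hypothetical name, the entry is modelled as a plain 64-bit atomic, and the hash-table synchronization the real powerpc code performs is not modelled.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically clear the bits in 'clr' and set the bits in 'set' in *entry.
 * Returns the value the entry held before the update.
 */
static uint64_t pte_update_sketch(_Atomic uint64_t *entry,
                                  uint64_t clr, uint64_t set)
{
    uint64_t old = atomic_load(entry);

    /* Retry until no other thread changed the entry in between. */
    while (!atomic_compare_exchange_weak(entry, &old, (old & ~clr) | set))
        ;
    return old;
}
```

Returning the old value lets a caller decide whether follow-up work, such as a flush, is needed for the bits that were present before the change.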
    • powerpc/pseries: Add Gen3 definitions for PCIE link speed · 49d9684a
      Kleber Sacilotto de Souza authored
      Rev3 of the PCI Express Base Specification defines a Supported Link
      Speeds Vector where the bit definitions within this field are:
      
      Bit 0 - 2.5 GT/s
      Bit 1 - 5.0 GT/s
      Bit 2 - 8.0 GT/s
      
      This vector definition is used by the platform firmware to export the
      maximum and current link speeds of the PCI bus via the
      "ibm,pcie-link-speed-stats" device-tree property.
      
      This patch updates pseries_root_bridge_prepare() to detect Gen3
      speed buses (defined by 0x04).
      Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
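The bit definitions listed in the commit above map directly onto a small decoder. The sketch below returns the speed in tenths of GT/s so that it can stay in integer arithmetic; `link_speed_tenths_gts()` is a hypothetical helper, not the code in pseries_root_bridge_prepare().

```c
#include <stdint.h>

/*
 * Decode a single-bit link speed value:
 *   bit 0 (0x01) -> 2.5 GT/s, bit 1 (0x02) -> 5.0 GT/s, bit 2 (0x04) -> 8.0 GT/s.
 * Returns the speed in units of 0.1 GT/s, or 0 if the value is not recognised.
 */
static unsigned int link_speed_tenths_gts(uint32_t speed)
{
    switch (speed) {
    case 0x01:
        return 25;
    case 0x02:
        return 50;
    case 0x04:
        return 80;  /* Gen3 */
    default:
        return 0;
    }
}
```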
    • powerpc/pseries: Fix regression on PCI link speed · b020cc6c
      Kleber Sacilotto de Souza authored
      Commit 5091f0c9 (powerpc/pseries: Fix PCIE link speed endian issue)
      introduced a regression on the PCI link speed detection using the
      device-tree property. The ibm,pcie-link-speed-stats property is composed
      of two 32-bit integers, the first one being the maximum link speed and
      the second the current link speed. The changes introduced by the
      aforementioned commit are considering just the first integer.
      
      Fix this issue by changing how the property is accessed, using the
      helper functions to properly access the array of values. The explicit
      byte swapping is not needed anymore here, since it's done by the helper
      functions.
      Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
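The property layout described in the commit above, two 32-bit integers with the maximum speed first and the current speed second, can be parsed as in the following sketch. It reads the raw bytes as big-endian, which is how device-tree cells are stored. The function names are hypothetical; the kernel uses its own device-tree helper functions, which this log does not name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Parse a two-cell property: cell 0 is the maximum link speed, cell 1 the
 * current link speed. Returns false if the property is too short.
 */
static bool parse_link_speed_stats(const uint8_t *prop, size_t len,
                                   uint32_t *max_speed, uint32_t *cur_speed)
{
    if (len < 2 * sizeof(uint32_t))
        return false;
    *max_speed = read_be32(prop);
    *cur_speed = read_be32(prop + 4);
    return true;
}
```

Reading both cells through one helper keeps the byte-order handling in a single place, which matches the commit's remark that explicit byte swapping is no longer needed at the call site.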
    • powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack · 1a18a664
      Kevin Hao authored
      Guenter Roeck has got the following call trace on a p2020 board:
        Kernel stack overflow in process eb3e5a00, r1=eb79df90
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        task: eb3e5a00 ti: c0616000 task.ti: ef440000
        NIP: c003a420 LR: c003a410 CTR: c0017518
        REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
        MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
        GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
        GPR08: 00000000 020b8000 00000000 00000000 44008442
        NIP [c003a420] __do_softirq+0x94/0x1ec
        LR [c003a410] __do_softirq+0x84/0x1ec
        Call Trace:
        [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
        [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
        [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
        [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
        [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
        --- Exception: 501 at 0xfcda524
            LR = 0x10024900
        Instruction dump:
        7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
        5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
        Kernel panic - not syncing: kernel stack overflow
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        Call Trace:
      
      The reason is that we have used the wrong register to calculate the
      ksp_limit in commit cbc9565e (powerpc: Remove ksp_limit on ppc64).
      Just fix it.
      
      As suggested by Benjamin Herrenschmidt, also add the C prototype of the
      function in the comment in order to avoid this kind of error in the
      future.
      
      Cc: stable@vger.kernel.org # 3.12
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  4. 16 Feb, 2014 4 commits