1. 07 Apr, 2008 7 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 · 950b0d28
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
        x86: fix 64-bit asm NOPS for CONFIG_GENERIC_CPU
        x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier()
        revert "x86: tsc prevent time going backwards"
      950b0d28
    • virtio: remove overzealous BUG_ON. · 2557a933
      Rusty Russell authored
      The 'disable_cb' callback is designed as an optimization to tell the host
      we don't need callbacks now.  As it is not reliable, the debug check is
      overzealous: it can happen on two CPUs at the same time.  Document this.
      
      Even if it were reliable, the virtio_net driver doesn't disable
      callbacks on transmit so the START_USE/END_USE debugging reentrance
      protection can be easily tripped even on UP.
      
      Thanks to Balaji Rao for the bug report and testing.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      CC: Balaji Rao <balajirrao@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2557a933
    • x86: fix 64-bit asm NOPS for CONFIG_GENERIC_CPU · 871de939
      Suresh Siddha authored
      ASM_NOPs for 64-bit kernels with CONFIG_GENERIC_CPU are broken
      since the recent x86 nops merge. They were using GENERIC_NOPS,
      which truncate the upper 32 bits of %rsi because of the missing
      64-bit REX prefix.
      
      For now, fall back to K8 NOPS for the generic-CPU ASM NOPS, similar
      to the code before the faulty x86 nop merge.
      
      This should resolve the crash seen by Ingo on a test-system:
      
      BUG: unable to handle kernel paging request at 00000000d80d8ee8
      IP: [<ffffffff802121af>] save_i387_ia32+0x61/0xd8
      PGD b8e0067 PUD 51490067 PMD 0
      Oops: 0000 [1] SMP
      CPU 2
      Modules linked in:
      Pid: 3871, comm: distcc Not tainted 2.6.25-rc7-sched-devel.git-x86-latest.git #359
      RIP: 0010:[<ffffffff802121af>]  [<ffffffff802121af>] save_i387_ia32+0x61/0xd8
      RSP: 0000:ffff81003abd3cb8  EFLAGS: 00010246
      RAX: ffff810082e93400 RBX: 00000000ffc37f84 RCX: ffff8100d80d8ee0
      RDX: 0000000000000000 RSI: 00000000d80d8ee0 RDI: ffff810082e93400
      RBP: 00000000ffc37fdc R08: 00000000ffc37f88 R09: 0000000000000008
      R10: ffff81003abd2000 R11: 0000000000000000 R12: ffff810082e93400
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff81011fb12dc0(0063) knlGS:00000000f7f1a6c0
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 00000000d80d8ee8 CR3: 0000000076922000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process distcc (pid: 3871, threadinfo ffff81003abd2000, task ffff8100d80d8ee0)
      Stack:  ffff8100bb670380 ffffffff8026de50 0000000000000118 0000000000000002
       0000000000000002 ffff81003abd3e68 ffff81003abd3ed8 ffff81003abd3de8
       ffff81003abd3d18 ffffffff80229785 ffff8100d80d8ee0 ffff810001041280
      Call Trace:
       [<ffffffff8026de50>] ? __generic_file_aio_write_nolock+0x343/0x377
       [<ffffffff80229785>] ? update_curr+0x54/0x64
       [<ffffffff80227cd3>] ? ia32_setup_sigcontext+0x125/0x1d2
       [<ffffffff8022839f>] ? ia32_setup_frame+0x73/0x1a5
       [<ffffffff8020b2a5>] ? do_notify_resume+0x1aa/0x7db
       [<ffffffff8024ae8c>] ? getnstimeofday+0x31/0x85
       [<ffffffff80249858>] ? ktime_get_ts+0x17/0x48
       [<ffffffff80249933>] ? ktime_get+0xc/0x41
       [<ffffffff8024973e>] ? hrtimer_nanosleep+0x75/0xd5
       [<ffffffff80249261>] ? hrtimer_wakeup+0x0/0x21
       [<ffffffff8020bfbc>] ? int_signal+0x12/0x17
       [<ffffffff8030e6b3>] ? dummy_file_free_security+0x0/0x1
      
      Code: a6 08 05 00 00 f6 40 14 01 74 34 4c 89 e7 48 0f ae 07 48 8b 86 08 05 00 00 80 78 02 00 79 02 db e2 90 8d b4 26 00 00 00 00 89 f6 <48> 8b 46 08 83 60 14 fe 0f 20 c0 48 83 c8 08 0f 22 c0 eb 07 c6 
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      871de939
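The truncation described in this commit can be modelled in plain C. This is only a sketch of the architectural rule involved (in 64-bit mode, a write to a 32-bit register zero-extends into the full 64-bit register), not kernel code; the function names are made up for illustration. The bytes `89 f6` in the Code: line decode to `mov %esi,%esi`, and the register dump is consistent with this: RCX holds ffff8100d80d8ee0 while RSI holds 00000000d80d8ee0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "mov %esi,%esi" executed in 64-bit mode:
 * the 32-bit result is zero-extended, so the upper half of %rsi is lost. */
uint64_t mov_esi_esi(uint64_t rsi)
{
    return (uint64_t)(uint32_t)rsi;
}

/* Hypothetical model of the same move with a REX.W prefix
 * ("mov %rsi,%rsi"): all 64 bits are preserved, so it is a true no-op. */
uint64_t mov_rsi_rsi(uint64_t rsi)
{
    return rsi;
}
```

Feeding the task pointer from the oops (ffff8100d80d8ee0) through `mov_esi_esi()` yields the low 32 bits only, which matches the RSI value in the dump.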
    • x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier() · 4f41c94d
      Karsten Wiese authored
      In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it
      instead of smp_processor_id() in the call to set_cyc2ns_scale().
      This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update
      the intended cpu's cyc2ns.
      
      Related mail/thread: http://lkml.org/lkml/2007/12/7/130
      Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4f41c94d
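A minimal userspace sketch of the difference, assuming a per-CPU table of scale factors. The struct, the array, `current_cpu` (standing in for smp_processor_id()) and the fixed-point shift are all hypothetical and chosen only to make the example self-contained; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS     4
#define SCALE_SHIFT 10   /* assumed fixed-point shift, for illustration */

/* Hypothetical stand-in for the cpufreq notification payload. */
struct freq_change {
    unsigned int  cpu;      /* CPU whose frequency changed */
    unsigned long new_khz;  /* its new frequency in kHz */
};

uint64_t cyc2ns_scale[NR_CPUS];  /* one scale factor per CPU */
unsigned int current_cpu;        /* CPU the notifier happens to run on */

/* Recompute the cycles-to-nanoseconds scale for one specific CPU. */
void set_scale(unsigned long khz, unsigned int cpu)
{
    cyc2ns_scale[cpu] = ((uint64_t)1000000 << SCALE_SHIFT) / khz;
}

/* Before the fix: updates whichever CPU runs the notifier. */
void notifier_buggy(const struct freq_change *f)
{
    set_scale(f->new_khz, current_cpu);
}

/* After the fix: updates the CPU named in the notification. */
void notifier_fixed(const struct freq_change *f)
{
    set_scale(f->new_khz, f->cpu);
}
```

With the notifier running on CPU 0 and a change reported for CPU 2, `notifier_fixed()` writes `cyc2ns_scale[2]` and leaves CPU 0 alone, while `notifier_buggy()` would rescale CPU 0 instead.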
    • revert "x86: tsc prevent time going backwards" · 5b13d863
      Ingo Molnar authored
      revert:
      
      | commit 47001d60
      | Author: Thomas Gleixner <tglx@linutronix.de>
      | Date:   Tue Apr 1 19:45:18 2008 +0200
      |
      |     x86: tsc prevent time going backwards
      
      it has been identified to cause a suspend regression - and the
      commit fixes a longstanding bug that existed before 2.6.25 was
      opened - so it can wait some more until the effects are better
      understood.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5b13d863
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 4cac04dd
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
        fix endian lossage in forcedeth
        net/tokenring/olympic.c section fixes
        net: marvell.c fix sparse shadowed variable warning
        [VLAN]: Fix egress priority mappings leak.
        [TG3]: Add PHY workaround for 5784
        [NET]: srandom32 fixes for networking v2
        [IPV6]: Fix refcounting for anycast dst entries.
        [IPV6]: inet6_dev on loopback should be kept until namespace stop.
        [IPV6]: Event type in addrconf_ifdown is mis-used.
        [ICMP]: Ensure that ICMP relookup maintains status quo
      4cac04dd
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · e1c287b9
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
        [SPARC64]: Fix user accesses in regset code.
        [SPARC64]: Fix FPU saving in 64-bit signal handling.
      e1c287b9
  2. 06 Apr, 2008 12 commits
  3. 05 Apr, 2008 1 commit
  4. 04 Apr, 2008 20 commits
    • Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/upstream-linus · 6fdf5e67
      Linus Torvalds authored
      * 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/upstream-linus:
        [MIPS] Make KGDB compile on UP
        [MIPS] Pb1200: Fix header breakage
      6fdf5e67
    • David S. Miller's avatar
    • ipmi: change device node ordering to reflect probe order · abd24df8
      Carol Hebert authored
      In 2.6.14 a patch was merged which switched the order of the ipmi device
      naming from in-order-of-discovery over to reverse-order-of-discovery.
      
      So on systems with multiple BMC interfaces, the ipmi device names are being
      created in reverse order relative to how they are discovered on the system
      (e.g.  on an IBM x3950 multinode server with N nodes, the device name for the
      BMC in the first node is /dev/ipmiN-1 and the device name for the BMC in the
      last node is /dev/ipmi0, etc.).
      
      The problem is caused by the list handling routines chosen in dmi_scan.c.
      Using list_add() causes the multiple ipmi devices to be added to the device
      list using a stack-paradigm and so the ipmi driver subsequently pulls them off
      during initialization in LIFO order.  This patch changes the
      dmi_save_ipmi_device() list handling paradigm to a queue, thereby allowing the
      ipmi driver to build the ipmi device names in the order in which they are
      found on the system.
      Signed-off-by: Carol Hebert <cah@us.ibm.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      abd24df8
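The stack-versus-queue behaviour can be shown with a tiny userspace model of a circular doubly linked list. The list helpers below mimic the semantics of the kernel's list_add() and list_add_tail() but are reimplemented here for illustration; `struct ipmi_dev` and `first_index()` are hypothetical names, not the structures used in dmi_scan.c.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

void list_init(struct list_head *h) { h->next = h; h->prev = h; }

/* Link n between two adjacent nodes. */
static void insert_between(struct list_head *n,
                           struct list_head *prev, struct list_head *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Insert right after the head: newest entry is seen first (stack, LIFO). */
void list_add(struct list_head *n, struct list_head *head)
{
    insert_between(n, head, head->next);
}

/* Insert right before the head: oldest entry is seen first (queue, FIFO). */
void list_add_tail(struct list_head *n, struct list_head *head)
{
    insert_between(n, head->prev, head);
}

/* Hypothetical device record: remembers the order of discovery. */
struct ipmi_dev {
    int discovery_index;
    struct list_head node;
};

/* Discovery index of the first device a forward walk would visit. */
int first_index(const struct list_head *head)
{
    return container_of(head->next, struct ipmi_dev, node)->discovery_index;
}
```

After adding devices 0, 1 and 2 in discovery order, a walk from the head starts at device 2 when list_add() was used and at device 0 when list_add_tail() was used, which is the reordering the commit describes.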
    • mtd: fix broken state in CFI driver caused by FL_SHUTDOWN · fb6d080c
      Alexey Korolev authored
      The CFI driver in the 2.6.24 kernel is broken.  Even moderately intensive
      read/write operations cause incomplete writes, which lead to kernel panics
      in JFFS2.
      
      We investigated the issue - it is caused by a bug in the FL_SHUTDOWN parsing
      code.  Sometimes the chip returns -EIO as if it were in the FL_SHUTDOWN state
      when it should wait in FL_POINT (an error in the order of conditions).
      
      The following patch fixes the bug in state parsing code of CFI.  Also I've
      added comments to notify developers if they want to add new case in future.
      Signed-off-by: Alexey Korolev <akorolev@infradead.org>
      Reviewed-by: Joern Engel <joern@logfs.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fb6d080c
    • memory controller: make memory resource control aware of boot options · 4077960e
      Balbir Singh authored
      A boot option for the memory controller was discussed on lkml.  It is a good
      idea to add it, since it saves memory for people who want to turn off the
      memory controller.
      
      By default the option is on for the following two reasons:
      
      1. It provides compatibility with the current scheme where the memory
         controller turns on if the config option is enabled
      2. It allows for wider testing of the memory controller, once the config
         option is enabled
      
      We still allow the create, destroy callbacks to succeed, since they are not
      aware of boot options.  We do not populate the directory with memory resource
      controller specific files.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4077960e
    • cgroups: add cgroup support for enabling controllers at boot time · 8bab8dde
      Paul Menage authored
      The effects of cgroup_disable=foo are:
      
      - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
      - foo isn't visible as an individually mountable subsystem
      
      As a result there will only ever be one call to foo->create(), at init time;
      all processes will stay in this group, and the group will never be mounted on
      a visible hierarchy.  Any additional effects (e.g.  not allocating metadata)
      are up to the foo subsystem.
      
      This doesn't handle early_init subsystems (their "disabled" bit isn't set),
      but it could easily be extended to do so if any of the early_init systems
      wanted it - I think it would just involve some nastier parameter processing
      since it would occur before the command-line argument parser had been run.
      
      Hugh said:
      
        Ballpark figures, I'm trying to get this question out rather than
        processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
        to the affected paths, booting with cgroup_disable=memory cuts that back to
        1% overhead (due to slightly bigger struct page).
      
        I'm no expert on distros, they may have no interest whatever in
        CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
        without it, or apply the cgroup_disable=memory patches.
      
      Unix bench's execl test result on x86_64 was
      
      == just after boot without mounting any cgroup fs.==
      mem_cgroup=off : Execl Throughput       43.0     3150.1      732.6
      mem_cgroup=on  : Execl Throughput       43.0     2932.6      682.0
      ==
      
      [lizf@cn.fujitsu.com: fix boot option parsing]
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8bab8dde
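How a `cgroup_disable=foo` value might be turned into a per-subsystem "disabled" bit can be sketched in userspace C. Everything here is an assumption for illustration: the subsystem table, the function names, and the treatment of the value as a comma-separated list. It is not the kernel's parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem record with the "disabled" bit from the commit. */
struct subsys {
    const char *name;
    bool disabled;
};

struct subsys subsystems[] = {
    { "cpu",     false },
    { "memory",  false },
    { "cpuacct", false },
};

#define NR_SUBSYS (sizeof(subsystems) / sizeof(subsystems[0]))

/* Parse the value part of "cgroup_disable=...", assumed here to be a
 * comma-separated list of subsystem names, and mark each match disabled.
 * Unknown names are silently ignored. */
void parse_cgroup_disable(const char *value)
{
    while (*value) {
        size_t len = strcspn(value, ",");   /* length of this token */

        for (size_t i = 0; i < NR_SUBSYS; i++) {
            if (strlen(subsystems[i].name) == len &&
                strncmp(value, subsystems[i].name, len) == 0)
                subsystems[i].disabled = true;
        }

        value += len;
        if (*value == ',')
            value++;                        /* skip the separator */
    }
}
```

Calling `parse_cgroup_disable("memory")` sets only the memory entry's bit; code that mounts or lists subsystems would then skip entries whose bit is set, which corresponds to the two effects listed in the commit message.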
    • [MIPS] Make KGDB compile on UP · e64a3cfc
      Sergei Shtylyov authored
      Building UP kernel with KGDB enabled produces the following errors and warning
      (fatal due to -Werror in arch/mips/kernel/Makefile):
      
      In file included from arch/mips/kernel/gdb-stub.c:142:
      include/asm/smp.h:25:1: "raw_smp_processor_id" redefined
      In file included from include/linux/sched.h:69,
                       from arch/mips/kernel/gdb-stub.c:126:
      include/linux/smp.h:88:1: this is the location of the previous definition
      In file included from arch/mips/kernel/gdb-stub.c:142:
      include/asm/smp.h:62: error: redefinition of 'smp_send_reschedule'
      include/linux/smp.h:102: error: previous definition of 'smp_send_reschedule' was here
      include/asm/smp.h: In function `smp_send_reschedule':
      include/asm/smp.h:65: error: dereferencing pointer to incomplete type
      arch/mips/kernel/gdb-stub.c: At top level:
      arch/mips/kernel/gdb-stub.c:660: warning: 'kgdb_wait' defined but not used
      
      Fix the errors by not directly including <asm/smp.h> (which is already included
      by <linux/smp.h>) and the warning by enclosing kgdb_wait() in #ifdef CONFIG_SMP.
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      e64a3cfc
    • Sergei Shtylyov's avatar
      865ab875
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 · 3a143125
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
        x86: revert assign IRQs to hpet timer
        x86: tsc prevent time going backwards
        xen: Clear PG_pinned in release_{pt,pd}()
        xen: Do not pin/unpin PMD pages
        xen: refactor xen_{alloc,release}_{pt,pd}()
        x86, agpgart: scary messages are fortunately obsolete
        xen: fix grant table bug
        x86: fix breakage of vSMP irq operations
        x86: print message if nmi_watchdog=2 cannot be enabled
        x86: fix nmi_watchdog=2 on Pentium-D CPUs
      3a143125
    • m68k: update defconfigs for 2.6.25 · a1aa758d
      Geert Uytterhoeven authored
      Long overdue update of the m68k defconfigs
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a1aa758d
    • m68k: use KBUILD_DEFCONFIG · ef85ecbf
      Adrian Bunk authored
      The default defconfig should be one from arch/m68k/configs/
      
      arch/m68k/defconfig was not exactly identical to amiga_defconfig but
      also considering how long they have been without any update that doesn't
      seem to have been on purpose.
      Signed-off-by: Adrian Bunk <adrian.bunk@movial.fi>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef85ecbf
    • Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · 7a5ac8de
      Linus Torvalds authored
      * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
        pata_ali: disable ATAPI DMA
        libata: ATA_12/16 doesn't fall into ATAPI_MISC
        libata: uninline atapi_cmd_type()
        libata: fix IDENTIFY order in ata_bus_probe()
      7a5ac8de
    • Be more careful about marking buffers dirty · 1be62dc1
      Linus Torvalds authored
      Mikulas Patocka noted that the optimization where we check if a buffer
      was already dirty (and we avoid re-dirtying it) was not really SMP-safe.
      
      Since the read of the old status was not synchronized with anything, an
      aggressive CPU re-ordering of memory accesses might have moved that read
      up to before the data was even written to the buffer, and another CPU
      that cleaned it again, causing the newly dirty state to never actually
      hit the disk.
      
      Admittedly this would probably never trigger in practice, but it's still
      wrong.
      
      Mikulas sent a patch that fixed the problem, but I dislike the subtlety
      of the whole optimization, so this is an alternate fix that is more
      explicit about the particular SMP ordering for the optimization, and
      separates out the speculative reads of the buffer state into its own
      conditional (and makes the memory barrier only happen if we are likely
      to actually hit the optimized case in the first place).
      
      I considered removing the optimization entirely, but Andrew argued for
      its continued existence. I'm a push-over.
      
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1be62dc1
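The shape of the fix described above (a speculative read in its own conditional, a barrier only on the likely-already-dirty path, then an atomic test-and-set) can be sketched with C11 atomics. This is a userspace model under stated assumptions: `struct buffer`, `mark_dirty()` and the `page_dirtied` counter are hypothetical, and a sequentially consistent fence stands in for the kernel's memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical buffer: a dirty flag plus a counter that stands in for
 * the more expensive "propagate dirtiness to the page" work. */
struct buffer {
    atomic_bool dirty;
    int page_dirtied;
};

/* Returns true if this call made the clean -> dirty transition. */
bool mark_dirty(struct buffer *b)
{
    /* Speculative, unordered read: only a hint that we may skip work. */
    if (atomic_load_explicit(&b->dirty, memory_order_relaxed)) {
        /* Order the caller's earlier data writes before the re-check,
         * so the flag read cannot be satisfied from before them. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load_explicit(&b->dirty, memory_order_relaxed))
            return false;          /* still dirty: nothing to do */
    }

    /* Slow path: atomically set the flag and learn its old value. */
    if (!atomic_exchange(&b->dirty, true)) {
        b->page_dirtied++;
        return true;
    }
    return false;
}
```

The barrier is paid only when the first read already saw the flag set, which is the case the optimization targets; a clean buffer goes straight to the atomic exchange.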
    • parport_pc: make sure to release IO ports after probing for IT87XX · 4ed91901
      Linus Torvalds authored
      Commit f63fd7e2 ("parport_pc: detection
      for SuperIO IT87XX POST") only released the IO port region on success,
      not when the probe for the IT87XX chip failed.
      
      That caused not only a reserved region to leak, but also caused an oops
      when the driver module was unloaded and somebody tried to cat
      /proc/ioports - because the string that was assigned to the IO port
      region was a static string in the module virtual address area.
      Reported-by: Lubos Lunak <l.lunak@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Petr Cvek <petr.cvek@tul.cz>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4ed91901
    • fix endian lossage in forcedeth · 30ecce90
      Al Viro authored
      a) if you initialize something with le32_to_cpu(...), then |= it
      with host-endian and feed to cpu_to_le32(), it's most definitely
      *not* __le32.  As sparse would've told you...
      
      b) the whole sequence is |= cpu_to_le32(host-endian constant)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
      30ecce90
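Point (b) can be checked with a small userspace model: OR-ing a host-endian constant into a value after converting it from little-endian, then converting back, gives the same bits as OR-ing in the little-endian form of the constant directly. The helpers `to_le32()` and `from_le32()` are hypothetical stand-ins for the kernel's cpu_to_le32() and le32_to_cpu(), written portably for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Runtime endianness probe, to keep the sketch portable. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Stand-ins for cpu_to_le32() / le32_to_cpu(); the conversion is its
 * own inverse, so both directions share one implementation. */
uint32_t to_le32(uint32_t host)   { return host_is_little_endian() ? host : swap32(host); }
uint32_t from_le32(uint32_t le)   { return to_le32(le); }

/* Round trip: the intermediate value is host-endian, not little-endian. */
uint32_t set_flags_roundtrip(uint32_t le_word, uint32_t host_flags)
{
    uint32_t host = from_le32(le_word);   /* host-endian from here on */
    host |= host_flags;
    return to_le32(host);
}

/* Direct form: convert the constant once and OR it in. */
uint32_t set_flags_direct(uint32_t le_word, uint32_t host_flags)
{
    return le_word | to_le32(host_flags);
}
```

Both functions return identical results because a byte swap only permutes bits and therefore distributes over bitwise OR; the direct form simply never holds a host-endian value in a variable that is annotated as little-endian.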
    • net/tokenring/olympic.c section fixes · e28e3a61
      Adrian Bunk authored
      My previous section fix only turned one section problem into another
      section problem.
      
      This patch fixes it for real.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
      e28e3a61
    • net: marvell.c fix sparse shadowed variable warning · 5da4e37e
      Harvey Harrison authored
      The other if blocks don't redeclare temp, remove the redeclaration in
      the final if() block.
      
      drivers/net/phy/marvell.c:214:7: warning: symbol 'temp' shadows an earlier one
      drivers/net/phy/marvell.c:160:6: originally declared here
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
      5da4e37e
    • [VLAN]: Fix egress priority mappings leak. · 23556323
      Pavel Emelyanov authored
      These entries are allocated in vlan_dev_set_egress_priority,
      but are never released and leak on vlan device removal.
      
      Drop these in vlan's ->uninit callback - after the device is
      brought down and everyone is notified that it is going to
      be unregistered.
      
      Found during testing vlan netnsization patchset.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23556323
    • x86: revert assign IRQs to hpet timer · 5761d64b
      Thomas Gleixner authored
      The commits:
      
      commit 37a47db8
      Author: Balaji Rao <balajirrao@gmail.com>
      Date:   Wed Jan 30 13:30:03 2008 +0100
      
          x86: assign IRQs to HPET timers, fix
      
      and
      
      commit e3f37a54
      Author: Balaji Rao <balajirrao@gmail.com>
      Date:   Wed Jan 30 13:30:03 2008 +0100
      
          x86: assign IRQs to HPET timers
      
      have been identified to cause a regression on some platforms due to
      the assignment of legacy IRQs which makes the legacy devices
      connected to those IRQs dysfunctional.
      
      Revert them.
      
      This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10382
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5761d64b
    • x86: tsc prevent time going backwards · 47001d60
      Thomas Gleixner authored
      We already catch most of the TSC problems by sanity checks, but there
      is a subtle bug which has been in the code for ever. This can cause
      time jumps in the range of hours.
      
      This was reported in:
           http://lkml.org/lkml/2007/8/23/96
      and
           http://lkml.org/lkml/2008/3/31/23
      
      I was able to reproduce the problem with a gettimeofday loop test on a
      dual core and a quad core machine which both have synchronized
      TSCs. The TSCs seem not to be perfectly in sync though, but the
      kernel is not able to detect the slight delta in the sync check. Still
      there exists an extremely small window where this delta can be observed
      with a real big time jump. So far I was only able to reproduce this
      with the vsyscall gettimeofday implementation, but in theory this
      might be observable with the syscall based version as well.
      
      CPU 0 updates the clock source variables under xtime/vsyscall lock and
      CPU1, where the TSC is slightly behind CPU0, is reading the time right
      after the seqlock was unlocked.
      
      The clocksource reference data was updated with the TSC from CPU0 and
      the value which is read from TSC on CPU1 is less than the reference
      data. This results in a huge delta value due to the unsigned
      subtraction of the TSC value and the reference value. This algorithm
      can not be changed due to the support of wrapping clock sources like
      pm timer.
      
      The huge delta is converted to nanoseconds and added to xtime, which
      is then observable by the caller. The next gettimeofday call on CPU1
      will show the correct time again as now the TSC has advanced above the
      reference value.
      
      To prevent this TSC specific wreckage we need to compare the TSC value
      against the reference value and return the latter when it is larger
      than the actual TSC value.
      
      I pondered to mark the TSC unstable when the readout is smaller than
      the reference value, but this would render an otherwise good and fast
      clocksource unusable without a real good reason.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      47001d60
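The arithmetic behind the time jump, and the clamp the commit message proposes, can be sketched in a few lines of C. The function names and the sample cycle counts are hypothetical; the point is only that unsigned subtraction wraps when the TSC readout is slightly below the reference value, and that returning the reference value in that case yields a zero delta instead.

```c
#include <assert.h>
#include <stdint.h>

/* Delta as the generic clocksource code computes it: an unsigned
 * subtraction, which must stay unsigned to support wrapping counters. */
uint64_t cycles_delta(uint64_t now, uint64_t cycle_last)
{
    return now - cycle_last;
}

/* Sketch of the proposed TSC-specific guard: never report a readout
 * that is behind the reference value recorded at the last update. */
uint64_t read_tsc_clamped(uint64_t tsc_now, uint64_t cycle_last)
{
    return tsc_now >= cycle_last ? tsc_now : cycle_last;
}
```

If the reference was taken at 1000 cycles on one CPU and another CPU reads 999, the raw delta wraps to 2^64 - 1 cycles, which is the huge jump described above; with the clamp the delta is 0, and a later readout of 1005 gives the expected 5.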