Commits · bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 · Kirill Smelkov / linux

30 Jan, 2008 40 commits

NTP: correct inconsistent ntp interval/tick_length usage · bbe4d18a

john stultz authored Jan 30, 2008

I recently noticed on one of my boxes that when synched with an NTP
server, the drift value reported for the system was ~283ppm. While in
some cases, clock hardware can be that bad, it struck me as unusual as
the system was using the acpi_pm clocksource, which is one of the more
trustworthy and accurate clocksources on x86 hardware.

I brought up another system and let it sync to the same NTP server, and
I noticed a similar 280some ppm drift.

In looking at the code, I found that the acpi_pm's constant frequency
was being computed correctly at boot-up, however once the system was up,
even without the ntp daemon running, the clocksource's frequency was
being modified by the clocksource_adjust() function.

Digging deeper, I realized that in the code that keeps track of how much
the clocksource is skewing from the ntp desired time, we were using
different lengths to establish how long an time interval was.

The clocksource was being setup with the following interval:
	NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ

While the ntp code was using the tick_length_base value:
	tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ)
					/NTP_INTERVAL_FREQ

The subtle difference is:
	(tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC

This difference in calculation was causing the clocksource correction
code to apply a correction factor to the clocksource so the two
intervals were the same, however this results in the actual frequency of
the clocksource to be made incorrect. I believe this difference would
affect all clocksources, although to differing degrees depending on the
clocksource resolution.

The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1,
so my apologies for the mistake, and for not noticing it until now.

The following patch, corrects the clocksource's initialization code so
it uses the same interval length as the code in ntp.c. After applying
this patch, the drift value for the same system went from ~283ppm to
only 2.635ppm.

I believe this patch to be good, however it does affect all arches and
I've only tested on x86, so some caution is advised. I do think it would
be a likely candidate for a stable 2.6.24.x release.

Any thoughts or feedback would be appreciated.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

bbe4d18a

x86: assign IRQs to HPET timers, fix · 37a47db8

Balaji Rao authored Jan 30, 2008

Looks like IRQ 31 is assigned to timer 3, even without the patch!
I wonder who wrote the number 31. But the manual says that it is
zero by default.

I think we should check whether the timer has been allocated an IRQ before
proceeding to assign one to it.  Here is a patch that does this.
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

37a47db8

x86: assign IRQs to HPET timers · e3f37a54

Balaji Rao authored Jan 30, 2008

The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
device. This patch fixes it by allocating IRQs to timer blocks in the HPET.

arch/x86/kernel/hpet.c |   13 +++++--------
drivers/char/hpet.c    |   45 ++++++++++++++++++++++++++++++++++++++-------
include/linux/hpet.h   |    2 +-
3 files changed, 44 insertions(+), 16 deletions(-)
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

e3f37a54

x86: make clockevents more robust · 45fe4fe1

Ingo Molnar authored Jan 30, 2008

detect zero event-device multiplicators - they then cause
division-by-zero crashes if a clockevent has been initialized
incorrectly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

45fe4fe1

x86: unregister PIT clocksource when PIT is disabled · 1a0c009a

Thomas Gleixner authored Jan 30, 2008

The following scenario might leave PIT as a disfunctional clock source:

    PIT is registered as clocksource
    PM_TIMER is registered as clocksource and enables highres/dyntick mode
    PIT is switched to oneshot mode
    -> now the readout of PIT is bogus, but the user might select PIT
    via the sysfs override, which would break the box as the time
    readout is unusable.

Unregister the PIT clocksource when the PIT clock event device is switched
into shutdown / oneshot mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1a0c009a

clocksource: add unregister function to disable unusable clocksources · 4713e22c

Thomas Gleixner authored Jan 30, 2008

On x86 the PIT might become an unusable clocksource. Add an unregister
function to provide a possibilty to remove the PIT from the list of
available clock sources.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4713e22c

x86: restrict PIT clocksource usage · 316da3b3

Thomas Gleixner authored Jan 30, 2008

PIT clocksource is registered unconditionally even when HPET is enabled
or when PIT is replaced by the local APIC timer. In both cases PIT can
not be used as it is stopped and the readout would be stale.

Prevent registering PIT in those cases.

patch depends on:

  x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

316da3b3

x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too · df619e6b

Ingo Molnar authored Jan 30, 2008

offer is_hpet_enabled() on !CONFIG_HPET_TIMER too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

df619e6b

clocksource: make clocksource watchdog cycle through online CPUs · 1ada5cba

Andi Kleen authored Jan 30, 2008

This way it checks if the clocks are synchronized between CPUs too.
This might be able to detect slowly drifting TSCs which only
go wrong over longer time.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

1ada5cba

clocksource.c: use init_timer_deferrable for clocksource_watchdog · 1077f5a9

Parag Warudkar authored Jan 30, 2008

clocksource_watchdog can use a deferrable timer - reduces wakeups from
idle per second.
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

1077f5a9

time: fold __get_realtime_clock_ts() into getnstimeofday() · efd9ac86

Geert Uytterhoeven authored Jan 30, 2008

  - getnstimeofday() was just a wrapper around __get_realtime_clock_ts()
  - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday()
  - Fix bogus reference to get_realtime_clock_ts(), which never existed
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

efd9ac86

clocksource: make CLOCKSOURCE_MASK bullet-proof · 1d76c262

Atsushi Nemoto authored Jan 30, 2008

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

1d76c262

timer: clean up tick-broadcast.c · 186e3cb8

Thomas Gleixner authored Jan 30, 2008

clean up tick-broadcast.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

186e3cb8

time: more timer related cleanups · b10db7f0

Pavel Machek authored Jan 30, 2008

I was confused by FSEC = 10^15 NSEC statement, plus small whitespace
fixes. When there's copyright, there should be GPL.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

b10db7f0

time: timer cleanups · 4c9dc641

Pavel Machek authored Jan 30, 2008

Small cleanups to tick-related code. Wrong preempt count is followed
by BUG(), so it is hardly KERN_WARNING.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

4c9dc641

time: clean hungarian notation from timers · a6fa8e5a

Pavel Machek authored Jan 30, 2008

Clean up hungarian notation from timer code.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

a6fa8e5a

kobj: fix threshold_init_device/kobject_uevent_env oops · 213eca7f

Greg KH authored Jan 30, 2008

the logic in this function is just crazy.  It's recursive, but we
can circumvent the creation for the kobject and whole creation of the
threshold_block if some conditions are met.  That's why we see the
allocate_threshold_blocks so many times in the callstack, yet only a few
kobjects created.

Then we blow up in kobject_uevent_env() on the first debug printk.
Which means that we are just passing in garbage.

Man, this is one time that comments in code would have been very nice to
have, and why forward goto's into major code blocks are just evil...
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

213eca7f

Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 · 85004cc3

Linus Torvalds authored Jan 30, 2008

* git://git.linux-nfs.org/pub/linux/nfs-2.6: (118 commits)
  NFSv4: Iterate through all nfs_clients when the server recalls a delegation
  NFSv4: Deal more correctly with duplicate delegations
  NFS: Fix a potential race between umount and nfs_access_cache_shrinker()
  NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode
  nfs: convert NFS_*(inode) helpers to static inline
  nfs: obliterate NFS_FLAGS macro
  NFS: Address memory leaks in the NFS client mount option parser
  nfs4: allow nfsv4 acls on non-regular-files
  NFS: Optimise away the sigmask code in aio/dio reads and writes
  SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
  SUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create()
  SUNRPC: Clean up block comment preceding rpcb_getport_sync()
  SUNRPC: Use appropriate argument types in rpcb client
  SUNRPC: rpcb_getport_sync() should use built-in hostname generator
  SUNRPC: Clean up functions that free address_strings array
  NFS: NFS version number is unsigned
  NLM: Fix a bogus 'return' in nlmclnt_rpc_release
  NLM: Introduce an arguments structure for nlmclnt_init()
  NLM/NFS: Use cached nlm_host when calling nlmclnt_proc()
  NFS: Invoke nlmclnt_init during NFS mount processing
  ...

85004cc3

as-iosched: fix double locking bug in as_merged_requests() · 149a051f

Jens Axboe authored Jan 29, 2008

If the two requests belong to the same io context, we will attempt
to lock the same lock twice. But swapping contexts is pointless in
that case, so just check for rioc == nioc before doing the double
lock and copy.
Tested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

149a051f

NFSv4: Iterate through all nfs_clients when the server recalls a delegation · 3fbd67ad

Trond Myklebust authored Jan 26, 2008

The same delegation may have been handed out to more than one nfs_client.
Ensure that if a recall occurs, we return all instances.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3fbd67ad

NFSv4: Deal more correctly with duplicate delegations · 57bfa891

Trond Myklebust authored Jan 25, 2008

If a (broken?) server hands out two different delegations for the same
file, then we should return one of them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

57bfa891

NFS: Fix a potential race between umount and nfs_access_cache_shrinker() · 6f23e387
Trond Myklebust authored Jan 25, 2008
```
Thanks to Yawei Niu for spotting the race.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
```
6f23e387

NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode · e6f81075

Trond Myklebust authored Jan 24, 2008

Otherwise, there is a potential deadlock if the last dput() from an NFSv4
close() or other asynchronous operation leads to nfs_clear_inode calling
the synchronous delegreturn.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e6f81075

nfs: convert NFS_*(inode) helpers to static inline · 99fadcd7

Benny Halevy authored Jan 23, 2008

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

99fadcd7

nfs: obliterate NFS_FLAGS macro · 3a10c30a

Benny Halevy authored Jan 23, 2008

use NFS_I(inode)->flags instead
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3a10c30a

NFS: Address memory leaks in the NFS client mount option parser · fc601477

Chuck Lever authored Jan 16, 2008

David Howells noticed that repeating the same mount option twice during an
NFS mount request can result in orphaned memory in certain cases.

Only the client_address and mount_server.hostname strings are initialized
in the mount parsing loop, so those appear to be the only two pointers that
might be written over by repeating a mount option. The strings in the
nfs_server section of the nfs_parsed_mount_data structure are set only once
after the options are parsed, thus these are not susceptible to being
overwritten.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fc601477

nfs4: allow nfsv4 acls on non-regular-files · 3d1c5508

J. Bruce Fields authored Jan 15, 2008

The rfc doesn't give any reason it shouldn't be possible to set an
attribute on a non-regular file.  And if the server supports it, then it
shouldn't be up to us to prevent it.

Thanks to Erez for the report and Trond for further analysis.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3d1c5508

NFS: Optimise away the sigmask code in aio/dio reads and writes · f3c391e8

Trond Myklebust authored Jan 15, 2008

There are no interruptible waits for asynchronous RPC tasks, so we don't
need to wrap calls to rpc_run_task() with an
rpc_clnt_sigmask/rpc_clnt_unsigmask pair.

Instead we can wrap the wait_for_completion_interruptible() in
nfs_direct_wait(). This means that we completely optimise away sigmask
setting for the case of non-blocking aio/dio.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f3c391e8

SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls · 34f5b466

Trond Myklebust authored Jan 15, 2008

The caller will never sleep in rpc_execute, so don't bother setting the
sigmask.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

34f5b466

SUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create() · afc88112

Chuck Lever authored Jan 14, 2008

The variable "sin" is a pointer, so sizeof(sin) is the size of a pointer,
not the size of thing that sin points to.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

afc88112

SUNRPC: Clean up block comment preceding rpcb_getport_sync() · 67d60213

Chuck Lever authored Jan 14, 2008

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

67d60213

SUNRPC: Use appropriate argument types in rpcb client · f1ec08cb

Chuck Lever authored Jan 14, 2008

Clean up: Follow recommendations of Chapter 5 of Documentation/CodingStyle
and use "u32" instead of "__u32" for types in definitions that are not
shared with user space.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f1ec08cb

SUNRPC: rpcb_getport_sync() should use built-in hostname generator · b91e101f

Chuck Lever authored Jan 14, 2008

rpc_create() can already fill in the hostname with a string representation
of the server's IP address, so remove redundant logic in in
rpcb_getport_sync() that does that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b91e101f

SUNRPC: Clean up functions that free address_strings array · 33e01dc7

Chuck Lever authored Jan 14, 2008

Clean up: document the rule (kfree) and the exceptions
(RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in
a transport's address_strings array.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

33e01dc7

NFS: NFS version number is unsigned · c0e07cb6

Chuck Lever authored Jan 14, 2008

RPC protocol version numbers are unsigned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c0e07cb6

NLM: Fix a bogus 'return' in nlmclnt_rpc_release · 65fdf7d2
Trond Myklebust authored Jan 11, 2008
```
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
```
65fdf7d2

NLM: Introduce an arguments structure for nlmclnt_init() · 883bb163

Chuck Lever authored Jan 15, 2008

Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the
new nfs_client_initdata structure.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

883bb163

NLM/NFS: Use cached nlm_host when calling nlmclnt_proc() · 1093a60e

Chuck Lever authored Jan 11, 2008

Now that each NFS mount point caches its own nlm_host structure, it can be
passed to nlmclnt_proc() for each lock request. By pinning an nlm_host for
each mount point, we trade the overhead of looking up or creating a fresh
nlm_host struct during every NLM procedure call for a little extra memory.

We also restrict the nlmclnt_proc symbol to limit the use of this call to
in-tree modules.

Note that nlm_lookup_host() (just removed from the client's per-request
NLM processing) could also trigger an nlm_host garbage collection. Now
client-side nlm_host garbage collection occurs only during NFS mount
processing. Since the NFS client now holds a reference on these nlm_host
structures, they wouldn't have been affected by garbage collection
anyway.

Given that nlm_lookup_host() reorders the global nlm_host chain after
every successful lookup, and that a garbage collection could be triggered
during the call, we've removed a significant amount of per-NLM-request
CPU processing overhead.

Sidebar: there are only a few remaining references to the internals of
NFS inodes in the client-side NLM code. The only references I found are
related to extracting or comparing the inode's file handle via NFS_FH().
One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1093a60e

NFS: Invoke nlmclnt_init during NFS mount processing · 9289e7f9

Chuck Lever authored Jan 11, 2008

Cache an appropriate nlm_host structure in the NFS client's mount point
metadata for later use.

Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if
nfs_start_lockd() returns a non-zero value, its callers ensure that the
mount request fails outright.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9289e7f9

NLM: Introduce external nlm_host set-up and tear-down functions · 52c4044d

Chuck Lever authored Jan 11, 2008

We would like to remove the per-lock-operation nlm_lookup_host() call from
nlmclnt_proc().

The new architecture pins an nlm_host structure to each NFS client
superblock that has the "lock" mount option set.  The NFS client passes
in the pinned nlm_host structure during each call to nlmclnt_proc().  NFS
client unmount processing "puts" the nlm_host so it can be garbage-
collected later.

This patch introduces externally callable NLM functions that handle
mount-time nlm_host set up and tear-down.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

52c4044d