- 05 Feb, 2007 34 commits
-
-
Steven Whitehouse authored
This patch doesn't make any changes to the ordering of the various operations related to glocking, but it does tidy up the calls to the glops.c functions to make the structure more obvious. The two functions gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be made static within glock.c since they are used by every set of glock operations. The xmote_th and drop_th glock operations then become optional and, where present, are called from those two functions in glock.c respectively. Also, the go_sync operation isn't needed since it can easily be replaced by calls to xmote_bh and drop_bh respectively. The result is that glops.c no longer (confusingly) calls back into routines in glock.c, and the glock operations lose one member. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Patrick Caulfield authored
This patch removes some redundant fields from the connection structure and adds some lockdep annotation to remove spurious warnings. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
Here is a patch for GFS2 to remove the local exclusive flag. In the places it was used, mutexes are always held earlier in the call path, so it appears redundant in the LM_ST_SHARED case. Also, the GFS2 holders were setting local exclusive in any case where the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock code where the flag was tested have been replaced with tests for the lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well as globally exclusive). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
This is never used, so we might as well remove it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
The "greedy" code was an attempt to retain glocks for a minimum length of time when they relate to mmap()ed files. The current implementation of this feature is not, however, ideal in that it required allocating memory in order to do this and its overly complicated. It also misses the mark by ignoring the other I/O operations which are just as likely to suffer from the same problem. So the plan is to remove this now and then add the functionality back as part of the glock state machine at a later date (and thus take into account all the possible users of this feature) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
Here is something I spotted (while looking for something entirely different) the other day. Rather than using a completion in each and every struct gfs2_holder, this removes it in favour of hashed wait queues, thus saving a considerable amount of memory both on the stack (where a number of gfs2_holder structures are allocated) and in particular in the gfs2_inode, which has 8 gfs2_holder structures embedded within it. As a result, on x86_64 the gfs2_inode shrinks from 2488 bytes to 1912 bytes, a saving of 576 bytes per inode (no, that's not a typo!). In practice we get an even better result, since now that a gfs2_inode is under the 2048 byte barrier we get two per 4k slab page, effectively halving the amount of memory required to store gfs2_inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
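To illustrate the data-structure change (shared, hashed wait channels instead of a synchronisation primitive embedded in every holder), here is a minimal user-space sketch; the table size, hash and names are made up and this is not the GFS2 code itself:

    #include <pthread.h>
    #include <stdint.h>

    #define WAIT_TABLE_SIZE 64          /* illustrative table size */

    /* One small shared wait channel; many holders hash onto each slot,
     * so no holder needs to embed its own completion/condvar. */
    struct wait_slot {
        pthread_mutex_t lock;
        pthread_cond_t cond;
    };

    static struct wait_slot wait_table[WAIT_TABLE_SIZE];

    static void wait_table_init(void)
    {
        int i;

        for (i = 0; i < WAIT_TABLE_SIZE; i++) {
            pthread_mutex_init(&wait_table[i].lock, NULL);
            pthread_cond_init(&wait_table[i].cond, NULL);
        }
    }

    /* Hash a holder's address down to a slot in the shared table. */
    static struct wait_slot *slot_for(const void *holder)
    {
        uintptr_t p = (uintptr_t)holder;

        return &wait_table[(p >> 4) % WAIT_TABLE_SIZE];
    }

    /* Sleep until the "granted" flag inside the holder becomes non-zero. */
    static void holder_wait(const void *holder, volatile int *granted)
    {
        struct wait_slot *ws = slot_for(holder);

        pthread_mutex_lock(&ws->lock);
        while (!*granted)
            pthread_cond_wait(&ws->cond, &ws->lock);
        pthread_mutex_unlock(&ws->lock);
    }

    /* Grant side: set the flag and wake everyone sharing the slot. */
    static void holder_wake(const void *holder, volatile int *granted)
    {
        struct wait_slot *ws = slot_for(holder);

        pthread_mutex_lock(&ws->lock);
        *granted = 1;
        pthread_cond_broadcast(&ws->cond);  /* slot is shared, wake all */
        pthread_mutex_unlock(&ws->lock);
    }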
-
Steven Whitehouse authored
This removes an unused sysfs tunable parameter. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
This removes the extra filldir callback which gfs2 was using to enclose an attempt at readahead for inodes during readdir. The code was too complicated, it hurt performance badly when the getdents64/readdir call wasn't followed by stat(), and it didn't even get the readahead right all the time when it was. As a result, on my test box an "ls" of a directory containing 250000 files fell from about 7 mins (freshly mounted, so nothing cached) to between about 15 and 25 seconds. When the directory content was cached, the time taken fell from about 3 mins to about 4 or 5 seconds. Interestingly, in the cached case, running "ls -l" once reduced the time taken for subsequent runs of "ls" to about 6 secs even without this patch. It turns out that there was a special case of glocks being used for prefetching the metadata, but because of the timeouts for these locks (set to 10 secs) the metadata was being timed out before it was used, and thus the prefetch code was constantly trying to prefetch the same data over and over. Calling "ls -l" meant that the inodes were brought into memory, and once the inodes are cached the glocks are not disposed of until the inodes are pushed out of the cache, thus extending the lifetime of the glocks and bringing down the time for subsequent runs of "ls" considerably. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
It occurred to me that although a gfs2-specific writepages for ordered writes and journaled data would be tricky, by hooking writepages only for "data=writeback" mounts we could take advantage of not needing buffer heads (we don't use them on the read side, nor have we for some time) and create much larger I/Os for the block layer. Using blktrace both before and after, it's possible to see that for large I/Os, most of the requests generated through writepages are now 1024 sectors after this patch is applied as opposed to 8 sectors before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
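As a rough illustration of the approach (not the actual GFS2 implementation), a filesystem of that era could hook ->writepages for its writeback-only case roughly like this, letting mpage_writepages() batch pages into large bios; the myfs_* names are invented:

    #include <linux/fs.h>
    #include <linux/mpage.h>
    #include <linux/writeback.h>

    /* Assumed to exist elsewhere: the fs's usual get_block routine. */
    extern int myfs_get_block(struct inode *inode, sector_t lblock,
                              struct buffer_head *bh_result, int create);

    /*
     * Safe only for "data=writeback" style operation: there is no ordered
     * or journaled data to worry about, so dirty pages can be handed to
     * the block layer in large merged requests rather than one buffer
     * head at a time.
     */
    static int myfs_writepages(struct address_space *mapping,
                               struct writeback_control *wbc)
    {
        return mpage_writepages(mapping, wbc, myfs_get_block);
    }

    static const struct address_space_operations myfs_writeback_aops = {
        /* .readpage, .writepage, etc. omitted */
        .writepages = myfs_writepages,
    };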
-
David Teigland authored
If master recovery happens on an rsb in one recovery sequence, then that sequence is aborted before lock recovery happens, then in the next sequence we rely on the previous master recovery (which may now be invalid due to another node ignoring a lookup result) and go on to do the lock recovery, where we get stuck due to an invalid master value.

- recovery cycle begins: master of rsb X has left
- nodes A and B send node C an rcom lookup for X to find the new master
- C gets lookup from B first, sets B as new master, and sends reply back to B
- C gets lookup from A next, and sends reply back to A saying B is master
- A gets lookup reply from C and sets B as the new master in the rsb
- recovery cycle on A, B and C is aborted to start a new recovery
- B gets lookup reply from C and ignores it since there's a new recovery
- recovery cycle begins: some other node has joined
- B doesn't think it's the master of X so it doesn't rebuild it in the directory
- C looks up the master of X, no one is master, so it becomes new master
- B looks up the master of X, finds it's C
- A believes that B is the master of X, so it sends its lock to B
- B sends an error back to A
- A resends
- this repeats forever; the incorrect master value on A is never corrected

The fix is to do master recovery on an rsb that still has the NEW_MASTER flag set from an earlier recovery sequence, and therefore didn't complete lock recovery. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
David Teigland authored
When a user process exits, we clear all the locks it holds. There is a problem, though, with locks that the process had begun unlocking before it exited. We couldn't find the lkb's that were in the process of being unlocked remotely, to flag that they are DEAD. To solve this, we move lkb's being unlocked onto a new list in the per-process structure that tracks what locks the process is holding. We can then go through this list to flag the necessary lkb's when clearing locks for a process when it exits. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Patrick Caulfield authored
This patch converts the DLM TCP lowcomms to use workqueues rather than its own daemon functions. This simultaneously removes a lot of code and makes it more scalable on multi-processor machines. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
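The general shape of such a conversion (hand-rolled daemon threads replaced by work items queued from the socket callbacks) looks roughly like the sketch below; the structure and names are illustrative rather than the actual DLM lowcomms code:

    #include <linux/errno.h>
    #include <linux/kernel.h>
    #include <linux/workqueue.h>

    /* Hypothetical per-connection state. */
    struct connection {
        struct work_struct swork;       /* send work */
        struct work_struct rwork;       /* receive work */
        /* ... socket, send queue, etc. ... */
    };

    static struct workqueue_struct *send_wq;
    static struct workqueue_struct *recv_wq;

    static void process_send(struct work_struct *work)
    {
        struct connection *con = container_of(work, struct connection, swork);

        pr_debug("draining send queue for connection %p\n", con);
    }

    static void process_recv(struct work_struct *work)
    {
        struct connection *con = container_of(work, struct connection, rwork);

        pr_debug("pulling data off socket for connection %p\n", con);
    }

    static void connection_init(struct connection *con)
    {
        INIT_WORK(&con->swork, process_send);
        INIT_WORK(&con->rwork, process_recv);
    }

    static int lowcomms_start_workqueues(void)
    {
        send_wq = create_workqueue("example_send");
        if (!send_wq)
            return -ENOMEM;
        recv_wq = create_workqueue("example_recv");
        if (!recv_wq) {
            destroy_workqueue(send_wq);
            return -ENOMEM;
        }
        return 0;
    }

    /* Socket callbacks just queue work; the workqueue threads scale with
     * the machine instead of funnelling everything through one daemon. */
    static void data_ready_cb(struct connection *con)
    {
        queue_work(recv_wq, &con->rwork);
    }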
-
Adrian Bunk authored
On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote: >... > Changes since 2.6.20-rc3-mm1: >... > git-gfs2-nmw.patch >... > git trees >... This patch makes the needlessly global gfs2_change_nlink_i() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Robert Peterson authored
This is for Red Hat bugzilla bz #222302: Moving a virtual IP from node to node between two NFS-over-GFS2 servers was causing one of the GFS2 servers to become confused and reference a deleted inode. The problem was due to vfs dentries that did not reference the gfs2_dops and therefore never called the gfs2 revalidate code to revalidate a dentry after a directory had been deleted & recreated. This patch is a crosswrite from a RHEL4 bug found in GFS1 as bz #190756, and it is against the latest -nmw git tree. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
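The essence of such a fix is making sure every dentry the filesystem hands out (lookup, NFS export paths, and so on) carries the filesystem's dentry_operations so its d_revalidate gets called; a hedged sketch with invented myfs_* names:

    #include <linux/dcache.h>

    /* Assumed to exist elsewhere; contains a .d_revalidate method. */
    extern struct dentry_operations myfs_dops;

    /*
     * Every code path that obtains a dentry for this filesystem must
     * attach the fs's dentry operations; otherwise the VFS will happily
     * keep using a stale dentry after the directory behind it has been
     * deleted and recreated.
     */
    static void myfs_attach_dops(struct dentry *dentry)
    {
        dentry->d_op = &myfs_dops;
    }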
-
David Teigland authored
Make the dlm_config_info values readable and writeable via configfs entries. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
David Teigland authored
Add a new dlm_config_info field to enable log_debug output and change log_debug() to use it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
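Conceptually, the change amounts to gating the debug macro on a runtime flag, along these lines (the flag name here is invented and stands in for the new dlm_config_info field; the lockspace is assumed to expose a name field):

    #include <linux/kernel.h>

    /* Hypothetical runtime switch standing in for the new config field. */
    extern int log_debug_enabled;

    /* Debug output is emitted only when the flag is set at runtime. */
    #define log_debug(ls, fmt, args...)                                 \
    do {                                                                \
        if (log_debug_enabled)                                          \
            printk(KERN_DEBUG "dlm: %s: " fmt "\n",                     \
                   (ls)->ls_name, ##args);                              \
    } while (0)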
-
David Teigland authored
Add a "ci_" prefix to the fields in the dlm_config_info struct so that we can use macros to add configfs functions to access them (in a later patch). No functional changes in this patch, just naming changes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
David Teigland authored
Some common, non-error messages should use log_debug instead of log_error so they can be turned off. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
S. Wendy Cheng authored
Second round of gfs2_rename lock re-ordering to allow Anaconda to add a root partition on top of gfs2. Prior to this patch, the recursive lock detector in glock.c could be triggered by attempting to lock the rgrp twice. This patch fixes that by checking whether the rgrp is already locked. This fixes Red Hat bugzilla #221237. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
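In outline the check amounts to walking the holders already granted on the rgrp before queueing another one for the same process; the structures below are heavily simplified stand-ins, not the real glock/holder definitions:

    #include <linux/list.h>

    struct resource_group {
        struct list_head holders;       /* holders currently granted */
    };

    struct holder {
        struct list_head list;
        void *owner;                    /* task that queued the holder */
    };

    /* True if @owner already holds @rgrp, in which case the rename path
     * must not try to lock it a second time (which is what tripped the
     * recursive lock detector). */
    static int rgrp_already_held(struct resource_group *rgrp, void *owner)
    {
        struct holder *gh;

        list_for_each_entry(gh, &rgrp->holders, list)
            if (gh->owner == owner)
                return 1;
        return 0;
    }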
-
Russell Cattelan authored
Update the quilt header comments to match the code changes. Change gfs2_lookup_simple to return an error in the case of a NULL inode. The callers of gfs2_lookup_simple do not check for NULL in the no-entry case and would otherwise end up dereferencing a NULL pointer. This fixes: http://projects.info-pull.com/mokb/MOKB-15-11-2006.html Signed-off-by: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
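The shape of that change is the classic NULL-to-ERR_PTR conversion, sketched here with invented names (dir_search stands in for whatever actually performs the directory search):

    #include <linux/err.h>
    #include <linux/errno.h>
    #include <linux/fs.h>

    extern struct inode *dir_search(struct inode *dir, const char *name);

    /*
     * Previously the "no entry" case returned NULL, which callers that
     * only check IS_ERR() would go on to dereference.  Returning a real
     * error pointer keeps those callers safe.
     */
    static struct inode *lookup_simple(struct inode *dir, const char *name)
    {
        struct inode *inode = dir_search(dir, name);

        if (!inode)
            return ERR_PTR(-ENOENT);
        return inode;
    }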
-
Steven Whitehouse authored
In the case of unlinked files with dirty pages, GFS2 wasn't clearing the pages in quite the right order. This patch clears the pages earlier (before the glock_dq) to avoid the situation where the release of the glock results in attempting to write back data that has already been deallocated. This fixes Red Hat bugzilla #220117. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Patrick Caulfield authored
I just noticed this message when testing some other changes I'd made to lowcomms (to use workqueues) but the problem seems to be in the current git trees too. I'm amazed no-one has seen it. BUG: spinlock already unlocked on CPU#1, dlm_recoverd/16868 Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Patrick Caulfield authored
I was a little over-enthusiastic turning schedule() calls into cond_resched() when fixing the DLM for Andrew Morton. These four should really be calls to schedule() or the dlm can busy-wait. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
S. Wendy Cheng authored
Bugzilla 215088. Fix a deadlock in gfs2_change_nlink() while installing RHEL5 onto a GFS2 partition. gfs2_rename() apparently needs a block allocation for the new name (in the directory), which requires rgrp locks. At the same time, while updating the nlink count for the replaced file, gfs2_change_nlink() tries to return the inode metadata back to the resource group, where it needs rgrp locks too. Our logic doesn't allow the same process (here the RHEL installer) to acquire these locks recursively, which results in a BUG call. This only happens within the rename code path, and only if the destination file exists before the rename operation. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Steven Whitehouse authored
This is partially derived from a patch written by Russell Cattelan. It fixes a bug where there is a race between readpages and truncate, by ignoring readpages for stuffed files. This is ok because a stuffed file will never be more than one block (minus sizeof(struct gfs2_dinode)) in size, and block size is always less than page size, so we do not lose anything efficiency-wise by not doing readahead for stuffed files. They will have already been "read ahead" by the action of reading in the inode in the first place. This is the remaining part of the fix for Red Hat bugzilla #218966 which had not yet made it upstream. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Russell Cattelan <cattelan@redhat.com>
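A sketch of what such a readpages guard looks like in a filesystem of that era (myfs_* and inode_is_stuffed() are invented stand-ins, and mpage_readpages() is just one way to handle the non-stuffed case):

    #include <linux/fs.h>
    #include <linux/mpage.h>

    extern int inode_is_stuffed(struct inode *inode);
    extern int myfs_get_block(struct inode *inode, sector_t lblock,
                              struct buffer_head *bh_result, int create);

    static int myfs_readpages(struct file *file, struct address_space *mapping,
                              struct list_head *pages, unsigned nr_pages)
    {
        struct inode *inode = mapping->host;

        /*
         * Readahead buys nothing for a stuffed (inline-data) file: it is
         * at most one block and was effectively read when the inode was
         * read in.  Skipping it also avoids the race with truncate.
         */
        if (inode_is_stuffed(inode))
            return 0;

        return mpage_readpages(mapping, pages, nr_pages, myfs_get_block);
    }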
-
Steven Whitehouse authored
This patch fixes Red Hat bugzilla #212627, in which a deadlock occurs due to trying to take the i_mutex while holding a glock. The correct locking order is defined as i_mutex -> glock in all cases. I've left dealing with allocating writes for later; I know that we need to do that, but for now this should do the trick. We don't need to take the i_mutex on write, because the VFS has already taken it for us. On read we don't need it since the glock is enough protection. The reason that I've made some of the checks into a separate function is that we'll need to do the checks again in the allocating write case eventually, so this is partly in preparation for that. Likewise the return value test of != 1 might look a bit odd, and that's because we'll need a third return value in case an allocation is required. I've made the change to deferred mode on the glock to ensure flushing of read caches on other nodes. I notice that (using blktrace to look at what's going on) we appear to do a better job of large I/Os than ext3 after this patch (in terms of not splitting up the I/Os). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Wendy Cheng <wcheng@redhat.com>
-
Adrian Bunk authored
Remove the following unused functions: - lowcomms_send_message() - lowcomms_max_buffer_size() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
David Teigland authored
When the dlm fakes an unlock/cancel reply from a failed node using a stub message struct, it wasn't setting the flags in the stub message. So, in the process of receiving the fake message the lkb flags would be updated and cleared from the zero flags in the message. The problem observed in tests was the loss of the USER flag which caused the dlm to think a user lock was a kernel lock and subsequently fail an assertion checking the validity of the ast/callback field. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
David Teigland authored
LVBs are not sent as part of new requests, but the code receiving the request was copying data into the lvb anyway. The space in the message where it mistakenly thought the lvb lived actually contained the resource name, so it wound up incorrectly copying this name data into the lvb. The fix is to just create the lvb, not copy junk into it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
David Teigland authored
The send_args() function is used to copy parameters into a message for a number of different message types. Only some of those types are set up beforehand (in create_message) to include space for sending lvb data. send_args() was wrongly copying the lvb for all message types as long as the lock had an lvb. This means that the lvb data was being written past the end of the message into unknown space. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
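In outline, the fix keys the copy on the message type as well as on the lock having an lvb; this standalone sketch uses made-up type numbers and field names to show the idea:

    #include <string.h>

    #define LVB_LEN 32                      /* illustrative lvb size */

    struct lkb {
        char *lkb_lvbptr;                   /* NULL if the lock has no lvb */
    };

    struct message {
        int  m_type;
        char m_extra[256];                  /* name or lvb, depending on type */
    };

    /* Only message types created with room for an lvb may carry one. */
    static int msg_carries_lvb(int type)
    {
        switch (type) {
        case 1:                             /* e.g. a convert request */
        case 2:                             /* e.g. a convert reply */
            return 1;
        default:
            return 0;                       /* e.g. new requests: the name lives here */
        }
    }

    static void fill_args(struct message *ms, const struct lkb *lkb)
    {
        /* The bug was copying whenever lkb_lvbptr was set, writing past
         * the end of messages that had no lvb space. */
        if (lkb->lkb_lvbptr && msg_carries_lvb(ms->m_type))
            memcpy(ms->m_extra, lkb->lkb_lvbptr, LVB_LEN);
    }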
-
David Teigland authored
Check if we receive a message from another lockspace member running a version of the dlm with an incompatible inter-node message protocol. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
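A hedged sketch of what such a receive-side version check can look like; the constants, struct layout and names are illustrative, not the DLM wire format:

    #include <linux/errno.h>
    #include <linux/kernel.h>
    #include <linux/types.h>

    #define PROTO_MAJOR 2                   /* illustrative version numbers */
    #define PROTO_MINOR 0

    struct msg_header {
        u16 h_major;
        u16 h_minor;
        /* ... length, nodeid, command ... */
    };

    /* Minor version differences are tolerated; a major mismatch means the
     * sender speaks an incompatible inter-node protocol. */
    static int check_version(const struct msg_header *hd, int nodeid)
    {
        if (hd->h_major != PROTO_MAJOR) {
            printk(KERN_ERR
                   "dlm: node %d sent protocol %u.%u, expected %d.%d\n",
                   nodeid, hd->h_major, hd->h_minor,
                   PROTO_MAJOR, PROTO_MINOR);
            return -EPROTO;
        }
        return 0;
    }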
-
David Teigland authored
A reply to a recovery message will often be received after the relevant recovery sequence has aborted and the next recovery sequence has begun. We need to ignore replies to these old messages from the previous recovery. There's already a way to do this for synchronous recovery requests using the rc_id number, but not for async. Each recovery sequence already has a locally unique sequence number associated with it. This patch adds a field to the rcom (recovery message) structure where this recovery sequence number can be placed, rc_seq. When a node sends a reply to a recovery request, it copies the rc_seq number it received into rc_seq_reply. When the first node receives the reply to its recovery message, it will check whether rc_seq_reply matches the current recovery sequence number, ls_recover_seq, and if not then it ignores the old reply. An old, inadequate approach to filtering out old replies (checking if the current stage of recovery has moved back to the start) has been removed from two spots. The protocol version number is changed to reflect the different rcom structures. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
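The handshake described above (rc_seq stamped on the request, echoed back as rc_seq_reply, compared against ls_recover_seq on receipt) can be modelled in a few lines of standalone C; only those three field names come from the commit message, everything else is illustrative:

    #include <stdint.h>
    #include <stdio.h>

    struct rcom {
        uint64_t rc_seq;                /* requester's recovery sequence */
        uint64_t rc_seq_reply;          /* echoed back by the replier */
    };

    struct lockspace {
        uint64_t ls_recover_seq;        /* bumped for each new recovery */
    };

    /* Requester: stamp the current sequence into the outgoing message. */
    static void stamp_request(const struct lockspace *ls, struct rcom *rc)
    {
        rc->rc_seq = ls->ls_recover_seq;
    }

    /* Replier: echo the sequence back unchanged. */
    static void stamp_reply(const struct rcom *request, struct rcom *reply)
    {
        reply->rc_seq_reply = request->rc_seq;
    }

    /* Requester: drop replies that belong to an aborted recovery. */
    static int accept_reply(const struct lockspace *ls, const struct rcom *rc)
    {
        if (rc->rc_seq_reply != ls->ls_recover_seq) {
            fprintf(stderr, "ignoring stale recovery reply %llu (now %llu)\n",
                    (unsigned long long)rc->rc_seq_reply,
                    (unsigned long long)ls->ls_recover_seq);
            return 0;
        }
        return 1;
    }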
-
David Teigland authored
There's a chance the new master of a resource hasn't learned it's the new master before another node sends it a lock during recovery. The node sending the lock needs to resend if this happens.

- A sends a master lookup for resource R to C
- B sends a master lookup for resource R to C
- C receives A's lookup, assigns A to be master of R and sends a reply back to A
- C receives B's lookup and sends a reply back to B saying that A is the master
- B receives lookup reply from C and sends its lock for R to A
- A receives lock from B, doesn't think it's the master of R and sends an error back to B
- A receives lookup reply from C and becomes master of R
- B gets error back from A and resends its lock back to A (this resending is what this patch does)
- A receives lock from B, it now sees it's the master of R and takes the lock

Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
David Teigland authored
If an fs has already been shut down, a lockfs callback should do nothing. An fs that's been shut down can't acquire locks or do anything with respect to the cluster. Also, remove FIXME comment in withdraw function. The missing bits of the withdraw procedure are now all done by user space. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 04 Feb, 2007 3 commits
-
-
Linus Torvalds authored
-
Frédéric Riss authored
When calling into the EFI firmware, the parameters need to be passed on the stack. The recent change to use -mregparm=3 breaks x86 EFI support. This patch is needed to allow the new Intel-based Macs to suspend to ram (efi.get_time is called during the suspend phase). Signed-off-by: Frederic Riss <frederic.riss@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
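The underlying issue can be expressed with GCC's regparm attribute: force the plain stack-based i386 ABI for the specific calls that jump into firmware, even though the rest of the kernel is built with -mregparm=3. The sketch below shows the mechanism only; the wrapper and pointer names are invented and this is not the actual patch:

    typedef unsigned long efi_status_t;

    /* Hypothetical: the get_time entry point as read from the EFI
     * runtime services table. */
    extern void *efi_get_time_service;

    static efi_status_t efi_call_get_time(void *tm, void *tc)
    {
        /* Function-pointer type that forces all arguments onto the
         * stack for this call, regardless of -mregparm=3. */
        typedef efi_status_t
            (__attribute__((regparm(0))) *stack_call_t)(void *, void *);

        return ((stack_call_t)efi_get_time_service)(tm, tc);
    }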
-
Al Viro authored
That code doesn't do what its author apparently thought it would do... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 Feb, 2007 3 commits
-
-
Linus Torvalds authored
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] sd: udev accessing an uninitialized scsi_disk field results in a crash
  [SCSI] st: A MTIOCTOP/MTWEOF within the early warning will cause the file number to be incorrect
  [SCSI] qla4xxx: bug fixes
  [SCSI] Fix scsi_add_device() for async scanning
-
Jeff Garzik authored
x86-64 is missing these: Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
John Keller authored
The SN Altix platform does not conform to the IOSAPIC IRQ routing model. Add code in acpi_unregister_gsi() to check if (acpi_irq_model == ACPI_IRQ_MODEL_PLATFORM) and return. Due to an oversight, this code was not added previously when similar code was added to acpi_register_gsi(). http://marc.theaimsgroup.com/?l=linux-acpi&m=116680983430121&w=2 Signed-off-by: John Keller <jpk@sgi.com> Acked-by: Len Brown <lenb@kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
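Based on the description above, the added check has roughly this shape (a sketch only, abbreviated from the real ia64/SN routine):

    #include <linux/acpi.h>

    void acpi_unregister_gsi(u32 gsi)
    {
        /* Platforms that do their own interrupt routing have nothing for
         * the generic IOSAPIC-style teardown to do, mirroring the check
         * already present in acpi_register_gsi(). */
        if (acpi_irq_model == ACPI_IRQ_MODEL_PLATFORM)
            return;

        /* ... normal GSI teardown ... */
    }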
-