- 11 May, 2011 1 commit
-
-
Henry C Chang authored
We increment i_wrbuffer_ref when taking the Fb cap. This breaks the dirty page accounting and causes looping in __ceph_do_pending_vmtruncate, hanging the ceph client. This bug can be reproduced occasionally by running blogbench. Add a new field, i_wb_ref, to the inode and dedicate it to Fb reference counting. Signed-off-by:
Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 04 May, 2011 1 commit
-
-
Sage Weil authored
The __mark_dirty_inode helper now takes i_lock as of 250df6ed . Fix the one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the flags value so that the caller can do the marking outside of i_lock. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 03 May, 2011 1 commit
-
-
Sage Weil authored
See 0444d76a . Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 31 Mar, 2011 1 commit
-
-
Lucas De Marchi authored
Fixes generated by 'codespell' and manually reviewed. Signed-off-by:
Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 19 Jan, 2011 3 commits
-
-
Sage Weil authored
The NODELAY flag avoids the heuristics that delay cap (issued/wanted) release. There's no reason for that after we import a cap, and it kills whatever benefit we get from those delays. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
If we are mid-flush and a cap is migrated to another node, we need to resend the cap flush message to the new MDS, and do so with the original flush_seq to avoid leaking across a sync boundary. Previously we didn't redo the flush (we only flushed newly dirty data), which would cause a later sync to hang forever. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
The int flushing is global and not cleared on each iteration of the loop, which can cause a second flush of caps to MDSs with ids greater than the auth MDS's. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 08 Nov, 2010 1 commit
-
-
Sage Weil authored
We used to use rdcache_gen to indicate whether we "might" have cached pages. Now we just look at the mapping to determine that. However, some old behavior remains from that transition. First, rdcache_gen == 0 no longer means we have no pages. That can happen at any time (presumably when we carry FILE_CACHE). We should not reset it to zero, and we should not check that it is zero. That means that the only purpose for rdcache_revoking is to resolve races between new issues of FILE_CACHE and an async invalidate. If they are equal, we should invalidate. On success, we decrement rdcache_revoking, so that it is no longer equal to rdcache_gen. Similarly, if we succeed in doing a sync invalidate, set revoking = gen - 1. (This is a small optimization to avoid doing unnecessary invalidate work and does not affect correctness.) Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 07 Nov, 2010 1 commit
-
-
Sage Weil authored
If the auth cap migrates to another MDS, clear requested_max_size so that we resend any pending max_size increase requests. This fixes potential hangs on writes that extend a file and race with a cap migration between MDSs. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 28 Oct, 2010 1 commit
-
-
Sage Weil authored
This reverts commit d91f2438 . The intent of issue_seq is to distinguish between mds->client messages that (re)create the cap and those that do not, which means we should _only_ be updating that value in the create paths. By updating it in handle_cap_grant, we reset it to zero, which then breaks release. The larger question is what workload/problem made me think it should be updated here... Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 20 Oct, 2010 3 commits
-
-
Sage Weil authored
This is simpler and faster. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
The i_rdcache_gen value only implies we MAY have cached pages; actually check the mapping to see if it's worth bothering with an invalidate. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 07 Oct, 2010 2 commits
-
-
Sage Weil authored
We need to update the issue_seq on any grant operation, be it via an MDS reply or a separate grant message. The update in the grant path was missing. This broke cap release for inodes in which the MDS sent an explicit grant message that was not soon after followed by a successful MDS reply on the same inode. Also fix the signedness on seq locals. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Greg Farnum authored
If an MDS tries to revoke caps that we don't have, we want to send releases early since they probably contain the caps message the MDS is looking for. Previously, we only sent the messages if we didn't have the inode either. But in a multi-mds system we can retain the inode after dropping all caps for a single MDS. Signed-off-by:
Greg Farnum <gregf@hq.newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 17 Sep, 2010 2 commits
-
-
Sage Weil authored
See if the i_data mapping has any pages to determine if the FILE_CACHE capability is currently in use, instead of assuming it is any time the rdcache_gen value is set (i.e., issued -> used). This allows the MDS RECALL_STATE process to work for inodes that have cached pages. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
Sending multiple flushsnap messages is problematic because we ignore the response if the tid doesn't match, and the server may only respond to each one once. It's also a waste. So, skip cap_snaps that are already on the flushing list, unless the caller tells us to resend (because we are reconnecting). Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 14 Sep, 2010 1 commit
-
-
Sage Weil authored
Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages or is still writing. We'll send the newer capsnaps only after the older ones complete. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 24 Aug, 2010 1 commit
-
-
Sage Weil authored
We used to use i_head_snapc to keep track of which snapc the current epoch of dirty data was dirtied under. It is used by queue_cap_snap to set up the cap_snap. However, since we queue cap snaps for any dirty caps, not just for dirty file data, we need to keep a valid i_head_snapc anytime we have dirty|flushing caps. This fixes a NULL pointer deref in queue_cap_snap when writing back dirty caps without data (e.g., snaptest-authwb.sh). Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 22 Aug, 2010 2 commits
-
-
Sage Weil authored
When we snapshot dirty metadata that needs to be written back to the MDS, include dirty xattr metadata. Make the capsnap reference the encoded xattr blob so that it will be written back in the FLUSHSNAP op. Also fix the capsnap creation guard to include dirty auth or file bits, not just tests specific to dirty file data or file writes in progress (this fixes auth metadata writeback). Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
We should include the xattr metadata blob in the cap update message any time we are flushing dirty state, NOT just when we are also dropping the cap. This fixes async xattr writeback. Also, clean up the code slightly to avoid duplicating the bit test. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 05 Aug, 2010 1 commit
-
-
Sage Weil authored
Normally, if the Fb cap bit is being revoked, we queue an async writeback. If there is no dirty data but we still hold the cap, this leaves the client sitting around doing nothing until the cap timeouts expire and the cap is released on its own (as it would have been without the revocation). Instead, only queue writeback if the bit is actually used (i.e., we have dirty data). If not, we can reply to the revocation immediately. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 02 Aug, 2010 11 commits
-
-
Sage Weil authored
Add support for v2 encoding of MClientCaps, which includes a flock blob. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
Well, this Shouldn't Happen, so it would be helpful to know the caller when it does. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Greg Farnum authored
Signed-off-by:
Greg Farnum <gregf@hq.newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
When we get a cap EXPORT message, make sure we are connected to all export targets to ensure we can handle the matching IMPORT. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
Caps related accounting is now being done per mds client instead of just being global. This prepares ground work for a later revision of the caps preallocated reservation list. Signed-off-by:
Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Yehuda Sadeh authored
Mainly fixing minor issues reported by sparse. Signed-off-by:
Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
If we have a capsnap but no auth cap (e.g. because it is migrating to another mds), bail out and do nothing for now. Do NOT remove the capsnap from the flush list. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
The caps revocation should either initiate writeback or invalidation, or call check_caps to ack or do the dirty work. The primary question is whether we can get away with only checking the auth cap or whether all caps need to be checked. The old code was doing...something else. At the very least, revocations from non-auth MDSs could break by triggering the "check auth cap only" case. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
If the file mode is marked as "lazy," perform cached/buffered reads when the caps permit it. Adjust the rdcache_gen and invalidation logic accordingly so that we manage our cache based on the FILE_CACHE -or- FILE_LAZYIO cap bits. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
If we have marked a file as "lazy" (using the ceph ioctl), perform buffered writes when the MDS caps allow it. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 27 Jul, 2010 1 commit
-
-
Yehuda Sadeh authored
This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used. Signed-off-by:
Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 23 Jul, 2010 1 commit
-
-
Sage Weil authored
When we embed a dentry lease release notification in a request, invalidate our lease so we don't think we still have it. Otherwise we can get all sorts of incorrect client behavior when multiple clients are interacting with the same part of the namespace. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 29 Jun, 2010 2 commits
-
-
Sage Weil authored
We need to increase the total and used counters when allocating a new cap in the non-reserved (cap import) case. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
We can drop caps with an mds request. Ensure we only drop unused AND clean caps, since the MDS doesn't support cap writeback in that context, nor do we track it. If caps are dirty and the MDS needs them back, it will revoke them and we will flush in the normal fashion. This fixes a possible loss of metadata. Signed-off-by:
Sage Weil <sage@newdream.net>
-
- 10 Jun, 2010 3 commits
-
-
Sage Weil authored
If we have enough memory to allocate a new cap release message, do so, so that we can send a partial release message immediately. This keeps us from making the MDS wait when the cap release it needs is in a partially full release message. If we fail because of ENOMEM, oh well, they'll just have to wait a bit longer. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
If we get an IMPORT that gives us a cap but we don't have the inode, queue a release (and try to send it immediately) so that the MDS doesn't get stuck waiting for us. Signed-off-by:
Sage Weil <sage@newdream.net>
-
Sage Weil authored
Nothing is released here: the caps message is simply ignored in this case. Signed-off-by:
Sage Weil <sage@newdream.net>
-