Commits · a14ea269dd6b5e48a2941ba73b202cd7cd5d716d · Kirill Smelkov / linux

14 Feb, 2013 33 commits

rbd: turn off interrupts for open/remove locking · a14ea269

Alex Elder authored Feb 05, 2013

This commit:
    bc7a62ee5 rbd: prevent open for image being removed
added checking for removing rbd before allowing an open, and used
the same request spinlock for protecting that and updating the open
count as is used for the request queue.

However it used the non-irq protected version of the spinlocks.
Fix that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

a14ea269

libceph: don't require r_num_pages for bio requests · 9cbb1d72

Alex Elder authored Jan 31, 2013

There is a check in the completion path for osd requests that
ensures the number of pages allocated is enough to hold the amount
of incoming data expected.

For bio requests coming from rbd the "number of pages" is not really
meaningful (although total length would be).  So stop requiring that
nr_pages be supplied for bio requests.  This is done by checking
whether the pages pointer is null before checking the value of
nr_pages.

Note that this value is passed on to the messenger, but there it's
only used for debugging--it's never used for validation.

While here, change another spot that used r_pages in a debug message
inappropriately, and also invalidate the r_con_filling_msg pointer
after dropping a reference to it.

This resolves:
    http://tracker.ceph.com/issues/3875Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

9cbb1d72

rbd: don't take extra bio reference for osd client · 1e32d34c

Alex Elder authored Jan 30, 2013

Currently, if the OSD client finds an osd request has had a bio list
attached to it, it drops a reference to it (or rather, to the first
entry on that list) when the request is released.

The code that added that reference (i.e., the rbd client) is
therefore required to take an extra reference to that first bio
structure.

The osd client doesn't really do anything with the bio pointer other
than transfer it from the osd request structure to outgoing (for
writes) and ingoing (for reads) messages.  So it really isn't the
right place to be taking or dropping references.

Furthermore, the rbd client already holds references to all bio
structures it passes to the osd client, and holds them until the
request is completed.  So there's no need for this extra reference
whatsoever.

So remove the bio_put() call in ceph_osdc_release_request(), as
well as its matching bio_get() call in rbd_osd_req_create().

This change could lead to a crash if old libceph.ko was used with
new rbd.ko.  Add a compatibility check at rbd initialization time to
avoid this possibilty.

This resolves:
    http://tracker.ceph.com/issues/3798    and
    http://tracker.ceph.com/issues/3799Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

1e32d34c

libceph: add a compatibility check interface · 72fe25e3

Alex Elder authored Jan 30, 2013

An upcoming change implements semantic change that could lead to
a crash if an old version of the libceph kernel module is used with
a new version of the rbd kernel module.

In order to preclude that possibility, this adds a compatibilty
check interface.  If this interface doesn't exist, the modules are
obviously not compatible.  But if it does exist, this provides a way
of letting the caller know whether it will operate properly with
this libceph module.

Perhaps confusingly, it returns false right now.  The semantic
change mentioned above will make it return true.

This resolves:
    http://tracker.ceph.com/issues/3800Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

72fe25e3

rbd: prevent open for image being removed · b82d167b

Alex Elder authored Jan 14, 2013

An open request for a mapped rbd image can arrive while removal of
that mapping is underway.  We need to prevent such an open request
from succeeding.  (It appears that Maciej Galkiewicz ran into this
problem.)

Define and use a "removing" flag to indicate a mapping is getting
removed.  Set it in the remove path after verifying nothing holds
the device open.  And check it in the open path before allowing the
open to proceed.  Acquire the rbd device's lock around each of these
spots to avoid any races accessing the flags and open_count fields.

This addresses:
    http://tracker.newdream.net/issues/3427Reported-by: Maciej Galkiewicz <maciejgalkiewicz@ragnarson.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

b82d167b

rbd: define flags field, use it for exists flag · 6d292906

Alex Elder authored Jan 14, 2013

Define a new rbd device flags field, manipulated using bit
operations.  Replace the use of the current "exists" flag with a bit
in this new "flags" field.  Add a little commentary about the
"exists" flag, which does not need to be manipulated atomically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

6d292906

rbd: don't drop watch requests on completion · 8eb87565

Alex Elder authored Jan 25, 2013

When we register an osd request to linger, it means that request
will stay around (under control of the osd client) until we've
unregistered it.  We do that for an rbd image's header object, and
we keep a pointer to the object request associated with it.

Keep a reference to the watch object request for as long as it is
registered to linger.  Drop it again after we've removed the linger
registration.

This resolves:
    http://tracker.ceph.com/issues/3937

(Note: this originally came about because the osd client was
issuing a callback more than once.  But that behavior will be
changing soon, documented in tracker issue 3967.)
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

8eb87565

rbd: decrement obj request count when deleting · 25dcf954

Alex Elder authored Jan 25, 2013

Decrement the obj_request_count value when deleting an object
request from its image request's list.  Rearrange a few lines
in the surrounding code.

This resolves:
    http://tracker.ceph.com/issues/3940Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

25dcf954

rbd: track object rather than osd request for watch · 975241af

Alex Elder authored Jan 25, 2013

Switch to keeping track of the object request pointer rather than
the osd request used to watch the rbd image header object.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

975241af

rbd: unregister linger in watch sync routine · 6977c3f9

Alex Elder authored Jan 25, 2013

Move the code that unregisters an rbd device's lingering header
object watch request into rbd_dev_header_watch_sync(), so it
occurs in the same function that originally sets up that request.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

6977c3f9

rbd: get rid of rbd_req_sync_exec() · 9f20e02a

Alex Elder authored Jan 20, 2013

Get rid rbd_req_sync_exec() because it is no longer used.  That
eliminates the last use of rbd_req_sync_op(), so get rid of that
too.  And finally, that leaves rbd_do_request() unreferenced, so get
rid of that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

9f20e02a

rbd: implement sync method with new code · 36be9a76

Alex Elder authored Jan 19, 2013

Reimplement synchronous object method calls using the new request
tracking code.  Use the name rbd_obj_method_sync()
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

36be9a76

rbd: send notify ack asynchronously · cf81b60e

Alex Elder authored Jan 17, 2013

When we receive notification of a change to an rbd image's header
object we need to refresh our information about the image (its
size and snapshot context).  Once we have refreshed our rbd image
we need to acknowledge the notification.

This acknowledgement was previously done synchronously, but there's
really no need to wait for it to complete.

Change it so the caller doesn't wait for the notify acknowledgement
request to complete.  And change the name to reflect it's no longer
synchronous.

This resolves:
    http://tracker.newdream.net/issues/3877Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

cf81b60e

rbd: get rid of rbd_req_sync_notify_ack() · 5ae9db81

Alex Elder authored Jan 20, 2013

Get rid rbd_req_sync_notify_ack() because it is no longer used.
As a result rbd_simple_req_cb() becomes unreferenced, so get rid
of that too.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

5ae9db81

rbd: use new code for notify ack · b8d70035

Alex Elder authored Nov 30, 2012

Use the new object request tracking mechanism for handling a
notify_ack request.

Move the callback function below the definition of this so we don't
have to do a pre-declaration.

This resolves:
    http://tracker.newdream.net/issues/3754Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

b8d70035

rbd: get rid of rbd_req_sync_watch() · ecf7a031

Alex Elder authored Jan 20, 2013

Get rid of rbd_req_sync_watch(), because it is no longer used.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

ecf7a031

rbd: implement watch/unwatch with new code · 9969ebc5

Alex Elder authored Jan 18, 2013

Implement a new function to set up or tear down a watch event
for an mapped rbd image header using the new request code.

Create a new object request type "nodata" to handle this.  And
define rbd_osd_trivial_callback() which simply marks a request done.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

9969ebc5

rbd: get rid of rbd_req_sync_read() · 86ea43bf

Alex Elder authored Jan 20, 2013

Delete rbd_req_sync_read() is no longer used, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

86ea43bf

rbd: implement sync object read with new code · 788e2df3

Alex Elder authored Jan 17, 2013

Reimplement the synchronous read operation used for reading a
version 1 header using the new request tracking code.  Name the
resulting function rbd_obj_read_sync() to better reflect that
it's a full object operation, not an object request.  To do this,
implement a new OBJ_REQUEST_PAGES object request type.

This implements a new mechanism to allow the caller to wait for
completion for an rbd_obj_request by calling rbd_obj_request_wait().

This partially resolves:
    http://tracker.newdream.net/issues/3755Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

788e2df3

rbd: kill rbd_req_coll and rbd_request · 7d250b94

Alex Elder authored Nov 30, 2012

The two remaining callers of rbd_do_request() always pass a null
collection pointer, so the "coll" and "coll_index" parameters are
not needed.  There is no other use of that data structure, so it
can be eliminated.

Deleting them means there is no need to allocate a rbd_request
structure for the callback function.  And since that's the only use
of *that* structure, it too can be eliminated.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

7d250b94

rbd: kill rbd_rq_fn() and all other related code · 2250a71b

Alex Elder authored Nov 30, 2012

Now that the request function has been replaced by one using the new
request management data structures the old one can go away.
Deleting it makes rbd_dev_do_request() no longer needed, and
deleting that makes other functions unneeded, and so on.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

2250a71b

rbd: new request tracking code · bf0d5f50

Alex Elder authored Nov 22, 2012

This patch fully implements the new request tracking code for rbd
I/O requests.

Each I/O request to an rbd image will get an rbd_image_request
structure allocated to track it.  This provides access to all
information about the original request, as well as access to the
set of one or more object requests that are initiated as a result
of the image request.

An rbd_obj_request structure defines a request sent to a single osd
object (possibly) as part of an rbd image request.  An rbd object
request refers to a ceph_osd_request structure built up to represent
the request; for now it will contain a single osd operation.  It
also provides space to hold the result status and the version of the
object when the osd request completes.

An rbd_obj_request structure can also stand on its own.  This will
be used for reading the version 1 header object, for issuing
acknowledgements to event notifications, and for making object
method calls.

All rbd object requests now complete asynchronously with respect
to the osd client--they supply a common callback routine.

This resolves:
    http://tracker.newdream.net/issues/3741Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

bf0d5f50

libceph: fix messenger CONFIG_BLOCK dependencies · 3ebc21f7

Alex Elder authored Jan 31, 2013

The ceph messenger has a few spots that are only used when
bio messages are supported, and that's only when CONFIG_BLOCK
is defined.  This surrounds a couple of spots with #ifdef's
that would cause a problem if CONFIG_BLOCK were not present
in the kernel configuration.

This resolves:
    http://tracker.ceph.com/issues/3976Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

3ebc21f7

ceph: implement hidden per-field ceph.*.layout.* vxattrs · 695b7119

Sage Weil authored Jan 20, 2013

Allow individual fields of the layout to be fetched via getxattr.
The ceph.dir.layout.* vxattr with "disappear" if the exists_cb
indicates there no dir layout set.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

695b7119

ceph: add ceph.dir.layout vxattr · 1f08f2b0

Sage Weil authored Jan 20, 2013

This virtual xattr will only appear when there is a dir layout policy
set on the directory.  It can be set via setxattr and removed via
removexattr (implemented by the MDS).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

1f08f2b0

ceph: change ceph.file.layout.* implementation, content · 32ab0bd7

Sage Weil authored Jan 19, 2013

Implement a new method to generate the ceph.file.layout vxattr using
the new framework.

Use 'stripe_unit' instead of 'chunk_size'.

Include pool name, either as a string or as an integer.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

32ab0bd7

ceph: fix listxattr handling for vxattrs · b65917dd

Sage Weil authored Jan 20, 2013

Only include vxattrs in the result if they are not hidden and exist
(as determined by the exists_cb callback).

Note that the buffer size we return when 0 is passed in always includes
vxattrs that *might* exist, forming an upper bound.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

b65917dd

ceph: fix getxattr vxattr handling · 0bee82fb

Sage Weil authored Jan 20, 2013

Change the vxattr handling for getxattr so that vxattrs are checked
prior to any xattr content, and never after.  Enforce vxattr existence
via the exists_cb callback.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

0bee82fb

ceph: add exists_cb to vxattr struct · f36e4472

Sage Weil authored Jan 20, 2013

Allow for a callback to dynamically determine if a vxattr exists for
the given inode.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

f36e4472

ceph: pass ceph.* removexattrs through to MDS · d421acb1

Sage Weil authored Jan 20, 2013

If we do not explicitly recognized a vxattr (e.g., as readonly), pass
the request through to the MDS and deal with it there.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

d421acb1

ceph: pass unhandled ceph.* setxattrs through to MDS · 3adf654d

Sage Weil authored Jan 31, 2013

If we do not specifically understand a setxattr on a ceph.* virtual
xattr, send it through to the MDS.  This allows us to implement new
functionality via the MDS without direct support on the client side.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

3adf654d

ceph: support hidden vxattrs · 8860147a

Sage Weil authored Jan 31, 2013

Add ability to flag virtual xattrs as hidden, such that you can
getxattr them but they do not appear in listxattr.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

8860147a

ceph: remove 'ceph.layout' virtual xattr · 39b648d9

Sage Weil authored Jan 31, 2013

This has been deprecated since v3.3, 114fc474.  Kill it.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>

39b648d9

30 Jan, 2013 1 commit
- Merge branch 'testing' of github.com:ceph/ceph-client into v3.8-rc5-testing · 969e5aa3
  Alex Elder authored Jan 30, 2013
  
  969e5aa3
25 Jan, 2013 6 commits

libceph: fix undefined behavior when using snprintf() · 1ec3911d

Cong Ding authored Jan 25, 2013

The variable "str" is used as both the source and destination in
function snprintf(), which is undefined behavior based on C11. The
original description in C11 is:
	"If copying takes place between objects that
	overlap, the behavior is undefined."

And, the function of ceph_osdmap_state_str() is to return the osdmap
state, so it should return "doesn't exist" when all the conditions
are not satisfied. I fix it in this patch.

[elder@inktank.com: shortened the commit message]
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

1ec3911d

rbd: don't retry setting up header watch · c0430647

Alex Elder authored Jan 18, 2013

When an rbd image is initially mapped a watch event is registered so
we can do something if the header object changes.

The code that does this currently loops if initiating the watch
request results in an ERANGE error.  The osds will never return
ERANGE, so there's no reason to do this loop, so get rid of it.

This resolves:
    http://tracker.newdream.net/issues/3860

Note that the problem this loop was intended to solve is a race
between collecting image header information and setting up the watch
on the header object.  The real fix for that problem is described
here:
    http://tracker.newdream.net/issues/3871Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

c0430647

rbd: check for overflow in rbd_get_num_segments() · 38901e0f

Alex Elder authored Jan 10, 2013

The return type of rbd_get_num_segments() is int, but the values it
operates on are u64.  Although it's not likely, there's no guarantee
the result won't exceed what can be respresented in an int.  The
function is already designed to return -ERANGE on error, so just add
this possible overflow as another reason to return that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

38901e0f

rbd: small changes · 98571b5a

Alex Elder authored Jan 20, 2013

A few very minor changes to the rbd code:
    - RBD_MAX_OPT_LEN is unused, so get rid of it
    - Consolidate rbd options definitions
    - Make rbd_segment_name() return pointer to const char
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

98571b5a

Linux 3.8-rc5 · 949db153
Linus Torvalds authored Jan 25, 2013

949db153

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · d7df025e

Linus Torvalds authored Jan 25, 2013

Pull btrfs fixes from Chris Mason:
 "It turns out that we had two crc bugs when running fsx-linux in a
  loop.  Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it
  all down.  Miao also has a new OOM fix in this v2 pull as well.

  Ilya fixed a regression Liu Bo found in the balance ioctls for pausing
  and resuming a running balance across drives.

  Josef's orphan truncate patch fixes an obscure corruption we'd see
  during xfstests.

  Arne's patches address problems with subvolume quotas.  If the user
  destroys quota groups incorrectly the FS will refuse to mount.

  The rest are smaller fixes and plugs for memory leaks."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (30 commits)
  Btrfs: fix repeated delalloc work allocation
  Btrfs: fix wrong max device number for single profile
  Btrfs: fix missed transaction->aborted check
  Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
  Btrfs: put csums on the right ordered extent
  Btrfs: use right range to find checksum for compressed extents
  Btrfs: fix panic when recovering tree log
  Btrfs: do not allow logged extents to be merged or removed
  Btrfs: fix a regression in balance usage filter
  Btrfs: prevent qgroup destroy when there are still relations
  Btrfs: ignore orphan qgroup relations
  Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
  Btrfs: fix unlock order in btrfs_ioctl_rm_dev
  Btrfs: fix unlock order in btrfs_ioctl_resize
  Btrfs: fix "mutually exclusive op is running" error code
  Btrfs: bring back balance pause/resume logic
  btrfs: update timestamps on truncate()
  btrfs: fix btrfs_cont_expand() freeing IS_ERR em
  Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
  Btrfs: fix off-by-one in lseek
  ...

d7df025e