Commits · aded07ea9f7de26e352f679910ac057212ea35e3 · nexedi / linux

30 Jul, 2012 19 commits

Alex Elder authored Jul 03, 2012

Rename variables named "obj" which represent object names so they're
consistently named "object_name".

Rename the "cls" and "method" parameters in rbd_req_sync_exec()
to be "class_name" and "method_name", and make similar changes
to the names of local variables in that function representing
the lengths of those names.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

aded07ea

rbd: rename some fields in struct rbd_dev · 0bed54dc

Alex Elder authored Jul 03, 2012

An rbd image is not a single object, but a logical construct made up
of an aggregation of objects.

Rename some fields in struct rbd_dev, in hopes of reinforcing this.
    obj         --> image_name
    obj_len     --> image_name_len
    obj_md_name --> header_name
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

0bed54dc

rbd: use rbd_dev consistently · 0ce1a794

Alex Elder authored Jul 03, 2012

Most variables that represent a struct rbd_device are named
"rbd_dev", but in some cases "dev" is used instead.  Change all the
"dev" references so they use "rbd_dev" consistently, to make it
clear from the name that we're working with an RBD device (as
opposed to, for example, a struct device).  Similarly, change the
name of the "dev" field in struct rbd_notify_info to be "rbd_dev".
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

0ce1a794

rbd: dynamically allocate snapshot name · 820a5f3e

Alex Elder authored Jul 09, 2012

There is no need to impose a small limit the length of the snapshot
name recorded for an rbd image in a struct rbd_dev.  Remove the
limitation by allocating space for the snapshot name dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

820a5f3e

rbd: dynamically allocate image name · bf3e5ae1

Alex Elder authored Jul 09, 2012

There is no need to impose a small limit the length of the rbd image
name recorded in a struct rbd_dev.  Remove the limitation by
allocating space for the image name dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

bf3e5ae1

rbd: dynamically allocate image header name · cb8627c7

Alex Elder authored Jul 09, 2012

There is no need to impose a small limit the length of the header
name recorded for an rbd image in a struct rbd_dev.  Remove the
limitation by allocating space for the header name dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

cb8627c7

rbd: dynamically allocate object prefix · 849b4260

Alex Elder authored Jul 09, 2012

There is no need to impose a small limit the length of the object
prefix recorded for an rbd image in a struct rbd_image_header.
Remove the limitation by allocating space for the object prefix
dynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

849b4260

rbd: dynamically allocate pool name · d22f76e7

Alex Elder authored Jul 12, 2012

There is no need to impose a small limit the length of the pool name
recorded for an rbd image in a struct rbd_device.  Remove the
limitation by allocating space for the pool name ynamically.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

d22f76e7

rbd: create pool_id device attribute · 9bb2f334

Alex Elder authored Jul 12, 2012

Add an entry under /sys/bus/rbd/devices/<N>/ named "pool_id" that
provides the id for the pool the rbd image is assocatied with.  This
is in addition to the pool name already provided.

Rename the "poolid" field in struct rbd_device  to be "pool_id".

Update the documentation to reflect the addition of this new entry.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

9bb2f334

rbd: rename rbd_dev->block_name · ca1e49a6

Alex Elder authored Jul 10, 2012

Each rbd image has a name that forms the basis of all data objects
backing the device.  Old (format 1) images refer to this name as the
"block name," while new (format 2) images use the term "object
prefix" for this.

Change the field name in the in-core rbd image header structure to
reflect the more modern usage.  We intentionally keep the the name
"block_name" in the on-disk definition for format 1 image headers.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

ca1e49a6

rbd: define dup_token() · ea3352f4

Alex Elder authored Jul 09, 2012

Define a new function dup_token(), to be used during argument
parsing for making dynamically-allocated copies of tokens being
parsed.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

ea3352f4

libceph: define ceph_extract_encoded_string() · f8c36c58

Alex Elder authored Jul 11, 2012

This adds a new utility routine which will return a dynamically-
allocated buffer containing a string that has been decoded from ceph
over-the-wire format.  It also returns the length of the string
if the address of a size variable is supplied to receive it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

f8c36c58

rbd: drop a useless local variable · ad4f232f

Alex Elder authored Jul 03, 2012

In rbd_req_sync_notify_ack(), a local variable was needlessly being
used to hold a null pointer.  Just pass NULL instead.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

ad4f232f

libceph: fix off-by-one bug in ceph_encode_filepath() · c61a1abd

Alex Elder authored Jul 03, 2012

There is a BUG_ON() call that doesn't account for the single byte
structure version at the start of an encoded filepath in
ceph_encode_filepath().  Fix that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

c61a1abd

ceph: clean up useless d_parent checks · 8842b3be

Sage Weil authored Jun 07, 2012

d_parent is never NULL, and IS_ROOT() is the proper way to check for a
(non-self-referential) parent.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>

8842b3be

libceph: prevent the race of incoming work during teardown · a2a32584

Guanjun He authored Jul 08, 2012

Add an atomic variable 'stopping' as flag in struct ceph_messenger,
set this flag to 1 in function ceph_destroy_client(), and add the condition code
in function ceph_data_ready() to test the flag value, if true(1), just return.
Signed-off-by: Guanjun He <gjhe@suse.com>
Reviewed-by: Sage Weil <sage@inktank.com>

a2a32584

libceph: fix messenger retry · a16cb1f7

Sage Weil authored Jul 10, 2012

In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time. This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.
Signed-off-by: Sage Weil <sage@inktank.com>

a16cb1f7

libceph: initialize rb, list nodes in ceph_osd_request · cd43045c

Sage Weil authored Jul 09, 2012

These don't strictly need to be initialized based on how they are used, but
it is good practice to do so.
Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

cd43045c

libceph: initialize msgpool message types · d50b409f

Sage Weil authored Jul 09, 2012

Initialize the type field for messages in a msgpool.  The caller was doing
this for osd ops, but not for the reply messages.
Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

d50b409f

06 Jul, 2012 21 commits

libceph: allow sock transition from CONNECTING to CLOSED · fbb85a47

Sage Weil authored Jun 27, 2012

It is possible to close a socket that is in the OPENING state.  For
example, it can happen if ceph_con_close() is called on the con before
the TCP connection is established.  con_work() will come around and shut
down the socket.
Signed-off-by: Sage Weil <sage@inktank.com>

fbb85a47

libceph: initialize mon_client con only once · 735a72ef

Sage Weil authored Jun 27, 2012

Do not re-initialize the con on every connection attempt. When we
ceph_con_close, there may still be work queued on the socket (e.g., to
close it), and re-initializing will clobber the work_struct state.
Signed-off-by: Sage Weil <sage@inktank.com>

735a72ef

libceph: set peer name on con_open, not init · b7a9e5dd

Sage Weil authored Jun 27, 2012

The peer name may change on each open attempt, even when the connection is
reused.
Signed-off-by: Sage Weil <sage@inktank.com>

b7a9e5dd

libceph: drop declaration of ceph_con_get() · 26103021

Alex Elder authored Jun 21, 2012

For some reason the declaration of ceph_con_get() and
ceph_con_put() did not get deleted in this commit:
    d59315ca libceph: drop ceph_con_get/put helpers and nref member

Clean that up.
Signed-off-by: Alex Elder <elder@inktank.com>

26103021

libceph: add some fine ASCII art · bc18f4b1

Alex Elder authored Jun 20, 2012

Sage liked the state diagram I put in my commit description so
I'm putting it in with the code.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

bc18f4b1

libceph: small changes to messenger.c · 5821bd8c

Alex Elder authored Jun 11, 2012

This patch gathers a few small changes in "net/ceph/messenger.c":
  out_msg_pos_next()
    - small logic change that mostly affects indentation
  write_partial_msg_pages().
    - use a local variable trail_off to represent the offset into
      a message of the trail portion of the data (if present)
    - once we are in the trail portion we will always be there, so we
      don't always need to check against our data position
    - avoid computing len twice after we've reached the trail
    - get rid of the variable tmpcrc, which is not needed
    - trail_off and trail_len never change so mark them const
    - update some comments
  read_partial_message_bio()
    - bio_iovec_idx() will never return an error, so don't bother
      checking for it
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

5821bd8c

libceph: distinguish two phases of connect sequence · 7593af92

Alex Elder authored May 24, 2012

Currently a ceph connection enters a "CONNECTING" state when it
begins the process of (re-)connecting with its peer.  Once the two
ends have successfully exchanged their banner and addresses, an
additional NEGOTIATING bit is set in the ceph connection's state to
indicate the connection information exhange has begun.  The
CONNECTING bit/state continues to be set during this phase.

Rather than have the CONNECTING state continue while the NEGOTIATING
bit is set, interpret these two phases as distinct states.  In other
words, when NEGOTIATING is set, clear CONNECTING.  That way only
one of them will be active at a time.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

7593af92

libceph: separate banner and connect writes · ab166d5a

Alex Elder authored May 31, 2012

There are two phases in the process of linking together the two ends
of a ceph connection.  The first involves exchanging a banner and
IP addresses, and if that is successful a second phase exchanges
some detail about each side's connection capabilities.

When initiating a connection, the client side now queues to send
its information for both phases of this process at the same time.
This is probably a bit more efficient, but it is slightly messier
from a layering perspective in the code.

So rearrange things so that the client doesn't send the connection
information until it has received and processed the response in the
initial banner phase (in process_banner()).

Move the code (in the (con->sock == NULL) case in try_write()) that
prepares for writing the connection information, delaying doing that
until the banner exchange has completed.  Move the code that begins
the transition to this second "NEGOTIATING" phase out of
process_banner() and into its caller, so preparing to write the
connection information and preparing to read the response are
adjacent to each other.

Finally, preparing to write the connection information now requires
the output kvec to be reset in all cases, so move that into the
prepare_write_connect() and delete it from all callers.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ab166d5a

libceph: define and use an explicit CONNECTED state · e27947c7

Alex Elder authored May 23, 2012

There is no state explicitly defined when a ceph connection is fully
operational.  So define one.

It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.

Be a little more careful when examining the old state when a socket
disconnect event is reported.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

e27947c7

libceph: clear NEGOTIATING when done · 3ec50d18

Alex Elder authored May 23, 2012

A connection state's NEGOTIATING bit gets set while in CONNECTING
state after we have successfully exchanged a ceph banner and IP
addresses with the connection's peer (the server).  But that bit
is not cleared again--at least not until another connection attempt
is initiated.

Instead, clear it as soon as the connection is fully established.
Also, clear it when a socket connection gets prematurely closed
in the midst of establishing a ceph connection (in case we had
reached the point where it was set).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

3ec50d18

libceph: clear CONNECTING in ceph_con_close() · bb9e6bba

Alex Elder authored Jun 20, 2012

A connection that is closed will no longer be connecting.  So
clear the CONNECTING state bit in ceph_con_close().  Similarly,
if the socket has been closed we no longer are in connecting
state (a new connect sequence will need to be initiated).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

bb9e6bba

libceph: don't touch con state in con_close_socket() · 456ea468

Alex Elder authored Jun 20, 2012

In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
then cleared while its shutdown method is called and its reference
gets dropped.

Previously, that flag got set only if it had not already been set,
so setting it in con_close_socket() might have prevented additional
processing being done on a socket being shut down.  We no longer set
SOCK_CLOSED in the socket event routine conditionally, so setting
that bit here no longer provides whatever benefit it might have
provided before.

A race condition could still leave the SOCK_CLOSED bit set even
after we've issued the call to con_close_socket(), so we still clear
that bit after shutting the socket down.  Add a comment explaining
the reason for this.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

456ea468

libceph: just set SOCK_CLOSED when state changes · d65c9e0b

Alex Elder authored Jun 20, 2012

When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED
connection flag bit is set, and if it had not been previously set
queue_con() is called to ensure con_work() will get a chance to
handle the changed state.

con_work() atomically checks--and if set, clears--the SOCK_CLOSED
bit if it was set.  This means that even if the bit were set
repeatedly, the related processing in con_work() only gets called
once per transition of the bit from 0 to 1.

What's important then is that we ensure con_work() gets called *at
least* once when a socket close event occurs, not that it gets
called *exactly* once.

The work queue mechanism already takes care of queueing work
only if it is not already queued, so there's no need for us
to call queue_con() conditionally.

So this patch just makes it so the SOCK_CLOSED flag gets set
unconditionally in ceph_sock_state_change().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

d65c9e0b

libceph: don't change socket state on sock event · 188048bc

Alex Elder authored Jun 20, 2012

Currently the socket state change event handler records an error
message on a connection to distinguish a close while connecting from
a close while a connection was already established.

Changing connection information during handling of a socket event is
not very clean, so instead move this assignment inside con_work(),
where it can be done during normal connection-level processing (and
under protection of the connection mutex as well).

Move the handling of a socket closed event up to the top of the
processing loop in con_work(); there's no point in handling backoff
etc. if we have a newly-closed socket to take care of.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

188048bc

libceph: SOCK_CLOSED is a flag, not a state · a8d00e3c

Alex Elder authored Jun 20, 2012

The following commit changed it so SOCK_CLOSED bit was stored in
a connection's new "flags" field rather than its "state" field.

    libceph: start separating connection flags from state
    commit 928443cd

That bit is used in con_close_socket() to protect against setting an
error message more than once in the socket event handler function.

Unfortunately, the field being operated on in that function was not
updated to be "flags" as it should have been.  This fixes that
error.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

a8d00e3c

libceph: don't use bio_iter as a flag · abdaa6a8

Alex Elder authored Jun 11, 2012

Recently a bug was fixed in which the bio_iter field in a ceph
message was not being properly re-initialized when a message got
re-transmitted:
    commit 43643528
    Author: Yan, Zheng <zheng.z.yan@intel.com>
    rbd: Clear ceph_msg->bio_iter for retransmitted message

We are now only initializing the bio_iter field when we are about to
start to write message data (in prepare_write_message_data()),
rather than every time we are attempting to write any portion of the
message data (in write_partial_msg_pages()).  This means we no
longer need to use the msg->bio_iter field as a flag.

So just don't do that any more.  Trust prepare_write_message_data()
to ensure msg->bio_iter is properly initialized, every time we are
about to begin writing (or re-writing) a message's bio data.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

abdaa6a8

libceph: move init of bio_iter · 572c588e

Alex Elder authored Jun 11, 2012

If a message has a non-null bio pointer, its bio_iter field is
initialized in write_partial_msg_pages() if this has not been done
already.  This is really a one-time setup operation for sending a
message's (bio) data, so move that initialization code into
prepare_write_message_data() which serves that purpose.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

572c588e

libceph: move init_bio_*() functions up · df6ad1f9

Alex Elder authored Jun 11, 2012

Move init_bio_iter() and iter_bio_next() up in their source file so
the'll be defined before they're needed.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

df6ad1f9

libceph: don't mark footer complete before it is · fd154f3c

Alex Elder authored Jun 11, 2012

This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE
flag before the CRC for the data portion (recorded in the footer)
has been completely computed.  Hold off setting the complete flag
until we've decided it's ready to send.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

fd154f3c

libceph: encapsulate advancing msg page · 84ca8fc8

Alex Elder authored Jun 11, 2012

In write_partial_msg_pages(), once all the data from a page has been
sent we advance to the next one.  Put the code that takes care of
this into its own function.

While modifying write_partial_msg_pages(), make its local variable
"in_trail" be Boolean, and use the local variable "msg" (which is
just the connection's current out_msg pointer) consistently.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

84ca8fc8

libceph: encapsulate out message data setup · 739c905b

Alex Elder authored Jun 11, 2012

Move the code that prepares to write the data portion of a message
into its own function.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

739c905b