- 17 May, 2012 5 commits
-
-
Alex Elder authored
prepare_write_connect() prepares a connect message, then sets WRITE_PENDING on the connection. Then *after* this, it calls prepare_connect_authorizer(), which updates the content of the connection buffer already queued for sending. It's also possible it will result in prepare_write_connect() returning -EAGAIN despite the WRITE_PENDING big getting set. Fix this by preparing the connect authorizer first, setting the WRITE_PENDING bit only after that is done. Partially addresses http://tracker.newdream.net/issues/2424Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Alex Elder authored
In all cases, the value passed as the msgr argument to prepare_write_connect() is just con->msgr. Just get the msgr value from the ceph connection and drop the unneeded argument. The only msgr passed to prepare_write_banner() is also therefore just the one from con->msgr, so change that function to drop the msgr argument as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Alex Elder authored
prepare_write_connect() has an argument indicating whether a banner should be sent out before sending out a connection message. It's only ever set in one of its callers, so move the code that arranges to send the banner into that caller and drop the "include_banner" argument from prepare_write_connect(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Alex Elder authored
Reset a connection's kvec fields in the caller rather than in prepare_write_connect(). This ends up repeating a few lines of code but it's improving the separation between distinct operations on the connection, which we can take advantage of later. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Alex Elder authored
Move the kvec reset for a connection out of prepare_write_banner and into its only caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
- 16 May, 2012 2 commits
-
-
Sage Weil authored
Old users may not expect EINVAL, and there is no clear user-visibile behavior change now that we ignore it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
Sage Weil authored
When we are setting a new layout, fully initialize the structure: - zero it out - always set preferred_osd to -1 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
-
- 14 May, 2012 13 commits
-
-
Alex Elder authored
Make the second argument to read_partial() be the ending input byte position rather than the beginning offset it now represents. This amounts to moving the addition "to + size" into the caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Alex Elder authored
read_partial() always increases whatever "to" value is supplied by adding the requested size to it, and that's the only thing it does with that pointed-to value. Do that pointer advance in the caller (and then only when the updated value will be subsequently used), and change the "to" parameter to be an in-only and non-pointer value. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Alex Elder authored
There are two blocks of code in read_partial_message()--those that read the header and footer of the message--that can be replaced by a call to read_partial(). Do that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
-
Josh Durgin authored
Each attribute is prefixed with "snap_". Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Josh Durgin authored
This function rereads the entire header and handles any changes in it, not just changes in snapshots. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Josh Durgin authored
Snapshot sizes should be the same type as regular image sizes. This only affects their displayed size in sysfs, not the reported size of an actual block device sizes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Josh Durgin authored
The snapid parameters passed to rbd_do_op() and rbd_req_sync_op() are now always either a valid snapid or an explicit CEPH_NOSNAP. [elder@dreamhost.com: Rephrased the description] Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Josh Durgin authored
When a device was open at a snapshot, and snapshots were deleted or added, data from the wrong snapshot could be read. Instead of assuming the snap context is constant, store the actual snap id when the device is initialized, and rely on the OSDs to signal an error if we try reading from a snapshot that was deleted. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Josh Durgin authored
This is updated whenever a snapshot is added or deleted, and the snapc pointer is changed with every refresh of the header. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
-
Xi Wang authored
ondisk->snap_count is read from disk via rbd_req_sync_read() and thus needs validation. Otherwise, a bogus `snap_count' could overflow the kmalloc() size, leading to memory corruption. Also use `u32' consistently for `snap_count'. [elder@dreamhost.com: changed to use UINT_MAX rather than ULONG_MAX] Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>
-
Dan Carpenter authored
We should use the gfp_flags that the caller specified instead of GFP_KERNEL here. There is only one caller and it uses GFP_KERNEL, so this change is just a cleanup and doesn't change how the code works. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@dreamhost.com>
-
Xi Wang authored
Given a large n, the bounds check (*p + n > end) can be bypassed due to pointer wraparound. A safer check is (n > end - *p). [elder@dreamhost.com: inverted test and renamed ceph_has_room()] Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>
-
Alex Elder authored
From Al Viro <viro@zeniv.linux.org.uk> Al Viro noticed that we were using a non-cpu-encoded value in a switch statement in osd_req_encode_op(). The result would clearly not work correctly on a big-endian machine. Signed-off-by: Alex Elder <elder@dreamhost.com>
-
- 07 May, 2012 11 commits
-
-
Sage Weil authored
If we get an error code from crush_do_rule(), print an error to the console. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4. Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
Fix the node weight lookup for tree buckets by using a correct accessor. Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
These were used for the ill-fated forcefeed feature. Remove them. Reflects ceph.git commit ebdf80edfecfbd5a842b71fbe5732857994380c1. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
Remove forcefeed functionality from CRUSH. This is an ugly misfeature that is mostly useless and unused. Remove it. Reflects ceph.git commit ed974b5000f2851207d860a651809af4a1867942. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com> Conflicts: net/ceph/crush/mapper.c
-
Sage Weil authored
Use a temporary variable here to avoid repeated array lookups and clean up the code a bit. This reflects ceph.git commit 6b5be27634ad307b471a5bf0db85c4f5c834885f. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
If we get a map that doesn't make sense, error out or ignore the badness instead of BUGging out. This reflects the ceph.git commits 9895f0bff7dc68e9b49b572613d242315fb11b6c and 8ded26472058d5205803f244c2f33cb6cb10de79. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
This small adjustment reflects a change that was made in ceph.git commit af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago. An N-1 search is not exhaustive. Fixed ceph.git bug #1594. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
Move various types from int -> __u32 (or similar), and add const as appropriate. This reflects changes that have been present in the userland implementation for some time. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
Both of these methods perform similar checks; move that code to a helper so that we can ensure the checks are consistent. Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
Sage Weil authored
This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
-
- 05 Apr, 2012 1 commit
-
-
Alex Elder authored
A recent change made changes to the rbd_client_list be protected by a spinlock. Unfortunately in rbd_put_client(), the lock is taken before possibly dropping the last reference to an rbd_client, and on the last reference that eventually calls flush_workqueue() which can sleep. The problem was flagged by a debug spinlock warning: BUG: spinlock wrong CPU on CPU#3, rbd/27814 The solution is to move the spinlock acquisition and release inside rbd_client_release(), which is the spot where it's really needed for protecting the removal of the rbd_client from the client list. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
- 22 Mar, 2012 8 commits
-
-
Josh Durgin authored
A new temporary header is allocated each time the header changes, but only the changed properties are copied over. We don't need a new semaphore for each header update. This addresses http://tracker.newdream.net/issues/2174Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com>
-
Alex Elder authored
In ceph_vxattrcb_file_layout(), there is a check to determine whether a preferred PG should be formatted into the output buffer. That check assumes that a preferred PG number of 0 indicates "no preference," but that is wrong. No preference is indicated by a negative (specifically, -1) PG number. In addition, if that condition yields true, the preferred value is formatted into a sized buffer, but the size consumed by the earlier snprintf() call is not accounted for, opening up the possibilty of a buffer overrun. Finally, in ceph_vxattrcb_dir_rctime() where the nanoseconds part of the time displayed did not include leading 0's, which led to erroneous (sub-second portion of) time values being shown. This fixes these three issues: http://tracker.newdream.net/issues/2155 http://tracker.newdream.net/issues/2156 http://tracker.newdream.net/issues/2157Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
In write_partial_msg_pages(), every case now does an identical call to kmap(page). Instead, just call it once inside the CRC-computing block where it's needed. Move the definition of kaddr inside that block, and make it a (char *) to ensure portable pointer arithmetic. We still don't kunmap() it until after the sendpage() call, in case that also ends up needing to use the mapping. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
In write_partial_msg_pages() there is a local variable used to track the starting offset within a bio segment to use. Its name, "page_shift" defies the Linux convention of using that name for log-base-2(page size). Since it's only used in the bio case rename it "bio_offset". Use it along with the page_pos field to compute the memory offset when computing CRC's in that function. This makes the bio case match the others more closely. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
There's not a lot of benefit to zero_page_address, which basically holds a mapping of the zero page through the life of the messenger module. Even with our own mapping, the sendpage interface where it's used may need to kmap() it again. It's almost certain to be in low memory anyway. So stop treating the zero page specially in write_partial_msg_pages() and just get rid of zero_page_address entirely. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
Make ceph_tcp_sendpage() be the only place kernel_sendpage() is used, by using this helper in write_partial_msg_pages(). Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
If a message queued for send gets revoked, zeroes are sent over the wire instead of any unsent data. This is done by constructing a message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg(). Since we are already working with a page in this case we can use the sendpage interface instead. Create a new ceph_tcp_sendpage() helper that sets up flags to match the way ceph_tcp_sendmsg() does now. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-
Alex Elder authored
CRC's are computed for all messages between ceph entities. The CRC computation for the data portion of message can optionally be disabled using the "nocrc" (common) ceph option. The default is for CRC computation for the data portion to be enabled. Unfortunately, the code that implements this feature interprets the feature flag wrong, meaning that by default the CRC's have *not* been computed (or checked) for the data portion of messages unless the "nocrc" option was supplied. Fix this, in write_partial_msg_pages() and read_partial_message(). Also change the flag variable in write_partial_msg_pages() to be "no_datacrc" to match the usage elsewhere in the file. This fixes http://tracker.newdream.net/issues/2064Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
-