- 09 Nov, 2012 6 commits
-
-
Philipp Reisner authored
Since the listening socket is open all the time, it was possible to get into stable "initial packet S crossed" loops. * when both sides realize in the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with repeated "packet S crossed" messages. * when both sides do not realize with the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with alternating "packet S crossed" "packet M crossed" messages. In order to break out these stable loops randomize the behaviour if such a crossing of P_INITIAL_DATA or P_INITIAL_META packets is detected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Since now our listening socket is open all the time we will get connection tries of the peer always in. No need to try it three times. This is valid when connecting to older peers as well, it simply increases the probability that the new version DRBD will accept a connection instead that it will establish one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Since the drbd_socket_okay() function itself tests if the the socket is NULL, the explicit test "if (sock.socket && &msock.socket)" was redundent. Apart from that the address opperator ('&') before msock.socket rendered the test pointless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Marek authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
In 8.4, we may have bios spanning two activity log extents. Fixup drbd_al_begin_io() and drbd_al_complete_io() to deal with zero sized bios. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Had to bump the protocol version differently, we are now 101. Added verify_can_do_stop_sector() { protocol >= 97 && protocol != 100; } Also, the return value convention for worker callbacks has changed, we returned "true/false" for "keep the connection up" in 8.3, we return 0 for success and <= for failure in 8.4. Affected: receive_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 08 Nov, 2012 34 commits
-
-
Lars Ellenberg authored
If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The two unused "global flags" in 8.3 are "per volume" flags in 8.4. Still, they are unused, so lose them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
drbd_adm_down() does adm_detach(), which can fail with various error codes, or be interrupted by a signal. The interrupted by signal case was not properly handled, leading to block drbd0: ASSERT( mdev->state.disk == D_DISKLESS && mdev->state.conn == C_STANDALONE ) in drbd/drbd_worker.c and further to destroying objects while still in use, and resulting crashes. Detect the interruption, and take the error path out. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Sometimes, a lower level block device turns into a tar-pit, not completing requests at all, not even doing error completion. We can force-detach from such a tar-pit block device, either by disk-timeout, or by drbdadm detach --force. Queueing for retry only from the request destruction path (kref hit 0) makes it impossible to retry affected read requests from the peer, until the local IO completion happened, as the locally submitted bio holds a reference on the drbd request object. If we can only complete READs when the local completion finally happens, we would not need to force-detach in the first place. Instead, queue for retry where we otherwise had done the error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch This looks cleaner to me, and also gets rid of the other ugly if-inside-case-fall-through. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch The logic for when to get or put a reference is in mod_rq_state(). To not get confused in the freeze/thaw respectively resend/restart paths, or when cleaning up requests waiting for P_BARRIER_ACK, this also introduces additional state flags: RQ_COMPLETION_SUSP, and RQ_EXP_BARR_ACK. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch completion_ref will count pending events necessary for completion. kref is for destruction. This only introduces these new members of struct drbd_request, a followup patch will make actual use of them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The previous commit causes __drbd_make_request() to always return 0. Change it to void. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch READs will be interesting to at most one connection, WRITEs should be interesting for all established connections. Introduce some helper functions to hopefully make this easier to follow. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch DRBD requests (struct drbd_request) are already on the per resource transfer log list, and carry their epoch number. We do not need to additionally link them on other ring lists in other structs. The drbd sender thread can recognize itself when to send a P_BARRIER, by tracking the currently processed epoch, and how many writes have been processed for that epoch. If the epoch of the request to be processed does not match the currently processed epoch, any writes have been processed in it, a P_BARRIER for this last processed epoch is send out first. The new epoch then becomes the currently processed epoch. To not get stuck in drbd_al_begin_io() waiting for P_BARRIER_ACK, the sender thread also needs to handle the case when the current epoch was closed already, but no new requests are queued yet, and send out P_BARRIER as soon as possible. This is done by comparing the per resource "current transfer log epoch" (tconn->current_tle_nr) with the per connection "currently processed epoch number" (tconn->send.current_epoch_nr), while waiting for new requests to be processed in wait_for_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch In 8.4, we don't distinguish between "resource work" and "connection work" yet, we have one worker for both, as we still have only one connection. We only ever used the "data.work", no need to keep the "meta.work" around. Move tconn->data.work to tconn->sender_work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch In 8.4, we still use drbd_queue_work_front(), so in normal operation, we can not dequeue batches, but only single items. Still, followup commits will wake the worker without explicitly queueing a work item, so up() is replaced by a simple wake_up(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked from drbd 9 devel branch. In preparation of multiple connections, the "barrier number" or "epoch number" needs to be tracked per-resource, not per connection. The sequence number space will not be reset anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Meanwhile, this is used to restart failed READ requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The commit drbd: simplify retry path of failed READ requests simplified it too much: it just did not do anything for local read errors. Add the missing req_may_be_completed_not_susp() to the READ_COMPLETED_WITH_ERROR case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
By disabling al-updates one might increase performace. The price for that is that in case a crashed primary (that had al-updates disabled) is reintegraded, it will receive a full-resync instead of a bitmap based resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Andreas Gruenbacher authored
These macros no longer exist in kernel version v3.5-rc1. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
If bm_page_async_io is advised to use a new page for I/O (BM_AIO_COPY_PAGES is set), it will get it from a mempool. Once the mempool has to dip into its reserves the page is not reinitialized, i.e. page->private contains garbage, which will lead to various problems once the I/O completes (dereferences of NULL pointers, the submitting thread getting stuck in D-state, ...). Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Symptom: messages similar to "FIXME asender in bm_change_bits_to, bitmap locked for 'write from resync_finished' by worker" If a resync or verify is finished (or aborted), a full bitmap writeout is triggered. If we have ongoing local IO, the bitmap may still change during that writeout, pending and not yet processed acks may cause bits to be cleared, while new writes may cause bits to be to be set. To fix this, introduce the drbd_bm_write_copy_pages() variant. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If *in that very moment* a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescen the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-