- 21 May, 2010 9 commits
-
-
Tejun Heo authored
Currently, native capacity unlocking is initiated only when a recognized partition extends beyond the end of the disk. However, there are several other unhandled cases where truncated capacity can lead to misdetection of partitions.

* Partition table is fully beyond EOD.
* Partition table is partially beyond EOD (daisy-chained ones).
* Recognized partition starts beyond EOD.

This patch updates the generic partition check code such that all three cases above are handled too. For the first two, @state tracks whether the low-level partition check code tried to read beyond EOD during the partition scan and triggers native capacity unlocking accordingly. The third is now handled similarly to the original unlocking case. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
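For illustration, a minimal sketch of how the generic check code might decide that unlocking is worth a try once a scan has finished. It assumes the parsed_partitions layout of that era plus a new access_beyond_eod flag set by the sector-read helper; the function name is made up:

#include <linux/genhd.h>
#include "check.h"        /* struct parsed_partitions */

/* Hedged sketch, not the actual patch: true if any of the truncation cases
 * above was seen while scanning @disk into @state. */
static bool scan_suggests_truncated_capacity(struct parsed_partitions *state,
                                             struct gendisk *disk)
{
    sector_t capacity = get_capacity(disk);
    int p;

    /* cases 1 and 2: (part of) the partition table itself sat beyond EOD,
     * flagged by the low-level parsers via the sector-read wrapper */
    if (state->access_beyond_eod)
        return true;

    /* case 3 plus the original case: a recognized partition starts or
     * extends beyond EOD */
    for (p = 1; p < state->limit; p++) {
        sector_t from = state->parts[p].from;
        sector_t size = state->parts[p].size;

        if (size && (from >= capacity || from + size > capacity))
            return true;
    }
    return false;
}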
-
Tejun Heo authored
Make the following changes to the partition check code.

* Add ->bdev to struct parsed_partitions.
* Introduce read_part_sector(), a simple wrapper around read_dev_sector() which takes struct parsed_partitions *state instead of @bdev.
* For functions which used to take @state and @bdev, drop @bdev. For functions which used to take @bdev, replace it with @state.
* While updating, drop superfluous checks on NULL state/bdev in ldm.c.

This cleans up the API a bit and enables better handling of IO errors during partition check, as the generic partition check code now has much better visibility into what went wrong in the low-level code paths. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
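A sketch of what such a wrapper might look like, assuming ->bdev was added to struct parsed_partitions; the access_beyond_eod bookkeeping ties in with the previous patch and is an assumption here:

#include <linux/genhd.h>
#include "check.h"        /* struct parsed_partitions, Sector, read_dev_sector() */

/* Sketch: same contract as read_dev_sector(), but the scan state is passed in
 * so the generic code can later see that a read past EOD was attempted. */
static inline unsigned char *read_part_sector(struct parsed_partitions *state,
                                              sector_t n, Sector *p)
{
    if (n >= get_capacity(state->bdev->bd_disk)) {
        state->access_beyond_eod = true;
        return NULL;
    }
    return read_dev_sector(state->bdev, n, p);
}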
-
Tejun Heo authored
bdops->set_capacity() is unnecessarily generic. All that's required is a simple one-way notification to the lower-level driver telling it to try to unlock the native capacity. There's no reason to pass in the target capacity or return the new capacity: the former is always the inherent native capacity and the latter can be handled via the usual device resize / revalidation path. In fact, the current API is always used that way. Replace ->set_capacity() with ->unlock_native_capacity(), which takes only @disk and doesn't return anything. IDE, which is the only current user of the API, is converted accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
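A sketch of how a driver might wire up the new hook and how the block layer side would invoke it; the mydrv_* names and the caller helper are illustrative only:

#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/genhd.h>

/* Illustrative driver side: a one-way "please stop clipping the device"
 * notification, e.g. dropping a host protected area.  Any resulting size
 * change is reported later through the normal resize/revalidation path. */
static void mydrv_unlock_native_capacity(struct gendisk *disk)
{
    /* talk to the hardware/firmware here; nothing is returned */
}

static const struct block_device_operations mydrv_fops = {
    .owner                  = THIS_MODULE,
    .unlock_native_capacity = mydrv_unlock_native_capacity,
};

/* Illustrative caller side, e.g. from the partition rescan path */
static void try_unlock_native_capacity(struct gendisk *disk)
{
    const struct block_device_operations *bdops = disk->fops;

    if (bdops->unlock_native_capacity)
        bdops->unlock_native_capacity(disk);
}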
-
Tejun Heo authored
Device resize via ->set_capacity() can reveal new partitions (e.g. in chained partition table formats such as dos extended parts). Restart the partition scan from the beginning after resizing a device. This change also makes libata always revalidate the disk after resize, which makes the lower-layer native capacity unlocking implementation simpler and more robust, as resize can be handled in the usual path. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
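Schematically, the rescan path can then loop once the device has grown; apart from check_partition() and get_capacity(), the helpers below are assumed names building on the earlier sketches:

#include <linux/slab.h>        /* kfree() */
#include <linux/genhd.h>
#include "check.h"

/* Illustrative sketch: rescan from scratch if unlocking actually grew the
 * disk, so partitions (e.g. chained extended ones) past the old end of the
 * device are picked up as well. */
static void scan_with_unlock_retry(struct gendisk *disk, struct block_device *bdev)
{
    struct parsed_partitions *state;
    sector_t capacity;

rescan:
    capacity = get_capacity(disk);
    state = check_partition(disk, bdev);
    if (state && scan_suggests_truncated_capacity(state, disk)) {  /* see sketch above */
        try_unlock_native_capacity(disk);                          /* assumed helper */
        if (get_capacity(disk) > capacity) {
            kfree(state);
            goto rescan;        /* device grew: start over */
        }
    }
    /* ...register the partitions recorded in @state with the disk... */
}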
-
Tejun Heo authored
invalidate_bdev() should release all page cache pages which are clean and not being used; however, if some pages are still in the percpu LRU add caches on other cpus, those pages are considered in use and don't get released. Fix it by calling lru_add_drain_all() before trying to invalidate pages. This problem was discovered while testing block automatic native capacity unlocking. Null pages which were read before automatic unlocking didn't get released by invalidate_bdev() and ended up interfering with the partition scan after unlocking. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
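Roughly, the fixed helper ends up doing the following (a sketch along the lines of fs/block_dev.c, not a verbatim copy):

#include <linux/fs.h>            /* invalidate_mapping_pages() */
#include <linux/buffer_head.h>   /* invalidate_bh_lrus() */
#include <linux/swap.h>          /* lru_add_drain_all() */

/* Sketch: drain the per-cpu LRU-add pagevecs first, so clean and unused pages
 * still parked there can actually be invalidated. */
void invalidate_bdev(struct block_device *bdev)
{
    struct address_space *mapping = bdev->bd_inode->i_mapping;

    if (mapping->nrpages == 0)
        return;

    invalidate_bh_lrus();
    lru_add_drain_all();    /* flush percpu LRU add caches on all cpus */
    invalidate_mapping_pages(mapping, 0, -1);
}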
-
Paul E. McKenney authored
Remove all rcu head inits. We don't care about the RCU head state before passing it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can keep track of objects on stack. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Commit 69b62d01 fixed up most of the places where we would enter busy schedule() spins when disabling the periodic background writeback. This fixes up the sb timer so that it doesn't get hammered on with the delay disabled, and ensures that it gets rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs gets modified. bdi_forker_task() also needs to check for !dirty_writeback_centisecs and use schedule() appropriately; fix that up too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
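The shape of the fix can be pictured like this; the timer, variable, and function names below are illustrative, not the exact identifiers from the commit:

#include <linux/timer.h>
#include <linux/jiffies.h>

static struct timer_list sync_supers_timer;       /* illustrative */
extern unsigned int dirty_writeback_interval;     /* centisecs sysctl value */

/* Only arm the periodic writeback timer when the interval is non-zero, so a
 * disabled /proc/sys/vm/dirty_writeback_centisecs no longer keeps firing it. */
static void arm_supers_timer(void)
{
    if (!dirty_writeback_interval)
        return;
    mod_timer(&sync_supers_timer,
              jiffies + msecs_to_jiffies(dirty_writeback_interval * 10));
}

/* Called after the sysctl handler has stored a new value, so the timer gets
 * re-armed when periodic writeback is switched back on. */
static void dirty_writeback_interval_changed(void)
{
    arm_supers_timer();
}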
-
Jens Axboe authored
Calling schedule() without first setting the task state to non-running makes it return immediately, so ensure that we set the state properly and check our sleep conditions after doing so. This is a fixup for commit 69b62d01. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
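The canonical pattern (a generic sketch, not the exact hunk from the commit; the context type and helper are made up) is to publish the sleeping state before testing the condition:

#include <linux/sched.h>

/* Sketch: set the state first, then re-check the sleep condition, then call
 * schedule().  Leaving the task in TASK_RUNNING (or setting the state after
 * the check) lets schedule() return immediately and turns the wait into a
 * busy loop. */
static void wait_for_work(struct wb_context *ctx)      /* made-up type */
{
    set_current_state(TASK_INTERRUPTIBLE);
    if (!work_pending_for(ctx))                         /* made-up helper */
        schedule();
    __set_current_state(TASK_RUNNING);
}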
-
Jens Axboe authored
Even if the writeout itself isn't a data integrity operation, we need to ensure that the caller doesn't drop the sb umount sem before we have actually done the writeback. This is a fixup for commit e913fc82. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 18 May, 2010 6 commits
-
-
Julia Lawall authored
Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
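The same transformation in plain C (a generic example, not a specific hunk from the driver):

#include <linux/slab.h>
#include <linux/string.h>

/* before: allocate, then zero by hand */
static void *alloc_zeroed_old(size_t size)
{
    void *buf = kmalloc(size, GFP_KERNEL);

    if (buf == NULL)
        return NULL;
    memset(buf, 0, size);
    return buf;
}

/* after: kzalloc() allocates and zeroes in one call */
static void *alloc_zeroed_new(size_t size)
{
    return kzalloc(size, GFP_KERNEL);
}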
-
Philipp Reisner authored
The choice was to either delay creation of the new UUID until IO got thawed or to delay it until the first IO request. Both are correct; the latter is more friendly to users of dual-primary setups that actually only write on one side. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
If we detect late (= after grabbing mdev->req_lock) that IO got frozen, we return 1 to generic_make_request(), which will simply retry the request for that bio. In the subsequent call of generic_make_request() into drbd_make_request_26() we sleep in inc_ap_bio(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Now that the peer may handle multi-bio EEs, we can ignore the peer's limit and concentrate on the limits of the local IO stack. This is safe across drbd protocol versions, as our queue_max_sectors() will be adjusted accordingly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
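As an illustration of the idea (names and the exact drbd plumbing are assumptions), a stacked driver deriving its limit from the local backing queue rather than the peer's advertised maximum might do:

#include <linux/kernel.h>
#include <linux/blkdev.h>

/* Illustrative sketch: cap our queue's max request size by the local backing
 * device only; the peer is assumed to split over-sized requests itself. */
static void setup_stacked_queue_limits(struct request_queue *q,
                                       struct request_queue *backing_q,
                                       unsigned int local_max_hw_sectors)
{
    unsigned int max_hw_sectors = min(local_max_hw_sectors,
                                      queue_max_hw_sectors(backing_q));

    blk_queue_max_hw_sectors(q, max_hw_sectors);
}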
-
Lars Ellenberg authored
This should allow for better background resync performance. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This should allow for better performance if the lower level IO stack of the peers differs in limits exposed either via the queue, or via some merge_bvec_fn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 17 May, 2010 25 commits
-
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
* Only send delay_probes with protocol 93 or newer.
* drbd_send_delay_probes() is called only from worker context, so no atomic_t is needed for delay_seq.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
* Mention P_DELAY_PROBE in the packet naming array.
* Do not corrupt the mdev->data.work list in case the timer goes off before delay_probe_work got handled by the worker.
* Do not mod_timer() twice for a single delay_probe pair.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
In a setup with a high-bandwidth, high-latency network, possibly involving deep queues in routers, it is beneficial to only fill those queues up to a limited extent with resync data. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
To reasonably control resync speed over drbd-proxy connections, drbd has to measure the current delay of packets transmitted over the (possibly congested) data socket vs the meta-data socket. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Delay_probes are new packets in the DRBD protocol which allow DRBD to know the current delay packets have on the data socket (relative to the meta data socket). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
The "surplus" bits of the old (smaller) bitmap must be clean in case of online-grow without resync. Note: Reverted 67ae8b80d4a116ab3b7094eb3723506b20c06dff as well, since the lines added by this patch are redundant. The bits get set by the bm_set_surplus(b) call before that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Adam Gandelman authored
Some wish to be notified of all instances of split brain, not just those that go unresolved. The initial-split-brain handler is called to notify someone upon detection of any split brain condition, even if auto-recovery policies are configured. Signed-off-by: Adam Gandelman <adam.gandelman@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The condition does not fit the comment (I may well be Primary, even if I lost the disk earlier and now the connection). And this is caught below anyway, where it also gets logged. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Even if it should never happen when the peer behaves, we need to double-check and not even attempt access beyond the end of the device. It would usually be caught by lower layers, resulting in an "IO error", but it may also end up in the internal meta data area. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
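A minimal sketch of such a bounds check (generic, not the exact drbd helper; names are made up):

#include <linux/types.h>

/* Sketch: reject a peer-requested (sector, size) pair that is not properly
 * sized or would touch sectors at or past the end of our backing device. */
static bool request_within_device(sector_t capacity, sector_t sector,
                                  unsigned int size)
{
    unsigned int nr_sectors = size >> 9;

    if (size == 0 || (size & 0x1ff))
        return false;               /* not a multiple of 512 bytes */
    if (sector >= capacity)
        return false;               /* starts beyond the end */
    if (nr_sectors > capacity - sector)
        return false;               /* runs past the end */
    return true;
}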
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
In case both nodes are "inconsistent", invalidate would have started a resync anyway, without a chance to ever succeed, just filling the logs with warning messages. Simply disallow that state change, re-using the SS_NO_UP_TO_DATE_DISK return value. This also changes the corresponding error string to "Need access to UpToDate Data" -- I found the "Refusing to be Primary without at least one UpToDate disk" answer misleading in some situations anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Don't forget to drain the digest in case we cannot satisfy a checksum-based resync or online-verify request. Failing to do so would additionally cause a protocol error, dropping the connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
block_id may be ID_SYNCER, as well as checksum based resync request magic, or online verify magic. Let's just drop that ASSERT. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-