Commit 9c7755af authored by Joe Thornber's avatar Joe Thornber Committed by Greg Kroah-Hartman

dm btree: fix serious bug in btree_split_beneath()

commit bc68d0a4 upstream.

When inserting a new key/value pair into a btree we walk down the spine of
btree nodes performing the following 2 operations:

  i) space for a new entry
  ii) adjusting the first key entry if the new key is lower than any in the node.

If the _root_ node is full, the function btree_split_beneath() allocates 2 new
nodes, and redistibutes the root nodes entries between them.  The root node is
left with 2 entries corresponding to the 2 new nodes.

btree_split_beneath() then adjusts the spine to point to one of the two new
children.  This means the first key is never adjusted if the new key was lower,
ie. operation (ii) gets missed out.  This can result in the new key being
'lost' for a period; until another low valued key is inserted that will uncover
it.

This is a serious bug, and quite hard to make trigger in normal use.  A
reproducing test case ("thin create devices-in-reverse-order") is
available as part of the thin-provision-tools project:
  https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

Fix the issue by changing btree_split_beneath() so it no longer adjusts
the spine.  Instead it unlocks both the new nodes, and lets the main
loop in btree_insert_raw() relock the appropriate one and make any
neccessary adjustments.
Reported-by: default avatarMonty Pavel <monty_pavel@sina.com>
Signed-off-by: default avatarJoe Thornber <thornber@redhat.com>
Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
parent b7bd013a
...@@ -671,23 +671,8 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key) ...@@ -671,23 +671,8 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key)
pn->keys[1] = rn->keys[0]; pn->keys[1] = rn->keys[0];
memcpy_disk(value_ptr(pn, 1), &val, sizeof(__le64)); memcpy_disk(value_ptr(pn, 1), &val, sizeof(__le64));
/*
* rejig the spine. This is ugly, since it knows too
* much about the spine
*/
if (s->nodes[0] != new_parent) {
unlock_block(s->info, s->nodes[0]);
s->nodes[0] = new_parent;
}
if (key < le64_to_cpu(rn->keys[0])) {
unlock_block(s->info, right);
s->nodes[1] = left;
} else {
unlock_block(s->info, left); unlock_block(s->info, left);
s->nodes[1] = right; unlock_block(s->info, right);
}
s->count = 2;
return 0; return 0;
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment