Commit 7a771cea authored by Linus Torvalds

Merge tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Fix dm-raid transient device failure processing and other smaller
   tweaks.

 - Add journal support to the DM raid target to close the 'write hole'
   on raid 4/5/6.

 - Fix dm-cache corruption, due to a rounding bug, when the cache size
   exceeds 2TB.

 - Add 'metadata2' feature to dm-cache to separate the dirty bitset out
   from other cache metadata. This improves speed of shutting down a
   large cache device (which implies writing out dirty bits).

 - Fix a memory leak during dm-stats data structure destruction.

 - Fix a DM multipath round-robin path selector performance regression
   that was caused by less precise balancing across all paths.

 - Lastly, introduce a DM core fix for a long-standing DM snapshot
   deadlock that is rooted in the complexity of the device stack used in
   conjunction with block core maintaining bios on current->bio_list to
   manage recursion in generic_make_request(). A more comprehensive fix
   to block core (and its hook in the cpu scheduler) would be wonderful
   but this DM-specific fix is pragmatic considering how difficult it
   has been to make progress on a generic fix.

* tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
  dm: flush queued bios when process blocks to avoid deadlock
  dm round robin: revert "use percpu 'repeat_count' and 'current_path'"
  dm stats: fix a leaked s->histogram_boundaries array
  dm space map metadata: constify dm_space_map structures
  dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()
  dm persistent data: add cursor skip functions to the cursor APIs
  dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2
  dm bitset: add dm_bitset_new()
  dm cache metadata: name the cache block that couldn't be loaded
  dm cache metadata: add "metadata2" feature
  dm cache metadata: use bitset cursor api to load discard bitset
  dm bitset: introduce cursor api
  dm btree: use GFP_NOFS in dm_btree_del()
  dm space map common: memcpy the disk root to ensure it's arch aligned
  dm block manager: add unlikely() annotations on dm_bufio error paths
  dm cache: fix corruption seen when using cache > 2TB
  dm raid: cleanup awkward branching in raid_message() option processing
  dm raid: use mddev rather than rdev->mddev
  dm raid: use read_disk_sb() throughout
  dm raid: add raid4/5/6 journaling support
  ...
parents e67bd12d d67a5f4b
@@ -207,6 +207,10 @@ Optional feature arguments are:
       block, then the cache block is invalidated.
       To enable passthrough mode the cache must be clean.
 
+   metadata2	: use version 2 of the metadata.  This stores the dirty bits
+		  in a separate btree, which improves speed of shutting
+		  down the cache.
+
 A policy called 'default' is always registered.  This is an alias for
 the policy we currently think is giving best all round performance.

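To make the new feature argument concrete, a cache table line that enables it might look like the following; the device names, sizes and policy are invented for illustration and are not part of this commit:

    0 41943040 cache /dev/mapper/fast-meta /dev/mapper/fast /dev/mapper/slow 512 1 metadata2 smq 0

Here '1 metadata2' is the feature-argument count followed by the new keyword; it can be combined with the existing io-mode arguments, e.g. '2 writethrough metadata2'.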
@@ -161,6 +161,15 @@ The target is named "raid" and it accepts the following parameters:
	the RAID type (i.e. the allocation algorithm) as well, e.g.
	changing from raid5_ls to raid5_n.
 
+	[journal_dev <dev>]
+		This option adds a journal device to raid4/5/6 raid sets and
+		uses it to close the 'write hole' caused by the non-atomic updates
+		to the component devices which can cause data loss during recovery.
+		The journal device is used as writethrough thus causing writes to
+		be throttled versus non-journaled raid4/5/6 sets.
+		Takeover/reshape is not possible with a raid4/5/6 journal device;
+		it has to be deconfigured before requesting these.
+
 <#raid_devs>: The number of devices composing the array.
	Each device consists of two entries.  The first is the device
	containing the metadata (if any); the second is the one containing the

@@ -245,6 +254,9 @@ recovery.  Here is a fuller description of the individual fields:
	<data_offset>	The current data offset to the start of the user data on
			each component device of a raid set (see the respective
			raid parameter to support out-of-place reshaping).
+	<journal_char>	'A' - active raid4/5/6 journal device.
+			'D' - dead journal device.
+			'-' - no journal device.
 
 Message Interface

@@ -314,3 +326,8 @@ Version History
 1.9.0   Add support for RAID level takeover/reshape/region size
	 and set size reduction.
 1.9.1   Fix activation of existing RAID 4/10 mapped devices
+1.9.2   Don't emit '- -' on the status table line in case the constructor
+	 fails reading a superblock.  Correctly emit 'maj:min1 maj:min2' and
+	 'D' on the status line.  If '- -' is passed into the constructor, emit
+	 '- -' on the table line and '-' as the status line health character.
+1.10.0  Add support for raid4/5/6 journal device

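As a purely illustrative sketch of the new journal_dev parameter (device names, sizes and chunk size below are invented, not taken from this commit), a raid5 table line with a write-journal might look like:

    0 976562500 raid raid5_ls 3 128 journal_dev /dev/sdf1 3 /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2 /dev/sdd1 /dev/sdd2

The <journal_char> status field described above then reports whether that journal device is active ('A'), dead ('D') or absent ('-').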
@@ -45,18 +45,20 @@
  * As these various flags are defined they should be added to the
  * following masks.
  */
 #define DM_CACHE_FEATURE_COMPAT_SUPP	  0UL
 #define DM_CACHE_FEATURE_COMPAT_RO_SUPP	  0UL
 #define DM_CACHE_FEATURE_INCOMPAT_SUPP	  0UL
 
 /*
- * Reopens or creates a new, empty metadata volume.
- * Returns an ERR_PTR on failure.
+ * Reopens or creates a new, empty metadata volume.  Returns an ERR_PTR on
+ * failure.  If reopening then features must match.
  */
 struct dm_cache_metadata *dm_cache_metadata_open(struct block_device *bdev,
						 sector_t data_block_size,
						 bool may_format_device,
-						 size_t policy_hint_size);
+						 size_t policy_hint_size,
+						 unsigned metadata_version);
 
 void dm_cache_metadata_close(struct dm_cache_metadata *cmd);

@@ -91,7 +93,8 @@ int dm_cache_load_mappings(struct dm_cache_metadata *cmd,
			   load_mapping_fn fn,
			   void *context);
 
-int dm_cache_set_dirty(struct dm_cache_metadata *cmd, dm_cblock_t cblock, bool dirty);
+int dm_cache_set_dirty_bits(struct dm_cache_metadata *cmd,
+			    unsigned nr_bits, unsigned long *bits);
 
 struct dm_cache_statistics {
	uint32_t read_hits;

@@ -179,6 +179,7 @@ enum cache_io_mode {
 struct cache_features {
	enum cache_metadata_mode mode;
	enum cache_io_mode io_mode;
+	unsigned metadata_version;
 };
 
 struct cache_stats {

@@ -248,7 +249,7 @@ struct cache {
	/*
	 * Fields for converting from sectors to blocks.
	 */
-	uint32_t sectors_per_block;
+	sector_t sectors_per_block;
	int sectors_per_block_shift;
 
	spinlock_t lock;

@@ -2534,13 +2535,14 @@ static void init_features(struct cache_features *cf)
 {
	cf->mode = CM_WRITE;
	cf->io_mode = CM_IO_WRITEBACK;
+	cf->metadata_version = 1;
 }
 
 static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
			  char **error)
 {
	static struct dm_arg _args[] = {
-		{0, 1, "Invalid number of cache feature arguments"},
+		{0, 2, "Invalid number of cache feature arguments"},
	};
 
	int r;

@@ -2566,6 +2568,9 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
		else if (!strcasecmp(arg, "passthrough"))
			cf->io_mode = CM_IO_PASSTHROUGH;
 
+		else if (!strcasecmp(arg, "metadata2"))
+			cf->metadata_version = 2;
+
		else {
			*error = "Unrecognised cache feature requested";
			return -EINVAL;

@@ -2820,7 +2825,8 @@ static int cache_create(struct cache_args *ca, struct cache **result)
	cmd = dm_cache_metadata_open(cache->metadata_dev->bdev,
				     ca->block_size, may_format,
-				     dm_cache_policy_get_hint_size(cache->policy));
+				     dm_cache_policy_get_hint_size(cache->policy),
+				     ca->features.metadata_version);
	if (IS_ERR(cmd)) {
		*error = "Error creating metadata object";
		r = PTR_ERR(cmd);

@@ -3165,21 +3171,16 @@ static int cache_end_io(struct dm_target *ti, struct bio *bio, int error)
 static int write_dirty_bitset(struct cache *cache)
 {
-	unsigned i, r;
+	int r;
 
	if (get_cache_mode(cache) >= CM_READ_ONLY)
		return -EINVAL;
 
-	for (i = 0; i < from_cblock(cache->cache_size); i++) {
-		r = dm_cache_set_dirty(cache->cmd, to_cblock(i),
-				       is_dirty(cache, to_cblock(i)));
-		if (r) {
-			metadata_operation_failed(cache, "dm_cache_set_dirty", r);
-			return r;
-		}
-	}
+	r = dm_cache_set_dirty_bits(cache->cmd, from_cblock(cache->cache_size), cache->dirty_bitset);
+	if (r)
+		metadata_operation_failed(cache, "dm_cache_set_dirty_bits", r);
 
-	return 0;
+	return r;
 }
 
 static int write_discard_bitset(struct cache *cache)

@@ -3540,11 +3541,11 @@ static void cache_status(struct dm_target *ti, status_type_t type,
	residency = policy_residency(cache->policy);
 
-	DMEMIT("%u %llu/%llu %u %llu/%llu %u %u %u %u %u %u %lu ",
+	DMEMIT("%u %llu/%llu %llu %llu/%llu %u %u %u %u %u %u %lu ",
	       (unsigned)DM_CACHE_METADATA_BLOCK_SIZE,
	       (unsigned long long)(nr_blocks_metadata - nr_free_blocks_metadata),
	       (unsigned long long)nr_blocks_metadata,
-	       cache->sectors_per_block,
+	       (unsigned long long)cache->sectors_per_block,
	       (unsigned long long) from_cblock(residency),
	       (unsigned long long) from_cblock(cache->cache_size),
	       (unsigned) atomic_read(&cache->stats.read_hit),

@@ -3555,14 +3556,19 @@ static void cache_status(struct dm_target *ti, status_type_t type,
	       (unsigned) atomic_read(&cache->stats.promotion),
	       (unsigned long) atomic_read(&cache->nr_dirty));
 
+	if (cache->features.metadata_version == 2)
+		DMEMIT("2 metadata2 ");
+	else
+		DMEMIT("1 ");
+
	if (writethrough_mode(&cache->features))
-		DMEMIT("1 writethrough ");
+		DMEMIT("writethrough ");
 
	else if (passthrough_mode(&cache->features))
-		DMEMIT("1 passthrough ");
+		DMEMIT("passthrough ");
 
	else if (writeback_mode(&cache->features))
-		DMEMIT("1 writeback ");
+		DMEMIT("writeback ");
 
	else {
		DMERR("%s: internal error: unknown io mode: %d",

@@ -3810,7 +3816,7 @@ static void cache_io_hints(struct dm_target *ti, struct queue_limits *limits)
 static struct target_type cache_target = {
	.name = "cache",
-	.version = {1, 9, 0},
+	.version = {1, 10, 0},
	.module = THIS_MODULE,
	.ctr = cache_ctr,
	.dtr = cache_dtr,

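To make the status-line change above concrete (values invented): a version-1 cache in writeback mode still reports its feature section as "1 writeback", while a cache created with the new feature reports, for example, "2 metadata2 writethrough"; the leading count now covers all feature arguments and the metadata version token appears only for format 2 metadata.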
@@ -17,8 +17,8 @@
 #include <linux/module.h>
 
 #define DM_MSG_PREFIX "multipath round-robin"
-#define RR_MIN_IO     1000
-#define RR_VERSION    "1.1.0"
+#define RR_MIN_IO     1
+#define RR_VERSION    "1.2.0"
 
 /*-----------------------------------------------------------------
  * Path-handling code, paths are held in lists

@@ -47,44 +47,19 @@ struct selector {
	struct list_head valid_paths;
	struct list_head invalid_paths;
	spinlock_t lock;
-	struct dm_path * __percpu *current_path;
-	struct percpu_counter repeat_count;
 };
 
-static void set_percpu_current_path(struct selector *s, struct dm_path *path)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		*per_cpu_ptr(s->current_path, cpu) = path;
-}
-
 static struct selector *alloc_selector(void)
 {
	struct selector *s = kmalloc(sizeof(*s), GFP_KERNEL);
 
-	if (!s)
-		return NULL;
-
-	INIT_LIST_HEAD(&s->valid_paths);
-	INIT_LIST_HEAD(&s->invalid_paths);
-	spin_lock_init(&s->lock);
-
-	s->current_path = alloc_percpu(struct dm_path *);
-	if (!s->current_path)
-		goto out_current_path;
-	set_percpu_current_path(s, NULL);
-
-	if (percpu_counter_init(&s->repeat_count, 0, GFP_KERNEL))
-		goto out_repeat_count;
+	if (s) {
+		INIT_LIST_HEAD(&s->valid_paths);
+		INIT_LIST_HEAD(&s->invalid_paths);
+		spin_lock_init(&s->lock);
+	}
 
	return s;
-
-out_repeat_count:
-	free_percpu(s->current_path);
-
-out_current_path:
-	kfree(s);
-
-	return NULL;;
 }
 
 static int rr_create(struct path_selector *ps, unsigned argc, char **argv)

@@ -105,8 +80,6 @@ static void rr_destroy(struct path_selector *ps)
	free_paths(&s->valid_paths);
	free_paths(&s->invalid_paths);
-	free_percpu(s->current_path);
-	percpu_counter_destroy(&s->repeat_count);
	kfree(s);
	ps->context = NULL;
 }

@@ -157,6 +130,11 @@ static int rr_add_path(struct path_selector *ps, struct dm_path *path,
		return -EINVAL;
	}
 
+	if (repeat_count > 1) {
+		DMWARN_LIMIT("repeat_count > 1 is deprecated, using 1 instead");
+		repeat_count = 1;
+	}
+
	/* allocate the path */
	pi = kmalloc(sizeof(*pi), GFP_KERNEL);
	if (!pi) {

@@ -183,9 +161,6 @@ static void rr_fail_path(struct path_selector *ps, struct dm_path *p)
	struct path_info *pi = p->pscontext;
 
	spin_lock_irqsave(&s->lock, flags);
-	if (p == *this_cpu_ptr(s->current_path))
-		set_percpu_current_path(s, NULL);
-
	list_move(&pi->list, &s->invalid_paths);
	spin_unlock_irqrestore(&s->lock, flags);
 }

@@ -208,29 +183,15 @@ static struct dm_path *rr_select_path(struct path_selector *ps, size_t nr_bytes)
	unsigned long flags;
	struct selector *s = ps->context;
	struct path_info *pi = NULL;
-	struct dm_path *current_path = NULL;
-
-	local_irq_save(flags);
-	current_path = *this_cpu_ptr(s->current_path);
-	if (current_path) {
-		percpu_counter_dec(&s->repeat_count);
-		if (percpu_counter_read_positive(&s->repeat_count) > 0) {
-			local_irq_restore(flags);
-			return current_path;
-		}
-	}
 
-	spin_lock(&s->lock);
+	spin_lock_irqsave(&s->lock, flags);
	if (!list_empty(&s->valid_paths)) {
		pi = list_entry(s->valid_paths.next, struct path_info, list);
		list_move_tail(&pi->list, &s->valid_paths);
-		percpu_counter_set(&s->repeat_count, pi->repeat_count);
-		set_percpu_current_path(s, pi->path);
-		current_path = pi->path;
	}
	spin_unlock_irqrestore(&s->lock, flags);
 
-	return current_path;
+	return pi ? pi->path : NULL;
 }
 
 static struct path_selector_type rr_ps = {

@@ -175,6 +175,7 @@ static void dm_stat_free(struct rcu_head *head)
	int cpu;
	struct dm_stat *s = container_of(head, struct dm_stat, rcu_head);
 
+	kfree(s->histogram_boundaries);
	kfree(s->program_id);
	kfree(s->aux_data);
	for_each_possible_cpu(cpu) {

@@ -974,10 +974,61 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
+/*
+ * Flush current->bio_list when the target map method blocks.
+ * This fixes deadlocks in snapshot and possibly in other targets.
+ */
+struct dm_offload {
+	struct blk_plug plug;
+	struct blk_plug_cb cb;
+};
+
+static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule)
+{
+	struct dm_offload *o = container_of(cb, struct dm_offload, cb);
+	struct bio_list list;
+	struct bio *bio;
+
+	INIT_LIST_HEAD(&o->cb.list);
+
+	if (unlikely(!current->bio_list))
+		return;
+
+	list = *current->bio_list;
+	bio_list_init(current->bio_list);
+
+	while ((bio = bio_list_pop(&list))) {
+		struct bio_set *bs = bio->bi_pool;
+		if (unlikely(!bs) || bs == fs_bio_set) {
+			bio_list_add(current->bio_list, bio);
+			continue;
+		}
+
+		spin_lock(&bs->rescue_lock);
+		bio_list_add(&bs->rescue_list, bio);
+		queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		spin_unlock(&bs->rescue_lock);
+	}
+}
+
+static void dm_offload_start(struct dm_offload *o)
+{
+	blk_start_plug(&o->plug);
+	o->cb.callback = flush_current_bio_list;
+	list_add(&o->cb.list, &current->plug->cb_list);
+}
+
+static void dm_offload_end(struct dm_offload *o)
+{
+	list_del(&o->cb.list);
+	blk_finish_plug(&o->plug);
+}
+
 static void __map_bio(struct dm_target_io *tio)
 {
	int r;
	sector_t sector;
+	struct dm_offload o;
	struct bio *clone = &tio->clone;
	struct dm_target *ti = tio->ti;

@@ -990,7 +1041,11 @@ static void __map_bio(struct dm_target_io *tio)
	 */
	atomic_inc(&tio->io->io_count);
	sector = clone->bi_iter.bi_sector;
+
+	dm_offload_start(&o);
	r = ti->type->map(ti, clone);
+	dm_offload_end(&o);
+
	if (r == DM_MAPIO_REMAPPED) {
		/* the bio has been remapped so dispatch it */

@@ -976,6 +976,27 @@ int dm_array_cursor_next(struct dm_array_cursor *c)
 }
 EXPORT_SYMBOL_GPL(dm_array_cursor_next);
 
+int dm_array_cursor_skip(struct dm_array_cursor *c, uint32_t count)
+{
+	int r;
+
+	do {
+		uint32_t remaining = le32_to_cpu(c->ab->nr_entries) - c->index;
+
+		if (count < remaining) {
+			c->index += count;
+			return 0;
+		}
+
+		count -= remaining;
+		r = dm_array_cursor_next(c);
+
+	} while (!r);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(dm_array_cursor_skip);
+
 void dm_array_cursor_get_value(struct dm_array_cursor *c, void **value_le)
 {
	*value_le = element_at(c->info, c->ab, c->index);

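A rough sketch of how a caller might use the new skip call to jump straight to a given index, assuming the existing dm_array_cursor_begin()/end() calls; the helper name and the assumption that the array stores __le64 values are illustrative only, not part of this commit:

	static int read_le64_entry(struct dm_array_info *info, dm_block_t root,
				   uint32_t index, uint64_t *result)
	{
		int r;
		__le64 *value_le;
		struct dm_array_cursor c;

		r = dm_array_cursor_begin(info, root, &c);
		if (r)
			return r;

		/* Skip directly to 'index' instead of stepping one entry at a time. */
		r = dm_array_cursor_skip(&c, index);
		if (!r) {
			dm_array_cursor_get_value(&c, (void **) &value_le);
			*result = le64_to_cpu(*value_le);
		}

		dm_array_cursor_end(&c);
		return r;
	}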
@@ -207,6 +207,7 @@ void dm_array_cursor_end(struct dm_array_cursor *c);
 uint32_t dm_array_cursor_index(struct dm_array_cursor *c);
 int dm_array_cursor_next(struct dm_array_cursor *c);
+int dm_array_cursor_skip(struct dm_array_cursor *c, uint32_t count);
 
 /*
  * value_le is only valid while the cursor points at the current value.

@@ -39,6 +39,48 @@ int dm_bitset_empty(struct dm_disk_bitset *info, dm_block_t *root)
 }
 EXPORT_SYMBOL_GPL(dm_bitset_empty);
 
+struct packer_context {
+	bit_value_fn fn;
+	unsigned nr_bits;
+	void *context;
+};
+
+static int pack_bits(uint32_t index, void *value, void *context)
+{
+	int r;
+	struct packer_context *p = context;
+	unsigned bit, nr = min(64u, p->nr_bits - (index * 64));
+	uint64_t word = 0;
+	bool bv;
+
+	for (bit = 0; bit < nr; bit++) {
+		r = p->fn(index * 64 + bit, &bv, p->context);
+		if (r)
+			return r;
+
+		if (bv)
+			set_bit(bit, (unsigned long *) &word);
+		else
+			clear_bit(bit, (unsigned long *) &word);
+	}
+
+	*((__le64 *) value) = cpu_to_le64(word);
+
+	return 0;
+}
+
+int dm_bitset_new(struct dm_disk_bitset *info, dm_block_t *root,
+		  uint32_t size, bit_value_fn fn, void *context)
+{
+	struct packer_context p;
+
+	p.fn = fn;
+	p.nr_bits = size;
+	p.context = context;
+
+	return dm_array_new(&info->array_info, root, dm_div_up(size, 64), pack_bits, &p);
+}
+EXPORT_SYMBOL_GPL(dm_bitset_new);
+
 int dm_bitset_resize(struct dm_disk_bitset *info, dm_block_t root,
		     uint32_t old_nr_entries, uint32_t new_nr_entries,
		     bool default_value, dm_block_t *new_root)

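A minimal sketch of how dm_bitset_new() might be fed from an in-core bitmap; the callback and helper names and the in-core 'unsigned long *' bitmap are assumptions for illustration, not part of this commit:

	struct bitmap_ctx {
		unsigned long *bits;	/* in-core bitmap supplying the initial values */
	};

	static int bit_from_bitmap(uint32_t index, bool *value, void *context)
	{
		struct bitmap_ctx *ctx = context;

		*value = test_bit(index, ctx->bits);
		return 0;
	}

	static int create_bitset_from_bitmap(struct dm_disk_bitset *info, dm_block_t *root,
					     uint32_t nr_bits, unsigned long *bits)
	{
		struct bitmap_ctx ctx = { .bits = bits };

		/* Writes the whole bitset in one pass, 64 bits per on-disk word. */
		return dm_bitset_new(info, root, nr_bits, bit_from_bitmap, &ctx);
	}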
@@ -168,4 +210,108 @@ int dm_bitset_test_bit(struct dm_disk_bitset *info, dm_block_t root,
 }
 EXPORT_SYMBOL_GPL(dm_bitset_test_bit);
 
+static int cursor_next_array_entry(struct dm_bitset_cursor *c)
+{
+	int r;
+	__le64 *value;
+
+	r = dm_array_cursor_next(&c->cursor);
+	if (r)
+		return r;
+
+	dm_array_cursor_get_value(&c->cursor, (void **) &value);
+	c->array_index++;
+	c->bit_index = 0;
+	c->current_bits = le64_to_cpu(*value);
+	return 0;
+}
+
+int dm_bitset_cursor_begin(struct dm_disk_bitset *info,
+			   dm_block_t root, uint32_t nr_entries,
+			   struct dm_bitset_cursor *c)
+{
+	int r;
+	__le64 *value;
+
+	if (!nr_entries)
+		return -ENODATA;
+
+	c->info = info;
+	c->entries_remaining = nr_entries;
+
+	r = dm_array_cursor_begin(&info->array_info, root, &c->cursor);
+	if (r)
+		return r;
+
+	dm_array_cursor_get_value(&c->cursor, (void **) &value);
+	c->array_index = 0;
+	c->bit_index = 0;
+	c->current_bits = le64_to_cpu(*value);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(dm_bitset_cursor_begin);
+
+void dm_bitset_cursor_end(struct dm_bitset_cursor *c)
+{
+	return dm_array_cursor_end(&c->cursor);
+}
+EXPORT_SYMBOL_GPL(dm_bitset_cursor_end);
+
+int dm_bitset_cursor_next(struct dm_bitset_cursor *c)
+{
+	int r = 0;
+
+	if (!c->entries_remaining)
+		return -ENODATA;
+
+	c->entries_remaining--;
+	if (++c->bit_index > 63)
+		r = cursor_next_array_entry(c);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(dm_bitset_cursor_next);
+
+int dm_bitset_cursor_skip(struct dm_bitset_cursor *c, uint32_t count)
+{
+	int r;
+	__le64 *value;
+	uint32_t nr_array_skip;
+	uint32_t remaining_in_word = 64 - c->bit_index;
+
+	if (c->entries_remaining < count)
+		return -ENODATA;
+
+	if (count < remaining_in_word) {
+		c->bit_index += count;
+		c->entries_remaining -= count;
+		return 0;
+
+	} else {
+		c->entries_remaining -= remaining_in_word;
+		count -= remaining_in_word;
+	}
+
+	nr_array_skip = (count / 64) + 1;
+	r = dm_array_cursor_skip(&c->cursor, nr_array_skip);
+	if (r)
+		return r;
+
+	dm_array_cursor_get_value(&c->cursor, (void **) &value);
+	c->entries_remaining -= count;
+	c->array_index += nr_array_skip;
+	c->bit_index = count & 63;
+	c->current_bits = le64_to_cpu(*value);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dm_bitset_cursor_skip);
+
+bool dm_bitset_cursor_get_value(struct dm_bitset_cursor *c)
+{
+	return test_bit(c->bit_index, (unsigned long *) &c->current_bits);
+}
+EXPORT_SYMBOL_GPL(dm_bitset_cursor_get_value);
+
 /*----------------------------------------------------------------*/

@@ -92,6 +92,22 @@ void dm_disk_bitset_init(struct dm_transaction_manager *tm,
  */
 int dm_bitset_empty(struct dm_disk_bitset *info, dm_block_t *new_root);
 
+/*
+ * Creates a new bitset populated with values provided by a callback
+ * function.  This is more efficient than creating an empty bitset,
+ * resizing, and then setting values since that process incurs a lot of
+ * copying.
+ *
+ * info - describes the array
+ * root - the root block of the array on disk
+ * size - the number of entries in the array
+ * fn - the callback
+ * context - passed to the callback
+ */
+typedef int (*bit_value_fn)(uint32_t index, bool *value, void *context);
+int dm_bitset_new(struct dm_disk_bitset *info, dm_block_t *root,
+		  uint32_t size, bit_value_fn fn, void *context);
+
 /*
  * Resize the bitset.
  *

@@ -161,6 +177,29 @@ int dm_bitset_test_bit(struct dm_disk_bitset *info, dm_block_t root,
 int dm_bitset_flush(struct dm_disk_bitset *info, dm_block_t root,
		    dm_block_t *new_root);
 
+struct dm_bitset_cursor {
+	struct dm_disk_bitset *info;
+	struct dm_array_cursor cursor;
+
+	uint32_t entries_remaining;
+	uint32_t array_index;
+	uint32_t bit_index;
+	uint64_t current_bits;
+};
+
+/*
+ * Make sure you've flush any dm_disk_bitset and updated the root before
+ * using this.
+ */
+int dm_bitset_cursor_begin(struct dm_disk_bitset *info,
+			   dm_block_t root, uint32_t nr_entries,
+			   struct dm_bitset_cursor *c);
+void dm_bitset_cursor_end(struct dm_bitset_cursor *c);
+
+int dm_bitset_cursor_next(struct dm_bitset_cursor *c);
+int dm_bitset_cursor_skip(struct dm_bitset_cursor *c, uint32_t count);
+bool dm_bitset_cursor_get_value(struct dm_bitset_cursor *c);
+
 /*----------------------------------------------------------------*/
 
 #endif /* _LINUX_DM_BITSET_H */

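For orientation, a hypothetical caller walking a bitset with the new cursor API might look like the following; the function name is made up, and it assumes the bitset has been flushed and the root is current, as the comment above requires:

	static int count_set_bits(struct dm_disk_bitset *info, dm_block_t root,
				  uint32_t nr_bits, uint32_t *result)
	{
		int r;
		uint32_t i, count = 0;
		struct dm_bitset_cursor c;

		r = dm_bitset_cursor_begin(info, root, nr_bits, &c);
		if (r)
			return r;	/* -ENODATA for an empty bitset */

		for (i = 0; i < nr_bits; i++) {
			if (dm_bitset_cursor_get_value(&c))
				count++;

			if (i + 1 < nr_bits) {
				r = dm_bitset_cursor_next(&c);
				if (r)
					break;
			}
		}

		dm_bitset_cursor_end(&c);
		if (!r)
			*result = count;
		return r;
	}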
@@ -462,7 +462,7 @@ int dm_bm_read_lock(struct dm_block_manager *bm, dm_block_t b,
	int r;
 
	p = dm_bufio_read(bm->bufio, b, (struct dm_buffer **) result);
-	if (IS_ERR(p))
+	if (unlikely(IS_ERR(p)))
		return PTR_ERR(p);
 
	aux = dm_bufio_get_aux_data(to_buffer(*result));

@@ -498,7 +498,7 @@ int dm_bm_write_lock(struct dm_block_manager *bm,
		return -EPERM;
 
	p = dm_bufio_read(bm->bufio, b, (struct dm_buffer **) result);
-	if (IS_ERR(p))
+	if (unlikely(IS_ERR(p)))
		return PTR_ERR(p);
 
	aux = dm_bufio_get_aux_data(to_buffer(*result));

@@ -531,7 +531,7 @@ int dm_bm_read_try_lock(struct dm_block_manager *bm,
	int r;
 
	p = dm_bufio_get(bm->bufio, b, (struct dm_buffer **) result);
-	if (IS_ERR(p))
+	if (unlikely(IS_ERR(p)))
		return PTR_ERR(p);
	if (unlikely(!p))
		return -EWOULDBLOCK;

@@ -567,7 +567,7 @@ int dm_bm_write_lock_zero(struct dm_block_manager *bm,
		return -EPERM;
 
	p = dm_bufio_new(bm->bufio, b, (struct dm_buffer **) result);
-	if (IS_ERR(p))
+	if (unlikely(IS_ERR(p)))
		return PTR_ERR(p);
 
	memset(p, 0, dm_bm_block_size(bm));

@@ -272,7 +272,12 @@ int dm_btree_del(struct dm_btree_info *info, dm_block_t root)
	int r;
	struct del_stack *s;
 
-	s = kmalloc(sizeof(*s), GFP_NOIO);
+	/*
+	 * dm_btree_del() is called via an ioctl, as such should be
+	 * considered an FS op.  We can't recurse back into the FS, so we
+	 * allocate GFP_NOFS.
+	 */
+	s = kmalloc(sizeof(*s), GFP_NOFS);
	if (!s)
		return -ENOMEM;
	s->info = info;

@@ -1139,6 +1144,17 @@ int dm_btree_cursor_next(struct dm_btree_cursor *c)
 }
 EXPORT_SYMBOL_GPL(dm_btree_cursor_next);
 
+int dm_btree_cursor_skip(struct dm_btree_cursor *c, uint32_t count)
+{
+	int r = 0;
+
+	while (count-- && !r)
+		r = dm_btree_cursor_next(c);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(dm_btree_cursor_skip);
+
 int dm_btree_cursor_get_value(struct dm_btree_cursor *c, uint64_t *key, void *value_le)
 {
	if (c->depth) {

@@ -209,6 +209,7 @@ int dm_btree_cursor_begin(struct dm_btree_info *info, dm_block_t root,
			  bool prefetch_leaves, struct dm_btree_cursor *c);
 void dm_btree_cursor_end(struct dm_btree_cursor *c);
 int dm_btree_cursor_next(struct dm_btree_cursor *c);
+int dm_btree_cursor_skip(struct dm_btree_cursor *c, uint32_t count);
 int dm_btree_cursor_get_value(struct dm_btree_cursor *c, uint64_t *key, void *value_le);
 
 #endif /* _LINUX_DM_BTREE_H */

@@ -626,13 +626,19 @@ int sm_ll_open_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm,
			void *root_le, size_t len)
 {
	int r;
-	struct disk_sm_root *smr = root_le;
+	struct disk_sm_root smr;
 
	if (len < sizeof(struct disk_sm_root)) {
		DMERR("sm_metadata root too small");
		return -ENOMEM;
	}
 
+	/*
+	 * We don't know the alignment of the root_le buffer, so need to
+	 * copy into a new structure.
+	 */
+	memcpy(&smr, root_le, sizeof(smr));
+
	r = sm_ll_init(ll, tm);
	if (r < 0)
		return r;

@@ -644,10 +650,10 @@ int sm_ll_open_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm,
	ll->max_entries = metadata_ll_max_entries;
	ll->commit = metadata_ll_commit;
 
-	ll->nr_blocks = le64_to_cpu(smr->nr_blocks);
-	ll->nr_allocated = le64_to_cpu(smr->nr_allocated);
-	ll->bitmap_root = le64_to_cpu(smr->bitmap_root);
-	ll->ref_count_root = le64_to_cpu(smr->ref_count_root);
+	ll->nr_blocks = le64_to_cpu(smr.nr_blocks);
+	ll->nr_allocated = le64_to_cpu(smr.nr_allocated);
+	ll->bitmap_root = le64_to_cpu(smr.bitmap_root);
+	ll->ref_count_root = le64_to_cpu(smr.ref_count_root);
 
	return ll->open_index(ll);
 }

@@ -544,7 +544,7 @@ static int sm_metadata_copy_root(struct dm_space_map *sm, void *where_le, size_t
 static int sm_metadata_extend(struct dm_space_map *sm, dm_block_t extra_blocks);
 
-static struct dm_space_map ops = {
+static const struct dm_space_map ops = {
	.destroy = sm_metadata_destroy,
	.extend = sm_metadata_extend,
	.get_nr_blocks = sm_metadata_get_nr_blocks,

@@ -671,7 +671,7 @@ static int sm_bootstrap_copy_root(struct dm_space_map *sm, void *where,
	return -EINVAL;
 }
 
-static struct dm_space_map bootstrap_ops = {
+static const struct dm_space_map bootstrap_ops = {
	.destroy = sm_bootstrap_destroy,
	.extend = sm_bootstrap_extend,
	.get_nr_blocks = sm_bootstrap_get_nr_blocks,