Commit fd2ed4d2 authored by Mikulas Patocka's avatar Mikulas Patocka Committed by Mike Snitzer

dm: add statistics support

Support the collection of I/O statistics on user-defined regions of
a DM device.  If no regions are defined no statistics are collected so
there isn't any performance impact.  Only bio-based DM devices are
currently supported.

Each user-defined region specifies a starting sector, length and step.
Individual statistics will be collected for each step-sized area within
the range specified.

The I/O statistics counters for each step-sized area of a region are
in the same format as /sys/block/*/stat or /proc/diskstats but extra
counters (12 and 13) are provided: total time spent reading and
writing in milliseconds.  All these counters may be accessed by sending
the @stats_print message to the appropriate DM device via dmsetup.

The creation of DM statistics will allocate memory via kmalloc or
fallback to using vmalloc space.  At most, 1/4 of the overall system
memory may be allocated by DM statistics.  The admin can see how much
memory is used by reading
/sys/module/dm_mod/parameters/stats_current_allocated_bytes

See Documentation/device-mapper/statistics.txt for more details.
Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
parent 94563bad
DM statistics
=============
Device Mapper supports the collection of I/O statistics on user-defined
regions of a DM device. If no regions are defined no statistics are
collected so there isn't any performance impact. Only bio-based DM
devices are currently supported.
Each user-defined region specifies a starting sector, length and step.
Individual statistics will be collected for each step-sized area within
the range specified.
The I/O statistics counters for each step-sized area of a region are
in the same format as /sys/block/*/stat or /proc/diskstats (see:
Documentation/iostats.txt). But two extra counters (12 and 13) are
provided: total time spent reading and writing in milliseconds. All
these counters may be accessed by sending the @stats_print message to
the appropriate DM device via dmsetup.
Each region has a corresponding unique identifier, which we call a
region_id, that is assigned when the region is created. The region_id
must be supplied when querying statistics about the region, deleting the
region, etc. Unique region_ids enable multiple userspace programs to
request and process statistics for the same DM device without stepping
on each other's data.
The creation of DM statistics will allocate memory via kmalloc or
fallback to using vmalloc space. At most, 1/4 of the overall system
memory may be allocated by DM statistics. The admin can see how much
memory is used by reading
/sys/module/dm_mod/parameters/stats_current_allocated_bytes
Messages
========
@stats_create <range> <step> [<program_id> [<aux_data>]]
Create a new region and return the region_id.
<range>
"-" - whole device
"<start_sector>+<length>" - a range of <length> 512-byte sectors
starting with <start_sector>.
<step>
"<area_size>" - the range is subdivided into areas each containing
<area_size> sectors.
"/<number_of_areas>" - the range is subdivided into the specified
number of areas.
<program_id>
An optional parameter. A name that uniquely identifies
the userspace owner of the range. This groups ranges together
so that userspace programs can identify the ranges they
created and ignore those created by others.
The kernel returns this string back in the output of
@stats_list message, but it doesn't use it for anything else.
<aux_data>
An optional parameter. A word that provides auxiliary data
that is useful to the client program that created the range.
The kernel returns this string back in the output of
@stats_list message, but it doesn't use this value for anything.
@stats_delete <region_id>
Delete the region with the specified id.
<region_id>
region_id returned from @stats_create
@stats_clear <region_id>
Clear all the counters except the in-flight i/o counters.
<region_id>
region_id returned from @stats_create
@stats_list [<program_id>]
List all regions registered with @stats_create.
<program_id>
An optional parameter.
If this parameter is specified, only matching regions
are returned.
If it is not specified, all regions are returned.
Output format:
<region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
@stats_print <region_id> [<starting_line> <number_of_lines>]
Print counters for each step-sized area of a region.
<region_id>
region_id returned from @stats_create
<starting_line>
The index of the starting line in the output.
If omitted, all lines are returned.
<number_of_lines>
The number of lines to include in the output.
If omitted, all lines are returned.
Output format for each step-sized area of a region:
<start_sector>+<length> counters
The first 11 counters have the same meaning as
/sys/block/*/stat or /proc/diskstats.
Please refer to Documentation/iostats.txt for details.
1. the number of reads completed
2. the number of reads merged
3. the number of sectors read
4. the number of milliseconds spent reading
5. the number of writes completed
6. the number of writes merged
7. the number of sectors written
8. the number of milliseconds spent writing
9. the number of I/Os currently in progress
10. the number of milliseconds spent doing I/Os
11. the weighted number of milliseconds spent doing I/Os
Additional counters:
12. the total time spent reading in milliseconds
13. the total time spent writing in milliseconds
@stats_print_clear <region_id> [<starting_line> <number_of_lines>]
Atomically print and then clear all the counters except the
in-flight i/o counters. Useful when the client consuming the
statistics does not want to lose any statistics (those updated
between printing and clearing).
<region_id>
region_id returned from @stats_create
<starting_line>
The index of the starting line in the output.
If omitted, all lines are printed and then cleared.
<number_of_lines>
The number of lines to process.
If omitted, all lines are printed and then cleared.
@stats_set_aux <region_id> <aux_data>
Store auxiliary data aux_data for the specified region.
<region_id>
region_id returned from @stats_create
<aux_data>
The string that identifies data which is useful to the client
program that created the range. The kernel returns this
string back in the output of @stats_list message, but it
doesn't use this value for anything.
Examples
========
Subdivide the DM device 'vol' into 100 pieces and start collecting
statistics on them:
dmsetup message vol 0 @stats_create - /100
Set the auxillary data string to "foo bar baz" (the escape for each
space must also be escaped, otherwise the shell will consume them):
dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
List the statistics:
dmsetup message vol 0 @stats_list
Print the statistics:
dmsetup message vol 0 @stats_print 0
Delete the statistics:
dmsetup message vol 0 @stats_delete 0
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
# #
dm-mod-y += dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \ dm-mod-y += dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \
dm-ioctl.o dm-io.o dm-kcopyd.o dm-sysfs.o dm-ioctl.o dm-io.o dm-kcopyd.o dm-sysfs.o dm-stats.o
dm-multipath-y += dm-path-selector.o dm-mpath.o dm-multipath-y += dm-path-selector.o dm-mpath.o
dm-snapshot-y += dm-snap.o dm-exception-store.o dm-snap-transient.o \ dm-snapshot-y += dm-snap.o dm-exception-store.o dm-snap-transient.o \
dm-snap-persistent.o dm-snap-persistent.o
......
...@@ -1455,20 +1455,26 @@ static int table_status(struct dm_ioctl *param, size_t param_size) ...@@ -1455,20 +1455,26 @@ static int table_status(struct dm_ioctl *param, size_t param_size)
return 0; return 0;
} }
static bool buffer_test_overflow(char *result, unsigned maxlen)
{
return !maxlen || strlen(result) + 1 >= maxlen;
}
/* /*
* Process device-mapper dependent messages. * Process device-mapper dependent messages. Messages prefixed with '@'
* are processed by the DM core. All others are delivered to the target.
* Returns a number <= 1 if message was processed by device mapper. * Returns a number <= 1 if message was processed by device mapper.
* Returns 2 if message should be delivered to the target. * Returns 2 if message should be delivered to the target.
*/ */
static int message_for_md(struct mapped_device *md, unsigned argc, char **argv, static int message_for_md(struct mapped_device *md, unsigned argc, char **argv,
char *result, unsigned maxlen) char *result, unsigned maxlen)
{ {
return 2; int r;
if (**argv != '@')
return 2; /* no '@' prefix, deliver to target */
r = dm_stats_message(md, argc, argv, result, maxlen);
if (r < 2)
return r;
DMERR("Unsupported message sent to DM core: %s", argv[0]);
return -EINVAL;
} }
/* /*
...@@ -1542,7 +1548,7 @@ static int target_message(struct dm_ioctl *param, size_t param_size) ...@@ -1542,7 +1548,7 @@ static int target_message(struct dm_ioctl *param, size_t param_size)
if (r == 1) { if (r == 1) {
param->flags |= DM_DATA_OUT_FLAG; param->flags |= DM_DATA_OUT_FLAG;
if (buffer_test_overflow(result, maxlen)) if (dm_message_test_buffer_overflow(result, maxlen))
param->flags |= DM_BUFFER_FULL_FLAG; param->flags |= DM_BUFFER_FULL_FLAG;
else else
param->data_size = param->data_start + strlen(result) + 1; param->data_size = param->data_start + strlen(result) + 1;
......
This diff is collapsed.
#ifndef DM_STATS_H
#define DM_STATS_H
#include <linux/types.h>
#include <linux/mutex.h>
#include <linux/list.h>
int dm_statistics_init(void);
void dm_statistics_exit(void);
struct dm_stats {
struct mutex mutex;
struct list_head list; /* list of struct dm_stat */
struct dm_stats_last_position __percpu *last;
sector_t last_sector;
unsigned last_rw;
};
struct dm_stats_aux {
bool merged;
};
void dm_stats_init(struct dm_stats *st);
void dm_stats_cleanup(struct dm_stats *st);
struct mapped_device;
int dm_stats_message(struct mapped_device *md, unsigned argc, char **argv,
char *result, unsigned maxlen);
void dm_stats_account_io(struct dm_stats *stats, unsigned long bi_rw,
sector_t bi_sector, unsigned bi_sectors, bool end,
unsigned long duration, struct dm_stats_aux *aux);
static inline bool dm_stats_used(struct dm_stats *st)
{
return !list_empty(&st->list);
}
#endif
...@@ -60,6 +60,7 @@ struct dm_io { ...@@ -60,6 +60,7 @@ struct dm_io {
struct bio *bio; struct bio *bio;
unsigned long start_time; unsigned long start_time;
spinlock_t endio_lock; spinlock_t endio_lock;
struct dm_stats_aux stats_aux;
}; };
/* /*
...@@ -198,6 +199,8 @@ struct mapped_device { ...@@ -198,6 +199,8 @@ struct mapped_device {
/* zero-length flush that will be cloned and submitted to targets */ /* zero-length flush that will be cloned and submitted to targets */
struct bio flush_bio; struct bio flush_bio;
struct dm_stats stats;
}; };
/* /*
...@@ -269,6 +272,7 @@ static int (*_inits[])(void) __initdata = { ...@@ -269,6 +272,7 @@ static int (*_inits[])(void) __initdata = {
dm_io_init, dm_io_init,
dm_kcopyd_init, dm_kcopyd_init,
dm_interface_init, dm_interface_init,
dm_statistics_init,
}; };
static void (*_exits[])(void) = { static void (*_exits[])(void) = {
...@@ -279,6 +283,7 @@ static void (*_exits[])(void) = { ...@@ -279,6 +283,7 @@ static void (*_exits[])(void) = {
dm_io_exit, dm_io_exit,
dm_kcopyd_exit, dm_kcopyd_exit,
dm_interface_exit, dm_interface_exit,
dm_statistics_exit,
}; };
static int __init dm_init(void) static int __init dm_init(void)
...@@ -384,6 +389,16 @@ int dm_lock_for_deletion(struct mapped_device *md) ...@@ -384,6 +389,16 @@ int dm_lock_for_deletion(struct mapped_device *md)
return r; return r;
} }
sector_t dm_get_size(struct mapped_device *md)
{
return get_capacity(md->disk);
}
struct dm_stats *dm_get_stats(struct mapped_device *md)
{
return &md->stats;
}
static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo) static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{ {
struct mapped_device *md = bdev->bd_disk->private_data; struct mapped_device *md = bdev->bd_disk->private_data;
...@@ -466,8 +481,9 @@ static int md_in_flight(struct mapped_device *md) ...@@ -466,8 +481,9 @@ static int md_in_flight(struct mapped_device *md)
static void start_io_acct(struct dm_io *io) static void start_io_acct(struct dm_io *io)
{ {
struct mapped_device *md = io->md; struct mapped_device *md = io->md;
struct bio *bio = io->bio;
int cpu; int cpu;
int rw = bio_data_dir(io->bio); int rw = bio_data_dir(bio);
io->start_time = jiffies; io->start_time = jiffies;
...@@ -476,6 +492,10 @@ static void start_io_acct(struct dm_io *io) ...@@ -476,6 +492,10 @@ static void start_io_acct(struct dm_io *io)
part_stat_unlock(); part_stat_unlock();
atomic_set(&dm_disk(md)->part0.in_flight[rw], atomic_set(&dm_disk(md)->part0.in_flight[rw],
atomic_inc_return(&md->pending[rw])); atomic_inc_return(&md->pending[rw]));
if (unlikely(dm_stats_used(&md->stats)))
dm_stats_account_io(&md->stats, bio->bi_rw, bio->bi_sector,
bio_sectors(bio), false, 0, &io->stats_aux);
} }
static void end_io_acct(struct dm_io *io) static void end_io_acct(struct dm_io *io)
...@@ -491,6 +511,10 @@ static void end_io_acct(struct dm_io *io) ...@@ -491,6 +511,10 @@ static void end_io_acct(struct dm_io *io)
part_stat_add(cpu, &dm_disk(md)->part0, ticks[rw], duration); part_stat_add(cpu, &dm_disk(md)->part0, ticks[rw], duration);
part_stat_unlock(); part_stat_unlock();
if (unlikely(dm_stats_used(&md->stats)))
dm_stats_account_io(&md->stats, bio->bi_rw, bio->bi_sector,
bio_sectors(bio), true, duration, &io->stats_aux);
/* /*
* After this is decremented the bio must not be touched if it is * After this is decremented the bio must not be touched if it is
* a flush. * a flush.
...@@ -1519,7 +1543,7 @@ static void _dm_request(struct request_queue *q, struct bio *bio) ...@@ -1519,7 +1543,7 @@ static void _dm_request(struct request_queue *q, struct bio *bio)
return; return;
} }
static int dm_request_based(struct mapped_device *md) int dm_request_based(struct mapped_device *md)
{ {
return blk_queue_stackable(md->queue); return blk_queue_stackable(md->queue);
} }
...@@ -1958,6 +1982,8 @@ static struct mapped_device *alloc_dev(int minor) ...@@ -1958,6 +1982,8 @@ static struct mapped_device *alloc_dev(int minor)
md->flush_bio.bi_bdev = md->bdev; md->flush_bio.bi_bdev = md->bdev;
md->flush_bio.bi_rw = WRITE_FLUSH; md->flush_bio.bi_rw = WRITE_FLUSH;
dm_stats_init(&md->stats);
/* Populate the mapping, nobody knows we exist yet */ /* Populate the mapping, nobody knows we exist yet */
spin_lock(&_minor_lock); spin_lock(&_minor_lock);
old_md = idr_replace(&_minor_idr, md, minor); old_md = idr_replace(&_minor_idr, md, minor);
...@@ -2009,6 +2035,7 @@ static void free_dev(struct mapped_device *md) ...@@ -2009,6 +2035,7 @@ static void free_dev(struct mapped_device *md)
put_disk(md->disk); put_disk(md->disk);
blk_cleanup_queue(md->queue); blk_cleanup_queue(md->queue);
dm_stats_cleanup(&md->stats);
module_put(THIS_MODULE); module_put(THIS_MODULE);
kfree(md); kfree(md);
} }
...@@ -2150,7 +2177,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t, ...@@ -2150,7 +2177,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
/* /*
* Wipe any geometry if the size of the table changed. * Wipe any geometry if the size of the table changed.
*/ */
if (size != get_capacity(md->disk)) if (size != dm_get_size(md))
memset(&md->geometry, 0, sizeof(md->geometry)); memset(&md->geometry, 0, sizeof(md->geometry));
__set_size(md, size); __set_size(md, size);
...@@ -2696,6 +2723,38 @@ int dm_resume(struct mapped_device *md) ...@@ -2696,6 +2723,38 @@ int dm_resume(struct mapped_device *md)
return r; return r;
} }
/*
* Internal suspend/resume works like userspace-driven suspend. It waits
* until all bios finish and prevents issuing new bios to the target drivers.
* It may be used only from the kernel.
*
* Internal suspend holds md->suspend_lock, which prevents interaction with
* userspace-driven suspend.
*/
void dm_internal_suspend(struct mapped_device *md)
{
mutex_lock(&md->suspend_lock);
if (dm_suspended_md(md))
return;
set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
synchronize_srcu(&md->io_barrier);
flush_workqueue(md->wq);
dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
}
void dm_internal_resume(struct mapped_device *md)
{
if (dm_suspended_md(md))
goto done;
dm_queue_flush(md);
done:
mutex_unlock(&md->suspend_lock);
}
/*----------------------------------------------------------------- /*-----------------------------------------------------------------
* Event notification. * Event notification.
*---------------------------------------------------------------*/ *---------------------------------------------------------------*/
......
...@@ -16,6 +16,8 @@ ...@@ -16,6 +16,8 @@
#include <linux/blkdev.h> #include <linux/blkdev.h>
#include <linux/hdreg.h> #include <linux/hdreg.h>
#include "dm-stats.h"
/* /*
* Suspend feature flags * Suspend feature flags
*/ */
...@@ -157,10 +159,16 @@ void dm_destroy(struct mapped_device *md); ...@@ -157,10 +159,16 @@ void dm_destroy(struct mapped_device *md);
void dm_destroy_immediate(struct mapped_device *md); void dm_destroy_immediate(struct mapped_device *md);
int dm_open_count(struct mapped_device *md); int dm_open_count(struct mapped_device *md);
int dm_lock_for_deletion(struct mapped_device *md); int dm_lock_for_deletion(struct mapped_device *md);
int dm_request_based(struct mapped_device *md);
sector_t dm_get_size(struct mapped_device *md);
struct dm_stats *dm_get_stats(struct mapped_device *md);
int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action, int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
unsigned cookie); unsigned cookie);
void dm_internal_suspend(struct mapped_device *md);
void dm_internal_resume(struct mapped_device *md);
int dm_io_init(void); int dm_io_init(void);
void dm_io_exit(void); void dm_io_exit(void);
...@@ -173,4 +181,12 @@ void dm_kcopyd_exit(void); ...@@ -173,4 +181,12 @@ void dm_kcopyd_exit(void);
struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, unsigned per_bio_data_size); struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, unsigned per_bio_data_size);
void dm_free_md_mempools(struct dm_md_mempools *pools); void dm_free_md_mempools(struct dm_md_mempools *pools);
/*
* Helpers that are used by DM core
*/
static inline bool dm_message_test_buffer_overflow(char *result, unsigned maxlen)
{
return !maxlen || strlen(result) + 1 >= maxlen;
}
#endif #endif
...@@ -10,6 +10,7 @@ ...@@ -10,6 +10,7 @@
#include <linux/bio.h> #include <linux/bio.h>
#include <linux/blkdev.h> #include <linux/blkdev.h>
#include <linux/math64.h>
#include <linux/ratelimit.h> #include <linux/ratelimit.h>
struct dm_dev; struct dm_dev;
...@@ -550,6 +551,14 @@ extern struct ratelimit_state dm_ratelimit_state; ...@@ -550,6 +551,14 @@ extern struct ratelimit_state dm_ratelimit_state;
#define DM_MAPIO_REMAPPED 1 #define DM_MAPIO_REMAPPED 1
#define DM_MAPIO_REQUEUE DM_ENDIO_REQUEUE #define DM_MAPIO_REQUEUE DM_ENDIO_REQUEUE
#define dm_sector_div64(x, y)( \
{ \
u64 _res; \
(x) = div64_u64_rem(x, y, &_res); \
_res; \
} \
)
/* /*
* Ceiling(n / sz) * Ceiling(n / sz)
*/ */
......
...@@ -267,9 +267,9 @@ enum { ...@@ -267,9 +267,9 @@ enum {
#define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl) #define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
#define DM_VERSION_MAJOR 4 #define DM_VERSION_MAJOR 4
#define DM_VERSION_MINOR 25 #define DM_VERSION_MINOR 26
#define DM_VERSION_PATCHLEVEL 0 #define DM_VERSION_PATCHLEVEL 0
#define DM_VERSION_EXTRA "-ioctl (2013-06-26)" #define DM_VERSION_EXTRA "-ioctl (2013-08-15)"
/* Status bits */ /* Status bits */
#define DM_READONLY_FLAG (1 << 0) /* In/Out */ #define DM_READONLY_FLAG (1 << 0) /* In/Out */
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment