Commit ff3cb3fe authored by Linus Torvalds

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Range check cpu in blk_cpu_to_group
  scatterlist: prevent invalid free when alloc fails
  writeback: Fix lost wake-up shutting down writeback thread
  writeback: do not lose wakeup events when forking bdi threads
  cciss: fix reporting of max queue depth since init
  block: switch s390 tape_block and mg_disk to elevator_change()
  block: add function call to switch the IO scheduler from a driver
  fs/bio-integrity.c: return -ENOMEM on kmalloc failure
  bio-integrity.c: remove dependency on __GFP_NOFAIL
  BLOCK: fix bio.bi_rw handling
  block: put dev->kobj in blk_register_queue fail path
  cciss: handle allocation failure
  cfq-iosched: Documentation help for new tunables
  cfq-iosched: blktrace print per slice sector stats
  cfq-iosched: Implement tunable group_idle
  cfq-iosched: Do group share accounting in IOPS when slice_idle=0
  cfq-iosched: Do not idle if slice_idle=0
  cciss: disable doorbell reset on reset_devices
  blkio: Fix return code for mkdir calls
parents 6ccaa317 be14eb61
CFQ ioscheduler tunables
========================
slice_idle
----------
This specifies how long CFQ should idle waiting for the next request on certain
cfq queues (for sequential workloads) and service trees (for random workloads)
before the queue is expired and CFQ selects the next queue to dispatch from.
By default slice_idle is a non-zero value. That means by default we idle on
queues/service trees. This can be very helpful on highly seeky media like
single spindle SATA/SAS disks where we can cut down on the overall number of
seeks and see improved throughput.
Setting slice_idle to 0 will remove all the idling at the queue/service tree
level and one should see overall improved throughput on faster storage
devices like multiple SATA/SAS disks in a hardware RAID configuration. The
downside is that the isolation provided from WRITES also goes down and the
notion of IO priority becomes weaker.
So depending on storage and workload, it might be useful to set slice_idle=0.
In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
keeping slice_idle enabled should be useful. For any configuration where
there are multiple spindles behind a single LUN (host-based hardware RAID
controller or storage arrays), setting slice_idle=0 might result in better
throughput and acceptable latencies.
CFQ IOPS Mode for group scheduling
===================================
The basic CFQ design is to provide priority-based time slices. A higher
priority process gets a bigger time slice and a lower priority process gets a
smaller time slice. Measuring time becomes harder if storage is fast and
supports NCQ, where it would be better to dispatch multiple requests from
multiple cfq queues into the request queue at a time. In such a scenario, it is
not possible to measure the time consumed by a single queue accurately.
What is possible though is to measure the number of requests dispatched from a
single queue and also allow dispatch from multiple cfq queues at the same time.
This effectively becomes fairness in terms of IOPS (IO operations per
second).
If one sets slice_idle=0 and the storage supports NCQ, CFQ internally switches
to IOPS mode and starts providing fairness in terms of the number of requests
dispatched. Note that this mode switching takes effect only for group
scheduling. For non-cgroup users nothing should change.
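The IOPS-mode description above can be modelled in a few lines of userspace C. This is only a sketch under assumptions: `struct sched_model`, `struct queue_model`, `model_iops_mode()` and `model_charge()` are hypothetical names, the NCQ state is reduced to a boolean, and the special case for async queues is left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-device scheduler state. */
struct sched_model {
    unsigned int slice_idle; /* 0 means queue idling is disabled */
    bool hw_tag;             /* true if the device queues commands (NCQ) */
};

/* Hypothetical, simplified stand-in for one cfq queue's slice accounting. */
struct queue_model {
    unsigned int slice_used;     /* time consumed in this slice */
    unsigned int slice_dispatch; /* requests dispatched in this slice */
};

/* IOPS mode applies only when idling is off and the device does NCQ. */
static bool model_iops_mode(const struct sched_model *s)
{
    return s->slice_idle == 0 && s->hw_tag;
}

/* Charge the group in requests in IOPS mode, otherwise in time. */
static unsigned int model_charge(const struct sched_model *s,
                                 const struct queue_model *q)
{
    return model_iops_mode(s) ? q->slice_dispatch : q->slice_used;
}
```

With `slice_idle` set to 0 on an NCQ device the model charges the number of dispatched requests; in every other combination it falls back to the time used.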
@@ -217,6 +217,7 @@ Details of cgroup files
CFQ sysfs tunable
=================
/sys/block/<disk>/queue/iosched/group_isolation
-----------------------------------------------
If group_isolation=1, it provides stronger isolation between groups at the
expense of throughput. By default group_isolation is 0. In general that
@@ -243,6 +244,33 @@ By default one should run with group_isolation=0. If that is not sufficient
and one wants stronger isolation between groups, then set group_isolation=1
but this will come at cost of reduced throughput.
/sys/block/<disk>/queue/iosched/slice_idle
------------------------------------------
On faster hardware CFQ can be slow, especially with a sequential workload.
This happens because CFQ idles on a single queue and a single queue might not
drive deeper request queue depths to keep the storage busy. In such scenarios
one can try setting slice_idle=0, which switches CFQ to IOPS
(IO operations per second) mode on NCQ supporting hardware.
That means CFQ will not idle between cfq queues of a cfq group and hence will
be able to drive a higher queue depth and achieve better throughput. That also
means that cfq provides fairness among groups in terms of IOPS and not in
terms of disk time.
/sys/block/<disk>/queue/iosched/group_idle
------------------------------------------
If one disables idling on individual cfq queues and cfq service trees by
setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
on the group in an attempt to provide fairness among groups.
By default group_idle is the same as slice_idle and does not do anything if
slice_idle is enabled.
One can experience an overall throughput drop if one has created multiple
groups and put applications in those groups which are not driving enough
IO to keep the disk busy. In that case set group_idle=0, and CFQ will not idle
on individual groups and throughput should improve.
What works
==========
- Currently only sync IO queues are support. All the buffered writes are
......
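The slice_idle and group_idle tunables documented above are plain sysfs files, so changing them comes down to writing a number into the file. The helper below is a minimal sketch of that; `write_tunable()` is a hypothetical name, and the caller has to substitute a real device for `<disk>` and run with sufficient privileges.

```c
#include <stdio.h>

/*
 * Write an unsigned value, followed by a newline, to a tunable file.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 * Hypothetical helper, not part of any kernel or library API.
 */
static int write_tunable(const char *path, unsigned int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%u\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as `write_tunable("/sys/block/sda/queue/iosched/group_idle", 0)` would then disable group idling, assuming a disk named `sda` that uses CFQ.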
@@ -966,7 +966,7 @@ blkiocg_create(struct cgroup_subsys *subsys, struct cgroup *cgroup)
 /* Currently we do not support hierarchy deeper than two level (0,1) */
 if (parent != cgroup->top_cgroup)
-return ERR_PTR(-EINVAL);
+return ERR_PTR(-EPERM);
 blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
 if (!blkcg)
......
@@ -1198,9 +1198,9 @@ static int __make_request(struct request_queue *q, struct bio *bio)
 int el_ret;
 unsigned int bytes = bio->bi_size;
 const unsigned short prio = bio_prio(bio);
-const bool sync = (bio->bi_rw & REQ_SYNC);
-const bool unplug = (bio->bi_rw & REQ_UNPLUG);
-const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;
+const bool sync = !!(bio->bi_rw & REQ_SYNC);
+const bool unplug = !!(bio->bi_rw & REQ_UNPLUG);
+const unsigned long ff = bio->bi_rw & REQ_FAILFAST_MASK;
 int rw_flags;
 if ((bio->bi_rw & REQ_HARDBARRIER) &&
......
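The change from `unsigned int ff` to `unsigned long ff` and the added `!!` suggest a concern about flag bits being lost when a masked wide value lands in a narrower variable. The sketch below illustrates that general hazard in userspace C. `DEMO_FLAG_HIGH` and the three helpers are hypothetical, and the bit position is chosen only to make the truncation visible; it is not taken from the kernel's flag layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag placed above bit 31 to make truncation visible. */
#define DEMO_FLAG_HIGH (UINT64_C(1) << 40)

/* Narrowing to a 32-bit unsigned int drops the high bit, so this is 0. */
static unsigned int flag_truncated(uint64_t rw)
{
    return (unsigned int)(rw & DEMO_FLAG_HIGH);
}

/* "!!" collapses any non-zero value to 1 before it is stored. */
static bool flag_normalized(uint64_t rw)
{
    return !!(rw & DEMO_FLAG_HIGH);
}

/* Keeping the full width preserves the masked bits as they are. */
static uint64_t flag_wide(uint64_t rw)
{
    return rw & DEMO_FLAG_HIGH;
}
```

When the destination is a C99 `_Bool`, the conversion already yields 0 or 1, so there the `!!` mainly makes the intent explicit. The truncation risk applies to narrower integer destinations such as the old `unsigned int ff`.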
@@ -511,6 +511,7 @@ int blk_register_queue(struct gendisk *disk)
 kobject_uevent(&q->kobj, KOBJ_REMOVE);
 kobject_del(&q->kobj);
 blk_trace_remove_sysfs(disk_to_dev(disk));
+kobject_put(&dev->kobj);
 return ret;
 }
......
@@ -142,14 +142,18 @@ static inline int queue_congestion_off_threshold(struct request_queue *q)
 static inline int blk_cpu_to_group(int cpu)
 {
+int group = NR_CPUS;
 #ifdef CONFIG_SCHED_MC
 const struct cpumask *mask = cpu_coregroup_mask(cpu);
-return cpumask_first(mask);
+group = cpumask_first(mask);
 #elif defined(CONFIG_SCHED_SMT)
-return cpumask_first(topology_thread_cpumask(cpu));
+group = cpumask_first(topology_thread_cpumask(cpu));
 #else
 return cpu;
 #endif
+if (likely(group < NR_CPUS))
+return group;
+return cpu;
 }
 /*
......
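The range check added to blk_cpu_to_group() can be tried out in isolation with a small userspace model. This is a sketch under assumptions: `DEMO_NR_CPUS`, `demo_first_cpu()` and `demo_cpu_to_group()` are hypothetical names, a plain bitmask stands in for a cpumask, and an empty mask is assumed to yield an out-of-range index.

```c
#define DEMO_NR_CPUS 8

/*
 * Return the index of the lowest set bit in mask, or DEMO_NR_CPUS when
 * no bit is set (the out-of-range result the check guards against).
 */
static int demo_first_cpu(unsigned int mask)
{
    int i;

    for (i = 0; i < DEMO_NR_CPUS; i++)
        if (mask & (1u << i))
            return i;
    return DEMO_NR_CPUS;
}

/* Use the first sibling as the group, falling back to the CPU itself. */
static int demo_cpu_to_group(int cpu, unsigned int sibling_mask)
{
    int group = demo_first_cpu(sibling_mask);

    if (group < DEMO_NR_CPUS)
        return group;
    return cpu;
}
```

With a sibling mask covering CPUs 4 and 5, CPU 5 maps to group 4; with an empty mask it maps to itself rather than to an invalid index.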
...@@ -30,6 +30,7 @@ static const int cfq_slice_sync = HZ / 10; ...@@ -30,6 +30,7 @@ static const int cfq_slice_sync = HZ / 10;
static int cfq_slice_async = HZ / 25; static int cfq_slice_async = HZ / 25;
static const int cfq_slice_async_rq = 2; static const int cfq_slice_async_rq = 2;
static int cfq_slice_idle = HZ / 125; static int cfq_slice_idle = HZ / 125;
static int cfq_group_idle = HZ / 125;
static const int cfq_target_latency = HZ * 3/10; /* 300 ms */ static const int cfq_target_latency = HZ * 3/10; /* 300 ms */
static const int cfq_hist_divisor = 4; static const int cfq_hist_divisor = 4;
...@@ -147,6 +148,8 @@ struct cfq_queue { ...@@ -147,6 +148,8 @@ struct cfq_queue {
struct cfq_queue *new_cfqq; struct cfq_queue *new_cfqq;
struct cfq_group *cfqg; struct cfq_group *cfqg;
struct cfq_group *orig_cfqg; struct cfq_group *orig_cfqg;
/* Number of sectors dispatched from queue in single dispatch round */
unsigned long nr_sectors;
}; };
/* /*
...@@ -198,6 +201,8 @@ struct cfq_group { ...@@ -198,6 +201,8 @@ struct cfq_group {
struct hlist_node cfqd_node; struct hlist_node cfqd_node;
atomic_t ref; atomic_t ref;
#endif #endif
/* number of requests that are on the dispatch list or inside driver */
int dispatched;
}; };
/* /*
...@@ -271,6 +276,7 @@ struct cfq_data { ...@@ -271,6 +276,7 @@ struct cfq_data {
unsigned int cfq_slice[2]; unsigned int cfq_slice[2];
unsigned int cfq_slice_async_rq; unsigned int cfq_slice_async_rq;
unsigned int cfq_slice_idle; unsigned int cfq_slice_idle;
unsigned int cfq_group_idle;
unsigned int cfq_latency; unsigned int cfq_latency;
unsigned int cfq_group_isolation; unsigned int cfq_group_isolation;
...@@ -378,6 +384,21 @@ CFQ_CFQQ_FNS(wait_busy); ...@@ -378,6 +384,21 @@ CFQ_CFQQ_FNS(wait_busy);
&cfqg->service_trees[i][j]: NULL) \ &cfqg->service_trees[i][j]: NULL) \
static inline bool iops_mode(struct cfq_data *cfqd)
{
/*
* If we are not idling on queues and it is a NCQ drive, parallel
* execution of requests is on and measuring time is not possible
* in most of the cases until and unless we drive shallower queue
* depths and that becomes a performance bottleneck. In such cases
* switch to start providing fairness in terms of number of IOs.
*/
if (!cfqd->cfq_slice_idle && cfqd->hw_tag)
return true;
else
return false;
}
static inline enum wl_prio_t cfqq_prio(struct cfq_queue *cfqq) static inline enum wl_prio_t cfqq_prio(struct cfq_queue *cfqq)
{ {
if (cfq_class_idle(cfqq)) if (cfq_class_idle(cfqq))
...@@ -906,7 +927,6 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq) ...@@ -906,7 +927,6 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
slice_used = cfqq->allocated_slice; slice_used = cfqq->allocated_slice;
} }
cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
return slice_used; return slice_used;
} }
...@@ -914,19 +934,21 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg, ...@@ -914,19 +934,21 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
struct cfq_queue *cfqq) struct cfq_queue *cfqq)
{ {
struct cfq_rb_root *st = &cfqd->grp_service_tree; struct cfq_rb_root *st = &cfqd->grp_service_tree;
unsigned int used_sl, charge_sl; unsigned int used_sl, charge;
int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg) int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg)
- cfqg->service_tree_idle.count; - cfqg->service_tree_idle.count;
BUG_ON(nr_sync < 0); BUG_ON(nr_sync < 0);
used_sl = charge_sl = cfq_cfqq_slice_usage(cfqq); used_sl = charge = cfq_cfqq_slice_usage(cfqq);
if (!cfq_cfqq_sync(cfqq) && !nr_sync) if (iops_mode(cfqd))
charge_sl = cfqq->allocated_slice; charge = cfqq->slice_dispatch;
else if (!cfq_cfqq_sync(cfqq) && !nr_sync)
charge = cfqq->allocated_slice;
/* Can't update vdisktime while group is on service tree */ /* Can't update vdisktime while group is on service tree */
cfq_rb_erase(&cfqg->rb_node, st); cfq_rb_erase(&cfqg->rb_node, st);
cfqg->vdisktime += cfq_scale_slice(charge_sl, cfqg); cfqg->vdisktime += cfq_scale_slice(charge, cfqg);
__cfq_group_service_tree_add(st, cfqg); __cfq_group_service_tree_add(st, cfqg);
/* This group is being expired. Save the context */ /* This group is being expired. Save the context */
...@@ -940,6 +962,9 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg, ...@@ -940,6 +962,9 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime, cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
st->min_vdisktime); st->min_vdisktime);
cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u"
" sect=%u", used_sl, cfqq->slice_dispatch, charge,
iops_mode(cfqd), cfqq->nr_sectors);
cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl); cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
cfq_blkiocg_set_start_empty_time(&cfqg->blkg); cfq_blkiocg_set_start_empty_time(&cfqg->blkg);
} }
...@@ -1587,6 +1612,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd, ...@@ -1587,6 +1612,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
cfqq->allocated_slice = 0; cfqq->allocated_slice = 0;
cfqq->slice_end = 0; cfqq->slice_end = 0;
cfqq->slice_dispatch = 0; cfqq->slice_dispatch = 0;
cfqq->nr_sectors = 0;
cfq_clear_cfqq_wait_request(cfqq); cfq_clear_cfqq_wait_request(cfqq);
cfq_clear_cfqq_must_dispatch(cfqq); cfq_clear_cfqq_must_dispatch(cfqq);
...@@ -1839,6 +1865,9 @@ static bool cfq_should_idle(struct cfq_data *cfqd, struct cfq_queue *cfqq) ...@@ -1839,6 +1865,9 @@ static bool cfq_should_idle(struct cfq_data *cfqd, struct cfq_queue *cfqq)
BUG_ON(!service_tree); BUG_ON(!service_tree);
BUG_ON(!service_tree->count); BUG_ON(!service_tree->count);
if (!cfqd->cfq_slice_idle)
return false;
/* We never do for idle class queues. */ /* We never do for idle class queues. */
if (prio == IDLE_WORKLOAD) if (prio == IDLE_WORKLOAD)
return false; return false;
...@@ -1863,7 +1892,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd) ...@@ -1863,7 +1892,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
{ {
struct cfq_queue *cfqq = cfqd->active_queue; struct cfq_queue *cfqq = cfqd->active_queue;
struct cfq_io_context *cic; struct cfq_io_context *cic;
unsigned long sl; unsigned long sl, group_idle = 0;
/* /*
* SSD device without seek penalty, disable idling. But only do so * SSD device without seek penalty, disable idling. But only do so
...@@ -1879,8 +1908,13 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd) ...@@ -1879,8 +1908,13 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
/* /*
* idle is disabled, either manually or by past process history * idle is disabled, either manually or by past process history
*/ */
if (!cfqd->cfq_slice_idle || !cfq_should_idle(cfqd, cfqq)) if (!cfq_should_idle(cfqd, cfqq)) {
/* no queue idling. Check for group idling */
if (cfqd->cfq_group_idle)
group_idle = cfqd->cfq_group_idle;
else
return; return;
}
/* /*
* still active requests from this queue, don't idle * still active requests from this queue, don't idle
...@@ -1907,13 +1941,21 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd) ...@@ -1907,13 +1941,21 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
return; return;
} }
/* There are other queues in the group, don't do group idle */
if (group_idle && cfqq->cfqg->nr_cfqq > 1)
return;
cfq_mark_cfqq_wait_request(cfqq); cfq_mark_cfqq_wait_request(cfqq);
if (group_idle)
sl = cfqd->cfq_group_idle;
else
sl = cfqd->cfq_slice_idle; sl = cfqd->cfq_slice_idle;
mod_timer(&cfqd->idle_slice_timer, jiffies + sl); mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
cfq_blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg); cfq_blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl); cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu group_idle: %d", sl,
group_idle ? 1 : 0);
} }
/* /*
...@@ -1929,9 +1971,11 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq) ...@@ -1929,9 +1971,11 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq); cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
cfq_remove_request(rq); cfq_remove_request(rq);
cfqq->dispatched++; cfqq->dispatched++;
(RQ_CFQG(rq))->dispatched++;
elv_dispatch_sort(q, rq); elv_dispatch_sort(q, rq);
cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++; cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
cfqq->nr_sectors += blk_rq_sectors(rq);
cfq_blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq), cfq_blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
rq_data_dir(rq), rq_is_sync(rq)); rq_data_dir(rq), rq_is_sync(rq));
} }
...@@ -2198,7 +2242,7 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd) ...@@ -2198,7 +2242,7 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
cfqq = NULL; cfqq = NULL;
goto keep_queue; goto keep_queue;
} else } else
goto expire; goto check_group_idle;
} }
/* /*
...@@ -2226,8 +2270,23 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd) ...@@ -2226,8 +2270,23 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
* flight or is idling for a new request, allow either of these * flight or is idling for a new request, allow either of these
* conditions to happen (or time out) before selecting a new queue. * conditions to happen (or time out) before selecting a new queue.
*/ */
if (timer_pending(&cfqd->idle_slice_timer) || if (timer_pending(&cfqd->idle_slice_timer)) {
(cfqq->dispatched && cfq_should_idle(cfqd, cfqq))) { cfqq = NULL;
goto keep_queue;
}
if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
cfqq = NULL;
goto keep_queue;
}
/*
* If group idle is enabled and there are requests dispatched from
* this group, wait for requests to complete.
*/
check_group_idle:
if (cfqd->cfq_group_idle && cfqq->cfqg->nr_cfqq == 1
&& cfqq->cfqg->dispatched) {
cfqq = NULL; cfqq = NULL;
goto keep_queue; goto keep_queue;
} }
...@@ -3375,6 +3434,7 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq) ...@@ -3375,6 +3434,7 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
WARN_ON(!cfqq->dispatched); WARN_ON(!cfqq->dispatched);
cfqd->rq_in_driver--; cfqd->rq_in_driver--;
cfqq->dispatched--; cfqq->dispatched--;
(RQ_CFQG(rq))->dispatched--;
cfq_blkiocg_update_completion_stats(&cfqq->cfqg->blkg, cfq_blkiocg_update_completion_stats(&cfqq->cfqg->blkg,
rq_start_time_ns(rq), rq_io_start_time_ns(rq), rq_start_time_ns(rq), rq_io_start_time_ns(rq),
rq_data_dir(rq), rq_is_sync(rq)); rq_data_dir(rq), rq_is_sync(rq));
...@@ -3404,7 +3464,10 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq) ...@@ -3404,7 +3464,10 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
* the queue. * the queue.
*/ */
if (cfq_should_wait_busy(cfqd, cfqq)) { if (cfq_should_wait_busy(cfqd, cfqq)) {
cfqq->slice_end = jiffies + cfqd->cfq_slice_idle; unsigned long extend_sl = cfqd->cfq_slice_idle;
if (!cfqd->cfq_slice_idle)
extend_sl = cfqd->cfq_group_idle;
cfqq->slice_end = jiffies + extend_sl;
cfq_mark_cfqq_wait_busy(cfqq); cfq_mark_cfqq_wait_busy(cfqq);
cfq_log_cfqq(cfqd, cfqq, "will busy wait"); cfq_log_cfqq(cfqd, cfqq, "will busy wait");
} }
...@@ -3850,6 +3913,7 @@ static void *cfq_init_queue(struct request_queue *q) ...@@ -3850,6 +3913,7 @@ static void *cfq_init_queue(struct request_queue *q)
cfqd->cfq_slice[1] = cfq_slice_sync; cfqd->cfq_slice[1] = cfq_slice_sync;
cfqd->cfq_slice_async_rq = cfq_slice_async_rq; cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
cfqd->cfq_slice_idle = cfq_slice_idle; cfqd->cfq_slice_idle = cfq_slice_idle;
cfqd->cfq_group_idle = cfq_group_idle;
cfqd->cfq_latency = 1; cfqd->cfq_latency = 1;
cfqd->cfq_group_isolation = 0; cfqd->cfq_group_isolation = 0;
cfqd->hw_tag = -1; cfqd->hw_tag = -1;
...@@ -3922,6 +3986,7 @@ SHOW_FUNCTION(cfq_fifo_expire_async_show, cfqd->cfq_fifo_expire[0], 1); ...@@ -3922,6 +3986,7 @@ SHOW_FUNCTION(cfq_fifo_expire_async_show, cfqd->cfq_fifo_expire[0], 1);
SHOW_FUNCTION(cfq_back_seek_max_show, cfqd->cfq_back_max, 0); SHOW_FUNCTION(cfq_back_seek_max_show, cfqd->cfq_back_max, 0);
SHOW_FUNCTION(cfq_back_seek_penalty_show, cfqd->cfq_back_penalty, 0); SHOW_FUNCTION(cfq_back_seek_penalty_show, cfqd->cfq_back_penalty, 0);
SHOW_FUNCTION(cfq_slice_idle_show, cfqd->cfq_slice_idle, 1); SHOW_FUNCTION(cfq_slice_idle_show, cfqd->cfq_slice_idle, 1);
SHOW_FUNCTION(cfq_group_idle_show, cfqd->cfq_group_idle, 1);
SHOW_FUNCTION(cfq_slice_sync_show, cfqd->cfq_slice[1], 1); SHOW_FUNCTION(cfq_slice_sync_show, cfqd->cfq_slice[1], 1);
SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1); SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1);
SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0); SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0);
...@@ -3954,6 +4019,7 @@ STORE_FUNCTION(cfq_back_seek_max_store, &cfqd->cfq_back_max, 0, UINT_MAX, 0); ...@@ -3954,6 +4019,7 @@ STORE_FUNCTION(cfq_back_seek_max_store, &cfqd->cfq_back_max, 0, UINT_MAX, 0);
STORE_FUNCTION(cfq_back_seek_penalty_store, &cfqd->cfq_back_penalty, 1, STORE_FUNCTION(cfq_back_seek_penalty_store, &cfqd->cfq_back_penalty, 1,
UINT_MAX, 0); UINT_MAX, 0);
STORE_FUNCTION(cfq_slice_idle_store, &cfqd->cfq_slice_idle, 0, UINT_MAX, 1); STORE_FUNCTION(cfq_slice_idle_store, &cfqd->cfq_slice_idle, 0, UINT_MAX, 1);
STORE_FUNCTION(cfq_group_idle_store, &cfqd->cfq_group_idle, 0, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_sync_store, &cfqd->cfq_slice[1], 1, UINT_MAX, 1); STORE_FUNCTION(cfq_slice_sync_store, &cfqd->cfq_slice[1], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_async_store, &cfqd->cfq_slice[0], 1, UINT_MAX, 1); STORE_FUNCTION(cfq_slice_async_store, &cfqd->cfq_slice[0], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1, STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1,
...@@ -3975,6 +4041,7 @@ static struct elv_fs_entry cfq_attrs[] = { ...@@ -3975,6 +4041,7 @@ static struct elv_fs_entry cfq_attrs[] = {
CFQ_ATTR(slice_async), CFQ_ATTR(slice_async),
CFQ_ATTR(slice_async_rq), CFQ_ATTR(slice_async_rq),
CFQ_ATTR(slice_idle), CFQ_ATTR(slice_idle),
CFQ_ATTR(group_idle),
CFQ_ATTR(low_latency), CFQ_ATTR(low_latency),
CFQ_ATTR(group_isolation), CFQ_ATTR(group_isolation),
__ATTR_NULL __ATTR_NULL
...@@ -4028,6 +4095,12 @@ static int __init cfq_init(void) ...@@ -4028,6 +4095,12 @@ static int __init cfq_init(void)
if (!cfq_slice_idle) if (!cfq_slice_idle)
cfq_slice_idle = 1; cfq_slice_idle = 1;
#ifdef CONFIG_CFQ_GROUP_IOSCHED
if (!cfq_group_idle)
cfq_group_idle = 1;
#else
cfq_group_idle = 0;
#endif
if (cfq_slab_setup()) if (cfq_slab_setup())
return -ENOMEM; return -ENOMEM;
......
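The cfq_should_idle() and cfq_arm_slice_timer() hunks above decide between queue idling and group idling. A simplified model of that choice is sketched below. `demo_idle_window()` and its parameters are hypothetical, and the other early-return conditions in the real function (SSD detection, requests still in flight, think-time checks) are deliberately left out.

```c
#include <stdbool.h>

/*
 * Return the idle window to arm, or 0 when no timer should be armed.
 *   queue_wants_idle: outcome of the per-queue idling heuristics
 *   slice_idle:       queue idle tunable (0 disables queue idling)
 *   group_idle:       group idle tunable (0 disables group idling)
 *   queues_in_group:  number of cfq queues in the active queue's group
 */
static unsigned int demo_idle_window(bool queue_wants_idle,
                                     unsigned int slice_idle,
                                     unsigned int group_idle,
                                     unsigned int queues_in_group)
{
    /* Queue idling takes precedence whenever it is enabled and wanted. */
    if (slice_idle && queue_wants_idle)
        return slice_idle;

    /* No queue idling: fall back to group idling if it is enabled. */
    if (!group_idle)
        return 0;

    /* Other queues in the group can dispatch, so do not idle on it. */
    if (queues_in_group > 1)
        return 0;

    return group_idle;
}
```

In this model group idling is used only when the active queue is the last one in its group, which matches the `nr_cfqq > 1` test in the diff.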
...@@ -1009,18 +1009,19 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) ...@@ -1009,18 +1009,19 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
{ {
struct elevator_queue *old_elevator, *e; struct elevator_queue *old_elevator, *e;
void *data; void *data;
int err;
/* /*
* Allocate new elevator * Allocate new elevator
*/ */
e = elevator_alloc(q, new_e); e = elevator_alloc(q, new_e);
if (!e) if (!e)
return 0; return -ENOMEM;
data = elevator_init_queue(q, e); data = elevator_init_queue(q, e);
if (!data) { if (!data) {
kobject_put(&e->kobj); kobject_put(&e->kobj);
return 0; return -ENOMEM;
} }
/* /*
...@@ -1043,7 +1044,8 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) ...@@ -1043,7 +1044,8 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
__elv_unregister_queue(old_elevator); __elv_unregister_queue(old_elevator);
if (elv_register_queue(q)) err = elv_register_queue(q);
if (err)
goto fail_register; goto fail_register;
/* /*
...@@ -1056,7 +1058,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) ...@@ -1056,7 +1058,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
blk_add_trace_msg(q, "elv switch: %s", e->elevator_type->elevator_name); blk_add_trace_msg(q, "elv switch: %s", e->elevator_type->elevator_name);
return 1; return 0;
fail_register: fail_register:
/* /*
...@@ -1071,17 +1073,19 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) ...@@ -1071,17 +1073,19 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q); queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
spin_unlock_irq(q->queue_lock); spin_unlock_irq(q->queue_lock);
return 0; return err;
} }
ssize_t elv_iosched_store(struct request_queue *q, const char *name, /*
size_t count) * Switch this queue to the given IO scheduler.
*/
int elevator_change(struct request_queue *q, const char *name)
{ {
char elevator_name[ELV_NAME_MAX]; char elevator_name[ELV_NAME_MAX];
struct elevator_type *e; struct elevator_type *e;
if (!q->elevator) if (!q->elevator)
return count; return -ENXIO;
strlcpy(elevator_name, name, sizeof(elevator_name)); strlcpy(elevator_name, name, sizeof(elevator_name));
e = elevator_get(strstrip(elevator_name)); e = elevator_get(strstrip(elevator_name));
...@@ -1092,13 +1096,27 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name, ...@@ -1092,13 +1096,27 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name,
if (!strcmp(elevator_name, q->elevator->elevator_type->elevator_name)) { if (!strcmp(elevator_name, q->elevator->elevator_type->elevator_name)) {
elevator_put(e); elevator_put(e);
return count; return 0;
} }
if (!elevator_switch(q, e)) return elevator_switch(q, e);
printk(KERN_ERR "elevator: switch to %s failed\n", }
elevator_name); EXPORT_SYMBOL(elevator_change);
ssize_t elv_iosched_store(struct request_queue *q, const char *name,
size_t count)
{
int ret;
if (!q->elevator)
return count;
ret = elevator_change(q, name);
if (!ret)
return count; return count;
printk(KERN_ERR "elevator: switch to %s failed\n", name);
return ret;
} }
ssize_t elv_iosched_show(struct request_queue *q, char *name) ssize_t elv_iosched_show(struct request_queue *q, char *name)
......
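After this change elevator_switch() and elevator_change() return 0 on success and a negative errno on failure, while the sysfs store handler still reports the number of bytes consumed. The sketch below models that split in userspace C. `demo_change()` and `demo_store()` are hypothetical names, and the outcome of the switch is passed in as a parameter instead of being computed.

```c
#include <errno.h>
#include <stddef.h>

/* Driver-facing call: 0 on success, negative errno on failure. */
static int demo_change(int have_elevator, int switch_result)
{
    if (!have_elevator)
        return -ENXIO;
    return switch_result;
}

/* sysfs-facing call: byte count on success, negative errno on failure. */
static long demo_store(int have_elevator, int switch_result, size_t count)
{
    int ret;

    /* A queue without an elevator silently accepts the write. */
    if (!have_elevator)
        return (long)count;

    ret = demo_change(have_elevator, switch_result);
    if (!ret)
        return (long)count;
    return ret;
}
```

Keeping the two conventions in separate functions lets a driver test the result with a plain `if (err)`, as the mg_disk and tape_block hunks do.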
...@@ -297,6 +297,8 @@ static void enqueue_cmd_and_start_io(ctlr_info_t *h, ...@@ -297,6 +297,8 @@ static void enqueue_cmd_and_start_io(ctlr_info_t *h,
spin_lock_irqsave(&h->lock, flags); spin_lock_irqsave(&h->lock, flags);
addQ(&h->reqQ, c); addQ(&h->reqQ, c);
h->Qdepth++; h->Qdepth++;
if (h->Qdepth > h->maxQsinceinit)
h->maxQsinceinit = h->Qdepth;
start_io(h); start_io(h);
spin_unlock_irqrestore(&h->lock, flags); spin_unlock_irqrestore(&h->lock, flags);
} }
...@@ -4519,6 +4521,12 @@ static __devinit int cciss_kdump_hard_reset_controller(struct pci_dev *pdev) ...@@ -4519,6 +4521,12 @@ static __devinit int cciss_kdump_hard_reset_controller(struct pci_dev *pdev)
misc_fw_support = readl(&cfgtable->misc_fw_support); misc_fw_support = readl(&cfgtable->misc_fw_support);
use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET; use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET;
/* The doorbell reset seems to cause lockups on some Smart
* Arrays (e.g. P410, P410i, maybe others). Until this is
* fixed or at least isolated, avoid the doorbell reset.
*/
use_doorbell = 0;
rc = cciss_controller_hard_reset(pdev, vaddr, use_doorbell); rc = cciss_controller_hard_reset(pdev, vaddr, use_doorbell);
if (rc) if (rc)
goto unmap_cfgtable; goto unmap_cfgtable;
...@@ -4712,6 +4720,9 @@ static int __devinit cciss_init_one(struct pci_dev *pdev, ...@@ -4712,6 +4720,9 @@ static int __devinit cciss_init_one(struct pci_dev *pdev,
h->scatter_list = kmalloc(h->max_commands * h->scatter_list = kmalloc(h->max_commands *
sizeof(struct scatterlist *), sizeof(struct scatterlist *),
GFP_KERNEL); GFP_KERNEL);
if (!h->scatter_list)
goto clean4;
for (k = 0; k < h->nr_cmds; k++) { for (k = 0; k < h->nr_cmds; k++) {
h->scatter_list[k] = kmalloc(sizeof(struct scatterlist) * h->scatter_list[k] = kmalloc(sizeof(struct scatterlist) *
h->maxsgentries, h->maxsgentries,
......
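The first cciss hunk above records a high-water mark for the request queue depth at enqueue time. A minimal model of that bookkeeping is sketched below. `struct demo_ctlr`, `demo_enqueue()` and `demo_complete()` are hypothetical names, and locking is omitted because the model is single-threaded.

```c
/* Hypothetical, simplified stand-in for the controller state. */
struct demo_ctlr {
    unsigned int qdepth;           /* requests currently queued */
    unsigned int max_q_since_init; /* deepest queue seen so far */
};

/* Queue one request and update the high-water mark if needed. */
static void demo_enqueue(struct demo_ctlr *h)
{
    h->qdepth++;
    if (h->qdepth > h->max_q_since_init)
        h->max_q_since_init = h->qdepth;
}

/* Complete one request; the high-water mark never decreases. */
static void demo_complete(struct demo_ctlr *h)
{
    if (h->qdepth)
        h->qdepth--;
}
```

Updating the maximum at the point where the depth grows means the reported value cannot miss a peak that has already drained by the time it is read.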
@@ -477,7 +477,7 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio)
 pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset;
 if (bio_rw(bio) == WRITE) {
-bool barrier = (bio->bi_rw & REQ_HARDBARRIER);
+bool barrier = !!(bio->bi_rw & REQ_HARDBARRIER);
 struct file *file = lo->lo_backing_file;
 if (barrier) {
......
@@ -974,8 +974,7 @@ static int mg_probe(struct platform_device *plat_dev)
 host->breq->queuedata = host;
 /* mflash is random device, thanx for the noop */
-elevator_exit(host->breq->elevator);
-err = elevator_init(host->breq, "noop");
+err = elevator_change(host->breq, "noop");
 if (err) {
 printk(KERN_ERR "%s:%d (elevator_init) fail\n",
 __func__, __LINE__);
......
@@ -217,8 +217,7 @@ tapeblock_setup_device(struct tape_device * device)
 if (!blkdat->request_queue)
 return -ENOMEM;
-elevator_exit(blkdat->request_queue->elevator);
-rc = elevator_init(blkdat->request_queue, "noop");
+rc = elevator_change(blkdat->request_queue, "noop");
 if (rc)
 goto cleanup_queue;
......
@@ -413,10 +413,10 @@ int bio_integrity_prep(struct bio *bio)
 /* Allocate kernel buffer for protection data */
 len = sectors * blk_integrity_tuple_size(bi);
-buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp);
+buf = kmalloc(len, GFP_NOIO | q->bounce_gfp);
 if (unlikely(buf == NULL)) {
 printk(KERN_ERR "could not allocate integrity buffer\n");
-return -EIO;
+return -ENOMEM;
 }
 end = (((unsigned long) buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
......
@@ -808,7 +808,7 @@ int bdi_writeback_thread(void *data)
 wb->last_active = jiffies;
 set_current_state(TASK_INTERRUPTIBLE);
-if (!list_empty(&bdi->work_list)) {
+if (!list_empty(&bdi->work_list) || kthread_should_stop()) {
 __set_current_state(TASK_RUNNING);
 continue;
 }
......
@@ -136,6 +136,7 @@ extern ssize_t elv_iosched_store(struct request_queue *, const char *, size_t);
 extern int elevator_init(struct request_queue *, char *);
 extern void elevator_exit(struct elevator_queue *);
+extern int elevator_change(struct request_queue *, const char *);
 extern int elv_rq_merge_ok(struct request *, struct bio *);
 /*
......
...@@ -248,8 +248,18 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents, ...@@ -248,8 +248,18 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents,
left -= sg_size; left -= sg_size;
sg = alloc_fn(alloc_size, gfp_mask); sg = alloc_fn(alloc_size, gfp_mask);
if (unlikely(!sg)) if (unlikely(!sg)) {
/*
* Adjust entry count to reflect that the last
* entry of the previous table won't be used for
* linkage. Without this, sg_kfree() may get
* confused.
*/
if (prv)
table->nents = ++table->orig_nents;
return -ENOMEM; return -ENOMEM;
}
sg_init_table(sg, alloc_size); sg_init_table(sg, alloc_size);
table->nents = table->orig_nents += sg_size; table->nents = table->orig_nents += sg_size;
......
...@@ -445,8 +445,8 @@ static int bdi_forker_thread(void *ptr) ...@@ -445,8 +445,8 @@ static int bdi_forker_thread(void *ptr)
switch (action) { switch (action) {
case FORK_THREAD: case FORK_THREAD:
__set_current_state(TASK_RUNNING); __set_current_state(TASK_RUNNING);
task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s", task = kthread_create(bdi_writeback_thread, &bdi->wb,
dev_name(bdi->dev)); "flush-%s", dev_name(bdi->dev));
if (IS_ERR(task)) { if (IS_ERR(task)) {
/* /*
* If thread creation fails, force writeout of * If thread creation fails, force writeout of
...@@ -457,10 +457,13 @@ static int bdi_forker_thread(void *ptr) ...@@ -457,10 +457,13 @@ static int bdi_forker_thread(void *ptr)
/* /*
* The spinlock makes sure we do not lose * The spinlock makes sure we do not lose
* wake-ups when racing with 'bdi_queue_work()'. * wake-ups when racing with 'bdi_queue_work()'.
* And as soon as the bdi thread is visible, we
* can start it.
*/ */
spin_lock_bh(&bdi->wb_lock); spin_lock_bh(&bdi->wb_lock);
bdi->wb.task = task; bdi->wb.task = task;
spin_unlock_bh(&bdi->wb_lock); spin_unlock_bh(&bdi->wb_lock);
wake_up_process(task);
} }
break; break;
......