llist: Remove cpu_relax() usage in cmpxchg loops

Initial benchmarks show they're a net loss:

 $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
 $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
 $ ./sembench -t 2048 -w 1900 -o 0

Pre (with cpu_relax()):

 run time 30 seconds 778936 worker burns per second
 run time 30 seconds 912190 worker burns per second
 run time 30 seconds 817506 worker burns per second
 run time 30 seconds 830870 worker burns per second
 run time 30 seconds 845056 worker burns per second

Post (without cpu_relax()):

 run time 30 seconds 905920 worker burns per second
 run time 30 seconds 849046 worker burns per second
 run time 30 seconds 886286 worker burns per second
 run time 30 seconds 822320 worker burns per second
 run time 30 seconds 900283 worker burns per second

So about 4% faster. (!)

cpu_relax() stalls the pipeline; therefore, when used in a tight loop, it has the following benefits:

 - allows SMT siblings to have a go;
 - reduces pressure on the CPU interconnect.

However, cmpxchg loops are unfair and thus have unbounded completion time, so we should avoid getting into heavily contended situations where the above benefits make any difference in the first place.

A typical cmpxchg loop should not go round more than a handful of times at worst; adding extra delays therefore just slows things down.

Since the llist primitives are new, there aren't any bad users yet, and we should avoid growing any.

Heavily contended sites are generally better off using ticket locks for serialization, since those provide bounded completion times (FIFO-fair across CPUs).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836358.26517.43.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
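As a check on the "about 4%" figure, the sketch below averages the five sembench runs on each side and computes the relative change of the means. The helper names (mean, percent_change, pre_runs, post_runs) are illustrative only; the numbers are copied from the runs quoted in the commit message.

```c
#include <stddef.h>

/* Worker burns per second from the five sembench runs in the commit message. */
static const double pre_runs[]  = { 778936, 912190, 817506, 830870, 845056 };
static const double post_runs[] = { 905920, 849046, 886286, 822320, 900283 };

/* Arithmetic mean of n samples (n > 0). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of the post mean over the pre mean, in percent. */
static double percent_change(const double *pre, const double *post, size_t n)
{
	return (mean(post, n) / mean(pre, n) - 1.0) * 100.0;
}
```

With these inputs the means are 836911.6 (pre) and 872771.0 (post) worker burns per second, a change of roughly +4.28%. The individual runs overlap (the fastest pre run, 912190, beats every post run), so the figure is a difference of means over five runs each.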
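To illustrate why a ticket lock gives the bounded, FIFO-fair completion time mentioned above, here is a minimal user-space sketch using C11 atomics. It is not the kernel's spinlock implementation; struct ticket_lock, ticket_acquire() and ticket_release() are hypothetical names chosen for this example.

```c
#include <stdatomic.h>

/*
 * Hypothetical user-space ticket lock (C11 atomics), illustrating the FIFO idea only.
 * This is not the kernel's arch spinlock code.
 */
struct ticket_lock {
	atomic_uint next;   /* next ticket number to hand out */
	atomic_uint owner;  /* ticket number currently allowed to hold the lock */
};

#define TICKET_LOCK_INIT { 0, 0 }

static void ticket_acquire(struct ticket_lock *l)
{
	/* Take a ticket with a single fetch-and-add: it always succeeds, no retry loop. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

	/*
	 * Wait until our number comes up; waiters are served strictly in ticket order.
	 * This pure wait is where a spin-wait hint such as cpu_relax() would typically go,
	 * since the waiter cannot make progress until the current owner releases.
	 */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;
}

static void ticket_release(struct ticket_lock *l)
{
	/* Only the holder writes 'owner', so a plain load plus store suffices. */
	unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

Each waiter's delay is bounded by the number of tickets ahead of it. A cmpxchg retry loop offers no such bound, because any CPU can keep losing the race to others.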

Commit f0f1d32f authored Sep 12, 2011 by Peter Zijlstra; committed by Ingo Molnar Oct 04, 2011

Showing with 0 additions and 3 deletions

include/linux/llist.h +0 -1

lib/llist.c +0 -2

--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -161,7 +161,6 @@ static inline bool llist_add(struct llist_node *new, struct llist_head *head)
 		entry = cmpxchg(&head->first, old_entry, new);
 		if (entry == old_entry)
 			break;
-		cpu_relax();
 	}

 	return old_entry == NULL;

--- a/lib/llist.c
+++ b/lib/llist.c
@@ -49,7 +49,6 @@ bool llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
 		entry = cmpxchg(&head->first, old_entry, new_first);
 		if (entry == old_entry)
 			break;
-		cpu_relax();
 	}

 	return old_entry == NULL;
@@ -83,7 +82,6 @@ struct llist_node *llist_del_first(struct llist_head *head)
 		entry = cmpxchg(&head->first, old_entry, next);
 		if (entry == old_entry)
 			break;
-		cpu_relax();
 	}

 	return entry;
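After this patch, each loop simply retries the cmpxchg immediately on failure. The sketch below mirrors that shape in user space with C11 atomics, where a failed atomic_compare_exchange_weak_explicit() refreshes the expected value much as re-reading `entry` does in the kernel loop. struct node, struct stack_head, stack_push() and stack_pop() are hypothetical stand-ins for the llist types and functions, and the memory orders are an assumption of this sketch (the kernel's cmpxchg() is fully ordered). Like llist_del_first(), the pop is only safe with a single consumer, or with consumers serialized by a lock, because of the ABA problem.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for struct llist_node and struct llist_head. */
struct node {
	struct node *next;
};

struct stack_head {
	_Atomic(struct node *) first;
};

/* Push one node; returns true if the list was empty before (as llist_add() does). */
static bool stack_push(struct node *new, struct stack_head *head)
{
	struct node *old = atomic_load_explicit(&head->first, memory_order_relaxed);

	do {
		new->next = old;
		/* On failure 'old' now holds the current head; retry at once, no pause. */
	} while (!atomic_compare_exchange_weak_explicit(&head->first, &old, new,
							 memory_order_release,
							 memory_order_relaxed));
	return old == NULL;
}

/* Pop the first node, or return NULL if empty. Single consumer only (ABA). */
static struct node *stack_pop(struct stack_head *head)
{
	struct node *old = atomic_load_explicit(&head->first, memory_order_acquire);

	while (old != NULL) {
		struct node *next = old->next;

		/* On failure 'old' is refreshed and the loop re-reads its next pointer. */
		if (atomic_compare_exchange_weak_explicit(&head->first, &old, next,
							  memory_order_acquire,
							  memory_order_acquire))
			break;
	}
	return old;
}
```

Under light contention these loops complete in one or two iterations, which is the case the commit message argues for; a pause on the failure path would only lengthen each retry.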