Commit 603699bb authored by Mauro Carvalho Chehab, committed by Jonathan Corbet

static-keys.txt: standardize document format

Each text file under Documentation follows a different
format. Some don't even have titles!

Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- Mark titles;
- Add a warning mark;
- Mark literals and literal blocks;
- Adjust indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent c6d289d0
-Static Keys
------------
-DEPRECATED API:
-The use of 'struct static_key' directly, is now DEPRECATED. In addition
-static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:
-struct static_key false = STATIC_KEY_INIT_FALSE;
-struct static_key true = STATIC_KEY_INIT_TRUE;
-static_key_true()
-static_key_false()
-The updated API replacements are:
-DEFINE_STATIC_KEY_TRUE(key);
-DEFINE_STATIC_KEY_FALSE(key);
-DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);
-DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);
-static_branch_likely()
-static_branch_unlikely()
-0) Abstract
+===========
+Static Keys
+===========
+
+.. warning::
+
+   DEPRECATED API:
+
+   The use of 'struct static_key' directly, is now DEPRECATED. In addition
+   static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following::
+
+	struct static_key false = STATIC_KEY_INIT_FALSE;
+	struct static_key true = STATIC_KEY_INIT_TRUE;
+	static_key_true()
+	static_key_false()
+
+   The updated API replacements are::
+
+	DEFINE_STATIC_KEY_TRUE(key);
+	DEFINE_STATIC_KEY_FALSE(key);
+	DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);
+	DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);
+	static_branch_likely()
+	static_branch_unlikely()
+
+Abstract
+========
Static keys allows the inclusion of seldom used features in Static keys allows the inclusion of seldom used features in
performance-sensitive fast-path kernel code, via a GCC feature and a code performance-sensitive fast-path kernel code, via a GCC feature and a code
patching technique. A quick example: patching technique. A quick example::
DEFINE_STATIC_KEY_FALSE(key); DEFINE_STATIC_KEY_FALSE(key);
...@@ -45,7 +49,8 @@ The static_branch_unlikely() branch will be generated into the code with as litt ...@@ -45,7 +49,8 @@ The static_branch_unlikely() branch will be generated into the code with as litt
impact to the likely code path as possible. impact to the likely code path as possible.
1) Motivation Motivation
==========
Currently, tracepoints are implemented using a conditional branch. The Currently, tracepoints are implemented using a conditional branch. The
...@@ -60,7 +65,8 @@ possible. Although tracepoints are the original motivation for this work, other ...@@ -60,7 +65,8 @@ possible. Although tracepoints are the original motivation for this work, other
kernel code paths should be able to make use of the static keys facility. kernel code paths should be able to make use of the static keys facility.
2) Solution Solution
========
gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label: gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label:
...@@ -71,7 +77,7 @@ Using the 'asm goto', we can create branches that are either taken or not taken ...@@ -71,7 +77,7 @@ Using the 'asm goto', we can create branches that are either taken or not taken
by default, without the need to check memory. Then, at run-time, we can patch by default, without the need to check memory. Then, at run-time, we can patch
the branch site to change the branch direction. the branch site to change the branch direction.
For example, if we have a simple branch that is disabled by default: For example, if we have a simple branch that is disabled by default::
if (static_branch_unlikely(&key)) if (static_branch_unlikely(&key))
printk("I am the true branch\n"); printk("I am the true branch\n");
...@@ -87,14 +93,15 @@ optimization. ...@@ -87,14 +93,15 @@ optimization.
This lowlevel patching mechanism is called 'jump label patching', and it gives This lowlevel patching mechanism is called 'jump label patching', and it gives
the basis for the static keys facility. the basis for the static keys facility.
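
As a rough illustration of the 'asm goto' statement described above, the following userspace sketch may help. It is not the kernel's implementation: the function name `branch_taken` and the label `l_yes` are made up for this example, and the asm body is left empty, so control always falls through. The kernel's per-architecture code would instead emit a patchable instruction at this point.

```c
/*
 * Userspace sketch only (hypothetical names). Needs a compiler that
 * supports 'asm goto', such as gcc 4.5 or later.
 */
static int branch_taken(void)
{
	/*
	 * 'asm goto' declares that this asm statement may jump to l_yes.
	 * The body is empty here, so execution always falls through; no
	 * memory is loaded or tested to make the decision.
	 */
	__asm__ goto("" : : : : l_yes);
	return 0;	/* default path: branch not taken */
l_yes:
	return 1;	/* reachable only if the asm jumps to l_yes */
}
```

Calling `branch_taken()` returns 0 in this sketch, because nothing ever patches the empty asm body into a jump.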
3) Static key label API, usage and examples: Static key label API, usage and examples
========================================
In order to make use of this optimization you must first define a key: In order to make use of this optimization you must first define a key::
DEFINE_STATIC_KEY_TRUE(key); DEFINE_STATIC_KEY_TRUE(key);
or: or::
DEFINE_STATIC_KEY_FALSE(key); DEFINE_STATIC_KEY_FALSE(key);
...@@ -102,14 +109,14 @@ or: ...@@ -102,14 +109,14 @@ or:
The key must be global, that is, it can't be allocated on the stack or dynamically The key must be global, that is, it can't be allocated on the stack or dynamically
allocated at run-time. allocated at run-time.
The key is then used in code as: The key is then used in code as::
if (static_branch_unlikely(&key)) if (static_branch_unlikely(&key))
do unlikely code do unlikely code
else else
do likely code do likely code
Or: Or::
if (static_branch_likely(&key)) if (static_branch_likely(&key))
do likely code do likely code
...@@ -120,15 +127,15 @@ Keys defined via DEFINE_STATIC_KEY_TRUE(), or DEFINE_STATIC_KEY_FALSE, may ...@@ -120,15 +127,15 @@ Keys defined via DEFINE_STATIC_KEY_TRUE(), or DEFINE_STATIC_KEY_FALSE, may
be used in either static_branch_likely() or static_branch_unlikely() be used in either static_branch_likely() or static_branch_unlikely()
statements. statements.
Branch(es) can be set true via: Branch(es) can be set true via::
static_branch_enable(&key); static_branch_enable(&key);
or false via: or false via::
static_branch_disable(&key); static_branch_disable(&key);
The branch(es) can then be switched via reference counts: The branch(es) can then be switched via reference counts::
static_branch_inc(&key); static_branch_inc(&key);
... ...
...@@ -142,11 +149,11 @@ static_branch_inc(), will change the branch back to true. Likewise, if the ...@@ -142,11 +149,11 @@ static_branch_inc(), will change the branch back to true. Likewise, if the
key is initialized false, a 'static_branch_inc()', will change the branch to key is initialized false, a 'static_branch_inc()', will change the branch to
true. And then a 'static_branch_dec()', will again make the branch false. true. And then a 'static_branch_dec()', will again make the branch false.
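
The reference-count behaviour described above can be modelled in plain C. This is a hypothetical userspace model, assuming the branch is true whenever the count is greater than zero; the `model_*` names are invented for illustration, and the real API also patches code and handles locking, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a key; this is not the kernel's struct static_key. */
struct model_key {
	int enabled;	/* reference count; the branch is true while > 0 */
};

#define MODEL_KEY_INIT_TRUE	{ .enabled = 1 }
#define MODEL_KEY_INIT_FALSE	{ .enabled = 0 }

/* Models the branch direction seen by the likely/unlikely tests. */
static bool model_branch(const struct model_key *key)
{
	return key->enabled > 0;
}

/* Models an increment: takes one reference on the key. */
static void model_inc(struct model_key *key)
{
	key->enabled++;
}

/* Models a decrement: drops one reference on the key. */
static void model_dec(struct model_key *key)
{
	assert(key->enabled > 0);	/* must pair with an earlier reference */
	key->enabled--;
}
```

In this model a key initialized false becomes true after one `model_inc()` and false again after the matching `model_dec()`; a key initialized true becomes false after one `model_dec()`.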
Where an array of keys is required, it can be defined as: Where an array of keys is required, it can be defined as::
DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);
or: or::
DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);
...@@ -159,96 +166,98 @@ simply fall back to a traditional, load, test, and jump sequence. Also, the ...@@ -159,96 +166,98 @@ simply fall back to a traditional, load, test, and jump sequence. Also, the
struct jump_entry table must be at least 4-byte aligned because the struct jump_entry table must be at least 4-byte aligned because the
static_key->entry field makes use of the two least significant bits. static_key->entry field makes use of the two least significant bits.
* select HAVE_ARCH_JUMP_LABEL, see: arch/x86/Kconfig * ``select HAVE_ARCH_JUMP_LABEL``,
see: arch/x86/Kconfig
* #define JUMP_LABEL_NOP_SIZE, see: arch/x86/include/asm/jump_label.h
* __always_inline bool arch_static_branch(struct static_key *key, bool branch), see: * ``#define JUMP_LABEL_NOP_SIZE``,
arch/x86/include/asm/jump_label.h see: arch/x86/include/asm/jump_label.h
* __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch), * ``__always_inline bool arch_static_branch(struct static_key *key, bool branch)``,
see: arch/x86/include/asm/jump_label.h see: arch/x86/include/asm/jump_label.h
* void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type), * ``__always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)``,
see: arch/x86/kernel/jump_label.c see: arch/x86/include/asm/jump_label.h
* __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type), * ``void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)``,
see: arch/x86/kernel/jump_label.c see: arch/x86/kernel/jump_label.c
* ``__init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type)``,
see: arch/x86/kernel/jump_label.c
* struct jump_entry, see: arch/x86/include/asm/jump_label.h * ``struct jump_entry``,
see: arch/x86/include/asm/jump_label.h
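
The alignment requirement mentioned above follows from a general technique: a pointer to an object that is at least 4-byte aligned always has its two least significant bits clear, so those bits can carry flags. The sketch below shows that technique in userspace C. The `struct entry` layout, the helper names and the flag values are assumptions made for this example; what the kernel actually stores in those bits is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry type; its 32-bit members give it 4-byte alignment. */
struct entry {
	int32_t code;
	int32_t target;
};

#define TAG_MASK	((uintptr_t)3)	/* the two least significant bits */

/* Combine an aligned pointer and a 2-bit flag value into one word. */
static uintptr_t pack(const struct entry *e, unsigned int flags)
{
	assert(((uintptr_t)e & TAG_MASK) == 0);	/* needs >= 4-byte alignment */
	assert(flags <= TAG_MASK);
	return (uintptr_t)e | flags;
}

/* Recover the pointer by masking the flag bits off. */
static const struct entry *unpack_ptr(uintptr_t word)
{
	return (const struct entry *)(word & ~TAG_MASK);
}

/* Recover the flag bits. */
static unsigned int unpack_flags(uintptr_t word)
{
	return (unsigned int)(word & TAG_MASK);
}
```

If the table were only 1- or 2-byte aligned, the low bits of an entry address could be non-zero and the first assertion in `pack()` would fail, which is why the alignment matters.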
5) Static keys / jump label analysis, results (x86_64): 5) Static keys / jump label analysis, results (x86_64):
As an example, let's add the following branch to 'getppid()', such that the As an example, let's add the following branch to 'getppid()', such that the
system call now looks like: system call now looks like::
SYSCALL_DEFINE0(getppid) SYSCALL_DEFINE0(getppid)
{ {
int pid; int pid;
+ if (static_branch_unlikely(&key)) + if (static_branch_unlikely(&key))
+ printk("I am the true branch\n"); + printk("I am the true branch\n");
rcu_read_lock(); rcu_read_lock();
pid = task_tgid_vnr(rcu_dereference(current->real_parent)); pid = task_tgid_vnr(rcu_dereference(current->real_parent));
rcu_read_unlock(); rcu_read_unlock();
return pid; return pid;
} }
The resulting instructions with jump labels generated by GCC is: The resulting instructions with jump labels generated by GCC is::
ffffffff81044290 <sys_getppid>: ffffffff81044290 <sys_getppid>:
ffffffff81044290: 55 push %rbp ffffffff81044290: 55 push %rbp
ffffffff81044291: 48 89 e5 mov %rsp,%rbp ffffffff81044291: 48 89 e5 mov %rsp,%rbp
ffffffff81044294: e9 00 00 00 00 jmpq ffffffff81044299 <sys_getppid+0x9> ffffffff81044294: e9 00 00 00 00 jmpq ffffffff81044299 <sys_getppid+0x9>
ffffffff81044299: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax ffffffff81044299: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax
ffffffff810442a0: 00 00 ffffffff810442a0: 00 00
ffffffff810442a2: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax ffffffff810442a2: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax
ffffffff810442a9: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax ffffffff810442a9: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax
ffffffff810442b0: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi ffffffff810442b0: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi
ffffffff810442b7: e8 f4 d9 00 00 callq ffffffff81051cb0 <pid_vnr> ffffffff810442b7: e8 f4 d9 00 00 callq ffffffff81051cb0 <pid_vnr>
ffffffff810442bc: 5d pop %rbp ffffffff810442bc: 5d pop %rbp
ffffffff810442bd: 48 98 cltq ffffffff810442bd: 48 98 cltq
ffffffff810442bf: c3 retq ffffffff810442bf: c3 retq
ffffffff810442c0: 48 c7 c7 e3 54 98 81 mov $0xffffffff819854e3,%rdi ffffffff810442c0: 48 c7 c7 e3 54 98 81 mov $0xffffffff819854e3,%rdi
ffffffff810442c7: 31 c0 xor %eax,%eax ffffffff810442c7: 31 c0 xor %eax,%eax
ffffffff810442c9: e8 71 13 6d 00 callq ffffffff8171563f <printk> ffffffff810442c9: e8 71 13 6d 00 callq ffffffff8171563f <printk>
ffffffff810442ce: eb c9 jmp ffffffff81044299 <sys_getppid+0x9> ffffffff810442ce: eb c9 jmp ffffffff81044299 <sys_getppid+0x9>
Without the jump label optimization it looks like: Without the jump label optimization it looks like::
ffffffff810441f0 <sys_getppid>: ffffffff810441f0 <sys_getppid>:
ffffffff810441f0: 8b 05 8a 52 d8 00 mov 0xd8528a(%rip),%eax # ffffffff81dc9480 <key> ffffffff810441f0: 8b 05 8a 52 d8 00 mov 0xd8528a(%rip),%eax # ffffffff81dc9480 <key>
ffffffff810441f6: 55 push %rbp ffffffff810441f6: 55 push %rbp
ffffffff810441f7: 48 89 e5 mov %rsp,%rbp ffffffff810441f7: 48 89 e5 mov %rsp,%rbp
ffffffff810441fa: 85 c0 test %eax,%eax ffffffff810441fa: 85 c0 test %eax,%eax
ffffffff810441fc: 75 27 jne ffffffff81044225 <sys_getppid+0x35> ffffffff810441fc: 75 27 jne ffffffff81044225 <sys_getppid+0x35>
ffffffff810441fe: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax ffffffff810441fe: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax
ffffffff81044205: 00 00 ffffffff81044205: 00 00
ffffffff81044207: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax ffffffff81044207: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax
ffffffff8104420e: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax ffffffff8104420e: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax
ffffffff81044215: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi ffffffff81044215: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi
ffffffff8104421c: e8 2f da 00 00 callq ffffffff81051c50 <pid_vnr> ffffffff8104421c: e8 2f da 00 00 callq ffffffff81051c50 <pid_vnr>
ffffffff81044221: 5d pop %rbp ffffffff81044221: 5d pop %rbp
ffffffff81044222: 48 98 cltq ffffffff81044222: 48 98 cltq
ffffffff81044224: c3 retq ffffffff81044224: c3 retq
ffffffff81044225: 48 c7 c7 13 53 98 81 mov $0xffffffff81985313,%rdi ffffffff81044225: 48 c7 c7 13 53 98 81 mov $0xffffffff81985313,%rdi
ffffffff8104422c: 31 c0 xor %eax,%eax ffffffff8104422c: 31 c0 xor %eax,%eax
ffffffff8104422e: e8 60 0f 6d 00 callq ffffffff81715193 <printk> ffffffff8104422e: e8 60 0f 6d 00 callq ffffffff81715193 <printk>
ffffffff81044233: eb c9 jmp ffffffff810441fe <sys_getppid+0xe> ffffffff81044233: eb c9 jmp ffffffff810441fe <sys_getppid+0xe>
ffffffff81044235: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) ffffffff81044235: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff8104423c: 00 00 00 00 ffffffff8104423c: 00 00 00 00
Thus, the disable jump label case adds a 'mov', 'test' and 'jne' instruction Thus, the disable jump label case adds a 'mov', 'test' and 'jne' instruction
vs. the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0, is patched vs. the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0, is patched
to a 5 byte atomic no-op instruction at boot-time.) Thus, the disabled jump to a 5 byte atomic no-op instruction at boot-time.) Thus, the disabled jump
label case adds: label case adds::
6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 addition bytes. 6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 addition bytes.
If we then include the padding bytes, the jump label code saves, 16 total bytes If we then include the padding bytes, the jump label code saves, 16 total bytes
of instruction memory for this small function. In this case the non-jump label of instruction memory for this small function. In this case the non-jump label
...@@ -262,7 +271,7 @@ Since there are a number of static key API uses in the scheduler paths, ...@@ -262,7 +271,7 @@ Since there are a number of static key API uses in the scheduler paths,
'pipe-test' (also known as 'perf bench sched pipe') can be used to show the 'pipe-test' (also known as 'perf bench sched pipe') can be used to show the
performance improvement. Testing done on 3.3.0-rc2: performance improvement. Testing done on 3.3.0-rc2:
jump label disabled: jump label disabled::
Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):
...@@ -279,7 +288,7 @@ jump label disabled: ...@@ -279,7 +288,7 @@ jump label disabled:
1.601607384 seconds time elapsed ( +- 0.07% ) 1.601607384 seconds time elapsed ( +- 0.07% )
jump label enabled: jump label enabled::
Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):
......