Commit dca1e58e authored by Mauro Carvalho Chehab

kernel-hacking: update document

This document is fairly updated. Yet, some stuff moved to
other kernel headers. So, update to point to the right
places.

While here, adjust some minor ReST markups.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
parent c4fcd7ca
...@@ -56,7 +56,7 @@ interrupts. You can sleep, by calling :c:func:`schedule()`. ...@@ -56,7 +56,7 @@ interrupts. You can sleep, by calling :c:func:`schedule()`.
In user context, the ``current`` pointer (indicating the task we are In user context, the ``current`` pointer (indicating the task we are
currently executing) is valid, and :c:func:`in_interrupt()` currently executing) is valid, and :c:func:`in_interrupt()`
(``include/linux/interrupt.h``) is false. (``include/linux/preempt.h``) is false.
.. warning:: .. warning::
...@@ -114,12 +114,12 @@ time, although different tasklets can run simultaneously. ...@@ -114,12 +114,12 @@ time, although different tasklets can run simultaneously.
Kuznetsov had at the time. Kuznetsov had at the time.
You can tell you are in a softirq (or tasklet) using the You can tell you are in a softirq (or tasklet) using the
:c:func:`in_softirq()` macro (``include/linux/interrupt.h``). :c:func:`in_softirq()` macro (``include/linux/preempt.h``).
.. warning:: .. warning::
Beware that this will return a false positive if a bh lock (see Beware that this will return a false positive if a
below) is held. :ref:`botton half lock <local_bh_disable>` is held.
Some Basic Rules Some Basic Rules
================ ================
...@@ -154,9 +154,7 @@ The Linux kernel is portable ...@@ -154,9 +154,7 @@ The Linux kernel is portable
ioctls: Not writing a new system call ioctls: Not writing a new system call
===================================== =====================================
A system call generally looks like this A system call generally looks like this::
::
asmlinkage long sys_mycall(int arg) asmlinkage long sys_mycall(int arg)
{ {
...@@ -175,7 +173,9 @@ If all your routine does is read or write some parameter, consider ...@@ -175,7 +173,9 @@ If all your routine does is read or write some parameter, consider
implementing a :c:func:`sysfs()` interface instead. implementing a :c:func:`sysfs()` interface instead.
Inside the ioctl you're in user context to a process. When a error Inside the ioctl you're in user context to a process. When a error
occurs you return a negated errno (see ``include/linux/errno.h``), occurs you return a negated errno (see
``include/uapi/asm-generic/errno-base.h``,
``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
otherwise you return 0. otherwise you return 0.
After you slept you should check if a signal occurred: the Unix/Linux After you slept you should check if a signal occurred: the Unix/Linux
...@@ -195,9 +195,7 @@ some data structure. ...@@ -195,9 +195,7 @@ some data structure.
If you're doing longer computations: first think userspace. If you If you're doing longer computations: first think userspace. If you
**really** want to do it in kernel you should regularly check if you need **really** want to do it in kernel you should regularly check if you need
to give up the CPU (remember there is cooperative multitasking per CPU). to give up the CPU (remember there is cooperative multitasking per CPU).
Idiom: Idiom::
::
cond_resched(); /* Will sleep */ cond_resched(); /* Will sleep */
...@@ -231,26 +229,24 @@ Really. ...@@ -231,26 +229,24 @@ Really.
Common Routines Common Routines
=============== ===============
:c:func:`printk()` ``include/linux/kernel.h`` :c:func:`printk()`
--------------------------------------------- ------------------
Defined in ``include/linux/printk.h``
:c:func:`printk()` feeds kernel messages to the console, dmesg, and :c:func:`printk()` feeds kernel messages to the console, dmesg, and
the syslog daemon. It is useful for debugging and reporting errors, and the syslog daemon. It is useful for debugging and reporting errors, and
can be used inside interrupt context, but use with caution: a machine can be used inside interrupt context, but use with caution: a machine
which has its console flooded with printk messages is unusable. It uses which has its console flooded with printk messages is unusable. It uses
a format string mostly compatible with ANSI C printf, and C string a format string mostly compatible with ANSI C printf, and C string
concatenation to give it a first "priority" argument: concatenation to give it a first "priority" argument::
::
printk(KERN_INFO "i = %u\n", i); printk(KERN_INFO "i = %u\n", i);
See ``include/linux/kernel.h``; for other ``KERN_`` values; these are See ``include/linux/kern_levels.h``; for other ``KERN_`` values; these are
interpreted by syslog as the level. Special case: for printing an IP interpreted by syslog as the level. Special case: for printing an IP
address use address use::
::
__be32 ipaddress; __be32 ipaddress;
printk(KERN_INFO "my ip: %pI4\n", &ipaddress); printk(KERN_INFO "my ip: %pI4\n", &ipaddress);
...@@ -270,8 +266,10 @@ overruns. Make sure that will be enough. ...@@ -270,8 +266,10 @@ overruns. Make sure that will be enough.
on top of its printf function: "Printf should not be used for on top of its printf function: "Printf should not be used for
chit-chat". You should follow that advice. chit-chat". You should follow that advice.
:c:func:`copy_[to/from]_user()` / :c:func:`get_user()` / :c:func:`put_user()` ``include/linux/uaccess.h`` :c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
--------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------
Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``
**[SLEEPS]** **[SLEEPS]**
...@@ -297,8 +295,10 @@ The functions may sleep implicitly. This should never be called outside ...@@ -297,8 +295,10 @@ The functions may sleep implicitly. This should never be called outside
user context (it makes no sense), with interrupts disabled, or a user context (it makes no sense), with interrupts disabled, or a
spinlock held. spinlock held.
:c:func:`kmalloc()`/:c:func:`kfree()` ``include/linux/slab.h`` :c:func:`kmalloc()`/:c:func:`kfree()`
-------------------------------------------------------------- -------------------------------------
Defined in ``include/linux/slab.h``
**[MAY SLEEP: SEE BELOW]** **[MAY SLEEP: SEE BELOW]**
...@@ -324,9 +324,9 @@ message, then maybe you called a sleeping allocation function from ...@@ -324,9 +324,9 @@ message, then maybe you called a sleeping allocation function from
interrupt context without ``GFP_ATOMIC``. You should really fix that. interrupt context without ``GFP_ATOMIC``. You should really fix that.
Run, don't walk. Run, don't walk.
If you are allocating at least ``PAGE_SIZE`` (``include/asm/page.h``) If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
bytes, consider using :c:func:`__get_free_pages()` ``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
(``include/linux/mm.h``). It takes an order argument (0 for page sized, (``include/linux/gfp.h``). It takes an order argument (0 for page sized,
1 for double page, 2 for four pages etc.) and the same memory priority 1 for double page, 2 for four pages etc.) and the same memory priority
flag word as above. flag word as above.
...@@ -344,24 +344,30 @@ routine. ...@@ -344,24 +344,30 @@ routine.
Before inventing your own cache of often-used objects consider using a Before inventing your own cache of often-used objects consider using a
slab cache in ``include/linux/slab.h`` slab cache in ``include/linux/slab.h``
:c:func:`current()` ``include/asm/current.h`` :c:func:`current()`
--------------------------------------------- -------------------
Defined in ``include/asm/current.h``
This global variable (really a macro) contains a pointer to the current This global variable (really a macro) contains a pointer to the current
task structure, so is only valid in user context. For example, when a task structure, so is only valid in user context. For example, when a
process makes a system call, this will point to the task structure of process makes a system call, this will point to the task structure of
the calling process. It is **not NULL** in interrupt context. the calling process. It is **not NULL** in interrupt context.
:c:func:`mdelay()`/:c:func:`udelay()` ``include/asm/delay.h`` ``include/linux/delay.h`` :c:func:`mdelay()`/:c:func:`udelay()`
--------------------------------------------------------------------------------------- -------------------------------------
Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``
The :c:func:`udelay()` and :c:func:`ndelay()` functions can be The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
used for small pauses. Do not use large values with them as you risk used for small pauses. Do not use large values with them as you risk
overflow - the helper function :c:func:`mdelay()` is useful here, or overflow - the helper function :c:func:`mdelay()` is useful here, or
consider :c:func:`msleep()`. consider :c:func:`msleep()`.
:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()` ``include/asm/byteorder.h`` :c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
--------------------------------------------------------------------------------------------------------------------------- -----------------------------------------------------------------------------------------------
Defined in ``include/asm/byteorder.h``
The :c:func:`cpu_to_be32()` family (where the "32" can be replaced The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
by 64 or 16, and the "be" can be replaced by "le") are the general way by 64 or 16, and the "be" can be replaced by "le") are the general way
...@@ -375,8 +381,10 @@ to the given type, and return the converted value. The other variation ...@@ -375,8 +381,10 @@ to the given type, and return the converted value. The other variation
is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
convert value referred to by the pointer, and return void. convert value referred to by the pointer, and return void.
:c:func:`local_irq_save()`/:c:func:`local_irq_restore()` ``include/linux/irqflags.h`` :c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
------------------------------------------------------------------------------------- --------------------------------------------------------
Defined in ``include/linux/irqflags.h``
These routines disable hard interrupts on the local CPU, and restore These routines disable hard interrupts on the local CPU, and restore
them. They are reentrant; saving the previous state in their one them. They are reentrant; saving the previous state in their one
...@@ -384,16 +392,23 @@ them. They are reentrant; saving the previous state in their one ...@@ -384,16 +392,23 @@ them. They are reentrant; saving the previous state in their one
enabled, you can simply use :c:func:`local_irq_disable()` and enabled, you can simply use :c:func:`local_irq_disable()` and
:c:func:`local_irq_enable()`. :c:func:`local_irq_enable()`.
:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()` ``include/linux/interrupt.h`` .. _local_bh_disable:
--------------------------------------------------------------------------------------
:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
--------------------------------------------------------
Defined in ``include/linux/bottom_half.h``
These routines disable soft interrupts on the local CPU, and restore These routines disable soft interrupts on the local CPU, and restore
them. They are reentrant; if soft interrupts were disabled before, they them. They are reentrant; if soft interrupts were disabled before, they
will still be disabled after this pair of functions has been called. will still be disabled after this pair of functions has been called.
They prevent softirqs and tasklets from running on the current CPU. They prevent softirqs and tasklets from running on the current CPU.
:c:func:`smp_processor_id()`() ``include/asm/smp.h`` :c:func:`smp_processor_id()`
---------------------------------------------------- ----------------------------
Defined in ``include/linux/smp.h``
:c:func:`get_cpu()` disables preemption (so you won't suddenly get :c:func:`get_cpu()` disables preemption (so you won't suddenly get
moved to another CPU) and returns the current processor number, between moved to another CPU) and returns the current processor number, between
...@@ -405,8 +420,10 @@ If you know you cannot be preempted by another task (ie. you are in ...@@ -405,8 +420,10 @@ If you know you cannot be preempted by another task (ie. you are in
interrupt context, or have preemption disabled) you can use interrupt context, or have preemption disabled) you can use
smp_processor_id(). smp_processor_id().
``__init``/``__exit``/``__initdata`` ``include/linux/init.h`` ``__init``/``__exit``/``__initdata``
------------------------------------------------------------- ------------------------------------
Defined in ``include/linux/init.h``
After boot, the kernel frees up a special section; functions marked with After boot, the kernel frees up a special section; functions marked with
``__init`` and data structures marked with ``__initdata`` are dropped ``__init`` and data structures marked with ``__initdata`` are dropped
...@@ -415,10 +432,13 @@ initialization. ``__exit`` is used to declare a function which is only ...@@ -415,10 +432,13 @@ initialization. ``__exit`` is used to declare a function which is only
required on exit: the function will be dropped if this file is not required on exit: the function will be dropped if this file is not
compiled as a module. See the header file for use. Note that it makes no compiled as a module. See the header file for use. Note that it makes no
sense for a function marked with ``__init`` to be exported to modules sense for a function marked with ``__init`` to be exported to modules
with :c:func:`EXPORT_SYMBOL()` - this will break. with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()`- this
will break.
:c:func:`__initcall()`/:c:func:`module_init()`
----------------------------------------------
:c:func:`__initcall()`/:c:func:`module_init()` ``include/linux/init.h`` Defined in ``include/linux/init.h`` / ``include/linux/module.h``
-----------------------------------------------------------------------
Many parts of the kernel are well served as a module Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel). Using the (dynamically-loadable parts of the kernel). Using the
...@@ -438,8 +458,11 @@ to fail (unfortunately, this has no effect if the module is compiled ...@@ -438,8 +458,11 @@ to fail (unfortunately, this has no effect if the module is compiled
into the kernel). This function is called in user context with into the kernel). This function is called in user context with
interrupts enabled, so it can sleep. interrupts enabled, so it can sleep.
:c:func:`module_exit()` ``include/linux/init.h`` :c:func:`module_exit()`
------------------------------------------------ -----------------------
Defined in ``include/linux/module.h``
This macro defines the function to be called at module removal time (or This macro defines the function to be called at module removal time (or
never, in the case of the file compiled into the kernel). It will only never, in the case of the file compiled into the kernel). It will only
...@@ -450,8 +473,10 @@ it returns. ...@@ -450,8 +473,10 @@ it returns.
Note that this macro is optional: if it is not present, your module will Note that this macro is optional: if it is not present, your module will
not be removable (except for 'rmmod -f'). not be removable (except for 'rmmod -f').
:c:func:`try_module_get()`/:c:func:`module_put()` ``include/linux/module.h`` :c:func:`try_module_get()`/:c:func:`module_put()`
---------------------------------------------------------------------------- -------------------------------------------------
Defined in ``include/linux/module.h``
These manipulate the module usage count, to protect against removal (a These manipulate the module usage count, to protect against removal (a
module also can't be removed if another module uses one of its exported module also can't be removed if another module uses one of its exported
...@@ -472,8 +497,8 @@ Wait Queues ``include/linux/wait.h`` ...@@ -472,8 +497,8 @@ Wait Queues ``include/linux/wait.h``
A wait queue is used to wait for someone to wake you up when a certain A wait queue is used to wait for someone to wake you up when a certain
condition is true. They must be used carefully to ensure there is no condition is true. They must be used carefully to ensure there is no
race condition. You declare a ``wait_queue_head_t``, and then processes race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
which want to wait for that condition declare a ``wait_queue_t`` which want to wait for that condition declare a :c:type:`wait_queue_t`
referring to themselves, and place that in the queue. referring to themselves, and place that in the queue.
Declaring Declaring
...@@ -490,15 +515,15 @@ Queuing ...@@ -490,15 +515,15 @@ Queuing
Placing yourself in the waitqueue is fairly complex, because you must Placing yourself in the waitqueue is fairly complex, because you must
put yourself in the queue before checking the condition. There is a put yourself in the queue before checking the condition. There is a
macro to do this: :c:func:`wait_event_interruptible()` macro to do this: :c:func:`wait_event_interruptible()`
``include/linux/wait.h`` The first argument is the wait queue head, and (``include/linux/wait.h``) The first argument is the wait queue head, and
the second is an expression which is evaluated; the macro returns 0 when the second is an expression which is evaluated; the macro returns 0 when
this expression is true, or -ERESTARTSYS if a signal is received. The this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
:c:func:`wait_event()` version ignores signals. :c:func:`wait_event()` version ignores signals.
Waking Up Queued Tasks Waking Up Queued Tasks
---------------------- ----------------------
Call :c:func:`wake_up()` ``include/linux/wait.h``;, which will wake Call :c:func:`wake_up()` (``include/linux/wait.h``);, which will wake
up every process in the queue. The exception is if one has up every process in the queue. The exception is if one has
``TASK_EXCLUSIVE`` set, in which case the remainder of the queue will ``TASK_EXCLUSIVE`` set, in which case the remainder of the queue will
not be woken. There are other variants of this basic function available not be woken. There are other variants of this basic function available
...@@ -508,9 +533,9 @@ Atomic Operations ...@@ -508,9 +533,9 @@ Atomic Operations
================= =================
Certain operations are guaranteed atomic on all platforms. The first Certain operations are guaranteed atomic on all platforms. The first
class of operations work on ``atomic_t`` ``include/asm/atomic.h``; this class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
contains a signed integer (at least 32 bits long), and you must use this contains a signed integer (at least 32 bits long), and you must use
these functions to manipulate or read atomic_t variables. these functions to manipulate or read :c:type:`atomic_t` variables.
:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set :c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
the counter, :c:func:`atomic_add()`, :c:func:`atomic_sub()`, the counter, :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
:c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and :c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and
...@@ -534,7 +559,7 @@ true if the bit was previously set; these are particularly useful for ...@@ -534,7 +559,7 @@ true if the bit was previously set; these are particularly useful for
atomically setting flags. atomically setting flags.
It is possible to call these operations with bit indices greater than It is possible to call these operations with bit indices greater than
BITS_PER_LONG. The resulting behavior is strange on big-endian ``BITS_PER_LONG``. The resulting behavior is strange on big-endian
platforms though so it is a good idea not to do this. platforms though so it is a good idea not to do this.
Symbols Symbols
...@@ -546,14 +571,18 @@ be used anywhere in the kernel). However, for modules, a special ...@@ -546,14 +571,18 @@ be used anywhere in the kernel). However, for modules, a special
exported symbol table is kept which limits the entry points to the exported symbol table is kept which limits the entry points to the
kernel proper. Modules can also export symbols. kernel proper. Modules can also export symbols.
:c:func:`EXPORT_SYMBOL()` ``include/linux/export.h`` :c:func:`EXPORT_SYMBOL()`
---------------------------------------------------- -------------------------
Defined in ``include/linux/export.h``
This is the classic method of exporting a symbol: dynamically loaded This is the classic method of exporting a symbol: dynamically loaded
modules will be able to use the symbol as normal. modules will be able to use the symbol as normal.
:c:func:`EXPORT_SYMBOL_GPL()` ``include/linux/export.h`` :c:func:`EXPORT_SYMBOL_GPL()`
-------------------------------------------------------- -----------------------------
Defined in ``include/linux/export.h``
Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
...@@ -579,11 +608,11 @@ Return Conventions ...@@ -579,11 +608,11 @@ Return Conventions
------------------ ------------------
For code called in user context, it's very common to defy C convention, For code called in user context, it's very common to defy C convention,
and return 0 for success, and a negative error number (eg. -EFAULT) for and return 0 for success, and a negative error number (eg. ``-EFAULT``) for
failure. This can be unintuitive at first, but it's fairly widespread in failure. This can be unintuitive at first, but it's fairly widespread in
the kernel. the kernel.
Using :c:func:`ERR_PTR()` ``include/linux/err.h``; to encode a Using :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
negative error number into a pointer, and :c:func:`IS_ERR()` and negative error number into a pointer, and :c:func:`IS_ERR()` and
:c:func:`PTR_ERR()` to get it back out again: avoids a separate :c:func:`PTR_ERR()` to get it back out again: avoids a separate
pointer parameter for the error number. Icky, but in a good way. pointer parameter for the error number. Icky, but in a good way.
...@@ -603,9 +632,7 @@ Initializing structure members ...@@ -603,9 +632,7 @@ Initializing structure members
------------------------------ ------------------------------
The preferred method of initializing structures is to use designated The preferred method of initializing structures is to use designated
initialisers, as defined by ISO C99, eg: initialisers, as defined by ISO C99, eg::
::
static struct block_device_operations opt_fops = { static struct block_device_operations opt_fops = {
.open = opt_open, .open = opt_open,
...@@ -716,18 +743,14 @@ Kernel Cantrips ...@@ -716,18 +743,14 @@ Kernel Cantrips
Some favorites from browsing the source. Feel free to add to this list. Some favorites from browsing the source. Feel free to add to this list.
``arch/x86/include/asm/delay.h:`` ``arch/x86/include/asm/delay.h``::
::
#define ndelay(n) (__builtin_constant_p(n) ? \ #define ndelay(n) (__builtin_constant_p(n) ? \
((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \ ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
__ndelay(n)) __ndelay(n))
``include/linux/fs.h``: ``include/linux/fs.h``::
::
/* /*
* Kernel pointers have redundant information, so we can use a * Kernel pointers have redundant information, so we can use a
...@@ -741,9 +764,7 @@ Some favorites from browsing the source. Feel free to add to this list. ...@@ -741,9 +764,7 @@ Some favorites from browsing the source. Feel free to add to this list.
#define PTR_ERR(ptr) ((long)(ptr)) #define PTR_ERR(ptr) ((long)(ptr))
#define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000)) #define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000))
``arch/x86/include/asm/uaccess_32.h:`` ``arch/x86/include/asm/uaccess_32.h:``::
::
#define copy_to_user(to,from,n) \ #define copy_to_user(to,from,n) \
(__builtin_constant_p(n) ? \ (__builtin_constant_p(n) ? \
...@@ -751,9 +772,7 @@ Some favorites from browsing the source. Feel free to add to this list. ...@@ -751,9 +772,7 @@ Some favorites from browsing the source. Feel free to add to this list.
__generic_copy_to_user((to),(from),(n))) __generic_copy_to_user((to),(from),(n)))
``arch/sparc/kernel/head.S:`` ``arch/sparc/kernel/head.S:``::
::
/* /*
* Sun people can't spell worth damn. "compatability" indeed. * Sun people can't spell worth damn. "compatability" indeed.
...@@ -772,9 +791,7 @@ Some favorites from browsing the source. Feel free to add to this list. ...@@ -772,9 +791,7 @@ Some favorites from browsing the source. Feel free to add to this list.
.asciz "compatible" .asciz "compatible"
``arch/sparc/lib/checksum.S:`` ``arch/sparc/lib/checksum.S:``::
::
/* Sun, you just can't beat me, you just can't. Stop trying, /* Sun, you just can't beat me, you just can't. Stop trying,
* give up. I'm serious, I am going to kick the living shit * give up. I'm serious, I am going to kick the living shit
......