Commit 346658a5 authored by Linus Torvalds

Merge tag 'docs-5.18' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a moderately busy cycle for documentation; some of the
  highlights are:

   - Numerous PDF-generation improvements

   - Kees's new document with guidelines for researchers studying the
     development community.

   - The ongoing stream of Chinese translations

   - Thorsten's new document on regression handling

   - A major reworking of the internal documentation for the kernel-doc
     script.

  Plus the usual stream of typo fixes and such"

* tag 'docs-5.18' of git://git.lwn.net/linux: (80 commits)
  docs/kernel-parameters: update description of mem=
  docs/zh_CN: Add sched-nice-design Chinese translation
  docs: scheduler: Convert schedutil.txt to ReST
  Docs: ktap: add code-block type
  docs: serial: fix a reference file name in driver.rst
  docs: UML: Mention telnetd for port channel
  docs/zh_CN: add damon reclaim translation
  docs/zh_CN: add damon usage translation
  docs/zh_CN: add admin-guide damon start translation
  docs/zh_CN: add admin-guide damon index translation
  docs/zh_CN: Refactoring the admin-guide directory index
  zh_CN: Add translation for admin-guide/mm/index.rst
  zh_CN: Add translations for admin-guide/mm/ksm.rst
  Add Chinese translation for vm/ksm.rst
  docs/zh_CN: Add sched-stats Chinese translation
  docs/zh_CN: add devicetree of_unittest translation
  docs/zh_CN: add devicetree usage-model translation
  docs/zh_CN: add devicetree index translation
  Documentation: describe how to apply incremental stable patches
  docs/zh_CN: add peci subsystem translation
  ...
parents d2eb5500 75c05fab
......@@ -26,7 +26,7 @@ SPHINX_CONF = conf.py
PAPER =
BUILDDIR = $(obj)/output
PDFLATEX = xelatex
LATEXOPTS = -interaction=batchmode
LATEXOPTS = -interaction=batchmode -no-shell-escape
ifeq ($(KBUILD_VERBOSE),0)
SPHINXOPTS += "-q"
......
......@@ -315,8 +315,8 @@ To use the feature, admin should set up backing device via::
echo /dev/sda5 > /sys/block/zramX/backing_dev
before disksize setting. It supports only partition at this moment.
If admin wants to use incompressible page writeback, they could do via::
before disksize setting. It supports only partitions at this moment.
If admin wants to use incompressible page writeback, they could do it via::
echo huge > /sys/block/zramX/writeback
......@@ -341,9 +341,9 @@ Admin can request writeback of those idle pages at right timing via::
echo idle > /sys/block/zramX/writeback
With the command, zram writeback idle pages from memory to the storage.
With the command, zram will writeback idle pages from memory to the storage.
If admin want to write a specific page in zram device to backing device,
If an admin wants to write a specific page in zram device to the backing device,
they could write a page index into the interface.
echo "page_index=1251" > /sys/block/zramX/writeback
......@@ -354,7 +354,7 @@ to guarantee storage health for entire product life.
To overcome the concern, zram supports "writeback_limit" feature.
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
any writeback. IOW, if admin wants to apply writeback budget, he should
any writeback. IOW, if admin wants to apply writeback budget, they should
enable writeback_limit_enable via::
$ echo 1 > /sys/block/zramX/writeback_limit_enable
......@@ -365,7 +365,7 @@ until admin sets the budget via /sys/block/zramX/writeback_limit.
(If admin doesn't enable writeback_limit_enable, writeback_limit's value
assigned via /sys/block/zramX/writeback_limit is meaningless.)
If admin want to limit writeback as per-day 400M, he could do it
If admin wants to limit writeback as per-day 400M, they could do it
like below::
$ MB_SHIFT=20
......@@ -375,16 +375,16 @@ like below::
$ echo 1 > /sys/block/zram0/writeback_limit_enable
If admins want to allow further write again once the budget is exhausted,
he could do it like below::
they could do it like below::
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
/sys/block/zram0/writeback_limit
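The shell arithmetic above converts a budget in megabytes into a count of 4K pages. A minimal Python sketch of the same calculation follows. It assumes that `4K_SHIFT` is 12, as its name suggests (the assignment is outside the lines shown here); `writeback_limit_pages` is a hypothetical helper name, not part of zram:

```python
# Sketch only: mirrors the shell expression $((400<<MB_SHIFT>>4K_SHIFT)).
MB_SHIFT = 20        # 1 MB == 1 << 20 bytes, as set in the text above
PAGE_SHIFT_4K = 12   # assumption: 4K_SHIFT == 12, i.e. 1 << 12 == 4096 bytes


def writeback_limit_pages(megabytes: int) -> int:
    """Convert a budget in megabytes into a number of 4K pages."""
    return (megabytes << MB_SHIFT) >> PAGE_SHIFT_4K


if __name__ == "__main__":
    # Value that would be echoed into /sys/block/zram0/writeback_limit
    print(writeback_limit_pages(400))
```

For the 400M daily budget used in the text this gives 102400 pages.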
If admin wants to see remaining writeback budget since last set::
If an admin wants to see the remaining writeback budget since last set::
$ cat /sys/block/zramX/writeback_limit
If admin want to disable writeback limit, he could do::
If an admin wants to disable writeback limit, they could do::
$ echo 0 > /sys/block/zramX/writeback_limit_enable
......@@ -393,7 +393,7 @@ system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
writeback happened until you reset the zram to allocate extra writeback
budget in next setting is user's job.
If admin wants to measure writeback count in a certain period, he could
If admin wants to measure writeback count in a certain period, they could
know it via /sys/block/zram0/bd_stat's 3rd column.
memory tracking
......
......@@ -35,6 +35,7 @@ problems and bugs in particular.
:maxdepth: 1
reporting-issues
reporting-regressions
security-bugs
bug-hunting
bug-bisect
......
......@@ -76,7 +76,7 @@ Field 3 -- # of sectors read (unsigned long)
Field 4 -- # of milliseconds spent reading (unsigned int)
This is the total number of milliseconds spent by all reads (as
measured from __make_request() to end_that_request_last()).
measured from blk_mq_alloc_request() to __blk_mq_end_request()).
Field 5 -- # of writes completed (unsigned long)
This is the total number of writes completed successfully.
......@@ -89,7 +89,7 @@ Field 7 -- # of sectors written (unsigned long)
Field 8 -- # of milliseconds spent writing (unsigned int)
This is the total number of milliseconds spent by all writes (as
measured from __make_request() to end_that_request_last()).
measured from blk_mq_alloc_request() to __blk_mq_end_request()).
Field 9 -- # of I/Os currently in progress (unsigned int)
The only field that should go to zero. Incremented as requests are
......@@ -120,7 +120,7 @@ Field 14 -- # of sectors discarded (unsigned long)
Field 15 -- # of milliseconds spent discarding (unsigned int)
This is the total number of milliseconds spent by all discards (as
measured from __make_request() to end_that_request_last()).
measured from blk_mq_alloc_request() to __blk_mq_end_request()).
Field 16 -- # of flush requests completed
This is the total number of flush requests completed successfully.
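The millisecond totals described above are easier to read as per-request averages. The following Python sketch shows one way to derive them from a single /proc/diskstats line. It relies on Field 1 being the number of reads completed, which is defined in the same document but outside the lines shown here; the sample line and the helper names are made up for illustration:

```python
# Sketch only: derive average latencies from one /proc/diskstats line.
# Layout assumed: "<major> <minor> <name> <field 1> <field 2> ...".


def parse_diskstats_line(line):
    """Return (device name, list of integer fields); fields[0] is Field 1."""
    parts = line.split()
    name = parts[2]
    fields = [int(value) for value in parts[3:]]
    return name, fields


def average_ms(fields, count_field, ms_field):
    """Average milliseconds per completed request, using 1-based field numbers."""
    count = fields[count_field - 1]
    if count == 0:
        return 0.0
    return fields[ms_field - 1] / count


if __name__ == "__main__":
    # Made-up sample line with the eleven classic fields.
    sample = "8 0 sda 1000 10 80000 2500 400 5 32000 1200 0 3000 3700"
    name, fields = parse_diskstats_line(sample)
    print(name, "avg read ms:", average_ms(fields, 1, 4))    # Field 4 / Field 1
    print(name, "avg write ms:", average_ms(fields, 5, 8))   # Field 8 / Field 5
```

Because the counters are cumulative, a monitoring tool would normally take two samples and apply the same division to the differences.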
......
......@@ -2827,6 +2827,9 @@
For details see: Documentation/admin-guide/hw-vuln/mds.rst
mem=nn[KMG] [HEXAGON] Set the memory size.
Must be specified, otherwise memory size will be 0.
mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
Amount of memory to be used in cases as follows:
......@@ -2834,6 +2837,13 @@
2 when the kernel is not able to see the whole system memory;
3 memory that lies after 'mem=' boundary is excluded from
the hypervisor, then assigned to KVM guests.
4 to limit the memory available for kdump kernel.
[ARC,MICROBLAZE] - the limit applies only to low memory,
high memory is not affected.
[ARM64] - only limits memory covered by the linear
mapping. The NOMAP regions are not affected.
[X86] Work as limiting max address. Use together
with memmap= to avoid physical address space collisions.
......@@ -2844,6 +2854,14 @@
in above case 3, memory may need be hot added after boot
if system memory of hypervisor is not sufficient.
mem=nn[KMG]@ss[KMG]
[ARM,MIPS] - override the memory layout reported by
firmware.
Define a memory region of size nn[KMG] starting at
ss[KMG].
Multiple different regions can be specified with
multiple mem= parameters on the command line.
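To illustrate the mem=nn[KMG]@ss[KMG] syntax, here is a rough Python sketch that extracts the regions from a command line. It is not the kernel's parser: it only handles the K, M and G suffixes named above, the function names are hypothetical, and the example command line is invented:

```python
# Sketch only: interpret mem=nn[KMG] and mem=nn[KMG]@ss[KMG] parameters.
_SHIFTS = {"K": 10, "M": 20, "G": 30}


def parse_size(text):
    """Parse a number with an optional K/M/G suffix; accepts 0x-prefixed hex."""
    shift = _SHIFTS.get(text[-1:].upper())
    if shift is not None:
        return int(text[:-1], 0) << shift
    return int(text, 0)


def parse_mem_params(cmdline):
    """Return a list of (size_in_bytes, start_address_or_None) tuples."""
    regions = []
    for token in cmdline.split():
        if not token.startswith("mem="):
            continue
        value = token[len("mem="):]
        if value == "nopentium":
            # mem=nopentium is a separate switch, not a size.
            continue
        size, sep, start = value.partition("@")
        regions.append((parse_size(size), parse_size(start) if sep else None))
    return regions


if __name__ == "__main__":
    print(parse_mem_params("console=ttyS0 mem=256M@0x80000000 mem=1G@0xC0000000"))
```

Each mem= parameter contributes one entry, which matches the statement that multiple regions can be given on one command line.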
mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel
memory.
......
......@@ -8,6 +8,7 @@ Performance monitor support
:maxdepth: 1
hisi-pmu
hisi-pcie-pmu
imx-ddr
qcom_l2_pmu
qcom_l3_pmu
......
.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
..
If you want to distribute this text under CC-BY-4.0 only, please use 'The
Linux kernel developers' for author attribution and link this as source:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst
..
Note: Only the content of this RST file as found in the Linux kernel sources
is available under CC-BY-4.0, as versions of this text that were processed
(for example by the kernel's build system) might contain content taken from
files which use a more restrictive license.
.. See the bottom of this file for additional redistribution information.
Reporting issues
++++++++++++++++
......@@ -395,22 +386,16 @@ fixed as soon as possible, hence there are 'issues of high priority' that get
handled slightly differently in the reporting process. Three types of cases
qualify: regressions, security issues, and really severe problems.
You deal with a 'regression' if something that worked with an older version of
the Linux kernel does not work with a newer one or somehow works worse with it.
It thus is a regression when a WiFi driver that did a fine job with Linux 5.7
somehow misbehaves with 5.8 or doesn't work at all. It's also a regression if
an application shows erratic behavior with a newer kernel, which might happen
due to incompatible changes in the interface between the kernel and the
userland (like procfs and sysfs). Significantly reduced performance or
increased power consumption also qualify as regression. But keep in mind: the
new kernel needs to be built with a configuration that is similar to the one
from the old kernel (see below how to achieve that). That's because the kernel
developers sometimes can not avoid incompatibilities when implementing new
features; but to avoid regressions such features have to be enabled explicitly
during build time configuration.
You deal with a regression if some application or practical use case running
fine with one Linux kernel works worse or not at all with a newer version
compiled using a similar configuration. The document
Documentation/admin-guide/reporting-regressions.rst explains this in more
detail. It also provides a good deal of other information about regressions you
might want to be aware of; it for example explains how to add your issue to the
list of tracked regressions, to ensure it won't fall through the cracks.
What qualifies as security issue is left to your judgment. Consider reading
'Documentation/admin-guide/security-bugs.rst' before proceeding, as it
Documentation/admin-guide/security-bugs.rst before proceeding, as it
provides additional details how to best handle security issues.
An issue is a 'really severe problem' when something totally unacceptably bad
......@@ -517,7 +502,7 @@ line starting with 'CPU:'. It should end with 'Not tainted' if the kernel was
not tainted when it noticed the problem; it was tainted if you see 'Tainted:'
followed by a few spaces and some letters.
If your kernel is tainted, study 'Documentation/admin-guide/tainted-kernels.rst'
If your kernel is tainted, study Documentation/admin-guide/tainted-kernels.rst
to find out why. Try to eliminate the reason. Often it's caused by one of these
three things:
......@@ -1043,7 +1028,7 @@ down the culprit, as maintainers often won't have the time or setup at hand to
reproduce it themselves.
To find the change there is a process called 'bisection' which the document
'Documentation/admin-guide/bug-bisect.rst' describes in detail. That process
Documentation/admin-guide/bug-bisect.rst describes in detail. That process
will often require you to build about ten to twenty kernel images, trying to
reproduce the issue with each of them before building the next. Yes, that takes
some time, but don't worry, it works a lot quicker than most people assume.
......@@ -1073,10 +1058,11 @@ When dealing with regressions make sure the issue you face is really caused by
the kernel and not by something else, as outlined above already.
In the whole process keep in mind: an issue only qualifies as regression if the
older and the newer kernel got built with a similar configuration. The best way
to archive this: copy the configuration file (``.config``) from the old working
kernel freshly to each newer kernel version you try. Afterwards run ``make
olddefconfig`` to adjust it for the needs of the new version.
older and the newer kernel got built with a similar configuration. This can be
achieved by using ``make olddefconfig``, as explained in more detail by
Documentation/admin-guide/reporting-regressions.rst; that document also
provides a good deal of other information about regressions you might want to be
aware of.
Write and send the report
......@@ -1283,7 +1269,7 @@ them when sending the report by mail. If you filed it in a bug tracker, forward
the report's text to these addresses; but on top of it put a small note where
you mention that you filed it with a link to the ticket.
See 'Documentation/admin-guide/security-bugs.rst' for more information.
See Documentation/admin-guide/security-bugs.rst for more information.
Duties after the report went out
......@@ -1571,7 +1557,7 @@ Once your report is out your might get asked to do a proper one, as it allows to
pinpoint the exact change that causes the issue (which then can easily get
reverted to fix the issue quickly). Hence consider to do a proper bisection
right away if time permits. See the section 'Special care for regressions' and
the document 'Documentation/admin-guide/bug-bisect.rst' for details how to
the document Documentation/admin-guide/bug-bisect.rst for details how to
perform one. In case of a successful bisection add the author of the culprit to
the recipients; also CC everyone in the signed-off-by chain, which you find at
the end of its commit message.
......@@ -1594,7 +1580,7 @@ Some fixes are too complex
Even small and seemingly obvious code-changes sometimes introduce new and
totally unexpected problems. The maintainers of the stable and longterm kernels
are very aware of that and thus only apply changes to these kernels that are
within rules outlined in 'Documentation/process/stable-kernel-rules.rst'.
within rules outlined in Documentation/process/stable-kernel-rules.rst.
Complex or risky changes for example do not qualify and thus only get applied
to mainline. Other fixes are easy to get backported to the newest stable and
......@@ -1756,10 +1742,23 @@ art will lay some groundwork to improve the situation over time.
..
This text is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If you
spot a typo or small mistake, feel free to let him know directly and he'll
fix it. You are free to do the same in a mostly informal way if you want
to contribute changes to the text, but for copyright reasons please CC
end-of-content
..
This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If
you spot a typo or small mistake, feel free to let him know directly and
he'll fix it. You are free to do the same in a mostly informal way if you
want to contribute changes to the text, but for copyright reasons please CC
linux-doc@vger.kernel.org and "sign-off" your contribution as
Documentation/process/submitting-patches.rst outlines in the section "Sign
your work - the Developer's Certificate of Origin".
..
This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
of the file. If you want to distribute this text under CC-BY-4.0 only,
please use "The Linux kernel developers" for author attribution and link
this as source:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst
..
Note: Only the content of this RST file as found in the Linux kernel sources
is available under CC-BY-4.0, as versions of this text that were processed
(for example by the kernel's build system) might contain content taken from
files which use a more restrictive license.
......@@ -11,7 +11,7 @@ Getting started quick
- Compile and install kernel and modules, reboot.
- You need the udftools package (pktsetup, mkudffs, cdrwtool).
Download from http://sourceforge.net/projects/linux-udf/
Download from https://github.com/pali/udftools
- Grab a new CD-RW disc and format it (assuming CD-RW is hdc, substitute
as appropriate)::
......@@ -102,7 +102,7 @@ Using the pktcdvd sysfs interface
Since Linux 2.6.20, the pktcdvd module has a sysfs interface
and can be controlled by it. For example the "pktcdvd" tool uses
this interface. (see http://tom.ist-im-web.de/download/pktcdvd )
this interface. (see http://tom.ist-im-web.de/linux/software/pktcdvd )
"pktcdvd" works similar to "pktsetup", e.g.::
......
......@@ -409,135 +409,25 @@ latex_elements = {
# Additional stuff for the LaTeX preamble.
'preamble': '''
% Prevent column squeezing of tabulary.
\\setlength{\\tymin}{20em}
% Use some font with UTF-8 support with XeLaTeX
\\usepackage{fontspec}
\\setsansfont{DejaVu Sans}
\\setromanfont{DejaVu Serif}
\\setmonofont{DejaVu Sans Mono}
% Adjust \\headheight for fancyhdr
\\addtolength{\\headheight}{1.6pt}
\\addtolength{\\topmargin}{-1.6pt}
''',
''',
}
# Translations have Asian (CJK) characters which are only displayed if
# xeCJK is used
latex_elements['preamble'] += '''
\\IfFontExistsTF{Noto Sans CJK SC}{
% This is needed for translations
\\usepackage{xeCJK}
\\IfFontExistsTF{Noto Serif CJK SC}{
\\setCJKmainfont{Noto Serif CJK SC}[AutoFakeSlant]
}{
\\setCJKmainfont{Noto Sans CJK SC}[AutoFakeSlant]
}
\\setCJKsansfont{Noto Sans CJK SC}[AutoFakeSlant]
\\setCJKmonofont{Noto Sans Mono CJK SC}[AutoFakeSlant]
% CJK Language-specific font choices
\\IfFontExistsTF{Noto Serif CJK SC}{
\\newCJKfontfamily[SCmain]\\scmain{Noto Serif CJK SC}[AutoFakeSlant]
\\newCJKfontfamily[SCserif]\\scserif{Noto Serif CJK SC}[AutoFakeSlant]
}{
\\newCJKfontfamily[SCmain]\\scmain{Noto Sans CJK SC}[AutoFakeSlant]
\\newCJKfontfamily[SCserif]\\scserif{Noto Sans CJK SC}[AutoFakeSlant]
}
\\newCJKfontfamily[SCsans]\\scsans{Noto Sans CJK SC}[AutoFakeSlant]
\\newCJKfontfamily[SCmono]\\scmono{Noto Sans Mono CJK SC}[AutoFakeSlant]
\\IfFontExistsTF{Noto Serif CJK TC}{
\\newCJKfontfamily[TCmain]\\tcmain{Noto Serif CJK TC}[AutoFakeSlant]
\\newCJKfontfamily[TCserif]\\tcserif{Noto Serif CJK TC}[AutoFakeSlant]
}{
\\newCJKfontfamily[TCmain]\\tcmain{Noto Sans CJK TC}[AutoFakeSlant]
\\newCJKfontfamily[TCserif]\\tcserif{Noto Sans CJK TC}[AutoFakeSlant]
}
\\newCJKfontfamily[TCsans]\\tcsans{Noto Sans CJK TC}[AutoFakeSlant]
\\newCJKfontfamily[TCmono]\\tcmono{Noto Sans Mono CJK TC}[AutoFakeSlant]
\\IfFontExistsTF{Noto Serif CJK KR}{
\\newCJKfontfamily[KRmain]\\krmain{Noto Serif CJK KR}[AutoFakeSlant]
\\newCJKfontfamily[KRserif]\\krserif{Noto Serif CJK KR}[AutoFakeSlant]
}{
\\newCJKfontfamily[KRmain]\\krmain{Noto Sans CJK KR}[AutoFakeSlant]
\\newCJKfontfamily[KRserif]\\krserif{Noto Sans CJK KR}[AutoFakeSlant]
}
\\newCJKfontfamily[KRsans]\\krsans{Noto Sans CJK KR}[AutoFakeSlant]
\\newCJKfontfamily[KRmono]\\krmono{Noto Sans Mono CJK KR}[AutoFakeSlant]
\\IfFontExistsTF{Noto Serif CJK JP}{
\\newCJKfontfamily[JPmain]\\jpmain{Noto Serif CJK JP}[AutoFakeSlant]
\\newCJKfontfamily[JPserif]\\jpserif{Noto Serif CJK JP}[AutoFakeSlant]
}{
\\newCJKfontfamily[JPmain]\\jpmain{Noto Sans CJK JP}[AutoFakeSlant]
\\newCJKfontfamily[JPserif]\\jpserif{Noto Sans CJK JP}[AutoFakeSlant]
}
\\newCJKfontfamily[JPsans]\\jpsans{Noto Sans CJK JP}[AutoFakeSlant]
\\newCJKfontfamily[JPmono]\\jpmono{Noto Sans Mono CJK JP}[AutoFakeSlant]
% Dummy commands for Sphinx < 2.3 (no 'extrapackages' support)
\\providecommand{\\onehalfspacing}{}
\\providecommand{\\singlespacing}{}
% Define custom macros to on/off CJK
\\newcommand{\\kerneldocCJKon}{\\makexeCJKactive\\onehalfspacing}
\\newcommand{\\kerneldocCJKoff}{\\makexeCJKinactive\\singlespacing}
\\newcommand{\\kerneldocBeginSC}{%
\\begingroup%
\\scmain%
}
\\newcommand{\\kerneldocEndSC}{\\endgroup}
\\newcommand{\\kerneldocBeginTC}{%
\\begingroup%
\\tcmain%
\\renewcommand{\\CJKrmdefault}{TCserif}%
\\renewcommand{\\CJKsfdefault}{TCsans}%
\\renewcommand{\\CJKttdefault}{TCmono}%
}
\\newcommand{\\kerneldocEndTC}{\\endgroup}
\\newcommand{\\kerneldocBeginKR}{%
\\begingroup%
\\xeCJKDeclareCharClass{HalfLeft}{`“,`‘}%
\\xeCJKDeclareCharClass{HalfRight}{`”,`’}%
\\krmain%
\\renewcommand{\\CJKrmdefault}{KRserif}%
\\renewcommand{\\CJKsfdefault}{KRsans}%
\\renewcommand{\\CJKttdefault}{KRmono}%
\\xeCJKsetup{CJKspace = true} % For inter-phrase space
}
\\newcommand{\\kerneldocEndKR}{\\endgroup}
\\newcommand{\\kerneldocBeginJP}{%
\\begingroup%
\\xeCJKDeclareCharClass{HalfLeft}{`“,`‘}%
\\xeCJKDeclareCharClass{HalfRight}{`”,`’}%
\\jpmain%
\\renewcommand{\\CJKrmdefault}{JPserif}%
\\renewcommand{\\CJKsfdefault}{JPsans}%
\\renewcommand{\\CJKttdefault}{JPmono}%
}
\\newcommand{\\kerneldocEndJP}{\\endgroup}
% Single spacing in literal blocks
\\fvset{baselinestretch=1}
% To customize \\sphinxtableofcontents
\\usepackage{etoolbox}
% Inactivate CJK after tableofcontents
\\apptocmd{\\sphinxtableofcontents}{\\kerneldocCJKoff}{}{}
}{ % No CJK font found
% Custom macros to on/off CJK (Dummy)
\\newcommand{\\kerneldocCJKon}{}
\\newcommand{\\kerneldocCJKoff}{}
\\newcommand{\\kerneldocBeginSC}{}
\\newcommand{\\kerneldocEndSC}{}
\\newcommand{\\kerneldocBeginTC}{}
\\newcommand{\\kerneldocEndTC}{}
\\newcommand{\\kerneldocBeginKR}{}
\\newcommand{\\kerneldocEndKR}{}
\\newcommand{\\kerneldocBeginJP}{}
\\newcommand{\\kerneldocEndJP}{}
}
'''
# Fix reference escape troubles with Sphinx 1.4.x
if major == 1:
latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n'
# Load kerneldoc specific LaTeX settings
latex_elements['preamble'] += '''
% Load kerneldoc specific LaTeX settings
\\input{kerneldoc-preamble.sty}
'''
# With Sphinx 1.6, it is possible to change the Bg color directly
# by using:
# \definecolor{sphinxnoteBgColor}{RGB}{204,255,255}
......@@ -599,6 +489,11 @@ for fn in os.listdir('.'):
# If false, no module index is generated.
#latex_domain_indices = True
# Additional LaTeX stuff to be copied to build directory
latex_additional_files = [
'sphinx/kerneldoc-preamble.sty',
]
# -- Options for manual page output ---------------------------------------
......
Entry/exit handling for exceptions, interrupts, syscalls and KVM
================================================================
All transitions between execution domains require state updates which are
subject to strict ordering constraints. State updates are required for the
following:
* Lockdep
* RCU / Context tracking
* Preemption counter
* Tracing
* Time accounting
The update order depends on the transition type and is explained below in
the transition type sections: `Syscalls`_, `KVM`_, `Interrupts and regular
exceptions`_, `NMI and NMI-like exceptions`_.
Non-instrumentable code - noinstr
---------------------------------
Most instrumentation facilities depend on RCU, so instrumentation is prohibited
for entry code before RCU starts watching and exit code after RCU stops
watching. In addition, many architectures must save and restore register state,
which means that (for example) a breakpoint in the breakpoint entry code would
overwrite the debug registers of the initial breakpoint.
Such code must be marked with the 'noinstr' attribute, placing that code into a
special section inaccessible to instrumentation and debug facilities. Some
functions are partially instrumentable, which is handled by marking them
noinstr and using instrumentation_begin() and instrumentation_end() to flag the
instrumentable ranges of code:
.. code-block:: c

  noinstr void entry(void)
  {
        handle_entry();     // <-- must be 'noinstr' or '__always_inline'
        ...

        instrumentation_begin();
        handle_context();   // <-- instrumentable code
        instrumentation_end();

        ...
        handle_exit();      // <-- must be 'noinstr' or '__always_inline'
  }

This allows verification of the 'noinstr' restrictions via objtool on
supported architectures.
Invoking non-instrumentable functions from instrumentable context has no
restrictions and is useful to protect e.g. state switching which would
cause malfunction if instrumented.
All non-instrumentable entry/exit code sections before and after the RCU
state transitions must run with interrupts disabled.
Syscalls
--------
Syscall-entry code starts in assembly code and calls out into low-level C code
after establishing low-level architecture-specific state and stack frames. This
low-level C code must not be instrumented. A typical syscall handling function
invoked from low-level assembly code looks like this:
.. code-block:: c

  noinstr void syscall(struct pt_regs *regs, int nr)
  {
        arch_syscall_enter(regs);
        nr = syscall_enter_from_user_mode(regs, nr);

        instrumentation_begin();
        if (!invoke_syscall(regs, nr) && nr != -1)
                result_reg(regs) = __sys_ni_syscall(regs);
        instrumentation_end();

        syscall_exit_to_user_mode(regs);
  }

syscall_enter_from_user_mode() first invokes enter_from_user_mode() which
establishes state in the following order:
* Lockdep
* RCU / Context tracking
* Tracing
and then invokes the various entry work functions like ptrace, seccomp, audit,
syscall tracing, etc. After all that is done, the instrumentable invoke_syscall
function can be invoked. The instrumentable code section then ends, after which
syscall_exit_to_user_mode() is invoked.
syscall_exit_to_user_mode() handles all work which needs to be done before
returning to user space like tracing, audit, signals, task work etc. After
that it invokes exit_to_user_mode() which again handles the state
transition in the reverse order:
* Tracing
* RCU / Context tracking
* Lockdep
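The entry and exit ordering described above behaves like a stack: whatever is established first on entry is torn down last on exit. The following toy Python model only illustrates that ordering; it has nothing to do with the kernel implementation, and the function and log strings are made up:

```python
# Toy model only: entry steps are unwound in reverse order on exit.
from contextlib import ExitStack

# Order in which enter_from_user_mode() establishes state, per the text above.
ENTRY_ORDER = ["Lockdep", "RCU / Context tracking", "Tracing"]


def simulate_syscall():
    """Return a log showing entry steps, the syscall body, and exit steps."""
    log = []
    with ExitStack() as stack:
        for name in ENTRY_ORDER:
            log.append("enter: " + name)
            # Callbacks registered on an ExitStack run last-in, first-out.
            stack.callback(log.append, "exit: " + name)
        log.append("instrumentable syscall body")
    return log


if __name__ == "__main__":
    for entry in simulate_syscall():
        print(entry)
```

The last-in, first-out unwinding is what produces the Tracing, RCU / Context tracking, Lockdep sequence on the way out.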
syscall_enter_from_user_mode() and syscall_exit_to_user_mode() are also
available as fine grained subfunctions in cases where the architecture code
has to do extra work between the various steps. In such cases it has to
ensure that enter_from_user_mode() is called first on entry and
exit_to_user_mode() is called last on exit.
Do not nest syscalls. Nested syscalls will cause RCU and/or context tracking
to print a warning.
KVM
---
Entering or exiting guest mode is very similar to syscalls. From the host
kernel point of view the CPU goes off into user space when entering the
guest and returns to the kernel on exit.
kvm_guest_enter_irqoff() is a KVM-specific variant of exit_to_user_mode()
and kvm_guest_exit_irqoff() is the KVM variant of enter_from_user_mode().
The state operations have the same ordering.
Task work handling is done separately for guest at the boundary of the
vcpu_run() loop via xfer_to_guest_mode_handle_work() which is a subset of
the work handled on return to user space.
Do not nest KVM entry/exit transitions because doing so is nonsensical.
Interrupts and regular exceptions
---------------------------------
Interrupt entry and exit handling is slightly more complex than syscalls
and KVM transitions.
If an interrupt is raised while the CPU executes in user space, the entry
and exit handling is exactly the same as for syscalls.
If the interrupt is raised while the CPU executes in kernel space the entry and
exit handling is slightly different. RCU state is only updated when the
interrupt is raised in the context of the CPU's idle task. Otherwise, RCU will
already be watching. Lockdep and tracing have to be updated unconditionally.
irqentry_enter() and irqentry_exit() provide the implementation for this.
The architecture-specific part looks similar to syscall handling:
.. code-block:: c

  noinstr void interrupt(struct pt_regs *regs, int nr)
  {
        arch_interrupt_enter(regs);
        state = irqentry_enter(regs);

        instrumentation_begin();

        irq_enter_rcu();
        invoke_irq_handler(regs, nr);
        irq_exit_rcu();

        instrumentation_end();

        irqentry_exit(regs, state);
  }

Note that the invocation of the actual interrupt handler is within an
irq_enter_rcu() and irq_exit_rcu() pair.
irq_enter_rcu() updates the preemption count which makes in_hardirq()
return true, handles NOHZ tick state and interrupt time accounting. This
means that up to the point where irq_enter_rcu() is invoked in_hardirq()
returns false.
irq_exit_rcu() handles interrupt time accounting, undoes the preemption
count update and eventually handles soft interrupts and NOHZ tick state.
In theory, the preemption count could be updated in irqentry_enter(). In
practice, deferring this update to irq_enter_rcu() allows the preemption-count
code to be traced, while also maintaining symmetry with irq_exit_rcu() and
irqentry_exit(), which are described in the next paragraph. The only downside
is that the early entry code up to irq_enter_rcu() must be aware that the
preemption count has not yet been updated with the HARDIRQ_OFFSET state.
Note that irq_exit_rcu() must remove HARDIRQ_OFFSET from the preemption count
before it handles soft interrupts, whose handlers must run in BH context rather
than irq-disabled context. In addition, irqentry_exit() might schedule, which
also requires that HARDIRQ_OFFSET has been removed from the preemption count.
Even though interrupt handlers are expected to run with local interrupts
disabled, interrupt nesting is common from an entry/exit perspective. For
example, softirq handling happens within an irqentry_{enter,exit}() block with
local interrupts enabled. Also, although uncommon, nothing prevents an
interrupt handler from re-enabling interrupts.
Interrupt entry/exit code doesn't strictly need to handle reentrancy, since it
runs with local interrupts disabled. But NMIs can happen anytime, and a lot of
the entry code is shared between the two.
NMI and NMI-like exceptions
---------------------------
NMIs and NMI-like exceptions (machine checks, double faults, debug
interrupts, etc.) can hit any context and must be extra careful with
the state.
State changes for debug exceptions and machine-check exceptions depend on
whether these exceptions happened in user-space (breakpoints or watchpoints) or
in kernel mode (code patching). From user-space, they are treated like
interrupts, while from kernel mode they are treated like NMIs.
NMIs and other NMI-like exceptions handle state transitions without
distinguishing between user-mode and kernel-mode origin.
The state update on entry is handled in irqentry_nmi_enter() which updates
state in the following order:
* Preemption counter
* Lockdep
* RCU / Context tracking
* Tracing
The exit counterpart irqentry_nmi_exit() does the reverse operation in the
reverse order.
Note that the update of the preemption counter has to be the first
operation on enter and the last operation on exit. The reason is that both
lockdep and RCU rely on in_nmi() returning true in this case. The
preemption count modification in the NMI entry/exit case must not be
traced.
Architecture-specific code looks like this:
.. code-block:: c

  noinstr void nmi(struct pt_regs *regs)
  {
        arch_nmi_enter(regs);
        state = irqentry_nmi_enter(regs);

        instrumentation_begin();
        nmi_handler(regs);
        instrumentation_end();

        irqentry_nmi_exit(regs, state);   // pass back the state saved on entry
  }

and for e.g. a debug exception it can look like this:
.. code-block:: c

  noinstr void debug(struct pt_regs *regs)
  {
        arch_nmi_enter(regs);

        debug_regs = save_debug_regs();

        if (user_mode(regs)) {
                state = irqentry_enter(regs);

                instrumentation_begin();
                user_mode_debug_handler(regs, debug_regs);
                instrumentation_end();

                irqentry_exit(regs, state);
        } else {
                state = irqentry_nmi_enter(regs);

                instrumentation_begin();
                kernel_mode_debug_handler(regs, debug_regs);
                instrumentation_end();

                irqentry_nmi_exit(regs, state);
        }
  }

There is no combined irqentry_nmi_if_kernel() function available as the
above cannot be handled in an exception-agnostic way.
NMIs can happen in any context. For example, an NMI-like exception can be
triggered while handling an NMI. NMI entry code therefore has to be reentrant,
and state updates need to handle nesting.
......@@ -44,6 +44,14 @@ Library functionality that is used throughout the kernel.
timekeeping
errseq
Low level entry and exit
========================
.. toctree::
:maxdepth: 1
entry
Concurrency primitives
======================
......
.. SPDX-License-Identifier: GPL-2.0
========================================
The Kernel Test Anything Protocol (KTAP)
========================================
===================================================
The Kernel Test Anything Protocol (KTAP), version 1
===================================================
TAP, or the Test Anything Protocol is a format for specifying test results used
by a number of projects. Its website and specification are found at this `link
......@@ -68,7 +68,7 @@ Test case result lines
Test case result lines indicate the final status of a test.
They are required and must have the format:
.. code-block::
.. code-block:: none
<result> <number> [<description>][ # [<directive>] [<diagnostic data>]]
......@@ -117,32 +117,32 @@ separator.
Example result lines include:
.. code-block::
.. code-block:: none
ok 1 test_case_name
The test "test_case_name" passed.
.. code-block::
.. code-block:: none
not ok 1 test_case_name
The test "test_case_name" failed.
.. code-block::
.. code-block:: none
ok 1 test # SKIP necessary dependency unavailable
The test "test" was SKIPPED with the diagnostic message "necessary dependency
unavailable".
.. code-block::
.. code-block:: none
not ok 1 test # TIMEOUT 30 seconds
The test "test" timed out, with diagnostic data "30 seconds".
.. code-block::
.. code-block:: none
ok 5 check return code # rcode=0
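The result line format above can be sketched as a small parser. This is a minimal illustration, not part of any kernel tool: ``parse_result_line`` and ``KNOWN_DIRECTIVES`` are hypothetical names, and the directive list only covers the directives that appear in the examples above.

```python
import re

# Hypothetical helper: matches one KTAP test case result line of the form
#   <result> <number> [<description>][ # [<directive>] [<diagnostic data>]]
RESULT_RE = re.compile(
    r"^(?P<result>ok|not ok) (?P<number>\d+)"
    r"(?: (?P<description>[^#]*?))?"
    r"(?: # (?P<rest>.*))?$"
)

# Assumption: only the directives shown in the examples above are recognized.
KNOWN_DIRECTIVES = ("SKIP", "TIMEOUT")


def parse_result_line(line):
    """Return a dict describing a result line, or None if it is not one."""
    match = RESULT_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest") or ""
    directive = None
    diagnostic = rest
    # A directive, if present, is the first word after the '#'.
    first, _, remainder = rest.partition(" ")
    if first in KNOWN_DIRECTIVES:
        directive = first
        diagnostic = remainder
    return {
        "passed": match.group("result") == "ok",
        "number": int(match.group("number")),
        "description": (match.group("description") or "").strip(),
        "directive": directive,
        "diagnostic": diagnostic,
    }
```

Lines that do not match, such as the version line or a test plan, return ``None`` so that a caller can treat them separately.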
......@@ -174,6 +174,13 @@ There may be lines within KTAP output that do not follow the format of one of
the four formats for lines described above. This is allowed, however, they will
not influence the status of the tests.
This is an important difference from TAP. Kernel tests may print messages
to the system console or a log file. Both of these destinations may contain
messages either from unrelated kernel or userspace activity, or kernel
messages from non-test code that is invoked by the test. The kernel code
invoked by the test likely is not aware that a test is in progress and
thus cannot print the message as a diagnostic message.
Nested tests
------------
......@@ -186,13 +193,16 @@ starting with another KTAP version line and test plan, and end with the overall
result. If one of the subtests fails, for example, the parent test should also
fail.
Additionally, all result lines in a subtest should be indented. One level of
Additionally, all lines in a subtest should be indented. One level of
indentation is two spaces: " ". The indentation should begin at the version
line and should end before the parent test's result line.
"Unknown lines" are not considered to be lines in a subtest and thus are
allowed to be either indented or not indented.
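As a rough sketch of the indentation rule, the nesting level of a line can be derived by counting leading two-space units. The helper name ``nesting_level`` is hypothetical. Because "Unknown lines" may or may not be indented, the result is only meaningful for version, plan and result lines.

```python
INDENT = "  "  # one level of indentation is two spaces, as stated above


def nesting_level(line):
    """Count how many two-space indentation units prefix the line."""
    level = 0
    # startswith() with an offset checks the next two-space unit in turn.
    while line.startswith(INDENT, level * len(INDENT)):
        level += 1
    return level
```

A top-level ``KTAP version 1`` line yields level 0, and the version line of a first-level subtest yields level 1.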
An example of a test with two nested subtests:
.. code-block::
.. code-block:: none
KTAP version 1
1..1
......@@ -205,7 +215,7 @@ An example of a test with two nested subtests:
An example format with multiple levels of nested testing:
.. code-block::
.. code-block:: none
KTAP version 1
1..2
......@@ -224,10 +234,15 @@ An example format with multiple levels of nested testing:
Major differences between TAP and KTAP
--------------------------------------
Note the major differences between the TAP and KTAP specification:
- yaml and json are not recommended in diagnostic messages
- TODO directive not recognized
- KTAP allows for an arbitrary number of tests to be nested
================================================== ========= ===============
Feature                                            TAP       KTAP
================================================== ========= ===============
yaml and json in diagnostic message                ok        not recommended
TODO directive                                     ok        not recognized
allows an arbitrary number of tests to be nested   no        yes
"Unknown lines" are in category of "Anything else" yes       no
"Unknown lines" are                                incorrect allowed
================================================== ========= ===============
The TAP14 specification does permit nested tests, but instead of using another
nested version line, uses a line of the form
......@@ -235,7 +250,7 @@ nested version line, uses a line of the form
Example KTAP output
--------------------
.. code-block::
.. code-block:: none
KTAP version 1
1..1
......
......@@ -311,7 +311,7 @@ hardware.
This call must not sleep
set_ldisc(port,termios)
Notifier for discipline change. See Documentation/driver-api/serial/tty.rst.
Notifier for discipline change. See Documentation/tty/tty_ldisc.rst.
Locking: caller holds tty_port->mutex
......
......@@ -247,7 +247,7 @@ based on rt_mutex which changes the semantics:
Non-PREEMPT_RT kernels disable preemption to get this effect.
PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
preemption disabled. The lock disables softirq handlers and also
preemption enabled. The lock disables softirq handlers and also
prevents reentrancy due to task preemption.
PREEMPT_RT kernels preserve all other spinlock_t semantics:
......
......@@ -249,6 +249,10 @@ The 5.x.y (-stable) and 5.x patches live at
https://www.kernel.org/pub/linux/kernel/v5.x/
The 5.x.y incremental patches live at
https://www.kernel.org/pub/linux/kernel/v5.x/incr/
The -rc patches are not stored on the webserver but are generated on
demand from git tags such as
......@@ -308,12 +312,11 @@ versions.
If no 5.x.y kernel is available, then the highest numbered 5.x kernel is
the current stable kernel.
.. note::
The -stable team provides normal as well as incremental patches. Below is
how to apply these patches.
The -stable team usually do make incremental patches available as well
as patches against the latest mainline release, but I only cover the
non-incremental ones below. The incremental ones can be found at
https://www.kernel.org/pub/linux/kernel/v5.x/incr/
Normal patches
~~~~~~~~~~~~~~
These patches are not incremental, meaning that for example the 5.7.3
patch does not apply on top of the 5.7.2 kernel source, but rather on top
......@@ -331,6 +334,21 @@ Here's a small example::
$ cd ..
$ mv linux-5.7.2 linux-5.7.3 # rename the kernel source dir
Incremental patches
~~~~~~~~~~~~~~~~~~~
Incremental patches are different: instead of being applied on top
of the base 5.x kernel, they are applied on top of the previous stable
kernel (5.x.y-1).
Here's an example of how to apply these::
$ cd ~/linux-5.7.2 # change to the kernel source dir
$ patch -p1 < ../patch-5.7.2-3 # apply the new 5.7.3 patch
$ cd ..
$ mv linux-5.7.2 linux-5.7.3 # rename the kernel source dir
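Because each incremental patch only moves the tree forward by one stable release, upgrading across several releases means applying a chain of them in order. The following sketch lists that chain. The helper ``incremental_patches`` is hypothetical, and the file naming is extrapolated from the single ``patch-5.7.2-3`` example above, so any compression suffix on the server is not accounted for.

```python
def incremental_patches(base, current, target):
    """List the incremental patches that take base.current to base.target.

    base is a string such as "5.7"; current and target are the stable
    sub-levels (the "y" in 5.x.y).
    """
    if target < current:
        raise ValueError("target must not be older than current")
    # One patch per step: 5.7.2 -> 5.7.3 uses patch-5.7.2-3, and so on.
    return [f"patch-{base}.{y}-{y + 1}" for y in range(current, target)]
```

For instance, ``incremental_patches("5.7", 2, 5)`` returns three names, which would be applied with ``patch -p1`` one after another.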
The -rc kernels
===============
......
......@@ -25,6 +25,7 @@ Below are the essential guides that every developer should read.
code-of-conduct-interpretation
development-process
submitting-patches
handling-regressions
programming-language
coding-style
maintainer-handbooks
......@@ -48,6 +49,7 @@ Other guides to the community that are of interest to most developers are:
deprecated
embargoed-hardware-issues
maintainers
researcher-guidelines
These are some overall technical guides that have been put here for now for
lack of a better place.
......
.. SPDX-License-Identifier: GPL-2.0
.. _researcher_guidelines:
Researcher Guidelines
+++++++++++++++++++++
The Linux kernel community welcomes transparent research on the Linux
kernel, the activities involved in producing it, and any other byproducts
of its development. Linux benefits greatly from this kind of research, and
most aspects of Linux are driven by research in one form or another.
The community greatly appreciates if researchers can share preliminary
findings before making their results public, especially if such research
involves security. Getting involved early helps both improve the quality
of research and ability for Linux to improve from it. In any case,
sharing open access copies of the published research with the community
is recommended.
This document seeks to clarify what the Linux kernel community considers
acceptable and non-acceptable practices when conducting such research. At
the very least, such research and related activities should follow
standard research ethics rules. For more background on research ethics
generally, ethics in technology, and research of developer communities
in particular, see:
* `History of Research Ethics <https://www.unlv.edu/research/ORI-HSR/history-ethics>`_
* `IEEE Ethics <https://www.ieee.org/about/ethics/index.html>`_
* `Developer and Researcher Views on the Ethics of Experiments on Open-Source Projects <https://arxiv.org/pdf/2112.13217.pdf>`_
The Linux kernel community expects that everyone interacting with the
project is participating in good faith to make Linux better. Research on
any publicly-available artifact (including, but not limited to source
code) produced by the Linux kernel community is welcome, though research
on developers must be distinctly opt-in.
Passive research that is based entirely on publicly available sources,
including posts to public mailing lists and commits to public
repositories, is clearly permissible. Though, as with any research,
standard ethics must still be followed.
Active research on developer behavior, however, must be done with the
explicit agreement of, and full disclosure to, the individual developers
involved. Developers cannot be interacted with/experimented on without
consent; this, too, is standard research ethics.
To help clarify: sending patches to developers *is* interacting
with them, but they have already consented to receiving *good faith
contributions*. Sending intentionally flawed/vulnerable patches or
contributing misleading information to discussions is not consented
to. Such communication can be damaging to the developer (e.g. draining
time, effort, and morale) and damaging to the project by eroding
the entire developer community's trust in the contributor (and the
contributor's organization as a whole), undermining efforts to provide
constructive feedback to contributors, and putting end users at risk of
software flaws.
Participation in the development of Linux itself by researchers, as
with anyone, is welcomed and encouraged. Research into Linux code is
a common practice, especially when it comes to developing or running
analysis tools that produce actionable results.
When engaging with the developer community, sending a patch has
traditionally been the best way to make an impact. Linux already has
plenty of known bugs -- what's much more helpful is having vetted fixes.
Before contributing, carefully read the appropriate documentation:
* Documentation/process/development-process.rst
* Documentation/process/submitting-patches.rst
* Documentation/admin-guide/reporting-issues.rst
* Documentation/admin-guide/security-bugs.rst
Then send a patch (including a commit log with all the details listed
below) and follow up on any feedback from other developers.
When sending patches produced from research, the commit logs should
contain at least the following details, so that developers have
appropriate context for understanding the contribution. Answer:
* What is the specific problem that has been found?
* How could the problem be reached on a running system?
* What effect would encountering the problem have on the system?
* How was the problem found? Specifically include details about any
testing, static or dynamic analysis programs, and any other tools or
methods used to perform the work.
* Which version of Linux was the problem found on? Using the most recent
release or a recent linux-next branch is strongly preferred (see
Documentation/process/howto.rst).
* What was changed to fix the problem, and why is it believed to be correct?
* How was the change build tested and run-time tested?
* What prior commit does this change fix? This should go in a "Fixes:"
tag as the documentation describes.
* Who else has reviewed this patch? This should go in appropriate
"Reviewed-by:" tags; see below.
For example::
From: Author <author@email>
Subject: [PATCH] drivers/foo_bar: Add missing kfree()
The error path in foo_bar driver does not correctly free the allocated
struct foo_bar_info. This can happen if the attached foo_bar device
rejects the initialization packets sent during foo_bar_probe(). This
would result in a 64 byte slab memory leak once per device attach,
wasting memory resources over time.
This flaw was found using an experimental static analysis tool we are
developing, LeakMagic[1], which reported the following warning when
analyzing the v5.15 kernel release:
path/to/foo_bar.c:187: missing kfree() call?
Add the missing kfree() to the error path. No other references to
this memory exist outside the probe function, so this is the only
place it can be freed.
x86_64 and arm64 defconfig builds with CONFIG_FOO_BAR=y using GCC
11.2 show no new warnings, and LeakMagic no longer warns about this
code path. As we don't have a FooBar device to test with, no runtime
testing was able to be performed.
[1] https://url/to/leakmagic/details
Reported-by: Researcher <researcher@email>
Fixes: aaaabbbbccccdddd ("Introduce support for FooBar")
Signed-off-by: Author <author@email>
Reviewed-by: Reviewer <reviewer@email>
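Before sending, a quick local check for the tags used in the example above can catch omissions. This is only an illustrative sketch: ``missing_tags`` and ``EXPECTED_TAGS`` are hypothetical names, the tag list is taken from the example rather than from any official tool, and it does not check the prose questions listed earlier.

```python
# Assumption: the tags expected here mirror the example commit log above.
EXPECTED_TAGS = ("Fixes:", "Signed-off-by:", "Reviewed-by:")


def missing_tags(commit_message):
    """Return the expected tags that no line of the message starts with."""
    lines = [line.strip() for line in commit_message.splitlines()]
    return [
        tag
        for tag in EXPECTED_TAGS
        if not any(line.startswith(tag) for line in lines)
    ]
```

An empty list means every expected tag is present; it says nothing about whether the tag contents are correct.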
If you are a first time contributor it is recommended that the patch
itself be vetted by others privately before being posted to public lists.
(This is required if you have been explicitly told your patches need
more careful internal review.) These people are expected to have their
"Reviewed-by" tag included in the resulting patch. Finding another
developer familiar with Linux contribution, especially within your own
organization, and having them help with reviews before sending them to
the public mailing lists tends to significantly improve the quality of the
resulting patches, and thereby reduces the burden on other developers.
If no one can be found to internally review patches and you need
help finding such a person, or if you have any other questions
related to this document and the developer community's expectations,
please reach out to the private Technical Advisory Board mailing list:
<tech-board@lists.linux-foundation.org>.
......@@ -495,7 +495,8 @@ Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
The Reported-by tag gives credit to people who find bugs and report them and it
hopefully inspires them to help us again in the future. Please note that if
the bug was reported in private, then ask for permission first before using the
Reported-by tag.
Reported-by tag. The tag is intended for bugs; please do not use it to credit
feature requests.
A Tested-by: tag indicates that the patch has been successfully tested (in
some environment) by the person named. This tag informs maintainers that
......
......@@ -14,6 +14,7 @@ Linux Scheduler
sched-domains
sched-capacity
sched-energy
schedutil
sched-nice-design
sched-rt-group
sched-stats
......
......@@ -37,10 +37,10 @@ rebalancing event for the current runqueue has arrived. The actual load
balancing workhorse, run_rebalance_domains()->rebalance_domains(), is then run
in softirq context (SCHED_SOFTIRQ).
The latter function takes two arguments: the current CPU and whether it was idle
at the time the scheduler_tick() happened and iterates over all sched domains
our CPU is on, starting from its base domain and going up the ->parent chain.
While doing that, it checks to see if the current domain has exhausted its
The latter function takes two arguments: the runqueue of current CPU and whether
the CPU was idle at the time the scheduler_tick() happened and iterates over all
sched domains our CPU is on, starting from its base domain and going up the ->parent
chain. While doing that, it checks to see if the current domain has exhausted its
rebalance interval. If so, it runs load_balance() on that domain. It then checks
the parent sched_domain (if it exists), and the parent of the parent and so
forth.
......
=========
Schedutil
=========
.. note::
NOTE; all this assumes a linear relation between frequency and work capacity,
we know this is flawed, but it is the best workable approximation.
All this assumes a linear relation between frequency and work capacity,
we know this is flawed, but it is the best workable approximation.
PELT (Per Entity Load Tracking)
-------------------------------
===============================
With PELT we track some metrics across the various scheduler entities, from
individual tasks to task-group slices to CPU runqueues. As the basis for this
......@@ -38,8 +42,8 @@ while 'runnable' will increase to reflect the amount of contention.
For more detail see: kernel/sched/pelt.c
Frequency- / CPU Invariance
---------------------------
Frequency / CPU Invariance
==========================
Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
......@@ -47,7 +51,7 @@ a big CPU, we allow architectures to scale the time delta with two ratios, one
Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
For simple DVFS architectures (where software is in full control) we trivially
compute the ratio as:
compute the ratio as::
f_cur
r_dvfs := -----
......@@ -55,7 +59,7 @@ compute the ratio as:
For more dynamic systems where the hardware is in control of DVFS we use
hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this ratio.
For Intel specifically, we use:
For Intel specifically, we use::
APERF
f_cur := ----- * P0
......@@ -87,7 +91,7 @@ For more detail see:
UTIL_EST / UTIL_EST_FASTUP
--------------------------
==========================
Because periodic tasks have their averages decayed while they sleep, even
though when running their expected utilization will be the same, they suffer a
......@@ -106,7 +110,7 @@ For more detail see: kernel/sched/fair.c:util_est_dequeue()
UCLAMP
------
======
It is possible to set effective u_min and u_max clamps on each CFS or RT task;
the runqueue keeps a max aggregate of these clamps for all running tasks.
......@@ -115,7 +119,7 @@ For more detail see: include/uapi/linux/sched/types.h
Schedutil / DVFS
----------------
================
Every time the scheduler load tracking is updated (task wakeup, task
migration, time progression) we call out to schedutil to update the hardware
......@@ -123,7 +127,7 @@ DVFS state.
The basis is the CPU runqueue's 'running' metric, which per the above is
the frequency invariant utilization estimate of the CPU. From this we compute
a desired frequency like:
a desired frequency like::
max( running, util_est ); if UTIL_EST
u_cfs := { running; otherwise
......@@ -135,7 +139,7 @@ a desired frequency like:
f_des := min( f_max, 1.25 u * f_max )
XXX IO-wait; when the update is due to a task wakeup from IO-completion we
XXX IO-wait: when the update is due to a task wakeup from IO-completion we
boost 'u' above.
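The two visible pieces of the formula above can be sketched numerically. This is an illustration under stated assumptions, not the kernel implementation: the function names are hypothetical, the intermediate definition of ``u`` is not shown in this hunk so it is taken as an input in the range 0 to 1, and the IO-wait boost is left out.

```python
def cfs_utilization(running, util_est=None):
    """u_cfs := max(running, util_est) if UTIL_EST is in use, else running."""
    if util_est is None:
        return running
    return max(running, util_est)


def desired_frequency(u, f_max):
    """f_des := min(f_max, 1.25 * u * f_max)."""
    # The factor 1.25 adds headroom, and min() caps the request at f_max.
    return min(f_max, 1.25 * u * f_max)
```

With ``f_max`` of 2000 (in any frequency unit), a utilization of 0.5 requests 1250, while a utilization of 0.9 is capped at 2000.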
This frequency is then used to select a P-state/OPP or directly munged into a
......@@ -153,7 +157,7 @@ For more information see: kernel/sched/cpufreq_schedutil.c
NOTES
-----
=====
- On low-load scenarios, where DVFS is most relevant, the 'running' numbers
will closely reflect utilization.
......
% -*- coding: utf-8 -*-
% SPDX-License-Identifier: GPL-2.0
%
% LaTeX preamble for "make latexdocs" or "make pdfdocs" including:
% - TOC width settings
% - Setting of tabulary (\tymin)
% - Headheight setting for fancyhdr
% - Fontfamily settings for CJK (Chinese, Japanese, and Korean) translations
%
% Note on the suffix of .sty:
% This is not implemented as a LaTeX style file, but as a file containing
% plain LaTeX code to be included into preamble.
% ".sty" is chosen because ".tex" would cause the build scripts to confuse
% this file with a LaTeX main file.
%
% Copyright (C) 2022 Akira Yokosawa
% Custom width parameters for TOC
% - Redefine low-level commands defined in report.cls.
% - Indent of 2 chars is preserved for ease of comparison.
% Summary of changes from default params:
% Width of page number (\@pnumwidth): 1.55em -> 2.7em
% Width of chapter number: 1.5em -> 1.8em
% Indent of section number: 1.5em -> 1.8em
% Width of section number: 2.6em -> 3.2em
% Indent of subsection number: 4.1em -> 5em
% Width of subsection number: 3.5em -> 4.3em
%
% These params can have 4 digit page counts, 2 digit chapter counts,
% section counts of 4 digits + 1 period (e.g., 18.10), and subsection counts
% of 5 digits + 2 periods (e.g., 18.7.13).
\makeatletter
%% Redefine \@pnumwidth (page number width)
\renewcommand*\@pnumwidth{2.7em}
%% Redefine \l@chapter (chapter list entry)
\renewcommand*\l@chapter[2]{%
\ifnum \c@tocdepth >\m@ne
\addpenalty{-\@highpenalty}%
\vskip 1.0em \@plus\p@
\setlength\@tempdima{1.8em}%
\begingroup
\parindent \z@ \rightskip \@pnumwidth
\parfillskip -\@pnumwidth
\leavevmode \bfseries
\advance\leftskip\@tempdima
\hskip -\leftskip
#1\nobreak\hfil
\nobreak\hb@xt@\@pnumwidth{\hss #2%
\kern-\p@\kern\p@}\par
\penalty\@highpenalty
\endgroup
\fi}
%% Redefine \l@section and \l@subsection
\renewcommand*\l@section{\@dottedtocline{1}{1.8em}{3.2em}}
\renewcommand*\l@subsection{\@dottedtocline{2}{5em}{4.3em}}
\makeatother
%% Sphinx < 1.8 doesn't have \sphinxtableofcontentshook
\providecommand{\sphinxtableofcontentshook}{}
%% Undefine it for compatibility with Sphinx 1.7.9
\renewcommand{\sphinxtableofcontentshook}{} % Empty the hook
% Prevent column squeezing of tabulary. \tymin is set by Sphinx as:
% \setlength{\tymin}{3\fontcharwd\font`0 }
% , which is too short.
\setlength{\tymin}{20em}
% Adjust \headheight for fancyhdr
\addtolength{\headheight}{1.6pt}
\addtolength{\topmargin}{-1.6pt}
% Translations have Asian (CJK) characters which are only displayed if
% xeCJK is used
\IfFontExistsTF{Noto Sans CJK SC}{
% Load xeCJK when CJK font is available
\usepackage{xeCJK}
% Noto CJK fonts don't provide slant shape. [AutoFakeSlant] permits
% its emulation.
% Select KR variant at the beginning of each document so that quotation
% and apostrophe symbols of half-width are used in TOC of Latin documents.
\IfFontExistsTF{Noto Serif CJK KR}{
\setCJKmainfont{Noto Serif CJK KR}[AutoFakeSlant]
}{
\setCJKmainfont{Noto Sans CJK KR}[AutoFakeSlant]
}
\setCJKsansfont{Noto Sans CJK KR}[AutoFakeSlant]
\setCJKmonofont{Noto Sans Mono CJK KR}[AutoFakeSlant]
% Teach xeCJK of half-width symbols
\xeCJKDeclareCharClass{HalfLeft}{`“,`‘}
\xeCJKDeclareCharClass{HalfRight}{`”,`’}
% CJK Language-specific font choices
%% for Simplified Chinese
\IfFontExistsTF{Noto Serif CJK SC}{
\newCJKfontfamily[SCmain]\scmain{Noto Serif CJK SC}[AutoFakeSlant]
\newCJKfontfamily[SCserif]\scserif{Noto Serif CJK SC}[AutoFakeSlant]
}{
\newCJKfontfamily[SCmain]\scmain{Noto Sans CJK SC}[AutoFakeSlant]
\newCJKfontfamily[SCserif]\scserif{Noto Sans CJK SC}[AutoFakeSlant]
}
\newCJKfontfamily[SCsans]\scsans{Noto Sans CJK SC}[AutoFakeSlant]
\newCJKfontfamily[SCmono]\scmono{Noto Sans Mono CJK SC}[AutoFakeSlant]
%% for Traditional Chinese
\IfFontExistsTF{Noto Serif CJK TC}{
\newCJKfontfamily[TCmain]\tcmain{Noto Serif CJK TC}[AutoFakeSlant]
\newCJKfontfamily[TCserif]\tcserif{Noto Serif CJK TC}[AutoFakeSlant]
}{
\newCJKfontfamily[TCmain]\tcmain{Noto Sans CJK TC}[AutoFakeSlant]
\newCJKfontfamily[TCserif]\tcserif{Noto Sans CJK TC}[AutoFakeSlant]
}
\newCJKfontfamily[TCsans]\tcsans{Noto Sans CJK TC}[AutoFakeSlant]
\newCJKfontfamily[TCmono]\tcmono{Noto Sans Mono CJK TC}[AutoFakeSlant]
%% for Korean
\IfFontExistsTF{Noto Serif CJK KR}{
\newCJKfontfamily[KRmain]\krmain{Noto Serif CJK KR}[AutoFakeSlant]
\newCJKfontfamily[KRserif]\krserif{Noto Serif CJK KR}[AutoFakeSlant]
}{
\newCJKfontfamily[KRmain]\krmain{Noto Sans CJK KR}[AutoFakeSlant]
\newCJKfontfamily[KRserif]\krserif{Noto Sans CJK KR}[AutoFakeSlant]
}
\newCJKfontfamily[KRsans]\krsans{Noto Sans CJK KR}[AutoFakeSlant]
\newCJKfontfamily[KRmono]\krmono{Noto Sans Mono CJK KR}[AutoFakeSlant]
%% for Japanese
\IfFontExistsTF{Noto Serif CJK JP}{
\newCJKfontfamily[JPmain]\jpmain{Noto Serif CJK JP}[AutoFakeSlant]
\newCJKfontfamily[JPserif]\jpserif{Noto Serif CJK JP}[AutoFakeSlant]
}{
\newCJKfontfamily[JPmain]\jpmain{Noto Sans CJK JP}[AutoFakeSlant]
\newCJKfontfamily[JPserif]\jpserif{Noto Sans CJK JP}[AutoFakeSlant]
}
\newCJKfontfamily[JPsans]\jpsans{Noto Sans CJK JP}[AutoFakeSlant]
\newCJKfontfamily[JPmono]\jpmono{Noto Sans Mono CJK JP}[AutoFakeSlant]
% Dummy commands for Sphinx < 2.3 (no 'extrapackages' support)
\providecommand{\onehalfspacing}{}
\providecommand{\singlespacing}{}
% Define custom macros to on/off CJK
%% One and half spacing for CJK contents
\newcommand{\kerneldocCJKon}{\makexeCJKactive\onehalfspacing}
\newcommand{\kerneldocCJKoff}{\makexeCJKinactive\singlespacing}
% Define custom macros for switching CJK font setting
%% for Simplified Chinese
\newcommand{\kerneldocBeginSC}{%
\begingroup%
\scmain%
\xeCJKDeclareCharClass{FullLeft}{`“,`‘}% Full-width in SC
\xeCJKDeclareCharClass{FullRight}{`”,`’}% Full-width in SC
\renewcommand{\CJKrmdefault}{SCserif}%
\renewcommand{\CJKsfdefault}{SCsans}%
\renewcommand{\CJKttdefault}{SCmono}%
\xeCJKsetup{CJKspace = false}% gobble white spaces by ' '
% For CJK ascii-art alignment
\setmonofont{Noto Sans Mono CJK SC}[AutoFakeSlant]%
}
\newcommand{\kerneldocEndSC}{\endgroup}
%% for Traditional Chinese
\newcommand{\kerneldocBeginTC}{%
\begingroup%
\tcmain%
\xeCJKDeclareCharClass{FullLeft}{`“,`‘}% Full-width in TC
\xeCJKDeclareCharClass{FullRight}{`”,`’}% Full-width in TC
\renewcommand{\CJKrmdefault}{TCserif}%
\renewcommand{\CJKsfdefault}{TCsans}%
\renewcommand{\CJKttdefault}{TCmono}%
\xeCJKsetup{CJKspace = false}% gobble white spaces by ' '
% For CJK ascii-art alignment
\setmonofont{Noto Sans Mono CJK TC}[AutoFakeSlant]%
}
\newcommand{\kerneldocEndTC}{\endgroup}
%% for Korean
\newcommand{\kerneldocBeginKR}{%
\begingroup%
\krmain%
\renewcommand{\CJKrmdefault}{KRserif}%
\renewcommand{\CJKsfdefault}{KRsans}%
\renewcommand{\CJKttdefault}{KRmono}%
% \xeCJKsetup{CJKspace = true} % true by default
% For CJK ascii-art alignment (still misaligned for Hangul)
\setmonofont{Noto Sans Mono CJK KR}[AutoFakeSlant]%
}
\newcommand{\kerneldocEndKR}{\endgroup}
%% for Japanese
\newcommand{\kerneldocBeginJP}{%
\begingroup%
\jpmain%
\renewcommand{\CJKrmdefault}{JPserif}%
\renewcommand{\CJKsfdefault}{JPsans}%
\renewcommand{\CJKttdefault}{JPmono}%
\xeCJKsetup{CJKspace = false}% gobble white space by ' '
% For CJK ascii-art alignment
\setmonofont{Noto Sans Mono CJK JP}[AutoFakeSlant]%
}
\newcommand{\kerneldocEndJP}{\endgroup}
% Single spacing in literal blocks
\fvset{baselinestretch=1}
% To customize \sphinxtableofcontents
\usepackage{etoolbox}
% Inactivate CJK after tableofcontents
\apptocmd{\sphinxtableofcontents}{\kerneldocCJKoff}{}{}
\xeCJKsetup{CJKspace = true}% For inter-phrase space of Korean TOC
}{ % No CJK font found
% Custom macros to on/off CJK and switch CJK fonts (Dummy)
\newcommand{\kerneldocCJKon}{}
\newcommand{\kerneldocCJKoff}{}
%% By defining \kerneldocBegin(SC|TC|KR|JP) as commands with an argument
%% and ignore the argument (#1) in their definitions, whole contents of
%% CJK chapters can be ignored.
\newcommand{\kerneldocBeginSC}[1]{%
%% Put a note on missing CJK fonts in place of zh_CN translation.
\begin{sphinxadmonition}{note}{Note on missing fonts:}
Translations of Simplified Chinese (zh\_CN), Traditional Chinese
(zh\_TW), Korean (ko\_KR), and Japanese (ja\_JP) were skipped
due to the lack of suitable font families.
If you want them, please install ``Noto Sans CJK'' font families
by following instructions from
\sphinxcode{./scripts/sphinx-pre-install}.
Having optional ``Noto Serif CJK'' font families will improve
the looks of those translations.
\end{sphinxadmonition}}
\newcommand{\kerneldocEndSC}{}
\newcommand{\kerneldocBeginTC}[1]{}
\newcommand{\kerneldocEndTC}{}
\newcommand{\kerneldocBeginKR}[1]{}
\newcommand{\kerneldocEndKR}{}
\newcommand{\kerneldocBeginJP}[1]{}
\newcommand{\kerneldocEndJP}{}
}
......@@ -31,10 +31,13 @@ u"""
* ``dot(1)``: Graphviz (https://www.graphviz.org). If Graphviz is not
available, the DOT language is inserted as literal-block.
For conversion to PDF, ``rsvg-convert(1)`` of librsvg
(https://gitlab.gnome.org/GNOME/librsvg) is used when available.
* SVG to PDF: To generate PDF, you need at least one of these tools:
- ``convert(1)``: ImageMagick (https://www.imagemagick.org)
- ``inkscape(1)``: Inkscape (https://inkscape.org/)
List of customizations:
......@@ -49,6 +52,7 @@ import os
from os import path
import subprocess
from hashlib import sha1
import re
from docutils import nodes
from docutils.statemachine import ViewList
from docutils.parsers.rst import directives
......@@ -109,10 +113,20 @@ def pass_handle(self, node): # pylint: disable=W0613
# Graphviz's dot(1) support
dot_cmd = None
# dot(1) -Tpdf should be used
dot_Tpdf = False
# ImageMagick's convert(1) support
convert_cmd = None
# librsvg's rsvg-convert(1) support
rsvg_convert_cmd = None
# Inkscape's inkscape(1) support
inkscape_cmd = None
# Inkscape prior to 1.0 uses different command options
inkscape_ver_one = False
def setup(app):
# check toolchain first
......@@ -160,23 +174,62 @@ def setupTools(app):
This function is called once, when the builder is initiated.
"""
global dot_cmd, convert_cmd # pylint: disable=W0603
global dot_cmd, dot_Tpdf, convert_cmd, rsvg_convert_cmd # pylint: disable=W0603
global inkscape_cmd, inkscape_ver_one # pylint: disable=W0603
kernellog.verbose(app, "kfigure: check installed tools ...")
dot_cmd = which('dot')
convert_cmd = which('convert')
rsvg_convert_cmd = which('rsvg-convert')
inkscape_cmd = which('inkscape')
if dot_cmd:
kernellog.verbose(app, "use dot(1) from: " + dot_cmd)
try:
dot_Thelp_list = subprocess.check_output([dot_cmd, '-Thelp'],
stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as err:
dot_Thelp_list = err.output
pass
dot_Tpdf_ptn = b'pdf'
dot_Tpdf = re.search(dot_Tpdf_ptn, dot_Thelp_list)
else:
kernellog.warn(app, "dot(1) not found, for better output quality install "
"graphviz from https://www.graphviz.org")
if convert_cmd:
kernellog.verbose(app, "use convert(1) from: " + convert_cmd)
if inkscape_cmd:
kernellog.verbose(app, "use inkscape(1) from: " + inkscape_cmd)
inkscape_ver = subprocess.check_output([inkscape_cmd, '--version'],
stderr=subprocess.DEVNULL)
ver_one_ptn = b'Inkscape 1'
inkscape_ver_one = re.search(ver_one_ptn, inkscape_ver)
convert_cmd = None
rsvg_convert_cmd = None
dot_Tpdf = False
else:
kernellog.warn(app,
"convert(1) not found, for SVG to PDF conversion install "
"ImageMagick (https://www.imagemagick.org)")
if convert_cmd:
kernellog.verbose(app, "use convert(1) from: " + convert_cmd)
else:
kernellog.warn(app,
"Neither inkscape(1) nor convert(1) found.\n"
"For SVG to PDF conversion, "
"install either Inkscape (https://inkscape.org/) (preferred) or\n"
"ImageMagick (https://www.imagemagick.org)")
if rsvg_convert_cmd:
kernellog.verbose(app, "use rsvg-convert(1) from: " + rsvg_convert_cmd)
kernellog.verbose(app, "use 'dot -Tsvg' and rsvg-convert(1) for DOT -> PDF conversion")
dot_Tpdf = False
else:
kernellog.verbose(app,
"rsvg-convert(1) not found.\n"
" SVG rendering of convert(1) is done by ImageMagick-native renderer.")
if dot_Tpdf:
kernellog.verbose(app, "use 'dot -Tpdf' for DOT -> PDF conversion")
else:
kernellog.verbose(app, "use 'dot -Tsvg' and convert(1) for DOT -> PDF conversion")
# integrate conversion tools
......@@ -242,7 +295,7 @@ def convert_image(img_node, translator, src_fname=None):
elif in_ext == '.svg':
if translator.builder.format == 'latex':
if convert_cmd is None:
if not inkscape_cmd and convert_cmd is None:
kernellog.verbose(app,
"no SVG to PDF conversion available / include SVG raw.")
img_node.replace_self(file2literal(src_fname))
......@@ -266,7 +319,14 @@ def convert_image(img_node, translator, src_fname=None):
if in_ext == '.dot':
kernellog.verbose(app, 'convert DOT to: {out}/' + _name)
ok = dot2format(app, src_fname, dst_fname)
if translator.builder.format == 'latex' and not dot_Tpdf:
svg_fname = path.join(translator.builder.outdir, fname + '.svg')
ok1 = dot2format(app, src_fname, svg_fname)
ok2 = svg2pdf_by_rsvg(app, svg_fname, dst_fname)
ok = ok1 and ok2
else:
ok = dot2format(app, src_fname, dst_fname)
elif in_ext == '.svg':
kernellog.verbose(app, 'convert SVG to: {out}/' + _name)
......@@ -303,22 +363,70 @@ def dot2format(app, dot_fname, out_fname):
return bool(exit_code == 0)
def svg2pdf(app, svg_fname, pdf_fname):
"""Converts SVG to PDF with ``convert(1)`` command.
"""Converts SVG to PDF with ``inkscape(1)`` or ``convert(1)`` command.
Uses ``convert(1)`` from ImageMagick (https://www.imagemagick.org) for
conversion. Returns ``True`` on success and ``False`` if an error occurred.
Uses ``inkscape(1)`` from Inkscape (https://inkscape.org/) or ``convert(1)``
from ImageMagick (https://www.imagemagick.org) for conversion.
Returns ``True`` on success and ``False`` if an error occurred.
* ``svg_fname`` pathname of the input SVG file with extension (``.svg``)
* ``pdf_name`` pathname of the output PDF file with extension (``.pdf``)
"""
cmd = [convert_cmd, svg_fname, pdf_fname]
# use stdout and stderr from parent
exit_code = subprocess.call(cmd)
cmd_name = 'convert(1)'
if inkscape_cmd:
cmd_name = 'inkscape(1)'
if inkscape_ver_one:
cmd = [inkscape_cmd, '-o', pdf_fname, svg_fname]
else:
cmd = [inkscape_cmd, '-z', '--export-pdf=%s' % pdf_fname, svg_fname]
try:
warning_msg = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
exit_code = 0
except subprocess.CalledProcessError as err:
warning_msg = err.output
exit_code = err.returncode
pass
if exit_code != 0:
kernellog.warn(app, "Error #%d when calling: %s" % (exit_code, " ".join(cmd)))
if warning_msg:
kernellog.warn(app, "Warning msg from %s: %s"
% (cmd_name, str(warning_msg, 'utf-8')))
elif warning_msg:
kernellog.verbose(app, "Warning msg from %s (likely harmless):\n%s"
% (cmd_name, str(warning_msg, 'utf-8')))
return bool(exit_code == 0)
def svg2pdf_by_rsvg(app, svg_fname, pdf_fname):
"""Convert SVG to PDF with ``rsvg-convert(1)`` command.
* ``svg_fname`` pathname of input SVG file, including extension ``.svg``
* ``pdf_fname`` pathname of output PDF file, including extension ``.pdf``
Input SVG file should be the one generated by ``dot2format()``.
SVG -> PDF conversion is done by ``rsvg-convert(1)``.
If ``rsvg-convert(1)`` is unavailable, fall back to ``svg2pdf()``.
"""
if rsvg_convert_cmd is None:
ok = svg2pdf(app, svg_fname, pdf_fname)
else:
cmd = [rsvg_convert_cmd, '--format=pdf', '-o', pdf_fname, svg_fname]
# use stdout and stderr from parent
exit_code = subprocess.call(cmd)
if exit_code != 0:
kernellog.warn(app, "Error #%d when calling: %s" % (exit_code, " ".join(cmd)))
ok = bool(exit_code == 0)
return ok
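The tool selection spread across `svg2pdf()` and `svg2pdf_by_rsvg()` above can be summarized as one pure function. This is only a sketch for illustration, not code from kfigure.py: `build_svg2pdf_cmd` and its keyword arguments are hypothetical names, and the command-line flags are copied from the hunk above rather than checked against each tool's manual.

```python
def build_svg2pdf_cmd(svg_fname, pdf_fname, inkscape_cmd=None,
                      inkscape_ver_one=False, convert_cmd=None,
                      rsvg_convert_cmd=None, from_dot=False):
    """Return (tool_name, argv) for an SVG -> PDF conversion, or None.

    Hypothetical helper mirroring the order of preference in the diff:
    rsvg-convert for SVG generated from DOT, then Inkscape, then convert.
    """
    if from_dot and rsvg_convert_cmd:
        return ('rsvg-convert(1)',
                [rsvg_convert_cmd, '--format=pdf', '-o', pdf_fname, svg_fname])
    if inkscape_cmd:
        if inkscape_ver_one:
            # Inkscape 1.x style invocation, as used in the hunk above
            return ('inkscape(1)', [inkscape_cmd, '-o', pdf_fname, svg_fname])
        # Older Inkscape: run without GUI and name the export target
        return ('inkscape(1)',
                [inkscape_cmd, '-z', '--export-pdf=%s' % pdf_fname, svg_fname])
    if convert_cmd:
        return ('convert(1)', [convert_cmd, svg_fname, pdf_fname])
    return None  # no converter available; the caller includes the SVG raw
```

Keeping the selection separate from `subprocess` makes the fallback order easy to check without any of the tools installed.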
# image handling
# ---------------------
......
......@@ -51,7 +51,7 @@ For example::
[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo osnoise > current_tracer
It is possible to follow the trace by reading the trace trace file::
It is possible to follow the trace by reading the trace file::
[root@f32 tracing]# cat trace
# tracer: osnoise
......@@ -108,7 +108,7 @@ The tracer has a set of options inside the osnoise directory, they are:
option.
- tracing_threshold: the minimum delta between two time() reads to be
considered as noise, in us. When set to 0, the default value will
will be used, which is currently 5 us.
be used, which is currently 5 us.
Additional Tracing
------------------
......
# -*- coding: utf-8 -*-
# SPDX-License-Identifier: GPL-2.0
# -- Additional options for LaTeX output ----------------------------------
# font config for ascii-art alignment
latex_elements['preamble'] += '''
\\IfFontExistsTF{Noto Sans CJK SC}{
% For CJK ascii-art alignment
\\setmonofont{Noto Sans Mono CJK SC}[AutoFakeSlant]
}{}
'''
......@@ -3,7 +3,7 @@
\renewcommand\thesection*
\renewcommand\thesubsection*
\kerneldocCJKon
\kerneldocBeginJP
\kerneldocBeginJP{
Japanese translations
=====================
......@@ -15,4 +15,4 @@ Japanese translations
.. raw:: latex
\kerneldocEndJP
}\kerneldocEndJP
......@@ -3,7 +3,7 @@
\renewcommand\thesection*
\renewcommand\thesubsection*
\kerneldocCJKon
\kerneldocBeginKR
\kerneldocBeginKR{
한국어 번역
===========
......@@ -26,5 +26,4 @@
.. raw:: latex
\normalsize
\kerneldocEndKR
}\kerneldocEndKR
......@@ -17,6 +17,8 @@ a) 等待一个CPU(任务为可运行)
b) 完成由该任务发起的块I/O同步请求
c) 页面交换
d) 内存回收
e) 页缓存抖动
f) 直接规整
并将这些统计信息通过taskstats接口提供给用户空间。
......@@ -37,10 +39,10 @@ d) 内存回收
向用户态返回一个通用数据结构,对应每pid或每tgid的统计信息。延时计数功能填写
该数据结构的特定字段。见
include/linux/taskstats.h
include/uapi/linux/taskstats.h
其描述了延时计数相关字段。系统通常以计数器形式返回 CPU、同步块 I/O、交换、内存
回收等的累积延时。
回收、页缓存抖动、直接规整等的累积延时。
取任务某计数器两个连续读数的差值,将得到任务在该时间间隔内等待对应资源的总延时。
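The paragraph above says that the difference between two successive readings of a counter gives the delay accumulated in that interval. A minimal sketch of that arithmetic follows. `DelaySample` and `interval_delay` are hypothetical names, and the sketch assumes the cumulative delay is reported in nanoseconds.

```python
from dataclasses import dataclass


@dataclass
class DelaySample:
    count: int           # number of delay events seen so far
    delay_total_ns: int  # cumulative delay, assumed to be in nanoseconds


def interval_delay(earlier, later):
    """Return (events, total_ns, average_ms) between two readings."""
    events = later.count - earlier.count
    total_ns = later.delay_total_ns - earlier.delay_total_ns
    average_ms = total_ns / events / 1e6 if events else 0.0
    return events, total_ns, average_ms


# Counters read at boot (all zero) and the CPU row of the sample output
boot = DelaySample(count=0, delay_total_ns=0)
now = DelaySample(count=8, delay_total_ns=3382277)
events, total_ns, average_ms = interval_delay(boot, now)
print("%d events, %d ns total, %.3fms average" % (events, total_ns, average_ms))
```

With the CPU figures from the `getdelays -d -t 5` sample (count 8, delay total 3382277), the average works out to the 0.423ms shown in that output.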
......@@ -72,40 +74,36 @@ kernel.task_delayacct进行开关。注意,只有在启用延时计数后启
getdelays命令的一般格式::
getdelays [-t tgid] [-p pid] [-c cmd...]
getdelays [-dilv] [-t tgid] [-p pid]
获取pid为10的任务从系统启动后的延时信息::
# ./getdelays -p 10
# ./getdelays -d -p 10
(输出信息和下例相似)
获取所有tgid为5的任务从系统启动后的总延时信息::
# ./getdelays -t 5
CPU count real total virtual total delay total
7876 92005750 100000000 24001500
IO count delay total
0 0
SWAP count delay total
0 0
RECLAIM count delay total
0 0
获取指定简单命令运行时的延时信息::
# ./getdelays -c ls /
bin data1 data3 data5 dev home media opt root srv sys usr
boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
CPU count real total virtual total delay total
6 4000250 4000000 0
IO count delay total
0 0
SWAP count delay total
0 0
RECLAIM count delay total
0 0
# ./getdelays -d -t 5
print delayacct stats ON
TGID 5
CPU count real total virtual total delay total delay average
8 7000000 6872122 3382277 0.423ms
IO count delay total delay average
0 0 0ms
SWAP count delay total delay average
0 0 0ms
RECLAIM count delay total delay average
0 0 0ms
THRASHING count delay total delay average
0 0 0ms
COMPACT count delay total delay average
0 0 0ms
获取pid为1的IO计数,它只和-p一起使用::
# ./getdelays -i -p 1
printing IO accounting
linuxrc: read=65536, write=0, cancelled_write=0
上面的命令与-v一起使用,可以获取更多调试信息。
......@@ -20,15 +20,15 @@ Linux 内核用户和管理员指南
Todolist:
kernel-parameters
devices
sysctl/index
* kernel-parameters
* devices
* sysctl/index
本节介绍CPU漏洞及其缓解措施。
Todolist:
hw-vuln/index
* hw-vuln/index
下面的一组文档,针对的是试图跟踪问题和bug的用户。
......@@ -44,18 +44,18 @@ Todolist:
Todolist:
reporting-bugs
ramoops
dynamic-debug-howto
kdump/index
perf/index
* reporting-bugs
* ramoops
* dynamic-debug-howto
* kdump/index
* perf/index
这是应用程序开发人员感兴趣的章节的开始。可以在这里找到涵盖内核ABI各个
方面的文档。
Todolist:
sysfs-rules
* sysfs-rules
本手册的其余部分包括各种指南,介绍如何根据您的喜好配置内核的特定行为。
......@@ -69,61 +69,61 @@ Todolist:
lockup-watchdogs
unicode
sysrq
mm/index
Todolist:
acpi/index
aoe/index
auxdisplay/index
bcache
binderfs
binfmt-misc
blockdev/index
bootconfig
braille-console
btmrvl
cgroup-v1/index
cgroup-v2
cifs/index
dell_rbu
device-mapper/index
edid
efi-stub
ext4
nfs/index
gpio/index
highuid
hw_random
initrd
iostats
java
jfs
kernel-per-CPU-kthreads
laptops/index
lcd-panel-cgram
ldm
LSM/index
md
media/index
mm/index
module-signing
mono
namespaces/index
numastat
parport
perf-security
pm/index
pnp
rapidio
ras
rtc
serial-console
svga
thunderbolt
ufs
vga-softcursor
video-output
xfs
* acpi/index
* aoe/index
* auxdisplay/index
* bcache
* binderfs
* binfmt-misc
* blockdev/index
* bootconfig
* braille-console
* btmrvl
* cgroup-v1/index
* cgroup-v2
* cifs/index
* dell_rbu
* device-mapper/index
* edid
* efi-stub
* ext4
* nfs/index
* gpio/index
* highuid
* hw_random
* initrd
* iostats
* java
* jfs
* kernel-per-CPU-kthreads
* laptops/index
* lcd-panel-cgram
* ldm
* LSM/index
* md
* media/index
* module-signing
* mono
* namespaces/index
* numastat
* parport
* perf-security
* pm/index
* pnp
* rapidio
* ras
* rtc
* serial-console
* svga
* thunderbolt
* ufs
* vga-softcursor
* video-output
* xfs
.. only:: subproject and html
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../../../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/mm/damon/index.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
============
监测数据访问
============
:doc:`DAMON </vm/damon/index>` 允许轻量级的数据访问监测。使用DAMON,
用户可以分析他们系统的内存访问模式,并优化它们。
.. toctree::
:maxdepth: 2
start
usage
reclaim
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../../../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/mm/damon/reclaim.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
===============
基于DAMON的回收
===============
基于DAMON的回收(DAMON_RECLAIM)是一个静态的内核模块,旨在用于轻度内存压力下的主动和轻
量级的回收。它的目的不是取代基于LRU列表的页面回收,而是有选择地用于不同程度的内存压力和要
求。
哪些地方需要主动回收?
======================
在一般的内存超量使用(over-committed systems,虚拟化相关术语)的系统上,主动回收冷页
有助于节省内存和减少延迟高峰,这些延迟是由直接回收进程或kswapd的CPU消耗引起的,同时只产
生最小的性能下降 [1]_ [2]_ 。
基于空闲页报告 [3]_ 的内存过度承诺的虚拟化系统就是很好的例子。在这样的系统中,客户机
向主机报告他们的空闲内存,而主机则将报告的内存重新分配给其他客户。因此,系统的内存得到了充
分的利用。然而,客户可能不那么节省内存,主要是因为一些内核子系统和用户空间应用程序被设计为
使用尽可能多的内存。然后,客户机可能只向主机报告少量的内存是空闲的,导致系统的内存利用率下降。
在客户中运行主动回收可以缓解这个问题。
它是如何工作的?
================
DAMON_RECLAIM找到在特定时间内没有被访问的内存区域并将其换出(page out)。为了避免它在换出操作中
消耗过多的CPU,可以配置一个速度限制。在这个速度限制下,它优先换出未被访问时间更长的内存区域。
系统管理员还可以通过三个内存压力水位,配置该方案在什么情况下自动激活和停用。
接口: 模块参数
==============
要使用这个功能,你首先要确保你的系统运行在一个以 ``CONFIG_DAMON_RECLAIM=y`` 构建的内
核上。
为了让系统管理员启用或禁用它,并为给定的系统进行调整,DAMON_RECLAIM利用了模块参数。也就
是说,你可以把 ``damon_reclaim.<parameter>=<value>`` 放在内核启动命令行上,或者把
适当的值写入 ``/sys/module/damon_reclaim/parameters/<parameter>`` 文件。
注意,除 ``enabled`` 外的参数值只在DAMON_RECLAIM启动时应用。因此,如果你想在运行时应用新
的参数值,而DAMON_RECLAIM已经被启用,你应该通过 ``enabled`` 参数文件禁用和重新启用它。
在重新启用之前,应将新的参数值写入适当的参数值中。
下面是每个参数的描述。
enabled
-------
启用或禁用DAMON_RECLAIM。
你可以通过把这个参数的值设置为 ``Y`` 来启用DAMON_RECLAIM,把它设置为 ``N`` 可以禁用
DAMON_RECLAIM。注意,由于基于水位的激活条件,DAMON_RECLAIM可能不会进行真正的监测和回收。
这一点请参考下面关于水位参数的描述。
min_age
-------
识别冷内存区域的时间阈值,单位是微秒。
如果一个内存区域在这个时间或更长的时间内没有被访问,DAMON_RECLAIM会将该区域识别为冷的,
并回收它。
默认为120秒。
quota_ms
--------
回收的时间限制,以毫秒为单位。
DAMON_RECLAIM 试图在一个时间窗口(quota_reset_interval_ms)内只使用到这个时间,以
尝试回收冷页。这可以用来限制DAMON_RECLAIM的CPU消耗。如果该值为零,则该限制被禁用。
默认为10ms。
quota_sz
--------
回收的内存大小限制,单位为字节。
DAMON_RECLAIM 收取在一个时间窗口(quota_reset_interval_ms)内试图回收的内存量,并
使其不超过这个限制。这可以用来限制CPU和IO的消耗。如果该值为零,则限制被禁用。
默认情况下是128 MiB。
quota_reset_interval_ms
-----------------------
时间/大小配额收取重置间隔,单位为毫秒。
时间(quota_ms)和大小(quota_sz)的配额的目标重置间隔。也就是说,DAMON_RECLAIM在每个
quota_reset_interval_ms毫秒的窗口内,尝试只使用不超过quota_ms毫秒的时间,并只回收不超过
quota_sz字节的内存。
默认为1秒。
wmarks_interval
---------------
当DAMON_RECLAIM被启用但由于其水位规则而不活跃时,在检查水位之前的最小等待时间。
wmarks_high
-----------
高水位的可用内存率(千分比)。
如果系统的可用内存率(以千分比为单位)高于这个数值,DAMON_RECLAIM就会变得不活跃,所以
它什么也不做,只是定期检查水位。
wmarks_mid
----------
中间水位的可用内存率(千分比)。
如果系统的空闲内存率(以千分比为单位)在这个和低水位线之间,DAMON_RECLAIM就会被激活,
因此开始监测和回收。
wmarks_low
----------
低水位的可用内存率(千分比)。
如果系统的空闲内存率(以千分比为单位)低于这个数值,DAMON_RECLAIM就会变得不活跃,所以
它除了定期检查水位外什么都不做。在这种情况下,系统会退回到基于LRU列表的页面粒度回收逻辑。
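The three watermark rules above can be read as a small state function. The sketch below is an illustration only, not the kernel implementation: `next_active` is a hypothetical name, rates are per-thousand values, and the behaviour between the mid and high watermarks (keep the current state) is an assumption, since the text does not spell it out.

```python
def next_active(free_permille, active, high, mid, low):
    """Return whether DAMON_RECLAIM would be active after a watermark check."""
    if free_permille > high or free_permille < low:
        # Plenty of free memory, or so little that LRU-based reclaim takes over
        return False
    if free_permille <= mid:
        # Between the low and mid watermarks: start (or keep) working
        return True
    # Between mid and high: assumed to keep whatever state it was in
    return active


# Example thresholds: high=500, mid=400, low=200 (per thousand)
for rate, was_active in ((600, True), (450, False), (350, False), (150, True)):
    print(rate, was_active, '->', next_active(rate, was_active, 500, 400, 200))
```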
sample_interval
---------------
监测的采样间隔,单位是微秒。
DAMON用于监测冷内存的采样间隔。更多细节请参考DAMON文档 (:doc:`usage`) 。
aggr_interval
-------------
监测的聚集间隔,单位是微秒。
DAMON对冷内存监测的聚集间隔。更多细节请参考DAMON文档 (:doc:`usage`)。
min_nr_regions
--------------
监测区域的最小数量。
DAMON用于冷内存监测的最小监测区域数。这可以用来设置监测质量的下限。但是,设
置的太高可能会导致监测开销的增加。更多细节请参考DAMON文档 (:doc:`usage`) 。
max_nr_regions
--------------
监测区域的最大数量。
DAMON用于冷内存监测的最大监测区域数。这可以用来设置监测开销的上限值。但是,
设置得太低可能会导致监测质量不好。更多细节请参考DAMON文档 (:doc:`usage`) 。
monitor_region_start
--------------------
目标内存区域的物理地址起点。
DAMON_RECLAIM将对其进行工作的内存区域的起始物理地址。也就是说,DAMON_RECLAIM
将在这个区域中找到冷的内存区域并进行回收。默认情况下,该区域使用最大系统内存区。
monitor_region_end
------------------
目标内存区域的结束物理地址。
DAMON_RECLAIM将对其进行工作的内存区域的末端物理地址。也就是说,DAMON_RECLAIM将
在这个区域内找到冷的内存区域并进行回收。默认情况下,该区域使用最大系统内存区。
kdamond_pid
-----------
DAMON线程的PID。
如果DAMON_RECLAIM被启用,这将成为工作线程的PID。否则,为-1。
nr_reclaim_tried_regions
------------------------
试图通过DAMON_RECLAIM回收的内存区域的数量。
bytes_reclaim_tried_regions
---------------------------
试图通过DAMON_RECLAIM回收的内存区域的总字节数。
nr_reclaimed_regions
--------------------
通过DAMON_RECLAIM成功回收的内存区域的数量。
bytes_reclaimed_regions
-----------------------
通过DAMON_RECLAIM成功回收的内存区域的总字节数。
nr_quota_exceeds
----------------
超过时间/空间配额限制的次数。
例子
====
下面的运行示例命令使DAMON_RECLAIM找到30秒或更长时间没有访问的内存区域并将其换出。
为了避免DAMON_RECLAIM在分页操作中消耗过多的CPU时间,回收被限制在每秒1GiB以内。
它还要求DAMON_RECLAIM在系统的可用内存率超过50%时不做任何事情,但如果它低于40%时
就开始真正的工作。如果DAMON_RECLAIM没有取得进展,因此空闲内存率低于20%,它会要求
DAMON_RECLAIM再次什么都不做,这样我们就可以退回到基于LRU列表的页面粒度回收了::
# cd /sys/module/damon_reclaim/parameters
# echo 30000000 > min_age
# echo $((1 * 1024 * 1024 * 1024)) > quota_sz
# echo 1000 > quota_reset_interval_ms
# echo 500 > wmarks_high
# echo 400 > wmarks_mid
# echo 200 > wmarks_low
# echo Y > enabled
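The numbers written in the commands above follow from the units given in the parameter descriptions: microseconds for ``min_age``, bytes for ``quota_sz``, milliseconds for ``quota_reset_interval_ms``, and per-thousand rates for the watermarks. The following sketch only does that unit conversion; `reclaim_params` is a hypothetical helper and nothing is written to sysfs.

```python
US_PER_SEC = 1_000_000


def reclaim_params(min_age_s, quota_gib, reset_interval_s,
                   high_pct, mid_pct, low_pct):
    """Convert human-friendly settings into DAMON_RECLAIM parameter values."""
    return {
        'min_age': min_age_s * US_PER_SEC,              # microseconds
        'quota_sz': quota_gib * 1024 ** 3,               # bytes
        'quota_reset_interval_ms': reset_interval_s * 1000,
        'wmarks_high': high_pct * 10,                    # percent -> per thousand
        'wmarks_mid': mid_pct * 10,
        'wmarks_low': low_pct * 10,
    }


# 30 s cold threshold, 1 GiB per second, watermarks at 50% / 40% / 20%
for name, value in reclaim_params(30, 1, 1, 50, 40, 20).items():
    print("echo %d > %s" % (value, name))
```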
.. [1] https://research.google/pubs/pub48551/
.. [2] https://lwn.net/Articles/787611/
.. [3] https://www.kernel.org/doc/html/latest/vm/free_page_reporting.html
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../../../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/mm/damon/start.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
========
入门指南
========
本文通过演示DAMON的默认用户空间工具,简要地介绍了如何使用DAMON。请注意,为了简洁
起见,本文档只描述了它的部分功能。更多细节请参考该工具的使用文档
`doc <https://github.com/awslabs/damo/blob/next/USAGE.md>`_ 。
前提条件
========
内核
----
首先,你要确保你当前系统中跑的内核构建时选定了这个功能选项 ``CONFIG_DAMON_*=y``.
用户空间工具
------------
在演示中,我们将使用DAMON的默认用户空间工具,称为DAMON Operator(DAMO)。它可以在
https://github.com/awslabs/damo 找到。下面的例子假设DAMO在你的$PATH上,不过
这并不是强制性的。
因为DAMO使用的是DAMON的debugfs接口(详情请参考 :doc:`usage` 中的使用方法) 你应该
确保debugfs被挂载。手动挂载它,如下所示::
# mount -t debugfs none /sys/kernel/debug/
或者在你的 ``/etc/fstab`` 文件中添加以下一行,这样你的系统就可以在启动时自动挂载
debugfs了::
debugfs /sys/kernel/debug debugfs defaults 0 0
记录数据访问模式
================
下面的命令记录了一个程序的内存访问模式,并将监测结果保存到文件中。 ::
$ git clone https://github.com/sjp38/masim
$ cd masim; make; ./masim ./configs/zigzag.cfg &
$ sudo damo record -o damon.data $(pidof masim)
命令的前两行下载了一个人工内存访问生成器程序并在后台运行。生成器将重复地逐一访问两个
100 MiB大小的内存区域。你可以用你的真实工作负载来代替它。最后一行要求 ``damo`` 将
访问模式记录在 ``damon.data`` 文件中。
将记录的模式可视化
==================
你可以在heatmap中直观地看到这种模式,显示哪个内存区域(X轴)何时被访问(Y轴)以及访
问的频率(数字)。::
$ sudo damo report heats --heatmap stdout
22222222222222222222222222222222222222211111111111111111111111111111111111111100
44444444444444444444444444444444444444434444444444444444444444444444444444443200
44444444444444444444444444444444444444433444444444444444444444444444444444444200
33333333333333333333333333333333333333344555555555555555555555555555555555555200
33333333333333333333333333333333333344444444444444444444444444444444444444444200
22222222222222222222222222222222222223355555555555555555555555555555555555555200
00000000000000000000000000000000000000288888888888888888888888888888888888888400
00000000000000000000000000000000000000288888888888888888888888888888888888888400
33333333333333333333333333333333333333355555555555555555555555555555555555555200
88888888888888888888888888888888888888600000000000000000000000000000000000000000
88888888888888888888888888888888888888600000000000000000000000000000000000000000
33333333333333333333333333333333333333444444444444444444444444444444444444443200
00000000000000000000000000000000000000288888888888888888888888888888888888888400
[...]
# access_frequency: 0 1 2 3 4 5 6 7 8 9
# x-axis: space (139728247021568-139728453431248: 196.848 MiB)
# y-axis: time (15256597248362-15326899978162: 1 m 10.303 s)
# resolution: 80x40 (2.461 MiB and 1.758 s for each character)
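The ``resolution`` line in the heatmap footer can be reproduced from the axis ranges printed just above it. This sketch assumes the x-axis range is in bytes and the y-axis range is in nanoseconds; `heatmap_resolution` is a hypothetical helper, not part of ``damo``.

```python
def heatmap_resolution(addr_start, addr_end, t_start_ns, t_end_ns,
                       cols=80, rows=40):
    """Return (MiB per character, seconds per character) of the heatmap."""
    mib_per_char = (addr_end - addr_start) / (1 << 20) / cols
    sec_per_char = (t_end_ns - t_start_ns) / 1e9 / rows
    return mib_per_char, sec_per_char


mib, sec = heatmap_resolution(139728247021568, 139728453431248,
                              15256597248362, 15326899978162)
print("%.3f MiB and %.3f s for each character" % (mib, sec))
```

With the ranges from the sample output, this gives the 2.461 MiB and 1.758 s reported by the tool.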
你也可以直观地看到工作集的大小分布,按大小排序。::
$ sudo damo report wss --range 0 101 10
# <percentile> <wss>
# target_id 18446632103789443072
# avr: 107.708 MiB
0 0 B | |
10 95.328 MiB |**************************** |
20 95.332 MiB |**************************** |
30 95.340 MiB |**************************** |
40 95.387 MiB |**************************** |
50 95.387 MiB |**************************** |
60 95.398 MiB |**************************** |
70 95.398 MiB |**************************** |
80 95.504 MiB |**************************** |
90 190.703 MiB |********************************************************* |
100 196.875 MiB |***********************************************************|
在上述命令中使用 ``--sortby`` 选项,可以显示工作集的大小是如何按时间顺序变化的。::
$ sudo damo report wss --range 0 101 10 --sortby time
# <percentile> <wss>
# target_id 18446632103789443072
# avr: 107.708 MiB
0 3.051 MiB | |
10 190.703 MiB |***********************************************************|
20 95.336 MiB |***************************** |
30 95.328 MiB |***************************** |
40 95.387 MiB |***************************** |
50 95.332 MiB |***************************** |
60 95.320 MiB |***************************** |
70 95.398 MiB |***************************** |
80 95.398 MiB |***************************** |
90 95.340 MiB |***************************** |
100 95.398 MiB |***************************** |
数据访问模式感知的内存管理
==========================
以下三个命令使你的工作负载中每一个大小>=4K且>=60秒没有被访问的内存区域被换出。 ::
$ echo "#min-size max-size min-acc max-acc min-age max-age action" > test_scheme
$ echo "4K max 0 0 60s max pageout" >> test_scheme
$ damo schemes -c test_scheme <pid of your workload>
.. include:: ../../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/mm/index.rst
:翻译:
徐鑫 xu xin <xu.xin16@zte.com.cn>
========
内存管理
========
Linux内存管理子系统,顾名思义,是负责系统中的内存管理。它包括了虚拟内存与请求
分页的实现,内核内部结构和用户空间程序的内存分配、将文件映射到进程地址空间以
及许多其他很酷的事情。
Linux内存管理是一个具有许多可配置设置的复杂系统, 且这些设置中的大多数都可以通
过 ``/proc`` 文件系统获得,并且可以使用 ``sysctl`` 进行查询和调整。这些API接
口被描述在Documentation/admin-guide/sysctl/vm.rst文件和 `man 5 proc`_ 中。
.. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
Linux内存管理有它自己的术语,如果你还不熟悉它,请考虑阅读下面参考:
:ref:`Documentation/admin-guide/mm/concepts.rst <mm_concepts>`.
在此目录下,我们详细描述了如何与Linux内存管理中的各种机制交互。
.. toctree::
:maxdepth: 1
damon/index
ksm
Todolist:
* concepts
* cma_debugfs
* hugetlbpage
* idle_page_tracking
* memory-hotplug
* nommu-mmap
* numa_memory_policy
* numaperf
* pagemap
* soft-dirty
* swap_numa
* transhuge
* userfaultfd
* zswap
.. include:: ../../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/mm/ksm.rst
:翻译:
徐鑫 xu xin <xu.xin16@zte.com.cn>
============
内核同页合并
============
概述
====
KSM是一种能节省内存的数据去重功能,由CONFIG_KSM=y启用,并在2.6.32版本时被添
加到Linux内核。详见 ``mm/ksm.c`` 的实现,以及http://lwn.net/Articles/306704
和https://lwn.net/Articles/330589
KSM最初是为了与KVM一起使用而开发的(在那里它被称为内核共享内存),通过共享虚拟机
之间的公共数据,将更多虚拟机放入物理内存。但它对于任何会生成多个相同数据实例的
应用程序都是很有用的。
KSM的守护进程ksmd会定期扫描那些已注册的用户内存区域,查找内容相同的页面,这些
页面可以被单个写保护页面替换(如果进程以后想要更新其内容,将自动复制)。使用
:ref:`sysfs接口 <ksm_sysfs>` 来配置KSM守护程序在单个过程中所扫描的页
数以及两个过程之间的间隔时间。
KSM只合并匿名(私有)页面,从不合并页缓存(文件)页面。KSM的合并页面最初只能被
锁定在内核内存中,但现在可以就像其他用户页面一样被换出(但当它们被交换回来时共
享会被破坏: ksmd必须重新发现它们的身份并再次合并)。
以madvise控制KSM
================
KSM仅在特定的地址空间区域时运行,即应用程序通过使用如下所示的madvise(2)系统调
用来请求某块地址成为可能的合并候选者的地址空间::
int madvise(addr, length, MADV_MERGEABLE)
应用程序当然也可以通过调用::
int madvise(addr, length, MADV_UNMERGEABLE)
来取消该请求,并恢复为非共享页面:此时KSM将去除合并在该范围内的任何合并页。注意:
这个去除合并的调用可能突然需要的内存量超过实际可用的内存量-那么可能会出现EAGAIN
失败,但更可能会唤醒OOM killer。
如果KSM未被配置到正在运行的内核中,则madvise MADV_MERGEABLE 和 MADV_UNMERGEABLE
的调用只会以EINVAL 失败。如果正在运行的内核是用CONFIG_KSM=y方式构建的,那么这些
调用通常会成功:即使KSM守护程序当前没有运行,MADV_MERGEABLE 仍然会在KSM守护程序
启动时注册范围,即使该范围不能包含KSM实际可以合并的任何页面,即使MADV_UNMERGEABLE
应用于从未标记为MADV_MERGEABLE的范围。
如果一块内存区域必须被拆分为至少一个新的MADV_MERGEABLE区域或MADV_UNMERGEABLE区域,
当该进程将超过 ``vm.max_map_count`` 的设定,则madvise可能返回ENOMEM。(请参阅文档
Documentation/admin-guide/sysctl/vm.rst)。
与其他madvise调用一样,它们在用户地址空间的映射区域上使用:如果指定的范围包含未
映射的间隙(尽管在中间的映射区域工作),它们将报告ENOMEM,如果没有足够的内存用于
内部结构,则可能会因EAGAIN而失败。
KSM守护进程sysfs接口
====================
KSM守护进程可以由 ``/sys/kernel/mm/ksm/`` 中的sysfs文件控制,所有人都可以读取,但
只能由root用户写入。各接口解释如下:
pages_to_scan
ksmd进程进入睡眠前要扫描的页数。
例如, ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``
默认值:100(该值被选择用于演示目的)
sleep_millisecs
ksmd在下次扫描前应休眠多少毫秒
例如, ``echo 20 > /sys/kernel/mm/ksm/sleep_millisecs``
默认值:20(该值被选择用于演示目的)
merge_across_nodes
指定是否可以合并来自不同NUMA节点的页面。当设置为0时,ksm仅合并在物理上位
于同一NUMA节点的内存区域中的页面。这降低了访问共享页面的延迟。在有明显的
NUMA距离上,具有更多节点的系统可能受益于设置该值为0时的更低延迟。而对于
需要对内存使用量最小化的较小系统来说,设置该值为1(默认设置)则可能会受
益于更大共享页面。在决定使用哪种设置之前,您可能希望比较系统在每种设置下
的性能。 ``merge_across_nodes`` 仅当系统中没有ksm共享页面时,才能被更改设
置:首先将接口 ``run`` 设置为2从而对页进行去合并,然后在修改
``merge_across_nodes`` 后再将 ``run`` 又设置为1,以根据新设置来重新合并。
默认值:1(如早期的发布版本一样跨节点合并)
run
* 设置为0可停止ksmd运行,但保留合并页面,
* 设置为1可运行ksmd,例如, ``echo 1 > /sys/kernel/mm/ksm/run`` ,
* 设置为2可停止ksmd运行,并且对所有目前已合并的页进行去合并,但保留可合并
区域以供下次运行。
默认值:0(必须设置为1才能激活KSM,除非禁用了CONFIG_SYSFS)
use_zero_pages
指定是否应当特殊处理空页(即那些仅含zero的已分配页)。当该值设置为1时,
空页与内核零页合并,而不是像通常情况下那样空页自身彼此合并。这可以根据
工作负载的不同,在具有着色零页的架构上可以提高性能。启用此设置时应小心,
因为它可能会降低某些工作负载的KSM性能,比如,当待合并的候选页面的校验和
与空页面的校验和恰好匹配的时候。此设置可随时更改,仅对那些更改后再合并
的页面有效。
默认值:0(如同早期版本的KSM正常表现)
max_page_sharing
单个KSM页面允许的最大共享站点数。这将强制执行重复数据消除限制,以避免涉
及遍历共享KSM页面的虚拟映射的虚拟内存操作的高延迟。最小值为2,因为新创
建的KSM页面将至少有两个共享者。该值越高,KSM合并内存的速度越快,去重
因子也越高,但是对于任何给定的KSM页面,虚拟映射的最坏情况遍历的速度也会
越慢。减慢了这种遍历速度就意味着在交换、压缩、NUMA平衡和页面迁移期间,
某些虚拟内存操作将有更高的延迟,从而降低这些虚拟内存操作调用者的响应能力。
其他任务如果不涉及执行虚拟映射遍历的VM操作,其任务调度延迟不受此参数的影
响,因为这些遍历本身是调度友好的。
stable_node_chains_prune_millisecs
指定KSM检查特定页面的元数据的频率(即那些达到过时信息数据去重限制标准的
页面)单位是毫秒。较小的毫秒值将以更低的延迟来释放KSM元数据,但它们将使
ksmd在扫描期间使用更多CPU。如果还没有一个KSM页面达到 ``max_page_sharing``
标准,那就没有什么用。
KSM与MADV_MERGEABLE的工作有效性体现于 ``/sys/kernel/mm/ksm/`` 路径下的接口:
pages_shared
表示多少共享页正在被使用
pages_sharing
表示还有多少站点正在共享这些共享页,即节省了多少
pages_unshared
表示有多少页是唯一的,但被反复检查以进行合并
pages_volatile
表示有多少页因变化太快而无法放在tree中
full_scans
表示所有可合并区域已扫描多少次
stable_node_chains
达到 ``max_page_sharing`` 限制的KSM页数
stable_node_dups
重复的KSM页数
比值 ``pages_sharing/pages_shared`` 的最大值受限制于 ``max_page_sharing``
的设定。要想增加该比值,则相应地要增加 ``max_page_sharing`` 的值。
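The ``pages_sharing/pages_shared`` ratio described above can be computed from the sysfs counters. This is a minimal sketch: `read_ksm_counters` and `sharing_ratio` are hypothetical helper names, and the directory is passed as a parameter so the logic can be tried without a KSM-enabled kernel.

```python
import os

KSM_DIR = '/sys/kernel/mm/ksm'
COUNTERS = ('pages_shared', 'pages_sharing', 'pages_unshared', 'pages_volatile')


def read_ksm_counters(base=KSM_DIR):
    """Read the KSM effectiveness counters from sysfs into a dict of ints."""
    counters = {}
    for name in COUNTERS:
        with open(os.path.join(base, name)) as f:
            counters[name] = int(f.read().strip())
    return counters


def sharing_ratio(counters):
    """Return pages_sharing / pages_shared, or 0.0 when nothing is shared."""
    shared = counters['pages_shared']
    if shared == 0:
        return 0.0
    return counters['pages_sharing'] / shared


if __name__ == '__main__' and os.path.isdir(KSM_DIR):
    print("sharing ratio: %.2f" % sharing_ratio(read_ksm_counters()))
```

A ratio that stays close to ``max_page_sharing`` would suggest that the limit, rather than the workload, is what bounds deduplication.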
......@@ -42,6 +42,7 @@
kref
assoc_array
xarray
rbtree
Todolist:
......@@ -49,7 +50,6 @@ Todolist:
idr
circular-buffers
rbtree
generic-radix-tree
packing
bus-virt-phys-mapping
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/devicetree/index.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
=============================
Open Firmware 和 Devicetree
=============================
该文档是整个设备树文档的总目录,标题中多是业内默认的术语,初见就翻译成中文,
晦涩难懂,因此尽量保留,后面翻译其子文档时,可能会根据语境,灵活地翻译为中文。
内核Devicetree的使用
=======================
.. toctree::
:maxdepth: 1
usage-model
of_unittest
Todolist:
* kernel-api
Devicetree Overlays
===================
.. toctree::
:maxdepth: 1
Todolist:
* changesets
* dynamic-resolution-notes
* overlay-notes
Devicetree Bindings
===================
.. toctree::
:maxdepth: 1
Todolist:
* bindings/index
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/devicetree/of_unittest.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
=================================
Open Firmware Devicetree 单元测试
=================================
作者: Gaurav Minocha <gaurav.minocha.os@gmail.com>
1. 概述
=======
本文档解释了执行 OF 单元测试所需的测试数据是如何动态地附加到实时树上的,与机器的架构无关。
建议在继续读下去之前,先阅读以下文件。
(1) Documentation/devicetree/usage-model.rst
(2) http://www.devicetree.org/Device_Tree_Usage
OF Selftest被设计用来测试提供给设备驱动开发者的接口(include/linux/of.h),以从未扁平
化的设备树数据结构中获取设备信息等。这个接口被大多数设备驱动在各种使用情况下使用。
2. 测试数据
===========
设备树源文件(drivers/of/unittest-data/testcases.dts)包含执行drivers/of/unittest.c
中自动化单元测试所需的测试数据。目前,以下设备树源包含文件(.dtsi)被包含在testcases.dts中::
drivers/of/unittest-data/tests-interrupts.dtsi
drivers/of/unittest-data/tests-platform.dtsi
drivers/of/unittest-data/tests-phandle.dtsi
drivers/of/unittest-data/tests-match.dtsi
当内核在启用OF_SELFTEST的情况下被构建时,那么下面的make规则::
$(obj)/%.dtb: $(src)/%.dts FORCE
$(call if_changed_dep, dtc)
用于将DT源文件(testcases.dts)编译成二进制blob(testcases.dtb),也被称为扁平化的DT。
之后,使用以下规则将上述二进制blob包装成一个汇编文件(testcases.dtb.S)::
$(obj)/%.dtb.S: $(obj)/%.dtb
$(call cmd, dt_S_dtb)
汇编文件被编译成一个对象文件(testcases.dtb.o),并被链接到内核镜像中。
2.1. 添加测试数据
-----------------
未扁平化的设备树结构体:
未扁平化的设备树由连接的设备节点组成,其树状结构形式如下所述::
// following struct members are used to construct the tree
struct device_node {
...
struct device_node *parent;
struct device_node *child;
struct device_node *sibling;
...
};
图1描述了一个机器的未扁平化设备树的通用结构,只考虑了子节点和同级指针。存在另一个指针,
``*parent`` ,用于反向遍历该树。因此,在一个特定的层次上,子节点和所有的兄弟姐妹节点将
有一个指向共同节点的父指针(例如,child1、sibling2、sibling3、sibling4的父指针指向
根节点)::
root ('/')
|
child1 -> sibling2 -> sibling3 -> sibling4 -> null
| | | |
| | | null
| | |
| | child31 -> sibling32 -> null
| | | |
| | null null
| |
| child21 -> sibling22 -> sibling23 -> null
| | | |
| null null null
|
child11 -> sibling12 -> sibling13 -> sibling14 -> null
| | | |
| | | null
| | |
null null child131 -> null
|
null
Figure 1: 未扁平化的设备树的通用结构
在执行OF单元测试之前,需要将测试数据附加到机器的设备树上(如果存在)。因此,当调用
selftest_data_add()时,首先会读取通过以下内核符号链接到内核镜像中的扁平化设备树
数据::
__dtb_testcases_begin - address marking the start of test data blob
__dtb_testcases_end - address marking the end of test data blob
其次,它调用of_fdt_unflatten_tree()来解除扁平化的blob。最后,如果机器的设备树
(即实时树)是存在的,那么它将未扁平化的测试数据树附加到实时树上,否则它将自己作为
实时设备树附加。
attach_node_and_children()使用of_attach_node()将节点附加到实时树上,如下所
述。为了解释这一点,图2中描述的测试数据树被附加到图1中描述的实时树上::
root ('/')
|
testcase-data
|
test-child0 -> test-sibling1 -> test-sibling2 -> test-sibling3 -> null
| | | |
test-child01 null null null
Figure 2: 将测试数据树附在实时树上的例子。
根据上面的方案,实时树已经存在,所以不需要附加根('/')节点。所有其他节点都是通过在
每个节点上调用of_attach_node()来附加的。
在函数of_attach_node()中,新的节点被附在实时树中给定的父节点的子节点上。但是,如
果父节点已经有了一个孩子,那么新节点就会取代当前的孩子,并将其变成其兄弟姐妹。因此,
当测试案例的数据节点被连接到上面的实时树(图1)时,最终的结构如图3所示::
root ('/')
|
testcase-data -> child1 -> sibling2 -> sibling3 -> sibling4 -> null
| | | | |
(...) | | | null
| | child31 -> sibling32 -> null
| | | |
| | null null
| |
| child21 -> sibling22 -> sibling23 -> null
| | | |
| null null null
|
child11 -> sibling12 -> sibling13 -> sibling14 -> null
| | | |
null null | null
|
child131 -> null
|
null
-----------------------------------------------------------------------
root ('/')
|
testcase-data -> child1 -> sibling2 -> sibling3 -> sibling4 -> null
| | | | |
| (...) (...) (...) null
|
test-sibling3 -> test-sibling2 -> test-sibling1 -> test-child0 -> null
| | | |
null null null test-child01
Figure 3: 附加测试案例数据后的实时设备树结构。
聪明的读者会注意到,与先前的结构相比,test-child0节点成为最后一个兄弟姐妹(图2)。
在连接了第一个test-child0节点之后,又连接了test-sibling1节点,该节点推动子节点
(即test-child0)成为兄弟姐妹,并使自己成为子节点,如上所述。
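The reversed sibling order described above follows directly from attaching each new node at the head of its parent's child list. The sketch below models only that behaviour. `Node`, `attach_node`, and `children` are hypothetical Python stand-ins for `struct device_node` and `of_attach_node()`, not the kernel code.

```python
class Node:
    """Toy stand-in for struct device_node (parent/child/sibling only)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.child = None
        self.sibling = None


def attach_node(parent, node):
    """Attach node as the first child; the old first child becomes its sibling."""
    node.parent = parent
    node.sibling = parent.child
    parent.child = node


def children(parent):
    """Return the child names in list order (child, then siblings)."""
    names = []
    node = parent.child
    while node is not None:
        names.append(node.name)
        node = node.sibling
    return names


testcase_data = Node('testcase-data')
for name in ('test-child0', 'test-sibling1', 'test-sibling2', 'test-sibling3'):
    attach_node(testcase_data, Node(name))
print(children(testcase_data))
```

Attaching the four test nodes in their original order leaves test-sibling3 first and test-child0 last, which matches Figure 3.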
如果发现一个重复的节点(即如果一个具有相同full_name属性的节点已经存在于实时树中),
那么该节点不会被附加,而是通过调用函数update_node_properties()将其属性更新到实时
树的节点中。
2.2. 删除测试数据
-----------------
一旦测试用例执行完,selftest_data_remove被调用,以移除最初连接的设备节点(首先是
叶子节点被分离,然后向上移动父节点被移除,最后是整个树)。selftest_data_remove()
调用detach_node_and_children(),使用of_detach_node()将节点从实时设备树上分离。
为了分离一个节点,of_detach_node()要么将给定节点的父节点的子节点指针更新为其同级节
点,要么根据情况将前一个同级节点附在给定节点的同级节点上。就这样吧。 :)
......@@ -18,6 +18,7 @@ RISC-V 体系结构
:maxdepth: 1
boot-image-header
vm-layout
pmu
patch-acceptance
......
......@@ -5,7 +5,7 @@
\renewcommand\thesection*
\renewcommand\thesubsection*
\kerneldocCJKon
\kerneldocBeginTC
\kerneldocBeginTC{
.. _linux_doc_zh_tw:
......@@ -174,4 +174,4 @@ TODOList:
.. raw:: latex
\kerneldocEndTC
}\kerneldocEndTC