Commit 6f38be8f authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'docs-5.17' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This isn't a hugely busy cycle for documentation, but a few
  significant things still showed up:

   - A documentation section for ARC processors

   - Reworked and enhanced KUnit documentation

   - The ability to pick your own theme for HTML builds; if the default
     "Read the Docs" theme isn't ugly enough for you, you can now pick
     an uglier one.

   - More Chinese translation work

  Plus the usual assortment of fixes and cleanups"

* tag 'docs-5.17' of git://git.lwn.net/linux: (53 commits)
  scripts: sphinx-pre-install: Fix ctex support on Debian
  docs: discourage use of list tables
  docs: 5.Posting.rst: describe Fixes: and Link: tags
  Documentation: kgdb: Replace deprecated remotebaud
  docs: automarkup.py: Fix invalid HTML link output and broken URI fragments
  Documentation: refer to config RANDOMIZE_BASE for kernel address-space randomization
  Documentation: kgdb: properly capitalize the MAGIC_SYSRQ config
  docs/zh_CN: Update and fix a couple of typos
  scripts: sphinx-pre-install: add required ctex dependency
  Documentation: KUnit: Restyled Frequently Asked Questions
  Documentation: KUnit: Restyle Test Style and Nomenclature page
  Documentation: KUnit: Rework writing page to focus on writing tests
  Documentation: kunit: Reorganize documentation related to running tests
  Documentation: KUnit: Added KUnit Architecture
  Documentation: KUnit: Rewrite getting started
  Documentation: KUnit: Rewrite main page
  docs/zh_CN: Add zh_CN/accounting/delay-accounting.rst
  Documentation/sphinx: fix typos of "its"
  docs/zh_CN: Add sched-domains translation
  doc: fs: remove bdev_try_to_free_page related doc
  ...
parents 1be5bdf8 87d6576d
......@@ -19,6 +19,8 @@ endif
SPHINXBUILD = sphinx-build
SPHINXOPTS =
SPHINXDIRS = .
DOCS_THEME =
DOCS_CSS =
_SPHINXDIRS = $(sort $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)))
SPHINX_CONF = conf.py
PAPER =
......@@ -84,7 +86,10 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
-D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
$(ALLSPHINXOPTS) \
$(abspath $(srctree)/$(src)/$5) \
$(abspath $(BUILDDIR)/$3/$4)
$(abspath $(BUILDDIR)/$3/$4) && \
if [ "x$(DOCS_CSS)" != "x" ]; then \
cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \
fi
htmldocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
......@@ -154,4 +159,8 @@ dochelp:
@echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
@echo ' configuration. This is e.g. useful to build with nit-picking config.'
@echo
@echo ' make DOCS_THEME={sphinx-theme} selects a different Sphinx theme.'
@echo
@echo ' make DOCS_CSS={a .css file} adds a DOCS_CSS override file for html/epub output.'
@echo
@echo ' Default location for the generated documents is Documentation/output'
......@@ -468,7 +468,7 @@ Spectre variant 2
before invoking any firmware code to prevent Spectre variant 2 exploits
using the firmware.
Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y
Using kernel address space randomization (CONFIG_RANDOMIZE_BASE=y
and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
attacks on the kernel generally more difficult.
......
.. SPDX-License-Identifier: GPL-2.0
Linux kernel for ARC processors
*******************************
Other sources of information
############################
Below are some resources where more information can be found on
ARC processors and relevant open source projects.
- `<https://embarc.org>`_ - Community portal for open source on ARC.
Good place to start to find relevant FOSS projects, toolchain releases,
news items and more.
- `<https://github.com/foss-for-synopsys-dwc-arc-processors>`_ -
Home for all development activities regarding open source projects for
ARC processors. Some of the projects are forks of various upstream projects,
where "work in progress" is hosted prior to submission to upstream projects.
Other projects are developed by Synopsys and made available to community
as open source for use on ARC Processors.
- `Official Synopsys ARC Processors website
<https://www.synopsys.com/designware-ip/processor-solutions.html>`_ -
location, with access to some IP documentation (`Programmer's Reference
Manual, AKA PRM for ARC HS processors
<https://www.synopsys.com/dw/doc.php/ds/cc/programmers-reference-manual-ARC-HS.pdf>`_)
and free versions of some commercial tools (`Free nSIM
<https://www.synopsys.com/cgi-bin/dwarcnsim/req1.cgi>`_ and
`MetaWare Light Edition <https://www.synopsys.com/cgi-bin/arcmwtk_lite/reg1.cgi>`_).
Please note though, registration is required to access both the documentation and
the tools.
Important note on ARC processors configurability
################################################
ARC processors are highly configurable and several configurable options
are supported in Linux. Some options are transparent to software
(i.e cache geometries, some can be detected at runtime and configured
and used accordingly, while some need to be explicitly selected or configured
in the kernel's configuration utility (AKA "make menuconfig").
However not all configurable options are supported when an ARC processor
is to run Linux. SoC design teams should refer to "Appendix E:
Configuration for ARC Linux" in the ARC HS Databook for configurability
guidelines.
Following these guidelines and selecting valid configuration options
up front is critical to help prevent any unwanted issues during
SoC bringup and software development in general.
Building the Linux kernel for ARC processors
############################################
The process of kernel building for ARC processors is the same as for any other
architecture and could be done in 2 ways:
- Cross-compilation: process of compiling for ARC targets on a development
host with a different processor architecture (generally x86_64/amd64).
- Native compilation: process of compiling for ARC on a ARC platform
(hardware board or a simulator like QEMU) with complete development environment
(GNU toolchain, dtc, make etc) installed on the platform.
In both cases, up-to-date GNU toolchain for ARC for the host is needed.
Synopsys offers prebuilt toolchain releases which can be used for this purpose,
available from:
- Synopsys GNU toolchain releases:
`<https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/releases>`_
- Linux kernel compilers collection:
`<https://mirrors.edge.kernel.org/pub/tools/crosstool>`_
- Bootlin's toolchain collection: `<https://toolchains.bootlin.com>`_
Once the toolchain is installed in the system, make sure its "bin" folder
is added in your ``PATH`` environment variable. Then set ``ARCH=arc`` &
``CROSS_COMPILE=arc-linux`` (or whatever matches installed ARC toolchain prefix)
and then as usual ``make defconfig && make``.
This will produce "vmlinux" file in the root of the kernel source tree
usable for loading on the target system via JTAG.
If you need to get an image usable with U-Boot bootloader,
type ``make uImage`` and ``uImage`` will be produced in ``arch/arc/boot``
folder.
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features arc
===================
ARC architecture
===================
.. toctree::
:maxdepth: 1
arc
features
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
......@@ -9,6 +9,7 @@ implementation.
.. toctree::
:maxdepth: 2
arc/index
arm/index
arm64/index
ia64/index
......
......@@ -208,16 +208,86 @@ highlight_language = 'none'
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
# The Read the Docs theme is available from
# - https://github.com/snide/sphinx_rtd_theme
# - https://pypi.python.org/pypi/sphinx_rtd_theme
# - python-sphinx-rtd-theme package (on Debian)
try:
# Default theme
html_theme = 'sphinx_rtd_theme'
html_css_files = []
if "DOCS_THEME" in os.environ:
html_theme = os.environ["DOCS_THEME"]
if html_theme == 'sphinx_rtd_theme' or html_theme == 'sphinx_rtd_dark_mode':
# Read the Docs theme
try:
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
except ImportError:
sys.stderr.write('Warning: The Sphinx \'sphinx_rtd_theme\' HTML theme was not found. Make sure you have the theme installed to produce pretty HTML output. Falling back to the default theme.\n')
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_css_files = [
'theme_overrides.css',
]
# Read the Docs dark mode override theme
if html_theme == 'sphinx_rtd_dark_mode':
try:
import sphinx_rtd_dark_mode
extensions.append('sphinx_rtd_dark_mode')
except ImportError:
html_theme == 'sphinx_rtd_theme'
if html_theme == 'sphinx_rtd_theme':
# Add color-specific RTD normal mode
html_css_files.append('theme_rtd_colors.css')
except ImportError:
html_theme = 'classic'
if "DOCS_CSS" in os.environ:
css = os.environ["DOCS_CSS"].split(" ")
for l in css:
html_css_files.append(l)
if major <= 1 and minor < 8:
html_context = {
'css_files': [],
}
for l in html_css_files:
html_context['css_files'].append('_static/' + l)
if html_theme == 'classic':
html_theme_options = {
'rightsidebar': False,
'stickysidebar': True,
'collapsiblesidebar': True,
'externalrefs': False,
'footerbgcolor': "white",
'footertextcolor': "white",
'sidebarbgcolor': "white",
'sidebarbtncolor': "black",
'sidebartextcolor': "black",
'sidebarlinkcolor': "#686bff",
'relbarbgcolor': "#133f52",
'relbartextcolor': "white",
'relbarlinkcolor': "white",
'bgcolor': "white",
'textcolor': "black",
'headbgcolor': "#f2f2f2",
'headtextcolor': "#20435c",
'headlinkcolor': "#c60f0f",
'linkcolor': "#355f7c",
'visitedlinkcolor': "#355f7c",
'codebgcolor': "#3f3f3f",
'codetextcolor': "white",
'bodyfont': "serif",
'headfont': "sans-serif",
}
sys.stderr.write("Using %s theme\n" % html_theme)
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
......@@ -246,20 +316,8 @@ except ImportError:
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['sphinx-static']
html_css_files = [
'theme_overrides.css',
]
if major <= 1 and minor < 8:
html_context = {
'css_files': [
'_static/theme_overrides.css',
],
}
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
......
......@@ -32,6 +32,7 @@ Documentation/dev-tools/testing-overview.rst
kgdb
kselftest
kunit/index
ktap
.. only:: subproject and html
......
......@@ -402,7 +402,7 @@ This is a quick example of how to use kdb.
2. Enter the kernel debugger manually or by waiting for an oops or
fault. There are several ways you can enter the kernel debugger
manually; all involve using the :kbd:`SysRq-G`, which means you must have
enabled ``CONFIG_MAGIC_SysRq=y`` in your kernel config.
enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
- When logged in as root or with a super user session you can run::
......@@ -461,7 +461,7 @@ This is a quick example of how to use kdb with a keyboard.
2. Enter the kernel debugger manually or by waiting for an oops or
fault. There are several ways you can enter the kernel debugger
manually; all involve using the :kbd:`SysRq-G`, which means you must have
enabled ``CONFIG_MAGIC_SysRq=y`` in your kernel config.
enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
- When logged in as root or with a super user session you can run::
......@@ -557,7 +557,7 @@ Connecting with gdb to a serial port
Example (using a directly connected port)::
% gdb ./vmlinux
(gdb) set remotebaud 115200
(gdb) set serial baud 115200
(gdb) target remote /dev/ttyS0
......
.. SPDX-License-Identifier: GPL-2.0
========================================
The Kernel Test Anything Protocol (KTAP)
========================================
TAP, or the Test Anything Protocol is a format for specifying test results used
by a number of projects. It's website and specification are found at this `link
<https://testanything.org/>`_. The Linux Kernel largely uses TAP output for test
results. However, Kernel testing frameworks have special needs for test results
which don't align with the original TAP specification. Thus, a "Kernel TAP"
(KTAP) format is specified to extend and alter TAP to support these use-cases.
This specification describes the generally accepted format of KTAP as it is
currently used in the kernel.
KTAP test results describe a series of tests (which may be nested: i.e., test
can have subtests), each of which can contain both diagnostic data -- e.g., log
lines -- and a final result. The test structure and results are
machine-readable, whereas the diagnostic data is unstructured and is there to
aid human debugging.
KTAP output is built from four different types of lines:
- Version lines
- Plan lines
- Test case result lines
- Diagnostic lines
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14, KTAP diverges from this in
a couple of places (notably the "Subtest" header), which are described where
relevant later in this document.
Version lines
-------------
All KTAP-formatted results begin with a "version line" which specifies which
version of the (K)TAP standard the result is compliant with.
For example:
- "KTAP version 1"
- "TAP version 13"
- "TAP version 14"
Note that, in KTAP, subtests also begin with a version line, which denotes the
start of the nested test results. This differs from TAP14, which uses a
separate "Subtest" line.
While, going forward, "KTAP version 1" should be used by compliant tests, it
is expected that most parsers and other tooling will accept the other versions
listed here for compatibility with existing tests and frameworks.
Plan lines
----------
A test plan provides the number of tests (or subtests) in the KTAP output.
Plan lines must follow the format of "1..N" where N is the number of tests or subtests.
Plan lines follow version lines to indicate the number of nested tests.
While there are cases where the number of tests is not known in advance -- in
which case the test plan may be omitted -- it is strongly recommended one is
present where possible.
Test case result lines
----------------------
Test case result lines indicate the final status of a test.
They are required and must have the format:
.. code-block::
<result> <number> [<description>][ # [<directive>] [<diagnostic data>]]
The result can be either "ok", which indicates the test case passed,
or "not ok", which indicates that the test case failed.
<number> represents the number of the test being performed. The first test must
have the number 1 and the number then must increase by 1 for each additional
subtest within the same test at the same nesting level.
The description is a description of the test, generally the name of
the test, and can be any string of words (can't include #). The
description is optional, but recommended.
The directive and any diagnostic data is optional. If either are present, they
must follow a hash sign, "#".
A directive is a keyword that indicates a different outcome for a test other
than passed and failed. The directive is optional, and consists of a single
keyword preceding the diagnostic data. In the event that a parser encounters
a directive it doesn't support, it should fall back to the "ok" / "not ok"
result.
Currently accepted directives are:
- "SKIP", which indicates a test was skipped (note the result of the test case
result line can be either "ok" or "not ok" if the SKIP directive is used)
- "TODO", which indicates that a test is not expected to pass at the moment,
e.g. because the feature it is testing is known to be broken. While this
directive is inherited from TAP, its use in the kernel is discouraged.
- "XFAIL", which indicates that a test is expected to fail. This is similar
to "TODO", above, and is used by some kselftest tests.
- “TIMEOUT”, which indicates a test has timed out (note the result of the test
case result line should be “not ok” if the TIMEOUT directive is used)
- “ERROR”, which indicates that the execution of a test has failed due to a
specific error that is included in the diagnostic data. (note the result of
the test case result line should be “not ok” if the ERROR directive is used)
The diagnostic data is a plain-text field which contains any additional details
about why this result was produced. This is typically an error message for ERROR
or failed tests, or a description of missing dependencies for a SKIP result.
The diagnostic data field is optional, and results which have neither a
directive nor any diagnostic data do not need to include the "#" field
separator.
Example result lines include:
.. code-block::
ok 1 test_case_name
The test "test_case_name" passed.
.. code-block::
not ok 1 test_case_name
The test "test_case_name" failed.
.. code-block::
ok 1 test # SKIP necessary dependency unavailable
The test "test" was SKIPPED with the diagnostic message "necessary dependency
unavailable".
.. code-block::
not ok 1 test # TIMEOUT 30 seconds
The test "test" timed out, with diagnostic data "30 seconds".
.. code-block::
ok 5 check return code # rcode=0
The test "check return code" passed, with additional diagnostic data “rcode=0”
Diagnostic lines
----------------
If tests wish to output any further information, they should do so using
"diagnostic lines". Diagnostic lines are optional, freeform text, and are
often used to describe what is being tested and any intermediate results in
more detail than the final result and diagnostic data line provides.
Diagnostic lines are formatted as "# <diagnostic_description>", where the
description can be any string. Diagnostic lines can be anywhere in the test
output. As a rule, diagnostic lines regarding a test are directly before the
test result line for that test.
Note that most tools will treat unknown lines (see below) as diagnostic lines,
even if they do not start with a "#": this is to capture any other useful
kernel output which may help debug the test. It is nevertheless recommended
that tests always prefix any diagnostic output they have with a "#" character.
Unknown lines
-------------
There may be lines within KTAP output that do not follow the format of one of
the four formats for lines described above. This is allowed, however, they will
not influence the status of the tests.
Nested tests
------------
In KTAP, tests can be nested. This is done by having a test include within its
output an entire set of KTAP-formatted results. This can be used to categorize
and group related tests, or to split out different results from the same test.
The "parent" test's result should consist of all of its subtests' results,
starting with another KTAP version line and test plan, and end with the overall
result. If one of the subtests fail, for example, the parent test should also
fail.
Additionally, all result lines in a subtest should be indented. One level of
indentation is two spaces: " ". The indentation should begin at the version
line and should end before the parent test's result line.
An example of a test with two nested subtests:
.. code-block::
KTAP version 1
1..1
KTAP version 1
1..2
ok 1 test_1
not ok 2 test_2
# example failed
not ok 1 example
An example format with multiple levels of nested testing:
.. code-block::
KTAP version 1
1..2
KTAP version 1
1..2
KTAP version 1
1..2
not ok 1 test_1
ok 2 test_2
not ok 1 test_3
ok 2 test_4 # SKIP
not ok 1 example_test_1
ok 2 example_test_2
Major differences between TAP and KTAP
--------------------------------------
Note the major differences between the TAP and KTAP specification:
- yaml and json are not recommended in diagnostic messages
- TODO directive not recognized
- KTAP allows for an arbitrary number of tests to be nested
The TAP14 specification does permit nested tests, but instead of using another
nested version line, uses a line of the form
"Subtest: <name>" where <name> is the name of the parent test.
Example KTAP output
--------------------
.. code-block::
KTAP version 1
1..1
KTAP version 1
1..3
KTAP version 1
1..1
# test_1: initializing test_1
ok 1 test_1
ok 1 example_test_1
KTAP version 1
1..2
ok 1 test_1 # SKIP test_1 skipped
ok 2 test_2
ok 2 example_test_2
KTAP version 1
1..3
ok 1 test_1
# test_2: FAIL
not ok 2 test_2
ok 3 test_3 # SKIP test_3 skipped
not ok 3 example_test_3
not ok 1 main_test
This output defines the following hierarchy:
A single test called "main_test", which fails, and has three subtests:
- "example_test_1", which passes, and has one subtest:
- "test_1", which passes, and outputs the diagnostic message "test_1: initializing test_1"
- "example_test_2", which passes, and has two subtests:
- "test_1", which is skipped, with the explanation "test_1 skipped"
- "test_2", which passes
- "example_test_3", which fails, and has three subtests
- "test_1", which passes
- "test_2", which outputs the diagnostic line "test_2: FAIL", and fails.
- "test_3", which is skipped with the explanation "test_3 skipped"
Note that the individual subtests with the same names do not conflict, as they
are found in different parent tests. This output also exhibits some sensible
rules for "bubbling up" test results: a test fails if any of its subtests fail.
Skipped tests do not affect the result of the parent test (though it often
makes sense for a test to be marked skipped if _all_ of its subtests have been
skipped).
See also:
---------
- The TAP specification:
https://testanything.org/tap-version-13-specification.html
- The (stagnant) TAP version 14 specification:
https://github.com/TestAnything/Specification/blob/tap-14-specification/specification.md
- The kselftest documentation:
Documentation/dev-tools/kselftest.rst
- The KUnit documentation:
Documentation/dev-tools/kunit/index.rst
.. SPDX-License-Identifier: GPL-2.0
==================
KUnit Architecture
==================
The KUnit architecture can be divided into two parts:
- Kernel testing library
- kunit_tool (Command line test harness)
In-Kernel Testing Framework
===========================
The kernel testing library supports KUnit tests written in C using
KUnit. KUnit tests are kernel code. KUnit does several things:
- Organizes tests
- Reports test results
- Provides test utilities
Test Cases
----------
The fundamental unit in KUnit is the test case. The KUnit test cases are
grouped into KUnit suites. A KUnit test case is a function with type
signature ``void (*)(struct kunit *test)``.
These test case functions are wrapped in a struct called
``struct kunit_case``. For code, see:
.. kernel-doc:: include/kunit/test.h
:identifiers: kunit_case
.. note:
``generate_params`` is optional for non-parameterized tests.
Each KUnit test case gets a ``struct kunit`` context
object passed to it that tracks a running test. The KUnit assertion
macros and other KUnit utilities use the ``struct kunit`` context
object. As an exception, there are two fields:
- ``->priv``: The setup functions can use it to store arbitrary test
user data.
- ``->param_value``: It contains the parameter value which can be
retrieved in the parameterized tests.
Test Suites
-----------
A KUnit suite includes a collection of test cases. The KUnit suites
are represented by the ``struct kunit_suite``. For example:
.. code-block:: c
static struct kunit_case example_test_cases[] = {
KUNIT_CASE(example_test_foo),
KUNIT_CASE(example_test_bar),
KUNIT_CASE(example_test_baz),
{}
};
static struct kunit_suite example_test_suite = {
.name = "example",
.init = example_test_init,
.exit = example_test_exit,
.test_cases = example_test_cases,
};
kunit_test_suite(example_test_suite);
In the above example, the test suite ``example_test_suite``, runs the
test cases ``example_test_foo``, ``example_test_bar``, and
``example_test_baz``. Before running the test, the ``example_test_init``
is called and after running the test, ``example_test_exit`` is called.
The ``kunit_test_suite(example_test_suite)`` registers the test suite
with the KUnit test framework.
Executor
--------
The KUnit executor can list and run built-in KUnit tests on boot.
The Test suites are stored in a linker section
called ``.kunit_test_suites``. For code, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/vmlinux.lds.h?h=v5.15#n945.
The linker section consists of an array of pointers to
``struct kunit_suite``, and is populated by the ``kunit_test_suites()``
macro. To run all tests compiled into the kernel, the KUnit executor
iterates over the linker section array.
.. kernel-figure:: kunit_suitememorydiagram.svg
:alt: KUnit Suite Memory
KUnit Suite Memory Diagram
On the kernel boot, the KUnit executor uses the start and end addresses
of this section to iterate over and run all tests. For code, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/kunit/executor.c
When built as a module, the ``kunit_test_suites()`` macro defines a
``module_init()`` function, which runs all the tests in the compilation
unit instead of utilizing the executor.
In KUnit tests, some error classes do not affect other tests
or parts of the kernel, each KUnit case executes in a separate thread
context. For code, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/kunit/try-catch.c?h=v5.15#n58
Assertion Macros
----------------
KUnit tests verify state using expectations/assertions.
All expectations/assertions are formatted as:
``KUNIT_{EXPECT|ASSERT}_<op>[_MSG](kunit, property[, message])``
- ``{EXPECT|ASSERT}`` determines whether the check is an assertion or an
expectation.
- For an expectation, if the check fails, marks the test as failed
and logs the failure.
- An assertion, on failure, causes the test case to terminate
immediately.
- Assertions call function:
``void __noreturn kunit_abort(struct kunit *)``.
- ``kunit_abort`` calls function:
``void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch)``.
- ``kunit_try_catch_throw`` calls function:
``void complete_and_exit(struct completion *, long) __noreturn;``
and terminates the special thread context.
- ``<op>`` denotes a check with options: ``TRUE`` (supplied property
has the boolean value “true”), ``EQ`` (two supplied properties are
equal), ``NOT_ERR_OR_NULL`` (supplied pointer is not null and does not
contain an “err” value).
- ``[_MSG]`` prints a custom message on failure.
Test Result Reporting
---------------------
KUnit prints test results in KTAP format. KTAP is based on TAP14, see:
https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md.
KTAP (yet to be standardized format) works with KUnit and Kselftest.
The KUnit executor prints KTAP results to dmesg, and debugfs
(if configured).
Parameterized Tests
-------------------
Each KUnit parameterized test is associated with a collection of
parameters. The test is invoked multiple times, once for each parameter
value and the parameter is stored in the ``param_value`` field.
The test case includes a ``KUNIT_CASE_PARAM()`` macro that accepts a
generator function.
The generator function is passed the previous parameter and returns the next
parameter. It also provides a macro to generate common-case generators based on
arrays.
For code, see:
.. kernel-doc:: include/kunit/test.h
:identifiers: KUNIT_ARRAY_PARAM
kunit_tool (Command Line Test Harness)
======================================
kunit_tool is a Python script ``(tools/testing/kunit/kunit.py)``
that can be used to configure, build, exec, parse and run (runs other
commands in order) test results. You can either run KUnit tests using
kunit_tool or can include KUnit in kernel and parse manually.
- ``configure`` command generates the kernel ``.config`` from a
``.kunitconfig`` file (and any architecture-specific options).
For some architectures, additional config options are specified in the
``qemu_config`` Python script
(For example: ``tools/testing/kunit/qemu_configs/powerpc.py``).
It parses both the existing ``.config`` and the ``.kunitconfig`` files
and ensures that ``.config`` is a superset of ``.kunitconfig``.
If this is not the case, it will combine the two and run
``make olddefconfig`` to regenerate the ``.config`` file. It then
verifies that ``.config`` is now a superset. This checks if all
Kconfig dependencies are correctly specified in ``.kunitconfig``.
``kunit_config.py`` includes the parsing Kconfigs code. The code which
runs ``make olddefconfig`` is a part of ``kunit_kernel.py``. You can
invoke this command via: ``./tools/testing/kunit/kunit.py config`` and
generate a ``.config`` file.
- ``build`` runs ``make`` on the kernel tree with required options
(depends on the architecture and some options, for example: build_dir)
and reports any errors.
To build a KUnit kernel from the current ``.config``, you can use the
``build`` argument: ``./tools/testing/kunit/kunit.py build``.
- ``exec`` command executes kernel results either directly (using
User-mode Linux configuration), or via an emulator such
as QEMU. It reads results from the log via standard
output (stdout), and passes them to ``parse`` to be parsed.
If you already have built a kernel with built-in KUnit tests,
you can run the kernel and display the test results with the ``exec``
argument: ``./tools/testing/kunit/kunit.py exec``.
- ``parse`` extracts the KTAP output from a kernel log, parses
the test results, and prints a summary. For failed tests, any
diagnostic output will be included.
......@@ -4,56 +4,55 @@
Frequently Asked Questions
==========================
How is this different from Autotest, kselftest, etc?
====================================================
How is this different from Autotest, kselftest, and so on?
==========================================================
KUnit is a unit testing framework. Autotest, kselftest (and some others) are
not.
A `unit test <https://martinfowler.com/bliki/UnitTest.html>`_ is supposed to
test a single unit of code in isolation, hence the name. A unit test should be
the finest granularity of testing and as such should allow all possible code
paths to be tested in the code under test; this is only possible if the code
under test is very small and does not have any external dependencies outside of
test a single unit of code in isolation and hence the name *unit test*. A unit
test should be the finest granularity of testing and should allow all possible
code paths to be tested in the code under test. This is only possible if the
code under test is small and does not have any external dependencies outside of
the test's control like hardware.
There are no testing frameworks currently available for the kernel that do not
require installing the kernel on a test machine or in a VM and all require
tests to be written in userspace and run on the kernel under test; this is true
for Autotest, kselftest, and some others, disqualifying any of them from being
considered unit testing frameworks.
require installing the kernel on a test machine or in a virtual machine. All
testing frameworks require tests to be written in userspace and run on the
kernel under test. This is true for Autotest, kselftest, and some others,
disqualifying any of them from being considered unit testing frameworks.
Does KUnit support running on architectures other than UML?
===========================================================
Yes, well, mostly.
Yes, mostly.
For the most part, the KUnit core framework (what you use to write the tests)
can compile to any architecture; it compiles like just another part of the
For the most part, the KUnit core framework (what we use to write the tests)
can compile to any architecture. It compiles like just another part of the
kernel and runs when the kernel boots, or when built as a module, when the
module is loaded. However, there is some infrastructure,
like the KUnit Wrapper (``tools/testing/kunit/kunit.py``) that does not support
other architectures.
module is loaded. However, there is infrastructure, like the KUnit Wrapper
(``tools/testing/kunit/kunit.py``) that does not support other architectures.
In short, this means that, yes, you can run KUnit on other architectures, but
it might require more work than using KUnit on UML.
In short, yes, you can run KUnit on other architectures, but it might require
more work than using KUnit on UML.
For more information, see :ref:`kunit-on-non-uml`.
What is the difference between a unit test and these other kinds of tests?
==========================================================================
What is the difference between a unit test and other kinds of tests?
====================================================================
Most existing tests for the Linux kernel would be categorized as an integration
test, or an end-to-end test.
- A unit test is supposed to test a single unit of code in isolation, hence the
name. A unit test should be the finest granularity of testing and as such
should allow all possible code paths to be tested in the code under test; this
is only possible if the code under test is very small and does not have any
external dependencies outside of the test's control like hardware.
- A unit test is supposed to test a single unit of code in isolation. A unit
test should be the finest granularity of testing and, as such, allows all
possible code paths to be tested in the code under test. This is only possible
if the code under test is small and does not have any external dependencies
outside of the test's control like hardware.
- An integration test tests the interaction between a minimal set of components,
usually just two or three. For example, someone might write an integration
test to test the interaction between a driver and a piece of hardware, or to
test the interaction between the userspace libraries the kernel provides and
the kernel itself; however, one of these tests would probably not test the
the kernel itself. However, one of these tests would probably not test the
entire kernel along with hardware interactions and interactions with the
userspace.
- An end-to-end test usually tests the entire system from the perspective of the
......@@ -62,26 +61,26 @@ test, or an end-to-end test.
hardware with a production userspace and then trying to exercise some behavior
that depends on interactions between the hardware, the kernel, and userspace.
KUnit isn't working, what should I do?
======================================
KUnit is not working, what should I do?
=======================================
Unfortunately, there are a number of things which can break, but here are some
things to try.
1. Try running ``./tools/testing/kunit/kunit.py run`` with the ``--raw_output``
1. Run ``./tools/testing/kunit/kunit.py run`` with the ``--raw_output``
parameter. This might show details or error messages hidden by the kunit_tool
parser.
2. Instead of running ``kunit.py run``, try running ``kunit.py config``,
``kunit.py build``, and ``kunit.py exec`` independently. This can help track
down where an issue is occurring. (If you think the parser is at fault, you
can run it manually against stdin or a file with ``kunit.py parse``.)
3. Running the UML kernel directly can often reveal issues or error messages
kunit_tool ignores. This should be as simple as running ``./vmlinux`` after
building the UML kernel (e.g., by using ``kunit.py build``). Note that UML
has some unusual requirements (such as the host having a tmpfs filesystem
mounted), and has had issues in the past when built statically and the host
has KASLR enabled. (On older host kernels, you may need to run ``setarch
`uname -m` -R ./vmlinux`` to disable KASLR.)
can run it manually against ``stdin`` or a file with ``kunit.py parse``.)
3. Running the UML kernel directly can often reveal issues or error messages,
``kunit_tool`` ignores. This should be as simple as running ``./vmlinux``
after building the UML kernel (for example, by using ``kunit.py build``).
Note that UML has some unusual requirements (such as the host having a tmpfs
filesystem mounted), and has had issues in the past when built statically and
the host has KASLR enabled. (On older host kernels, you may need to run
``setarch `uname -m` -R ./vmlinux`` to disable KASLR.)
4. Make sure the kernel .config has ``CONFIG_KUNIT=y`` and at least one test
(e.g. ``CONFIG_KUNIT_EXAMPLE_TEST=y``). kunit_tool will keep its .config
around, so you can see what config was used after running ``kunit.py run``.
......
.. SPDX-License-Identifier: GPL-2.0
=========================================
KUnit - Unit Testing for the Linux Kernel
=========================================
=================================
KUnit - Linux Kernel Unit Testing
=================================
.. toctree::
:maxdepth: 2
:caption: Contents:
start
architecture
run_wrapper
run_manual
usage
kunit-tool
api/index
......@@ -16,82 +20,94 @@ KUnit - Unit Testing for the Linux Kernel
tips
running_tips
What is KUnit?
==============
KUnit is a lightweight unit testing framework for the Linux kernel.
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
cases, grouping related test cases into test suites, providing common
infrastructure for running tests, and much more.
KUnit consists of a kernel component, which provides a set of macros for easily
writing unit tests. Tests written against KUnit will run on kernel boot if
built-in, or when loaded if built as a module. These tests write out results to
the kernel log in `TAP <https://testanything.org/>`_ format.
To make running these tests (and reading the results) easier, KUnit offers
:doc:`kunit_tool <kunit-tool>`, which builds a `User Mode Linux
<http://user-mode-linux.sourceforge.net>`_ kernel, runs it, and parses the test
results. This provides a quick way of running KUnit tests during development,
without requiring a virtual machine or separate hardware.
Get started now: Documentation/dev-tools/kunit/start.rst
Why KUnit?
==========
A unit test is supposed to test a single unit of code in isolation, hence the
name. A unit test should be the finest granularity of testing and as such should
allow all possible code paths to be tested in the code under test; this is only
possible if the code under test is very small and does not have any external
dependencies outside of the test's control like hardware.
KUnit provides a common framework for unit tests within the kernel.
KUnit tests can be run on most architectures, and most tests are architecture
independent. All built-in KUnit tests run on kernel startup. Alternatively,
KUnit and KUnit tests can be built as modules and tests will run when the test
module is loaded.
.. note::
KUnit can also run tests without needing a virtual machine or actual
hardware under User Mode Linux. User Mode Linux is a Linux architecture,
like ARM or x86, which compiles the kernel as a Linux executable. KUnit
can be used with UML either by building with ``ARCH=um`` (like any other
architecture), or by using :doc:`kunit_tool <kunit-tool>`.
KUnit is fast. Excluding build time, from invocation to completion KUnit can run
several dozen tests in only 10 to 20 seconds; this might not sound like a big
deal to some people, but having such fast and easy to run tests fundamentally
changes the way you go about testing and even writing code in the first place.
Linus himself said in his `git talk at Google
<https://gist.github.com/lorn/1272686/revisions#diff-53c65572127855f1b003db4064a94573R874>`_:
"... a lot of people seem to think that performance is about doing the
same thing, just doing it faster, and that is not true. That is not what
performance is all about. If you can do something really fast, really
well, people will start using it differently."
In this context Linus was talking about branching and merging,
but this point also applies to testing. If your tests are slow, unreliable, are
difficult to write, and require a special setup or special hardware to run,
then you wait a lot longer to write tests, and you wait a lot longer to run
tests; this means that tests are likely to break, unlikely to test a lot of
things, and are unlikely to be rerun once they pass. If your tests are really
fast, you run them all the time, every time you make a change, and every time
someone sends you some code. Why trust that someone ran all their tests
correctly on every change when you can just run them yourself in less time than
it takes to read their test log?
This section details the kernel unit testing framework.
Introduction
============
KUnit (Kernel unit testing framework) provides a common framework for
unit tests within the Linux kernel. Using KUnit, you can define groups
of test cases called test suites. The tests either run on kernel boot
if built-in, or load as a module. KUnit automatically flags and reports
failed test cases in the kernel log. The test results appear in `TAP
(Test Anything Protocol) format <https://testanything.org/>`_. It is inspired by
JUnit, Python’s unittest.mock, and GoogleTest/GoogleMock (C++ unit testing
framework).
KUnit tests are part of the kernel, written in the C (programming)
language, and test parts of the Kernel implementation (example: a C
language function). Excluding build time, from invocation to
completion, KUnit can run around 100 tests in less than 10 seconds.
KUnit can test any kernel component, for example: file system, system
calls, memory management, device drivers and so on.
KUnit follows the white-box testing approach. The test has access to
internal system functionality. KUnit runs in kernel space and is not
restricted to things exposed to user-space.
In addition, KUnit has kunit_tool, a script (``tools/testing/kunit/kunit.py``)
that configures the Linux kernel, runs KUnit tests under QEMU or UML (`User Mode
Linux <http://user-mode-linux.sourceforge.net/>`_), parses the test results and
displays them in a user friendly manner.
Features
--------
- Provides a framework for writing unit tests.
- Runs tests on any kernel architecture.
- Runs a test in milliseconds.
Prerequisites
-------------
- Any Linux kernel compatible hardware.
- For Kernel under test, Linux kernel version 5.5 or greater.
Unit Testing
============
A unit test tests a single unit of code in isolation. A unit test is the finest
granularity of testing and allows all possible code paths to be tested in the
code under test. This is possible if the code under test is small and does not
have any external dependencies outside of the test's control like hardware.
Write Unit Tests
----------------
To write good unit tests, there is a simple but powerful pattern:
Arrange-Act-Assert. This is a great way to structure test cases and
defines an order of operations.
- Arrange inputs and targets: At the start of the test, arrange the data
that allows a function to work. Example: initialize a statement or
object.
- Act on the target behavior: Call your function/code under test.
- Assert expected outcome: Verify that the result (or resulting state) is as
expected.
Unit Testing Advantages
-----------------------
- Increases testing speed and development in the long run.
- Detects bugs at initial stage and therefore decreases bug fix cost
compared to acceptance testing.
- Improves code quality.
- Encourages writing testable code.
How do I use it?
================
* Documentation/dev-tools/kunit/start.rst - for new users of KUnit
* Documentation/dev-tools/kunit/tips.rst - for short examples of best practices
* Documentation/dev-tools/kunit/usage.rst - for a more detailed explanation of KUnit features
* Documentation/dev-tools/kunit/api/index.rst - for the list of KUnit APIs used for testing
* Documentation/dev-tools/kunit/kunit-tool.rst - for more information on the kunit_tool helper script
* Documentation/dev-tools/kunit/faq.rst - for answers to some common questions about KUnit
* Documentation/dev-tools/kunit/start.rst - for KUnit new users.
* Documentation/dev-tools/kunit/architecture.rst - KUnit architecture.
* Documentation/dev-tools/kunit/run_wrapper.rst - run kunit_tool.
* Documentation/dev-tools/kunit/run_manual.rst - run tests without kunit_tool.
* Documentation/dev-tools/kunit/usage.rst - write tests.
* Documentation/dev-tools/kunit/tips.rst - best practices with
examples.
* Documentation/dev-tools/kunit/api/index.rst - KUnit APIs
used for testing.
* Documentation/dev-tools/kunit/kunit-tool.rst - kunit_tool helper
script.
* Documentation/dev-tools/kunit/faq.rst - KUnit common questions and
answers.
<?xml version="1.0" encoding="UTF-8"?>
<svg width="796.93" height="555.73" version="1.1" viewBox="0 0 796.93 555.73" xmlns="http://www.w3.org/2000/svg">
<g transform="translate(-13.724 -17.943)">
<g fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a">
<rect x="323.56" y="18.443" width="115.75" height="41.331"/>
<rect x="323.56" y="463.09" width="115.75" height="41.331"/>
<rect x="323.56" y="531.84" width="115.75" height="41.331"/>
<rect x="323.56" y="88.931" width="115.75" height="74.231"/>
</g>
<g>
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
</g>
<g transform="translate(0 -258.6)">
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
</g>
<g transform="translate(0 -217.27)">
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
</g>
<g transform="translate(0 -175.94)">
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
</g>
<g transform="translate(0 -134.61)">
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
</g>
<g transform="translate(0 -41.331)">
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
</g>
<g transform="translate(3.4459e-5 -.71088)">
<rect x="502.19" y="143.16" width="201.13" height="41.331" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
<text x="512.02319" y="168.02026" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="512.02319" y="168.02026" font-family="monospace">_kunit_suites_start</tspan></text>
</g>
<g transform="translate(3.0518e-5 -3.1753)">
<rect x="502.19" y="445.69" width="201.13" height="41.331" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
<text x="521.61694" y="470.54846" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="521.61694" y="470.54846" font-family="monospace">_kunit_suites_end</tspan></text>
</g>
<rect x="14.224" y="277.78" width="134.47" height="41.331" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
<text x="32.062176" y="304.41287" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="32.062176" y="304.41287" font-family="monospace">.init.data</tspan></text>
<g transform="translate(217.98 145.12)" stroke="#1a1a1a">
<circle cx="149.97" cy="373.01" r="3.4012"/>
<circle cx="163.46" cy="373.01" r="3.4012"/>
<circle cx="176.95" cy="373.01" r="3.4012"/>
</g>
<g transform="translate(217.98 -298.66)" stroke="#1a1a1a">
<circle cx="149.97" cy="373.01" r="3.4012"/>
<circle cx="163.46" cy="373.01" r="3.4012"/>
<circle cx="176.95" cy="373.01" r="3.4012"/>
</g>
<g stroke="#1a1a1a">
<rect x="323.56" y="328.49" width="115.75" height="51.549" fill="#b9dbc6"/>
<g transform="translate(217.98 -18.75)">
<circle cx="149.97" cy="373.01" r="3.4012"/>
<circle cx="163.46" cy="373.01" r="3.4012"/>
<circle cx="176.95" cy="373.01" r="3.4012"/>
</g>
</g>
<g transform="scale(1.0933 .9147)" stroke-width="32.937" aria-label="{">
<path d="m275.49 545.57c-35.836-8.432-47.43-24.769-47.957-64.821v-88.536c-0.527-44.795-10.54-57.97-49.538-67.456 38.998-10.013 49.011-23.715 49.538-67.983v-88.536c0.527-40.052 12.121-56.389 47.957-64.821v-5.797c-65.348 0-85.901 17.391-86.955 73.253v93.806c-0.527 36.89-10.013 50.065-44.795 59.551 34.782 10.013 44.268 23.188 44.795 60.078v93.279c1.581 56.389 21.607 73.78 86.955 73.78z"/>
</g>
<g transform="scale(1.1071 .90325)" stroke-width="14.44" aria-label="{">
<path d="m461.46 443.55c-15.711-3.6967-20.794-10.859-21.025-28.418v-38.815c-0.23104-19.639-4.6209-25.415-21.718-29.574 17.097-4.3898 21.487-10.397 21.718-29.805v-38.815c0.23105-17.559 5.314-24.722 21.025-28.418v-2.5415c-28.649 0-37.66 7.6244-38.122 32.115v41.126c-0.23105 16.173-4.3898 21.949-19.639 26.108 15.249 4.3898 19.408 10.166 19.639 26.339v40.895c0.69313 24.722 9.4728 32.346 38.122 32.346z"/>
</g>
<path d="m449.55 161.84v2.5h49.504v-2.5z" color="#000000" style="-inkscape-stroke:none"/>
<g fill-rule="evenodd">
<path d="m443.78 163.09 8.65-5v10z" color="#000000" stroke-width="1pt" style="-inkscape-stroke:none"/>
<path d="m453.1 156.94-10.648 6.1543 0.99804 0.57812 9.6504 5.5781zm-1.334 2.3125v7.6856l-6.6504-3.8438z" color="#000000" style="-inkscape-stroke:none"/>
</g>
<path d="m449.55 461.91v2.5h49.504v-2.5z" color="#000000" style="-inkscape-stroke:none"/>
<g fill-rule="evenodd">
<path d="m443.78 463.16 8.65-5v10z" color="#000000" stroke-width="1pt" style="-inkscape-stroke:none"/>
<path d="m453.1 457-10.648 6.1562 0.99804 0.57617 9.6504 5.5781zm-1.334 2.3125v7.6856l-6.6504-3.8438z" color="#000000" style="-inkscape-stroke:none"/>
</g>
<rect x="515.64" y="223.9" width="294.52" height="178.49" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
<text x="523.33319" y="262.52542" font-family="monospace" font-size="14.667px" style="line-height:1.25" xml:space="preserve"><tspan x="523.33319" y="262.52542"><tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit_suite {</tspan><tspan x="523.33319" y="280.8588"><tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold"> const char</tspan> name[<tspan fill="#ff00ff" font-size="14.667px">256</tspan>];</tspan><tspan x="523.33319" y="299.19217"> <tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">int</tspan> (*init)(<tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit *);</tspan><tspan x="523.33319" y="317.52554"> <tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">void</tspan> (*exit)(<tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit *);</tspan><tspan x="523.33319" y="335.85892"> <tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit_case *test_cases;</tspan><tspan x="523.33319" y="354.19229"> ...</tspan><tspan x="523.33319" y="372.52567">};</tspan></text>
</g>
</svg>
.. SPDX-License-Identifier: GPL-2.0
============================
Run Tests without kunit_tool
============================
If we do not want to use kunit_tool (For example: we want to integrate
with other systems, or run tests on real hardware), we can
include KUnit in any kernel, read out results, and parse manually.
.. note:: KUnit is not designed for use in a production system. It is
possible that tests may reduce the stability or security of
the system.
Configure the Kernel
====================
KUnit tests can run without kunit_tool. This can be useful, if:
- We have an existing kernel configuration to test.
- Need to run on real hardware (or using an emulator/VM kunit_tool
does not support).
- Wish to integrate with some existing testing systems.
KUnit is configured with the ``CONFIG_KUNIT`` option, and individual
tests can also be built by enabling their config options in our
``.config``. KUnit tests usually (but don't always) have config options
ending in ``_KUNIT_TEST``. Most tests can either be built as a module,
or be built into the kernel.
.. note ::
We can enable the ``KUNIT_ALL_TESTS`` config option to
automatically enable all tests with satisfied dependencies. This is
a good way of quickly testing everything applicable to the current
config.
Once we have built our kernel (and/or modules), it is simple to run
the tests. If the tests are built-in, they will run automatically on the
kernel boot. The results will be written to the kernel log (``dmesg``)
in TAP format.
If the tests are built as modules, they will run when the module is
loaded.
.. code-block :: bash
# modprobe example-test
The results will appear in TAP format in ``dmesg``.
.. note ::
If ``CONFIG_KUNIT_DEBUGFS`` is enabled, KUnit test results will
be accessible from the ``debugfs`` filesystem (if mounted).
They will be in ``/sys/kernel/debug/kunit/<test_suite>/results``, in
TAP format.
.. SPDX-License-Identifier: GPL-2.0
=========================
Run Tests with kunit_tool
=========================
We can either run KUnit tests using kunit_tool or can run tests
manually, and then use kunit_tool to parse the results. To run tests
manually, see: Documentation/dev-tools/kunit/run_manual.rst.
As long as we can build the kernel, we can run KUnit.
kunit_tool is a Python script which configures and builds a kernel, runs
tests, and formats the test results.
Run command:
.. code-block::
./tools/testing/kunit/kunit.py run
We should see the following:
.. code-block::
Generating .config...
Building KUnit kernel...
Starting KUnit kernel...
We may want to use the following options:
.. code-block::
./tools/testing/kunit/kunit.py run --timeout=30 --jobs=`nproc --all
- ``--timeout`` sets a maximum amount of time for tests to run.
- ``--jobs`` sets the number of threads to build the kernel.
kunit_tool will generate a ``.kunitconfig`` with a default
configuration, if no other ``.kunitconfig`` file exists
(in the build directory). In addition, it verifies that the
generated ``.config`` file contains the ``CONFIG`` options in the
``.kunitconfig``.
It is also possible to pass a separate ``.kunitconfig`` fragment to
kunit_tool. This is useful if we have several different groups of
tests we want to run independently, or if we want to use pre-defined
test configs for certain subsystems.
To use a different ``.kunitconfig`` file (such as one
provided to test a particular subsystem), pass it as an option:
.. code-block::
./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig
To view kunit_tool flags (optional command-line arguments), run:
.. code-block::
./tools/testing/kunit/kunit.py run --help
Create a ``.kunitconfig`` File
===============================
If we want to run a specific set of tests (rather than those listed
in the KUnit ``defconfig``), we can provide Kconfig options in the
``.kunitconfig`` file. For default .kunitconfig, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/kunit/configs/default.config.
A ``.kunitconfig`` is a ``minconfig`` (a .config
generated by running ``make savedefconfig``), used for running a
specific set of tests. This file contains the regular Kernel configs
with specific test targets. The ``.kunitconfig`` also
contains any other config options required by the tests (For example:
dependencies for features under tests, configs that enable/disable
certain code blocks, arch configs and so on).
To create a ``.kunitconfig``, using the KUnit ``defconfig``:
.. code-block::
cd $PATH_TO_LINUX_REPO
cp tools/testing/kunit/configs/default.config .kunit/.kunitconfig
We can then add any other Kconfig options. For example:
.. code-block::
CONFIG_LIST_KUNIT_TEST=y
kunit_tool ensures that all config options in ``.kunitconfig`` are
set in the kernel ``.config`` before running the tests. It warns if we
have not included the options dependencies.
.. note:: Removing something from the ``.kunitconfig`` will
not rebuild the ``.config file``. The configuration is only
updated if the ``.kunitconfig`` is not a subset of ``.config``.
This means that we can use other tools
(For example: ``make menuconfig``) to adjust other config options.
The build dir needs to be set for ``make menuconfig`` to
work, therefore by default use ``make O=.kunit menuconfig``.
Configure, Build, and Run Tests
===============================
If we want to make manual changes to the KUnit build process, we
can run part of the KUnit build process independently.
When running kunit_tool, from a ``.kunitconfig``, we can generate a
``.config`` by using the ``config`` argument:
.. code-block::
./tools/testing/kunit/kunit.py config
To build a KUnit kernel from the current ``.config``, we can use the
``build`` argument:
.. code-block::
./tools/testing/kunit/kunit.py build
If we already have built UML kernel with built-in KUnit tests, we
can run the kernel, and display the test results with the ``exec``
argument:
.. code-block::
./tools/testing/kunit/kunit.py exec
The ``run`` command discussed in section: **Run Tests with kunit_tool**,
is equivalent to running the above three commands in sequence.
Parse Test Results
==================
KUnit tests output displays results in TAP (Test Anything Protocol)
format. When running tests, kunit_tool parses this output and prints
a summary. To see the raw test results in TAP format, we can pass the
``--raw_output`` argument:
.. code-block::
./tools/testing/kunit/kunit.py run --raw_output
If we have KUnit results in the raw TAP format, we can parse them and
print the human-readable summary with the ``parse`` command for
kunit_tool. This accepts a filename for an argument, or will read from
standard input.
.. code-block:: bash
# Reading from a file
./tools/testing/kunit/kunit.py parse /var/log/dmesg
# Reading from stdin
dmesg | ./tools/testing/kunit/kunit.py parse
Run Selected Test Suites
========================
By passing a bash style glob filter to the ``exec`` or ``run``
commands, we can run a subset of the tests built into a kernel . For
example: if we only want to run KUnit resource tests, use:
.. code-block::
./tools/testing/kunit/kunit.py run 'kunit-resource*'
This uses the standard glob format with wildcard characters.
Run Tests on qemu
=================
kunit_tool supports running tests on qemu as well as
via UML. To run tests on qemu, by default it requires two flags:
- ``--arch``: Selects a configs collection (Kconfig, qemu config options
and so on), that allow KUnit tests to be run on the specified
architecture in a minimal way. The architecture argument is same as
the option name passed to the ``ARCH`` variable used by Kbuild.
Not all architectures currently support this flag, but we can use
``--qemu_config`` to handle it. If ``um`` is passed (or this flag
is ignored), the tests will run via UML. Non-UML architectures,
for example: i386, x86_64, arm and so on; run on qemu.
- ``--cross_compile``: Specifies the Kbuild toolchain. It passes the
same argument as passed to the ``CROSS_COMPILE`` variable used by
Kbuild. As a reminder, this will be the prefix for the toolchain
binaries such as GCC. For example:
- ``sparc64-linux-gnu`` if we have the sparc toolchain installed on
our system.
- ``$HOME/toolchains/microblaze/gcc-9.2.0-nolibc/microblaze-linux/bin/microblaze-linux``
if we have downloaded the microblaze toolchain from the 0-day
website to a directory in our home directory called toolchains.
If we want to run KUnit tests on an architecture not supported by
the ``--arch`` flag, or want to run KUnit tests on qemu using a
non-default configuration; then we can write our own``QemuConfig``.
These ``QemuConfigs`` are written in Python. They have an import line
``from..qemu_config import QemuArchParams`` at the top of the file.
The file must contain a variable called ``QEMU_ARCH`` that has an
instance of ``QemuArchParams`` assigned to it. See example in:
``tools/testing/kunit/qemu_configs/x86_64.py``.
Once we have a ``QemuConfig``, we can pass it into kunit_tool,
using the ``--qemu_config`` flag. When used, this flag replaces the
``--arch`` flag. For example: using
``tools/testing/kunit/qemu_configs/x86_64.py``, the invocation appear
as
.. code-block:: bash
./tools/testing/kunit/kunit.py run \
--timeout=60 \
--jobs=12 \
--qemu_config=./tools/testing/kunit/qemu_configs/x86_64.py
To run existing KUnit tests on non-UML architectures, see:
Documentation/dev-tools/kunit/non_uml.rst.
Command-Line Arguments
======================
kunit_tool has a number of other command-line arguments which can
be useful for our test environment. Below the most commonly used
command line arguments:
- ``--help``: Lists all available options. To list common options,
place ``--help`` before the command. To list options specific to that
command, place ``--help`` after the command.
.. note:: Different commands (``config``, ``build``, ``run``, etc)
have different supported options.
- ``--build_dir``: Specifies kunit_tool build directory. It includes
the ``.kunitconfig``, ``.config`` files and compiled kernel.
- ``--make_options``: Specifies additional options to pass to make, when
compiling a kernel (using ``build`` or ``run`` commands). For example:
to enable compiler warnings, we can pass ``--make_options W=1``.
- ``--alltests``: Builds a UML kernel with all config options enabled
using ``make allyesconfig``. This allows us to run as many tests as
possible.
.. note:: It is slow and prone to breakage as new options are
added or modified. Instead, enable all tests
which have satisfied dependencies by adding
``CONFIG_KUNIT_ALL_TESTS=y`` to your ``.kunitconfig``.
......@@ -4,132 +4,137 @@
Getting Started
===============
Installing dependencies
Installing Dependencies
=======================
KUnit has the same dependencies as the Linux kernel. As long as you can build
the kernel, you can run KUnit.
KUnit has the same dependencies as the Linux kernel. As long as you can
build the kernel, you can run KUnit.
Running tests with the KUnit Wrapper
====================================
Included with KUnit is a simple Python wrapper which runs tests under User Mode
Linux, and formats the test results.
The wrapper can be run with:
Running tests with kunit_tool
=============================
kunit_tool is a Python script, which configures and builds a kernel, runs
tests, and formats the test results. From the kernel repository, you
can run kunit_tool:
.. code-block:: bash
./tools/testing/kunit/kunit.py run
For more information on this wrapper (also called kunit_tool) check out the
Documentation/dev-tools/kunit/kunit-tool.rst page.
For more information on this wrapper, see:
Documentation/dev-tools/kunit/run_wrapper.rst.
Creating a ``.kunitconfig``
---------------------------
By default, kunit_tool runs a selection of tests. However, you can specify which
unit tests to run by creating a ``.kunitconfig`` file with kernel config options
that enable only a specific set of tests and their dependencies.
The ``.kunitconfig`` file contains a list of kconfig options which are required
to run the desired targets. The ``.kunitconfig`` also contains any other test
specific config options, such as test dependencies. For example: the
``FAT_FS`` tests - ``FAT_KUNIT_TEST``, depends on
``FAT_FS``. ``FAT_FS`` can be enabled by selecting either ``MSDOS_FS``
or ``VFAT_FS``. To run ``FAT_KUNIT_TEST``, the ``.kunitconfig`` has:
Creating a .kunitconfig
-----------------------
If you want to run a specific set of tests (rather than those listed in the
KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
This file essentially contains the regular Kernel config, with the specific
test targets as well. The ``.kunitconfig`` should also contain any other config
options required by the tests.
.. code-block:: none
CONFIG_KUNIT=y
CONFIG_MSDOS_FS=y
CONFIG_FAT_KUNIT_TEST=y
A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
1. A good starting point for the ``.kunitconfig``, is the KUnit default
config. Run the command:
.. code-block:: bash
cd $PATH_TO_LINUX_REPO
cp tools/testing/kunit/configs/default.config .kunitconfig
You can then add any other Kconfig options you wish, e.g.:
.. note ::
You may want to remove CONFIG_KUNIT_ALL_TESTS from the ``.kunitconfig`` as
it will enable a number of additional tests that you may not want.
2. You can then add any other Kconfig options, for example:
.. code-block:: none
CONFIG_LIST_KUNIT_TEST=y
:doc:`kunit_tool <kunit-tool>` will ensure that all config options set in
``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
It'll warn you if you haven't included the dependencies of the options you're
using.
Before running the tests, kunit_tool ensures that all config options
set in ``.kunitconfig`` are set in the kernel ``.config``. It will warn
you if you have not included dependencies for the options used.
.. note::
.. note ::
If you change the ``.kunitconfig``, kunit.py will trigger a rebuild of the
``.config`` file. But you can edit the ``.config`` file directly or with
tools like ``make menuconfig O=.kunit``. As long as its a superset of
``.kunitconfig``, kunit.py won't overwrite your changes.
Running the tests (KUnit Wrapper)
---------------------------------
To make sure that everything is set up correctly, simply invoke the Python
wrapper from your kernel repo:
Running Tests (KUnit Wrapper)
-----------------------------
1. To make sure that everything is set up correctly, invoke the Python
wrapper from your kernel repository:
.. code-block:: bash
./tools/testing/kunit/kunit.py run
.. note::
You may want to run ``make mrproper`` first.
If everything worked correctly, you should see the following:
.. code-block:: bash
.. code-block::
Generating .config ...
Building KUnit Kernel ...
Starting KUnit Kernel ...
followed by a list of tests that are run. All of them should be passing.
The tests will pass or fail.
.. note::
.. note ::
Because it is building a lot of sources for the first time, the
``Building KUnit kernel`` step may take a while.
``Building KUnit kernel`` may take a while.
Running tests without the KUnit Wrapper
Running Tests without the KUnit Wrapper
=======================================
If you'd rather not use the KUnit Wrapper (if, for example, you need to
integrate with other systems, or use an architecture other than UML), KUnit can
be included in any kernel, and the results read out and parsed manually.
.. note::
KUnit is not designed for use in a production system, and it's possible that
tests may reduce the stability or security of the system.
Configuring the kernel
If you do not want to use the KUnit Wrapper (for example: you want code
under test to integrate with other systems, or use a different/
unsupported architecture or configuration), KUnit can be included in
any kernel, and the results are read out and parsed manually.
.. note ::
``CONFIG_KUNIT`` should not be enabled in a production environment.
Enabling KUnit disables Kernel Address-Space Layout Randomization
(KASLR), and tests may affect the state of the kernel in ways not
suitable for production.
Configuring the Kernel
----------------------
To enable KUnit itself, you need to enable the ``CONFIG_KUNIT`` Kconfig
option (under Kernel Hacking/Kernel Testing and Coverage in
``menuconfig``). From there, you can enable any KUnit tests. They
usually have config options ending in ``_KUNIT_TEST``.
In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
menuconfig). From there, you can enable any KUnit tests you want: they usually
have config options ending in ``_KUNIT_TEST``.
KUnit and KUnit tests can be compiled as modules: in this case the tests in a
module will be run when the module is loaded.
KUnit and KUnit tests can be compiled as modules. The tests in a module
will run when the module is loaded.
Running the tests (w/o KUnit Wrapper)
Running Tests (without KUnit Wrapper)
-------------------------------------
Build and run your kernel. In the kernel log, the test output is printed
out in the TAP format. This will only happen by default if KUnit/tests
are built-in. Otherwise the module will need to be loaded.
Build and run your kernel as usual. Test output will be written to the kernel
log in `TAP <https://testanything.org/>`_ format.
.. note::
It's possible that there will be other lines and/or data interspersed in the
TAP output.
.. note ::
Some lines and/or data may get interspersed in the TAP output.
Writing your first test
Writing Your First Test
=======================
In your kernel repository, let's add some code that we can test.
In your kernel repo let's add some code that we can test. Create a file
``drivers/misc/example.h`` with the contents:
1. Create a file ``drivers/misc/example.h``, which includes:
.. code-block:: c
int misc_example_add(int left, int right);
create a file ``drivers/misc/example.c``:
2. Create a file ``drivers/misc/example.c``, which includes:
.. code-block:: c
......@@ -142,21 +147,22 @@ create a file ``drivers/misc/example.c``:
return left + right;
}
Now add the following lines to ``drivers/misc/Kconfig``:
3. Add the following lines to ``drivers/misc/Kconfig``:
.. code-block:: kconfig
config MISC_EXAMPLE
bool "My example"
and the following lines to ``drivers/misc/Makefile``:
4. Add the following lines to ``drivers/misc/Makefile``:
.. code-block:: make
obj-$(CONFIG_MISC_EXAMPLE) += example.o
Now we are ready to write the test. The test will be in
``drivers/misc/example-test.c``:
Now we are ready to write the test cases.
1. Add the below test case in ``drivers/misc/example_test.c``:
.. code-block:: c
......@@ -191,7 +197,7 @@ Now we are ready to write the test. The test will be in
};
kunit_test_suite(misc_example_test_suite);
Now add the following to ``drivers/misc/Kconfig``:
2. Add the following lines to ``drivers/misc/Kconfig``:
.. code-block:: kconfig
......@@ -200,20 +206,20 @@ Now add the following to ``drivers/misc/Kconfig``:
depends on MISC_EXAMPLE && KUNIT=y
default KUNIT_ALL_TESTS
and the following to ``drivers/misc/Makefile``:
3. Add the following lines to ``drivers/misc/Makefile``:
.. code-block:: make
obj-$(CONFIG_MISC_EXAMPLE_TEST) += example-test.o
obj-$(CONFIG_MISC_EXAMPLE_TEST) += example_test.o
Now add it to your ``.kunitconfig``:
4. Add the following lines to ``.kunitconfig``:
.. code-block:: none
CONFIG_MISC_EXAMPLE=y
CONFIG_MISC_EXAMPLE_TEST=y
Now you can run the test:
5. Run the test:
.. code-block:: bash
......@@ -230,13 +236,20 @@ You should see the following failure:
[16:08:57] This test never passes.
...
Congrats! You just wrote your first KUnit test!
Congrats! You just wrote your first KUnit test.
Next Steps
==========
* Check out the Documentation/dev-tools/kunit/tips.rst page for tips on
writing idiomatic KUnit tests.
* Check out the :doc:`running_tips` page for tips on
how to make running KUnit tests easier.
* Optional: see the :doc:`usage` page for a more
in-depth explanation of KUnit.
* Documentation/dev-tools/kunit/architecture.rst - KUnit architecture.
* Documentation/dev-tools/kunit/run_wrapper.rst - run kunit_tool.
* Documentation/dev-tools/kunit/run_manual.rst - run tests without kunit_tool.
* Documentation/dev-tools/kunit/usage.rst - write tests.
* Documentation/dev-tools/kunit/tips.rst - best practices with
examples.
* Documentation/dev-tools/kunit/api/index.rst - KUnit APIs
used for testing.
* Documentation/dev-tools/kunit/kunit-tool.rst - kunit_tool helper
script.
* Documentation/dev-tools/kunit/faq.rst - KUnit common questions and
answers.
......@@ -4,37 +4,36 @@
Test Style and Nomenclature
===========================
To make finding, writing, and using KUnit tests as simple as possible, it's
To make finding, writing, and using KUnit tests as simple as possible, it is
strongly encouraged that they are named and written according to the guidelines
below. While it's possible to write KUnit tests which do not follow these rules,
below. While it is possible to write KUnit tests which do not follow these rules,
they may break some tooling, may conflict with other tests, and may not be run
automatically by testing systems.
It's recommended that you only deviate from these guidelines when:
It is recommended that you only deviate from these guidelines when:
1. Porting tests to KUnit which are already known with an existing name, or
2. Writing tests which would cause serious problems if automatically run (e.g.,
non-deterministically producing false positives or negatives, or taking an
extremely long time to run).
1. Porting tests to KUnit which are already known with an existing name.
2. Writing tests which would cause serious problems if automatically run. For
example, non-deterministically producing false positives or negatives, or
taking a long time to run.
Subsystems, Suites, and Tests
=============================
In order to make tests as easy to find as possible, they're grouped into suites
and subsystems. A test suite is a group of tests which test a related area of
the kernel, and a subsystem is a set of test suites which test different parts
of the same kernel subsystem or driver.
To make tests easy to find, they are grouped into suites and subsystems. A test
suite is a group of tests which test a related area of the kernel. A subsystem
is a set of test suites which test different parts of a kernel subsystem
or a driver.
Subsystems
----------
Every test suite must belong to a subsystem. A subsystem is a collection of one
or more KUnit test suites which test the same driver or part of the kernel. A
rule of thumb is that a test subsystem should match a single kernel module. If
the code being tested can't be compiled as a module, in many cases the subsystem
should correspond to a directory in the source tree or an entry in the
MAINTAINERS file. If unsure, follow the conventions set by tests in similar
areas.
test subsystem should match a single kernel module. If the code being tested
cannot be compiled as a module, in many cases the subsystem should correspond to
a directory in the source tree or an entry in the ``MAINTAINERS`` file. If
unsure, follow the conventions set by tests in similar areas.
Test subsystems should be named after the code being tested, either after the
module (wherever possible), or after the directory or files being tested. Test
......@@ -42,9 +41,8 @@ subsystems should be named to avoid ambiguity where necessary.
If a test subsystem name has multiple components, they should be separated by
underscores. *Do not* include "test" or "kunit" directly in the subsystem name
unless you are actually testing other tests or the kunit framework itself.
Example subsystems could be:
unless we are actually testing other tests or the kunit framework itself. For
example, subsystems could be called:
``ext4``
Matches the module and filesystem name.
......@@ -56,48 +54,46 @@ Example subsystems could be:
Has several components (``snd``, ``hda``, ``codec``, ``hdmi``) separated by
underscores. Matches the module name.
Avoid names like these:
Avoid names as shown in examples below:
``linear-ranges``
Names should use underscores, not dashes, to separate words. Prefer
``linear_ranges``.
``qos-kunit-test``
As well as using underscores, this name should not have "kunit-test" as a
suffix, and ``qos`` is ambiguous as a subsystem name. ``power_qos`` would be a
better name.
This name should use underscores, and not have "kunit-test" as a
suffix. ``qos`` is also ambiguous as a subsystem name, because several parts
of the kernel have a ``qos`` subsystem. ``power_qos`` would be a better name.
``pc_parallel_port``
The corresponding module name is ``parport_pc``, so this subsystem should also
be named ``parport_pc``.
.. note::
The KUnit API and tools do not explicitly know about subsystems. They're
simply a way of categorising test suites and naming modules which
provides a simple, consistent way for humans to find and run tests. This
may change in the future, though.
The KUnit API and tools do not explicitly know about subsystems. They are
a way of categorizing test suites and naming modules which provides a
simple, consistent way for humans to find and run tests. This may change
in the future.
Suites
------
KUnit tests are grouped into test suites, which cover a specific area of
functionality being tested. Test suites can have shared initialisation and
shutdown code which is run for all tests in the suite.
Not all subsystems will need to be split into multiple test suites (e.g. simple drivers).
functionality being tested. Test suites can have shared initialization and
shutdown code which is run for all tests in the suite. Not all subsystems need
to be split into multiple test suites (for example, simple drivers).
Test suites are named after the subsystem they are part of. If a subsystem
contains several suites, the specific area under test should be appended to the
subsystem name, separated by an underscore.
In the event that there are multiple types of test using KUnit within a
subsystem (e.g., both unit tests and integration tests), they should be put into
separate suites, with the type of test as the last element in the suite name.
Unless these tests are actually present, avoid using ``_test``, ``_unittest`` or
similar in the suite name.
subsystem (for example, both unit tests and integration tests), they should be
put into separate suites, with the type of test as the last element in the suite
name. Unless these tests are actually present, avoid using ``_test``, ``_unittest``
or similar in the suite name.
The full test suite name (including the subsystem name) should be specified as
the ``.name`` member of the ``kunit_suite`` struct, and forms the base for the
module name (see below).
Example test suites could include:
module name. For example, test suites could include:
``ext4_inode``
Part of the ``ext4`` subsystem, testing the ``inode`` area.
......@@ -109,26 +105,27 @@ Example test suites could include:
The ``kasan`` subsystem has only one suite, so the suite name is the same as
the subsystem name.
Avoid names like:
Avoid names, for example:
``ext4_ext4_inode``
There's no reason to state the subsystem twice.
There is no reason to state the subsystem twice.
``property_entry``
The suite name is ambiguous without the subsystem name.
``kasan_integration_test``
Because there is only one suite in the ``kasan`` subsystem, the suite should
just be called ``kasan``. There's no need to redundantly add
``integration_test``. Should a separate test suite with, for example, unit
tests be added, then that suite could be named ``kasan_unittest`` or similar.
just be called as ``kasan``. Do not redundantly add
``integration_test``. It should be a separate test suite. For example, if the
unit tests are added, then that suite could be named as ``kasan_unittest`` or
similar.
Test Cases
----------
Individual tests consist of a single function which tests a constrained
codepath, property, or function. In the test output, individual tests' results
will show up as subtests of the suite's results.
codepath, property, or function. In the test output, an individual test's
results will show up as subtests of the suite's results.
Tests should be named after what they're testing. This is often the name of the
Tests should be named after what they are testing. This is often the name of the
function being tested, with a description of the input or codepath being tested.
As tests are C functions, they should be named and written in accordance with
the kernel coding style.
......@@ -136,7 +133,7 @@ the kernel coding style.
.. note::
As tests are themselves functions, their names cannot conflict with
other C identifiers in the kernel. This may require some creative
naming. It's a good idea to make your test functions `static` to avoid
naming. It is a good idea to make your test functions `static` to avoid
polluting the global namespace.
Example test names include:
......@@ -162,16 +159,16 @@ This Kconfig entry must:
* be named ``CONFIG_<name>_KUNIT_TEST``: where <name> is the name of the test
suite.
* be listed either alongside the config entries for the driver/subsystem being
tested, or be under [Kernel Hacking][Kernel Testing and Coverage]
* depend on ``CONFIG_KUNIT``
tested, or be under [Kernel Hacking]->[Kernel Testing and Coverage]
* depend on ``CONFIG_KUNIT``.
* be visible only if ``CONFIG_KUNIT_ALL_TESTS`` is not enabled.
* have a default value of ``CONFIG_KUNIT_ALL_TESTS``.
* have a brief description of KUnit in the help text
* have a brief description of KUnit in the help text.
Unless there's a specific reason not to (e.g. the test is unable to be built as
a module), Kconfig entries for tests should be tristate.
If we are not able to meet above conditions (for example, the test is unable to
be built as a module), Kconfig entries for tests should be tristate.
An example Kconfig entry:
For example, a Kconfig entry might look like:
.. code-block:: none
......@@ -182,8 +179,8 @@ An example Kconfig entry:
help
This builds unit tests for foo.
For more information on KUnit and unit tests in general, please refer
to the KUnit documentation in Documentation/dev-tools/kunit/.
For more information on KUnit and unit tests in general,
please refer to the KUnit documentation in Documentation/dev-tools/kunit/.
If unsure, say N.
......
.. SPDX-License-Identifier: GPL-2.0
===========
Using KUnit
===========
The purpose of this document is to describe what KUnit is, how it works, how it
is intended to be used, and all the concepts and terminology that are needed to
understand it. This guide assumes a working knowledge of the Linux kernel and
some basic knowledge of testing.
For a high level introduction to KUnit, including setting up KUnit for your
project, see Documentation/dev-tools/kunit/start.rst.
Organization of this document
=============================
This document is organized into two main sections: Testing and Common Patterns.
The first covers what unit tests are and how to use KUnit to write them. The
second covers common testing patterns, e.g. how to isolate code and make it
possible to unit test code that was otherwise un-unit-testable.
Testing
=======
What is KUnit?
--------------
"K" is short for "kernel" so "KUnit" is the "(Linux) Kernel Unit Testing
Framework." KUnit is intended first and foremost for writing unit tests; it is
general enough that it can be used to write integration tests; however, this is
a secondary goal. KUnit has no ambition of being the only testing framework for
the kernel; for example, it does not intend to be an end-to-end testing
framework.
What is Unit Testing?
---------------------
A `unit test <https://martinfowler.com/bliki/UnitTest.html>`_ is a test that
tests code at the smallest possible scope, a *unit* of code. In the C
programming language that's a function.
Unit tests should be written for all the publicly exposed functions in a
compilation unit; so that is all the functions that are exported in either a
*class* (defined below) or all functions which are **not** static.
Writing Tests
-------------
=============
Test Cases
~~~~~~~~~~
----------
The fundamental unit in KUnit is the test case. A test case is a function with
the signature ``void (*)(struct kunit *test)``. It calls a function to be tested
the signature ``void (*)(struct kunit *test)``. It calls the function under test
and then sets *expectations* for what should happen. For example:
.. code-block:: c
......@@ -65,18 +21,19 @@ and then sets *expectations* for what should happen. For example:
KUNIT_FAIL(test, "This test never passes.");
}
In the above example ``example_test_success`` always passes because it does
nothing; no expectations are set, so all expectations pass. On the other hand
``example_test_failure`` always fails because it calls ``KUNIT_FAIL``, which is
a special expectation that logs a message and causes the test case to fail.
In the above example, ``example_test_success`` always passes because it does
nothing; no expectations are set, and therefore all expectations pass. On the
other hand ``example_test_failure`` always fails because it calls ``KUNIT_FAIL``,
which is a special expectation that logs a message and causes the test case to
fail.
Expectations
~~~~~~~~~~~~
An *expectation* is a way to specify that you expect a piece of code to do
something in a test. An expectation is called like a function. A test is made
by setting expectations about the behavior of a piece of code under test; when
one or more of the expectations fail, the test case fails and information about
the failure is logged. For example:
An *expectation* specifies that we expect a piece of code to do something in a
test. An expectation is called like a function. A test is made by setting
expectations about the behavior of a piece of code under test. When one or more
expectations fail, the test case fails and information about the failure is
logged. For example:
.. code-block:: c
......@@ -86,29 +43,28 @@ the failure is logged. For example:
KUNIT_EXPECT_EQ(test, 2, add(1, 1));
}
In the above example ``add_test_basic`` makes a number of assertions about the
behavior of a function called ``add``; the first parameter is always of type
``struct kunit *``, which contains information about the current test context;
the second parameter, in this case, is what the value is expected to be; the
In the above example, ``add_test_basic`` makes a number of assertions about the
behavior of a function called ``add``. The first parameter is always of type
``struct kunit *``, which contains information about the current test context.
The second parameter, in this case, is what the value is expected to be. The
last value is what the value actually is. If ``add`` passes all of these
expectations, the test case, ``add_test_basic`` will pass; if any one of these
expectations fails, the test case will fail.
It is important to understand that a test case *fails* when any expectation is
violated; however, the test will continue running, potentially trying other
expectations until the test case ends or is otherwise terminated. This is as
opposed to *assertions* which are discussed later.
A test case *fails* when any expectation is violated; however, the test will
continue to run, and try other expectations until the test case ends or is
otherwise terminated. This is as opposed to *assertions* which are discussed
later.
To learn about more expectations supported by KUnit, see
Documentation/dev-tools/kunit/api/test.rst.
To learn about more KUnit expectations, see Documentation/dev-tools/kunit/api/test.rst.
.. note::
A single test case should be pretty short, pretty easy to understand,
focused on a single behavior.
A single test case should be short, easy to understand, and focused on a
single behavior.
For example, if we wanted to properly test the add function above, we would
create additional tests cases which would each test a different property that an
add function should have like this:
For example, if we want to rigorously test the ``add`` function above, create
additional tests cases which would test each property that an ``add`` function
should have as shown below:
.. code-block:: c
......@@ -134,56 +90,43 @@ add function should have like this:
KUNIT_EXPECT_EQ(test, INT_MIN, add(INT_MAX, 1));
}
Notice how it is immediately obvious what all the properties that we are testing
for are.
Assertions
~~~~~~~~~~
KUnit also has the concept of an *assertion*. An assertion is just like an
expectation except the assertion immediately terminates the test case if it is
not satisfied.
For example:
An assertion is like an expectation, except that the assertion immediately
terminates the test case if the condition is not satisfied. For example:
.. code-block:: c
static void mock_test_do_expect_default_return(struct kunit *test)
static void test_sort(struct kunit *test)
{
struct mock_test_context *ctx = test->priv;
struct mock *mock = ctx->mock;
int param0 = 5, param1 = -5;
const char *two_param_types[] = {"int", "int"};
const void *two_params[] = {&param0, &param1};
const void *ret;
ret = mock->do_expect(mock,
"test_printk", test_printk,
two_param_types, two_params,
ARRAY_SIZE(two_params));
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ret);
KUNIT_EXPECT_EQ(test, -4, *((int *) ret));
int *a, i, r = 1;
a = kunit_kmalloc_array(test, TEST_LEN, sizeof(*a), GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, a);
for (i = 0; i < TEST_LEN; i++) {
r = (r * 725861) % 6599;
a[i] = r;
}
sort(a, TEST_LEN, sizeof(*a), cmpint, NULL);
for (i = 0; i < TEST_LEN-1; i++)
KUNIT_EXPECT_LE(test, a[i], a[i + 1]);
}
In this example, the method under test should return a pointer to a value, so
if the pointer returned by the method is null or an errno, we don't want to
bother continuing the test since the following expectation could crash the test
case. `ASSERT_NOT_ERR_OR_NULL(...)` allows us to bail out of the test case if
the appropriate conditions have not been satisfied to complete the test.
In this example, the method under test should return pointer to a value. If the
pointer returns null or an errno, we want to stop the test since the following
expectation could crash the test case. `ASSERT_NOT_ERR_OR_NULL(...)` allows us
to bail out of the test case if the appropriate conditions are not satisfied to
complete the test.
Test Suites
~~~~~~~~~~~
Now obviously one unit test isn't very helpful; the power comes from having
many test cases covering all of a unit's behaviors. Consequently it is common
to have many *similar* tests; in order to reduce duplication in these closely
related tests most unit testing frameworks - including KUnit - provide the
concept of a *test suite*. A *test suite* is just a collection of test cases
for a unit of code with a set up function that gets invoked before every test
case and then a tear down function that gets invoked after every test case
completes.
Example:
We need many test cases covering all the unit's behaviors. It is common to have
many similar tests. In order to reduce duplication in these closely related
tests, most unit testing frameworks (including KUnit) provide the concept of a
*test suite*. A test suite is a collection of test cases for a unit of code
with a setup function that gets invoked before every test case and then a tear
down function that gets invoked after every test case completes. For example:
.. code-block:: c
......@@ -202,23 +145,48 @@ Example:
};
kunit_test_suite(example_test_suite);
In the above example the test suite, ``example_test_suite``, would run the test
cases ``example_test_foo``, ``example_test_bar``, and ``example_test_baz``;
each would have ``example_test_init`` called immediately before it and would
have ``example_test_exit`` called immediately after it.
In the above example, the test suite ``example_test_suite`` would run the test
cases ``example_test_foo``, ``example_test_bar``, and ``example_test_baz``. Each
would have ``example_test_init`` called immediately before it and
``example_test_exit`` called immediately after it.
``kunit_test_suite(example_test_suite)`` registers the test suite with the
KUnit test framework.
.. note::
A test case will only be run if it is associated with a test suite.
A test case will only run if it is associated with a test suite.
``kunit_test_suite(...)`` is a macro which tells the linker to put the
specified test suite in a special linker section so that it can be run by KUnit
either after ``late_init``, or when the test module is loaded (if the test was
built as a module).
For more information, see Documentation/dev-tools/kunit/api/test.rst.
Writing Tests For Other Architectures
-------------------------------------
It is better to write tests that run on UML to tests that only run under a
particular architecture. It is better to write tests that run under QEMU or
another easy to obtain (and monetarily free) software environment to a specific
piece of hardware.
``kunit_test_suite(...)`` is a macro which tells the linker to put the specified
test suite in a special linker section so that it can be run by KUnit either
after late_init, or when the test module is loaded (depending on whether the
test was built in or not).
Nevertheless, there are still valid reasons to write a test that is architecture
or hardware specific. For example, we might want to test code that really
belongs in ``arch/some-arch/*``. Even so, try to write the test so that it does
not depend on physical hardware. Some of our test cases may not need hardware,
only few tests actually require the hardware to test it. When hardware is not
available, instead of disabling tests, we can skip them.
For more information on these types of things see the
Documentation/dev-tools/kunit/api/test.rst.
Now that we have narrowed down exactly what bits are hardware specific, the
actual procedure for writing and running the tests is same as writing normal
KUnit tests.
.. important::
We may have to reset hardware state. If this is not possible, we may only
be able to run one test case per invocation.
.. TODO(brendanhiggins@google.com): Add an actual example of an architecture-
dependent KUnit test.
Common Patterns
===============
......@@ -226,43 +194,39 @@ Common Patterns
Isolating Behavior
------------------
The most important aspect of unit testing that other forms of testing do not
provide is the ability to limit the amount of code under test to a single unit.
In practice, this is only possible by being able to control what code gets run
when the unit under test calls a function and this is usually accomplished
through some sort of indirection where a function is exposed as part of an API
such that the definition of that function can be changed without affecting the
rest of the code base. In the kernel this primarily comes from two constructs,
classes, structs that contain function pointers that are provided by the
implementer, and architecture-specific functions which have definitions selected
at compile time.
Unit testing limits the amount of code under test to a single unit. It controls
what code gets run when the unit under test calls a function. Where a function
is exposed as part of an API such that the definition of that function can be
changed without affecting the rest of the code base. In the kernel, this comes
from two constructs: classes, which are structs that contain function pointers
provided by the implementer, and architecture-specific functions, which have
definitions selected at compile time.
Classes
~~~~~~~
Classes are not a construct that is built into the C programming language;
however, it is an easily derived concept. Accordingly, pretty much every project
that does not use a standardized object oriented library (like GNOME's GObject)
has their own slightly different way of doing object oriented programming; the
Linux kernel is no exception.
however, it is an easily derived concept. Accordingly, in most cases, every
project that does not use a standardized object oriented library (like GNOME's
GObject) has their own slightly different way of doing object oriented
programming; the Linux kernel is no exception.
The central concept in kernel object oriented programming is the class. In the
kernel, a *class* is a struct that contains function pointers. This creates a
contract between *implementers* and *users* since it forces them to use the
same function signature without having to call the function directly. In order
for it to truly be a class, the function pointers must specify that a pointer
to the class, known as a *class handle*, be one of the parameters; this makes
it possible for the member functions (also known as *methods*) to have access
to member variables (more commonly known as *fields*) allowing the same
implementation to have multiple *instances*.
Typically a class can be *overridden* by *child classes* by embedding the
*parent class* in the child class. Then when a method provided by the child
class is called, the child implementation knows that the pointer passed to it is
of a parent contained within the child; because of this, the child can compute
the pointer to itself because the pointer to the parent is always a fixed offset
from the pointer to the child; this offset is the offset of the parent contained
in the child struct. For example:
same function signature without having to call the function directly. To be a
class, the function pointers must specify that a pointer to the class, known as
a *class handle*, be one of the parameters. Thus the member functions (also
known as *methods*) have access to member variables (also known as *fields*)
allowing the same implementation to have multiple *instances*.
A class can be *overridden* by *child classes* by embedding the *parent class*
in the child class. Then when the child class *method* is called, the child
implementation knows that the pointer passed to it is of a parent contained
within the child. Thus, the child can compute the pointer to itself because the
pointer to the parent is always a fixed offset from the pointer to the child.
This offset is the offset of the parent contained in the child struct. For
example:
.. code-block:: c
......@@ -290,8 +254,8 @@ in the child struct. For example:
self->width = width;
}
In this example (as in most kernel code) the operation of computing the pointer
to the child from the pointer to the parent is done by ``container_of``.
In this example, computing the pointer to the child from the pointer to the
parent is done by ``container_of``.
Faking Classes
~~~~~~~~~~~~~~
......@@ -300,14 +264,11 @@ In order to unit test a piece of code that calls a method in a class, the
behavior of the method must be controllable, otherwise the test ceases to be a
unit test and becomes an integration test.
A fake just provides an implementation of a piece of code that is different than
what runs in a production instance, but behaves identically from the standpoint
of the callers; this is usually done to replace a dependency that is hard to
deal with, or is slow.
A good example for this might be implementing a fake EEPROM that just stores the
"contents" in an internal buffer. For example, let's assume we have a class that
represents an EEPROM:
A fake class implements a piece of code that is different than what runs in a
production instance, but behaves identical from the standpoint of the callers.
This is done to replace a dependency that is hard to deal with, or is slow. For
example, implementing a fake EEPROM that stores the "contents" in an
internal buffer. Assume we have a class that represents an EEPROM:
.. code-block:: c
......@@ -316,7 +277,7 @@ represents an EEPROM:
ssize_t (*write)(struct eeprom *this, size_t offset, const char *buffer, size_t count);
};
And we want to test some code that buffers writes to the EEPROM:
And we want to test code that buffers writes to the EEPROM:
.. code-block:: c
......@@ -329,7 +290,7 @@ And we want to test some code that buffers writes to the EEPROM:
struct eeprom_buffer *new_eeprom_buffer(struct eeprom *eeprom);
void destroy_eeprom_buffer(struct eeprom *eeprom);
We can easily test this code by *faking out* the underlying EEPROM:
We can test this code by *faking out* the underlying EEPROM:
.. code-block:: c
......@@ -456,14 +417,14 @@ We can now use it to test ``struct eeprom_buffer``:
destroy_eeprom_buffer(ctx->eeprom_buffer);
}
Testing against multiple inputs
Testing Against Multiple Inputs
-------------------------------
Testing just a few inputs might not be enough to have confidence that the code
works correctly, e.g. for a hash function.
Testing just a few inputs is not enough to ensure that the code works correctly,
for example: testing a hash function.
In such cases, it can be helpful to have a helper macro or function, e.g. this
fictitious example for ``sha1sum(1)``
We can write a helper macro or function. The function is called for each input.
For example, to test ``sha1sum(1)``, we can write:
.. code-block:: c
......@@ -475,16 +436,15 @@ fictitious example for ``sha1sum(1)``
TEST_SHA1("hello world", "2aae6c35c94fcfb415dbe95f408b9ce91ee846ed");
TEST_SHA1("hello world!", "430ce34d020724ed75a196dfc2ad67c77772d169");
Note the use of the ``_MSG`` version of ``KUNIT_EXPECT_STREQ`` to print a more
detailed error and make the assertions clearer within the helper macros.
Note the use of ``KUNIT_EXPECT_STREQ_MSG`` to give more context when it fails
and make it easier to track down. (Yes, in this example, ``want`` is likely
going to be unique enough on its own).
The ``_MSG`` variants are useful when the same expectation is called multiple
times (in a loop or helper function) and thus the line number is not enough to
identify what failed, as shown below.
The ``_MSG`` variants are even more useful when the same expectation is called
multiple times (in a loop or helper function) and thus the line number isn't
enough to identify what failed, like below.
In some cases, it can be helpful to write a *table-driven test* instead, e.g.
In complicated cases, we recommend using a *table-driven test* compared to the
helper macro variation, for example:
.. code-block:: c
......@@ -513,17 +473,18 @@ In some cases, it can be helpful to write a *table-driven test* instead, e.g.
}
There's more boilerplate involved, but it can:
There is more boilerplate code involved, but it can:
* be more readable when there are multiple inputs/outputs (due to field names).
* be more readable when there are multiple inputs/outputs thanks to field names,
* For example, see ``fs/ext4/inode-test.c``.
* E.g. see ``fs/ext4/inode-test.c`` for an example of both.
* reduce duplication if test cases can be shared across multiple tests.
* reduce duplication if test cases are shared across multiple tests.
* E.g. if we wanted to also test ``sha256sum``, we could add a ``sha256``
* For example: if we want to test ``sha256sum``, we could add a ``sha256``
field and reuse ``cases``.
* be converted to a "parameterized test", see below.
* be converted to a "parameterized test".
Parameterized Testing
~~~~~~~~~~~~~~~~~~~~~
......@@ -531,7 +492,7 @@ Parameterized Testing
The table-driven testing pattern is common enough that KUnit has special
support for it.
Reusing the same ``cases`` array from above, we can write the test as a
By reusing the same ``cases`` array from above, we can write the test as a
"parameterized test" with the following.
.. code-block:: c
......@@ -582,193 +543,160 @@ Reusing the same ``cases`` array from above, we can write the test as a
.. _kunit-on-non-uml:
KUnit on non-UML architectures
==============================
By default KUnit uses UML as a way to provide dependencies for code under test.
Under most circumstances KUnit's usage of UML should be treated as an
implementation detail of how KUnit works under the hood. Nevertheless, there
are instances where being able to run architecture-specific code or test
against real hardware is desirable. For these reasons KUnit supports running on
other architectures.
Running existing KUnit tests on non-UML architectures
-----------------------------------------------------
Exiting Early on Failed Expectations
------------------------------------
There are some special considerations when running existing KUnit tests on
non-UML architectures:
We can use ``KUNIT_EXPECT_EQ`` to mark the test as failed and continue
execution. In some cases, it is unsafe to continue. We can use the
``KUNIT_ASSERT`` variant to exit on failure.
* Hardware may not be deterministic, so a test that always passes or fails
when run under UML may not always do so on real hardware.
* Hardware and VM environments may not be hermetic. KUnit tries its best to
provide a hermetic environment to run tests; however, it cannot manage state
that it doesn't know about outside of the kernel. Consequently, tests that
may be hermetic on UML may not be hermetic on other architectures.
* Some features and tooling may not be supported outside of UML.
* Hardware and VMs are slower than UML.
.. code-block:: c
None of these are reasons not to run your KUnit tests on real hardware; they are
only things to be aware of when doing so.
void example_test_user_alloc_function(struct kunit *test)
{
void *object = alloc_some_object_for_me();
Currently, the KUnit Wrapper (``tools/testing/kunit/kunit.py``) (aka
kunit_tool) only fully supports running tests inside of UML and QEMU; however,
this is only due to our own time limitations as humans working on KUnit. It is
entirely possible to support other emulators and even actual hardware, but for
now QEMU and UML is what is fully supported within the KUnit Wrapper. Again, to
be clear, this is just the Wrapper. The actualy KUnit tests and the KUnit
library they are written in is fully architecture agnostic and can be used in
virtually any setup, you just won't have the benefit of typing a single command
out of the box and having everything magically work perfectly.
/* Make sure we got a valid pointer back. */
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, object);
do_something_with_object(object);
}
Again, all core KUnit framework features are fully supported on all
architectures, and using them is straightforward: Most popular architectures
are supported directly in the KUnit Wrapper via QEMU. Currently, supported
architectures on QEMU include:
Allocating Memory
-----------------
* i386
* x86_64
* arm
* arm64
* alpha
* powerpc
* riscv
* s390
* sparc
Where you might use ``kzalloc``, you can instead use ``kunit_kzalloc`` as KUnit
will then ensure that the memory is freed once the test completes.
In order to run KUnit tests on one of these architectures via QEMU with the
KUnit wrapper, all you need to do is specify the flags ``--arch`` and
``--cross_compile`` when invoking the KUnit Wrapper. For example, we could run
the default KUnit tests on ARM in the following manner (assuming we have an ARM
toolchain installed):
This is useful because it lets us use the ``KUNIT_ASSERT_EQ`` macros to exit
early from a test without having to worry about remembering to call ``kfree``.
For example:
.. code-block:: bash
.. code-block:: c
tools/testing/kunit/kunit.py run --timeout=60 --jobs=12 --arch=arm --cross_compile=arm-linux-gnueabihf-
void example_test_allocation(struct kunit *test)
{
char *buffer = kunit_kzalloc(test, 16, GFP_KERNEL);
/* Ensure allocation succeeded. */
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buffer);
Alternatively, if you want to run your tests on real hardware or in some other
emulation environment, all you need to do is to take your kunitconfig, your
Kconfig options for the tests you would like to run, and merge them into
whatever config your are using for your platform. That's it!
KUNIT_ASSERT_STREQ(test, buffer, "");
}
For example, let's say you have the following kunitconfig:
.. code-block:: none
Testing Static Functions
------------------------
CONFIG_KUNIT=y
CONFIG_KUNIT_EXAMPLE_TEST=y
If we do not want to expose functions or variables for testing, one option is to
conditionally ``#include`` the test file at the end of your .c file. For
example:
If you wanted to run this test on an x86 VM, you might add the following config
options to your ``.config``:
.. code-block:: c
.. code-block:: none
/* In my_file.c */
CONFIG_KUNIT=y
CONFIG_KUNIT_EXAMPLE_TEST=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
static int do_interesting_thing();
All these new options do is enable support for a common serial console needed
for logging.
#ifdef CONFIG_MY_KUNIT_TEST
#include "my_kunit_test.c"
#endif
Next, you could build a kernel with these tests as follows:
Injecting Test-Only Code
------------------------
Similar to as shown above, we can add test-specific logic. For example:
.. code-block:: bash
.. code-block:: c
make ARCH=x86 olddefconfig
make ARCH=x86
/* In my_file.h */
Once you have built a kernel, you could run it on QEMU as follows:
#ifdef CONFIG_MY_KUNIT_TEST
/* Defined in my_kunit_test.c */
void test_only_hook(void);
#else
void test_only_hook(void) { }
#endif
.. code-block:: bash
This test-only code can be made more useful by accessing the current ``kunit_test``
as shown in next section: *Accessing The Current Test*.
qemu-system-x86_64 -enable-kvm \
-m 1024 \
-kernel arch/x86_64/boot/bzImage \
-append 'console=ttyS0' \
--nographic
Accessing The Current Test
--------------------------
Interspersed in the kernel logs you might see the following:
In some cases, we need to call test-only code from outside the test file.
For example, see example in section *Injecting Test-Only Code* or if
we are providing a fake implementation of an ops struct. Using
``kunit_test`` field in ``task_struct``, we can access it via
``current->kunit_test``.
.. code-block:: none
The example below includes how to implement "mocking":
TAP version 14
# Subtest: example
1..1
# example_simple_test: initializing
ok 1 - example_simple_test
ok 1 - example
.. code-block:: c
Congratulations, you just ran a KUnit test on the x86 architecture!
#include <linux/sched.h> /* for current */
In a similar manner, kunit and kunit tests can also be built as modules,
so if you wanted to run tests in this way you might add the following config
options to your ``.config``:
struct test_data {
int foo_result;
int want_foo_called_with;
};
.. code-block:: none
static int fake_foo(int arg)
{
struct kunit *test = current->kunit_test;
struct test_data *test_data = test->priv;
CONFIG_KUNIT=m
CONFIG_KUNIT_EXAMPLE_TEST=m
KUNIT_EXPECT_EQ(test, test_data->want_foo_called_with, arg);
return test_data->foo_result;
}
Once the kernel is built and installed, a simple
static void example_simple_test(struct kunit *test)
{
/* Assume priv (private, a member used to pass test data from
* the init function) is allocated in the suite's .init */
struct test_data *test_data = test->priv;
.. code-block:: bash
test_data->foo_result = 42;
test_data->want_foo_called_with = 1;
modprobe example-test
/* In a real test, we'd probably pass a pointer to fake_foo somewhere
* like an ops struct, etc. instead of calling it directly. */
KUNIT_EXPECT_EQ(test, fake_foo(1), 42);
}
...will run the tests.
In this example, we are using the ``priv`` member of ``struct kunit`` as a way
of passing data to the test from the init function. In general ``priv`` is
pointer that can be used for any user data. This is preferred over static
variables, as it avoids concurrency issues.
.. note::
Note that you should make sure your test depends on ``KUNIT=y`` in Kconfig
if the test does not support module build. Otherwise, it will trigger
compile errors if ``CONFIG_KUNIT`` is ``m``.
Had we wanted something more flexible, we could have used a named ``kunit_resource``.
Each test can have multiple resources which have string names providing the same
flexibility as a ``priv`` member, but also, for example, allowing helper
functions to create resources without conflicting with each other. It is also
possible to define a clean up function for each resource, making it easy to
avoid resource leaks. For more information, see Documentation/dev-tools/kunit/api/test.rst.
Writing new tests for other architectures
-----------------------------------------
Failing The Current Test
------------------------
The first thing you must do is ask yourself whether it is necessary to write a
KUnit test for a specific architecture, and then whether it is necessary to
write that test for a particular piece of hardware. In general, writing a test
that depends on having access to a particular piece of hardware or software (not
included in the Linux source repo) should be avoided at all costs.
If we want to fail the current test, we can use ``kunit_fail_current_test(fmt, args...)``
which is defined in ``<kunit/test-bug.h>`` and does not require pulling in ``<kunit/test.h>``.
For example, we have an option to enable some extra debug checks on some data
structures as shown below:
Even if you only ever plan on running your KUnit test on your hardware
configuration, other people may want to run your tests and may not have access
to your hardware. If you write your test to run on UML, then anyone can run your
tests without knowing anything about your particular setup, and you can still
run your tests on your hardware setup just by compiling for your architecture.
.. code-block:: c
.. important::
Always prefer tests that run on UML to tests that only run under a particular
architecture, and always prefer tests that run under QEMU or another easy
(and monetarily free) to obtain software environment to a specific piece of
hardware.
Nevertheless, there are still valid reasons to write an architecture or hardware
specific test: for example, you might want to test some code that really belongs
in ``arch/some-arch/*``. Even so, try your best to write the test so that it
does not depend on physical hardware: if some of your test cases don't need the
hardware, only require the hardware for tests that actually need it.
Now that you have narrowed down exactly what bits are hardware specific, the
actual procedure for writing and running the tests is pretty much the same as
writing normal KUnit tests. One special caveat is that you have to reset
hardware state in between test cases; if this is not possible, you may only be
able to run one test case per invocation.
#include <kunit/test-bug.h>
.. TODO(brendanhiggins@google.com): Add an actual example of an architecture-
dependent KUnit test.
#ifdef CONFIG_EXTRA_DEBUG_CHECKS
static void validate_my_data(struct data *data)
{
if (is_valid(data))
return;
KUnit debugfs representation
============================
When kunit test suites are initialized, they create an associated directory
in ``/sys/kernel/debug/kunit/<test-suite>``. The directory contains one file
kunit_fail_current_test("data %p is invalid", data);
- results: "cat results" displays results of each test case and the results
of the entire suite for the last test run.
/* Normal, non-KUnit, error reporting code here. */
}
#else
static void my_debug_function(void) { }
#endif
The debugfs representation is primarily of use when kunit test suites are
run in a native environment, either as modules or builtin. Having a way
to display results like this is valuable as otherwise results can be
intermixed with other events in dmesg output. The maximum size of each
results file is KUNIT_LOG_SIZE bytes (defined in ``include/kunit/test.h``).
......@@ -138,6 +138,17 @@ To pass extra options to Sphinx, you can use the ``SPHINXOPTS`` make
variable. For example, use ``make SPHINXOPTS=-v htmldocs`` to get more verbose
output.
It is also possible to pass an extra DOCS_CSS overlay file, in order to customize
the html layout, by using the ``DOCS_CSS`` make variable.
By default, the build will try to use the Read the Docs sphinx theme:
https://github.com/readthedocs/sphinx_rtd_theme
If the theme is not available, it will fall-back to the classic one.
The Sphinx theme can be overridden by using the ``DOCS_THEME`` make variable.
To remove the generated documentation, run ``make cleandocs``.
Writing Documentation
......@@ -250,12 +261,11 @@ please feel free to remove it.
list tables
-----------
We recommend the use of *list table* formats. The *list table* formats are
double-stage lists. Compared to the ASCII-art they might not be as
comfortable for
readers of the text files. Their advantage is that they are easy to
create or modify and that the diff of a modification is much more meaningful,
because it is limited to the modified content.
The list-table formats can be useful for tables that are not easily laid
out in the usual Sphinx ASCII-art formats. These formats are nearly
impossible for readers of the plain-text documents to understand, though,
and should be avoided in the absence of a strong justification for their
use.
The ``flat-table`` is a double-stage list similar to the ``list-table`` with
some additional features:
......
......@@ -169,7 +169,6 @@ prototypes::
int (*show_options)(struct seq_file *, struct dentry *);
ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
locking rules:
All may block [not true, see below]
......@@ -194,7 +193,6 @@ umount_begin: no
show_options: no (namespace_sem)
quota_read: no (see below)
quota_write: no (see below)
bdev_try_to_free_page: no (see below)
====================== ============ ========================
->statfs() has s_umount (shared) when called by ustat(2) (native or
......@@ -210,9 +208,6 @@ dqio_sem) (unless an admin really wants to screw up something and
writes to quota files with quotas on). For other details about locking
see also dquot_operations section.
->bdev_try_to_free_page is called from the ->releasepage handler of
the block device inode. See there for more details.
file_system_type
================
......
.. SPDX-License-Identifier: GPL-2.0
==
===
RDS
===
......
......@@ -197,14 +197,29 @@ the build process, for example, or editor backup files) in the patch. The
file "dontdiff" in the Documentation directory can help in this regard;
pass it to diff with the "-X" option.
The tags mentioned above are used to describe how various developers have
been associated with the development of this patch. They are described in
detail in
the :ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
document; what follows here is a brief summary. Each of these lines has
the format:
The tags already briefly mentioned above are used to provide insights how
the patch came into being. They are described in detail in the
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
document; what follows here is a brief summary.
::
One tag is used to refer to earlier commits which introduced problems fixed by
the patch::
Fixes: 1f2e3d4c5b6a ("The first line of the commit specified by the first 12 characters of its SHA-1 ID")
Another tag is used for linking web pages with additional backgrounds or
details, for example a report about a bug fixed by the patch or a document
with a specification implemented by the patch::
Link: https://example.com/somewhere.html optional-other-stuff
Many maintainers when applying a patch also add this tag to link to the
latest public review posting of the patch; often this is automatically done
by tools like b4 or a git hook like the one described in
'Documentation/maintainer/configure-git.rst'.
A third kind of tag is used to document who was involved in the development of
the patch. Each of these uses this format::
tag: Full Name <email address> optional-other-stuff
......
......@@ -271,25 +271,6 @@ least a notification of the change, so that some information makes its way
into the manual pages. User-space API changes should also be copied to
linux-api@vger.kernel.org.
For small patches you may want to CC the Trivial Patch Monkey
trivial@kernel.org which collects "trivial" patches. Have a look
into the MAINTAINERS file for its current manager.
Trivial patches must qualify for one of the following rules:
- Spelling fixes in documentation
- Spelling fixes for errors which could break :manpage:`grep(1)`
- Warning fixes (cluttering with useless warnings is bad)
- Compilation fixes (only if they are actually correct)
- Runtime fixes (only if they actually fix things)
- Removing use of deprecated functions/macros
- Contact detail and documentation fixes
- Non-portable code replaced by portable code (even in arch-specific,
since people copy, as long as it's trivial)
- Any fix by the author/maintainer of the file (ie. patch monkey
in re-transmission mode)
No MIME, no links, no compression, no attachments. Just plain text
-------------------------------------------------------------------
......
......@@ -74,7 +74,6 @@ Quota, period and burst are managed within the cpu subsystem via cgroupfs.
to cgroup v1. For cgroup v2, see
:ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`.
- cpu.cfs_quota_us: the total available run-time within a period (in
- cpu.cfs_quota_us: run-time replenished within a period (in microseconds)
- cpu.cfs_period_us: the length of a period (in microseconds)
- cpu.stat: exports throttling statistics [explained further below]
......@@ -135,7 +134,7 @@ cpu.stat:
of the group have been throttled.
- nr_bursts: Number of periods burst occurs.
- burst_time: Cumulative wall-time (in nanoseconds) that any CPUs has used
above quota in respective periods
above quota in respective periods.
This interface is read-only.
......@@ -238,7 +237,7 @@ Examples
additionally, in case accumulation has been done.
With 50ms period, 20ms quota will be equivalent to 40% of 1 CPU.
And 10ms burst will be equivalent to 20% of 1 CPU.
And 10ms burst will be equivalent to 20% of 1 CPU::
# echo 20000 > cpu.cfs_quota_us /* quota = 20ms */
# echo 50000 > cpu.cfs_period_us /* period = 50ms */
......
......@@ -81,8 +81,7 @@ of the kernel, gaining the protection of the kernel's strict memory
permissions as described above.
For variables that are initialized once at ``__init`` time, these can
be marked with the (new and under development) ``__ro_after_init``
attribute.
be marked with the ``__ro_after_init`` attribute.
What remains are variables that are updated rarely (e.g. GDT). These
will need another infrastructure (similar to the temporary exceptions
......
/* -*- coding: utf-8; mode: css -*-
*
* Sphinx HTML theme customization: read the doc
*
* Please don't add any color definition here, as the theme should
* work for both normal and dark modes.
*/
/* Improve contrast and increase size for easier reading. */
body {
font-family: serif;
color: black;
font-size: 100%;
}
......@@ -16,17 +16,8 @@ h1, h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend {
font-family: sans-serif;
}
.wy-menu-vertical li.current a {
color: #505050;
}
.wy-menu-vertical li.on a, .wy-menu-vertical li.current > a {
color: #303030;
}
div[class^="highlight"] pre {
font-family: monospace;
color: black;
font-size: 100%;
}
......@@ -104,13 +95,10 @@ div[class^="highlight"] pre {
/* Menu selection and keystrokes */
span.menuselection {
color: blue;
font-family: "Courier New", Courier, monospace
}
code.kbd, code.kbd span {
color: white;
background-color: darkblue;
font-weight: bold;
font-family: "Courier New", Courier, monospace
}
......
/* -*- coding: utf-8; mode: css -*-
*
* Sphinx HTML theme customization: color settings for RTD (non-dark) theme
*
*/
/* Improve contrast and increase size for easier reading. */
body {
color: black;
}
.wy-menu-vertical li.current a {
color: #505050;
}
.wy-menu-vertical li.on a, .wy-menu-vertical li.current > a {
color: #303030;
}
div[class^="highlight"] pre {
color: black;
}
@media screen {
/* Menu selection and keystrokes */
span.menuselection {
color: blue;
}
code.kbd, code.kbd span {
color: white;
background-color: darkblue;
}
}
......@@ -271,18 +271,29 @@ def get_c_namespace(app, docname):
def auto_markup(app, doctree, name):
global c_namespace
c_namespace = get_c_namespace(app, name)
def text_but_not_a_reference(node):
# The nodes.literal test catches ``literal text``, its purpose is to
# avoid adding cross-references to functions that have been explicitly
# marked with cc:func:.
if not isinstance(node, nodes.Text) or isinstance(node.parent, nodes.literal):
return False
child_of_reference = False
parent = node.parent
while parent:
if isinstance(parent, nodes.Referential):
child_of_reference = True
break
parent = parent.parent
return not child_of_reference
#
# This loop could eventually be improved on. Someday maybe we
# want a proper tree traversal with a lot of awareness of which
# kinds of nodes to prune. But this works well for now.
#
# The nodes.literal test catches ``literal text``, its purpose is to
# avoid adding cross-references to functions that have been explicitly
# marked with cc:func:.
#
for para in doctree.traverse(nodes.paragraph):
for node in para.traverse(nodes.Text):
if not isinstance(node.parent, nodes.literal):
for node in para.traverse(condition=text_but_not_a_reference):
node.parent.replace(node, markup_refs(name, app, node))
def setup(app):
......
......@@ -104,7 +104,7 @@ class KernelCmd(Directive):
return nodeList
def runCmd(self, cmd, **kwargs):
u"""Run command ``cmd`` and return it's stdout as unicode."""
u"""Run command ``cmd`` and return its stdout as unicode."""
try:
proc = subprocess.Popen(
......
......@@ -106,7 +106,7 @@ class KernelFeat(Directive):
return nodeList
def runCmd(self, cmd, **kwargs):
u"""Run command ``cmd`` and return it's stdout as unicode."""
u"""Run command ``cmd`` and return its stdout as unicode."""
try:
proc = subprocess.Popen(
......
......@@ -131,9 +131,7 @@ Ftrace Histogram Options
Since it is too long to write a histogram action as a string for per-event
action option, there are tree-style options under per-event 'hist' subkey
for the histogram actions. For the detail of the each parameter,
please read the event histogram document [3]_.
.. [3] See :ref:`Documentation/trace/histogram.rst <histogram>`
please read the event histogram document (Documentation/trace/histogram.rst)
ftrace.[instance.INSTANCE.]event.GROUP.EVENT.hist.[N.]keys = KEY1[, KEY2[...]]
Set histogram key parameters. (Mandatory)
......
......@@ -22,13 +22,14 @@ Linux PCI总线子系统
:numbered:
pci
Todolist:
pciebus-howto
pci-iov-howto
msi-howto
sysfs-pci
Todolist:
acpi-info
pci-error-recovery
pcieaer-howto
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/PCI/msi-howto.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
===========
MSI驱动指南
===========
:作者: Tom L Nguyen; Martine Silbermann; Matthew Wilcox
:版权: 2003, 2008 Intel Corporation
关于本指南
==========
本指南介绍了消息标记中断(MSI)的基本知识,使用MSI相对于传统中断机制的优势,如何
改变你的驱动程序以使用MSI或MSI-X,以及在设备不支持MSI时可以尝试的一些基本诊断方法。
什么是MSI?
==========
信息信号中断是指从设备写到一个特殊的地址,导致CPU接收到一个中断。
MSI能力首次在PCI 2.2中规定,后来在PCI 3.0中得到增强,允许对每个中断进行单独屏蔽。
MSI-X功能也随着PCI 3.0被引入。它比MSI支持每个设备更多的中断,并允许独立配置中断。
设备可以同时支持MSI和MSI-X,但一次只能启用一个。
为什么用MSI?
============
有三个原因可以说明为什么使用MSI比传统的基于针脚的中断有优势。
基于针脚的PCI中断通常在几个设备之间共享。为了支持这一点,内核必须调用每个与中断相
关的中断处理程序,这导致了整个系统性能的降低。MSI从不共享,所以这个问题不会出现。
当一个设备将数据写入内存,然后引发一个基于引脚的中断时,有可能在所有的数据到达内存
之前,中断就已经到达了(这在PCI-PCI桥后面的设备中变得更有可能)。为了确保所有的数
据已经到达内存中,中断处理程序必须在引发中断的设备上读取一个寄存器。PCI事务排序规
则要求所有的数据在返回寄存器的值之前到达内存。使用MSI可以避免这个问题,因为中断产
生的写入不能通过数据写入,所以当中断发生时,驱动程序知道所有的数据已经到达内存中。
PCI设备每个功能只能支持一个基于引脚的中断。通常情况下,驱动程序必须查询设备以找出
发生了什么事件,这就减慢了对常见情况的中断处理。有了MSI,设备可以支持更多的中断,
允许每个中断被专门用于不同的目的。一种可能的设计是给不经常发生的情况(如错误)提供
自己的中断,这使得驱动程序可以更有效地处理正常的中断处理路径。其他可能的设计包括给
网卡的每个数据包队列或存储控制器的每个端口提供一个中断。
如何使用MSI
===========
PCI设备被初始化为使用基于引脚的中断。设备驱动程序必须将设备设置为使用MSI或MSI-X。
并非所有的机器都能正确地支持MSI,对于这些机器,下面描述的API将简单地失败,设备将
继续使用基于引脚的中断。
加入内核对MSI的支持
-------------------
为了支持MSI或MSI-X,内核在构建时必须启用CONFIG_PCI_MSI选项。这个选项只在某些架
构上可用,而且它可能取决于其他一些选项的设置。例如,在x86上,你必须同时启用X86_UP_APIC
或SMP,才能看到CONFIG_PCI_MSI选项。
使用MSI
-------
大部分沉重的工作是在PCI层为驱动程序完成的。驱动程序只需要请求PCI层为这个设备设置
MSI功能。
要自动使用MSI或MSI-X中断向量,请使用以下函数::
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags);
它为一个PCI设备分配最多至max_vecs的中断向量。它返回分配的向量数量或一个负的错误。
如果设备对最小数量的向量有要求,驱动程序可以传递一个min_vecs参数,设置为这个限制,
如果PCI核不能满足最小数量的向量,将返回-ENOSPC。
flags参数用来指定设备和驱动程序可以使用哪种类型的中断(PCI_IRQ_LEGACY, PCI_IRQ_MSI,
PCI_IRQ_MSIX)。一个方便的短语(PCI_IRQ_ALL_TYPES)也可以用来要求任何可能的中断类型。
如果PCI_IRQ_AFFINITY标志被设置,pci_alloc_irq_vectors()将把中断分散到可用的CPU上。
要获得传递给require_irq()和free_irq()的Linux IRQ号码和向量,请使用以下函数::
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
在删除设备之前,应使用以下功能释放任何已分配的资源::
void pci_free_irq_vectors(struct pci_dev *dev);
如果一个设备同时支持MSI-X和MSI功能,这个API将优先使用MSI-X,而不是MSI。MSI-X支
持1到2048之间的任何数量的中断。相比之下,MSI被限制为最多32个中断(而且必须是2的幂)。
此外,MSI中断向量必须连续分配,所以系统可能无法为MSI分配像MSI-X那样多的向量。在一
些平台上,MSI中断必须全部针对同一组CPU,而MSI-X中断可以全部针对不同的CPU。
如果一个设备既不支持MSI-X,也不支持MSI,它就会退回到一个传统的IRQ向量。
MSI或MSI-X中断的典型用法是分配尽可能多的向量,可能达到设备支持的极限。如果nvec大于
设备支持的数量,它将自动被限制在支持的限度内,所以没有必要事先查询支持的向量的数量。::
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES)
if (nvec < 0)
goto out_err;
如果一个驱动程序不能或不愿意处理可变数量的MSI中断,它可以要求一个特定数量的中断,将该
数量作为“min_vecs“和“max_vecs“参数传递给pci_alloc_irq_vectors()函数。::
ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
if (ret < 0)
goto out_err;
上述请求类型的最臭名昭著的例子是为一个设备启用单一的MSI模式。它可以通过传递两个1作为
'min_vecs'和'max_vecs'来实现::
ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
if (ret < 0)
goto out_err;
一些设备可能不支持使用传统的线路中断,在这种情况下,驱动程序可以指定只接受MSI或MSI-X。::
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
if (nvec < 0)
goto out_err;
传统API
-----------
以下用于启用和禁用MSI或MSI-X中断的旧API不应该在新代码中使用::
pci_enable_msi() /* deprecated */
pci_disable_msi() /* deprecated */
pci_enable_msix_range() /* deprecated */
pci_enable_msix_exact() /* deprecated */
pci_disable_msix() /* deprecated */
此外,还有一些API来提供支持的MSI或MSI-X向量的数量:pci_msi_vec_count()和
pci_msix_vec_count()。一般来说,应该避免使用这些方法,而是让pci_alloc_irq_vectors()
来限制向量的数量。如果你对向量的数量有合法的特殊用例,我们可能要重新审视这个决定,
并增加一个pci_nr_irq_vectors()助手,透明地处理MSI和MSI-X。
使用MSI时需要考虑的因素
-----------------------
自旋锁
~~~~~~
大多数设备驱动程序都有一个每的自旋锁,在中断处理程序中被占用。对于基于引脚的中断
或单一的MSI,没有必要禁用中断(Linux保证同一中断不会被重新输入)。如果一个设备
使用多个中断,驱动程序必须在锁被持有的时候禁用中断。如果设备发出一个不同的中断,
驱动程序将死锁,试图递归地获取自旋锁。这种死锁可以通过使用spin_lock_irqsave()
或spin_lock_irq()来避免,它们可以禁用本地中断并获取锁(见《不可靠的锁定指南》)。
如何判断一个设备上是否启用了MSI/MSI-X
-------------------------------------
使用“lspci -v“(以root身份)可能会显示一些具有“MSI“、“Message Signalled Interrupts“
或“MSI-X“功能的设备。这些功能中的每一个都有一个“启用“标志,后面是“+“(启用)
或“-“(禁用)。
MSI特性
=======
众所周知,一些PCI芯片组或设备不支持MSI。PCI协议栈提供了三种禁用MSI的方法:
1. 全局的
2. 在一个特定的桥后面的所有设备上
3. 在单一设备上
全局禁用MSI
-----------
一些主控芯片组根本无法正确支持MSI。如果我们幸运的话,制造商知道这一点,并在
ACPI FADT表中指明了它。在这种情况下,Linux会自动禁用MSI。有些板卡在表中没
有包括这一信息,因此我们必须自己检测它们。完整的列表可以在drivers/pci/quirks.c
中的quirk_disable_all_msi()函数附近找到。
如果你有一块有MSI问题的板子,你可以在内核命令行中传递pci=nomsi来禁用所有设
备上的MSI。你最好把问题报告给linux-pci@vger.kernel.org,包括完整的
“lspci -v“,这样我们就可以把这些怪癖添加到内核中。
禁用桥下的MSI
-------------
一些PCI桥接器不能在总线之间正确地路由MSI。在这种情况下,必须在桥接器后面的所
有设备上禁用MSI。
一些桥接器允许你通过改变PCI配置空间的一些位来启用MSI(特别是Hypertransport
芯片组,如nVidia nForce和Serverworks HT2000)。与主机芯片组一样,Linux大
多知道它们,如果可以的话,会自动启用MSI。如果你有一个Linux不知道的网桥,你可以
用你知道的任何方法在配置空间中启用MSI,然后通过以下方式在该网桥上启用MSI::
echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
其中$bridge是你所启用的桥的PCI地址(例如0000:00:0e.0)。
要禁用MSI,请回显0而不是1。改变这个值应该谨慎进行,因为它可能会破坏这个桥下面所
有设备的中断处理。
同样,请通知 linux-pci@vger.kernel.org 任何需要特殊处理的桥。
在单一设备上关闭MSIs
--------------------
众所周知,有些设备的MSI实现是有问题的。通常情况下,这是在单个设备驱动程序中处理的,
但偶尔也有必要用一个古怪的方法来处理。一些驱动程序有一个选项可以禁用MSI的使用。虽然
这对驱动程序的作者来说是一个方便的变通办法,但这不是一个好的做法,不应该被模仿。
寻找设备上MSI被禁用的原因
-------------------------
从以上三个部分,你可以看到有许多原因导致MSI没有在某个设备上被启用。你的第一步应该是
仔细检查你的dmesg以确定你的机器是否启用了MSI。你还应该检查你的.config以确定你已经
启用了CONFIG_PCI_MSI。
然后,“lspci -t“给出一个设备上面的网列表。读取 ``/sys/bus/pci/devices/*/msi_bus``
将告诉你MSI是否被启用(1)或禁用(0)。如果在任何属于PCI根和设备之间的桥的msi_bus
文件中发现0,说明MSI被禁用。
也需要检查设备驱动程序,看它是否支持MSI。例如,它可能包含对带有PCI_IRQ_MSI或
PCI_IRQ_MSIX标志的pci_alloc_irq_vectors()的调用。
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/PCI/pci-iov-howto.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
.. _cn_pci-iov-howto:
==========================
PCI Express I/O 虚拟化指南
==========================
:版权: |copy| 2009 Intel Corporation
:作者: - Yu Zhao <yu.zhao@intel.com>
- Donald Dutile <ddutile@redhat.com>
概述
====
什么是SR-IOV
------------
单根I/O虚拟化(SR-IOV)是一种PCI Express扩展功能,它使一个物理设备显示为多个
虚拟设备。物理设备被称为物理功能(PF),而虚拟设备被称为虚拟功能(VF)。VF的分
配可以由PF通过封装在该功能中的寄存器动态控制。默认情况下,该功能未被启用,PF表
现为传统的PCIe设备。一旦开启,每个VF的PCI配置空间都可以通过自己的总线、设备和
功能编号(路由ID)来访问。而且每个VF也有PCI内存空间,用于映射其寄存器集。VF设
备驱动程序对寄存器集进行操作,这样它就可以发挥功能,并作为一个真正的现有PCI设备
出现。
使用指南
========
我怎样才能启用SR-IOV功能
------------------------
有多种方法可用于SR-IOV的启用。在第一种方法中,设备驱动(PF驱动)将通过SR-IOV
核心提供的API控制功能的启用和禁用。如果硬件具有SR-IOV能力,加载其PF驱动器将启
用它和与PF相关的所有VF。一些PF驱动需要设置一个模块参数,以确定要启用的VF的数量。
在第二种方法中,对sysfs文件sriov_numvfs的写入将启用和禁用与PCIe PF相关的VF。
这种方法实现了每个PF的VF启用/禁用值,而第一种方法则适用于同一设备的所有PF。此外,
PCI SRIOV核心支持确保启用/禁用操作是有效的,以减少同一检查在多个驱动程序中的重
复,例如,如果启用VF,检查numvfs == 0,确保numvfs <= totalvfs。
第二种方法是对新的/未来的VF设备的推荐方法。
我怎样才能使用虚拟功能
----------------------
在内核中,VF被视为热插拔的PCI设备,所以它们应该能够以与真正的PCI设备相同的方式
工作。VF需要的设备驱动与普通PCI设备的驱动相同。
开发者指南
==========
SR-IOV API
----------
用来开启SR-IOV功能:
(a) 对于第一种方法,在驱动程序中::
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
nr_virtfn'是要启用的VF的编号。
(b) 对于第二种方法,从sysfs::
echo 'nr_virtfn' > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
用来关闭SR-IOV功能:
(a) 对于第一种方法,在驱动程序中::
void pci_disable_sriov(struct pci_dev *dev);
(b) 对于第二种方法,从sysfs::
echo 0 > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
要想通过主机上的兼容驱动启用自动探测VF,在启用SR-IOV功能之前运行下面的命令。这
是默认的行为。
::
echo 1 > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
要禁止主机上的兼容驱动自动探测VF,请在启用SR-IOV功能之前运行以下命令。更新这个
入口不会影响已经被探测的VF。
::
echo 0 > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
用例
----
下面的代码演示了SR-IOV API的用法
::
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
pci_enable_sriov(dev, NR_VIRTFN);
...
return 0;
}
static void dev_remove(struct pci_dev *dev)
{
pci_disable_sriov(dev);
...
}
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
{
...
return 0;
}
static int dev_resume(struct pci_dev *dev)
{
...
return 0;
}
static void dev_shutdown(struct pci_dev *dev)
{
...
}
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
{
if (numvfs > 0) {
...
pci_enable_sriov(dev, numvfs);
...
return numvfs;
}
if (numvfs == 0) {
....
pci_disable_sriov(dev);
...
return 0;
}
}
static struct pci_driver dev_driver = {
.name = "SR-IOV Physical Function driver",
.id_table = dev_id_table,
.probe = dev_probe,
.remove = dev_remove,
.suspend = dev_suspend,
.resume = dev_resume,
.shutdown = dev_shutdown,
.sriov_configure = dev_sriov_configure,
};
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/PCI/pciebus-howto.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
.. _cn_pciebus-howto:
===========================
PCI Express端口总线驱动指南
===========================
:作者: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
:版权: |copy| 2004 Intel Corporation
关于本指南
==========
本指南介绍了PCI Express端口总线驱动程序的基本知识,并提供了如何使服务驱
动程序在PCI Express端口总线驱动程序中注册/取消注册的介绍。
什么是PCI Express端口总线驱动程序
=================================
一个PCI Express端口是一个逻辑的PCI-PCI桥结构。有两种类型的PCI Express端
口:根端口和交换端口。根端口从PCI Express根综合体发起一个PCI Express链接,
交换端口将PCI Express链接连接到内部逻辑PCI总线。交换机端口,其二级总线代表
交换机的内部路由逻辑,被称为交换机的上行端口。交换机的下行端口是从交换机的内部
路由总线桥接到代表来自PCI Express交换机的下游PCI Express链接的总线。
一个PCI Express端口可以提供多达四个不同的功能,在本文中被称为服务,这取决于
其端口类型。PCI Express端口的服务包括本地热拔插支持(HP)、电源管理事件支持(PME)、
高级错误报告支持(AER)和虚拟通道支持(VC)。这些服务可以由一个复杂的驱动程序
处理,也可以单独分布并由相应的服务驱动程序处理。
为什么要使用PCI Express端口总线驱动程序?
=========================================
在现有的Linux内核中,Linux设备驱动模型允许一个物理设备只由一个驱动处理。
PCI Express端口是一个具有多个不同服务的PCI-PCI桥设备。为了保持一个干净和简
单的解决方案,每个服务都可以有自己的软件服务驱动。在这种情况下,几个服务驱动将
竞争一个PCI-PCI桥设备。例如,如果PCI Express根端口的本机热拔插服务驱动程序
首先被加载,它就会要求一个PCI-PCI桥根端口。因此,内核不会为该根端口加载其他服
务驱动。换句话说,使用当前的驱动模型,不可能让多个服务驱动同时加载并运行在
PCI-PCI桥设备上。
为了使多个服务驱动程序同时运行,需要有一个PCI Express端口总线驱动程序,它管
理所有填充的PCI Express端口,并根据需要将所有提供的服务请求分配给相应的服务
驱动程序。下面列出了使用PCI Express端口总线驱动程序的一些关键优势:
- 允许在一个PCI-PCI桥接端口设备上同时运行多个服务驱动。
- 允许以独立的分阶段方式实施服务驱动程序。
- 允许一个服务驱动程序在多个PCI-PCI桥接端口设备上运行。
- 管理和分配PCI-PCI桥接端口设备的资源给要求的服务驱动程序。
配置PCI Express端口总线驱动程序与服务驱动程序
=============================================
将PCI Express端口总线驱动支持纳入内核
-------------------------------------
包括PCI Express端口总线驱动程序取决于内核配置中是否包含PCI Express支持。当内核
中的PCI Express支持被启用时,内核将自动包含PCI Express端口总线驱动程序作为内核
驱动程序。
启用服务驱动支持
----------------
PCI设备驱动是基于Linux设备驱动模型实现的。所有的服务驱动都是PCI设备驱动。如上所述,
一旦内核加载了PCI Express端口总线驱动程序,就不可能再加载任何服务驱动程序。为了满
足PCI Express端口总线驱动程序模型,需要对现有的服务驱动程序进行一些最小的改变,其
对现有的服务驱动程序的功能没有影响。
服务驱动程序需要使用下面所示的两个API,将其服务注册到PCI Express端口总线驱动程
序中(见第5.2.1和5.2.2节)。在调用这些API之前,服务驱动程序必须初始化头文件
/include/linux/pcieport_if.h中的pcie_port_service_driver数据结构。如果不这
样做,将导致身份不匹配,从而使PCI Express端口总线驱动程序无法加载服务驱动程序。
pcie_port_service_register
~~~~~~~~~~~~~~~~~~~~~~~~~~
::
int pcie_port_service_register(struct pcie_port_service_driver *new)
这个API取代了Linux驱动模型的 pci_register_driver API。一个服务驱动应该总是在模
块启动时调用 pcie_port_service_register。请注意,在服务驱动被加载后,诸如
pci_enable_device(dev) 和 pci_set_master(dev) 的调用不再需要,因为这些调用由
PCI端口总线驱动执行。
pcie_port_service_unregister
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
pcie_port_service_unregister取代了Linux驱动模型的pci_unregister_driver。当一
个模块退出时,它总是被服务驱动调用。
示例代码
~~~~~~~~
下面是服务驱动代码示例,用于初始化端口服务的驱动程序数据结构。
::
static struct pcie_port_service_id service_id[] = { {
.vendor = PCI_ANY_ID,
.device = PCI_ANY_ID,
.port_type = PCIE_RC_PORT,
.service_type = PCIE_PORT_SERVICE_AER,
}, { /* end: all zeroes */ }
};
static struct pcie_port_service_driver root_aerdrv = {
.name = (char *)device_name,
.id_table = &service_id[0],
.probe = aerdrv_load,
.remove = aerdrv_unload,
.suspend = aerdrv_suspend,
.resume = aerdrv_resume,
};
下面是一个注册/取消注册服务驱动的示例代码。
::
static int __init aerdrv_service_init(void)
{
int retval = 0;
retval = pcie_port_service_register(&root_aerdrv);
if (!retval) {
/*
* FIX ME
*/
}
return retval;
}
static void __exit aerdrv_service_exit(void)
{
pcie_port_service_unregister(&root_aerdrv);
}
module_init(aerdrv_service_init);
module_exit(aerdrv_service_exit);
可能的资源冲突
==============
由于PCI-PCI桥接端口设备的所有服务驱动被允许同时运行,下面列出了一些可能的资源冲突和
建议的解决方案。
MSI 和 MSI-X 向量资源
---------------------
一旦设备上的MSI或MSI-X中断被启用,它就会一直保持这种模式,直到它们再次被禁用。由于同
一个PCI-PCI桥接端口的服务驱动程序共享同一个物理设备,如果一个单独的服务驱动程序启用或
禁用MSI/MSI-X模式,可能会导致不可预知的行为。
为了避免这种情况,所有的服务驱动程序都不允许在其设备上切换中断模式。PCI Express端口
总线驱动程序负责确定中断模式,这对服务驱动程序来说应该是透明的。服务驱动程序只需要知道
分配给结构体pcie_device的字段irq的向量IRQ,当PCI Express端口总线驱动程序探测每
个服务驱动程序时,它被传入。服务驱动应该使用(struct pcie_device*)dev->irq来调用
request_irq/free_irq。此外,中断模式被存储在struct pcie_device的interrupt_mode
字段中。
PCI内存/IO映射的区域
--------------------
PCI Express电源管理(PME)、高级错误报告(AER)、热插拔(HP)和虚拟通道(VC)的服务
驱动程序访问PCI Express端口的PCI配置空间。在所有情况下,访问的寄存器是相互独立的。这
个补丁假定所有的服务驱动程序都会表现良好,不会覆盖其他服务驱动程序的配置设置。
PCI配置寄存器
-------------
每个服务驱动都在自己的功能结构体上运行PCI配置操作,除了PCI Express功能结构体,其中根控制
寄存器和设备控制寄存器是在PME和AER之间共享。这个补丁假定所有的服务驱动都会表现良好,不会
覆盖其他服务驱动的配置设置。
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/PCI/sysfs-pci.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
========================
通过sysfs访问PCI设备资源
========================
sysfs,通常挂载在/sys,在支持它的平台上提供对PCI资源的访问。例如,一个特定的总线可能看起
来像这样::
/sys/devices/pci0000:17
|-- 0000:17:00.0
| |-- class
| |-- config
| |-- device
| |-- enable
| |-- irq
| |-- local_cpus
| |-- remove
| |-- resource
| |-- resource0
| |-- resource1
| |-- resource2
| |-- revision
| |-- rom
| |-- subsystem_device
| |-- subsystem_vendor
| `-- vendor
`-- ...
最上面的元素描述了PCI域和总线号码。在这种情况下,域号是0000,总线号是17(两个值都是十六进制)。
这个总线在0号插槽中包含一个单一功能的设备。为了方便起见,我们复制了域和总线的编号。在设备目录
下有几个文件,每个文件都有自己的功能。
=================== =====================================================
文件 功能
=================== =====================================================
class PCI级别 (ascii, ro)
config PCI配置空间 (binary, rw)
device PCI设备 (ascii, ro)
enable 设备是否被启用 (ascii, rw)
irq IRQ编号 (ascii, ro)
local_cpus 临近CPU掩码(cpumask, ro)
remove 从内核的列表中删除设备 (ascii, wo)
resource PCI资源主机地址 (ascii, ro)
resource0..N PCI资源N,如果存在的话 (binary, mmap, rw\ [1]_)
resource0_wc..N_wc PCI WC映射资源N,如果可预取的话 (binary, mmap)
revision PCI修订版 (ascii, ro)
rom PCI ROM资源,如果存在的话 (binary, ro)
subsystem_device PCI子系统设备 (ascii, ro)
subsystem_vendor PCI子系统供应商 (ascii, ro)
vendor PCI供应商 (ascii, ro)
=================== =====================================================
::
ro - 只读文件
rw - 文件是可读和可写的
wo - 只写文件
mmap - 文件是可移动的
ascii - 文件包含ascii文本
binary - 文件包含二进制数据
cpumask - 文件包含一个cpumask类型的
.. [1] rw 仅适用于 IORESOURCE_IO(I/O 端口)区域
只读文件是信息性的,对它们的写入将被忽略,但 "rom "文件除外。可写文件可以用来在设备上执
行操作(例如,改变配置空间,分离设备)。 mmapable文件可以通过偏移量为0的文件的mmap获得,
可以用来从用户空间进行实际的设备编程。注意,有些平台不支持某些资源的mmapping,所以一定要
检查任何尝试的mmap的返回值。其中最值得注意的是I/O端口资源,它也提供读/写访问。
enable "文件提供了一个计数器,表明设备已经被启用了多少次。如果'enable'文件目前返回'4',
而一个'1'被呼入它,它将返回'5'。向它呼入一个'0'会减少计数。不过,即使它返回到0,一些初始
化可能也不会被逆转。
rom "文件很特别,因为它提供了对设备ROM文件的只读访问,如果有的话。然而,它在默认情况下是
禁用的,所以应用程序应该在尝试读取调用之前将字符串 "1 "写入该文件以启用它,并在访问之后将
"0 "写入该文件以禁用它。请注意,设备必须被启用,才能成功返回数据。如果驱动没有被绑定到设备
上,可以使用上面提到的 "enable "文件将其启用。
remove "文件是用来移除PCI设备的,通过向该文件写入一个非零的整数。这并不涉及任何形式的热插
拔功能,例如关闭设备的电源。该设备被从内核的PCI设备列表中移除,它的sysfs目录被移除,并且该
设备将被从任何连接到它的驱动程序中移除。移除PCI根总线是不允许的。
通过sysfs访问原有资源
---------------------
如果底层平台支持的话,传统的I/O端口和ISA内存资源也会在sysfs中提供。它们位于PCI类的层次结构
中,例如::
/sys/class/pci_bus/0000:17/
|-- bridge -> ../../../devices/pci0000:17
|-- cpuaffinity
|-- legacy_io
`-- legacy_mem
legacy_io文件是一个读/写文件,可以被应用程序用来做传统的端口I/O。应用程序应该打开该文件,寻
找所需的端口(例如0x3e8),并进行1、2或4字节的读或写。legacy_mem文件应该被mmapped,其偏移
量与所需的内存偏移量相对应,例如0xa0000用于VGA帧缓冲器。然后,应用程序可以简单地解除引用返回
的指针(当然是在检查了错误之后)来访问遗留内存空间。
支持新平台上的PCI访问
---------------------
为了支持上述的PCI资源映射,Linux平台代码最好定义ARCH_GENERIC_PCI_MMAP_RESOURCE并使用该
功能的通用实现。为了支持通过/proc/bus/pci中的文件实现mmap()的历史接口,平台也可以设置
HAVE_PCI_MMAP。
另外,设置了 HAVE_PCI_MMAP 的平台可以提供他们自己的 pci_mmap_page_range() 实现,而不是定
义 ARCH_GENERIC_PCI_MMAP_RESOURCE。
支持PCI资源的写组合映射的平台必须定义arch_can_pci_mmap_wc(),当写组合被允许时,在运行时应
评估为非零。支持I/O资源映射的平台同样定义arch_can_pci_mmap_io()。
遗留资源由HAVE_PCI_LEGACY定义保护。希望支持遗留功能的平台应该定义它并提供 pci_legacy_read,
pci_legacy_write 和 pci_mmap_legacy_page_range 函数。
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/accounting/delay-accounting.rst
:Translator: Yang Yang <yang.yang29@zte.com.cn>
========
延时计数
========
任务在等待某些内核资源可用时,会造成延时。例如一个可运行的任务可能会等待
一个空闲CPU来运行。
基于每任务的延时计数功能度量由以下情况造成的任务延时:
a) 等待一个CPU(任务为可运行)
b) 完成由该任务发起的块I/O同步请求
c) 页面交换
d) 内存回收
并将这些统计信息通过taskstats接口提供给用户空间。
这些延时信息为适当的调整任务CPU优先级、io优先级、rss限制提供反馈。重要任务
长期延时,表示可能需要提高其相关优先级。
通过使用taskstats接口,本功能还可提供一个线程组(对应传统Unix进程)所有任务
(或线程)的总延时统计信息。此类汇总往往是需要的,由内核来完成更加高效。
用户空间的实体,特别是资源管理程序,可将延时统计信息汇总到任意组中。为实现
这一点,任务的延时统计信息在其生命周期内和退出时皆可获取,从而确保可进行
连续、完整的监控。
接口
----
延时计数使用taskstats接口,该接口由本目录另一个单独的文档详细描述。Taskstats
向用户态返回一个通用数据结构,对应每pid或每tgid的统计信息。延时计数功能填写
该数据结构的特定字段。见
include/linux/taskstats.h
其描述了延时计数相关字段。系统通常以计数器形式返回 CPU、同步块 I/O、交换、内存
回收等的累积延时。
取任务某计数器两个连续读数的差值,将得到任务在该时间间隔内等待对应资源的总延时。
当任务退出时,内核会将包含每任务的统计信息发送给用户空间,而无需额外的命令。
若其为线程组最后一个退出的任务,内核还会发送每tgid的统计信息。更多详细信息见
taskstats接口的描述。
tools/accounting目录中的用户空间程序getdelays.c提供了一些简单的命令,用以显示
延时统计信息。其也是使用taskstats接口的示例。
用法
----
使用以下配置编译内核::
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASKSTATS=y
延时计数在启动时默认关闭。
若需开启,在启动参数中增加::
delayacct
本文后续的说明基于延时计数已开启。也可在系统运行时,使用sysctl的
kernel.task_delayacct进行开关。注意,只有在启用延时计数后启动的
任务才会有相关信息。
系统启动后,使用类似getdelays.c的工具获取任务或线程组(tgid)的延时信息。
getdelays命令的一般格式::
getdelays [-t tgid] [-p pid] [-c cmd...]
获取pid为10的任务从系统启动后的延时信息::
# ./getdelays -p 10
(输出信息和下例相似)
获取所有tgid为5的任务从系统启动后的总延时信息::
# ./getdelays -t 5
CPU count real total virtual total delay total
7876 92005750 100000000 24001500
IO count delay total
0 0
SWAP count delay total
0 0
RECLAIM count delay total
0 0
获取指定简单命令运行时的延时信息::
# ./getdelays -c ls /
bin data1 data3 data5 dev home media opt root srv sys usr
boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
CPU count real total virtual total delay total
6 4000250 4000000 0
IO count delay total
0 0
SWAP count delay total
0 0
RECLAIM count delay total
0 0
......@@ -15,11 +15,11 @@
.. toctree::
:maxdepth: 1
delay-accounting
psi
taskstats
Todolist:
cgroupstats
delay-accounting
taskstats
taskstats-struct
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/accounting/taskstats.rst
:Translator: Yang Yang <yang.yang29@zte.com.cn>
================
每任务的统计接口
================
Taskstats是一个基于netlink的接口,用于从内核向用户空间发送每任务及每进程的
统计信息。
Taskstats设计目的:
- 在任务生命周期内和退出时高效的提供统计信息
- 统一不同计数子系统的接口
- 支持未来计数系统的扩展
术语
----
“pid”、“tid”、“任务”互换使用,用于描述由struct task_struct定义的标准
Linux任务。“每pid的统计数据”等价于“每任务的统计数据”。
“tgid”、“进程”、“线程组”互换使用,用于描述共享mm_struct的任务集,
也就是传统的Unix进程。尽管使用了tgid这个词,即使一个任务是线程组组长,
对它的处理也没有什么不同。只要一个进程还有任何归属它的任务,它就被认为
活着。
用法
----
为了在任务生命周期内获得统计信息,用户空间需打开一个单播的netlink套接字
(NETLINK_GENERIC族)然后发送指定pid或tgid的命令。响应消息中包含单个
任务的统计信息(若指定了pid)或进程所有任务汇总的统计信息(若指定了tgid)。
为了在任务退出时获取统计信息,用户空间的监听者发送一个指定cpu掩码的注册命令。
cpu掩码内的cpu上有任务退出时,每pid的统计信息将发送给注册成功的监听者。使用
cpu掩码可以限制一个监听者收到的数据,并有助于对netlink接口进行流量控制,后文
将进行更详细的解释。
如果正在退出的任务是线程组中最后一个退出的线程,额外一条包含每tgid统计信息的
记录也将发送给用户空间。后者包含线程组中所有线程(包括过去和现在)的每pid统计
信息总和。
getdelays.c是一个简单的示例,用以演示如何使用taskstats接口获取延迟统计信息。
用户可注册cpu掩码、发送命令和处理响应、监听每tid/tgid退出数据、将收到的数据
写入文件、通过增大接收缓冲区进行基本的流量控制。
接口
----
内核用户接口封装在include/linux/taskstats.h。
为避免本文档随着接口的演进而过期,本文仅给出当前版本的概要。当本文与taskstats.h
不一致时,以taskstats.h为准。
struct taskstats是每pid和每tgid数据共用的计数结构体。它是版本化的,可在内核新增
计数子系统时进行扩展。taskstats.h中定义了各字段及语义。
用户、内核空间的数据交换是属于NETLINK_GENERIC族的netlink消息,使用netlink属性
接口。消息格式如下::
+----------+- - -+-------------+-------------------+
| nlmsghdr | Pad | genlmsghdr | taskstats payload |
+----------+- - -+-------------+-------------------+
Taskstats载荷有三种类型:
1. 命令:由用户发送给内核。获取指定pid/tgid数据的命令包含一个类型为
TASKSTATS_CMD_ATTR_PID/TGID的属性,该属性包含u32的pid或tgid载荷。
pid/tgid指示用户空间要统计的任务/进程。
注册/注销获取指定cpu集上退出数据的命令包含一个类型为
TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK的属性,该属性包含cpu掩码载荷。
cpu掩码是以ascii码表示,用逗号分隔的cpu范围。例如若需监听1,2,3,5,7,8号cpu的
退出数据,cpu掩码表示为"1-3,5,7-8"。若用户空间在关闭监听套接字前忘了注销监听
的cpu集,随着时间的推移,内核会清理此监听集。但是,出于提效的目的,建议明确
执行注销。
2. 命令的应答:内核发出应答用户空间的命令。载荷有三类属性:
a) TASKSTATS_TYPE_AGGR_PID/TGID: 本属性不包含载荷,用以指示其后为被统计对象
的pig/tgid。
b) TASKSTATS_TYPE_PID/TGID:本属性的载荷为pig/tgid,其统计信息将被返回。
c) TASKSTATS_TYPE_STATS:本属性的载荷为一个struct taskstats实例。每pid和
每tgid统计信息共用该结构体。
3. 内核会在任务退出时发送新消息。其载荷包含一系列以下类型的属性:
a) TASKSTATS_TYPE_AGGR_PID:指示其后两个属性为pid+stats。
b) TASKSTATS_TYPE_PID:包含退出任务的pid。
c) TASKSTATS_TYPE_STATS:包含退出任务的每pid统计信息
d) TASKSTATS_TYPE_AGGR_TGID:指示其后两个属性为tgid+stats。
e) TASKSTATS_TYPE_TGID:包含任务所属进程的tgid
f) TASKSTATS_TYPE_STATS:包含退出任务所属进程的每tgid统计信息
每tgid的统计
------------
除了每任务的统计信息,taskstats还提供每进程的统计信息,因为资源管理通常以进程
粒度完成,并且仅在用户空间聚合任务统计信息效率低下且可能不准确(缺乏原子性)。
然而,除了每任务统计信息,在内核中维护每进程统计信息存在额外的时间和空间开销。
为解决此问题,taskstats代码将退出任务的统计信息累积到进程范围的数据结构中。
当进程最后一个任务退出时,累积的进程级数据也会发送到用户空间(与每任务数据一起)。
当用户查询每tgid数据时,内核将指定线程组中所有活动线程的统计信息相加,并添加到
该线程组的累积总数(含之前退出的线程)。
扩展taskstats
-------------
有两种方法可在未来修改内核扩展taskstats接口,以导出更多的每任务/进程统计信息:
1. 在现有struct taskstats末尾增加字段。该结构体中的版本号确保了向后兼容性。
用户空间将仅使用与其版本对应的结构体字段。
2. 定义单独的统计结构体并使用netlink属性接口返回对应的数据。由于用户空间独立
处理每个netlink属性,所以总是可以忽略其不理解类型的属性(因为使用了旧版本接口)。
在1.和2.之间进行选择,属于权衡灵活性和开销的问题。若仅需增加少数字段,那么1.是
首选方法,因为内核和用户空间无需承担处理新netlink属性的开销。但若新字段过多的
扩展现有结构体,导致不同的用户空间计数程序不必要的接收大型结构体,而对结构体
字段并不感兴趣,那么2.是值得的。
Taskstats的流量控制
-------------------
当退出任务数速率变大,监听者可能跟不上内核发送每tid/tgid退出数据的速率,而导致
数据丢失。taskstats结构体变大、cpu数量上升,都会导致这种可能性增加。
为避免统计信息丢失,用户空间应执行以下操作中至少一项:
- 增大监听者用于接收退出数据的netlink套接字接收缓存区。
- 创建更多的监听者,减少每个监听者监听的cpu数量。极端情况下可为每个cpu创建
一个监听者。用户还可考虑将监听者的cpu亲和性设置为监听cpu的子集,特别是当他们
仅监听一个cpu。
尽管采取了这些措施,若用户空间仍收到指示接收缓存区溢出的ENOBUFS错误消息,
则应采取其他措施处理数据丢失。
......@@ -133,7 +133,7 @@ Linux内核5.x版本 <http://kernel.org/>
即使只升级一个小版本,也不要跳过此步骤。每个版本中都会添加新的配置选项,
如果配置文件没有按预定设置,就会出现奇怪的问题。如果您想以最少的工作量
将现有配置升级到新版本,请使用 ``makeoldconfig`` ,它只会询问您新配置
将现有配置升级到新版本,请使用 ``make oldconfig`` ,它只会询问您新配置
选项的答案。
- 其他配置命令包括::
......@@ -161,7 +161,7 @@ Linux内核5.x版本 <http://kernel.org/>
"make ${PLATFORM}_defconfig"
使用arch/$arch/configs/${PLATFORM}_defconfig中
的默认选项值创建一个./.config文件。
用“makehelp”来获取您体系架构中所有可用平台的列表。
用“make help”来获取您体系架构中所有可用平台的列表。
"make allyesconfig"
通过尽可能将选项值设置为“y”,创建一个
......@@ -197,9 +197,10 @@ Linux内核5.x版本 <http://kernel.org/>
"make localyesconfig" 与localmodconfig类似,只是它会将所有模块选项转换
为内置(=y)。你可以同时通过LMC_KEEP保留模块。
"make kvmconfig" 为kvm客体内核支持启用其他选项。
"make kvm_guest.config"
为kvm客户机内核支持启用其他选项。
"make xenconfig" 为xen dom0客体内核支持启用其他选项。
"make xen.config" 为xen dom0客户机内核支持启用其他选项。
"make tinyconfig" 配置尽可能小的内核。
......@@ -229,7 +230,7 @@ Linux内核5.x版本 <http://kernel.org/>
请注意,您仍然可以使用此内核运行a.out用户程序。
- 执行 ``make`` 来创建压缩内核映像。如果您安装了lilo以适配内核makefile,
那么也可以进行 ``makeinstall`` ,但是您可能需要先检查特定的lilo设置。
那么也可以进行 ``make install`` ,但是您可能需要先检查特定的lilo设置。
实际安装必须以root身份执行,但任何正常构建都不需要。
无须徒然使用root身份。
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/admin-guide/cputopology.rst
:翻译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
==========================
如何通过sysfs将CPU拓扑导出
==========================
CPU拓扑信息通过sysfs导出。显示的项(属性)和某些架构的/proc/cpuinfo输出相似。它们位于
/sys/devices/system/cpu/cpuX/topology/。请阅读ABI文件:
Documentation/ABI/stable/sysfs-devices-system-cpu。
drivers/base/topology.c是体系结构中性的,它导出了这些属性。然而,die、cluster、book、
draw这些层次结构相关的文件仅在体系结构提供了下文描述的宏的条件下被创建。
对于支持这个特性的体系结构,它必须在include/asm-XXX/topology.h中定义这些宏中的一部分::
#define topology_physical_package_id(cpu)
#define topology_die_id(cpu)
#define topology_cluster_id(cpu)
#define topology_core_id(cpu)
#define topology_book_id(cpu)
#define topology_drawer_id(cpu)
#define topology_sibling_cpumask(cpu)
#define topology_core_cpumask(cpu)
#define topology_cluster_cpumask(cpu)
#define topology_die_cpumask(cpu)
#define topology_book_cpumask(cpu)
#define topology_drawer_cpumask(cpu)
``**_id macros`` 的类型是int。
``**_cpumask macros`` 的类型是 ``(const) struct cpumask *`` 。后者和恰当的
``**_siblings`` sysfs属性对应(除了topology_sibling_cpumask(),它和thread_siblings
对应)。
为了在所有体系结构上保持一致,include/linux/topology.h提供了上述所有宏的默认定义,以防
它们未在include/asm-XXX/topology.h中定义:
1) topology_physical_package_id: -1
2) topology_die_id: -1
3) topology_cluster_id: -1
4) topology_core_id: 0
5) topology_book_id: -1
6) topology_drawer_id: -1
7) topology_sibling_cpumask: 仅入参CPU
8) topology_core_cpumask: 仅入参CPU
9) topology_cluster_cpumask: 仅入参CPU
10) topology_die_cpumask: 仅入参CPU
11) topology_book_cpumask: 仅入参CPU
12) topology_drawer_cpumask: 仅入参CPU
此外,CPU拓扑信息由/sys/devices/system/cpu提供,包含下述文件。输出对应的内部数据源放在
方括号("[]")中。
=========== ==================================================================
kernel_max: 内核配置允许的最大CPU下标值。[NR_CPUS-1]
offline: 由于热插拔移除或者超过内核允许的CPU上限(上文描述的kernel_max)
导致未上线的CPU。[~cpu_online_mask + cpus >= NR_CPUS]
online: 在线的CPU,可供调度使用。[cpu_online_mask]
possible: 已被分配资源的CPU,如果它们CPU实际存在,可以上线。
[cpu_possible_mask]
present: 被系统识别实际存在的CPU。[cpu_present_mask]
=========== ==================================================================
上述输出的格式和cpulist_parse()兼容[参见 <linux/cpumask.h>]。下面给些例子。
在本例中,系统中有64个CPU,但是CPU 32-63超过了kernel_max值,因为NR_CPUS配置项是32,
取值范围被限制为0..31。此外注意CPU2和4-31未上线,但是可以上线,因为它们同时存在于
present和possible::
kernel_max: 31
offline: 2,4-31,32-63
online: 0-1,3
possible: 0-31
present: 0-31
在本例中,NR_CPUS配置项是128,但内核启动时设置possible_cpus=144。系统中有4个CPU,
CPU2被手动设置下线(也是唯一一个可以上线的CPU)::
kernel_max: 127
offline: 2,4-127,128-143
online: 0-1,3
possible: 0-127
present: 0-3
阅读Documentation/core-api/cpu_hotplug.rst可了解开机参数possible_cpus=NUM,同时还
可以了解各种cpumask的信息。
......@@ -65,6 +65,7 @@ Todolist:
clearing-warn-once
cpu-load
cputopology
lockup-watchdogs
unicode
sysrq
......@@ -84,7 +85,6 @@ Todolist:
cgroup-v1/index
cgroup-v2
cifs/index
cputopology
dell_rbu
device-mapper/index
edid
......
......@@ -7,7 +7,9 @@
司延腾 Yanteng Si <siyanteng@loongson.cn>
.. _cn_core.rst:
:校译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
====================================
CPUFreq核心和CPUFreq通知器的通用说明
......@@ -29,10 +31,10 @@ CPUFreq核心和CPUFreq通知器的通用说明
======================
cpufreq核心代码位于drivers/cpufreq/cpufreq.c中。这些cpufreq代码为CPUFreq架构的驱
动程序(那些操作硬件切换频率的代码)以及 "通知器 "提供了一个标准化的接口。
这些是设备驱动程序或需要了解策略变化的其它内核部分(如 ACPI 热量管理)或所有频率更改(除
计时代码外),甚至需要强制确定速度限制的通知器(如 ARM 架构上的 LCD 驱动程序)
此外, 内核 "常数" loops_per_jiffy会根据频率变化而更新。
动程序(那些执行硬件频率切换的代码)以及 "通知器" 提供了一个标准化的接口。
包括设备驱动程序;需要了解策略变化(如 ACPI 热量管理),或所有频率变化(如计时代码),
甚至需要强制限制为指定频率(如 ARM 架构上的 LCD 驱动程序)的其它内核组件
此外,内核 "常数" loops_per_jiffy 会根据频率变化而更新。
cpufreq策略的引用计数由 cpufreq_cpu_get 和 cpufreq_cpu_put 来完成,以确保 cpufreq 驱
动程序被正确地注册到核心中,并且驱动程序在 cpufreq_put_cpu 被调用之前不会被卸载。这也保证
......@@ -41,7 +43,7 @@ cpufreq策略的引用计数由 cpufreq_cpu_get 和 cpufreq_cpu_put 来完成,
2. CPUFreq 通知器
====================
CPUFreq通知器符合标准的内核通知器接口。
CPUFreq通知器遵循标准的内核通知器接口。
关于通知器的细节请参阅 linux/include/linux/notifier.h。
这里有两个不同的CPUfreq通知器 - 策略通知器和转换通知器。
......@@ -69,20 +71,20 @@ CPUFreq通知器符合标准的内核通知器接口。
第三个参数是一个包含如下值的结构体cpufreq_freqs:
===== ====================
cpu 受影响cpu的编号
====== ===============================
policy 指向struct cpufreq_policy的指针
old 旧频率
new 新频率
flags cpufreq驱动的标志
===== ====================
====== ===============================
3. 含有Operating Performance Point (OPP)的CPUFreq表的生成
==================================================================
关于OPP的细节请参阅 Documentation/power/opp.rst
dev_pm_opp_init_cpufreq_table -
这个功能提供了一个随时可用的转换程序,用来将OPP层关于可用频率的内部信息翻译成一种容易提供给
cpufreq的格式。
这个函数提供了一个随时可用的转换例程,用来将OPP层关于可用频率的内部信息翻译成一种
cpufreq易于处理的格式。
.. Warning::
......
......@@ -8,7 +8,9 @@
司延腾 Yanteng Si <siyanteng@loongson.cn>
.. _cn_cpu-drivers.rst:
:校译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
=======================================
如何实现一个新的CPUFreq处理器驱动程序?
......@@ -38,14 +40,14 @@
1. 怎么做?
===========
,你刚刚得到了一个全新的CPU/芯片组及其数据手册,并希望为这个CPU/芯片组添加cpufreq
,你刚刚得到了一个全新的CPU/芯片组及其数据手册,并希望为这个CPU/芯片组添加cpufreq
支持?很好,这里有一些至关重要的提示:
1.1 初始化
----------
首先,在__initcall_level_7 (module_init())或更靠后的函数中检查这个内核是否
首先,在 __initcall level 7 (module_init())或更靠后的函数中检查这个内核是否
运行在正确的CPU和正确的芯片组上。如果是,则使用cpufreq_register_driver()向
CPUfreq核心层注册一个cpufreq_driver结构体。
......@@ -60,11 +62,11 @@ CPUfreq核心层注册一个cpufreq_driver结构体。
.setpolicy 或 .fast_switch 或 .target 或 .target_index - 差异见
下文。
并且可选择
其它可选成员
.flags - cpufreq核的提示。
.flags - 给cpufreq核心的提示。
.driver_data - cpufreq驱动程序的特数据。
.driver_data - cpufreq驱动程序的特数据。
.get_intermediate 和 target_intermediate - 用于在改变CPU频率时切换到稳定
的频率。
......@@ -73,16 +75,16 @@ CPUfreq核心层注册一个cpufreq_driver结构体。
.bios_limit - 返回HW/BIOS对CPU的最大频率限制值。
.exit - 一个指向per-policy清理函数的指针,该函数在cpu热插拔过程的CPU_POST_DEAD
.exit - 一个指向per-policy清理函数的指针,该函数在CPU热插拔过程的CPU_POST_DEAD
阶段被调用。
.suspend - 一个指向per-policy暂停函数的指针,该函数在关中断且在该策略的调节器停止
后被调用。
.resume - 一个指向per-policy恢复函数的指针,该函数在关中断且在调节器再一次开始前被
.resume - 一个指向per-policy恢复函数的指针,该函数在关中断且在调节器再一次启动前被
调用。
.attr - 一个指向NULL结尾的"struct freq_attr"列表的指针,该函数允许导出值到
.attr - 一个指向NULL结尾的"struct freq_attr"列表的指针,该列表允许导出值到
sysfs。
.boost_enabled - 如果设置,则启用提升(boost)频率。
......@@ -93,95 +95,93 @@ CPUfreq核心层注册一个cpufreq_driver结构体。
1.2 Per-CPU 初始化
------------------
每当一个新的CPU被注册到设备模型中,或者在cpufreq驱动注册自己之后,如果此CPU的cpufreq策
略不存在,则会调用per-policy的初始化函数cpufreq_driver.init。请注意,.init()和.exit()程序
对策略调用一次,而不是对策略管理的每个CPU调用一次。它需要一个 ``struct cpufreq_policy
每当一个新的CPU被注册到设备模型中,或者当cpufreq驱动注册自身之后,如果此CPU的cpufreq策
略不存在,则会调用per-policy的初始化函数cpufreq_driver.init。请注意,.init()和.exit()例程
为某个策略调用一次,而不是对该策略管理的每个CPU调用一次。它需要一个 ``struct cpufreq_policy
*policy`` 作为参数。现在该怎么做呢?
如果有必要,请在你的CPU上激活CPUfreq功能支持。
然后,驱动程序必须填写以下值:
然后,驱动程序必须填写以下值:
+-----------------------------------+--------------------------------------+
|policy->cpuinfo.min_freq 和 | |
|policy->cpuinfo.max_freq | 该CPU支持的最低和最高频率(kHz) |
| | |
|policy->cpuinfo.min_freq和 | 该CPU支持的最低和最高频率(kHz) |
|policy->cpuinfo.max_freq | |
| | |
+-----------------------------------+--------------------------------------+
|policy->cpuinfo.transition_latency | |
| | CPU在两个频率之间切换所需的时间,以 |
| | 纳秒为单位(如适用,否则指定 |
|policy->cpuinfo.transition_latency | CPU在两个频率之间切换所需的时间,以 |
| | 纳秒为单位(如不适用,设定为 |
| | CPUFREQ_ETERNAL) |
| | |
+-----------------------------------+--------------------------------------+
|policy->cur | 该CPU当前的工作频率(如适用) |
| | |
+-----------------------------------+--------------------------------------+
|policy->min, | |
|policy->max, | |
|policy->policy and, if necessary, | |
|policy->governor | 必须包含该cpu的 “默认策略”。稍后 |
| | 会用这些值调用 |
| | cpufreq_driver.verify and either |
| | cpufreq_driver.setpolicy or |
|policy->min, | 必须包含该CPU的"默认策略"。稍后 |
|policy->max, | 会用这些值调用 |
|policy->policy and, if necessary, | cpufreq_driver.verify和下面函数 |
|policy->governor | 之一:cpufreq_driver.setpolicy或 |
| | cpufreq_driver.target/target_index |
| | |
+-----------------------------------+--------------------------------------+
|policy->cpus | 用与这个CPU一起做DVFS的(在线+离线) |
| | CPU(即与它共享时钟/电压轨)的掩码更新 |
| | 这个 |
|policy->cpus | 该policy通过DVFS框架影响的全部CPU |
| | (即与本CPU共享"时钟/电压"对)构成 |
| | 掩码(同时包含在线和离线CPU),用掩码 |
| | 更新本字段 |
| | |
+-----------------------------------+--------------------------------------+
对于设置其中的一些值(cpuinfo.min[max]_freq, policy->min[max]),频率表助手可能会有帮
对于设置其中的一些值(cpuinfo.min[max]_freq, policy->min[max]),频率表辅助函数可能会有帮
助。关于它们的更多信息,请参见第2节。
1.3 验证
--------
当用户决定设置一个新的策略(由 “policy,governor,min,max组成”)时,必须对这个策略进行验证,
当用户决定设置一个新的策略(由"policy,governor,min,max组成")时,必须对这个策略进行验证,
以便纠正不兼容的值。为了验证这些值,cpufreq_verify_within_limits(``struct cpufreq_policy
*policy``, ``unsigned int min_freq``, ``unsigned int max_freq``)函数可能会有帮助。
关于频率表助手的详细内容请参见第2节。
关于频率表辅助函数的详细内容请参见第2节。
您需要确保至少有一个有效频率(或工作范围)在 policy->min 和 policy->max 范围内。如果有必
要,先增加policy->max,只有在没有办法的情况下,才减少policy->min。
要,先增大policy->max,只有在没有解决方案的情况下,才减小policy->min。
1.4 target 或 target_index 或 setpolicy 或 fast_switch?
-------------------------------------------------------
大多数cpufreq驱动甚至大多数cpu频率升降算法只允许将CPU频率设置为预定义的固定值。对于这些,你
大多数cpufreq驱动甚至大多数CPU频率升降算法只允许将CPU频率设置为预定义的固定值。对于这些,你
可以使用->target(),->target_index()或->fast_switch()回调。
有些cpufreq功能的处理器可以自己在某些限制之间切换频率。这些应使用->setpolicy()回调。
有些具有硬件调频能力的处理器可以自行依据某些限制来切换CPU频率。它们应使用->setpolicy()回调。
1.5. target/target_index
------------------------
target_index调用有两个参数:``struct cpufreq_policy * policy``和``unsigned int``
索引(于列出的频率表)。
target_index调用有两个参数: ``struct cpufreq_policy * policy`` 和 ``unsigned int``
索引(用于索引频率表项)。
当调用这里时,CPUfreq驱动必须设置新的频率。实际频率必须由freq_table[index].frequency决定。
它应该总是在错误的情况下恢复到之前的频率(即policy->restore_freq),即使我们之前切换到中间频率。
在发生错误的情况下总是应该恢复到之前的频率(即policy->restore_freq),即使我们已经切换到了
中间频率。
已弃用
----------
目标调用有三个参数。``struct cpufreq_policy * policy``, unsigned int target_frequency,
target调用有三个参数。``struct cpufreq_policy * policy``, unsigned int target_frequency,
unsigned int relation.
CPUfreq驱动在调用这里时必须设置新的频率。实际的频率必须使用以下规则来确定。
- 紧跟 "目标频率"。
- 尽量贴近"目标频率"。
- policy->min <= new_freq <= policy->max (这必须是有效的!!!)
- 如果 relation==CPUFREQ_REL_L,尝试选择一个高于或等于 target_freq 的 new_freq。("L代表
最低,但不能低于")
- 如果 relation==CPUFREQ_REL_H,尝试选择一个低于或等于 target_freq 的 new_freq。("H代表
最高,但不能高于")
这里,频率表助手可能会帮助你--详见第2节。
这里,频率表辅助函数可能会帮助你 -- 详见第2节。
1.6. fast_switch
----------------
......@@ -195,42 +195,43 @@ CPUfreq驱动在调用这里时必须设置新的频率。实际的频率必须
1.7 setpolicy
-------------
setpolicy调用只需要一个``struct cpufreq_policy * policy``作为参数。需要将处理器内或芯片组内动态频
setpolicy调用只需要一个 ``struct cpufreq_policy * policy`` 作为参数。需要将处理器内或芯片组内动态频
率切换的下限设置为policy->min,上限设置为policy->max,如果支持的话,当policy->policy为
CPUFREQ_POLICY_PERFORMANCE时选择面向性能的设置,CPUFREQ_POLICY_POWERSAVE时选择面向省电的设置。
CPUFREQ_POLICY_PERFORMANCE时选择面向性能的设置,CPUFREQ_POLICY_POWERSAVE时选择面向省电的设置。
也可以查看drivers/cpufreq/longrun.c中的参考实现。
1.8 get_intermediate 和 target_intermediate
--------------------------------------------
仅适用于 target_index() 和 CPUFREQ_ASYNC_NOTIFICATION 未设置的驱动。
仅适用于未设置 target_index() 和 CPUFREQ_ASYNC_NOTIFICATION 的驱动。
get_intermediate应该返回一个平台想要切换到的稳定的中间频率,target_intermediate()应该将CPU设置为
该频率,然后再跳转到'index'对应的频率。核心会负责发送通知,驱动不必在target_intermediate()或
target_index()中处理
该频率,然后再跳转到'index'对应的频率。cpufreq核心会负责发送通知,驱动不必在
target_intermediate()或target_index()中处理它们
在驱动程序不想因为某个目标频率切换到中间频率的情况下,它们可以从get_intermediate()中返回'0'。在这种情况
下,核心将直接调用->target_index()。
在驱动程序不想为某个目标频率切换到中间频率的情况下,它们可以让get_intermediate()返回'0'。
在这种情况下,cpufreq核心将直接调用->target_index()。
注意:->target_index()应该在失败的情况下恢复到policy->restore_freq,因为core会为此发送通知。
注意:->target_index()应该在发生失败的情况下将频率恢复到policy->restore_freq,
因为cpufreq核心会为此发送通知。
2. 频率表助手
=============
2. 频率表辅助函数
=================
由于大多数cpufreq处理器只允许被设置为几个特定的频率,因此,一个带有一些函数的 “频率表”可能会辅助处理器驱动
程序的一些工作。这样的 "频率表" 由一个cpufreq_frequency_table条目构成的数组组成,"driver_data" 中
了驱动程序的具体数值,"frequency" 中包含了相应的频率,并设置了标志。在表的最后,需要添加一个
cpufreq_frequency_table条目,频率设置为CPUFREQ_TABLE_END。如果想跳过表中的一个条目,则将频率设置为
CPUFREQ_ENTRY_INVALID。这些条目不需要按照任何特定的顺序排序,但如果它们是cpufreq 核心会对它们进行快速的DVFS
由于大多数支持cpufreq的处理器只允许被设置为几个特定的频率,因此,"频率表"和一些相关函数可能会辅助处理器驱动
程序的一些工作。这样的"频率表"是一个由struct cpufreq_frequency_table的条目构成的数组,"driver_data"成员
驱动程序的专用值,"frequency"成员包含了相应的频率,此外还有标志成员。在表的最后,需要添加一个
cpufreq_frequency_table条目,频率设置为CPUFREQ_TABLE_END。如果想跳过表中的一个条目,则将频率设置为
CPUFREQ_ENTRY_INVALID。这些条目不需要按照任何特定的顺序排序,如果排序了,cpufreq核心执行DVFS会更快一点
因为搜索最佳匹配会更快。
如果策略在其policy->freq_table字段中包含一个有效的指针,cpufreq表就会被核心自动验证。
如果在policy->freq_table字段中包含一个有效的频率表指针,频率表就会被cpufreq核心自动验证。
cpufreq_frequency_table_verify()保证至少有一个有效的频率在policy->min和policy->max范围内,并且所有其他
标准都被满足。这对->verify调用很有帮助。
准则都被满足。这对->verify调用很有帮助。
cpufreq_frequency_table_target()是对应于->target阶段的频率表助手。只要把数值传递给这个函数,这个函数就会返
cpufreq_frequency_table_target()是对应于->target阶段的频率表辅助函数。只要把值传递给这个函数,这个函数就会返
回包含CPU要设置的频率的频率表条目。
以下宏可以作为cpufreq_frequency_table的迭代器。
......@@ -238,8 +239,8 @@ cpufreq_frequency_table_target()是对应于->target阶段的频率表助手。
cpufreq_for_each_entry(pos, table) - 遍历频率表的所有条目。
cpufreq_for_each_valid_entry(pos, table) - 该函数遍历所有条目,不包括CPUFREQ_ENTRY_INVALID频率。
使用参数 "pos"-一个``cpufreq_frequency_table * `` 作为循环变量,使用参数 "table"-作为你想迭代
``cpufreq_frequency_table * `` 。
使用参数"pos" -- 一个 ``cpufreq_frequency_table *`` 作为循环指针,使用参数"table" -- 作为你想迭代
``cpufreq_frequency_table *`` 。
例如::
......@@ -250,5 +251,5 @@ cpufreq_for_each_valid_entry(pos, table) - 该函数遍历所有条目,不包
pos->frequency = ...
}
如果你需要在driver_freq_table中处理pos的位置,不要减去指针,因为它的代价相当高。相反,使用宏
如果你需要在driver_freq_table中处理pos的位置,不要做指针减法,因为它的代价相当高。作为替代,使用宏
cpufreq_for_each_entry_idx() 和 cpufreq_for_each_valid_entry_idx() 。
......@@ -8,13 +8,15 @@
司延腾 Yanteng Si <siyanteng@loongson.cn>
.. _cn_cpufreq-stats.rst:
:校译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
==========================================
sysfs CPUFreq Stats的一般说明
==========================================
用户信息
为使用者准备的信息
作者: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
......@@ -29,17 +31,16 @@ sysfs CPUFreq Stats的一般说明
1. 简介
===============
cpufreq-stats是一个为每个CPU提供CPU频率统计的驱动。
这些统计数据在/sysfs中以一堆只读接口的形式提供。这个接口(在配置好后)将出现在
/sysfs(<sysfs root>/devices/system/cpu/cpuX/cpufreq/stats/)中cpufreq下的一个单
独的目录中,提供给每个CPU。
各种统计数据将在此目录下形成只读文件。
cpufreq-stats是一种为每个CPU提供CPU频率统计的驱动。
这些统计数据以/sysfs中一系列只读接口的形式呈现。cpufreq-stats接口(若已配置)将为每个CPU生成
/sysfs(<sysfs root>/devices/system/cpu/cpuX/cpufreq/stats/)中cpufreq目录下的stats目录。
各项统计数据将在stats目录下形成对应的只读文件。
此驱动是独立于任何可能运行在你所用CPU上的特定cpufreq_driver而设计的。因此,它将与所有
cpufreq_driver一起工作。
此驱动是以独立于任何可能运行在你所用CPU上的特定cpufreq_driver的方式设计的。因此,它将能和任何
cpufreq_driver协同工作。
2. 提供的统计数据(举例说明)
2. 已提供的统计数据(有例子)
=====================================
cpufreq stats提供了以下统计数据(在下面详细解释)。
......@@ -48,8 +49,8 @@ cpufreq stats提供了以下统计数据(在下面详细解释)。
- total_trans
- trans_table
所有的统计数据将从统计驱动被载入的时间(或统计被重置的时间)开始,到某一统计数据被读取的时间为止。
显然,统计驱动不会有任何关于统计驱动载入之前的频率转换信息。
所有统计数据来自以下时间范围:从统计驱动被加载的时间(或统计数据被重置的时间)开始,到某一统计数据被读取的时间为止。
显然,统计驱动不会保存它被加载之前的任何频率转换信息。
::
......@@ -64,14 +65,14 @@ cpufreq stats提供了以下统计数据(在下面详细解释)。
- **reset**
只写属性,可用于重置统计计数器。这对于评估不同调节器的系统行为非常有用,且无需重启。
只写属性,可用于重置统计计数器。这对于评估不同调节器的系统行为非常有用,且无需重启。
- **time_in_state**
项给出了这个CPU所支持的每个频率所花费的时间。cat输出的每一行都会有"<frequency>
<time>"对,表示这个CPU在<frequency>上花费了<time>个usertime单位的时间。这里的
usertime单位是10mS(类似于/proc中输出的其他时间)。
文件给出了在本CPU支持的每个频率上分别花费的时间。cat输出的每一行都是一个"<frequency>
<time>"对,表示这个CPU在<frequency>上花费了<time>个usertime单位的时间。输出的每一行对应
一个CPU支持的频率。这里usertime单位是10mS(类似于/proc导出的其它时间)。
::
......@@ -85,7 +86,7 @@ usertime单位是10mS(类似于/proc中输出的其他时间)。
- **total_trans**
给出了这个CPU上频率转换的总次数。cat的输出将有一个单一的计数,这就是频率转换的总数。
此文件给出了这个CPU频率转换的总次数。cat的输出是一个计数值,它就是频率转换的总次数。
::
......@@ -94,10 +95,10 @@ usertime单位是10mS(类似于/proc中输出的其他时间)。
- **trans_table**
这将提供所有CPU频率转换的细粒度信息。这里的cat输出是一个二维矩阵,其中一个条目<i, j>(第
本文件提供所有CPU频率转换的细粒度信息。这里的cat输出是一个二维矩阵,其中一个条目<i, j>(第
i行,第j列)代表从Freq_i到Freq_j的转换次数。Freq_i行和Freq_j列遵循驱动最初提供给cpufreq
的频率表的排序顺序,因此可以排序(升序或降序)或不排序。 这里的输出也包含了每行每列的实际
频率值,以便更好地阅读。
心的频率表的排列顺序,因此可以已排序(升序或降序)或未排序。这里的输出也包含了实际
频率值,分别按行和按列显示,以便更好地阅读。
如果转换表大于PAGE_SIZE,读取时将返回一个-EFBIG错误。
......@@ -115,7 +116,7 @@ i行,第j列)代表从Freq_i到Freq_j的转换次数。Freq_i行和Freq_j列
3. 配置cpufreq-stats
============================
在你的内核中配置cpufreq-stats::
按以下方式在你的内核中配置cpufreq-stats::
Config Main Menu
Power management options (ACPI, APM) --->
......@@ -124,7 +125,7 @@ i行,第j列)代表从Freq_i到Freq_j的转换次数。Freq_i行和Freq_j列
[*] CPU frequency translation statistics
"CPU Frequency scaling" (CONFIG_CPU_FREQ) 应该被启用配置cpufreq-stats。
"CPU Frequency scaling" (CONFIG_CPU_FREQ) 应该被启用,以支持配置cpufreq-stats。
"CPU frequency translation statistics" (CONFIG_CPU_FREQ_STAT)提供了包括
time_in_state、total_trans和trans_table的统计数据。
......
......@@ -22,13 +22,13 @@ Documentation/translations/zh_CN/dev-tools/testing-overview.rst
:maxdepth: 2
testing-overview
sparse
gcov
kasan
Todolist:
- coccinelle
- sparse
- kcov
- ubsan
- kmemleak
......
Chinese translated version of Documentation/dev-tools/sparse.rst
Copyright 2004 Linus Torvalds
Copyright 2004 Pavel Machek <pavel@ucw.cz>
Copyright 2006 Bob Copeland <me@bobcopeland.com>
If you have any comment or update to the content, please contact the
original document maintainer directly. However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help. Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.
.. include:: ../disclaimer-zh_CN.rst
Chinese maintainer: Li Yang <leoyang.li@nxp.com>
---------------------------------------------------------------------
Documentation/dev-tools/sparse.rst 的中文翻译
:Original: Documentation/dev-tools/sparse.rst
如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
译存在问题,请联系中文版维护者。
:翻译:
中文版维护者: 李阳 Li Yang <leoyang.li@nxp.com>
中文版翻译者: 李阳 Li Yang <leoyang.li@nxp.com>
Li Yang <leoyang.li@nxp.com>
:校译:
以下为正文
---------------------------------------------------------------------
司延腾 Yanteng Si <siyanteng@loongson.cn>
Copyright 2004 Linus Torvalds
Copyright 2004 Pavel Machek <pavel@ucw.cz>
Copyright 2006 Bob Copeland <me@bobcopeland.com>
.. _cn_sparse:
Sparse
======
Sparse是一个C程序的语义检查器;它可以用来发现内核代码的一些潜在问题。 关
于sparse的概述,请参见https://lwn.net/Articles/689907/;本文档包含
一些针对内核的sparse信息。
关于sparse的更多信息,主要是关于它的内部结构,可以在它的官方网页上找到:
https://sparse.docs.kernel.org。
使用 sparse 工具做类型检查
~~~~~~~~~~~~~~~~~~~~~~~~~~
"__bitwise" 是一种类型属性,所以你应该这样使用它
"__bitwise" 是一种类型属性,所以你应该这样使用它::
typedef int __bitwise pm_request_t;
......@@ -48,7 +48,7 @@ Copyright 2006 Bob Copeland <me@bobcopeland.com>
坦白来说,你并不需要使用枚举类型。上面那些实际都可以浓缩成一个特殊的"int
__bitwise"类型。
所以更简单的办法只要这样做
所以更简单的办法只要这样做::
typedef int __bitwise pm_request_t;
......@@ -60,25 +60,42 @@ __bitwise"类型。
一个小提醒:常数整数"0"是特殊的。你可以直接把常数零当作位方式整数使用而
不用担心 sparse 会抱怨。这是因为"bitwise"(恰如其名)是用来确保不同位方
式类型不会被弄混(小尾模式,大尾模式,cpu尾模式,或者其他),对他们来说
常数"0"确实是特殊的。
常数"0"确实 **是** 特殊的。
使用sparse进行锁检查
--------------------
下面的宏对于 gcc 来说是未定义的,在 sparse 运行时定义,以使用sparse的“上下文”
跟踪功能,应用于锁定。 这些注释告诉 sparse 什么时候有锁,以及注释的函数的进入和
退出。
__must_hold - 指定的锁在函数进入和退出时被持有。
__acquires - 指定的锁在函数退出时被持有,但在进入时不被持有。
__releases - 指定的锁在函数进入时被持有,但在退出时不被持有。
如果函数在不持有锁的情况下进入和退出,在函数内部以平衡的方式获取和释放锁,则不
需要注释。
上面的三个注释是针对sparse否则会报告上下文不平衡的情况。
获取 sparse 工具
~~~~~~~~~~~~~~~~
你可以从 Sparse 的主页获取最新的发布版本:
http://www.kernel.org/pub/linux/kernel/people/josh/sparse/
https://www.kernel.org/pub/software/devel/sparse/dist/
或者,你也可以使用 git 克隆最新的 sparse 开发版本:
git://git.kernel.org/pub/scm/linux/kernel/git/josh/sparse.git
git://git.kernel.org/pub/scm/devel/sparse/sparse.git
一旦你下载了源码,只要以普通用户身份运行:
make
make install
它将会被自动安装到你的 ~/bin 目录下。
如果是标准的用户,它将会被自动安装到你的~/bin目录下。
使用 sparse 工具
~~~~~~~~~~~~~~~~
......
......@@ -23,6 +23,11 @@
另外,随时欢迎您对内核文档进行改进;如果您想提供帮助,请加入vger.kernel.org
上的linux-doc邮件列表。
顺便说下,中文文档也需要遵守内核编码风格,风格中中文和英文的主要不同就是中文
的字符标点占用两个英文字符宽度, 所以,当英文要求不要超过每行100个字符时,
中文就不要超过50个字符。另外,也要注意'-','=' 等符号与相关标题的对齐。在将
补丁提交到社区之前,一定要进行必要的checkpatch.pl检查和编译测试。
许可证文档
----------
......@@ -106,6 +111,7 @@ TODOList:
virt/index
infiniband/index
accounting/index
scheduler/index
TODOList:
......@@ -140,7 +146,6 @@ TODOList:
* PCI/index
* scsi/index
* misc-devices/index
* scheduler/index
* mhi/index
体系结构无关文档
......
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/completion.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
=======================================
完成 - "等待完成" 屏障应用程序接口(API)
=======================================
简介:
-----
如果你有一个或多个线程必须等待某些内核活动达到某个点或某个特定的状态,完成可以为这
个问题提供一个无竞争的解决方案。从语义上讲,它们有点像pthread_barrier(),并且使
用的案例类似
完成是一种代码同步机制,它比任何滥用锁/信号量和忙等待循环的行为都要好。当你想用yield()
或一些古怪的msleep(1)循环来允许其它代码继续运行时,你可能想用wait_for_completion*()
调用和completion()来代替。
使用“完成”的好处是,它们有一个良好定义、聚焦的目标,这不仅使得我们很容易理解代码的意图,
而且它们也会生成更高效的代码,因为所有线程都可以继续执行,直到真正需要结果的时刻。而且等
待和信号都高效的使用了低层调度器的睡眠/唤醒设施。
完成是建立在Linux调度器的等待队列和唤醒基础设施之上的。等待队列中的线程所等待的
事件被简化为 ``struct completion`` 中的一个简单标志,被恰如其名地称为‘done’。
由于完成与调度有关,代码可以在kernel/sched/completion.c中找到。
用法:
-----
使用完成需要三个主要部分:
- 'struct completion' 同步对象的初始化
- 通过调用wait_for_completion()的一个变体来实现等待部分。
- 通过调用complete()或complete_all()实现发信端。
也有一些辅助函数用于检查完成的状态。请注意,虽然必须先做初始化,但等待和信号部分可以
按任何时间顺序出现。也就是说,一个线程在另一个线程检查是否需要等待它之前,已经将一个
完成标记为 "done",这是完全正常的。
要使用完成API,你需要#include <linux/completion.h>并创建一个静态或动态的
``struct completion`` 类型的变量,它只有两个字段::
struct completion {
unsigned int done;
wait_queue_head_t wait;
};
结构体提供了->wait等待队列来放置任务进行等待(如果有的话),以及->done完成标志来表明它
是否完成。
完成的命名应当与正在被同步的事件名一致。一个好的例子是::
wait_for_completion(&early_console_added);
complete(&early_console_added);
好的、直观的命名(一如既往地)有助于代码的可读性。将一个完成命名为 ``complete``
是没有帮助的,除非其目的是超级明显的...
初始化完成:
-----------
动态分配的完成对象最好被嵌入到数据结构中,以确保在函数/驱动的生命周期内存活,以防
止与异步complete()调用发生竞争。
在使用wait_for_completion()的_timeout()或_killable()/_interruptible()变体
时应特别小心,因为必须保证在所有相关活动(complete()或reinit_completion())发生
之前不会发生内存解除分配,即使这些等待函数由于超时或信号触发而过早返回。
动态分配的完成对象的初始化是通过调用init_completion()来完成的::
init_completion(&dynamic_object->done);
在这个调用中,我们初始化 waitqueue 并将 ->done 设置为 0,即“not completed”或
“not done”。
重新初始化函数reinit_completion(),只是将->done字段重置为0(“not done”),而
不触及等待队列。这个函数的调用者必须确保没有任何令人讨厌的wait_for_completion()
调用在并行进行。
在同一个完成对象上调用init_completion()两次很可能是一个bug,因为它将队列重新初始
化为一个空队列,已排队的任务可能会“丢失”--在这种情况下使用reinit_completion(),但
要注意其他竞争。
对于静态声明和初始化,可以使用宏。
对于文件范围内的静态(或全局)声明,你可以使用 DECLARE_COMPLETION()::
static DECLARE_COMPLETION(setup_done);
DECLARE_COMPLETION(setup_done);
注意,在这种情况下,完成在启动时(或模块加载时)被初始化为“not done”,不需要调用
init_completion()。
当完成被声明为一个函数中的局部变量时,那么应该总是明确地使用
DECLARE_COMPLETION_ONSTACK()来初始化,这不仅仅是为了让lockdep正确运行,也是明确表
名它有限的使用范围是有意为之并被仔细考虑的::
DECLARE_COMPLETION_ONSTACK(setup_done)
请注意,当使用完成对象作为局部变量时,你必须敏锐地意识到函数堆栈的短暂生命期:在所有
活动(如等待的线程)停止并且完成对象完全未被使用之前,函数不得返回到调用上下文。
再次强调这一点:特别是在使用一些具有更复杂结果的等待API变体时,比如超时或信号
(_timeout(), _killable()和_interruptible())变体,等待可能会提前完成,而对象可
能仍在被其他线程使用 - 从wait_on_completion*()调用者函数的返回会取消分配函数栈,如
果complete()在其它某线程中完成调用,会引起微小的数据损坏。简单的测试可能不会触发这
些类型的竞争。
如果不确定的话,使用动态分配的完成对象, 最好是嵌入到其它一些生命周期长的对象中,长到
超过使用完成对象的任何辅助线程的生命周期,或者有一个锁或其他同步机制来确保complete()
不会在一个被释放的对象中调用。
在堆栈上单纯地调用DECLARE_COMPLETION()会触发一个lockdep警告。
等待完成:
---------
对于一个线程来说,要等待一些并发活动的完成,它要在初始化的完成结构体上调用
wait_for_completion()::
void wait_for_completion(struct completion *done)
一个典型的使用场景是::
CPU#1 CPU#2
struct completion setup_done;
init_completion(&setup_done);
initialize_work(...,&setup_done,...);
/* run non-dependent code */ /* do setup */
wait_for_completion(&setup_done); complete(setup_done);
这并不意味着调用wait_for_completion()和complete()有任何特定的时间顺序--如果调
用complete()发生在调用wait_for_completion()之前,那么等待方将立即继续执行,因为
所有的依赖都得到了满足;如果没有,它将阻塞,直到complete()发出完成的信号。
注意,wait_for_completion()是在调用spin_lock_irq()/spin_unlock_irq(),所以
只有当你知道中断被启用时才能安全地调用它。从IRQs-off的原子上下文中调用它将导致难以检
测的错误的中断启用。
默认行为是不带超时的等待,并将任务标记为“UNINTERRUPTIBLE”状态。wait_for_completion()
及其变体只有在进程上下文中才是安全的(因为它们可以休眠),但在原子上下文、中断上下文、IRQ
被禁用或抢占被禁用的情况下是不安全的--关于在原子/中断上下文中处理完成的问题,还请看下面的
try_wait_for_completion()。
由于wait_for_completion()的所有变体都可能(很明显)阻塞很长时间,这取决于它们所等
待的活动的性质,所以在大多数情况下,你可能不想在持有mutex锁的情况下调用它。
wait_for_completion*()可用的变体:
---------------------------------
下面的变体都会返回状态,在大多数(/所有)情况下都应该检查这个状态--在故意不检查状态的情
况下,你可能要做一个说明(例如,见arch/arm/kernel/smp.c:__cpu_up())。
一个常见的问题是不准确的返回类型赋值,所以要注意将返回值赋值给适当类型的变量。
检查返回值的具体含义也可能被发现是相当不准确的,例如,像这样的构造::
if (!wait_for_completion_interruptible_timeout(...))
...会在成功完成和中断的情况下执行相同的代码路径--这可能不是你想要的结果::
int wait_for_completion_interruptible(struct completion *done)
这个函数在任务等待时标记为TASK_INTERRUPTIBLE。如果在等待期间收到信号,它将返回
-ERESTARTSYS;否则为0::
unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)
该任务被标记为TASK_UNINTERRUPTIBLE,并将最多超时等待“timeout”个jiffies。如果超时发生,则
返回0,否则返回剩余的时间(但至少是1)。
超时最好用msecs_to_jiffies()或usecs_to_jiffies()计算,以使代码在很大程度上不受
HZ的影响。
如果返回的超时值被故意忽略,那么注释应该解释原因
(例如,见drivers/mfd/wm8350-core.c wm8350_read_auxadc()::
long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)
这个函数传递一个以jiffies为单位的超时,并将任务标记为TASK_INTERRUPTIBLE。如果收到
信号,则返回-ERESTARTSYS;否则,如果完成超时,则返回0;如果完成了,则返回剩余的时间
(jiffies)。
更多的变体包括_killable,它使用TASK_KILLABLE作为指定的任务状态,如果它被中断,将返
回-ERESTARTSYS,如果完成了,则返回0。它也有一个_timeout变体::
long wait_for_completion_killable(struct completion *done)
long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)
wait_for_completion_io()的_io变体的行为与非_io变体相同,只是将等待时间计为“IO等待”,
这对任务在调度/IO统计中的计算方式有影响::
void wait_for_completion_io(struct completion *done)
unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)
对完成发信号:
-------------
一个线程想要发出信号通知继续的条件已经达到,就会调用complete(),向其中一个等待者发出信
号表明它可以继续::
void complete(struct completion *done)
... or calls complete_all() to signal all current and future waiters::
void complete_all(struct completion *done)
即使在线程开始等待之前就发出了完成的信号,信号传递也会继续进行。这是通过等待者
“consuming”(递减)“struct completion” 的完成字段来实现的。等待的线程唤醒的顺序
与它们被排队的顺序相同(FIFO顺序)。
如果多次调用complete(),那么这将允许该数量的等待者继续进行--每次调用complete()将
简单地增加已完成的字段。但多次调用complete_all()是一个错误。complete()和
complete_all()都可以在IRQ/atomic上下文中安全调用。
在任何时候,只能有一个线程在一个特定的 “struct completion”上调用 complete() 或
complete_all() - 通过等待队列自旋锁进行序列化。任何对 complete() 或
complete_all() 的并发调用都可能是一个设计错误。
从IRQ上下文中发出完成信号 是可行的,因为它将正确地用
spin_lock_irqsave()/spin_unlock_irqrestore()执行锁操作
try_wait_for_completion()/completion_done():
--------------------------------------------
try_wait_for_completion()函数不会将线程放在等待队列中,而是在需要排队(阻塞)线
程时返回false,否则会消耗一个已发布的完成并返回true::
bool try_wait_for_completion(struct completion *done)
最后,为了在不以任何方式改变完成的情况下检查完成的状态,可以调用completion_done(),
如果没有发布的完成尚未被等待者消耗,则返回false(意味着存在等待者),否则返回true::
bool completion_done(struct completion *done)
try_wait_for_completion()和completion_done()都可以在IRQ或原子上下文中安全调用。
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/index.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
===============
Linux调度器
===============
.. toctree::
:maxdepth: 1
completion
sched-arch
sched-bwc
sched-design-CFS
sched-domains
sched-capacity
TODOList:
sched-bwc
sched-deadline
sched-energy
sched-nice-design
sched-rt-group
sched-stats
text_files
.. only:: subproject and html
Indices
=======
* :ref:`genindex`
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/sched-arch.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
===============================
架构特定代码的CPU调度器实现提示
===============================
Nick Piggin, 2005
上下文切换
==========
1. 运行队列锁
默认情况下,switch_to arch函数在调用时锁定了运行队列。这通常不是一个问题,除非
switch_to可能需要获取运行队列锁。这通常是由于上下文切换中的唤醒操作造成的。见
arch/ia64/include/asm/switch_to.h的例子。
为了要求调度器在运行队列解锁的情况下调用switch_to,你必须在头文件
中`#define __ARCH_WANT_UNLOCKED_CTXSW`(通常是定义switch_to的那个文件)。
在CONFIG_SMP的情况下,解锁的上下文切换对核心调度器的实现只带来了非常小的性能损
失。
CPU空转
=======
你的cpu_idle程序需要遵守以下规则:
1. 现在抢占应该在空闲的例程上禁用。应该只在调用schedule()时启用,然后再禁用。
2. need_resched/TIF_NEED_RESCHED 只会被设置,并且在运行任务调用 schedule()
之前永远不会被清除。空闲线程只需要查询need_resched,并且永远不会设置或清除它。
3. 当cpu_idle发现(need_resched() == 'true'),它应该调用schedule()。否则
它不应该调用schedule()。
4. 在检查need_resched时,唯一需要禁用中断的情况是,我们要让处理器休眠到下一个中
断(这并不对need_resched提供任何保护,它可以防止丢失一个中断):
4a. 这种睡眠类型的常见问题似乎是::
local_irq_disable();
if (!need_resched()) {
local_irq_enable();
*** resched interrupt arrives here ***
__asm__("sleep until next interrupt");
}
5. 当need_resched变为高电平时,TIF_POLLING_NRFLAG可以由不需要中断来唤醒它们
的空闲程序设置。换句话说,它们必须定期轮询need_resched,尽管做一些后台工作或
进入低CPU优先级可能是合理的。
- 5a. 如果TIF_POLLING_NRFLAG被设置,而我们确实决定进入一个中断睡眠,那
么需要清除它,然后发出一个内存屏障(接着测试need_resched,禁用中断,如3中解释)。
arch/x86/kernel/process.c有轮询和睡眠空闲函数的例子。
可能出现的arch/问题
===================
我发现的可能的arch问题(并试图解决或没有解决)。:
ia64 - safe_halt的调用与中断相比,是否很荒谬? (它睡眠了吗) (参考 #4a)
sh64 - 睡眠与中断相比,是否很荒谬? (参考 #4a)
sparc - 在这一点上,IRQ是开着的(?),把local_irq_save改为_disable。
- 待办事项: 需要第二个CPU来禁用抢占 (参考 #1)
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/sched-bwc.rst
:翻译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:校译:
============
CFS 带宽控制
============
.. note::
本文只讨论了SCHED_NORMAL的CPU带宽控制。
SCHED_RT的情况在Documentation/scheduler/sched-rt-group.rst中有涉及。
CFS带宽控制是一个CONFIG_FAIR_GROUP_SCHED扩展,它允许指定一个组或层次的最大CPU带宽。
一个组允许的带宽是用配额和周期指定的。在每个给定的”周期“(微秒)内,一个任务组被分配多
达“配额”微秒的CPU时间。当cgroup中的线程可运行时,该配额以时间片段的方式被分配到每个cpu
运行队列中。一旦所有的配额被分配,任何额外的配额请求将导致这些线程被限流。被限流的线程将不
能再次运行,直到下一个时期的配额得到补充。
一个组的未分配配额是全局跟踪的,在每个周期边界被刷新为cfs_quota单元。当线程消耗这个带宽时,
它以需求为基础被转移到cpu-local“筒仓”,在每次更新中转移的数量是可调整的,被描述为“片“(时
间片)。
突发特性
--------
现在这个功能借来的时间是用于防范我们对未来的低估,代价是对其他系统用户的干扰增加。所有这些都
有很好的限制。
传统的(UP-EDF)带宽控制是这样的:
(U = \Sum u_i) <= 1
这既保证了每个最后期限的实现,也保证了系统的稳定。毕竟,如果U>1,那么每一秒钟的壁钟时间,我
们就必须运行超过一秒钟的程序时间,显然会错过我们的最后期限,但下一个最后期限会更远,永远没有
时间赶上,无边无界的失败。
突发特性观察到工作负载并不总是执行全部配额;这使得人们可以将u_i描述为一个统计分布。
例如,让u_i = {x,e}_i,其中x是p(95)和x+e p(100)(传统的WCET)。这实际上允许u更小,提
高了效率(我们可以在系统中打包更多的任务),但代价是当所有的概率都一致时,会错过最后期限。然
而,它确实保持了稳定性,因为只要我们的x高于平均水平,每一次超限都必须与低估相匹配。
也就是说,假设我们有两个任务,都指定了一个p(95)值,那么我们有一个p(95)*p(95)=90.25%的机
会,两个任务都在他们的配额内,一切都很好。同时,我们有一个p(5)p(5)=0.25%的机会,两个任务同
时超过他们的配额(保证最后期限失败)。在这两者之间有一个阈值,其中一个超过了,而另一个没有不足,
无法补偿;这取决于具体的CDFs。
同时,我们可以说,最坏的情况下的截止日期失败,将是Sum e_i;也就是说,有一个有界的迟延(在假
设x+e确实是WCET的情况下)。
使用突发时的干扰是由错过最后期限的可能性和平均WCET来评价的。测试结果表明,当有许多cgroup或
CPU未被充分利用时,干扰是有限的。更多的细节显示在:
https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
管理
----
配额、周期和突发是在cpu子系统内通过cgroupfs管理的。
.. note::
本节描述的cgroupfs文件只适用于cgroup v1.对于cgroup v2,请参阅Control Group v2。
:ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`.
- cpu.cfs_quota_us:在一个时期内补充的运行时间(微秒)。
- cpu.cfs_period_us:一个周期的长度(微秒)。
- cpu.stat: 输出节流统计数据[下面进一步解释]
- cpu.cfs_burst_us:最大累积运行时间(微秒)。
默认值是::
cpu.cfs_period_us=100ms
cpu.cfs_quota_us=-1
cpu.cfs_burst_us=0
cpu.cfs_quota_us的值为-1表示该组没有任何带宽限制,这样的组被描述为无限制的带宽组。这代表
了CFS的传统工作保护行为。
写入不小于cpu.cfs_burst_us的任何(有效的)正值将配发指定的带宽限制。该配额或周期允许的最
小配额是1ms。周期长度也有一个1s的上限。当带宽限制以分层方式使用时,存在额外的限制,这些在下
面有更详细的解释。
向cpu.cfs_quota_us写入任何负值都会移除带宽限制,并使组再次回到无限制的状态。
cpu.cfs_burst_us的值为0表示该组不能积累任何未使用的带宽。它使得CFS的传统带宽控制行为没有
改变。将不大于 cpu.cfs_quota_us 的任何(有效的)正值写入 cpu.cfs_burst_us 将配发未使用
带宽累积的上限。
如果一个组处于受限状态,对该组带宽规格的任何更新都将导致其成为无限流状态。
系统范围设置
------------
为了提高效率,运行时间在全局池和CPU本地“筒仓”之间以批处理方式转移。这大大减少了大型系统的全
局核算压力。每次需要进行这种更新时,传输的数量被描述为 "片"。
这是可以通过procfs调整的::
/proc/sys/kernel/sched_cfs_bandwidth_slice_us (default=5ms)
较大的时间片段值将减少传输开销,而较小的值则允许更精细的消费。
统计
----
一个组的带宽统计数据通过cpu.stat的5个字段导出。
cpu.stat:
- nr_periods:已经过去的执行间隔的数量。
- nr_throttled: 该组已被节流/限制的次数。
- throttled_time: 该组的实体被限流的总时间长度(纳秒)。
- nr_bursts:突发发生的周期数。
- burst_time: 任何CPU在各个时期使用超过配额的累计壁钟时间(纳秒)。
这个接口是只读的。
分层考虑
--------
该接口强制要求单个实体的带宽总是可以达到的,即:max(c_i) <= C。然而,在总体情况下,是明确
允许过度订阅的,以便在一个层次结构中实现工作保护语义:
例如,Sum (c_i)可能超过C
[ 其中C是父方的带宽,c_i是其子方的带宽。 ]
.. note::
译文中的父亲/孩子指的是cgroup parent, cgroup children。
有两种方式可以使一个组变得限流:
a. 它在一段时期内完全消耗自己的配额
b. 父方的配额在其期间内全部用完
在上述b)情况下,即使孩子可能有剩余的运行时间,它也不会被允许,直到父亲的运行时间被刷新。
CFS带宽配额的注意事项
---------------------
一旦一个片断被分配给一个cpu,它就不会过期。然而,如果该cpu上的所有线程都无法运行,那么除了
1ms以外的所有时间片都可以返回到全局池中。这是在编译时由min_cfs_rq_runtime变量配置的。这
是一个性能调整,有助于防止对全局锁的额外争夺。
cpu-local分片不会过期的事实导致了一些有趣的罕见案例,应该被理解。
对于cgroup cpu限制的应用程序来说,这是一个相对有意义的问题,因为他们自然会消耗他们的全部配
额,以及每个cpu-本地片在每个时期的全部。因此,预计nr_periods大致等于nr_throttled,并且
cpuacct.用量的增加大致等于cfs_quota_us在每个周期的增加。
对于高线程、非cpu绑定的应用程序,这种非过期的细微差别允许应用程序短暂地突破他们的配额限制,
即任务组正在运行的每个cpu上未使用的片断量(通常每个cpu最多1ms或由min_cfs_rq_runtime定
义)。这种轻微的突发只适用于配额已经分配给cpu,然后没有完全使用或在以前的时期返回。这个突发
量不会在核心之间转移。因此,这种机制仍然严格限制任务组的配额平均使用量,尽管是在比单一时期更
长的时间窗口。这也限制了突发能力,每个cpu不超过1ms。这为在高核数机器上有小配额限制的高线程
应用提供了更好的更可预测的用户体验。它还消除了在使用低于配额的cpu时对这些应用进行节流的倾向。
另一种说法是,通过允许一个片断的未使用部分在不同时期保持有效,我们减少了在不需要整个片断的cpu
时间的cpu-local 筒仓上浪费配额的可能性。
绑定cpu和非绑定cpu的交互式应用之间的互动也应该被考虑,特别是当单核使用率达到100%时。如果你
给了这些应用程序一半的cpu-core,并且它们都被安排在同一个CPU上,理论上非cpu绑定的应用程序有
可能在某些时期使用多达1ms的额外配额,从而阻止cpu绑定的应用程序完全使用其配额,这也是同样的数
量。在这些情况下,将由CFS算法(见CFS调度器)来决定选择哪个应用程序来运行,因为它们都是可运行
的,并且有剩余的配额。这个运行时间的差异将在接下来的交互式应用程序空闲期间得到弥补。
例子
----
1. 限制一个组的运行时间为1个CPU的价值::
如果周期是250ms,配额也是250ms,那么该组将每250ms获得价值1个CPU的运行时间。
# echo 250000 > cpu.cfs_quota_us /* quota = 250ms */
# echo 250000 > cpu.cfs_period_us /* period = 250ms */
2. 在多CPU机器上,将一个组的运行时间限制为2个CPU的价值
在500ms周期和1000ms配额的情况下,该组每500ms可以获得2个CPU的运行时间::
# echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
# echo 500000 > cpu.cfs_period_us /* period = 500ms */
这里较大的周期允许增加突发能力。
3. 将一个组限制在1个CPU的20%。
在50ms周期内,10ms配额将相当于1个CPU的20%。::
# echo 10000 > cpu.cfs_quota_us /* quota = 10ms */
# echo 50000 > cpu.cfs_period_us /* period = 50ms */
通过在这里使用一个小的周期,我们以牺牲突发容量为代价来确保稳定的延迟响应。
4. 将一个组限制在1个CPU的40%,并允许累积到1个CPU的20%,如果已经累积了的话。
在50ms周期内,20ms配额将相当于1个CPU的40%。而10毫秒的突发将相当于1个
CPU的20%::
# echo 20000 > cpu.cfs_quota_us /* quota = 20ms */
# echo 50000 > cpu.cfs_period_us /* period = 50ms */
# echo 10000 > cpu.cfs_burst_us /* burst = 10ms */
较大的缓冲区设置(不大于配额)允许更大的突发容量。
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/sched-capacity.rst
:翻译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
:校译:
时奎亮 Alex Shi <alexs@kernel.org>
=============
算力感知调度
=============
1. CPU算力
==========
1.1 简介
--------
一般来说,同构的SMP平台由完全相同的CPU构成。异构的平台则由性能特征不同的CPU构成,在这样的
平台中,CPU不能被认为是相同的。
我们引入CPU算力(capacity)的概念来测量每个CPU能达到的性能,它的值相对系统中性能最强的CPU
做过归一化处理。异构系统也被称为非对称CPU算力系统,因为它们由不同算力的CPU组成。
最大可达性能(换言之,最大CPU算力)的差异有两个主要来源:
- 不是所有CPU的微架构都相同。
- 在动态电压频率升降(Dynamic Voltage and Frequency Scaling,DVFS)框架中,不是所有的CPU都
能达到一样高的操作性能值(Operating Performance Points,OPP。译注,也就是“频率-电压”对)。
Arm大小核(big.LITTLE)系统是同时具有两种差异的一个例子。相较小核,大核面向性能(拥有更多的
流水线层级,更大的缓存,更智能的分支预测器等),通常可以达到更高的操作性能值。
CPU性能通常由每秒百万指令(Millions of Instructions Per Second,MIPS)表示,也可表示为
per Hz能执行的指令数,故::
capacity(cpu) = work_per_hz(cpu) * max_freq(cpu)
1.2 调度器术语
--------------
调度器使用了两种不同的算力值。CPU的 ``capacity_orig`` 是它的最大可达算力,即最大可达性能等级。
CPU的 ``capacity`` 是 ``capacity_orig`` 扣除了一些性能损失(比如处理中断的耗时)的值。
注意CPU的 ``capacity`` 仅仅被设计用于CFS调度类,而 ``capacity_orig`` 是不感知调度类的。为
简洁起见,本文档的剩余部分将不加区分的使用术语 ``capacity`` 和 ``capacity_orig`` 。
1.3 平台示例
------------
1.3.1 操作性能值相同
~~~~~~~~~~~~~~~~~~~~
考虑一个假想的双核非对称CPU算力系统,其中
- work_per_hz(CPU0) = W
- work_per_hz(CPU1) = W/2
- 所有CPU以相同的固定频率运行
根据上文对算力的定义:
- capacity(CPU0) = C
- capacity(CPU1) = C/2
若这是Arm大小核系统,那么CPU0是大核,而CPU1是小核。
考虑一种周期性产生固定工作量的工作负载,你将会得到类似下图的执行轨迹::
CPU0 work ^
| ____ ____ ____
| | | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
CPU1 work ^
| _________ _________ ____
| | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
CPU0在系统中具有最高算力(C),它使用T个单位时间完成固定工作量W。另一方面,CPU1只有CPU0一半
算力,因此在T个单位时间内仅完成工作量W/2。
1.3.2 最大操作性能值不同
~~~~~~~~~~~~~~~~~~~~~~~~
具有不同算力值的CPU,通常来说最大操作性能值也不同。考虑上一小节提到的CPU(也就是说,
work_per_hz()相同):
- max_freq(CPU0) = F
- max_freq(CPU1) = 2/3 * F
这将推出:
- capacity(CPU0) = C
- capacity(CPU1) = C/3
执行1.3.1节描述的工作负载,每个CPU按最大频率运行,结果为::
CPU0 work ^
| ____ ____ ____
| | | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
workload on CPU1
CPU1 work ^
| ______________ ______________ ____
| | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
1.4 关于计算方式的注意事项
--------------------------
需要注意的是,使用单一值来表示CPU性能的差异是有些争议的。两个不同的微架构的相对性能差异应该
描述为:X%整数运算差异,Y%浮点数运算差异,Z%分支跳转差异,等等。尽管如此,使用简单计算方式
的结果目前还是令人满意的。
2. 任务使用率
=============
2.1 简介
--------
算力感知调度要求描述任务需求,描述方式要和CPU算力相关。每个调度类可以用不同的方式描述它。
任务使用率是CFS独有的描述方式,不过在这里介绍它有助于引入更多一般性的概念。
任务使用率是一种用百分比来描述任务吞吐率需求的方式。一个简单的近似是任务的占空比,也就是说::
task_util(p) = duty_cycle(p)
在频率固定的SMP系统中,100%的利用率意味着任务是忙等待循环。反之,10%的利用率暗示这是一个
小周期任务,它在睡眠上花费的时间比执行更多。
2.2 频率不变性
--------------
一个需要考虑的议题是,工作负载的占空比受CPU正在运行的操作性能值直接影响。考虑以给定的频率F
执行周期性工作负载::
CPU work ^
| ____ ____ ____
| | | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
可以算出 duty_cycle(p) == 25%。
现在,考虑以给定频率F/2执行 *同一个* 工作负载::
CPU work ^
| _________ _________ ____
| | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
可以算出 duty_cycle(p) == 50%,尽管两次执行中,任务的行为完全一致(也就是说,执行的工作量
相同)。
任务利用率信号可按下面公式处理成频率不变的(译注:这里的术语用到了信号与系统的概念)::
task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) / max_frequency(cpu))
对上面两个例子运用该公式,可以算出频率不变的任务利用率均为25%。
2.3 CPU不变性
-------------
CPU算力与任务利用率具有类型的效应,在算力不同的CPU上执行完全相同的工作负载,将算出不同的
占空比。
考虑1.3.2节提到的系统,也就是说::
- capacity(CPU0) = C
- capacity(CPU1) = C/3
每个CPU按最大频率执行指定周期性工作负载,结果为::
CPU0 work ^
| ____ ____ ____
| | | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
CPU1 work ^
| ______________ ______________ ____
| | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
也就是说,
- duty_cycle(p) == 25%,如果任务p在CPU0上按最大频率运行。
- duty_cycle(p) == 75%,如果任务p在CPU1上按最大频率运行。
任务利用率信号可按下面公式处理成CPU算力不变的::
task_util_cpu_inv(p) = duty_cycle(p) * (capacity(cpu) / max_capacity)
其中 ``max_capacity`` 是系统中最高的CPU算力。对上面的例子运用该公式,可以算出CPU算力不变
的任务利用率均为25%。
2.4 任务利用率不变量
--------------------
频率和CPU算力不变性都需要被应用到任务利用率的计算中,以便求出真正的不变信号。
任务利用率的伪计算公式是同时具备CPU和频率不变性的,也就是说,对于指定任务p::
curr_frequency(cpu) capacity(cpu)
task_util_inv(p) = duty_cycle(p) * ------------------- * -------------
max_frequency(cpu) max_capacity
也就是说,任务利用率不变量假定任务在系统中最高算力CPU上以最高频率运行,以此描述任务的行为。
在接下来的章节中提到的任何任务利用率,均是不变量的形式。
2.5 利用率估算
--------------
由于预测未来的水晶球不存在,当任务第一次变成可运行时,任务的行为和任务利用率均不能被准确预测。
CFS调度类基于实体负载跟踪机制(Per-Entity Load Tracking, PELT)维护了少量CPU和任务信号,
其中之一可以算出平均利用率(与瞬时相反)。
这意味着,尽管运用“真实的”任务利用率(凭借水晶球)写出算力感知调度的准则,但是它的实现将只能
用任务利用率的估算值。
3. 算力感知调度的需求
=====================
3.1 CPU算力
-----------
当前,Linux无法凭自身算出CPU算力,因此必须要有把这个信息传递给Linux的方式。每个架构必须为此
定义arch_scale_cpu_capacity()函数。
arm和arm64架构直接把这个信息映射到arch_topology驱动的CPU scaling数据中(译注:参考
arch_topology.h的percpu变量cpu_scale),它是从capacity-dmips-mhz CPU binding中衍生计算
出来的。参见Documentation/devicetree/bindings/arm/cpu-capacity.txt。
3.2 频率不变性
--------------
如2.2节所述,算力感知调度需要频率不变的任务利用率。每个架构必须为此定义
arch_scale_freq_capacity(cpu)函数。
实现该函数要求计算出每个CPU当前以什么频率在运行。实现它的一种方式是利用硬件计数器(x86的
APERF/MPERF,arm64的AMU),它能按CPU当前频率动态可扩展地升降递增计数器的速率。另一种方式是
在cpufreq频率变化时直接使用钩子函数,内核此时感知到将要被切换的频率(也被arm/arm64实现了)。
4. 调度器拓扑结构
=================
在构建调度域时,调度器将会发现系统是否表现为非对称CPU算力。如果是,那么:
- sched_asym_cpucapacity静态键(static key)将使能。
- SD_ASYM_CPUCAPACITY_FULL标志位将在尽量最低调度域层级中被设置,同时要满足条件:调度域恰好
完整包含某个CPU算力值的全部CPU。
- SD_ASYM_CPUCAPACITY标志将在所有包含非对称CPU的调度域中被设置。
sched_asym_cpucapacity静态键的设计意图是,保护为非对称CPU算力系统所准备的代码。不过要注意的
是,这个键是系统范围可见的。想象下面使用了cpuset的步骤::
capacity C/2 C
________ ________
/ \ / \
CPUs 0 1 2 3 4 5 6 7
\__/ \______________/
cpusets cs0 cs1
可以通过下面的方式创建:
.. code-block:: sh
mkdir /sys/fs/cgroup/cpuset/cs0
echo 0-1 > /sys/fs/cgroup/cpuset/cs0/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/cs0/cpuset.mems
mkdir /sys/fs/cgroup/cpuset/cs1
echo 2-7 > /sys/fs/cgroup/cpuset/cs1/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/cs1/cpuset.mems
echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_load_balance
由于“这是”非对称CPU算力系统,sched_asym_cpucapacity静态键将使能。然而,CPU 0--1对应的
调度域层级,算力值仅有一个,该层级中SD_ASYM_CPUCAPACITY未被设置,它描述的是一个SMP区域,也
应该被以此处理。
因此,“典型的”保护非对称CPU算力代码路径的代码模式是:
- 检查sched_asym_cpucapacity静态键
- 如果它被使能,接着检查调度域层级中SD_ASYM_CPUCAPACITY标志位是否出现
5. 算力感知调度的实现
=====================
5.1 CFS
-------
5.1.1 算力适应性(fitness)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
CFS最主要的算力调度准则是::
task_util(p) < capacity(task_cpu(p))
它通常被称为算力适应性准则。也就是说,CFS必须保证任务“适合”在某个CPU上运行。如果准则被违反,
任务将要更长地消耗该CPU,任务是CPU受限的(CPU-bound)。
此外,uclamp允许用户空间指定任务的最小和最大利用率,要么以sched_setattr()的方式,要么以
cgroup接口的方式(参阅Documentation/admin-guide/cgroup-v2.rst)。如其名字所暗示,uclamp
可以被用在前一条准则中限制task_util()。
5.1.2 被唤醒任务的CPU选择
~~~~~~~~~~~~~~~~~~~~~~~~~
CFS任务唤醒的CPU选择,遵循上面描述的算力适应性准则。在此之上,uclamp被用来限制任务利用率,
这令用户空间对CFS任务的CPU选择有更多的控制。也就是说,CFS被唤醒任务的CPU选择,搜索满足以下
条件的CPU::
clamp(task_util(p), task_uclamp_min(p), task_uclamp_max(p)) < capacity(cpu)
通过使用uclamp,举例来说,用户空间可以允许忙等待循环(100%使用率)在任意CPU上运行,只要给
它设置低的uclamp.max值。相反,uclamp能强制一个小的周期性任务(比如,10%利用率)在最高性能
的CPU上运行,只要给它设置高的uclamp.min值。
.. note::
CFS的被唤醒的任务的CPU选择,可被能耗感知调度(Energy Aware Scheduling,EAS)覆盖,在
Documentation/scheduler/sched-energy.rst中描述。
5.1.3 负载均衡
~~~~~~~~~~~~~~
被唤醒任务的CPU选择的一个病理性的例子是,任务几乎不睡眠,那么也几乎不发生唤醒。考虑::
w == wakeup event
capacity(CPU0) = C
capacity(CPU1) = C / 3
workload on CPU0
CPU work ^
| _________ _________ ____
| | | | | |
+----+----+----+----+----+----+----+----+----+----+-> time
w w w
workload on CPU1
CPU work ^
| ____________________________________________
| |
+----+----+----+----+----+----+----+----+----+----+->
w
该工作负载应该在CPU0上运行,不过如果任务满足以下条件之一:
- 一开始发生不合适的调度(不准确的初始利用率估计)
- 一开始调度正确,但突然需要更多的处理器功率
则任务可能变为CPU受限的,也就是说 ``task_util(p) > capacity(task_cpu(p))`` ;CPU算力
调度准则被违反,将不会有任何唤醒事件来修复这个错误的CPU选择。
这种场景下的任务被称为“不合适的”(misfit)任务,处理这个场景的机制同样也以此命名。Misfit
任务迁移借助CFS负载均衡器,更明确的说,是主动负载均衡的部分(用来迁移正在运行的任务)。
当发生负载均衡时,如果一个misfit任务可以被迁移到一个相较当前运行的CPU具有更高算力的CPU上,
那么misfit任务的主动负载均衡将被触发。
5.2 实时调度
------------
5.2.1 被唤醒任务的CPU选择
~~~~~~~~~~~~~~~~~~~~~~~~~
实时任务唤醒时的CPU选择,搜索满足以下条件的CPU::
task_uclamp_min(p) <= capacity(task_cpu(cpu))
同时仍然允许接着使用常规的优先级限制。如果没有CPU能满足这个算力准则,那么将使用基于严格
优先级的调度,CPU算力将被忽略。
5.3 最后期限调度
----------------
5.3.1 被唤醒任务的CPU选择
~~~~~~~~~~~~~~~~~~~~~~~~~
最后期限任务唤醒时的CPU选择,搜索满足以下条件的CPU::
task_bandwidth(p) < capacity(task_cpu(p))
同时仍然允许接着使用常规的带宽和截止期限限制。如果没有CPU能满足这个算力准则,那么任务依然
在当前CPU队列中。
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/sched-design-CFS.rst
:翻译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
===============
完全公平调度器
===============
1. 概述
=======
CFS表示“完全公平调度器”,它是为桌面新设计的进程调度器,由Ingo Molnar实现并合入Linux
2.6.23。它替代了之前原始调度器中SCHED_OTHER策略的交互式代码。
CFS 80%的设计可以总结为一句话:CFS在真实硬件上建模了一个“理想的,精确的多任务CPU”。
“理想的多任务CPU”是一种(不存在的 :-))具有100%物理算力的CPU,它能让每个任务精确地以
相同的速度并行运行,速度均为1/nr_running。举例来说,如果有两个任务正在运行,那么每个
任务获得50%物理算力。 --- 也就是说,真正的并行。
在真实的硬件上,一次只能运行一个任务,所以我们需要介绍“虚拟运行时间”的概念。任务的虚拟
运行时间表明,它的下一个时间片将在上文描述的理想多任务CPU上开始执行。在实践中,任务的
虚拟运行时间由它的真实运行时间相较正在运行的任务总数归一化计算得到。
2. 一些实现细节
===============
在CFS中,虚拟运行时间由每个任务的p->se.vruntime(单位为纳秒)的值表达和跟踪。因此,
精确地计时和测量一个任务应得的“预期的CPU时间”是可能的。
一些细节:在“理想的”硬件上,所有的任务在任何时刻都应该具有一样的p->se.vruntime值,
--- 也就是说,任务应当同时执行,没有任务会在“理想的”CPU分时中变得“不平衡”。
CFS的任务选择逻辑基于p->se.vruntime的值,因此非常简单:总是试图选择p->se.vruntime值
最小的任务运行(也就是说,至今执行时间最少的任务)。CFS总是尽可能尝试按“理想多任务硬件”
那样将CPU时间在可运行任务中均分。
CFS剩下的其它设计,一般脱离了这个简单的概念,附加的设计包括nice级别,多处理,以及各种
用来识别已睡眠任务的算法变体。
3. 红黑树
=========
CFS的设计非常激进:它不使用运行队列的旧数据结构,而是使用按时间排序的红黑树,构建出
任务未来执行的“时间线”。因此没有任何“数组切换”的旧包袱(之前的原始调度器和RSDL/SD都
被它影响)。
CFS同样维护了rq->cfs.min_vruntime值,它是单调递增的,跟踪运行队列中的所有任务的最小
虚拟运行时间值。系统做的全部工作是:使用min_vruntime跟踪,然后用它的值将新激活的调度
实体尽可能地放在红黑树的左侧。
运行队列中正在运行的任务的总数由rq->cfs.load计数,它是运行队列中的任务的权值之和。
CFS维护了一个按时间排序的红黑树,所有可运行任务以p->se.vruntime为键值排序。CFS从这颗
树上选择“最左侧”的任务并运行。系统继续运行,被执行过的任务越来越被放到树的右侧 --- 缓慢,
但很明确每个任务都有成为“最左侧任务”的机会,因此任务将确定性地获得一定量CPU时间。
总结一下,CFS工作方式像这样:它运行一个任务一会儿,当任务发生调度(或者调度器时钟滴答
tick产生),就会考虑任务的CPU使用率:任务刚刚花在物理CPU上的(少量)时间被加到
p->se.vruntime。一旦p->se.vruntime变得足够大,其它的任务将成为按时间排序的红黑树的
“最左侧任务”(相较最左侧的任务,还要加上一个很小的“粒度”量,使得我们不会对任务过度调度,
导致缓存颠簸),然后新的最左侧任务将被选中,当前任务被抢占。
4. CFS的一些特征
================
CFS使用纳秒粒度的计时,不依赖于任何jiffies或HZ的细节。因此CFS并不像之前的调度器那样
有“时间片”的概念,也没有任何启发式的设计。唯一可调的参数(你需要打开CONFIG_SCHED_DEBUG)是:
/proc/sys/kernel/sched_min_granularity_ns
它可以用来将调度器从“桌面”模式(也就是低时延)调节为“服务器”(也就是高批处理)模式。
它的默认设置是适合桌面的工作负载。SCHED_BATCH也被CFS调度器模块处理。
CFS的设计不易受到当前存在的任何针对stock调度器的“攻击”的影响,包括fiftyp.c,thud.c,
chew.c,ring-test.c,massive_intr.c,它们都能很好地运行,不会影响交互性,将产生
符合预期的行为。
CFS调度器处理nice级别和SCHED_BATCH的能力比之前的原始调度器更强:两种类型的工作负载
都被更激进地隔离了。
SMP负载均衡被重做/清理过:遍历运行队列的假设已经从负载均衡的代码中移除,使用调度模块
的迭代器。结果是,负载均衡代码变得简单不少。
5. 调度策略
===========
CFS实现了三种调度策略:
- SCHED_NORMAL:(传统被称为SCHED_OTHER):该调度策略用于普通任务。
- SCHED_BATCH:抢占不像普通任务那样频繁,因此允许任务运行更长时间,更好地利用缓存,
不过要以交互性为代价。它很适合批处理工作。
- SCHED_IDLE:它比nice 19更弱,不过它不是真正的idle定时器调度器,因为要避免给机器
带来死锁的优先级反转问题。
SCHED_FIFO/_RR被实现在sched/rt.c中,它们由POSIX具体说明。
util-linux-ng 2.13.1.1中的chrt命令可以设置以上所有策略,除了SCHED_IDLE。
6. 调度类
=========
新的CFS调度器被设计成支持“调度类”,一种调度模块的可扩展层次结构。这些模块封装了调度策略
细节,由调度器核心代码处理,且无需对它们做太多假设。
sched/fair.c 实现了上文描述的CFS调度器。
sched/rt.c 实现了SCHED_FIFO和SCHED_RR语义,且比之前的原始调度器更简洁。它使用了100个
运行队列(总共100个实时优先级,替代了之前调度器的140个),且不需要过期数组(expired
array)。
调度类由sched_class结构体实现,它包括一些函数钩子,当感兴趣的事件发生时,钩子被调用。
这是(部分)钩子的列表:
- enqueue_task(...)
当任务进入可运行状态时,被调用。它将调度实体(任务)放到红黑树中,增加nr_running变量
的值。
- dequeue_task(...)
当任务不再可运行时,这个函数被调用,对应的调度实体被移出红黑树。它减少nr_running变量
的值。
- yield_task(...)
这个函数的行为基本上是出队,紧接着入队,除非compat_yield sysctl被开启。在那种情况下,
它将调度实体放在红黑树的最右端。
- check_preempt_curr(...)
这个函数检查进入可运行状态的任务能否抢占当前正在运行的任务。
- pick_next_task(...)
这个函数选择接下来最适合运行的任务。
- set_curr_task(...)
这个函数在任务改变调度类或改变任务组时被调用。
- task_tick(...)
这个函数最常被时间滴答函数调用,它可能导致进程切换。这驱动了运行时抢占。
7. CFS的组调度扩展
==================
通常,调度器操作粒度为任务,努力为每个任务提供公平的CPU时间。有时可能希望将任务编组,
并为每个组提供公平的CPU时间。举例来说,可能首先希望为系统中的每个用户提供公平的CPU
时间,接下来才是某个用户的每个任务。
CONFIG_CGROUP_SCHED 力求实现它。它将任务编组,并为这些组公平地分配CPU时间。
CONFIG_RT_GROUP_SCHED 允许将实时(也就是说,SCHED_FIFO和SCHED_RR)任务编组。
CONFIG_FAIR_GROUP_SCHED 允许将CFS(也就是说,SCHED_NORMAL和SCHED_BATCH)任务编组。
这些编译选项要求CONFIG_CGROUPS被定义,然后管理员能使用cgroup伪文件系统任意创建任务组。
关于该文件系统的更多信息,参见Documentation/admin-guide/cgroup-v1/cgroups.rst
当CONFIG_FAIR_GROUP_SCHED被定义后,通过伪文件系统,每个组被创建一个“cpu.shares”文件。
参见下面的例子来创建任务组,并通过“cgroup”伪文件系统修改它们的CPU份额::
# mount -t tmpfs cgroup_root /sys/fs/cgroup
# mkdir /sys/fs/cgroup/cpu
# mount -t cgroup -ocpu none /sys/fs/cgroup/cpu
# cd /sys/fs/cgroup/cpu
# mkdir multimedia # 创建 "multimedia" 任务组
# mkdir browser # 创建 "browser" 任务组
# #配置multimedia组,令其获得browser组两倍CPU带宽
# echo 2048 > multimedia/cpu.shares
# echo 1024 > browser/cpu.shares
# firefox & # 启动firefox并把它移到 "browser" 组
# echo <firefox_pid> > browser/tasks
# #启动gmplayer(或者你最喜欢的电影播放器)
# echo <movie_player_pid> > multimedia/tasks
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/scheduler/sched-domains.rst
:翻译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
:校译:
司延腾 Yanteng Si <siyanteng@loongson.cn>
======
调度域
======
每个CPU有一个“基”调度域(struct sched_domain)。调度域层次结构从基调度域构建而来,可
通过->parent指针自下而上遍历。->parent必须以NULL结尾,调度域结构体必须是per-CPU的,
因为它们无锁更新。
每个调度域管辖数个CPU(存储在->span字段中)。一个调度域的span必须是它的子调度域span的
超集(如有需求出现,这个限制可以放宽)。CPU i的基调度域必须至少管辖CPU i。每个CPU的
顶层调度域通常将会管辖系统中的全部CPU,尽管严格来说这不是必须的,假如是这样,会导致某些
CPU出现永远不会被指定任务运行的情况,直到允许的CPU掩码被显式设定。调度域的span字段意味
着“在这些CPU中做进程负载均衡”。
每个调度域必须具有一个或多个CPU调度组(struct sched_group),它们以单向循环链表的形式
组织,存储在->groups指针中。这些组的CPU掩码的并集必须和调度域span字段一致。->groups
指针指向的这些组包含的CPU,必须被调度域管辖。组包含的是只读数据,被创建之后,可能被多个
CPU共享。任意两个组的CPU掩码的交集不一定为空,如果是这种情况,对应调度域的SD_OVERLAP
标志位被设置,它管辖的调度组可能不能在多个CPU中共享。
调度域中的负载均衡发生在调度组中。也就是说,每个组被视为一个实体。组的负载被定义为它
管辖的每个CPU的负载之和。仅当组的负载不均衡后,任务才在组之间发生迁移。
在kernel/sched/core.c中,trigger_load_balance()在每个CPU上通过scheduler_tick()
周期执行。在当前运行队列下一个定期调度再平衡事件到达后,它引发一个软中断。负载均衡真正
的工作由run_rebalance_domains()->rebalance_domains()完成,在软中断上下文中执行
(SCHED_SOFTIRQ)。
后一个函数有两个入参:当前CPU的运行队列、它在scheduler_tick()调用时是否空闲。函数会从
当前CPU所在的基调度域开始迭代执行,并沿着parent指针链向上进入更高层级的调度域。在迭代
过程中,函数会检查当前调度域是否已经耗尽了再平衡的时间间隔,如果是,它在该调度域运行
load_balance()。接下来它检查父调度域(如果存在),再后来父调度域的父调度域,以此类推。
起初,load_balance()查找当前调度域中最繁忙的调度组。如果成功,在该调度组管辖的全部CPU
的运行队列中找出最繁忙的运行队列。如能找到,对当前的CPU运行队列和新找到的最繁忙运行
队列均加锁,并把任务从最繁忙队列中迁移到当前CPU上。被迁移的任务数量等于在先前迭代执行
中计算出的该调度域的调度组的不均衡值。
实现调度域
==========
基调度域会管辖CPU层次结构中的第一层。对于超线程(SMT)而言,基调度域将会管辖同一个物理
CPU的全部虚拟CPU,每个虚拟CPU对应一个调度组。
在SMP中,基调度域的父调度域将会管辖同一个结点中的全部物理CPU,每个调度组对应一个物理CPU。
接下来,如果是非统一内存访问(NUMA)系统,SMP调度域的父调度域将管辖整个机器,一个结点的
CPU掩码对应一个调度组。亦或,你可以使用多级NUMA;举例来说Opteron处理器,可能仅用一个
调度域来覆盖它的一个NUMA层级。
实现者需要阅读include/linux/sched/sd_flags.h的注释:读SD_*来了解具体情况以及调度域的
SD标志位调节了哪些东西。
体系结构可以把指定的拓扑层级的通用调度域构建器和默认的SD标志位覆盖掉,方法是创建一个
sched_domain_topology_level数组,并以该数组作为入参调用set_sched_topology()。
调度域调试基础设施可以通过CONFIG_SCHED_DEBUG开启,并在开机启动命令行中增加
“sched_verbose”。如果你忘记调整开机启动命令行了,也可以打开
/sys/kernel/debug/sched/verbose开关。这将开启调度域错误检查的解析,它应该能捕获(上文
描述过的)绝大多数错误,同时以可视化格式打印调度域的结构。
......@@ -34,7 +34,8 @@ The Linux kernel supports the following overcommit handling modes
The overcommit policy is set via the sysctl ``vm.overcommit_memory``.
The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
or ``vm.overcommit_kbytes`` (absolute value).
or ``vm.overcommit_kbytes`` (absolute value). These only have an effect
when ``vm.overcommit_memory`` is set to 2.
The current overcommit limit and amount committed are viewable in
``/proc/meminfo`` as CommitLimit and Committed_AS respectively.
......
......@@ -18413,6 +18413,7 @@ M: Vineet Gupta <vgupta@kernel.org>
L: linux-snps-arc@lists.infradead.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git
F: Documentation/arc/
F: Documentation/devicetree/bindings/arc/*
F: Documentation/devicetree/bindings/interrupt-controller/snps,arc*
F: arch/arc/
......@@ -19430,12 +19431,6 @@ W: https://github.com/srcres258/linux-doc
T: git git://github.com/srcres258/linux-doc.git doc-zh-tw
F: Documentation/translations/zh_TW/
TRIVIAL PATCHES
M: Jiri Kosina <trivial@kernel.org>
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial.git
K: ^Subject:.*(?i)trivial
TTY LAYER
M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
M: Jiri Slaby <jirislaby@kernel.org>
......
......@@ -78,6 +78,7 @@ my %texlive = (
'ucs.sty' => 'texlive-ucs',
'upquote.sty' => 'texlive-upquote',
'wrapfig.sty' => 'texlive-wrapfig',
'ctexhook.sty' => 'texlive-ctex',
);
#
......@@ -369,6 +370,9 @@ sub give_debian_hints()
);
if ($pdf) {
check_missing_file(["/usr/share/texlive/texmf-dist/tex/latex/ctex/ctexhook.sty"],
"texlive-lang-chinese", 2);
check_missing_file(["/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"],
"fonts-dejavu", 2);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment