Commit 624ad333 authored by Linus Torvalds

Merge tag 'docs-5.16' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This is a relatively unexciting cycle for documentation.

   - Some small scripts/kerneldoc fixes

   - More Chinese translation work, but at a much reduced rate.

   - The tip-tree maintainer's handbook

  ...plus the usual array of build fixes, typo fixes, etc"

* tag 'docs-5.16' of git://git.lwn.net/linux: (53 commits)
  kernel-doc: support DECLARE_PHY_INTERFACE_MASK()
  docs/zh_CN: add core-api xarray translation
  docs/zh_CN: add core-api assoc_array translation
  speakup: Fix typo in documentation "boo" -> "boot"
  docs: submitting-patches: make section about the Link: tag more explicit
  docs: deprecated.rst: Clarify open-coded arithmetic with literals
  scripts: documentation-file-ref-check: fix bpf selftests path
  scripts: documentation-file-ref-check: ignore hidden files
  coding-style.rst: trivial: fix location of driver model macros
  docs: f2fs: fix text alignment
  docs/zh_CN add PCI pci.rst translation
  docs/zh_CN add PCI index.rst translation
  docs: translations: zh_CN: memory-hotplug.rst: fix a typo
  docs: translations: zn_CN: irq-affinity.rst: add a missing extension
  block: add documentation for inflight
  scripts: kernel-doc: Ignore __alloc_size() attribute
  docs: pdfdocs: Adjust \headheight for fancyhdr
  docs: UML: user_mode_linux_howto_v2 edits
  docs: use the lore redirector everywhere
  docs: proc.rst: mountinfo: align columns
  ...
parents 313b6ffc 603bdf5d
@@ -28,6 +28,22 @@ Description:
For more details refer Documentation/admin-guide/iostats.rst
What: /sys/block/<disk>/inflight
Date: October 2009
Contact: Jens Axboe <axboe@kernel.dk>, Nikanth Karthikesan <knikanth@suse.de>
Description:
Reports the number of I/O requests currently in progress
(pending / in flight) in a device driver. This can be less
than the number of requests queued in the block device queue.
The report contains 2 fields: one for read requests
and one for write requests.
The value type is unsigned int.
Cf. Documentation/block/stat.rst which contains a single value for
requests in flight.
This is related to nr_requests in Documentation/block/queue-sysfs.rst
and for SCSI device also its queue_depth.
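A minimal sketch of reading the two counters from userspace follows. It assumes the read count comes first and the write count second, as the ordering in the description suggests; the helper names `parse_inflight()` and `read_inflight()` are hypothetical and not part of any kernel or libc API.

```c
#include <stdio.h>

/*
 * Parse the two counters from the text of an inflight file.
 * Returns 0 on success, -1 if the text does not hold two unsigned values.
 */
static int parse_inflight(const char *text, unsigned int *reads,
			  unsigned int *writes)
{
	return sscanf(text, "%u %u", reads, writes) == 2 ? 0 : -1;
}

/* Read and parse an inflight file; the path is supplied by the caller. */
static int read_inflight(const char *path, unsigned int *reads,
			 unsigned int *writes)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_inflight(buf, reads, writes);
	fclose(f);
	return ret;
}
```

A caller would pass a path such as `/sys/block/<disk>/inflight` with the disk name filled in. The two values are a snapshot and may already be stale by the time they are printed.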
What: /sys/block/<disk>/diskseq
Date: February 2021
Contact: Matteo Croce <mcroce@microsoft.com>
......
@@ -29,7 +29,7 @@ Description:
What: /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
Date: December 2019
KernelVersion: 5.6
-Contact: SeongJae Park <sjpark@amazon.de>
+Contact: SeongJae Park <sj@kernel.org>
Description:
When memory pressure is reported to blkback this option
controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@ Description:
What: /sys/module/xen_blkback/parameters/feature_persistent
Date: September 2020
KernelVersion: 5.10
-Contact: SeongJae Park <sjpark@amazon.de>
+Contact: SeongJae Park <sj@kernel.org>
Description:
Whether to enable the persistent grants feature or not. Note
that this option only takes effect on newly created backends.
......
@@ -12,7 +12,7 @@ Description:
What: /sys/module/xen_blkfront/parameters/feature_persistent
Date: September 2020
KernelVersion: 5.10
-Contact: SeongJae Park <sjpark@amazon.de>
+Contact: SeongJae Park <sj@kernel.org>
Description:
Whether to enable the persistent grants feature or not. Note
that this option only takes effect on newly created frontends.
......
@@ -196,6 +196,28 @@ you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
Exceptions for Shared Memory
============================
Page table entries for shared pages are cleared when the pages are zapped or
swapped out. This makes swapped out pages indistinguishable from never-allocated
ones.
In kernel space, the swap location can still be retrieved from the page cache.
However, values stored only on the normal PTE get lost irretrievably when the
page is swapped out (i.e. SOFT_DIRTY).
In user space, whether the page is present, swapped or none can be deduced with
the help of lseek and/or mincore system calls.
lseek() can differentiate between accessed pages (present or swapped out) and
holes (none/non-allocated) by specifying the SEEK_DATA flag on the file where
the pages are backed. For anonymous shared pages, the file can be found in
``/proc/pid/map_files/``.
mincore() can differentiate between pages in memory (present, including swap
cache) and out of memory (swapped out or none/non-allocated).
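The two userspace checks described above can be sketched as follows. This is an illustration under stated assumptions, not code from the kernel tree: the helper names are hypothetical, the caller is assumed to hold an open file descriptor for the backing file (for anonymous shared memory, one opened through ``/proc/pid/map_files/``), and offsets passed to the mincore() helper must be page aligned because mmap() requires it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes SEEK_DATA in glibc headers */
#endif
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3		/* Linux value, used only if the headers hide it */
#endif

/*
 * 1: offset holds data (page present or swapped out)
 * 0: offset is in a hole or past the end of the file
 * -1: error
 * Note: this moves the file offset of fd.
 */
static int offset_has_data(int fd, off_t off)
{
	off_t pos = lseek(fd, off, SEEK_DATA);

	if (pos == (off_t)-1)
		return errno == ENXIO ? 0 : -1;
	return pos == off;
}

/*
 * 1: page is resident in memory, 0: it is not (swapped out or never
 * allocated), -1: error.
 */
static int offset_is_resident(int fd, off_t off)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	void *addr;
	int ret;

	addr = mmap(NULL, psz, PROT_READ, MAP_SHARED, fd, off);
	if (addr == MAP_FAILED)
		return -1;
	ret = mincore(addr, psz, &vec);
	munmap(addr, psz);
	return ret ? -1 : (vec & 1);
}
```

Combining both results separates the three states: data and resident means present, data but not resident means swapped out, and no data means the page was never allocated.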
Other notes
===========
......
@@ -543,7 +543,7 @@ As mentioned earlier, Speakup can either be completely compiled into the
kernel, with the exception of the help module, or it can be compiled as
a series of modules. When compiled as modules, Speakup will only be
able to speak some of the bootup messages if your system administrator
-has configured the system to load the modules at boo time. The modules
+has configured the system to load the modules at boot time. The modules
can be loaded after the file systems have been checked and mounted, or
from an initrd. There is a third possibility. Speakup can be compiled
with some components built into the kernel, and others as modules. As
......
@@ -21,6 +21,7 @@ Orion family
- Datasheet: https://web.archive.org/web/20210124231420/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-datasheet.pdf
- Programmer's User Guide: https://web.archive.org/web/20210124231536/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-opensource-manual.pdf
- User Manual: https://web.archive.org/web/20210124231631/http://csclub.uwaterloo.ca/~board/ts7800/MV88F5182-usermanual.pdf
- Functional Errata: https://web.archive.org/web/20210704165540/https://www.digriz.org.uk/ts78xx/88F5182_Functional_Errata.pdf
- 88F5281
- Datasheet: https://web.archive.org/web/20131028144728/http://www.ocmodshop.com/images/reviews/networking/qnap_ts409u/marvel_88f5281_data_sheet.pdf
@@ -212,6 +213,7 @@ EBU Armada family ARMv8
arch/arm64/boot/dts/marvell/armada-37*
Armada 7K Flavors:
- 88F6040 (AP806 Quad 600 MHz + one CP110)
- 88F7020 (AP806 Dual + one CP110)
- 88F7040 (AP806 Quad + one CP110)
@@ -243,6 +245,23 @@ EBU Armada family ARMv8
Device tree files:
arch/arm64/boot/dts/marvell/armada-80*
Octeon TX2 CN913x Flavors:
- CN9130 (AP807 Quad + one internal CP115)
- CN9131 (AP807 Quad + one internal CP115 + one external CP115 / 88F8215)
- CN9132 (AP807 Quad + one internal CP115 + two external CP115 / 88F8215)
Core:
ARM Cortex A72
Homepage:
https://web.archive.org/web/20200803150818/https://www.marvell.com/products/infrastructure-processors/multi-core-processors/octeon-tx2/octeon-tx2-cn9130.html
Product Brief:
https://web.archive.org/web/20200803150818/https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-infrastructure-processors-octeon-tx2-cn913x-product-brief-2020-02.pdf
Device tree files:
arch/arm64/boot/dts/marvell/cn913*
Avanta family
-------------
......
@@ -64,7 +64,7 @@ macros, it was decided that brand new macros should be introduced instead::
of importing all the crappy, historic, essentially randomly chosen
debug symbol macro names from the binutils and older kernels?
-.. _discussion: https://lkml.kernel.org/r/20170217104757.28588-1-jslaby@suse.cz
+.. _discussion: https://lore.kernel.org/r/20170217104757.28588-1-jslaby@suse.cz
Macros Description
------------------
......
@@ -40,10 +40,11 @@ discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
-The discard_max_bytes parameter is set by the device driver to the maximum
-number of bytes that can be discarded in a single operation. Discard
-requests issued to the device must not exceed this limit. A discard_max_bytes
-value of 0 means that the device does not support discard functionality.
+The `discard_max_hw_bytes` parameter is set by the device driver to the
+maximum number of bytes that can be discarded in a single operation.
+Discard requests issued to the device must not exceed this limit.
+A `discard_max_hw_bytes` value of 0 means that the device does not support
+discard functionality.
discard_max_bytes (RW)
----------------------
......
@@ -353,6 +353,9 @@ latex_elements = {
\\setsansfont{DejaVu Sans}
\\setromanfont{DejaVu Serif}
\\setmonofont{DejaVu Sans Mono}
% Adjust \\headheight for fancyhdr
\\addtolength{\\headheight}{1.6pt}
\\addtolength{\\topmargin}{-1.6pt}
''',
}
......
@@ -710,6 +710,39 @@ Indentation and Line Breaks
See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings
**SPLIT_STRING**
Quoted strings that appear as messages in userspace and can be
grepped, should not be split across multiple lines.
See: https://lore.kernel.org/lkml/20120203052727.GA15035@leaf/
**MULTILINE_DEREFERENCE**
A single dereferencing identifier spanned on multiple lines like::
struct_identifier->member[index].
member = <foo>;
is generally hard to follow. It can easily lead to typos and so makes
the code vulnerable to bugs.
If fixing the multiple line dereferencing leads to an 80 column
violation, then either rewrite the code in a more simple way or if the
starting part of the dereferencing identifier is the same and used at
multiple places then store it in a temporary variable, and use that
temporary variable only at all the places. For example, if there are
two dereferencing identifiers::
member1->member2->member3.foo1;
member1->member2->member3.foo2;
then store the member1->member2->member3 part in a temporary variable.
It not only helps to avoid the 80 column violation but also reduces
the program size by removing the unnecessary dereferences.
But if none of the above methods work then ignore the 80 column
violation because it is much easier to read a dereferencing identifier
on a single line.
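The temporary-variable approach described for MULTILINE_DEREFERENCE might look like the sketch below. The structure layout is hypothetical and only mirrors the `member1->member2->member3` chain used in the text above.

```c
/* Hypothetical types that mirror the member1->member2->member3 chain. */
struct inner {
	int foo1;
	int foo2;
};

struct middle {
	struct inner member3;
};

struct outer {
	struct middle *member2;
};

static int sum_foo(const struct outer *member1)
{
	/* Store the shared prefix once instead of repeating the chain. */
	const struct inner *in = &member1->member2->member3;

	return in->foo1 + in->foo2;
}
```

Each use of `in` now fits comfortably on one line, so no dereference has to be split across lines.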
**TRAILING_STATEMENTS**
Trailing statements (for example after any conditional) should be
on the next line.
@@ -845,6 +878,38 @@ Macros, Attributes and Symbols
Use the `fallthrough;` pseudo keyword instead of
`/* fallthrough */` like comments.
**TRAILING_SEMICOLON**
Macro definition should not end with a semicolon. The macro
invocation style should be consistent with function calls.
This can prevent any unexpected code paths::
#define MAC do_something;
If this macro is used within a if else statement, like::
if (some_condition)
MAC;
else
do_something;
Then there would be a compilation error, because when the macro is
expanded there are two trailing semicolons, so the else branch gets
orphaned.
See: https://lore.kernel.org/lkml/1399671106.2912.21.camel@joe-AO725/
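As an illustration of the TRAILING_SEMICOLON rule, the sketch below defines the macro without a trailing semicolon so that it behaves like a function call inside an if/else. The names `do_something()`, `calls` and `run()` are hypothetical, and the macro is written here as a function call rather than the bare identifier used in the text above.

```c
static int calls;

static void do_something(void)
{
	calls++;
}

/*
 * With "#define MAC do_something();" the use below would expand to
 * "do_something();;". The extra empty statement terminates the if,
 * leaving the else without a matching if, which fails to compile.
 */
#define MAC do_something()

static int run(int cond)
{
	calls = 0;
	if (cond)
		MAC;
	else
		do_something();
	return calls;
}
```

Because the semicolon is supplied at the call site, `MAC;` reads and parses exactly like `do_something();`.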
**SINGLE_STATEMENT_DO_WHILE_MACRO**
For the multi-statement macros, it is necessary to use the do-while
loop to avoid unpredictable code paths. The do-while loop helps to
group the multiple statements into a single one so that a
function-like macro can be used as a function only.
But for the single statement macros, it is unnecessary to use the
do-while loop. Although the code is syntactically correct but using
the do-while loop is redundant. So remove the do-while loop for single
statement macros.
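The difference between the two cases in SINGLE_STATEMENT_DO_WHILE_MACRO can be sketched as follows. The macro names `SWAP_INT` and `RESET` and the function `demo()` are hypothetical examples, not macros from the kernel.

```c
/* Multi-statement macro: do-while(0) makes it act as one statement. */
#define SWAP_INT(a, b)		\
	do {			\
		int tmp_ = (a);	\
		(a) = (b);	\
		(b) = tmp_;	\
	} while (0)

/* Single-statement macro: wrapping it in do-while(0) would be redundant. */
#define RESET(x) ((x) = 0)

static int demo(int flag)
{
	int a = 1, b = 2;

	/* Both macros are safe in an unbraced if/else. */
	if (flag)
		SWAP_INT(a, b);
	else
		RESET(a);
	return a * 10 + b;
}
```

Without the do-while wrapper, only the first statement of `SWAP_INT` would belong to the `if`, and the `else` would no longer compile.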
**WEAK_DECLARATION**
Using weak declarations like __attribute__((weak)) or __weak
can have unintended link defects. Avoid using them.
@@ -920,6 +985,11 @@ Functions and Variables
Your compiler (or rather your loader) automatically does
it for you.
**MULTIPLE_ASSIGNMENTS**
Multiple assignments on a single line makes the code unnecessarily
complicated. So on a single line assign value to a single variable
only, this makes the code more readable and helps avoid typos.
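A short sketch of the MULTIPLE_ASSIGNMENTS rule, using a hypothetical `struct window`, with the discouraged form kept as a comment:

```c
struct window {
	int width;
	int height;
};

static void window_reset(struct window *w)
{
	/* Discouraged form: w->width = w->height = 0; */
	w->width = 0;
	w->height = 0;
}
```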
**RETURN_PARENTHESES**
return is not a function and as such doesn't need parentheses::
@@ -957,6 +1027,17 @@ Permissions
Permission bits should use 4 digit octal permissions (like 0700 or 0444).
Avoid using any other base like decimal.
**SYMBOLIC_PERMS**
Permission bits in the octal form are more readable and easier to
understand than their symbolic counterparts because many command-line
tools use this notation. Experienced kernel developers have been using
these traditional Unix permission bits for decades and so they find it
easier to understand the octal notation than the symbolic macros.
For example, it is harder to read S_IWUSR|S_IRUGO than 0644, which
obscures the developer's intent rather than clarifying it.
See: https://lore.kernel.org/lkml/CA+55aFw5v23T-zvDZp-MmD_EYxF8WbafwwB59934FV7g21uMGQ@mail.gmail.com/
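The equivalence mentioned for SYMBOLIC_PERMS can be checked with a small userspace sketch. `S_IRUGO` is a kernel-internal helper that userspace headers normally do not provide, so the sketch defines it locally, on the assumption that it matches the kernel's definition (read permission for user, group and other). The two function names are hypothetical.

```c
#include <sys/stat.h>

/* Kernel-internal helper, defined here for userspace builds. */
#ifndef S_IRUGO
#define S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)
#endif

/* Symbolic spelling: write for user, read for everyone. */
static unsigned int symbolic_mode(void)
{
	return S_IWUSR | S_IRUGO;
}

/* Octal spelling of the same permission bits. */
static unsigned int octal_mode(void)
{
	return 0644;
}
```

Both functions return the same value; the octal form states it directly.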
Spacing and Brackets
--------------------
......
@@ -12,41 +12,31 @@ track the inode as orphan so that in case of crash extra blocks allocated to
the file get truncated.
Traditionally ext4 tracks orphan inodes in a form of single linked list where
-superblock contains the inode number of the last orphan inode (s\_last\_orphan
+superblock contains the inode number of the last orphan inode (s_last_orphan
field) and then each inode contains inode number of the previously orphaned
-inode (we overload i\_dtime inode field for this). However this filesystem
+inode (we overload i_dtime inode field for this). However this filesystem
global single linked list is a scalability bottleneck for workloads that result
in heavy creation of orphan inodes. When orphan file feature
-(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
-(referenced from the superblock through s\_orphan_file_inum) with several
+(COMPAT_ORPHAN_FILE) is enabled, the filesystem has a special inode
+(referenced from the superblock through s_orphan_file_inum) with several
blocks. Each of these blocks has a structure:
-.. list-table::
-   :widths: 8 8 24 40
-   :header-rows: 1
-
-   * - Offset
-     - Type
-     - Name
-     - Description
-   * - 0x0
-     - Array of \_\_le32 entries
-     - Orphan inode entries
-     - Each \_\_le32 entry is either empty (0) or it contains inode number of
-       an orphan inode.
-   * - blocksize - 8
-     - \_\_le32
-     - ob\_magic
-     - Magic value stored in orphan block tail (0x0b10ca04)
-   * - blocksize - 4
-     - \_\_le32
-     - ob\_checksum
-     - Checksum of the orphan block.
+============= ================ =============== ===============================
+Offset        Type             Name            Description
+============= ================ =============== ===============================
+0x0           Array of         Orphan inode    Each __le32 entry is either
+              __le32 entries   entries         empty (0) or it contains
+                                               inode number of an orphan
+                                               inode.
+blocksize-8   __le32           ob_magic        Magic value stored in orphan
+                                               block tail (0x0b10ca04)
+blocksize-4   __le32           ob_checksum     Checksum of the orphan block.
+============= ================ =============== ===============================
When a filesystem with orphan file feature is writeably mounted, we set
-RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
+RO_COMPAT_ORPHAN_PRESENT feature in the superblock to indicate there may
be valid orphan entries. In case we see this feature when mounting the
filesystem, we read the whole orphan file and process all orphan inodes found
there as usual. When cleanly unmounting the filesystem we remove the
-RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
+RO_COMPAT_ORPHAN_PRESENT feature to avoid unnecessary scanning of the orphan
file and also make the filesystem fully compatible with older kernels.
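The block layout in the table above can be walked with a small sketch like the following. It is an illustration only: the function names and the constant name `ORPHAN_BLOCK_MAGIC` are hypothetical, the block is assumed to be already read into memory, and the checksum in the last four bytes is not verified because its algorithm is not described here.

```c
#include <stddef.h>
#include <stdint.h>

#define ORPHAN_BLOCK_MAGIC 0x0b10ca04u	/* ob_magic value from the table */

/* Decode one little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Count the non-empty entries in one orphan block.
 * Returns -1 if the magic at offset blocksize-8 does not match.
 */
static long count_orphans(const unsigned char *block, size_t blocksize)
{
	size_t nentries, i;
	long used = 0;

	if (blocksize < 8 ||
	    get_le32(block + blocksize - 8) != ORPHAN_BLOCK_MAGIC)
		return -1;

	/* Entries fill the block up to the 8-byte tail. */
	nentries = (blocksize - 8) / 4;
	for (i = 0; i < nentries; i++)
		if (get_le32(block + 4 * i) != 0)
			used++;
	return used;
}
```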
@@ -283,7 +283,7 @@ compress_extension=%s Support adding specified extension, so that f2fs can enab
For other files, we can still enable compression via ioctl.
Note that, there is one reserved special extension '*', it
can be set to enable compression for all files.
-nocompress_extension=%s Support adding specified extension, so that f2fs can disable
+nocompress_extension=%s Support adding specified extension, so that f2fs can disable
compression on those corresponding files, just contrary to compression extension.
If you know exactly which files cannot be compressed, you can use this.
The same extension name can't appear in both compress and nocompress
......
@@ -1857,19 +1857,19 @@ For example::
This file contains lines of the form::
36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
-(1)(2)(3) (4) (5) (6) (7) (8) (9) (10) (11)
+(1)(2)(3) (4) (5) (6) (n…m) (m+1)(m+2) (m+3) (m+4)
(1) mount ID: unique identifier of the mount (may be reused after umount)
(2) parent ID: ID of parent (or of self for the top of the mount tree)
(3) major:minor: value of st_dev for files on filesystem
(4) root: root of the mount within the filesystem
(5) mount point: mount point relative to the process's root
(6) mount options: per mount options
-(7) optional fields: zero or more fields of the form "tag[:value]"
-(8) separator: marks the end of the optional fields
-(9) filesystem type: name of filesystem of the form "type[.subtype]"
-(10) mount source: filesystem specific information or "none"
-(11) super options: per super block options
+(n…m) optional fields: zero or more fields of the form "tag[:value]"
+(m+1) separator: marks the end of the optional fields
+(m+2) filesystem type: name of filesystem of the form "type[.subtype]"
+(m+3) mount source: filesystem specific information or "none"
+(m+4) super options: per super block options
Parsers should ignore all unrecognised optional fields. Currently the
possible optional fields are:
......
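Because the number of optional fields varies, a parser for the mountinfo line format shown above cannot rely on fixed field positions after field (6). The sketch below is one possible approach, not a reference implementation: it reads the first six fields, skips ahead to the ``-`` separator, and then reads the remaining three. The structure and function names are hypothetical, the buffer sizes are arbitrary, and octal escapes in paths (such as ``\040`` for a space) are left undecoded.

```c
#include <stdio.h>
#include <string.h>

struct mountinfo {
	int mount_id;
	int parent_id;
	unsigned int dev_major;
	unsigned int dev_minor;
	char root[256];
	char mount_point[256];
	char mount_opts[256];
	char fstype[64];
	char source[256];
	char super_opts[256];
};

/* Returns 0 on success, -1 if the line does not match the format. */
static int parse_mountinfo(const char *line, struct mountinfo *mi)
{
	const char *sep;
	int used = 0;

	/* Fields (1) to (6); %n records how much of the line was consumed. */
	if (sscanf(line, "%d %d %u:%u %255s %255s %255s%n",
		   &mi->mount_id, &mi->parent_id,
		   &mi->dev_major, &mi->dev_minor,
		   mi->root, mi->mount_point, mi->mount_opts, &used) != 7)
		return -1;

	/* Skip zero or more optional fields by locating the separator. */
	sep = strstr(line + used, " - ");
	if (!sep)
		return -1;

	/* Fields (m+2) to (m+4) follow the separator. */
	if (sscanf(sep + 3, "%63s %255s %255s",
		   mi->fstype, mi->source, mi->super_opts) != 3)
		return -1;
	return 0;
}
```

Searching for the separator instead of counting fields is what lets the parser ignore optional fields it does not recognise.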
@@ -42,7 +42,7 @@
# "select FW_LOADER" [0], in the end the simple alternative solution to this
# problem consisted on matching semantics with newly introduced features.
#
-# [0] https://lkml.kernel.org/r/1432241149-8762-1-git-send-email-mcgrof@do-not-panic.com
+# [0] https://lore.kernel.org/r/1432241149-8762-1-git-send-email-mcgrof@do-not-panic.com
mainmenu "Simple example to demo cumulative kconfig recursive dependency implication"
......
@@ -15,7 +15,7 @@ please direct abuse to Tobin C. Harding <me@tobin.cc>.
Original email thread::
-http://lkml.kernel.org/r/20171114110500.GA21175@kroah.com
+https://lore.kernel.org/r/20171114110500.GA21175@kroah.com
Create Branch
......
@@ -50,7 +50,7 @@ the excellent reporting over at LWN.net or read the original code.
patchset
[PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY
-https://lkml.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com
+https://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com
Interface
......
@@ -480,13 +480,48 @@ closing function brace line. E.g.:
}
EXPORT_SYMBOL(system_is_up);
6.1) Function prototypes
************************
In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.
-Do not use the ``extern`` keyword with function prototypes as this makes
+Do not use the ``extern`` keyword with function declarations as this makes
lines longer and isn't strictly necessary.
When writing function prototypes, please keep the `order of elements regular
<https://lore.kernel.org/mm-commits/CAHk-=wiOCLRny5aifWNhr621kYrJwhfURsa0vFPeUEm8mF0ufg@mail.gmail.com/>`_.
For example, using this function declaration example::
__init void * __must_check action(enum magic value, size_t size, u8 count,
char *fmt, ...) __printf(4, 5) __malloc;
The preferred order of elements for a function prototype is:
- storage class (below, ``static __always_inline``, noting that ``__always_inline``
is technically an attribute but is treated like ``inline``)
- storage class attributes (here, ``__init`` -- i.e. section declarations, but also
things like ``__cold``)
- return type (here, ``void *``)
- return type attributes (here, ``__must_check``)
- function name (here, ``action``)
- function parameters (here, ``(enum magic value, size_t size, u8 count, char *fmt, ...)``,
noting that parameter names should always be included)
- function parameter attributes (here, ``__printf(4, 5)``)
- function behavior attributes (here, ``__malloc``)
Note that for a function **definition** (i.e. the actual function body),
the compiler does not allow function parameter attributes after the
function parameters. In these cases, they should go after the storage
class attributes (e.g. note the changed position of ``__printf(4, 5)``
below, compared to the **declaration** example above)::
static __always_inline __init __printf(4, 5) void * __must_check action(enum magic value,
size_t size, u8 count, char *fmt, ...) __malloc
{
...
}
7) Centralized exiting of functions
-----------------------------------
@@ -855,7 +890,7 @@ Kernel messages do not have to be terminated with a period.
Printing numbers in parentheses (%d) adds no value and should be avoided.
-There are a number of driver model diagnostic macros in <linux/device.h>
+There are a number of driver model diagnostic macros in <linux/dev_printk.h>
which you should use to make sure messages are matched to the right device
and driver, and are tagged with the right level: dev_err(), dev_warn(),
dev_info(), and so forth. For messages that aren't associated with a
......
@@ -59,8 +59,9 @@ risk of them overflowing. This could lead to values wrapping around and a
smaller allocation being made than the caller was expecting. Using those
allocations could lead to linear overflows of heap memory and other
misbehaviors. (One exception to this is literal values where the compiler
-can warn if they might overflow. Though using literals for arguments as
-suggested below is also harmless.)
+can warn if they might overflow. However, the preferred way in these
+cases is to refactor the code as suggested below to avoid the open-coded
+arithmetic.)
For example, do not use ``count * size`` as an argument, as in::
......
@@ -27,6 +27,7 @@ Below are the essential guides that every developer should read.
submitting-patches
programming-language
coding-style
maintainer-handbooks
maintainer-pgp-guide
email-clients
kernel-enforcement-statement
......
.. SPDX-License-Identifier: GPL-2.0
.. _maintainer_handbooks_main:
Subsystem and maintainer tree specific development process notes
================================================================
The purpose of this document is to provide subsystem specific information
which is supplementary to the general development process handbook
:ref:`Documentation/process <development_process_main>`.
Contents:
.. toctree::
:numbered:
:maxdepth: 2
maintainer-tip
This diff is collapsed.
@@ -185,7 +185,7 @@ Linux USB project:
http://www.linux-usb.org/
How to NOT write kernel driver by Arjan van de Ven:
-http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
+https://landley.net/kdocs/ols/2002/ols2002-pages-545-555.pdf
Kernel Janitor:
https://kernelnewbies.org/KernelJanitors
......
@@ -21,6 +21,10 @@ If you're unfamiliar with ``git``, you would be well-advised to learn how to
use it, it will make your life as a kernel developer and in general much
easier.
Some subsystems and maintainer trees have additional information about
their workflow and expectations, see :ref:`Documentation/process/maintainer
handbooks <maintainer_handbooks_main>`.
Obtain a current source tree Obtain a current source tree
---------------------------- ----------------------------
...@@ -92,17 +96,6 @@ instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy ...@@ -92,17 +96,6 @@ instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change to do frotz", as if you are giving orders to the codebase to change
its behaviour. its behaviour.
If the patch fixes a logged bug entry, refer to that bug entry by
number and URL. If the patch follows from a mailing list discussion,
give a URL to the mailing list archive; use the https://lkml.kernel.org/
redirector with a ``Message-Id``, to ensure that the links cannot become
stale.
However, try to make your explanation understandable without external
resources. In addition to giving a URL to a mailing list archive or
bug, summarize the relevant points of the discussion that led to the
patch as submitted.
If you want to refer to a specific commit, don't just refer to the If you want to refer to a specific commit, don't just refer to the
SHA-1 ID of the commit. Please also include the oneline summary of SHA-1 ID of the commit. Please also include the oneline summary of
the commit, to make it easier for reviewers to know what it is about. the commit, to make it easier for reviewers to know what it is about.
...@@ -119,6 +112,28 @@ collisions with shorter IDs a real possibility. Bear in mind that, even if ...@@ -119,6 +112,28 @@ collisions with shorter IDs a real possibility. Bear in mind that, even if
there is no collision with your six-character ID now, that condition may there is no collision with your six-character ID now, that condition may
change five years from now. change five years from now.
If related discussions or any other background information behind the change
can be found on the web, add 'Link:' tags pointing to it. In case your patch
fixes a bug, for example, add a tag with a URL referencing the report in the
mailing list archives or a bug tracker; if the patch is a result of some
earlier mailing list discussion or something documented on the web, point to
it.
When linking to mailing list archives, preferably use the lore.kernel.org
message archiver service. To create the link URL, use the contents of the
``Message-Id`` header of the message without the surrounding angle brackets.
For example::
Link: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
Please check the link to make sure that it is actually working and points
to the relevant message.
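The link construction described above can be sketched in C. This is only an illustration: the helper name ``lore_link_tag`` is hypothetical and not part of any kernel tooling. It drops the surrounding angle brackets from a ``Message-Id`` value and prepends the lore.kernel.org prefix used in the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a "Link:" tag from a Message-Id header value.
 * Surrounding angle brackets, if present, are dropped.
 * Returns the number of characters written, or -1 if the Message-Id is
 * empty or the output buffer is too small.
 */
static int lore_link_tag(const char *msgid, char *out, size_t outlen)
{
	size_t len;
	int n;

	assert(msgid && out);

	len = strlen(msgid);
	if (len >= 2 && msgid[0] == '<' && msgid[len - 1] == '>') {
		msgid++;	/* skip '<' */
		len -= 2;	/* drop both brackets from the length */
	}
	if (len == 0)
		return -1;

	n = snprintf(out, outlen, "Link: https://lore.kernel.org/r/%.*s/",
		     (int)len, msgid);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return n;
}
```

The generated tag still has to be opened by hand to confirm that it resolves to the intended message.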
However, try to make your explanation understandable without external
resources. In addition to giving a URL to a mailing list archive or bug,
summarize the relevant points of the discussion that led to the
patch as submitted.
If your patch fixes a bug in a specific commit, e.g. you found an issue using If your patch fixes a bug in a specific commit, e.g. you found an issue using
``git bisect``, please use the 'Fixes:' tag with the first 12 characters of ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
the SHA-1 ID, and the one line summary. Do not split the tag across multiple the SHA-1 ID, and the one line summary. Do not split the tag across multiple
...@@ -326,6 +341,7 @@ politely and address the problems they have pointed out. ...@@ -326,6 +341,7 @@ politely and address the problems they have pointed out.
See Documentation/process/email-clients.rst for recommendations on email See Documentation/process/email-clients.rst for recommendations on email
clients and mailing list etiquette. clients and mailing list etiquette.
.. _resend_reminders:
Don't get discouraged - or impatient Don't get discouraged - or impatient
------------------------------------ ------------------------------------
...@@ -711,6 +727,8 @@ patch:: ...@@ -711,6 +727,8 @@ patch::
See more details on the proper patch format in the following See more details on the proper patch format in the following
references. references.
.. _backtraces:
Backtraces in commit messages Backtraces in commit messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...@@ -743,7 +761,7 @@ the bug report. However, for a multi-patch series, it is generally ...@@ -743,7 +761,7 @@ the bug report. However, for a multi-patch series, it is generally
best to avoid using In-Reply-To: to link to older versions of the best to avoid using In-Reply-To: to link to older versions of the
series. This way multiple versions of the patch don't become an series. This way multiple versions of the patch don't become an
unmanageable forest of references in email clients. If a link is unmanageable forest of references in email clients. If a link is
helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in helpful, you can use the https://lore.kernel.org/ redirector (e.g., in
the cover email text) to link to an earlier version of the patch series. the cover email text) to link to an earlier version of the patch series.
......
...@@ -70,6 +70,10 @@ interrupt. After all, the primary purpose of a scheduling-clock interrupt ...@@ -70,6 +70,10 @@ interrupt. After all, the primary purpose of a scheduling-clock interrupt
is to force a busy CPU to shift its attention among multiple duties, is to force a busy CPU to shift its attention among multiple duties,
and an idle CPU has no duties to shift its attention among. and an idle CPU has no duties to shift its attention among.
An idle CPU that is not receiving scheduling-clock interrupts is said to
be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
tickless". The remainder of this document will use "dyntick-idle mode".
The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to idle CPUs, which is critically important scheduling-clock interrupts to idle CPUs, which is critically important
both to battery-powered devices and to highly virtualized mainframes. both to battery-powered devices and to highly virtualized mainframes.
...@@ -91,10 +95,6 @@ Therefore, systems with aggressive real-time response constraints often ...@@ -91,10 +95,6 @@ Therefore, systems with aggressive real-time response constraints often
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels) run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
in order to avoid degrading from-idle transition latencies. in order to avoid degrading from-idle transition latencies.
An idle CPU that is not receiving scheduling-clock interrupts is said to
be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
tickless". The remainder of this document will use "dyntick-idle mode".
There is also a boot parameter "nohz=" that can be used to disable There is also a boot parameter "nohz=" that can be used to disable
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off". dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
......
...@@ -107,7 +107,7 @@ comportamento. ...@@ -107,7 +107,7 @@ comportamento.
Se la patch corregge un baco conosciuto, fare riferimento a quel baco inserendo Se la patch corregge un baco conosciuto, fare riferimento a quel baco inserendo
il suo numero o il suo URL. Se la patch è la conseguenza di una discussione il suo numero o il suo URL. Se la patch è la conseguenza di una discussione
su una lista di discussione, allora fornite l'URL all'archivio di quella su una lista di discussione, allora fornite l'URL all'archivio di quella
discussione; usate i collegamenti a https://lkml.kernel.org/ con il discussione; usate i collegamenti a https://lore.kernel.org/ con il
``Message-Id``, in questo modo vi assicurerete che il collegamento non diventi ``Message-Id``, in questo modo vi assicurerete che il collegamento non diventi
invalido nel tempo. invalido nel tempo.
...@@ -772,7 +772,7 @@ che lo riportava. Tuttavia, per serie di patch multiple è generalmente ...@@ -772,7 +772,7 @@ che lo riportava. Tuttavia, per serie di patch multiple è generalmente
sconsigliato l'uso di In-Reply-To: per collegare precedenti versioni. sconsigliato l'uso di In-Reply-To: per collegare precedenti versioni.
In questo modo versioni multiple di una patch non diventeranno un'ingestibile In questo modo versioni multiple di una patch non diventeranno un'ingestibile
giungla di riferimenti all'interno dei programmi di posta. Se un collegamento giungla di riferimenti all'interno dei programmi di posta. Se un collegamento
è utile, potete usare https://lkml.kernel.org/ per ottenere i collegamenti è utile, potete usare https://lore.kernel.org/ per ottenere i collegamenti
ad una versione precedente di una serie di patch (per esempio, potete usarlo ad una versione precedente di una serie di patch (per esempio, potete usarlo
per l'email introduttiva alla serie). per l'email introduttiva alla serie).
......
NOTE: NOTE:
This is a version of Documentation/memory-barriers.txt translated into Korean. This is a version of Documentation/memory-barriers.txt translated into Korean.
This document is maintained by SeongJae Park <sj38.park@gmail.com>. This document is maintained by SeongJae Park <sj@kernel.org>.
If you find any difference between this document and the original file or If you find any difference between this document and the original file or
a problem with the translation, please contact the maintainer of this file. a problem with the translation, please contact the maintainer of this file.
...@@ -10,13 +10,13 @@ a fork. So if you have any comments or updates for this file please ...@@ -10,13 +10,13 @@ a fork. So if you have any comments or updates for this file please
update the original English file first. The English version is update the original English file first. The English version is
definitive, and readers should look there if they have any doubt. definitive, and readers should look there if they have any doubt.
=================================== =================================
이 문서는 이 문서는
Documentation/memory-barriers.txt Documentation/memory-barriers.txt
의 한글 번역입니다. 의 한글 번역입니다.
역자: 박성재 <sj38.park@gmail.com> 역자: 박성재 <sj@kernel.org>
=================================== =================================
========================= =========================
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/PCI/index.rst
:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
.. _cn_PCI_index.rst:
=======================
Linux PCI Bus Subsystem
=======================
.. toctree::
:maxdepth: 2
:numbered:
pci
Todolist:
pciebus-howto
pci-iov-howto
msi-howto
sysfs-pci
acpi-info
pci-error-recovery
pcieaer-howto
endpoint/index
boot-interrupts
...@@ -67,6 +67,7 @@ Todolist: ...@@ -67,6 +67,7 @@ Todolist:
cpu-load cpu-load
lockup-watchdogs lockup-watchdogs
unicode unicode
sysrq
Todolist: Todolist:
...@@ -118,7 +119,6 @@ Todolist: ...@@ -118,7 +119,6 @@ Todolist:
rtc rtc
serial-console serial-console
svga svga
sysrq
thunderbolt thunderbolt
ufs ufs
vga-softcursor vga-softcursor
......
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/boot-time-mm.rst
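:Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
时奎亮 <alexs@kernel.org>
.. _cn_core-api_boot-time-mm:
===========================
Boot time memory management
===========================
Early in system initialization, "normal" memory management is unavailable
because it has not been set up yet. The kernel still needs to allocate memory
for various data structures, for instance for the physical page allocator.
A specialized allocator called ``memblock`` performs the boot time memory
management. The architecture specific initialization must set it up in
setup_arch() and tear it down in mem_init().
Once the early memory management is available, it offers a variety of
functions and macros for memory allocations. An allocation request may be
directed to the first (and probably the only) node or to a particular node in
a NUMA system. There are API variants that panic when an allocation fails and
those that don't.
Memblock also offers a variety of APIs that control its own behaviour.
Memblock Overview
=================
This API is in the following kernel code:
mm/memblock.c
Functions and structures
========================
Here is the description of memblock data structures, functions and macros.
Some of them are actually internal, but since they are documented it would be
silly to omit them. Besides, reading the comments for the internal functions
can help to understand what really happens under the hood.
This API is in the following kernel code:
include/linux/memblock.h
mm/memblock.c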
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/genalloc.rst
:翻译:
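司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
时奎亮 <alexs@kernel.org>
.. _cn_core-api_genalloc:
The genalloc/genpool subsystem
==============================
There are a number of memory-allocation subsystems in the kernel, each aimed
at a specific need. Sometimes, however, a kernel developer needs to implement
a new allocator for a specific range of special-purpose memory; often that
memory is located on a device somewhere. The author of the driver for that
device can certainly write a little allocator to get the job done, but that is
the way to fill the kernel with dozens of poorly tested allocators. Back in
2005, Jes Sorensen lifted one of those allocators from the sym53c8xx_2 driver
and posted_ it as a generic module for the creation of ad hoc memory
allocators. This code was merged for the 2.6.13 release; it has been modified
considerably since then.
.. _posted: https://lwn.net/Articles/125842/
Code using this allocator should include <linux/genalloc.h>. The action
begins with the creation of a pool using one of:
This API is in the following kernel code:
lib/genalloc.c
A call to gen_pool_create() will create a pool. The granularity of
allocations is set with min_alloc_order; it is a log-base-2 number like those
used by the page allocator, but it refers to bytes rather than pages. So, if
min_alloc_order is passed as 3, then all allocations will be a multiple of
eight bytes. Increasing min_alloc_order decreases the memory required to
track the memory in the pool. The nid parameter specifies which NUMA node
should be used for the allocation of the housekeeping structures; it can be -1
if the caller doesn't care.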
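The effect of min_alloc_order on request sizes can be modelled with a few lines of userspace C. This is a sketch under stated assumptions, not genalloc code: ``pool_granule`` and ``pool_round_up`` are hypothetical names, and the model only covers the rounding arithmetic, not the pool bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest unit a pool with the given order hands out: 1 << order bytes. */
static size_t pool_granule(unsigned int min_alloc_order)
{
	assert(min_alloc_order < sizeof(size_t) * 8);
	return (size_t)1 << min_alloc_order;
}

/*
 * Round a request up to the next multiple of the granule.
 * The granule is a power of two, so masking is sufficient.
 * Overflow for sizes close to SIZE_MAX is not handled in this sketch.
 */
static size_t pool_round_up(size_t size, unsigned int min_alloc_order)
{
	size_t granule = pool_granule(min_alloc_order);

	return (size + granule - 1) & ~(granule - 1);
}
```

With an order of 3, a one-byte request occupies eight bytes of the pool, which is why a larger order wastes more space per allocation while needing less tracking state.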
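The "managed" interface devm_gen_pool_create() ties the pool to a specific
device. Among other things, it will automatically clean up the pool when the
given device is destroyed.
A pool is shut down with:
This API is in the following kernel code:
lib/genalloc.c
It's worth noting that, if there are still allocations outstanding from the
given pool, this function will take the rather extreme step of invoking
BUG(), crashing the entire system. You have been warned.
A freshly created pool has no memory to allocate. It is fairly useless in
that state, so one of the first orders of business is usually to add memory
to the pool. That can be done with one of:
This API is in the following kernel code:
include/linux/genalloc.h
lib/genalloc.c
A call to gen_pool_add() will place the size bytes of memory starting at addr
(in the kernel's virtual address space) into the given pool, once again using
nid as the node ID for ancillary memory allocations. The gen_pool_add_virt()
variant associates an explicit physical address with the memory; this is only
necessary if the pool will be used for DMA allocations.
The functions for allocating memory from the pool (and putting it back) are:
This API is in the following kernel code:
include/linux/genalloc.h
lib/genalloc.c
As one would expect, gen_pool_alloc() will allocate size bytes from the given
pool. The gen_pool_dma_alloc() variant allocates memory for use with DMA
operations, returning the associated physical address in the space pointed to
by dma. This will only work if the memory was added with
gen_pool_add_virt(). Note that this function departs from the usual genpool
pattern of using unsigned long values to represent kernel addresses; it
returns a void * instead.
That all seems relatively simple; indeed, some developers clearly found it to
be too simple. After all, the interface above provides no control over how
the allocation functions choose which specific piece of memory to return. If
that sort of control is needed, the following functions will be of interest:
This API is in the following kernel code:
lib/genalloc.c
Allocations with gen_pool_alloc_algo() specify an algorithm to be used to
choose the memory to be allocated; the default algorithm can be set with
gen_pool_set_algo(). The data value is passed to the algorithm; most ignore
it, but it is occasionally needed. One can, naturally, write a
special-purpose algorithm, but there is a fair set already available:
- gen_pool_first_fit is a simple first-fit allocator; this is the default
algorithm if none other has been specified.
- gen_pool_first_fit_align forces the allocation to have a specific alignment
(passed via data in a genpool_data_align structure).
- gen_pool_first_fit_order_align aligns the allocation to the order of the
size. A 60-byte allocation will thus be 64-byte aligned, for example.
- gen_pool_best_fit, as one would expect, is a simple best-fit allocator.
- gen_pool_fixed_alloc allocates at a specific offset (passed in a
genpool_data_fixed structure via the data parameter) within the pool. If the
indicated memory is not available the allocation fails.
There is a handful of other functions, mostly for purposes like querying the
space available in the pool or iterating through chunks of memory. Most
users, however, should not need much beyond what has been described above.
With luck, wider awareness of this module will help to prevent the writing of
special-purpose memory allocators in the future.
This API is in the following kernel code:
lib/genalloc.c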
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/gfp_mask-from-fs-io.rst
:翻译:
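司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
时奎亮 <alexs@kernel.org>
.. _cn_core-api_gfp_mask-from-fs-io:
=================================
GFP masks used from FS/IO context
=================================
:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>
Introduction
============
Code paths in the filesystem and IO stacks must be careful when allocating
memory to prevent recursion deadlocks caused by direct memory reclaim calling
back into the FS or IO paths and blocking on already held resources (e.g.
locks, most commonly those used for the transaction context).
The traditional way to avoid this deadlock problem is to clear __GFP_FS or
__GFP_IO respectively (note that the latter implies clearing the first as
well) in the gfp mask when calling an allocator. GFP_NOFS and GFP_NOIO can be
used as shortcuts. It turned out, though, that this approach has led to
abuses: the restricted gfp mask gets used "just in case" without deeper
consideration, which causes problems because excessive use of
GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory reclaim
issues.
New API
=======
Since 4.12 there is a generic scope API for both NOFS and NOIO context:
``memalloc_nofs_save`` , ``memalloc_nofs_restore`` and ``memalloc_noio_save`` ,
``memalloc_noio_restore`` respectively, which allow a scope to be marked as a
critical section from a filesystem or I/O point of view. Any allocation from
that scope will inherently drop __GFP_FS or __GFP_IO from the given mask, so
no memory allocation can recurse back into the FS/IO.
This API is in the following kernel code:
include/linux/sched/mm.h
FS/IO code then simply calls the appropriate save function before any
critical section with respect to reclaim is started, e.g. a lock shared with
the reclaim context, or when a transaction context nesting would be possible
via reclaim. The restore function should be called when the critical section
ends. All that is ideally accompanied by an explanation of what the reclaim
context is, for easier maintenance.
Please note that the proper pairing of save/restore functions allows nesting,
so it is safe to call ``memalloc_noio_save`` or ``memalloc_noio_restore``
respectively from an existing NOIO or NOFS scope.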
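Why correctly paired save/restore calls nest safely can be seen in a small userspace model. This is a sketch only: ``struct task_ctx``, ``CTX_NOFS`` and the three functions are hypothetical stand-ins, not the kernel implementation. The key point is that save returns the previous state and restore puts back exactly that state.

```c
#include <assert.h>

#define CTX_NOFS 0x1u	/* hypothetical per-task "no FS recursion" flag */

struct task_ctx {
	unsigned int flags;
};

/* Enter a NOFS scope and return the previous state of the flag. */
static unsigned int scope_nofs_save(struct task_ctx *t)
{
	unsigned int old = t->flags & CTX_NOFS;

	t->flags |= CTX_NOFS;
	return old;
}

/* Leave the scope by restoring exactly what the matching save returned. */
static void scope_nofs_restore(struct task_ctx *t, unsigned int old)
{
	t->flags = (t->flags & ~CTX_NOFS) | old;
}

/* An allocation made inside the scope must not recurse into the FS. */
static int may_enter_fs(const struct task_ctx *t)
{
	return !(t->flags & CTX_NOFS);
}
```

Because an inner restore puts back the "already set" state it saved, the outer scope stays protected until its own restore runs.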
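What about __vmalloc(GFP_NOFS)?
===============================
vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
almost always a bug. The good news is that the NOFS/NOIO semantics can be
achieved by the scope API.
In the ideal world, upper layers should already mark dangerous contexts, so
no special care is required and ``vmalloc`` can be called without any
problems. Sometimes, if the context is not really clear or there are layering
violations, the recommended way around that is to wrap vmalloc in the scope
API with a comment explaining the problem.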
...@@ -39,12 +39,14 @@ ...@@ -39,12 +39,14 @@
:maxdepth: 1 :maxdepth: 1
kobject kobject
Todolist:
kref kref
assoc_array assoc_array
xarray xarray
Todolist:
idr idr
circular-buffers circular-buffers
rbtree rbtree
...@@ -101,19 +103,23 @@ Todolist: ...@@ -101,19 +103,23 @@ Todolist:
如何在内核中分配和使用内存。请注意,在 如何在内核中分配和使用内存。请注意,在
:doc:`/vm/index` 中有更多的内存管理文档。 :doc:`/vm/index` 中有更多的内存管理文档。
Todolist: .. toctree::
:maxdepth: 1
memory-allocation memory-allocation
unaligned-memory-access unaligned-memory-access
mm-api
genalloc
boot-time-mm
gfp_mask-from-fs-io
Todolist:
dma-api dma-api
dma-api-howto dma-api-howto
dma-attributes dma-attributes
dma-isa-lpc dma-isa-lpc
mm-api
genalloc
pin_user_pages pin_user_pages
boot-time-mm
gfp_mask-from-fs-io
内核调试的接口 内核调试的接口
============== ==============
......
.. include:: ../../disclaimer-zh_CN.rst .. include:: ../../disclaimer-zh_CN.rst
:Original: Documentation/core-api/irq/irq-affinity :Original: Documentation/core-api/irq/irq-affinity.rst
:翻译: :翻译:
......
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/kref.rst
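Translator:
司延腾 Yanteng Si <siyanteng@loongson.cn>
Proofreader:
<Proofreaders are invited to sign here (voluntary); I will add it in the next version>
.. _cn_core_api_kref.rst:
===================================================
Adding reference counters (krefs) to kernel objects
===================================================
:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>
A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:
- http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
- http://www.kroah.com/linux/talks/ols_2004_kref_talk/
Introduction
============
krefs allow you to add reference counters to your objects. If you have
objects that are used in multiple places and passed around, and you don't
have refcounts, your code is almost certainly broken. If you want refcounts,
krefs are the way to go.
To use a kref, add one to your data structures like::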
struct my_data
{
.
.
struct kref refcount;
.
.
};
The kref can occur anywhere within the data structure.
Initialization
==============
You must initialize the kref after you allocate it. To do this, call
kref_init as so::
struct my_data *data;
data = kmalloc(sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
kref_init(&data->refcount);
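This sets the refcount in the kref to 1.
Kref rules
==========
Once you have an initialized kref, you must follow the following rules:
1) If you make a non-temporary copy of a pointer, especially if it can be
passed to another thread of execution, you must increment the refcount with
kref_get() before passing it off::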
kref_get(&data->refcount);
If you already have a valid pointer to a kref-ed structure (the refcount
cannot go to zero) you may do this without a lock.
2) When you are done with a pointer, you must call kref_put()::
kref_put(&data->refcount, data_release);
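If this is the last reference to the pointer, the release routine will be
called. If the code never tries to get a valid pointer to a kref-ed structure
without already holding a valid pointer, it is safe to do this without a
lock.
3) If the code attempts to gain a reference to a kref-ed structure without
already holding a valid pointer, it must serialize access so that a
kref_put() cannot occur during the kref_get(), and the structure must remain
valid during the kref_get().
For example, if you allocate some data and then pass it to another thread to
process::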
void data_release(struct kref *ref)
{
struct my_data *data = container_of(ref, struct my_data, refcount);
kfree(data);
}
void more_data_handling(void *cb_data)
{
struct my_data *data = cb_data;
.
. do stuff with data here
.
kref_put(&data->refcount, data_release);
}
int my_data_handler(void)
{
int rv = 0;
struct my_data *data;
struct task_struct *task;
data = kmalloc(sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
kref_init(&data->refcount);
kref_get(&data->refcount);
task = kthread_run(more_data_handling, data, "more_data_handling");
if (task == ERR_PTR(-ENOMEM)) {
rv = -ENOMEM;
kref_put(&data->refcount, data_release);
goto out;
}
.
. do stuff with data here
.
out:
kref_put(&data->refcount, data_release);
return rv;
}
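This way, it doesn't matter in what order the two threads handle the data;
kref_put() takes care of knowing when the data is no longer referenced and
releasing it. The kref_get() does not require a lock, since we already have a
valid pointer that we own a refcount for. The put needs no lock because
nothing tries to get the data without already holding a pointer.
In the above example, kref_put() will be called 2 times in both the success
and error paths. This is necessary because the reference count got
incremented 2 times, by kref_init() and kref_get().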
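The counting in this example can be reproduced in userspace with C11 atomics. The sketch below is a simplified model, not the kernel's kref: ``struct ref``, ``ref_init``, ``ref_get``, ``ref_put`` and the ``released`` counter are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for a reference counter. */
struct ref {
	atomic_uint count;
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);	/* the creator holds the first reference */
}

static void ref_get(struct ref *r)
{
	unsigned int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);	/* caller must already hold a valid reference */
	(void)old;
}

/* Drop one reference; returns 1 if this call released the object. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

struct my_data {
	int payload;
	struct ref refcount;
};

static int released;	/* counts release calls, for demonstration only */

static void data_release(struct ref *r)
{
	/* Open-coded equivalent of container_of(r, struct my_data, refcount). */
	struct my_data *d = (struct my_data *)
		((char *)r - offsetof(struct my_data, refcount));

	free(d);
	released++;
}
```

One init plus one get leaves the count at 2, so the release callback only runs on the second put, regardless of which holder drops its reference first.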
Note that the "before" in rule 1 is very important. You should never do
something like::
task = kthread_run(more_data_handling, data, "more_data_handling");
if (task == ERR_PTR(-ENOMEM)) {
rv = -ENOMEM;
goto out;
} else
/* BAD BAD BAD - get is after the handoff */
kref_get(&data->refcount);
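Don't assume you know what you are doing and use the above construct. First
of all, you may not know what you are doing. Second, you may know what you
are doing (there are some situations where locking is involved where the
above may be legal) but someone else who doesn't know what they are doing may
change the code or copy the code. It's bad style. Don't do it.
There are some situations where you can optimize the gets and puts. For
instance, if you are done with an object and are enqueuing it for something
else or passing it off to something else, there is no reason to do a get
followed by a put::
/* Silly extra get and put */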
kref_get(&obj->ref);
enqueue(obj);
kref_put(&obj->ref, obj_cleanup);
Just do the enqueue. A comment about this is always welcome::
enqueue(obj);
/* We are done with obj, so we pass our refcount off to the queue.
DON'T TOUCH obj AFTER HERE! */
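The last rule (rule 3) is the nastiest one to handle. Say, for instance, you
have a list of items that are each kref-ed, and you wish to get the first
one. You can't just pull the first item off the list and kref_get() it. That
violates rule 3 because you are not already holding a valid pointer. You must
add a mutex (or some other lock). For instance::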
static DEFINE_MUTEX(mutex);
static LIST_HEAD(q);
struct my_data
{
struct kref refcount;
struct list_head link;
};
static struct my_data *get_entry()
{
struct my_data *entry = NULL;
mutex_lock(&mutex);
if (!list_empty(&q)) {
entry = container_of(q.next, struct my_data, link);
kref_get(&entry->refcount);
}
mutex_unlock(&mutex);
return entry;
}
static void release_entry(struct kref *ref)
{
struct my_data *entry = container_of(ref, struct my_data, refcount);
list_del(&entry->link);
kfree(entry);
}
static void put_entry(struct my_data *entry)
{
mutex_lock(&mutex);
kref_put(&entry->refcount, release_entry);
mutex_unlock(&mutex);
}
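The kref_put() return value is useful if you do not want to hold the lock
during the whole release operation. Say you didn't want to call kfree() with
the lock held in the example above (since it is kind of pointless to do so).
You could use kref_put() as follows::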
static void release_entry(struct kref *ref)
{
/* All work is done after the return from kref_put(). */
}
static void put_entry(struct my_data *entry)
{
mutex_lock(&mutex);
if (kref_put(&entry->refcount, release_entry)) {
list_del(&entry->link);
mutex_unlock(&mutex);
kfree(entry);
} else
mutex_unlock(&mutex);
}
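This is really more useful if you have to call other routines as part of the
free operations that could take a long time or might claim the same lock.
Note that doing everything in the release routine is still preferred as it is
a little neater.
The above example could also be optimized using kref_get_unless_zero() in the
following way::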
static struct my_data *get_entry()
{
struct my_data *entry = NULL;
mutex_lock(&mutex);
if (!list_empty(&q)) {
entry = container_of(q.next, struct my_data, link);
if (!kref_get_unless_zero(&entry->refcount))
entry = NULL;
}
mutex_unlock(&mutex);
return entry;
}
static void release_entry(struct kref *ref)
{
struct my_data *entry = container_of(ref, struct my_data, refcount);
mutex_lock(&mutex);
list_del(&entry->link);
mutex_unlock(&mutex);
kfree(entry);
}
static void put_entry(struct my_data *entry)
{
kref_put(&entry->refcount, release_entry);
}
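This is useful for removing the mutex lock around kref_put() in put_entry(),
but it is important that kref_get_unless_zero is enclosed in the same
critical section that finds the entry in the lookup table; otherwise
kref_get_unless_zero may reference already freed memory. Note that it is
illegal to use kref_get_unless_zero without checking its return value. If you
are sure (by already having a valid pointer) that kref_get_unless_zero() will
return true, then use kref_get() instead.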
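The "take a reference only if the count is still non-zero" operation can be sketched in userspace with a compare-and-swap loop. This is an illustration under assumptions, not the kernel implementation: ``ref_get_unless_zero`` is a hypothetical name and the counter is a plain C11 ``atomic_uint``.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *count only if it is currently non-zero.
 * Returns true if a reference was taken, false if the object is
 * already on its way to being released.
 */
static bool ref_get_unless_zero(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}
```

The return value is the whole point of the operation: a caller that ignores it cannot tell whether it now owns a reference.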
Krefs and RCU
=============
The function kref_get_unless_zero also makes it possible to use rcu locking
for lookups in the above example::
struct my_data
{
struct rcu_head rhead;
.
struct kref refcount;
.
.
};
static struct my_data *get_entry_rcu()
{
struct my_data *entry = NULL;
rcu_read_lock();
if (!list_empty(&q)) {
entry = container_of(q.next, struct my_data, link);
if (!kref_get_unless_zero(&entry->refcount))
entry = NULL;
}
rcu_read_unlock();
return entry;
}
static void release_entry_rcu(struct kref *ref)
{
struct my_data *entry = container_of(ref, struct my_data, refcount);
mutex_lock(&mutex);
list_del_rcu(&entry->link);
mutex_unlock(&mutex);
kfree_rcu(entry, rhead);
}
static void put_entry(struct my_data *entry)
{
kref_put(&entry->refcount, release_entry_rcu);
}
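But note that the struct kref member needs to remain in valid memory for an
rcu grace period after release_entry_rcu was called. That can be accomplished
by using kfree_rcu(entry, rhead) as done above, or by calling
synchronize_rcu() before using kfree, but note that synchronize_rcu() may
sleep for a substantial amount of time.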
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/memory-allocation.rst
:翻译:
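司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
时奎亮 <alexs@kernel.org>
.. _cn_core-api_memory-allocation:
=======================
Memory Allocation Guide
=======================
Linux provides a variety of APIs for memory allocation. You can allocate
small chunks using the `kmalloc` or `kmem_cache_alloc` families, large
virtually contiguous areas using `vmalloc` and its derivatives, or you can
directly request pages from the page allocator with alloc_pages. It is also
possible to use more specialized allocators, for instance `cma_alloc` or
`zs_malloc` .
Most of the memory allocation APIs use GFP flags to express how that memory
should be allocated. The GFP acronym stands for "get free pages", the
underlying memory allocation function.
The diversity of the allocation APIs combined with the numerous GFP flags
makes the question "How should I allocate memory?" not that easy to answer,
although very likely you should use
::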
kzalloc(<size>, GFP_KERNEL);
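Of course there are cases when other allocation APIs and different GFP flags
must be used.
Get Free Page flags
===================
The GFP flags control the allocator's behavior. They tell which memory zones
can be used, how hard the allocator should try to find free memory, whether
the memory can be accessed by userspace, and so on. The memory management API
documentation provides reference documentation for the GFP flags and their
combinations; here we briefly outline their recommended usage:
* Most of the time ``GFP_KERNEL`` is what you need. Memory for kernel data
structures, DMAable memory, inode cache, all these and many other allocation
types can use ``GFP_KERNEL`` . Note that using ``GFP_KERNEL`` implies
``GFP_RECLAIM`` , which means that direct reclaim may be triggered under
memory pressure; the calling context must be allowed to sleep.
* If the allocation is performed from an atomic context, e.g. an interrupt
handler, use ``GFP_NOWAIT`` . This flag prevents direct reclaim and IO or
filesystem operations. Consequently, under memory pressure a ``GFP_NOWAIT``
allocation is likely to fail. Allocations which have a reasonable fallback
should be using ``GFP_NOWARN`` .
* If you think that accessing memory reserves is justified and the kernel
will be stressed unless the allocation succeeds, you may use ``GFP_ATOMIC`` .
* Untrusted allocations triggered from userspace should be subject to kmem
accounting and must have the ``__GFP_ACCOUNT`` bit set. There is the handy
``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL`` allocations that should be
accounted.
* Userspace allocations should use one of the ``GFP_USER`` , ``GFP_HIGHUSER``
or ``GFP_HIGHUSER_MOVABLE`` flags. The longer the flag name, the less
restrictive it is.
``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory will be
directly accessible by the kernel and implies that the data is movable.
``GFP_HIGHUSER`` means that the allocated memory is not movable, but it is
not required to be directly accessible by the kernel. An example may be a
hardware allocation that maps data directly into userspace but has no
addressing limitations.
``GFP_USER`` means that the allocated memory is not movable and it must be
directly accessible by the kernel.
You may notice that quite a few allocations in the existing code specify
``GFP_NOIO`` or ``GFP_NOFS`` . Historically, they were used to prevent
recursion deadlocks caused by direct memory reclaim calling back into the FS
or IO paths and blocking on already held resources. Since 4.12 the preferred
way to address this issue is to use the new scope APIs described in
:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.
Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32`` . They are used to
ensure that the allocated memory is accessible by hardware with limited
addressing capabilities. So unless you are writing a driver for a device with
such restrictions, avoid using these flags. And even with hardware with
restrictions it is preferable to use the dma_alloc* APIs.
GFP flags and reclaim behavior
------------------------------
Memory allocations may trigger direct or background reclaim and it is useful
to understand how hard the page allocator will try to satisfy that or another
request.
* ``GFP_KERNEL & ~__GFP_RECLAIM`` - optimistic allocation without any attempt
to free memory at all. The most lightweight mode, which doesn't even kick the
background reclaim. Should be used carefully because it might deplete the
memory and the next user might hit the more aggressive reclaim.
* ``GFP_KERNEL & ~__GFP_DIRECT_RECLAIM`` (or ``GFP_NOWAIT`` ) - optimistic
allocation without any attempt to free memory from the current context, but
it can wake kswapd to reclaim memory if the zone is below the low watermark.
Can be used from atomic contexts or when the request is a performance
optimization and there is another fallback for a slow path.
* ``(GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM`` (aka ``GFP_ATOMIC`` ) -
non-sleeping allocation with an expensive fallback, so it can access some
portion of the memory reserves. Usually used from interrupt/bottom-half
context with an expensive slow path fallback.
* ``GFP_KERNEL`` - both background and direct reclaim are allowed and the
default page allocator behavior is used. That means that not costly
allocation requests are basically no-fail, but there is no guarantee of that
behavior, so failures have to be checked properly by callers (e.g. the OOM
killer victim is currently allowed to fail).
* ``GFP_KERNEL | __GFP_NORETRY`` - overrides the default allocator behavior
and all allocation requests fail early rather than cause disruptive reclaim
(one round of reclaim in this implementation). The OOM killer is not invoked.
* ``GFP_KERNEL | __GFP_RETRY_MAYFAIL`` - overrides the **default** allocator
behavior and all allocation requests try really hard. The request will fail
if the reclaim cannot make any progress. The OOM killer won't be triggered.
* ``GFP_KERNEL | __GFP_NOFAIL`` - overrides the default allocator behavior
and all allocation requests will loop endlessly until they succeed. This
might be really dangerous, especially for larger orders.
Selecting memory allocator
==========================
The most straightforward way to allocate memory is to use a function from the
kmalloc() family. And, to be on the safe side, it's best to use routines that
set memory to zero, like kzalloc(). If you need to allocate memory for an
array, there are the kmalloc_array() and kcalloc() helpers. The helpers
struct_size(), array_size() and array3_size() can be used to safely calculate
object sizes without overflowing.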
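What "safely calculate object sizes without overflowing" means can be shown with a small userspace sketch. The function name ``checked_array_size`` is hypothetical; the choice to saturate at ``SIZE_MAX`` on overflow is an assumption of this sketch, made so that a later allocation with the result fails instead of silently getting a too-small buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply an element count by an element size.
 * On overflow, return SIZE_MAX instead of a wrapped-around value, so
 * that an allocator given this size refuses the request.
 */
static size_t checked_array_size(size_t n, size_t elem_size)
{
	if (elem_size != 0 && n > SIZE_MAX / elem_size)
		return SIZE_MAX;
	return n * elem_size;
}
```

An open-coded ``n * elem_size`` would wrap around for large ``n`` and could yield a small, successful allocation that is then overrun.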
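The maximal size of a chunk that can be allocated with `kmalloc` is limited.
The actual limit depends on the hardware and the kernel configuration, but it
is a good practice to use `kmalloc` for objects smaller than page size.
The address of a chunk allocated with `kmalloc` is aligned to at least
ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.
Chunks allocated with kmalloc() can be resized with krealloc(). Similarly to
kmalloc_array(), a helper for resizing arrays is provided in the form of
krealloc_array().
For large allocations you can use vmalloc() and vzalloc(), or directly
request pages from the page allocator. The memory allocated by vmalloc and
related functions is not physically contiguous.
If you are not sure whether the allocation size is too large for `kmalloc` ,
it is possible to use kvmalloc() and its derivatives. It will try to allocate
memory with kmalloc and, if the allocation fails, it will be retried with
`vmalloc` . There are restrictions on which GFP flags can be used with
`kvmalloc` ; please see the kvmalloc_node() reference documentation. Note
that `kvmalloc` may return memory that is not physically contiguous.
If you need to allocate many identical objects you can use the slab cache
allocator. The cache should be set up with kmem_cache_create() or
kmem_cache_create_usercopy() before it can be used. The second function
should be used if a part of the cache might be copied to userspace. After the
cache is created, kmem_cache_alloc() and its convenience wrappers can
allocate memory from that cache.
When the allocated memory is no longer needed it must be freed. You can use
kvfree() for memory allocated with `kmalloc` , `vmalloc` and `kvmalloc` .
Objects from slab caches should be freed with kmem_cache_free(). And don't
forget to destroy the cache with kmem_cache_destroy().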
.. include:: ../disclaimer-zh_CN.rst .. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/memory_hotplug.rst :Original: Documentation/core-api/memory-hotplug.rst
:翻译: :翻译:
......
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/mm-api.rst
:翻译:
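司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
时奎亮<alexs@kernel.org>
.. _cn_core-api_mm-api:
======================
Memory Management APIs
======================
API (Application Programming Interface)
User Space Memory Access
========================
This API is in the following kernel code:
arch/x86/include/asm/uaccess.h
arch/x86/lib/usercopy_32.c
mm/gup.c
.. _cn_mm-api-gfp-flags:
Memory Allocation Controls
==========================
This API is in the following kernel code:
include/linux/gfp.h
The Slab Cache
==============
This cache is not the on-chip CPU cache; readers are encouraged to look up
the details themselves.
This API is in the following kernel code:
include/linux/slab.h
mm/slab.c
mm/slab_common.c
mm/util.c
Virtually Contiguous Mappings
=============================
This API is in the following kernel code:
mm/vmalloc.c
File Mapping and Page Cache
===========================
This API is in the following kernel code:
mm/readahead.c
mm/filemap.c
mm/page-writeback.c
mm/truncate.c
include/linux/pagemap.h
Memory pools
============
This API is in the following kernel code:
mm/mempool.c
DMA pools
=========
DMA (Direct Memory Access)
This API is in the following kernel code:
mm/dmapool.c
More Memory Management Functions
================================
This API is in the following kernel code:
mm/memory.c
mm/page_alloc.c
mm/mempolicy.c
include/linux/mm_types.h
include/linux/mm.h
include/linux/mmzone.h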
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/core-api/unaligned-memory-access.rst
:翻译:
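司延腾 Yanteng Si <siyanteng@loongson.cn>
:Proofreader:
时奎亮 <alexs@kernel.org>
.. _cn_core-api_unaligned-memory-access:
=========================
Unaligned Memory Accesses
=========================
:Author: Daniel Drake <dsd@gentoo.org>,
:Author: Johannes Berg <johannes@sipsolutions.net>
:With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
Vadim Lobanov
Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them, and
how to write such code.
The definition of an unaligned access
=====================================
Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0). For
example, reading 4 bytes of data from address 0x10004 is fine, but reading 4
bytes of data from address 0x10005 would be an unaligned memory access.
The above may seem a little vague, as memory access can happen in different
ways. The context here is at the machine code level: certain instructions
read or write a number of bytes to or from memory (e.g. movb, movw, movl in
x86 assembly). As will become clear, it is relatively easy to spot C
statements which will compile to multiple-byte memory access instructions,
namely when dealing with types such as u16, u32 and u64.
Natural alignment
=================
The rule mentioned above forms what we refer to as natural alignment: when
accessing N bytes of memory, the base memory address must be evenly divisible
by N, i.e. addr % N == 0.
When writing code, assume the target architecture has natural alignment
requirements.
In reality, only a few architectures require natural alignment on all sizes
of memory access. However, we must consider ALL supported architectures;
writing code that satisfies natural alignment requirements is the easiest way
to achieve full portability.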
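The natural alignment rule is a one-line predicate. The sketch below is plain userspace C; the function name ``is_naturally_aligned`` is hypothetical and exists only to make the ``addr % N == 0`` rule concrete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an n-byte access starting at addr is naturally aligned. */
static bool is_naturally_aligned(uintptr_t addr, size_t n)
{
	assert(n != 0);
	return addr % n == 0;
}
```

Applied to the addresses used in this document, a 4-byte access at 0x10004 passes the check and one at 0x10005 does not, while a 1-byte access passes at any address.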
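Why unaligned access is bad
===========================
The effects of performing an unaligned memory access vary from architecture
to architecture. It would be easy to write a whole document on the
differences here; a summary of the common scenarios is presented below:
- Some architectures are able to perform unaligned memory accesses
transparently, but there is usually a significant performance cost.
- Some architectures raise processor exceptions when unaligned accesses
happen. The exception handler is able to correct the unaligned access, at
significant cost to performance.
- Some architectures raise processor exceptions when unaligned accesses
happen, but the exceptions do not contain enough information for the
unaligned access to be corrected.
- Some architectures are not capable of unaligned memory access, but will
silently perform a different memory access to the one that was requested,
resulting in a subtle code bug that is hard to detect!
It should be obvious from the above that if your code causes unaligned memory
accesses to happen, your code will not work correctly on certain platforms
and will cause performance problems on others.
Code that does not cause unaligned access
=========================================
At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over the
memory addresses of certain variables, etc.
Fortunately things are not too complex, as in most cases the compiler ensures
that things will work for you. For example, take the following structure::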
struct foo {
u16 field1;
u32 field2;
u8 field3;
};
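Let us assume that an instance of the above structure resides in memory
starting at address 0x10000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10002, but that address is not evenly divisible by
4 (remember, we're reading a 4 byte value here).
Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding in between field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).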
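The padding the compiler inserts can be observed from userspace with ``offsetof``. This sketch uses the fixed-width types from ``<stdint.h>`` in place of the kernel's u16/u32/u8; the helper name ``foo_gap_before_field2`` is hypothetical, and the exact number of padding bytes depends on the ABI (2 on common platforms, but that is not guaranteed by the C standard).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo {
	uint16_t field1;
	uint32_t field2;
	uint8_t field3;
};

/* Bytes of padding the compiler placed between field1 and field2. */
static size_t foo_gap_before_field2(void)
{
	return offsetof(struct foo, field2) - sizeof(uint16_t);
}
```

Whatever the gap turns out to be on a given target, the offset of field2 is a multiple of the alignment of its type, which is what makes the access aligned.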
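Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.
At this point, it should be clear that accessing a single byte (u8 or char)
will never cause an unaligned access, because all memory addresses are evenly
divisible by one.
On a related topic, with the above considerations in mind you may observe
that you could reorder the fields in the structure in order to place fields
where padding would otherwise be inserted, and hence reduce the overall
resident memory size of structure instances. The optimal layout of the above
example is::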
struct foo {
u32 field2;
u16 field1;
u8 field3;
};
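For a natural alignment scheme, the compiler would only have to add a single
byte of padding at the end of the structure. This padding is added in order
to satisfy alignment constraints for arrays of these structures.
Another point worth mentioning is the use of __attribute__((packed)) on a
structure type. This GCC-specific attribute tells the compiler never to
insert any padding within structures, which is useful when you want to use a
C struct to represent some data that comes in a fixed arrangement "off the
wire".
You might be inclined to believe that usage of this attribute can easily lead
to unaligned accesses when accessing fields that do not satisfy architectural
alignment requirements. However, the compiler is also aware of the alignment
constraints and will generate extra instructions to perform the memory access
in a way that does not cause unaligned access. Of course, the extra
instructions obviously cause a loss in performance compared to the non-packed
case, so the packed attribute should only be used when avoiding structure
padding is of importance.
Code that causes unaligned access
=================================
With the above in mind, let's move on to a real-life example of a function
that can cause an unaligned memory access. The following function, taken from
include/linux/etherdevice.h, is an optimized routine to compare two ethernet
MAC addresses for equality::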
bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
{
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
return fold == 0;
#else
const u16 *a = (const u16 *)addr1;
const u16 *b = (const u16 *)addr2;
return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
#endif
}
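In the above function, when the hardware has efficient unaligned access
capability, there is no issue with this code. But when the hardware isn't
able to access memory on arbitrary boundaries, the reference to a[0] causes 2
bytes (16 bits) to be read from memory starting at address addr1.
Think about what would happen if addr1 was an odd address such as 0x10003.
(Hint: it'd be an unaligned access.)
Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway, but is understood to only work normally on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful,
as it is a decent optimization for the cases when you can ensure alignment,
which is true almost all of the time in an ethernet networking context.
Here is another example of some code that could cause unaligned accesses::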
void myfunc(u8 *data, u32 value)
{
[...]
*((u32 *) data) = cpu_to_le32(value);
[...]
}
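This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.
In summary, the 2 main scenarios where you may run into unaligned access
problems involve:
1. Casting variables to types of different lengths
2. Pointer arithmetic followed by access to at least 2 bytes of data
Avoiding unaligned accesses
===========================
The easiest way to avoid unaligned access is to use the get_unaligned() and
put_unaligned() macros provided by the <asm/unaligned.h> header file.
Going back to an earlier example of code that potentially causes unaligned
access::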
void myfunc(u8 *data, u32 value)
{
[...]
*((u32 *) data) = cpu_to_le32(value);
[...]
}
To avoid the unaligned memory access, you would rewrite it as follows::
void myfunc(u8 *data, u32 value)
{
[...]
value = cpu_to_le32(value);
put_unaligned(value, (u32 *) data);
[...]
}
The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
memory and you wish to avoid unaligned access, its usage is as follows::
u32 value = get_unaligned((u32 *) data);
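These macros work for memory accesses of any length (not just 32 bits as in
the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly
in terms of performance.
If use of such macros is not convenient, another option is to use memcpy(),
where the source or destination (or both) are of type u8* or unsigned char*.
Due to the byte-wise nature of this operation, unaligned accesses are
avoided.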
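The memcpy() approach can be sketched in portable userspace C. The helper names ``load_u32`` and ``store_u32`` are hypothetical; the values are handled in host byte order, so any endianness conversion is left out of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned byte pointer. */
static uint32_t load_u32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));	/* byte-wise copy, no aligned load needed */
	return v;
}

/* Write a 32-bit value to a possibly unaligned byte pointer. */
static void store_u32(unsigned char *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}
```

Storing at ``buf + 1`` and loading from the same place returns the original value without ever dereferencing a misaligned ``uint32_t`` pointer.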
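Alignment vs. Networking
========================
On architectures that require aligned loads, networking requires that the IP
header is aligned on a four-byte boundary to optimise the IP stack. For
regular ethernet hardware, the constant NET_IP_ALIGN is used. On most
architectures this constant has the value 2 because the normal ethernet
header is 14 bytes long, so in order to get proper alignment one needs to DMA
to an address which can be expressed as 4*n + 2. One notable exception here
is powerpc, which defines NET_IP_ALIGN to 0 because DMA to unaligned
addresses can be very expensive and dwarf the cost of unaligned loads.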
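The 4*n + 2 arithmetic can be checked with a short userspace sketch. ``ETH_HEADER_LEN``, ``ip_header_addr`` and ``ip_header_aligned`` are hypothetical names for this illustration; the buffer address is assumed to be 4-byte aligned and ``ip_align`` plays the role of NET_IP_ALIGN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14u	/* length of a normal ethernet header */

/* Address of the IP header when the frame starts ip_align bytes into buf. */
static uintptr_t ip_header_addr(uintptr_t buf, unsigned int ip_align)
{
	return buf + ip_align + ETH_HEADER_LEN;
}

/* True if the IP header lands on a four-byte boundary. */
static bool ip_header_aligned(uintptr_t buf, unsigned int ip_align)
{
	return ip_header_addr(buf, ip_align) % 4 == 0;
}
```

With a 2-byte offset the 14-byte ethernet header ends on a multiple of 16 bytes from the buffer start; with no offset the IP header is misaligned by 2.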
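For some ethernet hardware that cannot DMA to unaligned addresses like 4*n+2,
or for non-ethernet hardware, this can be a problem, and it is then required
to copy the incoming frame into an aligned buffer. Because this is
unnecessary on architectures that can do unaligned accesses, the code can be
made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so::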
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
skb = original skb
#else
skb = copy skb
#endif
...@@ -21,7 +21,7 @@ Harding <me@tobin.cc>。 ...@@ -21,7 +21,7 @@ Harding <me@tobin.cc>。
原始邮件线程:: 原始邮件线程::
http://lkml.kernel.org/r/20171114110500.GA21175@kroah.com https://lore.kernel.org/r/20171114110500.GA21175@kroah.com
创建分支 创建分支
......
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
:ref:`Documentation/translations/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>` :ref:`Documentation/translations/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>`
和 :ref:`Documentation/translations/zh_CN/process/submit-checklist.rst <cn_submitchecklist>`。 和 :ref:`Documentation/translations/zh_CN/process/submit-checklist.rst <cn_submitchecklist>`。
何时邮寄 何时寄送
-------- --------
在补丁完全“准备好”之前,避免发布补丁是一种持续的诱惑。对于简单的补丁,这 在补丁完全“准备好”之前,避免发布补丁是一种持续的诱惑。对于简单的补丁,这
...@@ -142,7 +142,7 @@ ...@@ -142,7 +142,7 @@
一般来说,你越把自己放在每个阅读你变更日志的人的位置上,变更日志(和内核 一般来说,你越把自己放在每个阅读你变更日志的人的位置上,变更日志(和内核
作为一个整体)就越好。 作为一个整体)就越好。
说,变更日志是将变更提交到版本控制系统时使用的文本。接下来将是: 需要说,变更日志是将变更提交到版本控制系统时使用的文本。接下来将是:
- 补丁本身,采用统一的(“-u”)补丁格式。使用“-p”选项来diff将使函数名与 - 补丁本身,采用统一的(“-u”)补丁格式。使用“-p”选项来diff将使函数名与
更改相关联,从而使结果补丁更容易被其他人读取。 更改相关联,从而使结果补丁更容易被其他人读取。
...@@ -186,10 +186,10 @@ ...@@ -186,10 +186,10 @@
在补丁中添加标签时要小心:只有Cc:才适合在没有指定人员明确许可的情况下添加。 在补丁中添加标签时要小心:只有Cc:才适合在没有指定人员明确许可的情况下添加。
送补丁 送补丁
-------- --------
在寄补丁之前,您还需要注意以下几点: 在寄补丁之前,您还需要注意以下几点:
- 您确定您的邮件发送程序不会损坏补丁吗?被邮件客户端更改空白或修饰了行的补丁 - 您确定您的邮件发送程序不会损坏补丁吗?被邮件客户端更改空白或修饰了行的补丁
无法被另一端接受,并且通常不会进行任何详细检查。如果有任何疑问,先把补丁寄 无法被另一端接受,并且通常不会进行任何详细检查。如果有任何疑问,先把补丁寄
......
...@@ -381,7 +381,7 @@ MAINTAINERS文件中可以找到不同话题对应的邮件列表。 ...@@ -381,7 +381,7 @@ MAINTAINERS文件中可以找到不同话题对应的邮件列表。
内核社区的工作模式同大多数传统公司开发队伍的工作模式并不相同。下面这些例 内核社区的工作模式同大多数传统公司开发队伍的工作模式并不相同。下面这些例
子,可以帮助你避免某些可能发生问题: 子,可以帮助你避免某些可能发生问题:
用这些话介绍你的修改提案会有好处: 用这些话介绍你的修改提案会有好处:(在任何时候你都不应该用中文写提案)
- 它同时解决了多个问题 - 它同时解决了多个问题
- 它删除了2000行代码 - 它删除了2000行代码
...@@ -448,8 +448,8 @@ Linux内核社区并不喜欢一下接收大段的代码。修改需要被恰当 ...@@ -448,8 +448,8 @@ Linux内核社区并不喜欢一下接收大段的代码。修改需要被恰当
保证修改分成很多小块,这样在整个项目都准备好被包含进内核之前,其中的一部 保证修改分成很多小块,这样在整个项目都准备好被包含进内核之前,其中的一部
分可能会先被接收。 分可能会先被接收。
必须了解这样做是不可接受的:试图将未完成的工作提交进内核,然后再找时间修 你必须明白这么做是无法令人接受的:试图将不完整的代码提交进内核,然后再找
复。 时间修复。
证明修改的必要性 证明修改的必要性
...@@ -475,8 +475,8 @@ Linux内核社区并不喜欢一下接收大段的代码。修改需要被恰当 ...@@ -475,8 +475,8 @@ Linux内核社区并不喜欢一下接收大段的代码。修改需要被恰当
https://www.ozlabs.org/~akpm/stuff/tpp.txt https://www.ozlabs.org/~akpm/stuff/tpp.txt
这些事情有时候做起来很难。要在任何方面都做到完美可能需要好几年时间。这是 这些事情有时候做起来很难。想要在任何方面都做到完美可能需要好几年时间。这
一个持续提高的过程,它需要大量的耐心和决心。只要不放弃,你一定可以做到。 一个持续提高的过程,它需要大量的耐心和决心。只要不放弃,你一定可以做到。
很多人已经做到了,而他们都曾经和现在的你站在同样的起点上。 很多人已经做到了,而他们都曾经和现在的你站在同样的起点上。
......
...@@ -127,13 +127,13 @@ ...@@ -127,13 +127,13 @@
URL来查找补丁描述并将其放入补丁中。也就是说,补丁(系列)及其描述应该是独立的。 URL来查找补丁描述并将其放入补丁中。也就是说,补丁(系列)及其描述应该是独立的。
这对维护人员和审查人员都有好处。一些评审者可能甚至没有收到补丁的早期版本。 这对维护人员和审查人员都有好处。一些评审者可能甚至没有收到补丁的早期版本。
描述你在命令语气中的变化,例如“make xyzzy do frotz”而不是“[这个补丁]make 描述你在命令语气中的变化,例如“make xyzzy do frotz”而不是“[This patch]make
xyzzy do frotz”或“[]changed xyzzy to do frotz”,就好像你在命令代码库改变 xyzzy do frotz”或“[I]changed xyzzy to do frotz”,就好像你在命令代码库改变
它的行为一样。 它的行为一样。
如果修补程序修复了一个记录的bug条目,请按编号和URL引用该bug条目。如果补丁来 如果修补程序修复了一个记录的bug条目,请按编号和URL引用该bug条目。如果补丁来
自邮件列表讨论,请给出邮件列表存档的URL;使用带有 ``Message-ID`` 的 自邮件列表讨论,请给出邮件列表存档的URL;使用带有 ``Message-ID`` 的
https://lkml.kernel.org/ 重定向,以确保链接不会过时。 https://lore.kernel.org/ 重定向,以确保链接不会过时。
但是,在没有外部资源的情况下,尽量让你的解释可理解。除了提供邮件列表存档或 但是,在没有外部资源的情况下,尽量让你的解释可理解。除了提供邮件列表存档或
bug的URL之外,还要总结需要提交补丁的相关讨论要点。 bug的URL之外,还要总结需要提交补丁的相关讨论要点。
...@@ -599,7 +599,7 @@ e-mail 标题中的“一句话概述”扼要的描述 e-mail 中的补丁。 ...@@ -599,7 +599,7 @@ e-mail 标题中的“一句话概述”扼要的描述 e-mail 中的补丁。
将补丁与以前的相关讨论关联起来,例如,将bug修复程序链接到电子邮件和bug报告。 将补丁与以前的相关讨论关联起来,例如,将bug修复程序链接到电子邮件和bug报告。
但是,对于多补丁系列,最好避免在回复时使用链接到该系列的旧版本。这样, 但是,对于多补丁系列,最好避免在回复时使用链接到该系列的旧版本。这样,
补丁的多个版本就不会成为电子邮件客户端中无法管理的引用序列。如果链接有用, 补丁的多个版本就不会成为电子邮件客户端中无法管理的引用序列。如果链接有用,
可以使用 https://lkml.kernel.org/ 重定向器(例如,在封面电子邮件文本中) 可以使用 https://lore.kernel.org/ 重定向器(例如,在封面电子邮件文本中)
链接到补丁系列的早期版本。 链接到补丁系列的早期版本。
16) 发送git pull请求 16) 发送git pull请求
......
...@@ -140,11 +140,6 @@ TODOList: ...@@ -140,11 +140,6 @@ TODOList:
體系結構無關文檔 體系結構無關文檔
---------------- ----------------
.. toctree::
:maxdepth: 2
arm64/index
TODOList: TODOList:
* asm-annotations * asm-annotations
...@@ -152,6 +147,11 @@ TODOList: ...@@ -152,6 +147,11 @@ TODOList:
特定體系結構文檔 特定體系結構文檔
---------------- ----------------
.. toctree::
:maxdepth: 2
arm64/index
TODOList: TODOList:
* arch * arch
......
...@@ -136,7 +136,7 @@ xyzzy do frotz」或「[我]changed xyzzy to do frotz」,就好像你在命令 ...@@ -136,7 +136,7 @@ xyzzy do frotz」或「[我]changed xyzzy to do frotz」,就好像你在命令
如果修補程序修復了一個記錄的bug條目,請按編號和URL引用該bug條目。如果補丁來 如果修補程序修復了一個記錄的bug條目,請按編號和URL引用該bug條目。如果補丁來
自郵件列表討論,請給出郵件列表存檔的URL;使用帶有 ``Message-ID`` 的 自郵件列表討論,請給出郵件列表存檔的URL;使用帶有 ``Message-ID`` 的
https://lkml.kernel.org/ 重定向,以確保連結不會過時。 https://lore.kernel.org/ 重定向,以確保連結不會過時。
但是,在沒有外部資源的情況下,儘量讓你的解釋可理解。除了提供郵件列表存檔或 但是,在沒有外部資源的情況下,儘量讓你的解釋可理解。除了提供郵件列表存檔或
bug的URL之外,還要總結需要提交補丁的相關討論要點。 bug的URL之外,還要總結需要提交補丁的相關討論要點。
...@@ -602,7 +602,7 @@ e-mail 標題中的「一句話概述」扼要的描述 e-mail 中的補丁。 ...@@ -602,7 +602,7 @@ e-mail 標題中的「一句話概述」扼要的描述 e-mail 中的補丁。
將補丁與以前的相關討論關聯起來,例如,將bug修復程序連結到電子郵件和bug報告。 將補丁與以前的相關討論關聯起來,例如,將bug修復程序連結到電子郵件和bug報告。
但是,對於多補丁系列,最好避免在回復時使用連結到該系列的舊版本。這樣, 但是,對於多補丁系列,最好避免在回復時使用連結到該系列的舊版本。這樣,
補丁的多個版本就不會成爲電子郵件客戶端中無法管理的引用序列。如果連結有用, 補丁的多個版本就不會成爲電子郵件客戶端中無法管理的引用序列。如果連結有用,
可以使用 https://lkml.kernel.org/ 重定向器(例如,在封面電子郵件文本中) 可以使用 https://lore.kernel.org/ 重定向器(例如,在封面電子郵件文本中)
連結到補丁系列的早期版本。 連結到補丁系列的早期版本。
16) 發送git pull請求 16) 發送git pull請求
......
...@@ -205,7 +205,7 @@ which are function pointers of struct address_space_operations. ...@@ -205,7 +205,7 @@ which are function pointers of struct address_space_operations.
In this function, the driver should put the isolated page back into its own data In this function, the driver should put the isolated page back into its own data
structure. structure.
4. non-LRU movable page flags Non-LRU movable page flags
There are two page flags for supporting non-LRU movable page. There are two page flags for supporting non-LRU movable page.
......
...@@ -8,7 +8,7 @@ This file documents some of the kernel entries in ...@@ -8,7 +8,7 @@ This file documents some of the kernel entries in
arch/x86/entry/entry_64.S. A lot of this explanation is adapted from arch/x86/entry/entry_64.S. A lot of this explanation is adapted from
an email from Ingo Molnar: an email from Ingo Molnar:
http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu> https://lore.kernel.org/r/20110529191055.GC9835%40elte.hu
The x86 architecture has quite a few different ways to jump into The x86 architecture has quite a few different ways to jump into
kernel code. Most of these entry points are registered in kernel code. Most of these entry points are registered in
......
...@@ -177,6 +177,6 @@ brutal, unyielding efficiency. ...@@ -177,6 +177,6 @@ brutal, unyielding efficiency.
ORC stands for Oops Rewind Capability. ORC stands for Oops Rewind Capability.
.. [1] https://lkml.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de .. [1] https://lore.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de
.. [2] https://lkml.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz .. [2] https://lore.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz
.. [3] http://dustin.wikidot.com/half-orcs-and-orcs .. [3] http://dustin.wikidot.com/half-orcs-and-orcs
...@@ -94,6 +94,9 @@ while (<IN>) { ...@@ -94,6 +94,9 @@ while (<IN>) {
# Makefiles and scripts contain nasty expressions to parse docs # Makefiles and scripts contain nasty expressions to parse docs
next if ($f =~ m/Makefile/ || $f =~ m/\.sh$/); next if ($f =~ m/Makefile/ || $f =~ m/\.sh$/);
# It doesn't make sense to parse hidden files
next if ($f =~ m#/\.#);
# Skip this script # Skip this script
next if ($f eq $scriptname); next if ($f eq $scriptname);
...@@ -144,6 +147,7 @@ while (<IN>) { ...@@ -144,6 +147,7 @@ while (<IN>) {
if ($f =~ m/tools/) { if ($f =~ m/tools/) {
my $path = $f; my $path = $f;
$path =~ s,(.*)/.*,$1,; $path =~ s,(.*)/.*,$1,;
$path =~ s,testing/selftests/bpf,bpf/bpftool,;
next if (grep -e, glob("$path/$ref $path/../$ref $path/$fulref")); next if (grep -e, glob("$path/$ref $path/../$ref $path/$fulref"));
} }
......
...@@ -1256,6 +1256,7 @@ sub dump_struct($$) { ...@@ -1256,6 +1256,7 @@ sub dump_struct($$) {
my $args = qr{([^,)]+)}; my $args = qr{([^,)]+)};
# replace DECLARE_BITMAP # replace DECLARE_BITMAP
$members =~ s/__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)/DECLARE_BITMAP($1, __ETHTOOL_LINK_MODE_MASK_NBITS)/gos; $members =~ s/__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)/DECLARE_BITMAP($1, __ETHTOOL_LINK_MODE_MASK_NBITS)/gos;
$members =~ s/DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)/DECLARE_BITMAP($1, PHY_INTERFACE_MODE_MAX)/gos;
$members =~ s/DECLARE_BITMAP\s*\($args,\s*$args\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos; $members =~ s/DECLARE_BITMAP\s*\($args,\s*$args\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos;
# replace DECLARE_HASHTABLE # replace DECLARE_HASHTABLE
$members =~ s/DECLARE_HASHTABLE\s*\($args,\s*$args\)/unsigned long $1\[1 << (($2) - 1)\]/gos; $members =~ s/DECLARE_HASHTABLE\s*\($args,\s*$args\)/unsigned long $1\[1 << (($2) - 1)\]/gos;
...@@ -1798,6 +1799,7 @@ sub dump_function($$) { ...@@ -1798,6 +1799,7 @@ sub dump_function($$) {
$prototype =~ s/__weak +//; $prototype =~ s/__weak +//;
$prototype =~ s/__sched +//; $prototype =~ s/__sched +//;
$prototype =~ s/__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +//; $prototype =~ s/__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +//;
$prototype =~ s/__alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +//;
my $define = $prototype =~ s/^#\s*define\s+//; #ak added my $define = $prototype =~ s/^#\s*define\s+//; #ak added
$prototype =~ s/__attribute_const__ +//; $prototype =~ s/__attribute_const__ +//;
$prototype =~ s/__attribute__\s*\(\( $prototype =~ s/__attribute__\s*\(\(
......