Commit 50d22834 authored by Linus Torvalds

Merge tag 'docs-5.10' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "As hoped, things calmed down for docs this cycle; fewer changes and
  almost no conflicts at all. This includes:

   - A reworked and expanded user-mode Linux document

   - Some simplifications and improvements for submitting-patches.rst

   - An emergency fix for (some) problems with Sphinx 3.x

   - Some welcome automarkup improvements to automatically generate
     cross-references to struct definitions and other documents

   - The usual collection of translation updates, typo fixes, etc"

* tag 'docs-5.10' of git://git.lwn.net/linux: (81 commits)
  gpiolib: Update indentation in driver.rst for code excerpts
  Documentation/admin-guide: tainted-kernels: Fix typo occured
  Documentation: better locations for sysfs-pci, sysfs-tagging
  docs: programming-languages: refresh blurb on clang support
  Documentation: kvm: fix a typo
  Documentation: Chinese translation of Documentation/arm64/amu.rst
  doc: zh_CN: index files in arm64 subdirectory
  mailmap: add entry for <mstarovoitov@marvell.com>
  doc: seq_file: clarify role of *pos in ->next()
  docs: trace: ring-buffer-design.rst: use the new SPDX tag
  Documentation: kernel-parameters: clarify "module." parameters
  Fix references to nommu-mmap.rst
  docs: rewrite admin-guide/sysctl/abi.rst
  docs: fb: Remove vesafb scrollback boot option
  docs: fb: Remove sstfb scrollback boot option
  docs: fb: Remove matroxfb scrollback boot option
  docs: fb: Remove framebuffer scrollback boot option
  docs: replace the old User Mode Linux HowTo with a new one
  Documentation/admin-guide: blockdev/ramdisk: remove use of "rdev"
  Documentation/admin-guide: README & svga: remove use of "rdev"
  ...
parents ced3a9eb 4fb220da
......@@ -152,3 +152,6 @@ x509.genkey
# Clang's compilation database file
/compile_commands.json
# Documentation toolchain
sphinx_*/
......@@ -197,6 +197,7 @@ Maciej W. Rozycki <macro@mips.com> <macro@imgtec.com>
Marcin Nowakowski <marcin.nowakowski@mips.com> <marcin.nowakowski@imgtec.com>
Marc Zyngier <maz@kernel.org> <marc.zyngier@arm.com>
Mark Brown <broonie@sirena.org.uk>
Mark Starovoytov <mstarovo@pm.me> <mstarovoitov@marvell.com>
Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@puri.sm>
......
What: /sys/kernel/notes
Date: July 2009
Contact: <linux-kernel@vger.kernel.org>
Description: The /sys/kernel/notes file contains the binary representation
of the running vmlinux's .notes section.
......@@ -12,6 +12,7 @@ Linux PCI Bus Subsystem
pciebus-howto
pci-iov-howto
msi-howto
sysfs-pci
acpi-info
pci-error-recovery
pcieaer-howto
......
......@@ -322,9 +322,9 @@ Compiling the kernel
reboot, and enjoy!
If you ever need to change the default root device, video mode,
ramdisk size, etc. in the kernel image, use the ``rdev`` program (or
alternatively the LILO boot options when appropriate). No need to
recompile the kernel to change these parameters.
etc. in the kernel image, use your bootloader's boot options
where appropriate. No need to recompile the kernel to change
these parameters.
- Reboot with the new kernel and enjoy.
......
......@@ -5,11 +5,14 @@ A block layer cache (bcache)
Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be
nice if you could use them as cache... Hence bcache.
Wiki and git repositories are at:
The bcache wiki can be found at:
https://bcache.evilpiepirate.org
- https://bcache.evilpiepirate.org
- http://evilpiepirate.org/git/linux-bcache.git
- https://evilpiepirate.org/git/bcache-tools.git
This is the git repository of bcache-tools:
https://git.kernel.org/pub/scm/linux/kernel/git/colyli/bcache-tools.git/
The latest bcache kernel code can be found in the mainline Linux kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
It's designed around the performance characteristics of SSDs - it only allocates
in erase block sized buckets, and it uses a hybrid btree/log to track cached
......@@ -41,17 +44,21 @@ in the cache it first disables writeback caching and waits for all dirty data
to be flushed.
Getting started:
You'll need make-bcache from the bcache-tools repository. Both the cache device
You'll need the bcache utility from the bcache-tools repository. Both the cache device
and backing device must be formatted before use::
make-bcache -B /dev/sdb
make-bcache -C /dev/sdc
bcache make -B /dev/sdb
bcache make -C /dev/sdc
make-bcache has the ability to format multiple devices at the same time - if
`bcache make` has the ability to format multiple devices at the same time - if
you format your backing devices and cache device at the same time, you won't
have to manually attach::
make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
bcache make -B /dev/sda /dev/sdb -C /dev/sdc
If your bcache-tools is not updated to the latest version and does not have the
unified `bcache` utility, you may use the legacy `make-bcache` utility to format
bcache devices with the same -B and -C parameters.
bcache-tools now ships udev rules, and bcache devices are known to the kernel
immediately. Without udev, you can manually register devices like this::
......@@ -188,7 +195,7 @@ D) Recovering data without bcache:
If bcache is not available in the kernel, a filesystem on the backing
device is still available at an 8KiB offset. So either via a loopdev
of the backing device created with --offset 8K, or any value defined by
--data-offset when you originally formatted bcache with `make-bcache`.
--data-offset when you originally formatted bcache with `bcache make`.
For example::
......@@ -210,7 +217,7 @@ E) Wiping a cache device
After you boot back with bcache enabled, you recreate the cache and attach it::
host:~# make-bcache -C /dev/sdh2
host:~# bcache make -C /dev/sdh2
UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
version: 0
......@@ -318,7 +325,7 @@ want for getting the best possible numbers when benchmarking.
The default metadata size in bcache is 8k. If your backing device is
RAID based, then be sure to align this by a multiple of your stride
width using `make-bcache --data-offset`. If you intend to expand your
width using `bcache make --data-offset`. If you intend to expand your
disk array in the future, then multiply a series of primes by your
raid stripe size to get the disk multiples that you would like.
......
......@@ -6,7 +6,7 @@ Using the RAM disk block device with Linux
1) Overview
2) Kernel Command Line Parameters
3) Using "rdev -r"
3) Using "rdev"
4) An Example of Creating a Compressed RAM Disk
......@@ -59,51 +59,27 @@ default is 4096 (4 MB).
rd_size
See ramdisk_size.
3) Using "rdev -r"
------------------
3) Using "rdev"
---------------
The usage of the word (two bytes) that "rdev -r" sets in the kernel image is
as follows. The low 11 bits (0 -> 10) specify an offset (in 1 k blocks) of up
to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
14 indicates that a RAM disk is to be loaded, and bit 15 indicates whether a
prompt/wait sequence is to be given before trying to read the RAM disk. Since
the RAM disk dynamically grows as data is being written into it, a size field
is not required. Bits 11 to 13 are not currently used and may as well be zero.
These numbers are no magical secrets, as seen below::
"rdev" is an obsolete, deprecated, antiquated utility that could be used
to set the boot device in a Linux kernel image.
./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
Instead of using rdev, just place the boot device information on the
kernel command line and pass it to the kernel from the bootloader.
Consider a typical two floppy disk setup, where you will have the
kernel on disk one, and have already put a RAM disk image onto disk #2.
You can also pass arguments to the kernel by setting FDARGS in
arch/x86/boot/Makefile and specify an initrd image by setting FDINITRD in
arch/x86/boot/Makefile.
Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
starts at an offset of 0 kB from the beginning of the floppy.
The command line equivalent is: "ramdisk_start=0"
Some of the kernel command line boot options that may apply here are::
You want bit 14 as one, indicating that a RAM disk is to be loaded.
The command line equivalent is: "load_ramdisk=1"
You want bit 15 as one, indicating that you want a prompt/keypress
sequence so that you have a chance to switch floppy disks.
The command line equivalent is: "prompt_ramdisk=1"
Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
So to create disk one of the set, you would do::
/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
/usr/src/linux# rdev /dev/fd0 /dev/fd0
/usr/src/linux# rdev -r /dev/fd0 49152
ramdisk_start=N
ramdisk_size=M
If you make a boot disk that has LILO, then for the above, you would use::
append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
Since the default start = 0 and the default prompt = 1, you could use::
append = "load_ramdisk=1"
append = "ramdisk_start=N ramdisk_size=M"
4) An Example of Creating a Compressed RAM Disk
-----------------------------------------------
......@@ -151,12 +127,9 @@ f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
have 2^15 + 2^14 + 400 = 49552::
rdev /dev/fd0 /dev/fd0
rdev -r /dev/fd0 49552
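As a sanity check on the arithmetic in the old "rdev -r" text, the following Python sketch rebuilds the two example words from the three masks quoted from arch/x86/kernel/setup.c. The helper name ``encode_rdev_word`` is hypothetical and exists only for illustration; it is not part of rdev or the kernel.

```python
# Masks as quoted from arch/x86/kernel/setup.c in the text above.
RAMDISK_IMAGE_START_MASK = 0x07FF  # low 11 bits: offset in 1 k blocks
RAMDISK_PROMPT_FLAG = 0x8000       # bit 15: prompt/wait before loading
RAMDISK_LOAD_FLAG = 0x4000         # bit 14: a RAM disk is to be loaded


def encode_rdev_word(start_kb, load, prompt):
    """Hypothetical helper: pack offset and flags into the 16-bit word."""
    if not 0 <= start_kb <= RAMDISK_IMAGE_START_MASK:
        raise ValueError("offset does not fit in the low 11 bits")
    word = start_kb
    if load:
        word |= RAMDISK_LOAD_FLAG
    if prompt:
        word |= RAMDISK_PROMPT_FLAG
    return word


# ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1
print(encode_rdev_word(0, load=True, prompt=True))    # 49152
# ramdisk_start=400 load_ramdisk=1 prompt_ramdisk=1
print(encode_rdev_word(400, load=True, prompt=True))  # 49552
```

Both results agree with the values given in the text (2^15 + 2^14 + 0 and 2^15 + 2^14 + 400).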
g) Make sure that you have already specified the boot information in
FDARGS and FDINITRD or that you use a bootloader to pass kernel
command line boot options to the kernel.
That is it. You now have your boot/root compressed RAM disk floppy. Some
users may wish to combine steps (d) and (f) by using a pipe.
......@@ -167,11 +140,14 @@ users may wish to combine steps (d) and (f) by using a pipe.
Changelog:
----------
SEPT-2020 :
Removed usage of "rdev"
10-22-04 :
Updated to reflect changes in command line options, remove
obsolete references, general cleanup.
James Nelson (james4765@gmail.com)
12-95 :
Original Document
......@@ -509,9 +509,12 @@ ELF32-format headers using the --elf32-core-headers kernel option on the
dump kernel.
You can also use the Crash utility to analyze dump files in Kdump
format. Crash is available on Dave Anderson's site at the following URL:
format. Crash is available at the following URL:
http://people.redhat.com/~anderson/
https://github.com/crash-utility/crash
Crash documentation can be found at:
https://crash-utility.github.io/
Trigger Kdump on WARN()
=======================
......
......@@ -591,7 +591,7 @@
some critical bits.
cma=nn[MG]@[start[MG][-end[MG]]]
[ARM,X86,KNL]
[KNL,CMA]
Sets the size of kernel global memory area for
contiguous memory allocations and optionally the
placement constraint by the physical address range of
......@@ -940,7 +940,7 @@
Arch Perfmon v4 (Skylake and newer).
disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this if
Disable Dynamic DMA Window support. Use this
to work around buggy firmware.
disable_ipv6= [IPV6]
......@@ -1019,7 +1019,7 @@
what data is available or for reverse-engineering.
dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
module.dyndbg[="val"]
<module>.dyndbg[="val"]
Enable debug messages at boot time. See
Documentation/admin-guide/dynamic-debug-howto.rst
for details.
......@@ -1027,7 +1027,7 @@
nopku [X86] Disable Memory Protection Keys CPU feature found
in some Intel CPUs.
module.async_probe [KNL]
<module>.async_probe [KNL]
Enable asynchronous probe on this module.
early_ioremap_debug [KNL]
......@@ -1956,7 +1956,7 @@
1 - Bypass the IOMMU for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
io7= [HW] IO7 for Marvel based alpha systems
io7= [HW] IO7 for Marvel-based Alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
......@@ -2177,7 +2177,7 @@
kgdbwait [KGDB] Stop kernel execution and enter the
kernel debugger at the earliest opportunity.
kmac= [MIPS] korina ethernet MAC address.
kmac= [MIPS] Korina ethernet MAC address.
Configure the RouterBoard 532 series on-chip
Ethernet adapter MAC address.
......@@ -2258,6 +2258,14 @@
[KVM,ARM] Allow use of GICv4 for direct injection of
LPIs.
kvm_cma_resv_ratio=n [PPC]
Reserves given percentage from system memory area for
contiguous memory allocation for KVM hash pagetable
allocation.
By default it reserves 5% of total system memory.
Format: <integer>
Default: 5
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
......@@ -2367,9 +2375,10 @@
lapic [X86-32,APIC] Enable the local APIC even if BIOS
disabled it.
lapic= [X86,APIC] "notscdeadline" Do not use TSC deadline
lapic= [X86,APIC] Do not use TSC deadline
value for LAPIC timer one-shot implementation. Default
back to the programmable timer unit in the LAPIC.
Format: notscdeadline
lapic_timer_c2_ok [X86,APIC] trust the local apic timer
in C2 power state.
......@@ -2441,8 +2450,7 @@
memblock=debug [KNL] Enable memblock debug messages.
load_ramdisk= [RAM] List of ramdisks to load from floppy
See Documentation/admin-guide/blockdev/ramdisk.rst.
load_ramdisk= [RAM] [Deprecated]
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
......@@ -2579,8 +2587,8 @@
(machvec) in a generic kernel.
Example: machvec=hpzx1
machtype= [Loongson] Share the same kernel image file between different
yeeloong laptop.
machtype= [Loongson] Share the same kernel image file between
different yeeloong laptops.
Example: machtype=lemote-yeeloong-2f-7inch
max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater
......@@ -3185,7 +3193,7 @@
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
nohugeiomap [KNL,X86,PPC] Disable kernel huge I/O mappings.
nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
......@@ -3921,9 +3929,7 @@
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
See Documentation/admin-guide/blockdev/ramdisk.rst.
prompt_ramdisk= [RAM] [Deprecated]
prot_virt= [S390] enable hosting protected virtual machines
isolated from the hypervisor (if hardware supports
......@@ -3981,6 +3987,8 @@
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/admin-guide/blockdev/ramdisk.rst.
ramdisk_start= [RAM] RAM disk image start address
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
CPU's random number generator (if available) to
......
......@@ -12,7 +12,8 @@ Intro
This small document describes the "Video Mode Selection" feature which
allows the use of various special video modes supported by the video BIOS. Due
to usage of the BIOS, the selection is limited to boot time (before the
kernel decompression starts) and works only on 80X86 machines.
kernel decompression starts) and works only on 80X86 machines that are
booted through BIOS firmware (as opposed to through UEFI, kexec, etc.).
.. note::
......@@ -23,7 +24,7 @@ kernel decompression starts) and works only on 80X86 machines.
The video mode to be used is selected by a kernel parameter which can be
specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..."
option of LILO (or some other boot loader you use) or by the "vidmode" utility
option of LILO (or some other boot loader you use) or by the "xrandr" utility
(present in standard Linux utility packages). You can use the following values
of this parameter::
......@@ -41,7 +42,7 @@ of this parameter::
better to use absolute mode numbers instead.
0x.... - Hexadecimal video mode ID (also displayed on the menu, see below
for exact meaning of the ID). Warning: rdev and LILO don't support
for exact meaning of the ID). Warning: LILO doesn't support
hexadecimal numbers -- you have to convert it to decimal manually.
Menu
......
.. SPDX-License-Identifier: GPL-2.0+
================================
Documentation for /proc/sys/abi/
================================
kernel version 2.6.0.test2
.. See scripts/check-sysctl-docs to keep this up to date:
.. scripts/check-sysctl-docs -vtable="abi" \
.. Documentation/admin-guide/sysctl/abi.rst \
.. $(git grep -l register_sysctl_)
Copyright (c) 2003, Fabian Frederick <ffrederick@users.sourceforge.net>
Copyright (c) 2020, Stephen Kitt
For general info: index.rst.
For general info, see :doc:`index`.
------------------------------------------------------------------------------
This path is binary emulation relevant aka personality types aka abi.
When a process is executed, it's linked to an exec_domain whose
personality is defined using values available from /proc/sys/abi.
You can find further details about abi in include/linux/personality.h.
Here are the files featuring in 2.6 kernel:
- defhandler_coff
- defhandler_elf
- defhandler_lcall7
- defhandler_libcso
- fake_utsname
- trace
defhandler_coff
---------------
defined value:
PER_SCOSVR3::
0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
defhandler_elf
--------------
defined value:
PER_LINUX::
0
defhandler_lcall7
-----------------
defined value :
PER_SVR4::
0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
defhandler_libsco
-----------------
defined value:
PER_SVR4::
The files in ``/proc/sys/abi`` can be used to see and modify
ABI-related settings.
0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
Currently, these files might (depending on your configuration)
show up in ``/proc/sys/kernel``:
fake_utsname
------------
.. contents:: :local:
Unused
vsyscall32 (x86)
================
trace
-----
Determines whether the kernel maps a vDSO page into 32-bit processes;
can be set to 1 to enable, or 0 to disable. Defaults to enabled if
``CONFIG_COMPAT_VDSO`` is set, disabled otherwise.
Unused
This controls the same setting as the ``vdso32`` kernel boot
parameter.
......@@ -130,7 +130,7 @@ More detailed explanation for tainting
5) ``B`` If a page-release function has found a bad page reference or some
unexpected page flags. This indicates a hardware problem or a kernel bug;
there should be other information in the log indicating why this tainting
occured.
occurred.
6) ``U`` if a user or user application specifically requested that the
Tainted flag be set, ``' '`` otherwise.
......
......@@ -108,7 +108,7 @@ SunXi family
* Datasheet
http://dl.linux-sunxi.org/H3/Allwinner_H3_Datasheet_V1.0.pdf
https://linux-sunxi.org/images/4/4b/Allwinner_H3_Datasheet_V1.2.pdf
- Allwinner R40 (sun8i)
......
.. _amu_index:
=======================================================
Activity Monitors Unit (AMU) extension in AArch64 Linux
=======================================================
......
.. _arm64_index:
==================
ARM64 Architecture
==================
......
......@@ -36,10 +36,23 @@ needs_sphinx = '1.3'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include',
'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
'maintainers_include', 'sphinx.ext.autosectionlabel' ]
#
# cdomain is badly broken in Sphinx 3+. Leaving it out generates *most*
# of the docs correctly, but not all. Scream bloody murder but allow
# the process to proceed; hopefully somebody will fix this properly soon.
#
if major >= 3:
sys.stderr.write('''WARNING: The kernel documentation build process
does not work correctly with Sphinx v3.0 and above. Expect errors
in the generated output.
''')
else:
extensions.append('cdomain')
# Ensure that autosectionlabel will produce unique names
autosectionlabel_prefix_document = True
autosectionlabel_maxdepth = 2
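The version gate in this hunk can be exercised in isolation. The sketch below is only an illustration, not the kernel's conf.py: the function ``select_extensions`` is hypothetical, and it takes the Sphinx major version as a plain argument (a real conf.py would presumably derive it from something like ``sphinx.version_info``).

```python
import sys

# Extensions that are loaded regardless of the Sphinx version,
# copied from the hunk above.
BASE_EXTENSIONS = ['kerneldoc', 'rstFlatTable', 'kernel_include',
                   'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
                   'maintainers_include', 'sphinx.ext.autosectionlabel']


def select_extensions(major):
    """Hypothetical helper: return the extension list for a Sphinx major version."""
    extensions = list(BASE_EXTENSIONS)
    if major >= 3:
        # Warn loudly, but let the build proceed without cdomain.
        sys.stderr.write('WARNING: The kernel documentation build process\n'
                         'does not work correctly with Sphinx v3.0 and above.\n')
    else:
        extensions.append('cdomain')
    return extensions


print('cdomain' in select_extensions(2))  # True
print('cdomain' in select_extensions(3))  # False
```

Keeping the warning on stderr, as the hunk does, means the build output itself is unaffected.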
......
......@@ -30,7 +30,7 @@ which didn't support these methods.
Command Line Switches
=====================
``maxcpus=n``
Restrict boot time CPUs to *n*. Say if you have fourV CPUs, using
Restrict boot time CPUs to *n*. Say if you have four CPUs, using
``maxcpus=2`` will only boot two. You can choose to bring the
other CPUs later online.
......
......@@ -387,22 +387,23 @@ Domain`_ references.
Cross-referencing from reStructuredText
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To cross-reference the functions and types defined in the kernel-doc comments
from reStructuredText documents, please use the `Sphinx C Domain`_
references. For example::
See function :c:func:`foo` and struct/union/enum/typedef :c:type:`bar`.
While the type reference works with just the type name, without the
struct/union/enum/typedef part in front, you may want to use::
See :c:type:`struct foo <foo>`.
See :c:type:`union bar <bar>`.
See :c:type:`enum baz <baz>`.
See :c:type:`typedef meh <meh>`.
This will produce prettier links, and is in line with how kernel-doc does the
cross-references.
No additional syntax is needed to cross-reference the functions and types
defined in the kernel-doc comments from reStructuredText documents.
Just end function names with ``()`` and write ``struct``, ``union``, ``enum``
or ``typedef`` before types.
For example::
See foo().
See struct foo.
See union bar.
See enum baz.
See typedef meh.
However, if you want custom text in the cross-reference link, that can be done
through the following syntax::
See :c:func:`my custom link text for function foo <foo>`.
See :c:type:`my custom link text for struct bar <bar>`.
For further details, please refer to the `Sphinx C Domain`_ documentation.
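To illustrate the convention described above (function names ending in ``()``, types preceded by ``struct``, ``union``, ``enum`` or ``typedef``), here is a minimal Python sketch of how such mentions could be picked out of plain text. The regular expressions and the name ``find_references`` are assumptions made for this example; they are simplified and are not the patterns used by the actual automarkup extension.

```python
import re

# Simplified, assumed patterns for illustration only.
FUNCTION_RE = re.compile(r'\b([a-zA-Z_]\w*)\(\)')
TYPE_RE = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w*)')


def find_references(text):
    """Return (function names, (kind, name) type pairs) mentioned in text."""
    return FUNCTION_RE.findall(text), TYPE_RE.findall(text)


sample = ("See foo(). See struct foo. See union bar. "
          "See enum baz. See typedef meh.")
functions, types = find_references(sample)
print(functions)  # ['foo']
print(types)      # [('struct', 'foo'), ('union', 'bar'), ('enum', 'baz'), ('typedef', 'meh')]
```

A real implementation would also have to check that each candidate actually has a kernel-doc target before turning it into a link.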
......
......@@ -337,6 +337,23 @@ Rendered as:
- column 3
Cross-referencing
-----------------
Cross-referencing from one documentation page to another can be done by passing
the path to the file starting from the Documentation folder.
For example, to cross-reference to this page (the .rst extension is optional)::
See Documentation/doc-guide/sphinx.rst.
If you want to use a relative path, you need to use Sphinx's ``doc`` directive.
For example, referencing this page from the same directory would be done as::
See :doc:`sphinx`.
For information on cross-referencing to kernel-doc functions or types, see
Documentation/doc-guide/kernel-doc.rst.
.. _sphinx_kfigure:
Figures & Images
......
......@@ -85,7 +85,7 @@ consider though:
- Memory mapping the contents of the DMA buffer is also supported. See the
discussion below on `CPU Access to DMA Buffer Objects`_ for the full details.
- The DMA buffer FD is also pollable, see `Fence Poll Support`_ below for
- The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below for
details.
Basic Operation and Device DMA Access
......
......@@ -10,3 +10,4 @@ Non-Volatile Memory Device (NVDIMM)
nvdimm
btt
security
firmware-activate
......@@ -518,10 +518,10 @@ typically called during a dailink .shutdown() callback, which clears
the stream pointer for all DAIS connected to a stream and releases the
memory allocated for the stream.
Not Supported
Not Supported
=============
1. A single port with multiple channels supported cannot be used between two
streams or across stream. For example a port with 4 channels cannot be used
to handle 2 independent stereo streams even though it's possible in theory
in SoundWire.
streams or across stream. For example a port with 4 channels cannot be used
to handle 2 independent stereo streams even though it's possible in theory
in SoundWire.
......@@ -87,15 +87,8 @@ C. Boot options
Note, not all drivers can handle font with widths not divisible by 8,
such as vga16fb.
2. fbcon=scrollback:<value>[k]
The scrollback buffer is memory that is used to preserve display
contents that has already scrolled past your view. This is accessed
by using the Shift-PageUp key combination. The value 'value' is any
integer. It defaults to 32KB. The 'k' suffix is optional, and will
multiply the 'value' by 1024.
3. fbcon=map:<0123>
2. fbcon=map:<0123>
This is an interesting option. It tells which driver gets mapped to
which console. The value '0123' is a sequence that gets repeated until
......@@ -116,7 +109,7 @@ C. Boot options
Later on, when you want to map the console to the framebuffer
device, you can use the con2fbmap utility.
4. fbcon=vc:<n1>-<n2>
3. fbcon=vc:<n1>-<n2>
This option tells fbcon to take over only a range of consoles as
specified by the values 'n1' and 'n2'. The rest of the consoles
......@@ -127,7 +120,7 @@ C. Boot options
is typically located on the same video card. Thus, the consoles that
are controlled by the VGA console will be garbled.
5. fbcon=rotate:<n>
4. fbcon=rotate:<n>
This option changes the orientation angle of the console display. The
value 'n' accepts the following:
......@@ -152,21 +145,21 @@ C. Boot options
Actually, the underlying fb driver is totally ignorant of console
rotation.
6. fbcon=margin:<color>
5. fbcon=margin:<color>
This option specifies the color of the margins. The margins are the
leftover area at the right and the bottom of the screen that are not
used by text. By default, this area will be black. The 'color' value
is an integer number that depends on the framebuffer driver being used.
7. fbcon=nodefer
6. fbcon=nodefer
If the kernel is compiled with deferred fbcon takeover support, normally
the framebuffer contents, left in place by the firmware/bootloader, will
be preserved until some text is actually output to the console.
This option causes fbcon to bind immediately to the fbdev device.
8. fbcon=logo-pos:<location>
7. fbcon=logo-pos:<location>
The only possible 'location' is 'center' (without quotes), and when
given, the bootup logo is moved from the default top-left corner
......@@ -174,7 +167,7 @@ C. Boot options
displayed due to multiple CPUs, the collected line of logos is moved
as a whole.
9. fbcon=logo-count:<n>
8. fbcon=logo-count:<n>
The value 'n' overrides the number of bootup logos. 0 disables the
logo, and -1 gives the default which is the number of online CPUs.
......
......@@ -317,8 +317,6 @@ Currently there are following known bugs:
- interlaced text mode is not supported; it looks like hardware limitation,
but I'm not sure.
- Gxx0 SGRAM/SDRAM is not autodetected.
- If you are using more than one framebuffer device, you must boot kernel
with 'video=scrollback:0'.
- maybe more...
And following misfeatures:
......
......@@ -185,9 +185,6 @@ Bugs
contact me.
- The 24/32 is not likely to work anytime soon, knowing that the
hardware does ... unusual things in 24/32 bpp.
- When used with another video board, current limitations of the linux
console subsystem can cause some troubles, specifically, you should
disable software scrollback, as it can oops badly ...
Todo
====
......
......@@ -135,8 +135,6 @@ ypan enable display panning using the VESA protected mode
* scrolling (fullscreen) is fast, because there is
no need to copy around data.
* You'll get scrollback (the Shift-PgUp thing),
the video memory can be used as scrollback buffer
kontra:
......
......@@ -34,8 +34,6 @@ algorithms work.
quota
seq_file
sharedsubtree
sysfs-pci
sysfs-tagging
automount-support
......
.. SPDX-License-Identifier: GPL-2.0
====================
fILESYSTEM Mount API
Filesystem Mount API
====================
.. CONTENTS
......@@ -479,7 +479,7 @@ returned.
int vfs_parse_fs_param(struct fs_context *fc,
struct fs_parameter *param);
Supply a single mount parameter to the filesystem context. This include
Supply a single mount parameter to the filesystem context. This includes
the specification of the source/device which is specified as the "source"
parameter (which may be specified multiple times if the filesystem
supports that).
......@@ -592,8 +592,7 @@ The following helpers all wrap sget_fc():
one.
=====================
PARAMETER DESCRIPTION
Parameter Description
=====================
Parameters are described using structures defined in linux/fs_parser.h.
......
......@@ -129,7 +129,9 @@ also a special value which can be returned by the start() function
called SEQ_START_TOKEN; it can be used if you wish to instruct your
show() function (described below) to print a header at the top of the
output. SEQ_START_TOKEN should only be used if the offset is zero,
however.
however. SEQ_START_TOKEN has no special meaning to the core seq_file
code. It is provided as a convenience for a start() function to
communicate with the next() and show() functions.
The next function to implement is called, amazingly, next(); its job is to
move the iterator forward to the next position in the sequence. The
......@@ -145,6 +147,22 @@ complete. Here's the example version::
return spos;
}
The next() function should set ``*pos`` to a value that start() can use
to find the new location in the sequence. When the iterator is being
stored in the private data area, rather than being reinitialized on each
start(), it might seem sufficient to simply set ``*pos`` to any non-zero
value (zero always tells start() to restart the sequence). This is not
sufficient due to historical problems.
Historically, many next() functions have *not* updated ``*pos`` at
end-of-file. If the value is then used by start() to initialise the
iterator, this can result in corner cases where the last entry in the
sequence is reported twice in the file. In order to discourage this bug
from being resurrected, the core seq_file code now produces a warning if
a next() function does not change the value of ``*pos``. Consequently a
next() function *must* change the value of ``*pos``, and of course must
set it to a non-zero value.
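The corner case described above can be reproduced with a deliberately simplified model. The Python sketch below is not the kernel's seq_file code: it ignores buffering and the warning, all function names are hypothetical, and it models only two consecutive read sessions over a list, where the position left behind by the first session is handed to start() in the second.

```python
def start(items, pos):
    """Return the index to show first, or None when pos is past the end."""
    return pos if pos < len(items) else None


def next_updating(items, pos):
    """Well-behaved next(): always advances pos, even at end-of-sequence."""
    pos += 1
    return (pos if pos < len(items) else None), pos


def next_stale_at_eof(items, pos):
    """Buggy next(): leaves pos unchanged when it reaches the end."""
    if pos + 1 < len(items):
        return pos + 1, pos + 1
    return None, pos


def read_once(items, pos, next_fn):
    """One start()/show()/next() session; returns (entries shown, final pos)."""
    shown = []
    cursor = start(items, pos)
    while cursor is not None:
        shown.append(items[cursor])          # stands in for show()
        cursor, pos = next_fn(items, pos)
    return shown, pos


def read_twice(items, next_fn):
    """Two sessions in a row, the second resuming from the saved pos."""
    first, pos = read_once(items, 0, next_fn)
    second, _ = read_once(items, pos, next_fn)
    return first + second


print(read_twice(['a', 'b', 'c'], next_updating))      # ['a', 'b', 'c']
print(read_twice(['a', 'b', 'c'], next_stale_at_eof))  # ['a', 'b', 'c', 'c']
```

With the stale position, the second session restarts at the last entry and reports it again, which is the duplicate the text warns about.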
The stop() function closes a session; its job, of course, is to clean
up. If dynamic memory is allocated for the iterator, stop() is the
place to free it; if a lock was taken by start(), stop() must release
......
......@@ -172,14 +172,13 @@ calls the associated methods.
To illustrate::
#define to_dev(obj) container_of(obj, struct device, kobj)
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
char *buf)
{
struct device_attribute *dev_attr = to_dev_attr(attr);
struct device *dev = to_dev(kobj);
struct device *dev = kobj_to_dev(kobj);
ssize_t ret = -EIO;
if (dev_attr->show)
......
.. SPDX-License-Identifier: GPL-2.0
:orphan:
.. UBIFS Authentication
.. sigma star gmbh
.. 2018
============================
UBIFS Authentication Support
============================
Introduction
============
......
......@@ -26,3 +26,4 @@ ACPI Support
lpit
video_extension
extcon-intel-int3496
intel-pmc-mux
......@@ -158,6 +158,7 @@ Hardware Monitoring Kernel Drivers
smsc47b397
smsc47m192
smsc47m1
sparx5-temp
tc654
tc74
thmc50
......
......@@ -15,4 +15,3 @@ IA-64 Architecture
irq-redir
mca
serial
xen
********************************************************
Recipe for getting/building/running Xen/ia64 with pv_ops
********************************************************
This recipe describes how to get xen-ia64 source and build it,
and run domU with pv_ops.
Requirements
============
- python
- mercurial
it (aka "hg") is an open-source source code
management software. See the below.
http://www.selenic.com/mercurial/wiki/
- git
- bridge-utils
Getting and Building Xen and Dom0
=================================
My environment is:
- Machine : Tiger4
- Domain0 OS : RHEL5
- DomainU OS : RHEL5
1. Download source::
# hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable.hg
# cd xen-unstable.hg
# hg clone http://xenbits.xensource.com/ext/ia64/linux-2.6.18-xen.hg
2. # make world
3. # make install-tools
4. copy kernels and xen::
# cp xen/xen.gz /boot/efi/efi/redhat/
# cp build-linux-2.6.18-xen_ia64/vmlinux.gz \
/boot/efi/efi/redhat/vmlinuz-2.6.18.8-xen
5. make initrd for Dom0/DomU::
# make -C linux-2.6.18-xen.hg ARCH=ia64 modules_install \
O=$(pwd)/build-linux-2.6.18-xen_ia64
# mkinitrd -f /boot/efi/efi/redhat/initrd-2.6.18.8-xen.img \
2.6.18.8-xen --builtin mptspi --builtin mptbase \
--builtin mptscsih --builtin uhci-hcd --builtin ohci-hcd \
--builtin ehci-hcd
Making a disk image for guest OS
================================
1. make file::
# dd if=/dev/zero of=/root/rhel5.img bs=1M seek=4096 count=0
# mke2fs -F -j /root/rhel5.img
# mount -o loop /root/rhel5.img /mnt
# cp -ax /{dev,var,etc,usr,bin,sbin,lib} /mnt
# mkdir /mnt/{root,proc,sys,home,tmp}
Note: Some device files may be missing. If so, please create them
with mknod, or use tar instead of cp.
2. modify DomU's fstab::
# vi /mnt/etc/fstab
/dev/xvda1 / ext3 defaults 1 1
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
3. modify inittab
set runlevel to 3 to avoid X trying to start::
# vi /mnt/etc/inittab
id:3:initdefault:
Start a getty on the hvc0 console::
X0:2345:respawn:/sbin/mingetty hvc0
tty1-6 mingetty can be commented out
4. add hvc0 into /etc/securetty::
# vi /mnt/etc/securetty (add hvc0)
5. umount::
# umount /mnt
FYI, virt-manager can also make a disk image for a guest OS.
It is a GUI tool that makes this easy.
Boot Xen & Domain0
==================
1. replace elilo
elilo of RHEL5 can boot Xen and Dom0.
If you use an old elilo (e.g. RHEL4), please download a newer one from
http://elilo.sourceforge.net/cgi-bin/blosxom
and copy into /boot/efi/efi/redhat/::
# cp elilo-3.6-ia64.efi /boot/efi/efi/redhat/elilo.efi
2. modify elilo.conf (like the below)::
# vi /boot/efi/efi/redhat/elilo.conf
prompt
timeout=20
default=xen
relocatable
image=vmlinuz-2.6.18.8-xen
label=xen
vmm=xen.gz
initrd=initrd-2.6.18.8-xen.img
read-only
append=" -- rhgb root=/dev/sda2"
The append options before "--" are for xen hypervisor,
the options after "--" are for dom0.
FYI, your machine may need console options like
"com1=19200,8n1 console=vga,com1". For example,
append="com1=19200,8n1 console=vga,com1 -- rhgb console=tty0 \
console=ttyS0 root=/dev/sda2"
Getting and Building domU with pv_ops
=====================================
1. get pv_ops tree::
# git clone http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/
2. git branch (if necessary)::
# cd linux-2.6-xen-ia64/
# git checkout -b your_branch origin/xen-ia64-domu-minimal-2008may19
Note:
The current branch is xen-ia64-domu-minimal-2008may19.
A newer branch may exist; run "git branch -r" to
list the available branches.
http://people.valinux.co.jp/~yamahata/xen-ia64/for_eagl/linux-2.6-ia64-pv-ops.git/
is also available.
The tree is based on
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 (test branch)
3. copy .config for pv_ops of domU::
# cp arch/ia64/configs/xen_domu_wip_defconfig .config
4. make kernel with pv_ops::
# make oldconfig
# make
5. install the kernel and initrd::
# cp vmlinux.gz /boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU
# make modules_install
# mkinitrd -f /boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img \
2.6.26-rc3xen-ia64-08941-g1b12161 --builtin mptspi \
--builtin mptbase --builtin mptscsih --builtin uhci-hcd \
--builtin ohci-hcd --builtin ehci-hcd
Boot DomainU with pv_ops
========================
1. make config of DomU::
# vi /etc/xen/rhel5
kernel = "/boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU"
ramdisk = "/boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img"
vcpus = 1
memory = 512
name = "rhel5"
disk = [ 'file:/root/rhel5.img,xvda1,w' ]
root = "/dev/xvda1 ro"
extra= "rhgb console=hvc0"
2. After booting xen and dom0, start xend::
# /etc/init.d/xend start
( In the debugging case, `# XEND_DEBUG=1 xend trace_start` )
3. start domU::
# xm create -c rhel5
Reference
=========
- Wiki of Xen/IA64 upstream merge
http://wiki.xensource.com/xenwiki/XenIA64/UpstreamMerge
Written by Akio Takebe <takebe_akio@jp.fujitsu.com> on 28 May 2008
......@@ -53,7 +53,7 @@ kernel module following the interface in include/linux/iio/sw_trigger.h::
*/
}
static int iio_trig_hrtimer_remove(struct iio_sw_trigger *swt)
static int iio_trig_sample_remove(struct iio_sw_trigger *swt)
{
/*
* This undoes the actions in iio_trig_sample_probe
......
.. _kbuild_llvm:
==============================
Building Linux with Clang/LLVM
==============================
......@@ -73,6 +75,8 @@ Getting Help
- `Wiki <https://github.com/ClangBuiltLinux/linux/wiki>`_
- `Beginner Bugs <https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22>`_
.. _getting_llvm:
Getting LLVM
-------------
......
......@@ -13,4 +13,5 @@ additions to this manual.
rebasing-and-merging
pull-requests
maintainer-entry-profile
modifying-patches
.. _modifyingpatches:
Modifying Patches
=================
If you are a subsystem or branch maintainer, sometimes you need to slightly
modify patches you receive in order to merge them, because the code is not
exactly the same in your tree and the submitters'. If you stick strictly to
rule (c) of the Developer's Certificate of Origin, you should ask the submitter
to rediff, but this is a totally counter-productive waste of time and energy.
Rule (b) allows you to adjust the code, but then it is very impolite to change
one submitter's code and make him endorse your bugs. To solve this problem, it
is recommended that you add a line between the last Signed-off-by header and
yours, indicating the nature of your changes. While there is nothing mandatory
about this, it seems like prepending the description with your mail and/or
name, all enclosed in square brackets, is noticeable enough to make it obvious
that you are responsible for last-minute changes. Example::
Signed-off-by: Random J Developer <random@developer.example.org>
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
This practice is particularly helpful if you maintain a stable branch and
want at the same time to credit the author, track changes, merge the fix,
and protect the submitter from complaints. Note that under no circumstances
can you change the author's identity (the From header), as it is the one
which appears in the changelog.
Special note to back-porters: It seems to be a common and useful practice
to insert an indication of the origin of a patch at the top of the commit
message (just after the subject line) to facilitate tracking. For instance,
here's what we see in a 3.x-stable release::
Date: Tue Oct 7 07:26:38 2014 -0400
libata: Un-break ATA blacklist
commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
And here's what might appear in an older kernel once a patch is backported::
Date: Tue May 13 22:12:27 2008 +0200
wireless, airo: waitbusy() won't delay
[backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
Whatever the format, this information is a valuable help to people
tracking your trees, and to people trying to troubleshoot bugs in your
tree.
......@@ -546,8 +546,8 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
[*] For information on bus mastering DMA and coherency please read:
Documentation/driver-api/pci/pci.rst
Documentation/DMA-API-HOWTO.txt
Documentation/DMA-API.txt
Documentation/core-api/dma-api-howto.rst
Documentation/core-api/dma-api.rst
DATA DEPENDENCY BARRIERS (HISTORICAL)
......@@ -1932,8 +1932,8 @@ There are some more advanced barrier functions:
here.
See the subsection "Kernel I/O barrier effects" for more information on
relaxed I/O accessors and the Documentation/DMA-API.txt file for more
information on consistent memory.
relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for
more information on consistent memory.
(*) pmem_wmb();
......
......@@ -95,6 +95,7 @@ Contents:
seg6-sysctl
strparser
switchdev
sysfs-tagging
tc-actions-env-rules
tcp-thin
team
......
......@@ -405,7 +405,7 @@ be found at:
http://vger.kernel.org/vger-lists.html
There are lists hosted elsewhere, though; a number of them are at
lists.redhat.com.
redhat.com/mailman/listinfo.
The core mailing list for kernel development is, of course, linux-kernel.
This list is an intimidating place to be; volume can reach 500 messages per
......
......@@ -30,6 +30,7 @@ you probably needn't concern yourself with pcmciautils.
Program Minimal version Command to check the version
====================== =============== ========================================
GNU C 4.9 gcc --version
Clang/LLVM (optional) 10.0.1 clang --version
GNU make 3.81 make --version
binutils 2.23 ld -v
flex 2.5.35 flex --version
......@@ -68,6 +69,15 @@ GCC
The gcc version requirements may vary depending on the type of CPU in your
computer.
Clang/LLVM (optional)
---------------------
The latest formal release of clang and LLVM utils (according to
`releases.llvm.org <https://releases.llvm.org>`_) is supported for building
kernels. Older releases aren't guaranteed to work, and we may drop workarounds
from the kernel that were used to support older versions. Please see additional
docs on :ref:`Building Linux with Clang/LLVM <kbuild_llvm>`.
Make
----
......@@ -331,6 +341,11 @@ gcc
- <ftp://ftp.gnu.org/gnu/gcc/>
Clang/LLVM
----------
- :ref:`Getting LLVM <getting_llvm>`.
Make
----
......
......@@ -51,24 +51,6 @@ to make sure their systems do not continue running in the face of
"unreachable" conditions. (For example, see commits like `this one
<https://git.kernel.org/linus/d4689846881d160a4d12a514e991a740bcb5d65a>`_.)
uninitialized_var()
-------------------
For any compiler warnings about uninitialized variables, just add
an initializer. Using the uninitialized_var() macro (or similar
warning-silencing tricks) is dangerous as it papers over `real bugs
<https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/>`_
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes. Keep in
mind that in most cases, if an initialization is obviously redundant,
the compiler's dead-store elimination pass will make sure there are no
needless variable writes.
As Linus has said, this macro
`must <https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/>`_
`be <https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/>`_
`removed <https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/>`_.
open-coded arithmetic in allocator arguments
--------------------------------------------
Dynamic size calculations (especially multiplication) should not be
......@@ -322,7 +304,8 @@ to allocate for a structure containing an array of this kind as a member::
In the example above, we had to remember to calculate ``count - 1`` when using
the struct_size() helper, otherwise we would have --unintentionally-- allocated
memory for one too many ``items`` objects. The cleanest and least error-prone way
to implement this is through the use of a `flexible array member`::
to implement this is through the use of a `flexible array member`, together with
struct_size() and flex_array_size() helpers::
struct something {
size_t count;
......@@ -334,5 +317,4 @@ to implement this is through the use of a `flexible array member`::
instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
instance->count = count;
size = sizeof(instance->items[0]) * instance->count;
memcpy(instance->items, source, size);
memcpy(instance->items, source, flex_array_size(instance, items, instance->count));
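The allocation and copy above can be sketched in userspace as follows. This is only an illustration under stated assumptions: ``demo_struct_size()``, ``demo_flex_array_size()`` and ``demo_alloc()`` are hypothetical stand-ins for the kernel helpers, ``malloc()`` replaces ``kmalloc()``, and the element type is assumed to be ``int``. Like the kernel helpers, the stand-ins saturate to ``SIZE_MAX`` on overflow, so an oversized request makes the allocation fail instead of silently returning an undersized buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct something {
	size_t count;
	int items[];		/* flexible array member (element type assumed) */
};

/* Hypothetical stand-in for flex_array_size(): bytes used by the array. */
static size_t demo_flex_array_size(size_t elem_size, size_t count)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return SIZE_MAX;	/* saturate on multiplication overflow */
	return elem_size * count;
}

/* Hypothetical stand-in for struct_size(): header plus trailing array. */
static size_t demo_struct_size(size_t header, size_t elem_size, size_t count)
{
	size_t array = demo_flex_array_size(elem_size, count);

	if (array > SIZE_MAX - header)
		return SIZE_MAX;	/* saturate on addition overflow */
	return header + array;
}

/* Allocate an instance holding a copy of count elements from source. */
static struct something *demo_alloc(const int *source, size_t count)
{
	struct something *instance;

	instance = malloc(demo_struct_size(sizeof(*instance),
					   sizeof(instance->items[0]), count));
	if (!instance)
		return NULL;

	instance->count = count;
	memcpy(instance->items, source,
	       demo_flex_array_size(sizeof(instance->items[0]), count));
	return instance;
}
```

Computing both the allocation size and the copy size through the same checked helpers keeps the two from drifting apart when the structure changes.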
......@@ -25,6 +25,11 @@ attachments, but then the attachments should have content-type
it makes quoting portions of the patch more difficult in the patch
review process.
It's also strongly recommended that you use plain text in your email body,
for patches and other emails alike. https://useplaintext.email may be useful
for information on how to configure your preferred email client, as well as
listing recommended email clients should you not already have a preference.
Email clients that are used for Linux kernel patches should send the
patch text untouched. For example, they should not modify or delete tabs
or spaces, even at the beginning or end of lines.
......
......@@ -6,14 +6,15 @@ Programming Language
The kernel is written in the C programming language [c-language]_.
More precisely, the kernel is typically compiled with ``gcc`` [gcc]_
under ``-std=gnu89`` [gcc-c-dialect-options]_: the GNU dialect of ISO C90
(including some C99 features).
(including some C99 features). ``clang`` [clang]_ is also supported; see the
docs on :ref:`Building Linux with Clang/LLVM <kbuild_llvm>`.
This dialect contains many extensions to the language [gnu-extensions]_,
and many of them are used within the kernel as a matter of course.
There is some support for compiling the kernel with ``clang`` [clang]_
and ``icc`` [icc]_ for several of the architectures, although at the time
of writing it is not completed, requiring third-party patches.
There is some support for compiling the kernel with ``icc`` [icc]_ for several
of the architectures, although at the time of writing it is not completed,
requiring third-party patches.
Attributes
----------
......
......@@ -24,6 +24,10 @@ and elsewhere regarding submitting Linux kernel patches.
c) Builds successfully when using ``O=builddir``
d) Any Documentation/ changes build successfully without new warnings/errors.
Use ``make htmldocs`` or ``make pdfdocs`` to check the build and
fix any issues.
3) Builds on multiple CPU architectures by using local cross-compile tools
or some other build farm.
......
......@@ -60,10 +60,11 @@ What Criteria Determine Acceptance
Licensing:
The code must be released to us under the
GNU General Public License. We don't insist on any kind
of exclusive GPL licensing, and if you wish the driver
to be useful to other communities such as BSD you may well
wish to release under multiple licenses.
GNU General Public License. If you wish the driver to be
useful to other communities such as BSD you may release
under multiple licenses. If you choose to release under
licenses other than the GPL, you should include your
rationale for your license choices in your cover letter.
See accepted licenses at include/linux/module.h
Copyright:
......
......@@ -10,22 +10,18 @@ can greatly increase the chances of your change being accepted.
This document contains a large number of suggestions in a relatively terse
format. For detailed information on how the kernel development process
works, see :ref:`Documentation/process <development_process_main>`.
Also, read :ref:`Documentation/process/submit-checklist.rst <submitchecklist>`
for a list of items to check before
submitting code. If you are submitting a driver, also read
:ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`;
for device tree binding patches, read
Documentation/devicetree/bindings/submitting-patches.rst.
Many of these steps describe the default behavior of the ``git`` version
control system; if you use ``git`` to prepare your patches, you'll find much
of the mechanical work done for you, though you'll still need to prepare
and document a sensible set of patches. In general, use of ``git`` will make
your life as a kernel developer easier.
0) Obtain a current source tree
-------------------------------
works, see :doc:`development-process`. Also, read :doc:`submit-checklist`
for a list of items to check before submitting code. If you are submitting
a driver, also read :doc:`submitting-drivers`; for device tree binding patches,
read Documentation/devicetree/bindings/submitting-patches.rst.
This documentation assumes that you're using ``git`` to prepare your patches.
If you're unfamiliar with ``git``, you would be well-advised to learn how to
use it; it will make your life as a kernel developer, and in general, much
easier.
Obtain a current source tree
----------------------------
If you do not have a repository with the current kernel source handy, use
``git`` to obtain one. You'll want to start with the mainline repository,
......@@ -39,68 +35,10 @@ patches prepared against those trees. See the **T:** entry for the subsystem
in the MAINTAINERS file to find that tree, or simply ask the maintainer if
the tree is not listed there.
It is still possible to download kernel releases via tarballs (as described
in the next section), but that is the hard way to do kernel development.
1) ``diff -up``
---------------
If you must generate your patches by hand, use ``diff -up`` or ``diff -uprN``
to create patches. Git generates patches in this form by default; if
you're using ``git``, you can skip this section entirely.
All changes to the Linux kernel occur in the form of patches, as
generated by :manpage:`diff(1)`. When creating your patch, make sure to
create it in "unified diff" format, as supplied by the ``-u`` argument
to :manpage:`diff(1)`.
Also, please use the ``-p`` argument which shows which C function each
change is in - that makes the resultant ``diff`` a lot easier to read.
Patches should be based in the root kernel source directory,
not in any lower subdirectory.
To create a patch for a single file, it is often sufficient to do::
SRCTREE=linux
MYFILE=drivers/net/mydriver.c
cd $SRCTREE
cp $MYFILE $MYFILE.orig
vi $MYFILE # make your change
cd ..
diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
To create a patch for multiple files, you should unpack a "vanilla",
or unmodified kernel source tree, and generate a ``diff`` against your
own source tree. For example::
MYSRC=/devel/linux
tar xvfz linux-3.19.tar.gz
mv linux-3.19 linux-3.19-vanilla
diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \
linux-3.19-vanilla $MYSRC > /tmp/patch
``dontdiff`` is a list of files which are generated by the kernel during
the build process, and should be ignored in any :manpage:`diff(1)`-generated
patch.
Make sure your patch does not include any extra files which do not
belong in a patch submission. Make sure to review your patch -after-
generating it with :manpage:`diff(1)`, to ensure accuracy.
If your changes produce a lot of deltas, you need to split them into
individual patches which modify things in logical stages; see
:ref:`split_changes`. This will facilitate review by other kernel developers,
very important if you want your patch accepted.
If you're using ``git``, ``git rebase -i`` can help you with this process. If
you're not using ``git``, ``quilt`` <https://savannah.nongnu.org/projects/quilt>
is another popular alternative.
.. _describe_changes:
2) Describe your changes
------------------------
Describe your changes
---------------------
Describe your problem. Whether your patch is a one-line bug fix or
5000 lines of a new feature, there must be an underlying problem that
......@@ -203,8 +141,8 @@ An example call::
.. _split_changes:
3) Separate your changes
------------------------
Separate your changes
---------------------
Separate each **logical change** into a separate patch.
......@@ -236,8 +174,8 @@ then only post say 15 or so at a time and wait for review and integration.
4) Style-check your changes
---------------------------
Style-check your changes
------------------------
Check your patch for basic style violations, details of which can be
found in
......@@ -267,8 +205,8 @@ You should be able to justify all violations that remain in your
patch.
5) Select the recipients for your patch
---------------------------------------
Select the recipients for your patch
------------------------------------
You should always copy the appropriate subsystem maintainer(s) on any patch
to code that they maintain; look through the MAINTAINERS file and the
......@@ -299,7 +237,8 @@ sending him e-mail.
If you have a patch that fixes an exploitable security bug, send that patch
to security@kernel.org. For severe bugs, a short embargo may be considered
to allow distributors to get the patch out to users; in such cases,
obviously, the patch should not be sent to any public lists.
obviously, the patch should not be sent to any public lists. See also
:doc:`/admin-guide/security-bugs`.
Patches that fix a severe bug in a released kernel should be directed
toward the stable maintainers by putting a line like this::
......@@ -342,15 +281,20 @@ Trivial patches must qualify for one of the following rules:
6) No MIME, no links, no compression, no attachments. Just plain text
----------------------------------------------------------------------
No MIME, no links, no compression, no attachments. Just plain text
-------------------------------------------------------------------
Linus and other kernel developers need to be able to read and comment
on the changes you are submitting. It is important for a kernel
developer to be able to "quote" your changes, using standard e-mail
tools, so that they may comment on specific portions of your code.
For this reason, all patches should be submitted by e-mail "inline".
For this reason, all patches should be submitted by e-mail "inline". The
easiest way to do this is with ``git send-email``, which is strongly
recommended. An interactive tutorial for ``git send-email`` is available at
https://git-send-email.io.
If you choose not to use ``git send-email``:
.. warning::
......@@ -366,27 +310,17 @@ decreasing the likelihood of your MIME-attached change being accepted.
Exception: If your mailer is mangling patches then someone may ask
you to re-send them using MIME.
See :ref:`Documentation/process/email-clients.rst <email_clients>`
for hints about configuring your e-mail client so that it sends your patches
untouched.
7) E-mail size
--------------
See :doc:`/process/email-clients` for hints about configuring your e-mail
client so that it sends your patches untouched.
Large changes are not appropriate for mailing lists, and some
maintainers. If your patch, uncompressed, exceeds 300 kB in size,
it is preferred that you store your patch on an Internet-accessible
server, and provide instead a URL (link) pointing to your patch. But note
that if your patch exceeds 300 kB, it almost certainly needs to be broken up
anyway.
8) Respond to review comments
-----------------------------
Respond to review comments
--------------------------
Your patch will almost certainly get comments from reviewers on ways in
which the patch can be improved. You must respond to those comments;
ignoring reviewers is a good way to get ignored in return. Review comments
or questions that do not lead to a code change should almost certainly
which the patch can be improved, in the form of a reply to your email. You must
respond to those comments; ignoring reviewers is a good way to get ignored in
return. You can simply reply to their emails to answer their comments. Review
comments or questions that do not lead to a code change should almost certainly
bring about a comment or changelog entry so that the next reviewer better
understands what is going on.
......@@ -395,9 +329,12 @@ for their time. Code review is a tiring and time-consuming process, and
reviewers sometimes get grumpy. Even in that case, though, respond
politely and address the problems they have pointed out.
See :doc:`email-clients` for recommendations on email
clients and mailing list etiquette.
9) Don't get discouraged - or impatient
---------------------------------------
Don't get discouraged - or impatient
------------------------------------
After you have submitted your change, be patient and wait. Reviewers are
busy people and may not get to your patch right away.
......@@ -410,18 +347,19 @@ one week before resubmitting or pinging reviewers - possibly longer during
busy times like merge windows.
10) Include PATCH in the subject
--------------------------------
Include PATCH in the subject
-----------------------------
Due to high e-mail traffic to Linus, and to linux-kernel, it is common
convention to prefix your subject line with [PATCH]. This lets Linus
and other kernel developers more easily distinguish patches from other
e-mail discussions.
``git send-email`` will do this for you automatically.
11) Sign your work - the Developer's Certificate of Origin
----------------------------------------------------------
Sign your work - the Developer's Certificate of Origin
------------------------------------------------------
To improve tracking of who did what, especially with patches that can
percolate to their final resting place in the kernel through several
......@@ -465,60 +403,15 @@ then you just add a line saying::
Signed-off-by: Random J Developer <random@developer.example.org>
using your real name (sorry, no pseudonyms or anonymous contributions.)
This will be done for you automatically if you use ``git commit -s``.
Some people also put extra tags at the end. They'll just be ignored for
now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.
If you are a subsystem or branch maintainer, sometimes you need to slightly
modify patches you receive in order to merge them, because the code is not
exactly the same in your tree and the submitters'. If you stick strictly to
rule (c), you should ask the submitter to rediff, but this is a totally
counter-productive waste of time and energy. Rule (b) allows you to adjust
the code, but then it is very impolite to change one submitter's code and
make him endorse your bugs. To solve this problem, it is recommended that
you add a line between the last Signed-off-by header and yours, indicating
the nature of your changes. While there is nothing mandatory about this, it
seems like prepending the description with your mail and/or name, all
enclosed in square brackets, is noticeable enough to make it obvious that
you are responsible for last-minute changes. Example::
Signed-off-by: Random J Developer <random@developer.example.org>
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
This practice is particularly helpful if you maintain a stable branch and
want at the same time to credit the author, track changes, merge the fix,
and protect the submitter from complaints. Note that under no circumstances
can you change the author's identity (the From header), as it is the one
which appears in the changelog.
Special note to back-porters: It seems to be a common and useful practice
to insert an indication of the origin of a patch at the top of the commit
message (just after the subject line) to facilitate tracking. For instance,
here's what we see in a 3.x-stable release::
Date: Tue Oct 7 07:26:38 2014 -0400
libata: Un-break ATA blacklist
commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
And here's what might appear in an older kernel once a patch is backported::
Date: Tue May 13 22:12:27 2008 +0200
wireless, airo: waitbusy() won't delay
[backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
Whatever the format, this information provides a valuable help to people
tracking your trees, and to people trying to troubleshoot bugs in your
tree.
12) When to use Acked-by:, Cc:, and Co-developed-by:
-------------------------------------------------------
When to use Acked-by:, Cc:, and Co-developed-by:
------------------------------------------------
The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery path.
......@@ -586,8 +479,8 @@ Example of a patch submitted by a Co-developed-by: author::
Signed-off-by: Submitting Co-Author <sub@coauthor.example.org>
13) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
--------------------------------------------------------------------------
Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
----------------------------------------------------------------------
The Reported-by tag gives credit to people who find bugs and report them and it
hopefully inspires them to help us again in the future. Please note that if
......@@ -650,8 +543,8 @@ for more details.
.. _the_canonical_patch_format:
14) The canonical patch format
------------------------------
The canonical patch format
--------------------------
This section describes how the patch itself should be formatted. Note
that, if you have your patches stored in a ``git`` repository, proper patch
......@@ -773,8 +666,8 @@ references.
.. _explicit_in_reply_to:
15) Explicit In-Reply-To headers
--------------------------------
Explicit In-Reply-To headers
----------------------------
It can be helpful to manually add In-Reply-To: headers to a patch
(e.g., when using ``git send-email``) to associate the patch with
......@@ -787,8 +680,8 @@ helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in
the cover email text) to link to an earlier version of the patch series.
16) Providing base tree information
-----------------------------------
Providing base tree information
-------------------------------
When other developers receive your patches and start the review process,
it is often useful for them to know where in the tree history they
......@@ -838,61 +731,6 @@ either below the ``---`` line or at the very bottom of all other
content, right before your email signature.
17) Sending ``git pull`` requests
---------------------------------
If you have a series of patches, it may be most convenient to have the
maintainer pull them directly into the subsystem repository with a
``git pull`` operation. Note, however, that pulling patches from a developer
requires a higher degree of trust than taking patches from a mailing list.
As a result, many subsystem maintainers are reluctant to take pull
requests, especially from new, unknown developers. If in doubt you can use
the pull request as the cover letter for a normal posting of the patch
series, giving the maintainer the option of using either.
A pull request should have [GIT PULL] in the subject line. The
request itself should include the repository name and the branch of
interest on a single line; it should look something like::
Please pull from
git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
to get these changes:
A pull request should also include an overall message saying what will be
included in the request, a ``git shortlog`` listing of the patches
themselves, and a ``diffstat`` showing the overall effect of the patch series.
The easiest way to get all this information together is, of course, to let
``git`` do it for you with the ``git request-pull`` command.
Some maintainers (including Linus) want to see pull requests from signed
commits; that increases their confidence that the request actually came
from you. Linus, in particular, will not pull from public hosting sites
like GitHub in the absence of a signed tag.
The first step toward creating such tags is to make a GNUPG key and get it
signed by one or more core kernel developers. This step can be hard for
new developers, but there is no way around it. Attending conferences can
be a good way to find developers who can sign your key.
Once you have prepared a patch series in ``git`` that you wish to have somebody
pull, create a signed tag with ``git tag -s``. This will create a new tag
identifying the last commit in the series and containing a signature
created with your private key. You will also have the opportunity to add a
changelog-style message to the tag; this is an ideal place to describe the
effects of the pull request as a whole.
If the tree the maintainer will be pulling from is not the repository you
are working from, don't forget to push the signed tag explicitly to the
public tree.
When generating your pull request, use the signed tag as the target. A
command like this will do the trick::
git request-pull master git://my.public.tree/linux.git my-signed-tag
References
----------
......
......@@ -365,7 +365,7 @@ giving it a high uclamp.min value.
.. note::
Wakeup CPU selection in CFS can be eclipsed by Energy Aware Scheduling
(EAS), which is described in Documentation/scheduling/sched-energy.rst.
(EAS), which is described in Documentation/scheduler/sched-energy.rst.
5.1.3 Load balancing
~~~~~~~~~~~~~~~~~~~~
......
......@@ -331,7 +331,7 @@ asymmetric CPU topologies for now. This requirement is checked at run-time by
looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
domains are built.
See Documentation/sched/sched-capacity.rst for requirements to be met for this
See Documentation/scheduler/sched-capacity.rst for requirements to be met for this
flag to be set in the sched_domain hierarchy.
Please note that EAS is not fundamentally incompatible with SMP, but no
......
......@@ -323,7 +323,6 @@ credentials (the value is simply returned in each case)::
uid_t current_fsuid(void) Current's file access UID
gid_t current_fsgid(void) Current's file access GID
kernel_cap_t current_cap(void) Current's effective capabilities
void *current_security(void) Current's LSM security pointer
struct user_struct *current_user(void) Current's user account
There are also convenience wrappers for retrieving specific associated pairs of
......
......@@ -39,10 +39,9 @@ With the IBM TSS 2 stack::
Or with the Intel TSS 2 stack::
#> tpm2_createprimary --hierarchy o -G rsa2048 -o key.ctxt
#> tpm2_createprimary --hierarchy o -G rsa2048 -c key.ctxt
[...]
handle: 0x800000FF
#> tpm2_evictcontrol -c key.ctxt -p 0x81000001
#> tpm2_evictcontrol -c key.ctxt 0x81000001
persistentHandle: 0x81000001
Usage::
......
......@@ -13,6 +13,7 @@ if sphinx.version_info[0] < 2 or \
else:
from sphinx.errors import NoUri
import re
from itertools import chain
#
# Regex nastiness. Of course.
......@@ -21,7 +22,13 @@ import re
# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
# bit tries to restrict matches to things that won't create trouble.
#
RE_function = re.compile(r'([\w_][\w\d_]+\(\))')
RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
#
# Detects a reference to a documentation page of the form Documentation/... with
# an optional extension
#
RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
#
# Many places in the docs refer to common system calls. It is
......@@ -34,32 +41,59 @@ Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
'socket' ]
#
# Find all occurrences of function() and try to replace them with
# appropriate cross references.
#
def markup_funcs(docname, app, node):
cdom = app.env.domains['c']
def markup_refs(docname, app, node):
t = node.astext()
done = 0
repl = [ ]
for m in RE_function.finditer(t):
#
# Include any text prior to function() as a normal text node.
# Associate each regex with the function that will markup its matches
#
markup_func = {RE_type: markup_c_ref,
RE_function: markup_c_ref,
RE_doc: markup_doc_ref}
match_iterators = [regex.finditer(t) for regex in markup_func]
#
# Sort all references by the starting position in text
#
sorted_matches = sorted(chain(*match_iterators), key=lambda m: m.start())
for m in sorted_matches:
#
# Include any text prior to match as a normal text node.
#
if m.start() > done:
repl.append(nodes.Text(t[done:m.start()]))
#
# Call the function associated with the regex that matched this text and
# append its return to the text
#
repl.append(markup_func[m.re](docname, app, m))
done = m.end()
if done < len(t):
repl.append(nodes.Text(t[done:]))
return repl
#
# Try to replace a C reference (function() or struct/union/enum/typedef
# type_name) with an appropriate cross reference.
#
def markup_c_ref(docname, app, match):
class_str = {RE_function: 'c-func', RE_type: 'c-type'}
reftype_str = {RE_function: 'function', RE_type: 'type'}
cdom = app.env.domains['c']
#
# Go through the dance of getting an xref out of the C domain
#
target = m.group(1)[:-2]
target_text = nodes.Text(target + '()')
target = match.group(2)
target_text = nodes.Text(match.group(0))
xref = None
if target not in Skipfuncs:
lit_text = nodes.literal(classes=['xref', 'c', 'c-func'])
if not (match.re == RE_function and target in Skipfuncs):
lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
lit_text += target_text
pxref = addnodes.pending_xref('', refdomain = 'c',
reftype = 'function',
reftype = reftype_str[match.re],
reftarget = target, modname = None,
classname = None)
#
......@@ -68,21 +102,48 @@ def markup_funcs(docname, app, node):
#
try:
xref = cdom.resolve_xref(app.env, docname, app.builder,
'function', target, pxref, lit_text)
reftype_str[match.re], target, pxref,
lit_text)
except NoUri:
xref = None
#
# Toss the xref into the list if we got it; otherwise just put
# the function text.
# Return the xref if we got it; otherwise just return the plain text.
#
if xref:
repl.append(xref)
return xref
else:
repl.append(target_text)
done = m.end()
if done < len(t):
repl.append(nodes.Text(t[done:]))
return repl
return target_text
#
# Try to replace a documentation reference of the form Documentation/... with a
# cross reference to that page
#
def markup_doc_ref(docname, app, match):
stddom = app.env.domains['std']
#
# Go through the dance of getting an xref out of the std domain
#
target = match.group(1)
xref = None
pxref = addnodes.pending_xref('', refdomain = 'std', reftype = 'doc',
reftarget = target, modname = None,
classname = None, refexplicit = False)
#
# XXX The Latex builder will throw NoUri exceptions here,
# work around that by ignoring them.
#
try:
xref = stddom.resolve_xref(app.env, docname, app.builder, 'doc',
target, pxref, None)
except NoUri:
xref = None
#
# Return the xref if we got it; otherwise just return the plain text.
#
if xref:
return xref
else:
return nodes.Text(match.group(0))
def auto_markup(app, doctree, name):
#
......@@ -97,7 +158,7 @@ def auto_markup(app, doctree, name):
for para in doctree.traverse(nodes.paragraph):
for node in para.traverse(nodes.Text):
if not isinstance(node.parent, nodes.literal):
node.parent.replace(node, markup_funcs(name, app, node))
node.parent.replace(node, markup_refs(name, app, node))
def setup(app):
app.connect('doctree-resolved', auto_markup)
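The matching strategy in ``markup_refs()`` (run every regex over the text, merge the matches by start position, then emit plain text between them) can be tried without Sphinx. The sketch below reuses the three patterns from the patch; the function ``split_refs`` and the labels it returns are hypothetical, and the overlap guard is an addition that is not part of the patch.

```python
import re
from itertools import chain

# Same patterns as in the patch above.
RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')


def split_refs(text):
    """Split text into (kind, fragment) pieces, in document order.

    kind is 'text', 'func', 'type' or 'doc' (hypothetical labels that
    stand in for the docutils nodes built by the real extension).
    """
    kinds = {RE_function: 'func', RE_type: 'type', RE_doc: 'doc'}
    # Collect matches from all regexes and sort them by start position.
    matches = sorted(chain(*(regex.finditer(text) for regex in kinds)),
                     key=lambda m: m.start())
    pieces = []
    done = 0
    for m in matches:
        if m.start() < done:
            continue  # skip overlapping matches (not in the patch)
        if m.start() > done:
            pieces.append(('text', text[done:m.start()]))
        pieces.append((kinds[m.re], m.group(0)))
        done = m.end()
    if done < len(text):
        pieces.append(('text', text[done:]))
    return pieces


sample = ("See Documentation/scheduler/sched-energy.rst and struct device, "
          "then call kfree().")
for kind, fragment in split_refs(sample):
    print(kind, repr(fragment))
```

Keying the dictionary on the compiled pattern works because each match object exposes the pattern that produced it as ``m.re``, which is the same lookup the patch relies on.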
......
......@@ -40,7 +40,7 @@ Synopsis of kprobe_events
MEMADDR : Address where the probe is inserted.
MAXACTIVE : Maximum number of instances of the specified function that
can be probed simultaneously, or 0 for the default value
as defined in Documentation/staging/kprobes.rst section 1.3.1.
as defined in Documentation/trace/kprobes.rst section 1.3.1.
FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG
......
.. This file is dual-licensed: you can use it either under the terms
.. of the GPL 2.0 or the GFDL 1.2 license, at your option. Note that this
.. dual licensing only applies to this file, and not this project as a
.. whole.
..
.. a) This file is free software; you can redistribute it and/or
.. modify it under the terms of the GNU General Public License as
.. published by the Free Software Foundation version 2 of
.. the License.
..
.. This file is distributed in the hope that it will be useful,
.. but WITHOUT ANY WARRANTY; without even the implied warranty of
.. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.. GNU General Public License for more details.
..
.. Or, alternatively,
..
.. b) Permission is granted to copy, distribute and/or modify this
.. document under the terms of the GNU Free Documentation License,
.. Version 1.2 version published by the Free Software
.. Foundation, with no Invariant Sections, no Front-Cover Texts
.. and no Back-Cover Texts. A copy of the license is included at
.. Documentation/userspace-api/media/fdl-appendix.rst.
..
.. TODO: replace it to GPL-2.0 OR GFDL-1.2 WITH no-invariant-sections
.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only
===========================
Lockless Ring Buffer Design
......
......@@ -284,9 +284,10 @@ Andrew Morton의 글이 있다.
여러 메이저 넘버를 갖는 다양한 안정된 커널 트리들
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 자리 숫자로 이루어진 버젼의 커널들은 -stable 커널들이다. 그것들은 해당 메이저
메인라인 릴리즈에서 발견된 큰 회귀들이나 보안 문제들 중 비교적 작고 중요한
수정들을 포함하며, 앞의 두 버전 넘버는 같은 기반 버전을 의미한다.
세개의 버젼 넘버로 이루어진 버젼의 커널들은 -stable 커널들이다. 그것들은 해당
메이저 메인라인 릴리즈에서 발견된 큰 회귀들이나 보안 문제들 중 비교적 작고
중요한 수정들을 포함한다. 주요 stable 시리즈 릴리즈는 세번째 버젼 넘버를
증가시키며 앞의 두 버젼 넘버는 그대로 유지한다.
이것은 가장 최근의 안정적인 커널을 원하는 사용자에게 추천되는 브랜치이며,
개발/실험적 버젼을 테스트하는 것을 돕고자 하는 사용자들과는 별로 관련이 없다.
......@@ -316,7 +317,7 @@ Andrew Morton의 글이 있다.
제안된 패치는 서브시스템 트리에 커밋되기 전에 메일링 리스트를 통해
리뷰된다(아래의 관련 섹션을 참고하기 바란다). 일부 커널 서브시스템의 경우, 이
리뷰 프로세스는 patchwork라는 도구를 통해 추적된다. patchwork은 등록된 패치와
패치에 대한 코멘트, 패치의 버을 볼 수 있는 웹 인터페이스를 제공하고,
패치에 대한 코멘트, 패치의 버을 볼 수 있는 웹 인터페이스를 제공하고,
메인테이너는 패치를 리뷰 중, 리뷰 통과, 또는 반려됨으로 표시할 수 있다.
대부분의 이러한 patchwork 사이트는 https://patchwork.kernel.org/ 에 나열되어
있다.
......
......@@ -91,7 +91,6 @@ Documentation/memory-barriers.txt
- 컴파일러 배리어.
- CPU 메모리 배리어.
- MMIO 쓰기 배리어.
(*) 암묵적 커널 메모리 배리어.
......@@ -103,7 +102,6 @@ Documentation/memory-barriers.txt
(*) CPU 간 ACQUIRING 배리어의 효과.
- Acquire vs 메모리 액세스.
- Acquire vs I/O 액세스.
(*) 메모리 배리어가 필요한 곳
......@@ -515,14 +513,13 @@ CPU 에게 기대할 수 있는 최소한의 보장사항 몇가지가 있습니
완료되기 전에 행해진 것처럼 보일 수 있습니다.
ACQUIRE 와 RELEASE 오퍼레이션의 사용은 일반적으로 다른 메모리 배리어의
필요성을 없앱니다 (하지만 "MMIO 쓰기 배리어" 서브섹션에서 설명되는 예외를
알아두세요). 또한, RELEASE+ACQUIRE 조합은 범용 메모리 배리어처럼 동작할
것을 보장하지 -않습니다-. 하지만, 어떤 변수에 대한 RELEASE 오퍼레이션을
앞서는 메모리 액세스들의 수행 결과는 이 RELEASE 오퍼레이션을 뒤이어 같은
변수에 대해 수행된 ACQUIRE 오퍼레이션을 뒤따르는 메모리 액세스에는 보여질
것이 보장됩니다. 다르게 말하자면, 주어진 변수의 크리티컬 섹션에서는, 해당
변수에 대한 앞의 크리티컬 섹션에서의 모든 액세스들이 완료되었을 것을
보장합니다.
필요성을 없앱니다. 또한, RELEASE+ACQUIRE 조합은 범용 메모리 배리어처럼
동작할 것을 보장하지 -않습니다-. 하지만, 어떤 변수에 대한 RELEASE
오퍼레이션을 앞서는 메모리 액세스들의 수행 결과는 이 RELEASE 오퍼레이션을
뒤이어 같은 변수에 대해 수행된 ACQUIRE 오퍼레이션을 뒤따르는 메모리
액세스에는 보여질 것이 보장됩니다. 다르게 말하자면, 주어진 변수의
크리티컬 섹션에서는, 해당 변수에 대한 앞의 크리티컬 섹션에서의 모든
액세스들이 완료되었을 것을 보장합니다.
즉, ACQUIRE 는 최소한의 "취득" 동작처럼, 그리고 RELEASE 는 최소한의 "공개"
처럼 동작한다는 의미입니다.
......@@ -1501,8 +1498,6 @@ u 로의 스토어를 cpu1() 의 v 로부터의 로드 뒤에 일어난 것으
(*) CPU 메모리 배리어.
(*) MMIO 쓰기 배리어.
컴파일러 배리어
---------------
......@@ -1909,6 +1904,19 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
"커널 I/O 배리어의 효과" 섹션을, consistent memory 에 대한 자세한 내용을
위해선 Documentation/core-api/dma-api.rst 문서를 참고하세요.
(*) pmem_wmb();
이것은 persistent memory 를 위한 것으로, persistent 저장소에 가해진 변경
사항이 플랫폼 연속성 도메인에 도달했을 것을 보장하기 위한 것입니다.
예를 들어, 임시적이지 않은 pmem 영역으로의 쓰기 후, 우리는 쓰기가 플랫폼
연속성 도메인에 도달했을 것을 보장하기 위해 pmem_wmb() 를 사용합니다.
이는 쓰기가 뒤따르는 instruction 들이 유발하는 어떠한 데이터 액세스나
데이터 전송의 시작 전에 persistent 저장소를 업데이트 했을 것을 보장합니다.
이는 wmb() 에 의해 이뤄지는 순서 규칙을 포함합니다.
Persistent memory 에서의 로드를 위해선 현재의 읽기 메모리 배리어로도 읽기
순서를 보장하는데 충분합니다.
=========================
암묵적 커널 메모리 배리어
......
.. include:: ../disclaimer-zh_CN.rst
:Original: :ref:`Documentation/arm64/amu.rst <amu_index>`
Translator: Bailu Lin <bailu.lin@vivo.com>
=================================
AArch64 Linux 中扩展的活动监控单元
=================================
作者: Ionela Voinescu <ionela.voinescu@arm.com>
日期: 2019-09-10
本文档简要描述了 AArch64 Linux 支持的活动监控单元的规范。
架构总述
--------
活动监控是 ARMv8.4 CPU 架构引入的一个可选扩展特性。
活动监控单元(在每个 CPU 中实现)为系统管理提供了性能计数器。既可以通
过系统寄存器的方式访问计数器,同时也支持外部内存映射的方式访问计数器。
AMUv1 架构实现了一个由4个固定的64位事件计数器组成的计数器组。
- CPU 周期计数器:同 CPU 的频率增长
- 常量计数器:同固定的系统时钟频率增长
- 淘汰指令计数器: 同每次架构指令执行增长
- 内存停顿周期计数器:计算由在时钟域内的最后一级缓存中未命中而引起
的指令调度停顿周期数
当处于 WFI 或者 WFE 状态时,计数器不会增长。
AMU 架构提供了一个高达16位的事件计数器空间,未来新的 AMU 版本中可能
用它来实现新增的事件计数器。
另外,AMUv1 实现了一个多达16个64位辅助事件计数器的计数器组。
冷复位时所有的计数器会清零。
基本支持
--------
内核可以安全地运行在支持 AMU 和不支持 AMU 的 CPU 组合中。
因此,当配置 CONFIG_ARM64_AMU_EXTN 后我们无条件使能后续
(secondary or hotplugged) CPU 检测和使用这个特性。
当在 CPU 上检测到该特性时,我们会标记为特性可用但是不能保证计数器的功能,
仅表明有扩展属性。
固件(代码运行在高异常级别,例如 arm-tf )需支持以下功能:
- 提供低异常级别(EL2 和 EL1)访问 AMU 寄存器的能力。
- 使能计数器。如果未使能,它的值应为 0。
- 在从电源关闭状态启动 CPU 前或后保存或者恢复计数器。
当使用使能了该特性的内核启动但固件损坏时,访问计数器寄存器可能会遭遇
panic 或者死锁。即使未发现这些症状,计数器寄存器返回的数据结果并不一
定能反映真实情况。通常,计数器会返回 0,表明他们未被使能。
如果固件没有提供适当的支持最好关闭 CONFIG_ARM64_AMU_EXTN。
值得注意的是,出于安全原因,不要绕过 AMUSERRENR_EL0 设置而捕获从
EL0(用户空间) 访问 EL1(内核空间)。 因此,固件应该确保访问 AMU寄存器
不会困在 EL2或EL3。
AMUv1 的固定计数器可以通过如下系统寄存器访问:
- SYS_AMEVCNTR0_CORE_EL0
- SYS_AMEVCNTR0_CONST_EL0
- SYS_AMEVCNTR0_INST_RET_EL0
- SYS_AMEVCNTR0_MEM_STALL_EL0
特定辅助计数器可以通过 SYS_AMEVCNTR1_EL0(n) 访问,其中n介于0到15。
详细信息定义在目录:arch/arm64/include/asm/sysreg.h。
用户空间访问
------------
由于以下原因,当前禁止从用户空间访问 AMU 的寄存器:
- 安全因数:可能会暴露处于安全模式执行的代码信息。
- 意愿:AMU 是用于系统管理的。
同样,该功能对用户空间不可见。
虚拟化
------
由于以下原因,当前禁止从 KVM 客户端的用户空间(EL0)和内核空间(EL1)
访问 AMU 的寄存器:
- 安全因数:可能会暴露给其他客户端或主机端执行的代码信息。
任何试图访问 AMU 寄存器的行为都会触发一个注册在客户端的未定义异常。
.. include:: ../disclaimer-zh_CN.rst
:Original: :ref:`Documentation/arm64/index.rst <arm64_index>`
:Translator: Bailu Lin <bailu.lin@vivo.com>
.. _cn_arm64_index:
==========
ARM64 架构
==========
.. toctree::
:maxdepth: 2
amu
......@@ -154,14 +154,13 @@ sysfs 会为这个类型调用适当的方法。当一个文件被读写时,
示例:
#define to_dev(obj) container_of(obj, struct device, kobj)
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
char *buf)
{
struct device_attribute *dev_attr = to_dev_attr(attr);
struct device *dev = to_dev(kobj);
struct device *dev = kobj_to_dev(kobj);
ssize_t ret = -EIO;
if (dev_attr->show)
......
......@@ -19,6 +19,7 @@
admin-guide/index
process/index
filesystems/index
arm64/index
目录和表格
----------
......
......@@ -8,7 +8,7 @@ Linux Virtualization Support
:maxdepth: 2
kvm/index
uml/user_mode_linux
uml/user_mode_linux_howto_v2
paravirt_ops
guest-halt-polling
......
......@@ -53,11 +53,11 @@ key management interface to perform common hypervisor activities such as
encrypting bootstrap code, snapshot, migrating and debugging the guest. For more
information, see the SEV Key Management spec [api-spec]_
The main ioctl to access SEV is KVM_MEM_ENCRYPT_OP. If the argument
to KVM_MEM_ENCRYPT_OP is NULL, the ioctl returns 0 if SEV is enabled
The main ioctl to access SEV is KVM_MEMORY_ENCRYPT_OP. If the argument
to KVM_MEMORY_ENCRYPT_OP is NULL, the ioctl returns 0 if SEV is enabled
and ``ENOTTY`` if it is disabled (on some older versions of Linux,
the ioctl runs normally even with a NULL argument, and therefore will
likely return ``EFAULT``). If non-NULL, the argument to KVM_MEM_ENCRYPT_OP
likely return ``EFAULT``). If non-NULL, the argument to KVM_MEMORY_ENCRYPT_OP
must be a struct kvm_sev_cmd::
struct kvm_sev_cmd {
......
......@@ -4211,7 +4211,7 @@ H_GET_CPU_CHARACTERISTICS hypercall.
:Capability: basic
:Architectures: x86
:Type: system
:Type: vm
:Parameters: an opaque platform specific structure (in/out)
:Returns: 0 on success; -1 on error
......@@ -4343,7 +4343,7 @@ Errors:
#define KVM_STATE_NESTED_VMX_SMM_GUEST_MODE 0x00000001
#define KVM_STATE_NESTED_VMX_SMM_VMXON 0x00000002
#define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001
#define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001
struct kvm_vmx_nested_state_hdr {
__u64 vmxon_pa;
......
......@@ -78,7 +78,7 @@ KVM_FEATURE_PV_SEND_IPI 11 guest checks this feature bit
before enabling paravirtualized
send IPIs
KVM_FEATURE_PV_POLL_CONTROL 12 host-side polling on HLT can
KVM_FEATURE_POLL_CONTROL 12 host-side polling on HLT can
be disabled by writing
to msr 0x4b564d05.
......
.. SPDX-License-Identifier: GPL-2.0
#########
UML HowTo
#########
.. contents:: :local:
************
Introduction
************
Welcome to User Mode Linux
User Mode Linux is the first Open Source virtualization platform (first
release date 1991) and second virtualization platform for an x86 PC.
How is UML Different from a VM using Virtualization package X?
==============================================================
We have come to assume that virtualization also means some level of
hardware emulation. In fact, it does not. As long as a virtualization
package provides the OS with devices which the OS can recognize and
has a driver for, the devices do not need to emulate real hardware.
Most OSes today have built-in support for a number of "fake"
devices used only under virtualization.
User Mode Linux takes this concept to the ultimate extreme - there
is not a single real device in sight. It is 100% artificial or if
we use the correct term 100% paravirtual. All UML devices are abstract
concepts which map onto something provided by the host - files, sockets,
pipes, etc.
The other major difference between UML and various virtualization
packages is that there is a distinct difference between the way the UML
kernel and the UML programs operate.
The UML kernel is just a process running on Linux - same as any other
program. It can be run by an unprivileged user and it does not require
anything in terms of special CPU features.
The UML userspace, however, is a bit different. The Linux kernel on the
host machine assists UML in intercepting everything the program running
on a UML instance is trying to do and making the UML kernel handle all
of its requests.
This is different from other virtualization packages which do not make any
difference between the guest kernel and guest programs. This difference
results in a number of advantages and disadvantages of UML over let's say
QEMU which we will cover later in this document.
Why Would I Want User Mode Linux?
=================================
* If User Mode Linux kernel crashes, your host kernel is still fine. It
is not accelerated in any way (vhost, kvm, etc) and it is not trying to
access any devices directly. It is, in fact, a process like any other.
* You can run a usermode kernel as a non-root user (you may need to
arrange appropriate permissions for some devices).
* You can run a very small VM with a minimal footprint for a specific
task (for example 32M or less).
* You can get extremely high performance for anything which is a "kernel
specific task" such as forwarding, firewalling, etc while still being
isolated from the host kernel.
* You can play with kernel concepts without breaking things.
* You are not bound by "emulating" hardware, so you can try weird and
wonderful concepts which are very difficult to support when emulating
real hardware such as time travel and making your system clock
dependent on what UML does (very useful for things like tests).
* It's fun.
Why not to run UML
==================
* The syscall interception technique used by UML makes it inherently
slower for any userspace applications. While it can do kernel tasks
on par with most other virtualization packages, its userspace is
**slow**. The root cause is that UML has a very high cost of creating
new processes and threads (something most Unix/Linux applications
take for granted).
* UML is strictly uniprocessor at present. If you want to run an
application which needs many CPUs to function, it is clearly the
wrong choice.
***********************
Building a UML instance
***********************
There is no UML installer in any distribution. While you can use off
the shelf install media to install into a blank VM using a virtualization
package, there is no UML equivalent. You have to use appropriate tools on
your host to build a viable filesystem image.
This is extremely easy on Debian - you can do it using debootstrap. It is
also easy on OpenWRT - the build process can build UML images. All other
distros - YMMV.
Creating an image
=================
Create a sparse raw disk image::
# dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G
This will create a 16G disk image. The OS will initially allocate only one
block and will allocate more as they are written by UML. As of kernel
version 4.19 UML fully supports TRIM (as usually used by flash drives).
Using TRIM inside the UML image by specifying discard as a mount option
or by running ``tune2fs -o discard /dev/ubdXX`` will request UML to
return any unused blocks to the OS.
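The sparse allocation described above can be observed from a script. The following is a minimal sketch, assuming a Unix host whose filesystem supports sparse files; ``create_sparse_image`` is a hypothetical helper, and it uses ``truncate()`` rather than ``dd``, so the file is exactly the requested size. A small size is used to keep the demonstration quick.

```python
import os
import tempfile


def create_sparse_image(path, size_bytes):
    """Create a file of size_bytes without writing any data blocks."""
    with open(path, "wb") as image:
        image.truncate(size_bytes)


with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "disk_image_name")
    create_sparse_image(image_path, 16 * 1024 * 1024)  # 16 MiB for the demo
    info = os.stat(image_path)
    # st_size is the apparent size; st_blocks counts 512-byte units
    # that are actually allocated on disk.
    print("apparent size:", info.st_size)
    print("allocated bytes:", info.st_blocks * 512)
```

On a filesystem with sparse file support, the allocated byte count should be far smaller than the apparent size until UML starts writing to the image.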
Create a filesystem on the disk image and mount it::
# mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt
This example uses ext4, any other filesystem such as ext3, btrfs, xfs,
jfs, etc will work too.
Create a minimal OS installation on the mounted filesystem::
# debootstrap buster /mnt http://deb.debian.org/debian
debootstrap does not set up the root password, fstab, hostname or
anything related to networking. It is up to the user to do that.
Set the root password - the easiest way to do that is to chroot into the
mounted image::
# chroot /mnt
# passwd
# exit
Edit key system files
=====================
UML block devices are called ubds. The fstab created by debootstrap
will be empty and it needs an entry for the root file system::
/dev/ubd0 / ext4 discard,errors=remount-ro 0 1
The image hostname will be set to the same as the host on which you
are creating the image. It is a good idea to change that to avoid
"Oh, bummer, I rebooted the wrong machine".
UML supports two classes of network devices. The older uml_net devices,
which are scheduled for obsoletion, are called ethX. The newer vector IO
devices are significantly faster and support some standard virtual
network encapsulations like Ethernet over GRE and Ethernet over L2TPv3.
These are called vecX.
Depending on which one is in use, ``/etc/network/interfaces`` will
need entries like::
# legacy UML network devices
auto eth0
iface eth0 inet dhcp
# vector UML network devices
auto vec0
iface vec0 inet dhcp
We now have a UML image which is nearly ready to run, all we need is a
UML kernel and modules for it.
Most distributions have a UML package. Even if you intend to use your own
kernel, testing the image with a stock one is always a good start. These
packages come with a set of modules which should be copied to the target
filesystem. The location is distribution dependent. For Debian these
reside under /usr/lib/uml/modules. Copy recursively the content of this
directory to the mounted UML filesystem::
# cp -rax /usr/lib/uml/modules /mnt/lib/modules
If you have compiled your own kernel, you need to use the usual "install
modules to a location" procedure by running::
# make ARCH=um modules_install INSTALL_MOD_PATH=/mnt
At this point the image is ready to be brought up.
*************************
Setting Up UML Networking
*************************
UML networking is designed to emulate an Ethernet connection. This
connection may be either a point-to-point (similar to a connection
between machines using a back-to-back cable) or a connection to a
switch. UML supports a wide variety of means to build these
connections to all of: local machine, remote machine(s), local and
remote UML and other VM instances.
+-----------+--------+------------------------------------+------------+
| Transport | Type   | Capabilities                       | Throughput |
+===========+========+====================================+============+
| tap       | vector | checksum, tso                      | > 8Gbit    |
+-----------+--------+------------------------------------+------------+
| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| Eol2tpv3  | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| fd        | vector | dependent on fd type               | varies     |
+-----------+--------+------------------------------------+------------+
| tuntap    | legacy | none                               | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| daemon    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| socket    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| pcap      | legacy | rx only                            | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| ethertap  | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| vde       | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
* All transports which have tso and checksum offloads can deliver speeds
approaching 10G on TCP streams.
* All transports which have multi-packet rx and/or tx can deliver pps
rates of up to 1Mps or more.
* All legacy transports are generally limited to ~600-700Mbit and 0.05Mps.
* GRE and L2TPv3 allow connections to all of: local machine, remote
machines, remote network devices and remote UML instances.
* Socket allows connections only between UML instances.
* Daemon and bess require running a local switch. This switch may be
connected to the host as well.
Network configuration privileges
================================
The majority of the supported networking modes need ``root`` privileges.
For example, in the legacy tuntap networking mode, users were required
to be part of the group associated with the tunnel device.
For newer network drivers like the vector transports, ``root`` privilege
is required to fire an ioctl to setup the tun interface and/or use
raw sockets where needed.
This can be achieved by granting the user a particular capability instead
of running UML as root. In case of vector transport, a user can add the
capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW``, to the uml binary.
After that, UML can be run with normal user privileges and still have
full networking.
For example::
# sudo setcap cap_net_raw,cap_net_admin+ep linux
Configuring vector transports
===============================
All vector transports support a similar syntax:
If X is the interface number as in vec0, vec1, vec2, etc, the general
syntax for options is::
vecX:transport="Transport Name",option=value,option=value,...,option=value
Common options
--------------
These options are common for all transports:
* ``depth=int`` - sets the queue depth for vector IO. This is the
amount of packets UML will attempt to read or write in a single
system call. The default number is 64 and is generally sufficient
for most applications that need throughput in the 2-4 Gbit range.
Higher speeds may require larger values.
* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.
* ``gro=[0,1]`` - sets GRO on or off. Enables receive/transmit offloads.
The effect of this option depends on the host side support in the transport
which is being configured. In most cases it will enable TCP segmentation and
RX/TX checksumming offloads. The setting must be identical on the host side
and the UML side. The UML kernel will produce warnings if it is not.
For example, GRO is enabled by default on local machine interfaces
(e.g. veth pairs, bridge, etc), so it should be enabled in UML in the
corresponding UML transports (raw, tap, hybrid) in order for networking to
operate correctly.
* ``mtu=int`` - sets the interface MTU
* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
for cases where a packet needs to be re-encapsulated, for instance
into VXLAN.
* ``vec=0`` - disable multipacket io and fall back to packet at a
time mode
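The ``vecX:transport=...,option=value`` syntax is regular enough to assemble programmatically. A minimal sketch follows; ``vector_spec`` is a hypothetical helper that performs no validation, and the option names in the usage line are taken from the tap transport example in this document.

```python
def vector_spec(index, transport, **options):
    """Build a 'vecX:transport=...,option=value' string.

    Hypothetical helper: options are emitted in the order given and
    are not checked against what the transport actually accepts.
    """
    parts = [f"transport={transport}"]
    parts += [f"{name}={value}" for name, value in options.items()]
    return f"vec{index}:" + ",".join(parts)


print(vector_spec(0, "tap", ifname="tap0", depth=128, gro=1))
# vec0:transport=tap,ifname=tap0,depth=128,gro=1
```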
Shared Options
--------------
* ``ifname=str`` Transports which bind to a local network interface
have a shared option - the name of the interface to bind to.
* ``src, dst, src_port, dst_port`` - all transports which use sockets
which have the notion of source and destination and/or source port
and destination port use these to specify them.
* ``v6=[0,1]`` to specify if a v6 connection is desired for all
transports which operate over IP. Additionally, for transports that
have some differences in the way they operate over v4 and v6 (for example
EoL2TPv3), sets the correct mode of operation. In the absence of this
option, the socket type is determined based on what the src and dst
arguments resolve/parse to.
tap transport
-------------
Example::
vecX:transport=tap,ifname=tap0,depth=128,gro=1
This will connect vecX to tap0 on the host. tap0 must already exist (for example
created using tunctl) and be UP.
tap0 can be configured as a point-to-point interface and given an ip
address so that UML can talk to the host. Alternatively, it is possible
to connect UML to a tap interface which is connected to a bridge.
While tap relies on the vector infrastructure, it is not a true vector
transport at this point, because Linux does not support multi-packet
IO on tap file descriptors for normal userspace apps like UML. This
is a privilege which is offered only to something which can hook up
to it at kernel level via specialized interfaces like vhost-net. A
vhost-net like helper for UML is planned at some point in the future.
Privileges required: tap transport requires either:
* tap interface to exist and be created persistent and owned by the
UML user using tunctl. Example ``tunctl -u uml-user -t tap0``
* binary to have ``CAP_NET_ADMIN`` privilege
hybrid transport
----------------
Example::
vecX:transport=hybrid,ifname=tap0,depth=128,gro=1
This is an experimental/demo transport which couples tap for transmit
and a raw socket for receive. The raw socket allows multi-packet
receive, resulting in significantly higher packet rates than normal tap.
Privileges required: hybrid requires ``CAP_NET_RAW`` capability by
the UML user as well as the requirements for the tap transport.
raw socket transport
--------------------
Example::
vecX:transport=raw,ifname=p-veth0,depth=128,gro=1
This transport uses vector IO on raw sockets. While you can bind to any
interface including a physical one, the most common use is to bind to
the "peer" side of a veth pair with the other side configured on the
host.
Example host configuration for Debian:
**/etc/network/interfaces**::
auto veth0
iface veth0 inet static
address 192.168.4.1
netmask 255.255.255.252
broadcast 192.168.4.3
pre-up ip link add veth0 type veth peer name p-veth0 && \
ifconfig p-veth0 up
UML can now bind to p-veth0 like this::
vec0:transport=raw,ifname=p-veth0,depth=128,gro=1
If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.0
it can talk to the host on 192.168.4.1
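The addressing in the host configuration above can be checked with Python's ``ipaddress`` module. This is only a worked example of the arithmetic: the 255.255.255.252 netmask on veth0 describes a four-address network whose two usable hosts are the host side (192.168.4.1) and the UML guest (192.168.4.2).

```python
import ipaddress

# Host side of the veth pair, as configured in /etc/network/interfaces.
host_side = ipaddress.ip_interface("192.168.4.1/255.255.255.252")
network = host_side.network

print(network)                              # 192.168.4.0/30
print([str(a) for a in network.hosts()])    # ['192.168.4.1', '192.168.4.2']
print(network.broadcast_address)            # 192.168.4.3
```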
The raw transport also provides some support for offloading some of the
filtering to the host. The two options to control it are:
* ``bpffile=str`` filename of raw bpf code to be loaded as a socket filter
* ``bpfflash=int`` 0/1 allow loading of bpf from inside User Mode Linux.
This option allows the use of the ethtool load firmware command to
load bpf code.
In either case the bpf code is loaded into the host kernel. While this is
presently limited to legacy bpf syntax (not ebpf), it is still a security
risk. It is not recommended to allow this unless the User Mode Linux
instance is considered trusted.
Privileges required: raw socket transport requires ``CAP_NET_RAW``
capability.
GRE socket transport
--------------------
Example::
vecX:transport=gre,src=$src_host,dst=$dst_host
This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
endpoint at host dst_host. ``GRE`` supports the following additional
options:
* ``rx_key=int`` - GRE 32 bit integer key for rx packets, if set,
``tx_key`` must be set too
* ``tx_key=int`` - GRE 32 bit integer key for tx packets, if set,
``rx_key`` must be set too
* ``sequence=[0,1]`` - enable GRE sequence
* ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
on each packet (needed to interoperate with some really broken
implementations)
* ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively
* GRE checksum is not presently supported
GRE has a number of caveats:
* You can use only one GRE connection per ip address. There is no way to
multiplex connections as each GRE tunnel is terminated directly on
the UML instance.
* The key is not really a security feature. While it was intended as such,
its "security" is laughable. It is, however, a useful feature to
ensure that the tunnel is not misconfigured.
An example configuration for a Linux host with a local address of
192.168.128.1 to connect to a UML instance at 192.168.129.1
**/etc/network/interfaces**::
auto gt0
iface gt0 inet static
address 10.0.0.1
netmask 255.255.255.0
broadcast 10.0.0.255
mtu 1500
pre-up ip link add gt0 type gretap local 192.168.128.1 \
remote 192.168.129.1 || true
down ip link del gt0 || true
Additionally, GRE has been tested versus a variety of network equipment.
Privileges required: GRE requires ``CAP_NET_RAW``
l2tpv3 socket transport
-----------------------
_Warning_. L2TPv3 has a "bug". It is the "bug" known as "has more
options than GNU ls". While it has some advantages, there are usually
easier (and less verbose) ways to connect a UML instance to something.
For example, most devices which support L2TPv3 also support GRE.
Example::
vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff
This will configure an Ethernet over L2TPv3 fixed tunnel which will
connect the UML instance to a L2TPv3 endpoint at host $dst_host using
the L2TPv3 UDP flavour and UDP destination port $dst_port.
L2TPv3 always requires the following additional options:
* ``rx_session=int`` - l2tpv3 32 bit integer session for rx packets
* ``tx_session=int`` - l2tpv3 32 bit integer session for tx packets
As the tunnel is fixed these are not negotiated and they are
preconfigured on both ends.
Additionally, L2TPv3 supports the following optional parameters:
* ``rx_cookie=int`` - l2tpv3 32 bit integer cookie for rx packets - same
functionality as GRE key, more to prevent misconfiguration than provide
actual security
* ``tx_cookie=int`` - l2tpv3 32 bit integer cookie for tx packets
* ``cookie64=[0,1]`` - use 64 bit cookies instead of 32 bit.
* ``counter=[0,1]`` - enable l2tpv3 counter
* ``pin_counter=[0,1]`` - pretend that the counter is always reset on
each packet (needed to interoperate with some really broken
implementations)
* ``v6=[0,1]`` - force v6 sockets
* ``udp=[0,1]`` - use raw sockets (0) or UDP (1) version of the protocol
L2TPv3 has a number of caveats:
* You can use only one connection per IP address in raw mode. There is
no way to multiplex connections as each L2TPv3 tunnel is terminated
directly on the UML instance. UDP mode can use different ports for
this purpose.
Here is an example of how to configure a Linux host to connect to UML
via L2TPv3:
**/etc/network/interfaces**::
auto l2tp1
iface l2tp1 inet static
address 192.168.126.1
netmask 255.255.255.0
broadcast 192.168.126.255
mtu 1500
pre-up ip l2tp add tunnel remote 127.0.0.1 \
local 127.0.0.1 encap udp tunnel_id 2 \
peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
ip l2tp add session name l2tp1 tunnel_id 2 \
session_id 0xffffffff peer_session_id 0xffffffff
down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
ip l2tp del tunnel tunnel_id 2
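For orientation, the UML side matching the host configuration above might look like the following sketch. It assumes that the UML source port must equal the host's ``udp_dport`` and the UML destination port the host's ``udp_sport``, and that the same session ID is used in both directions. The variable names are illustrative only.

```shell
# Values taken from the host-side example above.
host_sport=1706
host_dport=1707
session=0xffffffff

# Assumption: the ports are mirrored between the two ends of the tunnel.
uml_vec="vec0:transport=l2tpv3,udp=1,src=127.0.0.1,dst=127.0.0.1"
uml_vec="$uml_vec,srcport=$host_dport,dstport=$host_sport"
uml_vec="$uml_vec,rx_session=$session,tx_session=$session"

# Print the argument so it can be appended to the UML command line.
printf '%s\n' "$uml_vec"
```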
Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
no special privileges for the UDP mode.
BESS socket transport
---------------------
BESS is a high performance modular network switch.
https://github.com/NetSys/bess
It has support for a simple sequential packet socket mode which in the
more recent versions is using vector IO for high performance.
Example::
vecX:transport=bess,src=$unix_src,dst=$unix_dst
This will configure a BESS transport using the unix_src Unix domain
socket address as source and unix_dst socket address as destination.
For BESS configuration and how to allocate a BESS Unix domain socket port
please see the BESS documentation.
https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports
BESS transport does not require any special privileges.
Configuring Legacy transports
=============================
Legacy transports are now considered obsolete. Please use the vector
versions.
***********
Running UML
***********
This section assumes that either the user-mode-linux package from the
distribution or a custom built kernel has been installed on the host.
These add an executable called linux to the system. This is the UML
kernel. It can be run just like any other executable.
It will take most normal linux kernel arguments as command line
arguments. Additionally, it will need some UML specific arguments
in order to do something useful.
Arguments
=========
Mandatory Arguments:
--------------------
* ``mem=int[K,M,G]`` - amount of memory. By default bytes. It will
also accept K, M or G qualifiers.
* ``ubdX[s,d,c,t]=`` virtual disk specification. This is not really
mandatory, but it is likely to be needed in nearly all cases so we can
specify a root file system.
The simplest possible image specification is the name of the image
file for the filesystem (created using one of the methods described
in `Creating an image`_)
* UBD devices support copy on write (COW). The changes are kept in
a separate file which can be discarded allowing a rollback to the
original pristine image. If COW is desired, the UBD image is
specified as: ``cow_file,master_image``.
Example: ``ubd0=Filesystem.cow,Filesystem.img``
* UBD devices can be set to use synchronous IO. Any writes are
immediately flushed to disk. This is done by adding ``s`` after
the ``ubdX`` specification
* UBD performs some heuristics on devices specified as a single
filename to make sure that a COW file has not been specified as
the image. To turn them off, use the ``d`` flag after ``ubdX``
* UBD supports TRIM - asking the Host OS to reclaim any unused
blocks in the image. To turn it off, specify the ``t`` flag after
``ubdX``
* ``root=`` root device - most likely ``/dev/ubd0`` (this is a Linux
filesystem image)
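The ``ubdX`` variants described above combine as shown in the following sketch. ``ubd_arg`` is a hypothetical helper that only formats the argument, and ``Data.img`` is a made-up file name. It assumes the flag letters go directly after the device number, as the list describes.

```shell
# ubd_arg INDEX FLAGS IMAGE [COW_FILE]  (hypothetical helper)
# Prints a ubd specification. FLAGS may be empty or any of s, d, c, t.
ubd_arg() {
    if [ -n "${4:-}" ]; then
        # COW form: the COW file comes first, then the master image.
        printf 'ubd%s%s=%s,%s\n' "$1" "$2" "$4" "$3"
    else
        printf 'ubd%s%s=%s\n' "$1" "$2" "$3"
    fi
}

ubd_arg 0 "" Filesystem.img Filesystem.cow   # COW layered on a master image
ubd_arg 1 s Data.img                         # synchronous IO
```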
Important Optional Arguments
----------------------------
If UML is run as "linux" with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
Linux distributions). This makes it nice and easy to use UML on a host
with a GUI. It is, however, the wrong approach if UML is to be used as a
testing harness or run in a text-only environment.
In order to change this behaviour we need to specify an alternative console
and wire it to one of the supported "line" channels. For this we need to map a
console to use something different from the default xterm.
Example which will divert console number 1 to stdin/stdout::
con1=fd:0,fd:1
UML supports a wide variety of serial line channels which are specified using
the following syntax
conX=channel_type:options[,channel_type:options]
If the channel specification contains two parts separated by comma, the first
one is input, the second one output.
* The null channel - Discard all input or output. Example ``con=null`` will set
all consoles to null by default.
* The fd channel - use file descriptor numbers for input/output. Example:
``con1=fd:0,fd:1``.
* The port channel - listen on tcp port number. Example: ``con1=port:4321``
* The pty and pts channels - use system pty/pts.
* The tty channel - bind to an existing system tty. Example: ``con1=tty:/dev/tty8``
will make UML use the host's 8th console (usually unused).
* The xterm channel - this is the default - bring up an xterm on this channel
and direct IO to it. Note, that in order for xterm to work, the host must
have the UML distribution package installed. This usually contains the
port-helper and other utilities needed for UML to communicate with the xterm.
Alternatively, these need to be compiled and installed from source. All
options applicable to consoles also apply to UML serial lines which are
presented as ttyS inside UML.
Starting UML
============
We can now run UML.
::
# linux mem=2048M umid=TEST \
ubd0=Filesystem.img \
vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1
This will run an instance with ``2048M RAM``, try to use the image file
called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con1`` will be disabled and console 1 will
use standard input/output, making it appear in the terminal it was started from.
Logging in
============
If you have not set up a password when generating the image, you will have to
shut down the UML instance, mount the image, chroot into it and set it - as
described in the `Creating an image`_ section. If the password is already set,
you can just log in.
The UML Management Console
============================
In addition to managing the image from "the inside" using normal sysadmin tools,
it is possible to perform a number of low level operations using the UML
management console. The UML management console is a low-level interface to the
kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
there is a full-blown operating system under UML, there is much greater
flexibility possible than with the SysRq mechanism.
There are a number of things you can do with the mconsole interface:
* get the kernel version
* add and remove devices
* halt or reboot the machine
* Send SysRq commands
* Pause and resume the UML
* Inspect processes running inside UML
* Inspect UML internal /proc state
You need the mconsole client (uml\_mconsole) which is a part of the UML
tools package available in most Linux distributions.
You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
kernel. When you boot UML, you'll see a line like::
mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
If you specify a unique machine id on the UML command line, e.g.
``umid=debian``, you'll see this::
mconsole initialized on /home/jdike/.uml/debian/mconsole
That file is the socket that uml_mconsole will use to communicate with
UML. Run it with either the umid or the full path as its argument::
# uml_mconsole debian
or
# uml_mconsole /home/jdike/.uml/debian/mconsole
You'll get a prompt, at which you can run one of these commands:
* version
* help
* halt
* reboot
* config
* remove
* sysrq
* cad
* stop
* go
* proc
* stack
version
-------
This command takes no arguments. It prints the UML version::
(mconsole) version
OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64
There are a couple of actual uses for this. It's a simple no-op which
can be used to check that a UML is running. It's also a way of
sending a device interrupt to the UML. UML mconsole is treated internally as
a UML device.
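Because ``version`` has no side effects, it is a convenient liveness probe for scripts. The sketch below assumes that ``uml_mconsole`` accepts a command after the umid and exits with a non-zero status when the instance cannot be reached. Verify both against the version of the UML utilities you have installed. ``wait_for_uml`` is a hypothetical helper name.

```shell
# wait_for_uml UMID [TRIES]  (hypothetical helper)
# Polls the management console once per second until the instance
# answers the "version" command or the number of tries is exhausted.
wait_for_uml() {
    umid=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # Assumption: a non-zero exit status means "not reachable".
        if uml_mconsole "$umid" version >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A test harness could then call ``wait_for_uml debian 60`` before sending further mconsole commands.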
help
----
This command takes no arguments. It prints a short help screen with the
supported mconsole commands.
halt and reboot
---------------
These commands take no arguments. They shut the machine down immediately, with
no syncing of disks and no clean shutdown of userspace. So, they are
pretty close to crashing the machine::
(mconsole) halt
OK
config
------
"config" adds a new device to the virtual machine. This is supported
by most UML device drivers. It takes one argument, which is the
device to add, with the same syntax as the kernel command line::
(mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22
remove
------
"remove" deletes a device from the system. Its argument is just the
name of the device to be removed. The device must be idle in whatever
sense the driver considers necessary. In the case of the ubd driver,
the removed block device must not be mounted, swapped on, or otherwise
open, and in the case of the network driver, the device must be down::
(mconsole) remove ubd3
sysrq
-----
This command takes one argument, which is a single letter. It calls the
generic kernel's SysRq driver, which does whatever is called for by
that argument. See the SysRq documentation in
Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
see what letters are valid and what they do.
cad
---
This invokes the ``Ctrl-Alt-Del`` action in the running image. What exactly
this ends up doing is up to init, systemd, etc. Normally, it reboots the
machine.
stop
----
This puts the UML in a loop reading mconsole requests until a 'go'
mconsole command is received. This is very useful as a
debugging/snapshotting tool.
go
--
This resumes a UML after being paused by a 'stop' command. Note that
when the UML has resumed, TCP connections may have timed out and if
the UML is paused for a long period of time, crond might go a little
crazy, running all the jobs it didn't do earlier.
proc
----
This takes one argument - the name of a file in /proc which is printed
to the mconsole standard output.
stack
-----
This takes one argument - the pid number of a process. Its stack is
printed to standard output.
*******************
Advanced UML Topics
*******************
Sharing Filesystems between Virtual Machines
============================================
Don't attempt to share filesystems simply by booting two UMLs from the
same file. That's the same thing as booting two physical machines
from a shared disk. It will result in filesystem corruption.
Using layered block devices
---------------------------
The way to share a filesystem between two virtual machines is to use
the copy-on-write (COW) layering capability of the ubd block driver.
Any changed blocks are stored in the private COW file, while reads come
from either device - the private one if the requested block is valid in
it, the shared one if not. Using this scheme, the majority of data
which is unchanged is shared between an arbitrary number of virtual
machines, each of which has a much smaller file containing the changes
that it has made. With a large number of UMLs booting from a large root
filesystem, this leads to a huge disk space saving.
Sharing file system data will also help performance, since the host will
be able to cache the shared data using a much smaller amount of memory,
so UML disk requests will be served from the host's memory rather than
its disks. There is a major caveat in doing this on multisocket NUMA
machines. On such hardware, running many UML instances with a shared
master image and COW changes may cause issues like NMIs from excessive
inter-socket traffic.
If you are running UML on high end hardware like this, make sure to
bind UML to a set of logical cpus residing on the same socket using the
``taskset`` command or have a look at the "tuning" section.
To add a copy-on-write layer to an existing block device file, simply
add the name of the COW file to the appropriate ubd switch::
ubd0=root_fs_cow,root_fs_debian_22
where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
the existing shared filesystem. The COW file need not exist. If it
doesn't, the driver will create and initialize it.
Disk Usage
----------
UML has TRIM support which will release any unused space in its disk
image files to the underlying OS. It is important to use either ls -ls
or du to verify the actual file size.
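A quick way to compare the apparent size of an image with the space it actually occupies is sketched below. It assumes GNU coreutils (``du --apparent-size``). ``image_usage`` is a hypothetical helper name.

```shell
# image_usage FILE  (hypothetical helper)
# Prints the apparent size and the allocated size of FILE in KiB.
# A sparse or trimmed image shows a much smaller allocated size.
image_usage() {
    apparent=$(du -k --apparent-size "$1" | cut -f1)
    allocated=$(du -k "$1" | cut -f1)
    printf 'apparent=%sK allocated=%sK\n' "$apparent" "$allocated"
}
```

Running ``image_usage Filesystem.img`` before and after deleting files inside UML gives a rough idea of how much space TRIM has returned to the host.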
COW validity.
-------------
Any changes to the master image will invalidate all COW files. If this
happens, UML will *NOT* automatically delete any of the COW files and
will refuse to boot. In this case the only solution is to either
restore the old image (including its last modified timestamp) or remove
all COW files which will result in their recreation. Any changes in
the COW files will be lost.
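Since the last modified timestamp of the master image matters, it can help to record it before handling the image. This is a minimal sketch, assuming GNU ``stat`` and ``touch`` and assuming that second granularity is sufficient. The helper names and the ``.mtime`` side file are hypothetical.

```shell
# save_mtime FILE  (hypothetical helper)
# Stores the modification time of FILE, in seconds since the epoch,
# in FILE.mtime.
save_mtime() {
    stat -c %Y "$1" > "$1.mtime"
}

# restore_mtime FILE  (hypothetical helper)
# Sets the modification time of FILE back to the saved value.
restore_mtime() {
    touch -m -d "@$(cat "$1.mtime")" "$1"
}
```

Restoring the timestamp is only safe if the image contents are also identical to the original. Doing so on a modified image would defeat the check and risk filesystem corruption.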
Cows can moo - uml_moo : Merging a COW file with its backing file
-----------------------------------------------------------------
Depending on how you use UML and COW devices, it may be advisable to
merge the changes in the COW file into the backing file every once in
a while.
The utility that does this is uml_moo. Its usage is::
uml_moo COW_file new_backing_file
There's no need to specify the backing file since that information is
already in the COW file header. If you're paranoid, boot the new
merged file, and if you're happy with it, move it over the old backing
file.
``uml_moo`` creates a new backing file by default as a safety measure.
It also has a destructive merge option which will merge the COW file
directly into its current backing file. This is really only usable
when the backing file only has one COW file associated with it. If
there are multiple COWs associated with a backing file, a -d merge of
one of them will invalidate all of the others. However, it is
convenient if you're short of disk space, and it should also be
noticeably faster than a non-destructive merge.
``uml_moo`` is installed with the UML distribution packages and is
available as a part of UML utilities.
Host file access
==================
If you want to access files on the host machine from inside UML, you
can treat it as a separate machine and either nfs mount directories
from the host or copy files into the virtual machine with scp.
However, since UML is running on the host, it can access those
files just like any other process and make them available inside the
virtual machine without the need to use the network.
This is possible with the hostfs virtual filesystem. With it, you
can mount a host directory into the UML filesystem and access the
files contained in it just as you would on the host.
*SECURITY WARNING*
Hostfs without any parameters to the UML Image will allow the image
to mount any part of the host filesystem and write to it. Always
confine hostfs to a specific "harmless" directory (for example ``/var/tmp``)
if running UML. This is especially important if UML is being run as root.
Using hostfs
------------
To begin with, make sure that hostfs is available inside the virtual
machine with::
# cat /proc/filesystems
``hostfs`` should be listed. If it's not, either rebuild the kernel
with hostfs configured into it or make sure that hostfs is built as a
module and available inside the virtual machine, and insmod it.
Now all you need to do is run mount::
# mount none /mnt/host -t hostfs
will mount the host's ``/`` on the virtual machine's ``/mnt/host``.
If you don't want to mount the host root directory, then you can
specify a subdirectory to mount with the -o switch to mount::
# mount none /mnt/home -t hostfs -o /home
will mount the host's /home on the virtual machine's /mnt/home.
hostfs as the root filesystem
-----------------------------
It's possible to boot from a directory hierarchy on the host using
hostfs rather than using the standard filesystem in a file.
To start, you need that hierarchy. The easiest way is to loop mount
an existing root_fs file::
# mount root_fs uml_root_dir -o loop
You need to change the filesystem type of ``/`` in ``etc/fstab`` to be
'hostfs', so that line looks like this::
/dev/ubd/0 / hostfs defaults 1 1
Then you need to chown to yourself all the files in that directory
that are owned by root. This worked for me::
# find . -uid 0 -exec chown jdike {} \;
Next, make sure that your UML kernel has hostfs compiled in, not as a
module. Then run UML with the boot device pointing at that directory::
ubd0=/path/to/uml/root/directory
UML should then boot as it does normally.
Hostfs Caveats
--------------
Hostfs does not support keeping track of host filesystem changes on the
host (outside UML). As a result, if a file is changed without UML's
knowledge, UML will not know about it and its own in-memory cache of
the file may be corrupt. While it is possible to fix this, it is not
something which is being worked on at present.
Tuning UML
============
UML at present is strictly uniprocessor. It will, however, spin up a
number of threads to handle various functions.
The UBD driver, SIGIO and the MMU emulation do that. If the system is
idle, these threads will be migrated to other processors on a SMP host.
This, unfortunately, will usually result in LOWER performance because of
all of the cache/memory synchronization traffic between cores. As a
result, UML will usually benefit from being pinned on a single CPU
especially on a large system. This can result in performance differences
of 5 times or higher on some benchmarks.
Similarly, on large multi-node NUMA systems UML will benefit if all of
its memory is allocated from the same NUMA node it will run on. The
OS will *NOT* do that by default. In order to do that, the sysadmin
needs to create a suitable tmpfs ramdisk bound to a particular node
and use that as the source for UML RAM allocation by specifying it
in the TMP or TEMP environment variables. UML will look at the values
of ``TMPDIR``, ``TMP`` or ``TEMP`` for that. If that fails, it will
look for shmfs mounted under ``/dev/shm``. If everything else fails, it will use
``/tmp/`` regardless of the filesystem type used for it::
mount -t tmpfs -ompol=bind:X none /mnt/tmpfs-nodeX
TEMP=/mnt/tmpfs-nodeX taskset -cX linux options options options..
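Note that the NUMA node and the CPU are separate numbers, even though the example above uses ``X`` for both. The following sketch prints the two commands for a given node and CPU rather than running them. ``uml_numa_cmds`` is a hypothetical helper, and the CPU is assumed to belong to the chosen node (``lscpu`` can show the mapping).

```shell
# uml_numa_cmds NODE CPU [UML_ARGS...]  (hypothetical helper)
# Prints the tmpfs mount and the pinned UML invocation for one NUMA node.
uml_numa_cmds() {
    node=$1
    cpu=$2
    shift 2
    printf 'mount -t tmpfs -o mpol=bind:%s none /mnt/tmpfs-node%s\n' "$node" "$node"
    printf 'TEMP=/mnt/tmpfs-node%s taskset -c %s linux %s\n' "$node" "$cpu" "$*"
}

# Example: node 1, CPU 8 (both values are made up).
uml_numa_cmds 1 8 mem=2048M ubd0=Filesystem.img
```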
*******************************************
Contributing to UML and Developing with UML
*******************************************
UML is an excellent platform to develop new Linux kernel concepts -
filesystems, devices, virtualization, etc. It provides unrivalled
opportunities to create and test them without being constrained to
emulating specific hardware.
Example - want to try how linux will work with 4096 "proper" network
devices?
Not an issue with UML. At the same time, this is something which
is difficult with other virtualization packages - they are
constrained by the number of devices allowed on the hardware bus
they are trying to emulate (for example 16 on a PCI bus in qemu).
If you have something to contribute such as a patch, a bugfix, a
new feature, please send it to ``linux-um@lists.infradead.org``.
Please follow all standard Linux patch guidelines such as cc-ing
relevant maintainers and running ``./scripts/checkpatch.pl`` on your patch.
For more details see ``Documentation/process/submitting-patches.rst``.
Note - the list does not accept HTML or attachments, all emails must
be formatted as plain text.
Developing always goes hand in hand with debugging. First of all,
you can always run UML under gdb, and there is a whole section
later on how to do that. That, however, is not the only way to
debug a linux kernel. Quite often adding tracing statements and/or
using UML specific approaches such as ptracing the UML kernel process
are significantly more informative.
Tracing UML
=============
When running, UML consists of a main kernel thread and a number of
helper threads. The ones of interest for tracing are NOT the ones
that are already ptraced by UML as a part of its MMU emulation.
These are usually the first three threads visible in a ps display.
The one with the lowest PID number and using most CPU is usually the
kernel thread. The other threads are the disk
(ubd) device helper thread and the sigio helper thread.
Running strace on the kernel thread usually results in the following picture::
host$ strace -p 16566
--- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
epoll_wait(4, [], 64, 0) = 0
rt_sigreturn({mask=[PIPE]}) = 16967
ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
ptrace(PTRACE_SYSEMU, 16967, NULL, 0) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
getpid() = 16566
clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
rt_sigreturn({mask=[PIPE]}) = -1 EINTR (Interrupted system call)
This is a typical picture from a mostly idle UML instance.
* UML interrupt controller uses epoll - this is UML waiting for IO
interrupts:
epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
* The sequence of ptrace calls is part of MMU emulation and running the
UML userspace.
* ``timer_settime`` is part of the UML high res timer subsystem mapping
timer requests from inside UML onto the host high resolution timers.
* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
will execute an ACPI idle).
As you can see, UML will generate quite a bit of output even when idle. The output
can be very informative when observing IO. It shows the actual IO calls, their
arguments and return values.
Kernel debugging
================
You can run UML under gdb now, though it will not necessarily agree to
be started under it. If you are trying to track a runtime bug, it is
much better to attach gdb to a running UML instance and let UML run.
Assuming the same PID number as in the previous example, this would be::
# gdb -p 16566
This will STOP the UML instance, so you must enter ``cont`` at the GDB
command line to request it to continue. It may be a good idea to make
this into a gdb script and pass it to gdb as an argument.
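Such a script could be generated as sketched below. Only the ``continue`` command comes from the text above. The file name is arbitrary, and the two ``handle`` lines are an assumption based on UML using SIGSEGV and SIGUSR1 internally, so adjust or drop them as needed.

```shell
# Write a gdb command file that resumes UML right after attaching.
gdb_script=${TMPDIR:-/tmp}/uml-attach.gdb
cat > "$gdb_script" <<'EOF'
handle SIGSEGV pass nostop noprint
handle SIGUSR1 pass nostop noprint
continue
EOF

# Show how the file is meant to be used. <PID> is a placeholder.
echo "run: gdb -p <PID> -x $gdb_script"
```

Once UML is running again, interrupting gdb with Ctrl-C returns to the gdb prompt.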
Developing Device Drivers
=========================
Nearly all UML drivers are monolithic. While it is possible to build a
UML driver as a kernel module, that limits the possible functionality
to in-kernel only and non-UML specific. The reason for this is that
in order to really leverage UML, one needs to write a piece of
userspace code which maps driver concepts onto actual userspace host
calls.
This forms the so called "user" portion of the driver. While it can
reuse a lot of kernel concepts, it is generally just another piece of
userspace code. This portion needs some matching "kernel" code which
resides inside the UML image and which implements the Linux kernel part.
*Note: There are very few limitations in the way "kernel" and "user" interact*.
UML does not have a strictly defined kernel to host API. It does not
try to emulate a specific architecture or bus. UML's "kernel" and
"user" can share memory, code and interact as needed to implement
whatever design the software developer has in mind. The only
limitations are purely technical. Due to a lot of functions and
variables having the same names, the developer should be careful
which includes and libraries they are trying to refer to.
As a result, a lot of userspace code consists of simple wrappers.
For example, ``os_close_file()`` is just a wrapper around ``close()``
which ensures that the userspace function close does not clash
with similarly named function(s) in the kernel part.
Security Considerations
-----------------------
Drivers or any new functionality should default to not
accepting arbitrary filename, bpf code or other parameters
which can affect the host from inside the UML instance.
For example, specifying the socket used for IPC communication
between a driver and the host at the UML command line is OK
security-wise. Allowing it as a loadable module parameter
isn't.
If such functionality is desirable for a particular application
(e.g. loading BPF "firmware" for raw socket network transports),
it should be off by default and should be explicitly turned on
as a command line parameter at startup.
Even with this in mind, the level of isolation between UML
and the host is relatively weak. If the UML userspace is
allowed to load arbitrary kernel drivers, an attacker can
use this to break out of UML. Thus, if UML is used in
a production application, it is recommended that all modules
are loaded at boot and kernel module loading is disabled
afterwards.
.. hmm:
.. _hmm:
=====================================
Heterogeneous Memory Management (HMM)
......@@ -271,10 +271,139 @@ map those pages from the CPU side.
Migration to and from device memory
===================================
Because the CPU cannot access device memory, migration must use the device DMA
engine to perform copy from and to device memory. For this we need to use
migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize() helpers.
Because the CPU cannot access device memory directly, the device driver must
use hardware DMA or device specific load/store instructions to migrate data.
The migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize()
functions are designed to make drivers easier to write and to centralize common
code across drivers.
Before migrating pages to device private memory, special device private
``struct page`` need to be created. These will be used as special "swap"
page table entries so that a CPU process will fault if it tries to access
a page that has been migrated to device private memory.
These can be allocated and freed with::
struct resource *res;
struct dev_pagemap pagemap;
res = request_free_mem_region(&iomem_resource, /* number of bytes */,
"name of driver resource");
pagemap.type = MEMORY_DEVICE_PRIVATE;
pagemap.range.start = res->start;
pagemap.range.end = res->end;
pagemap.nr_range = 1;
pagemap.ops = &device_devmem_ops;
memremap_pages(&pagemap, numa_node_id());
memunmap_pages(&pagemap);
release_mem_region(pagemap.range.start, range_len(&pagemap.range));
There are also devm_request_free_mem_region(), devm_memremap_pages(),
devm_memunmap_pages(), and devm_release_mem_region() when the resources can
be tied to a ``struct device``.
The overall migration steps are similar to migrating NUMA pages within system
memory (see :ref:`Page migration <page_migration>`) but the steps are split
between device driver specific code and shared common code:
1. ``mmap_read_lock()``
The device driver has to pass a ``struct vm_area_struct`` to
migrate_vma_setup() so the mmap_read_lock() or mmap_write_lock() needs to
be held for the duration of the migration.
2. ``migrate_vma_setup(struct migrate_vma *args)``
The device driver initializes the ``struct migrate_vma`` fields and passes
the pointer to migrate_vma_setup(). The ``args->flags`` field is used to
filter which source pages should be migrated. For example, setting
``MIGRATE_VMA_SELECT_SYSTEM`` will only migrate system memory and
``MIGRATE_VMA_SELECT_DEVICE_PRIVATE`` will only migrate pages residing in
device private memory. If the latter flag is set, the ``args->pgmap_owner``
field is used to identify device private pages owned by the driver. This
avoids trying to migrate device private pages residing in other devices.
Currently only anonymous private VMA ranges can be migrated to or from
system memory and device private memory.
One of the first steps migrate_vma_setup() does is to invalidate other
devices' MMUs with the ``mmu_notifier_invalidate_range_start()`` and
``mmu_notifier_invalidate_range_end()`` calls around the page table
walks to fill in the ``args->src`` array with PFNs to be migrated.
The ``invalidate_range_start()`` callback is passed a
``struct mmu_notifier_range`` with the ``event`` field set to
``MMU_NOTIFY_MIGRATE`` and the ``migrate_pgmap_owner`` field set to
the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This
allows the device driver to skip the invalidation callback and only
invalidate device private MMU mappings that are actually migrating.
This is explained more in the next section.
While walking the page tables, a ``pte_none()`` or ``is_zero_pfn()``
entry results in a valid "zero" PFN stored in the ``args->src`` array.
This lets the driver allocate device private memory and clear it instead
of copying a page of zeros. Valid PTE entries to system memory or
device private struct pages will be locked with ``lock_page()``, isolated
from the LRU (if system memory since device private pages are not on
the LRU), unmapped from the process, and a special migration PTE is
inserted in place of the original PTE.
migrate_vma_setup() also clears the ``args->dst`` array.
3. The device driver allocates destination pages and copies source pages to
destination pages.
The driver checks each ``src`` entry to see if the ``MIGRATE_PFN_MIGRATE``
bit is set and skips entries that are not migrating. The device driver
can also choose to skip migrating a page by not filling in the ``dst``
array for that page.
The driver then allocates either a device private struct page or a
system memory page, locks the page with ``lock_page()``, and fills in the
``dst`` array entry with::
dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
Now that the driver knows that this page is being migrated, it can
invalidate device private MMU mappings and copy device private memory
to system memory or another device private page. The core Linux kernel
handles CPU page table invalidations so the device driver only has to
invalidate its own MMU mappings.
The driver can use ``migrate_pfn_to_page(src[i])`` to get the
``struct page`` of the source and either copy the source page to the
destination or clear the destination device private memory if the pointer
is ``NULL`` meaning the source page was not populated in system memory.
4. ``migrate_vma_pages()``
This step is where the migration is actually "committed".
If the source page was a ``pte_none()`` or ``is_zero_pfn()`` page, this
is where the newly allocated page is inserted into the CPU's page table.
This can fail if a CPU thread faults on the same page. However, the page
table is locked and only one of the new pages will be inserted.
The device driver will see that the ``MIGRATE_PFN_MIGRATE`` bit is cleared
if it loses the race.
If the source page was locked, isolated, etc. the source ``struct page``
information is now copied to destination ``struct page`` finalizing the
migration on the CPU side.
5. Device driver updates device MMU page tables for pages still migrating,
rolling back pages not migrating.
If the ``src`` entry still has ``MIGRATE_PFN_MIGRATE`` bit set, the device
driver can update the device MMU and set the write enable bit if the
``MIGRATE_PFN_WRITE`` bit is set.
6. ``migrate_vma_finalize()``
This step replaces the special migration page table entry with the new
page's page table entry and releases the reference to the source and
destination ``struct page``.
7. ``mmap_read_unlock()``
The lock can now be released.
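The ``src``/``dst`` bookkeeping in step 3 can be modelled in user space.
The following is only a sketch, not driver code: the flag values and
``MIGRATE_PFN_SHIFT`` are assumptions modelled on ``include/linux/migrate.h``
of this era, and ``fill_dst()``, ``migrate_pfn_to_pfn()`` and
``first_free_pfn`` are hypothetical names standing in for a driver's
allocator and helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding, modelled on include/linux/migrate.h (not authoritative). */
#define MIGRATE_PFN_VALID   (1UL << 0)
#define MIGRATE_PFN_MIGRATE (1UL << 1)
#define MIGRATE_PFN_LOCKED  (1UL << 2)
#define MIGRATE_PFN_WRITE   (1UL << 3)
#define MIGRATE_PFN_SHIFT   6

/* Encode a PFN as a valid migrate entry. */
static unsigned long migrate_pfn(unsigned long pfn)
{
	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
}

/* Hypothetical helper: recover the PFN from a migrate entry. */
static unsigned long migrate_pfn_to_pfn(unsigned long mpfn)
{
	return mpfn >> MIGRATE_PFN_SHIFT;
}

/*
 * Model of step 3: for every src[] entry with MIGRATE_PFN_MIGRATE set,
 * "allocate" the next free PFN and publish it in dst[] as locked.
 * Entries that are not migrating keep dst[i] == 0, which is how a
 * driver skips a page. Returns the number of dst[] entries filled.
 */
static size_t fill_dst(const unsigned long *src, unsigned long *dst,
		       size_t npages, unsigned long first_free_pfn)
{
	size_t i, filled = 0;

	assert(src && dst);

	for (i = 0; i < npages; i++) {
		dst[i] = 0;
		if (!(src[i] & MIGRATE_PFN_MIGRATE))
			continue;
		dst[i] = migrate_pfn(first_free_pfn + filled) |
			 MIGRATE_PFN_LOCKED;
		filled++;
	}
	return filled;
}
```

A ``src`` entry that carries only ``MIGRATE_PFN_MIGRATE`` (no valid PFN)
corresponds to the ``pte_none()``/``is_zero_pfn()`` case above: the model
still fills in ``dst``, and a real driver would clear the new page instead
of copying.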
Memory cgroup (memcg) and rss accounting
========================================
......
......@@ -29,6 +29,7 @@ descriptions of data structures and algorithms.
:maxdepth: 1
active_mm
arch_pgtable_helpers
balance
cleancache
free_page_reporting
......
......@@ -4,25 +4,28 @@
Page migration
==============
Page migration allows the moving of the physical location of pages between
nodes in a numa system while the process is running. This means that the
Page migration allows moving the physical location of pages between
nodes in a NUMA system while the process is running. This means that the
virtual addresses that the process sees do not change. However, the
system rearranges the physical location of those pages.
The main intend of page migration is to reduce the latency of memory access
Also see :ref:`Heterogeneous Memory Management (HMM) <hmm>`
for migrating pages to or from device private memory.
The main intent of page migration is to reduce the latency of memory accesses
by moving pages near to the processor where the process accessing that memory
is running.
Page migration allows a process to manually relocate the node on which its
pages are located through the MF_MOVE and MF_MOVE_ALL options while setting
a new memory policy via mbind(). The pages of process can also be relocated
a new memory policy via mbind(). The pages of a process can also be relocated
from another process using the sys_migrate_pages() function call. The
migrate_pages function call takes two sets of nodes and moves pages of a
migrate_pages() function call takes two sets of nodes and moves pages of a
process that are located on the from nodes to the destination nodes.
Page migration functions are provided by the numactl package by Andi Kleen
(a version later than 0.9.3 is required. Get it from
ftp://oss.sgi.com/www/projects/libnuma/download/). numactl provides libnuma
which provides an interface similar to other numa functionality for page
https://github.com/numactl/numactl.git). numactl provides libnuma
which provides an interface similar to other NUMA functionality for page
migration. cat ``/proc/<pid>/numa_maps`` allows an easy review of where the
pages of a process are located. See also the numa_maps documentation in the
proc(5) man page.
......@@ -30,19 +33,19 @@ proc(5) man page.
Manual migration is useful if for example the scheduler has relocated
a process to a processor on a distant node. A batch scheduler or an
administrator may detect the situation and move the pages of the process
nearer to the new processor. The kernel itself does only provide
nearer to the new processor. The kernel itself only provides
manual page migration support. Automatic page migration may be implemented
through user space processes that move pages. A special function call
"move_pages" allows the moving of individual pages within a process.
A NUMA profiler may f.e. obtain a log showing frequent off node
For example, a NUMA profiler may obtain a log showing frequent off-node
accesses and may use the result to move pages to more advantageous
locations.
Larger installations usually partition the system using cpusets into
sections of nodes. Paul Jackson has equipped cpusets with the ability to
move pages when a task is moved to another cpuset (See
Documentation/admin-guide/cgroup-v1/cpusets.rst).
Cpusets allows the automation of process locality. If a task is moved to
:ref:`CPUSETS <cpusets>`).
Cpusets allow the automation of process locality. If a task is moved to
a new cpuset then also all its pages are moved with it so that the
performance of the process does not sink dramatically. Also the pages
of processes in a cpuset are moved if the allowed memory nodes of a
......@@ -67,9 +70,9 @@ In kernel use of migrate_pages()
Lists of pages to be migrated are generated by scanning over
pages and moving them into lists. This is done by
calling isolate_lru_page().
Calling isolate_lru_page increases the references to the page
Calling isolate_lru_page() increases the references to the page
so that it cannot vanish while the page migration occurs.
It also prevents the swapper or other scans to encounter
It also prevents the swapper or other scans from encountering
the page.
2. We need to have a function of type new_page_t that can be
......@@ -91,23 +94,24 @@ is increased so that the page cannot be freed while page migration occurs.
Steps:
1. Lock the page to be migrated
1. Lock the page to be migrated.
2. Ensure that writeback is complete.
3. Lock the new page that we want to move to. It is locked so that accesses to
this (not yet uptodate) page immediately lock while the move is in progress.
this (not yet uptodate) page immediately block while the move is in progress.
4. All the page table references to the page are converted to migration
entries. This decreases the mapcount of a page. If the resulting
mapcount is not zero then we do not migrate the page. All user space
processes that attempt to access the page will now wait on the page lock.
processes that attempt to access the page will now wait on the page lock
or wait for the migration page table entry to be removed.
5. The i_pages lock is taken. This will cause all processes trying
to access the page via the mapping to block on the spinlock.
6. The refcount of the page is examined and we back out if references remain
otherwise we know that we are the only one referencing this page.
6. The refcount of the page is examined and we back out if references remain.
Otherwise, we know that we are the only one referencing this page.
7. The radix tree is checked and if it does not contain the pointer to this
page then we back out because someone else modified the radix tree.
......@@ -134,124 +138,124 @@ Steps:
15. Queued up writeback on the new page is triggered.
16. If migration entries were page then replace them with real ptes. Doing
so will enable access for user space processes not already waiting for
the page lock.
16. If migration entries were inserted into the page table, then replace them
with real ptes. Doing so will enable access for user space processes not
already waiting for the page lock.
19. The page locks are dropped from the old and new page.
17. The page locks are dropped from the old and new page.
Processes waiting on the page lock will redo their page faults
and will reach the new page.
20. The new page is moved to the LRU and can be scanned by the swapper
etc again.
18. The new page is moved to the LRU and can be scanned by the swapper,
etc. again.
Non-LRU page migration
======================
Although original migration aimed for reducing the latency of memory access
for NUMA, compaction who want to create high-order page is also main customer.
Although migration originally aimed for reducing the latency of memory accesses
for NUMA, compaction also uses migration to create high-order pages.
The current problem with the implementation is that it is designed to migrate only
*LRU* pages. However, there are potential non-lru pages which can be migrated
*LRU* pages. However, there are potential non-LRU pages which can be migrated
in drivers, for example, zsmalloc, virtio-balloon pages.
For virtio-balloon pages, some parts of migration code path have been hooked
up and added virtio-balloon specific functions to intercept migration logics.
It's too specific to a driver so other drivers who want to make their pages
movable would have to add own specific hooks in migration path.
movable would have to add their own specific hooks in the migration path.
To overclome the problem, VM supports non-LRU page migration which provides
To overcome the problem, VM supports non-LRU page migration which provides
generic functions for non-LRU movable pages without driver specific hooks
migration path.
in the migration path.
If a driver want to make own pages movable, it should define three functions
If a driver wants to make its pages movable, it should define three functions
which are function pointers of struct address_space_operations.
1. ``bool (*isolate_page) (struct page *page, isolate_mode_t mode);``
What VM expects on isolate_page function of driver is to return *true*
if driver isolates page successfully. On returing true, VM marks the page
What VM expects from isolate_page() function of driver is to return *true*
if driver isolates the page successfully. On returning true, VM marks the page
as PG_isolated so concurrent isolation in several CPUs skip the page
for isolation. If a driver cannot isolate the page, it should return *false*.
Once page is successfully isolated, VM uses page.lru fields so driver
shouldn't expect to preserve values in that fields.
shouldn't expect to preserve values in those fields.
2. ``int (*migratepage) (struct address_space *mapping,``
| ``struct page *newpage, struct page *oldpage, enum migrate_mode);``
After isolation, VM calls migratepage of driver with isolated page.
The function of migratepage is to move content of the old page to new page
After isolation, VM calls migratepage() of driver with the isolated page.
The function of migratepage() is to move the contents of the old page to the
new page
and set up fields of struct page newpage. Keep in mind that you should
indicate to the VM the oldpage is no longer movable via __ClearPageMovable()
under page_lock if you migrated the oldpage successfully and returns
under page_lock if you migrated the oldpage successfully and returned
MIGRATEPAGE_SUCCESS. If driver cannot migrate the page at the moment, driver
can return -EAGAIN. On -EAGAIN, VM will retry page migration in a short time
because VM interprets -EAGAIN as "temporal migration failure". On returning
any error except -EAGAIN, VM will give up the page migration without retrying
in this time.
because VM interprets -EAGAIN as "temporary migration failure". On returning
any error except -EAGAIN, VM will give up the page migration without
retrying.
Driver shouldn't touch page.lru field VM using in the functions.
Driver shouldn't touch the page.lru field while in the migratepage() function.
3. ``void (*putback_page)(struct page *);``
If migration fails on isolated page, VM should return the isolated page
to the driver so VM calls driver's putback_page with migration failed page.
In this function, driver should put the isolated page back to the own data
If migration fails on the isolated page, VM should return the isolated page
to the driver so VM calls the driver's putback_page() with the isolated page.
In this function, the driver should put the isolated page back into its own data
structure.
4. non-lru movable page flags
4. non-LRU movable page flags
There are two page flags for supporting non-lru movable page.
There are two page flags for supporting non-LRU movable pages.
* PG_movable
Driver should use the below function to make page movable under page_lock::
Driver should use the function below to make page movable under page_lock::
void __SetPageMovable(struct page *page, struct address_space *mapping)
It needs argument of address_space for registering migration
family functions which will be called by VM. Exactly speaking,
PG_movable is not a real flag of struct page. Rather than, VM
reuses page->mapping's lower bits to represent it.
PG_movable is not a real flag of struct page. Rather, VM
reuses the page->mapping's lower bits to represent it::
::
#define PAGE_MAPPING_MOVABLE 0x2
page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
so driver shouldn't access page->mapping directly. Instead, driver should
use page_mapping which mask off the low two bits of page->mapping under
page lock so it can get right struct address_space.
For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page.
As well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE
(Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
page is LRU or non-lru movable once the page has been isolated. Because
LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
good for just peeking to test non-lru movable pages before more expensive
checking with lock_page in pfn scanning to select victim.
For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
destroying of page->mapping.
Driver using __SetPageMovable should clear the flag via __ClearMovablePage
under page_lock before the releasing the page.
use page_mapping() which masks off the low two bits of page->mapping under
page lock so it can get the right struct address_space.
For testing of non-LRU movable pages, VM supports __PageMovable() function.
However, it doesn't guarantee to identify non-LRU movable pages because
the page->mapping field is unified with other variables in struct page.
If the driver releases the page after isolation by VM, page->mapping
doesn't have a stable value although it has PAGE_MAPPING_MOVABLE set
(look at __ClearPageMovable). But __PageMovable() is cheap to call whether
page is LRU or non-LRU movable once the page has been isolated because LRU
pages can never have PAGE_MAPPING_MOVABLE set in page->mapping. It is also
good for just peeking to test non-LRU movable pages before more expensive
checking with lock_page() in pfn scanning to select a victim.
For guaranteeing non-LRU movable page, VM provides PageMovable() function.
Unlike __PageMovable(), PageMovable() validates page->mapping and
mapping->a_ops->isolate_page under lock_page(). The lock_page() prevents
sudden destroying of page->mapping.
Drivers using __SetPageMovable() should clear the flag via
__ClearPageMovable() under page_lock() before releasing the page.
* PG_isolated
To prevent concurrent isolation among several CPUs, VM marks isolated page
as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru
movable page, it can skip it. Driver doesn't need to manipulate the flag
because VM will set/clear it automatically. Keep in mind that if driver
sees PG_isolated page, it means the page have been isolated by VM so it
shouldn't touch page.lru field.
PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag
for own purpose.
as PG_isolated under lock_page(). So if a CPU encounters PG_isolated
non-LRU movable page, it can skip it. Driver doesn't need to manipulate the
flag because VM will set/clear it automatically. Keep in mind that if the
driver sees a PG_isolated page, it means the page has been isolated by the
VM so it shouldn't touch the page.lru field.
The PG_isolated flag is aliased with the PG_reclaim flag so drivers
shouldn't use PG_isolated for their own purposes.
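The low-bit tagging of ``page->mapping`` described under PG_movable can be
illustrated with a small user-space model. This is a sketch under stated
assumptions, not the kernel implementation: ``struct page_model``,
``struct address_space_model`` and the three helpers are hypothetical
stand-ins, and the ``PAGE_MAPPING_FLAGS`` mask of ``0x3`` is assumed from the
"low two bits" wording above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MAPPING_MOVABLE 0x2UL /* value quoted in the text above */
#define PAGE_MAPPING_FLAGS   0x3UL /* assumption: low two bits hold flags */

/* Hypothetical stand-ins for struct address_space and struct page. */
struct address_space_model { long dummy; };
struct page_model { uintptr_t mapping; };

/* Model of __SetPageMovable(): store the pointer with the tag bit set. */
static void set_page_movable(struct page_model *page,
			     struct address_space_model *mapping)
{
	/* The pointer must be aligned so that its low two bits are free. */
	assert(((uintptr_t)mapping & PAGE_MAPPING_FLAGS) == 0);
	page->mapping = (uintptr_t)mapping | PAGE_MAPPING_MOVABLE;
}

/* Model of __PageMovable(): a cheap test of the tag bits only. */
static int page_is_movable(const struct page_model *page)
{
	return (page->mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE;
}

/* Model of page_mapping(): mask off the tag bits to get the real pointer. */
static struct address_space_model *
page_mapping_model(const struct page_model *page)
{
	return (struct address_space_model *)
		(page->mapping & ~PAGE_MAPPING_FLAGS);
}
```

The model shows why a driver should not dereference ``page->mapping``
directly: the stored value differs from the real pointer until the flag
bits are masked off.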
Monitoring Migration
=====================
......@@ -266,8 +270,8 @@ The following events (counters) can be used to monitor page migration.
512.
2. PGMIGRATE_FAIL: Normal page migration failure. Same counting rules as for
_SUCCESS, above: this will be increased by the number of subpages, if it was
a THP.
PGMIGRATE_SUCCESS, above: this will be increased by the number of subpages,
if it was a THP.
3. THP_MIGRATION_SUCCESS: A THP was migrated without being split.
......
......@@ -103,8 +103,10 @@ watch that specific key).
To manage a watch list, the following functions are provided:
* ``void init_watch_list(struct watch_list *wlist,
void (*release_watch)(struct watch *wlist));``
* ::
void init_watch_list(struct watch_list *wlist,
void (*release_watch)(struct watch *wlist));
Initialise a watch list. If ``release_watch`` is not NULL, then this
indicates a function that should be called when the watch_list object is
......@@ -179,9 +181,11 @@ The following functions are provided to manage watches:
driver-settable fields in the watch struct must have been set before this
is called.
* ``int remove_watch_from_object(struct watch_list *wlist,
* ::
int remove_watch_from_object(struct watch_list *wlist,
struct watch_queue *wqueue,
u64 id, false);``
u64 id, false);
Remove a watch from a watch list, where the watch must match the specified
watch queue (``wqueue``) and object identifier (``id``). A notification
......
......@@ -14259,7 +14259,7 @@ QLOGIC QLA3XXX NETWORK DRIVER
M: GR-Linux-NIC-Dev@marvell.com
L: netdev@vger.kernel.org
S: Supported
F: Documentation/networking/device_drivers/ethernet/qlogic/LICENSE.qla3xxx
F: Documentation/networking/device_drivers/qlogic/LICENSE.qla3xxx
F: drivers/net/ethernet/qlogic/qla3xxx.*
QLOGIC QLA4XXX iSCSI DRIVER
......@@ -17759,6 +17759,7 @@ S: Supported
W: http://www.linux-mtd.infradead.org/doc/ubifs.html
T: git git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs.git next
T: git git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs.git fixes
F: Documentation/filesystems/ubifs-authentication.rst
F: Documentation/filesystems/ubifs.rst
F: fs/ubifs/
......
......@@ -641,7 +641,7 @@ static inline struct iio_dev *iio_device_get(struct iio_dev *indio_dev)
*
* This utility must be called between IIO device allocation
* (via devm_iio_device_alloc()) & IIO device registration
* (via {devm_}iio_device_register()).
* (via iio_device_register() and devm_iio_device_register()).
* By default, the device allocation will also assign a parent device to
* the IIO device object. In cases where devm_iio_device_alloc() is used,
* sometimes the parent device must be different than the device used to
......
......@@ -1986,7 +1986,7 @@ config MMAP_ALLOW_UNINITIALIZED
userspace. Since that isn't generally a problem on no-MMU systems,
it is normally safe to say Y here.
See Documentation/mm/nommu-mmap.rst for more information.
See Documentation/admin-guide/mm/nommu-mmap.rst for more information.
config SYSTEM_DATA_VERIFICATION
def_bool n
......
......@@ -383,7 +383,7 @@ config NOMMU_INITIAL_TRIM_EXCESS
This option specifies the initial value of this option. The default
of 1 says that all excess pages should be trimmed.
See Documentation/mm/nommu-mmap.rst for more information.
See Documentation/admin-guide/mm/nommu-mmap.rst for more information.
config TRANSPARENT_HUGEPAGE
bool "Transparent Hugepage Support"
......
......@@ -5,7 +5,7 @@
* Replacement code for mm functions to support CPU's that don't
* have any form of memory management unit (thus no virtual memory).
*
* See Documentation/mm/nommu-mmap.rst
* See Documentation/admin-guide/mm/nommu-mmap.rst
*
* Copyright (c) 2004-2008 David Howells <dhowells@redhat.com>
* Copyright (c) 2000-2003 David McCullough <davidm@snapgear.com>
......
......@@ -5,7 +5,7 @@
* stack trace and selected registers when _do_fork() is called.
*
* For more information on theory of operation of kprobes, see
* Documentation/staging/kprobes.rst
* Documentation/trace/kprobes.rst
*
* You will see the trace data in /var/log/messages and on the console
* whenever _do_fork() is invoked to create a new process.
......
......@@ -11,7 +11,7 @@
* If no func_name is specified, _do_fork is instrumented
*
* For more information on theory of operation of kretprobes, see
* Documentation/staging/kprobes.rst
* Documentation/trace/kprobes.rst
*
* Build and insert the kernel module as done in the kprobe example.
* You will see the trace data in /var/log/messages and on the console
......
// SPDX-License-Identifier: GPL-2.0-only
///
/// From Documentation/filesystems/sysfs.txt:
/// From Documentation/filesystems/sysfs.rst:
/// show() must not use snprintf() when formatting the value to be
/// returned to user space. If you can guarantee that an overflow
/// will never happen you can use sprintf() otherwise you must use
......
......@@ -1083,7 +1083,7 @@ sub dump_struct($$) {
my $x = shift;
my $file = shift;
if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
my $decl_type = $1;
$declaration_name = $2;
my $members = $3;
......@@ -1099,6 +1099,7 @@ sub dump_struct($$) {
$members =~ s/\s*__packed\s*/ /gos;
$members =~ s/\s*CRYPTO_MINALIGN_ATTR/ /gos;
$members =~ s/\s*____cacheline_aligned_in_smp/ /gos;
$members =~ s/\s*____cacheline_aligned/ /gos;
# replace DECLARE_BITMAP
$members =~ s/__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)/DECLARE_BITMAP($1, __ETHTOOL_LINK_MODE_MASK_NBITS)/gos;
......@@ -1594,6 +1595,8 @@ sub dump_function($$) {
my $file = shift;
my $noret = 0;
print_lineno($.);
$prototype =~ s/^static +//;
$prototype =~ s/^extern +//;
$prototype =~ s/^asmlinkage +//;
......