Commit b23c4771 authored by Linus Torvalds

Merge tag 'docs-5.8' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A fair amount of stuff this time around, dominated by yet another
  massive set from Mauro toward the completion of the RST conversion. I
  *really* hope we are getting close to the end of this. Meanwhile,
  those patches reach pretty far afield to update document references
  around the tree; there should be no actual code changes there. There
  will be, alas, more of the usual trivial merge conflicts.

  Beyond that we have more translations, improvements to the sphinx
  scripting, a number of additions to the sysctl documentation, and lots
  of fixes"

* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
  Documentation: fixes to the maintainer-entry-profile template
  zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
  tracing: Fix events.rst section numbering
  docs: acpi: fix old http link and improve document format
  docs: filesystems: add info about efivars content
  Documentation: LSM: Correct the basic LSM description
  mailmap: change email for Ricardo Ribalda
  docs: sysctl/kernel: document unaligned controls
  Documentation: admin-guide: update bug-hunting.rst
  docs: sysctl/kernel: document ngroups_max
  nvdimm: fixes to maintainter-entry-profile
  Documentation/features: Correct RISC-V kprobes support entry
  Documentation/features: Refresh the arch support status files
  Revert "docs: sysctl/kernel: document ngroups_max"
  docs: move locking-specific documents to locking/
  docs: move digsig docs to the security book
  docs: move the kref doc into the core-api book
  docs: add IRQ documentation at the core-api book
  docs: debugging-via-ohci1394.txt: add it to the core-api book
  docs: fix references for ipmi.rst file
  ...
parents c2b0fc84 e35b5a4c
...@@ -152,6 +152,7 @@ Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com> ...@@ -152,6 +152,7 @@ Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com>
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Leon Romanovsky <leon@kernel.org> <leon@leon.nu> Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com> Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com>
Leonardo Bras <leobras.c@gmail.com> <leonardo@linux.ibm.com>
Leonid I Ananiev <leonid.i.ananiev@intel.com> Leonid I Ananiev <leonid.i.ananiev@intel.com>
Linas Vepstas <linas@austin.ibm.com> Linas Vepstas <linas@austin.ibm.com>
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de> Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de>
...@@ -234,7 +235,9 @@ Ralf Baechle <ralf@linux-mips.org> ...@@ -234,7 +235,9 @@ Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de> Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Randy Dunlap <rdunlap@infradead.org> <rdunlap@xenotime.net> Randy Dunlap <rdunlap@infradead.org> <rdunlap@xenotime.net>
Rémi Denis-Courmont <rdenis@simphalempin.com> Rémi Denis-Courmont <rdenis@simphalempin.com>
Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Ricardo Ribalda <ribalda@kernel.org> <ricardo.ribalda@gmail.com>
Ricardo Ribalda <ribalda@kernel.org> <ricardo@ribalda.com>
Ricardo Ribalda <ribalda@kernel.org> Ricardo Ribalda Delgado <ribalda@kernel.org>
Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com> Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com>
Rudolf Marek <R.Marek@sh.cvut.cz> Rudolf Marek <R.Marek@sh.cvut.cz>
Rui Saraiva <rmps@joel.ist.utl.pt> Rui Saraiva <rmps@joel.ist.utl.pt>
......
...@@ -3104,14 +3104,16 @@ W: http://www.qsl.net/dl1bke/ ...@@ -3104,14 +3104,16 @@ W: http://www.qsl.net/dl1bke/
D: Generic Z8530 driver, AX.25 DAMA slave implementation D: Generic Z8530 driver, AX.25 DAMA slave implementation
D: Several AX.25 hacks D: Several AX.25 hacks
N: Ricardo Ribalda Delgado N: Ricardo Ribalda
E: ricardo.ribalda@gmail.com E: ribalda@kernel.org
W: http://ribalda.com W: http://ribalda.com
D: PLX USB338x driver D: PLX USB338x driver
D: PCA9634 driver D: PCA9634 driver
D: Option GTM671WFS D: Option GTM671WFS
D: Fintek F81216A D: Fintek F81216A
D: AD5761 iio driver D: AD5761 iio driver
D: TI DAC7612 driver
D: Sony IMX214 driver
D: Various kernel hacks D: Various kernel hacks
S: Qtechnology A/S S: Qtechnology A/S
S: Valby Langgade 142 S: Valby Langgade 142
......
...@@ -54,7 +54,7 @@ Date: October 2002 ...@@ -54,7 +54,7 @@ Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org> Contact: Linux Memory Management list <linux-mm@kvack.org>
Description: Description:
Provides information about the node's distribution and memory Provides information about the node's distribution and memory
utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst
What: /sys/devices/system/node/nodeX/numastat What: /sys/devices/system/node/nodeX/numastat
Date: October 2002 Date: October 2002
......
...@@ -11,7 +11,7 @@ Description: ...@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps. These fields represent are not present in /proc/pid/smaps. These fields represent
the sum of the Pss field of each type (anon, file, shmem). the sum of the Pss field of each type (anon, file, shmem).
For more details, see Documentation/filesystems/proc.txt For more details, see Documentation/filesystems/proc.rst
and the procfs man page. and the procfs man page.
Typical output looks like this: Typical output looks like this:
......
...@@ -98,7 +98,11 @@ else # HAVE_PDFLATEX ...@@ -98,7 +98,11 @@ else # HAVE_PDFLATEX
pdfdocs: latexdocs pdfdocs: latexdocs
@$(srctree)/scripts/sphinx-pre-install --version-check @$(srctree)/scripts/sphinx-pre-install --version-check
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;) $(foreach var,$(SPHINXDIRS), \
$(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit; \
mkdir -p $(BUILDDIR)/$(var)/pdf; \
mv $(subst .tex,.pdf,$(wildcard $(BUILDDIR)/$(var)/latex/*.tex)) $(BUILDDIR)/$(var)/pdf/; \
)
endif # HAVE_PDFLATEX endif # HAVE_PDFLATEX
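Note on this hunk: after each per-directory LaTeX build, the new rule creates a `pdf/` directory next to `latex/` and moves the generated PDFs into it. The following is a minimal Python sketch of that post-build step, not part of the kernel build system. The function name `collect_pdfs` and the directory names in the usage note are hypothetical. It also approximates `$(subst .tex,.pdf,...)` by swapping only the file suffix.

```python
import shutil
from pathlib import Path


def collect_pdfs(builddir: Path, sphinxdir: str) -> list[Path]:
    """Move the PDF built from each latex/*.tex file into a sibling pdf/ directory."""
    latex_dir = builddir / sphinxdir / "latex"
    pdf_dir = builddir / sphinxdir / "pdf"
    pdf_dir.mkdir(parents=True, exist_ok=True)  # counterpart of `mkdir -p`

    moved = []
    for tex in sorted(latex_dir.glob("*.tex")):  # counterpart of $(wildcard .../latex/*.tex)
        pdf = tex.with_suffix(".pdf")  # approximates $(subst .tex,.pdf,...)
        target = pdf_dir / pdf.name
        shutil.move(str(pdf), str(target))  # raises if the PDF was never built
        moved.append(target)
    return moved
```

Calling `collect_pdfs(Path("output"), "admin-guide")` on a hypothetical build tree would leave the PDFs under `output/admin-guide/pdf/`, which mirrors where the rule places them relative to `$(BUILDDIR)/$(var)`.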
......
...@@ -32,12 +32,13 @@ interrupt goes unhandled over time, they are tracked by the Linux kernel as ...@@ -32,12 +32,13 @@ interrupt goes unhandled over time, they are tracked by the Linux kernel as
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
reaches a specific count with the error "nobody cared". This disabled IRQ reaches a specific count with the error "nobody cared". This disabled IRQ
now prevents valid usage by an existing interrupt which may happen to share now prevents valid usage by an existing interrupt which may happen to share
the IRQ line. the IRQ line::
irq 19: nobody cared (try booting with the "irqpoll" option) irq 19: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1 CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020 Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
Call Trace: Call Trace:
<IRQ> <IRQ>
? dump_stack+0x46/0x5e ? dump_stack+0x46/0x5e
? __report_bad_irq+0x2e/0xb0 ? __report_bad_irq+0x2e/0xb0
...@@ -85,15 +86,18 @@ Mitigations ...@@ -85,15 +86,18 @@ Mitigations
The mitigations take the form of PCI quirks. The preference has been to The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH. first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be In such a case a quirk to disable boot interrupt generation can be
added.[1] added. [1]_
Intel® 6300ESB I/O Controller Hub Intel® 6300ESB I/O Controller Hub
Alternate Base Address Register: Alternate Base Address Register:
BIE: Boot Interrupt Enable BIE: Boot Interrupt Enable
0 = Boot interrupt is enabled.
1 = Boot interrupt is disabled.
Intel® Sandy Bridge through Sky Lake based Xeon servers: == ===========================
0 Boot interrupt is enabled.
1 Boot interrupt is disabled.
== ===========================
Intel® Sandy Bridge through Sky Lake based Xeon servers:
Coherent Interface Protocol Interrupt Control Coherent Interface Protocol Interrupt Control
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2: dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
When this bit is set. Local INTx messages received from the When this bit is set. Local INTx messages received from the
...@@ -109,12 +113,12 @@ line by default. Therefore, on chipsets where this INTx routing cannot be ...@@ -109,12 +113,12 @@ line by default. Therefore, on chipsets where this INTx routing cannot be
disabled, the Linux kernel will reroute the valid interrupt to its legacy disabled, the Linux kernel will reroute the valid interrupt to its legacy
interrupt. This redirection of the handler will prevent the occurrence of interrupt. This redirection of the handler will prevent the occurrence of
the spurious interrupt detection which would ordinarily disable the IRQ the spurious interrupt detection which would ordinarily disable the IRQ
line due to excessive unhandled counts.[2] line due to excessive unhandled counts. [2]_
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or line. The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute.[3] pci=noioapicreroute. [3]_
More Documentation More Documentation
...@@ -127,19 +131,19 @@ into the evolution of its handling with chipsets. ...@@ -127,19 +131,19 @@ into the evolution of its handling with chipsets.
Example of disabling of the boot interrupt Example of disabling of the boot interrupt
------------------------------------------ ------------------------------------------
Intel® 6300ESB I/O Controller Hub (Document # 300641-004US) - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
5.7.3 Boot Interrupt 5.7.3 Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
Datasheet - Volume 2: Registers (Document # 330784-003) Datasheet - Volume 2: Registers (Document # 330784-003)
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control 6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Example of handler rerouting Example of handler rerouting
---------------------------- ----------------------------
Intel® 6700PXH 64-bit PCI Hub (Document # 302628) - Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt 2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
...@@ -150,6 +154,6 @@ Cheers, ...@@ -150,6 +154,6 @@ Cheers,
Sean V Kelley Sean V Kelley
sean.v.kelley@linux.intel.com sean.v.kelley@linux.intel.com
[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/ .. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/ .. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/ .. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
...@@ -63,7 +63,7 @@ which can then be compiled to AML binary format:: ...@@ -63,7 +63,7 @@ which can then be compiled to AML binary format::
ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords
AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes
[1] http://wiki.minnowboard.org/MinnowBoard_MAX#Low_Speed_Expansion_Connector_.28Top.29 [1] https://www.elinux.org/Minnowboard:MinnowMax#Low_Speed_Expansion_.28Top.29
The resulting AML code can then be loaded by the kernel using one of the methods The resulting AML code can then be loaded by the kernel using one of the methods
below. below.
......
...@@ -49,15 +49,19 @@ the issue, it may also contain the word **Oops**, as on this one:: ...@@ -49,15 +49,19 @@ the issue, it may also contain the word **Oops**, as on this one::
Despite being an **Oops** or some other sort of stack trace, the offended Despite being an **Oops** or some other sort of stack trace, the offended
line is usually required to identify and handle the bug. Along this chapter, line is usually required to identify and handle the bug. Along this chapter,
we'll refer to "Oops" for all kinds of stack traces that need to be analized. we'll refer to "Oops" for all kinds of stack traces that need to be analyzed.
.. note:: If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
quality of the stack trace by using file:`scripts/decode_stacktrace.sh`.
Modules linked in
-----------------
Modules that are tainted or are being loaded or unloaded are marked with
"(...)", where the taint flags are described in
file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
annotated with "+", and "being unloaded" is annotated with "-".
``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
format (from ``dmesg``, etc). Ignore any references in this or other docs to
"decoding the Oops" or "running it through ksymoops".
If you post an Oops from 2.6+ that has been run through ``ksymoops``,
people will just tell you to repost it.
Where is the Oops message is located? Where is the Oops message is located?
------------------------------------- -------------------------------------
...@@ -71,7 +75,7 @@ by running ``journalctl`` command. ...@@ -71,7 +75,7 @@ by running ``journalctl`` command.
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it. Or you can read the data from the kernel buffers and save it. Or you can
``cat /proc/kmsg > file``, however you have to break in to stop the transfer, ``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
``kmsg`` is a "never ending file". since ``kmsg`` is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options: the disk is not available then you have three options:
...@@ -81,9 +85,9 @@ the disk is not available then you have three options: ...@@ -81,9 +85,9 @@ the disk is not available then you have three options:
planned for a crash. Alternatively, you can take a picture of planned for a crash. Alternatively, you can take a picture of
the screen with a digital camera - not nice, but better than the screen with a digital camera - not nice, but better than
nothing. If the messages scroll off the top of the console, you nothing. If the messages scroll off the top of the console, you
may find that booting with a higher resolution (eg, ``vga=791``) may find that booting with a higher resolution (e.g., ``vga=791``)
will allow you to read more of the text. (Caveat: This needs ``vesafb``, will allow you to read more of the text. (Caveat: This needs ``vesafb``,
so won't help for 'early' oopses) so won't help for 'early' oopses.)
(2) Boot with a serial console (see (2) Boot with a serial console (see
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`), :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
...@@ -104,7 +108,7 @@ Kernel source file. There are two methods for doing that. Usually, using ...@@ -104,7 +108,7 @@ Kernel source file. There are two methods for doing that. Usually, using
gdb gdb
^^^ ^^^
The GNU debug (``gdb``) is the best way to figure out the exact file and line The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file. number of the OOPS from the ``vmlinux`` file.
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``. The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
...@@ -165,7 +169,7 @@ If you have a call trace, such as:: ...@@ -165,7 +169,7 @@ If you have a call trace, such as::
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
... ...
this shows the problem likely in the :jbd: module. You can load that module this shows the problem likely is in the :jbd: module. You can load that module
in gdb and list the relevant code:: in gdb and list the relevant code::
$ gdb fs/jbd/jbd.ko $ gdb fs/jbd/jbd.ko
...@@ -199,8 +203,9 @@ in the kernel hacking menu of the menu configuration.) For example:: ...@@ -199,8 +203,9 @@ in the kernel hacking menu of the menu configuration.) For example::
You need to be at the top level of the kernel tree for this to pick up You need to be at the top level of the kernel tree for this to pick up
your C files. your C files.
If you don't have access to the code you can also debug on some crash dumps If you don't have access to the source code you can still debug some crash
e.g. crash dump output as shown by Dave Miller:: dumps using the following method (example crash dump output as shown by
Dave Miller)::
EIP is at +0x14/0x4c0 EIP is at +0x14/0x4c0
... ...
...@@ -230,6 +235,9 @@ e.g. crash dump output as shown by Dave Miller:: ...@@ -230,6 +235,9 @@ e.g. crash dump output as shown by Dave Miller::
mov 0x8(%ebp), %ebx ! %ebx = skb->sk mov 0x8(%ebp), %ebx ! %ebx = skb->sk
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
file:`scripts/decodecode` can be used to automate most of this, depending
on what CPU architecture is being debugged.
Reporting the bug Reporting the bug
----------------- -----------------
...@@ -241,7 +249,7 @@ used for the development of the affected code. This can be done by using ...@@ -241,7 +249,7 @@ used for the development of the affected code. This can be done by using
the ``get_maintainer.pl`` script. the ``get_maintainer.pl`` script.
For example, if you find a bug at the gspca's sonixj.c file, you can get For example, if you find a bug at the gspca's sonixj.c file, you can get
their maintainers with:: its maintainers with::
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c $ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%) Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
...@@ -253,16 +261,17 @@ their maintainers with:: ...@@ -253,16 +261,17 @@ their maintainers with::
Please notice that it will point to: Please notice that it will point to:
- The last developers that touched on the source code. On the above example, - The last developers that touched the source code (if this is done inside
Tejun and Bhaktipriya (in this specific case, none really envolved on the a git tree). On the above example, Tejun and Bhaktipriya (in this
development of this file); specific case, none really envolved on the development of this file);
- The driver maintainer (Hans Verkuil); - The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab); - The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org); - The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org). - the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
Usually, the fastest way to have your bug fixed is to report it to mailing Usually, the fastest way to have your bug fixed is to report it to mailing
list used for the development of the code (linux-media ML) copying the driver maintainer (Hans). list used for the development of the code (linux-media ML) copying the
driver maintainer (Hans).
If you are totally stumped as to whom to send the report, and If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to ``get_maintainer.pl`` didn't provide you anything useful, send it to
...@@ -303,9 +312,9 @@ protection fault message can be simply cut out of the message files ...@@ -303,9 +312,9 @@ protection fault message can be simply cut out of the message files
and forwarded to the kernel developers. and forwarded to the kernel developers.
Two types of address resolution are performed by ``klogd``. The first is Two types of address resolution are performed by ``klogd``. The first is
static translation and the second is dynamic translation. Static static translation and the second is dynamic translation.
translation uses the System.map file in much the same manner that Static translation uses the System.map file.
ksymoops does. In order to do static translation the ``klogd`` daemon In order to do static translation the ``klogd`` daemon
must be able to find a system map file at daemon initialization time. must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how ``klogd`` searches for map See the klogd man page for information on how ``klogd`` searches for map
files. files.
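Note on this hunk: static translation maps a raw address to the nearest preceding symbol listed in a System.map file, whose lines have the form `address type name`. The sketch below illustrates the idea only; it is not how ``klogd`` is implemented. The function names are hypothetical, and the addresses and symbol names in the usage note are made up.

```python
import bisect


def parse_system_map(text: str) -> list[tuple[int, str]]:
    """Parse 'address type name' lines into (address, name) pairs sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols: list[tuple[int, str]], address: int):
    """Return 'name+0xoffset' for the closest symbol at or below address, else None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    base, name = symbols[index]
    return f"{name}+{address - base:#x}"
```

With a made-up map containing `c0100100 T do_one` and `c0100200 T do_two`, `resolve(symbols, 0xc0100114)` yields `do_one+0x14`, the same `symbol+offset` shape that appears in the call traces quoted earlier in this file.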
......
...@@ -105,7 +105,7 @@ References ...@@ -105,7 +105,7 @@ References
---------- ----------
- http://lkml.org/lkml/2007/2/12/6 - http://lkml.org/lkml/2007/2/12/6
- Documentation/filesystems/proc.txt (1.8) - Documentation/filesystems/proc.rst (1.8)
Thanks Thanks
......
...@@ -268,7 +268,7 @@ Guest mitigation mechanisms ...@@ -268,7 +268,7 @@ Guest mitigation mechanisms
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
available at: available at:
https://www.kernel.org/doc/Documentation/IRQ-affinity.txt https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
.. _smt_control: .. _smt_control:
......
Explaining the dreaded "No init found." boot hang message Explaining the "No working init found." boot hang message
========================================================= =========================================================
:Authors: Andreas Mohr <andi at lisas period de>
Cristian Souza <cristianmsbr at gmail period com>
OK, so you've got this pretty unintuitive message (currently located This document provides some high-level reasons for failure
in init/main.c) and are wondering what the H*** went wrong. (listed roughly in order of execution) to load the init binary.
Some high-level reasons for failure (listed roughly in order of execution)
to load the init binary are: 1) **Unable to mount root FS**: Set "debug" kernel parameter (in bootloader
config file or CONFIG_CMDLINE) to get more detailed kernel messages.
A) Unable to mount root FS
B) init binary doesn't exist on rootfs 2) **init binary doesn't exist on rootfs**: Make sure you have the correct
C) broken console device root FS type (and ``root=`` kernel parameter points to the correct
D) binary exists but dependencies not available partition), required drivers such as storage hardware (such as SCSI or
E) binary cannot be loaded USB!) and filesystem (ext3, jffs2, etc.) are builtin (alternatively as
modules, to be pre-loaded by an initrd).
Detailed explanations:
3) **Broken console device**: Possibly a conflict in ``console= setup``
A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE) --> initial console unavailable. E.g. some serial consoles are unreliable
to get more detailed kernel messages. due to serial IRQ issues (e.g. missing interrupt-based configuration).
B) make sure you have the correct root FS type
(and ``root=`` kernel parameter points to the correct partition),
required drivers such as storage hardware (such as SCSI or USB!)
and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
to be pre-loaded by an initrd)
C) Possibly a conflict in ``console= setup`` --> initial console unavailable.
E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
missing interrupt-based configuration).
Try using a different ``console= device`` or e.g. ``netconsole=``. Try using a different ``console= device`` or e.g. ``netconsole=``.
D) e.g. required library dependencies of the init binary such as
``/lib/ld-linux.so.2`` missing or broken. Use 4) **Binary exists but dependencies not available**: E.g. required library
``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required. dependencies of the init binary such as ``/lib/ld-linux.so.2`` missing or
E) make sure the binary's architecture matches your hardware. broken. Use ``readelf -d <INIT>|grep NEEDED`` to find out which libraries
E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware. are required.
In case you tried loading a non-binary file here (shell script?),
you should make sure that the script specifies an interpreter in its shebang 5) **Binary cannot be loaded**: Make sure the binary's architecture matches
header line (``#!/...``) that is fully working (including its library your hardware. E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM
dependencies). And before tackling scripts, better first test a simple hardware. In case you tried loading a non-binary file here (shell script?),
non-script binary such as ``/bin/sh`` and confirm its successful execution. you should make sure that the script specifies an interpreter in its
To find out more, add code ``to init/main.c`` to display kernel_execve()s shebang header line (``#!/...``) that is fully working (including its
return values. library dependencies). And before tackling scripts, better first test a
simple non-script binary such as ``/bin/sh`` and confirm its successful
execution. To find out more, add code ``to init/main.c`` to display
kernel_execve()s return values.
Please extend this explanation whenever you find new failure causes Please extend this explanation whenever you find new failure causes
(after all loading the init binary is a CRITICAL and hard transition step (after all loading the init binary is a CRITICAL and hard transition step
which needs to be made as painless as possible), then submit patch to LKML. which needs to be made as painless as possible), then submit a patch to LKML.
Further TODOs: Further TODOs:
- Implement the various ``run_init_process()`` invocations via a struct array - Implement the various ``run_init_process()`` invocations via a struct array
which can then store the ``kernel_execve()`` result value and on failure which can then store the ``kernel_execve()`` result value and on failure
log it all by iterating over **all** results (very important usability fix). log it all by iterating over **all** results (very important usability fix).
- try to make the implementation itself more helpful in general, - Try to make the implementation itself more helpful in general, e.g. by
e.g. by providing additional error messages at affected places. providing additional error messages at affected places.
Andreas Mohr <andi at lisas period de>
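Note on this file: several of the failure causes above (binary missing, architecture mismatch, script with a broken interpreter) can be checked from a working system before rebooting. The sketch below is one hypothetical way to do that by inspecting the first bytes of the init candidate. The function name, the return strings, and the small machine table are illustrative assumptions. It does not check library dependencies; the document's ``readelf -d`` suggestion covers those.

```python
import struct
from pathlib import Path

# Small subset of ELF e_machine values; anything else is reported numerically.
ELF_MACHINES = {3: "i386", 40: "arm", 62: "x86_64", 183: "aarch64"}


def describe_init(path: str) -> str:
    """Classify an init candidate as missing, an ELF binary, a script, or unknown."""
    candidate = Path(path)
    if not candidate.is_file():
        return "missing"
    with candidate.open("rb") as handle:
        header = handle.read(256)

    if header.startswith(b"\x7fELF") and len(header) >= 20:
        # e_ident[5] gives the byte order; e_machine is the 16-bit field at offset 18.
        byte_order = "<" if header[5] == 1 else ">"
        (machine,) = struct.unpack_from(byte_order + "H", header, 18)
        return "elf:" + ELF_MACHINES.get(machine, f"machine-{machine}")

    if header.startswith(b"#!"):
        first_line = header[2:].split(b"\n", 1)[0].decode(errors="replace").strip()
        interpreter = first_line.split()[0] if first_line else ""
        state = "ok" if interpreter and Path(interpreter).is_file() else "missing"
        return f"script:{interpreter}:interpreter-{state}"

    return "unknown"
```

Run against the root filesystem image mounted at a hypothetical path such as `/mnt/rootfs/sbin/init`, a result like `elf:arm` on an x86_64 target points at cause 5, while `missing` points at cause 2. For a script, the interpreter path is checked on the running system, so it only gives a rough hint unless the check runs inside the target root.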
...@@ -3336,7 +3336,7 @@ ...@@ -3336,7 +3336,7 @@
See Documentation/admin-guide/sysctl/vm.rst for details. See Documentation/admin-guide/sysctl/vm.rst for details.
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver. ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
See Documentation/debugging-via-ohci1394.txt for more See Documentation/core-api/debugging-via-ohci1394.rst for more
info. info.
olpc_ec_timeout= [OLPC] ms delay when issuing EC commands olpc_ec_timeout= [OLPC] ms delay when issuing EC commands
......
...@@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work. ...@@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work.
References References
========== ==========
- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. - Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.
- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs. - Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
......
...@@ -18,7 +18,7 @@ Mounting the root filesystem via NFS (nfsroot) ...@@ -18,7 +18,7 @@ Mounting the root filesystem via NFS (nfsroot)
In order to use a diskless system, such as an X-terminal or printer server for In order to use a diskless system, such as an X-terminal or printer server for
example, it is necessary for the root filesystem to be present on a non-disk example, it is necessary for the root filesystem to be present on a non-disk
device. This may be an initramfs (see device. This may be an initramfs (see
Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/filesystems/ramfs-rootfs-initramfs.rst), a ramdisk (see
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
following text describes on how to use NFS for the root filesystem. For the rest following text describes on how to use NFS for the root filesystem. For the rest
of this text 'client' means the diskless system, and 'server' means the NFS of this text 'client' means the diskless system, and 'server' means the NFS
......
...@@ -6,6 +6,21 @@ Numa policy hit/miss statistics ...@@ -6,6 +6,21 @@ Numa policy hit/miss statistics
All units are pages. Hugepages have separate counters. All units are pages. Hugepages have separate counters.
The numa_hit, numa_miss and numa_foreign counters reflect how well processes
are able to allocate memory from nodes they prefer. If they succeed, numa_hit
is incremented on the preferred node, otherwise numa_foreign is incremented on
the preferred node and numa_miss on the node where allocation succeeded.
Usually preferred node is the one local to the CPU where the process executes,
but restrictions such as mempolicies can change that, so there are also two
counters based on CPU local node. local_node is similar to numa_hit and is
incremented on allocation from a node by CPU on the same node. other_node is
similar to numa_miss and is incremented on the node where allocation succeeds
from a CPU from a different node. Note there is no counter analogical to
numa_foreign.
In more detail:
=============== ============================================================ =============== ============================================================
numa_hit A process wanted to allocate memory from this node, numa_hit A process wanted to allocate memory from this node,
and succeeded. and succeeded.
...@@ -14,11 +29,13 @@ numa_miss A process wanted to allocate memory from another node, ...@@ -14,11 +29,13 @@ numa_miss A process wanted to allocate memory from another node,
but ended up with memory from this node. but ended up with memory from this node.
numa_foreign A process wanted to allocate on this node, numa_foreign A process wanted to allocate on this node,
but ended up with memory from another one. but ended up with memory from another node.
local_node A process ran on this node and got memory from it. local_node A process ran on this node's CPU,
and got memory from this node.
other_node A process ran on this node and got memory from another node. other_node A process ran on a different node's CPU
and got memory from this node.
interleave_hit Interleaving wanted to allocate from this node interleave_hit Interleaving wanted to allocate from this node
and succeeded. and succeeded.
...@@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package ...@@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package
(http://oss.sgi.com/projects/libnuma/). Note that it only works (http://oss.sgi.com/projects/libnuma/). Note that it only works
well right now on machines with a small number of CPUs. well right now on machines with a small number of CPUs.
Note that on systems with memoryless nodes (where a node has CPUs but no
memory) the numa_hit, numa_miss and numa_foreign statistics can be skewed
heavily. In the current kernel implementation, if a process prefers a
memoryless node (e.g. because it is running on one of its local CPUs), the
implementation actually treats one of the nearest nodes with memory as the
preferred node. As a result, such an allocation will not increase the numa_foreign
counter on the memoryless node, and will skew the numa_hit, numa_miss and
numa_foreign statistics of the nearest node.
...@@ -156,11 +156,11 @@ the labels provided by the BIOS won't match the real ones. ...@@ -156,11 +156,11 @@ the labels provided by the BIOS won't match the real ones.
ECC memory ECC memory
---------- ----------
As mentioned on the previous section, ECC memory has extra bits to be As mentioned in the previous section, ECC memory has extra bits to be
used for error correction. So, on 64 bit systems, a memory module used for error correction. In the above example, a memory module has
has 64 bits of *data width*, and 74 bits of *total width*. So, there are 64 bits of *data width*, and 72 bits of *total width*. The extra 8
8 bits extra bits to be used for the error detection and correction bits which are used for the error detection and correction mechanisms
mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_. are referred to as the *syndrome*\ [#f1]_\ [#f2]_.
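As a worked example of where 8 extra bits for 64 data bits can come from, the sketch below counts the check bits of a Hamming code extended with one overall parity bit (SECDED). That the memory uses this particular code is an assumption made here for illustration; the text above does not name the code, and the function names are hypothetical.

```c
/*
 * Smallest r such that 2^r >= data_bits + r + 1, i.e. the number of
 * check bits a Hamming code needs to correct a single-bit error.
 */
unsigned int hamming_check_bits(unsigned int data_bits)
{
	unsigned int r = 0;

	while ((1UL << r) < data_bits + r + 1)
		r++;
	return r;
}

/* One additional overall parity bit lets double-bit errors be detected. */
unsigned int secded_check_bits(unsigned int data_bits)
{
	return hamming_check_bits(data_bits) + 1;
}
```

Under that assumption, ``secded_check_bits(64)`` gives 8, which matches a *data width* of 64 bits and a *total width* of 72 bits.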
So, when the cpu requests the memory controller to write a word with So, when the cpu requests the memory controller to write a word with
*data width*, the memory controller calculates the *syndrome* in real time, *data width*, the memory controller calculates the *syndrome* in real time,
...@@ -212,7 +212,7 @@ EDAC - Error Detection And Correction ...@@ -212,7 +212,7 @@ EDAC - Error Detection And Correction
purposes. purposes.
When the subsystem was pushed upstream for the first time, on When the subsystem was pushed upstream for the first time, on
Kernel 2.6.16, for the first time, it was renamed to ``EDAC``. Kernel 2.6.16, it was renamed to ``EDAC``.
Purpose Purpose
------- -------
...@@ -351,15 +351,17 @@ controllers. The following example will assume 2 channels: ...@@ -351,15 +351,17 @@ controllers. The following example will assume 2 channels:
+------------+-----------+-----------+ +------------+-----------+-----------+
| | ``ch0`` | ``ch1`` | | | ``ch0`` | ``ch1`` |
+============+===========+===========+ +============+===========+===========+
| ``csrow0`` | DIMM_A0 | DIMM_B0 | | |**DIMM_A0**|**DIMM_B0**|
| | rank0 | rank0 | +------------+-----------+-----------+
+------------+ - | - | | ``csrow0`` | rank0 | rank0 |
+------------+-----------+-----------+
| ``csrow1`` | rank1 | rank1 | | ``csrow1`` | rank1 | rank1 |
+------------+-----------+-----------+ +------------+-----------+-----------+
| ``csrow2`` | DIMM_A1 | DIMM_B1 | | |**DIMM_A1**|**DIMM_B1**|
| | rank0 | rank0 | +------------+-----------+-----------+
+------------+ - | - | | ``csrow2`` | rank0 | rank0 |
| ``csrow3`` | rank1 | rank1 | +------------+-----------+-----------+
| ``csrow3`` | rank1 | rank1 |
+------------+-----------+-----------+ +------------+-----------+-----------+
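The mapping in the table can also be written down as arithmetic. The sketch below is valid only for this example layout (2 channels, 2 dual-rank DIMMs per channel); ``csrow_to_label()`` is a hypothetical helper, not an EDAC API.

```c
#include <stdio.h>

/*
 * Format the DIMM label and rank for a (csrow, channel) pair of the
 * example table. Returns 0 on success, -1 if the pair is out of range.
 */
int csrow_to_label(int csrow, int channel, char *buf, size_t len)
{
	if (csrow < 0 || csrow > 3 || channel < 0 || channel > 1)
		return -1;

	/* Channel picks the letter, csrow / 2 the slot, csrow % 2 the rank. */
	snprintf(buf, len, "DIMM_%c%d rank%d",
		 'A' + channel, csrow / 2, csrow % 2);
	return 0;
}
```

With this layout, ``csrow2`` on ``ch1`` resolves to rank0 of DIMM_B1, as in the table.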
In the above example, there are 4 physical slots on the motherboard In the above example, there are 4 physical slots on the motherboard
......
...@@ -102,6 +102,30 @@ See the ``type_of_loader`` and ``ext_loader_ver`` fields in ...@@ -102,6 +102,30 @@ See the ``type_of_loader`` and ``ext_loader_ver`` fields in
:doc:`/x86/boot` for additional information. :doc:`/x86/boot` for additional information.
bpf_stats_enabled
=================
Controls whether the kernel should collect statistics on BPF programs
(total time spent running, number of times run...). Enabling
statistics causes a slight reduction in performance on each program
run. The statistics can be seen using ``bpftool``.
= ===================================
0 Don't collect statistics (default).
1 Collect statistics.
= ===================================
cad_pid
=======
This is the pid which will be signalled on reboot (notably, by
Ctrl-Alt-Delete). Writing a value to this file which doesn't
correspond to a running process will result in ``-ESRCH``.
See also `ctrl-alt-del`_.
cap_last_cap cap_last_cap
============ ============
...@@ -241,6 +265,40 @@ domain names are in general different. For a detailed discussion ...@@ -241,6 +265,40 @@ domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page. see the ``hostname(1)`` man page.
firmware_config
===============
See :doc:`/driver-api/firmware/fallback-mechanisms`.
The entries in this directory allow the firmware loader helper
fallback to be controlled:
* ``force_sysfs_fallback``, when set to 1, forces the use of the
fallback;
* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.
ftrace_dump_on_oops
===================
Determines whether ``ftrace_dump()`` should be called on an oops (or
kernel panic). This will output the contents of the ftrace buffers to
the console. This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.
= ===================================================
0 Disabled (default).
1 Dump buffers of all CPUs.
2 Dump the buffer of the CPU that triggered the oops.
= ===================================================
ftrace_enabled, stack_tracer_enabled
====================================
See :doc:`/trace/ftrace`.
hardlockup_all_cpu_backtrace hardlockup_all_cpu_backtrace
============================ ============================
...@@ -344,6 +402,25 @@ Controls whether the panic kmsg data should be reported to Hyper-V. ...@@ -344,6 +402,25 @@ Controls whether the panic kmsg data should be reported to Hyper-V.
= ========================================================= = =========================================================
ignore-unaligned-usertrap
=========================
On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
currently, ``arc`` and ``ia64``), controls whether all unaligned traps
are logged.
= =============================================================
0 Log all unaligned accesses.
1 Only warn the first time a process traps. This is the default
setting.
= =============================================================
See also `unaligned-trap`_ and `unaligned-dump-stack`_. On ``ia64``,
this allows system administrators to override the
``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
kexec_load_disabled kexec_load_disabled
=================== ===================
...@@ -459,6 +536,15 @@ Notes: ...@@ -459,6 +536,15 @@ Notes:
successful IPC object allocation. If an IPC object allocation syscall successful IPC object allocation. If an IPC object allocation syscall
fails, it is undefined if the value remains unmodified or is reset to -1. fails, it is undefined if the value remains unmodified or is reset to -1.
ngroups_max
===========
Maximum number of supplementary groups, *i.e.* the maximum size which
``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.
nmi_watchdog nmi_watchdog
============ ============
...@@ -877,7 +963,7 @@ this sysctl interface anymore. ...@@ -877,7 +963,7 @@ this sysctl interface anymore.
pty pty
=== ===
See Documentation/filesystems/devpts.txt. See Documentation/filesystems/devpts.rst.
randomize_va_space randomize_va_space
...@@ -1173,6 +1259,65 @@ If a value outside of this range is written to ``threads-max`` an ...@@ -1173,6 +1259,65 @@ If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs. ``EINVAL`` error occurs.
traceoff_on_warning
===================
When set, disables tracing (see :doc:`/trace/ftrace`) when a
``WARN()`` is hit.
tracepoint_printk
=================
When tracepoints are sent to printk() (enabled by the ``tp_printk``
boot parameter), this entry provides runtime control::
echo 0 > /proc/sys/kernel/tracepoint_printk
will stop tracepoints from being sent to printk(), and::
echo 1 > /proc/sys/kernel/tracepoint_printk
will send them to printk() again.
This only works if the kernel was booted with ``tp_printk`` enabled.
See :doc:`/admin-guide/kernel-parameters` and
:doc:`/trace/boottime-trace`.
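The ``echo`` commands above can also be issued from a program. The following is a minimal userspace sketch; ``write_sysctl_int()`` and ``read_sysctl_int()`` are hypothetical helpers that take the path of the entry (for example ``/proc/sys/kernel/tracepoint_printk``) as a parameter.

```c
#include <stdio.h>

/* Write a decimal value to a sysctl file. Returns 0 on success, -1 on error. */
int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* Buffered data is flushed on fclose(), so its result matters too. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read a decimal value from a sysctl file. Returns 0 on success, -1 on error. */
int read_sysctl_int(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "%d", out);
	fclose(f);
	return n == 1 ? 0 : -1;
}
```

Writing to entries under ``/proc/sys/kernel`` normally requires root privileges, so callers should expect ``write_sysctl_int()`` to fail otherwise.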
.. _unaligned-dump-stack:
unaligned-dump-stack (ia64)
===========================
When logging unaligned accesses, controls whether the stack is
dumped.
= ===================================================
0 Do not dump the stack. This is the default setting.
1 Dump the stack.
= ===================================================
See also `ignore-unaligned-usertrap`_.
unaligned-trap
==============
On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
``arc`` and ``parisc``), controls whether unaligned traps are caught
and emulated (instead of failing).
= ========================================================
0 Do not emulate unaligned accesses.
1 Emulate unaligned accesses. This is the default setting.
= ========================================================
See also `ignore-unaligned-usertrap`_.
unknown_nmi_panic unknown_nmi_panic
================= =================
...@@ -1184,6 +1329,16 @@ NMI switch that most IA32 servers have fires unknown NMI up, for ...@@ -1184,6 +1329,16 @@ NMI switch that most IA32 servers have fires unknown NMI up, for
example. If a system hangs up, try pressing the NMI switch. example. If a system hangs up, try pressing the NMI switch.
unprivileged_bpf_disabled
=========================
Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` will return
``-EPERM``.
Once set, this can't be cleared.
watchdog watchdog
======== ========
......
...@@ -24,13 +24,13 @@ optional external memory-mapped interface. ...@@ -24,13 +24,13 @@ optional external memory-mapped interface.
Version 1 of the Activity Monitors architecture implements a counter group Version 1 of the Activity Monitors architecture implements a counter group
of four fixed and architecturally defined 64-bit event counters. of four fixed and architecturally defined 64-bit event counters.
- CPU cycle counter: increments at the frequency of the CPU. - CPU cycle counter: increments at the frequency of the CPU.
- Constant counter: increments at the fixed frequency of the system - Constant counter: increments at the fixed frequency of the system
clock. clock.
- Instructions retired: increments with every architecturally executed - Instructions retired: increments with every architecturally executed
instruction. instruction.
- Memory stall cycles: counts instruction dispatch stall cycles caused by - Memory stall cycles: counts instruction dispatch stall cycles caused by
misses in the last level cache within the clock domain. misses in the last level cache within the clock domain.
When in WFI or WFE these counters do not increment. When in WFI or WFE these counters do not increment.
...@@ -59,11 +59,11 @@ counters, only the presence of the extension. ...@@ -59,11 +59,11 @@ counters, only the presence of the extension.
Firmware (code running at higher exception levels, e.g. arm-tf) support is Firmware (code running at higher exception levels, e.g. arm-tf) support is
needed to: needed to:
- Enable access for lower exception levels (EL2 and EL1) to the AMU - Enable access for lower exception levels (EL2 and EL1) to the AMU
registers. registers.
- Enable the counters. If not enabled these will read as 0. - Enable the counters. If not enabled these will read as 0.
- Save/restore the counters before/after the CPU is being put/brought up - Save/restore the counters before/after the CPU is being put/brought up
from the 'off' power state. from the 'off' power state.
When using kernels that have this feature enabled but boot with broken When using kernels that have this feature enabled but boot with broken
firmware the user may experience panics or lockups when accessing the firmware the user may experience panics or lockups when accessing the
...@@ -81,10 +81,10 @@ are not trapped in EL2/EL3. ...@@ -81,10 +81,10 @@ are not trapped in EL2/EL3.
The fixed counters of AMUv1 are accessible through the following system The fixed counters of AMUv1 are accessible through the following system
register definitions: register definitions:
- SYS_AMEVCNTR0_CORE_EL0 - SYS_AMEVCNTR0_CORE_EL0
- SYS_AMEVCNTR0_CONST_EL0 - SYS_AMEVCNTR0_CONST_EL0
- SYS_AMEVCNTR0_INST_RET_EL0 - SYS_AMEVCNTR0_INST_RET_EL0
- SYS_AMEVCNTR0_MEM_STALL_EL0 - SYS_AMEVCNTR0_MEM_STALL_EL0
Auxiliary platform specific counters can be accessed using Auxiliary platform specific counters can be accessed using
SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15. SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15.
...@@ -97,9 +97,9 @@ Userspace access ...@@ -97,9 +97,9 @@ Userspace access
Currently, access from userspace to the AMU registers is disabled due to: Currently, access from userspace to the AMU registers is disabled due to:
- Security reasons: they might expose information about code executed in - Security reasons: they might expose information about code executed in
secure mode. secure mode.
- Purpose: AMU counters are intended for system management use. - Purpose: AMU counters are intended for system management use.
Also, the presence of the feature is not visible to userspace. Also, the presence of the feature is not visible to userspace.
...@@ -110,8 +110,8 @@ Virtualization ...@@ -110,8 +110,8 @@ Virtualization
Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM
guest side is disabled due to: guest side is disabled due to:
- Security reasons: they might expose information about code executed - Security reasons: they might expose information about code executed
by other guests or the host. by other guests or the host.
Any attempt to access the AMU registers will result in an UNDEFINED Any attempt to access the AMU registers will result in an UNDEFINED
exception being injected into the guest. exception being injected into the guest.
...@@ -173,8 +173,10 @@ Before jumping into the kernel, the following conditions must be met: ...@@ -173,8 +173,10 @@ Before jumping into the kernel, the following conditions must be met:
- Caches, MMUs - Caches, MMUs
The MMU must be off. The MMU must be off.
The instruction cache may be on or off, and must not hold any stale The instruction cache may be on or off, and must not hold any stale
entries corresponding to the loaded kernel image. entries corresponding to the loaded kernel image.
The address range corresponding to the loaded kernel image must be The address range corresponding to the loaded kernel image must be
cleaned to the PoC. In the presence of a system cache or other cleaned to the PoC. In the presence of a system cache or other
coherent masters with caches enabled, this will typically require coherent masters with caches enabled, this will typically require
...@@ -239,6 +241,7 @@ Before jumping into the kernel, the following conditions must be met: ...@@ -239,6 +241,7 @@ Before jumping into the kernel, the following conditions must be met:
- The DT or ACPI tables must describe a GICv2 interrupt controller. - The DT or ACPI tables must describe a GICv2 interrupt controller.
For CPUs with pointer authentication functionality: For CPUs with pointer authentication functionality:
- If EL3 is present: - If EL3 is present:
- SCR_EL3.APK (bit 16) must be initialised to 0b1 - SCR_EL3.APK (bit 16) must be initialised to 0b1
...@@ -250,18 +253,22 @@ Before jumping into the kernel, the following conditions must be met: ...@@ -250,18 +253,22 @@ Before jumping into the kernel, the following conditions must be met:
- HCR_EL2.API (bit 41) must be initialised to 0b1 - HCR_EL2.API (bit 41) must be initialised to 0b1
For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
- If EL3 is present: - If EL3 is present:
CPTR_EL3.TAM (bit 30) must be initialised to 0b0
CPTR_EL2.TAM (bit 30) must be initialised to 0b0 - CPTR_EL3.TAM (bit 30) must be initialised to 0b0
AMCNTENSET0_EL0 must be initialised to 0b1111 - CPTR_EL2.TAM (bit 30) must be initialised to 0b0
AMCNTENSET1_EL0 must be initialised to a platform specific value - AMCNTENSET0_EL0 must be initialised to 0b1111
having 0b1 set for the corresponding bit for each of the auxiliary - AMCNTENSET1_EL0 must be initialised to a platform specific value
counters present. having 0b1 set for the corresponding bit for each of the auxiliary
counters present.
- If the kernel is entered at EL1: - If the kernel is entered at EL1:
AMCNTENSET0_EL0 must be initialised to 0b1111
AMCNTENSET1_EL0 must be initialised to a platform specific value - AMCNTENSET0_EL0 must be initialised to 0b1111
having 0b1 set for the corresponding bit for each of the auxiliary - AMCNTENSET1_EL0 must be initialised to a platform specific value
counters present. having 0b1 set for the corresponding bit for each of the auxiliary
counters present.
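The AMUv1 requirements above can be expressed as a check over plain register values. This is a userspace model for illustration only: it does not read any system register, and ``amu_boot_state_ok()`` and ``aux_present_mask`` are hypothetical names introduced here.

```c
#include <stdint.h>

#define CPTR_TAM_BIT 30 /* TAM is bit 30 in both CPTR_EL3 and CPTR_EL2 */

/*
 * Return 1 if the given register values satisfy the AMUv1 boot
 * requirements listed above, 0 otherwise. aux_present_mask has one bit
 * set for each auxiliary counter the platform implements.
 */
int amu_boot_state_ok(uint64_t cptr_el3, uint64_t cptr_el2,
		      uint64_t amcntenset0, uint64_t amcntenset1,
		      uint64_t aux_present_mask)
{
	const uint64_t tam = UINT64_C(1) << CPTR_TAM_BIT;

	/* Accesses to the AMU registers must not be trapped. */
	if ((cptr_el3 & tam) || (cptr_el2 & tam))
		return 0;

	/* All four fixed counters must be enabled. */
	if ((amcntenset0 & 0xf) != 0xf)
		return 0;

	/* Every auxiliary counter that is present must be enabled. */
	return (amcntenset1 & aux_present_mask) == aux_present_mask;
}
```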
The requirements described above for CPU mode, caches, MMUs, architected The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must timers, coherency and system registers apply to all CPUs. All CPUs must
...@@ -305,7 +312,8 @@ following manner: ...@@ -305,7 +312,8 @@ following manner:
Documentation/devicetree/bindings/arm/psci.yaml. Documentation/devicetree/bindings/arm/psci.yaml.
- Secondary CPU general-purpose register settings - Secondary CPU general-purpose register settings
x0 = 0 (reserved for future use)
x1 = 0 (reserved for future use) - x0 = 0 (reserved for future use)
x2 = 0 (reserved for future use) - x1 = 0 (reserved for future use)
x3 = 0 (reserved for future use) - x2 = 0 (reserved for future use)
- x3 = 0 (reserved for future use)
...@@ -388,44 +388,6 @@ if major == 1 and minor < 6: ...@@ -388,44 +388,6 @@ if major == 1 and minor < 6:
# author, documentclass [howto, manual, or own class]). # author, documentclass [howto, manual, or own class]).
# Sorted in alphabetical order # Sorted in alphabetical order
latex_documents = [ latex_documents = [
('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation',
'The kernel development community', 'manual'),
('core-api/index', 'core-api.tex', 'The kernel core API manual',
'The kernel development community', 'manual'),
('crypto/index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
'The kernel development community', 'manual'),
('dev-tools/index', 'dev-tools.tex', 'Development tools for the Kernel',
'The kernel development community', 'manual'),
('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
'The kernel development community', 'manual'),
('driver-api/index', 'driver-api.tex', 'The kernel driver API manual',
'The kernel development community', 'manual'),
('filesystems/index', 'filesystems.tex', 'Linux Filesystems API',
'The kernel development community', 'manual'),
('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide',
'ext4 Community', 'manual'),
('filesystems/ext4/index', 'ext4-data-structures.tex',
'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
'The kernel development community', 'manual'),
('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
'The kernel development community', 'manual'),
('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel',
'The kernel development community', 'manual'),
('media/index', 'media.tex', 'Linux Media Subsystem Documentation',
'The kernel development community', 'manual'),
('networking/index', 'networking.tex', 'Linux Networking Documentation',
'The kernel development community', 'manual'),
('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
'The kernel development community', 'manual'),
('security/index', 'security.tex', 'The kernel security subsystem manual',
'The kernel development community', 'manual'),
('sh/index', 'sh.tex', 'SuperH architecture implementation manual',
'The kernel development community', 'manual'),
('sound/index', 'sound.tex', 'Linux Sound Subsystem Documentation',
'The kernel development community', 'manual'),
('userspace-api/index', 'userspace-api.tex', 'The Linux kernel user-space API guide',
'The kernel development community', 'manual'),
] ]
# Add all other index files from Documentation/ subdirectories # Add all other index files from Documentation/ subdirectories
......
...@@ -18,6 +18,7 @@ it. ...@@ -18,6 +18,7 @@ it.
kernel-api kernel-api
workqueue workqueue
printk-basics
printk-formats printk-formats
symbol-namespaces symbol-namespaces
...@@ -30,10 +31,12 @@ Library functionality that is used throughout the kernel. ...@@ -30,10 +31,12 @@ Library functionality that is used throughout the kernel.
:maxdepth: 1 :maxdepth: 1
kobject kobject
kref
assoc_array assoc_array
xarray xarray
idr idr
circular-buffers circular-buffers
rbtree
generic-radix-tree generic-radix-tree
packing packing
timekeeping timekeeping
...@@ -50,6 +53,7 @@ How Linux keeps everything from happening at the same time. See ...@@ -50,6 +53,7 @@ How Linux keeps everything from happening at the same time. See
atomic_ops atomic_ops
refcount-vs-atomic refcount-vs-atomic
irq/index
local_ops local_ops
padata padata
../RCU/index ../RCU/index
...@@ -78,6 +82,10 @@ more memory-management documentation in :doc:`/vm/index`. ...@@ -78,6 +82,10 @@ more memory-management documentation in :doc:`/vm/index`.
:maxdepth: 1 :maxdepth: 1
memory-allocation memory-allocation
dma-api
dma-api-howto
dma-attributes
dma-isa-lpc
mm-api mm-api
genalloc genalloc
pin_user_pages pin_user_pages
...@@ -92,6 +100,7 @@ Interfaces for kernel debugging ...@@ -92,6 +100,7 @@ Interfaces for kernel debugging
debug-objects debug-objects
tracepoint tracepoint
debugging-via-ohci1394
Everything else Everything else
=============== ===============
......
====
IRQs
====
.. toctree::
:maxdepth: 1
concepts
irq-affinity
irq-domain
irqflags-tracing
...@@ -263,7 +263,8 @@ needs to: ...@@ -263,7 +263,8 @@ needs to:
Hierarchy irq_domain is in no way x86 specific, and is heavily used to Hierarchy irq_domain is in no way x86 specific, and is heavily used to
support other architectures, such as ARM, ARM64 etc. support other architectures, such as ARM, ARM64 etc.
=== Debugging === Debugging
=========
Most of the internals of the IRQ subsystem are exposed in debugfs by Most of the internals of the IRQ subsystem are exposed in debugfs by
turning CONFIG_GENERIC_IRQ_DEBUGFS on. turning CONFIG_GENERIC_IRQ_DEBUGFS on.
...@@ -80,11 +80,11 @@ what is the pointer to the containing structure? You must avoid tricks ...@@ -80,11 +80,11 @@ what is the pointer to the containing structure? You must avoid tricks
(such as assuming that the kobject is at the beginning of the structure) (such as assuming that the kobject is at the beginning of the structure)
and, instead, use the container_of() macro, found in ``<linux/kernel.h>``:: and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
container_of(pointer, type, member) container_of(ptr, type, member)
where: where:
* ``pointer`` is the pointer to the embedded kobject, * ``ptr`` is the pointer to the embedded kobject,
* ``type`` is the type of the containing structure, and * ``type`` is the type of the containing structure, and
* ``member`` is the name of the structure field to which ``pointer`` points. * ``member`` is the name of the structure field to which ``ptr`` points.
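To see why container_of() works even when the kobject is not the first member, here is a userspace sketch. The macro below is a simplified stand-in built on ``offsetof()`` without the kernel's type checking, and ``struct kobject``, ``struct my_device`` and ``to_my_device()`` are hypothetical, reduced definitions used only for this illustration.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-in; the real struct kobject has many more fields. */
struct kobject {
	const char *name;
};

/* Hypothetical containing structure. */
struct my_device {
	int id;
	struct kobject kobj; /* deliberately not the first member */
};

/* Recover the containing structure from the embedded kobject. */
struct my_device *to_my_device(struct kobject *kobj)
{
	return container_of(kobj, struct my_device, kobj);
}
```

Given ``struct my_device d``, the call ``to_my_device(&d.kobj)`` returns ``&d`` regardless of where ``kobj`` sits inside the structure.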
...@@ -140,7 +140,7 @@ the name of the kobject, call kobject_rename():: ...@@ -140,7 +140,7 @@ the name of the kobject, call kobject_rename()::
int kobject_rename(struct kobject *kobj, const char *new_name); int kobject_rename(struct kobject *kobj, const char *new_name);
kobject_rename does not perform any locking or have a solid notion of kobject_rename() does not perform any locking or have a solid notion of
what names are valid so the caller must provide their own sanity checking what names are valid so the caller must provide their own sanity checking
and serialization. and serialization.
...@@ -210,7 +210,7 @@ statically and will warn the developer of this improper usage. ...@@ -210,7 +210,7 @@ statically and will warn the developer of this improper usage.
If all that you want to use a kobject for is to provide a reference counter If all that you want to use a kobject for is to provide a reference counter
for your structure, please use the struct kref instead; a kobject would be for your structure, please use the struct kref instead; a kobject would be
overkill. For more information on how to use struct kref, please see the overkill. For more information on how to use struct kref, please see the
file Documentation/kref.txt in the Linux kernel source tree. file Documentation/core-api/kref.rst in the Linux kernel source tree.
Creating "simple" kobjects Creating "simple" kobjects
...@@ -222,17 +222,17 @@ ksets, show and store functions, and other details. This is the one ...@@ -222,17 +222,17 @@ ksets, show and store functions, and other details. This is the one
exception where a single kobject should be created. To create such an exception where a single kobject should be created. To create such an
entry, use the function:: entry, use the function::
struct kobject *kobject_create_and_add(char *name, struct kobject *parent); struct kobject *kobject_create_and_add(const char *name, struct kobject *parent);
This function will create a kobject and place it in sysfs in the location This function will create a kobject and place it in sysfs in the location
underneath the specified parent kobject. To create simple attributes underneath the specified parent kobject. To create simple attributes
associated with this kobject, use:: associated with this kobject, use::
int sysfs_create_file(struct kobject *kobj, struct attribute *attr); int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
or:: or::
int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp); int sysfs_create_group(struct kobject *kobj, const struct attribute_group *grp);
Both types of attributes used here, with a kobject that has been created Both types of attributes used here, with a kobject that has been created
with the kobject_create_and_add(), can be of type kobj_attribute, so no with the kobject_create_and_add(), can be of type kobj_attribute, so no
...@@ -300,8 +300,10 @@ kobj_type:: ...@@ -300,8 +300,10 @@ kobj_type::
void (*release)(struct kobject *kobj); void (*release)(struct kobject *kobj);
const struct sysfs_ops *sysfs_ops; const struct sysfs_ops *sysfs_ops;
struct attribute **default_attrs; struct attribute **default_attrs;
const struct attribute_group **default_groups;
const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj); const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
const void *(*namespace)(struct kobject *kobj); const void *(*namespace)(struct kobject *kobj);
void (*get_ownership)(struct kobject *kobj, kuid_t *uid, kgid_t *gid);
}; };
This structure is used to describe a particular type of kobject (or, more This structure is used to describe a particular type of kobject (or, more
...@@ -352,12 +354,12 @@ created and never declared statically or on the stack. To create a new ...@@ -352,12 +354,12 @@ created and never declared statically or on the stack. To create a new
kset use:: kset use::
struct kset *kset_create_and_add(const char *name, struct kset *kset_create_and_add(const char *name,
struct kset_uevent_ops *u, const struct kset_uevent_ops *uevent_ops,
struct kobject *parent); struct kobject *parent_kobj);
When you are finished with the kset, call:: When you are finished with the kset, call::
void kset_unregister(struct kset *kset); void kset_unregister(struct kset *k);
to destroy it. This removes the kset from sysfs and decrements its reference to destroy it. This removes the kset from sysfs and decrements its reference
count. When the reference count goes to zero, the kset will be released. count. When the reference count goes to zero, the kset will be released.
...@@ -371,9 +373,9 @@ If a kset wishes to control the uevent operations of the kobjects ...@@ -371,9 +373,9 @@ If a kset wishes to control the uevent operations of the kobjects
associated with it, it can use the struct kset_uevent_ops to handle it:: associated with it, it can use the struct kset_uevent_ops to handle it::
struct kset_uevent_ops { struct kset_uevent_ops {
int (*filter)(struct kset *kset, struct kobject *kobj); int (* const filter)(struct kset *kset, struct kobject *kobj);
const char *(*name)(struct kset *kset, struct kobject *kobj); const char *(* const name)(struct kset *kset, struct kobject *kobj);
int (*uevent)(struct kset *kset, struct kobject *kobj, int (* const uevent)(struct kset *kset, struct kobject *kobj,
struct kobj_uevent_env *env); struct kobj_uevent_env *env);
}; };
......
.. SPDX-License-Identifier: GPL-2.0
===========================
Message logging with printk
===========================
printk() is one of the most widely known functions in the Linux kernel. It's the
standard tool we have for printing messages and usually the most basic way of
tracing and debugging. If you're familiar with printf(3) you can tell printk()
is based on it, although it has some functional differences:
- printk() messages can specify a log level.
- the format string, while largely compatible with C99, doesn't follow the
exact same specification. It has some extensions and a few limitations
(no ``%n`` or floating point conversion specifiers). See :ref:`How to get
printk format specifiers right <printk-specifiers>`.
All printk() messages are printed to the kernel log buffer, which is a ring
buffer exported to userspace through /dev/kmsg. The usual way to read it is
using ``dmesg``.
printk() is typically used like this::
printk(KERN_INFO "Message: %s\n", arg);
where ``KERN_INFO`` is the log level (note that it's concatenated to the format
string; the log level is not a separate argument). The available log levels are:
+----------------+--------+-----------------------------------------------+
| Name | String | Alias function |
+================+========+===============================================+
| KERN_EMERG | "0" | pr_emerg() |
+----------------+--------+-----------------------------------------------+
| KERN_ALERT | "1" | pr_alert() |
+----------------+--------+-----------------------------------------------+
| KERN_CRIT | "2" | pr_crit() |
+----------------+--------+-----------------------------------------------+
| KERN_ERR | "3" | pr_err() |
+----------------+--------+-----------------------------------------------+
| KERN_WARNING | "4" | pr_warn() |
+----------------+--------+-----------------------------------------------+
| KERN_NOTICE | "5" | pr_notice() |
+----------------+--------+-----------------------------------------------+
| KERN_INFO | "6" | pr_info() |
+----------------+--------+-----------------------------------------------+
| KERN_DEBUG | "7" | pr_debug() and pr_devel() if DEBUG is defined |
+----------------+--------+-----------------------------------------------+
| KERN_DEFAULT | "" | |
+----------------+--------+-----------------------------------------------+
| KERN_CONT | "c" | pr_cont() |
+----------------+--------+-----------------------------------------------+
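Because the log level is only a prefix on the format string, it can be recovered by inspecting the first bytes of a message. The following is a minimal userspace sketch, not kernel code. It assumes the prefix layout used by ``include/linux/kern_levels.h`` (an ASCII SOH byte followed by the level character from the table above); the local macro definitions and the ``demo_msg_level()`` helper are hypothetical stand-ins for illustration.

```c
#include <stddef.h>

/* Local stand-ins mirroring include/linux/kern_levels.h (assumed layout). */
#define KERN_SOH   "\001"          /* ASCII Start Of Header */
#define KERN_ERR   KERN_SOH "3"
#define KERN_INFO  KERN_SOH "6"

/*
 * Hypothetical helper: return the numeric level (0..7) encoded at the
 * start of msg, or -1 if msg carries no numeric level prefix.
 */
static int demo_msg_level(const char *msg)
{
	if (msg == NULL || msg[0] != '\001')
		return -1;
	if (msg[1] >= '0' && msg[1] <= '7')
		return msg[1] - '0';
	return -1;
}
```

Here ``demo_msg_level(KERN_INFO "Message: %s\n")`` evaluates to 6, since adjacent string literals are merged by the compiler into a single format string.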
The log level specifies the importance of a message. The kernel decides whether
to show the message immediately (printing it to the current console) depending
on its log level and the current *console_loglevel* (a kernel variable). If the
message priority is higher (lower log level value) than the *console_loglevel*
the message will be printed to the console.
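The comparison described above can be sketched as a one-line predicate. This is an illustrative userspace model only; ``demo_shown_on_console()`` is a hypothetical name and the real decision is made inside the kernel's console code.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: a message reaches the console when its numeric
 * level is strictly lower (that is, more severe) than console_loglevel.
 */
static bool demo_shown_on_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

With a console_loglevel of 4, a ``KERN_ERR`` (3) message would be shown while a ``KERN_WARNING`` (4) message would only go to the log buffer.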
If the log level is omitted, the message is printed with ``KERN_DEFAULT``
level.
You can check the current *console_loglevel* with::
$ cat /proc/sys/kernel/printk
4 4 1 7
The result shows the *current*, *default*, *minimum* and *boot-time-default* log
levels.
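As a rough illustration of how a userspace tool might interpret that line, the sketch below splits the four whitespace-separated integers into named fields. The struct and function names are hypothetical and chosen only to match the field order described above.

```c
#include <stdio.h>

/* Field order as listed above: current, default, minimum, boot-time-default. */
struct demo_printk_levels {
	int current_level;
	int default_level;
	int minimum_level;
	int boot_default_level;
};

/* Returns 0 on success, -1 if line does not start with four integers. */
static int demo_parse_printk_levels(const char *line,
				    struct demo_printk_levels *out)
{
	int n = sscanf(line, "%d %d %d %d",
		       &out->current_level, &out->default_level,
		       &out->minimum_level, &out->boot_default_level);

	return n == 4 ? 0 : -1;
}
```

A caller would typically read one line from ``/proc/sys/kernel/printk`` with ``fgets()`` and pass it to this helper.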
To change the current console_loglevel simply write the desired level to
``/proc/sys/kernel/printk``. For example, to print all messages to the console::
# echo 8 > /proc/sys/kernel/printk
Another way, using ``dmesg``::
# dmesg -n 5
sets the console_loglevel to print KERN_WARNING (4) or more severe messages to the
console. See ``dmesg(1)`` for more information.
As an alternative to printk() you can use the ``pr_*()`` aliases for
logging. This family of macros embeds the log level in the macro names. For
example::
pr_info("Info message no. %d\n", msg_num);
prints a ``KERN_INFO`` message.
Besides being more concise than the equivalent printk() calls, they can use a
common definition for the format string through the pr_fmt() macro. For
instance, defining this at the top of a source file (before any ``#include``
directive)::
#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
would prefix every pr_*() message in that file with the module and function name
that originated the message.
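The expansion can be tried out in userspace. In the sketch below, ``pr_info()`` is a stand-in that formats into a buffer instead of calling printk(), and ``KBUILD_MODNAME`` is defined by hand (in a real build it is supplied by kbuild); ``demo_last_msg``, ``demo_report()`` and ``demo_last_msg_is()`` are hypothetical names. The ``##__VA_ARGS__`` form is a GNU extension, assumed here because the kernel relies on it as well.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: normally provided by the kernel build system. */
#define KBUILD_MODNAME "demo_mod"

#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__

static char demo_last_msg[128];

/* Userspace stand-in for pr_info(): format into demo_last_msg. */
#define pr_info(fmt, ...) \
	snprintf(demo_last_msg, sizeof(demo_last_msg), pr_fmt(fmt), ##__VA_ARGS__)

static const char *demo_report(int msg_num)
{
	pr_info("Info message no. %d\n", msg_num);
	return demo_last_msg;
}

/* Convenience check used when experimenting with the sketch. */
static int demo_last_msg_is(const char *expected)
{
	return strcmp(demo_last_msg, expected) == 0;
}
```

Calling ``demo_report(7)`` leaves ``demo_mod:demo_report: Info message no. 7`` (plus a newline) in the buffer, showing that the prefix arguments are inserted ahead of the caller's own arguments.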
For debugging purposes there are also two conditionally-compiled macros:
pr_debug() and pr_devel(). Both are compiled out unless ``DEBUG`` is defined;
pr_debug() is also compiled in when ``CONFIG_DYNAMIC_DEBUG`` is set.
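The effect of conditional compilation can be modelled in userspace as follows. This is a simplified sketch that only covers the ``DEBUG`` case; the ``pr_debug()`` definition here is a stand-in, and ``demo_emitted`` and ``demo_probe()`` are hypothetical names. The ``if (0)`` form in the disabled branch keeps the compiler checking the format string and arguments while generating no call.

```c
#include <stdio.h>

static int demo_emitted;	/* counts messages that were actually emitted */

#ifdef DEBUG
/* DEBUG defined: the message is emitted (to stderr in this sketch). */
#define pr_debug(fmt, ...) \
	(demo_emitted++, fprintf(stderr, fmt, ##__VA_ARGS__))
#else
/* DEBUG not defined: arguments are still type-checked, nothing runs. */
#define pr_debug(fmt, ...) \
	do { if (0) fprintf(stderr, fmt, ##__VA_ARGS__); } while (0)
#endif

static int demo_probe(int value)
{
	pr_debug("probing value %d\n", value);
	return demo_emitted;
}
```

Building the same file with and without ``-DDEBUG`` and calling ``demo_probe()`` shows the difference: the counter only advances in the ``DEBUG`` build.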
Function reference
==================
.. kernel-doc:: kernel/printk/printk.c
:functions: printk
.. kernel-doc:: include/linux/printk.h
:functions: pr_emerg pr_alert pr_crit pr_err pr_warn pr_notice pr_info
pr_fmt pr_debug pr_devel pr_cont
...@@ -2,6 +2,8 @@ ...@@ -2,6 +2,8 @@
How to get printk format specifiers right How to get printk format specifiers right
========================================= =========================================
.. _printk-specifiers:
:Author: Randy Dunlap <rdunlap@infradead.org> :Author: Randy Dunlap <rdunlap@infradead.org>
:Author: Andrew Murray <amurray@mpc-data.co.uk> :Author: Andrew Murray <amurray@mpc-data.co.uk>
......
...@@ -6,7 +6,7 @@ Documentation subsystem maintainer entry profile ...@@ -6,7 +6,7 @@ Documentation subsystem maintainer entry profile
The documentation "subsystem" is the central coordinating point for the The documentation "subsystem" is the central coordinating point for the
kernel's documentation and associated infrastructure. It covers the kernel's documentation and associated infrastructure. It covers the
hierarchy under Documentation/ (with the exception of hierarchy under Documentation/ (with the exception of
Documentation/device-tree), various utilities under scripts/ and, at least Documentation/devicetree), various utilities under scripts/ and, at least
some of the time, LICENSES/. some of the time, LICENSES/.
It's worth noting, though, that the boundaries of this subsystem are rather It's worth noting, though, that the boundaries of this subsystem are rather
......
...@@ -11,7 +11,7 @@ course not limited to GPU use cases. ...@@ -11,7 +11,7 @@ course not limited to GPU use cases.
The three main components of this are: (1) dma-buf, representing a The three main components of this are: (1) dma-buf, representing a
sg_table and exposed to userspace as a file descriptor to allow passing sg_table and exposed to userspace as a file descriptor to allow passing
between devices, (2) fence, which provides a mechanism to signal when between devices, (2) fence, which provides a mechanism to signal when
one device as finished access, and (3) reservation, which manages the one device has finished access, and (3) reservation, which manages the
shared or exclusive fence(s) associated with the buffer. shared or exclusive fence(s) associated with the buffer.
Shared DMA Buffers Shared DMA Buffers
...@@ -31,7 +31,7 @@ The exporter ...@@ -31,7 +31,7 @@ The exporter
- implements and manages operations in :c:type:`struct dma_buf_ops - implements and manages operations in :c:type:`struct dma_buf_ops
<dma_buf_ops>` for the buffer, <dma_buf_ops>` for the buffer,
- allows other users to share the buffer by using dma_buf sharing APIs, - allows other users to share the buffer by using dma_buf sharing APIs,
- manages the details of buffer allocation, wrapped int a :c:type:`struct - manages the details of buffer allocation, wrapped in a :c:type:`struct
dma_buf <dma_buf>`, dma_buf <dma_buf>`,
- decides about the actual backing storage where this allocation happens, - decides about the actual backing storage where this allocation happens,
- and takes care of any migration of scatterlist - for all (shared) users of - and takes care of any migration of scatterlist - for all (shared) users of
......
...@@ -50,10 +50,10 @@ Attributes ...@@ -50,10 +50,10 @@ Attributes
Attributes of devices can be exported by a device driver through sysfs. Attributes of devices can be exported by a device driver through sysfs.
Please see Documentation/filesystems/sysfs.txt for more information Please see Documentation/filesystems/sysfs.rst for more information
on how sysfs works. on how sysfs works.
As explained in Documentation/kobject.txt, device attributes must be As explained in Documentation/core-api/kobject.rst, device attributes must be
created before the KOBJ_ADD uevent is generated. The only way to realize created before the KOBJ_ADD uevent is generated. The only way to realize
that is by defining an attribute group. that is by defining an attribute group.
......
...@@ -121,4 +121,4 @@ device-specific data or tunable interfaces. ...@@ -121,4 +121,4 @@ device-specific data or tunable interfaces.
More information about the sysfs directory layout can be found in More information about the sysfs directory layout can be found in
the other documents in this directory and in the file the other documents in this directory and in the file
Documentation/filesystems/sysfs.txt. Documentation/filesystems/sysfs.rst.
...@@ -39,6 +39,7 @@ available subsections can be seen below. ...@@ -39,6 +39,7 @@ available subsections can be seen below.
spi spi
i2c i2c
ipmb ipmb
ipmi
i3c/index i3c/index
interconnect interconnect
devfreq devfreq
......
...@@ -278,8 +278,8 @@ by a region device with a dynamically assigned id (REGION0 - REGION5). ...@@ -278,8 +278,8 @@ by a region device with a dynamically assigned id (REGION0 - REGION5).
be contiguous in DPA-space. be contiguous in DPA-space.
This bus is provided by the kernel under the device This bus is provided by the kernel under the device
/sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and /sys/devices/platform/nfit_test.0 when the nfit_test.ko module from
the nfit_test.ko module is loaded. This not only tests LIBNVDIMM but the tools/testing/nvdimm is loaded. This not only tests LIBNVDIMM but the
acpi_nfit.ko driver as well. acpi_nfit.ko driver as well.
......
================
CPU Idle Cooling
================
Situation: Situation:
---------- ----------
......
...@@ -8,6 +8,7 @@ Thermal ...@@ -8,6 +8,7 @@ Thermal
:maxdepth: 1 :maxdepth: 1
cpu-cooling-api cpu-cooling-api
cpu-idle-cooling
sysfs-api sysfs-api
power_allocator power_allocator
......
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | TODO | | parisc: | TODO |
| powerpc: | ok | | powerpc: | ok |
| riscv: | TODO | | riscv: | ok |
| s390: | ok | | s390: | ok |
| sh: | TODO | | sh: | TODO |
| sparc: | ok | | sparc: | ok |
......
...@@ -22,9 +22,9 @@ ...@@ -22,9 +22,9 @@
| nios2: | TODO | | nios2: | TODO |
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | TODO | | parisc: | TODO |
| powerpc: | TODO | | powerpc: | ok |
| riscv: | TODO | | riscv: | ok |
| s390: | TODO | | s390: | ok |
| sh: | TODO | | sh: | TODO |
| sparc: | TODO | | sparc: | TODO |
| um: | TODO | | um: | TODO |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | TODO | | ia64: | TODO |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | TODO | | arm: | TODO |
| arm64: | TODO | | arm64: | TODO |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | TODO | | ia64: | TODO |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | ok | | ia64: | ok |
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | ok | | parisc: | ok |
| powerpc: | ok | | powerpc: | ok |
| riscv: | ok | | riscv: | TODO |
| s390: | ok | | s390: | ok |
| sh: | ok | | sh: | ok |
| sparc: | ok | | sparc: | ok |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | ok | | ia64: | ok |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | TODO | | ia64: | TODO |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | TODO | | ia64: | TODO |
......
...@@ -16,7 +16,7 @@ ...@@ -16,7 +16,7 @@
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | TODO | | ia64: | TODO |
| m68k: | TODO | | m68k: | TODO |
| microblaze: | TODO | | microblaze: | ok |
| mips: | ok | | mips: | ok |
| nds32: | TODO | | nds32: | TODO |
| nios2: | TODO | | nios2: | TODO |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | ok | | hexagon: | ok |
| ia64: | TODO | | ia64: | TODO |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | ok | | hexagon: | ok |
| ia64: | TODO | | ia64: | TODO |
...@@ -21,7 +21,7 @@ ...@@ -21,7 +21,7 @@
| nds32: | ok | | nds32: | ok |
| nios2: | TODO | | nios2: | TODO |
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | TODO | | parisc: | ok |
| powerpc: | ok | | powerpc: | ok |
| riscv: | TODO | | riscv: | TODO |
| s390: | ok | | s390: | ok |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | TODO | | ia64: | TODO |
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | TODO | | parisc: | TODO |
| powerpc: | ok | | powerpc: | ok |
| riscv: | TODO | | riscv: | ok |
| s390: | ok | | s390: | ok |
| sh: | TODO | | sh: | TODO |
| sparc: | TODO | | sparc: | TODO |
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
| arm: | ok | | arm: | ok |
| arm64: | ok | | arm64: | ok |
| c6x: | TODO | | c6x: | TODO |
| csky: | TODO | | csky: | ok |
| h8300: | TODO | | h8300: | TODO |
| hexagon: | TODO | | hexagon: | TODO |
| ia64: | TODO | | ia64: | TODO |
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | TODO | | parisc: | TODO |
| powerpc: | ok | | powerpc: | ok |
| riscv: | TODO | | riscv: | ok |
| s390: | ok | | s390: | ok |
| sh: | TODO | | sh: | TODO |
| sparc: | TODO | | sparc: | TODO |
......
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | ok | | parisc: | ok |
| powerpc: | ok | | powerpc: | ok |
| riscv: | TODO | | riscv: | ok |
| s390: | ok | | s390: | ok |
| sh: | TODO | | sh: | TODO |
| sparc: | TODO | | sparc: | TODO |
......
...@@ -22,7 +22,7 @@ ...@@ -22,7 +22,7 @@
| nios2: | TODO | | nios2: | TODO |
| openrisc: | TODO | | openrisc: | TODO |
| parisc: | TODO | | parisc: | TODO |
| powerpc: | TODO | | powerpc: | ok |
| riscv: | TODO | | riscv: | TODO |
| s390: | TODO | | s390: | TODO |
| sh: | TODO | | sh: | TODO |
......
...@@ -17,7 +17,7 @@ ...@@ -17,7 +17,7 @@
| ia64: | TODO | | ia64: | TODO |
| m68k: | TODO | | m68k: | TODO |
| microblaze: | TODO | | microblaze: | TODO |
| mips: | TODO | | mips: | ok |
| nds32: | TODO | | nds32: | TODO |
| nios2: | TODO | | nios2: | TODO |
| openrisc: | TODO | | openrisc: | TODO |
......
...@@ -192,4 +192,4 @@ For more information on the Plan 9 Operating System check out ...@@ -192,4 +192,4 @@ For more information on the Plan 9 Operating System check out
http://plan9.bell-labs.com/plan9 http://plan9.bell-labs.com/plan9
For information on Plan 9 from User Space (Plan 9 applications and libraries For information on Plan 9 from User Space (Plan 9 applications and libraries
ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9 ported to Linux/BSD/OSX/etc) check out https://9fans.github.io/plan9port/
.. SPDX-License-Identifier: GPL-2.0
=================
Automount Support
=================
Support is available for filesystems that wish to do automounting Support is available for filesystems that wish to do automounting
support (such as kAFS which can be found in fs/afs/ and NFS in support (such as kAFS which can be found in fs/afs/ and NFS in
fs/nfs/). This facility includes allowing in-kernel mounts to be fs/nfs/). This facility includes allowing in-kernel mounts to be
...@@ -5,13 +12,12 @@ performed and mountpoint degradation to be requested. The latter can ...@@ -5,13 +12,12 @@ performed and mountpoint degradation to be requested. The latter can
also be requested by userspace. also be requested by userspace.
====================== In-Kernel Automounting
IN-KERNEL AUTOMOUNTING
====================== ======================
See section "Mount Traps" of Documentation/filesystems/autofs.rst See section "Mount Traps" of Documentation/filesystems/autofs.rst
Then from userspace, you can just do something like: Then from userspace, you can just do something like::
[root@andromeda root]# mount -t afs \#root.afs. /afs [root@andromeda root]# mount -t afs \#root.afs. /afs
[root@andromeda root]# ls /afs [root@andromeda root]# ls /afs
...@@ -21,7 +27,7 @@ Then from userspace, you can just do something like: ...@@ -21,7 +27,7 @@ Then from userspace, you can just do something like:
[root@andromeda root]# ls /afs/cambridge/afsdoc/ [root@andromeda root]# ls /afs/cambridge/afsdoc/
ChangeLog html LICENSE pdf RELNOTES-1.2.2 ChangeLog html LICENSE pdf RELNOTES-1.2.2
And then if you look in the mountpoint catalogue, you'll see something like: And then if you look in the mountpoint catalogue, you'll see something like::
[root@andromeda root]# cat /proc/mounts [root@andromeda root]# cat /proc/mounts
... ...
...@@ -30,8 +36,7 @@ And then if you look in the mountpoint catalogue, you'll see something like: ...@@ -30,8 +36,7 @@ And then if you look in the mountpoint catalogue, you'll see something like:
#afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0 #afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0
=========================== Automatic Mountpoint Expiry
AUTOMATIC MOUNTPOINT EXPIRY
=========================== ===========================
Automatic expiration of mountpoints is easy, provided you've mounted the Automatic expiration of mountpoints is easy, provided you've mounted the
...@@ -43,7 +48,8 @@ To do expiration, you need to follow these steps: ...@@ -43,7 +48,8 @@ To do expiration, you need to follow these steps:
hung. hung.
(2) When a new mountpoint is created in the ->d_automount method, add (2) When a new mountpoint is created in the ->d_automount method, add
the mnt to the list using mnt_set_expiry() the mnt to the list using mnt_set_expiry()::
mnt_set_expiry(newmnt, &afs_vfsmounts); mnt_set_expiry(newmnt, &afs_vfsmounts);
(3) When you want mountpoints to be expired, call mark_mounts_for_expiry() (3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
...@@ -70,8 +76,7 @@ and the copies of those that are on an expiration list will be added to the ...@@ -70,8 +76,7 @@ and the copies of those that are on an expiration list will be added to the
same expiration list. same expiration list.
======================= Userspace Driven Expiry
USERSPACE DRIVEN EXPIRY
======================= =======================
As an alternative, it is possible for userspace to request expiry of any As an alternative, it is possible for userspace to request expiry of any
......
.. SPDX-License-Identifier: GPL-2.0
Filesystem Caching
==================
.. toctree::
:maxdepth: 2
fscache
object
backend-api
cachefiles
netfs-api
operations
==================================================== .. SPDX-License-Identifier: GPL-2.0
IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
==================================================== ====================================================
In-Kernel Cache Object Representation and Management
====================================================
By: David Howells <dhowells@redhat.com> By: David Howells <dhowells@redhat.com>
Contents: .. Contents:
(*) Representation (*) Representation
...@@ -18,8 +20,7 @@ Contents: ...@@ -18,8 +20,7 @@ Contents:
(*) The set of events. (*) The set of events.
============== Representation
REPRESENTATION
============== ==============
FS-Cache maintains an in-kernel representation of each object that a netfs is FS-Cache maintains an in-kernel representation of each object that a netfs is
...@@ -38,7 +39,7 @@ or even by no objects (it may not be cached). ...@@ -38,7 +39,7 @@ or even by no objects (it may not be cached).
Furthermore, both cookies and objects are hierarchical. The two hierarchies Furthermore, both cookies and objects are hierarchical. The two hierarchies
correspond, but the cookies tree is a superset of the union of the object trees correspond, but the cookies tree is a superset of the union of the object trees
of multiple caches: of multiple caches::
NETFS INDEX TREE : CACHE 1 : CACHE 2 NETFS INDEX TREE : CACHE 1 : CACHE 2
: : : :
...@@ -89,8 +90,7 @@ pointers to the cookies. The cookies themselves and any objects attached to ...@@ -89,8 +90,7 @@ pointers to the cookies. The cookies themselves and any objects attached to
those cookies are hidden from it. those cookies are hidden from it.
=============================== Object Management State Machine
OBJECT MANAGEMENT STATE MACHINE
=============================== ===============================
Within FS-Cache, each active object is managed by its own individual state Within FS-Cache, each active object is managed by its own individual state
...@@ -124,7 +124,7 @@ is not masked, the object will be queued for processing (by calling ...@@ -124,7 +124,7 @@ is not masked, the object will be queued for processing (by calling
fscache_enqueue_object()). fscache_enqueue_object()).
PROVISION OF CPU TIME Provision of CPU Time
--------------------- ---------------------
The work to be done by the various states was given CPU time by the threads of The work to be done by the various states was given CPU time by the threads of
...@@ -141,7 +141,7 @@ because: ...@@ -141,7 +141,7 @@ because:
workqueues don't necessarily have the right numbers of threads. workqueues don't necessarily have the right numbers of threads.
LOCKING SIMPLIFICATION Locking Simplification
---------------------- ----------------------
Because only one worker thread may be operating on any particular object's Because only one worker thread may be operating on any particular object's
...@@ -151,8 +151,7 @@ from the cache backend's representation (fscache_object) - which may be ...@@ -151,8 +151,7 @@ from the cache backend's representation (fscache_object) - which may be
requested from either end. requested from either end.
================= The Set of States
THE SET OF STATES
================= =================
The object state machine has a set of states that it can be in. There are The object state machine has a set of states that it can be in. There are
...@@ -275,19 +274,17 @@ memory and potentially deletes stuff from disk: ...@@ -275,19 +274,17 @@ memory and potentially deletes stuff from disk:
this state. this state.
THE SET OF EVENTS The Set of Events
----------------- -----------------
There are a number of events that can be raised to an object state machine: There are a number of events that can be raised to an object state machine:
(*) FSCACHE_OBJECT_EV_UPDATE FSCACHE_OBJECT_EV_UPDATE
The netfs requested that an object be updated. The state machine will ask The netfs requested that an object be updated. The state machine will ask
the cache backend to update the object, and the cache backend will ask the the cache backend to update the object, and the cache backend will ask the
netfs for details of the change through its cookie definition ops. netfs for details of the change through its cookie definition ops.
(*) FSCACHE_OBJECT_EV_CLEARED FSCACHE_OBJECT_EV_CLEARED
This is signalled in two circumstances: This is signalled in two circumstances:
(a) when an object's last child object is dropped and (a) when an object's last child object is dropped and
...@@ -296,20 +293,16 @@ There are a number of events that can be raised to an object state machine: ...@@ -296,20 +293,16 @@ There are a number of events that can be raised to an object state machine:
This is used to proceed from the dying state. This is used to proceed from the dying state.
(*) FSCACHE_OBJECT_EV_ERROR FSCACHE_OBJECT_EV_ERROR
This is signalled when an I/O error occurs during the processing of some This is signalled when an I/O error occurs during the processing of some
object. object.
(*) FSCACHE_OBJECT_EV_RELEASE FSCACHE_OBJECT_EV_RELEASE, FSCACHE_OBJECT_EV_RETIRE
(*) FSCACHE_OBJECT_EV_RETIRE
These are signalled when the netfs relinquishes a cookie it was using. These are signalled when the netfs relinquishes a cookie it was using.
The event selected depends on whether the netfs asks for the backing The event selected depends on whether the netfs asks for the backing
object to be retired (deleted) or retained. object to be retired (deleted) or retained.
(*) FSCACHE_OBJECT_EV_WITHDRAW FSCACHE_OBJECT_EV_WITHDRAW
This is signalled when the cache backend wants to withdraw an object. This is signalled when the cache backend wants to withdraw an object.
This means that the object will have to be detached from the netfs's This means that the object will have to be detached from the netfs's
cookie. cookie.
......
================================ .. SPDX-License-Identifier: GPL-2.0
ASYNCHRONOUS OPERATIONS HANDLING
================================ ================================
Asynchronous Operations Handling
================================
By: David Howells <dhowells@redhat.com> By: David Howells <dhowells@redhat.com>
Contents: .. Contents:
(*) Overview. (*) Overview.
...@@ -17,8 +19,7 @@ Contents: ...@@ -17,8 +19,7 @@ Contents:
(*) Asynchronous callback. (*) Asynchronous callback.
======== Overview
OVERVIEW
======== ========
FS-Cache has an asynchronous operations handling facility that it uses for its FS-Cache has an asynchronous operations handling facility that it uses for its
...@@ -33,11 +34,10 @@ backend for completion. ...@@ -33,11 +34,10 @@ backend for completion.
To make use of this facility, <linux/fscache-cache.h> should be #included. To make use of this facility, <linux/fscache-cache.h> should be #included.
=============================== Operation Record Initialisation
OPERATION RECORD INITIALISATION
=============================== ===============================
An operation is recorded in an fscache_operation struct: An operation is recorded in an fscache_operation struct::
struct fscache_operation { struct fscache_operation {
union { union {
...@@ -50,7 +50,7 @@ An operation is recorded in an fscache_operation struct: ...@@ -50,7 +50,7 @@ An operation is recorded in an fscache_operation struct:
}; };
Someone wanting to issue an operation should allocate something with this Someone wanting to issue an operation should allocate something with this
struct embedded in it. They should initialise it by calling: struct embedded in it. They should initialise it by calling::
void fscache_operation_init(struct fscache_operation *op, void fscache_operation_init(struct fscache_operation *op,
fscache_operation_release_t release); fscache_operation_release_t release);
...@@ -67,8 +67,7 @@ FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the ...@@ -67,8 +67,7 @@ FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
operation and waited for afterwards. operation and waited for afterwards.
========== Parameters
PARAMETERS
========== ==========
There are a number of parameters that can be set in the operation record's flag There are a number of parameters that can be set in the operation record's flag
...@@ -87,7 +86,7 @@ operations: ...@@ -87,7 +86,7 @@ operations:
If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
before submitting the operation, and the operating thread must wait for it before submitting the operation, and the operating thread must wait for it
to be cleared before proceeding: to be cleared before proceeding::
wait_on_bit(&op->flags, FSCACHE_OP_WAITING, wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
TASK_UNINTERRUPTIBLE); TASK_UNINTERRUPTIBLE);
...@@ -101,7 +100,7 @@ operations: ...@@ -101,7 +100,7 @@ operations:
page to a netfs page after the backing fs has read the page in. page to a netfs page after the backing fs has read the page in.
If this option is used, op->fast_work and op->processor must be If this option is used, op->fast_work and op->processor must be
initialised before submitting the operation: initialised before submitting the operation::
INIT_WORK(&op->fast_work, do_some_work); INIT_WORK(&op->fast_work, do_some_work);
...@@ -114,7 +113,7 @@ operations: ...@@ -114,7 +113,7 @@ operations:
pages that have just been fetched from a remote server. pages that have just been fetched from a remote server.
If this option is used, op->slow_work and op->processor must be If this option is used, op->slow_work and op->processor must be
initialised before submitting the operation: initialised before submitting the operation::
fscache_operation_init_slow(op, processor) fscache_operation_init_slow(op, processor)
...@@ -132,8 +131,7 @@ Furthermore, operations may be one of two types: ...@@ -132,8 +131,7 @@ Furthermore, operations may be one of two types:
operations running at the same time. operations running at the same time.
========= Procedure
PROCEDURE
========= =========
Operations are used through the following procedure: Operations are used through the following procedure:
...@@ -143,7 +141,7 @@ Operations are used through the following procedure: ...@@ -143,7 +141,7 @@ Operations are used through the following procedure:
generic op embedded within. generic op embedded within.
(2) The submitting thread must then submit the operation for processing using (2) The submitting thread must then submit the operation for processing using
one of the following two functions: one of the following two functions::
int fscache_submit_op(struct fscache_object *object, int fscache_submit_op(struct fscache_object *object,
struct fscache_operation *op); struct fscache_operation *op);
...@@ -164,7 +162,7 @@ Operations are used through the following procedure: ...@@ -164,7 +162,7 @@ Operations are used through the following procedure:
operation of conflicting exclusivity is in progress on the object. operation of conflicting exclusivity is in progress on the object.
If the operation is asynchronous, the manager will retain a reference to If the operation is asynchronous, the manager will retain a reference to
it, so the caller should put their reference to it by passing it to: it, so the caller should put their reference to it by passing it to::
void fscache_put_operation(struct fscache_operation *op); void fscache_put_operation(struct fscache_operation *op);
...@@ -179,12 +177,12 @@ Operations are used through the following procedure: ...@@ -179,12 +177,12 @@ Operations are used through the following procedure:
(4) The operation holds an effective lock upon the object, preventing other (4) The operation holds an effective lock upon the object, preventing other
exclusive ops conflicting until it is released. The operation can be exclusive ops conflicting until it is released. The operation can be
enqueued for further immediate asynchronous processing by adjusting the enqueued for further immediate asynchronous processing by adjusting the
CPU time provisioning option if necessary, eg: CPU time provisioning option if necessary, eg::
op->flags &= ~FSCACHE_OP_TYPE; op->flags &= ~FSCACHE_OP_TYPE;
op->flags |= FSCACHE_OP_FAST; op->flags |= FSCACHE_OP_FAST;
and calling: and calling::
void fscache_enqueue_operation(struct fscache_operation *op) void fscache_enqueue_operation(struct fscache_operation *op)
...@@ -192,13 +190,12 @@ Operations are used through the following procedure: ...@@ -192,13 +190,12 @@ Operations are used through the following procedure:
pools. pools.
===================== Asynchronous Callback
ASYNCHRONOUS CALLBACK
===================== =====================
When used in asynchronous mode, the worker thread pool will invoke the When used in asynchronous mode, the worker thread pool will invoke the
processor method with a pointer to the operation. This should then get at the processor method with a pointer to the operation. This should then get at the
container struct by using container_of(): container struct by using container_of()::
static void fscache_write_op(struct fscache_operation *_op) static void fscache_write_op(struct fscache_operation *_op)
{ {
......
.. SPDX-License-Identifier: GPL-2.0
===========================================
Mounting root file system via SMB (cifs.ko) Mounting root file system via SMB (cifs.ko)
=========================================== ===========================================
Written 2019 by Paulo Alcantara <palcantara@suse.de> Written 2019 by Paulo Alcantara <palcantara@suse.de>
Written 2019 by Aurelien Aptel <aaptel@suse.com> Written 2019 by Aurelien Aptel <aaptel@suse.com>
The CONFIG_CIFS_ROOT option enables experimental root file system The CONFIG_CIFS_ROOT option enables experimental root file system
...@@ -32,7 +36,7 @@ Server configuration ...@@ -32,7 +36,7 @@ Server configuration
==================== ====================
To enable SMB1+UNIX extensions you will need to set these global To enable SMB1+UNIX extensions you will need to set these global
settings in Samba smb.conf: settings in Samba smb.conf::
[global] [global]
server min protocol = NT1 server min protocol = NT1
...@@ -41,12 +45,16 @@ settings in Samba smb.conf: ...@@ -41,12 +45,16 @@ settings in Samba smb.conf:
Kernel command line Kernel command line
=================== ===================
root=/dev/cifs ::
root=/dev/cifs
This is just a virtual device that basically tells the kernel to mount This is just a virtual device that basically tells the kernel to mount
the root file system via SMB protocol. the root file system via SMB protocol.
cifsroot=//<server-ip>/<share>[,options] ::
cifsroot=//<server-ip>/<share>[,options]
Enables the kernel to mount the root file system via SMB from the Enables the kernel to mount the root file system via SMB from the
<server-ip> and <share> specified in this option. <server-ip> and <share> specified in this option.
...@@ -65,33 +73,33 @@ options ...@@ -65,33 +73,33 @@ options
Examples Examples
======== ========
Export root file system as a Samba share in smb.conf file. Export root file system as a Samba share in smb.conf file::
... ...
[linux] [linux]
path = /path/to/rootfs path = /path/to/rootfs
read only = no read only = no
guest ok = yes guest ok = yes
force user = root force user = root
force group = root force group = root
browseable = yes browseable = yes
writeable = yes writeable = yes
admin users = root admin users = root
public = yes public = yes
create mask = 0777 create mask = 0777
directory mask = 0777 directory mask = 0777
... ...
Restart smb service. Restart smb service::
# systemctl restart smb # systemctl restart smb
Test it under QEMU on a kernel built with CONFIG_CIFS_ROOT and Test it under QEMU on a kernel built with CONFIG_CIFS_ROOT and
CONFIG_IP_PNP options enabled. CONFIG_IP_PNP options enabled::
# qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \ # qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \
-kernel /path/to/linux/arch/x86/boot/bzImage -nographic \ -kernel /path/to/linux/arch/x86/boot/bzImage -nographic \
-append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3" -append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3"
1: https://wiki.samba.org/index.php/UNIX_Extensions 1: https://wiki.samba.org/index.php/UNIX_Extensions
...@@ -74,7 +74,7 @@ are zeroed out and converted to written extents before being returned to avoid ...@@ -74,7 +74,7 @@ are zeroed out and converted to written extents before being returned to avoid
exposure of uninitialized data through mmap. exposure of uninitialized data through mmap.
These filesystems may be used for inspiration: These filesystems may be used for inspiration:
- ext2: see Documentation/filesystems/ext2.txt - ext2: see Documentation/filesystems/ext2.rst
- ext4: see Documentation/filesystems/ext4/ - ext4: see Documentation/filesystems/ext4/
- xfs: see Documentation/admin-guide/xfs.rst - xfs: see Documentation/admin-guide/xfs.rst
......
...@@ -166,16 +166,17 @@ file:: ...@@ -166,16 +166,17 @@ file::
}; };
struct debugfs_regset32 { struct debugfs_regset32 {
struct debugfs_reg32 *regs; const struct debugfs_reg32 *regs;
int nregs; int nregs;
void __iomem *base; void __iomem *base;
struct device *dev; /* Optional device for Runtime PM */
}; };
debugfs_create_regset32(const char *name, umode_t mode, debugfs_create_regset32(const char *name, umode_t mode,
struct dentry *parent, struct dentry *parent,
struct debugfs_regset32 *regset); struct debugfs_regset32 *regset);
void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs, void debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs,
int nregs, void __iomem *base, char *prefix); int nregs, void __iomem *base, char *prefix);
The "base" argument may be 0, but you may want to build the reg32 array The "base" argument may be 0, but you may want to build the reg32 array
......
.. SPDX-License-Identifier: GPL-2.0
=====================
The Devpts Filesystem
=====================
Each mount of the devpts filesystem is now distinct such that ptys
and their indices allocated in one mount are independent from ptys
and their indices in all other mounts.
All mounts of the devpts filesystem now create a ``/dev/pts/ptmx`` node
with permissions ``0000``.
To retain backwards compatibility, a ptmx device node (aka any node
created with ``mknod name c 5 2``) when opened will look for an instance
of devpts under the name ``pts`` in the same directory as the ptmx device
node.
As an option instead of placing a ``/dev/ptmx`` device node at ``/dev/ptmx``
it is possible to place a symlink to ``/dev/pts/ptmx`` at ``/dev/ptmx`` or
to bind mount ``/dev/pts/ptmx`` to ``/dev/ptmx``. If you opt for using
the devpts filesystem in this manner devpts should be mounted with
the ``ptmxmode=0666`` option, or ``chmod 0666 /dev/pts/ptmx`` should be called.
Total count of pty pairs in all instances is limited by sysctls::
kernel.pty.max = 4096 - global limit
kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
kernel.pty.nr - current count of ptys
Per-instance limit could be set by adding mount option ``max=<count>``.
This feature was added in kernel 3.4 together with
``sysctl kernel.pty.reserve``.
In kernels older than 3.4 sysctl ``kernel.pty.max`` works as per-instance limit.
Each mount of the devpts filesystem is now distinct such that ptys
and their indices allocated in one mount are independent from ptys
and their indices in all other mounts.
All mounts of the devpts filesystem now create a /dev/pts/ptmx node
with permissions 0000.
To retain backwards compatibility, a ptmx device node (aka any node
created with "mknod name c 5 2") when opened will look for an instance
of devpts under the name "pts" in the same directory as the ptmx device
node.
As an option instead of placing a /dev/ptmx device node at /dev/ptmx
it is possible to place a symlink to /dev/pts/ptmx at /dev/ptmx or
to bind mount /dev/pts/ptmx to /dev/ptmx. If you opt for using
the devpts filesystem in this manner devpts should be mounted with
the ptmxmode=0666 option, or chmod 0666 /dev/pts/ptmx should be called.
Total count of pty pairs in all instances is limited by sysctls:
kernel.pty.max = 4096 - global limit
kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
kernel.pty.nr - current count of ptys
Per-instance limit could be set by adding mount option "max=<count>".
This feature was added in kernel 3.4 together with sysctl kernel.pty.reserve.
In kernels older than 3.4 sysctl kernel.pty.max works as per-instance limit.
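The pty limits listed above are ordinary sysctls, so they can also be read from userspace under /proc/sys/kernel/pty/. The following is a minimal sketch, not part of the patch; the helper name read_pty_sysctl() is hypothetical and it assumes procfs is mounted at /proc.

```c
#include <stdio.h>

/*
 * Read one of the kernel.pty.* sysctls ("max", "reserve" or "nr")
 * from /proc/sys/kernel/pty/<name>.
 * Hypothetical helper for illustration; returns 0 on success, -1 on error.
 */
int read_pty_sysctl(const char *name, long *value)
{
	char path[128];
	FILE *f;
	int ok;

	/* Build the procfs path and refuse names that would not fit. */
	if (snprintf(path, sizeof(path), "/proc/sys/kernel/pty/%s", name)
	    >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* Each of these files holds a single decimal number. */
	ok = fscanf(f, "%ld", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller could compare the value read for "nr" against "max" to see how close the system is to the global limit.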
Linux Directory Notification .. SPDX-License-Identifier: GPL-2.0
============================
============================
Linux Directory Notification
============================
Stephen Rothwell <sfr@canb.auug.org.au> Stephen Rothwell <sfr@canb.auug.org.au>
...@@ -12,6 +15,7 @@ being delivered using signals. ...@@ -12,6 +15,7 @@ being delivered using signals.
The application decides which "events" it wants to be notified about. The application decides which "events" it wants to be notified about.
The currently defined events are: The currently defined events are:
========= =====================================================
DN_ACCESS A file in the directory was accessed (read) DN_ACCESS A file in the directory was accessed (read)
DN_MODIFY A file in the directory was modified (write,truncate) DN_MODIFY A file in the directory was modified (write,truncate)
DN_CREATE A file was created in the directory DN_CREATE A file was created in the directory
...@@ -19,6 +23,7 @@ The currently defined events are: ...@@ -19,6 +23,7 @@ The currently defined events are:
DN_RENAME A file in the directory was renamed DN_RENAME A file in the directory was renamed
DN_ATTRIB A file in the directory had its attributes DN_ATTRIB A file in the directory had its attributes
changed (chmod,chown) changed (chmod,chown)
========= =====================================================
Usually, the application must reregister after each notification, but Usually, the application must reregister after each notification, but
if DN_MULTISHOT is or'ed with the event mask, then the registration will if DN_MULTISHOT is or'ed with the event mask, then the registration will
...@@ -36,7 +41,7 @@ especially important if DN_MULTISHOT is specified. Note that SIGRTMIN ...@@ -36,7 +41,7 @@ especially important if DN_MULTISHOT is specified. Note that SIGRTMIN
is often blocked, so it is better to use (at least) SIGRTMIN + 1. is often blocked, so it is better to use (at least) SIGRTMIN + 1.
Implementation expectations (features and bugs :-)) Implementation expectations (features and bugs :-))
--------------------------- ---------------------------------------------------
The notification should work for any local access to files even if the The notification should work for any local access to files even if the
actual file system is on a remote server. This implies that remote actual file system is on a remote server. This implies that remote
...@@ -67,4 +72,4 @@ See tools/testing/selftests/filesystems/dnotify_test.c for an example. ...@@ -67,4 +72,4 @@ See tools/testing/selftests/filesystems/dnotify_test.c for an example.
NOTE NOTE
---- ----
Beginning with Linux 2.6.13, dnotify has been replaced by inotify. Beginning with Linux 2.6.13, dnotify has been replaced by inotify.
See Documentation/filesystems/inotify.txt for more information on it. See Documentation/filesystems/inotify.rst for more information on it.
...@@ -24,3 +24,20 @@ files that are not well-known standardized variables are created ...@@ -24,3 +24,20 @@ files that are not well-known standardized variables are created
as immutable files. This doesn't prevent removal - "chattr -i" will work - as immutable files. This doesn't prevent removal - "chattr -i" will work -
but it does prevent this kind of failure from being accomplished but it does prevent this kind of failure from being accomplished
accidentally. accidentally.
.. warning ::
When the content of a UEFI variable in /sys/firmware/efi/efivars is
displayed, for example using "hexdump", note that the first
4 bytes of the output represent the UEFI variable attributes,
in little-endian format.
Practically the output of each efivar is composed of:
+-----------------------------------+
|4_bytes_of_attributes + efivar_data|
+-----------------------------------+
*See also:*
- Documentation/admin-guide/acpi/ssdt-overlays.rst
- Documentation/ABI/stable/sysfs-firmware-efi-vars
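The layout described in the warning (4 bytes of little-endian attributes followed by the variable data) can be split in userspace as sketched below. This is an illustration only; efivar_split() is a hypothetical name, and the buffer is assumed to hold the complete contents of one file read from /sys/firmware/efi/efivars.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split the raw contents of an efivarfs file into the 32-bit
 * attribute word and the variable payload that follows it.
 * Hypothetical helper for illustration; returns 0 on success,
 * -1 if the buffer is too short to contain the attributes.
 */
int efivar_split(const unsigned char *buf, size_t len, uint32_t *attrs,
		 const unsigned char **data, size_t *data_len)
{
	if (len < 4)
		return -1;

	/* The attributes are stored least significant byte first. */
	*attrs = (uint32_t)buf[0] |
		 (uint32_t)buf[1] << 8 |
		 (uint32_t)buf[2] << 16 |
		 (uint32_t)buf[3] << 24;

	*data = buf + 4;
	*data_len = len - 4;
	return 0;
}
```

Decoding the bytes explicitly, rather than casting the buffer to a uint32_t pointer, keeps the result independent of host endianness and alignment.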
.. SPDX-License-Identifier: GPL-2.0
============ ============
Fiemap Ioctl Fiemap Ioctl
============ ============
...@@ -10,9 +12,9 @@ returns a list of extents. ...@@ -10,9 +12,9 @@ returns a list of extents.
Request Basics Request Basics
-------------- --------------
A fiemap request is encoded within struct fiemap: A fiemap request is encoded within struct fiemap::
struct fiemap { struct fiemap {
__u64 fm_start; /* logical offset (inclusive) at __u64 fm_start; /* logical offset (inclusive) at
* which to start mapping (in) */ * which to start mapping (in) */
__u64 fm_length; /* logical length of mapping which __u64 fm_length; /* logical length of mapping which
...@@ -23,7 +25,7 @@ struct fiemap { ...@@ -23,7 +25,7 @@ struct fiemap {
__u32 fm_extent_count; /* size of fm_extents array (in) */ __u32 fm_extent_count; /* size of fm_extents array (in) */
__u32 fm_reserved; __u32 fm_reserved;
struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */ struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
}; };
fm_start, and fm_length specify the logical range within the file fm_start, and fm_length specify the logical range within the file
...@@ -51,12 +53,12 @@ nothing to prevent the file from changing between calls to FIEMAP. ...@@ -51,12 +53,12 @@ nothing to prevent the file from changing between calls to FIEMAP.
The following flags can be set in fm_flags: The following flags can be set in fm_flags:
* FIEMAP_FLAG_SYNC FIEMAP_FLAG_SYNC
If this flag is set, the kernel will sync the file before mapping extents. If this flag is set, the kernel will sync the file before mapping extents.
* FIEMAP_FLAG_XATTR FIEMAP_FLAG_XATTR
If this flag is set, the extents returned will describe the inode's If this flag is set, the extents returned will describe the inode's
extended attribute lookup tree, instead of its data tree. extended attribute lookup tree, instead of its data tree.
Extent Mapping Extent Mapping
...@@ -75,18 +77,18 @@ complete the requested range and will not have the FIEMAP_EXTENT_LAST ...@@ -75,18 +77,18 @@ complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the next section on extent flags). flag set (see the next section on extent flags).
Each extent is described by a single fiemap_extent structure as Each extent is described by a single fiemap_extent structure as
returned in fm_extents. returned in fm_extents::
struct fiemap_extent { struct fiemap_extent {
__u64 fe_logical; /* logical offset in bytes for the start of __u64 fe_logical; /* logical offset in bytes for the start of
* the extent */ * the extent */
__u64 fe_physical; /* physical offset in bytes for the start __u64 fe_physical; /* physical offset in bytes for the start
* of the extent */ * of the extent */
__u64 fe_length; /* length in bytes for the extent */ __u64 fe_length; /* length in bytes for the extent */
__u64 fe_reserved64[2]; __u64 fe_reserved64[2];
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */ __u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
__u32 fe_reserved[3]; __u32 fe_reserved[3];
}; };
All offsets and lengths are in bytes and mirror those on disk. It is valid All offsets and lengths are in bytes and mirror those on disk. It is valid
for an extent's logical offset to start before the request or its logical for an extent's logical offset to start before the request or its logical
...@@ -114,26 +116,27 @@ worry about all present and future flags which might imply unaligned ...@@ -114,26 +116,27 @@ worry about all present and future flags which might imply unaligned
data. Note that the opposite is not true - it would be valid for data. Note that the opposite is not true - it would be valid for
FIEMAP_EXTENT_NOT_ALIGNED to appear alone. FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
* FIEMAP_EXTENT_LAST FIEMAP_EXTENT_LAST
This is generally the last extent in the file. A mapping attempt past This is generally the last extent in the file. A mapping attempt past
this extent may return nothing. Some implementations set this flag to this extent may return nothing. Some implementations set this flag to
indicate this extent is the last one in the range queried by the user indicate this extent is the last one in the range queried by the user
(via fiemap->fm_length). (via fiemap->fm_length).
FIEMAP_EXTENT_UNKNOWN
The location of this extent is currently unknown. This may indicate
the data is stored on an inaccessible volume or that no storage has
been allocated for the file yet.
* FIEMAP_EXTENT_UNKNOWN FIEMAP_EXTENT_DELALLOC
The location of this extent is currently unknown. This may indicate This will also set FIEMAP_EXTENT_UNKNOWN.
the data is stored on an inaccessible volume or that no storage has
been allocated for the file yet.
* FIEMAP_EXTENT_DELALLOC Delayed allocation - while there is data for this extent, its
- This will also set FIEMAP_EXTENT_UNKNOWN. physical location has not been allocated yet.
Delayed allocation - while there is data for this extent, its
physical location has not been allocated yet.
* FIEMAP_EXTENT_ENCODED FIEMAP_EXTENT_ENCODED
This extent does not consist of plain filesystem blocks but is This extent does not consist of plain filesystem blocks but is
encoded (e.g. encrypted or compressed). Reading the data in this encoded (e.g. encrypted or compressed). Reading the data in this
extent via I/O to the block device will have undefined results. extent via I/O to the block device will have undefined results.
Note that it is *always* undefined to try to update the data Note that it is *always* undefined to try to update the data
in-place by writing to the indicated location without the in-place by writing to the indicated location without the
...@@ -145,32 +148,32 @@ unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is ...@@ -145,32 +148,32 @@ unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
clear; user applications must not try reading or writing to the clear; user applications must not try reading or writing to the
filesystem via the block device under any other circumstances. filesystem via the block device under any other circumstances.
* FIEMAP_EXTENT_DATA_ENCRYPTED FIEMAP_EXTENT_DATA_ENCRYPTED
- This will also set FIEMAP_EXTENT_ENCODED This will also set FIEMAP_EXTENT_ENCODED
The data in this extent has been encrypted by the file system. The data in this extent has been encrypted by the file system.
* FIEMAP_EXTENT_NOT_ALIGNED FIEMAP_EXTENT_NOT_ALIGNED
Extent offsets and length are not guaranteed to be block aligned. Extent offsets and length are not guaranteed to be block aligned.
* FIEMAP_EXTENT_DATA_INLINE FIEMAP_EXTENT_DATA_INLINE
This will also set FIEMAP_EXTENT_NOT_ALIGNED This will also set FIEMAP_EXTENT_NOT_ALIGNED
Data is located within a meta data block. Data is located within a meta data block.
* FIEMAP_EXTENT_DATA_TAIL FIEMAP_EXTENT_DATA_TAIL
This will also set FIEMAP_EXTENT_NOT_ALIGNED This will also set FIEMAP_EXTENT_NOT_ALIGNED
Data is packed into a block with data from other files. Data is packed into a block with data from other files.
* FIEMAP_EXTENT_UNWRITTEN FIEMAP_EXTENT_UNWRITTEN
Unwritten extent - the extent is allocated but its data has not been Unwritten extent - the extent is allocated but its data has not been
initialized. This indicates the extent's data will be all zero if read initialized. This indicates the extent's data will be all zero if read
through the filesystem but the contents are undefined if read directly from through the filesystem but the contents are undefined if read directly from
the device. the device.
* FIEMAP_EXTENT_MERGED FIEMAP_EXTENT_MERGED
This will be set when a file does not support extents, i.e., it uses a block This will be set when a file does not support extents, i.e., it uses a block
based addressing scheme. Since returning an extent for each block back to based addressing scheme. Since returning an extent for each block back to
userspace would be highly inefficient, the kernel will try to merge most userspace would be highly inefficient, the kernel will try to merge most
adjacent blocks into 'extents'. adjacent blocks into 'extents'.
VFS -> File System Implementation VFS -> File System Implementation
...@@ -179,23 +182,23 @@ VFS -> File System Implementation ...@@ -179,23 +182,23 @@ VFS -> File System Implementation
File systems wishing to support fiemap must implement a ->fiemap callback on File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure. The fs ->fiemap call is responsible for their inode_operations structure. The fs ->fiemap call is responsible for
defining its set of supported fiemap flags, and calling a helper function on defining its set of supported fiemap flags, and calling a helper function on
each discovered extent: each discovered extent::
struct inode_operations { struct inode_operations {
... ...
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
u64 len); u64 len);
->fiemap is passed struct fiemap_extent_info which describes the ->fiemap is passed struct fiemap_extent_info which describes the
fiemap request: fiemap request::
struct fiemap_extent_info { struct fiemap_extent_info {
unsigned int fi_flags; /* Flags as passed from user */ unsigned int fi_flags; /* Flags as passed from user */
unsigned int fi_extents_mapped; /* Number of mapped extents */ unsigned int fi_extents_mapped; /* Number of mapped extents */
unsigned int fi_extents_max; /* Size of fiemap_extent array */ unsigned int fi_extents_max; /* Size of fiemap_extent array */
struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */ struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
}; };
It is intended that the file system should not need to access any of this It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant to signals and return structure directly. Filesystem handlers should be tolerant to signals and return
...@@ -203,9 +206,9 @@ EINTR once fatal signal received. ...@@ -203,9 +206,9 @@ EINTR once fatal signal received.
Flag checking should be done at the beginning of the ->fiemap callback via the Flag checking should be done at the beginning of the ->fiemap callback via the
fiemap_check_flags() helper: fiemap_check_flags() helper::
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags); int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
The struct fieinfo should be passed in as received from ioctl_fiemap(). The The struct fieinfo should be passed in as received from ioctl_fiemap(). The
set of fiemap flags which the fs understands should be passed via fs_flags. If set of fiemap flags which the fs understands should be passed via fs_flags. If
...@@ -216,10 +219,10 @@ ioctl_fiemap(). ...@@ -216,10 +219,10 @@ ioctl_fiemap().
For each extent in the request range, the file system should call For each extent in the request range, the file system should call
the helper function, fiemap_fill_next_extent(): the helper function, fiemap_fill_next_extent()::
int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical, int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
u64 phys, u64 len, u32 flags, u32 dev); u64 phys, u64 len, u32 flags, u32 dev);
fiemap_fill_next_extent() will use the passed values to populate the fiemap_fill_next_extent() will use the passed values to populate the
next free extent in the fm_extents array. 'General' extent flags will next free extent in the fm_extents array. 'General' extent flags will
......
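As a userspace illustration of the struct fiemap request shown in the fiemap hunks above, the sketch below builds a request and asks the kernel how many extents a file has. It is not part of the patch: the function names are hypothetical, and it relies on the FIEMAP behaviour that a request with fm_extent_count set to zero only reports the extent count in fm_mapped_extents.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FS_IOC_FIEMAP */
#include <linux/fiemap.h>	/* struct fiemap, FIEMAP_* */

/*
 * Allocate a zeroed fiemap request with room for extent_count
 * extents after the header. Hypothetical helper for illustration.
 */
struct fiemap *fiemap_request_new(__u64 start, __u64 length,
				  __u32 extent_count)
{
	struct fiemap *fm;

	fm = calloc(1, sizeof(*fm) +
		       (size_t)extent_count * sizeof(struct fiemap_extent));
	if (!fm)
		return NULL;

	fm->fm_start = start;		/* logical offset to start mapping */
	fm->fm_length = length;		/* logical length to map */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* sync the file first */
	fm->fm_extent_count = extent_count;
	return fm;
}

/*
 * Return the number of extents the kernel reports for the whole
 * file, or -1 with errno set (for example when the filesystem does
 * not implement ->fiemap).
 */
long fiemap_count_extents(int fd)
{
	struct fiemap *fm = fiemap_request_new(0, FIEMAP_MAX_OFFSET, 0);
	long n;

	if (!fm)
		return -1;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		int saved = errno;

		free(fm);
		errno = saved;
		return -1;
	}

	n = fm->fm_mapped_extents;
	free(fm);
	return n;
}
```

A typical caller would use the count to size a second request with fiemap_request_new() and then walk fm_extents, checking fe_flags for FIEMAP_EXTENT_LAST.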
.. SPDX-License-Identifier: GPL-2.0
===================================
File management in the Linux kernel File management in the Linux kernel
----------------------------------- ===================================
This document describes how locking for files (struct file) This document describes how locking for files (struct file)
and file descriptor table (struct files) works. and file descriptor table (struct files) works.
...@@ -34,7 +37,7 @@ appear atomic. Here are the locking rules for ...@@ -34,7 +37,7 @@ appear atomic. Here are the locking rules for
the fdtable structure - the fdtable structure -
1. All references to the fdtable must be done through 1. All references to the fdtable must be done through
the files_fdtable() macro : the files_fdtable() macro::
struct fdtable *fdt; struct fdtable *fdt;
...@@ -61,7 +64,8 @@ the fdtable structure - ...@@ -61,7 +64,8 @@ the fdtable structure -
4. To look up the file structure given an fd, a reader 4. To look up the file structure given an fd, a reader
must use either fcheck() or fcheck_files() APIs. These must use either fcheck() or fcheck_files() APIs. These
take care of barrier requirements due to lock-free lookup. take care of barrier requirements due to lock-free lookup.
An example :
An example::
struct file *file; struct file *file;
...@@ -77,7 +81,7 @@ the fdtable structure - ...@@ -77,7 +81,7 @@ the fdtable structure -
of the fd (fget()/fget_light()) are lock-free, it is possible of the fd (fget()/fget_light()) are lock-free, it is possible
that look-up may race with the last put() operation on the that look-up may race with the last put() operation on the
file structure. This is avoided using atomic_long_inc_not_zero() file structure. This is avoided using atomic_long_inc_not_zero()
on ->f_count : on ->f_count::
rcu_read_lock(); rcu_read_lock();
file = fcheck_files(files, fd); file = fcheck_files(files, fd);
...@@ -106,7 +110,8 @@ the fdtable structure - ...@@ -106,7 +110,8 @@ the fdtable structure -
holding files->file_lock. If ->file_lock is dropped, then holding files->file_lock. If ->file_lock is dropped, then
another thread may expand the files thereby creating a new another thread may expand the files thereby creating a new
fdtable and making the earlier fdtable pointer stale. fdtable and making the earlier fdtable pointer stale.
For example :
For example::
spin_lock(&files->file_lock); spin_lock(&files->file_lock);
fd = locate_fd(files, file, start); fd = locate_fd(files, file, start);
......
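The lock-free lookup rule in the files hunk above depends on "increment unless zero" semantics for ->f_count. The sketch below is a userspace analogue using C11 atomics, not the kernel implementation; struct object, ref_get_not_zero() and ref_put() are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for an object with a reference count, such as struct file. */
struct object {
	atomic_long refcount;
};

/*
 * Take a reference only if the count is still non-zero. A lookup that
 * races with the final put sees zero and fails instead of reviving an
 * object that is already being torn down.
 */
bool ref_get_not_zero(struct object *obj)
{
	long old = atomic_load(&obj->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
bool ref_put(struct object *obj)
{
	return atomic_fetch_sub(&obj->refcount, 1) == 1;
}
```

In the kernel the failed lookup is simply treated as "no such file", and RCU keeps the memory valid while the reader inspects the count.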
.. SPDX-License-Identifier: GPL-2.0
==============
Fuse I/O Modes
==============
Fuse supports the following I/O modes: Fuse supports the following I/O modes:
- direct-io - direct-io
......
...@@ -24,6 +24,22 @@ algorithms work. ...@@ -24,6 +24,22 @@ algorithms work.
splice splice
locking locking
directory-locking directory-locking
devpts
dnotify
fiemap
files
locks
mandatory-locking
mount_api
quota
seq_file
sharedsubtree
sysfs-pci
sysfs-tagging
automount-support
caching/index
porting porting
...@@ -57,7 +73,10 @@ Documentation for filesystem implementations. ...@@ -57,7 +73,10 @@ Documentation for filesystem implementations.
befs befs
bfs bfs
btrfs btrfs
cifs/cifsroot
ceph ceph
coda
configfs
cramfs cramfs
debugfs debugfs
dlmfs dlmfs
...@@ -73,6 +92,7 @@ Documentation for filesystem implementations. ...@@ -73,6 +92,7 @@ Documentation for filesystem implementations.
hfsplus hfsplus
hpfs hpfs
fuse fuse
fuse-io
inotify inotify
isofs isofs
nilfs2 nilfs2
...@@ -88,6 +108,7 @@ Documentation for filesystem implementations. ...@@ -88,6 +108,7 @@ Documentation for filesystem implementations.
ramfs-rootfs-initramfs ramfs-rootfs-initramfs
relay relay
romfs romfs
spufs/index
squashfs squashfs
sysfs sysfs
sysv-fs sysv-fs
...@@ -97,4 +118,6 @@ Documentation for filesystem implementations. ...@@ -97,4 +118,6 @@ Documentation for filesystem implementations.
udf udf
virtiofs virtiofs
vfat vfat
xfs-delayed-logging-design
xfs-self-describing-metadata
zonefs zonefs
File Locking Release Notes .. SPDX-License-Identifier: GPL-2.0
==========================
File Locking Release Notes
==========================
Andy Walker <andy@lysaker.kvaerner.no> Andy Walker <andy@lysaker.kvaerner.no>
...@@ -6,7 +10,7 @@ ...@@ -6,7 +10,7 @@
1. What's New? 1. What's New?
-------------- ==============
1.1 Broken Flock Emulation 1.1 Broken Flock Emulation
-------------------------- --------------------------
...@@ -25,7 +29,7 @@ anyway (see the file "Documentation/process/changes.rst".) ...@@ -25,7 +29,7 @@ anyway (see the file "Documentation/process/changes.rst".)
--------------------------- ---------------------------
1.2.1 Typical Problems - Sendmail 1.2.1 Typical Problems - Sendmail
--------------------------------- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because sendmail was unable to use the old flock() emulation, many sendmail Because sendmail was unable to use the old flock() emulation, many sendmail
installations use fcntl() instead of flock(). This is true of Slackware 3.0 installations use fcntl() instead of flock(). This is true of Slackware 3.0
for example. This gave rise to some other subtle problems if sendmail was for example. This gave rise to some other subtle problems if sendmail was
...@@ -37,7 +41,7 @@ to lock solid with deadlocked processes. ...@@ -37,7 +41,7 @@ to lock solid with deadlocked processes.
1.2.2 The Solution 1.2.2 The Solution
------------------ ^^^^^^^^^^^^^^^^^^
The solution I have chosen, after much experimentation and discussion, The solution I have chosen, after much experimentation and discussion,
is to make flock() and fcntl() locks oblivious to each other. Both can is to make flock() and fcntl() locks oblivious to each other. Both can
exist, and neither will have any effect on the other. exist, and neither will have any effect on the other.
...@@ -54,7 +58,7 @@ fcntl(), with all the problems that implies. ...@@ -54,7 +58,7 @@ fcntl(), with all the problems that implies.
--------------------------------------- ---------------------------------------
Mandatory locking, as described in Mandatory locking, as described in
'Documentation/filesystems/mandatory-locking.txt' was prior to this release a 'Documentation/filesystems/mandatory-locking.rst' was prior to this release a
general configuration option that was valid for all mounted filesystems. This general configuration option that was valid for all mounted filesystems. This
had a number of inherent dangers, not the least of which was the ability to had a number of inherent dangers, not the least of which was the ability to
freeze an NFS server by asking it to read a file for which a mandatory lock freeze an NFS server by asking it to read a file for which a mandatory lock
......
Mandatory File Locking For The Linux Operating System .. SPDX-License-Identifier: GPL-2.0
=====================================================
Mandatory File Locking For The Linux Operating System
=====================================================
Andy Walker <andy@lysaker.kvaerner.no> Andy Walker <andy@lysaker.kvaerner.no>
15 April 1996 15 April 1996
(Updated September 2007) (Updated September 2007)
0. Why you should avoid mandatory locking 0. Why you should avoid mandatory locking
...@@ -53,15 +58,17 @@ possible on existing user code. The scheme is based on marking individual files ...@@ -53,15 +58,17 @@ possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf() as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks. interface for applying locks just as if they were normal, advisory locks.
Note 1: In saying "file" in the paragraphs above I am actually not telling .. Note::
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition 1. In saying "file" in the paragraphs above I am actually not telling
to entire files, so the mandatory locking rules also have byte level the whole truth. System V locking is based on fcntl(). The granularity of
granularity. fcntl() is such that it allows the locking of byte ranges in files, in
addition to entire files, so the mandatory locking rules also have byte
level granularity.
Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite 2. POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3. scheme is defined by the System V Interface Definition (SVID) Version 3.
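The byte-range granularity mentioned in note 1 is visible in the fcntl() interface itself, which takes a start offset and a length. The sketch below is an illustration only; lock_byte_range() is a hypothetical wrapper, and the lock it takes is advisory unless the file and mount have been set up for mandatory locking as the document goes on to describe.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Apply a non-blocking fcntl() lock to len bytes starting at offset
 * start. type is F_RDLCK, F_WRLCK or F_UNLCK.
 * Hypothetical helper for illustration; returns 0 on success,
 * -1 with errno set on failure.
 */
int lock_byte_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,	/* start is relative to file start */
		.l_start = start,
		.l_len = len,		/* 0 would mean "to end of file" */
	};

	return fcntl(fd, F_SETLK, &fl);
}
```

The same call with F_UNLCK releases the range, so a writer can lock and unlock individual records without touching the rest of the file.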
2. Marking a file for mandatory locking 2. Marking a file for mandatory locking
--------------------------------------- ---------------------------------------
......
@@ -119,9 +119,7 @@ it comes to that question::
 /opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
-Create an /etc/pvfs2tab file::
-Localhost is fine for your pvfs2tab file:
+Create an /etc/pvfs2tab file (localhost is fine)::
 echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
 /etc/pvfs2tab
......
@@ -1871,7 +1871,7 @@ unbindable mount is unbindable
 For more information on mount propagation see:
-Documentation/filesystems/sharedsubtree.txt
+Documentation/filesystems/sharedsubtree.rst
 3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
......
@@ -71,7 +71,7 @@ be allowed write access to a ramfs mount.
 A ramfs derivative called tmpfs was created to add size limits, and the ability
 to write the data to swap space. Normal users can be allowed write access to
-tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
+tmpfs mounts. See Documentation/filesystems/tmpfs.rst for more information.
 What is rootfs?
 ---------------
......
This diff is collapsed.