Commit c76c2230 authored by David S. Miller

Merge branch 'net-ReST-convert'

Mauro Carvalho Chehab says:

====================
net: manually convert files to ReST format - part 1

There are very few documents upstream that haven't been converted yet.

This series converts part of the networking text files into ReST.
It is part of a bigger set of patches, which were split into parts
in order to make the reviewing task easier.

The full series (including these patches) is at:

	https://git.linuxtv.org/mchehab/experimental.git/log/?h=net-docs

And the documents, converted to HTML via the build system,
are at:

	https://www.infradead.org/~mchehab/kernel_docs/networking/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 790ab249 b9dd2bea
......@@ -356,7 +356,7 @@
shot down by NMI
autoconf= [IPV6]
See Documentation/networking/ipv6.txt.
See Documentation/networking/ipv6.rst.
show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
Limit apic dumping. The parameter defines the maximal
......@@ -831,7 +831,7 @@
decnet.addr= [HW,NET]
Format: <area>[,<node>]
See also Documentation/networking/decnet.txt.
See also Documentation/networking/decnet.rst.
default_hugepagesz=
[same as hugepagesz=] The size of the default
......@@ -872,7 +872,7 @@
miss to occur.
disable= [IPV6]
See Documentation/networking/ipv6.txt.
See Documentation/networking/ipv6.rst.
hardened_usercopy=
[KNL] Under CONFIG_HARDENED_USERCOPY, whether
......@@ -912,7 +912,7 @@
to workaround buggy firmware.
disable_ipv6= [IPV6]
See Documentation/networking/ipv6.txt.
See Documentation/networking/ipv6.rst.
disable_mtrr_cleanup [X86]
The kernel tries to adjust MTRR layout from continuous
......@@ -4910,7 +4910,7 @@
Set the number of tcp_metrics_hash slots.
Default value is 8192 or 16384 depending on total
ram pages. This is used to specify the TCP metrics
cache size. See Documentation/networking/ip-sysctl.txt
cache size. See Documentation/networking/ip-sysctl.rst
"tcp_no_metrics_save" section for more details.
tdfx= [HW,DRM]
......
......@@ -353,8 +353,8 @@ socket's buffer. It will not take effect unless PF_UNIX flag is specified.
3. /proc/sys/net/ipv4 - IPV4 settings
-------------------------------------
Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
descriptions of these entries.
Please see: Documentation/networking/ip-sysctl.rst and
Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
4. Appletalk
......
......@@ -7,7 +7,7 @@ Filter) facility, with a focus on the extended BPF version (eBPF).
This kernel side documentation is still work in progress. The main
textual documentation is (for historical reasons) described in
`Documentation/networking/filter.txt`_, which describe both classical
`Documentation/networking/filter.rst`_, which describe both classical
and extended BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.
......@@ -59,7 +59,7 @@ Testing and debugging BPF
.. Links:
.. _Documentation/networking/filter.txt: ../networking/filter.txt
.. _Documentation/networking/filter.rst: ../networking/filter.txt
.. _man-pages: https://www.kernel.org/doc/man-pages/
.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/
.. SPDX-License-Identifier: GPL-2.0
==============
6pack Protocol
==============
This is the 6pack-mini-HOWTO, written by
Andreas Könsgen DG3KQ
Internet: ajk@comnets.uni-bremen.de
AMPR-net: dg3kq@db0pra.ampr.org
AX.25: dg3kq@db0ach.#nrw.deu.eu
:Internet: ajk@comnets.uni-bremen.de
:AMPR-net: dg3kq@db0pra.ampr.org
:AX.25: dg3kq@db0ach.#nrw.deu.eu
Last update: April 7, 1998
1. What is 6pack, and what are the advantages to KISS?
======================================================
6pack is a transmission protocol for data exchange between the PC and
the TNC over a serial line. It can be used as an alternative to KISS.
6pack has two major advantages:
- The PC is given full control over the radio
channel. Special control data is exchanged between the PC and the TNC so
that the PC knows at any time if the TNC is receiving data, if a TNC
buffer underrun or overrun has occurred, if the PTT is
set and so on. This control data is processed at a higher priority than
normal data, so a data stream can be interrupted at any time to issue an
important event. This helps to improve the channel access and timing
algorithms as everything is computed in the PC. It would even be possible
to experiment with something completely different from the known CSMA and
DAMA channel access methods.
This kind of real-time control is especially important to supply several
TNCs that are connected between each other and the PC by a daisy chain
......@@ -36,6 +45,7 @@ More details about 6pack are described in the file 6pack.ps that is located
in the doc directory of the AX.25 utilities package.
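The priority rule described in the first advantage above (control data is processed ahead of normal data, so a data stream can be interrupted to report an important event) can be sketched as a strict-priority pair of queues. This is a minimal illustration only, assuming control and data items arrive already decoded as integers; the queue layout and every name in it (`sp_scheduler`, `sched_submit()`, `sched_next()`) are hypothetical and model neither the 6pack wire encoding nor the kernel driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item kinds; the real 6pack byte encoding is not modelled. */
enum frame_kind { FRAME_CONTROL, FRAME_DATA };

#define QUEUE_CAP 16

/* Fixed-size FIFO of decoded items. */
struct frame_queue {
    int items[QUEUE_CAP];
    size_t head;   /* index of the oldest item */
    size_t count;  /* number of queued items */
};

static bool queue_push(struct frame_queue *q, int item)
{
    if (q->count == QUEUE_CAP)
        return false;                          /* queue full */
    q->items[(q->head + q->count) % QUEUE_CAP] = item;
    q->count++;
    return true;
}

static bool queue_pop(struct frame_queue *q, int *item)
{
    if (q->count == 0)
        return false;                          /* queue empty */
    *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}

struct sp_scheduler {
    struct frame_queue control;  /* e.g. TNC status, PTT, buffer under/overrun */
    struct frame_queue data;     /* ordinary payload */
};

/* Enqueue an item on the queue that matches its kind. */
static bool sched_submit(struct sp_scheduler *s, enum frame_kind kind, int item)
{
    return queue_push(kind == FRAME_CONTROL ? &s->control : &s->data, item);
}

/* Pick the next item to process: pending control always pre-empts data. */
static bool sched_next(struct sp_scheduler *s, enum frame_kind *kind, int *item)
{
    if (queue_pop(&s->control, item)) {
        *kind = FRAME_CONTROL;
        return true;
    }
    if (queue_pop(&s->data, item)) {
        *kind = FRAME_DATA;
        return true;
    }
    return false;
}
```

Because `sched_next()` always drains the control queue first, a status event submitted while data is still queued is handled before the remaining data.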
2. Who has developed the 6pack protocol?
========================================
The 6pack protocol has been developed by Ekki Plicht DF4OR, Henning Rech
DF9IC and Gunter Jost DK7WJ. A driver for 6pack, written by Gunter Jost and
......@@ -44,12 +54,14 @@ They have also written a firmware for TNCs to perform the 6pack
protocol (see section 4 below).
3. Where can I get the latest version of 6pack for LinuX?
=========================================================
At the moment, the 6pack stuff can be obtained via anonymous ftp from
db0bm.automation.fh-aachen.de. In the directory /incoming/dg3kq,
there is a file named 6pack.tgz.
4. Preparing the TNC for 6pack operation
========================================
To be able to use 6pack, a special firmware for the TNC is needed. The EPROM
of a newly bought TNC does not contain 6pack, so you will have to
......@@ -75,12 +87,14 @@ and the status LED are lit for about a second if the firmware initialises
the TNC correctly.
5. Building and installing the 6pack driver
===========================================
The driver has been tested with kernel version 2.1.90. Use with older
kernels may lead to a compilation error because the interface to a kernel
function has been changed in the 2.1.8x kernels.
How to turn on 6pack support:
=============================
- In the linux kernel configuration program, select the code maturity level
options menu and turn on the prompting for development drivers.
......@@ -94,27 +108,28 @@ To use the driver, the kissattach program delivered with the AX.25 utilities
has to be modified.
- Do a cd to the directory that holds the kissattach sources. Edit the
kissattach.c file. At the top, insert the following lines:
kissattach.c file. At the top, insert the following lines::
#ifndef N_6PACK
#define N_6PACK (N_AX25+1)
#endif
Then find the line:
int disc = N_AX25;
and replace N_AX25 by N_6PACK.
- Recompile kissattach. Rename it to spattach to avoid confusion.
Installing the driver:
----------------------
- Do an insmod 6pack. Look at your /var/log/messages file to check if the
module has printed its initialization message.
- Do a spattach as you would launch kissattach when starting a KISS port.
Check if the kernel prints the message '6pack: TNC found'.
- From here, everything should work as if you were setting up a KISS port.
The only difference is that the network device that represents
......@@ -138,6 +153,7 @@ from the PC to the TNC over the serial line, the status LED if data is
sent to the PC.
6. Known problems
=================
When testing the driver with 2.0.3x kernels and
operating with data rates on the radio channel of 9600 Baud or higher,
......
Altera Triple-Speed Ethernet MAC driver
.. SPDX-License-Identifier: GPL-2.0
Copyright (C) 2008-2014 Altera Corporation
.. include:: <isonum.txt>
=======================================
Altera Triple-Speed Ethernet MAC driver
=======================================
Copyright |copy| 2008-2014 Altera Corporation
This is the driver for the Altera Triple-Speed Ethernet (TSE) controllers
using the SGDMA and MSGDMA soft DMA IP components. The driver uses the
......@@ -46,23 +52,33 @@ Jumbo frames are not supported at this time.
The driver limits PHY operations to 10/100Mbps, and has not yet been fully
tested for 1Gbps. This support will be added in a future maintenance update.
1) Kernel Configuration
1. Kernel Configuration
=======================
The kernel configuration option is ALTERA_TSE:
Device Drivers ---> Network device support ---> Ethernet driver support --->
Altera Triple-Speed Ethernet MAC support (ALTERA_TSE)
2) Driver parameters list:
debug: message level (0: no output, 16: all);
dma_rx_num: Number of descriptors in the RX list (default is 64);
dma_tx_num: Number of descriptors in the TX list (default is 64).
2. Driver parameters list
=========================
- debug: message level (0: no output, 16: all);
- dma_rx_num: Number of descriptors in the RX list (default is 64);
- dma_tx_num: Number of descriptors in the TX list (default is 64).
3. Command line options
=======================
Driver parameters can be also passed in command line by using::
3) Command line options
Driver parameters can be also passed in command line by using:
altera_tse=dma_rx_num:128,dma_tx_num:512
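To illustrate the comma-separated `key:value` format shown above, the following user-space C sketch splits such a string into the three documented parameters. It is hypothetical: `struct tse_params` and `tse_parse_opts()` are not part of the driver, and this is not how the kernel itself processes boot arguments.

```c
#include <stdlib.h>
#include <string.h>

/* The three documented driver parameters. */
struct tse_params {
    unsigned long debug;       /* message level, 0..16 */
    unsigned long dma_rx_num;  /* RX descriptors (documented default 64) */
    unsigned long dma_tx_num;  /* TX descriptors (documented default 64) */
};

/* Parse "key:value[,key:value...]"; returns 0 on success, -1 on bad input. */
static int tse_parse_opts(const char *opts, struct tse_params *p)
{
    const char *cur = opts;

    while (*cur) {
        const char *colon = strchr(cur, ':');
        char *end;
        unsigned long val;
        size_t keylen;

        if (!colon)
            return -1;                         /* every key needs a value */
        keylen = (size_t)(colon - cur);
        val = strtoul(colon + 1, &end, 0);
        if (end == colon + 1 || (*end != ',' && *end != '\0'))
            return -1;                         /* value must be a number */

        if (keylen == 5 && strncmp(cur, "debug", 5) == 0)
            p->debug = val;
        else if (keylen == 10 && strncmp(cur, "dma_rx_num", 10) == 0)
            p->dma_rx_num = val;
        else if (keylen == 10 && strncmp(cur, "dma_tx_num", 10) == 0)
            p->dma_tx_num = val;
        else
            return -1;                         /* unknown key */

        cur = (*end == ',') ? end + 1 : end;   /* skip to the next pair */
    }
    return 0;
}
```

Keys that are not given keep whatever value the caller pre-loaded, so initialising the struct with the documented defaults reproduces the "default is 64" behaviour.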
4) Driver information and notes
4. Driver information and notes
===============================
4.1) Transmit process
4.1. Transmit process
---------------------
When the driver's transmit routine is called by the kernel, it sets up a
transmit descriptor by calling the underlying DMA transmit routine (SGDMA or
MSGDMA), and initiates a transmit operation. Once the transmit is complete, an
......@@ -70,7 +86,8 @@ interrupt is driven by the transmit DMA logic. The driver handles the transmit
completion in the context of the interrupt handling chain by recycling
resources required to send and track the requested transmit operation.
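The transmit flow above (claim a descriptor, start the DMA, then recycle the descriptor from the completion interrupt) can be modelled as a ring of slots with a producer index and a consumer index. This is a minimal sketch under stated assumptions: the ring size, `struct tx_ring`, and the `tx_post()`/`tx_reclaim()` helpers are hypothetical, and the actual SGDMA/MSGDMA descriptor writes appear only as comments.

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 8  /* hypothetical; must be a power of two here */

struct tx_ring {
    const void *buf[TX_RING_SIZE];  /* buffer owned by each in-flight slot */
    unsigned int head;              /* next slot the transmit path fills */
    unsigned int tail;              /* oldest slot not yet reclaimed */
};

static unsigned int tx_in_flight(const struct tx_ring *r)
{
    return r->head - r->tail;       /* unsigned wrap-around keeps this correct */
}

/* Transmit path: claim a descriptor and hand the buffer to the DMA engine. */
static bool tx_post(struct tx_ring *r, const void *buf)
{
    if (tx_in_flight(r) == TX_RING_SIZE)
        return false;               /* ring full: caller should stop the queue */
    r->buf[r->head % TX_RING_SIZE] = buf;
    r->head++;
    /* a real driver would fill the DMA descriptor and start the transfer here */
    return true;
}

/* Completion path (TX interrupt): recycle up to 'done' finished slots. */
static unsigned int tx_reclaim(struct tx_ring *r, unsigned int done)
{
    unsigned int freed = 0;

    while (freed < done && r->tail != r->head) {
        r->buf[r->tail % TX_RING_SIZE] = NULL;  /* release the slot's buffer */
        r->tail++;
        freed++;
    }
    return freed;
}
```

Keeping the two indices free-running and reducing them modulo a power-of-two size avoids a separate "full" flag: the ring is full exactly when `head - tail` equals the ring size.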
4.2) Receive process
4.2. Receive process
--------------------
The driver will post receive buffers to the receive DMA logic during driver
initialization. Receive buffers may or may not be queued depending upon the
underlying DMA logic (MSGDMA is able to queue receive buffers, SGDMA is not able
......@@ -79,34 +96,39 @@ received, the DMA logic generates an interrupt. The driver handles a receive
interrupt by obtaining the DMA receive logic status, reaping receive
completions until no more receive completions are available.
4.3) Interrupt Mitigation
4.3. Interrupt Mitigation
-------------------------
The driver is able to mitigate the number of its DMA interrupts
using NAPI for receive operations. Interrupt mitigation is not yet supported
for transmit operations, but will be added in a future maintenance release.
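The receive handling and NAPI-based mitigation described above follow a common pattern. The interrupt handler masks further RX interrupts. A poll routine then reaps at most a fixed budget of completions and re-enables the interrupt only once no completions remain. The sketch below models that pattern in plain C. `struct rx_state`, `rx_irq()` and `rx_poll()` are hypothetical stand-ins, not the driver's functions or the kernel NAPI API, where the corresponding steps go through calls such as `napi_schedule()` and `napi_complete_done()`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the receive DMA state. */
struct rx_state {
    int pending;       /* completed receive descriptors not yet reaped */
    bool irq_enabled;  /* RX interrupt mask state */
    int delivered;     /* packets passed up to the network stack */
};

/* Hard IRQ: mask further RX interrupts and defer the work to polling. */
static void rx_irq(struct rx_state *s)
{
    s->irq_enabled = false;
}

/* Poll: reap at most 'budget' completions; unmask the IRQ only when idle. */
static int rx_poll(struct rx_state *s, int budget)
{
    int work = 0;

    while (work < budget && s->pending > 0) {
        s->pending--;      /* reap one receive completion */
        s->delivered++;    /* hand the packet to the stack */
        work++;
    }
    if (work < budget)
        s->irq_enabled = true;  /* drained: return to interrupt mode */
    return work;                /* work == budget means "poll again" */
}
```

Under heavy load, each poll returns the full budget and the interrupt stays masked, so many packets are handled per interrupt. This is the mitigation effect the text describes.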
4.4) Ethtool support
--------------------
Ethtool is supported. Driver statistics and internal errors can be retrieved
using the ethtool -S ethX command. It is also possible to dump registers, etc.
4.5) PHY Support
----------------
The driver is compatible with PAL (the PHY Abstraction Layer) to work with PHY
and GPHY devices.
4.7) List of source files:
o Kconfig
o Makefile
o altera_tse_main.c: main network device driver
o altera_tse_ethtool.c: ethtool support
o altera_tse.h: private driver structure and common definitions
o altera_msgdma.h: MSGDMA implementation function definitions
o altera_sgdma.h: SGDMA implementation function definitions
o altera_msgdma.c: MSGDMA implementation
o altera_sgdma.c: SGDMA implementation
o altera_sgdmahw.h: SGDMA register and descriptor definitions
o altera_msgdmahw.h: MSGDMA register and descriptor definitions
o altera_utils.c: Driver utility functions
o altera_utils.h: Driver utility function definitions
5) Debug Information
--------------------------
- Kconfig
- Makefile
- altera_tse_main.c: main network device driver
- altera_tse_ethtool.c: ethtool support
- altera_tse.h: private driver structure and common definitions
- altera_msgdma.h: MSGDMA implementation function definitions
- altera_sgdma.h: SGDMA implementation function definitions
- altera_msgdma.c: MSGDMA implementation
- altera_sgdma.c: SGDMA implementation
- altera_sgdmahw.h: SGDMA register and descriptor definitions
- altera_msgdmahw.h: MSGDMA register and descriptor definitions
- altera_utils.c: Driver utility functions
- altera_utils.h: Driver utility function definitions
5. Debug Information
====================
The driver exports debug information such as internal statistics,
debug information, MAC and DMA registers etc.
......@@ -118,17 +140,18 @@ or sees the MAC registers: e.g. using: ethtool -d ethX
The developer can also use the "debug" module parameter to get
further debug information.
6) Statistics Support
6. Statistics Support
=====================
The controller and driver support a mix of IEEE standard defined statistics,
RFC defined statistics, and driver or Altera defined statistics. The four
specifications containing the standard definitions for these statistics are
as follows:
o IEEE 802.3-2012 - IEEE Standard for Ethernet.
o RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
o RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
o Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
- IEEE 802.3-2012 - IEEE Standard for Ethernet.
- RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
- RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
- Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
The statistics supported by the TSE and the device driver are as follows:
......
----------------------------------------------------------------------------
NOTE: See also arcnet-hardware.txt in this directory for jumper-setting
and cabling information if you're like many of us and didn't happen to get a
manual with your ARCnet card.
----------------------------------------------------------------------------
.. SPDX-License-Identifier: GPL-2.0
======
ARCnet
======
.. note::
See also arcnet-hardware.txt in this directory for jumper-setting
and cabling information if you're like many of us and didn't happen to get a
manual with your ARCnet card.
Since no one seems to listen to me otherwise, perhaps a poem will get your
attention:
attention::
This driver's getting fat and beefy,
But my cat is still named Fifi.
......@@ -24,28 +31,21 @@ Come on, be a sport! Send me a success report!
(hey, that was even better than my original poem... this is getting bad!)
--------
WARNING:
--------
If you don't e-mail me about your success/failure soon, I may be forced to
start SINGING. And we don't want that, do we?
.. warning::
(You know, it might be argued that I'm pushing this point a little too much.
If you think so, why not flame me in a quick little e-mail? Please also
include the type of card(s) you're using, software, size of network, and
whether it's working or not.)
If you don't e-mail me about your success/failure soon, I may be forced to
start SINGING. And we don't want that, do we?
My e-mail address is: apenwarr@worldvisions.ca
(You know, it might be argued that I'm pushing this point a little too much.
If you think so, why not flame me in a quick little e-mail? Please also
include the type of card(s) you're using, software, size of network, and
whether it's working or not.)
My e-mail address is: apenwarr@worldvisions.ca
---------------------------------------------------------------------------
These are the ARCnet drivers for Linux.
This new release (2.91) has been put together by David Woodhouse
<dwmw2@infradead.org>, in an attempt to tidy up the driver after adding support
for yet another chipset. Now the generic support has been separated from the
individual chipset drivers, and the source files aren't quite so packed with
......@@ -62,12 +62,13 @@ included and seems to be working fine!
Where do I discuss these drivers?
---------------------------------
Tomasz has been so kind as to set up a new and improved mailing list.
Subscribe by sending a message with the BODY "subscribe linux-arcnet YOUR
REAL NAME" to listserv@tichy.ch.uj.edu.pl. Then, to submit messages to the
list, mail to linux-arcnet@tichy.ch.uj.edu.pl.
There are archives of the mailing list at:
http://epistolary.org/mailman/listinfo.cgi/arcnet
The people on linux-net@vger.kernel.org (now defunct, replaced by
......@@ -80,17 +81,20 @@ Other Drivers and Info
----------------------
You can try my ARCNET page on the World Wide Web at:
http://www.qis.net/~jschmitz/arcnet/
Also, SMC (one of the companies that makes ARCnet cards) has a WWW site you
might be interested in, which includes several drivers for various cards
including ARCnet. Try:
http://www.smc.com/
Performance Technologies makes various network software that supports
ARCnet:
http://www.perftech.com/ or ftp to ftp.perftech.com.
Novell makes a networking stack for DOS which includes ARCnet drivers. Try
FTPing to ftp.novell.com.
......@@ -99,19 +103,20 @@ one you'll want to use with ARCnet cards) from
oak.oakland.edu:/simtel/msdos/pktdrvr. It won't work perfectly on a 386+
without patches, though, and also doesn't like several cards. Fixed
versions are available on my WWW page, or via e-mail if you don't have WWW
access.
Installing the Driver
---------------------
All you will need to do in order to install the driver is:
All you will need to do in order to install the driver is::
make config
(be sure to choose ARCnet in the network devices
and at least one chipset driver.)
make clean
make zImage
If you obtained this ARCnet package as an upgrade to the ARCnet driver in
your current kernel, you will need to first copy arcnet.c over the one in
the linux/drivers/net directory.
......@@ -125,10 +130,12 @@ There are four chipset options:
This is the normal ARCnet card, which you've probably got. This is the only
chipset driver which will autoprobe if not told where the card is.
It accepts the following options on the command line:
It accepts the following options on the command line::
com90xx=[<io>[,<irq>[,<shmem>]]][,<name>] | <name>
If you load the chipset support as a module, the options are:
If you load the chipset support as a module, the options are::
io=<io> irq=<irq> shmem=<shmem> device=<name>
To disable the autoprobe, just specify "com90xx=" on the kernel command line.
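The `com90xx=` syntax above can be read as up to three numeric fields in order (I/O port, IRQ, shared memory address), optionally followed by a device name. The following C sketch parses a string in that form. `struct com90xx_opts` and `com90xx_parse()` are hypothetical. Treating an explicit I/O address as "no autoprobe needed" is an assumption, not documented behaviour.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the parsed options; 0 means "not given". */
struct com90xx_opts {
    unsigned long io, irq, shmem;
    char name[16];
    bool autoprobe;
};

/* Parse the value after "com90xx=": up to three numbers, then an optional name. */
static int com90xx_parse(const char *arg, struct com90xx_opts *o)
{
    unsigned long *nums[3] = { &o->io, &o->irq, &o->shmem };
    const char *s = arg;
    int n = 0;

    memset(o, 0, sizeof(*o));
    while (*s) {
        size_t len = strcspn(s, ",");          /* length of this field */

        if (isdigit((unsigned char)*s)) {
            char *end;
            if (n == 3 || o->name[0])
                return -1;                     /* too many numbers, or number after name */
            *nums[n++] = strtoul(s, &end, 0);  /* accepts 0x... as well */
            if (end != s + len)
                return -1;                     /* trailing junk in the field */
        } else {
            if (o->name[0] || len == 0 || len >= sizeof(o->name))
                return -1;                     /* second name, empty field, or too long */
            memcpy(o->name, s, len);
            o->name[len] = '\0';
        }
        s += len;
        if (*s == ',')
            s++;
    }
    /* An empty value disables autoprobe and a bare name keeps it, as stated above.
     * Assumed: an explicit I/O address also makes probing unnecessary. */
    o->autoprobe = (*arg != '\0' && o->io == 0);
    return 0;
}
```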
......@@ -136,14 +143,17 @@ To specify the name alone, but allow autoprobe, just put "com90xx=<name>"
2. ARCnet COM20020 chipset.
This is the new chipset from SMC with support for promiscuous mode (packet
sniffing), extra diagnostic information, etc. Unfortunately, there is no
sensible method of autoprobing for these cards. You must specify the I/O
address on the kernel command line.
The command line options are:
The command line options are::
com20020=<io>[,<irq>[,<node_ID>[,backplane[,CKP[,timeout]]]]][,name]
If you load the chipset support as a module, the options are:
If you load the chipset support as a module, the options are::
io=<io> irq=<irq> node=<node_ID> backplane=<backplane> clock=<CKP>
timeout=<timeout> device=<name>
......@@ -160,8 +170,10 @@ you have a card which doesn't support shared memory, or (strangely) in case
you have so many ARCnet cards in your machine that you run out of shmem slots.
If you don't give the IO address on the kernel command line, then the driver
will not find the card.
The command line options are:
com90io=<io>[,<irq>][,<name>]
The command line options are::
com90io=<io>[,<irq>][,<name>]
If you load the chipset support as a module, the options are:
io=<io> irq=<irq> device=<name>
......@@ -169,44 +181,49 @@ If you load the chipset support as a module, the options are:
4. ARCnet RIM I cards.
These are COM90xx chips which are _completely_ memory mapped. The support for
these is not tested. If you have one, please mail the author with a success
report. All options must be specified, except the device name.
Command line options:
Command line options::
arcrimi=<shmem>,<irq>,<node_ID>[,<name>]
If you load the chipset support as a module, the options are:
If you load the chipset support as a module, the options are::
shmem=<shmem> irq=<irq> node=<node_ID> device=<name>
Loadable Module Support
-----------------------
Configure and rebuild Linux. When asked, answer 'm' to "Generic ARCnet
support" and to support for your ARCnet chipset if you want to use the
loadable module. You can also say 'y' to "Generic ARCnet support" and 'm'
to the chipset support if you wish.
::
make config
make clean
make zImage
make modules
If you're using a loadable module, you need to use insmod to load it, and
you can specify various characteristics of your card on the command
line. (In recent versions of the driver, autoprobing is much more reliable
and works as a module, so most of this is now unnecessary.)
For example:
For example::
cd /usr/src/linux/modules
insmod arcnet.o
insmod com90xx.o
insmod com20020.o io=0x2e0 device=eth1
Using the Driver
----------------
If you build your kernel with ARCnet COM90xx support included, it should
probe for your card automatically when you boot. If you use a different
chipset driver compiled into the kernel, you must give the necessary options
on the kernel command line, as detailed above.
......@@ -224,69 +241,78 @@ Multiple Cards in One Computer
------------------------------
Linux has pretty good support for this now, but since I've been busy, the
ARCnet driver has somewhat suffered in this respect. COM90xx support, if
compiled into the kernel, will (try to) autodetect all the installed cards.
If you have other cards, with support compiled into the kernel, then you can
just repeat the options on the kernel command line, e.g.::
LILO: linux com20020=0x2e0 com20020=0x380 com90io=0x260
If you have the chipset support built as a loadable module, then you need to
do something like this::
insmod -o arc0 com90xx
insmod -o arc1 com20020 io=0x2e0
insmod -o arc2 com90xx
The ARCnet drivers will now sort out their names automatically.
How do I get it to work with...?
--------------------------------
NFS: Should be fine linux->linux, just pretend you're using Ethernet cards.
oak.oakland.edu:/simtel/msdos/nfs has some nice DOS clients. There
is also a DOS-based NFS server called SOSS. It doesn't multitask
quite the way Linux does (actually, it doesn't multitask AT ALL) but
you never know what you might need.
With AmiTCP (and possibly others), you may need to set the following
options in your Amiga nfstab: MD 1024 MR 1024 MW 1024
(Thanks to Christian Gottschling <ferksy@indigo.tng.oche.de>
NFS:
Should be fine linux->linux, just pretend you're using Ethernet cards.
oak.oakland.edu:/simtel/msdos/nfs has some nice DOS clients. There
is also a DOS-based NFS server called SOSS. It doesn't multitask
quite the way Linux does (actually, it doesn't multitask AT ALL) but
you never know what you might need.
With AmiTCP (and possibly others), you may need to set the following
options in your Amiga nfstab: MD 1024 MR 1024 MW 1024
(Thanks to Christian Gottschling <ferksy@indigo.tng.oche.de>
for this.)
Probably these refer to maximum NFS data/read/write block sizes. I
don't know why the defaults on the Amiga didn't work; write to me if
you know more.
DOS: If you're using the freeware arcether.com, you might want to install
the driver patch from my web page. It helps with PC/TCP, and also
can get arcether to load if it timed out too quickly during
initialization. In fact, if you use it on a 386+ you REALLY need
the patch, really.
Windows: See DOS :) Trumpet Winsock works fine with either the Novell or
DOS:
If you're using the freeware arcether.com, you might want to install
the driver patch from my web page. It helps with PC/TCP, and also
can get arcether to load if it timed out too quickly during
initialization. In fact, if you use it on a 386+ you REALLY need
the patch, really.
Windows:
See DOS :) Trumpet Winsock works fine with either the Novell or
Arcether client, assuming you remember to load winpkt of course.
LAN Manager and Windows for Workgroups: These programs use protocols that
are incompatible with the Internet standard. They try to pretend
the cards are Ethernet, and confuse everyone else on the network.
However, v2.00 and higher of the Linux ARCnet driver supports this
protocol via the 'arc0e' device. See the section on "Multiprotocol
Support" for more information.
LAN Manager and Windows for Workgroups:
These programs use protocols that
are incompatible with the Internet standard. They try to pretend
the cards are Ethernet, and confuse everyone else on the network.
However, v2.00 and higher of the Linux ARCnet driver supports this
protocol via the 'arc0e' device. See the section on "Multiprotocol
Support" for more information.
Using the freeware Samba server and clients for Linux, you can now
interface quite nicely with TCP/IP-based WfWg or Lan Manager
networks.
Windows 95: Tools are included with Win95 that let you use either the LANMAN
Windows 95:
Tools are included with Win95 that let you use either the LANMAN
style network drivers (NDIS) or Novell drivers (ODI) to handle your
ARCnet packets. If you use ODI, you'll need to use the 'arc0'
device with Linux. If you use NDIS, then try the 'arc0e' device.
See the "Multiprotocol Support" section below if you need arc0e,
you're completely insane, and/or you need to build some kind of
hybrid network that uses both encapsulation types.
OS/2: I've been told it works under Warp Connect with an ARCnet driver from
OS/2:
I've been told it works under Warp Connect with an ARCnet driver from
SMC. You need to use the 'arc0e' interface for this. If you get
the SMC driver to work with the TCP/IP stuff included in the
"normal" Warp Bonus Pack, let me know.
......@@ -295,7 +321,8 @@ OS/2: I've been told it works under Warp Connect with an ARCnet driver from
which should use the same protocol as WfWg does. I had no luck
installing it under Warp, however. Please mail me with any results.
NetBSD/AmiTCP: These use an old version of the Internet standard ARCnet
NetBSD/AmiTCP:
These use an old version of the Internet standard ARCnet
protocol (RFC1051) which is compatible with the Linux driver v2.10
ALPHA and above using the arc0s device. (See "Multiprotocol ARCnet"
below.) ** Newer versions of NetBSD apparently support RFC1201.
......@@ -307,16 +334,17 @@ Using Multiprotocol ARCnet
The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
"virtual network device":
arc0 - RFC1201 protocol, the official Internet standard which just
happens to be 100% compatible with Novell's TRXNET driver.
====== ===============================================================
arc0 RFC1201 protocol, the official Internet standard which just
happens to be 100% compatible with Novell's TRXNET driver.
Version 1.00 of the ARCnet driver supported _only_ this
protocol. arc0 is the fastest of the three protocols (for
whatever reason), and allows larger packets to be used
because it supports RFC1201 "packet splitting" operations.
Unless you have a specific need to use a different protocol,
I strongly suggest that you stick with this one.
arc0e - "Ethernet-Encapsulation" which sends packets over ARCnet
arc0e "Ethernet-Encapsulation" which sends packets over ARCnet
that are actually a lot like Ethernet packets, including the
6-byte hardware addresses. This protocol is compatible with
Microsoft's NDIS ARCnet driver, like the one in WfWg and
......@@ -328,8 +356,8 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
fit. arc0e also works slightly more slowly than arc0, for
reasons yet to be determined. (Probably it's the smaller
MTU that does it.)
arc0s - The "[s]imple" RFC1051 protocol is the "previous" Internet
arc0s The "[s]imple" RFC1051 protocol is the "previous" Internet
standard that is completely incompatible with the new
standard. Some software today, however, continues to
support the old standard (and only the old standard)
......@@ -338,9 +366,10 @@ The ARCnet driver v2.10 ALPHA supports three protocols, each on its own
smaller than the Internet "requirement," so it's quite
possible that you may run into problems. It's also slower
than RFC1201 by about 25%, for the same reason as arc0e.
The arc0s support was contributed by Tomasz Motylewski
and modified somewhat by me. Bugs are probably my fault.
====== ===============================================================
You can choose not to compile arc0e and arc0s into the driver if you want -
this will save you a bit of memory and avoid confusion when eg. trying to
......@@ -358,19 +387,21 @@ can set up your network then:
two available protocols. As mentioned above, it's a good idea to use
only arc0 unless you have a good reason (like some other software, ie.
WfWg, that only works with arc0e).
If you need only arc0, then the following commands should get you going:
ifconfig arc0 MY.IP.ADD.RESS
route add MY.IP.ADD.RESS arc0
route add -net SUB.NET.ADD.RESS arc0
[add other local routes here]
If you need arc0e (and only arc0e), it's a little different:
ifconfig arc0 MY.IP.ADD.RESS
ifconfig arc0e MY.IP.ADD.RESS
route add MY.IP.ADD.RESS arc0e
route add -net SUB.NET.ADD.RESS arc0e
If you need only arc0, then the following commands should get you going::
ifconfig arc0 MY.IP.ADD.RESS
route add MY.IP.ADD.RESS arc0
route add -net SUB.NET.ADD.RESS arc0
[add other local routes here]
If you need arc0e (and only arc0e), it's a little different::
ifconfig arc0 MY.IP.ADD.RESS
ifconfig arc0e MY.IP.ADD.RESS
route add MY.IP.ADD.RESS arc0e
route add -net SUB.NET.ADD.RESS arc0e
arc0s works much the same way as arc0e.
......@@ -391,29 +422,32 @@ can set up your network then:
XT (patience), however, does not have its own Internet IP address and so
I assigned it one on a "private subnet" (as defined by RFC1597).
To start with, take a simple network with just insight and freedom.
Insight needs to:
- talk to freedom via RFC1201 (arc0) protocol, because I like it
more and it's faster.
- use freedom as its Internet gateway.
That's pretty easy to do. Set up insight like this:
ifconfig arc0 insight
route add insight arc0
route add freedom arc0 /* I would use the subnet here (like I said
That's pretty easy to do. Set up insight like this::
ifconfig arc0 insight
route add insight arc0
route add freedom arc0 /* I would use the subnet here (like I said
to do in "single protocol" above),
but the rest of the subnet
unfortunately lies across the PPP
link on freedom, which confuses
things. */
route add default gw freedom
And freedom gets configured like so:
ifconfig arc0 freedom
route add freedom arc0
route add insight arc0
/* and default gateway is configured by pppd */
but the rest of the subnet
unfortunately lies across the PPP
link on freedom, which confuses
things. */
route add default gw freedom
And freedom gets configured like so::
ifconfig arc0 freedom
route add freedom arc0
route add insight arc0
/* and default gateway is configured by pppd */
Great, now insight talks to freedom directly on arc0, and sends packets
to the Internet through freedom. If you didn't know how to do the above,
you should probably stop reading this section now because it only gets
......@@ -425,7 +459,7 @@ can set up your network then:
Internet. (Recall that patience has a "private IP address" which won't
work on the Internet; that's okay, I configured Linux IP masquerading on
freedom for this subnet).
So patience (necessarily; I don't have another IP number from my
provider) has an IP address on a different subnet than freedom and
insight, but needs to use freedom as an Internet gateway. Worse, most
......@@ -435,53 +469,54 @@ can set up your network then:
insight, patience WILL send through its default gateway, regardless of
the fact that both freedom and insight (courtesy of the arc0e device)
could understand a direct transmission.
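
The behaviour described above is the ordinary IPv4 on-link test: a host
delivers directly only to destinations that share its network prefix, and
hands everything else to its default gateway. A minimal C sketch of that
test follows; the addresses are hypothetical documentation-range values
(192.168.1.0/24 for the private subnet, 203.0.113.0/24 standing in for the
registered one), since the real ones are not given here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if dst shares local's network prefix, i.e. it can be reached
 * directly instead of via the default gateway. Addresses and netmask
 * are all in network byte order, as returned by inet_addr(). */
static bool ipv4_on_link(uint32_t local, uint32_t netmask, uint32_t dst)
{
    return (local & netmask) == (dst & netmask);
}

/* Next hop for dst: dst itself when on-link, otherwise the gateway. */
static uint32_t ipv4_next_hop(uint32_t local, uint32_t netmask,
                              uint32_t gateway, uint32_t dst)
{
    /* A default gateway is only usable if it is itself on-link. */
    assert(ipv4_on_link(local, netmask, gateway));
    return ipv4_on_link(local, netmask, dst) ? dst : gateway;
}
```

Under these assumptions, ``ipv4_next_hop()`` run on patience (192.168.1.3,
netmask 255.255.255.0, gateway 192.168.1.1) returns the gateway for insight
at 203.0.113.2, even though both hosts sit on the same ARCnet wire.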

I compensate by giving freedom an extra IP address, aliased 'gatekeeper',
that is on my private subnet, the same subnet that patience is on. I
then define gatekeeper to be the default gateway for patience.

To configure freedom (in addition to the commands above)::

    ifconfig arc0e gatekeeper
    route add gatekeeper arc0e
    route add patience arc0e

This way, freedom will send all packets for patience through arc0e,
giving its IP address as gatekeeper (on the private subnet). When it
talks to insight or the Internet, it will use its "freedom" Internet IP
address.
You will notice that we haven't configured the arc0e device on insight.
This would work, but is not really necessary, and would require me to
assign insight another special IP number from my private subnet. Since
both insight and patience are using freedom as their default gateway, the
two can already talk to each other.
It's quite fortunate that I set things up like this the first time (cough
cough) because it's really handy when I boot insight into DOS. There, it
runs the Novell ODI protocol stack, which only works with RFC1201 ARCnet.
In this mode it would be impossible for insight to communicate directly
with patience, since the Novell stack is incompatible with Microsoft's
Ethernet-Encap. Without changing any settings on freedom or patience, I
simply set freedom as the default gateway for insight (now in DOS,
remember) and all the forwarding happens "automagically" between the two
hosts that would normally not be able to communicate at all.
For those who like diagrams, I have created two "virtual subnets" on the
same physical ARCnet wire. You can picture it like this::

        [RFC1201 NETWORK]                    [ETHER-ENCAP NETWORK]
    (registered Internet subnet)           (RFC1597 private subnet)

                           (IP Masquerade)
        /---------------\         *            /---------------\
        |               |         *            |               |
        |               +-Freedom-*-Gatekeeper-+               |
        |               |    |    *            |               |
        \-------+-------/    |    *            \-------+-------/
                |            |                         |
             Insight         |                      Patience
                        (Internet)

......@@ -491,6 +526,7 @@ It works: what now?
Send mail describing your setup, preferably including driver version, kernel
version, ARCnet card model, CPU type, number of systems on your network, and
list of software in use to me at the following address:
apenwarr@worldvisions.ca
I do send (sometimes automated) replies to all messages I receive. My email
......@@ -525,7 +561,7 @@ this, you should grab the pertinent RFCs. (some are listed near the top of
arcnet.c). arcdump assumes your card is at 0xD0000. If it isn't, edit the
script.
Buffers 0 and 1 are used for receiving, and Buffers 2 and 3 are for sending.
Ping-pong buffers are implemented both ways.
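
The usual point of ping-pong buffering is that one buffer of a pair can be
filled while the other is still being processed, so each direction simply
alternates between its two buffers. A minimal sketch of that alternation,
assuming a hypothetical ``struct pingpong`` that is not the driver's actual
data structure:

```c
#include <assert.h>

/* Hypothetical bookkeeping for one direction of ping-pong buffering:
 * a pair of adjacent card buffers used alternately. */
struct pingpong {
    int first;     /* lower buffer of the pair: 0 for RX, 2 for TX */
    int current;   /* buffer currently in use */
};

static void pingpong_init(struct pingpong *pp, int first)
{
    assert(first == 0 || first == 2);
    pp->first = first;
    pp->current = first;
}

/* Switch to the other buffer of the pair and return its number. */
static int pingpong_next(struct pingpong *pp)
{
    pp->current = pp->first + ((pp->current - pp->first) ^ 1);
    return pp->current;
}
```

Receiving therefore cycles 0, 1, 0, 1, ... and sending cycles 2, 3, 2, 3, ...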
If your debug level includes D_DURING and you did NOT define SLOW_XMIT_COPY,
......@@ -535,9 +571,11 @@ decides that the driver is broken). During a transmit, unused parts of the
buffer will be cleared to 0x42 as well. This is to make it easier to figure
out which bytes are being used by a packet.
You can change the debug level without recompiling the kernel by typing::

    ifconfig arc0 down metric 1xxx
    /etc/rc.d/rc.inet1

where "xxx" is the debug level you want. For example, "metric 1015" would put
you at debug level 15. Debug level 7 is currently the default.
......@@ -546,7 +584,7 @@ combination of different debug flags; so debug level 7 is really 1+2+4 or
D_NORMAL+D_EXTRA+D_INIT. To include D_DURING, you would add 16 to this,
resulting in debug level 23.
If you don't understand that, you probably don't want to know anyway.
E-mail me about your problem.
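
The debug-level arithmetic above can be written down directly. A minimal
sketch, with the flag values taken from this text rather than from the
driver headers (``arcnet_debug_metric()`` is a hypothetical helper):

```c
#include <assert.h>

/* Flag values as given in the text above, not read from the driver
 * headers: 7 = D_NORMAL + D_EXTRA + D_INIT = 1 + 2 + 4. */
enum {
    D_NORMAL = 1,
    D_EXTRA  = 2,
    D_INIT   = 4,
    D_DURING = 16,
};

/* Hypothetical helper: the "metric 1xxx" value selecting debug level xxx. */
static int arcnet_debug_metric(int level)
{
    assert(level >= 0 && level <= 999);   /* xxx has at most three digits */
    return 1000 + level;
}
```

So ``arcnet_debug_metric(D_NORMAL | D_EXTRA | D_INIT | D_DURING)`` yields
1023, i.e. ``ifconfig arc0 down metric 1023``.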
......
.. SPDX-License-Identifier: GPL-2.0
===
ATM
===
In order to use anything but the most primitive functions of ATM,
several user-mode programs are required to assist the kernel. These
programs and related material can be found via the ATM on Linux Web
......
.. SPDX-License-Identifier: GPL-2.0
=====
AX.25
=====
To use the amateur radio protocols within Linux you will need to get a
suitable copy of the AX.25 Utilities. More detailed information about
AX.25, NET/ROM and ROSE, associated programs and utilities can be
......
.. SPDX-License-Identifier: GPL-2.0

===============================
Linux Drivers for Baycom Modems
===============================

Thomas M. Sailer, HB9JNX/AE4WA, <sailer@ife.ee.ethz.ch>

The drivers for the baycom modems have been split into
separate drivers as they did not share any code, and the driver
and device names have changed.

This document describes the Linux Kernel Drivers for simple Baycom style
amateur radio modems.

The following drivers are available:
====================================
baycom_ser_fdx:
This driver supports the SER12 modems either full or half duplex.
Its baud rate may be changed via the ``baud`` module parameter,
therefore it supports just about every bit bang modem on a
serial port. Its devices are called bcsf0 through bcsf3.
This is the recommended driver for SER12 type modems,
however if you have a broken UART clone that does not have working
delta status bits, you may try baycom_ser_hdx.

baycom_ser_hdx:
This is an alternative driver for SER12 type modems.
It only supports half duplex, and only 1200 baud. Its devices
are called bcsh0 through bcsh3. Use this driver only if baycom_ser_fdx
......@@ -37,45 +42,48 @@ baycom_epp:
The following modems are supported:

======= ========================================================================
ser12   This is a very simple 1200 baud AFSK modem. The modem consists only
        of a modulator/demodulator chip, usually a TI TCM3105. The computer
        is responsible for regenerating the receiver bit clock, as well as
        for handling the HDLC protocol. The modem connects to a serial port,
        hence the name. Since the serial port is not used as an async serial
        port, the kernel driver for serial ports cannot be used, and this
        driver only supports standard serial hardware (8250, 16450, 16550).

par96   This is a modem for 9600 baud FSK compatible with the G3RUH standard.
        The modem does all the filtering and regenerates the receiver clock.
        Data is transferred from and to the PC via a shift register.
        The shift register is filled with 16 bits and an interrupt is signalled.
        The PC then empties the shift register in a burst. This modem connects
        to the parallel port, hence the name. The modem leaves the
        implementation of the HDLC protocol and the scrambler polynomial to
        the PC.

picpar  This is a redesign of the par96 modem by Henning Rech, DF9IC. The modem
        is protocol-compatible with par96, but uses only three low-power ICs
        and can therefore be fed from the parallel port and does not require
        an additional power supply. Furthermore, it incorporates carrier
        detect circuitry.

EPP     This is a high-speed modem adaptor that connects to an enhanced parallel
        port. Its target audience is users working over a high-speed hub
        (76.8 kbit/s).

eppfpga This is a redesign of the EPP adaptor.
======= ========================================================================

All of the above modems only support half duplex communications. However,
the driver supports the KISS (see below) fullduplex command. It then simply
starts to send as soon as there's a packet to transmit and does not care
about DCD, i.e. it starts to send even if there's someone else on the channel.
This command is required by some implementations of the DAMA channel
access protocol.
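
The channel-access rule in the previous paragraph reduces to one decision
per queued packet. A minimal sketch with a hypothetical ``may_start_tx()``
helper; it deliberately ignores the other KISS parameters (persistence,
slot time, TX delay) that a real implementation would normally also apply:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transmit gate for a half-duplex radio channel.
 * Normally a queued packet may only be sent while DCD reports a clear
 * channel; with the KISS fullduplex command enabled, DCD is ignored. */
static bool may_start_tx(bool packet_queued, bool dcd_busy,
                         bool kiss_fullduplex)
{
    if (!packet_queued)
        return false;
    return kiss_fullduplex || !dcd_busy;
}
```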
The Interface of the drivers
============================
Unlike previous drivers, these drivers are no longer character devices,
but they are now true kernel network interfaces. Installation is therefore
......@@ -88,20 +96,22 @@ me for WAMPES which allows attaching a kernel network interface directly.
Configuring the driver
======================
Every time a driver is inserted into the kernel, it has to know which
modems it should access at which ports. This can be done with the setbaycom
utility. If you are only using one modem, you can also configure the
driver from the insmod command line (or by means of an option line in
``/etc/modprobe.d/*.conf``).

Examples::

    modprobe baycom_ser_fdx mode="ser12*" iobase=0x3f8 irq=4
    sethdlc -i bcsf0 -p mode "ser12*" io 0x3f8 irq 4

Both lines configure the first port to drive a ser12 modem at the first
serial port (COM1 under DOS). The * in the mode parameter instructs the driver
to use the software DCD algorithm (see below)::

    insmod baycom_par mode="picpar" iobase=0x378
    sethdlc -i bcp0 -p mode "picpar" io 0x378
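
The ``mode`` strings above combine a modem type with an optional trailing
``*`` that selects software DCD. A minimal C sketch of how such a string
can be split, assuming a hypothetical ``struct baycom_mode``; this only
illustrates the convention and is not the drivers' own parser:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of the "mode" parameter. */
struct baycom_mode {
    char modem[16];   /* modem type, e.g. "ser12" or "picpar" */
    bool soft_dcd;    /* true if the string ended in '*' */
};

/* Split a mode string such as "ser12*" into the modem type and the
 * software-DCD flag. Returns 0 on success, -1 if the name is too long. */
static int parse_baycom_mode(const char *mode, struct baycom_mode *out)
{
    size_t len;

    assert(mode != NULL && out != NULL);
    len = strlen(mode);
    out->soft_dcd = len > 0 && mode[len - 1] == '*';
    if (out->soft_dcd)
        len--;                        /* drop the '*' suffix */
    if (len >= sizeof(out->modem))
        return -1;
    memcpy(out->modem, mode, len);
    out->modem[len] = '\0';
    return 0;
}
```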
......@@ -115,29 +125,33 @@ Note that both utilities interpret the values slightly differently.
Hardware DCD versus Software DCD
================================
To avoid collisions on the air, the driver must know when the channel is
busy. This is the task of the DCD circuitry/software. The driver may either
utilise a software DCD algorithm (options=1) or use a DCD signal from
the hardware (options=0).

======= =================================================================
ser12   If software DCD is utilised, the radio's squelch should always be
        open. It is highly recommended to use the software DCD algorithm,
        as it is much faster than most hardware squelch circuitry. The
        disadvantage is a slightly higher load on the system.

par96   The software DCD algorithm for this type of modem is rather poor.
        The modem simply does not provide enough information to implement
        a reasonable DCD algorithm in software. Therefore, if your radio
        feeds the DCD input of the PAR96 modem, the use of the hardware
        DCD circuitry is recommended.

picpar  The picpar modem features built-in DCD hardware, which is highly
        recommended.
======= =================================================================

Compatibility with the rest of the Linux kernel
===============================================
The serial driver and the baycom serial drivers compete
for the same hardware resources. Of course only one driver can access a given
......@@ -154,5 +168,7 @@ The parallel port drivers (baycom_par, baycom_epp) now use the parport subsystem
to arbitrate the ports between different client drivers.
vy 73s de
Tom Sailer, sailer@ife.ee.ethz.ch
hb9jnx @ hb9w.ampr.org
:orphan:
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
......
.. SPDX-License-Identifier: GPL-2.0
CAIF
====
Contents:

.. toctree::
   :maxdepth: 2

   linux_caif
   caif
   spi_porting

.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
==========
Linux CAIF
==========
Copyright |copy| ST-Ericsson AB 2010
:Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
:License terms: GNU General Public License (GPL) version 2
Introduction
============
CAIF is a MUX protocol used by ST-Ericsson cellular modems for
communication between Modem and host. The host processes can open virtual AT
channels, initiate GPRS Data connections, Video channels and Utility Channels.
......@@ -16,13 +23,16 @@ ST-Ericsson modems support a number of transports between modem
and host. Currently, UART and Loopback are available for Linux.
Architecture
============
The implementation of CAIF is divided into:
* CAIF Socket Layer and GPRS IP Interface.
* CAIF Core Protocol Implementation
* CAIF Link Layer, implemented as NET devices.
::
RTNL
!
......@@ -46,12 +56,12 @@ The implementation of CAIF is divided into:
Implementation
==============
CAIF Core Protocol Layer
------------------------
CAIF Core layer implements the CAIF protocol as defined by ST-Ericsson.
It implements the CAIF protocol stack in a layered approach, where
......@@ -59,8 +69,11 @@ each layer described in the specification is implemented as a separate layer.
The architecture is inspired by the design patterns "Protocol Layer" and
"Protocol Packet".
CAIF structure
^^^^^^^^^^^^^^
The Core CAIF implementation contains:
- Simple implementation of CAIF.
- Layered architecture (à la Streams); each layer in the CAIF
specification is implemented in a separate C file.
......@@ -73,7 +86,8 @@ The Core CAIF implementation contains:
to the called function (except for framing layers' receive function)
Layered Architecture
====================
The CAIF protocol can be divided into two parts: Support functions and Protocol
Implementation. The support functions include:
......@@ -112,7 +126,7 @@ The CAIF Protocol implementation contains:
- CFSERL CAIF Serial layer. Handles concatenation/split of frames
into CAIF Frames with correct length.
::
+---------+
| Config |
......@@ -143,18 +157,24 @@ The CAIF Protocol implementation contains:
In this layered approach the following "rules" apply.
- All layers embed the same structure "struct cflayer"
- A layer does not depend on any other layer's private data.
- Layers are stacked by setting the pointers::

    layer->up, layer->dn

- In order to send data upwards, each layer should do::

    layer->up->receive(layer->up, packet);

- In order to send data downwards, each layer should do::

    layer->dn->transmit(layer->dn, packet);

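
The stacking rules above can be illustrated with a self-contained
user-space sketch. ``struct layer`` and ``struct pkt`` are hypothetical
simplifications: the real ``struct cflayer`` has more members and passes
``struct cfpkt`` packets, but the ``up``/``dn`` wiring and the
receive/transmit calls follow the same pattern:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet: only records how many layers handled it. */
struct pkt {
    int hops;
};

/* Hypothetical, simplified stand-in for struct cflayer. */
struct layer {
    struct layer *up;   /* neighbour towards the socket/IP side */
    struct layer *dn;   /* neighbour towards the link layer */
    int (*receive)(struct layer *layr, struct pkt *p);
    int (*transmit)(struct layer *layr, struct pkt *p);
};

/* Stack 'upper' on top of 'lower' by setting the up/dn pointers. */
static void layer_stack(struct layer *upper, struct layer *lower)
{
    assert(upper != NULL && lower != NULL);
    upper->dn = lower;
    lower->up = upper;
}

/* Pass-through receive: count the hop, then hand the packet upwards. */
static int pass_receive(struct layer *layr, struct pkt *p)
{
    p->hops++;
    return layr->up ? layr->up->receive(layr->up, p) : 0;
}

/* Pass-through transmit: count the hop, then hand the packet downwards. */
static int pass_transmit(struct layer *layr, struct pkt *p)
{
    p->hops++;
    return layr->dn ? layr->dn->transmit(layr->dn, p) : 0;
}
```

Calling ``receive`` on the bottom layer walks a packet up through every
layer, and ``transmit`` on the top layer walks it down, without any layer
touching another layer's private data.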
CAIF Socket and IP interface
============================
The IP interface and CAIF socket API are implemented on top of the
CAIF Core protocol. The IP Interface and CAIF socket have an instance of
......
.. SPDX-License-Identifier: GPL-2.0

================
CAIF SPI porting
================

CAIF SPI basics
===============

Running CAIF over SPI needs some extra setup, owing to the nature of SPI.
Two extra GPIOs have been added in order to negotiate the transfers
between the master and the slave. The minimum requirement for running
CAIF over SPI is an SPI slave chip and two GPIOs (more details below).
Please note that running as a slave implies that you need to keep up
with the master clock. An overrun or underrun event is fatal.
CAIF SPI framework
==================
To make porting as easy as possible, the CAIF SPI has been divided in
two parts. The first part (called the interface part) deals with all
......@@ -27,7 +33,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
need to implement the following
functions:

::

    int (*init_xfer) (struct cfspi_xfer *xfer, struct cfspi_dev *dev);

This function is called by the CAIF SPI interface to give
you a chance to set up your hardware to be ready to receive
......@@ -36,7 +44,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
of the transfer in both directions. The dev parameter can be used
to map to different CAIF SPI slave devices.

::

    void (*sig_xfer) (bool xfer, struct cfspi_dev *dev);

This function is called by the CAIF SPI interface when the output
(SPI_INT) GPIO needs to change state. The boolean value of the xfer
......@@ -46,7 +56,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
- Functionality provided by the CAIF SPI interface:

::

    void (*ss_cb) (bool assert, struct cfspi_ifc *ifc);

This function is called by the CAIF SPI slave device in order to
signal a change of state of the input GPIO (SS) to the interface.
......@@ -55,7 +67,9 @@ the physical hardware, both with regard to SPI and to GPIOs.
not to introduce latency). The ifc parameter should be the pointer
returned from the platform probe function in the SPI device structure.

::

    void (*xfer_done_cb) (struct cfspi_ifc *ifc);

This function is called by the CAIF SPI slave device in order to
report that a transfer is completed. This function should only be
......@@ -68,17 +82,24 @@ the physical hardware, both with regard to SPI and to GPIOs.
- Filling in the SPI slave device structure:

  Connect the necessary callback functions.
  Indicate clock speed (used to calculate toggle delays).
  Choose a suitable name (helps debugging if you use several CAIF
  SPI slave devices).
  Assign your private data (can be used to map to your structure).

- Filling in the SPI slave platform device structure:

  Add name of driver to connect to ("cfspi_sspi").
  Assign the SPI slave device structure as platform data.

Padding
=======

In order to optimize throughput, a number of SPI padding options are provided.
Padding can be enabled independently for uplink and downlink transfers.
......@@ -87,122 +108,122 @@ The padding needs to be correctly configured on both sides of the link.
The padding can be changed via module parameters in cfspi_sspi.c or via
the sysfs directory of the cfspi_sspi driver (before device registration).
- CAIF SPI device template::

    /*
     * Copyright (C) ST-Ericsson AB 2010
     * Author: Daniel Martensson / Daniel.Martensson@stericsson.com
     * License terms: GNU General Public License (GPL), version 2.
     *
     */

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/device.h>
    #include <linux/platform_device.h>
    #include <linux/wait.h>
    #include <linux/interrupt.h>
    #include <linux/dma-mapping.h>
    #include <net/caif/caif_spi.h>

    MODULE_LICENSE("GPL");

    struct sspi_struct {
            struct cfspi_dev sdev;
            struct cfspi_xfer *xfer;
    };

    static struct sspi_struct slave;
    static struct platform_device slave_device;

    static irqreturn_t sspi_irq(int irq, void *arg)
    {
            /* You only need to trigger on an edge to the active state of the
             * SS signal. Once an edge is detected, the ss_cb() function should
             * be called with the parameter assert set to true. It is OK
             * (and even advised) to call the ss_cb() function in IRQ context in
             * order not to add any delay. */

            return IRQ_HANDLED;
    }

    static void sspi_complete(void *context)
    {
            /* Normally the DMA or the SPI framework will call you back
             * in something similar to this. The only thing you need to
             * do is to call the xfer_done_cb() function, providing the pointer
             * to the CAIF SPI interface. It is OK to call this function
             * from IRQ context. */
    }

    static int sspi_init_xfer(struct cfspi_xfer *xfer, struct cfspi_dev *dev)
    {
            /* Store transfer info. For a normal implementation you should
             * set up your DMA here and make sure that you are ready to
             * receive the data from the master SPI. */

            struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;

            sspi->xfer = xfer;

            return 0;
    }

    static void sspi_sig_xfer(bool xfer, struct cfspi_dev *dev)
    {
            /* If xfer is true then you should assert the SPI_INT to indicate to
             * the master that you are ready to receive the data from the master
             * SPI. If xfer is false then you should de-assert SPI_INT to indicate
             * that the transfer is done.
             */

            struct sspi_struct *sspi = (struct sspi_struct *)dev->priv;
    }

    static void sspi_release(struct device *dev)
    {
            /*
             * Here you should release your SPI device resources.
             */
    }

    static int __init sspi_init(void)
    {
            /* Here you should initialize your SPI device by providing the
             * necessary functions, clock speed, name and private data. Once
             * done, you can register your device with the
             * platform_device_register() function. This function will return
             * with the CAIF SPI interface initialized. This is probably also
             * the place where you should set up your GPIOs, interrupts and SPI
             * resources. */

            int res = 0;

            /* Initialize slave device. */
            slave.sdev.init_xfer = sspi_init_xfer;
            slave.sdev.sig_xfer = sspi_sig_xfer;
            slave.sdev.clk_mhz = 13;
            slave.sdev.priv = &slave;
            slave.sdev.name = "spi_sspi";
            slave_device.dev.release = sspi_release;

            /* Initialize platform device. */
            slave_device.name = "cfspi_sspi";
            slave_device.dev.platform_data = &slave.sdev;

            /* Register platform device. */
            res = platform_device_register(&slave_device);
            if (res) {
                    pr_warn("sspi_init: failed to register dev.\n");
                    return -ENODEV;
            }

            return res;
    }

    static void __exit sspi_exit(void)
    {
            /* Counterpart of platform_device_register(): unregister also
             * drops the registration reference, so sspi_release() can run. */
            platform_device_unregister(&slave_device);
    }

    module_init(sspi_init);
    module_exit(sspi_exit);

.. SPDX-License-Identifier: GPL-2.0
======================================================
cdc_mbim - Driver for CDC MBIM Mobile Broadband modems
======================================================
The cdc_mbim driver supports USB devices conforming to the "Universal
Serial Bus Communications Class Subclass Specification for Mobile
......@@ -19,9 +22,9 @@ by a cdc_ncm driver parameter:
prefer_mbim
-----------
:Type: Boolean
:Valid Range: N/Y (0-1)
:Default Value: Y (MBIM is preferred)
This parameter sets the system policy for NCM/MBIM functions. Such
functions will be handled by either the cdc_ncm driver or the cdc_mbim
......@@ -44,11 +47,13 @@ userspace MBIM management application always is required to enable a
MBIM function.
Such userspace applications include, but are not limited to:
- mbimcli (included with the libmbim [3] library), and
- ModemManager [4]
Establishing an MBIM IP session requires at least these actions by the
management application:
- open the control channel
- configure network connection settings
- connect to network
......@@ -76,7 +81,7 @@ complies with all the control channel requirements in [1].
The cdc-wdmX device is created as a child of the MBIM control
interface USB device. The character device associated with a specific
MBIM function can be looked up using sysfs. For example::

    bjorn@nemi:~$ ls /sys/bus/usb/drivers/cdc_mbim/2-4:2.12/usbmisc
    cdc-wdm0
......@@ -119,13 +124,15 @@ negotiated control message size.
/dev/cdc-wdmX ioctl()
---------------------
IOCTL_WDM_MAX_COMMAND: Get Maximum Command Size
This ioctl returns the wMaxControlMessage field of the CDC MBIM
functional descriptor for MBIM devices. This is intended as a
convenience, eliminating the need to parse the USB descriptors from
userspace.
::
#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
......@@ -178,7 +185,7 @@ VLAN links prior to establishing MBIM IP sessions where the SessionId
is greater than 0. These links can be added by using the normal VLAN
kernel interfaces, either ioctl or netlink.
For example, adding a link for an MBIM IP session with SessionId 3::

    ip link add link wwan0 name wwan0.3 type vlan id 3
......@@ -207,6 +214,7 @@ the stream to the end user in an appropriate way for the stream type.
The network device ABI requires a dummy ethernet header for every DSS
data frame being transported. The contents of this header are
arbitrary, with the following exceptions:
- TX frames using an IP protocol (0x0800 or 0x86dd) will be dropped
- RX frames will have the protocol field set to ETH_P_802_3 (but will
not be properly formatted 802.3 frames)
......@@ -218,7 +226,7 @@ adding the dummy ethernet header on TX and stripping it on RX.
This is a simple example using tools commonly available, exporting
DssSessionId 5 as a pty character device pointed to by a /dev/nmea
symlink::

    ip link add link wwan0 name wwan0.dss5 type vlan id 261
    ip link set dev wwan0.dss5 up
......@@ -236,7 +244,7 @@ map frames to the correct DSS session and adding 18 byte VLAN ethernet
headers with the appropriate tag on TX. In this case using a socket
filter is recommended, matching only the DSS VLAN subset. This avoids
unnecessary copying of unrelated IP session data to userspace. For
example::
static struct sock_filter dssfilter[] = {
/* use special negative offsets to get VLAN tag */
......@@ -249,11 +257,11 @@ example:
BPF_JUMP(BPF_JMP|BPF_JGE|BPF_K, 512, 3, 0), /* 511 is last DSS VLAN */
/* verify ethertype */
BPF_STMT(BPF_LD|BPF_H|BPF_ABS, 2 * ETH_ALEN),
BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, ETH_P_802_3, 0, 1),
BPF_STMT(BPF_RET|BPF_K, (u_int)-1), /* accept */
BPF_STMT(BPF_RET|BPF_K, 0), /* ignore */
};
......@@ -266,6 +274,7 @@ network device.
This mapping implies a few restrictions on multiplexed IPS and DSS
sessions, which may not always be practical:
- no IPS or DSS session can use a frame size greater than the MTU on
IP session 0
- no IPS or DSS session can be in the up state unless the network
......@@ -280,7 +289,7 @@ device.
Tip: It might be less confusing to the end user to name this VLAN
subdevice after the MBIM SessionID instead of the VLAN ID. For
example:
example::
ip link add link wwan0 name wwan0.0 type vlan id 4094
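The VLAN ID to MBIM session mapping used in the examples above (SessionId 3 on VLAN 3, DssSessionId 5 on VLAN 261, IPS session 0 on VLAN 4094, and 511 as the last DSS VLAN) can be summarised as a lookup function. This is an illustrative sketch, not code from the cdc_mbim driver; vlan_to_mbim() and the enum are hypothetical names, and untagged frames on wwanY are not modelled.

```c
#include <stdint.h>

/* Session types, mirroring the "MBIM type" column of the mapping. */
enum mbim_session_type { MBIM_UNSUPPORTED = 0, MBIM_IPS, MBIM_DSS };

/* Map a VLAN ID on wwanY to an MBIM session. Stores the SessionID in
 * *session_id and returns the session type. */
enum mbim_session_type vlan_to_mbim(uint16_t vlan_id, uint8_t *session_id)
{
	if (vlan_id >= 1 && vlan_id <= 255) {    /* IPS: SessionID == VLAN ID */
		*session_id = (uint8_t)vlan_id;
		return MBIM_IPS;
	}
	if (vlan_id >= 256 && vlan_id <= 511) {  /* DSS: SessionID == VLAN ID - 256 */
		*session_id = (uint8_t)(vlan_id - 256);
		return MBIM_DSS;
	}
	if (vlan_id == 4094) {                   /* alternative link for IPS session 0 */
		*session_id = 0;
		return MBIM_IPS;
	}
	return MBIM_UNSUPPORTED;                 /* e.g. 512..4093 */
}
```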
......@@ -290,7 +299,7 @@ VLAN mapping
Summarizing the cdc_mbim driver mapping described above, we have this
relationship between VLAN tags on the wwanY network device and MBIM
sessions on the shared USB data channel:
sessions on the shared USB data channel::
VLAN ID MBIM type MBIM SessionID Notes
---------------------------------------------------------
......@@ -310,30 +319,37 @@ sessions on the shared USB data channel:
References
==========
[1] USB Implementers Forum, Inc. - "Universal Serial Bus
Communications Class Subclass Specification for Mobile Broadband
Interface Model", Revision 1.0 (Errata 1), May 1, 2013
1) USB Implementers Forum, Inc. - "Universal Serial Bus
Communications Class Subclass Specification for Mobile Broadband
Interface Model", Revision 1.0 (Errata 1), May 1, 2013
- http://www.usb.org/developers/docs/devclass_docs/
[2] USB Implementers Forum, Inc. - "Universal Serial Bus
Communications Class Subclass Specifications for Network Control
Model Devices", Revision 1.0 (Errata 1), November 24, 2010
2) USB Implementers Forum, Inc. - "Universal Serial Bus
Communications Class Subclass Specifications for Network Control
Model Devices", Revision 1.0 (Errata 1), November 24, 2010
- http://www.usb.org/developers/docs/devclass_docs/
[3] libmbim - "a glib-based library for talking to WWAN modems and
devices which speak the Mobile Interface Broadband Model (MBIM)
protocol"
3) libmbim - "a glib-based library for talking to WWAN modems and
devices which speak the Mobile Interface Broadband Model (MBIM)
protocol"
- http://www.freedesktop.org/wiki/Software/libmbim/
[4] ModemManager - "a DBus-activated daemon which controls mobile
broadband (2G/3G/4G) devices and connections"
4) ModemManager - "a DBus-activated daemon which controls mobile
broadband (2G/3G/4G) devices and connections"
- http://www.freedesktop.org/wiki/Software/ModemManager/
[5] "MBIM (Mobile Broadband Interface Model) Registry"
5) "MBIM (Mobile Broadband Interface Model) Registry"
- http://compliance.usb.org/mbim/
[6] "/sys/kernel/debug/usb/devices output format"
6) "/sys/kernel/debug/usb/devices output format"
- Documentation/driver-api/usb/usb.rst
[7] "/sys/bus/usb/devices/.../descriptors"
7) "/sys/bus/usb/devices/.../descriptors"
- Documentation/ABI/stable/sysfs-bus-usb
Text File for the COPS LocalTalk Linux driver (cops.c).
By Jay Schulist <jschlst@samba.org>
.. SPDX-License-Identifier: GPL-2.0
========================================
The COPS LocalTalk Linux driver (cops.c)
========================================
By Jay Schulist <jschlst@samba.org>
This driver has two modes: Dayna mode and Tangent mode. Each mode
corresponds to a type of card. It has been found that there are two main
types of cards; all other cards are the same under different names, or
differ only in minor ways such as having more IO ports. As this driver is
tested, it will become clearer exactly which cards are supported.
Right now these cards are known to work with the COPS driver. The
LT-200 cards work in a somewhat more limited capacity than the
DL200 cards, which work very well and are in use by many people.
TANGENT driver mode:
Tangent ATB-II, Novell NL-1000, Daystar Digital LT-200
- Tangent ATB-II, Novell NL-1000, Daystar Digital LT-200
DAYNA driver mode:
Dayna DL2000/DaynaTalk PC (Half Length), COPS LT-95,
Farallon PhoneNET PC III, Farallon PhoneNET PC II
- Dayna DL2000/DaynaTalk PC (Half Length), COPS LT-95,
- Farallon PhoneNET PC III, Farallon PhoneNET PC II
Other cards possibly supported (mode unknown):
Dayna DL2000 (Full length)
- Dayna DL2000 (Full length)
The COPS driver defaults to using Dayna mode. If you built the driver
with dual support, you can change its mode by passing board_type=1 (Dayna)
or board_type=2 (Tangent) to insmod.
** Operation/loading of the driver.
Operation/loading of the driver
===============================
Use modprobe like this: /sbin/modprobe cops.o (IO #) (IRQ #)
If you do not specify any options, the driver will try to use IO = 0x240,
IRQ = 5. For now, if autoprobing, I would only use IRQ 5 for the card.
To load multiple COPS driver Localtalk cards you can do one of the following.
To load multiple COPS driver Localtalk cards you can do one of the following::
insmod cops io=0x240 irq=5
insmod -o cops2 cops io=0x260 irq=3
Or in lilo.conf put something like this::
Or in lilo.conf put something like this:
append="ether=5,0x240,lt0 ether=3,0x260,lt1"
Then bring up the interface with ifconfig. It will look something like this:
Then bring up the interface with ifconfig. It will look something like this::
lt0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-F7-00-00-00-00-00-00-00-00
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING NOARP MULTICAST MTU:600 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 coll:0
Netatalk Configuration
======================
** Netatalk Configuration
You will need to configure atalkd with something like the following to make
it work with the cops.c driver.
* For single LTalk card use.
* For single LTalk card use::
dummy -seed -phase 2 -net 2000 -addr 2000.10 -zone "1033"
lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
* For multiple cards, Ethernet and LocalTalk.
* For multiple cards, Ethernet and LocalTalk::
eth0 -seed -phase 2 -net 3000 -addr 3000.20 -zone "1033"
lt0 -seed -phase 1 -net 1000 -addr 1000.50 -zone "1033"
* For multiple LocalTalk cards, and an Ethernet card.
* Order seems to matter here, Ethernet last.
* Order seems to matter here, Ethernet last::
lt0 -seed -phase 1 -net 1000 -addr 1000.10 -zone "LocalTalk1"
lt1 -seed -phase 1 -net 2000 -addr 2000.20 -zone "LocalTalk2"
eth0 -seed -phase 2 -net 3000 -addr 3000.30 -zone "EtherTalk"
.. SPDX-License-Identifier: GPL-2.0
========================
ATM cxacru device driver
========================
Firmware is required for this device: http://accessrunner.sourceforge.net/
While it is capable of managing/maintaining the ADSL connection without the
......@@ -19,29 +25,35 @@ several sysfs attribute files for retrieving device statistics:
* adsl_headend
* adsl_headend_environment
Information about the remote headend.
- Information about the remote headend.
* adsl_config
Configuration writing interface.
Write parameters in hexadecimal format <index>=<value>,
separated by whitespace, e.g.:
- Configuration writing interface.
- Write parameters in hexadecimal format <index>=<value>,
separated by whitespace, e.g.:
"1=0 a=5"
Up to 7 parameters at a time will be sent and the modem will restart
the ADSL connection when any value is set. These are logged for future
reference.
- Up to 7 parameters at a time will be sent and the modem will restart
the ADSL connection when any value is set. These are logged for future
reference.
* downstream_attenuation (dB)
* downstream_bits_per_frame
* downstream_rate (kbps)
* downstream_snr_margin (dB)
Downstream stats.
- Downstream stats.
* upstream_attenuation (dB)
* upstream_bits_per_frame
* upstream_rate (kbps)
* upstream_snr_margin (dB)
* transmitter_power (dBm/Hz)
Upstream stats.
- Upstream stats.
* downstream_crc_errors
* downstream_fec_errors
......@@ -49,48 +61,56 @@ several sysfs attribute files for retrieving device statistics:
* upstream_crc_errors
* upstream_fec_errors
* upstream_hec_errors
Error counts.
- Error counts.
* line_startable
Indicates that ADSL support on the device
is/can be enabled, see adsl_start.
- Indicates that ADSL support on the device
is/can be enabled, see adsl_state.
* line_status
"initialising"
"down"
"attempting to activate"
"training"
"channel analysis"
"exchange"
"waiting"
"up"
- "initialising"
- "down"
- "attempting to activate"
- "training"
- "channel analysis"
- "exchange"
- "waiting"
- "up"
Alternates between "down" and "attempting to activate"
while there is no signal.
* link_status
"not connected"
"connected"
"lost"
- "not connected"
- "connected"
- "lost"
* mac_address
* modulation
"" (when not connected)
"ANSI T1.413"
"ITU-T G.992.1 (G.DMT)"
"ITU-T G.992.2 (G.LITE)"
- "" (when not connected)
- "ANSI T1.413"
- "ITU-T G.992.1 (G.DMT)"
- "ITU-T G.992.2 (G.LITE)"
* startup_attempts
Count of total attempts to initialise ADSL.
- Count of total attempts to initialise ADSL.
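The adsl_config write format described above (hexadecimal <index>=<value> pairs separated by whitespace, with up to 7 sent at a time) can be produced with a small formatter. adsl_config_format() is a hypothetical helper; it rejects more than 7 pairs rather than splitting them, so a caller with a longer list would issue several writes. The sysfs path of the attribute is not given here, so the sketch only builds the string.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ADSL_CONFIG_MAX_PARAMS 7  /* "up to 7 parameters at a time" */

/* Format n index/value pairs as "<index>=<value>" in hexadecimal,
 * separated by single spaces, e.g. "1=0 a=5". Returns the string length,
 * or -1 if n exceeds the per-write limit or 'buf' is too small. */
int adsl_config_format(char *buf, size_t buflen, const unsigned *idx,
		       const unsigned *val, size_t n)
{
	size_t used = 0;

	if (n > ADSL_CONFIG_MAX_PARAMS || buflen == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + used, buflen - used, "%s%x=%x",
				 i ? " " : "", idx[i], val[i]);
		if (w < 0 || (size_t)w >= buflen - used)
			return -1;  /* output would be truncated */
		used += (size_t)w;
	}
	return (int)used;
}
```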
To enable/disable ADSL, the following can be written to the adsl_state file:
"start"
"stop"
"restart" (stops, waits 1.5s, then starts)
"poll" (used to resume status polling if it was disabled due to failure)
Changes in adsl/line state are reported via kernel log messages:
- "start"
- "stop"
- "restart" (stops, waits 1.5s, then starts)
- "poll" (used to resume status polling if it was disabled due to failure)
Changes in adsl/line state are reported via kernel log messages::
[4942145.150704] ATM dev 0: ADSL state: running
[4942243.663766] ATM dev 0: ADSL line: down
[4942249.665075] ATM dev 0: ADSL line: attempting to activate
......
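A sketch of writing one of the adsl_state commands listed above from C. adsl_state_write() is a hypothetical helper, and the caller supplies the path because the full sysfs location of the attribute is not shown in this section. The helper refuses anything other than the four documented commands.

```c
#include <stdio.h>
#include <string.h>

/* Commands accepted by adsl_state, per the list above. */
static const char *const adsl_cmds[] = { "start", "stop", "restart", "poll" };

/* Write one command to an adsl_state file at 'path'.
 * Returns 0 on success, -1 on an unknown command or an I/O error. */
int adsl_state_write(const char *path, const char *cmd)
{
	size_t i, ncmds = sizeof(adsl_cmds) / sizeof(adsl_cmds[0]);
	FILE *f;

	for (i = 0; i < ncmds; i++)
		if (strcmp(cmd, adsl_cmds[i]) == 0)
			break;
	if (i == ncmds)
		return -1;                 /* not a documented command */

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;    /* write errors surface on flush */
}
```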
.. SPDX-License-Identifier: GPL-2.0
=============
DCCP protocol
=============
Contents
========
.. Contents
- Introduction
- Missing features
- Socket options
- Sysctl variables
- IOCTLs
- Other tunables
- Notes
Introduction
......@@ -38,6 +40,7 @@ The Linux DCCP implementation does not currently support all the features that a
specified in RFCs 4340...42.
The known bugs are at:
http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP
For more up-to-date versions of the DCCP implementation, please consider using
......@@ -54,7 +57,8 @@ defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows a u32
priority value to be passed as ancillary data to sendmsg(), where higher
numbers indicate
a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to
be formatted using a cmsg(3) message header filled in as follows:
be formatted using a cmsg(3) message header filled in as follows::
cmsg->cmsg_level = SOL_DCCP;
cmsg->cmsg_type = DCCP_SCM_PRIORITY;
cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */
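A self-contained version of the cmsg setup above, assuming the socket's queuing policy has already been set to DCCPQ_POLICY_PRIO. union dccp_prio_ctrl and dccp_prio_msg() are hypothetical names; SOL_DCCP and DCCP_SCM_PRIORITY come from the system headers, with a fallback definition of SOL_DCCP (269, the value Linux uses) for C libraries that do not provide it.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/dccp.h>          /* DCCP_SCM_PRIORITY */

#ifndef SOL_DCCP
#define SOL_DCCP 269             /* Linux value; some libcs lack the macro */
#endif

/* Control buffer large enough for one uint32_t cmsg, suitably aligned. */
union dccp_prio_ctrl {
	char buf[CMSG_SPACE(sizeof(uint32_t))];
	struct cmsghdr align;
};

/* Fill 'msg' so that sendmsg() sends 'iov' with 'prio' attached as a
 * DCCP_SCM_PRIORITY control message. 'ctrl' must outlive the send. */
void dccp_prio_msg(struct msghdr *msg, struct iovec *iov,
		   union dccp_prio_ctrl *ctrl, uint32_t prio)
{
	struct cmsghdr *cmsg;

	memset(msg, 0, sizeof(*msg));
	memset(ctrl, 0, sizeof(*ctrl));
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
	msg->msg_control = ctrl->buf;
	msg->msg_controllen = sizeof(ctrl->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_DCCP;
	cmsg->cmsg_type = DCCP_SCM_PRIORITY;
	cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
	memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));  /* avoids unaligned store */
}
```

The resulting msghdr would then be passed to sendmsg(fd, &msg, 0) on a connected DCCP socket.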
......@@ -94,7 +98,7 @@ must be registered on the socket before calling connect() or listen().
DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
Please note that the getsockopt argument type here is `int', not uint8_t.
Please note that the getsockopt argument type here is ``int``, not uint8_t.
DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.
......@@ -113,6 +117,7 @@ be enabled at the receiver, too with suitable choice of CsCov.
DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
range 0..15 are acceptable. The default setting is 0 (full coverage),
values between 1..15 indicate partial coverage.
DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
sets a threshold, where again values 0..15 are acceptable. The default
of 0 means that all packets with a partial coverage will be discarded.
......@@ -123,11 +128,13 @@ DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.
DCCP_SOCKOPT_CCID_RX_INFO
Returns a `struct tfrc_rx_info' in optval; the buffer for optval and
Returns a ``struct tfrc_rx_info`` in optval; the buffer for optval and
optlen must be set to at least sizeof(struct tfrc_rx_info).
DCCP_SOCKOPT_CCID_TX_INFO
Returns a `struct tfrc_tx_info' in optval; the buffer for optval and
Returns a ``struct tfrc_tx_info`` in optval; the buffer for optval and
optlen must be set to at least sizeof(struct tfrc_tx_info).
On unidirectional connections it is useful to close the unused half-connection
......@@ -182,7 +189,7 @@ sync_ratelimit = 125 ms
IOCTLS
======
FIONREAD
Works as in udp(7): returns in the `int' argument pointer the size of
Works as in udp(7): returns in the ``int`` argument pointer the size of
the next pending datagram in bytes, or 0 when no datagram is pending.
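Because FIONREAD is said to work as in udp(7), the sketch below demonstrates it on a loopback UDP socket, which is easier to set up than a DCCP connection; on a DCCP socket the ioctl() call itself would be the same. fionread_loopback_demo() is a hypothetical name. As udp(7) warns, a zero-length datagram also reports 0, so a result of 0 alone does not prove that nothing is pending.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one 'len'-byte datagram to ourselves over loopback UDP and return
 * the size FIONREAD reports for the next pending datagram, or -1. */
int fionread_loopback_demo(size_t len)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	struct pollfd pfd;
	char payload[512];
	int fd, ok, pending = -1;

	if (len > sizeof(payload))
		return -1;
	memset(payload, 'x', sizeof(payload));

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;                      /* let the kernel pick a port */
	pfd.fd = fd;
	pfd.events = POLLIN;

	ok = bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	     getsockname(fd, (struct sockaddr *)&addr, &alen) == 0 &&
	     sendto(fd, payload, len, 0, (struct sockaddr *)&addr,
		    sizeof(addr)) == (ssize_t)len &&
	     poll(&pfd, 1, 1000) == 1 &&          /* wait for delivery */
	     ioctl(fd, FIONREAD, &pending) == 0;  /* size of next datagram */

	close(fd);
	return ok ? pending : -1;
}
```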
......@@ -191,10 +198,12 @@ Other tunables
Per-route rto_min support
CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
of the RTO timer. This setting can be modified via the 'rto_min' option
of iproute2; for example:
of iproute2; for example::
> ip route change 10.0.0.0/24 rto_min 250j dev wlan0
> ip route add 10.0.0.254/32 rto_min 800j dev wlan0
> ip route show dev wlan0
CCID-3 also supports the rto_min setting: it is used to define the lower
bound for the expiry of the nofeedback timer. This can be useful on LANs
with very low RTTs (e.g., loopback, Gbit ethernet).
......
.. SPDX-License-Identifier: GPL-2.0
======================
DCTCP (DataCenter TCP)
----------------------
======================
DCTCP is an enhancement to the TCP congestion control algorithm for data
center networks and leverages Explicit Congestion Notification (ECN) in
the data center network to provide multi-bit feedback to the end hosts.
To enable it on end hosts:
To enable it on end hosts::
sysctl -w net.ipv4.tcp_congestion_control=dctcp
sysctl -w net.ipv4.tcp_ecn_fallback=0 (optional)
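Besides the system-wide sysctl, Linux also lets an application choose the congestion control algorithm for a single socket with the TCP_CONGESTION socket option, for example "dctcp", provided the module is available and, for unprivileged users, listed in net.ipv4.tcp_allowed_congestion_control. The sketch below sets the option on a fresh socket and reads it back; tcp_cc_roundtrip() is a hypothetical helper, and the test uses "reno" only because it is always built in.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_CONGESTION */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Select congestion control algorithm 'name' on a new TCP socket and read
 * the active name back into 'out' ('out' should hold at least 16 bytes).
 * Returns 0 on success, -1 if the socket call or either option call fails,
 * e.g. because the algorithm is unavailable or not permitted. */
int tcp_cc_roundtrip(const char *name, char *out, socklen_t outlen)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name)) == 0 &&
	    getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, out, &outlen) == 0)
		ret = 0;
	close(fd);
	return ret;
}
```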
......@@ -25,14 +28,19 @@ SIGCOMM/SIGMETRICS papers:
i) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye,
Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan:
"Data Center TCP (DCTCP)", Data Center Networks session
Proc. ACM SIGCOMM, New Delhi, 2010.
http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
http://www.sigcomm.org/ccr/papers/2010/October/1851275.1851192
ii) Mohammad Alizadeh, Adel Javanmard, and Balaji Prabhakar:
"Analysis of DCTCP: Stability, Convergence, and Fairness"
Proc. ACM SIGMETRICS, San Jose, 2011.
http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
IETF informational draft:
......
Linux DECnet Networking Layer Information
===========================================
.. SPDX-License-Identifier: GPL-2.0
1) Other documentation....
=========================================
Linux DECnet Networking Layer Information
=========================================
o Project Home Pages
http://www.chygwyn.com/ - Kernel info
http://linux-decnet.sourceforge.net/ - Userland tools
http://www.sourceforge.net/projects/linux-decnet/ - Status page
1. Other documentation....
==========================
2) Configuring the kernel
- Project Home Pages
- http://www.chygwyn.com/ - Kernel info
- http://linux-decnet.sourceforge.net/ - Userland tools
- http://www.sourceforge.net/projects/linux-decnet/ - Status page
2. Configuring the kernel
=========================
Be sure to turn on the following options:
CONFIG_DECNET (obviously)
CONFIG_PROC_FS (to see what's going on)
CONFIG_SYSCTL (for easy configuration)
- CONFIG_DECNET (obviously)
- CONFIG_PROC_FS (to see what's going on)
- CONFIG_SYSCTL (for easy configuration)
If you want to try out router support (not properly debugged yet),
you'll also need the following options:
CONFIG_DECNET_ROUTER (to be able to add/delete routes)
CONFIG_NETFILTER (will be required for the DECnet routing daemon)
- CONFIG_DECNET_ROUTER (to be able to add/delete routes)
- CONFIG_NETFILTER (will be required for the DECnet routing daemon)
Don't turn on SIOCGIFCONF support for DECnet unless you are really sure
that you need it; in general you won't, and it can cause ifconfig to
......@@ -29,7 +34,7 @@ malfunction.
Run time configuration has changed slightly from the 2.4 system. If you
want to configure an endnode, then the simplified procedure is as follows:
o Set the MAC address on your ethernet card before starting _any_ other
- Set the MAC address on your ethernet card before starting _any_ other
network protocols.
As soon as your network card is brought into the UP state, DECnet should
......@@ -37,7 +42,8 @@ start working. If you need something more complicated or are unsure how
to set the MAC address, see the next section. Also all configurations which
worked with 2.4 will work under 2.5 with no change.
3) Command line options
3. Command line options
=======================
You can set a DECnet address on the kernel command line for compatibility
with the 2.4 configuration procedure, but in general it's not needed any more.
......@@ -56,7 +62,7 @@ interface then you won't see any entries in /proc/net/neigh for the local
host until such time as you start a connection. This doesn't affect the
operation of the local communications in any other way though.
The kernel command line takes options looking like the following:
The kernel command line takes options looking like the following::
decnet.addr=1,2
......@@ -82,7 +88,7 @@ address of the node in order for it to be autoconfigured (and then appear in
FTP sites called dn2ethaddr which can compute the correct ethernet
address to use. The address can be set by ifconfig either before or
at the time the device is brought up. If you are using RedHat you can
add the line:
add the line::
MACADDR=AA:00:04:00:03:04
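The dn2ethaddr utility itself is not reproduced here, but the mapping it computes follows the DECnet Phase IV convention: the prefix AA:00:04:00 followed by the 16-bit node address (area * 1024 + node) in little-endian byte order. The sketch below assumes that convention and the usual ranges (area 1..63, node 1..1023); dn_ethaddr() is a hypothetical name. For the address 1.3 it yields AA:00:04:00:03:04, the value in the MACADDR line above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the Ethernet address for DECnet node area.node into 'buf' as
 * "AA:00:04:00:xx:yy", where xx:yy is (area * 1024 + node) in
 * little-endian order. Returns 0, or -1 for out-of-range input. */
int dn_ethaddr(unsigned area, unsigned node, char *buf, size_t buflen)
{
	unsigned addr;

	if (area < 1 || area > 63 || node < 1 || node > 1023 || buflen < 18)
		return -1;
	addr = (area << 10) | node;              /* 16-bit DECnet address */
	snprintf(buf, buflen, "AA:00:04:00:%02X:%02X",
		 addr & 0xffu, (addr >> 8) & 0xffu);  /* low byte first */
	return 0;
}
```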
......@@ -95,7 +101,7 @@ verify with iproute2).
The default device for routing can be set through the /proc filesystem
by setting /proc/sys/net/decnet/default_device to the
device you want DECnet to route packets out of when no specific route
is available. Usually this will be eth0, for example:
is available. Usually this will be eth0, for example::
echo -n "eth0" >/proc/sys/net/decnet/default_device
......@@ -106,7 +112,9 @@ confirm that by looking in the default_device file of course.
There is a list of what the other files under /proc/sys/net/decnet/ do
on the kernel patch web site (shown above).
4) Run time kernel configuration
4. Run time kernel configuration
================================
This is either done through the sysctl/proc interface (see the kernel web
pages for details on what the various options do) or through the iproute2
......@@ -122,20 +130,21 @@ since its the _only_ way to add and delete routes currently. Eventually
there will be a routing daemon to send and receive routing messages for
each interface and update the kernel routing tables accordingly. The
routing daemon will use netfilter to listen to routing packets, and
rtnetlink to update the kernel's routing tables.
The DECnet raw socket layer has been removed since it was there purely
for use by the routing daemon which will now use netfilter (a much cleaner
and more generic solution) instead.
5) How can I tell if its working ?
5. How can I tell if it's working?
==================================
Here is a quick guide of what to look for in order to know if your DECnet
kernel subsystem is working.
- Is the node address set (see /proc/sys/net/decnet/node_address)
- Is the node of the correct type
(see /proc/sys/net/decnet/conf/<dev>/forwarding)
- Is the Ethernet MAC address of each Ethernet card set to match
the DECnet address. If in doubt use the dn2ethaddr utility available
at the ftp archive.
......@@ -160,7 +169,8 @@ kernel subsystem is working.
network, and see if you can obtain the same results.
- At this point you are on your own... :-)
6) How to send a bug report
6. How to send a bug report
===========================
If you've found a bug and want to report it, then there are several things
you can do to help me work out exactly what it is that is wrong. Useful
......@@ -175,18 +185,19 @@ information (_most_ of which _is_ _essential_) includes:
- How much data was being transferred ?
- Was the network congested ?
- How can the problem be reproduced ?
- Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of
tcpdump don't understand how to dump DECnet properly, so including
the hex listing of the packet contents is _essential_, usually the -x flag.
You may also need to increase the length grabbed with the -s flag. The
-e flag also provides very useful information (ethernet MAC addresses))
7) MAC FAQ
7. MAC FAQ
==========
A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet
interact and how to get the best performance from your hardware.
Ethernet cards are designed to normally only pass received network frames
to a host computer when they are addressed to it, or to the broadcast address.
Linux has an interface which allows the setting of extra addresses for
......@@ -197,8 +208,8 @@ significant processor time and bus bandwidth can be used up on a busy
network (see the NAPI documentation for a longer explanation of these
effects).
DECnet makes use of this interface to allow running DECnet on an ethernet
card which has already been configured using TCP/IP (presumably using the
built in MAC address of the card, as usual) and/or to allow multiple DECnet
addresses on each physical interface. If you do this, be aware that if your
ethernet card doesn't support perfect hashing in its MAC address filter
......@@ -210,7 +221,8 @@ to gain the best efficiency. Better still is to use a card which supports
NAPI as well.
8) Mailing list
8. Mailing list
===============
If you are keen to get involved in development, or want to ask questions
about configuration, or even just report bugs, then there is a mailing
......@@ -218,7 +230,8 @@ list that you can join, details are at:
http://sourceforge.net/mail/?group_id=4993
9) Legal Info
9. Legal Info
=============
The Linux DECnet project team have placed their code under the GPL. The
software is provided "as is" and without warranty express or implied.
......
Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver v.1.1.4.
.. SPDX-License-Identifier: GPL-2.0
=====================================================
Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver
=====================================================
:Version: v.1.1.4
DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI
......
......@@ -33,7 +33,7 @@ The following features are now available in supported kernels:
- SNMP
Channel Bonding documentation can be found in the Linux kernel source:
/Documentation/networking/bonding.txt
/Documentation/networking/bonding.rst
Identifying Your Adapter
......
......@@ -37,7 +37,7 @@ The following features are available in this kernel:
- SNMP
Channel Bonding documentation can be found in the Linux kernel source:
/Documentation/networking/bonding.txt
/Documentation/networking/bonding.rst
The driver information previously displayed in the /proc filesystem is not
supported in this release. Alternatively, you can use ethtool (version 1.6
......
.. SPDX-License-Identifier: GPL-2.0
Contents:
===================
DNS Resolver Module
===================
.. Contents:
- Overview.
- Compilation.
......@@ -12,8 +14,7 @@ Contents:
- Debugging.
========
OVERVIEW
Overview
========
The DNS resolver module provides a way for kernel services to make DNS queries
......@@ -33,50 +34,50 @@ It does not yet support the following AFS features:
This code is extracted from the CIFS filesystem.
===========
COMPILATION
Compilation
===========
The module should be enabled by turning on the kernel configuration options:
The module should be enabled by turning on the kernel configuration options::
CONFIG_DNS_RESOLVER - tristate "DNS Resolver support"
==========
SETTING UP
Setting up
==========
To set up this facility, the /etc/request-key.conf file must be altered so that
/sbin/request-key can appropriately direct the upcalls. For example, to handle
basic dname to IPv4/IPv6 address resolution, the following line should be
added:
added::
#OP TYPE DESC CO-INFO PROGRAM ARG1 ARG2 ARG3 ...
#====== ============ ======= ======= ==========================
create dns_resolver * * /usr/sbin/cifs.upcall %k
To direct queries of type 'foo' to a different program, a line like the following should be added
before the more general line given above as the first match is the one taken.
before the more general line given above as the first match is the one taken::
create dns_resolver foo:* * /usr/sbin/dns.foo %k
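The "first match is the one taken" rule can be modelled in a few lines. This sketch approximates request-key's wildcard matching with POSIX fnmatch(), which may not match request-key's own rules in every case; struct rk_rule and rk_lookup() are hypothetical. With the two lines above in that order, a description of "foo:example.com" selects /usr/sbin/dns.foo, while any other description falls through to /usr/sbin/cifs.upcall.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* One "create" line from /etc/request-key.conf, reduced to the fields
 * this sketch needs (not request-key's own data structure). */
struct rk_rule {
	const char *type;     /* key type pattern, e.g. "dns_resolver" */
	const char *desc;     /* description pattern, e.g. "foo:*" */
	const char *program;  /* upcall program */
};

/* Return the program of the first rule whose type and description
 * patterns both match, or NULL if none does. */
const char *rk_lookup(const struct rk_rule *rules, size_t n,
		      const char *type, const char *desc)
{
	for (size_t i = 0; i < n; i++)
		if (fnmatch(rules[i].type, type, 0) == 0 &&
		    fnmatch(rules[i].desc, desc, 0) == 0)
			return rules[i].program;  /* first match wins */
	return NULL;
}
```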
=====
USAGE
Usage
=====
To make use of this facility, one of the following functions that are
implemented in the module can be called after doing:
implemented in the module can be called after doing::
#include <linux/dns_resolver.h>
(1) int dns_query(const char *type, const char *name, size_t namelen,
const char *options, char **_result, time_t *_expiry);
::
int dns_query(const char *type, const char *name, size_t namelen,
const char *options, char **_result, time_t *_expiry);
This is the basic access function. It looks for a cached DNS query and if
it doesn't find it, it upcalls to userspace to make a new DNS query, which
may then be cached. The key description is constructed as a string of the
form:
form::
[<type>:]<name>
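A userspace sketch of how such a description string is laid out. It is not the module's own code; dns_key_desc() is a hypothetical helper that takes the name and its length the same way dns_query() does, and omits the "<type>:" prefix when no type is given.

```c
#include <stdio.h>
#include <string.h>

/* Build a key description of the form "[<type>:]<name>" in 'buf'.
 * A NULL or empty type gives just the name. Returns the length
 * written, or -1 if 'buf' is too small. */
int dns_key_desc(char *buf, size_t buflen, const char *type,
		 const char *name, size_t namelen)
{
	int n;

	if (type && *type)
		n = snprintf(buf, buflen, "%s:%.*s", type, (int)namelen, name);
	else
		n = snprintf(buf, buflen, "%.*s", (int)namelen, name);
	return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```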
......@@ -107,16 +108,14 @@ This can be cleared by any process that has the CAP_SYS_ADMIN capability by
the use of KEYCTL_KEYRING_CLEAR on the keyring ID.
===============================
READING DNS KEYS FROM USERSPACE
Reading DNS Keys from Userspace
===============================
Keys of dns_resolver type can be read from userspace using keyctl_read() or
"keyctl read/print/pipe".
=========
MECHANISM
Mechanism
=========
The dnsresolver module registers a key type called "dns_resolver". Keys of
......@@ -147,11 +146,10 @@ See <file:Documentation/security/keys/request-key.rst> for further
information about request-key function.
=========
DEBUGGING
Debugging
=========
Debugging messages can be turned on dynamically by writing a 1 into the
following file:
following file::
/sys/module/dnsresolver/parameters/debug
Document about softnet driver issues
.. SPDX-License-Identifier: GPL-2.0
=====================
Softnet Driver Issues
=====================
Transmit path guidelines:
......@@ -8,7 +12,7 @@ Transmit path guidelines:
transmit function will become busy.
Instead it must maintain the queue properly. For example,
for a driver implementing scatter-gather this means:
for a driver implementing scatter-gather this means::
static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev)
......@@ -38,25 +42,25 @@ Transmit path guidelines:
return NETDEV_TX_OK;
}
And then at the end of your TX reclamation event handling:
And then at the end of your TX reclamation event handling::
if (netif_queue_stopped(dp->dev) &&
TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
netif_wake_queue(dp->dev);
For a non-scatter-gather supporting card, the three tests simply become:
For a non-scatter-gather supporting card, the three tests simply become::
/* This is a hard error, log it. */
if (TX_BUFFS_AVAIL(dp) <= 0)
and:
and::
if (TX_BUFFS_AVAIL(dp) == 0)
and:
and::
if (netif_queue_stopped(dp->dev) &&
TX_BUFFS_AVAIL(dp) > 0)
netif_wake_queue(dp->dev);
2) An ndo_start_xmit method must not modify the shared parts of a
......@@ -86,7 +90,7 @@ Close/stop guidelines:
1) After the ndo_stop routine has been called, the hardware must
not receive or transmit any data. All in flight packets must
be aborted. If necessary, poll or wait for completion of
any reset commands.
2) The ndo_stop routine will be called by unregister_netdevice
......
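The stop/wake rule in the softnet transmit path guidelines above can be modelled in userspace to see why the "hard error" branch is never reached while the queue is awake: the queue is stopped as soon as MAX_SKB_FRAGS + 1 or fewer slots remain, so an awake queue always has room for a maximally fragmented skb. The ring size, the MAX_SKB_FRAGS value and the names tx_ring, ring_xmit() and ring_reclaim() are illustrative assumptions, not kernel code.

```c
#include <stdbool.h>

#define RING_SIZE     64u  /* hypothetical TX ring size */
#define MAX_SKB_FRAGS 17u  /* illustrative; the kernel's value is config-dependent */

struct tx_ring {
	unsigned head;   /* slots ever queued (producer) */
	unsigned tail;   /* slots ever reclaimed (consumer) */
	bool stopped;    /* stands in for netif_queue_stopped() */
};

/* Stand-in for the TX_BUFFS_AVAIL() macro. */
unsigned tx_buffs_avail(const struct tx_ring *r)
{
	return RING_SIZE - (r->head - r->tail);
}

/* Model of the xmit path: false means NETDEV_TX_BUSY (the hard error). */
bool ring_xmit(struct tx_ring *r, unsigned nr_frags)
{
	if (tx_buffs_avail(r) <= nr_frags + 1) {
		r->stopped = true;
		return false;
	}
	r->head += nr_frags + 1;                  /* head buffer plus fragments */
	if (tx_buffs_avail(r) <= MAX_SKB_FRAGS + 1)
		r->stopped = true;                /* netif_stop_queue() */
	return true;
}

/* Model of TX reclamation: free up to 'n' slots, wake if there is room. */
void ring_reclaim(struct tx_ring *r, unsigned n)
{
	unsigned in_flight = r->head - r->tail;

	r->tail += n < in_flight ? n : in_flight;
	if (r->stopped && tx_buffs_avail(r) > MAX_SKB_FRAGS + 1)
		r->stopped = false;               /* netif_wake_queue() */
}
```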
EQL Driver: Serial IP Load Balancing HOWTO
.. SPDX-License-Identifier: GPL-2.0
==========================================
EQL Driver: Serial IP Load Balancing HOWTO
==========================================
Simon "Guru Aleph-Null" Janes, simon@ncm.com
v1.1, February 27, 1995
This is the manual for the EQL device driver. EQL is a software device
......@@ -12,7 +18,8 @@
which was only created to patch cleanly in the very latest kernel
source trees. (Yes, it worked fine.)
1. Introduction
===============
Which is worse? A huge fee for a 56K leased line or two phone lines?
It's probably the former. If you find yourself craving more bandwidth,
......@@ -41,47 +48,40 @@
Hey, we can all dream you know...
2. Kernel Configuration
=======================
Here I describe the general steps of getting a kernel up and working
with the eql driver, from patching through building to installing.
2.1. Patching The Kernel
------------------------
If you do not have or cannot get a copy of the kernel with the eql
driver folded into it, get your copy of the driver from
ftp://slaughter.ncm.com/pub/Linux/LOAD_BALANCING/eql-1.1.tar.gz.

Unpack this archive someplace obvious like /usr/local/src/. It will
create the following files::

    -rw-r--r-- guru/ncm     198 Jan 19 18:53 1995 eql-1.1/NO-WARRANTY
    -rw-r--r-- guru/ncm   30620 Feb 27 21:40 1995 eql-1.1/eql-1.1.patch
    -rwxr-xr-x guru/ncm   16111 Jan 12 22:29 1995 eql-1.1/eql_enslave
    -rw-r--r-- guru/ncm    2195 Jan 10 21:48 1995 eql-1.1/eql_enslave.c

Unpack a recent kernel (something after 1.1.92) someplace convenient,
such as /usr/src/linux-1.1.92.eql, and use a symbolic link to point
/usr/src/linux at this development directory.

Apply the patch by running the commands::

    cd /usr/src
    patch </usr/local/src/eql-1.1/eql-1.1.patch

2.2. Building The Kernel
------------------------

After patching the kernel, run make config and configure the kernel
for your hardware.
[...]

After configuration, make and install according to your habit.

3. Network Configuration
========================

So far, I have only used the eql device with the DSLIP SLIP connection
manager by Matt Dillon (-- "The man who sold his soul to code so much
[...]
connection.

3.1. /etc/rc.d/rc.inet1
-----------------------

In rc.inet1, use ifconfig to give the eql device the IP address you
usually use for your machine and the MTU you prefer for your SLIP
lines. One could argue that the MTU should be roughly half the usual
size for two modems, one-third for three, one-fourth for four, and so
on, but going much below 296 is probably overkill. Here is an example
ifconfig command that sets up the eql device::

    ifconfig eql 198.67.33.239 mtu 1006

Once the eql device is up and running, add a static default route to
it in the routing table using the cool new route syntax that makes
life so much easier::

    route add default eql

3.2. Enslaving Devices By Hand
------------------------------

Enslaving devices by hand requires two utility programs: eql_enslave
and eql_emancipate (-- eql_emancipate hasn't been written because when
[...]

The syntax for enslaving a device is "eql_enslave <master-name>
<slave-name> <estimated-bps>". Here are some example enslavings::

    eql_enslave eql sl0 28800
    eql_enslave eql ppp0 14400
    eql_enslave eql sl1 57600

When you want to free a device from its life of slavery, you can
either down the device with ifconfig (eql will automatically bury the
dead slave and remove it from its queue) or use eql_emancipate to free
it. (-- Or just ifconfig it down, and the eql driver will take it out
for you.--)::

    eql_emancipate eql sl0
    eql_emancipate eql ppp0
    eql_emancipate eql sl1

3.3. DSLIP Configuration for the eql Device
-------------------------------------------

The general idea is to bring up and keep up as many SLIP connections
as you need, automatically.

3.3.1. /etc/slip/runslip.conf
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example runslip.conf::

    name sl-line-1
    enabled
    baud 38400
    mtu 576
    ducmd -e /etc/slip/dialout/cua2-288.xp -t 9
    command eql_enslave eql $interface 28800
    address 198.67.33.239
    line /dev/cua2

    name sl-line-2
    enabled
    baud 38400
    mtu 576
    ducmd -e /etc/slip/dialout/cua3-288.xp -t 9
    command eql_enslave eql $interface 28800
    address 198.67.33.239
    line /dev/cua3

3.4. Using PPP and the eql Device
---------------------------------

I have not yet done any load-balancing testing for PPP devices, mainly
because I don't have a PPP-connection manager like SLIP has with
[...]
year.

4. About the Slave Scheduler Algorithm
======================================

The slave scheduler probably could be replaced with a dozen other
things and push traffic much faster. The formula in the current set
[...]
traffic and the "slower" modem starved.

5. Testers' Reports
===================

Some people have experimented with the eql device with newer
kernels (than 1.1.75). I have since updated the driver to patch
[...]
balancing" driver config option.

- icee from LinuxNET patched 1.1.86 without any rejects and was able
  to boot the kernel and enslave a couple of ISDN PPP links.

5.1. Randolph Bentson's Test Report
-----------------------------------

::

  From bentson@grieg.seaslug.org Wed Feb 8 19:08:09 1995
  Date: Tue, 7 Feb 95 22:57 PST
  From: Randolph Bentson <bentson@grieg.seaslug.org>
  To: guru@ncm.com
  Subject: EQL driver tests

  I have been checking out your eql driver. (Nice work, that!)
  Although you may already done this performance testing, here
  are some data I've discovered.

  Randolph Bentson
  bentson@grieg.seaslug.org

------------------------------------------------------------------

A pseudo-device driver, EQL, written by Simon Janes, can be used
[...]

Once a link was established, I timed a binary ftp transfer of
289284 bytes of data. If there were no overhead (packet headers,
inter-character and inter-packet delays, etc.) the transfers
would take the following times::

      bits/sec  seconds
      345600    8.3

[...]
that the connection establishment seemed fragile for the higher
speeds. Once established, the connection seemed robust enough.)

====== ======== === ======== ======= ======= ===
#lines speed    mtu seconds  theory  actual  %of
       bits/sec     duration speed   speed   max
====== ======== === ======== ======= ======= ===
3      115200   900 _        345600
3      115200   400 18.1     345600  159825  46
2      115200   900 _        230400
2      115200   600 18.1     230400  159825  69
2      115200   400 19.3     230400  149888  65
4      57600    900 _        234600
4      57600    600 _        234600
4      57600    400 _        234600
3      57600    600 20.9     172800  138413  80
3      57600    900 21.2     172800  136455  78
3      115200   600 21.7     345600  133311  38
3      57600    400 22.5     172800  128571  74
4      38400    900 25.2     153600  114795  74
4      38400    600 26.4     153600  109577  71
4      38400    400 27.3     153600  105965  68
2      57600    900 29.1     115200  99410.3 86
1      115200   900 30.7     115200  94229.3 81
2      57600    600 30.2     115200  95789.4 83
3      38400    900 30.3     115200  95473.3 82
3      38400    600 31.2     115200  92719.2 80
1      115200   600 31.3     115200  92423   80
2      57600    400 32.3     115200  89561.6 77
1      115200   400 32.8     115200  88196.3 76
3      38400    400 33.5     115200  86353.4 74
2      38400    900 43.7     76800   66197.7 86
2      38400    600 44       76800   65746.4 85
2      38400    400 47.2     76800   61289   79
4      19200    900 50.8     76800   56945.7 74
4      19200    400 53.2     76800   54376.7 70
4      19200    600 53.7     76800   53870.4 70
1      57600    900 54.6     57600   52982.4 91
1      57600    600 56.2     57600   51474   89
3      19200    900 60.5     57600   47815.5 83
1      57600    400 60.2     57600   48053.8 83
3      19200    600 62       57600   46658.7 81
3      19200    400 64.7     57600   44711.6 77
1      38400    900 79.4     38400   36433.8 94
1      38400    600 82.4     38400   35107.3 91
2      19200    900 84.4     38400   34275.4 89
1      38400    400 86.8     38400   33327.6 86
2      19200    600 87.6     38400   33023.3 85
2      19200    400 91.2     38400   31719.7 82
4      9600     900 94.7     38400   30547.4 79
4      9600     400 106      38400   27290.9 71
4      9600     600 110      38400   26298.5 68
3      9600     900 118      28800   24515.6 85
3      9600     600 120      28800   24107   83
3      9600     400 131      28800   22082.7 76
1      19200    900 155      19200   18663.5 97
1      19200    600 161      19200   17968   93
1      19200    400 170      19200   17016.7 88
2      9600     600 176      19200   16436.6 85
2      9600     900 180      19200   16071.3 83
2      9600     400 181      19200   15982.5 83
1      9600     900 305      9600    9484.72 98
1      9600     600 314      9600    9212.87 95
1      9600     400 332      9600    8713.37 90
====== ======== === ======== ======= ======= ===

5.2. Antony Healey's Report
---------------------------

::

  Date: Mon, 13 Feb 1995 16:17:29 +1100 (EST)
  From: Antony Healey <ahealey@st.nepean.uws.edu.au>
  To: Simon Janes <guru@ncm.com>
  Subject: Re: Load Balancing

  Hi Simon,
      I've installed your patch and it works great. I have trialed
      it over twin SL/IP lines, just over null modems, but I was
      able to data at over 48Kb/s [ISDN link -Simon]. I managed a
      transfer of up to 7.5 Kbyte/s on one go, but averaged around
      6.4 Kbyte/s, which I think is pretty cool. :)

.. SPDX-License-Identifier: GPL-2.0

============================
LC-trie implementation notes
============================

Node types
----------
leaf
    An end node with data. This has a copy of the relevant key, along
    with 'hlist' with routing table entries sorted by prefix length.
    See struct leaf and struct leaf_info.

[...]

A few concepts explained
------------------------
Bits (tnode)
    The number of bits in the key segment used for indexing into the
    child array - the "child index". See Level Compression.

[...]

Path Compression / skipped bits
    Any given tnode is linked to from the child array of its parent, using
    a segment of the key specified by the parent's "pos" and "bits".
    In certain cases, this tnode's own "pos" will not be immediately
    adjacent to the parent (pos+bits), but there will be some bits
    in the key skipped over because they represent a single path with no
    [...]

Comments
---------

We have tried to keep the structure of the code as close to fib_hash as
possible to allow verification and help with reviewing.

fib_find_node()
    A good start for understanding this code. This function implements a
    [...]

.. SPDX-License-Identifier: GPL-2.0

=======================================================
Linux Socket Filtering aka Berkeley Packet Filter (BPF)
=======================================================

[...]

Although we were only speaking about sockets here, BPF in Linux is used
in many more places. There's xt_bpf for netfilter, cls_bpf in the kernel
qdisc layer, SECCOMP-BPF (SECure COMPuting [1]_), and lots of other places
such as the team driver, the PTP code, etc., where BPF is being used.

.. [1] Documentation/userspace-api/seccomp_filter.rst

Original BPF paper:

[...]

Structure
---------

User space applications include <linux/filter.h> which contains the
following relevant structures::

        struct sock_filter {    /* Filter block */
            __u16   code;       /* Actual filter code */
            __u8    jt;         /* Jump true */
            __u8    jf;         /* Jump false */
            __u32   k;          /* Generic multiuse field */
        };

Such a structure is assembled as an array of 4-tuples that contains
a code, jt, jf and k value. jt and jf are jump offsets (relative to the
next instruction) and k is a generic value to be used for a provided
code::

        struct sock_fprog {                     /* Required for SO_ATTACH_FILTER. */
            unsigned short             len;     /* Number of filter blocks */
            struct sock_filter __user *filter;
        };

For socket filtering, a pointer to this structure (as shown in the
follow-up example) is passed to the kernel through setsockopt(2).

Example
-------

::

        #include <sys/socket.h>
        #include <sys/types.h>
        #include <arpa/inet.h>
        #include <linux/if_ether.h>
        #include <linux/filter.h>   /* struct sock_filter, struct sock_fprog */
        #include <stdio.h>
        #include <unistd.h>

        #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

        int main(void)
        {
            /* From the example above: tcpdump -i em1 port 22 -dd */
            struct sock_filter code[] = {
                { 0x28,  0,  0, 0x0000000c },
                { 0x15,  0,  8, 0x000086dd },
                { 0x30,  0,  0, 0x00000014 },
                { 0x15,  2,  0, 0x00000084 },
                { 0x15,  1,  0, 0x00000006 },
                { 0x15,  0, 17, 0x00000011 },
                { 0x28,  0,  0, 0x00000036 },
                { 0x15, 14,  0, 0x00000016 },
                { 0x28,  0,  0, 0x00000038 },
                { 0x15, 12, 13, 0x00000016 },
                { 0x15,  0, 12, 0x00000800 },
                { 0x30,  0,  0, 0x00000017 },
                { 0x15,  2,  0, 0x00000084 },
                { 0x15,  1,  0, 0x00000006 },
                { 0x15,  0,  8, 0x00000011 },
                { 0x28,  0,  0, 0x00000014 },
                { 0x45,  6,  0, 0x00001fff },
                { 0xb1,  0,  0, 0x0000000e },
                { 0x48,  0,  0, 0x0000000e },
                { 0x15,  2,  0, 0x00000016 },
                { 0x48,  0,  0, 0x00000010 },
                { 0x15,  0,  1, 0x00000016 },
                { 0x06,  0,  0, 0x0000ffff },
                { 0x06,  0,  0, 0x00000000 },
            };

            struct sock_fprog bpf = {
                .len = ARRAY_SIZE(code),
                .filter = code,
            };
            int sock, ret;

            /* Raw packet socket for all protocols; needs CAP_NET_RAW. */
            sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
            if (sock < 0) {
                perror("socket");
                return 1;
            }

            ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
            if (ret < 0) {
                perror("setsockopt");
                close(sock);
                return 1;
            }

            /* ... read the filtered packets from sock ... */

            close(sock);
            return 0;
        }

The above example code attaches a socket filter for a PF_PACKET socket
in order to let all IPv4/IPv6 packets with port 22 pass. The rest will
[...]

The BPF architecture consists of the following basic elements:

======= ====================================================
Element Description
======= ====================================================
A       32 bit wide accumulator
X       32 bit wide X register
M[]     16 x 32 bit wide misc registers aka "scratch memory
        store", addressable from 0 to 15
======= ====================================================

A program that is translated by bpf_asm into "opcodes" is an array that
consists of the following elements (as already mentioned)::

        op:16, jt:8, jf:8, k:32

[...]
and return instructions that are also represented in bpf_asm syntax. This
table lists all available bpf_asm instructions and what their underlying
opcodes, as defined in linux/filter.h, stand for:

=========== =================== =====================
Instruction Addressing mode     Description
=========== =================== =====================
ld          1, 2, 3, 4, 12      Load word into A
ldi         4                   Load word into A
ldh         1, 2                Load half-word into A
txa                             Copy X into A
ret         4, 11               Return
=========== =================== =====================

The next table shows addressing formats from the 2nd column:

=============== =================== ===============================================
Addressing mode Syntax              Description
=============== =================== ===============================================
0               x/%x                Register X
1               [k]                 BHW at byte offset k in the packet
2               [x + k]             BHW at the offset X + k in the packet
10              x/%x,Lt             Jump to Lt if predicate is true
11              a/%a                Accumulator A
12              extension           BPF extension
=============== =================== ===============================================

The Linux kernel also has a couple of BPF extensions that are used along
with the class of load instructions by "overloading" the k argument with
[...]

Possible BPF extensions are shown in the following table:

=================================== =================================================
Extension                           Description
=================================== =================================================
len                                 skb->len
proto                               skb->protocol
type                                skb->pkt_type
vlan_avail                          skb_vlan_tag_present(skb)
vlan_tpid                           skb->vlan_proto
rand                                prandom_u32()
=================================== =================================================

These extensions can also be prefixed with '#'.

Examples for low-level BPF:

**ARP packets**::

        ldh [12]
        jne #0x806, drop
        ret #-1
        drop: ret #0

**IPv4 TCP packets**::

        ldh [12]
        jne #0x800, drop
        [...]
        ret #-1
        drop: ret #0

**(Accelerated) VLAN w/ id 10**::

        ld vlan_tci
        jneq #10, drop
        ret #-1
        drop: ret #0

**icmp random packet sampling, 1 in 4**::

        ldh [12]
        jne #0x800, drop
        ldb [23]
        [...]
        ret #-1
        drop: ret #0

**SECCOMP filter example**::

        ld [4]                  /* offsetof(struct seccomp_data, arch) */
        jne #0xc000003e, bad    /* AUDIT_ARCH_X86_64 */
        [...]

The above example code can be placed into a file (here called "foo") and
then passed to the bpf_asm tool to generate opcodes in a format that
xt_bpf and cls_bpf understand and can load directly. Example with the
above ARP code::

        $ ./bpf_asm foo
        4,40 0 0 12,21 0 1 2054,6 0 0 4294967295,6 0 0 0,

In copy and paste C-like output::

        $ ./bpf_asm -c foo
        { 0x28,  0,  0, 0x0000000c },
        { 0x15,  0,  1, 0x00000806 },
        { 0x06,  0,  0, 0xffffffff },
        { 0x06,  0,  0, 0000000000 },

In particular, as usage with xt_bpf or cls_bpf can result in more complex BPF
filters that might not be obvious at first, it's good to test filters before
[...]
bpf_dbg under tools/bpf/ in the kernel source directory. This debugger allows
for testing BPF filters against given pcap files, single stepping through the
BPF code on the pcap's packets, and dumping the BPF machine registers.

Starting bpf_dbg is trivial and just requires issuing::

        # ./bpf_dbg

In case input and output do not equal stdin/stdout, bpf_dbg takes an
alternative stdin source as a first argument, and an alternative stdout
[...]
Interaction in bpf_dbg happens through a shell that also has auto-completion
support (the commands in the follow-up examples are entered at its '>'
prompt).

The usual workflow would be to ...

* load bpf 6,40 0 0 12,21 0 3 2048,48 0 0 23,21 0 1 1,6 0 0 65535,6 0 0 0

  Loads a BPF filter from standard output of bpf_asm, or transformed via
  e.g. ``tcpdump -iem1 -ddd port 22 | tr '\n' ','``. Note that for JIT
  debugging (next section), this command creates a temporary socket and
  loads the BPF code into the kernel. Thus, this will also be useful for
  JIT developers.

* load pcap foo.pcap

  Loads standard tcpdump pcap file.

* run [<n>]::

    bpf passes:1 fails:9

  Runs through all packets from a pcap to account how many passes and fails
  the filter will generate. A limit of packets to traverse can be given.

* disassemble::

    l0: ldh [12]
    l1: jeq #0x800, l2, l5
    l2: ldb [23]
    l3: jeq #0x1, l4, l5
    l4: ret #0xffff
    l5: ret #0

  Prints out BPF code disassembly.

* dump::

    /* { op, jt, jf, k }, */
    { 0x28,  0,  0, 0x0000000c },
    { 0x15,  0,  3, 0x00000800 },
    { 0x30,  0,  0, 0x00000017 },
    { 0x15,  0,  1, 0x00000001 },
    { 0x06,  0,  0, 0x0000ffff },
    { 0x06,  0,  0, 0000000000 },

  Prints out C-style BPF code dump.

* breakpoint 0::

    breakpoint at: l0: ldh [12]

* breakpoint 1::

    breakpoint at: l1: jeq #0x800, l2, l5

  ...

  Sets breakpoints at particular BPF instructions. Issuing a ``run`` command
  will walk through the pcap file continuing from the current packet and
  break when a breakpoint is hit (another ``run`` will continue from
  the currently active breakpoint executing next instructions):

* run::

    -- register dump --
    pc:       [0]                       <-- program counter
    code:     [40] jt[0] jf[0] k[12]    <-- plain BPF code of current instruction
    curr:     l0: ldh [12]              <-- disassembly of current instruction
    A:        [00000000][0]             <-- content of A (hex, decimal)
    X:        [00000000][0]             <-- content of X (hex, decimal)
    M[0,15]:  [00000000][0]             <-- folded content of M (hex, decimal)
    -- packet dump --                   <-- Current packet from pcap (hex)
    len: 42
        0: 00 19 cb 55 55 a4 00 14 a4 43 78 69 08 06 00 01
       16: 08 00 06 04 00 01 00 14 a4 43 78 69 0a 3b 01 26
       32: 00 00 00 00 00 00 0a 3b 01 01
    (breakpoint)
    >

* breakpoint::

    breakpoints: 0 1

  Prints currently set breakpoints.

* step [-<n>, +<n>]

  Performs single stepping through the BPF program from the current pc
  offset. Thus, on each step invocation, the above register dump is issued.
  This can go forwards and backwards in time; a plain ``step`` will break
  on the next BPF instruction, thus +1. (No ``run`` needs to be issued here.)

* select <n>

  Selects a given packet from the pcap file to continue from. Thus, on
  the next ``run`` or ``step``, the BPF program is evaluated against
  the user pre-selected packet. Numbering starts just as in Wireshark
  with index 1.

* quit

  Exits bpf_dbg.

JIT compiler
------------

The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
PowerPC, ARM, ARM64, MIPS, RISC-V and s390, which can be enabled through
CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
attached filter from user space or for internal kernel users if it has
been previously enabled by root::

        echo 1 > /proc/sys/net/core/bpf_jit_enable

For JIT developers doing audits and the like, each compile run can output
the generated opcode image into the kernel log via::

        echo 2 > /proc/sys/net/core/bpf_jit_enable

Example output from dmesg::

        [ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f
        [ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
        [ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
        [ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
        [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
        [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
setting any other value than that will fail. This is even the case for
[...]
generally recommended approach instead.

In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
generating disassembly out of the kernel log's hexdump::

        # ./bpf_jit_disasm
        70 bytes emitted from JIT compiler (pass:3, flen:6)
        ffffffffa0069c8f + <x>:
           0: push   %rbp
           1: mov    %rsp,%rbp
           4: sub    $0x60,%rsp
           8: mov    %rbx,-0x8(%rbp)
           c: mov    0x68(%rdi),%r9d
          10: sub    0x6c(%rdi),%r9d
          14: mov    0xd8(%rdi),%r8
          1b: mov    $0xc,%esi
          20: callq  0xffffffffe0ff9442
          25: cmp    $0x800,%eax
          2a: jne    0x0000000000000042
          2c: mov    $0x17,%esi
          31: callq  0xffffffffe0ff945e
          36: cmp    $0x1,%eax
          39: jne    0x0000000000000042
          3b: mov    $0xffff,%eax
          40: jmp    0x0000000000000044
          42: xor    %eax,%eax
          44: leaveq
          45: retq

Issuing option ``-o`` will "annotate" opcodes to resulting assembler
instructions, which can be very useful for JIT developers::

        # ./bpf_jit_disasm -o
        70 bytes emitted from JIT compiler (pass:3, flen:6)
        ffffffffa0069c8f + <x>:
           0: push   %rbp
              55
           1: mov    %rsp,%rbp
              48 89 e5
           4: sub    $0x60,%rsp
              48 83 ec 60
           8: mov    %rbx,-0x8(%rbp)
              48 89 5d f8
           c: mov    0x68(%rdi),%r9d
              44 8b 4f 68
          10: sub    0x6c(%rdi),%r9d
              44 2b 4f 6c
          14: mov    0xd8(%rdi),%r8
              4c 8b 87 d8 00 00 00
          1b: mov    $0xc,%esi
              be 0c 00 00 00
          20: callq  0xffffffffe0ff9442
              e8 1d 94 ff e0
          25: cmp    $0x800,%eax
              3d 00 08 00 00
          2a: jne    0x0000000000000042
              75 16
          2c: mov    $0x17,%esi
              be 17 00 00 00
          31: callq  0xffffffffe0ff945e
              e8 28 94 ff e0
          36: cmp    $0x1,%eax
              83 f8 01
          39: jne    0x0000000000000042
              75 07
          3b: mov    $0xffff,%eax
              b8 ff ff 00 00
          40: jmp    0x0000000000000044
              eb 02
          42: xor    %eax,%eax
              31 c0
          44: leaveq
              c9
          45: retq
              c3

For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provide a useful
toolchain for developing and testing the kernel's JIT compiler.

[...]

- Conditional jt/jf targets replaced with jt/fall-through:

  While the original design has constructs such as ``if (cond) jump_true;
  else jump_false;``, they are being replaced by alternative constructs like
  ``if (cond) jump_true; /* else fall-through */``.
- Introduces bpf_call insn and register passing convention for zero overhead
calls from/to other kernel functions:
......@@ -684,32 +714,32 @@ Some core changes of the new internal format:
a return value of the function. Since R6 - R9 are callee saved, their state
is preserved across the call.
For example, consider three C functions:
For example, consider three C functions::
u64 f1() { return (*_f2)(1); }
u64 f2(u64 a) { return f3(a + 1, a); }
u64 f3(u64 a, u64 b) { return a - b; }
u64 f1() { return (*_f2)(1); }
u64 f2(u64 a) { return f3(a + 1, a); }
u64 f3(u64 a, u64 b) { return a - b; }
GCC can compile f1, f3 into x86_64:
GCC can compile f1, f3 into x86_64::
f1:
movl $1, %edi
movq _f2(%rip), %rax
jmp *%rax
f3:
movq %rdi, %rax
subq %rsi, %rax
ret
f1:
movl $1, %edi
movq _f2(%rip), %rax
jmp *%rax
f3:
movq %rdi, %rax
subq %rsi, %rax
ret
Function f2 in eBPF may look like:
Function f2 in eBPF may look like::
f2:
bpf_mov R2, R1
bpf_add R1, 1
bpf_call f3
bpf_exit
f2:
bpf_mov R2, R1
bpf_add R1, 1
bpf_call f3
bpf_exit
If f2 is JITed and the pointer is stored to '_f2', the calls f1 -> f2 -> f3 and
If f2 is JITed and the pointer is stored to ``_f2``, the calls f1 -> f2 -> f3 and
returns will be seamless. Without JIT, the __bpf_prog_run() interpreter needs to
be used to call into f2.
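As an illustration of the register-passing convention, the following toy C model (not the kernel interpreter) executes the f2 sequence above on an array of eleven 64-bit registers. The names `regs`, `call_f3()` and `run_f2()` are hypothetical, and clobbering R1-R5 after the call only models the fact that they are scratch registers.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Toy register file: regs[0] is R0, ..., regs[10] is R10. */
static u64 regs[11];

/* The in-kernel callee: f3(a, b) = a - b. */
static u64 f3(u64 a, u64 b)
{
	return a - b;
}

/* Models "bpf_call f3": arguments come from R1/R2, the result goes to R0. */
static void call_f3(void)
{
	regs[0] = f3(regs[1], regs[2]);

	/* R1-R5 are scratch: clobber them so nothing can rely on their value. */
	for (int i = 1; i <= 5; i++)
		regs[i] = 0xdeadbeefdeadbeefULL;
}

/* Models the eBPF body of f2; the caller passes its argument in R1. */
static u64 run_f2(u64 a)
{
	regs[1] = a;         /* argument from f1                 */
	regs[2] = regs[1];   /* bpf_mov R2, R1                   */
	regs[1] += 1;        /* bpf_add R1, 1                    */
	call_f3();           /* bpf_call f3                      */
	return regs[0];      /* bpf_exit: result is read from R0 */
}
```

Because f3(a + 1, a) is always 1 in 64-bit unsigned arithmetic, run_f2() returns 1 for any input, which is also what f1() computes.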
@@ -722,6 +752,8 @@ Some core changes of the new internal format:
On 64-bit architectures all registers map to HW registers one to one. For
example, x86_64 JIT compiler can map them as ...
::
R0 - rax
R1 - rdi
R2 - rsi
@@ -737,7 +769,7 @@ Some core changes of the new internal format:
... since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
and rbx, r12 - r15 are callee saved.
Then the following internal BPF pseudo-program:
Then the following internal BPF pseudo-program::
bpf_mov R6, R1 /* save ctx */
bpf_mov R2, 2
@@ -755,7 +787,7 @@ Some core changes of the new internal format:
bpf_add R0, R7
bpf_exit
After JIT to x86_64 may look like:
After JIT to x86_64 may look like::
push %rbp
mov %rsp,%rbp
@@ -781,21 +813,21 @@ Some core changes of the new internal format:
leaveq
retq
Which is in this example equivalent in C to:
Which is in this example equivalent in C to::
u64 bpf_filter(u64 ctx)
{
return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
}
In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64
arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in proper
registers and place their return value into '%rax' which is R0 in eBPF.
registers and place their return value into ``%rax`` which is R0 in eBPF.
Prologue and epilogue are emitted by JIT and are implicit in the
interpreter. R0-R5 are scratch registers, so eBPF program needs to preserve
them across the calls as defined by calling convention.
For example the following program is invalid:
For example the following program is invalid::
bpf_mov R1, 1
bpf_call foo
@@ -814,7 +846,7 @@ The input context pointer for invoking the interpreter function is generic,
its content is defined by a specific use case. For seccomp register R1 points
to seccomp_data, for converted BPF filters R1 points to a skb.
A program that is translated internally consists of the following elements:
A program that is translated internally consists of the following elements::
op:16, jt:8, jf:8, k:32 ==> op:8, dst_reg:4, src_reg:4, off:16, imm:32
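For reference, the two 8-byte layouts can be written as C structs. They mirror the uapi `struct sock_filter` and `struct bpf_insn`, but the names `classic_insn` and `ebpf_insn` are local to this sketch, and the `uint8_t` bit-fields rely on GCC/Clang accepting that bit-field type (standard C only guarantees int-based bit-fields).

```c
#include <stdint.h>

/* Classic BPF instruction: op:16, jt:8, jf:8, k:32. */
struct classic_insn {
	uint16_t code;   /* opcode                          */
	uint8_t  jt;     /* jump offset if condition true   */
	uint8_t  jf;     /* jump offset if condition false  */
	uint32_t k;      /* generic multi-use field         */
};

/* Internal (eBPF) instruction: op:8, dst_reg:4, src_reg:4, off:16, imm:32. */
struct ebpf_insn {
	uint8_t code;        /* opcode                      */
	uint8_t dst_reg:4;   /* destination register        */
	uint8_t src_reg:4;   /* source register             */
	int16_t off;         /* signed offset               */
	int32_t imm;         /* signed immediate constant   */
};

/* Both encodings occupy exactly 8 bytes. */
_Static_assert(sizeof(struct classic_insn) == 8, "classic insn is 8 bytes");
_Static_assert(sizeof(struct ebpf_insn) == 8, "eBPF insn is 8 bytes");
```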
@@ -824,7 +856,7 @@ instructions must be multiple of 8 bytes to preserve backward compatibility.
Internal BPF is a general purpose RISC instruction set. Not every register and
every instruction are used during translation from original BPF to new format.
For example, socket filters do not use the 'exclusive add' instruction, but
For example, socket filters do not use the ``exclusive add`` instruction, but
tracing filters may use it to maintain event counters. Register R9
is not used by socket filters either, but more complex filters may be running
out of registers and would have to resort to spill/fill to stack.
@@ -849,7 +881,7 @@ eBPF opcode encoding
eBPF is reusing most of the opcode encoding from classic to simplify conversion
of classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code'
field is divided into three parts:
field is divided into three parts::
+----------------+--------+--------------------+
| 4 bits | 1 bit | 3 bits |
@@ -859,8 +891,9 @@ field is divided into three parts:
Three LSB bits store instruction class which is one of:
Classic BPF classes: eBPF classes:
=================== ===============
Classic BPF classes eBPF classes
=================== ===============
BPF_LD 0x00 BPF_LD 0x00
BPF_LDX 0x01 BPF_LDX 0x01
BPF_ST 0x02 BPF_ST 0x02
@@ -869,25 +902,28 @@ Three LSB bits store instruction class which is one of:
BPF_JMP 0x05 BPF_JMP 0x05
BPF_RET 0x06 BPF_JMP32 0x06
BPF_MISC 0x07 BPF_ALU64 0x07
=================== ===============
When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...
BPF_K 0x00
BPF_X 0x08
::
BPF_K 0x00
BPF_X 0x08
* in classic BPF, this means:
* in classic BPF, this means::
BPF_SRC(code) == BPF_X - use register X as source operand
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
BPF_SRC(code) == BPF_X - use register X as source operand
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
* in eBPF, this means:
* in eBPF, this means::
BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
... and four MSB bits store operation code.
If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of::
BPF_ADD 0x00
BPF_SUB 0x10
@@ -904,7 +940,7 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */
BPF_END 0xd0 /* eBPF only: endianness conversion */
If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:
If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of::
BPF_JA 0x00 /* BPF_JMP only */
BPF_JEQ 0x10
@@ -934,7 +970,7 @@ exactly the same operations as BPF_ALU, but with 64-bit wide operands
instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
dst_reg = dst_reg + src_reg
Classic BPF wastes the whole BPF_RET class to represent a single 'ret'
Classic BPF wastes the whole BPF_RET class to represent a single ``ret``
operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return
@@ -942,7 +978,7 @@ value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
operands for the comparisons instead.
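A small sketch of how the class, source and operation fields combine into one opcode byte. The extractor macros and the class/source constants follow the listings above; BPF_MOV, BPF_CALL and BPF_EXIT are taken from the uapi headers and do not appear in this excerpt. The helper `bpf_opcode()` is hypothetical.

```c
#include <stdint.h>

/* Field extractors for the 8-bit 'code' of ALU and JMP instructions. */
#define BPF_CLASS(code) ((code) & 0x07)  /* three LSBs: instruction class */
#define BPF_SRC(code)   ((code) & 0x08)  /* fourth bit: source operand    */
#define BPF_OP(code)    ((code) & 0xf0)  /* four MSBs: operation code     */

/* Classes and source flags, as listed above. */
#define BPF_JMP   0x05
#define BPF_ALU64 0x07
#define BPF_K     0x00
#define BPF_X     0x08

/* Operation codes: BPF_ADD is listed above; the other three come from
 * the uapi headers and are not shown in this excerpt. */
#define BPF_ADD   0x00
#define BPF_MOV   0xb0
#define BPF_CALL  0x80
#define BPF_EXIT  0x90

/* Assemble an opcode byte from its three parts. */
static uint8_t bpf_opcode(uint8_t op, uint8_t src, uint8_t cls)
{
	return (uint8_t)(op | src | cls);
}
```

The value 0x0f is the 64-bit addition BPF_ADD | BPF_X | BPF_ALU64 described above; 0xbf, 0xb7, 0x85 and 0x95 are the opcodes printed in parentheses by the verifier log examples later in this document, such as `(bf) r0 = r2` and `(95) exit`.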
For load and store instructions the 8-bit 'code' field is divided as:
For load and store instructions the 8-bit 'code' field is divided as::
+--------+--------+-------------------+
| 3 bits | 2 bits | 3 bits |
@@ -952,19 +988,21 @@ For load and store instructions the 8-bit 'code' field is divided as:
Size modifier is one of ...
::
BPF_W 0x00 /* word */
BPF_H 0x08 /* half word */
BPF_B 0x10 /* byte */
BPF_DW 0x18 /* eBPF only, double word */
... which encodes size of load/store operation:
... which encodes size of load/store operation::
B - 1 byte
H - 2 byte
W - 4 byte
DW - 8 byte (eBPF only)
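Decoding the mode, size and class fields of a load/store opcode can be sketched as follows. The size constants follow the listing above; BPF_STX and BPF_MEM come from the uapi headers and are not shown in this excerpt, and `bpf_access_bytes()` is a hypothetical helper. The test decodes 0x7a and 0x63, the two store opcodes that appear in the verifier logs later in this document.

```c
#include <stdint.h>

/* Field extractors for the 8-bit 'code' of load/store instructions. */
#define BPF_CLASS(code) ((code) & 0x07)  /* three LSBs: instruction class */
#define BPF_SIZE(code)  ((code) & 0x18)  /* two bits: size modifier       */
#define BPF_MODE(code)  ((code) & 0xe0)  /* three MSBs: mode modifier     */

/* Size modifiers, as listed above. */
#define BPF_W   0x00
#define BPF_H   0x08
#define BPF_B   0x10
#define BPF_DW  0x18

/* Class and mode values used in the test; BPF_STX and BPF_MEM are taken
 * from the uapi headers. */
#define BPF_ST  0x02
#define BPF_STX 0x03
#define BPF_MEM 0x60

/* Number of bytes accessed by a load/store with the given opcode. */
static unsigned int bpf_access_bytes(uint8_t code)
{
	switch (BPF_SIZE(code)) {
	case BPF_B:
		return 1;
	case BPF_H:
		return 2;
	case BPF_W:
		return 4;
	default:
		return 8;	/* BPF_DW, eBPF only */
	}
}
```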
Mode modifier is one of:
Mode modifier is one of::
BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
BPF_ABS 0x20
@@ -979,7 +1017,7 @@ eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
They had to be carried over from classic to have strong performance of
socket filters running in eBPF interpreter. These instructions can only
be used when interpreter context is a pointer to 'struct sk_buff' and
be used when interpreter context is a pointer to ``struct sk_buff`` and
have seven implicit operands. Register R6 is an implicit input that must
contain pointer to sk_buff. Register R0 is an implicit output which contains
the data fetched from the packet. Registers R1-R5 are scratch registers
@@ -992,26 +1030,26 @@ the interpreter will abort the execution of the program. JIT compilers
therefore must preserve this property. src_reg and imm32 fields are
explicit inputs to these instructions.
For example:
For example::
BPF_IND | BPF_W | BPF_LD means:
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
and R1 - R5 were scratched.
Unlike classic BPF instruction set, eBPF has generic load/store operations:
Unlike classic BPF instruction set, eBPF has generic load/store operations::
BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
2 byte atomic increments are not supported.
eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
of two consecutive 'struct bpf_insn' 8-byte blocks and is interpreted as a single
of two consecutive ``struct bpf_insn`` 8-byte blocks and is interpreted as a single
instruction that loads a 64-bit immediate value into a dst_reg.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
32-bit immediate value into a register.
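A sketch of how a 64-bit constant can be split across the two slots of BPF_LD | BPF_DW | BPF_IMM (opcode 0x00 | 0x18 | 0x00 = 0x18): the first slot carries the low 32 bits in imm, and the second slot, whose other fields are zero, carries the high 32 bits. The struct and function names are hypothetical stand-ins. Converting a uint32_t above INT32_MAX to int32_t is implementation-defined in C and wraps on GCC and Clang.

```c
#include <stdint.h>

/* Minimal stand-in for struct bpf_insn. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define LD_IMM64_CODE 0x18	/* BPF_LD | BPF_DW | BPF_IMM */

/* Encode "dst = imm64" as two consecutive 8-byte instruction slots. */
static void emit_ld_imm64(struct insn out[2], unsigned int dst, uint64_t imm64)
{
	out[0] = (struct insn){ .code = LD_IMM64_CODE, .dst_reg = dst & 0xf,
				.imm = (int32_t)(uint32_t)imm64 };          /* low half  */
	out[1] = (struct insn){ .imm = (int32_t)(uint32_t)(imm64 >> 32) }; /* high half */
}

/* Reassemble the 64-bit constant from the two imm fields. */
static uint64_t decode_ld_imm64(const struct insn in[2])
{
	return (uint64_t)(uint32_t)in[0].imm |
	       ((uint64_t)(uint32_t)in[1].imm << 32);
}
```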
@@ -1037,38 +1075,48 @@ since addition of two valid pointers makes invalid pointer.
(In 'secure' mode verifier will reject any type of pointer arithmetic to make
sure that kernel addresses don't leak to unprivileged users)
If register was never written to, it's not readable:
If register was never written to, it's not readable::
bpf_mov R0 = R2
bpf_exit
will be rejected, since R2 is unreadable at the start of the program.
After kernel function call, R1-R5 are reset to unreadable and
R0 has a return type of the function.
Since R6-R9 are callee saved, their state is preserved across the call.
::
bpf_mov R6 = 1
bpf_call foo
bpf_mov R0 = R6
bpf_exit
is a correct program. If there was R1 instead of R6, it would have
been rejected.
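The rule that R1-R5 become unreadable after a call while R6-R9 survive can be modelled with a deliberately tiny checker. This is a toy, not the kernel verifier: it only tracks whether each register has been written, and the instruction forms (`MOV_IMM`, `MOV_REG`, `CALL`, `EXIT`) and `toy_check()` are hypothetical.

```c
#include <stdbool.h>

/* Toy register state: written or not (never written / clobbered). */
enum reg_state { NOT_INIT, WRITTEN };

/* Hypothetical instruction forms, just enough for the examples above. */
enum op { MOV_IMM, MOV_REG, CALL, EXIT };

struct toy_insn {
	enum op op;
	int dst, src;	/* register numbers 0..10 */
};

/* Returns true if no instruction reads an unreadable register. */
static bool toy_check(const struct toy_insn *prog, int len)
{
	enum reg_state r[11] = { NOT_INIT };

	r[1] = WRITTEN;		/* R1 holds the context pointer on entry */
	r[10] = WRITTEN;	/* R10 is the read-only frame pointer    */

	for (int i = 0; i < len; i++) {
		switch (prog[i].op) {
		case MOV_IMM:
			r[prog[i].dst] = WRITTEN;
			break;
		case MOV_REG:
			if (r[prog[i].src] == NOT_INIT)
				return false;	/* like "R2 !read_ok" */
			r[prog[i].dst] = WRITTEN;
			break;
		case CALL:
			for (int j = 1; j <= 5; j++)
				r[j] = NOT_INIT;	/* R1-R5 become unreadable */
			r[0] = WRITTEN;		/* R0 holds the return value */
			break;
		case EXIT:
			return r[0] != NOT_INIT;	/* R0 must be set on exit */
		}
	}
	return false;	/* fell off the end without an exit */
}
```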
load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
For example:
For example::
bpf_mov R1 = 1
bpf_mov R2 = 2
bpf_xadd *(u32 *)(R1 + 3) += R2
bpf_exit
will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd.
At the start R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context')
At the start R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``)
A callback is used to customize verifier to restrict eBPF program access to only
certain fields within ctx structure with specified size and alignment.
For example, the following insn:
For example, the following insn::
bpf_ld R0 = *(u32 *)(R6 + 8)
intends to load a word from address R6 + 8 and store it into R0.
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise
@@ -1079,10 +1127,13 @@ so it will fail verification, since it's out of bounds.
The verifier will allow eBPF program to read data from stack only after
it wrote into it.
Classic BPF verifier does similar check with M[0-15] memory slots.
For example:
For example::
bpf_ld R0 = *(u32 *)(R10 - 4)
bpf_exit
is an invalid program.
Though R10 is correct read-only register and has type PTR_TO_STACK
and R10 - 4 is within stack bounds, there were no stores into that location.
@@ -1113,48 +1164,61 @@ Register value tracking
-----------------------
In order to determine the safety of an eBPF program, the verifier must track
the range of possible values in each register and also in each stack slot.
This is done with 'struct bpf_reg_state', defined in include/linux/
This is done with ``struct bpf_reg_state``, defined in include/linux/
bpf_verifier.h, which unifies tracking of scalar and pointer values. Each
register state has a type, which is either NOT_INIT (the register has not been
written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
pointer type. The types of pointers describe their base, as follows:
PTR_TO_CTX Pointer to bpf_context.
CONST_PTR_TO_MAP Pointer to struct bpf_map. "Const" because arithmetic
on these pointers is forbidden.
PTR_TO_MAP_VALUE Pointer to the value stored in a map element.
PTR_TO_CTX
Pointer to bpf_context.
CONST_PTR_TO_MAP
Pointer to struct bpf_map. "Const" because arithmetic
on these pointers is forbidden.
PTR_TO_MAP_VALUE
Pointer to the value stored in a map element.
PTR_TO_MAP_VALUE_OR_NULL
Either a pointer to a map value, or NULL; map accesses
(see section 'eBPF maps', below) return this type,
which becomes a PTR_TO_MAP_VALUE when checked != NULL.
Arithmetic on these pointers is forbidden.
PTR_TO_STACK Frame pointer.
PTR_TO_PACKET skb->data.
PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted.
Either a pointer to a map value, or NULL; map accesses
(see section 'eBPF maps', below) return this type,
which becomes a PTR_TO_MAP_VALUE when checked != NULL.
Arithmetic on these pointers is forbidden.
PTR_TO_STACK
Frame pointer.
PTR_TO_PACKET
skb->data.
PTR_TO_PACKET_END
skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET
Pointer to struct bpf_sock_ops, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the
socket release function before the end of the program.
Arithmetic on these pointers is forbidden.
Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the
socket release function before the end of the program.
Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
operand) is added to a pointer, while the latter is used for values which are
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
the range of possible values in the register.
The verifier's knowledge about the variable offset consists of:
* minimum and maximum values as unsigned
* minimum and maximum values as signed
* knowledge of the values of individual bits, in the form of a 'tnum': a u64
'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
mask and value; no bit should ever be 1 in both. For example, if a byte is read
into a register from memory, the register's top 56 bits are known zero, while
the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
0x1ff), because of potential carries.
'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
mask and value; no bit should ever be 1 in both. For example, if a byte is read
into a register from memory, the register's top 56 bits are known zero, while
the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
0x1ff), because of potential carries.
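The two tnum steps in this example can be reproduced with a small standalone sketch. The OR and add rules below follow the approach of tnum_or() and tnum_add() in kernel/bpf/tnum.c as we understand it; treat it as an illustration rather than the kernel source.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Bits set in 'mask' are unknown; bits set in 'value' are known to be 1. */
struct tnum {
	u64 value;
	u64 mask;
};

static struct tnum tnum_const(u64 v)
{
	return (struct tnum){ .value = v, .mask = 0 };
}

/* OR: a bit is known 1 if either side knows it is 1; otherwise it is
 * unknown if either side is unknown, and known 0 if both are known 0. */
static struct tnum tnum_or(struct tnum a, struct tnum b)
{
	u64 v = a.value | b.value;
	u64 mu = a.mask | b.mask;

	return (struct tnum){ .value = v, .mask = mu & ~v };
}

/* Addition: every bit that a carry out of an unknown bit could flip
 * becomes unknown. */
static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	u64 sm = a.mask + b.mask;	/* sum of the unknown parts           */
	u64 sv = a.value + b.value;	/* sum of the known parts             */
	u64 sigma = sm + sv;		/* sum with every unknown bit set     */
	u64 chi = sigma ^ sv;		/* bits changed by possible carries   */
	u64 mu = chi | a.mask | b.mask;

	return (struct tnum){ .value = sv & ~mu, .mask = mu };
}
```

Starting from the byte load (0x0; 0xff), OR with 0x40 gives (0x40; 0xbf) and adding 1 gives (0x0; 0x1ff), matching the text.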
Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
@@ -1188,7 +1252,7 @@ The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding 'struct sock'. To ensure that the
represents a reference to the corresponding ``struct sock``. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and in
the non-NULL case, and pass the valid reference to the socket release function.
@@ -1196,17 +1260,18 @@ Direct packet access
--------------------
In cls_bpf and act_bpf programs the verifier allows direct access to the packet
data via skb->data and skb->data_end pointers.
Ex:
1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
2: r3 = *(u32 *)(r1 +76) /* load skb->data */
3: r5 = r3
4: r5 += 14
5: if r5 > r4 goto pc+16
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
Ex::
1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
2: r3 = *(u32 *)(r1 +76) /* load skb->data */
3: r5 = r3
4: r5 += 14
5: if r5 > r4 goto pc+16
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
this 2-byte load from the packet is safe to do, since the program author
did check 'if (skb->data + 14 > skb->data_end) goto err' at insn #5 which
did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
means that in the fall-through case the register R3 (which points to skb->data)
has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14).
@@ -1215,52 +1280,58 @@ off=0 means that no additional constants were added.
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so
it now points to 'skb->data + 14' and accessible range is [R5, R5 + 14 - 14)
it now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14)
which is zero bytes.
More complex packet access may look like:
R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
7: r4 = *(u8 *)(r3 +12)
8: r4 *= 14
9: r3 = *(u32 *)(r1 +76) /* load skb->data */
10: r3 += r4
11: r2 = r1
12: r2 <<= 48
13: r2 >>= 48
14: r3 += r2
15: r2 = r3
16: r2 += 8
17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
18: if r2 > r1 goto pc+2
R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
19: r1 = *(u8 *)(r3 +4)
More complex packet access may look like::
R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
7: r4 = *(u8 *)(r3 +12)
8: r4 *= 14
9: r3 = *(u32 *)(r1 +76) /* load skb->data */
10: r3 += r4
11: r2 = r1
12: r2 <<= 48
13: r2 >>= 48
14: r3 += r2
15: r2 = r3
16: r2 += 8
17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
18: if r2 > r1 goto pc+2
R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
19: r1 = *(u8 *)(r3 +4)
The state of the register R3 is R3=pkt(id=2,off=0,r=8)
id=2 means that two 'r3 += rX' instructions were seen, so r3 points to some
id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
offset within a packet and since the program author did
'if (r3 + 8 > r1) goto err' at insn #18, the safe range is [R3, R3 + 8).
``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows 'add'/'sub' operations on packet registers. Any other
operation will set the register state to 'SCALAR_VALUE' and it won't be
available for direct packet access.
Operation 'r3 += rX' may overflow and become less than original skb->data,
therefore the verifier has to prevent that. So when it sees 'r3 += rX'
Operation ``r3 += rX`` may overflow and become less than original skb->data,
therefore the verifier has to prevent that. So when it sees ``r3 += rX``
instruction and rX is more than 16-bit value, any subsequent bounds-check of r3
against skb->data_end will not give us 'range' information, so attempts to read
through the pointer will give "invalid access to packet" error.
Ex. after insn 'r4 = *(u8 *)(r3 +12)' (insn #7 above) the state of r4 is
Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
of the register are guaranteed to be zero, and nothing is known about the lower
8 bits. After insn 'r4 *= 14' the state becomes
8 bits. After insn ``r4 *= 14`` the state becomes
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
value by constant 14 will keep upper 52 bits as zero, also the least significant
bit will be zero as 14 is even. Similarly 'r2 >>= 48' will make
bit will be zero as 14 is even. Similarly ``r2 >>= 48`` will make
R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
extending. This logic is implemented in adjust_reg_min_max_vals() function,
which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
versa) and adjust_scalar_min_max_vals() for operations on two scalars.
The end result is that bpf program author can access packet directly
using normal C code as:
using normal C code as::
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
struct eth_hdr *eth = data;
@@ -1268,13 +1339,14 @@ using normal C code as:
struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
return 0;
return 0;
if (eth->h_proto != htons(ETH_P_IP))
return 0;
return 0;
if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
return 0;
return 0;
if (udp->dest == 53 || udp->source == 9)
...;
...;
which makes such programs easier to write compared to the LD_ABS insn
and significantly faster.
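The same single up-front bounds check can be tried outside the kernel. The sketch below parses an Ethernet/IPv4/UDP frame from a plain byte buffer; the header structs and constants are hypothetical minimal stand-ins (not the kernel's ethhdr/iphdr/udphdr), headers are copied out with memcpy() to avoid unaligned access in userspace, and the length check is written as a size comparison so that no out-of-range pointer is ever formed. Header fields are in network byte order, so ports are compared after ntohs().

```c
#include <arpa/inet.h>	/* htons(), ntohs() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal header layouts. */
struct eth_hdr { uint8_t dst[6], src[6]; uint16_t h_proto; };
struct ip_hdr  { uint8_t ver_ihl, tos; uint16_t tot_len, id, frag_off;
		 uint8_t ttl, protocol; uint16_t check; uint32_t saddr, daddr; };
struct udp_hdr { uint16_t source, dest, len, check; };

_Static_assert(sizeof(struct eth_hdr) == 14, "Ethernet header is 14 bytes");
_Static_assert(sizeof(struct ip_hdr) == 20, "IPv4 header is 20 bytes");

#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_PROTO_UDP  17

/* Returns 1 for an IPv4/UDP frame to port 53 or from port 9, else 0. */
static int match_frame(const uint8_t *data, const uint8_t *data_end)
{
	struct eth_hdr eth;
	struct ip_hdr iph;
	struct udp_hdr udp;

	/* One bounds check covers every header read below. */
	if ((size_t)(data_end - data) < sizeof(eth) + sizeof(iph) + sizeof(udp))
		return 0;

	memcpy(&eth, data, sizeof(eth));
	memcpy(&iph, data + sizeof(eth), sizeof(iph));
	memcpy(&udp, data + sizeof(eth) + sizeof(iph), sizeof(udp));

	if (eth.h_proto != htons(SKETCH_ETH_P_IP))
		return 0;
	if (iph.protocol != SKETCH_PROTO_UDP || (iph.ver_ihl & 0x0f) != 5)
		return 0;
	return ntohs(udp.dest) == 53 || ntohs(udp.source) == 9;
}
```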
@@ -1284,23 +1356,24 @@ eBPF maps
and userspace.
The maps are accessed from user space via BPF syscall, which has commands:
- create a map with given type and attributes
map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
returns process-local file descriptor or negative error
- lookup key in a given map
err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value
returns zero and stores found elem into value or negative error
- create or update key/value pair in a given map
err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value
returns zero or negative error
- find and delete element by key in a given map
err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key
- to delete map: close(fd)
@@ -1312,10 +1385,11 @@ are concurrently updating.
maps can have different types: hash, array, bloom filter, radix-tree, etc.
The map is defined by:
. type
. max number of elements
. key size in bytes
. value size in bytes
- type
- max number of elements
- key size in bytes
- value size in bytes
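A minimal userspace sketch of the map commands listed above, assuming a Linux system with the uapi `<linux/bpf.h>` header. The helper names `sys_bpf()` and `map_roundtrip()` are hypothetical; glibc provides no bpf() wrapper, so the raw syscall is used. These commands may require elevated privileges (for example CAP_SYS_ADMIN, or CAP_BPF on newer kernels), so the function simply reports failure when the kernel refuses.

```c
#include <linux/bpf.h>	/* union bpf_attr, BPF_MAP_* commands */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>	/* __NR_bpf */
#include <unistd.h>

static long sys_bpf(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Create a 16-slot array map, store 42 at key 0 and read it back.
 * Returns 0 and sets *out on success, -1 on any failure. */
static int map_roundtrip(uint64_t *out)
{
	union bpf_attr attr;
	uint32_t key = 0;
	uint64_t val = 42;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_ARRAY;	/* type                 */
	attr.key_size = sizeof(key);		/* array keys: 4 bytes  */
	attr.value_size = sizeof(val);		/* value size in bytes  */
	attr.max_entries = 16;			/* max number of elements */
	fd = (int)sys_bpf(BPF_MAP_CREATE, &attr);
	if (fd < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key = (uint64_t)(uintptr_t)&key;
	attr.value = (uint64_t)(uintptr_t)&val;
	attr.flags = BPF_ANY;			/* create or update */
	if (sys_bpf(BPF_MAP_UPDATE_ELEM, &attr) < 0)
		goto fail;

	attr.value = (uint64_t)(uintptr_t)out;	/* lookup writes here */
	if (sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr) < 0)
		goto fail;

	close(fd);	/* drop our reference; the map is freed once unused */
	return 0;
fail:
	close(fd);
	return -1;
}
```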
Pruning
-------
@@ -1339,57 +1413,75 @@ Understanding eBPF verifier messages
The following are a few examples of invalid eBPF programs and verifier error
messages as seen in the log:
Program with unreachable instructions:
static struct bpf_insn prog[] = {
Program with unreachable instructions::
static struct bpf_insn prog[] = {
BPF_EXIT_INSN(),
BPF_EXIT_INSN(),
};
};
Error:
unreachable insn 1
Program that reads uninitialized register:
Program that reads uninitialized register::
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
BPF_EXIT_INSN(),
Error:
Error::
0: (bf) r0 = r2
R2 !read_ok
Program that doesn't initialize R0 before exiting:
Program that doesn't initialize R0 before exiting::
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
BPF_EXIT_INSN(),
Error:
Error::
0: (bf) r2 = r1
1: (95) exit
R0 !read_ok
Program that accesses stack out of bounds:
BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
BPF_EXIT_INSN(),
Error:
0: (7a) *(u64 *)(r10 +8) = 0
invalid stack off=8 size=8
Program that accesses stack out of bounds::
BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
BPF_EXIT_INSN(),
Error::
0: (7a) *(u64 *)(r10 +8) = 0
invalid stack off=8 size=8
Program that doesn't initialize stack before passing its address into function::
Program that doesn't initialize stack before passing its address into function:
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
Error:
Error::
0: (bf) r2 = r10
1: (07) r2 += -8
2: (b7) r1 = 0x0
3: (85) call 1
invalid indirect read from stack off -8+0 size 8
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function:
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1398,7 +1490,8 @@ Error:
fd 0 is not pointing to valid bpf_map
Program that doesn't check return value of map_lookup_elem() before accessing
map element:
map element::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
@@ -1406,7 +1499,9 @@ map element:
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1416,7 +1511,8 @@ Error:
R0 invalid mem access 'map_value_or_null'
Program that correctly checks map_lookup_elem() returned value for NULL, but
accesses the memory with incorrect alignment:
accesses the memory with incorrect alignment::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
@@ -1425,7 +1521,9 @@ accesses the memory with incorrect alignment:
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1438,7 +1536,8 @@ Error:
Program that correctly checks map_lookup_elem() returned value for NULL and
accesses memory with correct alignment in one side of 'if' branch, but fails
to do so in the other side of 'if' branch:
to do so in the other side of 'if' branch::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
@@ -1449,7 +1548,9 @@ to do so in the other side of 'if' branch:
BPF_EXIT_INSN(),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
Error:
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
@@ -1465,8 +1566,8 @@ Error:
R0 invalid mem access 'imm'
Program that performs a socket lookup then sets the pointer to NULL without
checking it:
value:
checking it::
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
@@ -1477,7 +1578,9 @@ value:
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
Error:
Error::
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
@@ -1491,7 +1594,8 @@ Error:
Unreleased reference id=1, alloc_insn=7
Program that performs a socket lookup but does not NULL-check the returned
value:
value::
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
@@ -1501,7 +1605,9 @@ value:
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(),
Error:
Error::
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
@@ -1519,7 +1625,7 @@ Testing
Next to the BPF toolchain, the kernel also ships a test module that contains
various test cases for classic and internal BPF that can be executed against
the BPF interpreter and JIT compiler. It can be found in lib/test_bpf.c and
enabled via Kconfig:
enabled via Kconfig::
CONFIG_TEST_BPF=m
@@ -1540,6 +1646,6 @@ The document was written in the hope that it is found useful and in order
to give potential BPF hackers or security auditors a better overview of
the underlying architecture.
Jay Schulist <jschlst@samba.org>
Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov <ast@kernel.org>
- Jay Schulist <jschlst@samba.org>
- Daniel Borkmann <daniel@iogearbox.net>
- Alexei Starovoitov <ast@kernel.org>
.. SPDX-License-Identifier: GPL-2.0
=============================================
FORE Systems PCA-200E/SBA-200E ATM NIC driver
---------------------------------------------
=============================================
This driver adds support for the FORE Systems 200E-series ATM adapters
to the Linux operating system. It is based on the earlier PCA-200E driver
@@ -27,8 +29,8 @@ in the linux/drivers/atm directory for details and restrictions.
Firmware Updates
----------------
The FORE Systems 200E-series driver is shipped with firmware data being
uploaded to the ATM adapters at system boot time or at module loading time.
The FORE Systems 200E-series driver is shipped with firmware data being
uploaded to the ATM adapters at system boot time or at module loading time.
The supplied firmware images should work with all adapters.
However, if you encounter problems (the firmware doesn't start or the driver
......
Frame Relay (FR) support for linux is built into a two tiered system of device
.. SPDX-License-Identifier: GPL-2.0
================
Frame Relay (FR)
================
Frame Relay (FR) support for linux is built into a two tiered system of device
drivers. The upper layer implements RFC1490 FR specification, and uses the
Data Link Connection Identifier (DLCI) as its hardware address. Usually these
are assigned by your network supplier, they give you the number/numbers of
@@ -7,18 +13,18 @@ the Virtual Connections (VC) assigned to you.
Each DLCI is a point-to-point link between your machine and a remote one.
As such, a separate device is needed to accommodate the routing. Within the
net-tools archives is 'dlcicfg'. This program will communicate with the
base "DLCI" device, and create new net devices named 'dlci00', 'dlci01'...
base "DLCI" device, and create new net devices named 'dlci00', 'dlci01'...
The configuration script will ask you how many DLCIs you need, as well as
how many DLCIs you want to assign to each Frame Relay Access Device (FRAD).
The DLCI uses a number of function calls to communicate with the FRAD, all
of which are stored in the FRAD's private data area. assoc/deassoc,
of which are stored in the FRAD's private data area. assoc/deassoc,
activate/deactivate and dlci_config. The DLCI supplies a receive function
to the FRAD to accept incoming packets.
With this initial offering, only 1 FRAD driver is available. With many thanks
to Sangoma Technologies, David Mandelstam & Gene Kozin, the S502A, S502E &
S508 are supported. This driver is currently set up for only FR, but as
Sangoma makes more firmware modules available, it can be updated to provide
them as well.
......@@ -32,8 +38,7 @@ an initial configuration.
Additional FRAD device drivers can be added as hardware is available.
At this time, the dlcicfg and fradcfg programs have not been incorporated into
the net-tools distribution. They can be found at ftp.invlogic.com, in
/pub/linux. Note that with OS/2 FTPD, you end up in /pub by default, so just
use 'cd linux'. v0.10 is for use on pre-2.0.3 and earlier, v0.15 is for
pre-2.0.4 and later.
.. SPDX-License-Identifier: GPL-2.0
===============================================
Generic networking statistics for netlink users
===============================================
Statistic counters are grouped into structs:
==================== ===================== =====================
Struct               TLV type              Description
==================== ===================== =====================
gnet_stats_basic     TCA_STATS_BASIC       Basic statistics
gnet_stats_rate_est  TCA_STATS_RATE_EST    Rate estimator
gnet_stats_queue     TCA_STATS_QUEUE       Queue statistics
none                 TCA_STATS_APP         Application specific
==================== ===================== =====================
Collecting:
-----------
Declare the statistic structs you need::
	struct mystruct {
		struct gnet_stats_basic	bstats;
		struct gnet_stats_queue	qstats;
		...
	};
Update statistics, in dequeue() methods only (while owning qdisc->running)::
	mystruct->bstats.packets++;
	mystruct->qstats.backlog -= skb->len;
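The following userspace sketch shows the same bookkeeping outside the kernel. The `*_sketch` structs only mirror the field layout of `struct gnet_stats_basic` and `struct gnet_stats_queue`. They are stand-ins, not the uapi definitions. As in most qdiscs, the sketch raises the queue length and backlog on enqueue and lowers them on dequeue, which is also where it bumps the basic byte and packet counters. Locking is not modeled.

```c
#include <stdint.h>

/* Stand-ins mirroring the uapi field layout; not the real headers. */
struct gnet_stats_basic_sketch {
	uint64_t bytes;
	uint32_t packets;
};

struct gnet_stats_queue_sketch {
	uint32_t qlen;
	uint32_t backlog;
	uint32_t drops;
	uint32_t requeues;
	uint32_t overlimits;
};

/* Hypothetical per-qdisc private data, as in "struct mystruct" above. */
struct mystruct {
	struct gnet_stats_basic_sketch bstats;
	struct gnet_stats_queue_sketch qstats;
};

/* A packet of len bytes enters the queue. */
static void sketch_enqueue(struct mystruct *s, uint32_t len)
{
	s->qstats.qlen++;
	s->qstats.backlog += len;
}

/* A packet of len bytes leaves the queue: count it as sent. */
static void sketch_dequeue(struct mystruct *s, uint32_t len)
{
	s->bstats.packets++;
	s->bstats.bytes += len;
	s->qstats.qlen--;
	s->qstats.backlog -= len;
}
```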
Export to userspace (Dump):
---------------------------
::
	my_dumping_routine(struct sk_buff *skb, ...)
	{
		struct gnet_dump dump;
		if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump,
					  TCA_PAD) < 0)
			goto rtattr_failure;
		if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 ||
		    gnet_stats_copy_queue(&dump, &mystruct->qstats) < 0 ||
		    gnet_stats_copy_app(&dump, &xstats, sizeof(xstats)) < 0)
			goto rtattr_failure;
		if (gnet_stats_finish_copy(&dump) < 0)
			goto rtattr_failure;
		...
	}
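Conceptually, gnet_stats_start_copy() opens a TLV container for TCA_STATS2, each gnet_stats_copy_*() call appends one TLV inside it, and gnet_stats_finish_copy() fixes up the container length. The sketch below models that pattern with a hand-rolled netlink-style attribute header (16-bit length, 16-bit type, 4-byte alignment). The type values and payload sizes in the usage below are placeholders. Real code takes the types from <linux/gen_stats.h> and <linux/rtnetlink.h> and lets the kernel helpers do this work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Netlink-style attribute header: length covers header + payload. */
struct nla_hdr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN ((size_t)NLA_ALIGN(sizeof(struct nla_hdr)))

struct tlv_buf {
	uint8_t data[256];
	size_t len;
};

/* Append one TLV; returns the offset of its header, or -1 if it won't fit. */
static long tlv_put(struct tlv_buf *b, uint16_t type, const void *payload, size_t plen)
{
	size_t total = NLA_ALIGN(NLA_HDRLEN + plen);
	if (b->len + total > sizeof(b->data))
		return -1;
	struct nla_hdr h = { (uint16_t)(NLA_HDRLEN + plen), type };
	memcpy(b->data + b->len, &h, sizeof(h));
	if (plen)
		memcpy(b->data + b->len + NLA_HDRLEN, payload, plen);
	/* zero the alignment padding */
	memset(b->data + b->len + NLA_HDRLEN + plen, 0, total - NLA_HDRLEN - plen);
	long off = (long)b->len;
	b->len += total;
	return off;
}

/* Close a container opened with an empty tlv_put(): its length now spans
 * every TLV appended after it (compare gnet_stats_finish_copy()). */
static void tlv_nest_end(struct tlv_buf *b, long start)
{
	struct nla_hdr h;
	memcpy(&h, b->data + start, sizeof(h));
	h.nla_len = (uint16_t)(b->len - (size_t)start);
	memcpy(b->data + start, &h, sizeof(h));
}
```

Opening a container, appending a 12-byte and a 20-byte child, and closing it gives a container length of 4 + 16 + 24 = 44 bytes.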
TCA_STATS/TCA_XSTATS backward compatibility:
--------------------------------------------
Prior users of struct tc_stats and xstats can maintain backward
compatibility by calling the compat wrappers to keep providing the
existing TLV types::
	my_dumping_routine(struct sk_buff *skb, ...)
	{
		if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS,
						 TCA_XSTATS, &mystruct->lock, &dump,
						 TCA_PAD) < 0)
			goto rtattr_failure;
		...
	}
A struct tc_stats will be filled out during gnet_stats_copy_* calls
and appended to the skb. TCA_XSTATS is provided if gnet_stats_copy_app
......@@ -77,7 +86,7 @@ are responsible for making sure that the lock is initialized.
Rate Estimator:
---------------
0) Prepare an estimator attribute. Most likely this would be in user
space. The value of this TLV should contain a tc_estimator structure.
......@@ -92,18 +101,19 @@ Rate Estimator:
TCA_RATE to your code in the kernel.
In the kernel when setting up:
1) Make sure you have basic stats and rate stats set up first.
2) Make sure you have initialized the stats lock that is used to set up such
stats.
3) Now initialize a new estimator::
	int ret = gen_new_estimator(my_basicstats,my_rate_est_stats,
		mystats_lock, attr_with_tcestimator_struct);
	if ret == 0
		success
	else
		failed
From now on, every time you dump my_rate_est_stats it will contain
up-to-date info.
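The core idea behind the estimator is an exponentially weighted moving average of the byte rate. The sketch below is a simplified userspace model. It assumes a whole-second sampling period and a weight of 1/2^ewma_log per sample. The tc_estimator structure carries interval and ewma_log fields that play analogous roles, but their exact encoding and the kernel's fixed-point scaling are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical estimator state; not the kernel's structure. */
struct rate_est {
	uint64_t last_bytes;  /* cumulative byte counter at the previous sample */
	int64_t avg_bps;      /* smoothed rate, bytes per second */
	unsigned ewma_log;    /* each sample moves the average by 1/2^ewma_log */
	unsigned interval_s;  /* sampling period in seconds (assumed >= 1) */
};

/* Call once per interval with the current cumulative byte counter. */
static void rate_est_sample(struct rate_est *e, uint64_t bytes_now)
{
	int64_t inst = (int64_t)((bytes_now - e->last_bytes) / e->interval_s);

	/* avg += (inst - avg) / 2^ewma_log; division rather than >> keeps
	 * the result well defined when inst < avg */
	e->avg_bps += (inst - e->avg_bps) / ((int64_t)1 << e->ewma_log);
	e->last_bytes = bytes_now;
}
```

With ewma_log = 2 and a steady 4000 bytes/s, the average moves 0, 1000, 1750, 2312, ... towards 4000. A larger ewma_log smooths more but reacts more slowly.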
......@@ -115,5 +125,5 @@ are still valid (i.e still exist) at the time of making this call.
Authors:
--------
- Thomas Graf <tgraf@suug.ch>
- Jamal Hadi Salim <hadi@cyberus.ca>
.. SPDX-License-Identifier: GPL-2.0
==================
Generic HDLC layer
==================
Krzysztof Halasa <khc@pm.waw.pl>
Generic HDLC layer currently supports:
1. Frame Relay (ANSI, CCITT, Cisco and no LMI)
- Normal (routed) and Ethernet-bridged (Ethernet device emulation)
interfaces can share a single PVC.
- ARP support (no InARP support in the kernel - there is an
experimental InARP user-space daemon available on:
http://www.kernel.org/pub/linux/utils/net/hdlc/).
2. raw HDLC - either IP (IPv4) interface or Ethernet device emulation
3. Cisco HDLC
4. PPP
......@@ -24,19 +32,24 @@ with IEEE 802.1Q (VLANs) and 802.1D (Ethernet bridging).
Make sure the hdlc.o and the hardware driver are loaded. It should
create a number of "hdlc" (hdlc0 etc) network devices, one for each
WAN port. You'll need the "sethdlc" utility, get it from:
http://www.kernel.org/pub/linux/utils/net/hdlc/
Compile sethdlc.c utility::
	gcc -O2 -Wall -o sethdlc sethdlc.c
Make sure you're using a correct version of sethdlc for your kernel.
Use sethdlc to set physical interface, clock rate, HDLC mode used,
and add any required PVCs if using Frame Relay.
Usually you want something like::
	sethdlc hdlc0 clock int rate 128000
	sethdlc hdlc0 cisco interval 10 timeout 25
or::
	sethdlc hdlc0 rs232 clock ext
	sethdlc hdlc0 fr lmi ansi
	sethdlc hdlc0 create 99
......@@ -49,46 +62,63 @@ any IP address to it) before using pvc devices.
Setting interface:
* v35 | rs232 | x21 | t1 | e1
- sets physical interface for a given port
if the card has software-selectable interfaces
loopback
- activate hardware loopback (for testing only)
* clock ext
- both RX clock and TX clock external
* clock int
- both RX clock and TX clock internal
* clock txint
- RX clock external, TX clock internal
* clock txfromrx
- RX clock external, TX clock derived from RX clock
* rate
- sets clock rate in bps (for "int" or "txint" clock only)
Setting protocol:
* hdlc - sets raw HDLC (IP-only) mode
nrz / nrzi / fm-mark / fm-space / manchester - sets transmission code
no-parity / crc16 / crc16-pr0 (CRC16 with preset zeros) / crc32-itu
crc16-itu (CRC16 with ITU-T polynomial) / crc16-itu-pr0 - sets parity
* hdlc-eth - Ethernet device emulation using HDLC. Parity and encoding
as above.
* cisco - sets Cisco HDLC mode (IP, IPv6 and IPX supported)
interval - time in seconds between keepalive packets
timeout - time in seconds after last received keepalive packet before
we assume the link is down
* ppp - sets synchronous PPP mode
* x25 - sets X.25 mode
* fr - Frame Relay mode
lmi ansi / ccitt / cisco / none - LMI (link management) type
dce - Frame Relay DCE (network) side LMI instead of default DTE (user).
It has nothing to do with clocks!
- t391 - link integrity verification polling timer (in seconds) - user
- t392 - polling verification timer (in seconds) - network
- n391 - full status polling counter - user
- n392 - error threshold - both user and network
- n393 - monitored events count - both user and network
Frame-Relay only:
* create n | delete n - adds / deletes PVC interface with DLCI #n.
Newly created interface will be named pvc0, pvc1 etc.
......@@ -101,26 +131,34 @@ Frame-Relay only:
Board-specific issues
---------------------
n2.o and c101.o need parameters to work::
	insmod n2 hw=io,irq,ram,ports[:io,irq,...]
example::
	insmod n2 hw=0x300,10,0xD0000,01
or::
	insmod c101 hw=irq,ram[:irq,...]
example::
	insmod c101 hw=9,0xdc000
If built into the kernel, these drivers need kernel (command line) parameters::
	n2.hw=io,irq,ram,ports:...
or::
	c101.hw=irq,ram:...
If you have a problem with N2, C101 or PLX200SYN card, you can issue the
"private" command to see port's packet descriptor rings (in kernel logs)::
	sethdlc hdlc0 private
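The n2 hw= value is a colon-separated list of comma-separated tuples. A hypothetical parser for that form might look like the following. It is not the driver's own code. The ports field is kept as a string because "01" presumably lists port digits rather than encoding a number. Reading it with base 0 would even make it octal.

```c
#include <stdlib.h>
#include <string.h>

/* One card's settings from an "io,irq,ram,ports" tuple (sketch). */
struct n2_card {
	unsigned long io;
	unsigned long irq;
	unsigned long ram;
	char ports[8];   /* kept verbatim, e.g. "0", "1" or "01" */
};

/*
 * Parse "io,irq,ram,ports[:io,irq,ram,ports...]" into cards[].
 * Numbers accept C prefixes (0x300 is hex, 10 is decimal).
 * Returns the number of cards parsed, or -1 on a malformed tuple.
 */
static int parse_n2_hw(const char *arg, struct n2_card *cards, int max)
{
	const char *p = arg;
	int n = 0;

	while (*p && n < max) {
		struct n2_card *c = &cards[n];
		char *end;

		c->io = strtoul(p, &end, 0);
		if (*end != ',')
			return -1;
		c->irq = strtoul(end + 1, &end, 0);
		if (*end != ',')
			return -1;
		c->ram = strtoul(end + 1, &end, 0);
		if (*end != ',')
			return -1;

		/* ports runs up to the next ':' or the end of the string */
		p = end + 1;
		size_t len = strcspn(p, ":");
		if (len == 0 || len >= sizeof(c->ports))
			return -1;
		memcpy(c->ports, p, len);
		c->ports[len] = '\0';
		n++;

		p += len;
		if (*p == ':')
			p++;
	}
	return n;
}
```

Parsing the example above, "0x300,10,0xD0000,01", yields one card with io 0x300, irq 10, ram 0xD0000 and ports "01".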
......
.. SPDX-License-Identifier: GPL-2.0
===============
Generic Netlink
===============
A wiki document on how to use Generic Netlink can be found here:
* http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto
.. SPDX-License-Identifier: GPL-2.0
=====================================
The Linux kernel GTP tunneling module
=====================================
Documentation by
Harald Welte <laforge@gnumonks.org> and
Andreas Schultz <aschultz@tpip.net>
In 'drivers/net/gtp.c' you can find a kernel-level implementation
of a GTP tunnel endpoint.
What is GTP
===========
GTP is the GPRS Tunnelling Protocol, which is a 3GPP protocol used for
tunneling User-IP payload between a mobile station (phone, modem)
......@@ -41,7 +47,8 @@ publicly via the 3GPP website at http://www.3gpp.org/DynaReport/29060.htm
A direct PDF link to v13.6.0 is provided for convenience below:
http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.06.00_60/ts_129060v130600p.pdf
The Linux GTP tunnelling module
===============================
The module implements the function of a tunnel endpoint, i.e. it is
able to decapsulate tunneled IP packets in the uplink originated by
......@@ -70,7 +77,8 @@ Userspace :)
The official homepage of the module is at
https://osmocom.org/projects/linux-kernel-gtp-u/wiki
Userspace Programs with Linux Kernel GTP-U support
==================================================
At the time of this writing, there are at least two Free Software
implementations that implement GTP-C and can use the netlink interface
......@@ -82,7 +90,8 @@ to make use of the Linux kernel GTP-U support:
* ergw (GGSN + P-GW in Erlang):
https://github.com/travelping/ergw
Userspace Library / Command Line Utilities
==========================================
There is a userspace library called 'libgtpnl' which is based on
libmnl and which implements a C-language API towards the netlink
......@@ -90,7 +99,8 @@ interface provided by the Kernel GTP module:
http://git.osmocom.org/libgtpnl/
Protocol Versions
=================
There are two different versions of GTP-U: v0 [GSM TS 09.60] and v1
[3GPP TS 29.281]. Both are implemented in the Kernel GTP module.
......@@ -105,7 +115,8 @@ doesn't implement GTP-C, we don't have to worry about this. It's the
responsibility of the control plane implementation in userspace to
implement that.
IPv6
====
The 3GPP specifications indicate either IPv4 or IPv6 can be used both
on the inner (user) IP layer, or on the outer (transport) layer.
......@@ -114,22 +125,25 @@ Unfortunately, the Kernel module currently supports IPv6 neither for
the User IP payload, nor for the outer IP layer. Patches or other
contributions to fix this are most welcome!
Mailing List
============
If you have questions regarding how to use the Kernel GTP module from
your own software, or want to contribute to the code, please use the
osmocom-net-gprs mailing list for related discussion. The list can be
reached at osmocom-net-gprs@lists.osmocom.org and the mailman
interface for managing your subscription is at
https://lists.osmocom.org/mailman/listinfo/osmocom-net-gprs
Issue Tracker
=============
The Osmocom project maintains an issue tracker for the Kernel GTP-U
module at
https://osmocom.org/projects/linux-kernel-gtp-u/issues
History / Acknowledgements
==========================
The Module was originally created in 2012 by Harald Welte, but never
completed. Pablo came in to finish the mess Harald left behind. But
......@@ -139,9 +153,11 @@ In 2015, Andreas Schultz came to the rescue and fixed lots more bugs,
extended it with new features and finally pushed all of us to get it
mainline, where it was merged in 4.7.0.
Architectural Details
=====================
Local GTP-U entity and tunnel identification
--------------------------------------------
GTP-U uses UDP for transporting PDUs. The receiving UDP port is 2152
for GTPv1-U and 3386 for GTPv0-U.
......@@ -164,15 +180,15 @@ Therefore:
destination IP and the tunnel endpoint id. The source IP and port
have no meaning and can change at any time.
[3GPP TS 29.281] Section 4.3.0 defines this so::
	The TEID in the GTP-U header is used to de-multiplex traffic
	incoming from remote tunnel endpoints so that it is delivered to the
	User plane entities in a way that allows multiplexing of different
	users, different packet protocols and different QoS levels.
	Therefore no two remote GTP-U endpoints shall send traffic to a
	GTP-U protocol entity using the same TEID value except
	for data forwarding as part of mobility procedures.
The definition above only defines that two remote GTP-U endpoints
*should not* send to the same TEID, it *does not* forbid or exclude
......@@ -183,7 +199,8 @@ multiple or unknown peers.
Therefore, the receiving side identifies tunnels exclusively based on
TEIDs, not based on the source IP!
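Here is a minimal sketch of the receive-side rule. It parses the mandatory 8-byte GTPv1-U header (version, PT flag, message type, length, TEID) and selects the tunnel by TEID alone. The table layout and the ifindex field are hypothetical. Optional header fields (sequence number, N-PDU number, extension headers) are ignored, and the outer source IP is deliberately not an input.

```c
#include <stddef.h>
#include <stdint.h>

#define GTP1U_PORT 2152   /* GTPv1-U receive port, per the text above */
#define GTP_TPDU   0xFF   /* G-PDU message type: carries a user packet */

/* Hypothetical tunnel table entry. */
struct tunnel {
	uint32_t teid;     /* local TEID assigned to this tunnel */
	int ifindex;       /* network device the tunnel belongs to */
};

/* Parse the 8-byte GTPv1-U header; on success store the TEID and return 0. */
static int gtp1u_parse(const uint8_t *pkt, size_t len, uint32_t *teid)
{
	if (len < 8)
		return -1;
	if ((pkt[0] >> 5) != 1)      /* version: top 3 bits must be 1 */
		return -1;
	if (!(pkt[0] & 0x10))        /* PT bit: 1 = GTP, 0 = GTP' */
		return -1;
	if (pkt[1] != GTP_TPDU)      /* only user data is handled here */
		return -1;
	*teid = (uint32_t)pkt[4] << 24 | (uint32_t)pkt[5] << 16 |
		(uint32_t)pkt[6] << 8 | pkt[7];
	return 0;
}

/* Select a tunnel by TEID only; the sender's IP plays no part. */
static const struct tunnel *find_tunnel(const struct tunnel *tbl, size_t n,
					uint32_t teid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].teid == teid)
			return &tbl[i];
	return NULL;
}
```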
APN vs. Network Device
======================
The GTP-U driver creates a Linux network device for each Gi/SGi
interface.
......@@ -201,29 +218,33 @@ number of Gi/SGi interfaces implemented by a GGSN/P-GW.
[3GPP TS 29.061] Section 11.3 makes it clear that the selection of a
specific Gi/SGi interface is made through the Access Point Name
(APN)::
	2. each private network manages its own addressing. In general this
	will result in different private networks having overlapping
	address ranges. A logically separate connection (e.g. an IP in IP
	tunnel or layer 2 virtual circuit) is used between the GGSN/P-GW
	and each private network.
	In this case the IP address alone is not necessarily unique. The
	pair of values, Access Point Name (APN) and IPv4 address and/or
	IPv6 prefixes, is unique.
In order to support the overlapping address range use case, each APN
is mapped to a separate Gi/SGi interface (network device).
.. note::
   The Access Point Name is purely a control plane (GTP-C) concept.
   At the GTP-U level, only Tunnel Endpoint Identifiers are present in
   GTP-U packets and network devices are known
Therefore for a given UE the mapping in IP to PDN network is:
* network device + MS IP -> Peer IP + Peer TEID,
and from PDN to IP network:
* local GTP-U IP + TEID -> network device
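The two mappings can be sketched as two lookups over one hypothetical context table. The IP-to-GTP lookup is keyed on the network device as well as the MS IP, so two APNs can reuse the same MS address without conflict. IPv4 addresses are plain 32-bit integers here. The sketch assumes a single local GTP-U address, so the local IP drops out of the GTP-to-IP key.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PDP-context record; field names are illustrative. */
struct pdp_ctx {
	int dev;              /* Gi/SGi network device (one per APN) */
	uint32_t ms_ip;       /* MS/UE address inside the APN */
	uint32_t peer_ip;     /* remote GTP-U endpoint */
	uint32_t peer_teid;   /* TEID to use when sending to the peer */
	uint32_t local_teid;  /* TEID the peer uses when sending to us */
};

/* IP -> GTP: network device + MS IP selects peer IP + peer TEID. */
static const struct pdp_ctx *lookup_by_ms(const struct pdp_ctx *t, size_t n,
					  int dev, uint32_t ms_ip)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].dev == dev && t[i].ms_ip == ms_ip)
			return &t[i];
	return NULL;
}

/* GTP -> IP: the local TEID selects the context, and with it the device. */
static const struct pdp_ctx *lookup_by_teid(const struct pdp_ctx *t, size_t n,
					    uint32_t local_teid)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].local_teid == local_teid)
			return &t[i];
	return NULL;
}
```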
Furthermore, before a received T-PDU is injected into the network
......
.. SPDX-License-Identifier: GPL-2.0
============================================================
Linux Kernel Driver for Huawei Intelligent NIC(HiNIC) family
============================================================
......@@ -110,7 +113,7 @@ hinic_dev - de/constructs the Logical Tx and Rx Queues.
(hinic_main.c, hinic_dev.h)
Miscellaneous
=============
Common functions that are used by HW and Logical Device.
......
.. SPDX-License-Identifier: GPL-2.0
===================================
Identifier Locator Addressing (ILA)
===================================
Introduction
......@@ -26,11 +30,13 @@ The ILA protocol is described in Internet-Draft draft-herbert-intarea-ila.
ILA terminology
===============
- Identifier
A number that identifies an addressable node in the network
independent of its location. ILA identifiers are sixty-four
bit values.
- Locator
A network prefix that routes to a physical host. Locators
provide the topological location of an addressed node. ILA
locators are sixty-four bit prefixes.
......@@ -51,17 +57,20 @@ ILA terminology
bits) and an identifier (low order sixty-four bits). ILA
addresses are never visible to an application.
- ILA host
An end host that is capable of performing ILA translations
on transmit or receive.
- ILA router
A network node that performs ILA translation and forwarding
of translated packets.
- ILA forwarding cache
A type of ILA router that only maintains a working set
cache of mappings.
- ILA node
A network node capable of performing ILA translations. This
can be an ILA router, ILA forwarding cache, or ILA host.
......@@ -82,18 +91,18 @@ Configuration and datapath for these two points of deployment is somewhat
different.
The diagram below illustrates the flow of packets through ILA as well
as showing ILA hosts and routers::
+--------+ +--------+
| Host A +-+ +--->| Host B |
| | | (2) ILA (') | |
+--------+ | ...addressed.... ( ) +--------+
V +---+--+ . packet . +---+--+ (_)
(1) SIR | | ILA |----->-------->---->| ILA | | (3) SIR
addressed +->|router| . . |router|->-+ addressed
packet +---+--+ . IPv6 . +---+--+ packet
/ . Network .
/ . . +--+-++--------+
+--------+ / . . |ILA || Host |
| Host +--+ . .- -|host|| |
| | . . +--+-++--------+
......@@ -173,7 +182,7 @@ ILA address, never a SIR address.
In the simplest format the identifier types, C-bit, and checksum
adjustment value are not present so an identifier is considered an
unstructured sixty-four bit value::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier |
......@@ -184,7 +193,7 @@ unstructured sixty-four bit value.
The checksum neutral adjustment may be configured to always be
present using neutral-map-auto. In this case there is no C-bit, but the
checksum adjustment is in the low order 16 bits. The identifier is
still sixty-four bits::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier |
......@@ -193,7 +202,7 @@ still sixty-four bits.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
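Checksum-neutral mapping rests on ones' complement arithmetic. When the locator words change, a compensating change to one other 16-bit word leaves the sum unchanged, and therefore also the transport checksum over the pseudo-header. The sketch below shows only that arithmetic. It assumes the adjustment lives in the low-order 16 bits, as in the neutral-map-auto layout. It does not reproduce how ILA combines this with the C-bit or the identifier type.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement addition with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;
	return (uint16_t)((s & 0xFFFF) + (s >> 16));
}

/* Ones' complement sum of n 16-bit words. */
static uint16_t oc_sum(const uint16_t *w, size_t n)
{
	uint16_t s = 0;
	for (size_t i = 0; i < n; i++)
		s = oc_add(s, w[i]);
	return s;
}

/*
 * New value for the adjustment word when old_loc is replaced by new_loc:
 *   adj' = adj + sum(old_loc) - sum(new_loc)
 * where subtracting x in ones' complement is adding ~x.
 */
static uint16_t neutral_adjust(uint16_t adj, const uint16_t old_loc[4],
			       const uint16_t new_loc[4])
{
	return oc_add(oc_add(adj, oc_sum(old_loc, 4)),
		      (uint16_t)~oc_sum(new_loc, 4));
}
```

For example, rewriting the SIR prefix 3333:0:0:1 to the locator 2001:0:119:0 in an address whose low word is 0x0087 gives an adjustment word of 0x12A1. Both addresses then sum to 0x53BC.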
The C-bit may be used to explicitly indicate that checksum neutral
mapping has been applied to an ILA address. The format is::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |C| Identifier |
......@@ -204,7 +213,7 @@ mapping has been applied to an ILA address. The format is:
The identifier type field may be present to indicate the identifier
type. If it is not present then the type is inferred based on mapping
configuration. The checksum neutral adjustment may automatically be
used with the identifier type as illustrated below::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type| Identifier |
......@@ -213,7 +222,7 @@ used with the identifier type as illustrated below.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The identifier type and the C-bit can be present simultaneously, in
which case the identifier format is::
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type|C| Identifier |
......@@ -258,28 +267,30 @@ same meanings as described above.
Some examples
=============
::
# Configure an ILA route that uses checksum neutral mapping as well
# as type field. Note that the type field is set in the SIR address
# (the 2000 implies type is 1 which is LUID).
ip route add 3333:0:0:1:2000:0:1:87/128 encap ila 2001:0:87:0 \
csum-mode neutral-map ident-type use-format
# Configure an ILA LWT route that uses auto checksum neutral mapping
# (no C-bit) and configure identifier type to be LUID so that the
# identifier type field will not be present.
ip route add 3333:0:0:1:2000:0:2:87/128 encap ila 2001:0:87:1 \
csum-mode neutral-map-auto ident-type luid
ila_xlat configuration
# Configure an ILA to SIR mapping that matches a locator and overwrites
# it with a SIR address (3333:0:0:1 in this example). The C-bit and
# identifier field are used.
ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
csum-mode neutral-map ident-type use-format
# Configure an ILA to SIR mapping where checksum neutral is automatically
# set without the C-bit and the identifier type is configured to be LUID
# so that the identifier type field is not present.
ip ila add loc_match 2001:0:119:0 loc 3333:0:0:1 \
csum-mode neutral-map-auto ident-type luid
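The first route example says the 2000 in the SIR address "implies type is 1 which is LUID". The helpers below check that arithmetic. They assume the 3-bit type field and the C-bit sit at the top of the 64-bit identifier, as in the diagrams above. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assemble a 64-bit identifier from its four 16-bit groups,
 * e.g. 2000:0:1:87 -> 0x2000000000010087. */
static uint64_t ila_ident_from_groups(uint16_t g0, uint16_t g1,
				      uint16_t g2, uint16_t g3)
{
	return (uint64_t)g0 << 48 | (uint64_t)g1 << 32 |
	       (uint64_t)g2 << 16 | g3;
}

/* Identifier type: the top 3 bits. */
static unsigned ila_ident_type(uint64_t id)
{
	return (unsigned)(id >> 61);
}

/* C-bit: the bit right after the type field, when both are present. */
static unsigned ila_c_bit(uint64_t id)
{
	return (unsigned)((id >> 60) & 1);
}
```

For 2000:0:1:87 the top nibble 0x2 is binary 0010. Its top three bits give type 1, and the following bit (the C-bit position) is 0.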
......@@ -15,6 +15,7 @@ Contents:
device_drivers/index
dsa/index
devlink/index
caif/index
ethtool-netlink
ieee802154
j1939
......@@ -36,6 +37,43 @@ Contents:
tls-offload
nfc
6lowpan
6pack
altera_tse
arcnet-hardware
arcnet
atm
ax25
baycom
bonding
cdc_mbim
cops
cxacru
dccp
dctcp
decnet
defza
dns_resolver
driver
eql
fib_trie
filter
fore200e
framerelay
generic-hdlc
generic_netlink
gen_stats
gtp
hinic
ila
ipddp
ip_dynaddr
iphase
ipsec
ip-sysctl
ipv6
ipvlan
ipvs-sysctl
kcm
.. only:: subproject and html
......
.. SPDX-License-Identifier: GPL-2.0
=========
IP Sysctl
=========
/proc/sys/net/ipv4/* Variables
==============================
ip_forward - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
Forward Packets between interfaces.
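Every variable in this file is an ordinary file under /proc/sys. It can be read with sysctl(8), e.g. net.ipv4.ip_forward, or with plain file I/O. Here is a minimal C helper, assuming the value is a single integer:

```c
#include <stdio.h>

/* Read an integer sysctl such as "/proc/sys/net/ipv4/ip_forward".
 * Returns 0 on success, -1 if the file can't be opened or parsed. */
static int read_sysctl_int(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	int ok = fscanf(f, "%ld", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```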
......@@ -38,6 +45,7 @@ ip_no_pmtu_disc - INTEGER
could break other protocols.
Possible values: 0-3
Default: FALSE
min_pmtu - INTEGER
......@@ -51,16 +59,20 @@ ip_forward_use_pmtu - BOOLEAN
which tries to discover path mtus by itself and depends on the
kernel honoring this information. This is normally not the
case.
Default: 0 (disabled)
Possible values:
- 0 - disabled
- 1 - enabled
fwmark_reflect - BOOLEAN
Controls the fwmark of kernel-generated IPv4 reply packets that are not
associated with a socket (for example, TCP RSTs or ICMP echo replies).
If unset, these packets have a fwmark of zero. If set, they have the
fwmark of the packet they are replying to.
Default: 0
fib_multipath_use_neigh - BOOLEAN
......@@ -68,63 +80,80 @@ fib_multipath_use_neigh - BOOLEAN
multipath routes. If disabled, neighbor information is not used and
packets could be directed to a failed nexthop. Only valid for kernels
built with CONFIG_IP_ROUTE_MULTIPATH enabled.
Default: 0 (disabled)
Possible values:
- 0 - disabled
- 1 - enabled
fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes. Only valid
for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled.
Default: 0 (Layer 3)
Possible values:
- 0 - Layer 3
- 1 - Layer 4
- 2 - Layer 3 or inner Layer 3 if present
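To illustrate which fields each policy feeds into the hash, here is a toy model. The flow struct, the FNV-1a hash and the field order are all assumptions of this sketch. The kernel uses its own flow dissector and hash function, and picks a nexthop from the hash in its own way.

```c
#include <stdint.h>

/* Hypothetical flow description; not the kernel's flow_keys. */
struct flow_keys_sketch {
	uint32_t saddr, daddr;              /* outer IPv4 addresses */
	uint16_t sport, dport;              /* L4 ports */
	uint8_t proto;
	int has_inner;                      /* encapsulated packet? */
	uint32_t inner_saddr, inner_daddr;  /* inner IPv4 addresses */
};

/* Mix one 32-bit value into an FNV-1a hash, byte by byte. */
static uint32_t fnv1a(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xFF;
		h *= 16777619u;
	}
	return h;
}

static uint32_t mp_hash(const struct flow_keys_sketch *k, int policy)
{
	uint32_t h = 2166136261u;

	if (policy == 1) {                  /* Layer 4: 5-tuple */
		h = fnv1a(h, k->saddr);
		h = fnv1a(h, k->daddr);
		h = fnv1a(h, ((uint32_t)k->sport << 16) | k->dport);
		h = fnv1a(h, k->proto);
	} else if (policy == 2 && k->has_inner) {  /* inner Layer 3 */
		h = fnv1a(h, k->inner_saddr);
		h = fnv1a(h, k->inner_daddr);
	} else {                            /* 0, or 2 without inner: Layer 3 */
		h = fnv1a(h, k->saddr);
		h = fnv1a(h, k->daddr);
	}
	return h;
}
```

Under policy 0, two connections between the same hosts always hash alike and share a nexthop. Policy 1 lets them spread across nexthops by port.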
fib_sync_mem - UNSIGNED INTEGER
Amount of dirty memory from fib entries that can be backlogged before
synchronize_rcu is forced.
Default: 512kB Minimum: 64kB Maximum: 64MB
ip_forward_update_priority - INTEGER
Whether to update SKB priority from "TOS" field in IPv4 header after it
is forwarded. The new SKB priority is mapped from TOS field value
according to an rt_tos2priority table (see e.g. man tc-prio).
Default: 1 (Update priority.)
Possible values:
- 0 - Do not update priority.
- 1 - Update priority.
route/max_size - INTEGER
Maximum number of routes allowed in the kernel. Increase
this when using large numbers of interfaces and/or routes.
From linux kernel 3.6 onwards, this is deprecated for ipv4
as route cache is no longer used.
neigh/default/gc_thresh1 - INTEGER
Minimum number of entries to keep. Garbage collector will not
purge entries if there are fewer than this number.
Default: 128
neigh/default/gc_thresh2 - INTEGER
Threshold when garbage collector becomes more aggressive about
purging entries. Entries older than 5 seconds will be cleared
when over this number.
Default: 512
neigh/default/gc_thresh3 - INTEGER
Maximum number of non-PERMANENT neighbor entries allowed. Increase
this when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
Default: 1024
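The three gc_thresh values can be read as bands. The sketch below is a simplified model built only from the descriptions above. It ignores the 5-second age check, timers and PERMANENT entries, and it is not the kernel's neighbour garbage collector.

```c
/* Which band an entry count falls into (simplified model). */
enum gc_action {
	GC_NONE,        /* below gc_thresh1: nothing is purged */
	GC_NORMAL,      /* gc_thresh1..gc_thresh2: ordinary collection */
	GC_AGGRESSIVE,  /* above gc_thresh2: entries older than 5 s are cleared */
	GC_TABLE_FULL   /* at gc_thresh3: no room for more non-PERMANENT entries */
};

struct gc_thresh {
	int t1, t2, t3;
};

static enum gc_action neigh_gc_action(const struct gc_thresh *th, int entries)
{
	if (entries >= th->t3)
		return GC_TABLE_FULL;
	if (entries > th->t2)
		return GC_AGGRESSIVE;
	if (entries < th->t1)
		return GC_NONE;
	return GC_NORMAL;
}
```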
neigh/default/unres_qlen_bytes - INTEGER
The maximum number of bytes which may be used by packets
queued for each unresolved address by other network layers.
(added in linux 3.3)
Setting negative value is meaningless and will return error.
Default: SK_WMEM_MAX, (same as net.core.wmem_default).
Exact value depends on architecture and kernel options,
but should be enough to allow queuing 256 packets
of medium size.
......@@ -132,11 +161,14 @@ neigh/default/unres_qlen_bytes - INTEGER
neigh/default/unres_qlen - INTEGER
The maximum number of packets which may be queued for each
unresolved address by other network layers.
(deprecated in linux 3.3) : use unres_qlen_bytes instead.
Prior to linux 3.3, the default value is 3 which may cause
unexpected packet loss. The current default value is calculated
according to default value of unres_qlen_bytes and true size of
packet.
Default: 101
mtu_expires - INTEGER
......@@ -183,7 +215,8 @@ ipfrag_max_dist - INTEGER
from different IP datagrams, which could result in data corruption.
Default: 64
INET peer storage
=================
inet_peer_threshold - INTEGER
The approximate size of the storage. Starting from this threshold
......@@ -203,7 +236,8 @@ inet_peer_maxttl - INTEGER
when the number of entries in the pool is very small).
Measured in seconds.
TCP variables
=============
somaxconn - INTEGER
Limit of socket listen() backlog, known in userspace as SOMAXCONN.
......@@ -222,18 +256,22 @@ tcp_adv_win_scale - INTEGER
Count buffering overhead as bytes/2^tcp_adv_win_scale
(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
if it is <= 0.
Possible values are [-31, 31], inclusive.
Default: 1
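The formula above determines how much of a receive buffer is offered as window, with the rest counted as buffering overhead. Here is a small helper written from that description, not copied from the kernel:

```c
#include <stdint.h>

/* Bytes of `space` advertised as window for a given tcp_adv_win_scale:
 * scale > 0:  window = space - space/2^scale
 * scale <= 0: window = space/2^(-scale)   (scale in [-31, 31]) */
static uint64_t win_from_space(uint64_t space, int scale)
{
	if (scale <= 0)
		return space >> (-scale);
	return space - (space >> scale);
}
```

With 64 KiB of buffer, the default scale of 1 advertises 32 KiB, scale 2 advertises 48 KiB, and scale -1 advertises 32 KiB.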
tcp_allowed_congestion_control - STRING
Show/set the congestion control choices available to non-privileged
processes. The list is a subset of those listed in
tcp_available_congestion_control.
The default is "reno" plus the default setting (tcp_congestion_control).
tcp_app_win - INTEGER
Reserve max(window/2^tcp_app_win, mss) of window for application
buffer. Value 0 is special, it means that nothing is reserved.
Default: 31
tcp_autocorking - BOOLEAN
......@@ -244,6 +282,7 @@ tcp_autocorking - BOOLEAN
packet for the flow is waiting in Qdisc queues or device transmit
queue. Applications can still use TCP_CORK for optimal behavior
when they know how/when to uncork their sockets.
Default: 1
tcp_available_congestion_control - STRING
......@@ -265,6 +304,7 @@ tcp_mtu_probe_floor - INTEGER
tcp_min_snd_mss - INTEGER
TCP SYN and SYNACK messages usually advertise an ADVMSS option,
as described in RFC 1122 and RFC 6691.
If this ADVMSS option is smaller than tcp_min_snd_mss,
it is silently capped to tcp_min_snd_mss.
......@@ -277,6 +317,7 @@ tcp_congestion_control - STRING
Default is set as part of kernel configuration.
For passive connections, the listener congestion control choice
is inherited.
[see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ]
tcp_dsack - BOOLEAN
......@@ -286,9 +327,12 @@ tcp_early_retrans - INTEGER
Tail loss probe (TLP) converts RTOs occurring due to tail
losses into fast recovery (draft-ietf-tcpm-rack). Note that
TLP requires RACK to function properly (see tcp_recovery below)
Possible values:
0 disables TLP
3 or 4 enables TLP
- 0 disables TLP
- 3 or 4 enables TLP
Default: 3
tcp_ecn - INTEGER
@@ -297,12 +341,17 @@ tcp_ecn - INTEGER
support for it. This feature is useful in avoiding losses due
to congestion by allowing supporting routers to signal
congestion before having to drop packets.
Possible values are:
0 Disable ECN. Neither initiate nor accept ECN.
1 Enable ECN when requested by incoming connections and
also request ECN on outgoing connection attempts.
2 Enable ECN when requested by incoming connections
but do not request ECN on outgoing connections.
= =====================================================
0 Disable ECN. Neither initiate nor accept ECN.
1 Enable ECN when requested by incoming connections and
also request ECN on outgoing connection attempts.
2 Enable ECN when requested by incoming connections
but do not request ECN on outgoing connections.
= =====================================================
Default: 2
tcp_ecn_fallback - BOOLEAN
@@ -312,6 +361,7 @@ tcp_ecn_fallback - BOOLEAN
additional detection mechanisms could be implemented under this
knob. The value is not used, if tcp_ecn or per route (or congestion
control) ECN settings are disabled.
Default: 1 (fallback enabled)
tcp_fack - BOOLEAN
@@ -324,7 +374,9 @@ tcp_fin_timeout - INTEGER
valid "receive only" state for an un-orphaned connection, an
orphaned connection in FIN_WAIT_2 state could otherwise wait
forever for the remote to close its end of the connection.
Cf. tcp_max_orphans
Default: 60 seconds
tcp_frto - INTEGER
@@ -390,7 +442,8 @@ tcp_l3mdev_accept - BOOLEAN
derived from the listen socket to be bound to the L3 domain in
which the packets originated. Only valid when the kernel was
compiled with CONFIG_NET_L3_MASTER_DEV.
Default: 0 (disabled)
Default: 0 (disabled)
tcp_low_latency - BOOLEAN
This is a legacy option, it has no effect anymore.
@@ -410,10 +463,14 @@ tcp_max_orphans - INTEGER
tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests (SYN_RECV),
which have not received an acknowledgment from connecting client.
This is a per-listener limit.
The minimal value is 128 for low-memory machines, and it will
increase in proportion to the memory of the machine.
If the server suffers from overload, try increasing this number.
Remember to also check /proc/sys/net/core/somaxconn.
A SYN_RECV request socket consumes about 304 bytes of memory.
@@ -445,7 +502,9 @@ tcp_min_rtt_wlen - INTEGER
minimum RTT when it is moved to a longer path (e.g., due to traffic
engineering). A longer window makes the filter more resistant to RTT
inflations such as transient congestion. The unit is seconds.
Possible values: 0 - 86400 (1 day)
Default: 300
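The trade-off described above can be illustrated with an exact sliding-window minimum built on a monotonic deque. The class below is a hypothetical sketch; the kernel's own filter may use a more compact approximation, so read it only as a demonstration of how the window length decides when a higher minimum RTT is picked up.

```python
from collections import deque


class WindowedMinRtt:
    """Exact sliding-window minimum of RTT samples (illustrative model).

    Samples older than `wlen_s` seconds are forgotten, so a path change
    that raises the true minimum RTT is picked up after one window.
    """

    def __init__(self, wlen_s: float = 300.0):
        if not 0 <= wlen_s <= 86400:
            raise ValueError("window must be 0 - 86400 seconds")
        self.wlen_s = wlen_s
        # (timestamp, rtt) pairs with strictly increasing rtt from left to right.
        self.samples = deque()

    def update(self, now: float, rtt: float) -> float:
        # A sample no smaller than the new one can never be the minimum again.
        while self.samples and self.samples[-1][1] >= rtt:
            self.samples.pop()
        self.samples.append((now, rtt))
        # Expire samples that fell out of the window; the newest always stays.
        while self.samples[0][0] < now - self.wlen_s:
            self.samples.popleft()
        return self.samples[0][1]


f = WindowedMinRtt(wlen_s=10)
print(f.update(0, 20), f.update(5, 50), f.update(11, 60))  # 20 20 50
```

After the flow moves to a path with a 50 ms floor, the estimate stays at 20 until the old sample ages out of the 10-second window.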
tcp_moderate_rcvbuf - BOOLEAN
@@ -457,9 +516,10 @@ tcp_moderate_rcvbuf - BOOLEAN
tcp_mtu_probing - INTEGER
Controls TCP Packetization-Layer Path MTU Discovery. Takes three
values:
0 - Disabled
1 - Disabled by default, enabled when an ICMP black hole detected
2 - Always enabled, use initial MSS of tcp_base_mss.
- 0 - Disabled
- 1 - Disabled by default, enabled when an ICMP black hole detected
- 2 - Always enabled, use initial MSS of tcp_base_mss.
tcp_probe_interval - UNSIGNED INTEGER
Controls how often to start TCP Packetization-Layer Path MTU
@@ -481,6 +541,7 @@ tcp_no_metrics_save - BOOLEAN
tcp_no_ssthresh_metrics_save - BOOLEAN
Controls whether TCP saves ssthresh metrics in the route cache.
Default is 1, which disables ssthresh metrics.
tcp_orphan_retries - INTEGER
@@ -489,6 +550,7 @@ tcp_orphan_retries - INTEGER
See tcp_retries2 for more details.
The default value is 8.
If your machine is a loaded web server,
you should think about lowering this value, because such sockets
may consume significant resources. Cf. tcp_max_orphans.
@@ -497,11 +559,15 @@ tcp_recovery - INTEGER
This value is a bitmap to enable various experimental loss recovery
features.
RACK: 0x1 enables the RACK loss detection for fast detection of lost
retransmissions and tail drops. It also subsumes and disables
RFC6675 recovery for SACK connections.
RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
RACK: 0x4 disables RACK's DUPACK threshold heuristic
========= =============================================================
RACK: 0x1 enables the RACK loss detection for fast detection of lost
retransmissions and tail drops. It also subsumes and disables
RFC6675 recovery for SACK connections.
RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
RACK: 0x4 disables RACK's DUPACK threshold heuristic
========= =============================================================
Default: 0x1
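Because tcp_recovery is a bitmap, a setting is the OR of the bits listed above. The hypothetical helper below decodes a value into the documented features. Rejecting undocumented bits is a choice made for this sketch, not a statement about how the kernel treats them.

```python
from typing import List

# Descriptions paraphrase the table above; they are not kernel identifiers.
TCP_RECOVERY_BITS = {
    0x1: "RACK loss detection",
    0x2: "static RACK reordering window (min_rtt/4)",
    0x4: "RACK DUPACK threshold heuristic disabled",
}
KNOWN_MASK = 0x1 | 0x2 | 0x4


def decode_tcp_recovery(value: int) -> List[str]:
    """List the documented features enabled by a tcp_recovery value."""
    if value < 0 or value & ~KNOWN_MASK:
        raise ValueError(f"undocumented bits set: {value:#x}")
    return [desc for bit, desc in TCP_RECOVERY_BITS.items() if value & bit]


print(decode_tcp_recovery(0x1))  # ['RACK loss detection']
print(decode_tcp_recovery(0x5))
```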
@@ -509,12 +575,14 @@ tcp_reordering - INTEGER
Initial reordering level of packets in a TCP stream.
TCP stack can then dynamically adjust flow reordering level
between this initial value and tcp_max_reordering
Default: 3
tcp_max_reordering - INTEGER
Maximal reordering level of packets in a TCP stream.
300 is a fairly conservative value, but you might increase it
if paths are using per packet load balancing (like bonding rr mode)
Default: 300
tcp_retrans_collapse - BOOLEAN
@@ -550,12 +618,14 @@ tcp_rfc1337 - BOOLEAN
If set, the TCP stack behaves conforming to RFC1337. If unset,
we are not conforming to RFC, but prevent TCP TIME_WAIT
assassination.
Default: 0
tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets.
It is guaranteed to each TCP socket, even under moderate memory
pressure.
Default: 4K
default: initial size of receive buffer used by TCP sockets.
@@ -592,12 +662,14 @@ tcp_slow_start_after_idle - BOOLEAN
window after an idle period. An idle period is defined as
the current RTO. If unset, the congestion window will not
be timed out after an idle period.
Default: 1
tcp_stdurg - BOOLEAN
Use the Host requirements interpretation of the TCP urgent pointer field.
Most hosts use the older BSD interpretation, so if you turn this on
Linux might not communicate correctly with them.
Default: FALSE
tcp_synack_retries - INTEGER
@@ -646,15 +718,18 @@ tcp_fastopen - INTEGER
the option value being the length of the syn-data backlog.
The values (bitmap) are
0x1: (client) enables sending data in the opening SYN on the client.
0x2: (server) enables the server support, i.e., allowing data in
===== ======== ======================================================
0x1 (client) enables sending data in the opening SYN on the client.
0x2 (server) enables the server support, i.e., allowing data in
a SYN packet to be accepted and passed to the
application before 3-way handshake finishes.
0x4: (client) send data in the opening SYN regardless of cookie
0x4 (client) send data in the opening SYN regardless of cookie
availability and without a cookie option.
0x200: (server) accept data-in-SYN w/o any cookie option present.
0x400: (server) enable all listeners to support Fast Open by
0x200 (server) accept data-in-SYN without any cookie option present.
0x400 (server) enable all listeners to support Fast Open by
default without explicit TCP_FASTOPEN socket option.
===== ======== ======================================================
Default: 0x1
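To enable several behaviours at once, OR the bits together; the resulting integer is what you would write to /proc/sys/net/ipv4/tcp_fastopen (net.ipv4.tcp_fastopen). A minimal sketch using Python's `enum.IntFlag`, where the member names are made up for readability and are not kernel identifiers:

```python
from enum import IntFlag


class FastOpen(IntFlag):
    """tcp_fastopen bits from the table above (member names are hypothetical)."""
    CLIENT = 0x1                  # send data in the opening SYN
    SERVER = 0x2                  # accept SYN data before the handshake ends
    CLIENT_NO_COOKIE = 0x4        # client: send SYN data without a cookie
    SERVER_NO_COOKIE = 0x200      # server: accept SYN data without a cookie option
    SERVER_ALL_LISTENERS = 0x400  # server: Fast Open on every listener by default


# Enable both client and server support.
value = FastOpen.CLIENT | FastOpen.SERVER
print(hex(value))  # 0x3

# Decode an existing setting.
current = FastOpen(0x401)
print(bool(current & FastOpen.SERVER_ALL_LISTENERS))  # True
```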
@@ -668,6 +743,7 @@ tcp_fastopen_blackhole_timeout_sec - INTEGER
get detected right after Fastopen is re-enabled and will reset to
initial value when the blackhole issue goes away.
0 to disable the blackhole detection.
By default, it is set to 1 hour.
tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
@@ -698,20 +774,24 @@ tcp_syn_retries - INTEGER
for an active TCP connection attempt will happen after 127 seconds.
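The 127-second figure follows from exponential backoff. Assuming a 1-second initial retransmission timeout that doubles after every attempt, and ignoring any cap on a single timeout, the default of 6 retries gives 1 + 2 + 4 + ... + 64 = 2^7 - 1 = 127 seconds. The helper name below is hypothetical:

```python
def syn_give_up_seconds(retries: int, initial_rto: int = 1) -> int:
    """Time until an active open is abandoned under pure exponential backoff.

    The initial SYN and each of `retries` retransmissions wait twice as
    long as the previous one: initial_rto * (1 + 2 + ... + 2**retries).
    """
    return sum(initial_rto * 2**i for i in range(retries + 1))


print(syn_give_up_seconds(6))  # 127
```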
tcp_timestamps - INTEGER
Enable timestamps as defined in RFC1323.
0: Disabled.
1: Enable timestamps as defined in RFC1323 and use random offset for
each connection rather than only using the current time.
2: Like 1, but without random offsets.
Enable timestamps as defined in RFC1323.
- 0: Disabled.
- 1: Enable timestamps as defined in RFC1323 and use random offset for
each connection rather than only using the current time.
- 2: Like 1, but without random offsets.
Default: 1
tcp_min_tso_segs - INTEGER
Minimal number of segments per TSO frame.
Since Linux 3.12, TCP automatically sizes TSO frames
depending on the flow rate, instead of filling 64 KB packets.
For specific uses, it is possible to force TCP to build big
TSO frames. Note that the TCP stack might split TSO packets that
are too big if the available window is too small.
Default: 2
tcp_pacing_ss_ratio - INTEGER
@@ -720,6 +800,7 @@ tcp_pacing_ss_ratio - INTEGER
If TCP is in slow start, tcp_pacing_ss_ratio is applied
to let TCP probe for bigger speeds, assuming cwnd can be
doubled every other RTT.
Default: 200
tcp_pacing_ca_ratio - INTEGER
@@ -727,6 +808,7 @@ tcp_pacing_ca_ratio - INTEGER
to current rate. (current_rate = cwnd * mss / srtt)
If TCP is in congestion avoidance phase, tcp_pacing_ca_ratio
is applied to conservatively probe for bigger throughput.
Default: 120
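Both ratios scale the same base rate, current_rate = cwnd * mss / srtt, as a percentage. The sketch below uses hypothetical flow values and integer arithmetic to show that the default ratios pace a slow-start flow at twice, and a congestion-avoidance flow at 1.2 times, its current rate. The helper name is made up for illustration.

```python
def pacing_rate(cwnd: int, mss: int, srtt_us: int, ratio_pct: int) -> int:
    """Pacing rate in bytes per second, as ratio_pct% of cwnd * mss / srtt."""
    current_rate = cwnd * mss * 1_000_000 // srtt_us  # bytes/s; srtt in microseconds
    return current_rate * ratio_pct // 100


# Hypothetical flow: 10 segments of 1448 bytes, 10 ms smoothed RTT.
print(pacing_rate(10, 1448, 10_000, 200))  # slow start: 2896000
print(pacing_rate(10, 1448, 10_000, 120))  # congestion avoidance: 1737600
```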
tcp_tso_win_divisor - INTEGER
@@ -734,16 +816,20 @@ tcp_tso_win_divisor - INTEGER
can be consumed by a single TSO frame.
The setting of this parameter is a choice between burstiness and
building larger TSO frames.
Default: 3
tcp_tw_reuse - INTEGER
Enable reuse of TIME-WAIT sockets for new connections when it is
safe from protocol viewpoint.
0 - disable
1 - global enable
2 - enable for loopback traffic only
- 0 - disable
- 1 - global enable
- 2 - enable for loopback traffic only
It should not be changed without advice/request of technical
experts.
Default: 2
tcp_window_scaling - BOOLEAN
@@ -752,11 +838,14 @@ tcp_window_scaling - BOOLEAN
tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets.
Every TCP socket is entitled to this amount as soon as it is created.
Default: 4K
default: initial size of send buffer used by TCP sockets. This
value overrides net.core.wmem_default used by other protocols.
It is usually lower than net.core.wmem_default.
Default: 16K
max: Maximal amount of memory allowed for automatically tuned
@@ -764,6 +853,7 @@ tcp_wmem - vector of 3 INTEGERs: min, default, max
net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
automatic tuning of that socket's send buffer size, in which case
this value is ignored.
Default: between 64K and 4MB, depending on RAM size.
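tcp_rmem and tcp_wmem are exposed as one line of three whitespace-separated integers. A minimal parsing sketch; the sample string uses the documented 4K minimum and 16K default with a 4MB maximum, and the helper and field names are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class BufferLimits(NamedTuple):
    min: int      # bytes guaranteed to each socket
    default: int  # initial buffer size
    max: int      # ceiling for automatic tuning


def parse_buffer_vector(text: str) -> BufferLimits:
    """Parse the 'min default max' triple used by tcp_rmem and tcp_wmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 integers, got {len(fields)}")
    return BufferLimits(*(int(f) for f in fields))


print(parse_buffer_vector("4096\t16384\t4194304\n"))

# On a Linux host, read the live setting if it is available.
try:
    print(parse_buffer_vector(Path("/proc/sys/net/ipv4/tcp_wmem").read_text()))
except OSError:
    pass  # not Linux, or /proc is not readable here
```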
tcp_notsent_lowat - UNSIGNED INTEGER
@@ -784,6 +874,7 @@ tcp_workaround_signed_windows - BOOLEAN
remote TCP is broken and treats the window as a signed quantity.
If unset, assume the remote TCP is not broken even if we do
not receive a window scaling option from them.
Default: 0
tcp_thin_linear_timeouts - BOOLEAN
@@ -796,6 +887,7 @@ tcp_thin_linear_timeouts - BOOLEAN
non-aggressive thin streams, often found to be time-dependent.
For more information on thin streams, see
Documentation/networking/tcp-thin.txt
Default: 0
tcp_limit_output_bytes - INTEGER
@@ -807,6 +899,7 @@ tcp_limit_output_bytes - INTEGER
flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes
limits the number of bytes on qdisc or device to reduce artificial
RTT/cwnd and reduce bufferbloat.
Default: 1048576 (16 * 65536)
tcp_challenge_ack_limit - INTEGER
@@ -822,7 +915,8 @@ tcp_rx_skb_cache - BOOLEAN
Default: 0 (disabled)
UDP variables:
UDP variables
=============
udp_l3mdev_accept - BOOLEAN
Enabling this option allows a "global" bound socket to work
@@ -830,7 +924,8 @@ udp_l3mdev_accept - BOOLEAN
being received regardless of the L3 domain in which they
originated. Only valid when the kernel was compiled with
CONFIG_NET_L3_MASTER_DEV.
Default: 0 (disabled)
Default: 0 (disabled)
udp_mem - vector of 3 INTEGERs: min, pressure, max
Number of pages allowed for queueing by all UDP sockets.
@@ -849,15 +944,18 @@ udp_rmem_min - INTEGER
Minimal size of receive buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for receiving data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
Default: 4K
udp_wmem_min - INTEGER
Minimal size of send buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for sending data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
Default: 4K
RAW variables:
RAW variables
=============
raw_l3mdev_accept - BOOLEAN
Enabling this option allows a "global" bound socket to work
@@ -865,9 +963,11 @@ raw_l3mdev_accept - BOOLEAN
being received regardless of the L3 domain in which they
originated. Only valid when the kernel was compiled with
CONFIG_NET_L3_MASTER_DEV.
Default: 1 (enabled)
CIPSOv4 Variables:
CIPSOv4 Variables
=================
cipso_cache_enable - BOOLEAN
If set, enable additions to and lookups from the CIPSO label mapping
@@ -875,6 +975,7 @@ cipso_cache_enable - BOOLEAN
miss. However, regardless of the setting the cache is still
invalidated when required, which means you can safely toggle this on and
off and the cache will always be "safe".
Default: 1
cipso_cache_bucket_size - INTEGER
@@ -884,6 +985,7 @@ cipso_cache_bucket_size - INTEGER
more CIPSO label mappings that can be cached. When the number of
entries in a given hash bucket reaches this limit, adding new entries
causes the oldest entry in the bucket to be removed to make room.
Default: 10
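The eviction rule for cipso_cache_bucket_size can be modelled as a hash table of bounded queues. This is a hypothetical sketch of the policy described above (adding to a full bucket drops that bucket's oldest entry), not the NetLabel implementation; the default bucket count is an arbitrary choice.

```python
from collections import deque
from typing import Hashable, Optional


class BucketedCache:
    """Fixed-size hash table whose buckets hold at most `bucket_size` entries.

    Illustrative model only: inserting into a full bucket silently drops
    that bucket's oldest entry to make room.
    """

    def __init__(self, num_buckets: int = 128, bucket_size: int = 10):
        # deque(maxlen=...) discards from the opposite end once full.
        self.buckets = [deque(maxlen=bucket_size) for _ in range(num_buckets)]

    def _bucket(self, key: Hashable) -> deque:
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key: Hashable, value: object) -> None:
        # Newest entries go on the left, so the oldest falls off the right.
        self._bucket(key).appendleft((key, value))

    def lookup(self, key: Hashable) -> Optional[object]:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None


cache = BucketedCache(num_buckets=1, bucket_size=2)  # force a single bucket
for label in ("a", "b", "c"):
    cache.add(label, label.upper())
print(cache.lookup("a"), cache.lookup("c"))  # None C
```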
cipso_rbm_optfmt - BOOLEAN
@@ -891,6 +993,7 @@ cipso_rbm_optfmt - BOOLEAN
the CIPSO draft specification (see Documentation/netlabel for details).
This means that when set the CIPSO tag will be padded with empty
categories in order to make the packet data 32-bit aligned.
Default: 0
cipso_rbm_structvalid - BOOLEAN
@@ -900,9 +1003,11 @@ cipso_rbm_structvalid - BOOLEAN
where in the CIPSO processing code but setting this to 0 (False) should
result in less work (i.e. it should be faster) but could cause problems
with other implementations that require strict checking.
Default: 0
IP Variables:
IP Variables
============
ip_local_port_range - 2 INTEGERS
Defines the local port range that is used by TCP and UDP to
@@ -931,12 +1036,12 @@ ip_local_reserved_ports - list of comma separated ranges
assignments.
You can reserve ports which are not in the current
ip_local_port_range, e.g.:
ip_local_port_range, e.g.::
$ cat /proc/sys/net/ipv4/ip_local_port_range
32000 60999
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
8080,9148
$ cat /proc/sys/net/ipv4/ip_local_port_range
32000 60999
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
8080,9148
although this is redundant. However such a setting is useful
if later the port range is changed to a value that will
@@ -956,6 +1061,7 @@ ip_unprivileged_port_start - INTEGER
ip_nonlocal_bind - BOOLEAN
If set, allows processes to bind() to non-local IP addresses,
which can be quite useful - but may break some applications.
Default: 0
ip_autobind_reuse - BOOLEAN
@@ -972,6 +1078,7 @@ ip_dynaddr - BOOLEAN
If set to a non-zero value larger than 1, a kernel log
message will be printed when dynamic address rewriting
occurs.
Default: 0
ip_early_demux - BOOLEAN
@@ -981,6 +1088,7 @@ ip_early_demux - BOOLEAN
It may add an additional cost for pure routing workloads that
reduces overall throughput, in such case you should disable it.
Default: 1
ping_group_range - 2 INTEGERS
@@ -992,21 +1100,25 @@ ping_group_range - 2 INTEGERS
tcp_early_demux - BOOLEAN
Enable early demux for established TCP sockets.
Default: 1
udp_early_demux - BOOLEAN
Enable early demux for connected UDP sockets. Disable this if
your system could experience more unconnected load.
Default: 1
icmp_echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it.
Default: 0
icmp_echo_ignore_broadcasts - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO and
TIMESTAMP requests sent to it via broadcast/multicast.
Default: 1
icmp_ratelimit - INTEGER
@@ -1016,46 +1128,55 @@ icmp_ratelimit - INTEGER
otherwise the minimal space between responses in milliseconds.
Note that another sysctl, icmp_msgs_per_sec limits the number
of ICMP packets sent on all targets.
Default: 1000
icmp_msgs_per_sec - INTEGER
Limit maximal number of ICMP packets sent per second from this host.
Only messages whose type matches icmp_ratemask (see below) are
controlled by this limit.
Default: 1000
icmp_msgs_burst - INTEGER
icmp_msgs_per_sec controls number of ICMP packets sent per second,
while icmp_msgs_burst controls the burst size of these packets.
Default: 50
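One way to picture how icmp_msgs_per_sec and icmp_msgs_burst interact is a token bucket that refills at the per-second rate and holds at most a burst's worth of tokens. The sketch below is that generic model with times in milliseconds; the kernel's own accounting may differ in detail, and only ICMP types selected by icmp_ratemask would be charged against such a limit. The class name is hypothetical.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/s, holds at most `burst`.

    Illustrative model of how a per-second limit and a burst size interact;
    it is not a copy of the kernel's ICMP limiter.
    """

    def __init__(self, rate: int = 1000, burst: int = 50):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.last_ms = 0

    def allow(self, now_ms: int) -> bool:
        elapsed = max(0, now_ms - self.last_ms)
        self.last_ms = max(self.last_ms, now_ms)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate / 1000)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1000, burst=50)  # the documented defaults
print(sum(bucket.allow(0) for _ in range(60)))   # 50: the burst is spent at once
print(sum(bucket.allow(10) for _ in range(60)))  # 10: 10 ms refills 10 tokens
```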
icmp_ratemask - INTEGER
Mask made of ICMP types for which rates are being limited.
Significant bits: IHGFEDCBA9876543210
Default mask: 0000001100000011000 (6168)
Bit definitions (see include/linux/icmp.h):
= =========================
0 Echo Reply
3 Destination Unreachable *
4 Source Quench *
3 Destination Unreachable [1]_
4 Source Quench [1]_
5 Redirect
8 Echo Request
B Time Exceeded *
C Parameter Problem *
B Time Exceeded [1]_
C Parameter Problem [1]_
D Timestamp Request
E Timestamp Reply
F Info Request
G Info Reply
H Address Mask Request
I Address Mask Reply
= =========================
* These are rate limited by default (see default mask above)
.. [1] These are rate limited by default (see default mask above)
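The default mask can be rebuilt from the entries marked as rate limited by default: each ICMP type sets the bit with the same number, and the letters in the "Significant bits" row continue after 9, so B is bit 11 and C is bit 12. A short sketch with hypothetical helper names:

```python
# ICMP types marked as rate limited by default in the table above.
DEFAULT_RATE_LIMITED = {
    3: "Destination Unreachable",
    4: "Source Quench",
    11: "Time Exceeded",      # 'B'
    12: "Parameter Problem",  # 'C'
}


def build_ratemask(icmp_types) -> int:
    """OR together one bit per ICMP type (types 0-18 are significant)."""
    mask = 0
    for icmp_type in icmp_types:
        if not 0 <= icmp_type <= 18:
            raise ValueError(f"ICMP type {icmp_type} is outside the mask")
        mask |= 1 << icmp_type
    return mask


def is_rate_limited(mask: int, icmp_type: int) -> bool:
    return bool(mask >> icmp_type & 1)


mask = build_ratemask(DEFAULT_RATE_LIMITED)
print(mask, format(mask, "019b"))  # 6168 0000001100000011000
print(is_rate_limited(mask, 8))    # False: Echo Request is not limited
```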
icmp_ignore_bogus_error_responses - BOOLEAN
Some routers violate RFC1122 by sending bogus responses to broadcast
frames. Such violations are normally logged via a kernel warning.
If this is set to TRUE, the kernel will not give such warnings, which
will avoid log file clutter.
Default: 1
icmp_errors_use_inbound_ifaddr - BOOLEAN
@@ -1100,32 +1221,39 @@ igmp_max_memberships - INTEGER
igmp_max_msf - INTEGER
Maximum number of addresses allowed in the source filter list for a
multicast group.
Default: 10
igmp_qrv - INTEGER
Controls the IGMP query robustness variable (see RFC2236 8.1).
Default: 2 (as specified by RFC2236 8.1)
Minimum: 1 (as specified by RFC6636 4.5)
force_igmp_version - INTEGER
0 - (default) No enforcement of a IGMP version, IGMPv1/v2 fallback
allowed. Will back to IGMPv3 mode again if all IGMPv1/v2 Querier
Present timer expires.
1 - Enforce to use IGMP version 1. Will also reply IGMPv1 report if
receive IGMPv2/v3 query.
2 - Enforce to use IGMP version 2. Will fallback to IGMPv1 if receive
IGMPv1 query message. Will reply report if receive IGMPv3 query.
3 - Enforce to use IGMP version 3. The same react with default 0.
- 0 - (default) No enforcement of an IGMP version; IGMPv1/v2 fallback
allowed. Switches back to IGMPv3 mode once all IGMPv1/v2 Querier
Present timers expire.
- 1 - Enforce IGMP version 1. Also replies with an IGMPv1 report when
receiving an IGMPv2/v3 query.
- 2 - Enforce IGMP version 2. Falls back to IGMPv1 when receiving an
IGMPv1 query message. Replies with a report when receiving an IGMPv3 query.
- 3 - Enforce IGMP version 3. Reacts the same way as the default 0.
.. note::
Note: this is not the same with force_mld_version because IGMPv3 RFC3376
Security Considerations does not have clear description that we could
ignore other version messages completely as MLDv2 RFC3810. So make
this value as default 0 is recommended.
this is not the same as force_mld_version, because the Security
Considerations section of the IGMPv3 RFC 3376 does not clearly state that
other version messages may be ignored completely, as MLDv2 RFC 3810 does.
Keeping this value at the default 0 is therefore recommended.
conf/interface/* changes special settings per interface (where
"interface" is the name of your network interface)
``conf/interface/*``
changes special settings per interface (where
"interface" is the name of your network interface)
conf/all/* is special, changes the settings for all interfaces
``conf/all/*``
is special, changes the settings for all interfaces
log_martians - BOOLEAN
Log packets with impossible addresses to kernel log.
@@ -1136,14 +1264,21 @@ log_martians - BOOLEAN
accept_redirects - BOOLEAN
Accept ICMP redirect messages.
accept_redirects for the interface will be enabled if:
- both conf/{all,interface}/accept_redirects are TRUE in the case
forwarding for the interface is enabled
or
- at least one of conf/{all,interface}/accept_redirects is TRUE in the
case forwarding for the interface is disabled
accept_redirects for the interface will be disabled otherwise
default TRUE (host)
FALSE (router)
default:
- TRUE (host)
- FALSE (router)
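The combination rule for accept_redirects reads naturally as a boolean function: logical AND of the two switches when the interface forwards, logical OR otherwise. A hypothetical helper that prints the full truth table:

```python
def accept_redirects_enabled(all_on: bool, iface_on: bool, forwarding: bool) -> bool:
    """Effective accept_redirects for one interface, per the rule above.

    all_on   -- conf/all/accept_redirects
    iface_on -- conf/<interface>/accept_redirects
    """
    if forwarding:
        # Forwarding enabled: both switches must be on.
        return all_on and iface_on
    # Forwarding disabled: either switch is enough.
    return all_on or iface_on


for forwarding in (True, False):
    for all_on in (False, True):
        for iface_on in (False, True):
            print(forwarding, all_on, iface_on,
                  accept_redirects_enabled(all_on, iface_on, forwarding))
```

Entries that say "at least one of conf/{all,interface}/... is set to TRUE", such as proxy_arp, follow the OR branch alone.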
forwarding - BOOLEAN
Enable IP forwarding on this interface. This controls whether packets
@@ -1168,12 +1303,14 @@ medium_id - INTEGER
proxy_arp - BOOLEAN
Do proxy arp.
proxy_arp for the interface will be enabled if at least one of
conf/{all,interface}/proxy_arp is set to TRUE,
it will be disabled otherwise
proxy_arp_pvlan - BOOLEAN
Private VLAN proxy arp.
Basically allow proxy arp replies back to the same interface
(from which the ARP request/solicitation was received).
@@ -1186,6 +1323,7 @@ proxy_arp_pvlan - BOOLEAN
proxy_arp.
This technology is known by different names:
In RFC 3069 it is called VLAN Aggregation.
Cisco and Allied Telesyn call it Private VLAN.
Hewlett-Packard call it Source-Port filtering or port-isolation.
@@ -1194,26 +1332,33 @@ proxy_arp_pvlan - BOOLEAN
shared_media - BOOLEAN
Send (router) or accept (host) RFC1620 shared media redirects.
Overrides secure_redirects.
shared_media for the interface will be enabled if at least one of
conf/{all,interface}/shared_media is set to TRUE,
it will be disabled otherwise
default TRUE
secure_redirects - BOOLEAN
Accept ICMP redirect messages only to gateways listed in the
interface's current gateway list. Even if disabled, RFC1122 redirect
rules still apply.
Overridden by shared_media.
secure_redirects for the interface will be enabled if at least one of
conf/{all,interface}/secure_redirects is set to TRUE,
it will be disabled otherwise
default TRUE
send_redirects - BOOLEAN
Send redirects, if router.
send_redirects for the interface will be enabled if at least one of
conf/{all,interface}/send_redirects is set to TRUE,
it will be disabled otherwise
Default: TRUE
bootp_relay - BOOLEAN
@@ -1222,15 +1367,20 @@ bootp_relay - BOOLEAN
BOOTP relay daemon will catch and forward such packets.
conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay
for the interface
default FALSE
Not Implemented Yet.
accept_source_route - BOOLEAN
Accept packets with SRR option.
conf/all/accept_source_route must also be set to TRUE to accept packets
with SRR option on the interface
default TRUE (router)
FALSE (host)
default
- TRUE (router)
- FALSE (host)
accept_local - BOOLEAN
Accept packets with local source addresses. In combination with
@@ -1241,18 +1391,19 @@ accept_local - BOOLEAN
route_localnet - BOOLEAN
Do not consider loopback addresses as martian source or destination
while routing. This enables the use of 127/8 for local routing purposes.
default FALSE
rp_filter - INTEGER
0 - No source validation.
1 - Strict mode as defined in RFC3704 Strict Reverse Path
Each incoming packet is tested against the FIB and if the interface
is not the best reverse path the packet check will fail.
By default failed packets are discarded.
2 - Loose mode as defined in RFC3704 Loose Reverse Path
Each incoming packet's source address is also tested against the FIB
and if the source address is not reachable via any interface
the packet check will fail.
- 0 - No source validation.
- 1 - Strict mode as defined in RFC3704 Strict Reverse Path
Each incoming packet is tested against the FIB and if the interface
is not the best reverse path the packet check will fail.
By default failed packets are discarded.
- 2 - Loose mode as defined in RFC3704 Loose Reverse Path
Each incoming packet's source address is also tested against the FIB
and if the source address is not reachable via any interface
the packet check will fail.
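The difference between strict and loose mode can be seen with a toy longest-prefix-match table. Everything here (prefixes, interface names, helpers) is hypothetical, and real FIB lookups also involve policy routing, multipath and other details this sketch ignores.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: (destination prefix, outgoing interface).
FIB = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth1"),
    (ipaddress.ip_network("192.168.0.0/16"), "eth0"),
    (ipaddress.ip_network("192.168.1.0/24"), "eth2"),
]


def route_to(addr: ipaddress.IPv4Address) -> Optional[str]:
    """Longest-prefix match: the interface we would use to reach addr."""
    matches = [(net.prefixlen, dev) for net, dev in FIB if addr in net]
    return max(matches)[1] if matches else None


def rp_filter_accepts(src: str, in_iface: str, mode: int) -> bool:
    """Reverse-path check for a packet from `src` arriving on `in_iface`."""
    out_iface = route_to(ipaddress.ip_address(src))
    if mode == 0:  # no source validation
        return True
    if mode == 1:  # strict: the best route back must use the incoming interface
        return out_iface == in_iface
    if mode == 2:  # loose: the source must be reachable via any interface
        return out_iface is not None
    raise ValueError("rp_filter is 0, 1 or 2")


print(rp_filter_accepts("10.1.2.3", "eth0", 1))    # False: best route is eth1
print(rp_filter_accepts("10.1.2.3", "eth0", 2))    # True: reachable somewhere
print(rp_filter_accepts("172.16.0.1", "eth0", 2))  # False: no route at all
```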
Current recommended practice in RFC3704 is to enable strict mode
to prevent IP spoofing from DDoS attacks. If using asymmetric routing
@@ -1265,19 +1416,19 @@ rp_filter - INTEGER
in startup scripts.
arp_filter - BOOLEAN
1 - Allows you to have multiple network interfaces on the same
subnet, and have the ARPs for each interface be answered
based on whether or not the kernel would route a packet from
the ARP'd IP out that interface (therefore you must use source
based routing for this to work). In other words it allows control
of which cards (usually 1) will respond to an arp request.
0 - (default) The kernel can respond to arp requests with addresses
from other interfaces. This may seem wrong but it usually makes
sense, because it increases the chance of successful communication.
IP addresses are owned by the complete host on Linux, not by
particular interfaces. Only for more complex setups like load-
balancing, does this behaviour cause problems.
- 1 - Allows you to have multiple network interfaces on the same
subnet, and have the ARPs for each interface be answered
based on whether or not the kernel would route a packet from
the ARP'd IP out that interface (therefore you must use source
based routing for this to work). In other words it allows control
of which cards (usually 1) will respond to an arp request.
- 0 - (default) The kernel can respond to arp requests with addresses
from other interfaces. This may seem wrong but it usually makes
sense, because it increases the chance of successful communication.
IP addresses are owned by the complete host on Linux, not by
particular interfaces. Only in more complex setups, such as load balancing,
does this behaviour cause problems.
arp_filter for the interface will be enabled if at least one of
conf/{all,interface}/arp_filter is set to TRUE,
@@ -1287,26 +1438,27 @@ arp_announce - INTEGER
Define different restriction levels for announcing the local
source IP address from IP packets in ARP requests sent on
interface:
0 - (default) Use any local address, configured on any interface
1 - Try to avoid local addresses that are not in the target's
subnet for this interface. This mode is useful when target
hosts reachable via this interface require the source IP
address in ARP requests to be part of their logical network
configured on the receiving interface. When we generate the
request we will check all our subnets that include the
target IP and will preserve the source address if it is from
such subnet. If there is no such subnet we select source
address according to the rules for level 2.
2 - Always use the best local address for this target.
In this mode we ignore the source address in the IP packet
and try to select local address that we prefer for talks with
the target host. Such local address is selected by looking
for primary IP addresses on all our subnets on the outgoing
interface that include the target IP address. If no suitable
local address is found we select the first local address
we have on the outgoing interface or on all other interfaces,
with the hope we will receive reply for our request and
even sometimes no matter the source IP address we announce.
- 0 - (default) Use any local address, configured on any interface
- 1 - Try to avoid local addresses that are not in the target's
subnet for this interface. This mode is useful when target
hosts reachable via this interface require the source IP
address in ARP requests to be part of their logical network
configured on the receiving interface. When we generate the
request we will check all our subnets that include the
target IP and will preserve the source address if it is from
such a subnet. If there is no such subnet, we select the source
address according to the rules for level 2.
- 2 - Always use the best local address for this target.
In this mode we ignore the source address in the IP packet
and try to select a local address that we prefer for talking to
the target host. Such a local address is selected by looking
for primary IP addresses on all our subnets on the outgoing
interface that include the target IP address. If no suitable
local address is found, we select the first local address
we have on the outgoing interface or on all other interfaces,
in the hope that we will receive a reply to our request,
sometimes even regardless of the source IP address we announce.
The max value from conf/{all,interface}/arp_announce is used.
@@ -1317,32 +1469,37 @@ arp_announce - INTEGER
arp_ignore - INTEGER
Define different modes for sending replies in response to
received ARP requests that resolve local target IP addresses:
0 - (default): reply for any local target IP address, configured
on any interface
1 - reply only if the target IP address is local address
configured on the incoming interface
2 - reply only if the target IP address is local address
configured on the incoming interface and both with the
sender's IP address are part from same subnet on this interface
3 - do not reply for local addresses configured with scope host,
only resolutions for global and link addresses are replied
4-7 - reserved
8 - do not reply for all local addresses
- 0 - (default): reply for any local target IP address, configured
on any interface
- 1 - reply only if the target IP address is a local address
configured on the incoming interface
- 2 - reply only if the target IP address is a local address
configured on the incoming interface and both it and the
sender's IP address are part of the same subnet on this interface
- 3 - do not reply for local addresses configured with scope host;
only resolutions for global and link addresses are answered
- 4-7 - reserved
- 8 - do not reply for any local address
The max value from conf/{all,interface}/arp_ignore is used
when ARP request is received on the {interface}
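A hypothetical sketch of how arp_ignore could be evaluated: the effective mode is the larger of the conf/all and per-interface values, as stated above, and modes 0, 1, 2 and 8 are checked against a made-up address table. Mode 3 depends on address scope and the reserved values 4-7 are deliberately not modelled.

```python
import ipaddress
from typing import Dict, List

IfaceAddrs = Dict[str, List[ipaddress.IPv4Interface]]


def effective_arp_ignore(all_value: int, iface_value: int) -> int:
    # The larger of conf/all and conf/<interface> wins.
    return max(all_value, iface_value)


def arp_reply(mode: int, target: str, sender: str, in_iface: str, addrs: IfaceAddrs) -> bool:
    """Decide whether to answer an ARP request for `target` (modes 0, 1, 2, 8 only)."""
    tip = ipaddress.ip_address(target)
    sip = ipaddress.ip_address(sender)
    if not any(a.ip == tip for lst in addrs.values() for a in lst):
        return False  # not a local address at all
    on_incoming = [a for a in addrs.get(in_iface, []) if a.ip == tip]
    if mode == 0:
        return True
    if mode == 1:
        return bool(on_incoming)
    if mode == 2:
        return any(sip in a.network for a in on_incoming)
    if mode == 8:
        return False
    raise NotImplementedError(f"arp_ignore={mode} is not modelled here")


addrs = {
    "eth0": [ipaddress.ip_interface("192.168.1.10/24")],
    "eth1": [ipaddress.ip_interface("10.0.0.1/8")],
}
mode = effective_arp_ignore(all_value=0, iface_value=1)
print(arp_reply(mode, "10.0.0.1", "192.168.1.99", "eth0", addrs))  # False
```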
arp_notify - BOOLEAN
Define mode for notification of address and device changes.
0 - (default): do nothing
1 - Generate gratuitous arp requests when device is brought up
or hardware address changes.
== ==========================================================
0 (default): do nothing
1 Generate gratuitous arp requests when device is brought up
or hardware address changes.
== ==========================================================
arp_accept - BOOLEAN
Define behavior for gratuitous ARP frames whose IP is not
already present in the ARP table:
0 - don't create new entries in the ARP table
1 - create new entries in the ARP table
- 0 - don't create new entries in the ARP table
- 1 - create new entries in the ARP table
Both gratuitous ARP replies and requests will trigger the
ARP table to be updated, if this setting is on.
@@ -1378,11 +1535,13 @@ disable_xfrm - BOOLEAN
igmpv2_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
IGMPv1 or IGMPv2 report retransmit will take place.
Default: 10000 (10 seconds)
igmpv3_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
IGMPv3 report retransmit will take place.
Default: 1000 (1 second)
promote_secondaries - BOOLEAN
@@ -1393,19 +1552,23 @@ promote_secondaries - BOOLEAN
drop_unicast_in_l2_multicast - BOOLEAN
Drop any unicast IP packets that are received in link-layer
multicast (or broadcast) frames.
This behavior (for multicast) is actually a SHOULD in RFC
1122, but is disabled by default for compatibility reasons.
Default: off (0)
drop_gratuitous_arp - BOOLEAN
Drop all gratuitous ARP frames, for example if there's a known
good ARP proxy on the network and such frames need not be used
(or in the case of 802.11, must not be used to prevent attacks.)
Default: off (0)
tag - INTEGER
Allows you to write a number, which can be used as required.
Default value is 0.
xfrm4_gc_thresh - INTEGER
@@ -1417,21 +1580,24 @@ xfrm4_gc_thresh - INTEGER
igmp_link_local_mcast_reports - BOOLEAN
Enable IGMP reports for link local multicast groups in the
224.0.0.X range.
Default TRUE
Alexey Kuznetsov.
kuznet@ms2.inr.ac.ru
Updated by:
Andi Kleen
ak@muc.de
Nicolas Delon
delon.nicolas@wanadoo.fr
- Andi Kleen
ak@muc.de
- Nicolas Delon
delon.nicolas@wanadoo.fr
/proc/sys/net/ipv6/* Variables:
/proc/sys/net/ipv6/* Variables
==============================
IPv6 has no global variables such as tcp_*. tcp_* settings under ipv4/ also
apply to IPv6 [XXX?].
@@ -1440,8 +1606,9 @@ bindv6only - BOOLEAN
Default value for IPV6_V6ONLY socket option,
which restricts use of the IPv6 socket to IPv6 communication
only.
- TRUE: disable IPv4-mapped address feature
- FALSE: enable IPv4-mapped address feature
Default: FALSE (as specified in RFC3493)
......@@ -1449,8 +1616,10 @@ flowlabel_consistency - BOOLEAN
Protect the consistency (and unicity) of flow label.
You have to disable it to use IPV6_FL_F_REFLECT flag on the
flow label manager.
- TRUE: enabled
- FALSE: disabled
Default: TRUE
auto_flowlabels - INTEGER
......@@ -1458,22 +1627,28 @@ auto_flowlabels - INTEGER
packet. This allows intermediate devices, such as routers, to
identify packet flows for mechanisms like Equal Cost Multipath
Routing (see RFC 6438).
= ===========================================================
0 automatic flow labels are completely disabled
1 automatic flow labels are enabled by default, they can be
disabled on a per socket basis using the IPV6_AUTOFLOWLABEL
socket option
2 automatic flow labels are allowed, they may be enabled on a
per socket basis using the IPV6_AUTOFLOWLABEL socket option
3 automatic flow labels are enabled and enforced, they cannot
be disabled by the socket option
= ===========================================================
Default: 1
flowlabel_state_ranges - BOOLEAN
Split the flow label number space into two ranges. 0-0x7FFFF is
reserved for the IPv6 flow manager facility, 0x80000-0xFFFFF
is reserved for stateless flow labels as described in RFC6437.
- TRUE: enabled
- FALSE: disabled
Default: true
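As an illustration of the split, the shell function below classifies a flow label value. It is a sketch, not a kernel interface: the name ``flowlabel_range`` is hypothetical, and it only encodes the two ranges described above plus the 20-bit width of the IPv6 flow label field.

```shell
# Sketch: classify an IPv6 flow label according to the ranges used when
# flowlabel_state_ranges is enabled. flowlabel_range is a hypothetical name.
flowlabel_range() {
    fl=$(( $1 ))                      # accepts decimal or 0x-prefixed hex
    if [ "$fl" -lt 0 ] || [ "$fl" -gt $(( 0xFFFFF )) ]; then
        echo "invalid"                # the flow label field is 20 bits wide
    elif [ "$fl" -le $(( 0x7FFFF )) ]; then
        echo "flow-manager"           # 0-0x7FFFF: IPv6 flow manager facility
    else
        echo "stateless"              # 0x80000-0xFFFFF: stateless (RFC6437)
    fi
}
```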
flowlabel_reflect - INTEGER
......@@ -1483,49 +1658,59 @@ flowlabel_reflect - INTEGER
https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
This is a bitmask.
- 1: enabled for established flows
Note that this prevents automatic flowlabel changes, as done
in "tcp: change IPv6 flow-label upon receiving spurious retransmission"
and "tcp: Change txhash on every SYN and RTO retransmit"
- 2: enabled for TCP RESET packets (no active listener)
If set, a RST packet sent in response to a SYN packet on a closed
port will reflect the incoming flow label.
- 4: enabled for ICMPv6 echo reply messages.
Default: 0
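Because ``flowlabel_reflect`` is a bitmask, the features listed above are combined by OR-ing their values. A minimal sketch in shell, assuming hypothetical feature names (``established``, ``tcp_reset``, ``echo_reply``) that are not part of any kernel interface:

```shell
# Sketch: build a flowlabel_reflect bitmask from (hypothetical) feature names.
flowlabel_reflect_mask() {
    mask=0
    for feature in "$@"; do
        case "$feature" in
            established) mask=$(( mask | 1 )) ;;  # established flows
            tcp_reset)   mask=$(( mask | 2 )) ;;  # RST with no active listener
            echo_reply)  mask=$(( mask | 4 )) ;;  # ICMPv6 echo replies
            *) echo "unknown feature: $feature" >&2; return 1 ;;
        esac
    done
    echo "$mask"
}
```

For example, ``sysctl -w net.ipv6.flowlabel_reflect=$(flowlabel_reflect_mask tcp_reset echo_reply)`` would write 6, enabling reflection for TCP RESET packets and ICMPv6 echo replies.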
fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes.
Default: 0 (Layer 3)
Possible values:
- 0 - Layer 3 (source and destination addresses plus flow label)
- 1 - Layer 4 (standard 5-tuple)
- 2 - Layer 3 or inner Layer 3 if present
anycast_src_echo_reply - BOOLEAN
Controls the use of anycast addresses as source addresses for ICMPv6
echo reply
- TRUE: enabled
- FALSE: disabled
Default: FALSE
idgen_delay - INTEGER
Controls the delay in seconds after which time to retry
privacy stable address generation if a DAD conflict is
detected.
Default: 1 (as specified in RFC7217)
idgen_retries - INTEGER
Controls the number of retries to generate a stable privacy
address if a DAD conflict is detected.
Default: 3 (as specified in RFC7217)
mld_qrv - INTEGER
Controls the MLD query robustness variable (see RFC3810 9.1).
Default: 2 (as specified by RFC3810 9.1)
Minimum: 1 (as specified by RFC6636 4.5)
max_dst_opts_number - INTEGER
......@@ -1533,6 +1718,7 @@ max_dst_opts_number - INTEGER
options extension header. If this value is less than zero
then unknown options are disallowed and the number of known
TLVs allowed is the absolute value of this number.
Default: 8
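The sign convention shared by ``max_dst_opts_number`` and ``max_hbh_opts_number`` can be summarised with a small shell function. This is an illustrative sketch only; ``describe_opts_limit`` and its output format are hypothetical:

```shell
# Sketch: interpret a max_dst_opts_number / max_hbh_opts_number value.
describe_opts_limit() {
    n=$1
    if [ "$n" -lt 0 ]; then
        # Negative: unknown options are disallowed and the number of known
        # TLVs allowed is the absolute value.
        echo "unknown=reject max=$(( -n ))"
    else
        echo "unknown=allow max=$n"
    fi
}
```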
max_hbh_opts_number - INTEGER
......@@ -1540,16 +1726,19 @@ max_hbh_opts_number - INTEGER
options extension header. If this value is less than zero
then unknown options are disallowed and the number of known
TLVs allowed is the absolute value of this number.
Default: 8
max_dst_opts_length - INTEGER
Maximum length allowed for a Destination options extension
header.
Default: INT_MAX (unlimited)
max_hbh_length - INTEGER
Maximum length allowed for a Hop-by-Hop options extension
header.
Default: INT_MAX (unlimited)
skip_notify_on_dev_down - BOOLEAN
......@@ -1558,6 +1747,7 @@ skip_notify_on_dev_down - BOOLEAN
generate this message; IPv6 does by default. Setting this sysctl
to true skips the message, making IPv4 and IPv6 on par in relying
on userspace caches to track link events and evict routes.
Default: false (generate message)
nexthop_compat_mode - BOOLEAN
......@@ -1592,18 +1782,20 @@ seg6_flowlabel - INTEGER
Controls the behaviour of computing the flowlabel of outer
IPv6 header in case of SR T.encaps
== =======================================================
-1 set flowlabel to zero.
0 copy flowlabel from Inner packet in case of Inner IPv6
(Set flowlabel to 0 in case IPv4/L2)
1 Compute the flowlabel using seg6_make_flowlabel()
== =======================================================
Default is 0.
``conf/default/*``:
Change the interface-specific default settings.
``conf/all/*``:
Change all the interface-specific settings.
[XXX: Other special features than forwarding?]
......@@ -1627,9 +1819,10 @@ fwmark_reflect - BOOLEAN
associated with a socket for example, TCP RSTs or ICMPv6 echo replies).
If unset, these packets have a fwmark of zero. If set, they have the
fwmark of the packet they are replying to.
Default: 0
``conf/interface/*``:
Change special settings per interface.
The functional behaviour for certain settings is different
......@@ -1644,31 +1837,40 @@ accept_ra - INTEGER
transmitted.
Possible values are:
== ===========================================================
0 Do not accept Router Advertisements.
1 Accept Router Advertisements if forwarding is disabled.
2 Overrule forwarding behaviour. Accept Router Advertisements
even if forwarding is enabled.
== ===========================================================
Functional default:
- enabled if local forwarding is disabled.
- disabled if local forwarding is enabled.
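The interaction between ``accept_ra`` and local forwarding in the table above can be written as a small decision function. A sketch only: ``ra_accepted`` is a hypothetical helper, it takes the interface's ``accept_ra`` and ``forwarding`` values as arguments, and it handles only the three documented values:

```shell
# Sketch: are Router Advertisements accepted, given accept_ra and the
# interface's forwarding setting? Only the documented values 0-2 are handled.
ra_accepted() {
    accept_ra=$1
    forwarding=$2
    case "$accept_ra" in
        0) echo no ;;
        1) if [ "$forwarding" -eq 0 ]; then echo yes; else echo no; fi ;;
        2) echo yes ;;                # overrules forwarding behaviour
        *) echo unknown ;;
    esac
}
```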
accept_ra_defrtr - BOOLEAN
Learn default router in Router Advertisement.
Functional default:
- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
accept_ra_from_local - BOOLEAN
Accept RA with source-address that is found on local machine
if the RA is otherwise proper and able to be accepted.
The default is NOT to accept these, as they may indicate an unintended
network loop.
Functional default:
- enabled if accept_ra_from_local is enabled
on a specific interface.
- disabled if accept_ra_from_local is disabled
on a specific interface.
accept_ra_min_hop_limit - INTEGER
Minimum hop limit Information in Router Advertisement.
......@@ -1681,8 +1883,10 @@ accept_ra_min_hop_limit - INTEGER
accept_ra_pinfo - BOOLEAN
Learn Prefix Information in Router Advertisement.
Functional default:
- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
accept_ra_rt_info_min_plen - INTEGER
Minimum prefix length of Route Information in RA.
......@@ -1690,8 +1894,10 @@ accept_ra_rt_info_min_plen - INTEGER
Route Information w/ prefix smaller than this variable shall
be ignored.
Functional default:
* 0 if accept_ra_rtr_pref is enabled.
* -1 if accept_ra_rtr_pref is disabled.
accept_ra_rt_info_max_plen - INTEGER
Maximum prefix length of Route Information in RA.
......@@ -1699,33 +1905,41 @@ accept_ra_rt_info_max_plen - INTEGER
Route Information w/ prefix larger than this variable shall
be ignored.
Functional default:
* 0 if accept_ra_rtr_pref is enabled.
* -1 if accept_ra_rtr_pref is disabled.
accept_ra_rtr_pref - BOOLEAN
Accept Router Preference in RA.
Functional default:
- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
accept_ra_mtu - BOOLEAN
Apply the MTU value specified in RA option 5 (RFC4861). If
disabled, the MTU specified in the RA will be ignored.
Functional default:
- enabled if accept_ra is enabled.
- disabled if accept_ra is disabled.
accept_redirects - BOOLEAN
Accept Redirects.
Functional default:
- enabled if local forwarding is disabled.
- disabled if local forwarding is enabled.
accept_source_route - INTEGER
Accept source routing (routing extension header).
- >= 0: Accept only routing header type 2.
- < 0: Do not accept routing header.
Default: 0
......@@ -1733,24 +1947,30 @@ autoconf - BOOLEAN
Autoconfigure addresses using Prefix Information in Router
Advertisements.
Functional default:
- enabled if accept_ra_pinfo is enabled.
- disabled if accept_ra_pinfo is disabled.
dad_transmits - INTEGER
The number of Duplicate Address Detection probes to send.
Default: 1
forwarding - INTEGER
Configure interface-specific Host/Router behaviour.
.. note::
It is recommended to have the same setting on all
interfaces; mixed router/host scenarios are rather uncommon.
Possible values are:
- 0 Forwarding disabled
- 1 Forwarding enabled
**FALSE (0)**:
By default, Host behaviour is assumed. This means:
......@@ -1761,7 +1981,7 @@ forwarding - INTEGER
Advertisements (and do autoconfiguration).
4. If accept_redirects is TRUE (default), accept Redirects.
**TRUE (1)**:
If local forwarding is enabled, Router behaviour is assumed.
This means exactly the reverse from the above:
......@@ -1772,19 +1992,22 @@ forwarding - INTEGER
4. Redirects are ignored.
Default: 0 (disabled) if global forwarding is disabled (default),
otherwise 1 (enabled).
hop_limit - INTEGER
Default Hop Limit to set.
Default: 64
mtu - INTEGER
Default Maximum Transmission Unit
Default: 1280 (IPv6 required minimum)
ip_nonlocal_bind - BOOLEAN
If set, allows processes to bind() to non-local IPv6 addresses,
which can be quite useful - but may break some applications.
Default: 0
router_probe_interval - INTEGER
......@@ -1796,15 +2019,18 @@ router_probe_interval - INTEGER
router_solicitation_delay - INTEGER
Number of seconds to wait after interface is brought up
before sending Router Solicitations.
Default: 1
router_solicitation_interval - INTEGER
Number of seconds to wait between Router Solicitations.
Default: 4
router_solicitations - INTEGER
Number of Router Solicitations to send until assuming no
routers are present.
Default: 3
use_oif_addrs_only - BOOLEAN
......@@ -1816,28 +2042,35 @@ use_oif_addrs_only - BOOLEAN
use_tempaddr - INTEGER
Preference for Privacy Extensions (RFC3041).
* <= 0 : disable Privacy Extensions
* == 1 : enable Privacy Extensions, but prefer public
addresses over temporary addresses.
* > 1 : enable Privacy Extensions and prefer temporary
addresses over public addresses.
Default:
* 0 (for most devices)
* -1 (for point-to-point devices and loopback devices)
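The three ranges of ``use_tempaddr`` map to behaviour as in this sketch; ``tempaddr_mode`` and its output strings are hypothetical and only restate the list above:

```shell
# Sketch: summarise a use_tempaddr value (<= 0, == 1, > 1).
tempaddr_mode() {
    v=$1
    if [ "$v" -le 0 ]; then
        echo "disabled"
    elif [ "$v" -eq 1 ]; then
        echo "enabled, prefer public"
    else
        echo "enabled, prefer temporary"
    fi
}
```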
temp_valid_lft - INTEGER
Valid lifetime (in seconds) for temporary addresses.
Default: 604800 (7 days)
temp_prefered_lft - INTEGER
Preferred lifetime (in seconds) for temporary addresses.
Default: 86400 (1 day)
keep_addr_on_down - INTEGER
Keep all IPv6 addresses on an interface down event. If set, static
global addresses with no expiration time are not flushed.
* >0 : enabled
* 0 : system default
* <0 : disabled
Default: 0 (addresses are removed)
......@@ -1846,11 +2079,13 @@ max_desync_factor - INTEGER
that ensures that clients don't synchronize with each
other and generate new addresses at exactly the same time.
The value is in seconds.
Default: 600
regen_max_retry - INTEGER
Number of attempts before giving up on generating
valid temporary addresses.
Default: 5
max_addresses - INTEGER
......@@ -1858,12 +2093,14 @@ max_addresses - INTEGER
to zero disables the limitation. It is not recommended to set this
value too large (or to zero) because it would be an easy way to
crash the kernel by allowing too many addresses to be created.
Default: 16
disable_ipv6 - BOOLEAN
Disable IPv6 operation. If accept_dad is set to 2, this value
will be dynamically set to TRUE if DAD fails for the link-local
address.
Default: FALSE (enable IPv6 operation)
When this value is changed from 1 to 0 (IPv6 is being enabled),
......@@ -1877,10 +2114,13 @@ disable_ipv6 - BOOLEAN
accept_dad - INTEGER
Whether to accept DAD (Duplicate Address Detection).
== ==============================================================
0 Disable DAD
1 Enable DAD (default)
2 Enable DAD, and disable IPv6 operation if MAC-based duplicate
link-local address has been found.
== ==============================================================
DAD operation and mode on a given interface will be selected according
to the maximum value of conf/{all,interface}/accept_dad.
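Since the effective DAD mode is the larger of the two settings, a per-interface value of 0 does not disable DAD when ``conf/all/accept_dad`` is 1. A minimal sketch of that rule; ``effective_accept_dad`` is a hypothetical helper and ``eth0`` in the usage note is a placeholder interface name:

```shell
# Sketch: the effective accept_dad is max(conf/all, conf/<interface>).
effective_accept_dad() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

For example: ``effective_accept_dad "$(cat /proc/sys/net/ipv6/conf/all/accept_dad)" "$(cat /proc/sys/net/ipv6/conf/eth0/accept_dad)"``.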
......@@ -1888,6 +2128,7 @@ accept_dad - INTEGER
force_tllao - BOOLEAN
Enable sending the target link-layer address option even when
responding to a unicast neighbor solicitation.
Default: FALSE
Quoting from RFC 2461, section 4.4, Target link-layer address:
......@@ -1905,9 +2146,10 @@ force_tllao - BOOLEAN
ndisc_notify - BOOLEAN
Define mode for notification of address and device changes.
* 0 - (default): do nothing
* 1 - Generate unsolicited neighbour advertisements when device is brought
up or hardware address changes.
ndisc_tclass - INTEGER
The IPv6 Traffic Class to use by default when sending IPv6 Neighbor
......@@ -1916,33 +2158,38 @@ ndisc_tclass - INTEGER
These 8 bits can be interpreted as 6 high order bits holding the DSCP
value and 2 low order bits representing ECN (which you probably want
to leave cleared).
* 0 - (default)
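Since the DSCP value occupies the 6 high-order bits and ECN the 2 low-order bits, a DSCP code point must be shifted left by two before it is written to ``ndisc_tclass``. A sketch, assuming a hypothetical helper ``make_tclass``; DSCP 46 (Expedited Forwarding) is used only as an example:

```shell
# Sketch: compose a Traffic Class byte from DSCP (6 high bits) and ECN
# (2 low bits). ECN defaults to 0, i.e. left cleared.
make_tclass() {
    dscp=$1
    ecn=${2:-0}
    if [ "$dscp" -lt 0 ] || [ "$dscp" -gt 63 ] || [ "$ecn" -lt 0 ] || [ "$ecn" -gt 3 ]; then
        echo "out of range" >&2
        return 1
    fi
    echo $(( (dscp << 2) | ecn ))
}
```

For instance, ``make_tclass 46`` prints 184, which could then be written with ``sysctl -w net.ipv6.conf.eth0.ndisc_tclass=184`` (``eth0`` being a placeholder interface name).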
mldv1_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
MLDv1 report retransmit will take place.
Default: 10000 (10 seconds)
mldv2_unsolicited_report_interval - INTEGER
The interval in milliseconds in which the next unsolicited
MLDv2 report retransmit will take place.
Default: 1000 (1 second)
force_mld_version - INTEGER
* 0 - (default) No enforcement of an MLD version, MLDv1 fallback allowed
* 1 - Enforce the use of MLD version 1
* 2 - Enforce the use of MLD version 2
suppress_frag_ndisc - INTEGER
Control RFC 6980 (Security Implications of IPv6 Fragmentation
with IPv6 Neighbor Discovery) behavior:
* 1 - (default) discard fragmented neighbor discovery packets
* 0 - allow fragmented neighbor discovery packets
optimistic_dad - BOOLEAN
Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
* 0: disabled (default)
* 1: enabled
Optimistic Duplicate Address Detection for the interface will be enabled
if at least one of conf/{all,interface}/optimistic_dad is set to 1,
......@@ -1953,8 +2200,9 @@ use_optimistic - BOOLEAN
source address selection. Preferred addresses will still be chosen
before optimistic addresses, subject to other ranking in the source
address selection algorithm.
* 0: disabled (default)
* 1: enabled
This will be enabled if at least one of
conf/{all,interface}/use_optimistic is set to 1, disabled otherwise.
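Both ``optimistic_dad`` and ``use_optimistic`` follow an "at least one of" rule across ``conf/all`` and ``conf/<interface>``. A sketch of that rule, with ``optimistic_in_effect`` as a hypothetical helper name:

```shell
# Sketch: the feature is enabled if either the conf/all value ($1) or the
# conf/<interface> value ($2) is 1, and disabled otherwise.
optimistic_in_effect() {
    if [ "$1" -eq 1 ] || [ "$2" -eq 1 ]; then echo enabled; else echo disabled; fi
}
```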
......@@ -1976,12 +2224,14 @@ stable_secret - IPv6 address
addr_gen_mode - INTEGER
Defines how link-local and autoconf addresses are generated.
= =================================================================
0 generate address based on EUI64 (default)
1 do not generate a link-local address, use EUI64 for addresses
generated from autoconf
2 generate stable privacy addresses, using the secret from
stable_secret (RFC7217)
3 generate stable privacy addresses, using a random secret if unset
= =================================================================
drop_unicast_in_l2_multicast - BOOLEAN
Drop any unicast IPv6 packets that are received in link-layer
......@@ -2003,13 +2253,18 @@ enhanced_dad - BOOLEAN
detection of duplicates due to loopback of the NS messages that we send.
The nonce option will be sent on an interface unless both of
conf/{all,interface}/enhanced_dad are set to FALSE.
Default: TRUE
``icmp/*``:
===========
ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 messages.
0 to disable any limiting,
otherwise the minimal space between responses in milliseconds.
Default: 1000
ratemask - list of comma separated ranges
......@@ -2030,16 +2285,19 @@ ratemask - list of comma separated ranges
echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol.
Default: 0
echo_ignore_multicast - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol via multicast.
Default: 0
echo_ignore_anycast - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol destined to anycast address.
Default: 0
xfrm6_gc_thresh - INTEGER
......@@ -2055,43 +2313,52 @@ YOSHIFUJI Hideaki / USAGI Project <yoshfuji@linux-ipv6.org>
/proc/sys/net/bridge/* Variables:
=================================
bridge-nf-call-arptables - BOOLEAN
- 1 : pass bridged ARP traffic to arptables' FORWARD chain.
- 0 : disable this.
Default: 1
bridge-nf-call-iptables - BOOLEAN
- 1 : pass bridged IPv4 traffic to iptables' chains.
- 0 : disable this.
Default: 1
bridge-nf-call-ip6tables - BOOLEAN
- 1 : pass bridged IPv6 traffic to ip6tables' chains.
- 0 : disable this.
Default: 1
bridge-nf-filter-vlan-tagged - BOOLEAN
- 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables.
- 0 : disable this.
Default: 0
bridge-nf-filter-pppoe-tagged - BOOLEAN
- 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables.
- 0 : disable this.
Default: 0
bridge-nf-pass-vlan-input-dev - BOOLEAN
- 1: if bridge-nf-filter-vlan-tagged is enabled, try to find a vlan
interface on the bridge and set the netfilter input device to the
vlan. This allows use of e.g. "iptables -i br0.1" and makes the
REDIRECT target work with vlan-on-top-of-bridge interfaces. When no
matching vlan interface is found, or this switch is off, the input
device is set to the bridge interface.
- 0: disable bridge netfilter vlan interface lookup.
Default: 0
``/proc/sys/net/sctp/*`` Variables:
===================================
addip_enable - BOOLEAN
Enable or disable extension of Dynamic Address Reconfiguration
......@@ -2156,11 +2423,13 @@ addip_noauth_enable - BOOLEAN
we provide this variable to control the enforcement of the
authentication requirement.
== ===============================================================
1 Allow ADD-IP extension to be used without authentication. This
should only be set in a closed environment for interoperability
with older implementations.
0 Enforce the authentication requirement
== ===============================================================
Default: 0
......@@ -2170,8 +2439,8 @@ auth_enable - BOOLEAN
required for secure operation of Dynamic Address Reconfiguration
(ADD-IP) extension.
- 1: Enable this extension.
- 0: Disable this extension.
Default: 0
......@@ -2179,8 +2448,8 @@ prsctp_enable - BOOLEAN
Enable or disable the Partial Reliability extension (RFC3758) which
is used to notify peers that a given DATA should no longer be expected.
- 1: Enable extension
- 0: Disable
Default: 1
......@@ -2282,8 +2551,8 @@ cookie_preserve_enable - BOOLEAN
Enable or disable the ability to extend the lifetime of the SCTP cookie
that is used during the establishment phase of SCTP association
- 1: Enable cookie lifetime extension.
- 0: Disable
Default: 1
......@@ -2291,9 +2560,11 @@ cookie_hmac_alg - STRING
Select the hmac algorithm used when generating the cookie value sent by
a listening sctp socket to a connecting client in the INIT-ACK chunk.
Valid values are:
* md5
* sha1
* none
Ability to assign md5 or sha1 as the selected alg is predicated on the
configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and
CONFIG_CRYPTO_SHA1).
......@@ -2312,16 +2583,16 @@ rcvbuf_policy - INTEGER
to each association instead of the socket. This prevents the described
blocking.
- 1: rcvbuf space is per association
- 0: rcvbuf space is per socket
Default: 0
sndbuf_policy - INTEGER
Similar to rcvbuf_policy above, this applies to send buffer space.
- 1: Send buffer is tracked per association
- 0: Send buffer is tracked per socket.
Default: 0
......@@ -2354,19 +2625,23 @@ sctp_wmem - vector of 3 INTEGERs: min, default, max
addr_scope_policy - INTEGER
Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
- 0 - Disable IPv4 address scoping
- 1 - Enable IPv4 address scoping
- 2 - Follow draft but allow IPv4 private addresses
- 3 - Follow draft but allow IPv4 link local addresses
Default: 1
``/proc/sys/net/core/*``
========================
Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
``/proc/sys/net/unix/*``
========================
max_dgram_qlen - INTEGER
The maximum length of dgram socket receive queue
......
.. SPDX-License-Identifier: GPL-2.0
==================================
IP dynamic address hack-port v0.03
==================================
This stuff allows diald ONESHOT connections to get established by
dynamically changing packet source address (and socket's if local procs).
It is implemented for TCP diald-box connections(1) and IP_MASQuerading(2).
If enabled\ [#]_ and forwarding interface has changed:
1) Socket (and packet) source address is rewritten ON RETRANSMISSIONS
while in SYN_SENT state (diald-box processes).
2) Out-bounded MASQueraded source address changes ON OUTPUT (when
......@@ -12,18 +17,24 @@ If enabled[*] and forwarding interface has changed:
received by the tunnel.
This is especially helpful for auto dialup links (diald), where the
``actual`` outgoing address is unknown at the moment the link is
going up. So, the *same* (local AND masqueraded) connections requests that
bring the link up will be able to get established.
.. [#] At boot, by default no address rewriting is attempted.
To enable::
# echo 1 > /proc/sys/net/ipv4/ip_dynaddr
To enable verbose mode::
# echo 2 > /proc/sys/net/ipv4/ip_dynaddr
To disable (default)::
# echo 0 > /proc/sys/net/ipv4/ip_dynaddr
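The three modes can also be set through a small wrapper that rejects anything other than the documented values before writing. This is a sketch: ``set_ip_dynaddr`` is hypothetical, and the optional second argument (a target file) exists only so the function can be tried on a scratch file instead of the real sysctl:

```shell
# Sketch: validate and write an ip_dynaddr mode.
# $1 = mode (0 = off, 1 = on, 2 = verbose); $2 = optional target file,
# defaulting to the real sysctl (writing it requires root).
set_ip_dynaddr() {
    mode=$1
    target=${2:-/proc/sys/net/ipv4/ip_dynaddr}
    case "$mode" in
        0|1|2) ;;                     # only the documented values
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    echo "$mode" > "$target"
}
```

Running ``set_ip_dynaddr 1`` as root is equivalent to the first ``echo`` command above.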
Enjoy!
Juanjo <jjciarla@raiz.uncu.edu.ar>
.. SPDX-License-Identifier: GPL-2.0
=========================================================
AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation
=========================================================
Documentation ipddp.c
This file is written by Jay Schulist <jschlst@samba.org>
Introduction
------------
......@@ -21,7 +26,7 @@ kernel AppleTalk layer and drivers are available.
Each mode requires its own user space software.
Compiling AppleTalk-IP Decapsulation/Encapsulation
==================================================
AppleTalk-IP decapsulation needs to be compiled into your kernel. You
will need to turn on AppleTalk-IP driver support. Then you will need to
......
.. SPDX-License-Identifier: GPL-2.0
==================================
ATM (i)Chip IA Linux Driver Source
==================================
READ ME FIRST
--------------------------------------------------------------------------------
Read This Before You Begin!
--------------------------------------------------------------------------------
Description
===========
This is the README file for the Interphase PCI ATM (i)Chip IA Linux driver
source release.
The features and limitations of this driver are as follows:
- A single VPI (VPI value of 0) is supported.
- Supports 4K VCs for the server board (with 512K control memory) and 1K
VCs for the client board (with 128K control memory).
- UBR, ABR and CBR service categories are supported.
- Only AAL5 is supported.
- Supports setting of PCR on the VCs.
- Multiple adapters in a system are supported.
- All variants of Interphase ATM PCI (i)Chip adapter cards are supported,
including x575 (OC3, control memory 128K, 512K and packet memory 128K,
512K and 1M), x525 (UTP25) and x531 (DS3 and E3). See
http://www.iphase.com/
for details.
- Only x86 platforms are supported.
......@@ -29,128 +37,155 @@ The features and limitations of this driver are as follows:
Before You Start
================
Installation
------------
1. Installing the adapters in the system
To install the ATM adapters in the system, follow the steps below.
a. Login as root.
b. Shut down the system and power off the system.
c. Install one or more ATM adapters in the system.
d. Connect each adapter to a port on an ATM switch. The green 'Link'
LED on the front panel of the adapter will be on if the adapter is
connected to the switch properly when the system is powered up.
e. Power on and boot the system.
2. [ Removed ]
3. Rebuild kernel with ABR support
[ a. and b. removed ]
c. Reconfigure the kernel, choose the Interphase ia driver through "make
menuconfig" or "make xconfig".
d. Rebuild the kernel, loadable modules and the atm tools.
e. Install the newly built kernel and modules and reboot.
4. Load the adapter hardware driver (ia driver) if it is built as a module
a. Login as root.
b. Change directory to /lib/modules/<kernel-version>/atm.
c. Run "insmod suni.o;insmod iphase.o"
The yellow 'status' LED on the front panel of the adapter will blink
while the driver is loaded in the system.
d. To verify that the 'ia' driver is loaded successfully, run the
following command::
cat /proc/atm/devices
If the driver is loaded successfully, the output of the command will
be similar to the following lines::
Itf Type ESI/"MAC"addr AAL(TX,err,RX,err,drop) ...
0 ia xxxxxxxxx 0 ( 0 0 0 0 0 ) 5 ( 0 0 0 0 0 )
5. Ia Driver Configuration
5.1 Configuration of adapter buffers
The (i)Chip boards have 3 different packet RAM size variants: 128K, 512K and
1M. The RAM size determines the number and the size of the buffers. The default
buffer sizes and counts are set as follows:
========= ======= ====== ====== ====== ====== ======
Total     Rx RAM  Tx RAM Rx Buf Tx Buf Rx buf Tx buf
RAM size  size    size   size   size   cnt    cnt
========= ======= ====== ====== ====== ====== ======
128K      64K     64K    10K    10K    6      6
512K      256K    256K   10K    10K    25     25
1M        512K    512K   10K    10K    51     51
========= ======= ====== ====== ====== ====== ======
These settings should work well in most environments, but can be
changed by typing the following command::
insmod <IA_DIR>/ia.o IA_RX_BUF=<RX_CNT> IA_RX_BUF_SZ=<RX_SIZE> \
IA_TX_BUF=<TX_CNT> IA_TX_BUF_SZ=<TX_SIZE>
Where:
- RX_CNT = number of receive buffers in the range (1-128)
- RX_SIZE = size of receive buffers in the range (48-64K)
- TX_CNT = number of transmit buffers in the range (1-128)
- TX_SIZE = size of transmit buffers in the range (48-64K)
1. Transmit and receive buffer sizes must be a multiple of 4.
2. The total memory required for the transmit and receive buffers must be
less than or equal to the total adapter packet memory.
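The parameter ranges and the two notes above can be combined into a single check. The sketch below is a hypothetical helper: `ia_buf_config_ok` is not part of the driver. It assumes that "K" means 1024 bytes and that "64K" means 65536. It only illustrates the arithmetic and does not reflect how the driver itself validates its options.

```c
#include <stdbool.h>

/* Ranges quoted in the parameter list; "64K" is assumed to mean 65536. */
#define IA_BUF_CNT_MIN 1UL
#define IA_BUF_CNT_MAX 128UL
#define IA_BUF_SZ_MIN  48UL
#define IA_BUF_SZ_MAX  65536UL

/* Hypothetical helper: returns true if the requested IA_RX_BUF, IA_RX_BUF_SZ,
 * IA_TX_BUF and IA_TX_BUF_SZ values satisfy the documented constraints for
 * an adapter with ram_bytes of packet RAM. */
static bool ia_buf_config_ok(unsigned long rx_cnt, unsigned long rx_sz,
                             unsigned long tx_cnt, unsigned long tx_sz,
                             unsigned long ram_bytes)
{
    /* Buffer counts must be within 1-128. */
    if (rx_cnt < IA_BUF_CNT_MIN || rx_cnt > IA_BUF_CNT_MAX ||
        tx_cnt < IA_BUF_CNT_MIN || tx_cnt > IA_BUF_CNT_MAX)
        return false;

    /* Buffer sizes must be within 48-64K ... */
    if (rx_sz < IA_BUF_SZ_MIN || rx_sz > IA_BUF_SZ_MAX ||
        tx_sz < IA_BUF_SZ_MIN || tx_sz > IA_BUF_SZ_MAX)
        return false;

    /* ... and a multiple of 4 (note 1). */
    if (rx_sz % 4 != 0 || tx_sz % 4 != 0)
        return false;

    /* All buffers together must fit in the packet memory (note 2). */
    return rx_cnt * rx_sz + tx_cnt * tx_sz <= ram_bytes;
}
```

The defaults for the 128K board pass this check: 6 * 10240 + 6 * 10240 = 122880 bytes, which fits in 131072 bytes. Raising both counts to 7 needs 143360 bytes and fails.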
5.2 Turn on ia debug trace
When the ia driver is built with the CONFIG_ATM_IA_DEBUG flag, the driver
can provide more debug traces if needed. There is a bit mask variable,
IADebugFlag, which controls the output of the traces. You can find the bit
map of the IADebugFlag in iphase.h.
The debug traces can be turned on through the insmod command line option; for
example, "insmod iphase.o IADebugFlag=0xffffffff" turns on all the debug
traces when loading the driver.
6. Ia Driver Test Using ttcp_atm and PVC
For the PVC setup, the test machines can either be connected back-to-back or
through a switch. If connected through a switch, the switch must be
configured for the PVC(s).
a. For UBR test:
At the test machine intended to receive data, type::
ttcp_atm -r -a -s 0.100
At the other test machine, type::
ttcp_atm -t -a -s 0.100 -n 10000
Run "ttcp_atm -h" to display more options of the ttcp_atm tool.
b. For ABR test:
It is the same as the UBR testing, but with an extra command option::
-Pabr:max_pcr=<xxx>
where:
xxx = the maximum peak cell rate, from 170 - 353207.
This option must be set on both machines.
c. For CBR test:
It is the same as the UBR testing, but with an extra command option::
-Pcbr:max_pcr=<xxx>
where:
xxx = the maximum peak cell rate, from 170 - 353207.
This option may only be set on the transmit machine.
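The upper bound of 353207 for max_pcr is the maximum cell rate of an OC-3 link. As far as we know, Linux's `<linux/atm.h>` derives the same number in its `ATM_OC3_PCR` constant. The derivation starts from the 155.52 Mbit/s SONET rate, keeps 260 of every 270 columns as payload, and divides by the 53-byte cell size.

The sketch below reproduces that derivation and converts a max_pcr value to the bit rate the cells occupy. The helper names are hypothetical and are not part of ttcp_atm.

```c
/* Constants used in the OC-3 cell-rate derivation. */
#define OC3_LINE_BPS   155520000UL  /* SONET OC-3 line rate, bit/s */
#define ATM_CELL_BYTES 53UL         /* bytes per ATM cell on the wire */

/* Maximum OC-3 cell rate: keep 260 of every 270 SONET columns as payload,
 * then convert bits to 53-byte cells (integer arithmetic, rounds down). */
static unsigned long oc3_max_pcr(void)
{
    return OC3_LINE_BPS / 270 * 260 / 8 / ATM_CELL_BYTES;
}

/* Bit rate occupied by cells sent at the given peak cell rate (cells/s). */
static unsigned long pcr_to_bps(unsigned long pcr)
{
    return pcr * ATM_CELL_BYTES * 8;
}

/* True if pcr lies in the 170 - 353207 range accepted for max_pcr. */
static int max_pcr_in_range(unsigned long pcr)
{
    return pcr >= 170 && pcr <= oc3_max_pcr();
}
```

For example, `pcr_to_bps(353207)` gives 149759768, roughly 149.76 Mbit/s of cell traffic. The lowest accepted value, 170 cells/s, corresponds to 72080 bit/s.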
Outstanding Issues
==================
Contact Information
-------------------
::
Customer Support:
United States: Telephone: (214) 654-5555
Fax: (214) 654-5500
E-Mail: intouch@iphase.com
Europe: Telephone: 33 (0)1 41 15 44 00
Fax: 33 (0)1 41 15 12 13
......
.. SPDX-License-Identifier: GPL-2.0
=====
IPsec
=====
This document describes known IPsec corner cases that need to be kept in mind when
deploying various IPsec configurations in real-world production environments.
1. IPcomp:
Small IP packets won't get compressed at the sender, and will fail the
policy check on the receiver.
Quote from RFC3173::
2.2. Non-Expansion Policy
If the total size of a compressed payload and the IPComp header, as
defined in section 3, is not smaller than the size of the original
......
.. SPDX-License-Identifier: GPL-2.0
====
IPv6
====
Options for the ipv6 module are supplied as parameters at load time.
Module options may be given as command line arguments to the insmod
or modprobe command, but are usually specified in either
``/etc/modules.d/*.conf`` configuration files, or in a distro-specific
configuration file.
The available ipv6 module parameters are listed below. If a parameter
......
.. SPDX-License-Identifier: GPL-2.0
===================
IPVLAN Driver HOWTO
===================
Initial Release:
Mahesh Bandewar <maheshb AT google.com>
1. Introduction:
================
This is conceptually very similar to the macvlan driver with one major
exception of using L3 for muxing/demuxing among slaves. This property makes
the master device share the L2 with its slave devices. I have developed this
driver in conjunction with network namespaces and am not sure if there is a use case
......@@ -13,34 +17,48 @@ outside of it.
2. Building and Installation:
=============================
In order to build the driver, please select the config item CONFIG_IPVLAN.
The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
(CONFIG_IPVLAN=m).
3. Configuration:
=================
There are no module parameters for this driver; it can be configured
using the iproute2 ip utility.
::
ip link add link <master> name <slave> type ipvlan [ mode MODE ] [ FLAGS ]
where
MODE: l3 (default) | l3s | l2
FLAGS: bridge (default) | private | vepa
e.g.
(a) The following will create an IPvlan link with eth0 as master in
L3 bridge mode::
bash# ip link add link eth0 name ipvl0 type ipvlan
(b) This command will create an IPvlan link in L2 bridge mode::
bash# ip link add link eth0 name ipvl0 type ipvlan mode l2 bridge
(c) This command will create an IPvlan device in L2 private mode::
bash# ip link add link eth0 name ipvlan type ipvlan mode l2 private
(d) This command will create an IPvlan device in L2 vepa mode::
bash# ip link add link eth0 name ipvlan type ipvlan mode l2 vepa
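The MODE and FLAGS words in the syntax above select one of three modes and one of three port behaviors. The sketch below shows one way to map those keywords to numeric values. The constants are local stand-ins that we believe mirror `enum ipvlan_mode` and the `IPVLAN_F_*` flags in `<linux/if_link.h>`; check your kernel headers before relying on them. `ipvl_parse_word` is a hypothetical helper, not the iproute2 parser.

```c
#include <string.h>

/* Assumed to mirror enum ipvlan_mode / IPVLAN_F_* in <linux/if_link.h>. */
enum ipvl_mode { IPVL_MODE_L2 = 0, IPVL_MODE_L3 = 1, IPVL_MODE_L3S = 2 };
#define IPVL_F_PRIVATE 0x01u
#define IPVL_F_VEPA    0x02u

/* Hypothetical helper: apply one MODE or FLAGS keyword. Returns 0 on
 * success, -1 for an unknown keyword or for combining private and vepa.
 * On failure *mode and *flags are left unchanged. */
static int ipvl_parse_word(const char *w, int *mode, unsigned int *flags)
{
    unsigned int f = *flags;

    if (strcmp(w, "l2") == 0) {
        *mode = IPVL_MODE_L2;
        return 0;
    }
    if (strcmp(w, "l3") == 0) {
        *mode = IPVL_MODE_L3;
        return 0;
    }
    if (strcmp(w, "l3s") == 0) {
        *mode = IPVL_MODE_L3S;
        return 0;
    }

    if (strcmp(w, "bridge") == 0)
        f &= ~(IPVL_F_PRIVATE | IPVL_F_VEPA);   /* default: no flags */
    else if (strcmp(w, "private") == 0)
        f |= IPVL_F_PRIVATE;
    else if (strcmp(w, "vepa") == 0)
        f |= IPVL_F_VEPA;
    else
        return -1;                              /* unknown keyword */

    /* bridge, private and vepa are alternatives for the port. */
    if ((f & IPVL_F_PRIVATE) && (f & IPVL_F_VEPA))
        return -1;

    *flags = f;
    return 0;
}
```

Parsing "mode l2 private" calls the helper for "l2" and then for "private". This leaves the mode at L2 with the private flag set. A later "vepa" is rejected because the port can only be in one of bridge, private or vepa mode.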
4. Operating modes:
===================
IPvlan has two modes of operation - L2 and L3. For a given master device,
you can select one of these two modes and all slaves on that master will
operate in the same (selected) mode. The RX mode is almost identical except
that in L3 mode the slaves won't receive any multicast / broadcast traffic.
......@@ -48,39 +66,50 @@ L3 mode is more restrictive since routing is controlled from the other (mostly)
default namespace.
4.1 L2 mode:
------------
In this mode TX processing happens on the stack instance attached to the
slave device and packets are switched and queued to the master device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
as well.
4.2 L3 mode:
------------
In this mode TX processing up to L3 happens on the stack instance attached
to the slave device and packets are switched to the stack instance of the
master device for the L2 processing and routing from that instance will be
used before packets are queued on the outbound device. In this mode the slaves
can neither receive nor send multicast / broadcast traffic.
4.3 L3S mode:
-------------
This is very similar to the L3 mode except that iptables (conn-tracking)
works in this mode and hence it is L3-symmetric (L3s). This mode has slightly lower
performance, but that shouldn't matter since you are choosing this mode over plain-L3
mode to make conn-tracking work.
5. Mode flags:
==============
At this time, the following mode flags are available:
5.1 bridge:
-----------
This is the default option. To configure the IPvlan port in this mode,
the user can either add this option on the command line or not specify
anything. This is the traditional mode where slaves can cross-talk among
themselves apart from talking through the master device.
5.2 private:
------------
If this option is added to the command-line, the port is set in private
mode, i.e. the port won't allow cross-communication between slaves.
5.3 vepa:
---------
If this is added to the command-line, the port is set in VEPA mode.
That is, the port will offload switching functionality to the external entity as
described in 802.1Qbg.
Note: VEPA mode in IPvlan has limitations. IPvlan uses the mac-address of the
......@@ -89,18 +118,25 @@ neighbor will have source and destination mac same. This will make the switch /
router send the redirect message.
6. What to choose (macvlan vs. ipvlan)?
=======================================
These two devices are very similar in many regards and the specific use
case could very well define which device to choose. If one of the following
situations defines your use case then you can choose to use ipvlan:
(a) The Linux host that is connected to the external switch / router has
policy configured that allows only one mac per port.
(b) The number of virtual devices created on a master exceeds the mac capacity,
which puts the NIC in promiscuous mode, and degraded performance is a concern.
(c) The slave device is to be put into a hostile / untrusted network
namespace where L2 on the slave could be changed / misused.
7. Example configuration:
=========================
::
+=============================================================+
| Host: host1 |
......@@ -117,30 +153,37 @@ namespace where L2 on the slave could be changed / misused.
+==============================#==============================+
(a) Create two network namespaces - ns0, ns1::
ip netns add ns0
ip netns add ns1
(b) Create two ipvlan slaves on eth0 (master device)::
ip link add link eth0 ipvl0 type ipvlan mode l2
ip link add link eth0 ipvl1 type ipvlan mode l2
(c) Assign slaves to the respective network namespaces::
ip link set dev ipvl0 netns ns0
ip link set dev ipvl1 netns ns1
(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices
- For ns0::
(1) ip netns exec ns0 bash
(2) ip link set dev ipvl0 up
(3) ip link set dev lo up
(4) ip -4 addr add 127.0.0.1 dev lo
(5) ip -4 addr add $IPADDR dev ipvl0
(6) ip -4 route add default via $ROUTER dev ipvl0
- For ns1::
(1) ip netns exec ns1 bash
(2) ip link set dev ipvl1 up
(3) ip link set dev lo up
(4) ip -4 addr add 127.0.0.1 dev lo
(5) ip -4 addr add $IPADDR dev ipvl1
(6) ip -4 route add default via $ROUTER dev ipvl1
.. SPDX-License-Identifier: GPL-2.0
===========
IPvs-sysctl
===========
/proc/sys/net/ipv4/vs/* Variables:
==================================
am_droprate - INTEGER
default 10
It sets the always mode drop rate, which is used in mode 3
of the drop_packet defense.
amemthresh - INTEGER
default 1024
It sets the available memory threshold (in pages), which is
used in the automatic modes of defense. When there is not
enough available memory, the respective strategy will be
enabled and the variable is automatically set to 2, otherwise
the strategy is disabled and the variable is set to 1.
backup_only - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
If set, disable the director function while the server is
in backup mode to avoid packet loops for DR/TUN methods.
......@@ -44,8 +51,8 @@ conn_reuse_mode - INTEGER
real servers to a very busy cluster.
conntrack - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
If set, maintain connection tracking entries for
connections handled by IPVS.
......@@ -61,28 +68,28 @@ conntrack - BOOLEAN
Only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled.
cache_bypass - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
If it is enabled, forward packets to the original destination
directly when no cache server is available and the destination
address is not local (iph->daddr is RTN_UNICAST). It is mostly
used in transparent web cache clusters.
debug_level - INTEGER
- 0 - transmission error messages (default)
- 1 - non-fatal error messages
- 2 - configuration
- 3 - destination trash
- 4 - drop entry
- 5 - service lookup
- 6 - scheduling
- 7 - connection new/expire, lookup and synchronization
- 8 - state transition
- 9 - binding destination, template checks and applications
- 10 - IPVS packet transmission
- 11 - IPVS packet handling (ip_vs_in/ip_vs_out)
- 12 or more - packet traversal
Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled.
......@@ -92,58 +99,58 @@ debug_level - INTEGER
the level.
drop_entry - INTEGER
- 0 - disabled (default)
The drop_entry defense randomly drops entries in the
connection hash table in order to reclaim some
memory for new connections. In the current code, the
drop_entry procedure can be activated every second; it then
randomly scans 1/32 of the whole table and drops entries that are in
the SYN-RECV/SYNACK state, which should be effective against
syn-flooding attacks.
The valid values of drop_entry are from 0 to 3, where 0 means
that this strategy is always disabled, 1 and 2 mean automatic
modes (when there is not enough available memory, the strategy
is enabled and the variable is automatically set to 2,
otherwise the strategy is disabled and the variable is set to
1), and 3 means that the strategy is always enabled.
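The automatic modes described above can be modelled as a small state update that runs periodically. The sketch below is hypothetical: `defense_update` and `defense_active` are illustrative names, not IPVS functions. It only captures the documented behavior. Modes 0 and 3 never change, while modes 1 and 2 flip depending on whether available memory is below amemthresh. The text states that drop_packet and secure_tcp share this value definition.

```c
#include <stdbool.h>

/* 0 = always off, 1 = automatic (idle), 2 = automatic (active), 3 = always on */
enum defense_mode {
    DEF_OFF = 0,
    DEF_AUTO_IDLE = 1,
    DEF_AUTO_ACTIVE = 2,
    DEF_ALWAYS = 3
};

/* Hypothetical periodic update; avail_pages and amemthresh are in pages. */
static enum defense_mode defense_update(enum defense_mode mode,
                                        unsigned long avail_pages,
                                        unsigned long amemthresh)
{
    if (mode == DEF_AUTO_IDLE || mode == DEF_AUTO_ACTIVE)
        return avail_pages < amemthresh ? DEF_AUTO_ACTIVE : DEF_AUTO_IDLE;
    return mode;    /* modes 0 and 3 are never changed automatically */
}

/* The strategy runs in mode 2 (automatic, memory low) and mode 3. */
static bool defense_active(enum defense_mode mode)
{
    return mode == DEF_AUTO_ACTIVE || mode == DEF_ALWAYS;
}
```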
drop_packet - INTEGER
- 0 - disabled (default)
The drop_packet defense is designed to drop 1/rate packets
before forwarding them to real servers. If the rate is 1, then
drop all the incoming packets.
The value definition is the same as that of drop_entry. In
the automatic mode, the rate is determined by the following
formula: rate = amemthresh / (amemthresh - available_memory)
when available memory is less than the available memory
threshold. When mode 3 is set, the always mode drop rate
is controlled by /proc/sys/net/ipv4/vs/am_droprate.
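The drop_packet formula can be checked with a short calculation. The sketch below is hypothetical: `drop_rate` and the `dropper` counter are illustrative, not the IPVS implementation. It computes rate with integer division, so the rate rounds down, and then drops one packet out of every rate packets.

```c
/* Hypothetical automatic-mode rate; 0 means "do not drop" (enough memory). */
static unsigned long drop_rate(unsigned long amemthresh, unsigned long avail)
{
    if (avail >= amemthresh)
        return 0;
    /* rate = amemthresh / (amemthresh - available_memory), rounded down */
    return amemthresh / (amemthresh - avail);
}

/* Counter that drops one packet out of every `rate` packets. */
struct dropper {
    unsigned long rate;
    unsigned long counter;
};

static void dropper_set_rate(struct dropper *d, unsigned long rate)
{
    d->rate = rate;
    d->counter = rate;
}

/* Returns 1 if the current packet should be dropped, 0 otherwise. */
static int dropper_should_drop(struct dropper *d)
{
    if (d->rate == 0)
        return 0;
    if (--d->counter > 0)
        return 0;
    d->counter = d->rate;   /* start counting the next `rate` packets */
    return 1;
}
```

With the default amemthresh of 1024 pages and 768 pages available, rate = 1024 / (1024 - 768) = 4, so one packet in four is dropped. With no memory available the rate is 1 and every packet is dropped.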
expire_nodest_conn - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
With the default value of 0, the load balancer will silently drop
packets when their destination server is not available. This may
be useful when a user-space monitoring program deletes the
destination server (because of server overload or wrong
detection) and adds the server back later, so that the connections
to the server can continue.
If this feature is enabled, the load balancer will expire the
connection immediately when a packet arrives and its
destination server is not available, and the client program
will be notified that the connection is closed. This is
equivalent to the feature some people require to flush
connections when their destination is not available.
expire_quiescent_template - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
When set to a non-zero value, the load balancer will expire
persistent templates when the destination server is quiescent.
......@@ -158,8 +165,8 @@ expire_quiescent_template - BOOLEAN
connection and the destination server is quiescent.
ignore_tunneled - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
If set, ipvs will set the ipvs_property on all packets which are of
unrecognized protocols. This prevents us from routing tunneled
......@@ -168,30 +175,30 @@ ignore_tunneled - BOOLEAN
ipvs routing loops when ipvs is also acting as a real server).
nat_icmp_send - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
It controls sending icmp error messages (ICMP_DEST_UNREACH)
for VS/NAT when the load balancer receives packets from real
servers but the connection entries don't exist.
pmtu_disc - BOOLEAN
- 0 - disabled
- not 0 - enabled (default)
By default, reject with FRAG_NEEDED all DF packets that exceed
the PMTU, irrespective of the forwarding method. For TUN method
the flag can be disabled to fragment such packets.
secure_tcp - INTEGER
- 0 - disabled (default)
The secure_tcp defense uses a more complicated TCP state
transition table. For VS/NAT, it also delays entering the
TCP ESTABLISHED state until the three way handshake is completed.
The value definition is the same as that of drop_entry and
drop_packet.
sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period
default 3 50
......@@ -248,8 +255,8 @@ sync_ports - INTEGER
8848+sync_ports-1.
snat_reroute - BOOLEAN
- 0 - disabled
- not 0 - enabled (default)
If enabled, recalculate the route of SNATed packets from
realservers so that they are routed as if they originate from the
......@@ -270,6 +277,7 @@ sync_persist_mode - INTEGER
Controls the synchronisation of connections when using persistence
0: All types of connections are synchronised
1: Attempt to reduce the synchronisation traffic depending on
the connection type. For persistent services avoid synchronisation
for normal connections, do it only for persistence templates.
......
.. SPDX-License-Identifier: GPL-2.0
=============================
Kernel Connection Multiplexor
=============================
Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
interface over TCP for generic application protocols. With KCM an application
can efficiently send and receive application protocol messages over TCP using
datagram sockets.
KCM implements an NxM multiplexor in the kernel as diagrammed below::
+------------+ +------------+ +------------+ +------------+
| KCM socket | | KCM socket | | KCM socket | | KCM socket |
+------------+ +------------+ +------------+ +------------+
| | | |
+-----------+ | | +----------+
| | | |
+----------------------------------+
| Multiplexor |
+----------------------------------+
| | | | |
+---------+ | | | ------------+
| | | | |
+----------+ +----------+ +----------+ +----------+ +----------+
| Psock | | Psock | | Psock | | Psock | | Psock |
+----------+ +----------+ +----------+ +----------+ +----------+
| | | | |
+----------+ +----------+ +----------+ +----------+ +----------+
| TCP sock | | TCP sock | | TCP sock | | TCP sock | | TCP sock |
+----------+ +----------+ +----------+ +----------+ +----------+
KCM sockets
===========
The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
bound to a multiplexor are considered to have equivalent function, and I/O
......@@ -37,7 +40,7 @@ operations in different sockets may be done in parallel without the need for
synchronization between threads in userspace.
Multiplexor
===========
The multiplexor provides the message steering. In the transmit path, messages
written on a KCM socket are sent atomically on an appropriate TCP socket.
......@@ -45,14 +48,14 @@ Similarly, in the receive path, messages are constructed on each TCP socket
(Psock) and complete messages are steered to a KCM socket.
TCP sockets & Psocks
====================
TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
for each bound TCP socket; this structure holds the state for constructing
messages on receive as well as other connection specific information for KCM.
Connected mode semantics
========================
Each multiplexor assumes that all attached TCP connections are to the same
destination and can use the different connections for load balancing when
......@@ -60,7 +63,7 @@ transmitting. The normal send and recv calls (include sendmmsg and recvmmsg)
can be used to send and receive messages from the KCM socket.
Socket types
============
KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.
......@@ -110,23 +113,23 @@ User interface
Creating a multiplexor
----------------------
A new multiplexor and initial KCM socket are created by a socket call::
socket(AF_KCM, type, protocol)
- type is either SOCK_DGRAM or SOCK_SEQPACKET
- protocol is KCMPROTO_CONNECTED
Cloning KCM sockets
-------------------
After the first KCM socket is created using the socket call as described
above, additional sockets for the multiplexor can be created by cloning
a KCM socket. This is accomplished by an ioctl on a KCM socket::
/* From linux/kcm.h */
struct kcm_clone {
int fd;
};
struct kcm_clone info;
......@@ -142,11 +145,11 @@ Attach transport sockets
------------------------
Attaching of transport sockets to a multiplexor is performed by calling an
ioctl on a KCM socket for the multiplexor. e.g.::
/* From linux/kcm.h */
struct kcm_attach {
int fd;
int bpf_fd;
};
......@@ -160,18 +163,19 @@ ioctl on a KCM socket for the multiplexor. e.g.:
ioctl(kcmfd, SIOCKCMATTACH, &info);
The kcm_attach structure contains:
- fd: file descriptor for the TCP socket being attached
- bpf_fd: file descriptor for the compiled BPF program loaded into the kernel
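The BPF program referenced by `bpf_fd` tells KCM where each message ends on the TCP stream. To illustrate that job only, the userspace C function below computes the total message length for an assumed framing. In that framing, each message starts with a 4-byte big-endian payload length, as in Thrift's framed transport. `framed_msg_len` is a hypothetical name. It is not a BPF program and is not loaded through SIOCKCMATTACH.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed framing: a 4-byte big-endian length that counts only the payload,
 * followed by the payload itself.
 *
 * Returns the total message length in bytes (header + payload), or 0 if
 * fewer than 4 bytes are available and the length cannot be read yet. */
static size_t framed_msg_len(const uint8_t *buf, size_t avail)
{
    uint32_t payload;

    if (avail < 4)
        return 0;

    payload = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
              ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    return (size_t)payload + 4;
}
```

A parser like this must cope with a partial header, which is why the sketch returns 0 until at least four bytes are available.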
Unattach transport sockets
--------------------------
Unattaching a transport socket from a multiplexor is straightforward. An
"unattach" ioctl is done with the kcm_unattach structure as the argument::
/* From linux/kcm.h */
struct kcm_unattach {
int fd;
};
struct kcm_unattach info;
......@@ -190,7 +194,7 @@ When receive is disabled, any pending messages in the socket's
receive buffer are moved to other sockets. This feature is useful
if an application thread knows that it will be doing a lot of
work on a request and won't be able to service new messages for a
while. Example use::
int val = 1;
......@@ -200,7 +204,7 @@ BPF programs for message delineation
------------------------------------
BPF programs can be compiled using the BPF LLVM backend. For example,
the BPF program for parsing Thrift is::
#include "bpf.h" /* for __sk_buff */
#include "bpf_helpers.h" /* for load_word intrinsic */
......@@ -250,6 +254,7 @@ based on groups, or batches of messages, can be beneficial for performance.
On transmit, there are three ways an application can batch (pipeline)
messages on a KCM socket.
1) Send multiple messages in a single sendmmsg.
2) Send a group of messages each with a sendmsg call, where all messages
except the last have MSG_BATCH in the flags of sendmsg call.
......
......@@ -99,7 +99,7 @@ treat the LocalTalk device like an ordinary Ethernet device, even if
that's what it looks like to Netatalk.
Instead, you follow the same procedure as for doing IP in EtherTalk.
See Documentation/networking/ipddp.rst for more information about the
kernel driver and userspace tools needed.
--------------------------------------
......
......@@ -1051,7 +1051,7 @@ for more information on hardware timestamps.
-------------------------------------------------------------------------------
- Packet sockets work well together with Linux socket filters, thus you also
might want to have a look at Documentation/networking/filter.rst
--------------------------------------------------------------------------------
+ THANKS
......
......@@ -792,7 +792,7 @@ counters to indicate the ACK is skipped in which scenario. The ACK
would only be skipped if the received packet is either a SYN packet or
it has no data.
.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst
* TcpExtTCPACKSkippedSynRecv
......
......@@ -3192,7 +3192,7 @@ Q: https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
F: Documentation/bpf/
F: Documentation/networking/filter.txt
F: Documentation/networking/filter.rst
F: arch/*/net/*
F: include/linux/bpf*
F: include/linux/filter.h
......@@ -4728,7 +4728,7 @@ DECnet NETWORK LAYER
L: linux-decnet-user@lists.sourceforge.net
S: Orphan
W: http://linux-decnet.sourceforge.net
F: Documentation/networking/decnet.txt
F: Documentation/networking/decnet.rst
F: net/decnet/
DECSTATION PLATFORM SUPPORT
......@@ -7815,7 +7815,7 @@ HUAWEI ETHERNET DRIVER
M: Aviad Krawczyk <aviad.krawczyk@huawei.com>
L: netdev@vger.kernel.org
S: Supported
F: Documentation/networking/hinic.txt
F: Documentation/networking/hinic.rst
F: drivers/net/ethernet/huawei/hinic/
HUGETLB FILESYSTEM
......@@ -8934,7 +8934,7 @@ L: lvs-devel@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git
F: Documentation/networking/ipvs-sysctl.txt
F: Documentation/networking/ipvs-sysctl.rst
F: include/net/ip_vs.h
F: include/uapi/linux/ip_vs.h
F: net/netfilter/ipvs/
......
......@@ -306,7 +306,7 @@ config ATM_IA
for more info about the cards. Say Y (or M to compile as a module
named iphase) here if you have one of these cards.
See the file <file:Documentation/networking/iphase.txt> for further
See the file <file:Documentation/networking/iphase.rst> for further
details.
config ATM_IA_DEBUG
......@@ -336,7 +336,7 @@ config ATM_FORE200E
on PCI and SBUS hosts. Say Y (or M to compile as a module
named fore_200e) here if you have one of these ATM adapters.
See the file <file:Documentation/networking/fore200e.txt> for
See the file <file:Documentation/networking/fore200e.rst> for
further details.
config ATM_FORE200E_USE_TASKLET
......
......@@ -50,7 +50,7 @@ config BONDING
The driver supports multiple bonding modes to allow for both high
performance and high availability operation.
Refer to <file:Documentation/networking/bonding.txt> for more
Refer to <file:Documentation/networking/bonding.rst> for more
information.
To compile this driver as a module, choose M here: the module
......@@ -126,7 +126,7 @@ config EQUALIZER
Linux driver or with a Livingston Portmaster 2e.
Say Y if you want this and read
<file:Documentation/networking/eql.txt>. You may also want to read
<file:Documentation/networking/eql.rst>. You may also want to read
section 6.2 of the NET-3-HOWTO, available from
<http://www.tldp.org/docs.html#howto>.
......
......@@ -59,7 +59,7 @@ config COPS
package. This driver is experimental, which means that it may not
work. This driver will only work if you choose "AppleTalk DDP"
networking support, above.
Please read the file <file:Documentation/networking/cops.txt>.
Please read the file <file:Documentation/networking/cops.rst>.
config COPS_DAYNA
bool "Dayna firmware support"
......@@ -86,7 +86,7 @@ config IPDDP
box is stuck on an AppleTalk only network) or decapsulate (e.g. if
you want your Linux box to act as an Internet gateway for a zoo of
AppleTalk connected Macs). Please see the file
<file:Documentation/networking/ipddp.txt> for more information.
<file:Documentation/networking/ipddp.rst> for more information.
If you say Y here, the AppleTalk-IP support will be compiled into
the kernel. In this case, you can either use encapsulation or
......@@ -107,4 +107,4 @@ config IPDDP_ENCAP
IP packets inside AppleTalk frames; this is useful if your Linux box
is stuck on an AppleTalk network (which hopefully contains a
decapsulator somewhere). Please see
<file:Documentation/networking/ipddp.txt> for more information.
<file:Documentation/networking/ipddp.rst> for more information.
......@@ -9,7 +9,7 @@ menuconfig ARCNET
---help---
If you have a network card of this type, say Y and check out the
(arguably) beautiful poetry in
<file:Documentation/networking/arcnet.txt>.
<file:Documentation/networking/arcnet.rst>.
You need both this driver, and the driver for the particular ARCnet
chipset of your card. If you don't know, then it's probably a
......@@ -28,7 +28,7 @@ config ARCNET_1201
arc0 device. You need to say Y here to communicate with
industry-standard RFC1201 implementations, like the arcether.com
packet driver or most DOS/Windows ODI drivers. Please read the
ARCnet documentation in <file:Documentation/networking/arcnet.txt>
ARCnet documentation in <file:Documentation/networking/arcnet.rst>
for more information about using arc0.
config ARCNET_1051
......@@ -42,7 +42,7 @@ config ARCNET_1051
industry-standard RFC1201 implementations, like the arcether.com
packet driver or most DOS/Windows ODI drivers. RFC1201 is included
automatically as the arc0 device. Please read the ARCnet
documentation in <file:Documentation/networking/arcnet.txt> for more
documentation in <file:Documentation/networking/arcnet.rst> for more
information about using arc0e and arc0s.
config ARCNET_RAW
......
......@@ -28,7 +28,7 @@ config CAIF_SPI_SLAVE
The CAIF Link layer SPI Protocol driver for Slave SPI interface.
This driver implements a platform driver to accommodate for a
platform specific SPI device. A sample CAIF SPI Platform device is
provided in <file:Documentation/networking/caif/spi_porting.txt>.
provided in <file:Documentation/networking/caif/spi_porting.rst>.
config CAIF_SPI_SYNC
bool "Next command and length in start of frame"
......
......@@ -30,7 +30,7 @@ config 6PACK
Note that this driver is still experimental and might cause
problems. For details about the features and the usage of the
driver, read <file:Documentation/networking/6pack.txt>.
driver, read <file:Documentation/networking/6pack.rst>.
To compile this driver as a module, choose M here: the module
will be called 6pack.
......@@ -127,7 +127,7 @@ config BAYCOM_SER_FDX
your serial interface chip. To configure the driver, use the sethdlc
utility available in the standard ax25 utilities package. For
information on the modems, see <http://www.baycom.de/> and
<file:Documentation/networking/baycom.txt>.
<file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module
will be called baycom_ser_fdx. This is recommended.
......@@ -145,7 +145,7 @@ config BAYCOM_SER_HDX
the driver, use the sethdlc utility available in the standard ax25
utilities package. For information on the modems, see
<http://www.baycom.de/> and
<file:Documentation/networking/baycom.txt>.
<file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module
will be called baycom_ser_hdx. This is recommended.
......@@ -160,7 +160,7 @@ config BAYCOM_PAR
par96 designs. To configure the driver, use the sethdlc utility
available in the standard ax25 utilities package. For information on
the modems, see <http://www.baycom.de/> and the file
<file:Documentation/networking/baycom.txt>.
<file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module
will be called baycom_par. This is recommended.
......@@ -175,7 +175,7 @@ config BAYCOM_EPP
designs. To configure the driver, use the sethdlc utility available
in the standard ax25 utilities package. For information on the
modems, see <http://www.baycom.de/> and the file
<file:Documentation/networking/baycom.txt>.
<file:Documentation/networking/baycom.rst>.
To compile this driver as a module, choose M here: the module
will be called baycom_epp. This is recommended.
......
......@@ -336,7 +336,7 @@ config DLCI
To use frame relay, you need supporting hardware (called FRAD) and
certain programs from the net-tools package as explained in
<file:Documentation/networking/framerelay.txt>.
<file:Documentation/networking/framerelay.rst>.
To compile this driver as a module, choose M here: the
module will be called dlci.
......@@ -361,7 +361,7 @@ config SDLA
These are multi-protocol cards, but only Frame Relay is supported
by the driver at this time. Please read
<file:Documentation/networking/framerelay.txt>.
<file:Documentation/networking/framerelay.rst>.
To compile this driver as a module, choose M here: the
module will be called sdla.
......
......@@ -86,7 +86,7 @@ config INET
"Sysctl support" below, you can change various aspects of the
behavior of the TCP/IP code by writing to the (virtual) files in
/proc/sys/net/ipv4/*; the options are explained in the file
<file:Documentation/networking/ip-sysctl.txt>.
<file:Documentation/networking/ip-sysctl.rst>.
Short answer: say Y.
......
......@@ -16,7 +16,7 @@ config ATM
of your ATM card below.
Note that you need a set of user-space programs to actually make use
of ATM. See the file <file:Documentation/networking/atm.txt> for
of ATM. See the file <file:Documentation/networking/atm.rst> for
further details.
config ATM_CLIP
......
......@@ -40,7 +40,7 @@ config AX25
radio as well as information about how to configure an AX.25 port is
contained in the AX25-HOWTO, available from
<http://www.tldp.org/docs.html#howto>. You might also want to
check out the file <file:Documentation/networking/ax25.txt> in the
check out the file <file:Documentation/networking/ax25.rst> in the
kernel source. More information about digital amateur radio in
general is on the WWW at
<http://www.tapr.org/>.
......@@ -88,7 +88,7 @@ config NETROM
users as well as information about how to configure an AX.25 port is
contained in the Linux Ham Wiki, available from
<http://www.linux-ax25.org>. You also might want to check out the
file <file:Documentation/networking/ax25.txt>. More information about
file <file:Documentation/networking/ax25.rst>. More information about
digital amateur radio in general is on the WWW at
<http://www.tapr.org/>.
......@@ -107,7 +107,7 @@ config ROSE
users as well as information about how to configure an AX.25 port is
contained in the Linux Ham Wiki, available from
<http://www.linux-ax25.org>. You also might want to check out the
file <file:Documentation/networking/ax25.txt>. More information about
file <file:Documentation/networking/ax25.rst>. More information about
digital amateur radio in general is on the WWW at
<http://www.tapr.org/>.
......
......@@ -39,6 +39,6 @@ config CEPH_LIB_USE_DNS_RESOLVER
be resolved using the CONFIG_DNS_RESOLVER facility.
For information on how to use CONFIG_DNS_RESOLVER consult
Documentation/networking/dns_resolver.txt
Documentation/networking/dns_resolver.rst
If unsure, say N.
......@@ -6,7 +6,7 @@
* Jamal Hadi Salim
* Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
*
* See Documentation/networking/gen_stats.txt
* See Documentation/networking/gen_stats.rst
*/
#include <linux/types.h>
......
......@@ -15,7 +15,7 @@ config DECNET
<http://linux-decnet.sourceforge.net/>.
More detailed documentation is available in
<file:Documentation/networking/decnet.txt>.
<file:Documentation/networking/decnet.rst>.
Be sure to say Y to "/proc file system support" and "Sysctl support"
below when using DECnet, since you will need sysctl support to aid
......@@ -40,4 +40,4 @@ config DECNET_ROUTER
filtering" option will be required for the forthcoming routing daemon
to work.
See <file:Documentation/networking/decnet.txt> for more information.
See <file:Documentation/networking/decnet.rst> for more information.
......@@ -19,7 +19,7 @@ config DNS_RESOLVER
SMB2 later. DNS Resolver is supported by the userspace upcall
helper "/sbin/dns.resolver" via /etc/request-key.conf.
See <file:Documentation/networking/dns_resolver.txt> for further
See <file:Documentation/networking/dns_resolver.rst> for further
information.
To compile this as a module, choose M here: the module will be called
......
/* Key type used to cache DNS lookups made by the kernel
*
* See Documentation/networking/dns_resolver.txt
* See Documentation/networking/dns_resolver.rst
*
* Copyright (c) 2007 Igor Mammedov
* Author(s): Igor Mammedov (niallain@gmail.com)
......
/* Upcall routine, designed to work as a key type and working through
* /sbin/request-key to contact userspace when handling DNS queries.
*
* See Documentation/networking/dns_resolver.txt
* See Documentation/networking/dns_resolver.rst
*
* Copyright (c) 2007 Igor Mammedov
* Author(s): Igor Mammedov (niallain@gmail.com)
......
......@@ -49,7 +49,7 @@ config IP_ADVANCED_ROUTER
Note that some distributions enable it in startup scripts.
For details about rp_filter strict and loose mode read
<file:Documentation/networking/ip-sysctl.txt>.
<file:Documentation/networking/ip-sysctl.rst>.
If unsure, say N here.
......
......@@ -853,7 +853,7 @@ static bool icmp_unreach(struct sk_buff *skb)
case ICMP_FRAG_NEEDED:
/* for documentation of the ip_no_pmtu_disc
* values please see
* Documentation/networking/ip-sysctl.txt
* Documentation/networking/ip-sysctl.rst
*/
switch (net->ipv4.sysctl_ip_no_pmtu_disc) {
default:
......
......@@ -13,7 +13,7 @@ menuconfig IPV6
For general information about IPv6, see
<https://en.wikipedia.org/wiki/IPv6>.
For specific information about IPv6 under Linux, see
Documentation/networking/ipv6.txt and read the HOWTO at
Documentation/networking/ipv6.rst and read the HOWTO at
<http://www.tldp.org/HOWTO/Linux+IPv6-HOWTO/>
To compile this protocol support as a module, choose M here: the
......
......@@ -11,7 +11,7 @@
*
* How to get into it:
*
* 1) read Documentation/networking/filter.txt
* 1) read Documentation/networking/filter.rst
* 2) Run `bpf_asm [-c] <filter-prog file>` to translate into binary
* blob that is loadable with xt_bpf, cls_bpf et al. Note: -c will
* pretty print a C-like construct.
......
......@@ -13,7 +13,7 @@
* for making a verdict when multiple simple BPF programs are combined
* into one in order to prevent parsing same headers multiple times.
*
* More on how to debug BPF opcodes see Documentation/networking/filter.txt
* More on how to debug BPF opcodes see Documentation/networking/filter.rst
* which is the main document on BPF. Mini howto for getting started:
*
* 1) `./bpf_dbg` to enter the shell (shell cmds denoted with '>'):
......