Commits · 0baf26b0fcd74bbfcef53c5d5e8bad2b99c8d0d2 · nexedi / linux

09 Jan, 2020 8 commits

bpf: tcp: Support tcp_congestion_ops in bpf · 0baf26b0

Martin KaFai Lau authored Jan 08, 2020

This patch makes "struct tcp_congestion_ops" to be the first user
of BPF STRUCT_OPS.  It allows implementing a tcp_congestion_ops
in bpf.

The BPF implemented tcp_congestion_ops can be used like
regular kernel tcp-cc through sysctl and setsockopt.  e.g.
[root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion
net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic
net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic
net.ipv4.tcp_congestion_control = bpf_cubic

There has been attempt to move the TCP CC to the user space
(e.g. CCP in TCP).   The common arguments are faster turn around,
get away from long-tail kernel versions in production...etc,
which are legit points.

BPF has been the continuous effort to join both kernel and
userspace upsides together (e.g. XDP to gain the performance
advantage without bypassing the kernel).  The recent BPF
advancements (in particular BTF-aware verifier, BPF trampoline,
BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc)
possible in BPF.  It allows a faster turnaround for testing algorithm
in the production while leveraging the existing (and continue growing)
BPF feature/framework instead of building one specifically for
userspace TCP CC.

This patch allows write access to a few fields in tcp-sock
(in bpf_tcp_ca_btf_struct_access()).

The optional "get_info" is unsupported now.  It can be added
later.  One possible way is to output the info with a btf-id
to describe the content.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com

0baf26b0

bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS · 85d33df3

Martin KaFai Lau authored Jan 08, 2020

The patch introduces BPF_MAP_TYPE_STRUCT_OPS.  The map value
is a kernel struct with its func ptr implemented in bpf prog.
This new map is the interface to register/unregister/introspect
a bpf implemented kernel struct.

The kernel struct is actually embedded inside another new struct
(or called the "value" struct in the code).  For example,
"struct tcp_congestion_ops" is embbeded in:
struct bpf_struct_ops_tcp_congestion_ops {
	refcount_t refcnt;
	enum bpf_struct_ops_state state;
	struct tcp_congestion_ops data;  /* <-- kernel subsystem struct here */
}
The map value is "struct bpf_struct_ops_tcp_congestion_ops".
The "bpftool map dump" will then be able to show the
state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g.
number of tcp_sock in the tcp_congestion_ops case).  This "value" struct
is created automatically by a macro.  Having a separate "value" struct
will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding
"void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some
initialization works before registering the struct_ops to the kernel
subsystem).  The libbpf will take care of finding and populating the
"struct bpf_struct_ops_XYZ" from "struct XYZ".

Register a struct_ops to a kernel subsystem:
1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s)
2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id
   set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the
   running kernel.
   Instead of reusing the attr->btf_value_type_id,
   btf_vmlinux_value_type_id s added such that attr->btf_fd can still be
   used as the "user" btf which could store other useful sysadmin/debug
   info that may be introduced in the furture,
   e.g. creation-date/compiler-details/map-creator...etc.
3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described
   in the running kernel btf.  Populate the value of this object.
   The function ptr should be populated with the prog fds.
4. Call BPF_MAP_UPDATE with the object created in (3) as
   the map value.  The key is always "0".

During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's
args as an array of u64 is generated.  BPF_MAP_UPDATE also allows
the specific struct_ops to do some final checks in "st_ops->init_member()"
(e.g. ensure all mandatory func ptrs are implemented).
If everything looks good, it will register this kernel struct
to the kernel subsystem.  The map will not allow further update
from this point.

Unregister a struct_ops from the kernel subsystem:
BPF_MAP_DELETE with key "0".

Introspect a struct_ops:
BPF_MAP_LOOKUP_ELEM with key "0".  The map value returned will
have the prog _id_ populated as the func ptr.

The map value state (enum bpf_struct_ops_state) will transit from:
INIT (map created) =>
INUSE (map updated, i.e. reg) =>
TOBEFREE (map value deleted, i.e. unreg)

The kernel subsystem needs to call bpf_struct_ops_get() and
bpf_struct_ops_put() to manage the "refcnt" in the
"struct bpf_struct_ops_XYZ".  This patch uses a separate refcnt
for the purose of tracking the subsystem usage.  Another approach
is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
the subsystem's usage by doing map->refcnt - map->usercnt to filter out
the map-fd/pinned-map usage.  However, that will also tie down the
future semantics of map->refcnt and map->usercnt.

The very first subsystem's refcnt (during reg()) holds one
count to map->refcnt.  When the very last subsystem's refcnt
is gone, it will also release the map->refcnt.  All bpf_prog will be
freed when the map->refcnt reaches 0 (i.e. during map_free()).

Here is how the bpftool map command will look like:
[root@arch-fb-vm1 bpf]# bpftool map show
6: struct_ops  name dctcp  flags 0x0
	key 4B  value 256B  max_entries 1  memlock 4096B
	btf_id 6
[root@arch-fb-vm1 bpf]# bpftool map dump id 6
[{
        "value": {
            "refcnt": {
                "refs": {
                    "counter": 1
                }
            },
            "state": 1,
            "data": {
                "list": {
                    "next": 0,
                    "prev": 0
                },
                "key": 0,
                "flags": 2,
                "init": 24,
                "release": 0,
                "ssthresh": 25,
                "cong_avoid": 30,
                "set_state": 27,
                "cwnd_event": 28,
                "in_ack_event": 26,
                "undo_cwnd": 29,
                "pkts_acked": 0,
                "min_tso_segs": 0,
                "sndbuf_expand": 0,
                "cong_control": 0,
                "get_info": 0,
                "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
                ],
                "owner": 0
            }
        }
    }
]

Misc Notes:
* bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
  It does an inplace update on "*value" instead returning a pointer
  to syscall.c.  Otherwise, it needs a separate copy of "zero" value
  for the BPF_STRUCT_OPS_STATE_INIT to avoid races.

* The bpf_struct_ops_map_delete_elem() is also called without
  preempt_disable() from map_delete_elem().  It is because
  the "->unreg()" may requires sleepable context, e.g.
  the "tcp_unregister_congestion_control()".

* "const" is added to some of the existing "struct btf_func_model *"
  function arg to avoid a compiler warning caused by this patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com

85d33df3

bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS · 27ae7997

Martin KaFai Lau authored Jan 08, 2020

This patch allows the kernel's struct ops (i.e. func ptr) to be
implemented in BPF. The first use case in this series is the
"struct tcp_congestion_ops" which will be introduced in a
latter patch.

This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS.
The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular
func ptr of a kernel struct. The attr->attach_btf_id is the btf id
of a kernel struct. The attr->expected_attach_type is the member
"index" of that kernel struct. The first member of a struct starts
with member index 0. That will avoid ambiguity when a kernel struct
has multiple func ptrs with the same func signature.

For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written
to implement the "init" func ptr of the "struct tcp_congestion_ops".
The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops"
of the _running_ kernel. The attr->expected_attach_type is 3.

The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved
by arch_prepare_bpf_trampoline that will be done in the next
patch when introducing BPF_MAP_TYPE_STRUCT_OPS.

"struct bpf_struct_ops" is introduced as a common interface for the kernel
struct that supports BPF_PROG_TYPE_STRUCT_OPS prog. The supporting kernel
struct will need to implement an instance of the "struct bpf_struct_ops".

The supporting kernel struct also needs to implement a bpf_verifier_ops.
During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right
bpf_verifier_ops by searching the attr->attach_btf_id.

A new "btf_struct_access" is also added to the bpf_verifier_ops such
that the supporting kernel struct can optionally provide its own specific
check on accessing the func arg (e.g. provide limited write access).

After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called
to initialize some values (e.g. the btf id of the supporting kernel
struct) and it can only be done once the btf_vmlinux is available.

The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog
if the return type of the prog->aux->attach_func_proto is "void".
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com

27ae7997

bpf: Support bitfield read access in btf_struct_access · 976aba00

Martin KaFai Lau authored Jan 08, 2020

This patch allows bitfield access as a scalar.

It checks "off + size > t->size" to avoid accessing bitfield
end up accessing beyond the struct.  This check is done
outside of the loop since it is applicable to all access.

It also takes this chance to break early on the "off < moff" case.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003501.3855427-1-kafai@fb.com

976aba00

bpf: Add enum support to btf_ctx_access() · 218b3f65

Martin KaFai Lau authored Jan 08, 2020

It allows bpf prog (e.g. tracing) to attach
to a kernel function that takes enum argument.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003459.3855366-1-kafai@fb.com

218b3f65

bpf: Avoid storing modifier to info->btf_id · 275517ff

Martin KaFai Lau authored Jan 08, 2020

info->btf_id expects the btf_id of a struct, so it should
store the final result after skipping modifiers (if any).

It also takes this chanace to add a missing newline in one of the
bpf_log() messages.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003456.3855176-1-kafai@fb.com

275517ff

bpf: Save PTR_TO_BTF_ID register state when spilling to stack · 65726b5b

Martin KaFai Lau authored Jan 08, 2020

This patch makes the verifier save the PTR_TO_BTF_ID register state when
spilling to the stack.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003454.3854870-1-kafai@fb.com

65726b5b

selftests/bpf: Restore original comm in test_overhead · e4300224

Stanislav Fomichev authored Jan 08, 2020

test_overhead changes task comm in order to estimate BPF trampoline
overhead but never sets the comm back to the original one.
We have the tests (like core_reloc.c) that have 'test_progs'
as hard-coded expected comm, so let's try to preserve the
original comm.

Currently, everything works because the order of execution is:
first core_recloc, then test_overhead; but let's make it a bit
future-proof.

Other related changes: use 'test_overhead' as new comm instead of
'test' to make it easy to debug and drop '\n' at the end.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200108192132.189221-1-sdf@google.com

e4300224

08 Jan, 2020 2 commits

bpftool: Add misc section and probe for large INSN limit · 2faef64a

Michal Rostecki authored Jan 08, 2020

Introduce a new probe section (misc) for probes not related to concrete
map types, program types, functions or kernel configuration. Introduce a
probe for large INSN limit as the first one in that section.

Example outputs:

  # bpftool feature probe
  [...]
  Scanning miscellaneous eBPF features...
  Large program size limit is available

  # bpftool feature probe macros
  [...]
  /*** eBPF misc features ***/
  #define HAVE_HAVE_LARGE_INSN_LIMIT

  # bpftool feature probe -j | jq '.["misc"]'
  {
    "have_large_insn_limit": true
  }
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Link: https://lore.kernel.org/bpf/20200108162428.25014-3-mrostecki@opensuse.org

2faef64a

libbpf: Add probe for large INSN limit · 5ff05120

Michal Rostecki authored Jan 08, 2020

Introduce a new probe which checks whether kernel has large maximum
program size which was increased in the following commit:

c04c0d2b ("bpf: increase complexity limit and maximum program size")

Based on the similar check in Cilium[0], authored by Daniel Borkmann.

  [0] https://github.com/cilium/cilium/commit/657d0f585afd26232cfa5d4e70b6f64d2ea91596Co-authored-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Link: https://lore.kernel.org/bpf/20200108162428.25014-2-mrostecki@opensuse.org

5ff05120

07 Jan, 2020 23 commits

ptp: clockmatrix: Rework clockmatrix version information. · 1ece2fbe

Vincent Cheng authored Jan 07, 2020

Simplify and fix the version information displayed by the driver.
The new info better relects what is needed to support the hardware.

Prev:
Version: 4.8.0, Pipeline 22169 0x4001, Rev 0, Bond 5, CSR 311, IRQ 2

New:
Version: 4.8.0, Id: 0x4001  Hw Rev: 5  OTP Config Select: 15

- Remove pipeline, CSR and IRQ because version x.y.z already incorporates
  this information.
- Remove bond number because it is not used.
- Remove rev number because register was not implemented, always 0
- Add HW Rev ID register to replace rev number
- Add OTP config select to show the user configuration chosen by
  the configurable GPIO pins on start-up
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ece2fbe

enetc: Fix inconsistent IS_ERR and PTR_ERR · 4addbcb3

YueHaibing authored Jan 07, 2020

The proper pointer to be passed as argument is hw
Detected using Coccinelle.

Fixes: 6517798d ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4addbcb3

enetc: Fix an off by one in enetc_setup_tc_txtime() · 0d6e5bfc

Dan Carpenter authored Jan 07, 2020

The priv->tx_ring[] has 16 elements but only priv->num_tx_rings are
set up, the rest are NULL.  This ">" comparison should be ">=" to avoid
a potential crash.

Fixes: 0d08c9ec ("enetc: add support time specific departure base on the qos etf")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d6e5bfc

Merge branch 'Documentation-stmmac-documentation-improvements' · cbefe2c9

David S. Miller authored Jan 07, 2020

Jose Abreu says:

====================
Documentation: stmmac documentation improvements

Converts stmmac documentation to RST format.

1) Adds missing entry of stmmac documentation to MAINTAINERS.

2) Converts stmmac documentation to RST format and adds some new info.

3) Adds the new RST file to the list of files.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cbefe2c9

Documentation: networking: Add stmmac to device drivers list · b053b28e

Jose Abreu authored Jan 07, 2020

Add the stmmac RST file to the index.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b053b28e

Documentation: networking: Convert stmmac documentation to RST format · 2ffebffb

Jose Abreu authored Jan 07, 2020

Convert the documentation of the driver to RST format and delete the old
txt and old information that no longer applies.

Also, add some new information.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ffebffb

MAINTAINERS: Add stmmac Ethernet driver documentation entry · 15011254

Jose Abreu authored Jan 07, 2020

Add the missing entry for the file that documents the stmicro Ethernet
driver stmmac.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15011254

drivers: net: cisco_hdlc: use __func__ in debug message · c68d7248

Chen Zhou authored Jan 07, 2020

Use __func__ to print the function name instead of hard coded string.
BTW, replace printk(KERN_DEBUG, ...) with netdev_dbg.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c68d7248

Merge branch 'net-ch9200-code-cleanup' · 10332dc2

David S. Miller authored Jan 07, 2020

Chen Zhou says:

====================
net: ch9200: code cleanup

patch 1 introduce __func__ in debug message.
patch 2 remove unnecessary return.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

10332dc2

net: ch9200: remove unnecessary return · 195234b8

Chen Zhou authored Jan 07, 2020

The return is not needed, remove it.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

195234b8

net: ch9200: use __func__ in debug message · e64dec83

Chen Zhou authored Jan 07, 2020

Use __func__ to print the function name instead of hard coded string.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e64dec83

Merge branch 'ionic-driver-updates' · 58cf542a

David S. Miller authored Jan 07, 2020

Shannon Nelson says:

====================
ionic: driver updates

These are a few little updates for the ionic network driver.

v2: dropped IBM msi patch
    added fix for a compiler warning
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

58cf542a

ionic: clear compiler warning on hb use before set · 6be1a5ce

Shannon Nelson authored Jan 06, 2020

Build checks have pointed out that 'hb' can theoretically
be used before set, so let's initialize it and get rid
of the compiler complaint.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

6be1a5ce

ionic: restrict received packets to mtu size · c37d6e3f

Shannon Nelson authored Jan 06, 2020

Make sure the NIC drops packets that are larger than the
specified MTU.

The front end of the NIC will accept packets larger than MTU and
will copy all the data it can to fill up the driver's posted
buffers - if the buffers are not long enough the packet will
then get dropped.  With the Rx SG buffers allocagted as full
pages, we are currently setting up more space than MTU size
available and end up receiving some packets that are larger
than MTU, up to the size of buffers posted.  To be sure the
NIC doesn't waste our time with oversized packets we need to
lie a little in the SG descriptor about how long is the last
SG element.

At dealloc time, we know the allocation was a page, so the
deallocation doesn't care about what length we put in the
descriptor.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

c37d6e3f

ionic: add Rx dropped packet counter · 24cfa8c7

Shannon Nelson authored Jan 06, 2020

Add a counter for packets dropped by the driver, typically
for bad size or a receive error seen by the device.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

24cfa8c7

ionic: drop use of subdevice tags · 3daca28f

Shannon Nelson authored Jan 06, 2020

The subdevice concept is not being used in the driver, so
drop the references to it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

3daca28f

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 5528e0d7

David S. Miller authored Jan 06, 2020

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2020-01-06

This series contains updates to igc to add basic support for
timestamping.

Vinicius adds basic support for timestamping and enables ptp4l/phc2sys
to work with i225 devices.  Initially, adds the ability to read and
adjust the PHC clock.  Patches 2 & 3 enable and retrieve hardware
timestamps.  Patch 4 implements the ethtool ioctl that ptp4l uses to
check what timestamping methods are supported.  Lastly, added support to
do timestamping using the "Start of Packet" signal from the PHY, which
is now supported in i225 devices.

While i225 does support multiple PTP domains, with multiple timestamping
registers, we currently only support one PTP domain and use only one of
the timestamping registers for implementation purposes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5528e0d7

Merge branch 'Unique-mv88e6xxx-IRQ-names' · 1b935183

David S. Miller authored Jan 06, 2020

Andrew Lunn says:

====================
Unique mv88e6xxx IRQ names

There are a few boards which have multiple mv88e6xxx switches. With
such boards, it can be hard to determine which interrupts belong to
which switches. Make the interrupt names unique by including the
device name in the interrupt name. For the SERDES interrupt, also
include the port number. As a result of these patches ZII devel C
looks like:

 50:          0  gpio-vf610  27 Level     mv88e6xxx-0.1:00
 54:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-0.1:00-g1-atu-prob
 56:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-0.1:00-g1-vtu-prob
 58:          0  mv88e6xxx-g1   7 Edge      mv88e6xxx-0.1:00-g2
 61:          0  mv88e6xxx-g2   1 Edge      !mdio-mux!mdio@1!switch@0!mdio:01
 62:          0  mv88e6xxx-g2   2 Edge      !mdio-mux!mdio@1!switch@0!mdio:02
 63:          0  mv88e6xxx-g2   3 Edge      !mdio-mux!mdio@1!switch@0!mdio:03
 64:          0  mv88e6xxx-g2   4 Edge      !mdio-mux!mdio@1!switch@0!mdio:04
 70:          0  mv88e6xxx-g2  10 Edge      mv88e6xxx-0.1:00-serdes-10
 75:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-0.1:00-watchdog
 76:          5  gpio-vf610  26 Level     mv88e6xxx-0.2:00
 80:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-0.2:00-g1-atu-prob
 82:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-0.2:00-g1-vtu-prob
 84:          4  mv88e6xxx-g1   7 Edge      mv88e6xxx-0.2:00-g2
 87:          2  mv88e6xxx-g2   1 Edge      !mdio-mux!mdio@2!switch@0!mdio:01
 88:          0  mv88e6xxx-g2   2 Edge      !mdio-mux!mdio@2!switch@0!mdio:02
 89:          0  mv88e6xxx-g2   3 Edge      !mdio-mux!mdio@2!switch@0!mdio:03
 90:          0  mv88e6xxx-g2   4 Edge      !mdio-mux!mdio@2!switch@0!mdio:04
 95:          3  mv88e6xxx-g2   9 Edge      mv88e6xxx-0.2:00-serdes-9
 96:          0  mv88e6xxx-g2  10 Edge      mv88e6xxx-0.2:00-serdes-10
101:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-0.2:00-watchdog

Interrupt names like !mdio-mux!mdio@2!switch@0!mdio:01 are created by
phylib for the integrated PHYs. The mv88e6xxx driver does not
determine these names.
====================
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b935183

net: dsa: mv88e6xxx: Unique ATU and VTU IRQ names · 8ddf0b56

Andrew Lunn authored Jan 06, 2020

Dynamically generate a unique interrupt name for the VTU and ATU,
based on the device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ddf0b56

net: dsa: mv88e6xxx: Unique g2 IRQ name · 06acd114

Andrew Lunn authored Jan 06, 2020

Dynamically generate a unique g2 interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

06acd114

net: dsa: mv88e6xxx: Unique watchdog IRQ name · 8b4db289

Andrew Lunn authored Jan 06, 2020

Dynamically generate a unique watchdog interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b4db289

net: dsa: mv88e6xxx: Unique SERDES interrupt names · e6f2f6b8

Andrew Lunn authored Jan 06, 2020

Dynamically generate a unique SERDES interrupt name, based on the
device name and the port the SERDES is for. For example:

 95:          3  mv88e6xxx-g2   9 Edge      mv88e6xxx-0.2:00-serdes-9
 96:          0  mv88e6xxx-g2  10 Edge      mv88e6xxx-0.2:00-serdes-10

The 0.2:00 indicates the switch and -9 indicates port 9.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6f2f6b8

net: dsa: mv88e6xxx: Unique IRQ name · 3095383a

Andrew Lunn authored Jan 06, 2020

Dynamically generate a unique switch interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

3095383a

06 Jan, 2020 7 commits

igc: Use Start of Packet signal from PHY for timestamping · a299df35

Vinicius Costa Gomes authored Dec 02, 2019

For better accuracy, i225 is able to do timestamping using the Start of
Packet signal from the PHY.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

a299df35

igc: Add support for ethtool GET_TS_INFO command · 60dbede0

Vinicius Costa Gomes authored Dec 02, 2019

This command allows igc to report what types of timestamping are
supported. ptp4l uses this to detect if the hardware supports
timestamping.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

60dbede0

igc: Add support for TX timestamping · 2c344ae2

Vinicius Costa Gomes authored Dec 02, 2019

This adds support for timestamping packets being transmitted.

Based on the code from i210. The basic differences is that i225 has 4
registers to store the transmit timestamps (i210 has one). Right now,
we only support retrieving from one register, support for using the
other registers will be added later.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

2c344ae2

igc: Add support for RX timestamping · 81b05520

Vinicius Costa Gomes authored Dec 02, 2019

This adds support for timestamping received packets.

It is based on the i210, as many features of i225 work the same way.
The main difference from i210 is that i225 has support for choosing
the timer register to use when timestamping packets. Right now, we
only support using timer 0. The other difference is that i225 stores
two timestamps in the receive descriptor, right now, we only retrieve
one.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

81b05520

Merge branch 'ethtool-allow-nesting-of-begin-and-complete-callbacks' · 50d31037

David S. Miller authored Jan 06, 2020

Michal Kubecek says:

====================
ethtool: allow nesting of begin() and complete() callbacks

The ethtool ioctl interface used to guarantee that ethtool_ops callbacks
were always called in a block between calls to ->begin() and ->complete()
(if these are defined) and that this whole block was executed with RTNL
lock held:

	rtnl_lock();
	ops->begin();
	/* other ethtool_ops calls */
	ops->complete();
	rtnl_unlock();

This prevented any nesting or crossing of the begin-complete blocks.
However, this is no longer guaranteed even for ioctl interface as at least
ethtool_phys_id() releases RTNL lock while waiting for a timer. With the
introduction of netlink ethtool interface, the begin-complete pairs are
naturally nested e.g. when a request triggers a netlink notification.

Fortunately, only minority of networking drivers implements begin() and
complete() callbacks and most of those that do, fall into three groups:

  - wrappers for pm_runtime_get_sync() and pm_runtime_put()
  - wrappers for clk_prepare_enable() and clk_disable_unprepare()
  - begin() checks netif_running() (fails if false), no complete()

First two have their own refcounting, third is safe w.r.t. nesting of the
blocks.

Only three in-tree networking drivers need an update to deal with nesting
of begin() and complete() calls: via-velocity and epic100 perform resume
and suspend on their own and wil6210 completely serializes the calls using
its own mutex (which would lead to a deadlock if a request request
triggered a netlink notification). The series addresses these problems.

changes between v1 and v2:
  - fix inverted condition in epic100 ethtool_begin() (thanks to Andrew
    Lunn)
====================
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50d31037

epic100: allow nesting of ethtool_ops begin() and complete() · 4ac0ac84

Michal Kubecek authored Jan 06, 2020

Unlike most networking drivers using begin() and complete() ethtool_ops
callbacks to resume a device which is down and suspend it again when done,
epic100 does not use standard refcounted infrastructure but sets device
sleep state directly.

With the introduction of netlink ethtool interface, we may have nested
begin-complete blocks so that inner complete() would put the device back to
sleep for the rest of the outer block.

To avoid rewriting an old and not very actively developed driver, just add
a nesting counter and only perform resume and suspend on the outermost
level.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ac0ac84

via-velocity: allow nesting of ethtool_ops begin() and complete() · 71f711a4

Michal Kubecek authored Jan 06, 2020

Unlike most networking drivers using begin() and complete() ethtool_ops
callbacks to resume a device which is down and suspend it again when done,
via-velocity does not use standard refcounted infrastructure but sets
device sleep state directly.

With the introduction of netlink ethtool interface, we may have nested
begin-complete blocks so that inner complete() would put the device back to
sleep for the rest of the outer block.

To avoid rewriting an old and not very actively developed driver, just add
a nesting counter and only perform resume and suspend on the outermost
level.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

71f711a4