Commits · a47e8f5ad8981fa1b0e6be99fa0d99c1355408bc · nexedi / linux

16 May, 2014 40 commits

Merge branch 'ieee802154-next' · a47e8f5a

David S. Miller authored May 16, 2014

Phoebe Buckheister says:

====================
802154: implement link-layer security

This patch series implements 802.15.4-2011 link layer security.

Patches 1 and 2 prepare for llsec by adding data structures to represent the
llsec PIB as specified in 802.15.4-2011. I've changed some structures from
their specification to be more sensible, since 802.15.4 specifies some
structures in not-exactly-useful ways. Nested lists are common, but not very
accessible for netlink methods, and not very fast to traverse when searching
for specific elements either.

Patch 3 implements backends for these structures in mac802154.

Patch 4 and 5 implement the encryption and decryption methods, split from patch
3 to ease review. The encryption and decryption methods are almost entirely
compliant with the specified outgoing/incoming frame procedures. Decryption
deviates from the specification slightly where the specification makes no
sense, i.e. encrypted frames with security level 0 may be sent, but must be
dropped an reception - but transforms for processing such frames are given a
few lines in the standard. I've opted to not drop these frames instead of not
implementing the transforms that wouldn't be used if they were dropped.

Patch 6 links the mac802154 llsec with the SoftMAC devices. This is mainly
init//fini code for llsec context, handling of security subheaders and calling
the encryption/decryption methods.

Patch 7 adds sockopts to 802.15.4 dgram sockets to modifiy outgoing security
parameters on a per-socket basis. Ideally, this would also be available for
sockets on 6lowpan devices, but I'm not sure how to do that nicely.

Patch 8 adds forwarders to the llsec configuration methods for netlink, patch
10 implements these netlink accessors. This is mainly mechanical.

Patch 11, implements a key tracking option for devices that previous patches
haven't, because I'm not entirely sure whether this is the best approach to the
problem. It performs reasonably well though, so I decided to include it as a
separate patch in this series instead of sending an RFC just for this one
option.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a47e8f5a

ieee802154, mac802154: implement devkey record option · f0f77dc6

Phoebe Buckheister authored May 16, 2014

The 802.15.4-2011 standard states that for each key, a list of devices
that use this key shall be kept. Previous patches have only considered
two options:

 * a device "uses" (or may use) all keys, rendering the list useless
 * a device is restricted to a certain set of keys

Another option would be that a device *may* use all keys, but need not
do so, and we are interested in the actual set of keys the device uses.
Recording keys used by any given device may have a noticable performance
impact and might not be needed as often. The common case, in which a
device will not switch keys too often, should still perform well.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0f77dc6

ieee802154: add netlink interfaces for llsec · 3e9c156e

Phoebe Buckheister authored May 16, 2014

This patch adds user-visible interfaces for the llsec infrastructure.
For the added methods, the only major difference between all add/remove
implementation lies in how the specific object is parsed, and for dump
requests, how objects are written into netlink messages.

To save on boilerplate code, table dumps are routed through a helper
function that handles netlink dump state, leaving the actual dumping
code to care only about iterating over the table to be dumped and
filling netlink messages. For add/remove methods, the boilerplate
required to work is not quite as large, but still enough to also move
into a local helper.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e9c156e

mac802154: propagate device address changes to llsec · 9b0bb4a8

Phoebe Buckheister authored May 16, 2014

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b0bb4a8

mac802154: add llsec configuration functions · 29e02374

Phoebe Buckheister authored May 16, 2014

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

29e02374

ieee802154: add dgram sockopts for security control · af9eed5b

Phoebe Buckheister authored May 16, 2014

Allow datagram sockets to override the security settings of the device
they send from on a per-socket basis. Requires CAP_NET_ADMIN or
CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

af9eed5b

mac802154: integrate llsec with wpan devices · f30be4d5

Phoebe Buckheister authored May 16, 2014

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f30be4d5

mac802154: add llsec decryption method · 4c14a2fb

Phoebe Buckheister authored May 16, 2014

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c14a2fb

mac802154: add llsec encryption method · 03556e4d

Phoebe Buckheister authored May 16, 2014

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

03556e4d

mac802154: add llsec structures and mutators · 5d637d5a

Phoebe Buckheister authored May 16, 2014

This patch adds containers and mutators for the major ieee802154_llsec
structures to mac802154. Most of the (rather simple) ieee802154_llsec
structs are wrapped only to provide an rcu_head for orderly disposal,
but some structs - llsec keys notably - require more complex
bookkeeping.

Since each llsec key may be referenced by a number of llsec key table
entries (with differing key ids, but the same actual key), we want to
save memory and not allocate crypto transforms for each entry in the
table. Thus, the mac802154 llsec key is reference-counted instead.
Further, each key will have four associated crypto transforms - three
CCM transforms for the authsizes 4/8/16 and one CTR transform for
unauthenticated encryption. If we had a CCM* transform that allowed
authsize 0, and authsize as part of requests instead of transforms, this
would not be necessary.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d637d5a

mac802154: update Kconfig · 87de726c

Phoebe Buckheister authored May 16, 2014

Link-layer security requires AES CCM for authenticated modes and AES CTR
for the unauthenticated encryption mode.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

87de726c

ieee802154: add types for link-layer security · dc20759f

Phoebe Buckheister authored May 16, 2014

The added structures match 802.15.4-2011 link-layer security PIBs as
closely as is reasonable. Some lists required by the standard were
modeled as bitmaps (frame_types and command_frame_ids in *llsec_key,
802.15.4-2011 7.5/Table 61), since using lists for those seems a bit
excessive and not particularly useful. The DeviceDescriptorHandleList
was inverted and is here a per-device list, since operations on this
list are likely to have both a key and a device at hand, and per-device
lists of keys are shorter than per-key lists of devices.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc20759f

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · e54740e6

David S. Miller authored May 16, 2014

Jesse Gross says:

====================
A set of OVS changes for net-next/3.16.

The major change here is a switch from per-CPU to per-NUMA flow
statistics. This improves scalability by reducing kernel overhead
in flow setup and maintenance.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e54740e6

Merge branch 'dt_fixed_phy' · ad2ebb3d

David S. Miller authored May 16, 2014

Thomas Petazzoni says:

====================
Add DT support for fixed PHYs

Here is a fourth version of the patch set that adds a Device Tree
binding and the related code to support fixed PHYs. I'm hoping to get
this merged in 3.16.

Changes since v3:

 * Rebased on top of v3.15-rc5

 * In patch "net: phy: decouple PHY id and PHY address in fixed PHY
   driver", changed the PHY ID of fixed PHYs from 0xdeadbeef to 0x0,
   as suggested by Grant Likely.

 * Fixed the !CONFIG_PHY_FIXED case in patch "net: phy: extend fixed
   driver with fixed_phy_register()". Noticed by Florian Fainelli.

 * Added Acked-by from Grant Likely and Florian Fainelli on patch
   "net: phy: extend fixed driver with fixed_phy_register()".

 * Reworked the new fixed-link DT binding to be just a sub-node of the
   Ethernet MAC node, and not a node referenced by the 'phy'
   property. This was requested by Grant Likely.

 * Reworked the code implementing the new DT binding to also make it
   accept the old, single property based, DT binding.

 * Added a patch that actually uses the new fixed link DT binding for
   the Armada XP Matrix board.

Changes since v2:

 * Rebased on top of v3.14-rc1, and re-tested on hardware.

 * Removed the RFC tag, since there seems to be some real interest in
   this feature, and the code has gone through several iterations
   already.

 * The error handling in fixed_phy_register() has been fixed.

Changes since v1:

 * Instead of using a 'fixed-link' property inside the Ethernet device
   DT node, with a fairly cryptic succession of integer values, we now
   use a PHY subnode under the Ethernet device DT node, with explicit
   properties to configure the duplex, speed, pause and other PHY
   properties.

 * The PHY address is automatically allocated by the kernel and no
   longer visible in the Device Tree binding.

 * The PHY device is created directly when the network driver calls
   of_phy_connect_fixed_link(), and associated to the PHY DT node,
   which allows the existing of_phy_connect() function to work,
   without the need to use the deprecated of_phy_connect_fixed_link().

Posts of previous versions:

  RFCv1:   http://www.spinics.net/lists/netdev/msg243253.html
  RFCv2:   http://lists.infradead.org/pipermail/linux-arm-kernel/2013-September/196919.html
  PATCHv3: http://www.spinics.net/lists/netdev/msg273117.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ad2ebb3d

ARM: mvebu: use the fixed-link PHY DT binding for the Armada XP Matrix board · 84f6e11f

Thomas Petazzoni authored May 16, 2014

The Armada XP Matrix board has an Ethernet PHY that isn't configurable
through the MDIO bus, so we use the newly introduced fixed-link PHY DT
binding to represent the PHY of this platform and get network working.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84f6e11f

net: mvneta: add support for fixed links · 83895bed

Thomas Petazzoni authored May 16, 2014

Following the introduction of of_phy_register_fixed_link(), this patch
introduces fixed link support in the mvneta driver, for Marvell Armada
370/XP SOCs.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83895bed

of: provide a binding for fixed link PHYs · 3be2a49e

Thomas Petazzoni authored May 16, 2014

Some Ethernet MACs have a "fixed link", and are not connected to a
normal MDIO-managed PHY device. For those situations, a Device Tree
binding allows to describe a "fixed link" using a special PHY node.

This patch adds:

 * A documentation for the fixed PHY Device Tree binding.

 * An of_phy_is_fixed_link() function that an Ethernet driver can call
   on its PHY phandle to find out whether it's a fixed link PHY or
   not. It should typically be used to know if
   of_phy_register_fixed_link() should be called.

 * An of_phy_register_fixed_link() function that instantiates the
   fixed PHY into the PHY subsystem, so that when the driver calls
   of_phy_connect(), the PHY device associated to the OF node will be
   found.

These two additional functions also support the old fixed-link Device
Tree binding used on PowerPC platforms, so that ultimately, the
network device drivers for those platforms could be converted to use
of_phy_is_fixed_link() and of_phy_register_fixed_link() instead of
of_phy_connect_fixed_link(), while keeping compatibility with their
respective Device Tree bindings.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3be2a49e

net: phy: extend fixed driver with fixed_phy_register() · a7595121

Thomas Petazzoni authored May 16, 2014

The existing fixed_phy_add() function has several drawbacks that
prevents it from being used as is for OF-based declaration of fixed
PHYs:

 * The address of the PHY on the fake bus needs to be passed, while a
   dynamic allocation is desired.

 * Since the phy_device instantiation is post-poned until the next
   mdiobus scan, there is no way to associate the fixed PHY with its
   OF node, which later prevents of_phy_connect() from finding this
   fixed PHY from a given OF node.

To solve this, this commit introduces fixed_phy_register(), which will
allocate an available PHY address, add the PHY using fixed_phy_add()
and instantiate the phy_device structure associated with the provided
OF node.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7595121

net: phy: decouple PHY id and PHY address in fixed PHY driver · 9b744942

Thomas Petazzoni authored May 16, 2014

Until now, the fixed_phy_add() function was taking as argument
'phy_id', which was used both as the PHY address on the fake fixed
MDIO bus, and as the PHY id, as available in the MII_PHYSID1 and
MII_PHYSID2 registers. However, those two informations are completely
unrelated.

This patch decouples them. The PHY id of fixed PHYs is hardcoded to be
0x0. Ideally, a really reserved value would be nicer, but there
doesn't seem to be an easy of making sure a dummy value can be
assigned to the Linux kernel for such usage.

The PHY address remains passed by the caller of phy_fixed_add().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b744942

Merge branch 'bridge-non-promisc' · 2770abcc

David S. Miller authored May 16, 2014

Vlad Yasevich says:

====================
bridge: Non-promisc bridge ports support

This series adds functionality to the bridge device to enable
operations without setting all ports to promiscuous mode.

The basic concept is this.  The bridge keeps track of the ports
that support learning and flooding packets to unknown destinations.
We call these ports auto-discovery ports since they automatically
discover who is behind them through learning and flooding.

If flooding and learning are disabled via flags, then the port
requires static configuration to tell it which mac addresses
are behind it.  This is accomplished through adding of fdbs.
These fdbs should be static as dynamic fdbs can expire and systems
will become unreachable due to lack of flooding.

If the user marks all ports as needing static configuration then
we can safely make them non-promiscuous since we will know all the
information about them.

If the user leaves only 1 port as automatic, then we can mark
that port as not-promiscuous as well.  One could think of
this a edge relay similar to what's support by embedded switches
in SRIOV devices.  Since we have all the information about the
other ports, we can just program the mac addresses into the
single automatic port to receive all necessary traffic.
More information about this is patch 6.

In other cases, we keep all ports promiscuous as before.

There are some other cases when promiscuous mode has to be turned
back on.  One is when the bridge itself if placed in promiscuous
mode (user sets promisc flag).  The other is if vlan filtering is
turned off.  Since this is the default configuration, the default
bridge operation is not changed.

Changes since v2:
 - White space and spelling fixes from Michael Tsirkin
 - Squash patches 6, 7 and 8 to prevent bisect breakage.

Changes since v1:
 - Address issues rasied by Stephen Heminger
 - Address initializer comments raised by Sergey Shtylyov
 - Rebased recent net-next.

Changes since rfc v2:
 - Better description of in the commit logs
 - Leave port in promiscuous mode if IFF_UNICAST_FLT is disabled on the
   device.
 - Fix issue with flag masking
 - Rework patch ordering a bit.

Changes since rfc v1:
 - Removed private list.  We now traverse the fdb hashtable itself
   to write necessary addresses to the ports (Stephen's concern)
 - Add learning flag to the mask for flags that decides if the port
   is 'auto' or not (suggest by MST and Jamal).
 - Simplified tracking of such ports at the cost of a loop over all
   ports (suggested by MST)

I've played with quite a large number of ports and the current approach
seems to work fairly well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2770abcc

bridge: Automatically manage port promiscuous mode. · 2796d0c6

Vlad Yasevich authored May 16, 2014

There exist configurations where the administrator or another management
entity has the foreknowledge of all the mac addresses of end systems
that are being bridged together.

In these environments, the administrator can statically configure known
addresses in the bridge FDB and disable flooding and learning on ports.
This makes it possible to turn off promiscuous mode on the interfaces
connected to the bridge.

Here is why disabling flooding and learning allows us to control
promiscuity:
 Consider port X.  All traffic coming into this port from outside the
bridge (ingress) will be either forwarded through other ports of the
bridge (egress) or dropped.  Forwarding (egress) is defined by FDB
entries and by flooding in the event that no FDB entry exists.
In the event that flooding is disabled, only FDB entries define
the egress.  Once learning is disabled, only static FDB entries
provided by a management entity define the egress.  If we provide
information from these static FDBs to the ingress port X, then we'll
be able to accept all traffic that can be successfully forwarded and
drop all the other traffic sooner without spending CPU cycles to
process it.
 Another way to define the above is as following equations:
    ingress = egress + drop
 expanding egress
    ingress = static FDB + learned FDB + flooding + drop
 disabling flooding and learning we a left with
    ingress = static FDB + drop

By adding addresses from the static FDB entries to the MAC address
filter of an ingress port X, we fully define what the bridge can
process without dropping and can thus turn off promiscuous mode,
thus dropping packets sooner.

There have been suggestions that we may want to allow learning
and update the filters with learned addresses as well.  This
would require mac-level authentication similar to 802.1x to
prevent attacks against the hw filters as they are limited
resource.

Additionally, if the user places the bridge device in promiscuous mode,
all ports are placed in promiscuous mode regardless of the changes
to flooding and learning.

Since the above functionality depends on full static configuration,
we have also require that vlan filtering be enabled to take
advantage of this.  The reason is that the bridge has to be
able to receive and process VLAN-tagged frames and the there
are only 2 ways to accomplish this right now: promiscuous mode
or vlan filtering.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2796d0c6

bridge: Add addresses from static fdbs to non-promisc ports · 145beee8

Vlad Yasevich authored May 16, 2014

When a static fdb entry is created, add the mac address
from this fdb entry to any ports that are currently running
in non-promiscuous mode.  These ports need this data so that
they can receive traffic destined to these addresses.
By default ports start in promiscuous mode, so this feature
is disabled.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

145beee8

bridge: Introduce BR_PROMISC flag · f3a6ddf1

Vlad Yasevich authored May 16, 2014

Introduce a BR_PROMISC per-port flag that will help us track if the
current port is supposed to be in promiscuous mode or not.  For now,
always start in promiscuous mode.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3a6ddf1

bridge: Add functionality to sync static fdb entries to hw · 8db24af7

Vlad Yasevich authored May 16, 2014

Add code that allows static fdb entires to be synced to the
hw list for a specified port.  This will be used later to
program ports that can function in non-promiscuous mode.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8db24af7

bridge: Keep track of ports capable of automatic discovery. · e028e4b8

Vlad Yasevich authored May 16, 2014

By default, ports on the bridge are capable of automatic
discovery of nodes located behind the port.  This is accomplished
via flooding of unknown traffic (BR_FLOOD) and learning the
mac addresses from these packets (BR_LEARNING).
If the above functionality is disabled by turning off these
flags, the port requires static configuration in the form
of static FDB entries to function properly.

This patch adds functionality to keep track of all ports
capable of automatic discovery.  This will later be used
to control promiscuity settings.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e028e4b8

bridge: Turn flag change macro into a function. · 63c3a622

Vlad Yasevich authored May 16, 2014

Turn the flag change macro into a function to allow
easier updates and to reduce space.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63c3a622

net: pch_gbe depends on x86_32 · 4c30b525

Jean Delvare authored May 16, 2014

The pch_gbe driver is for a companion chip to the Intel Atom E600
series processors. These are 32-bit x86 processors so the driver is
only needed on X86_32.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c30b525

ip_tunnel: don't add tunnel twice · ee30ef4d

Duan Jiong authored May 15, 2014

When using command "ip tunnel add" to add a tunnel, the tunnel will be added twice,
through ip_tunnel_create() and ip_tunnel_update().

Because the second is unnecessary, so we can just break after adding tunnel
through ip_tunnel_create().
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee30ef4d

tools: bpf_jit_disasm: increase image buffer size · 9bb1a208

Alexei Starovoitov authored May 15, 2014

JITed seccomp filters can be quite large if they check a lot of syscalls
Simply increase buffer size
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bb1a208

tools: bpf_jit_disasm: ignore image address for disasm · ed4afd45

Alexei Starovoitov authored May 15, 2014

seccomp filters use kernel JIT image addresses, so bpf_jit_enable=2 prints
[ 20.146438] flen=3 proglen=82 pass=0 image=0000000000000000
[ 20.146442] JIT code: 00000000: 55 48 89 e5 48 81 ec 28 02 00 00 ...

ignore image address, so that seccomp filters can be disassembled
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed4afd45

Merge branch 'systemport-next' · e0a1272c

David S. Miller authored May 16, 2014

Florian Fainelli says:

====================
net: systemport: DMA and MAC fixes

This patch series contains a critical fix in how the DMA unmapping of packet
is done, as well as a less critical fix in how we disable the Ethernet MAC
RX/TX functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e0a1272c

net: systemport: wait for packet in umac_enable_set() · 00b91c69

Florian Fainelli authored May 15, 2014

When umac_enable_set() is used to disable the UniMAC receiver or
transmitter, we need to make sure that we wait for a full-sized packet
to be processed because the UniMAC hardware stops on a packet boundary,
not immediately.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00b91c69

net: systemport: fix dma_unmap_single() len · b1ff53e9

Florian Fainelli authored May 15, 2014

dma_unmap_single() was called with dma_unmap_len(cb, dma_len),
unfortunately we failed to assign this length field in
bcm_sysport_rx_refill() or bcm_sysport_alloc_rx_bufs() using
dma_unmap_len_set().

This causes packet contents corruption because are we not invoking the
cache invalidation routines with the proper length.  Fix this by using
the full RX buffer size (RX_BUF_LENGTH) because the mappings for the RX
bufers are created with that size.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1ff53e9

net/openvswitch: Use with RCU_INIT_POINTER(x, NULL) in vport-gre.c · 944df8ae

Monam Agarwal authored Mar 24, 2014

This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)
Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

944df8ae

openvswitch: Use TCP flags in the flow key for stats. · 88d73f6c

Jarno Rajahalme authored Mar 27, 2014

We already extract the TCP flags for the key, might as well use that
for stats.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

88d73f6c

openvswitch: Fix output of SCTP mask. · d92ab135

Jarno Rajahalme authored Mar 27, 2014

The 'output' argument of the ovs_nla_put_flow() is the one from which
the bits are written to the netlink attributes.  For SCTP we
accidentally used the bits from the 'swkey' instead.  This caused the
mask attributes to include the bits from the actual flow key instead
of the mask.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

d92ab135

openvswitch: Per NUMA node flow stats. · 63e7959c

Jarno Rajahalme authored Mar 27, 2014

Keep kernel flow stats for each NUMA node rather than each (logical)
CPU.  This avoids using the per-CPU allocator and removes most of the
kernel-side OVS locking overhead otherwise on the top of perf reports
and allows OVS to scale better with higher number of threads.

With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master.  Tested with
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace.  The IP
addresses are still wildcarded, so the kernel flows are not considered
as exact match 5-tuple flows.  This type of flows can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in OVS userspace flow classifier.

Perf results for this test (master):

Events: 305K cycles
+   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
+   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
+   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
+   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
+   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
...

And after this patch:

Events: 356K cycles
+   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
+   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
+   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
+   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
+   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
...

There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results), when testing for kernel module throughput (with no
userspace activity, handful of kernel flows).

On flow setup, a single stats instance is allocated (for the NUMA node
0).  As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated.  This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency.  If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated.  This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

63e7959c

openvswitch: Remove 5-tuple optimization. · 23dabf88

Jarno Rajahalme authored Mar 27, 2014

The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch.  Remove it first to make the changes easier to
grasp.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

23dabf88

openvswitch: Use ether_addr_copy · 8c63ff09

Joe Perches authored Feb 18, 2014

It's slightly smaller/faster for some architectures.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

8c63ff09

openvswitch: flow_netlink: Use pr_fmt to OVS_NLERR output · 2235ad1c

Joe Perches authored Feb 03, 2014

Add "openvswitch: " prefix to OVS_NLERR output
to match the other OVS_NLERR output of datapath.c
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

2235ad1c