Commits · 00b2034029840ddad255352c46db0ae21342ce56 · Kirill Smelkov / linux

04 May, 2016 30 commits

gre: remove superfluous pskb_may_pull · 00b20340

Jiri Benc authored May 03, 2016

The call to gre_parse_header is either followed by iptunnel_pull_header, or
in the case of ICMP error path, the actual header is not accessed at all.

In the first case, iptunnel_pull_header will call pskb_may_pull anyway and
it's pointless to do it twice. The only difference is what call will fail
with what error code but the net effect is still the same in all call sites.

In the second case, pskb_may_pull is pointless, as skb->data is at the outer
IP header and not at the GRE header.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00b20340

Merge branch 'mlx5-sriov-updates' · 3f7496aa

David S. Miller authored May 04, 2016

Saeed Mahameed says:

====================
Mellanox 100G ethernet SRIOV Upgrades

This series introduces new features and upgrades for mlx5 etherenet SRIOV,
while the first patch provides a bug fixes for a compilation issue introduced
buy the previous aRFS series for when CONFIG_RFS_ACCEL=y and CONFIG_MLX5_CORE_EN=n.

Changes from V0:
    - 1st patch: Don't add a new Kconfig flag.  Instead, compile out en_arfs.c \
contents when CONFIG_RFS_ACCEL=n

SRIOV upgrades:
    - Use synchronize_irq instead of the vport events spin_lock
    - Fix memory leak in error flow
    - Added full VST support
    - Spoofcheck support
    - Trusted VF promiscuous and allmulti support

VST and Spoofcheck in details:
    - Adding Low level firmware commands support for creating ACLs
     (Access Control Lists) Flow tables.  ACLs are regular flow tables with
     the only exception that they are bound to a specific e-Switch vport (VF)
     and they can be one of two types
        > egress ACL: filters traffic going from e-Switch to VF.
        > ingress ACL: filters traffic going from VF to e-Switch.
    - Ingress/Egress ACLs (per vport) for VF VST mode filtering.
    - Ingress/Egress ACLs (per vport) for VF spoofcheck filtering.
    - Ingress/Egress ACLs (per vport) configuration:
        > Created only when at least one of (VST, spoofcheck) is configured.
	> if (!spoofchk && !vst) allow all traffic.  i.e. no ACLs.
        > if (spoofchk && vst) allow only untagged traffic with smac=original mac \
                sent from the VF.
        > if (spoofchk && !vst) allow only traffic with smac=original mac sent from \
the VF.  > if (!spoofchk && vst) allow only untagged traffic.

Trusted VF promiscuous and allmulti support in details:
    - Added two flow groups for allmulti and promisc VFs to the e-Switch FDB table
        > Allmulti group: One rule that forwards any mcast traffic coming from
                          either uplink or VFs/PF vports.
        > Promisc group: One rule that forwards all unmatched traffic coming from \
                uplink.
    - Add vport context change event handling for promisc and allmulti
      If VF is trusted respect the request and:
        > if allmulti request: add the vport to the allmulti group.
          and to all other L2 mcast address in the FDB table.
        > if promisc request: add the vport to the promisc group.
        > Note: A promisc VF can only see traffic that was not explicitly matched to
                or requested by any other VF.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3f7496aa

net/mlx5: E-Switch, Implement trust vf ndo · 1edc57e2

Mohamad Haj Yahia authored May 03, 2016

- Add support to configure trusted vf attribute through trust_vf_ndo.

- Upon VF trust setting change we update vport context to refresh
 allmulti/promisc or any trusted vf attributes that we didn't trust the
 VF for before.

- Lock the eswitch state lock on vport event in order to synchronise the
 vport context updates , this will prevent contention with vport trust
 setting change which will trigger vport mac list update.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1edc57e2

net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling · a35f71f2

Mohamad Haj Yahia authored May 03, 2016

Add promisc_change as a trigger to vport context change event.
Add set vport promisc/allmulti functions to add vport to promiscuous
flowtable rules.
Upon promisc/allmulti rx mode vf request add the vport to
the relevant promiscuous group (Allmulti/Promisc group) so the relevant
traffic will be forwarded to it.
Upon allmulti vf request add the vport to each existing multicast fdb
rule.
Upon adding/removing mcast address from a vport, update all other
allmulti vports.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a35f71f2

net/mlx5: E-Switch, Add promiscuous and allmulti FDB flowtable groups · 78a9199b

Mohamad Haj Yahia authored May 03, 2016

Add promiscuous and allmulti steering groups in FDB table.
Besides the full match L2 steering rules group, we added
two more groups to catch the "miss" rules traffic:
* Allmulti group: One rule that forwards any mcast traffic coming from
either uplink or VFs/PF vports
* Promisc group: One rule that forwards all unmatched traffic coming
from uplink.

Needed for downstream privileged VF promisc and allmulti support.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78a9199b

net/mlx5: E-Switch, Use vport event handler for vport cleanup · 586cfa7f

Mohamad Haj Yahia authored May 03, 2016

Remove the usage of explicit cleanup function and use existing vport
change handler. Calling vport change handler while vport
is disabled will cleanup the vport resources.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

586cfa7f

net/mlx5: E-Switch, Enable/disable ACL tables on demand · 01f51f22

Mohamad Haj Yahia authored May 03, 2016

Enable ingress/egress ACL tables only when we need to configure ACL
rules.
Disable ingress/egress ACL tables once all ACL rules are removed.

All VF outgoing/incoming traffic need to go through the ingress/egress ACL
tables.
Adding/Removing these tables on demand will save unnecessary hops in the
flow steering when the ACL tables are empty.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01f51f22

net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk · f942380c

Mohamad Haj Yahia authored May 03, 2016

Configure ingress and egress vport ACL rules according to spoofchk
admin parameters.

Ingress ACL flow table rules:
if (!spoofchk && !vst) allow all traffic.
else :
1) one of the following rules :
* if (spoofchk && vst) allow only untagged traffic with smac=original
mac sent from the VF.
* if (spoofchk && !vst) allow only traffic with smac=original mac sent
from the VF.
* if (!spoofchk && vst) allow only untagged traffic.
2) drop all traffic that didn't hit #1.

Add support for set vf spoofchk ndo.

Add non zero mac validation in case of spoofchk to set mac ndo:
when setting new mac we need to validate that the new mac is
not zero while the spoofchk is on because it is illegal
combination.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f942380c

net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode · dfcb1ed3

Mohamad Haj Yahia authored May 03, 2016

Configure ingress and egress vport ACL rules according to
vlan and qos admin parameters.

Ingress ACL flow table rules:
1) drop any tagged packet sent from the VF
2) allow other traffic (default behavior)

Egress ACL flow table rules:
1) allow only tagged traffic with vlan_tag=vst_vid.
2) drop other traffic.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfcb1ed3

net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs · 5742df0f

Mohamad Haj Yahia authored May 03, 2016

Create egress/ingress ACLs per VF vport at vport enable.

Ingress ACL:
	- one flow group to drop all tagged traffic in VST mode.

Egress ACL:
	- one flow group that allows only untagged traffic with
          smac that is equals to the original mac (anti-spoofing).
        - one flow group that allows only untagged traffic.
        - one flow group that allows only  smac that is equals
          to the original mac (anti-spoofing).
        (note: only one of the above group has active rule)
	- star rule will be used to drop all other traffic.

By default no rules are generated, unless VST is explicitly requested.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5742df0f

net/mlx5: E-Switch, Fix error flow memory leak · 761e205b

Mohamad Haj Yahia authored May 03, 2016

Fix memory leak in case query nic vport command failed.

Fixes: 81848731 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

761e205b

net/mlx5: E-Switch, Replace vport spin lock with synchronize_irq() · 831cae1d

Mohamad Haj Yahia authored May 03, 2016

Vport spin lock can be replaced with synchronize_irq() in the right
place, this will remove the need of locking inside irq context.
Locking in esw_enable_vport is not required since vport events are yet
to be enabled, and at esw_disable_vport it is sufficient to
synchronize_irq() to guarantee no further vport events handlers will be
scheduled.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

831cae1d

net/mlx5: Flow steering, Add vport ACL support · efdc810b

Mohamad Haj Yahia authored May 03, 2016

Update the relevant flow steering device structs and commands to
support vport.
Update the flow steering core API to receive vport number.
Add ingress and egress ACL flow table name spaces.
Add ACL flow table support:
* ACL (Access Control List) flow table is a table that contains
only allow/drop steering rules.

* We have two types of ACL flow tables - ingress and egress.

* ACLs handle traffic sent from/to E-Switch FDB table, Ingress refers to
traffic sent from Vport to E-Switch and Egress refers to traffic sent
from E-Switch to vport.

* Ingress ACL flow table allow/drop rules is checked against traffic
sent from VF.

* Egress ACL flow table allow/drop rules is checked against traffic sent
to VF.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efdc810b

net/mlx5e: Fix aRFS compilation dependency · fbc4a69b

Maor Gottlieb authored May 03, 2016

en_arfs.o should be compiled only if both CONFIG_MLX5_CORE_EN
and CONFIG_RFS_ACCEL are enabled. en_arfs calls to rps_may_expire_flow
which is compiled only if CONFIG_RFS_ACCEL is defined.

Move en_arfs.o compilation dependency to be under CONFIG_MLX5_CORE_EN
and wrap the en_arfs.c content with ifdef of CONFIG_RFS_ACCEL.

Fixes: 1cabe6b0 ('net/mlx5e: Create aRFS flow tables')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbc4a69b

Merge branch 'cxgb4-mbox' · 27540247

David S. Miller authored May 04, 2016

Hariprasad Shenai says:

====================
cxgb4: mbox enhancements for cxgb4

This patch series checks for firmware errors when we are waiting for
mbox response in a loop and breaks out. When negative timeout is passed
to mailbox code, don't sleep. Negative timeout is passed only from
interrupt context.

This patch series has been created against net-next tree and includes
patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

27540247

cxgb4: Check for firmware errors in the mailbox command loop · f358738b

Hariprasad Shenai authored May 03, 2016

Check for firmware errors in the mailbox command loop and report
them differently rather than simply timing out when the firmware goes
belly up.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f358738b

cxgb4: Don't sleep when mbox cmd is issued from interrupt context · 5a20f5cf

Hariprasad Shenai authored May 03, 2016

When link goes down, from the interrupt handler DCB priority for the
Tx queues needs to be unset. We issue mbox command to unset the Tx queue
priority with negative timeout. In t4_wr_mbox_meat_timeout() do not sleep
when negative timeout is passed, since it is called from interrupt context.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a20f5cf

drivers: net: emac: add Atheros AR8035 phy initialization code · ecc9120e

Christian Lamparter authored May 03, 2016

This patch adds the phy initialization code for Qualcomm
Atheros AR8035 phy. This configuration is found in the
Cisco Meraki MR24.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecc9120e

Merge branch 'tunnel-features-and-gso-partial' · 53da5b47

David S. Miller authored May 04, 2016

Alexander Duyck says:

====================
Fix Tunnel features and enable GSO partial for several drivers

This patch series is meant to allow us to get the best performance possible
for Mellanox ConnectX-3/4 and Broadcom NetXtreme-C/E adapters in terms of
VXLAN and GRE tunnels.

The first 3 patches address issues I found in regards to GSO_PARTIAL and
TSO_MANGLEID.

The next 4 patches go through and enable GSO_PARTIAL for VXLAN tunnels that
have an outer checksum enabled, and then enable IPv6 support where I can.
One outstanding issue is that I wasn't able to get offloads working with
outer IPv6 headers on mlx4.  However that wasn't a feature that was enabled
before so it isn't technically a regression, however I believe Engineers
from Mellanox said they would look into it since they thought it should be
supported.

The last patch enables GSO_PARTIAL for VXLAN and GRE tunnels on the bnxt
driver.  One piece of feedback I received on the patch was that the
hardware has globally set IPv6 UDP tunnels to always have the checksum
field computed.  I plan to work with Broadcom to get that addressed so that
we only populate the checksum field if it was requested by the network
stack.

v2: Rebased patches off of latest changes to the mlx4/mlx5 drivers.
    Added bnxt driver patch as I received feedback on the RFC.
v3: Moved 2 patches into series for net as they were generic fixes.
    Added patch to disable GSO partial if frame is less than 2x size of MSS

    There are outstanding issues as called out above that need to be
    addressed, however they were present before these patches so it isn't
    as if they introduce a regression.  In addition gains can be easily
    seen so there should be no issue with applying the driver patches while
    the IPv6 mlx4_en and bnxt issues are being researched.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

53da5b47

bnxt: Add support for segmentation of tunnels with outer checksums · 152971ee

Alexander Duyck authored May 02, 2016

This patch assumes that the bnxt hardware will ignore existing IPv4/v6
header fields for length and checksum as well as the length and checksum
fields for outer UDP and GRE headers.

I have been told by Michael Chan that this is working. Though this might
be somewhat redundant for IPv6 as they are forcing the checksum to be
computed for all IPv6 frames that are offloaded. A follow-up patch may be
necessary in order to fix this as it is essentially mangling the outer IPv6
headers to add a checksum where none was requested.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

152971ee

net/mlx5e: Fix IPv6 tunnel checksum offload · f3ed653c

Alexander Duyck authored May 02, 2016

The mlx5 driver exposes support for TSO6 but not IPv6 csum for hardware
encapsulated tunnels. This leads to issues as it triggers warnings in
skb_checksum_help as it ends up being called as we report supporting the
segmentation but not the checksumming for IPv6 frames.

This patch corrects that and drops 2 features that don't actually need to
be supported in hw_enc_features since they are Rx features and don't
actually impact anything by being present in hw_enc_features.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3ed653c

net/mlx5e: Add support for UDP tunnel segmentation with outer checksum offload · b49663c8

Alexander Duyck authored May 02, 2016

This patch assumes that the mlx5 hardware will ignore existing IPv4/v6
header fields for length and checksum as well as the length and checksum
fields for outer UDP headers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b49663c8

net/mlx4_en: Add support for inner IPv6 checksum offloads and TSO · 09067122

Alexander Duyck authored May 02, 2016

>From what I can tell the ConnectX-3 will support an inner IPv6 checksum and
segmentation offload, however it cannot support outer IPv6 headers.  This
assumption is based on the fact that I could see the checksum being
offloaded for inner header on IPv4 tunnels, but not on IPv6 tunnels.

For this reason I am adding the feature to the hw_enc_features and adding
an extra check to the features_check call that will disable GSO and
checksum offload in the case that the encapsulated frame has an outer IP
version of that is not 4.  The check in mlx4_en_features_check could be
removed if at some point in the future a fix is found that allows the
hardware to offload segmentation/checksum on tunnels with an outer IPv6
header.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09067122

net/mlx4_en: Add support for UDP tunnel segmentation with outer checksum offload · 3c9346b2

Alexander Duyck authored May 02, 2016

This patch assumes that the mlx4 hardware will ignore existing IPv4/v6
header fields for length and checksum as well as the length and checksum
fields for outer UDP headers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c9346b2

net: Fix netdev_fix_features so that TSO_MANGLEID is only available with TSO · b1dc497b

Alexander Duyck authored May 02, 2016

This change makes it so that we will strip the TSO_MANGLEID bit if TSO is
not present.  This way we will also handle ECN correctly of TSO is not
present.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1dc497b

gso: Only allow GSO_PARTIAL if we can checksum the inner protocol · 36c98382

Alexander Duyck authored May 02, 2016

This patch addresses a possible issue that can occur if we get into any odd
corner cases where we support TSO for a given protocol but not the checksum
or scatter-gather offload.  There are few drivers floating around that
setup their tunnels this way and by enforcing the checksum piece we can
avoid mangling any frames.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

36c98382

gso: Do not perform partial GSO if number of partial segments is 1 or less · d7fb5a80

Alexander Duyck authored May 02, 2016

In the event that the number of partial segments is equal to 1 we don't
really need to perform partial segmentation offload.  As such we should
skip multiplying the MSS and instead just clear the partial_segs value
since it will not provide any gain to advertise the frame as being GSO when
it is a single frame.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7fb5a80

gre: change gre_parse_header to return the header length · f132ae7c

Jiri Benc authored May 03, 2016

It's easier for gre_parse_header to return the header length instead of
filing it into a parameter. That way, the callers that don't care about the
header length can just check whether the returned value is lower than zero.

In gre_err, the tunnel header must not be pulled. See commit b7f8fe25
("gre: do not pull header in ICMP error processing") for details.

This patch reduces the conflict between the mentioned commit and commit
95f5c64c ("gre: Move utility functions to common headers").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f132ae7c

tcp: guarantee forward progress in tcp_sendmsg() · d4011239

Eric Dumazet authored May 02, 2016

Under high rx pressure, it is possible tcp_sendmsg() never has a
chance to allocate an skb and loop forever as sk_flush_backlog()
would always return true.

Fix this by calling sk_flush_backlog() only if one skb had been
allocated and filled before last backlog check.

Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4011239

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · cba65321

David S. Miller authored May 04, 2016

Conflicts:
	net/ipv4/ip_gre.c

Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>

cba65321

03 May, 2016 10 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7391daf2

Linus Torvalds authored May 03, 2016

Pull networking fixes from David Miller:
 "Some straggler bug fixes:

   1) Batman-adv DAT must consider VLAN IDs when choosing candidate
      nodes, from Antonio Quartulli.

   2) Fix botched reference counting of vlan objects and neigh nodes in
      batman-adv, from Sven Eckelmann.

   3) netem can crash when it sees GSO packets, the fix is to segment
      then upon ->enqueue.  Fix from Neil Horman with help from Eric
      Dumazet.

   4) Fix VXLAN dependencies in mlx5 driver Kconfig, from Matthew
      Finlay.

   5) Handle VXLAN ops outside of rcu lock, via a workqueue, in mlx5,
      since it can sleep.  Fix also from Matthew Finlay.

   6) Check mdiobus_scan() return values properly in pxa168_eth and macb
      drivers.  From Sergei Shtylyov.

   7) If the netdevice doesn't support checksumming, disable
      segmentation.  From Alexandery Duyck.

   8) Fix races between RDS tcp accept and sending, from Sowmini
      Varadhan.

   9) In macb driver, probe MDIO bus before we register the netdev,
      otherwise we can try to open the device before it is really ready
      for that.  Fix from Florian Fainelli.

  10) Netlink attribute size for ILA "tunnels" not calculated properly,
      fix from Nicolas Dichtel"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  ipv6/ila: fix nlsize calculation for lwtunnel
  net: macb: Probe MDIO bus before registering netdev
  RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.
  RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock
  vxlan: Add checksum check to the features check function
  net: Disable segmentation if checksumming is not supported
  net: mvneta: Remove superfluous SMP function call
  macb: fix mdiobus_scan() error check
  pxa168_eth: fix mdiobus_scan() error check
  net/mlx5e: Use workqueue for vxlan ops
  net/mlx5e: Implement a mlx5e workqueue
  net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue
  net/mlx5: Unmap only the relevant IO memory mapping
  netem: Segment GSO packets on enqueue
  batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node
  batman-adv: Fix reference counting of vlan object for tt_local_entry
  batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event
  batman-adv: fix DAT candidate selection (must use vid)

7391daf2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 610603a5

Linus Torvalds authored May 03, 2016

Pull fuse fixes from Miklos Szeredi:
 "Fix a regression and update the MAINTAINERS entry for fuse"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: update mailing list in MAINTAINERS
  fuse: Fix return value from fuse_get_user_pages()

610603a5

ipv6/ila: fix nlsize calculation for lwtunnel · 79e8dc8b

Nicolas Dichtel authored May 03, 2016

The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.

Fixes: 65d7ab8d ("net: Identifier Locator Addressing module")
CC: Tom Herbert <tom@herbertland.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79e8dc8b

ipv6: add new struct ipcm6_cookie · 26879da5

Wei Wang authored May 02, 2016

In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
This is not a good practice and makes it hard to add new parameters.
This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
ipv4 and include the above mentioned variables. And we only pass the
pointer to this structure to corresponding functions. This makes it easier
to add new parameters in the future and makes the function cleaner.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26879da5

net: macb: Probe MDIO bus before registering netdev · cf669660

Florian Fainelli authored May 02, 2016

The current sequence makes us register for a network device prior to
registering and probing the MDIO bus which could lead to some unwanted
consequences, like a thread of execution calling into ndo_open before
register_netdev() returns, while the MDIO bus is not ready yet.

Rework the sequence to register for the MDIO bus, and therefore attach
to a PHY prior to calling register_netdev(), which implies reworking the
error path a bit.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf669660

Merge branch 'rds-fixes' · b365d955

David S. Miller authored May 03, 2016

Sowmini Varadhan says:

====================
RDS: TCP: sychronization during connection startup

This patch series ensures that the passive (accept) side of the
TCP connection used for RDS-TCP is correctly synchronized with
any concurrent active (connect) attempts for a given pair of peers.

Patch 1 in the series makes sure that the t_sock in struct
rds_tcp_connection is only reset after any threads in rds_tcp_xmit
have completed (otherwise a null-ptr deref may be encountered).
Patch 2 synchronizes rds_tcp_accept_one() with the rds_tcp*connect()
path.

v2: review comments from Santosh Shilimkar, other spelling corrections
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b365d955

RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock. · bd7c5f98

Sowmini Varadhan authored May 02, 2016

An arbitration scheme for duelling SYNs is implemented as part of
commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
involved will arrive at the same arbitration decision. However, this
needs to be synchronized with an outgoing SYN to be generated by
rds_tcp_conn_connect(). This commit achieves the synchronization
through the t_conn_lock mutex in struct rds_tcp_connection.

The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
not already UP (an UP would indicate that rds_tcp_accept_one() has
completed 3WH, so no SYN needs to be generated).

Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
acquiring the t_conn_lock mutex. The only acceptable states (to
allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
outgoing SYN before we saw incoming SYN).
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd7c5f98

RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock · eb192840

Sowmini Varadhan authored May 02, 2016

There is a race condition between rds_send_xmit -> rds_tcp_xmit
and the code that deals with resolution of duelling syns added
by commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()").

Specifically, we may end up derefencing a null pointer in rds_send_xmit
if we have the interleaving sequence:
           rds_tcp_accept_one                  rds_send_xmit

                                             conn is RDS_CONN_UP, so
    					 invoke rds_tcp_xmit

                                             tc = conn->c_transport_data
        rds_tcp_restore_callbacks
            /* reset t_sock */
    					 null ptr deref from tc->t_sock

The race condition can be avoided without adding the overhead of
additional locking in the xmit path: have rds_tcp_accept_one wait
for rds_tcp_xmit threads to complete before resetting callbacks.
The synchronization can be done in the same manner as rds_conn_shutdown().
First set the rds_conn_state to something other than RDS_CONN_UP
(so that new threads cannot get into rds_tcp_xmit()), then wait for
RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any
threads in rds_tcp_xmit are done.

Fixes: 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eb192840

net: add __sock_wfree() helper · 1d2077ac

Eric Dumazet authored May 02, 2016

Hosts sending lot of ACK packets exhibit high sock_wfree() cost
because of cache line miss to test SOCK_USE_WRITE_QUEUE

We could move this flag close to sk_wmem_alloc but it is better
to perform the atomic_sub_and_test() on a clean cache line,
as it avoid one extra bus transaction.

skb_orphan_partial() can also have a fast track for packets that either
are TCP acks, or already went through another skb_orphan_partial()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d2077ac

Merge branch 'tunnel-csum-and-sg-offloads' · 42c8819b

David S. Miller authored May 03, 2016

Alexander Duyck says:

====================
Fixes for tunnel checksum and segmentation offloads

This patch series is a subset of patches I had submitted for net-next.  I
plan to drop these two patches from the v3 of "Fix Tunnel features and
enable GSO partial for several drivers" and I am instead submitting them
for net since these are truly fixes and likely will need to be backported
to stable branches.

This series addresses 2 specific issues.  The first is that we could
request TSO on a v4 inner header while not supporting checksum offload of
the outer IPv6 header.  The second is that we could request an IPv6 inner
checksum offload without validating that we could actually support an inner
IPv6 checksum offload.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

42c8819b