Commits · c74bfbdba0e8d056e4ba579a666b5cdb8ec3cd35 · nexedi / linux

19 Jul, 2016 2 commits

sctp: load transport header after sk_filter · c74bfbdb

Willem de Bruijn authored Jul 16, 2016

Do not cache pointers into the skb linear segment across sk_filter.
The function call can trigger pskb_expand_head.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c74bfbdb

net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int · 0564bf0a

Konstantin Khlebnikov authored Jul 16, 2016

In kernel HTB keeps tokens in signed 64-bit in nanoseconds. In netlink
protocol these values are converted into pshed ticks (64ns for now) and
truncated to 32-bit. In struct tc_htb_xstats fields "tokens" and "ctokens"
are declared as unsigned 32-bit but they could be negative thus tool 'tc'
prints them as signed. Big values loose higher bits and/or become negative.

This patch clamps tokens in xstat into range from INT_MIN to INT_MAX.
In this way it's easier to understand what's going on here.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

0564bf0a

17 Jul, 2016 4 commits

net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata · 8e6ce7eb

Florian Fainelli authored Jul 15, 2016

The label lio_xmit_failed is used 3 times through liquidio_xmit() but it
always makes a call to dma_unmap_single() using potentially
uninitialized variables from "ndata" variable. Out of the 3 gotos, 2 run
after ndata has been initialized, and had a prior dma_map_single() call.

Fix this by adding a new error label: lio_xmit_dma_failed which does
this dma_unmap_single() and then processed with the lio_xmit_failed
fallthrough.

Fixes: f21fb3ed ("Add support of Cavium Liquidio ethernet adapters")
Reported-by: coverity (CID 1309740)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e6ce7eb

net: nb8800: Fix SKB leak in nb8800_receive() · ea6ff112

Florian Fainelli authored Jul 15, 2016

In case nb8800_receive() fails to allocate a fragment, we would leak the
SKB freshly allocated and just return, instead, free it.

Reported-by: coverity (CID 1341750)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea6ff112

et131x: Fix logical vs bitwise check in et131x_tx_timeout() · de702da7

Florian Fainelli authored Jul 15, 2016

We should be using a logical check here instead of a bitwise operation
to check if the device is closed already in et131x_tx_timeout().

Reported-by: coverity (CID 146498)
Fixes: 38df6492 ("et131x: Add PCIe gigabit ethernet driver et131x to drivers/net")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de702da7

vlan: use a valid default mtu value for vlan over macsec · 18d3df3e

Paolo Abeni authored Jul 14, 2016

macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18d3df3e

15 Jul, 2016 14 commits

net: bgmac: Fix infinite loop in bgmac_dma_tx_add() · e86663c4

Florian Fainelli authored Jul 15, 2016

Nothing is decrementing the index "i" while we are cleaning up the
fragments we could not successful transmit.

Fixes: 9cde9450 ("bgmac: implement scatter/gather support")
Reported-by: coverity (CID 1352048)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e86663c4

Merge branch 'mlxsw-fixes' · f57ec188

David S. Miller authored Jul 15, 2016

Jiri Pirko says:

====================
mlxsw: Couple of fixes

Couple of fixes for mlxsw driver from Ido.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f57ec188

mlxsw: spectrum: Prevent invalid ingress buffer mapping · 11719a58

Ido Schimmel authored Jul 15, 2016

Packets entering the switch are mapped to a Switch Priority (SP)
according to their PCP value (untagged frames are mapped to SP 0).

The packets are classified to a priority group (PG) buffer in the port's
headroom according to their SP.

The switch maintains another mapping (SP to IEEE priority), which is
used to generate PFC frames for lossless PGs. This mapping is
initialized to IEEE = SP % 8.

Therefore, when mapping SP 'x' to PG 'y' we create a situation in which
an IEEE priority is mapped to two different PGs:

IEEE 'x' ---> SP 'x' ---> PG 'y'
IEEE 'x' ---> SP 'x + 8' ---> PG '0' (default)

Which is invalid, as a flow can use only one PG buffer.

Fix this by mapping both SP 'x' and 'x + 8' to the same PG buffer.

Fixes: 8e8dfe9f ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11719a58

mlxsw: spectrum: Prevent overwrite of DCB capability fields · 28f5275e

Ido Schimmel authored Jul 15, 2016

The number of supported traffic classes that can have ETS and PFC
simultaneously enabled is not subject to user configuration, so make
sure we always initialize them to the correct values following a set
operation.

Fixes: 8e8dfe9f ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Fixes: d81a6bdb ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28f5275e

mlxsw: spectrum: Don't emit errors when PFC is disabled · 7347180d

Ido Schimmel authored Jul 15, 2016

We can't have PAUSE frames and PFC both enabled on the same port, but
the fact that ieee_setpfc() was called doesn't necessarily mean PFC is
enabled.

Only emit errors when PAUSE frames and PFC are enabled simultaneously.

Fixes: d81a6bdb ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7347180d

mlxsw: spectrum: Indicate support for autonegotiation · c3f15768

Ido Schimmel authored Jul 15, 2016

The device supports link autonegotiation, so let the user know about it
by indicating support via ethtool ops.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3f15768

mlxsw: spectrum: Force link training according to admin state · 6277d46b

Ido Schimmel authored Jul 15, 2016

When setting a new speed we need to disable and enable the port for the
changes to take effect. We currently only do that if the operational
state of the port is up. However, setting a new speed following link
training failure will require us to explicitly set the port down and then
up.

Instead, disable and enable the port based on its administrative state.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6277d46b

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · dd79cf7d

David S. Miller authored Jul 15, 2016

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2016-07-14

This series contains fixes to i40e and ixgbe.

Alex fixes issues found in i40e_rx_checksum() which was broken, where the
checksum was being returned valid when it was not.

Kiran fixes a bug which was found when we abruptly remove a cable which
caused a panic.  Set the VSI broadcast promiscuous mode during VSI add
sequence and prevents adding MAC filter if specified MAC address is
broadcast.

Paolo Abeni fixes a bug by returning the actual work done, capped to
weight - 1, since the core doesn't allow to return the full budget when
the driver modifies the NAPI status.

Guilherme Piccoli fixes an issue where the q_vector initialization
routine sets the affinity _mask of a q_vector based on v_idx value.
This means a loop iterates on v_idx, which is an incremental value, and
the cpumask is created based on this value.  This is a problem in
systems with multiple logical CPUs per core (like in SMT scenarios).
Changed the way q_vector's affinity_mask is created to resolve the issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

dd79cf7d

r8152: add MODULE_VERSION · c961e877

Grant Grundler authored Jul 14, 2016

ethtool -i provides a driver version that is hard coded.
Export the same value via "modinfo".
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c961e877

tcp: enable per-socket rate limiting of all 'challenge acks' · 083ae308

Jason Baron authored Jul 14, 2016

The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:

commit f2b2c582 ("tcp: mitigate ACK loops for connections as tcp_sock")

And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.

Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevents a single connections from consuming the entire challenge ack
quota.

It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This a follow-up to:
Subject: tcp: make challenge acks less predictable

Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

083ae308

i40e: use valid online CPU on q_vector initialization · 7f6c5539

Guilherme G. Piccoli authored Jun 27, 2016

Currently, the q_vector initialization routine sets the affinity_mask
of a q_vector based on v_idx value. Meaning a loop iterates on v_idx,
which is an incremental value, and the cpumask is created based on
this value.

This is a problem in systems with multiple logical CPUs per core (like in
SMT scenarios). If we disable some logical CPUs, by turning SMT off for
example, we will end up with a sparse cpu_online_mask, i.e., only the first
CPU in a core is online, and incremental filling in q_vector cpumask might
lead to multiple offline CPUs being assigned to q_vectors.

Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.

In general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., C*(N-1)  where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.

This patch changes the way q_vector's affinity_mask is created: it iterates
on v_idx, but consumes the CPU index from the cpu_online_mask instead of
just using the v_idx incremental value.

No functional changes were introduced.
Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

7f6c5539

ixgbe: napi_poll must return the work done · 4b732cd4

Paolo Abeni authored Jun 15, 2016

Currently the function ixgbe_poll() returns 0 when it clean completely
the rx rings, but this foul budget accounting in core code.
Fix this returning the actual work done, capped to weight - 1, since
the core doesn't allow to return the full budget when the driver modifies
the napi status
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

4b732cd4

i40e: enable VSI broadcast promiscuous mode instead of adding broadcast filter · f6bd0962

Kiran Patil authored Jun 20, 2016

This patch sets VSI broadcast promiscuous mode during VSI add sequence
and prevents adding MAC filter if specified MAC address is broadcast.

Change-ID: Ia62251fca095bc449d0497fc44bec3a5a0136773
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

f6bd0962

i40e/i40evf: Fix i40e_rx_checksum · 858296c8

Alexander Duyck authored Jun 14, 2016

There are a couple of issues I found in i40e_rx_checksum while doing some
recent testing. As a result I have found the Rx checksum logic is pretty
much broken and returning that the checksum is valid for tunnels in cases
where it is not.

First the inner types are not the correct values to use to test for if a
tunnel is present or not. In addition the inner protocol types are not a
bitmask as such performing an OR of the values doesn't make sense. I have
instead changed the code so that the inner protocol types are used to
determine if we report CHECKSUM_UNNECESSARY or not. For anything that does
not end in UDP, TCP, or SCTP it doesn't make much sense to report a
checksum offload since it won't contain a checksum anyway.

This leaves us with the need to set the csum_level based on some value.
For that purpose I am using the tunnel_type field. If the tunnel type is
GRENAT or greater then this means we have a GRE or UDP tunnel with an inner
header. In the case of GRE or UDP we will have a possible checksum present
so for this reason it should be safe to set the csum_level to 1 to indicate
that we are reporting the state of the inner header.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

858296c8

14 Jul, 2016 1 commit

bonding: set carrier off for devices created through netlink · 005db31d

Beniamino Galvani authored Jul 13, 2016

Commit e826eafa ("bonding: Call netif_carrier_off after
register_netdevice") moved netif_carrier_off() from bond_init() to
bond_create(), but the latter is called only for initial default
devices and ones created through sysfs:

 $ modprobe bonding
 $ echo +bond1 > /sys/class/net/bonding_masters
 $ ip link add bond2 type bond
 $ grep "MII Status" /proc/net/bonding/*
 /proc/net/bonding/bond0:MII Status: down
 /proc/net/bonding/bond1:MII Status: down
 /proc/net/bonding/bond2:MII Status: up

Ensure that carrier is initially off also for devices created through
netlink.
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

005db31d

13 Jul, 2016 9 commits

Merge branch 'sk_filter-trim-limit' · 790e5ef5

David S. Miller authored Jul 13, 2016

Willem de Bruijn says:

====================
limit sk_filter trim to payload

Sockets can apply a filter to incoming packets to drop or trim them.
Fix two codepaths that call skb_pull/__skb_pull after sk_filter
without checking for packet length.

Reading beyond skb->tail after trimming happens in more codepaths, but
safety of reading in the linear segment is based on minimum allocation
size (MAX_HEADER, GRO_MAX_HEAD, ..).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

790e5ef5

dccp: limit sk_filter trim to payload · 4f0c40d9

Willem de Bruijn authored Jul 12, 2016

Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.

A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.

Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.

Fixes: 7c657876 ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f0c40d9

rose: limit sk_filter trim to payload · f4979fce

Willem de Bruijn authored Jul 12, 2016

Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.

Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.

Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4979fce

Merge branch 'mlx5-fixes' · 22cb99fb

David S. Miller authored Jul 13, 2016

Saeed Mahameed says:

====================
mlx5 tx timeout watchdog fixes

This patch set provides two trivial fixes for the tx timeout series lately
applied into net 4.7.

From Daniel, detect stuck queues due to BQL
From Mohamad, fix tx timeout watchdog false alarm

Hopefully those two fixes will make it to -stable, assuming
3947ca18 ('net/mlx5e: Implement ndo_tx_timeout callback') was also backported to -stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

22cb99fb

net/mlx5e: start/stop all tx queues upon open/close netdev · c3b7c5c9

Mohamad Haj Yahia authored Jul 13, 2016

Start all tx queues (including inactive ones) when opening the netdev.
Stop all tx queues (including inactive ones) when closing the netdev.

This is a workaround for the tx timeout watchdog false alarm issue in
which the netdev watchdog is polling all the tx queues which may include
inactive queues and thus once lowering the real tx queues number
(ethtool -L) it will generate tx timeout watchdog false alarms.

Fixes: 3947ca18 ('net/mlx5e: Implement ndo_tx_timeout callback')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3b7c5c9

net/mlx5e: Fix TX Timeout to detect queues stuck on BQL · 2c1ccc99

Daniel Jurgens authored Jul 13, 2016

Change netif_tx_queue_stopped to netif_xmit_stopped.  This will show
when queues are stopped due to byte queue limits.

Fixes: 3947ca18 ('net/mlx5e: Implement ndo_tx_timeout callback')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c1ccc99

Merge branch 'ethoc-fixes' · ea43f860

David S. Miller authored Jul 12, 2016

Florian Fainelli says:

====================
net: ethoc: Error path and transmit fixes

This patch series contains two patches for the ethoc driver while testing on a
TS-7300 board where ethoc is provided by an on-board FPGA.

First patch was cooked after chasing crashes with invalid resources passed to
the driver.

Second patch was cooked after seeing that an interface configured with IP
192.168.2.2 was sending ARP packets for 192.168.0.0, no wonder why it could not
work.

I don't have access to any other platform using an ethoc interface so
it could be good to some testing on Xtensa for instance.

Changes in v3:

- corrected the error path if skb_put_padto() fails, thanks to Max
  for spotting this!

Changes in v2:

- fixed the first commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ea43f860

net: ethoc: Correctly pad short packets · ee6c21b9

Florian Fainelli authored Jul 12, 2016

Even though the hardware can be doing zero padding, we want the SKB to
be going out on the wire with the appropriate size. This fixes packet
truncations observed with e.g: ARP packets.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee6c21b9

net: ethoc: Fix early error paths · 386512d1

Florian Fainelli authored Jul 12, 2016

In case any operation fails before we can successfully go the point
where we would register a MDIO bus, we would be going to an error label
which involves unregistering then freeing this yet to be created MDIO
bus. Update all error paths to go to label free which is the only one
valid until either the clock is enabled, or the MDIO bus is allocated
and registered. This fixes kernel oops observed while trying to
dereference the MDIO bus structure which is not yet allocated.

Fixes: a1702857 ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

386512d1

12 Jul, 2016 8 commits

net: nps_enet: Fix PCS reset · 136ab0d0

Noam Camus authored Jul 12, 2016

During commit b54b8c2d
 ("net: ezchip: adapt driver to little endian architecture")
 adapting to little endian architecture,
 zeroing of controller was left out.
Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

136ab0d0

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 92a03eb0

David S. Miller authored Jul 12, 2016

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree.
they are:

1) Fix leak in the error path of nft_expr_init(), from Liping Zhang.

2) Tracing from nf_tables cannot be disabled, also from Zhang.

3) Fix an integer overflow on 32bit archs when setting the number of
   hashtable buckets, from Florian Westphal.

4) Fix configuration of ipvs sync in backup mode with IPv6 address,
   from Quentin Armitage via Simon Horman.

5) Fix incorrect timeout calculation in nft_ct NFT_CT_EXPIRATION,
   from Florian Westphal.

6) Skip clash resolution in conntrack insertion races if NAT is in
   place.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

92a03eb0

netfilter: conntrack: skip clash resolution if nat is in place · 590b52e1

Pablo Neira Ayuso authored Jul 11, 2016

The clash resolution is not easy to apply if the NAT table is
registered. Even if no NAT rules are installed, the nul-binding ensures
that a unique tuple is used, thus, the packet that loses race gets a
different source port number, as described by:

http://marc.info/?l=netfilter-devel&m=146818011604484&w=2

Clash resolution with NAT is also problematic if addresses/port range
ports are used since the conntrack that wins race may describe a
different mangling that we may have earlier applied to the packet via
nf_nat_setup_info().

Fixes: 71d8c47f ("netfilter: conntrack: introduce clash resolution on insertion race")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>

590b52e1

Merge branch 'tipc-fixes' · ce9a4f31

David S. Miller authored Jul 11, 2016

Jon Maloy says:

====================
tipc: three small fixes

Fixes for some broadcast link problems that may occur in large systems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ce9a4f31

tipc: reset all unicast links when broadcast send link fails · 1fc07f3e

Jon Paul Maloy authored Jul 11, 2016

In test situations with many nodes and a heavily stressed system we have
observed that the transmission broadcast link may fail due to an
excessive number of retransmissions of the same packet. In such
situations we need to reset all unicast links to all peers, in order to
reset and re-synchronize the broadcast link.

In this commit, we add a new function tipc_bearer_reset_all() to be used
in such situations. The function scans across all bearers and resets all
their pertaining links.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fc07f3e

tipc: ensure correct broadcast send buffer release when peer is lost · a71eb720

Jon Paul Maloy authored Jul 11, 2016

After a new receiver peer has been added to the broadcast transmission
link, we allow immediate transmission of new broadcast packets, trusting
that the new peer will not accept the packets until it has received the
previously sent unicast broadcast initialiation message. In the same
way, the sender must not accept any acknowledges until it has itself
received the broadcast initialization from the peer, as well as
confirmation of the reception of its own initialization message.

Furthermore, when a receiver peer goes down, the sender has to produce
the missing acknowledges from the lost peer locally, in order ensure
correct release of the buffers that were expected to be acknowledged by
the said peer.

In a highly stressed system we have observed that contact with a peer
may come up and be lost before the above mentioned broadcast initial-
ization and confirmation have been received. This leads to the locally
produced acknowledges being rejected, and the non-acknowledged buffers
to linger in the broadcast link transmission queue until it fills up
and the link goes into permanent congestion.

In this commit, we remedy this by temporarily setting the corresponding
broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up'
state to true before we issue the local acknowledges. This ensures that
those acknowledges will always be accepted. The mentioned state values
are restored immediately afterwards when the link is reset.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a71eb720

tipc: extend broadcast link initialization criteria · 2d18ac4b

Jon Paul Maloy authored Jul 11, 2016

At first contact between two nodes, an endpoint might sometimes have
time to send out a LINK_PROTOCOL/STATE packet before it has received
the broadcast initialization packet from the peer, i.e., before it has
received a valid broadcast packet number to add to the 'bc_ack' field
of the protocol message.

This means that the peer endpoint will receive a protocol packet with an
invalid broadcast acknowledge value of 0. Under unlucky circumstances
this may lead to the original, already received acknowledge value being
overwritten, so that the whole broadcast link goes stale after a while.

We fix this by delaying the setting of the link field 'bc_peer_is_up'
until we know that the peer really has received our own broadcast
initialization message. The latter is always sent out as the first
unicast message on a link, and always with seqeunce number 1. Because
of this, we only need to look for a non-zero unicast acknowledge value
in the arriving STATE messages, and once that is confirmed we know we
are safe and can set the mentioned field. Before this moment, we must
ignore all broadcast acknowledges from the peer.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d18ac4b

r8152: Add support for setting pass through MAC address on RTL8153-AD · 34ee32c9

Mario Limonciello authored Jul 11, 2016

The RTL8153-AD supports a persistent system specific MAC address.
This means a device plugged into two different systems with host side
support will show different (but persistent) MAC addresses.

This information for the system's persistent MAC address is burned in when
the system HW is built and available under \_SB.AMAC in the DSDT at runtime.

This technology is currently implemented in the Dell TB15 and WD15 Type-C
docks. More information is available here:
http://www.dell.com/support/article/us/en/04/SLN301147Signed-off-by: Mario Limonciello <mario_limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34ee32c9

11 Jul, 2016 2 commits

sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send · 779f1ede

Soheil Hassas Yeganeh authored Jul 11, 2016

Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
as a control message to TCP. Since __sock_cmsg_send does not
support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
hence breaks pulse audio over TCP.

SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
but they semantically belong to SOL_UNIX. Since all
cmsg-processing functions including sock_cmsg_send ignore control
messages of other layers, it is best to ignore SCM_RIGHTS
and SCM_CREDENTIALS for consistency (and also for fixing pulse
audio over TCP).

Fixes: c14ac945 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

779f1ede

ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space · 80610229

Julian Anastasov authored Jul 10, 2016

Vegard Nossum is reporting for a crash in fib_dump_info
when nh_dev = NULL and fib_nhs == 1:

Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+
RIP: 0033:[<00000000602b3d18>]
RSP: 0000000062623890  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00
RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000
R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00
R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000
Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18
CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581
Stack:
 626238f0 960226a02 00000400 000000fe
 62623910 600afca7 62623970 62623a48
 62468038 00000018 00000000 00000000
Call Trace:
 [<602b3e93>] rtmsg_fib+0xd3/0x190
 [<602b6680>] fib_table_insert+0x260/0x500
 [<602b0e5d>] inet_rtm_newroute+0x4d/0x60
 [<60250def>] rtnetlink_rcv_msg+0x8f/0x270
 [<60267079>] netlink_rcv_skb+0xc9/0xe0
 [<60250d4b>] rtnetlink_rcv+0x3b/0x50
 [<60265400>] netlink_unicast+0x1a0/0x2c0
 [<60265e47>] netlink_sendmsg+0x3f7/0x470
 [<6021dc9a>] sock_sendmsg+0x3a/0x90
 [<6021e0d0>] ___sys_sendmsg+0x300/0x360
 [<6021fa64>] __sys_sendmsg+0x54/0xa0
 [<6021fac0>] SyS_sendmsg+0x10/0x20
 [<6001ea68>] handle_syscall+0x88/0x90
 [<600295fd>] userspace+0x3fd/0x500
 [<6001ac55>] fork_handler+0x85/0x90

$ addr2line -e vmlinux -i 0x602b3d18
include/linux/inetdevice.h:222
net/ipv4/fib_semantics.c:1264

Problem happens when RTNH_F_LINKDOWN is provided from user space
when creating routes that do not use the flag, catched with
netlink fuzzer.

Currently, the kernel allows user space to set both flags
to nh_flags and fib_flags but this is not intentional, the
assumption was that they are not set. Fix this by rejecting
both flags with EINVAL.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Fixes: 0eeb075f ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Dinesh Dutt <ddutt@cumulusnetworks.com>
Cc: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80610229