Commits · 90bdfcb76f7d3b4a763ded3242277578ef22eda4 · Kirill Smelkov / linux

04 May, 2015 40 commits

tipc: deal with return value of tipc_conn_new callback · 90bdfcb7

Ying Xue authored May 04, 2015

If tipc_conn_new() returns NULL, the connection should be shut down
immediately; otherwise an oops may occur due to a NULL pointer dereference.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90bdfcb7
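
A minimal user-space sketch of the pattern described above: check the result of a per-connection "new" callback and shut the connection down right away if it returns NULL. The names `struct conn`, `conn_accept`, `cb_ok` and `cb_fail` are hypothetical stand-ins and are not the TIPC server code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a server connection; not the TIPC structure. */
struct conn {
    int   open;      /* 1 while the connection is usable */
    void *usr_data;  /* per-connection state returned by the callback */
};

/* Hypothetical callback type: returns per-connection state or NULL. */
typedef void *(*conn_new_cb)(int conid);

/* Returns 0 on success, -1 if the callback failed and the
 * connection was shut down immediately. */
static int conn_accept(struct conn *c, int conid, conn_new_cb cb)
{
    c->open = 1;
    c->usr_data = cb(conid);
    if (c->usr_data == NULL) {
        c->open = 0;  /* never leave a live connection with NULL state */
        return -1;
    }
    return 0;
}

/* Two example callbacks, one failing and one succeeding. */
static int dummy_state;
static void *cb_fail(int conid) { (void)conid; return NULL; }
static void *cb_ok(int conid)   { (void)conid; return &dummy_state; }
```

With `cb_fail` the connection ends up closed and no later code path can dereference the missing state.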

tipc: adjust locking policy of subscription · a13683f2

Ying Xue authored May 04, 2015

Currently the subscriber's lock protects not only the subscriber's
subscription list but also all subscriptions linked into the list.
However, as the members of a subscription never change after they are
initialized, there is no need to protect subscriptions under the
subscriber's lock. Using the lock only to protect the subscriber's
subscription list makes the locking policy simpler and also helps to
avoid a deadlock which may happen if creating a subscription fails.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a13683f2

tipc: involve reference counter for subscriber · 00bc00a9

Ying Xue authored May 04, 2015

At present subscriber's lock is used to protect the subscription list
of subscriber as well as subscriptions linked into the list. While one
or all subscriptions are deleted through iterating the list, the
subscriber's lock must be held. Meanwhile, as deletion of subscription
may happen in subscription timer's handler, the lock must be grabbed
in the function as well. When subscription's timer is terminated with
del_timer_sync() during above iteration, subscriber's lock has to be
temporarily released, otherwise, deadlock may occur. However, the
temporary release may cause the double free of a subscription as the
subscription is not disconnected from the subscription list.

If a reference counter is introduced for the subscriber, a subscription's
timer can be stopped asynchronously with del_timer(). This not only
fixes the issue but also makes the relevant code more readable and
easier to understand.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00bc00a9
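
The reference-counting idea can be sketched in user space as follows. This is only an illustration under assumptions: it uses C11 atomics instead of the kernel's own primitives, and `struct subscriber`, `subscriber_create`, `subscriber_get`, `subscriber_put` and the `freed_flag` test hook are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical subscriber object; not the TIPC structure. */
struct subscriber {
    atomic_int refcnt;
    int *freed_flag;  /* test hook: set to 1 when the object is released */
};

/* The creator holds the first reference. */
static struct subscriber *subscriber_create(int *freed_flag)
{
    struct subscriber *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    atomic_init(&s->refcnt, 1);
    s->freed_flag = freed_flag;
    return s;
}

/* Taken by every user that may outlive the creator, e.g. a pending timer. */
static void subscriber_get(struct subscriber *s)
{
    atomic_fetch_add(&s->refcnt, 1);
}

/* Whoever drops the last reference frees the object. */
static void subscriber_put(struct subscriber *s)
{
    if (atomic_fetch_sub(&s->refcnt, 1) == 1) {
        if (s->freed_flag)
            *s->freed_flag = 1;
        free(s);
    }
}
```

Because a pending timer holds its own reference, the owner can drop its reference without waiting for the timer handler to finish; the object is freed by whichever side calls `subscriber_put()` last.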

tipc: introduce tipc_subscrb_create routine · 1b764828

Ying Xue authored May 04, 2015

Introducing a new function makes the purpose of tipc_subscrb_connect_cb
callback routine more clear.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b764828

tipc: rename functions defined in subscr.c · 57f1d186

Ying Xue authored May 04, 2015

When a topology server accepts a connection request from a client,
it allocates a connection instance and a tipc_subscriber structure
object. The former is used to communicate with the client, and the
latter is treated as a subscriber which manages all subscription events
requested by the same client. When a topology server receives a request
to subscribe to name services from a client through the connection, it
creates a tipc_subscription structure instance which is seen as a
subscription recording which name services are subscribed. In order to
manage all subscriptions from the same client, the topology server links
them into the subscrp_list of the subscriber. So subscriber and
subscription represent completely different things, but the function
names associated with them are so confusing that it is hard to tell
which function operates on a subscriber and which on a subscription.
We eliminate the confusion by renaming them.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57f1d186

Merge branch 'igmp_mld_export' · 29a1ff65

David S. Miller authored May 04, 2015

Linus Lüssing says:

====================
Exporting IGMP/MLD checking from bridge code

The multicast optimizations in batman-adv are yet only usable and
enabled in non-bridged scenarios. To be able to support bridged setups
batman-adv needs to be able to detect IGMP/MLD queriers and reports on
mesh nodes without bridges, too. See the following link for details:

http://www.open-mesh.org/projects/batman-adv/wiki/Multicast-optimizations-listener-reports

To avoid duplicate code between the bridge and batman-adv, the IGMP/MLD
message validation code is moved from the bridge to the IPv4/IPv6 stack.

On the way, some refactoring to increase readability and to iron out
some subtle differences between the IGMP and MLD parsing code is done.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

29a1ff65

net: Export IGMP/MLD message validation code · 9afd85c9

Linus Lüssing authored May 02, 2015

With this patch, the IGMP and MLD message validation functions are moved
from the bridge code to IPv4/IPv6 multicast files. Some small
refactoring was done to enhance readability and to iron out some
differences in behaviour between the IGMP and MLD parsing code (e.g. the
skb-cloning of MLD messages is now only done if necessary, just like the
IGMP part always did).

Finally, these IGMP and MLD message validation functions are exported so
that not only the bridge can use them but, later, batman-adv too.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

9afd85c9

bridge: multicast: call skb_checksum_{simple_, }validate · 3c9e4f87

Linus Lüssing authored May 02, 2015

Let's use these new, neat helpers.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c9e4f87

tc: remove unused redirect ttl · c19ae86a

Jamal Hadi Salim authored May 01, 2015

improves ingress+u32 performance from 22.4 Mpps to 22.9 Mpps
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c19ae86a

Merge branch 'via-rhine-rework' · 4256af62

David S. Miller authored May 04, 2015

Francois Romieu says:

====================
via-rhine rework

The series applies against davem-next as of
9dd3c797 ("drivers: net: xgene: fix kbuild
warnings").

Patches #1..#4 avoid holes in the receive ring.

Patch #5 is a small leftover cleanup for #1..#4.

Patches #6 and #7 are fairly simple barrier stuff.

Patch #8 closes some SMP transmit races - not that anyone really
complained about these but it's a bit hard to handwave that they
can be safely ignored. Some testing, especially SMP testing of
course, would be welcome.

. Changes since #2:
  - added dma_rmb barrier in vlan related patch 6.
  - s/wmb/dma_wmb/ in (*new*) patch 7 of 8.
  - added explicit SMP barriers in (*new*) patch 8 of 8.

. Changes since #1:
  - turned wmb() into dma_wmb() as suggested by davem and Alexander Duyck
    in patch 1 of 6.
  - forgot to reset rx_head_desc in rhine_reset_rbufs in patch 4 of 6.
  - removed rx_head_desc altogether in (*new*) patch 5 of 6
  - removed some vlan receive ugliness in (*new*) patch 6 of 6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4256af62

via-rhine: close SMP transmit races. · 3a5a883a

françois romieu authored May 01, 2015

7ab87ff4 ("via-rhine: move work from
irq handler to softirq and beyond") forgot to explicitly control the
lifespan of the tx_dirty and tx_cur pointers.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a5a883a

via-rhine: dma_wmb transmit barrier. · e1efa872

françois romieu authored May 01, 2015

Follow the now usual transmit descriptor update path:
1. content change
2. dma_wmb
3. ownership change
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1efa872
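
The three-step update order above can be illustrated with a user-space analogue. This is a sketch under loud assumptions: `atomic_thread_fence(memory_order_release)` only stands in for `dma_wmb()` (it orders CPU-visible stores, not device DMA), and `struct tx_desc`, `TX_DESC_OWN` and `tx_desc_publish` are hypothetical names, not the via-rhine driver's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TX_DESC_OWN 0x80000000u  /* hypothetical "owned by device" bit */

/* Hypothetical transmit descriptor layout. */
struct tx_desc {
    uint32_t addr;            /* buffer address */
    uint32_t len;             /* buffer length */
    _Atomic uint32_t status;  /* ownership and status bits */
};

static void tx_desc_publish(struct tx_desc *d, uint32_t addr, uint32_t len)
{
    /* 1. content change */
    d->addr = addr;
    d->len = len;

    /* 2. barrier: user-space stand-in for dma_wmb() */
    atomic_thread_fence(memory_order_release);

    /* 3. ownership change: only now may the consumer look at the content */
    atomic_store_explicit(&d->status, TX_DESC_OWN, memory_order_relaxed);
}
```

The point of the ordering is that a consumer which sees the ownership bit set is guaranteed to also see the descriptor content written in step 1.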

via-rhine: add consistent memory barrier in vlan receive code. · 810f19bc

françois romieu authored May 01, 2015

The NAPI receive path depends on desc->rx_status but it does not
enforce any explicit receive barrier.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

810f19bc
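
A user-space sketch of the matching receive-side rule: read the status word first, then issue a read barrier before touching the rest of the descriptor. As an assumption, `atomic_thread_fence(memory_order_acquire)` stands in for `dma_rmb()`; `struct rx_desc`, `RX_DESC_OWN` and `rx_desc_read_len` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RX_DESC_OWN 0x80000000u  /* hypothetical "owned by device" bit */

/* Hypothetical receive descriptor layout. */
struct rx_desc {
    _Atomic uint32_t status;  /* ownership and status bits */
    uint32_t len;             /* received length, valid once released */
};

/* Returns the received length, or -1 if the device still owns the slot. */
static long rx_desc_read_len(struct rx_desc *d)
{
    uint32_t st = atomic_load_explicit(&d->status, memory_order_relaxed);

    if (st & RX_DESC_OWN)
        return -1;

    /* Stand-in for dma_rmb(): do not read other fields before status. */
    atomic_thread_fence(memory_order_acquire);

    return (long)d->len;
}
```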

via-rhine: kiss rx_head_desc goodbye. · 62ca1ba0

françois romieu authored May 01, 2015

The driver no longer produces holes in its receive ring so rx_head_desc
only duplicates cur_rx.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62ca1ba0

via-rhine: forbid holes in the receive descriptor ring. · 8709bb2c

françois romieu authored May 01, 2015

Rationales:
- throttle work under memory pressure
- lower receive descriptor recycling latency for the network adapter
- lower the maintenance burden of uncommon paths

The patch is twofold:
- it fails early if the receive ring can't be completely initialized
  at dev->open() time
- it drops packets on the floor in the napi receive handler so as to
  keep the received ring full
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8709bb2c
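
The two halves of the patch description can be sketched in user space as follows: fail early if the ring cannot be filled completely, and on receive drop the packet when no replacement buffer is available so that the slot keeps its old buffer. All names (`struct rx_ring`, `rx_ring_init`, `rx_one`, `alloc_fail`, `RX_RING_SIZE`) are hypothetical and the sketch ignores DMA mapping entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_SIZE 4  /* hypothetical ring size */

typedef void *(*alloc_fn)(size_t len);

struct rx_ring {
    void *buf[RX_RING_SIZE];  /* invariant: every slot holds a buffer */
    unsigned int cur;         /* next slot to process */
    unsigned long dropped;    /* packets dropped under memory pressure */
};

/* Fill the whole ring or fail early, as at open() time. */
static int rx_ring_init(struct rx_ring *r, alloc_fn alloc, size_t len)
{
    unsigned int i;

    r->cur = 0;
    r->dropped = 0;
    for (i = 0; i < RX_RING_SIZE; i++) {
        r->buf[i] = alloc(len);
        if (!r->buf[i]) {
            while (i-- > 0)       /* undo the partial fill */
                free(r->buf[i]);
            return -1;
        }
    }
    return 0;
}

/* Process one received slot. Returns the packet buffer for the caller
 * to consume and free, or NULL if the packet was dropped. */
static void *rx_one(struct rx_ring *r, alloc_fn alloc, size_t len)
{
    unsigned int i = r->cur % RX_RING_SIZE;
    void *fresh = alloc(len);
    void *pkt;

    r->cur++;
    if (!fresh) {
        r->dropped++;  /* drop: the old buffer stays in the slot, no hole */
        return NULL;
    }
    pkt = r->buf[i];
    r->buf[i] = fresh;
    return pkt;
}

/* Allocator that always fails, to simulate memory pressure. */
static void *alloc_fail(size_t len) { (void)len; return NULL; }
```

Either way the slot is handed back to the ring with a valid buffer, which is what keeps the ring free of holes.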

via-rhine: gotoize rhine_open error path. · 4d1fd9c1

françois romieu authored May 01, 2015

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d1fd9c1

via-rhine: allocate and map receive buffer in a single transaction · a21bb8ba

françois romieu authored May 01, 2015

It's used to initialize the receive ring but it will actually shine when
the receive poll code is reworked.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a21bb8ba

via-rhine: commit receive buffer address before descriptor status update. · e45af497

françois romieu authored May 01, 2015

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e45af497

Merge branch 'flow_keys_digest' · 7c9a2eea

David S. Miller authored May 04, 2015

Tom Herbert says:

====================
net: Eliminate calls to flow_dissector and introduce flow_keys_digest

In this patch set we add skb_get_hash_perturb which gets the skbuff
hash for a packet and perturbs it using a provided key and jhash1.
This function is used in several qdiscs and eliminates many calls
to flow_dissector and jhash3 to get a perturbed hash for a packet.

To handle the sch_choke issue (passes flow_keys in skbuff cb) we
add flow_keys_digest which is a digest of a flow constructed
from a flow_keys structure.

This is the second version of these patches I posted a while ago,
and is prerequisite work to increasing the size of the flow_keys
structure and hashing over it (full IPv6 address, flow label, VLAN ID,
etc.).

Version 2:

- Add keyval parameter to __flow_hash_from_keys which allows caller to
  set the initval for jhash
- Perturb always does flow dissection and creates hash based on
  input perturb value which acts as the keyval to __flow_hash_from_keys
- Added a _flow_keys_digest_data which is used in make_flow_keys_digest.
  This fills out the digest by populating individual fields instead
  of copying the whole structure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7c9a2eea

sch_choke: Use flow_keys_digest · 2e99403d

Tom Herbert authored May 01, 2015

Call make_flow_keys_digest to get a digest from flow keys and
use that to pass skbuff cb and for comparing flows.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e99403d

net: Add flow_keys digest · 2f59e1eb

Tom Herbert authored May 01, 2015

Some users of flow keys (well just sch_choke now) need to pass
flow_keys in skbuff cb, and use them for exact comparisons of flows
so that skb->hash is not sufficient. In order to increase size of
the flow_keys structure, we introduce another structure for
the purpose of passing flow keys in skbuff cb. We limit this structure
to sixteen bytes, and we will technically treat this as a digest of
flow_keys struct hence its name flow_keys_digest. In the first
incarnation we just copy the flow_keys structure up to 16 bytes--
this is the same information previously passed in the cb. In the
future, we'll adapt this for larger flow_keys and could use something
like SHA-1 over the whole flow_keys to improve the quality of the
digest.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f59e1eb
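
A small sketch of the digest idea: a fixed 16-byte blob filled from selected flow fields and compared byte for byte. The layouts below (`struct keys_sketch`, `struct digest_fields`, `struct digest_sketch`) and the helpers `make_digest` and `digest_equal` are assumptions made for illustration; they are not the kernel's flow_keys or flow_keys_digest definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified flow keys; assumed to have outgrown 16 bytes. */
struct keys_sketch {
    uint32_t src, dst;
    uint32_t ports;
    uint16_t n_proto;
    uint8_t  ip_proto;
    uint32_t flow_label;  /* example of a field left out of the digest */
};

/* Opaque 16-byte digest, small enough to pass in a control block. */
struct digest_sketch {
    uint8_t data[16];
};

/* Hypothetical packing of the fields that go into the digest. */
struct digest_fields {
    uint16_t n_proto;
    uint8_t  ip_proto;
    uint8_t  padding;
    uint32_t ports;
    uint32_t src;
    uint32_t dst;
};

_Static_assert(sizeof(struct digest_fields) <= sizeof(struct digest_sketch),
               "digest fields must fit in 16 bytes");

/* Populate individual fields instead of copying the whole structure. */
static void make_digest(struct digest_sketch *d, const struct keys_sketch *k)
{
    struct digest_fields f;

    memset(&f, 0, sizeof(f));  /* zero padding so memcmp is well defined */
    f.n_proto = k->n_proto;
    f.ip_proto = k->ip_proto;
    f.ports = k->ports;
    f.src = k->src;
    f.dst = k->dst;

    memset(d, 0, sizeof(*d));
    memcpy(d->data, &f, sizeof(f));
}

/* Exact comparison of two flows via their digests. */
static int digest_equal(const struct digest_sketch *a,
                        const struct digest_sketch *b)
{
    return memcmp(a->data, b->data, sizeof(a->data)) == 0;
}
```

In this sketch two flows that differ only in a field outside the digest compare equal, which is the trade-off of bounding the digest to 16 bytes.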

sched: Call skb_get_hash_perturb in sch_sfq · ada1dba0

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ada1dba0

sched: Call skb_get_hash_perturb in sch_sfb · 63c0ad4d

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63c0ad4d

sched: Call skb_get_hash_perturb in sch_hhf · f969777a

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f969777a

sched: Call skb_get_hash_perturb in sch_fq_codel · 342db221

Tom Herbert authored May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

342db221

net: Add skb_get_hash_perturb · 50fb7992

Tom Herbert authored May 01, 2015

This calls flow_dissect and __skb_get_hash to procure a hash for a
packet. Input includes a key to initialize jhash. This function
does not set skb->hash.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50fb7992
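
The idea of a perturbed hash, where a caller-supplied key seeds the hash of the flow, can be sketched as below. Note the swap: this uses an FNV-1a style mix seeded with the key, not the kernel's jhash, and `struct flow_tuple` and `hash_perturb` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, minimal flow description. */
struct flow_tuple {
    uint32_t src;
    uint32_t dst;
    uint32_t ports;
};

/* Hash the flow with `perturb` as the initial value. The input is const:
 * like the function described above, nothing is cached in the packet. */
static uint32_t hash_perturb(const struct flow_tuple *t, uint32_t perturb)
{
    const uint32_t words[3] = { t->src, t->dst, t->ports };
    uint32_t h = 2166136261u ^ perturb;  /* FNV offset basis, seeded */
    int i, b;

    for (i = 0; i < 3; i++) {
        for (b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;  /* FNV prime */
        }
    }
    return h;
}
```

A qdisc that changes its perturbation key periodically gets a different bucket assignment for the same flows after each change, without dissecting the packet by hand at every call site.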

net: ipv4: route: Fix sending IGMP messages with link address · 6a211654

Andrew Lunn authored May 01, 2015

In setups with a global scope address on an interface, and a lesser
scope address on an interface sending IGMP reports, the reports can be
sent using the other interface's global scope address rather than the
local interface address. RFC 2236 suggests:

     Ignore the Report if you cannot identify the source address of
     the packet as belonging to a subnet assigned to the interface on
     which the packet was received.

since such reports could be forged.

Look at the protocol when deciding if a RT_SCOPE_LINK address should
be used for the packet.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a211654

net: sched: run ingress qdisc without locks · 087c1a60

Alexei Starovoitov authored Apr 30, 2015

TC classifiers/actions were converted to RCU by John in the series:
http://thread.gmane.org/gmane.linux.network/329739/focus=329739
and many follow on patches.
This is the last patch from that series that finally drops
ingress spin_lock.

Single cpu ingress+u32 performance goes from 22.9 Mpps to 24.5 Mpps.

In two cpu case when both cores are receiving traffic on the same
device and go into the same ingress+u32 the performance jumps
from 4.5 + 4.5 Mpps to 23.5 + 23.5 Mpps
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

087c1a60

Merge branch 'tcp_sack_rttm' · a89f96c9

David S. Miller authored May 03, 2015

Kenneth Klette Jonassen says:

====================
tcp: SACK RTTM changes for congestion control

This patch series improves SACK RTT measurements for congestion control:
  o Picks the latest sequence SACKed for RTT, i.e. most accurate delay
    signal.
  o Calls the congestion control's pkts_acked hook with SACK RTTMs
    even when not sequentially ACKing new data.

V2: amend misleading comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a89f96c9

tcp: invoke pkts_acked hook on every ACK · 138998fd

Kenneth Klette Jonassen authored May 01, 2015

Invoking pkts_acked is currently conditioned on FLAG_ACKED:
receiving a cumulative ACK of new data, or ACK with SYN flag set.

Remove this condition so that CC may get RTT measurements from all SACKs.

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

138998fd

tcp: improve RTT from SACK for CC · 31231a8a

Kenneth Klette Jonassen authored May 01, 2015

tcp_sacktag_one() always picks the earliest sequence SACKed for RTT.
This might not make sense for congestion control in cases where:

  1. ACKs are lost, i.e. a SACK following a lost SACK covers both
     new and old segments at the receiver.
  2. The receiver disregards the RFC 5681 recommendation to immediately
     ACK out-of-order segments.

Give congestion control a RTT for the latest segment SACKed, which is the
most accurate RTT estimate, but preserve the conservative RTT for RTO.

Removes the call to skb_mstamp_get() in tcp_sacktag_one().

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31231a8a
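
The distinction between the two RTT samples can be sketched as follows: the earliest-sent SACKed segment gives the larger, conservative RTT (kept for RTO), and the latest-sent one gives the freshest delay signal (handed to congestion control). This is a simplified illustration; `struct sacked_seg`, `struct sack_rtt` and `sack_rtt_sample` are hypothetical and do not mirror tcp_sacktag_one().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a segment covered by the current SACK. */
struct sacked_seg {
    uint64_t sent_us;  /* transmit timestamp in microseconds */
};

/* Both fields are -1 when no segment was SACKed. */
struct sack_rtt {
    int64_t rto_rtt_us;  /* from the earliest-sent segment (conservative) */
    int64_t cc_rtt_us;   /* from the latest-sent segment (most recent) */
};

static struct sack_rtt sack_rtt_sample(const struct sacked_seg *segs,
                                       size_t n, uint64_t now_us)
{
    struct sack_rtt r = { -1, -1 };
    uint64_t first, last;
    size_t i;

    if (n == 0)
        return r;

    first = last = segs[0].sent_us;
    for (i = 1; i < n; i++) {
        if (segs[i].sent_us < first)
            first = segs[i].sent_us;
        if (segs[i].sent_us > last)
            last = segs[i].sent_us;
    }
    r.rto_rtt_us = (int64_t)(now_us - first);
    r.cc_rtt_us = (int64_t)(now_us - last);
    return r;
}
```

For segments sent at 100, 150 and 180 us and a SACK arriving at 200 us, the conservative sample is 100 us while the congestion control sample is 20 us.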

tcp: move struct tcp_sacktag_state to tcp_ack() · 196da974

Kenneth Klette Jonassen authored May 01, 2015

Later patch passes two values set in tcp_sacktag_one() to
tcp_clean_rtx_queue(). Prepare passing them via struct tcp_sacktag_state.
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

196da974

Merge branch 'rhashtable-test' · 10308220

David S. Miller authored May 03, 2015

Thomas Graf says:

====================
rhashtable self-test improvements

This series improves the rhashtable self-test to:
  * Avoid allocation of test objects
  * Measure the time of test runs
  * Use the iterator to walk the table for consistency
  * Account for failed insertions due to memory pressure or
    utilization pressure
  * Ignore failed insertions when checking for consistency
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

10308220

rhashtable-test: Detect insertion failures · 67b7cbf4

Thomas Graf authored Apr 30, 2015

Account for failed inserts due to memory pressure or EBUSY and
ignore failed entries during the consistency check.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

67b7cbf4

rhashtable-test: Use walker to test bucket statistics · 246b23a7

Thomas Graf authored Apr 30, 2015

As resizes may continue to run in the background, use walker to
ensure we see all entries. Also print the encountered number
of rehashes queued up while traversing.

This may lead to warnings due to entries being seen multiple
times. We consider them non-fatal.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

246b23a7

rhashtable-test: Do not allocate individual test objects · fcc57020

Thomas Graf authored Apr 30, 2015

By far the most expensive part of the selftest was the allocation
of entries. Using a static array makes it possible to measure the
rhashtable operations themselves.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcc57020

rhashtable-test: Get rid of ptr in test_obj structure · c2c8a901

Thomas Graf authored Apr 30, 2015

This only blows up the size of the test structure for no gain
in test coverage. Reduces size of test_obj from 24 to 16 bytes.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2c8a901

rhashtable-test: Measure time to insert, remove & traverse entries · 1aa661f5

Thomas Graf authored Apr 30, 2015

Make test configurable by allowing to specify all relevant knobs
through module parameters.

Do several test runs and measure the average time it takes to
insert & remove all entries. Note, a deferred resize might still
continue to run in the background.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

1aa661f5

rhashtable-test: Remove unused TEST_NEXPANDS · f54e84b6

Thomas Graf authored Apr 30, 2015

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

f54e84b6

Merge branch 'eth_type_trans' · 7a852021

David S. Miller authored May 03, 2015

Alexander Duyck says:

====================
A few minor clean-ups to eth_type_trans

This series addresses a few minor issues I found in eth_type_trans that
allow us to gain back something like 3 or more cycles per packet.

The first change is to drop the byte swap since it isn't necessary.  On x86
we could just check the first byte and compare that against the upper 8
bits of the Ethertype to determine if we are dealing with a size value or
not.

The second makes it so that the value we read in to test for multicast can
be used for the address comparison.  This allows us to avoid a second read
of the destination address.

The final change is to avoid some unneeded instructions in computing the
Ethernet header pointer.  When we start the call the Ethernet header is at
skb->data, so we just use that rather than computing mac_header, and then
adding that back to skb->head.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7a852021
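
The first change described above, telling a length field from an Ethertype without a byte swap, can be sketched in portable C. The helper names are hypothetical; the threshold 0x0600 is the usual boundary between 802.3 length values and Ethertypes. Since the field is big-endian on the wire, its first byte is the upper 8 bits, so comparing that byte against 0x06 gives the same answer as converting the whole field.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_P_802_3_MIN 0x0600  /* smallest value treated as an Ethertype */

/* Reference version: assemble the host-order value, then compare. */
static int is_ethertype_swapped(const uint8_t proto[2])
{
    uint16_t host = (uint16_t)((proto[0] << 8) | proto[1]);

    return host >= ETH_P_802_3_MIN;
}

/* Shortcut: look only at the first (most significant) wire byte. */
static int is_ethertype_first_byte(const uint8_t proto[2])
{
    return proto[0] >= (ETH_P_802_3_MIN >> 8);
}
```

The two helpers agree for all 65536 possible field values, because the low byte of the threshold is zero.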