Commits · c3ac759ea6e9836d3040abed92f797383da4024f · Kirill Smelkov / linux

30 Oct, 2014 3 commits

Pablo Neira Ayuso authored Oct 30, 2014

Simon Horman says:

====================
The single patch in this series fixes some minor fallout from adding
support IPv6 real servers in IPv4 virtual-services and vice versa.

It should not have any run-time affect other than perhaps saving a few cycles.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c3ac759e

netfilter: log: protect nf_log_register against double registering · 8ac2bde2

Marcelo Leitner authored Oct 29, 2014

Currently, despite the comment right before the function,
nf_log_register allows registering two loggers on with the same type and
end up overwriting the previous register.

Not a real issue today as current tree doesn't have two loggers for the
same type but it's better to get this protected.

Also make sure that all of its callers do error checking.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8ac2bde2

netfilter: nf_log: Introduce nft_log_dereference() macro · 0c26ed1c

Marcelo Leitner authored Oct 29, 2014

Wrap up a common call pattern in an easier to handle call.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0c26ed1c

28 Oct, 2014 1 commit

ipvs: remove unnecessary assignment in __ip_vs_get_out_rt · d7701089

Alex Gartrell authored Oct 06, 2014

It is a precondition of the function that daddr be equal to dest->addr.ip
if dest is non-NULL, so this additional assignment is just confusing for
stupid engineers like me.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

d7701089

27 Oct, 2014 17 commits

netfilter: nf_tables: add new expression nft_redir · e9105f1b

Arturo Borrero authored Oct 17, 2014

This new expression provides NAT in the redirect flavour, which is to
redirect packets to local machine.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e9105f1b

netfilter: refactor NAT redirect IPv6 code to use it from nf_tables · 9de920ed

Arturo Borrero authored Oct 17, 2014

This patch refactors the IPv6 code so it can be usable both from xt and
nf_tables.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9de920ed

netfilter: refactor NAT redirect IPv4 to use it from nf_tables · 8b13eddf

Arturo Borrero authored Oct 16, 2014

This patch refactors the IPv4 code so it can be usable both from xt and
nf_tables.

A similar patch follows-up to handle IPv6.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8b13eddf

ipx: remove __inline__ in c file on static · b8901ac3

Fabian Frederick authored Oct 27, 2014

Let compiler decide what to do with static void __ipxitf_put()
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8901ac3

ipx: remove unnecessary casting on ntohl · e0f36310

Fabian Frederick authored Oct 27, 2014

use %08X instead of %08lX and remove casting.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0f36310

ipx: move extern sysctl_ipx_pprop_broadcasting to header file · 5e96d788

Fabian Frederick authored Oct 27, 2014

include ipx.h from sysctl_net_ipx.c
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e96d788

ipv6: include linux/uaccess.h instead of asm/uaccess.h · ce256981
Fabian Frederick authored Oct 27, 2014
```
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
ce256981

ipv6: replace min/casting by min_t · 9451a304

Fabian Frederick authored Oct 27, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

9451a304

ipv4: remove set but unused variable sha · 6b436d33

Fabian Frederick authored Oct 27, 2014

unsigned char *sha (source) was already in original git version
 but was never used.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b436d33

Merge branch 's390-next' · 49cc91f9

David S. Miller authored Oct 26, 2014

Frank Blaschka says:

====================
s390: network patches for net-next

looks like there was a problem with my previous posting. Hope this time
it will work. Sorry for any inconvenience. The patches are mostly
cleanups and small enhancements for net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

49cc91f9

ctcm: replace sscanf by kstrto function · 652d77ba

Thomas Richter authored Oct 22, 2014

Since a single integer value is read from the supplied buffer
use the kstrto functions instead of sscanf.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

652d77ba

lcs: replace sscanf by kstrto function · 786f0065

Thomas Richter authored Oct 22, 2014

Since a single integer value is read from the supplied buffer
use the kstrto functions instead of sscanf.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

786f0065

qeth: s390 ethernet device driver dependency · 3d14f661

Thomas Richter authored Oct 22, 2014

Compile the s390 10GB ethernet device driver only when
ETHERNET has been defined in the kernel configuration file.
Right now the qeth device driver is always built regardless
of which network connectivity is active.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d14f661

qeth: make local functions static in qeth_l3 module · 56530d68

Thomas Richter authored Oct 22, 2014

This patch makes 4 local functions static and removes
the prototypes from the header file.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56530d68

qeth: fix some trace formating issues · 8a593148

Thomas Richter authored Oct 22, 2014

This patch fixes trace formatting issues using the
QETH_CARD_TEXT_ macro. The total size of each trace entry
is 8 bytes. Some of the sprintf formats exceed these 8
bytes (for example using abcd:%d and the converted value
needs more than 3 bytes). The solution is to shorten the
text prepending the value or use a different format (%x).
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a593148

qeth: qeth_core_main make local functions static · bca51650

Thomas Richter authored Oct 22, 2014

This patch makes some global functions static and removes
the prototypes from the header file.
Also function qeth_query_card_info is not exported anymore,
there is no external user for it, this function should never
have been exported in the first place.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bca51650

xen-netfront: always keep the Rx ring full of requests · 1f3c2eba

David Vrabel authored Oct 22, 2014

A full Rx ring only requires 1 MiB of memory.  This is not enough
memory that it is useful to dynamically scale the number of Rx
requests in the ring based on traffic rates, because:

a) Even the full 1 MiB is a tiny fraction of a typically modern Linux
   VM (for example, the AWS micro instance still has 1 GiB of memory).

b) Netfront would have used up to 1 MiB already even with moderate
   data rates (there was no adjustment of target based on memory
   pressure).

c) Small VMs are going to typically have one VCPU and hence only one
   queue.

Keeping the ring full of Rx requests handles bursty traffic better
than trying to converge on an optimal number of requests to keep
filled.

On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest
improved from 5.1 Gbit/s to 5.6 Gbit/s.  Gains with more bursty
traffic are expected to be higher.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f3c2eba

25 Oct, 2014 4 commits

Merge branch 'sunvnet-napi' · 9286ae01

David S. Miller authored Oct 25, 2014

Sowmini Varadhan says:

====================
sunvnet: NAPIfy sunvnet

This patchset converts the sunvnet driver to use the NAPI framework.
Changes since v4 to Patch1:
  vnet_event accumulates LDC_EVENT_* bits into rx_event.
  vnet_event_napi() unrolls send_events() logic to process all rx_event bits.
Changes since v5:
  Patch 1: use net_device.h definition for NAPI_POLL_WEIGHT.
  Drop sparclinux changes (patch3) per David Miller feedback

Patch 1 in the series addresses the packet-receive path- all
the vnet_event() processing is moved into NAPI context.
This patch is dependant on the sparc-next commit:
  "sparc64: Add vio_set_intr() to enable/disable Rx interrupts"
  (sparc commit id ca605b7d)

Patch 2 uses RCU to fix race conditions between vnet_port_remove and
paths that access/modify port-related state, such as vnet_start_xmit.

Patch 3 leverages from the NAPIfied Rx path,
dropping superfluous usage of the irqsave/irqrestores on the vio.lock
where possible.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9286ae01

sunvnet: Remove irqsave/irqrestore on vio.lock · 13b13dd9

Sowmini Varadhan authored Oct 25, 2014

After the  NAPIfication of sunvnet, we no longer need to
synchronize by doing irqsave/restore on vio.lock in the
I/O fastpath.

NAPI ->poll() is non-reentrant, so all RX processing occurs
strictly in a serialized environment. TX reclaim is done in NAPI
context, so the netif_tx_lock can be used to serialize
critical sections between Tx and Rx paths.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13b13dd9

sunvnet: Use RCU to synchronize port usage with vnet_port_remove() · 2a968dd8

Sowmini Varadhan authored Oct 25, 2014

A vnet_port_remove could be triggered as a result of an ldm-unbind
operation by the peer, module unload, or other changes to the
inter-vnet-link configuration.  When this is concurrent with
vnet_start_xmit(), there are several race sequences possible,
such as

thread 1                                    thread 2
vnet_start_xmit
-> tx_port_find
   spin_lock_irqsave(&vp->lock..)
   ret = __tx_port_find(..)
   spin_lock_irqrestore(&vp->lock..)
                                           vio_remove -> ..
                                               ->vnet_port_remove
                                           spin_lock_irqsave(&vp->lock..)
                                           cleanup
                                           spin_lock_irqrestore(&vp->lock..)
                                           kfree(port)
/* attempt to use ret will bomb */

This patch adds RCU locking for port access so that vnet_port_remove
will correctly clean up port-related state.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a968dd8

sunvnet: NAPIfy sunvnet · 69088822

Sowmini Varadhan authored Oct 25, 2014

Move Rx packet procssing to the NAPI poll callback.
Disable VIO interrupt and unconditioanlly go into NAPI
context from vnet_event.

Note that we want to minimize the number of LDC
STOP/START messages sent. Specifically, do not send a STOP
message if vnet_walk_rx does not read all the available descriptors
because of the NAPI budget limitation. Instead, note the end index
as part of port state, and resume from this index when the
next poll callback is triggered.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69088822

24 Oct, 2014 15 commits

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 132fb579

David S. Miller authored Oct 24, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-10-23

This series contains updates to i40e and i40evf.

Jesse modifies the i40e driver to only notify the firmware on link up/down
and qualified module events.  Also simplified the job of managing link
state by using the admin queue receive event for link events as a signal
to tell the driver to update link state.

Jeff (me) cleans up the inconsistent use of tabs for indentation in the admin
queue command header file.

Neerav converts the use of udelay() to usleep_range().

Anjali fixes a bug where receive would stop after some stress by adding
a sleep and restart as well as moving the setting of flow control because
it should be done at a PF level and not a VSI level.

Mitch adds code to handle link events when updating the PF switch, which
allows link information to be properly provided to VFS in all cases.

Catherine adds driver support for 10GBaseT and bumps driver version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

132fb579

net: llc: include linux/errno.h instead of asm/errno.h · 74bca138
Fabian Frederick authored Oct 22, 2014
```
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
74bca138

lapb: move EXPORT_SYMBOL after functions. · 75da1469

Fabian Frederick authored Oct 22, 2014

See Documentation/CodingStyle Chapter 6
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

75da1469

Merge branch 'berlin_ethernet' · 5f3619f2

David S. Miller authored Oct 24, 2014

Sebastian Hesselbarth says:

====================
Marvell PXA168 libphy handling and Berlin Ethernet

This patch series deals with a removing a IP feature that can be found
on all currently supported Marvell Ethernet IP (pxa168_eth, mv643xx_eth,
mvneta). The MAC IP allows to automatically perform PHY auto-negotiation
without software interaction.

However, this feature (a) fundamentally clashes with the way libphy works
and (b) is unable to deal with quirky PHYs that require special treatment.
In this series, pxa168_eth driver is rewritten to completely disable that
feature and properly deal with libphy provided PHYs.

As usual, a branch on top of v3.18-rc1 can be found at

git://git.infradead.org/users/hesselba/linux-berlin.git devel/bg2-bg2cd-eth-v2

Patches 1-5 should go through David's net tree, I'll pick up the DT patches
6-9.

There have been some changes,
compared to the RFT
- added phy-connection-type property to BG2Q PHY DT node
- bail out from pxa168_eth_adjust_link when there is no change in
  PHY parameters. Also, add a call to phy_print_status.
compared to v1
- move phy-connection-type to ethernet node instead of PHY node

Patch 1 adds support for Marvell 88E3016 FastEthernet PHY that is also
integrated in Marvell Berlin BG2/BG2CD SoCs.

Patch 2 allows to pass phy_interface_t on pxa168_eth platform_data that
is only used by mach-mmp/gplug. From the board setup, I guessed gplug's
PHY is connected via RMII. The patch still isn't even compile tested.

Patches 3-5 prepare proper libphy handling and finally remove all in-driver
PHY mangling related to the feature explained above.

Patches 6-9 add corresponding ethernet DT nodes to BG2, BG2CD, add a
phy-connection-type property to BG2Q and enable ethernet on BG2-based Sony
NSZ-GS7. I have tested all this on GS7 successfully with ip=dhcp on 100M FD.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5f3619f2

net: pxa168_eth: Remove in-driver PHY mangling · 9ff32fe1

Sebastian Hesselbarth authored Oct 22, 2014

With properly using libphy PHYs now, remove the in-driver PHY
mangling.
Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ff32fe1

net: pxa168_eth: Remove HW auto-negotiaion · 1a149132

Sebastian Hesselbarth authored Oct 22, 2014

Marvell Ethernet IP supports PHY negotiation driven by HW. This
fundamentally clashes with libphy (software) driven negotiation and
also cannot cope with quirky PHYs. Therefore, always disable any HW
negotiation features and properly use libphy's phy_device.
Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a149132

net: pxa168_eth: Prepare proper libphy handling · 9d8ea73d

Sebastian Hesselbarth authored Oct 22, 2014

Current libphy handling in pxa168_eth lacks proper phy_connect. Prepare
to fix this by first moving phy properties from platform_data to private
driver data.
Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d8ea73d

net: pxa168_eth: Provide phy_interface mode on platform_data · e7de17ab

Sebastian Hesselbarth authored Oct 22, 2014

The PXA168 Ethernet IP support MII and RMII connection to its PHY.
Currently, pxa168 platform_data does not provide a way to pass that
and there is one user of pxa168 platform_data (mach-mmp/gplug).
Given the pinctrl settings of gplug it uses RMII, so add and pass
a corresponding phy_interface_t.
Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7de17ab

phy: marvell: Add support for 88E3016 FastEthernet PHY · 6b358aed

Sebastian Hesselbarth authored Oct 22, 2014

Marvell 88E3016 is a FastEthernet PHY that also can be found in Marvell
Berlin SoCs as integrated PHY.
Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b358aed

natsemi/macsonic: Remove superfluous interrupt disable/restore · d4c3363e

Geert Uytterhoeven authored Oct 21, 2014

As of commit e4dc601b ("m68k: Disable/restore interrupts in
hwreg_present()/hwreg_write()"), this is no longer needed.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4c3363e

cirrus/mac89x0: Remove superfluous interrupt disable/restore · 7f30b742

Geert Uytterhoeven authored Oct 21, 2014

As of commit e4dc601b ("m68k: Disable/restore interrupts in
hwreg_present()/hwreg_write()"), this is no longer needed.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f30b742

net: typhoon: Remove redundant casts · 00fd5d94

Rasmus Villemoes authored Oct 21, 2014

Both image_data and typhoon_fw->data are const u8*, so the cast to u8*
is unnecessary and confusing.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

00fd5d94

Removed unused function sctp_addr_is_valid() · 16704b12

Sébastien Barré authored Oct 21, 2014

sctp_addr_is_valid() only appeared in its definition.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

16704b12

Merge branch 'ipv6_route' · fad71e4a

David S. Miller authored Oct 24, 2014

Martin KaFai Lau says:

====================
ipv6: Reduce the number of fib6_lookup() calls from ip6_pol_route()

This patch set is trying to reduce the number of fib6_lookup()
calls from ip6_pol_route().

I have adapted davem's udpflooda and kbench_mod test
(https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to
support IPv6 and here is the result:

Before:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done

real    0m34.190s
user    0m3.047s
sys     0m31.108s

real    0m34.635s
user    0m3.125s
sys     0m31.475s

real    0m34.517s
user    0m3.034s
sys     0m31.449s

[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[  660.160976] ip6_route_kbench: ip6_route_output tdiff: 933
[  660.207261] ip6_route_kbench: ip6_route_output tdiff: 988
[  660.253492] ip6_route_kbench: ip6_route_output tdiff: 896
[  660.298862] ip6_route_kbench: ip6_route_output tdiff: 898

After:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done

real    0m32.695s
user    0m2.925s
sys     0m29.737s

real    0m32.636s
user    0m3.007s
sys     0m29.596s

real    0m32.797s
user    0m2.866s
sys     0m29.898s

[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[  881.220793] ip6_route_kbench: ip6_route_output tdiff: 684
[  881.253477] ip6_route_kbench: ip6_route_output tdiff: 640
[  881.286867] ip6_route_kbench: ip6_route_output tdiff: 630
[  881.320749] ip6_route_kbench: ip6_route_output tdiff: 653

/****************************** udpflood.c ******************************/
/* It is an adaptation of the Eric Dumazet's and David Miller's
 * udpflood tool, by adding IPv6 support.
 */

typedef uint32_t u32;

static int debug =3D 0;

/* Allow -fstrict-aliasing */
typedef union sa_u {
	struct sockaddr_storage a46;
	struct sockaddr_in a4;
	struct sockaddr_in6 a6;
} sa_u;

static int usage(void)
{
	printf("usage: udpflood [ -l count ] [ -m message_size ] [ -c num_ip_addrs=
 ] IP_ADDRESS\n");
	return -1;
}

static u32 get_last32h(const sa_u *sa)
{
	if (sa->a46.ss_family =3D=3D PF_INET)
		return ntohl(sa->a4.sin_addr.s_addr);
	else
		return ntohl(sa->a6.sin6_addr.s6_addr32[3]);
}

static void set_last32h(sa_u *sa, u32 last32h)
{
	if (sa->a46.ss_family =3D=3D PF_INET)
		sa->a4.sin_addr.s_addr =3D htonl(last32h);
	else
		sa->a6.sin6_addr.s6_addr32[3] =3D htonl(last32h);
}

static void print_saddr(const sa_u *sa, const char *msg)
{
	char buf[64];

	if (!debug)
		return;

	switch (sa->a46.ss_family) {
	case PF_INET:
		inet_ntop(PF_INET, &(sa->a4.sin_addr.s_addr), buf,
			  sizeof(buf));
		break;
	case PF_INET6:
		inet_ntop(PF_INET6, &(sa->a6.sin6_addr), buf, sizeof(buf));
		break;
	}

	printf("%s: %s\n", msg, buf);
}

static int send_packets(const sa_u *sa, size_t num_addrs, int count, int ms=
g_sz)
{
	char *msg =3D malloc(msg_sz);
	sa_u saddr;
	u32 start_addr32h, end_addr32h, cur_addr32h;
	int fd, i, err;

	if (!msg)
		return -ENOMEM;

	memset(msg, 0, msg_sz);

	memcpy(&saddr, sa, sizeof(saddr));
	cur_addr32h =3D start_addr32h =3D get_last32h(&saddr);
	end_addr32h =3D start_addr32h + num_addrs;

	fd =3D socket(saddr.a46.ss_family, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		err =3D fd;
		goto out_nofd;
	}

	/* connect to avoid the kernel spending time in figuring
	 * out the source address (i.e pin the src address)
	 */
	err =3D connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
	if (err < 0) {
		perror("connect");
		goto out;
	}

	print_saddr(&saddr, "start_addr");
	for (i =3D 0; i < count; i++) {
		print_saddr(&saddr, "sendto");
		err =3D sendto(fd, msg, msg_sz, 0, (struct sockaddr *)&saddr,
			     sizeof(saddr));
		if (err < 0) {
			perror("sendto");
			goto out;
		}

		if (++cur_addr32h >=3D end_addr32h)
			cur_addr32h =3D start_addr32h;
		set_last32h(&saddr, cur_addr32h);
	}

	err =3D 0;
out:
	close(fd);
out_nofd:
	free(msg);
	return err;
}

int main(int argc, char **argv, char **envp)
{
	int port, msg_sz, count, num_addrs, ret;

	sa_u start_addr;

	port =3D 6000;
	msg_sz =3D 32;
	count =3D 10000000;
	num_addrs =3D 1;

	while ((ret =3D getopt(argc, argv, "dl:s:p:c:")) >=3D 0) {
		switch (ret) {
		case 'l':
			sscanf(optarg, "%d", &count);
			break;
		case 's':
			sscanf(optarg, "%d", &msg_sz);
			break;
		case 'p':
			sscanf(optarg, "%d", &port);
			break;
		case 'c':
			sscanf(optarg, "%d", &num_addrs);
			break;
		case 'd':
			debug =3D 1;
			break;
		case '?':
			return usage();
		}
	}

	if (num_addrs < 1)
		return usage();

	if (!argv[optind])
		return usage();

	start_addr.a4.sin_port =3D htons(port);
	if (inet_pton(PF_INET, argv[optind], &start_addr.a4.sin_addr))
		start_addr.a46.ss_family =3D PF_INET;
	else if (inet_pton(PF_INET6, argv[optind], &start_addr.a6.sin6_addr.s6_add=
r))
		start_addr.a46.ss_family =3D PF_INET6;
	else
		return usage();

	return send_packets(&start_addr, num_addrs, count, msg_sz);
}

/****************** ip6_route_kbench_mod.c ******************/

/* We can't just use "get_cycles()" as on some platforms, such
 * as sparc64, that gives system cycles rather than cpu clock
 * cycles.
 */

static inline unsigned long long get_tick(void)
{
	unsigned long long t;

	__asm__ __volatile__("rd %%tick, %0" : "=r" (t));
	return t;
}
static inline unsigned long long get_tick(void)
{
	unsigned long long t;

	rdtscll(t);

	return t;
}
static inline unsigned long long get_tick(void)
{
	return get_cycles();
}

static int flow_oif = DEFAULT_OIF;
static int flow_iif = DEFAULT_IIF;
static u32 flow_mark = DEFAULT_MARK;
static struct in6_addr flow_dst_ip_addr;
static struct in6_addr flow_src_ip_addr;
static int flow_tos = DEFAULT_TOS;

static char dst_string[64];
static char src_string[64];

module_param_string(dst, dst_string, sizeof(dst_string), 0);
module_param_string(src, src_string, sizeof(src_string), 0);

static int __init flow_setup(void)
{
	if (dst_string[0] &&
	    !in6_pton(dst_string, -1, &flow_dst_ip_addr.s6_addr[0], -1, NULL)) {
		pr_info("cannot parse \"%s\"\n", dst_string);
		return -1;
	}

	if (src_string[0] &&
	    !in6_pton(src_string, -1, &flow_src_ip_addr.s6_addr[0], -1, NULL)) {
		pr_info("cannot parse \"%s\"\n", dst_string);
		return -1;
	}

	return 0;
}

module_param_named(oif, flow_oif, int, 0);
module_param_named(iif, flow_iif, int, 0);
module_param_named(mark, flow_mark, uint, 0);
module_param_named(tos, flow_tos, int, 0);

static int warmup_count = DEFAULT_WARMUP_COUNT;
module_param_named(count, warmup_count, int, 0);

static void flow_init(struct flowi6 *fl6)
{
	memset(fl6, 0, sizeof(*fl6));
	fl6->flowi6_proto = IPPROTO_ICMPV6;
	fl6->flowi6_oif = flow_oif;
	fl6->flowi6_iif = flow_iif;
	fl6->flowi6_mark = flow_mark;
	fl6->flowi6_tos = flow_tos;
	fl6->daddr = flow_dst_ip_addr;
	fl6->saddr = flow_src_ip_addr;
}

static struct sk_buff * fake_skb_get(void)
{
	struct ipv6hdr *hdr;
	struct sk_buff *skb;

	skb = alloc_skb(4096, GFP_KERNEL);
	if (!skb) {
		pr_info("Cannot alloc SKB for test\n");
		return NULL;
	}
	skb->dev = __dev_get_by_index(&init_net, flow_iif);
	if (skb->dev == NULL) {
		pr_info("Input device (%d) does not exist\n", flow_iif);
		goto err;
	}

	skb_reset_mac_header(skb);
	skb_reset_network_header(skb);
	skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr));
	hdr = ipv6_hdr(skb);

	hdr->priority = 0;
	hdr->version = 6;
	memset(hdr->flow_lbl, 0, sizeof(hdr->flow_lbl));
	hdr->payload_len = htons(sizeof(struct icmp6hdr));
	hdr->nexthdr = IPPROTO_ICMPV6;
	hdr->saddr = flow_src_ip_addr;
	hdr->daddr = flow_dst_ip_addr;
	skb->protocol = htons(ETH_P_IPV6);
	skb->mark = flow_mark;

	return skb;
err:
	kfree_skb(skb);
	return NULL;
}

static void do_full_output_lookup_bench(void)
{
	unsigned long long t1, t2, tdiff;
	struct rt6_info *rt;
	struct flowi6 fl6;
	int i;

	rt = NULL;

	for (i = 0; i < warmup_count; i++) {
		flow_init(&fl6);

		rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
		if (IS_ERR(rt))
			break;
		ip6_rt_put(rt);
	}
	if (IS_ERR(rt)) {
		pr_info("ip_route_output_key: err=%ld\n", PTR_ERR(rt));
		return;
	}

	flow_init(&fl6);

	t1 = get_tick();
	rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
	t2 = get_tick();
	if (!IS_ERR(rt))
		ip6_rt_put(rt);

	tdiff = t2 - t1;
	pr_info("ip6_route_output tdiff: %llu\n", tdiff);
}

static void do_full_input_lookup_bench(void)
{
	unsigned long long t1, t2, tdiff;
	struct sk_buff *skb;
	struct rt6_info *rt;
	int err, i;

	skb = fake_skb_get();
	if (skb == NULL)
		goto out_free;

	err = 0;
	local_bh_disable();
	for (i = 0; i < warmup_count; i++) {
		ip6_route_input(skb);
		rt = (struct rt6_info *)skb_dst(skb);
		err = (!rt || rt == init_net.ipv6.ip6_null_entry);
		skb_dst_drop(skb);
		if (err)
			break;
	}
	local_bh_enable();

	if (err) {
		pr_info("Input route lookup fails\n");
		goto out_free;
	}

	local_bh_disable();
	t1 = get_tick();
	ip6_route_input(skb);
	t2 = get_tick();
	local_bh_enable();

	rt = (struct rt6_info *)skb_dst(skb);
	err = (!rt || rt == init_net.ipv6.ip6_null_entry);
	skb_dst_drop(skb);
	if (err) {
		pr_info("Input route lookup fails\n");
		goto out_free;
	}

	tdiff = t2 - t1;
	pr_info("ip6_route_input tdiff: %llu\n", tdiff);

out_free:
	kfree_skb(skb);
}

static void do_full_lookup_bench(void)
{
	if (!flow_iif)
		do_full_output_lookup_bench();
	else
		do_full_input_lookup_bench();
}

static void do_bench(void)
{
	do_full_lookup_bench();
	do_full_lookup_bench();
	do_full_lookup_bench();
	do_full_lookup_bench();
}

static int __init kbench_init(void)
{
	if (flow_setup())
		return -EINVAL;

	pr_info("flow [IIF(%d),OIF(%d),MARK(0x%08x),D("IP6_FMT"),"
		"S("IP6_FMT"),TOS(0x%02x)]\n",
		flow_iif, flow_oif, flow_mark,
		IP6_PRT(flow_dst_ip_addr),
		IP6_PRT(flow_src_ip_addr),
		flow_tos);

	if (!cpu_has_tsc) {
		pr_err("X86 TSC is required, but is unavailable.\n");
		return -EINVAL;
	}

	pr_info("sizeof(struct rt6_info)==%zu\n", sizeof(struct rt6_info));

	do_bench();

	return -ENODEV;
}

static void __exit kbench_exit(void)
{
}

module_init(kbench_init);
module_exit(kbench_exit);
MODULE_LICENSE("GPL");
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fad71e4a

ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn · 367efcb9

Martin KaFai Lau authored Oct 20, 2014

This patch save the fn before doing rt6_backtrack.
Hence, without redo-ing the fib6_lookup(), saved_fn can be used
to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.

Some minor changes I think make sense to review as a single patch:
* Remove the 'out:' goto label.
* Remove the 'reachable' variable. Only use the 'strict' variable instead.

After this patch, "failing ip6_ins_rt()" should be the only case that
requires a redo of fib6_lookup().

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

367efcb9