Commits · 24b36f0193467fa727b85b4c004016a8dae999b9 · Kirill Smelkov / linux

02 Aug, 2010 1 commit

netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary · 24b36f01

Eric Dumazet authored Aug 02, 2010

We currently disable BH for the whole duration of get_counters()

On machines with a lot of cpus and large tables, this might be too long.

We can disable preemption during the whole function, and disable BH only
while fetching counters for the current cpu.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

24b36f01

23 Jul, 2010 10 commits

netfilter: iptables: use skb->len for accounting · 7df0884c

Changli Gao authored Jul 23, 2010

Use skb->len for accounting as xt_quota does.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

7df0884c

netfilter: ip6tables: use skb->len for accounting · 261abc8c

Changli Gao authored Jul 23, 2010

ipv6_hdr(skb)->payload_len is ZERO and can't be used for accounting, if
the payload is a Jumbo Payload specified in RFC2675.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

261abc8c

xt_quota: report initial quota value instead of current value to userspace · 49daf6a2

Changli Gao authored Jul 23, 2010

We should copy the initial value to userspace for iptables-save and
to allow removal of specific quota rules.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

49daf6a2

netfilter: xt_quota: use per-rule spin lock · b0c81aa5

Changli Gao authored Jul 23, 2010

Use per-rule spin lock to improve the scalability.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

b0c81aa5

netfilter: arptables: use arp_hdr_len() · f667009e

Changli Gao authored Jul 23, 2010

use arp_hdr_len().
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

f667009e

netfilter: nf_nat_core: merge the same lines · c36952e5

Changli Gao authored Jul 23, 2010

proto->unique_tuple() will be called finally, if the previous calls fail. This
patch checks the false condition of (range->flags &IP_NAT_RANGE_PROTO_RANDOM)
instead to avoid duplicate line of code: proto->unique_tuple().
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

c36952e5

netfilter: add xt_cpu match · e8648a1f

Eric Dumazet authored Jul 23, 2010

In some situations a CPU match permits a better spreading of
connections, or select targets only for a given cpu.

With Remote Packet Steering or multiqueue NIC and appropriate IRQ
affinities, we can distribute trafic on available cpus, per session.
(all RX packets for a given flow is handled by a given cpu)

Some legacy applications being not SMP friendly, one way to scale a
server is to run multiple copies of them.

Instead of randomly choosing an instance, we can use the cpu number as a
key so that softirq handler for a whole instance is running on a single
cpu, maximizing cache effects in TCP/UDP stacks.

Using NAT for example, a four ways machine might run four copies of
server application, using a separate listening port for each instance,
but still presenting an unique external port :

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
        -j REDIRECT --to-port 8080

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
        -j REDIRECT --to-port 8081

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
        -j REDIRECT --to-port 8082

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
        -j REDIRECT --to-port 8083
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

e8648a1f

IPVS: make FTP work with full NAT support · 7f1c4075

Hannes Eder authored Jul 23, 2010

Use nf_conntrack/nf_nat code to do the packet mangling and the TCP
sequence adjusting.  The function 'ip_vs_skb_replace' is now dead
code, so it is removed.

To SNAT FTP, use something like:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
    --vport 21 -j SNAT --to-source 192.168.10.10
and for the data connections in passive mode:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
    --vportctl 21 -j SNAT --to-source 192.168.10.10
using '-m state --state RELATED' would also works.

Make sure the kernel modules ip_vs_ftp, nf_conntrack_ftp, and
nf_nat_ftp are loaded.

[ up-port and minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

7f1c4075

IPVS: make friends with nf_conntrack · 7b215ffc

Hannes Eder authored Jul 23, 2010

Update the nf_conntrack tuple in reply direction, as we will see
traffic from the real server (RIP) to the client (CIP).  Once this is
done we can use netfilters SNAT in POSTROUTING, especially with
xt_ipvs, to do source NAT, e.g.:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 80 \
		  -j SNAT --to-source 192.168.10.10

[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

7b215ffc

netfilter: xt_ipvs (netfilter matcher for IPVS) · 9c3e1c39

Hannes Eder authored Jul 23, 2010

This implements the kernel-space side of the netfilter matcher xt_ipvs.

[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
[ Patrick: added xt_ipvs.h to Kbuild ]
Signed-off-by: Patrick McHardy <kaber@trash.net>

9c3e1c39

16 Jul, 2010 1 commit
- netfilter: correct CHECKSUM header and export it · 22cb5166
  Michael S. Tsirkin authored Jul 16, 2010
```
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
```
  22cb5166
15 Jul, 2010 3 commits

netfilter: add CHECKSUM target · edf0e1fb

Michael S. Tsirkin authored Jul 15, 2010

This adds a `CHECKSUM' target, which can be used in the iptables mangle
table.

You can use this target to compute and fill in the checksum in
a packet that lacks a checksum.  This is particularly useful,
if you need to work around old applications such as dhcp clients,
that do not work well with checksum offloads, but don't want to
disable checksum offload in your device.

The problem happens in the field with virtualized applications.
For reference, see Red Hat bz 605555, as well as
http://www.spinics.net/lists/kvm/msg37660.html

Typical expected use (helps old dhclient binary running in a VM):
iptables -A POSTROUTING -t mangle -p udp --dport bootpc \
	-j CHECKSUM --checksum-fill

Includes fixes by Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

edf0e1fb

netfilter: nf_ct_tcp: fix flow recovery with TCP window tracking enabled · fac42a9a

Pablo Neira Ayuso authored Jul 15, 2010

This patch adds the missing bits to support the recovery of TCP flows
without disabling window tracking (aka be_liberal). To ensure a
successful recovery, we have to inject the window scale factor via
ctnetlink.

This patch has been tested with a development snapshot of conntrackd
and the new clause `TCPWindowTracking' that allows to perform strict
TCP window tracking recovery across fail-overs.

With this patch, we don't update the receiver's window until it's not
initiated. We require this to perform a successful recovery. Jozsef
confirmed in a private email that this spotted a real issue since that
should not happen.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

fac42a9a

nfnetlink_log: do not expose NFULNL_COPY_DISABLED to user-space · cca5cf91

Pablo Neira Ayuso authored Jul 15, 2010

This patch moves NFULNL_COPY_PACKET definition from
linux/netfilter/nfnetlink_log.h to net/netfilter/nfnetlink_log.h
since this copy mode is only for internal use.

I have also changed the value from 0x03 to 0xff. Thus, we avoid
a gap from user-space that may confuse users if we add new
copy modes in the future.

This change was introduced in:
http://www.spinics.net/lists/netfilter-devel/msg13535.html

Since this change is not included in any stable Linux kernel,
I think it's safe to make this change now. Anyway, this copy
mode does not make any sense from user-space, so this patch
should not break any existing setup.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

cca5cf91

09 Jul, 2010 2 commits

netfilter: xt_TPROXY: the length of lines should be within 80 · 116e1f1b

Changli Gao authored Jul 09, 2010

According to the Documentation/CodingStyle, the length of lines should
be within 80.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

116e1f1b

ipvs: lvs sctp protocol handler is incorrectly invoked ip_vs_app_pkt_out · 8a0acaac

Xiaoyu Du authored Jul 09, 2010

lvs sctp protocol handler is incorrectly invoked ip_vs_app_pkt_out
Since there's no sctp helpers at present, it does the same thing as
ip_vs_app_pkt_in.
Signed-off-by: Xiaoyu Du <tingsrain@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

8a0acaac

05 Jul, 2010 4 commits

ipvs: Kconfig cleanup · 72c7664f

Michal Marek authored Jul 05, 2010

IP_VS_PROTO_AH_ESP should be set iff either of IP_VS_PROTO_{AH,ESP} is
selected. Express this with standard kconfig syntax.
Signed-off-by: Michal Marek <mmarek@suse.cz>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

72c7664f

netfilter: ipt_REJECT: avoid touching dst ref · b13b7125

Eric Dumazet authored Jul 05, 2010

We can avoid a pair of atomic ops in ipt_REJECT send_reset()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

b13b7125

netfilter: ipt_REJECT: postpone the checksum calculation. · 98b0e84a

Changli Gao authored Jul 05, 2010

postpone the checksum calculation, then if the output NIC supports checksum
offloading, we can utlize it. And though the output NIC doesn't support
checksum offloading, but we'll mangle this packet, this can free us from
updating the checksum, as the checksum calculation occurs later.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

98b0e84a

netfilter: nf_conntrack_reasm: add fast path for in-order fragments · ea8fbe8f

Changli Gao authored Jul 05, 2010

As the fragments are sent in order in most of OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ea8fbe8f

04 Jul, 2010 9 commits

IB/{nes, ipoib}: Pass supported flags to ethtool_op_set_flags() · 39827be2

Ben Hutchings authored Jul 03, 2010

Following commit 1437ce39 "ethtool:
Change ethtool_op_set_flags to validate flags", ethtool_op_set_flags
takes a third parameter and cannot be used directly as an
implementation of ethtool_ops::set_flags.

Changes nes and ipoib driver to pass in the appropriate value.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

39827be2

bnx2: Update version to 2.0.16. · e5a0c1fd

Michael Chan authored Jul 03, 2010

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5a0c1fd

bnx2: Dump some config space registers during TX timeout. · 5804a8fb

Michael Chan authored Jul 03, 2010

These config register values will be useful when the memory registers
are returning 0xffffffff which has been reported.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5804a8fb

bnx2: Add support for skb->rxhash. · fdc8541d

Michael Chan authored Jul 03, 2010

Add skb->rxhash support for TCP packets only because the bnx2 RSS hash
does not hash UDP ports.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdc8541d

bnx2: Always enable MSI-X on 5709. · 3d5f3a7b

Michael Chan authored Jul 03, 2010

Minor change to use MSI-X even if there is only one CPU.  This allows
the CNIC driver to always have a dedicated MSI-X vector to handle
iSCSI events, instead of sharing the MSI vector.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d5f3a7b

netdevice.h: Change netif_<level> macros to call netdev_<level> functions · f45f4321

Joe Perches authored Jun 27, 2010

Reduces text ~300 bytes of text (woohoo!) in an x86 defconfig

$ size vmlinux*
   text	   data	    bss	    dec	    hex	filename
7198526	 720112	1366288	9284926	 8dad3e	vmlinux
7198862	 720112	1366288	9285262	 8dae8e	vmlinux.netdev
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f45f4321

netdevice.h net/core/dev.c: Convert netdev_<level> logging macros to functions · 256df2f3

Joe Perches authored Jun 27, 2010

Reduces an x86 defconfig text and data ~2k.
text is smaller, data is larger.

$ size vmlinux*
   text	   data	    bss	    dec	    hex	filename
7198862	 720112	1366288	9285262	 8dae8e	vmlinux
7205273	 716016	1366288	9287577	 8db799	vmlinux.device_h

Uses %pV and struct va_format
Format arguments are verified before printk
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

256df2f3

device.h drivers/base/core.c Convert dev_<level> logging macros to functions · 99bcf217

Joe Perches authored Jun 27, 2010

Reduces an x86 defconfig text and data ~55k, .6% smaller.

$ size vmlinux*
   text	   data	    bss	    dec	    hex	filename
7205273	 716016	1366288	9287577	 8db799	vmlinux
7258890	 719768	1366288	9344946	 8e97b2	vmlinux.master

Uses %pV and struct va_format
Format arguments are verified before printk

The dev_info macro is converted to _dev_info because there are
existing uses of variables named dev_info in the kernel tree
like drivers/net/pcmcia/pcnet_cs.c

A dev_info macro is created to call _dev_info
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

99bcf217

vsprintf: Recursive vsnprintf: Add "%pV", struct va_format · 7db6f5fb

Joe Perches authored Jun 27, 2010

Add the ability to print a format and va_list from a structure pointer

Allows __dev_printk to be implemented as a single printk while
minimizing string space duplication.

%pV should not be used without some mechanism to verify the
format and argument use ala __attribute__(format (printf(...))).
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

7db6f5fb

03 Jul, 2010 1 commit
- Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 · e490c1de
  David S. Miller authored Jul 02, 2010
  
  e490c1de
02 Jul, 2010 9 commits

bridge: add per bridge device controls for invoking iptables · 4df53d8b

Patrick McHardy authored Jul 02, 2010

Support more fine grained control of bridge netfilter iptables invocation
by adding seperate brnf_call_*tables parameters for each device using the
sysfs interface. Packets are passed to layer 3 netfilter when either the
global parameter or the per bridge parameter is enabled.
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>

4df53d8b

ixgbe: use NETIF_F_LRO · 0a17d8c7

Stanislaw Gruszka authored Jul 01, 2010

Both ETH_FLAG_LRO and NETIF_F_LRO have the same value, but NETIF_F_LRO
is intended to use with netdev->features.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a17d8c7

igb: Add comment · de42edde

Greg Rose authored Jul 01, 2010

Add explanatory comment to avoid confusion when a pointer is set
to the second word of an array instead of the customary cast of a
pointer to the beginning of the array.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de42edde

igb: correct link test not being run when link is down · 8d420a1b

Alexander Duyck authored Jul 01, 2010

The igb online link test was always reporting pass because instead of
checking for if_running it was checking for netif_carrier_ok.

This change corrects the test so that it is run if the interface is running
instead of checking for netif carrier ok.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d420a1b

igb: Fix Tx hangs seen when loading igb with max_vfs > 7. · c0f2276f

Emil Tantilov authored Jul 01, 2010

Check the value of max_vfs at the time of assignment of vfs_allocated_count.

The previous check in igb_probe_vfs was too late as by that time the rx/tx
rings were initialized with the wrong offset.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0f2276f

igb: Use only a single Tx queue in SR-IOV mode · 5fa8517f

Greg Rose authored Jul 01, 2010

The 82576 expects the second rx queue in any pool to receive L2 switch
loop back packets sent from the second tx queue in another pool.  The
82576 VF driver does not enable the second rx queue so if the PF driver
sends packets destined to a VF from its second tx queue then the VF
driver will never see them.  In SR-IOV mode limit the number of tx queues
used by the PF driver to one. This patch fixes a bug reported in which
the PF cannot communciate with the VF and should be considered for 2.6.34
stable.

CC: stable@kernel.org
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fa8517f

igb: fix PHY config access on 82580 · ede3ef0d

Nick Nunley authored Jul 01, 2010

82580 NICs can have up to 4 functions. This fixes phy accesses
to use the correct locks for functions 2 and 3.
Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ede3ef0d

x86: Drop CONFIG_MCORE2 check around setting of NET_IP_ALIGN · 74752710

Alexander Duyck authored Jul 01, 2010

This patch removes the CONFIG_MCORE2 check from around NET_IP_ALIGN.  It is
based on a suggestion from Andi Kleen.  The assumption is that there are
not any x86 cores where unaligned access is really slow, and this change
would allow for a performance improvement to still exist on configurations
that are not necessarily optimized for Core 2.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74752710

ll_temac: add error checking to DMA init path · fe62c298

Denis Kirjanov authored Jun 30, 2010

Add error checking to DMA descriptor rings initialization code.
Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe62c298