Commits · 2aac7a2cb0d9d8c65fc7dde3e19e46b3e878d23d · Kirill Smelkov / linux

16 Dec, 2011 12 commits

unix_diag: Pending connections IDs NLA · 2aac7a2c

Pavel Emelyanov authored Dec 15, 2011

When establishing a unix connection on stream sockets the
server end receives an skb with socket in its receive queue.

Report who is waiting for these ends to be accepted for
listening sockets via NLA.

There's a lokcing issue with this -- the unix sk state lock is
required to access the peer, and it is taken under the listening
sk's queue lock. Strictly speaking the queue lock should be taken
inside the state lock, but since in this case these two sockets
are different it shouldn't lead to deadlock.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2aac7a2c

unix_diag: Unix peer inode NLA · ac02be8d

Pavel Emelyanov authored Dec 15, 2011

Report the peer socket inode ID as NLA. With this it's finally
possible to find out the other end of an interesting unix connection.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac02be8d

unix_diag: Unix inode info NLA · 5f7b0569

Pavel Emelyanov authored Dec 15, 2011

Actually, the socket path if it's not anonymous doesn't give
a clue to which file the socket is bound to. Even if the path
is absolute, it can be unlinked and then new socket can be
bound to it.

With this NLA it's possible to check which file a particular
socket is really bound to.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f7b0569

unix_diag: Unix socket name NLA · f5248b48

Pavel Emelyanov authored Dec 15, 2011

Report the sun_path when requested as NLA. With leading '\0' if
present but without the leading AF_UNIX bits.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5248b48

unix_diag: Dumping exact socket core · 5d3cae8b

Pavel Emelyanov authored Dec 15, 2011

The socket inode is used as a key for lookup. This is effectively
the only really unique ID of a unix socket, but using this for
search currently has one problem -- it is O(number of sockets) :(

Does it worth fixing this lookup or inventing some other ID for
unix sockets?
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d3cae8b

unix_diag: Dumping all sockets core · 45a96b9b

Pavel Emelyanov authored Dec 15, 2011

Walk the unix sockets table and fill the core response structure,
which includes type, state and inode.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45a96b9b

unix_diag: Basic module skeleton · 22931d3b

Pavel Emelyanov authored Dec 15, 2011

Includes basic module_init/_exit functionality, dump/get_exact stubs
and declares the basic API structures for request and response.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22931d3b

af_unix: Export stuff required for diag module · fa7ff56f

Pavel Emelyanov authored Dec 15, 2011

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa7ff56f

sock_diag: Generalize requests cookies managements · f65c1b53

Pavel Emelyanov authored Dec 15, 2011

The sk address is used as a cookie between dump/get_exact calls.
It will be required for unix socket sdumping, so move it from
inet_diag to sock_diag.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f65c1b53

sock_diag: Fix module netlink aliases · aec8dc62

Pavel Emelyanov authored Dec 15, 2011

I've made a mistake when fixing the sock_/inet_diag aliases :(

1. The sock_diag layer should request the family-based alias,
   not just the IPPROTO_IP one;
2. The inet_diag layer should request for AF_INET+protocol alias,
   not just the protocol one.

Thus fix this.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aec8dc62

sock_diag: Move the SOCK_DIAG_BY_FAMILY cmd declaration · e7c466e5

Pavel Emelyanov authored Dec 15, 2011

It should belong to sock_diag, not inet_diag.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7c466e5

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b26e478f
David S. Miller authored Dec 16, 2011
```
Conflicts:
	drivers/net/ethernet/freescale/fsl_pq_mdio.c
	net/batman-adv/translation-table.c
	net/ipv6/route.c
```
b26e478f

15 Dec, 2011 7 commits

tg3: Break out RSS indir table init and assignment · bcebcc46

Matt Carlson authored Dec 14, 2011

This patch creates a new device member to hold the RSS indirection table
and separates out the code that initializes the table from the code that
programs the table into device registers.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcebcc46

tg3: Use mii_advertise_flowctrl · f88788f0

Matt Carlson authored Dec 14, 2011

This patch replaces tg3's internal tg3_advert_flowctrl_1000T function
with mii_advertise_flowctrl provided by the kernel headers.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f88788f0

tg3: Add 57766 ASIC rev support · 55086ad9

Matt Carlson authored Dec 14, 2011

This patch adds support for the 57766 ASIC revision.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55086ad9

tg3: Make the TX BD DMA limit configurable · a4cb428d

Matt Carlson authored Dec 14, 2011

The 57766 ASIC rev will impose a new TX BD DMA limit on the driver.
This patch prepares for 57766 support by making the tx BD DMA limit
tunable.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a4cb428d

tg3: Enable EEE support for capable 10/100 devs · 4f272096

Matt Carlson authored Dec 14, 2011

There are some devices in the 57765 ASIC rev that are EEE capable.
Unfortunately the EEE setup code only gets executed if the device is
gigabit capable.  This patch fixes the problem.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f272096

tcp_memcontrol: fix reversed if condition · c48e074c

Dan Carpenter authored Dec 15, 2011

We should only dereference the pointer if it's valid, not the other way
round.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c48e074c

Move limit definitions outside CONFIG_INET · 888bdaa9

Glauber Costa authored Dec 14, 2011

They need to be available for other protocols as well, since
they are used in sock.c openly
Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

888bdaa9

14 Dec, 2011 7 commits

inet: remove rcu protection on tw_net · f943cbe6

Eric Dumazet authored Dec 14, 2011

commit b099ce26 (net: Batch inet_twsk_purge) added rcu protection
on tw_net for no obvious reason.

struct net are refcounted anyway since timewait sockets escape from rcu
protected sections. tw_net stay valid for the whole timwait lifetime.

This also removes a lot of sparse errors.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f943cbe6

net: ping: remove some sparse errors · e6560d4d

Eric Dumazet authored Dec 14, 2011

net/ipv4/sysctl_net_ipv4.c:78:6: warning: symbol 'inet_get_ping_group_range_table'
was not declared. Should it be static?

net/ipv4/sysctl_net_ipv4.c:119:31: warning: incorrect type in argument 2
(different signedness)
net/ipv4/sysctl_net_ipv4.c:119:31: expected int *range
net/ipv4/sysctl_net_ipv4.c:119:31: got unsigned int *<noident>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6560d4d

cls_flow: remove one dynamic array · 3a53943b

Eric Dumazet authored Dec 14, 2011

Its better to use a predefined size for this small automatic variable.

Removes a sparse error as well :

net/sched/cls_flow.c:288:13: error: bad constant expression
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a53943b

bnx2x: handle vpd data longer than 128 bytes · fcdf95cb

Barak Witkowski authored Dec 14, 2011

Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcdf95cb

vlan: static functions · de93cb2e

Eric Dumazet authored Dec 13, 2011

commit 6d4cdf47 (vlan: add 802.1q netpoll support) forgot to declare
as static some private functions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

de93cb2e

vlan: add rtnl_dereference() annotations · f9586f79

Dan Carpenter authored Dec 13, 2011

The original code generates a Sparse warning:
net/8021q/vlan_core.c:336:9:
	error: incompatible types in comparison expression (different address spaces)

It's ok to dereference __rcu pointers here because we are holding the
RTNL lock.  I've added some calls to rtnl_dereference() to silence the
warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9586f79

rtnetlink: rtnl_link_register() sanity test · c63044f0

Eric Dumazet authored Dec 13, 2011

Before adding a struct rtnl_link_ops into link_ops list, check it doesnt
clash with a prior one.

Based on a previous patch from Alexander Smirnov
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c63044f0

13 Dec, 2011 14 commits

ipv6: Check dest prefix length on original route not copied one in rt6_alloc_cow(). · bb3c3686

David S. Miller authored Dec 13, 2011

After commit 8e2ec639 ("ipv6: don't
use inetpeer to store metrics for routes.") the test in rt6_alloc_cow()
for setting the ANYCAST flag is now wrong.

'rt' will always now have a plen of 128, because it is set explicitly
to 128 by ip6_rt_copy.

So to restore the semantics of the test, check the destination prefix
length of 'ort'.
Signed-off-by: David S. Miller <davem@davemloft.net>

bb3c3686

ipv6: If neigh lookup fails during icmp6 dst allocation, propagate error. · b43faac6

David S. Miller authored Dec 13, 2011

Don't just succeed with a route that has a NULL neighbour attached.
This follows the behavior of addrconf_dst_alloc().

Allowing this kind of route to end up with a NULL neigh attached will
result in packet drops on output until the route is somehow
invalidated, since nothing will meanwhile try to lookup the neigh
again.

A statistic is bumped for the case where we see a neigh-less route on
output, but the resulting packet drop is otherwise silent in nature,
and frankly it's a hard error for this to happen and ipv6 should do
what ipv4 does which is say something in the kernel logs.
Signed-off-by: David S. Miller <davem@davemloft.net>

b43faac6

net: Remove unused neighbour layer ops. · 5c3ddec7

David S. Miller authored Dec 13, 2011

It's simpler to just keep these things out until there is a real user
of them, so we can see what the needs actually are, rather than keep
these things around as useless overhead.
Signed-off-by: David S. Miller <davem@davemloft.net>

5c3ddec7

mlx4_en: updated driver version to 2.0 · 6edf91da

Yevgeny Petrilin authored Dec 13, 2011

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

6edf91da

mlx4_core: updated driver version to 1.1 · 7d4b6bcc

Yevgeny Petrilin authored Dec 13, 2011

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d4b6bcc

mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet · ab9c17a0

Jack Morgenstein authored Dec 13, 2011

1. Added module parameters sr_iov and probe_vf for controlling enablement of
   SRIOV mode.
2. Increased default max num-qps, num-mpts and log_num_macs to accomodate
   SRIOV mode
3. Added port_type_array as a module parameter to allow driver startup with
   ports configured as desired.
   In SRIOV mode, only ETH is supported, and this array is ignored; otherwise,
   for the case where the FW supports both port types (ETH and IB), the
   port_type_array parameter is used.
   By default, the port_type_array is set to configure both ports as IB.
4. When running in sriov mode, the master needs to initialize the ICM eq table
   to hold the eq's for itself and also for all the slaves.
5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap.
6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures
   mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow
   modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca).
   VFs obtain their startup information from the PF (master) device via the
   comm channel.
7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver
   aborts loading the device.
8. Do not allow setting port type via sysfs when running in SRIOV mode.
9. mlx4_get_ownership:  Currently, only one PF is supported by the driver.
   If the HCA is burned with FW which enables more than one PF, only one
   of the PFs is allowed to run.  The first one up grabs a FW ownership
   semaphone -- all other PFs will find that semaphore taken, and the
   driver will not allow them to run.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab9c17a0

mlx4_core: adjust catas operation for SRIOV mode · d81c7186

Jack Morgenstein authored Dec 13, 2011

When running in SRIOV mode, driver should not automatically start/stop
the mlx4_core upon sensing an HCA internal error -- doing this disables/enables
sriov, which will cause the hypervisor to hang if there are running VMs with
attached VFs.

In addition, on VMs the catas process should not run at all, since the HCA
error buffer is not available to VMs in the BARs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

d81c7186

mlx4_core: mtts resources units changed to offset · 2b8fb286

Marcel Apfelbaum authored Dec 13, 2011

In the previous implementation mtts are managed by:
1. order     - log(mtt segments), 'mtt segment' groups several mtts together.
2. first_seg - segment location relative to mtt table.
In the current implementation:
1. order     - log(mtts) rather than segments
2. offset    - mtt index in mtt table

Note: The actual mtt allocation is made in segments but it is
      transparent to callers.

Rational: The mtt resource holders are not interested on how the allocation
          of mtt is done, but rather on how they will use it.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b8fb286

mlx4_en: Allow communication between functions on same host · 5b4c4d36

Eugenia Emantayev authored Dec 13, 2011

To enable internal loopback, always fill DMAC in control segment
when transmitting the packet, once this is done, the packet is subject
for loopback for if the DMAC mathces one of the multicast/unicast addresses
registered on the physical port.
In receive path if source MAC is our own MAC and we are not in selftest,
or not in force LB mode - drop this packet.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b4c4d36

mlx4: Ethernet port management modifications · ffe455ad

Eugenia Emantayev authored Dec 13, 2011

The physical port is now common to the PF and VFs.
The port resources and configuration is managed by the PF, VFs can
only influence the MTU of the port, it is set as max among all functions,
Each function allocates RX buffers of required size to meet it's MTU enforcement.
Port management code was moved to mlx4_core, as the mlx4_en module is
virtualization unaware

Move handling qp functionality to mlx4_get_eth_qp/mlx4_put_eth_qp
including reserve/release range and add/release unicast steering.
Let mlx4_register/unregister_mac deal only with MAC (un)registration.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffe455ad

mlx4: Traffic steering management support for SRIOV · 0ec2c0f8

Eugenia Emantayev authored Dec 13, 2011

Let multicast/unicast attaching flow go through resource tracker.
The PF is the one responsible for managing all the steering entries.
Define and use module parameter that determines the number of qps
per multicast group.
Minor changes in function calls according to changed prototype.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ec2c0f8

mlx4_ib: disable SRIOV mode for IB ports (not yet supported) · 8e59d254

Jack Morgenstein authored Dec 13, 2011

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e59d254

mlx4_core: resource tracking for HCA resources used by guests · c82e9aa0

Eli Cohen authored Dec 13, 2011

The resource tracker is used to track usage of HCA resources by the different
guests.

Virtual functions (VFs) are attached to guest operating systems but
resources are allocated from the same pool and are assigned to VFs. It is
essential that hostile/buggy guests not be able to affect the operation of
other VFs, possibly attached to other guest OSs since ConnectX firmware is not
tolerant to misuse of resources.

The resource tracker module associates each resource with a VF and maintains
state information for the allocated object. It also defines allowed state
transitions and enforces them.

Relationships between resources are also referred to. For example, CQs are
pointed to by QPs, so it is forbidden to destroy a CQ if a QP refers to it.

ICM memory is always accessible through the primary function and hence it is
allocated by the owner of the primary function.

When a guest dies, an FLR is generated for all the VFs it owns and all the
resources it used are freed.

The tracked resource types are: QPs, CQs, SRQs, MPTs, MTTs, MACs, RES_EQs,
and XRCDNs.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

c82e9aa0

mlx4_core: Add wrapper functions and comm channel and slave event support to EQs · acba2420

Jack Morgenstein authored Dec 13, 2011

Passing async events to slaves:
In SRIOV mode, each slave creates its own async EQ, but only the master can
register directly with the FW to receive async events.  Async events which
should be passed to slaves (such as a WQ_ACCESS_ERROR for a QP owned by a slave)
are generated at the slave by the master using the GEN_EQE FW command.

Wrapper functions: mlx4_MAP_EQ_wrapper
Only the master can map an EQ. The slave commands to map their EQs arrive
at the master via the comm channel.  The master then invokes the wrapper
function to do the work (and enter the resource in the tracking database).

New events: COMM_CHANNEL and FLR
The COMM_CHANNEL event arrives only at the master, and signals that
a slave has posted a command on the comm channel.
The FLR event is generated by the FW when a guest operating a VF
unexpectedly goes down.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

acba2420