Commits · a2519929aba78e8cec7955d2c2a0c1e230d1f6e6 · nexedi / linux

04 Mar, 2015 10 commits

mpls: Basic support for adding and removing routes · a2519929

Eric W. Biederman authored Mar 03, 2015

mpls_route_add and mpls_route_del implement the basic logic for adding
and removing Next Hop Label Forwarding Entries from the MPLS input
label map.  The addition and subtraction is done in a way that is
consistent with how the existing routing table in Linux are
maintained.  Thus all of the work to deal with NLM_F_APPEND,
NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE.

Cases that are not clearly defined such as changing the interpretation
of the mpls reserved labels is not allowed.

Because it seems like the right thing to do adding an MPLS route without
specifying an input label and allowing the kernel to pick a free label
table entry is supported.   The implementation is currently less than optimal
but that can be changed.

As I don't have anything else to test with only ethernet and the loopback
device are the only two device types currently supported for forwarding
MPLS over.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2519929

mpls: Add a sysctl to control the size of the mpls label table · 7720c01f

Eric W. Biederman authored Mar 03, 2015

This sysctl gives two benefits.  By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets.  This prevents unpleasant surprises for users.

The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.

This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7720c01f

mpls: Basic routing support · 0189197f

Eric W. Biederman authored Mar 03, 2015

This change adds a new Kconfig option MPLS_ROUTING.

The core of this change is the code to look at an mpls packet received
from another machine.  Look that packet up in a routing table and
forward the packet on.

Support of MPLS over ATM is not considered or attempted here.  This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.

What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[].  What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
label fordwarding information base (lfib) might also be valid.

Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path.  In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.

Quite a few optional features are not implemented here.  Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error).  The traffic
class field is always set to 0.  The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.

Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value).  I was lazy and implemented a next hop mac
address instead.  The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.

Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.

Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces.  There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte.  Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.

For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early.  Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped.  Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0189197f

mpls: Refactor how the mpls module is built · cec9166c

Eric W. Biederman authored Mar 03, 2015

This refactoring is needed to allow more than just mpls gso
support to be built into the mpls moddule.
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cec9166c

Merge branch 'neigh-mpls-prep' · ee23393b

David S. Miller authored Mar 04, 2015

Eric W. Biederman says:

====================
Neighbour table prep for MPLS

In preparation for using the IPv4 and IPv6 neighbour tables in my mpls
code this patchset factors out ___neigh_lookup_noref from
__ipv4_neigh_lookup_noref, __ipv6_lookup_noref and neigh_lookup.
Allowing the lookup logic to be shared between the different
implementations.  At what appears to be no cost. (Aka the same assembly
is generated for ip6_finish_output2 and ip_finish_output2).

After that I add a simple function that takes an address family and an
address consults the neighbour table and sends the packet to the
appropriate location.  The address family argument decoupls callers
of neigh_xmit from the addresses families the packets are sent over.
(Aka The ipv6 module can be loaded after mpls and a previously
configured ipv6 next hop will start working).

The refactoring in ___neigh_lookup_noref may be a bit overkill but it
feels like the right thing to do.  Especially since the same code is
generated.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ee23393b

neigh: Add helper function neigh_xmit · 4fd3d7d9

Eric W. Biederman authored Mar 03, 2015

For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.

The kind of next hop we have is indicated by the neighbour table
pointer.  A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.

The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4fd3d7d9

neigh: Factor out ___neigh_lookup_noref · 60395a20

Eric W. Biederman authored Mar 03, 2015

While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.

So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function.  I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.

To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq.  So that the static size of the key would be
available.

I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60395a20

bridge: fix bridge netlink RCU usage · 2f56f6be

Johannes Berg authored Mar 03, 2015

When the STP timer fires, it can call br_ifinfo_notify(),
which in turn ends up in the new br_get_link_af_size().
This function is annotated to be using RTNL locking, which
clearly isn't the case here, and thus lockdep warns:

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.19.0+ #569 Not tainted
  -------------------------------
  net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage!

Fix this by doing RCU locking here.

Fixes: b7853d73 ("bridge: add vlan info to bridge setlink and dellink notification messages")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f56f6be

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 71a83a6d

David S. Miller authored Mar 03, 2015

Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.
Signed-off-by: David S. Miller <davem@davemloft.net>

71a83a6d

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · b97526f3

David S. Miller authored Mar 03, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-03-03

This series contains updates to fm10k, i40e and i40evf.

Matthew updates the fm10k driver by cleaning up code comments and whitespace
issues.  Also modifies the tunnel length header check, to make it more robust
by calculating the inner L4 header length based on whether it is TCP or UDP.
Implemented ndo_features_check() that allows drivers to report their offload
capabilities per-skb.

Neerav updates the i40e driver to skip over priority tagging if DCB is not
enabled.  Fixes an issue where the driver is not flushing out the
DCBNL app table for applications that are not present in the local DCBX
application configuration TLVs.  Fixed i40e where, in the case of MFP
mode, the driver was returning the incorrect number of traffic classes
for partitions that are not enabled for iSCSI.  Even though the driver
was not configuring these traffic classes in the transmit scheduler for
the NIC partitions, it does use this map to setup the queue mappings.

Shannon updates i40e/i40evf to include the firmware build number in the
formatted firmware version string.

Akeem adds a safety net (by adding a 'default' case) for the possible
unmatched switch calls.

Mitch updates i40e to not automatically disable PF loopback at runtime,
now that we have the functionality to enable and disable PF loopback.  This
fix cleans up a bogus error message when removing the PF module with VFs
enabled.  Adds a extra check to make sure that the indirection table
pointer is valid before dereferencing it.

Anjali enables i40e to enable more than the max RSS qps when running in a
single TC mode for the main VSI.  It is possible to enable as many as
num_online_cpus().  Adds a firmware check to ensure that DCB is disabled for
firmware versions older than 4.33.  Updates i40e/i40evf to add missing
packet types for VXLAN offload.  Updated i40e to be able to handle varying
RSS table size for each VSI, since all VSI's do not have the same RSS table
size.

v2: Dropped previous patch #9 "i40e/i40evf: Add capability to gather VEB
    per TC stats" since the stats should be in ethtool and not debugfs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b97526f3

03 Mar, 2015 30 commits

Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux · a6c5170d

Linus Torvalds authored Mar 03, 2015

Pull nfsd fixes from Bruce Fields:
 "Three miscellaneous bugfixes, most importantly the clp->cl_revoked
  bug, which we've seen several reports of people hitting"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  sunrpc: integer underflow in rsc_parse()
  nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd
  svcrpc: fix memory leak in gssp_accept_sec_context_upcall

a6c5170d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 789d7f60

Linus Torvalds authored Mar 03, 2015

Pull networking fixes from David Miller:

 1) If an IPVS tunnel is created with a mixed-family destination
    address, it cannot be removed.  Fix from Alexey Andriyanov.

 2) Fix module refcount underflow in netfilter's nft_compat, from Pablo
    Neira Ayuso.

 3) Generic statistics infrastructure can reference variables sitting on
    a released function stack, therefore use dynamic allocation always.
    Fix from Ignacy Gawędzki.

 4) skb_copy_bits() return value test is inverted in ip_check_defrag().

 5) Fix network namespace exit in openvswitch, we have to release all of
    the per-net vports.  From Pravin B Shelar.

 6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter.

 7) Fix rhashtable grow/shrink behavior, only expand during inserts and
    shrink during deletes.  From Daniel Borkmann.

 8) Netdevice names with semicolons should never be allowed, because
    they serve as a separator.  From Matthew Thode.

 9) Use {,__}set_current_state() where appropriate, from Fabian
    Frederick.

10) Revert byte queue limits support in r8169 driver, it's causing
    regressions we can't figure out.

11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to
    measure packets in flight, properly use tcp_packets_in_flight()
    instead.  From Neal Cardwell.

12) Fix accidental removal of support for bluetooth in CSR based Intel
    wireless cards.  From Marcel Holtmann.

13) We accidently added a behavioral change between native and compat
    tasks, wrt testing the MSG_CMSG_COMPAT bit.  Just ignore it if the
    user happened to set it in a native binary as that was always the
    behavior we had.  From Catalin Marinas.

14) Check genlmsg_unicast() return valud in hwsim netlink tx frame
    handling, from Bob Copeland.

15) Fix stale ->radar_required setting in mac80211 that can prevent
    starting new scans, from Eliad Peller.

16) Fix memory leak in nl80211 monitor, from Johannes Berg.

17) Fix race in TX index handling in xen-netback, from David Vrabel.

18) Don't enable interrupts in amx-xgbe driver until all software et al.
    state is ready for the interrupt handler to run.  From Thomas
    Lendacky.

19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric
    W Biederman.

20) The amount of header space needed in macvtap was not calculated
    properly, fix it otherwise we splat past the beginning of the
    packet.  From Eric Dumazet.

21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin.

22) Don't raw initialize or mod timers, use setup_timer() and
    mod_timer() instead.  From Vaishali Thakkar.

23) Fix software maintained statistics in bcmgenet and systemport
    drivers, from Florian Fainelli.

24) DMA descriptor updates in sh_eth need proper memory barriers, from
    Ben Hutchings.

25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal
    Kubecek.

26) Openvswitch's non-masked set actions aren't constructed properly
    into netlink messages, fix from Joe Stringer.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  openvswitch: Fix serialization of non-masked set actions.
  gianfar: Reduce logging noise seen due to phy polling if link is down
  ibmveth: Add function to enable live MAC address changes
  net: bridge: add compile-time assert for cb struct size
  udp: only allow UFO for packets from SOCK_DGRAM sockets
  sh_eth: Really fix padding of short frames on TX
  Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790"
  sh_eth: Fix RX recovery on R-Car in case of RX ring underrun
  sh_eth: Ensure proper ordering of descriptor active bit write/read
  net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
  net/mlx4_core: Fix wrong mask and error flow for the update-qp command
  net: systemport: fix software maintained statistics
  net: bcmgenet: fix software maintained statistics
  rxrpc: don't multiply with HZ twice
  rxrpc: terminate retrans loop when sending of skb fails
  net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface.
  net: pasemi: Use setup_timer and mod_timer
  net: stmmac: Use setup_timer and mod_timer
  net: 8390: axnet_cs: Use setup_timer and mod_timer
  net: 8390: pcnet_cs: Use setup_timer and mod_timer
  ...

789d7f60

l2tp: Use eth_<foo>_addr instead of memset · 1cea7e2c

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1cea7e2c

wireless: Use eth_<foo>_addr instead of memset · d2beae10

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2beae10

mac80211: Use eth_<foo>_addr instead of memset · c84a67a2

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c84a67a2

ethernet: Use eth_<foo>_addr instead of memset · afc130dd

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

afc130dd

bluetooth: Use eth_<foo>_addr instead of memset · 211b8534

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

211b8534

atm: Use eth_<foo>_addr instead of memset · 19ffa562

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19ffa562

appletalk: Use eth_<foo>_addr instead of memset · 1a73de07

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a73de07

8021q: Use eth_<foo>_addr instead of memset · 423049ab

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

423049ab

xen: Use eth_<foo>_addr instead of memset · 3b6ed26d

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b6ed26d

netconsole: Use eth_<foo>_addr instead of memset · 1667c942

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.

Miscellanea:

Add #include <linux/etherdevice.h>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1667c942

wireless: Use eth_<foo>_addr instead of memset · 93803b33

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.

Miscellanea:

Add #include <linux/etherdevice.h> where appropriate
Use ETH_ALEN instead of 6
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93803b33

net: usb: Use eth_<foo>_addr instead of memset · 519983b1

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

519983b1

ethernet: Use eth_<foo>_addr instead of memset · c7bf7169

Joe Perches authored Mar 02, 2015

Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7bf7169

i40e: Fix dependencies in the i40e driver on configfs · e8ab38cf

Greg Rose authored Mar 03, 2015

Module dependencies are broken in the case where CONFIG_I40E=y and
CONFIG_CONFIGFS_FS=m.  This fixes the broken dependency.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e8ab38cf

ax25: Stop using magic neighbour cache operations. · 1d5da757

Eric W. Biederman authored Mar 03, 2015

Before the ax25 stack calls dev_queue_xmit it always calls
ax25_type_trans which sets skb->protocol to ETH_P_AX25.

Which means that by looking at the protocol type it is possible to
detect IP packets that have not been munged by the ax25 stack in
ndo_start_xmit and call a function to munge them.

Rename ax25_neigh_xmit to ax25_ip_xmit and tweak the return type and
value to be appropriate for an ndo_start_xmit function.

Update all of the ax25 devices to test the protocol type for ETH_P_IP
and return ax25_ip_xmit as the first thing they do.  This preserves
the existing semantics of IP packet processing, but the timing will be
a little different as the IP packets now pass through the qdisc layer
before reaching the ax25 ip packet processing.

Remove the now unnecessary ax25 neighbour table operations.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d5da757

openvswitch: Fix serialization of non-masked set actions. · f4f8e738

Joe Stringer authored Mar 02, 2015

Set actions consist of a regular OVS_KEY_ATTR_* attribute nested inside
of a OVS_ACTION_ATTR_SET action attribute. When converting masked actions
back to regular set actions, the inner attribute length was not changed,
ie, double the length being serialized. This patch fixes the bug.

Fixes: 83d2b9ba ("net: openvswitch: Support masked set actions.")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4f8e738

gianfar: Reduce logging noise seen due to phy polling if link is down · 0ae93b2c

Guenter Roeck authored Mar 02, 2015

Commit 6ce29b0e ("gianfar: Avoid unnecessary reg accesses in adjust_link()")
eliminates unnecessary calls to adjust_link for phy devices which don't support
interrupts and need polling. As part of that work, the 'new_state' local flag,
which was used to reduce logging noise on the console, was eliminated.

Unfortunately, that means that a 'Link is Down' log message will now be
issued continuously if a link is configured as UP, the link state is down,
and the associated phy requires polling. This occurs because priv->oldduplex
is -1 in this case, which always differs from phydev->duplex. In addition,
phydev->speed may also differ from priv->oldspeed. gfar_update_link_state()
is therefore called each time a phy is polled, even if the link state did not
change.

Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ae93b2c

ibmveth: Add function to enable live MAC address changes · c77c761f

Thomas Falcon authored Mar 02, 2015

Add a function that will enable changing the MAC address
of an ibmveth interface while it is still running.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

c77c761f

net/atm/signaling.c: remove WAIT_FOR_DEMON code · bcc90e3f

Fabian Frederick authored Mar 03, 2015

WAIT_FOR_DEMON code is directly undefined at the beginning
of signaling.c since initial git version and thus never compiled.
This also removes buggy current->state direct access.
Suggested-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcc90e3f

net: bridge: add compile-time assert for cb struct size · 71e168b1

Florian Westphal authored Mar 03, 2015

make build fail if structure no longer fits into ->cb storage.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

71e168b1

Linux 4.0-rc2 · 13a7a6ac
Linus Torvalds authored Mar 03, 2015

13a7a6ac

drm/i915: Fix modeset state confusion in the load detect code · 9128b040

Daniel Vetter authored Mar 03, 2015

This is a tricky story of the new atomic state handling and the legacy
code fighting over each another. The bug at hand is an underrun of the
framebuffer reference with subsequent hilarity caused by the load
detect code. Which is peculiar since the the exact same code works
fine as the implementation of the legacy setcrtc ioctl.

Let's look at the ingredients:

- Currently our code is a crazy mix of legacy modeset interfaces to
  set the parameters and half-baked atomic state tracking underneath.
  While this transition is going we're using the transitional plane
  helpers to update the atomic side (drm_plane_helper_disable/update
  and friends), i.e. plane->state->fb. Since the state structure owns
  the fb those functions take care of that themselves.

  The legacy state (specifically crtc->primary->fb) is still managed
  by the old code (and mostly by the drm core), with the fb reference
  counting done by callers (core drm for the ioctl or the i915 load
  detect code). The relevant commit is

  commit ea2c67bb
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Dec 23 10:41:52 2014 -0800

      drm/i915: Move to atomic plane helpers (v9)

- drm_plane_helper_disable has special code to handle multiple calls
  in a row - it checks plane->crtc == NULL and bails out. This is to
  match the proper atomic implementation which needs the crtc to get
  at the implied locking context atomic updates always need. See

  commit acf24a39
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Tue Jul 29 15:33:05 2014 +0200

      drm/plane-helper: transitional atomic plane helpers

- The universal plane code split out the implicit primary plane from
  the CRTC into it's own full-blown drm_plane object. As part of that
  the setcrtc ioctl (which updated both the crtc mode and primary
  plane) learned to set crtc->primary->crtc on modeset to make sure
  the plane->crtc assignments statate up to date in

  commit e13161af
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Apr 1 15:22:38 2014 -0700

      drm: Add drm_crtc_init_with_planes() (v2)

  Unfortunately we've forgotten to update the load detect code. Which
  wasn't a problem since the load detect modeset is temporary and
  always undone before we drop the locks.

- Finally there is a organically grown history (i.e. don't ask) around
  who sets the legacy plane->fb for the various driver entry points.
  Originally updating that was the drivers duty, but for almost all
  places we've moved that (plus updating the refcounts) into the core.
  Again the exception is the load detect code.

Taking all together the following happens:
- The load detect code doesn't set crtc->primary->crtc. This is only
  really an issue on crtcs never before used or when userspace
  explicitly disabled the primary plane.

- The plane helper glue code short-circuits because of that and leaves
  a non-NULL fb behind in plane->state->fb and plane->fb. The state
  fb isn't a real problem (it's properly refcounted on its own), it's
  just the canary.

- Load detect code drops the reference for that fb, but doesn't set
  plane->fb = NULL. This is ok since it's still living in that old
  world where drivers had to clear the pointer but the core/callers
  handled the refcounting.

- On the next modeset the drm core notices plane->fb and takes care of
  refcounting it properly by doing another unref. This drops the
  refcount to zero, leaving state->plane now pointing at freed memory.

- intel_plane_duplicate_state still assume it owns a reference to that
  very state->fb and bad things start to happen.

Fix this all by applying the same duct-tape as for the legacy setcrtc
ioctl code and set crtc->primary->crtc properly.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9128b040

i40e/i40evf: Bump versions · ce458fcf

Sravanthi Tangeda authored Feb 24, 2015

Bump i40e to 1.2.10 and i40evf to 1.2.4

Change-ID: I48aa64df05fcc8356e7026f3a9e69ecf78d0c785
Signed-off-by: Sravanthi Tangeda <sravanthi.tangeda@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ce458fcf

i40e: Only enable TC0 for NIC partition type · fc51de96

Neerav Parikh authored Feb 24, 2015

In case of MFP mode the driver was returning incorrect number of TCs
for partitions that are not enabled for iSCSI. Though the driver does
not configure these TCs in the Tx scheduler for the NIC partitions;
it does use this map to setup the queue mappings.

This patch fixes this and keeps all the NIC partitions to the default
PF TC i.e. TC0.

Change-ID: Iede214c907e7bac1356e999049b9f642759512b3
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fc51de96

i40e: Register DCBNL ops in MFP mode · 15d504b9

Neerav Parikh authored Feb 24, 2015

Allow DCBNL operations in MFP mode to allow query of port DCB settings
via all the PFs and register iSCSI APP on iSCSI enabled PF.

Change-ID: I34cc39b4665d0a631847d4079350d3814f02381e
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

15d504b9

i40evf: ethtool RSS fixes · ed250ecc

Mitch Williams authored Feb 24, 2015

Add an extra check to make sure that the indirection table pointer is
valid before dereferencing it.

Change-ID: I698adbf3daff03081d01f489dc95a9f1ad8b12f1
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ed250ecc

i40e: Fix RSS size at init since default num queue calculation has changed · 66ddcffb

Anjali Singhai Jain authored Feb 24, 2015

With changes to default number of queue pairs that the interface comes up with
from 1 per online CPU to 1 per lan_msix, we need to make sure we recalculate
rss_size. We will now recalculate rss_size based on number of queues enabled in
the VSI.

Without this fix if the max_lan_msix < num_online_cpu we will be coming up
with fewer queues but will be populating rss_size based on num_online_cpus.
This will result in packets getting silently dropped because RSS LUT has queues
that are not enabled.

Change-ID: Ifac8796ce1be1758bb0c34f38dbf4a3a76621e76
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

66ddcffb

i40e: Move RSS table size for VSIs to the VSI struct · 5db4cb59

Anjali Singhai Jain authored Feb 24, 2015

Since all VSIs don't have the same RSS table size,
have one for each VSI instead of having a single define
for RSS table size

Change-ID: Ic2c7c66e4a389d4b6c8841a707510a9735041f02
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

5db4cb59