Commits · 0a6b2a1dc2a2105f178255fe495eb914b09cb37a · Kirill Smelkov / linux

21 Feb, 2018 16 commits

tcp: switch to GSO being always on · 0a6b2a1d

Eric Dumazet authored Feb 19, 2018

Oleksandr Natalenko reported performance issues with BBR without FQ
packet scheduler that were root caused to lack of SG and GSO/TSO on
his configuration.

In this mode, TCP internal pacing has to setup a high resolution timer
for each MSS sent.

We could implement in TCP a strategy similar to the one adopted
in commit fefa569a ("net_sched: sch_fq: account for schedule/timers drifts")
or decide to finally switch TCP stack to a GSO only mode.

This has many benefits :

1) Most TCP developments are done with TSO in mind.
2) Less high-resolution timers needs to be armed for TCP-pacing
3) GSO can benefit of xmit_more hint
4) Receiver GRO is more effective (as if TSO was used for real on sender)
   -> Lower ACK traffic
5) Write queues have less overhead (one skb holds about 64KB of payload)
6) SACK coalescing just works.
7) rtx rb-tree contains less packets, SACK is cheaper.

This patch implements the minimum patch, but we can remove some legacy
code as follow ups.

Tested:

On 40Gbit link, one netperf -t TCP_STREAM

BBR+fq:
sg on:  26 Gbits/sec
sg off: 15.7 Gbits/sec   (was 2.3 Gbit before patch)

BBR+pfifo_fast:
sg on:  24.2 Gbits/sec
sg off: 14.9 Gbits/sec  (was 0.66 Gbit before patch !!! )

BBR+fq_codel:
sg on:  24.4 Gbits/sec
sg off: 15 Gbits/sec  (was 0.66 Gbit before patch !!! )
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a6b2a1d

Merge branch 'ibmvnic-Make-driver-resources-dynamic' · 960103ff

David S. Miller authored Feb 21, 2018

Nathan Fontenot says:

====================
ibmvnic: Make driver resources dynamic

The ibmvnic driver needs to be able to handle the number of tx/rx
sub-crqs changing during a reset of the driver. To do this several
changes need to be made. First the num_active_[tx|rx]_pools
counters need to be re-named to num_active_[tc|rx]_scrqs, and
updated after resource initialization.

With this change we can now release and init the sub crqs and napi
(for rx sub crqs) when the number of sub crqs change.

Lastly, the stats buffer allocation is updated to always allocate
the maximum number of sub-crqs count of stats buffers.

-Nathan
---

Updates for V3:
Patch 3/5 - Make do_h_free parameter a bool

Updates for V2:
Patch 3/5 - Use correct queue count when driver is in probed state
for releasing sub crqs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

960103ff

ibmvnic: Allocate max queues stats buffers · abcae546

Nathan Fontenot authored Feb 19, 2018

To avoid losing any stats when the number of sub-crqs change, allocate
the max number of stats buffers so a stats buffer exists all possible
sub-crqs.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abcae546

ibmvnic: Make napi usage dynamic · 86f669b2

Nathan Fontenot authored Feb 19, 2018

In order to handle the number of rx sub crqs changing during a driver
reset, the ibmvnic driver also needs to update the number of napi.
To do this the code to init and free napi's is moved to their own
routines so they can be called during the reset process.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86f669b2

ibmvnic: Free and re-allocate scrqs when tx/rx scrqs change · d7c0ef36

Nathan Fontenot authored Feb 19, 2018

When the driver resets it is possible that the number of tx/rx
sub-crqs can change. This patch handles this so that the driver does
not try to access non-existent sub-crqs.

The count for releasing sub crqs depends on the adapter state. The
active queue count is not set in probe, so if we are relasing in probe
state we use the request queue count.

Additionally, a parameter is added to release_sub_crqs() so that
we know if the h_call to free the sub-crq needs to be made. In
the reset path we have to do a reset of the main crq, which is
a free followed by a register of the main crq. The free of main
crq results in all of the sub crq's being free'ed. When updating
sub-crq count in the reset path we do not want to h_free the
sub-crqs, they are already free'ed.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7c0ef36

ibmvnic: Move active sub-crq count settings · d9043c10

Nathan Fontenot authored Feb 19, 2018

Inpreparation for using the active scrq count to track more active
resources, move the setting of the active count to after initialization
occurs in initial driver init and during driver reset.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9043c10

ibmvnic: Rename active queue count variables · 8862541d

Nathan Fontenot authored Feb 19, 2018

Rename the tx/rx active pool variables to be tx/rx active scrq
counts. The tx/rx pools are per sub-crq so this is a more appropriate
name. This also is a preparatory step for using thiese variables
for handling updates to sub-crqs and napi based on the active
count.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8862541d

rds: send: mark expected switch fall-through in rds_rm_size · f9053113

Gustavo A. R. Silva authored Feb 19, 2018

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1465362 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9053113

Merge branch '8390-cleanups' · aa00d40b

David S. Miller authored Feb 21, 2018

Finn Thain says:

====================
Fixes, cleanup and modernization for 8390 ethernet drivers

Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Added acked-by tags from Geert Uytterhoeven.
- Omitted patches unrelated to 8390 drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

aa00d40b

net/mac8390: Fix log messages · 4a1b27c9

Finn Thain authored Feb 18, 2018

Use dev_foo() to log the slot number instead of the unexpanded "eth%d"
format string.
Disambiguate the two identical "Card type %s is unsupported" messages.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a1b27c9

net/mac8390: Convert to nubus_driver · 494a973e

Finn Thain authored Feb 18, 2018

This resolves an old bug that constrained this driver to no more than
one card.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

494a973e

net/8390: Fix msg_enable patch snafu · 646fe03b

Finn Thain authored Feb 18, 2018

The lib8390 module parameter 'msg_enable' doesn't do anything useful:
it causes an ancient version string to be logged.

Remove redundant code that logs the same string.

In ne.c and wd.c, the value of ei_local->msg_enable is used before
being assigned. Use ne_msg_enable and wd_msg_enable, respectively.

Most of the other 8390 drivers never assign ei_local->msg_enable.
Use the 'msg_enable' module parameter from lib8390 as the default
value.

Eliminate the pointless static and local variables.

Clean up an indentation mistake.

All of these issues originated from the same patch.

Cc: Russell King <linux@armlinux.org.uk>
Fixes: c45f812f ("8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

646fe03b

net/8390: Remove redundant make dependencies · 73219de2

Finn Thain authored Feb 18, 2018

The hydra, zorro8390 and mcf8390 drivers all #include "lib8390.c" and
have no need for 8390.o. modinfo confirms no dependency on 8390.ko.
Drop the redundant dependency from the Makefile. objdump confirms
that this patch has no effect on the module binaries.

The superfluous additions of 8390.o were introduced in
commit 644570b8 ("8390: Move the 8390 related drivers").

Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

73219de2

r8169: remove not needed PHY soft reset in rtl8168e_2_hw_phy_config · 4be83e5a

Heiner Kallweit authored Feb 20, 2018

rtl8169_init_phy() resets the PHY anyway after applying the chip-specific
PHY configuration. So we don't need to soft-reset the PHY as part of the
chip-specific configuration.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4be83e5a

net: sched: add em_ipt ematch for calling xtables matches · ccc007e4

Eyal Birger authored Feb 15, 2018

The commit a new tc ematch for using netfilter xtable matches.

This allows early classification as well as mirroning/redirecting traffic
based on logic implemented in netfilter extensions.

Current supported use case is classification based on the incoming IPSec
state used during decpsulation using the 'policy' iptables extension
(xt_policy).

The module dynamically fetches the netfilter match module and calls
it using a fake xt_action_param structure based on validated userspace
provided parameters.

As the xt_policy match does not access skb->data, no skb modifications
are needed on match.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ccc007e4

r8169: remove some WOL-related dead code · 022ddbca

Heiner Kallweit authored Feb 20, 2018

Commit bde135a6 "r8169: only enable PCI wakeups when WOL is active"
removed the only user of flag RTL_FEATURE_WOL. So let's remove some
now dead code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

022ddbca

20 Feb, 2018 20 commits

Merge branch 'stmmac-multi-queue-fixes-and-cleanups' · 5eeba397

David S. Miller authored Feb 20, 2018

Niklas Cassel says:

====================
stmmac multi-queue fixes and cleanups
====================
Reviewed-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5eeba397

net: stmmac: honor error code from stmmac_dt_phy() · ce339abc

Niklas Cassel authored Feb 19, 2018

Honor error code from stmmac_dt_phy() instead of always
returning -ENODEV.

No functional change intended.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce339abc

net: stmmac: add error handling in stmmac_mtl_setup() · 2ee2132f

Niklas Cassel authored Feb 19, 2018

The device tree binding for stmmac says:

- Multiple TX Queues parameters: below the list of all the parameters to
                                 configure the multiple TX queues:
        - snps,tx-queues-to-use: number of TX queues to be used in the driver
	[...]
        - For each TX queue
		[...]

However, if one specifies snps,tx-queues-to-use = 2,
but omits the queue subnodes, or defines just one queue subnode,
since the driver appears to initialize queues with sane default
values, we will get tx queue timeouts.

This is because the initialization code only initializes
as many queues as it finds subnodes. Potentially leaving
some queues uninitialized.

To avoid hard to debug issues, return an error if the number
of subnodes differ from snps,tx-queues-to-use/snps,rx-queues-to-use.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ee2132f

net: stmmac: call correct function in stmmac_mac_config_rx_queues_routing() · 13138de0

Niklas Cassel authored Feb 19, 2018

stmmac_mac_config_rx_queues_routing() incorrectly calls rx_queue_prio()
instead of rx_queue_routing().

This looks like a copy paste issue, since
stmmac_mac_config_rx_queues_prio() already calls rx_queue_prio(),
and both stmmac_mac_config_rx_queues_routing() and
stmmac_mac_config_rx_queues_prio() are very similar in structure.

Fixes: abe80fdc ("net: stmmac: RX queue routing configuration")
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13138de0

net: stmmac: rename dwmac4_tx_queue_routing() to match reality · e5a01992

Niklas Cassel authored Feb 19, 2018

Looking at dwmac4_tx_queue_routing(), it is obvious that it
sets up rx queue routing.

Rename dwmac4_tx_queue_routing() to dwmac4_rx_queue_routing()
to better match reality.

Fixes: abe80fdc ("net: stmmac: RX queue routing configuration")
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5a01992

net: stmmac: WARN if tx_skbuff entries are reused before cleared · b4c9784c

Niklas Cassel authored Feb 19, 2018

The current code assumes that a tx_skbuff entry has been cleared
by stmmac_tx_clean() before stmmac_xmit()/stmmac_tso_xmit()
assigns a new skb to that entry. However, since we never check
the current value before overwriting it, it is theoretically
possible that a non-NULL value is overwritten.

Add WARN_ONs to verify that each entry in tx_skbuff is NULL
before it is assigned a new value.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4c9784c

net: stmmac: do not clear tx_skbuff entries in stmmac_xmit()/stmmac_tso_xmit() · f66b533d

Niklas Cassel authored Feb 19, 2018

tx_skbuff is initialized to NULL in init_dma_tx_desc_rings(), which is
called from ndo_open().

stmmac_tx_clean() frees any non-NULL skb, and sets the tx_skbuff
entry to NULL. Hence, there is no need to set skbuff entries to NULL
in stmmac_xmit()/stmmac_tso_xmit(), and doing so falsely gives the
reader the impression that it is needed.
Do not clear tx_skbuff entries in stmmac_xmit()/stmmac_tso_xmit().
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f66b533d

net: stmmac: set MSS for each tx DMA channel · 8d212a9e

Niklas Cassel authored Feb 19, 2018

The DMA engine in dwmac4 can segment a large TSO packet to several
smaller packets of (max) size Maximum Segment Size (MSS).

The DMA engine fetches and saves the MSS via a context descriptor.

This context decriptor has to be provided to each tx DMA channel.
To ensure that this is done, move struct member mss from stmmac_priv
to stmmac_tx_queue.

stmmac_reset_queues_param() now also resets mss, together with other
queue parameters, so reset of mss value can be removed from
stmmac_resume().

init_dma_tx_desc_rings() now also resets mss, together with other
queue parameters, so reset of mss value can be removed from
stmmac_open().

This fixes tx queue timeouts for dwmac4, with DT property
snps,tx-queues-to-use > 1, when running iperf3 with multiple threads.

Fixes: ce736788 ("net: stmmac: adding multiple buffers for TX")
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d212a9e

x25: use %*ph to print small buffer · 3fef2b62

Antonio Cardace authored Feb 19, 2018

Use %*ph format to print small buffer as hex string.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Antonio Cardace <anto.cardace@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fef2b62

Merge branch 'net-Expose-KVD-linear-parts-as-resources' · a38bbe7d

David S. Miller authored Feb 20, 2018

Jiri Pirko says:

====================
net: Expose KVD linear parts as resources

Arkadi says:

Expose the KVD linear partitions via the devlink resource interface. This
will give the user the ability to control the linear memory division.

---
v1->v2:
- patch1:
 - fixed u64 division error reported by kbuildbot
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a38bbe7d

mlxsw: spectrum_kvdl: Add support for per part occupancy · 7f47b19b

Arkadi Sharshevsky authored Feb 20, 2018

Add support for calculating occupancy for separate kvdl parts.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f47b19b

mlxsw: spectrum_kvdl: Add support for dynamic partition set · 887839e6

Arkadi Sharshevsky authored Feb 20, 2018

Add support for dynamic partition set via the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

887839e6

mlxsw: spectrum_kvdl: Add support for linear division resources · 51d3c08e

Arkadi Sharshevsky authored Feb 20, 2018

The linear part of the KVD memory is sub-divided into multiple parts. This
patch exposes this internal partitions via the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51d3c08e

devlink: Perform cleanup of resource_set cb · 4f4bbf7c

Arkadi Sharshevsky authored Feb 20, 2018

After adding size validation logic into core cleanup is required.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f4bbf7c

devlink: Move size validation to core · cc944ead

Arkadi Sharshevsky authored Feb 20, 2018

Currently the size validation is done via a cb, which is unneeded. The
size validation can be moved to core. The next patch will perform cleanup.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc944ead

Merge branch 'net-Get-rid-of-net_mutex-and-simplify-cleanup_list-queueing' · b99fe0e2

David S. Miller authored Feb 20, 2018

Kirill Tkhai says:

====================
net: Get rid of net_mutex and simplify cleanup_list queueing

[1/3] kills net_mutex and makes net_sem be taken for write instead.
      This is made to take less locks (1 instead of 2) for the time
      before all pernet_operations are converted.

[2-3/3] simplifies dead net cleanup queueing, and makes llist api
        be used for that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b99fe0e2

net: Queue net_cleanup_work only if there is first net added · 8349efd9

Kirill Tkhai authored Feb 19, 2018

When llist_add() returns false, cleanup_net() hasn't made its
llist_del_all(), while the work has already been scheduled
by the first queuer. So, we may skip queue_work() in this case.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8349efd9

net: Make cleanup_list and net::cleanup_list of llist type · 65b7b5b9

Kirill Tkhai authored Feb 19, 2018

This simplifies cleanup queueing and makes cleanup lists
to use llist primitives. Since llist has its own cmpxchg()
ordering, cleanup_list_lock is not more need.

Also, struct llist_node is smaller, than struct list_head,
so we save some bytes in struct net with this patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65b7b5b9

net: Kill net_mutex · 19efbd93

Kirill Tkhai authored Feb 19, 2018

We take net_mutex, when there are !async pernet_operations
registered, and read locking of net_sem is not enough. But
we may get rid of taking the mutex, and just change the logic
to write lock net_sem in such cases. This obviously reduces
the number of lock operations, we do.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19efbd93

ibmvnic: Keep track of supplementary TX descriptors · ffc385b9

Thomas Falcon authored Feb 18, 2018

Supplementary TX descriptors were not being accounted for, which
was resulting in an overflow of the hardware device's transmit
queue. Keep track of those descriptors now when determining
how many entries remain on the TX queue.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffc385b9

19 Feb, 2018 4 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · f5c0c6f4
David S. Miller authored Feb 19, 2018

f5c0c6f4

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 79c0ef3e

Linus Torvalds authored Feb 19, 2018

Pull networking fixes from David Miller:

 1) Prevent index integer overflow in ptr_ring, from Jason Wang.

 2) Program mvpp2 multicast filter properly, from Mikulas Patocka.

 3) The bridge brport attribute file is write only and doesn't have a
    ->show() method, don't blindly invoke it. From Xin Long.

 4) Inverted mask used in genphy_setup_forced(), from Ingo van Lil.

 5) Fix multiple definition issue with if_ether.h UAPI header, from
    Hauke Mehrtens.

 6) Fix GFP_KERNEL usage in atomic in RDS protocol code, from Sowmini
    Varadhan.

 7) Revert XDP redirect support from thunderx driver, it is not
    implemented properly. From Jesper Dangaard Brouer.

 8) Fix missing RTNL protection across some tipc operations, from Ying
    Xue.

 9) Return the correct IV bytes in the TLS getsockopt code, from Boris
    Pismenny.

10) Take tclassid into consideration properly when doing FIB rule
    matching. From Stefano Brivio.

11) cxgb4 device needs more PCI VPD quirks, from Casey Leedom.

12) TUN driver doesn't align frags properly, and we can end up doing
    unaligned atomics on misaligned metadata. From Eric Dumazet.

13) Fix various crashes found using DEBUG_PREEMPT in rmnet driver, from
    Subash Abhinov Kasiviswanathan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  tg3: APE heartbeat changes
  mlxsw: spectrum_router: Do not unconditionally clear route offload indication
  net: qualcomm: rmnet: Fix possible null dereference in command processing
  net: qualcomm: rmnet: Fix warning seen with 64 bit stats
  net: qualcomm: rmnet: Fix crash on real dev unregistration
  sctp: remove the left unnecessary check for chunk in sctp_renege_events
  rxrpc: Work around usercopy check
  tun: fix tun_napi_alloc_frags() frag allocator
  udplite: fix partial checksum initialization
  skbuff: Fix comment mis-spelling.
  dn_getsockoptdecnet: move nf_{get/set}sockopt outside sock lock
  PCI/cxgb4: Extend T3 PCI quirk to T4+ devices
  cxgb4: fix trailing zero in CIM LA dump
  cxgb4: free up resources of pf 0-3
  fib_semantics: Don't match route with mismatching tclassid
  NFC: llcp: Limit size of SDP URI
  tls: getsockopt return record sequence number
  tls: reset the crypto info if copy_from_user fails
  tls: retrun the correct IV in getsockopt
  docs: segmentation-offloads.txt: add SCTP info
  ...

79c0ef3e

tipc: don't call sock_release() in atomic context · 26736a08

Paolo Abeni authored Feb 19, 2018

syzbot reported a scheduling while atomic issue at netns
destruction time:

BUG: sleeping function called from invalid context at net/core/sock.c:2769
in_atomic(): 1, irqs_disabled(): 0, pid: 85, name: kworker/u4:3
5 locks held by kworker/u4:3/85:
  #0:  ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c9792deb>]
process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
  #1:  (net_cleanup_work){+.+.}, at: [<00000000adc12e2a>]
process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
  #2:  (net_sem){++++}, at: [<000000009ccb5669>] cleanup_net+0x23f/0xd20
net/core/net_namespace.c:494
  #3:  (net_mutex){+.+.}, at: [<00000000a92767d9>] cleanup_net+0xa7d/0xd20
net/core/net_namespace.c:496
  #4:  (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>]
spin_lock_bh include/linux/spinlock.h:315 [inline]
  #4:  (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>]
tipc_topsrv_stop+0x231/0x610 net/tipc/topsrv.c:685
CPU: 0 PID: 85 Comm: kworker/u4:3 Not tainted 4.16.0-rc1+ #230
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6128
  __might_sleep+0x95/0x190 kernel/sched/core.c:6081
  lock_sock_nested+0x37/0x110 net/core/sock.c:2769
  lock_sock include/net/sock.h:1463 [inline]
  tipc_release+0x103/0xff0 net/tipc/socket.c:572
  sock_release+0x8d/0x1e0 net/socket.c:594
  tipc_topsrv_stop+0x3c0/0x610 net/tipc/topsrv.c:696
  tipc_exit_net+0x15/0x40 net/tipc/core.c:96
  ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:148
  cleanup_net+0x6ba/0xd20 net/core/net_namespace.c:529
  process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113
  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429

This is caused by tipc_topsrv_stop() releasing the listener socket
with the idr lock held. This changeset addresses the issue moving
the release operation outside such lock.

Reported-and-tested-by: syzbot+749d9d87c294c00ca856@syzkaller.appspotmail.com
Fixes: 0ef897be ("tipc: separate topology server listener socket from subcsriber sockets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by:  ///jon
Signed-off-by: David S. Miller <davem@davemloft.net>

26736a08

tipc: fix bug on error path in tipc_topsrv_kern_subscr() · 96c252bf

Jon Maloy authored Feb 19, 2018

In commit cc1ea9ffadf7 ("tipc: eliminate struct tipc_subscriber") we
re-introduced an old bug on the error path in the function
tipc_topsrv_kern_subscr(). We now re-introduce the correction too.

Reported-by: syzbot+f62e0f2a0ef578703946@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96c252bf