- 25 Feb, 2019 40 commits
-
Jacob Keller authored
When adding multiple VLANs to the same VSI, the ice_add_vlan code will share the VSI list, so as not to create multiple unnecessary VSI lists. Consider the following flow: ice_add_vlan(hw, <VSI 0 VID 7, VSI 0 VID 8, VSI 0 VID 9>), where we add three VLAN filters for VIDs 7, 8, and 9, all for VSI 0. ice_add_vlan will create a single vsi_list and share it among all the filters. Later, if we try to remove a VLAN, ice_remove_vlan(hw, <VSI 0 VID 7>), the removal code will update the vsi_list and remove VSI 0 from it. But, since the vsi_list is shared, this breaks the list for the other users who reference it. We even free the VSI list memory, which may result in segmentation faults. This is due to the way VLAN rules share VSI lists with reference counts, and is caused by calling ice_rem_update_vsi_list even when the ref_cnt is greater than one. To fix this, handle the case where ref_cnt is greater than one separately. In this case, we need to remove the associated rule without modifying the vsi_list, since it is currently being referenced by another rule; instead, we just decrement the VSI list ref_cnt. The case of sharing VSI lists among multiple VSIs is not currently supported by this code. No such rules are created today, and this code will require changes if/when such rules are added. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
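A minimal sketch of the handling described above; the field and function names (ref_cnt, vsi_list_info, ice_rem_update_vsi_list) follow the commit text, but the surrounding code is illustrative rather than the exact driver source:

    /* Removing a VLAN rule whose VSI list is shared between several rules:
     * leave the list untouched and only drop this rule's reference.  Only
     * when we are the last user is it safe to edit or free the VSI list. */
    if (fm_entry->vsi_list_info->ref_cnt > 1) {
            fm_entry->vsi_list_info->ref_cnt--;   /* shared: just drop our ref */
            remove_rule = true;                   /* still delete this rule    */
    } else {
            /* sole user: updating/freeing the VSI list is now safe */
            status = ice_rem_update_vsi_list(hw, vsi_handle, fm_entry);
    }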
-
Bruce Allan authored
struct ice_vsi_ctx has gotten large enough that function-local declarations of it on the stack are causing stack hogs. Fix that by allocating the structs on the heap. Clean up some formatting issues in the code around these changes and fix incorrect data types used for function return values in a couple of places. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Bruce Allan authored
With sizeof(), it is preferable to pass the variable of type <type> rather than the type itself. There are multiple places where a temporary variable is used to hold a 'size' value which is then used for a subsequent alloc/memset. Get rid of the temporary variable by calculating the size as part of the alloc/memset statement. Also remove an unnecessary type-cast. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
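A small illustration of the preference (generic kernel-style allocation; the struct name is made up):

    struct foo *f;

    /* avoided: repeats the type name and can silently go stale */
    f = kzalloc(sizeof(struct foo), GFP_KERNEL);

    /* preferred: sized from the variable itself */
    f = kzalloc(sizeof(*f), GFP_KERNEL);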
-
Victor Raj authored
VSI supported nodes are calculated in order to add the VSI parent or intermediate nodes to the scheduler tree. If one of the nodes in the layers below (from the VSI layer) has space to add the new VSI or intermediate node above that layer, then it is not required to continue the calculation for the lower layers. Signed-off-by: Victor Raj <victor.raj@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Maciej Fijalkowski authored
Currently ICE_MAX_MTU subtracts only ETH_HLEN from the max frame size and adds ETH_FCS_LEN and VLAN_HLEN back, which is not what was intended. The ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN expression should be surrounded with parentheses. Wrap the mentioned expression and take VLAN double tagging into account. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
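The precedence problem can be seen with simplified macros (the max frame size value below is illustrative, not necessarily the driver's actual constant):

    #define ETH_HLEN        14
    #define ETH_FCS_LEN     4
    #define VLAN_HLEN       4
    #define MAX_FRAME_SIZE  9728    /* illustrative max frame size */

    /* broken: only ETH_HLEN is subtracted, FCS and VLAN lengths are added back */
    #define MAX_MTU_BROKEN  (MAX_FRAME_SIZE - ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN)

    /* fixed: parenthesize the overhead and account for double VLAN tagging */
    #define MAX_MTU_FIXED   (MAX_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN + 2 * VLAN_HLEN))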
-
Bruce Allan authored
Commit 87b0984e ("net: Add extack argument to ndo_fdb_add()") in net-next added an extended parameter to the .ndo_fdb_add op and changed ice_fdb_add() accordingly. Update the function header and add the __always_unused attribute to the new parameter to avoid -Wunused-parameter warnings. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Peter Oskolkov authored
dst_output() frees the skb when it fails (see, for example, ip_finish_output2), so the caller must not free it again in this case. Fixes: 3bd0b152 ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c") Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
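A sketch of the call-site pattern this implies (not necessarily the exact lwt_bpf.c code): once dst_output() has been called, the error path must not touch the skb again.

    err = dst_output(dev_net(skb_dst(skb)->dev), skb->sk, skb);
    if (unlikely(err))
            return err;     /* skb already consumed by dst_output() on failure */

    return LWTUNNEL_XMIT_DONE;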
-
wenxu authored
With "ip l add dev tun type gretap key 1000", a non-tunnel-dst ip tunnel device can send packets through lwtunnel. This patch provides tun_info dst cache support for this mode. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Andrew Lunn says: ==================== mv88e6xxx: Avoid false positive Lockdep splats When acquiring the GPIO interrupt line for the switch, it is possible to trigger lockdep splats. These are false positives; the mutex is in a different IRQ descriptor. But fix it anyway, since it could mask real locking issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
There is no need to hold the register lock while requesting the GPIO interrupt. By not holding it we can also avoid a false positive lockdep splat. Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The following false positive lockdep splat has been observed.

    ======================================================
    WARNING: possible circular locking dependency detected
    4.20.0+ #302 Not tainted
    ------------------------------------------------------
    systemd-udevd/160 is trying to acquire lock:
    edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704

    but task is already holding lock:
    edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&desc->request_mutex){+.+.}:
           mutex_lock_nested+0x1c/0x24
           __setup_irq+0xa0/0x704
           request_threaded_irq+0xd0/0x150
           mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
           mdio_probe+0x2c/0x54
           really_probe+0x200/0x2c4
           driver_probe_device+0x5c/0x174
           __driver_attach+0xd8/0xdc
           bus_for_each_dev+0x58/0x7c
           bus_add_driver+0xe4/0x1f0
           driver_register+0x7c/0x110
           mdio_driver_register+0x24/0x58
           do_one_initcall+0x74/0x2e8
           do_init_module+0x60/0x1d0
           load_module+0x1968/0x1ff4
           sys_finit_module+0x8c/0x98
           ret_fast_syscall+0x0/0x28
           0xbedf2ae8

    -> #0 (&chip->reg_lock){+.+.}:
           __mutex_lock+0x50/0x8b8
           mutex_lock_nested+0x1c/0x24
           __setup_irq+0x640/0x704
           request_threaded_irq+0xd0/0x150
           mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
           mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
           mdio_probe+0x2c/0x54
           really_probe+0x200/0x2c4
           driver_probe_device+0x5c/0x174
           __driver_attach+0xd8/0xdc
           bus_for_each_dev+0x58/0x7c
           bus_add_driver+0xe4/0x1f0
           driver_register+0x7c/0x110
           mdio_driver_register+0x24/0x58
           do_one_initcall+0x74/0x2e8
           do_init_module+0x60/0x1d0
           load_module+0x1968/0x1ff4
           sys_finit_module+0x8c/0x98
           ret_fast_syscall+0x0/0x28
           0xbedf2ae8

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&desc->request_mutex);
                                   lock(&chip->reg_lock);
                                   lock(&desc->request_mutex);
      lock(&chip->reg_lock);

The two &desc->request_mutex instances refer to two different mutexes: #1 is the GPIO interrupt for the chip, #2 is the chained interrupt between global 1 and global 2. Add lockdep classes to the GPIO interrupt to avoid this. Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
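A sketch of the approach (assuming the standard genirq helper irq_set_lockdep_class(); the key names are illustrative): give the GPIO interrupt its own lockdep classes so lockdep can distinguish its request_mutex from the one used by the chained G1/G2 interrupt.

    static struct lock_class_key mv88e6xxx_lock_key;
    static struct lock_class_key mv88e6xxx_request_key;

    /* after obtaining the chip's GPIO interrupt line */
    irq_set_lockdep_class(chip->irq, &mv88e6xxx_lock_key,
                          &mv88e6xxx_request_key);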
-
wenxu authored
The lwtunnel_state does not initialize the dst_cache, which means ip_md_tunnel_xmit cannot use the dst_cache and has to look up the route table for every packet. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vakul Garg authored
The patch enables returning 'type' in msghdr for records that are retrieved with MSG_PEEK in recvmsg. Further, it prevents records peeked from the socket from getting clubbed with any other record of a different type when records are subsequently dequeued from strparser. For each record, we now retain its type in the sk_buff's control buffer cb[]. Inside the control buffer, the record's full length and offset are already stored by strparser in 'struct strp_msg'. We store the record type after 'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is stored just after record dequeue. For tls1.3, the type is stored after the record has been decrypted. Inside process_rx_list(), before processing a non-data record, we check that we must be able to return the record type back to the user application. If not, the decrypted records in the tls context's rx_list are left there without consuming any data. Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
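A simplified sketch of the cb[] layout described above (field names approximate the kernel's strparser and TLS structures, not copied verbatim):

    struct strp_msg {                 /* stored by strparser at the start of skb->cb[] */
            int full_len;             /* full record length */
            int offset;               /* payload offset within the skb */
    };

    struct tls_msg {
            struct strp_msg rxm;      /* strparser metadata comes first */
            u8 control;               /* TLS record type, saved per record */
    };

    /* accessor, assuming the record's metadata lives in skb->cb[] */
    static inline struct tls_msg *tls_msg(struct sk_buff *skb)
    {
            return (struct tls_msg *)skb->cb;
    }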
-
David S. Miller authored
Kefeng Wang says: ==================== ipv4/v6: icmp: small cleanup and update v2: - Add cover letter and use proper patch subject-prefix, suggested by Eric Dumazet This patch series contains some small cleanups and updates: 1) use icmp/v6_sk_exit when icmp_sk_init fails, instead of open-coding it 2) use the new percpu allocation interface for ipv6.icmp_sk ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kefeng Wang authored
Use percpu allocation for the ipv6.icmp_sk. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
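A sketch of what the switch to the percpu interface looks like (simplified; exact field types, placement and error handling are approximations of the icmpv6 init path):

    struct sock *sk;
    int i, err;

    net->ipv6.icmp_sk = alloc_percpu(struct sock *);
    if (!net->ipv6.icmp_sk)
            return -ENOMEM;

    for_each_possible_cpu(i) {
            err = inet_ctl_sock_create(&sk, PF_INET6, SOCK_RAW,
                                       IPPROTO_ICMPV6, net);
            if (err < 0)
                    goto fail;                        /* icmpv6_sk_exit() cleans up */
            *per_cpu_ptr(net->ipv6.icmp_sk, i) = sk;
    }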
-
Kefeng Wang authored
Simply use icmpv6_sk_exit() when inet_ctl_sock_create() fails in icmpv6_sk_init(). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kefeng Wang authored
Simply use icmp_sk_exit() when inet_ctl_sock_create() fails in icmp_sk_init(). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch fixes an uninitialised return value error in ila_xlat_nl_cmd_flush. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6c4128f6 ("rhashtable: Remove obsolete...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
wenxu authored
The dst_cache in the metadata_dst is not initialized, which means ip_md_tunnel_xmit cannot use the dst_cache and has to look up the route table for every packet. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
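A sketch of the kind of initialization this implies (placement and surrounding code are assumptions, not the exact patch): the dst_cache embedded in the tunnel metadata has to be set up before ip_md_tunnel_xmit() can reuse cached routes.

    err = dst_cache_init(&md_dst->u.tun_info.dst_cache, GFP_KERNEL);
    if (err) {
            metadata_dst_free(md_dst);
            return NULL;
    }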
-
David S. Miller authored
Huazhong Tan says: ==================== code optimizations & bugfixes for HNS3 driver This patchset includes bugfixes and code optimizations for the HNS3 ethernet controller driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
If hns3_client_start() fails in hns3_client_init(), register_dev() should be undone in its error handling. Fixes: a6d818e3 ("net: hns3: Add vport alive state checking support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shiju Jose authored
Presently the hns reset_type for roce errors is set in the hclge_log_and_clear_rocee_ras_error function. This function is also called to detect and clear roce errors while enabling the rdma error interrupts; however, no hns reset is requested in that case. This can cause the wrong reset_type to be used for a subsequent hns reset, because the reset_type set in the above case was not cleared. This patch moves the setting of the hns reset_type for roce errors from hclge_log_and_clear_rocee_ras_error to hclge_handle_rocee_ras_error. Fixes: 630ba007 ("net: hns3: add handling of RDMA RAS errors") Reported-by: Huazhong Tan <tanhuazhong@huawei.com> Reported-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
For revision 0x20, the VF shares the same RSS configuration with the PF. In the original code, it always returns 0 when querying the RSS hash key for the VF. This patch fixes it by returning the hash key obtained from the PF. Fixes: 374ad291 ("net: hns3: net: hns3: Add RSS general configuration support for VF") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
For revision 0x21, the VF VLAN filter switch is per function, so it is necessary to enable the VF VLAN filter for each VF when initializing. Otherwise, the VF will be able to receive broadcast packets with an unknown VLAN when the PF enters promisc mode. Fixes: 64d114f0 ("net: hns3: Add egress/ingress vlan filter for revision 0x21") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peng Li authored
This patch adds support for configuring the depth of the TX and RX rings separately via the ethtool "-G" command. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
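For example (the interface name is illustrative), "ethtool -G eth0 rx 1024 tx 512" would then size the RX ring to 1024 descriptors and the TX ring to 512 independently.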
-
Yunsheng Lin authored
hnae3_get_bit uses hnae3_get_field, and hnae3_get_field masks the data, which is unnecessary in the data path. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yunsheng Lin authored
hnae3_set_bit and hnae3_set_field mask the data before setting the field or bit, which is unnecessary because the data is already zero-initialized. Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
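Simplified macros in the spirit of the driver's helpers (not the exact hnae3.h definitions) showing the redundant masking step:

    /* masked variant: clears the field before OR-ing in the value */
    #define SET_FIELD_MASKED(origin, mask, shift, val)                \
            do {                                                      \
                    (origin) &= ~(mask);                              \
                    (origin) |= ((val) << (shift)) & (mask);          \
            } while (0)

    /* when 'origin' is known to start out zeroed, the clear is redundant */
    #define SET_FIELD(origin, mask, shift, val)                       \
            ((origin) |= ((val) << (shift)) & (mask))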
-
Yunsheng Lin authored
This patch adds unlikely hints for error handling in the critical data path. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
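A standalone illustration of the hint (userspace-compilable; the kernel's unlikely() is the same __builtin_expect wrapper, and the function here is made up):

    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Hot-path check: the error branch is marked unlikely so the compiler
     * keeps the common (success) case on the straight-line path. */
    static int reserve_descs(int free_descs, int needed)
    {
            if (unlikely(free_descs < needed))
                    return -1;              /* rare error path */
            return 0;                       /* common case */
    }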
-
Yunsheng Lin authored
The fill_desc ops has only one implementation, and get_rxd_bnum has not been used, so this patch removes them. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yunsheng Lin authored
This patch limits the scope of some variables in hns3_fill_desc as much as possible. Also, l3_type and l4_type are now only set when necessary. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yunsheng Lin authored
This patch uses a shift offset to avoid multiply and divide operations. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
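The general idea, as a standalone sketch (names and sizes are illustrative, not the driver's code): when a ring size is a power of two, divide and modulo in the hot path can be replaced with a shift and a mask computed once.

    #include <strings.h>

    /* Map a flat buffer index onto (ring, entry) without divide/modulo,
     * assuming desc_num is a power of two. */
    static void split_index(unsigned int index, unsigned int desc_num,
                            unsigned int *ring, unsigned int *entry)
    {
            unsigned int shift = ffs(desc_num) - 1;   /* log2(desc_num) */

            *ring  = index >> shift;                  /* was: index / desc_num */
            *entry = index & (desc_num - 1);          /* was: index % desc_num */
    }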
-
Yunsheng Lin authored
This patch adds XPS setting support to the hns3 driver based on the interrupt affinity info. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ido Schimmel says: ==================== mlxsw: spectrum_acl: Don't take rtnl mutex for region rehash Jiri says: During region rehash, a new region is created with a more optimized set of masks (ERPs). When transitioning to the new region, all the rules from the old region are copied one-by-one to the new region. This transition can be time consuming and currently done under RTNL lock. In order to remove RTNL lock dependency during region rehash, introduce multiple smaller locks guarding dedicated structures or parts of them. That is the vast majority of this patchset. Only patch #1 is simple cleanup and patches 12-15 are improving or introducing new selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Do insertions and removal of filters during rehash in higher volumes. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Add checking of newly added trace. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Hit the new tracepoint once the vregion migration ends. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Track the basic codepaths of delta rehash handling, using mlxsw tracepoints. Use IPv6 addresses. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Other mutexes now take care of proper locking for this, so it is no longer necessary to take the RTNL mutex here. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
No longer require RTNL lock in this code. Newly introduced mutexes take care of guarding objagg and bloom filter. There is no need to guard gen_pool_alloc()/gen_pool_free() as they are fine to be called lockless. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Relax dependency on rtnl mutex during vregion_rehash_intrvl_set(). The vregion list is protected with newly introduced mutex. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-