1. 20 Jul, 2021 10 commits
    • Merge branch 'octeon-DMAC' · 6f91d7ab
      David S. Miller authored
      Subbaraya Sundeep says:
      
      ====================
      octeontx2-af: Introduce DMAC based switching
      
      With this patch set packets can be switched between
      all CGX mapped PFs and VFs in the system based on
      the DMAC addresses. To implement this:
      The AF allocates high priority rules starting from the top entry (0) of the MCAM.
      Rules are allocated for all the CGX mapped PFs and VFs even though
      they are not yet active and have no NIXLFs attached.
      Rules for a PF/VF are enabled only after it is brought up.
      Two rules, one for TX and one for RX, are allocated for each PF/VF.
      
      A packet sent from a PF/VF with the destination MAC of another
      PF/VF hits the TX rule and is sent to LBK channel 63. The
      same packet, once looped back, hits the RX rule, whose action is
      to forward the packet to the PF/VF with that destination MAC.
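
The TX/RX rule pair described above can be pictured with a small user-space model. This is only a sketch: the class `SwitchModel`, its method names, the MAC addresses, and the "wire" fallback for packets that hit no rule are assumptions made for illustration, and nothing here reflects the real MCAM entry layout.

```python
LBK_CHANNEL = 63  # loopback channel named in the description above


class SwitchModel:
    """Toy model of the per-PF/VF TX/RX rule pair; not the real MCAM layout."""

    def __init__(self):
        # One entry per CGX mapped PF/VF, keyed by its MAC address.
        self.rules = {}

    def allocate(self, owner, mac):
        # Rules are reserved up front but stay disabled until bring-up.
        self.rules[mac] = {"owner": owner, "enabled": False}

    def bring_up(self, owner):
        for rule in self.rules.values():
            if rule["owner"] == owner:
                rule["enabled"] = True

    def send(self, sender, dmac):
        """Return the hops a packet from `sender` to `dmac` takes."""
        rule = self.rules.get(dmac)
        if rule is None or not rule["enabled"] or rule["owner"] == sender:
            # Assumption: without a rule hit the packet leaves normally.
            return ["wire"]
        # TX rule hit: the packet goes to the loopback channel ...
        hops = [f"LBK{LBK_CHANNEL}"]
        # ... and the RX rule forwards the returned packet to the owner.
        hops.append(rule["owner"])
        return hops


model = SwitchModel()
model.allocate("PF1-VF0", "02:00:00:00:00:01")  # hypothetical addresses
model.allocate("PF2", "02:00:00:00:00:02")
model.bring_up("PF1-VF0")
before = model.send("PF1-VF0", "02:00:00:00:00:02")  # PF2 is still down
model.bring_up("PF2")
after = model.send("PF1-VF0", "02:00:00:00:00:02")
print(before, after)
```

In this model the rule for PF2 exists from the start but only takes effect once PF2 is brought up, which mirrors the "allocated but not active" wording above.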
      
      Implementing this for 98xx is tricky since there are
      two NIX blocks, and until now a PF/VF could install a rule for
      a NIX0/1 interface only if it was mapped to the corresponding NIX0/1 block.
      Hence TX rules are modified such that the TX interface in an MCAM
      entry can be either NIX0-TX or NIX1-TX.
      
      Testing:
      
      1. Create two VFs over PF1 (on NIX0) and assign the two VFs to two VMs.
      2. Assign IP addresses to the two VFs in the VMs and to PF2 (on NIX1) in the host.
      3. Assign static ARP entries in the two VMs and PF2.
      4. Ping between VMs and host PF2.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-hns3-fixes-for-net' · 97d0931f
      Jakub Kicinski authored
      Guangbin Huang says:
      
      ====================
      net: hns3: fixes for -net
      
      This series includes some bugfixes for the HNS3 ethernet driver.
      ====================
      
      Link: https://lore.kernel.org/r/1626685988-25869-1-git-send-email-huangguangbin2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix rx VLAN offload state inconsistent issue · bbfd4506
      Jian Shen authored
      Currently, the VF doesn't enable rx VLAN offload when initializing,
      and the PF does it for VFs. If the user disables rx VLAN offload for
      a VF with ethtool -K and reloads the VF driver, the rx VLAN offload
      state may become inconsistent between hardware and
      software.
      
      Fix it by enabling rx VLAN offload when the VF initializes.
      
      Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: disable port VLAN filter when support function level VLAN filter control · 184cd221
      Jian Shen authored
      Due to a hardware limitation, the port VLAN filter operates at port
      level and is effective for all the functions of the port. So if port
      VLAN bypass is not supported, it's necessary to disable the port VLAN
      filter in order to support function level VLAN filter control.
      
      Fixes: 2ba30662 ("net: hns3: add support for modify VLAN filter state")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: add match_id to check mailbox response from PF to VF · 4671042f
      Peng Li authored
      When a VF needs a response from the PF, the VF waits (1us - 1s) to receive
      the response; otherwise the wait times out and the VF action fails.
      If the VF does not receive a response to the 1st action because of a timeout,
      the 2nd action may receive the response for the 1st action and get
      incorrect response data. The VF must receive the right response from
      the PF, or unexpected errors will occur.
      
      This patch adds match_id to check the mailbox response from PF to VF,
      to make sure the VF gets the right response:
      1. The message sent from the VF is labelled with a match_id, which is a
      unique 16-bit non-zero value.
      2. The response sent from the PF is labelled with the match_id taken from
      the request.
      3. The VF uses the match_id to match request and response messages.
      
      This scheme depends on the PF driver supporting match_id; if the PF driver
      doesn't support it, the VF uses the original scheme.
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: hns3: fix possible mismatches resp of mailbox · 1b713d14
      Chengwen Feng authored
      Currently, the mailbox synchronous communication between VF and PF uses
      the following fields to maintain communication:
      1. Origin_mbx_msg, which combines the message code and subcode and is used
      to match request and response.
      2. Received_resp, which indicates whether a response has been received.
      
      A mismatch is possible in the following situation:
      1. The VF sends message A with code=1 subcode=1.
      2. The PF is blocked for about 500ms when processing message A.
      3. The VF detects a timeout for message A because it can't get the response
      within 500ms.
      4. The VF sends message B with code=1 subcode=1, the same as message A.
      5. The PF processes the first message A and sends the response message to
      the VF.
      6. The VF takes the response as matching message B because the
      code/subcode is the same. This leads to a mismatch of request and
      response.
      
      To fix the above bug, we use the following scheme:
      1. The message sent from the VF is labelled with a match_id, which is a
      unique 16-bit non-zero value.
      2. The response sent from the PF is labelled with the match_id taken from
      the request.
      3. The VF uses the match_id to match request and response messages.
      
      As for the PF driver, it only needs to copy the match_id from the request to
      the response.
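
The race and its fix can be replayed in a small user-space model. This is a sketch under stated assumptions, not the driver code: `VfMailbox`, `pf_reply` and `late_reply_accepted` are hypothetical names, and generating match_id from a wrapping counter over 1..0xFFFF is an assumption; the text above only requires a unique 16-bit non-zero value.

```python
import itertools


class VfMailbox:
    """VF side of a synchronous mailbox, with or without match_id."""

    def __init__(self, use_match_id):
        self.use_match_id = use_match_id
        # Assumption: IDs come from a wrapping counter over 1..0xFFFF,
        # which keeps them 16-bit and non-zero.
        self._ids = itertools.cycle(range(1, 0x10000))
        self.pending = None

    def send(self, code, subcode):
        match_id = next(self._ids) if self.use_match_id else 0
        self.pending = {"code": code, "subcode": subcode, "match_id": match_id}
        return dict(self.pending)

    def accepts(self, response):
        """True if the VF would treat `response` as the answer to `pending`."""
        if self.pending is None:
            return False
        if self.use_match_id:
            return response["match_id"] == self.pending["match_id"]
        same_code = response["code"] == self.pending["code"]
        same_subcode = response["subcode"] == self.pending["subcode"]
        return same_code and same_subcode


def pf_reply(request):
    # The PF only copies the identifying fields, match_id included.
    return dict(request)


def late_reply_accepted(use_match_id):
    vf = VfMailbox(use_match_id)
    msg_a = vf.send(code=1, subcode=1)   # message A, times out on the VF side
    vf.send(code=1, subcode=1)           # message B, same code/subcode
    return vf.accepts(pf_reply(msg_a))   # late response to message A


print(late_reply_accepted(False), late_reply_accepted(True))
```

With code/subcode matching alone, the late response to message A is taken as the answer to message B; with match_id the stale response is rejected because its ID differs from the one in the pending request.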
      
      Fixes: dde1a86e ("net: hns3: Add mailbox support to PF driver")
      Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bridge: do not replay fdb entries pointing towards the bridge twice · cbb56b03
      Vladimir Oltean authored
      This simple script:
      
      ip link add br0 type bridge
      ip link set swp2 master br0
      ip link set br0 address 00:01:02:03:04:05
      ip link del br0
      
      produces this result on a DSA switch:
      
      [  421.306399] br0: port 1(swp2) entered blocking state
      [  421.311445] br0: port 1(swp2) entered disabled state
      [  421.472553] device swp2 entered promiscuous mode
      [  421.488986] device swp2 left promiscuous mode
      [  421.493508] br0: port 1(swp2) entered disabled state
      [  421.886107] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 1 from fdb: -ENOENT
      [  421.894374] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 0 from fdb: -ENOENT
      [  421.943982] br0: port 1(swp2) entered blocking state
      [  421.949030] br0: port 1(swp2) entered disabled state
      [  422.112504] device swp2 entered promiscuous mode
      
      A very simplified view of what happens is:
      
      (1) the bridge port is created, and the bridge device inherits its MAC
          address
      
      (2) when joining, the bridge port (DSA) requests a replay of the
          addition of all FDB entries towards this bridge port and towards the
          bridge device itself. In fact, DSA calls br_fdb_replay() twice:
      
      	br_fdb_replay(br, brport_dev);
      	br_fdb_replay(br, br);
      
          DSA uses reference counting for the FDB entries. So the MAC address
          of the bridge is simply kept with refcount 2. When the bridge port
          leaves under normal circumstances, everything cancels out since the
          replay of the FDB entry deletion is also done twice per VLAN.
      
      (3) when the bridge MAC address changes, switchdev is notified of the
          deletion of the old address and of the insertion of the new one.
          But the old address does not really go away, since it had refcount
          2, and the new address is added "only" with refcount 1.
      
      (4) when the bridge port leaves now, it will replay a deletion of the
          FDB entries pointing towards the bridge twice. Then DSA will
          complain that it can't delete something that no longer exists.
      
      It is clear that the problem is that the FDB entries towards the bridge
      are replayed too many times, so let's fix that problem.
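
Steps (2) to (4) can be traced with a reference-counted table in user space. This is a simplified sketch, not DSA code: `RefcountedFdb` and `join_change_leave` are hypothetical names, the inherited address `aa:bb:cc:dd:ee:ff` is made up, and only a single VLAN is modelled.

```python
import errno


class RefcountedFdb:
    """FDB that keeps one reference count per MAC address."""

    def __init__(self):
        self.refs = {}

    def add(self, mac):
        self.refs[mac] = self.refs.get(mac, 0) + 1

    def delete(self, mac):
        if mac not in self.refs:
            return -errno.ENOENT
        self.refs[mac] -= 1
        if self.refs[mac] == 0:
            del self.refs[mac]
        return 0


def join_change_leave(replays_towards_bridge):
    fdb = RefcountedFdb()
    old_mac = "aa:bb:cc:dd:ee:ff"   # hypothetical address inherited in step (1)
    new_mac = "00:01:02:03:04:05"
    # Step (2): on join, the bridge address is replayed N times.
    for _ in range(replays_towards_bridge):
        fdb.add(old_mac)
    # Step (3): the address change is notified exactly once each way.
    fdb.delete(old_mac)
    fdb.add(new_mac)
    # Step (4): on leave, the deletion is replayed N times as well.
    results = [fdb.delete(new_mac) for _ in range(replays_towards_bridge)]
    return results, dict(fdb.refs)


print(join_change_leave(2))
print(join_change_leave(1))
```

With two replays the second deletion of the new address fails with -ENOENT and the old address is left behind with one reference; with a single replay every addition is cancelled by exactly one deletion.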
      
      Fixes: 63c51453 ("net: dsa: replay the local bridge FDB entries pointing to the bridge dev too")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20210719093916.4099032-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: Update MAINTAINERS for MediaTek switch driver · 6c2d1258
      Landen Chao authored
      Update maintainers for the MediaTek switch driver with Deng Qingfang, who has
      contributed many high-quality patches (interrupt, VLAN, GPIO, etc.)
      and will help with maintenance.
      Signed-off-by: Landen Chao <landen.chao@mediatek.com>
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/49e1aa8aac58dcbf1b5e036d09b3fa3bbb1d94d0.1626751861.git.landen.chao@mediatek.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tcp_fastopen: remove obsolete extern · 74946876
      Eric Dumazet authored
      After the cited commit, sysctl_tcp_fastopen_blackhole_timeout is no longer
      a global variable.
      
      Fixes: 3733be14 ("ipv4: Namespaceify tcp_fastopen_blackhole_timeout knob")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Link: https://lore.kernel.org/r/20210719092028.3016745-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: ip6_finish_output2: set sk into newly allocated nskb · 2d85a1b3
      Vasily Averin authored
      skb_set_owner_w() should set sk not to old skb but to new nskb.
      
      Fixes: 5796015f ("ipv6: allocate enough headroom in ip6_finish_output2()")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Link: https://lore.kernel.org/r/70c0744f-89ae-1869-7e3e-4fa292158f4b@virtuozzo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 19 Jul, 2021 19 commits
  3. 18 Jul, 2021 4 commits
    • netrom: Decrease sock refcount when sock timers expire · 517a16b1
      Nguyen Dinh Phi authored
      Commit 63346650 ("netrom: switch to sock timer API") switched to the
      sock timer API. It replaced mod_timer() with sk_reset_timer(), and
      del_timer() with sk_stop_timer().
      
      sk_reset_timer() increases the refcount of the sock if it is
      called on an inactive timer. Hence, when the timer expires, we need to
      decrease the refcount ourselves in the handler; otherwise the sock
      refcount will be unbalanced and the sock will never be freed.
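
The reference accounting described above can be checked with a minimal user-space model. The Python functions below are stand-ins that borrow the kernel names for readability; the `Sock` class, the `timer_expires` helper and its `handler_puts_ref` flag are assumptions made for this sketch.

```python
class Sock:
    def __init__(self):
        self.refcnt = 1            # reference held by the socket's owner
        self.timer_pending = False


def sk_reset_timer(sk):
    # A reference is taken only when an inactive timer gets armed.
    if not sk.timer_pending:
        sk.refcnt += 1
    sk.timer_pending = True


def sk_stop_timer(sk):
    # Cancelling a pending timer gives the timer's reference back.
    if sk.timer_pending:
        sk.timer_pending = False
        sk.refcnt -= 1


def timer_expires(sk, handler_puts_ref):
    # Once the timer fires it is no longer pending, so a later
    # sk_stop_timer() will not drop the reference for us.
    sk.timer_pending = False
    if handler_puts_ref:
        sk.refcnt -= 1             # what the fixed handler does


leaky = Sock()
sk_reset_timer(leaky)
timer_expires(leaky, handler_puts_ref=False)
sk_stop_timer(leaky)

balanced = Sock()
sk_reset_timer(balanced)
timer_expires(balanced, handler_puts_ref=True)
sk_stop_timer(balanced)

print(leaky.refcnt, balanced.refcnt)
```

In the leaky case the count stays one above the owner's reference after expiry, so dropping the owner's reference alone could never free the sock.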
      Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Reported-by: syzbot+10f1194569953b72f1ae@syzkaller.appspotmail.com
      Fixes: 63346650 ("netrom: switch to sock timer API")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: trim optlen when it's a huge value in sctp_setsockopt · 2f3fdd8d
      Xin Long authored
      After commit ca84bd05 ("sctp: copy the optval from user space in
      sctp_setsockopt"), sctp_setsockopt allocates memory sized by
      optlen, so the allocation fails and an error is returned if the
      optlen from user space is a huge value.
      
      This breaks some sockopts, like SCTP_HMAC_IDENT, SCTP_RESET_STREAMS and
      SCTP_AUTH_KEY: previously, when processing these sockopts, a huge optlen
      was trimmed to the biggest value needed
      instead of failing the allocation and returning an error.
      
      This patch fixes the allocation failure for a huge optlen from
      user space by trimming it to the biggest size any sctp sockopt may need when
      necessary. This biggest size comes from sctp_setsockopt_reset_streams()
      for SCTP_RESET_STREAMS, which is bigger than those for SCTP_HMAC_IDENT
      and SCTP_AUTH_KEY.
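
The effect of trimming before the allocation can be sketched in user space. Both constants below are made-up placeholders, not the kernel's values, and `alloc` and `copy_optval` are hypothetical helpers; the sketch only shows why clamping optlen first turns a failing allocation into a bounded one.

```python
import errno

ALLOC_LIMIT = 4 * 1024 * 1024     # hypothetical ceiling of the allocator
BIGGEST_SOCKOPT = 128 * 1024      # hypothetical largest size any sockopt needs


def alloc(nbytes):
    if nbytes > ALLOC_LIMIT:
        raise MemoryError(nbytes)
    return bytearray(nbytes)


def copy_optval(optlen, trim):
    """Return the number of bytes copied, or a negative errno."""
    if trim:
        # Clamp a huge user-supplied length before allocating.
        optlen = min(optlen, BIGGEST_SOCKOPT)
    try:
        return len(alloc(optlen))
    except MemoryError:
        return -errno.ENOMEM


huge = 2**31 - 1
print(copy_optval(huge, trim=False), copy_optval(huge, trim=True))
```

Lengths that are already below the cap pass through unchanged, so ordinary callers would see no difference in this model.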
      
      Fixes: ca84bd05 ("sctp: copy the optval from user space in sctp_setsockopt")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: fix memory leak in tcindex_partial_destroy_work · f5051bce
      Pavel Skripkin authored
      Syzbot reported a memory leak in tcindex_set_parms(). The problem was a
      perfect hash that was not freed in tcindex_partial_destroy_work().
      
      In tcindex_set_parms() a new tcindex_data is allocated and some fields from
      the old one are copied to the new one, but not the perfect hash. Since
      tcindex_partial_destroy_work() is the destroy function for the old
      tcindex_data, we need to free the perfect hash there to avoid the memory leak.
      
      Reported-and-tested-by: syzbot+f0bbb2287b8993d4fa74@syzkaller.appspotmail.com
      Fixes: 331b7292 ("net: sched: RCU cls_tcindex")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix zero-copy head len calculation. · a17ad096
      Pravin B Shelar authored
      In some cases the skb head could be locked and the entire header
      data pulled from the skb. When skb_zerocopy() is called in such cases,
      the following BUG is triggered. This patch fixes it by copying the entire
      skb in such cases.
      This could be optimized in case it becomes a performance bottleneck.
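
A rough user-space model of the head-length decision shows how the `BUG_ON(!from->head_frag && !hlen)` check quoted in this commit message gets hit and how copying the entire skb avoids it. This is a deliberately simplified sketch: the `Skb` dataclass and the helper names are hypothetical, and any other conditions the kernel's head-length calculation may apply are left out.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    head_frag: bool   # True if the head buffer is a page fragment
    headlen: int      # bytes left in the linear (head) area
    data_len: int     # bytes held in paged fragments

    @property
    def total_len(self):
        return self.headlen + self.data_len


def headlen_old(skb):
    # Simplified: copy the linear area only when the head is not a page frag.
    return 0 if skb.head_frag else skb.headlen


def headlen_fixed(skb):
    hlen = headlen_old(skb)
    if not skb.head_frag and hlen == 0:
        hlen = skb.total_len   # nothing linear left: copy the entire skb
    return hlen


def would_bug(skb, hlen):
    # Mirrors the BUG_ON(!from->head_frag && !hlen) condition.
    return (not skb.head_frag) and hlen == 0


pulled = Skb(head_frag=False, headlen=0, data_len=1500)
print(would_bug(pulled, headlen_old(pulled)),
      would_bug(pulled, headlen_fixed(pulled)))
```

The trade-off named in the message is visible here: the fallback copies all 1500 bytes instead of sharing the fragments, which is correct but not the cheapest option.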
      
      ---8<---
      kernel BUG at net/core/skbuff.c:2961!
      invalid opcode: 0000 [#1] SMP PTI
      CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           OE     5.4.0-77-generic #86-Ubuntu
      Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:skb_zerocopy+0x37a/0x3a0
      RSP: 0018:ffffbcc70013ca38 EFLAGS: 00010246
      Call Trace:
       <IRQ>
       queue_userspace_packet+0x2af/0x5e0 [openvswitch]
       ovs_dp_upcall+0x3d/0x60 [openvswitch]
       ovs_dp_process_packet+0x125/0x150 [openvswitch]
       ovs_vport_receive+0x77/0xd0 [openvswitch]
       netdev_port_receive+0x87/0x130 [openvswitch]
       netdev_frame_hook+0x4b/0x60 [openvswitch]
       __netif_receive_skb_core+0x2b4/0xc90
       __netif_receive_skb_one_core+0x3f/0xa0
       __netif_receive_skb+0x18/0x60
       process_backlog+0xa9/0x160
       net_rx_action+0x142/0x390
       __do_softirq+0xe1/0x2d6
       irq_exit+0xae/0xb0
       do_IRQ+0x5a/0xf0
       common_interrupt+0xf/0xf
      
      Code that triggered BUG:
      int
      skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen)
      {
              int i, j = 0;
              int plen = 0; /* length of skb->head fragment */
              int ret;
              struct page *page;
              unsigned int offset;
      
              BUG_ON(!from->head_frag && !hlen);
      Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 17 Jul, 2021 1 commit
    • bonding: fix build issue · 5b69874f
      Mahesh Bandewar authored
      Commit 9a560550 (" bonding: Add struct bond_ipesc to manage SA") causes the
      following build error when XFRM is not selected in the kernel config.
      
      lld: error: undefined symbol: xfrm_dev_state_flush
      >>> referenced by bond_main.c:3453 (drivers/net/bonding/bond_main.c:3453)
      >>>               net/bonding/bond_main.o:(bond_netdev_event) in archive drivers/built-in.a
      
      Fixes: 9a560550 (" bonding: Add struct bond_ipesc to manage SA")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      CC: Taehee Yoo <ap420073@gmail.com>
      CC: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 16 Jul, 2021 3 commits
  6. 15 Jul, 2021 3 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 20192d9c
      David S. Miller authored
      Andrii Nakryiko says:
      
      ====================
      pull-request: bpf 2021-07-15
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 9 non-merge commits during the last 5 day(s) which contain
      a total of 9 files changed, 37 insertions(+), 15 deletions(-).
      
      The main changes are:
      
      1) Fix NULL pointer dereference in BPF_TEST_RUN for BPF_XDP_DEVMAP and
         BPF_XDP_CPUMAP programs, from Xuan Zhuo.
      
      2) Fix use-after-free of net_device in XDP bpf_link, from Xuan Zhuo.
      
      3) Follow-up fix to subprog poke descriptor use-after-free problem, from
         Daniel Borkmann and John Fastabend.
      
      4) Fix out-of-range array access in s390 BPF JIT backend, from Colin Ian King.
      
      5) Fix memory leak in BPF sockmap, from John Fastabend.
      
      6) Fix for sockmap to prevent proc stats reporting bug, from John Fastabend
         and Jakub Sitnicki.
      
      7) Fix NULL pointer dereference in bpftool, from Tobias Klauser.
      
      8) AF_XDP documentation fixes, from Baruch Siach.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb: hso: fix error handling code of hso_create_net_device · a6ecfb39
      Dongliang Mu authored
      The current error handling code of hso_create_net_device is
      hso_free_net_device, no matter which error occurred. This leads, for example,
      to a WARNING in hso_free_net_device [1].
      
      Fix this by refactoring the error handling code of
      hso_create_net_device so that different errors are handled by different code.
      
      [1] https://syzkaller.appspot.com/bug?id=66eff8d49af1b28370ad342787413e35bbe76efe
      
      Reported-by: syzbot+44d53c7255bb1aea22d2@syzkaller.appspotmail.com
      Fixes: 5fcfb6d0 ("hso: fix bailout in error case of probe")
      Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union() · 6206b798
      Jia He authored
      Lijian reported a BUG_ON hit on a ThunderX2 arm64 server with a FastLinQ
      QL41000 ethernet controller:
       BUG: scheduling while atomic: kworker/0:4/531/0x00000200
        [qed_probe:488()]hw prepare failed
        kernel BUG at mm/vmalloc.c:2355!
        Internal error: Oops - BUG: 0 [#1] SMP
        CPU: 0 PID: 531 Comm: kworker/0:4 Tainted: G W 5.4.0-77-generic #86-Ubuntu
        pstate: 00400009 (nzcv daif +PAN -UAO)
       Call trace:
        vunmap+0x4c/0x50
        iounmap+0x48/0x58
        qed_free_pci+0x60/0x80 [qed]
        qed_probe+0x35c/0x688 [qed]
        __qede_probe+0x88/0x5c8 [qede]
        qede_probe+0x60/0xe0 [qede]
        local_pci_probe+0x48/0xa0
        work_for_cpu_fn+0x24/0x38
        process_one_work+0x1d0/0x468
        worker_thread+0x238/0x4e0
        kthread+0xf0/0x118
        ret_from_fork+0x10/0x18
      
      In this case, qed_hw_prepare() returns an error due to a hw/fw error, but in
      theory the work queue should run in process context instead of interrupt context.
      
      The root cause might be the unpaired spin_{un}lock_bh() in
      _qed_mcp_cmd_and_union(), which leaves the bottom half incorrectly disabled.
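
The effect of an unpaired lock/unlock can be illustrated with a counter in user space. This sketch does not reproduce the control flow of _qed_mcp_cmd_and_union(); `Cpu`, `send_cmd_unpaired` and `send_cmd_paired` are hypothetical, and the 0x200 step is an assumption about how one bottom-half disable shows up in the preempt count. Under that assumption, the 0x00000200 in the "scheduling while atomic" line above would be consistent with one outstanding disable.

```python
SOFTIRQ_DISABLE_OFFSET = 0x200  # assumed preempt-count step per bh disable


class Cpu:
    def __init__(self):
        self.preempt_count = 0

    def spin_lock_bh(self):
        self.preempt_count += SOFTIRQ_DISABLE_OFFSET

    def spin_unlock_bh(self):
        self.preempt_count -= SOFTIRQ_DISABLE_OFFSET


def send_cmd_unpaired(cpu, hw_fails):
    cpu.spin_lock_bh()
    if hw_fails:
        return -1              # early exit forgets the unlock
    cpu.spin_unlock_bh()
    return 0


def send_cmd_paired(cpu, hw_fails):
    cpu.spin_lock_bh()
    try:
        return -1 if hw_fails else 0
    finally:
        cpu.spin_unlock_bh()   # runs on every exit path


bad, good = Cpu(), Cpu()
send_cmd_unpaired(bad, hw_fails=True)
send_cmd_paired(good, hw_fails=True)
print(hex(bad.preempt_count), hex(good.preempt_count))
```

Any later code that may sleep would then run with a non-zero count in the unpaired case, which is the kind of state the "scheduling while atomic" check reports.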
      Reported-by: Lijian Zhang <Lijian.Zhang@arm.com>
      Signed-off-by: Jia He <justin.he@arm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>