1. 03 Oct, 2022 12 commits
  2. 01 Oct, 2022 1 commit
  3. 30 Sep, 2022 6 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 0bafedc5
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2022-09-29
      
      1) Use the inner instead of the outer protocol for GSO on inter
         address family tunnels. This fixes GSO for inter address
         family tunnels. From Sabrina Dubroca.

      2) Reset ipcomp_scratches to NULL when freed; otherwise
         it holds a stale address. From Khalid Masum.

      3) Reinject transport-mode packets through a workqueue
         instead of a tasklet. The tasklet might take too
         long to finish. From Liu Jian.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited · f4ce91ce
      Neal Cardwell authored
      This commit fixes a bug in the tracking of max_packets_out and
      is_cwnd_limited. This bug can cause the connection to fail to remember
      that is_cwnd_limited is true, causing the connection to fail to grow
      cwnd when it should, causing throughput to be lower than it should be.
      
      The following event sequence is an example that triggers the bug:
      
       (a) The connection is cwnd_limited, but packets_out is not at its
           peak due to TSO deferral deciding not to send another skb yet.
           In such cases the connection can advance max_packets_seq and set
           tp->is_cwnd_limited to true and max_packets_out to a small
           number.
      
       (b) Then later in the round trip the connection is pacing-limited (not
           cwnd-limited), and packets_out is larger. In such cases the
           connection would raise max_packets_out to a bigger number but
           (unexpectedly) flip tp->is_cwnd_limited from true to false.
      
      This commit fixes that bug.
      
      One straightforward fix would be to separately track (a) the next
      window after max_packets_out reaches a maximum, and (b) the next
      window after tp->is_cwnd_limited is set to true. But this would
      require consuming an extra u32 sequence number.
      
      Instead, to save space we track only the most important
      information. Specifically, we track the strongest available signal of
      the degree to which the cwnd is fully utilized:
      
      (1) If the connection is cwnd-limited then we remember that fact for
      the current window.
      
      (2) If the connection is not cwnd-limited then we track the maximum
      number of outstanding packets in the current window.
      
      In particular, note that the new logic cannot trigger the buggy
      (a)/(b) sequence above because with the new logic a condition where
      tp->packets_out > tp->max_packets_out can only trigger an update of
      tp->is_cwnd_limited if tp->is_cwnd_limited is false.
      
      This first showed up in testing of a BBRv2 dev branch, but this
      buggy behavior highlighted a general issue with the
      tcp_cwnd_validate() logic that can cause cwnd to fail to increase at
      the proper rate for any TCP congestion control, including Reno or
      CUBIC.
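The update rule described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: `struct cwnd_state`, `seq_before()`, `cwnd_usage_update()` and the `usage_seq` field are hypothetical names standing in for the relevant `struct tcp_sock` state, and the caller is assumed to pass in whether the connection is currently cwnd-limited.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few per-connection fields involved. */
struct cwnd_state {
	uint32_t snd_una;         /* oldest unacknowledged sequence number */
	uint32_t snd_nxt;         /* next sequence number to be sent */
	uint32_t usage_seq;       /* snd_nxt at the last update; ends the tracked window */
	uint32_t packets_out;     /* packets currently in flight */
	uint32_t max_packets_out; /* packets_out recorded at the last update */
	bool is_cwnd_limited;     /* remembered cwnd-limited signal */
};

/* Wrap-safe "a is before b" comparison on 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Track the strongest available signal of cwnd utilization. */
static void cwnd_usage_update(struct cwnd_state *s, bool is_cwnd_limited)
{
	if (!seq_before(s->snd_una, s->usage_seq) || /* a new window has started */
	    is_cwnd_limited ||                       /* cwnd-limited always wins */
	    (!s->is_cwnd_limited &&                  /* a new peak only counts while */
	     s->packets_out > s->max_packets_out)) { /* not already cwnd-limited */
		s->is_cwnd_limited = is_cwnd_limited;
		s->max_packets_out = s->packets_out;
		s->usage_seq = s->snd_nxt;
	}
}
```

In this model, replaying the (a)/(b) sequence leaves `is_cwnd_limited` set: step (b) has a larger `packets_out`, but the third clause is skipped because the state is already cwnd-limited, so nothing is overwritten until the window ends.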
      
      Fixes: ca8a2263 ("tcp: make cwnd-limited checks measurement-based, and gentler")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: handle the error returned from sctp_auth_asoc_init_active_key · 022152aa
      Xin Long authored
      When sctp_auth_asoc_init_active_key() returns an error, the
      active_key is actually not updated. The old sh_key will be freed
      while it's still used as the active key in asoc. Then a use-after-free
      will be triggered when sending packets, as found by syzbot:
      
        sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
        sctp_set_owner_w net/sctp/socket.c:132 [inline]
        sctp_sendmsg_to_asoc+0xbd5/0x1a20 net/sctp/socket.c:1863
        sctp_sendmsg+0x1053/0x1d50 net/sctp/socket.c:2025
        inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
        sock_sendmsg_nosec net/socket.c:714 [inline]
        sock_sendmsg+0xcf/0x120 net/socket.c:734
      
      This patch is to fix it by not replacing the sh_key when it returns
      errors from sctp_auth_asoc_init_active_key() in sctp_auth_set_key().
      For sctp_auth_set_active_key(), old active_key_id will be set back
      to asoc->active_key_id when the same thing happens.
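The roll-back behaviour described for sctp_auth_set_active_key() can be illustrated with a minimal userspace sketch. It is not the SCTP code: `struct assoc_model`, `init_active_key()` and `set_active_key()` are hypothetical stand-ins, and the `fail` parameter exists only to simulate an initialization error.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the association state touched by the fix. */
struct assoc_model {
	uint16_t active_key_id; /* key id currently used for sending */
};

/* Stand-in for the key initialization step; fails when asked to. */
static int init_active_key(struct assoc_model *asoc, int fail)
{
	(void)asoc;
	return fail ? -ENOMEM : 0;
}

/* Switch to new_id, restoring the old id if initialization fails. */
static int set_active_key(struct assoc_model *asoc, uint16_t new_id, int fail)
{
	uint16_t old_id = asoc->active_key_id;
	int err;

	asoc->active_key_id = new_id;
	err = init_active_key(asoc, fail);
	if (err) {
		/* Roll back so the association never points at an unusable key. */
		asoc->active_key_id = old_id;
		return err;
	}
	return 0;
}
```

The point of the pattern is that a failed update leaves the association exactly as it was, so nothing that is still referenced gets released.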
      
      Fixes: 58acd100 ("sctp: update active_key for asoc when old key is being replaced")
      Reported-by: syzbot+a236dd8e9622ed8954a3@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mISDN: fix use-after-free bugs in l1oip timer handlers · 2568a7e0
      Duoming Zhou authored
      The l1oip_cleanup() traverses the l1oip_ilist and calls
      release_card() to clean up the module and stack. However,
      release_card() calls del_timer() to delete timers
      such as keep_tl and timeout_tl. If a timer handler is
      running, del_timer() will not stop it, which results in
      UAF bugs. One of the race scenarios is shown below:
      
          (cleanup routine)          |        (timer handler)
      release_card()                 | l1oip_timeout()
       ...                           |
       del_timer()                   | ...
       ...                           |
       kfree(hc) //FREE              |
                                     | hc->timeout_on = 0 //USE
      
      Fix by calling del_timer_sync() in release_card(), which
      makes sure the timer handlers have finished before the
      resources, such as l1oip and so on, are deallocated.

      What's more, hc->workq and hc->socket_thread can kick
      those timers right back in. Add a bool flag that records
      whether the card has been released, and check this flag in
      hc->workq and hc->socket_thread.
      
      Fixes: 3712b42d ("Add layer1 over IP support")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • eth: alx: take rtnl_lock on resume · 6ad1c94e
      Jakub Kicinski authored
      Zbynek reports that alx trips an rtnl assertion on resume:
      
       RTNL: assertion failed at net/core/dev.c (2891)
       RIP: 0010:netif_set_real_num_tx_queues+0x1ac/0x1c0
       Call Trace:
        <TASK>
        __alx_open+0x230/0x570 [alx]
        alx_resume+0x54/0x80 [alx]
        ? pci_legacy_resume+0x80/0x80
        dpm_run_callback+0x4a/0x150
        device_resume+0x8b/0x190
        async_resume+0x19/0x30
        async_run_entry_fn+0x30/0x130
        process_one_work+0x1e5/0x3b0
      
      Indeed the driver does not hold rtnl_lock during its internal close
      and re-open functions during suspend/resume. Note that this is not
      a huge bug as the driver implements its own locking, and does not
      implement changing the number of queues, but we need to silence
      the splat.
      
      Fixes: 4a5fe57e ("alx: use fine-grained locking instead of RTNL")
      Reported-and-tested-by: Zbynek Michl <zbynek.michl@gmail.com>
      Reviewed-by: Niels Dossche <dossche.niels@gmail.com>
      Link: https://lore.kernel.org/r/20220928181236.1053043-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vhost/vsock: Use kvmalloc/kvfree for larger packets. · 0e3f7293
      Junichi Uekawa authored
      When copying a large file over sftp over vsock, the data size is usually 32kB,
      and kmalloc seems to fail when trying to allocate 32 32kB regions.
      
       vhost-5837: page allocation failure: order:4, mode:0x24040c0
       Call Trace:
        [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
        [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
        [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
        [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
        [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
        [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
        [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
        [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
        [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
        [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
        [<ffffffffb683ddce>] kthread+0xfd/0x105
        [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
        [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
        [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
        [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
      
      Work around by doing kvmalloc instead.
      
      Fixes: 433fc58e ("VSOCK: Introduce vhost_vsock.ko")
      Signed-off-by: Junichi Uekawa <uekawa@chromium.org>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20220928064538.667678-1-uekawa@chromium.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 29 Sep, 2022 13 commits
    • Merge tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 511cce16
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wifi and can.
      
        Current release - regressions:
      
         - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume()
      
         - wifi: fix locking in mac80211 mlme
      
         - eth:
            - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
            - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
      
        Previous releases - regressions:
      
         - wifi: fix regression with non-QoS drivers
      
        Previous releases - always broken:
      
         - mptcp: fix unreleased socket in accept queue
      
         - wifi:
            - don't start TX with fq->lock to fix deadlock
            - fix memory corruption in minstrel_ht_update_rates()
      
         - eth:
            - macb: fix ZynqMP SGMII non-wakeup source resume failure
            - mt7531: only do PLL once after the reset
            - usbnet: fix memory leak in usbnet_disconnect()
      
        Misc:
      
         - usb: qmi_wwan: add new usb-id for Dell branded EM7455"
      
      * tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
        mptcp: fix unreleased socket in accept queue
        mptcp: factor out __mptcp_close() without socket lock
        net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2}
        net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge
        can: c_can: don't cache TX messages for C_CAN cores
        ice: xsk: drop power of 2 ring size restriction for AF_XDP
        ice: xsk: change batched Tx descriptor cleaning
        net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
        selftests: Fix the if conditions of in test_extra_filter()
        net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
        net: stmmac: power up/down serdes in stmmac_open/release
        wifi: mac80211: mlme: Fix double unlock on assoc success handling
        wifi: mac80211: mlme: Fix missing unlock on beacon RX
        wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
        wifi: mac80211: fix regression with non-QoS drivers
        wifi: mac80211: ensure vif queues are operational after start
        wifi: mac80211: don't start TX with fq->lock to fix deadlock
        wifi: cfg80211: fix MCS divisor value
        net: hippi: Add missing pci_disable_device() in rr_init_one()
        net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
        ...
    • Merge tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · da9eede6
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
      
       - small fixes for iqs62x-keys and melfas_mip4 drivers
      
       - corrected register address in snvs_pwrkey driver
      
       - Synaptic driver will stop trying to use intertouch (native) mode on
         some Lenovo AMD devices
      
      * tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
        Input: synaptics - disable Intertouch for Lenovo T14 and P14s AMD G1
        Input: iqs62x-keys - drop unused device node references
        Input: melfas_mip4 - fix return value check in mip4_probe()
    • Merge tag 'ata-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 71f18757
      Linus Torvalds authored
      Pull ATA fixes from Damien Le Moal:
       "Three late patches to fix problems discovered recently:
      
         - Add a horkage to disable link power management by default for the
           Pioneer BDR-207M and BDR-205 DVD drives (from Niklas)
      
         - Two patches to fix setting the maximum queue depth of libsas owned
           ATA devices (from me)"
      
      * tag 'ata-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: libata-sata: Fix device queue depth control
        ata: libata-scsi: Fix initialization of device queue depth
        libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
    • Merge tag 'loongarch-fixes-6.0-3' of... · 81bcd4b5
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Some trivial fixes and cleanup"
      
      * tag 'loongarch-fixes-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Clean up loongson3_smp_ops declaration
        LoongArch: Fix and cleanup csr_era handling in do_ri()
        LoongArch: Align the address of kernel_entry to 4KB
    • LoongArch: Clean up loongson3_smp_ops declaration · 4f196cb6
      Yanteng Si authored
      Since loongson3_smp_ops is not used in LoongArch anymore, let's remove
      it for cleanup.
      
      Fixes: f2ac457a ("LoongArch: Add CPU definition headers")
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Fix and cleanup csr_era handling in do_ri() · 06e76ace
      Huacai Chen authored
      We don't emulate reserved instructions and just send a signal to the
      current process now. So we don't need to call compute_return_era() to
      add 4 (point to the next instruction) to csr_era in pt_regs. RA/ERA's
      backup/restore is cleaned up as well.

      Signed-off-by: Jun Yi <yijun@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Align the address of kernel_entry to 4KB · 2938431e
      Huacai Chen authored
      Align the address of kernel_entry to 4KB, to avoid an early TLB miss
      exception in case the entry code crosses a page boundary.

      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • Merge branch 'mptcp-properly-clean-up-unaccepted-subflows' · 3b04cba7
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Properly clean up unaccepted subflows
      
      Patch 1 factors out part of the mptcp_close() function for use by a caller
      that already owns the socket lock. This is a prerequisite for patch 2.
      
      Patch 2 is the fix that fully cleans up the unaccepted subflow sockets.
      ====================
      
      Link: https://lore.kernel.org/r/20220927193158.195729-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix unreleased socket in accept queue · 30e51b92
      Menglong Dong authored
      The mptcp socket and its subflow sockets in the accept queue can't be
      released after the process exits.

      When a mptcp socket in listening state is released, the
      corresponding tcp socket is released too, as are the tcp
      sockets in the unaccept queue. However, only the initial
      subflow is in the unaccept queue; the joined subflow is not,
      so the joined subflow is never released, and therefore the
      corresponding unaccepted mptcp socket is not released either.
      
      This can be reproduced easily with the following steps:

      1. create 2 namespaces and a veth pair:
         $ ip netns add mptcp-client
         $ ip netns add mptcp-server
         $ sysctl -w net.ipv4.conf.all.rp_filter=0
         $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
         $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
         $ ip link add red-client netns mptcp-client type veth peer red-server \
           netns mptcp-server
         $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
         $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
         $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
         $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
         $ ip -n mptcp-server link set red-server up
         $ ip -n mptcp-client link set red-client up
      
      2. configure the endpoint and limit for client and server:
         $ ip -n mptcp-server mptcp endpoint flush
         $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
         $ ip -n mptcp-client mptcp endpoint flush
         $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
         $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
           1 subflow
      
      3. listen and accept on a port, such as 9999. The nc command we used
         here is modified, which makes it use mptcp protocol by default.
         $ ip netns exec mptcp-server nc -l -k -p 9999
      
      4. open another *two* terminals and use each of them to connect to the
         server with the following command:
         $ ip netns exec mptcp-client nc 10.0.0.1 9999
         Input something after connecting to trigger the connection of the second
         subflow, so that there are two established mptcp connections, with the
         second one still unaccepted.

      5. exit all the nc commands, and check the tcp sockets in the server namespace.
         You will find that there is one tcp socket in CLOSE_WAIT state
         that is never released.
      
      Fix this by closing all of the unaccepted mptcp sockets in
      mptcp_subflow_queue_clean() with __mptcp_close().

      Now, we can ensure that all unaccepted mptcp sockets will be cleaned by
      __mptcp_close() before they are released, so mptcp_sock_destruct(), which
      is used to clean the unaccepted mptcp socket, is not needed anymore.

      The selftests for mptcp were run for this commit, with no new failures.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Fixes: 6aeed904 ("mptcp: fix race on unaccepted mptcp sockets")
      Cc: stable@vger.kernel.org
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Mengen Sun <mengensun@tencent.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: factor out __mptcp_close() without socket lock · 26d3e21c
      Menglong Dong authored
      Factor out __mptcp_close() from mptcp_close(). The caller of
      __mptcp_close() should hold the socket lock, and cancel mptcp work when
      __mptcp_close() returns true.
      
      This function will be used in the next commit.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Fixes: 6aeed904 ("mptcp: fix race on unaccepted mptcp sockets")
      Cc: stable@vger.kernel.org
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Mengen Sun <mengensun@tencent.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 3e1308a7
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      ice: xsk: ZC changes
      
      Maciej Fijalkowski says:
      
      This set consists of two fixes to issues that were either pointed out
      indirectly on the mailing list (John was reviewing AF_XDP selftests that
      were testing ice's ZC support) or were directly reported by customers.

      The first patch allows user space to see a done descriptor in the CQ even
      after a single frame has been transmitted, and the second patch removes
      the need for HW rings to be sized to a power-of-2 number of descriptors
      when used with AF_XDP.
      
      I also forgot to mention that due to the current Tx cleaning algorithm,
      4k HW ring was broken and these two patches bring it back to life, so we
      kill two birds with one stone.
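As background on why a power-of-2 ring size is often assumed, here is a generic sketch of two ways to advance a ring index. It illustrates the general technique only and is not taken from the ice driver; `ring_next_pow2()` and `ring_next_any()` are hypothetical helper names.

```c
#include <stdint.h>

/* Advance by one slot using a mask; only correct when ring_size is a power of 2. */
static uint32_t ring_next_pow2(uint32_t idx, uint32_t ring_size)
{
	return (idx + 1) & (ring_size - 1);
}

/* Advance by one slot with an explicit wrap check; correct for any ring_size >= 1. */
static uint32_t ring_next_any(uint32_t idx, uint32_t ring_size)
{
	idx++;
	return idx == ring_size ? 0 : idx;
}
```

With a 6-entry ring, `ring_next_pow2(5, 6)` yields 4 instead of wrapping to 0, which is the kind of breakage a power-of-2 assumption causes once arbitrary sizes are allowed; the compare-and-reset form has no such restriction.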
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: xsk: drop power of 2 ring size restriction for AF_XDP
        ice: xsk: change batched Tx descriptor cleaning
      ====================
      
      Link: https://lore.kernel.org/r/20220927164112.4011983-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2} · c9da02bf
      Daniel Golle authored
      The bitmasks applied in RX_DMA_GET_SPORT and RX_DMA_GET_SPORT_V2 macros
      were swapped. Fix that.

      Reported-by: Chen Minqiang <ptpt52@gmail.com>
      Fixes: 160d3a9b ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support")
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Link: https://lore.kernel.org/r/YzMW+mg9UsaCdKRQ@makrotopia.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge · 276d37eb
      Vladimir Oltean authored
      Currently the following set of commands fails:
      
      $ ip link add br0 type bridge # vlan_filtering 0
      $ ip link set swp0 master br0
      $ bridge vlan
      port              vlan-id
      swp0              1 PVID Egress Untagged
      $ bridge vlan add dev swp0 vid 10
      Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.
      
      Dumping ocelot->vlans, one can see that the 2 egress-untagged VLANs on swp0 are
      vid 1 (the bridge PVID) and vid 4094, a PVID used privately by the driver for
      VLAN-unaware bridging. So this is why bridge vid 10 is refused, despite
      'bridge vlan' showing a single egress untagged VLAN.
      
      As mentioned in the comment added, having this private VLAN does not impose
      restrictions to the hardware configuration, yet it is a bookkeeping problem.
      
      There are 2 possible solutions.
      
      One is to make the functions that operate on VLAN-unaware pvids:
      - ocelot_add_vlan_unaware_pvid()
      - ocelot_del_vlan_unaware_pvid()
      - ocelot_port_setup_dsa_8021q_cpu()
      - ocelot_port_teardown_dsa_8021q_cpu()
      call something different than ocelot_vlan_member_(add|del)(), the latter being
      the real problem, because it allocates a struct ocelot_bridge_vlan *vlan which
      it adds to ocelot->vlans. We don't really *need* the private VLANs in
      ocelot->vlans, it's just that we have the extra convenience of having the
      vlan->portmask cached in software (whereas without these structures, we'd have
      to create a raw ocelot_vlant_rmw_mask() procedure which reads back the current
      port mask from hardware).
      
      The other solution is to filter out the private VLANs from
      ocelot_port_num_untagged_vlans(), since they aren't what callers care about.
      We only need to do this to the mentioned function and not to
      ocelot_port_num_tagged_vlans(), because private VLANs are never egress-tagged.
      
      Nothing else seems to be broken in either solution, but the first one requires
      more rework which will conflict with the net-next change 36a0bf44 ("net:
      mscc: ocelot: set up tag_8021q CPU ports independent of user port affinity"),
      and I'd like to avoid that. So go with the other one.
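The chosen approach, counting egress-untagged VLANs while skipping the driver's private ones, can be sketched in userspace as follows. This is a simplified model rather than the driver code: `struct vlan_model` and `port_num_untagged_vlans()` are hypothetical, and the `RSV_VLAN_RANGE_START` threshold of 4000 is an assumption made for illustration (the message only states that vid 4094 is private).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed start of the VLAN range reserved for private use (illustrative). */
#define RSV_VLAN_RANGE_START 4000

/* Hypothetical model of one software VLAN entry. */
struct vlan_model {
	uint16_t vid;
	uint32_t portmask; /* ports that are members of this VLAN */
	uint32_t untagged; /* ports on which this VLAN egresses untagged */
};

/* Count the egress-untagged VLANs of a port, ignoring private VLANs. */
static int port_num_untagged_vlans(const struct vlan_model *vlans, size_t n,
				   int port)
{
	uint32_t bit = UINT32_C(1) << port;
	int num_untagged = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(vlans[i].portmask & bit))
			continue; /* port is not a member */
		if (vlans[i].vid >= RSV_VLAN_RANGE_START)
			continue; /* private VLAN: not what callers care about */
		if (vlans[i].untagged & bit)
			num_untagged++;
	}
	return num_untagged;
}
```

With vid 1 and vid 4094 both egress-untagged on port 0, the count is 1, so a check for "more than one egress-untagged VLAN" no longer refuses a tagged vid 10.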
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220927122042.1100231-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 28 Sep, 2022 8 commits
    • Merge tag 'irq_urgent_for_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c3e0e1e2
      Linus Torvalds authored
      Pull more irqchip fixes from Borislav Petkov:
       "More irqchip fixes for 6.0 from Marc Zyngier. Stuff got left hanging
        due to the whole Plumbers and vacations commotion.
      
         - A couple of configuration fixes for the recently merged LoongArch
           drivers
      
         - A fix to avoid dynamic allocation of a cpumask which was causing
           issues with PREEMPT_RT and the GICv3 ITS
      
         - A tightening of an error check in the stm32 exti driver"
      
      * tag 'irq_urgent_for_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/loongson-pch-lpc: Add dependence on LoongArch
        irqchip: Select downstream irqchip drivers for LoongArch CPU
        irqchip/gic-v3-its: Remove cpumask_var_t allocation
        irqchip/stm32-exti: Remove check on always false condition
    • Merge tag 'mmc-v6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · e817c070
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "A couple of MMC fixes. This time there is also a fix for the ARM SCMI
        firmware driver, which has been acked by Sudeep Holla, the maintainer.
      
        MMC core:
         - Terminate infinite loop in SD-UHS voltage switch
      
        MMC host:
         - hsq: Fix kernel crash in the recovery path
         - moxart: Fix bus width configurations
         - sdhci: Fix kernel panic for cqe irq
      
        ARM_SCMI:
         - Fixup clock management by reverting 'firmware: arm_scmi: Add clock
           management to the SCMI power domain'"
      
      * tag 'mmc-v6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: hsq: Fix data stomping during mmc recovery
        Revert "firmware: arm_scmi: Add clock management to the SCMI power domain"
        mmc: core: Terminate infinite loop in SD-UHS voltage switch
        mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
        mmc: sdhci: Fix host->cmd is null
    • Merge tag 'linux-can-fixes-for-6.0-20220928' of... · af2faee5
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.0-20220928' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2022-09-28
      
      The patch is by me and targets the c_can driver. It disables an
      optimization in the TX path of C_CAN cores which causes problems.
      
      * tag 'linux-can-fixes-for-6.0-20220928' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: c_can: don't cache TX messages for C_CAN cores
      ====================
      
      Link: https://lore.kernel.org/r/20220928090629.1124190-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "net: set proper memcg for net_init hooks allocations" · b1cab78b
      Shakeel Butt authored
      This reverts commit 1d0403d2.
      
      Anatoly Pugachev reported that commit 1d0403d2 ("net: set proper
      memcg for net_init hooks allocations") is somehow causing sparc64
      VMs to fail to boot, and that the VMs boot fine with that patch reverted. So,
      revert the patch for now; the issue can be debugged later.
      
      Link: https://lore.kernel.org/all/20220918092849.GA10314@u164.east.ru/
      Reported-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Cc: Vasily Averin <vvs@openvz.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: cgroups@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-kernel@vger.kernel.org
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Fixes: 1d0403d2 ("net: set proper memcg for net_init hooks allocations")
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ata: libata-sata: Fix device queue depth control · 141f3d62
      Damien Le Moal authored
      The function __ata_change_queue_depth() uses the helper
      ata_scsi_find_dev() to get the ata_device structure of a scsi device and
      set that device's maximum queue depth. However, when the ata device is
      managed by libsas, ata_scsi_find_dev() returns NULL, turning
      __ata_change_queue_depth() into a no-op, which prevents the user from
      setting the maximum queue depth of ATA devices used with libsas-based
      HBAs.
      
      Fix this by renaming __ata_change_queue_depth() to
      ata_change_queue_depth() and adding a pointer to the ata_device
      structure of the target device as argument. This pointer is provided by
      ata_scsi_change_queue_depth() using ata_scsi_find_dev() in the case of
      a libata managed device and by sas_change_queue_depth() using
      sas_to_ata_dev() in the case of a libsas managed ata device.
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Tested-by: John Garry <john.garry@huawei.com>
      141f3d62
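The shape of this change can be sketched in plain user-space C. This is a simplified model under stated assumptions, not the libata code: every `model_*` structure and function below is a hypothetical stand-in, and only the function names quoted in the commit message come from the source. It contrasts a helper that looks the device up itself (and silently does nothing when the lookup fails) with one that receives the device from its caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_ata_device {
    int max_queue_depth;
};

struct model_scsi_device {
    struct model_ata_device *libata_dev; /* non-NULL only if libata manages the device */
    struct model_ata_device *libsas_dev; /* non-NULL only if libsas manages the device */
    int queue_depth;
};

/* Models ata_scsi_find_dev(): it finds nothing for libsas-managed devices. */
struct model_ata_device *model_find_dev(struct model_scsi_device *sdev)
{
    assert(sdev != NULL);
    return sdev->libata_dev;
}

/* New shape: the caller supplies the device, so both paths can share it. */
int model_change_queue_depth(struct model_ata_device *dev,
                             struct model_scsi_device *sdev, int depth)
{
    if (!dev || depth < 1 || depth == sdev->queue_depth)
        return sdev->queue_depth;      /* nothing to change */
    if (depth > dev->max_queue_depth)
        depth = dev->max_queue_depth;  /* clamp to what the device allows */
    sdev->queue_depth = depth;
    return depth;
}

/* Old shape: the lookup is buried in the helper, so a libsas-managed
 * device yields NULL and the request is silently ignored. */
int model_old_change_queue_depth(struct model_scsi_device *sdev, int depth)
{
    return model_change_queue_depth(model_find_dev(sdev), sdev, depth);
}

/* libata entry point: resolves the device with the lookup helper. */
int model_libata_change_queue_depth(struct model_scsi_device *sdev, int depth)
{
    return model_change_queue_depth(model_find_dev(sdev), sdev, depth);
}

/* libsas entry point: resolves the device its own way (the commit message
 * names sas_to_ata_dev() for this; here it is just a stored pointer). */
int model_libsas_change_queue_depth(struct model_scsi_device *sdev, int depth)
{
    return model_change_queue_depth(sdev->libsas_dev, sdev, depth);
}
```

In this model, a request to set depth 8 on a libsas-managed device leaves the depth unchanged through the old helper but takes effect through the libsas entry point, which is the behavior difference the commit message describes.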
    • Damien Le Moal's avatar
      ata: libata-scsi: Fix initialization of device queue depth · 6a8438de
      Damien Le Moal authored
      For SATA devices supporting NCQ, drivers using libsas first initialize a
      scsi device queue depth based on the controller and device capabilities,
      leading to the scsi device queue_depth field being 32 (ATA maximum queue
      depth) for most setups. However, if libata was loaded using the
      force=[ID:]noncq argument, the default queue depth should be set to 1 to
      reflect the fact that queuable commands will never be used. This is
      consistent with manually setting a device queue depth to 1 through sysfs
      as that disables NCQ use for the device.
      
      Fix ata_scsi_dev_config() to honor the noncq parameter by setting the
      device queue depth to 1 for devices that do not have the ATA_DFLAG_NCQ
      flag set.
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Tested-by: John Garry <john.garry@huawei.com>
      6a8438de
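The rule described in this commit message can be illustrated with a small user-space C function. It is a sketch under assumptions rather than the body of ata_scsi_dev_config(): `MODEL_DFLAG_NCQ`, `MODEL_MAX_QUEUE` and `model_initial_queue_depth()` are hypothetical names, and the "smaller of controller and device capability, capped at 32" step is a simplification of the "based on the controller and device capabilities" wording above.

```c
#include <assert.h>

#define MODEL_DFLAG_NCQ (1u << 0) /* stand-in for ATA_DFLAG_NCQ */
#define MODEL_MAX_QUEUE 32        /* ATA maximum queue depth, per the text */

/* Pick the initial queue depth for a device.
 * dev_flags:       device flags; NCQ is usable only if MODEL_DFLAG_NCQ is set
 * host_can_queue:  how many commands the controller can queue (assumed >= 1)
 * dev_queue_depth: how many commands the device reports it can queue (>= 1)
 */
int model_initial_queue_depth(unsigned int dev_flags, int host_can_queue,
                              int dev_queue_depth)
{
    int depth;

    assert(host_can_queue >= 1 && dev_queue_depth >= 1);

    /* Without NCQ, queued commands are never used, so the depth is 1. */
    if (!(dev_flags & MODEL_DFLAG_NCQ))
        return 1;

    /* With NCQ, take the smaller capability and cap it at the ATA maximum. */
    depth = host_can_queue < dev_queue_depth ? host_can_queue : dev_queue_depth;
    if (depth > MODEL_MAX_QUEUE)
        depth = MODEL_MAX_QUEUE;
    return depth;
}
```

Calling `model_initial_queue_depth(0, 32, 32)` models a device whose NCQ flag was cleared (for example by the noncq force parameter) and yields 1, while the same call with `MODEL_DFLAG_NCQ` set yields 32.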
    • Marc Kleine-Budde's avatar
      can: c_can: don't cache TX messages for C_CAN cores · 81d192c2
      Marc Kleine-Budde authored
      As Jacob noticed, the optimization introduced in 387da6bc ("can:
      c_can: cache frames to operate as a true FIFO") works properly on
      D_CAN IP cores but not on C_CAN. The exact reasons are still unknown.
      
      For now, disable caching of CAN frames in the TX path for C_CAN cores.
      
      Fixes: 387da6bc ("can: c_can: cache frames to operate as a true FIFO")
      Link: https://lore.kernel.org/all/20220928083354.1062321-1-mkl@pengutronix.de
      Link: https://lore.kernel.org/all/15a8084b-9617-2da1-6704-d7e39d60643b@gmail.com
      Reported-by: Jacob Kroon <jacob.kroon@gmail.com>
      Tested-by: Jacob Kroon <jacob.kroon@gmail.com>
      Cc: stable@vger.kernel.org # v5.15
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      81d192c2
    • Liu Jian's avatar
      xfrm: Reinject transport-mode packets through workqueue · 4f492066
      Liu Jian authored
      The following warning is displayed when the tcp6-multi-diffip11 stress
      test case of the LTP test suite is run:
      
      watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ns-tcpserver:48198]
      CPU: 0 PID: 48198 Comm: ns-tcpserver Kdump: loaded Not tainted 6.0.0-rc6+ #39
      Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : des3_ede_encrypt+0x27c/0x460 [libdes]
      lr : 0x3f
      sp : ffff80000ceaa1b0
      x29: ffff80000ceaa1b0 x28: ffff0000df056100 x27: ffff0000e51e5280
      x26: ffff80004df75030 x25: ffff0000e51e4600 x24: 000000000000003b
      x23: 0000000000802080 x22: 000000000000003d x21: 0000000000000038
      x20: 0000000080000020 x19: 000000000000000a x18: 0000000000000033
      x17: ffff0000e51e4780 x16: ffff80004e2d1448 x15: ffff80004e2d1248
      x14: ffff0000e51e4680 x13: ffff80004e2d1348 x12: ffff80004e2d1548
      x11: ffff80004e2d1848 x10: ffff80004e2d1648 x9 : ffff80004e2d1748
      x8 : ffff80004e2d1948 x7 : 000000000bcaf83d x6 : 000000000000001b
      x5 : ffff80004e2d1048 x4 : 00000000761bf3bf x3 : 000000007f1dd0a3
      x2 : ffff0000e51e4780 x1 : ffff0000e3b9a2f8 x0 : 00000000db44e872
      Call trace:
       des3_ede_encrypt+0x27c/0x460 [libdes]
       crypto_des3_ede_encrypt+0x1c/0x30 [des_generic]
       crypto_cbc_encrypt+0x148/0x190
       crypto_skcipher_encrypt+0x2c/0x40
       crypto_authenc_encrypt+0xc8/0xfc [authenc]
       crypto_aead_encrypt+0x2c/0x40
       echainiv_encrypt+0x144/0x1a0 [echainiv]
       crypto_aead_encrypt+0x2c/0x40
       esp6_output_tail+0x1c8/0x5d0 [esp6]
       esp6_output+0x120/0x278 [esp6]
       xfrm_output_one+0x458/0x4ec
       xfrm_output_resume+0x6c/0x1f0
       xfrm_output+0xac/0x4ac
       __xfrm6_output+0x130/0x270
       xfrm6_output+0x60/0xec
       ip6_xmit+0x2ec/0x5bc
       inet6_csk_xmit+0xbc/0x10c
       __tcp_transmit_skb+0x460/0x8c0
       tcp_write_xmit+0x348/0x890
       __tcp_push_pending_frames+0x44/0x110
       tcp_rcv_established+0x3c8/0x720
       tcp_v6_do_rcv+0xdc/0x4a0
       tcp_v6_rcv+0xc24/0xcb0
       ip6_protocol_deliver_rcu+0xf0/0x574
       ip6_input_finish+0x48/0x7c
       ip6_input+0x48/0xc0
       ip6_rcv_finish+0x80/0x9c
       xfrm_trans_reinject+0xb0/0xf4
       tasklet_action_common.constprop.0+0xf8/0x134
       tasklet_action+0x30/0x3c
       __do_softirq+0x128/0x368
       do_softirq+0xb4/0xc0
       __local_bh_enable_ip+0xb0/0xb4
       put_cpu_fpsimd_context+0x40/0x70
       kernel_neon_end+0x20/0x40
       sha1_base_do_update.constprop.0.isra.0+0x11c/0x140 [sha1_ce]
       sha1_ce_finup+0x94/0x110 [sha1_ce]
       crypto_shash_finup+0x34/0xc0
       hmac_finup+0x48/0xe0
       crypto_shash_finup+0x34/0xc0
       shash_digest_unaligned+0x74/0x90
       crypto_shash_digest+0x4c/0x9c
       shash_ahash_digest+0xc8/0xf0
       shash_async_digest+0x28/0x34
       crypto_ahash_digest+0x48/0xcc
       crypto_authenc_genicv+0x88/0xcc [authenc]
       crypto_authenc_encrypt+0xd8/0xfc [authenc]
       crypto_aead_encrypt+0x2c/0x40
       echainiv_encrypt+0x144/0x1a0 [echainiv]
       crypto_aead_encrypt+0x2c/0x40
       esp6_output_tail+0x1c8/0x5d0 [esp6]
       esp6_output+0x120/0x278 [esp6]
       xfrm_output_one+0x458/0x4ec
       xfrm_output_resume+0x6c/0x1f0
       xfrm_output+0xac/0x4ac
       __xfrm6_output+0x130/0x270
       xfrm6_output+0x60/0xec
       ip6_xmit+0x2ec/0x5bc
       inet6_csk_xmit+0xbc/0x10c
       __tcp_transmit_skb+0x460/0x8c0
       tcp_write_xmit+0x348/0x890
       __tcp_push_pending_frames+0x44/0x110
       tcp_push+0xb4/0x14c
       tcp_sendmsg_locked+0x71c/0xb64
       tcp_sendmsg+0x40/0x6c
       inet6_sendmsg+0x4c/0x80
       sock_sendmsg+0x5c/0x6c
       __sys_sendto+0x128/0x15c
       __arm64_sys_sendto+0x30/0x40
       invoke_syscall+0x50/0x120
       el0_svc_common.constprop.0+0x170/0x194
       do_el0_svc+0x38/0x4c
       el0_svc+0x28/0xe0
       el0t_64_sync_handler+0xbc/0x13c
       el0t_64_sync+0x180/0x184
      
      Get softirq info by bcc tool:
      ./softirqs -NT 10
      Tracing soft irq event time... Hit Ctrl-C to end.
      
      15:34:34
      SOFTIRQ          TOTAL_nsecs
      block                 158990
      timer               20030920
      sched               46577080
      net_rx             676746820
      tasklet           9906067650
      
      15:34:45
      SOFTIRQ          TOTAL_nsecs
      block                  86100
      sched               38849790
      net_rx             676532470
      timer             1163848790
      tasklet           9409019620
      
      15:34:55
      SOFTIRQ          TOTAL_nsecs
      sched               58078450
      net_rx             475156720
      timer              533832410
      tasklet           9431333300
      
      The tasklet software interrupt takes too much time. Therefore, the
      xfrm_trans_reinject executor is changed from a tasklet to a workqueue.
      Also add a spin lock to protect the queue. This shortens the processing
      path of the tcp_sendmsg function in this scenario.
      
      Fixes: acf568ee ("xfrm: Reinject transport-mode packets through tasklet")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      4f492066
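The pattern this commit message describes, a queue guarded by a spin lock and drained by a deferred worker, can be sketched in user-space C. The sketch is an assumption-laden model, not the xfrm code: all `model_*` names are hypothetical, a C11 `atomic_flag` stands in for the kernel spin lock, and the drain function is called directly where the kernel would run it from a workqueue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical packet and per-CPU queue; not the kernel structures. */
struct model_pkt {
    struct model_pkt *next;
    int id;
    int done; /* set to 1 once the packet has been reinjected */
};

struct model_trans {
    atomic_flag lock;              /* stand-in for the spin lock */
    struct model_pkt *head, *tail; /* pending packets, FIFO order */
};

static void model_lock(struct model_trans *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void model_unlock(struct model_trans *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

void model_trans_init(struct model_trans *t)
{
    atomic_flag_clear(&t->lock);
    t->head = t->tail = NULL;
}

/* Producer side: append under the lock. The kernel would schedule the
 * work item at this point; this model leaves that to the caller. */
void model_trans_queue(struct model_trans *t, struct model_pkt *p)
{
    assert(p != NULL);
    p->next = NULL;
    model_lock(t);
    if (t->tail)
        t->tail->next = p;
    else
        t->head = p;
    t->tail = p;
    model_unlock(t);
}

/* Example completion callback: just mark the packet as handled. */
void model_mark_done(struct model_pkt *p)
{
    p->done = 1;
}

/* Worker side: detach the whole list while holding the lock, then process
 * it with the lock released so producers are never blocked for long.
 * Returns the number of packets processed. */
int model_trans_reinject(struct model_trans *t,
                         void (*finish)(struct model_pkt *))
{
    struct model_pkt *list, *next;
    int n = 0;

    model_lock(t);
    list = t->head;
    t->head = t->tail = NULL;
    model_unlock(t);

    for (; list; list = next) {
        next = list->next; /* read before the callback may reuse the packet */
        finish(list);
        n++;
    }
    return n;
}
```

Detaching the list under the lock and processing it afterwards keeps the critical section short. Because the drain runs in a worker rather than in softirq context, a long crypto operation in the reinjected path no longer extends the softirq time of whichever task happened to re-enable bottom halves, which is the situation the call trace above shows.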