1. 29 Sep, 2022 5 commits
    • mptcp: fix unreleased socket in accept queue · 30e51b92
      Menglong Dong authored
      An mptcp socket and its subflow sockets left in the accept queue can't
      be released after the process exits.
      
      When an mptcp socket in listening state is released, the corresponding
      tcp socket is released too, along with the tcp sockets in its unaccepted
      queue. However, only the initial subflow is in the unaccepted queue; the
      joined subflows are not, so they are never released, and therefore the
      corresponding unaccepted mptcp socket is not released either.
      
      This can be reproduced easily with the following steps:
      
      1. create 2 namespaces and a veth pair:
         $ ip netns add mptcp-client
         $ ip netns add mptcp-server
         $ sysctl -w net.ipv4.conf.all.rp_filter=0
         $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
         $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
         $ ip link add red-client netns mptcp-client type veth peer red-server \
           netns mptcp-server
         $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
         $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
         $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
         $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
         $ ip -n mptcp-server link set red-server up
         $ ip -n mptcp-client link set red-client up
      
      2. configure the endpoint and limits for the client and server:
         $ ip -n mptcp-server mptcp endpoint flush
         $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
         $ ip -n mptcp-client mptcp endpoint flush
         $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
         $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
           1 subflow
      
      3. listen and accept on a port, such as 9999. The nc command used here
         is modified to use the mptcp protocol by default.
         $ ip netns exec mptcp-server nc -l -k -p 9999
      
      4. open another *two* terminals and use each of them to connect to the
         server with the following command:
         $ ip netns exec mptcp-client nc 10.0.0.1 9999
         Input something after connecting to trigger the connection of the
         second subflow, so that there are two established mptcp connections,
         with the second one still unaccepted.
      
      5. exit all the nc commands and check the tcp sockets in the server
         namespace. You will find one tcp socket stuck in CLOSE_WAIT state
         that is never released.
      
      Fix this by closing all of the unaccepted mptcp sockets in
      mptcp_subflow_queue_clean() with __mptcp_close().
      
      Now, we can ensure that all unaccepted mptcp sockets will be cleaned by
      __mptcp_close() before they are released, so mptcp_sock_destruct(), which
      is used to clean the unaccepted mptcp socket, is not needed anymore.
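
The leak and the fix can be sketched with a small user-space model. This is only an illustration under stated assumptions: every `toy_*` name is hypothetical, and the kernel's accept queue, reference counting and locking are not modelled. It shows why releasing only the queued initial subflow leaves a joined subflow behind, while closing the whole unaccepted connection does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: all names here are hypothetical. */
#define MAX_SUBFLOWS 4

struct toy_subflow {
    bool released;
};

struct toy_msk {
    struct toy_subflow subflows[MAX_SUBFLOWS]; /* [0] is the initial subflow */
    size_t nr_subflows;
    bool released;
};

/* Old behaviour: only the initial subflow sits in the listener's accept
 * queue, so only that one is released when the listener goes away. */
static void toy_clean_queue_old(struct toy_msk *queue[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        queue[i]->subflows[0].released = true;
}

/* Fixed behaviour: close the whole unaccepted connection, which releases
 * every subflow (initial and joined) and then the connection itself. */
static void toy_clean_queue_fixed(struct toy_msk *queue[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_msk *msk = queue[i];

        for (size_t j = 0; j < msk->nr_subflows; j++)
            msk->subflows[j].released = true;
        msk->released = true;
    }
}

/* Count subflows that are still alive after cleanup. */
static size_t toy_leaked_subflows(const struct toy_msk *msk)
{
    size_t leaked = 0;

    for (size_t j = 0; j < msk->nr_subflows; j++)
        if (!msk->subflows[j].released)
            leaked++;
    return leaked;
}
```

With one unaccepted connection that has an initial and a joined subflow, the old cleanup leaves one subflow alive, which corresponds to the lingering CLOSE_WAIT socket in step 5; the fixed cleanup leaves none.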
      
      The mptcp selftests were run for this commit, with no new failures.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Fixes: 6aeed904 ("mptcp: fix race on unaccepted mptcp sockets")
      Cc: stable@vger.kernel.org
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Mengen Sun <mengensun@tencent.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: factor out __mptcp_close() without socket lock · 26d3e21c
      Menglong Dong authored
      Factor out __mptcp_close() from mptcp_close(). The caller of
      __mptcp_close() should hold the socket lock, and cancel mptcp work when
      __mptcp_close() returns true.
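
The calling convention described here (a helper that expects the lock to be held and reports whether work must be cancelled) can be sketched in user space as follows. All `toy_*` names are hypothetical, the lock is a plain flag rather than a real socket lock, and cancelling the work after dropping the lock is an assumption of this sketch, not something the message above states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a socket, its lock, and its deferred work. */
struct toy_sock {
    bool locked;
    bool closed;
    bool work_pending;
    bool work_cancelled;
};

static void toy_lock(struct toy_sock *sk)
{
    assert(!sk->locked);
    sk->locked = true;
}

static void toy_unlock(struct toy_sock *sk)
{
    assert(sk->locked);
    sk->locked = false;
}

/* Inner helper: the caller must already hold the lock. Returns true when
 * the caller should cancel the pending work. */
static bool toy_close_locked(struct toy_sock *sk)
{
    assert(sk->locked);
    sk->closed = true;
    return sk->work_pending;
}

/* Outer wrapper: takes the lock, calls the helper, drops the lock, and
 * then acts on the helper's return value. */
static void toy_close(struct toy_sock *sk)
{
    bool do_cancel_work;

    toy_lock(sk);
    do_cancel_work = toy_close_locked(sk);
    toy_unlock(sk);

    if (do_cancel_work) {
        sk->work_pending = false;
        sk->work_cancelled = true;
    }
}
```

Splitting the function this way lets a second caller that already holds the lock reuse the inner helper without taking the lock twice.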
      
      This function will be used in the next commit.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Fixes: 6aeed904 ("mptcp: fix race on unaccepted mptcp sockets")
      Cc: stable@vger.kernel.org
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Mengen Sun <mengensun@tencent.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 3e1308a7
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      ice: xsk: ZC changes
      
      Maciej Fijalkowski says:
      
      This set consists of two fixes to issues that were either pointed out
      indirectly on the mailing list (John was reviewing AF_XDP selftests that
      were testing ice's ZC support) or were directly reported by customers.
      
      The first patch allows user space to see a done descriptor in the CQ even
      after a single frame has been transmitted, and the second patch removes
      the need for HW rings to be sized to a power-of-2 number of descriptors
      when used with AF_XDP.
      
      I also forgot to mention that due to the current Tx cleaning algorithm,
      4k HW ring was broken and these two patches bring it back to life, so we
      kill two birds with one stone.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: xsk: drop power of 2 ring size restriction for AF_XDP
        ice: xsk: change batched Tx descriptor cleaning
      ====================
      
      Link: https://lore.kernel.org/r/20220927164112.4011983-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2} · c9da02bf
      Daniel Golle authored
      The bitmasks applied in RX_DMA_GET_SPORT and RX_DMA_GET_SPORT_V2 macros
      were swapped. Fix that.
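
To see why swapped masks matter, consider two field accessors of different widths. The shifts and widths below are assumptions chosen for illustration and are not taken from the driver; all `TOY_*` and `toy_*` names are hypothetical. A mask that is too wide picks up a neighbouring bit, and one that is too narrow truncates the value.

```c
#include <stdint.h>

/* Illustrative layouts only: the shifts and widths below are assumptions
 * chosen for the example, not values taken from the driver. */
#define TOY_GET_SPORT(x)    (((x) >> 19) & 0x7u)  /* 3-bit field */
#define TOY_GET_SPORT_V2(x) (((x) >> 26) & 0xfu)  /* 4-bit field */

/* The same accessors with the two masks swapped. */
#define TOY_GET_SPORT_SWAPPED(x)    (((x) >> 19) & 0xfu)
#define TOY_GET_SPORT_V2_SWAPPED(x) (((x) >> 26) & 0x7u)

/* Build a word with a 3-bit port at bit 19 and an unrelated flag at bit 22. */
static uint32_t toy_make_v1(uint32_t port, uint32_t neighbour_bit)
{
    return ((port & 0x7u) << 19) | ((neighbour_bit & 0x1u) << 22);
}

/* Build a word with a 4-bit port at bit 26. */
static uint32_t toy_make_v2(uint32_t port)
{
    return (port & 0xfu) << 26;
}
```

For a word built with `toy_make_v1(5, 1)`, the correct accessor returns 5 while the swapped one returns 13, because the flag at bit 22 leaks into the result. For `toy_make_v2(9)`, the correct accessor returns 9 while the swapped one returns 1, because the top bit of the field is dropped.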
      Reported-by: Chen Minqiang <ptpt52@gmail.com>
      Fixes: 160d3a9b ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support")
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Link: https://lore.kernel.org/r/YzMW+mg9UsaCdKRQ@makrotopia.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge · 276d37eb
      Vladimir Oltean authored
      Currently the following set of commands fails:
      
      $ ip link add br0 type bridge # vlan_filtering 0
      $ ip link set swp0 master br0
      $ bridge vlan
      port              vlan-id
      swp0              1 PVID Egress Untagged
      $ bridge vlan add dev swp0 vid 10
      Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.
      
      Dumping ocelot->vlans, one can see that the 2 egress-untagged VLANs on swp0 are
      vid 1 (the bridge PVID) and vid 4094, a PVID used privately by the driver for
      VLAN-unaware bridging. So this is why bridge vid 10 is refused, despite
      'bridge vlan' showing a single egress untagged VLAN.
      
      As mentioned in the comment added, having this private VLAN does not impose
      restrictions to the hardware configuration, yet it is a bookkeeping problem.
      
      There are 2 possible solutions.
      
      One is to make the functions that operate on VLAN-unaware pvids:
      - ocelot_add_vlan_unaware_pvid()
      - ocelot_del_vlan_unaware_pvid()
      - ocelot_port_setup_dsa_8021q_cpu()
      - ocelot_port_teardown_dsa_8021q_cpu()
      call something different than ocelot_vlan_member_(add|del)(), the latter being
      the real problem, because it allocates a struct ocelot_bridge_vlan *vlan which
      it adds to ocelot->vlans. We don't really *need* the private VLANs in
      ocelot->vlans, it's just that we have the extra convenience of having the
      vlan->portmask cached in software (whereas without these structures, we'd have
      to create a raw ocelot_vlant_rmw_mask() procedure which reads back the current
      port mask from hardware).
      
      The other solution is to filter out the private VLANs from
      ocelot_port_num_untagged_vlans(), since they aren't what callers care about.
      We only need to do this to the mentioned function and not to
      ocelot_port_num_tagged_vlans(), because private VLANs are never egress-tagged.
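
The second approach can be sketched as a counting function that skips driver-private VLANs. This is a user-space illustration, not the driver's code: the `toy_*` names, the structure layout and the cut-off `TOY_PRIVATE_VID_START` are assumptions (the text above only states that vid 4094 is private).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: VIDs at or above it are treated as driver-private.
 * The commit message only says vid 4094 is private; the cut-off is assumed. */
#define TOY_PRIVATE_VID_START 4000

struct toy_vlan {
    uint16_t vid;
    uint32_t portmask;  /* ports that are members of this VLAN */
    uint32_t untagged;  /* ports that egress this VLAN untagged */
};

static bool toy_is_private(const struct toy_vlan *v)
{
    return v->vid >= TOY_PRIVATE_VID_START;
}

/* Count the egress-untagged VLANs of a port, optionally ignoring the
 * private ones that user space never sees. */
static int toy_num_untagged(const struct toy_vlan *vlans, size_t n,
                            int port, bool skip_private)
{
    uint32_t bit = UINT32_C(1) << port;
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (skip_private && toy_is_private(&vlans[i]))
            continue;
        if ((vlans[i].portmask & bit) && (vlans[i].untagged & bit))
            count++;
    }
    return count;
}

/* The rule from the error message: a port with more than one
 * egress-untagged VLAN may not gain egress-tagged VLANs. */
static bool toy_may_add_tagged(const struct toy_vlan *vlans, size_t n,
                               int port, bool skip_private)
{
    return toy_num_untagged(vlans, n, port, skip_private) <= 1;
}
```

With vid 1 and vid 4094 both untagged on port 0, the unfiltered count is 2 and a tagged VLAN is refused; with the private VLAN skipped, the count is 1 and the request is allowed.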
      
      Nothing else seems to be broken in either solution, but the first one requires
      more rework which will conflict with the net-next change 36a0bf44 ("net:
      mscc: ocelot: set up tag_8021q CPU ports independent of user port affinity"),
      and I'd like to avoid that. So go with the other one.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220927122042.1100231-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 28 Sep, 2022 2 commits
  3. 27 Sep, 2022 14 commits
  4. 26 Sep, 2022 6 commits
  5. 24 Sep, 2022 1 commit
  6. 23 Sep, 2022 6 commits
  7. 22 Sep, 2022 6 commits
    • Merge tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 504c25cb
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wifi, netfilter and can.
      
        A handful of awaited fixes here - revert of the FEC changes, bluetooth
        fix, fixes for iwlwifi spew.
      
        We added a warning in PHY/MDIO code which is triggering on a couple of
        platforms in a false-positive-ish way. If we can't iron that out over
        the week we'll drop it and re-add for 6.1.
      
        I've added a new "follow up fixes" section for fixes to fixes in
        6.0-rcs but it may actually give the false impression that those are
        problematic or that more testing time would have caught them. So
        likely a one time thing.
      
        Follow up fixes:
      
         - nf_tables_addchain: fix nft_counters_enabled underflow
      
         - ebtables: fix memory leak when blob is malformed
      
         - nf_ct_ftp: fix deadlock when nat rewrite is needed
      
        Current release - regressions:
      
         - Revert "fec: Restart PPS after link state change" and the related
           "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
      
         - Bluetooth: fix HCIGETDEVINFO regression
      
         - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
      
         - mptcp: fix fwd memory accounting on coalesce
      
         - rwlock removal fall out:
            - ipmr: always call ip{,6}_mr_forward() from RCU read-side
              critical section
            - ipv6: fix crash when IPv6 is administratively disabled
      
         - tcp: read multiple skbs in tcp_read_skb()
      
         - mdio_bus_phy_resume state warning fallout:
            - eth: ravb: fix PHY state warning splat during system resume
            - eth: sh_eth: fix PHY state warning splat during system resume
      
        Current release - new code bugs:
      
         - wifi: iwlwifi: don't spam logs with NSS>2 messages
      
         - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC
      
        Previous releases - regressions:
      
         - bonding: fix NULL deref in bond_rr_gen_slave_id
      
         - wifi: iwlwifi: mark IWLMEI as broken
      
        Previous releases - always broken:
      
         - nf_conntrack helpers:
            - irc: tighten matching on DCC message
            - sip: fix ct_sip_walk_headers
            - osf: fix possible bogus match in nf_osf_find()
      
         - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header
      
         - core: fix flow symmetric hash
      
         - bonding, team: unsync device addresses on ndo_stop
      
         - phy: micrel: fix shared interrupt on LAN8814"
      
      * tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
        selftests: forwarding: add shebang for sch_red.sh
        bnxt: prevent skb UAF after handing over to PTP worker
        net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()
        net: sched: fix possible refcount leak in tc_new_tfilter()
        net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
        udp: Use WARN_ON_ONCE() in udp_read_skb()
        selftests: bonding: cause oops in bond_rr_gen_slave_id
        bonding: fix NULL deref in bond_rr_gen_slave_id
        net: phy: micrel: fix shared interrupt on LAN8814
        net/smc: Stop the CLC flow if no link to map buffers on
        ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
        net: atlantic: fix potential memory leak in aq_ndev_close()
        can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
        can: gs_usb: gs_can_open(): fix race dev->can.state condition
        can: flexcan: flexcan_mailbox_read() fix return value for drop = true
        net: sh_eth: Fix PHY state warning splat during system resume
        net: ravb: Fix PHY state warning splat during system resume
        netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
        netfilter: ebtables: fix memory leak when blob is malformed
        netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
        ...
    • Merge tag 'efi-urgent-for-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 129e7152
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Use the right variable to check for shim insecure mode
      
       - Wipe setup_data field when booting via EFI
      
       - Add missing error check to efibc driver
      
      * tag 'efi-urgent-for-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: libstub: check Shim mode using MokSBStateRT
        efi: x86: Wipe setup_data on pure EFI boot
        efi: efibc: Guard against allocation failure
    • Merge tag 'gpio-fixes-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 5e0a93e4
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix a NULL-pointer dereference at driver unbind and a potential
         resource leak in error path in gpio-mockup
      
       - make the irqchip immutable in gpio-ftgpio010
      
       - fix dereferencing a potentially uninitialized variable in gpio-tqmx86
      
       - fix interrupt registering in gpiolib's character device code
      
      * tag 'gpio-fixes-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpiolib: cdev: Set lineevent_state::irq after IRQ register successfully
        gpio: tqmx86: fix uninitialized variable girq
        gpio: ftgpio010: Make irqchip immutable
        gpio: mockup: Fix potential resource leakage when register a chip
        gpio: mockup: fix NULL pointer dereference when removing debugfs
    • Merge tag 'perf-tools-fixes-for-v6.0-2022-09-21' of... · 9597f088
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v6.0-2022-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix polling of system-wide events related to mixing per-cpu and
         per-thread events.
      
       - Do not check if /proc/modules is unchanged when copying /proc/kcore,
         that doesn't get in the way of post processing analysis.
      
       - Include program header in ELF files generated for JIT files, so that
         they can be opened by tools using elfutils libraries.
      
       - Enter namespaces when synthesizing build-ids.
      
       - Fix some bugs related to a recent cpu_map overhaul where we should be
         using an index and not the cpu number.
      
       - Fix BPF program ELF section name, using the naming expected by libbpf
         when using BPF counters in 'perf stat'.
      
       - Add a new test for perf stat cgroup BPF counter.
      
       - Adjust check on 'perf test wp' for older kernels, where the
         PERF_EVENT_IOC_MODIFY_ATTRIBUTES ioctl isn't supported.
      
       - Sync x86 cpufeatures with the kernel sources, no changes in tooling.
      
      * tag 'perf-tools-fixes-for-v6.0-2022-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf tools: Honor namespace when synthesizing build-ids
        tools headers cpufeatures: Sync with the kernel sources
        perf kcore_copy: Do not check /proc/modules is unchanged
        libperf evlist: Fix polling of system-wide events
        perf record: Fix cpu mask bit setting for mixed mmaps
        perf test: Skip wp modify test on old kernels
        perf jit: Include program header in ELF files
        perf test: Add a new test for perf stat cgroup BPF counter
        perf stat: Use evsel->core.cpus to iterate cpus in BPF cgroup counters
        perf stat: Fix cpu map index in bperf cgroup code
        perf stat: Fix BPF program section name
    • selftests: forwarding: add shebang for sch_red.sh · 83e4b196
      Hangbin Liu authored
      RHEL/Fedora RPM build checks are stricter, and complain when executable
      files don't have a shebang line, e.g.
      
      *** WARNING: ./kselftests/net/forwarding/sch_red.sh is executable but has no shebang, removing executable bit
      
      Fix it by adding a shebang line.
      
      Fixes: 6cf0291f ("selftests: forwarding: Add a RED test for SW datapath")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/20220922024453.437757-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt: prevent skb UAF after handing over to PTP worker · c31f26c8
      Jakub Kicinski authored
      When reading the timestamp is required, bnxt_tx_int() hands
      over the ownership of the completed skb to the PTP worker.
      The skb should not be used afterwards, as the worker may
      run before the rest of our code and free the skb, leading
      to a use-after-free.
      
      Since dev_kfree_skb_any() accepts NULL make the loss of
      ownership more obvious and set skb to NULL.
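
The ownership rule can be sketched in user space as follows. Every `toy_*` name is hypothetical, and `free()` stands in for a free routine that accepts NULL, as the message says dev_kfree_skb_any() does. The real driver's worker runs asynchronously, which this single-threaded sketch does not model.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer. */
struct toy_skb {
    int len;
};

/* Packet currently owned by the worker, or NULL. */
static struct toy_skb *toy_worker_slot;

/* Hand the packet to the worker. Returns true when ownership moved. */
static bool toy_hand_to_worker(struct toy_skb *skb)
{
    if (toy_worker_slot)
        return false; /* worker busy, caller keeps ownership */
    toy_worker_slot = skb;
    return true;
}

/* Completion path: after a successful hand-over the local pointer is
 * cleared, so the common free at the end becomes a harmless free(NULL). */
static bool toy_complete_tx(struct toy_skb *skb, bool needs_timestamp)
{
    bool handed_over = false;

    if (needs_timestamp && toy_hand_to_worker(skb)) {
        skb = NULL; /* we no longer own it; do not touch it again */
        handed_over = true;
    }

    free(skb);
    return handed_over;
}

/* The worker frees the packet when it is done with it. */
static void toy_worker_run(void)
{
    free(toy_worker_slot);
    toy_worker_slot = NULL;
}
```

Clearing the pointer at the point of hand-over means any later use in the same function fails visibly on a NULL pointer instead of silently touching memory that the worker may already have freed.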
      
      Fixes: 83bb623c ("bnxt_en: Transmit and retrieve packet timestamps")
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20220921201005.335390-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>