1. 22 Nov, 2022 2 commits
    • xfrm: Fix ignored return value in xfrm6_init() · 40781bfb
      Chen Zhongjin authored
      When the IPv6 module is initialized in xfrm6_init(), register_pernet_subsys()
      can fail, but its return value is ignored.
      
      If IPv6 initialization fails later and xfrm6_fini() is called,
      removing the uninitialized list in xfrm6_net_ops causes a null-ptr-deref:
      
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 1 PID: 330 Comm: insmod
      RIP: 0010:unregister_pernet_operations+0xc9/0x450
      Call Trace:
       <TASK>
       unregister_pernet_subsys+0x31/0x3e
       xfrm6_fini+0x16/0x30 [ipv6]
       ip6_route_init+0xcd/0x128 [ipv6]
       inet6_init+0x29c/0x602 [ipv6]
       ...
      
      Fix it by catching the error return value of register_pernet_subsys().
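
      A minimal userspace model of the fixed error handling, for illustration only.
      register_subsys(), unregister_subsys(), model_xfrm6_init() and model_inet6_init()
      are hypothetical stand-ins for register_pernet_subsys(), unregister_pernet_subsys(),
      xfrm6_init() and inet6_init(). The point is that the error reaches the caller,
      which then has nothing to unregister.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state: whether the pernet ops list was ever set up. */
static bool subsys_registered;
static bool fail_registration; /* set to simulate a registration error */

static int register_subsys(void)
{
	if (fail_registration)
		return -ENOMEM;
	subsys_registered = true;
	return 0;
}

static void unregister_subsys(void)
{
	/* In the kernel, running this on a never-registered list is what
	 * dereferenced NULL; the model only clears the flag. */
	subsys_registered = false;
}

/* Fixed shape: the registration result is propagated, not dropped. */
static int model_xfrm6_init(void)
{
	int ret = register_subsys();

	if (ret)
		return ret; /* previously this value was ignored */
	/* ... further init steps would follow here ... */
	return 0;
}

/* Caller: tear down only what was successfully set up. */
static int model_inet6_init(bool later_step_fails)
{
	int ret = model_xfrm6_init();

	if (ret)
		return ret; /* nothing to undo */
	if (later_step_fails) {
		unregister_subsys(); /* safe: registration succeeded */
		return -EIO;
	}
	return 0;
}
```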
      
      Fixes: 8d068875 ("xfrm: make gc_thresh configurable in all namespaces")
      Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: Fix oops in __xfrm_state_delete() · b97df039
      Thomas Jarosch authored
      Kernel 5.14 added a new "byseq" index to speed
      up xfrm_state lookups by sequence number in commit
      fe9f1d87 ("xfrm: add state hashtable keyed by seq").
      
      While that patch was thorough, the function pfkey_send_new_mapping()
      in net/af_key.c also modifies x->km.seq but never adds
      the current xfrm_state to the "byseq" index.
      
      This leads to the following kernel Oops:
          BUG: kernel NULL pointer dereference, address: 0000000000000000
          ..
          RIP: 0010:__xfrm_state_delete+0xc9/0x1c0
          ..
          Call Trace:
          <TASK>
          xfrm_state_delete+0x1e/0x40
          xfrm_del_sa+0xb0/0x110 [xfrm_user]
          xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user]
          ? remove_entity_load_avg+0x8a/0xa0
          ? copy_to_user_state_extra+0x580/0x580 [xfrm_user]
          netlink_rcv_skb+0x51/0x100
          xfrm_netlink_rcv+0x30/0x50 [xfrm_user]
          netlink_unicast+0x1a6/0x270
          netlink_sendmsg+0x22a/0x480
          __sys_sendto+0x1a6/0x1c0
          ? __audit_syscall_entry+0xd8/0x130
          ? __audit_syscall_exit+0x249/0x2b0
          __x64_sys_sendto+0x23/0x30
          do_syscall_64+0x3a/0x90
          entry_SYSCALL_64_after_hwframe+0x61/0xcb
      
      Exact location of the crash in __xfrm_state_delete():
          if (x->km.seq)
              hlist_del_rcu(&x->byseq);
      
      The hlist_node "byseq" was never populated.
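
      The failure mode can be reproduced with a simplified userspace copy of the
      kernel's hlist helpers (poisoning and RCU omitted): deleting a node that was
      never added writes through a NULL pprev. struct model_state and
      seq_invariant_holds() are hypothetical names that spell out the assumption
      the delete path makes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified copy of the kernel's hlist layout. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static bool hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

/* Writes through n->pprev: for a node that was never added, pprev is
 * NULL, consistent with the address 0 dereference in the oops. */
static void hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

/* Hypothetical stand-in for struct xfrm_state. */
struct model_state {
	unsigned int seq;        /* plays the role of x->km.seq */
	struct hlist_node byseq; /* plays the role of x->byseq */
};

/* The delete path assumes "seq != 0 implies byseq is hashed". Changing
 * seq without hashing the node breaks that assumption. */
static bool seq_invariant_holds(const struct model_state *x)
{
	return x->seq == 0 || !hlist_unhashed(&x->byseq);
}
```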
      
      The bug only triggers if a new NAT traversal mapping (changed IP or port)
      is detected in esp_input_done2() / esp6_input_done2(), which in turn
      indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled
      with CONFIG_NET_KEY and "af_key" is active.
      
      The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367.
      Various implementations have been examined for how they handle
      the "sadb_msg_seq" header field:
      
      - racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING
      - strongswan: does not care about sadb_msg_seq
      - openswan: does not care about sadb_msg_seq
      
      There is no standard for how PF_KEYv2 sadb_msg_seq should be populated
      for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular
      implementations either. Herbert Xu suggested we should just
      use the current km.seq value as is. This fixes the root cause
      of the oops since we no longer modify km.seq itself.
      
      The update of "km.seq" looks like a copy'n'paste error
      from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq
      number according to RFC 2367. It has been verified that code paths
      involving pfkey_send_acquire() don't cause the same Oops.
      
      PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here:
          https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
          commit cbc34886
          Author:     Derek Atkins <derek@ihtfp.com>
          AuthorDate: Wed Apr 2 13:21:02 2003 -0800
      
              [IPSEC]: Implement UDP Encapsulation framework.
      
              In particular, implement ESPinUDP encapsulation for IPsec
              Nat Traversal.
      
      A note on triggering the bug: I was not able to trigger it using VMs.
      However, one VPN using a high-latency link on our production VPN server
      triggered it roughly once a day.
      
      Link: https://github.com/strongswan/strongswan/issues/992
      Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/
      Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/
      
      Fixes: fe9f1d87 ("xfrm: add state hashtable keyed by seq")
      Reported-by: Roth Mark <rothm@mail.com>
      Reported-by: Zhihao Chen <chenzhihao@meizu.com>
      Tested-by: Roth Mark <rothm@mail.com>
      Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
      Acked-by: Antony Antony <antony.antony@secunet.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  2. 27 Oct, 2022 1 commit
  3. 19 Oct, 2022 1 commit
  4. 12 Oct, 2022 2 commits
    • xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available · d83f7040
      Eyal Birger authored
      Ido reported that a kernel warning [1] can be triggered from user
      space by adding an xfrm encap type route when the kernel is compiled
      with CONFIG_MODULES=y and CONFIG_XFRM=n, e.g.:
      
      $ ip route add 198.51.100.0/24 dev dummy1 encap xfrm if_id 1
      Error: lwt encapsulation type not supported.
      
      The reason for the warning is that the LWT infrastructure has an
      autoloading feature which is meant only for encap types that don't
      use a net device, which is not the case for xfrm encap.
      
      Mute this warning for xfrm encap as there's no encap module to autoload
      in this case.
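
      A hypothetical sketch of the idea: the lookup that maps an encap type to an
      autoload module name treats xfrm as "nothing to load" rather than as an
      unexpected value. The enum, model_encap_str() and the warnings counter are
      illustrative names, not the code in net/core/lwtunnel.c.

```c
#include <stddef.h>

/* Hypothetical subset of encap types, for illustration only. */
enum model_encap { ENCAP_NONE, ENCAP_MPLS, ENCAP_XFRM };

static int warnings; /* counts what would be WARN_ON() splats */

/* Return the name used to autoload an encap module, or NULL if there
 * is nothing to autoload. */
static const char *model_encap_str(enum model_encap t)
{
	switch (t) {
	case ENCAP_MPLS:
		return "MPLS";
	case ENCAP_XFRM:
		/* Implemented via a net device: no module to autoload,
		 * and not an unexpected value, so no warning. */
		return NULL;
	case ENCAP_NONE:
		warnings++; /* should never get here */
		break;
	}
	return NULL;
}
```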
      
      [1]
       WARNING: CPU: 3 PID: 2746262 at net/core/lwtunnel.c:57 lwtunnel_valid_encap_type+0x4f/0x120
      [...]
       Call Trace:
        <TASK>
        rtm_to_fib_config+0x211/0x350
        inet_rtm_newroute+0x3a/0xa0
        rtnetlink_rcv_msg+0x154/0x3c0
        netlink_rcv_skb+0x49/0xf0
        netlink_unicast+0x22f/0x350
        netlink_sendmsg+0x208/0x440
        ____sys_sendmsg+0x21f/0x250
        ___sys_sendmsg+0x83/0xd0
        __sys_sendmsg+0x54/0xa0
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd

      Reported-by: Ido Schimmel <idosch@idosch.org>
      Fixes: 2c2493b9 ("xfrm: lwtunnel: add lwtunnel support for xfrm interfaces in collect_md mode")
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: fix "disable_policy" on ipv4 early demux · 3a591318
      Eyal Birger authored
      The commit in the "Fixes" tag tried to avoid a case where the policy
      check is ignored due to dst caching in next hops.
      
      However, when the traffic is locally consumed, the dst may be cached
      in a local TCP or UDP socket as part of early demux. In this case the
      "disable_policy" flag is not checked, because ip_route_input_noref() is
      only called before caching; as a result, packets after the first one in
      a flow are dropped if they do not match any policy.
      
      Fix by checking the "disable_policy" flag also when a valid dst is
      already available.
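
      The change can be modelled as moving the flag check out of the route-lookup
      branch so that it also runs when early demux already supplied a dst. All
      names below are hypothetical; the device is passed in explicitly, loosely
      mirroring the v2 note about using dev instead of skb->dev.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the receiving device and the packet. */
struct model_dev {
	bool disable_policy;  /* per-device "disable_policy" setting */
};

struct model_pkt {
	bool has_cached_dst;  /* dst already set by early demux */
	bool did_route_lookup;
	bool skip_policy;     /* result: bypass the xfrm policy check */
};

/* Old shape: the flag is only evaluated on the route lookup path. */
static void model_rcv_old(struct model_pkt *p, const struct model_dev *dev)
{
	if (!p->has_cached_dst) {
		p->did_route_lookup = true;
		p->skip_policy = dev->disable_policy;
	}
}

/* Fixed shape: the flag is evaluated whether or not a valid dst was
 * already available. */
static void model_rcv_new(struct model_pkt *p, const struct model_dev *dev)
{
	if (!p->has_cached_dst)
		p->did_route_lookup = true;
	p->skip_policy = dev->disable_policy;
}
```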
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216557
      Reported-by: Monil Patel <monil191989@gmail.com>
      Fixes: e6175a2e ("xfrm: fix "disable_policy" flag use when arriving from different devices")
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      
      ----
      
      v2: use dev instead of skb->dev
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  5. 06 Oct, 2022 5 commits
    • Merge tag 'ieee802154-for-net-2022-10-05' of... · 1d22f78d
      Jakub Kicinski authored
      Merge tag 'ieee802154-for-net-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2022-10-05
      
      Only two patches this time around. A revert from Alexander Aring to a patch
      that hit net and the updated patch to fix the problem from Tetsuo Handa.
      
      * tag 'ieee802154-for-net-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
        net/ieee802154: don't warn zero-sized raw_sendmsg()
        Revert "net/ieee802154: reject zero-sized raw_sendmsg()"
      ====================
      
      Link: https://lore.kernel.org/r/20221005144508.787376-1-stefan@datenfreihafen.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: adi: adin1110: Add check in netdev_event · f9371935
      Alexandru Tachici authored
      Check whether this driver actually is the intended recipient of
      the upper change event.
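
      Netdev notifiers see events for every device in the system, so a handler has
      to filter out devices it does not own. Comparing the device's ops pointer with
      the driver's own is one common way to do that; the model below assumes that
      approach, which may differ from what this patch does, and all names in it are
      hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct net_device_ops / struct net_device. */
struct model_netdev_ops {
	int placeholder;
};

struct model_netdev {
	const struct model_netdev_ops *ops;
};

/* The driver's own ops table: devices created by this driver point here. */
static const struct model_netdev_ops model_driver_ops = { 0 };

enum { MODEL_NOTIFY_DONE, MODEL_NOTIFY_OK };

static int events_handled;

/* Notifier callback: the event may concern any device in the system. */
static int model_netdev_event(const struct model_netdev *dev)
{
	if (dev->ops != &model_driver_ops)
		return MODEL_NOTIFY_DONE; /* not our device: ignore the event */

	events_handled++;
	return MODEL_NOTIFY_OK;
}
```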
      
      Fixes: bc93e19d ("net: ethernet: adi: Add ADIN1110 support")
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      Link: https://lore.kernel.org/r/20221003111636.54973-1-alexandru.tachici@analog.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • 229a0027 (Casper Andersson)
    • net: pse-pd: PSE_REGULATOR should depend on REGULATOR · 304ee24b
      Geert Uytterhoeven authored
      The Regulator based PSE controller driver relies on regulator support to
      be enabled.  If regulator support is disabled, it will still compile
      fine, but won't operate correctly.
      
      Hence add a dependency on REGULATOR, to prevent asking the user about
      this driver when configuring a kernel without regulator support.
      
      Fixes: 66741b4e ("net: pse-pd: add regulator based PSE driver")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/709caac8873ff2a8b72b92091429be7c1a939959.1664900558.git.geert+renesas@glider.be
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs" · af7b29b1
      Vladimir Oltean authored
      taprio_attach() has this logic at the end, which should have been
      removed with the blamed patch (which is now being reverted):
      
      	/* access to the child qdiscs is not needed in offload mode */
      	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
      		kfree(q->qdiscs);
      		q->qdiscs = NULL;
      	}
      
      because otherwise, we make use of q->qdiscs[] even after this array was
      deallocated, namely in taprio_leaf(). Therefore, whenever one would try
      to attach a valid child qdisc to a fully offloaded taprio root, one
      would immediately dereference a NULL pointer.
      
      $ tc qdisc replace dev eno0 handle 8001: parent root taprio \
      	num_tc 8 \
      	map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      	max-sdu 0 0 0 0 0 200 0 0 \
      	base-time 200 \
      	sched-entry S 80 20000 \
      	sched-entry S a0 20000 \
      	sched-entry S 5f 60000 \
      	flags 2
      $ max_frame_size=1500
      $ data_rate_kbps=20000
      $ port_transmit_rate_kbps=1000000
      $ idleslope=$data_rate_kbps
      $ sendslope=$(($idleslope - $port_transmit_rate_kbps))
      $ locredit=$(($max_frame_size * $sendslope / $port_transmit_rate_kbps))
      $ hicredit=$(($max_frame_size * $idleslope / $port_transmit_rate_kbps))
      $ tc qdisc replace dev eno0 parent 8001:7 cbs \
      	idleslope $idleslope \
      	sendslope $sendslope \
      	hicredit $hicredit \
      	locredit $locredit \
      	offload 0
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
      pc : taprio_leaf+0x28/0x40
      lr : qdisc_leaf+0x3c/0x60
      Call trace:
       taprio_leaf+0x28/0x40
       tc_modify_qdisc+0xf0/0x72c
       rtnetlink_rcv_msg+0x12c/0x390
       netlink_rcv_skb+0x5c/0x130
       rtnetlink_rcv+0x1c/0x2c
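
      For reference, the CBS parameters in the reproducer above work out as
      follows. cbs_compute() is a hypothetical C transcription of the shell
      arithmetic, not part of tc; integer division truncates toward zero in
      both bash and C, so the results match.

```c
/* Rates in kbit/s, frame size in bytes, as in the shell variables. */
struct cbs_params {
	long long idleslope, sendslope, hicredit, locredit;
};

static struct cbs_params cbs_compute(long long max_frame_size,
				     long long data_rate_kbps,
				     long long port_transmit_rate_kbps)
{
	struct cbs_params p;

	p.idleslope = data_rate_kbps;
	p.sendslope = p.idleslope - port_transmit_rate_kbps;
	p.locredit = max_frame_size * p.sendslope / port_transmit_rate_kbps;
	p.hicredit = max_frame_size * p.idleslope / port_transmit_rate_kbps;
	return p;
}
```

      With max_frame_size=1500, data_rate_kbps=20000 and
      port_transmit_rate_kbps=1000000 this gives idleslope 20000,
      sendslope -980000, hicredit 30 and locredit -1470.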
      
      The solution is not as obvious as the problem. The code which deallocates
      q->qdiscs[] is in fact copied and pasted from mqprio, which also
      deallocates the array in mqprio_attach() and never uses it afterwards.
      
      Therefore, the identical cleanup logic of priv->qdiscs[] in
      mqprio_destroy() is deceptive, because it never takes effect at
      qdisc_destroy() time, only at raw ops->destroy() time (in other words,
      priv->qdiscs[] does not last for the entire lifetime of the mqprio
      root). Rather, this is just the twisted way in which the Qdisc API
      expects error path cleanup to be done (Qdisc_ops :: destroy() is
      called even when Qdisc_ops :: init() never succeeded).
      
      Side note, in fact this is also what the comment in mqprio_init() says:
      
      	/* pre-allocate qdisc, attachment can't fail */
      
      Or reworded, mqprio's priv->qdiscs[] scheme is only meant to serve as
      data passing between Qdisc_ops :: init() and Qdisc_ops :: attach().
      
      [ this comment was also copied and pasted into the initial taprio
        commit, even though taprio_attach() came way later ]
      
      The problem is that taprio also makes extensive use of the q->qdiscs[]
      array in the software fast path (taprio_enqueue() and taprio_dequeue()),
      but it does not keep a reference of its own on q->qdiscs[i] (you'd think
      that since it creates these Qdiscs, it holds the reference, but nope,
      this is not completely true).
      
      To understand the difference between taprio_destroy() and mqprio_destroy()
      one must look before commit 13511704 ("net: taprio offload: enforce
      qdisc to netdev queue mapping"), because that just muddied the waters.
      
      In the "original" taprio design, taprio always attached itself (the root
      Qdisc) to all netdev TX queues, so that dev_qdisc_enqueue() would go
      through taprio_enqueue().
      
      It also called qdisc_refcount_inc() on itself for as many times as there
      were netdev TX queues, in order to counter-balance what tc_get_qdisc()
      does when destroying a Qdisc (simplified for brevity below):
      
      	if (n->nlmsg_type == RTM_DELQDISC)
      		err = qdisc_graft(dev, parent=NULL, new=NULL, q, extack);
      
      qdisc_graft() (where "new" is NULL, so this deletes the Qdisc):
      
      	for (i = 0; i < num_q; i++) {
      		struct netdev_queue *dev_queue;
      
      		dev_queue = netdev_get_tx_queue(dev, i);
      
      		old = dev_graft_qdisc(dev_queue, new);
      		if (new && i > 0)
      			qdisc_refcount_inc(new);
      
      		qdisc_put(old);
      		~~~~~~~~~~~~~~
      		this decrements taprio's refcount once for each TX queue
      	}
      
      	notify_and_destroy(net, skb, n, classid,
      			   rtnl_dereference(dev->qdisc), new);
      			   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      			   and this finally decrements it to zero,
      			   making qdisc_put() call qdisc_destroy()
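
      The refcount arithmetic above can be checked with a small model.
      struct model_qdisc, model_attach() and model_delete() are hypothetical,
      and the starting count of 1 (the reference held since creation) is an
      assumption of the model.

```c
#include <stdbool.h>

/* Hypothetical model of a root Qdisc; only the refcount is tracked. */
struct model_qdisc {
	int refcnt;
	bool destroyed;
};

static void model_put(struct model_qdisc *q)
{
	if (--q->refcnt == 0)
		q->destroyed = true; /* qdisc_put() -> qdisc_destroy() */
}

/* Original design: taprio takes one reference per netdev TX queue it is
 * attached to, on top of the reference held since creation. */
static void model_attach(struct model_qdisc *root, int num_txq)
{
	for (int i = 0; i < num_txq; i++)
		root->refcnt++;
}

/* Deletion: one put per TX queue (qdisc_graft() with new == NULL),
 * then the final put from notify_and_destroy(). */
static void model_delete(struct model_qdisc *root, int num_txq)
{
	for (int i = 0; i < num_txq; i++)
		model_put(root);
	model_put(root);
}
```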
      
      The q->qdiscs[] created using qdisc_create_dflt() (or their
      replacements, if taprio_graft() was ever to get called) were then
      privately freed by taprio_destroy().
      
      This is still what is happening after commit 13511704 ("net: taprio
      offload: enforce qdisc to netdev queue mapping"), but only for software
      mode.
      
      In full offload mode, the per-txq "qdisc_put(old)" calls from
      qdisc_graft() now deallocate the child Qdiscs rather than decrement
      taprio's refcount. So when notify_and_destroy(taprio) finally calls
      taprio_destroy(), the difference is that the child Qdiscs were already
      deallocated.
      
      And this is exactly why the taprio_attach() comment "access to the child
      qdiscs is not needed in offload mode" is deceptive too. Not only is the
      q->qdiscs[] array not needed, but it is also necessary to get rid of
      it as soon as possible, because otherwise, we will also call qdisc_put()
      on the child Qdiscs in qdisc_destroy() -> taprio_destroy(), and this
      will cause a nasty use-after-free/refcount-saturate/whatever.
      
      In short, the problem is that since the blamed commit, taprio_leaf()
      needs q->qdiscs[] to not be freed by taprio_attach(), while qdisc_destroy()
      -> taprio_destroy() does need q->qdiscs[] to be freed by taprio_attach()
      for full offload. Fixing one problem triggers the other.
      
      All of this can be solved by making taprio keep its q->qdiscs[i] with a
      refcount elevated at 2 (in offloaded mode where they are attached to the
      netdev TX queues), both in taprio_attach() and in taprio_graft(). The
      generic qdisc_graft() would just decrement the child qdiscs' refcounts
      to 1, and taprio_destroy() would give them the final coup de grace.
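
      The alternative described above, which this revert does not implement,
      amounts to the following reference counting for each child. The struct
      and helpers are hypothetical; the comments map each put to the kernel
      function the text attributes it to.

```c
#include <stdbool.h>

/* Hypothetical model of one child Qdisc's refcount. */
struct model_child {
	int refcnt;
	bool freed;
};

static void child_put(struct model_child *c)
{
	if (--c->refcnt == 0)
		c->freed = true;
}

/* Proposed scheme: in offload mode the creation reference is handed to
 * the netdev TX queue, and taprio takes one more for q->qdiscs[i]. */
static void proposed_offload_attach(struct model_child *c)
{
	c->refcnt++;
}
```

      After qdisc_graft() drops its reference, the child is still alive for
      taprio_leaf(), and only taprio_destroy() frees it.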
      
      However, the rabbit hole of changes is getting quite deep, and the
      complexity increases. The blamed commit was supposed to be a bug fix in
      the first place, and the bug it addressed is not significant enough to
      justify further rework in stable trees. So I'd rather just revert it.
      I don't know enough about multi-queue Qdisc design to make a proper
      judgement right now regarding what is/isn't idiomatic use of Qdisc
      concepts in taprio. I will try to study the problem more and come up with a
      different solution in net-next.
      
      Fixes: 1461d212 ("net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs")
      Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Link: https://lore.kernel.org/r/20221004220100.1650558-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 05 Oct, 2022 2 commits
  7. 04 Oct, 2022 27 commits
    • Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 0326074f
      Linus Torvalds authored
      Pull networking updates from Jakub Kicinski:
       "Core:
      
         - Introduce and use a single page frag cache for allocating small skb
           heads, clawing back the 10-20% performance regression in UDP flood
           test from previous fixes.
      
         - Run packets which already went thru HW coalescing thru SW GRO. This
           significantly improves TCP segment coalescing and simplifies
           deployments as different workloads benefit from HW or SW GRO.
      
         - Shrink the size of the base zero-copy send structure.
      
         - Move TCP init under a new slow / sleepable version of DO_ONCE().
      
        BPF:
      
         - Add BPF-specific, any-context-safe memory allocator.
      
         - Add helpers/kfuncs for PKCS#7 signature verification from BPF
           programs.
      
         - Define a new map type and related helpers for user space -> kernel
           communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
      
         - Allow targeting BPF iterators to loop through resources of one
           task/thread.
      
         - Add ability to call selected destructive functions. Expose
           crash_kexec() to allow BPF to trigger a kernel dump. Use
           CAP_SYS_BOOT check on the loading process to judge permissions.
      
         - Enable BPF to collect custom hierarchical cgroup stats efficiently
           by integrating with the rstat framework.
      
         - Support struct arguments for trampoline based programs. Only
           structs of up to 16 bytes are supported, and only on x86.
      
         - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
           sockets (instead of just TCP and UDP sockets).
      
         - Add a helper for accessing CLOCK_TAI for time sensitive network
           related programs.
      
         - Support accessing network tunnel metadata's flags.
      
         - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
      
         - Add support for writing to Netfilter's nf_conn:mark.
      
        Protocols:
      
         - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
           (MLO) work (802.11be, WiFi 7).
      
         - vsock: improve support for SO_RCVLOWAT.
      
         - SMC: support SO_REUSEPORT.
      
         - Netlink: define and document how to use netlink in a "modern" way.
           Support reporting missing attributes via extended ACK.
      
         - IPSec: support collect metadata mode for xfrm interfaces.
      
         - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
           packets.
      
         - TCP: introduce optional per-netns connection hash table to allow
           better isolation between namespaces (opt-in, at the cost of memory
           and cache pressure).
      
         - MPTCP: support TCP_FASTOPEN_CONNECT.
      
         - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
      
         - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
      
         - Open vSwitch:
            - Allow specifying ifindex of new interfaces.
            - Allow conntrack and metering in non-initial user namespace.
      
         - TLS: support the Korean ARIA-GCM crypto algorithm.
      
         - Remove DECnet support.
      
        Driver API:
      
         - Allow selecting the conduit interface used by each port in DSA
           switches, at runtime.
      
         - Ethernet Power Sourcing Equipment and Power Device support.
      
         - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
           traffic class max frame size for time-based packet schedules.
      
         - Support PHY rate matching - adapting between differing host-side
           and link-side speeds.
      
         - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
      
         - Validate OF (device tree) nodes for DSA shared ports; make
           phylink-related properties mandatory on DSA and CPU ports.
           Enforcing more uniformity should allow transitioning to phylink.
      
         - Require that flash component name used during update matches one of
           the components for which version is reported by info_get().
      
         - Remove "weight" argument from driver-facing NAPI API as much as
           possible. It's one of those magic knobs which seemed like a good
           idea at the time but is too indirect to use in practice.
      
         - Support offload of TLS connections with 256 bit keys.
      
        New hardware / drivers:
      
         - Ethernet:
            - Microchip KSZ9896 6-port Gigabit Ethernet Switch
            - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
            - Analog Devices ADIN1110 and ADIN2111 industrial single pair
              Ethernet (10BASE-T1L) MAC+PHY.
            - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
      
         - Ethernet SFPs / modules:
            - RollBall / Hilink / Turris 10G copper SFPs
            - HALNy GPON module
      
         - WiFi:
            - CYW43439 SDIO chipset (brcmfmac)
            - CYW89459 PCIe chipset (brcmfmac)
            - BCM4378 on Apple platforms (brcmfmac)
      
        Drivers:
      
         - CAN:
            - gs_usb: HW timestamp support
      
         - Ethernet PHYs:
            - lan8814: cable diagnostics
      
         - Ethernet NICs:
            - Intel (100G):
               - implement control of FCS/CRC stripping
               - port splitting via devlink
               - L2TPv3 filtering offload
            - nVidia/Mellanox:
               - tunnel offload for sub-functions
               - MACSec offload, w/ Extended packet number and replay window
                 offload
               - significantly restructure, and optimize the AF_XDP support,
                 align the behavior with other vendors
            - Huawei:
               - configuring DSCP map for traffic class selection
               - querying standard FEC statistics
               - querying SerDes lane number via ethtool
            - Marvell/Cavium:
               - egress priority flow control
               - MACSec offload
            - AMD/SolarFlare:
               - PTP over IPv6 and raw Ethernet
            - small / embedded:
               - ax88772: convert to phylink (to support SFP cages)
               - altera: tse: convert to phylink
               - ftgmac100: support fixed link
               - enetc: standard Ethtool counters
               - macb: ZynqMP SGMII dynamic configuration support
               - tsnep: support multi-queue and use page pool
               - lan743x: Rx IP & TCP checksum offload
               - igc: add xdp frags support to ndo_xdp_xmit
      
         - Ethernet high-speed switches:
            - Marvell (prestera):
               - support SPAN port features (traffic mirroring)
               - nexthop object offloading
            - Microchip (sparx5):
               - multicast forwarding offload
               - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
      
         - Ethernet embedded switches:
            - Marvell (mv88e6xxx):
               - support RGMII cmode
            - NXP (felix):
               - standardized ethtool counters
            - Microchip (lan966x):
               - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
               - traffic policing and mirroring
               - link aggregation / bonding offload
               - QUSGMII PHY mode support
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - cold boot calibration support on WCN6750
            - support to connect to a non-transmit MBSSID AP profile
            - enable remain-on-channel support on WCN6750
            - Wake-on-WLAN support for WCN6750
            - support to provide transmit power from firmware via nl80211
            - support to get power save duration for each client
            - spectral scan support for 160 MHz
      
         - MediaTek WiFi (mt76):
            - WiFi-to-Ethernet bridging offload for MT7986 chips
      
         - RealTek WiFi (rtw89):
            - P2P support"
      
      * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
        eth: pse: add missing static inlines
        once: rename _SLOW to _SLEEPABLE
        net: pse-pd: add regulator based PSE driver
        dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
        ethtool: add interface to interact with Ethernet Power Equipment
        net: mdiobus: search for PSE nodes by parsing PHY nodes.
        net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
        net: add framework to support Ethernet PSE and PDs devices
        dt-bindings: net: phy: add PoDL PSE property
        net: marvell: prestera: Propagate nh state from hw to kernel
        net: marvell: prestera: Add neighbour cache accounting
        net: marvell: prestera: add stub handler neighbour events
        net: marvell: prestera: Add heplers to interact with fib_notifier_info
        net: marvell: prestera: Add length macros for prestera_ip_addr
        net: marvell: prestera: add delayed wq and flush wq on deinit
        net: marvell: prestera: Add strict cleanup of fib arbiter
        net: marvell: prestera: Add cleanup of allocated fib_nodes
        net: marvell: prestera: Add router nexthops ABI
        eth: octeon: fix build after netif_napi_add() changes
        net/mlx5: E-Switch, Return EBUSY if can't get mode lock
        ...
    • Merge tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux · 522667b2
      Linus Torvalds authored
      Pull landlock updates from Mickaël Salaün:
       "Improve user help for Landlock (documentation and sample)"
      
      * tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
        landlock: Fix documentation style
        landlock: Slightly improve documentation and fix spelling
        samples/landlock: Print hints about ABI versions
    • Merge tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · c645c11a
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "Six audit patches for v6.1, most are pretty trivial, but a quick list
        of the highlights are below:
      
         - Only free the audit proctitle information on task exit. This allows
           us to cache the information and improve performance slightly.
      
         - Use the time_after() macro to do time comparisons instead of doing
           it directly and potentially causing ourselves problems when the
           timer wraps.
      
         - Convert an audit_context state comparison from a relative enum
           comparison, e.g. (x < y), to a not-equal comparison to ensure that
           we are not caught out at some unknown point in the future by an
           enum shuffle.
      
         - A handful of small cleanups such as tidying up comments and
           removing unused declarations"
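
      The two comparison items above can be illustrated in userspace.
      model_time_after() reproduces the wrap-safe idea of the kernel's
      time_after() macro without its argument type checking; the enum and the
      in_use_*() helpers are hypothetical, not the audit code itself.

```c
#include <limits.h>
#include <stdbool.h>

/* Idea behind time_after(a, b): subtract as unsigned, test the sign. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* A counter that wrapped past ULONG_MAX is still "after" the old value,
 * which a plain '>' comparison gets wrong. */
static bool wrap_example_ok(void)
{
	unsigned long before = ULONG_MAX - 5;
	unsigned long after = before + 11; /* wraps around to 5 */

	return model_time_after(after, before) && !(after > before);
}

/* Hypothetical context states, not the real audit enum. */
enum model_ctx { CTX_UNUSED, CTX_SYSCALL, CTX_URING };

/* Fragile: silently depends on CTX_UNUSED being the smallest value. */
static bool in_use_relative(enum model_ctx c)
{
	return c > CTX_UNUSED;
}

/* Robust: independent of how the enumerators are ordered. */
static bool in_use_explicit(enum model_ctx c)
{
	return c != CTX_UNUSED;
}
```

      If a new state were ever inserted before CTX_UNUSED, in_use_relative()
      would silently change meaning, while in_use_explicit() would not.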
      
      * tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: remove selinux_audit_rule_update() declaration
        audit: use time_after to compare time
        audit: free audit_proctitle only on task exit
        audit: explicitly check audit_context->context enum value
        audit: audit_context pid unused, context enum comment fix
        audit: fix repeated words in comments
    • Merge tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3eba620e
      Linus Torvalds authored
      Pull x86 cleanups from Borislav Petkov:
      
       - The usual round of smaller fixes and cleanups all over the tree
      
      * tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
        x86/uaccess: Improve __try_cmpxchg64_user_asm() for x86_32
        x86: Fix various duplicate-word comment typos
        x86/boot: Remove superfluous type casting from arch/x86/boot/bitops.h
    • Merge tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 193e2268
      Linus Torvalds authored
      Pull x86 cache resource control updates from Borislav Petkov:
      
       - More work by James Morse to disentangle the resctrl filesystem
         generic code from the architectural one with the end goal of plugging
         ARM's MPAM implementation into it too so that the user interface
         remains the same
      
       - Properly restore the MSR_MISC_FEATURE_CONTROL value instead of
         blindly overwriting it to 0
      
      * tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
        x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
        x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
        x86/resctrl: Rename and change the units of resctrl_cqm_threshold
        x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()
        x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
        x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
        x86/resctrl: Abstract __rmid_read()
        x86/resctrl: Allow per-rmid arch private storage to be reset
        x86/resctrl: Add per-rmid arch private storage for overflow and chunks
        x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
        x86/resctrl: Allow update_mba_bw() to update controls directly
        x86/resctrl: Remove architecture copy of mbps_val
        x86/resctrl: Switch over to the resctrl mbps_val list
        x86/resctrl: Create mba_sc configuration in the rdt_domain
        x86/resctrl: Abstract and use supports_mba_mbps()
        x86/resctrl: Remove set_mba_sc()s control array re-initialisation
        x86/resctrl: Add domain offline callback for resctrl work
        x86/resctrl: Group struct rdt_hw_domain cleanup
        x86/resctrl: Add domain online callback for resctrl work
        x86/resctrl: Merge mon_capable and mon_enabled
        ...
      193e2268
    • Linus Torvalds
      Merge tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b5f0b113
      Linus Torvalds authored
      Pull x86 microcode loader updates from Borislav Petkov:
      
       - Get rid of a single ksize() usage
      
       - By popular demand, print the microcode revision that was replaced
         by an update
      
       - Remove more code related to the now gone MICROCODE_OLD_INTERFACE
      
       - Document the problems stemming from microcode late loading
      
      * tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode/AMD: Track patch allocation size explicitly
        x86/microcode: Print previous version of microcode after reload
        x86/microcode: Remove ->request_microcode_user()
        x86/microcode: Document the whole late loading problem
      b5f0b113
    • Linus Torvalds
      Merge tag 'x86_paravirt_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9bf445b6
      Linus Torvalds authored
      Pull x86 paravirt fix from Borislav Petkov:
      
       - Ensure paravirt patching site descriptors are aligned properly so
         that code can do proper arithmetic with their addresses
      
      * tag 'x86_paravirt_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/paravirt: Ensure proper alignment
      9bf445b6
    • Linus Torvalds
      Merge tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 901735e5
      Linus Torvalds authored
      Pull misc x86 fixes from Borislav Petkov:
      
       - Drop misleading "RIP" from the opcodes dumping message
      
       - Correct APM entry's Kconfig help text
      
      * tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/dumpstack: Don't mention RIP in "Code: "
        x86/Kconfig: Specify idle=poll instead of no-hlt
      901735e5
    • Linus Torvalds
      Merge tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bb1f1154
      Linus Torvalds authored
      Pull x86 asm update from Borislav Petkov:
      
       - Use the __builtin_ffs/ctzl() compiler builtins for the constant
         argument case in the kernel's optimized ffs()/ffz() helpers in order
         to make use of the compiler's constant folding optimization passes.
      
      * tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions
        x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions
      bb1f1154
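The x86 asm pull above lets the compiler fold ffs()-style calls whose argument is a compile-time constant. Below is a minimal sketch of that dispatch pattern. It assumes GCC or Clang for `__builtin_constant_p()` and `__builtin_ffs()`. `variable_ffs()` is a hypothetical portable stand-in for the kernel's inline-assembly path, and `my_ffs()` is an illustrative name.

```c
/*
 * Hypothetical runtime fallback: 1-based index of the least significant
 * set bit, or 0 if no bit is set (the ffs() convention).
 */
static inline int variable_ffs(int x)
{
    unsigned int v = (unsigned int)x;  /* avoid shifting a negative int */
    int bit = 1;

    if (v == 0)
        return 0;
    while (!(v & 1u)) {
        v >>= 1;
        bit++;
    }
    return bit;
}

/*
 * __builtin_constant_p() is resolved at compile time. When the argument is
 * a constant, the compiler can fold __builtin_ffs() to a literal.
 * Otherwise the runtime helper is used.
 */
#define my_ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))
```

Both branches compute the same value, so the macro is correct either way. The builtin branch simply gives the optimizer something it can evaluate at build time.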
    • Linus Torvalds
      Merge tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8cded8fb
      Linus Torvalds authored
      Pull x86 core fixes from Borislav Petkov:
      
       - Make sure an INT3 is slapped after every unconditional retpoline JMP
         as both vendors suggest
      
       - Clean up pciserial a bit
      
      * tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86,retpoline: Be sure to emit INT3 after JMP *%\reg
        x86/earlyprintk: Clean up pciserial
      8cded8fb
    • Linus Torvalds
      Merge tag 'x86_apic_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5bb3a16d
      Linus Torvalds authored
      Pull x86 APIC update from Borislav Petkov:
      
       - Add support for locking the APIC in X2APIC mode to prevent SGX
         enclave leaks
      
      * tag 'x86_apic_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic: Don't disable x2APIC if locked
      5bb3a16d
    • Linus Torvalds
      Merge tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 51eaa866
      Linus Torvalds authored
      Pull x86 RAS updates from Borislav Petkov:
      
       - Fix the APEI MCE callback handler to consult the hardware about the
         granularity of the memory error instead of hard-coding it
      
       - Offline memory pages on Intel machines after 2 errors reported per
         page
      
      * tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Retrieve poison range from hardware
        RAS/CEC: Reduce offline page threshold for Intel systems
      51eaa866
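The RAS pull above offlines a page on Intel systems once two corrected errors have been reported for it. Below is a toy sketch of such a per-page threshold. The table, its size, and `record_corrected_error()` are hypothetical. The real collector in drivers/ras/cec.c is more elaborate; among other things, it ages old entries.

```c
#include <stdbool.h>
#include <stddef.h>

#define TRACKED_PAGES 8       /* hypothetical table size */
#define OFFLINE_THRESHOLD 2   /* errors per page before offlining */

struct page_errors {
    unsigned long pfn;        /* page frame number */
    unsigned int count;       /* corrected errors seen so far */
    bool used;
};

static struct page_errors err_table[TRACKED_PAGES];

/*
 * Record one corrected error for pfn. Returns true once the page has
 * reached the threshold and should be taken offline. If the table is full,
 * the error is simply dropped (a toy policy with no aging or eviction).
 */
static bool record_corrected_error(unsigned long pfn)
{
    size_t free_slot = TRACKED_PAGES;

    for (size_t i = 0; i < TRACKED_PAGES; i++) {
        if (err_table[i].used && err_table[i].pfn == pfn) {
            err_table[i].count++;
            return err_table[i].count >= OFFLINE_THRESHOLD;
        }
        if (!err_table[i].used && free_slot == TRACKED_PAGES)
            free_slot = i;
    }
    if (free_slot == TRACKED_PAGES)
        return false;

    err_table[free_slot] = (struct page_errors){ .pfn = pfn, .count = 1, .used = true };
    return err_table[free_slot].count >= OFFLINE_THRESHOLD;
}
```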
    • Linus Torvalds
      Merge tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7db99f01
      Linus Torvalds authored
      Pull x86 cpu updates from Borislav Petkov:
      
       - Print the CPU number at segfault time.
      
         The number printed is not always accurate (preemption is enabled at
         that time) but the print string contains "likely" and after a lot of
         back'n'forth on this, this was the consensus that was reached. See
         thread at [1].
      
       - After a *lot* of testing and polishing, finally merge the
         clear_user() improvements, which inline REP; STOSB by default
      
      Link: https://lore.kernel.org/r/5d62c1d0-7425-d5bb-ecb5-1dc3b4d7d245@intel.com [1]
      
      * tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Print likely CPU at segfault time
        x86/clear_user: Make it faster
      7db99f01
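The clear_user() change above inlines REP; STOSB for zeroing. The sketch below shows only the instruction itself, in userspace. It assumes GCC/Clang extended inline assembly on x86 and falls back to memset() elsewhere. `zero_bytes()` is a hypothetical name. Unlike clear_user(), it does no user-address checking or fault handling.

```c
#include <stddef.h>
#include <string.h>

static void zero_bytes(void *dst, size_t len)
{
#if defined(__x86_64__) || defined(__i386__)
    /*
     * REP STOSB stores AL at [RDI/EDI] and advances it, RCX/ECX times.
     * The x86 ABIs guarantee the direction flag is clear on function
     * entry, so the stores go upward in memory.
     */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(len)
                     : "a"(0)
                     : "memory");
#else
    memset(dst, 0, len);
#endif
}
```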
    • Linus Torvalds
      Merge tag 'x86_sgx_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ba94a7a9
      Linus Torvalds authored
      Pull x86 SGX update from Borislav Petkov:
      
       - Improve the documentation of a couple of SGX functions handling
         backing storage
      
      * tag 'x86_sgx_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sgx: Improve comments for sgx_encl_lookup/alloc_backing()
      ba94a7a9
    • Linus Torvalds
      Merge tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f8475a67
      Linus Torvalds authored
      Pull x86 RTC cleanups from Borislav Petkov:
      
       - Clean up x86/rtc.c and delete duplicated functionality in favor of
         using the respective functionality from the RTC library
      
      * tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/rtc: Rename mach_set_rtc_mmss() to mach_set_cmos_time()
        x86/rtc: Rewrite & simplify mach_get_cmos_time() by deleting duplicated functionality
      f8475a67
    • Linus Torvalds
      Merge tag 'x86_platform_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3339914a
      Linus Torvalds authored
      Pull x86 platform update from Borislav Petkov:
       "A single x86/platform improvement when the kernel is running as an
        ACRN guest:
      
         - Get TSC and CPU frequency from CPUID leaf 0x40000010 when the
           kernel is running as a guest on the ACRN hypervisor"
      
      * tag 'x86_platform_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/acrn: Set up timekeeping
      3339914a
    • Linus Torvalds
      Merge tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · bf767625
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - Add support for Skylake-S CPUs to ie31200_edac
      
       - Improve error decoding speed of the Intel drivers by doing the
         decoding in the driver itself instead of going through the ACPI
         facilities
      
       - Other misc improvements to the Intel drivers
      
       - The usual cleanups and fixlets all over EDAC land
      
      * tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/i7300: Correct the i7300_exit() function name in comment
        x86/sb_edac: Add row column translation for Broadwell
        EDAC/i10nm: Print an extra register set of retry_rd_err_log
        EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
        EDAC/skx_common: Add ChipSelect ADXL component
        EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
        EDAC: Remove obsolete declarations in edac_module.h
        EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
        EDAC/skx_common: Make output format similar
        EDAC/skx_common: Use driver decoder first
        EDAC/mc: Drop duplicated dimm->nr_pages debug printout
        EDAC/mc: Replace spaces with tabs in memtype flags definition
        EDAC/wq: Remove unneeded flush_workqueue()
        EDAC/ie31200: Add Skylake-S support
      bf767625
    • Borislav Petkov
      Merge branches 'edac-drivers' and 'edac-misc' into edac-updates-for-v6.1 · c2577956
      Borislav Petkov authored
      Combine all queued EDAC changes for submission into v6.1:
      
      * edac-drivers:
        EDAC/ie31200: Add Skylake-S support
      
      * edac-misc:
        EDAC/i7300: Correct the i7300_exit() function name in comment
        x86/sb_edac: Add row column translation for Broadwell
        EDAC/i10nm: Print an extra register set of retry_rd_err_log
        EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
        EDAC/skx_common: Add ChipSelect ADXL component
        EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
        EDAC: Remove obsolete declarations in edac_module.h
        EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
        EDAC/skx_common: Make output format similar
        EDAC/skx_common: Use driver decoder first
        EDAC/mc: Drop duplicated dimm->nr_pages debug printout
        EDAC/mc: Replace spaces with tabs in memtype flags definition
        EDAC/wq: Remove unneeded flush_workqueue()
      Signed-off-by: Borislav Petkov <bp@suse.de>
      c2577956
    • Jakub Kicinski
      eth: pse: add missing static inlines · 681bf011
      Jakub Kicinski authored
      Build bot reports missing 'static inline' qualifiers in the header.

      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: 18ff0bcd ("ethtool: add interface to interact with Ethernet Power Equipment")
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20221004040327.2034878-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      681bf011
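The pse fix above adds `static inline` to stub functions in a header. Below is a minimal sketch of that general pattern. `CONFIG_EXAMPLE_PSE` and `example_pse_get_status()` are hypothetical names, not the ethtool PSE API. Without `static inline`, every .c file that includes such a header carries its own external definition of the stub. That typically shows up as multiple-definition link errors or missing-prototype warnings.

```c
#include <errno.h>

/* Hypothetical header fragment: the real function exists only when the
 * feature is configured in; otherwise a stub is provided. */
#ifdef CONFIG_EXAMPLE_PSE
int example_pse_get_status(int port);  /* defined in a .c file */
#else
/* 'static inline' gives each includer a private copy of the stub, and the
 * compiler does not warn if a given file never calls it. */
static inline int example_pse_get_status(int port)
{
    (void)port;
    return -EOPNOTSUPP;
}
#endif
```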
    • Linus Torvalds
      Merge tag 'statx-dioalign-for-linus' of... · 725737e7
      Linus Torvalds authored
      Merge tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
      
      Pull STATX_DIOALIGN support from Eric Biggers:
       "Make statx() support reporting direct I/O (DIO) alignment information.
      
        This provides a generic interface for userspace programs to determine
        whether a file supports DIO, and if so with what alignment
        restrictions. Specifically, STATX_DIOALIGN works on block devices, and
        on regular files when their containing filesystem has implemented
        support.
      
        An interface like this has been requested for years, since the
        conditions for when DIO is supported in Linux have gotten increasingly
        complex over time. Today, DIO support and alignment requirements can
        be affected by various filesystem features such as multi-device
        support, data journalling, inline data, encryption, verity,
        compression, checkpoint disabling, log-structured mode, etc.
      
        Further complicating things, Linux v6.0 relaxed the traditional rule
        of DIO needing to be aligned to the block device's logical block size;
        now user buffers (but not file offsets) only need to be aligned to the
        DMA alignment.
      
        The approach of uplifting the XFS specific ioctl XFS_IOC_DIOINFO was
        discarded in favor of creating a clean new interface with statx().
      
        For more information, see the individual commits and the man page
        update[1]"
      
      Link: https://lore.kernel.org/r/20220722074229.148925-1-ebiggers@kernel.org [1]
      
      * tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
        xfs: support STATX_DIOALIGN
        f2fs: support STATX_DIOALIGN
        f2fs: simplify f2fs_force_buffered_io()
        f2fs: move f2fs_force_buffered_io() into file.c
        ext4: support STATX_DIOALIGN
        fscrypt: change fscrypt_dio_supported() to prepare for STATX_DIOALIGN
        vfs: support STATX_DIOALIGN on block devices
        statx: add direct I/O alignment information
      725737e7
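The STATX_DIOALIGN pull above reports two values through statx(): the required alignment of user buffers (`stx_dio_mem_align`) and the required alignment of file offsets and I/O lengths (`stx_dio_offset_align`). Both are 0 when direct I/O is unsupported. Below is a minimal sketch. It assumes glibc 2.28 or later for statx() and Linux v6.1+ UAPI headers for STATX_DIOALIGN; the query is compiled out otherwise. `query_dio_align()` and `dio_request_ok()` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>   /* statx(), struct statx */

/* Check a planned DIO request against the reported alignments. */
bool dio_request_ok(uint32_t mem_align, uint32_t offset_align,
                    const void *buf, uint64_t offset, uint64_t len)
{
    if (mem_align == 0 || offset_align == 0)  /* DIO not supported */
        return false;
    return (uintptr_t)buf % mem_align == 0 &&
           offset % offset_align == 0 &&
           len % offset_align == 0;
}

#ifdef STATX_DIOALIGN
/* Returns 0 and fills both alignments, or -1 if they are unavailable. */
int query_dio_align(const char *path, uint32_t *mem_align,
                    uint32_t *offset_align)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
        return -1;
    if (!(stx.stx_mask & STATX_DIOALIGN))  /* not reported for this file */
        return -1;
    *mem_align = stx.stx_dio_mem_align;
    *offset_align = stx.stx_dio_offset_align;
    return 0;
}
#endif
```

Checking `stx_mask` matters. A filesystem that has not implemented the feature leaves the bit unset rather than failing the call.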
    • Linus Torvalds
      Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · 5779aa2d
      Linus Torvalds authored
      Pull fsverity updates from Eric Biggers:
       "Minor changes to convert uses of kmap() to kmap_local_page()"
      
      * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        fs-verity: use kmap_local_page() instead of kmap()
        fs-verity: use memcpy_from_page()
      5779aa2d
    • Linus Torvalds
      Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · 438b2cdd
      Linus Torvalds authored
      Pull fscrypt updates from Eric Biggers:
       "This release contains some implementation changes, but no new
        features:
      
         - Rework the implementation of the fscrypt filesystem-level keyring
           to not be as tightly coupled to the keyrings subsystem. This
           resolves several issues.
      
         - Eliminate most direct uses of struct request_queue from fs/crypto/,
           since struct request_queue is considered to be a block layer
           implementation detail.
      
         - Stop using the PG_error flag to track decryption failures. This is
           a prerequisite for freeing up PG_error for other uses"
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        fscrypt: work on block_devices instead of request_queues
        fscrypt: stop holding extra request_queue references
        fscrypt: stop using keyrings subsystem for fscrypt_master_key
        fscrypt: stop using PG_error to track error status
        fscrypt: remove fscrypt_set_test_dummy_encryption()
      438b2cdd
    • Linus Torvalds
      Merge tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm · f4309528
      Linus Torvalds authored
      Pull dlm updates from David Teigland:
      
       - Fix a couple races found with a new torture test
      
       - Improve errors when api functions are used incorrectly
      
       - Improve tracing for lock requests from user space
      
       - Fix use after free in recently added tracing code
      
       - Small internal code cleanups
      
      * tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
        fs: dlm: fix possible use after free if tracing
        fs: dlm: const void resource name parameter
        fs: dlm: LSFL_CB_DELAY only for kernel lockspaces
        fs: dlm: remove DLM_LSFL_FS from uapi
        fs: dlm: trace user space callbacks
        fs: dlm: change ls_clear_proc_locks to spinlock
        fs: dlm: remove dlm_del_ast prototype
        fs: dlm: handle rcom in else if branch
        fs: dlm: allow lockspaces have zero lvblen
        fs: dlm: fix invalid derefence of sb_lvbptr
        fs: dlm: handle -EINVAL as log_error()
        fs: dlm: use __func__ for function name
        fs: dlm: handle -EBUSY first in unlock validation
        fs: dlm: handle -EBUSY first in lock arg validation
        fs: dlm: fix race between test_bit() and queue_work()
        fs: dlm: fix race in lowcomms
      f4309528
    • Linus Torvalds
      Merge tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · f90497a1
      Linus Torvalds authored
      Pull nfsd updates from Chuck Lever:
       "This release is mostly bug fixes, clean-ups, and optimizations.
      
        One notable set of fixes addresses a subtle buffer overflow issue that
        occurs if a small RPC Call message arrives in an oversized RPC record.
        This is only possible on a framed RPC transport such as TCP.
      
        Because NFSD shares the receive and send buffers in one set of pages,
        an oversized RPC record steals pages from the send buffer that will be
        used to construct the RPC Reply message. NFSD must not assume that a
        full-sized buffer is always available to it; otherwise, it will walk
        off the end of the send buffer while constructing its reply.
      
        In this release, we also introduce the ability for the server to wait
        a moment for clients to return delegations before it responds with
        NFS4ERR_DELAY. This saves a retransmit and a network round-trip when
        a delegation recall is needed. This work will be built upon in future
        releases.
      
        The NFS server adds another shrinker to its collection. Because
        courtesy clients can linger for quite some time, they might be
        freeable when the server host comes under memory pressure. A new
        shrinker has been added that releases courtesy client resources during
        low memory scenarios.
      
        Lastly, of note: the maximum number of operations per NFSv4 COMPOUND
        that NFSD can handle is increased from 16 to 50. There are NFSv4
        client implementations that need more than 16 to successfully perform
        a mount operation that uses a pathname with many components"
      
      * tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (53 commits)
        nfsd: extra checks when freeing delegation stateids
        nfsd: make nfsd4_run_cb a bool return function
        nfsd: fix comments about spinlock handling with delegations
        nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
        NFSD: fix use-after-free on source server when doing inter-server copy
        NFSD: Cap rsize_bop result based on send buffer size
        NFSD: Rename the fields in copy_stateid_t
        nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
        nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
        nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
        nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
        nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
        NFSD: Pack struct nfsd4_compoundres
        NFSD: Remove unused nfsd4_compoundargs::cachetype field
        NFSD: Remove "inline" directives on op_rsize_bop helpers
        NFSD: Clean up nfs4svc_encode_compoundres()
        SUNRPC: Fix typo in xdr_buf_subsegment's kdoc comment
        NFSD: Clean up WRITE arg decoders
        NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
        NFSD: Refactor common code out of dirlist helpers
        ...
      f90497a1
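The nfsd message above explains that an oversized RPC Call record consumes pages that would otherwise hold the Reply. Below is a toy model of that accounting. The page count, page size, and helper names are all hypothetical; this is not NFSD's code. It illustrates only that reply space must be derived from what the call left over rather than assumed to be full-sized.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes for one shared receive/send page array. */
#define PAGE_BYTES 4096u
#define TOTAL_PAGES 8u

struct shared_pages {
    size_t call_pages;  /* pages consumed by the incoming RPC record */
};

/* Reply space is whatever the call did not consume, possibly nothing. */
static size_t reply_capacity(const struct shared_pages *sp)
{
    if (sp->call_pages >= TOTAL_PAGES)
        return 0;
    return (TOTAL_PAGES - sp->call_pages) * PAGE_BYTES;
}

/* Check against the space actually left, not a full-sized buffer. */
static bool reply_fits(const struct shared_pages *sp, size_t reply_len)
{
    return reply_len <= reply_capacity(sp);
}
```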
    • Linus Torvalds
      Merge tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 3497640a
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "In this cycle, for container use cases, an fscache-based shared
        domain is introduced [1] so that data blobs in the same domain are
        deduplicated in storage; it will also be used for page cache sharing
        later.

        Also, a special packed inode, contributed by Yue Hu [2], is now
        introduced to record inode fragments, which keep the tail part of
        files. You can keep a tail of arbitrary length or (at will) the whole
        file as a fragment, and fragments can then optionally be compressed
        together in the packed inode and even deduplicated for smaller image
        sizes.
      
        In addition to that, global compressed data deduplication by sharing
        partial-referenced pclusters is also supported in this cycle.
      
        Summary:
      
         - Introduce fscache-based domain to share blobs between images
      
         - Support recording fragments in a special packed inode
      
         - Support partial-referenced pclusters for global compressed data
           deduplication
      
         - Fix an order >= MAX_ORDER warning due to crafted negative i_size
      
         - Several cleanups"
      
      Link: https://lore.kernel.org/r/20220916085940.89392-1-zhujia.zj@bytedance.com [1]
      Link: https://lore.kernel.org/r/cover.1663065968.git.huyue2@coolpad.com [2]
      
      * tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: clean up erofs_iget()
        erofs: clean up unnecessary code and comments
        erofs: fold in z_erofs_reload_indexes()
        erofs: introduce partial-referenced pclusters
        erofs: support on-disk compressed fragments data
        erofs: support interlaced uncompressed data for compressed files
        erofs: clean up .read_folio() and .readahead() in fscache mode
        erofs: introduce 'domain_id' mount option
        erofs: Support sharing cookies in the same domain
        erofs: introduce a pseudo mnt to manage shared cookies
        erofs: introduce fscache-based domain
        erofs: code clean up for fscache
        erofs: use kill_anon_super() to kill super in fscache mode
        erofs: fix order >= MAX_ORDER warning due to crafted negative i_size
      3497640a
    • Linus Torvalds
      Merge tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · 8bea8ff3
      Linus Torvalds authored
      Pull fatfs vfsuid conversion from Christian Brauner:
       "Last cycle we introduced the new vfs{g,u}id_t types that we had agreed
        on. The most important parts of the vfs have been converted but there
        are a few more places we need to switch before we can remove the old
        helpers completely.
      
        This cycle we converted all filesystems that called idmapped mount
        helpers directly. The affected filesystems are f2fs, fat, fuse, ksmbd,
        overlayfs, and xfs. We've sent patches for all of them. Looking at
        -next, f2fs, ksmbd, overlayfs, and xfs have all picked up these
        patches and they should land in mainline during the v6.1 merge window.

        So all filesystems that have a separate tree should send the vfsuid
        conversion themselves. Only the fat conversion is going through this
        generic fs tree because there is no fat tree.
      
        In order to change time settings on an inode, fat checks that the
        caller either is the owner of the inode or the inode's group is in the
        caller's group list. If fat is on an idmapped mount, we compare whether
        the inode's owner mapped into the mount is equivalent to the caller's
        fsuid. If it isn't, we compare whether the inode's group mapped into
        the mount is in the caller's group list.
      
        We now use the new vfsuid based helpers for that"
      
      * tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        fat: port to vfs{g,u}id_t and associated helpers
      8bea8ff3
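The fat conversion above checks ownership after mapping the inode's ids through the mount. Below is a toy version of that check that assumes a single-range idmapping. `struct idmap_range`, `map_into_mount()`, `in_group_list()` and `may_change_times()` are hypothetical. They stand in for the kernel's vfsuid/vfsgid helpers, which this sketch does not reproduce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-range idmapping: ids [from, from + count) on disk
 * appear as [to, to + count) through the mount. */
struct idmap_range {
    unsigned int from, to, count;
};

#define INVALID_ID ((unsigned int)-1)

static unsigned int map_into_mount(const struct idmap_range *m, unsigned int id)
{
    if (id >= m->from && id - m->from < m->count)
        return m->to + (id - m->from);
    return INVALID_ID;  /* unmapped ids never match a caller */
}

static bool in_group_list(unsigned int gid, const unsigned int *groups, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (groups[i] == gid)
            return true;
    return false;
}

/* Owner check first, then group membership, both after mapping. */
static bool may_change_times(const struct idmap_range *m,
                             unsigned int i_uid, unsigned int i_gid,
                             unsigned int fsuid,
                             const unsigned int *groups, size_t ngroups)
{
    unsigned int mapped_uid = map_into_mount(m, i_uid);
    unsigned int mapped_gid = map_into_mount(m, i_gid);

    if (mapped_uid != INVALID_ID && mapped_uid == fsuid)
        return true;
    return mapped_gid != INVALID_ID && in_group_list(mapped_gid, groups, ngroups);
}
```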
    • Linus Torvalds
      Merge tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · 223b8452
      Linus Torvalds authored
      Pull vfs acl updates from Christian Brauner:
       "These are general fixes and preparatory changes related to the ongoing
        posix acl rework. The actual rework where we build a type safe posix
        acl api wasn't ready for this merge window but we're hopeful for the
        next merge window.
      
        General fixes:
      
         - Some filesystems like 9p and cifs have to implement custom posix
           acl handlers because they require access to the dentry in order to
           set and get posix acls while the set and get inode operations
           currently don't. But the ntfs3 filesystem has no such requirement
           and thus implemented custom posix acl xattr handlers when it really
           didn't have to. So this PR contains a patch that just implements set
           and get inode operations for ntfs3 and switches it to rely on the
           generic posix acl xattr handlers. (We would've appreciated reviews
           from the ntfs3 maintainers but we didn't get any. But hey, if we
           really broke it we'll fix it. But fstests for ntfs3 said it's
           fine.)
      
         - The posix_acl_fix_xattr_common() helper has been adapted so it can
           be used by a few more callers to avoid open-coding the same
           checks over and over.
      
        Other than the two general fixes, this series introduces a new helper
        vfs_set_acl_prepare(). The reason for this helper is so that we can
        mitigate one of the sources that change {g,u}id values directly in the
        uapi struct. With the vfs_set_acl_prepare() helper we can move the
        idmapped mount fixup into the generic posix acl set handler.
      
        The advantage of this is that it allows us to remove the
        posix_acl_setxattr_idmapped_mnt() helper which so far we had to call
        in vfs_setxattr() to account for idmapped mounts. While semantically
        correct the problem with this approach was that we had to keep the
        value parameter of the generic vfs_setxattr() call as non-const. This
        is rectified in this series.
      
        Ultimately, we will get rid of all the extreme kludges and type
        unsafety once we have merged the posix acl api - hopefully during the
        next merge window - built solely around get and set inode operations.
        This will incidentally also improve handling of posix acls in security
        and especially in integrity modules. While this will come with
        temporarily having two inode operations for posix acls, that is
        nothing compared to the problems we have right now and so well worth
        it. We'll end up with something that we can actually reason about
        instead of needing to write novels to explain what's going on"
      
      * tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        xattr: always us is_posix_acl_xattr() helper
        acl: fix the comments of posix_acl_xattr_set
        xattr: constify value argument in vfs_setxattr()
        ovl: use vfs_set_acl_prepare()
        acl: move idmapping handling into posix_acl_xattr_set()
        acl: add vfs_set_acl_prepare()
        acl: return EOPNOTSUPP in posix_acl_fix_xattr_common()
        ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers
      223b8452