1. 09 Aug, 2022 10 commits
  2. 05 Aug, 2022 6 commits
    • selftests: add few test cases for tap driver · 2e64fe46
      Cezar Bulinaru authored
      A few test cases related to the fix for 924a9bc3:
      "net: check if protocol extracted by virtio_net_hdr_set_proto is correct"

      Tests are needed for the case where a non-standard packet (GSO without
      NEEDS_CSUM) sent to the tap device triggers a BUG check in the tap driver.
      Signed-off-by: Cezar Bulinaru <cbulinaru@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
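      The following is a minimal C sketch of the input these tests target. It assumes the userspace `<linux/virtio_net.h>` header and fills a `struct virtio_net_hdr` that announces TCPv4 GSO but leaves out `VIRTIO_NET_HDR_F_NEEDS_CSUM`. The helper names `build_gso_without_csum()` and `is_gso_without_csum()` are hypothetical and are not taken from the selftest itself.

```c
#include <assert.h>
#include <linux/virtio_net.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: describe a TCPv4 GSO packet but deliberately omit
 * VIRTIO_NET_HDR_F_NEEDS_CSUM, i.e. the "non-standard" combination. */
void build_gso_without_csum(struct virtio_net_hdr *h, uint16_t hdr_len,
                            uint16_t mss)
{
    assert(h != NULL);
    memset(h, 0, sizeof(*h));
    h->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
    /* Stored in host byte order: assumes the default native-endian
     * vnet header (no TUNSETVNETLE/BE configuration). */
    h->hdr_len = hdr_len;
    h->gso_size = mss;
    /* flags stays 0: no NEEDS_CSUM, so csum_start/csum_offset are unused. */
}

/* Hypothetical predicate: true for GSO packets that lack NEEDS_CSUM. */
bool is_gso_without_csum(const struct virtio_net_hdr *h)
{
    return h->gso_type != VIRTIO_NET_HDR_GSO_NONE &&
           !(h->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM);
}
```

      Writing such a header, followed by the packet bytes, to a tap fd opened with IFF_VNET_HDR is the kind of input the commit describes.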
    • net: tap: NULL pointer derefence in dev_parse_header_protocol when skb->dev is null · 4f61f133
      Cezar Bulinaru authored
      Fixes a NULL pointer dereference bug triggered from the tap driver.
      When tap_get_user() calls virtio_net_hdr_to_skb(), skb->dev is NULL
      (in tap.c, skb->dev is set only after the call to
      virtio_net_hdr_to_skb()), but virtio_net_hdr_to_skb() calls
      dev_parse_header_protocol(), which needs the skb->dev field to be valid.
      
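      The line that triggers the bug is in dev_parse_header_protocol()
      (dev is at offset 0x10 from skb and is loaded into the RAX register):
        if (!dev->header_ops || !dev->header_ops->parse_protocol)
        22e1:   mov    0x10(%rbx),%rax
        22e5:   mov    0x230(%rax),%rax
      With dev == NULL, the second load faults at address 0x230, which matches
      the CR2 value in the report below.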
      
      Setting skb->dev before the call in tap.c fixes the issue.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000230
      RIP: 0010:virtio_net_hdr_to_skb.constprop.0+0x335/0x410 [tap]
      Code: c0 0f 85 b7 fd ff ff eb d4 41 39 c6 77 cf 29 c6 48 89 df 44 01 f6 e8 7a 79 83 c1 48 85 c0 0f 85 d9 fd ff ff eb b7 48 8b 43 10 <48> 8b 80 30 02 00 00 48 85 c0 74 55 48 8b 40 28 48 85 c0 74 4c 48
      RSP: 0018:ffffc90005c27c38 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff888298f25300 RCX: 0000000000000010
      RDX: 0000000000000005 RSI: ffffc90005c27cb6 RDI: ffff888298f25300
      RBP: ffffc90005c27c80 R08: 00000000ffffffea R09: 00000000000007e8
      R10: ffff88858ec77458 R11: 0000000000000000 R12: 0000000000000001
      R13: 0000000000000014 R14: ffffc90005c27e08 R15: ffffc90005c27cb6
      FS:  0000000000000000(0000) GS:ffff88858ec40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000230 CR3: 0000000281408006 CR4: 00000000003706e0
      Call Trace:
       tap_get_user+0x3f1/0x540 [tap]
       tap_sendmsg+0x56/0x362 [tap]
       ? get_tx_bufs+0xc2/0x1e0 [vhost_net]
       handle_tx_copy+0x114/0x670 [vhost_net]
       handle_tx+0xb0/0xe0 [vhost_net]
       handle_tx_kick+0x15/0x20 [vhost_net]
       vhost_worker+0x7b/0xc0 [vhost]
       ? vhost_vring_call_reset+0x40/0x40 [vhost]
       kthread+0xfa/0x120
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x1f/0x30
      
      Fixes: 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      Signed-off-by: Cezar Bulinaru <cbulinaru@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
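      The ordering problem behind this fix can be modelled in plain C. This is a hedged sketch and not the tap.c code. The structures are hypothetical, heavily simplified stand-ins. `parse_header_protocol()`, `hdr_to_skb()` and `tap_get_user_fixed()` are illustrative names that only mirror the roles of dev_parse_header_protocol(), virtio_net_hdr_to_skb() and tap_get_user().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins; the real kernel structures
 * have many more fields and different layouts. */
struct header_ops { uint16_t (*parse_protocol)(const void *data); };
struct net_device { const struct header_ops *header_ops; };
struct sk_buff    { struct net_device *dev; const void *data; };

/* Same shape as the faulting check in dev_parse_header_protocol():
 * skb->dev is dereferenced without a NULL check. */
static uint16_t parse_header_protocol(const struct sk_buff *skb)
{
    const struct net_device *dev = skb->dev;

    assert(dev != NULL); /* the kernel has no such guard: it oopses */
    if (!dev->header_ops || !dev->header_ops->parse_protocol)
        return 0;
    return dev->header_ops->parse_protocol(skb->data);
}

/* Stand-in for virtio_net_hdr_to_skb(), which reaches the parser. */
static uint16_t hdr_to_skb(const struct sk_buff *skb)
{
    return parse_header_protocol(skb);
}

/* Fixed ordering: assign skb->dev *before* converting the header.
 * The buggy ordering assigned it only after hdr_to_skb() returned. */
uint16_t tap_get_user_fixed(struct sk_buff *skb, struct net_device *dev)
{
    skb->dev = dev;
    return hdr_to_skb(skb);
}

/* Example parser for the check below: always reports IPv4 (0x0800),
 * in host byte order for simplicity. */
uint16_t example_parse_ipv4(const void *data)
{
    (void)data;
    return 0x0800;
}
```

      The parser dereferences `skb->dev` unconditionally, so the only safe ordering is to assign it first. In the kernel, the NULL case oopses rather than tripping an `assert()`.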
    • Merge branch 'mptcp-fixes' · 9f05f9ad
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for mptcp cleanup/close and a selftest
      
      Patch 1 fixes an issue with leaking subflow sockets if there's a failure
      in a CGROUP_INET_SOCK_CREATE eBPF program.
      
      Patch 2 fixes a syzkaller-detected race at MPTCP socket close.
      
      Patch 3 is a fix for one mode of the mptcp_connect.sh selftest.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: make sendfile selftest work · df9e03ae
      Florian Westphal authored
      When the selftest was added, sendfile() on mptcp sockets returned
      -EOPNOTSUPP, so running 'mptcp_connect.sh -m sendfile' failed
      immediately.

      This is no longer the case, but the script still fails due to a timeout.
      Let the receiver know once the sender has sent all data, just like
      with '-m mmap' mode.
      
      v2: need to respect cfg_wait too, as pm_userspace.sh relied
      on -m sendfile to keep the connection open (Mat Martineau)
      
      Fixes: 048d19d4 ("mptcp: add basic kselftest for mptcp")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
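      Below is a minimal sketch of the pattern described above. After `sendfile()` has pushed the whole file, the sender half-closes the socket with `shutdown(SHUT_WR)`, so the receiver sees EOF instead of waiting for a timeout. This assumes that half-closing is how "letting the receiver know" works. The helper name and the `wait_ms` parameter (a stand-in for the selftest's cfg_wait option) are hypothetical, and the real mptcp_connect code may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push `len` bytes of `in_fd` (a regular file) to
 * `sock` with sendfile(), optionally wait `wait_ms` (stand-in for the
 * selftest's cfg_wait), then half-close so the peer's read() returns 0.
 * Returns 0 on success, -1 on error or if the file is shorter than len. */
int send_file_then_shutdown(int sock, int in_fd, size_t len, unsigned wait_ms)
{
    off_t off = 0;

    while ((size_t)off < len) {
        ssize_t n = sendfile(sock, in_fd, &off, len - (size_t)off);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* input file shorter than `len` */
    }

    if (wait_ms)
        usleep((useconds_t)wait_ms * 1000);

    /* Tell the receiver that no more data will follow. */
    return shutdown(sock, SHUT_WR);
}
```

      Shutting down only the write side keeps the socket readable. The sender can therefore still collect the peer's reply after signalling end-of-data.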
    • mptcp: do not queue data on closed subflows · c886d702
      Paolo Abeni authored
      Dipanjan reported a syzbot splat at close time:
      
      WARNING: CPU: 1 PID: 10818 at net/ipv4/af_inet.c:153
      inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153
      Modules linked in: uio_ivshmem(OE) uio(E)
      CPU: 1 PID: 10818 Comm: kworker/1:16 Tainted: G           OE
      5.19.0-rc6-g2eae0556bb9d #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.13.0-1ubuntu1.1 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153
      Code: 21 02 00 00 41 8b 9c 24 28 02 00 00 e9 07 ff ff ff e8 34 4d 91
      f9 89 ee 4c 89 e7 e8 4a 47 60 ff e9 a6 fc ff ff e8 20 4d 91 f9 <0f> 0b
      e9 84 fe ff ff e8 14 4d 91 f9 0f 0b e9 d4 fd ff ff e8 08 4d
      RSP: 0018:ffffc9001b35fa78 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000002879d0 RCX: ffff8881326f3b00
      RDX: 0000000000000000 RSI: ffff8881326f3b00 RDI: 0000000000000002
      RBP: ffff888179662674 R08: ffffffff87e983a0 R09: 0000000000000000
      R10: 0000000000000005 R11: 00000000000004ea R12: ffff888179662400
      R13: ffff888179662428 R14: 0000000000000001 R15: ffff88817e38e258
      FS:  0000000000000000(0000) GS:ffff8881f5f00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020007bc0 CR3: 0000000179592000 CR4: 0000000000150ee0
      Call Trace:
       <TASK>
       __sk_destruct+0x4f/0x8e0 net/core/sock.c:2067
       sk_destruct+0xbd/0xe0 net/core/sock.c:2112
       __sk_free+0xef/0x3d0 net/core/sock.c:2123
       sk_free+0x78/0xa0 net/core/sock.c:2134
       sock_put include/net/sock.h:1927 [inline]
       __mptcp_close_ssk+0x50f/0x780 net/mptcp/protocol.c:2351
       __mptcp_destroy_sock+0x332/0x760 net/mptcp/protocol.c:2828
       mptcp_worker+0x5d2/0xc90 net/mptcp/protocol.c:2586
       process_one_work+0x9cc/0x1650 kernel/workqueue.c:2289
       worker_thread+0x623/0x1070 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
       </TASK>
      
      The root cause of the problem is that an mptcp-level (re)transmit can
      race with mptcp_close(), and the packet scheduler checks the subflow
      state before acquiring the socket lock, so we can try to (re)transmit
      on an already closed ssk.

      Fix the issue by checking the subflow socket status again under the
      subflow socket lock protection. Additionally, add the missing check
      for the fallback-to-TCP case.
      
      Fixes: d5f49190 ("mptcp: allow picking different xmit subflows")
      Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
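      The "check again under the lock" pattern can be sketched in userspace C. This is an illustrative model under stated assumptions. A pthread mutex stands in for the subflow socket lock, and an `atomic_bool` stands in for the TCP state. Every name below is hypothetical rather than MPTCP code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subflow model. */
struct subflow {
    pthread_mutex_t lock;
    atomic_bool closed; /* written only while holding `lock` */
    size_t queued;      /* bytes queued for (re)transmit, under `lock` */
};

void subflow_init(struct subflow *sf)
{
    assert(sf != NULL);
    pthread_mutex_init(&sf->lock, NULL);
    atomic_init(&sf->closed, false);
    sf->queued = 0;
}

/* Close path: marks the subflow closed under its lock. */
void subflow_close(struct subflow *sf)
{
    pthread_mutex_lock(&sf->lock);
    atomic_store(&sf->closed, true);
    pthread_mutex_unlock(&sf->lock);
}

/* Scheduler-style pick: a lockless hint that may already be stale. */
bool subflow_looks_usable(struct subflow *sf)
{
    return !atomic_load(&sf->closed);
}

/* Transmit path: re-check the state after taking the lock, so nothing is
 * queued on a subflow that was closed after the scheduler picked it. */
bool subflow_push(struct subflow *sf, size_t len)
{
    bool ok = false;

    pthread_mutex_lock(&sf->lock);
    if (!atomic_load(&sf->closed)) {
        sf->queued += len;
        ok = true;
    }
    pthread_mutex_unlock(&sf->lock);
    return ok;
}
```

      The lockless check remains useful as a cheap scheduling hint. Only the re-check under the lock is authoritative.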
    • mptcp: move subflow cleanup in mptcp_destroy_common() · c0bf3c6a
      Paolo Abeni authored
      If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
      eBPF program, the MPTCP protocol ends up leaking all the subflows:
      the related cleanup happens in __mptcp_destroy_sock(), which is not
      invoked in that code path.

      Address the issue by moving the subflow sockets cleanup into the
      mptcp_destroy_common() helper, which is invoked in every msk cleanup
      path.

      Additionally, get rid of the intermediate list_splice_init step, which
      is an unneeded relic from the past.

      The issue has been present since before the reported root cause
      commit, but any attempt to backport the fix before that hash would
      require a complete rewrite.
      
      Fixes: e16163b6 ("mptcp: refactor shutdown and close")
      Reported-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Co-developed-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
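      The fix moves cleanup into a helper that every teardown path calls. Its shape can be sketched in C. This is a hypothetical userspace model. `destroy_common()`, `msk_close()` and `msk_create_failed()` only loosely mirror mptcp_destroy_common(), __mptcp_destroy_sock() and the eBPF-rejected creation path. `live_subflows` exists solely to make leaks observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: an msk owning a singly linked list of subflows. */
struct subflow_node { struct subflow_node *next; };
struct msk { struct subflow_node *subflows; };

int live_subflows; /* leak counter for illustration only */

int msk_add_subflow(struct msk *msk)
{
    struct subflow_node *sf = malloc(sizeof(*sf));

    if (!sf)
        return -1;
    sf->next = msk->subflows;
    msk->subflows = sf;
    live_subflows++;
    return 0;
}

/* Shared teardown helper: releases every subflow, whichever path got
 * here (the role the commit gives mptcp_destroy_common()). */
void destroy_common(struct msk *msk)
{
    assert(msk != NULL);
    struct subflow_node *sf = msk->subflows;

    while (sf) {
        struct subflow_node *next = sf->next;

        free(sf);
        live_subflows--;
        sf = next;
    }
    msk->subflows = NULL;
}

/* Full close path (loosely, __mptcp_destroy_sock()). */
void msk_close(struct msk *msk)
{
    /* ...close-only work would go here... */
    destroy_common(msk);
}

/* Early-failure path, e.g. creation rejected by a cgroup eBPF program.
 * With cleanup only in the close path, this path would leak subflows. */
void msk_create_failed(struct msk *msk)
{
    destroy_common(msk);
}
```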
  3. 04 Aug, 2022 7 commits
  4. 03 Aug, 2022 17 commits
    • Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · f86d1fbb
      Linus Torvalds authored
      Pull networking changes from Paolo Abeni:
       "Core:
      
         - Refactor the forward memory allocation to better cope with memory
           pressure with many open sockets, moving from a per socket cache to
           a per-CPU one
      
         - Replace rwlocks with RCU for better fairness in ping, raw sockets
           and IP multicast router.
      
         - Network-side support for io_uring zero-copy send.

         - A few skb drop reason improvements, including generating the source
           file with the string mapping instead of using macro magic.
      
         - Rename reference tracking helpers to a more consistent netdev_*
           schema.
      
         - Adapt u64_stats_t type to address load/store tearing issues.
      
         - Refine debug helper usage to reduce the log noise caused by bots.
      
        BPF:
      
         - Improve socket map performance, avoiding skb cloning on read
           operation.
      
         - Add support for 64-bit enums, to match types exposed by the kernel.
      
         - Introduce support for sleepable uprobes program.
      
         - Introduce support for enum textual representation in libbpf.
      
         - New helpers to implement synproxy with eBPF/XDP.
      
         - Improve loop performance, inlining indirect calls when possible.

         - Remove all the deprecated libbpf APIs.

         - Implement new eBPF-based LSM flavor.

         - Add type match support, which allows accurate queries to the eBPF
           used types.

         - A few TCP congestion control framework usability improvements.
      
         - Add new infrastructure to manipulate CT entries via eBPF programs.
      
         - Allow for livepatch (KLP) and BPF trampolines to attach to the same
           kernel function.
      
        Protocols:
      
         - Introduce per network namespace lookup tables for unix sockets,
           increasing scalability and reducing contention.
      
         - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
      
         - Add support to forcibly close TIME_WAIT TCP sockets via user-space
           tools.

         - Significant performance improvement for the TLS 1.3 receive path,
           both for zero-copy and non-zero-copy.

         - Support for changing the initial MPTCP subflow priority/backup
           status.

         - Introduce virtually contiguous buffers for sockets over RDMA, to
           cope better with memory pressure.
      
         - Extend CAN ethtool support with timestamping capabilities
      
         - Refactor CAN build infrastructure to allow building only the needed
           features.
      
        Driver API:
      
         - Remove devlink mutex to allow parallel commands on multiple links.
      
         - Add support for pause stats in distributed switch.
      
         - Implement devlink helpers to query and flash line cards.
      
         - New helper for phy mode to register conversion.
      
        New hardware / drivers:
      
         - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
      
         - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
      
         - Ethernet DSA driver for the Microchip LAN937x switch.
      
         - Ethernet PHY driver for the Aquantia AQR113C EPHY.
      
         - CAN driver for the OBD-II ELM327 interface.
      
         - CAN driver for RZ/N1 SJA1000 CAN controller.
      
         - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
      
        Drivers:
      
         - Intel Ethernet NICs:
            - i40e: add support for vlan pruning
            - i40e: add support for XDP fragmented packets
            - ice: improved vlan offload support
            - ice: add support for PPPoE offload
      
         - Mellanox Ethernet (mlx5)
            - refactor packet steering offload for performance and scalability
            - extend support for TC offload
            - refactor devlink code to clean-up the locking schema
            - support stacked vlans for bridge offloads
            - use TLS objects pool to improve connection rate
      
         - Netronome Ethernet NICs (nfp):
            - extend support for IPv6 fields mangling offload
            - add support for vepa mode in HW bridge
            - better support for virtio data path acceleration (VDPA)
            - enable TSO by default
      
         - Microsoft vNIC driver (mana)
            - add support for XDP redirect
      
         - Others Ethernet drivers:
            - bonding: add per-port priority support
            - microchip lan743x: extend phy support
            - Fungible funeth: support UDP segmentation offload and XDP xmit
            - Solarflare EF100: add support for virtual function representors
            - MediaTek SoC: add XDP support
      
         - Mellanox Ethernet/IB switch (mlxsw):
            - dropped support for unreleased H/W (XM router).
            - improved stats accuracy
            - unified bridge model conversion improving scalability (parts 1-6)
            - support for PTP in Spectrum-2 asics
      
         - Broadcom PHYs
            - add PTP support for BCM54210E
            - add support for the BCM53128 internal PHY
      
         - Marvell Ethernet switches (prestera):
            - implement support for multicast forwarding offload
      
         - Embedded Ethernet switches:
            - refactor OcteonTx MAC filter for better scalability
            - improve TC H/W offload for the Felix driver
            - refactor the Microchip ksz8 and ksz9477 drivers to share the
              probe code (parts 1, 2), add support for phylink mac
              configuration
      
         - Other WiFi:
            - Microchip wilc1000: disable WEP support and enable WPA3
            - Atheros ath10k: encapsulation offload support
      
        Old code removal:
      
         - Neterion vxge ethernet driver: this has been untouched for more than 10 years"
      
      * tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
        doc: sfp-phylink: Fix a broken reference
        wireguard: selftests: support UML
        wireguard: allowedips: don't corrupt stack when detecting overflow
        wireguard: selftests: update config fragments
        wireguard: ratelimiter: use hrtimer in selftest
        net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
        net: usb: ax88179_178a: Bind only to vendor-specific interface
        selftests: net: fix IOAM test skip return code
        net: usb: make USB_RTL8153_ECM non user configurable
        net: marvell: prestera: remove reduntant code
        octeontx2-pf: Reduce minimum mtu size to 60
        net: devlink: Fix missing mutex_unlock() call
        net/tls: Remove redundant workqueue flush before destroy
        net: txgbe: Fix an error handling path in txgbe_probe()
        net: dsa: Fix spelling mistakes and cleanup code
        Documentation: devlink: add add devlink-selftests to the table of contents
        dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
        net: ionic: fix error check for vlan flags in ionic_set_nic_features()
        net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
        nfp: flower: add support for tunnel offload without key ID
        ...
    • Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 526942b8
      Linus Torvalds authored
      Pull ATA updates from Damien Le Moal:
      
       - Some code refactoring for the pata_hpt37x and pata_hpt3x2n drivers,
         from Sergei.
      
       - Several cleanup patches for the libata-core, libata-scsi and
         libata-eh code: fix argument and variable types, change some function
         declarations to static, and fix a typo in a comment. From Sergey
         and Xiang.

       - Fix a compilation warning in the pata_macio driver, from me.

       - A fix for the expected number of resources in the sata_mv driver,
         from Andrew.
      
      * tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: sata_mv: Fixes expected number of resources now IRQs are gone
        ata: libata-scsi: fix result type of ata_ioc32()
        ata: pata_macio: Fix compilation warning
        ata: libata-eh: fix sloppy result type of ata_internal_cmd_timeout()
        ata: libata-core: fix sloppy parameter type in ata_exec_internal[_sg]()
        ata: make ata_port::fastdrain_cnt *unsigned int*
        ata: libata-eh: fix sloppy result type of ata_eh_nr_in_flight()
        ata: libata-core: make ata_exec_internal_sg() *static*
        ata: make transfer mode masks *unsigned int*
        ata: libata-core: get rid of *else* branches in ata_id_n_sectors()
        ata: libata-core: fix sloppy typing in ata_id_n_sectors()
        ata: pata_hpt3x2n: pass base DPLL frequency to hpt3x2n_pci_clock()
        ata: pata_hpt37x: merge hpt374_read_freq() to hpt37x_pci_clock()
        ata: pata_hpt37x: factor out hpt37x_pci_clock()
        ata: pata_hpt37x: move claculating PCI clock from hpt37x_clock_slot()
        ata: libata: Fix syntax errors in comments
    • Merge tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · a39b5dbd
      Linus Torvalds authored
      Pull zonefs update from Damien Le Moal:
       "A single change for this cycle to simplify handling of the memory page
        used as super block buffer during mount (from Fabio)"
      
      * tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonefs: Call page_address() on page acquired with GFP_KERNEL flag
    • Merge tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f18d7309
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "The most notable change in this first batch is that we no longer
        schedule pages beyond i_size for writeback, preferring instead to let
        truncate deal with those pages.
      
        Next week, there may be a second pull request to remove
        iomap_writepage from the other two filesystems (gfs2/zonefs) that use
        iomap for buffered IO. This follows in the same vein as the recent
        removal of writepage from XFS, since it hasn't been triggered in a few
        years; it does nothing during direct reclaim; and as far as the people
        who examined the patchset can tell, it's moving the codebase in the
        right direction.
      
        However, as it was a late addition to for-next, I'm holding off on
        that section for another week of testing to see if anyone can come up
        with a solid reason for holding off in the meantime.
      
        Summary:
      
         - Skip writeback for pages that are completely beyond EOF
      
         - Minor code cleanups"
      
      * tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        dax: set did_zero to true when zeroing successfully
        iomap: set did_zero to true when zeroing successfully
        iomap: skip pages past eof in iomap_do_writepage()
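      The "completely beyond EOF" test from the summary reduces to comparing a page's first byte with i_size. Here is a hedged C sketch. The helper name is hypothetical, and plain byte offsets stand in for the kernel's folio and inode types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a page is completely beyond EOF when its first
 * byte (index * page_size) lies at or past i_size. Such pages can be
 * skipped by writeback and left for truncate to deal with. */
bool page_fully_beyond_eof(uint64_t index, uint64_t page_size, uint64_t i_size)
{
    assert(page_size != 0);
    return index * page_size >= i_size;
}
```

      A page that merely straddles EOF is still written back, because it holds valid data before i_size.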
    • Merge tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 2e4f8c72
      Linus Torvalds authored
      Pull affs fix from David Sterba:
       "One update to AFFS, switching away from the kmap/kmap_atomic API"
      
      * tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        affs: use memcpy_to_page and remove replace kmap_atomic()
    • Merge tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 353767e4
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "This brings some long awaited changes, the send protocol bump,
        otherwise lots of small improvements and fixes. The main core part is
        reworking bio handling, cleaning up the submission and endio and
        improving error handling.
      
        There are some changes outside of btrfs adding helpers or updating
        API, listed at the end of the changelog.
      
        Features:
      
         - sysfs:
            - export chunk size, in debug mode add tunable for setting its size
            - show zoned among features (was only in debug mode)
            - show commit stats (number, last/max/total duration)
      
         - send protocol updated to 2
            - new commands:
               - ability to write data chunks larger than 64K
               - send raw compressed extents (uses the encoded data ioctls),
                 i.e. no decompression on send side, no compression needed on
                 receive side if supported
               - send 'otime' (inode creation time) among other timestamps
               - send file attributes (a.k.a. file flags and xflags)
            - this is the first version bump; backward compatibility on send
              and receive side is provided
            - there are still some known and wanted commands that will be
              implemented in the near future, another version bump will be
              needed, however we want to minimize that to avoid causing
              usability issues
      
         - print checksum type and implementation at mount time
      
         - don't print some messages at mount (mentioned as people asked about
           it), we want to print messages namely for new features so let's
           make some space for that
            - big metadata - this has been supported for a long time and is
              not a feature that's worth mentioning
            - skinny metadata - same reason, set by default by mkfs
      
        Performance improvements:
      
         - reduced amount of reserved metadata for delayed items
            - when inserted items can be batched into one leaf
            - when deleting batched directory index items
            - when deleting delayed items used for deletion
            - overall improved count of files/sec, decreased subvolume lock
              contention
      
         - metadata item access bounds checker micro-optimized, with a few
           percent of improved runtime for metadata-heavy operations
      
         - increase direct io limit for read to 256 sectors, improved
           throughput by 3x on sample workload
      
        Notable fixes:
      
         - raid56
            - reduce parity writes, skip sectors of stripe when there are no
              data updates
            - restore reading from on-disk data instead of using stripe cache,
              this reduces chances to damage correct data due to RMW cycle
      
         - refuse to replay log with unknown incompat read-only feature bit
           set
      
         - zoned
            - fix page locking when COW fails in the middle of allocation
            - improved tracking of active zones, ZNS drives may limit the
              number and there are ENOSPC errors due to that limit and not
              actual lack of space
            - adjust maximum extent size for zone append so it does not cause
              late ENOSPC due to underreservation
      
         - mirror reading error messages show the mirror number
      
         - don't fall back to buffered IO for NOWAIT direct IO writes, we
           don't have the NOWAIT semantics for buffered IO yet
      
         - send, fix sending link commands for existing file paths when there
           are deleted and created hardlinks for same files
      
         - repair all mirrors for profiles with more than 1 copy (raid1c34)
      
         - fix repair of compressed extents, unify where error detection and
           repair happen
      
        Core changes:
      
         - bio completion cleanups
            - don't double defer compression bios
            - simplify endio workqueues
            - add more data to btrfs_bio to avoid allocation for read requests
            - rework bio error handling so it's the same as what the block
              layer does: the submission works and errors are consumed in endio
            - when asynchronous bio offload fails fall back to synchronous
              checksum calculation to avoid errors under writeback or memory
              pressure
      
         - new trace points
            - raid56 events
            - ordered extent operations
      
         - super block log_root_transid deprecated (never used)
      
         - mixed_backref and big_metadata sysfs feature files removed, they've
           been default for sufficiently long time, there are no known users
           and mixed_backref could be confused with mixed_groups
      
        Non-btrfs changes, API updates:
      
         - minor highmem API update to cover const arguments
      
         - switch all kmap/kmap_atomic to kmap_local
      
         - remove redundant flush_dcache_page()
      
         - address_space_operations::writepage callback removed
      
         - add bdev_max_segments() helper"
      
      * tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
        btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
        btrfs: fix repair of compressed extents
        btrfs: remove the start argument to check_data_csum and export
        btrfs: pass a btrfs_bio to btrfs_repair_one_sector
        btrfs: simplify the pending I/O counting in struct compressed_bio
        btrfs: repair all known bad mirrors
        btrfs: merge btrfs_dev_stat_print_on_error with its only caller
        btrfs: join running log transaction when logging new name
        btrfs: simplify error handling in btrfs_lookup_dentry
        btrfs: send: always use the rbtree based inode ref management infrastructure
        btrfs: send: fix sending link commands for existing file paths
        btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
        btrfs: zoned: wait until zone is finished when allocation didn't progress
        btrfs: zoned: write out partially allocated region
        btrfs: zoned: activate necessary block group
        btrfs: zoned: activate metadata block group on flush_space
        btrfs: zoned: disable metadata overcommit for zoned
        btrfs: zoned: introduce space_info->active_total_bytes
        btrfs: zoned: finish least available block group on data bg allocation
        btrfs: let can_allocate_chunk return error
        ...
    • Merge tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · ab17c0cd
      Linus Torvalds authored
      Pull efivars sysfs interface removal from Ard Biesheuvel:
       "Remove the obsolete 'efivars' sysfs based interface to the EFI
        variable store, now that all users have moved to the efivarfs pseudo
        file system, which was created ~10 years ago to address some
        fundamental shortcomings in the sysfs based driver.
      
        Move the 'business logic' related to which EFI variables are important
        and may affect the boot flow from the efivars support layer into the
        efivarfs pseudo file system, so it is no longer exposed to other parts
        of the kernel"
      
      * tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: vars: Move efivar caching layer into efivarfs
        efi: vars: Switch to new wrapper layer
        efi: vars: Remove deprecated 'efivars' sysfs interface
    • Merge tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 97a77ab1
      Linus Torvalds authored
      Pull EFI updates from Ard Biesheuvel:
      
       - Enable mirrored memory for arm64
      
       - Fix up several abuses of the efivar API
      
       - Refactor the efivar API in preparation for moving the 'business
         logic' part of it into efivarfs
      
       - Enable ACPI PRM on arm64
      
      * tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
        ACPI: Move PRM config option under the main ACPI config
        ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64
        ACPI: PRM: Change handler_addr type to void pointer
        efi: Simplify arch_efi_call_virt() macro
        drivers: fix typo in firmware/efi/memmap.c
        efi: vars: Drop __efivar_entry_iter() helper which is no longer used
        efi: vars: Use locking version to iterate over efivars linked lists
        efi: pstore: Omit efivars caching EFI varstore access layer
        efi: vars: Add thin wrapper around EFI get/set variable interface
        efi: vars: Don't drop lock in the middle of efivar_init()
        pstore: Add priv field to pstore_record for backend specific use
        Input: applespi - avoid efivars API and invoke EFI services directly
        selftests/kexec: remove broken EFI_VARS secure boot fallback check
        brcmfmac: Switch to appropriate helper to load EFI variable contents
        iwlwifi: Switch to proper EFI variable store interface
        media: atomisp_gmin_platform: stop abusing efivar API
        efi: efibc: avoid efivar API for setting variables
        efi: avoid efivars layer when loading SSDTs from variables
        efi: Correct comment on efi_memmap_alloc
        memblock: Disable mirror feature if kernelcore is not specified
        ...
    • Merge tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ff89dd08
      Linus Torvalds authored
      Pull 9p iov_iter fix from Al Viro:
       "net/9p abuses iov_iter primitives - it attempts to copy _from_ a
        destination-only iov_iter when it handles Rerror arriving in reply to
        a zero-copy request. Not hard to fix, fortunately.
      
        This is a prerequisite for the iov_iter_get_pages() work in the second
        part of the iov_iter series; it ended up in a separate branch"
      
      * tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        9p: handling Rerror without copy_from_iter_full()
      ff89dd08
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d9b58ab7
      Linus Torvalds authored
      Pull copy_to_iter_mc fix from Al Viro:
       "Backportable fix for copy_to_iter_mc() - the second part of iov_iter
        work will pretty much overwrite this, but would be much harder to
        backport"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix short copy handling in copy_mc_pipe_to_iter()
      d9b58ab7
    • Merge tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5264406c
      Linus Torvalds authored
      Pull vfs iov_iter updates from Al Viro:
       "Part 1 - isolated cleanups and optimizations.
      
        One of the goals is to reduce the overhead of using ->read_iter() and
        ->write_iter() instead of ->read()/->write().
      
        new_sync_{read,write}() has a surprising amount of overhead, in
        particular inside iocb_flags(). That explains why the beginning of
        the series is in this pile; it's not directly iov_iter-related, but
        it's a part of the same work..."
      
      * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        first_iovec_segment(): just return address
        iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
        iov_iter: first_{iovec,bvec}_segment() - simplify a bit
        iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
        iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
        iov_iter_bvec_advance(): don't bother with bvec_iter
        copy_page_{to,from}_iter(): switch iovec variants to generic
        keep iocb_flags() result cached in struct file
        iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
        struct file: use anonymous union member for rcuhead and llist
        btrfs: use IOMAP_DIO_NOSYNC
        teach iomap_dio_rw() to suppress dsync
        No need of likely/unlikely on calls of check_copy_size()
      5264406c
    • Merge tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 200e340f
      Linus Torvalds authored
      Pull vfs dcache updates from Al Viro:
       "The main part here is making parallel lookups safe for RT - making
        sure preemption is disabled in start_dir_add()/ end_dir_add() sections
        (on non-RT it's automatic, on RT it needs to be done explicitly)
        and moving wakeups from __d_lookup_done() inside of such to the end of
        those sections.
      
        Wakeups can be safely delayed for as long as ->d_lock on the in-lookup
        dentry is held; proving that has caught a bug in d_add_ci() that
        allows memory corruption when a sufficiently bogus ntfs (or
        case-insensitive xfs) image is mounted. Easily fixed, fortunately"
      
      * tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/dcache: Move wakeup out of i_seq_dir write held region.
        fs/dcache: Move the wakeup from __d_lookup_done() to the caller.
        fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT
        d_add_ci(): make sure we don't miss d_lookup_done()
      200e340f
    • Merge tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a782e866
      Linus Torvalds authored
      Pull vfs lseek updates from Al Viro:
       "Jason's lseek series.
      
        Saner handling of 'lseek should fail with ESPIPE' - this gets rid of
        the magical no_llseek thing and makes checks consistent.
      
        In particular, the ad-hoc "can we do splice via internal pipe" checks
        got saner (and somewhat more permissive, which is what Jason had been
        after, AFAICT)"
      
      * tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: remove no_llseek
        fs: check FMODE_LSEEK to control internal pipe splicing
        vfio: do not set FMODE_LSEEK flag
        dma-buf: remove useless FMODE_LSEEK flag
        fs: do not compare against ->llseek
        fs: clear or set FMODE_LSEEK based on llseek function
      a782e866
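The user-visible contract behind the lseek series is the POSIX rule that lseek() on a pipe, FIFO or socket fails with ESPIPE. The following is a minimal userspace sketch, not code from the kernel tree, that checks this rule on a freshly created pipe; the helper name pipe_lseek_errno() is hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno reported by lseek() on the read
 * end of a new pipe, 0 if lseek() unexpectedly succeeded, or -1 if the
 * pipe could not be created. */
int pipe_lseek_errno(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int err = 0;
    /* Pipes are not seekable, so POSIX requires lseek() to fail with ESPIPE. */
    if (lseek(fds[0], 0, SEEK_CUR) == (off_t)-1)
        err = errno; /* save errno before close() can overwrite it */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A caller would expect pipe_lseek_errno() to return ESPIPE on any POSIX system; the errno value is captured immediately after the failing lseek() so that the later close() calls cannot clobber it.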
    • Merge tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d9395512
      Linus Torvalds authored
      Pull vfs namei updates from Al Viro:
       "RCU pathwalk cleanups.
      
        Storing sampled ->d_seq of the next dentry in nameidata simplifies
        life considerably, especially if we delay fetching ->d_inode until
        step_into()"
      
      * tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        step_into(): move fetching ->d_inode past handle_mounts()
        lookup_fast(): don't bother with inode
        follow_dotdot{,_rcu}(): don't bother with inode
        step_into(): lose inode argument
        namei: stash the sampled ->d_seq into nameidata
        namei: move clearing LOOKUP_RCU towards rcu_read_unlock()
        switch try_to_unlazy_next() to __legitimize_mnt()
        follow_dotdot{,_rcu}(): change calling conventions
        namei: get rid of pointless unlikely(read_seqcount_retry(...))
        __follow_mount_rcu(): verify that mount_lock remains unchanged
      d9395512
    • Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache · f0065400
      Linus Torvalds authored
      Pull folio updates from Matthew Wilcox:
      
       - Fix an accounting bug that made NR_FILE_DIRTY grow without limit
         when running xfstests
      
       - Convert more of mpage to use folios
      
       - Remove add_to_page_cache() and add_to_page_cache_locked()
      
       - Convert find_get_pages_range() to filemap_get_folios()
      
       - Improvements to the read_cache_page() family of functions
      
       - Remove a few unnecessary checks of PageError
      
       - Some straightforward filesystem conversions to use folios
      
       - Split PageMovable users out from address_space_operations into
         their own movable_operations
      
       - Convert aops->migratepage to aops->migrate_folio
      
       - Remove nobh support (Christoph Hellwig)
      
      * tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
        fs: remove the NULL get_block case in mpage_writepages
        fs: don't call ->writepage from __mpage_writepage
        fs: remove the nobh helpers
        jfs: stop using the nobh helper
        ext2: remove nobh support
        ntfs3: refactor ntfs_writepages
        mm/folio-compat: Remove migration compatibility functions
        fs: Remove aops->migratepage()
        secretmem: Convert to migrate_folio
        hugetlb: Convert to migrate_folio
        aio: Convert to migrate_folio
        f2fs: Convert to filemap_migrate_folio()
        ubifs: Convert to filemap_migrate_folio()
        btrfs: Convert btrfs_migratepage to migrate_folio
        mm/migrate: Add filemap_migrate_folio()
        mm/migrate: Convert migrate_page() to migrate_folio()
        nfs: Convert to migrate_folio
        btrfs: Convert btree_migratepage to migrate_folio
        mm/migrate: Convert expected_page_refs() to folio_expected_refs()
        mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
        ...
      f0065400
    • Merge tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray · e087437a
      Linus Torvalds authored
      Pull XArray/IDR updates from Matthew Wilcox:
      
       - Add appropriate might_alloc() annotations to the XArray APIs
      
       - Document that the IDR is deprecated
      
      * tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray:
        IDR: Note that the IDR API is deprecated
        XArray: Add calls to might_alloc()
      e087437a
    • Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · b6bb70f9
      Linus Torvalds authored
      Pull cgroup updates from Tejun Heo:
       "Several core optimizations:
      
         - threadgroup_rwsem write locking is skipped when configuring
           controllers in empty subtrees.
      
           Combined with CLONE_INTO_CGROUP, this allows the common static
           usage pattern to not grab threadgroup_rwsem at all (glibc still
           doesn't seem ready for CLONE_INTO_CGROUP unfortunately).
      
         - threadgroup_rwsem used to be put into non-percpu mode by default
           due to latency concerns in specific use cases. There's no reason
           for everyone else to pay for it. Make the behavior optional.
      
         - psi no longer allocates memory when disabled.
      
        ... along with some code cleanups"
      
      * tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: Skip subtree root in cgroup_update_dfl_csses()
        cgroup: remove "no" prefixed mount options
        cgroup: Make !percpu threadgroup_rwsem operations optional
        cgroup: Add "no" prefixed mount options
        cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
        cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
        cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
        psi: dont alloc memory for psi by default
      b6bb70f9