1. 09 Mar, 2024 1 commit
  2. 08 Mar, 2024 7 commits
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 9831e35e
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-03-06 (igc, igb, ice)
      
      This series contains updates to igc, igb, and ice drivers.
      
      Vinicius removes double clearing of interrupt register which could cause
      timestamp events to be missed on igc and igb.
      
      Przemek corrects calculation of statistics which caused incorrect spikes
      in reporting for ice driver.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: fix stats being updated by way too large values
        igb: Fix missing time sync events
        igc: Fix missing time sync events
      ====================
      
      Link: https://lore.kernel.org/r/20240306182617.625932-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dpll: fix dpll_xa_ref_*_del() for multiple registrations · b446631f
      Jiri Pirko authored
      Currently, if the same pin is registered multiple times on the
      same dpll device, the following warnings are observed:
      WARNING: CPU: 5 PID: 2212 at drivers/dpll/dpll_core.c:143 dpll_xa_ref_pin_del.isra.0+0x21e/0x230
      WARNING: CPU: 5 PID: 2212 at drivers/dpll/dpll_core.c:223 __dpll_pin_unregister+0x2b3/0x2c0
      
      The problem is that both dpll_xa_ref_dpll_del() and
      dpll_xa_ref_pin_del() remove the registration from the list only when
      the reference count drops to zero. That is wrong: the registration
      always has to be removed.
      
      To fix this, remove the registration from the list and free it
      unconditionally, instead of doing so only when the reference
      counter reaches zero.
      
      Fixes: 9431063a ("dpll: core: Add DPLL framework base functions")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
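The dpll fix above can be illustrated with a small model. This is only a sketch in Python, not the kernel code: the `PinRef` class and its method names are hypothetical, and the real functions operate on xarray-backed references. It shows the fixed behaviour, where every delete removes its own registration and the reference itself goes away only on the last one.

```python
class PinRef:
    """Toy stand-in for a dpll<->pin reference: a refcount plus a registration list."""

    def __init__(self):
        self.refcount = 0
        self.registrations = []  # (ops, priv) pairs, one per registration

    def add(self, ops, priv):
        # Every registration is recorded and takes one reference.
        self.registrations.append((ops, priv))
        self.refcount += 1

    def delete(self, ops, priv):
        """Remove one registration; return True when the ref itself is gone."""
        # Fixed behaviour: drop the matching registration unconditionally.
        self.registrations.remove((ops, priv))
        self.refcount -= 1
        if self.refcount == 0:
            # Stand-in for the warning condition: nothing may be left behind.
            assert not self.registrations, "stale registration left on list"
            return True
        return False


ref = PinRef()
ref.add("ops_a", "priv_a")
ref.add("ops_b", "priv_b")   # second registration of the same pin
first_gone = ref.delete("ops_a", "priv_a")
second_gone = ref.delete("ops_b", "priv_b")
```

If the removal were done only when the count reaches zero, the first registration would still be on the list at the final delete, which is the situation the warnings point at.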
    • net: phy: fix phy_get_internal_delay accessing an empty array · 4469c0c5
      Kévin L'hôpital authored
      The phy_get_internal_delay() function could access an empty array
      when a driver calls it without defining delay_values and
      rx-internal-delay-ps or tx-internal-delay-ps is set to 0 in the
      device tree. This leads to "unable to handle kernel NULL pointer
      dereference at virtual address 0". To avoid this kernel oops, the
      test should be delay >= 0. As there is already a delay < 0 test
      just before, the test can simply be size == 0.
      
      Fixes: 92252eec ("net: phy: Add a helper to return the index for of the internal delay")
      Co-developed-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
      Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
      Signed-off-by: Kévin L'hôpital <kevin.lhopital@savoirfairelinux.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
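The phy_get_internal_delay() check described above can be sketched as follows. This is a simplified Python model under assumptions: the function name `internal_delay_index` is hypothetical, the table is assumed to be sorted in ascending order, and the closest-entry search is a stand-in for whatever the kernel helper does. The point is only the order of the guards: the empty-table test comes before any table access and no longer depends on the delay being non-zero.

```python
import errno


def internal_delay_index(delay, delay_values):
    """Return the table index for `delay`, or `delay` itself if there is no table."""
    if delay < 0:
        return delay  # propagate the error from reading the property
    if not delay_values:
        # The fix: with no table, never index it, even when delay == 0.
        return delay
    if delay < delay_values[0] or delay > delay_values[-1]:
        return -errno.EINVAL
    # Pick the closest entry of the ascending table (simplified).
    return min(range(len(delay_values)),
               key=lambda i: abs(delay_values[i] - delay))
```

With an empty table, a delay of 0 is now returned as-is instead of being compared against a first element that does not exist.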
    • octeontx2-af: Fix devlink params · fc1b2901
      Sunil Goutham authored
      The devlink param for adjusting the NPC MCAM high zone
      area is in the wrong param list and is not getting
      activated on CN10KA silicon.
      This patch fixes the issue.
      
      Fixes: dd784287 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip_tunnel: make sure to pull inner header in ip_tunnel_rcv() · b0ec2abf
      Eric Dumazet authored
      Apply the same fix as the ones found in:
      
      8d975c15 ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")
      1ca1ba46 ("geneve: make sure to pull inner header in geneve_rx()")
      
      We have to save skb->network_header in a temporary variable
      in order to be able to recompute the network_header pointer
      after a pskb_inet_may_pull() call.
      
      pskb_inet_may_pull() makes sure the needed headers are in skb->head.
      
      syzbot reported:
      BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
       BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
       BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
       BUG: KMSAN: uninit-value in ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409
        __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
        INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
        IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
        ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409
        __ipgre_rcv+0x9bc/0xbc0 net/ipv4/ip_gre.c:389
        ipgre_rcv net/ipv4/ip_gre.c:411 [inline]
        gre_rcv+0x423/0x19f0 net/ipv4/ip_gre.c:447
        gre_rcv+0x2a4/0x390 net/ipv4/gre_demux.c:163
        ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205
        ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
        dst_input include/net/dst.h:461 [inline]
        ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569
        __netif_receive_skb_one_core net/core/dev.c:5534 [inline]
        __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648
        netif_receive_skb_internal net/core/dev.c:5734 [inline]
        netif_receive_skb+0x58/0x660 net/core/dev.c:5793
        tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1556
        tun_get_user+0x53b9/0x66e0 drivers/net/tun.c:2009
        tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055
        call_write_iter include/linux/fs.h:2087 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xb6b/0x1520 fs/read_write.c:590
        ksys_write+0x20f/0x4c0 fs/read_write.c:643
        __do_sys_write fs/read_write.c:655 [inline]
        __se_sys_write fs/read_write.c:652 [inline]
        __x64_sys_write+0x93/0xd0 fs/read_write.c:652
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
        __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590
        alloc_pages_mpol+0x62b/0x9d0 mm/mempolicy.c:2133
        alloc_pages+0x1be/0x1e0 mm/mempolicy.c:2204
        skb_page_frag_refill+0x2bf/0x7c0 net/core/sock.c:2909
        tun_build_skb drivers/net/tun.c:1686 [inline]
        tun_get_user+0xe0a/0x66e0 drivers/net/tun.c:1826
        tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055
        call_write_iter include/linux/fs.h:2087 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xb6b/0x1520 fs/read_write.c:590
        ksys_write+0x20f/0x4c0 fs/read_write.c:643
        __do_sys_write fs/read_write.c:655 [inline]
        __se_sys_write fs/read_write.c:652 [inline]
        __x64_sys_write+0x93/0xd0 fs/read_write.c:652
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
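The "save the offset, then recompute the pointer" idea from the ip_tunnel fix above can be shown with a toy model. This is a Python sketch with a hypothetical `Skb` class; it is not the kernel's sk_buff API, and header sizes here are arbitrary 8-byte strings. It shows why a header location should be kept as an offset from the buffer start when a pull may replace the buffer.

```python
class Skb:
    """Toy packet buffer: a linear `head` plus data still sitting in fragments."""

    def __init__(self, linear, frags):
        self.head = bytearray(linear)
        self.frags = bytearray(frags)

    def may_pull(self, length):
        """Make sure `length` bytes are linear; this may replace `head`."""
        if len(self.head) >= length:
            return True
        missing = length - len(self.head)
        if missing > len(self.frags):
            return False
        # Reallocation: a brand new buffer object, like a changed skb->head.
        self.head = self.head + self.frags[:missing]
        del self.frags[:missing]
        return True


skb = Skb(linear=b"OUTERHDR", frags=b"INNERHDRpayload")
outer_view = skb.head            # stale-prone "pointer" to the outer header
outer_off = 0                    # offset of the outer header from head
assert skb.may_pull(16)          # pull the inner header into the linear area
outer = bytes(skb.head[outer_off:outer_off + 8])  # recomputed from the offset
```

In Python the old `outer_view` object merely stops being the current buffer; in C the equivalent saved pointer could reference freed memory, which is why the offset is what gets saved.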
    • ipv6: fib6_rules: flush route cache when rule is changed · c4386ab4
      Shiming Cheng authored
      When a rule policy is changed, the ipv6 socket cache is not refreshed.
      The sock's skb still uses an outdated route cache and is sent to
      the wrong interface.
      
      To avoid this error, update the fib node's version when a rule is
      changed. The skb's route is then re-checked because the route cache
      version differs from the fib node version, and the route cache is
      refreshed to match the latest rule.
      
      Fixes: 101367c2 ("[IPV6]: Policy Routing Rules")
      Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
      Signed-off-by: Lena Wang <lena.wang@mediatek.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
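The version-based invalidation described in the fib6_rules commit above follows a common pattern, sketched here in Python. The `RouteTable` and `Socket` classes are hypothetical and much simpler than the kernel's fib6 structures; the sketch assumes a single global version counter, which may not match how the kernel scopes its serial numbers.

```python
class RouteTable:
    """Toy routing state guarded by a version (generation) counter."""

    def __init__(self, rules):
        self.rules = dict(rules)   # destination -> interface
        self.version = 1

    def change_rule(self, dst, iface):
        self.rules[dst] = iface
        self.version += 1          # the fix: a rule change bumps the version


class Socket:
    """Caches one route together with the version it was looked up under."""

    def __init__(self, table):
        self.table = table
        self.cached = None         # (interface, version)

    def route(self, dst):
        if self.cached is None or self.cached[1] != self.table.version:
            # Version mismatch: redo the lookup and refresh the cache.
            self.cached = (self.table.rules[dst], self.table.version)
        return self.cached[0]


table = RouteTable({"2001:db8::1": "eth0"})
sock = Socket(table)
before = sock.route("2001:db8::1")
table.change_rule("2001:db8::1", "eth1")
after = sock.route("2001:db8::1")
```

Without the version bump in `change_rule`, the second lookup would keep returning the cached `eth0`, which is the stale-route symptom the commit describes.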
    • net: dsa: microchip: make sure drive strength configuration is not lost by soft reset · e3fb8e8b
      Oleksij Rempel authored
      This driver has two separate reset sequences in different places:
      - gpio/HW reset on start of ksz_switch_register()
      - SW reset on start of ksz_setup()
      
      The second one will overwrite drive strength configuration made in the
      ksz_switch_register().
      
      To fix it, move ksz_parse_drive_strength() from ksz_switch_register() to
      ksz_setup().
      
      Fixes: d67d7247 ("net: dsa: microchip: Add drive strength configuration")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20240304135612.814404-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 07 Mar, 2024 20 commits
  4. 06 Mar, 2024 12 commits
    • netfilter: nft_ct: fix l3num expectations with inet pseudo family · 99993789
      Florian Westphal authored
      The following is rejected but should be allowed:
      
      table inet t {
              ct expectation exp1 {
                      [..]
                      l3proto ip
      
      Valid combos are:
      table ip t, l3proto ip
      table ip6 t, l3proto ip6
      table inet t, l3proto ip OR l3proto ip6
      
      Disallow the inet pseudo family: the l3num must be an on-wire protocol
      known to conntrack.
      
      Retain the NFPROTO_INET case to make it clear that it is rejected
      intentionally rather than as an oversight.
      
      Fixes: 8059918a ("netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
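The list of valid combinations in the nft_ct commit above can be written as a small check. This is a Python sketch, not the kernel function: `check_expect_l3num` is a hypothetical name, the NFPROTO_* values are as I recall them from the netfilter uapi header, and the specific error codes are assumptions chosen for illustration.

```python
import errno

# Values as recalled from the kernel's netfilter uapi header; treat as assumptions.
NFPROTO_INET = 1
NFPROTO_IPV4 = 2
NFPROTO_IPV6 = 10


def check_expect_l3num(table_family, l3num):
    """Return 0 if the combination is accepted, else a negative errno."""
    if l3num in (NFPROTO_IPV4, NFPROTO_IPV6):
        # An on-wire protocol is fine in its own table or in an inet table.
        if l3num == table_family or table_family == NFPROTO_INET:
            return 0
        return -errno.EINVAL
    if l3num == NFPROTO_INET:
        # Kept as an explicit case: inet is not an on-wire protocol.
        return -errno.EAFNOSUPPORT
    return -errno.EAFNOSUPPORT
```

The explicit NFPROTO_INET branch returns the same result as the fallthrough; it exists only to document that the rejection is deliberate, as the commit message says.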
    • netfilter: nf_tables: reject constant set with timeout · 5f4fc4bd
      Pablo Neira Ayuso authored
      This set combination is weird: it allows for elements to be
      added/deleted, but once bound to the rule it cannot be updated anymore.
      Eventually, all elements expire, leading to an empty set which cannot
      be updated anymore. Reject this flags combination.
      
      Cc: stable@vger.kernel.org
      Fixes: 761da293 ("netfilter: nf_tables: add set timeout API support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: disallow anonymous set with timeout flag · 16603605
      Pablo Neira Ayuso authored
      Anonymous sets are never used with timeout from userspace, reject this.
      Exception to this rule is NFT_SET_EVAL to ensure legacy meters still work.
      
      Cc: stable@vger.kernel.org
      Fixes: 761da293 ("netfilter: nf_tables: add set timeout API support")
      Reported-by: lonial con <kongln9170@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
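The flag combinations rejected by the two nf_tables commits above (constant set with timeout, and anonymous set with timeout unless NFT_SET_EVAL is set) can be expressed as a mask test. This is a Python sketch: `set_flags_allowed` is a hypothetical helper, and the flag values are as I recall them from the nf_tables uapi header, so treat them as assumptions.

```python
# Flag values as recalled from the nf_tables uapi header; treat as assumptions.
NFT_SET_ANONYMOUS = 0x01
NFT_SET_CONSTANT = 0x02
NFT_SET_TIMEOUT = 0x10
NFT_SET_EVAL = 0x20


def set_flags_allowed(flags):
    """Return False for the flag combinations the two commits reject."""
    anon_mask = NFT_SET_ANONYMOUS | NFT_SET_TIMEOUT | NFT_SET_EVAL
    # Anonymous + timeout is rejected unless EVAL is also set (legacy meters).
    if (flags & anon_mask) == (NFT_SET_ANONYMOUS | NFT_SET_TIMEOUT):
        return False
    const_mask = NFT_SET_CONSTANT | NFT_SET_TIMEOUT
    # Constant + timeout would end up as an empty set that cannot be updated.
    if (flags & const_mask) == const_mask:
        return False
    return True
```

Including NFT_SET_EVAL in the first mask is what implements the exception: when EVAL is present the masked value no longer equals anonymous plus timeout, so the set is accepted.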
    • ice: fix stats being updated by way too large values · 257310e9
      Przemek Kitszel authored
      Simplify the stats accumulation logic to fix the case where we don't take
      the previous stat value into account; we should always respect it.
      
      Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
      magnitude too big during OpenStack reconfiguration events, possibly other
      reconfiguration cases too.
      
      The regression was reported to be between 6.1 and 6.2, so I was almost
      certain that one of the two "preserve stats over reset" commits was the
      culprit. While reading the code, it was found that in some cases we
      increase the stats by an arbitrarily large number (thanks to ignoring the
      "-prev" part of the condition, after zeroing it).
      
      Note that this fixes also the case where we were around limits of u64, but
      that was not the regression reported.
      
      Full disclosure: I remember suggesting this particular piece of code to
      Ben a few years ago, so blame on me.
      
      Fixes: 2fd5e433 ("ice: Accumulate HW and Netdev statistics over reset")
      Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>
      Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
      Reported-by: Christian Rohmann <christian.rohmann@inovex.de>
      Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
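The rule stated in the ice commit above, always subtract the previous snapshot, can be sketched with 64-bit modular arithmetic. This is a Python illustration only: `accumulate` is a hypothetical helper, not the driver's function, and it assumes counters that behave like unsigned 64-bit values.

```python
U64_MASK = (1 << 64) - 1


def accumulate(total, current, previous):
    """Add the delta since the last snapshot, with u64 wraparound."""
    # Always take `previous` into account; the mask emulates unsigned
    # 64-bit subtraction, so a wrapped counter still yields a small delta.
    delta = (current - previous) & U64_MASK
    return (total + delta) & U64_MASK


# Ordinary case: the counter grew from 100 to 250.
normal = accumulate(0, 250, 100)

# Near the u64 limit: the counter wrapped from (2**64 - 10) to 5.
wrapped = accumulate(0, 5, (1 << 64) - 10)
```

Adding `current` without subtracting `previous` in the second case would credit 5 instead of 15, and in the first case 250 instead of 150; the general shape of that error, repeated across resets, is how totals drift far from reality.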
    • igb: Fix missing time sync events · ee14cc9e
      Vinicius Costa Gomes authored
      Fix "double" clearing of interrupts, which can cause external events
      or timestamps to be missed.
      
      The E1000_TSICR Time Sync Interrupt Cause register can be cleared in two
      ways, by either reading it or by writing '1' into the specific cause
      bit. This is documented in section 8.16.1.
      
      The following flow was used:
          1. read E1000_TSICR into 'tsicr';
          2. handle the interrupts present in 'tsicr' and mark them in 'ack';
          3. write 'ack' into E1000_TSICR;
      
      As both (1) and (3) will clear the interrupt cause, if the same
      interrupt happens again between (1) and (3) it will be ignored,
      causing events to be missed.
      
      Remove the extra clear in (3).
      
      Fixes: 00c65578 ("igb: enable internal PPS for the i210")
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
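The lost-event window described in the igb commit above can be reproduced with a toy register model. This is a Python sketch under assumptions: `CauseRegister`, `handle`, and `EVENT_BIT` are hypothetical names, and the model only captures the two clearing behaviours named in the commit message (clear on read, clear on writing 1).

```python
class CauseRegister:
    """Toy interrupt cause register: cleared by a read or by writing 1 to a bit."""

    def __init__(self):
        self.bits = 0

    def raise_event(self, bit):
        self.bits |= bit

    def read(self):
        value, self.bits = self.bits, 0   # read-to-clear
        return value

    def write(self, mask):
        self.bits &= ~mask                # write-1-to-clear


EVENT_BIT = 0x1  # hypothetical cause bit


def handle(reg, ack_write, event_during_handling):
    """Return (events seen by the handler, events still pending afterwards)."""
    seen = 0
    tsicr = reg.read()                    # step 1
    if tsicr & EVENT_BIT:
        seen += 1                         # step 2
    if event_during_handling:
        reg.raise_event(EVENT_BIT)        # same interrupt fires again
    if ack_write:
        reg.write(tsicr & EVENT_BIT)      # step 3, removed by the fix
    pending = 1 if reg.bits & EVENT_BIT else 0
    return seen, pending


reg = CauseRegister()
reg.raise_event(EVENT_BIT)
old_flow = handle(reg, ack_write=True, event_during_handling=True)

reg = CauseRegister()
reg.raise_event(EVENT_BIT)
new_flow = handle(reg, ack_write=False, event_during_handling=True)
```

In the old flow the second event is wiped by the write in step 3 and never reaches the handler; without that write it stays pending for the next pass.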
    • igc: Fix missing time sync events · 244ae992
      Vinicius Costa Gomes authored
      Fix "double" clearing of interrupts, which can cause external events
      or timestamps to be missed.
      
      The IGC_TSICR Time Sync Interrupt Cause register can be cleared in two
      ways, by either reading it or by writing '1' into the specific cause
      bit. This is documented in section 8.16.1.
      
      The following flow was used:
       1. read IGC_TSICR into 'tsicr';
       2. handle the interrupts present in 'tsicr' and mark them in 'ack';
       3. write 'ack' into IGC_TSICR;
      
      As both (1) and (3) will clear the interrupt cause, if the same
      interrupt happens again between (1) and (3) it will be ignored,
      causing events to be missed.
      
      Remove the extra clear in (3).
      
      Fixes: 2c344ae2 ("igc: Add support for TX timestamping")
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Merge tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 67be068d
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - Get rid of copy_mc flag in iov_iter which really only makes sense for
         the core dumping code so move it out of the generic iov iter code and
         make it coredump's problem. See the detailed commit description.
      
       - Revert fs/aio: Make io_cancel() generate completions again
      
         The initial fix here was predicated on the assumption that calling
         ki_cancel() didn't complete aio requests. However, that turned out to
         be wrong since the two drivers that actually make use of this set a
         cancellation function that performs the cancellation correctly. So
         revert this change.
      
       - Ensure that the test for IOCB_AIO_RW always happens before the read
         from ki_ctx.
      
      * tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        iov_iter: get rid of 'copy_mc' flag
        fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
        Revert "fs/aio: Make io_cancel() generate completions again"
    • Merge tag 'arm-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 5274d261
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "These should be the final fixes for the soc tree for 6.8, as usual
        they mostly deal with dts files:
      
         - Qualcomm fixes for pcie4 on sc8280xp, a revert of msm8996 mpm
           support, sm6115 interconnect and sm8650 gpio.
      
         - Two fixes for Tegra234 ethernet
      
         - A Makefile fix to actually build the allwinner based orange pi zero
           2w device tree
      
         - Fixes for clocks and reset on imx8mp and a DSI display regression
           on imx7.
      
        The non-DT fixes are:
      
         - Firmware fixes addressing a kernel panic in op-tee and a minor
           regression in microchip/riscv.
      
         - A defconfig change to bring back backlight support after a Kconfig
           change"
      
      * tag 'arm-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        firmware: microchip: Fix over-requested allocation size
        tee: optee: Fix kernel panic caused by incorrect error handling
        Revert "arm64: dts: qcom: msm8996: Hook up MPM"
        arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed
        arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed
        arm64: dts: imx8mp: Fix LDB clocks property
        arm64: dts: imx8mp: Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM
        MAINTAINERS: Use a proper mailinglist for NXP i.MX development
        ARM: dts: imx7: remove DSI port endpoints
        arm64: dts: allwinner: h616: Add Orange Pi Zero 2W to Makefile
        ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE
        arm64: tegra: Fix Tegra234 MGBE power-domains
        arm64: tegra: Set the correct PHY mode for MGBE
        arm64: dts: qcom: sm6115: Fix missing interconnect-names
        arm64: dts: qcom: sm8650-mtp: add gpio74 as reserved gpio
        arm64: dts: qcom: sm8650-qrd: add gpio74 as reserved gpio
    • Merge tag 'v6.8-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 09dcdbac
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "Fix potential use-after-frees in rk3288 and sun8i-ce"
      
      * tag 'v6.8-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: rk3288 - Fix use after free in unprepare
        crypto: sun8i-ce - Fix use after free in unprepare
    • net/rds: fix WARNING in rds_conn_connect_if_down · c055fc00
      Edward Adam Davis authored
      If the connection isn't established yet, get_mr() will fail; trigger the
      connection after get_mr().
      
      Fixes: 584a8279 ("RDS: RDMA: return appropriate error on rdma map failures")
      Reported-and-tested-by: syzbot+d4faee732755bba9838e@syzkaller.appspotmail.com
      Signed-off-by: Edward Adam Davis <eadavis@qq.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · f287d6aa
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-03-05 (idpf, ice, i40e, igc, e1000e)
      
      This series contains updates to idpf, ice, i40e, igc and e1000e drivers.
      
      Emil disables local BH on NAPI schedule for proper handling of softirqs
      on idpf.
      
      Jake stops reporting of virtchannel RSS option which is unsupported on
      ice.
      
      Rand Deeb adds null check to prevent possible null pointer dereference
      on ice.
      
      Michal Schmidt moves DPLL mutex initialization to resolve uninitialized
      mutex usage for ice.
      
      Jesse fixes incorrect variable usage for calculating Tx stats on ice.
      
      Ivan Vecera corrects logic for firmware equals check on i40e.
      
      Florian Kauer prevents memory corruption for XDP_REDIRECT on igc.
      
      Sasha reverts an incorrect use of FIELD_GET which caused a regression
      for Wake on LAN on e1000e.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • iov_iter: get rid of 'copy_mc' flag · a50026bd
      Linus Torvalds authored
      This flag is only set by one single user: the magical core dumping code
      that looks up user pages one by one, and then writes them out using
      their kernel addresses (by using a BVEC_ITER).
      
      That actually ends up being a huge problem, because while we do use
      copy_mc_to_kernel() for this case and it is able to handle the possible
      machine checks involved, nothing else is really ready to handle the
      failures caused by the machine check.
      
      In particular, as reported by Tong Tiangen, we don't actually support
      fault_in_iov_iter_readable() on a machine check area.
      
      As a result, the usual logic for writing things to a file under a
      filesystem lock, which involves doing a copy with page faults disabled
      and then if that fails trying to fault pages in without holding the
      locks with fault_in_iov_iter_readable() does not work at all.
      
      We could decide to always just make the MC copy "succeed" (and filling
      the destination with zeroes), and that would then create a core dump
      file that just ignores any machine checks.
      
      But honestly, this single special case has been problematic before, and
      means that all the normal iov_iter code ends up slightly more complex
      and slower.
      
      See for example commit c9eec08b ("iov_iter: Don't deal with
      iter->copy_mc in memcpy_from_iter_mc()") where David Howells
      re-organized the code just to avoid having to check the 'copy_mc' flags
      inside the inner iov_iter loops.
      
      So considering that we have exactly one user, and that one user is a
      non-critical special case that doesn't actually ever trigger in real
      life (Tong found this with manual error injection), the sane solution is
      to just decide that the onus of handling the machine check lies on that
      user instead.
      
      Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
      the user data to a stable kernel page before writing it out.
      
      Fixes: f1982740 ("iov_iter: Convert iterate*() to inline funcs")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
      Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com
      Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
      Tested-by: David Howells <dhowells@redhat.com>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Reported-by: Tong Tiangen <tongtiangen@huawei.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>