1. 09 Jun, 2022 25 commits
    • mm/huge_memory: Fix xarray node memory leak · 69a37a8b
      Matthew Wilcox (Oracle) authored
      If xas_split_alloc() fails to allocate the necessary nodes to complete the
      xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem()
      then interprets as "Please allocate more memory", not as "Please free
      any unnecessary memory" (which was the intended outcome).  It's confusing
      to use xas_nomem() to free memory in this context, so call xas_destroy()
      instead.
      
      Reported-by: syzbot+9e27a75a8c24f3fe75c1@syzkaller.appspotmail.com
      Fixes: 6b24ca4a ("mm: Use multi-index entries in the page cache")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    • filemap: Cache the value of vm_flags · dcfa24ba
      Matthew Wilcox (Oracle) authored
      After we have unlocked the mmap_lock for I/O, the file is pinned, but
      the VMA is not.  Checking this flag after that can be a use-after-free.
      It's not a terribly interesting use-after-free as it can only read one
      bit, and it's used to decide whether to read 2MB or 4MB.  But it
      upsets the automated tools and it's generally bad practice anyway,
      so let's fix it.
      
      Reported-by: syzbot+5b96d55e5b54924c77ad@syzkaller.appspotmail.com
      Fixes: 4687fdbb ("mm/filemap: Support VM_HUGEPAGE for file mappings")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    • filemap: Don't release a locked folio · 6bf74cdd
      Matthew Wilcox (Oracle) authored
      We must hold a reference over the call to filemap_release_folio(),
      otherwise the page cache will put the last reference to the folio
      before we unlock it, leading to splats like this:
      
       BUG: Bad page state in process u8:5  pfn:1ab1f4
       page:ffffea0006ac7d00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x28b1de pfn:0x1ab1f4
       flags: 0x17ff80000040001(locked|reclaim|node=0|zone=2|lastcpupid=0xfff)
       raw: 017ff80000040001 dead000000000100 dead000000000122 0000000000000000
       raw: 000000000028b1de 0000000000000000 00000000ffffffff 0000000000000000
       page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
      
      It's an error path, so it doesn't see much testing.

      Reported-by: Darrick J. Wong <djwong@kernel.org>
      Fixes: a42634a6 ("readahead: Use a folio in read_pages()")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    • Merge tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 3d9f55c5
      Linus Torvalds authored
      Pull ext2, writeback, and quota fixes and cleanups from Jan Kara:
       "A fix for race in writeback code and two cleanups in quota and ext2"
      
      * tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        quota: Prevent memory allocation recursion while holding dq_lock
        writeback: Fix inode->i_io_list not be protected by inode->i_lock error
        fs: Fix syntax errors in comments
    • Merge tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 95fc76c8
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - On 32-bit fix overread/overwrite of thread_struct via ptrace
         PEEK/POKE.
      
       - Fix softirqs not switching to the softirq stack since we moved
         irq_exit().
      
       - Force thread size increase when KASAN is enabled to avoid stack
         overflows.
      
       - On Book3s 64 mark more code as not to be instrumented by KASAN to
         avoid crashes.
      
       - Exempt __get_wchan() from KASAN checking, as it's inherently racy.
      
       - Fix a recently introduced crash in the papr_scm driver in some
         configurations.
      
       - Remove include of <generated/compile.h> which is forbidden.
      
      Thanks to Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner,
      He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras,
      Sachin Sant, Vaibhav Jain, and Wanming Hu.
      
      * tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/32: Fix overread/overwrite of thread_struct via ptrace
        powerpc/book3e: get rid of #include <generated/compile.h>
        powerpc/kasan: Force thread size increase with KASAN
        powerpc/papr_scm: don't requests stats with '0' sized stats buffer
        powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK
        powerpc/kasan: Silence KASAN warnings in __get_wchan()
        powerpc/kasan: Mark more real-mode code as not to be instrumented
    • Merge tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 825464e7
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - eth: amt: fix possible null-ptr-deref in amt_rcv()
      
        Previous releases - regressions:
      
         - tcp: use alloc_large_system_hash() to allocate table_perturb
      
         - af_unix: fix a data-race in unix_dgram_peer_wake_me()
      
         - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
      
         - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF
      
        Previous releases - always broken:
      
         - ipv6: fix signed integer overflow in __ip6_append_data
      
         - netfilter:
             - nat: really support inet nat without l3 address
             - nf_tables: memleak flow rule from commit path
      
         - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs
      
         - openvswitch: fix misuse of the cached connection on tuple changes
      
         - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred
      
         - eth: altera: fix refcount leak in altera_tse_mdio_create
      
        Misc:
      
         - add Quentin Monnet to bpftool maintainers"
      
      * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
        net: amd-xgbe: fix clang -Wformat warning
        tcp: use alloc_large_system_hash() to allocate table_perturb
        net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY
        net: dsa: mv88e6xxx: correctly report serdes link failure
        net: dsa: mv88e6xxx: fix BMSR error to be consistent with others
        net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
        net: altera: Fix refcount leak in altera_tse_mdio_create
        net: openvswitch: fix misuse of the cached connection on tuple changes
        net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag
        ip_gre: test csum_start instead of transport header
        au1000_eth: stop using virt_to_bus()
        ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
        ipv6: Fix signed integer overflow in __ip6_append_data
        nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred
        nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
        nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
        nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
        net: ipv6: unexport __init-annotated seg6_hmac_init()
        net: xfrm: unexport __init-annotated xfrm4_protocol_init()
        net: mdio: unexport __init-annotated mdio_bus_init()
        ...
    • netfs: gcc-12: temporarily disable '-Wattribute-warning' for now · 507160f4
      Linus Torvalds authored
      This is a pure band-aid so that I can continue merging stuff from people
      while some of the gcc-12 fallout gets sorted out.
      
      In particular, gcc-12 is very unhappy about the kinds of pointer
      arithmetic tricks that netfs does, and that makes the fortify checks
      trigger in afs and ceph:
      
        In function ‘fortify_memset_chk’,
            inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2,
            inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2,
            inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2:
        include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
          258 |                         __write_overflow_field(p_size_field, size);
              |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      and the reason is that netfs_i_context_init() is passed a 'struct inode'
      pointer, and then it does
      
              struct netfs_i_context *ctx = netfs_i_context(inode);
      
              memset(ctx, 0, sizeof(*ctx));
      
      where that netfs_i_context() function just does pointer arithmetic on
      the inode pointer, knowing that the netfs_i_context is laid out
      immediately after it in memory.
      
      This is all truly disgusting, since the whole "netfs_i_context is laid
      out immediately after it in memory" is not actually remotely true in
      general, but is just made to be that way for afs and ceph.
      
      See for example fs/cifs/cifsglob.h:
      
        struct cifsInodeInfo {
              struct {
                      /* These must be contiguous */
                      struct inode    vfs_inode;      /* the VFS's inode record */
                      struct netfs_i_context netfs_ctx; /* Netfslib context */
              };
      	[...]
      
      and realize that this is all entirely wrong, and the pointer arithmetic
      that netfs_i_context() is doing is also very very wrong and wouldn't
      give the right answer if netfs_ctx had different alignment rules from a
      'struct inode', for example.
      
      Anyway, that's just a long-winded way to say "the gcc-12 warning is
      actually quite reasonable, and our code happens to work but is pretty
      disgusting".
      
      This is getting fixed properly, but for now I made the mistake of
      thinking "the week right after the merge window tends to be calm for me
      as people take a breather" and I did a system upgrade.  And I got gcc-12
      as a result, so to continue merging fixes from people and not have the
      end result drown in warnings, I am fixing all these gcc-12 issues I hit.
      
      Including with these kinds of temporary fixes.
      
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gcc-12: disable '-Warray-bounds' universally for now · f0be87c4
      Linus Torvalds authored
      In commit 8b202ee2 ("s390: disable -Warray-bounds") the s390 people
      disabled the '-Warray-bounds' warning for gcc-12, because the new logic
      in gcc would cause warnings for their use of the S390_lowcore macro,
      which accesses absolute pointers.
      
      It turns out gcc-12 has many other issues in this area, so this takes
      that s390 warning disable logic, and turns it into a kernel build config
      entry instead.
      
      Part of the intent is that we can make this all much more targeted, and
      use this config flag to disable it in only particular configurations
      that cause problems, with the s390 case as an example:
      
              select GCC12_NO_ARRAY_BOUNDS
      
      and we could do that for other configuration cases that cause issues.
      
      Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more
      targeted way, and disable the warning only for particular uses: again
      the s390 case as an example:
      
        KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds)
      
      but this ends up just doing it globally in the top-level Makefile, since
      the current issues are spread fairly widely all over:
      
        KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
      
      We'll try to limit this later, since the gcc-12 problems are rare enough
      that *much* of the kernel can be built with it without disabling this
      warning.
      
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mellanox: mlx5: avoid uninitialized variable warning with gcc-12 · 842c3b3d
      Linus Torvalds authored
      gcc-12 started warning about 'tracker' being used uninitialized:
      
        drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c: In function ‘mlx5_do_bond’:
        drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c:786:28: warning: ‘tracker’ is used uninitialized [-Wuninitialized]
          786 |         struct lag_tracker tracker;
              |                            ^~~~~~~
      
      which seems to be because it doesn't track how the use (and
      initialization) is bound by the 'do_bond' flag.
      
      But admittedly that 'do_bond' usage is fairly complicated, and involves
      passing it around as an argument to helper functions, so it's somewhat
      understandable that gcc doesn't see how that all works.
      
      This function could be rewritten to make the use of that tracker
      variable more obviously safe, but for now I'm just adding the forced
      initialization of it.

      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gcc-12: disable '-Wdangling-pointer' warning for now · 49beadbd
      Linus Torvalds authored
      While the concept of checking for dangling pointers to local variables
      at function exit is really interesting, the gcc-12 implementation is not
      compatible with reality, and results in false positives.
      
      For example, gcc sees us putting things on a local list head allocated
      on the stack, which involves exactly those kinds of pointers to the
      local stack entry:
      
        In function ‘__list_add’,
            inlined from ‘list_add_tail’ at include/linux/list.h:102:2,
            inlined from ‘rebuild_snap_realms’ at fs/ceph/snap.c:434:2:
        include/linux/list.h:74:19: warning: storing the address of local variable ‘realm_queue’ in ‘*&realm_27(D)->rebuild_item.prev’ [-Wdangling-pointer=]
           74 |         new->prev = prev;
              |         ~~~~~~~~~~^~~~~~
      
      But then gcc - understandably - doesn't really understand the big
      picture how the doubly linked list works, so doesn't see how we then end
      up emptying said list head in a loop and the pointer we added has been
      removed.
      
      Gcc also complains about us (intentionally) using this as a way to store
      a kind of fake stack trace, eg
      
        drivers/acpi/acpica/utdebug.c:40:38: warning: storing the address of local variable ‘current_sp’ in ‘acpi_gbl_entry_stack_pointer’ [-Wdangling-pointer=]
           40 |         acpi_gbl_entry_stack_pointer = &current_sp;
              |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
      
      which is entirely reasonable from a compiler standpoint, and we may want
      to change those kinds of patterns, but not now.
      
      So this is one of those "it would be lovely if the compiler were to
      complain about us leaving dangling pointers to the stack", but not this
      way.

      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drm: imx: fix compiler warning with gcc-12 · 7aefd8b5
      Linus Torvalds authored
      Gcc-12 correctly warned about this code using a non-NULL pointer as a
      truth value:
      
        drivers/gpu/drm/imx/ipuv3-crtc.c: In function ‘ipu_crtc_disable_planes’:
        drivers/gpu/drm/imx/ipuv3-crtc.c:72:21: error: the comparison will always evaluate as ‘true’ for the address of ‘plane’ will never be NULL [-Werror=address]
           72 |                 if (&ipu_crtc->plane[1] && plane == &ipu_crtc->plane[1]->base)
              |                     ^
      
      due to the extraneous '&' address-of operator.
      
      Philipp Zabel points out that the mistake had no adverse effect since
      the following condition doesn't actually dereference the NULL pointer,
      but the intent of the code was obviously to check for it, not to take
      the address of the member.
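      The pattern above can be sketched in isolation. This is a minimal
      illustration with hypothetical stand-in types (struct base, struct
      plane, struct crtc), not the driver's real structures; the buggy form
      is kept in a comment so the sketch builds cleanly with warnings
      enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures. */
struct base { int id; };
struct plane { struct base base; };
struct crtc { struct plane *plane[2]; };

/*
 * Buggy form (what gcc-12 flagged):
 *
 *     if (&c->plane[1] && p == &c->plane[1]->base)
 *
 * '&c->plane[1]' is the address of an array slot, which is never NULL,
 * so the first half of the condition is always true.
 *
 * Intended form: test the pointer stored in the slot.
 */
static bool is_second_plane(const struct crtc *c, const struct base *p)
{
	return c->plane[1] && p == &c->plane[1]->base;
}
```

      With the intended form, a crtc whose second plane pointer is NULL
      short-circuits before the member address is computed at all.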
      
      Fixes: eb8c8880 ("drm/imx: add deferred plane disabling")
      Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/32: Fix overread/overwrite of thread_struct via ptrace · 8e127844
      Michael Ellerman authored
      The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process
      to read/write registers of another process.
      
      To get/set a register, the API takes an index into an imaginary address
      space called the "USER area", where the registers of the process are
      laid out in some fashion.
      
      The kernel then maps that index to a particular register in its own data
      structures and gets/sets the value.
      
      The API only allows a single machine-word to be read/written at a time.
      So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels.
      
      The way floating point registers (FPRs) are addressed is somewhat
      complicated, because double precision float values are 64-bit even on
      32-bit CPUs. That means on 32-bit kernels each FPR occupies two
      word-sized locations in the USER area. On 64-bit kernels each FPR
      occupies one word-sized location in the USER area.
      
      Internally the kernel stores the FPRs in an array of u64s, or if VSX is
      enabled, an array of pairs of u64s where one half of each pair stores
      the FPR. Which half of the pair stores the FPR depends on the kernel's
      endianness.
      
      To handle the different layouts of the FPRs depending on VSX/no-VSX and
      big/little endian, the TS_FPR() macro was introduced.
      
      Unfortunately the TS_FPR() macro does not take into account the fact
      that the addressing of each FPR differs between 32-bit and 64-bit
      kernels. It just takes the index into the "USER area" passed from
      userspace and indexes into the fp_state.fpr array.
      
      On 32-bit there are 64 indexes that address FPRs, but only 32 entries in
      the fp_state.fpr array, meaning the user can read/write 256 bytes past
      the end of the array. Because the fp_state sits in the middle of the
      thread_struct there are various fields that can be overwritten,
      including some pointers. As such it may be exploitable.
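      The 256-byte figure follows from the numbers above: 64 addressable
      slots minus 32 array entries, times 8 bytes per entry. A minimal
      userspace sketch of that arithmetic, plus a hypothetical accessor
      (peek_fpr_word is an illustrative name, not the kernel's code) that
      treats each FPR as two word-sized slots:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_FPRS      32               /* entries in the FPR array */
#define NUM_FPR_SLOTS (NUM_FPRS * 2)   /* word-sized USER-area slots on 32-bit */

/* Bytes reachable past the array when the slot index is used directly. */
static size_t overrun_bytes(void)
{
	return (NUM_FPR_SLOTS - NUM_FPRS) * sizeof(uint64_t);
}

/*
 * Hypothetical 32-bit accessor: two slots per FPR, so slot N lives in
 * array entry N / 2. Returns 0 on success, -1 if the slot is out of range.
 * Which half of the u64 a slot selects follows the host's memory order.
 */
static int peek_fpr_word(const uint64_t fpr[NUM_FPRS], unsigned int slot,
			 uint32_t *out)
{
	uint32_t halves[2];

	if (slot >= NUM_FPR_SLOTS)
		return -1;

	memcpy(halves, &fpr[slot / 2], sizeof(halves));
	*out = halves[slot % 2];
	return 0;
}
```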
      
      It has also been observed to cause systems to hang or otherwise
      misbehave when using gdbserver, and is probably the root cause of this
      report which could not be easily reproduced:
        https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/
      
      Rather than trying to make the TS_FPR() macro even more complicated to
      fix the bug, or add more macros, instead add a special-case for 32-bit
      kernels. This is more obvious and hopefully avoids a similar bug
      happening again in future.
      
      Note that because 32-bit kernels never have VSX enabled the code doesn't
      need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to
      ensure that 32-bit && VSX is never enabled.
      
      Fixes: 87fec051 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds")
      Cc: stable@vger.kernel.org # v3.13+
      Reported-by: Ariel Miculas <ariel.miculas@belden.com>
      Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au
    • net: amd-xgbe: fix clang -Wformat warning · 647df0d4
      Justin Stitt authored
      see warning:
      | drivers/net/ethernet/amd/xgbe/xgbe-drv.c:2787:43: warning: format specifies
      | type 'unsigned short' but the argument has type 'int' [-Wformat]
      |        netdev_dbg(netdev, "Protocol: %#06hx\n", ntohs(eth->h_proto));
      |                                      ~~~~~~     ^~~~~~~~~~~~~~~~~~~
      
      Variadic functions (printf-like) undergo default argument promotion.
      Documentation/core-api/printk-formats.rst specifically recommends
      using the promoted-to-type's format flag.
      
      Also, as per C11 6.3.1.1:
      (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf)
      `If an int can represent all values of the original type ..., the
      value is converted to an int; otherwise, it is converted to an
      unsigned int. These are called the integer promotions.`
      
      Since the argument is a u16 it will get promoted to an int and thus it is
      most accurate to use the %x format specifier here. It should be noted that the
      `#06` formatting sugar does not alter the promotion rules.
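      A minimal userspace sketch of the same point, using snprintf() in
      place of netdev_dbg() (format_proto is a hypothetical helper name):
      the 16-bit value is promoted to int when passed through the variadic
      argument list, so plain %x is the matching conversion and the #06
      flags still apply.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 16-bit protocol number with %#06x, as the fixed call does. */
static int format_proto(char *buf, size_t len, uint16_t proto)
{
	/* 'proto' is promoted to int in the '...' list, so no 'h' modifier. */
	return snprintf(buf, len, "Protocol: %#06x", proto);
}
```

      The width of 6 includes the "0x" prefix, so 0x0800 prints as
      "Protocol: 0x0800".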
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/378
      Signed-off-by: Justin Stitt <jstitt007@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Link: https://lore.kernel.org/r/20220607191119.20686-1-jstitt007@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: use alloc_large_system_hash() to allocate table_perturb · e67b72b9
      Muchun Song authored
      In our server, there may be no high order (>= 6) memory since we reserve
      lots of HugeTLB pages when booting.  Then the system panics.  So use
      alloc_large_system_hash() to allocate table_perturb.
      
      Fixes: e9261476 ("tcp: dynamically allocate the perturb table used by source ports")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY · 487994ff
      Alvin Šipraga authored
      Since commit a18e6521 ("net: phylink: handle NA interface mode in
      phylink_fwnode_phy_connect()"), phylib defaults to GMII when no phy-mode
      or phy-connection-type property is specified in a DSA port node of the
      device tree. The same commit caused a regression in rtl8365mb whereby
      phylink would fail to connect, because the driver did not advertise
      support for GMII for ports with internal PHY.
      
      It should be noted that the aforementioned regression is not because the
      blamed commit was incorrect: on the contrary, the blamed commit is
      correcting the previous behaviour whereby unspecified phy-mode would
      cause the internal interface mode to be PHY_INTERFACE_MODE_NA. The
      rtl8365mb driver only worked by accident before because it _did_
      advertise support for PHY_INTERFACE_MODE_NA, despite NA being reserved
      for internal use by phylink. With one mistake fixed, the other was
      exposed.
      
      Commit a5dba0f2 ("net: dsa: rtl8365mb: add GMII as user port mode")
      then introduced implicit support for GMII mode on ports with internal
      PHY to allow a PHY connection for device trees where the phy-mode is not
      explicitly set to "internal". At this point everything was working OK
      again.
      
      Subsequently, commit 6ff60646 ("net: dsa: realtek: convert to
      phylink_generic_validate()") broke this behaviour again by discarding
      the usage of rtl8365mb_phy_mode_supported() - where this GMII support
      was indicated - while switching to the new .phylink_get_caps API.
      
      With the new API, rtl8365mb_phy_mode_supported() is no longer needed.
      Remove it altogether and add back the GMII capability - this time to
      rtl8365mb_phylink_get_caps() - so that the above default behaviour works
      for ports with internal PHY again.
      
      Fixes: 6ff60646 ("net: dsa: realtek: convert to phylink_generic_validate()")
      Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/20220607184624.417641-1-alvin@pqrs.dk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 568a32f5
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-06-07
      
      This series contains updates to ixgbe driver only.
      
      Olivier Matz resolves an issue so that broadcast packets can still be
      received when VF removes promiscuous settings and removes setting of
      VLAN promiscuous, in promiscuous mode, to prevent a loop when VFs are
      bridged.
      
      * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ixgbe: fix unexpected VLAN Rx in promisc mode on VF
        ixgbe: fix bcast packets Rx on VF after promisc removal
      ====================
      
      Link: https://lore.kernel.org/r/20220607181538.748786-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mv88e6xxx-fixes-for-reading-serdes-state' · 5d4af9c1
      Jakub Kicinski authored
      Russell King says:
      
      ====================
      mv88e6xxx: fixes for reading serdes state
      
      These are some low-priority fixes to the mv88e6xxx serdes code.
      Patch 1 fixes the reporting of an_complete, which is used in the
      emulation of a conventional C22 PHY. Patch from Marek.
      
      Patch 2 makes one of the error messages in patch 2 to be consistent
      with the other error messages in this function.
      
      Patch 3 ensures that we do not miss a link-failure event.
      ====================
      
      Link: https://lore.kernel.org/r/Yp82TyoLon9jz6k3@shell.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: correctly report serdes link failure · b4d78731
      Russell King (Oracle) authored
      Phylink wants to know if the link has dropped since the last time state
      was retrieved, and the BMSR gives us that. Read the BMSR and use it when
      deciding the link state. Fill in the an_complete member as well for the
      emulated PHY state.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: fix BMSR error to be consistent with others · 2b4bb9cd
      Russell King (Oracle) authored
      Other errors accessing the registers in mv88e6352_serdes_pcs_get_state()
      print "PHY " before the register name, except for the BMSR. Make this
      consistent with the other error messages.

      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete · 47e96930
      Marek Behún authored
      Commit ede359d8 ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN
      is bypassed") added the ability to link if AN was bypassed, and added
      filling of state->an_complete field, but set it to true if AN was
      enabled in BMCR, not when AN was reported complete in BMSR.
      
      This was done because for some reason, when I wanted to use BMSR value
      to infer an_complete, I was looking at BMSR_ANEGCAPABLE bit (which was
      always 1), instead of BMSR_ANEGCOMPLETE bit.
      
      Use BMSR_ANEGCOMPLETE for filling state->an_complete.
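      The two bits are easy to confuse by name. A minimal sketch of the
      distinction, with the bit values as defined for the MII Basic Mode
      Status Register (an_complete_from_bmsr is a hypothetical helper, not
      the driver's function):

```c
#include <stdbool.h>
#include <stdint.h>

#define BMSR_ANEGCAPABLE  0x0008  /* PHY is able to auto-negotiate */
#define BMSR_ANEGCOMPLETE 0x0020  /* auto-negotiation has finished */

/* Report completion from the status bit, not from the capability bit. */
static bool an_complete_from_bmsr(uint16_t bmsr)
{
	return (bmsr & BMSR_ANEGCOMPLETE) != 0;
}
```

      A status word with only the capability bit set reports "not
      complete", which is the case the original code got wrong by deriving
      the value from BMCR instead.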
      
      Fixes: ede359d8 ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed")
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: altera: Fix refcount leak in altera_tse_mdio_create · 11ec18b1
      Miaoqian Lin authored
      Every iteration of for_each_child_of_node() decrements
      the reference count of the previous node.
      When breaking out of a for_each_child_of_node() loop,
      we need to explicitly call of_node_put() on the child node once
      it is no longer needed.
      Add the missing of_node_put() to avoid a refcount leak.
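      The rule can be modelled in userspace. This is a sketch only: struct
      node, node_get(), node_put(), next_child() and has_child() are
      hypothetical stand-ins for the device-tree API, and the reference is
      dropped right at the early exit, whereas a real driver would drop it
      once it has finished using the node.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a refcounted device-tree node. */
struct node {
	const char *name;
	int refcount;
	struct node *sibling;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Mimics the iterator: takes a reference on the next child, drops the previous. */
static struct node *next_child(struct node *first, struct node *prev)
{
	struct node *next = prev ? prev->sibling : first;

	node_get(next);
	node_put(prev);
	return next;
}

/* Returns 1 if a child called 'name' exists, keeping the refcounts balanced. */
static int has_child(struct node *first, const char *name)
{
	struct node *child;

	for (child = next_child(first, NULL); child;
	     child = next_child(first, child)) {
		if (strcmp(child->name, name) == 0) {
			node_put(child);  /* without this, the early exit leaks a reference */
			return 1;
		}
	}
	return 0;
}
```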
      
      Fixes: bbd2190c ("Altera TSE: Add main and header file for Altera Ethernet Driver")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Link: https://lore.kernel.org/r/20220607041144.7553-1-linmq006@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: openvswitch: fix misuse of the cached connection on tuple changes · 2061ecfd
      Ilya Maximets authored
      If packet headers changed, the cached nfct is no longer relevant
      for the packet, and an attempt to re-use it leads to incorrect packet
      classification.
      
      This issue is causing broken connectivity in OpenStack deployments
      with OVS/OVN due to hairpin traffic being unexpectedly dropped.
      
      The setup has datapath flows with several conntrack actions and tuple
      changes between them:
      
        actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
                set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
                set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
                ct(zone=8),recirc(0x4)
      
      After the first ct() action the packet headers are almost fully
      re-written.  The next ct() tries to re-use the existing nfct entry
      and marks the packet as invalid, so it gets dropped later in the
      pipeline.
      
      Clear the cached conntrack entry whenever the packet tuple is changed
      to avoid the issue.
      
      The flow key should not be cleared though, because we should still
      be able to match on the ct_state if the recirculation happens after
      the tuple change but before the next ct() action.
      
      Cc: stable@vger.kernel.org
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
      Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html
      Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag · 2f2c0d29
      Chen Lin authored
      When rx_flag == MTK_RX_FLAGS_HWLRO,
      rx_data_len = MTK_MAX_LRO_RX_LENGTH (4096 * 3) > PAGE_SIZE.
      netdev_alloc_frag is for allocation of page fragments only.
      See other drivers and Documentation/vm/page_frags.rst for reference.
      
      Branch to use __get_free_pages when ring->frag_size > PAGE_SIZE.
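      
      The size-based branch can be illustrated with a small user-space C
      sketch. It is not the driver code: TOY_PAGE_SIZE assumes 4 KiB pages,
      all toy_* names are hypothetical, and the order computation only
      mirrors the idea of rounding a request up to a power-of-two number of
      pages for a page-based allocator.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for this sketch */

enum toy_alloc_kind {
    TOY_ALLOC_FRAG,  /* page-fragment allocator, size <= one page */
    TOY_ALLOC_PAGES  /* whole-page allocator, for larger buffers */
};

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
static unsigned int toy_get_order(size_t size)
{
    unsigned int order = 0;

    while (((size_t)TOY_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Pick the allocation path from the requested buffer size. */
static enum toy_alloc_kind toy_pick_allocator(size_t frag_size)
{
    return frag_size <= TOY_PAGE_SIZE ? TOY_ALLOC_FRAG : TOY_ALLOC_PAGES;
}

/* Number of bytes the chosen path would reserve for the request. */
static size_t toy_reserved_bytes(size_t frag_size)
{
    if (toy_pick_allocator(frag_size) == TOY_ALLOC_FRAG)
        return frag_size;
    return (size_t)TOY_PAGE_SIZE << toy_get_order(frag_size);
}
```

      Under these assumptions a 4096 * 3 byte request takes the page-based
      path with order 2, which reserves four pages.
      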
      Signed-off-by: Chen Lin <chen45464546@163.com>
      Link: https://lore.kernel.org/r/1654692413-2598-1-git-send-email-chen45464546@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2f2c0d29
    • Willem de Bruijn's avatar
      ip_gre: test csum_start instead of transport header · 8d21e996
      Willem de Bruijn authored
      GRE with TUNNEL_CSUM will apply local checksum offload on
      CHECKSUM_PARTIAL packets.
      
      ipgre_xmit must validate csum_start after an optional skb_pull,
      else lco_csum may trigger an overflow. The original check was
      
      	if (csum && skb_checksum_start(skb) < skb->data)
      		return -EINVAL;
      
      This had false positives when skb_checksum_start is undefined, that
      is, when ip_summed is not CHECKSUM_PARTIAL. A straightforward
      refinement was discussed:
      
      	if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
      	    skb_checksum_start(skb) < skb->data)
      		return -EINVAL;
      
      But the check was eventually revised more thoroughly:
      - restrict the check to the only branch where needed, in an
        uncommon GRE path that uses header_ops and calls skb_pull.
      - test skb_transport_header, which is set along with csum_start
        in skb_partial_csum_set in the normal header_ops datapath.
      
      Turns out skbs can arrive in this branch without the transport
      header set, e.g., through BPF redirection.
      
      Revise the check back to check csum_start directly, and only if
      CHECKSUM_PARTIAL. Do leave the check in the updated location.
      Check field regardless of whether TUNNEL_CSUM is configured.
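      
      The revised condition can be modelled in user-space C as follows. This
      is a sketch under stated assumptions rather than the ip_gre code: the
      toy_skb structure and all toy_* helpers are hypothetical and keep only
      the fields needed to show that csum_start is compared against the data
      pointer after a pull, and only for CHECKSUM_PARTIAL packets.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_ip_summed {
    TOY_CHECKSUM_NONE,
    TOY_CHECKSUM_PARTIAL
};

/* Hypothetical, minimal stand-in for struct sk_buff. */
struct toy_skb {
    unsigned char *head;  /* start of the buffer */
    unsigned char *data;  /* current start of packet data */
    size_t csum_start;    /* offset from head, meaningful only if PARTIAL */
    enum toy_ip_summed ip_summed;
};

/* Address at which checksumming starts. */
static unsigned char *toy_checksum_start(const struct toy_skb *skb)
{
    return skb->head + skb->csum_start;
}

/* Models skb_pull(): advance the data pointer past len bytes. */
static void toy_pull(struct toy_skb *skb, size_t len)
{
    skb->data += len;
}

/*
 * The check is skipped unless ip_summed is CHECKSUM_PARTIAL, because
 * csum_start is undefined otherwise. For PARTIAL packets the checksum
 * start must not lie before the (possibly pulled) data pointer.
 */
static bool toy_csum_start_valid(const struct toy_skb *skb)
{
    if (skb->ip_summed != TOY_CHECKSUM_PARTIAL)
        return true;
    return toy_checksum_start(skb) >= skb->data;
}
```

      In this model a packet whose data pointer has been pulled past
      csum_start is rejected only when it is CHECKSUM_PARTIAL, which avoids
      the false positives of the original unconditional comparison.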
      
      Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
      Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u
      Fixes: 8a0ed250 ("ip_gre: validate csum_start only on pull")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8d21e996
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d5d4c363
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2022-06-09
      
      We've added 6 non-merge commits during the last 2 day(s) which contain
      a total of 8 files changed, 49 insertions(+), 15 deletions(-).
      
      The main changes are:
      
      1) Fix an illegal copy_to_user() attempt seen by syzkaller through arm64
         BPF JIT compiler, from Eric Dumazet.
      
      2) Fix calling global functions from BPF_PROG_TYPE_EXT programs by using
         the correct program context type, from Toke Høiland-Jørgensen.
      
      3) Fix XSK TX batching invalid descriptor handling, from Maciej Fijalkowski.
      
      4) Fix potential integer overflows in multi-kprobe link code by using safer
         kvmalloc_array() allocation helpers, from Dan Carpenter.
      
      5) Add Quentin as bpftool maintainer, from Quentin Monnet.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        MAINTAINERS: Add a maintainer for bpftool
        xsk: Fix handling of invalid descriptors in XSK TX batching API
        selftests/bpf: Add selftest for calling global functions from freplace
        bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs
        bpf: Use safer kvmalloc_array() where possible
        bpf, arm64: Clear prog->jited_len along prog->jited
      ====================
      
      Link: https://lore.kernel.org/r/20220608234133.32265-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d5d4c363
  2. 08 Jun, 2022 15 commits