1. 22 Nov, 2021 13 commits
    • Marta Plantykow's avatar
      ice: avoid bpf_prog refcount underflow · f65ee535
      Marta Plantykow authored
      Ice driver has the routines for managing XDP resources that are shared
      between ndo_bpf op and VSI rebuild flow. The latter takes place for
      example when user changes queue count on an interface via ethtool's
      set_channels().
      
      There is an issue around the bpf_prog refcounting when VSI is being
      rebuilt - since ice_prepare_xdp_rings() is called with vsi->xdp_prog as
      an argument that is used later on by ice_vsi_assign_bpf_prog(), same
      bpf_prog pointers are swapped with each other. Then it is also
      interpreted as an 'old_prog' which in turn causes us to call
      bpf_prog_put on it that will decrement its refcount.
      
      Below splat can be interpreted in a way that due to zero refcount of a
      bpf_prog it is wiped out from the system while kernel still tries to
      refer to it:
      
      [  481.069429] BUG: unable to handle page fault for address: ffffc9000640f038
      [  481.077390] #PF: supervisor read access in kernel mode
      [  481.083335] #PF: error_code(0x0000) - not-present page
      [  481.089276] PGD 100000067 P4D 100000067 PUD 1001cb067 PMD 106d2b067 PTE 0
      [  481.097141] Oops: 0000 [#1] PREEMPT SMP PTI
      [  481.101980] CPU: 12 PID: 3339 Comm: sudo Tainted: G           OE     5.15.0-rc5+ #1
      [  481.110840] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
      [  481.122021] RIP: 0010:dev_xdp_prog_id+0x25/0x40
      [  481.127265] Code: 80 00 00 00 00 0f 1f 44 00 00 89 f6 48 c1 e6 04 48 01 fe 48 8b 86 98 08 00 00 48 85 c0 74 13 48 8b 50 18 31 c0 48 85 d2 74 07 <48> 8b 42 38 8b 40 20 c3 48 8b 96 90 08 00 00 eb e8 66 2e 0f 1f 84
      [  481.148991] RSP: 0018:ffffc90007b63868 EFLAGS: 00010286
      [  481.155034] RAX: 0000000000000000 RBX: ffff889080824000 RCX: 0000000000000000
      [  481.163278] RDX: ffffc9000640f000 RSI: ffff889080824010 RDI: ffff889080824000
      [  481.171527] RBP: ffff888107af7d00 R08: 0000000000000000 R09: ffff88810db5f6e0
      [  481.179776] R10: 0000000000000000 R11: ffff8890885b9988 R12: ffff88810db5f4bc
      [  481.188026] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [  481.196276] FS:  00007f5466d5bec0(0000) GS:ffff88903fb00000(0000) knlGS:0000000000000000
      [  481.205633] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  481.212279] CR2: ffffc9000640f038 CR3: 000000014429c006 CR4: 00000000003706e0
      [  481.220530] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  481.228771] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  481.237029] Call Trace:
      [  481.239856]  rtnl_fill_ifinfo+0x768/0x12e0
      [  481.244602]  rtnl_dump_ifinfo+0x525/0x650
      [  481.249246]  ? __alloc_skb+0xa5/0x280
      [  481.253484]  netlink_dump+0x168/0x3c0
      [  481.257725]  netlink_recvmsg+0x21e/0x3e0
      [  481.262263]  ____sys_recvmsg+0x87/0x170
      [  481.266707]  ? __might_fault+0x20/0x30
      [  481.271046]  ? _copy_from_user+0x66/0xa0
      [  481.275591]  ? iovec_from_user+0xf6/0x1c0
      [  481.280226]  ___sys_recvmsg+0x82/0x100
      [  481.284566]  ? sock_sendmsg+0x5e/0x60
      [  481.288791]  ? __sys_sendto+0xee/0x150
      [  481.293129]  __sys_recvmsg+0x56/0xa0
      [  481.297267]  do_syscall_64+0x3b/0xc0
      [  481.301395]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  481.307238] RIP: 0033:0x7f5466f39617
      [  481.311373] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bd 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  481.342944] RSP: 002b:00007ffedc7f4308 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
      [  481.361783] RAX: ffffffffffffffda RBX: 00007ffedc7f5460 RCX: 00007f5466f39617
      [  481.380278] RDX: 0000000000000000 RSI: 00007ffedc7f5360 RDI: 0000000000000003
      [  481.398500] RBP: 00007ffedc7f53f0 R08: 0000000000000000 R09: 000055d556f04d50
      [  481.416463] R10: 0000000000000077 R11: 0000000000000246 R12: 00007ffedc7f5360
      [  481.434131] R13: 00007ffedc7f5350 R14: 00007ffedc7f5344 R15: 0000000000000e98
      [  481.451520] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mxm_wmi mei_me coretemp mei ipmi_si ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ahci crypto_simd cryptd libahci lpc_ich [last unloaded: ice]
      [  481.528558] CR2: ffffc9000640f038
      [  481.542041] ---[ end trace d1f24c9ecf5b61c1 ]---
      
      Fix this by only calling ice_vsi_assign_bpf_prog() inside
      ice_prepare_xdp_rings() when current vsi->xdp_prog pointer is NULL.
      This way set_channels() flow will not attempt to swap the vsi->xdp_prog
      pointers with itself.
      
      Also, sprinkle around some comments that provide a reasoning about
      correlation between driver and kernel in terms of bpf_prog refcount.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Reviewed-by: default avatarAlexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: default avatarMarta Plantykow <marta.a.plantykow@intel.com>
      Co-developed-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      f65ee535
    • Maciej Fijalkowski's avatar
      ice: fix vsi->txq_map sizing · 792b2086
      Maciej Fijalkowski authored
      The approach of having XDP queue per CPU regardless of user's setting
      exposed a hidden bug that could occur in case when Rx queue count differ
      from Tx queue count. Currently vsi->txq_map's size is equal to the
      doubled vsi->alloc_txq, which is not correct due to the fact that XDP
      rings were previously based on the Rx queue count. Below splat can be
      seen when ethtool -L is used and XDP rings are configured:
      
      [  682.875339] BUG: kernel NULL pointer dereference, address: 000000000000000f
      [  682.883403] #PF: supervisor read access in kernel mode
      [  682.889345] #PF: error_code(0x0000) - not-present page
      [  682.895289] PGD 0 P4D 0
      [  682.898218] Oops: 0000 [#1] PREEMPT SMP PTI
      [  682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G           OE     5.15.0-rc5+ #1
      [  682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
      [  682.923380] RIP: 0010:devres_remove+0x44/0x130
      [  682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8
      [  682.950237] RSP: 0018:ffffc90006a679f0 EFLAGS: 00010002
      [  682.956285] RAX: 0000000000000286 RBX: ffffffffffffffff RCX: ffff88908343a370
      [  682.964538] RDX: 0000000000000001 RSI: ffffffff81690d60 RDI: 0000000000000000
      [  682.972789] RBP: ffff88908343a0d0 R08: 0000000000000000 R09: 0000000000000000
      [  682.981040] R10: 0000000000000286 R11: 3fffffffffffffff R12: ffffffff81690d60
      [  682.989282] R13: ffffffff81690a00 R14: ffff8890819807a8 R15: ffff88908343a36c
      [  682.997535] FS:  00007f08c7bfa740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000
      [  683.006910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  683.013557] CR2: 000000000000000f CR3: 0000001080a66003 CR4: 00000000003706e0
      [  683.021819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  683.030075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  683.038336] Call Trace:
      [  683.041167]  devm_kfree+0x33/0x50
      [  683.045004]  ice_vsi_free_arrays+0x5e/0xc0 [ice]
      [  683.050380]  ice_vsi_rebuild+0x4c8/0x750 [ice]
      [  683.055543]  ice_vsi_recfg_qs+0x9a/0x110 [ice]
      [  683.060697]  ice_set_channels+0x14f/0x290 [ice]
      [  683.065962]  ethnl_set_channels+0x333/0x3f0
      [  683.070807]  genl_family_rcv_msg_doit+0xea/0x150
      [  683.076152]  genl_rcv_msg+0xde/0x1d0
      [  683.080289]  ? channels_prepare_data+0x60/0x60
      [  683.085432]  ? genl_get_cmd+0xd0/0xd0
      [  683.089667]  netlink_rcv_skb+0x50/0xf0
      [  683.094006]  genl_rcv+0x24/0x40
      [  683.097638]  netlink_unicast+0x239/0x340
      [  683.102177]  netlink_sendmsg+0x22e/0x470
      [  683.106717]  sock_sendmsg+0x5e/0x60
      [  683.110756]  __sys_sendto+0xee/0x150
      [  683.114894]  ? handle_mm_fault+0xd0/0x2a0
      [  683.119535]  ? do_user_addr_fault+0x1f3/0x690
      [  683.134173]  __x64_sys_sendto+0x25/0x30
      [  683.148231]  do_syscall_64+0x3b/0xc0
      [  683.161992]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fix this by taking into account the value that num_possible_cpus()
      yields in addition to vsi->alloc_txq instead of doubling the latter.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Reviewed-by: default avatarAlexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      792b2086
    • Arnd Bergmann's avatar
      nixge: fix mac address error handling again · a68229ca
      Arnd Bergmann authored
      The change to eth_hw_addr_set() caused gcc to correctly spot a
      bug that was introduced in an earlier incorrect fix:
      
      In file included from include/linux/etherdevice.h:21,
                       from drivers/net/ethernet/ni/nixge.c:7:
      In function '__dev_addr_set',
          inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:319:2,
          inlined from 'nixge_probe' at drivers/net/ethernet/ni/nixge.c:1286:3:
      include/linux/netdevice.h:4648:9: error: 'memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
       4648 |         memcpy(dev->dev_addr, addr, len);
            |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      As nixge_get_nvmem_address() can return either NULL or an error
      pointer, the NULL check is wrong, and we can end up reading from
      ERR_PTR(-EOPNOTSUPP), which gcc knows to contain zero readable
      bytes.
      
      Make the function always return an error pointer again but fix
      the check to match that.
      
      Fixes: f3956ebb ("ethernet: use eth_hw_addr_set() instead of ether_addr_copy()")
      Fixes: abcd3d6f ("net: nixge: Fix error path for obtaining mac address")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a68229ca
    • Wen Gu's avatar
      net/smc: Avoid warning of possible recursive locking · 7a61432d
      Wen Gu authored
      Possible recursive locking is detected by lockdep when SMC
      falls back to TCP. The corresponding warnings are as follows:
      
       ============================================
       WARNING: possible recursive locking detected
       5.16.0-rc1+ #18 Tainted: G            E
       --------------------------------------------
       wrk/1391 is trying to acquire lock:
       ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc]
      
       but task is already holding lock:
       ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&ei->socket.wq.wait);
         lock(&ei->socket.wq.wait);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       2 locks held by wrk/1391:
        #0: ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc]
        #1: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
      
       stack backtrace:
       Call Trace:
        <TASK>
        dump_stack_lvl+0x56/0x7b
        __lock_acquire+0x951/0x11f0
        lock_acquire+0x27a/0x320
        ? smc_switch_to_fallback+0x109/0x250 [smc]
        ? smc_switch_to_fallback+0xfe/0x250 [smc]
        _raw_spin_lock_irq+0x3b/0x80
        ? smc_switch_to_fallback+0x109/0x250 [smc]
        smc_switch_to_fallback+0x109/0x250 [smc]
        smc_connect_fallback+0xe/0x30 [smc]
        __smc_connect+0xcf/0x1090 [smc]
        ? mark_held_locks+0x61/0x80
        ? __local_bh_enable_ip+0x77/0xe0
        ? lockdep_hardirqs_on+0xbf/0x130
        ? smc_connect+0x12a/0x150 [smc]
        smc_connect+0x12a/0x150 [smc]
        __sys_connect+0x8a/0xc0
        ? syscall_enter_from_user_mode+0x20/0x70
        __x64_sys_connect+0x16/0x20
        do_syscall_64+0x34/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The nested locking in smc_switch_to_fallback() is considered to
      possibly cause a deadlock because smc_wait->lock and clc_wait->lock
      are the same type of lock. But actually it is safe so far since
      there is no other place trying to obtain smc_wait->lock when
      clc_wait->lock is held. So the patch replaces spin_lock() with
      spin_lock_nested() to avoid false report by lockdep.
      
      Link: https://lkml.org/lkml/2021/11/19/962
      Fixes: 2153bd1e ("Transfer remaining wait queue entries during fallback")
      Reported-by: syzbot+e979d3597f48262cb4ee@syzkaller.appspotmail.com
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Acked-by: default avatarTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a61432d
    • Michael S. Tsirkin's avatar
      vsock/virtio: suppress used length validation · f7a36b03
      Michael S. Tsirkin authored
      It turns out that vhost vsock violates the virtio spec
      by supplying the out buffer length in the used length
      (should just be the in length).
      As a result, attempts to validate the used length fail with:
      vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0
      
      Since vsock driver does not use the length fox tx and
      validates the length before use for rx, it is safe to
      suppress the validation in virtio core for this driver.
      Reported-by: default avatarHalil Pasic <pasic@linux.ibm.com>
      Fixes: 939779f5 ("virtio_ring: validate used buffer length")
      Cc: "Jason Wang" <jasowang@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7a36b03
    • Nicolas Iooss's avatar
      net: ax88796c: do not receive data in pointer · f93fd0ca
      Nicolas Iooss authored
      Function axspi_read_status calls:
      
          ret = spi_write_then_read(ax_spi->spi, ax_spi->cmd_buf, 1,
                                    (u8 *)&status, 3);
      
      status is a pointer to a struct spi_status, which is 3-byte wide:
      
          struct spi_status {
              u16 isr;
              u8 status;
          };
      
      But &status is the pointer to this pointer, and spi_write_then_read does
      not dereference this parameter:
      
          int spi_write_then_read(struct spi_device *spi,
                                  const void *txbuf, unsigned n_tx,
                                  void *rxbuf, unsigned n_rx)
      
      Therefore axspi_read_status currently receive a SPI response in the
      pointer status, which overwrites 24 bits of the pointer.
      
      Thankfully, on Little-Endian systems, the pointer is only used in
      
          le16_to_cpus(&status->isr);
      
      ... which is a no-operation. So there, the overwritten pointer is not
      dereferenced. Nevertheless on Big-Endian systems, this can lead to
      dereferencing pointers after their 24 most significant bits were
      overwritten. And in all systems this leads to possible use of
      uninitialized value in functions calling spi_write_then_read which
      expect status to be initialized when the function returns.
      
      Moreover function axspi_read_status (and macro AX_READ_STATUS) do not
      seem to be used anywhere. So currently this seems to be dead code. Fix
      the issue anyway so that future code works properly when using function
      axspi_read_status.
      
      Fixes: a97c69ba ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
      Signed-off-by: default avatarNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Acked-by: default avatarŁukasz Stelmach <l.stelmach@samsung.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f93fd0ca
    • Holger Assmann's avatar
      net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls · a6da2bbb
      Holger Assmann authored
      Currently, when user space emits SIOCSHWTSTAMP ioctl calls such as
      enabling/disabling timestamping or changing filter settings, the driver
      reads the current CLOCK_REALTIME value and programming this into the
      NIC's hardware clock. This might be necessary during system
      initialization, but at runtime, when the PTP clock has already been
      synchronized to a grandmaster, a reset of the timestamp settings might
      result in a clock jump. Furthermore, if the clock is also controlled by
      phc2sys in automatic mode (where the UTC offset is queried from ptp4l),
      that UTC-to-TAI offset (currently 37 seconds in 2021) would be
      temporarily reset to 0, and it would take a long time for phc2sys to
      readjust so that CLOCK_REALTIME and the PHC are apart by 37 seconds
      again.
      
      To address the issue, we introduce a new function called
      stmmac_init_tstamp_counter(), which gets called during ndo_open().
      It contains the code snippet moved from stmmac_hwtstamp_set() that
      manages the time synchronization. Besides, the sub second increment
      configuration is also moved here since the related values are hardware
      dependent and runtime invariant.
      
      Furthermore, the hardware clock must be kept running even when no time
      stamping mode is selected in order to retain the synchronized time base.
      That way, timestamping can be enabled again at any time only with the
      need to compensate the clock's natural drifting.
      
      As a side effect, this patch fixes the issue that ptp_clock_info::enable
      can be called before SIOCSHWTSTAMP and the driver (which looks at
      priv->systime_flags) was not prepared to handle that ordering.
      
      Fixes: 92ba6888 ("stmmac: add the support for PTP hw clock driver")
      Reported-by: default avatarMichael Olbrich <m.olbrich@pengutronix.de>
      Signed-off-by: default avatarAhmad Fatoum <a.fatoum@pengutronix.de>
      Signed-off-by: default avatarHolger Assmann <h.assmann@pengutronix.de>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6da2bbb
    • Diana Wang's avatar
      nfp: checking parameter process for rx-usecs/tx-usecs is invalid · 3bd6b2a8
      Diana Wang authored
      Use nn->tlv_caps.me_freq_mhz instead of nn->me_freq_mhz to check whether
      rx-usecs/tx-usecs is valid.
      
      This is because nn->tlv_caps.me_freq_mhz represents the clock_freq (MHz) of
      the flow processing cores (FPC) on the NIC. While nn->me_freq_mhz is not
      be set.
      
      Fixes: ce991ab6 ("nfp: read ME frequency from vNIC ctrl memory")
      Signed-off-by: default avatarDiana Wang <na.wang@corigine.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3bd6b2a8
    • Eric Dumazet's avatar
      ipv6: fix typos in __ip6_finish_output() · 19d36c5f
      Eric Dumazet authored
      We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and
      IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED
      
      Found by code inspection, please double check that fixing this bug
      does not surface other bugs.
      
      Fixes: 09ee9dba ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Tobias Brunner <tobias@strongswan.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Tested-by: default avatarTobias Brunner <tobias@strongswan.org>
      Acked-by: default avatarTobias Brunner <tobias@strongswan.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19d36c5f
    • Li Zhijian's avatar
      selftests/tc-testings: Be compatible with newer tc output · ac2944ab
      Li Zhijian authored
      old tc(iproute2-5.9.0) output:
       action order 1: bpf action.o:[action-ok] id 60 tag bcf7977d3b93787c jited default-action pipe
      newer tc(iproute2-5.14.0) output:
       action order 1: bpf action.o:[action-ok] id 64 name tag bcf7977d3b93787c jited default-action pipe
      
      It can fix below errors:
       # ok 260 f84a - Add cBPF action with invalid bytecode
       # not ok 261 e939 - Add eBPF action with valid object-file
       #       Could not match regex pattern. Verify command output:
       # total acts 0
       #
       #       action order 1: bpf action.o:[action-ok] id 42 name  tag bcf7977d3b93787c jited default-action pipe
       #        index 667 ref 1 bind 0
      Signed-off-by: default avatarLi Zhijian <zhijianx.li@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ac2944ab
    • Li Zhijian's avatar
      selftests/tc-testing: match any qdisc type · bdf1565f
      Li Zhijian authored
      We should not always presume all kernels use pfifo_fast as the default qdisc.
      
      For example, a fq_codel qdisk could have below output:
      qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Suggested-by: default avatarPeilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: default avatarLi Zhijian <zhijianx.li@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bdf1565f
    • Robert Marko's avatar
      net: dsa: qca8k: fix MTU calculation · 65258b9d
      Robert Marko authored
      qca8k has a global MTU, so its tracking the MTU per port to make sure
      that the largest MTU gets applied.
      Since it uses the frame size instead of MTU the driver MTU change function
      will then add the size of Ethernet header and checksum on top of MTU.
      
      The driver currently populates the per port MTU size as Ethernet frame
      length + checksum which equals 1518.
      
      The issue is that then MTU change function will go through all of the
      ports, find the largest MTU and apply the Ethernet header + checksum on
      top of it again, so for a desired MTU of 1500 you will end up with 1536.
      
      This is obviously incorrect, so to correct it populate the per port struct
      MTU with just the MTU and not include the Ethernet header + checksum size
      as those will be added by the MTU change function.
      
      Fixes: f58d2598 ("net: dsa: qca8k: implement the port MTU callbacks")
      Signed-off-by: default avatarRobert Marko <robert.marko@sartura.hr>
      Signed-off-by: default avatarAnsuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      65258b9d
    • Ansuel Smith's avatar
      net: dsa: qca8k: fix internal delay applied to the wrong PAD config · 3b00a07c
      Ansuel Smith authored
      With SGMII phy the internal delay is always applied to the PAD0 config.
      This is caused by the falling edge configuration that hardcode the reg
      to PAD0 (as the falling edge bits are present only in PAD0 reg)
      Move the delay configuration before the reg overwrite to correctly apply
      the delay.
      
      Fixes: cef08115 ("net: dsa: qca8k: set internal delay also for sgmii")
      Signed-off-by: default avatarAnsuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3b00a07c
  2. 20 Nov, 2021 5 commits
  3. 19 Nov, 2021 9 commits
    • Brett Creeley's avatar
      iavf: Fix VLAN feature flags after VFR · 5951a2b9
      Brett Creeley authored
      When a VF goes through a reset, it's possible for the VF's feature set
      to change. For example it may lose the VIRTCHNL_VF_OFFLOAD_VLAN
      capability after VF reset. Unfortunately, the driver doesn't correctly
      deal with this situation and errors are seen from downing/upping the
      interface and/or moving the interface in/out of a network namespace.
      
      When setting the interface down/up we see the following errors after the
      VIRTCHNL_VF_OFFLOAD_VLAN capability was taken away from the VF:
      
      ice 0000:51:00.1: VF 1 failed opcode 12, retval: -64 iavf 0000:51:09.1:
      Failed to add VLAN filter, error IAVF_NOT_SUPPORTED ice 0000:51:00.1: VF
      1 failed opcode 13, retval: -64 iavf 0000:51:09.1: Failed to delete VLAN
      filter, error IAVF_NOT_SUPPORTED
      
      These add/delete errors are happening because the VLAN filters are
      tracked internally to the driver and regardless of the VLAN_ALLOWED()
      setting the driver tries to delete/re-add them over virtchnl.
      
      Fix the delete failure by making sure to delete any VLAN filter tracking
      in the driver when a removal request is made, while preventing the
      virtchnl request.  This makes it so the driver's VLAN list is up to date
      and the errors are
      
      Fix the add failure by making sure the check for VLAN_ALLOWED() during
      reset is done after the VF receives its capability list from the PF via
      VIRTCHNL_OP_GET_VF_RESOURCES. If VLAN functionality is not allowed, then
      prevent requesting re-adding the filters over virtchnl.
      
      When moving the interface into a network namespace we see the following
      errors after the VIRTCHNL_VF_OFFLOAD_VLAN capability was taken away from
      the VF:
      
      iavf 0000:51:09.1 enp81s0f1v1: NIC Link is Up Speed is 25 Gbps Full Duplex
      iavf 0000:51:09.1 temp_27: renamed from enp81s0f1v1
      iavf 0000:51:09.1 mgmt: renamed from temp_27
      iavf 0000:51:09.1 dev27: set_features() failed (-22); wanted 0x020190001fd54833, left 0x020190001fd54bb3
      
      These errors are happening because we aren't correctly updating the
      netdev capabilities and dealing with ndo_fix_features() and
      ndo_set_features() correctly.
      
      Fix this by only reporting errors in the driver's ndo_set_features()
      callback when VIRTCHNL_VF_OFFLOAD_VLAN is not allowed and any attempt to
      enable the VLAN features is made. Also, make sure to disable VLAN
      insertion, filtering, and stripping since the VIRTCHNL_VF_OFFLOAD_VLAN
      flag applies to all of them and not just VLAN stripping.
      
      Also, after we process the capabilities in the VF reset path, make sure
      to call netdev_update_features() in case the capabilities have changed
      in order to update the netdev's feature set to match the VF's actual
      capabilities.
      
      Lastly, make sure to always report success on VLAN filter delete when
      VIRTCHNL_VF_OFFLOAD_VLAN is not supported. The changed flow in
      iavf_del_vlans() allows the stack to delete previosly existing VLAN
      filters even if VLAN filtering is not allowed. This makes it so the VLAN
      filter list is up to date.
      
      Fixes: 8774370d ("i40e/i40evf: support for VF VLAN tag stripping control")
      Signed-off-by: default avatarBrett Creeley <brett.creeley@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      5951a2b9
    • Jedrzej Jagielski's avatar
      iavf: Fix refreshing iavf adapter stats on ethtool request · 3b5bdd18
      Jedrzej Jagielski authored
      Currently iavf adapter statistics are refreshed only in a
      watchdog task, triggered approximately every two seconds,
      which causes some ethtool requests to return outdated values.
      
      Add explicit statistics refresh when requested by ethtool -S.
      
      Fixes: b476b003 ("iavf: Move commands processing to the separate function")
      Signed-off-by: default avatarJan Sokolowski <jan.sokolowski@intel.com>
      Signed-off-by: default avatarJedrzej Jagielski <jedrzej.jagielski@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      3b5bdd18
    • Jedrzej Jagielski's avatar
      iavf: Fix deadlock occurrence during resetting VF interface · 0cc318d2
      Jedrzej Jagielski authored
      System hangs if close the interface is called from the kernel during
      the interface is in resetting state.
      During resetting operation the link is closing but kernel didn't
      know it and it tried to close this interface again what sometimes
      led to deadlock.
      Inform kernel about current state of interface
      and turn off the flag IFF_UP when interface is closing until reset
      is finished.
      Previously it was most likely to hang the system when kernel
      (network manager) tried to close the interface in the same time
      when interface was in resetting state because of deadlock.
      
      Fixes: 3c8e0b98 ("i40vf: don't stop me now")
      Signed-off-by: default avatarJaroslaw Gawin <jaroslawx.gawin@intel.com>
      Signed-off-by: default avatarJedrzej Jagielski <jedrzej.jagielski@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      0cc318d2
    • Nitesh B Venkatesh's avatar
      iavf: Prevent changing static ITR values if adaptive moderation is on · e792779e
      Nitesh B Venkatesh authored
      Resolve being able to change static values on VF when adaptive interrupt
      moderation is enabled.
      
      This problem is fixed by checking the interrupt settings is not
      a combination of change of static value while adaptive interrupt
      moderation is turned on.
      
      Without this fix, the user would be able to change static values
      on VF with adaptive moderation enabled.
      
      Fixes: 65e87c03 ("i40evf: support queue-specific settings for interrupt moderation")
      Signed-off-by: default avatarNitesh B Venkatesh <nitesh.b.venkatesh@intel.com>
      Tested-by: default avatarGeorge Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      e792779e
    • Zekun Shen's avatar
      stmmac_pci: Fix underflow size in stmmac_rx · 0f296e78
      Zekun Shen authored
      This bug report came up when we were testing the device driver
      by fuzzing. It shows that buf1_len can get underflowed and be
      0xfffffffc (4294967292).
      
      This bug is triggerable with a compromised/malfunctioning device.
      We found the bug through QEMU emulation tested the patch with
      emulation. We did NOT test it on real hardware.
      
      Attached is the bug report by fuzzing.
      
      BUG: KASAN: use-after-free in stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
      Read of size 4294967292 at addr ffff888016358000 by task ksoftirqd/0/9
      
      CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 #1
      Call Trace:
       dump_stack+0x76/0xa0
       print_address_description.constprop.0+0x16/0x200
       ? stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
       ? stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
       __kasan_report.cold+0x37/0x7c
       ? stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
       kasan_report+0xe/0x20
       check_memory_region+0x15a/0x1d0
       memcpy+0x20/0x50
       stmmac_napi_poll_rx+0x1c08/0x36e0 [stmmac]
       ? stmmac_suspend+0x850/0x850 [stmmac]
       ? __next_timer_interrupt+0xba/0xf0
       net_rx_action+0x363/0xbd0
       ? call_timer_fn+0x240/0x240
       ? __switch_to_asm+0x40/0x70
       ? napi_busy_loop+0x520/0x520
       ? __schedule+0x839/0x15a0
       __do_softirq+0x18c/0x634
       ? takeover_tasklets+0x5f0/0x5f0
       run_ksoftirqd+0x15/0x20
       smpboot_thread_fn+0x2f1/0x6b0
       ? smpboot_unregister_percpu_thread+0x160/0x160
       ? __kthread_parkme+0x80/0x100
       ? smpboot_unregister_percpu_thread+0x160/0x160
       kthread+0x2b5/0x3b0
       ? kthread_create_on_node+0xd0/0xd0
       ret_from_fork+0x22/0x40
      Reported-by: default avatarBrendan Dolan-Gavitt <brendandg@nyu.edu>
      Signed-off-by: default avatarZekun Shen <bruceshenzk@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0f296e78
    • Zekun Shen's avatar
      atlantic: fix double-free in aq_ring_tx_clean · 6a405f6c
      Zekun Shen authored
      We found this bug while fuzzing the device driver. Using and freeing
      the dangling pointer buff->skb would cause use-after-free and
      double-free.
      
      This bug is triggerable with compromised/malfunctioning devices. We
      found the bug with QEMU emulation and tested the patch by emulation.
      We did NOT test on a real device.
      
      Attached is the bug report.
      
      BUG: KASAN: double-free or invalid-free in consume_skb+0x6c/0x1c0
      
      Call Trace:
       dump_stack+0x76/0xa0
       print_address_description.constprop.0+0x16/0x200
       ? consume_skb+0x6c/0x1c0
       kasan_report_invalid_free+0x61/0xa0
       ? consume_skb+0x6c/0x1c0
       __kasan_slab_free+0x15e/0x170
       ? consume_skb+0x6c/0x1c0
       kfree+0x8c/0x230
       consume_skb+0x6c/0x1c0
       aq_ring_tx_clean+0x5c2/0xa80 [atlantic]
       aq_vec_poll+0x309/0x5d0 [atlantic]
       ? _sub_I_65535_1+0x20/0x20 [atlantic]
       ? __next_timer_interrupt+0xba/0xf0
       net_rx_action+0x363/0xbd0
       ? call_timer_fn+0x240/0x240
       ? __switch_to_asm+0x34/0x70
       ? napi_busy_loop+0x520/0x520
       ? net_tx_action+0x379/0x720
       __do_softirq+0x18c/0x634
       ? takeover_tasklets+0x5f0/0x5f0
       run_ksoftirqd+0x15/0x20
       smpboot_thread_fn+0x2f1/0x6b0
       ? smpboot_unregister_percpu_thread+0x160/0x160
       ? __kthread_parkme+0x80/0x100
       ? smpboot_unregister_percpu_thread+0x160/0x160
       kthread+0x2b5/0x3b0
       ? kthread_create_on_node+0xd0/0xd0
       ret_from_fork+0x22/0x40
      Reported-by: default avatarBrendan Dolan-Gavitt <brendandg@nyu.edu>
      Signed-off-by: default avatarZekun Shen <bruceshenzk@gmail.com>
      Reviewed-by: default avatarIgor Russkikh <irusskikh@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6a405f6c
    • Volodymyr Mytnyk's avatar
      net: marvell: prestera: fix double free issue on err path · e8d03250
      Volodymyr Mytnyk authored
      fix error path handling in prestera_bridge_port_join() that
      cases prestera driver to crash (see below).
      
       Trace:
         Internal error: Oops: 96000044 [#1] SMP
         Modules linked in: prestera_pci prestera uio_pdrv_genirq
         CPU: 1 PID: 881 Comm: ip Not tainted 5.15.0 #1
         pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
         pc : prestera_bridge_destroy+0x2c/0xb0 [prestera]
         lr : prestera_bridge_port_join+0x2cc/0x350 [prestera]
         sp : ffff800011a1b0f0
         ...
         x2 : ffff000109ca6c80 x1 : dead000000000100 x0 : dead000000000122
          Call trace:
         prestera_bridge_destroy+0x2c/0xb0 [prestera]
         prestera_bridge_port_join+0x2cc/0x350 [prestera]
         prestera_netdev_port_event.constprop.0+0x3c4/0x450 [prestera]
         prestera_netdev_event_handler+0xf4/0x110 [prestera]
         raw_notifier_call_chain+0x54/0x80
         call_netdevice_notifiers_info+0x54/0xa0
         __netdev_upper_dev_link+0x19c/0x380
      
      Fixes: e1189d9a ("net: marvell: prestera: Add Switchdev driver implementation")
      Signed-off-by: default avatarVolodymyr Mytnyk <vmytnyk@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e8d03250
    • Volodymyr Mytnyk's avatar
      net: marvell: prestera: fix brige port operation · 253e9b4d
      Volodymyr Mytnyk authored
      Return NOTIFY_DONE (dont't care) for switchdev notifications
      that prestera driver don't know how to handle them.
      
      With introduction of SWITCHDEV_BRPORT_[UN]OFFLOADED switchdev
      events, the driver rejects adding swport to bridge operation
      which is handled by prestera_bridge_port_join() func. The root
      cause of this is that prestera driver returns error (EOPNOTSUPP)
      in prestera_switchdev_blk_event() handler for unknown swdev
      events. This causes switchdev_bridge_port_offload() to fail
      when adding port to bridge in prestera_bridge_port_join().
      
      Fixes: 957e2235 ("net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge")
      Signed-off-by: default avatarVolodymyr Mytnyk <vmytnyk@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      253e9b4d
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · d6821c5b
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for net:
      
      1) Add selftest for vrf+conntrack, from Florian Westphal.
      
      2) Extend nfqueue selftest to cover nfqueue, also from Florian.
      
      3) Remove duplicated include in nft_payload, from Wan Jiabing.
      
      4) Several improvements to the nat port shadowing selftest,
         from Phil Sutter.
      
      5) Fix filtering of reply tuple in ctnetlink, from Florent Fourcot.
      
      6) Do not override error with -EINVAL in filter setup path, also
         from Florent.
      
      7) Honor sysctl_expire_nodest_conn regardless conn_reuse_mode for
         reused connections, from yangxingwu.
      
      8) Replace snprintf() by sysfs_emit() in xt_IDLETIMER as reported
         by Coccinelle, from Jing Yao.
      
      9) Incorrect IPv6 tunnel match in flowtable offload, from Will
         Mortensen.
      
      10) Switch port shadow selftest to use socat, from Florian Westphal.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6821c5b
  4. 18 Nov, 2021 13 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8d0112ac
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf, mac80211.
      
        Current release - regressions:
      
         - devlink: don't throw an error if flash notification sent before
           devlink visible
      
         - page_pool: Revert "page_pool: disable dma mapping support...",
           turns out there are active arches who need it
      
        Current release - new code bugs:
      
         - amt: cancel delayed_work synchronously in amt_fini()
      
        Previous releases - regressions:
      
         - xsk: fix crash on double free in buffer pool
      
         - bpf: fix inner map state pruning regression causing program
           rejections
      
         - mac80211: drop check for DONT_REORDER in __ieee80211_select_queue,
           preventing mis-selecting the best effort queue
      
         - mac80211: do not access the IV when it was stripped
      
         - mac80211: fix radiotap header generation, off-by-one
      
         - nl80211: fix getting radio statistics in survey dump
      
         - e100: fix device suspend/resume
      
        Previous releases - always broken:
      
         - tcp: fix uninitialized access in skb frags array for Rx 0cp
      
         - bpf: fix toctou on read-only map's constant scalar tracking
      
         - bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing
           progs
      
         - tipc: only accept encrypted MSG_CRYPTO msgs
      
         - smc: transfer remaining wait queue entries during fallback, fix
           missing wake ups
      
         - udp: validate checksum in udp_read_sock() (when sockmap is used)
      
         - sched: act_mirred: drop dst for the direction from egress to
           ingress
      
         - virtio_net_hdr_to_skb: count transport header in UFO, prevent
           allowing bad skbs into the stack
      
         - nfc: reorder the logic in nfc_{un,}register_device, fix unregister
      
         - ipsec: check return value of ipv6_skip_exthdr
      
         - usb: r8152: add MAC passthrough support for more Lenovo Docks"
      
      * tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
        ptp: ocp: Fix a couple NULL vs IS_ERR() checks
        net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
        net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
        ipv6: check return value of ipv6_skip_exthdr
        e100: fix device suspend/resume
        devlink: Don't throw an error if flash notification sent before devlink visible
        page_pool: Revert "page_pool: disable dma mapping support..."
        ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
        octeontx2-af: debugfs: don't corrupt user memory
        NFC: add NCI_UNREG flag to eliminate the race
        NFC: reorder the logic in nfc_{un,}register_device
        NFC: reorganize the functions in nci_request
        tipc: check for null after calling kmemdup
        i40e: Fix display error code in dmesg
        i40e: Fix creation of first queue by omitting it if is not power of two
        i40e: Fix warning message and call stack during rmmod i40e driver
        i40e: Fix ping is lost after configuring ADq on VF
        i40e: Fix changing previously set num_queue_pairs for PFs
        i40e: Fix NULL ptr dereference on VSI filter sync
        i40e: Fix correct max_pkt_size on VF RX queue
        ...
      8d0112ac
    • Linus Torvalds's avatar
      Merge tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6fdf8864
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Several xes and one old ioctl deprecation. Namely there's fix for
        crashes/warnings with lzo compression that was suspected to be caused
        by first pull merge resolution, but it was a different bug.
      
        Summary:
      
         - regression fix for a crash in lzo due to missing boundary checks of
           the page array
      
         - fix crashes on ARM64 due to missing barriers when synchronizing
           status bits between work queues
      
         - silence lockdep when reading chunk tree during mount
      
         - fix false positive warning in integrity checker on devices with
           disabled write caching
      
         - fix signedness of bitfields in scrub
      
         - start deprecation of balance v1 ioctl"
      
      * tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: deprecate BTRFS_IOC_BALANCE ioctl
        btrfs: make 1-bit bit-fields of scrub_page unsigned int
        btrfs: check-integrity: fix a warning on write caching disabled disk
        btrfs: silence lockdep when reading chunk tree during mount
        btrfs: fix memory ordering between normal and ordered work functions
        btrfs: fix a out-of-bound access in copy_compressed_data_to_page()
      6fdf8864
    • Linus Torvalds's avatar
      Merge tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · db850a9b
      Linus Torvalds authored
      Pull UDF fix from Jan Kara:
       "A fix for a long-standing UDF bug where we were not properly
        validating directory position inside readdir"
      
      * tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: Fix crash after seekdir
      db850a9b
    • Linus Torvalds's avatar
      Merge tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 7cf7eed1
      Linus Torvalds authored
      Pull setattr idmapping fix from Christian Brauner:
       "This contains a simple fix for setattr. When determining the validity
        of the attributes the ia_{g,u}id fields contain the value that will be
        written to inode->i_{g,u}id. When the {g,u}id attribute of the file
        isn't altered and the caller's fs{g,u}id matches the current {g,u}id
        attribute the attribute change is allowed.
      
        The value in ia_{g,u}id does already account for idmapped mounts and
        will have taken the relevant idmapping into account. So in order to
        verify that the {g,u}id attribute isn't changed we simple need to
        compare the ia_{g,u}id value against the inode's i_{g,u}id value.
      
        This only has any meaning for idmapped mounts as idmapping helpers are
        idempotent without them. And for idmapped mounts this really only has
        a meaning when circular idmappings are used, i.e. mappings where e.g.
        id 1000 is mapped to id 1001 and id 1001 is mapped to id 1000. Such
        ciruclar mappings can e.g. be useful when sharing the same home
        directory between multiple users at the same time.
      
        Before this patch we could end up denying legitimate attribute changes
        and allowing invalid attribute changes when circular mappings are
        used. To even get into this situation the caller must've been
        privileged both to create that mapping and to create that idmapped
        mount.
      
        This hasn't been seen in the wild anywhere but came up when expanding
        the fstest suite during work on a series of hardening patches. All
        idmapped fstests pass without any regressions and we're adding new
        tests to verify the behavior of circular mappings.
      
        The new tests can be found at [1]"
      
      Link: https://lore.kernel.org/linux-fsdevel/20211109145713.1868404-2-brauner@kernel.org [1]
      
      * tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        fs: handle circular mappings correctly
      7cf7eed1
    • Linus Torvalds's avatar
      Merge tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · a6a6d227
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "parisc bug and warning fixes and wire up futex_waitv.
      
        Fix some warnings which showed up with allmodconfig builds, a revert
        of a change to the sigreturn trampoline which broke signal handling,
        wire up futex_waitv and add CONFIG_PRINTK_TIME=y to 32bit defconfig"
      
      * tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig
        Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
        parisc: Wrap assembler related defines inside __ASSEMBLY__
        parisc: Wire up futex_waitv
        parisc: Include stringify.h to avoid build error in crypto/api.c
        parisc/sticon: fix reverse colors
      a6a6d227
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · c46e8ece
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Selftest changes:
      
         - Cleanups for the perf test infrastructure and mapping hugepages
      
         - Avoid contention on mmap_sem when the guests start to run
      
         - Add event channel upcall support to xen_shinfo_test
      
        x86 changes:
      
         - Fixes for Xen emulation
      
         - Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
      
         - Fixes for migration of 32-bit nested guests on 64-bit hypervisor
      
         - Compilation fixes
      
         - More SEV cleanups
      
        Generic:
      
         - Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
           and num_online_cpus(). Most architectures were only using one of
           the two"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
        KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
        KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
        KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
        KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
        KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
        KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
        KVM: x86: Assume a 64-bit hypercall for guests with protected state
        selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
        riscv: kvm: fix non-kernel-doc comment block
        KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
        KVM: SEV: Drop a redundant setting of sev->asid during initialization
        KVM: SEV: WARN if SEV-ES is marked active but SEV is not
        KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
        KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
        KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
        KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
        KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
        KVM: x86/xen: Use sizeof_field() instead of open-coding it
        KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
        KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
        ...
      c46e8ece
    • Linus Torvalds's avatar
      Merge tag 'docs-5.16-2' of git://git.lwn.net/linux · 4ae275bc
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "A handful of documentation fixes for 5.16"
      
      * tag 'docs-5.16-2' of git://git.lwn.net/linux:
        Documentation/process: fix a cross reference
        Documentation: update vcpu-requests.rst reference
        docs: accounting: update delay-accounting.rst reference
        libbpf: update index.rst reference
        docs: filesystems: Fix grammatical error "with" to "which"
        doc/zh_CN: fix a translation error in management-style
        docs: ftrace: fix the wrong path of tracefs
        Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document
        Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375
        Documentation: arm: marvell: Add some links to homepage / product infos
        docs: Update Sphinx requirements
      4ae275bc
    • Linus Torvalds's avatar
      Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · 7d5775d4
      Linus Torvalds authored
      Pull printk fixes from Petr Mladek:
      
       - Try to flush backtraces from other CPUs also on the local one. This
         was a regression caused by printk_safe buffers removal.
      
       - Remove header dependency warning.
      
      * tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk: Remove printk.h inclusion in percpu.h
        printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
      7d5775d4
    • Dan Carpenter's avatar
      ptp: ocp: Fix a couple NULL vs IS_ERR() checks · c7521d3a
      Dan Carpenter authored
      The ptp_ocp_get_mem() function does not return NULL, it returns error
      pointers.
      
      Fixes: 773bda96 ("ptp: ocp: Expose various resources on the timecard.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7521d3a
    • Teng Qi's avatar
      net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() · 0fa68da7
      Teng Qi authored
      The definition of macro MOTO_SROM_BUG is:
        #define MOTO_SROM_BUG    (lp->active == 8 && (get_unaligned_le32(
        dev->dev_addr) & 0x00ffffff) == 0x3e0008)
      
      and the if statement
        if (MOTO_SROM_BUG) lp->active = 0;
      
      using this macro indicates lp->active could be 8. If lp->active is 8 and
      the second comparison of this macro is false. lp->active will remain 8 in:
        lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1);
        lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1);
        lp->phy[lp->active].mc  = get_unaligned_le16(p); p += 2;
        lp->phy[lp->active].ana = get_unaligned_le16(p); p += 2;
        lp->phy[lp->active].fdx = get_unaligned_le16(p); p += 2;
        lp->phy[lp->active].ttm = get_unaligned_le16(p); p += 2;
        lp->phy[lp->active].mci = *p;
      
      However, the length of array lp->phy is 8, so array overflows can occur.
      To fix these possible array overflows, we first check lp->active and then
      return -EINVAL if it is greater or equal to ARRAY_SIZE(lp->phy) (i.e. 8).
      Reported-by: default avatarTOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: default avatarTeng Qi <starmiku1207184332@gmail.com>
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0fa68da7
    • zhangyue's avatar
      net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound · 61217be8
      zhangyue authored
      In line 5001, if all id in the array 'lp->phy[8]' is not 0, when the
      'for' end, the 'k' is 8.
      
      At this time, the array 'lp->phy[8]' may be out of bound.
      Signed-off-by: default avatarzhangyue <zhangyue1@kylinos.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61217be8
    • David S. Miller's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net- · 4e5d2124
      David S. Miller authored
      queue
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-11-17
      
      This series contains updates to i40e driver only.
      
      Eryk adds accounting for VLAN header in packet size when VF port VLAN is
      configured. He also fixes TC queue distribution when the user has changed
      queue counts as well as for configuration of VF ADQ which caused dropped
      packets.
      
      Michal adds tracking for when a VSI is being released to prevent null
      pointer dereference when managing filters.
      
      Karen ensures PF successfully initiates VF requested reset which could
      cause a call trace otherwise.
      
      Jedrzej moves validation of channel queue value earlier to prevent
      partial configuration when the value is invalid.
      
      Grzegorz corrects the reported error when adding filter fails.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e5d2124
    • Jordy Zomer's avatar
      ipv6: check return value of ipv6_skip_exthdr · 5f9c55c8
      Jordy Zomer authored
      The offset value is used in pointer math on skb->data.
      Since ipv6_skip_exthdr may return -1 the pointer to uh and th
      may not point to the actual udp and tcp headers and potentially
      overwrite other stuff. This is why I think this should be checked.
      
      EDIT:  added {}'s, thanks Kees
      Signed-off-by: default avatarJordy Zomer <jordy@pwning.systems>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f9c55c8