1. 29 Aug, 2021 1 commit
  2. 28 Aug, 2021 2 commits
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 2619835e
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-08-27
      
      This series contains updates to ice driver only.
      
      Jake corrects the iterator used for looping Tx timestamp and removes
      dead code related to pin configuration. He also adds locking around
      flushing of the Tx tracker and restarts the periodic clock following
      time changes.
      
      Brett corrects the locking around updating netdev dev_addr.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2619835e
    • Vladimir Oltean's avatar
      net: phy: marvell10g: fix broken PHY interrupts for anyone after us in the driver probe list · 0d55649d
      Vladimir Oltean authored
      Enabling interrupts via device tree for the internal PHYs on the
      mv88e6390 DSA switch does not work. The driver insists to use poll mode.
      
      Stage one debugging shows that the fwnode_mdiobus_phy_device_register
      function calls fwnode_irq_get properly, and phy->irq is set to a valid
      interrupt line initially.
      
      But it is then cleared.
      
      Stage two debugging shows that it is cleared here:
      
      phy_probe:
      
        /* Disable the interrupt if the PHY doesn't support it
         * but the interrupt is still a valid one
         */
        if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
      	phydev->irq = PHY_POLL;
      
      Okay, so does the "Marvell 88E6390 Family" PHY driver not have the
      .config_intr and .handle_interrupt function pointers? Yes it does.
      
      Stage three debugging shows that the PHY device does not attempt a probe
      against the "Marvell 88E6390 Family" driver, but against the "mv88x3310"
      driver.
      
      Okay, so why does the "mv88x3310" driver match on a mv88x6390 internal
      PHY? The PHY IDs (MARVELL_PHY_ID_88E6390_FAMILY vs
      MARVELL_PHY_ID_88X3310) are way different.
      
      Stage four debugging has us looking through:
      
      phy_device_register
      -> device_add
         -> bus_probe_device
            -> device_initial_probe
               -> __device_attach
                  -> bus_for_each_drv
                     -> driver_match_device
                        -> drv->bus->match
                           -> phy_bus_match
      
      Okay, so as we said, the MII_PHYSID1 of mv88e6390 does not match the
      mv88x3310 driver's PHY mask & ID, so why would phy_bus_match return...
      
      Ahh, phy_bus_match calls a shortcircuit method,
      phydrv->match_phy_device, and does not even bother to compare the PHY ID
      if that is implemented.
      
      So of course, we go inside the marvell10g.c driver and sure enough, it
      implements .match_phy_device and does not bother to check the PHY ID.
      
      What's interesting though is that at the end of the device_add() from
      phy_device_register(), the driver for the internal PHYs _is_ the proper
      "Marvell 88E6390 Family". This is because "mv88x3310" ends up failing to
      probe after all, and __device_attach_driver(), to quote:
      
        /*
         * Ignore errors returned by ->probe so that the next driver can try
         * its luck.
         */
      
      The next (and only other) driver that matches is the 6390 driver. For
      this one, phy_probe doesn't fail, and everything expects to work as
      normal, EXCEPT phydev->irq has already been cleared by the previous
      unsuccessful probe of a driver which did not implement PHY interrupts,
      and therefore cleared that IRQ.
      
      Okay, so it is not just Marvell 6390 that has PHY interrupts broken.
      Stuff like Atheros, Aquantia, Broadcom, Qualcomm work because they are
      lexicographically before Marvell, and stuff like NXP, Realtek, Vitesse
      are broken.
      
      This goes to show how fragile it is to reset phydev->irq = PHY_POLL from
      the actual beginning of phy_probe itself. That seems like an actual bug
      of its own too, since phy_probe has side effects which are not restored
      on probe failure, but the line of thought probably was, the same driver
      will attempt probe again, so it doesn't matter. Well, looks like it
      does.
      
      Maybe it would make more sense to move the phydev->irq clearing after
      the actual device_add() in phy_device_register() completes, and the
      bound driver is the actual final one.
      
      (also, a bit frightening that drivers are permitted to bypass the MDIO
      bus matching in such a trivial way and perform PHY reads and writes from
      the .match_phy_device method, on devices that do not even belong to
      them. In the general case it might not be guaranteed that the MDIO
      accesses one driver needs to make to figure out whether to match on a
      device is safe for all other PHY devices)
      
      Fixes: a5de4be0 ("net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMarek Behún <kabel@kernel.org>
      Signed-off-by: default avatarMarek Behún <kabel@kernel.org>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20210827132541.28953-1-kabel@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0d55649d
  3. 27 Aug, 2021 7 commits
    • Brett Creeley's avatar
      ice: Only lock to update netdev dev_addr · b357d971
      Brett Creeley authored
      commit 3ba7f53f ("ice: don't remove netdev->dev_addr from uc sync
      list") introduced calls to netif_addr_lock_bh() and
      netif_addr_unlock_bh() in the driver's ndo_set_mac() callback. This is
      fine since the driver is updated the netdev's dev_addr, but since this
      is a spinlock, the driver cannot sleep when the lock is held.
      Unfortunately the functions to add/delete MAC filters depend on a mutex.
      This was causing a trace with the lock debug kernel config options
      enabled when changing the mac address via iproute.
      
      [  203.273059] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
      [  203.273065] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6698, name: ip
      [  203.273068] Preemption disabled at:
      [  203.273068] [<ffffffffc04aaeab>] ice_set_mac_address+0x8b/0x1c0 [ice]
      [  203.273097] CPU: 31 PID: 6698 Comm: ip Tainted: G S      W I       5.14.0-rc4 #2
      [  203.273100] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
      [  203.273102] Call Trace:
      [  203.273107]  dump_stack_lvl+0x33/0x42
      [  203.273113]  ? ice_set_mac_address+0x8b/0x1c0 [ice]
      [  203.273124]  ___might_sleep.cold.150+0xda/0xea
      [  203.273131]  mutex_lock+0x1c/0x40
      [  203.273136]  ice_remove_mac+0xe3/0x180 [ice]
      [  203.273155]  ? ice_fltr_add_mac_list+0x20/0x20 [ice]
      [  203.273175]  ice_fltr_prepare_mac+0x43/0xa0 [ice]
      [  203.273194]  ice_set_mac_address+0xab/0x1c0 [ice]
      [  203.273206]  dev_set_mac_address+0xb8/0x120
      [  203.273210]  dev_set_mac_address_user+0x2c/0x50
      [  203.273212]  do_setlink+0x1dd/0x10e0
      [  203.273217]  ? __nla_validate_parse+0x12d/0x1a0
      [  203.273221]  __rtnl_newlink+0x530/0x910
      [  203.273224]  ? __kmalloc_node_track_caller+0x17f/0x380
      [  203.273230]  ? preempt_count_add+0x68/0xa0
      [  203.273236]  ? _raw_spin_lock_irqsave+0x1f/0x30
      [  203.273241]  ? kmem_cache_alloc_trace+0x4d/0x440
      [  203.273244]  rtnl_newlink+0x43/0x60
      [  203.273245]  rtnetlink_rcv_msg+0x13a/0x380
      [  203.273248]  ? rtnl_calcit.isra.40+0x130/0x130
      [  203.273250]  netlink_rcv_skb+0x4e/0x100
      [  203.273256]  netlink_unicast+0x1a2/0x280
      [  203.273258]  netlink_sendmsg+0x242/0x490
      [  203.273260]  sock_sendmsg+0x58/0x60
      [  203.273263]  ____sys_sendmsg+0x1ef/0x260
      [  203.273265]  ? copy_msghdr_from_user+0x5c/0x90
      [  203.273268]  ? ____sys_recvmsg+0xe6/0x170
      [  203.273270]  ___sys_sendmsg+0x7c/0xc0
      [  203.273272]  ? copy_msghdr_from_user+0x5c/0x90
      [  203.273274]  ? ___sys_recvmsg+0x89/0xc0
      [  203.273276]  ? __netlink_sendskb+0x50/0x50
      [  203.273278]  ? mod_objcg_state+0xee/0x310
      [  203.273282]  ? __dentry_kill+0x114/0x170
      [  203.273286]  ? get_max_files+0x10/0x10
      [  203.273288]  __sys_sendmsg+0x57/0xa0
      [  203.273290]  do_syscall_64+0x37/0x80
      [  203.273295]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  203.273296] RIP: 0033:0x7f8edf96e278
      [  203.273298] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 63 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
      [  203.273300] RSP: 002b:00007ffcb8bdac08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  203.273303] RAX: ffffffffffffffda RBX: 000000006115e0ae RCX: 00007f8edf96e278
      [  203.273304] RDX: 0000000000000000 RSI: 00007ffcb8bdac70 RDI: 0000000000000003
      [  203.273305] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffcb8bda5b0
      [  203.273306] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
      [  203.273306] R13: 0000555e10092020 R14: 0000000000000000 R15: 0000000000000005
      
      Fix this by only locking when changing the netdev->dev_addr. Also, make
      sure to restore the old netdev->dev_addr on any failures.
      
      Fixes: 3ba7f53f ("ice: don't remove netdev->dev_addr from uc sync list")
      Signed-off-by: default avatarBrett Creeley <brett.creeley@intel.com>
      Tested-by: default avatarGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      b357d971
    • Jacob Keller's avatar
      ice: restart periodic outputs around time changes · 9ee31343
      Jacob Keller authored
      When we enabled auxiliary input/output support for the E810 device, we
      forgot to add logic to restart the output when we change time. This is
      important as the periodic output will be incorrect after a time change
      otherwise.
      
      This unfortunately includes the adjust time function, even though it
      uses an atomic hardware interface. The atomic adjustment can still cause
      the pin output to stall permanently, so we need to stop and restart it.
      
      Introduce wrapper functions to temporarily disable and then re-enable
      the clock outputs.
      
      Fixes: 172db5f9 ("ice: add support for auxiliary input/output pins")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarSunitha D Mekala <sunithax.d.mekala@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      9ee31343
    • Jacob Keller's avatar
      ice: add lock around Tx timestamp tracker flush · 4dd0d5c3
      Jacob Keller authored
      The driver didn't take the lock while flushing the Tx tracker, which
      could cause a race where one thread is trying to read timestamps out
      while another thread is trying to read the tracker to check the
      timestamps.
      
      Avoid this by ensuring that flushing is locked against read accesses.
      
      Fixes: ea9b847c ("ice: enable transmit timestamps for E810 devices")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      4dd0d5c3
    • Jacob Keller's avatar
      ice: remove dead code for allocating pin_config · 1f0cbb3e
      Jacob Keller authored
      We have code in the ice driver which allocates the pin_config structure
      if n_pins is > 0, but we never set n_pins to be greater than zero.
      There's no reason to keep this code until we actually have pin_config
      support. Remove this. We can re-add it properly when we implement
      support for pin_config for E810-T devices.
      
      Fixes: 172db5f9 ("ice: add support for auxiliary input/output pins")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      1f0cbb3e
    • Jacob Keller's avatar
      ice: fix Tx queue iteration for Tx timestamp enablement · 84c5fb8c
      Jacob Keller authored
      The driver accidentally copied the ice_for_each_rxq iterator when
      implementing enablement of the ptp_tx bit for the Tx rings. We still
      load the Tx rings and set the ptp_tx field, but we iterate over the
      count of the num_rxq.
      
      If the number of Tx and Rx queues differ, this could either cause
      a buffer overrun when accessing the tx_rings list if num_txq is greater
      than num_rxq, or it could cause us to fail to enable Tx timestamps for
      some rings.
      
      This was not noticed originally as we generally have the same number of
      Tx and Rx queues.
      
      Fixes: ea9b847c ("ice: enable transmit timestamps for E810 devices")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      84c5fb8c
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 5fe2a6b4
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-08-26
      
      This series introduces some fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5fe2a6b4
    • Peter Collingbourne's avatar
      net: don't unconditionally copy_from_user a struct ifreq for socket ioctls · d0efb162
      Peter Collingbourne authored
      A common implementation of isatty(3) involves calling a ioctl passing
      a dummy struct argument and checking whether the syscall failed --
      bionic and glibc use TCGETS (passing a struct termios), and musl uses
      TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
      copy sizeof(struct ifreq) bytes of data from the argument and return
      -EFAULT if that fails. The result is that the isatty implementations
      may return a non-POSIX-compliant value in errno in the case where part
      of the dummy struct argument is inaccessible, as both struct termios
      and struct winsize are smaller than struct ifreq (at least on arm64).
      
      Although there is usually enough stack space following the argument
      on the stack that this did not present a practical problem up to now,
      with MTE stack instrumentation it's more likely for the copy to fail,
      as the memory following the struct may have a different tag.
      
      Fix the problem by adding an early check for whether the ioctl is a
      valid socket ioctl, and return -ENOTTY if it isn't.
      
      Fixes: 44c02a2c ("dev_ioctl(): move copyin/copyout to callers")
      Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4dSigned-off-by: default avatarPeter Collingbourne <pcc@google.com>
      Cc: <stable@vger.kernel.org> # 4.19
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0efb162
  4. 26 Aug, 2021 30 commits