1. 11 Nov, 2019 2 commits
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Restore PTP time after switch reset · 6cf99c13
      Vladimir Oltean authored
      The PTP time of the switch is not preserved when uploading a new static
      configuration. Work around this hardware oddity by reading its PTP time
      before a static config upload, and restoring it afterwards.
      
      Static config changes are expected to occur at runtime even in scenarios
      directly related to PTP, i.e. the Time-Aware Scheduler of the switch is
      programmed in this way.
      
      Perhaps the larger implication of this patch is that the PTP .gettimex64
      and .settime functions need to be exposed to sja1105_main.c, where the
      PTP lock needs to be held during this entire process. So their core
      implementation needs to move to some common functions which get exposed
      in sja1105_ptp.h.
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6cf99c13
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Implement the .gettimex64 system call for PTP · 34d76e9f
      Vladimir Oltean authored
      Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
      applications (i.e. phc2sys) to compensate for the delays incurred while
      reading the PHC's time.
      
      The task itself of taking the software timestamp is delegated to the SPI
      subsystem, through the newly introduced API in struct spi_transfer. The
      goal is to cross-timestamp I/O operations on the switch's PTP clock with
      values in the local system clock (CLOCK_REALTIME). For that we need to
      understand a bit of the hardware internals.
      
      The 'read PTP time' message is a 12 byte structure, first 4 bytes of
      which represent the SPI header, and the last 8 bytes represent the
      64-bit PTP time. The switch itself starts processing the command
      immediately after receiving the last bit of the address, i.e. at the
      middle of byte 3 (last byte of header). The PTP time is shadowed to a
      buffer register in the switch, and retrieved atomically during the
      subsequent SPI frames.
      
      A similar thing goes on for the 'write PTP time' message, although in
      that case the switch waits until the 64-bit PTP time becomes fully
      available before taking any action. So the byte that needs to be
      software-timestamped is byte 11 (last) of the transfer.
      
      The patch creates a common (and local) sja1105_xfer implementation for
      the SPI I/O, and offers 3 front-ends:
      
      - sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally
        requesting a PTP timestamp
      
      - sja1105_xfer_buf: this is for large transfers (e.g. the static config
        buffer) and other misc data, and there is no point in giving
        timestamping capabilities to this.
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34d76e9f
  2. 10 Nov, 2019 7 commits
  3. 09 Nov, 2019 10 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 14684b93
      David S. Miller authored
      One conflict in the BPF samples Makefile, some fixes in 'net' whilst
      we were converting over to Makefile.target rules in 'net-next'.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14684b93
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 0058b0a5
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) BPF sample build fixes from Björn Töpel
      
       2) Fix powerpc bpf tail call implementation, from Eric Dumazet.
      
       3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet.
      
       4) Fix crash in ebtables when using dnat target, from Florian Westphal.
      
       5) Fix port disable handling whne removing bcm_sf2 driver, from Florian
          Fainelli.
      
       6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski.
      
       7) Various KCSAN fixes all over the networking, from Eric Dumazet.
      
       8) Memory leaks in mlx5 driver, from Alex Vesker.
      
       9) SMC interface refcounting fix, from Ursula Braun.
      
      10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu.
      
      11) Add a TX lock to synchonize the kTLS TX path properly with crypto
          operations. From Jakub Kicinski.
      
      12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano
          Garzarella.
      
      13) Infinite loop in Intel ice driver, from Colin Ian King.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
        ixgbe: need_wakeup flag might not be set for Tx
        i40e: need_wakeup flag might not be set for Tx
        igb/igc: use ktime accessors for skb->tstamp
        i40e: Fix for ethtool -m issue on X722 NIC
        iavf: initialize ITRN registers with correct values
        ice: fix potential infinite loop because loop counter being too small
        qede: fix NULL pointer deref in __qede_remove()
        net: fix data-race in neigh_event_send()
        vsock/virtio: fix sock refcnt holding during the shutdown
        net: ethernet: octeon_mgmt: Account for second possible VLAN header
        mac80211: fix station inactive_time shortly after boot
        net/fq_impl: Switch to kvmalloc() for memory allocation
        mac80211: fix ieee80211_txq_setup_flows() failure path
        ipv4: Fix table id reference in fib_sync_down_addr
        ipv6: fixes rt6_probe() and fib6_nh->last_probe init
        net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
        net: usb: qmi_wwan: add support for DW5821e with eSIM support
        CDC-NCM: handle incomplete transfer of MTU
        nfc: netlink: fix double device reference drop
        NFC: st21nfca: fix double free
        ...
      0058b0a5
    • Linus Torvalds's avatar
      Merge tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block · 5cb8418c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Two NVMe device removal crash fixes, and a compat fixup for for an
         ioctl that was introduced in this release (Anton, Charles, Max - via
         Keith)
      
       - Missing error path mutex unlock for drbd (Dan)
      
       - cgroup writeback fixup on dead memcg (Tejun)
      
       - blkcg online stats print fix (Tejun)
      
      * tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
        cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
        block: drbd: remove a stray unlock in __drbd_send_protocol()
        blkcg: make blkcg_print_stat() print stats only for online blkgs
        nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
        nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
        nvme-rdma: fix a segmentation fault during module unload
      5cb8418c
    • David S. Miller's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · a2582cdc
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Fixes 2019-11-08
      
      This series contains fixes to igb, igc, ixgbe, i40e, iavf and ice
      drivers.
      
      Colin Ian King fixes a potentially wrap-around counter in a for-loop.
      
      Nick fixes the default ITR values for the iavf driver to 50 usecs
      interval.
      
      Arkadiusz fixes 'ethtool -m' for X722 devices where the correct value
      cannot be obtained from the firmware, so add X722 to the check to ensure
      the wrong value is not returned.
      
      Jake fixes igb and igc drivers in their implementation of launch time
      support by declaring skb->tstamp value as ktime_t instead of s64.
      
      Magnus fixes ixgbe and i40e where the need_wakeup flag for transmit may
      not be set for AF_XDP sockets that are only used to send packets.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2582cdc
    • Magnus Karlsson's avatar
      ixgbe: need_wakeup flag might not be set for Tx · 0843aa8f
      Magnus Karlsson authored
      The need_wakeup flag for Tx might not be set for AF_XDP sockets that
      are only used to send packets. This happens if there is at least one
      outstanding packet that has not been completed by the hardware and we
      get that corresponding completion (which will not generate an
      interrupt since interrupts are disabled in the napi poll loop) between
      the time we stopped processing the Tx completions and interrupts are
      enabled again. In this case, the need_wakeup flag will have been
      cleared at the end of the Tx completion processing as we believe we
      will get an interrupt from the outstanding completion at a later point
      in time. But if this completion interrupt occurs before interrupts
      are enable, we lose it and should at that point really have set the
      need_wakeup flag since there are no more outstanding completions that
      can generate an interrupt to continue the processing. When this
      happens, user space will see a Tx queue need_wakeup of 0 and skip
      issuing a syscall, which means will never get into the Tx processing
      again and we have a deadlock.
      
      This patch introduces a quick fix for this issue by just setting the
      need_wakeup flag for Tx to 1 all the time. I am working on a proper
      fix for this that will toggle the flag appropriately, but it is more
      challenging than I anticipated and I am afraid that this patch will
      not be completed before the merge window closes, therefore this easier
      fix for now. This fix has a negative performance impact in the range
      of 0% to 4%. Towards the higher end of the scale if you have driver
      and application on the same core and issue a lot of packets, and
      towards no negative impact if you use two cores, lower transmission
      speeds and/or a workload that also receives packets.
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      0843aa8f
    • Magnus Karlsson's avatar
      i40e: need_wakeup flag might not be set for Tx · 70563957
      Magnus Karlsson authored
      The need_wakeup flag for Tx might not be set for AF_XDP sockets that
      are only used to send packets. This happens if there is at least one
      outstanding packet that has not been completed by the hardware and we
      get that corresponding completion (which will not generate an
      interrupt since interrupts are disabled in the napi poll loop) between
      the time we stopped processing the Tx completions and interrupts are
      enabled again. In this case, the need_wakeup flag will have been
      cleared at the end of the Tx completion processing as we believe we
      will get an interrupt from the outstanding completion at a later point
      in time. But if this completion interrupt occurs before interrupts
      are enable, we lose it and should at that point really have set the
      need_wakeup flag since there are no more outstanding completions that
      can generate an interrupt to continue the processing. When this
      happens, user space will see a Tx queue need_wakeup of 0 and skip
      issuing a syscall, which means will never get into the Tx processing
      again and we have a deadlock.
      
      This patch introduces a quick fix for this issue by just setting the
      need_wakeup flag for Tx to 1 all the time. I am working on a proper
      fix for this that will toggle the flag appropriately, but it is more
      challenging than I anticipated and I am afraid that this patch will
      not be completed before the merge window closes, therefore this easier
      fix for now. This fix has a negative performance impact in the range
      of 0% to 4%. Towards the higher end of the scale if you have driver
      and application on the same core and issue a lot of packets, and
      towards no negative impact if you use two cores, lower transmission
      speeds and/or a workload that also receives packets.
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      70563957
    • Jacob Keller's avatar
      igb/igc: use ktime accessors for skb->tstamp · 6acab13b
      Jacob Keller authored
      When implementing launch time support in the igb and igc drivers, the
      skb->tstamp value is assumed to be a s64, but it's declared as a ktime_t
      value.
      
      Although ktime_t is typedef'd to s64 it wasn't always, and the kernel
      provides accessors for ktime_t values.
      
      Use the ktime_to_timespec64 and ktime_set accessors instead of directly
      assuming that the variable is always an s64.
      
      This improves portability if the code is ever moved to another kernel
      version, or if the definition of ktime_t ever changes again in the
      future.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Acked-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      6acab13b
    • Arkadiusz Kubalewski's avatar
      i40e: Fix for ethtool -m issue on X722 NIC · 4c9da6f2
      Arkadiusz Kubalewski authored
      This patch contains fix for a problem with command:
      'ethtool -m <dev>'
      which breaks functionality of:
      'ethtool <dev>'
      when called on X722 NIC
      
      Disallowed update of link phy_types on X722 NIC
      Currently correct value cannot be obtained from FW
      Previously wrong value returned by FW was used and was
      a root cause for incorrect output of 'ethtool <dev>' command
      Signed-off-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      4c9da6f2
    • Nicholas Nunley's avatar
      iavf: initialize ITRN registers with correct values · 4eda4e00
      Nicholas Nunley authored
      Since commit 92418fb1 ("i40e/i40evf: Use usec value instead of reg
      value for ITR defines") the driver tracks the interrupt throttling
      intervals in single usec units, although the actual ITRN registers are
      programmed in 2 usec units. Most register programming flows in the driver
      correctly handle the conversion, although it is currently not applied when
      the registers are initialized to their default values. Most of the time
      this doesn't present a problem since the default values are usually
      immediately overwritten through the standard adaptive throttling mechanism,
      or updated manually by the user, but if adaptive throttling is disabled and
      the interval values are left alone then the incorrect value will persist.
      
      Since the intended default interval of 50 usecs (vs. 100 usecs as
      programmed) performs better for most traffic workloads, this can lead to
      performance regressions.
      
      This patch adds the correct conversion when writing the initial values to
      the ITRN registers.
      Signed-off-by: default avatarNicholas Nunley <nicholas.d.nunley@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      4eda4e00
    • Colin Ian King's avatar
      ice: fix potential infinite loop because loop counter being too small · 615457a2
      Colin Ian King authored
      Currently the for-loop counter i is a u8 however it is being checked
      against a maximum value hw->num_tx_sched_layers which is a u16. Hence
      there is a potential wrap-around of counter i back to zero if
      hw->num_tx_sched_layers is greater than 255.  Fix this by making i
      a u16.
      
      Addresses-Coverity: ("Infinite loop")
      Fixes: b36c598c ("ice: Updates to Tx scheduler code")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      615457a2
  4. 08 Nov, 2019 21 commits
    • David S. Miller's avatar
      Merge branch 'sctp-rfc7829' · 92da362c
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: update from rfc7829
      
      SCTP-PF was implemented based on a Internet-Draft in 2012:
      
        https://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
      
      It's been updated quite a few by rfc7829 in 2016.
      
      This patchset adds the following features:
      
        1. add SCTP_ADDR_POTENTIALLY_FAILED notification
        2. add pf_expose per netns/sock/asoc
        3. add SCTP_EXPOSE_POTENTIALLY_FAILED_STATE sockopt
        4. add ps_retrans per netns/sock/asoc/transport
           (Primary Path Switchover)
        5. add spt_pathcpthld for SCTP_PEER_ADDR_THLDS sockopt
      
      v1->v2:
        - See Patch 2/5 and Patch 5/5.
      v2->v3:
        - See Patch 1/5, 2/5 and 3/5.
      v3->v4:
        - See Patch 1/5, 2/5, 3/5 and 4/5.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      92da362c
    • Xin Long's avatar
      sctp: add SCTP_PEER_ADDR_THLDS_V2 sockopt · d467ac0a
      Xin Long authored
      Section 7.2 of rfc7829: "Peer Address Thresholds (SCTP_PEER_ADDR_THLDS)
      Socket Option" extends 'struct sctp_paddrthlds' with 'spt_pathcpthld'
      added to allow a user to change ps_retrans per sock/asoc/transport, as
      other 2 paddrthlds: pf_retrans, pathmaxrxt.
      
      Note: to not break the user's program, here to support pf_retrans dump
      and setting by adding a new sockopt SCTP_PEER_ADDR_THLDS_V2, and a new
      structure sctp_paddrthlds_v2 instead of extending sctp_paddrthlds.
      
      Also, when setting ps_retrans, the value is not allowed to be greater
      than pf_retrans.
      
      v1->v2:
        - use SCTP_PEER_ADDR_THLDS_V2 to set/get pf_retrans instead,
          as Marcelo and David Laight suggested.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d467ac0a
    • Xin Long's avatar
      sctp: add support for Primary Path Switchover · 34515e94
      Xin Long authored
      This is a new feature defined in section 5 of rfc7829: "Primary Path
      Switchover". By introducing a new tunable parameter:
      
        Primary.Switchover.Max.Retrans (PSMR)
      
      The primary path will be changed to another active path when the path
      error counter on the old primary path exceeds PSMR, so that "the SCTP
      sender is allowed to continue data transmission on a new working path
      even when the old primary destination address becomes active again".
      
      This patch is to add this tunable parameter, 'ps_retrans' per netns,
      sock, asoc and transport. It also allows a user to change ps_retrans
      per netns by sysctl, and ps_retrans per sock/asoc/transport will be
      initialized with it.
      
      The check will be done in sctp_do_8_2_transport_strike() when this
      feature is enabled.
      
      Note this feature is disabled by initializing 'ps_retrans' per netns
      as 0xffff by default, and its value can't be less than 'pf_retrans'
      when changing by sysctl.
      
      v3->v4:
        - add define SCTP_PS_RETRANS_MAX 0xffff, and use it on extra2 of
          sysctl 'ps_retrans'.
        - add a new entry for ps_retrans on ip-sysctl.txt.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34515e94
    • Xin Long's avatar
      sctp: add SCTP_EXPOSE_POTENTIALLY_FAILED_STATE sockopt · 8d2a6935
      Xin Long authored
      This is a sockopt defined in section 7.3 of rfc7829: "Exposing
      the Potentially Failed Path State", by which users can change
      pf_expose per sock and asoc.
      
      The new sockopt SCTP_EXPOSE_POTENTIALLY_FAILED_STATE is also
      known as SCTP_EXPOSE_PF_STATE for short.
      
      v2->v3:
        - return -EINVAL if params.assoc_value > SCTP_PF_EXPOSE_MAX.
        - define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.
      v3->v4:
        - improve changelog.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d2a6935
    • Xin Long's avatar
      sctp: add SCTP_ADDR_POTENTIALLY_FAILED notification · 768e1518
      Xin Long authored
      SCTP Quick failover draft section 5.1, point 5 has been removed
      from rfc7829. Instead, "the sender SHOULD (i) notify the Upper
      Layer Protocol (ULP) about this state transition", as said in
      section 3.2, point 8.
      
      So this patch is to add SCTP_ADDR_POTENTIALLY_FAILED, defined
      in section 7.1, "which is reported if the affected address
      becomes PF". Also remove transport cwnd's update when moving
      from PF back to ACTIVE , which is no longer in rfc7829 either.
      
      Note that ulp_notify will be set to false if asoc->expose is
      not 'enabled', according to last patch.
      
      v2->v3:
        - define SCTP_ADDR_PF SCTP_ADDR_POTENTIALLY_FAILED.
      v3->v4:
        - initialize spc_state with SCTP_ADDR_AVAILABLE, as Marcelo suggested.
        - check asoc->pf_expose in sctp_assoc_control_transport(), as Marcelo
          suggested.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      768e1518
    • Xin Long's avatar
      sctp: add pf_expose per netns and sock and asoc · aef587be
      Xin Long authored
      As said in rfc7829, section 3, point 12:
      
        The SCTP stack SHOULD expose the PF state of its destination
        addresses to the ULP as well as provide the means to notify the
        ULP of state transitions of its destination addresses from
        active to PF, and vice versa.  However, it is recommended that
        an SCTP stack implementing SCTP-PF also allows for the ULP to be
        kept ignorant of the PF state of its destinations and the
        associated state transitions, thus allowing for retention of the
        simpler state transition model of [RFC4960] in the ULP.
      
      Not only does it allow to expose the PF state to ULP, but also
      allow to ignore sctp-pf to ULP.
      
      So this patch is to add pf_expose per netns, sock and asoc. And in
      sctp_assoc_control_transport(), ulp_notify will be set to false if
      asoc->expose is not 'enabled' in next patch.
      
      It also allows a user to change pf_expose per netns by sysctl, and
      pf_expose per sock and asoc will be initialized with it.
      
      Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt,
      to not allow a user to query the state of a sctp-pf peer address
      when pf_expose is 'disabled', as said in section 7.3.
      
      v1->v2:
        - Fix a build warning noticed by Nathan Chancellor.
      v2->v3:
        - set pf_expose to UNUSED by default to keep compatible with old
          applications.
      v3->v4:
        - add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested.
        - change this patch to 1/5, and move sctp_assoc_control_transport
          change into 2/5, as Marcelo suggested.
        - use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and
          set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aef587be
    • Jiri Pirko's avatar
      devlink: disallow reload operation during device cleanup · a0c76345
      Jiri Pirko authored
      There is a race between driver code that does setup/cleanup of device
      and devlink reload operation that in some drivers works with the same
      code. Use after free could we easily obtained by running:
      
      while true; do
              echo 10 > /sys/bus/netdevsim/new_device
              devlink dev reload netdevsim/netdevsim10 &
              echo 10 > /sys/bus/netdevsim/del_device
      done
      
      Fix this by enabling reload only after setup of device is complete and
      disabling it at the beginning of the cleanup process.
      Reported-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Fixes: 2d8dc5bb ("devlink: Add support for reload")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0c76345
    • Jiri Pirko's avatar
      selftest: net: add alternative names test · f95e6c9c
      Jiri Pirko authored
      Add a simple test for recently added netdevice alternative names.
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f95e6c9c
    • Manish Chopra's avatar
      qede: fix NULL pointer deref in __qede_remove() · deabc871
      Manish Chopra authored
      While rebooting the system with SR-IOV vfs enabled leads
      to below crash due to recurrence of __qede_remove() on the VF
      devices (first from .shutdown() flow of the VF itself and
      another from PF's .shutdown() flow executing pci_disable_sriov())
      
      This patch adds a safeguard in __qede_remove() flow to fix this,
      so that driver doesn't attempt to remove "already removed" devices.
      
      [  194.360134] BUG: unable to handle kernel NULL pointer dereference at 00000000000008dc
      [  194.360227] IP: [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
      [  194.360304] PGD 0
      [  194.360325] Oops: 0000 [#1] SMP
      [  194.360360] Modules linked in: tcp_lp fuse tun bridge stp llc devlink bonding ip_set nfnetlink ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi dell_smbios iTCO_wdt iTCO_vendor_support dell_wmi_descriptor dcdbas vfat fat pcc_cpufreq skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd qedr ib_core pcspkr ses enclosure joydev ipmi_ssif sg i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_pad acpi_power_meter xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200
      [  194.361044]  qede i2c_algo_bit drm_kms_helper qed syscopyarea sysfillrect nvme sysimgblt fb_sys_fops ttm nvme_core mpt3sas crc8 ptp drm pps_core ahci raid_class scsi_transport_sas libahci libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
      [  194.361297] CPU: 51 PID: 7996 Comm: reboot Kdump: loaded Not tainted 3.10.0-1062.el7.x86_64 #1
      [  194.361359] Hardware name: Dell Inc. PowerEdge MX840c/0740HW, BIOS 2.4.6 10/15/2019
      [  194.361412] task: ffff9cea9b360000 ti: ffff9ceabebdc000 task.ti: ffff9ceabebdc000
      [  194.361463] RIP: 0010:[<ffffffffc03553c4>]  [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
      [  194.361534] RSP: 0018:ffff9ceabebdfac0  EFLAGS: 00010282
      [  194.361570] RAX: 0000000000000000 RBX: ffff9cd013846098 RCX: 0000000000000000
      [  194.361621] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9cd013846098
      [  194.361668] RBP: ffff9ceabebdfae8 R08: 0000000000000000 R09: 0000000000000000
      [  194.361715] R10: 00000000bfe14201 R11: ffff9ceabfe141e0 R12: 0000000000000000
      [  194.361762] R13: ffff9cd013846098 R14: 0000000000000000 R15: ffff9ceab5e48000
      [  194.361810] FS:  00007f799c02d880(0000) GS:ffff9ceacb0c0000(0000) knlGS:0000000000000000
      [  194.361865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  194.361903] CR2: 00000000000008dc CR3: 0000001bdac76000 CR4: 00000000007607e0
      [  194.361953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  194.362002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  194.362051] PKRU: 55555554
      [  194.362073] Call Trace:
      [  194.362109]  [<ffffffffc0355500>] qede_remove+0x10/0x20 [qede]
      [  194.362180]  [<ffffffffb97d0f3e>] pci_device_remove+0x3e/0xc0
      [  194.362240]  [<ffffffffb98b3c52>] __device_release_driver+0x82/0xf0
      [  194.362285]  [<ffffffffb98b3ce3>] device_release_driver+0x23/0x30
      [  194.362343]  [<ffffffffb97c86d4>] pci_stop_bus_device+0x84/0xa0
      [  194.362388]  [<ffffffffb97c87e2>] pci_stop_and_remove_bus_device+0x12/0x20
      [  194.362450]  [<ffffffffb97f153f>] pci_iov_remove_virtfn+0xaf/0x160
      [  194.362496]  [<ffffffffb97f1aec>] sriov_disable+0x3c/0xf0
      [  194.362534]  [<ffffffffb97f1bc3>] pci_disable_sriov+0x23/0x30
      [  194.362599]  [<ffffffffc02f83c3>] qed_sriov_disable+0x5e3/0x650 [qed]
      [  194.362658]  [<ffffffffb9622df6>] ? kfree+0x106/0x140
      [  194.362709]  [<ffffffffc02cc0c0>] ? qed_free_stream_mem+0x70/0x90 [qed]
      [  194.362754]  [<ffffffffb9622df6>] ? kfree+0x106/0x140
      [  194.362803]  [<ffffffffc02cd659>] qed_slowpath_stop+0x1a9/0x1d0 [qed]
      [  194.362854]  [<ffffffffc035544e>] __qede_remove+0xae/0x130 [qede]
      [  194.362904]  [<ffffffffc03554e0>] qede_shutdown+0x10/0x20 [qede]
      [  194.362956]  [<ffffffffb97cf90a>] pci_device_shutdown+0x3a/0x60
      [  194.363010]  [<ffffffffb98b180b>] device_shutdown+0xfb/0x1f0
      [  194.363066]  [<ffffffffb94b66c6>] kernel_restart_prepare+0x36/0x40
      [  194.363107]  [<ffffffffb94b66e2>] kernel_restart+0x12/0x60
      [  194.363146]  [<ffffffffb94b6959>] SYSC_reboot+0x229/0x260
      [  194.363196]  [<ffffffffb95f200d>] ? handle_mm_fault+0x39d/0x9b0
      [  194.363253]  [<ffffffffb942b621>] ? __switch_to+0x151/0x580
      [  194.363304]  [<ffffffffb9b7ec28>] ? __schedule+0x448/0x9c0
      [  194.363343]  [<ffffffffb94b69fe>] SyS_reboot+0xe/0x10
      [  194.363387]  [<ffffffffb9b8bede>] system_call_fastpath+0x25/0x2a
      [  194.363430] Code: f9 e9 37 ff ff ff 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 4c 8d af 98 00 00 00 41 54 4c 89 ef 41 89 f4 53 e8 4c e4 55 f9 <80> b8 dc 08 00 00 01 48 89 c3 4c 8d b8 c0 08 00 00 4c 8b b0 c0
      [  194.363712] RIP  [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
      [  194.363764]  RSP <ffff9ceabebdfac0>
      [  194.363791] CR2: 00000000000008dc
      Signed-off-by: default avatarManish Chopra <manishc@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarSudarsana Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      deabc871
    • Eric Dumazet's avatar
      packet: fix data-race in fanout_flow_is_huge() · b756ad92
      Eric Dumazet authored
      KCSAN reported the following data-race [1]
      
      Adding a couple of READ_ONCE()/WRITE_ONCE() should silence it.
      
      Since the report hinted about multiple cpus using the history
      concurrently, I added a test avoiding writing on it if the
      victim slot already contains the desired value.
      
      [1]
      
      BUG: KCSAN: data-race in fanout_demux_rollover / fanout_demux_rollover
      
      read to 0xffff8880b01786cc of 4 bytes by task 18921 on cpu 1:
       fanout_flow_is_huge net/packet/af_packet.c:1303 [inline]
       fanout_demux_rollover+0x33e/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880b01786cc of 4 bytes by task 18922 on cpu 0:
       fanout_flow_is_huge net/packet/af_packet.c:1306 [inline]
       fanout_demux_rollover+0x3a4/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 18922 Comm: syz-executor.3 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 3b3a5b0a ("packet: rollover huge flows before small flows")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b756ad92
    • David S. Miller's avatar
      Merge branch 'TIPC-Encryption' · 1c8f11d0
      David S. Miller authored
      Tuong Lien says:
      
      ====================
      TIPC Encryption
      
      This series provides TIPC encryption feature, kernel part. There will be
      another one in the 'iproute2/tipc' for user space to set key.
      
      v2: add select crypto 'aes(gcm)' for TIPC_CRYPTO in Kconfig
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1c8f11d0
    • Tuong Lien's avatar
      tipc: add support for AEAD key setting via netlink · e1f32190
      Tuong Lien authored
      This commit adds two netlink commands to TIPC in order for user to be
      able to set or remove AEAD keys:
      - TIPC_NL_KEY_SET
      - TIPC_NL_KEY_FLUSH
      
      When the 'KEY_SET' is given along with the key data, the key will be
      initiated and attached to TIPC crypto. On the other hand, the
      'KEY_FLUSH' command will remove all existing keys if any.
      Acked-by: default avatarYing Xue <ying.xue@windreiver.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1f32190
    • Tuong Lien's avatar
      tipc: introduce TIPC encryption & authentication · fc1b6d6d
      Tuong Lien authored
      This commit offers an option to encrypt and authenticate all messaging,
      including the neighbor discovery messages. The currently most advanced
      algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All
      encryption/decryption is done at the bearer layer, just before leaving
      or after entering TIPC.
      
      Supported features:
      - Encryption & authentication of all TIPC messages (header + data);
      - Two symmetric-key modes: Cluster and Per-node;
      - Automatic key switching;
      - Key-expired revoking (sequence number wrapped);
      - Lock-free encryption/decryption (RCU);
      - Asynchronous crypto, Intel AES-NI supported;
      - Multiple cipher transforms;
      - Logs & statistics;
      
      Two key modes:
      - Cluster key mode: One single key is used for both TX & RX in all
      nodes in the cluster.
      - Per-node key mode: Each nodes in the cluster has one specific TX key.
      For RX, a node requires its peers' TX key to be able to decrypt the
      messages from those peers.
      
      Key setting from user-space is performed via netlink by a user program
      (e.g. the iproute2 'tipc' tool).
      
      Internal key state machine:
      
                                       Attach    Align(RX)
                                           +-+   +-+
                                           | V   | V
              +---------+      Attach     +---------+
              |  IDLE   |---------------->| PENDING |(user = 0)
              +---------+                 +---------+
                 A   A                   Switch|  A
                 |   |                         |  |
                 |   | Free(switch/revoked)    |  |
           (Free)|   +----------------------+  |  |Timeout
                 |              (TX)        |  |  |(RX)
                 |                          |  |  |
                 |                          |  v  |
              +---------+      Switch     +---------+
              | PASSIVE |<----------------| ACTIVE  |
              +---------+       (RX)      +---------+
              (user = 1)                  (user >= 1)
      
      The number of TFMs is 10 by default and can be changed via the procfs
      'net/tipc/max_tfms'. At this moment, as for simplicity, this file is
      also used to print the crypto statistics at runtime:
      
      echo 0xfff1 > /proc/sys/net/tipc/max_tfms
      
      The patch defines a new TIPC version (v7) for the encryption message (-
      backward compatibility as well). The message is basically encapsulated
      as follows:
      
         +----------------------------------------------------------+
         | TIPCv7 encryption  | Original TIPCv2    | Authentication |
         | header             | packet (encrypted) | Tag            |
         +----------------------------------------------------------+
      
      The throughput is about ~40% for small messages (compared with non-
      encryption) and ~9% for large messages. With the support from hardware
      crypto i.e. the Intel AES-NI CPU instructions, the throughput increases
      upto ~85% for small messages and ~55% for large messages.
      
      By default, the new feature is inactive (i.e. no encryption) until user
      sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO"
      in the kernel configuration to enable/disable the new code when needed.
      
      MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc
      Acked-by: default avatarYing Xue <ying.xue@windreiver.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fc1b6d6d
    • Tuong Lien's avatar
      tipc: add new AEAD key structure for user API · 134bdac3
      Tuong Lien authored
      The new structure 'tipc_aead_key' is added to the 'tipc.h' for user to
      be able to transfer a key to TIPC in kernel. Netlink will be used for
      this purpose in the later commits.
      Acked-by: default avatarYing Xue <ying.xue@windreiver.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      134bdac3
    • Tuong Lien's avatar
      tipc: enable creating a "preliminary" node · 4cbf8ac2
      Tuong Lien authored
      When user sets RX key for a peer not existing on the own node, a new
      node entry is needed to which the RX key will be attached. However,
      since the peer node address (& capabilities) is unknown at that moment,
      only the node-ID is provided, this commit allows the creation of a node
      with only the data that we call as “preliminary”.
      
      A preliminary node is not the object of the “tipc_node_find()” but the
      “tipc_node_find_by_id()”. Once the first message i.e. LINK_CONFIG comes
      from that peer, and is successfully decrypted by the own node, the
      actual peer node data will be properly updated and the node will
      function as usual.
      
      In addition, the node timer always starts when a node object is created
      so if a preliminary node is not used, it will be cleaned up.
      
      The later encryption functions will also use the node timer and be able
      to create a preliminary node automatically when needed.
      Acked-by: default avatarYing Xue <ying.xue@windreiver.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cbf8ac2
    • Tuong Lien's avatar
      tipc: add reference counter to bearer · 2a7ee696
      Tuong Lien authored
      As a need to support the crypto asynchronous operations in the later
      commits, apart from the current RCU mechanism for bearer pointer, we
      add a 'refcnt' to the bearer object as well.
      
      So, a bearer can be hold via 'tipc_bearer_hold()' without being freed
      even though the bearer or interface can be disabled in the meanwhile.
      If that happens, the bearer will be released then when the crypto
      operation is completed and 'tipc_bearer_put()' is called.
      Acked-by: default avatarYing Xue <ying.xue@windreiver.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2a7ee696
    • Eric Dumazet's avatar
      net: fix data-race in neigh_event_send() · 1b53d644
      Eric Dumazet authored
      KCSAN reported the following data-race [1]
      
      The fix will also prevent the compiler from optimizing out
      the condition.
      
      [1]
      
      BUG: KCSAN: data-race in neigh_resolve_output / neigh_resolve_output
      
      write to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 1:
       neigh_event_send include/net/neighbour.h:443 [inline]
       neigh_resolve_output+0x78/0x480 net/core/neighbour.c:1474
       neigh_output include/net/neighbour.h:511 [inline]
       ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
       __ip_finish_output net/ipv4/ip_output.c:308 [inline]
       __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
       ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
       dst_output include/net/dst.h:436 [inline]
       ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
       __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
       ip_queue_xmit+0x45/0x60 include/net/ip.h:237
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
       tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
       __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
       tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
       tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
       tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
       tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:618
      
      read to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 0:
       neigh_event_send include/net/neighbour.h:442 [inline]
       neigh_resolve_output+0x57/0x480 net/core/neighbour.c:1474
       neigh_output include/net/neighbour.h:511 [inline]
       ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
       __ip_finish_output net/ipv4/ip_output.c:308 [inline]
       __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
       ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
       dst_output include/net/dst.h:436 [inline]
       ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
       __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
       ip_queue_xmit+0x45/0x60 include/net/ip.h:237
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
       tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
       __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
       tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
       tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
       tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b53d644
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · f1ff4e80
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2019-11-08
      
      Another series that contains updates to the ice driver only.
      
      Anirudh cleans up the code of kernel config of ifdef wrappers by moving
      code that is needed by DCB to disable and enable the PF VSI for
      configuration.  Implements ice_vsi_type_str() to convert an VSI type
      enum value to its string equivalent to help identify VSI types from
      module print statements.
      
      Usha and Tarun add support for setting the maximum per-queue bit rate
      for transmit queues.
      
      Dave implements dcb_nl set functions and supporting software DCB
      functions to support the callbacks defined in the dcbnl_rtnl_ops
      structure.
      
      Henry adds a check to ensure we are not resetting the device when trying
      to configure it, and to return -EBUSY during a reset.
      
      Usha fixes a call trace caused by the receive/transmit descriptor size
      change request via ethtool when DCB is configured by using the number of
      enabled queues and not the total number of allocated queues.
      
      Paul cleans up and refactors the software LLDP configuration to handle
      when firmware DCBX is disabled.
      
      Akeem adds checks to ensure the VF or PF is not disabled before honoring
      mailbox messages to configure the VF.
      
      Brett corrects the check to make sure the vector_id passed down from
      iavf is less than the max allowed interrupts per VF.  Updates a flag bit
      to align with the current specification.
      
      Bruce updates a switch statement to use the correct status of the
      Download Package AQ command.  Does some housekeeping by cleaning up a
      conditional check that is not needed.
      
      Mitch shortens up the delay for SQ responses to resolve issues with VF
      resets failing.
      
      Jake cleans up the code reducing namespace pollution and to simplify
      ice_debug_cq() since it always uses the same mask, not need to pass it
      in.  Improve debugging by adding the command opcode in the debug
      messages that print an error code.
      
      v2: fixed reverse christmas tree issue in patch 3 of the series.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f1ff4e80
    • Linus Torvalds's avatar
      Merge tag 'pwm/for-5.4-rc7' of... · abf6c397
      Linus Torvalds authored
      Merge tag 'pwm/for-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm fix from Thierry Reding:
       "One more fix to keep a reference to the driver's module as long as
        there are users of the PWM exposed by the driver"
      
      * tag 'pwm/for-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        pwm: bcm-iproc: Prevent unloading the driver module while in use
      abf6c397
    • Tejun Heo's avatar
      cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead · 65de03e2
      Tejun Heo authored
      cgroup writeback tries to refresh the associated wb immediately if the
      current wb is dead.  This is to avoid keeping issuing IOs on the stale
      wb after memcg - blkcg association has changed (ie. when blkcg got
      disabled / enabled higher up in the hierarchy).
      
      Unfortunately, the logic gets triggered spuriously on inodes which are
      associated with dead cgroups.  When the logic is triggered on dead
      cgroups, the attempt fails only after doing quite a bit of work
      allocating and initializing a new wb.
      
      While c3aab9a0 ("mm/filemap.c: don't initiate writeback if mapping
      has no dirty pages") alleviated the issue significantly as it now only
      triggers when the inode has dirty pages.  However, the condition can
      still be triggered before the inode is switched to a different cgroup
      and the logic simply doesn't make sense.
      
      Skip the immediate switching if the associated memcg is dying.
      
      This is a simplified version of the following two patches:
      
       * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/
       * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz
      
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: e8a7abf5 ("writeback: disassociate inodes from dying bdi_writebacks")
      Acked-by: default avatarDennis Zhou <dennis@kernel.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      65de03e2
    • Eric Dumazet's avatar
      net: icmp: fix data-race in cmp_global_allow() · bbab7ef2
      Eric Dumazet authored
      This code reads two global variables without protection
      of a lock. We need READ_ONCE()/WRITE_ONCE() pairs to
      avoid load/store-tearing and better document the intent.
      
      KCSAN reported :
      BUG: KCSAN: data-race in icmp_global_allow / icmp_global_allow
      
      read to 0xffffffff861a8014 of 4 bytes by task 11201 on cpu 0:
       icmp_global_allow+0x36/0x1b0 net/ipv4/icmp.c:254
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
      
      write to 0xffffffff861a8014 of 4 bytes by task 11183 on cpu 1:
       icmp_global_allow+0x174/0x1b0 net/ipv4/icmp.c:272
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 11183 Comm: syz-executor.2 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 4cdf507d ("icmp: add a global rate limitation")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbab7ef2