1. 17 Aug, 2019 26 commits
    • Merge branch 'drop_monitor-for-offloaded-paths' · 83beee5a
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      Add drop monitor for offloaded data paths
      
      Users have several ways to debug the kernel and understand why a packet
      was dropped, for example by using drop monitor and perf. Both utilities
      trace kfree_skb(), which is the function called when a packet is freed
      as part of a failure. The information provided by these tools is
      invaluable when trying to understand the cause of a packet loss.
      
      In recent years, large portions of the kernel data path were offloaded
      to capable devices. Today, it is possible to perform L2 and L3
      forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
      Different TC classifiers and actions are also offloaded to capable
      devices, at both ingress and egress.
      
      However, when the data path is offloaded it is not possible to achieve
      the same level of introspection since packets are dropped by the
      underlying device and never reach the kernel.
      
      This patchset aims to solve this by allowing users to monitor packets
      that the underlying device decided to drop along with relevant metadata
      such as the drop reason and ingress port.
      
      The above is achieved by exposing a fundamental capability of devices
      capable of data path offloading - packet trapping. In much the same way
      as drop monitor registers its probe function with the kfree_skb()
      tracepoint, the device is instructed to pass to the CPU (trap) packets
      that it decided to drop in various places in the pipeline.
      
      The configuration of the device to pass such packets to the CPU is
      performed using devlink, as it is not specific to a port, but rather to
      a device. In the future, we plan to control the policing of such packets
      using devlink, in order not to overwhelm the CPU.
      
      While devlink is used as the control path, the dropped packets are
      passed along with metadata to drop monitor, which reports them to
      userspace as netlink events. This allows users to use the same interface
      for the monitoring of both software and hardware drops.
      
      Logically, the solution looks as follows:
      
                                          Netlink event: Packet w/ metadata
                                                         Or a summary of recent drops
                                        ^
                                        |
               Userspace                |
              +---------------------------------------------------+
               Kernel                   |
                                        |
                                +-------+--------+
                                |                |
                                |  drop_monitor  |
                                |                |
                                +-------^--------+
                                        |
                                        |
                                        |
                                   +----+----+
                                   |         |      Kernel's Rx path
                                   | devlink |      (non-drop traps)
                                   |         |
                                   +----^----+      ^
                                        |           |
                                        +-----------+
                                        |
                                +-------+-------+
                                |               |
                                | Device driver |
                                |               |
                                +-------^-------+
               Kernel                   |
              +---------------------------------------------------+
               Hardware                 |
                                        | Trapped packet
                                        |
                                     +--+---+
                                     |      |
                                     | ASIC |
                                     |      |
                                     +------+
      
      In order to reduce the patch count, this patchset only includes
      integration with netdevsim. A follow-up patchset will add devlink-trap
      support in mlxsw.
      
      Patches #1-#7 extend drop monitor to also monitor hardware-originated
      drops.
      
      Patches #8-#10 add the devlink-trap infrastructure.
      
      Patches #11-#12 add devlink-trap support in netdevsim.
      
      Patches #13-#16 add tests for the generic infrastructure over netdevsim.
      
      Example
      =======
      
      Instantiate netdevsim
      ---------------------
      
      List supported traps
      --------------------
      
      netdevsim/netdevsim10:
        name source_mac_is_multicast type drop generic true action drop group l2_drops
        name vlan_tag_mismatch type drop generic true action drop group l2_drops
        name ingress_vlan_filter type drop generic true action drop group l2_drops
        name ingress_spanning_tree_filter type drop generic true action drop group l2_drops
        name port_list_is_empty type drop generic true action drop group l2_drops
        name port_loopback_filter type drop generic true action drop group l2_drops
        name fid_miss type exception generic false action trap group l2_drops
        name blackhole_route type drop generic true action drop group l3_drops
        name ttl_value_is_too_small type exception generic true action trap group l3_drops
        name tail_drop type drop generic true action drop group buffer_drops
      
      Enable a trap
      -------------
      
      Query statistics
      ----------------
      
      netdevsim/netdevsim10:
        name blackhole_route type drop generic true action trap group l3_drops
          stats:
              rx:
                bytes 7384 packets 52
      
      Monitor dropped packets
      -----------------------
      
      dropwatch> set alertmode packet
      Setting alert mode
      Alert mode successfully set
      dropwatch> set sw true
      setting software drops monitoring to 1
      dropwatch> set hw true
      setting hardware drops monitoring to 1
      dropwatch> start
      Enabling monitoring...
      Kernel monitoring activated.
      Issue Ctrl-C to stop monitoring
      drop at: ttl_value_is_too_small (l3_drops)
      origin: hardware
      input port ifindex: 55
      input port name: eth0
      timestamp: Mon Aug 12 10:52:20 2019 445911505 nsec
      protocol: 0x800
      length: 142
      original length: 142
      
      drop at: ip6_mc_input+0x8b8/0xef8 (0xffffffff9e2bb0e8)
      origin: software
      input port ifindex: 4
      timestamp: Mon Aug 12 10:53:37 2019 024444587 nsec
      protocol: 0x86dd
      length: 110
      original length: 110
      
      Future plans
      ============
      
      * Provide more drop reasons as well as more metadata
      * Add dropmon support to libpcap, so that tcpdump/tshark could
        specifically listen on dropmon traffic, instead of capturing all
        netlink packets via nlmon interface
      
      Changes in v3:
      * Place test with the rest of the netdevsim tests
      * Fix test to load netdevsim module
      * Move devlink helpers from the test to devlink_lib.sh. Will be used
        by mlxsw tests
      * Re-order netdevsim includes in alphabetical order
      * Fix reverse xmas tree in netdevsim
      * Remove double include in netdevsim
      
      Changes in v2:
      * Use drop monitor to report dropped packets instead of devlink
      * Add drop monitor patches
      * Add test cases
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: devlink_trap: Add test cases for devlink-trap · b3cb7df9
      Ido Schimmel authored
      Add test cases for devlink-trap on top of the netdevsim implementation.
      
      The tests focus on the devlink-trap core infrastructure and user space
      API. They test both good and bad flows, as well as the dismantling of
      the netdev and devlink device used to report trapped packets.
      
      This allows device drivers to focus their tests on device-specific
      functionality.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: devlink_lib: Add devlink-trap helpers · a054c8d9
      Ido Schimmel authored
      Add helpers to interact with devlink-trap, such as setting the action of
      a trap and retrieving statistics.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: devlink_lib: Allow tests to define devlink device · bc030d9c
      Ido Schimmel authored
      For tests that create their network interfaces dynamically or do not use
      interfaces at all (as with netdevsim) it is useful to define their own
      devlink device instead of deriving it from the first network interface.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Add devlink-trap support · da58f90f
      Ido Schimmel authored
      Have netdevsim register its trap groups and traps with devlink during
      initialization and periodically report trapped packets to devlink core.
      
      Since netdevsim is not a real device, the trapped packets are emulated
      using a workqueue that periodically reports a UDP packet with a random
      5-tuple from each active packet trap and from each running netdev.
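
The periodic reporting described above can be modelled in user space roughly as follows. This is only an illustrative sketch, not the netdevsim code: the function names and the dictionaries are hypothetical, and "active" is assumed to mean a trap whose action is 'trap'.

```python
import random


def random_five_tuple(rng):
    """Build a random UDP 5-tuple (protocol number 17 is UDP)."""
    def addr():
        return ".".join(str(rng.randrange(256)) for _ in range(4))
    return (addr(), addr(), rng.randrange(1, 65536), rng.randrange(1, 65536), 17)


def report_round(netdevs, traps, rng):
    """One tick of the periodic work: one packet per running netdev and active trap."""
    reports = []
    for netdev, running in netdevs.items():
        if not running:
            continue  # netdevs that are down report nothing
        for trap, action in traps.items():
            if action != "trap":
                continue  # assumption: only traps set to 'trap' are active
            reports.append((netdev, trap, random_five_tuple(rng)))
    return reports


rng = random.Random(0)
netdevs = {"eth0": True, "eth1": False}
traps = {"blackhole_route": "trap", "tail_drop": "drop", "fid_miss": "trap"}
reports = report_round(netdevs, traps, rng)
for netdev, trap, five_tuple in reports:
    print(netdev, trap, five_tuple)
```

With one running netdev and two traps set to 'trap', each round yields two emulated packets.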
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: Add devlink-trap documentation · f3047ca0
      Ido Schimmel authored
      Add initial documentation of the devlink-trap mechanism, explaining the
      background, motivation and the semantics of the interface.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add generic packet traps and groups · 391203ab
      Ido Schimmel authored
      Add generic packet traps and groups that can report dropped packets as
      well as exceptions such as TTL error.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add packet trap infrastructure · 0f420b6c
      Ido Schimmel authored
      Add the basic packet trap infrastructure that allows device drivers to
      register their supported packet traps and trap groups with devlink.
      
      Each driver is expected to provide basic information about each
      supported trap, such as name and ID, but also the supported metadata
      types that will accompany each packet trapped via the trap. The
      currently supported metadata type is just the input port, but more will
      be added in the future, for example output port and traffic class.
      
      Trap groups allow users to set the action of all member traps. In
      addition, users can retrieve per-group statistics in case per-trap
      statistics are too narrow. In the future, the trap group object can be
      extended with more attributes, such as policer settings which will limit
      the amount of traffic generated by member traps towards the CPU.
      
      Besides registering their packet traps with devlink, drivers are also
      expected to report trapped packets to devlink along with relevant
      metadata. devlink will maintain packets and bytes statistics for each
      packet trap and will potentially report the trapped packet with its
      metadata to user space via the drop monitor netlink channel.
      
      The interface towards the drivers is simple and allows devlink to set
      the action of the trap. Currently, only two actions are supported:
      'trap' and 'drop'. When set to 'trap', the device is expected to provide
      the sole copy of the packet to the driver which will pass it to devlink.
      When set to 'drop', the device is expected to drop the packet and not
      send a copy to the driver. In the future, more actions can be added,
      such as 'mirror'.
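
The semantics of the two actions, the per-trap counters and the group-wide action can be modelled in a few lines. This is a user-space sketch under stated assumptions, not the devlink implementation: all class and function names are hypothetical, the default action is assumed to be 'drop', group statistics are simply summed from the member traps, and the distinction between drop and exception traps is ignored. The packet length of 142 bytes is chosen only so that the totals match the statistics in the cover letter example.

```python
from dataclasses import dataclass, field


@dataclass
class Trap:
    name: str
    action: str = "drop"   # 'drop' or 'trap'
    rx_packets: int = 0
    rx_bytes: int = 0


@dataclass
class TrapGroup:
    name: str
    traps: list = field(default_factory=list)

    def set_action(self, action):
        """Apply one action to every member trap."""
        if action not in ("drop", "trap"):
            raise ValueError(f"unsupported action: {action}")
        for trap in self.traps:
            trap.action = action

    def stats(self):
        """Per-group totals, here simply summed over the member traps."""
        return (sum(t.rx_packets for t in self.traps),
                sum(t.rx_bytes for t in self.traps))


def device_hits_trap(trap, packet_len):
    """Return True if the packet reaches the CPU and is accounted for."""
    if trap.action == "drop":
        return False          # dropped in hardware, no copy for the driver
    trap.rx_packets += 1      # 'trap': the sole copy goes up and is counted
    trap.rx_bytes += packet_len
    return True


blackhole = Trap("blackhole_route")
group = TrapGroup("l3_drops", [blackhole])
dropped_first = device_hits_trap(blackhole, 142)   # action is still 'drop'
group.set_action("trap")
for _ in range(52):
    device_hits_trap(blackhole, 142)
print(group.stats())
```

While the action is 'drop' nothing is counted, because the modelled device never hands the packet to the driver.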
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Allow user to start monitoring hardware drops · 8e94c3bc
      Ido Schimmel authored
      Drop monitor has start and stop commands, but so far these were only
      used to start and stop monitoring of software drops.
      
      Now that drop monitor can also monitor hardware drops, we should allow
      the user to control these as well.
      
      Do that by adding SW and HW flags to these commands. If no flag is
      specified, then only start / stop monitoring software drops. This is
      done in order to maintain backward-compatibility with existing user
      space applications.
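
The flag handling described above amounts to a small defaulting rule, sketched here in user space. The function name and the boolean arguments are hypothetical stand-ins for the SW and HW flags carried by the start and stop commands.

```python
def resolve_targets(sw_flag=False, hw_flag=False):
    """Return (act_on_sw, act_on_hw) for a start or stop request."""
    if not sw_flag and not hw_flag:
        # No flag given: behave as before and act on software drops only.
        return (True, False)
    return (sw_flag, hw_flag)


print(resolve_targets())                            # legacy request
print(resolve_targets(hw_flag=True))                # hardware drops only
print(resolve_targets(sw_flag=True, hw_flag=True))  # both
```

An application that predates the flags sends neither, so it keeps controlling software drops exactly as it did before.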
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Add support for summary alert mode for hardware drops · d40e1deb
      Ido Schimmel authored
      In summary alert mode a notification is sent with a list of recent drop
      reasons and a count of how many packets were dropped for each reason.
      
      To avoid expensive operations in the context in which packets are
      dropped, each CPU holds an array whose number of entries is the maximum
      number of drop reasons that can be encoded in the netlink notification.
      Each entry stores the drop reason and a count. When a packet is dropped
      the array is traversed and a new entry is created or the count of an
      existing entry is incremented.
      
      Later, in process context, the array is replaced with a newly allocated
      copy and the old array is encoded in a netlink notification. To avoid
      breaking user space, the notification includes the ancillary header,
      which is 'struct net_dm_alert_msg' with number of entries set to '0'.
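
The bookkeeping described above (traverse a fixed-size table, bump an existing entry or create a new one, then swap the table out for encoding) can be sketched as a user-space model. Everything here is an assumption made for illustration: the names, the table size of 4, and the choice to leave a drop unsummarized when the table is full. Per-CPU storage and the netlink encoding are not modelled.

```python
from dataclasses import dataclass, field

MAX_ENTRIES = 4  # assumption: stand-in for the per-notification limit


@dataclass
class SummaryTable:
    """Model of one CPU's fixed-size table of [drop reason, count] entries."""
    entries: list = field(default_factory=list)

    def record_drop(self, reason):
        # Traverse the table: bump the count of a matching entry if one exists.
        for entry in self.entries:
            if entry[0] == reason:
                entry[1] += 1
                return True
        # Otherwise create a new entry, provided there is room left.
        if len(self.entries) < MAX_ENTRIES:
            self.entries.append([reason, 1])
            return True
        return False  # assumption: a full table leaves this drop unsummarized


def flush(table):
    """Swap in a fresh table and return the old entries for encoding."""
    old, table.entries = table.entries, []
    return [tuple(entry) for entry in old]


table = SummaryTable()
for reason in ["blackhole_route", "tail_drop", "blackhole_route"]:
    table.record_drop(reason)
summary = flush(table)
print(summary)
```

Swapping the table instead of encoding it in place keeps the work done at drop time down to a short array walk.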
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Add support for packet alert mode for hardware drops · 5e58109b
      Ido Schimmel authored
      In a similar fashion to software drops, extend drop monitor to send
      netlink events when packets are dropped by the underlying hardware.
      
      The main difference is that instead of encoding the program counter (PC)
      from which kfree_skb() was called in the netlink message, we encode the
      hardware trap name. The two are mostly equivalent since they should both
      help the user understand why the packet was dropped.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Consider all monitoring states before performing configuration · 80cebed8
      Ido Schimmel authored
      The drop monitor configuration (e.g., alert mode) is global, but the user
      will be able to enable monitoring of only software or hardware drops.
      
      Therefore, ensure that monitoring of both software and hardware drops is
      disabled before allowing drop monitor configuration to take place.
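
A minimal user-space model of this rule follows. The class, its attribute names, the initial alert mode and the exception type are hypothetical; the point is only that the shared configuration may change when neither kind of monitoring is running.

```python
class DropMonitorModel:
    """Toy model: one global configuration, two independent monitoring states."""

    def __init__(self):
        self.sw_active = False
        self.hw_active = False
        self.alert_mode = "summary"  # assumption: initial mode for this model

    def set_alert_mode(self, mode):
        # Configuration is global, so refuse it while either state is active.
        if self.sw_active or self.hw_active:
            raise RuntimeError("stop all monitoring before reconfiguring")
        self.alert_mode = mode


dm = DropMonitorModel()
dm.set_alert_mode("packet")      # allowed: nothing is being monitored
dm.hw_active = True
try:
    dm.set_alert_mode("summary")  # refused: hardware monitoring is active
    refused = False
except RuntimeError:
    refused = True
print(dm.alert_mode, refused)
```

Checking only the software state would let a configuration change slip through while hardware drops are still being reported.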
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Add basic infrastructure for hardware drops · edd3d007
      Ido Schimmel authored
      Export a function that can be invoked in order to report packets that
      were dropped by the underlying hardware along with metadata.
      
      Subsequent patches will add support for the different alert modes.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Initialize hardware per-CPU data · cac1174f
      Ido Schimmel authored
      Like software drops, hardware drops also need the same type of per-CPU
      data. Therefore, initialize it during module initialization and
      de-initialize it during module exit.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Move per-CPU data init/fini to separate functions · 9b63f57d
      Ido Schimmel authored
      Currently drop monitor only reports software drops to user space, but
      subsequent patches are going to add support for hardware drops.
      
      Like software drops, the per-CPU data of hardware drops needs to be
      initialized and de-initialized upon module initialization and exit. To
      avoid code duplication, break this code into separate functions, so that
      these could be re-used for hardware drops.
      
      No functional changes intended.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-mdb' · f7750830
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: mdb: allow dump/add/del of host-joined entries
      
      This set makes the bridge dump host-joined mdb entries. They should be
      treated as normal entries since they take a slot and age out. We
      already have notifications for them, but we couldn't dump them until
      now, so they remained hidden. We dump them similarly to how they're
      notified. In order to keep user-space compatibility with the dumped
      objects (e.g. iproute2 dumps mdbs in a format which can be fed into
      add/del commands), we also allow host-joined groups to be added/deleted
      via mdb commands. That can later be used for L2 mcast MAC manipulation,
      as was recently discussed. Note that iproute2 changes are not necessary;
      this set will work with the current user-space mdb code.
      
      Patch 01 - a trivial comment move
      Patch 02 - factors out the mdb filling code so it can be
                 re-used for the host-joined entries
      Patch 03 - dumps host-joined entries
      Patch 04 - allows manipulation of host-joined entries via standard mdb
                 calls
      
      v3: fix compiler warning in patch 04 (DaveM)
      v2: change patch 04 to avoid double notification and improve host group
          manual removal if no ports are present in the group
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: mdb: allow add/delete for host-joined groups · 1bc844ee
      Nikolay Aleksandrov authored
      Currently this is needed only for user-space compatibility, so that
      adds/deletes of objects similar to the dumped ones succeed. Later it can
      be used for L2 mcast MAC add/delete.
      
      v3: fix compiler warning (DaveM)
      v2: don't send a notification when used from user-space, arm the group
          timer if no ports are left after host entry del
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: mdb: dump host-joined entries as well · e77b0c84
      Nikolay Aleksandrov authored
      Currently we dump only the port mdb entries, but we can have host-joined
      entries on the bridge itself and they should be treated as normal temp
      mdbs. They're already notified:
      $ bridge monitor all
      [MDB]dev br0 port br0 grp ff02::8 temp
      
      The group will not be shown in the bridge mdb output, but it takes 1 slot
      and it's timing out. If it's only host-joined then the mdb show output
      can even be empty.
      
      After this patch we show the host-joined groups:
      $ bridge mdb show
      dev br0 port br0 grp ff02::8 temp
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: mdb: factor out mdb filling · 6545916e
      Nikolay Aleksandrov authored
      We have to factor out the mdb fill portion in order to re-use it later for
      the bridge mdb entries. No functional changes intended.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: mdb: move vlan comments · f59783f5
      Nikolay Aleksandrov authored
      Trivial patch to move the vlan comments to their proper places above the
      vid 0 checks.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-phy-remove-genphy_config_init' · 59d0f749
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: remove genphy_config_init
      
      Supported PHY features are either auto-detected or explicitly set.
      In both cases calling genphy_config_init isn't needed. All that
      genphy_config_init does is remove features that are set as
      supported but can't be auto-detected. Basically it duplicates the
      code in genphy_read_abilities. Therefore remove genphy_config_init.
      
      v2:
      - remove call also from new adin driver
      v3:
      - pass NULL as config_init function pointer for dp83848
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove genphy_config_init · 4b9cb2a5
      Heiner Kallweit authored
      Now that all users have been removed we can remove genphy_config_init.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: remove calls to genphy_config_init · 00843d99
      Heiner Kallweit authored
      Supported PHY features are either auto-detected or explicitly set.
      In both cases calling genphy_config_init isn't needed.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove calls to genphy_config_init · c227ce44
      Heiner Kallweit authored
      Supported PHY features are either auto-detected or explicitly set.
      In both cases calling genphy_config_init isn't needed. All that
      genphy_config_init does is remove features that are set as
      supported but can't be auto-detected. Basically it duplicates the
      code in genphy_read_abilities. Therefore remove such calls from
      all PHY drivers.
      
      v2:
      - remove call also from new adin PHY driver
      v3:
      - pass NULL as config_init function pointer for dp83848
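
Why the call is redundant can be shown with plain sets. This sketch is based only on the description above, not on the PHY library code: the function name and the link mode strings are illustrative, and the masking step is modelled as a set intersection.

```python
def restrict_supported(supported, detected):
    """Keep only the features that are both marked supported and detected."""
    return supported & detected


detected = {"10baseT/Full", "100baseT/Full"}

# Case 1: features were auto-detected, so 'supported' already equals 'detected'.
auto = restrict_supported(set(detected), detected)

# Case 2: features were set explicitly by the driver and may exceed what
# auto-detection finds; masking would only take entries away.
explicit = {"10baseT/Full", "100baseT/Full", "1000baseT/Full"}
masked = restrict_supported(explicit, detected)

print(sorted(auto))
print(sorted(masked))
```

In the first case the masking changes nothing, and in the second it can only remove features the driver listed explicitly.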
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Aug, 2019 14 commits