1. 04 Dec, 2016 14 commits
    • Merge branch 'fib-notifier-event-replay' · 69248719
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      ipv4: fib: Replay events when registering FIB notifier
      
      Ido says:
      
      In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced
      by a new FIB notification chain to which modules could register in order
      to be notified about the addition and deletion of FIB entries. The
      motivation for this change was that switchdev drivers need to be able to
      reflect the entire FIB table and not only FIBs configured on top of the
      port netdevs themselves. This is useful in case of in-band management.
      
      The fundamental problem with this approach is that upon registration
      listeners lose all the information previously sent in the chain and
      thus have an incomplete view of the FIB tables, which can result in
      packet loss. This patchset fixes that by dumping the FIB tables and
      replaying notifications previously sent in the chain for the registered
      notification block.
      
      The entire dump process is done under RCU and thus the FIB notification
      chain is converted to be atomic. The listeners are modified accordingly.
      This is done in the first eight patches.
      
      The ninth patch adds a change sequence counter to ensure the integrity
      of the FIB dump. The last patch adds the dump itself to the FIB chain
      registration function and modifies existing listeners to pass a callback
      to be executed in case the dump was inconsistent.
      
      ---
      v3->v4:
      - Register the notification block after the dump and protect it using
        the change sequence counter (Hannes Frederic Sowa).
      - Since we now integrate the dump into the registration function, drop
        the sysctl to set maximum number of retries and instead set it to a
        fixed number. Let's see if it's really a problem before adding something
        we can never remove.
      - For the same reason, dump FIB tables for all net namespaces.
      - Add a comment regarding guarantees provided by mutex semantics.
      
      v2->v3:
      - Add sysctl to set the number of FIB dump retries (Hannes Frederic Sowa).
      - Read the sequence counter under RTNL to ensure synchronization
        between the dump process and other processes changing the routing
        tables (Hannes Frederic Sowa).
      - Pass a callback to the dump function to be executed prior to a retry.
      - Limit the dump to a single net namespace.
      
      v1->v2:
      - Add a sequence counter to ensure the integrity of the FIB dump
        (David S. Miller, Hannes Frederic Sowa).
      - Protect notifications from re-ordering in listeners by using an
        ordered workqueue (Hannes Frederic Sowa).
      - Introduce fib_info_hold() (Jiri Pirko).
      - Relieve rocker from the need to invoke the FIB dump by registering
        to the FIB notification chain prior to ports creation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fib: Replay events when registering FIB notifier · c3852ef7
      Ido Schimmel authored
      Commit b90eb754 ("fib: introduce FIB notification infrastructure")
      introduced a new notification chain to notify listeners (e.g., switchdev
      drivers) about addition and deletion of routes.
      
      However, upon registration to the chain the FIB tables can already be
      populated, which means potential listeners will have an incomplete view
      of the tables.
      
      Solve that by dumping the FIB tables and replaying the events to the
      passed notification block. The dump itself is done using RCU in order
      not to starve consumers that need RTNL to make progress.
      
      The integrity of the dump is ensured by reading the FIB change sequence
      counter before and after the dump under RTNL. This allows us to avoid
      the problematic situation in which the dumping process sends an ENTRY_ADD
      notification following ENTRY_DEL generated by another process holding
      RTNL.
      
      Callers of the registration function may pass a callback that is
      executed in case the dump was inconsistent with current FIB tables.
      
      The number of retries until a consistent dump is achieved is set to a
      fixed number to prevent callers from looping for long periods of time.
      In case the current limit proves to be problematic in the future, it can be
      easily converted to be configurable using a sysctl.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
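
The dump-and-retry scheme described in this commit can be sketched in user-space C. This is only a model under stated assumptions, not the kernel code: `fib_model`, `listener`, `fib_model_register()`, `DUMP_MAX_RETRIES` and the `racing_writes` knob are all hypothetical names, and the RTNL and RCU sections are reduced to comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DUMP_MAX_RETRIES 5   /* hypothetical fixed retry limit */
#define MAX_ROUTES 8

/* Hypothetical stand-in for the FIB state of one net namespace. */
struct fib_model {
    unsigned int change_seq;     /* bumped on every FIB change */
    int routes[MAX_ROUTES];
    size_t nroutes;
    unsigned int racing_writes;  /* test knob: writers that race with dumps */
};

/* Hypothetical listener that mirrors the routes replayed to it. */
struct listener {
    int seen[MAX_ROUTES];
    size_t nseen;
    unsigned int flushes;        /* times the "inconsistent dump" callback ran */
};

static void listener_add(struct listener *l, int route)
{
    assert(l->nseen < MAX_ROUTES);
    l->seen[l->nseen++] = route;
}

/* Callback run before a retry: discard whatever the bad dump replayed. */
static void listener_flush(struct listener *l)
{
    l->nseen = 0;
    l->flushes++;
}

/* Returns true once a dump completes with an unchanged sequence counter. */
static bool fib_model_register(struct fib_model *fib, struct listener *l)
{
    for (int attempt = 0; attempt < DUMP_MAX_RETRIES; attempt++) {
        /* In the kernel the counter is read under RTNL. */
        unsigned int seq_before = fib->change_seq;

        /* In the kernel this walk happens under RCU, without RTNL. */
        for (size_t i = 0; i < fib->nroutes; i++)
            listener_add(l, fib->routes[i]);

        /* Simulate a writer deleting the last route during the dump. */
        if (fib->racing_writes > 0) {
            fib->racing_writes--;
            fib->change_seq++;
            if (fib->nroutes > 0)
                fib->nroutes--;
        }

        if (fib->change_seq == seq_before)
            return true;         /* nothing changed mid-dump */

        listener_flush(l);       /* inconsistent: clean up and retry */
    }
    return false;                /* give up after the fixed limit */
}
```

With one racing writer the first dump is discarded and the second succeeds; with more racing writers than retries the function reports failure instead of looping indefinitely.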
    • ipv4: fib: Allow for consistent FIB dumping · cacaad11
      Ido Schimmel authored
      The next patch will enable listeners of the FIB notification chain to
      request a dump of the FIB tables. However, since RTNL isn't taken during
      the dump, it's possible for the FIB tables to change mid-dump, which
      will result in inconsistency between the listener's table and the
      kernel's.
      
      Allow listeners to know about changes that occurred mid-dump, by adding
      a change sequence counter to each net namespace. The counter is
      incremented just before a notification is sent in the FIB chain.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
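
The writer side of this counter can be illustrated with a small user-space sketch. It assumes a hypothetical `netns_model` holding the counter and a hypothetical `chain_model` of callbacks; none of these names come from the kernel. The point it shows is the ordering: the counter is bumped before the callbacks run, so any listener (or dumper) that observes the event also observes the new sequence value.

```c
#include <assert.h>
#include <stddef.h>

enum model_event { MODEL_ENTRY_ADD, MODEL_ENTRY_DEL };

/* Hypothetical per-namespace state holding the change sequence counter. */
struct netns_model {
    unsigned int fib_seq;
};

typedef void (*notifier_fn)(enum model_event ev, int route, void *priv);

#define CHAIN_CAP 4

/* Hypothetical notification chain: a fixed array of callbacks. */
struct chain_model {
    notifier_fn fn[CHAIN_CAP];
    void *priv[CHAIN_CAP];
    size_t n;
};

static int chain_register(struct chain_model *c, notifier_fn fn, void *priv)
{
    if (c->n == CHAIN_CAP)
        return -1;
    c->fn[c->n] = fn;
    c->priv[c->n] = priv;
    c->n++;
    return 0;
}

/* Bump the counter first, then call every registered listener. */
static void model_notify(struct netns_model *ns, struct chain_model *c,
                         enum model_event ev, int route)
{
    ns->fib_seq++;
    for (size_t i = 0; i < c->n; i++)
        c->fn[i](ev, route, c->priv[i]);
}

/* Example listener: records how many events it saw and the counter
 * value visible at the time of the last one. */
struct event_log {
    const struct netns_model *ns;
    unsigned int events;
    unsigned int seq_at_last_event;
};

static void log_event(enum model_event ev, int route, void *priv)
{
    struct event_log *log = priv;

    (void)ev;
    (void)route;
    log->events++;
    log->seq_at_last_event = log->ns->fib_seq;
}
```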
    • ipv4: fib: Convert FIB notification chain to be atomic · d3f706f6
      Ido Schimmel authored
      In order not to hold RTNL for long periods of time we're going to dump
      the FIB tables using RCU.
      
      Convert the FIB notification chain to be atomic, as we can't block in
      RCU critical sections.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: Register FIB notifier before creating ports · 17f8be7d
      Ido Schimmel authored
      We can miss FIB notifications sent between the time the ports were
      created and the time the FIB notification block was registered.
      
      Instead of receiving these notifications only when they are replayed for
      the FIB notification block during registration, just register the
      notification block before the ports are created.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: Implement FIB offload in deferred work · db701955
      Ido Schimmel authored
      Convert rocker to offload FIBs in deferred work in a similar fashion to
      mlxsw, which was converted in the previous commits.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: Create an ordered workqueue for FIB offload · c1bb279c
      Ido Schimmel authored
      As explained in the previous commits, we need to process FIB entry
      addition / deletion events in FIFO order; otherwise we can have a
      mismatch between the kernel's FIB table and the device's.
      
      Create an ordered workqueue for rocker to which these work items will be
      submitted.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Implement FIB offload in deferred work · 3057224e
      Ido Schimmel authored
      FIB offload is currently done in process context with RTNL held, but
      we're about to dump the FIB tables in an RCU critical section, so we can
      no longer sleep.
      
      Instead, defer the operation to process context using deferred work. Make
      sure fib info isn't freed while the work is queued by taking a reference
      on it and releasing it after the operation is done.
      
      Deferring the operation is valid because the upper layers always assume
      the operation was successful. If it's not, then the driver-specific
      abort mechanism is called and all routed traffic is directed to slow
      path.
      
      The work items are submitted to an ordered workqueue to prevent a
      mismatch between the kernel's FIB table and the device's.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
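
The pattern in this commit (queue from atomic context, pin the fib info with a reference, apply in FIFO order later) can be modelled in user-space C. This is a sketch under assumptions, not the mlxsw code: `fib_info_model`, `ordered_queue`, `device_model` and every function name below are hypothetical, and the queue is a plain array drained by an explicit call rather than a kernel workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Hypothetical stand-in for a reference-counted fib info. */
struct fib_info_model {
    int refcnt;
    bool freed;
};

static void info_hold(struct fib_info_model *fi)
{
    fi->refcnt++;
}

static void info_put(struct fib_info_model *fi)
{
    assert(fi->refcnt > 0);
    if (--fi->refcnt == 0)
        fi->freed = true;   /* stands in for actually freeing the object */
}

enum work_kind { WORK_ADD, WORK_DEL };

struct fib_work {
    enum work_kind kind;
    int route;
    struct fib_info_model *fi;
};

/* Single FIFO of pending work: items run strictly in submission order. */
struct ordered_queue {
    struct fib_work items[QUEUE_CAP];
    size_t head, tail;
};

/* Hypothetical device table that must mirror the kernel's routes. */
struct device_model {
    int routes[QUEUE_CAP];
    size_t n;
};

/* Notifier side: must not sleep, so it only pins fi and queues the work. */
static bool queue_fib_work(struct ordered_queue *q, enum work_kind kind,
                           int route, struct fib_info_model *fi)
{
    if (q->tail == QUEUE_CAP)
        return false;
    info_hold(fi);          /* keep fi alive while the work is queued */
    q->items[q->tail++] = (struct fib_work){ kind, route, fi };
    return true;
}

static void device_apply(struct device_model *dev, const struct fib_work *w)
{
    if (w->kind == WORK_ADD) {
        assert(dev->n < QUEUE_CAP);
        dev->routes[dev->n++] = w->route;
        return;
    }
    for (size_t i = 0; i < dev->n; i++) {
        if (dev->routes[i] == w->route) {
            dev->routes[i] = dev->routes[--dev->n];
            return;
        }
    }
}

/* Worker side: process context, may sleep; drains the queue in FIFO order. */
static void run_queue(struct ordered_queue *q, struct device_model *dev)
{
    while (q->head < q->tail) {
        struct fib_work *w = &q->items[q->head++];

        device_apply(dev, w);
        info_put(w->fi);    /* release the reference taken at queue time */
    }
}
```

If an add followed by a delete of the same route were processed out of order, the device would keep a route the kernel no longer has, which is the mismatch the ordered workqueue is meant to prevent.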
    • mlxsw: core: Create an ordered workqueue for FIB offload · a3832b31
      Ido Schimmel authored
      We're going to start processing FIB entry addition / deletion events
      in deferred work. These work items must be processed in the order they
      were submitted; otherwise we can have differences between the kernel's
      FIB table and the device's.
      
      Solve this by creating an ordered workqueue to which these work items
      will be submitted. Note that we can't simply convert the current
      workqueue to be ordered, as EMAD retransmissions are also processed in
      deferred work.
      
      Later on, we can migrate other work items to this workqueue, such as FDB
      notification processing and nexthop resolution, since they all take the
      same lock anyway.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fib: Add fib_info_hold() helper · 1c677b3d
      Ido Schimmel authored
      As explained in the previous commit, modules are going to need to take a
      reference on fib info and then drop it using fib_info_put().
      
      Add the fib_info_hold() helper to make the code more readable and also
      symmetric with fib_info_put().
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
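
The symmetry between a hold helper and a put helper can be sketched with C11 atomics. This is a user-space model, not the kernel implementation: `info_model`, `model_info_hold()`, `model_info_put()` and `model_free_info()` are hypothetical names, and freeing is represented by setting a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object standing in for a fib info. */
struct info_model {
    atomic_int clntref;
    bool freed;
};

/* Stands in for the real free routine; here it only marks the object. */
static void model_free_info(struct info_model *fi)
{
    fi->freed = true;
}

/* Take a reference. */
static void model_info_hold(struct info_model *fi)
{
    atomic_fetch_add(&fi->clntref, 1);
}

/* Drop a reference; whoever drops the last one frees the object. */
static void model_info_put(struct info_model *fi)
{
    int prev = atomic_fetch_sub(&fi->clntref, 1);

    assert(prev > 0);
    if (prev == 1)
        model_free_info(fi);
}
```

A caller that queues work would pair one `model_info_hold()` at submission with one `model_info_put()` once the work has run.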
    • ipv4: fib: Export free_fib_info() · b423cb10
      Ido Schimmel authored
      The FIB notification chain is going to be converted to an atomic chain,
      which means switchdev drivers will have to offload FIB entries in
      deferred work, as hardware operations entail sleeping.
      
      However, while the work is queued fib info might be freed, so a
      reference must be taken. To release the reference (and potentially free
      the fib info) fib_info_put() will be called, which in turn calls
      free_fib_info().
      
      Export free_fib_info() so that modules will be able to invoke
      fib_info_put().
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • act_mirred: fix a typo in get_dev · 548ed722
      WANG Cong authored
      Fixes: 255cb304 ("net/sched: act_mirred: Add new tc_action_ops get_dev()")
      Cc: Hadar Hen Zion <hadarh@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · db7e9f7c
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-12-02
      
      This series contains updates to i40e and i40evf only.
      
      Alex provides changes so that we are much more robust about defining what
      we can and cannot offload in i40e and i40evf by doing additional checks
      other than L4 tunnel header length.
      
      Jake provides several fixes/changes: he removes an unnecessary label and
      cleans up the use of a "magic number".  He clarifies the code by separating
      the global private flags and the regular private flags per interface into
      two arrays, so that future additions will not produce duplication and buggy
      code.  He also adds checks to protect against NULL values for the
      msix_entries and q_vectors pointers.
      
      Michal adds Clause22 method for accessing registers for some external
      PHYs.
      
      Piotr adds additional protocol support for the admin queue discover
      capabilities function.
      
      Tushar Dave fixes a panic seen on SPARC: writel() must only be used to
      write to a memory-mapped I/O address, not directly to a memory address,
      otherwise it causes data access exceptions.
      
      Joe Perches separates out a section of code into its own function, to
      help reduce i40evf_reset_task() a bit.
      
      Alan fixes an issue by checking for NULL before dereferencing msix_entries
      and returning early in the case where it is NULL within the i40evf_close()
      code path.
      
      Henry provides code cleanup to remove unreachable and redundant sections
      of code.  He also fixes an issue where new NICs were not identifying
      "unknown PHYs" correctly.
      
      Harshitha fixes an issue where the ethtool "Supported Link" modes listed
      backplane interfaces on X722 devices for 10 GbE with SFP+ and Cortina
      retimer; these interfaces should not be visible to the user since the
      user cannot use them.
      
      Carolyn changes an X722 informational message so that it only appears
      when extra messages are desired.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix the missing avr32 SOF_TIMESTAMPING_OPT_STATS · 2bb14878
      Yuchung Cheng authored
      The commit that added SOF_TIMESTAMPING_OPT_STATS didn't include the
      new header for avr32, causing the build to break. This patch fixes it.
      
      Fixes: 1c885808 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
      Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 03 Dec, 2016 26 commits