1. 21 Jul, 2015 40 commits
    • Marcelo Ricardo Leitner's avatar
      sctp: fix cut and paste issue in comment · b52effd2
      Marcelo Ricardo Leitner authored
      Cookie ACK is always received by the association initiator, so fix the
      comment to avoid confusion.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b52effd2
    • David S. Miller's avatar
      Merge branch 'sctp-src-addr' · 57816cbc
      David S. Miller authored
      Marcelo Ricardo Leitner says:
      
      ====================
      sctp: fix src address selection if using secondary address
      
      This series improves the way SCTP chooses its src address so that the
      chosen one will always belong to the interface being used for output.
      
      v1->v2:
       - split out the refactoring from the fix itself
       - Doing a full reverse routing as in v1 is not necessary. Only looking
         for the interface that has the address and comparing its number is
         enough.
      ====================
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57816cbc
    • Marcelo Ricardo Leitner's avatar
      sctp: fix src address selection if using secondary addresses · 0ca50d12
      Marcelo Ricardo Leitner authored
      In short, sctp is likely to incorrectly choose src address if socket is
      bound to secondary addresses. This patch fixes it by adding a new check
      that checks if such src address belongs to the interface that routing
      identified as output.
      
      This is enough to avoid rp_filter drops on remote peer.
      
      Details:
      
      Currently, sctp will do a routing attempt without specifying the src
      address and compare the returned value (preferred source) with the
      addresses that the socket is bound to. When using secondary addresses,
      this will not match.
      
      Then it will try specifying each of the addresses that the socket is
      bound to and re-routing, checking if that address is valid as src for
      that dst. Thing is, this check alone is weak:
      
      # ip r l
      192.168.100.0/24 dev eth1  proto kernel  scope link  src 192.168.100.149
      192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.147
      
      # ip a l
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff
          inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0
             valid_lft 2160sec preferred_lft 2160sec
          inet 192.168.122.148/24 scope global secondary eth0
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:fe15:186a/64 scope link
             valid_lft forever preferred_lft forever
      3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff
          inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1
             valid_lft 2162sec preferred_lft 2162sec
          inet 192.168.100.148/24 scope global secondary eth1
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:feb3:9146/64 scope link
             valid_lft forever preferred_lft forever
      4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff
          inet6 fe80::5054:ff:fe05:47ee/64 scope link
             valid_lft forever preferred_lft forever
      
      # ip r g 192.168.100.193 from 192.168.122.148
      192.168.100.193 from 192.168.122.148 dev eth1
          cache
      
      Even if you specify an interface:
      
      # ip r g 192.168.100.193 from 192.168.122.148 oif eth1
      192.168.100.193 from 192.168.122.148 dev eth1
          cache
      
      Although this would be valid, peers using rp_filter will drop such
      packets as their src doesn't match the routes for that interface.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ca50d12
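The extra check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the kernel code: the `INTERFACE_ADDRS` table is a hand-copied snapshot of the `ip a l` output in the commit message, and `ifindex_of()` and `pick_source()` are hypothetical helper names.

```python
# Hypothetical snapshot of the example host: interface index -> IPv4 addresses.
INTERFACE_ADDRS = {
    2: ["192.168.122.147", "192.168.122.148"],  # eth0
    3: ["192.168.100.149", "192.168.100.148"],  # eth1
}


def ifindex_of(addr):
    """Return the index of the interface that owns addr, or None."""
    for ifindex, addrs in INTERFACE_ADDRS.items():
        if addr in addrs:
            return ifindex
    return None


def pick_source(bound_addrs, output_ifindex):
    """Return the first bound address that lives on the output interface.

    Being routable is not treated as sufficient here: the address must also
    belong to the interface that routing selected for output.
    """
    for addr in bound_addrs:
        if ifindex_of(addr) == output_ifindex:
            return addr
    return None
```

With a socket bound to both secondary addresses and routing selecting eth1 (index 3), the sketch picks 192.168.100.148 and skips 192.168.122.148, which is the address a peer using rp_filter would drop.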
    • Marcelo Ricardo Leitner's avatar
      sctp: reduce indent level on sctp_v4_get_dst · 07868284
      Marcelo Ricardo Leitner authored
      Paves the way for the next patch. Functionality stays untouched.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07868284
    • Sowmini Varadhan's avatar
      net/vxlan: Fix kernel unaligned access in __vxlan_find_mac · 7177a3b0
      Sowmini Varadhan authored
      __vxlan_find_mac invokes ether_addr_equal on the eth_addr field,
      which triggers unaligned access messages, so rearrange vxlan_fdb
      to avoid this in the most non-intrusive way.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reviewed-by: Jiri Pirko <jiri@resnulli.us>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7177a3b0
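Why reordering fields helps can be seen with a small `ctypes` model. `ether_addr_equal()` expects both addresses to be 16-bit aligned, so a 6-byte address that starts at an odd offset triggers the warning on strict-alignment CPUs. The two layouts below are hypothetical, cut-down stand-ins chosen for illustration; they are not the actual `vxlan_fdb` definition from the patch.

```python
import ctypes

ETH_ALEN = 6  # length of an Ethernet address in bytes


class FdbBefore(ctypes.Structure):
    # Hypothetical layout: small fields first push eth_addr to an odd offset.
    _fields_ = [
        ("state", ctypes.c_uint16),
        ("flags", ctypes.c_uint8),
        ("eth_addr", ctypes.c_uint8 * ETH_ALEN),
    ]


class FdbAfter(ctypes.Structure):
    # Hypothetical layout: eth_addr moved ahead of the small fields.
    _fields_ = [
        ("eth_addr", ctypes.c_uint8 * ETH_ALEN),
        ("state", ctypes.c_uint16),
        ("flags", ctypes.c_uint8),
    ]


def is_u16_aligned(struct_type, field):
    """True if the field starts on a 2-byte boundary within the struct."""
    return getattr(struct_type, field).offset % 2 == 0
```

In this model `FdbBefore.eth_addr.offset` is 3 while `FdbAfter.eth_addr.offset` is 0, so only the second layout is safe for 16-bit loads.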
    • Thomas Graf's avatar
      rhashtable: Allow other tasks to be scheduled in large lookup loops · 685a015e
      Thomas Graf authored
      Depending on system speed, the large lookup/insert/delete loops of the testsuite can
      take a considerable amount of time to complete causing watchdog warnings to appear.
      Allow other tasks to be scheduled throughout the loops.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      685a015e
    • Shaohui Xie's avatar
      phylib: add driver for Teranetics TN2020 · f61687c0
      Shaohui Xie authored
      Teranetics TN2020 is compliant with IEEE 802.3an 10 Gigabit.
      Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f61687c0
    • David S. Miller's avatar
      Merge branch 'bpf-push-pop-helpers' · 500322ec
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf: introduce bpf_skb_vlan_push/pop() helpers
      
      Let TC+eBPF programs call skb_vlan_push/pop via helpers.
      
      v1->v2:
      - reworded commit log to better explain correctness of re-caching
        and fixed comparison of mixed endianness (suggested by Eric)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      500322ec
    • Alexei Starovoitov's avatar
      test_bpf: add bpf_skb_vlan_push/pop() tests · 4d9c5c53
      Alexei Starovoitov authored
      improve accuracy of timing in test_bpf and add two stress tests:
      - {skb->data[0], get_smp_processor_id} repeated 2k times
      - {skb->data[0], vlan_push} x 68 followed by {skb->data[0], vlan_pop} x 68
      
      1st test is useful to test performance of JIT implementation of BPF_LD_ABS
      together with BPF_CALL instructions.
      2nd test is stressing skb_vlan_push/pop logic together with skb->data access
      via BPF_LD_ABS insn which checks that re-caching of skb->data is done correctly.
      
      In order to call bpf_skb_vlan_push() from test_bpf.ko, three symbols
      have to be exported with EXPORT_SYMBOL_GPL.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d9c5c53
    • Alexei Starovoitov's avatar
      bpf: introduce bpf_skb_vlan_push/pop() helpers · 4e10df9a
      Alexei Starovoitov authored
      Allow eBPF programs attached to TC qdiscs to call skb_vlan_push/pop via
      helper functions. These functions may change skb->data/hlen which are
      cached by some JITs to improve performance of ld_abs/ld_ind instructions.
      Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls,
      re-compute header len and re-cache skb->data/hlen back into cpu registers.
      Note, skb->data/hlen are not directly accessible from the programs,
      so any changes to skb->data done either by these helpers or by other
      TC actions are safe.
      
      eBPF JIT supported by three architectures:
      - arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is.
      - x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls
        (experiments showed that conditional re-caching is slower).
      - s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present
        in the program (re-caching is tbd).
      
      These helpers allow more scalable handling of vlan from the programs.
      Instead of creating thousands of vlan netdevs on top of eth0 and attaching
      TC+ingress+bpf to all of them, the program can be attached to eth0 directly
      and manipulate vlans as necessary.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e10df9a
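To see why a JIT that caches skb->data/hlen has to re-cache after these calls, it helps to look at what a tag push or pop does to the raw frame. The sketch below works on plain `bytes` and is only an illustration: `vlan_push()` and `vlan_pop()` are hypothetical names, and the real skb_vlan_push/pop also deal with hardware-accelerated tags, which this model ignores.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID of an 802.1Q tag
MAC_HDR_LEN = 12      # destination MAC + source MAC


def vlan_push(frame, tci, tpid=ETH_P_8021Q):
    """Insert a 4-byte VLAN tag right after the two MAC addresses."""
    tag = struct.pack("!HH", tpid, tci)
    return frame[:MAC_HDR_LEN] + tag + frame[MAC_HDR_LEN:]


def vlan_pop(frame):
    """Remove the outermost 802.1Q tag; return (untagged_frame, tci)."""
    tpid, tci = struct.unpack_from("!HH", frame, MAC_HDR_LEN)
    if tpid != ETH_P_8021Q:
        raise ValueError("frame carries no 802.1Q tag")
    return frame[:MAC_HDR_LEN] + frame[MAC_HDR_LEN + 4:], tci
```

After a push, the frame is four bytes longer and every byte from offset 12 onward has moved, so any previously cached data pointer or header length no longer describes the packet.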
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · f3120acc
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-07-17
      
      This series contains updates to igb, ixgbe, ixgbevf, i40e, bnx2x,
      freescale, siena and dp83640.
      
      Jacob provides several patches to clarify the intended way to implement
      both SIOCSHWTSTAMP and ethtool's get_ts_info().  It is okay to support
      the specific filters in SIOCSHWTSTAMP by upscaling them to the generic
      filters.
      
      Alex Duyck provides an igb patch to pull the time stamp from the fragment
      before it gets added to the skb, to avoid a possible issue in which the
      fragment can possibly be less than IGB_RX_HDR_LEN due to the time stamp
      being pulled after the copybreak check.  Also provides an ixgbevf patch to
      fold the ixgbevf_pull_tail() call into ixgbevf_add_rx_frag(), which gives
      the advantage that the fragment does not have to be modified after it is
      added to the skb.
      
      Fan provides patches for ixgbe/ixgbevf to set the receive hash type
      based on receive descriptor RSS type.
      
      Todd provides a fix for igb where link on any media other than copper
      was not being detected, since the link check was looking at the
      incorrect PHY page (the page in use gets switched before the function
      that checks link gets executed).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3120acc
    • David S. Miller's avatar
      Merge branch 'bcmgenet-phy-rework' · 0e55a42a
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: bcmgenet: PHY initialization rework
      
      This patch series reworks how we perform PHY initialization and resets in the
      GENET driver. Although this contains mostly fixes, some of the changes are a
      bit too intrusive to be backported to 'net' at the moment.
      
      Some of the motivations behind these changes were to reduce the time spent
      performing MDIO transactions, since it is better to perform them when we have
      interrupts enabled. This reduces the bring-up time of GENET from ~600 msecs down
      to ~8 msecs, and about the same time for suspend/resume.
      
      Since I do not currently have a system which is not DT-aware, can you (Petri,
      Jaedon) give this a try and confirm things keep working as expected?
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e55a42a
    • Florian Fainelli's avatar
      net: bcmgenet: Remove init parameter from bcmgenet_mii_config · 28b45910
      Florian Fainelli authored
      Now that we have reworked the way we perform the PHY initialization, we
      no longer need to differentiate between init time vs. non-init time
      calls, just use a dev_info_once() print to print the PHY type.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      28b45910
    • Florian Fainelli's avatar
      net: bcmgenet: Delay PHY initialization to bcmgenet_open() · 6cc8e6d4
      Florian Fainelli authored
      We are currently doing a full PHY initialization and even starting the
      PHY state machine during bcmgenet_mii_init() which is executed in the
      driver's probe function. This is convenient to determine whether we can
      attach to a proper PHY device but comes at the expense of spending up to
      10ms per MDIO transaction (to reach the waitqueue timeout), which slows
      things down.
      
      This also creates a situation where we end up attaching twice to the
      PHY, which is not quite correct either.
      
      Fix this by moving bcmgenet_mii_probe() into bcmgenet_open() and update
      its error path accordingly.
      
      Avoid printing the message "attached PHY at address 1 [...]" every time
      we bring up/down the interface and remove this print since it duplicates
      what the PHY driver already does for us.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6cc8e6d4
    • Florian Fainelli's avatar
      net: bcmgenet: Determine PHY type before scanning MDIO bus · c624f891
      Florian Fainelli authored
      Our internal GPHY might be powered off before we attempt scanning the
      MDIO bus and bind a driver to it. The way we are currently determining
      whether a PHY is internal or not is done *after* we have successfully
      matched its driver. If the PHY is powered down, it will not respond to
      the MDIO bus, so we will not be able to bind a driver to it.
      
      Our Device Tree for GENET interfaces specifies a "phy-mode" value:
      "internal" which tells if this interface uses an internal PHY or not.
      
      If of_get_phy_mode() fails to parse the 'phy-mode' property, do an
      additional manual lookup, and if we find "internal" set the
      corresponding internal variable accordingly.
      
      Replace all uses of phy_is_internal() with a check against
      priv->internal_phy to avoid having to rely on whether or not
      priv->phydev is set correctly.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c624f891
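The fallback lookup described above could look roughly like this. It is a sketch only: device tree properties are modelled as a plain dict, `GENERIC_PHY_MODES` is a hypothetical subset of the strings a generic parser might accept, and `generic_phy_mode()` and `resolve_phy()` are made-up names rather than driver functions.

```python
# Hypothetical subset of mode strings the generic parser recognizes.
GENERIC_PHY_MODES = {"mii", "gmii", "rgmii", "rgmii-txid", "moca"}


def generic_phy_mode(props):
    """Stand-in for the generic parser: None means it could not parse."""
    mode = props.get("phy-mode")
    return mode if mode in GENERIC_PHY_MODES else None


def resolve_phy(props):
    """Return (mode, internal_phy) for a device tree node.

    The decision is made from the property alone, before any MDIO bus scan,
    so it still works when the PHY is powered down and cannot answer.
    """
    mode = generic_phy_mode(props)
    if mode is not None:
        return mode, False
    # Generic parsing failed: do the manual lookup for "internal".
    if props.get("phy-mode") == "internal":
        return "internal", True
    return None, False
```

The returned flag plays the role of priv->internal_phy: later code can test it directly instead of asking an attached PHY device whether it is internal.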
    • Florian Fainelli's avatar
      net: bcmgenet: Power on integrated GPHY in bcmgenet_power_up() · bd4060a6
      Florian Fainelli authored
      We are currently disabling the GPHY interface during bcmgenet_close(),
      and attempting to power it back on during bcmgenet_open(). This works
      fine for the first time, because we called bcmgenet_mii_config() which
      took care of enabling the interface, however, bcmgenet_power_up() really
      needs to power on the GPHY for correctness.
      
      This will be particularly important as we want to move
      bcmgenet_mii_probe() down to bcmgenet_open() to avoid seeing the "PHY
      already attached" message.
      
      Fixes: a642c4f7 ("net: bcmgenet: power up and down integrated GPHY when unused")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd4060a6
    • Florian Fainelli's avatar
      net: bcmgenet: Use correct dev_id for free_irq · 978ffac4
      Florian Fainelli authored
      bcmgenet_open()'s error path calls free_irq() with a dev_id argument
      different from the one we used to call request_irq() with. This will
      make us trip over the warning in kernel/irq/manage.c:__free_irq().
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      978ffac4
    • Florian Fainelli's avatar
      net: bcmgenet: Remove excessive PHY reset · 6ac3ce82
      Florian Fainelli authored
      We are currently issuing multiple PHY resets during a suspend/resume,
      first during bcmgenet_power_up() which does a hardware reset, then a
      software reset by calling bcmgenet_mii_reset(). This is both unnecessary
      and can take as long as 10ms per MDIO transaction while we re-apply
      workarounds because we do not yet have MDIO interrupts enabled.
      
      phy_resume() takes care of re-applying our workarounds in case we need any,
      and bcmgenet_power_up() does a PHY hardware reset, all of this is more
      than enough to guarantee that the PHY operates correctly.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ac3ce82
    • David S. Miller's avatar
      Merge branch 'stmmac-cleanup' · 2c1bcaff
      David S. Miller authored
      Joachim Eastwood says:
      
      ====================
      stmmac clean up for 4.3 part1
      
      This patch set continues the conversion of the dwmac glue layers
      to more proper platform drivers. The first part of the patch set
      cleans up stmmac_platform a bit. Refactors code from the common
      probe function and exports two functions that will be used in
      the dwmac-* drivers.
      
      Second part converts two simple dwmac-* drivers to have their
      own probe function and use the exported functions. This brings
      us closer to the point where stmmac_platform is only a library of
      common functions for the dwmac-* drivers to use.
      
      The plan next is:
       * add probe functions to the rest of the dwmac-* drivers
       * move probe function in stmmac_platform to dwmac-generic
       * remove struct stmmac_of_data and let those drivers
         that actually need match data handle it themselves
       * clean up include/linux/stmmac.h
      
      Note that this patch set has only been tested on lpc18xx so
      testing on other platforms is greatly appreciated.
      
      Previous parts can be found here:
      http://www.spinics.net/lists/netdev/msg328997.html
      http://www.spinics.net/lists/netdev/msg329932.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c1bcaff
    • Joachim Eastwood's avatar
      stmmac: drop custom_* fields from plat_stmmacenet_data · f4c190eb
      Joachim Eastwood authored
      Both of these fields are unused and have been unused since they
      were added 3 and 5 years ago. Drop them since they are clearly
      not very useful.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4c190eb
    • Joachim Eastwood's avatar
      stmmac: add proper probe function to dwmac-meson · 1734befd
      Joachim Eastwood authored
      By using a few functions from stmmac_platform we can now create
      a proper probe function in this driver. By doing so we can drop
      the OF match data and simplify the overall driver.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1734befd
    • Joachim Eastwood's avatar
      stmmac: add proper probe function to dwmac-lpc18xx · f4f8dfde
      Joachim Eastwood authored
      By using a few functions from stmmac_platform we can now create
      a proper probe function in this driver. By doing so we can drop
      the OF match data and simplify the overall driver.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4f8dfde
    • Joachim Eastwood's avatar
      stmmac: export probe_config_dt() and get_platform_resources() · 402dae0b
      Joachim Eastwood authored
      Export stmmac_probe_config_dt() and stmmac_get_platform_resources()
      so they can be used in the dwmac-* drivers themselves. This will
      allow us to build more flexible and standalone drivers which just
      use stmmac_platform as a library for setup functions.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      402dae0b
    • Joachim Eastwood's avatar
      stmmac: make stmmac_probe_config_dt return the platform data struct · b0003ead
      Joachim Eastwood authored
      Since stmmac_probe_config_dt() allocates the platform data structure
      it is cleaner if it just returned this structure directly. This
      function will later be used in the probe function in dwmac-* drivers.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b0003ead
    • Joachim Eastwood's avatar
      stmmac: introduce stmmac_get_platform_resources() · f396cb01
      Joachim Eastwood authored
      Refactor all code that deals with platform resources into its
      own get function. This function will later be used in the probe
      function in dwmac-* drivers.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f396cb01
    • Joachim Eastwood's avatar
      stmmac: clean up platform/of_match data retrieval · 4ed2d8fc
      Joachim Eastwood authored
      Refactor code to clearly separate probing non-dt versus dt. In the
      non-dt case platform data must be supplied to probe successfully.
      For dt the platform data structure is created and match data is
      copied into it. Note that support for supplying platform data in
      dt from AUXDATA is dropped as no user in mainline does this.
      
      This change will allow dt dwmac-* drivers to call the config_dt()
      function from probe to create the needed platform data struct and
      retrieve common dt properties.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ed2d8fc
    • Joachim Eastwood's avatar
      stmmac: use of_device_get_match_data to retrieve of match data · 0dacf3f6
      Joachim Eastwood authored
      By using of_device_get_match_data() the code that retrieves
      match data can be simplified quite a bit.
      Signed-off-by: Joachim Eastwood <manabian@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0dacf3f6
    • David S. Miller's avatar
      Merge branch 'tipc-separate-link-and-aggregation' · 7781e5d1
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: separate link and link aggregation layer
      
      This is the first batch of a longer series that has two main objectives:
      
      o Finer lock granularity during message sending and reception,
        especially regarding usage of the node spinlock.
      
      o Better separation between the link layer implementation and the link
        aggregation layer, represented by node.c::struct tipc_node.
      
      Hopefully these changes also make this part of code somewhat easier
      to comprehend and maintain.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7781e5d1
    • Jon Paul Maloy's avatar
      tipc: reduce locking scope during packet reception · d999297c
      Jon Paul Maloy authored
      We convert packet/message reception according to the same principle
      we have been using for message sending and timeout handling:
      
      We move the function tipc_rcv() to node.c, hence handling the initial
      packet reception at the link aggregation level. The function grabs
      the node lock, selects the receiving link, and accesses it via a new
      call tipc_link_rcv(). This function appends buffers to the input
      queue for delivery upwards, but it may also append outgoing packets
      to the xmit queue, just as we do during regular message sending. The
      latter will happen when buffers are forwarded from the link backlog,
      or when retransmission is requested.
      
      Upon return of this function, and after having released the node lock,
      tipc_rcv() delivers/transmits the contents of those queues, but it may
      also perform actions such as link activation or reset, as indicated by
      the return flags from the link.
      
      This reduces the number of cpu cycles spent inside the node spinlock,
      and reduces contention on that lock.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d999297c
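The locking pattern described in this commit (fill caller-owned queues while holding the lock, act on them only after releasing it) can be sketched as follows. This is a simplified model, not TIPC code: the `Node` class, the dict-based packets, and the `needs_ack` field are assumptions made up for the example.

```python
import threading
from collections import deque


class Node:
    """Toy model of a node that owns a lock and one receiving link."""

    def __init__(self):
        self.lock = threading.Lock()
        self.delivered = []    # what was passed upwards
        self.transmitted = []  # what was sent back out

    def link_rcv(self, packet, inputq, xmitq):
        """Per-link processing: only appends to the caller's queues."""
        inputq.append(packet)
        if packet.get("needs_ack"):
            xmitq.append({"ack_for": packet["seq"]})

    def rcv(self, packet):
        inputq, xmitq = deque(), deque()
        # Hold the node lock only while the link updates its state
        # and fills the two local queues.
        with self.lock:
            self.link_rcv(packet, inputq, xmitq)
        # Lock released: the potentially slow work happens out here.
        self.transmitted.extend(xmitq)
        self.delivered.extend(inputq)
```

Because delivery and transmission run outside the `with` block, other threads contending for the node lock wait only for the short queue-filling step.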
    • Jon Paul Maloy's avatar
      tipc: introduce node contact FSM · 1a20cc25
      Jon Paul Maloy authored
      The logic for determining when a node is permitted to establish
      and maintain contact with its peer node becomes non-trivial in the
      presence of multiple parallel links that may come and go independently.
      
      A known failure scenario is that one endpoint registers both its links
      to the peer lost, cleans up its binding table, and prepares for a table
      update once contact is re-established, while the other endpoint may
      see its links reset and re-established one by one, hence seeing
      no need to re-synchronize the binding table. To avoid this, a node
      must not allow re-establishing contact until it has confirmation that
      even the peer has lost both links.
      
      Currently, the mechanism for handling this consists of setting and
      resetting two state flags from different locations in the code. This
      solution is hard to understand and maintain. A closer analysis even
      reveals that it is not completely safe.
      
      In this commit we do instead introduce an FSM that keeps track of
      the conditions for when the node can establish and maintain links.
      It has six states and four events, and is strictly based on explicit
      knowledge about the own node's and the peer node's contact states.
      Only events leading to state change are shown as edges in the figure
      below.
      
                                   +--------------+
                                   | SELF_UP/     |
                 +---------------->| PEER_COMING  |-----------------+
          SELF_  |                 +--------------+                 |PEER_
          ESTBL_ |                        |                         |ESTBL_
          CONTACT|      SELF_LOST_CONTACT |                         |CONTACT
                 |                        v                         |
                 |                 +--------------+                 |
                 |      PEER_      | SELF_DOWN/   |     SELF_       |
                 |      LOST_   +--| PEER_LEAVING |<--+ LOST_       v
      +-------------+   CONTACT |  +--------------+   | CONTACT  +-----------+
      | SELF_DOWN/  |<----------+                     +----------| SELF_UP/  |
      | PEER_DOWN   |<----------+                     +----------| PEER_UP   |
      +-------------+   SELF_   |  +--------------+   | PEER_    +-----------+
                 |      LOST_   +--| SELF_LEAVING/|<--+ LOST_       A
                 |      CONTACT    | PEER_DOWN    |     CONTACT     |
                 |                 +--------------+                 |
                 |                         A                        |
          PEER_  |       PEER_LOST_CONTACT |                        |SELF_
          ESTBL_ |                         |                        |ESTBL_
          CONTACT|                 +--------------+                 |CONTACT
                 +---------------->| PEER_UP/     |-----------------+
                                   | SELF_COMING  |
                                   +--------------+
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a20cc25
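The figure in this commit can be written down as a transition table. The sketch below is a reading of that diagram, not the kernel implementation: state and event names are taken from the figure with slashes replaced by underscores, and treating an event that has no edge as a no-op is an assumption of this example.

```python
# (current state, event) -> next state, one entry per edge in the figure.
TRANSITIONS = {
    ("SELF_DOWN_PEER_DOWN", "SELF_ESTBL_CONTACT"): "SELF_UP_PEER_COMING",
    ("SELF_DOWN_PEER_DOWN", "PEER_ESTBL_CONTACT"): "PEER_UP_SELF_COMING",
    ("SELF_UP_PEER_COMING", "PEER_ESTBL_CONTACT"): "SELF_UP_PEER_UP",
    ("SELF_UP_PEER_COMING", "SELF_LOST_CONTACT"): "SELF_DOWN_PEER_LEAVING",
    ("PEER_UP_SELF_COMING", "SELF_ESTBL_CONTACT"): "SELF_UP_PEER_UP",
    ("PEER_UP_SELF_COMING", "PEER_LOST_CONTACT"): "SELF_LEAVING_PEER_DOWN",
    ("SELF_UP_PEER_UP", "SELF_LOST_CONTACT"): "SELF_DOWN_PEER_LEAVING",
    ("SELF_UP_PEER_UP", "PEER_LOST_CONTACT"): "SELF_LEAVING_PEER_DOWN",
    ("SELF_DOWN_PEER_LEAVING", "PEER_LOST_CONTACT"): "SELF_DOWN_PEER_DOWN",
    ("SELF_LEAVING_PEER_DOWN", "SELF_LOST_CONTACT"): "SELF_DOWN_PEER_DOWN",
}


def next_state(state, event):
    """Follow an edge if one exists; otherwise stay in the same state."""
    return TRANSITIONS.get((state, event), state)


def run(state, events):
    """Feed a sequence of events through the table and return the end state."""
    for event in events:
        state = next_state(state, event)
    return state
```

The failure scenario from the commit message maps onto the table directly: once this node has lost contact (SELF_DOWN_PEER_LEAVING), a SELF_ESTBL_CONTACT event has no edge, so contact cannot be re-established until PEER_LOST_CONTACT confirms that the peer has lost its links too.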
    • Jon Paul Maloy's avatar
      tipc: move link supervision timer to node level · 8a1577c9
      Jon Paul Maloy authored
      In our effort to move control of the links to the link aggregation
      layer, we move the periodic link supervision timer to struct tipc_node.
      The new timer is shared between all links belonging to the node, thus
      saving resources, while still kicking the FSM on both its pertaining
      links at each expiration.
      
      The current link timer and corresponding functions are removed.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a1577c9
    • Jon Paul Maloy's avatar
      tipc: simplify link timer implementation · 333ef69e
      Jon Paul Maloy authored
      We create a second, simpler, link timer function, tipc_link_timeout().
      The new function makes use of the new FSM function introduced in the
      previous commit, and just like it, takes a buffer queue as parameter.
      It returns an event bit field and potentially a link protocol packet
      to the caller.
      
      The existing timer function, link_timeout(), is still needed for a
      while, so we redesign it to become a wrapper around the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      333ef69e
    • Jon Paul Maloy's avatar
      tipc: improve link FSM implementation · 6ab30f9c
      Jon Paul Maloy authored
      The link FSM implementation is currently unnecessarily complex.
      It sometimes checks for conditional state outside the FSM data
      before deciding next state, and often performs actions directly
      inside the FSM logic.
      
      In this commit, we create a second, simpler FSM implementation,
      that as far as possible acts only on states and events that it is
      strictly defined for, and postpone any actions until it is finished
      with its decisions. It also returns an event flag field and a
      buffer queue which may potentially contain a protocol message to
      be sent by the caller.
      
      Unfortunately, we cannot yet make the FSM "clean", in the sense
      that its decisions are based only on FSM state and event, and that
      state changes happen only here. That will have to wait until the
      activate/reset logic has been cleaned up in a future commit.
      
      We also rename the link states as follows:
      
      WORKING_WORKING -> TIPC_LINK_WORKING
      WORKING_UNKNOWN -> TIPC_LINK_PROBING
      RESET_UNKNOWN   -> TIPC_LINK_RESETTING
      RESET_RESET     -> TIPC_LINK_ESTABLISHING
      
      The existing FSM function, link_state_event(), is still needed for
      a while, so we redesign it to make use of the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ab30f9c
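      The "decide first, act later" structure described in this commit
      can be sketched as below. This is only an illustration: the state
      names mirror the renamed link states, but the events, the flag
      names and every transition are hypothetical and are not taken from
      the kernel's link.c. The point is that the step function touches
      nothing but the state and returns a flag field, leaving all
      actions to the caller.

```c
#include <assert.h>

/* States mirror the renaming in the commit message (demo prefix added). */
enum demo_state {
	DEMO_LINK_WORKING,
	DEMO_LINK_PROBING,
	DEMO_LINK_RESETTING,
	DEMO_LINK_ESTABLISHING
};

/* Hypothetical events, invented for this sketch. */
enum demo_event {
	DEMO_EVT_TRAFFIC,
	DEMO_EVT_SILENCE,
	DEMO_EVT_PEER_RESET,
	DEMO_EVT_PEER_READY
};

/* Hypothetical result flags; the caller acts on them after the step. */
enum demo_flag {
	DEMO_LINK_UP    = 1 << 0,
	DEMO_LINK_DOWN  = 1 << 1,
	DEMO_SEND_PROBE = 1 << 2
};

/* Pure decision step: updates *state and reports what the caller should do. */
static unsigned int demo_fsm_step(enum demo_state *state, enum demo_event evt)
{
	unsigned int flags = 0;

	assert(state);
	switch (*state) {
	case DEMO_LINK_WORKING:
		if (evt == DEMO_EVT_SILENCE) {
			*state = DEMO_LINK_PROBING;
			flags |= DEMO_SEND_PROBE;
		} else if (evt == DEMO_EVT_PEER_RESET) {
			*state = DEMO_LINK_RESETTING;
			flags |= DEMO_LINK_DOWN;
		}
		break;
	case DEMO_LINK_PROBING:
		if (evt == DEMO_EVT_TRAFFIC) {
			*state = DEMO_LINK_WORKING;
		} else if (evt == DEMO_EVT_SILENCE || evt == DEMO_EVT_PEER_RESET) {
			*state = DEMO_LINK_RESETTING;
			flags |= DEMO_LINK_DOWN;
		}
		break;
	case DEMO_LINK_RESETTING:
		if (evt == DEMO_EVT_PEER_RESET)
			*state = DEMO_LINK_ESTABLISHING;
		break;
	case DEMO_LINK_ESTABLISHING:
		if (evt == DEMO_EVT_PEER_READY || evt == DEMO_EVT_TRAFFIC) {
			*state = DEMO_LINK_WORKING;
			flags |= DEMO_LINK_UP;
		}
		break;
	}
	return flags;
}
```

      A caller would inspect the returned flags and only then send a
      probe or notify upper layers, which keeps side effects out of the
      state machine itself.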
    • Jon Paul Maloy's avatar
      tipc: introduce new link protocol msg create function · 426cc2b8
      Jon Paul Maloy authored
      As a preparation for later changes, we introduce a new function
      tipc_link_build_proto_msg(). Instead of actually sending the created
      protocol message, it only creates it and adds it to the head of a
      skb queue provided by the caller.
      
      Since we still need the existing function tipc_link_protocol_xmit()
      for a while, we redesign it to make use of the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      426cc2b8
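      The split between building a protocol message and sending it can
      be sketched roughly as follows. The types and function names here
      are hypothetical stand-ins (a fixed-size array replaces the skb
      queue, and demo_dev_send() replaces the bearer layer); only the
      shape of the refactoring follows the commit message: a build
      function that queues at the head, plus a legacy-style wrapper that
      builds and then sends.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QCAP 4

struct demo_msg { int type; };

/* Stand-in for an skb queue supplied by the caller. */
struct demo_msgq {
	struct demo_msg slot[DEMO_QCAP];
	size_t len;
};

/* Insert at the head of the queue; returns -1 if the queue is full. */
static int demo_queue_head(struct demo_msgq *q, struct demo_msg m)
{
	size_t i;

	assert(q);
	if (q->len == DEMO_QCAP)
		return -1;
	for (i = q->len; i > 0; i--)
		q->slot[i] = q->slot[i - 1];
	q->slot[0] = m;
	q->len++;
	return 0;
}

/* New-style function: only creates the message and queues it. */
static int demo_build_proto_msg(int type, struct demo_msgq *xmitq)
{
	struct demo_msg m = { .type = type };

	return demo_queue_head(xmitq, m);
}

static size_t demo_sent;	/* counts messages handed to the "device" */

static void demo_dev_send(const struct demo_msg *m)
{
	(void)m;
	demo_sent++;
}

/* Legacy-style wrapper: builds into a local queue, then sends it. */
static int demo_proto_xmit(int type)
{
	struct demo_msgq q = { .len = 0 };
	size_t i;

	if (demo_build_proto_msg(type, &q))
		return -1;
	for (i = 0; i < q.len; i++)
		demo_dev_send(&q.slot[i]);
	return 0;
}
```

      Keeping the wrapper lets existing callers continue to work while
      new callers collect messages in their own queue and decide when to
      transmit.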
    • Jon Paul Maloy's avatar
      tipc: clean up definitions and usage of link flags · d3504c34
      Jon Paul Maloy authored
      The status flag LINK_STOPPED is not needed any more, since the
      mechanism for delayed deletion of links has been removed.
      Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
      because we can just as well start the link timer directly from
      inside tipc_link_create().
      
      We eliminate these flags in this commit.
      
      Instead of the above flags, we now introduce three new link modes,
      TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
      indicate whether the link is allowed to receive messages in this
      state and, in the case of TIPC_LINK_TUNNEL, which ones.
      TIPC_LINK_BLOCKED also blocks timer-driven protocol messages from
      being sent out, as well as any change to the link FSM. Since the
      modes are mutually exclusive, we convert them to state values, and
      rename the 'flags' field in struct tipc_link to 'exec_mode'.
      
      Finally, we move the #defines for link FSM states and events from link.h
      into enums inside the file link.c, which is the real usage scope of
      these definitions.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3504c34
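      A minimal sketch of mutually exclusive execution modes expressed
      as state values rather than bit flags might look like this. The
      mode names follow the commit message; the message kinds and the
      exact filtering rule for the tunnel mode are assumptions made for
      illustration, since the commit message does not spell them out.

```c
#include <assert.h>
#include <stdbool.h>

/* Mutually exclusive modes, so a plain enum replaces a flag word. */
enum demo_exec_mode {
	DEMO_LINK_OPEN,
	DEMO_LINK_BLOCKED,
	DEMO_LINK_TUNNEL
};

/* Hypothetical message classification, invented for this sketch. */
enum demo_msg_kind {
	DEMO_MSG_REGULAR,
	DEMO_MSG_TUNNELED
};

/* Assumption: tunnel mode admits only tunneled messages. */
static bool demo_may_receive(enum demo_exec_mode mode, enum demo_msg_kind kind)
{
	switch (mode) {
	case DEMO_LINK_OPEN:
		return true;
	case DEMO_LINK_BLOCKED:
		return false;
	case DEMO_LINK_TUNNEL:
		return kind == DEMO_MSG_TUNNELED;
	}
	assert(0 && "unknown exec mode");
	return false;
}

/* Blocked mode also freezes the FSM and timer-driven protocol messages. */
static bool demo_may_run_fsm(enum demo_exec_mode mode)
{
	return mode != DEMO_LINK_BLOCKED;
}
```

      Because exactly one mode holds at a time, a switch over the enum
      covers every case, which is harder to guarantee with independent
      flag bits.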
    • Jon Paul Maloy's avatar
      tipc: make media xmit call outside node spinlock context · af9b028e
      Jon Paul Maloy authored
      Currently, message sending is performed through a deep call chain,
      where the node spinlock is grabbed and held during a significant
      part of the transmission time. This is clearly detrimental to
      overall throughput performance; it would be better if we could send
      the message after the spinlock has been released.
      
      In this commit, we instead let the call chain unwind once the
      buffer chain has been added to the transmission queue, after which
      clones of the buffers are transmitted to the device layer outside
      the spinlock scope.
      
      As a further step in our effort to separate the roles of the node
      and link entities, we also move the function tipc_link_xmit() to
      node.c, and rename it to tipc_node_xmit().
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af9b028e
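      The ordering this commit aims for (queue under the lock, transmit
      after releasing it) can be sketched as below. To keep the example
      single-threaded and self-contained, the node spinlock is modeled
      by a plain flag, buffers are plain integers, and "cloning" is a
      copy; all names are hypothetical. The assertion in
      demo_dev_send() checks that the device layer is never entered
      with the lock held.

```c
#include <assert.h>

#define DEMO_QLEN 8

struct demo_queue {
	int pkt[DEMO_QLEN];
	int len;
};

struct demo_node {
	int locked;		/* stand-in for the node spinlock */
	struct demo_queue txq;	/* link transmission queue, guarded by the lock */
	int sent_to_device;	/* number of packets handed to the device */
};

static void demo_lock(struct demo_node *n)
{
	assert(!n->locked);
	n->locked = 1;
}

static void demo_unlock(struct demo_node *n)
{
	assert(n->locked);
	n->locked = 0;
}

/* Device-layer stand-in: must only be called with the lock released. */
static void demo_dev_send(struct demo_node *n, int pkt)
{
	assert(!n->locked);
	(void)pkt;
	n->sent_to_device++;
}

static int demo_node_xmit(struct demo_node *n, const int *pkts, int count)
{
	struct demo_queue xmitq = { .len = 0 };	/* local queue of clones */
	int i;

	assert(count >= 0);
	demo_lock(n);
	if (n->txq.len + count > DEMO_QLEN) {
		demo_unlock(n);
		return -1;
	}
	for (i = 0; i < count; i++) {
		n->txq.pkt[n->txq.len++] = pkts[i];	/* original stays queued */
		xmitq.pkt[xmitq.len++] = pkts[i];	/* "clone" for the device */
	}
	demo_unlock(n);

	/* The slow part now runs outside the lock. */
	for (i = 0; i < xmitq.len; i++)
		demo_dev_send(n, xmitq.pkt[i]);
	return 0;
}
```

      The lock is held only for the queue manipulation, so other users
      of the node are not stalled while the device layer is busy.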
    • Jon Paul Maloy's avatar
      tipc: change sk_buffer handling in tipc_link_xmit() · 22d85c79
      Jon Paul Maloy authored
      When the function tipc_link_xmit() is given a buffer list for
      transmission, it currently consumes the list both when transmission
      is successful and when it fails, except for the special case when
      it encounters link congestion.
      
      This behavior is inconsistent, and needs to be corrected if we want
      to avoid problems in later commits in this series.
      
      In this commit, we change this to let the function consume the list
      only when transmission is successful, and leave the list with the
      sender in all other cases. We also modify the socket code so that
      it adapts to this change, i.e., purges the list when a non-congestion
      error code is returned.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22d85c79
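      The ownership rule introduced here (the transmit function consumes
      the list only on success, and the sender purges it on any error
      other than congestion) can be illustrated with a small sketch. The
      list type, the error codes and the function names are all
      hypothetical; the kernel code works on sk_buff queues and its own
      error values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_buf { struct demo_buf *next; };

struct demo_list {
	struct demo_buf *head;
	struct demo_buf *tail;
	size_t len;
};

/* Hypothetical return codes for this sketch. */
enum { DEMO_OK = 0, DEMO_ERR_CONGESTION = -1, DEMO_ERR_NO_LINK = -2 };

struct demo_link {
	int up;			/* non-zero if the link can carry traffic */
	size_t room;		/* free slots left in the transmit queue */
	struct demo_list txq;
};

static int demo_list_add(struct demo_list *l)
{
	struct demo_buf *b = calloc(1, sizeof(*b));

	if (!b)
		return -1;
	if (l->tail)
		l->tail->next = b;
	else
		l->head = b;
	l->tail = b;
	l->len++;
	return 0;
}

static void demo_list_purge(struct demo_list *l)
{
	struct demo_buf *b = l->head;

	while (b) {
		struct demo_buf *next = b->next;

		free(b);
		b = next;
	}
	l->head = l->tail = NULL;
	l->len = 0;
}

/* Consumes 'list' only on success; on any error the list is untouched. */
static int demo_xmit(struct demo_link *link, struct demo_list *list)
{
	assert(link && list);
	if (!link->up)
		return DEMO_ERR_NO_LINK;
	if (list->len > link->room)
		return DEMO_ERR_CONGESTION;
	if (list->head) {
		if (link->txq.tail)
			link->txq.tail->next = list->head;
		else
			link->txq.head = list->head;
		link->txq.tail = list->tail;
		link->txq.len += list->len;
		link->room -= list->len;
		list->head = list->tail = NULL;
		list->len = 0;
	}
	return DEMO_OK;
}

/* Sender side: keep the list on congestion (retry later), else purge. */
static int demo_socket_send(struct demo_link *link, struct demo_list *list)
{
	int rc = demo_xmit(link, list);

	if (rc != DEMO_OK && rc != DEMO_ERR_CONGESTION)
		demo_list_purge(list);
	return rc;
}
```

      With a single rule for who owns the list after each outcome, the
      caller never has to guess whether the buffers were already freed.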
    • Jon Paul Maloy's avatar
      tipc: use bearer index when looking up active links · 36e78a46
      Jon Paul Maloy authored
      struct tipc_node currently holds two arrays of link pointers; one,
      indexed by bearer identity, which contains all links irrespective of
      current state, and one two-slot array for the currently active link
      or links. The latter array contains direct pointers into the elements
      of the former. This has the effect that we cannot know the bearer id of
      a link when accessing it via the "active_links[]" array without actually
      dereferencing the pointer, something we want to avoid in some cases.
      
      In this commit, we instead store the bearer identity in the
      "active_links" array, and use it as an index to find the right element
      in the overall link entry array. This change should be seen as
      preparation for the later commits in this series.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36e78a46
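      Storing bearer identities instead of pointers in the active-link
      slots can be sketched as follows. The struct layouts, the array
      sizes and the "invalid" marker are assumptions chosen for the
      example and do not reproduce struct tipc_node; the sketch only
      shows that the bearer id becomes available without dereferencing
      a link pointer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BEARERS    3		/* assumed size for this sketch */
#define DEMO_INVALID_BEARER (-1)	/* assumed "no active link" marker */

struct demo_link { int priority; };

struct demo_link_entry { struct demo_link *link; };

struct demo_node {
	struct demo_link_entry links[DEMO_MAX_BEARERS];	/* indexed by bearer id */
	int active_links[2];				/* holds bearer ids */
};

/* The bearer id of an active slot is known without touching the link. */
static int demo_active_bearer(const struct demo_node *n, int slot)
{
	assert(n && slot >= 0 && slot < 2);
	return n->active_links[slot];
}

/* Resolve an active slot to its link through the link entry array. */
static struct demo_link *demo_active_link(const struct demo_node *n, int slot)
{
	int bearer_id = demo_active_bearer(n, slot);

	if (bearer_id == DEMO_INVALID_BEARER)
		return NULL;
	assert(bearer_id >= 0 && bearer_id < DEMO_MAX_BEARERS);
	return n->links[bearer_id].link;
}
```

      The extra array lookup is cheap, and code that only needs the
      bearer id no longer has to follow a pointer to get it.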
    • Jon Paul Maloy's avatar
      tipc: move link input queue to tipc_node · d39bbd44
      Jon Paul Maloy authored
      At present, the link input queue and the name distributor receive
      queues are fields aggregated in struct tipc_link. This is a hazard,
      because a link might be deleted while a receiving socket still holds
      a reference to one of the queues.
      
      This commit fixes this bug. However, rather than adding yet another
      reference counter to the critical data path, we move the two queues
      to safe ground inside struct tipc_node, which is already protected, and
      let the link code only handle references to the queues. This is also
      in line with planned later changes in this area.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d39bbd44
    • Jon Paul Maloy's avatar
      tipc: move link creation from neighbor discoverer to node · d3a43b90
      Jon Paul Maloy authored
      As a step towards turning links into node-internal entities, we move the
      creation of links from the neighbor discovery logic to the node's link
      control logic.
      
      We also create an additional entry for the link's media address in the
      newly introduced struct tipc_link_entry, since this is where it is
      needed in the upcoming commits. The current copy in struct tipc_link
      is kept for now, but will be removed later.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3a43b90