1. 30 Mar, 2018 19 commits
    • net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net() · 328fbe74
      Kirill Tkhai authored
      {un,}register_netdevice_notifier() iterate over all net namespaces
      hashed to net_namespace_list. But pernet_operations register and
      unregister netdevices in unhashed net namespaces, and these are not
      seen by netdevice notifiers. This results in asymmetry:
      
      1) Race with register_netdevice_notifier()
        pernet_operations::init(net)	...
         register_netdevice()		...
          call_netdevice_notifiers()  ...
            ... nb is not called ...
        ...				register_netdevice_notifier(nb) -> net skipped
        ...				...
        list_add_tail(&net->list, ..) ...
      
        Then, userspace stops using net, and it's destructed:
      
        pernet_operations::exit(net)
         unregister_netdevice()
          call_netdevice_notifiers()
            ... nb is called ...
      
      This always happens with net::loopback_dev, but it may not be the only device.
      
      2) Race with unregister_netdevice_notifier()
        pernet_operations::init(net)
         register_netdevice()
          call_netdevice_notifiers()
            ... nb is called ...
      
        Then, userspace stops using net, and it's destructed:
      
        list_del_rcu(&net->list)	...
        pernet_operations::exit(net)  unregister_netdevice_notifier(nb) -> net skipped
         dev_change_net_namespace()	...
          call_netdevice_notifiers()
            ... nb is not called ...
         unregister_netdevice()
          call_netdevice_notifiers()
            ... nb is not called ...
      
      This race is more dangerous, since dev_change_net_namespace() moves real
      network devices, which use non-trivial netdevice notifiers, and if this
      happens, the system is left in an unpredictable state.

      The patch closes the race. During testing I found two places
      where register_netdevice_notifier() is called from pernet init/exit
      methods (which led to deadlock) and fixed them (see previous patches).

      The review pointed me to one more unusual registration place:
      raw_init() (CAN driver). It may cause problems
      if someone creates in-kernel CAN_RAW sockets, since they
      will be destroyed in the exit method and raw_release()
      will call unregister_netdevice_notifier(). But a grep over the
      kernel tree does not show anyone creating such sockets
      from kernel space.

      Theoretically, there can be more places like this that are
      hidden from review, but we will find them the first time they are hit
      (since there is no race, it will be 100% reproducible).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
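The first race described in this commit can be reproduced in a toy user-space model. The sketch below is not kernel code: `Net`, `NotifierChain` and `simulate_register_race` are hypothetical names, and the model assumes only what the commit message states, namely that registering a notifier replays events solely for namespaces already on the hashed list.

```python
class Net:
    """Toy network namespace holding a set of device names."""

    def __init__(self, name):
        self.name = name
        self.devices = set()


class NotifierChain:
    """Toy model of the pre-patch behaviour (an assumption, not kernel code)."""

    def __init__(self):
        self.hashed = []      # stands in for net_namespace_list
        self.notifiers = []

    def register_notifier(self, nb):
        self.notifiers.append(nb)
        # Replay existing devices, but only for hashed namespaces:
        # a namespace still inside pernet init is skipped.
        for net in self.hashed:
            for dev in sorted(net.devices):
                nb("REGISTER", net.name, dev)

    def register_device(self, net, dev):
        net.devices.add(dev)
        for nb in self.notifiers:
            nb("REGISTER", net.name, dev)

    def unregister_device(self, net, dev):
        net.devices.discard(dev)
        for nb in self.notifiers:
            nb("UNREGISTER", net.name, dev)


def simulate_register_race():
    """Interleave namespace setup with notifier registration as in case 1."""
    chain, net, seen = NotifierChain(), Net("net1"), []
    chain.register_device(net, "lo")          # pernet init: no notifier yet
    chain.register_notifier(lambda ev, n, d: seen.append((ev, n, d)))
    chain.hashed.append(net)                  # namespace becomes visible too late
    chain.unregister_device(net, "lo")        # pernet exit: notifier is called
    return seen
```

In this model the notifier observes the unregistration of `lo` without ever having seen its registration, which is the asymmetry the commit message describes.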
    • netfilter: Rework xt_TEE netdevice notifier · 9e2f6c5d
      Kirill Tkhai authored
      Registering a netdevice notifier for every iptables entry
      is not good, since this breaks modularity, and
      the hidden synchronization is based on rtnl_lock().

      This patch reworks the synchronization via a new lock,
      while the rest of the logic remains as it was before.
      This is required for the next patch.
      
      Tested via:
      
      while :; do
      	unshare -n iptables -t mangle -A OUTPUT -j TEE --gateway 1.1.1.2 --oif lo;
      done
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: Register xfrm_dev_notifier in appropriate place · e9a441b6
      Kirill Tkhai authored
      Currently, the driver registers it from the pernet_operations::init method,
      and this breaks modularity, because initialization of a net namespace
      and netdevice notifiers are orthogonal actions. We don't have
      per-namespace netdevice notifiers; all of them are global for all
      devices in all namespaces.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Implement-of_get_nvmem_mac_address-helper' · caeeeda3
      David S. Miller authored
      Mike Looijmans says:
      
      ====================
      of_net: Implement of_get_nvmem_mac_address helper
      
      Posted this as a small set now, with an (optional) second patch that shows
      how the changes work and what I've used to test the code on a Topic Miami board.
      I've taken the liberty to add appropriate "Acked" and "Review" tags.
      
      v4: Replaced "6" with ETH_ALEN
      
      v3: Add patch that implements mac in nvmem for the Cadence MACB controller
          Remove the integrated of_get_mac_address call
      
      v2: Use of_nvmem_cell_get to avoid needing the associated device
          Use void* instead of char*
          Add devicetree binding doc
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Try to retrieve MAC address from nvmem provider · aa076e3d
      Mike Looijmans authored
      Call of_get_nvmem_mac_address() to fetch the MAC address from an nvmem
      cell, if one is provided in the device tree. This allows the address to
      be stored in an I2C EEPROM device for example.
      Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of_net: Implement of_get_nvmem_mac_address helper · 9217e566
      Mike Looijmans authored
      It's common practice to store MAC addresses for network interfaces in
      nvmem devices. However, the kernel lacks code to retrieve them,
      so this patch adds of_get_nvmem_mac_address() for drivers to obtain the
      address from an nvmem cell provider.

      This is particularly useful on devices where the ethernet interface cannot
      be configured by the bootloader, for example because it's in an FPGA.
      Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
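The checks a driver would want on bytes read from an nvmem cell can be sketched outside the kernel. This is a minimal model, not the helper's actual implementation: `mac_from_nvmem_cell` is a hypothetical name, and requiring exactly `ETH_ALEN` bytes and a non-zero unicast address are assumptions about reasonable validation.

```python
ETH_ALEN = 6  # length of an Ethernet MAC address in bytes


def is_valid_ether_addr(mac: bytes) -> bool:
    """Accept only a unicast address that is not all zeros."""
    if len(mac) != ETH_ALEN:
        return False
    is_multicast = bool(mac[0] & 0x01)   # lowest bit of the first octet
    is_zero = not any(mac)
    return not is_multicast and not is_zero


def mac_from_nvmem_cell(cell: bytes) -> bytes:
    """Return the MAC address stored in a cell, or raise ValueError.

    Assumption: the cell holds exactly ETH_ALEN raw bytes.
    """
    if len(cell) != ETH_ALEN:
        raise ValueError("cell length %d is not %d" % (len(cell), ETH_ALEN))
    mac = bytes(cell)
    if not is_valid_ether_addr(mac):
        raise ValueError("cell does not contain a valid unicast MAC address")
    return mac
```

A caller would fall back to another address source (for example a random address) when `ValueError` is raised.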
    • Merge branch 'nfp-flower-handle-MTU-changes' · 64e828df
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: flower: handle MTU changes
      
      This set improves MTU handling for flower offload.  The max MTU is
      correctly capped and physical port MTU is communicated to the FW
      (and indirectly HW).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: offload phys port MTU change · 29a5dcae
      John Hurley authored
      Trigger a port mod message to request an MTU change on the NIC when any
      physical port representor is assigned a new MTU value. The driver waits
      10 msec for an ack that the FW has set the MTU. If no ack is received the
      request is rejected and an appropriate warning flagged.
      
      Rather than maintain an MTU queue per repr, one is maintained per app.
      Because the MTU ndo is protected by the rtnl lock, there can never be
      contention here. Portmod messages from the NIC are also protected by
      rtnl so we first check if the portmod is an ack and, if so, handle outside
      rtnl and the cmsg work queue.
      
      Acks are detected by the marking of a bit in a portmod response. They are
      then verified by checking the port number and MTU value expected by the
      app. If the expected MTU is 0 then no acks are currently expected.
      
      Also, ensure that the packet headroom reserved by the flower firmware is
      considered when accepting an MTU change on any repr.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
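The ack-matching rule in this commit message (ack bit set, port and MTU equal to the expected values, expected MTU of 0 meaning nothing is pending) can be modelled in a few lines. `MtuAckTracker` and its methods are hypothetical names for illustration; the 10 msec wait and the firmware messaging are deliberately not modelled.

```python
class MtuAckTracker:
    """Single per-app slot for one outstanding MTU change request."""

    def __init__(self):
        self.expected_port = None
        self.expected_mtu = 0     # 0 means no ack is currently expected
        self.ack_seen = False

    def request(self, port, mtu):
        """Record the port and MTU that an ack must match."""
        self.expected_port = port
        self.expected_mtu = mtu
        self.ack_seen = False

    def handle_portmod(self, port, mtu, ack_bit):
        """Return True if the message was consumed as the awaited ack."""
        if not ack_bit or self.expected_mtu == 0:
            return False          # ordinary portmod, or nothing pending
        if port != self.expected_port or mtu != self.expected_mtu:
            return False          # ack for something we did not request
        self.ack_seen = True
        self.expected_mtu = 0     # slot is free again
        return True
```

One slot per app is enough in this model because, as the message notes, MTU changes are serialized by the rtnl lock.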
    • nfp: modify app MTU setting callbacks · 167cebef
      John Hurley authored
      Rename the 'change_mtu' app callback to 'check_mtu'. This is called
      whenever an MTU change is requested on a netdev. It can reject the
      change but is not responsible for implementing it.
      
      Introduce a new 'repr_change_mtu' app callback that is hit when the MTU
      of a repr is to be changed. This is responsible for performing the MTU
      change and verifying it.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'phylink-API-changes' · 44465c47
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      phylink: API changes
      
      This patch series contains two API changes to PHYLINK which will later be used
      by DSA to migrate to PHYLINK. Because these are API changes that impact other
      outstanding work (e.g: MVPP2) I would rather get them included sooner to minimize
      conflicts.
      
      Thank you!
      
      Changes in v2:
      
      - added missing documentation to mac_link_{up,down} that the interface
        must be configured in mac_config()
      
      - added Russell's, Andrew's and my tags
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfp/phylink: move module EEPROM ethtool access into netdev core ethtool · e679c9c1
      Russell King authored
      Provide a pointer to the SFP bus in struct net_device, so that the
      ethtool module EEPROM methods can access the SFP directly, rather
      than needing every user to provide a hook for it.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: phylink: Provide PHY interface to mac_link_{up, down} · c6ab3008
      Florian Fainelli authored
      In preparation for having DSA transition entirely to PHYLINK, we need to pass a
      PHY interface type to the mac_link_{up,down} callbacks because we may have to
      make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not pass
      an entire phylink_link_state because not all parameters (pause, duplex etc.) are
      defined when the link is down, only link and interface are.
      
      Update mvneta accordingly since it currently implements phylink_mac_ops.
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: update vmxnet3 driver maintainer · 2166dc95
      Ronak Doshi authored
      Shrikrishna Khare will no longer maintain the vmxnet3 driver. I am taking
      over the role of vmxnet3 maintainer.
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-Broadcom-drivers-coalescing-fixes' · 95e623fd
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: Broadcom drivers coalescing fixes
      
      Following Tal's review of the adaptive RX/TX coalescing feature added to the
      SYSTEMPORT and GENET drivers, a number of things showed up:

      - adaptive TX coalescing is not actually a good idea with the current way
        the estimator programs the ring: it results in a higher CPU load, and NAPI
        on TX already does a reasonably good job of keeping the interrupt count low
      
      - both SYSTEMPORT and GENET would suffer from the same issues while configuring
        coalescing parameters where the values would just not be applied correctly
        based on user settings, so we fix that too
      
      Tal, thanks again for your feedback. I would appreciate it if you could check
      that the new behavior appears to be implemented correctly.
      
      Thanks!
      
      Changes in v2:
      
      - added Tal's reviewed-by to the first patch
      - split DIM initialization from coalescing parameters initialization
      - avoid duplicating the same code in bcmgenet_set_coalesce() when configuring RX rings
      - fixed the condition where default DIM parameters would be applied when
        adaptive RX coalescing would be enabled, do this only if it was disabled before
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: Fix coalescing settings handling · 5e6ce1f1
      Florian Fainelli authored
      There were a number of issues with setting the RX coalescing parameters:
      
      - we would not be preserving values that would have been configured
        across close/open calls, instead we would always reset to no timeout
        and 1 interrupt per packet, this would also prevent DIM from setting its
        default usec/pkts values
      
      - when adaptive RX would be turned on, we would not be fetching the
        default parameters, we would stay with no timeout/1 packet per interrupt
        until the estimator kicks in and changes that
      
      - finally disabling adaptive RX coalescing while providing parameters
        would not be honored, and we would stay with whatever DIM had previously
        determined instead of the user requested parameters
      
      Fixes: 9f4ca058 ("net: bcmgenet: Add support for adaptive RX coalescing")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Tal Gilboa <talgi@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
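The three behaviours this commit message asks for (keep user values across close/open, load DIM defaults only when adaptive RX is newly enabled, honor user values when adaptive RX is disabled) can be sketched as a small state machine. `RxRing`, its fields, and the two `DIM_DEFAULT_*` values are hypothetical; this is a model of the described policy, not the driver code.

```python
from dataclasses import dataclass

# Hypothetical default values; the real DIM defaults are not given in the text.
DIM_DEFAULT_USECS = 64
DIM_DEFAULT_PKTS = 32


@dataclass
class RxRing:
    usecs: int = 0          # user-requested timeout (0 means no timeout)
    pkts: int = 1           # user-requested packets per interrupt
    use_dim: bool = False   # adaptive RX coalescing enabled?
    hw_usecs: int = 0       # value currently programmed into "hardware"
    hw_pkts: int = 1

    def _program(self, usecs, pkts):
        self.hw_usecs, self.hw_pkts = usecs, pkts

    def set_coalesce(self, usecs, pkts, adaptive):
        # Always remember what the user asked for.
        self.usecs, self.pkts = usecs, pkts
        if adaptive and not self.use_dim:
            # Adaptive RX newly enabled: start from the DIM defaults.
            self._program(DIM_DEFAULT_USECS, DIM_DEFAULT_PKTS)
        elif not adaptive:
            # Adaptive RX off: the user's values win over whatever DIM chose.
            self._program(usecs, pkts)
        self.use_dim = adaptive

    def open(self):
        # Re-apply stored settings instead of resetting to 0 usecs / 1 packet.
        if self.use_dim:
            self._program(DIM_DEFAULT_USECS, DIM_DEFAULT_PKTS)
        else:
            self._program(self.usecs, self.pkts)
```

Leaving the programmed values alone when adaptive RX was already on mirrors the v2 note in the cover letter about applying the defaults only if the feature was disabled before.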
    • net: systemport: Fix coalescing settings handling · a8cdfbdf
      Florian Fainelli authored
      There were a number of issues with setting the RX coalescing parameters:
      
      - we would not be preserving values that would have been configured
        across close/open calls, instead we would always reset to no timeout
        and 1 interrupt per packet, this would also prevent DIM from setting its
        default usec/pkts values
      
      - when adaptive RX would be turned on, we would not be fetching the
        default parameters, we would stay with no timeout/1 packet per
        interrupt until the estimator kicks in and changes that
      
      - finally disabling adaptive RX coalescing while providing parameters
        would not be honored, and we would stay with whatever DIM had
        previously determined instead of the user requested parameters
      
      Fixes: b6e0e875 ("net: systemport: Implement adaptive interrupt coalescing")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Tal Gilboa <talgi@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Remove adaptive TX coalescing · fd41f2bf
      Florian Fainelli authored
      Adaptive TX coalescing is not currently giving us any advantages and
      ends up making the CPU spin more frequently until TX completion. Deny
      and disable adaptive TX coalescing for now and rely on static
      configuration; we can always add it back later.
      Reviewed-by: Tal Gilboa <talgi@mellanox.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Call add/kill vid ndo on vlan filter feature toggling · 9daae9bd
      Gal Pressman authored
      NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit
      flip in dev->features in order to keep the driver in a consistent state.
      These features notify the driver of each added/removed vlan, but toggling
      of vlan-filter does not notify the driver accordingly for each of the
      existing vlans.
      
      This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT
      behavior (which notifies the driver about UDP ports in the same manner
      that vids are reported).
      
      Each toggling of the features propagates to the 8021q module, which
      iterates over the vlans and calls the add/kill ndo accordingly.
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
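The idea of replaying existing VLANs to the driver when the filter feature is toggled can be illustrated with a toy model. `Driver` and `Netdev` below are hypothetical classes, and the method names only echo the add/kill ndo callbacks mentioned in the message; this is a sketch of the behaviour, not the 8021q implementation.

```python
class Driver:
    """Stand-in for a NIC driver that tracks VLAN ids in a hardware filter."""

    def __init__(self):
        self.hw_vids = set()

    def vlan_rx_add_vid(self, vid):
        self.hw_vids.add(vid)

    def vlan_rx_kill_vid(self, vid):
        self.hw_vids.discard(vid)


class Netdev:
    """Toy device with a set of configured VLANs and a filter feature bit."""

    def __init__(self, driver):
        self.driver = driver
        self.vlans = set()
        self.filter_on = False

    def add_vlan(self, vid):
        self.vlans.add(vid)
        if self.filter_on:
            self.driver.vlan_rx_add_vid(vid)

    def set_vlan_filter(self, enable):
        if enable == self.filter_on:
            return
        self.filter_on = enable
        # Flipping the bit is not enough: replay every existing VLAN so the
        # driver's filter table matches the configured set.
        for vid in sorted(self.vlans):
            if enable:
                self.driver.vlan_rx_add_vid(vid)
            else:
                self.driver.vlan_rx_kill_vid(vid)
```

Without the loop in `set_vlan_filter`, VLANs configured while the feature was off would never reach the driver's filter table after it is turned on.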
    • cxgb4: fix error return code in adap_init0() · 004c3cf1
      Wei Yongjun authored
      Fix to return a negative error code from the hash filter init error
      handling case instead of 0, as done elsewhere in this function.
      
      Fixes: 5c31254e ("cxgb4: initialize hash-filter configuration")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 29 Mar, 2018 21 commits