1. 30 Mar, 2018 40 commits
    • rxrpc: Fix leak of rxrpc_peer objects · 17226f12
      David Howells authored
      When a new client call is requested, an rxrpc_conn_parameters struct object
      is passed in with a bunch of parameters set, such as the local endpoint to
      use.  A pointer to the target peer record is also placed in there by
      rxrpc_get_client_conn() - and this is removed if and only if a new
      connection object is allocated.  Thus it leaks if a new connection object
      isn't allocated.
      
      Fix this by putting any peer object attached to the rxrpc_conn_parameters
      object in the function that allocated it.
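The ownership rule behind this fix can be sketched in plain C. This is a minimal user-space model, not the kernel code: `struct peer`, `struct conn_params`, `struct conn`, `lookup_conn()` and `new_client_call()` are hypothetical stand-ins, and the refcount is a plain int rather than a kernel refcount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rxrpc_peer and rxrpc_conn_parameters. */
struct peer { int refs; };
struct conn_params { struct peer *peer; };
struct conn { struct peer *peer; };

static void peer_get(struct peer *p)
{
	p->refs++;
}

static void peer_put(struct peer *p)
{
	if (p) {
		assert(p->refs > 0);
		p->refs--;
	}
}

/*
 * Model of the connection lookup: it attaches a peer reference to the
 * parameters, and only hands that reference over to a connection when a
 * new connection object is allocated.
 */
static void lookup_conn(struct conn_params *cp, struct peer *peer,
			struct conn *new_conn)
{
	peer_get(peer);
	cp->peer = peer;
	if (new_conn) {
		new_conn->peer = cp->peer;	/* reference moves to the conn */
		cp->peer = NULL;
	}
}

/* Caller: drop whatever reference is still attached to the parameters. */
static void new_client_call(struct peer *peer, struct conn *new_conn)
{
	struct conn_params cp = { .peer = NULL };

	lookup_conn(&cp, peer, new_conn);
	peer_put(cp.peer);	/* no-op if the conn took the reference */
	cp.peer = NULL;
}
```

In this model, calling `new_client_call()` without a new connection leaves the peer's count unchanged, while calling it with one leaves exactly one extra reference, held by the connection.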
      
      Fixes: 19ffa01c ("rxrpc: Use structs to hold connection params and protocol info")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add a tracepoint to track rxrpc_peer refcounting · 1159d4b4
      David Howells authored
      Add a tracepoint to track reference counting on the rxrpc_peer struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix apparent leak of rxrpc_local objects · 31f5f9a1
      David Howells authored
      rxrpc_local objects cannot be disposed of until all the connections that
      point to them have been RCU'd as a connection object holds refcount on the
      local endpoint it is communicating through.  Currently, this can cause an
      assertion failure to occur when a network namespace is destroyed as there's
      no check that the RCU destructors for the connections have been run before
      we start trying to destroy local endpoints.
      
      The kernel reports:
      
      	rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5}
      	------------[ cut here ]------------
      	kernel BUG at ../net/rxrpc/local_object.c:439!
      
      Fix this by keeping a count of the live connections and waiting for it to
      go to zero at the end of rxrpc_destroy_all_connections().
      
      Fixes: dee46364 ("rxrpc: Add RCU destruction for connections and calls")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add a tracepoint to track rxrpc_local refcounting · 09d2bf59
      David Howells authored
      Add a tracepoint to track reference counting on the rxrpc_local struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix potential call vs socket/net destruction race · d3be4d24
      David Howells authored
      rxrpc_call structs don't pin sockets or network namespaces, but may attempt
      to access both after their refcount reaches 0 so that they can detach
      themselves from the network namespace.  However, there's no guarantee that
      the socket still exists at this point (so sock_net(&call->socket->sk) may
      be invalid) and the namespace may have gone away if the call isn't pinning
      a peer.
      
      Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b)
      waiting for all calls to be destroyed when the network namespace goes away.
      
      This was detected by checker:
      
      net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces)
      net/rxrpc/call_object.c:634:57:    expected struct sock const *sk
      net/rxrpc/call_object.c:634:57:    got struct sock [noderef] <asn:4>*<noident>
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix checker warnings and errors · 88f2a825
      David Howells authored
      Fix various issues detected by checker.
      
      Errors:
      
       (*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set
           call->socket.
      
      Warnings:
      
       (*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to
           trace_rxrpc_conn() as the where argument.
      
       (*) rxrpc_disconnect_client_call() should get its net pointer via the
           call->conn rather than call->sock to avoid a warning about accessing
           an RCU pointer without protection.
      
       (*) Proc seq start/stop functions need annotation as they pass locks
           between the functions.
      
      False positives:
      
       (*) Checker doesn't correctly handle seq-retry lock context balance in
           rxrpc_find_service_conn_rcu().
      
       (*) Checker thinks execution may proceed past the BUG() in
           rxrpc_publish_service_conn().
      
       (*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in
           rxkad.c.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: remove unused static variables · edb63e2b
      Sebastian Andrzej Siewior authored
      The users of rxrpc_security_methods and rxrpc_security_sem were removed
      in 648af7fc ("rxrpc: Absorb the rxkad security module"). This was
      noticed by the kbuild test robot for the -RT tree but is also true for !RT.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix resend event time calculation · 59299aa1
      Marc Dionne authored
      Commit a158bdd3 ("rxrpc: Fix call timeouts") reworked the time calculation
      for the next resend event.  For this calculation, "oldest" will be before
      "now", so ktime_sub(oldest, now) will yield a negative value.  When passed
      to nsecs_to_jiffies which expects an unsigned value, the end result will be
      a very large value, and a resend event scheduled far into the future.  This
      could cause calls to stall if some packets were lost.
      
      Fix by ordering the arguments to ktime_sub correctly.
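The sign problem described above can be reproduced in a small user-space sketch. It only models the arithmetic, not the rest of the kernel's resend logic: `ktime_ns`, `ktime_sub_ns()` and `ns_to_ticks()` are hypothetical stand-ins for ktime_t, ktime_sub() and nsecs_to_jiffies(), and the 1000 ticks per second rate is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's ktime_t, which holds signed nanoseconds. */
typedef int64_t ktime_ns;

/* Same argument order as ktime_sub(lhs, rhs): computes lhs - rhs. */
static ktime_ns ktime_sub_ns(ktime_ns lhs, ktime_ns rhs)
{
	return lhs - rhs;
}

/* Hypothetical converter taking an unsigned count; assumes 1000 ticks/s. */
static uint64_t ns_to_ticks(uint64_t ns)
{
	return ns / 1000000u;
}

/* Swapped arguments: a negative difference wraps to a huge unsigned value. */
static uint64_t resend_delay_swapped(ktime_ns oldest, ktime_ns now)
{
	return ns_to_ticks((uint64_t)ktime_sub_ns(oldest, now));
}

/* Corrected order: now - oldest is non-negative when oldest is in the past. */
static uint64_t resend_delay_ordered(ktime_ns oldest, ktime_ns now)
{
	assert(now >= oldest);
	return ns_to_ticks((uint64_t)ktime_sub_ns(now, oldest));
}
```

With `oldest` 5 ms before `now`, the ordered variant yields 5 ticks, while the swapped variant yields a value on the order of 10^13 ticks, which corresponds to the "far into the future" event described in the commit message.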
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Don't treat call aborts as conn aborts · 57b0c9d4
      David Howells authored
      If a call-level abort is received for the previous call to complete on a
      connection channel, then that abort is queued for the connection processor
      to handle.  Unfortunately, the connection processor then assumes without
      checking that the abort is connection-level (ie. callNumber is 0) and
      distributes it over all active calls on that connection, thereby
      incorrectly aborting them.
      
      Fix this by discarding aborts aimed at a completed call.
      
      Further, discard all packets aimed at a call that's complete if there's
      currently an active call on a channel, since the DATA packets associated
      with the new call automatically terminate the old call.
      
      Fixes: 18bfeba5 ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix Tx ring annotation after initial Tx failure · 03877bf6
      David Howells authored
      rxrpc calls have a ring of packets that are awaiting ACK or retransmission
      and a parallel ring of annotations that tracks the state of those packets.
      If the initial transmission of a packet on the underlying UDP socket fails
      then the packet annotation is marked for resend - but the setting of this
      mark accidentally erases the last-packet mark also stored in the same
      annotation slot.  If this happens, a call won't switch out of the Tx phase
      when all the packets have been transmitted.
      
      Fix this by retaining the last-packet mark and only altering the packet
      state.
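The difference between overwriting an annotation slot and updating only its state bits can be sketched as follows. The bit layout and all names here (`ANNO_*`, `mark_resend_overwrite()`, `mark_resend_keep_flags()`) are assumptions made for illustration and are not taken from the rxrpc sources.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout; the values are assumptions for this sketch. */
#define ANNO_UNACK	0x01u	/* packet state: awaiting ACK */
#define ANNO_RETRANS	0x03u	/* packet state: needs resend */
#define ANNO_STATE_MASK	0x03u	/* bits holding the packet state */
#define ANNO_LAST	0x04u	/* flag: last packet of the Tx phase */

/* Overwrites the whole slot, so the last-packet flag is lost. */
static uint8_t mark_resend_overwrite(uint8_t anno)
{
	(void)anno;
	return ANNO_RETRANS;
}

/* Replaces only the state bits and keeps any flags in the slot. */
static uint8_t mark_resend_keep_flags(uint8_t anno)
{
	return (uint8_t)((anno & ~ANNO_STATE_MASK) | ANNO_RETRANS);
}
```

For a slot holding `ANNO_UNACK | ANNO_LAST`, the first helper returns a value without `ANNO_LAST`, whereas the second returns `ANNO_RETRANS | ANNO_LAST`.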
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix a bit of time confusion · f82eb88b
      David Howells authored
      The rxrpc_reduce_call_timer() function should be passed the 'current time'
      in jiffies, not the current ktime time.  It's confusing in rxrpc_resend
      because that has to deal with both.  Pass the correct current time in.
      
      Note that this only affects the trace produced and not the functioning of
      the code.
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix firewall route keepalive · ace45bec
      David Howells authored
      Fix the firewall route keepalive part of AF_RXRPC, which currently
      functions incorrectly by replying to VERSION REPLY packets from the server
      with VERSION REQUEST packets.
      
      Instead, send VERSION REPLY packets to the peers of service connections to
      act as keep-alives 20s after the latest packet was transmitted to that
      peer.
      
      Also, just discard VERSION REPLY packets rather than replying to them.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • tc-testing: Add newline when writing test case files · c0b6edef
      Lucas Bates authored
      When using the -i feature to generate random ID numbers for test
      cases in tdc, the function that writes the JSON to file doesn't
      add a newline character to the end of the file, so we have to
      add our own.
      Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • liquidio: prevent rx queues from getting stalled · ccdd0b4c
      Raghu Vatsavayi authored
      This commit fixes RX traffic issues seen when stress testing the driver
      with continuous ifconfig up/down under very high traffic conditions.

      The cause is that the existing liquidio_stop function disables NAPI
      before the actual FW/HW interface is brought down via
      send_rx_ctrl_cmd(lio, 0). Between the NAPI disable and the actual
      interface down in firmware, firmware continuously enqueues rx traffic to
      the host. When an interrupt arrives for new packets, the host irq handler
      fails to schedule NAPI as NAPI is already disabled.

      After "ifconfig <iface> up", the host re-enables NAPI but cannot schedule it
      until it receives another Rx interrupt. The host never receives that Rx
      interrupt as it never cleared the Rx interrupt it received during the
      interface down operation. The NIC Rx interrupt gets cleared only when the
      host processes the queue and clears the queue counts. This anomaly leads to
      other issues such as packet overflow in FW/HW queues and backpressure.
      
      Fix:
      This commit fixes this issue by disabling NAPI only after informing
      firmware to stop queueing packets to host via send_rx_ctrl_cmd(lio, 0).
      send_rx_ctrl_cmd is not visible in the patch as it is already there in the
      code. The DOWN command also waits for any pending packets to be processed
      by NAPI so that the deadlock will not occur.
      Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
      Acked-by: Derek Chickles <derek.chickles@cavium.com>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ieee802154-for-davem-2018-03-29' of... · 6f14f49c
      David S. Miller authored
      Merge branch 'ieee802154-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154-next 2018-03-29
      
      An update from ieee802154 for *net-next*
      
      Colin fixed an unused variable in the new mcr20a driver.
      Harry fixed an uninitialised data read in the debugfs interface of the
      ca8210 driver.

      If there are any issues or you think these are too late for -rc1 (both can also
      go into -rc2 as they are simple fixes) let me know.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: add connmark action tests · 1dad0f9f
      Roman Mashak authored
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: Update my email address from freescale to nxp · fe3f4e80
      Claudiu Manoil authored
      The freescale.com address will no longer be available.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: renesas-ravb: Add support for r8a77470 SoC · 9b857563
      Biju Das authored
      Add a new compatible string for the RZ/G1C (R8A77470) SoC.
      Signed-off-by: Biju Das <biju.das@bp.renesas.com>
      Reviewed-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
      Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-DWMAC5' · 8bafb83e
      David S. Miller authored
      Jose Abreu says:
      
      ====================
      Fix TX Timeout and implement Safety Features
      
      Fix the TX Timeout handler to correctly reconfigure the whole system and
      start implementing features for DWMAC5 cores, specifically the Safety
      Features.
      
      Changes since v1:
      	- Display error stats in ethtool
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Add support for DWMAC5 and implement Safety Features · 8bf993a5
      Jose Abreu authored
      This adds initial support for DWMAC5 and implements the Automotive Safety
      Package, which is available from core version 5.10.

      The Automotive Safety Package (also called Safety Features) provides
      error protection in the core by implementing ECC protection in
      memories, on-chip data path parity protection, FSM parity and timeout
      protection, and Application/CSR interface timeout protection.
      
      In case of an uncorrectable error we call stmmac_global_err() and
      reconfigure the whole core.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Rework and fix TX Timeout code · 34877a15
      Jose Abreu authored
      Currently the TX Timeout handler does not behave as expected and leads to
      an unrecoverable state. Rework the current implementation of TX Timeout
      handling to actually perform a complete reset of the driver state and IP.
      
      We use deferred work to init a task which will be responsible for
      resetting the system.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: remove duplicate *_coal assignment · 02281a35
      Jisheng Zhang authored
      The style of the rx/tx queue's *_coal member assignment is:
      
      static void foo_coal_set(...)
      {
      	set the coal in hw;
      	update queue's foo_coal member; [1]
      }
      
      Elsewhere, we call foo_coal_set(pp, queue->foo_coal), so the assignment at [1]
      is redundant and can be removed.
      Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'do-not-allow-adding-routes-if-disable_ipv6-is-enabled' · e7696042
      David S. Miller authored
      Lorenzo Bianconi says:
      
      ====================
      do not allow adding routes if disable_ipv6 is enabled
      
      Do not allow userspace to add static ipv6 routes if disable_ipv6 is enabled.
      Update the disable_ipv6 documentation accordingly.
      
      Changes since v1:
      - added an extack message telling the user that IPv6 is disabled on the nexthop
        device
      - rebased on-top of net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: ip-sysctl.txt: clarify disable_ipv6 · 2f0aaf7f
      Lorenzo Bianconi authored
      Clarify that when disable_ipv6 is enabled, the ipv6 routes
      are also deleted for the selected interface, and from then on it will not
      be possible to add addresses/routes to that interface.
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: do not set routes if disable_ipv6 has been enabled · 428604fb
      Lorenzo Bianconi authored
      Do not allow setting ipv6 routes from userspace if disable_ipv6 has been
      enabled. The issue can be triggered using the following reproducer:
      
      - sysctl net.ipv6.conf.all.disable_ipv6=1
      - ip -6 route add a:b:c:d::/64 dev em1
      - ip -6 route show
        a:b:c:d::/64 dev em1 metric 1024 pref medium
      
      Fix it by checking the disable_ipv6 value in the ip6_route_info_create routine.
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · d162190b
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for your net-next
      tree. This batch comes with more input sanitization for xtables to
      address bug reports from fuzzers, preparation works to the flowtable
      infrastructure and assorted updates. In no particular order, they are:
      
      1) Make sure userspace provides a valid standard target verdict, from
         Florian Westphal.
      
      2) Sanitize error target size, also from Florian.
      
      3) Validate that last rule in basechain matches underflow/policy since
         userspace assumes this when decoding the ruleset blob that comes
         from the kernel, from Florian.
      
      4) Consolidate hook entry checks through xt_check_table_hooks(),
         patch from Florian.
      
      5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
         very large compat offset arrays, so we have a reasonable upper limit
         and fuzzers don't exercise the oom-killer. Patches from Florian.
      
      6) Several WARN_ON checks on xtables mutex helper, from Florian.
      
      7) xt_rateest now has a hashtable per net, from Cong Wang.
      
      8) Consolidate counter allocation in xt_counters_alloc(), from Florian.
      
      9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
         from Xin Long.
      
      10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
          Felix Fietkau.
      
      11) Consolidate code through flow_offload_fill_dir(), also from Felix.
      
      12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
          to remove a dependency with flowtable and ipv6.ko, from Felix.
      
      13) Cache mtu size in flow_offload_tuple object, this is safe for
          forwarding as f87c10a8 describes, from Felix.
      
      14) Rename nf_flow_table.c to nf_flow_table_core.c, to simplify the overly
          modular infrastructure, from Felix.
      
      15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
          Ahmed Abdelsalam.
      
      16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.
      
      17) Add counting-only support to the nf_conncount infrastructure, patch
          from Yi-Hung Wei.
      
      18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
          to nft_ct.
      
      19) Use boolean as return value from ipt_ah and from IPVS too, patch
          from Gustavo A. R. Silva.
      
      20) Remove useless parameters in nfnl_acct_overquota() and
          nf_conntrack_broadcast_help(), from Taehee Yoo.
      
      21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.
      
      22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.
      
      23) Fix typo in xt_limit, from Geert Uytterhoeven.
      
      24) Do not use VLAs in Netfilter code, again from Gustavo.
      
      25) Use ADD_COUNTER from ebtables, from Taehee Yoo.
      
      26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.
      
      27) Use pr_*() and add pr_fmt(), from Arushi Singhal.
      
      28) Add synproxy support to ctnetlink.
      
      29) ICMP type and IGMP matching support for ebtables, patches from
          Matthias Schiffer.
      
      30) Support for the revision infrastructure to ebtables, from
          Bernie Harris.
      
      31) String match support for ebtables, also from Bernie.
      
      32) Documentation for the new flowtable infrastructure.
      
      33) Use generic comparison functions in ebt_stp, from Joe Perches.
      
      34) Demodularize filter chains in nftables.
      
      35) Register conntrack hooks in case nftables NAT chain is added.
      
      36) Merge assignments with return in a couple of spots in the
          Netfilter codebase, also from Arushi.
      
      37) Document that xtables percpu counters are stored in the same
          memory area, from Ben Hutchings.
      
      38) Revert mark_source_chains() sanity checks that break existing
          rulesets, from Florian Westphal.
      
      39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Close-race-between-un-register_netdevice_notifier-and-pernet_operations' · b9a12601
      David S. Miller authored
      Kirill Tkhai says:
      
      ====================
      Close race between {un, }register_netdevice_notifier and pernet_operations
      
      The problem is that {,un}register_netdevice_notifier() do not take
      pernet_ops_rwsem, and they don't see network namespaces that are being
      initialized in setup_net() or torn down in cleanup_net(), since at that
      time the net is not hashed to net_namespace_list.
      
      This may lead to an imbalance, where a notifier is called while the net is
      being set up or is alive, but is not called at the time of cleanup_net()
      for the devices hashed to the net, and vice versa. See (3/3) for
      the scheme of the imbalance.
      
      This patchset fixes the problem by acquiring pernet_ops_rwsem
      at the time of {,un}register_netdevice_notifier() (3/3).
      (1-2/3) are preparations in xfrm and netfilter subsystems.
      
      The problem was introduced long ago, but backporting won't be easy,
      since every previous kernel version may have changes in netdevice
      notifiers, and they all need review and testing. Otherwise, there
      may be more pernet_operations that register or unregister
      netdevice notifiers, which leads to deadlock (this was fixed
      in 1-2/3). This patchset is for net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net() · 328fbe74
      Kirill Tkhai authored
      {un,}register_netdevice_notifier() iterate over all net namespaces
      hashed to net_namespace_list. But pernet_operations register and
      unregister netdevices in unhashed net namespace, and they are not
      seen for netdevice notifiers. This results in asymmetry:
      
      1)Race with register_netdevice_notifier()
        pernet_operations::init(net)	...
         register_netdevice()		...
          call_netdevice_notifiers()  ...
            ... nb is not called ...
        ...				register_netdevice_notifier(nb) -> net skipped
        ...				...
        list_add_tail(&net->list, ..) ...
      
        Then, userspace stops using net, and it's destructed:
      
        pernet_operations::exit(net)
         unregister_netdevice()
          call_netdevice_notifiers()
            ... nb is called ...
      
      This always happens with net::loopback_dev, but it may not be the only device.
      
      2)Race with unregister_netdevice_notifier()
        pernet_operations::init(net)
         register_netdevice()
          call_netdevice_notifiers()
            ... nb is called ...
      
        Then, userspace stops using net, and it's destructed:
      
        list_del_rcu(&net->list)	...
        pernet_operations::exit(net)  unregister_netdevice_notifier(nb) -> net skipped
         dev_change_net_namespace()	...
          call_netdevice_notifiers()
            ... nb is not called ...
         unregister_netdevice()
          call_netdevice_notifiers()
            ... nb is not called ...
      
      This race is more dangerous, since dev_change_net_namespace() moves real
      network devices, which use non-trivial netdevice notifiers, and if this
      happens, the system will be left in an unpredictable state.
      
      The patch closes the race. During the testing I found two places,
      where register_netdevice_notifier() is called from pernet init/exit
      methods (which led to deadlock) and fixed them (see previous patches).
      
      The review led me to one more unusual registration place:
      raw_init() (CAN driver). It may cause problems
      if someone creates in-kernel CAN_RAW sockets, since they
      will be destroyed in the exit method and raw_release()
      will call unregister_netdevice_notifier(). But a grep over
      the kernel tree does not show anyone creating such sockets
      from kernel space.

      Theoretically, there can be more places like this that are
      hidden from review, but we will find them on the first encounter
      (since there is no race, it will be 100% reproducible).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: Rework xt_TEE netdevice notifier · 9e2f6c5d
      Kirill Tkhai authored
      Registering a netdevice notifier for every iptable entry
      is not good, since this breaks modularity, and
      the hidden synchronization is based on rtnl_lock().

      This patch reworks the synchronization via a new lock,
      while the rest of the logic remains as it was before.
      This is required for the next patch.
      
      Tested via:
      
      while :; do
      	unshare -n iptables -t mangle -A OUTPUT -j TEE --gateway 1.1.1.2 --oif lo;
      done
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: Register xfrm_dev_notifier in appropriate place · e9a441b6
      Kirill Tkhai authored
      Currently, the driver registers it from the pernet_operations::init method,
      and this breaks modularity, because initialization of a net namespace
      and netdevice notifiers are orthogonal actions. We don't have
      per-namespace netdevice notifiers; all of them are global for all
      devices in all namespaces.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Implement-of_get_nvmem_mac_address-helper' · caeeeda3
      David S. Miller authored
      Mike Looijmans says:
      
      ====================
      of_net: Implement of_get_nvmem_mac_address helper
      
      Posted this as a small set now, with an (optional) second patch that shows
      how the changes work and what I've used to test the code on a Topic Miami board.
      I've taken the liberty to add appropriate "Acked" and "Review" tags.
      
      v4: Replaced "6" with ETH_ALEN
      
      v3: Add patch that implements mac in nvmem for the Cadence MACB controller
          Remove the integrated of_get_mac_address call
      
      v2: Use of_nvmem_cell_get to avoid needing the associated device
          Use void* instead of char*
          Add devicetree binding doc
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Try to retrieve MAC address from nvmem provider · aa076e3d
      Mike Looijmans authored
      Call of_get_nvmem_mac_address() to fetch the MAC address from an nvmem
      cell, if one is provided in the device tree. This allows the address to
      be stored in an I2C EEPROM device for example.
      Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of_net: Implement of_get_nvmem_mac_address helper · 9217e566
      Mike Looijmans authored
      It's common practice to store MAC addresses for network interfaces in
      nvmem devices. However, the kernel lacks the code to actually retrieve them,
      so this patch adds of_get_nvmem_mac_address() for drivers to obtain the
      address from an nvmem cell provider.

      This is particularly useful on devices where the ethernet interface cannot
      be configured by the bootloader, for example because it's in an FPGA.
      Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9217e566
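As a rough illustration of what such a helper has to check, the user-space sketch below models an nvmem cell as a plain byte buffer. The names `fake_nvmem_cell`, `mac_is_valid` and `get_mac_from_cell` are hypothetical and are not the kernel API; the error codes are likewise an assumption. Only `ETH_ALEN` and the `void *` output buffer come from the changelog above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* length of an Ethernet MAC address in bytes */

/* Hypothetical stand-in for an nvmem cell: raw bytes plus a length. */
struct fake_nvmem_cell {
	const uint8_t *data;
	size_t len;
};

/* A usable unicast MAC is neither all-zero nor multicast (bit 0 of byte 0). */
static int mac_is_valid(const uint8_t *mac)
{
	static const uint8_t zero[ETH_ALEN];

	if (memcmp(mac, zero, ETH_ALEN) == 0)
		return 0;
	return (mac[0] & 0x01) == 0;
}

/*
 * Copy a MAC address out of the cell into addr.
 * Returns 0 on success, -ENOENT if no cell is provided, and -EINVAL if the
 * cell has the wrong size or holds an unusable address.
 */
static int get_mac_from_cell(const struct fake_nvmem_cell *cell, void *addr)
{
	if (!cell || !cell->data)
		return -ENOENT;
	if (cell->len != ETH_ALEN || !mac_is_valid(cell->data))
		return -EINVAL;
	memcpy(addr, cell->data, ETH_ALEN);
	return 0;
}
```

A driver would plausibly call a helper of this shape during probe and fall back to its other address sources when it returns an error, which keeps a missing or blank EEPROM from being fatal.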
    • David S. Miller's avatar
      Merge branch 'nfp-flower-handle-MTU-changes' · 64e828df
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: flower: handle MTU changes
      
      This set improves MTU handling for flower offload.  The max MTU is
      correctly capped and physical port MTU is communicated to the FW
      (and indirectly HW).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64e828df
    • John Hurley's avatar
      nfp: flower: offload phys port MTU change · 29a5dcae
      John Hurley authored
      Trigger a port mod message to request an MTU change on the NIC when any
      physical port representor is assigned a new MTU value. The driver waits
      10 msec for an ack that the FW has set the MTU. If no ack is received the
      request is rejected and an appropriate warning flagged.
      
      Rather than maintain an MTU queue per repr, one is maintained per app.
      Because the MTU ndo is protected by the rtnl lock, there can never be
      contention here. Portmod messages from the NIC are also protected by
      rtnl, so we first check if the portmod is an ack and, if so, handle it
      outside rtnl and the cmsg work queue.
      
      Acks are detected by the marking of a bit in a portmod response. They are
      then verified by checking the port number and MTU value expected by the
      app. If the expected MTU is 0 then no acks are currently expected.
      
      Also, ensure that the packet headroom reserved by the flower firmware is
      considered when accepting an MTU change on any repr.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29a5dcae
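The ack matching described in this commit can be sketched roughly as follows. This is a user-space model, not the driver code: the struct layouts, the `PORTMOD_MTU_ACK_FLAG` bit value and the function names are assumptions made for illustration. The headroom check assumes that the new MTU plus the firmware's reserved headroom must fit within the maximum the device supports.

```c
#include <stdbool.h>
#include <stdint.h>

#define PORTMOD_MTU_ACK_FLAG 0x01 /* hypothetical bit marking an MTU ack */

/* One pending-MTU record per app (not per repr), as the commit describes. */
struct mtu_wait_state {
	uint32_t portnum;
	unsigned int requested_val; /* 0 means no ack is currently expected */
	bool ack;
};

/* Hypothetical portmod response as received from the NIC. */
struct portmod_msg {
	uint32_t portnum;
	uint8_t info;
	uint16_t mtu;
};

/*
 * Return true if msg is the ack for the pending request. On a match the
 * state is marked acked and cleared so that later messages are not
 * mistaken for acks.
 */
static bool mtu_ack_matches(struct mtu_wait_state *st,
			    const struct portmod_msg *msg)
{
	if (!(msg->info & PORTMOD_MTU_ACK_FLAG))
		return false;
	if (st->requested_val == 0)
		return false;
	if (msg->portnum != st->portnum || msg->mtu != st->requested_val)
		return false;
	st->ack = true;
	st->requested_val = 0;
	return true;
}

/* Accept an MTU only if it still fits once the headroom is reserved. */
static bool mtu_fits(unsigned int new_mtu, unsigned int max_mtu,
		     unsigned int headroom)
{
	return headroom < max_mtu && new_mtu <= max_mtu - headroom;
}
```

Clearing `requested_val` on a match mirrors the rule that an expected MTU of 0 means no ack is outstanding, so a duplicated ack falls through to normal portmod handling.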
    • John Hurley's avatar
      nfp: modify app MTU setting callbacks · 167cebef
      John Hurley authored
      Rename the 'change_mtu' app callback to 'check_mtu'. This is called
      whenever an MTU change is requested on a netdev. It can reject the
      change but is not responsible for implementing it.
      
      Introduce a new 'repr_change_mtu' app callback that is hit when the MTU
      of a repr is to be changed. This is responsible for performing the MTU
      change and verifying it.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      167cebef
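The split between the two callbacks might look like the sketch below. Only the callback names `check_mtu` and `repr_change_mtu` come from the commit message; the `fake_netdev` and `fake_app_ops` types, the demo callbacks and the `change_mtu` dispatcher are hypothetical and only model the division of responsibility.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced netdev. */
struct fake_netdev {
	unsigned int mtu;     /* MTU as seen by the stack */
	unsigned int hw_mtu;  /* MTU last programmed by repr_change_mtu */
	unsigned int max_mtu;
	int is_repr;
};

/* Hypothetical app callback table with the two hooks from the commit. */
struct fake_app_ops {
	/* May veto the change but does not apply it. */
	int (*check_mtu)(struct fake_netdev *dev, unsigned int new_mtu);
	/* Applies and verifies the change for a representor. */
	int (*repr_change_mtu)(struct fake_netdev *dev, unsigned int new_mtu);
};

static int demo_check_mtu(struct fake_netdev *dev, unsigned int new_mtu)
{
	return new_mtu > dev->max_mtu ? -EINVAL : 0;
}

static int demo_repr_change_mtu(struct fake_netdev *dev, unsigned int new_mtu)
{
	dev->hw_mtu = new_mtu; /* stands in for the request sent to the NIC */
	return 0;
}

static const struct fake_app_ops demo_ops = {
	.check_mtu = demo_check_mtu,
	.repr_change_mtu = demo_repr_change_mtu,
};

/* Validate first, then let the app perform the change on representors. */
static int change_mtu(const struct fake_app_ops *ops, struct fake_netdev *dev,
		      unsigned int new_mtu)
{
	int err;

	if (ops->check_mtu) {
		err = ops->check_mtu(dev, new_mtu);
		if (err)
			return err;
	}
	if (dev->is_repr && ops->repr_change_mtu) {
		err = ops->repr_change_mtu(dev, new_mtu);
		if (err)
			return err;
	}
	dev->mtu = new_mtu;
	return 0;
}
```

Keeping the veto separate from the implementation lets an app reject an MTU on any netdev while only taking on the work of changing it for representors.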
    • David S. Miller's avatar
      Merge branch 'phylink-API-changes' · 44465c47
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      phylink: API changes
      
      This patch series contains two API changes to PHYLINK which will later be used
      by DSA to migrate to PHYLINK. Because these are API changes that impact other
      outstanding work (e.g: MVPP2) I would rather get them included sooner to minimize
      conflicts.
      
      Thank you!
      
      Changes in v2:
      
      - added missing documentation to mac_link_{up,down} that the interface
        must be configured in mac_config()
      
      - added Russell's, Andrew's and my tags
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44465c47
    • Russell King's avatar
      sfp/phylink: move module EEPROM ethtool access into netdev core ethtool · e679c9c1
      Russell King authored
      Provide a pointer to the SFP bus in struct net_device, so that the
      ethtool module EEPROM methods can access the SFP directly, rather
      than needing every user to provide a hook for it.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e679c9c1
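The effect of carrying the SFP bus pointer in the net_device can be modelled as a simple dispatch order in the core: use the SFP bus if one is attached, otherwise fall back to the driver's own hook. The sketch below is a user-space model; every type and function name in it is hypothetical, and the exact fallback order in the kernel is an assumption here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical SFP bus that knows its module's EEPROM size. */
struct fake_sfp_bus {
	unsigned int eeprom_len;
};

struct fake_netdev;

/* Hypothetical driver-provided hook, as drivers had to supply before. */
struct fake_ethtool_ops {
	int (*get_module_info)(struct fake_netdev *dev, unsigned int *len);
};

struct fake_netdev {
	struct fake_sfp_bus *sfp_bus;       /* set when an SFP is attached */
	const struct fake_ethtool_ops *ops; /* may be NULL */
};

static int fake_sfp_get_module_info(struct fake_sfp_bus *bus,
				    unsigned int *len)
{
	*len = bus->eeprom_len;
	return 0;
}

/* Example driver hook reporting a fixed 256-byte EEPROM. */
static int demo_driver_get_module_info(struct fake_netdev *dev,
				       unsigned int *len)
{
	(void)dev;
	*len = 256;
	return 0;
}

/* Core entry point: prefer the SFP bus, then the driver hook. */
static int core_get_module_info(struct fake_netdev *dev, unsigned int *len)
{
	if (dev->sfp_bus)
		return fake_sfp_get_module_info(dev->sfp_bus, len);
	if (dev->ops && dev->ops->get_module_info)
		return dev->ops->get_module_info(dev, len);
	return -EOPNOTSUPP;
}
```

With this shape, a MAC driver that uses an SFP bus needs no module EEPROM hook of its own, while drivers without one keep working through their existing callback.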
    • Florian Fainelli's avatar
      net: phy: phylink: Provide PHY interface to mac_link_{up, down} · c6ab3008
      Florian Fainelli authored
      In preparation for having DSA transition entirely to PHYLINK, we need to pass a
      PHY interface type to the mac_link_{up,down} callbacks because we may have to
      make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not pass
      an entire phylink_link_state because not all parameters (pause, duplex etc.) are
      defined when the link is down, only link and interface are.
      
      Update mvneta accordingly since it currently implements phylink_mac_ops.
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6ab3008
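To illustrate why the interface type is useful to these callbacks, the sketch below models a MAC that must switch its RGMII block on and off with the link. It is a user-space model with hypothetical types and names (`fake_phy_interface`, `fake_mac`, `fake_mac_link_up`, `fake_mac_link_down`), not the phylink_mac_ops signatures.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of PHY interface modes. */
enum fake_phy_interface {
	FAKE_IF_SGMII,
	FAKE_IF_RGMII,
};

/* Hypothetical MAC state touched by the link callbacks. */
struct fake_mac {
	bool rgmii_enabled;
	bool link_up;
};

/* Link up: enable the RGMII block only when that interface is in use. */
static void fake_mac_link_up(struct fake_mac *mac,
			     enum fake_phy_interface interface)
{
	if (interface == FAKE_IF_RGMII)
		mac->rgmii_enabled = true;
	mac->link_up = true;
}

/*
 * Link down: only the interface is passed, because parameters such as
 * pause and duplex are not defined while the link is down.
 */
static void fake_mac_link_down(struct fake_mac *mac,
			       enum fake_phy_interface interface)
{
	if (interface == FAKE_IF_RGMII)
		mac->rgmii_enabled = false;
	mac->link_up = false;
}
```

Passing just the interface keeps the callbacks usable in the link-down case, where a full link state would carry fields with no meaningful value.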
    • Ronak Doshi's avatar
      MAINTAINERS: update vmxnet3 driver maintainer · 2166dc95
      Ronak Doshi authored
      Shrikrishna Khare will no longer maintain the vmxnet3 driver. I am
      taking over the role of vmxnet3 maintainer.
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2166dc95