1. 07 Sep, 2016 8 commits
    • rxrpc: Calls shouldn't hold socket refs · 8d94aa38
      David Howells authored
      rxrpc calls shouldn't hold refs on the sock struct.  This was done so that
      the socket wouldn't go away whilst the call was in progress, such that the
      call could reach the socket's queues.
      
      However, we can mark the socket as requiring an RCU release and rely on the
      RCU read lock.
      
      To make this work, we do:
      
       (1) rxrpc_release_call() removes the call's call user ID.  This is now
           only called from socket operations and not from the call processor:
      
      	rxrpc_accept_call() / rxrpc_kernel_accept_call()
      	rxrpc_reject_call() / rxrpc_kernel_reject_call()
      	rxrpc_kernel_end_call()
      	rxrpc_release_calls_on_socket()
      	rxrpc_recvmsg()
      
           Though it is also called in the cleanup path of
           rxrpc_accept_incoming_call() before we assign a user ID.
      
       (2) Pass the socket pointer into rxrpc_release_call() rather than getting
           it from the call so that we can get rid of uninitialised calls.
      
       (3) Fix call processor queueing to pass a ref to the work queue and to
           release that ref at the end of the processor function (or to pass it
           back to the work queue if we have to requeue).
      
       (4) Skip out of the call processor function asap if the call is complete
           and don't requeue it if the call is complete.
      
       (5) Clean up the call immediately that the refcount reaches 0 rather than
           trying to defer it.  Actual deallocation is deferred to RCU, however.
      
       (6) Don't hold socket refs for allocated calls.
      
       (7) Use the RCU read lock when queueing a message on a socket and treat
           the call's socket pointer according to RCU rules and check it for
           NULL.
      
           We also need to use the RCU read lock when viewing a call through
           procfs.
      
       (8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call()
           if this hasn't been done yet so that we can then disconnect the call.
           Once the call is disconnected, it won't have any access to the
           connection struct and the UDP socket for the call work processor to be
           able to send the ACK.  Terminal retransmission will be handled by the
           connection processor.
      
       (9) Release all calls immediately on the closing of a socket rather than
           trying to defer this.  Incomplete calls will be aborted.
      
      The call refcount model is much simplified.  Refs are held on the call by:
      
       (1) A socket's user ID tree.
      
       (2) A socket's incoming call secureq and acceptq.
      
       (3) A kernel service that has a call in progress.
      
       (4) A queued call work processor.  We have to take care to put any call
           that we failed to queue.
      
       (5) sk_buffs on a socket's receive queue.  A future patch will get rid of
           this.
      
      Whilst we're at it, we can do:
      
       (1) Get rid of the RXRPC_CALL_EV_RELEASE event.  Release is now done
           entirely from the socket routines and never from the call's processor.
      
       (2) Get rid of the RXRPC_CALL_DEAD state.  Calls now end in the
           RXRPC_CALL_COMPLETE state.
      
       (3) Get rid of the rxrpc_call::destroyer work item.  Calls are now torn
           down when their refcount reaches 0 and then handed over to RCU for
           final cleanup.
      
       (4) Get rid of the rxrpc_call::deadspan timer.  Calls are cleaned up
           immediately they're finished with and don't hang around.
           Post-completion retransmission is handled by the connection processor
           once the call is disconnected.
      
       (5) Get rid of the dead call expiry setting as there's no longer a timer
           to set.
      
       (6) rxrpc_destroy_all_calls() can just check that the call list is empty.
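
As a rough illustration of points (3) to (5) in the first list above, the following userspace sketch models how a ref can be handed to a queued processor and then dropped, or handed back on requeue, when the processor finishes. Everything here is a hypothetical stand-in rather than the rxrpc code: `struct call`, `call_queue()`, `call_process()`, `call_put()` and the `pending` counter are invented for the example, and C11 atomics take the place of the kernel's refcounting and work queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a call; not the kernel's struct rxrpc_call. */
struct call {
	atomic_int usage;	/* reference count */
	atomic_bool queued;	/* a processor run is pending */
	bool complete;		/* call reached its terminal state */
	bool cleaned_up;	/* set once the last ref is dropped */
	int pending;		/* units of work left (illustrative only) */
	int runs;		/* processor runs that actually did work */
};

static void call_init(struct call *c, int pending)
{
	atomic_init(&c->usage, 1);	/* the creator's ref */
	atomic_init(&c->queued, false);
	c->complete = false;
	c->cleaned_up = false;
	c->pending = pending;
	c->runs = 0;
}

static void call_get(struct call *c)
{
	atomic_fetch_add(&c->usage, 1);
}

static void call_put(struct call *c)
{
	int old = atomic_fetch_sub(&c->usage, 1);

	assert(old > 0);
	/* Clean up as soon as the count hits zero. In the kernel, the
	 * actual deallocation would additionally be deferred to RCU. */
	if (old == 1)
		c->cleaned_up = true;
}

/* Queue the processor and pass it a ref.  If it was already queued,
 * the ref we took must be put again. */
static bool call_queue(struct call *c)
{
	call_get(c);
	if (atomic_exchange(&c->queued, true)) {
		call_put(c);	/* failed to queue: drop the ref */
		return false;
	}
	return true;
}

/* Processor function: owns the ref that call_queue() passed in. */
static void call_process(struct call *c)
{
	atomic_store(&c->queued, false);
	if (c->complete)
		goto out;	/* skip out early and never requeue */

	c->runs++;
	if (--c->pending > 0 && !atomic_exchange(&c->queued, true))
		return;		/* requeued: the ref goes back to the queue */
out:
	call_put(c);		/* release the queue's ref */
}
```

In this model the count only ever changes at the points where ownership changes hands, so a failed queue attempt and a completed call both leave the count balanced.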
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Use rxrpc_is_service_call() rather than rxrpc_conn_is_service() · 6543ac52
      David Howells authored
      Use rxrpc_is_service_call() rather than rxrpc_conn_is_service() if the call
      is available just in case call->conn is NULL.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Pass the connection pointer to rxrpc_post_packet_to_call() · 8b7fac50
      David Howells authored
      Pass the connection pointer to rxrpc_post_packet_to_call() as the call
      might get disconnected whilst we're looking at it, but the connection
      pointer determined by rxrpc_data_read() is guaranteed by RCU for the
      duration of the call.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Cache the security index in the rxrpc_call struct · 278ac0cd
      David Howells authored
      Cache the security index in the rxrpc_call struct so that we can get at it
      even when the call has been disconnected and the connection pointer
      cleared.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Use call->peer rather than call->conn->params.peer · f4fdb352
      David Howells authored
      Use call->peer rather than call->conn->params.peer to avoid the possibility
      of call->conn being NULL and, whilst we're at it, check it for NULL before we
      access it.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Improve the call tracking tracepoint · fff72429
      David Howells authored
      Improve the call tracking tracepoint by showing more differentiation
      between some of the put and get events, including:
      
        (1) Getting and putting refs for the socket call user ID tree.
      
        (2) Getting and putting refs for queueing and failing to queue the call
            processor work item.
      
      Note that these aren't necessarily used in this patch, but will be taken
      advantage of in future patches.
      
      An enum is added for the event subtype numbers rather than coding them
      directly as decimal numbers and a table of 3-letter strings is provided
      rather than a sequence of ?: operators.
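
The enum-plus-table arrangement described above might look roughly like the sketch below. The enumerator names and the 3-letter strings are illustrative assumptions, not the identifiers used in the patch; the point is only that a designated-initialiser table indexed by the enum replaces a chain of ?: operators.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event subtypes; the names are invented for this example. */
enum call_trace {
	call_trace_get_userid,
	call_trace_put_userid,
	call_trace_queue,
	call_trace_put_noqueue,
	call_trace__nr		/* number of entries, must stay last */
};

/* One 3-letter label per subtype (3 characters plus the NUL terminator). */
static const char call_trace_names[call_trace__nr][4] = {
	[call_trace_get_userid]  = "Gus",
	[call_trace_put_userid]  = "Pus",
	[call_trace_queue]       = "QUE",
	[call_trace_put_noqueue] = "PNQ",
};

/* Look up the label for a subtype, with a fallback for bad values. */
static const char *call_trace_name(enum call_trace t)
{
	return (unsigned int)t < call_trace__nr ? call_trace_names[t] : "???";
}

/* Sanity check: every table entry is exactly three characters long. */
static void call_trace_check_table(void)
{
	for (int i = 0; i < call_trace__nr; i++)
		assert(strlen(call_trace_names[i]) == 3);
}
```

Adding a new subtype then means adding one enumerator and one table row, and a missing row shows up as an empty string in the self-check rather than as a silent wrong label.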
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Delete unused rxrpc_kernel_free_skb() · e796cb41
      David Howells authored
      Delete rxrpc_kernel_free_skb() as it's unused.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Whitespace cleanup · 71a17de3
      David Howells authored
      Remove some whitespace.
      Signed-off-by: David Howells <dhowells@redhat.com>
  2. 06 Sep, 2016 19 commits
    • qed: Remove OOM messages · 2591c280
      Joe Perches authored
      These messages are unnecessary as OOM allocation failures already do
      a dump_stack() giving more or less the same information.
      
      $ size drivers/net/ethernet/qlogic/qed/built-in.o* (defconfig x86-64)
         text	   data	    bss	    dec	    hex	filename
       127817	  27969	  32800	 188586	  2e0aa	drivers/net/ethernet/qlogic/qed/built-in.o.new
       132474	  27969	  32800	 193243	  2f2db	drivers/net/ethernet/qlogic/qed/built-in.o.old
      
      Miscellanea:
      
      o Change allocs to the generally preferred forms where possible.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-rewrite-20160904-2' of... · c7ee5672
      David S. Miller authored
      Merge tag 'rxrpc-rewrite-20160904-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Split output code from sendmsg code
      
      Here's a set of small patches that split the packet transmission code from
      the sendmsg code and simply rearrange the new file to make it more
      logically laid out ready for being rewritten.  An enum is also moved out of
      the header file to there as it's only used there.  This needs to be applied
      on top of the just-posted fixes patch set.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-rewrite-20160904-1' of... · 0122c6d5
      David S. Miller authored
      Merge tag 'rxrpc-rewrite-20160904-1' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Small fixes
      
      Here's a set of small fix patches:
      
       (1) Fix some uninitialised variables.
      
       (2) Set the client call state before making it live by attaching it to the
           conn struct.
      
       (3) Randomise the epoch and starting client conn ID values, and don't
           change the epoch when the client conn ID rolls round.
      
       (4) Replace deprecated create_singlethread_workqueue() calls.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Update tx_errors statistics if vxlan_build_skb return err. · 5e1e61a3
      Haishuang Yan authored
      If vxlan_build_skb() returns err < 0, tx_errors should also be increased.
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: protect ring->xdp_prog with rcu_read_lock · 326fe02d
      Brenden Blanco authored
      Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
      freed despite the use of call_rcu inside bpf_prog_put. The situation is
      possible when running in PREEMPT_RCU=y mode, for instance, since the rcu
      callback for destroying the bpf prog can run even during the bh handling
      in the mlx4 rx path.
      
      Several options were considered before this patch was settled on:
      
      Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
      of the rings are updated with the new program.
      This approach has the disadvantage that as the number of rings
      increases, the speed of update will slow down significantly due to
      napi_synchronize's msleep(1).
      
      Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
      The action of the bpf_prog_put_bh would be to then call bpf_prog_put
      later. Those drivers that consume a bpf prog in a bh context (like mlx4)
      would then use the bpf_prog_put_bh instead when the ring is up. This has
      the problem of complexity, in maintaining proper refcnts and rcu lists,
      and would likely be harder to review. In addition, this approach to
      freeing must be exclusive with other frees of the bpf prog, for instance
      a _bh prog must not be referenced from a prog array that is consumed by
      a non-_bh prog.
      
      The placement of rcu_read_lock in this patch is functionally the same as
      putting an rcu_read_lock in napi_poll. Actually doing so could be a
      potentially controversial change, but would bring the implementation in
      line with sk_busy_loop (though of course the nature of those two paths
      is substantially different), and would also avoid future copy/paste
      problems with future supporters of XDP. Still, this patch does not take
      that opinionated option.
      
      Testing was done with kernels in either PREEMPT_RCU=y or
      CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting
      any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did
      not show up in the perf report whatsoever, and with PREEMPT_RCU=y the
      overhead of rcu_read_lock (according to perf) was the same before/after.
      In the rx path, rcu_read_lock is eventually called for every packet
      from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
      is easily amortized.
      
      v2:
      Remove extra rcu_read_lock in mlx4_en_process_rx_cq body
      Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or
      rcu_dereference[_protected] as appropriate.
      Add explicit mutex lock around rcu_assign instead of xchg loop.
      
      Fixes: d576acf0 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mediatek-rx-path-enhancements' · bc58493b
      David S. Miller authored
      Sean Wang says:
      
      ====================
      net: ethernet: mediatek: add enhancements to RX path
      
      Changes since v1:
      - fix message typos and add coverletter
      
      Changes since v2:
      - split from the previous series so that the enhancements are submitted
      as a separate series targeting 'net-next', and add indents before comments.
      
      Changes since v3:
      - merge the patch using PDMA RX path
      - fixed the input of mtk_poll_rx so that it is the remaining budget
      
      Changes since v4:
      - save one wmb and register update when no packet is being handled
      inside mtk_poll_rx call
      - fixed incorrect return packet count from mtk_napi_rx
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mediatek: enhance RX path by aggregating more SKBs into NAPI · 41156cea
      Sean Wang authored
      This patch feeds more SKBs into NAPI per poll so that generic receive
      offload (GRO) has more to aggregate.  Right before returning from the
      NAPI RX polling handler, it peeks at the RX ring status and processes
      more packets if NAPI budget remains and further packets are already
      present in hardware.
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mediatek: enhance RX path by reducing the frequency of the memory barrier used · 635372ad
      Sean Wang authored
      This patch moves wmb() outside the loop, which speeds up RX path
      handling, although RX descriptors are then not freed for DMA to use
      as soon as possible.  In my experiment the throughput still reaches
      about 943Mbps without a performance drop, tested on a setup with one
      port using a Giga PHY and 256 RX descriptors for DMA to move.
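
The shape of that change can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the mediatek driver: `struct rx_ring`, `poll_rx()` and the `cpu_idx` field are hypothetical, and a C11 release fence stands in for wmb() (it does not model ordering against real device MMIO). The barrier and the index update run once per batch instead of once per descriptor, and both are skipped when no packet was handled, as mentioned in the v4 notes of the series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical RX descriptor: 'done' is set when a packet has arrived. */
struct rx_desc {
	uint32_t done;
	uint32_t len;
};

struct rx_ring {
	struct rx_desc desc[RING_SIZE];
	unsigned int next;		/* next descriptor to examine */
	_Atomic unsigned int cpu_idx;	/* stand-in for the index register */
	unsigned int reg_writes;	/* counts index updates, for the example */
};

/* Process up to 'budget' completed descriptors and return how many. */
static int poll_rx(struct rx_ring *ring, int budget)
{
	unsigned int last = ring->next;
	int done = 0;

	assert(ring->next < RING_SIZE);
	while (done < budget) {
		struct rx_desc *d = &ring->desc[ring->next];

		if (!d->done)
			break;
		/* Hand the packet up (omitted), then recycle the descriptor. */
		d->done = 0;
		last = ring->next;
		ring->next = (ring->next + 1) % RING_SIZE;
		done++;
	}

	if (done) {
		/* One barrier for the whole batch rather than one per
		 * descriptor; the release fence plays the role of wmb(). */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&ring->cpu_idx, last,
				      memory_order_relaxed);
		ring->reg_writes++;
	}
	return done;
}
```

The trade-off is the one the commit message names: descriptors are returned to the hardware a little later, in exchange for fewer barriers on the hot path.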
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hso-neatening' · 0da4d283
      David S. Miller authored
      Joe Perches says:
      
      ====================
      hso: neatening
      
      This seems to be the only code in the kernel that uses
      macro defines with a trailing underscore.  Fix that.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hso: Convert printk to pr_<level> · 3981cce6
      Joe Perches authored
      Use a more common logging style
      
      Miscellanea:
      
      o Add pr_fmt to prefix each output message
      o Realign arguments
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hso: Use a more common logging style · 95a69117
      Joe Perches authored
      Macros that end in an underscore are just odd.
      Add hso_dbg(level, fmt, ...) and use it everywhere instead.
      
      Several uses had additional unnecessary newlines as the
      macro added a newline.  Remove the newline from the macro
      and add newlines to each use as appropriate.
      
      Remove now unused D<digit> macros.
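
A userspace approximation of such a macro is sketched below. It is not the driver's definition: the `debug` mask, the `log_buf` capture buffer, the prefix format and `report_port()` are assumptions made for the example, and `##__VA_ARGS__` is a GNU extension accepted by GCC and Clang. What it illustrates is the convention described above, where the macro adds no newline and each call site supplies its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int debug = 0x01;	/* hypothetical debug mask */
static char log_buf[256];	/* holds the last message, for the example */

/* Level-checked debug macro.  It prefixes the line number and function
 * name but appends no newline; the format string must carry it. */
#define hso_dbg(lvl, fmt, ...)						\
do {									\
	if ((lvl) & debug)						\
		snprintf(log_buf, sizeof(log_buf), "[%d:%s] " fmt,	\
			 __LINE__, __func__, ##__VA_ARGS__);		\
} while (0)

/* Example call site: note the explicit "\n" in the format string. */
static void report_port(int port)
{
	hso_dbg(0x1, "opening port %d\n", port);
}

/* Returns 1 if the last message ends in exactly one newline. */
static int last_message_has_single_newline(void)
{
	size_t n = strlen(log_buf);

	return n >= 2 && log_buf[n - 1] == '\n' && log_buf[n - 2] != '\n';
}
```

Keeping the newline at the call site makes each message read the same as an ordinary printk-style call and avoids the doubled newlines that the commit message mentions.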
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • smsc95xx: Add mdix control via ethtool · 13722bbe
      Woojung Huh authored
      Add mdix control through ethtool.
      Signed-off-by: Woojung Huh <Woojung.huh@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • smsc95xx: Add register define · 273bf288
      Woojung Huh authored
      Add STRAP_STATUS defines.
      Signed-off-by: Woojung Huh <Woojung.huh@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • smsc95xx: Add maintainer · 983ccd74
      Woojung Huh authored
      Add Microchip Linux Driver Support as maintainer
      because this driver is maintained by Microchip.
      Signed-off-by: Woojung Huh <Woojung.huh@gmail.com>
      Acked-by: Steve Glendinning <steve.glendinning@shawell.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mv88e6xxx-isolate-Global2' · 464520a1
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: mv88e6xxx: isolate Global2 support
      
      Registers of Marvell chips are organized in internal SMI devices.
      
      One of them at address 0x1C is called Global2. It provides an extended
      set of registers, used for interrupt control, EEPROM access, indirect
      PHY access (to bypass the PHY Polling Unit) and cross-chip setup.
      
      Most chips have it, but some others don't (older ones such as 6060).
      
      Now that its related code is isolated in mv88e6xxx_g2_* functions, move
      it to its own global2.c file, making most of its setup code static.
      
      Then make its compilation optional, which allows reducing the size of
      the mv88e6xxx driver for devices such as home routers embedding Ethernet
      chips without Global2 support.
      
      It is present on most recent chips, thus enable its support by default.
      
      Changes in v2: fail probe if GLOBAL2 is required but not enabled.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: make global2 code optional · ca070c10
      Vivien Didelot authored
      Since not every chip has a Global2 set of registers, make its support
      optional, in which case the related functions will return -EOPNOTSUPP.
      
      This also allows reducing the size of the mv88e6xxx driver for devices
      such as home routers embedding Ethernet chips without Global2 support.
      
      It is present on most recent chips, thus enable its support by default.
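
The usual way to make such code optional is a pair of definitions selected at build time, with the disabled variant reduced to a stub. The sketch below shows that pattern in plain C. `HAVE_GLOBAL2`, `struct chip`, `g2_setup()` and `chip_probe()` are hypothetical names chosen for the example and do not come from the driver; the probe check mirrors the v2 note in the series cover letter about failing probe when Global2 is required but not enabled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical build-time switch; define it to compile the real code in. */
/* #define HAVE_GLOBAL2 1 */

/* Hypothetical chip state, just enough for the example. */
struct chip {
	int g2_ready;
};

#ifdef HAVE_GLOBAL2
/* Real implementation, built only when support is enabled. */
static int g2_setup(struct chip *chip)
{
	chip->g2_ready = 1;
	return 0;
}
#else
/* Stub used when support is compiled out: callers get -EOPNOTSUPP. */
static inline int g2_setup(struct chip *chip)
{
	(void)chip;
	return -EOPNOTSUPP;
}
#endif

/* Fail the probe if the chip needs Global2 but support was not built. */
static int chip_probe(struct chip *chip, int needs_g2)
{
	if (needs_g2) {
		int err = g2_setup(chip);

		if (err)
			return err;
	}
	return 0;
}
```

Because the stub has the same signature as the real function, callers need no conditional compilation of their own and simply propagate the error.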
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: move Global2 code · ec561276
      Vivien Didelot authored
      Marvell chips are composed of multiple SMI devices. One of them at
      address 0x1C is called Global2. It provides an extended set of
      registers, used for interrupt control, EEPROM access, indirect PHY
      access (to bypass the PHY Polling Unit) and cross-chip related setup.
      
      Most chips have it, but some others don't (older ones such as 6060).
      
      Now that its related code is isolated in mv88e6xxx_g2_* functions, move
      it to its own global2.c file, making most of its setup code static.
      Document each register in the meantime.
      
      Its compilation can be later avoided for chips without such registers.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix module naming · 6654d0bf
      Vivien Didelot authored
      Since the mv88e6xxx.c file has been renamed, the driver compiled as a
      module is called chip.ko instead of mv88e6xxx.ko. Fix this.
      
      Fixes: fad09c73 ("net: dsa: mv88e6xxx: rename single-chip support")
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 60175ccd
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree.  Most relevant updates are the removal of per-conntrack timers to
      use a workqueue/garbage collection approach instead from Florian
      Westphal, the hash and numgen expression for nf_tables from Laura
      Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
      removal of ip_conntrack sysctl and many other incremental updates on our
      Netfilter codebase.
      
      More specifically, they are:
      
      1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
         transport area in dccp, sctp, tcp, udp and udplite protocol
         conntrackers, from Gao Feng.
      
      2) Missing whitespace on error message in physdev match, from Hangbin Liu.
      
      3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.
      
      4) Add nf_ct_expires() helper function and use it, from Florian Westphal.
      
      5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
         from Florian.
      
      6) Rename nf_tables set implementation to nft_set_{name}.c
      
      7) Introduce the hash expression to allow arbitrary hashing of selector
         concatenations, from Laura Garcia Liebana.
      
      8) Remove ip_conntrack sysctl backward compatibility code, this code has
         been around for a long time already, and we have two interfaces to do
         this already: nf_conntrack sysctl and ctnetlink.
      
      9) Use nf_conntrack_get_ht() helper function whenever possible, instead
         of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.
      
      10) Add quota expression for nf_tables.
      
      11) Add number generator expression for nf_tables, this supports
          incremental and random generators that can be combined with maps,
          very useful for load balancing purpose, again from Laura Garcia Liebana.
      
      12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.
      
      13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
          configuration, this is used by a follow up patch to perform better chain
          update validation.
      
      14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
          nft_set_hash implementation to honor the NLM_F_EXCL flag.
      
      15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
          patch from Florian Westphal.
      
      16) Don't use the DYING bit to know if the conntrack event has been already
          delivered; instead use a state variable to track event re-delivery
          states, also from Florian.
      
      17) Remove the per-conntrack timer, use the workqueue approach that was
          discussed during the NFWS, from Florian Westphal.
      
      18) Use the netlink conntrack table dump path to kill stale entries,
          again from Florian.
      
      19) Add a garbage collector to get rid of stale conntracks, from
          Florian.
      
      20) Reschedule garbage collector if eviction rate is high.
      
      21) Get rid of the __nf_ct_kill_acct() helper.
      
      22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.
      
      23) Make nf_log_set() interface assertive on unsupported families.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Sep, 2016 13 commits