1. 06 Sep, 2016 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 60175ccd
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree.  Most relevant updates are the removal of per-conntrack timers to
      use a workqueue/garbage collection approach instead from Florian
      Westphal, the hash and numgen expression for nf_tables from Laura
      Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
      removal of ip_conntrack sysctl and many other incremental updates on our
      Netfilter codebase.
      
      More specifically, they are:
      
      1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
         transport area in dccp, sctp, tcp, udp and udplite protocol
         conntrackers, from Gao Feng.
      
      2) Missing whitespace on error message in physdev match, from Hangbin Liu.
      
      3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.
      
      4) Add nf_ct_expires() helper function and use it, from Florian Westphal.
      
      5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
         from Florian.
      
      6) Rename nf_tables set implementation to nft_set_{name}.c
      
      7) Introduce the hash expression to allow arbitrary hashing of selector
         concatenations, from Laura Garcia Liebana.
      
      8) Remove ip_conntrack sysctl backward compatibility code. This code has
         been around for a long time, and we already have two interfaces for
         this: nf_conntrack sysctl and ctnetlink.
      
      9) Use nf_conntrack_get_ht() helper function whenever possible, instead
         of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.
      
      10) Add quota expression for nf_tables.
      
      11) Add number generator expression for nf_tables. This supports
          incremental and random generators that can be combined with maps,
          which is very useful for load balancing, again from Laura Garcia Liebana.
      
      12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.
      
      13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
          configuration, this is used by a follow up patch to perform better chain
          update validation.
      
      14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
          nft_set_hash implementation to honor the NLM_F_EXCL flag.
      
      15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
          patch from Florian Westphal.
      
      16) Don't use the DYING bit to know if the conntrack event has already
          been delivered; instead use a state variable to track event
          re-delivery states, also from Florian.
      
      17) Remove the per-conntrack timer, use the workqueue approach that was
          discussed during the NFWS, from Florian Westphal.
      
      18) Use the netlink conntrack table dump path to kill stale entries,
          again from Florian.
      
      19) Add a garbage collector to get rid of stale conntracks, from
          Florian.
      
      20) Reschedule garbage collector if eviction rate is high.
      
      21) Get rid of the __nf_ct_kill_acct() helper.
      
      22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.
      
      23) Make nf_log_set() interface assertive on unsupported families.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
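The headline change in this merge, dropping per-conntrack timers in favour of a garbage-collection pass, can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct entry`, `entry_expired()`, `entry_refresh()` and `gc_pass()` are hypothetical names invented for illustration and are not the kernel's data structures or functions. The idea shown is that each entry stores an absolute deadline instead of owning a timer, and a periodic scan evicts whatever has passed its deadline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tracked connection: no timer, only a deadline. */
struct entry {
    uint32_t timeout;   /* absolute expiry time, in ticks */
    bool     in_use;
};

/*
 * Wrap-safe deadline test on a 32-bit tick counter. The subtraction wraps,
 * and the signed view tells us on which side of "now" the deadline lies
 * (relies on the usual two's-complement conversion done by gcc and clang).
 */
static bool entry_expired(const struct entry *e, uint32_t now)
{
    return (int32_t)(e->timeout - now) <= 0;
}

/* Refreshing an entry is a plain store; there is no timer to re-arm. */
static void entry_refresh(struct entry *e, uint32_t now, uint32_t lifetime)
{
    e->timeout = now + lifetime;
}

/*
 * One garbage-collection pass: look at `budget` slots starting at *cursor,
 * evict the expired ones, and return how many were evicted. The cursor
 * persists across passes so that successive passes cover the whole table.
 */
static size_t gc_pass(struct entry *tbl, size_t n, size_t *cursor,
                      size_t budget, uint32_t now)
{
    size_t evicted = 0;

    assert(tbl != NULL && cursor != NULL && n > 0);
    for (size_t i = 0; i < budget; i++) {
        struct entry *e = &tbl[*cursor];

        *cursor = (*cursor + 1) % n;
        if (e->in_use && entry_expired(e, now)) {
            e->in_use = false;
            evicted++;
        }
    }
    return evicted;
}
```

In a model like this, a caller could use the eviction count returned by `gc_pass()` to decide how soon to run the next pass, which is in the spirit of item 20 above; how the kernel actually makes that decision is not described in this log.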
  2. 04 Sep, 2016 6 commits
  3. 03 Sep, 2016 16 commits
  4. 02 Sep, 2016 14 commits
  5. 01 Sep, 2016 3 commits
    • rtnetlink: fdb dump: optimize by saving last interface markers · d297653d
      Roopa Prabhu authored
      fdb dumps spanning multiple skbs currently restart from the first
      interface again for every skb. This results in unnecessary
      iterations over the already visited interfaces and their fdb
      entries. In large scale setups, we have seen this slow
      down fdb dumps considerably. On a system with 30k macs we
      see fdb dumps spanning more than 300 skbs.
      
      To fix the problem, this patch replaces the existing single fdb
      marker with three markers: netdev hash entries, netdevs and fdb
      index to continue where we left off instead of restarting from the
      first netdev. This is consistent with link dumps.
      
      In the process of fixing the performance issue, this patch also
      re-implements the fix done by
      commit 472681d5 ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
      (with an internal fix from Wilson Kok) in the following ways:
      - change ndo_fdb_dump handlers to return an error code instead
        of the last fdb index
      - use cb->args strictly for dump frag markers and not error codes.
        This is consistent with other dump functions.
      
      Below results were taken on a system with 1000 netdevs
      and 35085 fdb entries:
      before patch:
      $ time bridge fdb show | wc -l
      15065
      
      real    1m11.791s
      user    0m0.070s
      sys     1m8.395s
      
      (existing code does not return all macs)
      
      after patch:
      $ time bridge fdb show | wc -l
      35085
      
      real    0m2.017s
      user    0m0.113s
      sys     0m1.942s
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
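The three-marker resume scheme described in this commit can be modelled in a few lines of userspace C. This is a minimal sketch, assuming a toy topology of fixed size; `struct dump_markers`, `dump_chunk()`, `NBUCKETS` and `NDEVS` are hypothetical names and do not correspond to the rtnetlink code. Each call stands for filling one skb: it emits at most `room` entries and leaves the markers pointing at the next entry, so the following call continues from there instead of rescanning from the first netdev.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 2   /* netdev hash buckets in this toy model */
#define NDEVS    2   /* netdevs per bucket in this toy model  */

/* Hypothetical resume state, playing the role of the three dump markers. */
struct dump_markers {
    size_t bucket;  /* netdev hash bucket to resume in        */
    size_t dev;     /* netdev index within that bucket        */
    size_t fdb;     /* fdb entry index within that netdev     */
};

/*
 * Emit up to `room` fdb entries, starting where the markers point.
 * fdb_count[b][d] is the number of fdb entries on netdev d of bucket b.
 * Returns the number of entries emitted; 0 means the dump is complete.
 */
static size_t dump_chunk(const size_t fdb_count[NBUCKETS][NDEVS],
                         struct dump_markers *m, size_t room)
{
    size_t emitted = 0;

    assert(fdb_count != NULL && m != NULL);

    /* No loop initialisers: every level resumes from its saved marker. */
    for (; m->bucket < NBUCKETS; m->bucket++, m->dev = 0) {
        for (; m->dev < NDEVS; m->dev++, m->fdb = 0) {
            while (m->fdb < fdb_count[m->bucket][m->dev]) {
                if (emitted == room)
                    return emitted;  /* buffer full; markers stay put */
                m->fdb++;            /* "emit" one entry */
                emitted++;
            }
        }
    }
    return emitted;
}
```

With a single marker, the outer two loops would have to start from zero on every call and skip over already-dumped entries, which is the repeated iteration the commit message describes.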
    • rps: flow_dissector: Add the const for the parameter of flow_keys_have_l4 · 66fdd05e
      Gao Feng authored
      Add const to the parameter of flow_keys_have_l4 for readability.
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Don't expose skbs to in-kernel users [ver #2] · d001648e
      David Howells authored
      Don't expose skbs to in-kernel users, such as the AFS filesystem, but
      instead provide a notification hook that indicates that a call needs
      attention and another that indicates that there's a new call to be
      collected.
      
      This makes the following possibilities more achievable:
      
       (1) Call refcounting can be made simpler if skbs don't hold refs to calls.
      
       (2) skbs referring to non-data events will be able to be freed much sooner
           rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
           will be able to consult the call state.
      
       (3) We can shortcut the receive phase when a call is remotely aborted
           because we don't have to go through all the packets to get to the one
           cancelling the operation.
      
       (4) It makes it easier to do encryption/decryption directly between AFS's
           buffers and sk_buffs.
      
       (5) Encryption/decryption can more easily be done in the AFS's thread
           contexts - usually that of the userspace process that issued a syscall
           - rather than in one of rxrpc's background threads on a workqueue.
      
       (6) AFS will be able to wait synchronously on a call inside AF_RXRPC.
      
      To make this work, the following interface function has been added:
      
           int rxrpc_kernel_recv_data(
      		struct socket *sock, struct rxrpc_call *call,
      		void *buffer, size_t bufsize, size_t *_offset,
      		bool want_more, u32 *_abort_code);
      
      This is the recvmsg equivalent.  It allows the caller to find out about the
      state of a specific call and to transfer received data into a buffer
      piecemeal.
      
      afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
      logic between them.  They don't wait synchronously yet because the socket
      lock needs to be dealt with.
      
      Five interface functions have been removed:
      
              rxrpc_kernel_is_data_last()
              rxrpc_kernel_get_abort_code()
              rxrpc_kernel_get_error_number()
              rxrpc_kernel_free_skb()
              rxrpc_kernel_data_consumed()
      
      As a temporary hack, sk_buffs going to an in-kernel call are queued on the
      rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
      in-kernel user.  To process the queue internally, a temporary function,
      temp_deliver_data(), has been added.  This will be replaced with common code
      between the rxrpc_recvmsg() path and the rxrpc_kernel_recv_data() path in a
      future patch.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
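The piecemeal transfer that rxrpc_kernel_recv_data() makes possible (a buffer, its size, and a running `_offset` that survives across calls) can be sketched with a userspace mock. Everything here is an assumption made for illustration: `struct mock_call` and `mock_recv_data()` are hypothetical, the socket and abort-code parameters are omitted, and the return values are a simplified convention chosen for this sketch, not a statement of the kernel function's contract.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a call's receive state (not struct rxrpc_call). */
struct mock_call {
    const unsigned char *data;  /* bytes that have arrived so far      */
    size_t avail;               /* how many bytes have arrived         */
    size_t consumed;            /* how many the caller has taken       */
    bool   last_seen;           /* true once the final packet arrived  */
};

/*
 * Copy as much as is available into buffer + *offset and advance *offset.
 * Return values are assumptions of this sketch:
 *   0         buffer filled, further data may follow
 *   1         buffer filled and the call's data is exhausted
 *   -EAGAIN   buffer not full yet; call again after the next notification
 *   -EBADMSG  data ended before the buffer could be filled
 */
static int mock_recv_data(struct mock_call *call, void *buffer, size_t bufsize,
                          size_t *offset, bool want_more)
{
    size_t want, have, n;

    assert(call && buffer && offset && *offset <= bufsize);

    want = bufsize - *offset;
    have = call->avail - call->consumed;
    n = want < have ? want : have;

    memcpy((unsigned char *)buffer + *offset, call->data + call->consumed, n);
    call->consumed += n;
    *offset += n;

    if (*offset < bufsize)
        return call->last_seen ? -EBADMSG : -EAGAIN;
    if (!want_more && call->last_seen && call->consumed == call->avail)
        return 1;
    return 0;
}
```

A caller in this model would keep `offset` alongside its buffer, call `mock_recv_data()` each time the notification hook fires, and reset `offset` to zero when it moves on to the next item it wants to extract.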