1. 13 Jan, 2015 16 commits
  2. 12 Jan, 2015 24 commits
    • Merge branch 'tuntap_queues' · d2c60b13
      David S. Miller authored
      Pankaj Gupta says:
      
      ====================
      Increase the limit of tuntap queues
      
      Networking under KVM works best if we allocate a per-vCPU rx and tx
      queue in a virtual NIC. This requires a per-vCPU queue on the host side.
      Modern physical NICs have multiqueue support for a large number of queues.
      To scale a vNIC so that it runs as many queues in parallel as the guest has
      vCPUs, we need to increase the number of queues supported in tuntap.
      
      Changes from v4:
      PATCH2: Michael.S.Tsirkin - Updated change comment message.
      
      Changes from v3:
      PATCH1: Michael.S.Tsirkin - Some cleanups and updated commit message.
                                  Perf numbers on 10 Gb/s NIC.
      Changes from v2:
      PATCH 3: David Miller     - flex array adds extra level of indirection
                                  for preallocated array (dropped, as flow array
                                  is allocated using kzalloc with failover to vzalloc).
      Changes from v1:
      PATCH 2: David Miller     - sysctl changes to limit number of queues
                                  not required for unprivileged users (dropped).
      
      Changes from RFC:
      PATCH 1: Sergei Shtylyov  - Add an empty line after declarations.
      PATCH 2: Jiri Pirko       - Do not introduce new module parameters.
               Michael.S.Tsirkin - We can use sysctl for limiting max number
                                  of queues.
      
      This series increases the number of tuntap queues. The original work is by
      'jasowang@redhat.com'. I am using the patch series at
      'https://lkml.org/lkml/2013/6/19/29' as a reference. As per the discussion
      in that patch series:
      
      There were two reasons which prevented us from increasing the number of tun queues:

      - The netdev_queue array in netdevice was allocated through kmalloc, which may
        cause a high order memory allocation when we have several queues.
        E.g. sizeof(netdev_queue) is 320, which means a high order allocation would
        happen when the device has more than 16 queues.

      - We store the hash buckets in tun_struct, which results in a very large
        tun_struct; this high order memory allocation fails easily when the memory is
        fragmented.
      
      Commit 60877a32 increased the number of tx queues: memory allocation
      falls back to vzalloc() when kmalloc() fails.

      This series addresses the following issues:

      - Increase the number of netdev_queue queues for rx, as was done for tx
        queues, by falling back to vzalloc() when memory allocation with kmalloc() fails.

      - Increase the number of queues to 256; this maximum equals the maximum number
        of vCPUs allowed in a guest.
      
      I have also tested with multiple parallel Netperf sessions for different
      combinations of queues and CPUs. It seems to work fine, without much increase
      in CPU load as the number of queues increases. I also see a good increase in
      throughput as the number of queues increases, though I was limited to 8
      physical CPUs.

      For this test, two hosts (Host1 & Host2) are directly connected with a cable.
      Host1 is running Guest1. Data is sent from Host2 to Guest1 via Host1.
      
      Host kernel: 3.19.0-rc2+, AMD Opteron(tm) Processor 6320
      NIC : Emulex Corporation OneConnect 10Gb NIC (be3)
      
      Patch Applied CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle  throughput

      Single Queue, 2 vCPUs
      ---------------------
      Before Patch :all    0.19    0.00    0.16    0.07    0.04    0.10    0.00    0.18    0.00   99.26    57864.18
      After  Patch :all    0.99    0.00    0.64    0.69    0.07    0.26    0.00    1.58    0.00   95.77    57735.77

      With 2 Queues, 2 vCPUs
      ----------------------
      Before Patch :all    0.19    0.00    0.19    0.10    0.04    0.11    0.00    0.28    0.00   99.08    63083.09
      After  Patch :all    0.87    0.00    0.73    0.78    0.09    0.35    0.00    2.04    0.00   95.14    62917.03

      With 4 Queues, 4 vCPUs
      ----------------------
      Before Patch :all    0.20    0.00    0.21    0.11    0.04    0.12    0.00    0.32    0.00   99.00    80865.06
      After  Patch :all    0.71    0.00    0.93    0.85    0.11    0.51    0.00    2.62    0.00   94.27    86463.19

      With 8 Queues, 8 vCPUs
      ----------------------
      Before Patch :all    0.19    0.00    0.18    0.09    0.04    0.11    0.00    0.23    0.00   99.17    86795.31
      After  Patch :all    0.65    0.00    1.18    0.93    0.13    0.68    0.00    3.38    0.00   93.05    89459.93

      With 16 Queues, 8 vCPUs
      -----------------------
      After  Patch :all    0.61    0.00    1.59    0.97    0.18    0.92    0.00    4.32    0.00   91.41   120951.60
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: Increase the number of queues in tun. · baf71c5c
      Pankaj Gupta authored
      Networking under kvm works best if we allocate a per-vCPU RX and TX
      queue in a virtual NIC. This requires a per-vCPU queue on the host side.
      
      It is now safe to increase the maximum number of queues.
      Preceding patch: 'net: allow large number of rx queues'
      made sure this won't cause failures due to high order memory
      allocations. Increase it to 256: this is the max number of vCPUs
      KVM supports.
      
      The size of tun_struct grows from 8512 to 10496 bytes with this patch, so
      the number of pages allocated for tun_struct stays at 3 before and after
      the patch.
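
The page-count statement above can be checked with a little arithmetic. The following is a minimal sketch that assumes 4096-byte pages (the message does not state the page size); `SKETCH_PAGE_SIZE` and `pages_needed` are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. The commit message does not name the page size. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical helper: number of whole pages needed to hold `bytes`. */
static size_t pages_needed(size_t bytes)
{
    /* Round up to the next page boundary, then count pages. */
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under that assumption, both quoted sizes (8512 and 10496 bytes) round up to three pages, which matches the claim in the message.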
      Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: David Gibson <dgibson@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: allow large number of rx queues · 10595902
      Pankaj Gupta authored
      netif_alloc_rx_queues() uses kcalloc() to allocate memory
      for the "struct netdev_queue *_rx" array.
      If we are doing a large rx queue allocation, kcalloc() might
      fail, so this patch falls back to vzalloc().
      A similar fallback is already implemented for tx queue allocation in
      netif_alloc_netdev_queues().

      Using vzalloc() avoids failure of high order memory allocations.
      This allows us to do large rx and tx queue allocations, which in turn
      helps us to increase the number of queues in tun.

      As vmalloc() adds overhead on a critical network path, the
      __GFP_REPEAT flag is used with kzalloc() so that this fallback happens
      only when really needed.
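
The fallback described above can be sketched in user space. This is not the kernel code: `contig_zalloc` and `fallback_zalloc` are hypothetical stand-ins for kzalloc() and vzalloc(), and `CONTIG_LIMIT` is a made-up threshold that lets the first allocator fail on purpose so the second path is exercised.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption for illustration: the "contiguous" allocator refuses
 * anything larger than this many bytes. */
#define CONTIG_LIMIT 4096u

struct queue_mem {
    void *ptr;          /* zeroed memory, or NULL on failure */
    int used_fallback;  /* 1 if the vzalloc()-like path was taken */
};

/* Hypothetical stand-in for kzalloc(): fails for large requests. */
static void *contig_zalloc(size_t size)
{
    if (size > CONTIG_LIMIT)
        return NULL;
    return calloc(1, size);
}

/* Hypothetical stand-in for vzalloc(): no contiguity limit. */
static void *fallback_zalloc(size_t size)
{
    return calloc(1, size);
}

/* Try the cheap allocator first, fall back only when it fails. */
static struct queue_mem alloc_queue_array(size_t count, size_t elem_size)
{
    struct queue_mem m = { NULL, 0 };
    size_t total;

    /* Reject requests whose byte size would overflow size_t. */
    if (elem_size != 0 && count > (size_t)-1 / elem_size)
        return m;
    total = count * elem_size;

    m.ptr = contig_zalloc(total);
    if (!m.ptr) {
        m.ptr = fallback_zalloc(total);
        m.used_fallback = (m.ptr != NULL);
    }
    return m;
}
```

With the 320-byte element size quoted in the cover letter, 8 queues stay on the first path in this sketch while 256 queues take the fallback. In the kernel, memory that may come from either allocator has to be released by a routine that handles both origins, for example kvfree().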
      Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: David Gibson <dgibson@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: Remove dead code · e350a96e
      Kenneth Williams authored
      The deleted lines are called from a function which is called:
      1) Only through __team_options_register via team_options_register and
      2) Only during initialization / mode initialization when there are no
      ports attached.
      Therefore the ports list is guaranteed to be empty and this code will
      never be executed.
      Signed-off-by: Kenneth Williams <ken@williamsclan.us>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David Decotigny's avatar
      b1e8bc61
    • net: sched: sch_teql: Remove unused function · ddcde70c
      Rickard Strandqvist authored
      Remove the function teql_neigh_release() that is not used anywhere.
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: xfrm: xfrm_algo: Remove unused function · 83400b99
      Rickard Strandqvist authored
      Remove the function aead_entries() that is not used anywhere.
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge_vlan_ranges' · d0d2cc53
      David S. Miller authored
      Roopa Prabhu says:
      
      ====================
      bridge: support for vlan range in setlink/dellink
      
      This series adds new flags in IFLA_BRIDGE_VLAN_INFO to indicate
      vlan range.
      
      Will post corresponding iproute2 patches if these get accepted.
      
      v1 -> v2
          - changed patches to use a nested list attribute
          IFLA_BRIDGE_VLAN_INFO_LIST as suggested by Scott Feldman
          - dropped notification changes from the series. Will post them
          separately after this range message is accepted.
      
      v2 -> v3
          - incorporated some review feedback
          - include patches to fill vlan ranges during getlink
          - Dropped IFLA_BRIDGE_VLAN_INFO_LIST. I think it may get
          confusing to userspace if we introduce yet another way to
          send lists. With getlink already sending nested
          IFLA_BRIDGE_VLAN_INFO in IFLA_AF_SPEC, it seems better to
          use the existing format for lists and just use the flags from v2
          to mark vlan ranges.
      ====================
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: new function to pack vlans into ranges during gets · 36cd0ffb
      Roopa Prabhu authored
      This patch adds a new function to pack vlans into ranges
      wherever applicable using the flags BRIDGE_VLAN_INFO_RANGE_BEGIN
      and BRIDGE_VLAN_INFO_RANGE_END.
      
      Old vlan packing code is moved to a new function and continues to be
      called when filter_mask is RTEXT_FILTER_BRVLAN.
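
The packing idea can be illustrated with a small stand-alone routine. This is a sketch under assumptions, not the bridge code: `struct sk_vlan_info`, the `SK_RANGE_*` flag values and `pack_vlan_ranges` are hypothetical names and values chosen for the example, and per-vlan attributes other than the id are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define SK_RANGE_BEGIN 0x1u
#define SK_RANGE_END   0x2u

struct sk_vlan_info {
    uint16_t flags;
    uint16_t vid;
};

/* Pack a sorted, duplicate-free list of VLAN ids into entries.
 * A run of two or more consecutive ids becomes a BEGIN/END pair;
 * a lone id becomes a single entry with no range flag.
 * `out` must have room for `n` entries. Returns entries written. */
static size_t pack_vlan_ranges(const uint16_t *vids, size_t n,
                               struct sk_vlan_info *out)
{
    size_t i = 0, w = 0;

    while (i < n) {
        size_t j = i;

        /* Extend the run while the next id is exactly one higher. */
        while (j + 1 < n && vids[j + 1] == vids[j] + 1)
            j++;

        if (j == i) {
            out[w].flags = 0;
            out[w].vid = vids[i];
            w++;
        } else {
            out[w].flags = SK_RANGE_BEGIN;
            out[w].vid = vids[i];
            w++;
            out[w].flags = SK_RANGE_END;
            out[w].vid = vids[j];
            w++;
        }
        i = j + 1;
    }
    return w;
}
```

For the ids 10, 11, 12, 20, 30, 31 the sketch emits five entries instead of six: a begin/end pair for 10 to 12, a single entry for 20, and a begin/end pair for 30 to 31. The saving grows with the length of each consecutive run.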
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: new filter RTEXT_FILTER_BRVLAN_COMPRESSED · 35a27cee
      Roopa Prabhu authored
      This filter is the same as RTEXT_FILTER_BRVLAN except that it tries
      to compress consecutive vlans into ranges.

      This helps on systems with a large number of configured vlans.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: support for multiple vlans and vlan ranges in setlink and dellink requests · bdced7ef
      Roopa Prabhu authored
      This patch changes the bridge IFLA_AF_SPEC netlink attribute parser to
      look for more than one IFLA_BRIDGE_VLAN_INFO attribute. This allows
      userspace to pack more than one vlan in the setlink msg.

      The dumps were already sending more than one vlan info in the getlink msg.

      This patch also adds bridge_vlan_info flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
      BRIDGE_VLAN_INFO_RANGE_END to indicate the start and end of a vlan range.

      This patch also deletes the unused ifla_br_policy.
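
A receiver of such a message has to turn begin/end markers back into individual vlans. The following is a rough sketch of that step, not the bridge parser itself: the struct, the `SK_RANGE_*` flag values and `expand_vlan_entries` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define SK_RANGE_BEGIN 0x1u
#define SK_RANGE_END   0x2u

struct sk_vlan_info {
    uint16_t flags;
    uint16_t vid;
};

/* Expand a list of entries into individual VLAN ids.
 * Returns 0 on success and stores the count in *written.
 * Returns -1 if the list is malformed (END without BEGIN, nested
 * BEGIN, END below BEGIN, unterminated range) or `out` is too small. */
static int expand_vlan_entries(const struct sk_vlan_info *in, size_t n,
                               uint16_t *out, size_t cap, size_t *written)
{
    size_t w = 0;
    int in_range = 0;
    uint16_t start = 0;

    for (size_t i = 0; i < n; i++) {
        uint16_t first, last;

        if (in[i].flags & SK_RANGE_BEGIN) {
            if (in_range)
                return -1;          /* nested BEGIN */
            in_range = 1;
            start = in[i].vid;
            continue;
        }
        if (in[i].flags & SK_RANGE_END) {
            if (!in_range || in[i].vid < start)
                return -1;          /* END without a valid BEGIN */
            first = start;
            last = in[i].vid;
            in_range = 0;
        } else {
            if (in_range)
                return -1;          /* plain entry inside an open range */
            first = last = in[i].vid;
        }
        /* A 32-bit counter avoids wrap-around when last == 65535. */
        for (uint32_t v = first; v <= last; v++) {
            if (w == cap)
                return -1;
            out[w++] = (uint16_t)v;
        }
    }
    if (in_range)
        return -1;                  /* BEGIN never closed */
    *written = w;
    return 0;
}
```

The sketch rejects malformed sequences rather than guessing, on the assumption that a range whose markers do not pair up should not be applied partially.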
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: xen-netfront: remove residual dead code · dd2e8bf5
      Vincenzo Maffione authored
      This patch removes some unused arrays from the netfront private
      data structures. These arrays were used in "flip" receive mode.
      Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Driver: Vmxnet3: Reinitialize vmxnet3 backend on wakeup from hibernate · 5ec82c1e
      Shrikrishna Khare authored
      Failing to reinitialize on wakeup results in loss of network connectivity for
      vmxnet3 interface.
      Signed-off-by: Srividya Murali <smurali@vmware.com>
      Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
      Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: cleanup bond_opts array · 7bfa0145
      Jonathan Toppins authored
      Remove the empty array element initializer and size the array with
      BOND_OPT_LAST so the compiler will complain if more elements are in
      there than should be.
      
      An interesting unwanted side effect of this initializer is that if one
      inserts new options into the middle of the array then this initializer
      will zero out the option that equals BOND_OPT_TLB_DYNAMIC_LB+1.
      
      Example:
      Extend the OPTS enum:
      enum {
         ...
         BOND_OPT_TLB_DYNAMIC_LB,
         BOND_OPT_LACP_NEW1,
         BOND_OPT_LAST
      };
      
      Now insert into bond_opts array:
      static const struct bond_option bond_opts[] = {
            ...
            [BOND_OPT_LACP_RATE] = { .... unchanged stuff .... },
            [BOND_OPT_LACP_NEW1] = { ... new stuff ... },
            ...
            [BOND_OPT_TLB_DYNAMIC_LB] = { .... unchanged stuff ....},
            { } // MARK A
      };
      
      Since BOND_OPT_LACP_NEW1 = BOND_OPT_TLB_DYNAMIC_LB+1, the last
      initializer (MARK A) will overwrite the contents of BOND_OPT_LACP_NEW1
      and can be easily viewed with the crash utility.
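
The overwrite described above follows from how C handles a positional initializer after a designated one: it initializes the element after the most recently designated index, and a later initializer for an element replaces an earlier one. Below is a reduced sketch with hypothetical names (`sketch_opt`, `sketch_opts`, `sketch_opt_name`), not the bonding code. Compilers may warn about the overridden initializer when extra warnings are enabled.

```c
#include <stddef.h>

enum sketch_opt {
    OPT_RATE,
    OPT_DYNAMIC_LB,   /* previously the last real option */
    OPT_NEW1,         /* newly added: equals OPT_DYNAMIC_LB + 1 */
    OPT_LAST
};

struct sketch_option {
    int id;
    const char *name;
};

static const struct sketch_option sketch_opts[] = {
    [OPT_RATE]       = { OPT_RATE, "rate" },
    [OPT_NEW1]       = { OPT_NEW1, "new1" },
    [OPT_DYNAMIC_LB] = { OPT_DYNAMIC_LB, "dynamic_lb" },
    /* Positional terminator: it initializes index OPT_DYNAMIC_LB + 1,
     * which is OPT_NEW1, and so replaces the entry set two lines up. */
    { 0, NULL }
};

/* Hypothetical accessor used to observe the result. */
static const char *sketch_opt_name(enum sketch_opt o)
{
    return sketch_opts[o].name;
}
```

In this sketch `sketch_opt_name(OPT_NEW1)` yields NULL even though the entry was written out in full. Dropping the trailing initializer and sizing the array with the `_LAST` enumerator, as the commit does for bond_opts, removes the positional entry that causes the overwrite.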
      Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
      Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-namespaces' · d9fbfb94
      David S. Miller authored
      Ying Xue says:
      
      ====================
      tipc: make tipc support namespace
      
      This patchset aims to add net namespace support for TIPC stack.
      
      Currently TIPC module declares the following global resources:
      - TIPC network identification number
      - TIPC node table
      - TIPC bearer list table
      - TIPC broadcast link
      - TIPC socket reference table
      - TIPC name service table
      - TIPC node address
      - TIPC service subscriber server
      - TIPC random value
      - TIPC netlink
      
      For TIPC to be namespace-aware, each of the above resources must be
      allocated, initialized and destroyed per namespace. Therefore,
      the major work of this patchset is to isolate these global resources
      and make them private to each namespace. However, before these changes
      can be made, some necessary preparation work must be done first: convert
      the socket reference table to the generic rhashtable, clean up the core.c
      and core.h files, remove unnecessary wrapper functions for kernel timer
      interfaces and so on.
      
      It should be noted that commit #1 ("tipc: fix bug in broadcast
      retransmit code") was already submitted to the 'net' tree, so please see
      the link below:

      http://patchwork.ozlabs.org/patch/426717/

      Since it is a prerequisite for the rest of the series to apply, I
      prepend it to the series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make netlink support net namespace · d49e2041
      Ying Xue authored
      Currently the tipc module only allows users in the "init_net" namespace
      to configure it through the netlink interface. But now almost every tipc
      component is aware of net namespaces, so it's time to open
      up permission for users residing in other namespaces, allowing them
      to configure their own tipc stack instance through the netlink interface.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make tipc random value aware of net namespace · bafa29e3
      Ying Xue authored
      Once namespaces are supported, each namespace should own its private
      random value. So the global variable representing the random value
      must be moved to the tipc_net structure.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make subscriber server support net namespace · a62fbcce
      Ying Xue authored
      TIPC establishes one subscriber server which allows users to subscribe
      to the status of the name services they are interested in. After tipc
      supports namespaces, one dedicated tipc stack instance is created for each
      namespace, and each instance can be regarded as one independent TIPC node.
      As a result, a subscriber server must be built for each namespace.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make tipc node address support net namespace · 34747539
      Ying Xue authored
      If net namespace is supported in tipc, each namespace will be treated
      as a separate tipc node. Therefore, every namespace must own its
      private tipc node address. This means the "tipc_own_addr" global
      variable holding the node address must be moved to the tipc_net structure
      to satisfy the requirement. As a result, users can also assign
      a node address to each namespace.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: name tipc name table support net namespace · 4ac1c8d0
      Ying Xue authored
      The TIPC name table is used to store the mapping between a
      TIPC service name and a socket port ID. When tipc supports namespaces,
      it allows users to publish service names owned only by a certain
      namespace. Therefore, every namespace must have its private name
      table to prevent service names published in one namespace from being
      contaminated by service names in another namespace. Hence,
      the name table global variable (i.e., nametbl) and its lock must be
      moved to the tipc_net structure, and a namespace parameter must be
      added to the necessary functions so that they can obtain the name table
      variable defined in the tipc_net structure.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make tipc socket support net namespace · e05b31f4
      Ying Xue authored
      Currently the tipc socket table is statically allocated as a global variable.
      Through it, we can look up a socket instance by port ID, insert
      a new socket instance into the table, and delete a socket from the
      table. But when tipc supports net namespaces, each namespace must own
      its own socket table. So the global socket table variable
      must be redefined in the tipc_net structure. As a consequence, a new
      socket table will be allocated when a new namespace is created, and
      the socket table will be deallocated when the namespace is destroyed.
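
The move from a global table to per-namespace state can be sketched in user space. Everything here is hypothetical (`struct sketch_ns`, `sketch_ns_init`, `sketch_ns_exit`, the toy open-addressing table); the series itself uses the kernel's rhashtable inside tipc_net, which this sketch does not attempt to reproduce.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_SLOTS 16u

/* Stand-in for per-namespace state: the table lives here, not in a global. */
struct sketch_ns {
    unsigned int *sock_tbl;   /* port IDs; 0 marks an empty slot */
};

/* Called when a namespace is created: allocate its private table. */
static int sketch_ns_init(struct sketch_ns *ns)
{
    ns->sock_tbl = calloc(SKETCH_SLOTS, sizeof(*ns->sock_tbl));
    return ns->sock_tbl ? 0 : -1;
}

/* Called when a namespace is destroyed: release its table. */
static void sketch_ns_exit(struct sketch_ns *ns)
{
    free(ns->sock_tbl);
    ns->sock_tbl = NULL;
}

/* Insert a nonzero port ID with linear probing; -1 if full or duplicate. */
static int sketch_sock_insert(struct sketch_ns *ns, unsigned int portid)
{
    if (portid == 0 || !ns->sock_tbl)
        return -1;
    for (size_t i = 0; i < SKETCH_SLOTS; i++) {
        size_t idx = (portid + i) % SKETCH_SLOTS;

        if (ns->sock_tbl[idx] == portid)
            return -1;
        if (ns->sock_tbl[idx] == 0) {
            ns->sock_tbl[idx] = portid;
            return 0;
        }
    }
    return -1;
}

/* Return 1 if the port ID is present in this namespace's table, else 0. */
static int sketch_sock_lookup(const struct sketch_ns *ns, unsigned int portid)
{
    if (portid == 0 || !ns->sock_tbl)
        return 0;
    for (size_t i = 0; i < SKETCH_SLOTS; i++) {
        size_t idx = (portid + i) % SKETCH_SLOTS;

        if (ns->sock_tbl[idx] == portid)
            return 1;
        if (ns->sock_tbl[idx] == 0)
            return 0;
    }
    return 0;
}
```

Because every function takes the namespace as a parameter, a port ID inserted through one `struct sketch_ns` is not visible through another, which is the isolation the message describes.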
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make tipc broadcast link support net namespace · 1da46568
      Ying Xue authored
      The TIPC broadcast link is statically established and its relevant state
      is maintained in the global variables "bcbearer", "bclink" and
      "bcl". To allow different namespaces to own different broadcast link
      instances, these variables must be moved to the tipc_net structure, and
      broadcast link instances are allocated and initialized when a
      namespace is created.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make bearer list support net namespace · 7f9f95d9
      Ying Xue authored
      The bearer list, defined as a global variable, is used to store bearer
      instances. When tipc supports net namespaces, bearers created in
      one namespace must be isolated from those allocated in other
      namespaces, which requires that the bearer list (bearer_list)
      be moved to the tipc_net structure. As a result, a net namespace
      pointer has to be passed to functions which access the bearer list.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make tipc node table aware of net namespace · f2f9800d
      Ying Xue authored
      The global variables associated with the node table are:
      - node hash table (node_htable)
      - node list (tipc_node_list)
      - node table lock (node_list_lock)
      - node number counter (tipc_num_nodes)
      - node link number counter (tipc_num_links)

      To make the node table support namespaces, the above global variables must
      be moved to the tipc_net structure so that they are private to each
      namespace. As a consequence, these variables are allocated and
      initialized when a namespace is created, and deallocated when the namespace
      is destroyed. After the change, functions associated with these
      variables have to use a namespace pointer to access them. So
      adding a namespace pointer as a parameter of these functions is the
      major change made in the commit.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Tero Aho <Tero.Aho@coriant.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>