1. 10 Aug, 2015 27 commits
  2. 07 Aug, 2015 13 commits
    • net: Fix race condition in store_rps_map · 10e4ea75
      Tom Herbert authored
      There is a race condition in store_rps_map that allows jump label
      count in rps_needed to go below zero. This can happen when
      concurrently attempting to set and clear a map.
      
      Scenario:
      
      1. rps_needed count is zero
      2. New map is assigned by setting thread, but rps_needed count _not_ yet
         incremented (rps_needed count still zero)
      3. Map is cleared by a second thread; old_map is set to the map just
         assigned
      4. Second thread performs static_key_slow_dec; rps_needed count now goes
         negative
      
      Fix is to increment or decrement rps_needed under the spinlock.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
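The fix described above can be sketched in userspace. This is a minimal model, not the kernel code: `store_map`, `needed`, `map_lock`, `current_map` and `needed_count` are hypothetical stand-ins for the map pointer, the rps_needed count and the spinlock, and a pthread mutex replaces the spinlock. The point it illustrates is that the pointer swap and the count adjustment happen in the same critical section, so a concurrent clear can never observe a map whose increment has not yet happened.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the map pointer, the rps_needed
 * count and the lock that protects the map. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static const int *current_map;  /* NULL means "no map installed" */
static int needed_count;        /* models the jump label count */

/* Install new_map (or clear the map when new_map is NULL).
 * Returns the previously installed map. */
const int *store_map(const int *new_map)
{
    const int *old_map;

    pthread_mutex_lock(&map_lock);
    old_map = current_map;
    current_map = new_map;
    /* Adjust the count inside the same critical section as the swap. */
    if (new_map)
        needed_count++;
    if (old_map)
        needed_count--;
    assert(needed_count >= 0);  /* the invariant the race used to break */
    pthread_mutex_unlock(&map_lock);

    return old_map;
}

/* Read the count under the lock. */
int needed(void)
{
    int n;

    pthread_mutex_lock(&map_lock);
    n = needed_count;
    pthread_mutex_unlock(&map_lock);
    return n;
}
```

Replacing one map with another leaves the count unchanged in this model, and clearing an already empty map is a no-op.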
    • openvswitch: Make 100 percents packets sampled when sampling rate is 1. · e05176a3
      Wenyu Zhang authored
      When the sampling rate is 1, the sampling probability is UINT32_MAX. The
      packet should be sampled even when prandom32() generates UINT32_MAX.
      And no packet needs to be sampled when the probability is 0.
      Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
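The two boundary cases in the message above can be illustrated with a small sketch. Both function names are hypothetical, and the "before" comparison is an assumption about what a strict comparison would look like, not a quote from the patch. The random value is passed in as a parameter so the behaviour at the edges is easy to check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed pre-fix behaviour: a strict comparison, so a random value of
 * UINT32_MAX is never sampled, even at probability UINT32_MAX. */
bool sample_before_fix(uint32_t probability, uint32_t random)
{
    return random < probability;
}

/* Behaviour described in the commit message: probability 0 samples
 * nothing, and probability UINT32_MAX samples every packet. */
bool sample_after_fix(uint32_t probability, uint32_t random)
{
    if (probability == 0)
        return false;
    return random <= probability;
}
```

With the inclusive comparison alone, probability 0 would still sample a packet whenever the random value happened to be 0, which is why the zero case is checked explicitly.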
    • vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA · da8b43c0
      Alexei Starovoitov authored
      IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
      so combine them into a single IFLA_VXLAN_COLLECT_METADATA flag.
      'flowbased' doesn't convey the real meaning of the vxlan tunnel mode.
      This mode can be used by routing, tc+bpf and ovs.
      Only ovs is strictly flow based, so 'collect metadata' is a better
      name for this tunnel mode.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rds-tcp-netns' · e03c5128
      David S. Miller authored
      Sowmini Varadhan says:
      
      ====================
      RDS-TCP: Network namespace support
      
      This patch series contains the set of changes to correctly set up
      the infra for PF_RDS sockets that use TCP as the transport in multiple
      network namespaces.
      
      Patch 1 in the series is the minimal set of changes to allow
      a single instance of RDS-TCP to run in any (i.e., init_net or other) net
      namespace.  The changes in this patch set ensure that the execution of
      'modprobe [-r] rds_tcp' sets up the kernel TCP sockets
      relative to the current netns, so that RDS applications can send/recv
      packets from that netns, and the netns can later be deleted cleanly.
      
      Patch 2 of the series further allows multiple RDS-TCP instances,
      one per network namespace. The changes in this patch allow dynamic
      creation/tear-down of RDS-TCP client and server sockets across all
      current and future namespaces.
      
      v2 changes from RFC sent out earlier:
          David Ahern comments in patch 1, net_device notifier in patch 2,
          patch 3 broken off and submitted separately.
      v3: Cong Wang review comments.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns. · 467fa153
      Sowmini Varadhan authored
      Register pernet subsys init/stop functions that will set up
      and tear down per-net RDS-TCP listen endpoints. Unregister
      pernet subsys functions on 'modprobe -r' to clean up these
      endpoints.
      
      Enable keepalive on both accept and connect socket endpoints.
      The keepalive timer expiration will ensure that client socket
      endpoints will be removed as appropriate from the netns when
      an interface is removed from a namespace.
      
      Register a device notifier callback that will clean up all
      sockets (and thus avoid the need to wait for keepalive timeout)
      when the loopback device is unregistered from the netns indicating
      that the netns is getting deleted.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net · d5a8ac28
      Sowmini Varadhan authored
      Open the sockets calling sock_create_kern() with the correct struct net
      pointer, and use that struct net pointer when verifying the
      address passed to rds_bind().
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 1ebd08a7
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-08-05
      
      This series contains updates to i40e, i40evf and e1000e.
      
      Anjali adds support for x772 devices to i40e and i40evf.  With the added
      support, x772 supports offloading of the outer UDP transmit and receive
      checksum for tunneled packets.  It also supports evicting ATR filters in
      the hardware, so the driver is updated with this new feature set.
      
      Raanan provides several fixes for e1000e.  The first rectifies the Energy
      Efficient Ethernet in Sx code so that it only applies to parts that
      actually support EEE in Sx.  The others fix whitespace and move an ICH8
      related define to the proper context, fix the ASPM locking issue reported
      by Bjorn Helgaas, and fix a workaround implementation for systime which
      could experience a large non-linear increment of the systime value when
      checking for overflow.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_dbg_ratelimited: turn into no-op when !DEBUG · d92cff89
      Jason A. Donenfeld authored
      The pr_debug family of functions turns into a no-op when -DDEBUG is not
      specified, opting instead to call "no_printk", which gets compiled to a
      no-op (but retains gcc's nice warnings about printf-style arguments).
      
      The problem with net_dbg_ratelimited is that it is defined to be a
      variant of net_ratelimited_function, which expands to essentially:
      
          if (net_ratelimit())
              pr_debug(fmt, ...);
      
      When DEBUG is not defined, then this becomes,
      
          if (net_ratelimit())
              ;
      
      This seems benign, except it isn't. Firstly, there's the obvious
      overhead of calling net_ratelimit needlessly, which does quite some
      bookkeeping for the rate limiting. Given that the pr_debug and
      net_dbg_ratelimited family of functions are sprinkled liberally through
      performance critical code, with developers assuming they'll be compiled
      out to a no-op most of the time, we certainly do not want this needless
      bookkeeping. Secondly, and most visibly, even though no debug message
      is printed when DEBUG is not defined, if there is a flood of
      invocations, dmesg winds up peppered with messages such as
      "net_ratelimit: 320 callbacks suppressed". This is because our
      aforementioned net_ratelimit() function actually prints this text in
      some circumstances. It's especially odd to see this when there isn't any
      other accompanying debug message.
      
      So, in sum, it doesn't make sense to have this function's current
      behavior, and instead it should match what every other debug family of
      functions in the kernel does with !DEBUG -- nothing.
      
      This patch replaces calls to net_dbg_ratelimited when !DEBUG with
      no_printk, keeping with the idiom of all the other debug print helpers.
      
      Also, though not strictly necessary, it guards the call with an if (0)
      so that all evaluation of any arguments is sure to be compiled out.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
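The idiom described in the message above can be tried out in userspace. This is only a sketch under stated assumptions: `dbg_ratelimited`, `fake_ratelimit` and `expensive_argument` are hypothetical names, `fprintf` stands in for the kernel print helpers, and `##__VA_ARGS__` is a GNU extension accepted by gcc and clang. The two counters make it observable that, without DEBUG, neither the rate limiter nor the arguments are evaluated.

```c
#include <stdio.h>

/* Hypothetical counters that make side effects observable. */
int ratelimit_calls;
int argument_evaluations;

/* Stand-in for net_ratelimit(): counts how often it is consulted. */
int fake_ratelimit(void)
{
    ratelimit_calls++;
    return 1;
}

/* An argument with a visible side effect. */
int expensive_argument(void)
{
    argument_evaluations++;
    return 42;
}

#ifdef DEBUG
/* With DEBUG: consult the rate limiter, then print. */
#define dbg_ratelimited(fmt, ...)                       \
    do {                                                \
        if (fake_ratelimit())                           \
            fprintf(stderr, fmt, ##__VA_ARGS__);        \
    } while (0)
#else
/* Without DEBUG: the compiler still type-checks the format arguments,
 * but the guarded call and its arguments are never executed. */
#define dbg_ratelimited(fmt, ...)                       \
    do {                                                \
        if (0)                                          \
            fprintf(stderr, fmt, ##__VA_ARGS__);        \
    } while (0)
#endif
```

Calling `dbg_ratelimited("value=%d\n", expensive_argument());` in a build without DEBUG leaves both counters at zero, while a mismatched format string would still draw a compiler warning.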
    • af_mpls: add null dev check in find_outdev · 3dcb615e
      Roopa Prabhu authored
      This patch adds a null dev check for the 'cfg->rc_via_table ==
      NEIGH_LINK_TABLE or dev_get_by_index() failed' case.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'test-bpf-next' · 3c621818
      David S. Miller authored
      Nicolas Schichan says:
      
      ====================
      test_bpf improvements
      
      Please find below the patch series with my latest changes to test_bpf.
      
      The first patch checks for unexpected NULL generated skbs before
      running the filter.
      
      The second patch adds the possibility for tests to generate fragmented
      skbs.
      
      The third patch tests LD_ABS and LD_IND on fragmented skbs.
      
      The fourth patch adds the possibility to restrict the tests being run
      by specifying the name/id/range of the test(s) to run via module
      parameters.
      
      The fifth patch tests LD_ABS and LD_IND on non fragmented skbs with
      various sizes and alignments.
      
      The sixth and final patch checks that the interpreter or JIT correctly
      resets A and X to 0.
      
      This series is against today's net-next tree.
      
      Changes in V2:
      
      * move declaration of 'ptr' in if() block in patch 2/6.
      
      * fix various typos in patch 4/6
      
      * rework default init of test_range array and cleanup exclude_test()
        return condition in patch 4/6.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • test_bpf: add tests checking that JIT/interpreter sets A and X to 0. · 86bf1721
      Nicolas Schichan authored
      It is mandatory for the JIT or interpreter to reset the A and X
      registers to 0 before running the filter. Check that this is the case
      for various ALU and JMP instructions.
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
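What such a test relies on can be shown with a toy interpreter. This is a sketch only: the opcodes, `struct insn` and `run_filter` are hypothetical and much simpler than the real BPF instruction encoding. The one property it demonstrates is that A and X start at 0, so a filter that reads either register before writing it still has a defined result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified opcodes; not the real BPF encoding. */
enum op { OP_ADD_X, OP_ADD_K, OP_TAX, OP_TXA, OP_JEQ_K, OP_RET_A, OP_RET_K };

struct insn {
    enum op  op;
    uint32_t k;   /* immediate operand */
    uint8_t  jt;  /* instructions to skip when the JEQ test is true */
    uint8_t  jf;  /* instructions to skip when the JEQ test is false */
};

/* Run a filter and return its result. */
uint32_t run_filter(const struct insn *prog, size_t len)
{
    uint32_t A = 0, X = 0;  /* both registers must start at 0 */
    size_t pc = 0;

    while (pc < len) {
        const struct insn *i = &prog[pc++];

        switch (i->op) {
        case OP_ADD_X: A += X; break;
        case OP_ADD_K: A += i->k; break;
        case OP_TAX:   X = A; break;
        case OP_TXA:   A = X; break;
        case OP_JEQ_K: pc += (A == i->k) ? i->jt : i->jf; break;
        case OP_RET_A: return A;
        case OP_RET_K: return i->k;
        }
    }
    return 0;  /* fell off the end of the program */
}
```

A one-instruction program consisting only of `OP_RET_A` returns 0 in this model, which is the kind of observation the tests in this commit make against the real interpreter and JITs.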
    • test_bpf: add more tests for LD_ABS and LD_IND. · 08fcb08f
      Nicolas Schichan authored
      This exercises the LD_ABS and LD_IND instructions for various sizes and
      alignments. This also checks that X, when used as an offset to a
      BPF_IND instruction that comes first in a filter, is correctly set to 0.
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
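The size and alignment cases mentioned above can be pictured with a simplified model of the two load forms. The helpers `load_abs` and `load_ind` are hypothetical and assume a linear packet buffer, network byte order, and no negative or ancillary offsets. An absolute load reads at offset k, and an indirect load reads at X + k.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Load `size` bytes (1, 2 or 4) at offset `off`, big-endian.
 * Returns false when the access would run past the packet. */
bool load_abs(const uint8_t *pkt, size_t len, uint32_t off, size_t size,
              uint32_t *out)
{
    uint32_t v = 0;

    if (size != 1 && size != 2 && size != 4)
        return false;
    if (off > len || size > len - off)
        return false;

    /* Byte-wise assembly works for any alignment. */
    for (size_t i = 0; i < size; i++)
        v = (v << 8) | pkt[off + i];

    *out = v;
    return true;
}

/* Indirect load: the offset is X + k. */
bool load_ind(const uint8_t *pkt, size_t len, uint32_t x, uint32_t k,
              size_t size, uint32_t *out)
{
    uint32_t off = x + k;

    if (off < x)  /* the 32-bit addition wrapped around */
        return false;
    return load_abs(pkt, len, off, size, out);
}
```

With X equal to 0, `load_ind` behaves exactly like `load_abs` at offset k, which is why an indirect load placed first in a filter is a convenient probe for the initial value of X.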
    • test_bpf: add module parameters to filter the tests to run. · d2648d4e
      Nicolas Schichan authored
      When developing the interpreter or a particular JIT, it can be
      useful to restrict the test list to a specific test or a
      particular range of tests.
      
      This patch adds the following module parameters to the test_bpf module:
      
      * test_name=<string>: only the specified named test will be run.
      
      * test_id=<number>: only the test with the specified id will be run
        (see the output of test_bpf without parameters to get the test id).
      
      * test_range=<number>,<number>: only the tests within IDs in the
        specified id range are run (see the output of test_bpf without
        parameters to get the test ids).
      
      Any invalid range, test id or test name will result in -EINVAL being
      returned and no tests being run.
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
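The selection rules listed above might be modelled roughly as follows. This is a sketch, not the module's code: `struct selection`, `select_tests` and the precedence order (id, then name, then range) are assumptions made for illustration, and a negative id or range bound or a NULL name stands for "parameter not given". Only `exclude_test` is a name taken from the series description, and its signature here is invented.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Inclusive range of test ids to run (hypothetical type). */
struct selection {
    int first;
    int last;
};

/* Resolve the three parameters into one inclusive range.
 * Returns 0 on success or -EINVAL for an unknown name, id or range. */
int select_tests(const char *const *names, int n_tests,
                 const char *test_name, int test_id,
                 int range_first, int range_last,
                 struct selection *sel)
{
    if (test_id >= 0) {
        if (test_id >= n_tests)
            return -EINVAL;
        sel->first = sel->last = test_id;
        return 0;
    }

    if (test_name) {
        for (int i = 0; i < n_tests; i++) {
            if (strcmp(names[i], test_name) == 0) {
                sel->first = sel->last = i;
                return 0;
            }
        }
        return -EINVAL;
    }

    if (range_first >= 0 || range_last >= 0) {
        if (range_first < 0 || range_last >= n_tests ||
            range_first > range_last)
            return -EINVAL;
        sel->first = range_first;
        sel->last = range_last;
        return 0;
    }

    /* No parameter given: run everything. */
    sel->first = 0;
    sel->last = n_tests - 1;
    return 0;
}

/* True when the test with this id should be skipped. */
bool exclude_test(const struct selection *sel, int id)
{
    return id < sel->first || id > sel->last;
}
```

Reducing all three parameters to a single inclusive range keeps the per-test check trivial, and rejecting bad input up front matches the stated behaviour of running no tests at all on -EINVAL.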