1. 15 Feb, 2015 9 commits
  2. 13 Feb, 2015 2 commits
  3. 12 Feb, 2015 12 commits
  4. 11 Feb, 2015 17 commits
    • Merge branch 'rco_correctness' · 777b3e93
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      net: Fixes to remote checksum offload and CHECKSUM_PARTIAL
      
      This patch set fixes a correctness problem with remote checksum
      offload, clarifies the meaning of CHECKSUM_PARTIAL, and allows
      remote checksum offload to set CHECKSUM_PARTIAL instead of
      calling csum_partial and modifying the checksum.
      
      Specifically:
        - In the GRO remote checksum path, restore the checksum after
          calling lower layer GRO functions. This is needed if the
          packet is forwarded off host with the Remote Checksum Offload
          option still present.
        - Clarify the meaning of CHECKSUM_PARTIAL in the receive path. Only
          the checksums referred to by CHECKSUM_PARTIAL and any preceding
          checksums can be considered verified.
        - Fixes to UDP tunnel GRO complete. Need to set SKB_GSO_UDP_TUNNEL_*,
          SKB_GSO_TUNNEL_REMCSUM, and skb->encapsulation for forwarding
          case.
        - Infrastructure to allow setting of CHECKSUM_PARTIAL in remote
          checksum offload. This is a potential performance benefit instead
          of calling csum_partial (potentially twice, once in GRO path
          and once in normal path). The downside of using CHECKSUM_PARTIAL
          and not actually writing the checksum is that we aren't verifying
          that the sender correctly wrote the pseudo checksum into the
          checksum field, or that the start/offset values actually point
          to a checksum. If the sender did not set up these fields correctly,
          a packet might be accepted locally, but not accepted by a peer
          when the packet is forwarded off host. Verifying these fields
          seems non-trivial, and because the fields can only be incorrect
          due to sender error and not corruption (outer checksum protects
          against that) we'll make use of CHECKSUM_PARTIAL the default. This
          behavior can be reverted as a netlink option on the encapsulation
          socket.
        - Change VXLAN and GUE to set CHECKSUM_PARTIAL in remote checksum
          offload by default; configuration hooks can revert to using
          csum_partial.
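The checksum-writing variant of remote checksum offload described above can be illustrated with a small user-space sketch. It assumes the sender left the folded pseudo-header sum in the checksum field, and that `start` and `offset` are byte positions within the inner packet buffer. The names `csum16` and `rco_fill` are hypothetical and are not kernel functions; the kernel path works on an skb and uses csum_partial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of buf[0..len), read as big-endian 16-bit words,
 * added to init and folded to 16 bits. Intended for packet-sized buffers. */
static uint16_t csum16(const uint8_t *buf, size_t len, uint32_t init)
{
    uint32_t sum = init;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                 /* end-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: sum everything from start to the end of the buffer
 * (the checksum field still holds the sender's pseudo-header sum) and store
 * the complement of that sum at offset, in network byte order. */
static void rco_fill(uint8_t *buf, size_t len, size_t start, size_t offset)
{
    uint16_t csum;

    assert(start <= offset && offset + 2 <= len);
    assert((offset - start) % 2 == 0);  /* field must be word-aligned to start */

    csum = (uint16_t)~csum16(buf + start, len - start, 0);
    buf[offset] = (uint8_t)(csum >> 8);
    buf[offset + 1] = (uint8_t)(csum & 0xff);
}
```

After `rco_fill`, a receiver that sums the same region together with the pseudo-header sum should obtain 0xffff. Setting CHECKSUM_PARTIAL instead skips this pass over the data, which is where the reduction of time in csum_partial comes from, but it also means a bad `start`/`offset` pair from the sender goes unnoticed locally.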
      
      Testing:
      
      I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200
      streams for GRE/GUE and for VXLAN. This compares before the fixes,
      the fixes with not setting checksum partial in remote checksum offload,
      and with the fixes setting checksum partial. The overall effect seems
      to be that using checksum partial is a slight performance win; perf
      definitely shows a significant reduction of time in csum_partial on
      the receive CPUs.
      
      GRE/GUE
          TCP_STREAM
            Before fixes
              9.22% TX CPU utilization
              13.57% RX CPU utilization
              9133 Mbps
            Not using checksum partial
              9.59% TX CPU utilization
              14.95% RX CPU utilization
              9132 Mbps
            Using checksum partial
              9.37% TX CPU utilization
              13.89% RX CPU utilization
              9132 Mbps
          TCP_RR
            Before fixes
              CPU utilization
              159/251/447 90/95/99% latencies
              1.1462e+06 tps
            Not using checksum partial
              92.94% CPU utilization
              158/253/445 90/95/99% latencies
              1.12988e+06 tps
            Using checksum partial
              92.78% CPU utilization
              158/250/450 90/95/99% latencies
              1.15343e+06 tps
      
      VXLAN
          TCP_STREAM
            Before fixes
              9.24% TX CPU utilization
              13.74% RX CPU utilization
              9093 Mbps
            Not using checksum partial
              9.95% TX CPU utilization
              14.66% RX CPU utilization
              9094 Mbps
            Using checksum partial
              10.24% TX CPU utilization
              13.32% RX CPU utilization
              9093 Mbps
          TCP_RR
            Before fixes
              92.91% CPU utilization
              151/241/437 90/95/99% latencies
              1.15939e+06 tps
            Not using checksum partial
              93.07% CPU utilization
              156/246/425 90/95/99% latencies
              1.1451e+06 tps
            Using checksum partial
              95.51% CPU utilization
              156/249/459 90/95/99% latencies
              1.17004e+06 tps
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gue: Use checksum partial with remote checksum offload · fe881ef1
      Tom Herbert authored
      Change remote checksum handling to set checksum partial as default
      behavior. Added an iflink parameter to configure not using
      checksum partial (calling csum_partial to update checksum).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Use checksum partial with remote checksum offload · 0ace2ca8
      Tom Herbert authored
      Change remote checksum handling to set checksum partial as default
      behavior. Added an iflink parameter to configure not using
      checksum partial (calling csum_partial to update checksum).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Infrastructure for CHECKSUM_PARTIAL with remote checksum offload · 15e2396d
      Tom Herbert authored
      This patch adds infrastructure so that remote checksum offload can
      set CHECKSUM_PARTIAL instead of calling csum_partial and writing
      the modified checksum field.
      
      Add skb_remcsum_adjust_partial function to set an skb for using
      CHECKSUM_PARTIAL with remote checksum offload.  Changed
      skb_remcsum_process and skb_gro_remcsum_process to take a boolean
      argument to indicate if checksum partial can be set or the
      checksum needs to be modified using the normal algorithm.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Use more bit fields in napi_gro_cb · baa32ff4
      Tom Herbert authored
      This patch moves the free and same_flow fields to be bit fields
      (2 and 1 bit sized respectively). This frees up some space for u16's.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: Set SKB_GSO_UDP_TUNNEL* in UDP GRO path · 6db93ea1
      Tom Herbert authored
      Properly set GSO types and skb->encapsulation in the UDP tunnel GRO
      complete so that packets are properly represented for GSO. This sets
      SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM depending on whether
      non-zero checksums were received, and sets SKB_GSO_TUNNEL_REMCSUM if
      the remote checksum option was processed.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Clarify meaning of CHECKSUM_PARTIAL for receive path · 6edec0e6
      Tom Herbert authored
      The current meaning of CHECKSUM_PARTIAL for validating checksums
      is that _all_ checksums in the packet are considered valid.
      However, in the manner that CHECKSUM_PARTIAL is set only the checksum
      at csum_start+csum_offset and any preceding checksums may
      be considered valid. If there are checksums in the packet after
      csum_offset it is possible they have not been verified.
      
      This patch changes CHECKSUM_PARTIAL logic in skb_csum_unnecessary and
      __skb_gro_checksum_validate_needed to only consider checksums
      referring to csum_start and any preceding checksums (with starting
      offset before csum_start) to be verified.
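The narrowed rule can be sketched as a small predicate. This is a user-space illustration under stated assumptions: `struct pkt_meta`, `enum csum_state` and `csum_verified_at` are hypothetical stand-ins and do not reproduce the kernel's sk_buff fields or helper signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical checksum states, loosely modelled on skb->ip_summed. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_PARTIAL };

struct pkt_meta {
    enum csum_state state;
    int csum_start;   /* offset at which partial checksum coverage begins */
};

/* Returns true if a checksum whose coverage starts at proto_off may be
 * treated as already verified. Under the partial state, only checksums
 * starting at or before csum_start qualify; later ones may be unchecked. */
static bool csum_verified_at(const struct pkt_meta *m, int proto_off)
{
    if (m->state == CSUM_UNNECESSARY)
        return true;
    return m->state == CSUM_PARTIAL && proto_off <= m->csum_start;
}
```

With `csum_start` at 50, a header starting at offset 20 or 50 counts as verified, while one starting at offset 70 still needs its own validation.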
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix remcsum in GRO path to not change packet · 26c4f7da
      Tom Herbert authored
      Remote checksum offload processing is currently the same for both
      the GRO and non-GRO path. When the remote checksum offload option
      is encountered, the checksum field referred to is modified in
      the packet. So in the GRO case, the packet is modified in the
      GRO path and then the operation is skipped when the packet goes
      through the normal path based on skb->remcsum_offload. There is
      a problem in that the packet may be modified in the GRO path, but
      then forwarded off host still containing the remote checksum option.
      A remote host will again perform RCO but now the checksum verification
      will fail since GRO RCO already modified the checksum.
      
      To fix this, we ensure that GRO restores a packet to its original
      state before returning. In this model, when GRO processes a remote
      checksum option it still changes the checksum per the algorithm
      but on return from lower layer processing the checksum is restored
      to its original value.
      
      In this patch we define a gro_remcsum structure, which is passed
      to skb_gro_remcsum_process to save offset and delta for the checksum
      being changed. After lower layer processing, skb_gro_remcsum_cleanup
      is called to restore the checksum before returning from GRO.
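The save-and-restore pattern can be sketched in user space as follows. This is a simplified illustration, not the kernel code: `struct remcsum_save`, `remcsum_apply` and `remcsum_cleanup` are hypothetical names, and the delta is kept as a plain wrapping 16-bit difference, whereas the kernel works in checksum arithmetic on an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of the saved state: where the checksum field was
 * rewritten and by how much, so the change can be undone later. */
struct remcsum_save {
    size_t offset;
    uint16_t delta;   /* new value minus original value, modulo 2^16 */
    int active;
};

static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Overwrite the checksum field and remember how to reverse the change. */
static void remcsum_apply(uint8_t *pkt, size_t offset, uint16_t new_csum,
                          struct remcsum_save *s)
{
    s->offset = offset;
    s->delta = (uint16_t)(new_csum - get16(pkt + offset));
    s->active = 1;
    put16(pkt + offset, new_csum);
}

/* Restore the original field after lower-layer processing has finished. */
static void remcsum_cleanup(uint8_t *pkt, struct remcsum_save *s)
{
    if (!s->active)
        return;
    put16(pkt + s->offset, (uint16_t)(get16(pkt + s->offset) - s->delta));
    s->active = 0;
}
```

Calling `remcsum_cleanup` a second time is a no-op, so a caller can invoke it unconditionally on every return path.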
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add missing initialization in validate_and_copy_set_tun() · 13101602
      Geert Uytterhoeven authored
      net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’:
      net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function
      
      If ipv4_tun_from_nlattr() returns a different positive value than
      OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and
      validate_and_copy_set_tun() may return an undefined value instead of a
      zero success indicator. Initialize err to zero to fix this.
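The shape of this bug can be shown with a reduced sketch. All names here (`parse_attr`, `copy_opts`, `validate_and_copy`, the `ATTR_*` values) are hypothetical stand-ins for illustration and are not the openvswitch functions or constants.

```c
#include <assert.h>

/* Hypothetical attribute kinds returned by the parser on success. */
enum { ATTR_PLAIN = 1, ATTR_WITH_OPTS = 2 };

/* Stand-in parser: returns a positive kind, or a negative errno-style code. */
static int parse_attr(int kind)
{
    return kind;
}

/* Stand-in for the option validation that only one kind needs. */
static int copy_opts(void)
{
    return 0;
}

static int validate_and_copy(int kind)
{
    int err = 0;   /* without this initializer, the ATTR_PLAIN path
                    * would return an indeterminate value */
    int type = parse_attr(kind);

    if (type < 0)
        return type;

    if (type == ATTR_WITH_OPTS) {
        err = copy_opts();
        if (err < 0)
            return err;
    }

    return err;    /* only assigned above on the ATTR_WITH_OPTS path */
}
```

Initializing at the declaration keeps every return path well defined even if further attribute kinds are added later.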
      
      Fixes: 1dd144cf ("openvswitch: Support VXLAN Group Policy extension")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Reset key metadata for packet execution. · b35725a2
      Pravin B Shelar authored
      The userspace packet execute command passes down the flow key for a
      given packet, but userspace may omit parameters whose value is zero.
      The kernel therefore needs to initialize the key metadata to zero.
      
      Fixes: 07148121 ("openvswitch: Eliminate memset() from flow_extract.")
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • treewide: Remove unnecessary SSB_DEVTABLE_END macro · 673e2baa
      Joe Perches authored
      Use the normal {} instead of a macro to terminate an array.
      
      Remove the macro too.
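A sentinel-terminated table of this kind can be sketched as below. The struct, the table contents and `table_count` are hypothetical and only illustrate the convention. The sketch writes the terminator as `{ 0 }` for portability, since the empty `{}` initializer is a GNU C extension that became standard only in C23.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device ID entry; real tables carry more fields. */
struct core_id {
    uint16_t vendor;
    uint16_t id;
};

static const struct core_id demo_table[] = {
    { 0x4243, 0x0812 },
    { 0x4243, 0x0819 },
    { 0 },   /* all-zero terminator; no dedicated macro is needed */
};

/* Walk the table until the all-zero terminator entry is reached. */
static size_t table_count(const struct core_id *t)
{
    size_t n = 0;

    while (t[n].vendor != 0 || t[n].id != 0)
        n++;
    return n;
}
```

Because the terminator is just a zero-initialized element, consumers only need to agree on which fields being zero marks the end of the table.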
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • treewide: Remove unnecessary BCMA_CORETABLE_END macro · f7219b52
      Joe Perches authored
      Use the normal {} instead of a macro to terminate an array.
      
      Remove the macro too.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: rds_cong_queue_updates needs to defer the congestion update transmission · 80ad0d4a
      Sowmini Varadhan authored
      When the RDS transport is TCP, we cannot inline the call to rds_send_xmit
      from rds_cong_queue_update because
      (a) we are already holding the sock_lock in the recv path, and
          will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock
          lock
      (b) cong_queue_update does an irqsave on the rds_cong_lock, and this
          will trigger warnings (for a good reason) from functions called
          out of sock_lock.
      
      This patch reverts the change introduced by
      2fa57129 ("RDS: Bypass workqueue when queueing cong updates").
      
      The patch has been verified for both RDS/TCP as well as RDS/RDMA
      to ensure that there are no regressions for either transport:
       - for verification of RDS/TCP a client-server unit-test was used,
         with the server blocked in gdb and thus unable to drain its rcvbuf,
         eventually triggering a RDS congestion update.
       - for RDS/RDMA, the standard IB regression tests were used
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Partial checksum only UDP packets · bf250a1f
      Vlad Yasevich authored
      ip6_append_data is used by other protocols and some of them can't
      be partially checksummed. Only partially checksum UDP packets.
      
      Fixes: 32dce968 (ipv6: Allow for partial checksums on non-ufo packets)
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Tested-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 4a3046d6
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains two small Netfilter updates for your
      net-next tree, they are:
      
      1) Add ebtables support to nft_compat, from Arturo Borrero.
      
      2) Fix missing validation of the SET_ID attribute in the lookup
         expressions, from Patrick McHardy.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6 · 73b4f63a
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "Highlights this time around include:
      
         - A thrashing of SubmittingPatches to bring it out of the "send
           everything to Linus" era of kernel development.
      
         - A new document on completions from Nicholas McGuire
      
         - Lots of typo fixes, formatting improvements, corrections, build
           fixes, and more"
      
      * tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (35 commits)
        Documentation: Fix the wrong command `echo -1 > set_ftrace_pid` for cleaning the filter.
        can-doc: Fixed a wrong filepath in can.txt
        Documentation: Fix trivial typo in comment.
        kgdb,docs: Fix typo and minor style issues
        Documentation: add description for FTRACE probe status
        doc: brief user documentation for completion
        Documentation/misc-devices/mei: Fix indentation of embedded code.
        Documentation/misc-devices/mei: Fix indentation of enumeration.
        Documentation/misc-devices/mei: Fix spacing around parentheses.
        Documentation/misc-devices/mei: Fix formatting of headings.
        Documentation: devicetree: Fix double words in Documentation/devicetree
        Documentation: mm: Fix typo in vm.txt
        lockstat: Add documentation on contention and contenting points
        Documentation: fix blackfin gptimers-example build errors
        Fixes column alignment in table of contents entry 1.9 in Documentation/filesystems/proc.txt
        CodingStyle: enable emacs display of trailing whitespace
        DocBook: Do not exceed argument list limit
        gpio: board.txt: Fix the gpio name example
        Documentation/SubmittingPatches: unify whitespace/tabs for the DCO
        MAINTAINERS: Add the docs-next git tree to the maintainer entry
        ...
    • Merge branch 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration · bfe9183f
      Linus Torvalds authored
      Pull mailbox framework updates from Jassi Brar.
      
      * 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: Add Altera mailbox driver
        mailbox: check for bit set before polling
        Mailbox: Fix return value check in pcc_init()