1. 15 Apr, 2014 9 commits
  2. 14 Apr, 2014 26 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 00cbc3dc
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains three Netfilter fixes for your net tree,
      they are:
      
      * Fix missing generation sequence initialization which results in a splat
        if lockdep is enabled, it was introduced in the recent works to improve
        nf_conntrack scalability, from Andrey Vagin.
      
      * Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
        disabled otherwise this crashes due to a double release, from Andrey
        Vagin.
      
      * Fix nf_tables cmp fast in big endian, from Patrick McHardy.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00cbc3dc
    • Vlad Yasevich's avatar
      net: Start with correct mac_len in skb_network_protocol · 1e785f48
      Vlad Yasevich authored
      Sometimes, when the packet arrives at skb_mac_gso_segment()
      its skb->mac_len already accounts for some of the mac lenght
      headers in the packet.  This seems to happen when forwarding
      through and OpenSSL tunnel.
      
      When we start looking for any vlan headers in skb_network_protocol()
      we seem to ignore any of the already known mac headers and start
      with an ETH_HLEN.  This results in an incorrect offset, dropped
      TSO frames and general slowness of the connection.
      
      We can start counting from the known skb->mac_len
      and return at least that much if all mac level headers
      are known and accounted for.
      
      Fixes: 53d6471c (net: Account for all vlan headers in skb_mac_gso_segment)
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Daniel Borkman <dborkman@redhat.com>
      Tested-by: default avatarMartin Filip <nexus+kernel@smoula.net>
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e785f48
    • Daniel Borkmann's avatar
      Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" · 362d5204
      Daniel Borkmann authored
      This reverts commit ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management
      to reflect real state of the receiver's buffer") as it introduced a
      serious performance regression on SCTP over IPv4 and IPv6, though a not
      as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.
      
      Current state:
      
      [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 17:56:21 GMT
      Connecting to host 192.168.241.3, port 5201
            Cookie: Lab200slot2.1397238981.812898.548918
      [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
      [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
      [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
      [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
      [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
      [  4]   6.21-6.21   sec  0.00 Bytes    0.00 bits/sec
      [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
      [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
      [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
      [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
      [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
      [  4]  16.79-16.79  sec  0.00 Bytes    0.00 bits/sec
      [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
      (etc)
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 19:08:41 GMT
      Connecting to host 2001:db8:0:f101::1, port 5201
            Cookie: Lab200slot2.1397243321.714295.2b3f7c
      [  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
      [  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
      [  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
      [  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
      [  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
      [  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
      [  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
      [  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
      [  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
      [  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
      [  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
      [  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
      (etc)
      
      After patch:
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
      Time: Mon, 14 Apr 2014 16:40:48 GMT
      Connecting to host 192.168.240.3, port 5201
            Cookie: Lab200slot2.1397493648.413274.65e131
      [  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
      [  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
      [  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
      [  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec
      
      With the reverted patch applied, the SCTP/IPv4 performance is back
      to normal on latest upstream for IPv4 and IPv6 and has same throughput
      as 3.4.2 test kernel, steady and interval reports are smooth again.
      
      Fixes: ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
      Reported-by: default avatarPeter Butler <pbutler@sonusnet.com>
      Reported-by: default avatarDongsheng Song <dongsheng.song@gmail.com>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Tested-by: default avatarPeter Butler <pbutler@sonusnet.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
      Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      362d5204
    • Steve Wise's avatar
      cxgb4: Save the correct mac addr for hw-loopback connections in the L2T · bfae2324
      Steve Wise authored
      Hardware needs the local device mac address to support hw loopback for
      rdma loopback connections.
      Signed-off-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bfae2324
    • Daniel Borkmann's avatar
      net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W · 8c482cdc
      Daniel Borkmann authored
      While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has
      been wrongly decoded by commit a8fc9277 ("sk-filter: Add ability to
      get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS
      although it should have been decoded as BPF_LD|BPF_W|BPF_ABS.
      
      In practice, this should not have much side-effect though, as such
      conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse
      operation PR_GET_SECCOMP will only return the current seccomp mode, but
      not the filter itself. Since the transition to the new BPF infrastructure,
      it's also not used anymore, so we can simply remove this as it's
      unreachable.
      
      Fixes: a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c482cdc
    • Daniel Borkmann's avatar
      seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF · 2eac7648
      Daniel Borkmann authored
      Linus reports that on 32-bit x86 Chromium throws the following seccomp
      resp. audit log messages:
      
        audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
      syscall=172 compat=0 ip=0xb2dd9852 code=0x30000
      
        audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
      compat=0 ip=0xb2dd9852 code=0x50000
      
      These audit messages are being triggered via audit_seccomp() through
      __secure_computing() in seccomp mode (BPF) filter with seccomp return
      codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
      during filter runtime. Moreover, Linus reports that x86_64 Chromium
      seems fine.
      
      The underlying issue that explains this is that the implementation of
      populate_seccomp_data() is wrong. Our seccomp data structure sd that
      is being shared with user ABI is:
      
        struct seccomp_data {
          int nr;
          __u32 arch;
          __u64 instruction_pointer;
          __u64 args[6];
        };
      
      Therefore, a simple cast to 'unsigned long *' for storing the value of
      the syscall argument via syscall_get_arguments() is just wrong as on
      32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
      at wrong offsets in args[] member, and thus i) could leak stack memory
      to user space and ii) tampers with the logic of seccomp BPF programs
      that read out and check for syscall arguments:
      
        syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
      
      Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
      test machine through slow ssh X forwarding, but it fixes the issue on
      my side. So fix it up by storing args in type correct variables, gcc
      is clever and optimizes the copy away in other cases, e.g. x86_64.
      
      Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
      Reported-and-bisected-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2eac7648
    • David S. Miller's avatar
      Merge branch 'qlcnic' · 14ed4a5b
      David S. Miller authored
      Shahed Shaikh says:
      
      ====================
      qlcnic: Bug fixes
      
      This patch series contains following bug fixes -
      
      * Send INIT_NIC_FUNC mailbox command as first mailbox
      * Fix a panic because of uninitialized delayed_work.
      * Fix inconsistent calculation of max rings count.
      * Fix PVID configuration issue. Driver needs to clear older
        PVID before adding new one.
      * Fix QLogic application/driver interface by packing vNIC information
        array.
      * Fix a crash when user tries to disable SR-IOV while VFs are
        still assigned to VMs.
      
      Please apply to net.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14ed4a5b
    • Manish Chopra's avatar
      qlcnic: Do not disable SR-IOV when VFs are assigned to VMs · 696f1943
      Manish Chopra authored
      o While disabling SR-IOV when VFs are assigned to VMs causes host crash
        so return -EPERM when user request to disable SR-IOV using pci sysfs in
        case of VFs are assigned to VMs.
      Signed-off-by: default avatarManish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: default avatarShahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      696f1943
    • Jitendra Kalsaria's avatar
      qlcnic: Fix QLogic application/driver interface for virtual NIC configuration · 4f030227
      Jitendra Kalsaria authored
      o Application expect vNIC number as the array index but driver interface
      return configuration in array index form.
      
      o Pack the vNIC information array in the buffer such that application can
      access it using vNIC number as the array index.
      Signed-off-by: default avatarJitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Signed-off-by: default avatarShahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f030227
    • Jitendra Kalsaria's avatar
      qlcnic: Fix PVID configuration on eSwitch port. · a78b6da8
      Jitendra Kalsaria authored
      Clear older PVID before adding a newer PVID to the eSwicth port
      Signed-off-by: default avatarJitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Signed-off-by: default avatarShahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a78b6da8
    • Shahed Shaikh's avatar
      qlcnic: Fix max ring count calculation · 7b546842
      Shahed Shaikh authored
      Do not read max rings count from qlcnic_get_nic_info(). Use driver defined
      values for 82xx adapters. In case of 83xx adapters, use minimum of firmware
      provided and driver defined values.
      Signed-off-by: default avatarShahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b546842
    • Sucheta Chakraborty's avatar
      qlcnic: Fix to send INIT_NIC_FUNC as first mailbox. · 4d52e1e8
      Sucheta Chakraborty authored
      o INIT_NIC_FUNC should be first mailbox sent. Sending DCB capability and
        parameter query commands after that command.
      Signed-off-by: default avatarSucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: default avatarShahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d52e1e8
    • Sucheta Chakraborty's avatar
      qlcnic: Fix panic due to uninitialzed delayed_work struct in use. · 463518a0
      Sucheta Chakraborty authored
      o AEN event was being received before initializing delayed_work struct
        and handlers for it. This was resulting in crash. This patch fixes it.
      Signed-off-by: default avatarSucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: default avatarShahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      463518a0
    • David S. Miller's avatar
      Merge branch 'be2net' · 677df2f4
      David S. Miller authored
      Sathya Perla says:
      
      ====================
      be2net: patch set
      
      Patch 1/2 is a v2 of a patch that was submitted earlier (as a part of a
      different patch-set). v2 incorporates a suggestion given by David Laight
      for how long to poll for pending TX completions while disabling a device.
      
      Patch 2/2 fixes a crash in be_remove()->be_close()
      path after be2net has aborted an EEH error recovery
      due to a permanant failure.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      677df2f4
    • Kalesh AP's avatar
      be2net: Fix invocation of be_close() after be_clear() · e1ad8e33
      Kalesh AP authored
      In the EEH error recovery path, when a permanent failure occurs,
      we clean up adapter structure (i.e. destroy queues etc) by calling
      be_clear() and return PCI_ERS_RESULT_DISCONNECT.
      After this the stack tries to remove device from bus and calls
      be_remove() which invokes netdev_unregister()->be_close().
      be_close() operating on destroyed queues results in a
      NULL dereference.
      
      This patch fixes this problem by introducing a flag to keep track
      of the setup state.
      Signed-off-by: default avatarKalesh AP <kalesh.purayil@emulex.com>
      Signed-off-by: default avatarSathya Perla <sathya.perla@emulex.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1ad8e33
    • Vasundhara Volam's avatar
      be2net: Fix to reap TX compls till HW doesn't respond for some time · 1a3d0717
      Vasundhara Volam authored
      be_close() currently waits for a max of 200ms to receive all pending
      TX compls. This timeout value was roughly calculated based on 10G
      transmission speeds and the TX queue depth. This timeout may not be
      enough when the link is operating at lower speeds or in multi-channel/SR-IOV
      configs with TX-rate limiting setting.
      
      It is hard to calculate a "proper timeout value" that works in all
      configurations.  This patch solves this problem by continuing to reap
      TX completions till the HW is completely silent for a period of 10ms or
      a HW error is detected.
      
      v2: implements the new scheme (as suggested by David Laight) instead of
      just waiting longer than 200ms for reaping all completions.
      Signed-off-by: default avatarVasundhara Volam <vasundhara.volam@emulex.com>
      Signed-off-by: default avatarSathya Perla <sathya.perla@emulex.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a3d0717
    • Amir Vadai's avatar
      net/mlx4_core: Defer VF initialization till PF is fully initialized · e1a5ddc5
      Amir Vadai authored
      Fix in commit [1] is not sufficient since a deferred VF initialization
      could happen after pci_enable_sriov() is finished, but before the PF is
      fully initialized.
      Need to prevent VFs from initializing till the PF is fully ready and
      comm channel is operational.
      
      [1] - 97989356 "net/mlx4_core: mlx4_init_slave() shouldn't access comm
            channel before PF is ready"
      
      CC: Stuart Hayes <Stuart_Hayes@Dell.com>
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1a5ddc5
    • Daniel J Blueman's avatar
      bnx2: Don't build unused suspend/resume functions not enabled · 77d149c4
      Daniel J Blueman authored
      When CONFIG_PM_SLEEP isn't enabled, bnx2_suspend/resume are unused; don't
      build them when they aren't used.
      Signed-off-by: default avatarDaniel J Blueman <daniel@quora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      77d149c4
    • Eric Dumazet's avatar
      ipv6: Limit mtu to 65575 bytes · 30f78d8e
      Eric Dumazet authored
      Francois reported that setting big mtu on loopback device could prevent
      tcp sessions making progress.
      
      We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.
      
      We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
      
      Tested:
      
      ifconfig lo mtu 70000
      netperf -H ::1
      
      Before patch : Throughput :   0.05 Mbits
      
      After patch : Throughput : 35484 Mbits
      Reported-by: default avatarFrancois WELLENREITER <f.wellenreiter@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30f78d8e
    • Patrick McHardy's avatar
      netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4 · b855d416
      Patrick McHardy authored
      nft_cmp_fast is used for equality comparisions of size <= 4. For
      comparisions of size < 4 byte a mask is calculated that is applied to
      both the data from userspace (during initialization) and the register
      value (during runtime). Both values are stored using (in effect) memcpy
      to a memory area that is then interpreted as u32 by nft_cmp_fast.
      
      This works fine on little endian since smaller types have the same base
      address, however on big endian this is not true and the smaller types
      are interpreted as a big number with trailing zero bytes.
      
      The mask therefore must not include the lower bytes, but the higher bytes
      on big endian. Add a helper function that does a cpu_to_le32 to switch
      the bytes on big endian. Since we're dealing with a mask of just consequitive
      bits, this works out fine.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      b855d416
    • Andrey Vagin's avatar
      netfilter: nf_conntrack: initialize net.ct.generation · ee214d54
      Andrey Vagin authored
      [  251.920788] INFO: trying to register non-static key.
      [  251.921386] the code is fine but needs lockdep annotation.
      [  251.921386] turning off the locking correctness validator.
      [  251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294
      [  251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  251.921386]  0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd
      [  251.921386]  ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0
      [  251.921386]  ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172
      [  251.921386] Call Trace:
      [  251.921386]  [<ffffffff816b7ecd>] dump_stack+0x45/0x56
      [  251.921386]  [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c
      [  251.921386]  [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40
      [  251.921386]  [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
      [  251.921386]  [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
      [  251.921386]  [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40
      [  251.921386]  [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
      [  251.921386]  [<ffffffff810c7272>] lock_acquire+0xa2/0x120
      [  251.921386]  [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
      [  251.921386]  [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack]
      [  251.921386]  [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
      [  251.921386]  [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
      [  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
      [  251.921386]  [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0
      [  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
      [  251.921386]  [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190
      [  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
      [  251.921386]  [<ffffffff815e98f2>] ip_output+0x92/0x100
      [  251.921386]  [<ffffffff815e8df9>] ip_local_out+0x29/0x90
      [  251.921386]  [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0
      [  251.921386]  [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0
      [  251.921386]  [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960
      [  251.921386]  [<ffffffff81602d82>] tcp_connect+0x812/0x960
      [  251.921386]  [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70
      [  251.921386]  [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0
      [  251.921386]  [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470
      [  251.921386]  [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330
      [  251.921386]  [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0
      [  251.921386]  [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
      [  251.921386]  [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0
      [  251.921386]  [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50
      [  251.921386]  [<ffffffff8158b157>] SYSC_connect+0xe7/0x120
      [  251.921386]  [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0
      [  251.921386]  [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
      [  251.921386]  [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
      [  251.921386]  [<ffffffff8158c36e>] SyS_connect+0xe/0x10
      [  251.921386]  [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b
      [  312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333)
      [  312.015097] INFO: Stall ended before state dump start
      
      Fixes: 93bb0ceb ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      ee214d54
    • Mathias Krause's avatar
      filter: prevent nla extensions to peek beyond the end of the message · 05ab8f26
      Mathias Krause authored
      The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
      for a minimal message length before testing the supplied offset to be
      within the bounds of the message. This allows the subtraction of the nla
      header to underflow and therefore -- as the data type is unsigned --
      allowing far to big offset and length values for the search of the
      netlink attribute.
      
      The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
      also wrong. It has the minuend and subtrahend mixed up, therefore
      calculates a huge length value, allowing to overrun the end of the
      message while looking for the netlink attribute.
      
      The following three BPF snippets will trigger the bugs when attached to
      a UNIX datagram socket and parsing a message with length 1, 2 or 3.
      
       ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
       | ld	#0x87654321
       | ldx	#42
       | ld	#nla
       | ret	a
       `---
      
       ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
       | ld	#0x87654321
       | ldx	#42
       | ld	#nlan
       | ret	a
       `---
      
       ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
       | ; (needs a fake netlink header at offset 0)
       | ld	#0
       | ldx	#42
       | ld	#nlan
       | ret	a
       `---
      
      Fix the first issue by ensuring the message length fulfills the minimal
      size constrains of a nla header. Fix the second bug by getting the math
      for the remainder calculation right.
      
      Fixes: 4738c1db ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
      Fixes: d214c753 ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      05ab8f26
    • Julian Anastasov's avatar
      ipv4: return valid RTA_IIF on ip route get · 91146153
      Julian Anastasov authored
      Extend commit 13378cad
      ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
      RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
      which is displayed as 'iif *'.
      
      inet_iif is not appropriate to use because skb_iif is not set.
      Use the skb->dev->ifindex instead.
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91146153
    • Thomas Petazzoni's avatar
      Revert "net: mvneta: fix usage as a module on RGMII configurations" · cc6ca302
      Thomas Petazzoni authored
      This reverts commit e3a8786c. While
      this commit allows to use the mvneta driver as a module on some
      configurations, it breaks other configurations even if mvneta is used
      built-in.
      
      This breakage is due to the fact that on some RGMII platforms, the PCS
      bit has to be set, and on some other platforms, it has to be
      cleared. At the moment, we lack informations to know exactly the
      significance of this bit (the datasheet only says "enables PCS"), and
      so we can't produce a patch that will work on all platforms at this
      point. And since this change is breaking the network completely for
      many users, it's much better to revert it for now. We'll come back
      later with a proper fix that takes into account all platforms.
      
      Basically:
      
       * Armada XP GP is configured as RGMII-ID, and needs the PCS bit to be
         set.
       * Armada 370 Mirabox is configured as RGMII-ID, and needs the PCS bit
         to be cleared.
      
      And at the moment, we don't know how to make the distinction between
      those two cases. One hint is that the Armada XP GP appears in fact to
      be using a QSGMII connection with the PHY (Quad-SGMII), but
      configuring it as SGMII doesn't work, while RGMII-ID works. This needs
      more investigation, but in the mean time, let's unbreak the network
      for all those users.
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Reported-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Reported-by: default avatarAlexander Reuter <Alexander.Reuter@gmx.net>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=73401
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc6ca302
    • Wei Yang's avatar
      net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one() · befdf897
      Wei Yang authored
      pci_match_id() just match the static pci_device_id, which may return NULL if
      someone binds the driver to a device manually using
      /sys/bus/pci/drivers/.../new_id.
      
      This patch wrap up a helper function __mlx4_remove_one() which does the tear
      down function but preserve the drv_data. Functions like
      mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
      releasing drvdata.
      
      Fixes: 97a5221f "net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset".
      
      CC: Bjorn Helgaas <bhelgaas@google.com>
      CC: Amir Vadai <amirv@mellanox.com>
      CC: Jack Morgenstein <jackm@dev.mellanox.co.il>
      CC: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarWei Yang <weiyang@linux.vnet.ibm.com>
      Acked-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      befdf897
    • Wang, Xiaoming's avatar
      net: ipv4: current group_info should be put after using. · b04c4619
      Wang, Xiaoming authored
      Plug a group_info refcount leak in ping_init.
      group_info is only needed during initialization and
      the code failed to release the reference on exit.
      While here move grabbing the reference to a place
      where it is actually needed.
      Signed-off-by: default avatarChuansheng Liu <chuansheng.liu@intel.com>
      Signed-off-by: default avatarZhang Dongxing <dongxing.zhang@intel.com>
      Signed-off-by: default avatarxiaoming wang <xiaoming.wang@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b04c4619
  3. 13 Apr, 2014 5 commits
    • Linus Torvalds's avatar
      Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 321d03c8
      Linus Torvalds authored
      Pull misc kbuild changes from Michal Marek:
       "Here is the non-critical part of kbuild:
         - One bogus coccinelle check removed, one check fixed not to suggest
           the obsolete PTR_RET macro
         - scripts/tags.sh does not index the generated *.mod.c files
         - new objdiff tool to list differences between two versions of an
           object file
         - A fix for scripts/bootgraph.pl"
      
      * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        scripts/coccinelle: Use PTR_ERR_OR_ZERO
        scripts/bootgraph.pl: Add graphic header
        scripts: objdiff: detect object code changes between two commits
        Coccicheck: Remove memcpy to struct assignment test
        scripts/tags.sh: Ignore *.mod.c
      321d03c8
    • Mikulas Patocka's avatar
      sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue · fd1232b2
      Mikulas Patocka authored
      This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
      returns QUEUE FULL status.
      
      When the controller encounters an error (including QUEUE FULL or BUSY
      status), it aborts all not yet submitted requests in the function
      sym_dequeue_from_squeue.
      
      This function aborts them with DID_SOFT_ERROR.
      
      If the disk has full tag queue, the request that caused the overflow is
      aborted with QUEUE FULL status (and the scsi midlayer properly retries
      it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
      the following requests with DID_SOFT_ERROR --- for them, the midlayer
      does just a few retries and then signals the error up to sd.
      
      The result is that disk returning QUEUE FULL causes request failures.
      
      The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
      (rebranded ST336607LC) with command queue 48 or 64 tags.  The disk has
      64 tags, but under some access patterns it return QUEUE FULL when there
      are less than 64 pending tags.  The SCSI specification allows returning
      QUEUE FULL anytime and it is up to the host to retry.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fd1232b2
    • Paul Mackerras's avatar
      powerpc: Don't try to set LPCR unless we're in hypervisor mode · 18aa0da3
      Paul Mackerras authored
      Commit 8f619b54 ("powerpc/ppc64: Do not turn AIL (reloc-on
      interrupts) too early") added code to set the AIL bit in the LPCR
      without checking whether the kernel is running in hypervisor mode.  The
      result is that when the kernel is running as a guest (i.e., under
      PowerKVM or PowerVM), the processor takes a privileged instruction
      interrupt at that point, causing a panic.  The visible result is that
      the kernel hangs after printing "returning from prom_init".
      
      This fixes it by checking for hypervisor mode being available before
      setting LPCR.  If we are not in hypervisor mode, we enable relocation-on
      interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18aa0da3
    • Davidlohr Bueso's avatar
      futex: update documentation for ordering guarantees · d7e8af1a
      Davidlohr Bueso authored
      Commits 11d4616b ("futex: revert back to the explicit waiter
      counting code") and 69cd9eba ("futex: avoid race between requeue and
      wake") changed some of the finer details of how we think about futexes.
      One was a late fix and the other a consequence of overlooking the whole
      requeuing logic.
      
      The first change caused our documentation to be incorrect, and the
      second made us aware that we need to explicitly add more details to it.
      Signed-off-by: default avatarDavidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d7e8af1a
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 454fd351
      Linus Torvalds authored
      Pull yet more networking updates from David Miller:
      
       1) Various fixes to the new Redpine Signals wireless driver, from
          Fariya Fatima.
      
       2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
          Dmitry Petukhov.
      
       3) UFO and TSO packets differ in whether they include the protocol
          header in gso_size, account for that in skb_gso_transport_seglen().
         From Florian Westphal.
      
       4) If VLAN untagging fails, we double free the SKB in the bridging
          output path.  From Toshiaki Makita.
      
       5) Several call sites of sk->sk_data_ready() were referencing an SKB
          just added to the socket receive queue in order to calculate the
          second argument via skb->len.  This is dangerous because the moment
          the skb is added to the receive queue it can be consumed in another
          context and freed up.
      
          It turns out also that none of the sk->sk_data_ready()
          implementations even care about this second argument.
      
          So just kill it off and thus fix all these use-after-free bugs as a
          side effect.
      
       6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
      
       7) pktgen needs to do locking properly for LLTX devices, from Daniel
          Borkmann.
      
       8) xen-netfront driver initializes TX array entries in RX loop :-) From
          Vincenzo Maffione.
      
       9) After refactoring, some tunnel drivers allow a tunnel to be
          configured on top itself.  Fix from Nicolas Dichtel.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
        vti: don't allow to add the same tunnel twice
        gre: don't allow to add the same tunnel twice
        drivers: net: xen-netfront: fix array initialization bug
        pktgen: be friendly to LLTX devices
        r8152: check RTL8152_UNPLUG
        net: sun4i-emac: add promiscuous support
        net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
        net: ipv6: Fix oif in TCP SYN+ACK route lookup.
        drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
        drivers: net: cpsw: discard all packets received when interface is down
        net: Fix use after free by removing length arg from sk_data_ready callbacks.
        Drivers: net: hyperv: Address UDP checksum issues
        Drivers: net: hyperv: Negotiate suitable ndis version for offload support
        Drivers: net: hyperv: Allocate memory for all possible per-pecket information
        bridge: Fix double free and memory leak around br_allowed_ingress
        bonding: Remove debug_fs files when module init fails
        i40evf: program RSS LUT correctly
        i40evf: remove open-coded skb_cow_head
        ixgb: remove open-coded skb_cow_head
        igbvf: remove open-coded skb_cow_head
        ...
      454fd351