1. 30 Sep, 2016 6 commits
    • sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock · 1cceda78
      Xin Long authored
      When sctp dumps all of ep->assocs, it needs to call lock_sock() first,
      but it currently does so inside rcu_read_lock(). Since lock_sock() may
      sleep, this breaks the RCU read-side critical section.
      
      This patch gets and holds one sock while traversing the list. After
      leaving rcu_read_lock(), it locks and dumps that sock, then traverses
      the list again to get the next one, until all sctp socks are dumped.
      
      For sctp_diag_dump_one, it fixes this issue by holding the asoc and
      moving cb() out of rcu_read_lock() in sctp_transport_lookup_process().
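
      The hold-then-lock pattern described above can be modelled in user-space C. This is a minimal sketch, not the sctp_diag code: `fake_rcu_read_lock()`, `fake_rcu_read_unlock()`, `fake_lock_sock()`, `fake_release_sock()`, `struct hyp_sock` and `dump_all()` are hypothetical stand-ins. A plain counter plays the role of the RCU read-side nesting, so any "sleeping" call made inside it can be detected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for RCU and lock_sock(); not kernel APIs. */
static int rcu_depth;          /* > 0 means inside a read-side section */
static int sleep_violations;   /* counts sleeping calls made under RCU */

static void fake_rcu_read_lock(void)   { rcu_depth++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

struct hyp_sock {
    int refcnt;
    int dumped;
};

/* Like lock_sock(), this may sleep, so it must not run under RCU. */
static void fake_lock_sock(struct hyp_sock *sk)
{
    if (rcu_depth > 0)
        sleep_violations++;
    (void)sk;
}

static void fake_release_sock(struct hyp_sock *sk) { (void)sk; }

/* Under RCU, find the idx-th sock and take a reference on it.
 * Returns NULL once the list is exhausted. */
static struct hyp_sock *get_and_hold(struct hyp_sock *list, size_t n,
                                     size_t idx)
{
    struct hyp_sock *sk = NULL;

    fake_rcu_read_lock();
    if (idx < n) {
        sk = &list[idx];
        sk->refcnt++;          /* keeps sk alive after we leave RCU */
    }
    fake_rcu_read_unlock();
    return sk;
}

/* Hold under RCU, then lock and dump outside RCU, one sock at a time. */
static size_t dump_all(struct hyp_sock *list, size_t n)
{
    size_t idx = 0;
    struct hyp_sock *sk;

    while ((sk = get_and_hold(list, n, idx)) != NULL) {
        fake_lock_sock(sk);    /* safe: RCU section already left */
        sk->dumped = 1;
        fake_release_sock(sk);
        sk->refcnt--;          /* drop the reference taken above */
        idx++;
    }
    return idx;
}
```

      The sketch records its position as an index and resumes from there. The point it shows is only the ordering: the reference is taken under RCU, and the lock that may sleep is taken after leaving RCU.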
      
      Fixes: 8f840e47 ("sctp: add the sctp_diag.c file")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sctp-fixes' · 75b005b9
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: a bunch of fixes for prsctp polices
      
      This patchset fixes two issues with the prsctp policies:
      
        1. patches 1 and 2 fix the "netperf-Throughput_Mbps -37.2% regression"
           issue seen when overloading the CPU.
      
        2. patch 3 fixes "prsctp polices should check both sides' prsctp_capable,
           instead of only local side".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: change to check peer prsctp_capable when using prsctp polices · be4947bf
      Xin Long authored
      Currently, before applying the prsctp policies, sctp checks
      asoc->prsctp_enable to decide whether prsctp is enabled. However,
      asoc->prsctp_enable being set only means that the local host supports
      prsctp; sctp should not abandon packets if the peer host doesn't enable
      prsctp.
      
      So this patch uses asoc->peer.prsctp_capable instead of
      asoc->prsctp_enable to check whether prsctp is enabled on both sides,
      as asoc->peer.prsctp_capable is set only when both the local host and
      the peer enable prsctp.
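
      A minimal sketch of the changed check. It uses a hypothetical `struct hyp_assoc` whose field names only echo asoc->prsctp_enable and asoc->peer.prsctp_capable; it is not the kernel's struct sctp_association. `hyp_negotiate_prsctp()` and `hyp_may_abandon()` are illustrative helpers. The negotiated flag is true only when both ends support prsctp, and the abandon decision consults it instead of the local flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified association state. */
struct hyp_assoc {
    bool prsctp_enable;            /* local host supports prsctp */
    struct {
        bool prsctp_capable;       /* set only if both sides support it */
    } peer;
};

/* Hypothetical hook run once the peer's capabilities are known. */
static void hyp_negotiate_prsctp(struct hyp_assoc *asoc, bool peer_advertised)
{
    asoc->peer.prsctp_capable = asoc->prsctp_enable && peer_advertised;
}

/* A chunk may be abandoned under a prsctp policy only if both ends agree. */
static bool hyp_may_abandon(const struct hyp_assoc *asoc,
                            bool policy_says_abandon)
{
    if (!asoc->peer.prsctp_capable)   /* previously: !asoc->prsctp_enable */
        return false;
    return policy_says_abandon;
}
```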
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: remove prsctp_param from sctp_chunk · 0605483f
      Xin Long authored
      Currently sctp uses chunk->prsctp_param to store the prsctp parameter
      for all the prsctp policies, but there was no need to add prsctp_param
      to sctp_chunk. We can simply use chunk->sinfo.sinfo_timetolive for the
      RTX and BUF policies, and reuse msg->expires_at for the TTL policy, as
      the prsctp policies and the old expires policy are mutually exclusive.
      
      This patch removes prsctp_param from sctp_chunk, and reuses msg's
      expires_at for the TTL policy and chunk's sinfo.sinfo_timetolive for
      the RTX and BUF policies.
      
      Note that sctp can't use chunk's sinfo.sinfo_timetolive for the TTL
      policy, as it needs a u64 variable to store the expires_at time.
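
      The field reuse can be sketched with hypothetical, heavily simplified types. `struct hyp_chunk`, `struct hyp_msg` and the `hyp_pr_policy` enum are not the kernel definitions, and the BUF policy is not modelled. The RTX policy compares the send count against the 32-bit sinfo_timetolive. The TTL policy compares the current time against a 64-bit absolute expires_at. A plain `>` is used here in place of the kernel's wrap-safe time_after().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum hyp_pr_policy { HYP_PR_NONE, HYP_PR_TTL, HYP_PR_RTX };

struct hyp_msg {
    uint64_t expires_at;        /* absolute expiry time: needs 64 bits */
};

struct hyp_chunk {
    enum hyp_pr_policy policy;
    uint32_t sinfo_timetolive;  /* reused as the RTX retransmission limit */
    int sent_count;
    struct hyp_msg *msg;
};

/* Return true if the chunk should be abandoned at time 'now'. */
static bool hyp_chunk_abandoned(const struct hyp_chunk *c, uint64_t now)
{
    switch (c->policy) {
    case HYP_PR_TTL:
        return now > c->msg->expires_at;
    case HYP_PR_RTX:
        return (uint32_t)c->sent_count > c->sinfo_timetolive;
    default:
        return false;
    }
}
```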
      
      This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
      issue.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: move sent_count to the memory hole in sctp_chunk · 73dca124
      Xin Long authored
      Running pahole on sctp_chunk currently shows 2 memory holes:
         struct sctp_chunk {
      	struct list_head           list;
      	atomic_t                   refcnt;
      	/* XXX 4 bytes hole, try to pack */
      	...
      	long unsigned int          prsctp_param;
      	int                        sent_count;
      	/* XXX 4 bytes hole, try to pack */
      
      This patch moves sent_count up to fill the 1st hole, which also
      eliminates the 2nd one.
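
      The effect of the move can be reproduced with a user-space model of the layout above. The structs are hypothetical reductions of sctp_chunk, with a single pointer standing in for the elided "..." members. The offsets in the comments assume an LP64 target such as x86_64, where pointers and long are 8 bytes and int is 4.

```c
#include <assert.h>
#include <stddef.h>

struct hyp_list_head { struct hyp_list_head *next, *prev; };  /* 16 bytes */
typedef struct { int counter; } hyp_atomic_t;                  /* 4 bytes */

/* Original order: two 4-byte holes on LP64. */
struct chunk_before {
    struct hyp_list_head list;   /* offset 0 */
    hyp_atomic_t refcnt;         /* offset 16, then a 4-byte hole */
    void *other;                 /* offset 24, stands in for "..." */
    unsigned long prsctp_param;  /* offset 32 */
    int sent_count;              /* offset 40, then 4 bytes of tail padding */
};                               /* sizeof == 48 on LP64 */

/* sent_count moved up into the first hole. */
struct chunk_after {
    struct hyp_list_head list;   /* offset 0 */
    hyp_atomic_t refcnt;         /* offset 16 */
    int sent_count;              /* offset 20, fills the hole */
    void *other;                 /* offset 24 */
    unsigned long prsctp_param;  /* offset 32 */
};                               /* sizeof == 40 on LP64 */
```

      Running pahole on the real object file is still the way to confirm the actual layout.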
      
      This is not just another struct compaction; it also fixes the
      "netperf-Throughput_Mbps -37.2% regression" issue when overloading the CPU.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tg3: Avoid NULL pointer dereference in tg3_io_error_detected() · 1b0ff898
      Milton Miller authored
      While the driver is probing the adapter, an error may occur before the
      netdev structure is allocated and attached to pci_dev. In this case,
      not only is netdev unavailable, but the tg3 private structure is also
      unavailable, since its address is just pointer arithmetic on the NULL
      netdev pointer. Any dereference must therefore be skipped.
      
      The following trace is seen when the error is triggered:
      
        [1.402247] Unable to handle kernel paging request for data at address 0x00001a99
        [1.402410] Faulting instruction address: 0xc0000000007e33f8
        [1.402450] Oops: Kernel access of bad area, sig: 11 [#1]
        [1.402481] SMP NR_CPUS=2048 NUMA PowerNV
        [1.402513] Modules linked in:
        [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu
        [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: c000001fe4e88000
        [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: c000000000595ea0
        [1.402787] REGS: c000001fe4e8b790 TRAP: 0300   Not tainted  (4.4.0-36-generic)
        [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28000422  XER: 20000000
        [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: 42000000 SOFTE: 1
        GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 0000000000000000
        GPR04: 0000000000000001 0000000000000000 0000000000000039 0000000000000299
        GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 0000000000000006
        GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 c000003ca1bffd00
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d52768
        GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 0000000000000002
        GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 c000003ca1b52000
        [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340
        [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340
      
      This patch avoids the NULL pointer dereference by moving the access after
      the netdev NULL pointer check in tg3_io_error_detected(). We also add a
      check for netdev being NULL in tg3_io_resume() [suggested by Michael Chan].
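
      The reordering can be sketched with hypothetical stand-ins for struct net_device, netdev_priv() and the tg3 private data. `struct hyp_netdev`, `hyp_netdev_priv()`, `struct hyp_tg3` and `hyp_io_error_detected()` are not the driver's definitions, and the return values are illustrative. Because the private pointer is derived from netdev by pointer arithmetic, it is only computed once netdev is known to be non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the private data lives inside the netdev. */
struct hyp_tg3 { int recovering; };
struct hyp_netdev { char name[16]; struct hyp_tg3 priv; };

/* Pointer arithmetic on dev: meaningless (and UB) if dev is NULL. */
static struct hyp_tg3 *hyp_netdev_priv(struct hyp_netdev *dev)
{
    return &dev->priv;
}

enum hyp_ers_result { HYP_ERS_DISCONNECT, HYP_ERS_NEED_RESET };

/* Error callback: netdev may still be NULL if probing failed early. */
static enum hyp_ers_result hyp_io_error_detected(struct hyp_netdev *netdev)
{
    struct hyp_tg3 *tp;

    if (!netdev)                    /* check first ... */
        return HYP_ERS_DISCONNECT;  /* illustrative return value */

    tp = hyp_netdev_priv(netdev);   /* ... then derive the private pointer */
    tp->recovering = 1;
    return HYP_ERS_NEED_RESET;
}
```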
      
      Fixes: 0486a063 ("tg3: prevent ifup/ifdown during PCI error recovery")
      Fixes: dfc8f370 ("net/tg3: Release IRQs on permanent error")
      Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: Milton Miller <miltonm@us.ibm.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 27 Sep, 2016 9 commits
    • Merge branch 'act_ife-fixes' · 7b8147aa
      David S. Miller authored
      Yotam Gigi says:
      
      ====================
      Fix tc-ife bugs
      
      This patch-set contains two bugfixes in the tc-ife action: one fixing
      random behaviour on the encode side, and one fixing the packet parsing
      logic on the decode side.
      
      v2->v3
       - Fix the encode side instead of the decode side
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • act_ife: Fix false encoding · c006da0b
      Yotam Gigi authored
      On the ife encode side, the action stores the different tlvs inside the
      ife header, where each tlv length field should refer to the length of the
      whole tlv (without additional padding) and not just the data length.
      
      On the ife decode side, the action iterates over the tlvs in the ife
      header and parses them one by one, advancing the current pointer by the
      tlv size on each iteration.
      
      Previously, only the data length was encoded inside the tlv, which led
      to incorrect parsing of the ife header. In addition, because the loop
      counter was unsigned, this could lead to an infinite parsing loop.
      
      This fix makes the loop counter signed and fixes the encoding to take
      into account the tlv type and length header.
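
      A minimal sketch of a corrected encode/decode pair. It assumes a layout of a 2-byte type and a 2-byte length in network byte order, followed by data padded to 4 bytes. `tlv_encode()`, `tlv_count()`, `TLV_HDR_LEN` and `TLV_ALIGN()` are hypothetical names, not the act_ife functions. The length field covers header plus data. The decoder keeps its remaining-bytes counter signed, so an over-long tlv drives it negative and ends the loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLV_HDR_LEN 4u                       /* assumed: 2-byte type + 2-byte length */
#define TLV_ALIGN(len) (((len) + 3u) & ~3u)  /* assumed: pad each tlv to 4 bytes */

/* Encode one tlv at buf; returns bytes used including padding.
 * The length field covers header + data, not padding. */
static size_t tlv_encode(uint8_t *buf, uint16_t type,
                         const void *data, uint16_t dlen)
{
    uint16_t tlv_len = (uint16_t)(TLV_HDR_LEN + dlen);
    size_t total = TLV_ALIGN(tlv_len);

    memset(buf, 0, total);
    buf[0] = (uint8_t)(type >> 8);
    buf[1] = (uint8_t)(type & 0xff);
    buf[2] = (uint8_t)(tlv_len >> 8);
    buf[3] = (uint8_t)(tlv_len & 0xff);
    memcpy(buf + TLV_HDR_LEN, data, dlen);
    return total;
}

/* Walk the tlvs; returns how many were parsed, or -1 if malformed. */
static int tlv_count(const uint8_t *buf, int total_len)
{
    int remaining = total_len;   /* signed: goes negative on overrun */
    int off = 0;
    int count = 0;

    while (remaining > 0) {
        unsigned int tlv_len;

        if (remaining < (int)TLV_HDR_LEN)
            return -1;                       /* truncated header */
        tlv_len = ((unsigned int)buf[off + 2] << 8) | buf[off + 3];
        if (tlv_len < TLV_HDR_LEN)
            return -1;                       /* would never advance */
        off += (int)TLV_ALIGN(tlv_len);
        remaining -= (int)TLV_ALIGN(tlv_len);
        count++;
    }
    return remaining == 0 ? count : -1;      /* negative: last tlv overran */
}
```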
      
      Fixes: 28a10c42 ("net sched: fix encoding to use real length")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • act_ife: Fix external mac header on encode · 4b1d488a
      Yotam Gigi authored
      On the ife encode side, the external mac header is copied from the
      original packet and may be overridden if the user requests it.
      Previously, the mac header was copied from a memory region that might
      no longer be accessible, as skb_cow_head() might free it and copy the
      packet. This led to random values in the external mac header whenever
      the values were not set by the user.
      
      This fix takes the internal mac header from the packet after the call
      to skb_cow_head().
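
      The ordering issue can be modelled with a hypothetical packet buffer. `struct hyp_skb`, `hyp_cow_head()` and `hyp_encode()` are illustrative, not the sk_buff API. Like skb_cow_head(), `hyp_cow_head()` may move the data to a new allocation and free the old one, so the inner mac header is located only after that call returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HYP_ETH_HLEN 14

/* Hypothetical packet buffer: data starts at head + headroom. */
struct hyp_skb {
    unsigned char *head;
    size_t headroom;
    size_t len;
};

static unsigned char *hyp_data(const struct hyp_skb *skb)
{
    return skb->head + skb->headroom;
}

/* Like skb_cow_head(): may reallocate, invalidating old data pointers. */
static int hyp_cow_head(struct hyp_skb *skb, size_t needed)
{
    unsigned char *nh;

    if (skb->headroom >= needed)
        return 0;
    nh = malloc(needed + skb->len);
    if (!nh)
        return -1;
    memcpy(nh + needed, hyp_data(skb), skb->len);
    free(skb->head);                 /* the old buffer is gone after this */
    skb->head = nh;
    skb->headroom = needed;
    return 0;
}

/* Prepend an outer header of hdr_len bytes whose first HYP_ETH_HLEN bytes
 * copy the inner mac header. */
static int hyp_encode(struct hyp_skb *skb, size_t hdr_len)
{
    unsigned char *inner_mac;

    if (hdr_len < HYP_ETH_HLEN || hyp_cow_head(skb, hdr_len))
        return -1;
    inner_mac = hyp_data(skb);       /* located only after the cow call */
    skb->headroom -= hdr_len;
    skb->len += hdr_len;
    memcpy(hyp_data(skb), inner_mac, HYP_ETH_HLEN);
    return 0;
}
```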
      
      Fixes: ef6980b6 ("net sched: introduce IFE action")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • VSOCK: Don't dec ack backlog twice for rejected connections · 1190cfdb
      Jorgen Hansen authored
      If a pending socket is marked as rejected, sk_ack_backlog is
      decremented twice. So don't decrement it for rejected sockets in
      vsock_pending_work().
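
      A hypothetical model of the accounting. `struct hyp_listener`, `struct hyp_child`, `hyp_reject()` and `hyp_pending_work()` are illustrative, not the af_vsock code. The model assumes that a socket's backlog slot has already been released by the time it is marked rejected, which is what makes a second decrement wrong. Under that assumption, the deferred work decrements only for sockets that are still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical listener/child state; names only echo the text above. */
struct hyp_listener { int sk_ack_backlog; };
struct hyp_child { bool pending; bool rejected; };

/* Assumed rejection path: the child leaves the pending list and its
 * backlog slot is released here. */
static void hyp_reject(struct hyp_listener *l, struct hyp_child *c)
{
    c->pending = false;
    c->rejected = true;
    l->sk_ack_backlog--;
}

/* Deferred work: decrement only for sockets that are still pending. */
static void hyp_pending_work(struct hyp_listener *l, struct hyp_child *c)
{
    if (c->pending) {
        c->pending = false;
        l->sk_ack_backlog--;
    }
    /* rejected sockets: the slot was already released, nothing to undo */
}
```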
      
      Testing of the rejected socket path was done through code
      modifications.
      Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
      Reviewed-by: Adit Ranadive <aditr@vmware.com>
      Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: ethernet: bcmgenet: use phydev from struct net_device" · bf1a85a8
      Florian Fainelli authored
      This reverts commit 62469c76 ("net: ethernet: bcmgenet: use phydev
      from struct net_device") because it causes GENETv1/2/3 adapters to
      expose the following behavior after an ifconfig down/up sequence:
      
      PING fainelli-linux (10.112.156.244): 56 data bytes
      64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.352 ms
      64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.472 ms (DUP!)
      64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.496 ms (DUP!)
      64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.517 ms (DUP!)
      64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.536 ms (DUP!)
      64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.557 ms (DUP!)
      64 bytes from 10.112.156.244: seq=1 ttl=61 time=752.448 ms (DUP!)
      
      This was previously fixed by commit 5dbebbb4 ("net: bcmgenet:
      Software reset EPHY after power on"), but the commit we are reverting
      essentially voided that fix. Here is why.
      
      Without commit 62469c76 we would have the following scenario after
      an ifconfig down then up sequence:
      
      - bcmgenet_open() calls bcmgenet_power_up() to make sure the PHY is
        initialized *before* we get to initialize the UniMAC; this is
        critical to ensure the PHY is in a correct state. priv->phydev is
        valid, so this code executes fine
      
      - the PHY is initialized a second time from bcmgenet_mii_probe(),
        through the normal phy_init_hw() call (which arguably could be
        optimized out)
      
      Everything is fine in that case. With commit 62469c76, the following
      happens after an ifconfig down then up sequence:
      
      - bcmgenet_close() calls phy_disconnect(), which sets dev->phydev to
        NULL
      
      - when bcmgenet_open() executes again and calls bcmgenet_mii_reset() from
        bcmgenet_power_up() to initialize the internal PHY, the NULL check
        becomes true, so we do not reset the PHY, yet we keep going on and
        initialize the UniMAC, causing MAC activity to occur
      
      - we call bcmgenet_mii_reset() from bcmgenet_mii_probe(), but this is
        too late: the PHY is already botched and causes the bogus ping
        packets shown above to be transmitted and received
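
      The difference between the two pointers can be modelled as below. `struct hyp_netdev`, `struct hyp_priv` and the two reset helpers are hypothetical, not bcmgenet code. One helper resets through dev->phydev, which phy_disconnect() clears. The other resets through priv->phydev, which the text above describes as still valid on the next bcmgenet_open().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two PHY pointers discussed above. */
struct hyp_phy { int resets; };
struct hyp_netdev { struct hyp_phy *phydev; };   /* cleared on disconnect */
struct hyp_priv { struct hyp_netdev *dev; struct hyp_phy *phydev; };

static void hyp_phy_disconnect(struct hyp_netdev *dev) { dev->phydev = NULL; }

/* Reset through dev->phydev (the behaviour with commit 62469c76). */
static void hyp_mii_reset_dev(struct hyp_priv *p)
{
    if (p->dev->phydev)          /* NULL after close: reset is skipped */
        p->dev->phydev->resets++;
}

/* Reset through priv->phydev (the behaviour restored by the revert). */
static void hyp_mii_reset_priv(struct hyp_priv *p)
{
    if (p->phydev)
        p->phydev->resets++;
}
```
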
      Reported-by: Jaedon Shin <jaedon.shin@gmail.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fec-align' · 6c1394f3
      David S. Miller authored
      Eric Nelson says:
      
      ====================
      net: fec: updates to align IP header
      
      This patch series is the outcome of investigation into very high
      numbers of alignment faults on kernel 4.1.33 from the linux-fslc
      tree:
          https://github.com/freescale/linux-fslc/tree/4.1-1.0.x-imx
      
      The first two patches remove support for the receive accelerator (RACC)
      from the i.MX25 and i.MX27 SoCs, which don't support the function.
      
      The third patch enables hardware alignment of the ethernet packet payload
      (and especially the IP header) to prevent alignment faults in the IP stack.
      
      Testing on i.MX6UL on the 4.1.33 kernel showed that this patch removed
      on the order of 70k alignment faults during a 100MiB transfer using
      wget.
      
      Testing on an i.MX6Q (SABRE Lite) board on net-next (4.8.0-rc7) showed
      a much more modest improvement (from tens of faults), and it's not
      clear why that's the case.
      ====================
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: align IP header in hardware · 3ac72b7b
      Eric Nelson authored
      The FEC receive accelerator (RACC) supports shifting the data payload of
      received packets by 16 bits, which aligns the payload (IP header) on a
      4-byte boundary. Such alignment is, if not required, at least strongly
      suggested by the Linux networking layer.
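
      The arithmetic behind the 16-bit shift, as a small sketch. It assumes an untagged 14-byte Ethernet header and a receive buffer that itself starts on a 4-byte boundary. `ip_header_offset()` and `is_aligned4()` are hypothetical helpers. The 2-byte shift plays the same role that NET_IP_ALIGN (2 by default) plays when the offset is applied in software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HYP_ETH_HLEN 14   /* Ethernet header without a VLAN tag */

/* Offset of the IP header when the payload is shifted by 'shift' bytes. */
static size_t ip_header_offset(size_t shift)
{
    return shift + HYP_ETH_HLEN;
}

static bool is_aligned4(size_t off)
{
    return (off % 4) == 0;
}
```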
      
      Without this patch, a huge number of alignment faults will be taken by the
      IP stack, as seen in /proc/cpu/alignment:
      
      	~/$ cat /proc/cpu/alignment
      	User:		0
      	System:		72645 (inet_gro_receive+0x104/0x27c)
      	Skipped:	0
      	Half:		0
      	Word:		0
      	DWord:		0
      	Multi:		72645
      	User faults:	3 (fixup+warn)
      
      This patch was suggested by Andrew Lunn in this message to linux-netdev:
      	http://marc.info/?l=linux-arm-kernel&m=147465452108384&w=2
      
      and adapted from a patch by Russell King from 2014:
      	http://git.arm.linux.org.uk/cgit/linux-arm.git/commit/?id=70d8a8a
      
      Signed-off-by: Eric Nelson <eric@nelint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: remove QUIRK_HAS_RACC from i.mx27 · 97dc499c
      Eric Nelson authored
      According to the i.MX27 reference manual, this SoC does not have support
      for the receive accelerator (RACC) register at offset 0x1C4.
      
      	http://cache.nxp.com/files/32bit/doc/ref_manual/MCIMX27RM.pdf
      
      Signed-off-by: Eric Nelson <eric@nelint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: remove QUIRK_HAS_RACC from i.mx25 · 653d37d8
      Eric Nelson authored
      According to the i.MX25 reference manual, this SoC does not have support
      for the receive accelerator (RACC) register at offset 0x1C4.
      
      http://www.nxp.com/files/dsp/doc/ref_manual/IMX25RM.pdf
      
      Signed-off-by: Eric Nelson <eric@nelint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 26 Sep, 2016 1 commit
    • ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov authored
      Since the commit below, the ipmr/ip6mr rtnl_unicast() code uses the
      skb's portid instead of the previous dst_pid, which was copied from
      in_skb's portid. Since the skb is new, its portid is 0 at that point, so
      the packets are sent to the kernel, and we get either scheduling while
      atomic or a deadlock (depending on where it happens) by trying to
      acquire rtnl twice.
      Also, since this is RTM_GETROUTE, it can be triggered by a normal user.
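
      A small model shows why a freshly built skb sends the reply to the kernel. In netlink, port ID 0 addresses the kernel. `struct hyp_skb` and the two `reply_dst_*()` helpers are hypothetical. They only contrast taking the destination from the new skb with taking it from the requester, as the previous dst_pid did.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_KERNEL_PORTID 0u   /* netlink port ID 0 addresses the kernel */

/* Hypothetical skb carrying only the sender's port ID. */
struct hyp_skb { uint32_t portid; };

/* Problem: destination taken from the freshly allocated reply skb. */
static uint32_t reply_dst_from_new_skb(const struct hyp_skb *request,
                                       const struct hyp_skb *reply)
{
    (void)request;
    return reply->portid;      /* 0 for a new skb, i.e. the kernel */
}

/* Idea behind dst_pid: destination taken from the original requester. */
static uint32_t reply_dst_from_request(const struct hyp_skb *request,
                                       const struct hyp_skb *reply)
{
    (void)reply;
    return request->portid;
}
```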
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 24 Sep, 2016 1 commit
  5. 23 Sep, 2016 5 commits
  6. 22 Sep, 2016 11 commits
  7. 21 Sep, 2016 7 commits