1. 08 Dec, 2013 40 commits
    • Shawn Landden's avatar
      net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST · 2afe5cfb
      Shawn Landden authored
      [ Upstream commit d3f7d56a ]
      
      Commit 35f9c09f (tcp: tcp_sendpages() should call tcp_push() once)
      added an internal flag MSG_SENDPAGE_NOTLAST, similar to
      MSG_MORE.
      
      algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages()
      and need to see the new flag as identical to MSG_MORE.
      
      This fixes sendfile() on AF_ALG.
      
      v3: also fix udp
      
      Cc: Tom Herbert <therbert@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org> # 3.4.x + 3.2.x
      Reported-and-tested-by: default avatarShawn Landden <shawnlandden@gmail.com>
      Original-patch: Richard Weinberger <richard@nod.at>
      Signed-off-by: default avatarShawn Landden <shawn@churchofgit.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2afe5cfb
    • Linus Walleij's avatar
      net: smc91: fix crash regression on the versatile · 2c9681eb
      Linus Walleij authored
      [ Upstream commit a0c20fb0 ]
      
      After commit e9e4ea74
      "net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data"
      The Versatile SMSC LAN91C111 is crashing like this:
      
      ------------[ cut here ]------------
      kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599!
      Internal error: Oops - BUG: 0 [#1] ARM
      Modules linked in:
      CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24
      task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000
      PC is at smc_hardware_send_pkt+0x198/0x22c
      LR is at smc_hardware_send_pkt+0x24/0x22c
      pc : [<c01be324>]    lr : [<c01be1b0>]    psr: 20000013
      sp : c6cd1d08  ip : 00000001  fp : 00000000
      r10: c02adb08  r9 : 00000000  r8 : c6ced802
      r7 : c786fba0  r6 : 00000146  r5 : c8800000  r4 : c78d6000
      r3 : 0000000f  r2 : 00000146  r1 : 00000000  r0 : 00000031
      Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0005317f  Table: 06cf4000  DAC: 00000015
      Process udhcpc (pid: 43, stack limit = 0xc6cd01c0)
      Stack: (0xc6cd1d08 to 0xc6cd2000)
      1d00:                   00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868
      1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60
      1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000
      1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000
      1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000
      1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000
      1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000
      1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740
      1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff
      1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc
      1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88
      1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0
      1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08
      1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000
      1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000
      1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002
      1f20: 06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c
      1f40: 00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0
      1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0
      1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8
      1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000
      1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000
      1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000
      [<c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8)
      [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc)
      [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<c021d1d8>] (sch_direct_xmit+0x94/0x18c)
      [<c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<c02087f8>] (dev_queue_xmit+0x238/0x42c)
      [<c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<c027ba74>] (packet_sendmsg+0xbe8/0xd28)
      [<c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<c01f2870>] (sock_sendmsg+0x84/0xa8)
      [<c01f2870>] (sock_sendmsg+0x84/0xa8) from [<c01f4628>] (SyS_sendto+0xb8/0xdc)
      [<c01f4628>] (SyS_sendto+0xb8/0xdc) from [<c0013840>] (ret_fast_syscall+0x0/0x2c)
      Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2)
      ---[ end trace 81104fe70e8da7fe ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      
      This is because the macro operations in smc91x.h defined
      for Versatile are missing SMC_outsw() as used in this
      commit.
      
      The Versatile needs and uses the same accessors as the other
      platforms in the first if(...) clause, just switch it to using
      that and we have one problem less to worry about.
      
      This includes a hunk of a patch from Will Deacon fixin
      the other 32bit platforms as well: Innokom, Ramses, PXA,
      PCM027.
      
      Checkpatch complains about spacing, but I have opted to
      follow the style of this .h-file.
      
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Nicolas Pitre <nico@fluxnic.net>
      Cc: Eric Miao <eric.y.miao@gmail.com>
      Cc: Jonathan Cameron <jic23@cam.ac.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c9681eb
    • Yang Yingliang's avatar
      net: 8139cp: fix a BUG_ON triggered by wrong bytes_compl · 2650c2f8
      Yang Yingliang authored
      [ Upstream commit 7fe0ee09 ]
      
      Using iperf to send packets(GSO mode is on), a bug is triggered:
      
      [  212.672781] kernel BUG at lib/dynamic_queue_limits.c:26!
      [  212.673396] invalid opcode: 0000 [#1] SMP
      [  212.673882] Modules linked in: 8139cp(O) nls_utf8 edd fuse loop dm_mod ipv6 i2c_piix4 8139too i2c_core intel_agp joydev pcspkr hid_generic intel_gtt floppy sr_mod mii button sg cdrom ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata scsi_mod [last unloaded: 8139cp]
      [  212.676084] CPU: 0 PID: 4124 Comm: iperf Tainted: G           O 3.12.0-0.7-default+ #16
      [  212.676084] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [  212.676084] task: ffff8800d83966c0 ti: ffff8800db4c8000 task.ti: ffff8800db4c8000
      [  212.676084] RIP: 0010:[<ffffffff8122e23f>]  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
      [  212.676084] RSP: 0018:ffff880116e03e30  EFLAGS: 00010083
      [  212.676084] RAX: 00000000000005ea RBX: 0000000000000f7c RCX: 0000000000000002
      [  212.676084] RDX: ffff880111dd0dc0 RSI: 0000000000000bd4 RDI: ffff8800db6ffcc0
      [  212.676084] RBP: ffff880116e03e48 R08: 0000000000000992 R09: 0000000000000000
      [  212.676084] R10: ffffffff8181e400 R11: 0000000000000004 R12: 000000000000000f
      [  212.676084] R13: ffff8800d94ec840 R14: ffff8800db440c80 R15: 000000000000000e
      [  212.676084] FS:  00007f6685a3c700(0000) GS:ffff880116e00000(0000) knlGS:0000000000000000
      [  212.676084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  212.676084] CR2: 00007f6685ad6460 CR3: 00000000db714000 CR4: 00000000000006f0
      [  212.676084] Stack:
      [  212.676084]  ffff8800db6ffc00 000000000000000f ffff8800d94ec840 ffff880116e03eb8
      [  212.676084]  ffffffffa041509f ffff880116e03e88 0000000f16e03e88 ffff8800d94ec000
      [  212.676084]  00000bd400059858 000000050000000f ffffffff81094c36 ffff880116e03eb8
      [  212.676084] Call Trace:
      [  212.676084]  <IRQ>
      [  212.676084]  [<ffffffffa041509f>] cp_interrupt+0x4ef/0x590 [8139cp]
      [  212.676084]  [<ffffffff81094c36>] ? ktime_get+0x56/0xd0
      [  212.676084]  [<ffffffff8108cf73>] handle_irq_event_percpu+0x53/0x170
      [  212.676084]  [<ffffffff8108d0cc>] handle_irq_event+0x3c/0x60
      [  212.676084]  [<ffffffff8108fdb5>] handle_fasteoi_irq+0x55/0xf0
      [  212.676084]  [<ffffffff810045df>] handle_irq+0x1f/0x30
      [  212.676084]  [<ffffffff81003c8b>] do_IRQ+0x5b/0xe0
      [  212.676084]  [<ffffffff8142beaa>] common_interrupt+0x6a/0x6a
      [  212.676084]  <EOI>
      [  212.676084]  [<ffffffffa0416a21>] ? cp_start_xmit+0x621/0x97c [8139cp]
      [  212.676084]  [<ffffffffa0416a09>] ? cp_start_xmit+0x609/0x97c [8139cp]
      [  212.676084]  [<ffffffff81378ed9>] dev_hard_start_xmit+0x2c9/0x550
      [  212.676084]  [<ffffffff813960a9>] sch_direct_xmit+0x179/0x1d0
      [  212.676084]  [<ffffffff813793f3>] dev_queue_xmit+0x293/0x440
      [  212.676084]  [<ffffffff813b0e46>] ip_finish_output+0x236/0x450
      [  212.676084]  [<ffffffff810e59e7>] ? __alloc_pages_nodemask+0x187/0xb10
      [  212.676084]  [<ffffffff813b10e8>] ip_output+0x88/0x90
      [  212.676084]  [<ffffffff813afa64>] ip_local_out+0x24/0x30
      [  212.676084]  [<ffffffff813aff0d>] ip_queue_xmit+0x14d/0x3e0
      [  212.676084]  [<ffffffff813c6fd1>] tcp_transmit_skb+0x501/0x840
      [  212.676084]  [<ffffffff813c8323>] tcp_write_xmit+0x1e3/0xb20
      [  212.676084]  [<ffffffff81363237>] ? skb_page_frag_refill+0x87/0xd0
      [  212.676084]  [<ffffffff813c8c8b>] tcp_push_one+0x2b/0x40
      [  212.676084]  [<ffffffff813bb7e6>] tcp_sendmsg+0x926/0xc90
      [  212.676084]  [<ffffffff813e1d21>] inet_sendmsg+0x61/0xc0
      [  212.676084]  [<ffffffff8135e861>] sock_aio_write+0x101/0x120
      [  212.676084]  [<ffffffff81107cf1>] ? vma_adjust+0x2e1/0x5d0
      [  212.676084]  [<ffffffff812163e0>] ? timerqueue_add+0x60/0xb0
      [  212.676084]  [<ffffffff81130b60>] do_sync_write+0x60/0x90
      [  212.676084]  [<ffffffff81130d44>] ? rw_verify_area+0x54/0xf0
      [  212.676084]  [<ffffffff81130f66>] vfs_write+0x186/0x190
      [  212.676084]  [<ffffffff811317fd>] SyS_write+0x5d/0xa0
      [  212.676084]  [<ffffffff814321e2>] system_call_fastpath+0x16/0x1b
      [  212.676084] Code: ca 41 89 dc 41 29 cc 45 31 db 29 c2 41 89 c5 89 d0 45 29 c5 f7 d0 c1 e8 1f e9 43 ff ff ff 66 0f 1f 44 00 00 31 c0 e9 7b ff ff ff <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 c7 47 40 00
      [  212.676084] RIP  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
      ------------[ cut here ]------------
      
      When a skb has frags, bytes_compl plus skb->len nr_frags times in cp_tx().
      It's not the correct value(actually, it should plus skb->len once) and it
      will trigger the BUG_ON(bytes_compl > num_queued - dql->num_completed).
      So only increase bytes_compl when finish sending all frags. pkts_compl also
      has a wrong value, fix it too.
      
      It's introduced by commit 871f0d4c ("8139cp: enable bql").
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2650c2f8
    • David Chang's avatar
      r8169: check ALDPS bit and disable it if enabled for the 8168g · 8b0a34fa
      David Chang authored
      [ Upstream commit 1bac1072 ]
      
      Windows driver will enable ALDPS function, but linux driver and firmware
      do not have any configuration related to ALDPS function for 8168g.
      So restart system to linux and remove the NIC cable, LAN enter ALDPS,
      then LAN RX will be disabled.
      
      This issue can be easily reproduced on dual boot windows and linux
      system with RTL_GIGA_MAC_VER_40 chip.
      
      Realtek said, ALDPS function can be disabled by configuring to PHY,
      switch to page 0x0A43, reg0x10 bit2=0.
      Signed-off-by: default avatarDavid Chang <dchang@suse.com>
      Acked-by: default avatarHayes Wang <hayeswang@realtek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b0a34fa
    • Francois Romieu's avatar
      via-velocity: fix netif_receive_skb use in irq disabled section. · df0c053e
      Francois Romieu authored
      [ Upstream commit bc9627e7 ]
      
      2fdac010 ("via-velocity.c: update napi
      implementation") overlooked an irq disabling spinlock when the Rx part
      of the NAPI poll handler was converted from netif_rx to netif_receive_skb.
      
      NAPI Rx processing can be taken out of the locked section with a pair of
      napi_{disable / enable} since it only races with the MTU change function.
      
      An heavier rework of the NAPI locking would be able to perform NAPI Tx
      before Rx where I simply removed one of velocity_tx_srv calls.
      
      References: https://bugzilla.redhat.com/show_bug.cgi?id=1022733
      Fixes: 2fdac010 (via-velocity.c: update napi implementation)
      Signed-off-by: default avatarFrancois Romieu <romieu@fr.zoreil.com>
      Tested-by: default avatarAlex A. Schmidt <aaschmidt1@gmail.com>
      Cc: Jamie Heilman <jamie@audible.transient.net>
      Cc: Michele Baldessari <michele@acksyn.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df0c053e
    • Andy Whitcroft's avatar
      xen-netback: include definition of csum_ipv6_magic · d172c155
      Andy Whitcroft authored
      [ Upstream commit ae5e8127 ]
      
      We are now using csum_ipv6_magic, include the appropriate header.
      Avoids the following error:
      
          drivers/net/xen-netback/netback.c:1313:4: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
              tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
      Signed-off-by: default avatarAndy Whitcroft <apw@canonical.com>
      Acked-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d172c155
    • Eric Dumazet's avatar
      sch_tbf: handle too small burst · f2b44be9
      Eric Dumazet authored
      [ Upstream commit 4d0820cf ]
      
      If a too small burst is inadvertently set on TBF, we might trigger
      a bug in tbf_segment(), as 'skb' instead of 'segs' was used in a
      qdisc_reshape_fail() call.
      
      tc qdisc add dev eth0 root handle 1: tbf latency 50ms burst 1KB rate
      50mbit
      
      Fix the bug, and add a warning, as such configuration is not
      going to work anyway for non GSO packets.
      
      (For some reason, one has to use a burst >= 1520 to get a working
      configuration, even with old kernels. This is a probable iproute2/tc
      bug)
      
      Based on a report and initial patch from Yang Yingliang
      
      Fixes: e43ac79a ("sch_tbf: segment too big GSO packets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2b44be9
    • Herbert Xu's avatar
      gro: Clean up tcpX_gro_receive checksum verification · 99834e4d
      Herbert Xu authored
      [ Upstream commit b8ee93ba ]
      
      This patch simplifies the checksum verification in tcpX_gro_receive
      by reusing the CHECKSUM_COMPLETE code for CHECKSUM_NONE.  All it
      does for CHECKSUM_NONE is compute the partial checksum and then
      treat it as if it came from the hardware (CHECKSUM_COMPLETE).
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      
      Cheers,
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99834e4d
    • Herbert Xu's avatar
      gro: Only verify TCP checksums for candidates · d8f70311
      Herbert Xu authored
      [ Upstream commit cc5c00bb ]
      
      In some cases we may receive IP packets that are longer than
      their stated lengths.  Such packets are never merged in GRO.
      However, we may end up computing their checksums incorrectly
      and end up allowing packets with a bogus checksum enter our
      stack with the checksum status set as verified.
      
      Since such packets are rare and not performance-critical, this
      patch simply skips the checksum verification for them.
      Reported-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      
      Thanks,
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8f70311
    • Herbert Xu's avatar
      gso: handle new frag_list of frags GRO packets · c5352f36
      Herbert Xu authored
      [ Upstream commit 9d8506cc ]
      
      Recently GRO started generating packets with frag_lists of frags.
      This was not handled by GSO, thus leading to a crash.
      
      Thankfully these packets are of a regular form and are easy to
      handle.  This patch handles them in two ways.  For completely
      non-linear frag_list entries, we simply continue to iterate over
      the frag_list frags once we exhaust the normal frags.  For frag_list
      entries with linear parts, we call pskb_trim on the first part
      of the frag_list skb, and then process the rest of the frags in
      the usual way.
      
      This patch also kills a chunk of dead frag_list code that has
      obviously never ever been run since it ends up generating a bogus
      GSO-segmented packet with a frag_list entry.
      
      Future work is planned to split super big packets into TSO
      ones.
      
      Fixes: 8a29111c ("net: gro: allow to build full sized skb")
      Reported-by: default avatarChristoph Paasch <christoph.paasch@uclouvain.be>
      Reported-by: default avatarJerry Chu <hkchu@google.com>
      Reported-by: default avatarSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarSander Eikelenboom <linux@eikelenboom.it>
      Tested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5352f36
    • Veaceslav Falico's avatar
      af_packet: block BH in prb_shutdown_retire_blk_timer() · f517950d
      Veaceslav Falico authored
      [ Upstream commit ec6f809f ]
      
      Currently we're using plain spin_lock() in prb_shutdown_retire_blk_timer(),
      however the timer might fire right in the middle and thus try to re-aquire
      the same spinlock, leaving us in a endless loop.
      
      To fix that, use the spin_lock_bh() to block it.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Daniel Borkmann <dborkman@redhat.com>
      CC: Willem de Bruijn <willemb@google.com>
      CC: Phil Sutter <phil@nwl.cc>
      CC: Eric Dumazet <edumazet@google.com>
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Tested-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarVeaceslav Falico <vfalico@redhat.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f517950d
    • Daniel Borkmann's avatar
      packet: fix use after free race in send path when dev is released · efc27dbe
      Daniel Borkmann authored
      [ Upstream commit e40526cb ]
      
      Salam reported a use after free bug in PF_PACKET that occurs when
      we're sending out frames on a socket bound device and suddenly the
      net device is being unregistered. It appears that commit 827d9780
      introduced a possible race condition between {t,}packet_snd() and
      packet_notifier(). In the case of a bound socket, packet_notifier()
      can drop the last reference to the net_device and {t,}packet_snd()
      might end up suddenly sending a packet over a freed net_device.
      
      To avoid reverting 827d9780 and thus introducing a performance
      regression compared to the current state of things, we decided to
      hold a cached RCU protected pointer to the net device and maintain
      it on write side via bind spin_lock protected register_prot_hook()
      and __unregister_prot_hook() calls.
      
      In {t,}packet_snd() path, we access this pointer under rcu_read_lock
      through packet_cached_dev_get() that holds reference to the device
      to prevent it from being freed through packet_notifier() while
      we're in send path. This is okay to do as dev_put()/dev_hold() are
      per-cpu counters, so this should not be a performance issue. Also,
      the code simplifies a bit as we don't need need_rls_dev anymore.
      
      Fixes: 827d9780 ("af-packet: Use existing netdev reference for bound sockets.")
      Reported-by: default avatarSalam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarSalam Noureddine <noureddine@aristanetworks.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efc27dbe
    • Ding Tianhong's avatar
      bridge: flush br's address entry in fdb when remove the bridge dev · 4af9d888
      Ding Tianhong authored
      [ Upstream commit f8730420 ]
      
      When the following commands are executed:
      
      brctl addbr br0
      ifconfig br0 hw ether <addr>
      rmmod bridge
      
      The calltrace will occur:
      
      [  563.312114] device eth1 left promiscuous mode
      [  563.312188] br0: port 1(eth1) entered disabled state
      [  563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects
      [  563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G           O 3.12.0-0.7-default+ #9
      [  563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [  563.468200]  0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8
      [  563.468204]  ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8
      [  563.468206]  ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78
      [  563.468209] Call Trace:
      [  563.468218]  [<ffffffff814d1c92>] dump_stack+0x6a/0x78
      [  563.468234]  [<ffffffff81148efd>] kmem_cache_destroy+0xfd/0x100
      [  563.468242]  [<ffffffffa062a270>] br_fdb_fini+0x10/0x20 [bridge]
      [  563.468247]  [<ffffffffa063ac76>] br_deinit+0x4e/0x50 [bridge]
      [  563.468254]  [<ffffffff810c7dc9>] SyS_delete_module+0x199/0x2b0
      [  563.468259]  [<ffffffff814e0922>] system_call_fastpath+0x16/0x1b
      [  570.377958] Bridge firewalling registered
      
      --------------------------- cut here -------------------------------
      
      The reason is that when the bridge dev's address is changed, the
      br_fdb_change_mac_address() will add new address in fdb, but when
      the bridge was removed, the address entry in the fdb did not free,
      the bridge_fdb_cache still has objects when destroy the cache, Fix
      this by flushing the bridge address entry when removing the bridge.
      
      v2: according to the Toshiaki Makita and Vlad's suggestion, I only
          delete the vlan0 entry, it still have a leak here if the vlan id
          is other number, so I need to call fdb_delete_by_port(br, NULL, 1)
          to flush all entries whose dst is NULL for the bridge.
      Suggested-by: default avatarToshiaki Makita <toshiaki.makita1@gmail.com>
      Suggested-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4af9d888
    • Vlad Yasevich's avatar
      net: core: Always propagate flag changes to interfaces · a6ec437b
      Vlad Yasevich authored
      [ Upstream commit d2615bf4 ]
      
      The following commit:
          b6c40d68
          net: only invoke dev->change_rx_flags when device is UP
      
      tried to fix a problem with VLAN devices and promiscuouse flag setting.
      The issue was that VLAN device was setting a flag on an interface that
      was down, thus resulting in bad promiscuity count.
      This commit blocked flag propagation to any device that is currently
      down.
      
      A later commit:
          deede2fa
          vlan: Don't propagate flag changes on down interfaces
      
      fixed VLAN code to only propagate flags when the VLAN interface is up,
      thus fixing the same issue as above, only localized to VLAN.
      
      The problem we have now is that if we have create a complex stack
      involving multiple software devices like bridges, bonds, and vlans,
      then it is possible that the flags would not propagate properly to
      the physical devices.  A simple examle of the scenario is the
      following:
      
        eth0----> bond0 ----> bridge0 ---> vlan50
      
      If bond0 or eth0 happen to be down at the time bond0 is added to
      the bridge, then eth0 will never have promisc mode set which is
      currently required for operation as part of the bridge.  As a
      result, packets with vlan50 will be dropped by the interface.
      
      The only 2 devices that implement the special flag handling are
      VLAN and DSA and they both have required code to prevent incorrect
      flag propagation.  As a result we can remove the generic solution
      introduced in b6c40d68 and leave
      it to the individual devices to decide whether they will block
      flag propagation or not.
      Reported-by: default avatarStefan Priebe <s.priebe@profihost.ag>
      Suggested-by: default avatarVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6ec437b
    • Alexei Starovoitov's avatar
      ipv4: fix race in concurrent ip_route_input_slow() · fa2f52de
      Alexei Starovoitov authored
      [ Upstream commit dcdfdf56 ]
      
      CPUs can ask for local route via ip_route_input_noref() concurrently.
      if nh_rth_input is not cached yet, CPUs will proceed to allocate
      equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input
      via rt_cache_route()
      Most of the time they succeed, but on occasion the following two lines:
      	orig = *p;
      	prev = cmpxchg(p, orig, rt);
      in rt_cache_route() do race and one of the cpus fails to complete cmpxchg.
      But ip_route_input_slow() doesn't check the return code of rt_cache_route(),
      so dst is leaking. dst_destroy() is never called and 'lo' device
      refcnt doesn't go to zero, which can be seen in the logs as:
      	unregister_netdevice: waiting for lo to become free. Usage count = 1
      Adding mdelay() between above two lines makes it easily reproducible.
      Fix it similar to nh_pcpu_rth_output case.
      
      Fixes: d2d68ba9 ("ipv4: Cache input routes in fib_info nexthops.")
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa2f52de
    • Andrey Vagin's avatar
      tcp: don't update snd_nxt, when a socket is switched from repair mode · 30eb9c19
      Andrey Vagin authored
      [ Upstream commit dbde4979 ]
      
      snd_nxt must be updated synchronously with sk_send_head.  Otherwise
      tp->packets_out may be updated incorrectly, what may bring a kernel panic.
      
      Here is a kernel panic from my host.
      [  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
      ...
      [  146.301158] Call Trace:
      [  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0
      
      Before this panic a tcp socket was restored. This socket had sent and
      unsent data in the write queue. Sent data was restored in repair mode,
      then the socket was switched from reapair mode and unsent data was
      restored. After that the socket was switched back into repair mode.
      
      In that moment we had a socket where write queue looks like this:
      snd_una    snd_nxt   write_seq
         |_________|________|
                   |
      	  sk_send_head
      
      After a second switching from repair mode the state of socket was
      changed:
      
      snd_una          snd_nxt, write_seq
         |_________ ________|
                   |
      	  sk_send_head
      
      This state is inconsistent, because snd_nxt and sk_send_head are not
      synchronized.
      
      Bellow you can find a call trace, how packets_out can be incremented
      twice for one skb, if snd_nxt and sk_send_head are not synchronized.
      In this case packets_out will be always positive, even when
      sk_write_queue is empty.
      
      tcp_write_wakeup
      	skb = tcp_send_head(sk);
      	tcp_fragment
      		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
      			tcp_adjust_pcount(sk, skb, diff);
      	tcp_event_new_data_sent
      		tp->packets_out += tcp_skb_pcount(skb);
      
      I think update of snd_nxt isn't required, when a socket is switched from
      repair mode.  Because it's initialized in tcp_connect_init. Then when a
      write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
      so it's always is in consistent state.
      
      I have checked, that the bug is not reproduced with this patch and
      all tests about restoring tcp connections work fine.
      
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30eb9c19
    • Ying Xue's avatar
      atm: idt77252: fix dev refcnt leak · 5f406a1e
      Ying Xue authored
      [ Upstream commit b5de4a22 ]
      
      init_card() calls dev_get_by_name() to get a network deceive. But it
      doesn't decrease network device reference count after the device is
      used.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f406a1e
    • fan.du's avatar
      xfrm: Release dst if this dst is improper for vti tunnel · fa3735e9
      fan.du authored
      [ Upstream commit 236c9f84 ]
      
      After searching rt by the vti tunnel dst/src parameter,
      if this rt has neither attached to any transformation
      nor the transformation is not tunnel oriented, this rt
      should be released back to ip layer.
      
      otherwise causing dst memory leakage.
      Signed-off-by: default avatarFan Du <fan.du@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa3735e9
    • Jiri Pirko's avatar
      netfilter: push reasm skb through instead of original frag skbs · 0a6905b2
      Jiri Pirko authored
      [ Upstream commit 6aafeef0 ]
      
      Pushing original fragments through causes several problems. For example
      for matching, frags may not be matched correctly. Take following
      example:
      
      <example>
      On HOSTA do:
      ip6tables -I INPUT -p icmpv6 -j DROP
      ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
      
      and on HOSTB you do:
      ping6 HOSTA -s2000    (MTU is 1500)
      
      Incoming echo requests will be filtered out on HOSTA. This issue does
      not occur with smaller packets than MTU (where fragmentation does not happen)
      </example>
      
      As was discussed previously, the only correct solution seems to be to use
      reassembled skb instead of separete frags. Doing this has positive side
      effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
      dances in ipvs and conntrack can be removed.
      
      Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
      entirely and use code in net/ipv6/reassembly.c instead.
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Acked-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a6905b2
    • Jiri Pirko's avatar
      ip6_output: fragment outgoing reassembled skb properly · 88a6ec98
      Jiri Pirko authored
      [ Upstream commit 9037c357 ]
      
      If reassembled packet would fit into outdev MTU, it is not fragmented
      according the original frag size and it is send as single big packet.
      
      The second case is if skb is gso. In that case fragmentation does not happen
      according to the original frag size.
      
      This patch fixes these.
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88a6ec98
    • Vlad Yasevich's avatar
      ipv6: Fix inet6_init() cleanup order · 72b89bae
      Vlad Yasevich authored
      Commit 6d0bfe22
      	net: ipv6: Add IPv6 support to the ping socket
      
      introduced a change in the cleanup logic of inet6_init and
      has a bug in that ipv6_packet_cleanup() may not be called.
      Fix the cleanup ordering.
      
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Lorenzo Colitti <lorenzo@google.com>
      CC: Fabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72b89bae
    • Hannes Frederic Sowa's avatar
      ipv6: fix leaking uninitialized port number of offender sockaddr · 00fecf73
      Hannes Frederic Sowa authored
      [ Upstream commit 1fa4c710 ]
      
      Offenders don't have port numbers, so set it to 0.
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00fecf73
    • Dan Carpenter's avatar
      net: clamp ->msg_namelen instead of returning an error · 5bca4e49
      Dan Carpenter authored
      [ Upstream commit db31c55a ]
      
      If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
      original code that would lead to memory corruption in the kernel if you
      had audit configured.  If you didn't have audit configured it was
      harmless.
      
      There are some programs such as beta versions of Ruby which use too
      large of a buffer and returning an error code breaks them.  We should
      clamp the ->msg_namelen value instead.
      
      Fixes: 1661bf36 ("net: heap overflow in __audit_sockaddr()")
      Reported-by: default avatarEric Wong <normalperson@yhbt.net>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Tested-by: default avatarEric Wong <normalperson@yhbt.net>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bca4e49
    • Hannes Frederic Sowa's avatar
      inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions · d4cbc4a9
      Hannes Frederic Sowa authored
      [ Upstream commit 85fbaa75 ]
      
      Commit bceaa902 ("inet: prevent leakage
      of uninitialized memory to user in recv syscalls") conditionally updated
      addr_len if the msg_name is written to. The recv_error and rxpmtu
      functions relied on the recvmsg functions to set up addr_len before.
      
      As this does not happen any more we have to pass addr_len to those
      functions as well and set it to the size of the corresponding sockaddr
      length.
      
      This broke traceroute and such.
      
      Fixes: bceaa902 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
      Reported-by: default avatarBrad Spengler <spender@grsecurity.net>
      Reported-by: Tom Labanowski
      Cc: mpb <mpb.mail@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4cbc4a9
    • Hannes Frederic Sowa's avatar
      net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage) · 4d7aef1f
      Hannes Frederic Sowa authored
      [ Upstream commit 68c6beb3 ]
      
      In that case it is probable that kernel code overwrote part of the
      stack. So we should bail out loudly here.
      
      The BUG_ON may be removed in future if we are sure all protocols are
      conformant.
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d7aef1f
    • Hannes Frederic Sowa's avatar
      net: rework recvmsg handler msg_name and msg_namelen logic · 0cefe287
      Hannes Frederic Sowa authored
      [ Upstream commit f3d33426 ]
      
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cefe287
    • Hannes Frederic Sowa's avatar
      ping: prevent NULL pointer dereference on write to msg_name · 086663e0
      Hannes Frederic Sowa authored
      [ Upstream commit cf970c00 ]
      
      A plain read() on a socket does set msg->msg_name to NULL. So check for
      NULL pointer first.
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      086663e0
    • Hannes Frederic Sowa's avatar
      inet: prevent leakage of uninitialized memory to user in recv syscalls · 7a9b8e64
      Hannes Frederic Sowa authored
      [ Upstream commit bceaa902 ]
      
      Only update *addr_len when we actually fill in sockaddr, otherwise we
      can return uninitialized memory from the stack to the caller in the
      recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
      checks because we only get called with a valid addr_len pointer either
      from sock_common_recvmsg or inet_recvmsg.
      
      If a blocking read waits on a socket which is concurrently shut down we
      now return zero and set msg_msgnamelen to 0.
      Reported-by: default avatarmpb <mpb.mail@gmail.com>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a9b8e64
    • Eric Dumazet's avatar
      pkt_sched: fq: fix pacing for small frames · d902e2c3
      Eric Dumazet authored
      [ Upstream commit f52ed899 ]
      
      For performance reasons, sch_fq tried hard to not setup timers for every
      sent packet, using a quantum based heuristic : A delay is setup only if
      the flow exhausted its credit.
      
      Problem is that application limited flows can refill their credit
      for every queued packet, and they can evade pacing.
      
      This problem can also be triggered when TCP flows use small MSS values,
      as TSO auto sizing builds packets that are smaller than the default fq
      quantum (3028 bytes)
      
      This patch adds a 40 ms delay to guard flow credit refill.
      
      Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d902e2c3
    • Eric Dumazet's avatar
      pkt_sched: fq: warn users using defrate · 8ec21820
      Eric Dumazet authored
      [ Upstream commit 65c5189a ]
      
      Commit 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing")
      obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users.
      
      Suggested by David Miller
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ec21820
    • Eric Dumazet's avatar
      ipv4: fix possible seqlock deadlock · 41d5db11
      Eric Dumazet authored
      [ Upstream commit c9e90429 ]
      
      ip4_datagram_connect() being called from process context,
      it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
      otherwise we can deadlock on 32bit arches, or get corruptions of
      SNMP counters.
      
      Fixes: 584bdf8c ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41d5db11
    • Chris Metcalf's avatar
      connector: improved unaligned access error fix · ca86fc7e
      Chris Metcalf authored
      [ Upstream commit 1ca1a4cf ]
      
      In af3e095a, Erik Jacobsen fixed one type of unaligned access
      bug for ia64 by converting a 64-bit write to use put_unaligned().
      Unfortunately, since gcc will convert a short memset() to a series
      of appropriately-aligned stores, the problem is now visible again
      on tilegx, where the memset that zeros out proc_event is converted
      to three 64-bit stores, causing an unaligned access panic.
      
      A better fix for the original problem is to ensure that proc_event
      is aligned to 8 bytes here.  We can do that relatively easily by
      arranging to start the struct cn_msg aligned to 8 bytes and then
      offset by 4 bytes.  Doing so means that the immediately following
      proc_event structure is then correctly aligned to 8 bytes.
      
      The result is that the memset() stores are now aligned, and as an
      added benefit, we can remove the put_unaligned() calls in the code.
      Signed-off-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca86fc7e
    • Maciej Żenczykowski's avatar
      pkt_sched: fq: change classification of control packets · 67db3935
      Maciej Żenczykowski authored
      [ Upstream commit 2abc2f07 ]
      
      Initial sch_fq implementation copied code from pfifo_fast to classify
      a packet as a high prio packet.
      
      This clashes with setups using PRIO with say 7 bands, as one of the
      band could be incorrectly (mis)classified by FQ.
      
      Packets would be queued in the 'internal' queue, and no pacing ever
      happen for this special queue.
      
      Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
      Signed-off-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67db3935
    • Nicolas Dichtel's avatar
      ip6tnl: fix use after free of fb_tnl_dev · 06c375b4
      Nicolas Dichtel authored
      [ Upstream commit 1e9f3d6f ]
      
      Bug has been introduced by commit bb814094 ("ip6tnl: allow to use rtnl ops
      on fb tunnel").
      
      When ip6_tunnel.ko is unloaded, FB device is delete by rtnl_link_unregister()
      and then we try to use the pointer in ip6_tnl_destroy_tunnels().
      
      Let's add an handler for dellink, which will never remove the FB tunnel. With
      this patch it will no more be possible to remove it via 'ip link del ip6tnl0',
      but it's safer.
      
      The same fix was already proposed by Willem de Bruijn <willemb@google.com> for
      sit interfaces.
      
      CC: Willem de Bruijn <willemb@google.com>
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06c375b4
    • Dan Carpenter's avatar
      isdnloop: use strlcpy() instead of strcpy() · c2ef4d4c
      Dan Carpenter authored
      [ Upstream commit f9a23c84 ]
      
      These strings come from a copy_from_user() and there is no way to be
      sure they are NUL terminated.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2ef4d4c
    • Willem de Bruijn's avatar
      sit: fix use after free of fb_tunnel_dev · 7c9c354a
      Willem de Bruijn authored
      [ Upstream commit 9434266f ]
      
      Bug: The fallback device is created in sit_init_net and assumed to be
      freed in sit_exit_net. First, it is dereferenced in that function, in
      sit_destroy_tunnels:
      
              struct net *net = dev_net(sitn->fb_tunnel_dev);
      
      Prior to this, rtnl_unlink_register has removed all devices that match
      rtnl_link_ops == sit_link_ops.
      
      Commit 205983c4 added the line
      
      +       sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
      
      which cases the fallback device to match here and be freed before it
      is last dereferenced.
      
      Fix: This commit adds an explicit .delllink callback to sit_link_ops
      that skips deallocation at rtnl_unlink_register for the fallback
      device. This mechanism is comparable to the one in ip_tunnel.
      
      It also modifies sit_destroy_tunnels and its only caller sit_exit_net
      to avoid the offending dereference in the first place. That double
      lookup is more complicated than required.
      
      Test: The bug is only triggered when CONFIG_NET_NS is enabled. It
      causes a GPF only when CONFIG_DEBUG_SLAB is enabled. Verified that
      this bug exists at the mentioned commit, at davem-net HEAD and at
      3.11.y HEAD. Verified that it went away after applying this patch.
      
      Fixes: 205983c4 ("sit: allow to use rtnl ops on fb tunnel")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c9c354a
    • Eric Dumazet's avatar
      net-tcp: fix panic in tcp_fastopen_cache_set() · a9ac5b0b
      Eric Dumazet authored
      [ Upstream commit dccf76ca ]
      
      We had some reports of crashes using TCP fastopen, and Dave Jones
      gave a nice stack trace pointing to the error.
      
      Issue is that tcp_get_metrics() should not be called with a NULL dst
      
      Fixes: 1fe4c481 ("net-tcp: Fast Open client - cookie cache")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Tested-by: default avatarDave Jones <davej@fedoraproject.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9ac5b0b
    • Nikolay Aleksandrov's avatar
      bonding: fix two race conditions in bond_store_updelay/downdelay · accbc62e
      Nikolay Aleksandrov authored
      [ Upstream commit b869ccfa ]
      
      This patch fixes two race conditions between bond_store_updelay/downdelay
      and bond_store_miimon which could lead to division by zero as miimon can
      be set to 0 while either updelay/downdelay are being set and thus miss the
      zero check in the beginning, the zero div happens because updelay/downdelay
      are stored as new_value / bond->params.miimon. Use rtnl to synchronize with
      miimon setting.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: default avatarVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      accbc62e
    • Eric Dumazet's avatar
      tcp: tsq: restore minimal amount of queueing · f4a5ec10
      Eric Dumazet authored
      [ Upstream commit 98e09386 ]
      
      After commit c9eeec26 ("tcp: TSQ can use a dynamic limit"), several
      users reported throughput regressions, notably on mvneta and wifi
      adapters.
      
      802.11 AMPDU requires a fair amount of queueing to be effective.
      
      This patch partially reverts the change done in tcp_write_xmit()
      so that the minimal amount is sysctl_tcp_limit_output_bytes.
      
      It also remove the use of this sysctl while building skb stored
      in write queue, as TSO autosizing does the right thing anyway.
      
      Users with well behaving NICS and correct qdisc (like sch_fq),
      can then lower the default sysctl_tcp_limit_output_bytes value from
      128KB to 8KB.
      
      This new usage of sysctl_tcp_limit_output_bytes permits each driver
      authors to check how their driver performs when/if the value is set
      to a minimum of 4KB.
      
      Normally, line rate for a single TCP flow should be possible,
      but some drivers rely on timers to perform TX completion and
      too long TX completion delays prevent reaching full throughput.
      
      Fixes: c9eeec26 ("tcp: TSQ can use a dynamic limit")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarSujith Manoharan <sujith@msujith.org>
      Reported-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Tested-by: default avatarSujith Manoharan <sujith@msujith.org>
      Cc: Felix Fietkau <nbd@openwrt.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4a5ec10
    • Jason Wang's avatar
      macvtap: limit head length of skb allocated · ebee20ff
      Jason Wang authored
      [ Upstream commit 16a3fa28 ]
      
      We currently use hdr_len as a hint of head length which is advertised by
      guest. But when guest advertise a very big value, it can lead to an 64K+
      allocating of kmalloc() which has a very high possibility of failure when host
      memory is fragmented or under heavy stress. The huge hdr_len also reduce the
      effect of zerocopy or even disable if a gso skb is linearized in guest.
      
      To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the
      head, which guarantees an order 0 allocation each time.
      
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebee20ff