- 29 Jan, 2016 26 commits
-
-
Eric Dumazet authored
commit 60bc851a upstream. Using bit fields is dangerous on ppc64/sparc64, as the compiler [1] uses 64bit instructions to manipulate them. If the 64bit word includes any atomic_t or spinlock_t, we can lose critical concurrent changes. This is happening in af_unix, where unix_sk(sk)->gc_candidate/ gc_maybe_cycle/lock share the same 64bit word. This leads to fatal deadlock, as one/several cpus spin forever on a spinlock that will never be available again. A safer way would be to use a long to store flags. This way we are sure compiler/arch wont do bad things. As we own unix_gc_lock spinlock when clearing or setting bits, we can use the non atomic __set_bit()/__clear_bit(). recursion_level can share the same 64bit location with the spinlock, as it is set only with this spinlock held. [1] bug fixed in gcc-4.8.0 : http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080Reported-by: Ambrose Feinstein <ambrose@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 2ee9cbe7) [wt: adjusted context] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Marcelo Ricardo Leitner authored
[ Upstream commit 01ce63c9 ] Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy related to disabling sock timestamp. When SCTP accepts an association or peel one off, it copies sock flags but forgot to call net_enable_timestamp() if a packet timestamping flag was copied, leading to extra calls to net_disable_timestamp() whenever such clones were closed. The fix is to call net_enable_timestamp() whenever we copy a sock with that flag on, like tcp does. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: SK_FLAGS_TIMESTAMP is newly defined] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit d85242d9) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Daniel Borkmann authored
[ Upstream commit 6900317f ] David and HacKurx reported a following/similar size overflow triggered in a grsecurity kernel, thanks to PaX's gcc size overflow plugin: (Already fixed in later grsecurity versions by Brad and PaX Team.) [ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314 cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr; [ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7 [ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...] [ 1002.296153] ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8 [ 1002.296162] ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8 [ 1002.296169] ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60 [ 1002.296176] Call Trace: [ 1002.296190] [<ffffffff818129ba>] dump_stack+0x45/0x57 [ 1002.296200] [<ffffffff8121f838>] report_size_overflow+0x38/0x60 [ 1002.296209] [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300 [ 1002.296220] [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930 [ 1002.296228] [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60 [ 1002.296236] [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50 [ 1002.296243] [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60 [ 1002.296248] [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0 [ 1002.296257] [<ffffffff81693496>] __sys_recvmsg+0x46/0x80 [ 1002.296263] [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40 [ 1002.296271] [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85 Further investigation showed that this can happen when an *odd* number of fds are being passed over AF_UNIX sockets. In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)), where i is the number of successfully passed fds, differ by 4 bytes due to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary on 64 bit. The padding is used to align subsequent cmsg headers in the control buffer. When the control buffer passed in from the receiver side *lacks* these 4 bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will overflow in scm_detach_fds(): int cmlen = CMSG_LEN(i * sizeof(int)); <--- cmlen w/o tail-padding err = put_user(SOL_SOCKET, &cm->cmsg_level); if (!err) err = put_user(SCM_RIGHTS, &cm->cmsg_type); if (!err) err = put_user(cmlen, &cm->cmsg_len); if (!err) { cmlen = CMSG_SPACE(i * sizeof(int)); <--- cmlen w/ 4 byte extra tail-padding msg->msg_control += cmlen; msg->msg_controllen -= cmlen; <--- iff no tail-padding space here ... } ... wrap-around F.e. it will wrap to a length of 18446744073709551612 bytes in case the receiver passed in msg->msg_controllen of 20 bytes, and the sender properly transferred 1 fd to the receiver, so that its CMSG_LEN results in 20 bytes and CMSG_SPACE in 24 bytes. In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an issue in my tests as alignment seems always on 4 byte boundary. Same should be in case of native 32 bit, where we end up with 4 byte boundaries as well. In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving a single fd would mean that on successful return, msg->msg_controllen is being set by the kernel to 24 bytes instead, thus more than the input buffer advertised. It could f.e. become an issue if such application later on zeroes or copies the control buffer based on the returned msg->msg_controllen elsewhere. Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253). Going over the code, it seems like msg->msg_controllen is not being read after scm_detach_fds() in scm_recv() anymore by the kernel, good! Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg()) and unix_stream_recvmsg(). Both return back to their recvmsg() caller, and ___sys_recvmsg() places the updated length, that is, new msg_control - old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen in the example). Long time ago, Wei Yongjun fixed something related in commit 1ac70e7a ("[NET]: Fix function put_cmsg() which may cause usr application memory overflow"). RFC3542, section 20.2. says: The fields shown as "XX" are possible padding, between the cmsghdr structure and the data, and between the data and the next cmsghdr structure, if required by the implementation. While sending an application may or may not include padding at the end of last ancillary data in msg_controllen and implementations must accept both as valid. On receiving a portable application must provide space for padding at the end of the last ancillary data as implementations may copy out the padding at the end of the control message buffer and include it in the received msg_controllen. When recvmsg() is called if msg_controllen is too small for all the ancillary data items including any trailing padding after the last item an implementation may set MSG_CTRUNC. Since we didn't place MSG_CTRUNC for already quite a long time, just do the same as in 1ac70e7a to avoid an overflow. Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix error in SCM_RIGHTS code sample"). Some people must have copied this (?), thus it got triggered in the wild (reported several times during boot by David and HacKurx). No Fixes tag this time as pre 2002 (that is, pre history tree). Reported-by: David Sterba <dave@jikos.cz> Reported-by: HacKurx <hackurx@gmail.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Emese Revfy <re.emese@gmail.com> Cc: Brad Spengler <spender@grsecurity.net> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 831a2a17) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Eric Dumazet authored
[ Upstream commit 142a2e7e ] Dmitry provided a syzkaller (http://github.com/google/syzkaller) generated program that triggers the WARNING at net/ipv4/tcp.c:1729 in tcp_recvmsg() : WARN_ON(tp->copied_seq != tp->rcv_nxt && !(flags & (MSG_PEEK | MSG_TRUNC))); His program is specifically attempting a Cross SYN TCP exchange, that we support (for the pleasure of hackers ?), but it looks we lack proper tcp->copied_seq initialization. Thanks again Dmitry for your report and testings. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 6cfa9781) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Jan Stancek authored
commit 27f972d3 upstream. We encountered a panic on boot in ipmi_si on a dell per320 due to an uninitialized timer as follows. static int smi_start_processing(void *send_info, ipmi_smi_t intf) { /* Try to claim any interrupts. */ if (new_smi->irq_setup) new_smi->irq_setup(new_smi); --> IRQ arrives here and irq handler tries to modify uninitialized timer which triggers BUG_ON(!timer->function) in __mod_timer(). Call Trace: <IRQ> [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si] [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si] [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si] [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350 [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si] [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170 [<ffffffff810f245e>] handle_edge_irq+0xde/0x180 [<ffffffff8100fc59>] handle_irq+0x49/0xa0 [<ffffffff8154643c>] do_IRQ+0x6c/0xf0 [<ffffffff8100ba53>] ret_from_intr+0x0/0x11 /* Set up the timer that drives the interface. */ setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi); The following patch fixes the problem. To: Openipmi-developer@lists.sourceforge.net To: Corey Minyard <minyard@acm.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Tony Camuso <tcamuso@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 207ffa8c) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Sasha Levin authored
commit 119d6f6a upstream. Because wakeups can (fundamentally) be late, a task might not be in the expected state. Therefore testing against a task's state is racy, and can yield false positives. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: oleg@redhat.com Fixes: 9067ac85 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task") Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.comSigned-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 0e796c1b) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Andrew Lunn authored
commit 4eba7bb1 upstream. When a multicast group is joined on a socket, a struct ip_mc_socklist is appended to the sockets mc_list containing information about the joined group. If the interface is hot unplugged, this entry becomes stale. Prior to commit 52ad353a ("igmp: fix the problem when mc leave group") it was possible to remove the stale entry by performing a IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on the interface. However, this fix enforces that the interface must still exist. Thus with time, the number of stale entries grows, until sysctl_igmp_max_memberships is reached and then it is not possible to join and more groups. The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is performed without specifying the interface, either by ifindex or ip address. However here we do supply one of these. So loosen the restriction on device existence to only apply when the interface has not been specified. This then restores the ability to clean up the stale entries. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 52ad353a "(igmp: fix the problem when mc leave group") Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit b1fa8526) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Peter Hurley authored
commit ee9159dd upstream. The N_X25 line discipline may access the previous line discipline's closed and already-freed private data on open [1]. The tty->disc_data field _never_ refers to valid data on entry to the line discipline's open() method. Rather, the ldisc is expected to initialize that field for its own use for the lifetime of the instance (ie. from open() to close() only). [1] [ 634.336761] ================================================================== [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] ============================================================================= [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 485274cf) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Jeff Layton authored
commit c812012f upstream. If we pass in an empty nfs_fattr struct to nfs_update_inode, it will (correctly) not update any of the attributes, but it then clears the NFS_INO_INVALID_ATTR flag, which indicates that the attributes are up to date. Don't clear the flag if the fattr struct has no valid attrs to apply. Reviewed-by: Steve French <steve.french@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit ddab0155) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
David Turner authored
commit a4dad1ae upstream. In ext4, the bottom two bits of {a,c,m}time_extra are used to extend the {a,c,m}time fields, deferring the year 2038 problem to the year 2446. When decoding these extended fields, for times whose bottom 32 bits would represent a negative number, sign extension causes the 64-bit extended timestamp to be negative as well, which is not what's intended. This patch corrects that issue, so that the only negative {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed timestamps). Some older kernels might have written pre-1970 dates with 1,1 in the extra bits. This patch treats those incorrectly-encoded dates as pre-1970, instead of post-2311, until kernel 4.20 is released. Hopefully by then e2fsck will have fixed up the bad data. Also add a comment explaining the encoding of ext4's extra {a,c,m}time bits. Signed-off-by: David Turner <novalis@novalis.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Mark Harris <mh8928@yahoo.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 6dfd5f6a) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Jan Kara authored
commit c2489e07 upstream. The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 47ae562e) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Al Viro authored
commit 0ebf7f10 upstream. The thing got broken back in 2002 - sysvfs does *not* have inline symlinks; even short ones have bodies stored in the first block of file. sysv_symlink() handles that correctly; unfortunately, attempting to look an existing symlink up will end up confusing them for inline symlinks, and interpret the block number containing the body as the body itself. Nobody has noticed until now, which says something about the level of testing sysvfs gets ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [bwh: Backported to 3.2: - Adjust context - Also delete unused sysv_fast_symlink_inode_operations] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 081c7697) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Roman Gushchin authored
commit 3ca8138f upstream. I got a report about unkillable task eating CPU. Further investigation shows, that the problem is in the fuse_fill_write_pages() function. If iov's first segment has zero length, we get an infinite loop, because we never reach iov_iter_advance() call. Fix this by calling iov_iter_advance() before repeating an attempt to copy data from userspace. A similar problem is described in 124d3b70 ("fix writev regression: pan hanging unkillable and un-straceable"). If zero-length segmend is followed by segment with invalid address, iov_iter_fault_in_readable() checks only first segment (zero-length), iov_iter_copy_from_user_atomic() skips it, fails at second and returns zero -> goto again without skipping zero-length segment. Patch calls iov_iter_advance() before goto again: we'll skip zero-length segment at second iteraction and iov_iter_fault_in_readable() will detect invalid address. Special thanks to Konstantin Khlebnikov, who helped a lot with the commit description. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Maxim Patlasov <mpatlasov@parallels.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: ea9b9907 ("fuse: implement perform_write") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit a5b23416) [wt: adjusted context, as commit 478e0841 from 3.1 was never backported to call mark_page_accessed() eventhough it seems it should have been] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
lucien authored
commit ed5a377d upstream. now sctp auth cannot work well when setting a hmacid manually, which is caused by that we didn't use the network order for hmacid, so fix it by adding the transformation in sctp_auth_ep_set_hmacs. even we set hmacid with the network order in userspace, it still can't work, because of this condition in sctp_auth_ep_set_hmacs(): if (id > SCTP_AUTH_HMAC_ID_MAX) return -EOPNOTSUPP; so this wasn't working before and thus it won't break compatibility. Fixes: 65b07e5d ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 98d37e7f) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
David S. Miller authored
commit 5233252f upstream. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Hannes Frederic Sowa authored
commit 7bbadd2d upstream. Docbook does not like the definition of macros inside a field declaration and adds a warning. Move the definition out. Fixes: 79462ad0 ("net: add validation for the socket syscall protocol argument") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: keep open-coding U8_MAX] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 35da6d62) [wt: adjusted context] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Hannes Frederic Sowa authored
commit 79462ad0 upstream. 郭永刚 reported that one could simply crash the kernel as root by using a simple program: int socket_fd; struct sockaddr_in addr; addr.sin_port = 0; addr.sin_addr.s_addr = INADDR_ANY; addr.sin_family = 10; socket_fd = socket(10,3,0x40000000); connect(socket_fd , &addr,16); AF_INET, AF_INET6 sockets actually only support 8-bit protocol identifiers. inet_sock's skc_protocol field thus is sized accordingly, thus larger protocol identifiers simply cut off the higher bits and store a zero in the protocol fields. This could lead to e.g. NULL function pointer because as a result of the cut off inet_num is zero and we call down to inet_autobind, which is NULL for raw sockets. kernel: Call Trace: kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70 kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80 kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110 kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80 kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200 kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10 kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89 I found no particular commit which introduced this problem. CVE: CVE-2015-8543 Cc: Cong Wang <cwang@twopensource.com> Reported-by: 郭永刚 <guoyonggang@360.cn> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 2.6.32: open-code U8_MAX; adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
David Howells authored
commit b4a1b4f5 upstream. This fixes CVE-2015-7550. There's a race between keyctl_read() and keyctl_revoke(). If the revoke happens between keyctl_read() checking the validity of a key and the key's semaphore being taken, then the key type read method will see a revoked key. This causes a problem for the user-defined key type because it assumes in its read method that there will always be a payload in a non-revoked key and doesn't check for a NULL pointer. Fix this by making keyctl_read() check the validity of a key after taking semaphore instead of before. I think the bug was introduced with the original keyrings code. This was discovered by a multithreaded test program generated by syzkaller (http://github.com/google/syzkaller). Here's a cleaned up version: #include <sys/types.h> #include <keyutils.h> #include <pthread.h> void *thr0(void *arg) { key_serial_t key = (unsigned long)arg; keyctl_revoke(key); return 0; } void *thr1(void *arg) { key_serial_t key = (unsigned long)arg; char buffer[16]; keyctl_read(key, buffer, 16); return 0; } int main() { key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING); pthread_t th[5]; pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key); pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key); pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key); pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key); pthread_join(th[0], 0); pthread_join(th[1], 0); pthread_join(th[2], 0); pthread_join(th[3], 0); return 0; } Build as: cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread Run as: while keyctl-race; do :; done as it may need several iterations to crash the kernel. The crash can be summarised as: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff81279b08>] user_read+0x56/0xa3 ... Call Trace: [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: James Morris <james.l.morris@oracle.com> [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Eric Dumazet authored
commit 197c949e upstream. Backport of this upstream commit into stable kernels : 89c22d8c ("net: Fix skb csum races when peeking") exposed a bug in udp stack vs MSG_PEEK support, when user provides a buffer smaller than skb payload. In this case, skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov); returns -EFAULT. This bug does not happen in upstream kernels since Al Viro did a great job to replace this into : skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg); This variant is safe vs short buffers. For the time being, instead reverting Herbert Xu patch and add back skb->ip_summed invalid changes, simply store the result of udp_lib_checksum_complete() so that we avoid computing the checksum a second time, and avoid the problematic skb_copy_and_csum_datagram_iovec() call. This patch can be applied on recent kernels as it avoids a double checksumming, then backported to stable kernels as a bug fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 18a6eba2) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Willy Tarreau authored
This reverts commit c507639b. As reported by Michal Kubecek, this fix doesn't handle truncated reads correctly. Next patch from Eric fixes it better. Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Ben Hutchings authored
Fix failure paths in ext4_fill_super() that can lead to a null dereference. This was designated CVE-2015-8324. Mostly extracted from commit 744692dc ("ext4: use ext4_get_block_write in buffer write"). However there's one more incorrect goto to fix, removed upstream in commit cf40db13 ("ext4: remove failed journal checksum check"). Reference: https://bugs.openvz.org/browse/OVZ-6541Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Rainer Weikusat authored
commit 7d267278 upstream. Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 2.6.32: - Access sk_sleep directly, not through sk_sleep() function - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Quentin Casasnovas authored
commit 8c7188b2 upstream. Sasha's found a NULL pointer dereference in the RDS connection code when sending a message to an apparently unbound socket. The problem is caused by the code checking if the socket is bound in rds_sendmsg(), which checks the rs_bound_addr field without taking a lock on the socket. This opens a race where rs_bound_addr is temporarily set but where the transport is not in rds_bind(), leading to a NULL pointer dereference when trying to dereference 'trans' in __rds_conn_create(). Vegard wrote a reproducer for this issue, so kindly ask him to share if you're interested. I cannot reproduce the NULL pointer dereference using Vegard's reproducer with this patch, whereas I could without. Complete earlier incomplete fix to CVE-2015-6937: 74e98eb0 ("RDS: verify the underlying transport exists before creating a connection") Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Ben Hutchings authored
commit 4ab42d78 upstream. Currently slhc_init() treats out-of-range values of rslots and tslots as equivalent to 0, except that if tslots is too large it will dereference a null pointer (CVE-2015-7799). Add a range-check at the top of the function and make it return an ERR_PTR() on error instead of NULL. Change the callers accordingly. Compile-tested only. Reported-by: 郭永刚 <guoyonggang@360.cn> References: http://article.gmane.org/gmane.comp.security.oss.general/17908Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 2.6.32: adjust filenames, context, indentation] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Ben Hutchings authored
commit 0baa57d8 upstream. Compile-tested only. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
-
WANG Cong authored
commit 7ba0c47c upstream. We need to wait for the flying timers, since we are going to free the mrtable right after it. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> [ wt: 2.6.32 has a single table hence a single timer. ip6_mr_init() has the same del_timer() call on the error path, but we don't need to change it since at this point the timer hasn't been started yet ] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
- 05 Dec, 2015 14 commits
-
-
Willy Tarreau authored
Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Christophe Leroy authored
commit 0ff28d9f upstream. Using sendfile with below small program to get MD5 sums of some files, it appear that big files (over 64kbytes with 4k pages system) get a wrong MD5 sum while small files get the correct sum. This program uses sendfile() to send a file to an AF_ALG socket for hashing. /* md5sum2.c */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <fcntl.h> #include <sys/socket.h> #include <sys/stat.h> #include <sys/types.h> #include <linux/if_alg.h> int main(int argc, char **argv) { int sk = socket(AF_ALG, SOCK_SEQPACKET, 0); struct stat st; struct sockaddr_alg sa = { .salg_family = AF_ALG, .salg_type = "hash", .salg_name = "md5", }; int n; bind(sk, (struct sockaddr*)&sa, sizeof(sa)); for (n = 1; n < argc; n++) { int size; int offset = 0; char buf[4096]; int fd; int sko; int i; fd = open(argv[n], O_RDONLY); sko = accept(sk, NULL, 0); fstat(fd, &st); size = st.st_size; sendfile(sko, fd, &offset, size); size = read(sko, buf, sizeof(buf)); for (i = 0; i < size; i++) printf("%2.2x", buf[i]); printf(" %s\n", argv[n]); close(fd); close(sko); } exit(0); } Test below is done using official linux patch files. First result is with a software based md5sum. Second result is with the program above. root@vgoip:~# ls -l patch-3.6.* -rw-r--r-- 1 root root 64011 Aug 24 12:01 patch-3.6.2.gz -rw-r--r-- 1 root root 94131 Aug 24 12:01 patch-3.6.3.gz root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz 5fd77b24e68bb24dcc72d6e57c64790e patch-3.6.3.gz After investivation, it appears that sendfile() sends the files by blocks of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each block, the SPLICE_F_MORE flag is missing, therefore the hashing operation is reset as if it was the end of the file. This patch adds SPLICE_F_MORE to the flags when more data is pending. With the patch applied, we get the correct sums: root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit fcb27817) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Eric Dumazet authored
[ Upstream commit 8fa677d2 ] Under low memory conditions, tcp_sk_init() and icmp_sk_init() can both iterate on all possible cpus and call inet_ctl_sock_destroy(), with eventual NULL pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit f79c83d6) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Ani Sinha authored
[ Upstream commit 44f49dd8 ] Fixes the following kernel BUG : BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758 caller is __this_cpu_preempt_check+0x13/0x15 CPU: 0 PID: 2758 Comm: bash Tainted: P O 3.18.19 #2 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8 Call Trace: [<ffffffff81482b2a>] dump_stack+0x52/0x80 [<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1 [<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15 [<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c [<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e [<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810e6974>] ? pollwake+0x4d/0x51 [<ffffffff81058ac0>] ? default_wake_function+0x0/0xf [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810613d9>] ? __wake_up_common+0x45/0x77 [<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32 [<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53 [<ffffffff8139a519>] ? sock_def_readable+0x71/0x75 [<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55 [<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41 [<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86 [<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d [<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b [<ffffffff810d5738>] ? new_sync_read+0x82/0xaa [<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99 [<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32 [<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f [<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf [<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3 [<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e Signed-off-by: Ani Sinha <ani@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: ipmr doesn't implement IPSTATS_MIB_OUTOCTETS] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 33cf84ba) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Sowmini Varadhan authored
[ Upstream commit 8ce675ff ] Either of pskb_pull() or pskb_trim() may fail under low memory conditions. If rds_tcp_data_recv() ignores such failures, the application will receive corrupted data because the skb has not been correctly carved to the RDS datagram size. Avoid this by handling pskb_pull/pskb_trim failure in the same manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and retry via the deferred call to rds_send_worker() that gets set up on ENOMEM from rds_tcp_read_sock() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit f114d937) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Maciej W. Rozycki authored
commit b582ef5c upstream. Do not clobber the buffer space passed from `search_binary_handler' and originally preloaded by `prepare_binprm' with the executable's file header by overwriting it with its interpreter's file header. Instead keep the buffer space intact and directly use the data structure locally allocated for the interpreter's file header, fixing a bug introduced in 2.1.14 with loadable module support (linux-mips.org commit beb11695 [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history). Adjust the amount of data read from the interpreter's file accordingly. This was not an issue before loadable module support, because back then `load_elf_binary' was executed only once for a given ELF executable, whether the function succeeded or failed. With loadable module support supported and enabled, upon a failure of `load_elf_binary' -- which may for example be caused by architecture code rejecting an executable due to a missing hardware feature requested in the file header -- a module load is attempted and then the function reexecuted by `search_binary_handler'. With the executable's file header replaced with its interpreter's file header the executable can then be erroneously accepted in this subsequent attempt. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit beebd9fa) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Dan Carpenter authored
commit 1f35d04a upstream. The iomap[] array has PCIM_IOMAP_MAX (6) elements and not DEVICE_COUNT_RESOURCE (16). This bug was found using a static checker. It may be that the "if (!(mask & (1 << i)))" check means we never actually go past the end of the array in real life. Fixes: ec04b075 ('iomap: implement pcim_iounmap_regions()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit e7102453) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Richard Purdie authored
commit 79b568b9 upstream. hid_connect adds various strings to the buffer but they're all conditional. You can find circumstances where nothing would be written to it but the kernel will still print the supposedly empty buffer with printk. This leads to corruption on the console/in the logs. Ensure buf is initialized to an empty string. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> [dvhart: Initialize string to "" rather than assign buf[0] = NULL;] Cc: Jiri Kosina <jikos@kernel.org> Cc: linux-input@vger.kernel.org Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 604bfd00) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Joe Perches authored
[ Upstream commit 077cb37f ] It seems that kernel memory can leak into userspace by a kmalloc, ethtool_get_strings, then copy_to_user sequence. Avoid this by using kcalloc to zero fill the copied buffer. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 68c3e59a) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Dāvis Mosāns authored
commit 22805217 upstream. When pci_pool_alloc fails in mvs_task_prep then task->lldd_task stays NULL but it's later used in mvs_abort_task as slot which is passed to mvs_slot_task_free causing NULL pointer dereference. Just return from mvs_slot_task_free when passed with NULL slot. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101891Signed-off-by: Dāvis Mosāns <davispuh@gmail.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit cc1875ec) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Kosuke Tatsukawa authored
commit e81107d4 upstream. My colleague ran into a program stall on a x86_64 server, where n_tty_read() was waiting for data even if there was data in the buffer in the pty. kernel stack for the stuck process looks like below. #0 [ffff88303d107b58] __schedule at ffffffff815c4b20 #1 [ffff88303d107bd0] schedule at ffffffff815c513e #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23 #5 [ffff88303d107dd0] tty_read at ffffffff81368013 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57 #8 [ffff88303d107f00] sys_read at ffffffff811a4306 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7 There seems to be two problems causing this issue. First, in drivers/tty/n_tty.c, __receive_buf() stores the data and updates ldata->commit_head using smp_store_release() and then checks the wait queue using waitqueue_active(). However, since there is no memory barrier, __receive_buf() could return without calling wake_up_interactive_poll(), and at the same time, n_tty_read() could start to wait in wait_woken() as in the following chart. __receive_buf() n_tty_read() ------------------------------------------------------------------------ if (waitqueue_active(&tty->read_wait)) /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ add_wait_queue(&tty->read_wait, &wait); ... if (!input_available_p(tty, 0)) { smp_store_release(&ldata->commit_head, ldata->read_head); ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ The second problem is that n_tty_read() also lacks a memory barrier call and could also cause __receive_buf() to return without calling wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken() as in the chart below. __receive_buf() n_tty_read() ------------------------------------------------------------------------ spin_lock_irqsave(&q->lock, flags); /* from add_wait_queue() */ ... if (!input_available_p(tty, 0)) { /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ smp_store_release(&ldata->commit_head, ldata->read_head); if (waitqueue_active(&tty->read_wait)) __add_wait_queue(q, wait); spin_unlock_irqrestore(&q->lock,flags); /* from add_wait_queue() */ ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ There are also other places in drivers/tty/n_tty.c which have similar calls to waitqueue_active(), so instead of adding many memory barrier calls, this patch simply removes the call to waitqueue_active(), leaving just wake_up*() behind. This fixes both problems because, even though the memory access before or after the spinlocks in both wake_up*() and add_wait_queue() can sneak into the critical section, it cannot go past it and the critical section assures that they will be serialized (please see "INTER-CPU ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a better explanation). Moreover, the resulting code is much simpler. Latency measurement using a ping-pong test over a pty doesn't show any visible performance drop. Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: - Use wake_up_interruptible(), not wake_up_interruptible_poll() - There are only two spurious uses of waitqueue_active() to remove] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 80910ccd) [wt: file is drivers/char/n_tty.c in 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Mel Gorman authored
commit 2f84a899 upstream. SunDong reported the following on https://bugzilla.kernel.org/show_bug.cgi?id=103841 I think I find a linux bug, I have the test cases is constructed. I can stable recurring problems in fedora22(4.0.4) kernel version, arch for x86_64. I construct transparent huge page, when the parent and child process with MAP_SHARE, MAP_PRIVATE way to access the same huge page area, it has the opportunity to lead to huge page copy on write failure, and then it will munmap the child corresponding mmap area, but then the child mmap area with VM_MAYSHARE attributes, child process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags functions (vma - > vm_flags & VM_MAYSHARE). There were a number of problems with the report (e.g. it's hugetlbfs that triggers this, not transparent huge pages) but it was fundamentally correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that looks like this vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800 prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0 pgoff 0 file ffff88106bdb9800 private_data (null) flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb) ------------ kernel BUG at mm/hugetlb.c:462! SMP Modules linked in: xt_pkttype xt_LOG xt_limit [..] CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012 set_vma_resv_flags+0x2d/0x30 The VM_BUG_ON is correct because private and shared mappings have different reservation accounting but the warning clearly shows that the VMA is shared. When a private COW fails to allocate a new page then only the process that created the VMA gets the page -- all the children unmap the page. If the children access that data in the future then they get killed. The problem is that the same file is mapped shared and private. During the COW, the allocation fails, the VMAs are traversed to unmap the other private pages but a shared VMA is found and the bug is triggered. This patch identifies such VMAs and skips them. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: SunDong <sund_sky@126.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 846bc2d8) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Thomas Gleixner authored
commit eddd3826 upstream. Dmitry Vyukov reported the following using trinity and the memory error detector AddressSanitizer (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel). [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000 [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a) [ 124.578633] Accessed by thread T10915: [ 124.579295] inlined in describe_heap_address ./arch/x86/mm/asan/report.c:164 [ 124.579295] #0 ffffffff810dd277 in asan_report_error ./arch/x86/mm/asan/report.c:278 [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region ./arch/x86/mm/asan/asan.c:37 [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0 [ 124.581893] #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444 The address checks in the 64bit implementation of get_wchan() are wrong in several ways: - The lower bound of the stack is not the start of the stack page. It's the start of the stack page plus sizeof (struct thread_info) - The upper bound must be: top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long). The 2 * sizeof(unsigned long) is required because the stack pointer points at the frame pointer. The layout on the stack is: ... IP FP ... IP FP. So we need to make sure that both IP and FP are in the bounds. Fix the bound checks and get rid of the mix of numeric constants, u64 and unsigned long. Making all unsigned long allows us to use the same function for 32bit as well. Use READ_ONCE() when accessing the stack. This does not prevent a concurrent wakeup of the task and the stack changing, but at least it avoids TOCTOU. Also check task state at the end of the loop. Again that does not prevent concurrent changes, but it avoids walking for nothing. Add proper comments while at it. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de> Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 3.2: - s/READ_ONCE/ACCESS_ONCE/ - Remove use of TOP_OF_KERNEL_STACK_PADDING, not defined here and would be defined as 0] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 5311d93d) Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Peter Zijlstra authored
commit 275d7d44 upstream. Poma (on the way to another bug) reported an assertion triggering: [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90 [<ffffffff81150822>] __module_address+0x32/0x150 [<ffffffff81150956>] __module_text_address+0x16/0x70 [<ffffffff81150f19>] symbol_put_addr+0x29/0x40 [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core] Laura Abbott <labbott@redhat.com> produced a patch which lead us to inspect symbol_put_addr(). This function has a comment claiming it doesn't need to disable preemption around the module lookup because it holds a reference to the module it wants to find, which therefore cannot go away. This is wrong (and a false optimization too, preempt_disable() is really rather cheap, and I doubt any of this is on uber critical paths, otherwise it would've retained a pointer to the actual module anyway and avoided the second lookup). While its true that the module cannot go away while we hold a reference on it, the data structure we do the lookup in very much _CAN_ change while we do the lookup. Therefore fix the comment and add the required preempt_disable(). Reported-by: poma <pomidorabelisima@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Fixes: a6e6abd5 ("module: remove module_text_address()") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3895ff2d) Signed-off-by: Willy Tarreau <w@1wt.eu>
-