- 21 Mar, 2016 1 commit
-
-
Rainer Weikusat authored
commit 7d267278 upstream. Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by:
Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by:
Jason Baron <jbaron@akamai.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Zefan Li <lizefan@huawei.com>
-
- 27 Apr, 2014 1 commit
-
-
Eric Dumazet authored
[ Upstream commit de144391 ] Some applications didn't expect recvmsg() on a non blocking socket could return -EINTR. This possibility was added as a side effect of commit b3ca9b02 ("net: fix multithreaded signal handling in unix recv routines"). To hit this bug, you need to be a bit unlucky, as the u->readlock mutex is usually held for very small periods. Fixes: b3ca9b02 ("net: fix multithreaded signal handling in unix recv routines") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 Jan, 2014 2 commits
-
-
Sasha Levin authored
[ Upstream commit 37ab4fa7 ] This is similar to the set_peek_off patch where calling bind while the socket is stuck in unix_dgram_recvmsg() will block and cause a hung task spew after a while. This is also the last place that did a straightforward mutex_lock(), so there shouldn't be any more of these patches. Signed-off-by:
Sasha Levin <sasha.levin@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sasha Levin authored
[ Upstream commit 12663bfc ] unix_dgram_recvmsg() will hold the readlock of the socket until recv is complete. In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until unix_dgram_recvmsg() will complete (which can take a while) without allowing us to break out of it, triggering a hung task spew. Instead, allow set_peek_off to fail, this way userspace will not hang. Signed-off-by:
Sasha Levin <sasha.levin@oracle.com> Acked-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 08 Dec, 2013 1 commit
-
-
Hannes Frederic Sowa authored
[ Upstream commit f3d33426 ] This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 04 Nov, 2013 2 commits
-
-
Daniel Borkmann authored
[ Upstream commit 90c6bd34 ] In the case of credentials passing in unix stream sockets (dgram sockets seem not affected), we get a rather sparse race after commit 16e57262 ("af_unix: dont send SCM_CREDENTIALS by default"). We have a stream server on receiver side that requests credential passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED on each spawned/accepted socket on server side to 1 first (as it's not inherited), it can happen that in the time between accept() and setsockopt() we get interrupted, the sender is being scheduled and continues with passing data to our receiver. At that time SO_PASSCRED is neither set on sender nor receiver side, hence in cmsg's SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534 (== overflow{u,g}id) instead of what we actually would like to see. On the sender side, here nc -U, the tests in maybe_add_creds() invoked through unix_stream_sendmsg() would fail, as at that exact time, as mentioned, the sender has neither SO_PASSCRED on his side nor sees it on the server side, and we have a valid 'other' socket in place. Thus, sender believes it would just look like a normal connection, not needing/requesting SO_PASSCRED at that time. As reverting 16e57262 would not be an option due to the significant performance regression reported when having creds always passed, one way/trade-off to prevent that would be to set SO_PASSCRED on the listener socket and allow inheriting these flags to the spawned socket on server side in accept(). It seems also logical to do so if we'd tell the listener socket to pass those flags onwards, and would fix the race. Before, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}}, msg_flags=0}, 0) = 5 After, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}}, msg_flags=0}, 0) = 5 Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mathias Krause authored
[ Upstream commit 6865d1e8 ] When filling the netlink message we miss to wipe the pad field, therefore leak one byte of heap memory to userland. Fix this by setting pad to 0. Signed-off-by:
Mathias Krause <minipli@googlemail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 01 May, 2013 1 commit
-
-
Eric W. Biederman authored
[ Upstream commit 0e82e7f6 ] It was reported that the following LSB test case failed https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we were not coallescing unix stream messages when the application was expecting us to. The problem was that the first send was before the socket was accepted and thus sock->sk_socket was NULL in maybe_add_creds, and the second send after the socket was accepted had a non-NULL value for sk->socket and thus we could tell the credentials were not needed so we did not bother. The unnecessary credentials on the first message cause unix_stream_recvmsg to start verifying that all messages had the same credentials before coallescing and then the coallescing failed because the second message had no credentials. Ignoring credentials when we don't care in unix_stream_recvmsg fixes a long standing pessimization which would fail to coallesce messages when reading from a unix stream socket if the senders were different even if we did not care about their credentials. I have tested this and verified that the in the LSB test case mentioned above that the messages do coallesce now, while the were failing to coallesce without this change. Reported-by:
Karel Srot <ksrot@redhat.com> Reported-by:
Ding Tianhong <dingtianhong@huawei.com> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 05 Apr, 2013 1 commit
-
-
Paul Moore authored
[ Upstream commit ded34e0f ] As reported by Jan, and others over the past few years, there is a race condition caused by unix_release setting the sock->sk pointer to NULL before properly marking the socket as dead/orphaned. This can cause a problem with the LSM hook security_unix_may_send() if there is another socket attempting to write to this partially released socket in between when sock->sk is set to NULL and it is marked as dead/orphaned. This patch fixes this by only setting sock->sk to NULL after the socket has been marked as dead; I also take the opportunity to make unix_release_sock() a void function as it only ever returned 0/success. Dave, I think this one should go on the -stable pile. Special thanks to Jan for coming up with a reproducer for this problem. Reported-by:
Jan Stancek <jan.stancek@gmail.com> Signed-off-by:
Paul Moore <pmoore@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 02 Oct, 2012 1 commit
-
-
Eric Dumazet authored
[ Upstream commit e0e3cea4 ] Pablo Neira Ayuso discovered that avahi and potentially NetworkManager accept spoofed Netlink messages because of a kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data to the receiver if the sender did not provide such data, instead of not including any such data at all or including the correct data from the peer (as it is the case with AF_UNIX). This bug was introduced in commit 16e57262 (af_unix: dont send SCM_CREDENTIALS by default) This patch forces passing credentials for netlink, as before the regression. Another fix would be to not add SCM_CREDENTIALS in netlink messages if not provided by the sender, but it might break some programs. With help from Florian Weimer & Petr Matousek This issue is designated as CVE-2012-3520 Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 23 Mar, 2012 1 commit
-
-
Hans Verkuil authored
In some cases the poll() implementation in a driver has to do different things depending on the events the caller wants to poll for. An example is when a driver needs to start a DMA engine if the caller polls for POLLIN, but doesn't want to do that if POLLIN is not requested but instead only POLLOUT or POLLPRI is requested. This is something that can happen in the video4linux subsystem among others. Unfortunately, the current epoll/poll/select implementation doesn't provide that information reliably. The poll_table_struct does have it: it has a key field with the event mask. But once a poll() call matches one or more bits of that mask any following poll() calls are passed a NULL poll_table pointer. Also, the eventpoll implementation always left the key field at ~0 instead of using the requested events mask. This was changed in eventpoll.c so the key field now contains the actual events that should be polled for as set by the caller. The solution to the NULL poll_table pointer is to set the qproc field to NULL in poll_table once poll() matches the events, not the poll_table pointer itself. That way drivers can obtain the mask through a new poll_requested_events inline. The poll_table_struct can still be NULL since some kernel code calls it internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In that case poll_requested_events() returns ~0 (i.e. all events). Very rarely drivers might want to know whether poll_wait will actually wait. If another earlier file descriptor in the set already matched the events the caller wanted to wait for, then the kernel will return from the select() call without waiting. This might be useful information in order to avoid doing expensive work. A new helper function poll_does_not_wait() is added that drivers can use to detect this situation. This is now used in sock_poll_wait() in include/net/sock.h. This was the only place in the kernel that needed this information. Drivers should no longer access any of the poll_table internals, but use the poll_requested_events() and poll_does_not_wait() access functions instead. In order to enforce that the poll_table fields are now prepended with an underscore and a comment was added warning against using them directly. This required a change in unix_dgram_poll() in unix/af_unix.c which used the key field to get the requested events. It's been replaced by a call to poll_requested_events(). For qproc it was especially important to change its name since the behavior of that field changes with this patch since this function pointer can now be NULL when that wasn't possible in the past. Any driver accessing the qproc or key fields directly will now fail to compile. Some notes regarding the correctness of this patch: the driver's poll() function is called with a 'struct poll_table_struct *wait' argument. This pointer may or may not be NULL, drivers can never rely on it being one or the other as that depends on whether or not an earlier file descriptor in the select()'s fdset matched the requested events. There are only three things a driver can do with the wait argument: 1) obtain the key field: events = wait ? wait->key : ~0; This will still work although it should be replaced with the new poll_requested_events() function (which does exactly the same). This will now even work better, since wait is no longer set to NULL unnecessarily. 2) use the qproc callback. This could be deadly since qproc can now be NULL. Renaming qproc should prevent this from happening. There are no kernel drivers that actually access this callback directly, BTW. 3) test whether wait == NULL to determine whether poll would return without waiting. This is no longer sufficient as the correct test is now wait == NULL || wait->_qproc == NULL. However, the worst that can happen here is a slight performance hit in the case where wait != NULL and wait->_qproc == NULL. In that case the driver will assume that poll_wait() will actually add the fd to the set of waiting file descriptors. Of course, poll_wait() will not do that since it tests for wait->_qproc. This will not break anything, though. There is only one place in the whole kernel where this happens (sock_poll_wait() in include/net/sock.h) and that code will be replaced by a call to poll_does_not_wait() in the next patch. Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait() actually waiting. The next file descriptor from the set might match the event mask and thus any possible waits will never happen. Signed-off-by:
Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by:
Jonathan Corbet <corbet@lwn.net> Reviewed-by:
Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by:
Hans de Goede <hdegoede@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 Mar, 2012 2 commits
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 26 Feb, 2012 1 commit
-
-
Pablo Neira Ayuso authored
Davem considers that the argument list of this interface is getting out of control. This patch tries to address this issue following his proposal: struct netlink_dump_control c = { .dump = dump, .done = done, ... }; netlink_dump_start(..., &c); Suggested by David S. Miller. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 22 Feb, 2012 1 commit
-
-
Eric Dumazet authored
Piergiorgio Beruto expressed the need to fetch size of first datagram in queue for AF_UNIX sockets and suggested a patch against SIOCINQ ioctl. I suggested instead to implement MSG_TRUNC support as a recv() input flag, as already done for RAW, UDP & NETLINK sockets. len = recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC); MSG_TRUNC asks recv() to return the real length of the packet, even when is was longer than the passed buffer. There is risk that a userland application used MSG_TRUNC by accident (since it had no effect on af_unix sockets) and this might break after this patch. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Tested-by:
Piergiorgio Beruto <piergiorgio.beruto@gmail.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 21 Feb, 2012 2 commits
-
-
Pavel Emelyanov authored
The same here -- we can protect the sk_peek_off manipulations with the unix_sk->readlock mutex. The peeking of data from a stream socket is done in the datagram style, i.e. even if there's enough room for more data in the user buffer, only the head skb's data is copied in there. This feature is preserved when peeking data from a given offset -- the data is read till the nearest skb's boundary. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
The sk_peek_off manipulations are protected with the unix_sk->readlock mutex. This mutex is enough since all we need is to syncronize setting the offset vs reading the queue head. The latter is fully covered with the mentioned lock. The recently added __skb_recv_datagram's offset is used to pick the skb to read the data from. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 30 Jan, 2012 1 commit
-
-
Eric Dumazet authored
Commit 0884d7aa (AF_UNIX: Fix poll blocking problem when reading from a stream socket) added a regression for epoll() in Edge Triggered mode (EPOLLET) Appropriate fix is to use skb_peek()/skb_unlink() instead of skb_dequeue(), and only call skb_unlink() when skb is fully consumed. This remove the need to requeue a partial skb into sk_receive_queue head and the extra sk->sk_data_ready() calls that added the regression. This is safe because once skb is given to sk_receive_queue, it is not modified by a writer, and readers are serialized by u->readlock mutex. This also reduce number of spinlock acquisition for small reads or MSG_PEEK users so should improve overall performance. Reported-by:
Nick Mathewson <nickm@freehaven.net> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Cc: Alexey Moiseytsev <himeraster@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 07 Jan, 2012 1 commit
-
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 04 Jan, 2012 1 commit
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 30 Dec, 2011 3 commits
-
-
Pavel Emelyanov authored
While it's not too late fix the recently added RQLEN diag extension to report rqlen and wqlen in the same way as TCP does. I.e. for listening sockets the ack backlog length (which is the input queue length for socket) in rqlen and the max ack backlog length in wqlen, and what the CINQ/OUTQ ioctls do for established. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Currently tcp diag reports rqlen and wqlen values similar to how the CINQ/COUTQ iotcls do. To make unix diag report these values in the same way move the respective code into helpers. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
[ Fix indentation of sock_diag*() calls. -DaveM ] Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 26 Dec, 2011 2 commits
-
-
David S. Miller authored
Otherwise we leave uninitialized kernel memory in there. Reported-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
The NLA_PUT macro should accept the actual attribute length, not the amount of elements in array :( Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 20 Dec, 2011 1 commit
-
-
Cyrill Gorcunov authored
Otherwise getting | net/unix/diag.c:312:16: error: expected declaration specifiers or ‘...’ before string constant | net/unix/diag.c:313:1: error: expected declaration specifiers or ‘...’ before string constant Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 16 Dec, 2011 10 commits
-
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
When establishing a unix connection on stream sockets the server end receives an skb with socket in its receive queue. Report who is waiting for these ends to be accepted for listening sockets via NLA. There's a lokcing issue with this -- the unix sk state lock is required to access the peer, and it is taken under the listening sk's queue lock. Strictly speaking the queue lock should be taken inside the state lock, but since in this case these two sockets are different it shouldn't lead to deadlock. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Report the peer socket inode ID as NLA. With this it's finally possible to find out the other end of an interesting unix connection. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Actually, the socket path if it's not anonymous doesn't give a clue to which file the socket is bound to. Even if the path is absolute, it can be unlinked and then new socket can be bound to it. With this NLA it's possible to check which file a particular socket is really bound to. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Report the sun_path when requested as NLA. With leading '\0' if present but without the leading AF_UNIX bits. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
The socket inode is used as a key for lookup. This is effectively the only really unique ID of a unix socket, but using this for search currently has one problem -- it is O(number of sockets) :( Does it worth fixing this lookup or inventing some other ID for unix sockets? Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Walk the unix sockets table and fill the core response structure, which includes type, state and inode. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Includes basic module_init/_exit functionality, dump/get_exact stubs and declares the basic API structures for request and response. Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Signed-off-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 26 Nov, 2011 1 commit
-
-
Alexey Moiseytsev authored
poll() call may be blocked by concurrent reading from the same stream socket. Signed-off-by:
Alexey Moiseytsev <himeraster@gmail.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 28 Sep, 2011 1 commit
-
-
Eric Dumazet authored
Since commit 7361c36c (af_unix: Allow credentials to work across user and pid namespaces) af_unix performance dropped a lot. This is because we now take a reference on pid and cred in each write(), and release them in read(), usually done from another process, eventually from another cpu. This triggers false sharing. # Events: 154K cycles # # Overhead Command Shared Object Symbol # ........ ....... .................. ......................... # 10.40% hackbench [kernel.kallsyms] [k] put_pid 8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg 7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg 6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock 4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb 4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns 4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred 2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm ...
-
- 16 Sep, 2011 1 commit
-
-
David S. Miller authored
This reverts commit 0856a304 . As requested by Eric Dumazet, it has various ref-counting problems and has introduced regressions. Eric will add a more suitable version of this performance fix. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 25 Aug, 2011 1 commit
-
-
Tim Chen authored
Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to allow credentials to work across pid namespaces for packets sent via UNIX sockets. However, the atomic reference counts on pid and credentials caused plenty of cache bouncing when there are numerous threads of the same pid sharing a UNIX socket. This patch mitigates the problem by eliminating extraneous reference counts on pid and credentials on both send and receive path of UNIX sockets. I found a 2x improvement in hackbench's threaded case. On the receive path in unix_dgram_recvmsg, currently there is an increment of reference count on pid and credentials in scm_set_cred. Then there are two decrement of the reference counts. Once in scm_recv and once when skb_free_datagram call skb->destructor function unix_destruct_scm. One pair of increment and decrement of ref count on pid and credentials can be eliminated from the receive path. Until we destroy the skb, we already set a reference when we created the skb on the send side. On the send path, there are two increments of ref count on pid and credentials, once in scm_send and once in unix_scm_to_skb. Then there is a decrement of the reference counts in scm_destroy's call to scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair of increment and decrement of the reference counts can be removed so we only need to increment the ref counts once. By incorporating these changes, for hackbench running on a 4 socket NHM-EX machine with 40 cores, the execution of hackbench on 50 groups of 20 threads sped up by factor of 2. Hackbench command used for testing: ./hackbench 50 thread 2000 Signed-off-by:
Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-