- 21 Apr, 2012 40 commits
-
-
Eric Dumazet authored
iov of more than 8 entries are allocated in sendmsg()/recvmsg() through sock_kmalloc() As these allocations are temporary only and small enough, it makes sense to use plain kmalloc() and avoid sk_omem_alloc atomic overhead. Slightly changed fast path to be even faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mike Waychison <mikew@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
There are options, which are set up on a socket while performing TCP handshake. Need to resurrect them on a socket while repairing. A new sockoption accepts a buffer and parses it. The buffer should be CODE:VALUE sequence of bytes, where CODE is standard option code and VALUE is the respective value. Only 4 options should be handled on repaired socket. To read 3 out of 4 of these options the TCP_INFO sockoption can be used. An ability to get the last one (the mss_clamp) was added by the previous patch. Now the restore. Three of these options -- timestamp_ok, mss_clamp and snd_wscale -- are just restored on a coket. The sack_ok flags has 2 issues. First, whether or not to do sacks at all. This flag is just read and set back. No other sack info is saved or restored, since according to the standart and the code dropping all sack-ed segments is OK, the sender will resubmit them again, so after the repair we will probably experience a pause in connection. Next, the fack bit. It's just set back on a socket if the respective sysctl is set. No collected stats about packets flow is preserved. As far as I see (plz, correct me if I'm wrong) the fack-based congestion algorithm survives dropping all of the stats and repairs itself eventually, probably losing the performance for that period. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
The mss_clamp is the only connection-time negotiated option which cannot be obtained from the user space. Make the TCP_MAXSEG sockopt report one in the repair mode. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Reading queues under repair mode is done with recvmsg call. The queue-under-repair set by TCP_REPAIR_QUEUE option is used to determine which queue should be read. Thus both send and receive queue can be read with this. Caller must pass the MSG_PEEK flag. Writing to queues is done with sendmsg call and yet again -- the repair-queue option can be used to push data into the receive queue. When putting an skb into receive queue a zero tcp header is appented to its head to address the tcp_hdr(skb)->syn and the ->fin checks by the (after repair) tcp_recvmsg. These flags flags are both set to zero and that's why. The fin cannot be met in the queue while reading the source socket, since the repair only works for closed/established sockets and queueing fin packet always changes its state. The syn in the queue denotes that the respective skb's seq is "off-by-one" as compared to the actual payload lenght. Thus, at the rcv queue refill we can just drop this flag and set the skb's sequences to precice values. When the repair mode is turned off, the write queue seqs are updated so that the whole queue is considered to be 'already sent, waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the protocol POV the send queue looks like it was sent, but the data between the write_seq and snd_nxt is lost in the network. This helps to avoid another sockoption for setting the snd_nxt sequence. Leaving the whole queue in a 'not yet sent' state (as it will be after sendmsg-s) will not allow to receive any acks from the peer since the ack_seq will be after the snd_nxt. Thus even the ack for the window probe will be dropped and the connection will be 'locked' with the zero peer window. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
This includes (according the the previous description): * TCP_REPAIR sockoption This one just puts the socket in/out of the repair mode. Allowed for CAP_NET_ADMIN and for closed/establised sockets only. When repair mode is turned off and the socket happens to be in the established state the window probe is sent to the peer to 'unlock' the connection. * TCP_REPAIR_QUEUE sockoption This one sets the queue which we're about to repair. The 'no-queue' is set by default. * TCP_QUEUE_SEQ socoption Sets the write_seq/rcv_nxt of a selected repaired queue. Allowed for TCP_CLOSE-d sockets only. When the socket changes its state the other seq-s are changed by the kernel according to the protocol rules (most of the existing code is actually reused). * Ability to forcibly bind a socket to a port The sk->sk_reuse is set to SK_FORCE_REUSE. * Immediate connect modification The connect syscall initializes the connection, then directly jumps to the code which finalizes it. * Silent close modification The close just aborts the connection (similar to SO_LINGER with 0 time) but without sending any FIN/RST-s to peer. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
This is just the preparation patch, which makes the needed for TCP repair code ready for use. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Emelyanov authored
Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The two options are separate, and some platforms (e.g. arm pxa) have ISA slots but no ISA dma controller, so they cannot build drivers using the DMA API functions. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
Some architectures like ARM cannot handle large numbers as arguments to udelay, so the drivers should use mdelay when delaying for multiple miliseconds. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
No driver should spin the CPU for 10ms, so better use an msleep, which is allowed in the ->suspend function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The ax88796 driver uses the CRC32 functions, so make sure that they are actually enabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The iwmc3200 driver selects other code in Kconfig that depends on EXPERIMENTAL. Kconfig warns about this when CONFIG_EXPERIMENTAL is not already set, so logically, these options should also be marked experimental or promoted to stable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The caif_shmcore requires io.h in order to use ioremap, so include that explicitly to compile in all configurations. Also add a note about the use of ioremap(), which is not a proper way to map a DMA buffer into kernel space. It's not completely clear what the intention is for using ioremap, but it is clear that the result of ioremap must not simply be accessed using kernel pointers but should use readl/writel or memcopy_{to,from}io. Assigning the result of ioremap to a regular pointer that can also be set to something else is not ok. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
Drivers that refer to a __devexit function in an operations structure need to annotate that pointer with __devexit_p so replace it with a NULL pointer when the section gets discarded. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The davinci_emac driver can be a module, so the symbols it needs from the cpdma driver must be exported. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
The time stamping code in this driver appears to have been copied from the ixp4xx_eth.c driver, including this timing comment. I had actually measured the time stamp delay on an IXP425, but I really doubt that this value also applies here. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
This patch fixes code which needlessly ran the BPF twice per packet. Instead, we just run the classifier once and test whether the packet is any kind of PTP event message. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch fixes the driver so that multicast PTP event messages can be recognized by the hardware time stamping unit. The station address register must be set according to the desired transport type. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
We will let the pch_gbe code do that according to the receive time stamp filter. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch clears up a few coding style issues: - Makes two function definitions a bit nicer looking. - Remove unneeded parentheses. - Simplify macros for register bits. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
The code in phc_gbe_main will need to call this method in order to set the station address register according to the receive time stamping filter. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
The reset logic after a Rx FIFO overrun will clear the programmed multicast addresses. This patch fixes the issue by reprogramming the registers after the reset. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch makes logic surrounding the test of the transmit time stamping flag more readable. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Takahiro Shimizu authored
This patch fixes the helper functions that give the transmit and receive time stamps to return nanoseconds, instead of arbitrary clock ticks. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
All of the users have been converted to use registera_net_sysctl so we no longer need register_net_sysctl. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
We don't use struct ctl_path anymore so delete the exported constants. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
This results in code with less boiler plate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
There isn't much advantage here except that strings paths are a bit easier to read, and converting everything to them allows me to kill off ctl_path. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
The sysctl core no longer natively understands sysctl tables with .child entries. Split the ipv6_table to remove the .child entries. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
The sysctl core no longer natively understands sysctl tables with .child entries. Kill the intermediate tables and use register_net_sysctl directly to remove the need for compatibility code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Don't register/unregister every ax25 table in a batch. Instead register and unregister per device ax25 sysctls as ax25 devices come and go. This moves ax25 to be a completely modern sysctl user. Registering the sysctls in just the initial network namespace, removing the use of .child entries that are no longer natively supported by the sysctl core and taking advantage of the fact that there are no longer any ordering constraints between registering and unregistering different sysctl tables. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
sysctl no longer requires explicit creation of directories. The neigh directory is always populated with at least a default entry so this won't cause any user visible changes. Delete the ipv4_path and the ipv4_skeleton these are no longer needed. Directly register the ipv4_route_table. And since I am an idiot remove the header definitions that I should have removed in the previous patch. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
sysctl no longer requires explicit creation of directories. The neigh directory is always populated with at least a default entry so this should cause no user visible changes. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
On the next line we register the net_core_table in net/core which creates the directory and ensures it exists. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
register_sysctl_rotable never caught on as an interesting way to register sysctls. My take on the situation is that what we want are sysctls that we can only see in the initial network namespace. What we have implemented with register_sysctl_rotable are sysctls that we can see in all of the network namespaces and can only change in the initial network namespace. That is a very silly way to go. Just register the network sysctls in the initial network namespace and we don't have any weird special cases to deal with. The sysctls affected are: /proc/sys/net/ipv4/ipfrag_secret_interval /proc/sys/net/ipv4/ipfrag_max_dist /proc/sys/net/ipv6/ip6frag_secret_interval /proc/sys/net/ipv6/mld_max_msf I really don't expect anyone will miss them if they can't read them in a child user namespace. CC: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-