- 02 May, 2012 9 commits
-
-
Jason Wang authored
When a packet is fully copied in zerocopy mode, we don't wait for DMA completion to mark the done flag, so after the packet is passed to the lower device, we need to add it to the used ring and signal the guest immediately. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Jason Wang authored
Currently, we restart tx polling unconditionally when sendmsg() fails. This causes unnecessary wakeups of vhost workers and wastes CPU when an evil userspace (guest driver) is able to hit EFAULT or EINVAL. Polling is only needed when the socket send buffer is exceeded or there is not enough memory. So fix this by restarting polling only when sendmsg() returns EAGAIN/ENOBUFS. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
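The rule described above can be sketched in plain userspace C. This is only an illustration, not the vhost code: the helper name tx_should_restart_poll() is hypothetical, and only the errno values are taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether tx polling is worth restarting
 * after sendmsg() failed. `err` is a positive errno value.
 */
static bool tx_should_restart_poll(int err)
{
	/* Only transient resource shortages can clear up on their own. */
	return err == EAGAIN || err == ENOBUFS;
}
```

Errors such as EFAULT or EINVAL come from bad input, so retrying them would only burn CPU.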
-
Jason Wang authored
When we want to disable the vhost_net backend while there is tx work pending, a NULL pointer dereference may happen when we try to dereference vq->bufs after vhost_net_set_backend() has assigned NULL to it. As suggested by Michael, fix this by checking vq->bufs instead of vhost_sock_zcopy(). Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Jason Wang authored
There are several reasons the vectors need to be validated: - Return an error when the caller provides more than UIO_MAXIOV vectors. - Linearize part of the skb when userspace provides more than MAX_SKB_FRAGS vectors. - Return an error when userspace provides vectors whose total length may exceed MAX_SKB_FRAGS * PAGE_SIZE. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
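The three checks listed above might look roughly like the following userspace sketch. It is not the macvtap code: check_vectors() is a hypothetical name, the SKETCH_* values are assumed stand-ins for the kernel constants (which depend on configuration), and the choice of -EMSGSIZE as the error code is an assumption as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Assumed values for illustration only. */
#define SKETCH_UIO_MAXIOV    1024
#define SKETCH_MAX_SKB_FRAGS 17
#define SKETCH_PAGE_SIZE     4096UL

/*
 * Hypothetical validator. Returns 0 if the vectors can be mapped as-is,
 * 1 if part of the data has to be linearized (copied) first, or a
 * negative errno value if the request must be rejected.
 */
static int check_vectors(const struct iovec *iov, size_t count)
{
	const size_t limit = SKETCH_MAX_SKB_FRAGS * SKETCH_PAGE_SIZE;
	size_t total = 0;
	size_t i;

	/* Too many vectors: reject before touching any of them. */
	if (count > SKETCH_UIO_MAXIOV)
		return -EMSGSIZE;

	for (i = 0; i < count; i++) {
		/* Compare against the remaining room to avoid overflow. */
		if (iov[i].iov_len > limit - total)
			return -EMSGSIZE;
		total += iov[i].iov_len;
	}

	/* More vectors than fragment slots: some data must be copied. */
	return count > SKETCH_MAX_SKB_FRAGS ? 1 : 0;
}
```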
-
Jason Wang authored
Currently, SKBTX_DEV_ZEROCOPY is set unconditionally after zerocopy_sg_from_iovec(). This leads to a NULL pointer dereference when macvtap fails to build the zerocopy skb, because destructor_arg was not initialized. Solve this by setting the flag only after the skb has been built successfully. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Jason Wang authored
When get_user_pages_fast() fails to get all requested pages, we cannot rely on kfree_skb() to release the pages that were pinned, as they have not been put in the skb fragments yet. So we need to call put_page() on them instead. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Jason Wang authored
As the skb fragments are pinned/built from user pages, we should account for whole pages instead of the data length in truesize. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
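To see why accounting by length underestimates memory use, consider a user buffer that straddles a page boundary. The sketch below is a userspace illustration under stated assumptions: the function names are hypothetical and SKETCH_PAGE_SIZE is an assumed 4 KiB page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size */

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
	uintptr_t first, last;

	if (len == 0)
		return 0;
	first = addr / SKETCH_PAGE_SIZE;
	last = (addr + len - 1) / SKETCH_PAGE_SIZE;
	return (size_t)(last - first + 1);
}

/* Memory actually held while the range is pinned: whole pages. */
static size_t truesize_by_pages(uintptr_t addr, size_t len)
{
	return pages_spanned(addr, len) * SKETCH_PAGE_SIZE;
}
```

A 2-byte buffer starting at offset 4095 pins two full pages, so it holds 8192 bytes of memory even though its length is 2.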
-
Jason Wang authored
This patch fixes the offset calculation when building the skb: - offset1 was used as the skb data offset, not the vector offset - reset offset to zero only when we advance to the next vector. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Michael S. Tsirkin authored
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 01 May, 2012 31 commits
-
-
Eric Dumazet authored
Add ECN (Explicit Congestion Notification) marking capability to netem:
    tc qdisc add dev eth0 root netem drop 0.5 ecn
Instead of dropping packets, try to ECN mark them. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
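The "mark instead of drop" decision can be sketched as follows. This is a simplified userspace illustration, not the netem code: loss_event() is a hypothetical helper operating on an IPv4 TOS byte, and it assumes that packets that are not ECN-capable are still dropped. The ECN codepoints follow RFC 3168 (low two bits of the TOS byte).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ECN_MASK    0x03 /* low two bits of the TOS byte */
#define ECN_NOT_ECT 0x00 /* sender does not support ECN */
#define ECN_CE      0x03 /* congestion experienced */

/*
 * Hypothetical helper, called when the emulated loss fires.
 * Returns true if the packet should be dropped, false if it was
 * CE-marked and may be forwarded. A real IPv4 implementation would
 * also have to fix up the header checksum after changing the TOS byte.
 */
static bool loss_event(uint8_t *tos, bool ecn_enabled)
{
	if (ecn_enabled && (*tos & ECN_MASK) != ECN_NOT_ECT) {
		*tos |= ECN_CE;
		return false;
	}
	return true;
}
```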
-
Eric Dumazet authored
Remove useless casts and rename variables for less confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP or UDP stacks have big enough latencies that prefetching next pointer is worth it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
James Chapman authored
The netlink API lets users create unmanaged L2TPv3 tunnels using iproute2. Until now, a request to create an unmanaged L2TPv3 IP encapsulation tunnel over IPv6 would be rejected with EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP encapsulation over IPv6, we can add support for that tunnel type. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Elston authored
L2TPv3 defines an IP encapsulation packet format where data is carried directly over IP (no UDP). The kernel already has support for L2TP IP encapsulation over IPv4 (l2tp_ip). This patch introduces support for L2TP IP encapsulation over IPv6. The implementation is derived from ipv6/raw and ipv4/l2tp_ip. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Elston authored
For implementing other protocols on top of IPv6, such as L2TPv3's IP encapsulation over ipv6, we'd like to call some IPv6 functions which are not currently exported. This patch exports them. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Elston authored
This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using the netlink API. We already support unmanaged L2TPv3 tunnels over IPv4. A patch to iproute2 to make use of this feature will be submitted separately. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Elston authored
If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the IPv6 address correctly. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
James Chapman authored
Userspace uses connect() to associate a pppol2tp socket with a tunnel socket. This needs to allow the caller to supply the new IPv6 sockaddr_pppol2tp structures if IPv6 is used. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
James Chapman authored
Checkpatch warns about the use of __attribute__((packed)). So use the recommended __packed syntax instead. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
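For readers unfamiliar with the attribute, the effect of packing can be checked in userspace. The sketch below assumes GCC or Clang; the #define is a stand-in for the kernel's __packed macro, and both struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __packed macro (GCC/Clang). */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Natural layout: the compiler may insert padding after `type`. */
struct demo_hdr_padded {
	uint8_t type;
	uint32_t session_id;
};

/* Packed layout: members are laid out back to back. */
struct demo_hdr_packed {
	uint8_t type;
	uint32_t session_id;
} __packed;
```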
-
James Chapman authored
The l2tp_ip socket currently maintains packet/byte stats in its private socket structure. But these counters aren't exposed to userspace and so serve no purpose. The counters were also smp-unsafe. So this patch just gets rid of the stats. While here, change a couple of internal __u32 variables to u32. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
James Chapman authored
Cleanup the l2tp_ip code to make use of an existing ipv4 support function. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
James Chapman authored
L2TP uses 64-bit counters but since these are not updated atomically, we need to make them safe for smp. This patch addresses that. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
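The commit message does not say which mechanism the patch uses. As a userspace analogy only, the sketch below uses C11 atomics, which is a swapped-in technique and not necessarily what the kernel patch does; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-tunnel statistics with atomically updated counters. */
struct demo_stats {
	_Atomic uint64_t tx_packets;
	_Atomic uint64_t tx_bytes;
};

/* Account for one transmitted packet; safe to call from any thread. */
static void demo_stats_tx(struct demo_stats *s, uint64_t bytes)
{
	atomic_fetch_add_explicit(&s->tx_packets, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->tx_bytes, bytes, memory_order_relaxed);
}

/* Read the byte counter without tearing. */
static uint64_t demo_stats_tx_bytes(struct demo_stats *s)
{
	return atomic_load_explicit(&s->tx_bytes, memory_order_relaxed);
}
```

The underlying problem is that a plain 64-bit increment is a read-modify-write that can lose updates between CPUs, and on 32-bit machines a plain 64-bit read can also observe a half-updated value.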
-
Huang, Xiong authored
PHY polling code for FPGA is already handled in every MDIO R/W API, so there is no need for additional code in atl1c_change_mtu. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: David Liu <dwliu@qca.qaulcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
L0S might be unstable when there is no cable link, so only enable it when the link is up. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
There may be tx skbs still pending in HW when the PHY link goes down. Resetting the MAC makes the DMA engine go back to its start point and releases all pending skbs. Note: resetting the MAC clears any interrupt status and mask. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
common_task might be running while the close routine is called, so wait for it or cancel it. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
The hardware incorrectly processes L0S/L1 entry if the chipset/root responds after a specific/shorter timer, which causes a system hang. Enlarge the timeout value to avoid this issue. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
On some platforms with EEPROM/OTP present, the BIOS could overwrite the NIC's MAC address with a new one, so the permanent MAC address should be taken from the BIOS. The address is restored when the driver is removed. Voltage raising isn't applicable for l1d. Replace swab32 with htonl for big/little-endian platforms. Related registers are refined as well. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
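The difference between the two byte-order helpers mentioned above can be shown in userspace. This is a sketch, not driver code: demo_swab32() is a hypothetical stand-in for the kernel's swab32(), and first_wire_byte() is an illustrative helper.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Unconditional byte swap: reverses the bytes on every platform. */
static uint32_t demo_swab32(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) | ((x & 0x0000ff00U) << 8) |
	       ((x & 0x00ff0000U) >> 8)  | ((x & 0xff000000U) >> 24);
}

/*
 * htonl() converts host order to big-endian: it swaps on little-endian
 * hosts and is a no-op on big-endian hosts. The first byte in memory of
 * the result is therefore always the most significant byte.
 */
static uint8_t first_wire_byte(uint32_t host_value)
{
	uint32_t be = htonl(host_value);
	uint8_t bytes[4];

	memcpy(bytes, &be, sizeof(bytes));
	return bytes[0];
}
```

An unconditional swap gives the intended layout on only one kind of host, while htonl() yields the same in-memory layout on both.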
-
Huang, Xiong authored
The Close-action is done by atl1c_reset_pcie, remove it from atl1c_get_permanent_address. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
WoL status is read-clear and should be cleared when in S0 state. Putting it in atl1c_reset_pcie is more suitable than in atl1c_get_permanent_address. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
On some platforms the PHY settings need to change depending on the cable link status to get better stability. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang, Xiong authored
All supported devices share one issue: the MSI interrupt doesn't assert if the PCI command register bit PCI_COMMAND_INTX_DISABLE is set. Add a workaround in drivers/pci/quirks.c. Signed-off-by: xiong <xiong@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
-
Eric Dumazet authored
Before doing skb->head_frag work on bnx2x driver, I found too much stuff was inlined in bnx2x/bnx2x_cmn.h for no good reason and made my work not very easy. Move some big functions out of this include file to the respective .c file. A lot of inline keywords are not needed at all in this huge driver.
    text    data  bss     dec    hex  filename
  490083    1270   56  491409  77f91  bnx2x/bnx2x.ko.before
  484206    1270   56  485532  7689c  bnx2x/bnx2x.ko
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
RongQing.Li authored
The reset logic after an Rx FIFO overrun clears the programmed multicast addresses. This patch fixes the issue by reprogramming the registers after the reset. Commit eefc48b0 ("pch_gbe: reprogram multicast address register on reset") tried to fix this problem, but it introduced unnecessary code. In fact, all multicast addresses are already saved in netdev->mc, so we can call pch_gbe_set_multi() directly after reset_hw and reset_rx. This commit removes 50+ lines of code. Cc: Richard Cochran <richardcochran@gmail.com> Cc: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
__skb_splice_bits() can check if skb to be spliced has its skb->head mapped to a page fragment, instead of a kmalloc() area. If so we can avoid a copy of the skb head and get a reference on underlying page. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP coalesce can check if the skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We had to disable coalescing in this case, for performance reasons. We 'upgrade' skb->head as a fragment in itself. This reduces the number of cache misses when the user makes its copies, since fewer sk_buffs are fetched. This makes the receive and ofo queues shorter and thus reduces cache line misses in the TCP stack. This is a followup of patch "net: allow skb->head to be a page fragment". Tested with tg3 nic, with GRO on or off. We can see the "TCPRcvCoalesce" counter being incremented. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
GRO can check if the skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We 'upgrade' skb->head as a fragment in itself. This avoids the frag_list fallback, and permits building a true GRO skb (one sk_buff and up to 16 fragments), using less memory. This reduces the number of cache misses when the user makes its copy, since a single sk_buff is fetched. This is a followup of patch "net: allow skb->head to be a page fragment". Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This patch converts the tg3 driver, one of our reference drivers, to use the new build_skb() API in frag mode. Instead of using kmalloc() to allocate the memory block that will be used by build_skb() as skb->head, we use a page fragment. This is a followup of patch "net: allow skb->head to be a page fragment". This allows GRO, TCP coalescing, and splice() to be more efficient. Incidentally, this also removes SLUB slow path contention in kfree(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
skb->head is currently allocated from kmalloc(). This is convenient but has the drawback that the data cannot be converted to a page fragment if needed. We have three spots where it hurts:
1) GRO aggregation. When a linear skb must be appended to another skb, GRO uses the frag_list fallback, which is very inefficient since we keep all struct sk_buff around. So drivers enabling GRO but delivering linear skbs to the network stack aren't enabling full GRO power.
2) splice(socket -> pipe). We must copy the linear part to a page fragment. This kind of defeats the purpose of splice() (the zero copy claim).
3) TCP coalescing. Recently introduced, this permits grouping several contiguous segments into a single skb. This shortens queue lengths, saves kernel memory, and greatly reduces the probability of TCP collapses. This coalescing doesn't work on linear skbs (or we would need to copy data, which would be too slow).
Given all these issues, the following patch introduces the possibility of having skb->head be a fragment in itself. We use a new skb flag, skb->head_frag, to carry this information. build_skb() is changed to accept a frag_size argument. Drivers willing to provide a page fragment instead of kmalloc() data will set a non-zero value, set to the fragment size. Then, in situations where we need to convert the skb head to a frag in itself, we can check if skb->head_frag is set and avoid the copies or various fallbacks we have. This means drivers currently using frags could be updated to avoid the current skb->head allocation and reduce their memory footprint (aka skb truesize). (That's 512 or 1024 bytes saved per skb.) This also makes bpf/netfilter faster since the 'first frag' will be part of the skb linear part, with no need to copy data.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
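The frag_size convention described in this commit message can be modeled with a toy structure. This is a userspace sketch only: struct toy_skb and both functions are hypothetical and do not reproduce the kernel's struct sk_buff or build_skb(). The only thing taken from the message is that a non-zero frag_size means "the head is a page fragment" and that this is recorded in a head_frag flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an skb head; not the kernel's struct sk_buff. */
struct toy_skb {
	unsigned char *head; /* start of the linear data buffer */
	size_t frag_size;    /* fragment size, or 0 for kmalloc()-style data */
	bool head_frag;      /* true if head lives in a page fragment */
};

/*
 * Toy counterpart of the convention described above: frag_size == 0
 * means the buffer came from a kmalloc()-like allocator, and a
 * non-zero frag_size means it is a page fragment of that size.
 */
static void toy_build_skb(struct toy_skb *skb, void *data, size_t frag_size)
{
	skb->head = data;
	skb->frag_size = frag_size;
	skb->head_frag = frag_size != 0;
}

/* Consumers such as GRO or splice() would check this before copying. */
static bool toy_head_can_become_frag(const struct toy_skb *skb)
{
	return skb->head_frag;
}
```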
-