- 11 May, 2012 40 commits
-
-
Steven Rostedt authored
commit db4c75cb upstream. While debugging a latency with someone on IRC (mirage335) on #linux-rt (OFTC), we discovered that the stacktrace output of the latency tracers (preemptirqsoff) was empty. This bug was caused by the creation of the dynamic length stack trace again (like commit 12b5da34 "tracing: Fix ent_size in trace output" was). This bug is caused by the latency tracers requiring the next event to determine the time between the current event and the next. But by grabbing the next event, the iter->ent_size is set to the next event instead of the current one. As the stacktrace event is the last event, this makes the ent_size zero and causes nothing to be printed for the stack trace. The dynamic stacktrace uses the ent_size to determine how much of the stack can be printed. The ent_size of zero means no stack. The simple fix is to save the iter->ent_size before finding the next event. Note, mirage335 asked to remain anonymous from LKML and git, so I will not add the Reported-by and Tested-by tags, even though he did report the issue and tested the fix. Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
he, bo authored
commit fb2cf2c6 upstream. Under extreme memory used up situations, percpu allocation might fail. We hit it when system goes to suspend-to-ram, causing a kworker panic: EIP: [<c124411a>] build_sched_domains+0x23a/0xad0 Kernel panic - not syncing: Fatal exception Pid: 3026, comm: kworker/u:3 3.0.8-137473-gf42fbef #1 Call Trace: [<c18cc4f2>] panic+0x66/0x16c [...] [<c1244c37>] partition_sched_domains+0x287/0x4b0 [<c12a77be>] cpuset_update_active_cpus+0x1fe/0x210 [<c123712d>] cpuset_cpu_inactive+0x1d/0x30 [...] With this fix applied build_sched_domains() will return -ENOMEM and the suspend attempt fails. Signed-off-by:
he, bo <bo.he@intel.com> Reviewed-by:
Zhang, Yanmin <yanmin.zhang@intel.com> Reviewed-by:
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1335355161.5892.17.camel@hebo [ So, we fail to deallocate a CPU because we cannot allocate RAM :-/ I don't like that kind of sad behavior but nevertheless it should not crash under high memory load. ] Signed-off-by:
Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: change filename] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Nicolas Ferre authored
commit ed8b0d67 upstream. This loop on EBCISR register was designed to clear IRQ sources before enabling a DMA channel. This register is clear-on-read so a race condition can appear if another channel is already active and has just finished its transfer. Removing this read on EBCISR is fixing the issue as there is no case where an IRQ could be pending: we already make sure that this register is drained at probe() time and during resume. Signed-off-by:
Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by:
Vinod Koul <vinod.koul@linux.intel.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Mark Brown authored
commit 1a38336b upstream. This ensures a clean startup of the channels, without this change some use cases could result in issues in a small proportion of cases. Signed-off-by:
Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Mark Brown authored
commit 7e1f7c8a upstream. Line widgets had not been included in either the power up or power down sequences so if a widget had an event associated with it that event would never be run. Fix this minimally by adding them to the sequences, we should probably be doing away with the specific widget types as they all have the same priority anyway. Signed-off-by:
Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Konrad Rzeszutek Wilk authored
commit cf405ae6 upstream. When we boot on a machine that can hotplug CPUs and we are using 'dom0_max_vcpus=X' on the Xen hypervisor line to clip the amount of CPUs available to the initial domain, we get this: (XEN) Command line: com1=115200,8n1 dom0_mem=8G noreboot dom0_max_vcpus=8 sync_console mce_verbosity=verbose console=com1,vga loglvl=all guest_loglvl=all .. snip.. DMI: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x032.072520111118 07/25/2011 .. snip. SMP: Allowing 64 CPUs, 32 hotplug CPUs installing Xen timer for CPU 7 cpu 7 spinlock event irq 361 NMI watchdog: disabled (cpu7): hardware events not enabled Brought up 8 CPUs .. snip.. [acpi processor finds the CPUs are not initialized and starts calling arch_register_cpu, which creates /sys/devices/system/cpu/cpu8/online] CPU 8 got hotplugged CPU 9 got hotplugged CPU 10 got hotplugged .. snip.. initcall 1_acpi_battery_init_async+0x0/0x1b returned 0 after 406 usecs calling erst_init+0x0/0x2bb @ 1 [and the scheduler sticks newly started tasks on the new CPUs, but said CPUs cannot be initialized b/c the hypervisor has limited the amount of vCPUS to 8 - as per the dom0_max_vcpus=8 flag. The spinlock tries to kick the other CPU, but the structure for that is not initialized and we crash.] BUG: unable to handle kernel paging request at fffffffffffffed8 IP: [<ffffffff81035289>] xen_spin_lock+0x29/0x60 PGD 180d067 PUD 180e067 PMD 0 Oops: 0002 [#1] SMP CPU 7 Modules linked in: Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc2upstream-00001-gf5154e8 #1 Intel Corporation S2600CP/S2600CP RIP: e030:[<ffffffff81035289>] [<ffffffff81035289>] xen_spin_lock+0x29/0x60 RSP: e02b:ffff8801fb9b3a70 EFLAGS: 00010282 With this patch, we cap the amount of vCPUS that the initial domain can run, to exactly what dom0_max_vcpus=X has specified. In the future, if there is a hypercall that will allow a running domain to expand past its initial set of vCPUS, this patch should be re-evaluated. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
David Vrabel authored
commit 7eb7ce4d upstream. In xen_restore_fl_direct(), xen_force_evtchn_callback() was being called even if no events were pending. This resulted in (depending on workload) about a 100 times as many xen_version hypercalls as necessary. Fix this by correcting the sense of the conditional jump. This seems to give a significant performance benefit for some workloads. There is some subtle tricksy "..since the check here is trying to check both pending and masked in a single cmpw, but I think this is correct. It will call check_events now only when the combined mask+pending word is 0x0001 (aka unmasked, pending)." (Ian) Acked-by:
Ian Campbell <ian.campbell@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Linus Torvalds authored
commit fcbf94b9 upstream. This reverts commit a32744d4. While that commit was technically the right thing to do, and made the x86-64 compat mode work identically to native 32-bit mode (and thus fixing the problem with a 32-bit systemd install on a 64-bit kernel), it turns out that the automount binaries had workarounds for this compat problem. Now, the workarounds are disgusting: doing an "uname()" to find out the architecture of the kernel, and then comparing it for the 64-bit cases and fixing up the size of the read() in automount for those. And they were confused: it's not actually a generic 64-bit issue at all, it's very much tied to just x86-64, which has different alignment for an 'u64' in 64-bit mode than in 32-bit mode. But the end result is that fixing the compat layer actually breaks the case of a 32-bit automount on a x86-64 kernel. There are various approaches to fix this (including just doing a "strcmp()" on current->comm and comparing it to "automount"), but I think that I will do the one that teaches pipes about a special "packet mode", which will allow user space to not have to care too deeply about the padding at the end of the autofs packet. That change will make the compat workaround unnecessary, so let's revert it first, and get automount working again in compat mode. The packetized pipes will then fix autofs for systemd. Reported-and-requested-by:
Michael Tokarev <mjt@tls.msk.ru> Cc: Ian Kent <raven@themaw.net> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Bryan O'Donoghue authored
commit cbf2829b upstream. Current APIC code assumes MSR_IA32_APICBASE is present for all systems. Pentium Classic P5 and friends didn't have this MSR. MSR_IA32_APICBASE was introduced as an architectural MSR by Intel @ P6. Code paths that can touch this MSR invalidly are when vendor == Intel && cpu-family == 5 and APIC bit is set in CPUID - or when you simply pass lapic on the kernel command line, on a P5. The below patch stops Linux incorrectly interfering with the MSR_IA32_APICBASE for P5 class machines. Other code paths exist that touch the MSR - however those paths are not currently reachable for a conformant P5. Signed-off-by:
Bryan O'Donoghue <bryan.odonoghue@linux.intel.com> Link: http://lkml.kernel.org/r/4F8EEDD3.1080404@linux.intel.comSigned-off-by:
H. Peter Anvin <hpa@linux.intel.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Andreas Herrmann authored
commit a956bd6f upstream. Loading the microcode driver on an unsupported CPU and subsequently unloading the driver causes WARNING: at fs/sysfs/group.c:138 mc_device_remove+0x5f/0x70 [microcode]() Hardware name: 01972NG sysfs group ffffffffa00013d0 not found for kobject 'cpu0' Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel btusb snd_hda_codec bluetooth thinkpad_acpi rfkill microcode(-) [last unloaded: cfg80211] Pid: 4560, comm: modprobe Not tainted 3.4.0-rc2-00002-g258f7426 #5 Call Trace: [<ffffffff8103113b>] ? warn_slowpath_common+0x7b/0xc0 [<ffffffff81031235>] ? warn_slowpath_fmt+0x45/0x50 [<ffffffff81120e74>] ? sysfs_remove_group+0x34/0x120 [<ffffffffa00000ef>] ? mc_device_remove+0x5f/0x70 [microcode] [<ffffffff81331eb9>] ? subsys_interface_unregister+0x69/0xa0 [<ffffffff81563526>] ? mutex_lock+0x16/0x40 [<ffffffffa0000c3e>] ? microcode_exit+0x50/0x92 [microcode] [<ffffffff8107051d>] ? sys_delete_module+0x16d/0x260 [<ffffffff810a0065>] ? wait_iff_congested+0x45/0x110 [<ffffffff815656af>] ? page_fault+0x1f/0x30 [<ffffffff81565ba2>] ? system_call_fastpath+0x16/0x1b on recent kernels. This is due to commit 8a25a2fd ("cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem") which renders commit 6c53cbfc ("x86, microcode: Correct sysdev_add error path") useless. See http://marc.info/?l=linux-kernel&m=133416246406478 Avoid above warning by restoring the old driver behaviour before 6c53cbfc ("x86, microcode: Correct sysdev_add error path"). Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Signed-off-by:
Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20120411163849.GE4794@alberich.amd.comSigned-off-by:
Borislav Petkov <borislav.petkov@amd.com> [bwh: Backported to 3.2: deleted line uses sys_dev, not dev] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Fred Isaman authored
commit 8ccd271f upstream. Signed-off-by:
Fred Isaman <iisaman@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Fred Isaman authored
commit 73fb7bc7 upstream. Signed-off-by:
Fred Isaman <iisaman@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Trond Myklebust authored
commit 55725513 upstream. Since we may be simulating flock() locks using NFS byte range locks, we can't rely on the VFS having checked the file open mode for us. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Trond Myklebust authored
commit 05ffe24f upstream. All callers of nfs4_handle_exception() that need to handle NFS4ERR_OPENMODE correctly should set exception->inode Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit 98a2139f upstream. When hostname contains colon (e.g. when it is an IPv6 address) it needs to be enclosed in brackets to make parsing of NFS device string possible. Fix nfs_do_root_mount() to enclose hostname properly when needed. NFS code actually does not need this as it does not parse the string passed by nfs_do_root_mount() but the device string is exposed to userspace in /proc/mounts. CC: Josh Boyer <jwboyer@redhat.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Neal Cardwell authored
[ Upstream commit d135c522 ] Commit f5fff5dc forgot to fix TCP_MAXSEG behavior IPv6 sockets, so IPv6 TCP server sockets that used TCP_MAXSEG would find that the advmss of child sockets would be incorrect. This commit mirrors the advmss logic from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this logic should probably be shared between IPv4 and IPv6, but this at least fixes this issue. Signed-off-by:
Neal Cardwell <ncardwell@google.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric W. Biederman authored
[ Upstream commit 3adadc08 ] While reviewing the sysctl code in ax25 I spotted races in ax25_exit where it is possible to receive notifications and packets after already freeing up some of the data structures needed to process those notifications and updates. Call unregister_netdevice_notifier early so that the rest of the cleanup code does not need to deal with network devices. This takes advantage of my recent enhancement to unregister_netdevice_notifier to send unregister notifications of all network devices that are current registered. Move the unregistration for packet types, socket types and protocol types before we cleanup any of the ax25 data structures to remove the possibilities of other races. Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Dan Carpenter authored
[ Upstream commit 716af4ab ] MAX_ADDR_LEN is 32. ETH_ALEN is 6. mac->sa_data is a 14 byte array, so the memcpy() is doing a read past the end of the array. I asked about this on netdev and Ben Hutchings told me it's supposed to be copying ETH_ALEN bytes (thanks Ben). Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Julian Anastasov authored
[ Upstream commit b922934d ] ops_init should free the net_generic data on init failure and __register_pernet_operations should not call ops_free when NET_NS is not enabled. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Reviewed-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 4d846f02 ] tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing sender to increase its window. tcp_grow_window() still assumes a tcp frame is under MSS, but its no longer true with LRO/GRO. This patch fixes one of the performance issue we noticed with GRO on. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Hiroaki SHIMODA authored
commit 890fdf2a upstream. In register_netdevice(), when ndo_init() is successful and later some error occurred, ndo_uninit() will be called. So dummy deivce is desirable to implement ndo_uninit() method to free percpu stats for this case. And, ndo_uninit() is also called along with dev->destructor() when device is unregistered, so in order to prevent dev->dstats from being freed twice, dev->destructor is modified to free_netdev(). Signed-off-by:
Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Stephane Fillod authored
[ Upstream commit a99ff7d0 ] Make smsc75xx recalculate the hard_mtu after adjusting the hard_header_len. Without this, usbnet adjusts the MTU down to 1492 bytes, and the host is unable to receive standard 1500-byte frames from the device. Inspired by same fix on cdc_eem 78fb72f7. Tested on ARM/Omap3 with EVB-LAN7500-LC. Signed-off-by:
Stephane Fillod <fillods@users.sf.net> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
David Ward authored
[ Upstream commit 244b65db ] A parameter set exists for WRED mode, called wred_set, to hold the same values for qavg and qidlestart across all VQs. The WRED mode values had been previously held in the VQ for the default DP. After these values were moved to wred_set, the VQ for the default DP was no longer created automatically (so that it could be omitted on purpose, to have packets in the default DP enqueued directly to the device without using RED). However, gred_dump() was overlooked during that change; in WRED mode it still reads qavg/qidlestart from the VQ for the default DP, which might not even exist. As a result, this command sequence will cause an oops: tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \ DPs 3 default 2 grio tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS This fixes gred_dump() in WRED mode to use the values held in wred_set. Signed-off-by:
David Ward <david.ward@ll.mit.edu> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Davide Ciminaghi authored
[ Upstream commit 8a9a0ea6 ] At the beginning of ks_rcv(), a for loop retrieves the header information relevant to all the frames stored in the mac's internal buffers. The number of pending frames is stored as an 8 bits field in KS_RXFCTR. If interrupts are disabled long enough to allow for more than 32 frames to accumulate in the MAC's internal buffers, a buffer overflow occurs. This patch fixes the problem by making the driver's frame_head_info buffer big enough. Well actually, since the chip appears to have 12K of internal rx buffers and the shortest ethernet frame should be 64 bytes long, maybe the limit could be set to 12*1024/64 = 192 frames, but 255 should be safer. Signed-off-by:
Davide Ciminaghi <ciminaghi@gnudd.com> Signed-off-by:
Raffaele Recalcati <raffaele.recalcati@bticino.it> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Will Deacon authored
[ Upstream commit 3c5e979b ] The SMSC911x driver resets the ->head, ->data and ->tail pointers in the skb on the reset path in order to avoid buffer overflow due to packet padding performed by the hardware. This patch fixes the receive path so that the skb pointers are fixed up after the data has been read from the device, The error path is also fixed to use number of words consistently and prevent erroneous FIFO fastforwarding when skipping over bad data. Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Jason Wang authored
[ Upstream commit a8c9cb10 ] We set intr mask before its handler is registered, this does not work well when 8139cp is sharing irq line with other devices. As the irq could be enabled by the device before 8139cp's hander is registered which may lead unhandled irq. Fix this by introducing an helper cp_irq_enable() and call it after request_irq(). Signed-off-by:
Jason Wang <jasowang@redhat.com> Reviewed-by:
Flavio Leitner <fbl@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Tony Zelenoff authored
[ Upstream commit 03662e41 ] Problem: There was two separate work_struct structures which share one handler. Unfortunately getting atl1_adapter structure from work_struct in case of DMA error was done from incorrect offset which cause kernel panics. Solution: The useless work_struct for DMA error removed and handler name changed to more generic one. Signed-off-by:
Tony Zelenoff <antonz@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ This combines upstream commit a21d4572 and the follow-on bug fix commit 22b4a4f2 ] Marc Merlin reported many order-1 allocations failures in TX path on its wireless setup, that dont make any sense with MTU=1500 network, and non SG capable hardware. After investigation, it turns out TCP uses sk_stream_alloc_skb() and used as a convention skb_tailroom(skb) to know how many bytes of data payload could be put in this skb (for non SG capable devices) Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER + sizeof(struct skb_shared_info) being above 2048) Later, mac80211 layer need to add some bytes at the tail of skb (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is available has to call pskb_expand_head() and request order-1 allocations. This patch changes sk_stream_alloc_skb() so that only sk->sk_prot->max_header bytes of headroom are reserved, and use a new skb field, avail_size to hold the data payload limit. This way, order-0 allocations done by TCP stack can leave more than 2 KB of tailroom and no more allocation is performed in mac80211 layer (or any layer needing some tailroom) avail_size is unioned with mark/dropcount, since mark will be set later in IP stack for output packets. Therefore, skb size is unchanged. Reported-by:
Marc MERLIN <marc@merlins.org> Tested-by:
Marc MERLIN <marc@merlins.org> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> [bwh: Correct commit hash for follow-on bug fix] Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 4fa48bf3 ] commit f07d960d (tcp: avoid frag allocation for small frames) breaked assumption in tcp stack that skb is either linear (skb->data_len == 0), or fully fragged (skb->data_len == skb->len) tcp_trim_head() made this assumption, we must fix it. Thanks to Vijay for providing a very detailed explanation. Reported-by:
Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 87151b86 ] Marc Merlin reported many order-1 allocations failures in TX path on its wireless setup, that dont make any sense with MTU=1500 network, and non SG capable hardware. Turns out part of the problem comes from pskb_expand_head() not using ksize() to get exact head size given by kmalloc(). Doing the same thing than __alloc_skb() allows more tailroom in skb and can prevent future reallocations. As a bonus, struct skb_shared_info becomes cache line aligned. Reported-by:
Marc MERLIN <marc@merlins.org> Tested-by:
Marc MERLIN <marc@merlins.org> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Neal Cardwell authored
[ Upstream commit 18a223e0 ] Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and unscaled RTT samples. The intent in the code was to only use the 'm' measurement if it was a new minimum. However, since 'm' had not yet been shifted left 3 bits but 'new_sample' had, this comparison would nearly always succeed, leading us to erroneously set our receive-side RTT estimate to the 'm' sample when that sample could be nearly 8x too high to use. The overall effect is to often cause the receive-side RTT estimate to be significantly too large (up to 40% too large for brief periods in my tests). Signed-off-by:
Neal Cardwell <ncardwell@google.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 110c4330 ] As soon as an skb is queued into socket error queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 4a7e7c2a ] As soon as an skb is queued into socket receive_queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Phil Sutter authored
[ Upstream commit 4eee6a3a ] This happened on a machine with a custom hotplug script calling nameif, probably due to slow firmware loading. At the time nameif uses ethtool to gather interface information, i2400m->fw_name is zero and so a null pointer dereference occurs from within i2400m_get_drvinfo(). Signed-off-by:
Phil Sutter <phil.sutter@viprinet.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Veaceslav Falico authored
[ Upstream commit 5a430974 ] When a slave comes up, we're unsetting the current_arp_slave without removing active flags from it, which can lead to situations where we have more than one slave with active flags in active-backup mode. To avoid this situation we must remove the active flags from a slave before removing it as a current_arp_slave. Signed-off-by:
Veaceslav Falico <vfalico@redhat.com> Signed-off-by:
Jay Vosburgh <fubar@us.ibm.com> Signed-off-by:
Andy Gospodarek <andy@greyhouse.net> Signed-off-by:
Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Sasha Levin authored
[ Upstream commit bcf1b70a ] A phonet packet is limited to USHRT_MAX bytes, this is never checked during tx which means that the user can specify any size he wishes, and the kernel will attempt to allocate that size. In the good case, it'll lead to the following warning, but it may also cause the kernel to kick in the OOM and kill a random task on the server. [ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730() [ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46 [ 8921.756672] Call Trace: [ 8921.758185] [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0 [ 8921.762868] [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20 [ 8921.765399] [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730 [ 8921.769226] [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20 [ 8921.771686] [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660 [ 8921.773919] [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240 [ 8921.776248] [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0 [ 8921.778294] [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0 [ 8921.780847] [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260 [ 8921.783179] [<ffffffff821b3c65>] __alloc_skb+0x75/0x170 [ 8921.784971] [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260 [ 8921.787111] [<ffffffff821b002e>] ? release_sock+0x7e/0x90 [ 8921.788973] [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20 [ 8921.791052] [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380 [ 8921.792931] [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180 [ 8921.794917] [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90 [ 8921.797053] [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70 [ 8921.798992] [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0 [ 8921.801395] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 [ 8921.803501] [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0 [ 8921.805505] [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140 [ 8921.807860] [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110 [ 8921.809986] [<ffffffff811958e7>] ? might_fault+0x97/0xa0 [ 8921.811998] [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90 [ 8921.814595] [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0 [ 8921.816702] [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200 [ 8921.818819] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50 [ 8921.820863] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 [ 8921.823318] [<ffffffff811e1926>] vfs_writev+0x46/0x60 [ 8921.825219] [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0 [ 8921.827127] [<ffffffff82658039>] system_call_fastpath+0x16/0x1b [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]--- Signed-off-by:
Sasha Levin <levinsasha928@gmail.com> Acked-by:
Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
RongQing.Li authored
[ Upstream commit 78d50217 ] Convert array index from the loop bound to the loop index. And remove the void type conversion to ip6_mc_del1_src() return code, seem it is unnecessary, since ip6_mc_del1_src() does not use __must_check similar attribute, no compiler will report the warning when it is removed. v2: enrich the commit header Signed-off-by:
RongQing.Li <roy.qing.li@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Herbert Xu authored
[ Upstream commit 996304bb ] As it stands the bridge IGMP snooping system will respond to group leave messages with queries for remaining membership. This is both unnecessary and undesirable. First of all any multicast routers present should be doing this rather than us. What's more the queries that we send may end up upsetting other multicast snooping swithces in the system that are buggy. In fact, we can simply remove the code that send these queries because the existing membership expiry mechanism doesn't rely on them anyway. So this patch simply removes all code associated with group queries in response to group leave messages. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Thomas Graf authored
[ Upstream commit acdd5985 ] getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns an error if the user provides less bytes than the size of struct sctp_event_subscribe. Struct sctp_event_subscribe needs to be extended by an u8 for every new event or notification type that is added. This obviously makes getsockopt fail for binaries that are compiled against an older versions of <net/sctp/user.h> which do not contain all event types. This patch changes getsockopt behaviour to no longer return an error if not enough bytes are being provided by the user. Instead, it returns as much of sctp_event_subscribe as fits into the provided buffer. This leads to the new behavior that users see what they have been aware of at compile time. The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this. Signed-off-by:
Thomas Graf <tgraf@suug.ch> Acked-by:
Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ This combines upstream commit 2f533844 and the follow-on bug fix commit 35f9c09f ] vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by:
Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk>
-