- 30 Oct, 2014 5 commits
-
-
David Matlack authored
commit 56f17dd3 upstream. The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Signed-off-by: David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Lutomirski authored
commit a1480dcc upstream. Accessing do_remount_sb should require global CAP_SYS_ADMIN, but only one of the two call sites was appropriately protected. Fixes CVE-2014-7975. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sage Weil authored
commit 42383020 upstream. We check whether transid is already committed via last_trans_committed and then search through trans_list for pending transactions. If last_trans_committed is updated by btrfs_commit_transaction after we check it (there is no locking), we will fail to find the committed transaction and return EINVAL to the caller. This has been observed occasionally by ceph-osd (which uses this ioctl heavily). Fix by rechecking whether the provided transid <= last_trans_committed after the search fails, and if so return 0. Signed-off-by: Sage Weil <sage@redhat.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josef Bacik authored
commit bbe90514 upstream. Marc Merlin sent me a broken fs image months ago where it would blow up in the upper->checked BUG_ON() in build_backref_tree. This is because we had a scenario like this block a -- level 4 (not shared) | block b -- level 3 (reloc block, shared) | block c -- level 2 (not shared) | block d -- level 1 (shared) | block e -- level 0 (shared) We go to build a backref tree for block e, we notice block d is shared and add it to the list of blocks to lookup it's backrefs for. Now when we loop around we will check edges for the block, so we will see we looked up block c last time. So we lookup block d and then see that the block that points to it is block c and we can just skip that edge since we've already been up this path. The problem is because we clear need_check when we see block d (as it is shared) we never add block b as needing to be checked. And because block c is in our path already we bail out before we walk up to block b and add it to the backref check list. To fix this we need to reset need_check if we trip over a block that doesn't need to be checked. This will make sure that any subsequent blocks in the path as we're walking up afterwards are added to the list to be processed. With this patch I can now mount Marc's fs image and it'll complete the balance without panicing. Thanks, Reported-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josef Bacik authored
commit 1d52c78a upstream. When doing log replay we may have to update inodes, which traditionally goes through our delayed inode stuff. This will try to move space over from the trans handle, but we don't reserve space in our trans handle on replay since we don't know how much we will need, so instead we try to flush. But because we have a trans handle open we won't flush anything, so if we are out of reserve space we will simply return ENOSPC. Since we know that if an operation made it into the log then we definitely had space before the box bought the farm then we don't need to worry about doing this space reservation. Use the fs_info->log_root_recovering flag to skip the delayed inode stuff and update the item directly. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 Oct, 2014 17 commits
-
-
Greg Kroah-Hartman authored
-
Andreas Bomholtz authored
commit dee80ad1 upstream. Added the Seluxit ApS USB Serial Dongle to cp210x driver. Signed-off-by: Andreas Bomholtz <andreas@seluxit.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joe Savage authored
commit bfc2d7df upstream. Added support for Ketra N1 wireless interface, which uses the Silicon Labs' CP2104 USB to UART bridge with customized PID 8946. Signed-off-by: Joe Savage <joe.savage@goketra.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lu Baolu authored
commit ddbe1fca upstream. This full-speed USB device generates spurious remote wakeup event as soon as USB_DEVICE_REMOTE_WAKEUP feature is set. As the result, Linux can't enter system suspend and S0ix power saving modes once this keyboard is used. This patch tries to introduce USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk. With this quirk set, wakeup capability will be ignored during device configure. This patch could be back-ported to kernels as old as 2.6.39. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gao feng authored
[ Upstream commit 33d99113 ] commit 25fb6ca4 "net IPv6 : Fix broken IPv6 routing table after loopback down-up" allocates addrconf router for ipv6 address when lo device up. but commit a881ae1f "ipv6:don't call addrconf_dst_alloc again when enable lo" breaks this behavior. Since the addrconf router is moved to the garbage list when lo device down, we should release this router and rellocate a new one for ipv6 address when lo device up. This patch solves bug 67951 on bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=67951 change from v1: use ip6_rt_put to repleace ip6_del_rt, thanks Hannes! change code style, suggested by Sergei. CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Per Hurtig authored
[ Upstream commit bef1909e ] Fix to a problem observed when losing a FIN segment that does not contain data. In such situations, TLP is unable to recover from *any* tail loss and instead adds at least PTO ms to the retransmission process, i.e., RTO = RTO + PTO. Signed-off-by: Per Hurtig <per.hurtig@kau.se> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlad Yasevich authored
[ Upstream commit bdf6fa52 ] Currently association restarts do not take into consideration the state of the socket. When a restart happens, the current assocation simply transitions into established state. This creates a condition where a remote system, through a the restart procedure, may create a local association that is no way reachable by user. The conditions to trigger this are as follows: 1) Remote does not acknoledge some data causing data to remain outstanding. 2) Local application calls close() on the socket. Since data is still outstanding, the association is placed in SHUTDOWN_PENDING state. However, the socket is closed. 3) The remote tries to create a new association, triggering a restart on the local system. The association moves from SHUTDOWN_PENDING to ESTABLISHED. At this point, it is no longer reachable by any socket on the local system. This patch addresses the above situation by moving the newly ESTABLISHED association into SHUTDOWN-SENT state and bundling a SHUTDOWN after the COOKIE-ACK chunk. This way, the restarted associate immidiately enters the shutdown procedure and forces the termination of the unreachable association. Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicolas Dichtel authored
[ Upstream commit 3be07244 ] In xmit path, we build a flowi6 which will be used for the output route lookup. We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the protocol should be IPPROTO_GRE. Fixes: c12b395a ("gre: Support GRE over IPv6") Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
KY Srinivasan authored
[ Upstream commit dedb845d ] After the packet is successfully sent, we should not touch the skb as it may have been freed. This patch is based on the work done by Long Li <longli@microsoft.com>. In this version of the patch I have fixed issues pointed out by David. David, please queue this up for stable. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Tested-by: Long Li <longli@microsoft.com> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlad Yasevich authored
[ Upstream commit 7d3083ee ] When receiving a vlan-tagged frame that still contains a vlan header, the length of the packet will be greater then MTU+ETH_HLEN since it will account of the extra vlan header. TG3 checks this for the case for 802.1Q, but not for 802.1ad. As a result, full sized 802.1ad frames get dropped by the card. Add a check for 802.1ad protocol when receving full sized frames. Suggested-by: Prashant Sreedharan <prashant@broadcom.com> CC: Prashant Sreedharan <prashant@broadcom.com> CC: Michael Chan <mchan@broadcom.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlad Yasevich authored
[ Upstream commit 476c1885 ] TG3 appears to have an issue performing TSO and checksum offloading correclty when the frame has been vlan encapsulated (non-accelrated). In these cases, tcp checksum is not correctly updated. This patch attempts to work around this issue. After the patch, 802.1ad vlans start working correctly over tg3 devices. CC: Prashant Sreedharan <prashant@broadcom.com> CC: Michael Chan <mchan@broadcom.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guillaume Nault authored
[ Upstream commit eed4d839 ] Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU. The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get() could return NULL if tunnel->sock->sk_dst_cache was reset just before the call, thus making dst_mtu() dereference a NULL pointer: [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0 [ 1937.664005] Oops: 0000 [#1] SMP [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core] [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G O 3.17.0-rc1 #1 [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008 [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000 [ 1937.664005] RIP: 0010:[<ffffffffa049db88>] [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP: 0018:ffff8800c43c7de8 EFLAGS: 00010282 [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5 [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000 [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000 [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000 [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009 [ 1937.664005] FS: 00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000 [ 1937.664005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0 [ 1937.664005] Stack: [ 1937.664005] ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009 [ 1937.664005] ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57 [ 1937.664005] ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000 [ 1937.664005] Call Trace: [ 1937.664005] [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp] [ 1937.664005] [<ffffffff81109b57>] ? might_fault+0x9e/0xa5 [ 1937.664005] [<ffffffff81109b0e>] ? might_fault+0x55/0xa5 [ 1937.664005] [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26 [ 1937.664005] [<ffffffff81309196>] SYSC_connect+0x87/0xb1 [ 1937.664005] [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56 [ 1937.664005] [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1 [ 1937.664005] [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 1937.664005] [<ffffffff8114c262>] ? spin_lock+0x9/0xb [ 1937.664005] [<ffffffff813092b4>] SyS_connect+0x9/0xb [ 1937.664005] [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89 [ 1937.664005] RIP [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP <ffff8800c43c7de8> [ 1937.664005] CR2: 0000000000000020 [ 1939.559375] ---[ end trace 82d44500f28f8708 ]--- Fixes: f34c4a35 ("l2tp: take PMTU from tunnel UDP socket") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Benc authored
[ Upstream commit 2ba5af42 ] When there are multiple vlan headers present in a received frame, the first one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the skb beyond the VLAN TPID may be still non-linear, including the inner TCI and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is executed, __pop_vlan_tci pulls the next vlan header into vlan_tci. This leads to two things: 1. Part of the resulting ethernet header is in the non-linear part of the skb. When eth_type_trans is called later as the result of OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci is in fact accessing random data when it reads past the TPID. 2. network_header points into the ethernet header instead of behind it. mac_len is set to a wrong value (10), too. Reported-by: Yulong Pei <ypei@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit dc808110 ] af_packet can currently overwrite kernel memory by out of bound accesses, because it assumed a [new] block can always hold one frame. This is not generally the case, even if most existing tools do it right. This patch clamps too long frames as API permits, and issue a one time error on syslog. [ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82 In this example, packet header tp_snaplen was set to 3966, and tp_len was set to 5042 (skb->len) Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.") Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Neal Cardwell authored
[ Upstream commit 4fab9071 ] Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously AF_INET6 sockets were incorrectly always using the IPv6 code path when sometimes they were handling IPv4 traffic and thus had an IPv4 dst. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications") Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shmulik Ladkani authored
[ Upstream commit bc8fc7b8 ] As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"), when looking up a tunnel, tunnel's underlying interface (t->parms.link) is verified to match incoming traffic's ingress device. However the comparison was incorrectly based on skb->dev->iflink. Instead, dev->ifindex should be used, which correctly represents the interface from which the IP stack hands the ipip6 packets. This allows setting up sit tunnels bound to vlan interfaces (otherwise incoming ipip6 traffic on the vlan interface was dropped due to ipip6_tunnel_lookup match failure). Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stanislaw Gruszka authored
[ Upstream commit 10545937 ] On IOMMU systems DMA mapping can fail, we need to check for that possibility. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 09 Oct, 2014 14 commits
-
-
Greg Kroah-Hartman authored
-
Stratos Karafotis authored
commit dfa5bb62 upstream. The ondemand governor calculates load in terms of frequency and increases it only if load_freq is greater than up_threshold multiplied by the current or average frequency. This appears to produce oscillations of frequency between min and max because, for example, a relatively small load can easily saturate minimum frequency and lead the CPU to the max. Then, it will decrease back to the min due to small load_freq. Change the calculation method of load and target frequency on the basis of the following two observations: - Load computation should not depend on the current or average measured frequency. For example, absolute load of 80% at 100MHz is not necessarily equivalent to 8% at 1000MHz in the next sampling interval. - It should be possible to increase the target frequency to any value present in the frequency table proportional to the absolute load, rather than to the max only, so that: Target frequency = C * load where we take C = policy->cpuinfo.max_freq / 100. Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait. Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an increase ~1.5% in performance. cpufreq_stats (time_in_state) shows that middle frequencies are used more, with this patch. Highest and lowest frequencies were used less by ~9%. [rjw: We have run multiple other tests on kernels with this change applied and in the vast majority of cases it turns out that the resulting performance improvement also leads to reduced consumption of energy. The change is additionally justified by the overall simplification of the code in question.] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Schwab authored
commit a857c0b9 upstream. The time spent by a CPU under a given frequency is stored in jiffies unit in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of the frequency. This is what is displayed in the following file on the right column: cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state 2301000 19835820 2300000 3172 [...] Now cpufreq converts this jiffies unit delta to clock_t before returning it to the user as in the above file. And that conversion is achieved using the API cputime64_to_clock_t(). Although it accidentally works on traditional tick based cputime accounting, where cputime_t maps directly to jiffies, it doesn't work with other types of cputime accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs or any granularity preffered by the architecture. For example we get a buggy zero delta on full dyntick configurations: cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state 2301000 0 2300000 0 [...] Fix this with using the proper jiffies_64_t to clock_t conversion. Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johannes Berg authored
commit bd8c78e7 upstream. In testmode and vendor command reply/event SKBs we use the skb cb data to store nl80211 parameters between allocation and sending. This causes the code for CONFIG_NETLINK_MMAP to get confused, because it takes ownership of the skb cb data when the SKB is handed off to netlink, and it doesn't explicitly clear it. Clear the skb cb explicitly when we're done and before it gets passed to netlink to avoid this issue. Reported-by: Assaf Azulay <assaf.azulay@intel.com> Reported-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lars Ellenberg authored
commit bbc1c5e8 upstream. Since linux kernel 3.13, kthread_run() internally uses wait_for_completion_killable(). We sometimes may use kthread_run() while we still have a signal pending, which we used to kick our threads out of potentially blocking network functions, causing kthread_run() to mistake that as a new fatal signal and fail. Fix: flush_signals() before kthread_run(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Hunter authored
commit d78c9300 upstream. timeval_to_jiffies tried to round a timeval up to an integral number of jiffies, but the logic for doing so was incorrect: intervals corresponding to exactly N jiffies would become N+1. This manifested itself particularly repeatedly stopping/starting an itimer: setitimer(ITIMER_PROF, &val, NULL); setitimer(ITIMER_PROF, NULL, &val); would add a full tick to val, _even if it was exactly representable in terms of jiffies_ (say, the result of a previous rounding.) Doing this repeatedly would cause unbounded growth in val. So fix the math. Here's what was wrong with the conversion: we essentially computed (eliding seconds) jiffies = usec * (NSEC_PER_USEC/TICK_NSEC) by using scaling arithmetic, which took the best approximation of NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC = x/(2^USEC_JIFFIE_SC), and computed: jiffies = (usec * x) >> USEC_JIFFIE_SC and rounded this calculation up in the intermediate form (since we can't necessarily exactly represent TICK_NSEC in usec.) But the scaling arithmetic is a (very slight) *over*approximation of the true value; that is, instead of dividing by (1 usec/ 1 jiffie), we effectively divided by (1 usec/1 jiffie)-epsilon (rounding down). This would normally be fine, but we want to round timeouts up, and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this would be fine if our division was exact, but dividing this by the slightly smaller factor was equivalent to adding just _over_ 1 to the final result (instead of just _under_ 1, as desired.) In particular, with HZ=1000, we consistently computed that 10000 usec was 11 jiffies; the same was true for any exact multiple of TICK_NSEC. We could possibly still round in the intermediate form, adding something less than 2^USEC_JIFFIE_SC - 1, but easier still is to convert usec->nsec, round in nanoseconds, and then convert using time*spec*_to_jiffies. This adds one constant multiplication, and is not observably slower in microbenchmarks on recent x86 hardware. Tested: the following program: int main() { struct itimerval zero = {{0, 0}, {0, 0}}; /* Initially set to 10 ms. */ struct itimerval initial = zero; initial.it_interval.tv_usec = 10000; setitimer(ITIMER_PROF, &initial, NULL); /* Save and restore several times. */ for (size_t i = 0; i < 10; ++i) { struct itimerval prev; setitimer(ITIMER_PROF, &zero, &prev); /* on old kernels, this goes up by TICK_USEC every iteration */ printf("previous value: %ld %ld %ld %ld\n", prev.it_interval.tv_sec, prev.it_interval.tv_usec, prev.it_value.tv_sec, prev.it_value.tv_usec); setitimer(ITIMER_PROF, &prev, NULL); } return 0; } Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Paul Turner <pjt@google.com> Reported-by: Aaron Jacobs <jacobsa@google.com> Signed-off-by: Andrew Hunter <ahh@google.com> [jstultz: Tweaked to apply to 3.17-rc] Signed-off-by: John Stultz <john.stultz@linaro.org> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 8e0e99ba upstream. It has come to my attention (thanks Martin) that 'discard_zeroes_data' is only a hint. Some devices in some cases don't do what it says on the label. The use of DISCARD in RAID5 depends on reads from discarded regions being predictably zero. If a write to a previously discarded region performs a read-modify-write cycle it assumes that the parity block was consistent with the data blocks. If all were zero, this would be the case. If some are and some aren't this would not be the case. This could lead to data corruption after a device failure when data needs to be reconstructed from the parity. As we cannot trust 'discard_zeroes_data', ignore it by default and so disallow DISCARD on all raid4/5/6 arrays. As many devices are trustworthy, and as there are benefits to using DISCARD, add a module parameter to over-ride this caution and cause DISCARD to work if discard_zeroes_data is set. If a site want to enable DISCARD on some arrays but not on others they should select DISCARD support at the filesystem level, and set the raid456 module parameter. raid456.devices_handle_discard_safely=Y As this is a data-safety issue, I believe this patch is suitable for -stable. DISCARD support for RAID456 was added in 3.7 Cc: Shaohua Li <shli@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Fixes: 620125f2Signed-off-by: NeilBrown <neilb@suse.de> [bwh: Backported to 3.10: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans Verkuil authored
commit 58d75f4b upstream. The recent conversion of saa7134 to vb2 unconvered a poll() bug that broke the teletext applications alevt and mtt. These applications expect that calling poll() without having called VIDIOC_STREAMON will cause poll() to return POLLERR. That did not happen in vb2. This patch fixes that behavior. It also fixes what should happen when poll() is called when STREAMON is called but no buffers have been queued. In that case poll() will also return POLLERR, but only for capture queues since output queues will always return POLLOUT anyway in that situation. This brings the vb2 behavior in line with the old videobuf behavior. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit abc40bd2 upstream. This patch reverts 1ba6e0b5 ("mm: numa: split_huge_page: transfer the NUMA type from the pmd to the pte"). If a huge page is being split due a protection change and the tail will be in a PROT_NONE vma then NUMA hinting PTEs are temporarily created in the protected VMA. VM_RW|VM_PROTNONE |-----------------| ^ split here In the specific case above, it should get fixed up by change_pte_range() but there is a window of opportunity for weirdness to happen. Similarly, if a huge page is shrunk and split during a protection update but before pmd_numa is cleared then a pte_numa can be left behind. Instead of adding complexity trying to deal with the case, this patch will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults will not be triggered which is marginal in comparison to the complexity in dealing with the corner cases during THP split. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Waiman Long authored
commit f8303c25 upstream. In __split_huge_page_map(), the check for page_mapcount(page) is invariant within the for loop. Because of the fact that the macro is implemented using atomic_read(), the redundant check cannot be optimized away by the compiler leading to unnecessary read to the page structure. This patch moves the invariant bug check out of the loop so that it will be done only once. On a 3.16-rc1 based kernel, the execution time of a microbenchmark that broke up 1000 transparent huge pages using munmap() had an execution time of 38,245us and 38,548us with and without the patch respectively. The performance gain is about 1%. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (Red Hat) authored
commit 24607f11 upstream. Commit 651e22f2 "ring-buffer: Always reset iterator to reader page" fixed one bug but in the process caused another one. The reset is to update the header page, but that fix also changed the way the cached reads were updated. The cache reads are used to test if an iterator needs to be updated or not. A ring buffer iterator, when created, disables writes to the ring buffer but does not stop other readers or consuming reads from happening. Although all readers are synchronized via a lock, they are only synchronized when in the ring buffer functions. Those functions may be called by any number of readers. The iterator continues down when its not interrupted by a consuming reader. If a consuming read occurs, the iterator starts from the beginning of the buffer. The way the iterator sees that a consuming read has happened since its last read is by checking the reader "cache". The cache holds the last counts of the read and the reader page itself. Commit 651e22f2 changed what was saved by the cache_read when the rb_iter_reset() occurred, making the iterator never match the cache. Then if the iterator calls rb_iter_reset(), it will go into an infinite loop by checking if the cache doesn't match, doing the reset and retrying, just to see that the cache still doesn't match! Which should never happen as the reset is suppose to set the cache to the current value and there's locks that keep a consuming reader from having access to the data. Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Triplett authored
commit 62b4d204 upstream. commit 03b8c7b6 ("futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test") added the HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in the middle of the options for the EXPERT menu. However, HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops placing items in the EXPERT menu, and displays the remaining several EXPERT items (starting with EPOLL) directly in the General Setup menu. Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the subsequent items display as part of the EXPERT menu again; the EMBEDDED menu now appears as the next top-level item in the General Setup menu, which makes General Setup much shorter and more usable. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
commit 6c72e350 upstream. Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit c03aa9f6 upstream. We did not implement any bound on number of indirect ICBs we follow when loading inode. Thus corrupted medium could cause kernel to go into an infinite loop, possibly causing a stack overflow. Fix the possible stack overflow by removing recursion from __udf_read_inode() and limit number of indirect ICBs we follow to avoid infinite loops. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 05 Oct, 2014 4 commits
-
-
Greg Kroah-Hartman authored
-
Oleg Nesterov authored
commit 4449a51a upstream. Aleksei hit the soft lockup during reading /proc/PID/smaps. David investigated the problem and suggested the right fix. while_each_thread() is racy and should die, this patch updates vm_is_stack(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Aleksei Besogonov <alex.besogonov@gmail.com> Tested-by: Aleksei Besogonov <alex.besogonov@gmail.com> Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Oleg Nesterov authored
commit 4d4048be upstream. find_lock_task_mm() expects it is called under rcu or tasklist lock, but it seems that at least oom_unkillable_task()->task_in_mem_cgroup() and mem_cgroup_out_of_memory()->oom_badness() can call it lockless. Perhaps we could fix the callers, but this patch simply adds rcu lock into find_lock_task_mm(). This also allows to simplify a bit one of its callers, oom_kill_process(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Sergey Dyasly <dserrg@gmail.com> Cc: Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Oleg Nesterov authored
commit ad962441 upstream. At least out_of_memory() calls has_intersects_mems_allowed() without even rcu_read_lock(), this is obviously buggy. Add the necessary rcu_read_lock(). This means that we can not simply return from the loop, we need "bool ret" and "break". While at it, swap the names of task_struct's (the argument and the local). This cleans up the code a little bit and avoids the unnecessary initialization. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Sergey Dyasly <dserrg@gmail.com> Tested-by: Sergey Dyasly <dserrg@gmail.com> Reviewed-by: Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-