- 05 Oct, 2014 40 commits
-
-
Rafael J. Wysocki authored
commit 43e8317b upstream. Use the observation that, for platform-dependent sleep states (PM_SUSPEND_STANDBY, PM_SUSPEND_MEM), a given state is either always supported or always unsupported and store that information in pm_states[] instead of calling valid_state() every time we need to check it. Also do not use valid_state() for PM_SUSPEND_FREEZE, which is always valid, and move the pm_test_level validity check for PM_SUSPEND_FREEZE directly into enter_state(). Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Brian Norris <computersforpeace@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rafael J. Wysocki authored
commit 27ddcc65 upstream. To allow sleep states corresponding to the "mem", "standby" and "freeze" lables to be different from the pm_states[] indexes of those strings, introduce struct pm_sleep_state, consisting of a string label and a state number, and turn pm_states[] into an array of objects of that type. This modification should not lead to any functional changes. Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Brian Norris <computersforpeace@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Julian Anastasov authored
commit eb90b0c7 upstream. commit fc604767 ("ipvs: changes for local real server") from 2.6.37 introduced DNAT support to local real server but the IPv6 LOCAL_OUT handler ip_vs_local_reply6() is registered incorrectly as IPv4 hook causing any outgoing IPv4 traffic to be dropped depending on the IP header values. Chris tracked down the problem to CONFIG_IP_VS_IPV6=y Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768Reported-by:
Chris J Arges <chris.j.arges@canonical.com> Tested-by:
Chris J Arges <chris.j.arges@canonical.com> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
commit caa8ad94 upstream. There's actually no good reason why we cannot use cgroup id 0, so lets just remove this artificial barrier. Reported-by:
Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Tested-by:
Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Gartrell authored
commit 76f084bc upstream. Previously, only the four high bits of the tclass were maintained in the ipv6 case. This matches the behavior of ipv4, though whether or not we should reflect ECN bits may be up for debate. Signed-off-by:
Alex Gartrell <agartrell@fb.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
commit 7bd8490e upstream. xt_hashlimit cannot be used with large hash tables, because garbage collector is run from a timer. If table is really big, its possible to hold cpu for more than 500 msec, which is unacceptable. Switch to a work queue, and use proper scheduling points to remove latencies spikes. Later, we also could switch to a smoother garbage collection done at lookup time, one bucket at a time... Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Julian Anastasov authored
commit 2627b7e1 upstream. commit 8f4e0a18 ("IPVS netns exit causes crash in conntrack") added second ip_vs_conn_drop_conntrack call instead of just adding the needed check. As result, the first call still can cause crash on netns exit. Remove it. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit f0cc9a05 upstream. r1_bio->start_next_window is not initialised in the READ case, so allow_barrier may incorrectly decrement conf->current_window_requests which can cause raise_barrier() to block forever. Fixes: 79ef3a8aReported-by:
Brassow Jonathan <jbrassow@redhat.com> Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit b8cb6b4c upstream. If a devices is being recovered it is not InSync and is not Faulty. If a read error is experienced on that device, fix_read_error() will be called, but it ignores non-InSync devices. So it will neither fix the error nor fail the device. It is incorrect that fix_read_error() ignores non-InSync devices. It should only ignore Faulty devices. So fix it. This became a bug when we allowed reading from a device that was being recovered. It is suitable for any subsequent -stable kernel. Fixes: da8840a7Reported-by:
Alexander Lyakas <alex.bolshoy@gmail.com> Tested-by:
Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 34e97f17 upstream. Both normal IO and resync IO can be retried with reschedule_retry() and so be counted into ->nr_queued, but only normal IO gets counted in ->nr_pending. Before the recent improvement to RAID1 resync there could only possibly have been one or the other on the queue. When handling a read failure it could only be normal IO. So when handle_read_error() called freeze_array() the fact that freeze_array only compares ->nr_queued against ->nr_pending was safe. But now that these two types can interleave, we can have both normal and resync IO requests queued, so we need to count them both in nr_pending. This error can lead to freeze_array() hanging if there is a read error, so it is suitable for -stable. Fixes: 79ef3a8aReported-by:
Brassow Jonathan <jbrassow@redhat.com> Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit c2fd4c94 upstream. raise_barrier() uses next_resync as part of its calculations, so it really should be updated first, instead of afterwards. next_resync is always used under resync_lock so update it under resync lock to, just before it is used. That is safest. This could cause normal IO and resync IO to interact badly so it suitable for -stable. Fixes: 79ef3a8aSigned-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 23554960 upstream. next_resync is (approximately) the location for the next resync request. However it does *not* reliably determine the earliest location at which resync might be happening. This is because resync requests can complete out of order, and we only limit the number of current requests, not the distance from the earliest pending request to the latest. mddev->curr_resync_completed is a reliable indicator of the earliest position at which resync could be happening. It is updated less frequently, but is actually reliable which is more important. So use it to determine if a write request is before the region being resynced and so safe from conflict. This error can allow resync IO to interfere with normal IO which could lead to data corruption. Hence: stable. Fixes: 79ef3a8aSigned-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 2f73d3c5 upstream. The resync/recovery process for raid1 was recently changed so that writes could happen in parallel with resync providing they were in different regions of the device. There is a problem though: While a write request will always wait for conflicting resync to complete, a resync request will *not* always wait for conflicting writes to complete. Two changes are needed to fix this: 1/ raise_barrier (which waits until it is safe to do resync) must wait until current_window_requests is zero 2/ wait_battier (which waits at the start of a new write request) must update current_window_requests if the request could possible conflict with a concurrent resync. As concurrent writes and resync can lead to data loss, this patch is suitable for -stable. Fixes: 79ef3a8a Cc: majianpeng <majianpeng@gmail.com> Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit c6d119cf upstream. commit 79ef3a8a made it possible for reads to happen concurrently with resync. This means that we need to be more careful where read_balancing is allowed during resync - we can no longer be sure that any resync that has already started will definitely finish. So keep read_balancing to before recovery_cp, which is conservative but safe. This bug makes it possible to read from a device that doesn't have up-to-date data, so it can cause data corruption. So it is suitable for any kernel since 3.11. Fixes: 79ef3a8aSigned-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
commit 669cc7ba upstream. If there are outstanding writes when close_sync is called, the change to ->start_next_window might cause them to decrement the wrong counter when they complete. Fix this by merging the two counters into the one that will be decremented. Having an incorrect value in a counter can cause raise_barrier() to hangs, so this is suitable for -stable. Fixes: 79ef3a8aSigned-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans Verkuil authored
commit 77639ff2 upstream. The log_status function should show HDMI information, but the test checking for an HDMI input was inverted. Fix this. Signed-off-by:
Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by:
Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans Verkuil authored
commit 6a03dc92 upstream. This was caused by an uninitialized setup.config field. Based on a suggestion from Devin Heitmueller. Signed-off-by:
Hans Verkuil <hans.verkuil@cisco.com> Thanks-to: Devin Heitmueller <dheitmueller@kernellabs.com> Reported-by:
Scott Robinson <scott.robinson55@gmail.com> Tested-by:
Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Malcolm Priestley authored
commit a04646c0 upstream. add the following IDs USB_PID_PCTV_78E (0x025a) for PCTV 78e USB_PID_PCTV_79E (0x0262) for PCTV 79e For these it9135 devices. Signed-off-by:
Malcolm Priestley <tvboxspy@gmail.com> Cc: Antti Palosaari <crope@iki.fi> Signed-off-by:
Antti Palosaari <crope@iki.fi> Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Prarit Bhargava authored
commit 7106e02b upstream. While debugging a cpufreq-related hardware failure on a system I saw the following lockdep warning: ========================= [ BUG: held lock freed! ] 3.17.0-rc4+ #1 Tainted: G E ------------------------- insmod/2247 is freeing memory ffff88006e1b1400-ffff88006e1b17ff, with a lock still held there! (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80 3 locks held by insmod/2247: #0: (subsys mutex#5){+.+.+.}, at: [<ffffffff81485579>] subsys_interface_register+0x69/0x120 #1: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff8156cf73>] __cpufreq_add_dev.isra.21+0x73/0xb80 #2: (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80 stack backtrace: CPU: 0 PID: 2247 Comm: insmod Tainted: G E 3.17.0-rc4+ #1 Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 08/24/2013 0000000000000000 000000008f3063c4 ffff88006f87bb30 ffffffff8171b358 ffff88006bcf3750 ffff88006f87bb68 ffffffff810e09e1 ffff88006e1b1400 ffffea0001b86c00 ffffffff8156d327 ffff880073003500 0000000000000246 Call Trace: [<ffffffff8171b358>] dump_stack+0x4d/0x66 [<ffffffff810e09e1>] debug_check_no_locks_freed+0x171/0x180 [<ffffffff8156d327>] ? __cpufreq_add_dev.isra.21+0x427/0xb80 [<ffffffff8121412b>] kfree+0xab/0x2b0 [<ffffffff8156d327>] __cpufreq_add_dev.isra.21+0x427/0xb80 [<ffffffff81724cf7>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq] [<ffffffff8156da8e>] cpufreq_add_dev+0xe/0x10 [<ffffffff814855d1>] subsys_interface_register+0xc1/0x120 [<ffffffff8156bcf2>] cpufreq_register_driver+0x112/0x340 [<ffffffff8121415a>] ? kfree+0xda/0x2b0 [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq] [<ffffffffa003562e>] pcc_cpufreq_init+0x4af/0xe81 [pcc_cpufreq] [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq] [<ffffffff81002144>] do_one_initcall+0xd4/0x210 [<ffffffff811f7472>] ? __vunmap+0xd2/0x120 [<ffffffff81127155>] load_module+0x1315/0x1b70 [<ffffffff811222a0>] ? store_uevent+0x70/0x70 [<ffffffff811229d9>] ? copy_module_from_fd.isra.44+0x129/0x180 [<ffffffff81127b86>] SyS_finit_module+0xa6/0xd0 [<ffffffff81725b69>] system_call_fastpath+0x16/0x1b cpufreq: __cpufreq_add_dev: ->get() failed insmod: ERROR: could not insert module pcc-cpufreq.ko: No such device The warning occurs in the __cpufreq_add_dev() code which does down_write(&policy->rwsem); ... if (cpufreq_driver->get && !cpufreq_driver->setpolicy) { policy->cur = cpufreq_driver->get(policy->cpu); if (!policy->cur) { pr_err("%s: ->get() failed\n", __func__); goto err_get_freq; } If cpufreq_driver->get(policy->cpu) returns an error we execute the code at err_get_freq, which does not up the policy->rwsem. This causes the lockdep warning. Trivial patch to up the policy->rwsem in the error path. After the patch has been applied, and an error occurs in the cpufreq_driver->get(policy->cpu) call we will now see cpufreq: __cpufreq_add_dev: ->get() failed cpufreq: __cpufreq_add_dev: ->get() failed modprobe: ERROR: could not insert 'pcc_cpufreq': No such device Fixes: 4e97b631 (cpufreq: Initialize governor for a new policy under policy->rwsem) Signed-off-by:
Prarit Bhargava <prarit@redhat.com> Acked-by:
Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johannes Berg authored
commit bd8c78e7 upstream. In testmode and vendor command reply/event SKBs we use the skb cb data to store nl80211 parameters between allocation and sending. This causes the code for CONFIG_NETLINK_MMAP to get confused, because it takes ownership of the skb cb data when the SKB is handed off to netlink, and it doesn't explicitly clear it. Clear the skb cb explicitly when we're done and before it gets passed to netlink to avoid this issue. Reported-by:
Assaf Azulay <assaf.azulay@intel.com> Reported-by:
David Spinadel <david.spinadel@intel.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anton Altaparmakov authored
commit f2d5a944 upstream. On 32-bit architectures, the legacy buffer_head functions are not always handling the sector number with the proper 64-bit types, and will thus fail on 4TB+ disks. Any code that uses __getblk() (and thus bread(), breadahead(), sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop in __getblk_slow() with an infinite stream of errors logged to dmesg like this: __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648 b_state=0x00000020, b_size=512 device sda1 blocksize: 512 Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the top 32-bits are missing (in this case the 0x1 at the top). This is because grow_dev_page() is broken and has a 32-bit overflow due to shifting the page index value (a pgoff_t - which is just 32 bits on 32-bit architectures) left-shifted as the block number. But the top bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit before the shift. This patch fixes this issue by type casting "index" to sector_t before doing the left shift. Note this is not a theoretical bug but has been seen in the field on a 4TiB hard drive with logical sector size 512 bytes. This patch has been verified to fix the infinite loop problem on 3.17-rc5 kernel using a 4TB disk image mounted using "-o loop". Without this patch doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 100% reproducibly whilst with the patch it works fine as expected. Signed-off-by:
Anton Altaparmakov <aia21@cantab.net> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Deucher authored
commit 2e97140d upstream. Use the new vga_switcheroo_fini_domain_pm_ops function to unregister the pm ops. Based on a patch from: Pali Rohár <pali.rohar@gmail.com> bug: https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by:
Ben Skeggs <bskeggs@redhat.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Pali Rohár <pali.rohar@gmail.com> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Deucher authored
commit 53beaa01 upstream. Use the new vga_switcheroo_fini_domain_pm_ops function to unregister the pm ops. Based on a patch from: Pali Rohár <pali.rohar@gmail.com> bug: https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by:
Ben Skeggs <bskeggs@redhat.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Deucher authored
commit 766a53d0 upstream. Drivers should call this on unload to unregister pmops. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by:
Ben Skeggs <bskeggs@redhat.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Pali Rohár <pali.rohar@gmail.com> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Cong Wang authored
commit 3577af70 upstream. We saw a kernel soft lockup in perf_remove_from_context(), it looks like the `perf` process, when exiting, could not go out of the retry loop. Meanwhile, the target process was forking a child. So either the target process should execute the smp function call to deactive the event (if it was running) or it should do a context switch which deactives the event. It seems we optimize out a context switch in perf_event_context_sched_out(), and what's more important, we still test an obsolete task pointer when retrying, so no one actually would deactive that event in this situation. Fix it directly by reloading the task pointer in perf_remove_from_context(). This should cure the above soft lockup. Signed-off-by:
Cong Wang <cwang@twopensource.com> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Matan Barak authored
commit a59c5850 upstream. When marsheling a user path to the kernel struct ib_sa_path, need to zero smac, dmac and set the vlan id to the "no vlan" value. Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures") Reported-by:
Aleksey Senin <alekseys@mellanox.com> Signed-off-by:
Matan Barak <matanb@mellanox.com> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Moni Shoua authored
commit f5c4834d upstream. When reading the IPv6 addresses from the net-device, make sure to avoid adding a duplicate entry to the GID table because of equality between the default GID we generate and the default IPv6 link-local address of the device. Fixes: acc4fccf ("IB/mlx4: Make sure GID index 0 is always occupied") Signed-off-by:
Moni Shoua <monis@mellanox.com> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Moni Shoua authored
commit e381835c upstream. When Ethernet netdev is not present for a port (e.g. when the link layer type of the port is InfiniBand) it's possible to dereference a null pointer when we do netdevice scanning. To fix that, we move a section of code that needs to run only when netdev is present to a proper if () statement. Fixes: ad4885d2 ("IB/mlx4: Build the port IBoE GID table properly under bonding") Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Moni Shoua <monis@mellanox.com> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Marciniszyn authored
commit 85cbb7c7 upstream. This particular reference count is not needed with the rcu protection, and the current code leaks a reference count, causing a hang in qib_qp_destroy(). Reviewed-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit cfb2f9d5 upstream. Callers of d_splice_alias(dentry, inode) don't need iput(), neither on success nor on failure. Either the reference to inode is stored in a previously negative dentry, or it's dropped. In either case inode reference the caller used to hold is consumed. __gfs2_lookup() does iput() in case when d_splice_alias() has failed. Double iput() if we ever hit that. And gfs2_create_inode() ends up not only with double iput(), but with link count dropped to zero - on an inode it has just found in directory. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Steven Whitehouse <swhiteho@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Richard Larocque authored
commit 474e941b upstream. Locks the k_itimer's it_lock member when handling the alarm timer's expiry callback. The regular posix timers defined in posix-timers.c have this lock held during timout processing because their callbacks are routed through posix_timer_fn(). The alarm timers follow a different path, so they ought to grab the lock somewhere else. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by:
Richard Larocque <rlarocque@google.com> Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Richard Larocque authored
commit 265b81d2 upstream. Avoids sending a signal to alarm timers created with sigev_notify set to SIGEV_NONE by checking for that special case in the timeout callback. The regular posix timers avoid sending signals to SIGEV_NONE timers by not scheduling any callbacks for them in the first place. Although it would be possible to do something similar for alarm timers, it's simpler to handle this as a special case in the timeout. Prior to this patch, the alarm timer would ignore the sigev_notify value and try to deliver signals to the process anyway. Even worse, the sanity check for the value of sigev_signo is skipped when SIGEV_NONE was specified, so the signal number could be bogus. If sigev_signo was an unitialized value (as it often would be if SIGEV_NONE is used), then it's hard to predict which signal will be sent. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by:
Richard Larocque <rlarocque@google.com> Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Richard Larocque authored
commit e86fea76 upstream. Returns the time remaining for an alarm timer, rather than the time at which it is scheduled to expire. If the timer has already expired or it is not currently scheduled, the it_value's members are set to zero. This new behavior matches that of the other posix-timers and the POSIX specifications. This is a change in user-visible behavior, and may break existing applications. Hopefully, few users rely on the old incorrect behavior. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by:
Richard Larocque <rlarocque@google.com> [jstultz: minor style tweak] Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John David Anglin authored
commit d26a7730 upstream. In spite of what the GCC manual says, the -mfast-indirect-calls has never been supported in the 64-bit parisc compiler. Indirect calls have always been done using function descriptors irrespective of the -mfast-indirect-calls option. Recently, it was noticed that a function descriptor was always requested when the -mfast-indirect-calls option was specified. This caused problems when the option was used in application code and doesn't make any sense because the whole point of the option is to avoid using a function descriptor for indirect calls. Fixing this broke 64-bit kernel builds. I will fix GCC but for now we need the attached change. This results in the same kernel code as before. Signed-off-by:
John David Anglin <dave.anglin@bell.net> Signed-off-by:
Helge Deller <deller@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guy Martin authored
commit 89206491 upstream. The current LWS cas only works correctly for 32bit. The new LWS allows for CAS operations of variable size. Signed-off-by:
Guy Martin <gmsoft@tuxicoman.be> Signed-off-by:
Helge Deller <deller@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit 7bd88377 upstream. return the value instead, and have path_init() do the assignment. Broken by "vfs: Fix absolute RCU path walk failures due to uninitialized seq number", which was Cc-stable with 2.6.38+ as destination. This one should go where it went. To avoid dummy value returned in case when root is already set (it would do no harm, actually, since the only caller that doesn't ignore the return value is guaranteed to have nd->root *not* set, but it's more obvious that way), lift the check into callers. And do the same to set_root(), to keep them in sync. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 78e05b14 upstream. Similar to the previous commit which described why we need to add a barrier to arch_spin_is_locked(), we have a similar problem with spin_unlock_wait(). We need a barrier on entry to ensure any spinlock we have previously taken is visibly locked prior to the load of lock->slock. It's also not clear if spin_unlock_wait() is intended to have ACQUIRE semantics. For now be conservative and add a barrier on exit to give it ACQUIRE semantics. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 51d7d520 upstream. The kernel defines the function spin_is_locked(), which can be used to check if a spinlock is currently locked. Using spin_is_locked() on a lock you don't hold is obviously racy. That is, even though you may observe that the lock is unlocked, it may become locked at any time. There is (at least) one exception to that, which is if two locks are used as a pair, and the holder of each checks the status of the other before doing any update. Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic value: The first CPU does: spin_lock(*A) if spin_is_locked(*B) # nothing else smp_mb() LOAD r = *COUNTER r++ STORE *COUNTER = r spin_unlock(*A) And the second CPU does: spin_lock(*B) if spin_is_locked(*A) # nothing else smp_mb() LOAD r = *COUNTER r++ STORE *COUNTER = r spin_unlock(*B) Although this is a strange locking construct, it should work. It seems to be understood, but not documented, that spin_is_locked() is not a memory barrier, so in the examples above and below the caller inserts its own memory barrier before acting on the result of spin_is_locked(). For now we assume spin_is_locked() is implemented as below, and we break it out in our examples: bool spin_is_locked(*LOCK) { LOAD l = *LOCK return l.locked } Our intuition is that there should be no problem even if the two code sequences run simultaneously such as: CPU 0 CPU 1 ================================================== spin_lock(*A) spin_lock(*B) LOAD b = *B LOAD a = *A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(*A) spin_unlock(*B) If one CPU gets the lock before the other then it will do the update and the other CPU will back off: CPU 0 CPU 1 ================================================== spin_lock(*A) LOAD b = *B spin_lock(*B) if b.locked # false LOAD a = *A else if a.locked # true smp_mb() # nothing LOAD r1 = *COUNTER spin_unlock(*B) r1++ STORE *COUNTER = r1 spin_unlock(*A) However in reality spin_lock() itself is not indivisible. On powerpc we implement it as a load-and-reserve and store-conditional. Ignoring the retry logic for the lost reservation case, it boils down to: spin_lock(*LOCK) { LOAD l = *LOCK l.locked = true STORE *LOCK = l ACQUIRE_BARRIER } The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as defined in memory-barriers.txt: This acts as a one-way permeable barrier. It guarantees that all memory operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system. On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is also know as "lightweight sync", or "sync 1". As described in Power ISA v2.07 section B.2.1.1, in this scenario the lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to act as the barrier, preventing any loads or stores in the locked region from occurring prior to the load of *LOCK. Whether this behaviour is in accordance with the definition of ACQUIRE semantics in memory-barriers.txt is open to discussion, we may switch to a different barrier in future. What this means in practice is that the following can occur: CPU 0 CPU 1 ================================================== LOAD a = *A LOAD b = *B a.locked = true b.locked = true LOAD b = *B LOAD a = *A STORE *A = a STORE *B = b if b.locked # false if a.locked # false else else smp_mb() smp_mb() LOAD r1 = *COUNTER LOAD r2 = *COUNTER r1++ r2++ STORE *COUNTER = r1 STORE *COUNTER = r2 # Lost update spin_unlock(*A) spin_unlock(*B) That is, the load of *B can occur prior to the store that makes *A visibly locked. And similarly for CPU 1. The result is both CPUs hold their lock and believe the other lock is unlocked. The easiest fix for this is to add a full memory barrier to the start of spin_is_locked(), so adding to our previous definition would give us: bool spin_is_locked(*LOCK) { smp_mb() LOAD l = *LOCK return l.locked } The new barrier orders the store to the lock we are locking vs the load of the other lock: CPU 0 CPU 1 ================================================== LOAD a = *A LOAD b = *B a.locked = true b.locked = true STORE *A = a STORE *B = b smp_mb() smp_mb() LOAD b = *B LOAD a = *A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(*A) spin_unlock(*B) Although the above example is theoretical, there is code similar to this example in sem_lock() in ipc/sem.c. This commit in addition to the next commit appears to be a fix for crashes we are seeing in that code where we believe this race happens in practice. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anton Blanchard authored
commit 85101af1 upstream. ABIv2 kernels are failing to backtrace through the kernel. An example: 39.30% readseek2_proce [kernel.kallsyms] [k] find_get_entry | --- find_get_entry __GI___libc_read The problem is in valid_next_sp() where we check that the new stack pointer is at least STACK_FRAME_OVERHEAD below the previous one. ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes with no paramter save area. STACK_FRAME_OVERHEAD is in theory the minimum stack frame size, but we over 240 uses of it, some of which assume that it includes space for the parameter area. We need to work through all our stack defines and rationalise them but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using in valid_next_sp(). This fixes the issue: 30.64% readseek2_proce [kernel.kallsyms] [k] find_get_entry | --- find_get_entry pagecache_get_page generic_file_read_iter new_sync_read vfs_read sys_read syscall_exit __GI___libc_read Reported-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arend van Spriel authored
commit 87c47903 upstream. The firmware notifies about interface changes through the IF event which has a NO_IF flag that means host can ignore the event. This behaviour was introduced in the driver by: commit 2ee8382f Author: Arend van Spriel <arend@broadcom.com> Date: Sat Aug 10 12:27:24 2013 +0200 brcmfmac: ignore IF event if firmware indicates it It turns out that the IF event for the P2P_DEVICE also has this flag set, but the event should not be ignored in this scenario. The mentioned commit caused a regression in 3.12 kernel in creation of the P2P_DEVICE interface. Reviewed-by:
Hante Meuleman <meuleman@broadcom.com> Reviewed-by:
Franky (Zhenhui) Lin <frankyl@broadcom.com> Reviewed-by:
Daniel (Deognyoun) Kim <dekim@broadcom.com> Reviewed-by:
Pieter-Paul Giesberts <pieterpg@broadcom.com> Signed-off-by:
Arend van Spriel <arend@broadcom.com> Signed-off-by:
John W. Linville <linville@tuxdriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-