- 31 Aug, 2015 8 commits
-
-
Haozhong Zhang authored
commit d7add054 upstream. When kvm_set_msr_common() handles a guest's write to MSR_IA32_TSC_ADJUST, it will calcuate an adjustment based on the data written by guest and then use it to adjust TSC offset by calling a call-back adjust_tsc_offset(). The 3rd parameter of adjust_tsc_offset() indicates whether the adjustment is in host TSC cycles or in guest TSC cycles. If SVM TSC scaling is enabled, adjust_tsc_offset() [i.e. svm_adjust_tsc_offset()] will first scale the adjustment; otherwise, it will just use the unscaled one. As the MSR write here comes from the guest, the adjustment is in guest TSC cycles. However, the current kvm_set_msr_common() uses it as a value in host TSC cycles (by using true as the 3rd parameter of adjust_tsc_offset()), which can result in an incorrect adjustment of TSC offset if SVM TSC scaling is enabled. This patch fixes this problem. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Peter Zijlstra authored
commit fed66e2c upstream. Vince reported that the fasync signal stuff doesn't work proper for inherited events. So fix that. Installing fasync allocates memory and sets filp->f_flags |= FASYNC, which upon the demise of the file descriptor ensures the allocation is freed and state is updated. Now for perf, we can have the events stick around for a while after the original FD is dead because of references from child events. So we cannot copy the fasync pointer around. We can however consistently use the parent's fasync, as that will be updated. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho deMelo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twinsSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Roland Dreier authored
commit 9c395170 upstream. If an initiator doesn't have any real LUNs assigned, we should report LUN 0 and a LUN list length of 1. Some versions of Solaris at least go beserk if we report a LUN list length of 0. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Alexei Potashnik authored
commit 9547308b upstream. Make sure all non-READ SCSI commands get targ_xfer_tag initialized to 0xffffffff, not just WRITEs. Double-free of a TUR cmd object occurs under the following scenario: 1. TUR received (targ_xfer_tag is uninitialized and left at 0) 2. TUR status sent 3. First unsolicited NOPIN is sent to initiator (gets targ_xfer_tag of 0) 4. NOPOUT for NOPIN (with TTT=0) arrives - its ExpStatSN acks TUR status, TUR is queued for removal - LIO tries to find NOPIN with TTT=0, but finds the same TUR instead, TUR is queued for removal for the 2nd time (Drop unbalanced conditional bracket usage - nab) Signed-off-by: Alexei Potashnik <alexei@purestorage.com> Signed-off-by: Spencer Baugh <sbaugh@catern.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> [ luis: backported to 3.16: - kept brackets as they are needed in 3.16 kernel ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Guenter Roeck authored
commit 8ef9724b upstream. When inserting a new register into a block, the present bit map size is increased using krealloc. krealloc does not clear the additionally allocated memory, leaving it filled with random values. Result is that some registers are considered cached even though this is not the case. Fix the problem by clearing the additionally allocated memory. Also, if the bitmap size does not increase, do not reallocate the bitmap at all to reduce overhead. Fixes: 3f4ff561 ("regmap: rbtree: Make cache_present bitmap per node") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Bob Liu authored
commit 53bc7dc0 upstream. The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge work haven't finished. There is a work_pending() before this BUG_ON, but it doesn't account if the work is still currently running. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Bob Liu authored
commit 7b076750 upstream. We should consider info->feature_persistent when adding indirect page to list info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered. When we are using persistent grants the indirect_pages list should always be empty because blkfront has pre-allocated enough persistent pages to fill all requests on the ring. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Charles Keepax authored
commit 72e43164 upstream. The PM runtime core by default assumes a chip is suspended when runtime PM is enabled. Currently the arizona driver enables runtime PM when the chip is fully active and then disables the DCVDD regulator at the end of arizona_dev_init. This however has several problems, firstly the if we reach the end of arizona_dev_init, we did not properly follow all the proceedures for shutting down the chip, and most notably we never marked the chip as cache only so any writes occurring between then and the next PM runtime resume will be lost. Secondly, if we are already resumed when we reach the end of dev_init, then at best we get unbalanced regulator enable/disables at work we lose DCVDD whilst we need it. Additionally, since the commit 4f0216409f7c ("mfd: arizona: Add better support for system suspend"), the PM runtime operations may disable/enable the IRQ, so the IRQs must now be enabled before we call any PM operations. This patch adds a call to pm_runtime_set_active to inform the PM core that the device is starting up active and moves the PM enabling to around the IRQ initialisation to avoid any PM callbacks happening until the IRQs are initialised. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
- 27 Aug, 2015 32 commits
-
-
Mimi Zohar authored
commit 4351c294 upstream. The current "mask" policy option matches files opened as MAY_READ, MAY_WRITE, MAY_APPEND or MAY_EXEC. This patch extends the "mask" option to match files opened containing one of these modes. For example, "mask=^MAY_READ" would match files opened read-write. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Dr. Greg Wettstein <gw@idfusion.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Mimi Zohar authored
commit 139069ef upstream. The new "euid" policy condition measures files with the specified effective uid (euid). In addition, for CAP_SETUID files it measures files with the specified uid or suid. Changelog: - fixed checkpatch.pl warnings - fixed avc denied {setuid} messages - based on Roberto's feedback Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Dr. Greg Wettstein <gw@idfusion.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Herton R. Krzesinski authored
commit 7250dc3f upstream. I received a report from an user of following mouse which needs this quirk: usb 1-1.6: USB disconnect, device number 58 usb 1-1.6: new low speed USB device number 59 using ehci_hcd usb 1-1.6: New USB device found, idVendor=04f2, idProduct=1053 usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 1-1.6: Product: USB Optical Mouse usb 1-1.6: Manufacturer: PixArt usb 1-1.6: configuration #1 chosen from 1 choice input: PixArt USB Optical Mouse as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6:1.0/input/input5887 generic-usb 0003:04F2:1053.16FE: input,hidraw2: USB HID v1.11 Mouse [PixArt USB Optical Mouse] on usb-0000:00:1a.0-1.6/input0 The quirk was tested by the reporter and it fixed the frequent disconnections etc. [jkosina@suse.cz: reorder the position in hid-ids.h] Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Nikolay Borisov authored
commit c45653c3 upstream. Switch ext4 to using sb_getblk_gfp with GFP_NOFS added to fix possible deadlocks in the page writeback path. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Nikolay Borisov authored
commit bd7ade3c upstream. sb_getblk() is used during ext4 (and possibly other FSes) writeback paths. Sometimes such path require allocating memory and guaranteeing that such allocation won't block. Currently, however, there is no way to provide user flags for sb_getblk which could lead to deadlocks. This patch implements a sb_getblk_gfp with the only difference it can accept user-provided GFP flags. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Gioh Kim authored
commit 3b5e6454 upstream. A buffer cache is allocated from movable area because it is referred for a while and released soon. But some filesystems are taking buffer cache for a long time and it can disturb page migration. New APIs are introduced to allocate buffer cache with user specific flag. *_gfp APIs are for user want to set page allocation flag for page cache allocation. And *_unmovable APIs are for the user wants to allocate page cache from non-movable area. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> [ luis: 3.16 prereq for: bd7ade3c "bufferhead: Add _gfp version for sb_getblk()" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Christoph Hellwig authored
commit 16b8528d upstream. We only want to steer the I/O completion towards a queue, but don't actually access any per-CPU data, so the raw_ version is fine to use and avoids the warnings when using smp_processor_id(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Andy Lutomirski <luto@kernel.org> Tested-by: Andy Lutomirski <luto@kernel.org> Acked-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> [ luis: backported to 3.16: - dropped changes to megasas_build_dcdb_fusion() ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Chad Dupuis authored
commit ef86cb20 upstream. Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Linus Torvalds authored
commit 6f957724 upstream. The firmware class uevent function accessed the "fw_priv->buf" buffer without the proper locking and testing for NULL. This is an old bug (looks like it goes back to 2012 and commit 1244691c: "firmware loader: introduce firmware_buf"), but for some reason it's triggering only now in 4.2-rc1. Shuah Khan is trying to bisect what it is that causes this to trigger more easily, but in the meantime let's just fix the bug since others are hitting it too (at least Ingo reports having seen it as well). Reported-and-tested-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Ming Lei <ming.lei@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Florian Westphal authored
commit 1e16aa3d upstream. skb_gso_segment() has a 'features' argument representing offload features available to the output path. A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use those instead of the provided ones when handing encapsulation/tunnels. Depending on dev->hw_enc_features of the output device skb_gso_segment() can then return NULL even when the caller has disabled all GSO feature bits, as segmentation of inner header thinks device will take care of segmentation. This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs that did not fit the remaining token quota as the segmentation does not work when device supports corresponding hw offload capabilities. Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: used davem's backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Ivan Vecera authored
commit ade4dc3e upstream. The commit "e29aa339 bna: Enable Multi Buffer RX" moved packets counter increment from the beginning of the NAPI processing loop after the check for erroneous packets so they are never accounted. This counter is used to inform firmware about number of processed completions (packets). As these packets are never acked the firmware fires IRQs for them again and again. Fixes: e29aa339 ("bna: Enable Multi Buffer RX") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Eric Dumazet authored
commit 10e2eb87 upstream. Multicast dst are not cached. They carry DST_NOCACHE. As mentioned in commit f8864972 ("ipv4: fix dst race in sk_dst_get()"), these dst need special care before caching them into a socket. Caching them is allowed only if their refcnt was not 0, ie we must use atomic_inc_not_zero() Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned in commit d0c294c5 ("tcp: prevent fetching dst twice in early demux code") Fixes: 421b3885 ("udp: ipv4: Add udp early demux") Tested-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz> Reported-by: Alex Gartrell <agartrell@fb.com> Cc: Michal Kubeček <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: used davem's backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Dan Carpenter authored
commit 468b732b upstream. "len" is a signed integer. We check that len is not negative, so it goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than INT_MAX so the condition can never be true. I don't know if this is harmful but it seems safe to limit "len" to INT_MAX - 4095. Fixes: a8c879a7 ('RDS: Info and stats') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Florian Westphal authored
commit 0470eb99 upstream. Kirill A. Shutemov says: This simple test-case trigers few locking asserts in kernel: int main(int argc, char **argv) { unsigned int block_size = 16 * 4096; struct nl_mmap_req req = { .nm_block_size = block_size, .nm_block_nr = 64, .nm_frame_size = 16384, .nm_frame_nr = 64 * block_size / 16384, }; unsigned int ring_size; int fd; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1); ring_size = req.nm_block_nr * req.nm_block_size; mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); return 0; } +++ exited with 0 +++ BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220 #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70 #2: (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0 Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20 CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98 Call Trace: <IRQ> [<ffffffff81929ceb>] dump_stack+0x4f/0x7b [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270 [<ffffffff81085bed>] __might_sleep+0x4d/0x90 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150 [<ffffffff817e484d>] __sk_free+0x1d/0x160 [<ffffffff817e49a9>] sk_free+0x19/0x20 [..] Cong Wang says: We can't hold mutex lock in a rcu callback, [..] Thomas Graf says: The socket should be dead at this point. It might be simpler to add a netlink_release_ring() function which doesn't require locking at all. Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Diagnosed-by: Cong Wang <cwang@twopensource.com> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
dingtianhong authored
commit a951bc1e upstream. The "follow" fail_over_mac policy is useful for multiport devices that either become confused or incur a performance penalty when multiple ports are programmed with the same MAC address, but the same MAC address still may happened by this steps for this policy: 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves bond0 has the same mac address with eth0, it is MAC1. 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves eth1 is backup, eth1 has MAC2. 3) ifconfig eth0 down eth1 became active slave, bond will swap MAC for eth0 and eth1, so eth1 has MAC1, and eth0 has MAC2. 4) ifconfig eth1 down there is no active slave, and eth1 still has MAC1, eth2 has MAC2. 5) ifconfig eth0 up the eth0 became active slave again, the bond set eth0 to MAC1. Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same MAC address, it will break this policy for ACTIVE_BACKUP mode. This patch will fix this problem by finding the old active slave and swap them MAC address before change active slave. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: used davem's backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Eric Dumazet authored
commit 03645a11 upstream. ip6_datagram_connect() is doing a lot of socket changes without socket being locked. This looks wrong, at least for udp_lib_rehash() which could corrupt lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Herbert Xu authored
commit a0a2a660 upstream. The commit 738ac1eb ("net: Clone skb before setting peeked flag") introduced a use-after-free bug in skb_recv_datagram. This is because skb_set_peeked may create a new skb and free the existing one. As it stands the caller will continue to use the old freed skb. This patch fixes it by making skb_set_peeked return the new skb (or the old one if unchanged). Fixes: 738ac1eb ("net: Clone skb before setting peeked flag") Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Herbert Xu authored
commit 89c22d8c upstream. When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Herbert Xu authored
commit 738ac1eb upstream. Shared skbs must not be modified and this is crucial for broadcast and/or multicast paths where we use it as an optimisation to avoid unnecessary cloning. The function skb_recv_datagram breaks this rule by setting peeked without cloning the skb first. This causes funky races which leads to double-free. This patch fixes this by cloning the skb and replacing the skb in the list when setting skb->peeked. Fixes: a59322be ("[UDP]: Only increment counter on first peek/recv") Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Julian Anastasov authored
commit 2c17d27c upstream. Incoming packet should be either in backlog queue or in RCU read-side section. Otherwise, the final sequence of flush_backlog() and synchronize_net() may miss packets that can run without device reference: CPU 1 CPU 2 skb->dev: no reference process_backlog:__skb_dequeue process_backlog:local_irq_enable on_each_cpu for flush_backlog => IPI(hardirq): flush_backlog - packet not found in backlog CPU delayed ... synchronize_net - no ongoing RCU read-side sections netdev_run_todo, rcu_barrier: no ongoing callbacks __netif_receive_skb_core:rcu_read_lock - too late free dev process packet for freed dev Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: used davem's backport to 3.18 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Oleg Nesterov authored
commit fecdf8be upstream. pktgen_thread_worker() is obviously racy, kthread_stop() can come between the kthread_should_stop() check and set_current_state(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Stephen Smalley authored
commit fdd75ea8 upstream. Calling connect() with an AF_TIPC socket would trigger a series of error messages from SELinux along the lines of: SELinux: Invalid class 0 type=AVC msg=audit(1434126658.487:34500): avc: denied { <unprintable> } for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable> permissive=0 This was due to a failure to initialize the security state of the new connection sock by the tipc code, leaving it with junk in the security class field and an unlabeled secid. Add a call to security_sk_clone() to inherit the security state from the parent socket. Reported-by: Tim Shearer <tim.shearer@overturenetworks.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Daniel Borkmann authored
commit 4f7d2cdf upstream. Jason Gunthorpe reported that since commit c02db8c6 ("rtnetlink: make SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes anymore with respect to their policy, that is, ifla_vfinfo_policy[]. Before, they were part of ifla_policy[], but they have been nested since placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO, which is another nested attribute for the actual VF attributes such as IFLA_VF_MAC, IFLA_VF_VLAN, etc. Despite the policy being split out from ifla_policy[] in this commit, it's never applied anywhere. nla_for_each_nested() only does basic nla_ok() testing for struct nlattr, but it doesn't know about the data context and their requirements. Fix, on top of Jason's initial work, does 1) parsing of the attributes with the right policy, and 2) using the resulting parsed attribute table from 1) instead of the nla_for_each_nested() loop (just like we used to do when still part of ifla_policy[]). Reference: http://thread.gmane.org/gmane.linux.network/368913 Fixes: c02db8c6 ("rtnetlink: make SR-IOV VF interface symmetric") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Cc: Greg Rose <gregory.v.rose@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Rony Efraim <ronye@mellanox.com> Cc: Vlad Zolotarov <vladz@cloudius-systems.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: used davem's backport to 3.18 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Dave Airlie authored
commit 7f98ca45 upstream. We apparantly get a hotplug irq before we've initialised modesetting, [drm] Loading R100 Microcode BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c125f56f>] __mutex_lock_slowpath+0x23/0x91 *pde = 00000000 Oops: 0002 [#1] Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit backlight pcspkr psmouse evdev sr_mod input_leds led_class cdrom sg parport_pc parport floppy intel_agp intel_gtt lpc_ich acpi_cpufreq processor button mfd_core agpgart uhci_hcd ehci_hcd rng_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm usbcore usb_common i2c_i801 i2c_core snd_timer snd soundcore thermal_sys CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 4.2.0-rc7-00015-gbf674028 #111 Hardware name: MicroLink /D850MV , BIOS MV85010A.86A.0067.P24.0304081124 04/08/2003 Workqueue: events radeon_hotplug_work_func [radeon] task: f6ca5900 ti: f6d3e000 task.ti: f6d3e000 EIP: 0060:[<c125f56f>] EFLAGS: 00010282 CPU: 0 EIP is at __mutex_lock_slowpath+0x23/0x91 EAX: 00000000 EBX: f5e900fc ECX: 00000000 EDX: fffffffe ESI: f6ca5900 EDI: f5e90100 EBP: f5e90000 ESP: f6d3ff0c DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 36f61000 CR4: 000006d0 Stack: f5e90100 00000000 c103c4c1 f6d2a5a0 f5e900fc f6df394c c125f162 f8b0faca f6d2a5a0 c138ca00 f6df394c f7395600 c1034741 00d40000 00000000 f6d2a5a0 c138ca00 f6d2a5b8 c138ca10 c1034b58 00000001 f6d40000 f6ca5900 f6d0c940 Call Trace: [<c103c4c1>] ? dequeue_task_fair+0xa4/0xb7 [<c125f162>] ? mutex_lock+0x9/0xa [<f8b0faca>] ? radeon_hotplug_work_func+0x17/0x57 [radeon] [<c1034741>] ? process_one_work+0xfc/0x194 [<c1034b58>] ? worker_thread+0x18d/0x218 [<c10349cb>] ? rescuer_thread+0x1d5/0x1d5 [<c103742a>] ? kthread+0x7b/0x80 [<c12601c0>] ? ret_from_kernel_thread+0x20/0x30 [<c10373af>] ? init_completion+0x18/0x18 Code: 42 08 e8 8e a6 dd ff c3 57 56 53 83 ec 0c 8b 35 48 f7 37 c1 8b 10 4a 74 1a 89 c3 8d 78 04 8b 40 08 89 63 Reported-and-Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Jan Kara authored
commit 8f2f3eb5 upstream. fsnotify_clear_marks_by_group_flags() can race with fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked() drops mark_mutex, a mark from the list iterated by fsnotify_clear_marks_by_group_flags() can be freed and thus the next entry pointer we have cached may become stale and we dereference free memory. Fix the problem by first moving marks to free to a special private list and then always free the first entry in the special list. This method is safe even when entries from the list can disappear once we drop the lock. Signed-off-by: Jan Kara <jack@suse.com> Reported-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Joseph Qi authored
commit 209f7512 upstream. The "BUG_ON(list_empty(&osb->blocked_lock_list))" in ocfs2_downconvert_thread_do_work can be triggered in the following case: ocfs2dc has firstly saved osb->blocked_lock_count to local varibale processed, and then processes the dentry lockres. During the dentry put, it calls iput and then deletes rw, inode and open lockres from blocked list in ocfs2_mark_lockres_freeing. And this causes the variable `processed' to not reflect the number of blocked lockres to be processed, which triggers the BUG. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Marcus Gelderie authored
commit de54b9ac upstream. A while back, the message queue implementation in the kernel was improved to use btrees to speed up retrieval of messages, in commit d6629859 ("ipc/mqueue: improve performance of send/recv"). That patch introducing the improved kernel handling of message queues (using btrees) has, as a by-product, changed the meaning of the QSIZE field in the pseudo-file created for the queue. Before, this field reflected the size of the user-data in the queue. Since, it also takes kernel data structures into account. For example, if 13 bytes of user data are in the queue, on my machine the file reports a size of 61 bytes. There was some discussion on this topic before (for example https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74): The pseudofiles in the mqueue filesystem (usually mounted at /dev/mqueue) expose fields with metadata describing a message queue. One of these fields, QSIZE, as originally implemented, showed the total number of bytes of user data in all messages in the message queue, and this feature was documented from the beginning in the mq_overview(7) page. In 3.5, some other (useful) work happened to break the user-space API in a couple of places, including the value exposed via QSIZE, which now includes a measure of kernel overhead bytes for the queue, a figure that renders QSIZE useless for its original purpose, since there's no way to deduce the number of overhead bytes consumed by the implementation. (The other user-space breakage was subsequently fixed.) This patch removes the accounting of kernel data structures in the queue. Reporting the size of these data-structures in the QSIZE field was a breaking change (see Michael's comment above). Without the QSIZE field reporting the total size of user-data in the queue, there is no way to deduce this number. It should be noted that the resource limit RLIMIT_MSGQUEUE is counted against the worst-case size of the queue (in both the old and the new implementation). Therefore, the kernel overhead accounting in QSIZE is not necessary to help the user understand the limitations RLIMIT imposes on the processes. Signed-off-by: Marcus Gelderie <redmnic@gmail.com> Acked-by: Doug Ledford <dledford@redhat.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: David Howells <dhowells@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: John Duffy <jb_duffy@btinternet.com> Cc: Arto Bendiken <arto@bendiken.net> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Daney authored
commit 46011e6e upstream. On MIPS the GLOBAL bit of the PTE must have the same value in any aligned pair of PTEs. These pairs of PTEs are referred to as "buddies". In a SMP system is is possible for two CPUs to be calling set_pte() on adjacent PTEs at the same time. There is a race between setting the PTE and a different CPU setting the GLOBAL bit in its buddy PTE. This race can be observed when multiple CPUs are executing vmap()/vfree() at the same time. Make setting the buddy PTE's GLOBAL bit an atomic operation to close the race condition. The case of CONFIG_64BIT_PHYS_ADDR && CONFIG_CPU_MIPS32 is *not* handled. Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/10835/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Michal Hocko authored
commit ecf5fc6e upstream. Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 #1 schedule at ffffffff815ab76e #2 schedule_timeout at ffffffff815ae5e5 #3 io_schedule_timeout at ffffffff815aad6a #4 bit_wait_io at ffffffff815abfc6 #5 __wait_on_bit at ffffffff815abda5 #6 wait_on_page_bit at ffffffff8111fd4f #7 shrink_page_list at ffffffff81135445 #8 shrink_inactive_list at ffffffff81135845 #9 shrink_lruvec at ffffffff81135ead #10 shrink_zone at ffffffff811360c3 #11 shrink_zones at ffffffff81136eff #12 do_try_to_free_pages at ffffffff8113712f #13 try_to_free_mem_cgroup_pages at ffffffff811372be #14 try_charge at ffffffff81189423 #15 mem_cgroup_try_charge at ffffffff8118c6f5 #16 __add_to_page_cache_locked at ffffffff8112137d #17 add_to_page_cache_lru at ffffffff81121618 #18 pagecache_get_page at ffffffff8112170b #19 grow_dev_page at ffffffff811c8297 #20 __getblk_slow at ffffffff811c91d6 #21 __getblk_gfp at ffffffff811c92c1 #22 ext4_ext_grow_indepth at ffffffff8124565c #23 ext4_ext_create_new_leaf at ffffffff81246ca8 #24 ext4_ext_insert_extent at ffffffff81246f09 #25 ext4_ext_map_blocks at ffffffff8124a848 #26 ext4_map_blocks at ffffffff8121a5b7 #27 mpage_map_one_extent at ffffffff8121b1fa #28 mpage_map_and_submit_extent at ffffffff8121f07b #29 ext4_writepages at ffffffff8121f6d5 #30 do_writepages at ffffffff8112c490 #31 __filemap_fdatawrite_range at ffffffff81120199 #32 filemap_flush at ffffffff8112041c #33 ext4_alloc_da_blocks at ffffffff81219da1 #34 ext4_rename at ffffffff81229b91 #35 ext4_rename2 at ffffffff81229e32 #36 vfs_rename at ffffffff811a08a5 #37 SYSC_renameat2 at ffffffff811a3ffc #38 sys_renameat2 at ffffffff811a408e #39 sys_rename at ffffffff8119e51e #40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384e ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f44 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. [tytso@mit.edu: corrected the control flow] Fixes: c3b94f44 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ luis: backported to 3.16: used Hugh's backport for 4.1 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Takashi Sakamoto authored
commit 18f5ed36 upstream. Fireworks uses TSB43CB43(IceLynx-Micro) as its IEC 61883-1/6 interface. This chip includes ARM7 core, and loads and runs program. The firmware is stored in on-board memory and loaded every powering-on from it. Echo Audio ships several versions of firmwares for each model. These firmwares have each quirk and the quirk changes a sequence of packets. As long as I investigated, AudioFire2/AudioFire4/AudioFirePre8 have a quirk to transfer a first packet with 0x02 in its dbc field. This causes ALSA Fireworks driver to detect discontinuity. In this case, firmware version 5.7.0, 5.7.3 and 5.8.0 are used. Payload CIP CIP quadlets header1 header2 02 00050002 90ffffff <- 42 0005000a 90013000 42 00050012 90014400 42 0005001a 90015800 02 0005001a 90ffffff 42 00050022 90019000 42 0005002a 9001a400 42 00050032 9001b800 02 00050032 90ffffff 42 0005003a 9001d000 42 00050042 9001e400 42 0005004a 9001f800 02 0005004a 90ffffff (AudioFire2 with firmware version 5.7.) $ dmesg snd-fireworks fw1.0: Detect discontinuity of CIP: 00 02 These models, AudioFire8 (since Jul 2009 ) and Gibson Robot Interface Pack series uses the same ARM binary as their firmware. Thus, this quirk may be observed among them. This commit adds a new member for AMDTP structure. This member represents the value of dbc field in a first AMDTP packet. Drivers can set it with a preferred value according to model's quirk. Tested-by: Johannes Oertei <johannes.oertel@uni-due.de> Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Gavin Shan authored
commit ffe5adcb upstream. When xhci_mem_cleanup() is called, it's possible that the command timer isn't initialized and scheduled. For those cases, to delete the command timer causes soft-lockup as below stack dump shows. The patch avoids deleting the command timer if it's not scheduled with the help of timer_pending(). NMI watchdog: BUG: soft lockup - CPU#40 stuck for 23s! [kworker/40:1:8140] : NIP [c000000000150b30] lock_timer_base.isra.34+0x90/0xa0 LR [c000000000150c24] try_to_del_timer_sync+0x34/0xa0 Call Trace: [c000000f67c975e0] [c0000000015b84f8] mon_ops+0x0/0x8 (unreliable) [c000000f67c97620] [c000000000150c24] try_to_del_timer_sync+0x34/0xa0 [c000000f67c97660] [c000000000150cf0] del_timer_sync+0x60/0x80 [c000000f67c97690] [c00000000070ac0c] xhci_mem_cleanup+0x5c/0x5e0 [c000000f67c97740] [c00000000070c2e8] xhci_mem_init+0x1158/0x13b0 [c000000f67c97860] [c000000000700978] xhci_init+0x88/0x110 [c000000f67c978e0] [c000000000701644] xhci_gen_setup+0x2b4/0x590 [c000000f67c97970] [c0000000006d4410] xhci_pci_setup+0x40/0x190 [c000000f67c979f0] [c0000000006b1af8] usb_add_hcd+0x418/0xba0 [c000000f67c97ab0] [c0000000006cb15c] usb_hcd_pci_probe+0x1dc/0x5c0 [c000000f67c97b50] [c0000000006d3ba4] xhci_pci_probe+0x64/0x1f0 [c000000f67c97ba0] [c0000000004fe9ac] local_pci_probe+0x6c/0x130 [c000000f67c97c30] [c0000000000e5ce8] work_for_cpu_fn+0x38/0x60 [c000000f67c97c60] [c0000000000eacb8] process_one_work+0x198/0x470 [c000000f67c97cf0] [c0000000000eb6ac] worker_thread+0x37c/0x5a0 [c000000f67c97d80] [c0000000000f2730] kthread+0x110/0x130 [c000000f67c97e30] [c000000000009660] ret_from_kernel_thread+0x5c/0x7c Reported-by: Priya M. A <priyama2@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Mathias Nyman authored
commit 7895086a upstream. We need to check that a TRB is part of the current segment before calculating its DMA address. Previously a ring segment didn't use a full memory page, and every new ring segment got a new memory page, so the off by one error in checking the upper bound was never seen. Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one didn't catch the case when a TRB was the first element of the next segment. This is triggered if the virtual memory pages for a ring segment are next to each in increasing order where the ring buffer wraps around and causes errors like: [ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1 [ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0 The trb-end address is one outside the end-seg address. Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-