- 20 Feb, 2015 40 commits
-
-
Ashay Jaiswal authored
commit 83b0302d upstream. The regulator framework maintains a list of consumer regulators for a regulator device and protects it from concurrent access using the regulator device's mutex lock. In the case of regulator_put() the consumer is removed and regulator device's parameters are updated without holding the regulator device's mutex. This would lead to a race condition between the regulator_put() and any function which traverses the consumer list or modifies regulator device's parameters. Fix this race condition by holding the regulator device's mutex in case of regulator_put. Signed-off-by: Ashay Jaiswal <ashayj@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org> [bwh: Backported to 3.2: - Adjust context - Don't touch the comment; __regulator_put() has not been split out of regulator_put() here] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Srihari Vijayaraghavan authored
commit 148e9a71 upstream. On some laptops, keyboard needs to be reset in order to successfully detect touchpad (e.g., some Gigabyte laptop models with Elantech touchpads). Without resettin keyboard touchpad pretends to be completely dead. Based on the original patch by Mateusz Jończyk this version has been expanded to include DMI based detection & application of the fix automatically on the affected models of laptops. This has been confirmed to fix problem by three users already on three different models of laptops. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81331Signed-off-by: Srihari Vijayaraghavan <linux.bug.reporting@gmail.com> Acked-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Tested-by: Srihari Vijayaraghavan <linux.bug.reporting@gmail.com> Tested by: Zakariya Dehlawi <zdehlawi@gmail.com> Tested-by: Guillaum Bouchard <guillaum.bouchard@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Sasha Levin authored
commit 6ada1fc0 upstream. An unvalidated user input is multiplied by a constant, which can result in an undefined behaviour for large values. While this is validated later, we should avoid triggering undefined behaviour. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> [jstultz: include trivial milisecond->microsecond correction noticed by Andy] Signed-off-by: John Stultz <john.stultz@linaro.org> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Shevchenko authored
commit 4aaa7187 upstream. DMA mapped IO should be unmapped on the error path in probe() and unconditionally on remove(). Fixes: 62936009 ([libata] Add 460EX on-chip SATA driver, sata_dwc_460ex) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Linus Torvalds authored
commit fee7e49d upstream. Jay Foad reports that the address sanitizer test (asan) sometimes gets confused by a stack pointer that ends up being outside the stack vma that is reported by /proc/maps. This happens due to an interaction between RLIMIT_STACK and the guard page: when we do the guard page check, we ignore the potential error from the stack expansion, which effectively results in a missing guard page, since the expected stack expansion won't have been done. And since /proc/maps explicitly ignores the guard page (commit d7824370: "mm: fix up some user-visible effects of the stack guard page"), the stack pointer ends up being outside the reported stack area. This is the minimal patch: it just propagates the error. It also effectively makes the guard page part of the stack limit, which in turn measn that the actual real stack is one page less than the stack limit. Let's see if anybody notices. We could teach acct_stack_growth() to allow an extra page for a grow-up/grow-down stack in the rlimit test, but I don't want to add more complexity if it isn't needed. Reported-and-tested-by: Jay Foad <jay.foad@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Zidan Wang authored
commit 22ee76da upstream. wm8960 codec can't support sample rate 11250, it must be 11025. Signed-off-by: Zidan Wang <b50113@freescale.com> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David Peterson authored
commit 1ae78a48 upstream. Added virtual com port VID/PID entries for CEL USB sticks and MeshWorks devices. Signed-off-by: David Peterson <david.peterson@cel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Michael S. Tsirkin authored
commit a1eb03f5 upstream. The reason we defer kfree until release function is because it's a general rule for kobjects: kfree of the reference counter itself is only legal in the release function. Previous patch didn't make this clear, document this in code. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Sasha Levin authored
commit 63bd62a0 upstream. A struct device which has just been unregistered can live on past the point at which a driver decides to drop it's initial reference to the kobject gained on allocation. This implies that when releasing a virtio device, we can't free a struct virtio_device until the underlying struct device has been released, which might not happen immediately on device_unregister(). Unfortunately, this is exactly what virtio pci does: it has an empty release callback, and frees memory immediately after unregistering the device. This causes an easy to reproduce crash if CONFIG_DEBUG_KOBJECT_RELEASE it enabled. To fix, free the memory only once we know the device is gone in the release callback. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Wanlong Gao authored
commit 9bffdca8 upstream. Use dev_to_virtio wrapper in virtio to make code clearly. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Shevchenko authored
commit 67bf9cda upstream. The FIFO size is 40 accordingly to the specifications, but this means 0x40, i.e. 64 bytes. This patch fixes the typo and enables FIFO size autodetection for Intel MID devices. Fixes: 7063c0d9 (spi/dw_spi: add DMA support) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Axel Lin authored
commit d297933c upstream. Current code tries to find the highest valid fifo depth by checking the value it wrote to DW_SPI_TXFLTR. There are a few problems in current code: 1) There is an off-by-one in dws->fifo_len setting because it assumes the latest register write fails so the latest valid value should be fifo - 1. 2) We know the depth could be from 2 to 256 from HW spec, so it is not necessary to test fifo == 257. In the case fifo is 257, it means the latest valid setting is fifo = 256. So after the for loop iteration, we should check fifo == 2 case instead of fifo == 257 if detecting the FIFO depth fails. This patch fixes above issues. Signed-off-by: Axel Lin <axel.lin@ingics.com> Reviewed-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Takashi Iwai authored
commit c507de88 upstream. stac_store_hints() does utterly wrong for masking the values for gpio_dir and gpio_data, likely due to copy&paste errors. Fortunately, this feature is used very rarely, so the impact must be really small. Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
This reverts commit 9f871e88, which was commit 1485348d upstream. It can cause connections to stall when a PMTU event occurs. This was fixed by commit 843925f3 ("tcp: Do not apply TSO segment limit to non-TSO packets") upstream, but that depends on other changes to TSO. The original issue this fixed was a performance regression for the sfc driver in extreme cases of TSO (skb with > 100 segments). This is not really very important and it seems best to revert it rather than try to fix it up. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org Cc: linux-net-drivers@solarflare.com
-
Preston Fick authored
commit 90441b4d upstream. Fixing typo for MeshConnect IDs. The original PID (0x8875) is not in production and is not needed. Instead it has been changed to the official production PID (0x8857). Signed-off-by: Preston Fick <pffick@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Tomi Valkeinen authored
commit 30ea9c52 upstream. fb_deferred_io_fsync() returns the value of schedule_delayed_work() as an error code, but schedule_delayed_work() does not return an error. It returns true/false depending on whether the work was already queued. Fix this by ignoring the return value of schedule_delayed_work(). Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Tomi Valkeinen authored
commit 92b004d1 upstream. If the probe of an fb driver has been deferred due to missing dependencies, and the probe is later ran when a module is loaded, the fbdev framework will try to find a logo to use. However, the logos are __initdata, and have already been freed. This causes sometimes page faults, if the logo memory is not mapped, sometimes other random crashes as the logo data is invalid, and sometimes nothing, if the fbdev decides to reject the logo (e.g. the random value depicting the logo's height is too big). This patch adds a late_initcall function to mark the logos as freed. In reality the logos are freed later, and fbdev probe may be ran between this late_initcall and the freeing of the logos. In that case we will miss drawing the logo, even if it would be possible. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Toshiaki Makita authored
commit 796f2da8 upstream. When vlan tags are stacked, it is very likely that the outer tag is stored in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto. Currently netif_skb_features() first looks at skb->protocol even if there is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol encapsulated by the inner vlan instead of the inner vlan protocol. This allows GSO packets to be passed to HW and they end up being corrupted. Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - We don't support 802.1ad tag offload - Keep passing protocol to harmonize_features()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Rabin Vincent authored
commit 7e77bdeb upstream. If a request is backlogged, it's complete() handler will get called twice: once with -EINPROGRESS, and once with the final error code. af_alg's complete handler, unlike other users, does not handle the -EINPROGRESS but instead always completes the completion that recvmsg() is waiting on. This can lead to a return to user space while the request is still pending in the driver. If userspace closes the sockets before the requests are handled by the driver, this will lead to use-after-frees (and potential crashes) in the kernel due to the tfm having been freed. The crashes can be easily reproduced (for example) by reducing the max queue length in cryptod.c and running the following (from http://www.chronox.de/libkcapi.html) on AES-NI capable hardware: $ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \ -k 00000000000000000000000000000000 \ -p 00000000000000000000000000000000 >/dev/null & done Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit e237ec37 upstream. Check that length specified in a component of a symlink fits in the input buffer we are reading. Also properly ignore component length for component types that do not use it. Otherwise we read memory after end of buffer for corrupted udf image. Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Lutomirski authored
commit 394f56fe upstream. The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements. The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD. This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.) It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias. In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged. As an easy test, doing this: for i in `seq 10000` do grep -P vdso /proc/self/maps |cut -d- -f1 done |sort |uniq -d used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this: 7fffdfffe000 7fffe01fe000 7fffe05fe000 7fffe07fe000 7fffe09fe000 7fffe0bfe000 7fffe0dfe000 Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> [bwh: Backported to 3.2: - Adjust context - The whole file is only built for x86_64; adjust comment for this] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit 0e5cc9a4 upstream. Symlink reading code does not check whether the resulting path fits into the page provided by the generic code. This isn't as easy as just checking the symlink size because of various encoding conversions we perform on path. So we have to check whether there is still enough space in the buffer on the fly. Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit fef2e9f3 upstream. Currently, we ignore symlink component of type 2. But mkisofs and other OS' seem to treat it as / so do the same for compatibility. Reported-by: "Gábor S." <otnaccess@hotmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit a1d47b26 upstream. UDF specification allows arbitrarily large symlinks. However we support only symlinks at most one block large. Check the length of the symlink so that we don't access memory beyond end of the symlink block. Reported-by: Carl Henrik Lunde <chlunde@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit e159332b upstream. Verify that inode size is sane when loading inode with data stored in ICB. Otherwise we may get confused later when working with the inode and inode size is too big. Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> [bwh: Backported to 3.2: on error, call make_bad_inode() then return] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit 4e202462 upstream. We didn't check length of rock ridge ER records before printing them. Thus corrupted isofs image can cause us to access and print some memory behind the buffer with obvious consequences. Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Junxiao Bi authored
commit 136f49b9 upstream. For buffer write, page lock will be got in write_begin and released in write_end, in ocfs2_write_end_nolock(), before it unlock the page in ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask for the read lock of journal->j_trans_barrier. Holding page lock and ask for journal->j_trans_barrier breaks the locking order. This will cause a deadlock with journal commit threads, ocfs2cmt will get write lock of journal->j_trans_barrier first, then it wakes up kjournald2 to do the commit work, at last it waits until done. To commit journal, kjournald2 needs flushing data first, it needs get the cache page lock. Since some ocfs2 cluster locks are holding by write process, this deadlock may hung the whole cluster. unlock pages before ocfs2_run_deallocs() can fix the locking order, also put unlock before ocfs2_commit_trans() to make page lock is unlocked before j_trans_barrier to preserve unlocking order. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jiri Jaburek authored
commit d70a1b98 upstream. The Arcam rPAC seems to have the same problem - whenever anything (alsamixer, udevd, 3.9+ kernel from 60af3d03, ..) attempts to access mixer / control interface of the card, the firmware "locks up" the entire device, resulting in SNDRV_PCM_IOCTL_HW_PARAMS failed (-5): Input/output error from alsa-lib. Other operating systems can somehow read the mixer (there seems to be playback volume/mute), but any manipulation is ignored by the device (which has hardware volume controls). Signed-off-by: Jiri Jaburek <jjaburek@redhat.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Lutomirski authored
commit 3fb2f423 upstream. It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e ("x86/tls: Disallow unusual TLS segments") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Yan, Zheng authored
commit 97c85a82 upstream. Current snaphost code does not properly handle moving inode from one empty snap realm to another empty snap realm. After changing inode's snap realm, some dirty pages' snap context can be not equal to inode's i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs() The fix is introduce a global empty snap context for all empty snap realm. This avoids triggering the BUG() for filesystem with no snapshot. Fixes: http://tracker.ceph.com/issues/9928Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com> [bwh: Backported to 3.2: - Adjust context - As we don't have ceph_create_snap_context(), open-code it in ceph_snap_init()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Nicholas Bellinger authored
commit 6bf6ca75 upstream. This patch changes iscsit_do_tx_data() to fail on short writes when kernel_sendmsg() returns a value different than requested transfer length, returning -EPIPE and thus causing a connection reset to occur. This avoids a potential bug in the original code where a short write would result in kernel_sendmsg() being called again with the original iovec base + length. In practice this has not been an issue because iscsit_do_tx_data() is only used for transferring 48 byte headers + 4 byte digests, along with seldom used control payloads from NOPIN + TEXT_RSP + REJECT with less than 32k of data. So following Al's audit of iovec consumers, go ahead and fail the connection on short writes for now, and remove the bogus logic ahead of his proper upstream fix. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit f54e18f1 upstream. Rock Ridge extensions define so called Continuation Entries (CE) which define where is further space with Rock Ridge data. Corrupted isofs image can contain arbitrarily long chain of these, including a one containing loop and thus causing kernel to end in an infinite loop when traversing these entries. Limit the traversal to 32 entries which should be more than enough space to store all the Rock Ridge data. Reported-by: P J P <ppandit@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Lutomirski authored
commit 0e58af4e upstream. Users have no business installing custom code segments into the GDT, and segments that are not present but are otherwise valid are a historical source of interesting attacks. For completeness, block attempts to set the L bit. (Prior to this patch, the L bit would have been silently dropped.) This is an ABI break. I've checked glibc, musl, and Wine, and none of them look like they'll have any trouble. Note to stable maintainers: this is a hardening patch that fixes no known bugs. Given the possibility of ABI issues, this probably shouldn't be backported quickly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: security@kernel.org <security@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dan Carpenter authored
commit b5c8afe5 upstream. "origPtr" is used as an offset into the bd->dbuf[] array. That array is allocated in start_bunzip() and has "bd->dbufSize" number of elements so the test here should be >= instead of >. Later we check "origPtr" again before using it as an offset so I don't know if this bug can be triggered in real life. Fixes: bc22c17e ('bzip2/lzma: library support for gzip, bzip2 and lzma decompression') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Thomas Gleixner authored
commit c291ee62 upstream. Since the rework of the sparse interrupt code to actually free the unused interrupt descriptors there exists a race between the /proc interfaces to the irq subsystem and the code which frees the interrupt descriptor. CPU0 CPU1 show_interrupts() desc = irq_to_desc(X); free_desc(desc) remove_from_radix_tree(); kfree(desc); raw_spinlock_irq(&desc->lock); /proc/interrupts is the only interface which can actively corrupt kernel memory via the lock access. /proc/stat can only read from freed memory. Extremly hard to trigger, but possible. The interfaces in /proc/irq/N/ are not affected by this because the removal of the proc file is serialized in procfs against concurrent readers/writers. The removal happens before the descriptor is freed. For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue as the descriptor is never freed. It's merely cleared out with the irq descriptor lock held. So any concurrent proc access will either see the old correct value or the cleared out ones. Protect the lookup and access to the irq descriptor in show_interrupts() with the sparse_irq_lock. Provide kstat_irqs_usr() which is protecting the lookup and access with sparse_irq_lock and switch /proc/stat to use it. Document the existing kstat_irqs interfaces so it's clear that the caller needs to take care about protection. The users of these interfaces are either not affected due to SPARSE_IRQ=n or already protected against removal. Fixes: 1f5a5b87 "genirq: Implement a sane sparse_irq allocator" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 3.2: - Adjust context - Handle the CONFIG_GENERIC_HARDIRQS=n case] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andreas Müller authored
commit d025933e upstream. As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount" stopped being incremented after the use-after-free fix. Furthermore, the RX-LED will be triggered by every multicast frame (which wouldn't happen before) which wouldn't allow the LED to rest at all. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the patch. Fixes: b8fff407 ("mac80211: fix use-after-free in defragmentation") Signed-off-by: Andreas Müller <goo@stapelspeicher.org> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Lutomirski authored
commit f647d7c1 upstream. Otherwise, if buggy user code points DS or ES into the TLS array, they would be corrupted after a context switch. This also significantly improves the comments and documents some gotchas in the code. Before this patch, the both tests below failed. With this patch, the es test passes, although the gsbase test still fails. ----- begin es test ----- /* * Copyright (c) 2014 Andy Lutomirski * GPL v2 */ static unsigned short GDT3(int idx) { return (idx << 3) | 3; } static int create_tls(int idx, unsigned int base) { struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; if (syscall(SYS_set_thread_area, &desc) != 0) err(1, "set_thread_area"); return desc.entry_number; } int main() { int idx = create_tls(-1, 0); printf("Allocated GDT index %d\n", idx); unsigned short orig_es; asm volatile ("mov %%es,%0" : "=rm" (orig_es)); int errors = 0; int total = 1000; for (int i = 0; i < total; i++) { asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx))); usleep(100); unsigned short es; asm volatile ("mov %%es,%0" : "=rm" (es)); asm volatile ("mov %0,%%es" : : "rm" (orig_es)); if (es != GDT3(idx)) { if (errors == 0) printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n", GDT3(idx), es); errors++; } } if (errors) { printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total); return 1; } else { printf("[OK]\tES was preserved\n"); return 0; } } ----- end es test ----- ----- begin gsbase test ----- /* * gsbase.c, a gsbase test * Copyright (c) 2014 Andy Lutomirski * GPL v2 */ static unsigned char *testptr, *testptr2; static unsigned char read_gs_testvals(void) { unsigned char ret; asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr)); return ret; } int main() { int errors = 0; testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0); if (testptr == MAP_FAILED) err(1, "mmap"); testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0); if (testptr2 == MAP_FAILED) err(1, "mmap"); *testptr = 0; *testptr2 = 1; if (syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)testptr2 - (unsigned long)testptr) != 0) err(1, "ARCH_SET_GS"); usleep(100); if (read_gs_testvals() == 1) { printf("[OK]\tARCH_SET_GS worked\n"); } else { printf("[FAIL]\tARCH_SET_GS failed\n"); errors++; } asm volatile ("mov %0,%%gs" : : "r" (0)); if (read_gs_testvals() == 0) { printf("[OK]\tWriting 0 to gs worked\n"); } else { printf("[FAIL]\tWriting 0 to gs failed\n"); errors++; } usleep(100); if (read_gs_testvals() == 0) { printf("[OK]\tgsbase is still zero\n"); } else { printf("[FAIL]\tgsbase was corrupted\n"); errors++; } return errors == 0 ? 0 : 1; } ----- end gsbase test ----- Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit a682e9c2 upstream. If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error return value is then (in most cases) just overwritten before we return. This can result in reporting success to userspace although error happened. This bug was introduced by commit 2e54eb96 ("BKL: Remove BKL from ncpfs"). Propagate the errors correctly. Coverity id: 1226925. Fixes: 2e54eb96 ("BKL: Remove BKL from ncpfs") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Filipe Manana authored
commit 678886bd upstream. When we abort a transaction we iterate over all the ranges marked as dirty in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them from those trees, add them back (unpin) to the free space caches and, if the fs was mounted with "-o discard", perform a discard on those regions. Also, after adding the regions to the free space caches, a fitrim ioctl call can see those ranges in a block group's free space cache and perform a discard on the ranges, so the same issue can happen without "-o discard" as well. This causes corruption, affecting one or multiple btree nodes (in the worst case leaving the fs unmountable) because some of those ranges (the ones in the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are referred by the last committed super block - breaking the rule that anything that was committed by a transaction is untouched until the next transaction commits successfully. I ran into this while running in a loop (for several hours) the fstest that I recently submitted: [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim The corruption always happened when a transaction aborted and then fsck complained like this: _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 read block failed check_tree_block Couldn't open file system In this case 94945280 corresponded to the root of a tree. Using frace what I observed was the following sequence of steps happened: 1) transaction N started, fs_info->pinned_extents pointed to fs_info->freed_extents[0]; 2) node/eb 94945280 is created; 3) eb is persisted to disk; 4) transaction N commit starts, fs_info->pinned_extents now points to fs_info->freed_extents[1], and transaction N completes; 5) transaction N + 1 starts; 6) eb is COWed, and btrfs_free_tree_block() called for this eb; 7) eb range (94945280 to 94945280 + 16Kb) is added to fs_info->pinned_extents (fs_info->freed_extents[1]); 8) Something goes wrong in transaction N + 1, like hitting ENOSPC for example, and the transaction is aborted, turning the fs into readonly mode. The stack trace I got for example: [112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66 [112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98 [112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50 [112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs] [112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs] [112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs] [112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs] (...) [112065.264493] ---[ end trace dd7903a975a31a08 ]--- [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left [112065.264997] BTRFS info (device sdc): forced readonly 9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in fs_info->fs_state and calls btrfs_cleanup_transaction(), which in turn calls btrfs_destroy_pinned_extent(); 10) Then btrfs_destroy_pinned_extent() iterates over all the ranges marked as dirty in fs_info->freed_extents[], and for each one it calls discard, if the fs was mounted with "-o discard", and adds the range to the free space cache of the respective block group; 11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path, sees the free space entries and performs a discard; 12) After an umount and mount (or fsck), our eb's location on disk was full of zeroes, and it should have been untouched, because it was marked as dirty in the fs_info->pinned_extents tree, and therefore used by the trees that the last committed superblock points to. Fix this by not performing a discard and not adding the ranges to the free space caches - it's useless from this point since the fs is now in readonly mode and we won't write free space caches to disk anymore (otherwise we would leak space) nor any new superblock. By not adding the ranges to the free space caches, it prevents other code paths from allocating that space and write to it as well, therefore being safer and simpler. This isn't a new problem, as it's been present since 2011 (git commit acce952b). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Alexander Duyck authored
commit a5a519b2 upstream. In recent testing I had disabled CONFIG_IP_MULTIPLE_TABLES and as a result when I ran "cat /proc/net/fib_trie" the main trie was displayed multiple times. I found that the problem line of code was in the function fib_trie_seq_next. Specifically the line below caused the indexes to go in the opposite direction of our traversal: h = tb->tb_id & (FIB_TABLE_HASHSZ - 1); This issue was that the RT tables are defined such that RT_TABLE_LOCAL is ID 255, while it is located at TABLE_LOCAL_INDEX of 0, and RT_TABLE_MAIN is 254 with a TABLE_MAIN_INDEX of 1. This means that the above line will return 1 for the local table and 0 for main. The result is that fib_trie_seq_next will return NULL at the end of the local table, fib_trie_seq_start will return the start of the main table, and then fib_trie_seq_next will loop on main forever as h will always return 0. The fix for this is to reverse the ordering of the two tables. It has the advantage of making it so that the tables now print in the same order regardless of if multiple tables are enabled or not. In order to make the definition consistent with the multiple tables case I simply masked the to RT_TABLE_XXX values by (FIB_TABLE_HASHSZ - 1). This way the two table layouts should always stay consistent. Fixes: 93456b6d ("[IPV4]: Unify access to the routing tables") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-