- 05 Sep, 2014 40 commits
-
Kinglong Mee authored
commit d9499a95 upstream. A memory allocation failure could cause nfsd_startup_generic to fail, in which case nfsd_users would be incorrectly left elevated. After nfsd restarts, nfsd_startup_generic will then succeed without doing anything--the first consequence is likely nfs4_start_net finding a bad laundry_wq and crashing. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 4539f149 "nfsd: replace boolean nfsd_up flag by users counter" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
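A minimal, self-contained sketch of the start-counter pattern this class of fix restores (names are illustrative, not the actual nfsd code): the counter must be decremented on every failure path, otherwise a later start sees a non-zero count, skips initialization, and runs with uninitialized state.

    #include <stdlib.h>

    static int users; /* counts starts; > 0 means "already started" (illustrative) */

    static int do_allocations(void)
    {
        void *p = malloc(16); /* stand-in for the allocation that can fail */
        if (!p)
            return -1;
        free(p);
        return 0;
    }

    static int startup_generic(void)
    {
        int ret;

        if (users++) /* already running: nothing to do */
            return 0;

        ret = do_allocations();
        if (ret)
            users--; /* the crucial step: don't leave the counter elevated on failure */
        return ret;
    }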
-
Takashi Iwai authored
commit dd5f5006 upstream. The commit [5ee0f803: usbcore: don't log on consecutive debounce failures of the same port] added the check of the reliable port, but it also replaced the device argument to dev_err() wrongly, which leads to a NULL dereference. This patch restores the right device, port_dev->dev. Also, since dev_err() itself shows the port number, drop the now-redundant port number from the error message, essentially reverting to the state before commit 5ee0f803. [The fix suggested by Hannes, and the error message cleanup suggested by Alan Stern] Fixes: 5ee0f803 ('usbcore: don't log on consecutive debounce failures of the same port') Reported-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roger Quadros authored
commit bdd405d2 upstream. If the user specifies that USB autosuspend must be disabled by the module parameter "usbcore.autosuspend=-1", then we must prevent autosuspend of USB hub devices as well. commit 596d789a, introduced in v3.8, changed the original behaviour and stopped respecting the usbcore.autosuspend parameter for hubs. Fixes: 596d789a "USB: set hub's default autosuspend delay as 0" Signed-off-by: Roger Quadros <rogerq@ti.com> Tested-by: Michael Welling <mwelling@emacinc.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Chen authored
commit 5cbcc35e upstream. The roothub's index per controller is from 0, but the hub port index per hub is from 1. This patch fixes the "can't find device at roothub" problem when connecting a test fixture to the roothub while doing the USB-IF Embedded Host High-Speed Electrical Test. This patch is for v3.12+. Signed-off-by: Peter Chen <peter.chen@freescale.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
James Forshaw authored
commit 6817ae22 upstream. This patch fixes a potential security issue in the whiteheat USB driver which might allow a local attacker to cause kernel memory corruption. This is due to an unchecked memcpy into a fixed-size buffer (of 64 bytes). On EHCI and XHCI busses it's possible to craft responses greater than 64 bytes, leading to a buffer overflow. Signed-off-by: James Forshaw <forshaw@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
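A self-contained sketch of the kind of check such a fix adds (buffer size matches the 64 bytes mentioned above; function and parameter names are illustrative, not the actual whiteheat code): any device-supplied length is validated before copying into the fixed buffer.

    #include <stdint.h>
    #include <string.h>

    #define RESP_BUF_SIZE 64 /* fixed-size response buffer, as in the driver */

    /* Copy a device-supplied response, refusing lengths that would overflow. */
    static int handle_response(uint8_t dst[RESP_BUF_SIZE],
                               const uint8_t *src, size_t len)
    {
        if (len > RESP_BUF_SIZE)
            return -1; /* reject oversized responses instead of memcpy'ing blindly */
        memcpy(dst, src, len);
        return 0;
    }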
-
Jaša Bartelj authored
commit 646907f5 upstream. Added support to the ftdi_sio driver for ekey Converter USB which uses an FT232BM chip. Signed-off-by: Jaša Bartelj <jasa.bartelj@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
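For context, adding such an ID is typically a one-line entry in the driver's USB device table. A hedged kernel-code sketch follows; 0x0403 is the well-known FTDI vendor ID, but EKEY_CONV_USB_PID below is a hypothetical placeholder, not the real ekey product ID.

    #include <linux/module.h>
    #include <linux/usb.h>

    #define FTDI_VID          0x0403 /* FTDI vendor ID */
    #define EKEY_CONV_USB_PID 0x1234 /* placeholder value, NOT the real ekey PID */

    static const struct usb_device_id id_table[] = {
        { USB_DEVICE(FTDI_VID, EKEY_CONV_USB_PID) },
        { } /* terminating entry */
    };
    MODULE_DEVICE_TABLE(usb, id_table);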
-
Johan Hovold authored
commit 6552cc7f upstream. Add device id for Basic Micro ATOM Nano USB2Serial adapters. Reported-by: Nicolas Alt <n.alt@mytum.de> Tested-by: Nicolas Alt <n.alt@mytum.de> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tony Lindgren authored
commit cc824534 upstream. Looks like MUSB cable removal can cause wake-up interrupts to stop working for device tree based booting at least for UART3 even as nothing is dynamically remuxed. This can be fixed by calling reconfigure_io_chain() for device tree based booting in hwmod code. Note that we already do that for legacy booting if the legacy mux is configured. My guess is that this is related to UART3 and MUSB ULPI hsusb0_data0 and hsusb0_data1 support for Carkit mode that somehow affect the configured IO chain for UART3 and require rearming the wake-up interrupts. In general, for device tree based booting, pinctrl-single calls the rearm hook that in turn calls reconfigure_io_chain so calling reconfigure_io_chain should not be needed from the hwmod code for other events. So let's limit the hwmod rearming of iochain only to HWMOD_FORCE_MSTANDBY where MUSB is currently the only user of it. If we see other devices needing similar changes we can add more checks for it. Cc: Paul Walmsley <paul@pwsan.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans de Goede authored
commit e21eba05 upstream. This is a bit bigger hammer than I would like to use for this, but for now it will have to do. I'm working on getting my hands on one of these so that I can try to get streams to work (with a quirk flag if necessary) and then we can re-enable them. For now this at least makes uas-capable disk enclosures work again by forcing fallback to the usb-storage driver. https://bugzilla.kernel.org/show_bug.cgi?id=79511 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mathias Nyman authored
commit 365038d8 upstream. When we manually need to move the TR dequeue pointer we need to set the correct cycle bit as well. Previously we used the trb pointer from the last event received as a base, but this was changed in commit 1f81b6d2 ("usb: xhci: Prefer endpoint context dequeue pointer") to use the dequeue pointer from the endpoint context instead. It turns out some Asmedia controllers advance the dequeue pointer stored in the endpoint context past the event-triggering TRB, and this messed up the way the cycle bit was calculated. Instead of adding a quirk or complicating the already hard-to-follow cycle bit code, the whole cycle bit calculation is now simplified and adapted to handle event and endpoint context dequeue pointer differences. Fixes: 1f81b6d2 ("usb: xhci: Prefer endpoint context dequeue pointer") Reported-by: Maciej Puzio <mx34567@gmail.com> Reported-by: Evan Langlois <uudruid74@gmail.com> Reviewed-by: Julius Werner <jwerner@chromium.org> Tested-by: Maciej Puzio <mx34567@gmail.com> Tested-by: Evan Langlois <uudruid74@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Huang Rui authored
commit 2597fe99 upstream. AMD xHC also needs the short tx quirk, as verified on most chipset generations; it shows the same incorrect behavior as the Fresco Logic host. See the messages below, logged with a USB webcam attached to the xHC host:

[ 139.262944] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.266934] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.270913] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.274937] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.278914] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.282936] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.286915] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.290938] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.294913] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[ 139.298917] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?

Reported-by: Arindam Nath <arindam.nath@amd.com> Tested-by: Shriraj-Rai P <shriraj-rai.p@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans de Goede authored
commit 9a548863 upstream. When using a Renesas uPD720231 chipset usb-3 uas to sata bridge with a 120G Crucial M500 ssd, model string: Crucial_ CT120M500SSD1, together with the integrated Intel xhci controller on a Haswell laptop: 00:14.0 USB controller [0c03]: Intel Corporation 8 Series USB xHCI HC [8086:9c31] (rev 04), the following error gets logged to dmesg: xhci error: Transfer event TRB DMA ptr not part of current TD. Treating COMP_STOP the same as COMP_STOP_INVAL when no event_seg gets found fixes this. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Larry Finger authored
commit a2fa6721 upstream. The Elecom WDC-150SU2M uses this chip. Reported-by: Hiroki Kondo <kompiro@gmail.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Holger Paradies authored
commit 8626d524 upstream. The stick is not recognized: this dongle uses the r8188eu driver, but its USB ID is missing. Observed on 3.16.0. Signed-off-by: Holger Paradies <retabell@gmx.de> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mark Einon authored
commit ec0a38bf upstream. Fix two reported bugs, caused by et131x_adapter->phydev->addr being accessed before it is initialised, by:
- letting et131x_mii_write() take a phydev address, instead of using the one stored in the adapter by default, so that et131x_mdio_write() can use its own addr value;
- removing the implementation of et131x_mdio_reset(), as it's not needed;
- moving a call to et131x_disable_phy_coma() in et131x_pci_setup(), which uses phydev->addr, until after the mdiobus has been registered.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=80751 Link: https://bugzilla.kernel.org/show_bug.cgi?id=77121 Signed-off-by: Mark Einon <mark.einon@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pranith Kumar authored
commit e409842a upstream. The following patch fixes a build error on sparc32. I think it should go to stable 3.16. Remove a circular dependency on atomic.h header file which leads to compilation failure on sparc32 as reported here: http://kisskb.ellerman.id.au/kisskb/buildresult/11340509/ The specific dependency is as follows:

In file included from arch/sparc/include/asm/smp_32.h:24:0,
                 from arch/sparc/include/asm/smp.h:6,
                 from arch/sparc/include/asm/switch_to_32.h:4,
                 from arch/sparc/include/asm/switch_to.h:6,
                 from arch/sparc/include/asm/ptrace.h:84,
                 from arch/sparc/include/asm/processor_32.h:16,
                 from arch/sparc/include/asm/processor.h:6,
                 from arch/sparc/include/asm/barrier_32.h:4,
                 from arch/sparc/include/asm/barrier.h:6,
                 from arch/sparc/include/asm/atomic_32.h:17,
                 from arch/sparc/include/asm/atomic.h:6,
                 from drivers/staging/lustre/lustre/obdclass/class_obd.c:38

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit db9ee220 upstream. It turns out that there are some serious problems with the on-disk format of journal checksum v2. The foremost is that the function to calculate descriptor tag size returns sizes that are too big. This causes alignment issues on some architectures and is compounded by the fact that some parts of jbd2 use the structure size (incorrectly) to determine the presence of a 64bit journal instead of checking the feature flags. Therefore, introduce journal checksum v3, which enlarges the descriptor block tag format to allow for full 32-bit checksums of journal blocks, fix the journal tag function to return the correct sizes, and fix the jbd2 recovery code to use feature flags to determine 64bitness. Add a few function helpers so we don't have to open-code quite so many pieces. Switching to the 16-byte tag was found to increase journal size overhead by at most 0.1% when converting a 32-bit journal with no checksumming to a 32-bit journal with checksum v3 enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
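For reference, a sketch of what the enlarged 16-byte descriptor tag looks like, matching the description above of a full 32-bit per-block checksum. Field names follow my recollection of the upstream jbd2 layout and should be treated as an approximation, not a spec.

    #include <linux/types.h> /* __be32: big-endian 32-bit on-disk type */

    /* 16-byte journal descriptor block tag with a full 32-bit checksum */
    typedef struct journal_block_tag3_s {
        __be32 t_blocknr;      /* low 32 bits of the on-disk block number */
        __be32 t_flags;        /* tag flag bits */
        __be32 t_blocknr_high; /* high 32 bits, used on 64-bit journals */
        __be32 t_checksum;     /* crc32c of the journal block, all 32 bits */
    } journal_block_tag3_t;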
-
Darrick J. Wong authored
commit 022eaa75 upstream. When recovering the journal, don't fall into an infinite loop if we encounter a corrupt journal block. Instead, just skip the block and return an error, which fails the mount and thus forces the user to run a full filesystem fsck. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
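A generic, self-contained illustration of the failure mode (not the actual jbd2 code): when a scan walks variable-size records, a corrupt length field that fails validation must abort the scan; retrying or ignoring it means the offset never advances and the loop spins forever. The fix's behavior corresponds to the early return here.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Walk records that each start with a 32-bit length field. */
    static int scan_records(const uint8_t *buf, size_t len)
    {
        size_t off = 0;

        while (off + 4 <= len) {
            uint32_t rec_len;

            memcpy(&rec_len, buf + off, 4);
            if (rec_len < 4 || rec_len > len - off)
                return -1; /* corrupt record: fail instead of spinning */
            off += rec_len; /* progress is guaranteed on every iteration */
        }
        return 0;
    }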
-
Darrick J. Wong authored
commit d80d448c upstream. When performing a same-directory rename, it's possible that adding or setting the new directory entry will cause the directory to overflow the inline data area, which causes the directory to be converted to an extent-based directory. Under this circumstance it is necessary to re-read the directory when deleting the old dirent because the "old directory" context still points to i_block in the inode table, which is now an extent tree root! The delete fails with an FS error, and the subsequent fsck complains about incorrect link counts and hardlinked directories. Test case (originally found with flat_dir_test in the metadata_csum test program):

# mkfs.ext4 -O inline_data /dev/sda
# mount /dev/sda /mnt
# mkdir /mnt/x
# touch /mnt/x/changelog.gz /mnt/x/copyright /mnt/x/README.Debian
# sync
# for i in /mnt/x/*; do mv $i $i.longer; done
# ls -la /mnt/x/
total 0
-rw-r--r-- 1 root root 0 Aug 25 12:03 changelog.gz.longer
-rw-r--r-- 1 root root 0 Aug 25 12:03 copyright
-rw-r--r-- 1 root root 0 Aug 25 12:03 copyright.longer
-rw-r--r-- 1 root root 0 Aug 25 12:03 README.Debian.longer

(Hey! Why are there four files now??) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Monakhov authored
commit 6603120e upstream. In the delalloc case i_disksize may be less than i_size, so we have to update i_disksize each time we allocate and submit blocks beyond i_disksize. We weren't doing this on the error paths, so fix this. testcase: xfstest generic/019 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Monakhov authored
commit c174e6d6 upstream. After commit f282ac19 we use different transactions for preallocation and the i_disksize update, which results in complaints from fsck after a power failure (spotted by generic/019). IMHO this is a regression because the fs becomes inconsistent; even worse, 'e2fsck -p' no longer works (which drives admins crazy). The same same-transaction requirement applies to ctime/mtime updates. testcase: xfstest generic/019 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
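A hedged, kernel-context sketch of the single-transaction pattern the fix implies (the function name and the 'credits' sizing are illustrative; ext4_journal_start(), ext4_update_i_disksize(), ext4_current_time() and ext4_mark_inode_dirty() are the ext4-internal APIs of this era as I recall them): the size and timestamp updates join one handle, so a power failure cannot separate them on disk.

    /* kernel-context fragment; uses fs/ext4 internal helpers (ext4_jbd2.h) */
    static int update_size_and_times(struct inode *inode, loff_t new_size,
                                     int credits)
    {
        handle_t *handle;
        int ret;

        handle = ext4_journal_start(inode, EXT4_HT_INODE, credits);
        if (IS_ERR(handle))
            return PTR_ERR(handle);

        ext4_update_i_disksize(inode, new_size);                    /* size...   */
        inode->i_mtime = inode->i_ctime = ext4_current_time(inode); /* ...and times */
        ret = ext4_mark_inode_dirty(handle, inode); /* logged in the same handle */
        ext4_journal_stop(handle);
        return ret;
    }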
-
Dmitry Monakhov authored
commit 69dc9536 upstream. Currently we reserve only 4 blocks, but in the worst-case scenario ext4_zero_partial_blocks() may want to zero out and convert two non-adjacent blocks. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Monakhov authored
commit 4631dbf6 upstream. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Usyskin authored
commit 8e8248b1 upstream. NFC will leak the buffer if send fails. Use a single exit point that does the freeing. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
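A self-contained sketch of the single-exit-point style referred to here (all names are illustrative stand-ins, not the actual mei code): every failure path funnels through one label that frees the buffer, so no early return can leak it.

    #include <stdlib.h>

    /* trivial stand-ins for the real frame-building and send steps */
    static int build_frame(char *buf, size_t len) { (void)buf; (void)len; return 0; }
    static int do_send(const char *buf, size_t len) { (void)buf; (void)len; return 0; }

    static int nfc_send(size_t len)
    {
        char *buf;
        int ret;

        buf = malloc(len);
        if (!buf)
            return -1;

        ret = build_frame(buf, len);
        if (ret)
            goto out; /* no early 'return' here: that's how the leak happened */

        ret = do_send(buf, len);
    out:
        free(buf); /* single exit point: the buffer is freed on every path */
        return ret;
    }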
-
Alexander Usyskin authored
commit 73ab4232 upstream. If a connect request is queued (e.g. the device is in pg), set the client state to initializing, thus avoiding a premature exit from the wait if the current state is disconnected. This is a regression from: commit e4d8270e Author: Alexander Usyskin <alexander.usyskin@intel.com> mei: set connecting state just upon connection request is sent to the fw Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liu Bo authored
commit 9e0af237 upstream. This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16. Btrfs now migrates to use the kernel workqueue, but this introduces a hang problem. Btrfs has a kind of work queued in an ordered way, which means that its ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);
    work->func() <---- (we name it work X)
    for ordered_work in wq->ordered_list
        ordered_work->ordered_func()
        ordered_work->ordered_free()

The hang is a rare case. First, when we find free space, we get an uncached block group, then we go to read its free space cache inode for free space information, so it will file a readahead request:

btrfs_readpages() for page that is not in page cache
    __do_readpage()
        submit_extent_page()
            btrfs_submit_bio_hook()
                btrfs_bio_wq_end_io()
                submit_bio()
end_workqueue_bio() <--(ret by the 1st endio)
    queue a work (named work Y) for the 2nd, also the real, endio()

So the hang occurs when work Y's work_struct and work X's work_struct happen to share the same address. A bit more explanation: A,B,C -- struct btrfs_work; arg -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
        worker->current_work = arg; <-- arg is A->normal_work
        worker->current_func(arg)
            normal_work_helper(arg)
                A = container_of(arg, struct btrfs_work, normal_work);
                A->func()
                A->ordered_func()
                A->ordered_free() <-- A gets freed
                B->ordered_func()
                    submit_compressed_extents()
                        find_free_extent()
                            load_free_space_inode()
                                ... <-- (the above readahead stack)
                                end_workqueue_bio()
                                    btrfs_queue_work(work C)
                B->ordered_free()

So if work A has a high priority in wq->ordered_list and there are more ordered works queued after it, such as B->ordered_func(), its memory could have been freed before normal_work_helper() returns, which means that the kernel workqueue code worker_thread() still has worker->current_work pointing to work A->normal_work, i.e. arg's address. Meanwhile, work C is allocated after work A is freed, so work C->normal_work and work A->normal_work are likely to share the same address (I confirmed this with ftrace output, so I'm not just guessing; it's rare though). When another kthread picks up work C->normal_work to process, and finds our kthread is processing it (see find_worker_executing_work()), it'll treat work C as a collision and skip it, which ends up with nobody processing work C. So the situation is that our kthread is waiting forever on work C. Besides, there are other cases that can lead to deadlock, but the real problem is that all btrfs workqueues share one work->func, normal_work_helper. So this patch makes each workqueue have its own helper function, which is only a wrapper of normal_work_helper. With this patch, I no longer hit the above hang. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
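The shape of the fix, sketched from the description above (the macro and helper names follow my reading of the upstream patch and may not match it exactly): each workqueue gets a distinct trampoline whose only job is to give worker->current_func a unique address, so find_worker_executing_work() no longer mistakes work C for work A.

    #include <linux/workqueue.h>

    void normal_work_helper(struct work_struct *arg); /* the shared body */

    /* One trampoline per btrfs workqueue: identical bodies, but distinct
     * function addresses, so worker->current_func differs between queues. */
    #define BTRFS_WORK_HELPER(the_fn)              \
    void btrfs_##the_fn(struct work_struct *arg)   \
    {                                              \
        normal_work_helper(arg);                   \
    }

    BTRFS_WORK_HELPER(endio_helper)
    BTRFS_WORK_HELPER(delalloc_helper)
    /* ...one per workqueue... */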
-
Chris Mason authored
commit f6dc45c7 upstream. We should only be flushing on close if the file was flagged as needing it during truncate. I broke this with my ordered data vs transaction commit deadlock fix. Thanks to Miao Xie for catching this. Signed-off-by: Chris Mason <clm@fb.com> Reported-by: Miao Xie <miaox@cn.fujitsu.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liu Bo authored
commit 38c1c2e4 upstream. The crash is:

------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:2124!
[...]
Workqueue: btrfs-endio normal_work_helper [btrfs]
RIP: 0010:[<ffffffffa02d6055>] [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]

This is in fact a regression. It occurs because we forgot to increase @offset properly when reading a corrupted block, so @offset remains stale, which leads to checksum errors while reading the remaining blocks queued up in the same bio, and finally ends up hitting the above BUG_ON. Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chris Mason authored
commit 8d875f95 upstream. Truncates and renames are often used to replace old versions of a file with new versions. Applications often expect this to be an atomic replacement, even if they haven't done anything to make sure the new version is fully on disk. Btrfs has strict flushing in place to make sure that renaming over an old file with a new file will fully flush out the new file before allowing the transaction commit with the rename to complete. This ordering means the commit code needs to be able to lock file pages, and there are a few paths in the filesystem where we will try to end a transaction with the page lock held. It's rare, but these things can deadlock. This patch removes the ordered flushes and switches to a best effort filemap_flush like ext4 uses. It's not perfect, but it should fix the deadlocks. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liu Bo authored
commit ce62003f upstream. When failing to allocate space for the whole compressed extent, we'll fall back to uncompressed IO, but we've forgotten to redirty the pages which belong to this compressed extent, and these 'clean' pages will simply skip the 'submit' part and go to endio directly; in the end we get data corruption because nothing was actually written. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Tested-By: Martin Steigerwald <martin@lichtvoll.de> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filipe Manana authored
commit 6f7ff6d7 upstream. Before processing the extent buffer, acquire a read lock on it, so that we're safe against concurrent updates on the extent buffer. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filipe Manana authored
commit 27b9a812 upstream. Under rare circumstances we can end up leaving 2 versions of a checksum for the same file extent range. The reason for this is that after calling btrfs_next_leaf we process slot 0 of the leaf it returns, instead of processing the slot set in path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after btrfs_next_leaf() releases the path and before it searches for the next leaf, another task might cause a split of the next leaf, which migrates some of its keys to the leaf we were processing before calling btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the same leaf but with path->slots[0] having a slot number corresponding to the first new key it got, that is, a slot number that didn't exist before calling btrfs_next_leaf(), as the leaf now has more keys than it had before. So we must really process the returned leaf starting at path->slots[0] always, as it isn't always 0, and the key at slot 0 can have an offset much lower than our search offset/bytenr. For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

Leaf N:

  slot = 0                                 slot = btrfs_header_nritems() - 1
|-------------------------------------------------------------------|
| [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
|-------------------------------------------------------------------|

Leaf N + 1:

  slot = 0                                  slot = btrfs_header_nritems() - 1
|--------------------------------------------------------------------|
| [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
|--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to find the next highest key, which releases the current path and then searches for that next key. However, after releasing the path and before finding that next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore btrfs_next_leaf() will return us a path again with leaf N but with the slot pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N is then:

  slot = 0                       slot = btrfs_header_nritems() - 2   slot = btrfs_header_nritems() - 1
|----------------------------------------------------------------------------------------------------|
| [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
|----------------------------------------------------------------------------------------------------|

And incorrectly using slot 0 makes us set next_offset to 39239680, and we jump into the "insert:" label, which will set tmp to:

tmp = min((sums->len - total_bytes) >> blocksize_bits,
          (next_offset - file_key.offset) >> blocksize_bits)
    = min((16384 - 0) >> 12, (39239680 - 40157184) >> 12)
    = min(4, (u64)-917504 = 18446744073708634112 >> 12)
    = 4

and ins_size = csum_size * tmp = 4 * 4 = 16 bytes. In other words, we insert a new csum item in the tree with key (CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong, because the item with key (CSUM CSUM 40161280) (the one that was moved from leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288 bytes of our data and won't get those old checksums removed. So this leaves us 2 different checksums for 3 4kb blocks of data in the tree, and breaks the logical rule:

Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get the checksum of any of the blocks with logical offset 40161280, 40165376 or 40169472 (the last 3 4kb blocks of file data) will get the old checksums. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Takashi Iwai authored
commit 4eb1f66d upstream. We've got bug reports that btrfs crashes when quota is enabled on a 32bit kernel, typically with an Oops like below:

BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
*pde = 00000000
Oops: 0000 [#1] SMP
CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S W 3.15.2-1.gd43d97e-default #1
Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
task: f1478130 ti: f147c000 task.ti: f147c000
EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
Stack: 00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
       00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
       00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
Call Trace:
[<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
[<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
[<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
[<c025e38b>] process_one_work+0x11b/0x390
[<c025eea1>] worker_thread+0x101/0x340
[<c026432b>] kthread+0x9b/0xb0
[<c0712a71>] ret_from_kernel_thread+0x21/0x30
[<c0264290>] kthread_create_on_node+0x110/0x110

This indicates a NULL corruption in the prefs_delayed list. Further investigation and bisection pointed out that the call of ulist_add_merge() results in the corruption. ulist_add_merge() takes u64 as aux and writes a 64bit value into old_aux. The callers of this function in backref.c, however, pass a pointer of a pointer to old_aux. That is, the function overwrites a 64bit value on a 32bit pointer. This caused a NULL in the adjacent variable, in this case prefs_delayed. Here is a quick attempt to band-aid over this: a new function, ulist_add_merge_ptr(), is introduced to pass/store a pointer value properly instead of a u64. There are still ugly void ** casts remaining in the callers because void ** cannot be taken implicitly. But it's safer than an explicit cast to u64, anyway. Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
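A kernel-context sketch of the introduced helper, reconstructed from the description above (details are from memory of the upstream patch, so treat them as approximate): on 32-bit kernels the aux value is bounced through a local u64, so ulist_add_merge() never writes 8 bytes over a 4-byte pointer.

    /* kernel-context sketch; ulist_add_merge() is the existing btrfs helper:
     *   int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
     *                       u64 *old_aux, gfp_t gfp_mask);
     */
    static inline int ulist_add_merge_ptr(struct ulist *ulist, u64 val,
                                          void *aux, void **old_aux,
                                          gfp_t gfp_mask)
    {
    #if BITS_PER_LONG == 32
        u64 old64 = (uintptr_t)*old_aux;
        int ret = ulist_add_merge(ulist, val, (uintptr_t)aux, &old64, gfp_mask);

        *old_aux = (void *)(uintptr_t)old64; /* write pointer-sized, not 64 bits */
        return ret;
    #else
        return ulist_add_merge(ulist, val, (u64)aux, (u64 *)old_aux, gfp_mask);
    #endif
    }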
-
Stephen M. Cameron authored
commit 0758f4f7 upstream. When copy_from_user fails, return -EFAULT, not -ENOMEM. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Reported-by: Robert Elliott <elliott@hp.com> Reviewed-by: Joe Handzik <joseph.t.handzik@hp.com> Reviewed-by: Scott Teel <scott.teel@hp.com> Reviewed-by: Mike Miller <michael.miller@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
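The pattern in question, as a short hedged sketch (the wrapper function and buffer names are illustrative, not the hpsa code): copy_from_user() returns the number of bytes it could not copy, and a nonzero result means a bad user pointer, which is -EFAULT, not an allocation failure.

    #include <linux/uaccess.h>

    /* Copy an ioctl argument in from user space. */
    static long fetch_ioctl_arg(void *kbuf, const void __user *ubuf, size_t len)
    {
        if (copy_from_user(kbuf, ubuf, len))
            return -EFAULT; /* bad user pointer: a fault, not -ENOMEM */
        return 0;
    }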
-
Hugh Dickins authored
commit b38af472 upstream. Sasha Levin has shown oopses on ffffea0003480048 and ffffea0003480008 at mm/memory.c:1132, running Trinity on different 3.16-rc-next kernels: where zap_pte_range() checks page->mapping to see if PageAnon(page). Those addresses fit struct pages for pfns d2001 and d2000, and in each dump a register or a stack slot showed d2001730 or d2000730: pte flags 0x730 are PCD ACCESSED PROTNONE SPECIAL IOMAP; and Sasha's e820 map has a hole between cfffffff and 100000000, which would need special access. Commit c46a7c81 ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") has broken vm_normal_page(): a PROTNONE SPECIAL pte no longer passes the pte_special() test, so zap_pte_range() goes on to try to access a non-existent struct page. Fix this by refining pte_special() (SPECIAL with PRESENT or PROTNONE) to complement pte_numa() (SPECIAL with neither PRESENT nor PROTNONE). A hint that this was a problem was that c46a7c81 added pte_numa() test to vm_normal_page(), and moved its is_zero_pfn() test from slow to fast path: This was papering over a pte_special() snag when the zero page was encountered during zap. This patch reverts vm_normal_page() to how it was before, relying on pte_special(). It still appears that this patch may be incomplete: aren't there other places which need to be handling PROTNONE along with PRESENT? For example, pte_mknuma() clears _PAGE_PRESENT and sets _PAGE_NUMA, but on a PROT_NONE area, that would make it pte_special(). This is side-stepped by the fact that NUMA hinting faults skipped PROT_NONE VMAs and there are no grounds where a NUMA hinting fault on a PROT_NONE VMA would be interesting. Fixes: c46a7c81 ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Vrabel authored
commit 8d5999df upstream. If the timer irqs are resumed during device resume it is possible in certain circumstances for the resume to hang early on, before device interrupts are resumed. For an Ubuntu 14.04 PVHVM guest this would occur in ~0.5% of resume attempts. It is not entirely clear what is occurring at the point of the hang, but I think a task necessary for the resume calls schedule_timeout(), waiting for a timer interrupt (which never arrives). This failure may require specific tasks to be running on the other VCPUs to trigger (processes are not frozen during a suspend/resume if PREEMPT is disabled). Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in syscore_resume(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Vrabel authored
commit 7d951f3c upstream. Commit b7dd0e35 (x86/xen: safely map and unmap grant frames when in atomic context) causes PVH guests to crash in arch_gnttab_map_shared() when they attempted to map the pages for the grant table. This use of a PV-specific function during the PVH grant table setup is non-obvious and not needed. The standard vmap() function does the right thing. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Mukesh Rathor <mukesh.rathor@oracle.com> Tested-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Matt Fleming authored
commit 7b2a583a upstream. Without CONFIG_RELOCATABLE the early boot code will decompress the kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS days, that isn't going to fly with UEFI since parts of the firmware code/data may be located at LOAD_PHYSICAL_ADDR. Straying outside of the bounds of the regions we've explicitly requested from the firmware will cause all sorts of trouble. Bruno reports that his machine resets while trying to decompress the kernel image. We already go to great pains to ensure the kernel is loaded into a suitably aligned buffer, it's just that the address isn't necessarily LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use by the firmware. Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we can load the kernel at any address with the correct alignment. Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Vrabel authored
commit dcecb8fd upstream. When using the FIFO-based ABI on x86_64, if the last port is at the end of an event array page then sync_test_bit() on this port's event word will read beyond the end of the page and in certain circumstances this may fault. The fault requires the following page in the kernel's direct mapping to be not present, which would mean: a) the array page is the last page of RAM; or b) the following page is ballooned out /and/ it has been used for a foreign mapping by a kernel driver (such as netback or blkback) /and/ the grant has been unmapped. Use the infrastructure added for arm64 to ensure that all bitops operating on event words are unsigned long aligned. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit ed5c41d3 upstream. Commit ea431643 ("x86/mce: Fix CMCI preemption bugs") breaks RT by the completely unrelated conversion of the cmci_discover_lock to a regular (non raw) spinlock. This lock was annotated in commit 59d958d2 ("locking, x86: mce: Annotate cmci_discover_lock as raw") with a proper explanation why. The argument for converting the lock back to a regular spinlock was: - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. Which is complete nonsense. The raw_spinlock is disabling preemption in the same way as a regular spinlock. In mainline spinlock maps to raw_spinlock, in RT spinlock becomes a "sleeping" lock. raw_spinlock has on RT exactly the same semantics as in mainline. And because this lock is taken in non preemptible context it must be raw on RT. Undo the locking brainfart. Reported-by: Clark Williams <williams@redhat.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
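For context, a minimal sketch of the annotation being restored (a plain fragment, not the full mce code): a raw_spinlock_t keeps true spinning, preemption-disabling semantics even on PREEMPT_RT, where an ordinary spinlock_t becomes a sleeping lock, which is exactly what a non-preemptible section requires.

    #include <linux/spinlock.h>

    /* raw_spinlock_t stays a real spinlock on PREEMPT_RT */
    static DEFINE_RAW_SPINLOCK(cmci_discover_lock);

    static void cmci_critical_section(void)
    {
        unsigned long flags;

        raw_spin_lock_irqsave(&cmci_discover_lock, flags);
        /* per-cpu ops here run with preemption and interrupts off */
        raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
    }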
-