- 02 Nov, 2015 14 commits
-
-
Chuck Lever authored
Introduce a code path in the rpcrdma_reply_handler() to catch incoming backward direction RPC calls and route them to the ULP's backchannel server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Backward direction RPC replies are sent via the client transport's send_request method, the same way forward direction RPC calls are sent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Pre-allocate extra send and receive Work Requests needed to handle backchannel receives and sends. The transport doesn't know how many extra WRs to pre-allocate until the xprt_setup_backchannel() call, but that's long after the WRs are allocated during forechannel setup. So, use a fixed value for now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
xprtrdma's backward direction send and receive buffers are the same size as the forechannel's inline threshold, and must be pre- registered. The consumer has no control over which receive buffer the adapter chooses to catch an incoming backwards-direction call. Any receive buffer can be used for either a forward reply or a backward call. Thus both types of RPC message must all be the same size. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA bi-direction. In particular, receive buffers have to be pre- registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport. In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Now that RPC replies are processed in a workqueue, there's no need to disable IRQs when managing send and receive buffers. This saves noticeable overhead per RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: The reply tasklet is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
The reply tasklet is fast, but it's single threaded. After reply traffic saturates a single CPU, there's no more reply processing capacity. Replace the tasklet with a workqueue to spread reply handling across all CPUs. This also moves RPC/RDMA reply handling out of the soft IRQ context and into a context that allows sleeps. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
The rb_send_bufs and rb_recv_bufs arrays are used to implement a pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep structs. Replace those arrays with free lists. To allow more than 512 RPCs in-flight at once, each of these arrays would be larger than a page (assuming 8-byte addresses and 4KB pages). Allowing up to 64K in-flight RPCs (as TCP now does), each buffer array would have to be 128 pages. That's an order-6 allocation. (Not that we're going there.) A list is easier to expand dynamically. Instead of allocating a larger array of pointers and copying the existing pointers to the new array, simply append more buffers to each list. This also makes it simpler to manage receive buffers that might catch backwards-direction calls, or to post receive buffers in bulk to amortize the overhead of ib_post_recv. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: The error cases in rpcrdma_reply_handler() almost never execute. Ensure the compiler places them out of the hot path. No behavior change expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Commit 8301a2c0 ("xprtrdma: Limit work done by completion handler") was supposed to prevent xprtrdma's upcall handlers from starving other softIRQ work by letting them return to the provider before all CQEs have been polled. The logic assumes the provider will call the upcall handler again immediately if the CQ is re-armed while there are still queued CQEs. This assumption is invalid. The IBTA spec says that after a CQ is armed, the hardware must interrupt only when a new CQE is inserted. xprtrdma can't rely on the provider calling again, even though some providers do. Therefore, leaving CQEs on queue makes sense only when there is another mechanism that ensures all remaining CQEs are consumed in a timely fashion. xprtrdma does not have such a mechanism. If a CQE remains queued, the transport can wait forever to send the next RPC. Finally, move the wcs array back onto the stack to ensure that the poll array is always local to the CPU where the completion upcall is running. Fixes: 8301a2c0 ("xprtrdma: Limit work done by completion ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns a positive value if WCs were added to a CQ after the last completion upcall but before the CQ has been re-armed. Commit 7f23f6f6 ("xprtrmda: Reduce lock contention in completion handlers") assumed that when ib_req_notify_cq() returned a positive RC, the CQ had also been successfully re-armed, making it safe to return control to the provider without losing any completion signals. That is an invalid assumption. Change both completion handlers to continue polling while ib_req_notify_cq() returns a positive value. Fixes: 7f23f6f6 ("xprtrmda: Reduce lock contention in ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
After adding a swapfile on an NFS/RDMA mount and removing the normal swap partition, I was able to push the NFS client well into swap without any issue. I forgot to swapoff the NFS file before rebooting. This pinned the NFS mount and the IB core and provider, causing shutdown to hang. I think this is expected and safe behavior. Probably shutdown scripts should "swapoff -a" before unmounting any filesystems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Steve Wise authored
Unsignaled send WRs can get flushed as part of normal unmount, so don't log them as warnings. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 18 Oct, 2015 3 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "Here are some bugfixes for the I2C subsystem. Kieran found a flaw in the recently renewed wake irq handling. Mika handled a user bug report where the ACPI info turned out to be unusable. I updated MAINTAINERS so that such bug reports will sooner get to the right people. Geert pointed me to a problem of some i2c drivers regarding PM which I fixed" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: designware: Do not use parameters from ACPI on Dell Inspiron 7348 MAINTAINERS: add maintainers for Synopsis Designware I2C drivers i2c: designware-platdrv: enable RuntimePM before registering to the core i2c: s3c2410: enable RuntimePM before registering to the core i2c: rcar: enable RuntimePM before registering to the core i2c: return probe deferred status on dev_pm_domain_attach
-
Mika Westerberg authored
ACPI SSCN/FMCN methods were originally added because then the platform can provide the most accurate HCNT/LCNT values to the driver. However, this seems not to be true for Dell Inspiron 7348 where using these causes the touchpad to fail in boot: i2c_hid i2c-DLL0675:00: failed to retrieve report from device. i2c_designware INT3433:00: i2c_dw_handle_tx_abort: lost arbitration i2c_hid i2c-DLL0675:00: failed to retrieve report from device. i2c_designware INT3433:00: controller timed out The values received from ACPI are (in fast mode): HCNT: 72 LCNT: 160 this translates to following timings (input clock is 100MHz on Broadwell): tHIGH: 720 ns (spec min 600 ns) tLOW: 1600 ns (spec min 1300 ns) Bus period: 2920 ns (assuming 300 ns tf and tr) Bus speed: 342.5 kHz Both tHIGH and tLOW are within the I2C specification. The calculated values when ACPI parameters are not used are (in fast mode): HCNT: 87 LCNT: 159 which translates to: tHIGH: 870 ns (spec min 600 ns) tLOW: 1590 ns (spec min 1300 ns) Bus period 3060 ns (assuming 300 ns tf and tr) Bus speed 326.8 kHz These values are also within the I2C specification. Since both ACPI and calculated values meet the I2C specification timing requirements it is hard to say why the touchpad does not function properly with the ACPI values except that the bus speed is higher in this case (but still well below the max 400kHz). Solve this by adding DMI quirk to the driver that disables using ACPI parameters on this particulare machine. Reported-by: Pavel Roskin <plroskin@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Pavel Roskin <plroskin@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
-
- 17 Oct, 2015 3 commits
-
-
Linus Torvalds authored
Merge branches 'irq-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq/timer fixes from Thomas Gleixner: "irq: a fix for the new hierarchical MSI interrupt handling which unbreaks PCI=n configurations. timers: a fix for the new hrtimer clock offset update mechanism to ensure that the boot time offset is respected" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi: Do not use pci_msi_[un]mask_irq as default methods * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Increment clock_was_set_seq in timekeeping_init()
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input fixes from Dmitry Torokhov: "Just two small fixups to ads7846 touchscreen controller driver and Cypress touchpad driver" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: cyapa - fix the copy paste error on electrodes_rx value Input: ads7846 - correct the value got from SPI
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds authored
Pull clk fix from Stephen Boyd: "Just one revert for Armada XP devices: the conversion to of_clk_get_parent_name() wasn't a direct translation, so we revert back to of_clk_get() + __clk_get_name(). We could make of_clk_get_parent_name() more robust, but that may have unintended side-effects, so we'll do that in the next version" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: Partially revert "clk: mvebu: Convert to clk_hw based provider APIs"
-
- 16 Oct, 2015 20 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds authored
Pull device mapper fixes from Mike Snitzer: "Two DM target error path cleanup fixes (one for stable in DM thinp and one for a v4.3-rc5 thinko in DM snapshot)" * tag 'dm-4.3-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: fix missing pool reference count decrement in pool_ctr error path dm snapshot persistent: fix missing cleanup in persistent_ctr error path
-
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds authored
Pull btrfs fixes from Chris Mason: "I have two more bug fixes for btrfs. My commit fixes a bug we hit last week at FB, a combination of lots of hard links and an admin command to resolve inode numbers. Dave is adding checks to make sure balance on current kernels ignores filters it doesn't understand. The penalty for being wrong is just doing more work (not crashing etc), but it's a good fix" * 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix use after free iterating extrefs btrfs: check unsupported filters in balance arguments
-
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds authored
Pull Ceph fixes from Sage Weil: "Just two small items from Ilya: The first patch fixes the RBD readahead to grab full objects. The second fixes the write ops to prevent undue promotion when a cache tier is configured on the server side" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use writefull op for object size writes rbd: set max_sectors explicitly
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management and ACPI fixes from Rafael Wysocki: "These fix two recent regressions (ACPICA, the generic power domains framework) and one crash that may happen on specific hardware supported since 4.1 (intel_pstate). Specifics: - Fix a regression introduced by a recent ACPICA cleanup that uncovered a latent bug (Lv Zheng). - Fix a recent regression in the generic power domains framework that may cause it to violate PM QoS latency constraints in some cases (Ulf Hansson). - Fix an intel_pstate driver crash on the Knights Landing chips that do not update the MPERF counter as often as expected by the driver which may result in a divide by 0 (Srinivas Pandruvada)" * tag 'pm+acpi-4.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL) ACPICA: Tables: Fix FADT dependency regression PM / Domains: Fix validation of latency constraints in genpd governor
-
git://people.freedesktop.org/~airlied/linuxLinus Torvalds authored
Pull drm fixes from Dave Airlie: "Nothing too crazy or exciting: - two MAINTAINERS entries that I didn't see the point in delaying. - one drm mst fix to stop sending uninitialised data to monitors - two amdgpu fixes - one radeon mst tiling fix - one vmwgfx regression fix - one virtio warning fix. I have found one locking problem that needs a bit of reorg to fix, but I'm not sure it's worth putting in -fixes as I don't think we've seen it hit in the real world ever, I just found it using the virtio-gpu driver when working on it. I'll possibly send it next week once I've time to discuss with Daniel" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/virtio: use %llu format string form atomic64_t MAINTAINERS: Add myself as maintainer for the gma500 driver MAINTAINERS: add a maintainer for the atmel-hlcdc DRM driver drm/amdgpu: Keep the pflip interrupts always enabled v7 drm/amdgpu: adjust default dispclk (v2) drm/dp/mst: make mst i2c transfer code more robust. drm/radeon: attach tile property to mst connector drm/vmwgfx: Fix kernel NULL pointer dereference on older hardware
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - Re-enable CONFIG_SCSI_DH in our defconfigs - Remove unused os_area_db_id_video_mode - cxl: fix leak of IRQ names in cxl_free_afu_irqs() from Andrew - cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API from Andrew - cxl: fix leak of ctx->mapping when releasing kernel API contexts from Andrew - cxl: Workaround malformed pcie packets on some cards from Philippe - cxl: Fix number of allocated pages in SPA from Christophe Lombard - Fix checkstop in native_hpte_clear() with lockdep from Cyril - Panic on unhandled Machine Check on powernv from Daniel - selftests/powerpc: Fix build failure of load_unaligned_zeropad test * tag 'powerpc-4.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix build failure of load_unaligned_zeropad test powerpc/powernv: Panic on unhandled Machine Check powerpc: Fix checkstop in native_hpte_clear() with lockdep cxl: Fix number of allocated pages in SPA cxl: Workaround malformed pcie packets on some cards cxl: fix leak of ctx->mapping when releasing kernel API contexts cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API cxl: fix leak of IRQ names in cxl_free_afu_irqs() powerpc/ps3: Remove unused os_area_db_id_video_mode powerpc/configs: Re-enable CONFIG_SCSI_DH
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "6 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: sh: add copy_user_page() alias for __copy_user() lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE mm, dax: fix DAX deadlocks memcg: convert threshold to bytes builddeb: remove debian/files before build mm, fs: obey gfp_mapping for add_to_page_cache()
-
Ross Zwisler authored
copy_user_page() is needed by DAX. Without this we get a compile error for DAX on SH: fs/dax.c:280:2: error: implicit declaration of function `copy_user_page' [-Werror=implicit-function-declaration] copy_user_page(vto, (void __force *)vfrom, vaddr, to); ^ This was done with a random config that happened to include DAX support. This patch has only been compile tested. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
lib/built-in.o: In function `__bitrev32': deftree.c:(.text+0x1e799): undefined reference to `byte_rev_table' deftree.c:(.text+0x1e7a0): undefined reference to `byte_rev_table' deftree.c:(.text+0x1e7b4): undefined reference to `byte_rev_table' deftree.c:(.text+0x1e7c1): undefined reference to `byte_rev_table' Anything which uses bitrevX() has to select BITREVERSE, to grab lib/bitrev.o. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ross Zwisler authored
The following two locking commits in the DAX code: commit 84317297 ("dax: fix race between simultaneous faults") commit 46c043ed ("mm: take i_mmap_lock in unmap_mapping_range() for DAX") introduced a number of deadlocks and other issues which need to be fixed for the v4.3 kernel. The list of issues in DAX after these commits (some newly introduced by the commits, some preexisting) can be found here: https://lkml.org/lkml/2015/9/25/602 (Subject: "Re: [PATCH] dax: fix deadlock in __dax_fault"). This undoes most of the changes introduced by those two commits, essentially returning us to the DAX locking scheme that was used in v4.2. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Tested-by: Dave Chinner <dchinner@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shaohua Li authored
page_counter_memparse() returns pages for the threshold, while mem_cgroup_usage() returns bytes for memory usage. Convert the threshold to bytes. Fixes: 3e32cb2e ("memcg: rename cgroup_event to mem_cgroup_event"). Signed-off-by: Shaohua Li <shli@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Riku Voipio authored
Commit 3716001b ("deb-pkg: add source package") added the ability to create a debian changelog file. This exposed that previously the builddeb script hasn't cleared debian/files between builds. As debian/files keeps accumulating entries, the changes file will end up growing indefinelty. With outdated entries in debian/files, builddeb script will exit with failure. This regression impacts those who use "make deb-pkg" target to build kernel into a .deb package and never use "make mrproper" or other means to clean kernel tree from generated directories. To fix the regression, remove debian/files before starting build and in the generated clean rule. Fixes: 3716001b ("deb-pkg: add source package") Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Tested-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Ben Hutchings <ben@decadent.org.uk> Cc: Michal Marek <mmarek@suse.cz> Cc: maximilian attems <maks@stro.at> Cc: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Commit 6afdb859 ("mm: do not ignore mapping_gfp_mask in page cache allocation paths") has caught some users of hardcoded GFP_KERNEL used in the page cache allocation paths. This, however, wasn't complete and there were others which went unnoticed. Dave Chinner has reported the following deadlock for xfs on loop device: : With the recent merge of the loop device changes, I'm now seeing : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073. : : The deadlocked is as follows: : : kloopd1: loop_queue_read_work : xfs_file_iter_read : lock XFS inode XFS_IOLOCK_SHARED (on image file) : page cache read (GFP_KERNEL) : radix tree alloc : memory reclaim : reclaim XFS inodes : log force to unpin inodes : <wait for log IO completion> : : xfs-cil/loop1: <does log force IO work> : xlog_cil_push : xlog_write : <loop issuing log writes> : xlog_state_get_iclog_space() : <blocks due to all log buffers under write io> : <waits for IO completion> : : kloopd1: loop_queue_write_work : xfs_file_write_iter : lock XFS inode XFS_IOLOCK_EXCL (on image file) : <wait for inode to be unlocked> : : i.e. the kloopd, with it's split read and write work queues, has : introduced a dependency through memory reclaim. i.e. that writes : need to be able to progress for reads make progress. : : The problem, fundamentally, is that mpage_readpages() does a : GFP_KERNEL allocation, rather than paying attention to the inode's : mapping gfp mask, which is set to GFP_NOFS. : : The didn't used to happen, because the loop device used to issue : reads through the splice path and that does: : : error = add_to_page_cache_lru(page, mapping, index, : GFP_KERNEL & mapping_gfp_mask(mapping)); This has changed by commit aa4d8616 ("block: loop: switch to VFS ITER_BVEC"). This patch changes mpage_readpage{s} to follow gfp mask set for the mapping. There are, however, other places which are doing basically the same. lustre:ll_dir_filler is doing GFP_KERNEL from the function which apparently uses GFP_NOFS for other allocations so let's make this consistent. cifs:readpages_get_pages is called from cifs_readpages and __cifs_readpages_from_fscache called from the same path obeys mapping gfp. ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well regardless it uses mapping_gfp_mask for the page allocation. ext4_mpage_readpages is the called from the page cache allocation path same as read_pages and read_cache_pages As I've noticed in my previous post I cannot say I would be happy about sprinkling mapping_gfp_mask all over the place and it sounds like we should drop gfp_mask argument altogether and use it internally in __add_to_page_cache_locked that would require all the filesystems to use mapping gfp consistently which I am not sure is the case here. From a quick glance it seems that some file system use it all the time while others are selective. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Dave Chinner <david@fromorbit.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Ming Lei <ming.lei@canonical.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ilya Dryomov authored
This covers only the simplest case - an object size sized write, but it's still useful in tiering setups when EC is used for the base tier as writefull op can be proxied, saving an object promotion. Even though updating ceph_osdc_new_request() to allow writefull should just be a matter of fixing an assert, I didn't do it because its only user is cephfs. All other sites were updated. Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Ilya Dryomov authored
Commit 30e2bc08 ("Revert "block: remove artifical max_hw_sectors cap"") restored a clamp on max_sectors. It's now 2560 sectors instead of 1024, but it's not good enough: we set max_hw_sectors to rbd object size because we don't want object sized I/Os to be split, and the default object size is 4M. So, set max_sectors to max_hw_sectors in rbd at queue init time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
-
Thomas Gleixner authored
timekeeping_init() can set the wall time offset, so we need to increment the clock_was_set_seq counter. That way hrtimers will pick up the early offset immediately. Otherwise on a machine which does not set wall time later in the boot process the hrtimer offset is stale at 0 and wall time timers are going to expire with a delay of 45 years. Fixes: 868a3e91 "hrtimer: Make offset update smarter" Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stefan Liebler <stli@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
-
Rafael J. Wysocki authored
* acpica: ACPICA: Tables: Fix FADT dependency regression * pm-domains: PM / Domains: Fix validation of latency constraints in genpd governor * pm-cpufreq: cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)
-
Marc Zyngier authored
When we create a generic MSI domain, that MSI_FLAG_USE_DEF_CHIP_OPS is set, and that any of .mask or .unmask are NULL in the irq_chip structure, we set them to pci_msi_[un]mask_irq. This is a bad idea for at least two reasons: - PCI_MSI might not be selected, kernel fails to build (yes, this is legitimate, at least on arm64!) - This may not be a PCI/MSI domain at all (platform MSI, for example) Either way, this looks wrong. Move the overriding of mask/unmask to the PCI counterpart, and panic is any of these two methods is not set in the core code (they really should be present). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/1444760085-27857-1-git-send-email-marc.zyngier@arm.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Arnd Bergmann authored
The virtgpu driver prints the last_seq variable using the %ld or %lu format string, which does not work correctly on all architectures and causes this compiler warning on ARM: drivers/gpu/drm/virtio/virtgpu_fence.c: In function 'virtio_timeline_value_str': drivers/gpu/drm/virtio/virtgpu_fence.c:64:22: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'long long int' [-Wformat=] snprintf(str, size, "%lu", atomic64_read(&fence->drv->last_seq)); ^ drivers/gpu/drm/virtio/virtgpu_debugfs.c: In function 'virtio_gpu_debugfs_irq_info': drivers/gpu/drm/virtio/virtgpu_debugfs.c:37:16: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Wformat=] seq_printf(m, "fence %ld %lld\n", ^ In order to avoid the warnings, this changes the format strings to %llu and adds a cast to u64, which makes it work the same way everywhere. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
-
Patrik Jakobsson authored
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
-