- 06 May, 2014 40 commits
-
-
Leigh Brown authored
commit a2f8d6b3 upstream. In "ARM: dts: am33xx: correcting dt node unit address for usb", the usb_ctrl_mod and cppi41dma nodes were updated with the correct register addresses. However, the dts files that reference these nodes were not updated, and those devices are no longer being enabled. This patch corrects the references for the affected dts files. Signed-off-by:
Leigh Brown <leigh@solinno.co.uk> Signed-off-by:
Tony Lindgren <tony@atomide.com> Cc: Johan Hovold <jhovold@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aaron Sanders authored
commit b16c02fb upstream. Add device ids to pl2303 for the Hewlett-Packard HP POS pole displays: LD960: 03f0:0B39 LCM220: 03f0:3139 LCM960: 03f0:3239 [ Johan: fix indentation and sort PIDs numerically ] Signed-off-by:
Aaron Sanders <aaron.sanders@hp.com> Signed-off-by:
Johan Hovold <jhovold@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Warren authored
commit 4f2fe2d2 upstream. To avoid memory fetch underflows with larger USB transfers, Tegra SoCs need txfill_tuning's txfifothresh register field set to a non-default value. Add a custom reset override in order to set this up. These values are recommended practice for all Tegra chips. However, I've only noticed practical problems when not setting them this way on systems using Tegra124. Hence, CC: stable only for recent kernels which actually support Tegra124. Signed-off-by:
Stephen Warren <swarren@nvidia.com> Acked-by:
Alan Stern <stern@rowland.harvard.edu> Tested-by:
Alexandre Courbot <acourbot@nvidia.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Warren authored
commit 9ef1af9e upstream. The Tegra124 clock DT binding currently provides 3 clocks that don't actually exist; 2 for NAND and one for UART5/UARTE. Delete these. While this is technically an incompatible DT ABI change, nothing could have used these clock IDs for anything practical, since the HW doesn't exist. Signed-off-by:
Stephen Warren <swarren@nvidia.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Warren authored
commit 9ba71705 upstream. The Tegra124 clock driver currently provides 3 clocks that don't actually exist; 2 for NAND and one for UART5/UARTE. Delete these. Signed-off-by:
Stephen Warren <swarren@nvidia.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Warren authored
commit 862f0eea upstream. Tegra124 only has 4 UARTs. Parts of the documentation hint at a fifth UART, but this appears to be left-over from earlier SoC documentation. Remove the non-existent DT node for UART5. Signed-off-by:
Stephen Warren <swarren@nvidia.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Julius Werner authored
commit 1f81b6d2 upstream. We have observed a rare cycle state desync bug after Set TR Dequeue Pointer commands on Intel LynxPoint xHCs (resulting in an endpoint that doesn't fetch new TRBs and thus an unresponsive USB device). It always triggers when a previous Set TR Dequeue Pointer command has set the pointer to the final Link TRB of a segment, and then another URB gets enqueued and cancelled again before it can be completed. Further investigation showed that the xHC had returned the Link TRB in the TRB Pointer field of the Transfer Event (CC == Stopped -- Length Invalid), but when xhci_find_new_dequeue_state() later accesses the Endpoint Context's TR Dequeue Pointer field it is set to the first TRB of the next segment. The driver expects those two values to be the same in this situation, and uses the cycle state of the latter together with the address of the former. This should be fine according to the XHCI specification, since the endpoint ring should be stopped when returning the Transfer Event and thus should not advance over the Link TRB before it gets restarted. However, real-world XHCI implementations apparently don't really care that much about these details, so the driver should follow a more defensive approach to try to work around HC spec violations. This patch removes the stopped_trb variable that had been used to store the TRB Pointer from the last Transfer Event of a stopped TRB. Instead, xhci_find_new_dequeue_state() now relies only on the Endpoint Context, requiring a small amount of additional processing to find the virtual address corresponding to the TR Dequeue Pointer. Some other parts of the function were slightly rearranged to better fit into this model. This patch should be backported to kernels as old as 2.6.31 that contain the commit ae636747 "USB: xhci: URB cancellation support." Signed-off-by:
Julius Werner <jwerner@chromium.org> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Theodore Ts'o authored
commit 6e6358fc upstream. We haven't taken i_mutex yet, so we need to use i_size_read(). Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Theodore Ts'o authored
commit 622cad13 upstream. The function ext4_update_i_disksize() is used in only one place, in the function mpage_map_and_submit_extent(). Move its code to simplify the code paths, and also move the call to ext4_mark_inode_dirty() into the i_data_sem's critical region, to be consistent with all of the other places where we update i_disksize. That way, we also keep the raw_inode's i_disksize protected, to avoid the following race: CPU #1 CPU #2 down_write(&i_data_sem) Modify i_disk_size up_write(&i_data_sem) down_write(&i_data_sem) Modify i_disk_size Copy i_disk_size to on-disk inode up_write(&i_data_sem) Copy i_disk_size to on-disk inode Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit ec4cb1aa upstream. When heavily exercising xattr code the assertion that jbd2_journal_dirty_metadata() shouldn't return error was triggered: WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260() CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1 Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011 ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968 ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978 Call Trace: [<ffffffff816311b0>] dump_stack+0x19/0x1b [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0 [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260 [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140 [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0 [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910 [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0 [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140 [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50 [<ffffffff811935ce>] generic_setxattr+0x6e/0x90 [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0 [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0 [<ffffffff8119421e>] setxattr+0x13e/0x1e0 [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60 [<ffffffff8118c65c>] ? fget_light+0x3c/0x130 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60 [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70 [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100 [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b The reason for the warning is that buffer_head passed into jbd2_journal_dirty_metadata() didn't have journal_head attached. This is caused by the following race of two ext4_xattr_release_block() calls: CPU1 CPU2 ext4_xattr_release_block() ext4_xattr_release_block() lock_buffer(bh); /* False */ if (BHDR(bh)->h_refcount == cpu_to_le32(1)) } else { le32_add_cpu(&BHDR(bh)->h_refcount, -1); unlock_buffer(bh); lock_buffer(bh); /* True */ if (BHDR(bh)->h_refcount == cpu_to_le32(1)) get_bh(bh); ext4_free_blocks() ... jbd2_journal_forget() jbd2_journal_unfile_buffer() -> JH is gone error = ext4_handle_dirty_xattr_block(handle, inode, bh); -> triggers the warning We fix the problem by moving ext4_handle_dirty_xattr_block() under the buffer lock. Sadly this cannot be done in nojournal mode as that function can call sync_dirty_buffer() which would deadlock. Luckily in nojournal mode the race is harmless (we only dirty already freed buffer) and thus for nojournal mode we leave the dirtying outside of the buffer lock. Reported-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Matthew Wilcox authored
commit 9503c67c upstream. ext4_end_bio() currently throws away the error that it receives. Chances are this is part of a spate of errors, one of which will end up getting the error returned to userspace somehow, but we shouldn't take that risk. Also print out the errno to aid in debug. Signed-off-by:
Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kazuya Mio authored
commit 4adb6ab3 upstream. When we try to get 2^32-1 block of the file which has the extent (ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON in ext4_ext_put_gap_in_cache(). To avoid the problem, ext4_map_blocks() needs to check the file logical block number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot handle 2^32-1 because the maximum file logical block number is 2^32-2. Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid. So ext4_map_blocks() should also return the same errno. Signed-off-by:
Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin K. Petersen authored
commit b7aa84d9 upstream. Commit 4550dd6c introduced for_each_bvec() which iterates over each bvec attached to a bio or bip. However, the macro fails to check bi_size before dereferencing which can lead to crashes while counting/mapping integrity scatterlist segments. Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by:
Jens Axboe <axboe@fb.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit f2ebb3a9 upstream. The current mainline has copies propagated to *all* nodes, then tears down the copies we made for nodes that do not contain counterparts of the desired mountpoint. That sets the right propagation graph for the copies (at teardown time we move the slaves of removed node to a surviving peer or directly to master), but we end up paying a fairly steep price in useless allocations. It's fairly easy to create a situation where N calls of mount(2) create exactly N bindings, with O(N^2) vfsmounts allocated and freed in process. Fortunately, it is possible to avoid those allocations/freeings. The trick is to create copies in the right order and find which one would've eventually become a master with the current algorithm. It turns out to be possible in O(nodes getting propagation) time and with no extra allocations at all. One part is that we need to make sure that eventual master will be created before its slaves, so we need to walk the propagation tree in a different order - by peer groups. And iterate through the peers before dealing with the next group. Another thing is finding the (earlier) copy that will be a master of one we are about to create; to do that we are (temporary) marking the masters of mountpoints we are attaching the copies to. Either we are in a peer of the last mountpoint we'd dealt with, or we have the following situation: we are attaching to mountpoint M, the last copy S_0 had been attached to M_0 and there are sequences S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i} mounted on M{i} and we need to create a slave of the first S_{k} such that M is getting propagation from M_{k}. It means that the master of M_{k} will be among the sequence of masters of M. On the other hand, the nearest marked node in that sequence will either be the master of M_{k} or the master of M_{k-1} (the latter - in the case if M_{k-1} is a slave of something M gets propagation from, but in a wrong peer group). So we go through the sequence of masters of M until we find a marked one (P). Let N be the one before it. Then we go through the sequence of masters of S_0 until we find one (say, S) mounted on a node D that has P as master and check if D is a peer of N. If it is, S will be the master of new copy, if not - the master of S will be. That's it for the hard part; the rest is fairly simple. Iterator is in next_group(), handling of one prospective mountpoint is propagate_one(). It seems to survive all tests and gives a noticably better performance than the current mainline for setups that are seriously using shared subtrees. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Krzysztof Kozlowski authored
commit 238e1405 upstream. If parent device does not have of_node set the s2mps11_clk_parse_dt() returned NULL. This NULL was later passed to of_clk_add_provider() which dereferenced it in pr_debug() call. Signed-off-by:
Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by:
Mike Turquette <mturquette@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tetsuo Handa authored
commit f81c2015 upstream. Commit 9548906b ('xattr: Constify ->name member of "struct xattr"') missed that ocfs2 is calling kfree(xattr->name). As a result, kernel panic occurs upon calling kfree(xattr->name) because xattr->name refers static constant names. This patch removes kfree(xattr->name) from ocfs2_mknod() and ocfs2_symlink(). Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by:
Tariq Saeed <tariq.x.saeed@oracle.com> Tested-by:
Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by:
Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
alex chen authored
commit f7cf4f5b upstream. Do not put bh when buffer_uptodate failed in ocfs2_write_block and ocfs2_write_super_or_backup, because it will put bh in b_end_io. Otherwise it will hit a warning "VFS: brelse: Trying to free free buffer". Signed-off-by:
Alex Chen <alex.chen@huawei.com> Reviewed-by:
Joseph Qi <joseph.qi@huawei.com> Reviewed-by:
Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <jlbec@evilplan.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Junxiao Bi authored
commit ded2cf71 upstream. There is a race window in dlm_do_recovery() between dlm_remaster_locks() and dlm_reset_recovery() when the recovery master nearly finish the recovery process for a dead node. After the master sends FINALIZE_RECO message in dlm_remaster_locks(), another node may become the recovery master for another dead node, and then send the BEGIN_RECO message to all the nodes included the old master, in the handler of this message dlm_begin_reco_handler() of old master, dlm->reco.dead_node and dlm->reco.new_master will be set to the second dead node and the new master, then in dlm_reset_recovery(), these two variables will be reset to default value. This will cause new recovery master can not finish the recovery process and hung, at last the whole cluster will hung for recovery. old recovery master: new recovery master: dlm_remaster_locks() become recovery master for another dead node. dlm_send_begin_reco_message() dlm_begin_reco_handler() { if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) { return -EAGAIN; } dlm_set_reco_master(dlm, br->node_idx); dlm_set_reco_dead_node(dlm, br->dead_node); } dlm_reset_recovery() { dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM); dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM); } will hang in dlm_remaster_locks() for request dlm locks info Before send FINALIZE_RECO message, recovery master should set DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done, this can break the race windows as the BEGIN_RECO messages will not be handled before DLM_RECO_STATE_FINALIZE flag is cleared. A similar race may happen between new recovery master and normal node which is in dlm_finalize_reco_handler(), also fix it. Signed-off-by:
Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by:
Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by:
Wengang Wang <wen.gang.wang@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Junxiao Bi authored
commit 34aa8dac upstream. This issue was introduced by commit 800deef3 ("ocfs2: use list_for_each_entry where benefical") in 2007 where it replaced list_for_each with list_for_each_entry. The variable "lock" will point to invalid data if "tmpq" list is empty and a panic will be triggered due to this. Sunil advised reverting it back, but the old version was also not right. At the end of the outer for loop, that list_for_each_entry will also set "lock" to an invalid data, then in the next loop, if the "tmpq" list is empty, "lock" will be an stale invalid data and cause the panic. So reverting the list_for_each back and reset "lock" to NULL to fix this issue. Another concern is that this seemes can not happen because the "tmpq" list should not be empty. Let me describe how. old lock resource owner(node 1): migratation target(node 2): image there's lockres with a EX lock from node 2 in granted list, a NR lock from node x with convert_type EX in converting list. dlm_empty_lockres() { dlm_pick_migration_target() { pick node 2 as target as its lock is the first one in granted list. } dlm_migrate_lockres() { dlm_mark_lockres_migrating() { res->state |= DLM_LOCK_RES_BLOCK_DIRTY; wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res)); //after the above code, we can not dirty lockres any more, // so dlm_thread shuffle list will not run downconvert lock from EX to NR upconvert lock from NR to EX <<< migration may schedule out here, then <<< node 2 send down convert request to convert type from EX to <<< NR, then send up convert request to convert type from NR to <<< EX, at this time, lockres granted list is empty, and two locks <<< in the converting list, node x up convert lock followed by <<< node 2 up convert lock. // will set lockres RES_MIGRATING flag, the following // lock/unlock can not run dlm_lockres_release_ast(dlm, res); } dlm_send_one_lockres() dlm_process_recovery_data() for (i=0; i<mres->num_locks; i++) if (ml->node == dlm->node_num) for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) { list_for_each_entry(lock, tmpq, list) if (lock) break; <<< lock is invalid as grant list is empty. } if (lock->ml.node != ml->node) BUG() >>> crash here } I see the above locks status from a vmcore of our internal bug. Signed-off-by:
Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by:
Wengang Wang <wen.gang.wang@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by:
Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Serge Hallyn authored
commit ea1a8217 upstream. If the glibc xattr.h header is included after the uapi header, compilation fails due to an enum re-using a #define from the uapi header. Protect against this by guarding the define and enum inclusions against each other. (See https://lists.debian.org/debian-glibc/2014/03/msg00029.html and https://sourceware.org/glibc/wiki/Synchronizing_Headers for more information.) Signed-off-by:
Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Allan McRae <allan@archlinux.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liu Hua authored
commit 80df2847 upstream. As sysctl_hung_task_timeout_sec is unsigned long, when this value is larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in watchdog will return immediately without sleep and with print : schedule_timeout: wrong timeout value ffffffffffffff83 and then the funtion watchdog will call schedule_timeout_interruptible again and again. The screen will be filled with "schedule_timeout: wrong timeout value ffffffffffffff83" This patch does some check and correction in sysctl, to let the function schedule_timeout_interruptible allways get the valid parameter. Signed-off-by:
Liu Hua <sdu.liu@huawei.com> Tested-by:
Satoru Takeuchi <satoru.takeuchi@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mizuma, Masayoshi authored
commit 55f67141 upstream. When I decrease the value of nr_hugepage in procfs a lot, softlockup happens. It is because there is no chance of context switch during this process. On the other hand, when I allocate a large number of hugepages, there is some chance of context switch. Hence softlockup doesn't happen during this process. So it's necessary to add the context switch in the freeing process as same as allocating process to avoid softlockup. When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing process occupied a CPU over 150 seconds and following softlockup message appeared twice or more. $ echo 6000000 > /proc/sys/vm/nr_hugepages $ cat /proc/sys/vm/nr_hugepages 6000000 $ grep ^Huge /proc/meminfo HugePages_Total: 6000000 HugePages_Free: 6000000 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB $ echo 0 > /proc/sys/vm/nr_hugepages BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ... Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1 Call Trace: free_pool_huge_page+0xb8/0xd0 set_max_huge_pages+0x128/0x190 hugetlb_sysctl_handler_common+0x113/0x140 hugetlb_sysctl_handler+0x1e/0x20 proc_sys_call_handler+0x97/0xd0 proc_sys_write+0x14/0x20 vfs_write+0xb8/0x1a0 sys_write+0x51/0x90 __audit_syscall_exit+0x265/0x290 system_call_fastpath+0x16/0x1b I have not confirmed this problem with upstream kernels because I am not able to prepare the machine equipped with 12TB memory now. However I confirmed that the amount of decreasing hugepages was directly proportional to the amount of required time. I measured required times on a smaller machine. It showed 130-145 hugepages decreased in a millisecond. Amount of decreasing Required time Decreasing rate hugepages (msec) (pages/msec) ------------------------------------------------------------ 10,000 pages == 20GB 70 - 74 135-142 30,000 pages == 60GB 208 - 229 131-144 It means decrement of 6TB hugepages will trigger softlockup with the default threshold 20sec, in this decreasing rate. Signed-off-by:
Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlastimil Babka authored
commit 57e68e9c upstream. A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin fuzzing with trinity. The call site try_to_unmap_cluster() does not lock the pages other than its check_page parameter (which is already locked). The BUG_ON in mlock_vma_page() is not documented and its purpose is somewhat unclear, but apparently it serializes against page migration, which could otherwise fail to transfer the PG_mlocked flag. This would not be fatal, as the page would be eventually encountered again, but NR_MLOCK accounting would become distorted nevertheless. This patch adds a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that effect. The call site try_to_unmap_cluster() is fixed so that for page != check_page, trylock_page() is attempted (to avoid possible deadlocks as we already have check_page locked) and mlock_vma_page() is performed only upon success. If the page lock cannot be obtained, the page is left without PG_mlocked, which is again not a problem in the whole unevictable memory design. Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Bob Liu <bob.liu@oracle.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by:
Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johannes Weiner authored
commit 3a025760 upstream. On NUMA systems, a node may start thrashing cache or even swap anonymous pages while there are still free pages on remote nodes. This is a result of commits 81c0a2bb ("mm: page_alloc: fair zone allocator policy") and fff4068c ("mm: page_alloc: revert NUMA aspect of fair allocation policy"). Before those changes, the allocator would first try all allowed zones, including those on remote nodes, before waking any kswapds. But now, the allocator fastpath doubles as the fairness pass, which in turn can only consider the local node to prevent remote spilling based on exhausted fairness batches alone. Remote nodes are only considered in the slowpath, after the kswapds are woken up. But if remote nodes still have free memory, kswapd should not be woken to rebalance the local node or it may thrash cash or swap prematurely. Fix this by adding one more unfair pass over the zonelist that is allowed to spill to remote nodes after the local fairness pass fails but before entering the slowpath and waking the kswapds. This also gets rid of the GFP_THISNODE exemption from the fairness protocol because the unfair pass is no longer tied to kswapd, which GFP_THISNODE is not allowed to wake up. However, because remote spills can be more frequent now - we prefer them over local kswapd reclaim - the allocation batches on remote nodes could underflow more heavily. When resetting the batches, use atomic_long_read() directly instead of zone_page_state() to calculate the delta as the latter filters negative counter values. Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Rik van Riel <riel@redhat.com> Acked-by:
Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Matt Fleming authored
commit a0c32761 upstream. Kees reported the following error: arch/sh/kernel/dumpstack.c: In function 'print_trace_address': arch/sh/kernel/dumpstack.c:118:2: error: format not a string literal and no format arguments [-Werror=format-security] Use the "%s" format so that it's impossible to interpret 'data' as a format string. Signed-off-by:
Matt Fleming <matt.fleming@intel.com> Reported-by:
Kees Cook <keescook@chromium.org> Acked-by:
Kees Cook <keescook@chromium.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Bellinger authored
commit 03e7848a upstream. This patch fixes a bug where outstanding RDMA_READs with WRITE_PENDING status require an extra target_put_sess_cmd() in isert_put_cmd() code when called from isert_cq_tx_comp_err() + isert_cq_drain_comp_llist() context during session shutdown. The extra kref PUT is required so that transport_generic_free_cmd() invokes the last target_put_sess_cmd() -> target_release_cmd_kref(), which will complete(&se_cmd->cmd_wait_comp) the outstanding se_cmd descriptor with WRITE_PENDING status, and awake the completion in target_wait_for_sess_cmds() to invoke TFO->release_cmd(). The bug was manifesting itself in target_wait_for_sess_cmds() where a se_cmd descriptor with WRITE_PENDING status would end up sleeping indefinately. Acked-by:
Sagi Grimberg <sagig@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Bellinger authored
commit f46d6a8a upstream. This patch changes isert_conn_create_fastreg_pool() to follow logic in iscsi_target_locate_portal() for determining how many FRMR descriptors to allocate based upon the number of possible per-session command slots that are available. This addresses an OOPs in isert_reg_rdma() where due to the use of ISCSI_DEF_XMIT_CMDS_MAX could end up returning a bogus fast_reg_descriptor when the number of active tags exceeded the original hardcoded max. Note this also includes moving isert_conn_create_fastreg_pool() from isert_connect_request() to isert_put_login_tx() before posting the final Login Response PDU in order to determine the se_nacl->queue_depth (eg: number of tags) per session the target will be enforcing. v2 changes: - Move isert_conn->conn_fr_pool list_head init into isert_conn_request() v3 changes: - Drop unnecessary list_empty() check in isert_reg_rdma() (Sagi) Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sam Bradshaw authored
commit 5eb9291c upstream. This patch fixes 2 issues in the fast completion path: 1) Possible double completions / double dma_unmap_sg() calls due to lack of atomicity in the check and subsequent dereference of the upper layer callback function. Fixed with cmpxchg before unmap and callback. 2) Regression in unaligned IO constraining workaround for p420m devices. Fixed by checking if IO is unaligned and using proper semaphore if so. Signed-off-by:
Sam Bradshaw <sbradshaw@micron.com> Signed-off-by:
Jens Axboe <axboe@fb.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felipe Franciosi authored
commit 368c89d7 upstream. If the buffers are unmapped after completing a request, then stale data might be in the request. Signed-off-by:
Felipe Franciosi <felipe@paradoxo.org> Signed-off-by:
Jens Axboe <axboe@fb.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felipe Franciosi authored
commit 1044b1bb upstream. We need to set the queue bounce limit during the device initialization to prevent excessive bouncing on 32 bit architectures. Signed-off-by:
Felipe Franciosi <felipe@paradoxo.org> Signed-off-by:
Jens Axboe <axboe@fb.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alan Stern authored
commit 6aec044c upstream. When a driver doesn't have pre_reset, post_reset, or reset_resume methods, the USB core unbinds that driver when its device undergoes a reset or a reset-resume, and then rebinds it afterward. The existing straightforward implementation can lead to problems, because each interface gets unbound and rebound before the next interface is handled. If a driver claims additional interfaces, the claim may fail because the old binding instance may still own the additional interface when the new instance tries to claim it. This patch fixes the problem by first unbinding all the interfaces that are marked (i.e., their needs_binding flag is set) and then rebinding all of them. The patch also makes the helper functions in driver.c a little more uniform and adjusts some out-of-date comments. Signed-off-by:
Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by:
"Poulain, Loic" <loic.poulain@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Mack authored
commit a31a942a upstream. Tests have shown that when a power-up transition is followed by other PHY operations too quickly, the USB port appears dead. Waiting 1ms fixes this problem. Signed-off-by:
Daniel Mack <zonque@gmail.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Simek authored
commit ead5178b upstream. Add new ulpi IDs which are available on Xilinx Zynq boards. Signed-off-by:
Michal Simek <michal.simek@xilinx.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul Gortmaker authored
commit f76a1cbe upstream. Commit 3e6c6f63 ("Delay creation of khcvd thread") moved the call of hvc_init from being a device_initcall into hvc_alloc, and used a non-null hvc_driver as indication of whether hvc_init had already been called. The problem with this is that hvc_driver is only assigned a value at the bottom of hvc_init, and so there is a window where multiple hvc_alloc calls can be in progress at the same time and hence try and call hvc_init multiple times. Previously the use of device_init guaranteed that hvc_init was only called once. This manifests itself as sporadic instances of two hvc_init calls racing each other, and with the loser of the race getting -EBUSY from tty_register_driver() and hence that virtual console fails: Couldn't register hvc console driver virtio-ports vport0p1: error -16 allocating hvc for port Here we add an atomic_t to guarantee we'll never run hvc_init twice. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 3e6c6f63 ("Delay creation of khcvd thread") Reported-by:
Jim Somerville <Jim.Somerville@windriver.com> Tested-by:
Jim Somerville <Jim.Somerville@windriver.com> Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felipe Balbi authored
commit 3063a12b upstream. commi 30a70b02 (usb: musb: fix obex in g_nokia.ko causing kernel panic) removed phy_power_on() and phy_power_off() calls from runtime PM callbacks but it failed to note that the driver depended on pm_runtime_get_sync() calls to power up the PHY, thus leaving some platforms without any means to have a working PHY. Fix that by enabling the phy during omap2430_musb_init() and killing it in omap2430_musb_exit(). Fixes: 30a70b02 (usb: musb: fix obex in g_nokia.ko causing kernel panic) Cc: Pali Rohár <pali.rohar@gmail.com> Cc: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Reported-by:
Michael Scott <hashcode0f@gmail.com> Tested-by:
Michael Scott <hashcode0f@gmail.com> Tested-by:
Stefan Roese <sr@denx.de> Reported-by:
Rabin Vincent <rabin@rab.in> Signed-off-by:
Felipe Balbi <balbi@ti.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felipe Balbi authored
commit eee3f15d upstream. instead of relying on the otg pointer, which can be NULL in certain cases, we can use the gadget and host pointers we already hold inside struct musb. Tested-by:
Tony Lindgren <tony@atomide.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felipe Balbi authored
commit 61018305 upstream. commit 388e5c51 (usb: dwc3: remove dwc3 dependency on host AND gadget.) created the possibility for host-only and peripheral-only dwc3 builds but left a possible randconfig build error when host-only builds are selected. Reported-by:
Jim Davis <jim.epost@gmail.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Huang Rui authored
commit 06f9b6e5 upstream. Around DWC USB3 2.30a release another bit has been added to the Device-Specific Event (DEVT) Event Information (EvtInfo) bitfield. Because of that, what used to be 8 bits long, has become 9 bits long. Per dwc3 2.30a+ spec in the Device-Specific Event (DEVT), the field of Event Information Bits(EvtInfo) uses [24:16] bits, and it has 9 bits not 8 bits. And the following reserved field uses [31:25] bits not [31:24] bits, and it has 7 bits. So in dwc3_event_devt, the bit mask should be: event_info [24:16] 9 bits reserved31_25 [31:25] 7 bits This patch makes sure that newer core releases will work fine with Linux and that we will decode the event information properly on new core releases. [ balbi@ti.com : improve commit log a bit ] Signed-off-by:
Huang Rui <ray.huang@amd.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wolfram Sang authored
commit 61f03191 upstream. Signed-off-by:
Wolfram Sang <wsa@the-dreams.de> Signed-off-by:
Hans de Goede <hdegoede@redhat.com> Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Vaussard authored
commit 8b57b966 upstream. Commit 3fdfedaa "[media] omap3isp: preview: Lower the crop margins" accidentally changed the previewer's cropping, causing the previewer to miss four pixels on each line, thus corrupting the final image. Restored the removed setting. Signed-off-by:
Florian Vaussard <florian.vaussard@epfl.ch> Signed-off-by:
Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-