- 20 Apr, 2016 30 commits
-
-
Martyn Welch authored
[ Upstream commit cddc9434 ] The CP2105 is used in the GE Healthcare Remote Alarm Box, with the Manufacturer ID of 0x1901 and Product ID of 0x0194. Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Josh Boyer authored
[ Upstream commit ea6db90e ] A Fedora user reports that the ftdi_sio driver works properly for the ICP DAS I-7561U device. Further, the user manual for these devices instructs users to load the driver and add the ids using the sysfs interface. Add support for these in the driver directly so that the devices work out of the box instead of needing manual configuration. Reported-by: <thesource@mail.ru> CC: stable <stable@vger.kernel.org> Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Filipe Manana authored
[ Upstream commit 56f23fdb ] If we rename an inode A (be it a file or a directory), create a new inode B with the old name of inode A and under the same parent directory, fsync inode B and then power fail, at log tree replay time we end up removing inode A completely. If inode A is a directory then all its files are gone too. Example scenarios where this happens: This is reproducible with the following steps, taken from a couple of test cases written for fstests which are going to be submitted upstream soon: # Scenario 1 mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt mkdir -p /mnt/a/x echo "hello" > /mnt/a/x/foo echo "world" > /mnt/a/x/bar sync mv /mnt/a/x /mnt/a/y mkdir /mnt/a/x xfs_io -c fsync /mnt/a/x <power failure happens> The next time the fs is mounted, log tree replay happens and the directory "y" does not exist nor do the files "foo" and "bar" exist anywhere (neither in "y" nor in "x", nor the root nor anywhere). # Scenario 2 mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt mkdir /mnt/a echo "hello" > /mnt/a/foo sync mv /mnt/a/foo /mnt/a/bar echo "world" > /mnt/a/foo xfs_io -c fsync /mnt/a/foo <power failure happens> The next time the fs is mounted, log tree replay happens and the file "bar" does not exists anymore. A file with the name "foo" exists and it matches the second file we created. Another related problem that does not involve file/data loss is when a new inode is created with the name of a deleted snapshot and we fsync it: mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt mkdir /mnt/testdir btrfs subvolume snapshot /mnt /mnt/testdir/snap btrfs subvolume delete /mnt/testdir/snap rmdir /mnt/testdir mkdir /mnt/testdir xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir <power failure> The next time the fs is mounted the log replay procedure fails because it attempts to delete the snapshot entry (which has dir item key type of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry, resulting in the following error that causes mount to fail: [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257 [52174.512570] ------------[ cut here ]------------ [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]() [52174.514681] BTRFS: Transaction aborted (error -2) [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G W 4.5.0-rc6-btrfs-next-27+ #1 [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [52174.524053] 0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758 [52174.524053] 0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd [52174.524053] 00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88 [52174.524053] Call Trace: [52174.524053] [<ffffffff81264e93>] dump_stack+0x67/0x90 [52174.524053] [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2 [52174.524053] [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs] [52174.524053] [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50 [52174.524053] [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs] [52174.524053] [<ffffffff8118f5e9>] ? iput+0xb0/0x284 [52174.524053] [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs] [52174.524053] [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs] [52174.524053] [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs] [52174.524053] [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs] [52174.524053] [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs] [52174.524053] [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs] [52174.524053] [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs] [52174.524053] [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs] [52174.524053] [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs] [52174.524053] [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf [52174.524053] [<ffffffff8117bafa>] mount_fs+0x67/0x131 [52174.524053] [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde [52174.524053] [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs] [52174.524053] [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf [52174.524053] [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3 [52174.524053] [<ffffffff8117bafa>] mount_fs+0x67/0x131 [52174.524053] [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde [52174.524053] [<ffffffff8119590f>] do_mount+0x8a6/0x9e8 [52174.524053] [<ffffffff811358dd>] ? strndup_user+0x3f/0x59 [52174.524053] [<ffffffff81195c65>] SyS_mount+0x77/0x9f [52174.524053] [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]--- Fix this by forcing a transaction commit when such cases happen. This means we check in the commit root of the subvolume tree if there was any other inode with the same reference when the inode we are fsync'ing is a new inode (created in the current transaction). Test cases for fstests, covering all the scenarios given above, were submitted upstream for fstests: * fstests: generic test for fsync after renaming directory https://patchwork.kernel.org/patch/8694281/ * fstests: generic test for fsync after renaming file https://patchwork.kernel.org/patch/8694301/ * fstests: add btrfs test for fsync after snapshot deletion https://patchwork.kernel.org/patch/8670671/ Cc: stable@vger.kernel.org Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Filipe Manana authored
[ Upstream commit a89ca6f2 ] When we have the no_holes feature enabled, if a we truncate a file to a smaller size, truncate it again but to a size greater than or equals to its original size and fsync it, the log tree will not have any information about the hole covering the range [truncate_1_offset, new_file_size[. Which means if the fsync log is replayed, the file will remain with the state it had before both truncate operations. Without the no_holes feature this does not happen, since when the inode is logged (full sync flag is set) it will find in the fs/subvol tree a leaf with a generation matching the current transaction id that has an explicit extent item representing the hole. Fix this by adding an explicit extent item representing a hole between the last extent and the inode's i_size if we are doing a full sync. The issue is easy to reproduce with the following test case for fstests: . ./common/rc . ./common/filter . ./common/dmflakey _need_to_be_root _supported_fs generic _supported_os Linux _require_scratch _require_dm_flakey # This test was motivated by an issue found in btrfs when the btrfs # no-holes feature is enabled (introduced in kernel 3.14). So enable # the feature if the fs being tested is btrfs. if [ $FSTYP == "btrfs" ]; then _require_btrfs_fs_feature "no_holes" _require_btrfs_mkfs_feature "no-holes" MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes" fi rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey # Create our test files and make sure everything is durably persisted. $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \ -c "pwrite -S 0xbb 64K 61K" \ $SCRATCH_MNT/foo | _filter_xfs_io $XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \ -c "pwrite -S 0xff 64K 61K" \ $SCRATCH_MNT/bar | _filter_xfs_io sync # Now truncate our file foo to a smaller size (64Kb) and then truncate # it to the size it had before the shrinking truncate (125Kb). Then # fsync our file. If a power failure happens after the fsync, we expect # our file to have a size of 125Kb, with the first 64Kb of data having # the value 0xaa and the second 61Kb of data having the value 0x00. $XFS_IO_PROG -c "truncate 64K" \ -c "truncate 125K" \ -c "fsync" \ $SCRATCH_MNT/foo # Do something similar to our file bar, but the first truncation sets # the file size to 0 and the second truncation expands the size to the # double of what it was initially. $XFS_IO_PROG -c "truncate 0" \ -c "truncate 253K" \ -c "fsync" \ $SCRATCH_MNT/bar _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey # Allow writes again, mount to trigger log replay and validate file # contents. _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # We expect foo to have a size of 125Kb, the first 64Kb of data all # having the value 0xaa and the remaining 61Kb to be a hole (all bytes # with value 0x00). echo "File foo content after log replay:" od -t x1 $SCRATCH_MNT/foo # We expect bar to have a size of 253Kb and no extents (any byte read # from bar has the value 0x00). echo "File bar content after log replay:" od -t x1 $SCRATCH_MNT/bar status=0 exit The expected file contents in the golden output are: File foo content after log replay: 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa * 0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0372000 File bar content after log replay: 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0772000 Without this fix, their contents are: File foo content after log replay: 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa * 0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb * 0372000 File bar content after log replay: 0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee * 0200000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff * 0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0772000 A test case submission for fstests follows soon. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Filipe Manana authored
[ Upstream commit 36283bf7 ] After commit 4f764e51 ("Btrfs: remove deleted xattrs on fsync log replay"), we can end up in a situation where during log replay we end up deleting xattrs that were never deleted when their file was last fsynced. This happens in the fast fsync path (flag BTRFS_INODE_NEEDS_FULL_SYNC is not set in the inode) if the inode has the flag BTRFS_INODE_COPY_EVERYTHING set, the xattr was added in a past transaction and the leaf where the xattr is located was not updated (COWed or created) in the current transaction. In this scenario the xattr item never ends up in the log tree and therefore at log replay time, which makes the replay code delete the xattr from the fs/subvol tree as it thinks that xattr was deleted prior to the last fsync. Fix this by always logging all xattrs, which is the simplest and most reliable way to detect deleted xattrs and replay the deletes at log replay time. This issue is reproducible with the following test case for fstests: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" here=`pwd` tmp=/tmp/$$ status=1 # failure is the default! _cleanup() { _cleanup_flakey rm -f $tmp.* } trap "_cleanup; exit \$status" 0 1 2 3 15 # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey . ./common/attr # real QA test starts here # We create a lot of xattrs for a single file. Only btrfs and xfs are currently # able to store such a large mount of xattrs per file, other filesystems such # as ext3/4 and f2fs for example, fail with ENOSPC even if we attempt to add # less than 1000 xattrs with very small values. _supported_fs btrfs xfs _supported_os Linux _need_to_be_root _require_scratch _require_dm_flakey _require_attrs _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs >> $seqres.full 2>&1 _init_flakey _mount_flakey # Create the test file with some initial data and make sure everything is # durably persisted. $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" $SCRATCH_MNT/foo | _filter_xfs_io sync # Add many small xattrs to our file. # We create such a large amount because it's needed to trigger the issue found # in btrfs - we need to have an amount that causes the fs to have at least 3 # btree leafs with xattrs stored in them, and it must work on any leaf size # (maximum leaf/node size is 64Kb). num_xattrs=2000 for ((i = 1; i <= $num_xattrs; i++)); do name="user.attr_$(printf "%04d" $i)" $SETFATTR_PROG -n $name -v "val_$(printf "%04d" $i)" $SCRATCH_MNT/foo done # Sync the filesystem to force a commit of the current btrfs transaction, this # is a necessary condition to trigger the bug on btrfs. sync # Now update our file's data and fsync the file. # After a successful fsync, if the fsync log/journal is replayed we expect to # see all the xattrs we added before with the same values (and the updated file # data of course). Btrfs used to delete some of these xattrs when it replayed # its fsync log/journal. $XFS_IO_PROG -c "pwrite -S 0xbb 8K 16K" \ -c "fsync" \ $SCRATCH_MNT/foo | _filter_xfs_io # Simulate a crash/power loss. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey # Allow writes again and mount. This makes the fs replay its fsync log. _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey echo "File content after crash and log replay:" od -t x1 $SCRATCH_MNT/foo echo "File xattrs after crash and log replay:" for ((i = 1; i <= $num_xattrs; i++)); do name="user.attr_$(printf "%04d" $i)" echo -n "$name=" $GETFATTR_PROG --absolute-names -n $name --only-values $SCRATCH_MNT/foo echo done status=0 exit The golden output expects all xattrs to be available, and with the correct values, after the fsync log is replayed. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Jerome Marchand authored
[ Upstream commit 8d4a2ec1 ] Changes since V1: fixed the description and added KASan warning. In assoc_array_insert_into_terminal_node(), we call the compare_object() method on all non-empty slots, even when they're not leaves, passing a pointer to an unexpected structure to compare_object(). Currently it causes an out-of-bound read access in keyring_compare_object detected by KASan (see below). The issue is easily reproduced with keyutils testsuite. Only call compare_object() when the slot is a leave. KASan warning: ================================================================== BUG: KASAN: slab-out-of-bounds in keyring_compare_object+0x213/0x240 at addr ffff880060a6f838 Read of size 8 by task keyctl/1655 ============================================================================= BUG kmalloc-192 (Not tainted): kasan: bad access detected ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Allocated in assoc_array_insert+0xfd0/0x3a60 age=69 cpu=1 pid=1647 ___slab_alloc+0x563/0x5c0 __slab_alloc+0x51/0x90 kmem_cache_alloc_trace+0x263/0x300 assoc_array_insert+0xfd0/0x3a60 __key_link_begin+0xfc/0x270 key_create_or_update+0x459/0xaf0 SyS_add_key+0x1ba/0x350 entry_SYSCALL_64_fastpath+0x12/0x76 INFO: Slab 0xffffea0001829b80 objects=16 used=8 fp=0xffff880060a6f550 flags=0x3fff8000004080 INFO: Object 0xffff880060a6f740 @offset=5952 fp=0xffff880060a6e5d1 Bytes b4 ffff880060a6f730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f740: d1 e5 a6 60 00 88 ff ff 0e 00 00 00 00 00 00 00 ...`............ Object ffff880060a6f750: 02 cf 8e 60 00 88 ff ff 02 c0 8e 60 00 88 ff ff ...`.......`.... Object ffff880060a6f760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f7c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f7d0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f7e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffff880060a6f7f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 0 PID: 1655 Comm: keyctl Tainted: G B 4.5.0-rc4-kasan+ #291 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000000 000000001b2800b4 ffff880060a179e0 ffffffff81b60491 ffff88006c802900 ffff880060a6f740 ffff880060a17a10 ffffffff815e2969 ffff88006c802900 ffffea0001829b80 ffff880060a6f740 ffff880060a6e650 Call Trace: [<ffffffff81b60491>] dump_stack+0x85/0xc4 [<ffffffff815e2969>] print_trailer+0xf9/0x150 [<ffffffff815e9454>] object_err+0x34/0x40 [<ffffffff815ebe50>] kasan_report_error+0x230/0x550 [<ffffffff819949be>] ? keyring_get_key_chunk+0x13e/0x210 [<ffffffff815ec62d>] __asan_report_load_n_noabort+0x5d/0x70 [<ffffffff81994cc3>] ? keyring_compare_object+0x213/0x240 [<ffffffff81994cc3>] keyring_compare_object+0x213/0x240 [<ffffffff81bc238c>] assoc_array_insert+0x86c/0x3a60 [<ffffffff81bc1b20>] ? assoc_array_cancel_edit+0x70/0x70 [<ffffffff8199797d>] ? __key_link_begin+0x20d/0x270 [<ffffffff8199786c>] __key_link_begin+0xfc/0x270 [<ffffffff81993389>] key_create_or_update+0x459/0xaf0 [<ffffffff8128ce0d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81992f30>] ? key_type_lookup+0xc0/0xc0 [<ffffffff8199e19d>] ? lookup_user_key+0x13d/0xcd0 [<ffffffff81534763>] ? memdup_user+0x53/0x80 [<ffffffff819983ea>] SyS_add_key+0x1ba/0x350 [<ffffffff81998230>] ? key_get_type_from_user.constprop.6+0xa0/0xa0 [<ffffffff828bcf4e>] ? retint_user+0x18/0x23 [<ffffffff8128cc7e>] ? trace_hardirqs_on_caller+0x3fe/0x580 [<ffffffff81004017>] ? trace_hardirqs_on_thunk+0x17/0x19 [<ffffffff828bc432>] entry_SYSCALL_64_fastpath+0x12/0x76 Memory state around the buggy address: ffff880060a6f700: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ffff880060a6f780: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc >ffff880060a6f800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880060a6f880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880060a6f900: fc fc fc fc fc fc 00 00 00 00 00 00 00 00 00 00 ================================================================== Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Dennis Kadioglu authored
[ Upstream commit b4203ff5 ] Plantronics BT300 does not support reading the sample rate which leads to many lines of "cannot get freq at ep 0x1". This patch adds the USB ID of the BT300 to quirks.c and avoids those error messages. Signed-off-by: Dennis Kadioglu <denk@post.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David Disseldorp authored
[ Upstream commit 2224d879 ] As of 5a60e876, RBD object request allocations are made via rbd_obj_request_create() with GFP_NOIO. However, subsequent OSD request allocations in rbd_osd_req_create*() use GFP_ATOMIC. With heavy page cache usage (e.g. OSDs running on same host as krbd client), rbd_osd_req_create() order-1 GFP_ATOMIC allocations have been observed to fail, where direct reclaim would have allowed GFP_NOIO allocations to succeed. Cc: stable@vger.kernel.org # 3.18+ Suggested-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Neil Brown <neilb@suse.com> Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Paolo Bonzini authored
[ Upstream commit 95272c29 ] -ftracer can duplicate asm blocks causing compilation to fail in noclone functions. For example, KVM declares a global variable in an asm like asm("2: ... \n .pushsection data \n .global vmx_return \n vmx_return: .long 2b"); and -ftracer causes a double declaration. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: stable@vger.kernel.org Cc: kvm@vger.kernel.org Reported-by: Linda Walsh <lkml@tlinx.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Joe Perches authored
[ Upstream commit cb984d10 ] As gcc major version numbers are going to advance rather rapidly in the future, there's no real value in separate files for each compiler version. Deduplicate some of the macros #defined in each file too. Neaten comments using normal kernel commenting style. Signed-off-by: Joe Perches <joe@perches.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Alan Modra <amodra@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Johannes Berg authored
[ Upstream commit 62b14b24 ] The original hand-implemented hash-table in mac80211 couldn't result in insertion errors, and while converting to rhashtable I evidently forgot to check the errors. This surfaced now only because Ben is adding many identical keys and that resulted in hidden insertion errors. Cc: stable@vger.kernel.org Fixes: 7bedd0cf ("mac80211: use rhashtable for station table") Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Lyude authored
[ Upstream commit 9e60290d ] After unplugging a DP MST display from the system, we have to go through and destroy all of the DRM connectors associated with it since none of them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector() doesn't do a good enough job of ensuring that throughout the destruction process that no modesettings can be done with the connectors. As it is right now, intel_dp_destroy_mst_connector() works like this: * Take all modeset locks * Clear the configuration of the crtc on the connector, if there is one * Drop all modeset locks, this is required because of circular dependency issues that arise with trying to remove the connector from sysfs with modeset locks held * Unregister the connector * Take all modeset locks, again * Do the rest of the required cleaning for destroying the connector * Finally drop all modeset locks for good This only works sometimes. During the destruction process, it's very possible that a userspace application will attempt to do a modesetting using the connector. When we drop the modeset locks, an ioctl handler such as drm_mode_setcrtc has the oppurtunity to take all of the modeset locks from us. When this happens, one thing leads to another and eventually we end up committing a mode with the non-existent connector: [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f [drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi And in some cases, such as with the T460s using an MST dock, this results in breaking modesetting and/or panicking the system. To work around this, we now unregister the connector at the very beginning of intel_dp_destroy_mst_connector(), grab all the modesetting locks, and then hold them until we finish the rest of the function. CC: stable@vger.kernel.org Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Rob Clark <rclark@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1458155884-13877-1-git-send-email-cpaul@redhat.com (cherry picked from commit 1f771755) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Maarten Lankhorst authored
[ Upstream commit 20fae983 ] Fully remove the MST connector from the atomic state, and remove the early returns in check_*_state for MST connectors. With atomic the state can be made consistent all the time. Thanks to Sivakumar Thulasimani for the idea of using drm_atomic_helper_set_config. Changes since v1: - Remove the MST check in intel_connector_check_state too. Changes since v2: - Use drm_atomic_helper_set_config. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Andy Shevchenko authored
[ Upstream commit 4f4bc0ab ] There is a typo in documentation regarding to descriptor empty bit (DESCE) which is set to 1 when descriptor is empty. Thus, status register at the end of a transfer usually returns all DESCE bits set and thus it will never be zero. Moreover, there are 2 bits (CDESC) that encode current descriptor, on which interrupt has been asserted. In case when we have few descriptors programmed we might have non-zero value. Remove DESCE and CDESC bits from DMA channel status register (HSU_CH_SR) when reading it. Fixes: 2b49e0c5 ("dmaengine: append hsu DMA driver") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Yoshihiro Shimoda authored
[ Upstream commit 4fccb076 ] This patch fixes an issue that usbhsg_queue_done() may cause kernel panic when dma callback is running and usb_ep_disable() is called by interrupt handler. (Especially, we can reproduce this issue using g_audio with usb-dmac driver.) For example of a flow: usbhsf_dma_complete (on tasklet) --> usbhsf_pkt_handler (on tasklet) --> usbhsg_queue_done (on tasklet) *** interrupt happened and usb_ep_disable() is called *** --> usbhsg_queue_pop (on tasklet) Then, oops happened. Fixes: e73a9891 ("usb: renesas_usbhs: add DMAEngine support") Cc: <stable@vger.kernel.org> # v3.1+ Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Boris Ostrovsky authored
[ Upstream commit ff1e22e7 ] Moving an unmasked irq may result in irq handler being invoked on both source and target CPUs. With 2-level this can happen as follows: On source CPU: evtchn_2l_handle_events() -> generic_handle_irq() -> handle_edge_irq() -> eoi_pirq(): irq_move_irq(data); /***** WE ARE HERE *****/ if (VALID_EVTCHN(evtchn)) clear_evtchn(evtchn); If at this moment target processor is handling an unrelated event in evtchn_2l_handle_events()'s loop it may pick up our event since target's cpu_evtchn_mask claims that this event belongs to it *and* the event is unmasked and still pending. At the same time, source CPU will continue executing its own handle_edge_irq(). With FIFO interrupt the scenario is similar: irq_move_irq() may result in a EVTCHNOP_unmask hypercall which, in turn, may make the event pending on the target CPU. We can avoid this situation by moving and clearing the event while keeping event masked. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Takashi Iwai authored
[ Upstream commit f03b24a8 ] Phoenix Audio TMX320 gives the similar error when the sample rate is asked: usb 2-1.3: 2:1: cannot get freq at ep 0x85 usb 2-1.3: 1:1: cannot get freq at ep 0x2 .... Add the corresponding USB-device ID (1de7:0014) to snd_usb_get_sample_rate_quirk() list. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=110221 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Theodore Ts'o authored
[ Upstream commit c325a67c ] Previously, ext4 would fail the mount if the file system had the quota feature enabled and quota mount options (used for the older quota setups) were present. This broke xfstests, since xfs silently ignores the usrquote and grpquota mount options if they are specified. This commit changes things so that we are consistent with xfs; having the mount options specified is harmless, so no sense break users by forbidding them. Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Yuki Shibuya authored
[ Upstream commit 321c5658 ] Non maskable interrupts (NMI) are preferred to interrupts in current implementation. If a NMI is pending and NMI is blocked by the result of nmi_allowed(), pending interrupt is not injected and enable_irq_window() is not executed, even if interrupts injection is allowed. In old kernel (e.g. 2.6.32), schedule() is often called in NMI context. In this case, interrupts are needed to execute iret that intends end of NMI. The flag of blocking new NMI is not cleared until the guest execute the iret, and interrupts are blocked by pending NMI. Due to this, iret can't be invoked in the guest, and the guest is starved until block is cleared by some events (e.g. canceling injection). This patch injects pending interrupts, when it's allowed, even if NMI is blocked. And, If an interrupts is pending after executing inject_pending_event(), enable_irq_window() is executed regardless of NMI pending counter. Cc: stable@vger.kernel.org Signed-off-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Theodore Ts'o authored
[ Upstream commit daf647d2 ] With the internal Quota feature, mke2fs creates empty quota inodes and quota usage tracking is enabled as soon as the file system is mounted. Since quotacheck is no longer preallocating all of the blocks in the quota inode that are likely needed to be written to, we are now seeing a lockdep false positive caused by needing to allocate a quota block from inside ext4_map_blocks(), while holding i_data_sem for a data inode. This results in this complaint: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); Google-Bug-Id: 27907753 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Martin K. Petersen authored
[ Upstream commit f08bb1e0 ] During revalidate we check whether device capacity has changed before we decide whether to output disk information or not. The check for old capacity failed to take into account that we scaled sdkp->capacity based on the reported logical block size. And therefore the capacity test would always fail for devices with sectors bigger than 512 bytes and we would print several copies of the same discovery information. Avoid scaling sdkp->capacity and instead adjust the value on the fly when setting the block device capacity and generating fake C/H/S geometry. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: <stable@vger.kernel.org> Reported-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinicke <hare@suse.de> Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Mauro Carvalho Chehab authored
[ Upstream commit e8e3039f ] The au0828 dev_state is actually a bit mask. It should not be checking with "==" but, instead, with a logic and. There are some places where it was doing it wrong. Fix that by replacing the dev_state set/clear/test with the bitops. As reviewed by Shuah: "Looks good. Tested running bind/unbind au0828 loop for 1000 times. Didn't see any problems and the v4l2_querycap() problem has been fixed with this patch. After the above test, ran bind/unbind snd_usb_audio 1000 times. Didn't see any problems. Generated media graph and the graph looks good." Cc: stable@vger.kernel.org Reviewed-by: Shuah Khan <shuahkh@osg.samsung.com> Tested-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Shuah Khan authored
[ Upstream commit ed940cd2 ] au0828_v4l2_close() check for dev_state == DEV_DISCONNECTED will fail to detect the device disconnected state correctly, if au0828_v4l2_open() runs to set the DEV_INITIALIZED bit. A loop test of bind/unbind found this bug by increasing the likelihood of au0828_v4l2_open() occurring while unbind is in progress. When au0828_v4l2_close() fails to detect that the device is in disconnect state, it attempts to power down the device and fails with the following general protection fault: [ 260.992962] Call Trace: [ 260.993008] [<ffffffffa0f80f0f>] ? xc5000_sleep+0x8f/0xd0 [xc5000] [ 260.993095] [<ffffffffa0f6803c>] ? fe_standby+0x3c/0x50 [tuner] [ 260.993186] [<ffffffffa0ef541c>] au0828_v4l2_close+0x53c/0x620 [au0828] [ 260.993298] [<ffffffffa0d08ec0>] v4l2_release+0xf0/0x210 [videodev] [ 260.993382] [<ffffffff81570f9c>] __fput+0x1fc/0x6c0 [ 260.993449] [<ffffffff815714ce>] ____fput+0xe/0x10 [ 260.993519] [<ffffffff8116eb83>] task_work_run+0x133/0x1f0 [ 260.993602] [<ffffffff810035d0>] exit_to_usermode_loop+0x140/0x170 [ 260.993681] [<ffffffff810061ca>] syscall_return_slowpath+0x16a/0x1a0 [ 260.993754] [<ffffffff82835fb3>] entry_SYSCALL_64_fastpath+0xa6/0xa8 Cc: stable@vger.kernel.org Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Oliver Neukum authored
[ Upstream commit 5a07975a ] The driver can be crashed with devices that expose crafted descriptors with too few endpoints. See: http://seclists.org/bugtraq/2016/Mar/61Signed-off-by: Oliver Neukum <ONeukum@suse.com> [johan: fix OOB endpoint check and add error messages ] Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Oliver Neukum authored
[ Upstream commit c55aee1b ] An attack using missing endpoints exists. CVE-2016-3137 Signed-off-by: Oliver Neukum <ONeukum@suse.com> CC: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Oliver Neukum authored
[ Upstream commit 4e9a0b05 ] An attack using the lack of sanity checking in probe is known. This patch checks for the existence of a second port. CVE-2016-3136 Signed-off-by: Oliver Neukum <ONeukum@suse.com> CC: stable@vger.kernel.org [johan: add error message ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
John Keeping authored
[ Upstream commit d59a1f71 ] The SPICE protocol considers the position of a cursor to be the location of its active pixel on the display, so the cursor is drawn with its top-left corner at "(x - hot_spot_x, y - hot_spot_y)" but the DRM cursor position gives the location where the top-left corner should be drawn, with the hotspot being a hint for drivers that need it. This fixes the location of the window resize cursors when using Fluxbox with the QXL DRM driver and both the QXL and modesetting X drivers. Signed-off-by: John Keeping <john@metanate.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Link: http://patchwork.freedesktop.org/patch/msgid/1447845445-2116-1-git-send-email-john@metanate.comSigned-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Yoshihiro Shimoda authored
[ Upstream commit 6490865c ] This patch adds a code to surely disable TX IRQ of the pipe before starting TX DMAC transfer. Otherwise, a lot of unnecessary TX IRQs may happen in rare cases when DMAC is used. Fixes: e73a9891 ("usb: renesas_usbhs: add DMAEngine support") Cc: <stable@vger.kernel.org> # v3.1+ Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Yoshihiro Shimoda authored
[ Upstream commit 894f2fc4 ] When unexpected situation happened (e.g. tx/rx irq happened while DMAC is used), the usbhsf_pkt_handler() was possible to cause NULL pointer dereference like the followings: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c0004000 [00000000] *pgd=00000000 Internal error: Oops: 80000007 [#1] SMP ARM Modules linked in: usb_f_acm u_serial g_serial libcomposite CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.5.0-rc6-00842-gac57066-dirty #63 Hardware name: Generic R8A7790 (Flattened Device Tree) task: c0729c00 ti: c0724000 task.ti: c0724000 PC is at 0x0 LR is at usbhsf_pkt_handler+0xac/0x118 pc : [<00000000>] lr : [<c03257e0>] psr: 60000193 sp : c0725db8 ip : 00000000 fp : c0725df4 r10: 00000001 r9 : 00000193 r8 : ef3ccab4 r7 : ef3cca10 r6 : eea4586c r5 : 00000000 r4 : ef19ceb4 r3 : 00000000 r2 : 0000009c r1 : c0725dc4 r0 : ef19ceb4 This patch adds a condition to avoid the dereference. Fixes: e73a9891 ("usb: renesas_usbhs: add DMAEngine support") Cc: <stable@vger.kernel.org> # v3.1+ Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Lokesh Vutla authored
[ Upstream commit 3ca4a238 ] Commit 127500cc ("ARM: OMAP2+: Only write the sysconfig on idle when necessary") talks about verification of sysconfig cache value before updating it, only during idle path. But the patch is adding the verification in the enable path. So, adding the check in a proper place as per the commit description. Not keeping this check during enable path as there is a chance of losing context and it is safe to do on idle as the context of the register will never be lost while the device is active. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Tero Kristo <t-kristo@ti.com> Cc: Jon Hunter <jonathanh@nvidia.com> Cc: <stable@vger.kernel.org> # 3.12+ Fixes: commit 127500cc "ARM: OMAP2+: Only write the sysconfig on idle when necessary" [paul@pwsan.com: appears to have been caused by my own mismerge of the originally posted patch] Signed-off-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
- 19 Apr, 2016 2 commits
-
-
Alan Stern authored
[ Upstream commit 972e6a99 ] The usbhid driver has inconsistently duplicated code in its post-reset, resume, and reset-resume pathways. reset-resume doesn't check HID_STARTED before trying to restart the I/O queues. resume fails to clear the HID_SUSPENDED flag if HID_STARTED isn't set. resume calls usbhid_restart_queues() with usbhid->lock held and the others call it without holding the lock. The first item in particular causes a problem following a reset-resume if the driver hasn't started up its I/O. URB submission fails because usbhid->urbin is NULL, and this triggers an unending reset-retry loop. This patch fixes the problem by creating a new subroutine, hid_restart_io(), to carry out all the common activities. It also adds some checks that were missing in the original code: After a reset, there's no need to clear any halted endpoints. After a resume, if a reset is pending there's no need to restart any I/O until the reset is finished. After a resume, if the interrupt-IN endpoint is halted there's no need to submit the input URB until the halt has been cleared. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Daniel Fraga <fragabr@gmail.com> Tested-by: Daniel Fraga <fragabr@gmail.com> CC: <stable@vger.kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Sasha Levin authored
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
- 18 Apr, 2016 8 commits
-
-
Hyungwon Hwang authored
[ Upstream commit 023d8218 ] The commit [bd481285: ALSA: hda - Fix forgotten HDMI monitor_present update] covered the missing update of monitor_present flag, but this caused a regression for devices without the i915 eld notifier. Since the old code supposed that pin_eld->monitor_present was updated by the caller side, the hdmi_present_sense_via_verbs() doesn't update the temporary eld->monitor_present but only pin_eld->monitor_present, which is now overridden in update_eld(). The fix is to update pin_eld->monitor_present as well before calling update_eld(). Note that this may still leave monitor_present flag in an inconsistent state when the driver repolls, but this is at least the old behavior. More proper fix will follow in the later patch. Fixes: bd481285 ('ALSA: hda - Fix forgotten HDMI monitor_present update') Signed-off-by: Hyungwon Hwang <hyungwon.hwang7@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Cyrille Pitchen authored
[ Upstream commit 9b52d55f ] The change fixes potential oops while accessing iomem on invalid address, if devm_ioremap_resource() fails due to some reason. The devm_ioremap_resource() function returns ERR_PTR() and never returns NULL, which makes useless a following check for NULL. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Fixes: b0e8b341 ("crypto: atmel - use devm_xxx() managed function") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
dann frazier authored
[ Upstream commit 67dfa175 ] GCC6 (and Linaro's 2015.12 snapshot of GCC5) has a new default that uses adrp/ldr or adrp/add to address literal pools. When CONFIG_ARM64_ERRATUM_843419 is enabled, modules built with this toolchain fail to load: module libahci: unsupported RELA relocation: 275 This patch fixes the problem by passing '-mpc-relative-literal-loads' to the compiler. Cc: stable@vger.kernel.org Fixes: df057cc7 ("arm64: errata: add module build workaround for erratum #843419") BugLink: http://bugs.launchpad.net/bugs/1533009Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Christophe Lyon <christophe.lyon@linaro.org> Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Vlastimil Babka authored
[ Upstream commit d9dddbf5 ] Hanjun Guo has reported that a CMA stress test causes broken accounting of CMA and free pages: > Before the test, I got: > -bash-4.3# cat /proc/meminfo | grep Cma > CmaTotal: 204800 kB > CmaFree: 195044 kB > > > After running the test: > -bash-4.3# cat /proc/meminfo | grep Cma > CmaTotal: 204800 kB > CmaFree: 6602584 kB > > So the freed CMA memory is more than total.. > > Also the the MemFree is more than mem total: > > -bash-4.3# cat /proc/meminfo > MemTotal: 16342016 kB > MemFree: 22367268 kB > MemAvailable: 22370528 kB Laura Abbott has confirmed the issue and suspected the freepage accounting rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA pageblocks: > CMA isolates MAX_ORDER aligned blocks, but, during the process, > partialy isolated block exists. If MAX_ORDER is 11 and > pageblock_order is 9, two pageblocks make up MAX_ORDER > aligned block and I can think following scenario because pageblock > (un)isolation would be done one by one. > > (each character means one pageblock. 'C', 'I' means MIGRATE_CMA, > MIGRATE_ISOLATE, respectively. > > CC -> IC -> II (Isolation) > II -> CI -> CC (Un-isolation) > > If some pages are freed at this intermediate state such as IC or CI, > that page could be merged to the other page that is resident on > different type of pageblock and it will cause wrong freepage count. This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but since it doesn't hold the zone->lock between pageblocks, a race window does exist. It's also likely that unexpected merging can occur between MIGRATE_ISOLATE and non-CMA pageblocks. This should be prevented in __free_one_page() since commit 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock"). However, we only check the migratetype of the pageblock where buddy merging has been initiated, not the migratetype of the buddy pageblock (or group of pageblocks) which can be MIGRATE_ISOLATE. Joonsoo has suggested checking for buddy migratetype as part of page_is_buddy(), but that would add extra checks in allocator hotpath and bloat-o-meter has shown significant code bloat (the function is inline). This patch reduces the bloat at some expense of more complicated code. The buddy-merging while-loop in __free_one_page() is initially bounded to pageblock_border and without any migratetype checks. The checks are placed outside, bumping the max_order if merging is allowed, and returning to the while-loop with a statement which can't be possibly considered harmful. This fixes the accounting bug and also removes the arguably weird state in the original commit 3c605096 where buddies could be left unmerged. Fixes: 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Link: https://lkml.org/lkml/2016/3/2/280Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Hanjun Guo <guohanjun@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Debugged-by: Laura Abbott <labbott@redhat.com> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Kirill A. Shutemov authored
[ Upstream commit d00181b9 ] Let's try to be consistent about data type of page order. [sfr@canb.auug.org.au: fix build (type of pageblock_order)] [hughd@google.com: some configs end up with MAX_ORDER and pageblock_order having different types] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Mel Gorman authored
[ Upstream commit d70ddd7a ] __free_pages_bootmem prepares a page for release to the buddy allocator and assumes that the struct page is initialised. Parallel initialisation of struct pages defers initialisation and __free_pages_bootmem can be called for struct pages that cannot yet map struct page to PFN. This patch passes PFN to __free_pages_bootmem with no other functional change. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Nate Zimmer <nzimmer@sgi.com> Tested-by: Waiman Long <waiman.long@hp.com> Tested-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Joseph Qi authored
[ Upstream commit be12b299 ] When master handles convert request, it queues ast first and then returns status. This may happen that the ast is sent before the request status because the above two messages are sent by two threads. And right after the ast is sent, if master down, it may trigger BUG in dlm_move_lockres_to_recovery_list in the requested node because ast handler moves it to grant list without clear lock->convert_pending. So remove BUG_ON statement and check if the ast is processed in dlmconvert_remote. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Joseph Qi authored
[ Upstream commit ac7cf246 ] There is a race window between dlmconvert_remote and dlm_move_lockres_to_recovery_list, which will cause a lock with OCFS2_LOCK_BUSY in grant list, thus system hangs. dlmconvert_remote { spin_lock(&res->spinlock); list_move_tail(&lock->list, &res->converting); lock->convert_pending = 1; spin_unlock(&res->spinlock); status = dlm_send_remote_convert_request(); >>>>>> race window, master has queued ast and return DLM_NORMAL, and then down before sending ast. this node detects master down and calls dlm_move_lockres_to_recovery_list, which will revert the lock to grant list. Then OCFS2_LOCK_BUSY won't be cleared as new master won't send ast any more because it thinks already be authorized. spin_lock(&res->spinlock); lock->convert_pending = 0; if (status != DLM_NORMAL) dlm_revert_pending_convert(res, lock); spin_unlock(&res->spinlock); } In this case, check if res->state has DLM_LOCK_RES_RECOVERING bit set (res is still in recovering) or res master changed (new master has finished recovery), reset the status to DLM_RECOVERING, then it will retry convert. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-