- 09 Jan, 2015 40 commits
-
-
Cong Wang authored
Since f660daac (oom: thaw threads if oom killed thread is frozen before deferring) OOM killer relies on being able to thaw a frozen task to handle OOM situation but a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE) has reorganized the code and stopped clearing freeze flag in __thaw_task. This means that the target task only wakes up and goes into the fridge again because the freezing condition hasn't changed for it. This reintroduces the bug fixed by f660daac. Fix the issue by checking for TIF_MEMDIE thread flag in freezing_slow_path and exclude the task from freezing completely. If a task was already frozen it would get woken by __thaw_task from OOM killer and get out of freezer after rechecking freezing(). Changes since v1 - put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator as per Oleg - return __thaw_task into oom_scan_process_thread because oom_kill_process will not wake task in the fridge because it is sleeping uninterruptible [mhocko@suse.cz: rewrote the changelog] Fixes: a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE) Cc: 3.3+ <stable@vger.kernel.org> # 3.3+ Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
Michal Hocko <mhocko@suse.cz> Acked-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 51fae6da) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
As any gid mapping will allow and must allow for backwards compatibility dropping groups don't allow any gid mappings to be established without CAP_SETGID in the parent user namespace. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivilged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no affect. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by:
Andy Lutomirski <luto@amacapital.net> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit be7c6dba) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Quentin Casasnovas authored
Fixes commit 2f06fa04 which was an incorrect backported version of commit d6b41cb0 upstream. If val_count is zero we return -EINVAL with map->lock_arg locked, which will deadlock the kernel next time we try to acquire this lock. This was introduced by f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.") which improperly back-ported d6b41cb0. This issue was found during review of Ubuntu Trusty 3.13.0-40.68 kernel to prepare Ksplice rebootless updates. Fixes: f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.") Signed-off-by:
Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 197b3975) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
bob picco authored
The T5 (niagara5) has different PCR related HV fast trap values and a new HV API Group. This patch utilizes these and shares when possible with niagara4. We use the same sparc_pmu niagara4_pmu. Should there be new effort to obtain the MCU perf statistics then this would have to be changed. Cc: sparclinux@vger.kernel.org Signed-off-by:
Bob Picco <bob.picco@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 05aa1651) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Xiubo Li authored
Since we cannot make sure the 'val_count' will always be none zero here, and then if it equals to zero, the kmemdup() will return ZERO_SIZE_PTR, which equals to ((void *)16). So this patch fix this with just doing the zero check before calling kmemdup(). Signed-off-by:
Xiubo Li <Li.Xiubo@freescale.com> Signed-off-by:
Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org (cherry picked from commit d6b41cb0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Bryan O'Donoghue authored
This patch is to enable the USB gadget device for Intel Quark X1000 Signed-off-by:
Bryan O'Donoghue <bryan.odonoghue@intel.com> Signed-off-by:
Bing Niu <bing.niu@intel.com> Signed-off-by:
Alvin (Weike) Chen <alvin.chen@intel.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> (cherry picked from commit a68df706) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
Verify that inode size is sane when loading inode with data stored in ICB. Otherwise we may get confused later when working with the inode and inode size is too big. CC: stable@vger.kernel.org Reported-by:
Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by:
Jan Kara <jack@suse.cz> (cherry picked from commit e159332b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
NeilBrown authored
If two threads call bitmap_unplug at the same time, then one might schedule all the writes, and the other might decide that it doesn't need to wait. But really it does. It rarely hurts to wait when it isn't absolutely necessary, and the current code doesn't really focus on 'absolutely necessary' anyway. So just wait always. This can potentially lead to data corruption if a crash happens at an awkward time and data was written before the bitmap was updated. It is very unlikely, but this should go to -stable just to be safe. Appropriate for any -stable. Signed-off-by:
NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org (please delay until 3.18 is released) (cherry picked from commit 4b5060dd) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Greg Kroah-Hartman authored
This reverts commit 2dbfff81, which really is commit 558e4736 upstream. Sorry for the confusion, this got applied twice, and reverted once, this is the second revert and I hope to never touch it again... Reported-by:
Lv Zheng <lv.zheng@intel.com> Cc: Alexander Mezin <mezin.alexander@gmail.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 1c9e23ba) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Lv Zheng authored
commit 558e4736 upstream. There is platform refusing to respond QR_EC when SCI_EVT isn't set which is Acer Aspire V5-573G. By disallowing QR_EC to be issued before the previous one has been completed we are able to reduce the possibilities to trigger issues on such platforms. Note that this fix can only reduce the occurrence rate of this issue, but this issue may still occur when such a platform doesn't clear SCI_EVT before or immediately after completing the previous QR_EC transaction. This patch cannot fix the CLEAR_ON_RESUME quirk which also relies on the assumption that the platforms are able to respond even when SCI_EVT isn't set. But this patch is still useful as it can help to reduce the number of scheduled QR_EC work items. Link: https://bugzilla.kernel.org/show_bug.cgi?id=82611Reported-and-tested-by:
Alexander Mezin <mezin.alexander@gmail.com> Signed-off-by:
Lv Zheng <lv.zheng@intel.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 2dbfff81)
-
Stanislaw Gruszka authored
X550VB as many others Asus laptops need wapf4 quirk to make RFKILL switch be functional. Otherwise system boots with wireless card disabled and is only possible to enable it by suspend/resume. Bug report: http://bugzilla.redhat.com/show_bug.cgi?id=1089731#c23Reported-and-tested-by:
Vratislav Podzimek <vpodzime@redhat.com> Signed-off-by:
Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by:
Darren Hart <dvhart@linux.intel.com> (cherry picked from commit 4ec7a45b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Filipe Manana authored
When we abort a transaction we iterate over all the ranges marked as dirty in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them from those trees, add them back (unpin) to the free space caches and, if the fs was mounted with "-o discard", perform a discard on those regions. Also, after adding the regions to the free space caches, a fitrim ioctl call can see those ranges in a block group's free space cache and perform a discard on the ranges, so the same issue can happen without "-o discard" as well. This causes corruption, affecting one or multiple btree nodes (in the worst case leaving the fs unmountable) because some of those ranges (the ones in the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are referred by the last committed super block - breaking the rule that anything that was committed by a transaction is untouched until the next transaction commits successfully. I ran into this while running in a loop (for several hours) the fstest that I recently submitted: [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim The corruption always happened when a transaction aborted and then fsck complained like this: _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 read block failed check_tree_block Couldn't open file system In this case 94945280 corresponded to the root of a tree. Using frace what I observed was the following sequence of steps happened: 1) transaction N started, fs_info->pinned_extents pointed to fs_info->freed_extents[0]; 2) node/eb 94945280 is created; 3) eb is persisted to disk; 4) transaction N commit starts, fs_info->pinned_extents now points to fs_info->freed_extents[1], and transaction N completes; 5) transaction N + 1 starts; 6) eb is COWed, and btrfs_free_tree_block() called for this eb; 7) eb range (94945280 to 94945280 + 16Kb) is added to fs_info->pinned_extents (fs_info->freed_extents[1]); 8) Something goes wrong in transaction N + 1, like hitting ENOSPC for example, and the transaction is aborted, turning the fs into readonly mode. The stack trace I got for example: [112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66 [112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98 [112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50 [112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs] [112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs] [112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs] [112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs] (...) [112065.264493] ---[ end trace dd7903a975a31a08 ]--- [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left [112065.264997] BTRFS info (device sdc): forced readonly 9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in fs_info->fs_state and calls btrfs_cleanup_transaction(), which in turn calls btrfs_destroy_pinned_extent(); 10) Then btrfs_destroy_pinned_extent() iterates over all the ranges marked as dirty in fs_info->freed_extents[], and for each one it calls discard, if the fs was mounted with "-o discard", and adds the range to the free space cache of the respective block group; 11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path, sees the free space entries and performs a discard; 12) After an umount and mount (or fsck), our eb's location on disk was full of zeroes, and it should have been untouched, because it was marked as dirty in the fs_info->pinned_extents tree, and therefore used by the trees that the last committed superblock points to. Fix this by not performing a discard and not adding the ranges to the free space caches - it's useless from this point since the fs is now in readonly mode and we won't write free space caches to disk anymore (otherwise we would leak space) nor any new superblock. By not adding the ranges to the free space caches, it prevents other code paths from allocating that space and write to it as well, therefore being safer and simpler. This isn't a new problem, as it's been present since 2011 (git commit acce952b). Cc: stable@vger.kernel.org # any kernel released after 2011-01-06 Signed-off-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
Chris Mason <clm@fb.com> (cherry picked from commit 678886bd) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Josef Bacik authored
We use the modified list to keep track of which extents have been modified so we know which ones are candidates for logging at fsync() time. Newly modified extents are added to the list at modification time, around the same time the ordered extent is created. We do this so that we don't have to wait for ordered extents to complete before we know what we need to log. The problem is when something like this happens log extent 0-4k on inode 1 copy csum for 0-4k from ordered extent into log sync log commit transaction log some other extent on inode 1 ordered extent for 0-4k completes and adds itself onto modified list again log changed extents see ordered extent for 0-4k has already been logged at this point we assume the csum has been copied sync log crash On replay we will see the extent 0-4k in the log, drop the original 0-4k extent which is the same one that we are replaying which also drops the csum, and then we won't find the csum in the log for that bytenr. This of course causes us to have errors about not having csums for certain ranges of our inode. So remove the modified list manipulation in unpin_extent_cache, any modified extents should have been added well before now, and we don't want them re-logged. This fixes my test that I could reliably reproduce this problem with. Thanks, cc: stable@vger.kernel.org Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com> (cherry picked from commit a2804695) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Michael Halcrow authored
Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the end of the allocated buffer during encrypted filename decoding. This fix corrects the issue by getting rid of the unnecessary 0 write when the current bit offset is 2. Signed-off-by:
Michael Halcrow <mhalcrow@google.com> Reported-by:
Dmitry Chernenkov <dmitryc@google.com> Suggested-by:
Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org # v2.6.29+: 51ca58dc eCryptfs: Filename Encryption: Encoding and encryption functions Signed-off-by:
Tyler Hicks <tyhicks@canonical.com> (cherry picked from commit 94208064) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Tyler Hicks authored
The ecryptfs_encrypted_view mount option greatly changes the functionality of an eCryptfs mount. Instead of encrypting and decrypting lower files, it provides a unified view of the encrypted files in the lower filesystem. The presence of the ecryptfs_encrypted_view mount option is intended to force a read-only mount and modifying files is not supported when the feature is in use. See the following commit for more information: e77a56dd [PATCH] eCryptfs: Encrypted passthrough This patch forces the mount to be read-only when the ecryptfs_encrypted_view mount option is specified by setting the MS_RDONLY flag on the superblock. Additionally, this patch removes some broken logic in ecryptfs_open() that attempted to prevent modifications of files when the encrypted view feature was in use. The check in ecryptfs_open() was not sufficient to prevent file modifications using system calls that do not operate on a file descriptor. Signed-off-by:
Tyler Hicks <tyhicks@canonical.com> Reported-by:
Priya Bansal <p.bansal@samsung.com> Cc: stable@vger.kernel.org # v2.6.21+: e77a56dd [PATCH] eCryptfs: Encrypted passthrough (cherry picked from commit 332b122d) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
UDF specification allows arbitrarily large symlinks. However we support only symlinks at most one block large. Check the length of the symlink so that we don't access memory beyond end of the symlink block. CC: stable@vger.kernel.org Reported-by:
Carl Henrik Lunde <chlunde@gmail.com> Signed-off-by:
Jan Kara <jack@suse.cz> (cherry picked from commit a1d47b26) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
Verify that inode size is sane when loading inode with data stored in ICB. Otherwise we may get confused later when working with the inode and inode size is too big. CC: stable@vger.kernel.org Reported-by:
Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by:
Jan Kara <jack@suse.cz> (cherry picked from commit e159332b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Oleg Nesterov authored
alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
"Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 24c037eb) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error return value is then (in most cases) just overwritten before we return. This can result in reporting success to userspace although error happened. This bug was introduced by commit 2e54eb96 ("BKL: Remove BKL from ncpfs"). Propagate the errors correctly. Coverity id: 1226925. Fixes: 2e54eb96 ("BKL: Remove BKL from ncpfs") Signed-off-by:
Jan Kara <jack@suse.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit a682e9c2) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Rabin Vincent authored
If a request is backlogged, it's complete() handler will get called twice: once with -EINPROGRESS, and once with the final error code. af_alg's complete handler, unlike other users, does not handle the -EINPROGRESS but instead always completes the completion that recvmsg() is waiting on. This can lead to a return to user space while the request is still pending in the driver. If userspace closes the sockets before the requests are handled by the driver, this will lead to use-after-frees (and potential crashes) in the kernel due to the tfm having been freed. The crashes can be easily reproduced (for example) by reducing the max queue length in cryptod.c and running the following (from http://www.chronox.de/libkcapi.html) on AES-NI capable hardware: $ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \ -k 00000000000000000000000000000000 \ -p 00000000000000000000000000000000 >/dev/null & done Cc: stable@vger.kernel.org Signed-off-by:
Rabin Vincent <rabin.vincent@axis.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> (cherry picked from commit 7e77bdeb) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivileged setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling setgroups before writing to gid_map. Cc: stable@vger.kernel.org Reviewed-by:
Andy Lutomirski <luto@amacapital.net> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit 66d2f338) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
If you did not create the user namespace and are allowed to write to uid_map or gid_map you should already have the necessary privilege in the parent user namespace to establish any mapping you want so this will not affect userspace in practice. Limiting unprivileged uid mapping establishment to the creator of the user namespace makes it easier to verify all credentials obtained with the uid mapping can be obtained without the uid mapping without privilege. Limiting unprivileged gid mapping establishment (which is temporarily absent) to the creator of the user namespace also ensures that the combination of uid and gid can already be obtained without privilege. This is part of the fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by:
Andy Lutomirski <luto@amacapital.net> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit f95d7918) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
setresuid allows the euid to be set to any of uid, euid, suid, and fsuid. Therefor it is safe to allow an unprivileged user to map their euid and use CAP_SETUID privileged with exactly that uid, as no new credentials can be obtained. I can not find a combination of existing system calls that allows setting uid, euid, suid, and fsuid from the fsuid making the previous use of fsuid for allowing unprivileged mappings a bug. This is part of a fix for CVE-2014-8989. Cc: stable@vger.kernel.org Reviewed-by:
Andy Lutomirski <luto@amacapital.net> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit 80dd00a2) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
The rule is simple. Don't allow anything that wouldn't be allowed without unprivileged mappings. It was previously overlooked that establishing gid mappings would allow dropping groups and potentially gaining permission to files and directories that had lesser permissions for a specific group than for all other users. This is the rule needed to fix CVE-2014-8989 and prevent any other security issues with new_idmap_permitted. The reason for this rule is that the unix permission model is old and there are programs out there somewhere that take advantage of every little corner of it. So allowing a uid or gid mapping to be established without privielge that would allow anything that would not be allowed without that mapping will result in expectations from some code somewhere being violated. Violated expectations about the behavior of the OS is a long way to say a security issue. Cc: stable@vger.kernel.org Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit 0542f17b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount when nodev was implicitly added on mount. Tested-by:
Cedric Bosdonnat <cbosdonnat@suse.com> Tested-by:
Richard Weinberger <richard@nod.at> Cc: stable@vger.kernel.org Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit 3e186641) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andreas Müller authored
As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount" stopped being incremented after the use-after-free fix. Furthermore, the RX-LED will be triggered by every multicast frame (which wouldn't happen before) which wouldn't allow the LED to rest at all. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the patch. Cc: stable@vger.kernel.org Fixes: b8fff407 ("mac80211: fix use-after-free in defragmentation") Signed-off-by:
Andreas Müller <goo@stapelspeicher.org> [rewrite commit message] Signed-off-by:
Johannes Berg <johannes.berg@intel.com> (cherry picked from commit d025933e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Takashi Iwai authored
When loading encrypted-keys module, if the last check of aes_get_sizes() in init_encrypted() fails, the driver just returns an error without unregistering its key type. This results in the stale entry in the list. In addition to memory leaks, this leads to a kernel crash when registering a new key type later. This patch fixes the problem by swapping the calls of aes_get_sizes() and register_key_type(), and releasing resources properly at the error paths. Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163 Cc: <stable@vger.kernel.org> Signed-off-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Mimi Zohar <zohar@linux.vnet.ibm.com> (cherry picked from commit b26bdde5) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
We didn't check length of rock ridge ER records before printing them. Thus corrupted isofs image can cause us to access and print some memory behind the buffer with obvious consequences. Reported-and-tested-by:
Carl Henrik Lunde <chlunde@ping.uio.no> CC: stable@vger.kernel.org Signed-off-by:
Jan Kara <jack@suse.cz> (cherry picked from commit 4e202462) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andy Lutomirski authored
It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e ("x86/tls: Disallow unusual TLS segments") Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # Only if 0e58af4e is backported Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.netSigned-off-by:
Ingo Molnar <mingo@kernel.org> (cherry picked from commit 3fb2f423) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dan Carpenter authored
This function isn't right and it causes a static checker warning: drivers/md/dm-thin.c:3016 maybe_resize_data_dev() error: potentially using uninitialized 'sb_data_size'. It should set "*count" and return zero on success the same as the sm_metadata_get_nr_blocks() function does earlier. Fixes: 3241b1d3 ('dm: add persistent data library') Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Acked-by:
Joe Thornber <ejt@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> (cherry picked from commit c1c6156f) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Darrick J. Wong authored
When dm-bufio sets out to use the bio built into a struct dm_buffer to issue an IO, it needs to call bio_reset after it's done with the bio so that we can free things attached to the bio such as the integrity payload. Therefore, inject our own endio callback to take care of the bio_reset after calling submit_io's end_io callback. Test case: 1. modprobe scsi_debug delay=0 dif=1 dix=199 ato=1 dev_size_mb=300 2. Set up a dm-bufio client, e.g. dm-verity, on the scsi_debug device 3. Repeatedly read metadata and watch kmalloc-192 leak! Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org (cherry picked from commit 445559cd) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Peng Tao authored
nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt early so that it is safe to call .release in case nfs4_alloc_pages fails. Signed-off-by:
Peng Tao <tao.peng@primarydata.com> Fixes: a47970ff ("NFSv4.1: Hold reference to layout hdr in layoutget") Cc: stable@vger.kernel.org # 3.9+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com> (cherry picked from commit 4bd5a980) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Sumit.Saxena@avagotech.com authored
Corrected wait_event() call which was waiting for wrong completion status (0xFF). Signed-off-by:
Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by:
Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by:
Tomas Henzl <thenzl@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> (cherry picked from commit 170c2387) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Baruch Siach authored
Make force_ro consistent with other sysfs entries. Fixes: 371a689f ('mmc: MMC boot partitions support') Cc: Andrei Warkentin <andrey.warkentin@gmail.com> Signed-off-by:
Baruch Siach <baruch@tkos.co.il> Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> (cherry picked from commit 0031a98a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dmitry Eremin-Solenikov authored
Some boards with TC6393XB chip require full state restore during system resume thanks to chip's VCC being cut off during suspend (Sharp SL-6000 tosa is one of them). Failing to do so would result in ohci Oops on resume due to internal memory contentes being changed. Fail ohci suspend on tc6393xb is full state restore is required. Recommended workaround is to unbind tmio-ohci driver before suspend and rebind it after resume. Cc: stable@vger.kernel.org Signed-off-by:
Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by:
Lee Jones <lee.jones@linaro.org> (cherry picked from commit 1a5fb99d) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andy Lutomirski authored
paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should *not* be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. Fixes CVE-2014-8134. Cc: stable@vger.kernel.org Suggested-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 29fa6825) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andy Lutomirski authored
Otherwise, if buggy user code points DS or ES into the TLS array, they would be corrupted after a context switch. This also significantly improves the comments and documents some gotchas in the code. Before this patch, the both tests below failed. With this patch, the es test passes, although the gsbase test still fails. ----- begin es test ----- /* * Copyright (c) 2014 Andy Lutomirski * GPL v2 */ static unsigned short GDT3(int idx) { return (idx << 3) | 3; } static int create_tls(int idx, unsigned int base) { struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; if (syscall(SYS_set_thread_area, &desc) != 0) err(1, "set_thread_area"); return desc.entry_number; } int main() { int idx = create_tls(-1, 0); printf("Allocated GDT index %d\n", idx); unsigned short orig_es; asm volatile ("mov %%es,%0" : "=rm" (orig_es)); int errors = 0; int total = 1000; for (int i = 0; i < total; i++) { asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx))); usleep(100); unsigned short es; asm volatile ("mov %%es,%0" : "=rm" (es)); asm volatile ("mov %0,%%es" : : "rm" (orig_es)); if (es != GDT3(idx)) { if (errors == 0) printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n", GDT3(idx), es); errors++; } } if (errors) { printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total); return 1; } else { printf("[OK]\tES was preserved\n"); return 0; } } ----- end es test ----- ----- begin gsbase test ----- /* * gsbase.c, a gsbase test * Copyright (c) 2014 Andy Lutomirski * GPL v2 */ static unsigned char *testptr, *testptr2; static unsigned char read_gs_testvals(void) { unsigned char ret; asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr)); return ret; } int main() { int errors = 0; testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0); if (testptr == MAP_FAILED) err(1, "mmap"); testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0); if (testptr2 == MAP_FAILED) err(1, "mmap"); *testptr = 0; *testptr2 = 1; if (syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)testptr2 - (unsigned long)testptr) != 0) err(1, "ARCH_SET_GS"); usleep(100); if (read_gs_testvals() == 1) { printf("[OK]\tARCH_SET_GS worked\n"); } else { printf("[FAIL]\tARCH_SET_GS failed\n"); errors++; } asm volatile ("mov %0,%%gs" : : "r" (0)); if (read_gs_testvals() == 0) { printf("[OK]\tWriting 0 to gs worked\n"); } else { printf("[FAIL]\tWriting 0 to gs failed\n"); errors++; } usleep(100); if (read_gs_testvals() == 0) { printf("[OK]\tgsbase is still zero\n"); } else { printf("[FAIL]\tgsbase was corrupted\n"); errors++; } return errors == 0 ? 0 : 1; } ----- end gsbase test ----- Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.netSigned-off-by:
Ingo Molnar <mingo@kernel.org> (cherry picked from commit f647d7c1) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andy Lutomirski authored
It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e ("x86/tls: Disallow unusual TLS segments") Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # Only if 0e58af4e is backported Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.netSigned-off-by:
Ingo Molnar <mingo@kernel.org> x86/tls: Disallow unusual TLS segments Users have no business installing custom code segments into the GDT, and segments that are not present but are otherwise valid are a historical source of interesting attacks. For completeness, block attempts to set the L bit. (Prior to this patch, the L bit would have been silently dropped.) This is an ABI break. I've checked glibc, musl, and Wine, and none of them look like they'll have any trouble. Note to stable maintainers: this is a hardening patch that fixes no known bugs. Given the possibility of ABI issues, this probably shouldn't be backported quickly. Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Acked-by:
H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org # optional Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: security@kernel.org <security@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by:
Ingo Molnar <mingo@kernel.org> (cherry picked from commit 3fb2f423 0e58af4e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andy Lutomirski authored
Installing a 16-bit RW data segment into the GDT defeats espfix. AFAICT this will not affect glibc, Wine, or dosemu at all. Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Acked-by:
H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: security@kernel.org <security@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by:
Ingo Molnar <mingo@kernel.org> (cherry picked from commit 41bdc785) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
Rock Ridge extensions define so called Continuation Entries (CE) which define where is further space with Rock Ridge data. Corrupted isofs image can contain arbitrarily long chain of these, including a one containing loop and thus causing kernel to end in an infinite loop when traversing these entries. Limit the traversal to 32 entries which should be more than enough space to store all the Rock Ridge data. Reported-by:
P J P <ppandit@redhat.com> CC: stable@vger.kernel.org Signed-off-by:
Jan Kara <jack@suse.cz> (cherry picked from commit f54e18f1) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-