- 05 May, 2017 5 commits
-
-
Germano Percossi authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 1fa839b4 upstream. This fixes Continuous Availability when errors during file reopen are encountered. cifs_user_readv and cifs_user_writev would wait for ever if results of cifs_reopen_file are not stored and for later inspection. In fact, results are checked and, in case of errors, a chain of function calls leading to reads and writes to be scheduled in a separate thread is skipped. These threads will wake up the corresponding waiters once reads and writes are done. However, given the return value is not stored, when rc is checked for errors a previous one (always zero) is inspected instead. This leads to pending reads/writes added to the list, making cifs_user_readv and cifs_user_writev wait for ever. Signed-off-by: Germano Percossi <germano.percossi@citrix.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Ilia Mirkin authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit f94773b9 upstream. The NV4A (aka NV44A) is an oddity in the family. It only comes in AGP and PCI varieties, rather than a core PCIE chip with a bridge for AGP/PCI as necessary. As a result, it appears that the MMU is also non-functional. For AGP cards, the vast majority of the NV4A lineup, this worked out since we force AGP cards to use the nv04 mmu. However for PCI variants, this did not work. Switching to the NV04 MMU makes it work like a charm. Thanks to mwk for the suggestion. This should be a no-op for NV4A AGP boards, as they were using it already. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70388Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Ilia Mirkin authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 83bce9c2 upstream. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Fixes: 590801c1 ("drm/nouveau/mpeg: remove dependence on namedb/engctx lookup") Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Kirill A. Shutemov authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 5b7abeae upstream. Yet another instance of the same race. Fix is identical to change_huge_pmd(). See "thp: fix MADV_DONTNEED vs. numa balancing race" for more details. Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Tejun Heo authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 77f88796 upstream. Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 04 May, 2017 1 commit
-
-
Thadeu Lima de Souza Cascardo authored
Ignore: yes Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 27 Apr, 2017 34 commits
-
-
Thadeu Lima de Souza Cascardo authored
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Roman Pen authored
BugLink: http://bugs.launchpad.net/bugs/1683976 get_disk(),get_gendisk() calls have non explicit side effect: they increase the reference on the disk owner module. The following is the correct sequence how to get a disk reference and to put it: disk = get_gendisk(...); /* use disk */ owner = disk->fops->owner; put_disk(disk); module_put(owner); fs/block_dev.c is aware of this required module_put() call, but f.e. blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put a module reference. To see a leakage in action cgroups throttle config can be used. In the following script I'm removing throttle for /dev/ram0 (actually this is NOP, because throttle was never set for this device): # lsmod | grep brd brd 5175 0 # i=100; while [ $i -gt 0 ]; do echo "1:0 0" > \ /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \ done # lsmod | grep brd brd 5175 100 Now brd module has 100 references. The issue is fixed by calling module_put() just right away put_disk(). Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gi-Oh Kim <gi-oh.kim@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 39a169b6) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When device open races with device shutdown, we can get the following oops in scsi_disk_get(): [11863.044351] general protection fault: 0000 [#1] SMP [11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop] [11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W 4.10.0-rc2-xen+ #35 [11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000 [11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70 [11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202 [11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000 [11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880 [11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001 [11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa [11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d [11863.059217] FS: 00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000 [11863.059217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0 [11863.059217] Call Trace: [11863.059217] ? disk_get_part+0x22/0x1f0 [11863.059217] sd_open+0x39/0x130 [11863.059217] __blkdev_get+0x69/0x430 [11863.059217] ? bd_acquire+0x7f/0xc0 [11863.059217] ? bd_acquire+0x96/0xc0 [11863.059217] ? blkdev_get+0x350/0x350 [11863.059217] blkdev_get+0x126/0x350 [11863.059217] ? _raw_spin_unlock+0x2b/0x40 [11863.059217] ? bd_acquire+0x7f/0xc0 [11863.059217] ? blkdev_get+0x350/0x350 [11863.059217] blkdev_open+0x65/0x80 ... As you can see RAX value is already poisoned showing that gendisk we got is already freed. The problem is that get_gendisk() looks up device number in ext_devt_idr and then does get_disk() which does kobject_get() on the disks kobject. However the disk gets removed from ext_devt_idr only in disk_release() (through blk_free_devt()) at which moment it has already 0 refcount and is already on its way to be freed. Indeed we've got a warning from kobject_get() about 0 refcount shortly before the oops. We fix the problem by using kobject_get_unless_zero() in get_disk() so that get_disk() cannot get reference on a disk that is already being freed. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit d01b2dcb) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Make the function available for outside use and fortify it against NULL kobject. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit c70c176f) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When block device is closed, we call inode_detach_wb() in __blkdev_put() which sets inode->i_wb to NULL. That is contrary to expectations that inode->i_wb stays valid once set during the whole inode's lifetime and leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because inode_to_wb() returned NULL. The reason why we called inode_detach_wb() is not valid anymore though. BDI is guaranteed to stay along until we call bdi_put() from bdev_evict_inode() so we can postpone calling inode_detach_wb() to that moment. Also add a warning to catch if someone uses inode_detach_wb() in a dangerous way. Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit f759741d) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() as it gets called from bdi_unregister() which is not necessarily called from bdi_destroy() and thus the name is somewhat misleading. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit b1c51afc) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently we wait for all cgwbs to get released in cgwb_bdi_destroy() (called from bdi_unregister()). That is however unnecessary now when cgwb->bdi is a proper refcounted reference (thus bdi cannot get released before all cgwbs are released) and when cgwb_bdi_destroy() shuts down writeback directly. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 4514451e) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently we waited for all cgwbs to get freed in cgwb_bdi_destroy() which also means that writeback has been shutdown on them. Since this wait is going away, directly shutdown writeback on cgwbs from cgwb_bdi_destroy() to avoid live writeback structures after bdi_unregister() has finished. To make that safe with concurrent shutdown from cgwb_release_workfn(), we also have to make sure wb_shutdown() returns only after the bdi_writeback structure is really shutdown. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 5318ce7d) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently root wb_writeback structure is added to bdi->wb_list in bdi_init() and never removed. That is different from all other wb_writeback structures which get added to the list when created and removed from it before wb_shutdown(). So move list addition of root bdi_writeback to bdi_register() and list removal of all wb_writeback structures to wb_shutdown(). That way a wb_writeback structure is on bdi->wb_list if and only if it can handle writeback and it will make it easier for us to handle shutdown of all wb_writeback structures in bdi_unregister(). Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit e8cb72b3) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Make wb->bdi a proper refcounted reference to bdi for all bdi_writeback structures except for the one embedded inside struct backing_dev_info. That will allow us to simplify bdi unregistration. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 810df54a) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 congested->bdi pointer is used only to be able to remove congested structure from bdi->cgwb_congested_tree on structure release. Moreover the pointer can become NULL when we unregister the bdi. Rename the field to __bdi and add a comment to make it more explicit this is internal stuff of memcg writeback code and people should not use the field as such use will be likely race prone. We do not bother with converting congested->bdi to a proper refcounted reference. It will be slightly ugly to special-case bdi->wb.congested to avoid effectively a cyclic reference of bdi to itself and the reference gets cleared from bdi_unregister() making it impossible to reference a freed bdi. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit b7d680d7) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When disk->fops->open() in __blkdev_get() returns -ERESTARTSYS, we restart the process of opening the block device. However we forget to switch bdev->bd_bdi back to noop_backing_dev_info and as a result bdev inode will be pointing to a stale bdi. Fix the problem by setting bdev->bd_bdi later when __blkdev_get() is already guaranteed to succeed. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit 03e26279) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()" added disk->queue dereference to del_gendisk(). Although del_gendisk() is not supposed to be called without disk->queue valid and blk_unregister_queue() warns in that case, this change will make it oops instead. Return to the old more robust behavior of just warning when del_gendisk() gets called for gendisk with disk->queue being NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 90f16fdd) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 bdi_writeback_congested structures get created for each blkcg and bdi regardless whether bdi is registered or not. When they are created in unregistered bdi and the request queue (and thus bdi) is then destroyed while blkg still holds reference to bdi_writeback_congested structure, this structure will be referencing freed bdi and last wb_congested_put() will try to remove the structure from already freed bdi. With commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()", SCSI started to destroy bdis without calling bdi_unregister() first (previously it was calling bdi_unregister() even for unregistered bdis) and thus the code detaching bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we started hitting this use-after-free bug. It is enough to boot a KVM instance with virtio-scsi device to trigger this behavior. Fix the problem by detaching bdi_writeback_congested structures in bdi_exit() instead of bdi_unregister(). This is also more logical as they can get attached to bdi regardless whether it ever got registered or not. Fixes: 165a5e22Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit df23de55) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 SCSI can call device_add_disk() several times for one request queue when a device in unbound and bound, creating new gendisk each time. This will lead to bdi being repeatedly registered and unregistered. This was not a big problem until commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()" since bdi was only registered repeatedly (bdi_register() handles repeated calls fine, only we ended up leaking reference to gendisk due to overwriting bdi->owner) but unregistered only in blk_cleanup_queue() which didn't get called repeatedly. After 165a5e22 we were doing correct bdi_register() - bdi_unregister() cycles however bdi_unregister() is not prepared for it. So make sure bdi_unregister() cleans up bdi in such a way that it is prepared for a possible following bdi_register() call. An easy way to provoke this behavior is to enable CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a scsi disk which immediately hangs without this fix. Fixes: 165a5e22Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit b6f8fec4) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Commit 6cd18e71 "block: destroy bdi before blockdev is unregistered." moved bdi unregistration (at that time through bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because it needs to happen before blk_unregister_region() call in del_gendisk() for MD. SCSI though will free up the device number from sd_remove() called through a maze of callbacks from device_del() in __scsi_remove_device() before blk_cleanup_queue() and thus similar races as described in 6cd18e71 can happen for SCSI as well as reported by Omar [1]. Moving bdi_unregister() to del_gendisk() works for MD and fixes the problem for SCSI since del_gendisk() gets called from sd_remove() before freeing the device number. This also makes device_add_disk() (calling bdi_register_owner()) more symmetric with del_gendisk(). [1] http://marc.info/?l=linux-block&m=148554717109098&w=2Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit 165a5e22) Conflicts: block/blk-core.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 So far we initialized bd_bdi only in bdget(). That is fine for normal bdev inodes however for the special case of the root inode of blockdev_superblock that function is never called and thus bd_bdi is left uninitialized. As a result bdev_evict_inode() may oops doing bdi_put(root->bd_bdi) on that inode as can be seen when doing: mount -t bdev none /mnt Fix the problem by initializing bd_bdi when first allocating the inode and then reinitializing bd_bdi in bdev_evict_inode(). Thanks to syzkaller team for finding the problem. Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: b1d2dc56 ("block: Make blk_get_backing_dev_info() safe without open bdev") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit a5a79d00) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When a device gets removed, block device inode unhashed so that it is not used anymore (bdget() will not find it anymore). Later when a new device gets created with the same device number, we create new block device inode. However there may be file system device inodes whose i_bdev still points to the original block device inode and thus we get two active block device inodes for the same device. They will share the same gendisk so the only visible differences will be that page caches will not be coherent and BDIs will be different (the old block device inode still points to unregistered BDI). Fix the problem by checking in bd_acquire() whether i_bdev still points to active block device inode and re-lookup the block device if not. That way any open of a block device happening after the old device has been removed will get correct block device inode. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit cccd9fb9) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Iteration over partitions in del_gendisk() omits part0. Add bdev_unhash_inode() call for the whole device. Otherwise if the device number gets reused, bdev inode will be still associated with the old (stale) bdi. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit d06e05c0) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Move bdev_unhash_inode() after invalidate_partition() as invalidate_partition() looks up bdev and it cannot find the right bdev inode after bdev_unhash_inode() is called. Thus invalidate_partition() would not invalidate page cache of the previously used bdev. Also use part_devt() when calling bdev_unhash_inode() instead of manually creating the device number. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 4b8c861a) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit efa7c9f9) Conflicts: block/compat_ioctl.c block/ioctl.c fs/btrfs/volumes.c include/linux/blkdev.h Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currenly blk_get_backing_dev_info() is not safe to be called when the block device is not open as bdev->bd_disk is NULL in that case. However inode_to_bdi() uses this function and may be call called from flusher worker or other writeback related functions without bdev being open which leads to crashes such as: [113031.075540] Unable to handle kernel paging request for data at address 0x00000000 [113031.075614] Faulting instruction address: 0xc0000000003692e0 0:mon> t [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590 [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150 [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450 [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580 [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590 [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660 [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130 [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit b1d2dc56) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Instead of storing backing_dev_info inside struct request_queue, allocate it dynamically, reference count it, and free it when the last reference is dropped. Currently only request_queue holds the reference but in the following patch we add other users referencing backing_dev_info. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit d03f6cdc) Conflicts: block/blk-sysfs.c include/linux/backing-dev-defs.h Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit dc3b17cc) Conflicts: block/blk-cgroup.c block/blk-core.c block/blk-settings.c block/blk-sysfs.c block/blk-wbt.c drivers/block/aoe/aoeblk.c drivers/block/drbd/drbd_nl.c drivers/md/dm.c drivers/md/md.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently, block device inodes stay around after corresponding gendisk hash died until memory reclaim finds them and frees them. Since we will make block device inode pin the bdi, we want to free the block device inode as soon as the device goes away so that bdi does not stay around unnecessarily. Furthermore we need to avoid issues when new device with the same major,minor pair gets created since reusing the bdi structure would be rather difficult in this case. Unhashing block device inode on gendisk destruction nicely deals with these problems. Once last block device inode reference is dropped (which may be directly in del_gendisk()), the inode gets evicted. Furthermore if the major,minor pair gets reallocated, we are guaranteed to get new block device inode even if old block device inode is not yet evicted and thus we avoid issues with possible reuse of bdi. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit f44f1ab5) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Masaki Ota authored
BugLink: http://bugs.launchpad.net/bugs/1662589 Devices identified as E7="73 03 28" use slightly modified version of V8 protocol, with lower count per electrode, different offsets, and different feature bits in OTP data. Fixes: aeaa881f ("Input: ALPS - set DualPoint flag for 74 03 28 devices") Signed-off-by: Masaki Ota <masaki.ota@jp.alps.com> Acked-by: Pali Rohar <pali.rohar@gmail.com> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Tested-by: Nick Fletcher <nick.m.fletcher@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> (backported from commit e7348396) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Greg Kroah-Hartman authored
BugLink: http://bugs.launchpad.net/bugs/1683728Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Horia Geantă authored
BugLink: http://bugs.launchpad.net/bugs/1683728 commit 40c98cb5 upstream. RNG instantiation was previously fixed by commit 62743a41 ("crypto: caam - fix RNG init descriptor ret. code checking") while deinstantiation was not addressed. Since the descriptors used are similar, in the sense that they both end with a JUMP HALT command, checking for errors should be similar too, i.e. status code 7000_0000h should be considered successful. Fixes: 1005bccd ("crypto: caam - enable instantiation of all RNG4 state handles") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Matt Redfearn authored
BugLink: http://bugs.launchpad.net/bugs/1683728 commit c25f8064 upstream. Commit dda45f70 ("MIPS: Switch to the irq_stack in interrupts") changed both the normal and vectored interrupt handlers. Unfortunately the vectored version, "except_vec_vi_handler", was incorrectly modified to unconditionally jal to plat_irq_dispatch, rather than doing a jalr to the vectored handler that has been set up. This is ok for many platforms which set the vectored handler to plat_irq_dispatch anyway, but will cause problems with platforms that use other handlers. Fixes: dda45f70 ("MIPS: Switch to the irq_stack in interrupts") Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15110/Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Matt Redfearn authored
BugLink: http://bugs.launchpad.net/bugs/1683728 commit 3cc3434f upstream. Since do_IRQ is now invoked on a separate IRQ stack, we select HAVE_IRQ_EXIT_ON_IRQ_STACK so that softirq's may be invoked directly from irq_exit(), rather than requiring do_softirq_own_stack. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Acked-by: Jason A. Donenfeld <jason@zx2c4.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14744/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Matt Redfearn authored
BugLink: http://bugs.launchpad.net/bugs/1683728 commit dda45f70 upstream. When enterring interrupt context via handle_int or except_vec_vi, switch to the irq_stack of the current CPU if it is not already in use. The current stack pointer is masked with the thread size and compared to the base or the irq stack. If it does not match then the stack pointer is set to the top of that stack, otherwise this is a nested irq being handled on the irq stack so the stack pointer should be left as it was. The in-use stack pointer is placed in the callee saved register s1. It will be saved to the stack when plat_irq_dispatch is invoked and can be restored once control returns here. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Acked-by: Jason A. Donenfeld <jason@zx2c4.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14743/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Matt Redfearn authored
BugLink: http://bugs.launchpad.net/bugs/1683728 commit 510d8636 upstream. The SAVE_SOME macro is used to save the execution context on all exceptions. If an exception occurs while executing user code, the stack is switched to the kernel's stack for the current task, and register $28 is switched to point to the current_thread_info, which is at the bottom of the stack region. If the exception occurs while executing kernel code, the stack is left, and this change ensures that register $28 is not updated. This is the correct behaviour when the kernel can be executing on the separate irq stack, because the thread_info will not be at the base of it. With this change, register $28 is only switched to it's kernel conventional usage of the currrent thread info pointer at the point at which execution enters kernel space. Doing it on every exception was redundant, but OK without an IRQ stack, but will be erroneous once that is introduced. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Acked-by: Jason A. Donenfeld <jason@zx2c4.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14742/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Matt Redfearn authored
BugLink: http://bugs.launchpad.net/bugs/1683728 commit d42d8d10 upstream. Within unwind stack, check if the stack pointer being unwound is within the CPU's irq_stack and if so use that page rather than the task's stack page. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Acked-by: Jason A. Donenfeld <jason@zx2c4.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Maciej W. Rozycki <macro@imgtec.com> Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14741/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Matt Redfearn authored
BugLink: http://bugs.launchpad.net/bugs/1683728 commit fe8bd18f upstream. Allocate a per-cpu irq stack for use within interrupt handlers. Also add a utility function on_irq_stack to determine if a given stack pointer is within the irq stack for that cpu. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Acked-by: Jason A. Donenfeld <jason@zx2c4.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Petr Mladek <pmladek@suse.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14740/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-