- 05 May, 2017 12 commits
-
-
Nicholas Bellinger authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 1c99de98 upstream. Once upon a time back in 2009, a work-around was added to support the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login did not propose nor respond to MaxBurstLength, FirstBurstLength, DefaultTime2Wait and DefaultTime2Retain keys. The work-around in iscsi_check_proposer_for_optional_reply() allowed the missing keys to be proposed, but did not require waiting for a response before moving to full feature phase operation. This allowed GlobalSAN v3.3 to work out-of-the box, and for many years we didn't run into login interopt issues with any other initiators.. Until recently, when Martin tried a QLogic 57840S iSCSI Offload HBA on Windows 2016 which completed login, but subsequently failed with: Got unknown iSCSI OpCode: 0x43 The issue was QLogic MSFT side did not propose DefaultTime2Wait + DefaultTime2Retain, so LIO proposes them itself, and immediately transitions to full feature phase because of the GlobalSAN hack. However, the QLogic MSFT side still attempts to respond to DefaultTime2Retain + DefaultTime2Wait, even though LIO has set ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT in last login response. So while the QLogic MSFT side should have been proposing these two keys to start, it was doing the correct thing per RFC-3720 attempting to respond to proposed keys before transitioning to full feature phase. All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541) does correctly propose the four keys during login, making the original work-around moot. So in order to allow QLogic MSFT to run unmodified as-is, go ahead and drop this long standing work-around. Reported-by: Martin Svec <martin.svec@zoner.cz> Cc: Martin Svec <martin.svec@zoner.cz> Cc: Himanshu Madhani <Himanshu.Madhani@cavium.com> Cc: Arun Easi <arun.easi@cavium.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Nicholas Bellinger authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit efb2ea77 upstream. This patch fixes a iscsi-target specific TMR reference leak during session shutdown, that could occur when a TMR was quiesced before the hand-off back to iscsi-target code via transport_cmd_check_stop_to_fabric(). The reference leak happens because iscsit_free_cmd() was incorrectly skipping the final target_put_sess_cmd() for TMRs when transport_generic_free_cmd() returned zero because the se_cmd->cmd_kref did not reach zero, due to the missing se_cmd assignment in original code. The result was iscsi_cmd and it's associated se_cmd memory would be freed once se_sess->sess_cmd_map where released, but the associated se_tmr_req was leaked and remained part of se_device->dev_tmr_list. This bug would manfiest itself as kernel paging request OOPsen in core_tmr_lun_reset(), when a left-over se_tmr_req attempted to dereference it's se_cmd pointer that had already been released during normal session shutdown. To address this bug, go ahead and treat ISCSI_OP_SCSI_CMD and ISCSI_OP_SCSI_TMFUNC the same when there is an extra se_cmd->cmd_kref to drop in iscsit_free_cmd(), and use op_scsi to signal __iscsit_free_cmd() when the former needs to clear any further iscsi related I/O state. Reported-by: Rob Millner <rlm@daterainc.com> Cc: Rob Millner <rlm@daterainc.com> Reported-by: Chu Yuan Lin <cyl@datera.io> Cc: Chu Yuan Lin <cyl@datera.io> Tested-by: Chu Yuan Lin <cyl@datera.io> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Dan Williams authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit b03b99a3 upstream. While reviewing the -stable patch for commit 86ef58a4 "nfit, libnvdimm: fix interleave set cookie calculation" Ben noted: "This is returning an int, thus it's effectively doing a 32-bit comparison and not the 64-bit comparison you say is needed." Update the compare operation to be immune to this integer demotion problem. Cc: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Fixes: 86ef58a4 ("nfit, libnvdimm: fix interleave set cookie calculation") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Thomas Gleixner authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 6fdc6dd9 upstream. The vsyscall32 sysctl can racy against a concurrent fork when it switches from disabled to enabled: arch_setup_additional_pages() if (vdso32_enabled) --> No mapping sysctl.vsysscall32() --> vdso32_enabled = true create_elf_tables() ARCH_DLINFO_IA32 if (vdso32_enabled) { --> Add VDSO entry with NULL pointer Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for the newly forked process or not. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Mathias Krause authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit c06989da upstream. vdso_enabled can be set to arbitrary integer values via the kernel command line 'vdso32=' parameter or via 'sysctl abi.vsyscall32'. load_vdso32() only maps VDSO if vdso_enabled == 1, but ARCH_DLINFO_IA32 merily checks for vdso_enabled != 0. As a consequence the AT_SYSINFO_EHDR auxiliary vector for the VDSO_ENTRY is emitted with a NULL pointer which causes a segfault when the application tries to use the VDSO. Restrict the valid arguments on the command line and the sysctl to 0 and 1. Fixes: b0b49f26 ("x86, vdso: Remove compat vdso support") Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@redhat.com> Link: http://lkml.kernel.org/r/1491424561-7187-1-git-send-email-minipli@googlemail.com Link: http://lkml.kernel.org/r/20170410151723.518412863@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Peter Zijlstra authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit f2200ac3 upstream. When the perf_branch_entry::{in_tx,abort,cycles} fields were added, intel_pmu_lbr_read_32() wasn't updated to initialize them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 135c5612 ("perf/x86/intel: Support Haswell/v4 LBR format") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Cameron Gutman authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 53763668 upstream. Signed-off-by: Cameron Gutman <aicommander@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Germano Percossi authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 1fa839b4 upstream. This fixes Continuous Availability when errors during file reopen are encountered. cifs_user_readv and cifs_user_writev would wait for ever if results of cifs_reopen_file are not stored and for later inspection. In fact, results are checked and, in case of errors, a chain of function calls leading to reads and writes to be scheduled in a separate thread is skipped. These threads will wake up the corresponding waiters once reads and writes are done. However, given the return value is not stored, when rc is checked for errors a previous one (always zero) is inspected instead. This leads to pending reads/writes added to the list, making cifs_user_readv and cifs_user_writev wait for ever. Signed-off-by: Germano Percossi <germano.percossi@citrix.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Ilia Mirkin authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit f94773b9 upstream. The NV4A (aka NV44A) is an oddity in the family. It only comes in AGP and PCI varieties, rather than a core PCIE chip with a bridge for AGP/PCI as necessary. As a result, it appears that the MMU is also non-functional. For AGP cards, the vast majority of the NV4A lineup, this worked out since we force AGP cards to use the nv04 mmu. However for PCI variants, this did not work. Switching to the NV04 MMU makes it work like a charm. Thanks to mwk for the suggestion. This should be a no-op for NV4A AGP boards, as they were using it already. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70388Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Ilia Mirkin authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 83bce9c2 upstream. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Fixes: 590801c1 ("drm/nouveau/mpeg: remove dependence on namedb/engctx lookup") Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Kirill A. Shutemov authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 5b7abeae upstream. Yet another instance of the same race. Fix is identical to change_huge_pmd(). See "thp: fix MADV_DONTNEED vs. numa balancing race" for more details. Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Tejun Heo authored
BugLink: http://bugs.launchpad.net/bugs/1687629 commit 77f88796 upstream. Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 04 May, 2017 1 commit
-
-
Thadeu Lima de Souza Cascardo authored
Ignore: yes Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 27 Apr, 2017 27 commits
-
-
Thadeu Lima de Souza Cascardo authored
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Roman Pen authored
BugLink: http://bugs.launchpad.net/bugs/1683976 get_disk(),get_gendisk() calls have non explicit side effect: they increase the reference on the disk owner module. The following is the correct sequence how to get a disk reference and to put it: disk = get_gendisk(...); /* use disk */ owner = disk->fops->owner; put_disk(disk); module_put(owner); fs/block_dev.c is aware of this required module_put() call, but f.e. blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put a module reference. To see a leakage in action cgroups throttle config can be used. In the following script I'm removing throttle for /dev/ram0 (actually this is NOP, because throttle was never set for this device): # lsmod | grep brd brd 5175 0 # i=100; while [ $i -gt 0 ]; do echo "1:0 0" > \ /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \ done # lsmod | grep brd brd 5175 100 Now brd module has 100 references. The issue is fixed by calling module_put() just right away put_disk(). Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gi-Oh Kim <gi-oh.kim@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 39a169b6) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When device open races with device shutdown, we can get the following oops in scsi_disk_get(): [11863.044351] general protection fault: 0000 [#1] SMP [11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop] [11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W 4.10.0-rc2-xen+ #35 [11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000 [11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70 [11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202 [11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000 [11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880 [11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001 [11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa [11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d [11863.059217] FS: 00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000 [11863.059217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0 [11863.059217] Call Trace: [11863.059217] ? disk_get_part+0x22/0x1f0 [11863.059217] sd_open+0x39/0x130 [11863.059217] __blkdev_get+0x69/0x430 [11863.059217] ? bd_acquire+0x7f/0xc0 [11863.059217] ? bd_acquire+0x96/0xc0 [11863.059217] ? blkdev_get+0x350/0x350 [11863.059217] blkdev_get+0x126/0x350 [11863.059217] ? _raw_spin_unlock+0x2b/0x40 [11863.059217] ? bd_acquire+0x7f/0xc0 [11863.059217] ? blkdev_get+0x350/0x350 [11863.059217] blkdev_open+0x65/0x80 ... As you can see RAX value is already poisoned showing that gendisk we got is already freed. The problem is that get_gendisk() looks up device number in ext_devt_idr and then does get_disk() which does kobject_get() on the disks kobject. However the disk gets removed from ext_devt_idr only in disk_release() (through blk_free_devt()) at which moment it has already 0 refcount and is already on its way to be freed. Indeed we've got a warning from kobject_get() about 0 refcount shortly before the oops. We fix the problem by using kobject_get_unless_zero() in get_disk() so that get_disk() cannot get reference on a disk that is already being freed. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit d01b2dcb) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Make the function available for outside use and fortify it against NULL kobject. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit c70c176f) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When block device is closed, we call inode_detach_wb() in __blkdev_put() which sets inode->i_wb to NULL. That is contrary to expectations that inode->i_wb stays valid once set during the whole inode's lifetime and leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because inode_to_wb() returned NULL. The reason why we called inode_detach_wb() is not valid anymore though. BDI is guaranteed to stay along until we call bdi_put() from bdev_evict_inode() so we can postpone calling inode_detach_wb() to that moment. Also add a warning to catch if someone uses inode_detach_wb() in a dangerous way. Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit f759741d) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() as it gets called from bdi_unregister() which is not necessarily called from bdi_destroy() and thus the name is somewhat misleading. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit b1c51afc) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently we wait for all cgwbs to get released in cgwb_bdi_destroy() (called from bdi_unregister()). That is however unnecessary now when cgwb->bdi is a proper refcounted reference (thus bdi cannot get released before all cgwbs are released) and when cgwb_bdi_destroy() shuts down writeback directly. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 4514451e) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently we waited for all cgwbs to get freed in cgwb_bdi_destroy() which also means that writeback has been shutdown on them. Since this wait is going away, directly shutdown writeback on cgwbs from cgwb_bdi_destroy() to avoid live writeback structures after bdi_unregister() has finished. To make that safe with concurrent shutdown from cgwb_release_workfn(), we also have to make sure wb_shutdown() returns only after the bdi_writeback structure is really shutdown. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 5318ce7d) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently root wb_writeback structure is added to bdi->wb_list in bdi_init() and never removed. That is different from all other wb_writeback structures which get added to the list when created and removed from it before wb_shutdown(). So move list addition of root bdi_writeback to bdi_register() and list removal of all wb_writeback structures to wb_shutdown(). That way a wb_writeback structure is on bdi->wb_list if and only if it can handle writeback and it will make it easier for us to handle shutdown of all wb_writeback structures in bdi_unregister(). Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit e8cb72b3) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Make wb->bdi a proper refcounted reference to bdi for all bdi_writeback structures except for the one embedded inside struct backing_dev_info. That will allow us to simplify bdi unregistration. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 810df54a) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 congested->bdi pointer is used only to be able to remove congested structure from bdi->cgwb_congested_tree on structure release. Moreover the pointer can become NULL when we unregister the bdi. Rename the field to __bdi and add a comment to make it more explicit this is internal stuff of memcg writeback code and people should not use the field as such use will be likely race prone. We do not bother with converting congested->bdi to a proper refcounted reference. It will be slightly ugly to special-case bdi->wb.congested to avoid effectively a cyclic reference of bdi to itself and the reference gets cleared from bdi_unregister() making it impossible to reference a freed bdi. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit b7d680d7) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When disk->fops->open() in __blkdev_get() returns -ERESTARTSYS, we restart the process of opening the block device. However we forget to switch bdev->bd_bdi back to noop_backing_dev_info and as a result bdev inode will be pointing to a stale bdi. Fix the problem by setting bdev->bd_bdi later when __blkdev_get() is already guaranteed to succeed. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit 03e26279) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()" added disk->queue dereference to del_gendisk(). Although del_gendisk() is not supposed to be called without disk->queue valid and blk_unregister_queue() warns in that case, this change will make it oops instead. Return to the old more robust behavior of just warning when del_gendisk() gets called for gendisk with disk->queue being NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 90f16fdd) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 bdi_writeback_congested structures get created for each blkcg and bdi regardless whether bdi is registered or not. When they are created in unregistered bdi and the request queue (and thus bdi) is then destroyed while blkg still holds reference to bdi_writeback_congested structure, this structure will be referencing freed bdi and last wb_congested_put() will try to remove the structure from already freed bdi. With commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()", SCSI started to destroy bdis without calling bdi_unregister() first (previously it was calling bdi_unregister() even for unregistered bdis) and thus the code detaching bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we started hitting this use-after-free bug. It is enough to boot a KVM instance with virtio-scsi device to trigger this behavior. Fix the problem by detaching bdi_writeback_congested structures in bdi_exit() instead of bdi_unregister(). This is also more logical as they can get attached to bdi regardless whether it ever got registered or not. Fixes: 165a5e22Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit df23de55) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 SCSI can call device_add_disk() several times for one request queue when a device in unbound and bound, creating new gendisk each time. This will lead to bdi being repeatedly registered and unregistered. This was not a big problem until commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()" since bdi was only registered repeatedly (bdi_register() handles repeated calls fine, only we ended up leaking reference to gendisk due to overwriting bdi->owner) but unregistered only in blk_cleanup_queue() which didn't get called repeatedly. After 165a5e22 we were doing correct bdi_register() - bdi_unregister() cycles however bdi_unregister() is not prepared for it. So make sure bdi_unregister() cleans up bdi in such a way that it is prepared for a possible following bdi_register() call. An easy way to provoke this behavior is to enable CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a scsi disk which immediately hangs without this fix. Fixes: 165a5e22Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit b6f8fec4) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Commit 6cd18e71 "block: destroy bdi before blockdev is unregistered." moved bdi unregistration (at that time through bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because it needs to happen before blk_unregister_region() call in del_gendisk() for MD. SCSI though will free up the device number from sd_remove() called through a maze of callbacks from device_del() in __scsi_remove_device() before blk_cleanup_queue() and thus similar races as described in 6cd18e71 can happen for SCSI as well as reported by Omar [1]. Moving bdi_unregister() to del_gendisk() works for MD and fixes the problem for SCSI since del_gendisk() gets called from sd_remove() before freeing the device number. This also makes device_add_disk() (calling bdi_register_owner()) more symmetric with del_gendisk(). [1] http://marc.info/?l=linux-block&m=148554717109098&w=2Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit 165a5e22) Conflicts: block/blk-core.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 So far we initialized bd_bdi only in bdget(). That is fine for normal bdev inodes however for the special case of the root inode of blockdev_superblock that function is never called and thus bd_bdi is left uninitialized. As a result bdev_evict_inode() may oops doing bdi_put(root->bd_bdi) on that inode as can be seen when doing: mount -t bdev none /mnt Fix the problem by initializing bd_bdi when first allocating the inode and then reinitializing bd_bdi in bdev_evict_inode(). Thanks to syzkaller team for finding the problem. Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: b1d2dc56 ("block: Make blk_get_backing_dev_info() safe without open bdev") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit a5a79d00) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 When a device gets removed, block device inode unhashed so that it is not used anymore (bdget() will not find it anymore). Later when a new device gets created with the same device number, we create new block device inode. However there may be file system device inodes whose i_bdev still points to the original block device inode and thus we get two active block device inodes for the same device. They will share the same gendisk so the only visible differences will be that page caches will not be coherent and BDIs will be different (the old block device inode still points to unregistered BDI). Fix the problem by checking in bd_acquire() whether i_bdev still points to active block device inode and re-lookup the block device if not. That way any open of a block device happening after the old device has been removed will get correct block device inode. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit cccd9fb9) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Iteration over partitions in del_gendisk() omits part0. Add bdev_unhash_inode() call for the whole device. Otherwise if the device number gets reused, bdev inode will be still associated with the old (stale) bdi. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit d06e05c0) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Move bdev_unhash_inode() after invalidate_partition() as invalidate_partition() looks up bdev and it cannot find the right bdev inode after bdev_unhash_inode() is called. Thus invalidate_partition() would not invalidate page cache of the previously used bdev. Also use part_devt() when calling bdev_unhash_inode() instead of manually creating the device number. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 4b8c861a) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit efa7c9f9) Conflicts: block/compat_ioctl.c block/ioctl.c fs/btrfs/volumes.c include/linux/blkdev.h Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currenly blk_get_backing_dev_info() is not safe to be called when the block device is not open as bdev->bd_disk is NULL in that case. However inode_to_bdi() uses this function and may be call called from flusher worker or other writeback related functions without bdev being open which leads to crashes such as: [113031.075540] Unable to handle kernel paging request for data at address 0x00000000 [113031.075614] Faulting instruction address: 0xc0000000003692e0 0:mon> t [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590 [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150 [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450 [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580 [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590 [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660 [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130 [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit b1d2dc56) Conflicts: fs/block_dev.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Instead of storing backing_dev_info inside struct request_queue, allocate it dynamically, reference count it, and free it when the last reference is dropped. Currently only request_queue holds the reference but in the following patch we add other users referencing backing_dev_info. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit d03f6cdc) Conflicts: block/blk-sysfs.c include/linux/backing-dev-defs.h Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (back ported from commit dc3b17cc) Conflicts: block/blk-cgroup.c block/blk-core.c block/blk-settings.c block/blk-sysfs.c block/blk-wbt.c drivers/block/aoe/aoeblk.c drivers/block/drbd/drbd_nl.c drivers/md/dm.c drivers/md/md.c Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jan Kara authored
BugLink: http://bugs.launchpad.net/bugs/1659111 Currently, block device inodes stay around after corresponding gendisk hash died until memory reclaim finds them and frees them. Since we will make block device inode pin the bdi, we want to free the block device inode as soon as the device goes away so that bdi does not stay around unnecessarily. Furthermore we need to avoid issues when new device with the same major,minor pair gets created since reusing the bdi structure would be rather difficult in this case. Unhashing block device inode on gendisk destruction nicely deals with these problems. Once last block device inode reference is dropped (which may be directly in del_gendisk()), the inode gets evicted. Furthermore if the major,minor pair gets reallocated, we are guaranteed to get new block device inode even if old block device inode is not yet evicted and thus we avoid issues with possible reuse of bdi. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit f44f1ab5) Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Masaki Ota authored
BugLink: http://bugs.launchpad.net/bugs/1662589 Devices identified as E7="73 03 28" use slightly modified version of V8 protocol, with lower count per electrode, different offsets, and different feature bits in OTP data. Fixes: aeaa881f ("Input: ALPS - set DualPoint flag for 74 03 28 devices") Signed-off-by: Masaki Ota <masaki.ota@jp.alps.com> Acked-by: Pali Rohar <pali.rohar@gmail.com> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Tested-by: Nick Fletcher <nick.m.fletcher@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> (backported from commit e7348396) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Greg Kroah-Hartman authored
BugLink: http://bugs.launchpad.net/bugs/1683728Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-