- 13 Mar, 2018 22 commits
-
-
Andy Lutomirski authored
BugLink: https://bugs.launchpad.net/bugs/1664602 I got a couple more reports: the Samsung APST issues appears to affect multiple 950-series devices in Dell XPS 15 9550 and Precision 5510 laptops. Change the quirk: rather than blacklisting the firmware on the first problematic SSD that was reported, disable APST on all 144d:a802 devices if they're installed in the two affected Dell models. While we're at it, disable only the deepest sleep state instead of all of them -- the reporters say that this is sufficient to fix the problem. (I have a device that appears to be entirely identical to one of the affected devices, but I have a different Dell laptop, so it's not the case that all Samsung devices with firmware BXW75D0Q are broken under all circumstances.) Samsung engineers have an affected system, and hopefully they'll give us a better workaround some time soon. In the mean time, this should minimize regressions. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> (backported from commit ff5350a8) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Andy Lutomirski authored
BugLink: https://bugs.launchpad.net/bugs/1664602 NVMe devices can advertise multiple power states. These states can be either "operational" (the device is fully functional but possibly slow) or "non-operational" (the device is asleep until woken up). Some devices can automatically enter a non-operational state when idle for a specified amount of time and then automatically wake back up when needed. The hardware configuration is a table. For each state, an entry in the table indicates the next deeper non-operational state, if any, to autonomously transition to and the idle time required before transitioning. This patch teaches the driver to program APST so that each successive non-operational state will be entered after an idle time equal to 100% of the total latency (entry plus exit) associated with that state. The maximum acceptable latency is controlled using dev_pm_qos (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational states with total latency greater than this value will not be used. As a special case, setting the latency tolerance to 0 will disable APST entirely. On hardware without APST support, the sysfs file will not be exposed. The latency tolerance for newly-probed devices is set by the module parameter nvme_core.default_ps_max_latency_us. In theory, the device can expose "default" APST table, but this doesn't seem to function correctly on my device (Samsung 950), nor does it seem particularly useful. There is also an optional mechanism by which a configuration can be "saved" so it will be automatically loaded on reset. This can be configured from userspace, but it doesn't seem useful to support in the driver. On my laptop, enabling APST seems to save nearly 1W. The hardware tables can be decoded in userspace with nvme-cli. 'nvme id-ctrl /dev/nvmeN' will show the power state table and 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST configuration. This feature is quirked off on a known-buggy Samsung device. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com> (backported from commit c5552fde) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Andy Lutomirski authored
BugLink: https://bugs.launchpad.net/bugs/1664602 Currently, all NVMe quirks are based on PCI IDs. Add a mechanism to define quirks based on identify_ctrl's vendor id, model number, and/or firmware revision. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com> (backported from commit bd4da3ab) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Andy Lutomirski authored
BugLink: https://bugs.launchpad.net/bugs/1664602 Any user I can imagine that needs a buffer at all will want to pass a pointer directly. There are no currently callers that use buffers, so this change is painless, and it will make it much easier to start using features that use buffers (e.g. APST). Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 1a6fe74d) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Andy Lutomirski authored
BugLink: https://bugs.launchpad.net/bugs/1664602 nvme_set_features() callers seem to expect that passing NULL as the result pointer is acceptable. Teach nvme_set_features() not to try to write to the NULL address. For symmetry, make the same change to nvme_get_features(), despite the fact that all current callers pass a valid result pointer. I assume that this bug hasn't been reported in practice because the callers that pass NULL are all in the SCSI translation layer and no one uses the relevant operations. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 9b47f77a) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Christoph Hellwig authored
BugLink: https://bugs.launchpad.net/bugs/1664602 NVMe over fabrics will use __nvme_submit_sync_cmd in the the transport and require a few tweaks to it. For that we export it and add a few more paramters: 1. allow passing a queue ID to the block layer For the NVMe over Fabrics connect command we need to able to specify a queue ID that we want to send the command on. Add a qid parameter to the relevant functions to enable this behavior. 2. allow submitting at_head commands In cases where we want to (re)connect to a controller where we have inflight queued commands we want to first connect and only then allow the other queued commands to be kicked. This will prevents failures in controller resets and reconnects. 3. allow passing flags to blk_mq_allocate_request Both for Fabrics connect the the keep-alive feature in NVMe 1.2.1 we want to be able to use reserved requests. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit eb71f435) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Christoph Hellwig authored
BugLink: https://bugs.launchpad.net/bugs/1664602 Centralize the check if a given NVMe command reads or writes data. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com> (backported from commit 7a5abb4b) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Christoph Hellwig authored
BugLink: https://bugs.launchpad.net/bugs/1664602 Both LighNVM and NVMe over Fabrics need to look at more than just the status and result field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bj?rling <m@bjorling.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com> (backported from commit 1cb3cce5) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Andy Lutomirski authored
BugLink: https://bugs.launchpad.net/bugs/1664602 As far as I can tell, there is basically nothing correct about this code. It misinterprets npss (off-by-one). It hardcodes a bunch of power states, which is nonsense, because they're all just indices into a table that software needs to parse. It completely ignores the distinction between operational and non-operational states. And, until 4.8, if all of the above magically succeeded, it would dereference a NULL pointer and OOPS. Since this code appears to be useless, just delete it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 26501db8) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Acked-By: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Alexei Starovoitov authored
when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. Fixes: 17a52670 ("bpf: verifier (add verifier core)") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> (backported from commit c131187d) CVE-2017-17862 [ saf: Add partial backport of 3df126f3 ("bpf: don't (ab)use instructions to store state") to add bpf_insn_aux_data state to verifier_env ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Jann Horn authored
[ Upstream commit 95a762e2 ] Distinguish between BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit) and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit); only perform sign extension in the first case. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16995 for this issue. v3: - add CVE number (Ben Hutchings) Fixes: 48461135 ("bpf: allow access into map value arrays") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> CVE-2017-16995 [ saf: Backport to 4.4. Include partial backports of 4923ec0b ("bpf: simplify verifier register state assignments") and 969bf05e ("bpf: direct packet access") to extend reg_state.imm to 64-bit. ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Wanpeng Li authored
CVE-2017-17741 Reported by syzkaller: BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm] Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298 CPU: 6 PID: 32298 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #18 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016 Call Trace: dump_stack+0xab/0xe1 print_address_description+0x6b/0x290 kasan_report+0x28a/0x370 write_mmio+0x11e/0x270 [kvm] emulator_read_write_onepage+0x311/0x600 [kvm] emulator_read_write+0xef/0x240 [kvm] emulator_fix_hypercall+0x105/0x150 [kvm] em_hypercall+0x2b/0x80 [kvm] x86_emulate_insn+0x2b1/0x1640 [kvm] x86_emulate_instruction+0x39a/0xb90 [kvm] handle_exception+0x1b4/0x4d0 [kvm_intel] vcpu_enter_guest+0x15a0/0x2640 [kvm] kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm] kvm_vcpu_ioctl+0x479/0x880 [kvm] do_vfs_ioctl+0x142/0x9a0 SyS_ioctl+0x74/0x80 entry_SYSCALL_64_fastpath+0x23/0x9a The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall) to the guest memory, however, write_mmio tracepoint always prints 8 bytes through *(u64 *)val since kvm splits the mmio access into 8 bytes. This leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch fixes it by just accessing the bytes which we operate on. Before patch: syz-executor-5567 [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f After patch: syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (backported from e39d200f) Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Mohamed Ghannam authored
set rm->atomic.op_active to 0 when rds_pin_pages() fails or the user supplied address is invalid, this prevents a NULL pointer usage in rds_atomic_free_op() Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> CVE-2018-5333 (cherry picked from commit 7d11f77f) Signed-off-by: Benjamin M Romer <benjamin.romer@canonical.com> Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Benjamin M Romer <benjamin.romer@canonical.com>
-
Ido Schimmel authored
BugLink: http://bugs.launchpad.net/bugs/1738219 When the 'ignore_routes_with_linkdown' sysctl is set, we should not consider linkdown nexthops during route lookup. While the code correctly verifies that the initially selected route ('match') has a carrier, it does not perform the same check in the subsequent multipath selection, resulting in a potential packet loss. In case the chosen route does not have a carrier and the sysctl is set, choose the initially selected route. Fixes: 35103d11 ("net: ipv6 sysctl option to ignore routes when nexthop link is down") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit bbfcd776) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Ryan Harper authored
BugLink: http://bugs.launchpad.net/bugs/1729145 - decouple emitting a cached_dev CHANGE uevent which includes dev.uuid and dev.label from bch_cached_dev_run() which only happens when a bcacheX device is bound to the actual backing block device (bcache0 -> vdb) - update bch_cached_dev_run() to invoke bch_cached_dev_emit_change() as needed; no functional code path changes here - Modify register_bcache to detect a re-registering of a bcache cached_dev, and in that case call bcache_cached_dev_emit_change() to Signed-off-by: Ryan Harper <ryan.harper@canonical.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Benjamin Poirier authored
BugLink: http://bugs.launchpad.net/bugs/1730550 Lennart reported the following race condition: \ e1000_watchdog_task \ e1000e_has_link \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link /* link is up */ mac->get_link_status = false; /* interrupt */ \ e1000_msix_other hw->mac.get_link_status = true; link_active = !hw->mac.get_link_status /* link_active is false, wrongly */ This problem arises because the single flag get_link_status is used to signal two different states: link status needs checking and link status is down. Avoid the problem by using the return value of .check_for_link to signal the link status to e1000e_has_link(). Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> (cherry picked from commit 19110cfb) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Benjamin Poirier authored
BugLink: http://bugs.launchpad.net/bugs/1730550 When e1000e_poll() is not fast enough to keep up with incoming traffic, the adapter (when operating in msix mode) raises the Other interrupt to signal Receiver Overrun. This is a double problem because 1) at the moment e1000_msix_other() assumes that it is only called in case of Link Status Change and 2) if the condition persists, the interrupt is repeatedly raised again in quick succession. Ideally we would configure the Other interrupt to not be raised in case of receiver overrun but this doesn't seem possible on this adapter. Instead, we handle the first part of the problem by reverting to the practice of reading ICR in the other interrupt handler, like before commit 16ecba59 ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit 0a8047ac ("e1000e: Fix msi-x interrupt automask") which cleared IAME from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts anymore. We handle the second part of the problem by not re-enabling the Other interrupt right away when there is overrun. Instead, we wait until traffic subsides, napi polling mode is exited and interrupts are re-enabled. Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca> Fixes: 16ecba59 ("e1000e: Do not read ICR in Other interrupt") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> (cherry picked from commit 4aea7a5c) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Alan Liu authored
BugLink: https://bugs.launchpad.net/bugs/1736317 QCA6174 WLAN.RM.2.0 firmware uses max_tx_power instead of using max_reg_power to set transmission power. The tx power was about -50dbm, after applying this change, it become -32dbm. Signed-off-by: Alan Liu <alanliu@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> (cherry picked from commit 513527c8) Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com> Acked-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Kai Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Dan Carpenter authored
BugLink: https://bugs.launchpad.net/bugs/1720228 It's possible to use "err" without initializing it. If it happens to be a 2 which is SCSI_DH_RETRY then that could cause a bug. Bart Van Assche pointed out that we should probably re-initialize it for every iteration through the retry loop. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hannes Reinicke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com> (cherry picked from commit a4bd8520) Signed-off-by: Dragan Stancevic <dragan.stancevic@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Vinson Lee authored
BugLink: http://bugs.launchpad.net/bugs/1734757Suggested-by: Juerg Haefliger <juerg.haefliger@canonical.com> Signed-off-by: Vinson Lee <vlee@freedesktop.org> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Colin Ian King authored
BugLink: https://bugs.launchpad.net/bugs/1703742 The current configuration is set to always use transparent hugepages by default. There exists plenty of anecdotal evidence that this is less than perfect a choice and in some scenarios it leads to some performance issues. My own investigations with stress-ng stream and malloc tests show that the current default impacts performance. I ran various test scenarios on different MADVISE configurations, each result below is based on the average of 5 runs on an i7-3770 CPU @ 3.4GHz with 8GB memory, 8MB L3 cache, 256K L2 cache, 32K/32K L1 cache. All the above results are from an average of 5 rounds of tests. malloc allocation stressor: malloc always madvise size (MB) ops/sec ops/sec 32 1254.43 2422.49 64 2100.36 4300.28 128 3768.57 7215.38 256 7940.73 14893.85 512 17618.62 26861.29 1024 32777.17 48029.37 Clearly madvise is more performent. stream bandwidth/compute stressor: stream always madvise NOHUGEPAGE size (MB) MB/sec MB/sec 1 17713.54 18439.69 2 12460.34 13015.46 4 12195.81 12694.51 8 12085.11 12674.26 16 12054.09 12649.00 32 12082.42 12409.65 64 12262.88 12084.85 128 12235.25 11788.49 256 11808.69 11283.69 512 11970.01 12434.82 For small allocations, always is less performant. Large allocations can enable the more performant transparent huge pages with madvise(2) if we disable always as default. Other stress-ng memory allocation/writing/freeing and madvise operations showed little significant differences. I have also experimented with boot testing Ubuntu with kernels configured with different MADVISE configs and found there is little noticeable difference in performance, so I believe that there is little scope for any kitten killer performance regressions with this change. This change will by default not use transparent huge pages unless madvise(2) is used to instruct the kernel to do so on a memory mapping. According to the madvise(2) manual, this only takes effect on private anonymous mappings with MADV_HUGEPAGE. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Kleber Sacilotto de Souza authored
Ignore: yes Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
-
- 12 Feb, 2018 3 commits
-
-
Khalid Elmously authored
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Andy Whitcroft authored
Fix a miss-backport of the upstream commit. Fixes: 63da13a9 ("net: ipv4: fix for a race condition in raw_sendmsg -- fix backport") BugLink: http://bugs.launchpad.net/bugs/1748671Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Khalid Elmously authored
Ignore: yes Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
- 11 Feb, 2018 8 commits
-
-
Andy Whitcroft authored
Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
Andy Whitcroft authored
Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
Andy Whitcroft authored
Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
Andy Whitcroft authored
Ignore: yes Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
Andy Whitcroft authored
CVE-2017-5715 (Spectre v2 Intel) Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
Andy Whitcroft authored
CVE-2017-5715 (Spectre v2 Intel) Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
Andy Whitcroft authored
CVE-2017-5715 (Spectre v2 Intel) When we have full retpoline enabled then we do not actually need to toggle IBRS on entering and leaving the kernel. Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
Andy Whitcroft authored
CVE-2017-5715 (Spectre v2 Intel) This reverts commit d31a04f8. Signed-off-by: Andy Whitcroft <apw@canonical.com>
-
- 09 Feb, 2018 7 commits
-
-
Khalid Elmously authored
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Eric Desrochers authored
BugLink: https://bugs.launchpad.net/bugs/1744117 Add codec IDs for several recently released, pending, and historical NVIDIA GPU audio controllers to the patch table, to allow the correct patch functions to be selected for them. Signed-off-by: Daniel Dadap <ddadap@nvidia.com> Reviewed-by: Andy Ritger <aritger@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> (cherry picked from commit 74ec1181) Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com> Acked-by: Hui Wang <hui.wang@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Rafael David Tinoco authored
BugLink: https://bugs.launchpad.net/bugs/1569925 If, for any reason, userland shuts down iscsi transport interfaces before proper logouts - like when logging in to LUNs manually, without logging out on server shutdown, or when automated scripts can't umount/logout from logged LUNs - kernel will hang forever on its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all still existent paths. PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c This happens because iscsi_eh_cmd_timed_out(), the transport layer timeout helper, would tell the queue timeout function (scsi_times_out) to reset the request timer over and over, until the session state is back to logged in state. Unfortunately, during server shutdown, this might never happen again. Other option would be "not to handle" the issue in the transport layer. That would trigger the error handler logic, which would also need the session state to be logged in again. Best option, for such case, is to tell upper layers that the command was handled during the transport layer error handler helper, marking it as DID_NO_CONNECT, which will allow completion and inform about the problem. After the session was marked as ISCSI_STATE_FAILED, due to the first timeout during the server shutdown phase, all subsequent cmds will fail to be queued, allowing upper logic to fail faster. Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry-picked from commit d7549412 next-20180117) Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Xinyu Lin authored
BugLink: http://bugs.launchpad.net/bugs/1743053 LITEON EP1 has the same timeout issues as CX1 series devices. Revert max_sectors to the value of 1024. 'e0edc8c5 ("libata: apply MAX_SEC_1024 to all CX1-JB*-HP devices")' Signed-off-by: Xinyu Lin <xinyu0123@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org (cherry picked from commit db5ff909) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Alexander Yarygin authored
BugLink: http://bugs.launchpad.net/bugs/1747090 Some facility bits are in a range that is defined to be "ok for guests without any necessary hypervisor changes". Enable those bits. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> (cherry picked from commit ed8dda0b) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Christian Borntraeger authored
BugLink: http://bugs.launchpad.net/bugs/1747090 The new firmware interfaces for branch prediction behaviour changes are transparently available for the guest. Nevertheless, there is new state attached that should be migrated and properly resetted. Provide a mechanism for handling reset and migration. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> (back ported from commit 35b3fde6) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-
Andy Whitcroft authored
CVE-2017-5715 (Spectre v2 Intel) When we have full retpoline enabled then we do not actually require IBPB flushes when entering the kernel. Add a new use_ibpb bit to represent when we have retpoline enabled. Further split the enable bit into two 0x1 representing whether entry IBPB is enabled and 0x10 representing whether kernel flushes for userspace/VMs etc are applied. Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
-