- 08 Mar, 2017 18 commits
-
-
Johan Hovold authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 6d104af3 upstream. Not all platforms support DMA to the stack, and specifically since v4.9 this is no longer supported on x86 with VMAP_STACK either. Note that the macro-mode buffer was larger than necessary. Fixes: 6f78193e ("HID: corsair: Add Corsair Vengeance K90 driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Bjorn Helgaas authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 51ebfc92 upstream. A PCI-to-PCIe bridge (a "reverse bridge") has a PCI or PCI-X primary interface and a PCI Express secondary interface. The PCIe interface is a Downstream Port that originates a Link. See the "PCI Express to PCI/PCI-X Bridge Specification", rev 1.0, sections 1.2 and A.6. The bug report below involves a PCI-to-PCIe bridge and a PCIe switch below the bridge: 00:1e.0 Intel 82801 PCI Bridge to [bus 01-0a] 01:00.0 Pericom PI7C9X111SL PCIe-to-PCI Reversible Bridge to [bus 02-0a] 02:00.0 Pericom Device 8608 [PCIe Upstream Port] to [bus 03-0a] 03:01.0 Pericom Device 8608 [PCIe Downstream Port] to [bus 0a] 01:00.0 is configured as a PCI-to-PCIe bridge (despite the name printed by lspci). As we traverse a PCIe hierarchy, device connections alternate between PCIe Links and internal Switch logic. Previously we did not recognize that 01:00.0 had a secondary link, so we thought the 02:00.0 Upstream Port *did* have a secondary link. In fact, it's the other way around: 01:00.0 has a secondary link, and 02:00.0 has internal Switch logic on its secondary side. When we thought 02:00.0 had a secondary link, the pci_scan_slot() -> only_one_child() path assumed 02:00.0 could have only one child, so 03:00.0 was the only possible downstream device. But 03:00.0 doesn't exist, so we didn't look for any other devices on bus 03. Booting with "pci=pcie_scan_all" is a workaround, but we don't want users to have to do that. Recognize that PCI-to-PCIe bridges originate links on their secondary interfaces. Link: https://bugzilla.kernel.org/show_bug.cgi?id=189361 Fixes: d0751b98 ("PCI: Add dev->has_secondary_link to track downstream PCIe links") Tested-by: Blake Moore <blake.moore@men.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Tahsin Erdogan authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit a8a86d78 upstream. fuse_abort_conn() moves requests from pending list to a temporary list before canceling them. This operation races with request_wait_answer() which also tries to remove the request after it gets a fatal signal. It checks FR_PENDING flag to determine whether the request is still in the pending list. Make fuse_abort_conn() clear FR_PENDING flag so that request_wait_answer() does not remove the request from temporary list. This bug causes an Oops when trying to delete an already deleted list entry in end_requests(). Fixes: ee314a87 ("fuse: abort: no fc->lock needed for request ending") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
J. Bruce Fields authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 78794d18 upstream. Context expiry times are in units of seconds since boot, not unix time. The use of get_seconds() here therefore sets the expiry time decades in the future. This prevents timely freeing of contexts destroyed by client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually (when the module is unloaded or the container shut down), but a lot of contexts could pile up before then. Fixes: c5b29f88 "sunrpc: use seconds since boot in expiry cache" Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Bjorn Helgaas authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 89e9f7bc upstream. Martin reported that the Supermicro X8DTH-i/6/iF/6F advertises incorrect host bridge windows via _CRS: pci_root PNP0A08:00: host bridge window [io 0xf000-0xffff] pci_root PNP0A08:01: host bridge window [io 0xf000-0xffff] Both bridges advertise the 0xf000-0xffff window, which cannot be correct. Work around this by ignoring _CRS on this system. The downside is that we may not assign resources correctly to hot-added PCI devices (if they are possible on this system). Link: https://bugzilla.kernel.org/show_bug.cgi?id=42606Reported-by: Martin Burnicki <martin.burnicki@meinberg.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Gu Zheng authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 497de07d upstream. This change was missed the tmpfs modification in In CVE-2016-7097 commit 07393101 ("posix_acl: Clear SGID bit when setting file permissions") It can test by xfstest generic/375, which failed to clear setgid bit in the following test case on tmpfs: touch $testfile chown 100:100 $testfile chmod 2755 $testfile _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile Signed-off-by: Gu Zheng <guzheng1@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Vladimir Zapolskiy authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit af92305e upstream. On i.MX31 AVIC interrupt controller base address is at 0x68000000. The problem was shadowed by the AVIC driver, which takes the correct base address from a SoC specific header file. Fixes: d2a37b3d ("ARM i.MX31: Add devicetree support") Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Vladimir Zapolskiy authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 1f87aee6 upstream. i.MX31 Clock Control Module controller is found on AIPS2 bus, move it there from SPBA bus to avoid a conflict of device IO space mismatch. Fixes: ef0e4a60 ("ARM: mx31: Replace clk_register_clkdev with clock DT lookup") Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Vladimir Zapolskiy authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 2e575cbc upstream. The type of AVIC interrupt controller found on i.MX31 is one-cell, namely 31 for CCM DVFS and 53 for CCM, however for clock control module its interrupts are specified as 3-cells, fix it. Fixes: ef0e4a60 ("ARM: mx31: Replace clk_register_clkdev with clock DT lookup") Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Arnaldo Carvalho de Melo authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit cf346d5b upstream. Both register_perl_scripting() and register_python_scripting() allocate this variable, fix it by checking if it already was. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 7e4b21b8 ("perf/scripts: Add Python scripting engine") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Kamal Heib authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 0b59970e upstream. Remove the warning print of "can't use of GFP_NOIO" to avoid prints in each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO. This print become more annoying when the IPoIB interface is configured to work in connected mode. Fixes: 09b93088 ('IB: Add a QP creation flag to use GFP_NOIO allocations') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Eran Ben Elisha authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 1f22e454 upstream. According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is supported only if dmfs_ipoib bit is set. If it isn't set we want to ensure allocating NET_IF QPs fail. We do so by filling out the allocation bitmap. By thus, the NET_IF QPs allocating function won't find any free QP and will fail. Fixes: c1c98501 ('IB/mlx4: Add support for steerable IB UD QPs') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Saeed Mahameed authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 6fa26208 upstream. Report the correct speed in the port attributes when using a 56Gbps ethernet link. Without this change the field is incorrectly set to 10. Fixes: a9c766bb ('IB/mlx4: Fix info returned when querying IBoE ports') Fixes: 2e96691c ('IB: Use central enum for speed instead of hard-coded values') Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jack Morgenstein authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit c482af64 upstream. For non-special QPs, the port value becomes non-zero only at the RESET-to-INIT transition. If the QP has not undergone that transition, its port number value is still zero. If such a QP is destroyed before being moved out of the RESET state, subtracting one from the qp port number results in a negative value. Using that negative value as an index into the qp1_proxy array results in an out-of-bounds array reference. Fix this by testing that the QP type is one that uses qp1_proxy before using the port number. For special QPs of all types, the port number is specified at QP creation time. Fixes: 9433c188 ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Maor Gottlieb authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit af4295c1 upstream. Set traffic class within sl_tclass_flowlabel when create iboe AH. Without this the TOS value will be empty when running VLAN tagged traffic, because the TOS value is taken from the traffic class in the address handle attributes. Fixes: 9106c410 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Eli Cohen authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit acbda523 upstream. Wait before continuing unload till all pending mkey async creation requests are done. Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Steven Rostedt authored
BugLink: http://bugs.launchpad.net/bugs/1660993 commit 8329e818 upstream. Matt Fleming reported seeing crashes when enabling and disabling function profiling which uses function graph tracer. Later Namhyung Kim hit a similar issue and he found that the issue was due to the jmp to ftrace_stub in ftrace_graph_call was only two bytes, and when it was changed to jump to the tracing code, it overwrote the ftrace_stub that was after it. Masami Hiramatsu bisected this down to a binutils change: 8dcea93252a9ea7dff57e85220a719e2a5e8ab41 is the first bad commit commit 8dcea93252a9ea7dff57e85220a719e2a5e8ab41 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri May 15 03:17:31 2015 -0700 Add -mshared option to x86 ELF assembler This patch adds -mshared option to x86 ELF assembler. By default, assembler will optimize out non-PLT relocations against defined non-weak global branch targets with default visibility. The -mshared option tells the assembler to generate code which may go into a shared library where all non-weak global branch targets with default visibility can be preempted. The resulting code is slightly bigger. This option only affects the handling of branch instructions. Declaring ftrace_stub as a weak call prevents gas from using two byte jumps to it, which would be converted to a jump to the function graph code. Link: http://lkml.kernel.org/r/20160516230035.1dbae571@gandalf.local.homeReported-by: Matt Fleming <matt@codeblueprint.co.uk> Reported-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Thadeu Lima de Souza Cascardo authored
Ignore: yes Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 03 Mar, 2017 4 commits
-
-
Stefan Bader authored
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Alexander Popov authored
Currently N_HDLC line discipline uses a self-made singly linked list for data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after an error. The commit be10eb75 ("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf. After tx error concurrent flush_tx_queue() and n_hdlc_send_frames() can put one data buffer to tx_free_buf_list twice. That causes double free in n_hdlc_release(). Let's use standard kernel linked list and get rid of n_hdlc.tbuf: in case of tx error put current data buffer after the head of tx_buf_list. Signed-off-by: Alexander Popov <alex.popov@linux.com> CVE-2017-2636 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Jiri Slaby authored
The class of 4 n_hdls buf locks is the same because a single function n_hdlc_buf_list_init is used to init all the locks. But since flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls n_hdlc_buf_put which takes n_hdlc->tx_free_buf_list.spinlock, lockdep emits a warning: ============================================= [ INFO: possible recursive locking detected ] 4.3.0-25.g91e30a7-default #1 Not tainted --------------------------------------------- a.out/1248 is trying to acquire lock: (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc] but task is already holding lock: (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&list->spinlock)->rlock); lock(&(&list->spinlock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by a.out/1248: #0: (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50 #1: (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc] ... Call Trace: ... [<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70 [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc] [<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc] [<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40 ... Fix it by initializing the spin_locks separately. This removes also reduntand memset of a freshly kzallocated space. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CVE-2017-2636 (cherry-picked from e9b736d8 upstream) Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Stefan Bader authored
Ignore: yes Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
- 20 Feb, 2017 3 commits
-
-
Stefan Bader authored
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Andrey Konovalov authored
In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet is forcibly freed via __kfree_skb in dccp_rcv_state_process if dccp_v6_conn_request successfully returns. However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb is saved to ireq->pktopts and the ref count for skb is incremented in dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed in dccp_rcv_state_process. Fix by calling consume_skb instead of doing goto discard and therefore calling __kfree_skb. Similar fixes for TCP: fb7e2399 [TCP]: skb is unexpectedly freed. 0aea76d3 tcp: SYN packets are now simply consumed Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> CVE-2017-6074 BugLink: http://bugs.launchpad.net/bugs/1665935 (cherry-picked from 5edabca9 davem) Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Stefan Bader authored
Ignore: yes Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
- 01 Feb, 2017 1 commit
-
-
Thadeu Lima de Souza Cascardo authored
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 31 Jan, 2017 5 commits
-
-
Thadeu Lima de Souza Cascardo authored
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Keno Fischer authored
BugLink: http://bugs.launchpad.net/bugs/1658270 In 19be0eaf ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had also been set. However, a similar check in huge_memory.c was forgotten. As a result, remote memory writes to ro regions of memory backed by transparent huge pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true. While in this state the process is stil SIGKILLable, but little else works (e.g. no ptrace attach, no other signals). This is easily reproduced with the following code (assuming thp are set to always): #include <assert.h> #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #define TEST_SIZE 5 * 1024 * 1024 int main(void) { int status; pid_t child; int fd = open("/proc/self/mem", O_RDWR); void *addr = mmap(NULL, TEST_SIZE, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr != MAP_FAILED); pid_t parent_pid = getpid(); if ((child = fork()) == 0) { void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr2 != MAP_FAILED); memset(addr2, 'a', TEST_SIZE); pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr); return 0; } assert(child == waitpid(child, &status, 0)); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0); return 0; } Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to the update in gup.c in the original commit. The same pattern exists in follow_devmap_pmd. However, we should not be able to reach that check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever do. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Vitaly Kuznetsov authored
BugLink: http://bugs.launchpad.net/bugs/1630924 It may happen that secondary CPUs are still alive and resetting hv_context.tsc_page will cause a consequent crash in read_hv_clock_tsc() as we don't check for it being not NULL there. It is safe as we're not freeing this page anyways. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 56ef6718) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Vitaly Kuznetsov authored
BugLink: http://bugs.launchpad.net/bugs/1630924 There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt') which injects NMI to the guest. We may want to crash the guest and do kdump on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to allow the kdump kernel to re-establish VMBus connection so it will see VMBus devices (storage, network,..). To properly unload VMBus making it possible to start over during kdump we need to do the following: - Send an 'unload' message to the hypervisor. This can be done on any CPU so we do this the crashing CPU. - Receive the 'unload finished' reply message. WS2012R2 delivers this message to the CPU which was used to establish VMBus connection during module load and this CPU may differ from the CPU sending 'unload'. Receiving a VMBus message means the following: - There is a per-CPU slot in memory for one message. This slot can in theory be accessed by any CPU. - We get an interrupt on the CPU when a message was placed into the slot. - When we read the message we need to clear the slot and signal the fact to the hypervisor. In case there are more messages to this CPU pending the hypervisor will deliver the next message. The signaling is done by writing to an MSR so this can only be done on the appropriate CPU. To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload() function which checks message slots for all CPUs in a loop waiting for the 'unload finished' messages. However, there is an issue which arises when these conditions are met: - We're crashing on a CPU which is different from the one which was used to initially contact the hypervisor. - The CPU which was used for the initial contact is blocked with interrupts disabled and there is a message pending in the message slot. In this case we won't be able to read the 'unload finished' message on the crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs simultaneously: the first CPU entering panic() will proceed to crash and all other CPUs will stop themselves with interrupts disabled. The suggested solution is to handle unknown NMIs for Hyper-V guests on the first CPU which gets them only. This will allow us to rely on VMBus interrupt handler being able to receive the 'unload finish' message in case it is delivered to a different CPU. The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the boot CPU only, WS2012R2 and earlier Hyper-V versions are affected. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: devel@linuxdriverproject.org Cc: Haiyang Zhang <haiyangz@microsoft.com> Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit 59107e2f) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Joseph Salisbury authored
BugLink: http://bugs.launchpad.net/bugs/1630238 CONFIG_PINCTRL_CHERRYVIEW=y (must be y, not m) in addition to CONFIG_TOUCHSCREEN_ELAN to get the touchscreen working on CYAN. The change of Intel related PINCTRLs (specifically the CHERRYVIEW one) to m instead of y causes at least the touchscreen regression. Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 30 Jan, 2017 3 commits
-
-
Colin Ian King authored
BugLink: http://bugs.launchpad.net/bugs/1652132 Quirking the following AMI USB device with ALWAYS_POLL fixes an AMI virtual keyboard and mouse from not responding and timing out when it is attached to a ppc64el Power 8 system and when we have some rapid open/closes on the mouse device. usb 1-3: new high-speed USB device number 2 using xhci_hcd usb 1-3: New USB device found, idVendor=046b, idProduct=ff01 usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3 usb 1-3: Product: Virtual Hub usb 1-3: Manufacturer: American Megatrends Inc. usb 1-3: SerialNumber: serial usb 1-3.3: new high-speed USB device number 3 using xhci_hcd usb 1-3.3: New USB device found, idVendor=046b, idProduct=ff31 usb 1-3.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3 usb 1-3.3: Product: Virtual HardDisk Device usb 1-3.3: Manufacturer: American Megatrends Inc. usb 1-3.4: new low-speed USB device number 4 using xhci_hcd usb 1-3.4: New USB device found, idVendor=046b, idProduct=ff10 usb 1-3.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 1-3.4: Product: Virtual Keyboard and Mouse usb 1-3.4: Manufacturer: American Megatrends Inc. With the quirk I have not been able to trigger the issue with half an hour of saturation soak testing. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Joseph Salisbury <joseph.salisbury@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Kamal Mostafa authored
This long-standing oversight in our debian rules hardcodes the string "linux" instead of using the $(src_pkg_name) for just one of the generated .deb package names: linux-headers-x.x.x-x. Lets fix it in the generic branches (T,X,Y,Z,unstable) so that we won't have to keep applying this patch to each of the derivative/custom kernels. -----8<----- Ignore: yes Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Ben Romer <ben.romer@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Alexander Duyck authored
BugLink: http://bugs.launchpad.net/bugs/1658491 When I was adding the code for enabling VLAN promiscuous mode with SR-IOV enabled I had inadvertently left the VLNCTRL.VFE bit unchanged as I has assumed there was code in another path that was setting it when we enabled SR-IOV. This wasn't the case and as a result we were just disabling VLAN filtering for all the VFs apparently. Also the previous patches were always clearing CFIEN which was always set to 0 by the hardware anyway so I am dropping the redundant bit clearing. Fixes: 16369564 ("ixgbe: Add support for VLAN promiscuous with SR-IOV") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> (cherry-picked from commit f60439bc) Fixes: 6c39aa93 ("ixgbe: Add support for VLAN promiscuous with SR-IOV") Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
- 26 Jan, 2017 6 commits
-
-
Michal Hocko authored
BugLink: https://bugs.launchpad.net/bugs/1655842 There have been several reports about pre-mature OOM killer invocation in 4.7 kernel when order-2 allocation request (for the kernel stack) invoked OOM killer even during basic workloads (light IO or even kernel compile on some filesystems). In all reported cases the memory is fragmented and there are no order-2+ pages available. There is usually a large amount of slab memory (usually dentries/inodes) and further debugging has shown that there are way too many unmovable blocks which are skipped during the compaction. Multiple reporters have confirmed that the current linux-next which includes [1] and [2] helped and OOMs are not reproducible anymore. A simpler fix for the late rc and stable is to simply ignore the compaction feedback and retry as long as there is a reclaim progress and we are not getting OOM for order-0 pages. We already do that for CONFING_COMPACTION=n so let's reuse the same code when compaction is enabled as well. [1] http://lkml.kernel.org/r/20160810091226.6709-1-vbabka@suse.cz [2] http://lkml.kernel.org/r/f7a9ea9d-bb88-bfd6-e340-3a933559305a@suse.cz Fixes: 0a0337e0 ("mm, oom: rework oom detection") Link: http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.czSigned-off-by: Michal Hocko <mhocko@suse.com> Tested-by: Olaf Hering <olaf@aepfle.de> Tested-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Cc: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [4.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6b4e3181) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
-
Michal Hocko authored
BugLink: https://bugs.launchpad.net/bugs/1655842 Joonsoo has reported that he is able to trigger OOM for !costly high order requests (heavy fork() workload close the OOM) with the new oom detection rework. This is because we rely only on should_reclaim_retry when the compaction is disabled and it only checks watermarks for the requested order and so we might trigger OOM when there is a lot of free memory. It is not very clear what are the usual workloads when the compaction is disabled. Relying on high order allocations heavily without any mechanism to create those orders except for unbound amount of reclaim is certainly not a good idea. To prevent from potential regressions let's help this configuration some. We have to sacrifice the determinsm though because there simply is none here possible. should_compact_retry implementation for !CONFIG_COMPACTION, which was empty so far, will do watermark check for order-0 on all eligible zones. This will cause retrying until either the reclaim cannot make any further progress or all the zones are depleted even for order-0 pages. This means that the number of retries is basically unbounded for !costly orders but that was the case before the rework as well so this shouldn't regress. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1463051677-29418-3-git-send-email-mhocko@kernel.orgReported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (backported from commit 31e49bfd) [cascardo: fixup ac_classzone_idx to ac->classzone_idx] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
-
Michal Hocko authored
BugLink: https://bugs.launchpad.net/bugs/1655842 "mm: consider compaction feedback also for costly allocation" has removed the upper bound for the reclaim/compaction retries based on the number of reclaimed pages for costly orders. While this is desirable the patch did miss a mis interaction between reclaim, compaction and the retry logic. The direct reclaim tries to get zones over min watermark while compaction backs off and returns COMPACT_SKIPPED when all zones are below low watermark + 1<<order gap. If we are getting really close to OOM then __compaction_suitable can keep returning COMPACT_SKIPPED a high order request (e.g. hugetlb order-9) while the reclaim is not able to release enough pages to get us over low watermark. The reclaim is still able to make some progress (usually trashing over few remaining pages) so we are not able to break out from the loop. I have seen this happening with the same test described in "mm: consider compaction feedback also for costly allocation" on a swapless system. The original problem got resolved by "vmscan: consider classzone_idx in compaction_ready" but it shows how things might go wrong when we approach the oom event horizont. The reason why compaction requires being over low rather than min watermark is not clear to me. This check was there essentially since 56de7263 ("mm: compaction: direct compact when a high-order allocation fails"). It is clearly an implementation detail though and we shouldn't pull it into the generic retry logic while we should be able to cope with such eventuality. The only place in should_compact_retry where we retry without any upper bound is for compaction_withdrawn() case. Introduce compaction_zonelist_suitable function which checks the given zonelist and returns true only if there is at least one zone which would would unblock __compaction_suitable if more memory got reclaimed. In this implementation it checks __compaction_suitable with NR_FREE_PAGES plus part of the reclaimable memory as the target for the watermark check. The reclaimable memory is reduced linearly by the allocation order. The idea is that we do not want to reclaim all the remaining memory for a single allocation request just unblock __compaction_suitable which doesn't guarantee we will make a further progress. The new helper is then used if compaction_withdrawn() feedback was provided so we do not retry if there is no outlook for a further progress. !costly requests shouldn't be affected much - e.g. order-2 pages would require to have at least 64kB on the reclaimable LRUs while order-9 would need at least 32M which should be enough to not lock up. [vbabka@suse.cz: fix classzone_idx vs. high_zoneidx usage in compaction_zonelist_suitable] [akpm@linux-foundation.org: fix it for Mel's mm-page_alloc-remove-field-from-alloc_context.patch] Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (backported from commit 86a294a8) [cascardo: fixup ac_classzone_idx to ac->classzone_idx] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
-
Michal Hocko authored
BugLink: https://bugs.launchpad.net/bugs/1655842 PAGE_ALLOC_COSTLY_ORDER retry logic is mostly handled inside should_reclaim_retry currently where we decide to not retry after at least order worth of pages were reclaimed or the watermark check for at least one zone would succeed after reclaiming all pages if the reclaim hasn't made any progress. Compaction feedback is mostly ignored and we just try to make sure that the compaction did at least something before giving up. The first condition was added by a41f24ea ("page allocator: smarter retry of costly-order allocations) and it assumed that lumpy reclaim could have created a page of the sufficient order. Lumpy reclaim, has been removed quite some time ago so the assumption doesn't hold anymore. Remove the check for the number of reclaimed pages and rely on the compaction feedback solely. should_reclaim_retry now only makes sure that we keep retrying reclaim for high order pages only if they are hidden by watermaks so order-0 reclaim makes really sense. should_compact_retry now keeps retrying even for the costly allocations. The number of retries is reduced wrt. !costly requests because they are less important and harder to grant and so their pressure shouldn't cause contention for other requests or cause an over reclaim. We also do not reset no_progress_loops for costly request to make sure we do not keep reclaiming too agressively. This has been tested by running a process which fragments memory: - compact memory - mmap large portion of the memory (1920M on 2GRAM machine with 2G of swapspace) - MADV_DONTNEED single page in PAGE_SIZE*((1UL<<MAX_ORDER)-1) steps until certain amount of memory is freed (250M in my test) and reduce the step to (step / 2) + 1 after reaching the end of the mapping - then run a script which populates the page cache 2G (MemTotal) from /dev/zero to a new file And then tries to allocate nr_hugepages=$(awk '/MemAvailable/{printf "%d\n", $2/(2*1024)}' /proc/meminfo) huge pages. root@test1:~# echo 1 > /proc/sys/vm/overcommit_memory;echo 1 > /proc/sys/vm/compact_memory; ./fragment-mem-and-run /root/alloc_hugepages.sh 1920M 250M Node 0, zone DMA 31 28 31 10 2 0 2 1 2 3 1 Node 0, zone DMA32 437 319 171 50 28 25 20 16 16 14 437 * This is the /proc/buddyinfo after the compaction Done fragmenting. size=2013265920 freed=262144000 Node 0, zone DMA 165 48 3 1 2 0 2 2 2 2 0 Node 0, zone DMA32 35109 14575 185 51 41 12 6 0 0 0 0 * /proc/buddyinfo after memory got fragmented Executing "/root/alloc_hugepages.sh" Eating some pagecache 508623+0 records in 508623+0 records out 2083319808 bytes (2.1 GB) copied, 11.7292 s, 178 MB/s Node 0, zone DMA 3 5 3 1 2 0 2 2 2 2 0 Node 0, zone DMA32 111 344 153 20 24 10 3 0 0 0 0 * /proc/buddyinfo after page cache got eaten Trying to allocate 129 129 * 129 hugepages requested and all of them granted. Node 0, zone DMA 3 5 3 1 2 0 2 2 2 2 0 Node 0, zone DMA32 127 97 30 99 11 6 2 1 4 0 0 * /proc/buddyinfo after hugetlb allocation. 10 runs will behave as follows: Trying to allocate 130 130 -- Trying to allocate 129 129 -- Trying to allocate 128 128 -- Trying to allocate 129 129 -- Trying to allocate 128 128 -- Trying to allocate 129 129 -- Trying to allocate 132 132 -- Trying to allocate 129 129 -- Trying to allocate 128 128 -- Trying to allocate 129 129 So basically 100% success for all 10 attempts. Without the patch numbers looked much worse: Trying to allocate 128 12 -- Trying to allocate 129 14 -- Trying to allocate 129 7 -- Trying to allocate 129 16 -- Trying to allocate 129 30 -- Trying to allocate 129 38 -- Trying to allocate 129 19 -- Trying to allocate 129 37 -- Trying to allocate 129 28 -- Trying to allocate 129 37 Just for completness the base kernel without oom detection rework looks as follows: Trying to allocate 127 30 -- Trying to allocate 129 12 -- Trying to allocate 129 52 -- Trying to allocate 128 32 -- Trying to allocate 129 12 -- Trying to allocate 129 10 -- Trying to allocate 129 32 -- Trying to allocate 128 14 -- Trying to allocate 128 16 -- Trying to allocate 129 8 As we can see the success rate is much more volatile and smaller without this patch. So the patch not only makes the retry logic for costly requests more sensible the success rate is even higher. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 7854ea6c) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
-
Michal Hocko authored
BugLink: https://bugs.launchpad.net/bugs/1655842 should_reclaim_retry will give up retries for higher order allocations if none of the eligible zones has any requested or higher order pages available even if we pass the watermak check for order-0. This is done because there is no guarantee that the reclaimable and currently free pages will form the required order. This can, however, lead to situations where the high-order request (e.g. order-2 required for the stack allocation during fork) will trigger OOM too early - e.g. after the first reclaim/compaction round. Such a system would have to be highly fragmented and there is no guarantee further reclaim/compaction attempts would help but at least make sure that the compaction was active before we go OOM and keep retrying even if should_reclaim_retry tells us to oom if - the last compaction round backed off or - we haven't completed at least MAX_COMPACT_RETRIES active compaction rounds. The first rule ensures that the very last attempt for compaction was not ignored while the second guarantees that the compaction has done some work. Multiple retries might be needed to prevent occasional pigggy backing of other contexts to steal the compacted pages before the current context manages to retry to allocate them. compaction_failed() is taken as a final word from the compaction that the retry doesn't make much sense. We have to be careful though because the first compaction round is MIGRATE_ASYNC which is rather weak as it ignores pages under writeback and gives up too easily in other situations. We therefore have to make sure that MIGRATE_SYNC_LIGHT mode has been used before we give up. With this logic in place we do not have to increase the migration mode unconditionally and rather do it only if the compaction failed for the weaker mode. A nice side effect is that the stronger migration mode is used only when really needed so this has a potential of smaller latencies in some cases. Please note that the compaction doesn't tell us much about how successful it was when returning compaction_made_progress so we just have to blindly trust that another retry is worthwhile and cap the number to something reasonable to guarantee a convergence. If the given number of successful retries is not sufficient for a reasonable workloads we should focus on the collected compaction tracepoints data and try to address the issue in the compaction code. If this is not feasible we can increase the retries limit. [mhocko@suse.com: fix warning] Link: http://lkml.kernel.org/r/20160512061636.GA4200@dhcp22.suse.czSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 33c2d214) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
-
Michal Hocko authored
BugLink: https://bugs.launchpad.net/bugs/1655842 Compaction can provide a wild variation of feedback to the caller. Many of them are implementation specific and the caller of the compaction (especially the page allocator) shouldn't be bound to specifics of the current implementation. This patch abstracts the feedback into three basic types: - compaction_made_progress - compaction was active and made some progress. - compaction_failed - compaction failed and further attempts to invoke it would most probably fail and therefore it is not worth retrying - compaction_withdrawn - compaction wasn't invoked for an implementation specific reasons. In the current implementation it means that the compaction was deferred, contended or the page scanners met too early without any progress. Retrying is still worthwhile. [vbabka@suse.cz: do not change thp back off behavior] [akpm@linux-foundation.org: fix typo in comment, per Hillf] Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (backported from commit cab1802b) [cascardo: fixup as we don't have kcompactd] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
-