Commits · 9ef4d29c0f468f4edd11c5ba5cd4796923b715d8 · Kirill Smelkov / linux

08 Mar, 2017 12 commits

ARM: dts: imx31: fix AVIC base address · 9ef4d29c

Vladimir Zapolskiy authored Nov 17, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit af92305e upstream.

On i.MX31 AVIC interrupt controller base address is at 0x68000000.

The problem was shadowed by the AVIC driver, which takes the correct
base address from a SoC specific header file.

Fixes: d2a37b3d ("ARM i.MX31: Add devicetree support")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

9ef4d29c

ARM: dts: imx31: move CCM device node to AIPS2 bus devices · 47f5d3fc

Vladimir Zapolskiy authored Sep 26, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit 1f87aee6 upstream.

i.MX31 Clock Control Module controller is found on AIPS2 bus, move it
there from SPBA bus to avoid a conflict of device IO space mismatch.

Fixes: ef0e4a60 ("ARM: mx31: Replace clk_register_clkdev with clock DT lookup")
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

47f5d3fc

ARM: dts: imx31: fix clock control module interrupts description · 9e302c75

Vladimir Zapolskiy authored Sep 26, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit 2e575cbc upstream.

The type of AVIC interrupt controller found on i.MX31 is one-cell,
namely 31 for CCM DVFS and 53 for CCM, however for clock control
module its interrupts are specified as 3-cells, fix it.

Fixes: ef0e4a60 ("ARM: mx31: Replace clk_register_clkdev with clock DT lookup")
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

9e302c75

perf scripting: Avoid leaking the scripting_context variable · 9b56175a

Arnaldo Carvalho de Melo authored Oct 25, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit cf346d5b upstream.

Both register_perl_scripting() and register_python_scripting() allocate
this variable, fix it by checking if it already was.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 7e4b21b8 ("perf/scripts: Add Python scripting engine")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

9b56175a

IB/IPoIB: Remove can't use GFP_NOIO warning · 72e98ad8

Kamal Heib authored Nov 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit 0b59970e upstream.

Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.

This print become more annoying when the IPoIB interface is configured
to work in connected mode.

Fixes: 09b93088 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

72e98ad8

IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs · c4158d12

Eran Ben Elisha authored Nov 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit 1f22e454 upstream.

According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is
supported only if dmfs_ipoib bit is set.

If it isn't set we want to ensure allocating NET_IF QPs fail. We do so
by filling out the allocation bitmap. By thus, the NET_IF QPs allocating
function won't find any free QP and will fail.

Fixes: c1c98501 ('IB/mlx4: Add support for steerable IB UD QPs')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

c4158d12

IB/mlx4: Fix port query for 56Gb Ethernet links · 49f77e6a

Saeed Mahameed authored Nov 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit 6fa26208 upstream.

Report the correct speed in the port attributes when using a 56Gbps
ethernet link.  Without this change the field is incorrectly set to 10.

Fixes: a9c766bb ('IB/mlx4: Fix info returned when querying IBoE ports')
Fixes: 2e96691c ('IB: Use central enum for speed instead of hard-coded values')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

49f77e6a

IB/mlx4: Fix out-of-range array index in destroy qp flow · 4d8d8f81

Jack Morgenstein authored Nov 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit c482af64 upstream.

For non-special QPs, the port value becomes non-zero only at the
RESET-to-INIT transition. If the QP has not undergone that transition,
its port number value is still zero.

If such a QP is destroyed before being moved out of the RESET state,
subtracting one from the qp port number results in a negative value.
Using that negative value as an index into the qp1_proxy array
results in an out-of-bounds array reference.

Fix this by testing that the QP type is one that uses qp1_proxy before
using the port number. For special QPs of all types, the port number is
specified at QP creation time.

Fixes: 9433c188 ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

4d8d8f81

IB/mlx4: Set traffic class in AH · bac1d443

Maor Gottlieb authored Nov 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit af4295c1 upstream.

Set traffic class within sl_tclass_flowlabel when create iboe AH.
Without this the TOS value will be empty when running VLAN tagged
traffic, because the TOS value is taken from the traffic class in the
address handle attributes.

Fixes: 9106c410 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

bac1d443

IB/mlx5: Wait for all async command completions to complete · f7da8081

Eli Cohen authored Oct 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit acbda523 upstream.

Wait before continuing unload till all pending mkey async creation requests
are done.

Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

f7da8081

ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it · 9b797468

Steven Rostedt authored May 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1660993

commit 8329e818 upstream.

Matt Fleming reported seeing crashes when enabling and disabling
function profiling which uses function graph tracer. Later Namhyung Kim
hit a similar issue and he found that the issue was due to the jmp to
ftrace_stub in ftrace_graph_call was only two bytes, and when it was
changed to jump to the tracing code, it overwrote the ftrace_stub that
was after it.

Masami Hiramatsu bisected this down to a binutils change:

8dcea93252a9ea7dff57e85220a719e2a5e8ab41 is the first bad commit
commit 8dcea93252a9ea7dff57e85220a719e2a5e8ab41
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri May 15 03:17:31 2015 -0700

    Add -mshared option to x86 ELF assembler

    This patch adds -mshared option to x86 ELF assembler.  By default,
    assembler will optimize out non-PLT relocations against defined non-weak
    global branch targets with default visibility.  The -mshared option tells
    the assembler to generate code which may go into a shared library
    where all non-weak global branch targets with default visibility can
    be preempted.  The resulting code is slightly bigger.  This option
    only affects the handling of branch instructions.

Declaring ftrace_stub as a weak call prevents gas from using two byte
jumps to it, which would be converted to a jump to the function graph
code.

Link: http://lkml.kernel.org/r/20160516230035.1dbae571@gandalf.local.homeReported-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

9b797468

UBUNTU: Start new release · b75dc28e
Thadeu Lima de Souza Cascardo authored Mar 08, 2017
```
Ignore: yes
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
```
b75dc28e

03 Mar, 2017 4 commits

UBUNTU: Ubuntu-4.4.0-66.87 · 05022128
Stefan Bader authored Mar 03, 2017
```
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
```
05022128

tty: n_hdlc: get rid of racy n_hdlc.tbuf · 579abe8d

Alexander Popov authored Mar 01, 2017

Currently N_HDLC line discipline uses a self-made singly linked list for
data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after
an error.

The commit be10eb75
("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
After tx error concurrent flush_tx_queue() and n_hdlc_send_frames() can put
one data buffer to tx_free_buf_list twice. That causes double free in
n_hdlc_release().

Let's use standard kernel linked list and get rid of n_hdlc.tbuf:
in case of tx error put current data buffer after the head of tx_buf_list.
Signed-off-by: Alexander Popov <alex.popov@linux.com>

CVE-2017-2636
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

579abe8d

TTY: n_hdlc, fix lockdep false positive · c1c6bc86

Jiri Slaby authored Nov 26, 2015

The class of 4 n_hdls buf locks is the same because a single function
n_hdlc_buf_list_init is used to init all the locks. But since
flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls
n_hdlc_buf_put which takes n_hdlc->tx_free_buf_list.spinlock, lockdep
emits a warning:
=============================================
[ INFO: possible recursive locking detected ]
4.3.0-25.g91e30a7-default #1 Not tainted
---------------------------------------------
a.out/1248 is trying to acquire lock:
 (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]

but task is already holding lock:
 (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&list->spinlock)->rlock);
  lock(&(&list->spinlock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by a.out/1248:
 #0:  (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50
 #1:  (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
...
Call Trace:
...
 [<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70
 [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
 [<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc]
 [<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40
...

Fix it by initializing the spin_locks separately. This removes also
reduntand memset of a freshly kzallocated space.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

CVE-2017-2636

(cherry-picked from e9b736d8 upstream)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

c1c6bc86

UBUNTU: Start new release · 05eb5f2b
Stefan Bader authored Mar 03, 2017
```
Ignore: yes
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
```
05eb5f2b

20 Feb, 2017 3 commits

UBUNTU: Ubuntu-4.4.0-64.85 · 18a85f61
Stefan Bader authored Feb 20, 2017
```
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
```
18a85f61

dccp: fix freeing skb too early for IPV6_RECVPKTINFO · 4feb04ad

Andrey Konovalov authored Feb 16, 2017

In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.

However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.

Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.

Similar fixes for TCP:

fb7e2399 [TCP]: skb is unexpectedly freed.
0aea76d3 tcp: SYN packets are now
simply consumed
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

CVE-2017-6074
BugLink: http://bugs.launchpad.net/bugs/1665935

(cherry-picked from 5edabca9 davem)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

4feb04ad

UBUNTU: Start new release · 58c3d1ca
Stefan Bader authored Feb 20, 2017
```
Ignore: yes
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
```
58c3d1ca

01 Feb, 2017 1 commit
- UBUNTU: Ubuntu-4.4.0-63.84 · 59afec46
  Thadeu Lima de Souza Cascardo authored Jan 31, 2017
```
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
```
  59afec46
31 Jan, 2017 5 commits

UBUNTU: [Config] Drop elants_i2c and pinctrl-cherryview from ABI · 40b517cc
Thadeu Lima de Souza Cascardo authored Jan 31, 2017
```
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
```
40b517cc

UBUNTU: SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp · 9c49adf9

Keno Fischer authored Jan 27, 2017

BugLink: http://bugs.launchpad.net/bugs/1658270

In 19be0eaf ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"),
the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to
setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in
gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had
also been set. However, a similar check in huge_memory.c was forgotten. As a
result, remote memory writes to ro regions of memory backed by transparent huge
pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and
returns 0 causing a retry, but follow_trans_huge_pmd bails out immidiately
because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true. While in this
state the process is stil SIGKILLable, but little else works (e.g. no ptrace
attach, no other signals). This is easily reproduced with the following
code (assuming thp are set to always):

    #include <assert.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define TEST_SIZE 5 * 1024 * 1024

    int main(void) {
      int status;
      pid_t child;
      int fd = open("/proc/self/mem", O_RDWR);
      void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                        MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
      assert(addr != MAP_FAILED);
      pid_t parent_pid = getpid();
      if ((child = fork()) == 0) {
        void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
        assert(addr2 != MAP_FAILED);
        memset(addr2, 'a', TEST_SIZE);
        pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
        return 0;
      }
      assert(child == waitpid(child, &status, 0));
      assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
      return 0;
    }

Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to
the update in gup.c in the original commit. The same pattern exists in
follow_devmap_pmd. However, we should not be able to reach that check
with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever
do.
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

9c49adf9

UBUNTU: SAUCE: hv: don't reset hv_context.tsc_page on crash · 800d849b

Vitaly Kuznetsov authored Jan 27, 2017

BugLink: http://bugs.launchpad.net/bugs/1630924

It may happen that secondary CPUs are still alive and resetting
hv_context.tsc_page will cause a consequent crash in read_hv_clock_tsc()
as we don't check for it being not NULL there. It is safe as we're not
freeing this page anyways.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 56ef6718)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

800d849b

x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic · 6a5e3913

Vitaly Kuznetsov authored Jan 27, 2017

BugLink: http://bugs.launchpad.net/bugs/1630924

There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
which injects NMI to the guest. We may want to crash the guest and do kdump
on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
allow the kdump kernel to re-establish VMBus connection so it will see
VMBus devices (storage, network,..).

To properly unload VMBus making it possible to start over during kdump we
need to do the following:

 - Send an 'unload' message to the hypervisor. This can be done on any CPU
   so we do this the crashing CPU.

 - Receive the 'unload finished' reply message. WS2012R2 delivers this
   message to the CPU which was used to establish VMBus connection during
   module load and this CPU may differ from the CPU sending 'unload'.

Receiving a VMBus message means the following:

 - There is a per-CPU slot in memory for one message. This slot can in
   theory be accessed by any CPU.

 - We get an interrupt on the CPU when a message was placed into the slot.

 - When we read the message we need to clear the slot and signal the fact
   to the hypervisor. In case there are more messages to this CPU pending
   the hypervisor will deliver the next message. The signaling is done by
   writing to an MSR so this can only be done on the appropriate CPU.

To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload()
function which checks message slots for all CPUs in a loop waiting for the
'unload finished' messages. However, there is an issue which arises when
these conditions are met:

 - We're crashing on a CPU which is different from the one which was used
   to initially contact the hypervisor.

 - The CPU which was used for the initial contact is blocked with interrupts
   disabled and there is a message pending in the message slot.

In this case we won't be able to read the 'unload finished' message on the
crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
simultaneously: the first CPU entering panic() will proceed to crash and
all other CPUs will stop themselves with interrupts disabled.

The suggested solution is to handle unknown NMIs for Hyper-V guests on the
first CPU which gets them only. This will allow us to rely on VMBus
interrupt handler being able to receive the 'unload finish' message in
case it is delivered to a different CPU.

The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
boot CPU only, WS2012R2 and earlier Hyper-V versions are affected.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 59107e2f)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

6a5e3913

UBUNTU: [Config] CONFIG_PINCTRL_CHERRYVIEW=y · c4ae8dea

Joseph Salisbury authored Dec 13, 2016

BugLink: http://bugs.launchpad.net/bugs/1630238

CONFIG_PINCTRL_CHERRYVIEW=y (must be y, not m) in addition to CONFIG_TOUCHSCREEN_ELAN to get the touchscreen working on CYAN.
The change of Intel related PINCTRLs (specifically the CHERRYVIEW one) to m instead of y causes at least the touchscreen regression.
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

c4ae8dea

30 Jan, 2017 3 commits

UBUNTU: SAUCE: HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL · 4ffb8f69

Colin Ian King authored Jan 27, 2017

BugLink: http://bugs.launchpad.net/bugs/1652132

Quirking the following AMI USB device with ALWAYS_POLL fixes an AMI
virtual keyboard and mouse from not responding and timing out when
it is attached to a ppc64el Power 8 system and when we have some
rapid open/closes on the mouse device.

 usb 1-3: new high-speed USB device number 2 using xhci_hcd
 usb 1-3: New USB device found, idVendor=046b, idProduct=ff01
 usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
 usb 1-3: Product: Virtual Hub
 usb 1-3: Manufacturer: American Megatrends Inc.
 usb 1-3: SerialNumber: serial
 usb 1-3.3: new high-speed USB device number 3 using xhci_hcd
 usb 1-3.3: New USB device found, idVendor=046b, idProduct=ff31
 usb 1-3.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
 usb 1-3.3: Product: Virtual HardDisk Device
 usb 1-3.3: Manufacturer: American Megatrends Inc.
 usb 1-3.4: new low-speed USB device number 4 using xhci_hcd
 usb 1-3.4: New USB device found, idVendor=046b, idProduct=ff10
 usb 1-3.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
 usb 1-3.4: Product: Virtual Keyboard and Mouse
 usb 1-3.4: Manufacturer: American Megatrends Inc.

With the quirk I have not been able to trigger the issue with
half an hour of saturation soak testing.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

4ffb8f69

UBUNTU: [debian] derive indep_hdrs_pkg_name from src_pkg_name · 813489f7

Kamal Mostafa authored Jan 24, 2017

This long-standing oversight in our debian rules hardcodes the string "linux"
instead of using the $(src_pkg_name) for just one of the generated .deb package
names: linux-headers-x.x.x-x.  Lets fix it in the generic branches
(T,X,Y,Z,unstable) so that we won't have to keep applying this patch to each of
the derivative/custom kernels.

-----8<-----

Ignore: yes
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Ben Romer <ben.romer@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

813489f7

ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths · 5a96c011

Alexander Duyck authored Jan 23, 2017

BugLink: http://bugs.launchpad.net/bugs/1658491

When I was adding the code for enabling VLAN promiscuous mode with SR-IOV
enabled I had inadvertently left the VLNCTRL.VFE bit unchanged as I has
assumed there was code in another path that was setting it when we enabled
SR-IOV.  This wasn't the case and as a result we were just disabling VLAN
filtering for all the VFs apparently.

Also the previous patches were always clearing CFIEN which was always set
to 0 by the hardware anyway so I am dropping the redundant bit clearing.

Fixes: 16369564 ("ixgbe: Add support for VLAN promiscuous with SR-IOV")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry-picked from commit f60439bc)
Fixes: 6c39aa93 ("ixgbe: Add support for VLAN promiscuous with SR-IOV")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

5a96c011

26 Jan, 2017 12 commits

mm, oom: prevent premature OOM killer invocation for high order request · 018a3c3c

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

There have been several reports about pre-mature OOM killer invocation
in 4.7 kernel when order-2 allocation request (for the kernel stack)
invoked OOM killer even during basic workloads (light IO or even kernel
compile on some filesystems).  In all reported cases the memory is
fragmented and there are no order-2+ pages available.  There is usually
a large amount of slab memory (usually dentries/inodes) and further
debugging has shown that there are way too many unmovable blocks which
are skipped during the compaction.  Multiple reporters have confirmed
that the current linux-next which includes [1] and [2] helped and OOMs
are not reproducible anymore.

A simpler fix for the late rc and stable is to simply ignore the
compaction feedback and retry as long as there is a reclaim progress and
we are not getting OOM for order-0 pages.  We already do that for
CONFING_COMPACTION=n so let's reuse the same code when compaction is
enabled as well.

[1] http://lkml.kernel.org/r/20160810091226.6709-1-vbabka@suse.cz
[2] http://lkml.kernel.org/r/f7a9ea9d-bb88-bfd6-e340-3a933559305a@suse.cz

Fixes: 0a0337e0 ("mm, oom: rework oom detection")
Link: http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.czSigned-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Tested-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Cc: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[4.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6b4e3181)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

018a3c3c

mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION · 3c2eabde

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

Joonsoo has reported that he is able to trigger OOM for !costly high
order requests (heavy fork() workload close the OOM) with the new oom
detection rework.  This is because we rely only on should_reclaim_retry
when the compaction is disabled and it only checks watermarks for the
requested order and so we might trigger OOM when there is a lot of free
memory.

It is not very clear what are the usual workloads when the compaction is
disabled.  Relying on high order allocations heavily without any
mechanism to create those orders except for unbound amount of reclaim is
certainly not a good idea.

To prevent from potential regressions let's help this configuration
some.  We have to sacrifice the determinsm though because there simply
is none here possible.  should_compact_retry implementation for
!CONFIG_COMPACTION, which was empty so far, will do watermark check for
order-0 on all eligible zones.  This will cause retrying until either
the reclaim cannot make any further progress or all the zones are
depleted even for order-0 pages.  This means that the number of retries
is basically unbounded for !costly orders but that was the case before
the rework as well so this shouldn't regress.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1463051677-29418-3-git-send-email-mhocko@kernel.orgReported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit 31e49bfd)
[cascardo: fixup ac_classzone_idx to ac->classzone_idx]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

3c2eabde

mm, oom, compaction: prevent from should_compact_retry looping for ever for costly orders · 8f32a02e

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

"mm: consider compaction feedback also for costly allocation" has
removed the upper bound for the reclaim/compaction retries based on the
number of reclaimed pages for costly orders.  While this is desirable
the patch did miss a mis interaction between reclaim, compaction and the
retry logic.  The direct reclaim tries to get zones over min watermark
while compaction backs off and returns COMPACT_SKIPPED when all zones
are below low watermark + 1<<order gap.  If we are getting really close
to OOM then __compaction_suitable can keep returning COMPACT_SKIPPED a
high order request (e.g.  hugetlb order-9) while the reclaim is not able
to release enough pages to get us over low watermark.  The reclaim is
still able to make some progress (usually trashing over few remaining
pages) so we are not able to break out from the loop.

I have seen this happening with the same test described in "mm: consider
compaction feedback also for costly allocation" on a swapless system.
The original problem got resolved by "vmscan: consider classzone_idx in
compaction_ready" but it shows how things might go wrong when we
approach the oom event horizont.

The reason why compaction requires being over low rather than min
watermark is not clear to me.  This check was there essentially since
56de7263 ("mm: compaction: direct compact when a high-order
allocation fails").  It is clearly an implementation detail though and
we shouldn't pull it into the generic retry logic while we should be
able to cope with such eventuality.  The only place in
should_compact_retry where we retry without any upper bound is for
compaction_withdrawn() case.

Introduce compaction_zonelist_suitable function which checks the given
zonelist and returns true only if there is at least one zone which would
would unblock __compaction_suitable if more memory got reclaimed.  In
this implementation it checks __compaction_suitable with NR_FREE_PAGES
plus part of the reclaimable memory as the target for the watermark
check.  The reclaimable memory is reduced linearly by the allocation
order.  The idea is that we do not want to reclaim all the remaining
memory for a single allocation request just unblock
__compaction_suitable which doesn't guarantee we will make a further
progress.

The new helper is then used if compaction_withdrawn() feedback was
provided so we do not retry if there is no outlook for a further
progress.  !costly requests shouldn't be affected much - e.g.  order-2
pages would require to have at least 64kB on the reclaimable LRUs while
order-9 would need at least 32M which should be enough to not lock up.

[vbabka@suse.cz: fix classzone_idx vs. high_zoneidx usage in compaction_zonelist_suitable]
[akpm@linux-foundation.org: fix it for Mel's mm-page_alloc-remove-field-from-alloc_context.patch]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit 86a294a8)
[cascardo: fixup ac_classzone_idx to ac->classzone_idx]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

8f32a02e

mm: consider compaction feedback also for costly allocation · 51760132

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

PAGE_ALLOC_COSTLY_ORDER retry logic is mostly handled inside
should_reclaim_retry currently where we decide to not retry after at
least order worth of pages were reclaimed or the watermark check for at
least one zone would succeed after reclaiming all pages if the reclaim
hasn't made any progress.  Compaction feedback is mostly ignored and we
just try to make sure that the compaction did at least something before
giving up.

The first condition was added by a41f24ea ("page allocator: smarter
retry of costly-order allocations) and it assumed that lumpy reclaim
could have created a page of the sufficient order.  Lumpy reclaim, has
been removed quite some time ago so the assumption doesn't hold anymore.
Remove the check for the number of reclaimed pages and rely on the
compaction feedback solely.  should_reclaim_retry now only makes sure
that we keep retrying reclaim for high order pages only if they are
hidden by watermaks so order-0 reclaim makes really sense.

should_compact_retry now keeps retrying even for the costly allocations.
The number of retries is reduced wrt.  !costly requests because they are
less important and harder to grant and so their pressure shouldn't cause
contention for other requests or cause an over reclaim.  We also do not
reset no_progress_loops for costly request to make sure we do not keep
reclaiming too agressively.

This has been tested by running a process which fragments memory:
	- compact memory
	- mmap large portion of the memory (1920M on 2GRAM machine with 2G
	  of swapspace)
	- MADV_DONTNEED single page in PAGE_SIZE*((1UL<<MAX_ORDER)-1)
	  steps until certain amount of memory is freed (250M in my test)
	  and reduce the step to (step / 2) + 1 after reaching the end of
	  the mapping
	- then run a script which populates the page cache 2G (MemTotal)
	  from /dev/zero to a new file
And then tries to allocate
nr_hugepages=$(awk '/MemAvailable/{printf "%d\n", $2/(2*1024)}' /proc/meminfo)
huge pages.

root@test1:~# echo 1 > /proc/sys/vm/overcommit_memory;echo 1 > /proc/sys/vm/compact_memory; ./fragment-mem-and-run /root/alloc_hugepages.sh 1920M 250M
Node 0, zone      DMA     31     28     31     10      2      0      2      1      2      3      1
Node 0, zone    DMA32    437    319    171     50     28     25     20     16     16     14    437

* This is the /proc/buddyinfo after the compaction

Done fragmenting. size=2013265920 freed=262144000
Node 0, zone      DMA    165     48      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32  35109  14575    185     51     41     12      6      0      0      0      0

* /proc/buddyinfo after memory got fragmented

Executing "/root/alloc_hugepages.sh"
Eating some pagecache
508623+0 records in
508623+0 records out
2083319808 bytes (2.1 GB) copied, 11.7292 s, 178 MB/s
Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32    111    344    153     20     24     10      3      0      0      0      0

* /proc/buddyinfo after page cache got eaten

Trying to allocate 129
129

* 129 hugepages requested and all of them granted.

Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32    127     97     30     99     11      6      2      1      4      0      0

* /proc/buddyinfo after hugetlb allocation.

10 runs will behave as follows:
Trying to allocate 130
130
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129
--
Trying to allocate 132
132
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129

So basically 100% success for all 10 attempts.
Without the patch numbers looked much worse:
Trying to allocate 128
12
--
Trying to allocate 129
14
--
Trying to allocate 129
7
--
Trying to allocate 129
16
--
Trying to allocate 129
30
--
Trying to allocate 129
38
--
Trying to allocate 129
19
--
Trying to allocate 129
37
--
Trying to allocate 129
28
--
Trying to allocate 129
37

Just for completness the base kernel without oom detection rework looks
as follows:
Trying to allocate 127
30
--
Trying to allocate 129
12
--
Trying to allocate 129
52
--
Trying to allocate 128
32
--
Trying to allocate 129
12
--
Trying to allocate 129
10
--
Trying to allocate 129
32
--
Trying to allocate 128
14
--
Trying to allocate 128
16
--
Trying to allocate 129
8

As we can see the success rate is much more volatile and smaller without
this patch. So the patch not only makes the retry logic for costly
requests more sensible the success rate is even higher.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7854ea6c)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

51760132

mm, oom: protect !costly allocations some more · 989e78e9

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

should_reclaim_retry will give up retries for higher order allocations
if none of the eligible zones has any requested or higher order pages
available even if we pass the watermak check for order-0.  This is done
because there is no guarantee that the reclaimable and currently free
pages will form the required order.

This can, however, lead to situations where the high-order request (e.g.
order-2 required for the stack allocation during fork) will trigger OOM
too early - e.g.  after the first reclaim/compaction round.  Such a
system would have to be highly fragmented and there is no guarantee
further reclaim/compaction attempts would help but at least make sure
that the compaction was active before we go OOM and keep retrying even
if should_reclaim_retry tells us to oom if

	- the last compaction round backed off or
	- we haven't completed at least MAX_COMPACT_RETRIES active
	  compaction rounds.

The first rule ensures that the very last attempt for compaction was not
ignored while the second guarantees that the compaction has done some
work.  Multiple retries might be needed to prevent occasional pigggy
backing of other contexts to steal the compacted pages before the
current context manages to retry to allocate them.

compaction_failed() is taken as a final word from the compaction that
the retry doesn't make much sense.  We have to be careful though because
the first compaction round is MIGRATE_ASYNC which is rather weak as it
ignores pages under writeback and gives up too easily in other
situations.  We therefore have to make sure that MIGRATE_SYNC_LIGHT mode
has been used before we give up.  With this logic in place we do not
have to increase the migration mode unconditionally and rather do it
only if the compaction failed for the weaker mode.  A nice side effect
is that the stronger migration mode is used only when really needed so
this has a potential of smaller latencies in some cases.

Please note that the compaction doesn't tell us much about how
successful it was when returning compaction_made_progress so we just
have to blindly trust that another retry is worthwhile and cap the
number to something reasonable to guarantee a convergence.

If the given number of successful retries is not sufficient for a
reasonable workloads we should focus on the collected compaction
tracepoints data and try to address the issue in the compaction code.
If this is not feasible we can increase the retries limit.

[mhocko@suse.com: fix warning]
  Link: http://lkml.kernel.org/r/20160512061636.GA4200@dhcp22.suse.czSigned-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 33c2d214)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

989e78e9

mm, compaction: abstract compaction feedback to helpers · bca70cc5

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

Compaction can provide a wild variation of feedback to the caller.  Many
of them are implementation specific and the caller of the compaction
(especially the page allocator) shouldn't be bound to specifics of the
current implementation.

This patch abstracts the feedback into three basic types:
	- compaction_made_progress - compaction was active and made some
	  progress.
	- compaction_failed - compaction failed and further attempts to
	  invoke it would most probably fail and therefore it is not
	  worth retrying
	- compaction_withdrawn - compaction wasn't invoked for an
          implementation specific reasons. In the current implementation
          it means that the compaction was deferred, contended or the
          page scanners met too early without any progress. Retrying is
          still worthwhile.

[vbabka@suse.cz: do not change thp back off behavior]
[akpm@linux-foundation.org: fix typo in comment, per Hillf]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit cab1802b)
[cascardo: fixup as we don't have kcompactd]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

bca70cc5

mm, compaction: distinguish between full and partial COMPACT_COMPLETE · ff732276

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

COMPACT_COMPLETE now means that compaction and free scanner met.  This
is not very useful information if somebody just wants to use this
feedback and make any decisions based on that.  The current caller might
be a poor guy who just happened to scan tiny portion of the zone and
that could be the reason no suitable pages were compacted.  Make sure we
distinguish the full and partial zone walks.

Consumers should treat COMPACT_PARTIAL_SKIPPED as a potential success
and be optimistic in retrying.

The existing users of COMPACT_COMPLETE are conservatively changed to use
COMPACT_PARTIAL_SKIPPED as well but some of them should be probably
reconsidered and only defer the compaction only for COMPACT_COMPLETE
with the new semantic.

This patch shouldn't introduce any functional changes.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c8f7de0b)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

ff732276

mm, compaction: simplify __alloc_pages_direct_compact feedback interface · 5d042d3c

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

__alloc_pages_direct_compact communicates potential back off by two
variables:
	- deferred_compaction tells that the compaction returned
	  COMPACT_DEFERRED
	- contended_compaction is set when there is a contention on
	  zone->lock resp. zone->lru_lock locks

__alloc_pages_slowpath then backs of for THP allocation requests to
prevent from long stalls. This is rather messy and it would be much
cleaner to return a single compact result value and hide all the nasty
details into __alloc_pages_direct_compact.

This patch shouldn't introduce any functional changes.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit c5d01d0d)
[cascardo: small cleanup as we backported 0a0337e0 before]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

5d042d3c

mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED · 0b7dc5f6

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

try_to_compact_pages() can currently return COMPACT_SKIPPED even when
the compaction is defered for some zone just because zone DMA is skipped
in 99% of cases due to watermark checks.  This makes COMPACT_DEFERRED
basically unusable for the page allocator as a feedback mechanism.

Make sure we distinguish those two states properly and switch their
ordering in the enum.  This would mean that the COMPACT_SKIPPED will be
returned only when all eligible zones are skipped.

As a result COMPACT_DEFERRED handling for THP in __alloc_pages_slowpath
will be more precise and we would bail out rather than reclaim.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1d4746d3)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

0b7dc5f6

mm, compaction: change COMPACT_ constants into enum · ea422501

Michal Hocko authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

Compaction code is doing weird dances between COMPACT_FOO -> int ->
unsigned long

But there doesn't seem to be any reason for that.  All functions which
return/use one of those constants are not expecting any other value so it
really makes sense to define an enum for them and make it clear that no
other values are expected.

This is a pure cleanup and shouldn't introduce any functional changes.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ea7ab982)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

ea422501

mm, page_alloc: convert alloc_flags to unsigned · 09807762

Mel Gorman authored Jan 24, 2017

BugLink: https://bugs.launchpad.net/bugs/1655842

alloc_flags is a bitmask of flags but it is signed which does not
necessarily generate the best code depending on the compiler.  Even
without an impact, it makes more sense that this be unsigned.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(backported from commit c603844b)
[cascardo: keep ALLOC_CPUSET at __alloc_pages_nodemask]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

09807762

UBUNTU: SAUCE: ibmvscsis: Fix srp_transfer_data fail return code · 5cb925f3

Bryant G. Ly authored Jan 17, 2017

BugLink: http://bugs.launchpad.net/bugs/1657194

If srp_transfer_data fails within ibmvscsis_write_pending, then
the most likely scenario is that the client timed out the op and
removed the TCE mapping. Thus it will loop forever retrying the
op that is pretty much guaranteed to fail forever. A better return
code would be EIO instead of EAGAIN.

Cc: stable@vger.kernel.org
Reported-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Bryant G. Ly <bgly@us.ibm.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

5cb925f3