Commits · 895b58405a5a34a5ea8961b535bc26603e2aaa1d · Kirill Smelkov / linux

29 Feb, 2016 40 commits

UBUNTU: SAUCE: fs: Treat foreign mounts as nosuid · 895b5840

Andy Lutomirski authored Oct 14, 2014

If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendants.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileged.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

895b5840
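
The rule described in 895b5840 can be illustrated with a small model. This is
only a sketch of the stated condition, not the kernel code: the UserNS and
Mount classes, the may_suid() name, and the MNT_NOSUID value are hypothetical
stand-ins chosen for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

MNT_NOSUID = 0x01  # illustrative flag value, assumed for this sketch


@dataclass(eq=False)
class UserNS:
    """Hypothetical stand-in for a user namespace with a parent link."""
    parent: Optional["UserNS"] = None


@dataclass(eq=False)
class Mount:
    """Hypothetical stand-in for a mount."""
    mnt_ns: object   # mount namespace the mount belongs to
    owner: UserNS    # user namespace that owns the mount
    flags: int = 0


def owned_by_self_or_ancestor(current: UserNS, owner: UserNS) -> bool:
    # Walk from the caller's user namespace up through its ancestors.
    ns = current
    while ns is not None:
        if ns is owner:
            return True
        ns = ns.parent
    return False


def may_suid(mnt: Mount, current_mnt_ns: object, current_user_ns: UserNS) -> bool:
    # setuid/setgid/file caps are honoured only when the mount is not
    # nosuid, lives in the caller's mount namespace, and is owned by the
    # caller's user namespace or one of its ancestors.
    return (not (mnt.flags & MNT_NOSUID)
            and mnt.mnt_ns is current_mnt_ns
            and owned_by_self_or_ancestor(current_user_ns, mnt.owner))
```

Under this model a mount owned by a child user namespace is treated as nosuid
when seen from the parent, while a mount owned by the parent remains usable
from the child.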

UBUNTU: SAUCE: block_dev: Check permissions towards block device inode when mounting · 9a472674

Seth Forshee authored Oct 07, 2015

Unprivileged users should not be able to mount block devices when
they lack sufficient privileges towards the block device inode.
Update blkdev_get_by_path() to validate that the user has the
required access to the inode at the specified path. The check
will be skipped for CAP_SYS_ADMIN, so privileged mounts will
continue working as before.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

9a472674

UBUNTU: SAUCE: block_dev: Support checking inode permissions in lookup_bdev() · 4a959886

Seth Forshee authored Jul 31, 2015

When looking up a block device by path no permission check is
done to verify that the user has access to the block device inode
at the specified path. In some cases it may be necessary to
check permissions towards the inode, such as allowing
unprivileged users to mount block devices in user namespaces.

Add an argument to lookup_bdev() to optionally perform this
permission check. A value of 0 skips the permission check and
behaves the same as before. A non-zero value specifies the mask
of access rights required towards the inode at the specified
path. The check is always skipped if the user has CAP_SYS_ADMIN.

All callers of lookup_bdev() currently pass a mask of 0, so this
patch results in no functional change. Subsequent patches will
add permission checks where appropriate.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

4a959886
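
The mask semantics described in 4a959886 can be sketched as follows. This is a
simplified model under stated assumptions: bdev_access_allowed() is a
hypothetical name, the MAY_* constants are illustrative bit values, and the
real check operates on an inode rather than on a precomputed set of granted
rights.

```python
MAY_WRITE = 0x2  # illustrative access bits for this sketch
MAY_READ = 0x4


def bdev_access_allowed(mask: int, granted: int, has_cap_sys_admin: bool) -> bool:
    """Model of the optional permission check.

    mask              -- access rights required; 0 means "skip the check"
    granted           -- access rights the caller holds on the inode
    has_cap_sys_admin -- privileged callers always bypass the check
    """
    if mask == 0 or has_cap_sys_admin:
        return True
    # Every requested right must be present.
    return (granted & mask) == mask
```

With a mask of 0 the function always allows access, which corresponds to the
"no functional change" behaviour of existing callers.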

UBUNTU: SAUCE: Smack: Add support for unprivileged mounts from user namespaces · 7effec54

Seth Forshee authored Sep 23, 2015

Security labels from unprivileged mounts cannot be trusted.
Ideally for these mounts we would assign the objects in the
filesystem the same label as the inode for the backing device
passed to mount. Unfortunately it's currently impossible to
determine which inode this is from the LSM mount hooks, so we
settle for the label of the process doing the mount.

This label is assigned to s_root, and also to smk_default to
ensure that new inodes receive this label. The transmute property
is also set on s_root to make this behavior more explicit, even
though it is technically not necessary.

If a filesystem has existing security labels, access to inodes is
permitted if the label is the same as smk_root, otherwise access
is denied. The SMACK64EXEC xattr is completely ignored.

Explicit setting of security labels continues to require
CAP_MAC_ADMIN in init_user_ns.

Altogether, this ensures that filesystem objects are not
accessible to subjects which cannot already access the backing
store, that MAC is not violated for any objects in the filesystem
which are already labeled, and that a user cannot use an
unprivileged mount to gain elevated MAC privileges.

sysfs, tmpfs, and ramfs are already mountable from user
namespaces and support security labels. We can't rule out the
possibility that these filesystems may already be used in mounts
from user namespaces with security labels set from the init
namespace, so failing to trust labels in these filesystems may
introduce regressions. It is safe to trust labels from these
filesystems, since the unprivileged user does not control the
backing store and thus cannot supply security labels, so an
explicit exception is made to trust labels from these
filesystems.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

7effec54
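
The label-trust policy described in 7effec54 can be modelled roughly as below.
This is a sketch of the rules as stated in the message, not Smack's
implementation; the function names are hypothetical, and returning None to
mean "defer to the normal Smack rules" is a convention invented for the
illustration.

```python
from typing import Optional

# Filesystems whose labels remain trusted, per the commit message.
TRUSTED_FS_TYPES = {"sysfs", "tmpfs", "ramfs"}


def labels_trusted(fs_type: str, unprivileged_mount: bool) -> bool:
    # Labels are distrusted only for unprivileged mounts of filesystems
    # outside the explicit exception list.
    return (not unprivileged_mount) or fs_type in TRUSTED_FS_TYPES


def untrusted_access(fs_type: str, unprivileged_mount: bool,
                     inode_label: str, smk_root: str) -> Optional[bool]:
    """Return None when labels are trusted (normal rules apply);
    otherwise allow access only if the inode label equals smk_root."""
    if labels_trusted(fs_type, unprivileged_mount):
        return None
    return inode_label == smk_root
```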

UBUNTU: SAUCE: fs: Limit file caps to the user namespace of the super block · 4147df9e

Seth Forshee authored Sep 23, 2015

Capability sets attached to files must be ignored except in the
user namespaces where the mounter is privileged, i.e. s_user_ns
and its descendants. Otherwise a vector exists for gaining
privileges in namespaces where a user is not already privileged.

Add a new helper function, in_user_ns(), to test whether a user
namespace is the same as or a descendant of another namespace.
Use this helper to determine whether a file's capability set
should be applied to the caps constructed during exec.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

4147df9e
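
The in_user_ns() test described in 4147df9e amounts to walking a parent chain.
A minimal sketch, assuming a hypothetical UserNS class with a parent link; the
file_caps_apply() wrapper is likewise an illustration rather than a kernel
function.

```python
from typing import Optional


class UserNS:
    """Hypothetical stand-in for a user namespace."""

    def __init__(self, parent: Optional["UserNS"] = None):
        self.parent = parent


def in_user_ns(ns: Optional[UserNS], target_ns: UserNS) -> bool:
    """True if ns is target_ns or a descendant of target_ns."""
    while ns is not None:
        if ns is target_ns:
            return True
        ns = ns.parent
    return False


def file_caps_apply(current_user_ns: UserNS, s_user_ns: UserNS) -> bool:
    # File capabilities are honoured only in s_user_ns and its descendants.
    return in_user_ns(current_user_ns, s_user_ns)
```

A file on a superblock mounted from a child namespace therefore grants no
capabilities to a task running in the parent or in a sibling namespace.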

UBUNTU: SAUCE: fs: Add user namesapace member to struct super_block · b6500fda

Seth Forshee authored Sep 23, 2015

Initially this will be used to eliminate the implicit MNT_NODEV
flag for mounts from user namespaces. In the future it will also
be used for translating ids and checking capabilities for
filesystems mounted from user namespaces.

s_user_ns is initialized in alloc_super() and is generally set to
current_user_ns(). To avoid security and corruption issues, two
additional mount checks are also added:

 - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
   in current_user_ns().

 - sget() will fail with EBUSY when the filesystem it's looking
   for is already mounted from another user namespace.

proc requires some special handling. The user namespace of
current isn't appropriate when forking as a result of clone(2)
with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
namespace of the parent and make proc unmountable in the new user
namespace. Instead, the user namespace which owns the new pid
namespace is used. sget_userns() is added to allow passing in
a namespace other than that of current, and sget becomes a
wrapper around sget_userns() which passes current_user_ns().

Changes to original version of this patch
  * Documented @user_ns in sget_userns, alloc_super and fs.h
  * Kept a blank line in fs.h
  * Removed unnecessary include of user_namespace.h from fs.h
  * Tweaked the location of get_user_ns and put_user_ns so
    the security modules can (if they wish) depend on it.
  -- EWB
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b6500fda

NTB: Add support for AMD PCI-Express Non-Transparent Bridge · ca175ae9

Xiangliang Yu authored Jan 21, 2016

BugLink: http://bugs.launchpad.net/bugs/1542071

This adds support for AMD's PCI-Express Non-Transparent Bridge
(NTB) device on the Zeppelin platform. The driver connects to the
standard NTB sub-system interface, with modification to add hooks
for power management in a separate patch. The AMD NTB device has 3
memory windows, 16 doorbells, 16 scratch-pad registers, and supports
up to 16 PCIe lanes running at Gen3 speeds.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
(cherry picked from commit a1b36958)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ca175ae9

UBUNTU: [Config] CONFIG_NTB_AMD=m · b28a28de

Tim Gardner authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1542071

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b28a28de

UBUNTU: SAUCE: storvsc: use small sg_tablesize on x86 · 5f53c4e9

Joseph Salisbury authored Oct 15, 2015

BugLink: http://bugs.launchpad.net/bugs/1495983

OriginalAuthor: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Brad Figg <brad.figg@canonical.com>

5f53c4e9

UBUNTU: [Config] CONFIG_ARMV8_DEPRECATED=y · 464e0afc

Tim Gardner authored Feb 15, 2016

BugLink: http://bugs.launchpad.net/bugs/1545542

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

464e0afc

ahci_xgene: Implement the workaround to fix the missing of the edge interrupt... · 1cab7985

Suman Tripathi authored Feb 06, 2016

ahci_xgene: Implement the workaround to fix the missing of the edge interrupt for the HOST_IRQ_STAT.

Due to H/W errata, the HOST_IRQ_STAT register misses the edge interrupt
when the clearing of the HOST_IRQ_STAT register and the hardware's
reporting of the PORT_IRQ_STAT register happen in the same clock cycle.
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from linux-next commit 32aea268)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

1cab7985

ata: Remove the AHCI_HFLAG_EDGE_IRQ support from libahci. · b293ac01

Suman Tripathi authored Feb 06, 2016

The flexibility to override the irq handlers in the LLDs is already
present, so controllers implementing an edge trigger latch can
implement their own interrupt handler inside the driver.  This patch
removes the AHCI_HFLAG_EDGE_IRQ support from libahci and moves edge
irq handling to ahci_xgene.

tj: Minor update to description.
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from linux-next commit d867b95f)
[ dannf: offset adjustments ]
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b293ac01

libahci: Implement the capability to override the generic ahci interrupt handler. · f8aa59f3

Suman Tripathi authored Feb 06, 2016

This patch implements the capability to override the generic AHCI
interrupt handler so that specific ahci drivers can implement their
own custom interrupt handler routines.  It also exports
ahci_handle_port_intr so that custom irq_handler implementations can
use it.

tj: s/ahci_irq_handler/irq_handler/ and updated description.
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from linux-next commit f070d671)
[ dannf: backported to v4.4 ]
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f8aa59f3

megaraid: Fix possible NULL pointer deference in mraid_mm_ioctl · b90e5b1b

Nicholas Krause authored Jan 05, 2016

This adds the needed check after the call to the function
mraid_mm_alloc_kioc in order to make sure that this function has not
returned NULL and therefore makes sure we do not dereference a NULL
pointer if one is returned by mraid_mm_alloc_kioc.  Furthermore, add
needed comments explaining that this function call can return NULL if
the list head is empty for the pointer passed in order to allow future
users to understand this required pointer check.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7296f62f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b90e5b1b

UBUNTU: Start new release · ecf10133

Tim Gardner authored Feb 12, 2016

Ignore: yes
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ecf10133

UBUNTU: Ubuntu-4.4.0-5.20 · f29a3f86

Tim Gardner authored Feb 11, 2016

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f29a3f86

s390/cio: update measurement characteristics · fe0b3e65

Sebastian Ott authored Jan 25, 2016

BugLink: http://bugs.launchpad.net/bugs/1541534

Per channel path measurement characteristics are obtained during channel
path registration. However if some properties of a channel path change
we don't update the measurement characteristics.

Make sure to update the characteristics when we change the properties of
a channel path or receive a notification from FW about such a change.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 9f3d6d7a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

fe0b3e65

s390/cio: ensure consistent measurement state · a4503e66

Sebastian Ott authored Jan 25, 2016

BugLink: http://bugs.launchpad.net/bugs/1541534

Make sure that in all cases where we could not obtain measurement
characteristics the associated fields are set to invalid values.

Note: without this change the "shared" capability of a channel path
for which we could not obtain the measurement characteristics was
incorrectly displayed as 0 (not shared). We will now correctly
report "unknown" in this case.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 61f0bfcf)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

a4503e66

s390/cio: fix measurement characteristics memleak · d4c21c8b

Sebastian Ott authored Jan 25, 2016

BugLink: http://bugs.launchpad.net/bugs/1541534

Measurement characteristics are allocated during channel path
registration but not freed during deregistration. Fix this by
embedding these characteristics inside struct channel_path.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 0d9bfe91)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

d4c21c8b

qeth: initialize net_device with carrier off · 9afc6703

Ursula Braun authored Dec 11, 2015

BugLink: http://bugs.launchpad.net/bugs/1541907

/sys/class/net/<interface>/operstate for an active qeth network
interface often shows "unknown", which translates to "state UNKNOWN"
in the output of "ip link show". It is caused by a missing initialization
of the __LINK_STATE_NOCARRIER bit in the net_device state field.
This patch adds a netif_carrier_off() invocation when creating the
net_device for a qeth device.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reference-ID: Bugzilla 133209
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e5ebe632)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

9afc6703

UBUNTU: SAUCE: (noup) Update spl to 0.6.5.4-0ubuntu2, zfs to 0.6.5.4-0ubuntu2 · 3258f645

Tim Gardner authored Feb 10, 2016

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

3258f645

mm: CONFIG_NR_ZONES_EXTENDED · f671c3e6

Dan Williams authored Feb 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1534647

ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new mm
zones that are bumping up against the current maximum limit of 4 zones,
i.e.  2 bits in page->flags.  When adding a zone this equation still needs
to be satisfied:

    SECTIONS_WIDTH + ZONES_WIDTH + NODES_SHIFT + LAST_CPUPID_SHIFT
	  <= BITS_PER_LONG - NR_PAGEFLAGS

ZONE_DEVICE currently tries to satisfy this equation by requiring that
ZONE_DMA be disabled, but this is untenable given generic kernels want to
support ZONE_DEVICE and ZONE_DMA simultaneously.  ZONE_CMA would like to
increase the amount of memory covered per section, but that limits the
minimum granularity at which consecutive memory ranges can be added via
devm_memremap_pages().

The trade-off of what is acceptable to sacrifice depends heavily on the
platform.  For example, ZONE_CMA is targeted for 32-bit platforms where
page->flags is constrained, but those platforms likely do not care about
the minimum granularity of memory hotplug.  A big iron machine with 1024
numa nodes can likely sacrifice ZONE_DMA where a general purpose
distribution kernel can not.

CONFIG_NR_ZONES_EXTENDED is a configuration symbol that gets selected when
the number of configured zones exceeds 4.  It documents the configuration
symbols and definitions that get modified when ZONES_WIDTH is greater than
2.

For now, it steals a bit from NODES_SHIFT.  Later on it can be used to
document the definitions that get modified when a 32-bit configuration
wants more zone bits.

Note that GFP_ZONE_TABLE poses an interesting constraint since
include/linux/gfp.h gets included by the 32-bit portion of a 64-bit build.
We need to be careful to only build the table for zones that have a
corresponding gfp_t flag.  GFP_ZONES_SHIFT is introduced for this purpose.
This patch does not attempt to solve the problem of adding a new zone
that also has a corresponding GFP_ flag.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931
Fixes: 033fbae9 ("mm: ZONE_DEVICE for "device memory"")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Mark <markk@clara.co.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next commit 27ffb3827ac71a46e8d52fc7ed7302d33a619d6c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f671c3e6
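
The bit-budget constraint quoted in f671c3e6 can be checked with simple
arithmetic. The sketch below is illustrative only: the helper names are
hypothetical, and the widths used in any call are made-up inputs rather than
values taken from a real kernel configuration.

```python
def zones_width(nr_zones: int) -> int:
    """Bits needed to encode nr_zones distinct zone indices
    (2 bits cover up to 4 zones, 3 bits cover up to 8)."""
    return max(nr_zones - 1, 0).bit_length()


def flags_layout_fits(sections_width: int, zones_width_bits: int,
                      nodes_shift: int, last_cpupid_shift: int,
                      bits_per_long: int, nr_pageflags: int) -> bool:
    # SECTIONS_WIDTH + ZONES_WIDTH + NODES_SHIFT + LAST_CPUPID_SHIFT
    #     <= BITS_PER_LONG - NR_PAGEFLAGS
    used = sections_width + zones_width_bits + nodes_shift + last_cpupid_shift
    return used <= bits_per_long - nr_pageflags
```

For example, with hypothetical values of 64 bits per long, 25 page flags, no
section bits, and 27 cpupid bits, five zones (3 bits) fit alongside a
NODES_SHIFT of 9 but not 10, which is the kind of trade-off the message
describes as stealing a bit from NODES_SHIFT.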

UBUNTU: [Config] CONFIG_ZONE_DMA=y · f192f00a

Tim Gardner authored Feb 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1534647

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f192f00a

perf kvm/powerpc: Add support for HCALL reasons · e6e8b816

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

Powerpc provides hcall events that also provide insights into guest
behaviour. Enhance perf kvm stat to record and analyze hcall events.

 - To trace hcall events :
  perf kvm stat record

 - To show the results :
  perf kvm stat report --event=hcall

The result shows the number of hypervisor calls from the guest grouped
by their respective reasons displayed with the frequency.

This patch makes use of two additional tracepoints
"kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". To map the hcall
codes to their respective names, it needs a mapping. Such mapping is
added in this patch in book3s_hcalls.h.

A sample output :
 # pgrep qemu
19378
60515

2 VMs running.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515 --event=hcall

Analyze events for all VMs, all VCPUs:

    HCALL-EVENT Samples Samples% Time% MinTime MaxTime  AvgTime

          H_IPI     822  66.08% 88.10% 0.63us  11.38us 2.05us (+- 1.42%)
     H_SEND_CRQ     144  11.58%  3.77% 0.41us   0.88us 0.50us (+- 1.47%)
   H_VIO_SIGNAL     118   9.49%  2.86% 0.37us   0.83us 0.47us (+- 1.43%)
H_PUT_TERM_CHAR      76   6.11%  2.07% 0.37us   0.90us 0.52us (+- 2.43%)
H_GET_TERM_CHAR      74   5.95%  2.23% 0.37us   1.70us 0.58us (+- 4.77%)
         H_RTAS       6   0.48%  0.85% 1.10us   9.25us 2.70us (+-48.57%)
      H_PERFMON       4   0.32%  0.12% 0.41us   0.96us 0.59us (+-20.92%)

Total Samples:1244, Total events handled time:1916.69us.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-4-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 78e6c39b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

e6e8b816
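
The Samples and Samples% columns in the report of e6e8b816 are a grouping of
events by hcall reason. The sketch below reproduces only that arithmetic; it
is not how perf is implemented, and sample_shares() is a hypothetical helper
that takes a plain list of reason names as input.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sample_shares(events: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Group events by reason and return (reason, samples, percent)
    rows ordered from most to least frequent."""
    counts = Counter(events)
    total = sum(counts.values())
    return [(name, n, round(100.0 * n / total, 2))
            for name, n in counts.most_common()]
```

Fed with the sample counts from the table above (822 H_IPI events out of 1244
in total, and so on), the percentages agree with the Samples% column.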

perf kvm/powerpc: Port perf kvm stat to powerpc · c66c9b37

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

perf kvm can be used to analyze guest exit reasons. This support already
exists in x86. Hence, porting it to powerpc.

 - To trace KVM events :
  perf kvm stat record
  If many guests are running, we can track for a specific guest by using
  --pid as in : perf kvm stat record --pid <pid>

 - To see the results :
  perf kvm stat report

The result shows the number of exits (from the guest context to
host/hypervisor context) grouped by their respective exit reasons with
their frequency.

Since different powerpc machines have different KVM tracepoints, this
patch discovers the available tracepoints dynamically and accordingly
looks for them. If any single tracepoint is not present, this support
won't be enabled for reporting. To record, this will fail if any of the
events we are looking to record isn't available.  Right now, it's only
supported on PowerPC Book3S_HV architectures.

To analyze the different exits, group them and present them (in a
slightly descriptive way) to the user, we need a mapping between the
"exit code" (dumped in the kvm_guest_exit tracepoint data) and its
related Interrupt vector description (exit reason). This patch adds this
mapping in book3s_hv_exits.h.

It records on two available KVM tracepoints for book3s_hv:

"kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter".

Here is a sample output:
 # pgrep qemu
19378
60515

2 Guests are running on the host.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515

Analyze events for pid(s) 60515, all VCPUs:

     VM-EXIT Samples Samples% Time% MinTime    MaxTime  Avg time

       SYSCALL  9141  63.67%  7.49% 1.26us   5782.39us    9.87us (+- 6.46%)
H_DATA_STORAGE  4114  28.66%  5.07% 1.72us   4597.68us   14.84us (+-20.06%)
HV_DECREMENTER   418   2.91%  4.26% 0.70us  30002.22us  122.58us (+-70.29%)
      EXTERNAL   392   2.73%  0.06% 0.64us    104.10us    1.94us (+-18.83%)
RETURN_TO_HOST   287   2.00% 83.11% 1.53us 124240.15us 3486.52us (+-16.81%)
H_INST_STORAGE     5   0.03%  0.00% 1.88us      3.73us    2.39us (+-14.20%)

Total Samples:14357, Total events handled time:1203918.42us.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-3-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 066d3593)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

c66c9b37

perf kvm/{x86,s390}: Remove const from kvm_events_tp · 83f70bbc

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

This patch removes the "const" qualifier from kvm_events_tp declaration
to account for the fact that some architectures may need to update this
variable dynamically. For instance, powerpc will need to update this
variable dynamically depending on the machine type.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-2-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 48deaa74)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

83f70bbc

perf kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h · f48df91f

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

It's better to remove the dependency on uapi/kvm_perf.h to allow dynamic
discovery of kvm events (if it's needed). To do this, some extern
variables have been introduced with which we can keep the generic
functions generic.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-1-git-send-email-hemant@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 162607ea)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f48df91f

UBUNTU: SAUCE: nbd: ratelimit error msgs after socket close · 28c8fd86

Dan Streetman authored Feb 08, 2016

BugLink: http://bugs.launchpad.net/bugs/1505564

Make the "Attempted send on closed socket" error messages generated in
nbd_request_handler() ratelimited.

When the nbd socket is shutdown, the nbd_request_handler() function emits
an error message for every request remaining in its queue.  If the queue
is large, this will spam a large amount of messages to the log.  There's
no need for a separate error message for each request, so this patch
ratelimits it.

In the specific case this was found, the system was virtual and the error
messages were logged to the serial port, which overwhelmed it.

Fixes: 4d48a542 ("nbd: fix I/O hang on disconnected nbds")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
(cherry-picked from commit da6ccaaa git://git.pengutronix.de/git/mpa/linux-nbd.git)
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

28c8fd86
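
The effect of ratelimiting described in 28c8fd86 can be shown with a small
interval-and-burst limiter. This is a generic sketch, not the kernel's
ratelimit code: the RateLimiter class, its parameters, and the explicit "now"
argument are assumptions made so the example is deterministic.

```python
class RateLimiter:
    """Allow at most `burst` messages per `interval` seconds."""

    def __init__(self, interval: float, burst: int):
        self.interval = interval
        self.burst = burst
        self.window_start = None  # start time of the current window
        self.printed = 0          # messages emitted in this window
        self.missed = 0           # messages suppressed so far

    def allow(self, now: float) -> bool:
        # Open a new window once the previous one has expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False
```

With an interval of 5 seconds and a burst of 10, a queue of 1000 failed
requests drained at the same instant would log 10 messages and suppress the
remaining 990.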

net/mlx5e: Assign random MAC address if needed · f70ccc2c

Saeed Mahameed authored Dec 10, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Under SRIOV there might be a case where VFs are loaded
without pre-assigned MAC address. In this case, the VF
will randomize its own MAC.  This will address the case
of administrator not assigning MAC to the VF through
the PF OS APIs and keep udev happy.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 108805fc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f70ccc2c

net/mlx5: Fix query E-Switch capabilities · 061ac1e1

Saeed Mahameed authored Dec 10, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

E-Switch capabilities should be queried only if the E-Switch flow table
is supported, and not merely when the device is a vport group manager.

Fixes: d6666753 ("net/mlx5: E-Switch, Introduce HCA cap and E-Switch vport context")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9bd0a185)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

061ac1e1

net/mlx5e: Add support for SR-IOV ndos · d527bce8

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Implement and enable SR-IOV ndos to manage SR-IOV configuration via
netdev netlink API.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 66e49ded)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

d527bce8

net/mlx5: E-Switch, Introduce get vf statistics · 44efc528

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Add support to get VF statistics using query vport
counter command.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3b751a2a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

44efc528

net/mlx5: E-Switch, Introduce set vport vlan (VST mode) · b77d0399

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Add query and modify functions to control client vlan and qos
stripping or insertion, in E-Switch vport contexts.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9e7ea352)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b77d0399

net/mlx5: E-Switch, Introduce HCA cap and E-Switch vport context · 2bc72190

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

E-Switch vport context, unlike NIC vport context, is managed by the
E-Switch manager or vport_group_manager and not by the NIC(VF) driver.

The E-Switch manager can access (read/modify) the E-Switch context of
any of its vports.

Currently E-Switch vport context includes only client and server
vlan insertion and stripping data (for later support of VST mode).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d6666753)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

2bc72190

net/mlx5: E-Switch, Introduce Vport administration functions · a5c64acf

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Implement set VF mac/link state and query VF config
to be used later in netdev VF ndos or any other management API.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 77256579)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

a5c64acf

net/mlx5: E-Switch, Add SR-IOV (FDB) support · 42eb2054

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Enabling E-Switch SRIOV for nvfs+1 vports.

Create E-Switch FDB for L2 UC/MC mac steering between VFs/PF and
external vport (Uplink).

FDB contains forwarding rules such as:
	UC MAC0 -> vport0(PF).
	UC MAC1 -> vport1.
	UC MAC2 -> vport2.
	MC MACX -> vport0, vport2, Uplink.
	MC MACY -> vport1, Uplink.

For unmatched traffic FDB has the following default rules:
	Unmatched Traffic (src vport != Uplink) -> Uplink.
	Unmatched Traffic (src vport == Uplink) -> vport0(PF).
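
The forwarding rules above can be illustrated with a minimal user-space
sketch. This is not the driver's code: struct fdb_rule, fdb_lookup(),
VPORT_BIT() and the VPORT_UPLINK index are hypothetical names chosen for
illustration, and the real FDB is a hardware flow table rather than a
linear array.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define VPORT_PF      0u           /* vport0 is the PF */
#define VPORT_UPLINK  31u          /* hypothetical index for the external uplink vport */
#define VPORT_BIT(v)  (1u << (v))  /* one bit per destination vport */

/* Hypothetical software model of one FDB entry. */
struct fdb_rule {
	uint8_t  mac[6];     /* destination MAC to match (UC or MC) */
	uint32_t dest_mask;  /* bitmask of destination vports */
};

/*
 * Return the destination vport mask for a frame.
 * A matching UC/MC rule wins; otherwise the default rules apply:
 * traffic not coming from the uplink goes to the uplink, and
 * traffic coming from the uplink goes to vport0 (PF).
 */
static uint32_t fdb_lookup(const struct fdb_rule *rules, size_t n,
			   const uint8_t dmac[6], unsigned int src_vport)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(rules[i].mac, dmac, 6) == 0)
			return rules[i].dest_mask;

	return src_vport != VPORT_UPLINK ? VPORT_BIT(VPORT_UPLINK)
					 : VPORT_BIT(VPORT_PF);
}
```

A multicast rule such as "MC MACX -> vport0, vport2, Uplink" would be
modelled here as VPORT_BIT(0) | VPORT_BIT(2) | VPORT_BIT(VPORT_UPLINK).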

FDB rules population:
Each NIC vport (VF) notifies the E-Switch manager of changes to its
UC/MC vport context via the modify vport context command. The command
is translated into an event handled by the E-Switch manager (PF),
which updates the FDB table accordingly.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 81848731)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

42eb2054

net/mlx5: E-Switch, Introduce FDB hardware capabilities · 8519cf0c

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Define the hardware structures and capabilities needed
for E-Switch FDB flow tables and read them on driver load.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 495716b1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

8519cf0c

net/mlx5: Introducing E-Switch and l2 table · 79f6e12d

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

E-Switch is the software entity that represents and manages ConnectX4
inter-HCA ethernet l2 switching.

E-Switch has its own Virtual Ports, each Vport/vNIC/VF can be
connected to the device through a vport of an e-switch.

Each e-switch is managed by one vNIC identified by
HCA_CAP.vport_group_manager (usually it is the PF/vport[0]),
and its main responsibility is to forward each packet to the
right vport.

e-Switch needs to manage its own l2-table and FDB tables.

L2 table is a flow table that is managed by FW. It is needed for
Multi-host (Multi PF) configuration for inter HCA switching between
PFs.

FDB table is a flow table that is totally managed by e-Switch driver.
Its main responsibility is to switch packets between e-Switch internal
vports and the uplink vport that belong to the same e-Switch.

This patch introduces only e-Switch l2 table management. FDB management
will come later, when ethernet SRIOV/VFs are enabled.

Preparation for ethernet sriov and l2 table management.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 073bb189)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

79f6e12d

net/mlx5e: Write vlan list into vport context · b4768b41

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Each Vport/vNIC must notify the underlying e-Switch layer
of vlan table changes in order to update SR-IOV FDB tables.

We do that at vlan_rx_add_vid and vlan_rx_kill_vid ndos.
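
As a rough illustration of that flow, the sketch below keeps a bitmap of
active VLAN IDs and rebuilds the full list on every add/kill, which is
the list a driver would then write into its vport context. All names
(struct vlan_state, vlan_collect(), vlan_add_vid(), vlan_kill_vid()) are
hypothetical, and the actual firmware command is left out.

```c
#include <stdint.h>
#include <stddef.h>

#define VLAN_N_VID 4096  /* 12-bit VLAN ID space */

/* Hypothetical per-netdev state: one bit per active VLAN ID. */
struct vlan_state {
	uint64_t active[VLAN_N_VID / 64];
};

/* Copy the active VLAN IDs into list[] in ascending order; returns the count (at most max). */
static int vlan_collect(const struct vlan_state *s, uint16_t *list, size_t max)
{
	size_t n = 0;

	for (unsigned int vid = 0; vid < VLAN_N_VID && n < max; vid++)
		if (s->active[vid / 64] & (UINT64_C(1) << (vid % 64)))
			list[n++] = (uint16_t)vid;
	return (int)n;
}

/* Mark vid active and rebuild the list to be written to the vport context. */
static int vlan_add_vid(struct vlan_state *s, uint16_t vid,
			uint16_t *list, size_t max)
{
	if (vid >= VLAN_N_VID)
		return -1;
	s->active[vid / 64] |= UINT64_C(1) << (vid % 64);
	return vlan_collect(s, list, max);
}

/* Mark vid inactive and rebuild the list to be written to the vport context. */
static int vlan_kill_vid(struct vlan_state *s, uint16_t vid,
			 uint16_t *list, size_t max)
{
	if (vid >= VLAN_N_VID)
		return -1;
	s->active[vid / 64] &= ~(UINT64_C(1) << (vid % 64));
	return vlan_collect(s, list, max);
}
```

Rewriting the whole list on each change keeps the e-Switch view
consistent with the netdev without tracking individual deltas; whether
the driver does exactly this is an assumption of the sketch.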
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit aad9e6e4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b4768b41

net/mlx5e: Write UC/MC list and promisc mode into vport context · ada09a90

Saeed Mahameed authored Dec 01, 2015

BugLink: http://bugs.launchpad.net/bugs/1540435

Each Vport/vNIC must notify the underlying e-Switch layer
of UC/MC list and promisc mode updates, in order to update
l2 tables and SR-IOV FDB tables.

We do that at set_rx_mode ndo.

Preparation for ethernet-SRIOV and l2 table management.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5e55da1d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ada09a90