Commits · 5d07c96f100cca4726661127eb100be4a7feff71 · Kirill Smelkov / linux

06 Apr, 2016 6 commits

UBUNTU: SAUCE: fuse: Restrict allow_other to the superblock's namespace or a descendant · 5d07c96f

Seth Forshee authored Oct 02, 2014

Unprivileged users are normally restricted from mounting with the
allow_other option by system policy, but this could be bypassed
for a mount done with user namespace root permissions. In such
cases allow_other should not allow users outside the userns
to access the mount as doing so would give the unprivileged user
the ability to manipulate processes it would otherwise be unable
to manipulate. Restrict allow_other to apply to users in the same
userns used at mount or a descendant of that namespace.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

5d07c96f

UBUNTU: SAUCE: fuse: Support fuse filesystems outside of init_user_ns · e5f2c764

Seth Forshee authored Jun 26, 2014

In order to support mounts from namespaces other than
init_user_ns, fuse must translate uids and gids to/from the
userns of the process servicing requests on /dev/fuse. This
patch does that, with a couple of restrictions on the namespace:

 - The userns for the fuse connection is fixed to the namespace
   from which /dev/fuse is opened.

 - The namespace must be the same as s_user_ns.

These restrictions simplify the implementation by avoiding the
need to pass around userns references and by allowing fuse to
rely on the checks in inode_change_ok for ownership changes.
Either restriction could be relaxed in the future if needed.

For cuse the namespace used for the connection is also simply
current_user_ns() at the time /dev/cuse is opened.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

e5f2c764

UBUNTU: SAUCE: fuse: Add support for pid namespaces · 9b869708

Seth Forshee authored Jul 02, 2014

If the userspace process servicing fuse requests is running in
a pid namespace then pids passed via the fuse fd need to be
translated relative to that namespace. Capture the pid namespace
in use when the filesystem is mounted and use this for pid
translation.

Since no use case currently exists for changing namespaces all
translations are done relative to the pid namespace in use when
/dev/fuse is opened. Mounting or /dev/fuse IO from another
namespace will return errors.

Requests from processes whose pid cannot be translated into the
target namespace are not permitted, except for requests
allocated via fuse_get_req_nofail_nopages. For no-fail requests
in.h.pid will be 0 if the pid translation fails.

File locking changes based on previous work done by Eric
Biederman.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

9b869708

UBUNTU: SAUCE: capabilities: Allow privileged user in s_user_ns to set security.* xattrs · e1804ed9

Seth Forshee authored Dec 11, 2014

A privileged user in s_user_ns will generally have the ability to
manipulate the backing store and insert security.* xattrs into
the filesystem directly. Therefore the kernel must be prepared to
handle these xattrs from unprivileged mounts, and it makes little
sense for commoncap to prevent writing these xattrs to the
filesystem. The capability and LSM code have already been updated
to appropriately handle xattrs from unprivileged mounts, so it
is safe to loosen this restriction on setting xattrs.

The exception to this logic is that writing xattrs to a mounted
filesystem may also cause the LSM inode_post_setxattr or
inode_setsecurity callbacks to be invoked. SELinux will deny the
xattr update by virtue of applying mountpoint labeling to
unprivileged userns mounts, and Smack will deny the writes for
any user without global CAP_MAC_ADMIN, so loosening the
capability check in commoncap is safe in this respect as well.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

e1804ed9

UBUNTU: SAUCE: fs: Allow superblock owner to access do_remount_sb() · 05b459a0

Seth Forshee authored Jan 22, 2015

Superblock level remounts are currently restricted to global
CAP_SYS_ADMIN, as is the path for changing the root mount to
read only on umount. Loosen both of these permission checks to
also allow CAP_SYS_ADMIN in any namespace which is privileged
towards the userns which originally mounted the filesystem.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

05b459a0

UBUNTU: SAUCE: fs: Don't remove suid for CAP_FSETID in s_user_ns · b50099a2

Seth Forshee authored Feb 15, 2015

Expand the check in should_remove_suid() to keep privileges for
CAP_FSETID in s_user_ns rather than init_user_ns.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b50099a2

29 Feb, 2016 34 commits

UBUNTU: SAUCE: fs: Update posix_acl support to handle user namespace mounts · 7efdc167

Seth Forshee authored Feb 02, 2015

ids in on-disk ACLs should be converted to s_user_ns instead of
init_user_ns as is done now. This introduces the possibility for
id mappings to fail, and when this happens syscalls will return
EOVERFLOW.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

7efdc167

UBUNTU: SAUCE: fs: Refuse uid/gid changes which don't map into s_user_ns · fe50b8ac

Seth Forshee authored Nov 19, 2014

Add checks to inode_change_ok to verify that uid and gid changes
will map into the superblock's user namespace. If they do not
fail with -EOVERFLOW. This cannot be overriden with ATTR_FORCE.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

fe50b8ac

UBUNTU: SAUCE: cred: Reject inodes with invalid ids in set_create_file_as() · 6f41f62c

Seth Forshee authored Feb 05, 2015

Using INVALID_[UG]ID for the LSM file creation context doesn't
make sense, so return an error if the inode passed to
set_create_file_as() has an invalid id.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

6f41f62c

UBUNTU: SAUCE: fs: Check for invalid i_uid in may_follow_link() · 3289bfab

Seth Forshee authored Aug 26, 2014

Filesystem uids which don't map into a user namespace may result
in inode->i_uid being INVALID_UID. A symlink and its parent
could have different owners in the filesystem can both get
mapped to INVALID_UID, which may result in following a symlink
when this would not have otherwise been permitted when protected
symlinks are enabled.

Add a new helper function, uid_valid_eq(), and use this to
validate that the ids in may_follow_link() are both equal and
valid. Also add an equivalent helper for gids, which is
currently unused.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

3289bfab

UBUNTU: SAUCE: Smack: Handle labels consistently in untrusted mounts · bdeb24e7

Seth Forshee authored Sep 30, 2015

The SMACK64, SMACK64EXEC, and SMACK64MMAP labels are all handled
differently in untrusted mounts. This is confusing and
potentically problematic. Change this to handle them all the same
way that SMACK64 is currently handled; that is, read the label
from disk and check it at use time. For SMACK64 and SMACK64MMAP
access is denied if the label does not match smk_root. To be
consistent with suid, a SMACK64EXEC label which does not match
smk_root will still allow execution of the file but will not run
with the label supplied in the xattr.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

bdeb24e7

UBUNTU: SAUCE: userns: Replace in_userns with current_in_userns · 2c95625c

Seth Forshee authored Sep 29, 2015

All current callers of in_userns pass current_user_ns as the
first argument. Simplify by replacing in_userns with
current_in_userns which checks whether current_user_ns is in the
namespace supplied as an argument.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

2c95625c

UBUNTU: SAUCE: selinux: Add support for unprivileged mounts from user namespaces · 7baf165b

Seth Forshee authored Jul 29, 2015

Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.

Pieced together from code snippets provided by Stephen Smalley.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

7baf165b

UBUNTU: SAUCE: fs: Treat foreign mounts as nosuid · 895b5840

Andy Lutomirski authored Oct 14, 2014

If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

895b5840

UBUNTU: SAUCE: block_dev: Check permissions towards block device inode when mounting · 9a472674

Seth Forshee authored Oct 07, 2015

Unprivileged users should not be able to mount block devices when
they lack sufficient privileges towards the block device inode.
Update blkdev_get_by_path() to validate that the user has the
required access to the inode at the specified path. The check
will be skipped for CAP_SYS_ADMIN, so privileged mounts will
continue working as before.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

9a472674

UBUNTU: SAUCE: block_dev: Support checking inode permissions in lookup_bdev() · 4a959886

Seth Forshee authored Jul 31, 2015

When looking up a block device by path no permission check is
done to verify that the user has access to the block device inode
at the specified path. In some cases it may be necessary to
check permissions towards the inode, such as allowing
unprivileged users to mount block devices in user namespaces.

Add an argument to lookup_bdev() to optionally perform this
permission check. A value of 0 skips the permission check and
behaves the same as before. A non-zero value specifies the mask
of access rights required towards the inode at the specified
path. The check is always skipped if the user has CAP_SYS_ADMIN.

All callers of lookup_bdev() currently pass a mask of 0, so this
patch results in no functional change. Subsequent patches will
add permission checks where appropriate.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

4a959886

UBUNTU: SAUCE: Smack: Add support for unprivileged mounts from user namespaces · 7effec54

Seth Forshee authored Sep 23, 2015

Security labels from unprivileged mounts cannot be trusted.
Ideally for these mounts we would assign the objects in the
filesystem the same label as the inode for the backing device
passed to mount. Unfortunately it's currently impossible to
determine which inode this is from the LSM mount hooks, so we
settle for the label of the process doing the mount.

This label is assigned to s_root, and also to smk_default to
ensure that new inodes receive this label. The transmute property
is also set on s_root to make this behavior more explicit, even
though it is technically not necessary.

If a filesystem has existing security labels, access to inodes is
permitted if the label is the same as smk_root, otherwise access
is denied. The SMACK64EXEC xattr is completely ignored.

Explicit setting of security labels continues to require
CAP_MAC_ADMIN in init_user_ns.

Altogether, this ensures that filesystem objects are not
accessible to subjects which cannot already access the backing
store, that MAC is not violated for any objects in the fileystem
which are already labeled, and that a user cannot use an
unprivileged mount to gain elevated MAC privileges.

sysfs, tmpfs, and ramfs are already mountable from user
namespaces and support security labels. We can't rule out the
possibility that these filesystems may already be used in mounts
from user namespaces with security lables set from the init
namespace, so failing to trust lables in these filesystems may
introduce regressions. It is safe to trust labels from these
filesystems, since the unprivileged user does not control the
backing store and thus cannot supply security labels, so an
explicit exception is made to trust labels from these
filesystems.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

7effec54

UBUNTU: SAUCE: fs: Limit file caps to the user namespace of the super block · 4147df9e

Seth Forshee authored Sep 23, 2015

Capability sets attached to files must be ignored except in the
user namespaces where the mounter is privileged, i.e. s_user_ns
and its descendants. Otherwise a vector exists for gaining
privileges in namespaces where a user is not already privileged.

Add a new helper function, in_user_ns(), to test whether a user
namespace is the same as or a descendant of another namespace.
Use this helper to determine whether a file's capability set
should be applied to the caps constructed during exec.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

4147df9e

UBUNTU: SAUCE: fs: Add user namesapace member to struct super_block · b6500fda

Seth Forshee authored Sep 23, 2015

Initially this will be used to eliminate the implicit MNT_NODEV
flag for mounts from user namespaces. In the future it will also
be used for translating ids and checking capabilities for
filesystems mounted from user namespaces.

s_user_ns is initialized in alloc_super() and is generally set to
current_user_ns(). To avoid security and corruption issues, two
additional mount checks are also added:

 - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
   in current_user_ns().

 - sget() will fail with EBUSY when the filesystem it's looking
   for is already mounted from another user namespace.

proc requires some special handling. The user namespace of
current isn't appropriate when forking as a result of clone (2)
with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
namespace of the parent and make proc unmountable in the new user
namespace. Instead, the user namespace which owns the new pid
namespace is used. sget_userns() is allowed to allow passing in
a namespace other than that of current, and sget becomes a
wrapper around sget_userns() which passes current_user_ns().

Changes to original version of this patch
  * Documented @user_ns in sget_userns, alloc_super and fs.h
  * Kept an blank line in fs.h
  * Removed unncessary include of user_namespace.h from fs.h
  * Tweaked the location of get_user_ns and put_user_ns so
    the security modules can (if they wish) depend on it.
  -- EWB
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b6500fda

NTB: Add support for AMD PCI-Express Non-Transparent Bridge · ca175ae9

Xiangliang Yu authored Jan 21, 2016

BugLink: http://bugs.launchpad.net/bugs/1542071

This adds support for AMD's PCI-Express Non-Transparent Bridge
(NTB) device on the Zeppelin platform. The driver connnects to the
standard NTB sub-system interface, with modification to add hooks
for power management in a separate patch. The AMD NTB device has 3
memory windows, 16 doorbell, 16 scratch-pad registers, and supports
up to 16 PCIe lanes running a Gen3 speeds.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
(cherry picked from commit a1b36958)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ca175ae9

UBUNTU: [Config] CONFIG_NTB_AMD=m · b28a28de

Tim Gardner authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1542071Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b28a28de

UBUNTU: SAUCE: storvsc: use small sg_tablesize on x86 · 5f53c4e9

Joseph Salisbury authored Oct 15, 2015

BugLink: http://bugs.launchpad.net/bugs/1495983

OriginalAuthor: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Brad Figg <brad.figg@canonical.com>

5f53c4e9

UBUNTU: [Config] CONFIG_ARMV8_DEPRECATED=y · 464e0afc

Tim Gardner authored Feb 15, 2016

BugLink: http://bugs.launchpad.net/bugs/1545542Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

464e0afc

ahci_xgene: Implement the workaround to fix the missing of the edge interrupt... · 1cab7985

Suman Tripathi authored Feb 06, 2016

ahci_xgene: Implement the workaround to fix the missing of the edge interrupt for the HOST_IRQ_STAT.

Due to H/W errata, the HOST_IRQ_STAT register misses the edge interrupt
when clearing the HOST_IRQ_STAT register and hardware reporting the
PORT_IRQ_STAT register happens to be at the same clock cycle.
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from linux-next commit 32aea268)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

1cab7985

ata: Remove the AHCI_HFLAG_EDGE_IRQ support from libahci. · b293ac01

Suman Tripathi authored Feb 06, 2016

The flexibility to override the irq handles in the LLD's are already
present, so controllers implementing a edge trigger latch can
implement their own interrupt handler inside the driver.  This patch
removes the AHCI_HFLAG_EDGE_IRQ support from libahci and moves edge
irq handling to ahci_xgene.

tj: Minor update to description.
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kenrel.org>
(cherry picked from linux-next commit d867b95f)
[ dannf: offset adjustments ]
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b293ac01

libahci: Implement the capability to override the generic ahci interrupt handler. · f8aa59f3

Suman Tripathi authored Feb 06, 2016

This patch implements the capability to override the generic AHCI
interrupt handler so that specific ahci drivers can implement their
own custom interrupt handler routines.  It also exports
ahci_handle_port_intr so that custom irq_handler implementations can
use it.

tj: s/ahci_irq_handler/irq_handler/ and updated description.
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from linux-next commit f070d671)
[ dannf: backported to v4.4 ]
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f8aa59f3

megaraid: Fix possible NULL pointer deference in mraid_mm_ioctl · b90e5b1b

Nicholas Krause authored Jan 05, 2016

This adds the needed check after the call to the function
mraid_mm_alloc_kioc in order to make sure that this function has not
returned NULL and therefore makes sure we do not deference a NULL
pointer if one is returned by mraid_mm_alloc_kioc.  Further more add
needed comments explaining that this function call can return NULL if
the list head is empty for the pointer passed in order to allow furture
users to understand this required pointer check.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7296f62f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

b90e5b1b

UBUNTU: Start new release · ecf10133
Tim Gardner authored Feb 12, 2016
```
Ignore: yes
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
```
ecf10133
UBUNTU: Ubuntu-4.4.0-5.20 · f29a3f86
Tim Gardner authored Feb 11, 2016
```
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
```
f29a3f86

s390/cio: update measurement characteristics · fe0b3e65

Sebastian Ott authored Jan 25, 2016

BugLink: http://bugs.launchpad.net/bugs/1541534

Per channel path measurement characteristics are obtained during channel
path registration. However if some properties of a channel path change
we don't update the measurement characteristics.

Make sure to update the characteristics when we change the properties of
a channel path or receive a notification from FW about such a change.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 9f3d6d7a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

fe0b3e65

s390/cio: ensure consistent measurement state · a4503e66

Sebastian Ott authored Jan 25, 2016

BugLink: http://bugs.launchpad.net/bugs/1541534

Make sure that in all cases where we could not obtain measurement
characteristics the associated fields are set to invalid values.

Note: without this change the "shared" capability of a channel path
for which we could not obtain the measurement characteristics was
incorrectly displayed as 0 (not shared). We will now correctly
report "unknown" in this case.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 61f0bfcf)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

a4503e66

s390/cio: fix measurement characteristics memleak · d4c21c8b

Sebastian Ott authored Jan 25, 2016

BugLink: http://bugs.launchpad.net/bugs/1541534

Measurement characteristics are allocated during channel path
registration but not freed during deregistration. Fix this by
embedding these characteristics inside struct channel_path.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 0d9bfe91)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

d4c21c8b

qeth: initialize net_device with carrier off · 9afc6703

Ursula Braun authored Dec 11, 2015

BugLink: http://bugs.launchpad.net/bugs/1541907

/sys/class/net/<interface>/operstate for an active qeth network
interface offen shows "unknown", which translates to "state UNKNOWN
in output of "ip link show". It is caused by a missing initialization
of the __LINK_STATE_NOCARRIER bit in the net_device state field.
This patch adds a netif_carrier_off() invocation when creating the
net_device for a qeth device.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reference-ID: Bugzilla 133209
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e5ebe632)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

9afc6703

UBUNTU: SAUCE: (noup) Update spl to 0.6.5.4-0ubuntu2, zfs to 0.6.5.4-0ubuntu2 · 3258f645
Tim Gardner authored Feb 10, 2016
```
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
```
3258f645

mm: CONFIG_NR_ZONES_EXTENDED · f671c3e6

Dan Williams authored Feb 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1534647

ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new mm
zones that are bumping up against the current maximum limit of 4 zones,
i.e.  2 bits in page->flags.  When adding a zone this equation still needs
to be satisified:

    SECTIONS_WIDTH + ZONES_WIDTH + NODES_SHIFT + LAST_CPUPID_SHIFT
	  <= BITS_PER_LONG - NR_PAGEFLAGS

ZONE_DEVICE currently tries to satisfy this equation by requiring that
ZONE_DMA be disabled, but this is untenable given generic kernels want to
support ZONE_DEVICE and ZONE_DMA simultaneously.  ZONE_CMA would like to
increase the amount of memory covered per section, but that limits the
minimum granularity at which consecutive memory ranges can be added via
devm_memremap_pages().

The trade-off of what is acceptable to sacrifice depends heavily on the
platform.  For example, ZONE_CMA is targeted for 32-bit platforms where
page->flags is constrained, but those platforms likely do not care about
the minimum granularity of memory hotplug.  A big iron machine with 1024
numa nodes can likely sacrifice ZONE_DMA where a general purpose
distribution kernel can not.

CONFIG_NR_ZONES_EXTENDED is a configuration symbol that gets selected when
the number of configured zones exceeds 4.  It documents the configuration
symbols and definitions that get modified when ZONES_WIDTH is greater than
2.

For now, it steals a bit from NODES_SHIFT.  Later on it can be used to
document the definitions that get modified when a 32-bit configuration
wants more zone bits.

Note that GFP_ZONE_TABLE poses an interesting constraint since
include/linux/gfp.h gets included by the 32-bit portion of a 64-bit build.
We need to be careful to only build the table for zones that have a
corresponding gfp_t flag.  GFP_ZONES_SHIFT is introduced for this purpose.
This patch does not attempt to solve the problem of adding a new zone
that also has a corresponding GFP_ flag.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931
Fixes: 033fbae9 ("mm: ZONE_DEVICE for "device memory"")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Mark <markk@clara.co.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from linux-next commit 27ffb3827ac71a46e8d52fc7ed7302d33a619d6c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f671c3e6

UBUNTU: [Config] CONFIG_ZONE_DMA=y · f192f00a

Tim Gardner authored Feb 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1534647Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f192f00a

perf kvm/powerpc: Add support for HCALL reasons · e6e8b816

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

Powerpc provides hcall events that also provides insights into guest
behaviour. Enhance perf kvm stat to record and analyze hcall events.

 - To trace hcall events :
  perf kvm stat record

 - To show the results :
  perf kvm stat report --event=hcall

The result shows the number of hypervisor calls from the guest grouped
by their respective reasons displayed with the frequency.

This patch makes use of two additional tracepoints
"kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". To map the hcall
codes to their respective names, it needs a mapping. Such mapping is
added in this patch in book3s_hcalls.h.

 # pgrep qemu
A sample output :
19378
60515

2 VMs running.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515 --event=hcall

Analyze events for all VMs, all VCPUs:

    HCALL-EVENT Samples Samples% Time% MinTime MaxTime  AvgTime

          H_IPI     822  66.08% 88.10% 0.63us  11.38us 2.05us (+- 1.42%)
     H_SEND_CRQ     144  11.58%  3.77% 0.41us   0.88us 0.50us (+- 1.47%)
   H_VIO_SIGNAL     118   9.49%  2.86% 0.37us   0.83us 0.47us (+- 1.43%)
H_PUT_TERM_CHAR      76   6.11%  2.07% 0.37us   0.90us 0.52us (+- 2.43%)
H_GET_TERM_CHAR      74   5.95%  2.23% 0.37us   1.70us 0.58us (+- 4.77%)
         H_RTAS       6   0.48%  0.85% 1.10us   9.25us 2.70us (+-48.57%)
      H_PERFMON       4   0.32%  0.12% 0.41us   0.96us 0.59us (+-20.92%)

Total Samples:1244, Total events handled time:1916.69us.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-4-git-send-email-hemant@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 78e6c39b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

e6e8b816

perf kvm/powerpc: Port perf kvm stat to powerpc · c66c9b37

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

perf kvm can be used to analyze guest exit reasons. This support already
exists in x86. Hence, porting it to powerpc.

 - To trace KVM events :
  perf kvm stat record
  If many guests are running, we can track for a specific guest by using
  --pid as in : perf kvm stat record --pid <pid>

 - To see the results :
  perf kvm stat report

The result shows the number of exits (from the guest context to
host/hypervisor context) grouped by their respective exit reasons with
their frequency.

Since, different powerpc machines have different KVM tracepoints, this
patch discovers the available tracepoints dynamically and accordingly
looks for them. If any single tracepoint is not present, this support
won't be enabled for reporting. To record, this will fail if any of the
events we are looking to record isn't available.  Right now, its only
supported on PowerPC Book3S_HV architectures.

To analyze the different exits, group them and present them (in a slight
descriptive way) to the user, we need a mapping between the "exit code"
(dumped in the kvm_guest_exit tracepoint data) and to its related
Interrupt vector description (exit reason). This patch adds this mapping
in book3s_hv_exits.h.

It records on two available KVM tracepoints for book3s_hv:

"kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter".

Here is a sample o/p:
 # pgrep qemu
19378
60515

2 Guests are running on the host.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515

Analyze events for pid(s) 60515, all VCPUs:

     VM-EXIT Samples Samples% Time% MinTime    MaxTime  Avg time

       SYSCALL  9141  63.67%  7.49% 1.26us   5782.39us    9.87us (+- 6.46%)
H_DATA_STORAGE  4114  28.66%  5.07% 1.72us   4597.68us   14.84us (+-20.06%)
HV_DECREMENTER   418   2.91%  4.26% 0.70us  30002.22us  122.58us (+-70.29%)
      EXTERNAL   392   2.73%  0.06% 0.64us    104.10us    1.94us (+-18.83%)
RETURN_TO_HOST   287   2.00% 83.11% 1.53us 124240.15us 3486.52us (+-16.81%)
H_INST_STORAGE     5   0.03%  0.00% 1.88us      3.73us    2.39us (+-14.20%)

Total Samples:14357, Total events handled time:1203918.42us.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-3-git-send-email-hemant@linux.vnet.ibm.comSigned-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 066d3593)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

c66c9b37

perf kvm/{x86,s390}: Remove const from kvm_events_tp · 83f70bbc

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

This patch removes the "const" qualifier from kvm_events_tp declaration
to account for the fact that some architectures may need to update this
variable dynamically. For instance, powerpc will need to update this
variable dynamically depending on the machine type.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-2-git-send-email-hemant@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 48deaa74)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

83f70bbc

perf kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h · f48df91f

Hemant Kumar authored Jan 28, 2016

BugLink: http://bugs.launchpad.net/bugs/1521678

Its better to remove the dependency on uapi/kvm_perf.h to allow dynamic
discovery of kvm events (if its needed). To do this, some extern
variables have been introduced with which we can keep the generic
functions generic.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Acked-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott  Wood <scottwood@freescale.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-1-git-send-email-hemant@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from linux-next commit 162607ea)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

f48df91f