1. 11 Apr, 2016 25 commits
  2. 30 Mar, 2016 5 commits
    • Christian Borntraeger's avatar
      kernel: Provide READ_ONCE and ASSIGN_ONCE · b5be8baf
      Christian Borntraeger authored
      commit 230fa253 upstream.
      
      ACCESS_ONCE does not work reliably on non-scalar types. For
      example gcc 4.6 and 4.7 might remove the volatile tag for such
      accesses during the SRA (scalar replacement of aggregates) step
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
      
      Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
      scalar types as suggested by Linus Torvalds. Accesses larger than
      the machines word size cannot be guaranteed to be atomic. These
      macros will use memcpy and emit a build warning.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b5be8baf
    • Eric W. Biederman's avatar
      umount: Do not allow unmounting rootfs. · e1ce59d2
      Eric W. Biederman authored
      commit da362b09 upstream.
      
      Andrew Vagin <avagin@parallels.com> writes:
      
      > #define _GNU_SOURCE
      > #include <sys/types.h>
      > #include <sys/stat.h>
      > #include <fcntl.h>
      > #include <sched.h>
      > #include <unistd.h>
      > #include <sys/mount.h>
      >
      > int main(int argc, char **argv)
      > {
      > 	int fd;
      >
      > 	fd = open("/proc/self/ns/mnt", O_RDONLY);
      > 	if (fd < 0)
      > 	   return 1;
      > 	   while (1) {
      > 	   	 if (umount2("/", MNT_DETACH) ||
      > 		        setns(fd, CLONE_NEWNS))
      > 					break;
      > 					}
      >
      > 					return 0;
      > }
      >
      > root@ubuntu:/home/avagin# gcc -Wall nsenter.c -o nsenter
      > root@ubuntu:/home/avagin# strace ./nsenter
      > execve("./nsenter", ["./nsenter"], [/* 22 vars */]) = 0
      > ...
      > open("/proc/self/ns/mnt", O_RDONLY)     = 3
      > umount("/", MNT_DETACH)                 = 0
      > setns(3, 131072)                        = 0
      > umount("/", MNT_DETACH
      >
      causes:
      
      > [  260.548301] ------------[ cut here ]------------
      > [  260.550941] kernel BUG at /build/buildd/linux-3.13.0/fs/pnode.c:372!
      > [  260.552068] invalid opcode: 0000 [#1] SMP
      > [  260.552068] Modules linked in: xt_CHECKSUM iptable_mangle xt_tcpudp xt_addrtype xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison iptable_filter ip_tables x_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc nfsd auth_rpcgss nfs_acl aesni_intel nfs lockd aes_x86_64 sunrpc fscache lrw gf128mul glue_helper ablk_helper cryptd serio_raw ppdev parport_pc lp parport btrfs xor raid6_pq libcrc32c psmouse floppy
      > [  260.552068] CPU: 0 PID: 1723 Comm: nsenter Not tainted 3.13.0-30-generic #55-Ubuntu
      > [  260.552068] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      > [  260.552068] task: ffff8800376097f0 ti: ffff880074824000 task.ti: ffff880074824000
      > [  260.552068] RIP: 0010:[<ffffffff811e9483>]  [<ffffffff811e9483>] propagate_umount+0x123/0x130
      > [  260.552068] RSP: 0018:ffff880074825e98  EFLAGS: 00010246
      > [  260.552068] RAX: ffff88007c741140 RBX: 0000000000000002 RCX: ffff88007c741190
      > [  260.552068] RDX: ffff88007c741190 RSI: ffff880074825ec0 RDI: ffff880074825ec0
      > [  260.552068] RBP: ffff880074825eb0 R08: 00000000000172e0 R09: ffff88007fc172e0
      > [  260.552068] R10: ffffffff811cc642 R11: ffffea0001d59000 R12: ffff88007c741140
      > [  260.552068] R13: ffff88007c741140 R14: ffff88007c741140 R15: 0000000000000000
      > [  260.552068] FS:  00007fd5c7e41740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      > [  260.552068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > [  260.552068] CR2: 00007fd5c7968050 CR3: 0000000070124000 CR4: 00000000000406f0
      > [  260.552068] Stack:
      > [  260.552068]  0000000000000002 0000000000000002 ffff88007c631000 ffff880074825ed8
      > [  260.552068]  ffffffff811dcfac ffff88007c741140 0000000000000002 ffff88007c741160
      > [  260.552068]  ffff880074825f38 ffffffff811dd12b ffffffff811cc642 0000000075640000
      > [  260.552068] Call Trace:
      > [  260.552068]  [<ffffffff811dcfac>] umount_tree+0x20c/0x260
      > [  260.552068]  [<ffffffff811dd12b>] do_umount+0x12b/0x300
      > [  260.552068]  [<ffffffff811cc642>] ? final_putname+0x22/0x50
      > [  260.552068]  [<ffffffff811cc849>] ? putname+0x29/0x40
      > [  260.552068]  [<ffffffff811dd88c>] SyS_umount+0xdc/0x100
      > [  260.552068]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
      > [  260.552068] Code: 89 50 08 48 8b 50 08 48 89 02 49 89 45 08 e9 72 ff ff ff 0f 1f 44 00 00 4c 89 e6 4c 89 e7 e8 f5 f6 ff ff 48 89 c3 e9 39 ff ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 66 66 66 66 90 55 b8 01
      > [  260.552068] RIP  [<ffffffff811e9483>] propagate_umount+0x123/0x130
      > [  260.552068]  RSP <ffff880074825e98>
      > [  260.611451] ---[ end trace 11c33d85f1d4c652 ]--
      
      Which in practice is totally uninteresting.  Only the global root user can
      do it, and it is just a stupid thing to do.
      
      However that is no excuse to allow a silly way to oops the kernel.
      
      We can avoid this silly problem by setting MNT_LOCKED on the rootfs
      mount point and thus avoid needing any special cases in the unmount
      code.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e1ce59d2
    • David S. Miller's avatar
      ipv4: Don't do expensive useless work during inetdev destroy. · 5cc4ff31
      David S. Miller authored
      commit fbd40ea0 upstream.
      
      When an inetdev is destroyed, every address assigned to the interface
      is removed.  And in this scenerio we do two pointless things which can
      be very expensive if the number of assigned interfaces is large:
      
      1) Address promotion.  We are deleting all addresses, so there is no
         point in doing this.
      
      2) A full nf conntrack table purge for every address.  We only need to
         do this once, as is already caught by the existing
         masq_dev_notifier so masq_inet_event() can skip this.
      
      [mk] 3.12.*: The change in masq_inet_event() needs to be duplicated in
      both IPv4 and IPv6 version of the function, these two were merged in
      3.18.
      Reported-by: default avatarSolar Designer <solar@openwall.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Tested-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5cc4ff31
    • Gabriel Krisman Bertazi's avatar
      ipr: Fix regression when loading firmware · 7514ee10
      Gabriel Krisman Bertazi authored
      commit 21b81716 upstream.
      
      Commit d63c7dd5 ("ipr: Fix out-of-bounds null overwrite") removed
      the end of line handling when storing the update_fw sysfs attribute.
      This changed the userpace API because it started refusing writes
      terminated by a line feed, which broke the update tools we already have.
      
      This patch re-adds that handling, so both a write terminated by a line
      feed or not can make it through with the update.
      
      Fixes: d63c7dd5 ("ipr: Fix out-of-bounds null overwrite")
      Signed-off-by: default avatarGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Insu Yun <wuninsu@gmail.com>
      Acked-by: default avatarBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7514ee10
    • Insu Yun's avatar
      ipr: Fix out-of-bounds null overwrite · def91dea
      Insu Yun authored
      commit d63c7dd5 upstream.
      
      Return value of snprintf is not bound by size value, 2nd argument.
      (https://www.kernel.org/doc/htmldocs/kernel-api/API-snprintf.html).
      Return value is number of printed chars, can be larger than 2nd
      argument.  Therefore, it can write null byte out of bounds ofbuffer.
      Since snprintf puts null, it does not need to put additional null byte.
      Signed-off-by: default avatarInsu Yun <wuninsu@gmail.com>
      Reviewed-by: default avatarShane Seymour <shane.seymour@hpe.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      def91dea
  3. 16 Mar, 2016 7 commits
    • Jiri Slaby's avatar
      Linux 3.12.57 · d9d35182
      Jiri Slaby authored
      d9d35182
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY · 372e0615
      Konrad Rzeszutek Wilk authored
      commit 8d47065f upstream.
      
      Commit 408fb0e5 (xen/pciback: Don't
      allow MSI-X ops if PCI_COMMAND_MEMORY is not set) prevented enabling
      MSI-X on passed-through virtual functions, because it checked the VF
      for PCI_COMMAND_MEMORY but this is not a valid bit for VFs.
      
      Instead, check the physical function for PCI_COMMAND_MEMORY.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      372e0615
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. · bb7aa305
      Konrad Rzeszutek Wilk authored
      commit 408fb0e5 upstream.
      
      commit f598282f ("PCI: Fix the NIU MSI-X problem in a better way")
      teaches us that dealing with MSI-X can be troublesome.
      
      Further checks in the MSI-X architecture shows that if the
      PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
      may not be able to access the BAR (since they are memory regions).
      
      Since the MSI-X tables are located in there.. that can lead
      to us causing PCIe errors. Inhibit us performing any
      operation on the MSI-X unless the MEMORY bit is set.
      
      Note that Xen hypervisor with:
      "x86/MSI-X: access MSI-X table only after having enabled MSI-X"
      will return:
      xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!
      
      When the generic MSI code tries to setup the PIRQ without
      MEMORY bit set. Which means with later versions of Xen
      (4.6) this patch is not neccessary.
      
      This is part of XSA-157
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bb7aa305
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. · 388a8005
      Konrad Rzeszutek Wilk authored
      commit 7cfb905b upstream.
      
      Otherwise just continue on, returning the same values as
      previously (return of 0, and op->result has the PIRQ value).
      
      This does not change the behavior of XEN_PCI_OP_disable_msi[|x].
      
      The pci_disable_msi or pci_disable_msix have the checks for
      msi_enabled or msix_enabled so they will error out immediately.
      
      However the guest can still call these operations and cause
      us to disable the 'ack_intr'. That means the backend IRQ handler
      for the legacy interrupt will not respond to interrupts anymore.
      
      This will lead to (if the device is causing an interrupt storm)
      for the Linux generic code to disable the interrupt line.
      
      Naturally this will only happen if the device in question
      is plugged in on the motherboard on shared level interrupt GSI.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      388a8005
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Do not install an IRQ handler for MSI interrupts. · 4113b288
      Konrad Rzeszutek Wilk authored
      commit a396f3a2 upstream.
      
      Otherwise an guest can subvert the generic MSI code to trigger
      an BUG_ON condition during MSI interrupt freeing:
      
       for (i = 0; i < entry->nvec_used; i++)
              BUG_ON(irq_has_action(entry->irq + i));
      
      Xen PCI backed installs an IRQ handler (request_irq) for
      the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
      (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
      done in case the device has legacy interrupts the GSI line
      is shared by the backend devices.
      
      To subvert the backend the guest needs to make the backend
      to change the dev->irq from the GSI to the MSI interrupt line,
      make the backend allocate an interrupt handler, and then command
      the backend to free the MSI interrupt and hit the BUG_ON.
      
      Since the backend only calls 'request_irq' when the guest
      writes to the PCI_COMMAND register the guest needs to call
      XEN_PCI_OP_enable_msi before any other operation. This will
      cause the generic MSI code to setup an MSI entry and
      populate dev->irq with the new PIRQ value.
      
      Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
      and cause the backend to setup an IRQ handler for dev->irq
      (which instead of the GSI value has the MSI pirq). See
      'xen_pcibk_control_isr'.
      
      Then the guest disables the MSI: XEN_PCI_OP_disable_msi
      which ends up triggering the BUG_ON condition in 'free_msi_irqs'
      as there is an IRQ handler for the entry->irq (dev->irq).
      
      Note that this cannot be done using MSI-X as the generic
      code does not over-write dev->irq with the MSI-X PIRQ values.
      
      The patch inhibits setting up the IRQ handler if MSI or
      MSI-X (for symmetry reasons) code had been called successfully.
      
      P.S.
      Xen PCIBack when it sets up the device for the guest consumption
      ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
      XSA-120 addendum patch removed that - however when upstreaming said
      addendum we found that it caused issues with qemu upstream. That
      has now been fixed in qemu upstream.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4113b288
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled · d8e0a80d
      Konrad Rzeszutek Wilk authored
      commit 5e0ce145 upstream.
      
      The guest sequence of:
      
        a) XEN_PCI_OP_enable_msix
        b) XEN_PCI_OP_enable_msix
      
      results in hitting an NULL pointer due to using freed pointers.
      
      The device passed in the guest MUST have MSI-X capability.
      
      The a) constructs and SysFS representation of MSI and MSI groups.
      The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
      'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
      in a) pdev->msi_irq_groups is still set) and also free's ALL of the
      MSI-X entries of the device (the ones allocated in step a) and b)).
      
      The unwind code: 'free_msi_irqs' deletes all the entries and tries to
      delete the pdev->msi_irq_groups (which hasn't been set to NULL).
      However the pointers in the SysFS are already freed and we hit an
      NULL pointer further on when 'strlen' is attempted on a freed pointer.
      
      The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
      against that. The check for msi_enabled is not stricly neccessary.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d8e0a80d
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled · 0842f7d8
      Konrad Rzeszutek Wilk authored
      commit 56441f3c upstream.
      
      The guest sequence of:
      
       a) XEN_PCI_OP_enable_msi
       b) XEN_PCI_OP_enable_msi
       c) XEN_PCI_OP_disable_msi
      
      results in hitting an BUG_ON condition in the msi.c code.
      
      The MSI code uses an dev->msi_list to which it adds MSI entries.
      Under the above conditions an BUG_ON() can be hit. The device
      passed in the guest MUST have MSI capability.
      
      The a) adds the entry to the dev->msi_list and sets msi_enabled.
      The b) adds a second entry but adding in to SysFS fails (duplicate entry)
      and deletes all of the entries from msi_list and returns (with msi_enabled
      is still set).  c) pci_disable_msi passes the msi_enabled checks and hits:
      
      BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
      
      and blows up.
      
      The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
      against that. The check for msix_enabled is not stricly neccessary.
      
      This is part of XSA-157.
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0842f7d8
  4. 14 Mar, 2016 3 commits
    • Rusty Russell's avatar
      modules: fix longstanding /proc/kallsyms vs module insertion race. · a801fe89
      Rusty Russell authored
      commit 8244062e upstream.
      
      For CONFIG_KALLSYMS, we keep two symbol tables and two string tables.
      There's one full copy, marked SHF_ALLOC and laid out at the end of the
      module's init section.  There's also a cut-down version that only
      contains core symbols and strings, and lives in the module's core
      section.
      
      After module init (and before we free the module memory), we switch
      the mod->symtab, mod->num_symtab and mod->strtab to point to the core
      versions.  We do this under the module_mutex.
      
      However, kallsyms doesn't take the module_mutex: it uses
      preempt_disable() and rcu tricks to walk through the modules, because
      it's used in the oops path.  It's also used in /proc/kallsyms.
      There's nothing atomic about the change of these variables, so we can
      get the old (larger!) num_symtab and the new symtab pointer; in fact
      this is what I saw when trying to reproduce.
      
      By grouping these variables together, we can use a
      carefully-dereferenced pointer to ensure we always get one or the
      other (the free of the module init section is already done in an RCU
      callback, so that's safe).  We allocate the init one at the end of the
      module init section, and keep the core one inside the struct module
      itself (it could also have been allocated at the end of the module
      core, but that's probably overkill).
      
      [ Rebased for 4.4-stable and older, because the following changes aren't
        in the older trees:
        - e0224418: adds arg to is_core_symbol
        - 7523e4dc: module_init/module_core/init_size/core_size
          become init_layout.base/core_layout.base/init_layout.size/core_layout.size.
      
        Original commit: 8244062e
      ]
      Reported-by: default avatarWeilong Chen <chenweilong@huawei.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a801fe89
    • Jason Andryuk's avatar
      lib/ucs2_string: Correct ucs2 -> utf8 conversion · 1f62ed7f
      Jason Andryuk authored
      commit a6807590 upstream.
      
      The comparisons should be >= since 0x800 and 0x80 require an additional bit
      to store.
      
      For the 3 byte case, the existing shift would drop off 2 more bits than
      intended.
      
      For the 2 byte case, there should be 5 bits bits in byte 1, and 6 bits in
      byte 2.
      Signed-off-by: default avatarJason Andryuk <jandryuk@gmail.com>
      Reviewed-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Matthew Garrett <mjg59@coreos.com>
      Cc: "Lee, Chun-Yi" <jlee@suse.com>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1f62ed7f
    • Matt Fleming's avatar
      efi: Add pstore variables to the deletion whitelist · ba1b6019
      Matt Fleming authored
      commit e246eb56 upstream.
      
      Laszlo explains why this is a good idea,
      
       'This is because the pstore filesystem can be backed by UEFI variables,
        and (for example) a crash might dump the last kilobytes of the dmesg
        into a number of pstore entries, each entry backed by a separate UEFI
        variable in the above GUID namespace, and with a variable name
        according to the above pattern.
      
        Please see "drivers/firmware/efi/efi-pstore.c".
      
        While this patch series will not prevent the user from deleting those
        UEFI variables via the pstore filesystem (i.e., deleting a pstore fs
        entry will continue to delete the backing UEFI variable), I think it
        would be nice to preserve the possibility for the sysadmin to delete
        Linux-created UEFI variables that carry portions of the crash log,
        *without* having to mount the pstore filesystem.'
      
      There's also no chance of causing machines to become bricked by
      deleting these variables, which is the whole purpose of excluding
      things from the whitelist.
      
      Use the LINUX_EFI_CRASH_GUID guid and a wildcard '*' for the match so
      that we don't have to update the string in the future if new variable
      name formats are created for crash dump variables.
      Reported-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Acked-by: default avatarPeter Jones <pjones@redhat.com>
      Tested-by: default avatarPeter Jones <pjones@redhat.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: "Lee, Chun-Yi" <jlee@suse.com>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      
      ba1b6019