1. 15 Feb, 2016 12 commits
    • Vignesh R's avatar
      spi: ti-qspi: Fix data corruption seen on r/w stress test · 3b8d4228
      Vignesh R authored
      commit bc27a539 upstream.
      
      Writing invalid command to QSPI_SPI_CMD_REG will terminate current
      transfer and de-assert the chip select. This has to be done before
      calling spi_finalize_current_message(). Because
      spi_finalize_current_message() will mark the end of current message
      transfer and schedule the next transfer. If the chipselect is not
      de-asserted before calling spi_finalize_current_message() then the next
      transfer will overlap with the previous transfer leading to data
      corruption.
      __spi_pump_message() can be called either from kthread worker context or
      directly from the calling process's context. It is possible that these
      two calls can race against each other. But race is serialized by
      checking whether master->cur_msg == NULL (pointer to msg being handled
      by transfer_one() at present). The master->cur_msg is set to NULL when
      spi_finalize_current_message() is called on that message, which means
      calling spi_finalize_current_message() allows __spi_sync() to pump next
      message in calling process context.
      Now if spi-ti-qspi calls spi_finalize_current_message() before we
      terminate transfer at hardware side, if __spi_pump_message() is called
      from process context then the successive transactions can overlap.
      
      Fix this by moving writing invalid command to QSPI_SPI_CMD_REG to
      before calling spi_finalize_current_message() call.
      Signed-off-by: default avatarVignesh R <vigneshr@ti.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3b8d4228
    • David Mosberger-Tang's avatar
      spi: atmel: Fix DMA-setup for transfers with more than 8 bits per word · faed3a9b
      David Mosberger-Tang authored
      commit 06515f83 upstream.
      
      The DMA-slave configuration depends on the whether <= 8 or > 8 bits
      are transferred per word, so we need to call
      atmel_spi_dma_slave_config() with the correct value.
      Signed-off-by: default avatarDavid Mosberger <davidm@egauge.net>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      faed3a9b
    • Mauricio Faria de Oliveira's avatar
      Revert "dm mpath: fix stalls when handling invalid ioctls" · ec3b2cdc
      Mauricio Faria de Oliveira authored
      commit 47796938 upstream.
      
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ec3b2cdc
    • Dmitry V. Levin's avatar
      sh64: fix __NR_fgetxattr · afa6ea6a
      Dmitry V. Levin authored
      commit 2d33fa10 upstream.
      
      According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
      has to be defined to 259, but it doesn't.  Instead, it's defined to 269,
      which is of course used by another syscall, __NR_sched_setaffinity in this
      case.
      
      This bug was found by strace test suite.
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      afa6ea6a
    • xuejiufei's avatar
      ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup · 4e7094d5
      xuejiufei authored
      commit c95a5180 upstream.
      
      When recovery master down, dlm_do_local_recovery_cleanup() only remove
      the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
      Which will make umount thread falling in dead loop migrating $RECOVERY
      to the dead node.
      Signed-off-by: default avatarxuejiufei <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4e7094d5
    • xuejiufei's avatar
      ocfs2/dlm: ignore cleaning the migration mle that is inuse · 327f9703
      xuejiufei authored
      commit bef5502d upstream.
      
      We have found that migration source will trigger a BUG that the refcount
      of mle is already zero before put when the target is down during
      migration.  The situation is as follows:
      
      dlm_migrate_lockres
        dlm_add_migration_mle
        dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detect the target down and clean the migration
        mle. Now the refcount is 1.
      
      dlm_migrate_lockres woken, and put the mle twice when found the target
      goes down which trigger the BUG with the following message:
      
        "ERROR: bad mle: ".
      Signed-off-by: default avatarJiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      327f9703
    • Richard Weinberger's avatar
      kernel/signal.c: unexport sigsuspend() · 562b5d5e
      Richard Weinberger authored
      commit 9d8a7652 upstream.
      
      sigsuspend() is nowhere used except in signal.c itself, so we can mark it
      static do not pollute the global namespace.
      
      But this patch is more than a boring cleanup patch, it fixes a real issue
      on UserModeLinux.  UML has a special console driver to display ttys using
      xterm, or other terminal emulators, on the host side.  Vegard reported
      that sometimes UML is unable to spawn a xterm and he's facing the
      following warning:
      
        WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0()
      
      It turned out that this warning makes absolutely no sense as the UML
      xterm code calls sigsuspend() on the host side, at least it tries.  But
      as the kernel itself offers a sigsuspend() symbol the linker choose this
      one instead of the glibc wrapper.  Interestingly this code used to work
      since ever but always blocked signals on the wrong side.  Some recent
      kernel change made the WARN_ON() trigger and uncovered the bug.
      
      It is a wonderful example of how much works by chance on computers. :-)
      
      Fixes: 68f3f16d ("new helper: sigsuspend()")
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Tested-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      562b5d5e
    • OGAWA Hirofumi's avatar
      fat: fix fake_offset handling on error path · 8ffd304f
      OGAWA Hirofumi authored
      commit 928a4771 upstream.
      
      For the root directory, .  and ..  are faked (using dir_emit_dots()) and
      ctx->pos is reset from 2 to 0.
      
      A corrupted root directory could cause fat_get_entry() to fail, but
      ->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos
      rewound to 0), so any following calls to ->iterate() continue to return
      the same entries again and again.
      
      The result is that userspace will never see the end of the directory,
      causing e.g.  'ls' to hang in a getdents() loop.
      
      [hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset]
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Tested-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarRichard Weinberger <richard.weinberger@gmail.com>
      Signed-off-by: default avatarOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8ffd304f
    • Arnd Bergmann's avatar
      remoteproc: avoid stack overflow in debugfs file · 881aa623
      Arnd Bergmann authored
      commit 92792e48 upstream.
      
      Recent gcc versions warn about reading from a negative offset of
      an on-stack array:
      
      drivers/remoteproc/remoteproc_debugfs.c: In function 'rproc_recovery_write':
      drivers/remoteproc/remoteproc_debugfs.c:167:9: warning: 'buf[4294967295u]' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      I don't see anything in sys_write() that prevents us from
      being called with a zero 'count' argument, so we should
      add an extra check in rproc_recovery_write() to prevent the
      access and avoid the warning.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 2e37abb8 ("remoteproc: create a 'recovery' debugfs entry")
      Signed-off-by: default avatarOhad Ben-Cohen <ohad@wizery.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      881aa623
    • Oleg Nesterov's avatar
      proc: actually make proc_fd_permission() thread-friendly · 71755056
      Oleg Nesterov authored
      commit 54708d28 upstream.
      
      The commit 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      fixed the access to /proc/self/fd from sub-threads, but introduced another
      problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
      if generic_permission() fails.
      
      Change proc_fd_permission() to check same_thread_group(pid_task(), current).
      
      Fixes: 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      Reported-by: default avatar"Jin, Yihua" <yihua.jin@intel.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      71755056
    • Jiri Slaby's avatar
      Revert "ocfs2: fix umask ignored issue" · 9d7b1354
      Jiri Slaby authored
      This reverts commit e1f20b83, upstream
      commit 8f1eb487. This commit fixes
      702e5bc6 ("ocfs2: use generic posix ACL infrastructure"), which is only
      in 3.14.
      
      So this commit should have never been applied to 3.12 and it can
      cause sgid inheritance issues.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9d7b1354
    • Ben Hutchings's avatar
      pipe: Fix buffer offset after partially failed read · 6316a3e9
      Ben Hutchings authored
      Quoting the RHEL advisory:
      
      > It was found that the fix for CVE-2015-1805 incorrectly kept buffer
      > offset and buffer length in sync on a failed atomic read, potentially
      > resulting in a pipe buffer state corruption. A local, unprivileged user
      > could use this flaw to crash the system or leak kernel memory to user
      > space. (CVE-2016-0774, Moderate)
      
      The same flawed fix was applied to stable branches from 2.6.32.y to
      3.14.y inclusive, and I was able to reproduce the issue on 3.2.y.
      We need to give pipe_iov_copy_to_user() a separate offset variable
      and only update the buffer offset if it succeeds.
      
      References: https://rhn.redhat.com/errata/RHSA-2016-0103.htmlSigned-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6316a3e9
  2. 12 Feb, 2016 28 commits
    • J. Bruce Fields's avatar
      dcache: use IS_ROOT to decide where dentry is hashed · 173a4661
      J. Bruce Fields authored
      commit 7632e465 upstream.
      
      Every hashed dentry is either hashed in the dentry_hashtable, or a
      superblock's s_anon list.
      
      __d_drop() assumes it can determine which is the case by checking
      DCACHE_DISCONNECTED; this is not true.
      
      It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
      only hashed on dentry_hashtable, but is fully connected to its parents
      back to the root.
      
      But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
      attempts to connect a directory (found by filehandle lookup) back to
      root by ascending to parents and performing lookups one at a time.  It
      does not clear DCACHE_DISCONNECTED until it's done, and that is not at
      all an atomic process.
      
      In particular, it is possible for DCACHE_DISCONNECTED to be set on a
      dentry which is hashed on the dentry_hashtable.
      
      Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
      *does* work:
      
      Dentries are hashed only by:
      
      	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.
      
      	- __d_rehash, called by _d_rehash: hashes to the dentry's
      	  parent, and all callers of _d_rehash appear to have d_parent
      	  set to a "real" parent.
      	- __d_rehash, called by __d_move: rehashes the moved dentry to
      	  hash chain determined by target, and assigns target's d_parent
      	  to its d_parent, before dropping the dentry's d_lock.
      
      Therefore I believe it's safe for a holder of a dentry's d_lock to
      assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
      true.
      
      I believe the incorrect assumption about DCACHE_DISCONNECTED was
      originally introduced by ceb5bdc2 "fs: dcache per-bucket dcache hash
      locking".
      
      Also add a comment while we're here.
      
      Cc: Nick Piggin <npiggin@kernel.dk>
      Acked-by: default avatarChristoph Hellwig <hch@infradead.org>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      173a4661
    • Jiri Slaby's avatar
      Linux 3.12.54 · 9f9aab6e
      Jiri Slaby authored
      9f9aab6e
    • Nikolay Borisov's avatar
      dm thin: fix race condition when destroying thin pool workqueue · 021cc4c8
      Nikolay Borisov authored
      commit 18d03e8c upstream.
      
      When a thin pool is being destroyed delayed work items are
      cancelled using cancel_delayed_work(), which doesn't guarantee that on
      return the delayed item isn't running.  This can cause the work item to
      requeue itself on an already destroyed workqueue.  Fix this by using
      cancel_delayed_work_sync() which guarantees that on return the work item
      is not running anymore.
      
      Fixes: 905e51b3 ("dm thin: commit outstanding data every second")
      Fixes: 85ad643b ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
      Signed-off-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      021cc4c8
    • Ioan-Adrian Ratiu's avatar
      HID: usbhid: fix recursive deadlock · d43afc4a
      Ioan-Adrian Ratiu authored
      commit e470127e upstream.
      
      The critical section protected by usbhid->lock in hid_ctrl() is too
      big and because of this it causes a recursive deadlock. "Too big" means
      the case statement and the call to hid_input_report() do not need to be
      protected by the spinlock (no URB operations are done inside them).
      
      The deadlock happens because in certain rare cases drivers try to grab
      the lock while handling the ctrl irq which grabs the lock before them
      as described above. For example newer wacom tablets like 056a:033c try
      to reschedule proximity reads from wacom_intuos_schedule_prox_event()
      calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
      which tries to grab the usbhid lock already held by hid_ctrl().
      
      There are two ways to get out of this deadlock:
          1. Make the drivers work "around" the ctrl critical region, in the
          wacom case for ex. by delaying the scheduling of the proximity read
          request itself to a workqueue.
          2. Shrink the critical region so the usbhid lock protects only the
          instructions which modify usbhid state, calling hid_input_report()
          with the spinlock unlocked, allowing the device driver to grab the
          lock first, finish and then grab the lock afterwards in hid_ctrl().
      
      This patch implements the 2nd solution.
      Signed-off-by: default avatarIoan-Adrian Ratiu <adi@adirat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarJason Gerecke <jason.gerecke@wacom.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d43afc4a
    • Seth Jennings's avatar
      drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · 48a17e3c
      Seth Jennings authored
      commit 26bbe7ef upstream.
      
      Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
      x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
      generic x86 64bit") introduced large block sizes for x86.  This made it
      possible to have multiple sections per memory block where previously,
      there was a only every one section per block.
      
      Since blocks consist of contiguous ranges of section, there can be holes
      in the blocks where sections are not present.  If one attempts to
      offline such a block, a crash occurs since the code is not designed to
      deal with this.
      
      This patch is a quick fix to gaurd against the crash by not allowing
      blocks with non-present sections to be offlined.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781Signed-off-by: default avatarSeth Jennings <sjennings@variantweb.net>
      Reported-by: default avatarAndrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      48a17e3c
    • Mike Snitzer's avatar
      dm btree: fix leak of bufio-backed block in btree_split_sibling error path · 1b53306f
      Mike Snitzer authored
      commit 30ce6e1c upstream.
      
      The block allocated at the start of btree_split_sibling() is never
      released if later insert_at() fails.
      
      Fix this by releasing the previously allocated bufio block using
      unlock_block().
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1b53306f
    • Herbert Xu's avatar
      crypto: algif_hash - Only export and import on sockets with data · 23130403
      Herbert Xu authored
      commit 4afa5f96 upstream.
      
      The hash_accept call fails to work on sockets that have not received
      any data.  For some algorithm implementations it may cause crashes.
      
      This patch fixes this by ensuring that we only export and import on
      sockets that have received data.
      Reported-by: default avatarHarsh Jain <harshjain.prof@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      23130403
    • libin's avatar
      recordmcount: Fix endianness handling bug for nop_mcount · 4366e332
      libin authored
      commit c84da8b9 upstream.
      
      In nop_mcount, shdr->sh_offset and welp->r_offset should handle
      endianness properly, otherwise it will trigger Segmentation fault
      if the recordmcount main and file.o have different endianness.
      
      Link: http://lkml.kernel.org/r/563806C7.7070606@huawei.comSigned-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4366e332
    • Greg Kroah-Hartman's avatar
      xhci: fix placement of call to usb_disabled() · b48d0542
      Greg Kroah-Hartman authored
      In the backport of 1eaf35e4, the call to
      usb_disabled() was too late, after we had already done some allocation.
      Move that call to the top of the function instead, making the logic
      match what is intended and is in the original patch.
      Reported-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b48d0542
    • Tejun Heo's avatar
      Revert "workqueue: make sure delayed work run in local cpu" · 237c785f
      Tejun Heo authored
      commit 041bd12e upstream.
      
      This reverts commit 874bbfe6.
      
      Workqueue used to implicity guarantee that work items queued without
      explicit CPU specified are put on the local CPU.  Recent changes in
      timer broke the guarantee and led to vmstat breakage which was fixed
      by 176bed1d ("vmstat: explicitly schedule per-cpu work on the CPU
      we need it to run on").
      
      vmstat is the most likely to expose the issue and it's quite possible
      that there are other similar problems which are a lot more difficult
      to trigger.  As a preventive measure, 874bbfe6 ("workqueue: make
      sure delayed work run in local cpu") was applied to restore the local
      CPU guarnatee.  Unfortunately, the change exposed a bug in timer code
      which got fixed by 22b886dd ("timers: Use proper base migration in
      add_timer_on()").  Due to code restructuring, the commit couldn't be
      backported beyond certain point and stable kernels which only had
      874bbfe6 started crashing.
      
      The local CPU guarantee was accidental more than anything else and we
      want to get rid of it anyway.  As, with the vmstat case fixed,
      874bbfe6 is causing more problems than it's fixing, it has been
      decided to take the chance and officially break the guarantee by
      reverting the commit.  A debug feature will be added to force foreign
      CPU assignment to expose cases relying on the guarantee and fixes for
      the individual cases will be backported to stable as necessary.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Fixes: 874bbfe6 ("workqueue: make sure delayed work run in local cpu")
      Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      237c785f
    • Linus Torvalds's avatar
      vmstat: explicitly schedule per-cpu work on the CPU we need it to run on · 933f407f
      Linus Torvalds authored
      commit 176bed1d upstream.
      
      The vmstat code uses "schedule_delayed_work_on()" to do the initial
      startup of the delayed work on the right CPU, but then once it was
      started it would use the non-cpu-specific "schedule_delayed_work()" to
      re-schedule it on that CPU.
      
      That just happened to schedule it on the same CPU historically (well, in
      almost all situations), but the code _requires_ this work to be per-cpu,
      and should say so explicitly rather than depend on the non-cpu-specific
      scheduling to schedule on the current CPU.
      
      The timer code is being changed to not be as single-minded in always
      running things on the calling CPU.
      
      See also commit 874bbfe6 ("workqueue: make sure delayed work run in
      local cpu") that for now maintains the local CPU guarantees just in case
      there are other broken users that depended on the accidental behavior.
      
      js: 3.12 backport
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mike Galbraith <mgalbraith@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      933f407f
    • Andrew Morton's avatar
      openrisc: fix CONFIG_UID16 setting · 6b443a1b
      Andrew Morton authored
      commit 04ea1e91 upstream.
      
      openrisc-allnoconfig:
      
        kernel/uid16.c: In function 'SYSC_setgroups16':
        kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc'
        kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast
      
      openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n.
      
      Fixes: 2813893f ("kernel: conditionally support non-root users, groups and capabilities")
      Reported-by: default avatarFengguang Wu <fengguang.wu@gmail.com>
      Cc: Iulia Manda <iulia.manda21@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6b443a1b
    • Jiri Slaby's avatar
      x86: vvar, fix excessive gcc-6 DECLARE_VVAR warnings · d6ace935
      Jiri Slaby authored
      On 3.12, with gcc-6, I see a lot of:
      arch/x86/include/asm/vvar.h:33:28: warning: ‘vvaraddr_jiffies’ defined but not used [-Wunused-const-variable]
        static type const * const vvaraddr_ ## name =   \
                                  ^
      arch/x86/include/asm/vvar.h:46:1: note: in expansion of macro ‘DECLARE_VVAR’
       DECLARE_VVAR(0, volatile unsigned long, jiffies)
       ^~~~~~~~~~~~
      
      In upstream, this is fixed by ef721987 (x86, vdso: Introduce VVAR
      marco for vdso32) and f40c3300 (x86, vdso: Move the vvar and hpet
      mappings next to the 64-bit vDSO). But this is not applicable to
      stable.
      
      So mark the vvar declaration as __maybe_unused and be done with it.
      This will generate it to the code only if it is used. I.e. the same as
      with gcc < 6.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@amacapital.net>
      d6ace935
    • Joe Perches's avatar
      compiler-gcc: integrate the various compiler-gcc[345].h files · 1fa9b58c
      Joe Perches authored
      commit cb984d10 upstream.
      
      As gcc major version numbers are going to advance rather rapidly in the
      future, there's no real value in separate files for each compiler
      version.
      
      Deduplicate some of the macros #defined in each file too.
      
      Neaten comments using normal kernel commenting style.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Alan Modra <amodra@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1fa9b58c
    • Steven Noonan's avatar
      compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles · e3e923ee
      Steven Noonan authored
      commit 5631b8fb upstream.
      
      The bug referenced by the comment in this commit was not
      completely fixed in GCC 4.8.2, as I mentioned in a thread back
      in February:
      
         https://lkml.org/lkml/2014/2/12/797
      
      The conclusion at that time was to make the quirk unconditional
      until the bug could be found and fixed in GCC. Unfortunately,
      when I submitted the patch (commit a9f18034) I left a comment
      in that claimed the bug was fixed in GCC 4.8.2+.
      
      This comment is inaccurate, and should be removed.
      Signed-off-by: default avatarSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-steven@uplinklabs.net
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e3e923ee
    • Yang Shi's avatar
      arm64: restore bogomips information in /proc/cpuinfo · 0b3f87b4
      Yang Shi authored
      commit 92e788b7 upstream.
      
      As previously reported, some userspace applications depend on bogomips
      showed by /proc/cpuinfo. Although there is much less legacy impact on
      aarch64 than arm, it does break libvirt.
      
      This patch reverts commit 326b16db ("arm64: delay: don't bother
      reporting bogomips in /proc/cpuinfo"), but with some tweak due to
      context change and without the pr_info().
      
      Fixes: 326b16db ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo")
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0b3f87b4
    • Guenter Roeck's avatar
      mn10300: Select CONFIG_HAVE_UID16 to fix build failure · f5e77f5a
      Guenter Roeck authored
      commit c86576ea upstream.
      
      mn10300 builds fail with
      
      fs/stat.c: In function 'cp_old_stat':
      fs/stat.c:163:2: error: 'old_uid_t' undeclared
      
      ipc/util.c: In function 'ipc64_perm_to_ipc_perm':
      ipc/util.c:540:2: error: 'old_uid_t' undeclared
      
      Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16
      to fix the problem.
      
      Fixes: fbc416ff ("arm64: fix building without CONFIG_UID16")
      Cc: Arnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarAcked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f5e77f5a
    • Richard Purdie's avatar
      HID: core: Avoid uninitialized buffer access · 19a3cb32
      Richard Purdie authored
      commit 79b568b9 upstream.
      
      hid_connect adds various strings to the buffer but they're all
      conditional. You can find circumstances where nothing would be written
      to it but the kernel will still print the supposedly empty buffer with
      printk. This leads to corruption on the console/in the logs.
      
      Ensure buf is initialized to an empty string.
      Signed-off-by: default avatarRichard Purdie <richard.purdie@linuxfoundation.org>
      [dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: linux-input@vger.kernel.org
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      19a3cb32
    • Mikulas Patocka's avatar
      parisc iommu: fix panic due to trying to allocate too large region · 01bdbe95
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with kernel panic, for example just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space, there is
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must not
      cross 16-page boundary - it can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks if it may coalesce one entry with the next,
      one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      sg_dma_len(contig_sg) = dma_len;
      dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      sg_dma_address(contig_sg) =
      	PIDE_FLAG
      	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      	| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the funcion
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      01bdbe95
    • Will Deacon's avatar
      arm64: mm: ensure that the zero page is visible to the page table walker · 13d8053e
      Will Deacon authored
      commit 32d63978 upstream.
      
      In paging_init, we allocate the zero page, memset it to zero and then
      point TTBR0 to it in order to avoid speculative fetches through the
      identity mapping.
      
      In order to guarantee that the freshly zeroed page is indeed visible to
      the page table walker, we need to execute a dsb instruction prior to
      writing the TTBR.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13d8053e
    • John Blackwood's avatar
      arm64: Clear out any singlestep state on a ptrace detach operation · 8abc0d5b
      John Blackwood authored
      commit 5db4fd8c upstream.
      
      Make sure to clear out any ptrace singlestep state when a ptrace(2)
      PTRACE_DETACH call is made on arm64 systems.
      
      Otherwise, the previously ptraced task will die off with a SIGTRAP
      signal if the debugger just previously singlestepped the ptraced task.
      Signed-off-by: default avatarJohn Blackwood <john.blackwood@ccur.com>
      [will: added comment to justify why this is in the arch code]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8abc0d5b
    • Arnd Bergmann's avatar
      arm64: fix building without CONFIG_UID16 · 4f29293d
      Arnd Bergmann authored
      commit fbc416ff upstream.
      
      As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16
      disabled currently fails because the system call table still needs to
      reference the individual function entry points that are provided by
      kernel/sys_ni.c in this case, and the declarations are hidden inside
      of #ifdef CONFIG_UID16:
      
      arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function)
       __SYSCALL(__NR_lchown, sys_lchown16)
      
      I believe this problem only exists on ARM64, because older architectures
      tend to not need declarations when their system call table is built
      in assembly code, while newer architectures tend to not need UID16
      support. ARM64 only uses these system calls for compatibility with
      32-bit ARM binaries.
      
      This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is
      set unconditionally on ARM64 with CONFIG_COMPAT, so we see the
      declarations whenever we need them, but otherwise the behavior is
      unchanged.
      
      Fixes: af1839eb ("Kconfig: clean up the long arch list for the UID16 config option")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4f29293d
    • Marc Zyngier's avatar
      arm64: KVM: Fix AArch32 to AArch64 register mapping · 9bd3d70a
      Marc Zyngier authored
      commit c0f09634 upstream.
      
      When running a 32bit guest under a 64bit hypervisor, the ARMv8
      architecture defines a mapping of the 32bit registers in the 64bit
      space. This includes banked registers that are being demultiplexed
      over the 64bit ones.
      
      On exceptions caused by an operation involving a 32bit register, the
      HW exposes the register number in the ESR_EL2 register. It was so
      far understood that SW had to distinguish between AArch32 and AArch64
      accesses (based on the current AArch32 mode and register number).
      
      It turns out that I misinterpreted the ARM ARM, and the clue is in
      D1.20.1: "For some exceptions, the exception syndrome given in the
      ESR_ELx identifies one or more register numbers from the issued
      instruction that generated the exception. Where the exception is
      taken from an Exception level using AArch32 these register numbers
      give the AArch64 view of the register."
      
      Which means that the HW is already giving us the translated version,
      and that we shouldn't try to interpret it at all (for example, doing
      an MMIO operation from the IRQ mode using the LR register leads to
      very unexpected behaviours).
      
      The fix is thus not to perform a call to vcpu_reg32() at all from
      vcpu_reg(), and use whatever register number is supplied directly.
      The only case we need to find out about the mapping is when we
      actively generate a register access, which only occurs when injecting
      a fault in a guest.
      Reviewed-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9bd3d70a
    • Ulrich Weigand's avatar
      scripts/recordmcount.pl: support data in text section on powerpc · a69c779e
      Ulrich Weigand authored
      commit 2e50c4be upstream.
      
      If a text section starts out with a data blob before the first
      function start label, disassembly parsing doing in recordmcount.pl
      gets confused on powerpc, leading to creation of corrupted module
      objects.
      
      This was not a problem so far since the compiler would never create
      such text sections.  However, this has changed with a recent change
      in GCC 6 to support distances of > 2GB between a function and its
      assoicated TOC in the ELFv2 ABI, exposing this problem.
      
      There is already code in recordmcount.pl to handle such data blobs
      on the sparc64 platform.  This patch uses the same method to handle
      those on powerpc as well.
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarUlrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a69c779e
    • Boqun Feng's avatar
      powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered · 166947a7
      Boqun Feng authored
      commit 81d7a329 upstream.
      
      According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
      versions all need to be fully ordered, however they are now just
      RELEASE+ACQUIRE, which are not fully ordered.
      
      So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
      PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
      __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
      of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
      b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      
      This patch depends on patch "powerpc: Make value-returning atomics fully
      ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      166947a7
    • Boqun Feng's avatar
      powerpc: Make value-returning atomics fully ordered · 432c20b3
      Boqun Feng authored
      commit 49e9cf3f upstream.
      
      According to memory-barriers.txt:
      
      > Any atomic operation that modifies some state in memory and returns
      > information about the state (old or new) implies an SMP-conditional
      > general memory barrier (smp_mb()) on each side of the actual
      > operation ...
      
      Which mean these operations should be fully ordered. However on PPC,
      PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
      which is currently "lwsync" if SMP=y. The leading "lwsync" can not
      guarantee fully ordered atomics, according to Paul Mckenney:
      
      https://lkml.org/lkml/2015/10/14/970
      
      To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
      the fully-ordered semantics.
      
      This also makes futex atomics fully ordered, which can avoid possible
      memory ordering problems if userspace code relies on futex system call
      for fully ordered semantics.
      
      Fixes: b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      432c20b3
    • Michael Neuling's avatar
      powerpc/tm: Block signal return setting invalid MSR state · e9214d10
      Michael Neuling authored
      commit d2b9d2a5 upstream.
      
      Currently we allow both the MSR T and S bits to be set by userspace on
      a signal return.  Unfortunately this is a reserved configuration and
      will cause a TM Bad Thing exception if attempted (via rfid).
      
      This patch checks for this case in both the 32 and 64 bit signals
      code.  If both T and S are set, we mark the context as invalid.
      
      Found using a syscall fuzzer.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e9214d10
    • Dan Streetman's avatar
      xfrm: dst_entries_init() per-net dst_ops · 3e29fa5b
      Dan Streetman authored
      [ Upstream commit a8a572a6 ]
      
      Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
      templates; their dst_entries counters will never be used.  Move the
      xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
      xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
      and dst_entries_destroy for each net namespace.
      
      The ipv4 and ipv6 xfrms each create dst_ops template, and perform
      dst_entries_init on the templates.  The template values are copied to each
      net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
      pcpuc_entries field is a percpu counter and cannot be used correctly by
      simply copying it to another object.
      
      The result of this is a very subtle bug; changes to the dst entries
      counter from one net namespace may sometimes get applied to a different
      net namespace dst entries counter.  This is because of how the percpu
      counter works; it has a main count field as well as a pointer to the
      percpu variables.  Each net namespace maintains its own main count
      variable, but all point to one set of percpu variables.  When any net
      namespace happens to change one of the percpu variables to outside its
      small batch range, its count is moved to the net namespace's main count
      variable.  So with multiple net namespaces operating concurrently, the
      dst_ops entries counter can stray from the actual value that it should
      be; if counts are consistently moved from one net namespace to another
      (which my testing showed is likely), then one net namespace winds up
      with a negative dst_ops count while another winds up with a continually
      increasing count, eventually reaching its gc_thresh limit, which causes
      all new traffic on the net namespace to fail with -ENOBUFS.
      Signed-off-by: default avatarDan Streetman <dan.streetman@canonical.com>
      Signed-off-by: default avatarDan Streetman <ddstreet@ieee.org>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3e29fa5b