1. 26 Jul, 2019 40 commits
    • Daniel Jordan's avatar
      padata: use smp_mb in padata_reorder to avoid orphaned padata jobs · 2b335bac
      Daniel Jordan authored
      commit cf144f81 upstream.
      
      Testing padata with the tcrypt module on a 5.2 kernel...
      
          # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
          # modprobe tcrypt mode=211 sec=1
      
      ...produces this splat:
      
          INFO: task modprobe:10075 blocked for more than 120 seconds.
                Not tainted 5.2.0-base+ #16
          modprobe        D    0 10075  10064 0x80004080
          Call Trace:
           ? __schedule+0x4dd/0x610
           ? ring_buffer_unlock_commit+0x23/0x100
           schedule+0x6c/0x90
           schedule_timeout+0x3b/0x320
           ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
           wait_for_common+0x160/0x1a0
           ? wake_up_q+0x80/0x80
           { crypto_wait_req }             # entries in braces added by hand
           { do_one_aead_op }
           { test_aead_jiffies }
           test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
           do_test+0x4053/0x6a2b [tcrypt]
           ? 0xffffffffa00f4000
           tcrypt_mod_init+0x50/0x1000 [tcrypt]
           ...
      
      The second modprobe command never finishes because in padata_reorder,
      CPU0's load of reorder_objects is executed before the unlocking store in
      spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:
      
      CPU0                                 CPU1
      
      padata_reorder                       padata_do_serial
        LOAD reorder_objects  // 0
                                             INC reorder_objects  // 1
                                             padata_reorder
                                               TRYLOCK pd->lock   // failed
        UNLOCK pd->lock
      
      CPU0 deletes the timer before returning from padata_reorder and since no
      other job is submitted to padata, modprobe waits indefinitely.
      
      Add a pair of full barriers to guarantee proper ordering:
      
      CPU0                                 CPU1
      
      padata_reorder                       padata_do_serial
        UNLOCK pd->lock
        smp_mb()
        LOAD reorder_objects
                                             INC reorder_objects
                                             smp_mb__after_atomic()
                                             padata_reorder
                                               TRYLOCK pd->lock
      
      smp_mb__after_atomic is needed so the read part of the trylock operation
      comes after the INC, as Andrea points out.   Thanks also to Andrea for
      help with writing a litmus test.
      
      Fixes: 16295bec ("padata: Generic parallelization/serialization interface")
      Signed-off-by: default avatarDaniel Jordan <daniel.m.jordan@oracle.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-crypto@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b335bac
    • Lyude Paul's avatar
      drm/nouveau/i2c: Enable i2c pads & busses during preinit · 8e74f324
      Lyude Paul authored
      commit 7cb95eee upstream.
      
      It turns out that while disabling i2c bus access from software when the
      GPU is suspended was a step in the right direction with:
      
      commit 342406e4 ("drm/nouveau/i2c: Disable i2c bus access after
      ->fini()")
      
      We also ended up accidentally breaking the vbios init scripts on some
      older Tesla GPUs, as apparently said scripts can actually use the i2c
      bus. Since these scripts are executed before initializing any
      subdevices, we end up failing to acquire access to the i2c bus which has
      left a number of cards with their fan controllers uninitialized. Luckily
      this doesn't break hardware - it just means the fan gets stuck at 100%.
      
      This also means that we've always been using our i2c busses before
      initializing them during the init scripts for older GPUs, we just didn't
      notice it until we started preventing them from being used until init.
      It's pretty impressive this never caused us any issues before!
      
      So, fix this by initializing our i2c pad and busses during subdev
      pre-init. We skip initializing aux busses during pre-init, as those are
      guaranteed to only ever be used by nouveau for DP aux transactions.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Tested-by: default avatarMarc Meledandri <m.meledandri@gmail.com>
      Fixes: 342406e4 ("drm/nouveau/i2c: Disable i2c bus access after ->fini()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e74f324
    • Linus Walleij's avatar
      ARM: dts: gemini: Set DIR-685 SPI CS as active low · 4c64814a
      Linus Walleij authored
      commit f90b8fda upstream.
      
      The SPI to the display on the DIR-685 is active low, we were
      just saved by the SPI library enforcing active low on everything
      before, so set it as active low to avoid ambiguity.
      
      Link: https://lore.kernel.org/r/20190715202101.16060-1-linus.walleij@linaro.org
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c64814a
    • Masahiro Yamada's avatar
      kconfig: fix missing choice values in auto.conf · ea3f1487
      Masahiro Yamada authored
      commit 8e2442a5 upstream.
      
      Since commit 00c864f8 ("kconfig: allow all config targets to write
      auto.conf if missing"), Kconfig creates include/config/auto.conf in the
      defconfig stage when it is missing.
      
      Joonas Kylmälä reported incorrect auto.conf generation under some
      circumstances.
      
      To reproduce it, apply the following diff:
      
      |  --- a/arch/arm/configs/imx_v6_v7_defconfig
      |  +++ b/arch/arm/configs/imx_v6_v7_defconfig
      |  @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
      |   CONFIG_USB_CONFIGFS_F_HID=y
      |   CONFIG_USB_CONFIGFS_F_UVC=y
      |   CONFIG_USB_CONFIGFS_F_PRINTER=y
      |  -CONFIG_USB_ZERO=m
      |  -CONFIG_USB_AUDIO=m
      |  -CONFIG_USB_ETH=m
      |  -CONFIG_USB_G_NCM=m
      |  -CONFIG_USB_GADGETFS=m
      |  -CONFIG_USB_FUNCTIONFS=m
      |  -CONFIG_USB_MASS_STORAGE=m
      |  -CONFIG_USB_G_SERIAL=m
      |  +CONFIG_USB_FUNCTIONFS=y
      |   CONFIG_MMC=y
      |   CONFIG_MMC_SDHCI=y
      |   CONFIG_MMC_SDHCI_PLTFM=y
      
      And then, run:
      
      $ make ARCH=arm mrproper imx_v6_v7_defconfig
      
      You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
      .config, but not in the auto.conf.
      
      Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
      block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
      
      This is probably a similar situation described in commit beaaddb6
      ("kconfig: tests: test defconfig when two choices interact").
      
      When sym_calc_choice() is called, the choice symbol forgets the
      SYMBOL_DEF_USER unless all of its choice values are explicitly set by
      the user.
      
      The choice symbol is given just one chance to recall it because
      set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
      is set.
      
      When sym_calc_choice() is called again, the choice symbol forgets it
      forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
      Hence, we cannot call sym_clear_all_valid() again and again.
      
      It is crazy to repeat set and unset of internal flags. However, we
      cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
      Doing so would re-introduce the problem solved by commit 5d09598d
      ("kconfig: fix new choices being skipped upon config update").
      
      To work around the issue, conf_write_autoconf() stopped calling
      sym_clear_all_valid().
      
      conf_write() must be changed accordingly. Currently, it clears
      SYMBOL_WRITE after the symbol is written into the .config file. This
      is needed to prevent it from writing the same symbol multiple times in
      case the symbol is declared in two or more locations. I added the new
      flag SYMBOL_WRITTEN, to track the symbols that have been written.
      
      Anyway, this is a cheesy workaround in order to suppress the issue
      as far as defconfig is concerned.
      
      Handling of choices is totally broken. sym_clear_all_valid() is called
      every time a user touches a symbol from the GUI interface. To reproduce
      it, just add a new symbol drivers/usb/gadget/legacy/Kconfig, then touch
      around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
      from the .config file.
      
      I added the Fixes tag since it is more fatal than before. But, this
      has been broken since long long time before, and still it is.
      We should take a closer look to fix this correctly somehow.
      
      Fixes: 00c864f8 ("kconfig: allow all config targets to write auto.conf if missing")
      Cc: linux-stable <stable@vger.kernel.org> # 4.19+
      Reported-by: default avatarJoonas Kylmälä <joonas.kylmala@iki.fi>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Tested-by: default avatarJoonas Kylmälä <joonas.kylmala@iki.fi>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea3f1487
    • Vitor Soares's avatar
      i3c: fix i2c and i3c scl rate by bus mode · 3620a72c
      Vitor Soares authored
      commit ecc8fb54 upstream.
      
      Currently the I3C framework limits SCL frequency to FM speed when
      dealing with a mixed slow bus, even if all I2C devices are FM+ capable.
      
      The core was also not accounting for I3C speed limitations when
      operating in mixed slow mode and was erroneously using FM+ speed as the
      max I2C speed when operating in mixed fast mode.
      
      Fixes: 3a379bbc ("i3c: Add core I3C infrastructure")
      Signed-off-by: default avatarVitor Soares <vitor.soares@synopsys.com>
      Cc: Boris Brezillon <bbrezillon@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: <linux-kernel@vger.kernel.org>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3620a72c
    • Radoslaw Burny's avatar
      fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes. · 7eb45a94
      Radoslaw Burny authored
      commit 5ec27ec7 upstream.
      
      Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
      but this is not a correct behavior for proc.  Since sysctl permission
      check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
      sense to use these values in u_[ug]id of proc inodes.  In other words:
      although uid/gid in the inode is not read during test_perm, the inode
      logically belongs to the root of the namespace.  I have confirmed this
      with Eric Biederman at LPC and in this thread:
        https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com
      
      Consequences
      ============
      
      Since the i_[ug]id values of proc nodes are not used for permissions
      checks, this change usually makes no functional difference.  However, it
      causes an issue in a setup where:
      
       * a namespace container is created without root user in container -
         hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID
      
       * container creator tries to configure it by writing /proc/sys files,
         e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit
      
      Kernel does not allow to open an inode for writing if its i_[ug]id are
      invalid, making it impossible to write shmmax and thus - configure the
      container.
      
      Using a container with no root mapping is apparently rare, but we do use
      this configuration at Google.  Also, we use a generic tool to configure
      the container limits, and the inability to write any of them causes a
      failure.
      
      History
      =======
      
      The invalid uids/gids in inodes first appeared due to 81754357 (fs:
      Update i_[ug]id_(read|write) to translate relative to s_user_ns).
      However, AFAIK, this did not immediately cause any issues.  The
      inability to write to these "invalid" inodes was only caused by a later
      commit 0bd23d09 (vfs: Don't modify inodes with a uid or gid unknown
      to the vfs).
      
      Tested: Used a repro program that creates a user namespace without any
      mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
      Before the change, it shows the overflow uid, with the change it's 0.
      The overflow uid indicates that the uid in the inode is not correct and
      thus it is not possible to open the file for writing.
      
      Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com
      Fixes: 0bd23d09 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
      Signed-off-by: default avatarRadoslaw Burny <rburny@google.com>
      Acked-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Seth Forshee <seth.forshee@canonical.com>
      Cc: John Sperbeck <jsperbeck@google.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7eb45a94
    • Eric W. Biederman's avatar
      signal: Correct namespace fixups of si_pid and si_uid · e897dd22
      Eric W. Biederman authored
      commit 7a0cf094 upstream.
      
      The function send_signal was split from __send_signal so that it would
      be possible to bypass the namespace logic based upon current[1].  As it
      turns out the si_pid and the si_uid fixup are both inappropriate in
      the case of kill_pid_usb_asyncio so move that logic into send_signal.
      
      It is difficult to arrange but possible for a signal with an si_code
      of SI_TIMER or SI_SIGIO to be sent across namespace boundaries.  In
      which case tests for when it is ok to change si_pid and si_uid based
      on SI_FROMUSER are incorrect.  Replace the use of SI_FROMUSER with a
      new test has_si_pid_and_used based on siginfo_layout.
      
      Now that the uid fixup is no longer present after expanding
      SEND_SIG_NOINFO properly calculate the si_uid that the target
      task needs to read.
      
      [1] 7978b567 ("signals: add from_ancestor_ns parameter to send_signal()")
      Cc: stable@vger.kernel.org
      Fixes: 6588c1e3 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
      Fixes: 6b550f94 ("user namespace: make signal.c respect user namespaces")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e897dd22
    • Eric W. Biederman's avatar
      signal/usb: Replace kill_pid_info_as_cred with kill_pid_usb_asyncio · c8c3ea85
      Eric W. Biederman authored
      commit 70f1b0d3 upstream.
      
      The usb support for asyncio encoded one of it's values in the wrong
      field.  It should have used si_value but instead used si_addr which is
      not present in the _rt union member of struct siginfo.
      
      The practical result of this is that on a 64bit big endian kernel
      when delivering a signal to a 32bit process the si_addr field
      is set to NULL, instead of the expected pointer value.
      
      This issue can not be fixed in copy_siginfo_to_user32 as the usb
      usage of the the _sigfault (aka si_addr) member of the siginfo
      union when SI_ASYNCIO is set is incompatible with the POSIX and
      glibc usage of the _rt member of the siginfo union.
      
      Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio a
      dedicated function for this one specific case.  There are no other
      users of kill_pid_info_as_cred so this specialization should have no
      impact on the amount of code in the kernel.  Have kill_pid_usb_asyncio
      take instead of a siginfo_t which is difficult and error prone, 3
      arguments, a signal number, an errno value, and an address enconded as
      a sigval_t.  The encoding of the address as a sigval_t allows the
      code that reads the userspace request for a signal to handle this
      compat issue along with all of the other compat issues.
      
      Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
      the pointer value at the in si_pid (instead of si_addr).  That is the
      code now verifies that si_pid and si_addr always occur at the same
      location.  Further the code veries that for native structures a value
      placed in si_pid and spilling into si_uid will appear in userspace in
      si_addr (on a byte by byte copy of siginfo or a field by field copy of
      siginfo).  The code also verifies that for a 64bit kernel and a 32bit
      userspace the 32bit pointer will fit in si_pid.
      
      I have used the usbsig.c program below written by Alan Stern and
      slightly tweaked by me to run on a big endian machine to verify the
      issue exists (on sparc64) and to confirm the patch below fixes the issue.
      
       /* usbsig.c -- test USB async signal delivery */
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <fcntl.h>
       #include <signal.h>
       #include <string.h>
       #include <sys/ioctl.h>
       #include <unistd.h>
       #include <endian.h>
       #include <linux/usb/ch9.h>
       #include <linux/usbdevice_fs.h>
      
       static struct usbdevfs_urb urb;
       static struct usbdevfs_disconnectsignal ds;
       static volatile sig_atomic_t done = 0;
      
       void urb_handler(int sig, siginfo_t *info , void *ucontext)
       {
       	printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
       	       sig, info->si_signo, info->si_errno, info->si_code,
       	       info->si_addr, &urb);
      
       	printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
       }
      
       void ds_handler(int sig, siginfo_t *info , void *ucontext)
       {
       	printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
       	       sig, info->si_signo, info->si_errno, info->si_code,
       	       info->si_addr, &ds);
      
       	printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
       	done = 1;
       }
      
       int main(int argc, char **argv)
       {
       	char *devfilename;
       	int fd;
       	int rc;
       	struct sigaction act;
       	struct usb_ctrlrequest *req;
       	void *ptr;
       	char buf[80];
      
       	if (argc != 2) {
       		fprintf(stderr, "Usage: usbsig device-file-name\n");
       		return 1;
       	}
      
       	devfilename = argv[1];
       	fd = open(devfilename, O_RDWR);
       	if (fd == -1) {
       		perror("Error opening device file");
       		return 1;
       	}
      
       	act.sa_sigaction = urb_handler;
       	sigemptyset(&act.sa_mask);
       	act.sa_flags = SA_SIGINFO;
      
       	rc = sigaction(SIGUSR1, &act, NULL);
       	if (rc == -1) {
       		perror("Error in sigaction");
       		return 1;
       	}
      
       	act.sa_sigaction = ds_handler;
       	sigemptyset(&act.sa_mask);
       	act.sa_flags = SA_SIGINFO;
      
       	rc = sigaction(SIGUSR2, &act, NULL);
       	if (rc == -1) {
       		perror("Error in sigaction");
       		return 1;
       	}
      
       	memset(&urb, 0, sizeof(urb));
       	urb.type = USBDEVFS_URB_TYPE_CONTROL;
       	urb.endpoint = USB_DIR_IN | 0;
       	urb.buffer = buf;
       	urb.buffer_length = sizeof(buf);
       	urb.signr = SIGUSR1;
      
       	req = (struct usb_ctrlrequest *) buf;
       	req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
       	req->bRequest = USB_REQ_GET_DESCRIPTOR;
       	req->wValue = htole16(USB_DT_DEVICE << 8);
       	req->wIndex = htole16(0);
       	req->wLength = htole16(sizeof(buf) - sizeof(*req));
      
       	rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
       	if (rc == -1) {
       		perror("Error in SUBMITURB ioctl");
       		return 1;
       	}
      
       	rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
       	if (rc == -1) {
       		perror("Error in REAPURB ioctl");
       		return 1;
       	}
      
       	memset(&ds, 0, sizeof(ds));
       	ds.signr = SIGUSR2;
       	ds.context = &ds;
       	rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
       	if (rc == -1) {
       		perror("Error in DISCSIGNAL ioctl");
       		return 1;
       	}
      
       	printf("Waiting for usb disconnect\n");
       	while (!done) {
       		sleep(1);
       	}
      
       	close(fd);
       	return 0;
       }
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-usb@vger.kernel.org
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Oliver Neukum <oneukum@suse.com>
      Fixes: v2.3.39
      Cc: stable@vger.kernel.org
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8c3ea85
    • Julien Thierry's avatar
      arm64: irqflags: Add condition flags to inline asm clobber list · 2cd1c187
      Julien Thierry authored
      commit f5706578 upstream.
      
      Some of the inline assembly instruction use the condition flags and need
      to include "cc" in the clobber list.
      
      Fixes: 4a503217 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
      Cc: <stable@vger.kernel.org> # 5.1.x-
      Suggested-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cd1c187
    • Jon Hunter's avatar
      arm64: tegra: Fix AGIC register range · cc43c9ef
      Jon Hunter authored
      commit ba24eee6 upstream.
      
      The Tegra AGIC interrupt controller is an ARM GIC400 interrupt
      controller. Per the ARM GIC device-tree binding, the first address
      region is for the GIC distributor registers and the second address
      region is for the GIC CPU interface registers. The address space for
      the distributor registers is 4kB, but currently this is incorrectly
      defined as 8kB for the Tegra AGIC and overlaps with the CPU interface
      registers. Correct the address space for the distributor to be 4kB.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Fixes: bcdbde43 ("arm64: tegra: Add AGIC node for Tegra210")
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc43c9ef
    • Like Xu's avatar
      KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed · edadec19
      Like Xu authored
      commit 6fc3977c upstream.
      
      If a perf_event creation fails due to any reason of the host perf
      subsystem, it has no chance to log the corresponding event for guest
      which may cause abnormal sampling data in guest result. In debug mode,
      this message helps to understand the state of vPMC and we may not
      limit the number of occurrences but not in a spamming style.
      Suggested-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarLike Xu <like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edadec19
    • Michael Neuling's avatar
      KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation · 95680d0e
      Michael Neuling authored
      commit 3fefd1cd upstream.
      
      When emulating tsr, treclaim and trechkpt, we incorrectly set CR0. The
      code currently sets:
          CR0 <- 00 || MSR[TS]
      but according to the ISA it should be:
          CR0 <-  0 || MSR[TS] || 0
      
      This fixes the bit shift to put the bits in the correct location.
      
      This is a data integrity issue as CR0 is corrupted.
      
      Fixes: 4bb3c7a0 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
      Cc: stable@vger.kernel.org # v4.17+
      Tested-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95680d0e
    • Suraj Jitindar Singh's avatar
      KVM: PPC: Book3S HV: Clear pending decrementer exceptions on nested guest entry · 5328368b
      Suraj Jitindar Singh authored
      commit 3c25ab35 upstream.
      
      If we enter an L1 guest with a pending decrementer exception then this
      is cleared on guest exit if the guest has writtien a positive value
      into the decrementer (indicating that it handled the decrementer
      exception) since there is no other way to detect that the guest has
      handled the pending exception and that it should be dequeued. In the
      event that the L1 guest tries to run a nested (L2) guest immediately
      after this and the L2 guest decrementer is negative (which is loaded
      by L1 before making the H_ENTER_NESTED hcall), then the pending
      decrementer exception isn't cleared and the L2 entry is blocked since
      L1 has a pending exception, even though L1 may have already handled
      the exception and written a positive value for it's decrementer. This
      results in a loop of L1 trying to enter the L2 guest and L0 blocking
      the entry since L1 has an interrupt pending with the outcome being
      that L2 never gets to run and hangs.
      
      Fix this by clearing any pending decrementer exceptions when L1 makes
      the H_ENTER_NESTED hcall since it won't do this if it's decrementer
      has gone negative, and anyway it's decrementer has been communicated
      to L0 in the hdec_expires field and L0 will return control to L1 when
      this goes negative by delivering an H_DECREMENTER exception.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5328368b
    • Suraj Jitindar Singh's avatar
      KVM: PPC: Book3S HV: Signed extend decrementer value if not using large decrementer · eb6bb8d5
      Suraj Jitindar Singh authored
      commit 86953770 upstream.
      
      On POWER9 the decrementer can operate in large decrementer mode where
      the decrementer is 56 bits and signed extended to 64 bits. When not
      operating in this mode the decrementer behaves as a 32 bit decrementer
      which is NOT signed extended (as on POWER8).
      
      Currently when reading a guest decrementer value we don't take into
      account whether the large decrementer is enabled or not, and this
      means the value will be incorrect when the guest is not using the
      large decrementer. Fix this by sign extending the value read when the
      guest isn't using the large decrementer.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb6bb8d5
    • Wanpeng Li's avatar
      KVM: VMX: check CPUID before allowing read/write of IA32_XSS · f4905184
      Wanpeng Li authored
      commit 4d763b16 upstream.
      
      Raise #GP when guest read/write IA32_XSS, but the CPUID bits
      say that it shouldn't exist.
      
      Fixes: 20300099 (kvm: vmx: add MSR logic for XSAVES)
      Reported-by: default avatarXiaoyao Li <xiaoyao.li@linux.intel.com>
      Reported-by: default avatarTao Xu <tao3.xu@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4905184
    • Sean Christopherson's avatar
      KVM: VMX: Fix handling of #MC that occurs during VM-Entry · e64176cd
      Sean Christopherson authored
      commit beb8d93b upstream.
      
      A previous fix to prevent KVM from consuming stale VMCS state after a
      failed VM-Entry inadvertantly blocked KVM's handling of machine checks
      that occur during VM-Entry.
      
      Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
      depending on when the #MC is recognoized.  As it pertains to this bug
      fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
      is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
      indicate the VM-Entry failed.
      
      If a machine-check event occurs during a VM entry, one of the following occurs:
       - The machine-check event is handled as if it occurred before the VM entry:
              ...
       - The machine-check event is handled after VM entry completes:
              ...
       - A VM-entry failure occurs as described in Section 26.7. The basic
         exit reason is 41, for "VM-entry failure due to machine-check event".
      
      Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
      vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
      Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
      in a sane fashion and also simplifies vmx_complete_atomic_exit() since
      VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
      
      Fixes: b060ca3b ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e64176cd
    • Sean Christopherson's avatar
      KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01 · 3ea511be
      Sean Christopherson authored
      commit 3b013a29 upstream.
      
      If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must
      be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS
      when MPX is supported.  Because the value effectively comes from vmcs01,
      vmcs02 must be updated even if vmcs12 is clean.
      
      Fixes: 62cf9bd8 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
      Cc: stable@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ea511be
    • Sean Christopherson's avatar
      KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped · 4ff7d3d1
      Sean Christopherson authored
      commit 73cb8556 upstream.
      
      ... as a malicious userspace can run a toy guest to generate invalid
      virtual-APIC page addresses in L1, i.e. flood the kernel log with error
      messages.
      
      Fixes: 69090810 ("KVM: nVMX: allow tests to use bad virtual-APIC page address")
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ff7d3d1
    • Sakari Ailus's avatar
      media: videobuf2-dma-sg: Prevent size from overflowing · 95fdf43f
      Sakari Ailus authored
      commit 14f28f5c upstream.
      
      buf->size is an unsigned long; casting that to int will lead to an
      overflow if buf->size exceeds INT_MAX.
      
      Fix this by changing the type to unsigned long instead. This is possible
      as the buf->size is always aligned to PAGE_SIZE, and therefore the size
      will never have values lesser than 0.
      
      Note on backporting to stable: the file used to be under
      drivers/media/v4l2-core, it was moved to the current location after 4.14.
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95fdf43f
    • Sakari Ailus's avatar
      media: videobuf2-core: Prevent size alignment wrapping buffer size to 0 · 1d9067ed
      Sakari Ailus authored
      commit defcdc5d upstream.
      
      PAGE_ALIGN() may wrap the buffer size around to 0. Prevent this by
      checking that the aligned value is not smaller than the unaligned one.
      
      Note on backporting to stable: the file used to be under
      drivers/media/v4l2-core, it was moved to the current location after 4.14.
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d9067ed
    • Ezequiel Garcia's avatar
      media: coda: Remove unbalanced and unneeded mutex unlock · 8829a3a0
      Ezequiel Garcia authored
      commit 766b9b16 upstream.
      
      The mutex unlock in the threaded interrupt handler is not paired
      with any mutex lock. Remove it.
      
      This bug has been here for a really long time, so it applies
      to any stable repo.
      Reviewed-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarEzequiel Garcia <ezequiel@collabora.com>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8829a3a0
    • Boris Brezillon's avatar
      media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom() · 0efb4790
      Boris Brezillon authored
      commit 07d89227 upstream.
      
      cfg->type can be overridden by v4l2_ctrl_fill() and the new value is
      stored in the local type var. Fix the tests to use this local var.
      
      Fixes: 0996517c ("V4L/DVB: v4l2: Add new control handling framework")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@collabora.com>
      [hverkuil-cisco@xs4all.nl: change to !qmenu and !qmenu_int (checkpatch)]
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0efb4790
    • Luis Henriques's avatar
      ceph: fix end offset in truncate_inode_pages_range call · baa616ae
      Luis Henriques authored
      commit d31d07b9 upstream.
      
      Commit e450f4d1 ("ceph: pass inclusive lend parameter to
      filemap_write_and_wait_range()") fixed the end offset parameter used to
      call filemap_write_and_wait_range and invalidate_inode_pages2_range.
      Unfortunately it missed truncate_inode_pages_range, introducing a
      regression that is easily detected by xfstest generic/130.
      
      The problem is that when doing direct IO it is possible that an extra page
      is truncated from the page cache when the end offset is page aligned.
      This can cause data loss if that page hasn't been sync'ed to the OSDs.
      
      While there, change code to use PAGE_ALIGN macro instead.
      
      Cc: stable@vger.kernel.org
      Fixes: e450f4d1 ("ceph: pass inclusive lend parameter to filemap_write_and_wait_range()")
      Signed-off-by: default avatarLuis Henriques <lhenriques@suse.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baa616ae
    • Takashi Iwai's avatar
      ALSA: hda/hdmi - Fix i915 reverse port/pin mapping · 1d149073
      Takashi Iwai authored
      commit 3140aafb upstream.
      
      The recent fix for Icelake HDMI codec introduced the mapping from pin
      NID to the i915 gfx port number.  However, it forgot the reverse
      mapping from the port number to the pin NID that is used in the ELD
      notifier callback.  As a result, it's processed to a wrong widget and
      gives a warning like
        snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 5 not registered
      
      This patch corrects it with a proper reverse mapping function.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204133
      Fixes: b0d8bc50 ("ALSA: hda: hdmi - add Icelake support")
      Reviewed-by: default avatarKai Vehmanen <kai.vehmanen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d149073
    • Takashi Iwai's avatar
      ALSA: hda/hdmi - Remove duplicated define · 33b9b37d
      Takashi Iwai authored
      commit eb417711 upstream.
      
      INTEL_GET_VENDOR_VERB is defined twice identically.
      Let's remove a superfluous line.
      
      Fixes: b0d8bc50 ("ALSA: hda: hdmi - add Icelake support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33b9b37d
    • Hui Wang's avatar
      ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine · 70981824
      Hui Wang authored
      commit 4b4e0e32 upstream.
      
      Without this patch, the headset-mic and headphone-mic don't work.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70981824
    • Kailang Yang's avatar
      ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell platform · 7f6d5649
      Kailang Yang authored
      commit fbc57129 upstream.
      
      It assigned to wrong model. So, The headphone Mic can't work.
      
      Fixes: 3f640970 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f6d5649
    • Takashi Iwai's avatar
      ALSA: hda - Don't resume forcibly i915 HDMI/DP codec · 68ffeea0
      Takashi Iwai authored
      commit 4914da2f upstream.
      
      We apply the codec resume forcibly at system resume callback for
      updating and syncing the jack detection state that may have changed
      during sleeping.  This is, however, superfluous for the codec like
      Intel HDMI/DP, where the jack detection is managed via the audio
      component notification; i.e. the jack state change shall be reported
      sooner or later from the graphics side at mode change.
      
      This patch changes the codec resume callback to avoid the forcible
      resume conditionally with a new flag, codec->relaxed_resume, for
      reducing the resume time.  The flag is set in the codec probe.
      
      Although this doesn't fix the entire bug mentioned in the bugzilla
      entry below, it's still a good optimization and some improvements are
      seen.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201901
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68ffeea0
    • Takashi Iwai's avatar
      ALSA: seq: Break too long mutex context in the write loop · 6a163566
      Takashi Iwai authored
      commit ede34f39 upstream.
      
      The fix for the racy writes and ioctls to sequencer widened the
      application of client->ioctl_mutex to the whole write loop.  Although
      it does unlock/relock for the lengthy operation like the event dup,
      the loop keeps the ioctl_mutex for the whole time in other
      situations.  This may take quite long time if the user-space would
      give a huge buffer, and this is a likely cause of some weird behavior
      spotted by syzcaller fuzzer.
      
      This patch puts a simple workaround, just adding a mutex break in the
      loop when a large number of events have been processed.  This
      shouldn't hit any performance drop because the threshold is set high
      enough for usual operations.
      
      Fixes: 7bd80091 ("ALSA: seq: More protection for concurrent write and ioctl races")
      Reported-by: syzbot+97aae04ce27e39cbfca9@syzkaller.appspotmail.com
      Reported-by: syzbot+4c595632b98bb8ffcc66@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a163566
    • Xiao Ni's avatar
      raid5-cache: Need to do start() part job after adding journal device · 70ec79ce
      Xiao Ni authored
      commit d9771f5e upstream.
      
      commit d5d885fd ("md: introduce new personality funciton start()")
      splits the init job to two parts. The first part run() does the jobs that
      do not require the md threads. The second part start() does the jobs that
      require the md threads.
      
      Now it just does run() in adding new journal device. It needs to do the
      second part start() too.
      
      Fixes: d5d885fd ("md: introduce new personality funciton start()")
      Cc: stable@vger.kernel.org #v4.9+
      Reported-by: default avatarMichal Soltys <soltys@ziu.info>
      Signed-off-by: default avatarXiao Ni <xni@redhat.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70ec79ce
    • Mark Brown's avatar
      ASoC: core: Adapt for debugfs API change · e36f752d
      Mark Brown authored
      commit c2c928c9 upstream.
      
      Back in ff9fb72b (debugfs: return error values, not NULL) the
      debugfs APIs were changed to return error pointers rather than NULL
      pointers on error, breaking the error checking in ASoC. Update the
      code to use IS_ERR() and log the codes that are returned as part of
      the error messages.
      
      Fixes: ff9fb72b (debugfs: return error values, not NULL)
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e36f752d
    • Mark Brown's avatar
      ASoC: dapm: Adapt for debugfs API change · f4d67769
      Mark Brown authored
      commit ceaea851 upstream.
      
      Back in ff9fb72b (debugfs: return error values, not NULL) the
      debugfs APIs were changed to return error pointers rather than NULL
      pointers on error, breaking the error checking in ASoC. Update the
      code to use IS_ERR() and log the codes that are returned as part of
      the error messages.
      
      Fixes: ff9fb72b (debugfs: return error values, not NULL)
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4d67769
    • Christophe Leroy's avatar
      lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE · 688cef50
      Christophe Leroy authored
      commit aeb87246 upstream.
      
      All mapping iterator logic is based on the assumption that sg->offset
      is always lower than PAGE_SIZE.
      
      But there are situations where sg->offset is such that the SG item
      is on the second page. In that case sg_copy_to_buffer() fails
      properly copying the data into the buffer. One of the reason is
      that the data will be outside the kmapped area used to access that
      data.
      
      This patch fixes the issue by adjusting the mapping iterator
      offset and pgoffset fields such that offset is always lower than
      PAGE_SIZE.
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Fixes: 4225fc85 ("lib/scatterlist: use page iterator in the mapping iterator")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      688cef50
    • Trond Myklebust's avatar
      SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request · fa1a4486
      Trond Myklebust authored
      commit 75369089 upstream.
      
      The bvec tracks the list of pages, so if the number of pages changes
      due to a re-encode, we need to reset the bvec as well.
      
      Fixes: 277e4ab7 ("SUNRPC: Simplify TCP receive code by switching...")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa1a4486
    • Trond Myklebust's avatar
      pnfs: Fix a problem where we gratuitously start doing I/O through the MDS · 95934ea7
      Trond Myklebust authored
      commit 58bbeab4 upstream.
      
      If the client has to stop in pnfs_update_layout() to wait for another
      layoutget to complete, it currently exits and defaults to I/O through
      the MDS if the layoutget was successful.
      
      Fixes: d03360aa ("pNFS: Ensure we return the error if someone kills...")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95934ea7
    • Trond Myklebust's avatar
      pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error · 119c5aa1
      Trond Myklebust authored
      commit 8e04fdfa upstream.
      
      mirror->mirror_ds can be NULL if uninitialised, but can contain
      a PTR_ERR() if call to GETDEVICEINFO failed.
      
      Fixes: 65990d1a ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # 4.10+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      119c5aa1
    • Max Kellermann's avatar
      Revert "NFS: readdirplus optimization by cache mechanism" (memleak) · 3536b79b
      Max Kellermann authored
      commit db531db9 upstream.
      
      This reverts commit be4c2d47.
      
      That commit caused a severe memory leak in nfs_readdir_make_qstr().
      
      When listing a directory with more than 100 files (this is how many
      struct nfs_cache_array_entry elements fit in one 4kB page), all
      allocated file name strings past those 100 leak.
      
      The root of the leakage is that those string pointers are managed in
      pages which are never linked into the page cache.
      
      fs/nfs/dir.c puts pages into the page cache by calling
      read_cache_page(); the callback function nfs_readdir_filler() will
      then fill the given page struct which was passed to it, which is
      already linked in the page cache (by do_read_cache_page() calling
      add_to_page_cache_lru()).
      
      Commit be4c2d47 added another (local) array of allocated pages, to
      be filled with more data, instead of discarding excess items received
      from the NFS server.  Those additional pages can be used by the next
      nfs_readdir_filler() call (from within the same nfs_readdir() call).
      
      The leak happens when some of those additional pages are never used
      (copied to the page cache using copy_highpage()).  The pages will be
      freed by nfs_readdir_free_pages(), but their contents will not.  The
      commit did not invoke nfs_readdir_clear_array() (and doing so would
      have been dangerous, because it did not track which of those pages
      were already copied to the page cache, risking double free bugs).
      
      How to reproduce the leak:
      
      - Use a kernel with CONFIG_SLUB_DEBUG_ON.
      
      - Create a directory on a NFS mount with more than 100 files with
        names long enough to use the "kmalloc-32" slab (so we can easily
        look up the allocation counts):
      
        for i in `seq 110`; do touch ${i}_0123456789abcdef; done
      
      - Drop all caches:
      
        echo 3 >/proc/sys/vm/drop_caches
      
      - Check the allocation counter:
      
        grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
        30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1
      
      - Request a directory listing and check the allocation counters again:
      
        ls
        [...]
        grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
        30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1
      
      There are now 120 new allocations.
      
      - Drop all caches and check the counters again:
      
        echo 3 >/proc/sys/vm/drop_caches
        grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
        30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1
      
      110 allocations are gone, but 10 have leaked and will never be freed.
      
      Unhelpfully, those allocations are explicitly excluded from KMEMLEAK,
      that's why my initial attempts with KMEMLEAK were not successful:
      
      	/*
      	 * Avoid a kmemleak false positive. The pointer to the name is stored
      	 * in a page cache page which kmemleak does not scan.
      	 */
      	kmemleak_not_leak(string->name);
      
      It would be possible to solve this bug without reverting the whole
      commit:
      
      - keep track of which pages were not used, and call
        nfs_readdir_clear_array() on them, or
      - manually link those pages into the page cache
      
      But for now I have decided to just revert the commit, because the real
      fix would require complex considerations, risking more dangerous
      (crash) bugs, which may seem unsuitable for the stable branches.
      Signed-off-by: default avatarMax Kellermann <mk@cm4all.com>
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3536b79b
    • Trond Myklebust's avatar
      NFSv4: Handle the special Linux file open access mode · b3dd02f9
      Trond Myklebust authored
      commit 44942b4e upstream.
      
      According to the open() manpage, Linux reserves the access mode 3
      to mean "check for read and write permission on the file and return
      a file descriptor that can't be used for reading or writing."
      
      Currently, the NFSv4 code will ask the server to open the file,
      and will use an incorrect share access mode of 0. Since it has
      an incorrect share access mode, the client later forgets to send
      a corresponding close, meaning it can leak stateids on the server.
      
      Fixes: ce4ef7c0 ("NFS: Split out NFS v4 file operations")
      Cc: stable@vger.kernel.org # 3.6+
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3dd02f9
    • Julien Thierry's avatar
      arm64: Fix interrupt tracing in the presence of NMIs · 5c48d5d4
      Julien Thierry authored
      commit 17ce302f upstream.
      
      In the presence of any form of instrumentation, nmi_enter() should be
      done before calling any traceable code and any instrumentation code.
      
      Currently, nmi_enter() is done in handle_domain_nmi(), which is much
      too late as instrumentation code might get called before. Move the
      nmi_enter/exit() calls to the arch IRQ vector handler.
      
      On arm64, it is not possible to know if the IRQ vector handler was
      called because of an NMI before acknowledging the interrupt. However, It
      is possible to know whether normal interrupts could be taken in the
      interrupted context (i.e. if taking an NMI in that context could
      introduce a potential race condition).
      
      When interrupting a context with IRQs disabled, call nmi_enter() as soon
      as possible. In contexts with IRQs enabled, defer this to the interrupt
      controller, which is in a better position to know if an interrupt taken
      is an NMI.
      
      Fixes: bc3c03cc ("arm64: Enable the support of pseudo-NMIs")
      Cc: <stable@vger.kernel.org> # 5.1.x-
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c48d5d4
    • Dmitry Osipenko's avatar
      opp: Don't use IS_ERR on invalid supplies · 44c92c0c
      Dmitry Osipenko authored
      commit 560d1bca upstream.
      
      _set_opp_custom() receives a set of OPP supplies as its arguments and
      the caller of it passes NULL when the supplies are not valid. But
      _set_opp_custom(), by mistake, checks for error by performing
      IS_ERR(old_supply) on it which will always evaluate to false.
      
      The problem was spotted during of testing of upcoming update for the
      NVIDIA Tegra CPUFreq driver.
      
      Cc: stable <stable@vger.kernel.org>
      Fixes: 7e535993 ("OPP: Separate out custom OPP handler specific code")
      Reported-by: default avatarMarc Dietrich <marvin24@gmx.de>
      Signed-off-by: default avatarDmitry Osipenko <digetx@gmail.com>
      [ Viresh: Massaged changelog ]
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44c92c0c