1. 22 May, 2019 33 commits
  2. 16 May, 2019 7 commits
    • Linux 4.19.44 · dafc674b
      Greg Kroah-Hartman authored
    • PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary · 9fa23ea1
      Dexuan Cui authored
      commit 340d4556 upstream.
      
      When we hot-remove a device, usually the host sends us a PCI_EJECT message,
      and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
      
      When we execute the quick hot-add/hot-remove test, the host may not send
      us the PCI_EJECT message if the guest has not yet finished
      initialization by sending the PCI_RESOURCES_ASSIGNED* message to the
      host. It is therefore potentially unsafe to depend only on the
      pci_destroy_slot() in hv_eject_device_work(), because the code path
      
      create_root_hv_pci_bus()
       -> hv_pci_assign_slots()
      
      is not called in this case. Note: in this case, the host still sends the
      guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
      
      In the quick hot-add/hot-remove test, the following race can occur:
      before the code path
      
      pci_devices_present_work()
       -> new_pcichild_device()
      
      adds the new device to the hbus->children list, we may already have
      received the PCI_EJECT message. Since the tasklet handler
      
      hv_pci_onchannelcallback()
      
      may then fail to find the "hpdev" by calling
      
      get_pcichild_wslot(hbus, dev_message->wslot.slot)
      
      hv_pci_eject_device() is not called. Later, as execution continues,
      
      create_root_hv_pci_bus()
       -> hv_pci_assign_slots()
      
      creates the slot, and the PCI_BUS_RELATIONS message with
      bus_rel->device_count == 0 removes the device from hbus->children, so
      we end up being unable to remove the slot in
      
      hv_pci_remove()
       -> hv_pci_remove_slots()
      
      Remove the slot in pci_devices_present_work() when the device
      is removed to address this race.
      
      pci_devices_present_work() and hv_eject_device_work() run in the
      single-threaded hbus->wq, so the slot cannot be removed twice.
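      A minimal userspace sketch of the idea, assuming hypothetical names
      (struct child_dev, struct fake_slot, remove_slot_if_present() and the
      two *_work() functions are illustrations, not the driver's code):
      whichever path runs first destroys the slot and clears the pointer,
      and because both paths run one after another, the NULL check alone
      prevents a double removal.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PCI slot (not the kernel's struct pci_slot). */
struct fake_slot {
	int number;
};

/* Hypothetical child device, reduced to the one field this sketch needs. */
struct child_dev {
	struct fake_slot *slot; /* NULL once the slot has been destroyed */
};

static int slots_destroyed; /* counts destroy calls to expose double removal */

/* Stand-in for pci_destroy_slot(). */
static void fake_destroy_slot(struct fake_slot *slot)
{
	assert(slot != NULL); /* destroying a missing slot would be a logic error */
	free(slot);
	slots_destroyed++;
}

/* Destroy the slot only if it still exists, then forget it. */
static void remove_slot_if_present(struct child_dev *dev)
{
	if (dev->slot != NULL) {
		fake_destroy_slot(dev->slot);
		dev->slot = NULL;
	}
}

/* Models the eject path (hv_eject_device_work() in the commit). */
static void eject_work(struct child_dev *dev)
{
	remove_slot_if_present(dev);
}

/* Models the bus-relations path when device_count == 0 drops the device. */
static void devices_present_work(struct child_dev *dev)
{
	remove_slot_if_present(dev);
}
```

      In the driver the serialization comes from hbus->wq being
      single-threaded; the sketch models that by simply calling the two
      functions one after the other.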
      
      We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback()
      to the workqueue, because we need hv_pci_onchannelcallback() to call
      hv_pci_eject_device() synchronously in order to poll the channel
      ring buffer, which works around the "hangs in hv_compose_msi_msg()"
      issue fixed in commit de0aa7b2 ("PCI: hv: Fix 2 hang issues in
      hv_compose_msi_msg()").
      
      Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: rewritten commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: hv: Add hv_pci_remove_slots() when we unload the driver · 76888d13
      Dexuan Cui authored
      commit 15becc2b upstream.
      
      When we unload the pci-hyperv host controller driver, the host does not
      send us a PCI_EJECT message.
      
      In this case we also need to make sure the sysfs PCI slot directory is
      removed; otherwise a command on a slot file, e.g.:
      
      "cat /sys/bus/pci/slots/2/address"
      
      will trigger a
      
      "BUG: unable to handle kernel paging request"
      
      and if we unload/reload the driver several times, we end up with
      stale slot entries in /sys/bus/pci/slots/:
      
      root@localhost:~# ls -rtl  /sys/bus/pci/slots/
      total 0
      drwxr-xr-x 2 root root 0 Feb  7 10:49 2
      drwxr-xr-x 2 root root 0 Feb  7 10:49 2-1
      drwxr-xr-x 2 root root 0 Feb  7 10:51 2-2
      
      Add the missing code to remove the PCI slot and fix the current
      behaviour.
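      The unload fix amounts to walking every child device and destroying
      any slot it still owns, since no PCI_EJECT message will arrive to do
      it. The sketch below models that walk in plain C; the list, slot type
      and counter are hypothetical stand-ins, not the driver's
      hbus->children list or struct pci_slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PCI slot. */
struct fake_slot {
	int number;
};

/* Hypothetical child device on a singly linked list. */
struct child_dev {
	struct fake_slot *slot;
	struct child_dev *next;
};

static int live_slots; /* slots created but not yet destroyed */

static struct fake_slot *fake_create_slot(int number)
{
	struct fake_slot *slot = malloc(sizeof(*slot));

	if (slot != NULL) {
		slot->number = number;
		live_slots++;
	}
	return slot;
}

static void fake_destroy_slot(struct fake_slot *slot)
{
	assert(slot != NULL);
	free(slot);
	live_slots--;
}

/* Unload path: destroy every remaining slot, since no PCI_EJECT will arrive. */
static void remove_all_slots(struct child_dev *head)
{
	for (struct child_dev *dev = head; dev != NULL; dev = dev->next) {
		if (dev->slot != NULL) {
			fake_destroy_slot(dev->slot);
			dev->slot = NULL;
		}
	}
}
```

      Clearing each pointer after destroying it makes a repeated call
      harmless, which is what leaves no stale entries behind across
      unload/reload cycles in this model.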
      
      Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: reformatted the log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: hv: Fix a memory leak in hv_eject_device_work() · a47e0054
      Dexuan Cui authored
      commit 05f151a7 upstream.
      
      When a device is created in new_pcichild_device(), hpdev->refs is set
      to 2 (i.e. the initial value of 1 plus the get_pcichild()).
      
      When we hot-remove the device from the host, a Linux VM first calls
      hv_pci_eject_device(), which increments hpdev->refs via get_pcichild()
      and then schedules hv_eject_device_work(), so hpdev->refs becomes 3
      (ignoring the paired get/put_pcichild() calls elsewhere). However,
      hv_eject_device_work() currently calls put_pcichild() only twice, so
      the 'hpdev' struct is never freed by put_pcichild().
      
      Add one put_pcichild() to fix the memory leak.
      
      The device can also be removed when we run "rmmod pci-hyperv". On this
      path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()),
      hpdev->refs is 2, and we do correctly call put_pcichild() twice in
      pci_devices_present_work().
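      The reference arithmetic above can be checked with a small model.
      get_child() and put_child() are hypothetical analogues of
      get_pcichild() and put_pcichild(), not the driver's functions: creation
      leaves refs == 2, eject adds one more, so the deferred work must drop
      three references for the object to be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device object holding only a reference count. */
struct child_dev {
	int refs;
};

static bool child_freed; /* set when the last reference is dropped */

/* Analogue of get_pcichild(). */
static void get_child(struct child_dev *dev)
{
	dev->refs++;
}

/* Analogue of put_pcichild(): frees the object on the last put. */
static void put_child(struct child_dev *dev)
{
	assert(dev->refs > 0);
	if (--dev->refs == 0) {
		free(dev);
		child_freed = true;
	}
}

/* Creation: initial reference of 1 plus one get, so refs == 2. */
static struct child_dev *new_child(void)
{
	struct child_dev *dev = malloc(sizeof(*dev));

	if (dev == NULL)
		return NULL;
	dev->refs = 1;
	get_child(dev);
	return dev;
}

/* Eject: take an extra reference for the deferred work, so refs == 3. */
static void eject(struct child_dev *dev)
{
	get_child(dev);
}

/* Fixed work item: three puts balance the three references held. */
static void eject_work_fixed(struct child_dev *dev)
{
	put_child(dev);
	put_child(dev);
	put_child(dev);
}
```

      With only two puts the count stops at 1 and the object leaks; on the
      rmmod path no eject reference is taken, so two puts from refs == 2
      are already balanced.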
      
      Fixes: 4daace0d ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: commit log rework]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/booke64: set RI in default MSR · 4179b858
      Laurentiu Tudor authored
      commit 5266e58d upstream.
      
      Set RI in the default kernel's MSR so that the architected way of
      detecting unrecoverable machine check interrupts has a chance to work.
      This is in line with the MSR setup of the other booke powerpc
      architectures configured here.
      
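      A sketch of why the RI bit matters, assuming the bit positions used in
      the kernel's reg.h (MSR_RI_LG = 1, MSR_ME_LG = 12); the two default MSR
      values and mce_recoverable() are hypothetical and omit every other
      bit. A machine check can only be classified as recoverable if RI was
      set in the MSR saved at interrupt time, so a default MSR without RI
      makes every machine check look unrecoverable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers assumed to match arch/powerpc/include/asm/reg.h (LSB-0). */
#define MSR_RI_LG 1  /* recoverable interrupt */
#define MSR_ME_LG 12 /* machine check enable */
#define MSR_RI (UINT64_C(1) << MSR_RI_LG)
#define MSR_ME (UINT64_C(1) << MSR_ME_LG)

static_assert(MSR_RI == 0x2, "RI is bit 1 in LSB-0 numbering");

/* Hypothetical default kernel MSRs, all unrelated bits omitted. */
static const uint64_t msr_default_without_ri = MSR_ME;
static const uint64_t msr_default_with_ri = MSR_ME | MSR_RI;

/* Architected test: recoverable only if RI was set in the saved MSR. */
static bool mce_recoverable(uint64_t saved_msr)
{
	return (saved_msr & MSR_RI) != 0;
}
```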
      Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/powernv/idle: Restore IAMR after idle · 71b20cdb
      Russell Currey authored
      commit a3f3072d upstream.
      
      Without restoring the IAMR after idle, execution prevention on POWER9
      with Radix MMU is overwritten and the kernel can freely execute
      userspace without faulting.
      
      This is necessary when returning from any stop state that modifies
      user state, as well as hypervisor state.
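      A minimal model of the save/restore pattern, assuming a simulated
      variable (fake_iamr) in place of the real SPR and assuming, for
      illustration only, that a state-losing stop state leaves it zeroed.
      The real fix lives in the powernv idle path and uses SPR accessors,
      not plain variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated register standing in for the real IAMR SPR. */
static uint64_t fake_iamr;

/* Hypothetical stop state; a deep one is assumed to lose the register. */
static void enter_stop_state(bool loses_state)
{
	if (loses_state)
		fake_iamr = 0; /* assumption: lost state reads back as zero */
}

/* Save the register before idling and write it back after waking. */
static void idle_with_restore(bool loses_state)
{
	uint64_t saved = fake_iamr;

	enter_stop_state(loses_state);
	if (loses_state)
		fake_iamr = saved;
	assert(fake_iamr == saved);
}
```

      Without the restore, the zeroed value models execution prevention
      being silently switched off after wakeup.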
      
      To test how this fails without this patch, load the lkdtm driver and
      do the following:
      
        $ echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
      
      which does not fault; after booting the kernel with powersave=off, the
      same command does fault. Applying this patch fixes this.
      
      Fixes: 3b10d009 ("powerpc/mm/radix: Prevent kernel execution of user space")
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Reviewed-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/book3s/64: check for NULL pointer in pgd_alloc() · 69c2b71c
      Rick Lindsley authored
      commit f3935626 upstream.
      
      When the memset code was added to pgd_alloc(), it failed to consider
      that kmem_cache_alloc() can return NULL. It's uncommon, but not
      impossible under heavy memory contention. Example oops:
      
        Unable to handle kernel paging request for data at address 0x00000000
        Faulting instruction address: 0xc0000000000a4000
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        CPU: 70 PID: 48471 Comm: entrypoint.sh Kdump: loaded Not tainted 4.14.0-115.6.1.el7a.ppc64le #1
        task: c000000334a00000 task.stack: c000000331c00000
        NIP:  c0000000000a4000 LR: c00000000012f43c CTR: 0000000000000020
        REGS: c000000331c039c0 TRAP: 0300   Not tainted  (4.14.0-115.6.1.el7a.ppc64le)
        MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 44022840  XER: 20040000
        CFAR: c000000000008874 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
        ...
        NIP [c0000000000a4000] memset+0x68/0x104
        LR [c00000000012f43c] mm_init+0x27c/0x2f0
        Call Trace:
          mm_init+0x260/0x2f0 (unreliable)
          copy_mm+0x11c/0x638
          copy_process.isra.28.part.29+0x6fc/0x1080
          _do_fork+0xdc/0x4c0
          ppc_clone+0x8/0xc
        Instruction dump:
        409e000c b0860000 38c60002 409d000c 90860000 38c60004 78a0d183 78a506a0
        7c0903a6 41820034 60000000 60420000 <f8860000> f8860008 f8860010 f8860018
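      The fix follows the usual allocate-then-check pattern. A minimal
      sketch, assuming a hypothetical fake_cache_alloc() in place of
      kmem_cache_alloc() and a hypothetical table size; simulate_oom forces
      the NULL return that heavy memory contention can produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_PGD_SIZE 4096 /* hypothetical table size */

static bool simulate_oom; /* force allocation failure */

/* Stand-in for kmem_cache_alloc(): may return NULL. */
static void *fake_cache_alloc(size_t size)
{
	return simulate_oom ? NULL : malloc(size);
}

/* Fixed pattern: check the allocation before memset() touches it. */
static void *pgd_alloc_sketch(void)
{
	void *pgd = fake_cache_alloc(FAKE_PGD_SIZE);

	if (pgd == NULL)
		return NULL; /* let the caller report the failure */
	memset(pgd, 0, FAKE_PGD_SIZE);
	return pgd;
}
```

      Without the NULL check, memset() writes through address 0, which is
      the faulting store shown in the oops above.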
      
      Fixes: fc5c2f4a ("powerpc/mm/hash64: Zero PGD pages on allocation")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Rick Lindsley <ricklind@vnet.linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>