1. 26 Jul, 2019 40 commits
    • Danit Goldberg's avatar
      IB/mlx5: Report correctly tag matching rendezvous capability · 7211b040
      Danit Goldberg authored
      commit 89705e92 upstream.
      
      Userspace expects the IB_TM_CAP_RC bit to indicate that the device
      supports RC transport tag matching with rendezvous offload. However the
      firmware splits this into two capabilities for eager and rendezvous tag
      matching.
      
      Only if the FW supports both modes should userspace be told the tag
      matching capability is available.
      
      Cc: <stable@vger.kernel.org> # 4.13
      Fixes: eb761894 ("IB/mlx5: Fill XRQ capabilities")
      Signed-off-by: default avatarDanit Goldberg <danitg@mellanox.com>
      Reviewed-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: default avatarArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7211b040
    • Filipe Manana's avatar
      Btrfs: add missing inode version, ctime and mtime updates when punching hole · 55d036c1
      Filipe Manana authored
      commit 17900668 upstream.
      
      If the range for which we are punching a hole covers only part of a page,
      we end up updating the inode item but we skip the update of the inode's
      iversion, mtime and ctime. Fix that by ensuring we update those properties
      of the inode.
      
      A patch for fstests test case generic/059 that tests this as been sent
      along with this fix.
      
      Fixes: 2aaa6655 ("Btrfs: add hole punching")
      Fixes: e8c1c76e ("Btrfs: add missing inode update when punching hole")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55d036c1
    • Filipe Manana's avatar
      Btrfs: fix fsync not persisting dentry deletions due to inode evictions · 82e85ad0
      Filipe Manana authored
      commit 803f0f64 upstream.
      
      In order to avoid searches on a log tree when unlinking an inode, we check
      if the inode being unlinked was logged in the current transaction, as well
      as the inode of its parent directory. When any of the inodes are logged,
      we proceed to delete directory items and inode reference items from the
      log, to ensure that if a subsequent fsync of only the inode being unlinked
      or only of the parent directory when the other is not fsync'ed as well,
      does not result in the entry still existing after a power failure.
      
      That check however is not reliable when one of the inodes involved (the
      one being unlinked or its parent directory's inode) is evicted, since the
      logged_trans field is transient, that is, it is not stored on disk, so it
      is lost when the inode is evicted and loaded into memory again (which is
      set to zero on load). As a consequence the checks currently being done by
      btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() always
      return true if the inode was evicted before, regardless of the inode
      having been logged or not before (and in the current transaction), this
      results in the dentry being unlinked still existing after a log replay
      if after the unlink operation only one of the inodes involved is fsync'ed.
      
      Example:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/dir
        $ touch /mnt/dir/foo
        $ xfs_io -c fsync /mnt/dir/foo
      
        # Keep an open file descriptor on our directory while we evict inodes.
        # We just want to evict the file's inode, the directory's inode must not
        # be evicted.
        $ ( cd /mnt/dir; while true; do :; done ) &
        $ pid=$!
      
        # Wait a bit to give time to background process to chdir to our test
        # directory.
        $ sleep 0.5
      
        # Trigger eviction of the file's inode.
        $ echo 2 > /proc/sys/vm/drop_caches
      
        # Unlink our file and fsync the parent directory. After a power failure
        # we don't expect to see the file anymore, since we fsync'ed the parent
        # directory.
        $ rm -f $SCRATCH_MNT/dir/foo
        $ xfs_io -c fsync /mnt/dir
      
        <power failure>
      
        $ mount /dev/sdb /mnt
        $ ls /mnt/dir
        foo
        $
         --> file still there, unlink not persisted despite explicit fsync on dir
      
      Fix this by checking if the inode has the full_sync bit set in its runtime
      flags as well, since that bit is set everytime an inode is loaded from
      disk, or for other less common cases such as after a shrinking truncate
      or failure to allocate extent maps for holes, and gets cleared after the
      first fsync. Also consider the inode as possibly logged only if it was
      last modified in the current transaction (besides having the full_fsync
      flag set).
      
      Fixes: 3a5f1d45 ("Btrfs: Optimize btree walking while logging inodes")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      82e85ad0
    • Filipe Manana's avatar
      Btrfs: fix data loss after inode eviction, renaming it, and fsync it · 79906804
      Filipe Manana authored
      commit d1d832a0 upstream.
      
      When we log an inode, regardless of logging it completely or only that it
      exists, we always update it as logged (logged_trans and last_log_commit
      fields of the inode are updated). This is generally fine and avoids future
      attempts to log it from having to do repeated work that brings no value.
      
      However, if we write data to a file, then evict its inode after all the
      dealloc was flushed (and ordered extents completed), rename the file and
      fsync it, we end up not logging the new extents, since the rename may
      result in logging that the inode exists in case the parent directory was
      logged before. The following reproducer shows and explains how this can
      happen:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/dir
        $ touch /mnt/dir/foo
        $ touch /mnt/dir/bar
      
        # Do a direct IO write instead of a buffered write because with a
        # buffered write we would need to make sure dealloc gets flushed and
        # complete before we do the inode eviction later, and we can not do that
        # from user space with call to things such as sync(2) since that results
        # in a transaction commit as well.
        $ xfs_io -d -c "pwrite -S 0xd3 0 4K" /mnt/dir/bar
      
        # Keep the directory dir in use while we evict inodes. We want our file
        # bar's inode to be evicted but we don't want our directory's inode to
        # be evicted (if it were evicted too, we would not be able to reproduce
        # the issue since the first fsync below, of file foo, would result in a
        # transaction commit.
        $ ( cd /mnt/dir; while true; do :; done ) &
        $ pid=$!
      
        # Wait a bit to give time for the background process to chdir.
        $ sleep 0.1
      
        # Evict all inodes, except the inode for the directory dir because it is
        # currently in use by our background process.
        $ echo 2 > /proc/sys/vm/drop_caches
      
        # fsync file foo, which ends up persisting information about the parent
        # directory because it is a new inode.
        $ xfs_io -c fsync /mnt/dir/foo
      
        # Rename bar, this results in logging that this inode exists (inode item,
        # names, xattrs) because the parent directory is in the log.
        $ mv /mnt/dir/bar /mnt/dir/baz
      
        # Now fsync baz, which ends up doing absolutely nothing because of the
        # rename operation which logged that the inode exists only.
        $ xfs_io -c fsync /mnt/dir/baz
      
        <power failure>
      
        $ mount /dev/sdb /mnt
        $ od -t x1 -A d /mnt/dir/baz
        0000000
      
          --> Empty file, data we wrote is missing.
      
      Fix this by not updating last_sub_trans of an inode when we are logging
      only that it exists and the inode was not yet logged since it was loaded
      from disk (full_sync bit set), this is enough to make btrfs_inode_in_log()
      return false for this scenario and make us log the inode. The logged_trans
      of the inode is still always setsince that alone is used to track if names
      need to be deleted as part of unlink operations.
      
      Fixes: 257c62e1 ("Btrfs: avoid tree log commit when there are no changes")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79906804
    • Niklas Cassel's avatar
      PCI: qcom: Ensure that PERST is asserted for at least 100 ms · 97392d4b
      Niklas Cassel authored
      commit 64adde31 upstream.
      
      Currently, there is only a 1 ms sleep after asserting PERST.
      
      Reading the datasheets for different endpoints, some require PERST to be
      asserted for 10 ms in order for the endpoint to perform a reset, others
      require it to be asserted for 50 ms.
      
      Several SoCs using this driver uses PCIe Mini Card, where we don't know
      what endpoint will be plugged in.
      
      The PCI Express Card Electromechanical Specification r2.0, section
      2.2, "PERST# Signal" specifies:
      
      "On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL) from
      the power rails achieving specified operating limits."
      
      Add a sleep of 100 ms before deasserting PERST, in order to ensure that
      we are compliant with the spec.
      
      Fixes: 82a82383 ("PCI: qcom: Add Qualcomm PCIe controller driver")
      Signed-off-by: default avatarNiklas Cassel <niklas.cassel@linaro.org>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: default avatarStanimir Varbanov <svarbanov@mm-sol.com>
      Cc: stable@vger.kernel.org # 4.5+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97392d4b
    • Mika Westerberg's avatar
      PCI: Do not poll for PME if the device is in D3cold · e67c8a7e
      Mika Westerberg authored
      commit 000dd531 upstream.
      
      PME polling does not take into account that a device that is directly
      connected to the host bridge may go into D3cold as well. This leads to a
      situation where the PME poll thread reads from a config space of a
      device that is in D3cold and gets incorrect information because the
      config space is not accessible.
      
      Here is an example from Intel Ice Lake system where two PCIe root ports
      are in D3cold (I've instrumented the kernel to log the PMCSR register
      contents):
      
        [   62.971442] pcieport 0000:00:07.1: Check PME status, PMCSR=0xffff
        [   62.971504] pcieport 0000:00:07.0: Check PME status, PMCSR=0xffff
      
      Since 0xffff is interpreted so that PME is pending, the root ports will
      be runtime resumed. This repeats over and over again essentially
      blocking all runtime power management.
      
      Prevent this from happening by checking whether the device is in D3cold
      before its PME status is read.
      
      Fixes: 71a83bd7 ("PCI/PM: add runtime PM support to PCIe port")
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: default avatarLukas Wunner <lukas@wunner.de>
      Cc: 3.6+ <stable@vger.kernel.org> # v3.6+
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e67c8a7e
    • Dexuan Cui's avatar
      PCI: hv: Fix a use-after-free bug in hv_eject_device_work() · d3fbb2a1
      Dexuan Cui authored
      commit 4df591b2 upstream.
      
      Fix a use-after-free in hv_eject_device_work().
      
      Fixes: 05f151a7 ("PCI: hv: Fix a memory leak in hv_eject_device_work()")
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3fbb2a1
    • Alexander Shishkin's avatar
      intel_th: pci: Add Ice Lake NNPI support · 68d2b51d
      Alexander Shishkin authored
      commit 4aa5aed2 upstream.
      
      This adds Ice Lake NNPI support to the Intel(R) Trace Hub.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20190621161930.60785-5-alexander.shishkin@linux.intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68d2b51d
    • Bart Van Assche's avatar
      RDMA/srp: Accept again source addresses that do not have a port number · 2cb05390
      Bart Van Assche authored
      commit bcef5b72 upstream.
      
      The function srp_parse_in() is used both for parsing source address
      specifications and for target address specifications. Target addresses
      must have a port number. Having to specify a port number for source
      addresses is inconvenient. Make sure that srp_parse_in() supports again
      parsing addresses with no port number.
      
      Cc: <stable@vger.kernel.org>
      Fixes: c62adb7d ("IB/srp: Fix IPv6 address parsing")
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cb05390
    • Damien Le Moal's avatar
      block: Fix potential overflow in blk_report_zones() · c1bbef41
      Damien Le Moal authored
      commit 113ab72e upstream.
      
      For large values of the number of zones reported and/or large zone
      sizes, the sector increment calculated with
      
      blk_queue_zone_sectors(q) * n
      
      in blk_report_zones() loop can overflow the unsigned int type used for
      the calculation as both "n" and blk_queue_zone_sectors() value are
      unsigned int. E.g. for a device with 256 MB zones (524288 sectors),
      overflow happens with 8192 or more zones reported.
      
      Changing the return type of blk_queue_zone_sectors() to sector_t, fixes
      this problem and avoids overflow problem for all other callers of this
      helper too. The same change is also applied to the bdev_zone_sectors()
      helper.
      
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1bbef41
    • Damien Le Moal's avatar
      block: Allow mapping of vmalloc-ed buffers · 397918f6
      Damien Le Moal authored
      commit b4c5875d upstream.
      
      To allow the SCSI subsystem scsi_execute_req() function to issue
      requests using large buffers that are better allocated with vmalloc()
      rather than kmalloc(), modify bio_map_kern() to allow passing a buffer
      allocated with vmalloc().
      
      To do so, detect vmalloc-ed buffers using is_vmalloc_addr(). For
      vmalloc-ed buffers, flush the buffer using flush_kernel_vmap_range(),
      use vmalloc_to_page() instead of virt_to_page() to obtain the pages of
      the buffer, and invalidate the buffer addresses with
      invalidate_kernel_vmap_range() on completion of read BIOs. This last
      point is executed using the function bio_invalidate_vmalloc_pages()
      which is defined only if the architecture defines
      ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, that is, if the architecture
      actually needs the invalidation done.
      
      Fixes: 515ce606 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      397918f6
    • Andres Rodriguez's avatar
      drm/edid: parse CEA blocks embedded in DisplayID · 9949a900
      Andres Rodriguez authored
      commit e28ad544 upstream.
      
      DisplayID blocks allow embedding of CEA blocks. The payloads are
      identical to traditional top level CEA extension blocks, but the header
      is slightly different.
      
      This change allows the CEA parser to find a CEA block inside a DisplayID
      block. Additionally, it adds support for parsing the embedded CTA
      header. No further changes are necessary due to payload parity.
      
      This change fixes audio support for the Valve Index HMD.
      Signed-off-by: default avatarAndres Rodriguez <andresx7@gmail.com>
      Reviewed-by: default avatarDave Airlie <airlied@redhat.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v4.15
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190619180901.17901-1-andresx7@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9949a900
    • Kim Phillips's avatar
      perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs · e457f13e
      Kim Phillips authored
      commit 2f217d58 upstream.
      
      Fill in the L3 performance event select register ThreadMask
      bitfield, to enable per hardware thread accounting.
      Signed-off-by: default avatarKim Phillips <kim.phillips@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gary Hook <Gary.Hook@amd.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190628215906.4276-2-kim.phillips@amd.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e457f13e
    • Kim Phillips's avatar
      perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs · 3e1b297e
      Kim Phillips authored
      commit 16f46411 upstream.
      
      The following commit:
      
        d7cbbe49 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events")
      
      enables L3 PMC events for all threads and slices by writing 1's in
      'ChL3PmcCfg' (L3 PMC PERF_CTL) register fields.
      
      Those bitfields overlap with high order event select bits in the Data
      Fabric PMC control register, however.
      
      So when a user requests raw Data Fabric events (-e amd_df/event=0xYYY/),
      the two highest order bits get inadvertently set, changing the counter
      select to events that don't exist, and for which no counts are read.
      
      This patch changes the logic to write the L3 masks only when dealing
      with L3 PMC counters.
      
      AMD Family 16h and below Northbridge (NB) counters were not affected.
      Signed-off-by: default avatarKim Phillips <kim.phillips@amd.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gary Hook <Gary.Hook@amd.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: d7cbbe49 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events")
      Link: https://lkml.kernel.org/r/20190628215906.4276-1-kim.phillips@amd.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e1b297e
    • Kan Liang's avatar
      perf/x86/intel: Fix spurious NMI on fixed counter · f7aa77ce
      Kan Liang authored
      commit e4557c1a upstream.
      
      If a user first sample a PEBS event on a fixed counter, then sample a
      non-PEBS event on the same fixed counter on Icelake, it will trigger
      spurious NMI. For example:
      
        perf record -e 'cycles:p' -a
        perf record -e 'cycles' -a
      
      The error message for spurious NMI:
      
        [June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2.
        [    +0.000000] Do you have a strange power saving mode enabled?
        [    +0.000000] Dazed and confused, but trying to continue
      
      The bug was introduced by the following commit:
      
        commit 6f55967a ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      
      The commit moves the intel_pmu_pebs_disable() after intel_pmu_disable_fixed(),
      which returns immediately.  The related bit of PEBS_ENABLE MSR will never be
      cleared for the fixed counter. Then a non-PEBS event runs on the fixed counter,
      but the bit on PEBS_ENABLE is still set, which triggers spurious NMIs.
      
      Check and disable PEBS for fixed counters after intel_pmu_disable_fixed().
      Reported-by: default avatarYi, Ammy <ammy.yi@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 6f55967a ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7aa77ce
    • David Rientjes's avatar
      x86/boot: Fix memory leak in default_get_smp_config() · 7a45c683
      David Rientjes authored
      commit e74bd969 upstream.
      
      When default_get_smp_config() is called with early == 1 and mpf->feature1
      is non-zero, mpf is leaked because the return path does not do
      early_memunmap().
      
      Fix this and share a common exit routine.
      
      Fixes: 5997efb9 ("x86/boot: Use memremap() to map the MPF and MPC data")
      Reported-by: default avatarCfir Cohen <cfir@google.com>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091942570.28240@chino.kir.corp.google.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a45c683
    • Soeren Moch's avatar
      rt2x00usb: fix rx queue hang · bbe75669
      Soeren Moch authored
      commit 41a531ff upstream.
      
      Since commit ed194d13 ("usb: core: remove local_irq_save() around
       ->complete() handler") the handler rt2x00usb_interrupt_rxdone() is
      not running with interrupts disabled anymore. So this completion handler
      is not guaranteed to run completely before workqueue processing starts
      for the same queue entry.
      Be sure to set all other flags in the entry correctly before marking
      this entry ready for workqueue processing. This way we cannot miss error
      conditions that need to be signalled from the completion handler to the
      worker thread.
      Note that rt2x00usb_work_rxdone() processes all available entries, not
      only such for which queue_work() was called.
      
      This patch is similar to what commit df71c9cf ("rt2x00: fix order
      of entry flags modification") did for TX processing.
      
      This fixes a regression on a RT5370 based wifi stick in AP mode, which
      suddenly stopped data transmission after some period of heavy load. Also
      stopping the hanging hostapd resulted in the error message "ieee80211
      phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
      Other operation modes are probably affected as well, this just was
      the used testcase.
      
      Fixes: ed194d13 ("usb: core: remove local_irq_save() around ->complete() handler")
      Cc: stable@vger.kernel.org # 4.20+
      Signed-off-by: default avatarSoeren Moch <smoch@web.de>
      Acked-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbe75669
    • YueHaibing's avatar
      9p/virtio: Add cleanup path in p9_virtio_init · 7f235a53
      YueHaibing authored
      commit d4548543 upstream.
      
      KASAN report this:
      
      BUG: unable to handle kernel paging request at ffffffffa0097000
      PGD 3870067 P4D 3870067 PUD 3871063 PMD 2326e2067 PTE 0
      Oops: 0000 [#1
      CPU: 0 PID: 5340 Comm: modprobe Not tainted 5.1.0-rc7+ #25
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:__list_add_valid+0x10/0x70
      Code: c3 48 8b 06 55 48 89 e5 5d 48 39 07 0f 94 c0 0f b6 c0 c3 90 90 90 90 90 90 90 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 <48> 8b 32 48 39 f0 75 3a
      
      RSP: 0018:ffffc90000e23c68 EFLAGS: 00010246
      RAX: ffffffffa00ad000 RBX: ffffffffa009d000 RCX: 0000000000000000
      RDX: ffffffffa0097000 RSI: ffffffffa0097000 RDI: ffffffffa009d000
      RBP: ffffc90000e23c68 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0097000
      R13: ffff888231797180 R14: 0000000000000000 R15: ffffc90000e23e78
      FS:  00007fb215285540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffa0097000 CR3: 000000022f144000 CR4: 00000000000006f0
      Call Trace:
       v9fs_register_trans+0x2f/0x60 [9pnet
       ? 0xffffffffa0087000
       p9_virtio_init+0x25/0x1000 [9pnet_virtio
       do_one_initcall+0x6c/0x3cc
       ? kmem_cache_alloc_trace+0x248/0x3b0
       do_init_module+0x5b/0x1f1
       load_module+0x1db1/0x2690
       ? m_show+0x1d0/0x1d0
       __do_sys_finit_module+0xc5/0xd0
       __x64_sys_finit_module+0x15/0x20
       do_syscall_64+0x6b/0x1d0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7fb214d8e839
      Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01
      
      RSP: 002b:00007ffc96554278 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000055e67eed2aa0 RCX: 00007fb214d8e839
      RDX: 0000000000000000 RSI: 000055e67ce95c2e RDI: 0000000000000003
      RBP: 000055e67ce95c2e R08: 0000000000000000 R09: 000055e67eed2aa0
      R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
      R13: 000055e67eeda500 R14: 0000000000040000 R15: 000055e67eed2aa0
      Modules linked in: 9pnet_virtio(+) 9pnet gre rfkill vmw_vsock_virtio_transport_common vsock [last unloaded: 9pnet_virtio
      CR2: ffffffffa0097000
      ---[ end trace 4a52bb13ff07b761
      
      If register_virtio_driver() fails in p9_virtio_init,
      we should call v9fs_unregister_trans() to do cleanup.
      
      Link: http://lkml.kernel.org/r/20190430115942.41840-1-yuehaibing@huawei.com
      Cc: stable@vger.kernel.org
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Fixes: b530cc79 ("9p: add virtio transport")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f235a53
    • YueHaibing's avatar
      9p/xen: Add cleanup path in p9_trans_xen_init · 36acd9cc
      YueHaibing authored
      commit 80a316ff upstream.
      
      If xenbus_register_frontend() fails in p9_trans_xen_init,
      we should call v9fs_unregister_trans() to do cleanup.
      
      Link: http://lkml.kernel.org/r/20190430143933.19368-1-yuehaibing@huawei.com
      Cc: stable@vger.kernel.org
      Fixes: 868eb122 ("xen/9pfs: introduce Xen 9pfs transport driver")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36acd9cc
    • Juergen Gross's avatar
      xen/events: fix binding user event channels to cpus · ccfc9d9d
      Juergen Gross authored
      commit bce5963b upstream.
      
      When binding an interdomain event channel to a vcpu via
      IOCTL_EVTCHN_BIND_INTERDOMAIN not only the event channel needs to be
      bound, but the affinity of the associated IRQi must be changed, too.
      Otherwise the IRQ and the event channel won't be moved to another vcpu
      in case the original vcpu they were bound to is going offline.
      
      Cc: <stable@vger.kernel.org> # 4.13
      Fixes: c48f64ab ("xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU")
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccfc9d9d
    • Damien Le Moal's avatar
      dm zoned: fix zone state management race · 842ee766
      Damien Le Moal authored
      commit 3b8cafdd upstream.
      
      dm-zoned uses the zone flag DMZ_ACTIVE to indicate that a zone of the
      backend device is being actively read or written and so cannot be
      reclaimed. This flag is set as long as the zone atomic reference
      counter is not 0. When this atomic is decremented and reaches 0 (e.g.
      on BIO completion), the active flag is cleared and set again whenever
      the zone is reused and BIO issued with the atomic counter incremented.
      These 2 operations (atomic inc/dec and flag set/clear) are however not
      always executed atomically under the target metadata mutex lock and
      this causes the warning:
      
      WARN_ON(!test_bit(DMZ_ACTIVE, &zone->flags));
      
      in dmz_deactivate_zone() to be displayed. This problem is regularly
      triggered with xfstests generic/209, generic/300, generic/451 and
      xfs/077 with XFS being used as the file system on the dm-zoned target
      device. Similarly, xfstests ext4/303, ext4/304, generic/209 and
      generic/300 trigger the warning with ext4 use.
      
      This problem can be easily fixed by simply removing the DMZ_ACTIVE flag
      and managing the "ACTIVE" state by directly looking at the reference
      counter value. To do so, the functions dmz_activate_zone() and
      dmz_deactivate_zone() are changed to inline functions respectively
      calling atomic_inc() and atomic_dec(), while the dmz_is_active() macro
      is changed to an inline function calling atomic_read().
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarMasato Suzuki <masato.suzuki@wdc.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      842ee766
    • Daniel Jordan's avatar
      padata: use smp_mb in padata_reorder to avoid orphaned padata jobs · 2b335bac
      Daniel Jordan authored
      commit cf144f81 upstream.
      
      Testing padata with the tcrypt module on a 5.2 kernel...
      
          # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
          # modprobe tcrypt mode=211 sec=1
      
      ...produces this splat:
      
          INFO: task modprobe:10075 blocked for more than 120 seconds.
                Not tainted 5.2.0-base+ #16
          modprobe        D    0 10075  10064 0x80004080
          Call Trace:
           ? __schedule+0x4dd/0x610
           ? ring_buffer_unlock_commit+0x23/0x100
           schedule+0x6c/0x90
           schedule_timeout+0x3b/0x320
           ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
           wait_for_common+0x160/0x1a0
           ? wake_up_q+0x80/0x80
           { crypto_wait_req }             # entries in braces added by hand
           { do_one_aead_op }
           { test_aead_jiffies }
           test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
           do_test+0x4053/0x6a2b [tcrypt]
           ? 0xffffffffa00f4000
           tcrypt_mod_init+0x50/0x1000 [tcrypt]
           ...
      
      The second modprobe command never finishes because in padata_reorder,
      CPU0's load of reorder_objects is executed before the unlocking store in
      spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:
      
      CPU0                                 CPU1
      
      padata_reorder                       padata_do_serial
        LOAD reorder_objects  // 0
                                             INC reorder_objects  // 1
                                             padata_reorder
                                               TRYLOCK pd->lock   // failed
        UNLOCK pd->lock
      
      CPU0 deletes the timer before returning from padata_reorder and since no
      other job is submitted to padata, modprobe waits indefinitely.
      
      Add a pair of full barriers to guarantee proper ordering:
      
      CPU0                                 CPU1
      
      padata_reorder                       padata_do_serial
        UNLOCK pd->lock
        smp_mb()
        LOAD reorder_objects
                                             INC reorder_objects
                                             smp_mb__after_atomic()
                                             padata_reorder
                                               TRYLOCK pd->lock
      
      smp_mb__after_atomic is needed so the read part of the trylock operation
      comes after the INC, as Andrea points out.   Thanks also to Andrea for
      help with writing a litmus test.
      
      Fixes: 16295bec ("padata: Generic parallelization/serialization interface")
      Signed-off-by: default avatarDaniel Jordan <daniel.m.jordan@oracle.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-crypto@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b335bac
    • Lyude Paul's avatar
      drm/nouveau/i2c: Enable i2c pads & busses during preinit · 8e74f324
      Lyude Paul authored
      commit 7cb95eee upstream.
      
      It turns out that while disabling i2c bus access from software when the
      GPU is suspended was a step in the right direction with:
      
      commit 342406e4 ("drm/nouveau/i2c: Disable i2c bus access after
      ->fini()")
      
      We also ended up accidentally breaking the vbios init scripts on some
      older Tesla GPUs, as apparently said scripts can actually use the i2c
      bus. Since these scripts are executed before initializing any
      subdevices, we end up failing to acquire access to the i2c bus which has
      left a number of cards with their fan controllers uninitialized. Luckily
      this doesn't break hardware - it just means the fan gets stuck at 100%.
      
      This also means that we've always been using our i2c busses before
      initializing them during the init scripts for older GPUs, we just didn't
      notice it until we started preventing them from being used until init.
      It's pretty impressive this never caused us any issues before!
      
      So, fix this by initializing our i2c pad and busses during subdev
      pre-init. We skip initializing aux busses during pre-init, as those are
      guaranteed to only ever be used by nouveau for DP aux transactions.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Tested-by: default avatarMarc Meledandri <m.meledandri@gmail.com>
      Fixes: 342406e4 ("drm/nouveau/i2c: Disable i2c bus access after ->fini()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e74f324
    • Linus Walleij's avatar
      ARM: dts: gemini: Set DIR-685 SPI CS as active low · 4c64814a
      Linus Walleij authored
      commit f90b8fda upstream.
      
      The SPI to the display on the DIR-685 is active low, we were
      just saved by the SPI library enforcing active low on everything
      before, so set it as active low to avoid ambiguity.
      
      Link: https://lore.kernel.org/r/20190715202101.16060-1-linus.walleij@linaro.org
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c64814a
    • Masahiro Yamada's avatar
      kconfig: fix missing choice values in auto.conf · ea3f1487
      Masahiro Yamada authored
      commit 8e2442a5 upstream.
      
      Since commit 00c864f8 ("kconfig: allow all config targets to write
      auto.conf if missing"), Kconfig creates include/config/auto.conf in the
      defconfig stage when it is missing.
      
      Joonas Kylmälä reported incorrect auto.conf generation under some
      circumstances.
      
      To reproduce it, apply the following diff:
      
      |  --- a/arch/arm/configs/imx_v6_v7_defconfig
      |  +++ b/arch/arm/configs/imx_v6_v7_defconfig
      |  @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
      |   CONFIG_USB_CONFIGFS_F_HID=y
      |   CONFIG_USB_CONFIGFS_F_UVC=y
      |   CONFIG_USB_CONFIGFS_F_PRINTER=y
      |  -CONFIG_USB_ZERO=m
      |  -CONFIG_USB_AUDIO=m
      |  -CONFIG_USB_ETH=m
      |  -CONFIG_USB_G_NCM=m
      |  -CONFIG_USB_GADGETFS=m
      |  -CONFIG_USB_FUNCTIONFS=m
      |  -CONFIG_USB_MASS_STORAGE=m
      |  -CONFIG_USB_G_SERIAL=m
      |  +CONFIG_USB_FUNCTIONFS=y
      |   CONFIG_MMC=y
      |   CONFIG_MMC_SDHCI=y
      |   CONFIG_MMC_SDHCI_PLTFM=y
      
      And then, run:
      
      $ make ARCH=arm mrproper imx_v6_v7_defconfig
      
      You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
      .config, but not in the auto.conf.
      
      Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
      block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
      
      This is probably a similar situation described in commit beaaddb6
      ("kconfig: tests: test defconfig when two choices interact").
      
      When sym_calc_choice() is called, the choice symbol forgets the
      SYMBOL_DEF_USER unless all of its choice values are explicitly set by
      the user.
      
      The choice symbol is given just one chance to recall it because
      set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
      is set.
      
      When sym_calc_choice() is called again, the choice symbol forgets it
      forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
      Hence, we cannot call sym_clear_all_valid() again and again.
      
      It is crazy to repeat set and unset of internal flags. However, we
      cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
      Doing so would re-introduce the problem solved by commit 5d09598d
      ("kconfig: fix new choices being skipped upon config update").
      
      To work around the issue, conf_write_autoconf() stopped calling
      sym_clear_all_valid().
      
      conf_write() must be changed accordingly. Currently, it clears
      SYMBOL_WRITE after the symbol is written into the .config file. This
      is needed to prevent it from writing the same symbol multiple times in
      case the symbol is declared in two or more locations. I added the new
      flag SYMBOL_WRITTEN, to track the symbols that have been written.
      
      Anyway, this is a cheesy workaround in order to suppress the issue
      as far as defconfig is concerned.
      
      Handling of choices is totally broken. sym_clear_all_valid() is called
      every time a user touches a symbol from the GUI interface. To reproduce
      it, just add a new symbol drivers/usb/gadget/legacy/Kconfig, then touch
      around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
      from the .config file.
      
      I added the Fixes tag since it is more fatal than before. But, this
      has been broken since long long time before, and still it is.
      We should take a closer look to fix this correctly somehow.
      
      Fixes: 00c864f8 ("kconfig: allow all config targets to write auto.conf if missing")
      Cc: linux-stable <stable@vger.kernel.org> # 4.19+
      Reported-by: default avatarJoonas Kylmälä <joonas.kylmala@iki.fi>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Tested-by: default avatarJoonas Kylmälä <joonas.kylmala@iki.fi>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea3f1487
    • Vitor Soares's avatar
      i3c: fix i2c and i3c scl rate by bus mode · 3620a72c
      Vitor Soares authored
      commit ecc8fb54 upstream.
      
      Currently the I3C framework limits SCL frequency to FM speed when
      dealing with a mixed slow bus, even if all I2C devices are FM+ capable.
      
      The core was also not accounting for I3C speed limitations when
      operating in mixed slow mode and was erroneously using FM+ speed as the
      max I2C speed when operating in mixed fast mode.
      
      Fixes: 3a379bbc ("i3c: Add core I3C infrastructure")
      Signed-off-by: default avatarVitor Soares <vitor.soares@synopsys.com>
      Cc: Boris Brezillon <bbrezillon@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: <linux-kernel@vger.kernel.org>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3620a72c
    • Radoslaw Burny's avatar
      fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes. · 7eb45a94
      Radoslaw Burny authored
      commit 5ec27ec7 upstream.
      
      Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
      but this is not a correct behavior for proc.  Since sysctl permission
      check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
      sense to use these values in u_[ug]id of proc inodes.  In other words:
      although uid/gid in the inode is not read during test_perm, the inode
      logically belongs to the root of the namespace.  I have confirmed this
      with Eric Biederman at LPC and in this thread:
        https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com
      
      Consequences
      ============
      
      Since the i_[ug]id values of proc nodes are not used for permissions
      checks, this change usually makes no functional difference.  However, it
      causes an issue in a setup where:
      
       * a namespace container is created without root user in container -
         hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID
      
       * container creator tries to configure it by writing /proc/sys files,
         e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit
      
      Kernel does not allow to open an inode for writing if its i_[ug]id are
      invalid, making it impossible to write shmmax and thus - configure the
      container.
      
      Using a container with no root mapping is apparently rare, but we do use
      this configuration at Google.  Also, we use a generic tool to configure
      the container limits, and the inability to write any of them causes a
      failure.
      
      History
      =======
      
      The invalid uids/gids in inodes first appeared due to 81754357 (fs:
      Update i_[ug]id_(read|write) to translate relative to s_user_ns).
      However, AFAIK, this did not immediately cause any issues.  The
      inability to write to these "invalid" inodes was only caused by a later
      commit 0bd23d09 (vfs: Don't modify inodes with a uid or gid unknown
      to the vfs).
      
      Tested: Used a repro program that creates a user namespace without any
      mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
      Before the change, it shows the overflow uid, with the change it's 0.
      The overflow uid indicates that the uid in the inode is not correct and
      thus it is not possible to open the file for writing.
      
      Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com
      Fixes: 0bd23d09 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
      Signed-off-by: default avatarRadoslaw Burny <rburny@google.com>
      Acked-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Seth Forshee <seth.forshee@canonical.com>
      Cc: John Sperbeck <jsperbeck@google.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7eb45a94
    • Eric W. Biederman's avatar
      signal: Correct namespace fixups of si_pid and si_uid · e897dd22
      Eric W. Biederman authored
      commit 7a0cf094 upstream.
      
      The function send_signal was split from __send_signal so that it would
      be possible to bypass the namespace logic based upon current[1].  As it
      turns out the si_pid and the si_uid fixup are both inappropriate in
      the case of kill_pid_usb_asyncio so move that logic into send_signal.
      
      It is difficult to arrange but possible for a signal with an si_code
      of SI_TIMER or SI_SIGIO to be sent across namespace boundaries.  In
      which case tests for when it is ok to change si_pid and si_uid based
      on SI_FROMUSER are incorrect.  Replace the use of SI_FROMUSER with a
      new test has_si_pid_and_used based on siginfo_layout.
      
      Now that the uid fixup is no longer present after expanding
      SEND_SIG_NOINFO properly calculate the si_uid that the target
      task needs to read.
      
      [1] 7978b567 ("signals: add from_ancestor_ns parameter to send_signal()")
      Cc: stable@vger.kernel.org
      Fixes: 6588c1e3 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
      Fixes: 6b550f94 ("user namespace: make signal.c respect user namespaces")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e897dd22
    • Eric W. Biederman's avatar
      signal/usb: Replace kill_pid_info_as_cred with kill_pid_usb_asyncio · c8c3ea85
      Eric W. Biederman authored
      commit 70f1b0d3 upstream.
      
      The usb support for asyncio encoded one of it's values in the wrong
      field.  It should have used si_value but instead used si_addr which is
      not present in the _rt union member of struct siginfo.
      
      The practical result of this is that on a 64bit big endian kernel
      when delivering a signal to a 32bit process the si_addr field
      is set to NULL, instead of the expected pointer value.
      
      This issue can not be fixed in copy_siginfo_to_user32 as the usb
      usage of the the _sigfault (aka si_addr) member of the siginfo
      union when SI_ASYNCIO is set is incompatible with the POSIX and
      glibc usage of the _rt member of the siginfo union.
      
      Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio a
      dedicated function for this one specific case.  There are no other
      users of kill_pid_info_as_cred so this specialization should have no
      impact on the amount of code in the kernel.  Have kill_pid_usb_asyncio
      take instead of a siginfo_t which is difficult and error prone, 3
      arguments, a signal number, an errno value, and an address enconded as
      a sigval_t.  The encoding of the address as a sigval_t allows the
      code that reads the userspace request for a signal to handle this
      compat issue along with all of the other compat issues.
      
      Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
      the pointer value at the in si_pid (instead of si_addr).  That is the
      code now verifies that si_pid and si_addr always occur at the same
      location.  Further the code veries that for native structures a value
      placed in si_pid and spilling into si_uid will appear in userspace in
      si_addr (on a byte by byte copy of siginfo or a field by field copy of
      siginfo).  The code also verifies that for a 64bit kernel and a 32bit
      userspace the 32bit pointer will fit in si_pid.
      
      I have used the usbsig.c program below written by Alan Stern and
      slightly tweaked by me to run on a big endian machine to verify the
      issue exists (on sparc64) and to confirm the patch below fixes the issue.
      
       /* usbsig.c -- test USB async signal delivery */
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <fcntl.h>
       #include <signal.h>
       #include <string.h>
       #include <sys/ioctl.h>
       #include <unistd.h>
       #include <endian.h>
       #include <linux/usb/ch9.h>
       #include <linux/usbdevice_fs.h>
      
       static struct usbdevfs_urb urb;
       static struct usbdevfs_disconnectsignal ds;
       static volatile sig_atomic_t done = 0;
      
       void urb_handler(int sig, siginfo_t *info , void *ucontext)
       {
       	printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
       	       sig, info->si_signo, info->si_errno, info->si_code,
       	       info->si_addr, &urb);
      
       	printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
       }
      
       void ds_handler(int sig, siginfo_t *info , void *ucontext)
       {
       	printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
       	       sig, info->si_signo, info->si_errno, info->si_code,
       	       info->si_addr, &ds);
      
       	printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
       	done = 1;
       }
      
       int main(int argc, char **argv)
       {
       	char *devfilename;
       	int fd;
       	int rc;
       	struct sigaction act;
       	struct usb_ctrlrequest *req;
       	void *ptr;
       	char buf[80];
      
       	if (argc != 2) {
       		fprintf(stderr, "Usage: usbsig device-file-name\n");
       		return 1;
       	}
      
       	devfilename = argv[1];
       	fd = open(devfilename, O_RDWR);
       	if (fd == -1) {
       		perror("Error opening device file");
       		return 1;
       	}
      
       	act.sa_sigaction = urb_handler;
       	sigemptyset(&act.sa_mask);
       	act.sa_flags = SA_SIGINFO;
      
       	rc = sigaction(SIGUSR1, &act, NULL);
       	if (rc == -1) {
       		perror("Error in sigaction");
       		return 1;
       	}
      
       	act.sa_sigaction = ds_handler;
       	sigemptyset(&act.sa_mask);
       	act.sa_flags = SA_SIGINFO;
      
       	rc = sigaction(SIGUSR2, &act, NULL);
       	if (rc == -1) {
       		perror("Error in sigaction");
       		return 1;
       	}
      
       	memset(&urb, 0, sizeof(urb));
       	urb.type = USBDEVFS_URB_TYPE_CONTROL;
       	urb.endpoint = USB_DIR_IN | 0;
       	urb.buffer = buf;
       	urb.buffer_length = sizeof(buf);
       	urb.signr = SIGUSR1;
      
       	req = (struct usb_ctrlrequest *) buf;
       	req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
       	req->bRequest = USB_REQ_GET_DESCRIPTOR;
       	req->wValue = htole16(USB_DT_DEVICE << 8);
       	req->wIndex = htole16(0);
       	req->wLength = htole16(sizeof(buf) - sizeof(*req));
      
       	rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
       	if (rc == -1) {
       		perror("Error in SUBMITURB ioctl");
       		return 1;
       	}
      
       	rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
       	if (rc == -1) {
       		perror("Error in REAPURB ioctl");
       		return 1;
       	}
      
       	memset(&ds, 0, sizeof(ds));
       	ds.signr = SIGUSR2;
       	ds.context = &ds;
       	rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
       	if (rc == -1) {
       		perror("Error in DISCSIGNAL ioctl");
       		return 1;
       	}
      
       	printf("Waiting for usb disconnect\n");
       	while (!done) {
       		sleep(1);
       	}
      
       	close(fd);
       	return 0;
       }
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-usb@vger.kernel.org
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Oliver Neukum <oneukum@suse.com>
      Fixes: v2.3.39
      Cc: stable@vger.kernel.org
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8c3ea85
    • Julien Thierry's avatar
      arm64: irqflags: Add condition flags to inline asm clobber list · 2cd1c187
      Julien Thierry authored
      commit f5706578 upstream.
      
      Some of the inline assembly instruction use the condition flags and need
      to include "cc" in the clobber list.
      
      Fixes: 4a503217 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
      Cc: <stable@vger.kernel.org> # 5.1.x-
      Suggested-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cd1c187
    • Jon Hunter's avatar
      arm64: tegra: Fix AGIC register range · cc43c9ef
      Jon Hunter authored
      commit ba24eee6 upstream.
      
      The Tegra AGIC interrupt controller is an ARM GIC400 interrupt
      controller. Per the ARM GIC device-tree binding, the first address
      region is for the GIC distributor registers and the second address
      region is for the GIC CPU interface registers. The address space for
      the distributor registers is 4kB, but currently this is incorrectly
      defined as 8kB for the Tegra AGIC and overlaps with the CPU interface
      registers. Correct the address space for the distributor to be 4kB.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Fixes: bcdbde43 ("arm64: tegra: Add AGIC node for Tegra210")
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc43c9ef
    • Like Xu's avatar
      KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed · edadec19
      Like Xu authored
      commit 6fc3977c upstream.
      
      If a perf_event creation fails due to any reason of the host perf
      subsystem, it has no chance to log the corresponding event for guest
      which may cause abnormal sampling data in guest result. In debug mode,
      this message helps to understand the state of vPMC and we may not
      limit the number of occurrences but not in a spamming style.
      Suggested-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarLike Xu <like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edadec19
    • Michael Neuling's avatar
      KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation · 95680d0e
      Michael Neuling authored
      commit 3fefd1cd upstream.
      
      When emulating tsr, treclaim and trechkpt, we incorrectly set CR0. The
      code currently sets:
          CR0 <- 00 || MSR[TS]
      but according to the ISA it should be:
          CR0 <-  0 || MSR[TS] || 0
      
      This fixes the bit shift to put the bits in the correct location.
      
      This is a data integrity issue as CR0 is corrupted.
      
      Fixes: 4bb3c7a0 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
      Cc: stable@vger.kernel.org # v4.17+
      Tested-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95680d0e
    • Suraj Jitindar Singh's avatar
      KVM: PPC: Book3S HV: Clear pending decrementer exceptions on nested guest entry · 5328368b
      Suraj Jitindar Singh authored
      commit 3c25ab35 upstream.
      
      If we enter an L1 guest with a pending decrementer exception then this
      is cleared on guest exit if the guest has writtien a positive value
      into the decrementer (indicating that it handled the decrementer
      exception) since there is no other way to detect that the guest has
      handled the pending exception and that it should be dequeued. In the
      event that the L1 guest tries to run a nested (L2) guest immediately
      after this and the L2 guest decrementer is negative (which is loaded
      by L1 before making the H_ENTER_NESTED hcall), then the pending
      decrementer exception isn't cleared and the L2 entry is blocked since
      L1 has a pending exception, even though L1 may have already handled
      the exception and written a positive value for it's decrementer. This
      results in a loop of L1 trying to enter the L2 guest and L0 blocking
      the entry since L1 has an interrupt pending with the outcome being
      that L2 never gets to run and hangs.
      
      Fix this by clearing any pending decrementer exceptions when L1 makes
      the H_ENTER_NESTED hcall since it won't do this if it's decrementer
      has gone negative, and anyway it's decrementer has been communicated
      to L0 in the hdec_expires field and L0 will return control to L1 when
      this goes negative by delivering an H_DECREMENTER exception.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5328368b
    • Suraj Jitindar Singh's avatar
      KVM: PPC: Book3S HV: Signed extend decrementer value if not using large decrementer · eb6bb8d5
      Suraj Jitindar Singh authored
      commit 86953770 upstream.
      
      On POWER9 the decrementer can operate in large decrementer mode where
      the decrementer is 56 bits and signed extended to 64 bits. When not
      operating in this mode the decrementer behaves as a 32 bit decrementer
      which is NOT signed extended (as on POWER8).
      
      Currently when reading a guest decrementer value we don't take into
      account whether the large decrementer is enabled or not, and this
      means the value will be incorrect when the guest is not using the
      large decrementer. Fix this by sign extending the value read when the
      guest isn't using the large decrementer.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb6bb8d5
    • Wanpeng Li's avatar
      KVM: VMX: check CPUID before allowing read/write of IA32_XSS · f4905184
      Wanpeng Li authored
      commit 4d763b16 upstream.
      
      Raise #GP when guest read/write IA32_XSS, but the CPUID bits
      say that it shouldn't exist.
      
      Fixes: 20300099 (kvm: vmx: add MSR logic for XSAVES)
      Reported-by: default avatarXiaoyao Li <xiaoyao.li@linux.intel.com>
      Reported-by: default avatarTao Xu <tao3.xu@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4905184
    • Sean Christopherson's avatar
      KVM: VMX: Fix handling of #MC that occurs during VM-Entry · e64176cd
      Sean Christopherson authored
      commit beb8d93b upstream.
      
      A previous fix to prevent KVM from consuming stale VMCS state after a
      failed VM-Entry inadvertantly blocked KVM's handling of machine checks
      that occur during VM-Entry.
      
      Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
      depending on when the #MC is recognoized.  As it pertains to this bug
      fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
      is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
      indicate the VM-Entry failed.
      
      If a machine-check event occurs during a VM entry, one of the following occurs:
       - The machine-check event is handled as if it occurred before the VM entry:
              ...
       - The machine-check event is handled after VM entry completes:
              ...
       - A VM-entry failure occurs as described in Section 26.7. The basic
         exit reason is 41, for "VM-entry failure due to machine-check event".
      
      Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
      vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
      Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
      in a sane fashion and also simplifies vmx_complete_atomic_exit() since
      VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
      
      Fixes: b060ca3b ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e64176cd
    • Sean Christopherson's avatar
      KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01 · 3ea511be
      Sean Christopherson authored
      commit 3b013a29 upstream.
      
      If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must
      be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS
      when MPX is supported.  Because the value effectively comes from vmcs01,
      vmcs02 must be updated even if vmcs12 is clean.
      
      Fixes: 62cf9bd8 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
      Cc: stable@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ea511be
    • Sean Christopherson's avatar
      KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped · 4ff7d3d1
      Sean Christopherson authored
      commit 73cb8556 upstream.
      
      ... as a malicious userspace can run a toy guest to generate invalid
      virtual-APIC page addresses in L1, i.e. flood the kernel log with error
      messages.
      
      Fixes: 69090810 ("KVM: nVMX: allow tests to use bad virtual-APIC page address")
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ff7d3d1
    • Sakari Ailus's avatar
      media: videobuf2-dma-sg: Prevent size from overflowing · 95fdf43f
      Sakari Ailus authored
      commit 14f28f5c upstream.
      
      buf->size is an unsigned long; casting that to int will lead to an
      overflow if buf->size exceeds INT_MAX.
      
      Fix this by changing the type to unsigned long instead. This is possible
      as the buf->size is always aligned to PAGE_SIZE, and therefore the size
      will never have values lesser than 0.
      
      Note on backporting to stable: the file used to be under
      drivers/media/v4l2-core, it was moved to the current location after 4.14.
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95fdf43f