1. 26 Oct, 2015 21 commits
    • Jason Wang's avatar
      kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd · c4e74f99
      Jason Wang authored
      commit 8453fecb upstream.
      
      We only want zero length mmio eventfd to be registered on
      KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to
      make sure this.
      
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c4e74f99
    • Will Deacon's avatar
      arm64: head.S: initialise mdcr_el2 in el2_setup · 0d658d98
      Will Deacon authored
      commit d10bcd47 upstream.
      
      When entering the kernel at EL2, we fail to initialise the MDCR_EL2
      register which controls debug access and PMU capabilities at EL1.
      
      This patch ensures that the register is initialised so that all traps
      are disabled and all the PMU counters are available to the host. When a
      guest is scheduled, KVM takes care to configure trapping appropriately.
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0d658d98
    • Daniel Axtens's avatar
      cxl: Fix unbalanced pci_dev_get in cxl_probe · 62414d49
      Daniel Axtens authored
      commit 2925c2fd upstream.
      
      Currently the first thing we do in cxl_probe is to grab a reference
      on the pci device. Later on, we call device_register on our adapter.
      In our remove path, we call device_unregister, but we never call
      pci_dev_put. We therefore leak the device every time we do a
      reflash.
      
      device_register/unregister is sufficient to hold the reference.
      Therefore, drop the call to pci_dev_get.
      
      Here's why this is safe.
      The proposed cxl_probe(pdev) calls cxl_adapter_init:
          a) init calls cxl_adapter_alloc, which creates a struct cxl,
             conventionally called adapter. This struct contains a
             device entry, adapter->dev.
      
          b) init calls cxl_configure_adapter, where we set
             adapter->dev.parent = &dev->dev (here dev is the pci dev)
      
      So at this point, the cxl adapter's device's parent is the PCI
      device that I want to be refcounted properly.
      
          c) init calls cxl_register_adapter
             *) cxl_register_adapter calls device_register(&adapter->dev)
      
      So now we're in device_register, where dev is the adapter device, and
      we want to know if the PCI device is safe after we return.
      
      device_register(&adapter->dev) calls device_initialize() and then
      device_add().
      
      device_add() does a get_device(). device_add() also explicitly grabs
      the device's parent, and calls get_device() on it:
      
               parent = get_device(dev->parent);
      
      So therefore, device_register() takes a lock on the parent PCI dev,
      which is what pci_dev_get() was guarding. pci_dev_get() can therefore
      be safely removed.
      
      Fixes: f204e0b8 ("cxl: Driver code for powernv PCIe based cards for userspace access")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Acked-by: default avatarIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      62414d49
    • Peter Chen's avatar
      usb: chipidea: udc: using the correct stall implementation · 7c1a9e59
      Peter Chen authored
      commit 56ffa1d1 upstream.
      
      According to spec, there are functional and protocol stalls.
      
      For functional stall, it is for bulk and interrupt endpoints,
      below are cases for it:
      - Host sends SET_FEATURE request for Set-Halt, the udc driver
      needs to set stall, and return true unconditionally.
      - The gadget driver may call usb_ep_set_halt to stall certain
      endpoints, if there is a transfer in pending, the udc driver
      should not set stall, and return -EAGAIN accordingly.
      These two kinds of stall need to be cleared by host using CLEAR_FEATURE
      request (Clear-Halt).
      
      For protocol stall, it is for control endpoint, this stall will
      be set if the control request has failed. This stall will be
      cleared by next setup request (hardware will do it).
      
      It fixed usbtest (drivers/usb/misc/usbtest.c) Test 13 "set/clear halt"
      test failure, meanwhile, this change has been verified by
      USB2 CV Compliance Test and MSC Tests.
      
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Felipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarPeter Chen <peter.chen@freescale.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7c1a9e59
    • Jeff Mahoney's avatar
      btrfs: skip waiting on ordered range for special files · a6d10b34
      Jeff Mahoney authored
      commit a30e577c upstream.
      
      In btrfs_evict_inode, we properly truncate the page cache for evicted
      inodes but then we call btrfs_wait_ordered_range for every inode as well.
      It's the right thing to do for regular files but results in incorrect
      behavior for device inodes for block devices.
      
      filemap_fdatawrite_range gets called with inode->i_mapping which gets
      resolved to the block device inode before getting passed to
      wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi.  What happens
      next depends on whether there's an open file handle associated with the
      inode.  If there is, we write to the block device, which is unexpected
      behavior.  If there isn't, we through normally and inode->i_data is used.
      We can also end up racing against open/close which can result in crashes
      when i_mapping points to a block device inode that has been closed.
      
      Since there can't be any page cache associated with special file inodes,
      it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
      the problem.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911Tested-by: default avatarChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a6d10b34
    • Filipe Manana's avatar
      Btrfs: fix read corruption of compressed and shared extents · 46e971cd
      Filipe Manana authored
      commit 005efedf upstream.
      
      If a file has a range pointing to a compressed extent, followed by
      another range that points to the same compressed extent and a read
      operation attempts to read both ranges (either completely or part of
      them), the pages that correspond to the second range are incorrectly
      filled with zeroes.
      
      Consider the following example:
      
        File layout
        [0 - 8K]                      [8K - 24K]
            |                             |
            |                             |
         points to extent X,         points to extent X,
         offset 4K, length of 8K     offset 0, length 16K
      
        [extent X, compressed length = 4K uncompressed length = 16K]
      
      If a readpages() call spans the 2 ranges, a single bio to read the extent
      is submitted - extent_io.c:submit_extent_page() would only create a new
      bio to cover the second range pointing to the extent if the extent it
      points to had a different logical address than the extent associated with
      the first range. This has a consequence of the compressed read end io
      handler (compression.c:end_compressed_bio_read()) finish once the extent
      is decompressed into the pages covering the first range, leaving the
      remaining pages (belonging to the second range) filled with zeroes (done
      by compression.c:btrfs_clear_biovec_end()).
      
      So fix this by submitting the current bio whenever we find a range
      pointing to a compressed extent that was preceded by a range with a
      different extent map. This is the simplest solution for this corner
      case. Making the end io callback populate both ranges (or more, if we
      have multiple pointing to the same extent) is a much more complex
      solution since each bio is tightly coupled with a single extent map and
      the extent maps associated to the ranges pointing to the shared extent
      can have different offsets and lengths.
      
      The following test case for fstests triggers the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create a test file with a single extent that is compressed (the
            # data we write into it is highly compressible no matter which
            # compression algorithm is used, zlib or lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"        \
                            -c "pwrite -S 0xbb 4K 8K"        \
                            -c "pwrite -S 0xcc 12K 4K"       \
                            $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone our extent into an adjacent offset.
            $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            # Same as before but for this file we clone the extent into a lower
            # file offset.
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K"         \
                            -c "pwrite -S 0xbb 12K 8K"        \
                            -c "pwrite -S 0xcc 20K 4K"        \
                            $SCRATCH_MNT/bar | _filter_xfs_io
      
            $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
                $SCRATCH_MNT/bar $SCRATCH_MNT/bar
      
            echo "File digests before unmounting filesystem:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
      
            # Evicting the inode or clearing the page cache before reading
            # again the file would also trigger the bug - reads were returning
            # all bytes in the range corresponding to the second reference to
            # the extent with a value of 0, but the correct data was persisted
            # (it was a bug exclusively in the read path). The issue happened
            # only if the same readpages() call targeted pages belonging to the
            # first and second ranges that point to the same compressed extent.
            _scratch_remount
      
            echo "File digests after mounting filesystem again:"
            # Must match the same digests we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo<quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      46e971cd
    • Carl Frederik Werner's avatar
      ARM: dts: omap3-beagle: make i2c3, ddc and tfp410 gpio work again · 3571c540
      Carl Frederik Werner authored
      commit 3a2fa775 upstream.
      
      Let's fix pinmux address of gpio 170 used by tfp410 powerdown-gpio.
      
      According to the OMAP35x Technical Reference Manual
        CONTROL_PADCONF_I2C3_SDA[15:0]  0x480021C4 mode0: i2c3_sda
        CONTROL_PADCONF_I2C3_SDA[31:16] 0x480021C4 mode4: gpio_170
      the pinmux address of gpio 170 must be 0x480021C6.
      
      The former wrong address broke i2c3 (used by hdmi ddc), resulting in
      kernel message:
        omap_i2c 48060000.i2c: controller timed out
      
      Fixes: 8cecf52b ("ARM: omap3-beagle.dts: add display information")
      Signed-off-by: default avatarCarl Frederik Werner <frederik@cfbw.eu>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3571c540
    • Shaohua Li's avatar
      x86/apic: Serialize LVTT and TSC_DEADLINE writes · ac366408
      Shaohua Li authored
      commit 5d7c631d upstream.
      
      The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
      MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
      guaranteed that the write to LVTT has reached the APIC before the
      TSC_DEADLINE MSR is written. In such a case the write to the MSR is
      ignored and as a consequence the local timer interrupt never fires.
      
      The SDM decribes this issue for xAPIC and x2APIC modes. The
      serialization methods recommended by the SDM differ.
      
      xAPIC:
       "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
        2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
        3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
        4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
      
      x2APIC:
       "To allow for efficient access to the APIC registers in x2APIC mode,
        the serializing semantics of WRMSR are relaxed when writing to the
        APIC registers. Thus, system software should not use 'WRMSR to APIC
        registers in x2APIC mode' as a serializing instruction. Read and write
        accesses to the APIC registers will occur in program order. A WRMSR to
        an APIC register may complete before all preceding stores are globally
        visible; software can prevent this by inserting a serializing
        instruction, an SFENCE, or an MFENCE before the WRMSR."
      
      The xAPIC method is to just wait for the memory mapped write to hit
      the LVTT by checking whether the MSR write has reached the hardware.
      There is no reason why a proper MFENCE after the memory mapped write would
      not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
      xAPIC case as well.
      
      Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
      unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
      support.
      
      [ tglx: Massaged the changelog ]
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: <Kernel-team@fb.com>
      Cc: <lenb@kernel.org>
      Cc: <fenghua.yu@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ac366408
    • Will Deacon's avatar
      KVM: arm64: add workaround for Cortex-A57 erratum #852523 · 88645476
      Will Deacon authored
      commit 43297dda upstream.
      
      When restoring the system register state for an AArch32 guest at EL2,
      writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
      which can lead to the guest effectively running with junk in the DACR
      and running into unexpected domain faults.
      
      This patch works around the issue by re-ordering our restoration of the
      AArch32 register aliases so that they happen before the AArch64 system
      registers. Ensuring that the registers are restored in this order
      guarantees that they will be correctly synchronised by the core.
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      88645476
    • Thomas Hellstrom's avatar
      drm/vmwgfx: Fix up user_dmabuf refcounting · 127037ec
      Thomas Hellstrom authored
      commit 54c12bc3 upstream.
      
      If user space calls unreference on a user_dmabuf it will typically
      kill the struct ttm_base_object member which is responsible for the
      user-space visibility. However the dmabuf part may still be alive and
      refcounted. In some situations, like for shared guest-backed surface
      referencing/opening, the driver may try to reference the
      struct ttm_base_object member again, causing an immediate kernel warning
      and a later kernel NULL pointer dereference.
      
      Fix this by always maintaining a reference on the struct
      ttm_base_object member, in situations where it might subsequently be
      referenced.
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarBrian Paul <brianp@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      127037ec
    • Liu.Zhao's avatar
      USB: option: add ZTE PIDs · 19cad4ad
      Liu.Zhao authored
      commit 19ab6bc5 upstream.
      
      This is intended to add ZTE device PIDs on kernel.
      Signed-off-by: default avatarLiu.Zhao <lzsos369@163.com>
      [johan: sort the new entries ]
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      19cad4ad
    • John Stultz's avatar
      time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() · cfd504fe
      John Stultz authored
      commit 2619d7e9 upstream.
      
      The internal clocksteering done for fine-grained error
      correction uses a logarithmic approximation, so any time
      adjtimex() adjusts the clock steering, timekeeping_freqadjust()
      quickly approximates the correct clock frequency over a series
      of ticks.
      
      Unfortunately, the logic in timekeeping_freqadjust(), introduced
      in commit:
      
        dc491596 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
      
      used the abs() function with a s64 error value to calculate the
      size of the approximated adjustment to be made.
      
      Per include/linux/kernel.h:
      
        "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
      
      Thus on 32-bit platforms, this resulted in the clocksteering to
      take a quite dampended random walk trying to converge on the
      proper frequency, which caused the adjustments to be made much
      slower then intended (most easily observed when large
      adjustments are made).
      
      This patch fixes the issue by using abs64() instead.
      Reported-by: default avatarNuno Gonçalves <nunojpg@gmail.com>
      Tested-by: default avatarNuno Goncalves <nunojpg@gmail.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cfd504fe
    • Guenter Roeck's avatar
      hwmon: (nct6775) Swap STEP_UP_TIME and STEP_DOWN_TIME registers for most chips · 6f789ac4
      Guenter Roeck authored
      commit 728d2940 upstream.
      
      The STEP_UP_TIME and STEP_DOWN_TIME registers are swapped for all chips but
      NCT6775.
      Reported-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Reviewed-by: default avatarJean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6f789ac4
    • Jann Horn's avatar
      CIFS: fix type confusion in copy offload ioctl · d575294a
      Jann Horn authored
      commit 4c17a6d5 upstream.
      
      This might lead to local privilege escalation (code execution as
      kernel) for systems where the following conditions are met:
      
       - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled
       - a cifs filesystem is mounted where:
        - the mount option "vers" was used and set to a value >=2.0
        - the attacker has write access to at least one file on the filesystem
      
      To attack this, an attacker would have to guess the target_tcon
      pointer (but guessing wrong doesn't cause a crash, it just returns an
      error code) and win a narrow race.
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d575294a
    • Paul Mackerras's avatar
      powerpc/MSI: Fix race condition in tearing down MSI interrupts · 2ce99cd2
      Paul Mackerras authored
      commit e297c939 upstream.
      
      This fixes a race which can result in the same virtual IRQ number
      being assigned to two different MSI interrupts.  The most visible
      consequence of that is usually a warning and stack trace from the
      sysfs code about an attempt to create a duplicate entry in sysfs.
      
      The race happens when one CPU (say CPU 0) is disposing of an MSI
      while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
      (for example) pnv_teardown_msi_irqs(), which calls
      msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
      hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
      to calling irq_dispose_mapping() to free up the virtal IRQ number,
      CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
      MSI, and gets the same hardware IRQ number that CPU 0 just freed.
      CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
      which sees that there is currently a mapping for that hardware IRQ
      number and returns the corresponding virtual IRQ number (which is
      the same virtual IRQ number that CPU 0 was using).  CPU 0 then
      calls irq_dispose_mapping() and frees that virtual IRQ number.
      Now, if another CPU comes along and calls irq_create_mapping(), it
      is likely to get the virtual IRQ number that was just freed,
      resulting in the same virtual IRQ number apparently being used for
      two different hardware interrupts.
      
      To fix this race, we just move the call to msi_bitmap_free_hwirqs()
      to after the call to irq_dispose_mapping().  Since virq_to_hw()
      doesn't work for the virtual IRQ number after irq_dispose_mapping()
      has been called, we need to call it before irq_dispose_mapping() and
      remember the result for the msi_bitmap_free_hwirqs() call.
      
      The pattern of calling msi_bitmap_free_hwirqs() before
      irq_dispose_mapping() appears in 5 places under arch/powerpc, and
      appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC
      U3/U4 MSI backend") from 2007.
      
      Fixes: 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend")
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      [ kamal: backport to 3.19-stable: pasemi/msi.c -->
        arch/powerpc/sysdev/mpic_pasemi_msi.c ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2ce99cd2
    • Ard Biesheuvel's avatar
      ARM: 8429/1: disable GCC SRA optimization · 6d81ea7f
      Ard Biesheuvel authored
      commit a077224f upstream.
      
      While working on the 32-bit ARM port of UEFI, I noticed a strange
      corruption in the kernel log. The following snprintf() statement
      (in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
      
      	snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
      
      was producing the following output in the log:
      
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      
      As it turns out, this is caused by incorrect code being emitted for
      the string() function in lib/vsprintf.c. The following code
      
      	if (!(spec.flags & LEFT)) {
      		while (len < spec.field_width--) {
      			if (buf < end)
      				*buf = ' ';
      			++buf;
      		}
      	}
      	for (i = 0; i < len; ++i) {
      		if (buf < end)
      			*buf = *s;
      		++buf; ++s;
      	}
      	while (len < spec.field_width--) {
      		if (buf < end)
      			*buf = ' ';
      		++buf;
      	}
      
      when called with len == 0, triggers an issue in the GCC SRA optimization
      pass (Scalar Replacement of Aggregates), which handles promotion of signed
      struct members incorrectly. This is a known but as yet unresolved issue.
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
      case, it is causing the second while loop to be executed erroneously a
      single time, causing the additional space characters to be printed.
      
      So disable the optimization by passing -fno-ipa-sra.
      Acked-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6d81ea7f
    • Guenter Roeck's avatar
      spi: Fix documentation of spi_alloc_master() · 70761657
      Guenter Roeck authored
      commit a394d635 upstream.
      
      Actually, spi_master_put() after spi_alloc_master() must _not_ be followed
      by kfree(). The memory is already freed with the call to spi_master_put()
      through spi_master_class, which registers a release function. Calling both
      spi_master_put() and kfree() results in often nasty (and delayed) crashes
      elsewhere in the kernel, often in the networking stack.
      
      This reverts commit eb4af0f5.
      
      Link to patch and concerns: https://lkml.org/lkml/2012/9/3/269
      or
      http://lkml.iu.edu/hypermail/linux/kernel/1209.0/00790.html
      
      Alexey Klimov: This revert becomes valid after
      94c69f76 when spi-imx.c
      has been fixed and there is no need to call kfree() so comment
      for spi_alloc_master() should be fixed.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarAlexey Klimov <alexey.klimov@linaro.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      70761657
    • Tan, Jui Nee's avatar
      spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled · 88f143b3
      Tan, Jui Nee authored
      commit 02bc933e upstream.
      
      On Intel Baytrail, there is case when interrupt handler get called, no SPI
      message is captured. The RX FIFO is indeed empty when RX timeout pending
      interrupt (SSSR_TINT) happens.
      
      Use the BIOS version where both HSUART and SPI are on the same IRQ. Both
      drivers are using IRQF_SHARED when calling the request_irq function. When
      running two separate and independent SPI and HSUART application that
      generate data traffic on both components, user will see messages like
      below on the console:
      
        pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler
      
      This commit will fix this by first checking Receiver Time-out Interrupt,
      if it is disabled, ignore the request and return without servicing.
      Signed-off-by: default avatarTan, Jui Nee <jui.nee.tan@intel.com>
      Acked-by: default avatarJarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      88f143b3
    • David Howells's avatar
      KEYS: Don't permit request_key() to construct a new keyring · cf9f252f
      David Howells authored
      commit 911b79cd upstream.
      
      If request_key() is used to find a keyring, only do the search part - don't
      do the construction part if the keyring was not found by the search.  We
      don't really want keyrings in the negative instantiated state since the
      rejected/negative instantiation error value in the payload is unioned with
      keyring metadata.
      
      Now the kernel gives an error:
      
      	request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted)
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cf9f252f
    • David Howells's avatar
      KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring · b67c9e4a
      David Howells authored
      commit f05819df upstream.
      
      The following sequence of commands:
      
          i=`keyctl add user a a @s`
          keyctl request2 keyring foo bar @t
          keyctl unlink $i @s
      
      tries to invoke an upcall to instantiate a keyring if one doesn't already
      exist by that name within the user's keyring set.  However, if the upcall
      fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
      other error code.  When the key is garbage collected, the key destroy
      function is called unconditionally and keyring_destroy() uses list_empty()
      on keyring->type_data.link - which is in a union with reject_error.
      Subsequently, the kernel tries to unlink the keyring from the keyring names
      list - which oopses like this:
      
      	BUG: unable to handle kernel paging request at 00000000ffffff8a
      	IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
      	...
      	Workqueue: events key_garbage_collector
      	...
      	RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
      	RSP: 0018:ffff88003e2f3d30  EFLAGS: 00010203
      	RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
      	RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
      	RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
      	R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
      	R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
      	...
      	CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
      	...
      	Call Trace:
      	 [<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
      	 [<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
      	 [<ffffffff8105ec9b>] process_one_work+0x28e/0x547
      	 [<ffffffff8105fd17>] worker_thread+0x26e/0x361
      	 [<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
      	 [<ffffffff810648ad>] kthread+0xf3/0xfb
      	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
      	 [<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
      	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
      
      Note the value in RAX.  This is a 32-bit representation of -ENOKEY.
      
      The solution is to only call ->destroy() if the key was successfully
      instantiated.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b67c9e4a
    • David Howells's avatar
      KEYS: Fix race between key destruction and finding a keyring by name · 423661a2
      David Howells authored
      commit 94c4554b upstream.
      
      There appears to be a race between:
      
       (1) key_gc_unused_keys() which frees key->security and then calls
           keyring_destroy() to unlink the name from the name list
      
       (2) find_keyring_by_name() which calls key_permission(), thus accessing
           key->security, on a key before checking to see whether the key usage is 0
           (ie. the key is dead and might be cleaned up).
      
      Fix this by calling ->destroy() before cleaning up the core key data -
      including key->security.
      Reported-by: default avatarPetr Matousek <pmatouse@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      423661a2
  2. 23 Oct, 2015 2 commits
  3. 21 Oct, 2015 1 commit
    • Jann Horn's avatar
      fs: if a coredump already exists, unlink and recreate with O_EXCL · abcd0a01
      Jann Horn authored
      commit fbb18169 upstream.
      
      It was possible for an attacking user to trick root (or another user) into
      writing his coredumps into an attacker-readable, pre-existing file using
      rename() or link(), causing the disclosure of secret data from the victim
      process' virtual memory.  Depending on the configuration, it was also
      possible to trick root into overwriting system files with coredumps.  Fix
      that issue by never writing coredumps into existing files.
      
      Requirements for the attack:
       - The attack only applies if the victim's process has a nonzero
         RLIMIT_CORE and is dumpable.
       - The attacker can trick the victim into coredumping into an
         attacker-writable directory D, either because the core_pattern is
         relative and the victim's cwd is attacker-writable or because an
         absolute core_pattern pointing to a world-writable directory is used.
       - The attacker has one of these:
        A: on a system with protected_hardlinks=0:
           execute access to a folder containing a victim-owned,
           attacker-readable file on the same partition as D, and the
           victim-owned file will be deleted before the main part of the attack
           takes place. (In practice, there are lots of files that fulfill
           this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
           This does not apply to most Linux systems because most distros set
           protected_hardlinks=1.
        B: on a system with protected_hardlinks=1:
           execute access to a folder containing a victim-owned,
           attacker-readable and attacker-writable file on the same partition
           as D, and the victim-owned file will be deleted before the main part
           of the attack takes place.
           (This seems to be uncommon.)
        C: on any system, independent of protected_hardlinks:
           write access to a non-sticky folder containing a victim-owned,
           attacker-readable file on the same partition as D
           (This seems to be uncommon.)
      
      The basic idea is that the attacker moves the victim-owned file to where
      he expects the victim process to dump its core.  The victim process dumps
      its core into the existing file, and the attacker reads the coredump from
      it.
      
      If the attacker can't move the file because he does not have write access
      to the containing directory, he can instead link the file to a directory
      he controls, then wait for the original link to the file to be deleted
      (because the kernel checks that the link count of the corefile is 1).
      
      A less reliable variant that requires D to be non-sticky works with link()
      and does not require deletion of the original link: link() the file into
      D, but then unlink() it directly before the kernel performs the link count
      check.
      
      On systems with protected_hardlinks=0, this variant allows an attacker to
      not only gain information from coredumps, but also clobber existing,
      victim-writable files with coredumps.  (This could theoretically lead to a
      privilege escalation.)
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      abcd0a01
  4. 20 Oct, 2015 16 commits
    • Christoph Hellwig's avatar
      scsi_dh: fix randconfig build error · 61095333
      Christoph Hellwig authored
      commit 294ab783 upstream.
      
      It looks like the Kconfig check that was meant to fix this (commit
      fe9233fb [SCSI] scsi_dh: fix kconfig related
      build errors) was actually reversed, but no-one noticed until the new set of
      patches which separated DM and SCSI_DH).
      
      Fixes: fe9233fbSigned-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      61095333
    • Daniel Borkmann's avatar
      netlink, mmap: fix edge-case leakages in nf queue zero-copy · c6ce5b9c
      Daniel Borkmann authored
      commit 6bb0fef4 upstream.
      
      When netlink mmap on receive side is the consumer of nf queue data,
      it can happen that in some edge cases, we write skb shared info into
      the user space mmap buffer:
      
      Assume a possible rx ring frame size of only 4096, and the network skb,
      which is being zero-copied into the netlink skb, contains page frags
      with an overall skb->len larger than the linear part of the netlink
      skb.
      
      skb_zerocopy(), which is generic and thus not aware of the fact that
      shared info cannot be accessed for such skbs then tries to write and
      fill frags, thus leaking kernel data/pointers and in some corner cases
      possibly writing out of bounds of the mmap area (when filling the
      last slot in the ring buffer this way).
      
      I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has
      an advertised length larger than 4096, where the linear part is visible
      at the slot beginning, and the leaked sizeof(struct skb_shared_info)
      has been written to the beginning of the next slot (also corrupting
      the struct nl_mmap_hdr slot header incl. status etc), since skb->end
      points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.
      
      The fix adds and lets __netlink_alloc_skb() take the actual needed
      linear room for the network skb + meta data into account. It's completely
      irrelevant for non-mmaped netlink sockets, but in case mmap sockets
      are used, it can be decided whether the available skb_tailroom() is
      really large enough for the buffer, or whether it needs to internally
      fallback to a normal alloc_skb().
      
      >From nf queue side, the information whether the destination port is
      an mmap RX ring is not really available without extra port-to-socket
      lookup, thus it can only be determined in lower layers i.e. when
      __netlink_alloc_skb() is called that checks internally for this. I
      chose to add the extra ldiff parameter as mmap will then still work:
      We have data_len and hlen in nfqnl_build_packet_message(), data_len
      is the full length (capped at queue->copy_range) for skb_zerocopy()
      and hlen some possible part of data_len that needs to be copied; the
      rem_len variable indicates the needed remaining linear mmap space.
      
      The only other workaround in nf queue internally would be after
      allocation time by f.e. cap'ing the data_len to the skb_tailroom()
      iff we deal with an mmap skb, but that would 1) expose the fact that
      we use a mmap skb to upper layers, and 2) trim the skb where we
      otherwise could just have moved the full skb into the normal receive
      queue.
      
      After the patch, in my test case the ring slot doesn't fit and therefore
      shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and
      thus needs to be picked up via recv().
      
      Fixes: 3ab1f683 ("nfnetlink: add support for memory mapped netlink")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c6ce5b9c
    • Linus Lüssing's avatar
      batman-adv: Make NC capability changes atomic · 857e1d74
      Linus Lüssing authored
      commit 4635469f upstream.
      
      Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
      OGM handler might undo the set/clear of a specific bit from another
      handler run in between.
      
      Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
      
      Fixes: 3f4841ff ("batman-adv: tvlv - add network coding container")
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      857e1d74
    • Linus Lüssing's avatar
      batman-adv: Make DAT capability changes atomic · b1f1d512
      Linus Lüssing authored
      commit 65d7d460 upstream.
      
      Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
      OGM handler might undo the set/clear of a specific bit from another
      handler run in between.
      
      Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
      
      Fixes: 17cf0ea4 ("batman-adv: tvlv - add distributed arp table container")
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b1f1d512
    • Chuck Lever's avatar
      svcrdma: Fix send_reply() scatter/gather set-up · 2bad139d
      Chuck Lever authored
      commit 9d11b51c upstream.
      
      The Linux NFS server returns garbage in the data payload of inline
      NFS/RDMA READ replies. These are READs of under 1000 bytes or so
      where the client has not provided either a reply chunk or a write
      list.
      
      The NFS server delivers the data payload for an NFS READ reply to
      the transport in an xdr_buf page list. If the NFS client did not
      provide a reply chunk or a write list, send_reply() is supposed to
      set up a separate sge for the page containing the READ data, and
      another sge for XDR padding if needed, then post all of the sges via
      a single SEND Work Request.
      
      The problem is send_reply() does not advance through the xdr_buf
      when setting up scatter/gather entries for SEND WR. It always calls
      dma_map_xdr with xdr_off set to zero. When there's more than one
      sge, dma_map_xdr() sets up the SEND sge's so they all point to the
      xdr_buf's head.
      
      The current Linux NFS/RDMA client always provides a reply chunk or
      a write list when performing an NFS READ over RDMA. Therefore, it
      does not exercise this particular case. The Linux server has never
      had to use more than one extra sge for building RPC/RDMA replies
      with a Linux client.
      
      However, an NFS/RDMA client _is_ allowed to send small NFS READs
      without setting up a write list or reply chunk. The NFS READ reply
      fits entirely within the inline reply buffer in this case. This is
      perhaps a more efficient way of performing NFS READs that the Linux
      NFS/RDMA client may some day adopt.
      
      Fixes: b432e6b3 ('svcrdma: Change DMA mapping logic to . . .')
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2bad139d
    • Krzysztof Kozlowski's avatar
      thermal: exynos: Disable the regulator on probe failure · 061d7d61
      Krzysztof Kozlowski authored
      commit 5f09a5cb upstream.
      
      During probe the regulator (if present) was enabled but not disabled in
      case of failure. So an unsuccessful probe lead to enabling the
      regulator which was actually not needed because the device was not
      enabled.
      
      Additionally each deferred probe lead to increase of regulator enable
      count so it would not be effectively disabled during removal of the
      device.
      
      Test HW: Exynos4412 - Trats2 board
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 498d22f6 ("thermal: exynos: Support for TMU regulator defined at device tree")
      Reviewed-by: default avatarJavier Martinez Canillas <javier.martinez@collabora.co.uk>
      Signed-off-by: default avatarLukasz Majewski <l.majewski@samsung.com>
      Tested-by: default avatarLukasz Majewski <l.majewski@samsung.com>
      Signed-off-by: default avatarEduardo Valentin <edubezval@gmail.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      061d7d61
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Fix 64-bits register writes · 6f86ac90
      Florian Fainelli authored
      commit 03679a14 upstream.
      
      The macro to write 64-bits quantities to the 32-bits register swapped
      the value and offsets arguments, we want to preserve the ordering of the
      arguments with respect to how writel() is implemented for instance:
      value first, offset/base second.
      
      Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6f86ac90
    • Daniel Borkmann's avatar
      ebpf: fix fd refcount leaks related to maps in bpf syscall · f6531970
      Daniel Borkmann authored
      commit 592867bf upstream.
      
      We may already have gotten a proper fd struct through fdget(), so
      whenever we return at the end of an map operation, we need to call
      fdput(). However, each map operation from syscall side first probes
      CHECK_ATTR() to verify that unused fields in the bpf_attr union are
      zero.
      
      In case of malformed input, we return with error, but the lookup to
      the map_fd was already performed at that time, so that we return
      without an corresponding fdput(). Fix it by performing an fdget()
      only right before bpf_map_get(). The fdget() invocation on maps in
      the verifier is not affected.
      
      Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f6531970
    • Eric Dumazet's avatar
      task_work: remove fifo ordering guarantee · fcd51a97
      Eric Dumazet authored
      commit c8219906 upstream.
      
      In commit f341861f ("task_work: add a scheduling point in
      task_work_run()") I fixed a latency problem adding a cond_resched()
      call.
      
      Later, commit ac3d0da8 added yet another loop to reverse a list,
      bringing back the latency spike :
      
      I've seen in some cases this loop taking 275 ms, if for example a
      process with 2,000,000 files is killed.
      
      We could add yet another cond_resched() in the reverse loop, or we
      can simply remove the reversal, as I do not think anything
      would depend on order of task_work_add() submitted works.
      
      Fixes: ac3d0da8 ("task_work: Make task_work_add() lockless")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMaciej Żenczykowski <maze@google.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fcd51a97
    • Haggai Eran's avatar
      IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow · 289f3b34
      Haggai Eran authored
      commit 11d74804 upstream.
      
      The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in
      its error flow even though there is never a case where the error flow
      occurs with a valid MR pointer to destroy.
      
      Remove the clean_mr() call and the incorrect comment above it.
      
      Fixes: b4cfe447 ("IB/mlx5: Implement on demand paging by adding
      support for MMU notifiers")
      Cc: Eli Cohen <eli@mellanox.com>
      Signed-off-by: default avatarHaggai Eran <haggaie@mellanox.com>
      Reviewed-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      289f3b34
    • Viresh Kumar's avatar
      cpufreq: dt: Tolerance applies on both sides of target voltage · caf1dbe3
      Viresh Kumar authored
      commit a2022001 upstream.
      
      Tolerance applies on both sides of the target voltage, i.e. both min and
      max sides. But while checking if a voltage is supported by the regulator
      or not, we haven't taken care of tolerance on the lower side. Fix that.
      
      Cc: Lucas Stach <l.stach@pengutronix.de>
      Fixes: 045ee45c ("cpufreq: cpufreq-dt: disable unsupported OPPs")
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      caf1dbe3
    • Daniel Borkmann's avatar
      ipv6: fix exthdrs offload registration in out_rt path · 664f1547
      Daniel Borkmann authored
      commit e41b0bed upstream.
      
      We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
      but in error path, we try to unregister it with inet_del_offload(). This
      doesn't seem correct, it should actually be inet6_del_offload(), also
      ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
      also uses rthdr_offload twice), but it got removed entirely later on.
      
      Fixes: 3336288a ("ipv6: Switch to using new offload infrastructure.")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      664f1547
    • Daniel Borkmann's avatar
      sock, diag: fix panic in sock_diag_put_filterinfo · 0d84a1a2
      Daniel Borkmann authored
      commit b382c086 upstream.
      
      diag socket's sock_diag_put_filterinfo() dumps classic BPF programs
      upon request to user space (ss -0 -b). However, native eBPF programs
      attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method:
      
      Their orig_prog is always NULL. However, sock_diag_put_filterinfo()
      unconditionally tries to access its filter length resp. wants to copy
      the filter insns from there. Internal cBPF to eBPF transformations
      attached to sockets don't have this issue, as orig_prog state is kept.
      
      It's currently only used by packet sockets. If we would want to add
      native eBPF support in the future, this needs to be done through
      a different attribute than PACKET_DIAG_FILTER to not confuse possible
      user space disassemblers that work on diag data.
      
      Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0d84a1a2
    • Lukas Wunner's avatar
      drm/i915: Preserve SSC earlier · b9e7e50f
      Lukas Wunner authored
      commit 69f92f67 upstream.
      
      Commit 92122789 ("drm/i915: preserve SSC if previously set v3")
      added code to intel_modeset_gem_init to override the SSC status read
      from VBT with the SSC status set by BIOS.
      
      However, intel_modeset_gem_init is invoked *after* intel_modeset_init,
      which calls intel_setup_outputs, which *modifies* SSC status by way of
      intel_init_pch_refclk. So unlike advertised, intel_modeset_gem_init
      doesn't preserve the SSC status set by BIOS but whatever
      intel_init_pch_refclk decided on.
      
      This is a problem on dual gpu laptops such as the MacBook Pro which
      require either a handler to switch DDC lines, or the discrete gpu
      to proxy DDC/AUX communication: Both the handler and the discrete
      gpu may initialize after the i915 driver, and consequently, an LVDS
      connector may initially seem disconnected and the SSC therefore
      is disabled by intel_init_pch_refclk, but on reprobe the connector
      may turn out to be connected and the SSC must then be enabled.
      
      Due to 92122789 however, the SSC is not enabled on reprobe since
      it is assumed BIOS disabled it while in fact it was disabled by
      intel_init_pch_refclk.
      
      Also, because the SSC status is preserved so late, the preserved value
      only ever gets used on resume but not on panel initialization:
      intel_modeset_init calls intel_init_display which indirectly calls
      intel_panel_use_ssc via multiple subroutines, *before* the BIOS value
      overrides the VBT value in intel_modeset_gem_init (intel_panel_use_ssc
      is the sole user of dev_priv->vbt.lvds_use_ssc).
      
      Fix this by moving the code introduced by 92122789 from
      intel_modeset_gem_init to intel_modeset_init before the invocation
      of intel_setup_outputs and intel_init_display.
      
      Add a DRM_DEBUG_KMS as suggested way back by Jani:
      http://lists.freedesktop.org/archives/intel-gfx/2014-June/046666.html
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88861
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61115Tested-by: default avatarPaul Hordiienko <pvt.gord@gmail.com>
          [MBP  6,2 2010  intel ILK + nvidia GT216  pre-retina]
      Tested-by: default avatarWilliam Brown <william@blackhats.net.au>
          [MBP  8,2 2011  intel SNB + amd turks     pre-retina]
      Tested-by: default avatarLukas Wunner <lukas@wunner.de>
          [MBP  9,1 2012  intel IVB + nvidia GK107  pre-retina]
      Tested-by: default avatarBruno Bierbaumer <bruno@bierbaumer.net>
          [MBP 11,3 2013  intel HSW + nvidia GK107  retina -- work in progress]
      Fixes: 92122789 ("drm/i915: preserve SSC if previously set v3")
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Reviewed-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b9e7e50f
    • Jialing Fu's avatar
      mmc: core: fix race condition in mmc_wait_data_done · ef0439a0
      Jialing Fu authored
      commit 71f8a4b8 upstream.
      
      The following panic is captured in ker3.14, but the issue still exists
      in latest kernel.
      ---------------------------------------------------------------------
      [   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
      at virtual address 00000578
      ......
      [   20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
      [   20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
      [   20.740134] c0 3136 (Compiler) Call trace:
      [   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
      [   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
      [   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
      [   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
      [   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
      [   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
      [   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
      [   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
      [   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
      [   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
      ----------------------------------------------------------------------
      Because in SMP, "mrq" has race condition between below two paths:
      path1: CPU0: <tasklet context>
        static void mmc_wait_data_done(struct mmc_request *mrq)
        {
           mrq->host->context_info.is_done_rcv = true;
           //
           // If CPU0 has just finished "is_done_rcv = true" in path1, and at
           // this moment, IRQ or ICache line missing happens in CPU0.
           // What happens in CPU1 (path2)?
           //
           // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
           // path2 would have chance to break from wait_event_interruptible
           // in mmc_wait_for_data_req_done and continue to run for next
           // mmc_request (mmc_blk_rw_rq_prep).
           //
           // Within mmc_blk_rq_prep, mrq is cleared to 0.
           // If below line still gets host from "mrq" as the result of
           // compiler, the panic happens as we traced.
           wake_up_interruptible(&mrq->host->context_info.wait);
        }
      
      path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
        static int mmc_wait_for_data_req_done(...
        {
           ...
           while (1) {
                 wait_event_interruptible(context_info->wait,
                         (context_info->is_done_rcv ||
                          context_info->is_new_req));
           	   static void mmc_blk_rw_rq_prep(...
                 {
                 ...
                 memset(brq, 0, sizeof(struct mmc_blk_request));
      
      This issue happens very coincidentally; however adding mdelay(1) in
      mmc_wait_data_done as below could duplicate it easily.
      
         static void mmc_wait_data_done(struct mmc_request *mrq)
         {
           mrq->host->context_info.is_done_rcv = true;
      +    mdelay(1);
           wake_up_interruptible(&mrq->host->context_info.wait);
          }
      
      At runtime, IRQ or ICache line missing may just happen at the same place
      of the mdelay(1).
      
      This patch gets the mmc_context_info at the beginning of function, it can
      avoid this race condition.
      Signed-off-by: default avatarJialing Fu <jlfu@marvell.com>
      Tested-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Fixes: 2220eedf ("mmc: fix async request mechanism ....")
      Signed-off-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ef0439a0
    • Yishai Hadas's avatar
      IB/uverbs: Fix race between ib_uverbs_open and remove_one · d6723451
      Yishai Hadas authored
      commit 35d4a0b6 upstream.
      
      Fixes: 2a72f212 ("IB/uverbs: Remove dev_table")
      
      Before this commit there was a device look-up table that was protected
      by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
      it was dropped and container_of was used instead, it enabled the race
      with remove_one as dev might be freed just after:
      dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
      before the kref_get.
      
      In addition, this buggy patch added some dead code as
      container_of(x,y,z) can never be NULL and so dev can never be NULL.
      As a result the comment above ib_uverbs_open saying "the open method
      will either immediately run -ENXIO" is wrong as it can never happen.
      
      The solution follows Jason Gunthorpe suggestion from below URL:
      https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html
      
      cdev will hold a kref on the parent (the containing structure,
      ib_uverbs_device) and only when that kref is released it is
      guaranteed that open will never be called again.
      
      In addition, fixes the active count scheme to use an atomic
      not a kref to prevent WARN_ON as pointed by above comment
      from Jason.
      Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: default avatarShachar Raindel <raindel@mellanox.com>
      Reviewed-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d6723451