1. 27 Oct, 2015 11 commits
    • Will Deacon's avatar
      arm64: head.S: initialise mdcr_el2 in el2_setup · 01894534
      Will Deacon authored
      commit d10bcd47 upstream.
      
      When entering the kernel at EL2, we fail to initialise the MDCR_EL2
      register which controls debug access and PMU capabilities at EL1.
      
      This patch ensures that the register is initialised so that all traps
      are disabled and all the PMU counters are available to the host. When a
      guest is scheduled, KVM takes care to configure trapping appropriately.
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      01894534
    • Jeff Mahoney's avatar
      btrfs: skip waiting on ordered range for special files · 2f4f9c7e
      Jeff Mahoney authored
      commit a30e577c upstream.
      
      In btrfs_evict_inode, we properly truncate the page cache for evicted
      inodes but then we call btrfs_wait_ordered_range for every inode as well.
      It's the right thing to do for regular files but results in incorrect
      behavior for device inodes for block devices.
      
      filemap_fdatawrite_range gets called with inode->i_mapping which gets
      resolved to the block device inode before getting passed to
      wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi.  What happens
      next depends on whether there's an open file handle associated with the
      inode.  If there is, we write to the block device, which is unexpected
      behavior.  If there isn't, we through normally and inode->i_data is used.
      We can also end up racing against open/close which can result in crashes
      when i_mapping points to a block device inode that has been closed.
      
      Since there can't be any page cache associated with special file inodes,
      it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
      the problem.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911Tested-by: default avatarChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2f4f9c7e
    • Filipe Manana's avatar
      Btrfs: fix read corruption of compressed and shared extents · cfe4f0f1
      Filipe Manana authored
      commit 005efedf upstream.
      
      If a file has a range pointing to a compressed extent, followed by
      another range that points to the same compressed extent and a read
      operation attempts to read both ranges (either completely or part of
      them), the pages that correspond to the second range are incorrectly
      filled with zeroes.
      
      Consider the following example:
      
        File layout
        [0 - 8K]                      [8K - 24K]
            |                             |
            |                             |
         points to extent X,         points to extent X,
         offset 4K, length of 8K     offset 0, length 16K
      
        [extent X, compressed length = 4K uncompressed length = 16K]
      
      If a readpages() call spans the 2 ranges, a single bio to read the extent
      is submitted - extent_io.c:submit_extent_page() would only create a new
      bio to cover the second range pointing to the extent if the extent it
      points to had a different logical address than the extent associated with
      the first range. This has a consequence of the compressed read end io
      handler (compression.c:end_compressed_bio_read()) finish once the extent
      is decompressed into the pages covering the first range, leaving the
      remaining pages (belonging to the second range) filled with zeroes (done
      by compression.c:btrfs_clear_biovec_end()).
      
      So fix this by submitting the current bio whenever we find a range
      pointing to a compressed extent that was preceded by a range with a
      different extent map. This is the simplest solution for this corner
      case. Making the end io callback populate both ranges (or more, if we
      have multiple pointing to the same extent) is a much more complex
      solution since each bio is tightly coupled with a single extent map and
      the extent maps associated to the ranges pointing to the shared extent
      can have different offsets and lengths.
      
      The following test case for fstests triggers the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create a test file with a single extent that is compressed (the
            # data we write into it is highly compressible no matter which
            # compression algorithm is used, zlib or lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"        \
                            -c "pwrite -S 0xbb 4K 8K"        \
                            -c "pwrite -S 0xcc 12K 4K"       \
                            $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone our extent into an adjacent offset.
            $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            # Same as before but for this file we clone the extent into a lower
            # file offset.
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K"         \
                            -c "pwrite -S 0xbb 12K 8K"        \
                            -c "pwrite -S 0xcc 20K 4K"        \
                            $SCRATCH_MNT/bar | _filter_xfs_io
      
            $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
                $SCRATCH_MNT/bar $SCRATCH_MNT/bar
      
            echo "File digests before unmounting filesystem:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
      
            # Evicting the inode or clearing the page cache before reading
            # again the file would also trigger the bug - reads were returning
            # all bytes in the range corresponding to the second reference to
            # the extent with a value of 0, but the correct data was persisted
            # (it was a bug exclusively in the read path). The issue happened
            # only if the same readpages() call targeted pages belonging to the
            # first and second ranges that point to the same compressed extent.
            _scratch_remount
      
            echo "File digests after mounting filesystem again:"
            # Must match the same digests we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo<quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cfe4f0f1
    • Shaohua Li's avatar
      x86/apic: Serialize LVTT and TSC_DEADLINE writes · 17d85849
      Shaohua Li authored
      commit 5d7c631d upstream.
      
      The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
      MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
      guaranteed that the write to LVTT has reached the APIC before the
      TSC_DEADLINE MSR is written. In such a case the write to the MSR is
      ignored and as a consequence the local timer interrupt never fires.
      
      The SDM decribes this issue for xAPIC and x2APIC modes. The
      serialization methods recommended by the SDM differ.
      
      xAPIC:
       "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
        2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
        3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
        4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
      
      x2APIC:
       "To allow for efficient access to the APIC registers in x2APIC mode,
        the serializing semantics of WRMSR are relaxed when writing to the
        APIC registers. Thus, system software should not use 'WRMSR to APIC
        registers in x2APIC mode' as a serializing instruction. Read and write
        accesses to the APIC registers will occur in program order. A WRMSR to
        an APIC register may complete before all preceding stores are globally
        visible; software can prevent this by inserting a serializing
        instruction, an SFENCE, or an MFENCE before the WRMSR."
      
      The xAPIC method is to just wait for the memory mapped write to hit
      the LVTT by checking whether the MSR write has reached the hardware.
      There is no reason why a proper MFENCE after the memory mapped write would
      not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
      xAPIC case as well.
      
      Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
      unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
      support.
      
      [ tglx: Massaged the changelog ]
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: <Kernel-team@fb.com>
      Cc: <lenb@kernel.org>
      Cc: <fenghua.yu@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      17d85849
    • Liu.Zhao's avatar
      USB: option: add ZTE PIDs · 77a428ad
      Liu.Zhao authored
      commit 19ab6bc5 upstream.
      
      This is intended to add ZTE device PIDs on kernel.
      Signed-off-by: default avatarLiu.Zhao <lzsos369@163.com>
      [johan: sort the new entries ]
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      77a428ad
    • Guenter Roeck's avatar
      hwmon: (nct6775) Swap STEP_UP_TIME and STEP_DOWN_TIME registers for most chips · a34de469
      Guenter Roeck authored
      commit 728d2940 upstream.
      
      The STEP_UP_TIME and STEP_DOWN_TIME registers are swapped for all chips but
      NCT6775.
      Reported-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Reviewed-by: default avatarJean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a34de469
    • Jann Horn's avatar
      CIFS: fix type confusion in copy offload ioctl · 3249f58a
      Jann Horn authored
      commit 4c17a6d5 upstream.
      
      This might lead to local privilege escalation (code execution as
      kernel) for systems where the following conditions are met:
      
       - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled
       - a cifs filesystem is mounted where:
        - the mount option "vers" was used and set to a value >=2.0
        - the attacker has write access to at least one file on the filesystem
      
      To attack this, an attacker would have to guess the target_tcon
      pointer (but guessing wrong doesn't cause a crash, it just returns an
      error code) and win a narrow race.
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3249f58a
    • Paul Mackerras's avatar
      powerpc/MSI: Fix race condition in tearing down MSI interrupts · e6d78946
      Paul Mackerras authored
      commit e297c939 upstream.
      
      This fixes a race which can result in the same virtual IRQ number
      being assigned to two different MSI interrupts.  The most visible
      consequence of that is usually a warning and stack trace from the
      sysfs code about an attempt to create a duplicate entry in sysfs.
      
      The race happens when one CPU (say CPU 0) is disposing of an MSI
      while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
      (for example) pnv_teardown_msi_irqs(), which calls
      msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
      hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
      to calling irq_dispose_mapping() to free up the virtal IRQ number,
      CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
      MSI, and gets the same hardware IRQ number that CPU 0 just freed.
      CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
      which sees that there is currently a mapping for that hardware IRQ
      number and returns the corresponding virtual IRQ number (which is
      the same virtual IRQ number that CPU 0 was using).  CPU 0 then
      calls irq_dispose_mapping() and frees that virtual IRQ number.
      Now, if another CPU comes along and calls irq_create_mapping(), it
      is likely to get the virtual IRQ number that was just freed,
      resulting in the same virtual IRQ number apparently being used for
      two different hardware interrupts.
      
      To fix this race, we just move the call to msi_bitmap_free_hwirqs()
      to after the call to irq_dispose_mapping().  Since virq_to_hw()
      doesn't work for the virtual IRQ number after irq_dispose_mapping()
      has been called, we need to call it before irq_dispose_mapping() and
      remember the result for the msi_bitmap_free_hwirqs() call.
      
      The pattern of calling msi_bitmap_free_hwirqs() before
      irq_dispose_mapping() appears in 5 places under arch/powerpc, and
      appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC
      U3/U4 MSI backend") from 2007.
      
      Fixes: 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend")
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      [ kamal: backport to 3.19-stable: pasemi/msi.c -->
        arch/powerpc/sysdev/mpic_pasemi_msi.c ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e6d78946
    • Ard Biesheuvel's avatar
      ARM: 8429/1: disable GCC SRA optimization · 1c2898b2
      Ard Biesheuvel authored
      commit a077224f upstream.
      
      While working on the 32-bit ARM port of UEFI, I noticed a strange
      corruption in the kernel log. The following snprintf() statement
      (in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
      
      	snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
      
      was producing the following output in the log:
      
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      
      As it turns out, this is caused by incorrect code being emitted for
      the string() function in lib/vsprintf.c. The following code
      
      	if (!(spec.flags & LEFT)) {
      		while (len < spec.field_width--) {
      			if (buf < end)
      				*buf = ' ';
      			++buf;
      		}
      	}
      	for (i = 0; i < len; ++i) {
      		if (buf < end)
      			*buf = *s;
      		++buf; ++s;
      	}
      	while (len < spec.field_width--) {
      		if (buf < end)
      			*buf = ' ';
      		++buf;
      	}
      
      when called with len == 0, triggers an issue in the GCC SRA optimization
      pass (Scalar Replacement of Aggregates), which handles promotion of signed
      struct members incorrectly. This is a known but as yet unresolved issue.
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
      case, it is causing the second while loop to be executed erroneously a
      single time, causing the additional space characters to be printed.
      
      So disable the optimization by passing -fno-ipa-sra.
      Acked-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1c2898b2
    • Guenter Roeck's avatar
      spi: Fix documentation of spi_alloc_master() · fda98848
      Guenter Roeck authored
      commit a394d635 upstream.
      
      Actually, spi_master_put() after spi_alloc_master() must _not_ be followed
      by kfree(). The memory is already freed with the call to spi_master_put()
      through spi_master_class, which registers a release function. Calling both
      spi_master_put() and kfree() results in often nasty (and delayed) crashes
      elsewhere in the kernel, often in the networking stack.
      
      This reverts commit eb4af0f5.
      
      Link to patch and concerns: https://lkml.org/lkml/2012/9/3/269
      or
      http://lkml.iu.edu/hypermail/linux/kernel/1209.0/00790.html
      
      Alexey Klimov: This revert becomes valid after
      94c69f76 when spi-imx.c
      has been fixed and there is no need to call kfree() so comment
      for spi_alloc_master() should be fixed.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarAlexey Klimov <alexey.klimov@linaro.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fda98848
    • Tan, Jui Nee's avatar
      spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled · 69a4d746
      Tan, Jui Nee authored
      commit 02bc933e upstream.
      
      On Intel Baytrail, there is case when interrupt handler get called, no SPI
      message is captured. The RX FIFO is indeed empty when RX timeout pending
      interrupt (SSSR_TINT) happens.
      
      Use the BIOS version where both HSUART and SPI are on the same IRQ. Both
      drivers are using IRQF_SHARED when calling the request_irq function. When
      running two separate and independent SPI and HSUART application that
      generate data traffic on both components, user will see messages like
      below on the console:
      
        pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler
      
      This commit will fix this by first checking Receiver Time-out Interrupt,
      if it is disabled, ignore the request and return without servicing.
      Signed-off-by: default avatarTan, Jui Nee <jui.nee.tan@intel.com>
      Acked-by: default avatarJarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      69a4d746
  2. 26 Oct, 2015 4 commits
    • Sabrina Dubroca's avatar
      [stable-only] net: add length argument to skb_copy_and_csum_datagram_iovec · 1837825e
      Sabrina Dubroca authored
      Without this length argument, we can read past the end of the iovec in
      memcpy_toiovec because we have no way of knowing the total length of the
      iovec's buffers.
      
      This is needed for stable kernels where 89c22d8c ("net: Fix skb
      csum races when peeking") has been backported but that don't have the
      ioviter conversion, which is almost all the stable trees <= 3.18.
      
      This also fixes a kernel crash for NFS servers when the client uses
       -onfsvers=3,proto=udp to mount the export.
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1837825e
    • David Howells's avatar
      KEYS: Don't permit request_key() to construct a new keyring · 35552610
      David Howells authored
      commit 911b79cd upstream.
      
      If request_key() is used to find a keyring, only do the search part - don't
      do the construction part if the keyring was not found by the search.  We
      don't really want keyrings in the negative instantiated state since the
      rejected/negative instantiation error value in the payload is unioned with
      keyring metadata.
      
      Now the kernel gives an error:
      
      	request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted)
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      35552610
    • David Howells's avatar
      KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring · 1881b6b8
      David Howells authored
      commit f05819df upstream.
      
      The following sequence of commands:
      
          i=`keyctl add user a a @s`
          keyctl request2 keyring foo bar @t
          keyctl unlink $i @s
      
      tries to invoke an upcall to instantiate a keyring if one doesn't already
      exist by that name within the user's keyring set.  However, if the upcall
      fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
      other error code.  When the key is garbage collected, the key destroy
      function is called unconditionally and keyring_destroy() uses list_empty()
      on keyring->type_data.link - which is in a union with reject_error.
      Subsequently, the kernel tries to unlink the keyring from the keyring names
      list - which oopses like this:
      
      	BUG: unable to handle kernel paging request at 00000000ffffff8a
      	IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
      	...
      	Workqueue: events key_garbage_collector
      	...
      	RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
      	RSP: 0018:ffff88003e2f3d30  EFLAGS: 00010203
      	RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
      	RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
      	RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
      	R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
      	R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
      	...
      	CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
      	...
      	Call Trace:
      	 [<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
      	 [<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
      	 [<ffffffff8105ec9b>] process_one_work+0x28e/0x547
      	 [<ffffffff8105fd17>] worker_thread+0x26e/0x361
      	 [<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
      	 [<ffffffff810648ad>] kthread+0xf3/0xfb
      	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
      	 [<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
      	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
      
      Note the value in RAX.  This is a 32-bit representation of -ENOKEY.
      
      The solution is to only call ->destroy() if the key was successfully
      instantiated.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1881b6b8
    • David Howells's avatar
      KEYS: Fix race between key destruction and finding a keyring by name · 348fce88
      David Howells authored
      commit 94c4554b upstream.
      
      There appears to be a race between:
      
       (1) key_gc_unused_keys() which frees key->security and then calls
           keyring_destroy() to unlink the name from the name list
      
       (2) find_keyring_by_name() which calls key_permission(), thus accessing
           key->security, on a key before checking to see whether the key usage is 0
           (ie. the key is dead and might be cleaned up).
      
      Fix this by calling ->destroy() before cleaning up the core key data -
      including key->security.
      Reported-by: default avatarPetr Matousek <pmatouse@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      348fce88
  3. 23 Oct, 2015 2 commits
  4. 21 Oct, 2015 23 commits
    • Jann Horn's avatar
      fs: if a coredump already exists, unlink and recreate with O_EXCL · 0c923501
      Jann Horn authored
      commit fbb18169 upstream.
      
      It was possible for an attacking user to trick root (or another user) into
      writing his coredumps into an attacker-readable, pre-existing file using
      rename() or link(), causing the disclosure of secret data from the victim
      process' virtual memory.  Depending on the configuration, it was also
      possible to trick root into overwriting system files with coredumps.  Fix
      that issue by never writing coredumps into existing files.
      
      Requirements for the attack:
       - The attack only applies if the victim's process has a nonzero
         RLIMIT_CORE and is dumpable.
       - The attacker can trick the victim into coredumping into an
         attacker-writable directory D, either because the core_pattern is
         relative and the victim's cwd is attacker-writable or because an
         absolute core_pattern pointing to a world-writable directory is used.
       - The attacker has one of these:
        A: on a system with protected_hardlinks=0:
           execute access to a folder containing a victim-owned,
           attacker-readable file on the same partition as D, and the
           victim-owned file will be deleted before the main part of the attack
           takes place. (In practice, there are lots of files that fulfill
           this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
           This does not apply to most Linux systems because most distros set
           protected_hardlinks=1.
        B: on a system with protected_hardlinks=1:
           execute access to a folder containing a victim-owned,
           attacker-readable and attacker-writable file on the same partition
           as D, and the victim-owned file will be deleted before the main part
           of the attack takes place.
           (This seems to be uncommon.)
        C: on any system, independent of protected_hardlinks:
           write access to a non-sticky folder containing a victim-owned,
           attacker-readable file on the same partition as D
           (This seems to be uncommon.)
      
      The basic idea is that the attacker moves the victim-owned file to where
      he expects the victim process to dump its core.  The victim process dumps
      its core into the existing file, and the attacker reads the coredump from
      it.
      
      If the attacker can't move the file because he does not have write access
      to the containing directory, he can instead link the file to a directory
      he controls, then wait for the original link to the file to be deleted
      (because the kernel checks that the link count of the corefile is 1).
      
      A less reliable variant that requires D to be non-sticky works with link()
      and does not require deletion of the original link: link() the file into
      D, but then unlink() it directly before the kernel performs the link count
      check.
      
      On systems with protected_hardlinks=0, this variant allows an attacker to
      not only gain information from coredumps, but also clobber existing,
      victim-writable files with coredumps.  (This could theoretically lead to a
      privilege escalation.)
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0c923501
    • Christoph Hellwig's avatar
      scsi_dh: fix randconfig build error · 601f2d65
      Christoph Hellwig authored
      commit 294ab783 upstream.
      
      It looks like the Kconfig check that was meant to fix this (commit
      fe9233fb [SCSI] scsi_dh: fix kconfig related
      build errors) was actually reversed, but no-one noticed until the new set of
      patches which separated DM and SCSI_DH).
      
      Fixes: fe9233fbSigned-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      601f2d65
    • Eric Dumazet's avatar
      task_work: remove fifo ordering guarantee · 4062b842
      Eric Dumazet authored
      commit c8219906 upstream.
      
      In commit f341861f ("task_work: add a scheduling point in
      task_work_run()") I fixed a latency problem adding a cond_resched()
      call.
      
      Later, commit ac3d0da8 added yet another loop to reverse a list,
      bringing back the latency spike :
      
      I've seen in some cases this loop taking 275 ms, if for example a
      process with 2,000,000 files is killed.
      
      We could add yet another cond_resched() in the reverse loop, or we
      can simply remove the reversal, as I do not think anything
      would depend on order of task_work_add() submitted works.
      
      Fixes: ac3d0da8 ("task_work: Make task_work_add() lockless")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMaciej Żenczykowski <maze@google.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4062b842
    • Daniel Borkmann's avatar
      ipv6: fix exthdrs offload registration in out_rt path · a43e6290
      Daniel Borkmann authored
      commit e41b0bed upstream.
      
      We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
      but in error path, we try to unregister it with inet_del_offload(). This
      doesn't seem correct, it should actually be inet6_del_offload(), also
      ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
      also uses rthdr_offload twice), but it got removed entirely later on.
      
      Fixes: 3336288a ("ipv6: Switch to using new offload infrastructure.")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a43e6290
    • Jialing Fu's avatar
      mmc: core: fix race condition in mmc_wait_data_done · 20b79355
      Jialing Fu authored
      commit 71f8a4b8 upstream.
      
      The following panic is captured in ker3.14, but the issue still exists
      in latest kernel.
      ---------------------------------------------------------------------
      [   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
      at virtual address 00000578
      ......
      [   20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
      [   20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
      [   20.740134] c0 3136 (Compiler) Call trace:
      [   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
      [   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
      [   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
      [   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
      [   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
      [   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
      [   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
      [   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
      [   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
      [   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
      ----------------------------------------------------------------------
      Because in SMP, "mrq" has race condition between below two paths:
      path1: CPU0: <tasklet context>
        static void mmc_wait_data_done(struct mmc_request *mrq)
        {
           mrq->host->context_info.is_done_rcv = true;
           //
           // If CPU0 has just finished "is_done_rcv = true" in path1, and at
           // this moment, IRQ or ICache line missing happens in CPU0.
           // What happens in CPU1 (path2)?
           //
           // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
           // path2 would have chance to break from wait_event_interruptible
           // in mmc_wait_for_data_req_done and continue to run for next
           // mmc_request (mmc_blk_rw_rq_prep).
           //
           // Within mmc_blk_rq_prep, mrq is cleared to 0.
           // If below line still gets host from "mrq" as the result of
           // compiler, the panic happens as we traced.
           wake_up_interruptible(&mrq->host->context_info.wait);
        }
      
      path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
        static int mmc_wait_for_data_req_done(...
        {
           ...
           while (1) {
                 wait_event_interruptible(context_info->wait,
                         (context_info->is_done_rcv ||
                          context_info->is_new_req));
           	   static void mmc_blk_rw_rq_prep(...
                 {
                 ...
                 memset(brq, 0, sizeof(struct mmc_blk_request));
      
      This issue happens very coincidentally; however adding mdelay(1) in
      mmc_wait_data_done as below could duplicate it easily.
      
         static void mmc_wait_data_done(struct mmc_request *mrq)
         {
           mrq->host->context_info.is_done_rcv = true;
      +    mdelay(1);
           wake_up_interruptible(&mrq->host->context_info.wait);
          }
      
      At runtime, IRQ or ICache line missing may just happen at the same place
      of the mdelay(1).
      
      This patch gets the mmc_context_info at the beginning of function, it can
      avoid this race condition.
      Signed-off-by: default avatarJialing Fu <jlfu@marvell.com>
      Tested-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Fixes: 2220eedf ("mmc: fix async request mechanism ....")
      Signed-off-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      20b79355
    • Yishai Hadas's avatar
      IB/uverbs: Fix race between ib_uverbs_open and remove_one · c7af2b1f
      Yishai Hadas authored
      commit 35d4a0b6 upstream.
      
      Fixes: 2a72f212 ("IB/uverbs: Remove dev_table")
      
      Before this commit there was a device look-up table that was protected
      by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
      it was dropped and container_of was used instead, it enabled the race
      with remove_one as dev might be freed just after:
      dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
      before the kref_get.
      
      In addition, this buggy patch added some dead code as
      container_of(x,y,z) can never be NULL and so dev can never be NULL.
      As a result the comment above ib_uverbs_open saying "the open method
      will either immediately run -ENXIO" is wrong as it can never happen.
      
      The solution follows Jason Gunthorpe suggestion from below URL:
      https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html
      
      cdev will hold a kref on the parent (the containing structure,
      ib_uverbs_device) and only when that kref is released it is
      guaranteed that open will never be called again.
      
      In addition, fixes the active count scheme to use an atomic
      not a kref to prevent WARN_ON as pointed by above comment
      from Jason.
      Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: default avatarShachar Raindel <raindel@mellanox.com>
      Reviewed-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c7af2b1f
    • Noa Osherovich's avatar
      IB/mlx4: Use correct SL on AH query under RoCE · 57e034c8
      Noa Osherovich authored
      commit 5e99b139 upstream.
      
      The mlx4 IB driver implementation for ib_query_ah used a wrong offset
      (28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
      
      Fixes: fa417f7b ('IB/mlx4: Add support for IBoE')
      Signed-off-by: default avatarShani Michaeli <shanim@mellanox.com>
      Signed-off-by: default avatarNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      57e034c8
    • Jack Morgenstein's avatar
      IB/mlx4: Forbid using sysfs to change RoCE pkeys · 95901c96
      Jack Morgenstein authored
      commit 2b135db3 upstream.
      
      The pkey mapping for RoCE must remain the default mapping:
      VFs:
        virtual index 0 = mapped to real index 0 (0xFFFF)
        All others indices: mapped to a real pkey index containing an
                            invalid pkey.
      PF:
        virtual index i = real index i.
      
      Don't allow users to change these mappings using files found in
      sysfs.
      
      Fixes: c1e7e466 ('IB/mlx4: Add iov directory in sysfs under the ib device')
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      95901c96
    • Jack Morgenstein's avatar
      IB/mlx4: Fix potential deadlock when sending mad to wire · 588ea508
      Jack Morgenstein authored
      commit 90c1d8b6 upstream.
      
      send_mad_to_wire takes the same spinlock that is taken in
      the interrupt context.  Therefore, it needs irqsave/restore.
      
      Fixes: b9c5d6a6 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      588ea508
    • Kan Liang's avatar
      perf stat: Get correct cpu id for print_aggr · b11a5d7c
      Kan Liang authored
      commit 601083cf upstream.
      
      print_aggr() fails to print per-core/per-socket statistics after commit
      582ec082 ("perf stat: Fix per-socket output bug for uncore events")
      if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
      index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
      used to get aggregated id.
      
      Here is an example:
      
      Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
      cpumask 0,18)
      
        $ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
      
      Without this patch, it failes to get CPU 18 result.
      
         Performance counter stats for 'CPU(s) 0,18':
      
        S0-C0           1            7526851      cycles
        S0-C0           1               1.05 MiB  uncore_imc_0/cas_count_read/
        S1-C0           0      <not counted>      cycles
        S1-C0           0      <not counted> MiB  uncore_imc_0/cas_count_read/
      
      With this patch, it can get both CPU0 and CPU18 result.
      
         Performance counter stats for 'CPU(s) 0,18':
      
        S0-C0           1            6327768      cycles
        S0-C0           1               0.47 MiB  uncore_imc_0/cas_count_read/
        S1-C0           1             330228      cycles
        S1-C0           1               0.29 MiB  uncore_imc_0/cas_count_read/
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarStephane Eranian <eranian@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: 582ec082 ("perf stat: Fix per-socket output bug for uncore events")
      Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b11a5d7c
    • Michael Ellerman's avatar
      powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · 2a6bb040
      Michael Ellerman authored
      commit 74b5037b upstream.
      
      The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
      PAGE_SIZE.
      
      However when built with a 4K PAGE_SIZE there is an additional config
      option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
      also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
      
      This is used in one obscure configuration, to support 64K pages for SPU
      local store on the Cell processor when the rest of the kernel is using
      4K pages.
      
      In this configuration, pte_pagesize_index() is defined to just pass
      through its arguments to get_slice_psize(). However pte_pagesize_index()
      is called for both user and kernel addresses, whereas get_slice_psize()
      only knows how to handle user addresses.
      
      This has been broken forever, however until recently it happened to
      work. That was because in get_slice_psize() the large kernel address
      would cause the right shift of the slice mask to return zero.
      
      However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
      64TB"), the get_slice_psize() code was changed so that instead of a
      right shift we do an array lookup based on the address. When passed a
      kernel address this means we index way off the end of the slice array
      and return random junk.
      
      That is only fatal if we happen to hit something non-zero, but when we
      do return a non-zero value we confuse the MMU code and eventually cause
      a check stop.
      
      This fix is ugly, but simple. When we're called for a kernel address we
      return 4K, which is always correct in this configuration, otherwise we
      use the slice mask.
      
      Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
      Reported-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2a6bb040
    • Linus Lüssing's avatar
      batman-adv: Make NC capability changes atomic · 9121de8a
      Linus Lüssing authored
      commit 4635469f upstream.
      
      Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
      OGM handler might undo the set/clear of a specific bit from another
      handler run in between.
      
      Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
      
      Fixes: 3f4841ff ("batman-adv: tvlv - add network coding container")
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9121de8a
    • Linus Lüssing's avatar
      batman-adv: Make DAT capability changes atomic · 298ac5fa
      Linus Lüssing authored
      commit 65d7d460 upstream.
      
      Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
      OGM handler might undo the set/clear of a specific bit from another
      handler run in between.
      
      Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
      
      Fixes: 17cf0ea4 ("batman-adv: tvlv - add distributed arp table container")
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      298ac5fa
    • Arnaldo Carvalho de Melo's avatar
      perf hists: Update the column width for the "srcline" sort key · 08e2d63e
      Arnaldo Carvalho de Melo authored
      commit e8e6d37e upstream.
      
      When we introduce a new sort key, we need to update the
      hists__calc_col_len() function accordingly, otherwise the width
      will be limited to strlen(header).
      
      We can't update it when obtaining a line value for a column (for
      instance, in sort__srcline_cmp()), because we reset it all when doing a
      resort (see hists__output_recalc_col_len()), so we need to, from what is
      in the hist_entry fields, set each of the column widths.
      
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Fixes: 409a8be6 ("perf tools: Add sort by src line/number")
      Link: http://lkml.kernel.org/n/tip-jgbe0yx8v1gs89cslr93pvz2@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      08e2d63e
    • Paul Bolle's avatar
      windfarm: decrement client count when unregistering · b4f5dd7d
      Paul Bolle authored
      commit fe2b5921 upstream.
      
      wf_unregister_client() increments the client count when a client
      unregisters. That is obviously incorrect. Decrement that client count
      instead.
      
      Fixes: 75722d39 ("[PATCH] ppc64: Thermal control for SMU based machines")
      Signed-off-by: default avatarPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b4f5dd7d
    • Dan Carpenter's avatar
      usb: gadget: m66592-udc: forever loop in set_feature() · 3684f1e3
      Dan Carpenter authored
      commit 5feb5d20 upstream.
      
      There is an "&&" vs "||" typo here so this loops 3000 times or if we get
      unlucky it could loop forever.
      
      Fixes: ceaa0a6e ('usb: gadget: m66592-udc: add support for TEST_MODE')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      [ luis: backported to 3.16:
        - file rename: drivers/usb/gadget/udc/m66592-udc.c ->
          drivers/usb/gadget/m66592-udc.c ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3684f1e3
    • Dan Carpenter's avatar
      clk: versatile: off by one in clk_sp810_timerclken_of_get() · aef48b41
      Dan Carpenter authored
      commit 3294bee8 upstream.
      
      The ">" should be ">=" or we end up reading beyond the end of the array.
      
      Fixes: 6e973d2c ('clk: vexpress: Add separate SP810 driver')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarPawel Moll <pawel.moll@arm.com>
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      aef48b41
    • Jan Kara's avatar
      jbd2: avoid infinite loop when destroying aborted journal · 8e8fa8e2
      Jan Kara authored
      commit 841df7df upstream.
      
      Commit 6f6a6fda "jbd2: fix ocfs2 corrupt when updating journal
      superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
      when the journal is aborted. That makes logic in
      jbd2_log_do_checkpoint() bail out which is fine, except that
      jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
      a progress in cleaning the journal. Without it jbd2_journal_destroy()
      just loops in an infinite loop.
      
      Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
      jbd2_log_do_checkpoint() fails with error.
      Reported-by: default avatarEryu Guan <guaneryu@gmail.com>
      Tested-by: default avatarEryu Guan <guaneryu@gmail.com>
      Fixes: 6f6a6fdaSigned-off-by: default avatarJan Kara <jack@suse.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [ luis: backported to 3.16: used Jan's backport ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8e8fa8e2
    • Thomas Huth's avatar
      powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers · 1fcfcdca
      Thomas Huth authored
      commit 1c2cb594 upstream.
      
      The EPOW interrupt handler uses rtas_get_sensor(), which in turn
      uses rtas_busy_delay() to wait for RTAS becoming ready in case it
      is necessary. But rtas_busy_delay() is annotated with might_sleep()
      and thus may not be used by interrupts handlers like the EPOW handler!
      This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
      enabled:
      
       BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
       Call Trace:
       [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
       [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
       [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
       [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
       [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
       [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
       [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
       [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
       [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
       [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
       [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
       [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
       [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
      
      Fix this issue by introducing a new rtas_get_sensor_fast() function
      that does not use rtas_busy_delay() - and thus can only be used for
      sensors that do not cause a BUSY condition - known as "fast" sensors.
      
      The EPOW sensor is defined to be "fast" in sPAPR - mpe.
      
      Fixes: 587f83e8 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1fcfcdca
    • Minfei Huang's avatar
      x86/mm: Initialize pmd_idx in page_table_range_init_count() · a9b101ae
      Minfei Huang authored
      commit 9962eea9 upstream.
      
      The variable pmd_idx is not initialized for the first iteration of the
      for loop.
      
      Assign the proper value which indexes the start address.
      
      Fixes: 719272c4 'x86, mm: only call early_ioremap_page_table_range_init() once'
      Signed-off-by: default avatarMinfei Huang <mnfhuang@gmail.com>
      Cc: tony.luck@intel.com
      Cc: wangnan0@huawei.com
      Cc: david.vrabel@citrix.com
      Reviewed-by: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a9b101ae
    • Bjorn Helgaas's avatar
      PCI: Fix TI816X class code quirk · ea9e218f
      Bjorn Helgaas authored
      commit d1541dc9 upstream.
      
      In fixup_ti816x_class(), we assigned "class = PCI_CLASS_MULTIMEDIA_VIDEO".
      But PCI_CLASS_MULTIMEDIA_VIDEO is only the two-byte base class/sub-class
      and needs to be shifted to make space for the low-order interface byte.
      
      Shift PCI_CLASS_MULTIMEDIA_VIDEO to set the correct class code.
      
      Fixes: 63c44080 ("PCI: Add quirk for setting valid class for TI816X Endpoint")
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: Hemant Pedanekar <hemantp@ti.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ea9e218f
    • Jonathon Jongsma's avatar
      drm/qxl: validate monitors config modes · e74904d7
      Jonathon Jongsma authored
      commit bd3e1c7c upstream.
      
      Due to some recent changes in
      drm_helper_probe_single_connector_modes_merge_bits(), old custom modes
      were not being pruned properly. In current kernels,
      drm_mode_validate_basic() is called to sanity-check each mode in the
      list. If the sanity-check passes, the mode's status gets set to to
      MODE_OK. In older kernels this check was not done, so old custom modes
      would still have a status of MODE_UNVERIFIED at this point, and would
      therefore be pruned later in the function.
      
      As a result of this new behavior, the list of modes for a device always
      includes every custom mode ever configured for the device, with the
      largest one listed first. Since desktop environments usually choose the
      first preferred mode when a hotplug event is emitted, this had the
      result of making it very difficult for the user to reduce the size of
      the display.
      
      The qxl driver did implement the mode_valid connector function, but it
      was empty. In order to restore the old behavior where old custom modes
      are pruned, we implement a proper mode_valid function for the qxl
      driver. This function now checks each mode against the last configured
      custom mode and the list of standard modes. If the mode doesn't match
      any of these, its status is set to MODE_BAD so that it will be pruned as
      expected.
      Signed-off-by: default avatarJonathon Jongsma <jjongsma@redhat.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e74904d7
    • Hin-Tak Leung's avatar
      hfs: fix B-tree corruption after insertion at position 0 · 50ea6daa
      Hin-Tak Leung authored
      commit b4cc0efe upstream.
      
      Fix B-tree corruption when a new record is inserted at position 0 in the
      node in hfs_brec_insert().
      
      This is an identical change to the corresponding hfs b-tree code to Sergei
      Antonov's "hfsplus: fix B-tree corruption after insertion at position 0",
      to keep similar code paths in the hfs and hfsplus drivers in sync, where
      appropriate.
      Signed-off-by: default avatarHin-Tak Leung <htl10@users.sourceforge.net>
      Cc: Sergei Antonov <saproj@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Reviewed-by: default avatarVyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      50ea6daa