1. 05 Sep, 2014 40 commits
    • Johan Hovold's avatar
      USB: ftdi_sio: add Basic Micro ATOM Nano USB2Serial PID · 21f4a77a
      Johan Hovold authored
      commit 6552cc7f upstream.
      
      Add device id for Basic Micro ATOM Nano USB2Serial adapters.
      Reported-by: default avatarNicolas Alt <n.alt@mytum.de>
      Tested-by: default avatarNicolas Alt <n.alt@mytum.de>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21f4a77a
    • Tony Lindgren's avatar
      ARM: OMAP2+: hwmod: Rearm wake-up interrupts for DT when MUSB is idled · 937f8d16
      Tony Lindgren authored
      commit cc824534 upstream.
      
      Looks like MUSB cable removal can cause wake-up interrupts to
      stop working for device tree based booting at least for UART3
      even as nothing is dynamically remuxed. This can be fixed by
      calling reconfigure_io_chain() for device tree based booting
      in hwmod code. Note that we already do that for legacy booting
      if the legacy mux is configured.
      
      My guess is that this is related to UART3 and MUSB ULPI
      hsusb0_data0 and hsusb0_data1 support for Carkit mode that
      somehow affect the configured IO chain for UART3 and require
      rearming the wake-up interrupts.
      
      In general, for device tree based booting, pinctrl-single
      calls the rearm hook that in turn calls reconfigure_io_chain
      so calling reconfigure_io_chain should not be needed from the
      hwmod code for other events.
      
      So let's limit the hwmod rearming of iochain only to
      HWMOD_FORCE_MSTANDBY where MUSB is currently the only user
      of it. If we see other devices needing similar changes we can
      add more checks for it.
      
      Cc: Paul Walmsley <paul@pwsan.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      937f8d16
    • Huang Rui's avatar
      usb: xhci: amd chipset also needs short TX quirk · 307dad06
      Huang Rui authored
      commit 2597fe99 upstream.
      
      AMD xHC also needs short tx quirk after tested on most of chipset
      generations. That's because there is the same incorrect behavior like
      Fresco Logic host. Please see below message with on USB webcam
      attached on xHC host:
      
      [  139.262944] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.266934] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.270913] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.274937] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.278914] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.282936] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.286915] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.290938] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.294913] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      [  139.298917] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      Reported-by: default avatarArindam Nath <arindam.nath@amd.com>
      Tested-by: default avatarShriraj-Rai P <shriraj-rai.p@amd.com>
      Signed-off-by: default avatarHuang Rui <ray.huang@amd.com>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      307dad06
    • Hans de Goede's avatar
      xhci: Treat not finding the event_seg on COMP_STOP the same as COMP_STOP_INVAL · d5dbf1cb
      Hans de Goede authored
      commit 9a548863 upstream.
      
      When using a Renesas uPD720231 chipset usb-3 uas to sata bridge with a 120G
      Crucial M500 ssd, model string: Crucial_ CT120M500SSD1, together with a
      the integrated Intel xhci controller on a Haswell laptop:
      
      00:14.0 USB controller [0c03]: Intel Corporation 8 Series USB xHCI HC [8086:9c31] (rev 04)
      
      The following error gets logged to dmesg:
      
      xhci error: Transfer event TRB DMA ptr not part of current TD
      
      Treating COMP_STOP the same as COMP_STOP_INVAL when no event_seg gets found
      fixes this.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5dbf1cb
    • Larry Finger's avatar
      staging: r8188eu: Add new USB ID · 769459b9
      Larry Finger authored
      commit a2fa6721 upstream.
      
      The Elecom WDC-150SU2M uses this chip.
      Reported-by: default avatarHiroki Kondo <kompiro@gmail.com>
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      769459b9
    • Holger Paradies's avatar
      staging/rtl8188eu: add 0df6:0076 Sitecom Europe B.V. · ec938407
      Holger Paradies authored
      commit 8626d524 upstream.
      
      The stick is not recognized.
      This dongle uses r8188eu but usb-id is missing.
      3.16.0
      Signed-off-by: default avatarHolger Paradies <retabell@gmx.de>
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec938407
    • Mark Einon's avatar
      staging: et131x: Fix errors caused by phydev->addr accesses before initialisation · 980fc164
      Mark Einon authored
      commit ec0a38bf upstream.
      
      Fix two reported bugs, caused by et131x_adapter->phydev->addr being accessed
      before it is initialised, by:
      
      - letting et131x_mii_write() take a phydev address, instead of using the one
        stored in adapter by default. This is so et131x_mdio_write() can use it's own
        addr value.
      - removing implementation of et131x_mdio_reset(), as it's not needed.
      - moving a call to et131x_disable_phy_coma() in et131x_pci_setup(), which uses
        phydev->addr, until after the mdiobus has been registered.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=80751
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=77121Signed-off-by: default avatarMark Einon <mark.einon@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      980fc164
    • Darrick J. Wong's avatar
      jbd2: fix descriptor block size handling errors with journal_csum · 834adfb2
      Darrick J. Wong authored
      commit db9ee220 upstream.
      
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1%, to convert a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reported-by: default avatarTR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      834adfb2
    • Darrick J. Wong's avatar
      jbd2: fix infinite loop when recovering corrupt journal blocks · 78353ceb
      Darrick J. Wong authored
      commit 022eaa75 upstream.
      
      When recovering the journal, don't fall into an infinite loop if we
      encounter a corrupt journal block.  Instead, just skip the block and
      return an error, which fails the mount and thus forces the user to run
      a full filesystem fsck.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78353ceb
    • Dmitry Monakhov's avatar
      ext4: update i_disksize coherently with block allocation on error path · 08dc1706
      Dmitry Monakhov authored
      commit 6603120e upstream.
      
      In case of delalloc block i_disksize may be less than i_size. So we
      have to update i_disksize each time we allocated and submitted some
      blocks beyond i_disksize.  We weren't doing this on the error paths,
      so fix this.
      
      testcase: xfstest generic/019
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08dc1706
    • Alexander Usyskin's avatar
      mei: nfc: fix memory leak in error path · 3e6c4753
      Alexander Usyskin authored
      commit 8e8248b1 upstream.
      
      NFC will leak buffer if send failed.
      Use single exit point that does the freeing
      Signed-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e6c4753
    • Alexander Usyskin's avatar
      mei: reset client state on queued connect request · fec95c2a
      Alexander Usyskin authored
      commit 73ab4232 upstream.
      
      If connect request is queued (e.g. device in pg) set client state
      to initializing, thus avoid preliminary exit in wait if current
      state is disconnected.
      
      This is regression from:
      
      commit e4d8270e
      Author: Alexander Usyskin <alexander.usyskin@intel.com>
      mei: set connecting state just upon connection request is sent to the fw
      Signed-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fec95c2a
    • Liu Bo's avatar
      Btrfs: fix crash on endio of reading corrupted block · a848c23f
      Liu Bo authored
      commit 38c1c2e4 upstream.
      
      The crash is
      
      ------------[ cut here ]------------
      kernel BUG at fs/btrfs/extent_io.c:2124!
      [...]
      Workqueue: btrfs-endio normal_work_helper [btrfs]
      RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
      
      This is in fact a regression.
      
      It is because we forgot to increase @offset properly in reading corrupted block,
      so that the @offset remains, and this leads to checksum errors while reading
      left blocks queued up in the same bio, and then ends up with hiting the above
      BUG_ON.
      Reported-by: default avatarChris Murphy <lists@colorremedies.com>
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a848c23f
    • Liu Bo's avatar
      Btrfs: fix compressed write corruption on enospc · bd7c57d3
      Liu Bo authored
      commit ce62003f upstream.
      
      When failing to allocate space for the whole compressed extent, we'll
      fallback to uncompressed IO, but we've forgotten to redirty the pages
      which belong to this compressed extent, and these 'clean' pages will
      simply skip 'submit' part and go to endio directly, at last we got data
      corruption as we write nothing.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Tested-By: default avatarMartin Steigerwald <martin@lichtvoll.de>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd7c57d3
    • Filipe Manana's avatar
      Btrfs: read lock extent buffer while walking backrefs · 8a6de3b6
      Filipe Manana authored
      commit 6f7ff6d7 upstream.
      
      Before processing the extent buffer, acquire a read lock on it, so
      that we're safe against concurrent updates on the extent buffer.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a6de3b6
    • Filipe Manana's avatar
      Btrfs: fix csum tree corruption, duplicate and outdated checksums · dcc48de7
      Filipe Manana authored
      commit 27b9a812 upstream.
      
      Under rare circumstances we can end up leaving 2 versions of a checksum
      for the same file extent range.
      
      The reason for this is that after calling btrfs_next_leaf we process
      slot 0 of the leaf it returns, instead of processing the slot set in
      path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
      btrfs_next_leaf() releases the path and before it searches for the next
      leaf, another task might cause a split of the next leaf, which migrates
      some of its keys to the leaf we were processing before calling
      btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
      same leaf but with path->slots[0] having a slot number corresponding
      to the first new key it got, that is, a slot number that didn't exist
      before calling btrfs_next_leaf(), as the leaf now has more keys than
      it had before. So we must really process the returned leaf starting at
      path->slots[0] always, as it isn't always 0, and the key at slot 0 can
      have an offset much lower than our search offset/bytenr.
      
      For example, consider the following scenario, where we have:
      
      sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
      four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472
      
        Leaf N:
      
          slot = 0                           slot = btrfs_header_nritems() - 1
        |-------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
        |-------------------------------------------------------------------|
      
        Leaf N + 1:
      
            slot = 0                          slot = btrfs_header_nritems() - 1
        |--------------------------------------------------------------------|
        | [(CSUM CSUM 40161280), size 32] ... [((CSUM CSUM 40615936), size 8 |
        |--------------------------------------------------------------------|
      
      Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
      find the next highest key, which releases the current path and then searches
      for that next key. However after releasing the path and before finding that
      next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
      to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
      btrfs_next_leaf() will returns us a path again with leaf N but with the slot
      pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
      is then:
      
          slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
        |----------------------------------------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
        |----------------------------------------------------------------------------------------------------|
      
      And incorrecly using slot 0, makes us set next_offset to 39239680 and we jump
      into the "insert:" label, which will set tmp to:
      
          tmp = min((sums->len - total_bytes) >> blocksize_bits,
              (next_offset - file_key.offset) >> blocksize_bits) =
          min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
          min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4
      
      and
      
         ins_size = csum_size * tmp = 4 * 4 = 16 bytes.
      
      In other words, we insert a new csum item in the tree with key
      (CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
      for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
      because the item with key (CSUM CSUM 40161280) (the one that was moved from
      leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
      bytes of our data and won't get those old checksums removed.
      
      So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
      and breaks the logical rule:
      
         Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover
      
      An obvious bad effect of this is that a subsequent csum tree lookup to get
      the checksum of any of the blocks with logical offset of 40161280, 40165376
      or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dcc48de7
    • Takashi Iwai's avatar
      Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch · 1b37263c
      Takashi Iwai authored
      commit 4eb1f66d upstream.
      
      We've got bug reports that btrfs crashes when quota is enabled on
      32bit kernel, typically with the Oops like below:
       BUG: unable to handle kernel NULL pointer dereference at 00000004
       IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
       *pde = 00000000
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S      W 3.15.2-1.gd43d97e-default #1
       Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
       task: f1478130 ti: f147c000 task.ti: f147c000
       EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
       EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
       EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
       ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
        DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
       CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
       Stack:
        00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
        00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
        00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
       Call Trace:
        [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
        [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
        [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
        [<c025e38b>] process_one_work+0x11b/0x390
        [<c025eea1>] worker_thread+0x101/0x340
        [<c026432b>] kthread+0x9b/0xb0
        [<c0712a71>] ret_from_kernel_thread+0x21/0x30
        [<c0264290>] kthread_create_on_node+0x110/0x110
      
      This indicates a NULL corruption in prefs_delayed list.  The further
      investigation and bisection pointed that the call of ulist_add_merge()
      results in the corruption.
      
      ulist_add_merge() takes u64 as aux and writes a 64bit value into
      old_aux.  The callers of this function in backref.c, however, pass a
      pointer of a pointer to old_aux.  That is, the function overwrites
      64bit value on 32bit pointer.  This caused a NULL in the adjacent
      variable, in this case, prefs_delayed.
      
      Here is a quick attempt to band-aid over this: a new function,
      ulist_add_merge_ptr() is introduced to pass/store properly a pointer
      value instead of u64.  There are still ugly void ** cast remaining
      in the callers because void ** cannot be taken implicitly.  But, it's
      safer than explicit cast to u64, anyway.
      
      Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b37263c
    • Stephen M. Cameron's avatar
      hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl · cf5d3012
      Stephen M. Cameron authored
      commit 0758f4f7 upstream.
      
      When copy_from_user fails, return -EFAULT, not -ENOMEM
      Signed-off-by: default avatarStephen M. Cameron <scameron@beardog.cce.hp.com>
      Reported-by: default avatarRobert Elliott <elliott@hp.com>
      Reviewed-by: default avatarJoe Handzik <joseph.t.handzik@hp.com>
      Reviewed-by: default avatarScott Teel <scott.teel@hp.com>
      Reviewed by: Mike MIller <michael.miller@canonical.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf5d3012
    • David Vrabel's avatar
      x86/xen: resume timer irqs early · f3ccb9a9
      David Vrabel authored
      commit 8d5999df upstream.
      
      If the timer irqs are resumed during device resume it is possible in
      certain circumstances for the resume to hang early on, before device
      interrupts are resumed.  For an Ubuntu 14.04 PVHVM guest this would
      occur in ~0.5% of resume attempts.
      
      It is not entirely clear what is occuring the point of the hang but I
      think a task necessary for the resume calls schedule_timeout(),
      waiting for a timer interrupt (which never arrives).  This failure may
      require specific tasks to be running on the other VCPUs to trigger
      (processes are not frozen during a suspend/resume if PREEMPT is
      disabled).
      
      Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
      syscore_resume().
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3ccb9a9
    • David Vrabel's avatar
      x86/xen: use vmap() to map grant table pages in PVH guests · 8e669fe4
      David Vrabel authored
      commit 7d951f3c upstream.
      
      Commit b7dd0e35 (x86/xen: safely map and unmap grant frames when
      in atomic context) causes PVH guests to crash in
      arch_gnttab_map_shared() when they attempted to map the pages for the
      grant table.
      
      This use of a PV-specific function during the PVH grant table setup is
      non-obvious and not needed.  The standard vmap() function does the
      right thing.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reported-by: default avatarMukesh Rathor <mukesh.rathor@oracle.com>
      Tested-by: default avatarMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e669fe4
    • Matt Fleming's avatar
      x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub · 0a00abae
      Matt Fleming authored
      commit 7b2a583a upstream.
      
      Without CONFIG_RELOCATABLE the early boot code will decompress the
      kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS
      days, that isn't going to fly with UEFI since parts of the firmware
      code/data may be located at LOAD_PHYSICAL_ADDR.
      
      Straying outside of the bounds of the regions we've explicitly requested
      from the firmware will cause all sorts of trouble. Bruno reports that
      his machine resets while trying to decompress the kernel image.
      
      We already go to great pains to ensure the kernel is loaded into a
      suitably aligned buffer, it's just that the address isn't necessarily
      LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use
      by the firmware.
      
      Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we
      can load the kernel at any address with the correct alignment.
      Reported-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Tested-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a00abae
    • David Vrabel's avatar
      xen/events/fifo: ensure all bitops are properly aligned even on x86 · 924c4521
      David Vrabel authored
      commit dcecb8fd upstream.
      
      When using the FIFO-based ABI on x86_64, if the last port is at the
      end of an event array page then sync_test_bit() on this port's event
      word will read beyond the end of the page and in certain circumstances
      this may fault.
      
      The fault requires the following page in the kernel's direct mapping
      to be not present, which would mean:
      
      a) the array page is the last page of RAM; or
      
      b) the following page is ballooned out /and/ it has been used for a
         foreign mapping by a kernel driver (such as netback or blkback)
         /and/ the grant has been unmapped.
      
      Use the infrastructure added for arm64 to ensure that all bitops
      operating on event words are unsigned long aligned.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      924c4521
    • Arnd Bergmann's avatar
      hpsa: fix non-x86 builds · 1b313b33
      Arnd Bergmann authored
      commit 0b9e7b74 upstream.
      
      commit 28e13446 "[SCSI] hpsa: enable unit attention reporting"
      turns on unit attention notifications, but got the change wrong for
      all architectures other than x86, which now store an uninitialized
      value into the device register.
      
      Gcc helpfully warns about this:
      
      ../drivers/scsi/hpsa.c: In function 'hpsa_set_driver_support_bits':
      ../drivers/scsi/hpsa.c:6373:17: warning: 'driver_support' is used uninitialized in this function [-Wuninitialized]
        driver_support |= ENABLE_UNIT_ATTN;
                       ^
      
      This moves the #ifdef so only the prefetch-enable is conditional
      on x86, not also reading the initial register contents.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 28e13446 "[SCSI] hpsa: enable unit attention reporting"
      Acked-by: default avatarStephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b313b33
    • Andy Lutomirski's avatar
      x86_64/vsyscall: Fix warn_bad_vsyscall log output · c2c5bc2d
      Andy Lutomirski authored
      commit 53b884ac upstream.
      
      This commit in Linux 3.6:
      
          commit c767a54b
          Author: Joe Perches <joe@perches.com>
          Date:   Mon May 21 19:50:07 2012 -0700
      
              x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
      
      caused warn_bad_vsyscall to output garbage in the middle of the
      line.  Revert the bad part of it.
      
      The printk in question isn't actually bare; the level is "%s".
      
      The bug this fixes is purely cosmetic; backports are optional.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/03eac1f24110bbe496ecc12a4df467e0d88466d4.1406330947.git.luto@amacapital.netSigned-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2c5bc2d
    • Brian W Hart's avatar
      powerpc/powernv: Update dev->dma_mask in pci_set_dma_mask() path · daaa4960
      Brian W Hart authored
      commit a32305bf upstream.
      
      powerpc defines various machine-specific routines for handling
      pci_set_dma_mask().  The routines for machine "PowerNV" may neglect
      to set dev->dma_mask.  This could confuse anyone (e.g. drivers) that
      consult dev->dma_mask to find the current mask.  Set the dma_mask in
      the PowerNV leaf routine.
      Signed-off-by: default avatarBrian W. Hart <hartb@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      daaa4960
    • Tyrel Datwyler's avatar
      powerpc/pci: Reorder pci bus/bridge unregistration during PHB removal · 51d79fc2
      Tyrel Datwyler authored
      commit 73400565 upstream.
      
      Commit bcdde7e2 made __sysfs_remove_dir() recursive and introduced a BUG_ON
      during PHB removal while attempting to delete the power managment attribute
      group of the bus. This is a result of tearing the bridge and bus devices down
      out of order in remove_phb_dynamic. Since, the the bus resides below the bridge
      in the sysfs device tree it should be torn down first.
      
      This patch simply moves the device_unregister call for the PHB bridge device
      after the device_unregister call for the PHB bus.
      
      Fixes: bcdde7e2 ("sysfs: make __sysfs_remove_dir() recursive")
      Signed-off-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51d79fc2
    • Christoph Schulz's avatar
      x86: don't exclude low BIOS area when allocating address space for non-PCI cards · bc904327
      Christoph Schulz authored
      commit cbace46a upstream.
      
      Commit 30919b0b ("x86: avoid low BIOS area when allocating address
      space") moved the test for resource allocations that fall within the first
      1MB of address space from the PCI-specific path to a generic path, such
      that all resource allocations will avoid this area.  However, this breaks
      ISA cards which need to allocate a memory region within the first 1MB.  An
      example is the i82365 PCMCIA controller and derivatives like the Ricoh
      RF5C296/396 which map part of the PCMCIA socket memory address space into
      the first 1MB of system memory address space.  They do not work anymore as
      no usable memory region exists due to this change:
      
        Intel ISA PCIC probe: Ricoh RF5C296/396 ISA-to-PCMCIA at port 0x3e0 ofs 0x00, 2 sockets
        host opts [0]: none
        host opts [1]: none
        ISA irqs (scanned) = 3,4,5,9,10 status change on irq 10
        pcmcia_socket pcmcia_socket1: pccard: PCMCIA card inserted into slot 1
        pcmcia_socket pcmcia_socket0: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
        pcmcia_socket pcmcia_socket0: cs: IO port probe 0xa00-0xaff: clean.
        pcmcia_socket pcmcia_socket0: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0d0000-0x0dffff: clean.
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0e0000-0x0effff: clean.
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x60000000-0x60ffffff: clean.
        pcmcia_socket pcmcia_socket0: cs: memory probe 0xa0000000-0xa0ffffff: clean.
        pcmcia_socket pcmcia_socket1: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
        pcmcia_socket pcmcia_socket1: cs: IO port probe 0xa00-0xaff: clean.
        pcmcia_socket pcmcia_socket1: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0d0000-0x0dffff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0e0000-0x0effff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x60000000-0x60ffffff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0xa0000000-0xa0ffffff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0cc000-0x0effff: excluding 0xe0000-0xeffff
        pcmcia_socket pcmcia_socket1: cs: unable to map card memory!
      
      If filtering out the first 1MB is reverted, everything works as expected.
      Tested-by: default avatarRobert Resch <fli4l@robert.reschpara.de>
      Signed-off-by: default avatarChristoph Schulz <develop@kristov.de>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc904327
    • Simone Gotti's avatar
      ACPI / PCI: Fix sysfs acpi_index and label errors · 639b96cc
      Simone Gotti authored
      commit dcfa9be8 upstream.
      
      Fix errors in handling "device label" _DSM return values.
      
      If _DSM returns a Unicode string, the ACPI type is ACPI_TYPE_BUFFER, not
      ACPI_TYPE_STRING.  Fix dsm_label_utf16s_to_utf8s() to convert UTF-16 from
      acpi_object->buffer instead of acpi_object->string.
      
      Prior to v3.14, we accepted Unicode labels (ACPI_TYPE_BUFFER return
      values).  But after 1d0fcef7, we accepted only ASCII (ACPI_TYPE_STRING)
      (and we incorrectly tried to convert those ASCII labels from UTF-16 to
      UTF-8).
      
      Rejecting Unicode labels made us return -EPERM when reading sysfs
      "acpi_index" or "label" files, which in turn caused on-board network
      interfaces on a Dell PowerEdge E420 to be renamed (by udev net_id internal)
      from eno1/eno2 to enp2s0f0/enp2s0f1.
      
      Fix this by accepting either ACPI_TYPE_STRING (and treating it as ASCII) or
      ACPI_TYPE_BUFFER (and converting from UTF-16 to UTF-8).
      
      [bhelgaas: changelog]
      Fixes: 1d0fcef7 ("ACPI / PCI: replace open-coded _DSM code with helper functions")
      Signed-off-by: default avatarSimone Gotti <simone.gotti@gmail.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: default avatarJiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      639b96cc
    • Vidya Sagar's avatar
      PCI: Configure ASPM when enabling device · e9746f98
      Vidya Sagar authored
      commit 1f6ae47e upstream.
      
      We can't do ASPM configuration at enumeration-time because enabling it
      makes some defective hardware unresponsive, even if ASPM is disabled later
      (see 41cd766b ("PCI: Don't enable aspm before drivers have had a chance
      to veto it").  Therefore, we have to do it after a driver claims the
      device.
      
      We previously configured ASPM in pci_set_power_state(), but that's not a
      very good place because it's not really related to setting the PCI device
      power state, and doing it there means:
      
        - We incorrectly skipped ASPM config when setting a device that's
          already in D0 to D0.
      
        - We unnecessarily configured ASPM when setting a device to a low-power
          state (the ASPM feature only applies when the device is in D0).
      
        - We unnecessarily configured ASPM when called from a .resume() method
          (ASPM configuration needs to be restored during resume, but
          pci_restore_pcie_state() should already do this).
      
      Move ASPM configuration from pci_set_power_state() to
      do_pci_enable_device() so we do it when a driver enables a device.
      
      [bhelgaas: changelog]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=79621
      Fixes: db288c9c ("PCI / PM: restore the original behavior of pci_set_power_state()")
      Suggested-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarVidya Sagar <sagar.tv@gmail.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9746f98
    • Alex Deucher's avatar
      drm/radeon: add additional SI pci ids · f92f2ce2
      Alex Deucher authored
      commit 37dbeab7 upstream.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f92f2ce2
    • Alex Deucher's avatar
      drm/radeon: add new bonaire pci ids · 6429fe88
      Alex Deucher authored
      commit 5fc540ed upstream.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6429fe88
    • Alex Deucher's avatar
    • Theodore Ts'o's avatar
      ext4: fix BUG_ON in mb_free_blocks() · 8bafd605
      Theodore Ts'o authored
      commit c99d1e6e upstream.
      
      If we suffer a block allocation failure (for example due to a memory
      allocation failure), it's possible that we will call
      ext4_discard_allocated_blocks() before we've actually allocated any
      blocks.  In that case, fe_len and fe_start in ac->ac_f_ex will still
      be zero, and this will result in mb_free_blocks(inode, e4b, 0, 0)
      triggering the BUG_ON on mb_free_blocks():
      
      	BUG_ON(last >= (sb->s_blocksize << 3));
      
      Fix this by bailing out of ext4_discard_allocated_blocks() if fs_len
      is zero.
      
      Also fix a missing ext4_mb_unload_buddy() call in
      ext4_discard_allocated_blocks().
      
      Google-Bug-Id: 16844242
      
      Fixes: 86f0afd4Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bafd605
    • Michael S. Tsirkin's avatar
      kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601) · 42a1927a
      Michael S. Tsirkin authored
      commit 350b8bdd upstream.
      
      The third parameter of kvm_iommu_put_pages is wrong,
      It should be 'gfn - slot->base_gfn'.
      
      By making gfn very large, malicious guest or userspace can cause kvm to
      go to this error path, and subsequently to pass a huge value as size.
      Alternatively if gfn is small, then pages would be pinned but never
      unpinned, causing host memory leak and local DOS.
      
      Passing a reasonable but large value could be the most dangerous case,
      because it would unpin a page that should have stayed pinned, and thus
      allow the device to DMA into arbitrary memory.  However, this cannot
      happen because of the condition that can trigger the error:
      
      - out of memory (where you can't allocate even a single page)
        should not be possible for the attacker to trigger
      
      - when exceeding the iommu's address space, guest pages after gfn
        will also exceed the iommu's address space, and inside
        kvm_iommu_put_pages() the iommu_iova_to_phys() will fail.  The
        page thus would not be unpinned at all.
      Reported-by: default avatarJack Morgenstein <jackm@mellanox.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42a1927a
    • Paolo Bonzini's avatar
      Revert "KVM: x86: Increase the number of fixed MTRR regs to 10" · 98a634d0
      Paolo Bonzini authored
      commit 0d234daf upstream.
      
      This reverts commit 682367c4,
      which causes 32-bit SMP Windows 7 guests to panic.
      
      SeaBIOS has a limit on the number of MTRRs that it can handle,
      and this patch exceeded the limit.  Better revert it.
      Thanks to Nadav Amit for debugging the cause.
      Reported-by: default avatarWanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98a634d0
    • Wanpeng Li's avatar
      KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use · dfc7aa43
      Wanpeng Li authored
      commit 56cc2406 upstream.
      
      After commit 77b0f5d6 (KVM: nVMX: Ack and write vector info to intr_info
      if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
      emulated. To do so, KVM will ask the APIC for the interrupt vector if
      during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set.  With APICv,
      kvm_get_apic_interrupt would return -1 and give the following WARNING:
      
      Call Trace:
       [<ffffffff81493563>] dump_stack+0x49/0x5e
       [<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
       [<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
       [<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
       [<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
       [<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
       [<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
       [<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
       [<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
       [<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]
      
      To fix this, we cannot rely on the processor's virtual interrupt delivery,
      because "acknowledge interrupt on exit" must only update the virtual
      ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR)
      but it should not deliver the interrupt through the IDT.  Thus, KVM has
      to deliver the interrupt "by hand", similar to the treatment of EOI in
      commit fc57ac2c (KVM: lapic: sync highest ISR to hardware apic on
      EOI, 2014-05-14).
      
      The patch modifies kvm_cpu_get_interrupt to always acknowledge an
      interrupt; there are only two callers, and the other is not affected
      because it is never reached with kvm_apic_vid_enabled() == true.  Then it
      modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition
      to the registers.
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Suggested-by: default avatar"Zhang, Yang Z" <yang.z.zhang@intel.com>
      Tested-by: default avatarLiu, RongrongX <rongrongx.liu@intel.com>
      Tested-by: default avatarFelipe Reyes <freyes@suse.com>
      Fixes: 77b0f5d6Signed-off-by: default avatarWanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfc7aa43
    • Paolo Bonzini's avatar
      KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table · a5d5291d
      Paolo Bonzini authored
      commit 0f6c0a74 upstream.
      
      Currently, the EOI exit bitmap (used for APICv) does not include
      interrupts that are masked.  However, this can cause a bug that manifests
      as an interrupt storm inside the guest.  Alex Williamson reported the
      bug and is the one who really debugged this; I only wrote the patch. :)
      
      The scenario involves a multi-function PCI device with OHCI and EHCI
      USB functions and an audio function, all assigned to the guest, where
      both USB functions use legacy INTx interrupts.
      
      As soon as the guest boots, interrupts for these devices turn into an
      interrupt storm in the guest; the host does not see the interrupt storm.
      Basically the EOI path does not work, and the guest continues to see the
      interrupt over and over, even after it attempts to mask it at the APIC.
      The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
      with not many changes in the area of APIC/IOAPIC handling).
      
      Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
      on in the eoi_exit_bitmap and TMR, and things then work.  What happens
      is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
      It does not have set bit 59 because the RTE was masked, so the IOAPIC
      never sees the EOI and the interrupt continues to fire in the guest.
      
      My guess was that the guest is masking the interrupt in the redirection
      table in the interrupt routine, i.e. while the interrupt is set in a
      LAPIC's ISR, The simplest fix is to ignore the masking state, we would
      rather have an unnecessary exit rather than a missed IRQ ACK and anyway
      IOAPIC interrupts are not as performance-sensitive as for example MSIs.
      Alex tested this patch and it fixed his bug.
      
      [Thanks to Alex for his precise description of the problem
       and initial debugging effort.  A lot of the text above is
       based on emails exchanged with him.]
      Reported-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Tested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5d5291d
    • Nadav Amit's avatar
      KVM: x86: Inter-privilege level ret emulation is not implemeneted · f0ed4477
      Nadav Amit authored
      commit 9e8919ae upstream.
      
      Return unhandlable error on inter-privilege level ret instruction.  This is
      since the current emulation does not check the privilege level correctly when
      loading the CS, and does not pop RSP/SS as needed.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0ed4477
    • Steven Rostedt's avatar
      debugfs: Fix corrupted loop in debugfs_remove_recursive · da660d1a
      Steven Rostedt authored
      commit 485d4402 upstream.
      
      [ I'm currently running my tests on it now, and so far, after a few
       hours it has yet to blow up. I'll run it for 24 hours which it never
       succeeded in the past. ]
      
      The tracing code has a way to make directories within the debugfs file
      system as well as deleting them using mkdir/rmdir in the instance
      directory. This is very limited in functionality, such as there is
      no renames, and the parent directory "instance" can not be modified.
      The tracing code creates the instance directory from the debugfs code
      and then replaces the dentry->d_inode->i_op with its own to allow
      for mkdir/rmdir to work.
      
      When these are called, the d_entry and inode locks need to be released
      to call the instance creation and deletion code. That code has its own
      accounting and locking to serialize everything to prevent multiple
      users from causing harm. As the parent "instance" directory can not
      be modified this simplifies things.
      
      I created a stress test that creates several threads that randomly
      creates and deletes directories thousands of times a second. The code
      stood up to this test and I submitted it a while ago.
      
      Recently I added a new test that adds readers to the mix. While the
      instance directories were being added and deleted, readers would read
      from these directories and even enable tracing within them. This test
      was able to trigger a bug:
      
       general protection fault: 0000 [#1] PREEMPT SMP
       Modules linked in: ...
       CPU: 3 PID: 17789 Comm: rmdir Tainted: G        W     3.15.0-rc2-test+ #41
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
       task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000
       RIP: 0010:[<ffffffff811ed5eb>]  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
       RSP: 0018:ffff880077019df8  EFLAGS: 00010246
       RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000
       RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454
       RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000
       R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640
       R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7
       FS:  00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0
       Stack:
        ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640
        0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000
        00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28
       Call Trace:
        [<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de
        [<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3
        [<ffffffff81132f51>] ? do_rmdir+0xd1/0x139
        [<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
        [<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b
       Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08
       RIP  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
        RSP <ffff880077019df8>
      
      It took a while, but every time it triggered, it was always in the
      same place:
      
      	list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) {
      
      Where the child->d_u.d_child seemed to be corrupted.  I added lots of
      trace_printk()s to see what was wrong, and sure enough, it was always
      the child's d_u.d_child field. I looked around to see what touches
      it and noticed that in __dentry_kill() which calls dentry_free():
      
      static void dentry_free(struct dentry *dentry)
      {
      	/* if dentry was never visible to RCU, immediate free is OK */
      	if (!(dentry->d_flags & DCACHE_RCUACCESS))
      		__d_free(&dentry->d_u.d_rcu);
      	else
      		call_rcu(&dentry->d_u.d_rcu, __d_free);
      }
      
      I also noticed that __dentry_kill() unlinks the child->d_u.child
      under the parent->d_lock spin_lock.
      
      Looking back at the loop in debugfs_remove_recursive() it never takes the
      parent->d_lock to do the list walk. Adding more tracing, I was able to
      prove this was the issue:
      
       ftrace-t-15385   1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600
          rmdir-15409   2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058
      
      The dentry_kill freed ffff88006d573600 just as the remove recursive was walking
      it.
      
      In order to fix this, the list walk needs to be modified a bit to take
      the parent->d_lock. The safe version is no longer necessary, as every
      time we remove a child, the parent->d_lock must be released and the
      list walk must start over. Each time a child is removed, even though it
      may still be on the list, it should be skipped by the first check
      in the loop:
      
      		if (!debugfs_positive(child))
      			continue;
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da660d1a
    • Arnd Bergmann's avatar
      crypto: ux500 - make interrupt mode plausible · de63adc7
      Arnd Bergmann authored
      commit e1f8859e upstream.
      
      The interrupt handler in the ux500 crypto driver has an obviously
      incorrect way to access the data buffer, which for a while has
      caused this build warning:
      
      ../ux500/cryp/cryp_core.c: In function 'cryp_interrupt_handler':
      ../ux500/cryp/cryp_core.c:234:5: warning: passing argument 1 of '__fswab32' makes integer from pointer without a cast [enabled by default]
           writel_relaxed(ctx->indata,
           ^
      In file included from ../include/linux/swab.h:4:0,
                       from ../include/uapi/linux/byteorder/big_endian.h:12,
                       from ../include/linux/byteorder/big_endian.h:4,
                       from ../arch/arm/include/uapi/asm/byteorder.h:19,
                       from ../include/asm-generic/bitops/le.h:5,
                       from ../arch/arm/include/asm/bitops.h:340,
                       from ../include/linux/bitops.h:33,
                       from ../include/linux/kernel.h:10,
                       from ../include/linux/clk.h:16,
                       from ../drivers/crypto/ux500/cryp/cryp_core.c:12:
      ../include/uapi/linux/swab.h:57:119: note: expected '__u32' but argument is of type 'const u8 *'
       static inline __attribute_const__ __u32 __fswab32(__u32 val)
      
      There are at least two, possibly three problems here:
      a) when writing into the FIFO, we copy the pointer rather than the
         actual data we want to give to the hardware
      b) the data pointer is an array of 8-bit values, while the FIFO
         is 32-bit wide, so both the read and write access fail to do
         a proper type conversion
      c) This seems incorrect for big-endian kernels, on which we need to
         byte-swap any register access, but not normally FIFO accesses,
         at least the DMA case doesn't do it either.
      
      This converts the bogus loop to use the same readsl/writesl pair
      that we use for the two other modes (DMA and polling). This is
      more efficient and consistent, and probably correct for endianess.
      
      The bug has existed since the driver was first merged, and was
      probably never detected because nobody tried to use interrupt mode.
      It might make sense to backport this fix to stable kernels, depending
      on how the crypto maintainers feel about that.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: linux-crypto@vger.kernel.org
      Cc: Fabio Baltieri <fabio.baltieri@linaro.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de63adc7