1. 29 Sep, 2015 40 commits
    • iommu/tegra-smmu: Parameterize number of TLB lines · 1f398b31
      Thierry Reding authored
      commit 11cec15b upstream.
      
      The number of TLB lines was increased from 16 on Tegra30 to 32 on
      Tegra114 and later. Parameterize the value so that the initial default
      can be set accordingly.
      
      On Tegra30, initializing the value to 32 would effectively disable the
      TLB and hence cause massive latencies for memory accesses translated
      through the SMMU. This is especially noticeable for isochronous clients
      such as display, whose FIFOs would continuously underrun.
      
      Fixes: 89184651 ("memory: Add NVIDIA Tegra memory controller support")
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
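      The parameterization described in the commit above can be sketched as
      follows. This is a minimal, hypothetical illustration: the 'smmu_soc'
      structure, its 'num_tlb_lines' field, the 'tlb_active_lines()' helper and
      the mask computation are assumptions made for the example, not the
      driver's actual definitions. Only the line counts (16 and 32) come from
      the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-SoC description; names are illustrative only. */
struct smmu_soc {
    unsigned int num_tlb_lines;
};

/* Line counts taken from the commit message above. */
static const struct smmu_soc tegra30_soc  = { .num_tlb_lines = 16 };
static const struct smmu_soc tegra114_soc = { .num_tlb_lines = 32 };

/*
 * Derive the "active lines" value from the SoC data instead of
 * hard-coding one value for every chip.
 */
static uint32_t tlb_active_lines(const struct smmu_soc *soc)
{
    assert(soc != NULL);

    /* Assumed field width: just wide enough to hold num_tlb_lines. */
    uint32_t mask = (soc->num_tlb_lines << 1) - 1;

    return soc->num_tlb_lines & mask;
}
```

      Keeping the value in per-SoC data means a new chip only needs a new
      table entry rather than a change to the initialization code.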
    • iommu/io-pgtable-arm: Unmap and free table when overwriting with block · fb2908b5
      Will Deacon authored
      commit cf27ec93 upstream.
      
      When installing a block mapping, we unconditionally overwrite a non-leaf
      PTE if we find one. However, this can cause a problem if the following
      sequence of events occurs:
      
        (1) iommu_map is called for a 4k (i.e. PAGE_SIZE) mapping at some address
            - We initialise the page table all the way down to a leaf entry
            - No TLB maintenance is required, because we're going from invalid
              to valid.
      
        (2) iommu_unmap is called on the mapping installed in (1)
            - We walk the page table to the final (leaf) entry and zero it
            - We only changed a valid leaf entry, so we invalidate leaf-only
      
        (3) iommu_map is called on the same address as (1), but this time for
            a 2MB (i.e. BLOCK_SIZE) mapping
            - We walk the page table down to the penultimate level, where we
              find a table entry
            - We overwrite the table entry with a block mapping and return
              without any TLB maintenance and without freeing the memory used
              by the now-orphaned table.
      
      This last step can lead to a walk-cache caching the overwritten table
      entry, causing unexpected faults when the new mapping is accessed by a
      device. One way to fix this would be to collapse the page table when
      freeing the last page at a given level, but this would require expensive
      iteration on every map call. Instead, this patch detects the case when
      we are overwriting a table entry and explicitly unmaps the table first,
      which takes care of both freeing and TLB invalidation.
      Reported-by: Brian Starkey <brian.starkey@arm.com>
      Tested-by: Brian Starkey <brian.starkey@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
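      The approach described in the commit above (detect that a table entry is
      about to be overwritten and unmap it first) can be sketched with a toy
      page-table entry. Everything here is hypothetical: the 'pte' layout, the
      'unmap_table()' and 'install_block()' names and the counter standing in
      for TLB maintenance are illustrative and are not the io-pgtable-arm code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum pte_type { PTE_INVALID, PTE_TABLE, PTE_BLOCK };

/* Toy entry at the penultimate level (illustrative layout). */
struct pte {
    enum pte_type type;
    uint64_t *leaves;   /* PTE_TABLE: next-level table of leaf entries */
    uint64_t paddr;     /* PTE_BLOCK: output address of the block */
};

/* Counts simulated TLB/walk-cache invalidations. */
static unsigned int tlb_invalidations;

/* Free the next-level table and invalidate cached walks through it. */
static void unmap_table(struct pte *pte)
{
    free(pte->leaves);
    pte->leaves = NULL;
    pte->type = PTE_INVALID;
    tlb_invalidations++;
}

/* Install a block mapping, tearing down any table found in the way. */
static void install_block(struct pte *pte, uint64_t paddr)
{
    assert(pte != NULL);

    if (pte->type == PTE_TABLE)
        unmap_table(pte);   /* avoids the orphaned table and stale walk cache */

    pte->type = PTE_BLOCK;
    pte->paddr = paddr;
}
```

      Handling the teardown at overwrite time keeps the common map path free
      of the per-call iteration that collapsing tables eagerly would need.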
    • iommu/fsl: Really fix init section(s) content · 70ebd071
      Emil Medve authored
      commit 57fb907d upstream.
      
      '0f1fb99b iommu/fsl: Fix section mismatch' was intended to address the modpost
      warning and the potential crash, which is actually easy to trigger with an
      'unbind' followed by a 'bind' sequence. The fix is wrong, as
      fsl_of_pamu_driver.driver gets added by bus_add_driver() to a couple of
      klist(s) which become invalid/corrupted as soon as the init sections are freed.
      Depending on when/how the init sections' storage is reused, various/random errors
      and crashes will happen.
      
      'cd70d465 iommu/fsl: Various cleanups' contains annotations that go further down
      the wrong path laid by '0f1fb99b iommu/fsl: Fix section mismatch'.
      
      Now remove all the incorrect annotations from the above-mentioned patches (not
      exactly a revert) and those previously existing in the code. This fixes the
      modpost warning(s), the unbind/bind sequence crashes and the random
      errors/crashes.
      
      Fixes: 0f1fb99b ("iommu/fsl: Fix section mismatch")
      Fixes: cd70d465 ("iommu/fsl: Various cleanups")
      Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
      Acked-by: Varun Sethi <Varun.Sethi@freescale.com>
      Tested-by: Madalin Bucur <Madalin.Bucur@freescale.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md: flush ->event_work before stopping array. · 18c45d9c
      NeilBrown authored
      commit ee5d004f upstream.
      
      The 'event_work' worker used by dm-raid may still be running
      when the array is stopped.  This can result in an oops.
      
      So flush the workqueue on which it is run after detaching
      and before destroying the device.
      Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Fixes: 9d09e663 ("dm: raid456 basic support")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid10: always set reshape_safe when initializing reshape_position. · ae286448
      NeilBrown authored
      commit 299b0685 upstream.
      
      'reshape_position' tracks where in the reshape we have reached.
      'reshape_safe' tracks where in the reshape we have safely recorded
      in the metadata.
      
      These are compared to determine when to update the metadata.
      So it is important that reshape_safe is initialised properly.
      Currently it isn't.  When starting a reshape from the beginning
      it usually has the correct value by luck.  But when reducing the
      number of devices in a RAID10, it has the wrong value and this leads
      to the metadata not being updated correctly.
      This can lead to corruption if the reshape is not allowed to complete.
      
      This patch is suitable for any -stable kernel which supports RAID10
      reshape, which is 3.5 and later.
      
      Fixes: 3ea7daa5 ("md/raid10: add reshape support")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid5: don't let shrink_slab shrink too far. · d7edf5fe
      NeilBrown authored
      commit 49895bcc upstream.
      
      I have a report of drop_one_stripe() called from
      raid5_cache_scan() apparently finding ->max_nr_stripes == 0.
      
      This should not be allowed.
      
      So add a test to keep max_nr_stripes above min_nr_stripes.
      
      Also use a 'mask' rather than a 'mod' in drop_one_stripe
      to ensure 'hash' is valid even if max_nr_stripes does reach zero.
      
      
      Fixes: edbe83ab ("md/raid5: allow the stripe_cache to grow and shrink.")
      Reported-by: Tomas Papan <tomas.papan@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
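      The 'mask' versus 'mod' point in the commit above can be illustrated in
      isolation. This is a sketch under stated assumptions: the lock count of 8
      and the helper names are hypothetical and only serve to show why a mask
      keeps the index in range when the stripe count is zero. In C, a negative
      left operand of '%' yields a negative (or zero) result, whereas masking
      with a power-of-two minus one always yields an in-range index on two's
      complement machines.

```c
#include <assert.h>

#define NR_HASH_LOCKS   8                     /* assumed power of two */
#define HASH_LOCKS_MASK (NR_HASH_LOCKS - 1)

/* With max_nr_stripes == 0 this evaluates (-1) % 8, which is negative. */
static int hash_with_mod(int max_nr_stripes)
{
    return (max_nr_stripes - 1) % NR_HASH_LOCKS;
}

/* The mask keeps the result within [0, NR_HASH_LOCKS) for any input. */
static int hash_with_mask(int max_nr_stripes)
{
    int hash = (max_nr_stripes - 1) & HASH_LOCKS_MASK;

    assert(hash >= 0 && hash < NR_HASH_LOCKS);
    return hash;
}
```

      For positive stripe counts the two helpers agree, so the mask changes
      behaviour only in the case that should never happen.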
    • md/raid5: avoid races when changing cache size. · ed8b3124
      NeilBrown authored
      commit 2d5b569b upstream.
      
      Cache size can grow or shrink due to various pressures at
      any time.  So when we resize the cache as part of a 'grow'
      operation (i.e. change the size to allow more devices) we need
      to block that automatic growing/shrinking.
      
      So introduce a mutex.  auto grow/shrink uses mutex_trylock()
      and just doesn't bother if there is a blockage.
      Resizing the whole cache holds the mutex to ensure that
      the correct number of new stripes is allocated.
      
      This bug can result in some stripes not being freed when an
      array is stopped.  This leads to the kmem_cache not being
      freed and a subsequent array can try to use the same kmem_cache
      and get confused.
      
      Fixes: edbe83ab ("md/raid5: allow the stripe_cache to grow and shrink.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: core: fix race condition in mmc_wait_data_done · cd6a4dd8
      Jialing Fu authored
      commit 71f8a4b8 upstream.
      
      The following panic was captured on kernel 3.14, but the issue still exists
      in the latest kernel.
      ---------------------------------------------------------------------
      [   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
      at virtual address 00000578
      ......
      [   20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
      [   20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
      [   20.740134] c0 3136 (Compiler) Call trace:
      [   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
      [   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
      [   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
      [   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
      [   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
      [   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
      [   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
      [   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
      [   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
      [   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
      ----------------------------------------------------------------------
      On SMP systems, "mrq" is subject to a race between the two paths below:
      path1: CPU0: <tasklet context>
        static void mmc_wait_data_done(struct mmc_request *mrq)
        {
           mrq->host->context_info.is_done_rcv = true;
           //
           // Suppose CPU0 has just set "is_done_rcv = true" in path1 and, at
           // this moment, an IRQ or an I-cache miss occurs on CPU0.
           // What happens on CPU1 (path2)?
           //
           // If the mmcqd thread on CPU1 (path2) has not gone to sleep yet,
           // path2 can break out of wait_event_interruptible in
           // mmc_wait_for_data_req_done and continue with the next
           // mmc_request (mmc_blk_rw_rq_prep).
           //
           // Within mmc_blk_rw_rq_prep, mrq is cleared to 0.
           // If the line below still fetches host from "mrq" (depending on
           // the code the compiler generated), the panic happens as traced.
           wake_up_interruptible(&mrq->host->context_info.wait);
        }
      
      path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
        static int mmc_wait_for_data_req_done(...
        {
           ...
           while (1) {
                 wait_event_interruptible(context_info->wait,
                         (context_info->is_done_rcv ||
                          context_info->is_new_req));
           	   static void mmc_blk_rw_rq_prep(...
                 {
                 ...
                 memset(brq, 0, sizeof(struct mmc_blk_request));
      
      This issue is hard to hit by chance; however, adding mdelay(1) in
      mmc_wait_data_done as below reproduces it easily.
      
         static void mmc_wait_data_done(struct mmc_request *mrq)
         {
           mrq->host->context_info.is_done_rcv = true;
      +    mdelay(1);
           wake_up_interruptible(&mrq->host->context_info.wait);
          }
      
      At runtime, an IRQ or an I-cache miss may happen at the same point as
      the mdelay(1).
      
      This patch fetches the mmc_context_info pointer at the beginning of the
      function, which avoids this race condition.
      Signed-off-by: Jialing Fu <jlfu@marvell.com>
      Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
      Fixes: 2220eedf ("mmc: fix async request mechanism ....")
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
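      The fix described in the commit above (read the context pointer once,
      before signalling completion) can be sketched in a single-threaded model.
      This is a hypothetical illustration: the structure and function names are
      simplified stand-ins, and 'simulate_request_reuse()' plays the role of
      the other CPU clearing the request inside the race window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the structures named in the commit message. */
struct context_info { bool is_done_rcv; int wakeups; };
struct host { struct context_info context_info; };
struct request { struct host *host; };

/* Models the other CPU reusing (zeroing) the request in the race window. */
static void simulate_request_reuse(struct request *mrq)
{
    memset(mrq, 0, sizeof(*mrq));
}

/* Stand-in for waking the waiter. */
static void wake_up(struct context_info *ci)
{
    ci->wakeups++;
}

static void wait_data_done(struct request *mrq)
{
    assert(mrq != NULL && mrq->host != NULL);

    /* Read the pointer once, while mrq is still known to be valid. */
    struct context_info *ci = &mrq->host->context_info;

    ci->is_done_rcv = true;
    simulate_request_reuse(mrq);    /* mrq->host is NULL from here on */
    wake_up(ci);                    /* safe: mrq is not touched again */
}
```

      With the pointer cached, the wake-up no longer depends on the contents
      of the request after the completion flag has been set.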
    • mmc: sdhci: also get preset value and driver type for MMC_DDR52 · cea49b29
      Jisheng Zhang authored
      commit 0dafa60e upstream.
      
      commit bb8175a8 ("mmc: sdhci: clarify DDR timing mode between
      SD-UHS and eMMC") added MMC_DDR52 as eMMC's DDR mode to be
      distinguished from SD-UHS, but it missed setting driver type for
      MMC_DDR52 timing mode.
      
      So sometimes we get the following error on Marvell BG2Q DMP board:
      
      [    1.559598] mmcblk0: error -84 transferring data, sector 0, nr 8, cmd
      response 0x900, card status 0xb00
      [    1.569314] mmcblk0: retrying using single block read
      [    1.575676] mmcblk0: error -84 transferring data, sector 2, nr 6, cmd
      response 0x900, card status 0x0
      [    1.585202] blk_update_request: I/O error, dev mmcblk0, sector 2
      [    1.591818] mmcblk0: error -84 transferring data, sector 3, nr 5, cmd
      response 0x900, card status 0x0
      [    1.601341] blk_update_request: I/O error, dev mmcblk0, sector 3
      
      This patch fixes this by adding the missing driver type setting.
      
      Fixes: bb8175a8 ("mmc: sdhci: clarify DDR timing mode ...")
      Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci-pci: set the clear transfer mode register quirk for O2Micro · 2b1e7d58
      Adam Lee authored
      commit 143b648d upstream.
      
      This patch fixes MMC not working on O2Micro/BayHub hosts, which
      require the transfer mode register to be cleared when sending a non-DMA
      command.
      Signed-off-by: Peter Guo <peter.guo@bayhubtech.com>
      Signed-off-by: Adam Lee <adam.lee@canonical.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: Don't dump core if the corefile would become world-readable. · 2be9c826
      Jann Horn authored
      commit 40f705a7 upstream.
      
      On a filesystem like vfat, all files are created with the same owner
      and mode independent of who created the file. When a vfat filesystem
      is mounted with root as owner of all files and read access for everyone,
      root's processes left world-readable coredumps on it (but other
      users' processes only left empty corefiles when given write access
      because of the uid mismatch).
      
      Given that the old behavior was inconsistent and insecure, I don't see
      a problem with changing it. Now, all processes refuse to dump core unless
      the resulting corefile will only be readable by their owner.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
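      The rule stated in the commit above (only dump core if the resulting file
      is readable by its owner alone) can be sketched as a pure predicate. This
      is a hypothetical userspace illustration: the function name and the exact
      permission mask are assumptions for the example and are not quoted from
      the kernel change.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Accept the corefile only if the filesystem kept the owner and mode
 * that were requested at creation time:
 *   - the file belongs to the dumping user, and
 *   - the owner can read and write it while group and others have no
 *     access (the owner's execute bit is ignored by the mask).
 */
static bool corefile_is_private(mode_t mode, uid_t file_owner, uid_t dumper)
{
    if (file_owner != dumper)
        return false;

    return (mode & 0677) == 0600;
}
```

      On a filesystem such as vfat, where the mount options dictate owner and
      mode, a check of this kind rejects the file after creation instead of
      trusting the mode passed to open.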
    • fs: if a coredump already exists, unlink and recreate with O_EXCL · 244d3c13
      Jann Horn authored
      commit fbb18169 upstream.
      
      It was possible for an attacking user to trick root (or another user) into
      writing his coredumps into an attacker-readable, pre-existing file using
      rename() or link(), causing the disclosure of secret data from the victim
      process' virtual memory.  Depending on the configuration, it was also
      possible to trick root into overwriting system files with coredumps.  Fix
      that issue by never writing coredumps into existing files.
      
      Requirements for the attack:
       - The attack only applies if the victim's process has a nonzero
         RLIMIT_CORE and is dumpable.
       - The attacker can trick the victim into coredumping into an
         attacker-writable directory D, either because the core_pattern is
         relative and the victim's cwd is attacker-writable or because an
         absolute core_pattern pointing to a world-writable directory is used.
       - The attacker has one of these:
        A: on a system with protected_hardlinks=0:
           execute access to a folder containing a victim-owned,
           attacker-readable file on the same partition as D, and the
           victim-owned file will be deleted before the main part of the attack
           takes place. (In practice, there are lots of files that fulfill
           this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
           This does not apply to most Linux systems because most distros set
           protected_hardlinks=1.
        B: on a system with protected_hardlinks=1:
           execute access to a folder containing a victim-owned,
           attacker-readable and attacker-writable file on the same partition
           as D, and the victim-owned file will be deleted before the main part
           of the attack takes place.
           (This seems to be uncommon.)
        C: on any system, independent of protected_hardlinks:
           write access to a non-sticky folder containing a victim-owned,
           attacker-readable file on the same partition as D
           (This seems to be uncommon.)
      
      The basic idea is that the attacker moves the victim-owned file to where
      he expects the victim process to dump its core.  The victim process dumps
      its core into the existing file, and the attacker reads the coredump from
      it.
      
      If the attacker can't move the file because he does not have write access
      to the containing directory, he can instead link the file to a directory
      he controls, then wait for the original link to the file to be deleted
      (because the kernel checks that the link count of the corefile is 1).
      
      A less reliable variant that requires D to be non-sticky works with link()
      and does not require deletion of the original link: link() the file into
      D, but then unlink() it directly before the kernel performs the link count
      check.
      
      On systems with protected_hardlinks=0, this variant allows an attacker to
      not only gain information from coredumps, but also clobber existing,
      victim-writable files with coredumps.  (This could theoretically lead to a
      privilege escalation.)
      Signed-off-by: Jann Horn <jann@thejh.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vmscan: fix increasing nr_isolated incurred by putback unevictable pages · 154dff39
      Jaewon Kim authored
      commit c54839a7 upstream.
      
      reclaim_clean_pages_from_list() assumes that shrink_page_list() returns
      the number of pages removed from the candidate list.  But shrink_page_list()
      puts back mlocked pages without passing them to the caller and without
      counting them as nr_reclaimed.  This increases nr_isolated.
      
      To fix this, this patch changes shrink_page_list() to pass unevictable
      pages back to the caller.  The caller will take care of those pages.
      
      Minchan said:
      
      It fixes two issues.
      
      1. With unevictable page, cma_alloc will be successful.
      
      Exactly speaking, cma_alloc of current kernel will fail due to
      unevictable pages.
      
      2. fix leaking of NR_ISOLATED counter of vmstat
      
      With it, too_many_isolated works.  Otherwise, it could hang until
      the process gets SIGKILL.
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Filter out spurious interrupts in PA-RISC irq handler · 804a6f7f
      Helge Deller authored
      commit b1b4e435 upstream.
      
      When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
      long way to go to find the right IRQ line, registering it, then registering the
      serial port and the irq handler for the serial port. During this phase spurious
      interrupts for the serial port may happen which then crashes the kernel because
      the action handler might not have been set up yet.
      
      So, basically it's a race condition between the serial port hardware and the
      CPU which sets up the necessary fields in the irq structs. The main reason for
      this race is, that we unmask the serial port irqs too early without having set
      up everything properly before (which isn't easily possible because we need the
      IRQ number to register the serial ports).
      
      This patch is a work-around for this problem. It adds checks to the CPU irq
      handler to verify if the IRQ action field has been initialized already. If not,
      we just skip this interrupt (which isn't critical for a serial port at bootup).
      The real fix would probably involve rewriting all PA-RISC specific IRQ code
      (for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
      the irq chips and proper irq enabling along this line.
      
      This bug has been in the PA-RISC port since the beginning, but the crashes
      happened very rarely with currently used hardware.  But on the latest machine
      which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
      1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
      crashed at every boot because of this race. So, without this patch the machine
      would currently be unusable.
      
      For the record, here is the flow logic:
      1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
      2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
      3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
      4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
      5. serial_init_chip() then registers the 8250 port.
      Problems:
      - The CPU irq shouldn't have been registered in step 4 yet, but only after step 5
      - If a serial irq happens after step 4 but before step 5 has finished, the kernel will crash
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
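      The work-around described in the commit above (skip an interrupt whose
      action handler has not been set up yet) can be sketched as follows. The
      'irq_slot' structure and the function names are hypothetical and much
      simpler than the kernel's irq descriptors.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*irq_handler_t)(int irq);

/* Toy descriptor: only the field the check depends on. */
struct irq_slot {
    irq_handler_t action;   /* NULL until the driver has registered */
};

/* Example handler that just counts invocations. */
static int serial_irq_count;

static void serial_handler(int irq)
{
    (void)irq;
    serial_irq_count++;
}

/* Returns true if the interrupt was delivered, false if it was skipped. */
static bool dispatch_irq(const struct irq_slot *slot, int irq)
{
    if (slot->action == NULL)
        return false;       /* spurious: handler not registered yet */

    slot->action(irq);
    return true;
}
```

      As the commit message notes, this only filters the symptom; the
      interrupt is still unmasked before the handler exists.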
    • parisc: Use double word condition in 64bit CAS operation · f39b5f92
      John David Anglin authored
      commit 1b59ddfc upstream.
      
      The attached change fixes the condition used in the "sub" instruction.
      A double word comparison is needed.  This fixes the 64-bit LWS CAS
      operation on 64-bit kernels.
      
      I can now enable 64-bit atomic support in GCC.
      
      Signed-off-by: John David Anglin <dave.anglin>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI,parisc: Enable 64-bit bus addresses on PA-RISC · 645305df
      Helge Deller authored
      commit e02a653e upstream.
      
      Commit 3a9ad0b4 ("PCI: Add pci_bus_addr_t") unconditionally introduced usage of
      64-bit PCI bus addresses on all 64-bit platforms which broke PA-RISC.
      
      It turned out that due to enabling the 64-bit addresses, the PCI logic decided
      to use the GMMIO instead of the LMMIO region. This commit simply disables
      registering the GMMIO and thus we fall back to use the LMMIO region as before.
      
      Reverts commit 45ea2a5f
      ("PCI: Don't use 64-bit bus addresses on PA-RISC")
      
      To: linux-parisc@vger.kernel.org
      Cc: linux-pci@vger.kernel.org
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtc: abx80x: fix RTC write bit · 1a64393e
      Mitja Spes authored
      commit 5f1b2f77 upstream.
      
      Fix RTC write bit as per application manual
      Signed-off-by: Mitja Spes <mitja@lxnav.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtc: s5m: fix to update ctrl register · 4530473f
      Joonyoung Shim authored
      commit ff02c044 upstream.
      
      According to the datasheet, the S2MPS13X and S2MPS14X should update the write
      buffer by setting the WUDR bit high after the ctrl register is written.
      
      If not, the ALARM interrupt of rtc-s5m doesn't fire the first time when I use
      the tools/testing/selftests/timers/rtctest.c test program and the hour format
      is set to 12-hour mode on the Odroid-XU3 board.
      
      One more issue is that the RTC doesn't keep time on the Odroid-XU3 board when I
      turn the board on after power-off, even if the RTC battery is connected. It can
      be solved by setting the WUDR & RUDR bits high at the same time after the
      RTC_CTRL register is written. It's the same as the condition for only writing
      ALARM registers, so this is for S2MPS14 only, and we should set the WUDR &
      A_UDR bits high on S2MPS13.
      
      I can't find any reasonable description of this kind of fix in the
      datasheet, but I can find similar code in the rtc driver source of the
      hardkernel kernel and the vendor kernel.
      Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
      Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtc: s3c: fix disabled clocks for alarm · 68912df9
      Joonyoung Shim authored
      commit 1fb1c35f upstream.
      
      The clock enable/disable code for the alarm was removed by
      commit 24e14554 ("drivers/rtc/rtc-s3c.c: delete duplicate clock
      control"), and the clocks are now disabled even if an alarm is set, so the
      alarm interrupt can't happen.
      
      The s3c_rtc_setaie function can be called several times with the 'enabled'
      argument having the same value, so it needs to check whether the clocks are
      enabled or not.
      Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
      Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
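      The point made in the commit above, that repeated calls with the same
      'enabled' value must not enable or disable the clocks twice, can be
      sketched with a small state flag. The structure, the field names and the
      counter standing in for the clock reference count are all hypothetical.

```c
#include <stdbool.h>

struct rtc_clk_state {
    bool clk_enabled;   /* does the alarm path currently hold the clock? */
    int  enable_count;  /* stand-in for the clock framework's refcount */
};

/* Idempotent: repeated calls with the same value change nothing. */
static void rtc_set_alarm_irq(struct rtc_clk_state *st, bool enabled)
{
    if (enabled && !st->clk_enabled) {
        st->enable_count++;         /* models enabling the clock */
        st->clk_enabled = true;
    } else if (!enabled && st->clk_enabled) {
        st->enable_count--;         /* models disabling the clock */
        st->clk_enabled = false;
    }
}
```

      Without the flag, two consecutive enable calls would leave the count at
      two and a single disable would never release the clock.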
    • SUNRPC: Lock the transport layer on shutdown · 85d1ba73
      Trond Myklebust authored
      commit 79234c3d upstream.
      
      Avoid all races with the connect/disconnect handlers by taking the
      transport lock.
      Reported-by: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
      Acked-by: Jeff Layton <jlayton@poochiereds.net>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • SUNRPC: Ensure that we wait for connections to complete before retrying · 77bb3c93
      Trond Myklebust authored
      commit 0fdea1e8 upstream.
      
      Commit 718ba5b8 moved the responsibility for unlocking the socket to
      xs_tcp_setup_socket, meaning that the socket will be unlocked before we
      know that it has finished trying to connect. The following patch is based on
      an initial patch by Russell King to ensure that we delay clearing the
      XPRT_CONNECTING flag until we either know that we failed to initiate
      a connection attempt, or the connection attempt itself failed.
      
      Fixes: 718ba5b8 ("SUNRPC: Add helpers to prevent socket create from racing")
      Reported-by: Russell King <linux@arm.linux.org.uk>
      Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Tested-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • SUNRPC: xs_reset_transport must mark the connection as disconnected · f160db25
      Trond Myklebust authored
      commit 0c78789e upstream.
      
      In case the reconnection attempt fails.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • SUNRPC: Fix a thinko in xs_connect() · fc56e115
      Trond Myklebust authored
      commit 99b1a4c3 upstream.
      
      It is rather pointless to test the value of transport->inet after
      calling xs_reset_transport(), since it will always be zero, and
      so we will never see any exponential back off behaviour.
      Also don't force early connections for SOFTCONN tasks. If the server
      disconnects us, we should respect the exponential backoff.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sunrpc: fix tracepoint Warning: unknown op '->' · 0e592fde
      Pratyush Anand authored
      commit 051ac384 upstream.
      
      `perf stat  -e sunrpc:svc_xprt_do_enqueue true` results in
      
      Warning: unknown op '->'
      Warning: [sunrpc:svc_xprt_do_enqueue] unknown op '->'
      
      Similar warning for svc_handle_xprt as well.
      
      Actually TP_printk() should never dereference an address saved in the ring
      buffer that points somewhere in the kernel. There's no guarantee that that
      object still exists (with the exception of static strings).
      
      Therefore change all the arguments for TP_printk(), so that it references
      values existing in the ring buffer only.
      
      While doing that, also fix another possible bug when argument xprt could be
      NULL and TP_fast_assign() tries to access its elements.
      Signed-off-by: Pratyush Anand <panand@redhat.com>
      Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Fixes: 83a712e0 "sunrpc: add some tracepoints around ..."
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e592fde
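
The rule stated in the tracepoint message above (format only from values stored in the ring buffer, and tolerate a NULL xprt when recording) can be modelled in plain C. This is a sketch under assumptions: `struct xprt`, `struct trace_rec`, `trace_assign` and `trace_format` are hypothetical stand-ins and do not use the kernel's TRACE_EVENT macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a live kernel object. */
struct xprt {
	unsigned long flags;
};

/* Hypothetical trace record: it stores copies of values, not pointers to follow. */
struct trace_rec {
	unsigned long flags;	/* copied when the event is recorded */
};

/* Plays the role of TP_fast_assign(): copy what is needed, tolerating NULL. */
void trace_assign(struct trace_rec *rec, const struct xprt *xprt)
{
	assert(rec != NULL);
	rec->flags = xprt ? xprt->flags : 0;
}

/* Plays the role of TP_printk(): it reads the record and nothing else. */
int trace_format(const struct trace_rec *rec, char *buf, size_t len)
{
	assert(rec != NULL);
	return snprintf(buf, len, "flags=0x%lx", rec->flags);
}
```

Because the formatter never touches the original object, the record stays printable after that object has been freed or changed.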
    • Trond Myklebust's avatar
      Revert "NFSv4: Remove incorrect check in can_open_delegated()" · d40d9de9
      Trond Myklebust authored
      commit 36319608 upstream.
      
      This reverts commit 4e379d36.
      
      This commit opens up a race between the recovery code and the open code.
      Reported-by: Olga Kornievskaia <aglo@umich.edu>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d40d9de9
    • Trond Myklebust's avatar
      NFSv4.1: Fix a protocol issue with CLOSE stateids · 3d5c6b90
      Trond Myklebust authored
      commit 4a1e2feb upstream.
      
      According to RFC5661 Section 18.2.4, CLOSE is supposed to return
      the zero stateid. This means that nfs_clear_open_stateid_locked()
      cannot assume that the result stateid will always match the 'other'
      field of the existing open stateid when trying to determine a race
      with a parallel OPEN.
      
      Instead, we look at the argument, and check for matches.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d5c6b90
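
The matching rule described in the CLOSE stateid message above can be illustrated with a simplified stateid. This is a sketch, assuming a stateid is a sequence number plus a 12-byte 'other' field; `struct stateid` and `close_matches_current` are hypothetical names, not the client's actual helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STATEID_OTHER_SIZE 12

/* Simplified stateid: a sequence number plus a 12-byte 'other' field. */
struct stateid {
	uint32_t seqid;
	unsigned char other[STATEID_OTHER_SIZE];
};

static bool stateid_match_other(const struct stateid *a, const struct stateid *b)
{
	return memcmp(a->other, b->other, STATEID_OTHER_SIZE) == 0;
}

/*
 * Decide whether a completed CLOSE refers to the open stateid currently
 * held. The comparison uses the stateid that was sent as the CLOSE
 * argument, because the result may legitimately be the all-zero stateid.
 */
bool close_matches_current(const struct stateid *current,
			   const struct stateid *close_arg)
{
	assert(current != NULL && close_arg != NULL);
	return stateid_match_other(current, close_arg);
}
```

Comparing against the returned stateid instead would fail whenever the server follows the RFC and returns zeros, which is the case the commit addresses.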
    • Trond Myklebust's avatar
      NFSv4.1/flexfiles: Fix a protocol error in layoutreturn · f6384199
      Trond Myklebust authored
      commit d1354907 upstream.
      
      According to the flexfiles protocol, the layoutreturn should specify an
      array of errors in the following format:
      
      struct ff_ioerr4 {
      	offset4        ffie_offset;
      	length4        ffie_length;
      	stateid4       ffie_stateid;
      	device_error4  ffie_errors<>;
      };
      
      This patch fixes up the code to ensure that our ffie_errors is indeed
      encoded as an array (albeit with only a single entry).
      Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6384199
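
In XDR, a variable-length array such as `ffie_errors<>` is sent as a 32-bit element count followed by the elements, so even a single error has to be preceded by a count of 1. The sketch below shows that framing only. The helper names are hypothetical and the elements are treated as already-encoded opaque blobs, which is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value in XDR (big-endian) order and return the new position. */
static unsigned char *xdr_put_u32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
	return p + 4;
}

/*
 * Encode a variable-length array: the element count first, then the
 * elements. 'elem_len' must be a multiple of 4, as XDR requires, and the
 * caller must provide a buffer of at least 4 + count * elem_len bytes.
 * Returns the number of bytes written.
 */
size_t xdr_encode_array(unsigned char *buf, const unsigned char *elems,
			uint32_t count, size_t elem_len)
{
	unsigned char *p = xdr_put_u32(buf, count);

	assert(elem_len % 4 == 0);
	memcpy(p, elems, (size_t)count * elem_len);
	return 4 + (size_t)count * elem_len;
}
```

Omitting the count word and writing the lone element directly is the kind of encoding error the commit describes.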
    • Peng Tao's avatar
      NFS41/flexfiles: zero out DS write wcc · b5a6dec7
      Peng Tao authored
      commit 54204010 upstream.
      
      We do not want to update inode attributes with DS values.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5a6dec7
    • Trond Myklebust's avatar
      NFSv4: Force a post-op attribute update when holding a delegation · 66bfdda4
      Trond Myklebust authored
      commit aaae3f00 upstream.
      
      If the ctime, mtime or change attribute has changed because
      of an operation we initiated, we should make sure that we force
      an attribute update. However, we do not want to mark the page cache
      for revalidation.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      66bfdda4
    • Peng Tao's avatar
      NFS41/flexfiles: update inode after write finishes · 73e8e7b2
      Peng Tao authored
      commit 69f230d9 upstream.
      
      Otherwise we break fstest case tests/read_write/mctime.t
      
      Does files layout need the same fix as well?
      
      Cc: Anna Schumaker <anna.schumaker@netapp.com>
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      73e8e7b2
    • Trond Myklebust's avatar
      NFS: nfs_set_pgio_error sometimes misses errors · 8d1920be
      Trond Myklebust authored
      commit e9ae58ae upstream.
      
      We should ensure that we always set the pgio_header's error field
      if a READ or WRITE RPC call returns an error. The current code depends
      on 'hdr->good_bytes' always being initialised to a large value, which
      is not always done correctly by callers.
      When this happens, applications may end up missing important errors.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d1920be
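
One way to record errors without relying on 'good_bytes' having been preset to a large value, as the nfs_set_pgio_error message above asks for, is to track explicitly whether an error has been seen. The sketch below is a userspace model: `struct pgio_hdr`, its `has_error` flag and `pgio_set_error` are hypothetical and only approximate the kernel's header structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified I/O header. */
struct pgio_hdr {
	long long io_start;	/* file offset where this I/O begins */
	long long good_bytes;	/* bytes before the first recorded error */
	int error;		/* error at the lowest offset, 0 if none */
	bool has_error;
};

/*
 * Record an error at file offset 'pos'. The first error is always stored;
 * a later report replaces it only if it lies at a lower offset.
 */
void pgio_set_error(struct pgio_hdr *hdr, int error, long long pos)
{
	assert(hdr != NULL);
	if (!hdr->has_error || pos < hdr->io_start + hdr->good_bytes) {
		hdr->has_error = true;
		hdr->good_bytes = pos - hdr->io_start;
		hdr->error = error;
	}
}
```

With the explicit flag, a header whose `good_bytes` starts at zero still reports the first error instead of silently dropping it.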
    • Kinglong Mee's avatar
      NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client · 87fbed41
      Kinglong Mee authored
      commit 18e3b739 upstream.
      
      ---Steps to Reproduce--
      <nfs-server>
      # cat /etc/exports
      /nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
      /nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)
      
      <nfs-client>
      # mount -t nfs nfs-server:/nfs/ /mnt/
      # ll /mnt/*/
      
      <nfs-server>
      # cat /etc/exports
      /nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
      /nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
      # service nfs restart
      
      <nfs-client>
      # ll /mnt/*/    --->>>>> oops here
      
      [ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
      [ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
      [ 5123.104131] Oops: 0000 [#1]
      [ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
      [ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
      [ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
      [ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
      [ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
      [ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
      [ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
      [ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
      [ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
      [ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
      [ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
      [ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
      [ 5123.112888] Stack:
      [ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
      [ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
      [ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
      [ 5123.115264] Call Trace:
      [ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
      [ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
      [ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
      [ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
      [ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
      [ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
      [ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
      [ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
      [ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
      [ 5123.122308]  RSP <ffff88005877fdb8>
      [ 5123.122942] CR2: 0000000000000000
      
      Fixes: ec011fe8 ("NFS: Introduce a vector of migration recovery ops")
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      87fbed41
    • Trond Myklebust's avatar
      NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked() · 0bdce6a8
      Trond Myklebust authored
      commit 6f536936 upstream.
      
      - Switch back to using list_for_each_entry(). Fixes an incorrect test
        for list NULL termination.
      - Do not assume that lists are sorted.
      - Finally, consider an existing entry to match if it consists of a subset
        of the addresses in the new entry.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bdce6a8
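
The matching rule listed in the _same_data_server_addrs_locked() message above (no ordering assumption, and an existing entry matches when its addresses are a subset of the new entry's) can be sketched with plain arrays. The kernel walks linked lists of socket addresses. The string arrays and the names `addr_in_list` and `addrs_subset` here are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'addr' occurs anywhere in 'list'; no ordering is assumed. */
static bool addr_in_list(const char *addr, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(addr, list[i]) == 0)
			return true;
	return false;
}

/*
 * True if the existing entry is non-empty and every one of its addresses
 * also appears in the new entry, i.e. it is a subset of the new entry.
 */
bool addrs_subset(const char *const *existing, size_t n_existing,
		  const char *const *new_addrs, size_t n_new)
{
	if (n_existing == 0)
		return false;
	for (size_t i = 0; i < n_existing; i++)
		if (!addr_in_list(existing[i], new_addrs, n_new))
			return false;
	return true;
}
```

The nested search is quadratic, which is acceptable for the handful of addresses a data server typically advertises and avoids any dependence on list order.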
    • Trond Myklebust's avatar
      NFS: Don't let the ctime override attribute barriers. · e204275a
      Trond Myklebust authored
      commit 7c2dad99 upstream.
      
      Chuck reports seeing cases where a GETATTR that happens to race
      with an asynchronous WRITE is overriding the file size, despite
      the attribute barrier being set by the writeback code.
      
      The culprit turns out to be the check in nfs_ctime_need_update(),
      which sees that the ctime is newer than the cached ctime, and
      assumes that it is safe to override the attribute barrier.
      This patch removes that override, and ensures that attribute
      barriers are always respected.
      Reported-by: Chuck Lever <chuck.lever@oracle.com>
      Fixes: a08a8cd3 ("NFS: Add attribute update barriers to NFS writebacks")
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e204275a
    • NeilBrown's avatar
      NFSv4: don't set SETATTR for O_RDONLY|O_EXCL · 7bc97ee9
      NeilBrown authored
      commit efcbc04e upstream.
      
      It is unusual to combine the open flags O_RDONLY and O_EXCL, but
      it appears that LibreOffice does just that.
      
      [pid  3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
      [pid  3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>
      
      NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
      probably to reset the timestamps.
      
      When it was an O_RDONLY open, the SETATTR command does not
      identify any actual attributes to change.
      If no delegation was provided to the open, the SETATTR uses the
      all-zeros stateid and the request is accepted (at least by the
      Linux NFS server - no harm, no foul).
      
      If a read-delegation was provided, this is used in the SETATTR
      request, and a NetApp filer will justifiably claim
      NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
      to retry - indefinitely.
      
      So only treat O_EXCL specially if O_CREAT was also given.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bc97ee9
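
The condition stated at the end of the O_RDONLY|O_EXCL message above, "only treat O_EXCL specially if O_CREAT was also given", comes down to a two-bit mask test. A minimal sketch follows; the helper name `is_exclusive_create` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * True only for an exclusive create, i.e. when both O_CREAT and O_EXCL are
 * set. O_EXCL on its own (for example O_RDONLY|O_EXCL) gets no special
 * treatment.
 */
bool is_exclusive_create(int open_flags)
{
	return (open_flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);
}
```

Testing `open_flags & O_EXCL` alone would be true for the open() call shown in the strace output, which is how the unwanted SETATTR was triggered.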
    • Jeff Layton's avatar
      nfsd: ensure that delegation stateid hash references are only put once · ea056d3d
      Jeff Layton authored
      commit 3fcbbd24 upstream.
      
      It's possible that a DELEGRETURN could race with (e.g.) client expiry,
      in which case we could end up putting the delegation hash reference more
      than once.
      
      Have unhash_delegation_locked return a bool that indicates whether it
      was already unhashed. In the case of destroy_delegation we only
      conditionally put the hash reference if that returns true.
      
      The other callers of unhash_delegation_locked call it while walking
      list_heads that shouldn't yet be detached. If we find that it doesn't
      return true in those cases, then throw a WARN_ON as that indicates that
      we have a partially hashed delegation, and that something is likely very
      wrong.
      Tested-by: Andrew W Elble <aweits@rit.edu>
      Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea056d3d
    • Jeff Layton's avatar
      nfsd: ensure that the ol stateid hash reference is only put once · 0940ed48
      Jeff Layton authored
      commit e8568739 upstream.
      
      When an open or lock stateid is hashed, we take an extra reference to
      it. When we unhash it, we drop that reference. The code, however, does
      not properly account for the case where we have two callers concurrently
      trying to unhash the stateid. This can lead to list corruption and the
      hash reference being put more than once.
      
      Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
      list_head, and then testing to see if that list_head is empty before
      releasing the hash reference. This means that some of the unhashing
      wrappers now become bool return functions so we can test to see whether
      the stateid was unhashed before we put the reference.
      Reported-by: Andrew W Elble <aweits@rit.edu>
      Tested-by: Andrew W Elble <aweits@rit.edu>
      Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
      Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0940ed48
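
The pattern described in the ol stateid message above (unlink with list_del_init, then use list_empty on the entry to tell whether it was still hashed) can be reproduced with a small userspace list. The list helpers below mimic the kernel's by name but are reimplemented here. `struct stid`, `unhash_stid` and `release_stid` are hypothetical, and locking is left out on the assumption that callers already hold the relevant lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace version of a circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void list_init(struct list_head *h) { h->next = h->prev = h; }
static inline bool list_empty(const struct list_head *h) { return h->next == h; }

static inline void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink the entry and leave it pointing at itself, so it reads as empty. */
static inline void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical stateid with one reference held on behalf of the hash. */
struct stid {
	struct list_head perfile;
	int refcount;
};

/* Returns true only for the caller that actually removed the stateid. */
bool unhash_stid(struct stid *s)
{
	assert(s != NULL);
	if (list_empty(&s->perfile))
		return false;
	list_del_init(&s->perfile);
	return true;
}

/* Drop the hash reference exactly once, however often this is called. */
void release_stid(struct stid *s)
{
	if (unhash_stid(s))
		s->refcount--;
}
```

A second call to `release_stid` finds the entry already self-linked and leaves the reference count alone, which is the double-put the commit guards against.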
    • Kinglong Mee's avatar
      nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug · 8751006a
      Kinglong Mee authored
      commit 6896f15a upstream.
      
      Currently we'll respond correctly to a request for either
      FS_LAYOUT_TYPES or LAYOUT_TYPES, but not to a request for both
      attributes simultaneously.
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8751006a
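
The requirement in the FS_LAYOUT_TYPES/LAYOUT_TYPES message above is that each requested attribute gets its own reply value, including when both are requested together. The sketch below shows that shape with independent checks. The bit values, the flat output array and the name `encode_layout_attrs` are all assumptions for illustration, and the real encoder writes XDR rather than filling an array.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; consult the kernel headers for the real ones. */
#define WORD1_FS_LAYOUT_TYPES (1u << 30)
#define WORD2_LAYOUT_TYPES    (1u << 0)

/*
 * Append the layout type once for each attribute that was requested and
 * return the number of values written to 'out' (0, 1 or 2).
 */
int encode_layout_attrs(uint32_t bmval1, uint32_t bmval2,
			uint32_t layout_type, uint32_t out[2])
{
	int n = 0;

	if (bmval1 & WORD1_FS_LAYOUT_TYPES)
		out[n++] = layout_type;
	if (bmval2 & WORD2_LAYOUT_TYPES)
		out[n++] = layout_type;
	return n;
}
```

Folding the two tests into a single `||` condition would emit one value when two were requested, which matches the failure the message describes.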
    • Trond Myklebust's avatar
      NFSv4/pnfs: Ensure we don't miss a file extension · 9b6d61ed
      Trond Myklebust authored
      commit 2b83d3de upstream.
      
      pNFS writes don't return attributes; however, that doesn't mean that we
      should ignore the fact that they may be extending the file. This patch
      ensures that if a write is seen to extend the file, then we always set
      an attribute barrier, and update the cached file size.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b6d61ed
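
The behaviour described in the file extension message above (when a write ends past the cached size, grow the cached size and set an attribute barrier) can be sketched as follows. This is a simplified model: `struct cached_inode`, its counter-style `barrier` field and `note_write_done` are hypothetical and do not reflect the client's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached inode state. */
struct cached_inode {
	long long size;		/* cached file size */
	unsigned long barrier;	/* attribute replies older than this are ignored */
};

/*
 * Account for a completed write of 'count' bytes at 'offset'. If it ends
 * beyond the cached size, grow the cached size and move the barrier to
 * 'now' so that older attribute replies cannot shrink the size again.
 * Returns true if the file was extended.
 */
bool note_write_done(struct cached_inode *ino, long long offset,
		     long long count, unsigned long now)
{
	long long end = offset + count;

	assert(ino != NULL);
	if (end <= ino->size)
		return false;
	ino->size = end;
	ino->barrier = now;
	return true;
}
```

A write that stays inside the current size changes nothing, so the barrier only moves when there is a new size to protect.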
    • Filipe Manana's avatar
      Btrfs: check if previous transaction aborted to avoid fs corruption · fe995635
      Filipe Manana authored
      commit 1f9b8c8f upstream.
      
      While we are committing a transaction, it's possible the previous one is
      still finishing its commit, and therefore we wait for it to finish first.
      However, we were not checking if that previous transaction ended up getting
      aborted after we waited for it to commit, so we ended up committing the
      current transaction, which can lead to fs corruption because the new
      superblock can point to trees that have had one or more nodes/leaves that
      were never durably persisted.
      The following sequence diagram exemplifies how this is possible:
      
                CPU 0                                                        CPU 1
      
        transaction N starts
      
        (...)
      
        btrfs_commit_transaction(N)
      
          cur_trans->state = TRANS_STATE_COMMIT_START;
          (...)
          cur_trans->state = TRANS_STATE_COMMIT_DOING;
          (...)
      
          cur_trans->state = TRANS_STATE_UNBLOCKED;
          root->fs_info->running_transaction = NULL;
      
                                                                    btrfs_start_transaction()
                                                                       --> starts transaction N + 1
      
          btrfs_write_and_wait_transaction(trans, root);
            --> starts writing all new or COWed ebs created
                at transaction N
      
                                                                    creates some new ebs, COWs some
                                                                    existing ebs but doesn't COW or
                                                                    delete eb X
      
                                                                    btrfs_commit_transaction(N + 1)
                                                                      (...)
                                                                      cur_trans->state = TRANS_STATE_COMMIT_START;
                                                                      (...)
                                                                      wait_for_commit(root, prev_trans);
                                                                        --> prev_trans == transaction N
      
          btrfs_write_and_wait_transaction() continues
          writing ebs
             --> fails writing eb X, we abort transaction N
                 and set bit BTRFS_FS_STATE_ERROR on
                 fs_info->fs_state, so no new transactions
                 can start after setting that bit
      
             cleanup_transaction()
               btrfs_cleanup_one_transaction()
                 wakes up task at CPU 1
      
                                                                      continues, doesn't abort because
                                                                      cur_trans->aborted (transaction N + 1)
                                                                      is zero, and no checks for bit
                                                                      BTRFS_FS_STATE_ERROR in fs_info->fs_state
                                                                      are made
      
                                                                      btrfs_write_and_wait_transaction(trans, root);
                                                                        --> succeeds, no errors during writeback
      
                                                                      write_ctree_super(trans, root, 0);
                                                                        --> succeeds
                                                                        --> we have now a superblock that points us
                                                                            to some root that uses eb X, which was
                                                                            never written to disk
      
      In this scenario, future attempts to read eb X from disk result in an
      error message like "parent transid verify failed on X wanted Y found Z".
      
      So fix this by aborting the current transaction if, after waiting for the
      previous transaction, we verify that it was aborted.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe995635
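
The check added by the Btrfs fix above, aborting the current commit when the previous transaction turns out to have been aborted, can be modelled without any of the real transaction machinery. In this sketch `struct txn`, the no-op `wait_for_commit` and `commit_transaction` are hypothetical reductions of their kernel namesakes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much reduced transaction state. */
struct txn {
	int aborted;		/* 0, or a negative errno once aborted */
	struct txn *prev;	/* previous transaction, if still committing */
};

/* Stand-in for waiting on the previous commit; a real one would block. */
static void wait_for_commit(struct txn *t) { (void)t; }

/*
 * Commit 'cur'. After waiting for the previous transaction, check whether
 * it was aborted and, if so, abort the current one instead of writing a
 * superblock that may reference blocks that never reached disk.
 */
int commit_transaction(struct txn *cur)
{
	assert(cur != NULL);
	if (cur->prev) {
		wait_for_commit(cur->prev);
		if (cur->prev->aborted) {
			cur->aborted = cur->prev->aborted;
			return cur->aborted;
		}
	}
	/* ... the transaction and the superblock would be written here ... */
	return 0;
}
```

The check sits after the wait because, as the diagram shows, transaction N is aborted while the task committing N + 1 is still sleeping.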