1. 24 Jun, 2016 17 commits
  2. 08 Jun, 2016 23 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.6.2 · 536b1f59
      Greg Kroah-Hartman authored
      536b1f59
    • Jon Hunter's avatar
      regulator: Fix deadlock during regulator registration · 18055cbc
      Jon Hunter authored
      commit a2151374 upstream.
      
      Commit 5e3ca2b3 ("regulator: Try to resolve regulators supplies on
      registration") added a call to regulator_resolve_supply() within
      regulator_register() where the regulator_list_mutex is held. This causes
      a deadlock to occur on the Tegra114 Dalmore board when the palmas PMIC
      is registered because regulator_register_resolve_supply() calls
      regulator_dev_lookup() which may try to acquire the regulator_list_mutex
      again.
      
      Fix this by releasing the mutex before calling
      regulator_register_resolve_supply() and update the error exit path to
      ensure the mutex is released on an error.
      
      [Made commit message more legible -- broonie]
      Signed-off-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18055cbc
    • Mike Marciniszyn's avatar
      IB/hfi1: Fix hard lockup due to not using save/restore spin lock · 55d37507
      Mike Marciniszyn authored
      commit 7049de65 upstream.
      
      Commit b9b06cb6
      ("IB/hfi1: Fix missing lock/unlock in verbs drain callback")
      added a spin lock.
      
      Unfortunately, the new lock code can be called from a base
      level interrupt state, and an interrupt that can get stacked
      will attempt to get the same lock.
      
      Fix by using the flag save/restore spin lock variation.
      
      Cc: stable@vger.kernel.org # 4.6+
      Reviewed-by: default avatarSebastian Sanchez <sebastian.sanchez@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55d37507
    • Arnd Bergmann's avatar
      drm: msm: remove unused variable · 1ce76644
      Arnd Bergmann authored
      commit 6979cd54 upstream.
      
      A recent cleanup removed the only user of the 'kms' variable in
      msm_preclose(), causing a harmless compiler warning:
      
      drivers/gpu/drm/msm/msm_drv.c: In function 'msm_preclose':
      drivers/gpu/drm/msm/msm_drv.c:468:18: error: unused variable 'kms' [-Werror=unused-variable]
      
      This removes the variable as well.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 4016260b ("drm/msm: fix bug after preclose removal")
      Signed-off-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ce76644
    • Dave Chinner's avatar
      xfs: skip stale inodes in xfs_iflush_cluster · 9b09e967
      Dave Chinner authored
      commit 7d3aa7fe upstream.
      
      We don't write back stale inodes so we should skip them in
      xfs_iflush_cluster, too.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b09e967
    • Dave Chinner's avatar
      xfs: fix inode validity check in xfs_iflush_cluster · 03b58aaa
      Dave Chinner authored
      commit 51b07f30 upstream.
      
      Some careless idiot(*) wrote crap code in commit 1a3e8f3d ("xfs:
      convert inode cache lookups to use RCU locking") back in late 2010,
      and so xfs_iflush_cluster checks the wrong inode for whether it is
      still valid under RCU protection. Fix it to lock and check the
      correct inode.
      
      (*) Careless-idiot: Dave Chinner <dchinner@redhat.com>
      Discovered-by: default avatarBrain Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03b58aaa
    • Dave Chinner's avatar
      xfs: xfs_iflush_cluster fails to abort on error · 94e7cf3f
      Dave Chinner authored
      commit b1438f47 upstream.
      
      When a failure due to an inode buffer occurs, the error handling
      fails to abort the inode writeback correctly. This can result in the
      inode being reclaimed whilst still in the AIL, leading to
      use-after-free situations as well as filesystems that cannot be
      unmounted as the inode log items left in the AIL never get removed.
      
      Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
      the inode flush being aborted correctly.
      Reported-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Diagnosed-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Tested-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94e7cf3f
    • Dave Chinner's avatar
      xfs: remove xfs_fs_evict_inode() · 5b0d2a62
      Dave Chinner authored
      commit 8179c036 upstream.
      
      Joe Lawrence reported a list_add corruption with 4.6-rc1 when
      testing some custom md administration code that made it's own
      block device nodes for the md array. The simple test loop of:
      
      for i in {0..100}; do
      	mknod --mode=0600 $tmp/tmp_node b $MAJOR $MINOR
      	mdadm --detail --export $tmp/tmp_node > /dev/null
      	rm -f $tmp/tmp_node
      done
      
      
      Would produce this warning in bd_acquire() when mdadm opened the
      device node:
      
      list_add double add: new=ffff88043831c7b8, prev=ffff8804380287d8, next=ffff88043831c7b8.
      
      And then produce this from bd_forget from kdevtmpfs evicting a block
      dev inode:
      
      list_del corruption. prev->next should be ffff8800bb83eb10, but was ffff88043831c7b8
      
      This is a regression caused by commit c19b3b05 ("xfs: mode di_mode
      to vfs inode"). The issue is that xfs_inactive() frees the
      unlinked inode, and the above commit meant that this freeing zeroed
      the mode in the struct inode. The problem is that after evict() has
      called ->evict_inode, it expects the i_mode to be intact so that it
      can call bd_forget() or cd_forget() to drop the reference to the
      block device inode attached to the XFS inode.
      
      In reality, the only thing we do in xfs_fs_evict_inode() that is not
      generic is call xfs_inactive(). We can move the xfs_inactive() call
      to xfs_fs_destroy_inode() without any problems at all, and this
      will leave the VFS inode intact until it is completely done with it.
      
      So, remove xfs_fs_evict_inode(), and do the work it used to do in
      ->destroy_inode instead.
      Reported-by: default avatarJoe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b0d2a62
    • Dave Chinner's avatar
      xfs: Don't wrap growfs AGFL indexes · 3abc7b73
      Dave Chinner authored
      commit ad747e3b upstream.
      
      Commit 96f859d5 ("libxfs: pack the agfl header structure so
      XFS_AGFL_SIZE is correct") allowed the freelist to use the empty
      slot at the end of the freelist on 64 bit systems that was not
      being used due to sizeof() rounding up the structure size.
      
      This has caused versions of xfs_repair prior to 4.5.0 (which also
      has the fix) to report this as a corruption once the filesystem has
      been grown. Older kernels can also have problems (seen from a whacky
      container/vm management environment) mounting filesystems grown on a
      system with a newer kernel than the vm/container it is deployed on.
      
      To avoid this problem, change the initial free list indexes not to
      wrap across the end of the AGFL, hence avoiding the initialisation
      of agf_fllast to the last index in the AGFL.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3abc7b73
    • Eric Sandeen's avatar
      xfs: disallow rw remount on fs with unknown ro-compat features · 88f4d3b7
      Eric Sandeen authored
      commit d0a58e83 upstream.
      
      Today, a kernel which refuses to mount a filesystem read-write
      due to unknown ro-compat features can still transition to read-write
      via the remount path.  The old kernel is most likely none the wiser,
      because it's unaware of the new feature, and isn't using it.  However,
      writing to the filesystem may well corrupt metadata related to that
      new feature, and moving to a newer kernel which understand the feature
      will have problems.
      
      Right now the only ro-compat feature we have is the free inode btree,
      which showed up in v3.16.  It would be good to push this back to
      all the active stable kernels, I think, so that if anyone is using
      newer mkfs (which enables the finobt feature) with older kernel
      releases, they'll be protected.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBill O'Donnell <billodo@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88f4d3b7
    • Arnd Bergmann's avatar
      gcov: disable tree-loop-im to reduce stack usage · 579c08b8
      Arnd Bergmann authored
      commit c87bf431 upstream.
      
      Enabling CONFIG_GCOV_PROFILE_ALL produces us a lot of warnings like
      
      lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx':
      lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      After some investigation, I found that this behavior started with gcc-4.9,
      and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702.
      A suggested workaround for it is to use the -fno-tree-loop-im
      flag that turns off one of the optimization stages in gcc, so the
      code runs a little slower but does not use excessive amounts
      of stack.
      
      We could make this conditional on the gcc version, but I could not
      find an easy way to do this in Kbuild and the benefit would be
      fairly small, given that most of the gcc version in production are
      affected now.
      
      I'm marking this for 'stable' backports because it addresses a bug
      with code generation in gcc that exists in all kernel versions
      with the affected gcc releases.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      579c08b8
    • Kirill A. Shutemov's avatar
      mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap() · e2a120ab
      Kirill A. Shutemov authored
      commit 0798d3c0 upstream.
      
      If page_move_anon_rmap() is refiling a pmd-splitted THP mapped in a tail
      page from a pte, the "address" must be THP aligned in order for the
      page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds.
      
      Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com
      Fixes: 6d0a07ed ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Tested-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2a120ab
    • Srinivas Pandruvada's avatar
      scripts/package/Makefile: rpmbuild add support of RPMOPTS · 6d7ceda7
      Srinivas Pandruvada authored
      commit 65a9f31c upstream.
      
      After commit 21a59991 ("scripts/package/Makefile: rpmbuild is needed
      for rpm targets"), it is no longer possible to specify RPMOPTS.
      For example, we can no longer able to control _topdir using the following
      make command.
      make RPMOPTS="--define '_topdir /home/xyz/workspace/'" binrpm-pkg
      
      Fixes: 21a59991 ("scripts/package/Makefile: rpmbuild is needed for rpm targets")
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d7ceda7
    • Ville Syrjälä's avatar
      dma-debug: avoid spinlock recursion when disabling dma-debug · 923dd9f8
      Ville Syrjälä authored
      commit 3017cd63 upstream.
      
      With netconsole (at least) the pr_err("...  disablingn") call can
      recurse back into the dma-debug code, where it'll try to grab
      free_entries_lock again.  Avoid the problem by doing the printk after
      dropping the lock.
      
      Link: http://lkml.kernel.org/r/1463678421-18683-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      923dd9f8
    • Rafael J. Wysocki's avatar
      PM / sleep: Handle failures in device_suspend_late() consistently · 41103c79
      Rafael J. Wysocki authored
      commit 3a17fb32 upstream.
      
      Grygorii Strashko reports:
      
       The PM runtime will be left disabled for the device if its
       .suspend_late() callback fails and async suspend is not allowed
       for this device. In this case device will not be added in
       dpm_late_early_list and dpm_resume_early() will ignore this
       device, as result PM runtime will be disabled for it forever
       (side effect: after 8 subsequent failures for the same device
       the PM runtime will be reenabled due to disable_depth overflow).
      
      To fix this problem, add devices to dpm_late_early_list regardless
      of whether or not device_suspend_late() returns errors for them.
      
      That will ensure failures in there to be handled consistently for
      all devices regardless of their async suspend/resume status.
      Reported-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41103c79
    • Weston Andros Adamson's avatar
      nfs: avoid race that crashes nfs_init_commit · b64a69a6
      Weston Andros Adamson authored
      commit ade8febd upstream.
      
      Since the patch "NFS: Allow multiple commit requests in flight per file"
      we can run multiple simultaneous commits on the same inode.  This
      introduced a race over collecting pages to commit that made it possible
      to call nfs_init_commit() with an empty list - which causes crashes like
      the one below.
      
      The fix is to catch this race and avoid calling nfs_init_commit and
      initiate_commit when there is no work to do.
      
      Here is the crash:
      
      [600522.076832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      [600522.078475] IP: [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
      [600522.078745] PGD 4272b1067 PUD 4272cb067 PMD 0
      [600522.078972] Oops: 0000 [#1] SMP
      [600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3
      [600522.081380]  vmw_pvscsi ata_generic pata_acpi
      [600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1
      [600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
      [600522.082814] task: ffff8800bbbfa780 ti: ffff88042ae84000 task.ti: ffff88042ae84000
      [600522.083378] RIP: 0010:[<ffffffffa0479e72>]  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
      [600522.083973] RSP: 0018:ffff88042ae87438  EFLAGS: 00010246
      [600522.084571] RAX: 0000000000000000 RBX: ffff880003485e40 RCX: ffff88042ae87588
      [600522.085188] RDX: 0000000000000000 RSI: ffff88042ae874b0 RDI: ffff880003485e40
      [600522.085756] RBP: ffff88042ae87448 R08: ffff880003486010 R09: ffff88042ae874b0
      [600522.086332] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88042ae872d0
      [600522.086905] R13: ffff88042ae874b0 R14: ffff880003485e40 R15: ffff88042704c840
      [600522.087484] FS:  00007f4728ff2740(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
      [600522.088070] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [600522.088663] CR2: 0000000000000040 CR3: 000000042b6aa000 CR4: 00000000001406e0
      [600522.089327] Stack:
      [600522.089926]  0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa
      [600522.090549]  0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930
      [600522.091169]  ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840
      [600522.091789] Call Trace:
      [600522.092420]  [<ffffffffa04f09fa>] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4]
      [600522.093052]  [<ffffffffa0563d80>] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles]
      [600522.093696]  [<ffffffffa0562645>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
      [600522.094359]  [<ffffffffa047bc78>] nfs_generic_commit_list+0xe8/0x120 [nfs]
      [600522.095032]  [<ffffffffa047bd6a>] nfs_commit_inode+0xba/0x110 [nfs]
      [600522.095719]  [<ffffffffa046ac54>] nfs_release_page+0x44/0xd0 [nfs]
      [600522.096410]  [<ffffffff811a8122>] try_to_release_page+0x32/0x50
      [600522.097109]  [<ffffffff811bd4f1>] shrink_page_list+0x961/0xb30
      [600522.097812]  [<ffffffff811bdced>] shrink_inactive_list+0x1cd/0x550
      [600522.098530]  [<ffffffff811bea65>] shrink_lruvec+0x635/0x840
      [600522.099250]  [<ffffffff811bed60>] shrink_zone+0xf0/0x2f0
      [600522.099974]  [<ffffffff811bf312>] do_try_to_free_pages+0x192/0x470
      [600522.100709]  [<ffffffff811bf6ca>] try_to_free_pages+0xda/0x170
      [600522.101464]  [<ffffffff811b2198>] __alloc_pages_nodemask+0x588/0x970
      [600522.102235]  [<ffffffff811fbbd5>] alloc_pages_vma+0xb5/0x230
      [600522.103000]  [<ffffffff813a1589>] ? cpumask_any_but+0x39/0x50
      [600522.103774]  [<ffffffff811d6115>] wp_page_copy.isra.55+0x95/0x490
      [600522.104558]  [<ffffffff810e3438>] ? __wake_up+0x48/0x60
      [600522.105357]  [<ffffffff811d7d3b>] do_wp_page+0xab/0x4f0
      [600522.106137]  [<ffffffff810a1bbb>] ? release_task+0x36b/0x470
      [600522.106902]  [<ffffffff8126dbd7>] ? eventfd_ctx_read+0x67/0x1c0
      [600522.107659]  [<ffffffff811da2a8>] handle_mm_fault+0xc78/0x1900
      [600522.108431]  [<ffffffff81067ef1>] __do_page_fault+0x181/0x420
      [600522.109173]  [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280
      [600522.109893]  [<ffffffff810681c0>] do_page_fault+0x30/0x80
      [600522.110594]  [<ffffffff81024f36>] ? syscall_trace_leave+0xc6/0x120
      [600522.111288]  [<ffffffff81790a58>] page_fault+0x28/0x30
      [600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08
      [600522.113343] RIP  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
      [600522.114003]  RSP <ffff88042ae87438>
      [600522.114636] CR2: 0000000000000040
      
      Fixes: af7cf057 (NFS: Allow multiple commit requests in flight per file)
      Signed-off-by: default avatarWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b64a69a6
    • Nicolai Stange's avatar
      ext4: silence UBSAN in ext4_mb_init() · 35b5ea70
      Nicolai Stange authored
      commit 935244cd upstream.
      
      Currently, in ext4_mb_init(), there's a loop like the following:
      
        do {
          ...
          offset += 1 << (sb->s_blocksize_bits - i);
          i++;
        } while (i <= sb->s_blocksize_bits + 1);
      
      Note that the updated offset is used in the loop's next iteration only.
      
      However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
      the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
      and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
        shift exponent 4294967295 is too large for 32-bit type 'int'
        [...]
        Call Trace:
         [<ffffffff818c4d25>] dump_stack+0xbc/0x117
         [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390
         [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0
         [<ffffffff814293c7>] ? create_cache+0x57/0x1f0
         [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0
         [<ffffffff821c2168>] ? mutex_lock+0x38/0x60
         [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50
         [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0
         [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0
         [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0
         [...]
      
      Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of offset is never used again.
      
      Silence UBSAN by introducing another variable, offset_incr, holding the
      next increment to apply to offset and adjust that one by right shifting it
      by one position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      35b5ea70
    • Nicolai Stange's avatar
      ext4: address UBSAN warning in mb_find_order_for_block() · e7bc5145
      Nicolai Stange authored
      commit b5cb316c upstream.
      
      Currently, in mb_find_order_for_block(), there's a loop like the following:
      
        while (order <= e4b->bd_blkbits + 1) {
          ...
          bb += 1 << (e4b->bd_blkbits - order);
        }
      
      Note that the updated bb is used in the loop's next iteration only.
      
      However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
      the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
        shift exponent -1 is negative
        [...]
        Call Trace:
         [<ffffffff818c4d35>] dump_stack+0xbc/0x117
         [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590
         [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80
         [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240
         [...]
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of bb is never used again.
      
      Silence UBSAN by introducing another variable, bb_incr, holding the next
      increment to apply to bb and adjust that one by right shifting it by one
      position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7bc5145
    • Jan Kara's avatar
      ext4: fix oops on corrupted filesystem · 80f1ca54
      Jan Kara authored
      commit 74177f55 upstream.
      
      When filesystem is corrupted in the right way, it can happen
      ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we
      subsequently remove inode from the in-memory orphan list. However this
      deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we
      leave i_orphan list_head with a stale content. Later we can look at this
      content causing list corruption, oops, or other issues. The reported
      trace looked like:
      
      WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100()
      list_del corruption, 0000000061c1d6e0->next is LIST_POISON1
      0000000000100100)
      CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250
      Stack:
       60462947 62219960 602ede24 62219960
       602ede24 603ca293 622198f0 602f02eb
       62219950 6002c12c 62219900 601b4d6b
      Call Trace:
       [<6005769c>] ? vprintk_emit+0x2dc/0x5c0
       [<602ede24>] ? printk+0x0/0x94
       [<600190bc>] show_stack+0xdc/0x1a0
       [<602ede24>] ? printk+0x0/0x94
       [<602ede24>] ? printk+0x0/0x94
       [<602f02eb>] dump_stack+0x2a/0x2c
       [<6002c12c>] warn_slowpath_common+0x9c/0xf0
       [<601b4d6b>] ? __list_del_entry+0x6b/0x100
       [<6002c254>] warn_slowpath_fmt+0x94/0xa0
       [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0
       [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0
       [<60023ebf>] ? set_signals+0x3f/0x50
       [<600a205a>] ? kmem_cache_free+0x10a/0x180
       [<602f4e88>] ? mutex_lock+0x18/0x30
       [<601b4d6b>] __list_del_entry+0x6b/0x100
       [<601177ec>] ext4_orphan_del+0x22c/0x2f0
       [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0
       [<6010b973>] ? ext4_truncate+0x383/0x390
       [<6010bc8b>] ext4_write_begin+0x30b/0x4b0
       [<6001bb50>] ? copy_from_user+0x0/0xb0
       [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0
       [<60072c4f>] generic_perform_write+0xaf/0x1e0
       [<600c4166>] ? file_update_time+0x46/0x110
       [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0
       [<6010030f>] ext4_file_write_iter+0x15f/0x470
       [<60094e10>] ? unlink_file_vma+0x0/0x70
       [<6009b180>] ? unlink_anon_vmas+0x0/0x260
       [<6008f169>] ? free_pgtables+0xb9/0x100
       [<600a6030>] __vfs_write+0xb0/0x130
       [<600a61d5>] vfs_write+0xa5/0x170
       [<600a63d6>] SyS_write+0x56/0xe0
       [<6029fcb0>] ? __libc_waitpid+0x0/0xa0
       [<6001b698>] handle_syscall+0x68/0x90
       [<6002633d>] userspace+0x4fd/0x600
       [<6002274f>] ? save_registers+0x1f/0x40
       [<60028bd7>] ? arch_prctl+0x177/0x1b0
       [<60017bd5>] fork_handler+0x85/0x90
      
      Fix the problem by using list_del_init() as we always should with
      i_orphan list.
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80f1ca54
    • Seth Forshee's avatar
      ext4: fix check of dqget() return value in ext4_ioctl_setproject() · e0df3698
      Seth Forshee authored
      commit ff0bc084 upstream.
      
      A failed call to dqget() returns an ERR_PTR() and not null. Fix
      the check in ext4_ioctl_setproject() to handle this correctly.
      
      Fixes: 9b7365fc ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support")
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0df3698
    • Theodore Ts'o's avatar
      ext4: clean up error handling when orphan list is corrupted · 447b62ed
      Theodore Ts'o authored
      commit 7827a7f6 upstream.
      
      Instead of just printing warning messages, if the orphan list is
      corrupted, declare the file system is corrupted.  If there are any
      reserved inodes in the orphaned inode list, declare the file system
      corrupted and stop right away to avoid doing more potential damage to
      the file system.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      447b62ed
    • Theodore Ts'o's avatar
      ext4: fix hang when processing corrupted orphaned inode list · 84ce09a5
      Theodore Ts'o authored
      commit c9eb13a9 upstream.
      
      If the orphaned inode list contains inode #5, ext4_iget() returns a
      bad inode (since the bootloader inode should never be referenced
      directly).  Because of the bad inode, we end up processing the inode
      repeatedly and this hangs the machine.
      
      This can be reproduced via:
      
         mke2fs -t ext4 /tmp/foo.img 100
         debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
         mount -o loop /tmp/foo.img /mnt
      
      (But don't do this if you are using an unpatched kernel if you care
      about the system staying functional.  :-)
      
      This bug was found by the port of American Fuzzy Lop into the kernel
      to find file system problems[1].  (Since it *only* happens if inode #5
      shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
      surprising that AFL needed two hours before it found it.)
      
      [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf
      
      Reported by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84ce09a5
    • Jan Kara's avatar
      ext4: fix data exposure after a crash · efafc423
      Jan Kara authored
      commit 06bd3c36 upstream.
      
      Huang has reported that in his powerfail testing he is seeing stale
      block contents in some of recently allocated blocks although he mounts
      ext4 in data=ordered mode. After some investigation I have found out
      that indeed when delayed allocation is used, we don't add inode to
      transaction's list of inodes needing flushing before commit. Originally
      we were doing that but commit f3b59291 removed the logic with a
      flawed argument that it is not needed.
      
      The problem is that although for delayed allocated blocks we write their
      contents immediately after allocating them, there is no guarantee that
      the IO scheduler or device doesn't reorder things and thus transaction
      allocating blocks and attaching them to inode can reach stable storage
      before actual block contents. Actually whenever we attach freshly
      allocated blocks to inode using a written extent, we should add inode to
      transaction's ordered inode list to make sure we properly wait for block
      contents to be written before committing the transaction. So that is
      what we do in this patch. This also handles other cases where stale data
      exposure was possible - like filling hole via mmap in
      data=ordered,nodelalloc mode.
      
      The only exception to the above rule are extending direct IO writes where
      blkdev_direct_IO() waits for IO to complete before increasing i_size and
      thus stale data exposure is not possible. For now we don't complicate
      the code with optimizing this special case since the overhead is pretty
      low. In case this is observed to be a performance problem we can always
      handle it using a special flag to ext4_map_blocks().
      
      Fixes: f3b59291Reported-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Tested-by: default avatar"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efafc423