1. 22 May, 2018 4 commits
  2. 21 May, 2018 11 commits
      Merge branch 'work.misc' into work.lookup · 837f3ec6
      Al Viro authored
      aio: fix io_destroy(2) vs. lookup_ioctx() race · baf10564
      Al Viro authored
      kill_ioctx() used to have an explicit RCU delay between removing the
      reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
      At some point that delay had been removed, on the theory that
      percpu_ref_kill() itself contained an RCU delay.  Unfortunately, that was
      the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
      by lookup_ioctx().  As a result, we could get ctx freed right under
      lookup_ioctx().  Tejun has fixed that in a6d7cff4 ("fs/aio: Add explicit
      RCU grace period when freeing kioctx"); however, that fix is not enough.
      
      Suppose io_destroy() from one thread races with e.g. io_setup() from another;
      CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
      has picked it (under rcu_read_lock()).  Then CPU1 proceeds to drop the
      refcount, getting it to 0 and triggering a call of free_ioctx_users(),
      which proceeds to drop the secondary refcount and once that reaches zero
      calls free_ioctx_reqs().  That does
              INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
              queue_rcu_work(system_wq, &ctx->free_rwork);
      and schedules freeing the whole thing after RCU delay.
      
      In the meantime CPU2 has gotten around to percpu_ref_get(), bumping the
      refcount from 0 to 1 and returned the reference to io_setup().
      
      Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
      freed until after percpu_ref_get(), so we do increment the counter before
      ctx can be freed.  But once we are out of rcu_read_lock() there's nothing
      to stop freeing of the whole thing.  Unfortunately, CPU2 assumes that since
      it has grabbed the reference, ctx is *NOT* going away until it gets around
      to dropping that reference.
      
      The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
      It's not costlier than what we currently do in normal case, it's safe to
      call since freeing *is* delayed and it closes the race window - either
      lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
      won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
      fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
      the object in question at all.
      
      Cc: stable@kernel.org
      Fixes: a6d7cff4 "fs/aio: Add explicit RCU grace period when freeing kioctx"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
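The interleaving described in the aio commit above can be replayed in a single thread. The following is a minimal userspace sketch, not kernel code: `toy_ref` and every `toy_*` function are hypothetical stand-ins for the percpu refcount, and the model is sequential, so it only illustrates the ordering and not real concurrency or RCU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a killable refcount; not the kernel's percpu_ref. */
struct toy_ref {
    long count;   /* references currently held */
    bool dead;    /* set by toy_ref_kill() */
};

static void toy_ref_init(struct toy_ref *r)
{
    r->count = 1;      /* the reference owned by the lookup table */
    r->dead = false;
}

/* Mark the ref dead and drop the table's reference. */
static void toy_ref_kill(struct toy_ref *r)
{
    r->dead = true;
    r->count--;
}

/* Unconditional get: will bump 0 -> 1 on an object already on its way out. */
static void toy_ref_get(struct toy_ref *r)
{
    r->count++;
}

/* Conditional get: fails once the ref has been killed, count untouched. */
static bool toy_ref_tryget_live(struct toy_ref *r)
{
    if (r->dead)
        return false;
    r->count++;
    return true;
}

/*
 * Replay the race in one thread: the lookup has already picked the pointer,
 * then the destroyer kills the ref, then the lookup takes its reference.
 * Returns true if the lookup ends up handing out the dying object.
 */
static bool toy_lookup_after_kill(bool use_tryget_live)
{
    struct toy_ref ctx;

    toy_ref_init(&ctx);
    toy_ref_kill(&ctx);                       /* CPU1: destroy path */
    if (use_tryget_live)
        return toy_ref_tryget_live(&ctx);     /* CPU2: treated as a miss */
    toy_ref_get(&ctx);                        /* CPU2: 0 -> 1 */
    return true;
}
```

With the unconditional get the lookup returns an object whose teardown has already started; with the conditional get it reports a miss and the count stays at zero.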
      ext2: fix a block leak · 5aa1437d
      Al Viro authored
      open file, unlink it, then use ioctl(2) to make it immutable or
      append only.  Now close it and watch the blocks *not* freed...
      
      Immutable/append-only checks belong in ->setattr().
      Note: the bug is old and backport to anything prior to 737f2e93
      ("ext2: convert to use the new truncate convention") will need
      these checks lifted into ext2_setattr().
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed · 3819bb0d
      Al Viro authored
      That can (and does, on some filesystems) happen - ->mkdir() (and thus
      vfs_mkdir()) can legitimately leave its argument negative and just
      unhash it, counting upon the lookup to pick the object we'd created
      next time we try to look at that name.
      
      Some vfs_mkdir() callers forget about that possibility...
      Acked-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed · 9c3e9025
      Al Viro authored
      That can (and does, on some filesystems) happen - ->mkdir() (and thus
      vfs_mkdir()) can legitimately leave its argument negative and just
      unhash it, counting upon the lookup to pick the object we'd created
      next time we try to look at that name.
      
      Some vfs_mkdir() callers forget about that possibility...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      unfuck sysfs_mount() · 7b745a4e
      Al Viro authored
      new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
      and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
      is not a solution - IS_ERR(root) is true in some cases when new_sb is true.
      
      Make sure new_sb is initialized (and matches the reality) in all cases and
      fix the condition for dropping kobj reference - we want it done precisely
      in those situations where the reference has not been transferred into a new
      super_block instance.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      kernfs: deal with kernfs_fill_super() failures · 82382ace
      Al Viro authored
      make sure that info->node is initialized early, so that kernfs_kill_sb()
      can list_del() it safely.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
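The kernfs commit above depends on a property of circular doubly linked lists: unlinking a node dereferences its neighbour pointers, so a node that may be unlinked on a failure path has to be initialized before that path can run. A minimal userspace sketch, assuming a hypothetical `toy_list` type that mimics the idiom (it is not the kernel's `struct list_head`):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace copy of the circular doubly linked list idiom. */
struct toy_list {
    struct toy_list *next, *prev;
};

/* A freshly initialized node points at itself. */
static void toy_list_init(struct toy_list *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n right after head. */
static void toy_list_add(struct toy_list *n, struct toy_list *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n; dereferences n->next and n->prev, so both must be valid. */
static void toy_list_del(struct toy_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    toy_list_init(n);   /* leave the node self-linked afterwards */
}

static int toy_list_empty(const struct toy_list *head)
{
    return head->next == head;
}
```

Because an initialized node is self-linked, deleting it before it was ever added is a harmless no-op; deleting an uninitialized node would follow garbage pointers.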
      cramfs: Fix IS_ENABLED typo · 08a8f308
      Joe Perches authored
      There's an extra C here...
      
      Fixes: 99c18ce5 ("cramfs: direct memory access support")
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      befs_lookup(): use d_splice_alias() · f4e4d434
      Al Viro authored
      RTFS(Documentation/filesystems/nfs/Exporting) if you try to make
      something exportable.
      
      Fixes: ac632f5b "befs: add NFS export support"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      affs_lookup: switch to d_splice_alias() · 87fbd639
      Al Viro authored
      Making something exportable takes more than providing ->s_export_ops.
      In particular, ->lookup() *MUST* use d_splice_alias() instead of
      d_add().
      
      Reading Documentation/filesystems/nfs/Exporting would've been a good idea;
      as it is, exporting AFFS is badly (and exploitably) broken.
      
      Partially-Fixes: ed4433d7 "fs/affs: make affs exportable"
      Acked-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      affs_lookup(): close a race with affs_remove_link() · 30da870c
      Al Viro authored
      we unlock the directory hash too early - if we are looking at secondary
      link and primary (in another directory) gets removed just as we unlock,
      we could have the old primary moved in place of the secondary, leaving
      us to look into freed entry (and leaving our dentry with ->d_fsdata
      pointing to a freed entry).
      
      Cc: stable@vger.kernel.org # 2.4.4+
      Acked-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 18 May, 2018 2 commits
  4. 14 May, 2018 2 commits
  5. 13 May, 2018 3 commits
      fix breakage caused by d_find_alias() semantics change · b127125d
      Al Viro authored
      "VFS: don't keep disconnected dentries on d_anon" had a non-trivial
      side-effect - d_unhashed() now returns true for those dentries,
      making d_find_alias() skip them altogether.  For most of its callers
      that's fine - we really want a connected alias there.  However,
      there is a codepath where we relied upon picking such aliases
      if nothing else could be found - selinux delayed initialization
      of contexts for inodes on already mounted filesystems used to
      rely upon that.
      
      Cc: stable@kernel.org # f1ee6162 "VFS: don't keep disconnected dentries on d_anon"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      vfat: simplify checks in vfat_lookup() · f6ddc161
      Al Viro authored
      vfat_d_anon_disconn() is called only if alias->d_parent is equal to
      dentry->d_parent *and* it returns false unless alias->d_parent == alias.
      But in that case alias is the directory we are doing lookup in, and
      d_splice_alias() would've done the right thing.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      get rid of dead code in d_find_alias() · 61fec493
      Al Viro authored
      All "try disconnected alias if nothing else fits" logic in d_find_alias()
      got accidentally disabled by Neil a while ago; for most of the callers it
      was the right thing to do, so fixes belong in the few callers that *do* want
      disconnected aliases.  This just takes the now-dead code in d_find_alias()
      out.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 11 May, 2018 2 commits
      fs: don't scan the inode cache before SB_BORN is set · 79f546a6
      Dave Chinner authored
      We recently had an oops reported on a 4.14 kernel in
      xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
      and so the m_perag_tree lookup walked into lala land.  It produces
      an oops down this path during the failed mount:
      
        radix_tree_gang_lookup_tag+0xc4/0x130
        xfs_perag_get_tag+0x37/0xf0
        xfs_reclaim_inodes_count+0x32/0x40
        xfs_fs_nr_cached_objects+0x11/0x20
        super_cache_count+0x35/0xc0
        shrink_slab.part.66+0xb1/0x370
        shrink_node+0x7e/0x1a0
        try_to_free_pages+0x199/0x470
        __alloc_pages_slowpath+0x3a1/0xd20
        __alloc_pages_nodemask+0x1c3/0x200
        cache_grow_begin+0x20b/0x2e0
        fallback_alloc+0x160/0x200
        kmem_cache_alloc+0x111/0x4e0
      
      The problem is that the superblock shrinker is running before the
      filesystem structures it depends on have been fully set up, i.e.
      the shrinker is registered in sget(), before ->fill_super() has been
      called, and the shrinker can call into the filesystem before
      fill_super() does its setup work. Essentially we are exposed to
      both use-after-free and use-before-initialisation bugs here.
      
      To fix this, add a check for the SB_BORN flag in super_cache_count.
      In general, this flag is not set until ->fs_mount() completes
      successfully, so we know that it is set after the filesystem
      setup has completed. This matches the trylock_super() behaviour
      which will not let super_cache_scan() run if SB_BORN is not set, and
      hence will not allow the superblock shrinker to enter the
      filesystem while it is being set up or after it has failed setup
      and is being torn down.
      
      Cc: stable@kernel.org
      Signed-Off-By: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
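The guard described in the SB_BORN commit above can be pictured as an early return in the count callback. This is a minimal userspace sketch under stated assumptions: `toy_sb`, `toy_fs_info`, `TOY_SB_BORN` and `toy_cache_count()` are hypothetical names, the flag value is arbitrary, and no memory-ordering concerns are modelled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SB_BORN (1u << 0)   /* hypothetical flag value, for illustration only */

/* Hypothetical per-filesystem private data. */
struct toy_fs_info {
    long nr_cached;
};

/* Hypothetical superblock: fs_info is only valid once TOY_SB_BORN is set. */
struct toy_sb {
    unsigned int flags;
    struct toy_fs_info *fs_info;
};

/* Count callback: refuse to look at filesystem state before setup is done. */
static long toy_cache_count(const struct toy_sb *sb)
{
    if (!(sb->flags & TOY_SB_BORN))
        return 0;                      /* not set up yet, or setup failed */
    return sb->fs_info->nr_cached;     /* safe: setup has completed */
}
```

Reporting zero objects for a superblock that is not yet born keeps the shrinker from touching `fs_info` while it is still unset or already torn down.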
      do d_instantiate/unlock_new_inode combinations safely · 1e2e547a
      Al Viro authored
      For anything NFS-exported we do _not_ want to unlock new inode
      before it has grown an alias; original set of fixes got the
      ordering right, but missed the nasty complication in case of
      lockdep being enabled - unlock_new_inode() does
      	lockdep_annotate_inode_mutex_key(inode)
      which can only be done before anyone gets a chance to touch
      ->i_mutex.  Unfortunately, flipping the order and doing
      unlock_new_inode() before d_instantiate() opens a window when
      mkdir can race with open-by-fhandle on a guessed fhandle, leading
      to multiple aliases for a directory inode and all the breakage
      that follows from that.
      
      	Correct solution: a new primitive (d_instantiate_new())
      combining these two in the right order - lockdep annotate, then
      d_instantiate(), then the rest of unlock_new_inode().  All
      combinations of d_instantiate() with unlock_new_inode() should
      be converted to that.
      
      Cc: stable@kernel.org	# 2.6.29 and later
      Tested-by: Mike Marshall <hubcap@omnibond.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. 10 May, 2018 1 commit
  8. 02 May, 2018 2 commits
  9. 20 Apr, 2018 2 commits
  10. 16 Apr, 2018 11 commits
      remove rpc_rmdir() · 69c45d57
      Al Viro authored
      no users since 2014...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      mm,vmscan: Allow preallocating memory for register_shrinker(). · 8e04944f
      Tetsuo Handa authored
      syzbot is catching so many bugs triggered by commit 9ee332d9
      ("sget(): handle failures of register_shrinker()"). That commit expected
      that calling kill_sb() from deactivate_locked_super() without successful
      fill_super() is safe, but the reality was different; some callers assign
      attributes which are needed for kill_sb() after sget() succeeds.
      
      For example, [1] is a report where sb->s_mode (which seems to be either
      FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
      assigned unless sget() succeeds. But it is not worth complicating sget()
      so that register_shrinker() failure path can safely call
      kill_block_super() via kill_sb(). Making alloc_super() fail if memory
      allocation for register_shrinker() failed is much simpler. Let's avoid
      calling deactivate_locked_super() from sget_userns() by preallocating
      memory for the shrinker and making register_shrinker() in sget_userns()
      never fail.
      
      [1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7
      
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      rpc_pipefs: fix double-dput() · 4a3877c4
      Al Viro authored
      if we ever hit rpc_gssd_dummy_depopulate(), the dentry passed to
      it has refcount equal to 1.  __rpc_rmpipe() drops it and
      dput() done after that hits an already freed dentry.
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      orangefs_kill_sb(): deal with allocation failures · 65903842
      Al Viro authored
      orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
      oops in that case.
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      jffs2_kill_sb(): deal with failed allocations · c66b23c2
      Al Viro authored
      jffs2_fill_super() might fail to allocate jffs2_sb_info;
      jffs2_kill_sb() must survive that.
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      hypfs_kill_super(): deal with failed allocations · a24cd490
      Al Viro authored
      hypfs_fill_super() might fail to allocate sbi; hypfs_kill_super()
      should not oops on that.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
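The three kill_sb fixes above (orangefs, jffs2, hypfs) share one shape: the teardown hook can run after a fill step that never managed to allocate the private data, so it has to tolerate a NULL pointer. A minimal userspace sketch, assuming hypothetical `toy_super`, `toy_sb_info`, `toy_fill_super()` and `toy_kill_sb()` that do not correspond to any real filesystem's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-filesystem private data. */
struct toy_sb_info {
    int mounted_options;
};

/* Hypothetical superblock holding a pointer to the private data. */
struct toy_super {
    struct toy_sb_info *s_fs_info;
};

/* Fill step; fail_alloc simulates an allocation failure before setup. */
static int toy_fill_super(struct toy_super *sb, bool fail_alloc)
{
    struct toy_sb_info *sbi;

    if (fail_alloc)
        return -1;                 /* s_fs_info is left NULL */
    sbi = calloc(1, sizeof(*sbi));
    if (!sbi)
        return -1;
    sb->s_fs_info = sbi;
    return 0;
}

/* Teardown; must survive being called after a failed fill. */
static void toy_kill_sb(struct toy_super *sb)
{
    struct toy_sb_info *sbi = sb->s_fs_info;

    if (sbi) {
        /* Only touch private state when it actually exists. */
        sbi->mounted_options = 0;
        free(sbi);
    }
    sb->s_fs_info = NULL;
}
```

Without the NULL check, the teardown path would dereference `s_fs_info` unconditionally and crash on exactly the failure it is supposed to clean up after.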
      fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range() · 22762711
      Zev Weiss authored
      It's a fairly inconsequential bug, since fdput() won't actually try to
      fput() the file due to fd.flags (and thus FDPUT_FPUT) being zero in
      the failure case, but most other vfs code takes steps to avoid this.
      Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
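The convention mentioned in the commit above (check the result of the get and skip the put on failure) can be shown with a small userspace model. Everything here is a hypothetical stand-in: `toy_fd`, `toy_fdget()`, `toy_fdput()` and `TOY_FDPUT_FPUT` only imitate the shape of the kernel helpers and are not their real definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FDPUT_FPUT 1u   /* hypothetical flag: "this handle holds a reference" */

struct toy_file {
    int refs;
};

struct toy_fd {
    struct toy_file *file;
    unsigned int flags;
};

static int toy_fdput_calls;   /* counts how often the put helper ran */

/* Look up idx in the table; on failure return an all-zero handle. */
static struct toy_fd toy_fdget(struct toy_file *table[], size_t n, size_t idx)
{
    struct toy_fd f = { NULL, 0 };

    if (idx < n && table[idx]) {
        f.file = table[idx];
        f.file->refs++;
        f.flags = TOY_FDPUT_FPUT;
    }
    return f;
}

/* Drop the reference only if the handle says it holds one. */
static void toy_fdput(struct toy_fd f)
{
    toy_fdput_calls++;
    if (f.flags & TOY_FDPUT_FPUT)
        f.file->refs--;
}

/* Returns 0 on success, -1 for a bad descriptor; never puts a failed get. */
static int toy_use_fd(struct toy_file *table[], size_t n, size_t idx)
{
    struct toy_fd f = toy_fdget(table, n, idx);

    if (!f.file)
        return -1;          /* bail out before the put */
    /* ... work with f.file would go here ... */
    toy_fdput(f);
    return 0;
}
```

Calling the put helper on a failed get would be harmless in this model because the flags are zero, which mirrors the commit's point that the bug was inconsequential; the early return simply makes the pairing explicit.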
      Linux 4.17-rc1 · 60cc43fc
      Linus Torvalds authored
      Merge tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · e37563bb
      Linus Torvalds authored
      Pull more btrfs updates from David Sterba:
       "We have queued a few more fixes (error handling, log replay,
        softlockup) and the rest is SPDX updates that touch almost all files
        so the diffstat is long"
      
      * tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: Only check first key for committed tree blocks
        btrfs: add SPDX header to Kconfig
        btrfs: replace GPL boilerplate by SPDX -- sources
        btrfs: replace GPL boilerplate by SPDX -- headers
        Btrfs: fix loss of prealloc extents past i_size after fsync log replay
        Btrfs: clean up resources during umount after trans is aborted
        btrfs: Fix possible softlock on single core machines
        Btrfs: bail out on error during replay_dir_deletes
        Btrfs: fix NULL pointer dereference in log_dir_items
      Merge tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6 · 09c9b0ea
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "SMB3 fixes, a few for stable, and some important cleanup work from
        Ronnie of the smb3 transport code"
      
      * tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: change validate_buf to validate_iov
        cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
        cifs: Change SMB2_open to return an iov for the error parameter
        cifs: add resp_buf_size to the mid_q_entry structure
        smb3.11: replace a 4 with server->vals->header_preamble_size
        cifs: replace a 4 with server->vals->header_preamble_size
        cifs: add pdu_size to the TCP_Server_Info structure
        SMB311: Improve checking of negotiate security contexts
        SMB3: Fix length checking of SMB3.11 negotiate request
        CIFS: add ONCE flag for cifs_dbg type
        cifs: Use ULL suffix for 64-bit constant
        SMB3: Log at least once if tree connect fails during reconnect
        cifs: smb2pdu: Fix potential NULL pointer dereference
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f0d98d85
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is a set of minor (and safe changes) that didn't make the initial
        pull request plus some bug fixes.
      
        The status handling code is actually a running regression from the
        previous merge window which had an incomplete fix (now reverted) and
        most of the remaining bug fixes are for problems older than the
        current merge window"
      
      [ Side note: this merge also takes the base kernel git repository to 6+
        million objects for the first time. Technically we hit it a couple of
        merges ago already if you count all the tag objects, but now it
        reaches 6M+ objects reachable from HEAD.
      
        I was joking around that that's when I should switch to 5.0, because
        3.0 happened at the 2M mark, and 4.0 happened at 4M objects. But
        probably not, even if numerology is about as good a reason as any.
      
                                                                    - Linus ]
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: devinfo: Add Microsoft iSCSI target to 1024 sector blacklist
        scsi: cxgb4i: silence overflow warning in t4_uld_rx_handler()
        scsi: dpt_i2o: Use after free in I2ORESETCMD ioctl
        scsi: core: Make scsi_result_to_blk_status() recognize CONDITION MET
        scsi: core: Rename __scsi_error_from_host_byte() into scsi_result_to_blk_status()
        Revert "scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"
        scsi: aacraid: Insure command thread is not recursively stopped
        scsi: qla2xxx: Correct setting of SAM_STAT_CHECK_CONDITION
        scsi: qla2xxx: correctly shift host byte
        scsi: qla2xxx: Fix race condition between iocb timeout and initialisation
        scsi: qla2xxx: Avoid double completion of abort command
        scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
        scsi: scsi_dh: Don't look for NULL devices handlers by name
        scsi: core: remove redundant assignment to shost->use_blk_mq