1. 22 Jun, 2017 8 commits
    • ext4: call journal revoke when freeing ea_inode blocks · ddfa17e4
      Tahsin Erdogan authored
      ea_inode contents are treated as metadata, which is why they are
      journaled during initial writes. Failing to call revoke when freeing the
      blocks could cause user data to be overwritten with the original ea_inode
      contents during journal replay.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: ea_inode owner should be the same as the inode owner · 9e1ba001
      Tahsin Erdogan authored
      Quota charging is based on the ownership of the inode. Currently, the
      xattr inode owner is set to the caller which may be different from the
      parent inode owner. This is inconsistent with how quota is charged for
      xattr block and regular data block writes.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: attach jinode after creation of xattr inode · bd3b963b
      Tahsin Erdogan authored
      In data=ordered mode jinode needs to be attached to the xattr inode when
      writing data to it. Attachment normally occurs during file open for regular
      files. Since we are not using file interface to write to the xattr inode,
      the jinode attach needs to be done manually.
      
      Otherwise the following crash occurs in data=ordered mode.
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: jbd2_journal_file_inode+0x37/0x110
       PGD 13b3c0067
       P4D 13b3c0067
       PUD 137660067
       PMD 0
      
       Oops: 0000 [#1] SMP
       CPU: 3 PID: 1877 Comm: python Not tainted 4.12.0-rc1+ #749
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       task: ffff88010e368980 task.stack: ffffc90000374000
       RIP: 0010:jbd2_journal_file_inode+0x37/0x110
       RSP: 0018:ffffc90000377980 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff880123b06230 RCX: 0000000000280000
       RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88012c8585d0
       RBP: ffffc900003779b0 R08: 0000000000000202 R09: 0000000000000001
       R10: 0000000000000000 R11: 0000000000000400 R12: ffff8801111f81c0
       R13: ffff88013b2b6800 R14: ffffc90000377ab0 R15: 0000000000000001
       FS:  00007f0c99b77740(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000136d91000 CR4: 00000000000006e0
       Call Trace:
        jbd2_journal_inode_add_write+0xe/0x10
        ext4_map_blocks+0x59e/0x620
        ext4_xattr_set_entry+0x501/0x7d0
        ext4_xattr_block_set+0x1b2/0x9b0
        ext4_xattr_set_handle+0x322/0x4f0
        ext4_xattr_set+0x144/0x1a0
        ext4_xattr_user_set+0x34/0x40
        __vfs_setxattr+0x66/0x80
        __vfs_setxattr_noperm+0x69/0x1c0
        vfs_setxattr+0xa2/0xb0
        setxattr+0x12e/0x150
        path_setxattr+0x87/0xb0
        SyS_setxattr+0xf/0x20
        entry_SYSCALL_64_fastpath+0x18/0xad
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: do not set posix acls on xattr inodes · 1b917ed8
      Tahsin Erdogan authored
      We don't need acls on xattr inodes because they are not directly
      accessible from user mode.
      
      Besides, lockdep complains about recursive locking of xattr_sem, as seen
      below.
      
        =============================================
        [ INFO: possible recursive locking detected ]
        4.11.0-rc8+ #402 Not tainted
        ---------------------------------------------
        python/1894 is trying to acquire lock:
         (&ei->xattr_sem){++++..}, at: [<ffffffff804878a6>] ext4_xattr_get+0x66/0x270
      
        but task is already holding lock:
         (&ei->xattr_sem){++++..}, at: [<ffffffff80489500>] ext4_xattr_set_handle+0xa0/0x5d0
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&ei->xattr_sem);
          lock(&ei->xattr_sem);
      
         *** DEADLOCK ***
      
         May be due to missing lock nesting notation
      
        3 locks held by python/1894:
         #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff803d829f>] mnt_want_write+0x1f/0x50
         #1:  (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803dda27>] vfs_setxattr+0x57/0xb0
         #2:  (&ei->xattr_sem){++++..}, at: [<ffffffff80489500>] ext4_xattr_set_handle+0xa0/0x5d0
      
        stack backtrace:
        CPU: 0 PID: 1894 Comm: python Not tainted 4.11.0-rc8+ #402
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
         dump_stack+0x67/0x99
         __lock_acquire+0x5f3/0x1830
         lock_acquire+0xb5/0x1d0
         down_read+0x2f/0x60
         ext4_xattr_get+0x66/0x270
         ext4_get_acl+0x43/0x1e0
         get_acl+0x72/0xf0
         posix_acl_create+0x5e/0x170
         ext4_init_acl+0x21/0xc0
         __ext4_new_inode+0xffd/0x16b0
         ext4_xattr_set_entry+0x5ea/0xb70
         ext4_xattr_block_set+0x1b5/0x970
         ext4_xattr_set_handle+0x351/0x5d0
         ext4_xattr_set+0x124/0x180
         ext4_xattr_user_set+0x34/0x40
         __vfs_setxattr+0x66/0x80
         __vfs_setxattr_noperm+0x69/0x1c0
         vfs_setxattr+0xa2/0xb0
         setxattr+0x129/0x160
         path_setxattr+0x87/0xb0
         SyS_setxattr+0xf/0x20
         entry_SYSCALL_64_fastpath+0x18/0xad
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: lock inode before calling ext4_orphan_add() · 0de5983d
      Tahsin Erdogan authored
      ext4_orphan_add() requires caller to be holding the inode lock.
      Add missing lock statements.
      
       WARNING: CPU: 3 PID: 1806 at fs/ext4/namei.c:2731 ext4_orphan_add+0x4e/0x240
       CPU: 3 PID: 1806 Comm: python Not tainted 4.12.0-rc1+ #746
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       task: ffff880135d466c0 task.stack: ffffc900014b0000
       RIP: 0010:ext4_orphan_add+0x4e/0x240
       RSP: 0018:ffffc900014b3d50 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff8801348fe1f0 RCX: ffffc900014b3c64
       RDX: 0000000000000000 RSI: ffff8801348fe1f0 RDI: ffff8801348fe1f0
       RBP: ffffc900014b3da0 R08: 0000000000000000 R09: ffffffff80e82025
       R10: 0000000000004692 R11: 000000000000468d R12: ffff880137598000
       R13: ffff880137217000 R14: ffff880134ac58d0 R15: 0000000000000000
       FS:  00007fc50f09e740(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000008bc2e0 CR3: 00000001375ac000 CR4: 00000000000006e0
       Call Trace:
        ext4_xattr_inode_orphan_add.constprop.19+0x9d/0xf0
        ext4_xattr_delete_inode+0x1c4/0x2f0
        ext4_evict_inode+0x15a/0x7f0
        evict+0xc0/0x1a0
        iput+0x16a/0x270
        do_unlinkat+0x172/0x290
        SyS_unlink+0x11/0x20
        entry_SYSCALL_64_fastpath+0x18/0xad
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix lockdep warning about recursive inode locking · 33d201e0
      Tahsin Erdogan authored
      Setting a large xattr value may require writing the attribute contents
      to an external inode. In this case we may need to lock the xattr inode
      along with the parent inode. This doesn't pose a deadlock risk because
      xattr inodes are not directly visible to the user and their access is
      restricted.
      
      Assign a lockdep subclass to xattr inode's lock.
      
       ============================================
       WARNING: possible recursive locking detected
       4.12.0-rc1+ #740 Not tainted
       --------------------------------------------
       python/1822 is trying to acquire lock:
        (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff804912ca>] ext4_xattr_set_entry+0x65a/0x7b0
      
       but task is already holding lock:
        (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803d6687>] vfs_setxattr+0x57/0xb0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&sb->s_type->i_mutex_key#15);
         lock(&sb->s_type->i_mutex_key#15);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       4 locks held by python/1822:
        #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff803d0eef>] mnt_want_write+0x1f/0x50
        #1:  (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803d6687>] vfs_setxattr+0x57/0xb0
        #2:  (jbd2_handle){.+.+..}, at: [<ffffffff80493f40>] start_this_handle+0xf0/0x420
        #3:  (&ei->xattr_sem){++++..}, at: [<ffffffff804920ba>] ext4_xattr_set_handle+0x9a/0x4f0
      
       stack backtrace:
       CPU: 0 PID: 1822 Comm: python Not tainted 4.12.0-rc1+ #740
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       Call Trace:
        dump_stack+0x67/0x9e
        __lock_acquire+0x5f3/0x1750
        lock_acquire+0xb5/0x1d0
        down_write+0x2c/0x60
        ext4_xattr_set_entry+0x65a/0x7b0
        ext4_xattr_block_set+0x1b2/0x9b0
        ext4_xattr_set_handle+0x322/0x4f0
        ext4_xattr_set+0x144/0x1a0
        ext4_xattr_user_set+0x34/0x40
        __vfs_setxattr+0x66/0x80
        __vfs_setxattr_noperm+0x69/0x1c0
        vfs_setxattr+0xa2/0xb0
        setxattr+0x12e/0x150
        path_setxattr+0x87/0xb0
        SyS_setxattr+0xf/0x20
        entry_SYSCALL_64_fastpath+0x18/0xad
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: xattr-in-inode support · e50e5129
      Andreas Dilger authored
      Large xattr support is implemented for EXT4_FEATURE_INCOMPAT_EA_INODE.
      
      If the size of an xattr value is larger than will fit in a single
      external block, then the xattr value will be saved into the body
      of an external xattr inode.
      
      This also helps support a larger number of xattrs, since only the headers
      will be stored in the in-inode space or the single external block.
      
      The inode is referenced from the xattr header via "e_value_inum",
      which was formerly "e_value_block", but that field was never used.
      The e_value_size still contains the xattr size so that listing
      xattrs does not need to look up the inode if the data is not accessed.
      
      struct ext4_xattr_entry {
              __u8    e_name_len;     /* length of name */
              __u8    e_name_index;   /* attribute name index */
              __le16  e_value_offs;   /* offset in disk block of value */
              __le32  e_value_inum;   /* inode in which value is stored */
              __le32  e_value_size;   /* size of attribute value */
              __le32  e_hash;         /* hash value of name and value */
              char    e_name[0];      /* attribute name */
      };
      
      The xattr inode is marked with the EXT4_EA_INODE_FL flag and also
      holds a back-reference to the owning inode in its i_mtime field,
      allowing the ext4/e2fsck to verify the correct inode is accessed.
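
      The layout above can be illustrated with a small userspace sketch. It
      is not the kernel code: the struct name and both helpers are
      hypothetical, fields are kept in host byte order for brevity, and it
      assumes that a zero e_value_inum means the value is stored inline
      rather than in an xattr inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of the entry header shown above, without the trailing
 * name array. Hypothetical type; real on-disk fields are little-endian. */
struct xattr_entry_sketch {
        uint8_t  e_name_len;    /* length of name */
        uint8_t  e_name_index;  /* attribute name index */
        uint16_t e_value_offs;  /* offset of an inline value */
        uint32_t e_value_inum;  /* inode holding the value, 0 if inline (assumed) */
        uint32_t e_value_size;  /* size of attribute value */
        uint32_t e_hash;        /* hash value of name and value */
};

/* True when the value lives in a separate xattr inode. */
static bool value_in_ea_inode(const struct xattr_entry_sketch *e)
{
        assert(e);
        return e->e_value_inum != 0;
}

/* Listing only needs the size, which is always present in the header,
 * so the xattr inode does not have to be looked up. */
static uint32_t listed_value_size(const struct xattr_entry_sketch *e)
{
        assert(e);
        return e->e_value_size;
}
```

      Keeping the size in the header is the design point: a listing pass can
      call listed_value_size() on every entry without touching any xattr
      inode.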
      
      [ Applied fix by Dan Carpenter to avoid freeing an ERR_PTR. ]
      
      Lustre-Jira: https://jira.hpdd.intel.com/browse/LU-80
      Lustre-bugzilla: https://bugzilla.lustre.org/show_bug.cgi?id=4424
      Signed-off-by: Kalpak Shah <kalpak.shah@sun.com>
      Signed-off-by: James Simmons <uja.ornl@gmail.com>
      Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    • ext4: add largedir feature · e08ac99f
      Artem Blagodarenko authored
      This INCOMPAT_LARGEDIR feature allows larger directories to be created
      in ldiskfs, both with directory sizes over 2GB and a maximum htree
      depth of 3 instead of the current limit of 2. These features are needed
      in order to exceed the current limit of approximately 10M entries in a
      single directory.
      
      This patch was originally written by Yang Sheng to support the Lustre server.
      
      [ Bumped the credits needed to update an indexed directory -- tytso ]
      Signed-off-by: Liang Zhen <liang.zhen@intel.com>
      Signed-off-by: Yang Sheng <yang.sheng@intel.com>
      Signed-off-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
  2. 29 May, 2017 1 commit
    • ext4: fix fdatasync(2) after extent manipulation operations · 67a7d5f5
      Jan Kara authored
      Currently, extent manipulation operations such as hole punch, range
      zeroing, or extent shifting do not record the fact that file data has
      changed and thus fdatasync(2) has work to do. As a result, if we crash
      e.g. after a punch hole and fdatasync, the user can still possibly see the
      punched out data after journal replay. Test generic/392 fails due to
      these problems.
      
      Fix the problem by properly marking that file data has changed in these
      operations.
      
      CC: stable@vger.kernel.org
      Fixes: a4bb6b64
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  3. 26 May, 2017 2 commits
    • ext4: fix data corruption for mmap writes · a056bdaa
      Jan Kara authored
      mpage_submit_page() can race with another process growing i_size and
      writing data via mmap to the written-back page. As mpage_submit_page()
      samples i_size too early, it may happen that ext4_bio_write_page()
      zeroes out too large a tail of the page and thus corrupts user data.
      
      Fix the problem by sampling i_size only after the page has been
      write-protected in page tables by clear_page_dirty_for_io() call.
      Reported-by: Michael Zimmer <michael@swarm64.com>
      CC: stable@vger.kernel.org
      Fixes: cb20d518
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO · 4f8caa60
      Jan Kara authored
      When ext4_map_blocks() is called with EXT4_GET_BLOCKS_ZERO to zero-out
      allocated blocks and these blocks are actually converted from an unwritten
      extent, the following race can happen:
      
      CPU0					CPU1
      
      page fault				page fault
      ...					...
      ext4_map_blocks()
        ext4_ext_map_blocks()
          ext4_ext_handle_unwritten_extents()
            ext4_ext_convert_to_initialized()
      	- zero out converted extent
      	ext4_zeroout_es()
      	  - inserts extent as initialized in status tree
      
      					ext4_map_blocks()
      					  ext4_es_lookup_extent()
      					    - finds initialized extent
      					write data
        ext4_issue_zeroout()
          - zeroes out new extent overwriting data
      
      This problem can be reproduced by generic/340 for the fallocated case
      for the last block in the file.
      
      Fix the problem by avoiding zeroing out the area we are mapping with
      ext4_map_blocks() in ext4_ext_convert_to_initialized(). It is pointless
      to zero out this area in the first place as the caller asked us to
      convert the area to initialized because he is just going to write data
      there before the transaction finishes. To achieve this we delete the
      special case of zeroing out full extent as that will be handled by the
      cases below zeroing only the part of the extent that needs it. We also
      instruct ext4_split_extent() that the middle of extent being split
      contains data so that ext4_split_extent_at() cannot zero out full extent
      in case of ENOSPC.
      
      CC: stable@vger.kernel.org
      Fixes: 12735f88
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  4. 24 May, 2017 5 commits
  5. 22 May, 2017 8 commits
    • ext4: keep existing extra fields when inode expands · 887a9730
      Konstantin Khlebnikov authored
      ext4_expand_extra_isize() should clear only space between old and new
      size.
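
      A minimal sketch of the intended behaviour, assuming a flat byte
      buffer stands in for the inode's extra area. expand_extra_sketch() and
      its parameters are hypothetical and are not the kernel function; the
      point is only that bytes below the old size are left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Grow the "extra" area from old_size to new_size bytes, zeroing only the
 * newly exposed range [old_size, new_size). Existing fields are kept. */
static void expand_extra_sketch(unsigned char *extra, size_t old_size,
                                size_t new_size)
{
        assert(extra);
        if (new_size > old_size)
                memset(extra + old_size, 0, new_size - old_size);
}
```

      Clearing from offset 0 instead of old_size would wipe fields that were
      already valid, which is the failure mode the commit title describes.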
      
      Fixes: 6dd4ee7c # v2.6.23
      Cc: stable@vger.kernel.org
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors · 9651e6b2
      Konstantin Khlebnikov authored
      I've got another report about breaking ext4 by ENOMEM error returned from
      ext4_mb_load_buddy() caused by memory shortage in memory cgroup.
      This time inside ext4_discard_preallocations().
      
      This patch replaces ext4_error() with ext4_warning() where errors returned
      from ext4_mb_load_buddy() are not fatal and handled by caller:
      * ext4_mb_discard_group_preallocations() - called before generating ENOSPC,
        we'll try to discard other group or return ENOSPC into user-space.
      * ext4_trim_all_free() - just stop trimming and return ENOMEM from ioctl.
      
      Some callers cannot handle errors, thus __GFP_NOFAIL is used for them:
      * ext4_discard_preallocations()
      * ext4_mb_discard_lg_preallocations()
      
      Fixes: adb7ef60 ("ext4: use __GFP_NOFAIL in ext4_free_blocks()")
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix off-by-one in loop termination in ext4_find_unwritten_pgoff() · 3f1d5bad
      Jan Kara authored
      There is an off-by-one error in loop termination conditions in
      ext4_find_unwritten_pgoff() since 'end' may index a page beyond end of
      desired range if 'endoff' is page aligned. It doesn't have any visible
      effects but still it is good to fix it.
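
      The arithmetic behind this kind of off-by-one can be sketched as
      follows, assuming 'endoff' is an exclusive end offset and pages are
      4 KiB. The macro and both helper names are hypothetical; this
      illustrates the index calculation only, not the actual patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* assumes 4 KiB pages */

/* Last page index covered by a byte range that ends just before endoff. */
static uint64_t last_page_inclusive(uint64_t endoff)
{
        assert(endoff > 0);
        return (endoff - 1) >> SKETCH_PAGE_SHIFT;
}

/* Naive version: lands one page too far whenever endoff is page aligned. */
static uint64_t last_page_naive(uint64_t endoff)
{
        return endoff >> SKETCH_PAGE_SHIFT;
}
```

      For endoff = 8192 the range covers pages 0 and 1, yet the naive
      helper yields index 2; for an unaligned endoff such as 8193 both
      helpers agree on index 2.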
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix SEEK_HOLE · 7d95eddf
      Jan Kara authored
      Currently, SEEK_HOLE implementation in ext4 may both return that there's
      a hole at some offset although that offset already has data and skip
      some holes during a search for the next hole. The first problem is
      demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
      Whence	Result
      HOLE	0
      
      Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
      a hole although we have written data there. The second problem can be
      demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k" \
             -c "seek -h 0" file
      
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
      Whence	Result
      HOLE	139264
      
      Where we can see that hole at offsets 56k..128k has been ignored by the
      SEEK_HOLE call.
      
      The underlying problem is in the ext4_find_unwritten_pgoff() which is
      just buggy. In some cases it fails to update returned offset when it
      finds a hole (when no pages are found or when the first found page has
      higher index than expected), in some cases conditions for detecting hole
      are just missing (we fail to detect a situation where indices of
      returned pages are not contiguous).
      
      Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
      indices and also handle all cases where we got fewer pages than expected
      in one place and handle it properly there.
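
      The first xfs_io reproduction can be approximated from C with
      lseek(2) and SEEK_HOLE. This is a rough userspace sketch, not part of
      the patch: next_hole() and demo_first_hole() are hypothetical helpers,
      preallocation is best effort, and the fallback definition of
      SEEK_HOLE assumes the Linux value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_HOLE
#define SEEK_HOLE 4     /* Linux value, used if the headers hide it */
#endif

/* Offset of the first hole at or after 'from', or -1 on error. */
static off_t next_hole(int fd, off_t from)
{
        assert(fd >= 0);
        return lseek(fd, from, SEEK_HOLE);
}

/* Preallocate 'prealloc' bytes, write 'data_len' bytes of data at offset 0,
 * and report where SEEK_HOLE finds the first hole. The file is removed. */
static off_t demo_first_hole(const char *path, off_t prealloc, size_t data_len)
{
        static const char block[4096] = { 'x' };
        int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
        off_t hole = -1;
        size_t done = 0;

        if (fd < 0)
                return -1;
        (void)posix_fallocate(fd, 0, prealloc); /* best effort */
        while (done < data_len) {
                size_t left = data_len - done;
                size_t want = left < sizeof(block) ? left : sizeof(block);
                ssize_t n = write(fd, block, want);

                if (n <= 0)
                        goto out;
                done += (size_t)n;
        }
        hole = next_hole(fd, 0);
out:
        close(fd);
        unlink(path);
        return hole;
}
```

      With 256k preallocated and 56k written, a correct implementation
      should not report a hole before offset 57344; the exact value above
      that depends on whether the filesystem treats preallocated space as a
      hole.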
      
      CC: stable@vger.kernel.org
      Fixes: c8c0df24
      CC: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • jbd2: preserve original nofs flag during journal restart · b4709067
      Tahsin Erdogan authored
      When a transaction starts, start_this_handle() saves current
      PF_MEMALLOC_NOFS value so that it can be restored at journal stop time.
      Journal restart is a special case that calls start_this_handle() without
      stopping the transaction. start_this_handle() isn't aware that the
      original value is already stored so it overwrites it with current value.
      
      For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set
      at the end:
      
        jbd2_journal_start()
        jbd2__journal_restart()
        jbd2_journal_stop()
      
      Make jbd2__journal_restart() restore the original value before calling
      start_this_handle().
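
      The save and restore problem can be modelled in a few lines of
      userspace C. This is a toy model only: task_nofs stands in for
      PF_MEMALLOC_NOFS, and every type and function name ending in _sketch,
      _buggy or _fixed is hypothetical rather than a jbd2 interface.

```c
#include <assert.h>
#include <stdbool.h>

static bool task_nofs;          /* stand-in for the per-task NOFS flag */

struct handle_sketch {
        bool saved_nofs;        /* flag value from before the handle started */
};

static void start_this_handle_sketch(struct handle_sketch *h)
{
        assert(h);
        h->saved_nofs = task_nofs;      /* remember the current value */
        task_nofs = true;               /* enter the NOFS scope */
}

static void journal_stop_sketch(struct handle_sketch *h)
{
        assert(h);
        task_nofs = h->saved_nofs;      /* leave the NOFS scope */
}

/* Buggy restart: the flag is already set, so "true" gets saved and the
 * original value is lost. */
static void journal_restart_buggy(struct handle_sketch *h)
{
        start_this_handle_sketch(h);
}

/* Fixed restart: put the original value back first, then start again. */
static void journal_restart_fixed(struct handle_sketch *h)
{
        assert(h);
        task_nofs = h->saved_nofs;
        start_this_handle_sketch(h);
}
```

      Running start, buggy restart, stop from a cleared flag leaves
      task_nofs set, while the same sequence with the fixed restart leaves
      it cleared.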
      
      Fixes: 81378da6 ("jbd2: mark the transaction context with the scope GFP_NOFS context")
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: clear lockdep subtype for quota files on quota off · 964edf66
      Jan Kara authored
      Quota files have special ranking of i_data_sem lock. We inform lockdep
      about it when turning on quotas; however, when turning quotas off, we
      don't clear the lockdep subclass from i_data_sem lock and thus when the
      inode gets later reused for a normal file or directory, lockdep gets
      confused and complains about possible deadlocks. Fix the problem by
      resetting lockdep subclass of i_data_sem on quota off.
      
      Cc: stable@vger.kernel.org
      Fixes: daf647d2
      Reported-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • Linux 4.12-rc2 · 08332893
      Linus Torvalds authored
    • x86: fix 32-bit case of __get_user_asm_u64() · 33c9e972
      Linus Torvalds authored
      The code to fetch a 64-bit value from user space was entirely buggered,
      and has been since the code was merged in early 2016 in commit
      b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
      kernels").
      
      Happily the buggered routine is almost certainly entirely unused, since
      the normal way to access user space memory is just with the non-inlined
      "get_user()", and the inlined version didn't even historically exist.
      
      The normal "get_user()" case is handled by external hand-written asm in
      arch/x86/lib/getuser.S that doesn't have either of these issues.
      
      There were two independent bugs in __get_user_asm_u64():
      
       - it still did the STAC/CLAC user space access marking, even though
         that is now done by the wrapper macros, see commit 11f1a4b9
         ("x86: reorganize SMAP handling in user space accesses").
      
         This didn't result in a semantic error, it just means that the
         inlined optimized version was hugely less efficient than the
         allegedly slower standard version, since the CLAC/STAC overhead is
         quite high on modern Intel CPUs.
      
       - the double register %eax/%edx was marked as an output, but the %eax
         part of it was touched early in the asm, and could thus clobber other
         inputs to the asm that gcc didn't expect it to touch.
      
         In particular, that meant that the generated code could look like
         this:
      
              mov    (%eax),%eax
              mov    0x4(%eax),%edx
      
         where the load of %edx obviously was _supposed_ to be from the 32-bit
         word that followed the source of %eax, but because %eax was
         overwritten by the first instruction, the source of %edx was
         basically random garbage.
      
      The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
      the 64-bit output as early-clobber to let gcc know that no inputs should
      alias with the output register.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@kernel.org   # v4.8+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 21 May, 2017 7 commits
    • Clean up x86 unsafe_get/put_user() type handling · 334a023e
      Linus Torvalds authored
      Al noticed that unsafe_put_user() had type problems, and fixed them in
      commit a7cc722f ("fix unsafe_put_user()"), which made me look more
      at those functions.
      
      It turns out that unsafe_get_user() had a type issue too: it limited the
      largest size of the type it could handle to "unsigned long".  Which is
      fine with the current users, but doesn't match our existing normal
      get_user() semantics, which can also handle "u64" even when that does
      not fit in a long.
      
      While at it, also clean up the type cast in unsafe_put_user().  We
      actually want to just make it an assignment to the expected type of the
      pointer, because we actually do want warnings from types that don't
      convert silently.  And it makes the code more readable by not having
      that one very long and complex line.
      
      [ This patch might become stable material if we ever end up back-porting
        any new users of the unsafe uaccess code, but as things stand now this
        doesn't matter for any current existing uses. ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f3926e4c
      Linus Torvalds authored
      Pull misc uaccess fixes from Al Viro:
       "Fix for unsafe_put_user() (no callers currently in mainline, but
        anyone starting to use it will step into that) + alpha osf_wait4()
        infoleak fix"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        osf_wait4(): fix infoleak
        fix unsafe_put_user()
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 970c305a
      Linus Torvalds authored
      Pull scheduler fix from Thomas Gleixner:
       "A single scheduler fix:
      
        Prevent idle task from ever being preempted. That makes sure that
        synchronize_rcu_tasks() which is ignoring idle task does not pretend
        that no task is stuck in preempted state. If that happens and idle was
        preempted on a ftrace trampoline the machine crashes due to
        inconsistent state"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Call __schedule() from do_idle() without enabling preemption
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7a3d627
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of small fixes for the irq subsystem:
      
         - Cure a data ordering problem with chained interrupts
      
         - Three small fixlets for the mbigen irq chip"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Fix chained interrupt data ordering
        irqchip/mbigen: Fix the clear register offset calculation
        irqchip/mbigen: Fix potential NULL dereferencing
        irqchip/mbigen: Fix memory mapping code
    • osf_wait4(): fix infoleak · a8c39544
      Al Viro authored
      failing sys_wait4() won't fill struct rusage...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fix unsafe_put_user() · a7cc722f
      Al Viro authored
      __put_user_size() relies upon its first argument having the same type as what
      the second one points to; the only other user makes sure of that and
      unsafe_put_user() should do the same.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Merge tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 56f410cf
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix a bug caused by not cleaning up the new instance unique triggers
         when deleting an instance. It also creates a selftest that triggers
         that bug.
      
       - Fix the delayed optimization happening after kprobes boot up self
         tests being removed by freeing of init memory.
      
       - Comment kprobes on why the delay optimization is not a problem for
         removal of modules, to keep other developers from searching that
         riddle.
      
       - Fix another case of rcu not watching in stack trace tracing.
      
      * tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Make sure RCU is watching before calling a stack trace
        kprobes: Document how optimized kprobes are removed from module unload
        selftests/ftrace: Add test to remove instance with active event triggers
        selftests/ftrace: Fix bashisms
        ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
        ftrace/instances: Clear function triggers when removing instances
        ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
        tracing/kprobes: Enforce kprobes teardown after testing
        tracing: Move postpone selftests to core from early_initcall
  7. 20 May, 2017 9 commits
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 894e2164
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A small collection of fixes that should go into this cycle.
      
         - a pull request from Christoph for NVMe, which ended up being
           manually applied to avoid pulling in newer bits in master. Mostly
           fibre channel fixes from James, but also a few fixes from Jon and
           Vijay
      
         - a pull request from Konrad, with just a single fix for xen-blkback
           from Gustavo.
      
         - a fuseblk bdi fix from Jan, fixing a regression in this series with
           the dynamic backing devices.
      
         - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull().
      
         - a request leak fix for drbd from Lars, fixing a regression in the
           last series with the kref changes. This will go to stable as well"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nvmet: release the sq ref on rdma read errors
        nvmet-fc: remove target cpu scheduling flag
        nvme-fc: stop queues on error detection
        nvme-fc: require target or discovery role for fc-nvme targets
        nvme-fc: correct port role bits
        nvme: unmap CMB and remove sysfs file in reset path
        blktrace: fix integer parse
        fuseblk: Fix warning in super_setup_bdi_name()
        block: xen-blkback: add null check to avoid null pointer dereference
        drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
      894e2164
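The blktrace item in the merge above replaces sscanf() with kstrtoull(). The following is a minimal userspace sketch of why a strict parser can be preferable. It does not reproduce the kernel patch: kstrtoull() is kernel-only, so strtoull() with an end-pointer check stands in for it here, and the function names and the sample input are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Permissive parse: sscanf() stops at the first non-hex character and
 * still reports success if at least one digit was consumed. */
static int parse_mask_loose(const char *s, unsigned long long *out)
{
    return sscanf(s, "%llx", out) == 1 ? 0 : -1;
}

/* Stricter parse (hypothetical stand-in for kstrtoull()): fail on an
 * empty string, on overflow, or on any trailing characters. Like
 * strtoull() itself, it still accepts leading whitespace and a sign. */
static int parse_mask_strict(const char *s, unsigned long long *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(s, &end, 16);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    *out = v;
    return 0;
}
```

With an input such as "discard", the permissive version consumes the leading 'd' as a hex digit and reports success, while the stricter version rejects the string because characters remain after the number.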
    • nvmet: release the sq ref on rdma read errors · 549f01ae
      Vijay Immanuel authored
      On rdma read errors, release the sq ref that was taken
      when the req was initialized. This avoids a hang in
      nvmet_sq_destroy() when the queue is being freed.
      Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      549f01ae
    • nvmet-fc: remove target cpu scheduling flag · 4b8ba5fa
      James Smart authored
      Remove NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED. It's unnecessary.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      4b8ba5fa
    • nvme-fc: stop queues on error detection · 2952a879
      James Smart authored
      Per the recommendation by Sagi on:
      http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html
      
      Rather than waiting for reset work thread to stop queues and abort the ios,
      immediately stop the queues on error detection. Reset thread will restop
      the queues (as it's called on other paths), but it does not appear to have
      a side effect.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      2952a879
    • nvme-fc: require target or discovery role for fc-nvme targets · 85e6a6ad
      James Smart authored
      In order to create an association, the remoteport must be
      serving either a target role or a discovery role.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      85e6a6ad
    • nvme-fc: correct port role bits · 41231090
      James Smart authored
      FC Port roles is a bit mask, not individual values.
      Correct nvme definitions to unique bits.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      41231090
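The two nvme-fc role commits above rely on port roles being a bit mask. The following is a minimal sketch of that idea, not the kernel's actual definitions: the enum and function names are hypothetical. If roles were numbered sequentially (1, 2, 3), the value 3 would be indistinguishable from the first two roles combined, which is why each role gets its own bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical role flags: each role occupies its own bit so that
 * several roles can be combined in a single mask. */
enum port_role {
    ROLE_INITIATOR = 1u << 0,
    ROLE_TARGET    = 1u << 1,
    ROLE_DISCOVERY = 1u << 2,
};

/* An association is allowed only if the remote port serves a target
 * role or a discovery role (possibly alongside other roles). */
static bool can_create_association(uint32_t roles)
{
    return (roles & (ROLE_TARGET | ROLE_DISCOVERY)) != 0;
}
```

A port advertising ROLE_INITIATOR | ROLE_DISCOVERY passes the check, while a port advertising only ROLE_INITIATOR does not.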
    • nvme: unmap CMB and remove sysfs file in reset path · f63572df
      Jon Derrick authored
      CMB doesn't get unmapped until removal while getting remapped on every
      reset. Add the unmapping and sysfs file removal to the reset path in
      nvme_pci_disable to match the mapping path in nvme_pci_enable.
      
      Fixes: 202021c1 ("nvme : Add sysfs entry for NVMe CMBs when appropriate")
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Reviewed-By: Stephen Bates <sbates@raithlin.com>
      Cc: <stable@vger.kernel.org> # 4.9+
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f63572df
    • Merge tag 'staging-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · ef82f1ad
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are a number of staging driver fixes for 4.12-rc2
      
        Most of them are typec driver fixes found by reviewers and users of
        the code. There are also some removals of files no longer needed in
        the tree due to the ion driver rewrite in 4.12-rc1, as well as some
        wifi driver fixes. And to round it out, a MAINTAINERS file update.
      
        All have been in linux-next with no reported issues"
      
      * tag 'staging-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (22 commits)
        MAINTAINERS: greybus-dev list is members-only
        staging: fsl-dpaa2/eth: add ETHERNET dependency
        staging: typec: fusb302: refactor resume retry mechanism
        staging: typec: fusb302: reset i2c_busy state in error
        staging: rtl8723bs: remove re-positioned call to kfree in os_dep/ioctl_cfg80211.c
        staging: rtl8192e: GetTs Fix invalid TID 7 warning.
        staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
        staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
        staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
        staging: vc04_services: Fix bulk cache maintenance
        staging: ccree: remove extraneous spin_unlock_bh() in error handler
        staging: typec: Fix sparse warnings about incorrect types
        staging: typec: fusb302: do not free gpio from managed resource
        staging: typec: tcpm: Fix Port Power Role field in PS_RDY messages
        staging: typec: tcpm: Respond to Discover Identity commands
        staging: typec: tcpm: Set correct flags in PD request messages
        staging: typec: tcpm: Drop duplicate PD messages
        staging: typec: fusb302: Fix chip->vbus_present init value
        staging: typec: fusb302: Fix module autoload
        staging: typec: tcpci: declare private structure as static
        ...
      ef82f1ad
    • Merge tag 'usb-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 32026293
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of small USB fixes for 4.12-rc2
      
        Most of them come from Johan, in his valiant quest to fix up all
        drivers that could be affected by "malicious" USB devices. There's
        also some fixes for more "obscure" drivers to handle some of the
        vmalloc stack fallout (which for USB drivers, was always the case, but
        very few people actually ran those systems...)
      
        Other than that, the normal set of xhci and gadget and musb driver
        fixes as well.
      
        All have been in linux-next with no reported issues"
      
      * tag 'usb-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (42 commits)
        usb: musb: tusb6010_omap: Do not reset the other direction's packet size
        usb: musb: Fix trying to suspend while active for OTG configurations
        usb: host: xhci-plat: propagate return value of platform_get_irq()
        xhci: Fix command ring stop regression in 4.11
        xhci: remove GFP_DMA flag from allocation
        USB: xhci: fix lock-inversion problem
        usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd
        usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
        xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
        usb: xhci: trace URB before giving it back instead of after
        USB: serial: qcserial: add more Lenovo EM74xx device IDs
        USB: host: xhci: use max-port define
        USB: hub: fix SS max number of ports
        USB: hub: fix non-SS hub-descriptor handling
        USB: hub: fix SS hub-descriptor handling
        USB: usbip: fix nonconforming hub descriptor
        USB: gadget: dummy_hcd: fix hub-descriptor removable fields
        doc-rst: fixed kernel-doc directives in usb/typec.rst
        USB: core: of: document reference taken by companion helper
        USB: ehci-platform: fix companion-device leak
        ...
      32026293