- 23 Jun, 2017 9 commits
-
-
Eric Biggers authored
Currently, filesystems allow truncate(2) on an encrypted file without the encryption key. However, it's impossible to correctly handle the case where the size being truncated to is not a multiple of the filesystem block size, because that would require decrypting the final block, zeroing the part beyond i_size, then encrypting the block. As other modifications to encrypted file contents are prohibited without the key, just prohibit truncate(2) as well, making it fail with ENOKEY. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Eric Biggers authored
Since only an open file can be mmap'ed, and we only allow open()ing an encrypted file when its key is available, there is no need to check for the key again before permitting each mmap(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Chao Yu authored
kstrtoull returns 0 on success, however, in reserved_clusters_store we will return -EINVAL if kstrtoull returns 0, it makes us fail to update reserved_clusters value through sysfs. Fixes: 76d33bca Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Darrick J. Wong authored
For 1k-block filesystems, the filesystem starts at block 1, not block 0. This fact is recorded in s_first_data_block, so use that to bump up the start_fsb before we start querying the filesystem for its space map. Without this, ext4/026 fails on 1k block ext4 because various functions (notably ext4_get_group_no_and_offset) don't know what to do with an fsblock that is "before" the start of the filesystem and return garbage results (blockgroup 2^32-1, etc.) that confuse fsmap. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
Previously a bad directory block with a bad checksum is skipped; we should be returning EFSBADCRC (aka EBADMSG). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Khazhismel Kumykov authored
Previously, a read error would be ignored and we would eventually return NULL from ext4_find_entry, which signals "no such file or directory". We should be returning EIO. Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
-
Eric Biggers authored
Currently it's possible to encrypt all files and directories on an ext4 filesystem by deleting everything, including lost+found, then setting an encryption policy on the root directory. However, this is incompatible with e2fsck because e2fsck expects to find, create, and/or write to lost+found and does not have access to any encryption keys. Especially problematic is that if e2fsck can't find lost+found, it will create it without regard for whether the root directory is encrypted. This is wrong for obvious reasons, and it causes a later run of e2fsck to consider the lost+found directory entry to be corrupted. Encrypting the root directory may also be of limited use because it is the "all-or-nothing" use case, for which dm-crypt can be used instead. (By design, encryption policies are inherited and cannot be overridden; so the root directory having an encryption policy implies that all files and directories on the filesystem have that same encryption policy.) In any case, encrypting the root directory is broken currently and must not be allowed; so start returning an error if userspace requests it. For now only do this in ext4, because f2fs and ubifs do not appear to have the lost+found requirement. We could move it into fscrypt_ioctl_set_policy() later if desired, though. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
-
Daeho Jeong authored
Now, when we mount ext4 filesystem with '-o discard' option, we have to issue all the discard commands for the blocks to be deallocated and wait for the completion of the commands on the commit complete phase. Because this procedure might involve a lot of sequential combinations of issuing discard commands and waiting for that, the delay of this procedure might be too much long, even to 17.0s in our test, and it results in long commit delay and fsync() performance degradation. To reduce this kind of delay, instead of adding callback for each extent and handling all of them in a sequential manner on commit phase, we instead add a separate list of extents to free to the superblock and then process this list at once after transaction commits so that we can issue all the discard commands in a parallel manner like XFS filesystem. Finally, we could enhance the discard command handling performance. The result was such that 17.0s delay of a single commit in the worst case has been enhanced to 4.8s. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Hobin Woo <hobin.woo@samsung.com> Tested-by: Kitae Lee <kitae87.lee@samsung.com> Reviewed-by: Jan Kara <jack@suse.cz>
-
Jan Kara authored
These days inode reclaim calls evict_inode() only when it has no pages in the mapping. In that case it is not necessary to wait for transaction commit in ext4_evict_inode() as there can be no pages waiting to be committed. So avoid unnecessary transaction waiting in that case. We still have to keep the check for the case where ext4_evict_inode() gets called from other paths (e.g. umount) where inode still can have some page cache pages. Reported-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 22 Jun, 2017 31 commits
-
-
Tahsin Erdogan authored
The main purpose of mb cache is to achieve deduplication in extended attributes. In use cases where opportunity for deduplication is unlikely, it only adds overhead. Add a mount option to explicitly turn off mb cache. Suggested-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
To verify that a xattr entry is not pointing to the wrong xattr inode, we currently check that the target inode has EXT4_EA_INODE_FL flag set and also the entry size matches the target inode size. For stronger validation, also incorporate crc32c hash of the value into the e_hash field. This is done regardless of whether the entry lives in the inode body or external attribute block. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
When an extended attribute block is modified, ext4_xattr_hash_entry() recalculates e_hash for the entry that is pointed by s->here. This is unnecessary if the modification is to remove an entry. Currently, if the removed entry is the last one and there are other entries remaining, hash calculation targets the just erased entry which has been filled with zeroes and effectively does nothing. If the removed entry is not the last one and there are more entries, this time it will recalculate hash on the next entry which is totally unnecessary. Fix these by moving the decision on when to recalculate hash to ext4_xattr_set_entry(). Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
New ea_inode feature allows putting large xattr values into external inodes. struct ext4_xattr_entry and the attribute name however have to remain in the inode extra space or external attribute block. Once that space is exhausted, no further entries can be added. Some of that space could also be used by values that fit in there at the time of addition. So, a single xattr entry whose value barely fits in the external block could prevent further entries being added. To mitigate the problem, this patch introduces a notion of reserved space in the external attribute block that cannot be used by value data. This reserve is enforced when ea_inode feature is enabled. The amount of reserve is arbitrarily chosen to be min(block_size/8, 1024). The table below shows how much space is reserved for each block size and the guaranteed mininum number of entries that can be placed in the external attribute block. block size reserved bytes entries (name length = 16) 1k 128 3 2k 256 7 4k 512 15 8k 1024 31 16k 1024 31 32k 1024 31 64k 1024 31 Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Ext4 ea_inode feature allows storing xattr values in external inodes to be able to store values that are bigger than a block in size. Ext4 also has deduplication support for these type of inodes. With deduplication, the actual storage waste is eliminated but the users of such inodes are still charged full quota for the inodes as if there was no sharing happening in the background. This design requires ext4 to manually charge the users because the inodes are shared. An implication of this is that, if someone calls chown on a file that has such references we need to transfer the quota for the file and xattr inodes. Current dquot_transfer() function implicitly transfers one inode charge. With ea_inode feature, we would like to transfer multiple inode charges. Add get_inode_usage callback which can interrogate the total number of inodes that were charged for a given inode. [ Applied fix from Colin King to make sure the 'ret' variable is initialized on the successful return path. Detected by CoverityScan, CID#1446616 ("Uninitialized scalar variable") --tytso] Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jan Kara <jack@suse.cz>
-
Tahsin Erdogan authored
Ext4 now supports xattr values that are up to 64k in size (vfs limit). Large xattr values are stored in external inodes each one holding a single value. Once written the data blocks of these inodes are immutable. The real world use cases are expected to have a lot of value duplication such as inherited acls etc. To reduce data duplication on disk, this patch implements a deduplicator that allows sharing of xattr inodes. The deduplication is based on an in-memory hash lookup that is a best effort sharing scheme. When a xattr inode is read from disk (i.e. getxattr() call), its crc32c hash is added to a hash table. Before creating a new xattr inode for a value being set, the hash table is checked to see if an existing inode holds an identical value. If such an inode is found, the ref count on that inode is incremented. On value removal the ref count is decremented and if it reaches zero the inode is deleted. The quota charging for such inodes is manually managed. Every reference holder is charged the full size as if there was no sharing happening. This is consistent with how xattr blocks are also charged. [ Fixed up journal credits calculation to handle inline data and the rare case where an shared xattr block can get freed when two thread race on breaking the xattr block sharing. --tytso ] Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
During inode deletion, the number of journal credits that will be needed is hard to determine. For that reason we have journal extend/restart calls in several places. Whenever a transaction is restarted, filesystem must be in a consistent state because there is no atomicity guarantee beyond a restart call. Add ext4_xattr_ensure_credits() helper function which takes care of journal extend/restart logic. It also handles getting jbd2 write access and dirty metadata calls. This function is called at every iteration of handling an ea_inode reference. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
IS_NOQUOTA() indicates whether quota is disabled for an inode. Ext4 also uses it to check whether an inode is for a quota file. The distinction currently doesn't matter because quota is disabled only for the quota files. When we start disabling quota for other inodes in the future, we will want to make the distinction clear. Replace IS_NOQUOTA() call with ext4_is_quota_file() at places where we are checking for quota files. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
There will be a second mb_cache instance that tracks ea_inodes. Make existing names more explicit so that it is clear that they refer to xattr block cache. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Make names more generic so that mbcache usage is not limited to block sharing. In a subsequent patch in the series ("ext4: xattr inode deduplication"), we start using the mbcache code for sharing xattr inodes. With that patch, old mb_cache_entry.e_block field could be holding either a block number or an inode number. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Since this is a xattr specific data structure it is cleaner to keep it in xattr header file. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Tracking struct inode * rather than the inode number eliminates the repeated ext4_xattr_inode_iget() call later. The second call cannot fail in practice but still requires explanation when it wants to ignore the return value. Avoid the trouble and make things simple. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Both ext4_set_acl() and ext4_set_context() need to be made aware of ea_inode feature when it comes to credits calculation. Also add a sufficient credits check in ext4_xattr_set_handle() right after xattr write lock is grabbed. Original credits calculation is done outside the lock so there is a possiblity that the initially calculated credits are not sufficient anymore. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
In a few places the function returns without trying to pass the actual error code to the caller. Fix those. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
When value size is <= EXT4_XATTR_MIN_LARGE_EA_SIZE(), and it doesn't fit in either inline or xattr block, a second try is made to store it in an external inode while storing the entry itself in inline area. There should also be an attempt to store the entry in xattr block. This patch adds a retry loop to do that. It also makes the caller the sole decider on whether to store a value in an external inode. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
When there is no space for a value in xattr block, it may be stored in an xattr inode even if the value length is less than EXT4_XATTR_MIN_LARGE_EA_SIZE(). So the current assumption in credits calculation is wrong. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
When a xattr entry refers to an external inode, the value data is not available in the inline area so we should not attempt to read it using value offset. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
When moving xattr entries from inline area to a xattr block, entries that refer to external xattr inodes need special handling because value data is not available in the inline area but rather should be read from its external inode. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
ext4_xattr_make_inode_space() is interested in calculating the inline space used in an inode. When a xattr entry refers to an external inode the value size indicates the external inode size, not the value size in the inline area. Change the function to take this into account. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
ext4_xattr_value_same() is used as a quick optimization in case the new xattr value is identical to the previous value. When xattr value is stored in a xattr inode the check becomes expensive so it is better to just assume that they are not equal. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Two places in code missed converting xattr inode number using le32_to_cpu(). Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
The input and output values of *size parameter are equal on successful return from ext4_xattr_inode_get(). On error return, the callers ignore the output value so there is no need to update it. Also check for NULL return from ext4_bread(). If the actual xattr inode size happens to be smaller than the expected size, ext4_bread() may return NULL which would indicate data corruption. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
In general, kernel functions indicate success/failure through their return values. This function returns the status as an output parameter and reserves the return value for the inode. Make it follow the general convention. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
EXT4_XATTR_MAX_LARGE_EA_SIZE definition in ext4 is currently unused. Besides, vfs enforces its own 64k limit which makes the 1MB limit in ext4 redundant. Remove it. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
The ref count on ea_inode is incremented by ext4_xattr_inode_orphan_add() which is supposed to be decremented by ext4_xattr_inode_array_free(). The decrement is conditioned on whether the ea_inode is currently on the orphan list. However, the orphan list addition only happens when journaling is enabled. In non-journaled case,r we fail to release the ref count causing an error message like below. "VFS: Busy inodes after unmount of sdb. Self-destruct in 5 seconds. Have a nice day..." Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
ea_inode contents are treated as metadata, that's why it is journaled during initial writes. Failing to call revoke during freeing could cause user data to be overwritten with original ea_inode contents during journal replay. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Quota charging is based on the ownership of the inode. Currently, the xattr inode owner is set to the caller which may be different from the parent inode owner. This is inconsistent with how quota is charged for xattr block and regular data block writes. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
In data=ordered mode jinode needs to be attached to the xattr inode when writing data to it. Attachment normally occurs during file open for regular files. Since we are not using file interface to write to the xattr inode, the jinode attach needs to be done manually. Otherwise the following crash occurs in data=ordered mode. BUG: unable to handle kernel NULL pointer dereference at (null) IP: jbd2_journal_file_inode+0x37/0x110 PGD 13b3c0067 P4D 13b3c0067 PUD 137660067 PMD 0 Oops: 0000 [#1] SMP CPU: 3 PID: 1877 Comm: python Not tainted 4.12.0-rc1+ #749 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88010e368980 task.stack: ffffc90000374000 RIP: 0010:jbd2_journal_file_inode+0x37/0x110 RSP: 0018:ffffc90000377980 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880123b06230 RCX: 0000000000280000 RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88012c8585d0 RBP: ffffc900003779b0 R08: 0000000000000202 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000400 R12: ffff8801111f81c0 R13: ffff88013b2b6800 R14: ffffc90000377ab0 R15: 0000000000000001 FS: 00007f0c99b77740(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000136d91000 CR4: 00000000000006e0 Call Trace: jbd2_journal_inode_add_write+0xe/0x10 ext4_map_blocks+0x59e/0x620 ext4_xattr_set_entry+0x501/0x7d0 ext4_xattr_block_set+0x1b2/0x9b0 ext4_xattr_set_handle+0x322/0x4f0 ext4_xattr_set+0x144/0x1a0 ext4_xattr_user_set+0x34/0x40 __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x69/0x1c0 vfs_setxattr+0xa2/0xb0 setxattr+0x12e/0x150 path_setxattr+0x87/0xb0 SyS_setxattr+0xf/0x20 entry_SYSCALL_64_fastpath+0x18/0xad Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
We don't need acls on xattr inodes because they are not directly accessible from user mode. Besides lockdep complains about recursive locking of xattr_sem as seen below. ============================================= [ INFO: possible recursive locking detected ] 4.11.0-rc8+ #402 Not tainted --------------------------------------------- python/1894 is trying to acquire lock: (&ei->xattr_sem){++++..}, at: [<ffffffff804878a6>] ext4_xattr_get+0x66/0x270 but task is already holding lock: (&ei->xattr_sem){++++..}, at: [<ffffffff80489500>] ext4_xattr_set_handle+0xa0/0x5d0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ei->xattr_sem); lock(&ei->xattr_sem); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by python/1894: #0: (sb_writers#10){.+.+.+}, at: [<ffffffff803d829f>] mnt_want_write+0x1f/0x50 #1: (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803dda27>] vfs_setxattr+0x57/0xb0 #2: (&ei->xattr_sem){++++..}, at: [<ffffffff80489500>] ext4_xattr_set_handle+0xa0/0x5d0 stack backtrace: CPU: 0 PID: 1894 Comm: python Not tainted 4.11.0-rc8+ #402 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x67/0x99 __lock_acquire+0x5f3/0x1830 lock_acquire+0xb5/0x1d0 down_read+0x2f/0x60 ext4_xattr_get+0x66/0x270 ext4_get_acl+0x43/0x1e0 get_acl+0x72/0xf0 posix_acl_create+0x5e/0x170 ext4_init_acl+0x21/0xc0 __ext4_new_inode+0xffd/0x16b0 ext4_xattr_set_entry+0x5ea/0xb70 ext4_xattr_block_set+0x1b5/0x970 ext4_xattr_set_handle+0x351/0x5d0 ext4_xattr_set+0x124/0x180 ext4_xattr_user_set+0x34/0x40 __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x69/0x1c0 vfs_setxattr+0xa2/0xb0 setxattr+0x129/0x160 path_setxattr+0x87/0xb0 SyS_setxattr+0xf/0x20 entry_SYSCALL_64_fastpath+0x18/0xad Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
ext4_orphan_add() requires caller to be holding the inode lock. Add missing lock statements. WARNING: CPU: 3 PID: 1806 at fs/ext4/namei.c:2731 ext4_orphan_add+0x4e/0x240 CPU: 3 PID: 1806 Comm: python Not tainted 4.12.0-rc1+ #746 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880135d466c0 task.stack: ffffc900014b0000 RIP: 0010:ext4_orphan_add+0x4e/0x240 RSP: 0018:ffffc900014b3d50 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8801348fe1f0 RCX: ffffc900014b3c64 RDX: 0000000000000000 RSI: ffff8801348fe1f0 RDI: ffff8801348fe1f0 RBP: ffffc900014b3da0 R08: 0000000000000000 R09: ffffffff80e82025 R10: 0000000000004692 R11: 000000000000468d R12: ffff880137598000 R13: ffff880137217000 R14: ffff880134ac58d0 R15: 0000000000000000 FS: 00007fc50f09e740(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000008bc2e0 CR3: 00000001375ac000 CR4: 00000000000006e0 Call Trace: ext4_xattr_inode_orphan_add.constprop.19+0x9d/0xf0 ext4_xattr_delete_inode+0x1c4/0x2f0 ext4_evict_inode+0x15a/0x7f0 evict+0xc0/0x1a0 iput+0x16a/0x270 do_unlinkat+0x172/0x290 SyS_unlink+0x11/0x20 entry_SYSCALL_64_fastpath+0x18/0xad Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Tahsin Erdogan authored
Setting a large xattr value may require writing the attribute contents to an external inode. In this case we may need to lock the xattr inode along with the parent inode. This doesn't pose a deadlock risk because xattr inodes are not directly visible to the user and their access is restricted. Assign a lockdep subclass to xattr inode's lock. ============================================ WARNING: possible recursive locking detected 4.12.0-rc1+ #740 Not tainted -------------------------------------------- python/1822 is trying to acquire lock: (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff804912ca>] ext4_xattr_set_entry+0x65a/0x7b0 but task is already holding lock: (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803d6687>] vfs_setxattr+0x57/0xb0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&sb->s_type->i_mutex_key#15); lock(&sb->s_type->i_mutex_key#15); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by python/1822: #0: (sb_writers#10){.+.+.+}, at: [<ffffffff803d0eef>] mnt_want_write+0x1f/0x50 #1: (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffff803d6687>] vfs_setxattr+0x57/0xb0 #2: (jbd2_handle){.+.+..}, at: [<ffffffff80493f40>] start_this_handle+0xf0/0x420 #3: (&ei->xattr_sem){++++..}, at: [<ffffffff804920ba>] ext4_xattr_set_handle+0x9a/0x4f0 stack backtrace: CPU: 0 PID: 1822 Comm: python Not tainted 4.12.0-rc1+ #740 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x67/0x9e __lock_acquire+0x5f3/0x1750 lock_acquire+0xb5/0x1d0 down_write+0x2c/0x60 ext4_xattr_set_entry+0x65a/0x7b0 ext4_xattr_block_set+0x1b2/0x9b0 ext4_xattr_set_handle+0x322/0x4f0 ext4_xattr_set+0x144/0x1a0 ext4_xattr_user_set+0x34/0x40 __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x69/0x1c0 vfs_setxattr+0xa2/0xb0 setxattr+0x12e/0x150 path_setxattr+0x87/0xb0 SyS_setxattr+0xf/0x20 entry_SYSCALL_64_fastpath+0x18/0xad Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-