Commit 70cb0d02 authored by Linus Torvalds

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Added new ext4 debugging ioctls to allow userspace to get information
  about the state of the extent status cache.

  Dropped workaround for pre-1970 dates which were encoded incorrectly
  in pre-4.4 kernels. Since the kernel has encoded these dates correctly,
  and e2fsck has detected and fixed this issue, for the past four years,
  it's time to drop the workaround. (Also, it's not like files with dates
  in the distant past were all that common in the first place.)

  A lot of miscellaneous bug fixes and cleanups, including some ext4
  documentation fixes. Also included are two minor bug fixes in
  fs/unicode."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
  unicode: make array 'token' static const, makes object smaller
  unicode: Move static keyword to the front of declarations
  ext4: add missing bigalloc documentation.
  ext4: fix kernel oops caused by spurious casefold flag
  ext4: fix integer overflow when calculating commit interval
  ext4: use percpu_counters for extent_status cache hits/misses
  ext4: fix potential use after free after remounting with noblock_validity
  jbd2: add missing tracepoint for reserved handle
  ext4: fix punch hole for inline_data file systems
  ext4: rework reserved cluster accounting when invalidating pages
  ext4: documentation fixes
  ext4: treat buffers with write errors as containing valid data
  ext4: fix warning inside ext4_convert_unwritten_extents_endio
  ext4: set error return correctly when ext4_htree_store_dirent fails
  ext4: drop legacy pre-1970 encoding workaround
  ext4: add new ioctl EXT4_IOC_GET_ES_CACHE
  ext4: add a new ioctl EXT4_IOC_GETSTATE
  ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE
  jbd2: flush_descriptor(): Do not decrease buffer head's ref count
  ext4: remove unnecessary error check
  ...
parents 104c0d6b 040823b5
@@ -9,14 +9,26 @@ ext4 code is not prepared to handle the case where the block size
 exceeds the page size. However, for a filesystem of mostly huge files,
 it is desirable to be able to allocate disk blocks in units of multiple
 blocks to reduce both fragmentation and metadata overhead. The
-`bigalloc <Bigalloc>`__ feature provides exactly this ability. The
-administrator can set a block cluster size at mkfs time (which is stored
-in the s\_log\_cluster\_size field in the superblock); from then on, the
-block bitmaps track clusters, not individual blocks. This means that
-block groups can be several gigabytes in size (instead of just 128MiB);
-however, the minimum allocation unit becomes a cluster, not a block,
-even for directories. TaoBao had a patchset to extend the “use units of
-clusters instead of blocks” to the extent tree, though it is not clear
-where those patches went-- they eventually morphed into “extent tree v2”
-but that code has not landed as of May 2015.
+bigalloc feature provides exactly this ability.
+
+The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to
+use clustered allocation, so that each bit in the ext4 block allocation
+bitmap addresses a power of two number of blocks. For example, if the
+file system is mainly going to be storing large files in the 4-32
+megabyte range, it might make sense to set a cluster size of 1 megabyte.
+This means that each bit in the block allocation bitmap now addresses
+256 4k blocks. This shrinks the total size of the block allocation
+bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also
+means that a block group addresses 32 gigabytes instead of 128 megabytes,
+also shrinking the amount of file system overhead for metadata.
+
+The administrator can set a block cluster size at mkfs time (which is
+stored in the s\_log\_cluster\_size field in the superblock); from then
+on, the block bitmaps track clusters, not individual blocks. This means
+that block groups can be several gigabytes in size (instead of just
+128MiB); however, the minimum allocation unit becomes a cluster, not a
+block, even for directories. TaoBao had a patchset to extend the “use
+units of clusters instead of blocks” to the extent tree, though it is
+not clear where those patches went-- they eventually morphed into
+“extent tree v2” but that code has not landed as of May 2015.
@@ -71,11 +71,11 @@ if the flex\_bg size is 4, then group 0 will contain (in order) the
 superblock, group descriptors, data block bitmaps for groups 0-3, inode
 bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
 space in group 0 is for file data. The effect of this is to group the
-block metadata close together for faster loading, and to enable large
-files to be continuous on disk. Backup copies of the superblock and
-group descriptors are always at the beginning of block groups, even if
-flex\_bg is enabled. The number of block groups that make up a flex\_bg
-is given by 2 ^ ``sb.s_log_groups_per_flex``.
+block group metadata close together for faster loading, and to enable
+large files to be continuous on disk. Backup copies of the superblock
+and group descriptors are always at the beginning of block groups, even
+if flex\_bg is enabled. The number of block groups that make up a
+flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
 
 Meta Block Groups
 -----------------
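Because a flex\_bg always holds a power-of-two number of groups, mapping a block group to its flex group takes only shifts. This sketch assumes nothing beyond the 2 ^ ``sb.s_log_groups_per_flex`` rule quoted above. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: relate block groups and flex groups, given the
 * superblock's s_log_groups_per_flex value. */
static uint32_t groups_per_flex(uint8_t log_groups_per_flex)
{
	return 1U << log_groups_per_flex;	/* 2 ^ s_log_groups_per_flex */
}

/* Index of the flex group containing block group 'group'. */
static uint32_t flex_group_of(uint32_t group, uint8_t log_groups_per_flex)
{
	return group >> log_groups_per_flex;
}

/* Lowest-numbered block group belonging to flex group 'flex'. */
static uint32_t flex_first_group(uint32_t flex, uint8_t log_groups_per_flex)
{
	return flex << log_groups_per_flex;
}
```

With a flex\_bg size of 4 (log value 2), groups 0-3 form flex group 0, which is the grouping described in the text.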
......
@@ -10,7 +10,9 @@ block groups. Block size is specified at mkfs time and typically is
 4KiB. You may experience mounting problems if block size is greater than
 page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory
 pages). By default a filesystem can contain 2^32 blocks; if the '64bit'
-feature is enabled, then a filesystem can have 2^64 blocks.
+feature is enabled, then a filesystem can have 2^64 blocks. The location
+of structures is stored in terms of the block number the structure lives
+in and not the absolute offset on disk.
 
 For 32-bit filesystems, limits are as follows:
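Because on-disk structures are located by block number, a reader has to multiply by the block size to get a byte offset. The sketch below uses the superblock convention that block size is 2 ^ (10 + s\_log\_block\_size) bytes. The helper names are hypothetical.

```c
#include <stdint.h>

/* Block size in bytes: 2 ^ (10 + s_log_block_size). */
static uint64_t block_size_bytes(uint32_t s_log_block_size)
{
	return 1024ULL << s_log_block_size;
}

/* Byte offset on the device of a structure recorded as block number
 * block_nr (hypothetical helper, not an ext4 API). */
static uint64_t block_to_byte_offset(uint64_t block_nr, uint32_t s_log_block_size)
{
	return block_nr * block_size_bytes(s_log_block_size);
}
```

For example, with 4 KiB blocks (s\_log\_block\_size = 2), block 10 starts at byte 40960. Likewise, 2^32 such blocks give the familiar 16 TiB ceiling for a 32-bit filesystem.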
......
@@ -59,7 +59,7 @@ is at most 263 bytes long, though on disk you'll need to reference
      - File name.
 
 Since file names cannot be longer than 255 bytes, the new directory
-entry format shortens the rec\_len field and uses the space for a file
+entry format shortens the name\_len field and uses the space for a file
 type flag, probably to avoid having to load every inode during directory
 tree traversal. This format is ``ext4_dir_entry_2``, which is at most
 263 bytes long, though on disk you'll need to reference
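The corrected sentence can be illustrated by comparing the two entry layouts. Because a name never exceeds 255 bytes, name\_len fits in 8 bits, and the freed byte holds the file type. In this sketch, native integer types stand in for the on-disk little-endian ones and the struct names are hypothetical. The `dir_rec_len` helper assumes ext4's rounding of a minimal record up to a multiple of 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define NAME_LEN 255	/* maximum file name length */

/* Original layout: a 16-bit name_len. */
struct dir_entry_v1 {
	uint32_t inode;
	uint16_t rec_len;
	uint16_t name_len;
	char name[NAME_LEN];
};

/* ext4_dir_entry_2-style layout: name_len shrinks to 8 bits and the
 * freed byte carries a file type code. */
struct dir_entry_v2 {
	uint32_t inode;
	uint16_t rec_len;
	uint8_t name_len;
	uint8_t file_type;
	char name[NAME_LEN];
};

/* Minimal record length for a name of name_len bytes: 8-byte header plus
 * the name, rounded up to a multiple of 4. */
static unsigned int dir_rec_len(unsigned int name_len)
{
	return (8 + name_len + 3) & ~3u;
}
```

Both layouts start the name at offset 8, so the maximum entry is 8 + 255 = 263 bytes, as the text states.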
......
@@ -99,9 +99,12 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
    * - 0x1E
      - \_\_le16
      - bg\_checksum
-     - Group descriptor checksum; crc16(sb\_uuid+group+desc) if the
-       RO\_COMPAT\_GDT\_CSUM feature is set, or crc32c(sb\_uuid+group\_desc) &
-       0xFFFF if the RO\_COMPAT\_METADATA\_CSUM feature is set.
+     - Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the
+       RO\_COMPAT\_GDT\_CSUM feature is set, or
+       crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the
+       RO\_COMPAT\_METADATA\_CSUM feature is set.  The bg\_checksum
+       field in bg\_desc is skipped when calculating crc16 checksum,
+       and set to zero if crc32c checksum is used.
    * -
      -
      -
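The crc32c variant of bg\_checksum can be sketched as follows. The sketch hashes the filesystem UUID, then the group number, then the descriptor with its checksum field zeroed, and keeps the low 16 bits. Several details are assumptions, not confirmed by the text: the running CRC starts at ~0 with no final inversion, the group number is hashed as 4 little-endian bytes, and descriptors are 32 or 64 bytes. `gdt_csum_crc32c` is a hypothetical name, and the crc16 (GDT\_CSUM) path is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78) update with
 * no pre- or post-inversion; the caller supplies the running value. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Sketch: crc32c(sb_uuid + group_num + bg_desc) & 0xFFFF, with the 16-bit
 * bg_checksum field at offset 0x1E set to zero before hashing. */
static uint16_t gdt_csum_crc32c(const uint8_t uuid[16], uint32_t group,
				const uint8_t *desc, size_t desc_size)
{
	uint8_t le_group[4] = { (uint8_t)group, (uint8_t)(group >> 8),
				(uint8_t)(group >> 16), (uint8_t)(group >> 24) };
	uint8_t copy[64];
	uint32_t crc;

	if (desc_size < 0x20 || desc_size > sizeof(copy))
		return 0;			/* sketch handles 32/64-byte descriptors only */
	memcpy(copy, desc, desc_size);
	copy[0x1E] = copy[0x1F] = 0;		/* checksum field counts as zero */

	crc = crc32c_update(~0u, uuid, 16);
	crc = crc32c_update(crc, le_group, sizeof(le_group));
	crc = crc32c_update(crc, copy, desc_size);
	return (uint16_t)(crc & 0xFFFF);
}
```

Zeroing the field means the stored checksum never feeds into its own computation. As a result, two descriptors that differ only in bg\_checksum hash identically.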
......
@@ -472,8 +472,8 @@ inode, which allows struct ext4\_inode to grow for a new kernel without
 having to upgrade all of the on-disk inodes. Access to fields beyond
 EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within
 ``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as
-of October 2013) the inode structure is 156 bytes
-(``i_extra_isize = 28``). The extra space between the end of the inode
+of August 2019) the inode structure is 160 bytes
+(``i_extra_isize = 32``). The extra space between the end of the inode
 structure and the end of the inode record can be used to store extended
 attributes. Each inode record can be as large as the filesystem block
 size, though this is not terribly efficient.
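The updated figures follow from EXT2\_GOOD\_OLD\_INODE\_SIZE (128 bytes) plus ``i_extra_isize``: 128 + 32 = 160. This leaves 96 of the default 256 bytes for in-inode extended attributes. The hypothetical helper below computes that raw space, before any xattr bookkeeping is subtracted.

```c
#include <stdint.h>

#define GOOD_OLD_INODE_SIZE 128	/* EXT2_GOOD_OLD_INODE_SIZE */

/* Raw bytes between the end of the inode structure and the end of the
 * inode record; returns 0 if the values are inconsistent. */
static uint32_t inline_xattr_space(uint32_t inode_size, uint16_t i_extra_isize)
{
	uint32_t used = GOOD_OLD_INODE_SIZE + i_extra_isize;

	return used <= inode_size ? inode_size - used : 0;
}
```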
......
@@ -58,7 +58,7 @@ The ext4 superblock is laid out as follows in
    * - 0x1C
      - \_\_le32
      - s\_log\_cluster\_size
-     - Cluster size is (2 ^ s\_log\_cluster\_size) blocks if bigalloc is
+     - Cluster size is 2 ^ (10 + s\_log\_cluster\_size) blocks if bigalloc is
        enabled. Otherwise s\_log\_cluster\_size must equal s\_log\_block\_size.
    * - 0x20
      - \_\_le32
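The sketch below reads 2 ^ (10 + s\_log\_cluster\_size) the same way as the block-size field, that is, as a byte count. Under that reading, the blocks-per-cluster ratio is the cluster size divided by the block size. With 4 KiB blocks, s\_log\_cluster\_size = 10 then gives the 1 MiB, 256-block cluster from the bigalloc example. The byte interpretation is our assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Convert a superblock log field to bytes: 2 ^ (10 + log_field). */
static uint64_t log_to_bytes(uint32_t log_field)
{
	return 1024ULL << log_field;
}

/* Blocks per cluster; 1 when bigalloc is off, since the two fields match. */
static uint64_t cluster_ratio(uint32_t s_log_block_size, uint32_t s_log_cluster_size)
{
	return log_to_bytes(s_log_cluster_size) / log_to_bytes(s_log_block_size);
}
```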
@@ -447,7 +447,7 @@ The ext4 superblock is laid out as follows in
      - Upper 8 bits of the s_wtime field.
    * - 0x275
      - \_\_u8
-     - s\_wtime_hi
+     - s\_mtime_hi
      - Upper 8 bits of the s_mtime field.
    * - 0x276
      - \_\_u8
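Each \*\_hi byte extends its 32-bit timestamp to 40 bits. This sketch assumes the low 32 bits are unsigned seconds since the epoch and places the high byte above them. `sb_time40` is a hypothetical helper.

```c
#include <stdint.h>

/* Combine a superblock timestamp with its *_hi byte, e.g. s_mtime and
 * s_mtime_hi, into a 40-bit seconds value. */
static uint64_t sb_time40(uint32_t lo, uint8_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```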
@@ -466,12 +466,20 @@ The ext4 superblock is laid out as follows in
      - s\_last_error_time_hi
      - Upper 8 bits of the s_last_error_time_hi field.
    * - 0x27A
-     - \_\_u8[2]
-     - s\_pad
+     - \_\_u8
+     - s\_pad[2]
      - Zero padding.
    * - 0x27C
+     - \_\_le16
+     - s\_encoding
+     - Filename charset encoding.
+   * - 0x27E
+     - \_\_le16
+     - s\_encoding_flags
+     - Filename charset encoding flags.
+   * - 0x280
      - \_\_le32
-     - s\_reserved[96]
+     - s\_reserved[95]
      - Padding to the end of the block.
    * - 0x3FC
      - \_\_le32
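The two new \_\_le16 fields use up exactly one former reserved word, which is why s\_reserved shrinks from 96 to 95 entries while s\_checksum stays at 0x3FC. This sketch models the superblock tail from 0x274 onward and checks the offsets in the table. Native integer types stand in for the little-endian on-disk ones. The 0x276-0x278 field names are taken from the kernel's struct ext4\_super\_block rather than from this excerpt, and `struct sb_tail` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the superblock tail, starting at offset 0x274. */
struct sb_tail {
	uint8_t  s_wtime_hi;		/* 0x274 */
	uint8_t  s_mtime_hi;		/* 0x275 */
	uint8_t  s_mkfs_time_hi;	/* 0x276 */
	uint8_t  s_lastcheck_hi;	/* 0x277 */
	uint8_t  s_first_error_time_hi;	/* 0x278 */
	uint8_t  s_last_error_time_hi;	/* 0x279 */
	uint8_t  s_pad[2];		/* 0x27A */
	uint16_t s_encoding;		/* 0x27C: filename charset encoding */
	uint16_t s_encoding_flags;	/* 0x27E: filename charset encoding flags */
	uint32_t s_reserved[95];	/* 0x280: padding to the end of the block */
	uint32_t s_checksum;		/* 0x3FC */
};

#define SB_TAIL_BASE	0x274
#define SB_OFF(field)	(SB_TAIL_BASE + offsetof(struct sb_tail, field))

_Static_assert(SB_OFF(s_encoding) == 0x27C, "s_encoding offset");
_Static_assert(SB_OFF(s_encoding_flags) == 0x27E, "s_encoding_flags offset");
_Static_assert(SB_OFF(s_reserved) == 0x280, "s_reserved offset");
_Static_assert(SB_OFF(s_checksum) == 0x3FC, "s_checksum offset");
```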
@@ -617,7 +625,7 @@ following:
    * - 0x80
      - Enable a filesystem size of 2^64 blocks (INCOMPAT\_64BIT).
    * - 0x100
-     - Multiple mount protection. Not implemented (INCOMPAT\_MMP).
+     - Multiple mount protection (INCOMPAT\_MMP).
    * - 0x200
      - Flexible block groups. See the earlier discussion of this feature
        (INCOMPAT\_FLEX\_BG).
......
@@ -38,6 +38,7 @@ int __init ext4_init_system_zone(void)
 void ext4_exit_system_zone(void)
 {
+	rcu_barrier();
 	kmem_cache_destroy(ext4_system_zone_cachep);
 }
@@ -49,17 +50,26 @@ static inline int can_merge(struct ext4_system_zone *entry1,
 	return 0;
 }
 
+static void release_system_zone(struct ext4_system_blocks *system_blks)
+{
+	struct ext4_system_zone	*entry, *n;
+
+	rbtree_postorder_for_each_entry_safe(entry, n,
+				&system_blks->root, node)
+		kmem_cache_free(ext4_system_zone_cachep, entry);
+}
+
 /*
  * Mark a range of blocks as belonging to the "system zone" --- that
  * is, filesystem metadata blocks which should never be used by
  * inodes.
  */
-static int add_system_zone(struct ext4_sb_info *sbi,
+static int add_system_zone(struct ext4_system_blocks *system_blks,
 			   ext4_fsblk_t start_blk,
 			   unsigned int count)
 {
 	struct ext4_system_zone *new_entry = NULL, *entry;
-	struct rb_node **n = &sbi->system_blks.rb_node, *node;
+	struct rb_node **n = &system_blks->root.rb_node, *node;
 	struct rb_node *parent = NULL, *new_node = NULL;
 
 	while (*n) {
@@ -91,7 +101,7 @@ static int add_system_zone(struct ext4_sb_info *sbi,
 		new_node = &new_entry->node;
 
 		rb_link_node(new_node, parent, n);
-		rb_insert_color(new_node, &sbi->system_blks);
+		rb_insert_color(new_node, &system_blks->root);
 	}
 
 	/* Can we merge to the left? */
@@ -101,7 +111,7 @@ static int add_system_zone(struct ext4_sb_info *sbi,
 		if (can_merge(entry, new_entry)) {
 			new_entry->start_blk = entry->start_blk;
 			new_entry->count += entry->count;
-			rb_erase(node, &sbi->system_blks);
+			rb_erase(node, &system_blks->root);
 			kmem_cache_free(ext4_system_zone_cachep, entry);
 		}
 	}
@@ -112,7 +122,7 @@ static int add_system_zone(struct ext4_sb_info *sbi,
 		entry = rb_entry(node, struct ext4_system_zone, node);
 		if (can_merge(new_entry, entry)) {
 			new_entry->count += entry->count;
-			rb_erase(node, &sbi->system_blks);
+			rb_erase(node, &system_blks->root);
 			kmem_cache_free(ext4_system_zone_cachep, entry);
 		}
 	}
@@ -126,7 +136,7 @@ static void debug_print_tree(struct ext4_sb_info *sbi)
 	int first = 1;
 
 	printk(KERN_INFO "System zones: ");
-	node = rb_first(&sbi->system_blks);
+	node = rb_first(&sbi->system_blks->root);
 	while (node) {
 		entry = rb_entry(node, struct ext4_system_zone, node);
 		printk(KERN_CONT "%s%llu-%llu", first ? "" : ", ",
@@ -137,7 +147,47 @@ static void debug_print_tree(struct ext4_sb_info *sbi)
 	printk(KERN_CONT "\n");
 }
 
-static int ext4_protect_reserved_inode(struct super_block *sb, u32 ino)
+/*
+ * Returns 1 if the passed-in block region (start_blk,
+ * start_blk+count) is valid; 0 if some part of the block region
+ * overlaps with filesystem metadata blocks.
+ */
+static int ext4_data_block_valid_rcu(struct ext4_sb_info *sbi,
+				     struct ext4_system_blocks *system_blks,
+				     ext4_fsblk_t start_blk,
+				     unsigned int count)
+{
+	struct ext4_system_zone *entry;
+	struct rb_node *n;
+
+	if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) ||
+	    (start_blk + count < start_blk) ||
+	    (start_blk + count > ext4_blocks_count(sbi->s_es))) {
+		sbi->s_es->s_last_error_block = cpu_to_le64(start_blk);
+		return 0;
+	}
+
+	if (system_blks == NULL)
+		return 1;
+
+	n = system_blks->root.rb_node;
+	while (n) {
+		entry = rb_entry(n, struct ext4_system_zone, node);
+		if (start_blk + count - 1 < entry->start_blk)
+			n = n->rb_left;
+		else if (start_blk >= (entry->start_blk + entry->count))
+			n = n->rb_right;
+		else {
+			sbi->s_es->s_last_error_block = cpu_to_le64(start_blk);
+			return 0;
+		}
+	}
+	return 1;
+}
+
+static int ext4_protect_reserved_inode(struct super_block *sb,
+				       struct ext4_system_blocks *system_blks,
+				       u32 ino)
 {
 	struct inode *inode;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
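The checks in ext4\_data_block\_valid\_rcu() can be exercised outside the kernel. In this sketch, a sorted array of non-overlapping zones searched by bisection replaces the rbtree, which is a swapped-in data structure used only for illustration. The range checks and the left/right/overlap decisions mirror the function above. `struct zone` and `data_block_valid` are hypothetical names. A NULL zone array plays the role of the not-yet-published system\_blks pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one system zone: blocks [start, start + count). */
struct zone {
	uint64_t start;
	uint32_t count;
};

static bool data_block_valid(uint64_t first_data_block, uint64_t blocks_count,
			     const struct zone *zones, size_t nzones,
			     uint64_t start, uint32_t count)
{
	if (start <= first_data_block ||	/* at or below the first data block */
	    start + count < start ||		/* range wraps around */
	    start + count > blocks_count)	/* range runs past the end */
		return false;
	if (zones == NULL)			/* no system zone published */
		return true;

	size_t lo = 0, hi = nzones;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (start + count - 1 < zones[mid].start)
			hi = mid;		/* range lies left of this zone */
		else if (start >= zones[mid].start + zones[mid].count)
			lo = mid + 1;		/* range lies right of this zone */
		else
			return false;		/* overlaps filesystem metadata */
	}
	return true;
}
```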
@@ -163,14 +213,15 @@ static int ext4_protect_reserved_inode(struct super_block *sb, u32 ino)
 		if (n == 0) {
 			i++;
 		} else {
-			if (!ext4_data_block_valid(sbi, map.m_pblk, n)) {
+			if (!ext4_data_block_valid_rcu(sbi, system_blks,
+						map.m_pblk, n)) {
 				ext4_error(sb, "blocks %llu-%llu from inode %u "
 					   "overlap system zone", map.m_pblk,
 					   map.m_pblk + map.m_len - 1, ino);
 				err = -EFSCORRUPTED;
 				break;
 			}
-			err = add_system_zone(sbi, map.m_pblk, n);
+			err = add_system_zone(system_blks, map.m_pblk, n);
 			if (err < 0)
 				break;
 			i += n;
@@ -180,94 +231,130 @@ static int ext4_protect_reserved_inode(struct super_block *sb, u32 ino)
 	return err;
 }
 
+static void ext4_destroy_system_zone(struct rcu_head *rcu)
+{
+	struct ext4_system_blocks *system_blks;
+
+	system_blks = container_of(rcu, struct ext4_system_blocks, rcu);
+	release_system_zone(system_blks);
+	kfree(system_blks);
+}
+
+/*
+ * Build system zone rbtree which is used for block validity checking.
+ *
+ * The update of system_blks pointer in this function is protected by
+ * sb->s_umount semaphore. However we have to be careful as we can be
+ * racing with ext4_data_block_valid() calls reading system_blks rbtree
+ * protected only by RCU. That's why we first build the rbtree and then
+ * swap it in place.
+ */
 int ext4_setup_system_zone(struct super_block *sb)
 {
 	ext4_group_t ngroups = ext4_get_groups_count(sb);
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_system_blocks *system_blks;
 	struct ext4_group_desc *gdp;
 	ext4_group_t i;
 	int flex_size = ext4_flex_bg_size(sbi);
 	int ret;
 
 	if (!test_opt(sb, BLOCK_VALIDITY)) {
-		if (sbi->system_blks.rb_node)
+		if (sbi->system_blks)
 			ext4_release_system_zone(sb);
 		return 0;
 	}
-	if (sbi->system_blks.rb_node)
+	if (sbi->system_blks)
 		return 0;
 
+	system_blks = kzalloc(sizeof(*system_blks), GFP_KERNEL);
+	if (!system_blks)
+		return -ENOMEM;
+
 	for (i=0; i < ngroups; i++) {
 		cond_resched();
 		if (ext4_bg_has_super(sb, i) &&
 		    ((i < 5) || ((i % flex_size) == 0)))
-			add_system_zone(sbi, ext4_group_first_block_no(sb, i),
+			add_system_zone(system_blks,
+					ext4_group_first_block_no(sb, i),
 					ext4_bg_num_gdb(sb, i) + 1);
 		gdp = ext4_get_group_desc(sb, i, NULL);
-		ret = add_system_zone(sbi, ext4_block_bitmap(sb, gdp), 1);
+		ret = add_system_zone(system_blks,
+				ext4_block_bitmap(sb, gdp), 1);
 		if (ret)
-			return ret;
-		ret = add_system_zone(sbi, ext4_inode_bitmap(sb, gdp), 1);
+			goto err;
+		ret = add_system_zone(system_blks,
+				ext4_inode_bitmap(sb, gdp), 1);
 		if (ret)
-			return ret;
-		ret = add_system_zone(sbi, ext4_inode_table(sb, gdp),
+			goto err;
+		ret = add_system_zone(system_blks,
+				ext4_inode_table(sb, gdp),
 				sbi->s_itb_per_group);
 		if (ret)
-			return ret;
+			goto err;
 	}
 	if (ext4_has_feature_journal(sb) && sbi->s_es->s_journal_inum) {
-		ret = ext4_protect_reserved_inode(sb,
+		ret = ext4_protect_reserved_inode(sb, system_blks,
 				le32_to_cpu(sbi->s_es->s_journal_inum));
 		if (ret)
-			return ret;
+			goto err;
 	}
 
+	/*
+	 * System blks rbtree complete, announce it once to prevent racing
+	 * with ext4_data_block_valid() accessing the rbtree at the same
+	 * time.
+	 */
+	rcu_assign_pointer(sbi->system_blks, system_blks);
+
 	if (test_opt(sb, DEBUG))
 		debug_print_tree(sbi);
 	return 0;
+err:
+	release_system_zone(system_blks);
+	kfree(system_blks);
+	return ret;
 }
 
-/* Called when the filesystem is unmounted */
+/*
+ * Called when the filesystem is unmounted or when remounting it with
+ * noblock_validity specified.
+ *
+ * The update of system_blks pointer in this function is protected by
+ * sb->s_umount semaphore. However we have to be careful as we can be
+ * racing with ext4_data_block_valid() calls reading system_blks rbtree
+ * protected only by RCU. So we first clear the system_blks pointer and
+ * then free the rbtree only after RCU grace period expires.
+ */
 void ext4_release_system_zone(struct super_block *sb)
 {
-	struct ext4_system_zone	*entry, *n;
+	struct ext4_system_blocks *system_blks;
 
-	rbtree_postorder_for_each_entry_safe(entry, n,
-			&EXT4_SB(sb)->system_blks, node)
-		kmem_cache_free(ext4_system_zone_cachep, entry);
+	system_blks = rcu_dereference_protected(EXT4_SB(sb)->system_blks,
+					lockdep_is_held(&sb->s_umount));
+	rcu_assign_pointer(EXT4_SB(sb)->system_blks, NULL);
 
-	EXT4_SB(sb)->system_blks = RB_ROOT;
+	if (system_blks)
+		call_rcu(&system_blks->rcu, ext4_destroy_system_zone);
 }
 
-/*
- * Returns 1 if the passed-in block region (start_blk,
- * start_blk+count) is valid; 0 if some part of the block region
- * overlaps with filesystem metadata blocks.
- */
 int ext4_data_block_valid(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk,
 			  unsigned int count)
 {
-	struct ext4_system_zone *entry;
-	struct rb_node *n = sbi->system_blks.rb_node;
+	struct ext4_system_blocks *system_blks;
+	int ret;
 
-	if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) ||
-	    (start_blk + count < start_blk) ||
-	    (start_blk + count > ext4_blocks_count(sbi->s_es))) {
-		sbi->s_es->s_last_error_block = cpu_to_le64(start_blk);
-		return 0;
-	}
-	while (n) {
-		entry = rb_entry(n, struct ext4_system_zone, node);
-		if (start_blk + count - 1 < entry->start_blk)
-			n = n->rb_left;
-		else if (start_blk >= (entry->start_blk + entry->count))
-			n = n->rb_right;
-		else {
-			sbi->s_es->s_last_error_block = cpu_to_le64(start_blk);
-			return 0;
-		}
-	}
-	return 1;
+	/*
+	 * Lock the system zone to prevent it being released concurrently
+	 * when doing a remount which inverse current "[no]block_validity"
+	 * mount option.
+	 */
+	rcu_read_lock();
+	system_blks = rcu_dereference(sbi->system_blks);
+	ret = ext4_data_block_valid_rcu(sbi, system_blks, start_blk,
+					count);
+	rcu_read_unlock();
+	return ret;
 }
 
 int ext4_check_blockref(const char *function, unsigned int line,
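The pattern in ext4\_setup\_system\_zone() and ext4\_release\_system\_zone() is as follows. The writer builds the tree completely and then publishes a single pointer, while readers dereference that pointer inside a read-side critical section. In this sketch, C11 atomics stand in for the RCU primitives: a release exchange plays the role of rcu\_assign\_pointer(), and an acquire load plays the role of rcu\_dereference(). Using exchange rather than a plain store is a simplification of ours. Userspace has no call\_rcu(), so the sketch stops at publication. It does not implement grace periods, and the caller must not free the old pointer until no reader can still hold it. `struct zone_set` and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ext4_system_blocks. */
struct zone_set {
	size_t nzones;
};

/* Pointer readers consult; NULL means "no zone set published". */
static _Atomic(struct zone_set *) published_zones;

/* Build the whole object before anyone can see it. */
static struct zone_set *build_zone_set(size_t nzones)
{
	struct zone_set *z = malloc(sizeof(*z));

	if (z)
		z->nzones = nzones;
	return z;
}

/* Writer: publish with release ordering and hand back the old pointer.
 * Freeing it safely requires waiting for readers (call_rcu() in the
 * kernel), which this sketch does NOT do. */
static struct zone_set *publish_zones(struct zone_set *fresh)
{
	return atomic_exchange_explicit(&published_zones, fresh,
					memory_order_acq_rel);
}

/* Reader: the acquire load guarantees a non-NULL pointer refers to a
 * fully built zone set. */
static size_t reader_zone_count(void)
{
	struct zone_set *z = atomic_load_explicit(&published_zones,
						  memory_order_acquire);

	return z ? z->nzones : 0;
}
```

The single-threaded usage below only shows the publish/clear sequence. Real safety against concurrent frees comes from the grace period that this sketch omits.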
......
@@ -668,14 +668,15 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
 			  const char *str, const struct qstr *name)
 {
 	struct qstr qstr = {.name = str, .len = len };
+	struct inode *inode = dentry->d_parent->d_inode;
 
-	if (!IS_CASEFOLDED(dentry->d_parent->d_inode)) {
+	if (!IS_CASEFOLDED(inode) || !EXT4_SB(inode->i_sb)->s_encoding) {
 		if (len != name->len)
 			return -1;
 		return memcmp(str, name->name, len);
 	}
 
-	return ext4_ci_compare(dentry->d_parent->d_inode, name, &qstr, false);
+	return ext4_ci_compare(inode, name, &qstr, false);
 }
 
 static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
static int ext4_d_hash(const struct dentry *dentry, struct qstr *str) static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
@@ -685,7 +686,7 @@ static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
 	unsigned char *norm;
 	int len, ret = 0;
 
-	if (!IS_CASEFOLDED(dentry->d_inode))
+	if (!IS_CASEFOLDED(dentry->d_inode) || !um)
 		return 0;
 
 	norm = kmalloc(PATH_MAX, GFP_ATOMIC);
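The fix for the spurious casefold flag makes case-insensitive handling conditional on two things: the directory carries the casefold flag, and the superblock actually has an encoding loaded. Otherwise the code falls back to an exact comparison. This sketch reproduces that guard with plain booleans. ASCII tolower() stands in for the kernel's UTF-8 casefolding, which is a deliberate simplification. `name_compare` is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 when the names match, nonzero otherwise. */
static int name_compare(bool dir_casefolded, bool have_encoding,
			const char *a, size_t alen, const char *b, size_t blen)
{
	if (!dir_casefolded || !have_encoding) {
		/* exact match path, as in ext4_d_compare() */
		if (alen != blen)
			return -1;
		return memcmp(a, b, alen);
	}
	/* case-insensitive path; ASCII folding keeps lengths equal */
	if (alen != blen)
		return 1;
	for (size_t i = 0; i < alen; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 1;
	return 0;
}
```

A flag with no encoding behind it now degrades to byte-exact lookup instead of reaching code that needs the missing encoding table.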
......
@@ -185,6 +185,14 @@ struct ext4_map_blocks {
 	unsigned int m_flags;
 };
 
+/*
+ * Block validity checking, system zone rbtree.
+ */
+struct ext4_system_blocks {
+	struct rb_root root;
+	struct rcu_head rcu;
+};
+
 /*
  * Flags for ext4_io_end->flags
  */
* Flags for ext4_io_end->flags * Flags for ext4_io_end->flags
*/ */
@@ -285,6 +293,9 @@ struct ext4_io_submit {
 				  ~((ext4_fsblk_t) (s)->s_cluster_ratio - 1))
 #define EXT4_LBLK_CMASK(s, lblk) ((lblk) &				\
 				  ~((ext4_lblk_t) (s)->s_cluster_ratio - 1))
+/* Fill in the low bits to get the last block of the cluster */
+#define EXT4_LBLK_CFILL(sbi, lblk) ((lblk) |				\
+				    ((ext4_lblk_t) (sbi)->s_cluster_ratio - 1))
 /* Get the cluster offset */
 #define EXT4_PBLK_COFF(s, pblk) ((pblk) &				\
 				 ((ext4_fsblk_t) (s)->s_cluster_ratio - 1))
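Since s\_cluster\_ratio is a power of two, the cluster macros reduce to bit masks. CMASK clears the low bits to get the first block of the cluster, the new CFILL sets them to get the last block, and COFF keeps them as the offset within the cluster. In this sketch the ratio is passed directly rather than through struct ext4\_sb\_info, and the function names are hypothetical.

```c
#include <stdint.h>

/* ratio = blocks per cluster, assumed to be a power of two. */
static uint32_t lblk_cmask(uint32_t ratio, uint32_t lblk) { return lblk & ~(ratio - 1); }	/* first block */
static uint32_t lblk_cfill(uint32_t ratio, uint32_t lblk) { return lblk | (ratio - 1); }	/* last block */
static uint32_t lblk_coff(uint32_t ratio, uint32_t lblk) { return lblk & (ratio - 1); }	/* offset */
```

With 16 blocks per cluster, block 37 lies in the cluster spanning blocks 32-47, at offset 5.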
@@ -653,6 +664,10 @@ enum {
 #define EXT4_IOC_SET_ENCRYPTION_POLICY	FS_IOC_SET_ENCRYPTION_POLICY
 #define EXT4_IOC_GET_ENCRYPTION_PWSALT	FS_IOC_GET_ENCRYPTION_PWSALT
 #define EXT4_IOC_GET_ENCRYPTION_POLICY	FS_IOC_GET_ENCRYPTION_POLICY
+/* ioctl codes 19--39 are reserved for fscrypt */
+#define EXT4_IOC_CLEAR_ES_CACHE		_IO('f', 40)
+#define EXT4_IOC_GETSTATE		_IOW('f', 41, __u32)
+#define EXT4_IOC_GET_ES_CACHE		_IOWR('f', 42, struct fiemap)
 
 #define EXT4_IOC_FSGETXATTR		FS_IOC_FSGETXATTR
 #define EXT4_IOC_FSSETXATTR		FS_IOC_FSSETXATTR
@@ -666,6 +681,16 @@ enum {
 #define EXT4_GOING_FLAGS_LOGFLUSH		0x1	/* flush log but not data */
 #define EXT4_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
 
+/*
+ * Flags returned by EXT4_IOC_GETSTATE
+ *
+ * We only expose to userspace a subset of the state flags in
+ * i_state_flags
+ */
+#define EXT4_STATE_FLAG_EXT_PRECACHED	0x00000001
+#define EXT4_STATE_FLAG_NEW		0x00000002
+#define EXT4_STATE_FLAG_NEWENTRY	0x00000004
+#define EXT4_STATE_FLAG_DA_ALLOC_CLOSE	0x00000008
+
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /*
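A userspace caller of the new EXT4\_IOC\_GETSTATE ioctl might look like the sketch below. It assumes the installed headers may not export the new definitions, so it repeats the values added above. It also assumes the kernel writes the flags back through the \_\_u32 pointer argument. `ext4_get_state` and `describe_state` are hypothetical helpers, not e2fsprogs APIs.

```c
#include <fcntl.h>
#include <linux/types.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values copied from the ext4.h additions above (assumed not exported). */
#ifndef EXT4_IOC_GETSTATE
#define EXT4_IOC_GETSTATE		_IOW('f', 41, __u32)
#endif
#define EXT4_STATE_FLAG_EXT_PRECACHED	0x00000001
#define EXT4_STATE_FLAG_NEW		0x00000002
#define EXT4_STATE_FLAG_NEWENTRY	0x00000004
#define EXT4_STATE_FLAG_DA_ALLOC_CLOSE	0x00000008

/* Fetch the exported inode state flags for 'path'; 0 on success, -1 on error. */
static int ext4_get_state(const char *path, uint32_t *state)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	__u32 val = 0;
	int ret = ioctl(fd, EXT4_IOC_GETSTATE, &val);
	close(fd);
	if (ret < 0)
		return -1;
	*state = val;
	return 0;
}

/* Render flags as "name|name"; returns the number of characters written. */
static size_t describe_state(uint32_t s, char *buf, size_t len)
{
	static const struct { uint32_t bit; const char *name; } names[] = {
		{ EXT4_STATE_FLAG_EXT_PRECACHED, "ext_precached" },
		{ EXT4_STATE_FLAG_NEW, "new" },
		{ EXT4_STATE_FLAG_NEWENTRY, "newentry" },
		{ EXT4_STATE_FLAG_DA_ALLOC_CLOSE, "da_alloc_close" },
	};
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(s & names[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;			/* output truncated */
		used += (size_t)n;
	}
	return used;
}
```

Only the four flags listed above are reported. As the header comment says, the rest of i\_state\_flags stays private to the kernel.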
@@ -683,6 +708,12 @@ enum {
 #define EXT4_IOC32_SETVERSION_OLD	FS_IOC32_SETVERSION
 #endif
 
+/*
+ * Returned by EXT4_IOC_GET_ES_CACHE as an additional possible flag.
+ * It indicates that the entry in extent status cache is for a hole.
+ */
+#define EXT4_FIEMAP_EXTENT_HOLE		0x08000000
+
 /* Max physical block we can address w/o extents */
 #define EXT4_MAX_BLOCK_FILE_PHYS	0xFFFFFFFF
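EXT4\_IOC\_GET\_ES\_CACHE reuses struct fiemap, so a caller can fill in a request the same way as for FS\_IOC\_FIEMAP and then classify each returned extent by its flags. The classification order follows ext4\_fill\_es\_cache\_info(), where holes carry EXT4\_FIEMAP\_EXTENT\_HOLE and delayed extents carry FIEMAP\_EXTENT\_DELALLOC. This sketch assumes the new definitions are not in the installed headers and that a whole-file range (FIEMAP\_MAX\_OFFSET) is accepted. `get_es_cache` and `es_kind` are hypothetical names.

```c
#include <fcntl.h>
#include <linux/fiemap.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef EXT4_IOC_GET_ES_CACHE
#define EXT4_IOC_GET_ES_CACHE	_IOWR('f', 42, struct fiemap)
#endif
#ifndef EXT4_FIEMAP_EXTENT_HOLE
#define EXT4_FIEMAP_EXTENT_HOLE	0x08000000
#endif

/* Classify one cached extent by its fe_flags. */
static const char *es_kind(uint32_t flags)
{
	if (flags & EXT4_FIEMAP_EXTENT_HOLE)
		return "hole";
	if (flags & FIEMAP_EXTENT_DELALLOC)
		return "delayed";
	if (flags & FIEMAP_EXTENT_UNWRITTEN)
		return "unwritten";
	return "written";
}

/* Request up to max_extents cached extents for the whole file. Returns a
 * malloc()ed struct fiemap (caller frees) whose first fm_mapped_extents
 * entries are valid, or NULL on error. */
static struct fiemap *get_es_cache(const char *path, unsigned int max_extents)
{
	size_t size = sizeof(struct fiemap) +
		      (size_t)max_extents * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, size);
	int fd;

	if (!fm)
		return NULL;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;
	fm->fm_extent_count = max_extents;

	fd = open(path, O_RDONLY);
	if (fd < 0 || ioctl(fd, EXT4_IOC_GET_ES_CACHE, fm) < 0) {
		if (fd >= 0)
			close(fd);
		free(fm);
		return NULL;
	}
	close(fd);
	return fm;
}
```

Unlike FIEMAP, this reports only what the in-memory extent status cache currently holds. Two calls on the same file can therefore differ as the cache is populated or cleared, for example with EXT4\_IOC\_CLEAR\_ES\_CACHE.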
@@ -812,21 +843,8 @@ static inline __le32 ext4_encode_extra_time(struct timespec64 *time)
 static inline void ext4_decode_extra_time(struct timespec64 *time,
 					  __le32 extra)
 {
-	if (unlikely(extra & cpu_to_le32(EXT4_EPOCH_MASK))) {
-
-#if 1
-		/* Handle legacy encoding of pre-1970 dates with epoch
-		 * bits 1,1. (This backwards compatibility may be removed
-		 * at the discretion of the ext4 developers.)
-		 */
-		u64 extra_bits = le32_to_cpu(extra) & EXT4_EPOCH_MASK;
-		if (extra_bits == 3 && ((time->tv_sec) & 0x80000000) != 0)
-			extra_bits = 0;
-		time->tv_sec += extra_bits << 32;
-#else
+	if (unlikely(extra & cpu_to_le32(EXT4_EPOCH_MASK)))
 		time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32;
-#endif
-	}
 	time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
 }
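After the workaround is dropped, decoding is uniform. The signed 32-bit seconds field is extended by the two epoch bits, each worth 2^32 seconds, and the remaining 30 bits of `extra` hold nanoseconds. This sketch works on host-endian values. It assumes the base seconds field is interpreted as signed, and it uses hypothetical names. The last assertion in the tests shows what changed: a negative base with epoch bits 1,1, which the removed legacy branch treated as a pre-1970 date, now decodes to a time far in the future.

```c
#include <stdint.h>

#define EPOCH_BITS	2
#define EPOCH_MASK	((1u << EPOCH_BITS) - 1)
#define NSEC_MASK	(~0u << EPOCH_BITS)

/* Seconds: signed 32-bit base plus (epoch bits << 32). */
static int64_t decode_sec(int32_t base_sec, uint32_t extra)
{
	return (int64_t)base_sec + ((int64_t)(extra & EPOCH_MASK) << 32);
}

/* Nanoseconds: the upper 30 bits of extra. */
static uint32_t decode_nsec(uint32_t extra)
{
	return (extra & NSEC_MASK) >> EPOCH_BITS;
}
```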
@@ -1427,7 +1445,7 @@ struct ext4_sb_info {
 	int s_jquota_fmt;			/* Format of quota to use */
 #endif
 	unsigned int s_want_extra_isize; /* New inodes should reserve # bytes */
-	struct rb_root system_blks;
+	struct ext4_system_blocks __rcu *system_blks;
 
 #ifdef EXTENTS_STATS
 	/* ext4 extents stats */
#ifdef EXTENTS_STATS #ifdef EXTENTS_STATS
/* ext4 extents stats */ /* ext4 extents stats */
@@ -3267,6 +3285,9 @@ extern int ext4_ext_check_inode(struct inode *inode);
 extern ext4_lblk_t ext4_ext_next_allocated_block(struct ext4_ext_path *path);
 extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			__u64 start, __u64 len);
+extern int ext4_get_es_cache(struct inode *inode,
+			     struct fiemap_extent_info *fieinfo,
+			     __u64 start, __u64 len);
 extern int ext4_ext_precache(struct inode *inode);
 extern int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len);
 extern int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len);
@@ -3359,6 +3380,19 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
 
 extern const struct iomap_ops ext4_iomap_ops;
 
+static inline int ext4_buffer_uptodate(struct buffer_head *bh)
+{
+	/*
+	 * If the buffer has the write error flag, we have failed
+	 * to write out data in the block.  In this case, we don't
+	 * have to read the block because we may read the old data
+	 * successfully.
+	 */
+	if (!buffer_uptodate(bh) && buffer_write_io_error(bh))
+		set_buffer_uptodate(bh);
+	return buffer_uptodate(bh);
+}
+
 #endif	/* __KERNEL__ */
 
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
......
@@ -2315,6 +2315,52 @@ static int ext4_fill_fiemap_extents(struct inode *inode,
 	return err;
 }
+static int ext4_fill_es_cache_info(struct inode *inode,
+				   ext4_lblk_t block, ext4_lblk_t num,
+				   struct fiemap_extent_info *fieinfo)
+{
+	ext4_lblk_t next, end = block + num - 1;
+	struct extent_status es;
+	unsigned char blksize_bits = inode->i_sb->s_blocksize_bits;
+	unsigned int flags;
+	int err;
+	while (block <= end) {
+		next = 0;
+		flags = 0;
+		if (!ext4_es_lookup_extent(inode, block, &next, &es))
+			break;
+		if (ext4_es_is_unwritten(&es))
+			flags |= FIEMAP_EXTENT_UNWRITTEN;
+		if (ext4_es_is_delayed(&es))
+			flags |= (FIEMAP_EXTENT_DELALLOC |
+				  FIEMAP_EXTENT_UNKNOWN);
+		if (ext4_es_is_hole(&es))
+			flags |= EXT4_FIEMAP_EXTENT_HOLE;
+		if (next == 0)
+			flags |= FIEMAP_EXTENT_LAST;
+		if (flags & (FIEMAP_EXTENT_DELALLOC|
+			     EXT4_FIEMAP_EXTENT_HOLE))
+			es.es_pblk = 0;
+		else
+			es.es_pblk = ext4_es_pblock(&es);
+		err = fiemap_fill_next_extent(fieinfo,
+				(__u64)es.es_lblk << blksize_bits,
+				(__u64)es.es_pblk << blksize_bits,
+				(__u64)es.es_len << blksize_bits,
+				flags);
+		if (next == 0)
+			break;
+		block = next;
+		if (err < 0)
+			return err;
+		if (err == 1)
+			return 0;
+	}
+	return 0;
+}
 /*
  * ext4_ext_determine_hole - determine hole around given block
  * @inode: inode we lookup in
@@ -3813,8 +3859,8 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
 	 * illegal.
 	 */
 	if (ee_block != map->m_lblk || ee_len > map->m_len) {
-#ifdef EXT4_DEBUG
-		ext4_warning("Inode (%ld) finished: extent logical block %llu,"
+#ifdef CONFIG_EXT4_DEBUG
+		ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu,"
 			" len %u; IO logical block %llu, len %u",
 			inode->i_ino, (unsigned long long)ee_block, ee_len,
 			(unsigned long long)map->m_lblk, map->m_len);
@@ -5017,8 +5063,6 @@ static int ext4_find_delayed_extent(struct inode *inode,
 	return next_del;
 }
-/* fiemap flags we can handle specified here */
-#define EXT4_FIEMAP_FLAGS (FIEMAP_FLAG_SYNC|FIEMAP_FLAG_XATTR)
 static int ext4_xattr_fiemap(struct inode *inode,
 				struct fiemap_extent_info *fieinfo)
@@ -5055,10 +5099,16 @@ static int ext4_xattr_fiemap(struct inode *inode,
 	return (error < 0 ? error : 0);
 }
-int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-		__u64 start, __u64 len)
+static int _ext4_fiemap(struct inode *inode,
+			struct fiemap_extent_info *fieinfo,
+			__u64 start, __u64 len,
+			int (*fill)(struct inode *, ext4_lblk_t,
+				    ext4_lblk_t,
+				    struct fiemap_extent_info *))
 {
 	ext4_lblk_t start_blk;
+	u32 ext4_fiemap_flags = FIEMAP_FLAG_SYNC|FIEMAP_FLAG_XATTR;
 	int error = 0;
 	if (ext4_has_inline_data(inode)) {
@@ -5075,14 +5125,18 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		error = ext4_ext_precache(inode);
 		if (error)
 			return error;
+		fieinfo->fi_flags &= ~FIEMAP_FLAG_CACHE;
 	}
 	/* fallback to generic here if not in extents fmt */
-	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
+	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) &&
+	    fill == ext4_fill_fiemap_extents)
 		return generic_block_fiemap(inode, fieinfo, start, len,
 			ext4_get_block);
-	if (fiemap_check_flags(fieinfo, EXT4_FIEMAP_FLAGS))
+	if (fill == ext4_fill_es_cache_info)
+		ext4_fiemap_flags &= FIEMAP_FLAG_XATTR;
+	if (fiemap_check_flags(fieinfo, ext4_fiemap_flags))
 		return -EBADR;
 	if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
@@ -5101,12 +5155,36 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		 * Walk the extent tree gathering extent information
 		 * and pushing extents back to the user.
 		 */
-		error = ext4_fill_fiemap_extents(inode, start_blk,
-						 len_blks, fieinfo);
+		error = fill(inode, start_blk, len_blks, fieinfo);
 	}
 	return error;
 }
+int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+		__u64 start, __u64 len)
+{
+	return _ext4_fiemap(inode, fieinfo, start, len,
+			    ext4_fill_fiemap_extents);
+}
+int ext4_get_es_cache(struct inode *inode, struct fiemap_extent_info *fieinfo,
+		      __u64 start, __u64 len)
+{
+	if (ext4_has_inline_data(inode)) {
+		int has_inline;
+		down_read(&EXT4_I(inode)->xattr_sem);
+		has_inline = ext4_has_inline_data(inode);
+		up_read(&EXT4_I(inode)->xattr_sem);
+		if (has_inline)
+			return 0;
+	}
+	return _ext4_fiemap(inode, fieinfo, start, len,
+			    ext4_fill_es_cache_info);
+}
 /*
  * ext4_access_path:
  * Function to access the path buffer for marking it dirty.
......
@@ -70,8 +70,8 @@ struct ext4_es_tree {
 struct ext4_es_stats {
 	unsigned long es_stats_shrunk;
-	unsigned long es_stats_cache_hits;
-	unsigned long es_stats_cache_misses;
+	struct percpu_counter es_stats_cache_hits;
+	struct percpu_counter es_stats_cache_misses;
 	u64 es_stats_scan_time;
 	u64 es_stats_max_scan_time;
 	struct percpu_counter es_stats_all_cnt;
@@ -140,6 +140,7 @@ extern void ext4_es_find_extent_range(struct inode *inode,
 				      ext4_lblk_t lblk, ext4_lblk_t end,
 				      struct extent_status *es);
 extern int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
+				 ext4_lblk_t *next_lblk,
 				 struct extent_status *es);
 extern bool ext4_es_scan_range(struct inode *inode,
 			       int (*matching_fn)(struct extent_status *es),
@@ -246,7 +247,6 @@ extern int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
 					bool allocated);
 extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
 					ext4_lblk_t len);
-extern void ext4_es_remove_blks(struct inode *inode, ext4_lblk_t lblk,
-				ext4_lblk_t len);
+extern void ext4_clear_inode_es(struct inode *inode);
 #endif /* _EXT4_EXTENTS_STATUS_H */
@@ -230,8 +230,6 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (IS_DAX(inode))
 		return ext4_dax_write_iter(iocb, from);
 #endif
-	if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT))
-		return -EOPNOTSUPP;
 	if (!inode_trylock(inode)) {
 		if (iocb->ki_flags & IOCB_NOWAIT)
......
@@ -280,7 +280,7 @@ int ext4fs_dirhash(const struct inode *dir, const char *name, int len,
 	unsigned char *buff;
 	struct qstr qstr = {.name = name, .len = len };
-	if (len && IS_CASEFOLDED(dir)) {
+	if (len && IS_CASEFOLDED(dir) && um) {
 		buff = kzalloc(sizeof(char) * PATH_MAX, GFP_KERNEL);
 		if (!buff)
 			return -ENOMEM;
......
@@ -1416,7 +1416,7 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 		err = ext4_htree_store_dirent(dir_file, hinfo->hash,
 					      hinfo->minor_hash, de, &tmp_str);
 		if (err) {
-			count = err;
+			ret = err;
 			goto out;
 		}
 		count++;
......
@@ -527,7 +527,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		return -EFSCORRUPTED;
 	/* Lookup extent status tree firstly */
-	if (ext4_es_lookup_extent(inode, map->m_lblk, &es)) {
+	if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
 		if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) {
 			map->m_pblk = ext4_es_pblock(&es) +
 					map->m_lblk - es.es_lblk;
@@ -695,7 +695,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * extent status tree.
 	 */
 	if ((flags & EXT4_GET_BLOCKS_PRE_IO) &&
-	    ext4_es_lookup_extent(inode, map->m_lblk, &es)) {
+	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
 		if (ext4_es_is_written(&es))
 			goto out_sem;
 	}
@@ -1024,7 +1024,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
 	bh = ext4_getblk(handle, inode, block, map_flags);
 	if (IS_ERR(bh))
 		return bh;
-	if (!bh || buffer_uptodate(bh))
+	if (!bh || ext4_buffer_uptodate(bh))
 		return bh;
 	ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &bh);
 	wait_on_buffer(bh);
@@ -1051,7 +1051,7 @@ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count,
 	for (i = 0; i < bh_count; i++)
 		/* Note that NULL bhs[i] is valid because of holes. */
-		if (bhs[i] && !buffer_uptodate(bhs[i]))
+		if (bhs[i] && !ext4_buffer_uptodate(bhs[i]))
 			ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1,
 				    &bhs[i]);
@@ -1656,49 +1656,6 @@ void ext4_da_release_space(struct inode *inode, int to_free)
 	dquot_release_reservation_block(inode, EXT4_C2B(sbi, to_free));
 }
-static void ext4_da_page_release_reservation(struct page *page,
-					     unsigned int offset,
-					     unsigned int length)
-{
-	int contiguous_blks = 0;
-	struct buffer_head *head, *bh;
-	unsigned int curr_off = 0;
-	struct inode *inode = page->mapping->host;
-	unsigned int stop = offset + length;
-	ext4_fsblk_t lblk;
-	BUG_ON(stop > PAGE_SIZE || stop < length);
-	head = page_buffers(page);
-	bh = head;
-	do {
-		unsigned int next_off = curr_off + bh->b_size;
-		if (next_off > stop)
-			break;
-		if ((offset <= curr_off) && (buffer_delay(bh))) {
-			contiguous_blks++;
-			clear_buffer_delay(bh);
-		} else if (contiguous_blks) {
-			lblk = page->index <<
-			       (PAGE_SHIFT - inode->i_blkbits);
-			lblk += (curr_off >> inode->i_blkbits) -
-				contiguous_blks;
-			ext4_es_remove_blks(inode, lblk, contiguous_blks);
-			contiguous_blks = 0;
-		}
-		curr_off = next_off;
-	} while ((bh = bh->b_this_page) != head);
-	if (contiguous_blks) {
-		lblk = page->index << (PAGE_SHIFT - inode->i_blkbits);
-		lblk += (curr_off >> inode->i_blkbits) - contiguous_blks;
-		ext4_es_remove_blks(inode, lblk, contiguous_blks);
-	}
-}
 /*
  * Delayed allocation stuff
  */
@@ -1878,7 +1835,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		  (unsigned long) map->m_lblk);
 	/* Lookup extent status tree firstly */
-	if (ext4_es_lookup_extent(inode, iblock, &es)) {
+	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
 		if (ext4_es_is_hole(&es)) {
 			retval = 0;
 			down_read(&EXT4_I(inode)->i_data_sem);
@@ -2800,15 +2757,6 @@ static int ext4_writepages(struct address_space *mapping,
 		goto out_writepages;
 	}
-	if (ext4_should_dioread_nolock(inode)) {
-		/*
-		 * We may need to convert up to one extent per block in
-		 * the page and we may dirty the inode.
-		 */
-		rsv_blocks = 1 + ext4_chunk_trans_blocks(inode,
-						PAGE_SIZE >> inode->i_blkbits);
-	}
 	/*
 	 * If we have inline data and arrive here, it means that
 	 * we will soon create the block for the 1st page, so
@@ -2827,6 +2775,15 @@ static int ext4_writepages(struct address_space *mapping,
 		ext4_journal_stop(handle);
 	}
+	if (ext4_should_dioread_nolock(inode)) {
+		/*
+		 * We may need to convert up to one extent per block in
+		 * the page and we may dirty the inode.
+		 */
+		rsv_blocks = 1 + ext4_chunk_trans_blocks(inode,
+						PAGE_SIZE >> inode->i_blkbits);
+	}
 	if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
 		range_whole = 1;
@@ -3242,24 +3199,6 @@ static int ext4_da_write_end(struct file *file,
 	return ret ? ret : copied;
 }
-static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
-				   unsigned int length)
-{
-	/*
-	 * Drop reserved blocks
-	 */
-	BUG_ON(!PageLocked(page));
-	if (!page_has_buffers(page))
-		goto out;
-	ext4_da_page_release_reservation(page, offset, length);
-out:
-	ext4_invalidatepage(page, offset, length);
-	return;
-}
 /*
  * Force all delayed allocation blocks to be allocated for a given inode.
  */
@@ -4002,7 +3941,7 @@ static const struct address_space_operations ext4_da_aops = {
 	.write_end = ext4_da_write_end,
 	.set_page_dirty = ext4_set_page_dirty,
 	.bmap = ext4_bmap,
-	.invalidatepage = ext4_da_invalidatepage,
+	.invalidatepage = ext4_invalidatepage,
 	.releasepage = ext4_releasepage,
 	.direct_IO = ext4_direct_IO,
 	.migratepage = buffer_migrate_page,
@@ -4314,6 +4253,15 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 	trace_ext4_punch_hole(inode, offset, length, 0);
+	ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
+	if (ext4_has_inline_data(inode)) {
+		down_write(&EXT4_I(inode)->i_mmap_sem);
+		ret = ext4_convert_inline_data(inode);
+		up_write(&EXT4_I(inode)->i_mmap_sem);
+		if (ret)
+			return ret;
+	}
 	/*
 	 * Write out all dirty pages to avoid race conditions
 	 * Then release them.
@@ -5137,6 +5085,9 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 				 "iget: bogus i_mode (%o)", inode->i_mode);
 		goto bad_inode;
 	}
+	if (IS_CASEFOLDED(inode) && !ext4_has_feature_casefold(inode->i_sb))
+		ext4_error_inode(inode, function, line, 0,
+				 "casefold flag without casefold feature");
 	brelse(iloc.bh);
 	unlock_new_inode(inode);
......
@@ -745,6 +745,74 @@ static void ext4_fill_fsxattr(struct inode *inode, struct fsxattr *fa)
 	fa->fsx_projid = from_kprojid(&init_user_ns, ei->i_projid);
 }
+/* copied from fs/ioctl.c */
+static int fiemap_check_ranges(struct super_block *sb,
+			       u64 start, u64 len, u64 *new_len)
+{
+	u64 maxbytes = (u64) sb->s_maxbytes;
+	*new_len = len;
+	if (len == 0)
+		return -EINVAL;
+	if (start > maxbytes)
+		return -EFBIG;
+	/*
+	 * Shrink request scope to what the fs can actually handle.
+	 */
+	if (len > maxbytes || (maxbytes - len) < start)
+		*new_len = maxbytes - start;
+	return 0;
+}
+/* So that the fiemap access checks can't overflow on 32 bit machines. */
+#define FIEMAP_MAX_EXTENTS (UINT_MAX / sizeof(struct fiemap_extent))
+static int ext4_ioctl_get_es_cache(struct file *filp, unsigned long arg)
+{
+	struct fiemap fiemap;
+	struct fiemap __user *ufiemap = (struct fiemap __user *) arg;
+	struct fiemap_extent_info fieinfo = { 0, };
+	struct inode *inode = file_inode(filp);
+	struct super_block *sb = inode->i_sb;
+	u64 len;
+	int error;
+	if (copy_from_user(&fiemap, ufiemap, sizeof(fiemap)))
+		return -EFAULT;
+	if (fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS)
+		return -EINVAL;
+	error = fiemap_check_ranges(sb, fiemap.fm_start, fiemap.fm_length,
+				    &len);
+	if (error)
+		return error;
+	fieinfo.fi_flags = fiemap.fm_flags;
+	fieinfo.fi_extents_max = fiemap.fm_extent_count;
+	fieinfo.fi_extents_start = ufiemap->fm_extents;
+	if (fiemap.fm_extent_count != 0 &&
+	    !access_ok(fieinfo.fi_extents_start,
+		       fieinfo.fi_extents_max * sizeof(struct fiemap_extent)))
+		return -EFAULT;
+	if (fieinfo.fi_flags & FIEMAP_FLAG_SYNC)
+		filemap_write_and_wait(inode->i_mapping);
+	error = ext4_get_es_cache(inode, &fieinfo, fiemap.fm_start, len);
+	fiemap.fm_flags = fieinfo.fi_flags;
+	fiemap.fm_mapped_extents = fieinfo.fi_extents_mapped;
+	if (copy_to_user(ufiemap, &fiemap, sizeof(fiemap)))
+		error = -EFAULT;
+	return error;
+}
 long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct inode *inode = file_inode(filp);
@@ -1142,6 +1210,33 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 			return -EOPNOTSUPP;
 		return fscrypt_ioctl_get_key_status(filp, (void __user *)arg);
+	case EXT4_IOC_CLEAR_ES_CACHE:
+	{
+		if (!inode_owner_or_capable(inode))
+			return -EACCES;
+		ext4_clear_inode_es(inode);
+		return 0;
+	}
+	case EXT4_IOC_GETSTATE:
+	{
+		__u32 state = 0;
+		if (ext4_test_inode_state(inode, EXT4_STATE_EXT_PRECACHED))
+			state |= EXT4_STATE_FLAG_EXT_PRECACHED;
+		if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
+			state |= EXT4_STATE_FLAG_NEW;
+		if (ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY))
+			state |= EXT4_STATE_FLAG_NEWENTRY;
+		if (ext4_test_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE))
+			state |= EXT4_STATE_FLAG_DA_ALLOC_CLOSE;
+		return put_user(state, (__u32 __user *) arg);
+	}
+	case EXT4_IOC_GET_ES_CACHE:
+		return ext4_ioctl_get_es_cache(filp, arg);
 	case EXT4_IOC_FSGETXATTR:
 	{
 		struct fsxattr fa;
@@ -1278,6 +1373,9 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case FS_IOC_GETFSMAP:
 	case FS_IOC_ENABLE_VERITY:
 	case FS_IOC_MEASURE_VERITY:
+	case EXT4_IOC_CLEAR_ES_CACHE:
+	case EXT4_IOC_GETSTATE:
+	case EXT4_IOC_GET_ES_CACHE:
 		break;
 	default:
 		return -ENOIOCTLCMD;
......
@@ -1312,7 +1312,7 @@ void ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
 {
 	int len;
-	if (!IS_CASEFOLDED(dir)) {
+	if (!IS_CASEFOLDED(dir) || !EXT4_SB(dir->i_sb)->s_encoding) {
 		cf_name->name = NULL;
 		return;
 	}
@@ -2183,7 +2183,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 #ifdef CONFIG_UNICODE
 	if (ext4_has_strict_mode(sbi) && IS_CASEFOLDED(dir) &&
-	    utf8_validate(sbi->s_encoding, &dentry->d_name))
+	    sbi->s_encoding && utf8_validate(sbi->s_encoding, &dentry->d_name))
 		return -EINVAL;
 #endif
......
@@ -1878,6 +1878,13 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 	} else if (token == Opt_commit) {
 		if (arg == 0)
 			arg = JBD2_DEFAULT_MAX_COMMIT_AGE;
+		else if (arg > INT_MAX / HZ) {
+			ext4_msg(sb, KERN_ERR,
+				 "Invalid commit interval %d, "
+				 "must be smaller than %d",
+				 arg, INT_MAX / HZ);
+			return -1;
+		}
 		sbi->s_commit_interval = HZ * arg;
 	} else if (token == Opt_debug_want_extra_isize) {
 		sbi->s_want_extra_isize = arg;
......
@@ -638,10 +638,8 @@ static void flush_descriptor(journal_t *journal,
 {
 	jbd2_journal_revoke_header_t *header;
-	if (is_journal_aborted(journal)) {
-		put_bh(descriptor);
+	if (is_journal_aborted(journal))
 		return;
-	}
 	header = (jbd2_journal_revoke_header_t *)descriptor->b_data;
 	header->r_count = cpu_to_be32(offset);
......
@@ -569,6 +569,9 @@ int jbd2_journal_start_reserved(handle_t *handle, unsigned int type,
 	}
 	handle->h_type = type;
 	handle->h_line_no = line_no;
+	trace_jbd2_handle_start(journal->j_fs_dev->bd_dev,
+				handle->h_transaction->t_tid, type,
+				line_no, handle->h_buffer_credits);
 	return 0;
 }
 EXPORT_SYMBOL(jbd2_journal_start_reserved);
......
@@ -154,7 +154,7 @@ static int utf8_parse_version(const char *version, unsigned int *maj,
 {
 	substring_t args[3];
 	char version_string[12];
-	const struct match_token token[] = {
+	static const struct match_token token[] = {
 		{1, "%d.%d.%d"},
 		{0, NULL}
 	};
......
@@ -35,7 +35,7 @@ unsigned int total_tests;
 #define test_f(cond, fmt, ...) _test(cond, __func__, __LINE__, fmt, ##__VA_ARGS__)
 #define test(cond) _test(cond, __func__, __LINE__, "")
-const static struct {
+static const struct {
 	/* UTF-8 strings in this vector _must_ be NULL-terminated. */
 	unsigned char str[10];
 	unsigned char dec[10];
@@ -89,7 +89,7 @@ const static struct {
 };
-const static struct {
+static const struct {
 	/* UTF-8 strings in this vector _must_ be NULL-terminated. */
 	unsigned char str[30];
 	unsigned char ncf[30];
......