Commit 2150edc6 authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
  jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
  ext4: Remove "extents" mount option
  block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
  ext4: Make printk's consistently prefixed with "EXT4-fs: "
  ext4: Add sanity checks for the superblock before mounting the filesystem
  ext4: Add mount option to set kjournald's I/O priority
  jbd2: Submit writes to the journal using WRITE_SYNC
  jbd2: Add pid and journal device name to the "kjournald2 starting" message
  ext4: Add markers for better debuggability
  ext4: Remove code to create the journal inode
  ext4: provide function to release metadata pages under memory pressure
  ext3: provide function to release metadata pages under memory pressure
  add releasepage hooks to block devices which can be used by file systems
  ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
  ext4: Init the complete page while building buddy cache
  ext4: Don't allow new groups to be added during block allocation
  ext4: mark the blocks/inode bitmap beyond end of group as used
  ext4: Use new buffer_head flag to check uninit group bitmaps initialization
  ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
  ext4: code cleanup
  ...
parents cd764695 4b905671
......@@ -58,13 +58,22 @@ Note: More extensive information for getting started with ext4 can be
# mount -t ext4 /dev/hda1 /wherever
- When comparing performance with other filesystems, remember that
ext3/4 by default offers higher data integrity guarantees than most.
So when comparing with a metadata-only journalling filesystem, such
as ext3, use `mount -o data=writeback'. And you might as well use
`mount -o nobh' too along with it. Making the journal larger than
the mke2fs default often helps performance with metadata-intensive
workloads.
- When comparing performance with other filesystems, it's always
important to try multiple workloads; very often a subtle change in a
workload parameter can completely change the ranking of which
filesystems do well compared to others. When comparing versus ext3,
note that ext4 enables write barriers by default, while ext3 does
not enable write barriers by default. So it is useful to use
explicitly specify whether barriers are enabled or not when via the
'-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems
for a fair comparison. When tuning ext3 for best benchmark numbers,
it is often worthwhile to try changing the data journaling mode; '-o
data=writeback,nobh' can be faster for some workloads. (Note
however that running mounted with data=writeback can potentially
leave stale data exposed in recently written files in case of an
unclean shutdown, which could be a security exposure in some
situations.) Configuring the filesystem with a large journal can
also be helpful for metadata-intensive workloads.
2. Features
===========
......@@ -74,7 +83,7 @@ Note: More extensive information for getting started with ext4 can be
* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
* internal redunancy in tree
* internal redundancy in tree
* improved file allocation (multi-block alloc)
* fix 32000 subdirectory limit
* nsec timestamps for mtime, atime, ctime, create time
......@@ -116,10 +125,11 @@ grouping of bitmaps and inode tables. Some test results available here:
When mounting an ext4 filesystem, the following option are accepted:
(*) == default
extents (*) ext4 will use extents to address file data. The
file system will no longer be mountable by ext3.
noextents ext4 will not use extents for newly created files
ro Mount filesystem read only. Note that ext4 will
replay the journal (and thus write to the
partition) even when mounted "read only". The
mount options "ro,noload" can be used to prevent
writes to the filesystem.
journal_checksum Enable checksumming of the journal transactions.
This will allow the recovery code in e2fsck and the
......@@ -134,17 +144,17 @@ journal_async_commit Commit block can be written to disk without waiting
journal=update Update the ext4 file system's journal to the current
format.
journal=inum When a journal already exists, this option is ignored.
Otherwise, it specifies the number of the inode which
will represent the ext4 file system's journal file.
journal_dev=devnum When the external journal device's major/minor numbers
have changed, this option allows the user to specify
the new journal location. The journal device is
identified through its new major/minor numbers encoded
in devnum.
noload Don't load the journal on mounting.
noload Don't load the journal on mounting. Note that
if the filesystem was not unmounted cleanly,
skipping the journal replay will lead to the
filesystem containing inconsistencies that can
lead to any number of problems.
data=journal All data are committed into the journal prior to being
written into the main file system.
......@@ -219,9 +229,12 @@ minixdf Make 'df' act like Minix.
debug Extra debugging information is sent to syslog.
errors=remount-ro(*) Remount the filesystem read-only on an error.
errors=remount-ro Remount the filesystem read-only on an error.
errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs.
(These mount options override the errors behavior
specified in the superblock, which can be configured
using tune2fs)
data_err=ignore(*) Just print an error message if an error occurs
in a file data buffer in ordered mode.
......@@ -261,6 +274,42 @@ delalloc (*) Deferring block allocation until write-out time.
nodelalloc Disable delayed allocation. Blocks are allocation
when data is copied from user to page cache.
max_batch_time=usec Maximum amount of time ext4 should wait for
additional filesystem operations to be batch
together with a synchronous write operation.
Since a synchronous write operation is going to
force a commit and then a wait for the I/O
complete, it doesn't cost much, and can be a
huge throughput win, we wait for a small amount
of time to see if any other transactions can
piggyback on the synchronous write. The
algorithm used is designed to automatically tune
for the speed of the disk, by measuring the
amount of time (on average) that it takes to
finish committing a transaction. Call this time
the "commit time". If the time that the
transactoin has been running is less than the
commit time, ext4 will try sleeping for the
commit time to see if other operations will join
the transaction. The commit time is capped by
the max_batch_time, which defaults to 15000us
(15ms). This optimization can be turned off
entirely by setting max_batch_time to 0.
min_batch_time=usec This parameter sets the commit time (as
described above) to be at least min_batch_time.
It defaults to zero microseconds. Increasing
this parameter may improve the throughput of
multi-threaded, synchronous workloads on very
fast disks, at the cost of increasing latency.
journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the
highest priorty) which should be used for I/O
operations submitted by kjournald2 during a
commit operation. This defaults to 3, which is
a slightly higher priority than the default I/O
priority.
Data Mode
=========
There are 3 different data modes:
......
......@@ -36,6 +36,12 @@ config LBD
This option also enables support for single files larger than
2TB.
The ext4 filesystem requires that this feature be enabled in
order to support filesystems that have the huge_file feature
enabled. Otherwise, it will refuse to mount any filesystems
that use the huge_file feature, which is enabled by default
by mke2fs.ext4. The GFS2 filesystem also requires this feature.
If unsure, say N.
config BLK_DEV_IO_TRACE
......
......@@ -1234,6 +1234,20 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
return blkdev_ioctl(bdev, mode, cmd, arg);
}
/*
* Try to release a page associated with block device when the system
* is under memory pressure.
*/
static int blkdev_releasepage(struct page *page, gfp_t wait)
{
struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super;
if (super && super->s_op->bdev_try_to_free_page)
return super->s_op->bdev_try_to_free_page(super, page, wait);
return try_to_free_buffers(page);
}
static const struct address_space_operations def_blk_aops = {
.readpage = blkdev_readpage,
.writepage = blkdev_writepage,
......@@ -1241,6 +1255,7 @@ static const struct address_space_operations def_blk_aops = {
.write_begin = blkdev_write_begin,
.write_end = blkdev_write_end,
.writepages = generic_writepages,
.releasepage = blkdev_releasepage,
.direct_IO = blkdev_direct_IO,
};
......
......@@ -35,23 +35,71 @@ static void TEA_transform(__u32 buf[4], __u32 const in[])
/* The old legacy hash */
static __u32 dx_hack_hash (const char *name, int len)
static __u32 dx_hack_hash_unsigned(const char *name, int len)
{
__u32 hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
const unsigned char *ucp = (const unsigned char *) name;
while (len--) {
hash = hash1 + (hash0 ^ (((int) *ucp++) * 7152373));
if (hash & 0x80000000)
hash -= 0x7fffffff;
hash1 = hash0;
hash0 = hash;
}
return hash0 << 1;
}
static __u32 dx_hack_hash_signed(const char *name, int len)
{
__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
const signed char *scp = (const signed char *) name;
while (len--) {
__u32 hash = hash1 + (hash0 ^ (*name++ * 7152373));
hash = hash1 + (hash0 ^ (((int) *scp++) * 7152373));
if (hash & 0x80000000) hash -= 0x7fffffff;
if (hash & 0x80000000)
hash -= 0x7fffffff;
hash1 = hash0;
hash0 = hash;
}
return (hash0 << 1);
return hash0 << 1;
}
static void str2hashbuf(const char *msg, int len, __u32 *buf, int num)
static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
{
__u32 pad, val;
int i;
const signed char *scp = (const signed char *) msg;
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
val = pad;
if (len > num*4)
len = num * 4;
for (i = 0; i < len; i++) {
if ((i % 4) == 0)
val = pad;
val = ((int) scp[i]) + (val << 8);
if ((i % 4) == 3) {
*buf++ = val;
val = pad;
num--;
}
}
if (--num >= 0)
*buf++ = val;
while (--num >= 0)
*buf++ = pad;
}
static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
{
__u32 pad, val;
int i;
const unsigned char *ucp = (const unsigned char *) msg;
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
......@@ -62,7 +110,7 @@ static void str2hashbuf(const char *msg, int len, __u32 *buf, int num)
for (i=0; i < len; i++) {
if ((i % 4) == 0)
val = pad;
val = msg[i] + (val << 8);
val = ((int) ucp[i]) + (val << 8);
if ((i % 4) == 3) {
*buf++ = val;
val = pad;
......@@ -95,6 +143,8 @@ int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
const char *p;
int i;
__u32 in[8], buf[4];
void (*str2hashbuf)(const char *, int, __u32 *, int) =
str2hashbuf_signed;
/* Initialize the default seed for the hash checksum functions */
buf[0] = 0x67452301;
......@@ -113,13 +163,18 @@ int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
}
switch (hinfo->hash_version) {
case DX_HASH_LEGACY_UNSIGNED:
hash = dx_hack_hash_unsigned(name, len);
break;
case DX_HASH_LEGACY:
hash = dx_hack_hash(name, len);
hash = dx_hack_hash_signed(name, len);
break;
case DX_HASH_HALF_MD4_UNSIGNED:
str2hashbuf = str2hashbuf_unsigned;
case DX_HASH_HALF_MD4:
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 8);
(*str2hashbuf)(p, len, in, 8);
half_md4_transform(buf, in);
len -= 32;
p += 32;
......@@ -127,10 +182,12 @@ int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
minor_hash = buf[2];
hash = buf[1];
break;
case DX_HASH_TEA_UNSIGNED:
str2hashbuf = str2hashbuf_unsigned;
case DX_HASH_TEA:
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 4);
(*str2hashbuf)(p, len, in, 4);
TEA_transform(buf, in);
len -= 16;
p += 16;
......
......@@ -364,6 +364,8 @@ dx_probe(struct qstr *entry, struct inode *dir,
goto fail;
}
hinfo->hash_version = root->info.hash_version;
if (hinfo->hash_version <= DX_HASH_TEA)
hinfo->hash_version += EXT3_SB(dir->i_sb)->s_hash_unsigned;
hinfo->seed = EXT3_SB(dir->i_sb)->s_hash_seed;
if (entry)
ext3fs_dirhash(entry->name, entry->len, hinfo);
......@@ -632,6 +634,9 @@ int ext3_htree_fill_tree(struct file *dir_file, __u32 start_hash,
dir = dir_file->f_path.dentry->d_inode;
if (!(EXT3_I(dir)->i_flags & EXT3_INDEX_FL)) {
hinfo.hash_version = EXT3_SB(dir->i_sb)->s_def_hash_version;
if (hinfo.hash_version <= DX_HASH_TEA)
hinfo.hash_version +=
EXT3_SB(dir->i_sb)->s_hash_unsigned;
hinfo.seed = EXT3_SB(dir->i_sb)->s_hash_seed;
count = htree_dirblock_to_tree(dir_file, dir, 0, &hinfo,
start_hash, start_minor_hash);
......@@ -1152,9 +1157,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
u32 hash2;
struct dx_map_entry *map;
char *data1 = (*bh)->b_data, *data2;
unsigned split, move, size, i;
unsigned split, move, size;
struct ext3_dir_entry_2 *de = NULL, *de2;
int err = 0;
int err = 0, i;
bh2 = ext3_append (handle, dir, &newblock, &err);
if (!(bh2)) {
......@@ -1394,6 +1399,8 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
/* Initialize as for dx_probe */
hinfo.hash_version = root->info.hash_version;
if (hinfo.hash_version <= DX_HASH_TEA)
hinfo.hash_version += EXT3_SB(dir->i_sb)->s_hash_unsigned;
hinfo.seed = EXT3_SB(dir->i_sb)->s_hash_seed;
ext3fs_dirhash(name, namelen, &hinfo);
frame = frames;
......
......@@ -683,6 +683,26 @@ static struct dentry *ext3_fh_to_parent(struct super_block *sb, struct fid *fid,
ext3_nfs_get_inode);
}
/*
* Try to release metadata pages (indirect blocks, directories) which are
* mapped via the block device. Since these pages could have journal heads
* which would prevent try_to_free_buffers() from freeing them, we must use
* jbd layer's try_to_free_buffers() function to release them.
*/
static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
gfp_t wait)
{
journal_t *journal = EXT3_SB(sb)->s_journal;
WARN_ON(PageChecked(page));
if (!page_has_buffers(page))
return 0;
if (journal)
return journal_try_to_free_buffers(journal, page,
wait & ~__GFP_WAIT);
return try_to_free_buffers(page);
}
#ifdef CONFIG_QUOTA
#define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group")
#define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
......@@ -749,6 +769,7 @@ static const struct super_operations ext3_sops = {
.quota_read = ext3_quota_read,
.quota_write = ext3_quota_write,
#endif
.bdev_try_to_free_page = bdev_try_to_free_page,
};
static const struct export_operations ext3_export_ops = {
......@@ -1750,6 +1771,18 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
for (i=0; i < 4; i++)
sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
sbi->s_def_hash_version = es->s_def_hash_version;
i = le32_to_cpu(es->s_flags);
if (i & EXT2_FLAGS_UNSIGNED_HASH)
sbi->s_hash_unsigned = 3;
else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
#ifdef __CHAR_UNSIGNED__
es->s_flags |= cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
sbi->s_hash_unsigned = 3;
#else
es->s_flags |= cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
#endif
sb->s_dirt = 1;
}
if (sbi->s_blocks_per_group > blocksize * 8) {
printk (KERN_ERR
......
This diff is collapsed.
......@@ -15,10 +15,9 @@
static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0};
unsigned long ext4_count_free(struct buffer_head *map, unsigned int numchars)
unsigned int ext4_count_free(struct buffer_head *map, unsigned int numchars)
{
unsigned int i;
unsigned long sum = 0;
unsigned int i, sum = 0;
if (!map)
return 0;
......
......@@ -64,7 +64,7 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
int ext4_check_dir_entry(const char *function, struct inode *dir,
struct ext4_dir_entry_2 *de,
struct buffer_head *bh,
unsigned long offset)
unsigned int offset)
{
const char *error_msg = NULL;
const int rlen = ext4_rec_len_from_disk(de->rec_len);
......@@ -84,9 +84,9 @@ int ext4_check_dir_entry(const char *function, struct inode *dir,
if (error_msg != NULL)
ext4_error(dir->i_sb, function,
"bad entry in directory #%lu: %s - "
"offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
"offset=%u, inode=%u, rec_len=%d, name_len=%d",
dir->i_ino, error_msg, offset,
(unsigned long) le32_to_cpu(de->inode),
le32_to_cpu(de->inode),
rlen, de->name_len);
return error_msg == NULL ? 1 : 0;
}
......@@ -95,7 +95,7 @@ static int ext4_readdir(struct file *filp,
void *dirent, filldir_t filldir)
{
int error = 0;
unsigned long offset;
unsigned int offset;
int i, stored;
struct ext4_dir_entry_2 *de;
struct super_block *sb;
......@@ -405,7 +405,7 @@ static int call_filldir(struct file *filp, void *dirent,
sb = inode->i_sb;
if (!fname) {
printk(KERN_ERR "ext4: call_filldir: called with "
printk(KERN_ERR "EXT4-fs: call_filldir: called with "
"null fname?!?\n");
return 0;
}
......
This diff is collapsed.
......@@ -194,11 +194,6 @@ static inline unsigned short ext_depth(struct inode *inode)
return le16_to_cpu(ext_inode_hdr(inode)->eh_depth);
}
static inline void ext4_ext_tree_changed(struct inode *inode)
{
EXT4_I(inode)->i_ext_generation++;
}
static inline void
ext4_ext_invalidate_cache(struct inode *inode)
{
......
......@@ -31,7 +31,7 @@ typedef unsigned long long ext4_fsblk_t;
typedef __u32 ext4_lblk_t;
/* data type for block group number */
typedef unsigned long ext4_group_t;
typedef unsigned int ext4_group_t;
#define rsv_start rsv_window._rsv_start
#define rsv_end rsv_window._rsv_end
......@@ -100,9 +100,6 @@ struct ext4_inode_info {
*/
loff_t i_disksize;
/* on-disk additional length */
__u16 i_extra_isize;
/*
* i_data_sem is for serialising ext4_truncate() against
* ext4_getblock(). In the 2.4 ext2 design, great chunks of inode's
......@@ -117,7 +114,6 @@ struct ext4_inode_info {
struct inode vfs_inode;
struct jbd2_inode jinode;
unsigned long i_ext_generation;
struct ext4_ext_cache i_cached_extent;
/*
* File creation time. Its function is same as that of
......@@ -130,10 +126,14 @@ struct ext4_inode_info {
spinlock_t i_prealloc_lock;
/* allocation reservation info for delalloc */
unsigned long i_reserved_data_blocks;
unsigned long i_reserved_meta_blocks;
unsigned long i_allocated_meta_blocks;
unsigned int i_reserved_data_blocks;
unsigned int i_reserved_meta_blocks;
unsigned int i_allocated_meta_blocks;
unsigned short i_delalloc_reserved_flag;
/* on-disk additional length */
__u16 i_extra_isize;
spinlock_t i_block_reservation_lock;
};
......
......@@ -7,53 +7,96 @@
int __ext4_journal_get_undo_access(const char *where, handle_t *handle,
struct buffer_head *bh)
{
int err = jbd2_journal_get_undo_access(handle, bh);
int err = 0;
if (ext4_handle_valid(handle)) {
err = jbd2_journal_get_undo_access(handle, bh);
if (err)
ext4_journal_abort_handle(where, __func__, bh, handle, err);
ext4_journal_abort_handle(where, __func__, bh,
handle, err);
}
return err;
}
int __ext4_journal_get_write_access(const char *where, handle_t *handle,
struct buffer_head *bh)
{
int err = jbd2_journal_get_write_access(handle, bh);
int err = 0;
if (ext4_handle_valid(handle)) {
err = jbd2_journal_get_write_access(handle, bh);
if (err)
ext4_journal_abort_handle(where, __func__, bh, handle, err);
ext4_journal_abort_handle(where, __func__, bh,
handle, err);
}
return err;
}
int __ext4_journal_forget(const char *where, handle_t *handle,
struct buffer_head *bh)
{
int err = jbd2_journal_forget(handle, bh);
int err = 0;
if (ext4_handle_valid(handle)) {
err = jbd2_journal_forget(handle, bh);
if (err)
ext4_journal_abort_handle(where, __func__, bh, handle, err);
ext4_journal_abort_handle(where, __func__, bh,
handle, err);
}
return err;
}
int __ext4_journal_revoke(const char *where, handle_t *handle,
ext4_fsblk_t blocknr, struct buffer_head *bh)
{
int err = jbd2_journal_revoke(handle, blocknr, bh);
int err = 0;
if (ext4_handle_valid(handle)) {
err = jbd2_journal_revoke(handle, blocknr, bh);
if (err)
ext4_journal_abort_handle(where, __func__, bh, handle, err);
ext4_journal_abort_handle(where, __func__, bh,
handle, err);
}
return err;
}
int __ext4_journal_get_create_access(const char *where,
handle_t *handle, struct buffer_head *bh)
{
int err = jbd2_journal_get_create_access(handle, bh);
int err = 0;
if (ext4_handle_valid(handle)) {
err = jbd2_journal_get_create_access(handle, bh);
if (err)
ext4_journal_abort_handle(where, __func__, bh, handle, err);
ext4_journal_abort_handle(where, __func__, bh,
handle, err);
}
return err;
}
int __ext4_journal_dirty_metadata(const char *where,
handle_t *handle, struct buffer_head *bh)
int __ext4_handle_dirty_metadata(const char *where, handle_t *handle,
struct inode *inode, struct buffer_head *bh)
{
int err = jbd2_journal_dirty_metadata(handle, bh);
int err = 0;
if (ext4_handle_valid(handle)) {
err = jbd2_journal_dirty_metadata(handle, bh);
if (err)
ext4_journal_abort_handle(where, __func__, bh, handle, err);
ext4_journal_abort_handle(where, __func__, bh,
handle, err);
} else {
mark_buffer_dirty(bh);
if (inode && inode_needs_sync(inode)) {
sync_dirty_buffer(bh);
if (buffer_req(bh) && !buffer_uptodate(bh)) {
ext4_error(inode->i_sb, __func__,
"IO error syncing inode, "
"inode=%lu, block=%llu",
inode->i_ino,
(unsigned long long) bh->b_blocknr);
err = -EIO;
}
}
}
return err;
}
......@@ -33,7 +33,7 @@
#define EXT4_SINGLEDATA_TRANS_BLOCKS(sb) \
(EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS) \
|| test_opt(sb, EXTENTS) ? 27U : 8U)
? 27U : 8U)
/* Extended attribute operations touch at most two data buffers,
* two bitmap buffers, and two group summaries, in addition to the inode
......@@ -122,12 +122,6 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode);
* been done yet.
*/
static inline void ext4_journal_release_buffer(handle_t *handle,
struct buffer_head *bh)
{
jbd2_journal_release_buffer(handle, bh);
}
void ext4_journal_abort_handle(const char *caller, const char *err_fn,
struct buffer_head *bh, handle_t *handle, int err);
......@@ -146,8 +140,8 @@ int __ext4_journal_revoke(const char *where, handle_t *handle,
int __ext4_journal_get_create_access(const char *where,
handle_t *handle, struct buffer_head *bh);
int __ext4_journal_dirty_metadata(const char *where,
handle_t *handle, struct buffer_head *bh);
int __ext4_handle_dirty_metadata(const char *where, handle_t *handle,
struct inode *inode, struct buffer_head *bh);
#define ext4_journal_get_undo_access(handle, bh) \
__ext4_journal_get_undo_access(__func__, (handle), (bh))
......@@ -157,14 +151,57 @@ int __ext4_journal_dirty_metadata(const char *where,
__ext4_journal_revoke(__func__, (handle), (blocknr), (bh))
#define ext4_journal_get_create_access(handle, bh) \
__ext4_journal_get_create_access(__func__, (handle), (bh))
#define ext4_journal_dirty_metadata(handle, bh) \
__ext4_journal_dirty_metadata(__func__, (handle), (bh))
#define ext4_journal_forget(handle, bh) \
__ext4_journal_forget(__func__, (handle), (bh))
#define ext4_handle_dirty_metadata(handle, inode, bh) \
__ext4_handle_dirty_metadata(__func__, (handle), (inode), (bh))
handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks);
int __ext4_journal_stop(const char *where, handle_t *handle);
#define EXT4_NOJOURNAL_HANDLE ((handle_t *) 0x1)
static inline int ext4_handle_valid(handle_t *handle)
{
if (handle == EXT4_NOJOURNAL_HANDLE)
return 0;
return 1;
}
static inline void ext4_handle_sync(handle_t *handle)
{
if (ext4_handle_valid(handle))
handle->h_sync = 1;
}
static inline void ext4_handle_release_buffer(handle_t *handle,
struct buffer_head *bh)
{
if (ext4_handle_valid(handle))
jbd2_journal_release_buffer(handle, bh);
}
static inline int ext4_handle_is_aborted(handle_t *handle)
{
if (ext4_handle_valid(handle))
return is_handle_aborted(handle);
return 0;
}
static inline int ext4_handle_has_enough_credits(handle_t *handle, int needed)
{
if (ext4_handle_valid(handle) && handle->h_buffer_credits < needed)
return 0;
return 1;
}
static inline void ext4_journal_release_buffer(handle_t *handle,
struct buffer_head *bh)
{
if (ext4_handle_valid(handle))
jbd2_journal_release_buffer(handle, bh);
}
static inline handle_t *ext4_journal_start(struct inode *inode, int nblocks)
{
return ext4_journal_start_sb(inode->i_sb, nblocks);
......@@ -180,27 +217,37 @@ static inline handle_t *ext4_journal_current_handle(void)
static inline int ext4_journal_extend(handle_t *handle, int nblocks)
{
if (ext4_handle_valid(handle))
return jbd2_journal_extend(handle, nblocks);
return 0;
}
static inline int ext4_journal_restart(handle_t *handle, int nblocks)
{
if (ext4_handle_valid(handle))
return jbd2_journal_restart(handle, nblocks);
return 0;
}
static inline int ext4_journal_blocks_per_page(struct inode *inode)
{
if (EXT4_JOURNAL(inode) != NULL)
return jbd2_journal_blocks_per_page(inode);
return 0;
}
static inline int ext4_journal_force_commit(journal_t *journal)
{
if (journal)
return jbd2_journal_force_commit(journal);
return 0;
}
static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
{
if (ext4_handle_valid(handle))
return jbd2_journal_file_inode(handle, &EXT4_I(inode)->jinode);
return 0;
}
/* super.c */
......@@ -208,6 +255,8 @@ int ext4_force_commit(struct super_block *sb);
static inline int ext4_should_journal_data(struct inode *inode)
{
if (EXT4_JOURNAL(inode) == NULL)
return 0;
if (!S_ISREG(inode->i_mode))
return 1;
if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
......@@ -219,6 +268,8 @@ static inline int ext4_should_journal_data(struct inode *inode)
static inline int ext4_should_order_data(struct inode *inode)
{
if (EXT4_JOURNAL(inode) == NULL)
return 0;
if (!S_ISREG(inode->i_mode))
return 0;
if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)
......@@ -230,6 +281,8 @@ static inline int ext4_should_order_data(struct inode *inode)
static inline int ext4_should_writeback_data(struct inode *inode)
{
if (EXT4_JOURNAL(inode) == NULL)
return 0;
if (!S_ISREG(inode->i_mode))
return 0;
if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)
......
......@@ -57,6 +57,7 @@ struct ext4_sb_info {
u32 s_next_generation;
u32 s_hash_seed[4];
int s_def_hash_version;
int s_hash_unsigned; /* 3 if hash should be signed, 0 if not */
struct percpu_counter s_freeblocks_counter;
struct percpu_counter s_freeinodes_counter;
struct percpu_counter s_dirs_counter;
......@@ -73,6 +74,8 @@ struct ext4_sb_info {
struct journal_s *s_journal;
struct list_head s_orphan;
unsigned long s_commit_interval;
u32 s_max_batch_time;
u32 s_min_batch_time;
struct block_device *journal_bdev;
#ifdef CONFIG_JBD2_DEBUG
struct timer_list turn_ro_timer; /* For turning read-only (crash simulation) */
......@@ -101,7 +104,8 @@ struct ext4_sb_info {
spinlock_t s_reserve_lock;
spinlock_t s_md_lock;
tid_t s_last_transaction;
unsigned short *s_mb_offsets, *s_mb_maxs;
unsigned short *s_mb_offsets;
unsigned int *s_mb_maxs;
/* tunables */
unsigned long s_stripe;
......
......@@ -97,6 +97,8 @@ static int ext4_ext_journal_restart(handle_t *handle, int needed)
{
int err;
if (!ext4_handle_valid(handle))
return 0;
if (handle->h_buffer_credits > needed)
return 0;
err = ext4_journal_extend(handle, needed);
......@@ -134,7 +136,7 @@ static int ext4_ext_dirty(handle_t *handle, struct inode *inode,
int err;
if (path->p_bh) {
/* path points to block */
err = ext4_journal_dirty_metadata(handle, path->p_bh);
err = ext4_handle_dirty_metadata(handle, inode, path->p_bh);
} else {
/* path points to leaf/index in inode body */
err = ext4_mark_inode_dirty(handle, inode);
......@@ -191,7 +193,7 @@ ext4_ext_new_meta_block(handle_t *handle, struct inode *inode,
ext4_fsblk_t goal, newblock;
goal = ext4_ext_find_goal(inode, path, le32_to_cpu(ex->ee_block));
newblock = ext4_new_meta_block(handle, inode, goal, err);
newblock = ext4_new_meta_blocks(handle, inode, goal, NULL, err);
return newblock;
}
......@@ -780,7 +782,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
set_buffer_uptodate(bh);
unlock_buffer(bh);
err = ext4_journal_dirty_metadata(handle, bh);
err = ext4_handle_dirty_metadata(handle, inode, bh);
if (err)
goto cleanup;
brelse(bh);
......@@ -859,7 +861,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
set_buffer_uptodate(bh);
unlock_buffer(bh);
err = ext4_journal_dirty_metadata(handle, bh);
err = ext4_handle_dirty_metadata(handle, inode, bh);
if (err)
goto cleanup;
brelse(bh);
......@@ -955,7 +957,7 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
set_buffer_uptodate(bh);
unlock_buffer(bh);
err = ext4_journal_dirty_metadata(handle, bh);
err = ext4_handle_dirty_metadata(handle, inode, bh);
if (err)
goto out;
......@@ -1160,15 +1162,13 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
while (--depth >= 0) {
ix = path[depth].p_idx;
if (ix != EXT_LAST_INDEX(path[depth].p_hdr))
break;
goto got_index;
}
if (depth < 0) {
/* we've gone up to the root and
* found no index to the right */
/* we've gone up to the root and found no index to the right */
return 0;
}
got_index:
/* we've found index to the right, let's
* follow it and find the closest allocated
* block to the right */
......@@ -1201,7 +1201,6 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
*phys = ext_pblock(ex);
put_bh(bh);
return 0;
}
/*
......@@ -1622,7 +1621,6 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
ext4_ext_drop_refs(npath);
kfree(npath);
}
ext4_ext_tree_changed(inode);
ext4_ext_invalidate_cache(inode);
return err;
}
......@@ -2233,7 +2231,6 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start)
}
}
out:
ext4_ext_tree_changed(inode);
ext4_ext_drop_refs(path);
kfree(path);
ext4_journal_stop(handle);
......@@ -2250,7 +2247,7 @@ void ext4_ext_init(struct super_block *sb)
* possible initialization would be here
*/
if (test_opt(sb, EXTENTS)) {
if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
printk(KERN_INFO "EXT4-fs: file extents enabled");
#ifdef AGGRESSIVE_TEST
printk(", aggressive tests");
......@@ -2275,7 +2272,7 @@ void ext4_ext_init(struct super_block *sb)
*/
void ext4_ext_release(struct super_block *sb)
{
if (!test_opt(sb, EXTENTS))
if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS))
return;
#ifdef EXTENTS_STATS
......@@ -2380,7 +2377,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
struct inode *inode,
struct ext4_ext_path *path,
ext4_lblk_t iblock,
unsigned long max_blocks)
unsigned int max_blocks)
{
struct ext4_extent *ex, newex, orig_ex;
struct ext4_extent *ex1 = NULL;
......@@ -2678,26 +2675,26 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
*/
int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ext4_lblk_t iblock,
unsigned long max_blocks, struct buffer_head *bh_result,
unsigned int max_blocks, struct buffer_head *bh_result,
int create, int extend_disksize)
{
struct ext4_ext_path *path = NULL;
struct ext4_extent_header *eh;
struct ext4_extent newex, *ex;
ext4_fsblk_t goal, newblock;
int err = 0, depth, ret;
unsigned long allocated = 0;
ext4_fsblk_t newblock;
int err = 0, depth, ret, cache_type;
unsigned int allocated = 0;
struct ext4_allocation_request ar;
loff_t disksize;
__clear_bit(BH_New, &bh_result->b_state);
ext_debug("blocks %u/%lu requested for inode %u\n",
ext_debug("blocks %u/%u requested for inode %u\n",
iblock, max_blocks, inode->i_ino);
/* check in cache */
goal = ext4_ext_in_cache(inode, iblock, &newex);
if (goal) {
if (goal == EXT4_EXT_CACHE_GAP) {
cache_type = ext4_ext_in_cache(inode, iblock, &newex);
if (cache_type) {
if (cache_type == EXT4_EXT_CACHE_GAP) {
if (!create) {
/*
* block isn't allocated yet and
......@@ -2706,7 +2703,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
goto out2;
}
/* we should allocate requested block */
} else if (goal == EXT4_EXT_CACHE_EXTENT) {
} else if (cache_type == EXT4_EXT_CACHE_EXTENT) {
/* block is already allocated */
newblock = iblock
- le32_to_cpu(newex.ee_block)
......@@ -2854,7 +2851,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
if (!newblock)
goto out2;
ext_debug("allocate new block: goal %llu, found %llu/%lu\n",
goal, newblock, allocated);
ar.goal, newblock, allocated);
/* try to insert new extent into found leaf and return */
ext4_ext_store_pblock(&newex, newblock);
......@@ -2950,7 +2947,7 @@ void ext4_ext_truncate(struct inode *inode)
* transaction synchronous.
*/
if (IS_SYNC(inode))
handle->h_sync = 1;
ext4_handle_sync(handle);
out_stop:
up_write(&EXT4_I(inode)->i_data_sem);
......@@ -3004,7 +3001,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
handle_t *handle;
ext4_lblk_t block;
loff_t new_size;
unsigned long max_blocks;
unsigned int max_blocks;
int ret = 0;
int ret2 = 0;
int retries = 0;
......@@ -3083,7 +3080,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
/*
* Callback function called for each extent to gather FIEMAP information.
*/
int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
struct ext4_ext_cache *newex, struct ext4_extent *ex,
void *data)
{
......@@ -3152,7 +3149,8 @@ int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
/* fiemap flags we can handle specified here */
#define EXT4_FIEMAP_FLAGS (FIEMAP_FLAG_SYNC|FIEMAP_FLAG_XATTR)
int ext4_xattr_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo)
static int ext4_xattr_fiemap(struct inode *inode,
struct fiemap_extent_info *fieinfo)
{
__u64 physical = 0;
__u64 length;
......
......@@ -140,9 +140,6 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
return 0;
}
extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
const struct file_operations ext4_file_operations = {
.llseek = generic_file_llseek,
.read = do_sync_read,
......
......@@ -35,23 +35,71 @@ static void TEA_transform(__u32 buf[4], __u32 const in[])
/* The old legacy hash */
static __u32 dx_hack_hash(const char *name, int len)
static __u32 dx_hack_hash_unsigned(const char *name, int len)
{
__u32 hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
const unsigned char *ucp = (const unsigned char *) name;
while (len--) {
hash = hash1 + (hash0 ^ (((int) *ucp++) * 7152373));
if (hash & 0x80000000)
hash -= 0x7fffffff;
hash1 = hash0;
hash0 = hash;
}
return hash0 << 1;
}
static __u32 dx_hack_hash_signed(const char *name, int len)
{
__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
const signed char *scp = (const signed char *) name;
while (len--) {
__u32 hash = hash1 + (hash0 ^ (*name++ * 7152373));
hash = hash1 + (hash0 ^ (((int) *scp++) * 7152373));
if (hash & 0x80000000) hash -= 0x7fffffff;
if (hash & 0x80000000)
hash -= 0x7fffffff;
hash1 = hash0;
hash0 = hash;
}
return (hash0 << 1);
return hash0 << 1;
}
static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
{
__u32 pad, val;
int i;
const signed char *scp = (const signed char *) msg;
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
val = pad;
if (len > num*4)
len = num * 4;
for (i = 0; i < len; i++) {
if ((i % 4) == 0)
val = pad;
val = ((int) scp[i]) + (val << 8);
if ((i % 4) == 3) {
*buf++ = val;
val = pad;
num--;
}
}
if (--num >= 0)
*buf++ = val;
while (--num >= 0)
*buf++ = pad;
}
static void str2hashbuf(const char *msg, int len, __u32 *buf, int num)
static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
{
__u32 pad, val;
int i;
const unsigned char *ucp = (const unsigned char *) msg;
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
......@@ -62,7 +110,7 @@ static void str2hashbuf(const char *msg, int len, __u32 *buf, int num)
for (i = 0; i < len; i++) {
if ((i % 4) == 0)
val = pad;
val = msg[i] + (val << 8);
val = ((int) ucp[i]) + (val << 8);
if ((i % 4) == 3) {
*buf++ = val;
val = pad;
......@@ -95,6 +143,8 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
const char *p;
int i;
__u32 in[8], buf[4];
void (*str2hashbuf)(const char *, int, __u32 *, int) =
str2hashbuf_signed;
/* Initialize the default seed for the hash checksum functions */
buf[0] = 0x67452301;
......@@ -113,13 +163,18 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
}
switch (hinfo->hash_version) {
case DX_HASH_LEGACY_UNSIGNED:
hash = dx_hack_hash_unsigned(name, len);
break;
case DX_HASH_LEGACY:
hash = dx_hack_hash(name, len);
hash = dx_hack_hash_signed(name, len);
break;
case DX_HASH_HALF_MD4_UNSIGNED:
str2hashbuf = str2hashbuf_unsigned;
case DX_HASH_HALF_MD4:
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 8);
(*str2hashbuf)(p, len, in, 8);
half_md4_transform(buf, in);
len -= 32;
p += 32;
......@@ -127,10 +182,12 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
minor_hash = buf[2];
hash = buf[1];
break;
case DX_HASH_TEA_UNSIGNED:
str2hashbuf = str2hashbuf_unsigned;
case DX_HASH_TEA:
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 4);
(*str2hashbuf)(p, len, in, 4);
TEA_transform(buf, in);
len -= 16;
p += 16;
......
This diff is collapsed.
This diff is collapsed.
......@@ -99,7 +99,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
goto flags_out;
}
if (IS_SYNC(inode))
handle->h_sync = 1;
ext4_handle_sync(handle);
err = ext4_reserve_inode_write(handle, inode, &iloc);
if (err)
goto flags_err;
......
This diff is collapsed.
......@@ -20,6 +20,7 @@
#include <linux/version.h>
#include <linux/blkdev.h>
#include <linux/marker.h>
#include <linux/mutex.h>
#include "ext4_jbd2.h"
#include "ext4.h"
#include "group.h"
......@@ -98,9 +99,6 @@
*/
#define MB_DEFAULT_GROUP_PREALLOC 512
static struct kmem_cache *ext4_pspace_cachep;
static struct kmem_cache *ext4_ac_cachep;
static struct kmem_cache *ext4_free_ext_cachep;
struct ext4_free_data {
/* this links the free block information from group_info */
......@@ -120,26 +118,6 @@ struct ext4_free_data {
tid_t t_tid;
};
struct ext4_group_info {
unsigned long bb_state;
struct rb_root bb_free_root;
unsigned short bb_first_free;
unsigned short bb_free;
unsigned short bb_fragments;
struct list_head bb_prealloc_list;
#ifdef DOUBLE_CHECK
void *bb_bitmap;
#endif
unsigned short bb_counters[];
};
#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
#define EXT4_GROUP_INFO_LOCKED_BIT 1
#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
struct ext4_prealloc_space {
struct list_head pa_inode_list;
struct list_head pa_group_list;
......@@ -217,6 +195,11 @@ struct ext4_allocation_context {
__u8 ac_op; /* operation, for history only */
struct page *ac_bitmap_page;
struct page *ac_buddy_page;
/*
* pointer to the held semaphore upon successful
* block allocation
*/
struct rw_semaphore *alloc_semp;
struct ext4_prealloc_space *ac_pa;
struct ext4_locality_group *ac_lg;
};
......@@ -250,6 +233,7 @@ struct ext4_buddy {
struct super_block *bd_sb;
__u16 bd_blkbits;
ext4_group_t bd_group;
struct rw_semaphore *alloc_semp;
};
#define EXT4_MB_BITMAP(e4b) ((e4b)->bd_bitmap)
#define EXT4_MB_BUDDY(e4b) ((e4b)->bd_buddy)
......@@ -259,51 +243,12 @@ static inline void ext4_mb_store_history(struct ext4_allocation_context *ac)
{
return;
}
#else
static void ext4_mb_store_history(struct ext4_allocation_context *ac);
#endif
#define in_range(b, first, len) ((b) >= (first) && (b) <= (first) + (len) - 1)
struct buffer_head *read_block_bitmap(struct super_block *, ext4_group_t);
static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
ext4_group_t group);
static void ext4_mb_return_to_preallocation(struct inode *inode,
struct ext4_buddy *e4b, sector_t block,
int count);
static void ext4_mb_put_pa(struct ext4_allocation_context *,
struct super_block *, struct ext4_prealloc_space *pa);
static int ext4_mb_init_per_dev_proc(struct super_block *sb);
static int ext4_mb_destroy_per_dev_proc(struct super_block *sb);
static void release_blocks_on_commit(journal_t *journal, transaction_t *txn);
static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
{
struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
bit_spin_lock(EXT4_GROUP_INFO_LOCKED_BIT, &(grinfo->bb_state));
}
static inline void ext4_unlock_group(struct super_block *sb,
ext4_group_t group)
{
struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
bit_spin_unlock(EXT4_GROUP_INFO_LOCKED_BIT, &(grinfo->bb_state));
}
static inline int ext4_is_group_locked(struct super_block *sb,
ext4_group_t group)
{
struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
return bit_spin_is_locked(EXT4_GROUP_INFO_LOCKED_BIT,
&(grinfo->bb_state));
}
static ext4_fsblk_t ext4_grp_offs_to_block(struct super_block *sb,
static inline ext4_fsblk_t ext4_grp_offs_to_block(struct super_block *sb,
struct ext4_free_extent *fex)
{
ext4_fsblk_t block;
......
......@@ -59,7 +59,8 @@ static int finish_range(handle_t *handle, struct inode *inode,
/*
* Make sure the credit we accumalated is not really high
*/
if (needed && handle->h_buffer_credits >= EXT4_RESERVE_TRANS_BLOCKS) {
if (needed && ext4_handle_has_enough_credits(handle,
EXT4_RESERVE_TRANS_BLOCKS)) {
retval = ext4_journal_restart(handle, needed);
if (retval)
goto err_out;
......@@ -229,7 +230,7 @@ static int extend_credit_for_blkdel(handle_t *handle, struct inode *inode)
{
int retval = 0, needed;
if (handle->h_buffer_credits > EXT4_RESERVE_TRANS_BLOCKS)
if (ext4_handle_has_enough_credits(handle, EXT4_RESERVE_TRANS_BLOCKS+1))
return 0;
/*
* We are freeing a blocks. During this we touch
......@@ -458,13 +459,13 @@ int ext4_ext_migrate(struct inode *inode)
struct list_blocks_struct lb;
unsigned long max_entries;
if (!test_opt(inode->i_sb, EXTENTS))
/*
* if mounted with noextents we don't allow the migrate
* If the filesystem does not support extents, or the inode
* already is extent-based, error out.
*/
return -EINVAL;
if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
if (!EXT4_HAS_INCOMPAT_FEATURE(inode->i_sb,
EXT4_FEATURE_INCOMPAT_EXTENTS) ||
(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
return -EINVAL;
if (S_ISLNK(inode->i_mode) && inode->i_blocks == 0)
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -457,7 +457,7 @@ static void ext4_xattr_update_super_block(handle_t *handle,
if (ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh) == 0) {
EXT4_SET_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_EXT_ATTR);
sb->s_dirt = 1;
ext4_journal_dirty_metadata(handle, EXT4_SB(sb)->s_sbh);
ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh);
}
}
......@@ -487,9 +487,9 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
ext4_forget(handle, 1, inode, bh, bh->b_blocknr);
} else {
le32_add_cpu(&BHDR(bh)->h_refcount, -1);
error = ext4_journal_dirty_metadata(handle, bh);
error = ext4_handle_dirty_metadata(handle, inode, bh);
if (IS_SYNC(inode))
handle->h_sync = 1;
ext4_handle_sync(handle);
DQUOT_FREE_BLOCK(inode, 1);
ea_bdebug(bh, "refcount now=%d; releasing",
le32_to_cpu(BHDR(bh)->h_refcount));
......@@ -724,7 +724,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
if (error == -EIO)
goto bad_block;
if (!error)
error = ext4_journal_dirty_metadata(handle,
error = ext4_handle_dirty_metadata(handle,
inode,
bs->bh);
if (error)
goto cleanup;
......@@ -794,7 +795,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
ea_bdebug(new_bh, "reusing; refcount now=%d",
le32_to_cpu(BHDR(new_bh)->h_refcount));
unlock_buffer(new_bh);
error = ext4_journal_dirty_metadata(handle,
error = ext4_handle_dirty_metadata(handle,
inode,
new_bh);
if (error)
goto cleanup_dquot;
......@@ -810,8 +812,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
/* We need to allocate a new block */
ext4_fsblk_t goal = ext4_group_first_block_no(sb,
EXT4_I(inode)->i_block_group);
ext4_fsblk_t block = ext4_new_meta_block(handle, inode,
goal, &error);
ext4_fsblk_t block = ext4_new_meta_blocks(handle, inode,
goal, NULL, &error);
if (error)
goto cleanup;
ea_idebug(inode, "creating block %d", block);
......@@ -833,7 +835,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
set_buffer_uptodate(new_bh);
unlock_buffer(new_bh);
ext4_xattr_cache_insert(new_bh);
error = ext4_journal_dirty_metadata(handle, new_bh);
error = ext4_handle_dirty_metadata(handle,
inode, new_bh);
if (error)
goto cleanup;
}
......@@ -1040,7 +1043,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
*/
is.iloc.bh = NULL;
if (IS_SYNC(inode))
handle->h_sync = 1;
ext4_handle_sync(handle);
}
cleanup:
......
......@@ -27,7 +27,7 @@
#include <linux/security.h>
#include <linux/pid_namespace.h>
static int set_task_ioprio(struct task_struct *task, int ioprio)
int set_task_ioprio(struct task_struct *task, int ioprio)
{
int err;
struct io_context *ioc;
......@@ -70,6 +70,7 @@ static int set_task_ioprio(struct task_struct *task, int ioprio)
task_unlock(task);
return err;
}
EXPORT_SYMBOL_GPL(set_task_ioprio);
asmlinkage long sys_ioprio_set(int which, int who, int ioprio)
{
......
......@@ -249,16 +249,14 @@ static int __wait_cp_io(journal_t *journal, transaction_t *transaction)
return ret;
}
#define NR_BATCH 64
static void
__flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
__flush_batch(journal_t *journal, int *batch_count)
{
int i;
ll_rw_block(SWRITE, *batch_count, bhs);
ll_rw_block(SWRITE, *batch_count, journal->j_chkpt_bhs);
for (i = 0; i < *batch_count; i++) {
struct buffer_head *bh = bhs[i];
struct buffer_head *bh = journal->j_chkpt_bhs[i];
clear_buffer_jwrite(bh);
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
......@@ -277,8 +275,7 @@ __flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
* Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
*/
static int __process_buffer(journal_t *journal, struct journal_head *jh,
struct buffer_head **bhs, int *batch_count,
transaction_t *transaction)
int *batch_count, transaction_t *transaction)
{
struct buffer_head *bh = jh2bh(jh);
int ret = 0;
......@@ -325,14 +322,14 @@ static int __process_buffer(journal_t *journal, struct journal_head *jh,
get_bh(bh);
J_ASSERT_BH(bh, !buffer_jwrite(bh));
set_buffer_jwrite(bh);
bhs[*batch_count] = bh;
journal->j_chkpt_bhs[*batch_count] = bh;
__buffer_relink_io(jh);
jbd_unlock_bh_state(bh);
transaction->t_chp_stats.cs_written++;
(*batch_count)++;
if (*batch_count == NR_BATCH) {
if (*batch_count == JBD2_NR_BATCH) {
spin_unlock(&journal->j_list_lock);
__flush_batch(journal, bhs, batch_count);
__flush_batch(journal, batch_count);
ret = 1;
}
}
......@@ -388,7 +385,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
if (journal->j_checkpoint_transactions == transaction &&
transaction->t_tid == this_tid) {
int batch_count = 0;
struct buffer_head *bhs[NR_BATCH];
struct journal_head *jh;
int retry = 0, err;
......@@ -402,7 +398,7 @@ int jbd2_log_do_checkpoint(journal_t *journal)
retry = 1;
break;
}
retry = __process_buffer(journal, jh, bhs, &batch_count,
retry = __process_buffer(journal, jh, &batch_count,
transaction);
if (retry < 0 && !result)
result = retry;
......@@ -419,7 +415,7 @@ int jbd2_log_do_checkpoint(journal_t *journal)
spin_unlock(&journal->j_list_lock);
retry = 1;
}
__flush_batch(journal, bhs, &batch_count);
__flush_batch(journal, &batch_count);
}
if (retry) {
......@@ -686,6 +682,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
safely remove this transaction from the log */
__jbd2_journal_drop_transaction(journal, transaction);
kfree(transaction);
/* Just in case anybody was waiting for more transactions to be
checkpointed... */
......@@ -760,5 +757,4 @@ void __jbd2_journal_drop_transaction(journal_t *journal, transaction_t *transact
J_ASSERT(journal->j_running_transaction != transaction);
jbd_debug(1, "Dropping transaction %d, all done\n", transaction->t_tid);
kfree(transaction);
}
This diff is collapsed.
This diff is collapsed.
......@@ -25,6 +25,7 @@
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/hrtimer.h>
static void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh);
......@@ -48,6 +49,7 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
{
transaction->t_journal = journal;
transaction->t_state = T_RUNNING;
transaction->t_start_time = ktime_get();
transaction->t_tid = journal->j_transaction_sequence++;
transaction->t_expires = jiffies + journal->j_commit_interval;
spin_lock_init(&transaction->t_handle_lock);
......@@ -1240,7 +1242,7 @@ int jbd2_journal_stop(handle_t *handle)
{
transaction_t *transaction = handle->h_transaction;
journal_t *journal = transaction->t_journal;
int old_handle_count, err;
int err;
pid_t pid;
J_ASSERT(journal_current_handle() == handle);
......@@ -1263,24 +1265,54 @@ int jbd2_journal_stop(handle_t *handle)
/*
* Implement synchronous transaction batching. If the handle
* was synchronous, don't force a commit immediately. Let's
* yield and let another thread piggyback onto this transaction.
* Keep doing that while new threads continue to arrive.
* It doesn't cost much - we're about to run a commit and sleep
* on IO anyway. Speeds up many-threaded, many-dir operations
* by 30x or more...
*
* But don't do this if this process was the most recent one to
* perform a synchronous write. We do this to detect the case where a
* single process is doing a stream of sync writes. No point in waiting
* for joiners in that case.
* yield and let another thread piggyback onto this
* transaction. Keep doing that while new threads continue to
* arrive. It doesn't cost much - we're about to run a commit
* and sleep on IO anyway. Speeds up many-threaded, many-dir
* operations by 30x or more...
*
* We try and optimize the sleep time against what the
* underlying disk can do, instead of having a static sleep
* time. This is useful for the case where our storage is so
* fast that it is more optimal to go ahead and force a flush
* and wait for the transaction to be committed than it is to
* wait for an arbitrary amount of time for new writers to
* join the transaction. We achieve this by measuring how
* long it takes to commit a transaction, and compare it with
* how long this transaction has been running, and if run time
* < commit time then we sleep for the delta and commit. This
* greatly helps super fast disks that would see slowdowns as
* more threads started doing fsyncs.
*
* But don't do this if this process was the most recent one
* to perform a synchronous write. We do this to detect the
* case where a single process is doing a stream of sync
* writes. No point in waiting for joiners in that case.
*/
pid = current->pid;
if (handle->h_sync && journal->j_last_sync_writer != pid) {
u64 commit_time, trans_time;
journal->j_last_sync_writer = pid;
do {
old_handle_count = transaction->t_handle_count;
schedule_timeout_uninterruptible(1);
} while (old_handle_count != transaction->t_handle_count);
spin_lock(&journal->j_state_lock);
commit_time = journal->j_average_commit_time;
spin_unlock(&journal->j_state_lock);
trans_time = ktime_to_ns(ktime_sub(ktime_get(),
transaction->t_start_time));
commit_time = max_t(u64, commit_time,
1000*journal->j_min_batch_time);
commit_time = min_t(u64, commit_time,
1000*journal->j_max_batch_time);
if (trans_time < commit_time) {
ktime_t expires = ktime_add_ns(ktime_get(),
commit_time);
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
}
}
current->journal_info = NULL;
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment