- 06 May, 2010 8 commits
-
-
Mark Fasheh authored
I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters: localalloc= avg. fragmentation 8 48 32 16 64 10 120 7 On larger cluster sizes, the difference is more dramatic. The new default size top out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Mark Fasheh authored
This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as: ocfs2_local_alloc_size(sb) * 8 which is correct, in bits. Unfortunately though the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Mark Fasheh authored
Inodes are always allocated from the global bitmap now so we don't need this any more. Also, the existing implementation bounces reservations around needlessly. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
Otherwise, the need for a very large contiguous allocation tends to wreak havoc on many inode allocation reservations on the local alloc, thus ruining any chances for contiguousness. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
Use the reservations system for unindexed dir tree allocations. We don't bother with the indexed tree as reads from it are mostly random anyway. Directory reservations are marked seperately, to allow the reservations code a chance to optimize their window sizes. This patch allocates only 8 bits for directory windows as they generally are not expected to grow as quickly as file data. Future improvements to dir window sizing can trivially be made. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inodes window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 24 Mar, 2010 1 commit
-
-
Mark Fasheh authored
When the local alloc file changes windows, unused bits are freed back to the global bitmap. By defnition, those bits can not be in use by any file. Also, the local alloc will never have been able to allocate those bits if they were part of a previous truncate. Therefore it makes sense that we should clear unused local alloc bits in the undo buffer so that they can be used immediatly. [ Modified to call it ocfs2_release_clusters() -- Joel ] Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 19 Mar, 2010 2 commits
-
-
Tao Ma authored
You can't store a pointer that you haven't filled in yet and expect it to work. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Tao Ma authored
When replacing a xattr's value, in some case we wipe its name/value first and then re-add it. The wipe is done by ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or block. We currently adjust name_offset for all the entries which have (offset < name_offset). This does not adjust the entrie we're replacing. Since we are replacing the entry, we don't adjust the total entry count. When we calculate a new namevalue location, we trust the entries now-wrong offset in ocfs2_xa_get_free_start(). The solution is to also adjust the name_offset for the replaced entry, allowing ocfs2_xa_get_free_start() to calculate the new namevalue location correctly. The following script can trigger a kernel panic easily. echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE mount -t ocfs2 $DEVICE $MNT_DIR FILE=$MNT_DIR/$RANDOM for((i=0;i<76;i++)) do string_76="a$string_76" done string_78="aa$string_76" string_82="aaaa$string_78" touch $FILE setfattr -n 'user.test1234567890' -v $string_76 $FILE setfattr -n 'user.test1234567890' -v $string_78 $FILE setfattr -n 'user.test1234567890' -v $string_82 $FILE Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 18 Mar, 2010 1 commit
-
-
Mark Fasheh authored
What we were doing before was to ask for the current window size as the maximum allocation. This had the effect of limiting the amount of allocation we could get for the local alloc during times when the window size was shrunk due to fragmentation. In some cases, that could actually *increase* fragmentation by artificially limiting the number of bits we can accept. So while we still want to ask for a minimum number of bits equal to window size, there is no reason why we should limit the number of bits the local alloc should accept. Hence always allow the maximum number of local alloc bits. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 17 Mar, 2010 4 commits
-
-
Mark Fasheh authored
ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory inode, but never setting it on the disk copy. Thus, acls were some times not getting propagated between nodes. This patch fixes the issue by adding a helper function ocfs2_acl_set_mode() which does this the right way. ocfs2_set_acl() and ocfs2_init_acl() are then updated to call ocfs2_acl_set_mode(). Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Tao Ma authored
In reflink, we need to upate i_blocks for the target inode. Reported-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Tao Ma authored
In ocfs2_validate_gd_parent, we check bg_chain against the cl_next_free_rec of the dinode. Actually in resize, we have the chance of bg_chain == cl_next_free_rec. So add some additional condition check for it. I also rename paramter "clean_error" to "resize", since the old one is not clearly enough to indicate that we should only meet with this case in resize. btw, the correpsonding bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Sachin Prabhu authored
[PATCH] Skip check for mandatory locks when unlocking ocfs2_lock() will skip locks on file which has mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. ocfs2_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 15 Mar, 2010 21 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6Linus Torvalds authored
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (34 commits) ACPI: processor: push file static MADT pointer into internal map_madt_entry() ACPI: processor: refactor internal map_lsapic_id() ACPI: processor: refactor internal map_x2apic_id() ACPI: processor: refactor internal map_lapic_id() ACPI: processor: driver doesn't need to evaluate _PDC ACPI: processor: remove early _PDC optin quirks ACPI: processor: add internal processor_physically_present() ACPI: processor: move acpi_get_cpuid into processor_core.c ACPI: processor: export acpi_get_cpuid() ACPI: processor: mv processor_pdc.c processor_core.c ACPI: processor: mv processor_core.c processor_driver.c ACPI: plan to delete "acpi=ht" boot option ACPI: remove "acpi=ht" DMI blacklist PNPACPI: add bus number support PNPACPI: add window support resource: add window support resource: add bus number support resource: expand IORESOURCE_TYPE_BITS to make room for bus resource type acpiphp: Execute ACPI _REG method for hotadded devices ACPI video: Be more liberal in validating _BQC behaviour ...
-
Wolfram Sang authored
Commit 6992f533 ("sysfs: Use one lockdep class per sysfs attribute.") introduced this requirement. First, at25 was fixed manually. Then, other occurences were found with coccinelle and the following semantic patch. Results were reviewed and fixed up: @ init @ identifier struct_name, bin; @@ struct struct_name { ... struct bin_attribute bin; ... }; @ main extends init @ expression E; statement S; identifier name, err; @@ ( struct struct_name *name; | - struct struct_name *name = NULL; + struct struct_name *name; ) ... ( sysfs_bin_attr_init(&name->bin); | + sysfs_bin_attr_init(&name->bin); if (sysfs_create_bin_file(E, &name->bin)) S | + sysfs_bin_attr_init(&name->bin); err = sysfs_create_bin_file(E, &name->bin); ) Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Len Brown authored
Merge branches 'battery-2.6.34', 'bugzilla-10805', 'bugzilla-14668', 'bugzilla-531916-power-state', 'ht-warn-2.6.34', 'pnp', 'processor-rename', 'sony-2.6.34', 'suse-bugzilla-531547', 'tz-check', 'video' and 'misc-2.6.34' into release
-
Alex Chiang authored
There's no real need for a pointer to the MADT to be global. The only function who uses it is map_madt_entry. This allows us to remove some more ugly #ifdefs. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Un-nest the if statements for readability. Remove comments that re-state the obvious. Change the control flow so that we no longer need a temp variable. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Untangle the nested if conditions to make this function look more similar to the other map_*apic_id() functions. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Untangle the if() statement a little for readability. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Now that the early _PDC evaluation path knows how to correctly evaluate _PDC on only physically present processors, there's no need for the processor driver to evaluate it later when it loads. To cover the hotplug case, push _PDC evaluation down into the hotplug paths. Cc: x86@kernel.org Cc: Tony Luck <tony.luck@intel.com> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Now that we check for physically present processors before blindly evaluating _PDC, we no longer need to maintain a DMI opt-in table nor a kernel param. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Detect if a processor is physically present before evaluating _PDC. We want this because some BIOS will provide a _PDC even for processors that are not present. These bogus _PDC methods then attempt to load non-existent tables, which causes problems. Avoid those bogus landmines. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Enumerating processors (via MADT/_MAT) belongs in the processor core, which is always built-in, rather than living in the processor driver which may not be built. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
Rename static get_cpu_id() to acpi_get_cpuid() and export it. This change also gives us an opportunity to remove the #ifndef CONFIG_SMP from processor_driver.c and into a header file where it properly belongs. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
We've renamed the old processor_core.c to processor_driver.c, to convey the idea that it can be built modular and has driver-like bits. Now let's re-create a processor_core.c for the bits needed statically by the rest of the kernel. The contents of processor_pdc.c are a good starting spot, so let's just rename that file and complete our three card monte. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Alex Chiang authored
The ACPI processor driver can be built as a module. But it has pieces of code that should always be built statically into the kernel. The plan is for processor_core.c to contain the static bits while processor_driver.c contains the module-like bits. Since the bulk of the code in the current processor_core.c is module-like, first step is to rename the file to processor_driver.c Next step will re-create processor_core.c and cherry-pick out the static bits. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Len Brown authored
Signed-off-by: Len Brown <len.brown@intel.com>
-
Len Brown authored
SuSE added these entries when deploying ACPI in Linux-2.4. I pulled them into Linux-2.6 on 2003-08-09. Over the last 6+ years, several entries have proven to be unnecessary and deleted, while no new entries have been added. Matthew suggests that they now have negative value, and I agree. Based-on-patch-by: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Len Brown <len.brown@intel.com>
-
Bjorn Helgaas authored
Add support for bus number resources. This is for bridges with a range of bus numbers behind them. Previously, PNP ignored bus number resources. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Bjorn Helgaas authored
Add support for resource windows. This is for bridge resources, i.e., regions where a bridge forwards transactions from the primary to the secondary side. This does not add support for *setting* windows via the /proc interface. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Bjorn Helgaas authored
Add support for resource windows. This is for bridge resources, i.e., regions where a bridge forwards transactions from the primary to the secondary side. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Bjorn Helgaas authored
Add support for bus number resources. This is for bridges with a range of bus numbers behind them. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Bjorn Helgaas authored
No functional change; this just makes room for another resource type. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
- 14 Mar, 2010 3 commits
-
-
Dan Carpenter authored
The original code returns a freed pointer. This function is expected to return NULL on errors. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Morris <jmorris@namei.org>
-
Shaohua Li authored
Per ACPI spec, _ERG method should be executed before device driver gets control for hotpluged device. Firmware might do some configuration there. See http://bugzilla.kernel.org/show_bug.cgi?id=10805. In this machine, _REG method of docked device will configure cardbus bridge. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: Paul Martin <pm@debian.org> Signed-off-by: Len Brown <len.brown@intel.com>
-
Matthew Garrett authored
Right now, if _BQC returns a value we don't understand we immediately invalidate it. Change this behaviour so we only invalidate it if it continues to give an invalid answer after we've already set a brightness. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
-