- 18 Jun, 2004 4 commits
-
-
Chris Mason authored
From: <mason@suse.com>
From: <jeffm@suse.com>

The current reiserfs allocator pretty much allocates things sequentially from the start of the disk. It works very nicely for desktop loads, but once you've got more than one proc doing io, data files can fragment badly.

One obvious solution is something like ext2's bitmap groups, which put file data into different areas of the disk based on which subdirectory they are in. The problem with bitmap groups is that if you've got a group of subdirectories, their contents will be spread out all over the disk, leading to lots of seeks during a sequential read.

This allocator patch uses the packing locality to determine which bitmap group to allocate from, but when you create a file it looks in the bitmaps to see how 'full' that packing locality already is. If it hasn't been heavily used yet, the packing locality is inherited from the parent directory, putting files in new subdirs close to the parent subdir. Otherwise it is the inode number of the parent directory, putting new files far away from the parent subdir.

The end result is fewer bitmap groups for the same working set. For example, one test data set created by 20 procs running in parallel has 6822 subdirs. With vanilla reiserfs that would mean 6822 packing localities. This patch turns that into 26 packing localities.

This makes sequential reads of big directory trees more efficient, but it also makes the btree more efficient in general. Things end up sorted better because groups of subdirs end up with similar keys in the btree, instead of being spread out all over.

The bitmap grouping code tries to use the start of each bitmap group for metadata, and offsets the data slightly. The data and metadata are still close together, but not completely intermixed like they are in the default allocator. The end result is that leaf nodes tend to be close to each other, making metadata readahead more effective.

The old block allocator had the ability to enforce a minimum allocation size, but did not use it. It now tries to do a pass looking for larger allocation chunks before falling back to the old behaviour of taking any blocks it can find.

The patch changes the defaults to:

    mount -o alloc=skip_busy:dirid_groups:packing_groups

You can get back the old behaviour with:

    mount -o alloc=skip_busy

The individual options:

    mount -o alloc=dirid_groups turns on the bitmap groups
    mount -o alloc=packing_groups turns on the packing locality reduction code
    mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and skip_busy

Finally, the patch adds a mount -o alloc=oid_groups, which puts files into bitmap groups based on a hash of their objectid. This would be used for databases or other situations where you have a limited number of very large files.

This command will tell you how many packing localities are actually in use:

    debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
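The packing locality choice described in the reiserfs allocator commit above can be illustrated with a small user-space sketch. This is not the kernel code: the function name, its parameters, and the 50% "heavily used" threshold are all hypothetical, chosen only to show the inherit-or-start-new decision the message describes.

```python
def choose_packing_locality(parent_locality, parent_objectid,
                            used_blocks, group_blocks, full_fraction=0.5):
    """Pick the packing locality for a newly created object.

    Assumption: "heavily used" is modelled as a simple fill fraction of the
    bitmap group; the commit message does not state the real criterion.
    """
    if group_blocks <= 0:
        raise ValueError("group_blocks must be positive")
    heavily_used = used_blocks / group_blocks >= full_fraction
    if heavily_used:
        # Start a new locality keyed by the parent directory's inode number,
        # which places new files far away from the parent subdir.
        return parent_objectid
    # Inherit the parent's locality, keeping new subdirs close to the parent.
    return parent_locality
```

With a lightly used group the parent's locality is inherited, which is how many subdirectories can end up sharing a small number of packing localities.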
-
Russell King authored
This patch cleans up needless includes of asm/pgalloc.h from the fs/, kernel/ and mm/ subtrees. It has been compile tested on multiple ARM platforms and on x86, and appears safe.

This patch is part of a larger patch aiming towards getting the include of asm/pgtable.h out of linux/mm.h, so that asm/pgtable.h can sanely get at things like mm_struct and friends.

I suggest testing in -mm for a while to ensure there aren't any hidden arch issues.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Yoav Zach authored
This patch allows misc binaries to run with credentials and a security token that are calculated according to the binary, and not according to the interpreter, which is the legacy behavior of binfmt_misc.

This is done by calling prepare_binprm, which is where these attributes are calculated, before switching the 'file' field in the bprm from the binary to the interpreter.

This feature should be used with care, since the interpreter will have root permissions when running a setuid binary owned by root.

Please note:

- Only root can register an interpreter with binfmt_misc. The feature is documented and the administrator is advised to handle it with care.
- The new feature is enabled only with a special flag in the registration string. When this flag is not specified, the current behavior of binfmt_misc is kept.
- This is the only 'right' way for an interpreter to know the correct AT_SECURE value for the interpreted binary.

From: Chris Wright <chrisw@osdl.org>

This patchset looks OK, except for one problem. It installs the fd (which could've been unreadable) without unsharing the ->files. So someone can use this to read unreadable yet executable files. Here's a patch which fixes that up. I added one bit that's commented out because I'm not positive if a final steal_locks() is needed. I did a fair amount of rearranging to simplify the error conditions relative to the fd_install() and unshare_files().

From: Chris Wright <chrisw@osdl.org>

I found that the intel patchset (and mine as well) leaked i_writecount on the original executed file. In addition, I verified that the steal_locks() bit is indeed needed.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Yoav Zach authored
<background>

I work in a group that works on enabling the IA-32 Execution Layer (http://www.intel.com/pressroom/archive/releases/20040113comp.htm) on Linux. In a few words, this is a dynamic translator for IA-32 binaries on the IPF platform. Following David Mosberger's advice, we use the binfmt_misc mechanism to invoke the translator whenever the user tries to exec an IA-32 binary.

The EL is meant to help in the migration path from IA-32 to IPF. From our beta customers we learnt that at the first stage they tend to keep their environment mostly intact, using the legacy IA-32 binaries. Such an environment naturally has setuid and non-readable binaries. It would be useless to ask the administrator to change the settings of such an environment: some of them are very complex, and the administrators are reluctant to make any changes in a system that has already proved itself to be robust and secure.

So, our target with these patches is not to enhance the support for scripts but rather to allow a translator to be integrated into a working environment that is not (and should not be) aware of the fact that it's being emulated.

As I said before, it is practically hopeless to expect an administrator of such a system to change it so that it will suit the current behavior of binfmt_misc. But even if we could do that, I'm not sure it would be a good idea, since these changes are likely to be less secure than the suggested patches:

- In order to execute non-readable binaries, the binary would have to be made readable, which is obviously less secure than allowing only a trusted translator to read it.
- There would be no way for the translator to calculate the accurate AT_SECURE value for the translated process. This might end up with the translated process running in a non-secured mode when it actually needs to be secured.

</background>

I prepared a patch that solves a couple of problems that interpreters have when invoked via binfmt_misc.

Currently:

1) such interpreters cannot open non-readable binaries
2) the processes will have their credentials and security attributes calculated according to interpreter permissions and not those of the original binary

The proposed patch solves these problems by:

1) opening the binary on behalf of the interpreter and passing its fd instead of the path as argv[1] to the interpreter
2) calling prepare_binprm with the file struct of the binary and not the one of the interpreter

The new functionality is enabled by adding a special flag to the registration string. If this flag is not added, the old behavior is not changed.

A preliminary version of this patch was sent to the list on 9/1/2003 with the title "[PATCH]: non-readable binaries - binfmt_misc 2.6.0-test4". This new version fixes the concerns that were raised by the patch, except for calling unshare_files() before allocating a new fd. This is because this feature did not enter 2.6 yet.

Arun Sharma <arun.sharma@intel.com> says:

We were going through an internal review of this patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=107424598901720&w=2

which is in your tree already. I'm not sure if this line of code got sufficient review.

    + /* call prepare_binprm before switching to interpreter's file
    +  * so that all security calculation will be done according to
    +  * binary and not interpreter */
    + retval = prepare_binprm(bprm);

The case that concerns me is: unprivileged interpreter and a privileged binary. One can use binfmt_misc to execute untrusted code (interpreter) with elevated privileges. One could argue that all binfmt_misc interpreters are trusted, because only root can register them. But that's a change from the traditional behavior of binfmt_misc (and binfmt_script).

(Update): Arun pointed out that calculating the process credentials according to the binary that needs to be translated is a bit risky, since it requires the administrator to pay extra attention not to register an interpreter which is not intended to run with root credentials.

After discussing this issue with him, I would like to propose a modified patch. The old patch did 2 things: 1) open the binary for reading and 2) calculate the credentials according to the binary.

I removed the riskier part of changing the credentials calculation, so the revised patch only opens the binary for reading. It also includes a few words of warning in the description of the 'open-binary' feature in binfmt_misc.txt, and makes the function entry_status print the flags in use.

As for the 'credentials' part of the patch, I will prepare a separate patch for it and send it again to the LKML, describe the problem and ask for people's comments.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 15 Jun, 2004 1 commit
-
-
Steve French authored
Fix i_size corruption in case of overlapped readdir changing cached file size and local cached write extending file
-
- 14 Jun, 2004 3 commits
-
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
Chris Wedgwood authored
Some filesystems can get overflows when their link-count exceeds 65534. This patch increases the kernel's internal resolution for this and also adds a check so that the old system call paths return an error (-EOVERFLOW) as required (as suggested by Al Viro).

Signed-off-by: Chris Wedgwood <cw@f00f.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
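The -EOVERFLOW check mentioned in the link-count commit above can be sketched as a round-trip test: store the wide count in the narrow legacy field and fail if it does not survive. This is a hedged user-space illustration; the function name and the 16-bit field width are assumptions, and the commit message does not say exactly where the kernel performs the check.

```python
import errno

def fill_legacy_nlink(nlink, field_bits=16):
    """Return the link count as the legacy interface would report it.

    Assumption: the old stat structure stores nlink in an unsigned field of
    `field_bits` bits. A count that does not round-trip through that field
    is reported as a negative errno, kernel style.
    """
    mask = (1 << field_bits) - 1
    truncated = nlink & mask
    if truncated != nlink:
        # The value would be silently truncated, so report an error instead.
        return -errno.EOVERFLOW
    return truncated
```

Callers on the newer, wider interface would skip this check and see the full count.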
-
Andrew Morton authored
From: Nick Piggin <nickpiggin@yahoo.com.au>

nfs_writepage() refuses to write back mapped pages at all on the page reclaim path, causing systems to get locked up when there's a lot of dirty mmapped data around.

The patch changes NFS so that it will start I/O against these pages. The code as it stands is designed to defer writeout to pdflush which can do larger, more efficient I/Os. But there shouldn't be much traffic by this path, and going slow is better than not going at all.

Patch originally from Trond.

Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 13 Jun, 2004 3 commits
-
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
- 12 Jun, 2004 4 commits
-
-
Andrew Morton authored
From: Davide Libenzi <davidel@xmailserver.org>

This is a sanity check on the size parameter. Nothing explodes without it, but the conversion to unsigned simply triggers a big allocation.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
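To illustrate why a missing sanity check on a signed size leads to a big allocation, here is a minimal sketch of the signed-to-unsigned conversion. The function names are hypothetical, and both the 32-bit width and the choice of -EINVAL for rejected input are assumptions; the commit message names neither the function nor the error code.

```python
import errno

def to_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned, as a C cast would."""
    return value % (1 << 32)

def checked_alloc_size(size):
    """Return the number of entries to allocate, or a negative errno.

    Assumption: negative sizes are rejected with -EINVAL before any
    conversion to unsigned takes place.
    """
    if size < 0:
        return -errno.EINVAL
    return to_unsigned32(size)
```

Without the check, a caller passing -1 would request `to_unsigned32(-1)` entries, which is the largest value the unsigned type can hold.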
-
Andrew Morton authored
Reduce stack consumption in sync_inodes_sb() via read_page_state().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
- 11 Jun, 2004 3 commits
-
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
- 10 Jun, 2004 2 commits
-
-
Steve French authored
Signed-off-by: Steve French (sfrench@us.ibm.com)
-
Anton Altaparmakov authored
- Modify fs/ntfs/ntfs_readdir() to copy the index root attribute value to a buffer so that we can put the search context and unmap the mft record before calling the filldir() callback. We need to do this because NFSd calls ->lookup() from its filldir() callback, which causes NTFS to deadlock: ntfs_lookup() maps the mft record of the directory, and since ntfs_readdir() already has it mapped, ntfs_lookup() deadlocks.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
-
- 09 Jun, 2004 5 commits
-
-
Dave Kleikamp authored
The current warning and/or trap when the btstack is overrun in dtSearch or dtReadFirst is not very helpful. Add code to detect the stack overrun earlier, print something useful, and return gracefully.

I've found that dbFree being called with blkno == 0 can lead to this error, so I put in a specific check for that.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
-
Andrew Morton authored
Randy Dunlap <rddunlap@osdl.org> points out that sparse warns about the test of an undefined preprocessor identifier.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
We need to take journal_lock_updates() while remounting r/o to prevent a new transaction starting while journal_flush() is running.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Chris Mason <mason@suse.com>

There's a small window where the filesystem can be unmounted during writeback_inodes. The end result is that the iput done by sync_sb_inodes could be done after the FS put_super and after the super has been removed from all lists.

The fix is to hold the s_umount sem during sync_sb_inodes to make sure the FS doesn't get unmounted.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Fix a problem discovered by Jeff Mahoney <jeffm@suse.com>, based on an initial patch from Chris Mason <mason@suse.com>.

journal_get_descriptor_buffer() is used to obtain a regular old buffer_head against the blockdev mapping. The caller will populate that bh by hand and will then submit it for writing.

But there are problems:

a) The function sets bh->b_state nonatomically. But this buffer is accessible to other CPUs via pagecache lookup.

b) The function sets the buffer dirty and then the caller populates it and then it is submitted for I/O. Wrong order: there's a window in which the VM could write the buffer before it is fully populated.

c) The function fails to set the buffer uptodate after zeroing it. And one caller forgot to mark it uptodate as well. So if the VM happens to decide to write the containing page back, __block_write_full_page() encounters a dirty, not uptodate buffer, which is an illegal state. This was generating buffer_error() warnings before we removed buffer_error().

Leaving the buffer not uptodate also means that a concurrent reader of /dev/hda1 could cause physical I/O against the buffer, scribbling on what we just put in it.

So journal_get_descriptor_buffer() is changed to mark the buffer uptodate, under the buffer lock.

I considered changing journal_get_descriptor_buffer() to return a locked buffer but there doesn't seem to be a need for this, and both callers end up using ll_rw_block() anyway, which requires that the buffer be unlocked again.

Note that the journal_get_descriptor_buffer() callers dirty these buffers with set_buffer_dirty(). That's a bit naughty, because it could create dirty buffers against a clean page, which is an illegal state. They really should use mark_buffer_dirty() to dirty the page and inode as well. But all callers will immediately write and clean the buffer anyway, so we can safely leave this optimising cheat in place.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 08 Jun, 2004 1 commit
-
-
Anton Altaparmakov authored
- Mark the volume dirty when (re)mounting read-write and mark it clean when unmounting or remounting read-only. If any volume errors are found, the volume is left marked dirty to force chkdsk to run.
- Add code to set the NT4 compatibility flag when (re)mounting read-write for newer NTFS versions, but leave it commented out for now since we do not make any modifications that are NTFS 1.2 specific yet and since setting this flag breaks Captive-NTFS, which is not nice. This code must be enabled once we start writing NTFS 1.2 specific changes, otherwise the Windows NTFS driver might crash / cause corruption.
- Fix a silly bug that caused a deadlock in ntfs_mft_writepage(). For inode 0, i.e. $MFT itself, we cannot use ilookup5() from there because the inode is already locked by the kernel (fs/fs-writeback.c::__sync_single_inode()), and ilookup5() waits until the inode is unlocked before returning it, and it never gets unlocked because ntfs_mft_writepage() never returns. )-: Fortunately, we have inode 0 pinned in icache for the duration of the mount so we can access it directly.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
-
- 07 Jun, 2004 6 commits
-
-
Steve French authored
Better handle a partial update of a cached page that is not uptodate, for the situation in which the file is open write-only.

Signed-off-by: Steve French <sfrench@us.ibm.com>
-
Steve French authored
Signed-off-by: Steve French <sfrench@us.ibm.com>
-
Anton Altaparmakov authored
information flags (fs/ntfs/super.c).

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
-
Steve French authored
Signed-off-by: Steve French <sfrench@us.ibm.com>
-
Steve French authored
Signed-off-by: Yury Umanets <torque@ukrpost.net>
Signed-off-by: Steve French <sfrench@us.ibm.com>
-
Steve French authored
-
- 05 Jun, 2004 8 commits
-
-
Andrew Morton authored
fs/nfs/direct.c: In function `nfs_file_direct_write':
fs/nfs/direct.c:549: warning: initialization discards qualifiers from pointer target type

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Rusty Russell authored
As pointed out by Paul Jackson <pj@sgi.com>, sometimes 99 chars is not enough. We currently get a page from sysfs: that code should check we haven't overrun it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Fix error return in create. (See comment in xdr for createtype4 at end of rfc3530.)

From: Andy Adamson <andros@citi.umich.edu>
From: "J. Bruce Fields" <bfields@fieldses.org>

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Fix a somewhat bizarre corner case in clid processing: a clientid match isn't required for case 3.

From: Andy Adamson <andros@citi.umich.edu>
From: "J. Bruce Fields" <bfields@fieldses.org>

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Fix oops in release_lockowner. We need to break out of two loops, not just one, and if the loop finds nothing, 'local' won't be NULL. So just put the body of the 'if' inside the loop.

From: Andy Adamson <andros@citi.umich.edu>
From: "J. Bruce Fields" <bfields@fieldses.org>

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
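The shape of the release_lockowner fix above (act on the match inside the inner loop, and leave both loops at once) can be sketched in user space. This is only an analogy: the bucket layout, the function name and the `release` callback are hypothetical and do not reflect the nfsd data structures.

```python
def release_matching(buckets, wanted, release):
    """Release the first owner equal to `wanted`; report whether one was found.

    The work happens inside the inner loop, so nothing after the loops has to
    guess whether the loop cursor refers to a real match. As in C list
    iteration, the cursor is not reset to "nothing" when the search fails.
    """
    for bucket in buckets:
        for owner in bucket:
            if owner == wanted:
                release(owner)
                # Returning leaves both loops, not just the inner one.
                return True
    return False
```

Testing the cursor after the loops, as the buggy pattern did, would act on whatever entry the iteration happened to stop at.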
-
Neil Brown authored
Encode names directly into xdr buffer; this optimizes out a data copy, reduces stack usage, and will make life simpler when doing acls.

From: "J. Bruce Fields" <bfields@fieldses.org>

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
There's a small typo in nfsd_acceptable. It calls

    err = permission(parent->d_inode, S_IXOTH, NULL);

It really wants to use MAY_EXEC instead of S_IXOTH. Those happen to be the same at the moment, but may not stay that way forever.

From: Olaf Kirch <okir@suse.de>
From: "J. Bruce Fields" <bfields@fieldses.org>

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
The "offset" in an entry in an nfs3 readdir response is 64 bits long, and as it has only a 32 bit alignment, it can fall half in one page of the response and half in another.

This patch adds a second offset pointer (offset1) which points to the second half in the unusual case of the offset being split between pages, and sets and uses it accordingly.

From: Olaf Kirch <okir@suse.de>

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
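The split-offset case in the nfs3 readdir commit above can be modelled in user space: a 64-bit value stored as two big-endian 32-bit words (as XDR encodes it), where the second word may land at the start of the next page. The page size, function name and page-list representation are assumptions made for this sketch, not the nfsd implementation.

```python
import struct

PAGE_SIZE = 4096  # assumed page size for the sketch

def write_offset(pages, pos, value):
    """Store a 64-bit `value` at byte position `pos` in a list of pages.

    `pos` must be 4-byte aligned, so each 32-bit half fits in one page, but
    the two halves may sit in different pages. Returns True when split.
    """
    if pos % 4 != 0:
        raise ValueError("pos must be 4-byte aligned")
    high = struct.pack(">I", (value >> 32) & 0xFFFFFFFF)
    low = struct.pack(">I", value & 0xFFFFFFFF)
    # The two positions play the role of the 'offset' and 'offset1' pointers.
    for word_pos, word in ((pos, high), (pos + 4, low)):
        page, off = divmod(word_pos, PAGE_SIZE)
        pages[page][off:off + 4] = word
    return pos // PAGE_SIZE != (pos + 4) // PAGE_SIZE
```

Writing at position `PAGE_SIZE - 4` puts the high word at the end of the first page and the low word at the start of the second, which is the unusual case the patch handles.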
-