1. 18 Jun, 2004 4 commits
    • [PATCH] reiserfs: block allocator optimizations · 734db689
      Chris Mason authored
      
      From: <mason@suse.com>
      From: <jeffm@suse.com>
      
      The current reiserfs allocator pretty much allocates things sequentially
      from the start of the disk.  It works very nicely for desktop loads, but
      once you've got more than one process doing I/O, data files can fragment
      badly.
      
      One obvious solution is something like ext2's bitmap groups, which puts
      file data into different areas of the disk based on which subdirectory
      they are in.  The problem with bitmap groups is that if you've got a
      group of subdirectories their contents will be spread out all over the
      disk, leading to lots of seeks during a sequential read.
      
      This allocator patch uses the packing locality to determine which bitmap
      group to allocate from, but when you create a file it looks in the bitmaps
      to see how 'full' that packing locality already is.  If it hasn't been
      heavily used yet, the packing locality is inherited from the parent
      directory, putting files in new subdirs close to the parent subdir;
      otherwise it is the inode number of the parent directory, putting new
      files far away from the parent subdir.
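
      The decision described above can be sketched roughly as follows.  This
      is a minimal illustration and not the code from the patch: the struct,
      the function names and the 50% threshold are all assumptions made up
      for the example.

```c
#include <stdint.h>

/* Hypothetical threshold (percent of the group in use) above which a
 * packing locality counts as "heavily used".  Not taken from the patch. */
#define LOCALITY_FULL_PERCENT 50

/* Hypothetical summary of the bitmap group a packing locality maps to. */
struct group_usage {
    uint32_t used_blocks;
    uint32_t total_blocks;
};

/* Returns nonzero when the group is at or above the threshold. */
static int locality_is_full(const struct group_usage *g)
{
    if (g->total_blocks == 0)
        return 1;
    return (uint64_t)g->used_blocks * 100 >=
           (uint64_t)g->total_blocks * LOCALITY_FULL_PERCENT;
}

/* Lightly used: inherit the parent's packing locality (stay close).
 * Heavily used: use the parent's inode number instead (move away). */
uint32_t choose_packing_locality(uint32_t parent_locality,
                                 uint32_t parent_inode,
                                 const struct group_usage *g)
{
    return locality_is_full(g) ? parent_inode : parent_locality;
}
```

      With a group that is 10% used the function returns parent_locality;
      with one that is 90% used it returns parent_inode.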
      
      The end result is fewer bitmap groups for the same working set.  For
      example, one test data set created by 20 procs running in parallel has
      6822 subdirs.  With vanilla reiserfs that would mean 6822
      packing localities.  This patch turns that into 26 packing localities.
      
      This makes sequential reads of big directory trees more efficient, but
      it also makes the btree more efficient in general.  Things end up sorted
      better because groups of subdirs end up with similar keys in the btree,
      instead of being spread out all over.
      
      The bitmap grouping code tries to use the start of each bitmap group
      for metadata, and offsets the data slightly.  The data and metadata
      are still close together, but not completely intermixed like they are
      in the default allocator.  The end result is that leaf nodes tend to be
      close to each other, making metadata readahead more effective.
      
      The old block allocator had the ability to enforce a minimum
      allocation size, but did not use it.  It now tries to do a pass looking
      for larger allocation chunks before falling back to the old behaviour
      of taking any blocks it can find.
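
      A two-pass search of this kind might look like the sketch below.  It
      uses a byte-per-block map instead of a real bitmap, and the function
      names are hypothetical; it only illustrates "try for a large chunk
      first, then fall back to any free block".

```c
#include <stddef.h>

/* Return the index of the first run of at least `want` free (zero)
 * entries in map[0..n), or -1 if there is no such run.  want must be >= 1. */
static long find_free_run(const unsigned char *map, size_t n, size_t want)
{
    size_t run = 0;

    for (size_t i = 0; i < n; i++) {
        run = map[i] ? 0 : run + 1;
        if (run == want)
            return (long)(i + 1 - want);
    }
    return -1;
}

/* First pass: look for a chunk of min_chunk blocks.
 * Second pass: fall back to taking any single free block. */
long alloc_start(const unsigned char *map, size_t n, size_t min_chunk)
{
    long start = -1;

    if (min_chunk > 1)
        start = find_free_run(map, n, min_chunk);
    if (start < 0)
        start = find_free_run(map, n, 1);
    return start;
}
```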
      
      The patch changes the defaults to:
      
      mount -o alloc=skip_busy:dirid_groups:packing_groups
      
      You can get back the old behaviour with mount -o alloc=skip_busy
      
      mount -o alloc=dirid_groups will turn on the bitmap groups
      mount -o alloc=packing_groups turns on the packing locality reduction code
      mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and
      skip_busy
      
      Finally the patch adds a mount -o alloc=oid_groups, which puts files into
      bitmap groups based on a hash of their objectid.  This would be used for
      databases or other situations where you have a limited number of very
      large files.
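
      The alloc= values above are colon-separated option names.  A parser
      for such a string could be sketched like this; the flag names and bit
      values are invented for the example and are not the ones used in the
      reiserfs source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, one per option named in the text. */
enum {
    ALLOC_SKIP_BUSY      = 1 << 0,
    ALLOC_DIRID_GROUPS   = 1 << 1,
    ALLOC_PACKING_GROUPS = 1 << 2,
    ALLOC_OID_GROUPS     = 1 << 3
};

/* Parse "name:name:..." into a flag mask.
 * Returns 0 on success, -1 on an unknown or empty option name. */
int parse_alloc_opts(const char *opts, unsigned *flags)
{
    static const struct { const char *name; unsigned bit; } table[] = {
        { "skip_busy",      ALLOC_SKIP_BUSY },
        { "dirid_groups",   ALLOC_DIRID_GROUPS },
        { "packing_groups", ALLOC_PACKING_GROUPS },
        { "oid_groups",     ALLOC_OID_GROUPS },
    };
    const size_t ntable = sizeof(table) / sizeof(table[0]);
    unsigned out = 0;

    while (*opts) {
        size_t len = strcspn(opts, ":");   /* length of this token */
        size_t i;

        for (i = 0; i < ntable; i++) {
            if (strlen(table[i].name) == len &&
                strncmp(opts, table[i].name, len) == 0) {
                out |= table[i].bit;
                break;
            }
        }
        if (i == ntable)
            return -1;
        opts += len;
        if (*opts == ':')
            opts++;
    }
    *flags = out;
    return 0;
}
```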
      
      This command will tell you how many packing localities are actually in
      use:
      
      debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Clean up asm/pgalloc.h include · 1c60f076
      Russell King authored
      
      This patch cleans up needless includes of asm/pgalloc.h from the fs/,
      kernel/ and mm/ subtrees.  It has been compile-tested on multiple ARM
      platforms and on x86, and appears safe.
      
      This patch is part of a larger patch aiming towards getting the include of
      asm/pgtable.h out of linux/mm.h, so that asm/pgtable.h can sanely get at
      things like mm_struct and friends.
      
      I suggest testing in -mm for a while to ensure there aren't any hidden arch
      issues.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] binfmt_misc: improve calculation of interpreter's credentials · c407c033
      Yoav Zach authored
      
      This patch allows misc binaries to run with credentials and a security
      token that are calculated according to the binaries, and not according to
      the interpreter, which is the legacy behavior of binfmt_misc.
      
      The way it is done is by calling prepare_binprm, which is where these
      attributes are calculated, before switching the 'file' field in the bprm from
      the binary to the interpreter.
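
      The effect of the ordering can be shown with a toy model.  Everything
      below (toy_file, toy_bprm, toy_prepare, exec_uid) is hypothetical and
      only mimics the idea of prepare_binprm reading attributes from
      whatever file the bprm currently points at; it is not kernel code.

```c
/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_file {
    unsigned owner_uid;
    int setuid;              /* nonzero if the setuid bit is set */
};

struct toy_bprm {
    const struct toy_file *file;
    unsigned e_uid;
};

/* Mimics the credential step: the effective uid comes from whichever
 * file the bprm points at when this is called. */
static void toy_prepare(struct toy_bprm *b, unsigned caller_uid)
{
    b->e_uid = b->file->setuid ? b->file->owner_uid : caller_uid;
}

/* creds_from_binary != 0: prepare first, then switch to the interpreter.
 * creds_from_binary == 0: switch first (the legacy order), then prepare. */
unsigned exec_uid(const struct toy_file *binary,
                  const struct toy_file *interp,
                  unsigned caller_uid, int creds_from_binary)
{
    struct toy_bprm b = { binary, caller_uid };

    if (creds_from_binary) {
        toy_prepare(&b, caller_uid);
        b.file = interp;
    } else {
        b.file = interp;
        toy_prepare(&b, caller_uid);
    }
    return b.e_uid;
}
```

      For a root-owned setuid binary and a plain interpreter, the first order
      yields uid 0 and the legacy order yields the caller's uid, which is
      exactly why the feature needs care.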
      
      This feature should be used with care, since the interpreter will have root
      permissions when running a setuid binary owned by root.
      
      Please note -
      
      - Only root can register an interpreter with binfmt_misc.  The feature is
        documented and the administrator is advised to handle it with care
      
      - The new feature is enabled only with a special flag in the registration
        string.  When this flag is not specified the current behavior of
        binfmt_misc is kept
      
      - This is the only 'right' way for an interpreter to know the correct
        AT_SECURE value for the interpreted binary
      
      
      From: Chris Wright <chrisw@osdl.org>
      
        This patchset looks OK, except for one problem.  It installs the fd (which
        could've been unreadable) without unsharing the ->files.  So someone can use
        this to read unreadable yet executable files.  Here's a patch which fixes
        that up.  I added one bit that's commented out because I'm not positive if a
        final steal_locks() is needed.
      
        I did a fair amount of rearranging to simplify the error conditions
        relative to the fd_install(), and unshare_files().
      
      From: Chris Wright <chrisw@osdl.org>
      
        I found that the intel patchset (and mine as well) leaked i_writecount on
        the original executed file.  In addition, I verified that the steal_locks()
        bit is indeed needed.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Handle non-readable binfmt_misc executables · 79baf43b
      Yoav Zach authored
      <background>
      
      I work in a group that works on enabling the IA-32 Execution Layer
      (http://www.intel.com/pressroom/archive/releases/20040113comp.htm) on Linux.
      In a few words - this is a dynamic translator for IA-32 binaries on IPF
      platform.  Following David Mosberger's advice - we use the binfmt_misc
      mechanism for the invocation of the translator whenever the user tries to
      exec an IA-32 binary.
      
      The EL is meant to help in the migration path from IA-32 to IPF.  From our
      beta customers we learnt that, at the first stage, they tend to keep their
      environment mostly intact, using the legacy IA-32 binaries.
      
      Such an environment has, naturally, setuid and non-readable binaries.  It
      will be useless to ask the administrator to change the settings of such an
      environment - some of them are very complex, and the administrators are
      reluctant to make any changes in a system that already proved itself to be
      robust and secure.  So, our target with these patches is not to enhance the
      support for scripts but rather to allow a translator to be integrated into a
      working environment that is not (and should not be) aware of the fact that
      it's being emulated.
      
      As I said before - it is practically hopeless to expect an administrator of
      such a system to change it so that it will suit the current behavior of
      binfmt_misc.  But, even if we could do that, I'm not sure it would be a
      good idea - these changes are likely to be less secure than the suggested
      patches -
      
      - In order to execute non-readable binaries the binary will have to be made
        readable, which is obviously less secure than allowing only a trusted
        translator to read it
      
      - There will be no way for the translator to calculate the accurate
        AT_SECURE value for the translated process.  This might end up with the
        translated process running in a non-secured mode when it actually needs to
        be secured.
      
      </background>
      
      
      I prepared a patch that solves a couple of problems that interpreters have
      when invoked via binfmt_misc.  Currently -
      
      1) such interpreters cannot open non-readable binaries
      
      2) the processes will have their credentials and security attributes
         calculated according to interpreter permissions and not those of the
         original binary
      
      The proposed patch solves these problems by -
      
      1) opening the binary on behalf of the interpreter and passing its fd
         instead of the path as argv[1] to the interpreter
      
      2) calling prepare_binprm with the file struct of the binary and not the
         one of the interpreter
      
      The new functionality is enabled by adding a special flag to the registration
      string.  If this flag is not added then old behavior is not changed.
      
      A preliminary version of this patch was sent to the list on 9/1/2003 with the
      title "[PATCH]: non-readable binaries - binfmt_misc 2.6.0-test4".  This new
      version addresses the concerns that were raised about that patch, except
      for calling unshare_files() before allocating a new fd.  This is because
      that feature has not entered 2.6 yet.
      
      
      Arun Sharma <arun.sharma@intel.com> says:
      
      We were going through an internal review of this patch:
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=107424598901720&w=2
      
      
      
      which is in your tree already.  I'm not sure if this line of code got
      sufficient review.
      
      +               /* call prepare_binprm before switching to interpreter's file
      +                * so that all security calculation will be done according to
      +                * binary and not interpreter */
      +               retval = prepare_binprm(bprm);
      
      The case that concerns me is: unprivileged interpreter and a privileged
      binary.  One can use binfmt_misc to execute untrusted code (interpreter) with
      elevated privileges.  One could argue that all binfmt_misc interpreters are
      trusted, because only root can register them.  But that's a change from the
      traditional behavior of binfmt_misc (and binfmt_script).
      
      
      (Update):
      
      Arun pointed out that calculating the process credentials according to the
      binary that needs to be translated is a bit risky, since it requires the
      administrator to pay extra attention not to register an interpreter which is
      not intended to run with root credentials.
      
      After discussing this issue with him, I would like to propose a modified
      patch: The old patch did 2 things - 1) open the binary for reading and 2)
      calculate the credentials according to the binary.
      
      I removed the riskier part of changing the credentials calculation, so the
      revised patch only opens the binary for reading.  It also includes a few words
      of warning in the description of the 'open-binary' feature in
      binfmt_misc.txt, and makes the function entry_status print the flags in use.
      
      As for the 'credentials' part of the patch, I will prepare a separate patch
      for it, send it to the LKML again, describe the problem, and ask for
      people's comments.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 15 Jun, 2004 1 commit
  3. 14 Jun, 2004 3 commits
  4. 13 Jun, 2004 3 commits
  5. 12 Jun, 2004 4 commits
  6. 11 Jun, 2004 3 commits
  7. 10 Jun, 2004 2 commits
  8. 09 Jun, 2004 5 commits
    • JFS: Better RAS when btstack is overrun · fa2c79c3
      Dave Kleikamp authored
      
      The current warning and/or trap when the btstack is overrun in
      dtSearch or dtReadFirst are not very helpful.  Add code to detect
      the stack overrun earlier, print something useful, and return
      gracefully.
      
      I've found that dbFree being called with blkno == 0 can lead to this
      error, so I put in a specific check for that.
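
      The idea of catching the overrun at push time, reporting it and
      returning an error can be sketched as below.  The names (toy_stack,
      toy_push, TOY_MAX_HEIGHT) and the depth limit are hypothetical and do
      not come from the JFS source.

```c
#include <stdio.h>

#define TOY_MAX_HEIGHT 8   /* hypothetical maximum tree depth */

struct toy_frame {
    long bn;               /* block number of the page at this level */
    int index;             /* entry index within that page */
};

struct toy_stack {
    int depth;
    struct toy_frame frames[TOY_MAX_HEIGHT];
};

/* Push one frame.  On overrun, report where it happened and return -1
 * instead of writing past the end of the array. */
int toy_push(struct toy_stack *s, long bn, int index)
{
    if (s->depth >= TOY_MAX_HEIGHT) {
        fprintf(stderr, "toy_push: stack overrun at bn=%ld index=%d\n",
                bn, index);
        return -1;
    }
    s->frames[s->depth].bn = bn;
    s->frames[s->depth].index = index;
    s->depth++;
    return 0;
}
```
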
      Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
    • [PATCH] aio.c sparse warning fix · 004a2b36
      Andrew Morton authored
      
      Randy Dunlap <rddunlap@osdl.org> points out that sparse warns about the test
      of an undefined preprocessor identifier.
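
      As a general illustration of this class of warning (not the actual
      aio.c change), the sketch below uses a made-up macro, TOY_DEBUG, that
      is deliberately left undefined.  Testing its value with #if would
      silently evaluate it as 0, which is the pattern that gets flagged (gcc
      reports it too with -Wundef); testing definedness with #ifdef avoids
      that.

```c
/* TOY_DEBUG is a hypothetical macro that is intentionally not defined.
 *
 * Flagged form: the undefined identifier is silently treated as 0.
 *
 *     #if TOY_DEBUG > 1
 *
 * Explicit form: ask whether the macro is defined at all. */
#ifdef TOY_DEBUG
#define toy_debug_enabled() 1
#else
#define toy_debug_enabled() 0
#endif

/* Small wrapper so the result can be used from ordinary code. */
int toy_debug_level(void)
{
    return toy_debug_enabled();
}
```
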
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ext3: journal_flush() needs journal_lock_updates() · b7d41b55
      Andrew Morton authored
      
      We need to take journal_lock_updates() while remounting r/o to prevent a new
      transaction starting while journal_flush() is running.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] writeback_inodes can race with unmount · 7052fc2b
      Andrew Morton authored
      
      From: Chris Mason <mason@suse.com>
      
      There's a small window where the filesystem can be unmounted during
      writeback_inodes.  The end result is the iput done by sync_sb_inodes could
      be done after the FS put_super and the super has been removed from all
      lists.
      
      The fix is to hold the s_umount sem during sync_sb_inodes to make sure
      the FS doesn't get unmounted.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] jbd: descriptor buffer state fix · ace476bb
      Andrew Morton authored
      
      Fix a problem discovered by Jeff Mahoney <jeffm@suse.com>, based on an initial
      patch from Chris Mason <mason@suse.com>.
      
      journal_get_descriptor_buffer() is used to obtain a regular old buffer_head
      against the blockdev mapping.  The caller will populate that bh by hand and
      will then submit it for writing.
      
      But there are problems:
      
      a) The function sets bh->b_state nonatomically.  But this buffer is
         accessible to other CPUs via pagecache lookup.
      
      b) The function sets the buffer dirty and then the caller populates it and
         then it is submitted for I/O.  Wrong order: there's a window in which the
         VM could write the buffer before it is fully populated.
      
      c) The function fails to set the buffer uptodate after zeroing it.  And one
         caller forgot to mark it uptodate as well.  So if the VM happens to decide
         to write the containing page back, __block_write_full_page() encounters a
         dirty, not uptodate buffer, which is an illegal state.  This was generating
         buffer_error() warnings before we removed buffer_error().
      
         Leaving the buffer not uptodate also means that a concurrent reader of
         /dev/hda1 could cause physical I/O against the buffer, scribbling on what
         we just put in it.
      
         So journal_get_descriptor_buffer() is changed to mark the buffer
         uptodate, under the buffer lock.
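
      The three points above come down to: change state bits atomically,
      mark the buffer uptodate (under its lock) right after zeroing it, and
      dirty it only once it is fully populated.  A userspace sketch using C11
      atomics follows; toy_buffer and the TOY_* bits are hypothetical
      stand-ins, not the kernel's buffer_head API.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state bits for a toy buffer. */
enum {
    TOY_UPTODATE = 1u << 0,
    TOY_DIRTY    = 1u << 1,
    TOY_LOCKED   = 1u << 2
};

struct toy_buffer {
    atomic_uint state;
    unsigned char data[64];
};

/* Zero the buffer and mark it uptodate while holding the lock bit.
 * Every state change is an atomic read-modify-write.  The buffer is
 * NOT marked dirty here. */
void toy_get_descriptor(struct toy_buffer *b)
{
    while (atomic_fetch_or(&b->state, TOY_LOCKED) & TOY_LOCKED)
        ;   /* spin until we own the lock bit */
    memset(b->data, 0, sizeof(b->data));
    atomic_fetch_or(&b->state, TOY_UPTODATE);
    atomic_fetch_and(&b->state, ~(unsigned)TOY_LOCKED);
}

/* Caller side: populate first, mark dirty last, so nothing can see a
 * dirty buffer that is only half filled in. */
void toy_fill_and_dirty(struct toy_buffer *b, const void *src, size_t len)
{
    if (len > sizeof(b->data))
        len = sizeof(b->data);
    memcpy(b->data, src, len);
    atomic_fetch_or(&b->state, TOY_DIRTY);
}
```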
      
      I considered changing journal_get_descriptor_buffer() to return a locked
      buffer but there doesn't seem to be a need for this, and both callers end up
      using ll_rw_block() anyway, which requires that the buffer be unlocked again.
      
      Note that the journal_get_descriptor_buffer() callers dirty these buffers with
      set_buffer_dirty().  That's a bit naughty, because it could create dirty
      buffers against a clean page - an illegal state.  They really should use
      mark_buffer_dirty() to dirty the page and inode as well.  But all callers will
      immediately write and clean the buffer anyway, so we can safely leave this
      optimising cheat in place.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 08 Jun, 2004 1 commit
    • NTFS: 2.1.13 - Enable overwriting of resident files and housekeeping of system files. · 32e5fcaa
      Anton Altaparmakov authored
      - Mark the volume dirty when (re)mounting read-write and mark it clean
        when unmounting or remounting read-only.  If any volume errors are
        found, the volume is left marked dirty to force chkdsk to run.
      - Add code to set the NT4 compatibility flag when (re)mounting
        read-write for newer NTFS versions but leave it commented out for now
        since we do not make any modifications that are NTFS 1.2 specific yet
        and since setting this flag breaks Captive-NTFS which is not nice.
        This code must be enabled once we start writing NTFS 1.2 specific
        changes, otherwise the Windows NTFS driver might crash / cause corruption.
      - Fix a silly bug that caused a deadlock in ntfs_mft_writepage().
        For inode 0, i.e. $MFT itself, we cannot use ilookup5() from
        there because the inode is already locked by the kernel
        (fs/fs-writeback.c::__sync_single_inode()) and ilookup5() waits
        until the inode is unlocked before returning it and it never gets
        unlocked because ntfs_mft_writepage() never returns.  )-:
        Fortunately, we have inode 0 pinned in icache for the duration
        of the mount so we can access it directly.
      Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
  10. 07 Jun, 2004 6 commits
  11. 05 Jun, 2004 8 commits