1. 18 Jun, 2004 19 commits
    • [PATCH] idr: remove counter bits from id's · 5470e17c
      Andrew Morton authored
      idr_get_new() currently returns an incrementing counter in the top 8 bits of
      the id, which means that most users have to mask it off again, and we only
      have a 24-bit range.
      
      So remove that counter.  Also:
      
      - Remove the BITS_PER_INT define due to namespace collision risk.
      
      - Make MAX_ID_SHIFT 31, so counters have a 0 to 2G-1 range.
      
      - Why is MAX_ID_SHIFT using sizeof(int) and not sizeof(long)?  If it's for
        consistency across 32- and 64-bit machines, why not just make it "31"?
      
      - Does this still hold true with the counter removed?
      
      /* We can only use half the bits in the top level because there are
         only four possible bits in the top level (5 bits * 4 levels = 25
         bits, but you only use 24 bits in the id). */
      
        If not, what needs to change?
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
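      The id layout described in this entry can be sketched in user-space C.
      This is only an illustration, not the kernel code: the macro and function
      names below are hypothetical, and the 8-bit counter / 24-bit id split and
      the new shift of 31 are taken from the commit message.

```c
#include <assert.h>

/* Hypothetical constants, named here only for illustration. */
#define OLD_ID_BITS      24                          /* usable id bits before the patch */
#define OLD_ID_MASK      ((1u << OLD_ID_BITS) - 1)   /* strips the 8 counter bits */
#define NEW_MAX_ID_SHIFT 31                          /* id width after the patch */
#define NEW_MAX_ID       ((1u << NEW_MAX_ID_SHIFT) - 1)

/* Before the patch: a counter sits in the top 8 bits of the returned value. */
static unsigned int old_pack(unsigned int counter, unsigned int id)
{
    assert(id <= OLD_ID_MASK);
    return ((counter & 0xffu) << OLD_ID_BITS) | id;
}

/* Before the patch: callers mask the counter off to recover the id. */
static unsigned int old_unpack_id(unsigned int packed)
{
    return packed & OLD_ID_MASK;
}
```

      With the counter gone, the returned value is the id itself and the
      masking step disappears; NEW_MAX_ID corresponds to the "0 to 2G-1" range.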
    • [PATCH] Fixes for idr code · 90e518e1
      Corey Minyard authored
      * On a 32-bit architecture, the idr code will cease to work if you add
        more than 2^20 entries.  You will not be able to find many of the
        entries.  The problem is that the IDR code uses 5-bit chunks of the
        number and the lower portion used by IDR is 24 bits, so you have one bit
        that leaks over into the comparisons that should not be there.  The
        solution is to mask off that bit before doing IDR processing.  This
        actually causes the POSIX timer code to crash if you create that many
        timers.  I have included an idr_test.tar.gz file that demonstrates this
        with and without the fix, in case you need more evidence :).
      
      * When the IDR fills up, it returns -1.  However, there was no way to
        check for this condition.  This patch adds the ability to check for the
        idr being full and fixes all the users.  It also fixes a problem in
        fs/super.c where the idr code wasn't checking for -1.
      
      * There was a race condition creating POSIX timers.  The timer was added
        to a task struct for another process, and only then was the data for the
        timer filled out.  The other task could use/destroy the timer as soon as
        it is in the task's queue and the lock is released.  This moves setting up
        the timer data to before the timer is enqueued or (for some data) into
        the lock.
      
      * Change things so that the caller doesn't need to run idr_full() to find
        out the reason for an idr_get_new() failure.
      
        Just return -ENOSPC if the tree was full, or -EAGAIN if the caller needs
        to re-run idr_pre_get() and try again.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
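      The -EAGAIN / -ENOSPC convention from the last item can be sketched as a
      caller-side retry loop.  This is a user-space mock, not the kernel API:
      struct mock_idr, mock_pre_get() and mock_get_new() are hypothetical
      stand-ins for the idr, idr_pre_get() and idr_get_new(), and their
      signatures are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an idr: a bounded id space plus a flag
 * recording whether memory has been preallocated. */
struct mock_idr {
    int next_id;    /* next id to hand out */
    int capacity;   /* total number of ids available */
    int preloaded;  /* nonzero after mock_pre_get() */
};

/* Stand-in for idr_pre_get(): preallocate; returns nonzero on success. */
static int mock_pre_get(struct mock_idr *idr)
{
    idr->preloaded = 1;
    return 1;
}

/* Stand-in for idr_get_new(): returns the new id, -ENOSPC if the tree
 * is full, or -EAGAIN if the caller must preallocate and retry. */
static int mock_get_new(struct mock_idr *idr)
{
    if (idr->next_id >= idr->capacity)
        return -ENOSPC;
    if (!idr->preloaded)
        return -EAGAIN;
    idr->preloaded = 0;  /* the preallocated memory is consumed */
    return idr->next_id++;
}

/* Caller-side loop: retry only on -EAGAIN, pass everything else up. */
static int alloc_id(struct mock_idr *idr)
{
    int id;

    assert(idr != NULL);
    do {
        if (!mock_pre_get(idr))
            return -ENOMEM;
        id = mock_get_new(idr);
    } while (id == -EAGAIN);
    return id;
}
```

      The point of the two error codes is that the caller can tell "try again
      after preallocating" apart from "the tree is full" without a separate
      idr_full() call.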
    • [PATCH] reiserfs data logging support · f1372916
      Chris Mason authored
      Add data=journal support for reiserfs
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] reiserfs: btree readahead · 2167f071
      Chris Mason authored
      Walking the btree can trigger a number of single block synchronous reads.
      This patch does btree readahead during operations that are likely to be long
      and sequential.  So far, that only includes directory reads and truncates, but
      it can make both much faster.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] reiserfs: remove debugging warning from block allocator · 36f9f7fc
      Chris Mason authored
      Remove debugging warning from the reiserfs block allocator code
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] reiserfs: block allocator should not inherit "packing locality 1" · 930c07f9
      Chris Mason authored
      reiserfsck --rebuild-tree expects the only key with a packing locality of 1 to
      be for the root directory (key [1 2]).  The new block allocator inherited that
      packing locality down to subdirectories, which triggers failures in reiserfsck
      --rebuild-tree.

      reiserfsck in readonly check mode doesn't complain about this.  Thanks to Jeff
      Mahoney for finding it.

      The fix is to never inherit packing locality #1.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] reiserfs: block allocator optimizations · 734db689
      Chris Mason authored
      From: <mason@suse.com>
      From: <jeffm@suse.com>
      
      The current reiserfs allocator pretty much allocates things sequentially
      from the start of the disk.  It works very nicely for desktop loads, but
      once you've got more than one proc doing io, data files can fragment badly.
      
      One obvious solution is something like ext2's bitmap groups, which puts
      file data into different areas of the disk based on which subdirectory
      they are in.  The problem with bitmap groups is that if you've got a
      group of subdirectories their contents will be spread out all over the
      disk, leading to lots of seeks during a sequential read.
      
      This allocator patch uses the packing locality to determine which bitmap
      group to allocate from, but when you create a file it looks in the bitmaps
      to see how 'full' that packing locality already is.  If it hasn't been
      heavily used yet, the packing locality is inherited from the parent
      directory putting files in new subdirs close to the parent subdir,
      otherwise it is the inode number of the parent directory putting new
      files far away from the parent subdir.
      
      The end result is fewer bitmap groups for the same working set.  For
      example, one test data set created by 20 procs running in parallel has
      6822 subdirs.  And with vanilla reiserfs that would mean 6822
      packing localities.  This patch turns that into 26 packing localities.
      
      This makes sequential reads of big directory trees more efficient, but
      it also makes the btree more efficient in general.  Things end up sorted
      better because groups of subdirs end up with similar keys in the btree,
      instead of being spread out all over.
      
      The bitmap grouping code tries to use the start of each bitmap group
      for metadata, and offsets the data slightly.  The data and metadata
      are still close together, but not completely intermixed like they are
      in the default allocator.  The end result is that leaf nodes tend to be
      close to each other, making metadata readahead more effective.
      
      The old block allocator had the ability to enforce a minimum
      allocation size, but did not use it.  It now tries to do a pass looking
      for larger allocation chunks before falling back to the old behaviour
      of taking any blocks it can find.
      
      The patch changes the defaults to:
      
      mount -o alloc=skip_busy:dirid_groups:packing_groups
      
      You can get back the old behaviour with mount -o alloc=skip_busy
      
      mount -o alloc=dirid_groups will turn on the bitmap groups
      mount -o alloc=packing_groups turns on the packing locality reduction code
      mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and
      skip_busy
      
      Finally the patch adds a mount -o alloc=oid_groups, which puts files into
      bitmap groups based on a hash of their objectid.  This would be used for
      databases or other situations where you have a limited number of very
      large files.
      
      This command will tell you how many packing localities are actually in
      use:
      
      debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
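      The packing locality choice described in this entry can be sketched as a
      small decision function.  This is an illustration under stated
      assumptions, not the reiserfs code: the function name, the "percent
      used" input and the 50% threshold are all hypothetical.  The rule that
      locality 1 is never inherited comes from the separate fix listed earlier
      in this log.

```c
#include <assert.h>

#define ROOT_PACKING_LOCALITY 1u  /* reserved for the root directory, per the log */

/* Hypothetical threshold: at or above this percentage the parent's
 * packing locality counts as "heavily used". */
#define BUSY_PERCENT 50u

/* Pick the packing locality for a new object created in a directory. */
static unsigned int choose_packing_locality(unsigned int parent_locality,
                                            unsigned int parent_ino,
                                            unsigned int used_percent)
{
    assert(used_percent <= 100u);

    /* Never inherit packing locality 1. */
    if (parent_locality == ROOT_PACKING_LOCALITY)
        return parent_ino;

    /* Lightly used: inherit, keeping new subdirs close to the parent. */
    if (used_percent < BUSY_PERCENT)
        return parent_locality;

    /* Heavily used: switch to the parent's inode number, moving new
     * files away from the parent. */
    return parent_ino;
}
```

      Inheriting while the locality is lightly used is what collapses many
      subdirectories into a few packing localities, as in the 6822-to-26
      example above.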
    • [PATCH] ppc64: uninline __pte_free_tlb() · fab177a4
      Andrew Morton authored
      The pgalloc.h changes broke ppc64:
      
      In file included from include/asm-generic/tlb.h:18,
                       from include/asm/tlb.h:24,
                       from arch/ppc64/mm/hash_utils.c:48:
      include/asm/pgalloc.h: In function `__pte_free_tlb':
      include/asm/pgalloc.h:110: dereferencing pointer to incomplete type
      include/asm/pgalloc.h:111: dereferencing pointer to incomplete type
      
      Uninlining __pte_free_tlb() fixes that.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Clean up asm/pgalloc.h include 3 · d01034ea
      Russell King authored
      This patch cleans up needless includes of asm/pgalloc.h from the arch/i386/
      subtree.  Compile tested on x86_pc SMP.
      
      [I also tried VISWS + SMP: without PM it doesn't build in smpboot.c,
       though I don't believe it's caused by this patch.  With PM, it fails
       to link, complaining that maxcpus is undefined.  Therefore, I presume
       VISWS + SMP is an invalid configuration.]
      
      This patch is part of a larger patch aiming towards getting the include of
      asm/pgtable.h out of linux/mm.h, so that asm/pgtable.h can sanely get at
      things like mm_struct and friends.
      
      I suggest testing in -mm for a while to ensure there aren't any hidden arch
      issues.
      
      The outstanding list of files for other architectures can be found
      at http://www.arm.linux.org.uk/misc/pgalloc.txt
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Clean up asm/pgalloc.h include · a646de6c
      Russell King authored
      This patch cleans up needless includes of asm/pgalloc.h from the drivers/
      subtree.  drivers/char/mem.c has been compile tested; the others have not,
      since they are for non-x86 and non-ARM architectures.
      
      This patch is part of a larger patch aiming towards getting the include of
      asm/pgtable.h out of linux/mm.h, so that asm/pgtable.h can sanely get at
      things like mm_struct and friends.
      
      I suggest testing in -mm for a while to ensure there aren't any hidden arch
      issues.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Clean up asm/pgalloc.h include · 1c60f076
      Russell King authored
      This patch cleans up needless includes of asm/pgalloc.h from the fs/,
      kernel/ and mm/ subtrees.  Compile tested on multiple ARM platforms and
      x86; this patch appears safe.
      
      This patch is part of a larger patch aiming towards getting the include of
      asm/pgtable.h out of linux/mm.h, so that asm/pgtable.h can sanely get at
      things like mm_struct and friends.
      
      I suggest testing in -mm for a while to ensure there aren't any hidden arch
      issues.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] binfmt_misc: improve calculation of interpreter's credentials · c407c033
      Yoav Zach authored
      This patch allows for misc binaries to run with credentials and security
      token that are calculated according to the binaries, and not according to the
      interpreter, which is the legacy behavior of binfmt_misc.
      
      The way it is done is by calling prepare_binprm, which is where these
      attributes are calculated, before switching the 'file' field in the bprm from
      the binary to the interpreter.
      
      This feature should be used with care, since the interpreter will have root
      permissions when running a setuid binary owned by root.
      
      Please note -
      
      - Only root can register an interpreter with binfmt_misc.  The feature is
        documented and the administrator is advised to handle it with care
      
      - The new feature is enabled only with a special flag in the registration
        string.  When this flag is not specified the current behavior of
        binfmt_misc is kept
      
      - This is the only 'right' way for an interpreter to know the correct
        AT_SECURE value for the interpreted binary
      
      
      From: Chris Wright <chrisw@osdl.org>
      
        This patchset looks OK, except for one problem.  It installs the fd (which
        could've been unreadable) without unsharing the ->files.  So someone can use
        this to read unreadable yet executable files.  Here's a patch which fixes
        that up.  I added one bit that's commented out because I'm not positive if a
        final steal_locks() is needed.
      
        I did a fair amount of rearranging to simplify the error conditions
        relative to the fd_install(), and unshare_files().
      
      From: Chris Wright <chrisw@osdl.org>
      
        I found that the intel patchset (and mine as well) leaked i_writecount on
        the original executed file.  In addition, I verified that the steal_locks()
        bit is indeed needed.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Handle non-readable binfmt_misc executables · 79baf43b
      Yoav Zach authored
      <background>
      
      I work in a group that works on enabling the IA-32 Execution Layer
      (http://www.intel.com/pressroom/archive/releases/20040113comp.htm) on Linux.
      In a few words - this is a dynamic translator for IA-32 binaries on IPF
      platform.  Following David Mosberger's advice - we use the binfmt_misc
      mechanism for the invocation of the translator whenever the user tries to
      exec an IA-32 binary.
      
      The EL is meant to help in the migration path from IA-32 to IPF.  From our
      beta customers we learnt that at the first stage they tend to keep their
      environment mostly intact, using the legacy IA-32 binaries.
      
      Such an environment has, naturally, setuid and non-readable binaries.  It
      will be useless to ask the administrator to change the settings of such an
      environment - some of them are very complex, and the administrators are
      reluctant to make any changes in a system that already proved itself to be
      robust and secure.  So, our target with these patches is not to enhance the
      support for scripts but rather to allow a translator to be integrated into a
      working environment that is not (and should not be) aware of the fact it's
      being emulated.
      
      As I said before - it is practically hopeless to expect an administrator of
      such a system to change it so that it will suit the current behavior of
      binfmt_misc.  But, even if we could do that, I'm not sure it would be a good
      idea - these changes are likely to be less secure than the suggested
      patches -
      
      - In order to execute non-readable binaries the binary will have to be made
        readable, which is obviously less secure than allowing only a trusted
        translator to read it
      
      - There will be no way for the translator to calculate the accurate
        AT_SECURE value for the translated process.  This might end up with the
        translated process running in a non-secured mode when it actually needs to
        be secured.
      
      </background>
      
      
      I prepared a patch that solves a couple of problems that interpreters have
      when invoked via binfmt_misc.  Currently -
      
      1) such interpreters cannot open non-readable binaries
      
      2) the processes will have their credentials and security attributes
         calculated according to interpreter permissions and not those of the
         original binary
      
      The proposed patch solves these problems by -
      
      1) opening the binary on behalf of the interpreter and passing its fd
         instead of the path as argv[1] to the interpreter
      
      2) calling prepare_binprm with the file struct of the binary and not the
         one of the interpreter
      
      The new functionality is enabled by adding a special flag to the registration
      string.  If this flag is not added then old behavior is not changed.
      
      A preliminary version of this patch was sent to the list on 9/1/2003 with the
      title "[PATCH]: non-readable binaries - binfmt_misc 2.6.0-test4".  This new
      version fixes the concerns that were raised by the patch, except for calling
      unshare_files() before allocating a new fd.  This is because this feature did
      not enter 2.6 yet.
      
      
      Arun Sharma <arun.sharma@intel.com> says:
      
      We were going through an internal review of this patch:
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=107424598901720&w=2
      
      which is in your tree already.  I'm not sure if this line of code got
      sufficient review.
      
      +               /* call prepare_binprm before switching to interpreter's file
      +                * so that all security calculation will be done according to
      +                * binary and not interpreter */
      +               retval = prepare_binprm(bprm);
      
      The case that concerns me is: unprivileged interpreter and a privileged
      binary.  One can use binfmt_misc to execute untrusted code (interpreter) with
      elevated privileges.  One could argue that all binfmt_misc interpreters are
      trusted, because only root can register them.  But that's a change from the
      traditional behavior of binfmt_misc (and binfmt_script).
      
      
      (Update):
      
      Arun pointed out that calculating the process credentials according to the
      binary that needs to be translated is a bit risky, since it requires the
      administrator to pay extra attention not to register an interpreter which is
      not intended to run with root credentials.
      
      After discussing this issue with him, I would like to propose a modified
      patch: The old patch did 2 things - 1) open the binary for reading and 2)
      calculate the credentials according to the binary.
      
      I removed the riskier part of changing the credentials calculation, so the
      revised patch only opens the binary for reading.  It also includes a few words
      of warning in the description of the 'open-binary' feature in
      binfmt_misc.txt, and makes the function entry_status print the flags in use.

      As for the 'credentials' part of the patch, I will prepare a separate patch
      for it and send it again to the LKML, describe the problem and ask for
      people's comments.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add PPC4xx MAINTAINERS entry, merge CREDITS from 2.4 · 69ac831e
      Matt Porter authored
      Add myself as the PPC4xx maintainer. Merge CREDITS entry from 2.4
      Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: avoid multiline /proc/cmdline content on iSeries · 1d023f88
      Olaf Hering authored
      /proc/cmdline is filled via an OS400 call iSeries_init().  It scans the
      returned data from the end, instead of the beginning.  This leads to
      multiple lines in /proc/cmdline.

      Just scan from the beginning and stop at the first newline.  This patch
      also changes the /proc/iSeries/mf/*/cmdline interface to do the same as the
      initial setup.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
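      "Scan from the beginning and stop at the first newline" can be sketched
      as follows.  This is a user-space illustration only: copy_first_line()
      is a hypothetical helper, not the function the patch touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst up to (not including) the
 * first newline or NUL, truncating to fit, and always NUL-terminate.
 * Returns the number of characters copied. */
static size_t copy_first_line(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && dst_size > 0);
    while (i + 1 < dst_size && src[i] != '\0' && src[i] != '\n') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

      Scanning forward guarantees that at most one line ends up in the
      destination, whatever follows the first newline in the returned data.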
    • [PATCH] ppc32 irq.c cpumask fix · bf70d874
      Mikael Pettersson authored
      2.6.7-rc3-mm1 changed cpumask_t from ulong to a struct, causing
      compile-time errors in arch/ppc/kernel/irq.c.
      
      Proposed fix below. Tested on a G3.
      Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc32: support for e500 and 85xx · 9bad068c
      Kumar Gala authored
      Here are both a GNU-style and a BK patch for adding support for the e500 core
      and 85xx platform to 2.6.  This is pretty much a direct port from 2.4 with a
      bit of cleanup around the edges.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: pretest pte_young and pte_dirty · f866d89a
      Hugh Dickins authored
      Test for pte_young before going to the costlier atomic test_and_clear, as
      asm-generic does.  Test for pte_dirty before going to the costlier atomic
      test_and_clear, as asm-generic does (I said before that I would not do so for
      pte_dirty, but was missing the point: there is nothing atomic about deciding
      to do nothing).  But I've not touched the rather different ppc and ppc64.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
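      The "pretest before the atomic operation" idea can be sketched with C11
      atomics.  This is a user-space illustration, not the kernel's pte code:
      the pte_word type, the PTE_YOUNG bit value and the function name are
      hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PTE_YOUNG 0x20ul  /* hypothetical bit position for the young flag */

typedef _Atomic unsigned long pte_word;

/* Returns 1 if the young bit was set (and clears it), 0 otherwise. */
static int test_and_clear_young(pte_word *pte)
{
    assert(pte != NULL);

    /* Cheap pretest: a plain load.  If the bit is already clear there
     * is nothing to do, and deciding to do nothing needs no atomicity. */
    if (!(atomic_load_explicit(pte, memory_order_relaxed) & PTE_YOUNG))
        return 0;

    /* Costlier path: atomic read-modify-write that clears the bit and
     * reports whether it was still set at that moment. */
    return (atomic_fetch_and(pte, ~PTE_YOUNG) & PTE_YOUNG) != 0;
}
```

      The same shape applies to the dirty bit: the read-modify-write is only
      paid for when the plain load says there is something to clear.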
    • [PATCH] mm: flush TLB when clearing young · 68dc3ce3
      Hugh Dickins authored
      Traditionally we've not flushed TLB after clearing the young/referenced bit,
      it has seemed just a waste of time.  Russell King points out that on some
      architectures, with the move from 2.4 mm sweeping to 2.6 rmap, this may be a
      serious omission: very frequently referenced pages never re-marked young, and
      the worst choices made for unmapping.
      
      So, replace ptep_test_and_clear_young by ptep_clear_flush_young throughout
      rmap.c.  Originally I'd imagined making some kind of TLB gather optimization,
      but don't see what now: whether worth it rather depends on how common
      cross-cpu flushes are, and whether global or not.
      
      ppc and ppc64 have already found this issue, and worked around it by arranging
      TLB flush from their ptep_test_and_clear_young: with the aid of pgtable rmap
      pointers.  I'm hoping ptep_clear_flush_young will allow ppc and ppc64 to
      remove that special code, but won't change them myself.
      
      It's worth noting that it is Andrea's anon_vma rmap which makes the vma
      available for ptep_clear_flush_young in page_referenced_one: anonmm and
      pte_chains would both need an additional find_vma for that.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 17 Jun, 2004 3 commits
    • [PATCH] alpha: fix discontigmem+initrd build · 3900b963
      Richard Henderson authored
      From: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      
      Compilation fails due to incorrect usage of NODE_DATA().
      
      Reported by hpa.
    • [PATCH] ppc64: Optimize exception/syscall entry/exit · 1ab196f3
      Paul Mackerras authored
      This rewrites the PPC64 exception entry/exit routines to make them
      smaller and faster.
      
      In particular we no longer save all of the registers for the common
      exceptions - system calls, hardware interrupts and decrementer (timer)
      interrupts - only the volatile registers.  The other registers are saved
      and restored (if used) by the C functions we call.  This involved
      changing the registers we use in early exception processing from r20-r23
      to r9-r12, which ended up changing quite a lot of code in head.S. 
      Overall this gives us about a 20% reduction in null syscall time. 
      
      Some system calls need all the registers (e.g.  fork/clone/vfork and
      [rt_]sigsuspend).  For these the syscall dispatch code calls a stub that
      saves the nonvolatile registers before calling the real handler.
      
      This also implements the force_successful_syscall_return() thing for
      ppc64.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: Implement CONFIG_PREEMPT · 23932693
      Paul Mackerras authored
      This implements CONFIG_PREEMPT for ppc64.  Aside from the entry.S
      changes to check the _TIF_NEED_RESCHED bit when returning from an
      exception, there are various changes to make the ppc64-specific code
      preempt-safe, mostly adding preempt_enable/disable or get_cpu/put_cpu
      calls where needed.  I have been using this on my desktop G5 for the
      last week without problems.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 15 Jun, 2004 18 commits