1. 31 Aug, 2004 40 commits
    • [PATCH] interrupt driven hvc_console as vio device · 1fd6c4af
      Ryan S. Arnold authored
      This is an hvc_console patch which provides driver and ppc64 architecture
      fixes to enable the hvc_console driver to register itself as a vio device
      with the vio bus, provide hotplug add/remove for vty adapters, and act as
      an interrupt driven driver on Power-5 hardware or remain as a polling
      driver on Power-4 hardware.
      
      arch/ppc64/kernel/hvconsole.c
      =============================
      
      - Changed hvc_get_chars() and hvc_put_chars() api to take vtermno rather
        than index number.
      
      - Added hvc_find_vtys() function which walks the bus looking for
        vterm/vty devices to callback to the hvc_console driver.  This provides
        console output functionality prior to early console init (pre mem init
        and pre device probe).
      
      include/asm-ppc64/hvconsole.h
      =============================
      
      - Changed hvc_get_chars() and hvc_put_chars() api to take vtermno rather
        than index number.
      
      - Added hvc_find_vtys() function.
      
      - Added hvc_instantiate() function which is implemented by a console
        driver wanting to receive a callback on early console init.
      
      drivers/char/hvc_console.c
      ==========================
      
      - Switch khvcd from kernel_threads to kthreads which got rid of
        deprecated daemonize().
      
      - Added module exit clause to be thorough (not terribly necessary with a
        console driver of course)
      
      - Added early discovery of vterm/vty adapters by doing a bus walk on
        early console init which results in hvc_instantiate() callback and
        addition of the vtermno into a static array of vtermnos supported as
        console adapters (meaning the console APIs work against these vtermnos
        prior to full console initialization).
      
      - This driver is now registered as a vio driver which means that vty
        adapters are now managed via probe/remove.  This means hvc_console
        supports hotplug vty adapters.
      
      - Driver now requests more device nodes than what was found on the
        initial bus walk when registered as a tty driver to make room for hotplug
        vty adapters.  These secondary vty adapters provide a tty tunnel between
        partitions.
      
      - Removed static hvc_struct array and replaced with a linux list that has
        elements (hvc_struct instances) added/removed on probe/remove AFTER early
        console init.  This is important because kmalloc can't be done at early
        console init.
      
      - Driver now either runs in interrupt driven mode or in polling mode on
        older hardware.  The khvcd is smart enough to not 'schedule()' when there
        are no interrupts.
      
      - kobjects are now used for ref counting on the hvc_struct instances.
      
      - This driver puts the tty layer to sleep on hvc_close() if there are
        pending data writes being blocked by firmware.
      
      - Removed useless spinlocks in hvc_chars_in_buffer() and hvc_write_room().
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Ryan S. Arnold <rsa@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] synclinkmp transmit eom fix · 43eef00a
      Paul Fulghum authored
      Bug Fixes:
      
      * Fix transmit end of message (EOM) processing to work correctly with
        hardware auto CTS feature
      
      * Fix oops in error path if hardware diags fail during device
        initialization
      
      Cosmetic change:
      
      * Use existing macros for address space size instead of hardcoded values
      Signed-off-by: Paul Fulghum <paulkf@microgate.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix a NULL pointer bug in do_generic_file_read() · a3f5b14e
      David Howells authored
      The attached patch fixes a bug introduced into do_generic_mapping_read() by
      which a file pointer becomes required.  I'd arranged things so that the
      file pointer was optional so that I could call the function directly on an
      inode.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Make i386 signal delivery work with -mregparm · 9691cc0d
      H. Peter Anvin authored
      This patch allows i386 signal delivery to work correctly when userspace is
      compiled with -mregparm.  This is somewhat hacky: it passes the arguments
      *both* on the stack and in registers, but it works because there are only
      one or three (depending on SA_SIGINFO) official arguments.  If you're
      relying on the unofficial arguments then you're doing something nonportable
      anyway and can put in the __attribute__((cdecl,regparm(0))) in the correct
      place.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] netpoll: fix up trapped logic · 26fec91d
      Jeff Moyer authored
      This patch contains the updates necessary to fix the hangs in netconsole.
      This includes the changing of trapped to an atomic_t, and the addition of a
      netpoll_poll_lock.  It also turns dev->netpoll_rx into a bitfield which is
      used to keep from running the networking code from the netpoll_poll call path.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] netpoll: increase NAPI budget · 5b9c63bc
      Jeff Moyer authored
      I've upped the poll budget to 16 and added a comment explaining why.  I
      definitely ran into this problem when testing netdump.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] netpoll: kill CONFIG_NETPOLL_RX · e3c265bc
      Jeff Moyer authored
      This patch removes CONFIG_NETPOLL_RX, as discussed.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] netpoll: revert queue stopped change · a668a6b0
      Matt Mackall authored
      Here's the first of the broken out patch set.  This puts the check for
      netif_queue_stopped back into netpoll_send_skb.  Network drivers are not
      designed to have their hard_start_xmit routines called when the queue is
      stopped.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] netpoll: fix unaligned accesses · 76bd9baa
      Matt Mackall authored
      Avoid some alignment traps.
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] tdfx linkage fix · 4dca0193
      Paolo Ornati authored
      drivers/built-in.o(.data+0x40a68): undefined reference to `cfb_fillrect'
      drivers/built-in.o(.data+0x40a6c): undefined reference to `cfb_copyarea'
      
      3dfx framebuffer driver depends on "cfb_fillrect.c" and "cfb_copyarea.c" if
      it's compiled without CONFIG_FB_3DFX_ACCEL turned on...
      Signed-off-by: Paolo Ornati <ornati@fastwebnet.it>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] use page_to_nid · 12bf4a56
      Andy Whitcroft authored
      There are a couple of places where we seem to go round the houses to get
      the numa node id from a page.  We have a macro for this so it seems
      sensible to use that.
      
      Both lookup_node and enqueue_huge_page use page_zone() to locate the zone,
      use that to locate the node's pg_data_t, and use that to get the node_id.
      It's more efficient to use page_to_nid(), which gets the nid from the page
      flags, especially if we are not using the zone for anything else.  Change these
      to use page_to_nid().
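      
      The difference between the two lookups can be sketched in a small
      userspace model.  Everything here is hypothetical (the model_* names, the
      6-bit node field, and the stored zone pointer are assumptions for
      illustration, not the kernel's definitions); it only shows why decoding
      the node id from the flags word avoids the chain of pointer loads.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the node id occupies the top 6 bits of the flags word. */
#define MODEL_NODES_SHIFT 6
#define MODEL_NODE_SHIFT  (sizeof(unsigned long) * 8 - MODEL_NODES_SHIFT)

struct model_pgdat { int node_id; };
struct model_zone  { struct model_pgdat *zone_pgdat; };
struct model_page  { unsigned long flags; struct model_zone *zone; };

/* Long way round: page -> zone -> pgdat -> node id (dependent loads). */
int model_nid_via_zone(const struct model_page *page)
{
    return page->zone->zone_pgdat->node_id;
}

/* Short way: decode the node id straight from the flags word. */
int model_page_to_nid(const struct model_page *page)
{
    return (int)(page->flags >> MODEL_NODE_SHIFT);
}

/* Store a node id in the top bits while keeping the low flag bits. */
void model_set_nid(struct model_page *page, int nid)
{
    unsigned long low_mask = ~0UL >> MODEL_NODES_SHIFT;

    page->flags = (page->flags & low_mask) |
                  ((unsigned long)nid << MODEL_NODE_SHIFT);
}

/* Usage: both lookups agree, but only one of them touches the zone. */
void model_nid_demo(void)
{
    struct model_pgdat pgdat = { 3 };
    struct model_zone zone = { &pgdat };
    struct model_page page = { 0x5UL, &zone };

    model_set_nid(&page, 3);
    assert(model_nid_via_zone(&page) == model_page_to_nid(&page));
}
```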
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i386 bootmem restrictions · b1480d3f
      Andy Whitcroft authored
      (Comment changes only)
      
      The bootmem allocator is initialised before the kernel virtual address
      space has been fully established.  As a result, any allocations which are
      made before paging_init() has completed may point to invalid kernel
      addresses.  This patch notes this limitation and indicates where the
      allocator is fully available.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Don't OOPS on stripped modules · 758638eb
      Rusty Russell authored
      Don't want to go overboard with the checks, but this is simple and
      reasonable.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified)
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Prevent memory leak in devpts · 8cc42321
      Olaf Kirch authored
      There is a dentry refcount leak in devpts_get_tty.
      
      struct tty_struct *devpts_get_tty(int number)
      {
              struct dentry *dentry = get_node(number);
              struct tty_struct *tty;
      
              tty = (IS_ERR(dentry) || !dentry->d_inode) ? NULL :
                              dentry->d_inode->u.generic_ip;
      
              up(&devpts_root->d_inode->i_sem);
              return tty;
      }
      
      The get_node function does a lookup on /dev/pts/<number> and returns the
      dentry, taking a reference.  We should dput the dentry after extracting the
      tty pointer.
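      
      The leak and its fix can be modelled in userspace with a plain counter.
      This is only a sketch: model_dentry, model_get_node() and model_dput()
      are hypothetical stand-ins for the dentry, get_node() and dput(), and the
      locking in the original function is left out.

```c
#include <assert.h>
#include <stddef.h>

struct model_dentry {
    int refcount;   /* stand-in for the dentry reference count */
    void *tty;      /* stand-in for d_inode->u.generic_ip */
};

/* Like get_node(): the lookup hands back the dentry with a reference held. */
struct model_dentry *model_get_node(struct model_dentry *d)
{
    d->refcount++;
    return d;
}

/* Like dput(): drop one reference. */
void model_dput(struct model_dentry *d)
{
    d->refcount--;
}

/* Shape of the buggy function: the lookup's reference is never dropped. */
void *model_leaky_get_tty(struct model_dentry *d)
{
    struct model_dentry *dentry = model_get_node(d);

    return dentry->tty;
}

/* Shape of the fix: extract the pointer, then release the reference. */
void *model_fixed_get_tty(struct model_dentry *d)
{
    struct model_dentry *dentry = model_get_node(d);
    void *tty = dentry->tty;

    model_dput(dentry);
    return tty;
}

/* Usage: each leaky call leaves the count one higher; the fixed one does not. */
void model_devpts_demo(void)
{
    int dummy_tty = 0;
    struct model_dentry d = { 1, &dummy_tty };

    model_leaky_get_tty(&d);
    assert(d.refcount == 2);
    model_fixed_get_tty(&d);
    assert(d.refcount == 2);
}
```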
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Amiga partition reading fix · b13b24e7
      Joanne Dow authored
      I have a large archive of files stored on Amiga volumes.  Many of these
      volumes are on Fujitsu magneto-optical disks with 2k sector size.  The
      existing partitioning code cannot properly read them since it appears the
      OS automatically deblocks the large sectors into logical 512 byte sectors,
      something AmigaDOS never did.  I arranged the partitioning code to handle
      this situation.
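      
      A rough sketch of the scaling involved, assuming the partition is
      described by cylinder geometry in device-sized blocks and the OS works in
      512 byte sectors.  The struct and field names are hypothetical, chosen
      for illustration only, and this is not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the geometry recorded for one partition. */
struct model_rdb_part {
    uint32_t block_bytes;       /* device block size, e.g. 2048 on MO disks */
    uint32_t surfaces;
    uint32_t blocks_per_track;
    uint32_t low_cyl;
    uint32_t high_cyl;
};

/* Partition extent expressed in 512 byte logical sectors. */
struct model_extent {
    uint64_t start_sect;
    uint64_t nr_sects;
};

struct model_extent model_part_extent(const struct model_rdb_part *p)
{
    struct model_extent e;
    uint64_t blocks_per_cyl = (uint64_t)p->surfaces * p->blocks_per_track;
    uint32_t scale = p->block_bytes / 512;  /* 512 byte sectors per block */

    if (scale == 0)
        scale = 1;                          /* guard against a bogus size */

    e.start_sect = (uint64_t)p->low_cyl * blocks_per_cyl * scale;
    e.nr_sects = (uint64_t)(p->high_cyl - p->low_cyl + 1) *
                 blocks_per_cyl * scale;
    return e;
}

/* Usage: a 2k-sector disk yields four logical sectors per block. */
void model_amiga_demo(void)
{
    struct model_rdb_part p = { 2048, 1, 32, 2, 9 };
    struct model_extent e = model_part_extent(&p);

    assert(e.start_sect == 256);
    assert(e.nr_sects == 1024);
}
```

      With 512 byte device blocks the scale factor is 1 and the geometry is
      used unchanged.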
      
      Second, I have some rather strange test case disks, including my largest
      storage partition, that have somewhat unusual partition values.  As such I
      needed more information than just the first and last block numbers.
      AmigaDOS reserves N blocks, with N greater than or
      equal to 1 and less than the size of the partition, for some boot time
      information and signatures.  I have some partitions that use other than the
      usual value of 2.
      
      There is one more "fix" that could be put in if someone needs it.  Another
      value in the "Rigid Disk Blocks" description of a partition is a "PreAlloc"
      value.  It defines a number of blocks at the end of the disk that are not
      considered to be a real part of the partition.  This was "important" in the
      days of 20 meg and 40 meg hard disks.  It is hardly important and not used
      on modern drives without special user intervention.
      
      This partitioning information is known correct.  I wrote the low level
      portion of the hard disk partitioning code for AmigaDOS 3.5 and 3.9.  I am
      also responsible for one of the more frequently used partitioning tools,
      RDPrepX, before that.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Nick Piggin's avatar
      33982f7f
    • [PATCH] use hlist for pid hash · f59ad67e
      Nick Piggin authored
      Use hlists for the PID hashes.  This halves the memory footprint of these
      hashes.  No benchmarks, but I think this is a worthy improvement because
      the hashes are something that would be likely to have significant portions
      loaded into the cache of every CPU on some workloads.
      
      This comes at the "expense" of
      	1. reintroducing the memory  prefetch into the hash traversal loop;
      	2. adding new pids to the head of the list instead of the tail. I
      	   suspect that if this was a big problem then the hash isn't sized
      	   well or could benefit from moving hot entries to the head.
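      
      The halving comes from the bucket heads: a doubly linked circular list
      needs two pointers per bucket, an hlist needs one.  The following is a
      userspace sketch with hypothetical model_* types that mirror the shape of
      list_head and hlist_head/hlist_node; it is not the kernel header.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list head: two pointers per hash bucket. */
struct model_list_head {
    struct model_list_head *next, *prev;
};

/* hlist: one pointer per bucket; nodes point back at the previous link. */
struct model_hlist_node {
    struct model_hlist_node *next, **pprev;
};

struct model_hlist_head {
    struct model_hlist_node *first;
};

/* New entries go to the head of the chain (there is no tail pointer). */
void model_hlist_add_head(struct model_hlist_node *n,
                          struct model_hlist_head *h)
{
    struct model_hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Removal needs no head pointer: pprev addresses the link to patch. */
void model_hlist_del(struct model_hlist_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
}

/* Usage: bucket heads are half the size, and inserts land at the front. */
void model_hlist_demo(void)
{
    struct model_hlist_head bucket = { NULL };
    struct model_hlist_node a, b;

    assert(sizeof(struct model_list_head) ==
           2 * sizeof(struct model_hlist_head));

    model_hlist_add_head(&a, &bucket);
    model_hlist_add_head(&b, &bucket);
    assert(bucket.first == &b && b.next == &a);
}
```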
      
      Also, account for all the pid hashes when reporting hash memory usage.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix PID hash sizing · 2c05d9eb
      Nick Piggin authored
      A 4GB, 4-way Opteron would create the smallest size table (16 entries) because
      pidhash_init is called before mem_init which is where x86-64 sets up max_pfn.
      
      nr_kernel_pages is set up by paging_init, called from setup_arch, which is also
      where i386 sets up max_pfn.
      
      So export nr_kernel_pages, nr_all_pages.  Use nr_kernel_pages when sizing the
      PID hash.  This fixes the problem.
      
      This also makes the pid hash dependent on the size of ZONE_NORMAL instead of
      total size of memory.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix rusage semantics · 497c9d68
      Roland McGrath authored
      This patch changes the rusage bookkeeping and the semantics of the
      getrusage and times calls in a couple of ways.
      
      The first change is in the c* fields counting dead child processes.  POSIX
      requires that children that have died be counted in these fields when they
      are reaped by a wait* call, and that if they are never reaped (e.g.
      because of ignoring SIGCHLD, or exiting yourself first) then they are
      never counted.  These were counted in release_task for all threads.  I've
      changed it so they are counted in wait_task_zombie, i.e.  exactly when
      being reaped.
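      
      A userspace way to observe the "counted when reaped" rule, as a hedged
      sketch rather than the test program mentioned in this message: a child
      burns some CPU and exits, the parent waits for it to become a zombie
      without reaping it (waitid with WNOWAIT), and RUSAGE_CHILDREN is sampled
      before and after the real reap.  With the semantics described above the
      first sample is expected not to include the child yet.  The helper names
      are made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Total user+system CPU time charged to reaped children, in microseconds. */
static long model_children_cpu_usec(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
        return -1;
    return (long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000L +
           (long)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec);
}

/*
 * Fork a CPU-burning child and sample the children's usage (relative to a
 * baseline) once while the child is an unreaped zombie and once after it
 * has been reaped.  Returns 0 on success, -1 on any failure.
 */
int model_reap_accounting(long *before_reap, long *after_reap)
{
    long baseline = model_children_cpu_usec();
    siginfo_t info;
    pid_t pid;

    if (baseline < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile unsigned long sink = 0;

        for (unsigned long i = 0; i < 50000000UL; i++)
            sink += i;
        _exit(0);
    }

    /* Wait for the exit but leave the zombie in place. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    *before_reap = model_children_cpu_usec() - baseline;

    /* Now actually reap it. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    *after_reap = model_children_cpu_usec() - baseline;
    return 0;
}

/* Usage: the child's time can only show up once it has been reaped. */
void model_rusage_demo(void)
{
    long before = -1, after = -1;

    assert(model_reap_accounting(&before, &after) == 0);
    assert(after >= before);
}
```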
      
      POSIX also specifies for RUSAGE_CHILDREN that the report include the reaped
      child processes of the calling process, i.e.  whole thread group in Linux,
      not just ones forked by the calling thread.  POSIX specifies tms_c[us]time
      fields in the times call the same way.  I've moved the c* fields that
      contain this information into signal_struct, where the single set of
      counters accumulates data from any thread in the group that calls wait*.
      
      Finally, POSIX specifies getrusage and times as returning cumulative totals
      for the whole process (aka thread group), not just the calling thread.
      I've added fields in signal_struct to accumulate the stats of detached
      threads as they die.  The process stats are the sums of these records plus
      the stats of each remaining live/zombie thread.  The times and getrusage
      calls, and the internal uses for filling in wait4 results and siginfo_t,
      now iterate over the threads in the thread group and sum up their stats
      along with the stats recorded for threads already dead and gone.
      
      I added a new value RUSAGE_GROUP (-3) for the getrusage system call rather
      than changing the behavior of the old RUSAGE_SELF (0).  POSIX specifies
      RUSAGE_SELF to mean all threads, so the glibc getrusage call will just
      translate it to RUSAGE_GROUP for new kernels.  I did this thinking that
      someone somewhere might want the old behavior with an old glibc and a new
      kernel (it is only different if they are using CLONE_THREAD anyway). 
      However, I've changed the times system call to conform to POSIX as well and
      did not provide any backward compatibility there.  In that case there is
      nothing easy like a parameter value to use, it would have to be a new
      system call number.  That seems pretty pointless.  Given that, I wonder if
      it is worth bothering to preserve the compatible RUSAGE_SELF behavior by
      introducing RUSAGE_GROUP instead of just changing RUSAGE_SELF's meaning.
      Comments?
      
      I've done some basic testing on x86 and x86-64, and all the numbers come
      out right after these fixes.  (I have a test program that shows a few
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] waitid system call · ca3f74aa
      Roland McGrath authored
      This patch adds a new system call `waitid'.  This is a new POSIX call that
      subsumes the rest of the wait* family and can do some things the older
      calls cannot.  A minor addition is the ability to select what kinds of
      status to check for with a mask of independent bits, so you can wait for
      just stops and not terminations, for example.  A more significant
      improvement is the WNOWAIT flag, which allows for polling child status
      without reaping.  This interface fills in a siginfo_t with the same details
      that a SIGCHLD for the status change has; some of that info (e.g.  si_uid)
      is not available via wait4 or other calls.
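      
      A minimal userspace sketch of the WNOWAIT behaviour described above,
      using only the POSIX waitid() interface.  The helper name
      model_peek_then_reap() is made up for illustration: it polls a child's
      exit status without reaping it, then reaps it with a second call and
      checks that a third call finds nothing left to wait for.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with exit_code.  Store the status seen by a
 * WNOWAIT poll in *peeked and the status seen by the reaping call in
 * *reaped.  Returns 0 on success, -1 on any failure.
 */
int model_peek_then_reap(int exit_code, int *peeked, int *reaped)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);

    /* Poll: wait for termination only, and leave the child waitable. */
    memset(&info, 0, sizeof(info));
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    if (info.si_code != CLD_EXITED || info.si_pid != pid)
        return -1;
    *peeked = info.si_status;

    /* Reap: the same status is still there for a normal wait. */
    memset(&info, 0, sizeof(info));
    if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0)
        return -1;
    *reaped = info.si_status;

    /* The child is gone now, so another wait must fail with ECHILD. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED) == 0)
        return -1;
    return errno == ECHILD ? 0 : -1;
}

/* Usage: the poll and the reap report the same exit status. */
void model_waitid_demo(void)
{
    int peeked = -1, reaped = -1;

    assert(model_peek_then_reap(7, &peeked, &reaped) == 0);
    assert(peeked == 7 && reaped == 7);
}
```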
      
      I've added a new system call that has the parameter conventions of the
      POSIX function because that seems like the cleanest thing.  This patch
      includes the actual system call table additions for i386 and x86-64; other
      architectures will need to assign the system call number, and 64-bit ones
      may need to implement 32-bit compat support for it as I did for x86-64. 
      The new features could instead be provided by some new kludge inventions in
      the wait4 system call interface (that's what BSD did).  If kludges are
      preferable to adding a system call, I can work up something different.
      
      I added a struct rusage field si_rusage to siginfo_t in the SIGCHLD case
      (this does not affect the size or layout of the struct).  This is not part
      of the POSIX interface, but it makes it so that `waitid' subsumes all the
      functionality of `wait4'.  Future kernel ABIs (new arch's or whatnot) can
      have only the `waitid' system call and the rest of the wait* family
      including wait3 and wait4 can be implemented in user space using waitid.
      There is nothing in user space as yet that would make use of the new field.
      
      Most of the new functionality is implemented purely in the waitid system
      call itself.  POSIX also provides for the WCONTINUED flag to report when a
      child process had been stopped by job control and then resumed with
      SIGCONT.  Corresponding to this, a SIGCHLD is now generated when a child
      resumes (unless SA_NOCLDSTOP is set), with the value CLD_CONTINUED in
      siginfo_t.si_code.  To implement this, some additional bookkeeping is
      required in the signal code handling job control stops.
      
      The motivation for this work is to make it possible to implement the POSIX
      semantics of the `waitid' function in glibc completely and correctly.  If
      changing either the system call interface used to accomplish that, or any
      details of the kernel implementation work, would improve the chances of
      getting this incorporated, I am more than happy to work through any issues.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Using get_cycles for add_timer_randomness · 4c746d40
      Anton Blanchard authored
      I tested how long it took to do a dd from /dev/random on ppc64 before and
      after this patch, while doing a ping flood from another machine.
      
      before:
      # /usr/bin/time dd if=/dev/random of=/dev/zero count=1k
      0+51 records in
      Command terminated by signal 2
      0.00user 0.00system 19:18.46elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
      
      I gave up after 19 minutes.
      
      after:
      # /usr/bin/time dd if=/dev/random of=/dev/zero count=1k
      0+1024 records in
      0.00user 0.00system 0:33.38elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
      
      Just over 33 seconds. Better.
      
      From: Arnd Bergmann <arnd@arndb.de>
      
      I noticed that only i386 and x86-64 are currently using a high resolution
      timer source when adding randomness.  Since many architectures have a
      working get_cycles() implementation, it seems rather straightforward to use
      that.
      
      Has this been discussed before, or can anyone comment on the implementation
      below?
      
      This patch attempts to take into account the size of cycles_t, which is
      either 32 or 64 bits wide but independent of the architecture's word size.
      
      The behavior should be nearly identical to the old one on i386, x86-64 and
      all architectures without a time stamp counter, while finding more entropy
      on the other architectures.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86_64: emulate NUMA on non-NUMA hardware · 60b292ca
      Andi Kleen authored
      Apply this handy patch and boot with numa=fake=4 (or how many nodes you
      want, 8 max right now).
      
      There is a minor issue with the hash function, which can make the last node
      bigger than the others.  It is probably fixable if it turns out to be a problem.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] tiny shmem/tmpfs replacement · 14ef4d0a
      Matt Mackall authored
      A patch to replace tmpfs/shmem with ramfs for systems without swap,
      incorporating the suggestions from Andi and Hugh.
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] VLAN support for 3c59x/3c90x · 59835997
      Alan Cox authored
      This adds VLAN support to the 3c59x/90x series hardware.
      
      Stefan de Konink ported this code from the 2.4 VLAN patches and tested it
      extensively. I cleaned up the ifdefs and fixed a problem with bracketing
      that made older cards fail.
      
      --
      
      Developer's Certificate of Origin 1.0
      
      By making a contribution to this project, I certify that:
      
      (a) The contribution was created in whole or in part by me and I have the
      right to submit it under the open source license indicated in the file; or
      
      (b) The contribution is based upon previous work that, to the best of my
      knowledge, is covered under an appropriate open source license and I have
      the right under that license to submit that work with modifications,
      whether created in whole or in part by me, under the same open source
      license (unless I am permitted to submit under a different license), as
      indicated in the file; or
      
      (c) The contribution was provided directly to me by some other person who
      certified (a), (b) or (c) and I have not modified it.
      
      I, Stefan de Konink, certify that:
       The contribution is based upon previous work that, again is based on GPL
      code and I have the right under that license to submit that work with
      modifications, whether created in whole or in part by me, under the same
      open source license.
      
      I, Alan Cox, certify likewise.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] truncate_inode_pages latency fix · 65fe40ed
      Andrew Morton authored
      Fix scheduling latency issues with large truncates.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hugetlbfs private mappings · c3dfa712
      Oleg Nesterov authored
      Hugetlbfs silently coerces private mappings of hugetlb files into shared
      ones.  So a private writable mapping has MAP_SHARED semantics.  I think such
      mappings should be disallowed.
      
      First, this behavior allows one to open a hugetlbfs file O_RDONLY and
      overwrite it via mmap(PROT_READ|PROT_WRITE, MAP_PRIVATE), so it is a
      security bug.
      
      Second, a private writable mmap() should fail simply because the kernel
      does not support it.
      
      I believe it is OK to allow private read-only hugetlb mappings, since
      sys_mprotect() does not work with hugetlb vmas.
      
      There is another problem.  A hugetlb mapping is always prefaulted, with
      pages allocated at mmap() time.  So even a read-only mapping allows
      enlarging the hugetlbfs file and stealing huge pages without appropriate
      permissions.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] /dev/zero vs hugetlb mappings. · ec081b11
      Oleg Nesterov authored
      Hugetlbfs mmap with MAP_PRIVATE silently becomes MAP_SHARED, but
      vma->vm_flags has no VM_SHARED bit.  Reading from /dev/zero into a hugetlb
      area will do:
      
      read_zero()
          read_zero_pagealigned()
              if (vma->vm_flags & VM_SHARED)
                  break;                      // fallback to clear_user()
              zap_page_range();
              zeromap_page_range();
      
      It will hit BUG_ON() in unmap_hugepage_range() if region is not huge page
      aligned, or silently convert it into the private anonymous mapping.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: add a pfn_to_kaddr() function · 82b11318
      Dave Hansen authored
      This is a helper function that a few architectures already have.  This just
      copies the i386 implementation to ppc64.
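      
      For readers unfamiliar with the helper, the idea is: shift the page frame
      number up to a physical address, then translate that into the kernel's
      linear mapping.  The sketch below is a userspace model only; the
      model_* names, the 4K page size and the linear-map base are assumptions
      chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT  12                              /* assumed 4K pages */
#define MODEL_PAGE_OFFSET UINT64_C(0xC000000000000000)    /* assumed base */

/* Stand-in for __va(): physical address to linear-map virtual address. */
static inline uint64_t model_va(uint64_t phys)
{
    return phys + MODEL_PAGE_OFFSET;
}

/* Page frame number to kernel virtual address. */
static inline uint64_t model_pfn_to_kaddr(uint64_t pfn)
{
    return model_va(pfn << MODEL_PAGE_SHIFT);
}

/* Usage: consecutive frames map to addresses one page apart. */
void model_pfn_demo(void)
{
    assert(model_pfn_to_kaddr(0) == MODEL_PAGE_OFFSET);
    assert(model_pfn_to_kaddr(2) - model_pfn_to_kaddr(1) ==
           (UINT64_C(1) << MODEL_PAGE_SHIFT));
}
```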
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: allocate irqstacks only for possible cpus · de2c2c9b
      Paul Mackerras authored
      With earlier setup of cpu_possible_map the number of irqstacks shrinks from
      NR_CPUS to the number of possible cpus.
      Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: set platform cpuids later in boot · 4fd4fa10
      Paul Mackerras authored
      Move the initialization of the per-cpu paca->hw_cpu_id out of the Open
      Firmware client boot code and into a common location which is executed
      later.
      Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: rework PPC64 cpu map setup · 815e7a88
      Paul Mackerras authored
      Move all cpu map initializations to one place (except for the online map --
      cpus mark themselves online as they come up).  This sets up
      cpu_possible_map early enough that we can use num_possible_cpus for
      allocating irqstacks instead of NR_CPUS.  Hopefully this should also help
      set the stage for kexec.
      Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Update PPC MAINTAINERS & CREDITS · ce26f197
      Paul Mackerras authored
      David Engebretsen has moved on to other things and is no longer maintaining
      ppc64.  This patch adds an entry in CREDITS to note his contribution in
      leading the team that did the PPC64 port originally and updates various
      PPC-related MAINTAINERS entries.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix the unnecessary entropy call in the irq handler · 8cd809c5
      Takashi Iwai authored
      Currently add_interrupt_randomness() is called on every interrupt when any
      of the handlers has the SA_SAMPLE_RANDOM flag, regardless of whether the
      interrupt is processed by that handler or not.  This results in higher
      latency and performance loss.
      
      The patch fixes this behavior to avoid the unnecessary call by checking the
      return value from each handler.
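
      A minimal userspace sketch of the idea follows.  It is an assumption
      about the shape of the fix, not the patch itself: the struct, the
      *_model functions and the entropy_calls counter are hypothetical, and
      only the names IRQ_NONE, IRQ_HANDLED and SA_SAMPLE_RANDOM are borrowed
      from the kernel.

```c
#include <stddef.h>

#define IRQ_NONE         0
#define IRQ_HANDLED      1
#define SA_SAMPLE_RANDOM 0x1UL

/* Hypothetical, simplified stand-in for one entry in a shared-irq chain. */
struct irqaction_model {
    int (*handler)(int irq);
    unsigned long flags;
    struct irqaction_model *next;
};

/* Counts simulated add_interrupt_randomness() calls. */
static int entropy_calls;

static void add_interrupt_randomness_model(int irq)
{
    (void)irq;
    entropy_calls++;
}

static int handle_irq_event_model(int irq, struct irqaction_model *action)
{
    unsigned long status = 0;
    int retval = 0;

    for (; action != NULL; action = action->next) {
        int ret = action->handler(irq);

        /* Only a handler that claimed the interrupt contributes its flags. */
        if (ret == IRQ_HANDLED)
            status |= action->flags;
        retval |= ret;
    }

    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness_model(irq);
    return retval;
}

/* Two toy handlers: one claims the interrupt, one does not. */
static int handler_claims(int irq)  { (void)irq; return IRQ_HANDLED; }
static int handler_ignores(int irq) { (void)irq; return IRQ_NONE; }
```

      On a shared line where the SA_SAMPLE_RANDOM handler returns IRQ_NONE
      and another handler returns IRQ_HANDLED, the model makes no entropy
      call, which is the behavior the description asks for.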
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8cd809c5
    • Prasanna S. Panchamukhi's avatar
      [PATCH] Jumper Probes to provide function arguments · b08e7589
      Prasanna S. Panchamukhi authored
      A special kprobe type which can be placed on function entry points, and
      employs a simple mirroring principle to allow seamless access to the
      arguments of a function being probed.  The probe handler routine should
      have the same prototype as the function being probed.  Currently
      implemented for x86.
      
      The way it works is that when the probe is hit, the breakpoint handler
      simply irets to the probe handler's eip while retaining register and stack
      state corresponding to the function entry.  After it is done, the probe
      handler calls jprobe_return() which traps again to restore processor state
      and switch back to the probed function.  Linus noted correctly at KS that
      we need to be careful as gcc assumes that the callee owns arguments.  We
      save and restore enough stack bytes to cover argument space.
      
      Sample Usage:
      	static int jip_queue_xmit(struct sk_buff *skb, int ipfragok)
      	{
      		... whatever ...
      		jprobe_return();
      		return 0;
      	}
      
      	struct jprobe jp = {
      		{.addr = (kprobe_opcode_t *) ip_queue_xmit},
      		.entry = (kprobe_opcode_t *) jip_queue_xmit
      	};
      	register_jprobe(&jp);
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b08e7589
    • Prasanna S. Panchamukhi's avatar
      [PATCH] kprobes base patch · 4ece5899
      Prasanna S. Panchamukhi authored
      This patch helps developers to trap at almost any kernel code address,
      specifying a handler routine to be invoked when the breakpoint is hit.  
      
      Useful for analysing the Linux kernel by collecting debugging information
      non-disruptively.  Employs single-stepping out-of-line to avoid probe
      misses on SMP and may be especially useful in aiding debugging elusive
      races and problems on live systems.  More elaborate dynamic tracing tools
      such as DProbes can be built over the kprobes interface.
      
      Sample usage:
      	To place a probe on __blockdev_direct_IO:

      	static int probe_handler(struct kprobe *p, struct pt_regs *regs)
      	{
      		... whatever ...
      		return 0;
      	}
      	struct kprobe kp = {
      		.addr = (kprobe_opcode_t *) __blockdev_direct_IO,
      		.pre_handler = probe_handler
      	};
      	register_kprobe(&kp);
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4ece5899
    • Prasanna S. Panchamukhi's avatar
      [PATCH] i386 exceptions notifier for kprobes · f63b75f9
      Prasanna S. Panchamukhi authored
      This patch provides notifiers for i386 architecture exceptions.  This patch
      has been ported from x86_64 architecture as suggested by Andi Kleen.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f63b75f9
    • Philippe Elie's avatar
      [PATCH] Fix oops with nmi-watchdog=2 · 5ae3fd75
      Philippe Elie authored
      Contributions from Zarakin <zarakin@hotpop.com>
      
      Intel removed two MSRs, MSR_P4_IQ_ESCR_0 and MSR_P4_IQ_ESCR_1 (0x3ba/0x3bb),
      on P4 model >= 3.  See the Intel documentation, Vol. 3 System Programming
      Guide, Appendix B.
      
      As a result, nmi_watchdog=2 oopsed at boot time and oprofile oopsed at
      driver load.
      
      Avoid touching them when model >= 3.
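
      The guard can be sketched as below.  The MSR addresses are the ones
      quoted above; the two helper functions are hypothetical and only model
      the decision, they do not touch any hardware register.

```c
#include <stdbool.h>

/* MSR addresses as quoted in the commit message. */
#define MSR_P4_IQ_ESCR_0 0x3ba
#define MSR_P4_IQ_ESCR_1 0x3bb

/* Hypothetical helper: true when the IQ_ESCR MSRs exist on this P4 model. */
static bool p4_has_iq_escr(unsigned int model)
{
    return model < 3;
}

/*
 * Hypothetical helper: fill out[] with the IQ_ESCR MSRs that are safe to
 * access on this model and return how many were written (0 or 2).
 */
static int p4_iq_escr_list(unsigned int model, unsigned int out[2])
{
    if (!p4_has_iq_escr(model))
        return 0;
    out[0] = MSR_P4_IQ_ESCR_0;
    out[1] = MSR_P4_IQ_ESCR_1;
    return 2;
}
```

      A caller would then loop over the returned list rather than accessing
      both MSRs unconditionally.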
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5ae3fd75
    • Jason Davis's avatar
      [PATCH] platform update for ES7000 · 75b02c33
      Jason Davis authored
      This update only applies to Unisys' ES7000 server machines.  The patch adds
      an OEM ID check to verify that the running machine is actually a Unisys
      box before executing the Unisys OEM parser routine.  It also increases
      the MAX_MP_BUSSES definition from 32 to 256, since bus ID numbering on the
      ES7000s can range from 0 to 255.  Without the patch, the system panics if
      booted with acpi=off.
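
      The two changes can be modeled roughly as follows.  This is a sketch
      under assumptions: the "UNISYS" prefix and the compared length are
      illustrative guesses, and every *_model function is a hypothetical
      stand-in rather than the kernel routine.

```c
#include <string.h>

#define MAX_MP_BUSSES 256   /* raised from 32 by the patch */

/* Assumption: a Unisys OEM ID starts with "UNISYS" (illustrative only). */
static int is_unisys_oem(const char *oem_id)
{
    return oem_id != NULL && strncmp(oem_id, "UNISYS", 6) == 0;
}

/* Counts simulated invocations of the Unisys OEM parser. */
static int parser_calls;

static void parse_unisys_oem_model(void)
{
    parser_calls++;
}

/* Run the OEM parser only on a matching machine; returns 1 if it ran. */
static int oem_check_model(const char *oem_id)
{
    if (!is_unisys_oem(oem_id))
        return 0;
    parse_unisys_oem_model();
    return 1;
}

/* Bus IDs 0..255 must index safely into MAX_MP_BUSSES-sized tables. */
static int bus_id_in_range(int bus_id)
{
    return bus_id >= 0 && bus_id < MAX_MP_BUSSES;
}
```

      With the old limit of 32, a bus ID such as 255 would fall outside any
      table sized by MAX_MP_BUSSES.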
      
      This patch has been tested and verified on an authentic ES7000 machine.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      75b02c33
    • Bjorn Helgaas's avatar
      [PATCH] Make assign_irq_vector() non-__init · 92d1cc78
      Bjorn Helgaas authored
      Make assign_irq_vector() non-__init always (it's called from
      io_apic_set_pci_routing(), which is used in the pci_enable_device() path).
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      92d1cc78
    • Anton Blanchard's avatar
      [PATCH] prio-tree: remove function prototype inside function · f4939b75
      Anton Blanchard authored
      I had a problem when compiling a 2.6 kernel with gcc 3.5 CVS.  The
      prototype for prio_tree_remove in mm/prio_tree.c is inside another
      function.  gcc 3.5 gets upset and removes the function completely.
      Apparently this isn't valid C, so let's fix it up.
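
      A generic illustration of the pattern, assuming the offending prototype
      carried the static specifier (the message above does not say so).  C99
      6.7.1 permits no storage-class specifier other than extern on a
      block-scope function declaration.  The names helper and caller are
      hypothetical and unrelated to prio_tree.c.

```c
/* File-scope prototype: this is where a static function may be declared. */
static int helper(int x);

int caller(int x)
{
    /*
     * Writing "static int helper(int x);" here, at block scope, would
     * violate the C99 constraint, so the prototype lives at file scope.
     */
    return helper(x) + 1;
}

static int helper(int x)
{
    return x * 2;
}
```

      Moving the prototype to file scope keeps the forward declaration while
      staying within the standard.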
      
      Details can be found here:
      
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17205
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f4939b75