1. 24 Aug, 2004 28 commits
    • [PATCH] flexmmap patchkit: fix for 32 bit emu for 64 bit arches · 8fead718
      Arjan van de Ven authored
      Utz Lehmann <u.lehmann@de.tecosim.com> found a problem with the flexmmap
      patches on x86-64: the 32 bit personality isn't yet set at the first point
      where the allocator strategy is chosen.  The solution is simple: binfmt_elf
      is where the personality is set, so put the pick-layout function there.
      Please consider.
      Signed-off-by: Arjan van de Ven <arjanv@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i386 virtual memory layout rework · 8913d55b
      Ingo Molnar authored
        Rework the i386 mm layout to allow applications to allocate more virtual
        memory, and larger contiguous chunks.
      
      
        - the patch is compatible with existing architectures that either make
          use of HAVE_ARCH_UNMAPPED_AREA or use the default mmap() allocator - there
          is no change in behavior.
      
        - 64-bit architectures can use the same mechanism to clean up 32-bit
          compatibility layouts: by defining HAVE_ARCH_PICK_MMAP_LAYOUT and
          providing an arch_pick_mmap_layout() function - which can then decide
          between various mmap() layout functions.
      
        - I also introduced a new personality bit (ADDR_COMPAT_LAYOUT) to signal
          older binaries that don't have PT_GNU_STACK.  x86 uses this to revert back
          to the stock layout.  I also changed x86 to not clear the personality bits
          upon exec(), like x86-64 already does.
      
        - once every architecture that uses HAVE_ARCH_UNMAPPED_AREA has defined
          its arch_pick_mmap_layout() function, we can get rid of
          HAVE_ARCH_UNMAPPED_AREA altogether, as a final cleanup.
      
        the new layout generation function (__get_unmapped_area()) got significant
        testing in FC1/2, so i'm pretty confident it's robust.
      
      
        Compiles & boots fine on an 'old' and on a 'new' x86 distro as well.
      
        The two known breakages were:
      
           http://www.redhatconfig.com/msg/67248.html
      
           [ 'cyzload' third-party utility broke. ]
      
           http://www.zipworld.com/au/~akpm/dde.tar.gz
      
           [ your editor broke :-) ]
      
        both were caused by application bugs that did:
      
      	int ret = malloc();
      
      	if (ret <= 0)
      		failure;
      
        such bugs are easy to spot if they happen, and if it happens it's possible
        to work it around immediately without having to change the binary, via the
        setarch patch.
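To see why a check like the one above breaks once mappings land higher in the address space, here is a minimal sketch. It assumes the 32-bit address ends up in a signed 32-bit integer; the function names are hypothetical, and converting an out-of-range value to `int32_t` is implementation-defined in C, although it wraps to a negative number on common compilers.

```c
#include <stdbool.h>
#include <stdint.h>

/* The buggy test: the returned address is treated as a signed int, so any
 * address at or above 0x80000000 looks negative and is taken for an error. */
static bool sketch_buggy_alloc_failed(uint32_t addr)
{
	int32_t ret = (int32_t)addr;	/* wraps negative for high addresses */

	return ret <= 0;
}

/* The correct test: only a zero (NULL) address means failure. */
static bool sketch_correct_alloc_failed(uint32_t addr)
{
	return addr == 0;
}
```

With a low address both checks agree; with a valid address in the upper half of the 32-bit range only the buggy check reports failure.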
      
        No other application has been found to be affected, and this particular
        change already got pretty wide coverage via RHEL3 and exec-shield, where it
        has been in use for more than a year.
      
      
        The setarch utility can be used to trigger the compatibility layout on
        x86, the following version has been patched to take the `-L' option:
      
       	http://people.redhat.com/mingo/flexible-mmap/setarch-1.4-2.tar.gz
      
        "setarch -L i386 <command>" will run the command with the old layout.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        The problem is in the flexible mmap patch: arch_get_unmapped_area_topdown
        is liable to give your mmap vm_start above TASK_SIZE with vm_end wrapped;
        which is confusing, and ends up as that BUG_ON(mm->map_count).
      
        The patch below stops that behaviour, but it's not the full solution:
        wilson_mmap_test -s 1000 then simply cannot allocate memory for the large
        mmap, whereas it works fine non-top-down.
      
        I think it's wrong to interpret a large or rlim_infinite stack rlimit as
        an inviolable request to reserve that much for the stack: it makes much less
        VM available than bottom up, not what was intended.  Perhaps top down should
        go bottom up (instead of belly up) when it fails - but I'd probably better
        leave that to Ingo.
      
        Or perhaps the default should place stack below text (as WLI suggested and
        ELF intended, with its text defaulting to 0x08048000, small progs sharing
        page table between stack and text and data); with a further personality for
        those needing bigger stack.
      
      From: Ingo Molnar <mingo@elte.hu>
      
        - fall back to the bottom-up layout if the stack can grow unlimited (if
        the stack ulimit has been set to RLIM_INFINITY)
      
        - try the bottom-up allocator if the top-down allocator fails - this can
        utilize the hole between the true bottom of the stack and its ulimit, as a
        last-resort effort.
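The layout choice described in this message (compatibility personality bit, unlimited stack rlimit) can be sketched in userspace C as below. This is an illustration under assumptions, not the patch's code: `sketch_pick_layout`, the enum, and the `SKETCH_ADDR_COMPAT_LAYOUT` constant are hypothetical stand-ins; only `rlim_t` and `RLIM_INFINITY` come from `<sys/resource.h>`.

```c
#include <sys/resource.h>	/* rlim_t, RLIM_INFINITY */

/* Hypothetical stand-in for the ADDR_COMPAT_LAYOUT personality bit. */
#define SKETCH_ADDR_COMPAT_LAYOUT 0x0200000UL

enum sketch_mmap_layout {
	SKETCH_LAYOUT_BOTTOM_UP,	/* the old ("stock") layout */
	SKETCH_LAYOUT_TOP_DOWN		/* the new flexible layout */
};

/* Choose the layout for a new address space. */
static enum sketch_mmap_layout sketch_pick_layout(unsigned long personality,
						  rlim_t stack_limit)
{
	/* Binaries flagged for compatibility keep the old layout. */
	if (personality & SKETCH_ADDR_COMPAT_LAYOUT)
		return SKETCH_LAYOUT_BOTTOM_UP;

	/* An unlimited stack leaves no bounded gap to place mmaps under. */
	if (stack_limit == RLIM_INFINITY)
		return SKETCH_LAYOUT_BOTTOM_UP;

	return SKETCH_LAYOUT_TOP_DOWN;
}
```

The second point of the message, retrying with the bottom-up allocator when a top-down allocation fails, is a per-allocation fallback and is not modelled here.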
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: smt fixes · af6e050a
      Ingo Molnar authored
      while looking at HT scheduler bugreports and boot failures i discovered a
      bad assumption in most of the HT scheduling code: that resched_task() can
      be called without holding the task's runqueue.
      
      This is most definitely not valid - doing it without locking can lead to
      the task on that CPU exiting, and this CPU corrupting the (ex-) task_info
      struct.  It can also lead to HT-wakeup races with task switching on that
      other CPU.  (this_CPU marking the wrong task on that_CPU as need_resched -
      resulting in e.g.  idle wakeups not working.)
      
      The attached patch fixes it all up. Changes:
      
      - resched_task() needs to touch the task so the runqueue lock of that CPU
        must be held: resched_task() now enforces this rule.
      
      - wake_priority_sleeper() was called without holding the runqueue lock.
      
      - wake_sleeping_dependent() needs to hold the runqueue locks of all
        siblings (2 typically).  The effect of this ripples back to schedule() as
        well - in the non-SMT case it gets compiled out so it's fine.
      
      - dependent_sleeper() needs the runqueue locks too - and it's slightly
        harder because it wants to know the 'next task' info which might change
        during the lock-drop/reacquire.  Ripple effect on schedule() => compiled
        out on non-SMT so fine.
      
      - resched_task() was disabling preemption for no good reason - all paths
        that called this function had either a spinlock held or irqs disabled.
      
      Compiled & booted on x86 SMP and UP, with and without SMT. Booted the
      SMT kernel on a real SMP+HT box as well. (Unpatched kernel wouldn't even
      boot with the resched_task() assert in place.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: self-reaping atomicity fix · f7e9143b
      Ingo Molnar authored
      disable preemption in the self-reap codepath, as such tasks may not be on
      the tasklist anymore and CPU-hotplug relies on the tasklist to migrate
      tasks.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] permit sleeping in release_task() · 2abf6861
      Ingo Molnar authored
      release_task() calls proc_pid_flush(), which calls dput(), which can sleep.
      But that's a late-in-exit no-preempt path with CONFIG_PREEMPT.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: new task fix · d513047b
      Ingo Molnar authored
      Rusty noticed that we update the parent ->avg_sleep without holding the
      runqueue lock. Also the code needed cleanups.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: nonlinear timeslices · 68b4cdb8
      Ingo Molnar authored
      * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
      
      > Increasing priority (negative nice) doesn't have much impact. -20 CPU
      > hog only gets about double the CPU of a 0 priority CPU hog and only
      > about 120% the CPU time of a nice -10 hog.
      
      this is a property of the base scheduler as well.
      
      We can do a nonlinear timeslice distribution trivially - the attached
      patch implements the following timeslice distribution on top of
      2.6.8-rc3-mm1:
      
         [ -20 ... 0 ... 19 ] => [800ms ... 100ms ... 5ms]
      
      the nice-20/nice+19 ratio is now 1:160 - sufficient for all aspects.
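One piecewise-linear mapping that reproduces the three endpoints above (800 ms, 100 ms, 5 ms) is sketched below. The constants and names are assumptions chosen for illustration; the message states only the endpoints, so the intermediate values this sketch produces should not be read as the patch's exact output.

```c
/* Assumed constants; all names here are illustrative. */
#define SKETCH_MAX_PRIO       140	/* 100 RT levels plus 40 nice levels */
#define SKETCH_MAX_USER_PRIO   40	/* number of nice levels */
#define SKETCH_DEF_SLICE_MS   100	/* timeslice of a nice 0 task */
#define SKETCH_MIN_SLICE_MS     5	/* floor, reached at nice +19 */

/* Scale a base timeslice by how far prio sits below the lowest priority. */
static int sketch_scale(int base_ms, int prio)
{
	int ms = base_ms * (SKETCH_MAX_PRIO - prio) / (SKETCH_MAX_USER_PRIO / 2);

	return ms > SKETCH_MIN_SLICE_MS ? ms : SKETCH_MIN_SLICE_MS;
}

/* Map a nice value in [-20, 19] to a timeslice in milliseconds. */
static int sketch_timeslice_ms(int nice)
{
	int prio = 120 + nice;	/* nice 0 corresponds to static priority 120 */

	/* Negative nice levels scale from a four times larger base. */
	if (nice < 0)
		return sketch_scale(SKETCH_DEF_SLICE_MS * 4, prio);
	return sketch_scale(SKETCH_DEF_SLICE_MS, prio);
}
```

The nonlinearity comes from the larger base used for negative nice values: nice -20 yields 800 ms while nice +19 yields 5 ms, which gives the 1:160 ratio quoted above.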
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: whitespace cleanups · 39901d5f
      Ingo Molnar authored
      - whitespace and style cleanups
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] schedstat: UP fix · e272d4c2
      Andrew Morton authored
      UP fix --
          for_each_domain() is not defined if not CONFIG_SMP, so show_schedstat
          needed a couple of extra ifdefs.
      Signed-off-by: Rick Lindsley <ricklind@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: sparc32 fixes · 40efa147
      William Lee Irwin III authored
      Fix up sparc32 properly.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: consolidate init_idle() and fork_by_hand() · 21d3dc9c
      William Lee Irwin III authored
      It appears that init_idle() and fork_by_hand() could be combined into a
      single method that calls init_idle() on behalf of the caller.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] move CONFIG_SCHEDSTATS to arch/ppc64/Kconfig.debug · 206f4a83
      Nathan Lynch authored
      Otherwise it shows up under "iSeries device drivers", which doesn't seem
      right.
      Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] scheduler statistics · 7394ebbd
      Rick Lindsley authored
      It adds lots of CPU scheduler stats in /proc/pid/stat.  They are described in
      the new Documentation/sched-stats.txt.
      
      We were carrying this patch offline for some time, but as there's still
      considerable ongoing work in this area, and as the new stats are a
      configuration option, I think it's best that this capability be in the base
      kernel.
      
      Nick removed a fair amount of statistics that he wasn't using.  The full patch
      gathers more information.  In particular, his patch doesn't include the code
      to measure the latency between the time a process is made runnable and the
      time it hits a processor which will be key to measuring interactivity changes.
      
      He passed his changes back to me and I got finished merging his changes with
      the current statistics patches just before OLS.  I believe this is largely a
      superset of the patch you grabbed and should port relatively easily too.
      
      Versions also exist for
      
          2.6.8-rc2
          2.6.8-rc2-mm1
          2.6.8-rc2-mm2
      
      at
          http://eaglet.rain.com/rick/linux/schedstat/patches/
      
      and within 24 hours at
      
          http://oss.software.ibm.com/linux/patches/?patch_id=730&show=all
      
      The version below is for 2.6.8-rc2-mm2 without the staircase code and has
      been compiled cleanly but not yet run.
      
      From: Ingo Molnar <mingo@elte.hu>
      
      this code needs a couple of cleanups before it can go into mainline:
      
      fs/proc/array.c, fs/proc/base.c, fs/proc/proc_misc.c:
      
       - moved the new /proc/<PID>/stat fields to /proc/<PID>/schedstat,
         because the new fields break older procps. It's cleaner this way
         anyway. This moving of fields necessitated a bump to version 10.
      
      Documentation/sched-stats.txt:
      
       - updated sched-stats.txt for version 10
      
       - wake_up_forked_thread() => wake_up_new_task()
      
       - updated the per-process field description
      
      Kconfig:
      
       - removed the default y and made the option dependent on DEBUG_KERNEL. 
         This is really for scheduler analysis, normal users don't need the
         overhead.
      
      include/linux/sched.h:
      
       - moved the definitions into kernel/sched.c - this fixes UP compilation
         and is cleaner.
      
       - also moved the sched-domain definitions to sched.c - now that the 
         sched-domains internals are not exposed to architectures this is
         doable. It's also necessary due to the previous change.
      
      kernel/fork.c:
      
       - moved the ->sched_info init to sched_fork() where it belongs.
      
      kernel/sched.c:
      
       - wake_up_forked_thread() -> wake_up_new_task(), wuft_cnt -> wunt_cnt,
         wuft_moved -> wunt_moved.
      
       - wunt_cnt and wunt_moved were defined but never updated - added the
         missing code to wake_up_new_task().
      
       - whitespace/style police
      
       - removed whitespace changes done to code not related to schedstats -
         i'll send a separate patch for these (and more).
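As a rough illustration of consuming the per-process file mentioned above, the sketch below parses one line of /proc/<PID>/schedstat. The field layout is an assumption made here (three whitespace-separated counters: time run on a CPU, time waited on a runqueue, and number of timeslices); Documentation/sched-stats.txt for the version in use is the authority. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Assumed layout of one /proc/<PID>/schedstat line (see the note above). */
struct sketch_schedstat {
	unsigned long long run_time;	/* time spent running on a CPU */
	unsigned long long wait_time;	/* time spent waiting on a runqueue */
	unsigned long long timeslices;	/* number of timeslices run */
};

/* Parse one line; returns 0 on success, -1 if three numbers are not found. */
static int sketch_parse_schedstat(const char *line, struct sketch_schedstat *out)
{
	int n = sscanf(line, "%llu %llu %llu",
		       &out->run_time, &out->wait_time, &out->timeslices);

	return n == 3 ? 0 : -1;
}
```

Keeping these counters in their own file rather than appending them to /proc/<PID>/stat means a parser like this cannot disturb tools that expect the older stat format.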
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: adjust p4 per-cpu gain · 8399dc16
      Con Kolivas authored
      The smt-nice handling is a little too aggressive by not estimating the per cpu
      gain as high enough for pentium4 hyperthread.  This patch changes the per
      sibling cpu gain from 15% to 25%.  The true per cpu gain is entirely dependent
      on the workload but overall the 2 species of Pentium4 that support
      hyperthreading have about 20-30% gain.
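To show how the gain percentage can change the outcome, here is a deliberately simplified sketch of an SMT-nice style comparison. It assumes the scheduler discounts the sibling's timeslice by the estimated gain before comparing it with the candidate task's timeslice; the function and parameter names are hypothetical, and the real decision involves further conditions that are left out.

```c
#include <stdbool.h>

/*
 * Decide whether a lower-priority candidate should be held back while a
 * higher-priority task runs on the sibling.  per_cpu_gain_pct is the
 * estimated throughput gained from running both siblings at once.
 */
static bool sketch_should_delay(unsigned int sibling_slice_ms,
				unsigned int candidate_slice_ms,
				unsigned int per_cpu_gain_pct)
{
	unsigned int discounted = sibling_slice_ms * (100 - per_cpu_gain_pct) / 100;

	return discounted > candidate_slice_ms;
}
```

For a sibling timeslice of 100 ms and a candidate timeslice of 80 ms, a gain of 15% gives 85 > 80 and the candidate is delayed, while a gain of 25% gives 75 > 80, which is false, so it runs. Raising the estimate makes the handling less aggressive.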
      
      P.S: Anton - For the power processors that are now using this SMT nice
      infrastructure it would be worth setting this value separately at 40%.
      Signed-off-by: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Create cpu_sibling_map for PPC64 · 0c5af7c6
      Matthew Dobson authored
      In light of some proposed changes in the sched_domains code, I coded up
      this little ditty that simply creates and populates a cpu_sibling_map for
      PPC64 machines.  The patch just checks the CPU flags to determine if the
      CPU supports SMT (aka Hyper-Threading aka Multi-Threading aka ...) and
      fills in a mask of the siblings for each CPU in the system.  This should
      allow us to build sched_domains for PPC64 with generic code in
      kernel/sched.c for the SMT systems.  SMT is becoming more popular and is
      turning up in more and more architectures.  I don't think it will be too
      long until this feature is supported by most arches...
      Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: isolated sched domains · 6f4c30b1
      Dimitri Sivanich authored
      Here's a version of the isolated scheduler domain code that I mentioned in
      an RFC on 7/22.  This patch applies on top of 2.6.8-rc2-mm1 (to include all
      of the new arch_init_sched_domain code).  This patch also contains the 2
      line fix to remove the check of first_cpu(sd->groups->cpumask) that Jesse
      sent in earlier.
      
      Note that this has not been tested with CONFIG_SCHED_SMT.  I hope that my
      handling of those instances is OK.
      Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: limit cpuspan of node scheduler domains · c183e253
      Jesse Barnes authored
        This patch limits the cpu span of each node's scheduler domain to prevent
        balancing across too many cpus.  The cpus included in a node's domain are
        determined by the SD_NODES_PER_DOMAIN define and the arch specific
        sched_domain_node_span routine if ARCH_HAS_SCHED_DOMAIN is defined.  If
        ARCH_HAS_SCHED_DOMAIN is not defined, behavior is unchanged--all possible
        cpus will be included in each node's scheduling domain.  Currently, only
        ia64 provides an arch specific sched_domain_node_span routine.
      
      From: Jesse Barnes <jbarnes@engr.sgi.com>
      
        This patch adds some more NUMA specific logic to the creation of scheduler
        domains.  Domains spanning all CPUs in a large system are too large to
        schedule across efficiently, leading to livelocks and inordinate amounts of
        time being spent in scheduler routines.  With this patch applied, the node
        scheduling domains for NUMA platforms will only contain a specified number
        of nearby CPUs, based on the value of SD_NODES_PER_DOMAIN.  It also allows
        arches to override SD_NODE_INIT, which sets the domain scheduling parameters
        for each node's domain.  This is necessary especially for large systems.
      
        Possible future directions:
      
        o multilevel node hierarchy (e.g.  node domains could contain 4 nodes
          worth of CPUs, supernode domains could contain 32 nodes worth, etc.  each
          with their own SD_NODE_INIT values)
      
        o more tweaking of SD_NODE_INIT values for good load balancing vs. 
          overhead tradeoffs
      
      From: mita akinobu <amgta@yacht.ocn.ne.jp>
      
        Compile fix
      Signed-off-by: Jesse Barnes <jbarnes@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: consolidate sched domains · 8a7a2318
      Nick Piggin authored
        Teach the generic domains builder about SMT, and consolidate all
        architecture specific domain code into that.  Also, the SD_*_INIT macros can
        now be redefined by arch code without duplicating the entire setup code. 
        This can be done by defining ARCH_HAS_SCHED_TUNE.
      
        The generic builder has been simplified with the addition of a helper
        macro which will probably prove to be useful to arch specific code as well
        and should be exported if that is the case.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      
      From: Matthew Dobson <colpatch@us.ibm.com>
      
        The attached patch is against 2.6.8-rc2-mm2, and removes Nick's
        conditional definition & population of cpu_sibling_map[] in favor of my
        unconditional ones.  This does not affect how cpu_sibling_map is used, just
        gives it broader scope.
      
      From: Nick Piggin <nickpiggin@yahoo.com.au>
      
        Small fix to sched-consolidate-domains.patch picked up by
      
      From: Suresh <suresh.b.siddha@intel.com>
      
        another sched consolidate domains fix
      
      From: Nick Piggin <nickpiggin@yahoo.com.au>
      
        Don't use cpu_sibling_map if !CONFIG_SCHED_SMT
      
        This one spotted by Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: fork hotplug handling cleanup · c62e7cdb
      Ingo Molnar authored
      - remove the hotplug lock from around much of fork(), and re-copy the
        cpus_allowed mask to solve the hotplug race cleanly.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: remove balance on clone · c15d3bea
      Nick Piggin authored
      This removes balance on clone capability altogether.  I told Andi we wouldn't
      remove it yet, but provided it is in a single small patch, he mightn't get too
      upset.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: disable balance on clone · b4f14b64
      Nick Piggin authored
      Don't balance on clone by default.
      
      Balance on clone has a number of trivial performance failure cases, but it was
      needed to get decent OpenMP performance on NUMA (Opteron) systems.  Not doing
      child-runs-first for new threads also solves this problem in a nicer way
      (implemented in a previous patch).
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: sched misc changes · 8a78765b
      Nick Piggin authored
      Add some likely/unlikelies, a for_each_cpu => for_each_cpu_online, and close
      the sched_exit race.
      
      From: Ingo Molnar <mingo@elte.hu>
      
        fix a typo in a previous patch breaking RT scheduling & interactivity.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: make rt_task unlikely · 0df0d063
      Nick Piggin authored
      From: Ingo Molnar <mingo@elte.hu>
      
      RT tasks are unlikely, move this into rt_task() instead of open-coding it.
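The idea of folding the branch hint into the predicate can be sketched as follows. This assumes a GCC- or Clang-compatible compiler for `__builtin_expect`; the `sketch_` names, the task struct, and the priority threshold of 100 are illustrative assumptions rather than the patch's definitions.

```c
/* Assumed threshold: priorities below this value count as real-time. */
#define SKETCH_MAX_RT_PRIO 100

/* Branch hint: tell the compiler the condition is expected to be false. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Minimal stand-in for a task descriptor. */
struct sketch_task {
	int prio;
};

/* The hint lives inside the predicate, so callers need not open-code it. */
#define sketch_rt_task(p) (sketch_unlikely((p)->prio < SKETCH_MAX_RT_PRIO))
```

A caller then writes `if (sketch_rt_task(p))` and gets the hint for free; the result of the test is unchanged, only the compiler's layout of the two branches is influenced.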
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: misc cleanups #2 · ce9bb66d
      Ingo Molnar authored
       - fix two stale comments
       - cleanup
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kernel thread idle fix · 49717553
      Nick Piggin authored
      Now that init_idle does not remove tasks from the runqueue, those
      architectures that use kernel_thread instead of copy_process for the idle
      task will break.  To fix, ensure that CLONE_IDLETASK tasks are not put on
      the runqueue in the first place.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: cleanup, improve sched <=> fork APIs · 3632d86a
      Nick Piggin authored
      Move balancing and child-runs-first logic from fork.c into sched.c where
      it belongs.
      
      * Consolidate wake_up_forked_process and wake_up_forked_thread into
        wake_up_new_process, and pass in clone_flags as suggested by Linus.  This
        removes a lot of code duplication and allows all logic to be handled in that
        function.
      
      * Don't do balance-on-clone balancing for vfork'ed threads.
      
      * Don't do set_task_cpu or balance on clone in wake_up_new_process.
        Instead do it in sched_fork to fix set_cpus_allowed races.
      
      * Don't do child-runs-first for CLONE_VM processes, as there is obviously no
        COW benefit to be had.  This is a big one, it enables Andi's workload to run
        well without clone balancing, because the OpenMP child threads can get
        balanced off to other nodes *before* they start running and allocating
        memory.
      
      * Rename sched_balance_exec to sched_exec: hide the policy from the API.
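The child-runs-first rule from the list above can be reduced to a one-line predicate on the clone flags. The sketch below is illustrative only: `sketch_child_runs_first` is a hypothetical name, and `SKETCH_CLONE_VM` is a local constant assumed to mirror the Linux `CLONE_VM` flag value so that the example builds without Linux-specific headers.

```c
#include <stdbool.h>

/* Local stand-in for CLONE_VM (assumed value, see the note above). */
#define SKETCH_CLONE_VM 0x00000100UL

/*
 * Running the child first only pays off when parent and child share pages
 * copy-on-write.  A task created with CLONE_VM shares the whole address
 * space, so there is nothing to copy and no reason to run the child first.
 */
static bool sketch_child_runs_first(unsigned long clone_flags)
{
	return !(clone_flags & SKETCH_CLONE_VM);
}
```

A plain fork (no CLONE_VM) gives true; a thread-style clone gives false, which leaves the new thread free to be balanced to another CPU before it starts running.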
      
      
      From: Ingo Molnar <mingo@elte.hu>
      
        rename wake_up_new_process -> wake_up_new_task.
      
        in sched.c we are gradually moving away from the overloaded 'process' or
        'thread' notion to the traditional task (or context) naming.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: cleanup init_idle() · 70a0b8e7
      Nick Piggin authored
      Clean up init_idle to not use wake_up_forked_process, then undo all the stuff
      that call does.  Instead, do everything in init_idle.
      
      Make double_rq_lock depend on CONFIG_SMP because it is no longer used on UP.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: fix timeslice calculations for HZ=1000. · b2a0e913
      Ingo Molnar authored
      The main benefit is that with the default HZ=1000 nice +19 tasks now get 5
      msecs of timeslices, so the ratio of CPU use is linear.  (nice 0 task gets
      20 times more CPU time than a nice 19 task.  Prior to this change the ratio
      was 1:10.)
      
      another effect is that nice 0 tasks now get a round 100 msecs of timeslices
      (as intended), instead of 102 msecs.
      
      here's a table of old/new timeslice values, for HZ=1000 and 100:
      
                            HZ=1000         (   HZ=100   )
                          old    new        ( old    new )
      
              nice -20:   200    200        ( 200    200 )
              nice -19:   195    195        ( 190    190 )
              ...
              nice 0:     102    100        ( 100    100 )
              nice 1:      97     95        (  90     90 )
              nice 2:      92     90        (  90     90 )
              ...
              nice 17:     19     15        (  10     10 )
              nice 18:     14     10        (  10     10 )
              nice 19:     10      5        (  10     10 )
      
      i've tested the patch on x86.
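The table can be reproduced with integer arithmetic in jiffies. The two formulas below are reconstructions chosen so that their output matches the table; the message does not spell them out, so they are assumptions, as are the function names and constants.

```c
/* Assumed constants; names are illustrative. */
#define SKETCH_MAX_PRIO       140	/* 100 RT levels plus 40 nice levels */
#define SKETCH_MAX_USER_PRIO   40	/* number of nice levels */

static int sketch_max(int a, int b)
{
	return a > b ? a : b;
}

/* Old scheme: interpolate between a 10 ms minimum and a 200 ms maximum. */
static int sketch_old_slice_ms(int nice, int hz)
{
	int prio = 120 + nice;
	int min_j = 10 * hz / 1000;	/* minimum timeslice in jiffies */
	int max_j = 200 * hz / 1000;	/* maximum timeslice in jiffies */
	int j = min_j + (max_j - min_j) * (SKETCH_MAX_PRIO - 1 - prio)
				/ (SKETCH_MAX_USER_PRIO - 1);

	return j * 1000 / hz;
}

/* New scheme: scale the 200 ms maximum, with a 5 ms (>= 1 jiffy) floor. */
static int sketch_new_slice_ms(int nice, int hz)
{
	int prio = 120 + nice;
	int min_j = sketch_max(5 * hz / 1000, 1);
	int max_j = 200 * hz / 1000;
	int j = sketch_max(max_j * (SKETCH_MAX_PRIO - prio) / SKETCH_MAX_USER_PRIO,
			   min_j);

	return j * 1000 / hz;
}
```

In the old scheme the division by 39 truncates at nice 0, which is where the odd 102 ms at HZ=1000 comes from; scaling by 40 levels lands on exactly 100 ms. At HZ=100 a jiffy is 10 ms, so both schemes bottom out at 10 ms there.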
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 23 Aug, 2004 12 commits