Commits · 1142d810298e694754498dbb4983fcb6cb7fd884 · nexedi / linux

06 May, 2010 1 commit

cpu_stop: implement stop_cpu[s]() · 1142d810

Tejun Heo authored 14 years ago


Implement a simplistic per-cpu maximum priority cpu monopolization
mechanism.  A non-sleeping callback can be scheduled to run on one or
multiple cpus with maximum priority monopolozing those cpus.  This is
primarily to replace and unify RT workqueue usage in stop_machine and
scheduler migration_thread which currently is serving multiple
purposes.

Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
stop_cpus() and try_stop_cpus().

This is to allow clean sharing of resources among stop_cpu and all the
migration thread users.  One stopper thread per cpu is created which
is currently named "stopper/CPU".  This will eventually replace the
migration thread and take on its name.

* This facility was originally named cpuhog and lived in separate
  files but Peter Zijlstra nacked the name and thus got renamed to
  cpu_stop and moved into stop_machine.c.

* Better reporting of preemption leak as per Peter's suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>

1142d810

23 Apr, 2010 3 commits

sched: Fix select_idle_sibling() logic in select_task_rq_fair() · 99bd5e2f

Suresh Siddha authored 15 years ago

Issues in the current select_idle_sibling() logic in select_task_rq_fair()
in the context of a task wake-up:

a) Once we select the idle sibling, we use that domain (spanning the cpu that
the task is currently woken-up and the idle sibling that we found) in our
wake_affine() decisions. This domain is completely different from the
domain(we are supposed to use) that spans the cpu that the task currently
woken-up and the cpu where the task previously ran.

b) We do select_idle_sibling() check only for the cpu that the task is
currently woken-up on. If select_task_rq_fair() selects the previously run
cpu for waking the task, doing a select_idle_sibling() check
for that cpu also helps and we don't do this currently.

c) In the scenarios where the cpu that the task is woken-up is busy but
with its HT siblings are idle, we are selecting the task be woken-up
on the idle HT sibling instead of a core that it previously ran
and currently completely idle. i.e., we are not taking decisions based on
wake_affine() but directly selecting an idle sibling that can cause
an imbalance at the SMT/MC level which will be later corrected by the
periodic load balancer.

Fix this by first going through the load imbalance calculations using
wake_affine() and once we make a decision of woken-up cpu vs previously-ran cpu,
then choose a possible idle sibling for waking up the task on.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1270079265.7835.8.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

99bd5e2f

sched: Pre-compute cpumask_weight(sched_domain_span(sd)) · 669c55e9

Peter Zijlstra authored 14 years ago


Dave reported that his large SPARC machines spend lots of time in
hweight64(), try and optimize some of those needless cpumask_weight()
invocations (esp. with the large offstack cpumasks these are very
expensive indeed).
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

669c55e9

sched: Cure load average vs NO_HZ woes · 74f5187a

Peter Zijlstra authored 14 years ago


Chase reported that due to us decrementing calc_load_task prematurely
(before the next LOAD_FREQ sample), the load average could be scewed
by as much as the number of CPUs in the machine.

This patch, based on Chase's patch, cures the problem by keeping the
delta of the CPU going into NO_HZ idle separately and folding that in
on the next LOAD_FREQ update.

This restores the balance and we get strict LOAD_FREQ period samples.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chase Douglas <chase.douglas@canonical.com>
LKML-Reference: <1271934490.1776.343.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

74f5187a

15 Apr, 2010 2 commits

sched: Fix UP update_avg() build warning · 09a40af5

Mike Galbraith authored 14 years ago


update_avg() is only used for SMP builds, move it to the nearest
SMP block.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1271309399.14779.17.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

09a40af5

Merge branch 'linus' into sched/core · b257c14c

Ingo Molnar authored 14 years ago


Merge reason: merge the latest fixes, update to -rc4.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b257c14c

14 Apr, 2010 1 commit

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 · 2ba3abd8

Linus Torvalds authored 14 years ago

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / Hibernate: user.c, fix SNAPSHOT_SET_SWAP_AREA handling

2ba3abd8

13 Apr, 2010 33 commits

Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 · 0fdfe5ad

Linus Torvalds authored 14 years ago

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFSv4: fix delegated locking
  NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible
  NFS: Fix a race with the new commit code
  NFS: Ensure that writeback_single_inode() calls write_inode() when syncing
  NFS: Fix the mode calculation in nfs_find_open_context
  NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR

0fdfe5ad

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · 44d2d371

Linus Torvalds authored 14 years ago

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Add some more commentary to __raw_local_irq_save()
  sparc64: Fix memory leak in pci_register_iommu_region().
  sparc64: Add kmemleak annotation to sun4v_build_virq()
  sparc64: Support kmemleak.
  sparc64: Add function graph tracer support.
  sparc64: Give a stack frame to the ftrace call sites.
  sparc64: Use a seperate counter for timer interrupts and NMI checks, like x86.
  sparc64: Remove profiling from some low-level bits.
  sparc64: Kill unnecessary static on local var in ftrace_call_replace().
  sparc64: Kill CONFIG_STACK_DEBUG code.
  sparc64: Add HAVE_FUNCTION_TRACE_MCOUNT_TEST and tidy up.
  sparc64: Adjust __raw_local_irq_save() to cooperate in NMIs.
  sparc64: Use kstack_valid() in die_if_kernel().

44d2d371

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 465de2ba

Linus Torvalds authored 14 years ago

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits)
  smc91c92_cs: define multicast_table as unsigned char
  can: avoids a false warning
  e1000e: stop cleaning when we reach tx_ring->next_to_use
  igb: restrict WoL for 82576 ET2 Quad Port Server Adapter
  virtio_net: missing sg_init_table
  Revert "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"
  iwlwifi: need check for valid qos packet before free
  tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
  udp: fix for unicast RX path optimization
  myri10ge: fix rx_pause in myri10ge_set_pauseparam
  net: corrected documentation for hardware time stamping
  stmmac: use resource_size()
  x.25 attempts to negotiate invalid throughput
  x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
  bridge: Fix IGMP3 report parsing
  cnic: Fix crash during bnx2x MTU change.
  qlcnic: fix set mac addr
  r6040: fix r6040_multicast_list
  vhost-net: fix vq_memory_access_ok error checking
  ath9k: fix double calls to ath_radio_enable
  ...

465de2ba

smc91c92_cs: define multicast_table as unsigned char · a6d37024

Ken Kawasaki authored 14 years ago


smc91c92_cs:
  * define multicast_table as unsigned char
  * remove unnecessary "#ifndef final_version"
Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6d37024

can: avoids a false warning · 4ffa8701

Eric Dumazet authored 14 years ago


At this point optlen == sizeof(sfilter) but some compilers are dumb.

Reported-by: Németh Márton <nm127@freemail.h
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ffa8701

e1000e: stop cleaning when we reach tx_ring->next_to_use · dac87619

Terry Loftin authored 14 years ago


Tx ring buffers after tx_ring->next_to_use are volatile and could
change, possibly causing a crash.  Stop cleaning when we hit
tx_ring->next_to_use.
Signed-off-by: Terry Loftin <terry.loftin@hp.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dac87619

igb: restrict WoL for 82576 ET2 Quad Port Server Adapter · d5aa2252

Stefan Assmann authored 14 years ago


Restrict Wake-on-LAN to first port on 82576 ET2 quad port NICs, as it is
only supported there.
Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5aa2252

sparc64: Add some more commentary to __raw_local_irq_save() · c011f80b
David S. Miller authored 14 years ago
```
Suggested by Peter Zijlstra
Signed-off-by: David S. Miller <davem@davemloft.net>
```
c011f80b
Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ · 9343af08
David S. Miller authored 14 years ago
```
Conflicts:
	lib/Kconfig.debug
```
9343af08

sparc64: Fix memory leak in pci_register_iommu_region(). · e182c77c

David S. Miller authored 14 years ago


Found by kmemleak.

If request_resource() fails, we leak the struct resource we
allocated to represent the IOMMU mapping area.

This actually happens on sun4v machines because the IOMEM area is only
reported sans the IOMMU region, unlike all previous systems.  I'll
need to fix that at some point, but for now fix the leak.
Signed-off-by: David S. Miller <davem@davemloft.net>

e182c77c

sparc64: Add kmemleak annotation to sun4v_build_virq() · 25ad403f

David S. Miller authored 14 years ago


The only reference we store to this memory is in the form of a
physical address, so kmemleak can't see it.

Add a kmemleak_not_leak() annotation.

It's probably useful to be able to look at a dump of these things
either via debugfs or similar, and thus we could at some point store
them in some kind of table and therefore get rid of this annotation.
Signed-off-by: David S. Miller <davem@davemloft.net>

25ad403f

sparc64: Support kmemleak. · 8b8d8e28

David S. Miller authored 14 years ago


Only missing thing was an _sdata marker in vmlinux.lds.S
Signed-off-by: David S. Miller <davem@davemloft.net>

8b8d8e28

sparc64: Add function graph tracer support. · 9960e9e8
David S. Miller authored 14 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
9960e9e8

sparc64: Give a stack frame to the ftrace call sites. · a71d1d6b

David S. Miller authored 14 years ago


It's the only way we'll be able to implement the function
graph tracer properly.

A positive is that we no longer have to worry about the
linker over-optimizing the tail call, since we don't
use a tail call any more.
Signed-off-by: David S. Miller <davem@davemloft.net>

a71d1d6b

sparc64: Use a seperate counter for timer interrupts and NMI checks, like x86. · daecbf58

David S. Miller authored 14 years ago


This keeps us from having to use kstat_irqs_cpu() from the NMI handler,
the former of which is a profiled function.

Instead we use a currently empty slot in the cpu_data
Signed-off-by: David S. Miller <davem@davemloft.net>

daecbf58

sparc64: Remove profiling from some low-level bits. · f8e8a8e8

David S. Miller authored 14 years ago


These include the timer implementation, perf events support, and the
performance counter register (pcr) programming layer.
Signed-off-by: David S. Miller <davem@davemloft.net>

f8e8a8e8

sparc64: Kill unnecessary static on local var in ftrace_call_replace(). · d96478d5
David S. Miller authored 14 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
d96478d5

sparc64: Kill CONFIG_STACK_DEBUG code. · ddacd0bc

David S. Miller authored 14 years ago


The generic stack tracer does this job just as well.
Signed-off-by: David S. Miller <davem@davemloft.net>

ddacd0bc

sparc64: Add HAVE_FUNCTION_TRACE_MCOUNT_TEST and tidy up. · 63b75495

David S. Miller authored 14 years ago


Check function_trace_stop at ftrace_caller

Toss mcount_call and dummy call of ftrace_stub, unnecessary.

Document problems we'll have if the final kernel image link
ever turns on relaxation.

Properly size 'ftrace_call' so it looks right when inspecting
instructions under gdb et al.
Signed-off-by: David S. Miller <davem@davemloft.net>

63b75495

sparc64: Adjust __raw_local_irq_save() to cooperate in NMIs. · 0c25e9e6

David S. Miller authored 14 years ago


If we are in an NMI then doing a plain raw_local_irq_disable() will
write PIL_NORMAL_MAX into %pil, which is lower than PIL_NMI, and thus
we'll re-enable NMIs and recurse.

Doing a simple:

	%pil = %pil | PIL_NORMAL_MAX

does what we want, if we're already at PIL_NMI (15) we leave it at
that setting, else we set it to PIL_NORMAL_MAX (14).

This should get the function tracer working on sparc64.
Signed-off-by: David S. Miller <davem@davemloft.net>

0c25e9e6

sparc64: Use kstack_valid() in die_if_kernel(). · cb256aa6

David S. Miller authored 14 years ago


This gets rid of a local function (is_kernel_stack()) which tries to
do the same thing, yet poorly in that it doesn't handle IRQ stacks
properly.
Signed-off-by: David S. Miller <davem@davemloft.net>

cb256aa6

virtio_net: missing sg_init_table · 0e413f22

Shirley Ma authored 15 years ago


Add missing sg_init_table for sg_set_buf in virtio_net which
induced in defer skb patch.
Reported-by: Thomas Müller <thomas@mathtm.de>
Tested-by: Thomas Müller <thomas@mathtm.de>
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e413f22

Linux 2.6.34-rc4 · 0d0fb0f9
Linus Torvalds authored 14 years ago

0d0fb0f9

Merge branch 'anonvma' · 64a8920f

Linus Torvalds authored 14 years ago

* anonvma:
  anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma
  anon_vma: clone the anon_vma chain in the right order
  vma_adjust: fix the copying of anon_vma chains
  Simplify and comment on anon_vma re-use for anon_vma_prepare()

64a8920f

Merge master.kernel.org:/home/rmk/linux-2.6-arm · 50b88c46

Linus Torvalds authored 14 years ago

* master.kernel.org:/home/rmk/linux-2.6-arm: (21 commits)
  ARM: Fix ioremap_cached()/ioremap_wc() for SMP platforms
  ARM: 6043/1: AT91 slow-clock resume: Don't wait for a disabled PLL to lock
  ARM: 6031/1: fix Thumb-2 decompressor
  ARM: 6029/1: ep93xx: gpio.c: local functions should be static
  ARM: 6028/1: ARM: add MAINTAINERS for U300
  ARM: 6024/1: bcmring: fix missing down on semaphore in dma.c
  MXC: mach_armadillo5x0: Add USB Host support.
  ARM mach-mx3: duplicated include
  ARM mach-mx3: duplicated include
  imx31: add watchdog device on litekit board.
  imx3: Add watchdog platform device support
  MXC: mach-mx31_3ds: add support for freescale mc13783 power management device.
  MXC: mach-mx31_3ds: Add SPI1 device support.
  MXC: mach-mx31_3ds: Add support for on board NAND Flash.
  MXC: mach-mx31_3ds: Update variable names over recent mach name modification.
  imx31: fix parent clock for rtc
  i.MX51: remove NFC AXI static mapping
  i.MX51: determine silicon revision dynamically
  i.MX51: map TZIC dynamically
  i.MX51: Use correct clock for gpt
  ...

50b88c46

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · d6cf853d

Linus Torvalds authored 14 years ago

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: make sure the chunk allocator doesn't create zero length chunks
  Btrfs: fix data enospc check overflow

d6cf853d

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 · 6a945f38

Linus Torvalds authored 14 years ago

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Fix possible dq_flags corruption
  quota: Hide warnings about writes to the filesystem before quota was turned on
  ext3: symlink must be handled via filesystem specific operation
  ext2: symlink must be handled via filesystem specific operation

6a945f38

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 · 50fc88cb

Linus Torvalds authored 14 years ago

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: add speciffic ->setattr callback
  udf: potential integer overflow

50fc88cb

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus · 4505a493

Linus Torvalds authored 14 years ago

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (36 commits)
  MIPS: Calculate proper ebase value for 64-bit kernels
  MIPS: Alchemy: DB1200: Remove custom wait implementation
  MIPS: Big Sur: Make defconfig more useful.
  MIPS: Fix __vmalloc() etc. on MIPS for non-GPL modules
  MIPS: Sibyte: Fix M3 TLB exception handler workaround.
  MIPS: BCM63xx: Fix build failure in board_bcm963xx.c
  MIPS: uasm: Add OR instruction.
  MIPS: Sibyte: Apply M3 workaround only on affected chip types and versions.
  MIPS: BCM63xx: Initialize gpio_out_low & out_high to current value at boot.
  MIPS: BCM63xx: Register SSB SPROM fallback in board's first stage callback
  MIPS: BCM63xx: Fix typo in cpu-feature-overrides file.
  MIPS: BCM63xx: Add support for second uart.
  MIPS: BCM63xx: Fix double gpio registration.
  MIPS: BCM63xx: Add DWVS0 board
  MIPS: BCM63xx: Add the RTA1025W-16 BCM6348-based board to suppported boards.
  MIPS: BCM63xx: Fix BCM6338 and BCM6345 gpio count
  MIPS: libgcc.h: Checkpatch cleanup
  MIPS: Loongson-2F: Flush the branch target history in BTB and RAS
  MIPS: Move signal trampolines off of the stack.
  MIPS: Preliminary VDSO
  ...

4505a493

Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux · fedfb947
Linus Torvalds authored 14 years ago
```
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
  svcrdma: RDMA support not yet compatible with RPC6
```
fedfb947

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 · 44fa2b4b

Linus Torvalds authored 14 years ago

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix typo "numer" -> "number" in alloc.c
  nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v()
  nilfs2: fix a wrong type conversion in nilfs_ioctl()

44fa2b4b

anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma · ea90002b

Linus Torvalds authored 14 years ago


Otherwise we might be mapping in a page in a new mapping, but that page
(through the swapcache) would later be mapped into an old mapping too.
The page->mapping must be the case that works for everybody, not just
the mapping that happened to page it in first.

Here's the scenario:

 - page gets allocated/mapped by process A. Let's call the anon_vma we
   associate the page with 'A' to keep it easy to track.

 - Process A forks, creating process B. The anon_vma in B is 'B', and has
   a chain that looks like 'B' -> 'A'. Everything is fine.

 - Swapping happens. The page (with mapping pointing to 'A') gets swapped
   out (perhaps not to disk - it's enough to assume that it's just not
   mapped any more, and lives entirely in the swap-cache)

 - Process B pages it in, which goes like this:

        do_swap_page ->
          page = lookup_swap_cache(entry);
         ...
          set_pte_at(mm, address, page_table, pte);
          page_add_anon_rmap(page, vma, address);

   And think about what happens here!

   In particular, what happens is that this will now be the "first"
   mapping of that page, so page_add_anon_rmap() used to do

        if (first)
                __page_set_anon_rmap(page, vma, address);

   and notice what anon_vma it will use? It will use the anon_vma for
   process B!

   What happens then? Trivial: process 'A' also pages it in (nothing
   happens, it's not the first mapping), and then process 'B' execve's
   or exits or unmaps, making anon_vma B go away.

   End result: process A has a page that points to anon_vma B, but
   anon_vma B does not exist any more.  This can go on forever.  Forget
   about RCU grace periods, forget about locking, forget anything like
   that.  The bug is simply that page->mapping points to an anon_vma
   that was correct at one point, but was _not_ the one that was shared
   by all users of that possible mapping.

Changing it to always use the deepest anon_vma in the anonvma chain gets
us to the safest model.

This can be improved in certain cases: if we know the page is private to
just this particular mapping (for example, it's a new page, or it is the
only swapcache entry), we could pick the top (most specific) anon_vma.

But that's a future optimization. Make it _work_ reliably first.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ea90002b

anon_vma: clone the anon_vma chain in the right order · 646d87b4

Linus Torvalds authored 14 years ago


We want to walk the chain in reverse order when cloning it, so that the
order of the result chain will be the same as the order in the source
chain.  When we add entries to the chain, they go at the head of the
chain, so we want to add the source head last.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "No, it still oopses" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

646d87b4