Commits · 4b36077f9cbc7bd58dc616a778b3290f1ea43c98 · nexedi / linux

04 Jan, 2005 40 commits

Zou Nanhai authored Jan 04, 2005

- Merge the sys32_rt_sigtimedwait functions in the X86_64, IA64, PPC64, MIPS,
  SPARC64 and S390 32-bit layers into one compat_rt_sigtimedwait function.
  This also fixes a bug where wrong information was copied to the 32-bit
  userspace siginfo structure on X86_64, IA64 and SPARC64 when sigtimedwait
  was called through the 32-bit layer.

- Rename the siginfo_t32 structure in X86_64, IA64, MIPS, SPARC64 and S390
  to compat_siginfo_t, the name already used in PPC64.

- Introduce a macro __COMPAT_ENDIAN_SWAP__ in include/asm-mips/compat.h,
  defined when the MIPS kernel is compiled in little-endian mode.  This
  macro is used to do byte swapping in the function sigset_from_compat.

- This patch has only been tested on X86_64 and IA64.
Signed-off-by: Zou Nan hai <Nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4b36077f

[PATCH] remove dead ext3_put_inode prototype · effe830d

Christoph Hellwig authored Jan 04, 2005

ext3_put_inode was removed a while ago.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

effe830d

[PATCH] udf: fix reservation discarding · f9346132

Christoph Hellwig authored Jan 04, 2005

UDF discards file preallocations on every ->put_inode, which is totally
bogus.  It already discards them in ->release, which makes sense for normal
writes, so the only additional discard needed is in ->clear_inode, to make
sure we don't leak any reservations for shared writeable mappings.

This follows similar changes to ext2 and ext3.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f9346132

[PATCH] udf: simplify udf_iget, fix race · 55521f50

Christoph Hellwig authored Jan 04, 2005

udf_iget calls __udf_read_inode after the inode has been unlocked and other
threads could access it.  Switching to iget_locked() fixes this race and
nicely simplifies the code.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

55521f50

[PATCH] Remove RCU abuse in cpu_idle() · f2f1b44c

Zwane Mwaikambo authored Jan 04, 2005

Introduce cpu_idle_wait() on architectures requiring modification of
pm_idle from modules.  This ensures that all processors have updated
their cached values of pm_idle upon exit.  This patch addresses the bug
report at http://bugme.osdl.org/show_bug.cgi?id=1716 and replaces the
current fix, which violates normal RCU usage as pointed out
by Stephen, Dipankar and Paul.
Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f2f1b44c

[PATCH] __getblk_slow can loop forever when pages are partially mapped · a61e7286

Chris Mason authored Jan 04, 2005

When a block device is accessed via read/write, it is possible for some of
the buffers on a page to be mapped and others not.  __getblk and friends
assume this can't happen, and can end up looping forever when pages have
some unmapped buffers.  Picture:

lseek(/dev/xxx, 2048, SEEK_SET)
write(/dev/xxx, 2048 bytes)

Assuming the block size is 1k, page 0 has 4 buffers, two are mapped by
__block_prepare_write and two are not.  Next, another process triggers
getblk(/dev/xxx, blocknr = 0);

__getblk_slow will loop forever.  __find_get_block fails because the buffer
isn't mapped.  grow_dev_page does nothing because there are buffers on the
page with the correct size.  madhav@veritas.com and others at Veritas
tracked this down.

The fix below has two parts.  First, it changes __find_get_block to avoid
the buffer_error warnings when it finds unmapped buffers on the page.

Second, it changes grow_dev_page to map the buffers on the page by calling
init_page_buffers.  init_page_buffers is changed so we don't stomp on
uptodate bits for the buffers.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a61e7286

[PATCH] IRQ resource deallocation: ia64 · 5d25c798

Kenji Kaneshige authored Jan 04, 2005

This is the ia64 portion of IRQ resource deallocation. It implements
pcibios_disable_device() and acpi_unregister_gsi() for ia64.

    o acpi_unregister_gsi()

        Summary of changes for implementing this interface:

        - Add new function iosapic_unregister_intr() into
          arch/ia64/kernel/iosapic.c. This function frees an interrupt
          vector and related data structures.

        - Add new function free_irq_vector() into
          arch/ia64/kernel/irq_ia64.c. This frees an unused vector.

        - Change assign_irq_vector() to be able to support
          free_irq_vector().

    o pcibios_disable_device()

        This calls acpi_pci_irq_disable() to deallocate IRQ resources.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5d25c798

[PATCH] IRQ resource deallocation: ACPI · 0090012b

Kenji Kaneshige authored Jan 04, 2005

Architecture-dependent IRQ resources such as interrupt vectors for PCI
devices are allocated at pci_enable_device() time on the i386, x86-64 and
ia64 platforms. Today, however, these IRQ resources are never
deallocated even if they are no longer used. The following set of
patches adds support for deallocating IRQ resources at
pci_disable_device() time.

The motivation for the set of patches is as follows:

    - IRQ resources such as interrupt vectors should be freed if they
      are no longer used because the number of these resources is
      limited. By deallocating IRQ resources, we can recycle them.

    - I think some hardware will support hot-pluggable I/O units with
      I/O xAPICs in the near future, so I/O xAPIC hot-plug support in
      the OS will be needed soon. IRQ resource deallocation will be one
      of the most important pieces of I/O xAPIC hot-plug support.

For now, the following set of patches has an ia64 implementation only.
The i386 and x86_64 implementations are TBD.

This patch is the ACPI portion of IRQ deallocation. It defines the
following new interface. The implementation of this interface depends
on each platform.

    o void acpi_unregister_gsi(u32 gsi)

        This is the counterpart of acpi_register_gsi(). It is
        responsible for deallocating the IRQ resources associated with
        the specified GSI number.

        We need to consider the case of shared interrupts. With a
        shared interrupt, acpi_register_gsi() is called multiple
        times for one GSI. That is, registrations and unregistrations
        can be nested.

        This function undoes the effect of one call to
        acpi_register_gsi(). If this matches the last registration,
        IRQ resources associated with the specified GSI number are
        freed.
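
The nesting rule above amounts to a per-GSI reference count. A minimal
userspace sketch follows; it is not the ia64 implementation, and
sketch_register_gsi(), sketch_unregister_gsi() and MAX_GSI are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_GSI 16	/* hypothetical table size for this sketch */

/* One reference count per GSI; nonzero means IRQ resources are in use. */
static unsigned int gsi_refcount[MAX_GSI];

/* Stand-in for acpi_register_gsi(): count one more user of this GSI. */
void sketch_register_gsi(unsigned int gsi)
{
	assert(gsi < MAX_GSI);
	gsi_refcount[gsi]++;
}

/*
 * Stand-in for acpi_unregister_gsi(): undo one registration.
 * Returns true only when the last registration is dropped, i.e. when
 * the IRQ resources would actually be freed.
 */
bool sketch_unregister_gsi(unsigned int gsi)
{
	assert(gsi < MAX_GSI);
	if (gsi_refcount[gsi] == 0)
		return false;	/* unbalanced call: nothing to undo */
	return --gsi_refcount[gsi] == 0;
}
```

With two devices sharing one GSI, the first unregistration only drops the
count; the second one is the point where the resources can be released.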

This patch also adds the following new function.

    o void acpi_pci_irq_disable (struct pci_dev *dev)

        This function is the counterpart of
        acpi_pci_enable_irq(). It clears the device's Linux IRQ number
        and calls acpi_unregister_gsi() to deallocate IRQ resources.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0090012b

[PATCH] fix missing wakeup in ipc/sem · 956cdd1b

Manfred Spraul authored Jan 04, 2005

My patch that removed the spin_lock calls from the tail of sys_semtimedop
introduced a bug:

Before my patch was merged, every operation that altered an array called
update_queue.  That call woke up threads that were waiting until a
semaphore value becomes 0.  I accidentally removed that call.

The attached patch fixes that by modifying update_queue: the function now
loops internally and wakes up all threads.  The patch also removes
update_queue calls from the error path of sys_semtimedop: failed operations
do not modify the array, no need to rescan the list of waiting threads.
Signed-Off-By: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

956cdd1b

[PATCH] Ext[23]: apply umask to symlinks with ACLs configured out · 79a35a44

Andreas Gruenbacher authored Jan 04, 2005

Keith Young <stripyd@stripydog.com> has reported that when ACLs are not
compiled in, the default implementation of ext[23]_init_acl applies the
umask to all new files, including symlinks, which is wrong. In this case
the VFS already takes care of applying the umask when needed, so ext2 and
ext3 need not bother about it. Remove the superfluous statements.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

79a35a44

[PATCH] get_blkdev_list() cleanup · eed6b962

Andrew Morton authored Jan 04, 2005

- Move prototype to genhd.h

- It is only needed for /proc
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

eed6b962

[PATCH] no one uses HAVE_ARCH_SI_CODES or HAVE_ARCH_SIGEVENT_T · 06901504

Stephen Rothwell authored Jan 04, 2005

Since asm-generic/siginfo.h was created, the architectures have been slowly
fixed/modified until no one uses HAVE_ARCH_SI_CODES or HAVE_ARCH_SIGEVENT_T
any more, so this patch removes the checks for them.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

06901504

[PATCH] loop device recursion avoidance · 577dfb53

Franz Pletz authored Jan 04, 2005

With Andries Brouwer <Andries.Brouwer@cwi.nl>

Fix various recursion scenarios wherein it was possible to mount a loop
device on itself, either directly or via intermediate loop devices.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

577dfb53

[PATCH] noop iosched: remove unused includes · 24498885

Pekka Enberg authored Jan 04, 2005

This patch removes unused includes from drivers/block/noop-iosched.c.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

24498885

[PATCH] noop iosched: make code static · e7e22a3a

Pekka Enberg authored Jan 04, 2005

This patch makes code static in drivers/block/noop-iosched.c and adds
__init and __exit for module initialization and cleanup functions.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e7e22a3a

[PATCH] cpumask: range check before using value · 920e3328

Randy Dunlap authored Jan 04, 2005

When setting the 'cpu_isolated_map' mask, check that the user input value
is valid (in range 0 ..  NR_CPUS - 1).  Also fix up kernel-parameters.txt
for this parameter.
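
The range check can be sketched in userspace as follows. This is an
illustration only, not the kernel code: SKETCH_NR_CPUS and
sketch_isolate_cpu() are hypothetical stand-ins for NR_CPUS and the code that
sets a bit in cpu_isolated_map.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8	/* hypothetical stand-in for NR_CPUS */

/*
 * Set bit `cpu` in *mask only when 0 <= cpu < SKETCH_NR_CPUS.
 * Returns false and leaves the mask untouched for out-of-range input.
 */
bool sketch_isolate_cpu(uint32_t *mask, int cpu)
{
	if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
		return false;	/* invalid user input: ignore it */
	*mask |= UINT32_C(1) << cpu;
	return true;
}
```

Checking before the shift matters: an unchecked value would set a bit outside
the mask or shift by more than the width of the type.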
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

920e3328

[PATCH] fix alt-sysrq deadlock · 68764ad9

Zwane Mwaikambo authored Jan 04, 2005

__handle_sysrq was modified to do a spin_lock_irqsave, so we were entering
smp_send_stop with interrupts disabled.  So reenable interrupts to prevent
the possible smp_call_function() deadlock.

(It's still deadlocky if the sysrq handler is again called via an
interrupt from a different device, but that seems unlikely).
Signed-off-by: Zwane Mwaikambo <zwane@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

68764ad9

[PATCH] Add PR_GET_NAME · e2901099

Prasanna Meda authored Jan 04, 2005

A while back we added the PR_SET_NAME prctl, but no PR_GET_NAME.  I guess
we should add this, if only to enable testing of PR_SET_NAME.
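
A small round-trip test of the two prctls, as they are documented in the
prctl(2) man page on current Linux systems, might look like this.
sketch_name_roundtrip() is a hypothetical helper, and the 16-byte buffer
size is taken from that man page rather than from this patch.

```c
#include <string.h>
#include <sys/prctl.h>

/*
 * Set the calling thread's name, then read it back into `out`.
 * `out` must have room for 16 bytes (up to 15 characters plus NUL).
 * Returns 0 on success, -1 if either prctl() call fails.
 */
int sketch_name_roundtrip(const char *name, char out[16])
{
	if (prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0) != 0)
		return -1;
	memset(out, 0, 16);
	if (prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0) != 0)
		return -1;
	return 0;
}
```

Names longer than 15 characters are truncated by PR_SET_NAME, so a test
should compare against the truncated form in that case.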
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e2901099

[PATCH] panic_timeout: move to kernel.h · 4ffd90a1

Randy Dunlap authored Jan 04, 2005

Move 'panic_timeout' to linux/kernel.h.

ipmi_watchdog.c wanted to know why panic_timeout isn't in some header file.
However, ipmi_watchdog.c doesn't even use it, so that reference was
deleted.  Other references now use kernel.h instead of straight extern int.
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4ffd90a1

[PATCH] EDD: add edd=off and edd=skipmbr options · e9855e2c

Matt Domsch authored Jan 04, 2005

EDD: add edd=off and edd=skipmbr command line options
   
New command line options:

    edd=off     (or edd=of)
    edd=skipmbr (or edd=sk)

These are runtime options for disabling all EDD int13 calls completely, or
for skipping the int13 READ SECTOR calls, respectively.
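
The accepted abbreviations suggest that only the first two characters of the
value are compared. A sketch under that assumption follows; it is not copied
from the patch, and enum edd_mode and sketch_parse_edd() are hypothetical
names.

```c
#include <string.h>

enum edd_mode { EDD_DEFAULT, EDD_OFF, EDD_SKIPMBR };

/*
 * Map the text after "edd=" to a mode.  Only the first two characters
 * are compared, so "of" and "off" are equivalent, as are "sk" and
 * "skipmbr".  Anything else leaves EDD in its default mode.
 */
enum edd_mode sketch_parse_edd(const char *value)
{
	if (strncmp(value, "of", 2) == 0)
		return EDD_OFF;
	if (strncmp(value, "sk", 2) == 0)
		return EDD_SKIPMBR;
	return EDD_DEFAULT;
}
```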

These are provided to allow Linux distributions to include CONFIG_EDD=m, yet
allow end-users to disable parts of EDD which may not work well with their
system's BIOS.

I incorporated comments from Randy Dunlap, and got an ack from Andi Kleen.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e9855e2c

[PATCH] make gconfig work with gtk-2.4 · 9506e197

J. A. Magallon authored Jan 04, 2005

I need this to make gconfig work under gtk-2.4.  Without this, it just
coredumps.  There is some problem with pixmap creation/usage from XPM in
the way it is done in gconf, so I just added some stock icons.  It is even
prettier..;)

Could someone test this still works on gtk-2.0 or 2.2 ?

Changes:

- change the widget class 'button' in glade files to something known to
  glade (GtkToolButton)
- use 'stock-id' property for toolbar buttons instead of "stock_pixmap"
- change unknown signal "pressed" to "clicked"
- remove manual setting of icons in gconf.c
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9506e197

[PATCH] sys_sched_setaffinity() on UP should fail for non-zero CPUs. · 07492792

Rusty Russell authored Jan 04, 2005

Return EINVAL for invalid sched_setaffinity on UP.  I was a little
surprised that sys_sched_setaffinity for CPU 1 didn't fail on my UP box.
With CONFIG_SMP it would have.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

07492792

[PATCH] smb_file_open() retval fix · c805134e

Tvrtko A. Ursulin authored Jan 04, 2005

Correctly propagate the return value from smb_open(). 
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c805134e

[PATCH] rcu: simplify quiescent state detection · 17b3fed1

Manfred Spraul authored Jan 04, 2005

Based on an initial patch from Oleg Nesterov <oleg@tv-sign.ru>

rcu_data.last_qsctr is not needed.  Actually, not even a counter is needed,
just a flag that indicates that there was a quiescent state.
Signed-Off-By: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

17b3fed1

[PATCH] rcu: make two internal structs static · 2f803905

Manfred Spraul authored Jan 04, 2005

The patch below makes two needlessly global structs static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2f803905

[PATCH] rcu: eliminate rcu_ctrlblk.lock · a48d69a5

Oleg Nesterov authored Jan 04, 2005

rcu_ctrlblk.lock is used to read the ->cur and ->next_pending
atomically in __rcu_process_callbacks(). It can be replaced
by a couple of memory barriers.

rcu_start_batch:
	rcp->next_pending = 0;
	smp_wmb();
	rcp->cur++;

__rcu_process_callbacks:
	rdp->batch = rcp->cur + 1;
	smp_rmb();
	if (!rcp->next_pending)
		rcu_start_batch(rcp, rsp, 1);

This way, if __rcu_process_callbacks() sees incremented ->cur value,
it must also see that ->next_pending == 0 (or rcu_start_batch() is
already in progress on another cpu).
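
The ordering argument can be approximated in portable C11, with a release
fence standing in for smp_wmb() and an acquire fence for smp_rmb(). This is
a simplified sketch and not the kernel code: struct sketch_ctrlblk and both
functions are hypothetical, and the rsp argument and the locking around
rcu_start_batch() are left out.

```c
#include <stdatomic.h>

struct sketch_ctrlblk {
	atomic_long cur;		/* current batch number */
	atomic_int next_pending;	/* has another batch been requested? */
};

/* Writer side: clear next_pending, then publish the new batch number. */
void sketch_start_batch(struct sketch_ctrlblk *rcp)
{
	atomic_store_explicit(&rcp->next_pending, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
	atomic_fetch_add_explicit(&rcp->cur, 1, memory_order_relaxed);
}

/*
 * Reader side: sample cur first, then next_pending.  Returns the batch
 * number to wait for and reports through *must_start whether the caller
 * still has to request a new batch.
 */
long sketch_pick_batch(struct sketch_ctrlblk *rcp, int *must_start)
{
	long batch = atomic_load_explicit(&rcp->cur, memory_order_relaxed) + 1;

	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	*must_start = !atomic_load_explicit(&rcp->next_pending,
					    memory_order_relaxed);
	return batch;
}
```

The two fences pair up: a reader that observes the incremented cur is also
guaranteed to observe the earlier store that cleared next_pending.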
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a48d69a5

[PATCH] remove ip2 programs · 38f808dd

Adrian Bunk authored Jan 04, 2005

drivers/char/ip2/ contained three programs. Besides the fact that shipping
programs in this place doesn't sound like a good idea, they didn't even all
compile.

The patch below removes them.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

38f808dd

[PATCH] Sync in core time granularity with filesystems · 8ce13b01

Andi Kleen authored Jan 04, 2005

This patch corrects a problem that was originally introduced with the
nanosecond timestamps in stat patch. The problem is that some file systems
don't have enough space in their on-disk inode to save nanosecond timestamps,
so they truncate the c/a/mtime to seconds when flushing a dirty inode. In
core the inode would have full jiffies granularity.

This can be observed by programs as a timestamp that jumps backwards under
specific loads when an inode is flushed and then reloaded from disk.

The problem was already known when the original patch went in, but it
wasn't deemed important enough at that time. So far there has been only
one report of it causing problems. Now Tridge is worried that it will
break running Excel over samba4 because Excel seems to do very anal
timestamp checking and samba4 will supply 100ns timestamps over the
network.

This patch solves it by putting the time resolution into the superblock of
a fs and always rounding the in core timestamps to that granularity.
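
The rounding step can be sketched in userspace as below. This is an
illustration under stated assumptions, not the helper the patch adds:
sketch_timespec_trunc() is a hypothetical name, and the granularity is
assumed to be expressed in nanoseconds, between 1 and 1000000000.

```c
#include <time.h>

/*
 * Round `t` down to a multiple of `gran` nanoseconds.
 * The two common cases (1 ns and 1 s) avoid the division entirely.
 */
struct timespec sketch_timespec_trunc(struct timespec t, long gran)
{
	if (gran <= 1) {
		/* nanosecond granularity: nothing to do */
	} else if (gran >= 1000000000L) {
		t.tv_nsec = 0;			/* whole seconds only */
	} else {
		t.tv_nsec -= t.tv_nsec % gran;	/* general case */
	}
	return t;
}
```

Applying the same truncation in core and on disk is what keeps a timestamp
from appearing to jump backwards after the inode is reloaded.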

This also supersedes some previous ext2/3 hacks to flush the inode less
often when only the subsecond timestamp changes.

I tried to keep the overhead low, in particular it tries to keep divisions
out of fast paths as far as possible.

The patch is quite big but 99% of it is just relatively straightforward
search'n'replace in a lot of fs. Unconverted filesystems will default to a
1ns granularity, but may still show the problem if they continue to use
CURRENT_TIME. I converted all in-tree fs.

One possible future extension of this would be to have two time
granularities per superblock - one that specifies the visible resolution,
and the other to specify how often timestamps should be flushed to disk,
which could be tuned with a mount option per fs (e.g. often m/atimes don't
need to be flushed every second). Would be easy to do as an addon if
someone is interested.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8ce13b01

[PATCH] sys_stime needs a compat function · 8fa29920

Martin Schwidefsky authored Jan 04, 2005

I realized that the best way to get the sys_time/sys_stime problem fixed is
to make sys_time 64 bit safe by using "time_t *" instead of "int *" and to
introduce two proper compat functions compat_sys_time and compat_sys_stime.

The prototype change of sys_time is transparent for 32 bit architectures
because both "int" and "time_t" are 32 bit. For 64 bit the type change
would be wrong but luckily no 64 bit architecture uses sys_time/sys_stime
in 64 bit mode. The patch makes the following changes:

ia64    : Remove sys32_time, use compat_sys_time and
          add (!!) compat_sys_stime to compat syscall table.
mips    : Use compat_sys_time/compat_sys_stime in 32 bit syscall table.
          Add #ifdef magic to compile sys_time/sys_stime and
          compat_sys_time/compat_sys_stime only if needed.
parisc  : Remove sys32_time, use compat_sys_time and compat_sys_stime.
ppc64   : Remove sys32_time, ppc64_sys32_stime and ppc64_sys_stime.
          Use common compat_sys_time, compat_sys_stime and sys_stime.
s390    : Use compat_sys_stime. Add #ifdef magic to compile
          sys_time/sys_stime and compat_sys_time/compat_sys_stime only
          if needed.
sparc64 : Use compat_sys_time/compat_sys_stime in 32 bit syscall table.
um      : Remove um_time and um_stime. Use common functions sys_time and
          sys_stime. This adds a CAP_SYS_TIME check to UM's stime call.
x86_64  : Remove sys32_time. Use compat_sys_time and compat_sys_stime
          in 32 bit syscall table.

The original stime bug is fixed for mips, parisc, s390, sparc64 and
x86_64. Can the arch-maintainers please take a look at this?

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Convert compat_time_t to time_t in 32 bit emulation for sys_stime and
consolidate all the different implementation of sys_time, sys_stime and
their 32-bit emulation parts.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8fa29920

[PATCH] compile with -ffreestanding · d6326c18

Adrian Bunk authored Jan 04, 2005

For the kernel, it would be logical to use -ffreestanding.  The kernel is
not a hosted environment with a standard C library.

The gcc option -ffreestanding is supported by both gcc 2.95 and 3.4, which
covers the whole range of currently supported compilers.

Regarding changes caused by this patch:

Andi Kleen reported:
  Newer gcc rewrites sprintf(buf,"%s",str) to strcpy(buf,str) transparently.

This is only true with unit-at-a-time (disabled on i386 but enabled on
x86_64).  The Linux kernel doesn't offer a standard C library, and such
transparent replacements of kernel functions with builtins are quite
fragile.

Even with -ffreestanding, it's still possible to explicitly use a gcc
builtin if desired.
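
As an illustration of the last point, a builtin can be requested by name.
The wrapper below is a hypothetical example, not code from the patch;
__builtin_memcpy is a GCC (and Clang) extension that remains available under
-ffreestanding.

```c
#include <stddef.h>

/*
 * A copy helper that asks for the compiler builtin explicitly instead
 * of relying on the compiler recognising a library function name.
 */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	return __builtin_memcpy(dst, src, n);
}
```

The compiler may still emit a call to memcpy for this builtin, so a
freestanding program has to provide that symbol itself, as the kernel does.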
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

d6326c18

[PATCH] Off by one in drivers/parport/probe.c · 064da5f6

Alexander Nyberg authored Jan 04, 2005

This fixes a theoretical bug indicated in:
http://bugme.osdl.org/show_bug.cgi?id=240

It prevents overflow in case the required buffer is larger than the passed
buffer.  This I found to be the minimally intrusive change.
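
The general shape of such a guard is sketched below. It is not the code
from drivers/parport/probe.c: sketch_bounded_copy() is a hypothetical helper
that shows only the usual pattern of reserving one byte for the terminating
NUL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `src` into `dst` (capacity `dst_len` bytes), truncating if
 * needed and always NUL-terminating.  Returns the number of characters
 * copied, not counting the NUL.
 */
size_t sketch_bounded_copy(char *dst, size_t dst_len, const char *src)
{
	size_t n;

	if (dst_len == 0)
		return 0;
	n = strlen(src);
	if (n > dst_len - 1)	/* the "- 1" is the off-by-one guard */
		n = dst_len - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```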
Signed-off-by: Alexander Nyberg <alexn@dsv.su.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

064da5f6

[PATCH] ext3: support for EA in inode · 78085a46

Alex Tomas authored Jan 04, 2005

1) The intent of the patch is to make it possible to store EAs in the body of
   a large inode. This saves space and improves performance in some cases.

2) The patch is quite simple: it works the same way the original xattr code
   does, but uses other storage (the inode body). The body has priority over
   the separate block. The original routines (ext3_xattr_get, ext3_xattr_list,
   ext3_xattr_set) are renamed to ext3_xattr_block_*. New routines that handle
   inode storage are added (ext3_xattr_ibody_get, ext3_xattr_ibody_list,
   ext3_xattr_ibody_set). The routines ext3_xattr_get, ext3_xattr_list and
   ext3_xattr_set allow the user to access both storages transparently.

3) The change makes sense only on filesystems with inode size >= 256 bytes.
   2.4 kernels don't support such filesystems, AFAIK. 2.6 kernels do support
   them and ignore EAs stored in the body without the patch.

4) debugfs and e2fsck need to be patched to deal with EAs in the inode;
   that patch will be sent later.

5) testing results:
	a) Andrew Samba Master (tridge) has done successful tests
	b) we've been using ea-in-inode feature in Lustre for many months
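
The lookup order described in 2) can be sketched as follows. This is a
userspace illustration only: struct sketch_ea, sketch_ea_find() and
sketch_ea_get() are hypothetical and do not reflect the on-disk xattr format
or the real ext3 function signatures.

```c
#include <stddef.h>
#include <string.h>

struct sketch_ea {
	const char *name;
	const char *value;
};

/* Linear search in one storage area; NULL when the name is absent. */
const char *sketch_ea_find(const struct sketch_ea *tab, size_t n,
			   const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return tab[i].value;
	return NULL;
}

/* Look in the inode body first, then fall back to the separate block. */
const char *sketch_ea_get(const struct sketch_ea *ibody, size_t n_ibody,
			  const struct sketch_ea *block, size_t n_block,
			  const char *name)
{
	const char *v = sketch_ea_find(ibody, n_ibody, name);

	return v ? v : sketch_ea_find(block, n_block, name);
}
```

Callers see a single get routine, which is what makes the two storage areas
transparent to them.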
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

78085a46

[PATCH] Reduce i_sem usage during file sync operations · fbdce7d7

Andrew Morton authored Jan 04, 2005

We hold i_sem during the various sync() operations to prevent livelocks:
if another thread is dirtying the file, a sync() may never return.

Or at least, that used to be true when we were using the per-address_space
page lists.  Since writeback has used radix tree traversal it is not possible
to livelock the sync() operations, because they only visit each page a single
time.

sync_page_range() (used by O_SYNC writes) has not been holding i_sem for quite
some time, for the above reasons.

The patch converts fsync(), fdatasync() and msync() to also not hold i_sem
during the radix-tree-based writeback.

Now, we _do_ still need to hold i_sem across the file->f_op->fsync() call,
because that is still based on a list_head walk, and is still livelockable.

But in the case of msync() I deliberately left i_sem untaken.  This is because
we're currently deadlockable in msync, because mmap_sem is already held, and
mmap_sem nests inside i_sem, due to direct-io.c.

And yes, the ranking of down_read() versus down() does matter:

	Task A			Task B		Task C

	down_read(rwsem)
				down(sem)
						down_write(rwsem)
	down(sem)
				down_read(rwsem)


C's down_write() will cause B's down_read to block.  B holds `sem', so A will
never release `rwsem'.

So the patch fixes a hard-to-hit triple-task deadlock, but adds a possible
livelock in msync().  It is possible to fix sys_msync() so that it takes i_sem
outside i_mmap_sem.  Later.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

fbdce7d7

[PATCH] suppress might_sleep() if oopsing · e486b6b7

Andrew Morton authored Jan 04, 2005

We can call might_sleep() functions on the oops handling path (under do_exit).

There seems little point in emitting spurious might_sleep() warnings into the
logs after the kernel has oopsed.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e486b6b7

[PATCH] fork: total_forks not counted under tasklist_lock · fe52f966

Prasanna Meda authored Jan 04, 2005

Bring total_forks under tasklist_lock.  When most of the fork code,
including nr_threads, was moved from do_fork() to copy_process() in 2.6,
this was left out.

Although the accuracy of total_forks is not important, it would be nice to
add this.  It does not involve additional cost, and the code will be cleaner
if it is grouped with nr_threads.  The difference is that total_forks
increases on fork, while nr_threads increases on fork and decreases on
exit.

I also moved the extern declaration from proc_misc.c to sched.h.
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

fe52f966

[PATCH] time runs too fast after S3 · bb51bc59

Li Shaohua authored Jan 04, 2005

After resuming from S3, 'date' shows time running too fast.
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bb51bc59

[PATCH] cpumask_t initializers · 90b8f3ac

Matthew Dobson authored Jan 04, 2005

In the course of another patch I've been working on, I stumbled across
some weirdness with some of the SD_*_INIT sched_domains initializers. A
day or so of digging narrowed it down to the CPU_MASK_NONE initializer
nested inside the sched_domain initializers. The errors I got were:

kernel/sched.c:4812: error: initializer element is not constant
kernel/sched.c:4812: error: (near initialization for `sched_domain_dummy')
kernel/sched.c:4812: error: initializer element is not constant

which was this line:

static struct sched_domain sched_domain_dummy = SD_CPU_INIT;

Janis Johnson, a GCC hacker, told me the following:

90b8f3ac

[PATCH] ext3: handle attempted double-delete of metadata. · a3192788

Stephen C. Tweedie authored Jan 04, 2005

This patch improves ext3's ability to deal with corruption on-disk.  If we
try to delete a metadata block twice, we confuse ext3's internal revoke
error-checking, resulting in a BUG().  But this can occur in practice due
to a corrupt indirect block, so we should attempt to fail gracefully.

Downgrade the assert failure to a J_EXPECT_BH failure, and return EIO when
it occurs.

This is easily reproduced with a sample ext3 fs image containing an inode
which references the same indirect block more than once.  Deleting that
inode will BUG() an unfixed kernel with:

Assertion failure in journal_revoke() at fs/jbd/revoke.c:379:
"!buffer_revoked(bh)"

With the fix, ext3 recovers gracefully.
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a3192788

[PATCH] ext3: handle attempted delete of bitmap blocks. · c579b4e2

Stephen C. Tweedie authored Jan 04, 2005

This patch improves ext3's ability to deal with corruption on-disk.  If we
ever get a corrupt inode or indirect block, then an attempt to delete it
can end up trying to remove any block on the fs, including bitmap blocks.
This can cause ext3 to assert-fail as we end up trying to do an ext3_forget
on a buffer with b_committed_data set.

The fix is to downgrade this to an IO error and journal abort, so that we
take the filesystem readonly but don't bring down the whole kernel.

Make J_EXPECT_JH() return a value so it can be easily tested and yet still
retained as an assert failure if we build ext3 with full internal debugging
enabled.  Make journal_forget() return an error code so that in this case
the error can be passed up to the caller.
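
The idea of a check that returns a value in normal builds but still asserts
in debug builds can be sketched like this. It is not the jbd macro:
sketch_expect(), sketch_forget() and the SKETCH_PARANOID switch are
hypothetical names chosen for this illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Check an expectation about on-disk state.
 * With SKETCH_PARANOID defined, a failed check aborts like an assertion.
 * Otherwise it logs the problem and returns false so that the caller
 * can turn it into an error code.
 */
bool sketch_expect(bool cond, const char *why)
{
	if (cond)
		return true;
#ifdef SKETCH_PARANOID
	fprintf(stderr, "assertion failure: %s\n", why);
	abort();
#else
	fprintf(stderr, "unexpected state: %s\n", why);
	return false;
#endif
}

/* Caller pattern: pass the failure up as -EIO instead of crashing. */
int sketch_forget(bool has_committed_data)
{
	if (!sketch_expect(!has_committed_data, "inconsistent data on disk"))
		return -EIO;
	return 0;
}
```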

This is easily reproduced with a sample ext3 fs image containing an inode
whose direct and indirect blocks refer to a block bitmap block.  Allocating
new blocks and then deleting that inode will BUG() with:

Assertion failure in journal_forget() at fs/jbd/transaction.c:1228:
"!jh->b_committed_data"

With the fix, ext3 recovers gracefully.
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c579b4e2

[PATCH] ext3: cleanup handling of aborted transactions. · 046527de

Stephen C. Tweedie authored Jan 04, 2005

This patch improves ext3's error logging when we encounter an on-disk
corruption.  Previously, a transaction (such as a truncate) which encountered
many corruptions (e.g. a single highly-corrupt indirect block) would emit
copious "aborting transaction" errors to the log.

Even worse, encountering an aborted journal can count as such an error,
leading to a flood of spurious "aborting transaction: Journal has aborted"
errors.

With the fix, only emit that message on the first error.  The patch also
restores a missing \n in that printk path.
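
Reporting only the first error comes down to a flag on the handle. A minimal
sketch follows, with hypothetical names (struct sketch_handle,
sketch_abort_handle()) rather than the real ext3/jbd structures.

```c
#include <stdbool.h>
#include <stdio.h>

struct sketch_handle {
	bool aborted;	/* set once the first error has been reported */
};

/*
 * Record an error on the handle.  Only the first call prints a message;
 * later calls are silent.  Returns true when a message was printed.
 */
bool sketch_abort_handle(struct sketch_handle *h, const char *reason)
{
	if (h->aborted)
		return false;
	h->aborted = true;
	fprintf(stderr, "aborting transaction: %s\n", reason);
	return true;
}
```

Because follow-on failures such as "Journal has aborted" arrive after the
flag is set, they no longer reach the log.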
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

046527de