Commits · b15fac8db8acc8eb467c81e2245b9f1248e7c588 · Kirill Smelkov / linux

01 Aug, 2002 4 commits
- SPARC: Move sun_do_break from serial layer into arch code. · b15fac8d
  David S. Miller authored Aug 01, 2002
  
  b15fac8d
- include/asm-sparc64/irq.h: Add irq_cannonicalize. · 01a42780
  David S. Miller authored Aug 01, 2002
  
  01a42780
- SPARC64: Kill rs_init calls from sbus/pci init. · 7ee16b6a
  David S. Miller authored Jul 31, 2002
  
  7ee16b6a
- SPARC64: Update for new do_munmap argument. · 6e61530c
  David S. Miller authored Jul 31, 2002
  
  6e61530c
31 Jul, 2002 9 commits
- SPARC UART: More build fixes. · b3718602
  David S. Miller authored Jul 31, 2002
  
  b3718602
- serial/sun{core,zilog}.c: build fixes. · 7e0fc16e
  David S. Miller authored Jul 31, 2002
  
  7e0fc16e
- serial/Makefile: Mark suncore.o as export-objs instead of sunsu.o · 6f0a8b1c
  David S. Miller authored Jul 31, 2002
  
  6f0a8b1c
- SPARC: More of move to generic input layer for kbd/mouse. · dd68fd71
  David S. Miller authored Jul 31, 2002
  
  dd68fd71
- SPARC64: Implement semaphore trylock and downgrade. · 40b4cd90
  David S. Miller authored Jul 31, 2002
  
  40b4cd90
- SPARC: Kill old sunkbd/sunmouse drivers in favor of serio input layer. · 1e1449f7
  David S. Miller authored Jul 31, 2002
  
  1e1449f7
- SPARC: Move serial config over to use UART layer. · 4514a97b
  David S. Miller authored Jul 31, 2002
  
  4514a97b
- UART: Update for Sparc drivers. · 08a43526
  David S. Miller authored Jul 31, 2002
  
  08a43526
- SPARC: First pass converting serial drivers to UART layer. · d0c02df3
  David S. Miller authored Jul 31, 2002
  
  d0c02df3
30 Jul, 2002 3 commits
- OpenPROM: Sigh, put the length overflow check back it is needed. · 35cbb1e7
  David S. Miller authored Jul 30, 2002
  
  35cbb1e7
- SPARC: Kill CONFIG_SUN_CONSOLE checks, always on so check is pointless. · d837cda3
  David S. Miller authored Jul 30, 2002
  
  d837cda3
- OpenPROM: Kill len check, it is pointless. · d4d9c3ac
  David S. Miller authored Jul 30, 2002
  
  d4d9c3ac
28 Jul, 2002 24 commits

drivers/serial/Makefile: Add SUNCORE/SUNZILOG build. · aa2a3d64
David S. Miller authored Jul 28, 2002

aa2a3d64
SERIAL: sun.[ch] --> suncore.[ch] · 445c8de6
David S. Miller authored Jul 28, 2002

445c8de6
include/asm-sparc64/system.h: Define task_running. · 677cc14d
David S. Miller authored Jul 28, 2002

677cc14d
Merge nuts.ninka.net:/home/davem/src/BK/BAK-sparc-2.5 · a7046b3c
David S. Miller authored Jul 28, 2002
```
into nuts.ninka.net:/home/davem/src/BK/sparc-2.5
```
a7046b3c
SPARC: Beginning of converting Sparc serial drivers to UART layer. · 1a308b2c
David S. Miller authored Jul 28, 2002

1a308b2c
Make "cpu_relax()" imply a barrier, since that's how it is · 3f0c2c5b
Linus Torvalds authored Jul 28, 2002
```
used.

This fixes a lockup in synchronize_irq() on x86.
```
3f0c2c5b
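
A minimal userspace sketch of why a polling loop wants `cpu_relax()` to act as a barrier. Without a compiler barrier, the compiler may read the flag once and spin on a stale register copy. The names `my_cpu_relax` and `spin_until_clear` are hypothetical, and the empty `asm` with a "memory" clobber is the GCC/Clang idiom for a compiler barrier, not necessarily the exact code of this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler barrier (GCC/Clang extension): memory values may not be
 * cached in registers across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the kernel's cpu_relax(). */
static inline void my_cpu_relax(void)
{
	barrier();
}

/* Spin while *flag is non-zero, giving up after max_spins iterations.
 * Returns the number of iterations spent waiting. */
static unsigned long spin_until_clear(const int *flag, unsigned long max_spins)
{
	unsigned long spins = 0;

	assert(flag != NULL);
	while (*flag && spins < max_spins) {
		my_cpu_relax();	/* forces *flag to be re-read next time round */
		spins++;
	}
	return spins;
}
```

In a single-threaded program the barrier has no visible effect; it matters when another CPU or an interrupt handler clears the flag while the loop runs.
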
Automerge · 984e13d3
Linus Torvalds authored Jul 28, 2002

984e13d3
Merge · 621f5626
Linus Torvalds authored Jul 28, 2002

621f5626

Ingo Molnar authored Jul 28, 2002

the attached patch is a comment update of sched.c and it also does a small
cleanup in migration_thread().

8e77485f

[PATCH] SCSI MODE_SENSE transfer length fix · c5155e55

Matthew Dharm authored Jul 28, 2002

Modified the MODE_SENSE write-protect test in sd.c to issue a SCSI
request whose request_bufflen matches the transfer length that the
MODE_SENSE command being issued requests.

c5155e55

[PATCH] SCSI INQUIRY transfer length fix · d7cdb541

Matthew Dharm authored Jul 28, 2002

Fixed one of the INQUIRY commands used for probing SCSI devices.  This
badly-formed command was trapped by the usb-storage driver BUG_ON()
which is designed to stop commands with a badly formed transfer_length
field.

d7cdb541

[PATCH] put_page() uses audited · 06829ded

Andrew Morton authored Jul 28, 2002

Audit put_page() uses of pages that may be in the page cache.

Use page_cache_release() instead.

06829ded

[PATCH] Re: Limit in set_thread_area · 686d6649

Ingo Molnar authored Jul 28, 2002

the attached patch does the set_thread_area parameter simplification - it
also cleans up some other TLS issues, it removes the tls_* fields from the
thread_struct, and removes the now unused page-granularity flag.

686d6649

[PATCH] permit modular build of raw driver · 603e29ca

Andrew Morton authored Jul 28, 2002

This patch allows the raw driver to be built as a kernel module.

It also cleans up a bunch of stuff, C99ifies the initialisers, gives
lots of symbols static scope, etc.

The module is unloadable when there are zero bindings. The current
ioctl() interface has no way of undoing a binding - it only allows
bindings to be overwritten. So I overloaded a bind to major=0,minor=0
to mean "undo the binding". I'll update the raw(8) manpage for that.

generic_file_direct_IO has been exported to modules.

The call to invalidate_inode_pages2() has been moved from all
generic_file_direct_IO() callers into generic_file_direct_IO() itself,
mainly to avoid exporting invalidate_inode_pages2() to modules.

603e29ca

[PATCH] direct IO updates · 0d85f8bf

Andrew Morton authored Jul 28, 2002

This patch is a performance and correctness update to the direct-IO
code: O_DIRECT and the raw driver.  It mainly affects IO against
blockdevs.

The direct_io code was returning -EINVAL for a filesystem hole.  Change
it to clear the userspace page instead.

There were a few restrictions and weirdnesses wrt blocksize and
alignments.  The code has been reworked so we now lay out maximum-sized
BIOs at any sector alignment.

Because of this, the raw driver has been altered to set the blockdev's
soft blocksize to the minimum possible at open() time.  Typically, 512
bytes.  There are now no performance disadvantages to using small
blocksizes, and this gives the finest possible alignment.

There is no API here for setting or querying the soft blocksize of the
raw driver (there never was, really), which could conceivably be a
problem.  If it is, we can permit BLKBSZSET and BLKBSZGET against the
fd which /dev/raw/rawN returned, but that would require that
blk_ioctl() be exported to modules again.

This code is wickedly quick.  Here's an oprofile of a single 500MHz
PIII reading from four (old) scsi disks (two aic7xxx controllers) via
the raw driver.  Aggregate throughput is 72 megabytes/second:

c013363c 24       0.0896492   __set_page_dirty_buffers
c021b8cc 24       0.0896492   ahc_linux_isr
c012b5dc 25       0.0933846   kmem_cache_free
c014d894 26       0.09712     dio_bio_complete
c01cc78c 26       0.09712     number
c0123bd4 40       0.149415    follow_page
c01eed8c 46       0.171828    end_that_request_first
c01ed410 49       0.183034    blk_recount_segments
c01ed574 65       0.2428      blk_rq_map_sg
c014db38 85       0.317508    do_direct_IO
c021b090 90       0.336185    ahc_linux_run_device_queue
c010bb78 236      0.881551    timer_interrupt
c01052d8 25354    94.707      poll_idle

A testament to the efficiency of the 2.5 block layer.

And against four IDE disks on an HPT374 controller.  Throughput is 120
megabytes/sec:

c01eed8c 80       0.292462    end_that_request_first
c01fe850 87       0.318052    hpt3xx_intrproc
c01ed574 123      0.44966     blk_rq_map_sg
c01f8f10 141      0.515464    ata_select
c014db38 153      0.559333    do_direct_IO
c010bb78 235      0.859107    timer_interrupt
c01f9144 281      1.02727     ata_irq_enable
c01ff990 290      1.06017     udma_pci_init
c01fe878 308      1.12598     hpt3xx_maskproc
c02006f8 379      1.38554     idedisk_do_request
c02356a0 609      2.22637     pci_conf1_read
c01ff8dc 611      2.23368     udma_pci_start
c01ff950 922      3.37062     udma_pci_irq_status
c01f8fac 1002     3.66308     ata_status
c01ff26c 1059     3.87146     ata_start_dma
c01feb70 1141     4.17124     hpt374_udma_stop
c01f9228 3072     11.2305     ata_out_regfile
c01052d8 15193    55.5422     poll_idle

Not so good.

One problem which has been identified with O_DIRECT is the cost of
repeated calls into the mapping's get_block() callback.  Not a big
problem with ext2 but other filesystems have more complex get_block
implementations.

So what I have done is to require that callers of generic_direct_IO()
implement the new `get_blocks()' interface.  This is a small extension
to get_block().  It gets passed another argument which indicates the
maximum number of blocks which should be mapped, and it returns the
number of blocks which it did map in bh_result->b_size.  This allows
the fs to map up to 4G of disk (or of hole) in a single get_block()
invocation.

There are some other caveats and requirements of get_blocks() which are
documented in the comment block over fs/direct_io.c:get_more_blocks().
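
The idea of mapping a whole run per call can be sketched in userspace roughly as follows. This is an illustration only, not the kernel interface: `toy_get_blocks`, `struct toy_map_result`, the flat `block_map` array and the use of physical block 0 as a hole marker are all hypothetical, and reporting the mapped length in bytes through `b_size` is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLOCK_SIZE 4096ul
#define TOY_HOLE 0ul	/* physical block 0 marks an unmapped block */

struct toy_map_result {
	unsigned long b_blocknr;	/* first physical block, or TOY_HOLE */
	unsigned long b_size;		/* length of the mapped run, in bytes */
};

/* Map up to max_blocks logical blocks starting at iblock in one call.
 * The run ends where the physical blocks stop being contiguous, or
 * where a hole starts or ends.  Returns 0 on success, -1 on bad input. */
static int toy_get_blocks(const unsigned long *block_map, size_t nr_blocks,
			  size_t iblock, size_t max_blocks,
			  struct toy_map_result *result)
{
	size_t n = 1;

	assert(block_map != NULL && result != NULL);
	if (iblock >= nr_blocks || max_blocks == 0)
		return -1;

	result->b_blocknr = block_map[iblock];
	while (n < max_blocks && iblock + n < nr_blocks) {
		unsigned long next = block_map[iblock + n];

		if (result->b_blocknr == TOY_HOLE) {
			if (next != TOY_HOLE)
				break;	/* the hole ends here */
		} else if (next != result->b_blocknr + n) {
			break;		/* no longer a linear chunk of disk */
		}
		n++;
	}
	result->b_size = n * TOY_BLOCK_SIZE;
	return 0;
}
```

A caller that wants to map a large request loops on this function, advancing `iblock` by `b_size / TOY_BLOCK_SIZE` each time, so a contiguous file costs one call rather than one call per block.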

Possibly, get_blocks() will be the 2.6 kernel's way of doing gang block
mapping.  It certainly allows good speedups.  But it doesn't allow the
fs to return a scatter list of blocks - it only understands linear
chunks of disk.  I think that's really all it _should_ do.

I'll let get_blocks() sit for a while and wait for some feedback.  If
it is sufficient and nobody objects too much, I shall convert all
get_block() instances in the kernel to be get_blocks() instances.  And
I'll teach readahead (at least) to use the get_blocks() extension.

Delayed allocate writeback could use get_blocks().  As could
block_prepare_write() for blocksize < PAGE_CACHE_SIZE.  There's no
mileage using it in mpage_writepages() because all our filesystems are
syncalloc, and nobody uses MAP_SHARED for much.

It will be tricky to use get_blocks() for writes, because if a ton of
blocks have been mapped into the file and then something goes wrong,
the kernel needs to either remove those blocks from the file or zero
them out.  The direct_io code zeroes them out.

btw, some time ago you mentioned that some drivers and/or hardware may
get upset if there are multiple simultaneous IOs in progress against
the same block.  Well, the raw driver has always allowed that to
happen.  O_DIRECT writes to blockdevs do as well now.

todo:

1) The driver will probably explode if someone runs BLKBSZSET while
   IO is in progress.  Need to use bdclaim() somewhere.

2) readv() and writev() need to become direct_io-aware.  At present
   we're doing stop-and-wait for each segment when performing
   readv/writev against the raw driver and O_DIRECT blockdevs.

0d85f8bf

[PATCH] use c99 initialisers in ext3 · 62b52f5c
Andrew Morton authored Jul 28, 2002
```
Convert ext3 to the C99 initialiser format.  From Rusty.
```
62b52f5c

[PATCH] strict overcommit · 502bff06

Andrew Morton authored Jul 28, 2002

Alan's overcommit patch, brought to 2.5 by Robert Love.

Can't say I've tested its functionality at all, but it doesn't crash,
it has been in -ac and RH kernels for some time and I haven't observed
any of its functions on profiles.

"So what is strict VM overcommit? We introduce new overcommit
policies that attempt to never succeed an allocation that can not be
fulfilled by the backing store and consequently never OOM. This is
achieved through strict accounting of the committed address space and
a policy to allow/refuse allocations based on that accounting.

In the strictest of modes, it should be impossible to allocate more
memory than available and impossible to OOM. All memory failures
should be pushed down to the allocation routines -- malloc, mmap, etc.

The new modes are available via sysctl (same as before). See
Documentation/vm/overcommit-accounting for more information."
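
The accounting idea in the quoted description can be sketched as below. This is a toy model under stated assumptions: `struct toy_vm` and its functions are hypothetical, and treating the limit as a single `backing_pages` figure is a simplification. The real policy and its tunables are described in Documentation/vm/overcommit-accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vm {
	unsigned long backing_pages;	/* pages the backing store can hold */
	unsigned long committed_pages;	/* address space already promised */
};

/* Charge an allocation against the commit limit.  The request is refused
 * up front when it cannot be backed, so the failure surfaces in the
 * allocation call rather than as a later OOM. */
static bool toy_vm_commit(struct toy_vm *vm, unsigned long pages)
{
	assert(vm != NULL);
	assert(vm->committed_pages <= vm->backing_pages);

	/* Compare against the remaining room to avoid overflow. */
	if (pages > vm->backing_pages - vm->committed_pages)
		return false;
	vm->committed_pages += pages;
	return true;
}

/* Return a previous charge, for example on munmap() or exit. */
static void toy_vm_uncommit(struct toy_vm *vm, unsigned long pages)
{
	assert(vm != NULL && pages <= vm->committed_pages);
	vm->committed_pages -= pages;
}
```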

502bff06

[PATCH] for_each_zone macro · a4b065fa

Andrew Morton authored Jul 28, 2002

Patch from Robert Love.

Attached patch implements for_each_zone(zone_t *) which is a helper
macro to clean up code of the form:

        for (pgdat = pgdat_list; pgdat; pgdat = pgdat->node_next) {
                for (i = 0; i < MAX_NR_ZONES; ++i) {
                        zone_t * z = pgdat->node_zones + i;
                        /* ... */
                }
        }

and replace it with:

        for_each_zone(zone) {
                /* ... */
        }

This patch only replaces one use of the above loop with the new macro.
Pending code, however, currently in the full rmap patch uses
for_each_zone more extensively.
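
One way such a macro could be built, sketched in userspace with cut-down types. The `next_zone()` helper, the `zone_pgdat` back pointer as used here, and `toy_add_node()` are assumptions made for this illustration; the patch's own definition may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NR_ZONES 3

struct pglist_data;

typedef struct zone_struct {
	unsigned long free_pages;
	struct pglist_data *zone_pgdat;	/* back pointer to the owning node */
} zone_t;

typedef struct pglist_data {
	zone_t node_zones[MAX_NR_ZONES];
	struct pglist_data *node_next;
} pg_data_t;

static pg_data_t *pgdat_list;

/* Hypothetical helper: step to the next zone, moving on to the first
 * zone of the next node when the current node is exhausted. */
static zone_t *next_zone(zone_t *zone)
{
	pg_data_t *pgdat = zone->zone_pgdat;

	if (zone < pgdat->node_zones + MAX_NR_ZONES - 1)
		return zone + 1;
	return pgdat->node_next ? pgdat->node_next->node_zones : NULL;
}

#define for_each_zone(zone) \
	for (zone = pgdat_list ? pgdat_list->node_zones : NULL; \
	     zone; zone = next_zone(zone))

/* Hypothetical setup helper: fill in a node and push it onto pgdat_list. */
static void toy_add_node(pg_data_t *pgdat, unsigned long free_per_zone)
{
	int i;

	assert(pgdat != NULL);
	for (i = 0; i < MAX_NR_ZONES; i++) {
		pgdat->node_zones[i].free_pages = free_per_zone;
		pgdat->node_zones[i].zone_pgdat = pgdat;
	}
	pgdat->node_next = pgdat_list;
	pgdat_list = pgdat;
}

/* The nested loop from the commit message collapses to one line. */
static unsigned long total_free_pages(void)
{
	zone_t *zone;
	unsigned long sum = 0;

	for_each_zone(zone)
		sum += zone->free_pages;
	return sum;
}
```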

a4b065fa

[PATCH] for_each_pgdat macro · f183c478

Andrew Morton authored Jul 28, 2002

Patch from Robert Love.

This patch implements for_each_pgdat(pg_data_t *) which is a helper
macro to clean up code that does a loop of the form:

	pgdat = pgdat_list;
	while (pgdat) {
		/* ... */
		pgdat = pgdat->node_next;
	}

and replace it with:

	for_each_pgdat(pgdat) {
		/* ... */
	}

This code is from Rik's 2.4-rmap patch and is by William Irwin.
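
A sketch of how the while loop above folds into a for-style macro, using a cut-down `pg_data_t`. The `node_id` field, `toy_push_node()` and `count_nodes()` are hypothetical and exist only to exercise the macro.

```c
#include <assert.h>
#include <stddef.h>

typedef struct pglist_data {
	int node_id;
	struct pglist_data *node_next;
} pg_data_t;

static pg_data_t *pgdat_list;

/* Same traversal as the open-coded while loop, written once. */
#define for_each_pgdat(pgdat) \
	for (pgdat = pgdat_list; pgdat; pgdat = pgdat->node_next)

/* Hypothetical helper: push a node onto the front of pgdat_list. */
static void toy_push_node(pg_data_t *pgdat, int node_id)
{
	assert(pgdat != NULL);
	pgdat->node_id = node_id;
	pgdat->node_next = pgdat_list;
	pgdat_list = pgdat;
}

static int count_nodes(void)
{
	pg_data_t *pgdat;
	int n = 0;

	for_each_pgdat(pgdat)
		n++;
	return n;
}
```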

f183c478

[PATCH] optimise struct page layout · a854c11b

Andrew Morton authored Jul 28, 2002

Reorganise the members of struct page.

- Place ->flags at the start so the compiler can generate indirect
  addressing rather than indirect+indexed for this commonly-accessed
  field.  Shrinks the kernel by ~100 bytes.

- Keep ->count with ->flags so they have the best chance of
  being in the same cacheline.
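
The two layout goals can be checked with `offsetof`, as in the sketch below. `struct toy_page` is a hypothetical, reduced stand-in for struct page, and the 32-byte line size is an assumption chosen for illustration. The check also assumes the structure starts on a cacheline boundary, which is why the commit message only speaks of the "best chance".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHELINE_BYTES 32	/* assumed line size, for illustration */

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

struct toy_page {
	unsigned long flags;	/* offset 0: reachable with no displacement */
	int count;		/* kept right next to flags */
	struct toy_list_head list;
	void *mapping;
	unsigned long index;
};

/* Returns 1 when flags and count fall in the same cacheline, assuming
 * the structure itself begins on a line boundary. */
static int toy_flags_count_share_line(void)
{
	size_t first = offsetof(struct toy_page, flags) / TOY_CACHELINE_BYTES;
	size_t last = (offsetof(struct toy_page, count) + sizeof(int) - 1)
			/ TOY_CACHELINE_BYTES;

	return first == last;
}

/* Sanity check of both layout properties. */
static void toy_check_layout(void)
{
	assert(offsetof(struct toy_page, flags) == 0);
	assert(toy_flags_count_share_line());
}
```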

a854c11b

[PATCH] speed up pte_chain locking on uniprocessors · ab35295d
Andrew Morton authored Jul 28, 2002
```
ifdef out some operations in pte_chain_lock() which are not necessary
on uniprocessor.
```
ab35295d
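
A userspace sketch of the general technique: a bit spinlock whose atomic operations are compiled out when only one CPU exists. The commit text does not show which operations were removed, so the split below is an assumption. `TOY_SMP` is a hypothetical stand-in for the kernel's SMP configuration switch, and C11 atomics replace the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>

#ifndef TOY_SMP
#define TOY_SMP 1	/* build with -DTOY_SMP=0 for the uniprocessor variant */
#endif

#define TOY_CHAINLOCK_BIT (1ul << 0)

struct toy_page {
	atomic_ulong flags;
};

static inline int toy_pte_chain_is_locked(struct toy_page *page)
{
	return (atomic_load(&page->flags) & TOY_CHAINLOCK_BIT) != 0;
}

static inline void toy_pte_chain_lock(struct toy_page *page)
{
	/* A kernel would disable preemption here.  On a uniprocessor
	 * that alone keeps other lock holders out. */
#if TOY_SMP
	while (atomic_fetch_or(&page->flags, TOY_CHAINLOCK_BIT)
			& TOY_CHAINLOCK_BIT) {
		/* Contended: wait with plain loads before retrying. */
		while (toy_pte_chain_is_locked(page))
			;
	}
#else
	(void)page;
#endif
}

static inline void toy_pte_chain_unlock(struct toy_page *page)
{
#if TOY_SMP
	assert(toy_pte_chain_is_locked(page));	/* unlocking twice is a bug */
	atomic_fetch_and(&page->flags, ~TOY_CHAINLOCK_BIT);
#else
	(void)page;
#endif
}
```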

[PATCH] show_free_areas() cleanup · c1ab3459

Andrew Morton authored Jul 28, 2002

Cleanup to show_free_areas() from Bill Irwin:

show_free_areas() and show_free_areas_core() are a mess.
(1) it uses a bizarre and ugly form of list iteration to walk buddy lists
        use standard list functions instead
(2) it prints the same information repeatedly once per-node
        rationalize the braindamaged iteration logic
(3) show_free_areas_node() is useless and not called anywhere
        remove it entirely
(4) show_free_areas() itself just calls show_free_areas_core()
        remove show_free_areas_core() and do the stuff directly
(5) SWAP_CACHE_INFO is always #defined, remove it
(6) INC_CACHE_INFO() doesn't use the do { } while (0) construct

This patch also includes Matthew Dobson's patch which removes
mm/numa.c:node_lock.  The consensus is that it doesn't do anything now
that show_free_areas_node() isn't there.
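
Item (6) refers to the usual macro hygiene idiom. The sketch below shows a statement-like macro wrapped in do { } while (0) so that it behaves as one statement inside an if/else. `TOY_INC_BOTH`, `struct toy_cache_info` and `toy_record_lookup()` are hypothetical names, not the kernel's INC_CACHE_INFO().

```c
#include <assert.h>

struct toy_cache_info {
	unsigned long find_success;
	unsigned long find_total;
};

static struct toy_cache_info toy_info;

/* Wrapped in do { } while (0): expands to exactly one statement and
 * needs the trailing semicolon at the call site.  A bare
 * "a++; b++" body would leave the second increment outside the if,
 * and a plain "{ ... }" body followed by ";" would orphan the else. */
#define TOY_INC_BOTH(x, y) \
	do { toy_info.x++; toy_info.y++; } while (0)

static void toy_record_lookup(int found)
{
	assert(found == 0 || found == 1);
	if (found)
		TOY_INC_BOTH(find_success, find_total);
	else
		toy_info.find_total++;
}
```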

c1ab3459

[PATCH] use a slab cache for pte_chains · cbb6e8ec

Andrew Morton authored Jul 28, 2002

Patch from Bill Irwin.

It removes the custom pte_chain allocator in mm/rmap.c and replaces it
with a slab cache.

"This patch
 (1) eliminates the pte_chain_freelist_lock and all contention on it
 (2) gives the VM the ability to recover unused pte_chain pages

 Anton Blanchard has reported (1) from prior incarnations of this patch.
 Craig Kulesa has reported (2) in combination with slab-on-LRU patches.

 I've left OOM detection out of this patch entirely as upcoming patches
 will do real OOM handling for pte_chains and all the code changed anyway."

cbb6e8ec

[PATCH] misc fixes · 1a40868e

Andrew Morton authored Jul 28, 2002

There are a few VM-related patches in this series.  Mainly fixes;
feature work is on hold.

We have some fairly serious locking contention problems with the reverse
mapping's pte_chains.  Until we have a clear way out of that I believe
that it is best to not merge code which has a lot of rmap dependency.

It is apparent that these problems will not be solved by tweaking -
some redesign is needed.  In the 2.5 timeframe the only practical
solution appears to be page table sharing, based on Daniel's February
work.  Daniel and Dave McCracken are working on that.

Some bits and pieces here:

- list_splice() has an open-coded list_empty() in it.  Use
  list_empty() instead.

- in shrink_cache() we have a local `nr_pages' which shadows another
  local.  Rename the inner one.  (Nikita Danilov)

- Add a BUG() on a can't-happen code path in page_remove_rmap().

- Tighten up the bug checks in the BH completion handlers - if the
  buffer is still under IO then it must be locked, because we unlock it
  inside the page_uptodate_lock.
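
The list_splice() item can be pictured with a small userspace version of the circular doubly linked list. This is a sketch in the style of the kernel's list helpers, not a copy of the patched code; `list_length()` is a hypothetical helper added for checking the result.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static inline void list_add_tail(struct list_head *entry,
				 struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Join the entries of 'list' onto the front of 'head'.  The emptiness
 * test uses list_empty() rather than an open-coded pointer compare.
 * 'list' itself is left stale and must be reinitialised before reuse. */
static inline void list_splice(struct list_head *list,
			       struct list_head *head)
{
	assert(list != head);
	if (!list_empty(list)) {
		struct list_head *first = list->next;
		struct list_head *last = list->prev;
		struct list_head *at = head->next;

		first->prev = head;
		head->next = first;
		last->next = at;
		at->prev = last;
	}
}

/* Hypothetical helper: count the entries on a list. */
static inline size_t list_length(const struct list_head *head)
{
	const struct list_head *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```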

1a40868e