    [PATCH] select() speedup · 57a54189
    Andrew Morton authored
    From: Christoph Hellwig <hch@infradead.org>
    
    Originally by David Mosberger, testing by Roger Luethi.  From the ia64 tree.
    
    Basically, it avoids going to memory all the time, which makes life a lot
    easier for gcc: it can actually do a decent amount of optimization.  The
    restructuring is clearly less important for out-of-order CPUs, but it gives
    some benefit even there.
    
    More specifically, the loop is now structured to operate one "unsigned long"
    at a time, rather than one bit at a time.  Of course, you still need to
    process all the bits, but most of the relevant state in the inner loop can be
    kept in registers.
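    To make the contrast concrete, here is a minimal sketch in C of both loop
    shapes.  It is not the kernel's select.c code: scan_bits, scan_words,
    fake_poll, run_demo, the READY_* flags and the demo descriptor pattern are
    all hypothetical, and BITS_PER_WORD simply stands for the number of bits in
    an unsigned long.  In the bit-at-a-time version the result words are written
    through pointers that may alias the input bitmaps, so the compiler generally
    has to reload the inputs on every iteration.  The word-at-a-time version
    copies each input word into locals once, skips words with no requested bits
    entirely, and stores each result word exactly once.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Bits in one unsigned long, and words needed to hold n descriptor bits. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)
#define DEMO_FDS 200
#define READY_IN 1u
#define READY_OUT 2u
#define READY_EX 4u

/* Hypothetical poll stand-in: multiples of 3 are readable, multiples of 5
 * are writable, nothing is exceptional. */
static unsigned fake_poll(unsigned long fd)
{
    return (fd % 3 == 0 ? READY_IN : 0u) | (fd % 5 == 0 ? READY_OUT : 0u);
}

/* Bit at a time: every iteration indexes the bitmaps in memory, and stores
 * through rin/rout/rex may alias the inputs, so they generally get reloaded. */
static int scan_bits(unsigned long n, const unsigned long *in,
                     const unsigned long *out, const unsigned long *ex,
                     unsigned long *rin, unsigned long *rout, unsigned long *rex)
{
    int count = 0;
    for (unsigned long w = 0; w < NWORDS(n); w++)
        rin[w] = rout[w] = rex[w] = 0;
    for (unsigned long i = 0; i < n; i++) {
        unsigned long w = i / BITS_PER_WORD, bit = 1UL << (i % BITS_PER_WORD);
        if (!((in[w] | out[w] | ex[w]) & bit))
            continue;
        unsigned mask = fake_poll(i);
        if ((mask & READY_IN) && (in[w] & bit)) { rin[w] |= bit; count++; }
        if ((mask & READY_OUT) && (out[w] & bit)) { rout[w] |= bit; count++; }
        if ((mask & READY_EX) && (ex[w] & bit)) { rex[w] |= bit; count++; }
    }
    return count;
}

/* Word at a time: copy one word of each set into locals, skip words with no
 * requested bits, and store each result word exactly once. */
static int scan_words(unsigned long n, const unsigned long *in,
                      const unsigned long *out, const unsigned long *ex,
                      unsigned long *rin, unsigned long *rout, unsigned long *rex)
{
    int count = 0;
    unsigned long i = 0;
    for (unsigned long w = 0; i < n; w++) {
        unsigned long in_w = in[w], out_w = out[w], ex_w = ex[w];
        unsigned long all_bits = in_w | out_w | ex_w, bit = 1;
        unsigned long res_in = 0, res_out = 0, res_ex = 0;
        if (all_bits == 0) {
            i += BITS_PER_WORD; /* nothing requested in this word */
        } else {
            for (unsigned long j = 0; j < BITS_PER_WORD && i < n; j++, i++, bit <<= 1) {
                if (!(bit & all_bits))
                    continue;
                unsigned mask = fake_poll(i);
                if ((mask & READY_IN) && (in_w & bit)) { res_in |= bit; count++; }
                if ((mask & READY_OUT) && (out_w & bit)) { res_out |= bit; count++; }
                if ((mask & READY_EX) && (ex_w & bit)) { res_ex |= bit; count++; }
            }
        }
        rin[w] = res_in;
        rout[w] = res_out;
        rex[w] = res_ex;
    }
    return count;
}

/* Demo: poll every fd for input and even fds for output, run both scans and
 * return 1 if they produced identical result bitmaps. */
int run_demo(int *words_count, int *bits_count)
{
    unsigned long in[NWORDS(DEMO_FDS)] = {0}, out[NWORDS(DEMO_FDS)] = {0};
    unsigned long ex[NWORDS(DEMO_FDS)] = {0};
    unsigned long a_in[NWORDS(DEMO_FDS)], a_out[NWORDS(DEMO_FDS)], a_ex[NWORDS(DEMO_FDS)];
    unsigned long b_in[NWORDS(DEMO_FDS)], b_out[NWORDS(DEMO_FDS)], b_ex[NWORDS(DEMO_FDS)];

    for (unsigned long fd = 0; fd < DEMO_FDS; fd++) {
        in[fd / BITS_PER_WORD] |= 1UL << (fd % BITS_PER_WORD);
        if (fd % 2 == 0)
            out[fd / BITS_PER_WORD] |= 1UL << (fd % BITS_PER_WORD);
    }
    *words_count = scan_words(DEMO_FDS, in, out, ex, a_in, a_out, a_ex);
    *bits_count = scan_bits(DEMO_FDS, in, out, ex, b_in, b_out, b_ex);
    assert(*words_count == *bits_count);
    return memcmp(a_in, b_in, sizeof a_in) == 0 &&
           memcmp(a_out, b_out, sizeof a_out) == 0 &&
           memcmp(a_ex, b_ex, sizeof a_ex) == 0;
}
```

    With the demo pattern, both scans report 87 ready events: 67 readable
    descriptors (the multiples of 3 below 200) plus 20 writable ones (even
    multiples of 5, i.e. multiples of 10).  Both loops still visit every
    requested bit; the gain comes from where the state lives while they do.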
    
    Roger Luethi measured the routine on a bunch of different machines (mostly
    x86, IIRC: P5, P6, Crusoe, Athlons) and performance improved there, too (and
    it should definitely improve performance on any RISC-like architecture).
    
    
    Roger's benchmarking results (lower is better) vs. number of fds:

                                           File                    TCP
    Number of fds:                     10    250    500       10    250    500

    UP, Pentium MMX 233MHz original   8.2  108.5  212.8     11.0  180.0  356.5
    UP, Pentium MMX 233MHz w/patch    7.4   87.6  171.1     10.4  163.6  323.4

    MP, Pentium MMX 233MHz original  15.7  283.8  562.8     18.9  354.4  705.5
    MP, Pentium MMX 233MHz w/patch   14.6  255.6  506.5     17.8  332.8  664.1

    UP, Athlon 1394 MHz original      1.3   13.4   26.1      1.9   24.7   48.6
    UP, Athlon 1394 MHz w/patch       1.2   11.0   21.5      1.6   22.3   43.8

    MP, Athlon 1394 MHz original      1.6   22.4   44.6      1.9   30.9   60.5
    MP, Athlon 1394 MHz w/patch       1.5   21.2   41.7      1.9   30.2   59.6
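
    The relative improvement is easier to compare as a percentage.  The small C
    helper that follows is a hypothetical illustration written only for this
    note (reduction_pct, print_reductions and the row layout are not from the
    patch).  It computes 100 * (original - patched) / original for the 500-fd
    columns.  The table does not state a unit, so only the ratios are meaningful.

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction (in percent) from the original time to the patched one.
 * The table gives no unit; the ratio does not depend on it. */
double reduction_pct(double original, double patched)
{
    assert(original > 0.0);
    return 100.0 * (original - patched) / original;
}

/* 500-fd columns from the benchmark table, as {original, w/patch}. */
struct row {
    const char *config;
    double file500[2];
    double tcp500[2];
};

static const struct row rows[] = {
    { "UP, Pentium MMX 233MHz", { 212.8, 171.1 }, { 356.5, 323.4 } },
    { "MP, Pentium MMX 233MHz", { 562.8, 506.5 }, { 705.5, 664.1 } },
    { "UP, Athlon 1394 MHz",    { 26.1, 21.5 },   { 48.6, 43.8 } },
    { "MP, Athlon 1394 MHz",    { 44.6, 41.7 },   { 60.5, 59.6 } },
};

/* Print the percentage reduction for each configuration at 500 fds. */
void print_reductions(void)
{
    for (size_t k = 0; k < sizeof rows / sizeof rows[0]; k++)
        printf("%-24s file: %5.1f%%  tcp: %5.1f%%\n", rows[k].config,
               reduction_pct(rows[k].file500[0], rows[k].file500[1]),
               reduction_pct(rows[k].tcp500[0], rows[k].tcp500[1]));
}
```

    At 500 fds this works out to reductions between about 1.5% (MP Athlon, TCP)
    and about 19.6% (UP Pentium MMX, file).  In every row the file test gains
    more than the TCP test, and each UP configuration gains more than its MP
    counterpart.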