    ring-buffer: Make non-consuming read less expensive with lots of cpus. · 72c9ddfd
    David Miller authored
    When performing a non-consuming read, a synchronize_sched() is
    performed once for every cpu which is actively tracing.
    
    This is very expensive, and can make it take several seconds to open
    up the 'trace' file with lots of cpus.
    
    Only one synchronize_sched() call is actually necessary.  What is
    desired is for all cpus to see the disabling state change.  So we
    transform the existing sequence:
    
    	for_each_cpu() {
    		ring_buffer_read_start();
    	}
    
    where each ring_buffer_read_start() call performs a synchronize_sched(),
    into the following:
    
    	for_each_cpu() {
    		ring_buffer_read_prepare();
    	}
    	ring_buffer_read_prepare_sync();
    	for_each_cpu() {
    		ring_buffer_read_start();
    	}
    
    wherein only the single ring_buffer_read_prepare_sync() call needs to
    do the synchronize_sched().
    
    The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
    memory and increments ->record_disabled.
    
    In the second phase, ring_buffer_read_prepare_sync() makes sure this
    ->record_disabled state is visible fully to all cpus.
    
    And in the final third phase, the ring_buffer_read_start() calls reset
    the 'iter' objects allocated in the first phase since we now know that
    none of the cpus are adding trace entries any more.
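
    The three phases can be illustrated with a small user-space model.
    This is only a sketch, not the kernel code: every mock_* name, the
    two structs, and the cleanup step at the end are hypothetical
    stand-ins, and mock_synchronize_sched() merely counts calls instead
    of waiting for all cpus.  The sync_per_cpu flag selects the old
    sequence so the two costs can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 128	/* same cpu count as the test machine in this message */

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct mock_cpu_buffer {
	int record_disabled;	/* > 0: writers must not add entries */
};

struct mock_iter {
	struct mock_cpu_buffer *cpu_buffer;
	bool reset_done;
};

static struct mock_cpu_buffer buffers[NR_CPUS];
static unsigned long sync_calls;

/* Stand-in for synchronize_sched(): counts the call, does not wait. */
static void mock_synchronize_sched(void)
{
	sync_calls++;
}

/* Phase 1: allocate the iterator and disable recording on this cpu. */
static struct mock_iter *mock_read_prepare(int cpu)
{
	struct mock_iter *iter = malloc(sizeof(*iter));

	if (!iter)
		return NULL;
	iter->cpu_buffer = &buffers[cpu];
	iter->reset_done = false;
	buffers[cpu].record_disabled++;
	return iter;
}

/* Phase 2: one synchronization covers every buffer prepared so far. */
static void mock_read_prepare_sync(void)
{
	mock_synchronize_sched();
}

/* Phase 3: recording is known to be off, so the iterator can be reset. */
static void mock_read_start(struct mock_iter *iter)
{
	assert(iter->cpu_buffer->record_disabled > 0);
	iter->reset_done = true;
}

/*
 * Open all per-cpu iterators and return how many synchronizations were
 * needed, or -1 on a bad cpu count or allocation failure.
 * sync_per_cpu == true models the old sequence, false the new one.
 */
long mock_open_trace(int nr_cpus, bool sync_per_cpu)
{
	struct mock_iter *iters[NR_CPUS] = { 0 };
	bool ok = true;
	long result;
	int cpu;

	if (nr_cpus < 0 || nr_cpus > NR_CPUS)
		return -1;
	sync_calls = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		iters[cpu] = mock_read_prepare(cpu);
		if (!iters[cpu]) {
			ok = false;
			break;
		}
		if (sync_per_cpu) {	/* old: sync and start per cpu */
			mock_read_prepare_sync();
			mock_read_start(iters[cpu]);
		}
	}
	if (ok && !sync_per_cpu) {	/* new: one sync, then start all */
		mock_read_prepare_sync();
		for (cpu = 0; cpu < nr_cpus; cpu++)
			mock_read_start(iters[cpu]);
	}
	result = ok ? (long)sync_calls : -1;

	/* Mock-only cleanup: re-enable recording and free the iterators. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (!iters[cpu])
			continue;
		buffers[cpu].record_disabled--;
		free(iters[cpu]);
	}
	return result;
}
```

    In this model, mock_open_trace(128, true) pays for 128
    synchronizations while mock_open_trace(128, false) pays for one,
    which is the saving described above.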
    
    This makes opening the 'trace' file nearly instantaneous on a
    sparc64 Niagara2 box with 128 cpus tracing.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
trace.c 103 KB