    perfcounters: make context switch and migration software counters work again · c07c99b6
    Paul Mackerras authored
    Jaswinder Singh Rajput reported that commit 23a185ca caused the
    context switch and migration software counters to always report zero.
    With that commit, the software counters only count events that occur
    between sched-in and sched-out for a task.  This is necessary for the
    counter enable/disable prctls and ioctls to work.  However, the
    context switch and migration counts are incremented after sched-out
    for one task and before sched-in for the next.  Since the increment
    doesn't occur while a task is scheduled in (as far as the software
    counters are concerned) it doesn't count towards any counter.
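
    The failure mode can be reproduced in a small user-space model.  This
    is only a sketch: struct sw_counter, sw_enable_always_reset(),
    sw_disable() and run_always_reset() are hypothetical names, not
    kernel code, and the model assumes that the enable method resamples
    its baseline on every call, including a schedule-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; names are illustrative, not kernel API. */
struct sw_counter {
    uint64_t prev_count;  /* baseline sampled when the counter is enabled */
    uint64_t count;       /* events attributed to the counter so far */
};

/* Modelled pre-fix behaviour: every enable, including a schedule-in,
 * resets the baseline to the task's current switch count. */
static void sw_enable_always_reset(struct sw_counter *c, uint64_t task_switches)
{
    assert(c != NULL);
    c->prev_count = task_switches;
}

/* Disable (also used for schedule-out): fold in the delta since the baseline. */
static void sw_disable(struct sw_counter *c, uint64_t task_switches)
{
    assert(c != NULL);
    c->count += task_switches - c->prev_count;
    c->prev_count = task_switches;
}

/* Simulate n context switches.  Each increment lands after sched-out
 * and before the next sched-in, as described above. */
static uint64_t run_always_reset(unsigned int n)
{
    struct sw_counter c = { 0, 0 };
    uint64_t task_switches = 0;

    sw_enable_always_reset(&c, task_switches);      /* initial enable */
    for (unsigned int i = 0; i < n; i++) {
        sw_disable(&c, task_switches);              /* sched-out */
        task_switches++;                            /* the switch itself */
        sw_enable_always_reset(&c, task_switches);  /* sched-in discards it */
    }
    sw_disable(&c, task_switches);                  /* final read */
    return c.count;
}
```

    In this model run_always_reset(5) returns 0: each increment is
    absorbed into the new baseline before any delta is taken.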
    
    Thus the context switch and migration counters need to count events
    that occur at any time, provided the counter is enabled, not just
    those that occur while the task is scheduled in (from the perf_counter
    subsystem's point of view).  The problem though is that the software
    counter code can't tell the difference between being enabled and being
    scheduled in, and between being disabled and being scheduled out,
    since we use the one pair of enable/disable entry points for both.
    That is, the high-level disable operation simply arranges for the
    counter to not be scheduled in any more, and the high-level enable
    operation arranges for it to be scheduled in again.
    
    One way to solve this would be to have sched_in/out operations in the
    hw_perf_counter_ops struct as well as enable/disable.  However, this
    patch takes a simpler approach: it adds a 'prev_state' field to the
    perf_counter struct that allows a counter's enable method to know
    whether the counter was previously disabled or just inactive
    (scheduled out), and therefore whether the enable method is being
    called as a result of a high-level enable or a schedule-in operation.
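
    A minimal user-space sketch of the idea follows.  It is a simplified
    model, not the patch itself: the state values, struct counter and
    all function names are assumptions made for illustration, and the
    model funnels both a high-level enable and a schedule-in through a
    single counter_sched_in() that records the previous state first.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative states; the values are assumptions for this sketch. */
enum counter_state { STATE_OFF = -1, STATE_INACTIVE = 0, STATE_ACTIVE = 1 };

struct counter {
    enum counter_state state;
    enum counter_state prev_state;  /* state just before the last enable */
    int fresh_enables;              /* enable calls caused by a high-level enable */
    int sched_ins;                  /* enable calls caused by a schedule-in */
};

/* The single enable entry point.  prev_state tells it why it was called. */
static void counter_enable_method(struct counter *c)
{
    if (c->prev_state <= STATE_OFF)
        c->fresh_enables++;   /* counter was disabled before */
    else
        c->sched_ins++;       /* counter was merely scheduled out */
}

/* Record the old state, then activate the counter and call enable. */
static void counter_sched_in(struct counter *c)
{
    assert(c != NULL);
    c->prev_state = c->state;
    c->state = STATE_ACTIVE;
    counter_enable_method(c);
}

static void counter_sched_out(struct counter *c) { c->state = STATE_INACTIVE; }
static void counter_disable(struct counter *c)   { c->state = STATE_OFF; }

/* Enable, schedule out and in, disable, then enable again. */
static struct counter run_model(void)
{
    struct counter c = { STATE_OFF, STATE_OFF, 0, 0 };

    counter_sched_in(&c);    /* high-level enable: prev_state is OFF */
    counter_sched_out(&c);
    counter_sched_in(&c);    /* schedule-in: prev_state is INACTIVE */
    counter_disable(&c);
    counter_sched_in(&c);    /* high-level enable again */
    return c;
}
```

    After run_model(), the model has seen two high-level enables and one
    schedule-in, even though all three went through the same enable
    method.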
    
    This then allows the context switch, migration and page fault counters
    to reset their hw.prev_count value in their enable functions only if
    they are called as a result of a high-level enable operation.
    Although page faults would normally only occur while the counter is
    scheduled in, this changes the page fault counter code too in case
    there are ever circumstances where page faults get counted against a
    task while its counters are not scheduled in.
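
    The effect on a context switch counter can be sketched the same way.
    Again this is a hypothetical user-space model (struct sw_counter,
    sw_enable(), sw_update() and run_fixed() are illustrative names, and
    the state values are assumptions), showing only the conditional
    reset of the baseline:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative state values; assumptions for this sketch. */
enum { STATE_OFF = -1, STATE_INACTIVE = 0 };

struct sw_counter {
    int prev_state;       /* state before the most recent enable call */
    uint64_t prev_count;  /* baseline for computing deltas */
    uint64_t count;       /* events attributed to the counter so far */
};

/* Reset the baseline only for a high-level enable, not for a schedule-in. */
static void sw_enable(struct sw_counter *c, uint64_t task_switches)
{
    assert(c != NULL);
    if (c->prev_state <= STATE_OFF)
        c->prev_count = task_switches;
}

/* Fold in the events seen since the baseline, then advance the baseline. */
static void sw_update(struct sw_counter *c, uint64_t task_switches)
{
    assert(c != NULL);
    c->count += task_switches - c->prev_count;
    c->prev_count = task_switches;
}

/* Simulate n context switches after a high-level enable. */
static uint64_t run_fixed(unsigned int n)
{
    struct sw_counter c = { STATE_OFF, 0, 0 };
    uint64_t task_switches = 100;       /* switches before the enable */

    sw_enable(&c, task_switches);       /* high-level enable: baseline = 100 */
    for (unsigned int i = 0; i < n; i++) {
        sw_update(&c, task_switches);   /* sched-out */
        task_switches++;                /* counted between out and in */
        c.prev_state = STATE_INACTIVE;  /* counter was only scheduled out */
        sw_enable(&c, task_switches);   /* sched-in: baseline is kept */
    }
    sw_update(&c, task_switches);       /* final read picks up the last switch */
    return c.count;
}
```

    In this model run_fixed(5) returns 5: the 100 switches that happened
    before the enable are excluded, while the ones that fall between
    sched-out and sched-in are still counted.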
    
    Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>