    [IA64] perfmon & PAL_HALT again · 8df5a500
    Stephane Eranian authored
    The pmu_active test is based on the value of PSR.up. THIS IS THE PROBLEM, as
    it does not take into account the lazy restore logic, which is as follows (simplified):
    
    context switch out:
    	save PMDs
    	clear psr.up
    	release ownership
    
    context switch in:
    	if (ctx->last_cpu == smp_processor_id() && ctx->activation == cpu_activation) {
    		set psr.up
    		return
    	}
    	restore PMD
    	restore PMC
    	ctx->last_cpu   = smp_processor_id();
    	ctx->activation = ++cpu_activation;
    	set psr.up
    
    The key here is that on context switch out we clear psr.up, and on context switch in
    we check whether anybody else has used the PMU on that processor since we last ran
    there. If nobody has, we assume the PMD/PMC are still ours and simply reactivate.
    
    The Caliper problem is that between the moment we context switch out and the moment we
    come back, nobody effectively used the PMU BUT the processor went idle. Normally this
    would have no impact, but PAL_HALT does alter the PMU registers. In default_idle(),
    the test on psr.up is not strong enough to cover this case, so we go into PAL, which
    trashes the PMU registers. When we come back we falsely assume that this is our state,
    yet it is corrupted. Very nasty indeed.
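
The failing sequence can be replayed in a small user-space model. This is only a
sketch under stated assumptions: the structures, the register count, and every
function name below are hypothetical stand-ins, not the kernel's perfmon code, and
the effect of PAL_HALT is modeled simply as overwriting the PMD values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_PMD 4	/* arbitrary register count for the model */

/* Simulated per-CPU PMU state (hypothetical layout). */
struct cpu_pmu {
	unsigned long pmd[NUM_PMD];	/* "hardware" data registers */
	unsigned long activation;	/* bumped on every full restore */
	bool psr_up;			/* user-level monitoring enabled */
};

/* Simulated perfmon context (hypothetical layout). */
struct pmu_ctx {
	unsigned long saved_pmd[NUM_PMD];
	unsigned long activation;
	int last_cpu;			/* -1: never loaded on any CPU */
};

static void switch_out(struct cpu_pmu *cpu, struct pmu_ctx *ctx)
{
	memcpy(ctx->saved_pmd, cpu->pmd, sizeof(cpu->pmd));	/* save PMDs */
	cpu->psr_up = false;					/* clear psr.up */
}

/* Returns true when the lazy path was taken (no register reload). */
static bool switch_in(struct cpu_pmu *cpu, int cpu_id, struct pmu_ctx *ctx)
{
	assert(cpu_id >= 0);
	if (ctx->last_cpu == cpu_id && ctx->activation == cpu->activation) {
		cpu->psr_up = true;
		return true;
	}
	memcpy(cpu->pmd, ctx->saved_pmd, sizeof(cpu->pmd));	/* restore PMDs */
	ctx->last_cpu = cpu_id;
	ctx->activation = ++cpu->activation;
	cpu->psr_up = true;
	return false;
}

/* Stand-in for the PAL_HALT side effect: the PMU registers get clobbered. */
static void pal_halt_sim(struct cpu_pmu *cpu)
{
	memset(cpu->pmd, 0xff, sizeof(cpu->pmd));
}

/* Idle check as described in the message: it looks at psr.up only. */
static void idle_psr_check_only(struct cpu_pmu *cpu)
{
	if (!cpu->psr_up)
		pal_halt_sim(cpu);
}

/* Replays the failing sequence; returns true if corruption went unnoticed. */
bool replay_failure(void)
{
	struct cpu_pmu cpu = { .activation = 0 };
	struct pmu_ctx ctx = { .last_cpu = -1 };
	bool lazy;

	switch_in(&cpu, 0, &ctx);	/* full restore: context owns the PMU */
	cpu.pmd[0] = 1234;		/* counter accumulates while running */
	switch_out(&cpu, &ctx);		/* PMDs saved, psr.up cleared */
	idle_psr_check_only(&cpu);	/* CPU idles, PAL_HALT trashes PMDs */
	lazy = switch_in(&cpu, 0, &ctx);	/* nobody else ran: lazy path */

	return lazy && memcmp(cpu.pmd, ctx.saved_pmd, sizeof(cpu.pmd)) != 0;
}
```

In this model replay_failure() reports that the lazy path was taken even though the
live registers no longer match what the context saved, which is the corruption
described above.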
    
    To avoid the problem it is necessary to forbid going to PAL_HALT as soon as perfmon
    installs some valid state in the PMU registers. This happens when an application
    attaches a context to a thread or CPU. It is not enough to check the psr/dcr bits.
    Hence I propose the attached patch. It adds a callback in process.c to modify the
    condition to enter PAL on idle. Basically, it is now conditional on pal_halt=1 AND
    perfmon saying it is okay.
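
A minimal sketch of how such a gate might look, written as a user-space model. Only
pal_halt comes from the text above; the names update_pal_halt_status,
can_do_pal_halt, idle_once and pal_halt_count are illustrative assumptions that this
message does not confirm, and the real PAL_HALT call is replaced by a counter.

```c
#include <assert.h>

/* Knob from the message: PAL_HALT is permitted at all (assumed default 1). */
static int pal_halt = 1;

/* Effective gate consulted by the idle loop (hypothetical name). */
static int can_do_pal_halt = 1;

/* Counts how often the simulated idle loop "entered" PAL_HALT. */
static int pal_halt_calls;

/*
 * Hypothetical callback for perfmon: pass 0 while valid PMU state is
 * installed (a context is attached), 1 once it is safe to halt again.
 */
void update_pal_halt_status(int status)
{
	assert(status == 0 || status == 1);
	can_do_pal_halt = pal_halt && status;
}

/* One pass of a simulated idle loop. */
void idle_once(void)
{
	if (can_do_pal_halt)
		pal_halt_calls++;	/* stands in for the real PAL_HALT call */
	/* otherwise: stay out of PAL so the PMU registers are left alone */
}

int pal_halt_count(void)
{
	return pal_halt_calls;
}
```

With this shape, perfmon would call update_pal_halt_status(0) when a context is
attached and update_pal_halt_status(1) when the last one is detached, so the idle
path never depends on psr.up alone.
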
    Signed-off-by: Tony Luck <tony.luck@intel.com>
process.c 21 KB