    mt76: mt7915e: Fix degraded performance after temporary overheat · 771cd8d4
    Nicolas Cavallari authored
    mt7915e registers a cooling_device with wrong semantics:
    
    1. cooling_device expects higher state values to cool more, but
       mt7915e did the opposite, with the exception of state == 0, which
       should "disable thermal management" but does not seem to have any
       effect, since the previous state is kept.
    
    The result is that when the thermal zone heats up a bit and bumps the
    cooling_device state from 0 to 1 to cool a bit, the performance is
    destroyed, and when going back from 1 to 0, the performance stays bad.
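
The expected semantics (state 0 means no cooling, higher states cool more) can be sketched as a simple mapping from state to throttle duty cycle. This is a minimal illustration, not the driver's code: `SKETCH_MAX_STATE` and `sketch_state_to_duty()` are hypothetical names, and the linear `100 - state` mapping is an assumption taken from the figures used in this message (state=0 is 100% power, state=50 is 50% power).

```c
#include <assert.h>

/* Hypothetical constant: the highest cooling state, i.e. maximum throttling. */
#define SKETCH_MAX_STATE 99u

/*
 * Map a cooling state to the duty cycle (in percent) handed to the hardware.
 * State 0 means "no cooling" (100% duty); every higher state cools more,
 * which is the direction the thermal subsystem relies on.
 */
static unsigned int sketch_state_to_duty(unsigned long state)
{
	assert(state <= SKETCH_MAX_STATE);
	return 100u - (unsigned int)state;
}
```

With this direction, moving from state 0 to state 1 only trims the duty cycle slightly, and returning to state 0 restores full power.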
    
    2. Reading the cooling_device state does not always return the last
       written state, but can return the actual hardware throttle state,
       which is different.
    
    This is a problem because the mt7915 firmware actually implements the
    equivalent of a thermal zone with trip points.  Setting the cooling
    device state actually changes the throttles at each trip point, so the
    following could occur if only the first issue were fixed:
    
    - thermal subsystem sets state to 100% power (state=0)
    - mt7915e driver sets trip throttles to [100%, 50%, 25%, 12%]
    - hardware heats up and decides to switch to 50% power
    - thermal subsystem sees that power is 50% (state=50), decides to increase
      it to 60% (state=40) because the rest of the system is cool.
    - mt7915e driver sets trip throttles to [60%, 30%, 15%, 7%]
    - hardware thus switches to 30% power
    [race to the bottom continues...]
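
The sequence above can be reproduced with a small model. This is only a sketch under stated assumptions: all `sketch_*` names are hypothetical, the halving per trip point is inferred from the figures in the list (100 gives 100, 50, 25, 12), and the governor is assumed to lower the state by a fixed step whenever the rest of the system is cool.

```c
#include <assert.h>

#define SKETCH_NUM_TRIPS 4u

/*
 * Power (percent) the firmware applies at trip point `trip` for a given
 * base power. Integer halving per trip is an assumption that matches the
 * figures in the list above.
 */
static unsigned int sketch_trip_power(unsigned int base, unsigned int trip)
{
	assert(trip < SKETCH_NUM_TRIPS);
	return base >> trip;
}

/*
 * One governor iteration while the hardware sits at trip point `trip`.
 * read_hw != 0: the governor reads the state derived from the hardware's
 *               actual power (the problematic behaviour).
 * read_hw == 0: the governor reads back the state it last wrote.
 * The governor then lowers the state by `step` and the result is returned
 * as the newly written state.
 */
static unsigned long sketch_governor_step(unsigned long written_state,
					  unsigned int trip,
					  unsigned long step, int read_hw)
{
	unsigned int base;
	unsigned long seen;

	assert(written_state <= 100);
	base = 100u - (unsigned int)written_state;
	seen = read_hw ? 100u - sketch_trip_power(base, trip) : written_state;
	return seen > step ? seen - step : 0;
}
```

Starting from state 0 with the hardware at the second trip point, `sketch_governor_step(0, 1, 10, 1)` yields state 40, after which `sketch_trip_power(60, 1)` is 30, as in the list. With `read_hw` set to 0 the written state only ever decreases toward 0, so the feedback loop does not arise.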
    
    This patch corrects the semantics of the cooling_device to match what
    the thermal subsystem expects.
    Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
init.c 28.4 KB