    thermal: core: Allow thermal zones to tell the core to ignore them · e528be3c
    Rafael J. Wysocki authored
    The iwlwifi wireless driver registers a thermal zone that is only needed
    when the network interface it handles is up, and it wants the core to
    effectively ignore that thermal zone otherwise.
    
    Before commit a8a26177 ("thermal: core: Call monitor_thermal_zone()
    if zone temperature is invalid"), that could be achieved by returning
    an error code from the thermal zone's .get_temp() callback, because
    the core hardly handled errors returned by it at all.
    However, commit a8a26177 made the core attempt to recover from the
    situation in which the temperature of a thermal zone cannot be
    determined because its .get_temp() callback keeps returning errors,
    so the temperature is always invalid from the core's perspective.
    
    That was done because there are thermal zones in which .get_temp()
    initially returns errors due to initialization ordering issues, but
    starts to produce valid temperature values at some point.
    
    Unfortunately, the simple approach taken by commit a8a26177,
    which is to poll the thermal zone periodically until its .get_temp()
    callback starts to return valid temperature values, is at odds with
    the special thermal zone in iwlwifi, in which .get_temp() may always
    return an error because its network interface may always be down.  If
    that happens, every invocation of the thermal zone's .get_temp()
    callback that results in an error causes the thermal core to print a
    dev_warn() message to the kernel log, which is super-noisy.
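    The failure mode described above can be modeled in user space with a
    minimal sketch. All names in it (struct fake_sensor,
    fake_sensor_get_temp(), fake_poll_until_valid()) are hypothetical,
    -EIO is an arbitrary illustrative error code, and the loop is bounded
    by max_polls only so that the example terminates; the real core keeps
    polling on a timer. It shows that a zone whose .get_temp() always
    fails produces one warning per poll.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sensor whose .get_temp() fails while the interface is down. */
struct fake_sensor {
	bool iface_up;
	int temp;	/* millidegrees Celsius */
};

/* Pre-fix driver behavior: report "no reading" as an error code. */
static int fake_sensor_get_temp(const struct fake_sensor *s, int *temp)
{
	if (!s->iface_up)
		return -EIO;	/* assumption: the exact error code is illustrative */
	*temp = s->temp;
	return 0;
}

/*
 * Pre-fix core model: every failed poll produces a warning and the zone
 * keeps being polled until a valid value shows up. max_polls bounds the
 * loop for the sake of the example. Returns the number of warnings that
 * would have been logged; *out receives the first valid temperature.
 */
static int fake_poll_until_valid(const struct fake_sensor *s, int max_polls,
				 int *out)
{
	int warnings = 0;

	assert(s && out && max_polls >= 0);

	for (int i = 0; i < max_polls; i++) {
		int temp;

		if (fake_sensor_get_temp(s, &temp)) {
			warnings++;	/* stands in for one dev_warn() line */
			continue;
		}
		*out = temp;
		break;
	}
	return warnings;
}
```

    With the interface down, 100 polls yield 100 warnings; with it up,
    the first poll succeeds and no warning is counted.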
    
    To address this problem, make the core treat specially the case in
    which .get_temp() returns 0 but the temperature value it provides is
    not actually valid.  Namely, make the core completely ignore the
    invalid temperature value coming from .get_temp() in that case, which
    requires folding update_temperature() into its caller and a few
    related changes.
    
    On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0
    and store THERMAL_TEMP_INVALID in the temperature return location,
    instead of returning an error, when the firmware is not running or
    is not of the right type.
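    A minimal user-space sketch of the new contract between the driver
    and the core, assuming simplified stand-ins for both: struct fake_mvm,
    struct fake_tz, fake_mvm_get_temp() and fake_zone_update() are
    hypothetical and are not the kernel's code. Only THERMAL_TEMP_INVALID
    is taken from include/linux/thermal.h (-274000, in millidegrees
    Celsius). The driver returns 0 with THERMAL_TEMP_INVALID when it has
    nothing to report, and the core leaves the zone untouched and silent
    in that case, while a genuine error would still count as a warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Value of THERMAL_TEMP_INVALID in include/linux/thermal.h (millidegrees Celsius). */
#define THERMAL_TEMP_INVALID (-274000)

/* Hypothetical stand-in for the iwlwifi firmware state. */
struct fake_mvm {
	bool fw_running;	/* firmware up and of the right type */
	int fw_temp;		/* last reading, millidegrees Celsius */
};

/* Hypothetical stand-in for the thermal zone as seen by the core. */
struct fake_tz {
	int temperature;	/* last temperature accepted by the core */
	int warnings;		/* how many times the core would have warned */
};

/*
 * Driver side, modeled on the described iwl_mvm_tzone_get_temp() change:
 * with no meaningful reading, return 0 and store THERMAL_TEMP_INVALID
 * instead of returning an error code.
 */
static int fake_mvm_get_temp(const struct fake_mvm *mvm, int *temp)
{
	if (!mvm->fw_running) {
		*temp = THERMAL_TEMP_INVALID;
		return 0;
	}
	*temp = mvm->fw_temp;
	return 0;
}

/*
 * Core side: an error from .get_temp() is still reported (this is where
 * the real core would dev_warn()), but a successful call that yields
 * THERMAL_TEMP_INVALID is ignored without touching the zone's state.
 */
static void fake_zone_update(struct fake_tz *tz, const struct fake_mvm *mvm)
{
	int temp;
	int ret;

	assert(tz && mvm);

	ret = fake_mvm_get_temp(mvm, &temp);
	if (ret) {
		tz->warnings++;
		return;
	}
	if (temp == THERMAL_TEMP_INVALID)
		return;		/* ignore the zone: no warning, no update */

	tz->temperature = temp;
}
```

    Using a success return plus an out-of-range sentinel lets the driver
    say "nothing to report" without tripping the core's error handling.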
    
    Also, to clearly separate the handling of invalid temperature
    values from the thermal zone initialization, introduce a special
    THERMAL_TEMP_INIT value specifically for the latter purpose.
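    A sketch of why a separate initialization marker helps, assuming
    THERMAL_TEMP_INIT is a value that no sensor can report (INT_MIN is
    used here, which is an assumption of this example). The struct and
    helpers (struct fake_tz, fake_tz_init(), fake_tz_accept(),
    fake_tz_can_detect_crossing()) are hypothetical. Since invalid
    readings are now simply ignored, the value meaning "no valid reading
    has been accepted yet" is kept distinct from the value .get_temp()
    uses to mean "no reading is available right now".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Value of THERMAL_TEMP_INVALID in include/linux/thermal.h (millidegrees Celsius). */
#define THERMAL_TEMP_INVALID (-274000)
/* Assumption: an initialization sentinel no sensor can report. */
#define THERMAL_TEMP_INIT INT_MIN

/* Hypothetical, simplified thermal zone state. */
struct fake_tz {
	int temperature;	/* most recent accepted reading */
	int last_temperature;	/* reading accepted before that */
};

/* Start with both values marked as "never set". */
static void fake_tz_init(struct fake_tz *tz)
{
	assert(tz);
	tz->temperature = THERMAL_TEMP_INIT;
	tz->last_temperature = THERMAL_TEMP_INIT;
}

/* Accept a reading; an invalid one leaves the zone untouched. */
static void fake_tz_accept(struct fake_tz *tz, int temp)
{
	if (temp == THERMAL_TEMP_INVALID)
		return;
	tz->last_temperature = tz->temperature;
	tz->temperature = temp;
}

/* A trip crossing needs a previous real reading to compare against. */
static bool fake_tz_can_detect_crossing(const struct fake_tz *tz)
{
	return tz->last_temperature != THERMAL_TEMP_INIT;
}
```

    After initialization, an invalid reading changes nothing; crossings
    become detectable only once two valid readings have been accepted.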
    
    Fixes: a8a26177 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
    Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/
    Reported-by: Eric Biggers <ebiggers@kernel.org>
    Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761
    Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
    Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net
    [ rjw: Rebased on top of the current mainline ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>