1. 28 Dec, 2023 3 commits
    • Rafael J. Wysocki's avatar
      thermal: core: Resume thermal zones asynchronously · 5a5efdaf
      Rafael J. Wysocki authored
      The resume of thermal zones in thermal_pm_notify() is carried out
      sequentially, which may be a problem if __thermal_zone_device_update()
      takes a significant time to run for some thermal zones, because some
      other thermal zones may need to wait for them to resume then and if
      any other PM notifiers are going to be invoked after the thermal one,
      they will need to wait for it either.
      
      To address this, make thermal_pm_notify() switch the poll_queue delayed
      work over to a one-shot thermal_zone_device_resume() work function that
      will restore the original one during the thermal zone resume and queue
      up poll_queue without a delay for each thermal zone.
      
      Link: https://lore.kernel.org/linux-pm/20231120234015.3273143-1-radusolea@google.com/Reported-by: default avatarRadu Solea <radusolea@google.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5a5efdaf
    • Rafael J. Wysocki's avatar
      thermal: core: Initialize poll_queue in thermal_zone_device_init() · 33fcb595
      Rafael J. Wysocki authored
      In preparation for a subsequent change, move the initialization of the
      poll_queue delayed work from thermal_zone_device_register_with_trips()
      to thermal_zone_device_init() which is called by the former.
      
      However, because thermal_zone_device_init() is also called by
      thermal_pm_notify(), make the latter call cancel_delayed_work() on
      poll_queue before invoking the former, so as to allow the work
      item to be re-initialized safely.
      
      Also move thermal_zone_device_check() which needs to be defined
      before thermal_zone_device_init(), so the latter can pass it to the
      INIT_DELAYED_WORK() macro.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      33fcb595
    • Rafael J. Wysocki's avatar
      thermal: core: Fix thermal zone suspend-resume synchronization · 4e814173
      Rafael J. Wysocki authored
      There are 3 synchronization issues with thermal zone suspend-resume
      during system-wide transitions:
      
       1. The resume code runs in a PM notifier which is invoked after user
          space has been thawed, so it can run concurrently with user space
          which can trigger a thermal zone device removal.  If that happens,
          the thermal zone resume code may use a stale pointer to the next
          list element and crash, because it does not hold thermal_list_lock
          while walking thermal_tz_list.
      
       2. The thermal zone resume code calls thermal_zone_device_init()
          outside the zone lock, so user space or an update triggered by
          the platform firmware may see an inconsistent state of a
          thermal zone leading to unexpected behavior.
      
       3. Clearing the in_suspend global variable in thermal_pm_notify()
          allows __thermal_zone_device_update() to continue for all thermal
          zones and it may as well run before the thermal_tz_list walk (or
          at any point during the list walk for that matter) and attempt to
          operate on a thermal zone that has not been resumed yet.  It may
          also race destructively with thermal_zone_device_init().
      
      To address these issues, add thermal_list_lock locking to
      thermal_pm_notify(), especially arount the thermal_tz_list,
      make it call thermal_zone_device_init() back-to-back with
      __thermal_zone_device_update() under the zone lock and replace
      in_suspend with per-zone bool "suspend" indicators set and unset
      under the given zone's lock.
      
      Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/Reported-by: default avatarBo Ye <bo.ye@mediatek.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4e814173
  2. 21 Dec, 2023 1 commit
  3. 15 Dec, 2023 1 commit
    • Rafael J. Wysocki's avatar
      thermal: core: Fix NULL pointer dereference in zone registration error path · 04e6ccfc
      Rafael J. Wysocki authored
      If device_register() in thermal_zone_device_register_with_trips()
      returns an error, the tz variable is set to NULL and subsequently
      dereferenced in kfree(tz->tzp).
      
      Commit adc8749b ("thermal/drivers/core: Use put_device() if
      device_register() fails") added the tz = NULL assignment in question to
      avoid a possible double-free after dropping the reference to the zone
      device.  However, after commit 4649620d ("thermal: core: Make
      thermal_zone_device_unregister() return after freeing the zone"), that
      assignment has become redundant, because dropping the reference to the
      zone device does not cause the zone object to be freed any more.
      
      Drop it to address the NULL pointer dereference.
      
      Fixes: 3d439b1a ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarLukasz Luba <lukasz.luba@arm.com>
      04e6ccfc
  4. 13 Dec, 2023 6 commits
  5. 12 Dec, 2023 2 commits
    • Rafael J. Wysocki's avatar
      thermal: core: Rework thermal zone availability check · b38aa87f
      Rafael J. Wysocki authored
      In order to avoid running __thermal_zone_device_update() for thermal
      zones going away, the thermal zone lock is held around device_del()
      in thermal_zone_device_unregister() and thermal_zone_device_update()
      passes the given thermal zone device to device_is_registered().
      This allows thermal_zone_device_update() to skip the
      __thermal_zone_device_update() if device_del() has already run for
      the thermal zone at hand.
      
      However, instead of looking at driver core internals, the thermal
      subsystem may as well rely on its own data structures for this
      purpose.  Namely, if the thermal zone is not present in
      thermal_tz_list, it can be regarded as unavailable, which in fact is
      already the case in thermal_zone_device_unregister().  Accordingly,
      the device_is_registered() check in thermal_zone_device_update() can
      be replaced with checking whether or not the node list_head in struct
      thermal_zone_device is empty, in which case it is not there in
      thermal_tz_list.
      
      To make this work, though, it is necessary to initialize tz->node
      in thermal_zone_device_register_with_trips() before registering the
      thermal zone device and it needs to be added to thermal_tz_list and
      deleted from it under its zone lock.
      
      After the above modifications, the zone lock does not need to be
      held around device_del() in thermal_zone_device_unregister() any more.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-and-tested-by: default avatarLukasz Luba <lukasz.luba@arm.com>
      Acked-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      b38aa87f
    • Rafael J. Wysocki's avatar
      thermal: Drop redundant and confusing device_is_registered() checks · c3ffdfff
      Rafael J. Wysocki authored
      Multiple places in the thermal subsystem (most importantly, sysfs
      attribute callback functions) check if the given thermal zone device is
      still registered in order to return early in case the device_del() in
      thermal_zone_device_unregister() has run already.
      
      However, after thermal_zone_device_unregister() has been made wait for
      all of the zone-related activity to complete before returning, it is
      not necessary to do that any more, because all of the code holding a
      reference to the thermal zone device object will be waited for even if
      it does not do anything special to enforce this.
      
      Accordingly, drop all of the device_is_registered() checks that are now
      redundant and get rid of the zone locking that is not necessary any more
      after dropping them.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-and-tested-by: default avatarLukasz Luba <lukasz.luba@arm.com>
      Acked-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      c3ffdfff
  6. 11 Dec, 2023 1 commit
  7. 06 Dec, 2023 2 commits
    • Rafael J. Wysocki's avatar
      thermal: sysfs: Rework the reading of trip point attributes · 18dfb0e4
      Rafael J. Wysocki authored
      Rework the _show() callback functions for the trip point temperature,
      hysteresis and type attributes to avoid copying the values of struct
      thermal_trip fields that they do not use and make them carry out the
      same validation checks as the corresponding _store() callback functions.
      
      No intentional functional impact.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      18dfb0e4
    • Rafael J. Wysocki's avatar
      thermal: sysfs: Rework the handling of trip point updates · be0a3600
      Rafael J. Wysocki authored
      Both trip_point_temp_store() and trip_point_hyst_store() use
      thermal_zone_set_trip() to update a given trip point, but none of them
      actually needs to change more than one field in struct thermal_trip
      representing it.  However, each of them effectively calls
      __thermal_zone_get_trip() twice in a row for the same trip index value,
      once directly and once via thermal_zone_set_trip(), which is not
      particularly efficient, and the way in which thermal_zone_set_trip()
      carries out the update is not particularly straightforward.
      
      Moreover, input processing need not be done under the thermal zone lock
      in any of these functions.
      
      Rework trip_point_temp_store() and trip_point_hyst_store() to address
      the above, move the part of thermal_zone_set_trip() that is still
      useful to a new function called thermal_zone_trip_updated() and drop
      the rest of it.
      
      While at it, make trip_point_hyst_store() reject negative hysteresis
      values.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      be0a3600
  8. 30 Nov, 2023 1 commit
  9. 28 Nov, 2023 7 commits
  10. 20 Nov, 2023 1 commit
    • Rafael J. Wysocki's avatar
      thermal: core: Add trip thresholds for trip crossing detection · 44844db9
      Rafael J. Wysocki authored
      The trip crossing detection in handle_thermal_trip() does not work
      correctly in the cases when a trip point is crossed on the way up and
      then the zone temperature stays above its low temperature (that is, its
      temperature decreased by its hysteresis).  The trip temperature may
      be passed by the zone temperature subsequently in that case, even
      multiple times, but that does not count as the trip crossing as long as
      the zone temperature does not fall below the trip's low temperature or,
      in other words, until the trip is crossed on the way down.
      
      |-----------low--------high------------|
                   |<--------->|
                   |    hyst   |
                   |           |
                   |          -|--> crossed on the way up
                   |
               <---|-- crossed on the way down
      
      However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up()
      every time the trip temperature is passed by the zone temperature on
      the way up regardless of whether or not the trip has been crossed on
      the way down yet.  Moreover, it will not call thermal_notify_tz_trip_down()
      if the last zone temperature was between the trip's temperature and its
      low temperature, so some "trip crossed on the way down" events may not
      be reported.
      
      To address this issue, introduce trip thresholds equal to either the
      temperature of the given trip, or its low temperature, such that if
      the trip's threshold is passed by the zone temperature on the way up,
      its value will be set to the trip's low temperature and
      thermal_notify_tz_trip_up() will be called, and if the trip's threshold
      is passed by the zone temperature on the way down, its value will be set
      to the trip's temperature (high) and thermal_notify_tz_trip_down() will
      be called.  Accordingly, if the threshold is passed on the way up, it
      cannot be passed on the way up again until its passed on the way down
      and if it is passed on the way down, it cannot be passed on the way down
      again until it is passed on the way up which guarantees correct
      triggering of trip crossing notifications.
      
      If the last temperature of the zone is invalid, the trip's threshold
      will be set depending of the zone's current temperature: If that
      temperature is above the trip's temperature, its threshold will be
      set to its low temperature or otherwise its threshold will be set to
      its (high) temperature.  Because the zone temperature is initially
      set to invalid and tz->last_temperature is only updated by
      update_temperature(), this is sufficient to set the correct initial
      threshold values for all trips.
      
      Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.orgSigned-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      44844db9
  11. 19 Nov, 2023 8 commits
  12. 18 Nov, 2023 7 commits