1. 12 Jan, 2024 2 commits
    • thermal/debugfs: Add thermal debugfs information for mitigation episodes · 7ef01f22
      Daniel Lezcano authored
      Record the mitigation episodes. A mitigation episode starts when the
      temperature crosses the first trip point on the way up and ends when
      it crosses that trip point again on the way down. Other trip points
      crossed during the episode are accounted to it. The interesting
      information is the average temperature at each trip point, the
      undershoot and the overshoot. The standard deviation of the mitigated
      temperature will be added later.
      
      The thermal debugfs directory structure tries to stay consistent with
      the sysfs one but in a very simplified way:
      
      thermal/
       `-- thermal_zones
           |-- 0
           |   `-- mitigations
           `-- 1
               `-- mitigations
      
      The content of the mitigations file has the following format:
      
      ,-Mitigation at 349988258us, duration=130136ms
      | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
      |    0 |  passive |     65000 |      2000 |     130136 |     68227 |     62500 |     75625 |
      |    1 |  passive |     75000 |      2000 |     104209 |     74857 |     71666 |     77500 |
      ,-Mitigation at 272451637us, duration=75000ms
      | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
      |    0 |  passive |     65000 |      2000 |      75000 |     68561 |     62500 |     75000 |
      |    1 |  passive |     75000 |      2000 |      60714 |     74820 |     70555 |     77500 |
      ,-Mitigation at 238184119us, duration=27316ms
      | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
      |    0 |  passive |     65000 |      2000 |      27316 |     73377 |     62500 |     75000 |
      |    1 |  passive |     75000 |      2000 |      19468 |     75284 |     69444 |     77500 |
      ,-Mitigation at 39863713us, duration=136196ms
      | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
      |    0 |  passive |     65000 |      2000 |     136196 |     73922 |     62500 |     75000 |
      |    1 |  passive |     75000 |      2000 |      91721 |     74386 |     69444 |     78125 |
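
The format above is regular enough to be parsed by a small script. The following is a minimal sketch, not part of the patch: the names `parse_mitigations`, `Mitigation` and `TripStats` are hypothetical, and the layout is assumed to be exactly the one shown in the sample (an episode line starting with `,-Mitigation at`, a column-title row, then one row per trip with eight columns).

```python
import re
from dataclasses import dataclass, field

# Episode line, e.g. ",-Mitigation at 349988258us, duration=130136ms"
EPISODE_RE = re.compile(r"^,-Mitigation at (\d+)us, duration=(\d+)ms$")


@dataclass
class TripStats:
    trip: int
    type: str
    temp: int      # trip temperature, millidegrees Celsius
    hyst: int      # hysteresis, millidegrees Celsius
    duration: int  # unit not stated in the column title
    avg: int
    min_temp: int
    max_temp: int


@dataclass
class Mitigation:
    start_us: int
    duration_ms: int
    trips: list = field(default_factory=list)


def parse_mitigations(text):
    """Return the episodes found in the text of a 'mitigations' file."""
    episodes = []
    for raw in text.splitlines():
        line = raw.strip()
        match = EPISODE_RE.match(line)
        if match:
            episodes.append(Mitigation(int(match.group(1)), int(match.group(2))))
            continue
        if not line.startswith("|") or not episodes:
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 8 or not cells[0].isdigit():
            continue  # column-title row or unexpected line
        episodes[-1].trips.append(
            TripStats(int(cells[0]), cells[1], *map(int, cells[2:]))
        )
    return episodes


SAMPLE = """\
,-Mitigation at 349988258us, duration=130136ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |     130136 |     68227 |     62500 |     75625 |
|    1 |  passive |     75000 |      2000 |     104209 |     74857 |     71666 |     77500 |
,-Mitigation at 272451637us, duration=75000ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |      75000 |     68561 |     62500 |     75000 |
|    1 |  passive |     75000 |      2000 |      60714 |     74820 |     70555 |     77500 |
"""

if __name__ == "__main__":
    for episode in parse_mitigations(SAMPLE):
        overshoot = max(t.max_temp - t.temp for t in episode.trips)
        print(episode.start_us, episode.duration_ms, overshoot)
```

The overshoot computed in the usage part is only an illustration of what can be derived from the `max` and `temp` columns; the patch itself does not define it that way.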
      
      More information for a better understanding of the thermal behavior
      will be added later. The idea is to give detailed statistics about
      the undershoots and overshoots, the temperature speed, etc. As
      putting all the information in a single file would be too much, the
      idea is to create a directory named after the mitigation timestamp
      where all the data could be added.
      
      Please note this code is robust against trip ordering but not against
      a trip temperature change while a mitigation is happening. However,
      this situation should be extremely rare, and may never happen in
      practice, so it is worth asking first whether something should be
      done in the core framework for other components.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      [ rjw: White space fixups, rebase ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • thermal/debugfs: Add thermal cooling device debugfs information · 755113d7
      Daniel Lezcano authored
      The thermal framework does not have any debug information except a
      sysfs stat interface, which is a bit controversial. It allocates big
      chunks of memory for every cooling device with a high number of
      states, which on some production systems can amount to several
      megabytes of memory for just a portion of it. As sysfs output is
      limited to one page, the output is not usable with large data arrays
      and gets truncated.
      
      This patch provides the same information as sysfs, except that the
      transitions are dynamically allocated, so only the transitions that
      actually occurred are shown. There is no longer a size limitation,
      and this opens the way for more debugging information, which is what
      debugfs, not sysfs, is designed for.
      
      The thermal debugfs directory structure tries to stay consistent with
      the sysfs one but in a very simplified way:
      
      thermal/
      `-- cooling_devices
          |-- 0
          |   |-- clear
          |   |-- time_in_state_ms
          |   |-- total_trans
          |   `-- trans_table
          |-- 1
          |   |-- clear
          |   |-- time_in_state_ms
          |   |-- total_trans
          |   `-- trans_table
          |-- 2
          |   |-- clear
          |   |-- time_in_state_ms
          |   |-- total_trans
          |   `-- trans_table
          |-- 3
          |   |-- clear
          |   |-- time_in_state_ms
          |   |-- total_trans
          |   `-- trans_table
          `-- 4
              |-- clear
              |-- time_in_state_ms
              |-- total_trans
              `-- trans_table
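
A layout like the one above can be collected with a short script. This is a minimal sketch under stated assumptions: `read_cooling_devices` is a hypothetical helper, the root directory is passed in by the caller (debugfs is conventionally mounted at /sys/kernel/debug, so the root would typically be /sys/kernel/debug/thermal, but that path is an assumption), and `clear` is skipped because it is assumed here to be a control file rather than a statistic.

```python
from pathlib import Path

# Files read for each cooling device; 'clear' is deliberately left out
# (assumed to be a control file, not a statistic).
STAT_FILES = ("time_in_state_ms", "total_trans", "trans_table")


def read_cooling_devices(root):
    """Map each cooling device id to the raw text of its statistics files."""
    base = Path(root) / "cooling_devices"
    stats = {}
    for dev_dir in sorted(base.iterdir(), key=lambda p: p.name):
        if not dev_dir.is_dir():
            continue
        stats[dev_dir.name] = {
            name: (dev_dir / name).read_text()
            for name in STAT_FILES
            if (dev_dir / name).is_file()
        }
    return stats
```

Reading debugfs normally requires root privileges, so on a real system a call such as `read_cooling_devices("/sys/kernel/debug/thermal")` would have to run with sufficient permissions.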
      
      The content of the files in the cooling devices directory is the same
      as the sysfs one except for the trans_table which has the following
      format:
      
      Transition	Hits
      1->0      	246
      0->1      	246
      2->1      	632
      1->2      	632
      3->2      	98
      2->3      	98
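
Because only the transitions that occurred are listed, the table maps naturally onto a dictionary. The sketch below is illustrative only: `parse_trans_table` is a hypothetical name, and the format is assumed to be exactly the one shown above (a title row followed by `from->to` and a hit count separated by whitespace).

```python
def parse_trans_table(text):
    """Return {(from_state, to_state): hits} from the text of a trans_table file."""
    hits = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or "->" not in fields[0]:
            continue  # title row or blank line
        src, dst = fields[0].split("->")
        hits[(int(src), int(dst))] = int(fields[1])
    return hits


SAMPLE = (
    "Transition\tHits\n"
    "1->0      \t246\n"
    "0->1      \t246\n"
    "2->1      \t632\n"
    "1->2      \t632\n"
    "3->2      \t98\n"
    "2->3      \t98\n"
)

if __name__ == "__main__":
    table = parse_trans_table(SAMPLE)
    print(sum(table.values()))  # total number of recorded transitions
```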
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      [ rjw: White space fixups, rebase ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 09 Jan, 2024 4 commits
  3. 04 Jan, 2024 1 commit
  4. 02 Jan, 2024 25 commits
  5. 29 Dec, 2023 8 commits