Commits · c4026d3e2578291016703cd75f3a6f786f60cd80 · Kirill Smelkov / linux

14 Dec, 2022 29 commits

dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range · c4026d3e

Bryan Brattlof authored Oct 31, 2022

Only some of TI's J721E SoCs will need an eFuse register range mapped to
determine if they're affected by TI's i2128 erratum. All other SoCs will
not need this eFuse range to function properly.

Update the bindings for the k3_j72xx_bandgap thermal driver so other
devices will only need two register ranges.
Signed-off-by: Bryan Brattlof <bb@ti.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221031232702.10339-7-bb@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

dt-bindings: thermal: k3-j72xx: elaborate on binding description · effe8db0

Bryan Brattlof authored Oct 31, 2022

Elaborate on the function of this device node as well as some of the
properties this node uses.
Signed-off-by: Bryan Brattlof <bb@ti.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221031232702.10339-6-bb@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround · 366444eb

Bryan Brattlof authored Oct 31, 2022

Some of TI's J721E SoCs require a software trimming procedure for the
temperature monitors to function properly. A particular J721E is known
to be unaffected by this erratum when both bits in the WKUP_SPARE_FUSE0
region are set. Other SoCs, not affected by this erratum, will not
need this region.

Map the 'fuse_base' region only when the erratum fix is needed.
Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://lore.kernel.org/r/20221031232702.10339-5-bb@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure · 156f0e2f

Bryan Brattlof authored Oct 31, 2022

'fuse_base' is only needed during the initial probe function to provide
data for a software trimming method for some of TI's devices affected by
the i2128 erratum. The devices not affected will not use this region.

Remove fuse_base from the main k3_j72xx_bandgap structure.
Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://lore.kernel.org/r/20221031232702.10339-4-bb@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag · 311f328f

Bryan Brattlof authored Oct 31, 2022

Some of TI's J721E SoCs require a software trimming method to report
temperatures accurately. Currently we are using a few different data
types to indicate when we should apply the erratum.

Change the 'workaround_needed' variable's data type to a bool to align
with how we are using this variable currently.
Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://lore.kernel.org/r/20221031232702.10339-3-bb@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function · 46cab93a

Bryan Brattlof authored Oct 31, 2022

The k3_thermal_get_temp() function can be simplified to return only
the result of k3_bgp_read_temp() without needing the 'ret' variable.
Signed-off-by: Bryan Brattlof <bb@ti.com>
Link: https://lore.kernel.org/r/20221031232702.10339-2-bb@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/qcom: Demote error log of thermal zone register to debug · 2baad249

Manivannan Sadhasivam authored Oct 29, 2022

devm_thermal_of_zone_register() can fail with -ENODEV if the thermal zone
for the channel is not represented in DT. This is perfectly fine since not
all sensors need to be used for thermal zones, only a few in the real world.

So demote the error log to debug to avoid spamming users.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20221029052933.32421-1-manivannan.sadhasivam@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

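The demotion in 2baad249 can be pictured with a small user-space sketch that picks a log level from the registration error code. This is not the driver code: `pick_log_level()` and the `LOG_*` names are hypothetical, and only the -ENODEV case comes from the commit message. Keeping other failures at error level is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical log levels standing in for dev_err() and dev_dbg(). */
enum log_level { LOG_NONE, LOG_DEBUG, LOG_ERROR };

/*
 * Decide how loudly to report the result of registering a thermal zone.
 * 'err' is 0 on success or a negative errno value on failure.
 */
static enum log_level pick_log_level(int err)
{
	assert(err <= 0);

	if (err == 0)
		return LOG_NONE; /* registered, nothing to report */
	if (err == -ENODEV)
		return LOG_DEBUG; /* no zone in DT for this channel: expected */
	return LOG_ERROR; /* assumption: other failures stay errors */
}
```

With this shape, a board that describes only a few sensors as thermal zones produces no error-level output for the unused channels.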

thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2 · 8763f8ac

Luca Weiss authored Oct 20, 2022

On gen2 chips the stage2 threshold is not 140 degC but 125 degC.

Make the warning message clearer by using this variable and also by
including the temperature that was checked for.

Fixes: aa92b331 ("thermal/drivers/qcom-spmi-temp-alarm: Add support for GEN2 rev 1 PMIC peripherals")
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Amit Kucheria <amitk@kernel.org>
Link: https://lore.kernel.org/r/20221020145237.942146-1-luca.weiss@fairphone.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450 · fa17c413

Krzysztof Kozlowski authored Nov 16, 2022

Narrow the number of interrupts per variant: SC8280XP, SM6350 and SM8450.
The compatibles are already used and described. They only missed the
constraints on the number of interrupts.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221116113140.69587-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/core/power allocator: Remove a useless include · de04f680

Christophe JAILLET authored Nov 26, 2022

This file does not use rcu, so there is no point in including
<linux/rculist.h>.

Remove it.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/9adeec47cb5a8193016272d5c8bf936235c1711d.1669459337.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/imx8mm: Add hwmon support · de95d134

Alexander Stein authored Jul 26, 2022

Expose thermal sensors as HWMON devices.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20220726122331.323093-1-alexander.stein@ew.tq-group.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message · 6f894164

Johan Hovold authored Nov 02, 2022

Drivers should not be logging errors on probe deferral. Switch to using
dev_err_probe() to log failures when parsing the devicetree to avoid
errors like:

qcom-spmi-adc-tm5 c440000.spmi:pmic@0:adc-tm@3400: get dt data failed: -517

when a channel is not yet available.
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20221102152630.696-1-johan+linaro@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

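To illustrate why dev_err_probe() silences the message quoted in 6f894164, here is a simplified user-space model of its behavior. `model_err_probe()` and `error_messages_logged` are hypothetical names; the real helper also takes a device pointer and a format string, which this sketch leaves out. The value 517 matches the error code in the quoted log line.

```c
#include <assert.h>
#include <stdio.h>

#define EPROBE_DEFER_MODEL 517 /* value of the kernel-internal EPROBE_DEFER */

/* Counts messages that would have reached the error log. */
static int error_messages_logged;

/*
 * Simplified model of dev_err_probe(): report 'err' at error level unless
 * it is a probe deferral, and hand the same code back to the caller.
 */
static int model_err_probe(int err, const char *msg)
{
	assert(err < 0);

	if (err != -EPROBE_DEFER_MODEL) {
		fprintf(stderr, "error %d: %s\n", err, msg);
		error_messages_logged++;
	}
	return err;
}
```

Because the helper returns the error it was given, a caller can log and propagate the failure in a single `return` statement.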

dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC · c464856e

Daniel Golle authored Nov 30, 2022

Document compatible string 'mediatek,mt7986-thermal' for V3 thermal
unit found in MT7986 SoCs.
'mediatek,mt7981-thermal' is also added as it is identical with the
thermal unit of MT7986.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal: ti-soc-thermal: Drop comma after SoC match table sentinel · 3f9cb579

Geert Uytterhoeven authored Nov 21, 2022

It does not make sense to have a comma after a sentinel, as any new
elements must be added before the sentinel.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Keerthy <j-keerthy@ti.com>
Link: https://lore.kernel.org/r/1d6de2a80b919cb11199e56ac06ad21c273ebe57.1669045586.git.geert+renesas@glider.be
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

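The sentinel convention behind 3f9cb579 looks roughly like the sketch below. The `soc_match` structure, the table contents and `count_matches()` are hypothetical and only illustrate the pattern: real entries go before the sentinel, so the sentinel itself carries no trailing comma.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical match entry; a NULL 'family' marks the end of the table. */
struct soc_match {
	const char *family;
};

static const struct soc_match soc_table[] = {
	{ .family = "soc-a" },
	{ .family = "soc-b" },
	{ .family = NULL } /* sentinel: nothing may follow, so no comma */
};

/* Count the entries that precede the sentinel. */
static size_t count_matches(const struct soc_match *table)
{
	size_t n = 0;

	assert(table != NULL);
	while (table[n].family != NULL)
		n++;
	return n;
}
```

Leaving the comma off makes a patch that appends an entry after the sentinel stand out in review.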

thermal/drivers/imx: Add support for loading calibration data from OCOTP · 40329164

Marek Vasut authored Dec 02, 2022

The TMU TASR, TCALIVn, TRIM registers must be explicitly programmed with
calibration values from OCOTP. Add support for reading the OCOTP calibration
data and programming those into the TMU hardware.

The MX8MM/MX8MN TMUv1 uses only one OCOTP cell, while MX8MP TMUv2 uses 4;
the programming differs in each case.

Based on U-Boot commits:
70487ff386c ("imx8mm: Load fuse for TMU TCALIV and TASR")
ebb9aab318b ("imx: load calibration parameters from fuse for i.MX8MP")
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

dt-bindings: thermal: imx8mm-thermal: Document optional nvmem-cells · 8848c0d7

Marek Vasut authored Dec 02, 2022

The TMU TASR, TCALIVn, TRIM registers must be explicitly programmed with
calibration values from OCOTP. Document optional phandle to OCOTP nvmem
provider.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/qcom/tsens: Rework debugfs file structure · 89992d95

Christian Marangi authored Oct 22, 2022

The current tsens debugfs structure is composed of:
- a tsens dir in debugfs with a version file
- a directory for each tsens instance with a sensors file to dump all the
  sensor values.

This works on the assumption that we have the same version for each
instance, but this assumption seems fragile and, with more than one tsens
instance, results in the version file not tracking each of them.

A better approach is to just create a subdirectory for each tsens
instance and put the version and sensors debugfs files there.

Using this new implementation results in less code since debugfs entries
are created only on successful tsens probe.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20221022125657.22530-4-ansuelsmth@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/qcom/tsens: Fix wrong version id dbg_version_show · c7e077e9

Christian Marangi authored Oct 22, 2022

For VER_0 the version was incorrectly reported as 0.1.0.

Fix that and correctly report the major version for this old tsens
revision.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20221022125657.22530-3-ansuelsmth@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/qcom/tsens: Init debugfs only with successful probe · de48d876

Christian Marangi authored Oct 22, 2022

Calibrate and tsens_register can fail or defer probing (-EPROBE_DEFER). This
will cause a double or a wrong init of the debugfs information. Init debugfs
only after a successful probe, fixing the warning about the directory already
being present.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Thara Gopinath <thara.gopinath@linaro.org>
Link: https://lore.kernel.org/r/20221022125657.22530-2-ansuelsmth@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>

thermal/drivers/tsens: Add IPQ8074 support · 6840455d

Robert Marko authored Aug 19, 2022

Qualcomm IPQ8074 uses tsens v2.3 IP; however, unlike other tsens v2 IP,
it only has one IRQ, which is used for up/low as well as critical.
It also does not support negative trip temperatures.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220818220245.338396-4-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

thermal/drivers/tsens: Allow configuring min and max trips · f63baced

Robert Marko authored Aug 19, 2022

IPQ8074 and IPQ6018 don't support negative trip temperatures and support
up to 204 degrees C as the max trip temperature.

So, instead of always setting -40 degrees C as min and 120 degrees C as max,
allow them to be configured as part of the features.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220818220245.338396-3-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

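One way to picture the per-SoC trip limits introduced in f63baced is a small limits structure consulted when a trip is programmed. Everything named here (`trip_limits`, `clamp_trip()`, the two tables) is hypothetical. The sketch assumes millidegrees Celsius and a minimum of 0 for parts without negative trips; the -40, 120 and 204 degree figures are the ones quoted in the commit message.

```c
#include <assert.h>

/* Hypothetical per-SoC limits; temperatures in millidegrees C (assumed). */
struct trip_limits {
	int trip_min_temp;
	int trip_max_temp;
};

/* The previously fixed range: -40 to 120 degrees C. */
static const struct trip_limits default_limits = { -40000, 120000 };
/* A part with no negative trips and a 204 degrees C ceiling. */
static const struct trip_limits ipq_limits = { 0, 204000 };

/* Clamp a requested trip temperature to what the hardware accepts. */
static int clamp_trip(const struct trip_limits *limits, int temp)
{
	assert(limits->trip_min_temp <= limits->trip_max_temp);

	if (temp < limits->trip_min_temp)
		return limits->trip_min_temp;
	if (temp > limits->trip_max_temp)
		return limits->trip_max_temp;
	return temp;
}
```

Keeping the limits in data rather than in constants lets a new SoC variant be described without touching the clamping logic.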

thermal/drivers/tsens: Add support for combined interrupt · 4360af35

Robert Marko authored Aug 19, 2022

Despite using tsens v2.3 IP, IPQ8074 and IPQ6018 only have one IRQ for
signaling both up/low and critical trips.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20220818220245.338396-2-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

dt-bindings: thermal: tsens: Add ipq8074 compatible · c6db32ec

Robert Marko authored Aug 19, 2022

Qualcomm IPQ8074 has a tsens v2.3.0 block, though unlike existing v2 IP it
only uses one IRQ, so the tsens v2 compatible cannot be used as the fallback.

We also have to make sure that correct interrupts are set according to
compatibles, so populate interrupt information per compatibles.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220818220245.338396-1-robimarko@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

thermal/of: Fix memory leak on thermal_of_zone_register() failure · 7ef2f023

Ido Schimmel authored Oct 20, 2022

The function does not free 'of_ops' upon failure, leading to a memory
leak [1].

Fix by freeing 'of_ops' in the error path.

[1]
unreferenced object 0xffff8ee846198c80 (size 128):
  comm "swapper/0", pid 1, jiffies 4294699704 (age 70.076s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    d0 3f 6e 8c ff ff ff ff 00 00 00 00 00 00 00 00  .?n.............
  backtrace:
    [<00000000d136f562>] __kmalloc_node_track_caller+0x42/0x120
    [<0000000063f31678>] kmemdup+0x1d/0x40
    [<00000000e6d24096>] thermal_of_zone_register+0x49/0x520
    [<000000005e78c755>] devm_thermal_of_zone_register+0x54/0x90
    [<00000000ee6b209e>] pmbus_add_sensor+0x1b4/0x1d0
    [<00000000896105e3>] pmbus_add_sensor_attrs_one+0x123/0x440
    [<0000000049e990a6>] pmbus_add_sensor_attrs+0xfe/0x1d0
    [<00000000466b5440>] pmbus_do_probe+0x66b/0x14e0
    [<0000000084d42285>] i2c_device_probe+0x13b/0x2f0
    [<0000000029e2ae74>] really_probe+0xce/0x2c0
    [<00000000692df15c>] driver_probe_device+0x19/0xd0
    [<00000000547d9cce>] __device_attach_driver+0x6f/0x100
    [<0000000020abd24b>] bus_for_each_drv+0x76/0xc0
    [<00000000665d9563>] __device_attach+0xfc/0x180
    [<000000008ddd4d6a>] bus_probe_device+0x82/0xa0
    [<000000009e61132b>] device_add+0x3fe/0x920

Fixes: 3fd6d6e2 ("thermal/of: Rework the thermal device tree initialization")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20221020103658.802457-1-idosch@nvidia.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

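The class of leak fixed in 7ef2f023 can be sketched in user space with malloc() standing in for kmemdup(). The names `ops_model`, `register_zone()`, `zone_register()` and `live_allocations` are hypothetical; the point is only that the duplicated ops structure is released on the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ops_model { int (*get_temp)(int *temp); };

static int live_allocations; /* copies that have not been freed yet */

/* Hypothetical registration step that can be made to fail. */
static int register_zone(const struct ops_model *ops, int should_fail)
{
	(void)ops;
	return should_fail ? -1 : 0;
}

/*
 * Duplicate 'ops' and register a zone with the copy. On failure the copy
 * is released before returning, so nothing is leaked.
 */
static struct ops_model *zone_register(const struct ops_model *ops,
				       int should_fail)
{
	struct ops_model *copy;

	assert(ops != NULL);
	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy, ops, sizeof(*copy));
	live_allocations++;

	if (register_zone(copy, should_fail) < 0)
		goto free_copy;
	return copy;

free_copy:
	free(copy);
	live_allocations--;
	return NULL;
}
```

On success the caller owns the returned copy and is responsible for freeing it later.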

thermal/drivers/k3_j72xx_bandgap: Fix the debug print message · a7c42af7

Keerthy authored Oct 10, 2022

The debug print message to check the workaround applicability is inverted.
Fix the same.

Fixes: ffcb2fc8 ("thermal: k3_j72xx_bandgap: Add the bandgap driver support")
Reported-by: Bryan Brattlof <bb@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Link: https://lore.kernel.org/r/20221010034126.3550-1-j-keerthy@ti.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

dt-bindings: thermal: Convert generic-adc-thermal to DT schema · 87f9fe8c

Rob Herring authored Oct 11, 2022

Convert the 'generic-adc-thermal' binding to DT schema format.

The binding said '#thermal-sensor-cells' should be 1, but all in-tree
users are 0, and 1 doesn't make sense for a single channel.

Drop the example's related providers and consumers of the
'generic-adc-thermal' node as the convention is to not have those in
the examples.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221011175235.3191509-1-robh@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

thermal/drivers/imx8mm_thermal: Validate temperature range · d37edc73

Marcus Folkesson authored Oct 14, 2022

Check against the upper temperature limit (125 degrees C) before
considering the temperature valid.

Fixes: 5eed800a ("thermal: imx8mm: Add support for i.MX8MM thermal monitoring unit")
Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Reviewed-by: Jacky Bai <ping.bai@nxp.com>
Link: https://lore.kernel.org/r/20221014073507.1594844-1-marcus.folkesson@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

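A range check of the kind described in d37edc73 might look like the sketch below. `sketch_get_temp()` and `SKETCH_TEMP_MAX_C` are hypothetical names; the 125 degree limit is from the commit message, while returning -EAGAIN for an out-of-range reading and reporting millidegrees are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_TEMP_MAX_C 125 /* upper limit named in the commit message */

/*
 * Convert a raw reading in whole degrees C to millidegrees, rejecting
 * values above the limit. Returns 0 on success, -EAGAIN otherwise
 * (the error code is an assumption of this sketch).
 */
static int sketch_get_temp(int raw_degrees_c, int *temp_mc)
{
	assert(temp_mc != NULL);

	if (raw_degrees_c > SKETCH_TEMP_MAX_C)
		return -EAGAIN;
	*temp_mc = raw_degrees_c * 1000;
	return 0;
}
```

The output parameter is left untouched when the reading is rejected, so a caller never sees a half-validated value.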

thermal/drivers/imx8mm_thermal: Use GENMASK() when appropriate · 1f455f14

Marcus Folkesson authored Oct 14, 2022

GENMASK() is the preferred way to define bitmasks.
Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Link: https://lore.kernel.org/r/20221014081620.1599511-1-marcus.folkesson@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

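For readers unfamiliar with the macro used in 1f455f14, GENMASK(h, l) builds a contiguous mask covering bits l through h. The function below is a user-space stand-in limited to 32 bits; `genmask_u32()` is a hypothetical name, not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space stand-in for the kernel's GENMASK(h, l): a 32-bit mask with
 * bits l through h (inclusive) set.
 */
static uint32_t genmask_u32(unsigned int h, unsigned int l)
{
	assert(h < 32 && l <= h);
	return (UINT32_MAX << l) & (UINT32_MAX >> (31 - h));
}
```

Naming the bit range directly, for example bits 15 down to 8, is harder to get wrong than a hand-written hexadecimal constant such as 0xff00.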

dt-bindings: thermal: tsens: Add sm8450 compatible · f0f4c3ad

Luca Weiss authored Oct 16, 2022

Document the tsens-v2 compatible for sm8450 SoC.
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221016090035.565350-5-luca@z3ntu.xyz
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

02 Dec, 2022 1 commit

Merge Intel thermal control drivers changes for v6.2 · 7d4b19ab

Rafael J. Wysocki authored Dec 02, 2022

 - Add Raptor Lake-S support to the intel_tcc_cooling driver (Zhang
   Rui).

 - Make the intel_tcc_cooling driver detect TCC locking (Zhang Rui).

 - Address Coverity warning in intel_hfi_process_event() (Ricardo Neri).

 - Prevent accidental clearing of HFI in the package thermal interrupt
   status (Srinivas Pandruvada).

 - Protect the clearing of status bits in MSR_IA32_PACKAGE_THERM_STATUS
   and MSR_IA32_THERM_STATUS (Srinivas Pandruvada).

 - Allow the HFI interrupt handler to ACK an event for the same
   timestamp (Srinivas Pandruvada).

* thermal-intel:
  thermal: intel: hfi: ACK HFI for the same timestamp
  thermal: intel: Protect clearing of thermal status bits
  thermal: intel: Prevent accidental clearing of HFI status
  thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S
  thermal: intel: intel_tcc_cooling: Detect TCC lock bit
  thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages

25 Nov, 2022 1 commit

thermal: core: fix some possible name leaks in error paths · 4748f968

Yang Yingliang authored Nov 15, 2022

In some error paths before device_register(), the names allocated
by dev_set_name() are not freed. Move dev_set_name() to just before
device_register(), so the name can be freed when calling
put_device().

Fixes: 1dd7128b ("thermal/core: Fix null pointer dereference in thermal_release()")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

23 Nov, 2022 3 commits

thermal: intel: hfi: ACK HFI for the same timestamp · c0e3acdc

Srinivas Pandruvada authored Nov 16, 2022

Some processors issue more than one HFI interrupt with the same
timestamp. Each interrupt must be acknowledged to let the hardware issue
new HFI interrupts. But this can't be done without some additional flow
modification in the existing interrupt handling.

For background, the HFI interrupt is a package level thermal interrupt
delivered via an LVT. This LVT is common for both the CPU and package
level interrupts. Hence, all CPUs receive the HFI interrupts. But only
one CPU should process the interrupt; the others simply exit by issuing
EOI to the LAPIC.

The current HFI interrupt processing flow:

  1. Receive Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spinlock, one CPU will enter spinlock and others
     will simply return from here to issue EOI.
    (Let's assume CPU 4 is processing interrupt)
  4. Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.      ignore interrupt, unlock and return
  7. Copy the HFI message to local buffer
  8. unlock spinlock
  9. ACK HFI interrupt
 10. Queue the message for processing in a work-queue

It is tempting to simply acknowledge all the interrupts even if they
have the same timestamp. This may cause some interrupts to not be
processed.

Let's say CPU5 is slightly late and reaches step 4 while CPU4 is
between steps 8 and 9.

Currently we simply ignore interrupts with the same timestamp. No
issue here for CPU5. When CPU4 acknowledges the interrupt, the next
HFI interrupt can be delivered.

If we acknowledge interrupts with the same timestamp (at step 6), there
is a race condition. Under the same scenario, CPU 5 will acknowledge
the HFI interrupt. This lets hardware generate another HFI interrupt
before CPU 4 starts executing step 9. Once CPU 4 completes step 9, it
will have acknowledged the newly arrived HFI interrupt without actually
processing it.

Acknowledge the interrupt when holding the spinlock. This avoids
contention of the interrupt acknowledgment.

Updated flow:

  1. Receive HFI Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spin-lock
     Let's assume CPU 4 is processing interrupt
  4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit
  4.2      If HFI status is 0
  4.3              unlock spinlock
  4.4              return
  4.5 Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.1      ACK HFI interrupt
  6.2      unlock spinlock
  6.3      return
  7. Copy the HFI message to local buffer
  8. ACK HFI interrupt
  9. unlock spinlock
 10. Queue the message for processing in a work-queue

To avoid taking the lock unnecessarily, intel_hfi_process_event() checks
the status of the HFI interrupt before taking the lock. If CPU5 is late,
when it starts processing the interrupt there are two scenarios:

 a) CPU4 acknowledged the HFI interrupt before CPU5 read
    MSR_IA32_THERM_STATUS. CPU5 exits.

 b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the
    interrupt. CPU5 will take the lock if CPU4 has released it. It then
    re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt,
    the HFI status bit is clear and CPU5 exits. If a new HFI interrupt
    was generated it will find that the status bit is set and it will
    continue to process the interrupt. In this case even if timestamp
    is not changed, the ACK can be issued as this is a new interrupt.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Arshad, Adeel <adeel.arshad@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

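The updated flow from c0e3acdc can be walked through with a single-threaded model. It does not reproduce real concurrency: `hfi_model`, `hfi_handle()` and the result codes are hypothetical, a boolean stands in for the spinlock, and clearing `hfi_status` stands in for the ACK. The step numbers in the comments refer to the "Updated flow" list in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state touched by the interrupt handler. */
struct hfi_model {
	bool lock_held; /* stands in for the spinlock */
	bool hfi_status; /* HFI bit in the package thermal status */
	uint64_t hw_timestamp; /* time-stamp in the HFI memory */
	uint64_t stored_timestamp; /* time-stamp of the last processed event */
	unsigned int queued; /* messages handed to the work-queue */
};

enum hfi_result { HFI_BUSY, HFI_NO_EVENT, HFI_ACKED_SAME, HFI_QUEUED };

static enum hfi_result hfi_handle(struct hfi_model *m)
{
	assert(m != NULL);

	if (!m->hfi_status) /* step 2: nothing pending */
		return HFI_NO_EVENT;
	if (m->lock_held) /* step 3: another CPU owns the lock */
		return HFI_BUSY;
	m->lock_held = true;

	if (!m->hfi_status) { /* steps 4.1-4.4: re-check under the lock */
		m->lock_held = false;
		return HFI_NO_EVENT;
	}
	if (m->hw_timestamp == m->stored_timestamp) { /* steps 4.5-6.3 */
		m->hfi_status = false; /* ACK while holding the lock */
		m->lock_held = false;
		return HFI_ACKED_SAME;
	}
	m->stored_timestamp = m->hw_timestamp; /* step 7: copy the message */
	m->hfi_status = false; /* step 8: ACK */
	m->lock_held = false; /* step 9: unlock */
	m->queued++; /* step 10: queue for processing */
	return HFI_QUEUED;
}
```

The key property is that every ACK happens while the lock is held, so a late CPU cannot acknowledge an interrupt that another CPU is still in the middle of handling.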

thermal: intel: Protect clearing of thermal status bits · 930d06bf

Srinivas Pandruvada authored Nov 15, 2022

The clearing of the package thermal status is done by Read-Modify-Write
operation. This may result in clearing of some new status bits which are
being or about to be processed.

For example, while clearing the HFI status, the hardware may set a new
thermal status bit after the thermal status register has been read. The
write back then sets the newly generated status bit to 0, clearing it.
So, it is not safe to do read-modify-write.

Since thermal status Read-Write bits can only be set to 0, not 1, it is
safe to write 1 to all the bits which are not being cleared.

Create a common interface for clearing package thermal status bits. Use
this interface to replace existing code to clear thermal package status
bits.

It is safe to call from different CPUs without protection as there is no
read-modify-write. Also, wrmsrl results in just a single instruction. For
example, suppose CPU 0 and CPU 3 are clearing bits 1 and 3 respectively. If
CPU 3 wins the race, it will write 0x4000aa2, then CPU 0 will write
0x4000aa8. The bits which are not part of the clear are set to 1. The
default mask for the bits which can be written here is 0x4000aaa.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

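The arithmetic in 930d06bf can be checked with a few lines of C. The mask value is the one quoted in the commit message; `STATUS_RW_MASK` and `status_clear_value()` are hypothetical names, and the function only computes the value that would be written, without touching any MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Default mask of writable status bits, as quoted in the commit message. */
#define STATUS_RW_MASK 0x4000aaaULL

/*
 * Value to write in order to clear only 'bit': every other writable bit
 * is written as 1, which leaves it unchanged, so no prior read is needed.
 */
static uint64_t status_clear_value(unsigned int bit)
{
	assert(bit < 64);
	assert(STATUS_RW_MASK & (1ULL << bit)); /* must be a writable bit */
	return STATUS_RW_MASK & ~(1ULL << bit);
}
```

Clearing bit 3 gives 0x4000aa2 and clearing bit 1 gives 0x4000aa8, matching the two writes in the commit's example. Neither value depends on what the other CPU wrote, which is why the order of the two writes does not matter.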

thermal: intel: Prevent accidental clearing of HFI status · 6fe1e64b

Srinivas Pandruvada authored Nov 15, 2022

When there is a package thermal interrupt with PROCHOT log, it will be
processed and cleared. It is possible that there is an active HFI event
status, which is about to be processed or is being processed. Clearing
the PROCHOT log bit will also clear the HFI status bit. This means
that hardware is free to update HFI memory.

When clearing a package thermal interrupt, some processors will generate
a "general protection fault" when any of the read-only bits is set to 1.

The driver maintains a mask of all read-write bits which can be set.

This mask doesn't include the HFI status bit. This bit will also be
cleared, as it is assumed to be a read-only bit. So, add HFI status bit 26
to the mask.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

14 Nov, 2022 6 commits

thermal/core: Protect thermal device operations against thermal device removal · b778b4d7

Guenter Roeck authored Nov 10, 2022

Thermal device operations may be called after thermal zone device removal.
After thermal zone device removal, thermal zone device operations must
no longer be called. To prevent such calls from happening, ensure that
the thermal device is registered before executing any thermal device
operations.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

thermal/core: Remove thermal_zone_set_trips() · 91b3aafc

Guenter Roeck authored Nov 10, 2022

Since no callers of thermal_zone_set_trips() are left, remove the function.
Document __thermal_zone_set_trips() instead. Explicitly state that the
thermal zone lock must be held when calling the function, and that the
pointer to the thermal zone must be valid.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex · 05eeee2b

Guenter Roeck authored Nov 10, 2022

Protect access to thermal operations against thermal zone removal by
acquiring the thermal zone device mutex. After acquiring the mutex, check
if the thermal zone device is registered and abort the operation if not.

With this change, we can call __thermal_zone_device_update() instead of
thermal_zone_device_update() from trip_point_temp_store() and from
emul_temp_store(). Similarly, we can call __thermal_zone_set_trips() instead
of thermal_zone_set_trips() from trip_point_hyst_store().
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

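The lock-then-check pattern described in 05eeee2b can be sketched as follows. `tz_model` and `tz_trip_temp_store()` are hypothetical, a boolean stands in for the thermal zone mutex, and returning -ENODEV for a removed zone is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thermal zone device. */
struct tz_model {
	bool locked; /* stands in for the zone mutex */
	bool registered; /* cleared when the zone is removed */
	int trip_temp;
};

/* Store a trip temperature; refuses to touch a zone that was removed. */
static int tz_trip_temp_store(struct tz_model *tz, int temp)
{
	int ret = 0;

	assert(tz != NULL && !tz->locked);
	tz->locked = true; /* take the zone mutex */

	if (!tz->registered) { /* zone removed: abort */
		ret = -ENODEV; /* assumed error code */
		goto unlock;
	}
	tz->trip_temp = temp; /* safe: removal is excluded by the lock */

unlock:
	tz->locked = false; /* release the zone mutex */
	return ret;
}
```

Checking the registration state only after the lock is taken is what closes the window between the check and the operation.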

thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex · ea37bec5

Guenter Roeck authored Nov 10, 2022

In preparation for protecting access to thermal operations against thermal
zone device removal, protect hwmon accesses to thermal zone operations
with the thermal zone mutex. After acquiring the mutex, ensure that the
thermal zone device is registered before proceeding.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

thermal/core: Introduce locked version of thermal_zone_device_update · 1c439dec

Guenter Roeck authored Nov 10, 2022

In thermal_zone_device_set_mode(), the thermal zone mutex is released only
to be reacquired in the subsequent call to thermal_zone_device_update().

Introduce __thermal_zone_device_update(), which is similar to
thermal_zone_device_update() but has to be called with the thermal device
mutex held. Call the new function from thermal_zone_device_set_mode()
to avoid the extra thermal device mutex release/acquire sequence in that
function.

With the new function in place, re-implement thermal_zone_device_update()
as wrapper around __thermal_zone_device_update() to acquire and release
the thermal device mutex.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

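The locked/unlocked split introduced in 1c439dec follows a common pattern, sketched here with a counting mock lock. All names (`zone_model`, `zone_update_locked()`, `zone_update()`, `zone_set_mode()`) are hypothetical stand-ins for the functions named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a thermal zone and its mutex. */
struct zone_model {
	bool locked;
	unsigned int lock_count; /* how many times the lock was taken */
	unsigned int updates;
	int mode;
};

static void zone_lock(struct zone_model *z)
{
	assert(!z->locked);
	z->locked = true;
	z->lock_count++;
}

static void zone_unlock(struct zone_model *z)
{
	assert(z->locked);
	z->locked = false;
}

/* Locked variant: the caller must already hold the zone lock. */
static void zone_update_locked(struct zone_model *z)
{
	assert(z->locked);
	z->updates++;
}

/* Public variant: a thin wrapper that takes and drops the lock itself. */
static void zone_update(struct zone_model *z)
{
	zone_lock(z);
	zone_update_locked(z);
	zone_unlock(z);
}

/* Changing the mode updates the zone without dropping the lock between. */
static void zone_set_mode(struct zone_model *z, int mode)
{
	zone_lock(z);
	z->mode = mode;
	zone_update_locked(z);
	zone_unlock(z);
}
```

In this model a mode change takes the lock once instead of twice, which is the saving the commit message describes.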

thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp · ed97d10a

Guenter Roeck authored Nov 10, 2022

All callers of __thermal_zone_get_temp() already validated the
thermal zone parameters. Move validation to thermal_zone_get_temp()
where it is actually needed. Also add kernel documentation for
__thermal_zone_get_temp(), listing the requirement that the
function must be called with validated parameters and with thermal
device mutex held.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
