1. 02 Apr, 2024 5 commits
  2. 18 Jan, 2024 2 commits
  3. 12 Jan, 2024 4 commits
    • tools/power turbostat: Do not print negative LPI residency · 227ed18f
      Chen Yu authored
      turbostat prints an abnormal SYS%LPI value across suspend-to-idle:
      SYS%LPI = 114479815993277.50
      
      To reproduce:
      Run a freeze cycle, e.g. "sleepgraph -m freeze -rtcwake 15".
      Then reboot. After boot, run suspend-to-idle again
      and check the SYS%LPI field.
      
      The slp_s0 residency counter is described in the LPIT table, and the BIOS
      does not clear this register across reset. The PMC expects the OS to
      calculate the LPI residency from the delta. However, a firmware
      issue clears the LPIT counter to 0 during the second suspend-to-idle
      after the reboot, which produces a negative delta.
      
      [lenb: updated to print "neg" upon this BIOS failure]
      Reported-by: Todd Brandt <todd.e.brandt@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
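The negative-delta case described above can be sketched as follows. This is a minimal illustration, not turbostat's actual code: the function names, the microsecond counter unit, and the percentage formula are assumptions made for the example. It also shows how an unsigned 64-bit subtraction turns a counter that went backwards into a huge value, which is the kind of result the abnormal SYS%LPI reading suggests.

```python
U64_MASK = (1 << 64) - 1


def wrapped_delta(before_us, after_us):
    """Delta as an unsigned 64-bit subtraction would compute it."""
    return (after_us - before_us) & U64_MASK


def format_sys_lpi(before_us, after_us, interval_s):
    """Return SYS%LPI as text, or "neg" if the counter went backwards.

    Assumes (hypothetically) that the residency counter is in
    microseconds and the measurement interval is in seconds.
    """
    if after_us < before_us:
        # Firmware reset the counter between the two reads.
        return "neg"
    delta_us = after_us - before_us
    return "%.2f" % (100.0 * delta_us / 1_000_000.0 / interval_s)
```

Checking the sign before converting to a percentage keeps the wrapped value from ever reaching the output column.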
    • tools/power turbostat: Fix Bzy_MHz documentation typo · 0b13410b
      Peng Liu authored
      The code calculates Bzy_MHz as TSC_delta * APERF_delta / MPERF_delta.
      The man page erroneously showed TSC_delta being divided.
      Signed-off-by: Peng Liu <liupeng17@lenovo.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
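A small worked example of the formula, as a sketch only. The commit message gives just the product TSC_delta * APERF_delta / MPERF_delta; the division by the measurement interval and the scaling to MHz are assumptions added here so the result has frequency units, and the function name is hypothetical.

```python
def bzy_mhz(tsc_delta, aperf_delta, mperf_delta, interval_s):
    """Average frequency while busy, in MHz.

    TSC_delta is multiplied (not divided) by APERF_delta / MPERF_delta.
    The interval division and the 1e6 scaling are assumptions made for
    this sketch.
    """
    tsc_mhz = tsc_delta / 1_000_000.0 / interval_s
    return tsc_mhz * aperf_delta / mperf_delta


# A 3 GHz TSC over one second, with APERF advancing 1.5x as fast as MPERF.
example = bzy_mhz(3_000_000_000, 3_000_000, 2_000_000, 1.0)
```

With these inputs the TSC rate is 3000 MHz and the APERF/MPERF ratio is 1.5, so the busy frequency comes out above the TSC rate, which is what a turbo-boosted core would show.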
    • tools/power turbostat: Increase the limit for fd opened · 3ac1d14d
      Wyes Karny authored
      When running turbostat, a system with 512 cpus reaches the limit for
      maximum number of file descriptors that can be opened. To solve this
      problem, the limit is raised to 2^15, which is a large enough number.
      
      The data below was collected from AMD server systems while running turbostat:
      
      |-----------+-------------------------------|
      | # of cpus | # of opened fds for turbostat |
      |-----------+-------------------------------|
      | 128       | 260                           |
      |-----------+-------------------------------|
      | 192       | 388                           |
      |-----------+-------------------------------|
      | 512       | 1028                          |
      |-----------+-------------------------------|
      
      So, the new max limit would be sufficient up to 2^14 cpus (but this
      also depends on how many counters are enabled).
      Reviewed-by: Doug Smythies <dsmythies@telus.net>
      Tested-by: Doug Smythies <dsmythies@telus.net>
      Signed-off-by: Wyes Karny <wyes.karny@amd.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
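The arithmetic behind the table, and the idea of raising the descriptor limit, can be sketched like this. The three table rows happen to fit 2 * cpus + 4 exactly; treating that as a general rule is an assumption, since the commit notes the count also depends on which counters are enabled. The names FD_LIMIT, estimate_fds, max_cpus_for, and raise_nofile_limit are hypothetical, and the sketch uses Python's standard `resource` module rather than turbostat's C code.

```python
import resource

FD_LIMIT = 1 << 15  # 2^15, the new limit named in the commit message


def estimate_fds(cpus):
    """Linear fit to the table rows (assumption): 2 fds per CPU plus 4."""
    return 2 * cpus + 4


def max_cpus_for(limit):
    """Largest CPU count whose estimated fd usage fits within limit."""
    return (limit - 4) // 2


def raise_nofile_limit(target=FD_LIMIT):
    """Raise the soft RLIMIT_NOFILE toward target without exceeding the
    hard limit, and return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < target:
        cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
        if cap > soft:
            resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Under this fit, a limit of 2^15 covers 16382 CPUs, which is just below 2^14 = 16384 and so agrees with the commit's figure as an approximation.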
    • tools/power turbostat: Fix added raw MSR output · e5f4e68e
      Doug Smythies authored
      When using --Summary mode, added MSRs in raw mode always
      print zeros. Print the actual register contents.
      
      Example, with patch:
      
      note the added column:
      --add msr0x64f,u32,package,raw,REASON
      
      Where:
      
      0x64F is MSR_CORE_PERF_LIMIT_REASONS
      
      Busy%   Bzy_MHz PkgTmp  PkgWatt CorWatt     REASON
      0.00    4800    35      1.42    0.76    0x00000000
      0.00    4801    34      1.42    0.76    0x00000000
      80.08   4531    66      108.17  107.52  0x08000000
      98.69   4530    66      133.21  132.54  0x08000000
      99.28   4505    66      128.26  127.60  0x0c000400
      99.65   4486    68      124.91  124.25  0x0c000400
      99.63   4483    68      124.90  124.25  0x0c000400
      79.34   4481    41      99.80   99.13   0x0c000000
      0.00    4801    41      1.40    0.73    0x0c000000
      
      Where, for the test processor (i5-10600K):
      
      PKG Limit #1: 125.000 Watts, 8.000000 sec
      MSR bit 26 = log; bit 10 = status
      
      PKG Limit #2: 136.000 Watts, 0.002441 sec
      MSR bit 27 = log; bit 11 = status
      
      Example, without patch:
      
      Busy%   Bzy_MHz PkgTmp  PkgWatt CorWatt     REASON
      0.01    4800    35      1.43    0.77    0x00000000
      0.00    4801    35      1.39    0.73    0x00000000
      83.49   4531    66      112.71  112.06  0x00000000
      98.69   4530    68      133.35  132.69  0x00000000
      99.31   4500    67      127.96  127.30  0x00000000
      99.63   4483    69      124.91  124.25  0x00000000
      99.61   4481    69      124.90  124.25  0x00000000
      99.61   4481    71      124.92  124.25  0x00000000
      59.35   4479    42      75.03   74.37   0x00000000
      0.00    4800    42      1.39    0.73    0x00000000
      0.00    4801    42      1.42    0.76    0x00000000
      
      [lenb: simplified patch to apply only to package scope]
      Signed-off-by: Doug Smythies <dsmythies@telus.net>
      Signed-off-by: Len Brown <len.brown@intel.com>
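To read the REASON column in the "with patch" output, the raw values can be decoded with the four bit positions the commit message lists. This is a minimal sketch: only bits 10, 11, 26, and 27 are covered, the labels are taken from the message above, and the names REASON_BITS and decode_reason are hypothetical.

```python
# Bit positions as given in the commit message for the test processor.
REASON_BITS = {
    10: "PKG Limit #1 status",
    11: "PKG Limit #2 status",
    26: "PKG Limit #1 log",
    27: "PKG Limit #2 log",
}


def decode_reason(value):
    """Return the labels of the known bits set in a raw REASON value,
    ordered by ascending bit position."""
    return [name for bit, name in sorted(REASON_BITS.items())
            if value & (1 << bit)]
```

For example, 0x0c000400 has bits 10, 26, and 27 set, so it decodes to the limit #1 status bit plus both log bits, while 0x08000000 carries only the limit #2 log bit.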
  4. 07 Jan, 2024 1 commit
  5. 06 Jan, 2024 2 commits
  6. 05 Jan, 2024 25 commits
  7. 04 Jan, 2024 1 commit
    • x86/csum: clean up `csum_partial' further · a476aae3
      Linus Torvalds authored
      Commit 688eb819 ("x86/csum: Improve performance of `csum_partial`")
      ended up improving the code generation for the IP csum calculations, and
      in particular special-casing the 40-byte case that is a hot case for
      IPv6 headers.
      
      It then had _another_ special case for the 64-byte unrolled loop, which
      did two chains of 32-byte blocks, which allows modern CPUs to improve
      performance by doing the chains in parallel thanks to renaming the carry
      flag.
      
      This just unifies the special cases and combines them into just one
      single helper for the 40-byte csum case, and replaces the 64-byte case by
      an 80-byte case that just does that single helper twice.  It avoids having
      all these different versions of inline assembly, and actually improved
      performance further in my tests.
      
      There was never anything magical about the 64-byte unrolled case, even
      though it happens to be a common size (and typically is the cacheline
      size).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
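The structure described above, one 40-byte helper with the 80-byte case calling it twice, can be modelled in a few lines. This is only an arithmetic sketch of the idea, not the kernel's inline assembly: it models 64-bit additions with end-around carry, restricts input to multiples of 40 bytes for brevity, and every function name here is hypothetical. A plain 16-bit ones'-complement sum is included as a reference to compare against.

```python
import struct

MASK64 = (1 << 64) - 1


def add64_with_carry(a, b):
    """64-bit add that folds the carry-out back into the low bit."""
    s = a + b
    return (s & MASK64) + (s >> 64)


def update_csum_40b(total, block):
    """Accumulate one 40-byte block as five little-endian 64-bit words."""
    assert len(block) == 40
    for (word,) in struct.iter_unpack("<Q", block):
        total = add64_with_carry(total, word)
    return total


def fold16(x):
    """Fold a wide ones'-complement sum down to 16 bits."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x


def csum_blocks(data):
    """Checksum data whose length is a multiple of 40 bytes."""
    assert len(data) % 40 == 0
    total, pos = 0, 0
    while len(data) - pos >= 80:
        # The 80-byte case is just the 40-byte helper applied twice.
        total = update_csum_40b(total, data[pos:pos + 40])
        total = update_csum_40b(total, data[pos + 40:pos + 80])
        pos += 80
    if len(data) - pos == 40:
        total = update_csum_40b(total, data[pos:pos + 40])
    return fold16(total)


def reference_csum(data):
    """Straightforward sum of 16-bit little-endian words, then folded."""
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return fold16(sum(words))
```

Because 2^64 is congruent to 1 modulo 2^16 - 1, accumulating in 64-bit words and folding at the end gives the same 16-bit result as summing 16-bit words directly.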