1. 20 Oct, 2023 3 commits
    • perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir() · c4a85263
      Yang Jihong authored
      If using parallel threads to collect data, perf record needs at least 6 fds
      per CPU (one for sys_perf_event_open, four for the msg and ack pipes, see
      record__thread_data_open_pipes(), and one for opening perf.data.XXX).
      In an environment with more than 100 cores, if perf record uses both the
      `-a` and `--threads` options, it is easy to exceed the file descriptor
      limit. When we run out of fds, try to increase the limit.
      
      Before:
        $ ulimit -n
        1024
        $ lscpu | grep 'On-line CPU(s)'
        On-line CPU(s) list:                0-159
        $ perf record --threads -a sleep 1
        Failed to create data directory: Too many open files
      
      After:
        $ ulimit -n
        1024
        $ lscpu | grep 'On-line CPU(s)'
        On-line CPU(s) list:                0-159
        $ perf record --threads -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ]
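
The fd budget and the retry-on-EMFILE idea described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code from the commit: `min_fds_for_threads()`, `raise_nofile_limit()` and `open_retry_on_emfile()` are hypothetical names, and the sketch simply raises the soft RLIMIT_NOFILE to the hard limit with the standard getrlimit()/setrlimit() calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Lower bound from the commit message: at least 6 fds per CPU. */
static long min_fds_for_threads(long ncpus)
{
	assert(ncpus > 0);
	return 6 * ncpus;
}

/*
 * Hypothetical helper: raise the soft RLIMIT_NOFILE to the hard limit.
 * Returns 0 if the soft limit was raised, -1 otherwise.
 */
static int raise_nofile_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (rl.rlim_cur >= rl.rlim_max)
		return -1;	/* already at the ceiling, nothing left to raise */
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &rl);
}

/*
 * Hypothetical wrapper: open a file and, if the process has run out of
 * descriptors (EMFILE), raise the limit and retry once.
 */
static int open_retry_on_emfile(const char *path, int flags, mode_t mode)
{
	int fd = open(path, flags, mode);

	if (fd < 0 && errno == EMFILE && raise_nofile_limit() == 0)
		fd = open(path, flags, mode);
	return fd;
}
```

For the 160-CPU machine above, `min_fds_for_threads(160)` is 960, which already takes up most of the 1024 soft limit before counting any other descriptors the process holds.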
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf vendor events: Update PMC used in PM_RUN_INST_CMPL event for power10 platform · 3f8b6e5b
      Kajol Jain authored
      The CPI_STALL_RATIO metric group can be used to present the high-level
      CPI stall breakdown metrics on powerpc. It shows:
      
      - DISPATCH_STALL_CPI ( Dispatch stall cycles per insn )
      - ISSUE_STALL_CPI ( Issue stall cycles per insn )
      - EXECUTION_STALL_CPI ( Execution stall cycles per insn )
      - COMPLETION_STALL_CPI ( Completion stall cycles per insn )
      
      Commit cf26e043 ("perf vendor events power10: Add JSON
      metric events to present CPI stall cycles in powerpc"), which added
      the CPI_STALL_RATIO metric group, also changed the PMC used by the
      PM_RUN_INST_CMPL event from PMC4 to PMC5 to avoid multiplexing of
      events. But recent changes reverted that. Fix this by changing the
      PMC used by PM_RUN_INST_CMPL back to PMC5.
      
      Result with the fix:
      
       ./perf stat --metric-no-group -M CPI_STALL_RATIO <workload>
      
       Performance counter stats for 'workload':
      
              68,745,426      PM_CMPL_STALL                    #     0.21 COMPLETION_STALL_CPI
               7,692,827      PM_ISSUE_STALL                   #     0.02 ISSUE_STALL_CPI
             322,638,223      PM_RUN_INST_CMPL                 #     0.05 DISPATCH_STALL_CPI
                                                        #     0.48 EXECUTION_STALL_CPI
              16,858,553      PM_DISP_STALL_CYC
             153,880,133      PM_EXEC_STALL
      
             0.089774592 seconds time elapsed
      
      "--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled
      in every group, for better accuracy.
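
The figures in the output above are consistent with each metric being a stall-cycle count divided by PM_RUN_INST_CMPL. The sketch below assumes that formula; the actual metric expressions live in the power10 JSON files and are not shown here, and `stall_cpi()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed formula: stall cycles per completed instruction.
 * With the counts from the output above:
 *   PM_CMPL_STALL     68,745,426 / 322,638,223 ~ 0.21  (COMPLETION_STALL_CPI)
 *   PM_ISSUE_STALL     7,692,827 / 322,638,223 ~ 0.02  (ISSUE_STALL_CPI)
 *   PM_DISP_STALL_CYC 16,858,553 / 322,638,223 ~ 0.05  (DISPATCH_STALL_CPI)
 *   PM_EXEC_STALL    153,880,133 / 322,638,223 ~ 0.48  (EXECUTION_STALL_CPI)
 */
static double stall_cpi(uint64_t stall_cycles, uint64_t run_inst_cmpl)
{
	assert(run_inst_cmpl != 0);	/* the ratio is undefined with no instructions */
	return (double)stall_cycles / (double)run_inst_cmpl;
}
```

Because every metric shares the PM_RUN_INST_CMPL denominator, that event has to be counted alongside each stall event, which is why it matters that it does not compete with the others for the same PMC.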
      
      Fixes: 7d473f47 ("perf vendor events: Move JSON/events to appropriate files for power10 platform")
      Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Cc: maddy@linux.ibm.com
      Link: https://lore.kernel.org/r/20231016143110.244255-1-kjain@linux.ibm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf trace: Use the right bpf_probe_read(_str) variant for reading user data · 5069211e
      Thomas Richter authored
      Perf test case 111 "Check open filename arg using perf trace + vfs_getname"
      fails on s390. The failure comes from a call to bpf_probe_read() in
      util/bpf_skel/augmented_raw_syscalls.bpf.c.
      
      The root cause is the lookup by address. bpf_probe_read() is used,
      and this function works only on architectures
      with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

      On s390 it is not possible to determine from an address which
      address space (user or kernel) it belongs to.
      
      Replace bpf_probe_read() with bpf_probe_read_kernel()
      and bpf_probe_read_str() with bpf_probe_read_user_str() to
      explicitly specify the address space the address refers to.
      
      Output before:
       # ./perf trace -eopen,openat -- touch /tmp/111
       libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
       libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
       reg type unsupported for arg#0 function sys_enter#75
       0: R1=ctx(off=0,imm=0) R10=fp0
       ; int sys_enter(struct syscall_enter_args *args)
       0: (bf) r6 = r1           ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
       ; return bpf_get_current_pid_tgid();
       1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
       2: (63) *(u32 *)(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm
       3: (bf) r2 = r10              ; R2_w=fp0 R10=fp0
       ;
       .....
       lines deleted here
       .....
       23: (bf) r3 = r6              ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0)
       24: (85) call bpf_probe_read#4
       unknown func bpf_probe_read#4
       processed 23 insns (limit 1000000) max_states_per_insn 0 \
      	 total_states 2 peak_states 2 mark_read 2
       -- END PROG LOAD LOG --
       libbpf: prog 'sys_enter': failed to load: -22
       libbpf: failed to load object 'augmented_raw_syscalls_bpf'
       libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22
       ....
      
      Output after:
       # ./perf test -Fv 111
       111: Check open filename arg using perf trace + vfs_getname          :
       --- start ---
           1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \
      	"/tmp/temporary_file.SWH85", \
      	flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
       ---- end ----
       Check open filename arg using perf trace + vfs_getname: Ok
       #
      
      Test with the sleep command shows:
      Output before:
       # ./perf trace -e *sleep sleep 1.234567890
           0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \
               { .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0
       #
      
      Output after:
       # ./perf trace -e *sleep sleep 1.234567890
           0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \
               { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0
       #
      
      Fixes: 14e4b9f4 ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility")
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Co-developed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: gor@linux.ibm.com
      Cc: hca@linux.ibm.com
      Cc: sumanthk@linux.ibm.com
      Cc: svens@linux.ibm.com
      Link: https://lore.kernel.org/r/20231019082642.3286650-1-tmricht@linux.ibm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2. 18 Oct, 2023 3 commits
3. 17 Oct, 2023 19 commits
4. 12 Oct, 2023 15 commits