1. 03 May, 2024 5 commits
  2. 02 May, 2024 10 commits
    • Namhyung Kim's avatar
      perf maps: Remove check_invariants() from maps__lock() · 3cdd98b4
      Namhyung Kim authored
      I found that the debug build was a slowed down a lot by the maps lock
      code since it checks the invariants whenever it gets the pointer to the
      lock.  This means it checks twice the invariants before and after the
      access.
      
      Instead, let's move the checking code within the lock area but after any
      modification and remove it from the read paths.  This would remove (more
      than) half of the maps lock overhead.
      
      The time for perf report with a huge data file (200k+ of MMAP2 events).
      
        Non-debug     Before      After
        ---------   --------   --------
           2m 43s     6m 45s     4m 21s
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240429225738.1491791-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3cdd98b4
    • James Clark's avatar
      perf cs-etm: Improve version detection and error reporting · e3123079
      James Clark authored
      When the config validation functions are warning about ETMv3, they do it
      based on "not ETMv4". If the drivers aren't all loaded or the hardware
      doesn't support Coresight it will appear as "not ETMv4" and then Perf
      will print the error message "... not supported in ETMv3 ..." which is
      wrong and confusing.
      
      cs_etm_is_etmv4() is also misnamed because it also returns true for
      ETE because ETE has a superset of the ETMv4 metadata files. Although
      this was always done in the correct order so it wasn't a bug.
      
      Improve all this by making a single get version function which also
      handles not present as a separate case. Change the ETMv3 error message
      to only print when ETMv3 is detected, and add a new error message for
      the not present case.
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Reviewed-by: default avatarLeo Yan <leo.yan@linux.dev>
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-4-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e3123079
    • James Clark's avatar
      perf cs-etm: Remove repeated fetches of the ETM PMU · bc5e0e1b
      James Clark authored
      Most functions already have cs_etm_pmu, so it's a bit neater to pass
      it through rather than itr only to convert it again.
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-3-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bc5e0e1b
    • James Clark's avatar
      perf cs-etm: Use struct perf_cpu as much as possible · cbaf2c4f
      James Clark authored
      The perf_cpu struct makes some iterators simpler and avoids some
      mistakes with interchanging CPU IDs with indexes etc. At the moment in
      this file the conversion to an integer is done somewhere in the middle
      of the call tree. Change it to delay the conversion to an int until the
      leaf functions.
      
      Some of the usage patterns are duplicated, so instead of changing them
      all, make cs_etm_get_ro() more reusable and use that everywhere.
      cs_etm_get_ro() didn't return an error before, but return one now so
      that it can also be used where an error is needed. Continue to ignore
      the error where it was already ignored.
      
      Use cs_etm_pmu_path_exists() instead of cs_etm_get_ro() in
      cs_etm_is_etmv4() because cs_etm_get_ro() prints a warning, but path
      exists is sufficient for this use case.
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-2-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cbaf2c4f
    • Namhyung Kim's avatar
      perf annotate-data: Check kind of stack variables · b7d4aacf
      Namhyung Kim authored
      I sometimes see ("unknown type") in the result and it was because it
      didn't check the type of stack variables properly during the instruction
      tracking.  The stack can carry constant values (without type info) and
      if the target instruction is accessing the stack location, it resulted
      in the "unknown type".
      
      Maybe we could pick one of integer types for the constant, but it
      doesn't really mean anything useful.  Let's just drop the stack slot if
      it doesn't have a valid type info.
      
      Here's an example how it got the unknown type.
      Note that 0xffffff48 = -0xb8.
        -----------------------------------------------------------
        find data type for 0xffffff48(reg6) at ...
        CU for ...
        frame base: cfa=0 fbreg=6
        scope: [2/2] (die:11cb97f)
        bb: [37 - 3a]
        var [37] reg15 type='int' size=0x4 (die:0x1180633)
        bb: [40 - 4b]
        mov [40] imm=0x1 -> reg13
        var [45] reg8 type='sigset_t*' size=0x8 (die:0x11a39ee)
        mov [45] imm=0x1 -> reg2                     <---  here reg2 has a constant
        bb: [215 - 237]
        mov [218] reg2 -> -0xb8(stack) constant      <---  and save it to the stack
        mov [225] reg13 -> -0xc4(stack) constant
        call [22f] find_task_by_vgpid
        call [22f] return -> reg0 type='struct task_struct*' size=0x8 (die:0x11881e8)
        bb: [5c8 - 5cf]
        bb: [2fb - 302]
        mov [2fb] -0xc4(stack) -> reg13 constant
        bb: [13b - 14d]
        mov [143] 0xd50(reg3) -> reg5 type='struct task_struct*' size=0x8 (die:0xa31f3c)
        bb: [153 - 153]
        chk [153] reg6 offset=0xffffff48 ok=0 kind=0 fbreg    <--- access here
        found by insn track: 0xffffff48(reg6) type-offset=0
         type='G<EF>^K<F6><AF>U' size=0 (die:0xffffffffffffffff)
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-7-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b7d4aacf
    • Namhyung Kim's avatar
      perf annotate-data: Handle multi regs in find_data_type_block() · af89e8f2
      Namhyung Kim authored
      The instruction tracking should be the same for the both registers.
      
      Just do it once and compare the result with multi regs as with the
      previous patches.
      
      Then we don't need to call find_data_type_block() separately for each
      reg.
      
      Let's remove the 'reg' argument from the relevant functions.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-6-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      af89e8f2
    • Namhyung Kim's avatar
      perf annotate-data: Check memory access with two registers · eba1f853
      Namhyung Kim authored
      The following instruction pattern is used to access a global variable.
      
        mov     $0x231c0, %rax
        movsql  %edi, %rcx
        mov     -0x7dc94ae0(,%rcx,8), %rcx
        cmpl    $0x0, 0xa60(%rcx,%rax,1)     <<<--- here
      
      The first instruction set the address of the per-cpu variable (here, it
      is 'runqueues' of type 'struct rq').  The second instruction seems like
      a cpu number of the per-cpu base.  The third instruction get the base
      offset of per-cpu area for that cpu.  The last instruction compares the
      value of the per-cpu variable at the offset of 0xa60.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-5-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      eba1f853
    • Namhyung Kim's avatar
      perf annotate-data: Handle direct global variable access · 4449c904
      Namhyung Kim authored
      Like per-cpu base offset array, sometimes it accesses the global
      variable directly using the offset.  Allow this type of instructions as
      long as it finds a global variable for the address.
      
        movslq  %edi, %rcx
        mov     -0x7dc94ae0(,%rcx,8), %rcx   <<<--- here
      
      As %rcx has a valid type (i.e. array index) from the first instruction,
      it will be checked by the first case in check_matching_type().  But as
      it's not a pointer type, the match will fail.  But in this case, it
      should check if it accesses the kernel global array variable.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-4-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4449c904
    • Namhyung Kim's avatar
      perf annotate-data: Collect global variables in advance · c1da8411
      Namhyung Kim authored
      Currently it looks up global variables from the current CU using address
      and name.  But it sometimes fails to find a variable as the variable can
      come from a different CU - but it's still strange it failed to find a
      declaration for some reason.
      
      Anyway, it can collect all global variables from all CU once and then
      lookup them later on.  This slightly improves the success rate of my
      test data set.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-3-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c1da8411
    • Namhyung Kim's avatar
      perf dwarf-aux: Add die_collect_global_vars() · d7b60803
      Namhyung Kim authored
      This function is to search all global variables in the CU.  We want to
      have the list of global variables at once and match them later.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-2-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d7b60803
  3. 27 Apr, 2024 25 commits