1. 03 Feb, 2021 14 commits
    • perf powerpc: Fix gap between kernel end and module start · 557c3ead
      Athira Rajeev authored
      Running "perf mem report" in TUI mode fails with an ENOMEM message on
      powerpc:
      
        failed to process sample
      
      Running with the debug and verbose options shows that the failure occurs
      while allocating memory for sample histograms.
      
      The error path is:
      
        symbol__inc_addr_samples() ->
          __symbol__inc_addr_samples() ->
            annotated_source__histogram()
      
      symbol__inc_addr_samples() calls annotated_source__alloc_histograms()
      to allocate memory for sample histograms using calloc(). Here calloc()
      fails because the size of the symbol is huge. The size of a symbol is
      calculated as the difference between its start and end addresses.
      
      An example of a histogram allocation that fails:
      
        sym->name is _end
        sym->start is 0xc0000000027a0000
        sym->end is 0xc008000003890000
        symbol__size(sym) is 0x80000010f0000
      
      In the above case, the difference between sym->start
      (0xc0000000027a0000) and sym->end (0xc008000003890000) is huge.
      
      This is the same problem as on s390 and arm64, which was fixed in commits:
      
        b9c0a649 ("perf annotate: Fix s390 gap between kernel end and module start")
        78886f3e ("perf symbols: Fix arm64 gap between kernel start and module end")
      
      When this symbol was first read, its start and end addresses were both
      set to the address given in /proc/kallsyms.
      
      After symbol__new():
      
        symbol__new: _end 0xc0000000027a0000-0xc0000000027a0000
      
        From /proc/kallsyms:
        ...
        c000000002799370 b backtrace_flag
        c000000002799378 B radix_tree_node_cachep
        c000000002799380 B __bss_stop
        c0000000027a0000 B _end
        c008000003890000 t icmp_checkentry      [ip_tables]
        c008000003890038 t ipt_alloc_initial_table      [ip_tables]
        c008000003890468 T ipt_do_table [ip_tables]
        c008000003890de8 T ipt_unregister_table_pre_exit        [ip_tables]
        ...
      
      Perf then calls symbols__fixup_end(), which sets the end of the symbol
      to 0xc008000003890000, the next address in the list. That is the start
      address of the first module symbol (icmp_checkentry above), which
      produces the huge symbol size of 0x80000010f0000.
      
      After symbols__fixup_end:
      
        symbols__fixup_end: sym->name: _end
        sym->start: 0xc0000000027a0000
        sym->end: 0xc008000003890000
      
      On powerpc, the kernel text segment is located at 0xc000000000000000,
      whereas modules are located at very high memory addresses,
      0xc00800000xxxxxxx. Since the gap between the end of the kernel text
      segment and the start of the first module is so large, histogram
      allocation using calloc() fails.
      
      Fix this by detecting the kernel's last symbol and limiting its range
      to the page size.
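
The effect of the fix can be modeled with a small sketch. This is not perf's C code: the `Symbol` class, `fixup_end()` and `load_example()` are hypothetical names, the 64 KiB `PAGE_SIZE` is an assumed value, and module membership is modeled with a simple `module` field. The addresses are the ones quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 0x10000  # assumption: 64 KiB pages, for illustration only


@dataclass
class Symbol:
    name: str
    start: int
    end: int
    module: Optional[str] = None  # None means a core-kernel symbol

    def size(self) -> int:
        return self.end - self.start


def fixup_end(symbols: List[Symbol], limit_last_kernel_symbol: bool) -> None:
    """Derive each symbol's end from the next symbol's start.

    `symbols` must be sorted by start address. With the limit enabled,
    the last kernel symbol before the first module symbol is capped at
    one page instead of stretching across the kernel/module gap.
    """
    for prev, cur in zip(symbols, symbols[1:]):
        crosses_into_module = prev.module is None and cur.module is not None
        if limit_last_kernel_symbol and crosses_into_module:
            prev.end = prev.start + PAGE_SIZE
        else:
            prev.end = cur.start


def load_example() -> List[Symbol]:
    # Zero-sized symbols, as they look right after being read from kallsyms.
    return [
        Symbol("__bss_stop", 0xc000000002799380, 0xc000000002799380),
        Symbol("_end", 0xc0000000027a0000, 0xc0000000027a0000),
        Symbol("icmp_checkentry", 0xc008000003890000, 0xc008000003890000,
               module="ip_tables"),
    ]


unfixed = load_example()
fixup_end(unfixed, limit_last_kernel_symbol=False)
print(hex(unfixed[1].size()))  # 0x80000010f0000

fixed = load_example()
fixup_end(fixed, limit_last_kernel_symbol=True)
print(hex(fixed[1].size()))  # 0x10000
```

Without the limit, `_end` spans the whole gap and a per-address histogram for it cannot be allocated; with the limit, its size is one page.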
      
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Kajol Jain <kjain@linux.ibm.com>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1609208054-1566-1-git-send-email-atrajeev@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject jit: Add namespaces support · 67dec926
      Yonatan Goldschmidt authored
      This patch fixes "perf inject --jit" to properly operate on
      namespaced/containerized processes:
      
      * jitdump files are generated by the process, thus they should be
        looked up in its mount NS.
      
      * DSOs of injected MMAP events will later be looked up in the process
        mount NS, so write them into its NS.
      
      * PIDs & TIDs from jitdump events need to be translated to the PID as
        seen by "perf record" before being written into MMAP events.
      
      For a process in a different PID NS, the TID & PID given in the jitdump
      event are actually ignored; I use the TID & PID of the thread which
      mmap()ed the jitdump file. This is a simplification and won't work for
      forks of the initial process if they continue using the same jitdump
      file. Future patches might improve it.
      
      This was tested by recording a NodeJS process running with
      "--perf-prof", inside a Docker container, and by recording another
      NodeJS process running in the same namespaces as perf itself, to make
      sure it's not broken for non-containerized processes.
      
      Signed-off-by: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20201105015604.1726943-1-yonatan.goldschmidt@granulate.io
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf namespaces: Add 'in_pidns' to nsinfo struct · 2b51c71b
      Yonatan Goldschmidt authored
      Provides an accurate means to determine if the owner thread is in a
      different PID namespace.
      
      Signed-off-by: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20201105015418.1725218-1-yonatan.goldschmidt@granulate.io
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ events · 473f742e
      Namhyung Kim authored
      Like in __event__synthesize_thread(), I think it's better to use
      scandir() instead of the readdir() loop.  If some malicious task
      keeps creating new threads, the readdir() loop will run over and
      over to collect tids.  scandir() has the same problem, but the window
      is much smaller since it doesn't do much work during the iteration.
      
      Also add a filter_task() function, as we only care about tasks.
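
The snapshot-then-process idea can be sketched as follows. This is an illustrative Python analogue, not the C code in perf: `snapshot_tids()` is a hypothetical helper, and the filter shown (purely numeric names) is an assumption about what counts as a task entry.

```python
import os
from typing import List


def filter_task(name: str) -> bool:
    # Assumption: task directories are named by their numeric TID.
    return name.isdigit()


def snapshot_tids(task_dir: str) -> List[int]:
    """Collect all task entries up front, then close the directory.

    The directory is held open only while names are gathered, so the
    per-thread work done afterwards cannot extend the iteration.
    """
    with os.scandir(task_dir) as entries:
        names = [entry.name for entry in entries if filter_task(entry.name)]
    return sorted(int(name) for name in names)
```

On a Linux system, `snapshot_tids("/proc/self/task")` would return the TIDs of the calling process at the time of the scan; threads created later are simply not in the snapshot.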
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210202090118.2008551-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads · c1b90795
      Namhyung Kim authored
      To synthesize the information needed to resolve sample IPs, perf scans
      task and mmap info from the /proc filesystem.  For each process, it
      opens (and reads) the status and maps files.  But kernel threads don't
      have memory maps, so we can skip the maps file for them.
      
      To find kernel threads, check for the "VmPeak:" line in the
      /proc/<PID>/status file.  It reports the peak virtual memory usage, so
      only user-level tasks have it.  Note that the line can be missed due to
      partial reads, so when there's no VmPeak line we should double-check
      that the task really is a kernel thread.
      
      Thus also check for the "Threads:" line (which comes after the position
      of the VmPeak line, whether or not VmPeak exists) to be sure enough
      data was read, in case deeply nested pid namespaces or a large number
      of supplementary groups are involved.
      
      This is for a user process:
      
        $ head -40 /proc/1/status
        Name:	systemd
        Umask:	0000
        State:	S (sleeping)
        Tgid:	1
        Ngid:	0
        Pid:	1
        PPid:	0
        TracerPid:	0
        Uid:	0	0	0	0
        Gid:	0	0	0	0
        FDSize:	256
        Groups:
        NStgid:	1
        NSpid:	1
        NSpgid:	1
        NSsid:	1
        VmPeak:	  234192 kB           <-- here
        VmSize:	  169964 kB
        VmLck:	       0 kB
        VmPin:	       0 kB
        VmHWM:	   29528 kB
        VmRSS:	    6104 kB
        RssAnon:	    2756 kB
        RssFile:	    3348 kB
        RssShmem:	       0 kB
        VmData:	   19776 kB
        VmStk:	    1036 kB
        VmExe:	     784 kB
        VmLib:	    9532 kB
        VmPTE:	     116 kB
        VmSwap:	    2400 kB
        HugetlbPages:	       0 kB
        CoreDumping:	0
        THP_enabled:	1
        Threads:	1                     <-- and here
        SigQ:	1/62808
        SigPnd:	0000000000000000
        ShdPnd:	0000000000000000
        SigBlk:	7be3c0fe28014a03
        SigIgn:	0000000000001000
      
      And this is for a kernel thread:
      
        $ head -20 /proc/2/status
        Name:	kthreadd
        Umask:	0000
        State:	S (sleeping)
        Tgid:	2
        Ngid:	0
        Pid:	2
        PPid:	0
        TracerPid:	0
        Uid:	0	0	0	0
        Gid:	0	0	0	0
        FDSize:	64
        Groups:
        NStgid:	2
        NSpid:	2
        NSpgid:	0
        NSsid:	0
        Threads:	1                     <-- here
        SigQ:	1/62808
        SigPnd:	0000000000000000
        ShdPnd:	0000000000000000
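
The check described above can be sketched as a three-way decision. This is a minimal Python illustration rather than the C code in perf; `is_kernel_thread()` is a hypothetical name, and returning `None` for "not enough data read" is a choice made for this sketch.

```python
from typing import Optional


def is_kernel_thread(status_text: str) -> Optional[bool]:
    """Classify a task from the text read from its status file.

    Returns False when a VmPeak line is present (user task), True when
    the Threads line was reached without seeing VmPeak (kernel thread),
    and None when neither line was seen, i.e. the read was too short
    to decide.
    """
    saw_vmpeak = False
    saw_threads = False
    for line in status_text.splitlines():
        if line.startswith("VmPeak:"):
            saw_vmpeak = True
        elif line.startswith("Threads:"):
            saw_threads = True
    if saw_vmpeak:
        return False
    if saw_threads:
        return True
    return None
```

A caller that gets `None` would need to read more of the file before deciding whether the maps file can be skipped.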
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210202090118.2008551-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesis · 30626e08
      Namhyung Kim authored
      To reduce memory usage, perf should reduce the number of entries it
      causes to be created in the proc filesystem.  It uses the
      /proc/<PID>/task directory to traverse the threads of a process, which
      makes the kernel create /proc/<PID>/task/<TID> entries.
      
      After that it checks the thread info using the /proc/<TID>/status file
      rather than /proc/<PID>/task/<TID>/status.  As far as I can see, they
      are the same and contain all the info we need.
      
      Using the latter eliminates the unnecessary /proc/<TID> entry.  This is
      especially useful when a large number of threads are used in the
      system.  In my experiment, around 1KB of memory on average was saved
      for each thread (which is not a thread group leader).
      
      To do this, pass both pid and tid to perf_event_prepare_comm() if it
      knows them.  If it doesn't know the pid, passing 0 as pid falls back to
      the old way.
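
The path selection described above amounts to the following. This is a minimal Python sketch, not the C implementation; `status_path()` is a hypothetical helper name.

```python
def status_path(pid: int, tid: int) -> str:
    """Pick the status file to read for a thread.

    A pid of 0 means the thread group leader is unknown, so fall back
    to the old /proc/<TID>/status path.
    """
    if pid == 0:
        return f"/proc/{tid}/status"
    return f"/proc/{pid}/task/{tid}/status"
```

For example, `status_path(1234, 1240)` gives `/proc/1234/task/1240/status`, while `status_path(0, 1240)` gives `/proc/1240/status`.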
      
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210202090118.2008551-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Reference common and uarch events for A76 · c3a9cdef
      John Garry authored
      Reduce duplication in the JSONs by referencing standard events from
      armv8-common-and-microarch.json.
      
      In general, the "PublicDescription" fields are left unmodified when
      their wording differs significantly from the standard.
      
      Apart from that, descriptions and names of events that differ only
      slightly from the standard are changed to the standard ones for
      consistency.
      
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@openeuler.org
      Link: https://lore.kernel.org/r/1611835236-34696-5-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Reference common and uarch events for Ampere eMag · d02d5dc8
      John Garry authored
      Reduce duplication in the JSONs by referencing standard events from
      armv8-common-and-microarch.json.
      
      In general, the "PublicDescription" fields are left unmodified when
      their wording differs significantly from the standard.
      
      Apart from that, descriptions and names of events that differ only
      slightly from the standard are changed to the standard ones for
      consistency.
      
      Note that names for events 0x34 and 0x35 are non-standard and remain
      unchanged. Those events came from the following originally:
      
        https://github.com/AmpereComputing/ampere-centos-kernel/blob/4c2479c67bbcf35b35224db12a092b33682b181c/Documentation/arm64/eMAG-ARM-CoreImpDefined.pdf
      
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Cc: mathieu.poirier@linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@openeuler.org
      Link: https://lore.kernel.org/r/1611835236-34696-4-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Add common and uarch event JSON · c7766966
      John Garry authored
      Add a common and microarch JSON, which can be referenced from CPU JSONs.
      
      For now, both the brief and public descriptions are the brief event
      description from the ARMv8 ARM [0], D7-11.
      
      The list of events is not complete, as not all events will be referenced
      yet.
      
      Reference document is at the following:
      
      [0] https://documentation-service.arm.com/static/5fa3bd1eb209f547eebd4141?token=
      
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@openeuler.org
      Link: https://lore.kernel.org/r/1611835236-34696-3-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Fix Ampere eMag event typo · 2bf797be
      John Garry authored
      The "briefdescription" for event 0x35 has a typo - fix it.
      
      Fixes: d35c595b ("perf vendor events arm64: Revise core JSON events for eMAG")
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@openeuler.org
      Link: https://lore.kernel.org/r/1611835236-34696-2-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Support DSO filter like in other perf tools · 4b799a9b
      Jin Yao authored
      Other perf tool builtins already support a DSO filter.
      
      For example:
      
        $ perf report --dsos a,b,c
      
      which considers only symbols in these DSOs.
      
      Now the DSO filter is supported in 'perf script':
      
        root@kbl-ppc:~# ./perf script --dsos "[kernel.kallsyms]"
                  perf 18123 [000] 6142863.075104:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
                  perf 18123 [000] 6142863.075107:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
                  perf 18123 [000] 6142863.075108:         10   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
                  perf 18123 [000] 6142863.075109:        273   cycles:  ffffffff9ca7730a native_write_msr+0xa ([kernel.kallsyms])
                  perf 18123 [000] 6142863.075110:       7684   cycles:  ffffffff9ca3c9c0 native_sched_clock+0x50 ([kernel.kallsyms])
                  perf 18123 [000] 6142863.075112:     213017   cycles:  ffffffff9d765a92 syscall_exit_to_user_mode+0x32 ([kernel.kallsyms])
                  perf 18123 [001] 6142863.075156:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
                  perf 18123 [001] 6142863.075158:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
                  perf 18123 [001] 6142863.075159:         17   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
      
      Committer testing:
      
        $ perf script
                      ls 2364888 29303.010949:          1 cycles:u:  ffffffffa4bbc6a9 [unknown] ([unknown])
                      ls 2364888 29303.010957:          1 cycles:u:  ffffffffa429ef48 [unknown] ([unknown])
                      ls 2364888 29303.010961:          1 cycles:u:  ffffffffa4260133 [unknown] ([unknown])
                      ls 2364888 29303.010964:          5 cycles:u:  ffffffffa429efad [unknown] ([unknown])
                      ls 2364888 29303.010967:         41 cycles:u:  ffffffffa42a4586 [unknown] ([unknown])
                      ls 2364888 29303.010972:        435 cycles:u:  ffffffffa429efe0 [unknown] ([unknown])
                      ls 2364888 29303.010978:       5142 cycles:u:      7f9b95bc2abf __GI___tunables_init+0x11f (/usr/lib64/ld-2.32.so)
                      ls 2364888 29303.011006:      38551 cycles:u:  ffffffffa4290f61 [unknown] ([unknown])
                      ls 2364888 29303.011486:     238234 cycles:u:      7f9b95bb7741 _dl_relocate_object+0xa71 (/usr/lib64/ld-2.32.so)
                      ls 2364888 29303.011937:     415870 cycles:u:      7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so)
        $
      
      Before:
      
        $ perf script --dsos /usr/lib64/libc-2.32.so |& head -5
          Error: unknown option `dsos'
      
         Usage: perf script [<options>]
            or: perf script [<options>] record <script> [<record-options>] <command>
            or: perf script [<options>] report <script> [script-args]
        $
      
      After:
      
        $ perf script --dsos /usr/lib64/libc-2.32.so
                      ls 2364888 29303.011937:     415870 cycles:u:      7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so)
        $
      
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210124232750.19170-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix DSO filtering when not finding a map for a sampled address · c69bf11a
      Arnaldo Carvalho de Melo authored
      When we look up an address and don't find a map, we should filter out
      that sample if the user specified a list of --dso entries to filter on.
      Fix it.
      
      Before:
      
        $ perf script
                   sleep 274800  2843.556162:          1 cycles:u:  ffffffffbb26bff4 [unknown] ([unknown])
                   sleep 274800  2843.556168:          1 cycles:u:  ffffffffbb2b047d [unknown] ([unknown])
                   sleep 274800  2843.556171:          1 cycles:u:  ffffffffbb2706b2 [unknown] ([unknown])
                   sleep 274800  2843.556174:          6 cycles:u:  ffffffffbb2b0267 [unknown] ([unknown])
                   sleep 274800  2843.556176:         59 cycles:u:  ffffffffbb2b03b1 [unknown] ([unknown])
                   sleep 274800  2843.556180:        691 cycles:u:  ffffffffbb26bff4 [unknown] ([unknown])
                   sleep 274800  2843.556189:       9160 cycles:u:      7fa9550eeaa3 __GI___tunables_init+0xf3 (/usr/lib64/ld-2.32.so)
                   sleep 274800  2843.556312:      86937 cycles:u:      7fa9550e157b _dl_lookup_symbol_x+0x4b (/usr/lib64/ld-2.32.so)
        $
      
      So we have some samples for which we somehow didn't find a map. If we
      now do:
      
        $ perf report --stdio --dso /usr/lib64/ld-2.32.so
        # dso: /usr/lib64/ld-2.32.so
        #
        # Total Lost Samples: 0
        #
        # Samples: 8  of event 'cycles:u'
        # Event count (approx.): 96856
        #
        # Overhead  Command  Symbol
        # ........  .......  ........................
        #
            89.76%  sleep    [.] _dl_lookup_symbol_x
             9.46%  sleep    [.] __GI___tunables_init
             0.71%  sleep    [k] 0xffffffffbb26bff4
             0.06%  sleep    [k] 0xffffffffbb2b03b1
             0.01%  sleep    [k] 0xffffffffbb2b0267
             0.00%  sleep    [k] 0xffffffffbb2706b2
             0.00%  sleep    [k] 0xffffffffbb2b047d
        $
      
      After this patch we get the right output with just entries for the DSOs
      specified in --dso:
      
        $ perf report --stdio --dso /usr/lib64/ld-2.32.so
        # dso: /usr/lib64/ld-2.32.so
        #
        # Total Lost Samples: 0
        #
        # Samples: 8  of event 'cycles:u'
        # Event count (approx.): 96856
        #
        # Overhead  Command  Symbol
        # ........  .......  ........................
        #
            89.76%  sleep    [.] _dl_lookup_symbol_x
             9.46%  sleep    [.] __GI___tunables_init
        $
        #
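
The intended filtering rule can be sketched as follows. This is an illustrative Python model, not perf's C code: the `Sample` type and `filter_by_dso()` are hypothetical, and a sample with no map is modeled as `dso=None`. The addresses come from the output above.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Sample(NamedTuple):
    addr: int
    dso: Optional[str]  # None when no map was found for the address


def filter_by_dso(samples: Iterable[Sample],
                  dsos: Optional[Set[str]]) -> List[Sample]:
    """Keep only samples that resolve to one of the requested DSOs.

    With no DSO list, everything is kept. With a DSO list, a sample
    whose address has no map cannot match any entry, so it is dropped
    rather than passed through.
    """
    if not dsos:
        return list(samples)
    return [s for s in samples if s.dso is not None and s.dso in dsos]


samples = [
    Sample(0xffffffffbb26bff4, None),
    Sample(0x7fa9550eeaa3, "/usr/lib64/ld-2.32.so"),
    Sample(0x7fa9550e157b, "/usr/lib64/ld-2.32.so"),
]
kept = filter_by_dso(samples, {"/usr/lib64/ld-2.32.so"})
print(len(kept))  # 2
```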
      
      Fixes: 96415e4d ("perf symbols: Avoid unnecessary symbol loading when dso list is specified")
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210128131209.GD775562@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Add Topdown metrics events as default events · 42641d6f
      Kan Liang authored
      The Topdown Microarchitecture Analysis (TMA) Method is a structured
      analysis methodology to identify critical performance bottlenecks in
      out-of-order processors. On Ice Lake and later platforms, the
      Topdown information can be retrieved from the dedicated "metrics"
      register, which isn't impacted by other events. Also, the Topdown
      metrics support both per-thread/process and per-core measurement.
      Adding Topdown metrics events as default events enriches the default
      measurement information and does not cost any extra multiplexing.
      
      Introduce arch_evlist__add_default_attrs() to allow architecture
      specific default events. Add the Topdown metrics events in the X86
      specific arch_evlist__add_default_attrs(). Other architectures can add
      their own default events later separately.
      
      With the patch:
      
       $ perf stat sleep 1
      
       Performance counter stats for 'sleep 1':
      
                 0.82 msec task-clock:u              #    0.001 CPUs utilized
                    0      context-switches:u        #    0.000 K/sec
                    0      cpu-migrations:u          #    0.000 K/sec
                   61      page-faults:u             #    0.074 M/sec
              319,941      cycles:u                  #    0.388 GHz
              242,802      instructions:u            #    0.76  insn per cycle
               54,380      branches:u                #   66.028 M/sec
                4,043      branch-misses:u           #    7.43% of all branches
            1,585,555      slots:u                   # 1925.189 M/sec
              238,941      topdown-retiring:u        #     15.0% retiring
              410,378      topdown-bad-spec:u        #     25.8% bad speculation
              634,222      topdown-fe-bound:u        #     39.9% frontend bound
              304,675      topdown-be-bound:u        #     19.2% backend bound
      
             1.001791625 seconds time elapsed
      
             0.000000000 seconds user
             0.001572000 seconds sys
      
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20210121133752.118327-1-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Add parse-metric memory bandwidth testcase · 7efce5c2
      John Garry authored
      Event duration_time in a metric expression requires special handling.
      
      Improve test coverage by including a metric whose expression includes
      duration_time. The actual metric is copied from the L1D_Cache_Fill_BW
      metric on my Broadwell machine.
      
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linuxarm@openeuler.org
      Link: http://lore.kernel.org/lkml/1611578842-5749-1-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 27 Jan, 2021 2 commits
  3. 26 Jan, 2021 7 commits
  4. 25 Jan, 2021 17 commits
    • KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX · 9a78e158
      Paolo Bonzini authored
      VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
      which may need to be loaded outside guest mode.  Therefore we cannot
      WARN in that case.
      
      However, that part of nested_get_vmcs12_pages is _not_ needed at
      vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
      so that both vmentry and migration (and in the latter case, independent
      of is_guest_mode) do the parts that are needed.
      
      Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
      Cc: <stable@vger.kernel.org> # 5.10.x
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written" · aed89418
      Sean Christopherson authored
      Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
      to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty.  Per
      commit de3cd117 ("KVM: x86: Omit caching logic for always-available
      GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.
      
      This reverts commit 1c04d8c9.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210122235049.3107620-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest · 25009140
      Sean Christopherson authored
      Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB: the
      GPRs' dirty bits are set from time zero and never cleared, i.e. will
      always be seen as dirty.  The obvious alternative would be to clear
      the dirty bits when appropriate, but removing the dirty checks is
      desirable as it allows reverting GPR dirty+available tracking, which
      adds overhead to all flavors of x86 VMs.
      
      Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
      by the GHCB spec, which allows the hypervisor (or guest) to provide
      unnecessary info; it's the guest's responsibility to consume only what
      it needs (the hypervisor is untrusted after all).
      
        The guest and hypervisor can supply additional state if desired but
        must not rely on that additional state being provided.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210122235049.3107620-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration · d51e1d3f
      Maxim Levitsky authored
      Even when we are outside the nested guest, some vmcs02 fields
      may not be in sync vs vmcs12.  This is intentional, even across
      nested VM-exit, because the sync can be delayed until the nested
      hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
      rarely accessed fields.
      
      However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
      be able to restore it.  To fix that, call copy_vmcs02_to_vmcs12_rare()
      before the vmcs12 contents are copied to userspace.
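
The deferred-sync pattern described above can be sketched with a toy model. This is only an illustration under stated assumptions: every `toy_*` name is hypothetical, the two-field struct stands in for the real VMCS layout, and only the idea of forcing the delayed copy before the state is handed to userspace comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical two-field stand-in for a VMCS. */
struct toy_vmcs {
    unsigned long common;   /* field synced on every nested VM-exit */
    unsigned long rare;     /* rarely accessed field, synced lazily */
};

struct toy_nested {
    struct toy_vmcs vmcs02; /* state the hardware actually ran with */
    struct toy_vmcs vmcs12; /* copy visible to the nested hypervisor */
    bool need_sync_rare;    /* true while vmcs12.rare may be stale */
};

/* Copy the rarely accessed field only if it is still outstanding. */
static void toy_sync_rare(struct toy_nested *n)
{
    if (!n->need_sync_rare)
        return;
    n->vmcs12.rare = n->vmcs02.rare;
    n->need_sync_rare = false;
}

/* Nested VM-exit: sync the common field now, defer the rare one. */
static void toy_nested_vmexit(struct toy_nested *n)
{
    n->vmcs12.common = n->vmcs02.common;
    n->need_sync_rare = true;
}

/* Snapshot for migration: force the deferred sync before copying out. */
static void toy_get_nested_state(struct toy_nested *n, struct toy_vmcs *out)
{
    toy_sync_rare(n);
    memcpy(out, &n->vmcs12, sizeof(*out));
}
```

Without the `toy_sync_rare()` call in `toy_get_nested_state()`, the snapshot would carry a stale `rare` value even though the vCPU is outside the nested guest.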
      
      Fixes: 7952d769 ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: tracing: Fix unmatched kvm_entry and kvm_exit events · d95df951
      Lorenzo Brescia authored
      On VMX, if we exit and then re-enter immediately without leaving
      the vmx_vcpu_run() function, the kvm_entry event is not logged.
      That means we will see one (or more) kvm_exit, without its (their)
      corresponding kvm_entry, as shown here:
      
       CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
       CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
       CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE
      
      It also seems possible for a kvm_entry event to be logged, but then
      we leave vmx_vcpu_run() right away (if vmx->emulation_required is
      true). In this case, we will have a spurious kvm_entry event in the
      trace.
      
      Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
      (where trace_kvm_exit() already is).
      
      A trace obtained with this patch applied looks like this:
      
       CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
       CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT
      
      Of course, not calling trace_kvm_entry() in common x86 code any
      longer means that we need to adjust the SVM side of things too.
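
The two trace shapes quoted above can be told apart mechanically. The following is a minimal sketch of such a check, not part of the patch: the `toy_*` names are hypothetical, and events are reduced to a plain enum rather than parsed from real ftrace output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of the two tracepoints to an enum. */
enum toy_event { TOY_KVM_ENTRY, TOY_KVM_EXIT };

/*
 * Return true when entries and exits strictly alternate, starting with an
 * entry. A trailing entry (vCPU still in the guest) is accepted.
 */
static bool toy_trace_is_paired(const enum toy_event *ev, size_t n)
{
    bool in_guest = false;

    for (size_t i = 0; i < n; i++) {
        if (ev[i] == TOY_KVM_ENTRY) {
            if (in_guest)
                return false;   /* spurious entry: no exit in between */
            in_guest = true;
        } else {
            if (!in_guest)
                return false;   /* exit without a matching entry */
            in_guest = false;
        }
    }
    return true;
}
```

The first trace in the message (entry, exit, exit) fails this check, while the trace obtained with the patch applied (entry, exit, entry, exit) passes.
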
      Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
      Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
      Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG · 01ead84c
      Zenghui Yu authored
      Update the wording, including the wrong parameter name and the vague
      description of how the "slot" field is used.
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Message-Id: <20201208043439.895-1-yuzenghui@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: get smi pending status correctly · 1f7becf1
      Jay Zhou authored
      The injection process of smi has two steps:
      
          Qemu                        KVM
      Step1:
          cpu->interrupt_request &= \
              ~CPU_INTERRUPT_SMI;
          kvm_vcpu_ioctl(cpu, KVM_SMI)
      
                                      call kvm_vcpu_ioctl_smi() and
                                      kvm_make_request(KVM_REQ_SMI, vcpu);
      
      Step2:
          kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
      
                                      call process_smi() if
                                      kvm_check_request(KVM_REQ_SMI, vcpu) is
                                      true, mark vcpu->arch.smi_pending = true;
      
      vcpu->arch.smi_pending is set to true in step 2. Unfortunately, if the
      vcpu is paused between step 1 and step 2, kvm_run->immediate_exit will
      be set and the vcpu has to exit to Qemu immediately during step 2, before
      vcpu->arch.smi_pending is marked true.
      During VM migration, Qemu reads the SMI pending status from KVM with the
      KVM_GET_VCPU_EVENTS ioctl at downtime, so the SMI pending status is
      lost.
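
The window described above can be reproduced in a toy model. This is a sketch under stated assumptions: all `toy_*` names are hypothetical, the two booleans stand in for the KVM_REQ_SMI request bit and vcpu->arch.smi_pending, and the second reader shows one possible remedy rather than what the patch is confirmed to do, since the message does not spell out the fix.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vcpu {
    bool req_smi;       /* stands in for the KVM_REQ_SMI request bit */
    bool smi_pending;   /* stands in for vcpu->arch.smi_pending */
};

/* Step 1: the SMI ioctl only raises the request. */
static void toy_ioctl_smi(struct toy_vcpu *v)
{
    v->req_smi = true;
}

/* Step 2: the run ioctl turns the request into smi_pending, unless
 * immediate_exit forces a return to userspace first. */
static void toy_run(struct toy_vcpu *v, bool immediate_exit)
{
    if (immediate_exit)
        return;
    if (v->req_smi) {
        v->req_smi = false;
        v->smi_pending = true;
    }
}

/* Reader that looks at smi_pending only: misses an unprocessed request. */
static bool toy_get_events_naive(const struct toy_vcpu *v)
{
    return v->smi_pending;
}

/* Assumed remedy: fold an outstanding request in before reporting. */
static bool toy_get_events_folded(struct toy_vcpu *v)
{
    if (v->req_smi) {
        v->req_smi = false;
        v->smi_pending = true;
    }
    return v->smi_pending;
}
```

With `toy_ioctl_smi()` followed by `toy_run(v, true)`, the naive reader reports no pending SMI, which is the state that would be migrated and lost.
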
      Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
      Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com>
      Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[] · 98dd2f10
      Like Xu authored
      The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
      0x0300 in the intel_perfmon_event_map[]. Correct its usage.
      
      Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh() · e61ab2a3
      Like Xu authored
      vPMU will not work properly when (1) the guest bit_width(s) of the
      [gp|fixed] counters are greater than the host ones, or (2) the
      architectural events requested by the guest exceed the range supported
      by the host. In those cases we can set up a smaller left shift value and
      refresh the guest cpuid entry, thus fixing the following UBSAN
      shift-out-of-bounds warning:
      
      shift exponent 197 is too large for 64-bit type 'long long unsigned int'
      
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
       intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
       kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
       kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
       kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
       kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
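
The clamping idea behind the fix can be sketched in user-space C. This is an illustration only: `toy_counter_bitmask()` is a hypothetical helper, and the assumption that the host width is below 64 is stated in the code rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp the guest-requested counter width to the
 * host's width, then build the counter bitmask. Shifting a 64-bit value
 * by 64 or more (e.g. the exponent 197 in the warning) is undefined in C.
 */
static uint64_t toy_counter_bitmask(unsigned guest_width, unsigned host_width)
{
    /* Assumption for this sketch: the host reports a sane width. */
    assert(host_width > 0 && host_width < 64);

    unsigned width = guest_width < host_width ? guest_width : host_width;

    return ((uint64_t)1 << width) - 1;
}
```

A guest-supplied width of 197 against a 48-bit host then yields a 48-bit mask instead of an out-of-range shift.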
      
      Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add more protection against undefined behavior in rsvd_bits() · eb79cd00
      Sean Christopherson authored
      Add compile-time asserts in rsvd_bits() to guard against KVM passing in
      garbage hardcoded values, and cap the upper bound at '63' for dynamic
      values to prevent generating a mask that would overflow a u64.
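
The runtime half of this change, capping the upper bound at 63, can be sketched as a user-space helper. This is a hedged stand-in, not the kernel's rsvd_bits(): the name `toy_rsvd_bits()` is hypothetical, the compile-time asserts are omitted, and the `s >= 0` precondition is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits s..e (inclusive) set; hypothetical sketch. */
static uint64_t toy_rsvd_bits(int s, int e)
{
    assert(s >= 0);     /* assumption: callers never pass a negative start */

    if (e > 63)
        e = 63;         /* cap so the mask cannot exceed 64 bits */
    if (e < s)
        return 0;       /* empty range */

    /* 2ULL << (e - s) keeps the shift count at 63 or less even when the
     * range covers all 64 bits, where 1ULL << 64 would be undefined. */
    return ((2ULL << (e - s)) - 1) << s;
}
```

For example, `toy_rsvd_bits(52, 197)` is treated as bits 52..63, and `toy_rsvd_bits(0, 63)` yields all ones without ever shifting by 64.
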
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210113204515.3473079-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM · a10f373a
      Quentin Perret authored
      The documentation classifies KVM_ENABLE_CAP with KVM_CAP_ENABLE_CAP_VM
      as a vcpu ioctl, which is incorrect. Fix it by specifying it as a VM
      ioctl.
      
      Fixes: e5d83c74 ("kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic")
      Signed-off-by: Quentin Perret <qperret@google.com>
      Message-Id: <20210108165349.747359-1-qperret@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvmarm-fixes-5.11-2' of... · 615099b0
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 5.11, take #2
      
      - Don't allow tagged pointers to point to memslots
      - Filter out ARMv8.1+ PMU events on v8.0 hardware
      - Hide PMU registers from userspace when no PMU is configured
      - More PMU cleanups
      - Don't try to handle broken PSCI firmware
      - More sys_reg() to reg_to_encoding() conversions
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 13391c60
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "Fix a regression in the cesa driver"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: marvel/cesa - Fix tdma descriptor on 64-bit
    • fs/pipe: allow sendfile() to pipe again · f8ad8187
      Johannes Berg authored
      After commit 36e2c742 ("fs: don't allow splice read/write
      without explicit ops") sendfile() could no longer send data
      from a real file to a pipe, breaking for example certain cgit
      setups (e.g. when running behind fcgiwrap), because in this
      case cgit will try to do exactly this: sendfile() to a pipe.
      
      Fix this by using iter_file_splice_write for the splice_write
      method of pipes, as suggested by Christoph.
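
The user-visible behavior that regressed can be exercised with a small Linux-only program fragment. This is a sketch, not a test from the patch: `toy_sendfile_to_pipe()` is a hypothetical helper, and it assumes the message is short enough to fit in the pipe buffer so the write side never blocks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write msg to a temporary regular file, sendfile() it into a pipe, and
 * read it back into out. Returns the number of bytes read, or -1 on any
 * failure (including sendfile() rejecting the pipe as a destination).
 */
static ssize_t toy_sendfile_to_pipe(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg);
    ssize_t got = -1;
    int p[2];

    FILE *f = tmpfile();
    if (!f)
        return -1;
    if (fwrite(msg, 1, len, f) != len || fflush(f) != 0 || pipe(p) != 0) {
        fclose(f);
        return -1;
    }

    off_t off = 0;  /* read from the start of the file */
    ssize_t sent = sendfile(p[1], fileno(f), &off, len);
    if (sent > 0) {
        size_t want = (size_t)sent < outlen ? (size_t)sent : outlen;
        got = read(p[0], out, want);
    }

    close(p[0]);
    close(p[1]);
    fclose(f);
    return got;
}
```

On a kernel carrying the regression, the `sendfile()` call is the step expected to fail, so the helper would return -1 instead of the message length.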
      
      Cc: stable@vger.kernel.org
      Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Commit 9bb48c82 ("tty: implement write_iter") converted the tty · 9f12e37c
      Sami Tolvanen authored
      layer to use write_iter. Fix the redirected_tty_write declaration
      also in n_tty and change the comparisons to use write_iter instead of
      write.
      
      [ Also moved the declaration of redirected_tty_write() to the proper
        location in a header file. The reason for the bug was the bogus extern
        declaration in n_tty.c silently not matching the changed definition in
        tty_io.c, and because it wasn't in a shared header file, there was no
        cross-checking of the declaration.
      
        Sami noticed because Clang's Control Flow Integrity checking ended up
        incidentally noticing the inconsistent declaration.    - Linus ]
      
      Fixes: 9bb48c82 ("tty: implement write_iter")
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'printk-for-5.11-urgent-fixup' of... · 007ad27d
      Linus Torvalds authored
      Merge tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
      
      Pull printk fix from Petr Mladek:
       "The fix of a potential buffer overflow in 5.11-rc5 introduced another
        one. The trailing '\0' might be written up to the message "len" past
        the buffer. Fortunately, it is not that easy to hit.
      
        Most readers use 1kB buffers for a single message. Typical messages
        fit into the temporary buffer with enough reserve.
      
        Also readers do not rely on the '\0'. It is related to the previous
        fix. Some readers required the space for the trailing '\0'. We decided
        to write it there to avoid such regressions in the future.
      
        The most realistic victims are dumpers using kmsg_dump_get_buffer().
        They are filling the entire buffer with as many messages as possible.
        They are typically used when handling panic()"
      
      * tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk: fix string termination for record_print_text()
    • Merge branch 'printk-rework' into for-linus · 61bb17da
      Petr Mladek authored