Update Linux to v5.4.2
Change-Id: Idf6911045d9d382da2cfe01b1edff026404ac8fd
diff --git a/Documentation/trace/coresight-cpu-debug.txt b/Documentation/trace/coresight-cpu-debug.rst
similarity index 83%
rename from Documentation/trace/coresight-cpu-debug.txt
rename to Documentation/trace/coresight-cpu-debug.rst
index 89ab09e..993dd29 100644
--- a/Documentation/trace/coresight-cpu-debug.txt
+++ b/Documentation/trace/coresight-cpu-debug.rst
@@ -1,8 +1,9 @@
- Coresight CPU Debug Module
- ==========================
+==========================
+Coresight CPU Debug Module
+==========================
- Author: Leo Yan <leo.yan@linaro.org>
- Date: April 5th, 2017
+ :Author: Leo Yan <leo.yan@linaro.org>
+ :Date: April 5th, 2017
Introduction
------------
@@ -69,6 +70,7 @@
have been enabled properly. In ARMv8-a ARM (ARM DDI 0487A.k) chapter 'H9.1
Debug registers', the debug registers are spread into two domains: the debug
domain and the CPU domain.
+::
+---------------+
| |
@@ -125,18 +127,21 @@
"coresight_cpu_debug.enable=1" to the kernel command line parameter.
The driver also can work as module, so can enable the debugging when insmod
-module:
-# insmod coresight_cpu_debug.ko debug=1
+module::
+
+ # insmod coresight_cpu_debug.ko debug=1
When boot time or insmod module you have not enabled the debugging, the driver
uses the debugfs file system to provide a knob to dynamically enable or disable
debugging:
-To enable it, write a '1' into /sys/kernel/debug/coresight_cpu_debug/enable:
-# echo 1 > /sys/kernel/debug/coresight_cpu_debug/enable
+To enable it, write a '1' into /sys/kernel/debug/coresight_cpu_debug/enable::
-To disable it, write a '0' into /sys/kernel/debug/coresight_cpu_debug/enable:
-# echo 0 > /sys/kernel/debug/coresight_cpu_debug/enable
+ # echo 1 > /sys/kernel/debug/coresight_cpu_debug/enable
+
+To disable it, write a '0' into /sys/kernel/debug/coresight_cpu_debug/enable::
+
+ # echo 0 > /sys/kernel/debug/coresight_cpu_debug/enable
As explained in chapter "Clock and power domain", if you are working on one
platform which has idle states to power off debug logic and the power
@@ -151,37 +156,37 @@
It is possible to disable CPU idle states by way of the PM QoS
subsystem, more specifically by using the "/dev/cpu_dma_latency"
-interface (see Documentation/power/pm_qos_interface.txt for more
+interface (see Documentation/power/pm_qos_interface.rst for more
details). As specified in the PM QoS documentation the requested
parameter will stay in effect until the file descriptor is released.
-For example:
+For example::
-# exec 3<> /dev/cpu_dma_latency; echo 0 >&3
-...
-Do some work...
-...
-# exec 3<>-
+ # exec 3<> /dev/cpu_dma_latency; echo 0 >&3
+ ...
+ Do some work...
+ ...
+ # exec 3<>-
The same can also be done from an application program.
Disable specific CPU's specific idle state from cpuidle sysfs (see
-Documentation/cpuidle/sysfs.txt):
-# echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable
+Documentation/admin-guide/pm/cpuidle.rst)::
+ # echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable
Output format
-------------
-Here is an example of the debugging output format:
+Here is an example of the debugging output format::
-ARM external debug module:
-coresight-cpu-debug 850000.debug: CPU[0]:
-coresight-cpu-debug 850000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
-coresight-cpu-debug 850000.debug: EDPCSR: handle_IPI+0x174/0x1d8
-coresight-cpu-debug 850000.debug: EDCIDSR: 00000000
-coresight-cpu-debug 850000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
-coresight-cpu-debug 852000.debug: CPU[1]:
-coresight-cpu-debug 852000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
-coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358
-coresight-cpu-debug 852000.debug: EDCIDSR: 00000000
-coresight-cpu-debug 852000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
+ ARM external debug module:
+ coresight-cpu-debug 850000.debug: CPU[0]:
+ coresight-cpu-debug 850000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
+ coresight-cpu-debug 850000.debug: EDPCSR: handle_IPI+0x174/0x1d8
+ coresight-cpu-debug 850000.debug: EDCIDSR: 00000000
+ coresight-cpu-debug 850000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
+ coresight-cpu-debug 852000.debug: CPU[1]:
+ coresight-cpu-debug 852000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock)
+ coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358
+ coresight-cpu-debug 852000.debug: EDCIDSR: 00000000
+ coresight-cpu-debug 852000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0)
diff --git a/Documentation/trace/coresight.rst b/Documentation/trace/coresight.rst
new file mode 100644
index 0000000..72f4b7e
--- /dev/null
+++ b/Documentation/trace/coresight.rst
@@ -0,0 +1,498 @@
+======================================
+Coresight - HW Assisted Tracing on ARM
+======================================
+
+ :Author: Mathieu Poirier <mathieu.poirier@linaro.org>
+ :Date: September 11th, 2014
+
+Introduction
+------------
+
+Coresight is an umbrella of technologies allowing for the debugging of ARM
+based SoCs. It includes solutions for JTAG and HW assisted tracing. This
+document is concerned with the latter.
+
+HW assisted tracing is becoming increasingly useful when dealing with systems
+that have many SoCs and other components like GPU and DMA engines. ARM has
+developed a HW assisted tracing solution by means of different components, each
+being added to a design at synthesis time to cater to specific tracing needs.
+Components are generally categorised as sources, links and sinks and are
+(usually) discovered using the AMBA bus.
+
+"Sources" generate a compressed stream representing the processor instruction
+path based on tracing scenarios as configured by users. From there the stream
+flows through the coresight system (via ATB bus) using links that are connecting
+the emanating source to a sink(s). Sinks serve as endpoints to the coresight
+implementation, either storing the compressed stream in a memory buffer or
+creating an interface to the outside world where data can be transferred to a
+host without fear of filling up the onboard coresight memory buffer.
+
+A typical coresight system would look like this::
+
+ *****************************************************************
+ **************************** AMBA AXI ****************************===||
+ ***************************************************************** ||
+ ^ ^ | ||
+ | | * **
+ 0000000 ::::: 0000000 ::::: ::::: @@@@@@@ ||||||||||||
+ 0 CPU 0<-->: C : 0 CPU 0<-->: C : : C : @ STM @ || System ||
+ |->0000000 : T : |->0000000 : T : : T :<--->@@@@@ || Memory ||
+ | #######<-->: I : | #######<-->: I : : I : @@@<-| ||||||||||||
+ | # ETM # ::::: | # PTM # ::::: ::::: @ |
+ | ##### ^ ^ | ##### ^ ! ^ ! . | |||||||||
+ | |->### | ! | |->### | ! | ! . | || DAP ||
+ | | # | ! | | # | ! | ! . | |||||||||
+ | | . | ! | | . | ! | ! . | | |
+ | | . | ! | | . | ! | ! . | | *
+ | | . | ! | | . | ! | ! . | | SWD/
+ | | . | ! | | . | ! | ! . | | JTAG
+ *****************************************************************<-|
+ *************************** AMBA Debug APB ************************
+ *****************************************************************
+ | . ! . ! ! . |
+ | . * . * * . |
+ *****************************************************************
+ ******************** Cross Trigger Matrix (CTM) *******************
+ *****************************************************************
+ | . ^ . . |
+ | * ! * * |
+ *****************************************************************
+ ****************** AMBA Advanced Trace Bus (ATB) ******************
+ *****************************************************************
+ | ! =============== |
+ | * ===== F =====<---------|
+ | ::::::::: ==== U ====
+ |-->:: CTI ::<!! === N ===
+ | ::::::::: ! == N ==
+ | ^ * == E ==
+ | ! &&&&&&&&& IIIIIII == L ==
+ |------>&& ETB &&<......II I =======
+ | ! &&&&&&&&& II I .
+ | ! I I .
+ | ! I REP I<..........
+ | ! I I
+ | !!>&&&&&&&&& II I *Source: ARM ltd.
+ |------>& TPIU &<......II I DAP = Debug Access Port
+ &&&&&&&&& IIIIIII ETM = Embedded Trace Macrocell
+ ; PTM = Program Trace Macrocell
+ ; CTI = Cross Trigger Interface
+ * ETB = Embedded Trace Buffer
+ To trace port TPIU= Trace Port Interface Unit
+ SWD = Serial Wire Debug
+
+While on-target configuration of the components is done via the APB bus,
+all trace data are carried out-of-band on the ATB bus. The CTM provides
+a way to aggregate and distribute signals between CoreSight components.
+
+The coresight framework provides a central point to represent, configure and
+manage coresight devices on a platform. This first implementation centers on
+the basic tracing functionality, enabling components such as ETM/PTM, funnel,
+replicator, TMC, TPIU and ETB. Future work will enable more
+intricate IP blocks such as STM and CTI.
+
+
+Acronyms and Classification
+---------------------------
+
+Acronyms:
+
+PTM:
+ Program Trace Macrocell
+ETM:
+ Embedded Trace Macrocell
+STM:
+ System Trace Macrocell
+ETB:
+ Embedded Trace Buffer
+ITM:
+ Instrumentation Trace Macrocell
+TPIU:
+ Trace Port Interface Unit
+TMC-ETR:
+ Trace Memory Controller, configured as Embedded Trace Router
+TMC-ETF:
+ Trace Memory Controller, configured as Embedded Trace FIFO
+CTI:
+ Cross Trigger Interface
+
+Classification:
+
+Source:
+ ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
+Link:
+ Funnel, replicator (intelligent or not), TMC-ETR
+Sinks:
+ ETBv1.0, ETBv1.1, TPIU, TMC-ETF
+Misc:
+ CTI
+
+
+Device Tree Bindings
+--------------------
+
+See Documentation/devicetree/bindings/arm/coresight.txt for details.
+
+As of this writing drivers for ITM, STMs and CTIs are not provided but are
+expected to be added as the solution matures.
+
+
+Framework and implementation
+----------------------------
+
+The coresight framework provides a central point to represent, configure and
+manage coresight devices on a platform. Any coresight compliant device can
+register with the framework as long as it uses the right APIs:
+
+.. c:function:: struct coresight_device *coresight_register(struct coresight_desc *desc);
+.. c:function:: void coresight_unregister(struct coresight_device *csdev);
+
+The registration function takes a ``struct coresight_desc *desc`` and
+registers the device with the core framework. The unregister function takes
+a reference to a ``struct coresight_device *csdev`` obtained at registration time.
+
+If everything goes well during the registration process the new devices will
+show up under /sys/bus/coresight/devices, as shown here for a TC2 platform::
+
+ root:~# ls /sys/bus/coresight/devices/
+ replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
+ 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
+ root:~#
+
+The registration function takes a ``struct coresight_desc``, which looks like this::
+
+ struct coresight_desc {
+ enum coresight_dev_type type;
+ struct coresight_dev_subtype subtype;
+ const struct coresight_ops *ops;
+ struct coresight_platform_data *pdata;
+ struct device *dev;
+ const struct attribute_group **groups;
+ };
+
+
+The "coresight_dev_type" identifies what the device is, i.e. source, link or
+sink, while the "coresight_dev_subtype" will characterise that type further.
+
+The ``struct coresight_ops`` is mandatory and will tell the framework how to
+perform base operations related to the components, each component having
+a different set of requirements. For that purpose, ``struct coresight_ops_sink``,
+``struct coresight_ops_link`` and ``struct coresight_ops_source`` have been
+provided.
+
+The next field ``struct coresight_platform_data *pdata`` is acquired by calling
+``of_get_coresight_platform_data()``, as part of the driver's _probe routine and
+``struct device *dev`` gets the device reference embedded in the ``amba_device``::
+
+ static int etm_probe(struct amba_device *adev, const struct amba_id *id)
+ {
+ ...
+ ...
+ drvdata->dev = &adev->dev;
+ ...
+ }
+
+Specific classes of device (source, link, or sink) have generic operations
+that can be performed on them (see ``struct coresight_ops``). The ``**groups``
+is a list of sysfs entries pertaining to operations
+specific to that component only. "Implementation defined" customisations are
+expected to be accessed and controlled using those entries.
+
+Device Naming scheme
+--------------------
+
+The devices that appear on the "coresight" bus were named the same as their
+parent devices, i.e. the real devices that appear on the AMBA bus or the platform bus.
+Thus the names were based on the Linux Open Firmware layer naming convention,
+which follows the base physical address of the device followed by the device
+type. e.g::
+
+ root:~# ls /sys/bus/coresight/devices/
+ 20010000.etf 20040000.funnel 20100000.stm 22040000.etm
+ 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu
+ 20070000.etr 20120000.replicator 220c0000.funnel
+ 23040000.etm 23140000.etm 23340000.etm
+
+However, with the introduction of ACPI support, the names of the real
+devices are a bit cryptic and non-obvious. Thus, a new naming scheme was
+introduced to use more generic names based on the type of the device. The
+following rules apply::
+
+ 1) Devices that are bound to CPUs, are named based on the CPU logical
+ number.
+
+ e.g, ETM bound to CPU0 is named "etm0"
+
+ 2) All other devices follow a pattern, "<device_type_prefix>N", where :
+
+ <device_type_prefix> - A prefix specific to the type of the device
+ N - a sequential number assigned based on the order
+ of probing.
+
+ e.g, tmc_etf0, tmc_etr0, funnel0, funnel1
+
+Thus, with the new scheme the devices could appear as ::
+
+ root:~# ls /sys/bus/coresight/devices/
+ etm0 etm1 etm2 etm3 etm4 etm5 funnel0
+ funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0
+
+Some of the examples below might refer to old naming scheme and some
+to the newer scheme, to give a confirmation that what you see on your
+system is not unexpected. One must use the "names" as they appear on
+the system under specified locations.
+
+How to use the tracer modules
+-----------------------------
+
+There are two ways to use the Coresight framework:
+
+1. using the perf cmd line tools.
+2. interacting directly with the Coresight devices using the sysFS interface.
+
+Preference is given to the former as using the sysFS interface
+requires a deep understanding of the Coresight HW. The following sections
+provide details on using both methods.
+
+1) Using the sysFS interface:
+
+Before trace collection can start, a coresight sink needs to be identified.
+There is no limit on the number of sinks (nor sources) that can be enabled at
+any given moment. As a generic operation, all devices pertaining to the sink
+class will have an "enable_sink" entry in sysfs::
+
+ root:/sys/bus/coresight/devices# ls
+ replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
+ 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
+ root:/sys/bus/coresight/devices# ls 20010000.etb
+ enable_sink status trigger_cntr
+ root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
+ root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
+ 1
+ root:/sys/bus/coresight/devices#
+
+At boot time the current etm3x driver will configure the first address
+comparator with "_stext" and "_etext", essentially tracing any instruction
+that falls within that range. As such "enabling" a source will immediately
+trigger a trace capture::
+
+ root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
+ root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
+ 1
+ root:/sys/bus/coresight/devices# cat 20010000.etb/status
+ Depth: 0x2000
+ Status: 0x1
+ RAM read ptr: 0x0
+ RAM wrt ptr: 0x19d3 <----- The write pointer is moving
+ Trigger cnt: 0x0
+ Control: 0x1
+ Flush status: 0x0
+ Flush ctrl: 0x2001
+ root:/sys/bus/coresight/devices#
+
+Trace collection is stopped the same way::
+
+ root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
+ root:/sys/bus/coresight/devices#
+
+The content of the ETB buffer can be harvested directly from /dev::
+
+ root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
+ of=~/cstrace.bin
+ 64+0 records in
+ 64+0 records out
+ 32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
+ root:/sys/bus/coresight/devices#
+
+The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.
+
+Following is a DS-5 output of an experimental loop that increments a variable up
+to a certain value. The example is simple and yet provides a glimpse of the
+wealth of possibilities that coresight provides.
+::
+
+ Info Tracing enabled
+ Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr}
+ Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc
+ Instruction 0 0x8026B544 E3A03000 false MOV r3,#0
+ Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Timestamp Timestamp: 17106715833
+ Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4]
+ Instruction 0 0x8026B550 E3530004 false CMP r3,#4
+ Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
+ Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
+ Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
+ Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1
+ Instruction 0 0x8026B564 E1A0100D false MOV r1,sp
+ Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0
+ Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f
+ Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4]
+ Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368
+ Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc]
+ Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0]
+ Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4
+ Info Tracing enabled
+ Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc
+ Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc}
+ Timestamp Timestamp: 17107041535
+
+2) Using perf framework:
+
+Coresight tracers are represented using the Perf framework's Performance
+Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
+controlling when tracing gets enabled based on when the process of interest is
+scheduled. When configured in a system, Coresight PMUs will be listed when
+queried by the perf command line tool::
+
+ linaro@linaro-nano:~$ ./perf list pmu
+
+ List of pre-defined events (to be used in -e):
+
+ cs_etm// [Kernel PMU event]
+
+ linaro@linaro-nano:~$
+
+Regardless of the number of tracers available in a system (usually equal to the
+number of processor cores), the "cs_etm" PMU will be listed only once.
+
+A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU is
+listed along with configuration options within forward slashes '/'. Since a
+Coresight system will typically have more than one sink, the name of the sink to
+work with needs to be specified as an event option.
+On newer kernels the available sinks are listed in sysFS under
+($SYSFS)/bus/event_source/devices/cs_etm/sinks/::
+
+ root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
+ tmc_etf0 tmc_etr0 tpiu0
+
+On older kernels, this may need to be found from the list of coresight devices,
+available under ($SYSFS)/bus/coresight/devices/::
+
+ root:~# ls /sys/bus/coresight/devices/
+ etm0 etm1 etm2 etm3 etm4 etm5 funnel0
+ funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0
+ root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program
+
+As mentioned above in section "Device Naming scheme", the names of the devices could
+look different from what is used in the example above. One must use the device names
+as they appear under sysFS.
+
+The syntax within the forward slashes '/' is important. The '@' character
+tells the parser that a sink is about to be specified and that this is the sink
+to use for the trace session.
+
+More information on the above and other examples on how to use Coresight with
+the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
+repository [#third]_.
+
+2.1) AutoFDO analysis using the perf tools:
+
+perf can be used to record and analyze traces of programs.
+
+Execution can be recorded using 'perf record' with the cs_etm event,
+specifying the name of the sink to record to, e.g::
+
+ perf record -e cs_etm/@tmc_etr0/u --per-thread
+
+The 'perf report' and 'perf script' commands can be used to analyze execution,
+synthesizing instruction and branch events from the instruction trace.
+'perf inject' can be used to replace the trace data with the synthesized events.
+The --itrace option controls the type and frequency of synthesized events
+(see perf documentation).
+
+Note that only 64-bit programs are currently supported - further work is
+required to support instruction decode of 32-bit Arm programs.
+
+
+Generating coverage files for Feedback Directed Optimization: AutoFDO
+---------------------------------------------------------------------
+
+'perf inject' accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+::
+
+ perf inject --itrace --strip -i perf.data -o perf.data.new
+
+Below is an example of using ARM ETM for autoFDO. It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5. The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
+::
+
+ $ gcc-5 -O3 sort.c -o sort
+ $ taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 5910 ms
+
+ $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 12543 ms
+ [ perf record: Woken up 35 times to write data ]
+ [ perf record: Captured and wrote 69.640 MB perf.data ]
+
+ $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
+ $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
+ $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+ $ taskset -c 2 ./sort_autofdo
+ Bubble sorting array of 30000 elements
+ 5806 ms
+
+
+How to use the STM module
+-------------------------
+
+Using the System Trace Macrocell module is the same as the tracers - the only
+difference is that clients are driving the trace capture rather
+than the program flow through the code.
+
+As with any other CoreSight component, specifics about the STM tracer can be
+found in sysfs with more information on each entry being found in [#first]_::
+
+ root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
+ enable_source hwevent_select port_enable subsystem uevent
+ hwevent_enable mgmt port_select traceid
+ root@genericarmv8:~#
+
+As with any other source, a sink needs to be identified and the STM enabled before
+being used::
+
+ root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
+ root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source
+
+From there user space applications can request and use channels using the devfs
+interface provided for that purpose by the generic STM API::
+
+ root@genericarmv8:~# ls -l /dev/stm0
+ crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0
+ root@genericarmv8:~#
+
+Details on how to use the generic STM API can be found here [#second]_.
+
+.. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
+
+.. [#second] Documentation/trace/stm.rst
+
+.. [#third] https://github.com/Linaro/perf-opencsd
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
deleted file mode 100644
index efbc832..0000000
--- a/Documentation/trace/coresight.txt
+++ /dev/null
@@ -1,430 +0,0 @@
- Coresight - HW Assisted Tracing on ARM
- ======================================
-
- Author: Mathieu Poirier <mathieu.poirier@linaro.org>
- Date: September 11th, 2014
-
-Introduction
-------------
-
-Coresight is an umbrella of technologies allowing for the debugging of ARM
-based SoC. It includes solutions for JTAG and HW assisted tracing. This
-document is concerned with the latter.
-
-HW assisted tracing is becoming increasingly useful when dealing with systems
-that have many SoCs and other components like GPU and DMA engines. ARM has
-developed a HW assisted tracing solution by means of different components, each
-being added to a design at synthesis time to cater to specific tracing needs.
-Components are generally categorised as source, link and sinks and are
-(usually) discovered using the AMBA bus.
-
-"Sources" generate a compressed stream representing the processor instruction
-path based on tracing scenarios as configured by users. From there the stream
-flows through the coresight system (via ATB bus) using links that are connecting
-the emanating source to a sink(s). Sinks serve as endpoints to the coresight
-implementation, either storing the compressed stream in a memory buffer or
-creating an interface to the outside world where data can be transferred to a
-host without fear of filling up the onboard coresight memory buffer.
-
-At typical coresight system would look like this:
-
- *****************************************************************
- **************************** AMBA AXI ****************************===||
- ***************************************************************** ||
- ^ ^ | ||
- | | * **
- 0000000 ::::: 0000000 ::::: ::::: @@@@@@@ ||||||||||||
- 0 CPU 0<-->: C : 0 CPU 0<-->: C : : C : @ STM @ || System ||
- |->0000000 : T : |->0000000 : T : : T :<--->@@@@@ || Memory ||
- | #######<-->: I : | #######<-->: I : : I : @@@<-| ||||||||||||
- | # ETM # ::::: | # PTM # ::::: ::::: @ |
- | ##### ^ ^ | ##### ^ ! ^ ! . | |||||||||
- | |->### | ! | |->### | ! | ! . | || DAP ||
- | | # | ! | | # | ! | ! . | |||||||||
- | | . | ! | | . | ! | ! . | | |
- | | . | ! | | . | ! | ! . | | *
- | | . | ! | | . | ! | ! . | | SWD/
- | | . | ! | | . | ! | ! . | | JTAG
- *****************************************************************<-|
- *************************** AMBA Debug APB ************************
- *****************************************************************
- | . ! . ! ! . |
- | . * . * * . |
- *****************************************************************
- ******************** Cross Trigger Matrix (CTM) *******************
- *****************************************************************
- | . ^ . . |
- | * ! * * |
- *****************************************************************
- ****************** AMBA Advanced Trace Bus (ATB) ******************
- *****************************************************************
- | ! =============== |
- | * ===== F =====<---------|
- | ::::::::: ==== U ====
- |-->:: CTI ::<!! === N ===
- | ::::::::: ! == N ==
- | ^ * == E ==
- | ! &&&&&&&&& IIIIIII == L ==
- |------>&& ETB &&<......II I =======
- | ! &&&&&&&&& II I .
- | ! I I .
- | ! I REP I<..........
- | ! I I
- | !!>&&&&&&&&& II I *Source: ARM ltd.
- |------>& TPIU &<......II I DAP = Debug Access Port
- &&&&&&&&& IIIIIII ETM = Embedded Trace Macrocell
- ; PTM = Program Trace Macrocell
- ; CTI = Cross Trigger Interface
- * ETB = Embedded Trace Buffer
- To trace port TPIU= Trace Port Interface Unit
- SWD = Serial Wire Debug
-
-While on target configuration of the components is done via the APB bus,
-all trace data are carried out-of-band on the ATB bus. The CTM provides
-a way to aggregate and distribute signals between CoreSight components.
-
-The coresight framework provides a central point to represent, configure and
-manage coresight devices on a platform. This first implementation centers on
-the basic tracing functionality, enabling components such ETM/PTM, funnel,
-replicator, TMC, TPIU and ETB. Future work will enable more
-intricate IP blocks such as STM and CTI.
-
-
-Acronyms and Classification
----------------------------
-
-Acronyms:
-
-PTM: Program Trace Macrocell
-ETM: Embedded Trace Macrocell
-STM: System trace Macrocell
-ETB: Embedded Trace Buffer
-ITM: Instrumentation Trace Macrocell
-TPIU: Trace Port Interface Unit
-TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
-TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
-CTI: Cross Trigger Interface
-
-Classification:
-
-Source:
- ETMv3.x ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
-Link:
- Funnel, replicator (intelligent or not), TMC-ETR
-Sinks:
- ETBv1.0, ETB1.1, TPIU, TMC-ETF
-Misc:
- CTI
-
-
-Device Tree Bindings
-----------------------
-
-See Documentation/devicetree/bindings/arm/coresight.txt for details.
-
-As of this writing drivers for ITM, STMs and CTIs are not provided but are
-expected to be added as the solution matures.
-
-
-Framework and implementation
-----------------------------
-
-The coresight framework provides a central point to represent, configure and
-manage coresight devices on a platform. Any coresight compliant device can
-register with the framework for as long as they use the right APIs:
-
-struct coresight_device *coresight_register(struct coresight_desc *desc);
-void coresight_unregister(struct coresight_device *csdev);
-
-The registering function is taking a "struct coresight_device *csdev" and
-register the device with the core framework. The unregister function takes
-a reference to a "struct coresight_device", obtained at registration time.
-
-If everything goes well during the registration process the new devices will
-show up under /sys/bus/coresight/devices, as showns here for a TC2 platform:
-
-root:~# ls /sys/bus/coresight/devices/
-replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
-20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
-root:~#
-
-The functions take a "struct coresight_device", which looks like this:
-
-struct coresight_desc {
- enum coresight_dev_type type;
- struct coresight_dev_subtype subtype;
- const struct coresight_ops *ops;
- struct coresight_platform_data *pdata;
- struct device *dev;
- const struct attribute_group **groups;
-};
-
-
-The "coresight_dev_type" identifies what the device is, i.e, source link or
-sink while the "coresight_dev_subtype" will characterise that type further.
-
-The "struct coresight_ops" is mandatory and will tell the framework how to
-perform base operations related to the components, each component having
-a different set of requirement. For that "struct coresight_ops_sink",
-"struct coresight_ops_link" and "struct coresight_ops_source" have been
-provided.
-
-The next field, "struct coresight_platform_data *pdata" is acquired by calling
-"of_get_coresight_platform_data()", as part of the driver's _probe routine and
-"struct device *dev" gets the device reference embedded in the "amba_device":
-
-static int etm_probe(struct amba_device *adev, const struct amba_id *id)
-{
- ...
- ...
- drvdata->dev = &adev->dev;
- ...
-}
-
-Specific class of device (source, link, or sink) have generic operations
-that can be performed on them (see "struct coresight_ops"). The
-"**groups" is a list of sysfs entries pertaining to operations
-specific to that component only. "Implementation defined" customisations are
-expected to be accessed and controlled using those entries.
-
-
-How to use the tracer modules
------------------------------
-
-There are two ways to use the Coresight framework: 1) using the perf cmd line
-tools and 2) interacting directly with the Coresight devices using the sysFS
-interface. Preference is given to the former as using the sysFS interface
-requires a deep understanding of the Coresight HW. The following sections
-provide details on using both methods.
-
-1) Using the sysFS interface:
-
-Before trace collection can start, a coresight sink needs to be identified.
-There is no limit on the amount of sinks (nor sources) that can be enabled at
-any given moment. As a generic operation, all device pertaining to the sink
-class will have an "active" entry in sysfs:
-
-root:/sys/bus/coresight/devices# ls
-replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
-20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
-root:/sys/bus/coresight/devices# ls 20010000.etb
-enable_sink status trigger_cntr
-root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
-root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
-1
-root:/sys/bus/coresight/devices#
-
-At boot time the current etm3x driver will configure the first address
-comparator with "_stext" and "_etext", essentially tracing any instruction
-that falls within that range. As such "enabling" a source will immediately
-trigger a trace capture:
-
-root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
-root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
-1
-root:/sys/bus/coresight/devices# cat 20010000.etb/status
-Depth: 0x2000
-Status: 0x1
-RAM read ptr: 0x0
-RAM wrt ptr: 0x19d3 <----- The write pointer is moving
-Trigger cnt: 0x0
-Control: 0x1
-Flush status: 0x0
-Flush ctrl: 0x2001
-root:/sys/bus/coresight/devices#
-
-Trace collection is stopped the same way:
-
-root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
-root:/sys/bus/coresight/devices#
-
-The content of the ETB buffer can be harvested directly from /dev:
-
-root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
-of=~/cstrace.bin
-
-64+0 records in
-64+0 records out
-32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
-root:/sys/bus/coresight/devices#
-
-The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.
-
-Following is a DS-5 output of an experimental loop that increments a variable up
-to a certain value. The example is simple and yet provides a glimpse of the
-wealth of possibilities that coresight provides.
-
-Info Tracing enabled
-Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr}
-Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc
-Instruction 0 0x8026B544 E3A03000 false MOV r3,#0
-Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4]
-Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4]
-Instruction 0 0x8026B550 E3530004 false CMP r3,#4
-Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
-Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
-Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
-Timestamp Timestamp: 17106715833
-Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4]
-Instruction 0 0x8026B550 E3530004 false CMP r3,#4
-Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
-Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
-Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
-Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4]
-Instruction 0 0x8026B550 E3530004 false CMP r3,#4
-Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
-Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
-Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
-Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
-Instruction 0 0x8026B550 E3530004 false CMP r3,#4
-Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
-Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
-Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
-Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
-Instruction 0 0x8026B550 E3530004 false CMP r3,#4
-Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
-Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
-Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
-Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4]
-Instruction 0 0x8026B550 E3530004 false CMP r3,#4
-Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
-Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
-Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
-Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1
-Instruction 0 0x8026B564 E1A0100D false MOV r1,sp
-Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0
-Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f
-Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4]
-Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368
-Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc]
-Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0]
-Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4
-Info Tracing enabled
-Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc
-Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc}
-Timestamp Timestamp: 17107041535
-
-2) Using perf framework:
-
-Coresight tracers are represented using the Perf framework's Performance
-Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
-controlling when tracing gets enabled based on when the process of interest is
-scheduled. When configured in a system, Coresight PMUs will be listed when
-queried by the perf command line tool:
-
- linaro@linaro-nano:~$ ./perf list pmu
-
- List of pre-defined events (to be used in -e):
-
- cs_etm// [Kernel PMU event]
-
- linaro@linaro-nano:~$
-
-Regardless of the number of tracers available in a system (usually equal to the
-amount of processor cores), the "cs_etm" PMU will be listed only once.
-
-A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is
-listed along with configuration options within forward slashes '/'. Since a
-Coresight system will typically have more than one sink, the name of the sink to
-work with needs to be specified as an event option. Names for sink to choose
-from are listed in sysFS under ($SYSFS)/bus/coresight/devices:
-
- root@linaro-nano:~# ls /sys/bus/coresight/devices/
- 20010000.etf 20040000.funnel 20100000.stm 22040000.etm
- 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu
- 20070000.etr 20120000.replicator 220c0000.funnel
- 23040000.etm 23140000.etm 23340000.etm
-
- root@linaro-nano:~# perf record -e cs_etm/@20070000.etr/u --per-thread program
-
-The syntax within the forward slashes '/' is important. The '@' character
-tells the parser that a sink is about to be specified and that this is the sink
-to use for the trace session.
-
-More information on the above and other example on how to use Coresight with
-the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub
-repository [3].
-
-2.1) AutoFDO analysis using the perf tools:
-
-perf can be used to record and analyze trace of programs.
-
-Execution can be recorded using 'perf record' with the cs_etm event,
-specifying the name of the sink to record to, e.g:
-
- perf record -e cs_etm/@20070000.etr/u --per-thread
-
-The 'perf report' and 'perf script' commands can be used to analyze execution,
-synthesizing instruction and branch events from the instruction trace.
-'perf inject' can be used to replace the trace data with the synthesized events.
-The --itrace option controls the type and frequency of synthesized events
-(see perf documentation).
-
-Note that only 64-bit programs are currently supported - further work is
-required to support instruction decode of 32-bit Arm programs.
-
-
-Generating coverage files for Feedback Directed Optimization: AutoFDO
----------------------------------------------------------------------
-
-'perf inject' accepts the --itrace option in which case tracing data is
-removed and replaced with the synthesized events. e.g.
-
- perf inject --itrace --strip -i perf.data -o perf.data.new
-
-Below is an example of using ARM ETM for autoFDO. It requires autofdo
-(https://github.com/google/autofdo) and gcc version 5. The bubble
-sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
-
- $ gcc-5 -O3 sort.c -o sort
- $ taskset -c 2 ./sort
- Bubble sorting array of 30000 elements
- 5910 ms
-
- $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
- Bubble sorting array of 30000 elements
- 12543 ms
- [ perf record: Woken up 35 times to write data ]
- [ perf record: Captured and wrote 69.640 MB perf.data ]
-
- $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
- $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
- $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
- $ taskset -c 2 ./sort_autofdo
- Bubble sorting array of 30000 elements
- 5806 ms
-
-
-How to use the STM module
--------------------------
-
-Using the System Trace Macrocell module is the same as the tracers - the only
-difference is that clients are driving the trace capture rather
-than the program flow through the code.
-
-As with any other CoreSight component, specifics about the STM tracer can be
-found in sysfs with more information on each entry being found in [1]:
-
-root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
-enable_source hwevent_select port_enable subsystem uevent
-hwevent_enable mgmt port_select traceid
-root@genericarmv8:~#
-
-Like any other source a sink needs to be identified and the STM enabled before
-being used:
-
-root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
-root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source
-
-From there user space applications can request and use channels using the devfs
-interface provided for that purpose by the generic STM API:
-
-root@genericarmv8:~# ls -l /dev/20100000.stm
-crw------- 1 root root 10, 61 Jan 3 18:11 /dev/20100000.stm
-root@genericarmv8:~#
-
-Details on how to use the generic STM API can be found here [2].
-
-[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
-[2]. Documentation/trace/stm.rst
-[3]. https://github.com/Linaro/perf-opencsd
diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 7ea16a0..e3060ee 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -24,13 +24,13 @@
performance issues that take place outside of user-space.
Although ftrace is typically considered the function tracer, it
-is really a frame work of several assorted tracing utilities.
+is really a framework of several assorted tracing utilities.
There's latency tracing to examine what occurs between interrupts
disabled and enabled, as well as for preemption and from a time
a task is woken to the task is actually scheduled in.
One of the most common uses of ftrace is the event tracing.
-Through out the kernel is hundreds of static event points that
+Throughout the kernel are hundreds of static event points that
can be enabled via the tracefs file system to see what is
going on in certain parts of the kernel.
@@ -125,7 +125,8 @@
This file holds the output of the trace in a human
readable format (described below). Note, tracing is temporarily
- disabled while this file is being read (opened).
+ disabled when the file is open for reading. Once all readers
+ are closed, tracing is re-enabled.
trace_pipe:
@@ -139,8 +140,9 @@
will not be read again with a sequential read. The
"trace" file is static, and if the tracer is not
adding more data, it will display the same
- information every time it is read. This file will not
- disable tracing while being read.
+ information every time it is read. Unlike the
+ "trace" file, opening this file for reading will not
+ temporarily disable tracing.
trace_options:
@@ -233,6 +235,12 @@
This interface also allows for commands to be used. See the
"Filter commands" section for more details.
+ As a speed up, since processing strings can be quite expensive
+ and requires a check of all functions registered to tracing, instead
+ an index can be written into this file. A number (starting with "1")
+ written will instead select the function at the corresponding line
+ position of the "available_filter_functions" file.
+
set_ftrace_notrace:
This has an effect opposite to that of
@@ -462,7 +470,7 @@
mono_raw:
This is the raw monotonic clock (CLOCK_MONOTONIC_RAW)
- which is montonic but is not subject to any rate adjustments
+ which is monotonic but is not subject to any rate adjustments
and ticks at the same rate as the hardware clocksource.
boot:
@@ -759,6 +767,37 @@
tracers from tracing simply echo "nop" into
current_tracer.
+Error conditions
+----------------
+
+ For most ftrace commands, failure modes are obvious and communicated
+ using standard return codes.
+
+ For other more involved commands, extended error information may be
+ available via the tracing/error_log file. For the commands that
+ support it, reading the tracing/error_log file after an error will
+ display more detailed information about what went wrong, if
+ information is available. The tracing/error_log file is a circular
+ error log displaying a small number (currently, 8) of ftrace errors
+ for the last (8) failed commands.
+
+ The extended error information and usage takes the form shown in
+ this example::
+
+ # echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
+ echo: write error: Invalid argument
+
+ # cat /sys/kernel/debug/tracing/error_log
+ [ 5348.887237] location: error: Couldn't yyy: zzz
+ Command: xxx
+ ^
+ [ 7517.023364] location: error: Bad rrr: sss
+ Command: ppp qqq
+ ^
+
+ To clear the error log, echo the empty string into it::
+
+ # echo > /sys/kernel/debug/tracing/error_log
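
The circular behaviour described above can be modelled in a few lines. This is only a sketch under stated assumptions: the class name `ErrorLogModel` and the `(command, message)` entry shape are hypothetical, and the kernel's real implementation is not shown here. The capacity of 8 is taken from the text above.

```python
from collections import deque

# Illustrative model only; the capacity of 8 comes from the text
# above ("currently, 8"), everything else is an assumption.
ERROR_LOG_CAPACITY = 8


class ErrorLogModel:
    """Hypothetical stand-in for the circular behaviour of tracing/error_log."""

    def __init__(self, capacity=ERROR_LOG_CAPACITY):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def record(self, command, message):
        # Called once per failed command in this model.
        self._entries.append((command, message))

    def read(self):
        # Oldest surviving entry first, newest last.
        return list(self._entries)

    def clear(self):
        # Mirrors "echo > error_log": writing the empty string clears the log.
        self._entries.clear()


log = ErrorLogModel()
for i in range(10):
    log.record(f"cmd{i}", f"error {i}")

entries = log.read()
```

After ten failed commands only the last eight remain, so the two oldest entries are no longer visible to a reader of the log.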
Examples of using the tracer
----------------------------
@@ -914,8 +953,8 @@
current trace and the next trace.
- '$' - greater than 1 second
- - '@' - greater than 100 milisecond
- - '*' - greater than 10 milisecond
+ - '@' - greater than 100 millisecond
+ - '*' - greater than 10 millisecond
- '#' - greater than 1000 microsecond
- '!' - greater than 100 microsecond
- '+' - greater than 10 microsecond
@@ -1396,6 +1435,58 @@
overhead may extend the latency times. But nevertheless, this
trace has provided some very helpful debugging information.
+If we prefer function graph output instead of function output, we can set
+the display-graph option::
+
+ with echo 1 > options/display-graph
+
+ # tracer: irqsoff
+ #
+ # irqsoff latency trace v1.1.5 on 4.20.0-rc6+
+ # --------------------------------------------------------------------
+ # latency: 3751 us, #274/274, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
+ # -----------------
+ # | task: bash-1507 (uid:0 nice:0 policy:0 rt_prio:0)
+ # -----------------
+ # => started at: free_debug_processing
+ # => ended at: return_to_handler
+ #
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| /
+ # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
+ # | | | | |||| | | | | | |
+ 0 us | 0) bash-1507 | d... | 0.000 us | _raw_spin_lock_irqsave();
+ 0 us | 0) bash-1507 | d..1 | 0.378 us | do_raw_spin_trylock();
+ 1 us | 0) bash-1507 | d..2 | | set_track() {
+ 2 us | 0) bash-1507 | d..2 | | save_stack_trace() {
+ 2 us | 0) bash-1507 | d..2 | | __save_stack_trace() {
+ 3 us | 0) bash-1507 | d..2 | | __unwind_start() {
+ 3 us | 0) bash-1507 | d..2 | | get_stack_info() {
+ 3 us | 0) bash-1507 | d..2 | 0.351 us | in_task_stack();
+ 4 us | 0) bash-1507 | d..2 | 1.107 us | }
+ [...]
+ 3750 us | 0) bash-1507 | d..1 | 0.516 us | do_raw_spin_unlock();
+ 3750 us | 0) bash-1507 | d..1 | 0.000 us | _raw_spin_unlock_irqrestore();
+ 3764 us | 0) bash-1507 | d..1 | 0.000 us | tracer_hardirqs_on();
+ bash-1507 0d..1 3792us : <stack trace>
+ => free_debug_processing
+ => __slab_free
+ => kmem_cache_free
+ => vm_area_free
+ => remove_vma
+ => exit_mmap
+ => mmput
+ => flush_old_exec
+ => load_elf_binary
+ => search_binary_handler
+ => __do_execve_file.isra.32
+ => __x64_sys_execve
+ => do_syscall_64
+ => entry_SYSCALL_64_after_hwframe
preemptoff
----------
@@ -2541,7 +2632,7 @@
recordmcount program (located in the scripts directory). This
program will parse the ELF headers in the C object to find all
the locations in the .text section that call mcount. Starting
-with gcc verson 4.6, the -mfentry has been added for x86, which
+with gcc version 4.6, the -mfentry has been added for x86, which
calls "__fentry__" instead of "mcount". Which is called before
the creation of the stack frame.
@@ -2784,6 +2875,38 @@
We can see that there's no more lock or preempt tracing.
+Selecting function filters via index
+------------------------------------
+
+Because processing of strings is expensive (the address of the function
+needs to be looked up before comparing to the string being passed in),
+an index can be used as well to enable functions. This is useful in the
+case of setting thousands of specific functions at a time. By passing
+in a list of numbers, no string processing will occur. Instead, the function
+at the specific location in the internal array (which corresponds to the
+functions in the "available_filter_functions" file), is selected.
+
+::
+
+ # echo 1 > set_ftrace_filter
+
+This selects the first function listed in "available_filter_functions".
+
+::
+
+ # head -1 available_filter_functions
+ trace_initcall_finish_cb
+
+ # cat set_ftrace_filter
+ trace_initcall_finish_cb
+
+ # head -50 available_filter_functions | tail -1
+ x86_pmu_commit_txn
+
+ # echo 1 50 > set_ftrace_filter
+ # cat set_ftrace_filter
+ trace_initcall_finish_cb
+ x86_pmu_commit_txn
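
When many functions have to be selected, the indices can be computed by a script instead of by hand. The following is a minimal sketch, not part of ftrace itself: the helper name `filter_indices` and the sample list are hypothetical, and it assumes that the first whitespace-separated token on each line of "available_filter_functions" is the function name.

```python
def filter_indices(wanted, available_lines):
    """Return the 1-based line positions of the wanted functions.

    available_lines holds the lines read from available_filter_functions.
    Only the first whitespace-separated token of each line is treated as
    the function name (an assumption, so that any trailing annotation on
    a line is ignored).
    """
    position = {}
    for lineno, line in enumerate(available_lines, start=1):
        tokens = line.split()
        if tokens:
            # Keep the first occurrence if a name appears more than once.
            position.setdefault(tokens[0], lineno)
    missing = [name for name in wanted if name not in position]
    if missing:
        raise KeyError(f"not in available_filter_functions: {missing}")
    return [position[name] for name in wanted]


# Hypothetical sample standing in for the real file contents.
sample = ["trace_initcall_finish_cb", "trace_initcall_start_cb", "do_one_initcall"]
indices = filter_indices(["do_one_initcall", "trace_initcall_finish_cb"], sample)

# Space-separated text, in the same shape as "echo 1 50 > set_ftrace_filter".
payload = " ".join(str(i) for i in indices)
```

On a real system the lines would be read from the "available_filter_functions" file and `payload` written to "set_ftrace_filter"; the indices are only valid for the kernel and module set they were read from.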
Dynamic ftrace with the function graph tracer
---------------------------------------------
@@ -2978,7 +3101,7 @@
When the function is hit, it will dump the contents of the ftrace
ring buffer to the console. This is useful if you need to debug
something, and want to dump the trace when a certain function
- is hit. Perhaps its a function that is called before a tripple
+ is hit. Perhaps it's a function that is called before a triple
fault happens and does not allow you to get a regular dump.
- cpudump:
@@ -2987,6 +3110,9 @@
command, it only prints out the contents of the ring buffer for the
CPU that executed the function that triggered the dump.
+- stacktrace:
+ When the function is hit, a stack trace is recorded.
+
trace_pipe
----------
@@ -3029,7 +3155,10 @@
Note, reading the trace_pipe file will block until more input is
-added.
+added. This is contrary to the trace file. If any process opened
+the trace file for reading, it will actually disable tracing and
+prevent new entries from being added. The trace_pipe file does
+not have this limitation.
trace entries
-------------
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 5ac724b..8408670 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -25,7 +25,7 @@
hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
[:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
- [:clear][:name=histname1] [if <filter>]
+ [:clear][:name=histname1][:<handler>.<action>] [if <filter>]
When a matching event is hit, an entry is added to a hash table
using the key(s) and value(s) named. Keys and values correspond to
@@ -199,20 +199,8 @@
For some error conditions encountered when invoking a hist trigger
command, extended error information is available via the
- corresponding event's 'hist' file. Reading the hist file after an
- error will display more detailed information about what went wrong,
- if information is available. This extended error information will
- be available until the next hist trigger command for that event.
-
- If available for a given error condition, the extended error
- information and usage takes the following form::
-
- # echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
- echo: write error: Invalid argument
-
- # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
- ERROR: Couldn't yyy: zzz
- Last command: xxx
+ tracing/error_log file. See Error Conditions in
+ :file:`Documentation/trace/ftrace.rst` for details.
6.2 'hist' trigger examples
---------------------------
@@ -1022,7 +1010,7 @@
For example, suppose we wanted to take a look at the relative
weights in terms of skb length for each callpath that leads to a
- netif_receieve_skb event when downloading a decent-sized file using
+ netif_receive_skb event when downloading a decent-sized file using
wget.
First we set up an initially paused stacktrace trigger on the
@@ -1765,7 +1753,7 @@
# echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...' >> event2/trigger
-In the first line above, the event's timetamp is saved into the
+In the first line above, the event's timestamp is saved into the
variable ts0. In the next line, ts0 is subtracted from the second
event's timestamp to produce the latency, which is then assigned into
yet another variable, 'wakeup_lat'. The hist trigger below in turn
@@ -1811,7 +1799,7 @@
/sys/kernel/debug/tracing/synthetic_events
At this point, there isn't yet an actual 'wakeup_latency' event
-instantiated in the event subsytem - for this to happen, a 'hist
+instantiated in the event subsystem - for this to happen, a 'hist
trigger action' needs to be instantiated and bound to actual fields
and variables defined on other events (see Section 2.2.3 below on
how that is done using hist trigger 'onmatch' action). Once that is
@@ -1831,45 +1819,94 @@
Like any other event, once a histogram is enabled for the event, the
output can be displayed by reading the event's 'hist' file.
-2.2.3 Hist trigger 'actions'
-----------------------------
+2.2.3 Hist trigger 'handlers' and 'actions'
+-------------------------------------------
-A hist trigger 'action' is a function that's executed whenever a
-histogram entry is added or updated.
+A hist trigger 'action' is a function that's executed (in most cases
+conditionally) whenever a histogram entry is added or updated.
-The default 'action' if no special function is explicity specified is
-as it always has been, to simply update the set of values associated
-with an entry. Some applications, however, may want to perform
-additional actions at that point, such as generate another event, or
-compare and save a maximum.
+When a histogram entry is added or updated, a hist trigger 'handler'
+is what decides whether the corresponding action is actually invoked
+or not.
-The following additional actions are available. To specify an action
-for a given event, simply specify the action between colons in the
-hist trigger specification.
+Hist trigger handlers and actions are paired together in the general
+form:
- - onmatch(matching.event).<synthetic_event_name>(param list)
+ <handler>.<action>
- The 'onmatch(matching.event).<synthetic_event_name>(params)' hist
- trigger action is invoked whenever an event matches and the
- histogram entry would be added or updated. It causes the named
- synthetic event to be generated with the values given in the
+To specify a handler.action pair for a given event, simply specify
+that handler.action pair between colons in the hist trigger
+specification.
+
+In theory, any handler can be combined with any action, but in
+practice, not every handler.action combination is currently supported;
+if a given handler.action combination isn't supported, the hist
+trigger will fail with -EINVAL.
+
+The default 'handler.action' if none is explicitly specified is as it
+always has been, to simply update the set of values associated with an
+entry. Some applications, however, may want to perform additional
+actions at that point, such as generate another event, or compare and
+save a maximum.
+
+The supported handlers and actions are listed below, and each is
+described in more detail in the following paragraphs, in the context
+of descriptions of some common and useful handler.action combinations.
+
+The available handlers are:
+
+ - onmatch(matching.event) - invoke action on any addition or update
+ - onmax(var) - invoke action if var exceeds current max
+ - onchange(var) - invoke action if var changes
+
+The available actions are:
+
+ - trace(<synthetic_event_name>,param list) - generate synthetic event
+ - save(field,...) - save current event fields
+ - snapshot() - snapshot the trace buffer
+
+The following commonly-used handler.action pairs are available:
+
+ - onmatch(matching.event).trace(<synthetic_event_name>,param list)
+
+ The 'onmatch(matching.event).trace(<synthetic_event_name>,param
+ list)' hist trigger action is invoked whenever an event matches
+ and the histogram entry would be added or updated. It causes the
+ named synthetic event to be generated with the values given in the
'param list'. The result is the generation of a synthetic event
that consists of the values contained in those variables at the
- time the invoking event was hit.
+ time the invoking event was hit. For example, if the synthetic
+ event name is 'wakeup_latency', a wakeup_latency event is
+ generated using onmatch(event).trace(wakeup_latency,arg1,arg2).
- The 'param list' consists of one or more parameters which may be
- either variables or fields defined on either the 'matching.event'
- or the target event. The variables or fields specified in the
- param list may be either fully-qualified or unqualified. If a
- variable is specified as unqualified, it must be unique between
- the two events. A field name used as a param can be unqualified
- if it refers to the target event, but must be fully qualified if
- it refers to the matching event. A fully-qualified name is of the
- form 'system.event_name.$var_name' or 'system.event_name.field'.
+ There is also an equivalent alternative form available for
+ generating synthetic events. In this form, the synthetic event
+ name is used as if it were a function name. For example, using
+ the 'wakeup_latency' synthetic event name again, the
+ wakeup_latency event would be generated by invoking it as if it
+ were a function call, with the event field values passed in as
+ arguments: onmatch(event).wakeup_latency(arg1,arg2). The syntax
+ for this form is:
+
+ onmatch(matching.event).<synthetic_event_name>(param list)
+
+ In either case, the 'param list' consists of one or more
+ parameters which may be either variables or fields defined on
+ either the 'matching.event' or the target event. The variables or
+ fields specified in the param list may be either fully-qualified
+ or unqualified. If a variable is specified as unqualified, it
+ must be unique between the two events. A field name used as a
+ param can be unqualified if it refers to the target event, but
+ must be fully qualified if it refers to the matching event. A
+ fully-qualified name is of the form 'system.event_name.$var_name'
+ or 'system.event_name.field'.
The 'matching.event' specification is simply the fully qualified
event name of the event that matches the target event for the
- onmatch() functionality, in the form 'system.event_name'.
+ onmatch() functionality, in the form 'system.event_name'. Histogram
+ keys of both events are compared to determine whether the events
+ match. If multiple histogram keys are used, they must all match in
+ the specified order.
Finally, the number and type of variables/fields in the 'param
list' must match the number and types of the fields in the
@@ -1896,6 +1933,12 @@
wakeup_new_test($testpid) if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
+ Or, equivalently, using the 'trace' keyword syntax:
+
+ # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
+ trace(wakeup_new_test,$testpid) if comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
+
Creating and displaying a histogram based on those events is now
just a matter of using the fields and new synthetic event in the
tracing/events/synthetic directory, as usual::
@@ -1926,9 +1969,9 @@
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger
Then, when the corresponding thread is actually scheduled onto the
- CPU by a sched_switch event, calculate the latency and use that
- along with another variable and an event field to generate a
- wakeup_latency synthetic event::
+ CPU by a sched_switch event (saved_pid matches next_pid), calculate
+ the latency and use that along with another variable and an event field
+ to generate a wakeup_latency synthetic event::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:\
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\
@@ -2000,6 +2043,214 @@
Entries: 2
Dropped: 0
+ - onmax(var).snapshot()
+
+ The 'onmax(var).snapshot()' hist trigger action is invoked
+ whenever the value of 'var' associated with a histogram entry
+ exceeds the current maximum contained in that variable.
+
+ The end result is that a global snapshot of the trace buffer will
+ be saved in the tracing/snapshot file if 'var' exceeds the current
+ maximum for any hist trigger entry.
+
+ Note that in this case the maximum is a global maximum for the
+ current trace instance, which is the maximum across all buckets of
+ the histogram. The key of the specific trace event that caused
+ the global maximum and the global maximum itself are displayed,
+ along with a message stating that a snapshot has been taken and
+ where to find it. The user can use the key information displayed
+ to locate the corresponding bucket in the histogram for even more
+ detail.
+
+ As an example, the following defines a couple of hist triggers, one for
+ sched_waking and another for sched_switch, keyed on pid. Whenever
+ a sched_waking event occurs, the timestamp is saved in the entry
+ corresponding to the current pid, and when the scheduler switches
+ back to that pid, the timestamp difference is calculated. If the
+ resulting latency, stored in wakeup_lat, exceeds the current
+ maximum latency, a snapshot is taken. As part of the setup, all
+ the scheduler events are also enabled, which are the events that
+ will show up in the snapshot when it is taken at some point::
+
+ # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+
+ # echo 'hist:keys=pid:ts0=common_timestamp.usecs \
+ if comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
+
+ # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
+ onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio, \
+ prev_comm):onmax($wakeup_lat).snapshot() \
+ if next_comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+ When the histogram is displayed, for each bucket the max value
+ and the saved values corresponding to the max are displayed
+ following the rest of the fields.
+
+ If a snapshot was taken, there is also a message indicating that,
+ along with the value and event that triggered the global maximum::
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
+ { next_pid: 2101 } hitcount: 200
+ max: 52 next_prio: 120 next_comm: cyclictest \
+ prev_pid: 0 prev_prio: 120 prev_comm: swapper/6
+
+ { next_pid: 2103 } hitcount: 1326
+ max: 572 next_prio: 19 next_comm: cyclictest \
+ prev_pid: 0 prev_prio: 120 prev_comm: swapper/1
+
+ { next_pid: 2102 } hitcount: 1982 \
+ max: 74 next_prio: 19 next_comm: cyclictest \
+ prev_pid: 0 prev_prio: 120 prev_comm: swapper/5
+
+ Snapshot taken (see tracing/snapshot). Details:
+ triggering value { onmax($wakeup_lat) }: 572 \
+ triggered by event with key: { next_pid: 2103 }
+
+ Totals:
+ Hits: 3508
+ Entries: 3
+ Dropped: 0
+
+ In the above case, the event that triggered the global maximum has
+ the key with next_pid == 2103. If you look at the bucket that has
+ 2103 as the key, you'll find the additional values save()'d along
+ with the local maximum for that bucket, which should be the same
+ as the global maximum (since that was the same value that
+ triggered the global snapshot).
+
+ And finally, looking at the snapshot data should show at or near
+ the end the event that triggered the snapshot (in this case you
+ can verify the timestamps between the sched_waking and
+ sched_switch events, which should match the time displayed in the
+ global maximum)::
+
+ # cat /sys/kernel/debug/tracing/snapshot
+
+ <...>-2103 [005] d..3 309.873125: sched_switch: prev_comm=cyclictest prev_pid=2103 prev_prio=19 prev_state=D ==> next_comm=swapper/5 next_pid=0 next_prio=120
+ <idle>-0 [005] d.h3 309.873611: sched_waking: comm=cyclictest pid=2102 prio=19 target_cpu=005
+ <idle>-0 [005] dNh4 309.873613: sched_wakeup: comm=cyclictest pid=2102 prio=19 target_cpu=005
+ <idle>-0 [005] d..3 309.873616: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2102 next_prio=19
+ <...>-2102 [005] d..3 309.873625: sched_switch: prev_comm=cyclictest prev_pid=2102 prev_prio=19 prev_state=D ==> next_comm=swapper/5 next_pid=0 next_prio=120
+ <idle>-0 [005] d.h3 309.874624: sched_waking: comm=cyclictest pid=2102 prio=19 target_cpu=005
+ <idle>-0 [005] dNh4 309.874626: sched_wakeup: comm=cyclictest pid=2102 prio=19 target_cpu=005
+ <idle>-0 [005] dNh3 309.874628: sched_waking: comm=cyclictest pid=2103 prio=19 target_cpu=005
+ <idle>-0 [005] dNh4 309.874630: sched_wakeup: comm=cyclictest pid=2103 prio=19 target_cpu=005
+ <idle>-0 [005] d..3 309.874633: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2102 next_prio=19
+ <idle>-0 [004] d.h3 309.874757: sched_waking: comm=gnome-terminal- pid=1699 prio=120 target_cpu=004
+ <idle>-0 [004] dNh4 309.874762: sched_wakeup: comm=gnome-terminal- pid=1699 prio=120 target_cpu=004
+ <idle>-0 [004] d..3 309.874766: sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=gnome-terminal- next_pid=1699 next_prio=120
+ gnome-terminal--1699 [004] d.h2 309.874941: sched_stat_runtime: comm=gnome-terminal- pid=1699 runtime=180706 [ns] vruntime=1126870572 [ns]
+ <idle>-0 [003] d.s4 309.874956: sched_waking: comm=rcu_sched pid=9 prio=120 target_cpu=007
+ <idle>-0 [003] d.s5 309.874960: sched_wake_idle_without_ipi: cpu=7
+ <idle>-0 [003] d.s5 309.874961: sched_wakeup: comm=rcu_sched pid=9 prio=120 target_cpu=007
+ <idle>-0 [007] d..3 309.874963: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=rcu_sched next_pid=9 next_prio=120
+ rcu_sched-9 [007] d..3 309.874973: sched_stat_runtime: comm=rcu_sched pid=9 runtime=13646 [ns] vruntime=22531430286 [ns]
+ rcu_sched-9 [007] d..3 309.874978: sched_switch: prev_comm=rcu_sched prev_pid=9 prev_prio=120 prev_state=R+ ==> next_comm=swapper/7 next_pid=0 next_prio=120
+ <...>-2102 [005] d..4 309.874994: sched_migrate_task: comm=cyclictest pid=2103 prio=19 orig_cpu=5 dest_cpu=1
+ <...>-2102 [005] d..4 309.875185: sched_wake_idle_without_ipi: cpu=1
+ <idle>-0 [001] d..3 309.875200: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2103 next_prio=19
+
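The timestamp check suggested for the onmax($wakeup_lat) snapshot can be done by hand. The following is a minimal sketch, assuming a POSIX shell with awk; `lat_usecs` is a hypothetical helper name, not part of any tracing tool. It takes the sched_waking timestamp for pid 2103 (309.874628) and the timestamp of the final sched_switch to next_pid=2103 (309.875200) from the snapshot above and converts the difference to microseconds:

```shell
# lat_usecs START END
# Print the difference between two trace timestamps (given in seconds)
# as a whole number of microseconds. Hypothetical helper for illustration.
lat_usecs() {
    awk -v start="$1" -v end="$2" \
        'BEGIN { printf "%.0f\n", (end - start) * 1000000 }'
}

# sched_waking for pid 2103, then sched_switch to next_pid=2103
lat_usecs 309.874628 309.875200   # prints 572
```

The result, 572, agrees with the triggering value reported in the hist output for the bucket with next_pid == 2103.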
+ - onchange(var).save(field,...)
+
+ The 'onchange(var).save(field,...)' hist trigger action is invoked
+ whenever the value of 'var' associated with a histogram entry
+ changes.
+
+ The end result is that the trace event fields specified as the
+ onchange.save() params will be saved if 'var' changes for that
+ hist trigger entry. This allows context from the event that
+ changed the value to be saved for later reference. When the
+ histogram is displayed, additional fields displaying the saved
+ values will be printed.
+
+ - onchange(var).snapshot()
+
+ The 'onchange(var).snapshot()' hist trigger action is invoked
+ whenever the value of 'var' associated with a histogram entry
+ changes.
+
+ The end result is that a global snapshot of the trace buffer will
+ be saved in the tracing/snapshot file if 'var' changes for any
+ hist trigger entry.
+
+ Note that in this case the changed value is a global variable
+ associated with the current trace instance. The key of the specific
+ trace event that caused the value to change and the global value
+ itself are displayed, along with a message stating that a snapshot
+ has been taken and where to find it. The user can use the key
+ information displayed to locate the corresponding bucket in the
+ histogram for even more detail.
+
+ As an example the below defines a hist trigger on the tcp_probe
+ event, keyed on dport. Whenever a tcp_probe event occurs, the
+ cwnd field is checked against the current value stored in the
+ $cwnd variable. If the value has changed, a snapshot is taken.
+ As part of the setup, all the scheduler and tcp events are also
+ enabled, which are the events that will show up in the snapshot
+ when it is taken at some point:
+
+ # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+ # echo 1 > /sys/kernel/debug/tracing/events/tcp/enable
+
+ # echo 'hist:keys=dport:cwnd=snd_cwnd: \
+ onchange($cwnd).save(snd_wnd,srtt,rcv_wnd): \
+ onchange($cwnd).snapshot()' >> \
+ /sys/kernel/debug/tracing/events/tcp/tcp_probe/trigger
+
+ When the histogram is displayed, for each bucket the tracked value
+ and the saved values corresponding to that value are displayed
+ following the rest of the fields.
+
+ If a snapshot was taken, there is also a message indicating that,
+ along with the value and event that triggered the snapshot::
+
+ # cat /sys/kernel/debug/tracing/events/tcp/tcp_probe/hist
+
+ { dport: 1521 } hitcount: 8
+ changed: 10 snd_wnd: 35456 srtt: 154262 rcv_wnd: 42112
+
+ { dport: 80 } hitcount: 23
+ changed: 10 snd_wnd: 28960 srtt: 19604 rcv_wnd: 29312
+
+ { dport: 9001 } hitcount: 172
+ changed: 10 snd_wnd: 48384 srtt: 260444 rcv_wnd: 55168
+
+ { dport: 443 } hitcount: 211
+ changed: 10 snd_wnd: 26960 srtt: 17379 rcv_wnd: 28800
+
+ Snapshot taken (see tracing/snapshot). Details::
+
+ triggering value { onchange($cwnd) }: 10
+ triggered by event with key: { dport: 80 }
+
+ Totals:
+ Hits: 414
+ Entries: 4
+ Dropped: 0
+
+ In the above case, the event that triggered the snapshot has the
+ key with dport == 80. If you look at the bucket that has 80 as
+ the key, you'll find the additional values save()'d along with the
+ changed value for that bucket, which should be the same as the
+ global changed value (since that was the same value that triggered
+ the global snapshot).
+
+ And finally, looking at the snapshot data should show at or near
+ the end the event that triggered the snapshot::
+
+ # cat /sys/kernel/debug/tracing/snapshot
+
+ gnome-shell-1261 [006] dN.3 49.823113: sched_stat_runtime: comm=gnome-shell pid=1261 runtime=49347 [ns] vruntime=1835730389 [ns]
+ kworker/u16:4-773 [003] d..3 49.823114: sched_switch: prev_comm=kworker/u16:4 prev_pid=773 prev_prio=120 prev_state=R+ ==> next_comm=kworker/3:2 next_pid=135 next_prio=120
+ gnome-shell-1261 [006] d..3 49.823114: sched_switch: prev_comm=gnome-shell prev_pid=1261 prev_prio=120 prev_state=R+ ==> next_comm=kworker/6:2 next_pid=387 next_prio=120
+ kworker/3:2-135 [003] d..3 49.823118: sched_stat_runtime: comm=kworker/3:2 pid=135 runtime=5339 [ns] vruntime=17815800388 [ns]
+ kworker/6:2-387 [006] d..3 49.823120: sched_stat_runtime: comm=kworker/6:2 pid=387 runtime=9594 [ns] vruntime=14589605367 [ns]
+ kworker/6:2-387 [006] d..3 49.823122: sched_switch: prev_comm=kworker/6:2 prev_pid=387 prev_prio=120 prev_state=R+ ==> next_comm=gnome-shell next_pid=1261 next_prio=120
+ kworker/3:2-135 [003] d..3 49.823123: sched_switch: prev_comm=kworker/3:2 prev_pid=135 prev_prio=120 prev_state=T ==> next_comm=swapper/3 next_pid=0 next_prio=120
+ <idle>-0 [004] ..s7 49.823798: tcp_probe: src=10.0.0.10:54326 dest=23.215.104.193:80 mark=0x0 length=32 snd_nxt=0xe3ae2ff5 snd_una=0xe3ae2ecd snd_cwnd=10 ssthresh=2147483647 snd_wnd=28960 srtt=19604 rcv_wnd=29312
+
3. User space creating a trigger
--------------------------------
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 3069979..b7891cb 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -22,3 +22,6 @@
hwlat_detector
intel_th
stm
+ sys-t
+ coresight
+ coresight-cpu-debug
diff --git a/Documentation/trace/intel_th.rst b/Documentation/trace/intel_th.rst
index 19e2d63..baa12eb 100644
--- a/Documentation/trace/intel_th.rst
+++ b/Documentation/trace/intel_th.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
=======================
Intel(R) Trace Hub (TH)
=======================
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 8bfc75c..5599305 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -20,6 +20,9 @@
/sys/kernel/debug/tracing/kprobe_events, and enable it via
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.
+You can also use /sys/kernel/debug/tracing/dynamic_events instead of
+kprobe_events. That interface will provide unified access to other
+dynamic events too.
Synopsis of kprobe_events
-------------------------
@@ -45,16 +48,21 @@
@SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
$stackN : Fetch Nth entry of stack (N >= 0)
$stack : Fetch stack address.
- $retval : Fetch return value.(*)
+ $argN : Fetch the Nth function argument. (N >= 1) (\*1)
+ $retval : Fetch return value.(\*2)
$comm : Fetch current task comm.
- +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+ +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+ \IMM : Store an immediate value to the argument.
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
(u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
- (x8/x16/x32/x64), "string" and bitfield are supported.
+ (x8/x16/x32/x64), "string", "ustring" and bitfield
+ are supported.
- (*) only for return probe.
- (**) this is useful for fetching a field of data structures.
+ (\*1) only for the probe on function entry (offs == 0).
+ (\*2) only for return probe.
+ (\*3) this is useful for fetching a field of data structures.
+ (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.
Types
-----
@@ -64,16 +72,49 @@
in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32'
or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and
x86-64 uses x64).
+These value types can be an array. To record array data, you can add '[N]'
+(where N is a fixed number, less than 64) to the base type.
+E.g. 'x16[4]' means an array of x16 (2-byte hex) with 4 elements.
+Note that the array can be applied only to memory type fetchargs; you cannot
+apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
+wrong, but '+8($stack):x8[8]' is OK.)
String type is a special type, which fetches a "null-terminated" string from
kernel space. This means it will fail and store NULL if the string container
-has been paged out.
+has been paged out. The "ustring" type is a user-space alternative to string.
+See :ref:`user_mem_access` for more info.
+The string array type is a bit different from other types. For other base
+types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is same
+as +0(%di):x32.) But string[1] is not equal to string. The string type itself
+represents "char array", but string array type represents "char * array".
+So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.
Bitfield is another special type, which takes 3 parameters, bit-width, bit-
offset, and container-size (usually 32). The syntax is::
b<bit-width>@<bit-offset>/<container-size>
+The symbol type ('symbol') is an alias of the u32 or u64 type (depending on
+BITS_PER_LONG) which shows the given pointer in "symbol+offset" style.
For $comm, the default type is "string"; any other type is invalid.
+.. _user_mem_access:
+User Memory Access
+------------------
+Kprobe events support user-space memory access. For that purpose, you can use
+either user-space dereference syntax or 'ustring' type.
+
+The user-space dereference syntax allows you to access a field of a data
+structure in user-space. This is done by adding the "u" prefix to the
+dereference syntax. For example, +u4(%si) means it will read memory from the
+address in the register %si offset by 4, and the memory is expected to be in
+user-space. You can use this for strings too, e.g. +u0(%si):string will read
+a string from the address in the register %si that is expected to be in user-
+space. 'ustring' is a shortcut way of performing the same task. That is,
++0(%si):ustring is equivalent to +u0(%si):string.
+
+Note that kprobe-event provides the user-memory access syntax but it doesn't
+use it transparently. This means if you use normal dereference or string type
+for user memory, it might fail, and may always fail on some archs. The user
+has to carefully check if the target data is in kernel or user space.
Per-Probe Event Filtering
-------------------------
@@ -106,6 +147,20 @@
The first column is event name, the second is the number of probe hits,
the third is the number of probe miss-hits.
+Kernel Boot Parameter
+---------------------
+You can add and enable new kprobe events when booting up the kernel with the
+"kprobe_event=" parameter. The parameter accepts a semicolon-delimited list
+of kprobe events, whose format is similar to that of kprobe_events.
+The difference is that the probe definition parameters are comma-delimited
+instead of space-delimited. For example, adding a myprobe event on
+do_sys_open like below
+
+ p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)
+
+should be below for kernel boot parameter (just replace spaces with comma)
+
+ p:myprobe,do_sys_open,dfd=%ax,filename=%dx,flags=%cx,mode=+4($stack)
+
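The conversion described above is mechanical, so it can be scripted. The following is a minimal sketch, assuming a POSIX shell; `to_boot_param` is a hypothetical helper name used only for illustration. It replaces every space in a kprobe_events-style definition with a comma:

```shell
# to_boot_param DEFINITION
# Turn a space-delimited kprobe_events definition into the
# comma-delimited form used by the "kprobe_event=" boot parameter.
# Hypothetical helper; it only substitutes characters and does not
# validate the probe definition.
to_boot_param() {
    printf '%s\n' "$1" | tr ' ' ','
}

# Single quotes keep the shell from expanding $stack.
to_boot_param 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)'
```

Several converted definitions can then be joined with semicolons after "kprobe_event=", since the parameter is semicolon-delimited.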
Usage examples
--------------
@@ -171,6 +226,13 @@
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
+Use the following command to start tracing in an interval.
+::
+
+ # echo 1 > tracing_on
+ Open something...
+ # echo 0 > tracing_on
+
And you can see the traced information via /sys/kernel/debug/tracing/trace.
::
diff --git a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
index 66bfd83..995da15 100644
--- a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
+++ b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
@@ -113,7 +113,7 @@
my $regex_kswapd_sleep_default = 'nid=([0-9]*)';
my $regex_wakeup_kswapd_default = 'nid=([0-9]*) zid=([0-9]*) order=([0-9]*) gfp_flags=([A-Z_|]*)';
my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) classzone_idx=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_skipped=([0-9]*) nr_taken=([0-9]*) lru=([a-z_]*)';
-my $regex_lru_shrink_inactive_default = 'nid=([0-9]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) nr_dirty=([0-9]*) nr_writeback=([0-9]*) nr_congested=([0-9]*) nr_immediate=([0-9]*) nr_activate=([0-9]*) nr_ref_keep=([0-9]*) nr_unmap_fail=([0-9]*) priority=([0-9]*) flags=([A-Z_|]*)';
+my $regex_lru_shrink_inactive_default = 'nid=([0-9]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) nr_dirty=([0-9]*) nr_writeback=([0-9]*) nr_congested=([0-9]*) nr_immediate=([0-9]*) nr_activate_anon=([0-9]*) nr_activate_file=([0-9]*) nr_ref_keep=([0-9]*) nr_unmap_fail=([0-9]*) priority=([0-9]*) flags=([A-Z_|]*)';
my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_rotated=([0-9]*) priority=([0-9]*)';
my $regex_writepage_default = 'page=([0-9a-f]*) pfn=([0-9]*) flags=([A-Z_|]*)';
@@ -212,7 +212,8 @@
"vmscan/mm_vmscan_lru_shrink_inactive",
$regex_lru_shrink_inactive_default,
"nid", "nr_scanned", "nr_reclaimed", "nr_dirty", "nr_writeback",
- "nr_congested", "nr_immediate", "nr_activate", "nr_ref_keep",
+ "nr_congested", "nr_immediate", "nr_activate_anon",
+ "nr_activate_file", "nr_ref_keep",
"nr_unmap_fail", "priority", "flags");
$regex_lru_shrink_active = generate_traceevent_regex(
"vmscan/mm_vmscan_lru_shrink_active",
@@ -407,7 +408,7 @@
}
my $nr_reclaimed = $3;
- my $flags = $12;
+ my $flags = $13;
my $file = 0;
if ($flags =~ /RECLAIM_WB_FILE/) {
$file = 1;
diff --git a/Documentation/trace/stm.rst b/Documentation/trace/stm.rst
index 2c22ddb..99f9996 100644
--- a/Documentation/trace/stm.rst
+++ b/Documentation/trace/stm.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
===================
System Trace Module
===================
@@ -53,12 +55,30 @@
be used for trace sources with the id string of "user/dummy".
Trace sources have to open the stm class device's node and write their
-trace data into its file descriptor. In order to identify themselves
-to the policy, they need to do a STP_POLICY_ID_SET ioctl on this file
-descriptor providing their id string. Otherwise, they will be
-automatically allocated a master/channel pair upon first write to this
-file descriptor according to the "default" rule of the policy, if such
-exists.
+trace data into its file descriptor.
+
+In order to find an appropriate policy node for a given trace source,
+several mechanisms can be used. First, a trace source can explicitly
+identify itself by calling an STP_POLICY_ID_SET ioctl on the character
+device's file descriptor, providing its id string, before it writes
+any data there. Secondly, if it chooses not to perform the explicit
+identification (because you may not want to patch existing software
+to do this), it can just start writing the data, at which point the
+stm core will try to find a policy node with the name matching the
+task's name (e.g., "syslogd") and if one exists, it will be used.
+Thirdly, if the task name can't be found among the policy nodes, the
+catch-all entry "default" will be used, if it exists. This entry also
+needs to be created and configured by the system administrator or
+whatever tools are taking care of the policy configuration. Finally,
+if all the above steps failed, the write() to an stm file descriptor
+will return an error (EINVAL).
+
+Previously, if no policy nodes were found for a trace source, the stm
+class would silently fall back to allocating the first available
+contiguous range of master/channels from the beginning of the device's
+master/channel range. The new requirement for a policy node to exist
+will help programmers and sysadmins identify gaps in configuration
+and have better control over the un-identified sources.
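
The fallback order for a source that does not identify itself explicitly can be modelled in user space. The following is only an illustrative sketch, assuming a POSIX shell and a policy directory whose subdirectories are the policy nodes; `find_policy_node` is a hypothetical helper, and the real lookup is performed by the stm core in the kernel. It tries a node named after the task first, then the "default" node, and otherwise fails (standing in for the EINVAL returned by write()):

```shell
# find_policy_node POLICY_DIR TASK_NAME
# Print the name of the policy node that would be picked for a trace
# source that did not call STP_POLICY_ID_SET. Hypothetical user-space
# model of the order described above, not the kernel implementation.
find_policy_node() {
    policy_dir=$1   # directory containing the policy nodes
    task_name=$2    # name of the writing task, e.g. "syslogd"

    if [ -n "$task_name" ] && [ -d "$policy_dir/$task_name" ]; then
        # A node matching the task's name exists: use it.
        echo "$task_name"
    elif [ -d "$policy_dir/default" ]; then
        # Otherwise fall back to the catch-all entry.
        echo "default"
    else
        # No node at all: the write() would fail.
        echo "no policy node for '$task_name'" >&2
        return 1
    fi
}
```

Calling it with a policy directory and a task name such as "syslogd" prints "syslogd" when that node exists, "default" when only the catch-all exists, and returns a non-zero status when neither does.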
Some STM devices may allow direct mapping of the channel mmio regions
to userspace for zero-copy writing. One mappable page (in terms of
@@ -92,9 +112,9 @@
there's a node in the root of the policy directory that matches the
stm_source device's name (for example, "console"), this node will be
used to allocate master and channel numbers. If there's no such policy
-node, the stm core will pick the first contiguous chunk of channels
-within the first available master. Note that the node must exist
-before the stm_source device is connected to its stm device.
+node, the stm core will use the catch-all entry "default", if one
+exists. If neither policy node exists, the write() to stm_source_link
+will return an error.
stm_console
===========
diff --git a/Documentation/trace/sys-t.rst b/Documentation/trace/sys-t.rst
new file mode 100644
index 0000000..3d8eb92
--- /dev/null
+++ b/Documentation/trace/sys-t.rst
@@ -0,0 +1,62 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+MIPI SyS-T over STP
+===================
+
+The MIPI SyS-T protocol driver can be used with STM class devices to
+generate a standardized trace stream. Aside from being a standard, it
+provides better trace source identification and timestamp correlation.
+
+In order to use the MIPI SyS-T protocol driver with your STM device,
+first, you'll need CONFIG_STM_PROTO_SYS_T.
+
+Now, you can select which protocol driver you want to use when you create
+a policy for your STM device, by specifying it in the policy name:
+
+# mkdir /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/
+
+In other words, the policy name format is extended like this:
+
+ <device_name>:<protocol_name>.<policy_name>
+
+With Intel TH, it can therefore look like "0-sth:p_sys-t.my-policy".
+
+If the protocol name is omitted, the STM class will choose whichever
+protocol driver was loaded first.
+
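To make the extended name format concrete, the following sketch splits a policy directory name into its three parts. It assumes a POSIX shell, that the protocol name is present, and that the protocol name itself contains no '.' character; `split_policy_name` is a hypothetical helper, not an existing tool:

```shell
# split_policy_name NAME
# Split "<device_name>:<protocol_name>.<policy_name>" into its parts.
# Hypothetical helper; assumes the ":<protocol_name>" part is present.
split_policy_name() {
    name=$1
    device=${name%%:*}     # everything before the first ':'
    rest=${name#*:}        # everything after the first ':'
    protocol=${rest%%.*}   # up to the first '.' following the ':'
    policy=${rest#*.}      # the remainder
    printf 'device=%s protocol=%s policy=%s\n' "$device" "$protocol" "$policy"
}

split_policy_name 'dummy_stm.0:p_sys-t.my-policy'
# prints: device=dummy_stm.0 protocol=p_sys-t policy=my-policy
```

Splitting at the ':' first matters because the device name may itself contain a '.', as "dummy_stm.0" does.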
+You can also double check that everything is working as expected by
+
+# cat /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/protocol
+p_sys-t
+
+Now, with the MIPI SyS-T protocol driver, each policy node in the
+configfs gets a few additional attributes, which determine per-source
+parameters specific to the protocol:
+
+# mkdir /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/default
+# ls /config/stp-policy/dummy_stm.0:p_sys-t.my-policy/default
+channels
+clocksync_interval
+do_len
+masters
+ts_interval
+uuid
+
+The most important one here is the "uuid", which determines the UUID
+that will be used to tag all data coming from this source. It is
+automatically generated when a new node is created, but it is likely
+that you would want to change it.
+
+do_len switches on/off the additional "payload length" field in the
+MIPI SyS-T message header. It is off by default as the STP already
+marks message boundaries.
+
+ts_interval and clocksync_interval determine how much time in milliseconds
+can pass before we need to include a protocol (not transport, aka STP)
+timestamp in a message header or send a CLOCKSYNC packet, respectively.
+
+See Documentation/ABI/testing/configfs-stp-policy-p_sys-t for more
+details.
+
+* [1] https://www.mipi.org/specifications/sys-t
diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
index d082281..98cde99 100644
--- a/Documentation/trace/uprobetracer.rst
+++ b/Documentation/trace/uprobetracer.rst
@@ -18,6 +18,10 @@
However unlike kprobe-event tracer, the uprobe event interface expects the
user to calculate the offset of the probepoint in the object.
+You can also use /sys/kernel/debug/tracing/dynamic_events instead of
+uprobe_events. That interface will provide unified access to other
+dynamic events too.
+
Synopsis of uprobe_tracer
-------------------------
::
@@ -38,16 +42,19 @@
@+OFFSET : Fetch memory at OFFSET (OFFSET from same file as PATH)
$stackN : Fetch Nth entry of stack (N >= 0)
$stack : Fetch stack address.
- $retval : Fetch return value.(*)
+ $retval : Fetch return value.(\*1)
$comm : Fetch current task comm.
- +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+ +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
+ \IMM : Store an immediate value to the argument.
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
(u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
(x8/x16/x32/x64), "string" and bitfield are supported.
- (*) only for return probe.
- (**) this is useful for fetching a field of data structures.
+ (\*1) only for return probe.
+ (\*2) this is useful for fetching a field of data structures.
+ (\*3) Unlike kprobe event, "u" prefix will just be ignored, because uprobe
+ events can access only user-space memory.
Types
-----
@@ -69,10 +76,9 @@
Event Profiling
---------------
-You can check the total number of probe hits and probe miss-hits via
-/sys/kernel/debug/tracing/uprobe_profile.
-The first column is event name, the second is the number of probe hits,
-the third is the number of probe miss-hits.
+You can check the total number of probe hits per event via
+/sys/kernel/debug/tracing/uprobe_profile. The first column is the filename,
+the second is the event name, the third is the number of probe hits.
Usage examples
--------------
@@ -149,10 +155,15 @@
# echo 1 > events/uprobes/enable
-Lets disable the event after sleeping for some time.
+Let's start tracing, sleep for some time and stop tracing.
::
+ # echo 1 > tracing_on
# sleep 20
+ # echo 0 > tracing_on
+
+Also, you can disable the event by::
+
# echo 0 > events/uprobes/enable
And you can see the traced information via /sys/kernel/debug/tracing/trace.