.. _gprof:

Gprof
#####
This document describes how to profile user Trusted Applications with
``gprof``.

The configuration option ``CFG_TA_GPROF_SUPPORT=y`` enables OP-TEE to collect
profiling information from Trusted Applications running in user mode and
compiled with ``-pg``. Once collected, the profiling data are formatted in the
``gmon.out`` format and sent to ``tee-supplicant`` via RPC, so they can be saved
to disk and later processed and displayed by the standard ``gprof`` tool.

Usage
*****

- Build OP-TEE OS with ``CFG_TA_GPROF_SUPPORT=y``. You may also set
  ``CFG_ULIBS_MCOUNT=y`` to instrument the user TA libraries contained in
  ``optee_os`` (such as ``libutee`` and ``libutils``).

- Build user TAs with ``-pg``, for instance by setting ``CFG_TA_MCOUNT=y`` to
  instrument the whole user TA. Note that instrumented TAs have a larger
  ``.bss`` section. The memory overhead is 1.36 times the ``.text`` size for
  32-bit TAs, and 1.77 times for 64-bit ones (refer to the TA linker script
  for details: ``ta/arch/arm/ta.ld.S``).

- Run the application normally. When the last session exits,
  ``tee-supplicant`` will write profiling data to
  ``/tmp/gmon-<ta_uuid>.out``. If the file already exists, a number is
  appended, such as: ``gmon-<ta_uuid>.1.out``.

- Run ``gprof`` on the TA ELF file and profiling output: ``gprof <ta_uuid>.elf
  gmon-<ta_uuid>.out``
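
As a rough illustration of the steps above, the following Python sketch
estimates the ``.bss`` overhead from the quoted factors and mimics the output
file naming. The function names are hypothetical, rounding to whole bytes is a
simplification, and the numbering loop is an assumption based only on the
description above, not on the ``tee-supplicant`` source.

.. code-block:: python

    # Overhead factors quoted above, relative to the size of .text.
    OVERHEAD_FACTOR = {32: 1.36, 64: 1.77}


    def gprof_bss_overhead(text_size, bits):
        """Estimate the extra .bss bytes of an instrumented TA."""
        return round(text_size * OVERHEAD_FACTOR[bits])


    def gmon_path(ta_uuid, existing, directory="/tmp"):
        """Return the first unused gmon file name (assumed numbering scheme).

        'existing' is the set of paths that are already present.
        """
        path = f"{directory}/gmon-{ta_uuid}.out"
        n = 1
        while path in existing:
            path = f"{directory}/gmon-{ta_uuid}.{n}.out"
            n += 1
        return path

For a 32-bit TA with 100 KiB of ``.text``, ``gprof_bss_overhead(100 * 1024,
32)`` gives 139264 bytes (136 KiB) of additional ``.bss``.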

Implementation
**************
Part of the profiling is implemented in ``libutee``. Another part is done in
the TEE core by a pseudo-TA (``core/arch/arm/sta/gprof.c``). Two types of data
are collected:

1. Call graph information
    - When TA source files are compiled with the ``-pg`` switch, the compiler
      inserts extra code into each function prologue to call the
      instrumentation entry point (``__gnu_mcount_nc`` or ``_mcount``
      depending on the architecture). Each time an instrumented function is
      called, ``libutee`` records a pair of program counters (one for the
      caller and one for the callee) as well as the number of times this
      specific arc of the call graph has been invoked.

2. PC distribution over time
    - When an instrumented TA starts, ``libutee`` calls the pseudo-TA to start
      PC sampling for the current session. Sampling data are written into
      the user-space buffer directly by the TEE core.

    - Whenever the TA execution is interrupted, the TEE core records the
      current program counter value and builds a histogram of program
      locations (i.e., relative amount of time spent for each value of the
      PC). This is later used by the ``gprof`` tool to derive the time spent
      in each function. The sampling rate, which is assumed to be roughly
      constant, is computed by keeping track of the time spent executing
      user TA code and dividing the number of interrupts by the total time.

    - The profiling buffer into which call graph and sampling data are
      recorded is allocated in the TA's ``.bss`` section. Some space is
      reserved by the linker script, only when the TA is instrumented.
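
The two kinds of data described above can be modelled with a short Python
sketch. This is only a toy model under stated assumptions, not the ``libutee``
or TEE core code: the class and method names are hypothetical, and the
fixed-size histogram buckets (``bucket_size``) are an assumption made for
illustration.

.. code-block:: python

    from collections import Counter


    class ProfileModel:
        """Toy model of call graph arcs and PC sampling (hypothetical names)."""

        def __init__(self, text_start, text_size, bucket_size=4):
            self.text_start = text_start
            self.bucket_size = bucket_size  # assumed histogram granularity
            self.arcs = Counter()           # (caller_pc, callee_pc) -> count
            n_buckets = (text_size + bucket_size - 1) // bucket_size
            self.histogram = [0] * n_buckets
            self.samples = 0                # number of interrupts seen
            self.user_time = 0.0            # seconds spent in user TA code

        def mcount(self, caller_pc, callee_pc):
            """Count one traversal of the arc caller -> callee."""
            self.arcs[(caller_pc, callee_pc)] += 1

        def sample(self, pc, elapsed):
            """Record the interrupted PC and the user time since the last sample."""
            self.samples += 1
            self.user_time += elapsed
            index = (pc - self.text_start) // self.bucket_size
            if 0 <= index < len(self.histogram):
                self.histogram[index] += 1

        def sampling_rate(self):
            """Number of interrupts divided by the total user time (Hz)."""
            return self.samples / self.user_time if self.user_time else 0.0


    # Example: one arc invoked twice, two samples falling in the same bucket.
    model = ProfileModel(text_start=0x1000, text_size=0x100)
    model.mcount(0x1010, 0x1040)
    model.mcount(0x1010, 0x1040)
    model.sample(0x1044, elapsed=0.01)
    model.sample(0x1046, elapsed=0.01)

In this model the arc count plays the role of the call graph data, while the
histogram and the derived sampling rate are what lets a tool such as ``gprof``
convert sample counts into time per function.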