################################
Profiler tool and TF-M Profiling
################################

The profiler is a tool for profiling and benchmarking programs. Developers can
use it to collect runtime data of interest.

Initially, the profiler supports only count logging. You can add checkpoints
to the program. The timer count or CPU cycle count at each checkpoint is
saved at runtime and can be analysed later.

*********************************
TF-M Profiling Build Instructions
*********************************

TF-M has integrated some built-in profiling cases. There are two configurations
for profiling:

* ``CONFIG_TFM_ENABLE_PROFILING``: Enable the profiling build in TF-M SPE and NSPE.
  It cannot be enabled together with any regression test configs, for example ``TEST_NS``.
* ``TFM_TOOLS_PATH``: Path of the tf-m-tools repo. The default value is ``DOWNLOAD``,
  which fetches the remote source.

The section `TF-M Profiling Cases`_ introduces the profiling cases in TF-M.
To enable the built-in profiling cases in TF-M, run:

.. code-block:: console

    cd <path to tf-m-tools>/profiling/profiling_cases/tfm_profiling
    mkdir build

    # Build SPE
    # The partition list is quoted so that the shell does not treat ';' as a command separator.
    cmake -S <path to tf-m> -B build/spe -DTFM_PLATFORM=arm/mps2/an521 \
          -DCONFIG_TFM_ENABLE_PROFILING=ON -DCMAKE_BUILD_TYPE=Release \
          "-DTFM_EXTRA_PARTITION_PATHS=${PWD}/../prof_psa_client_api/partitions/prof_server_partition;${PWD}/../prof_psa_client_api/partitions/prof_client_partition" \
          -DTFM_EXTRA_MANIFEST_LIST_FILES=${PWD}/../prof_psa_client_api/partitions/prof_psa_client_api_manifest_list.yaml \
          -DTFM_PARTITION_LOG_LEVEL=TFM_PARTITION_LOG_LEVEL_INFO

    # Another simple way to configure SPE:
    cmake -S <path to tf-m> -B build/spe -DTFM_PLATFORM=arm/mps2/an521 \
          -DTFM_EXTRA_CONFIG_PATH=${PWD}/../prof_psa_client_api/partitions/config_spe.cmake
    cmake --build build/spe -- install -j

    # Build NSPE
    cmake -S . -B build/nspe -DCONFIG_SPE_PATH=build/spe/api_ns \
          -DTFM_TOOLCHAIN_FILE=build/spe/api_ns/cmake/toolchain_ns_GNUARM.cmake
    cmake --build build/nspe -- -j

.. Note::

    TF-M profiling implementation relies on the physical CPU cycles provided by a hardware
    timer (refer to `Implement the HAL`_). It may not be supported on virtual platforms
    or emulators.

******************************
Profiler Integration Reference
******************************

`profiler/profiler.c` is the main source file to be compiled with the target program.

Initialization
==============

``PROFILING_INIT()``, defined in `profiling/export/prof_intf_s.h`, shall be called
on the secure side before calling any other API of the profiler. It initializes the
HAL and the backend database which can be customized by users.

Implement the HAL
-----------------

`export/prof_hal.h` defines the HAL that should be implemented by the platform.

* ``prof_hal_init()``: Initialize the counter hardware.

* ``prof_hal_get_count()``: Get current counter value.

Users shall implement platform-specific hardware support in ``prof_hal_init()``
and ``prof_hal_get_count()`` under `export/platform`.

For example, `export/platform/tfm_hal_dwt_prof.c` uses the Data Watchpoint
and Trace unit (DWT) to count CPU cycles, which can serve as a reference for
performance.

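The shape of the two HAL hooks can be sketched with a host-side stand-in. This
is only an illustration: the function signatures below are assumptions (check
`export/prof_hal.h` for the real prototypes), and ``fake_counter`` is a
hypothetical software counter used in place of real counter hardware.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical software counter standing in for a hardware cycle counter. */
    static uint32_t fake_counter;

    /* Assumed signature: prepare the counter before any checkpoint is logged. */
    void prof_hal_init(void)
    {
        /* On real hardware this would enable and reset the counter. */
        fake_counter = 0;
    }

    /* Assumed signature: return the current counter value. */
    uint32_t prof_hal_get_count(void)
    {
        /* Pretend that 10 cycles elapse between two consecutive reads. */
        fake_counter += 10;
        return fake_counter;
    }

A stand-in like this may be useful for exercising the profiler logic on a host
machine, but the values it returns carry no timing meaning.
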
Setup Database
--------------

The size of the database is determined by ``PROF_DB_MAX`` defined in
`export/prof_common.h`.

The developer can override the size by redefining ``PROF_DB_MAX``.

Add Checkpoints
===============

The developer should identify the places in the source code for adding the
checkpoints. The count value of the timer or CPU cycle will be saved into the
database for the checkpoints. The interface APIs for the secure side are defined
in `export/prof_intf_s.h`.

Checkpoints can also be added on the non-secure side.
Add `export/ns/prof_intf_ns.c` to the source file list of the non-secure side.
The interface APIs for the non-secure side are defined in `export/ns/prof_intf_ns.h`.

The counter logging related APIs are defined as macros to keep the interface
consistent between the secure and non-secure sides.

Users can call the macro ``PROF_TIMING_LOG()`` to log the counter value.

.. code-block:: c

    PROF_TIMING_LOG(topic_id, cp_id);

+------------+--------------------------------------------------------------+
| Parameters | Description                                                  |
+============+==============================================================+
| topic_id   | Topic is used to gather a group of checkpoints.              |
|            | It's useful when you have many checkpoints for different     |
|            | purposes. Topic can help to organize them and filter the     |
|            | related information out. It's an 8-bit unsigned value.       |
+------------+--------------------------------------------------------------+
| cp_id      | Checkpoint ID. Different topics can have the same cp_id.     |
|            | It's a 16-bit unsigned value.                                |
+------------+--------------------------------------------------------------+

Collect Data
============

After the program has run successfully, the data is saved in the database.
The developer can dump the data through the interfaces defined in the header
files mentioned above.

For the same consistency reason as counter logging, the same macros are defined as
the interfaces for both the secure and non-secure sides.

The data fetching interfaces work as a stream. ``PROF_FETCH_DATA_START`` and
``PROF_FETCH_DATA_BY_TOPIC_START`` search for data that matches the given pattern
from the beginning of the database. ``PROF_FETCH_DATA_CONTINUE`` and
``PROF_FETCH_DATA_BY_TOPIC_CONTINUE`` search from the data set following the
previous result.

.. Note::

    All the APIs advance the internal search index. Be careful when mixing them
    for different checkpoints and topics at the same time.

The match condition of a search is controlled by the tag mask: an entry matches
when ``(tag_value & tag_mask) == tag_pattern``. To enumerate the whole database, set
``tag_mask`` and ``tag_pattern`` both to ``0``.

* ``PROF_FETCH_DATA_XXX``: The generic interface for getting data.
* ``PROF_FETCH_DATA_BY_TOPIC_XXX``: Get data for a specific ``topic``.

The APIs return ``false`` if no matching data is found before the end of the database.

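The stream-style search and the mask/pattern match described above can be
modelled with a small stand-alone sketch. Everything in it is hypothetical and
is not the profiler's own code: the ``demo_`` names, the in-memory table, and
the tag layout (topic in bits 16 to 23, checkpoint ID in bits 0 to 15) are
assumptions made for illustration only.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical database entry: a tag plus the logged counter value. */
    struct demo_entry {
        uint32_t tag;
        uint32_t count;
    };

    /* Hypothetical contents: topic 1 / cp 1, topic 2 / cp 1, topic 1 / cp 2. */
    static const struct demo_entry demo_db[] = {
        { 0x00010001u, 100u },
        { 0x00020001u, 150u },
        { 0x00010002u, 180u },
    };

    /* Shared search index, advanced by every fetch call. */
    static size_t demo_index;

    /* Continue searching from the entry after the previous result. */
    bool demo_fetch_continue(uint32_t pattern, uint32_t mask,
                             struct demo_entry *out)
    {
        while (demo_index < sizeof(demo_db) / sizeof(demo_db[0])) {
            const struct demo_entry *e = &demo_db[demo_index++];

            /* Parentheses matter: '==' binds tighter than '&' in C. */
            if ((e->tag & mask) == pattern) {
                *out = *e;
                return true;
            }
        }
        return false; /* reached the end without a match */
    }

    /* Restart the search from the beginning of the database. */
    bool demo_fetch_start(uint32_t pattern, uint32_t mask,
                          struct demo_entry *out)
    {
        demo_index = 0;
        return demo_fetch_continue(pattern, mask, out);
    }

With a mask of ``0x00FF0000`` and a pattern of ``0x00010000``, this sketch
yields the two topic 1 entries in order and then returns ``false``. Because the
index is shared, interleaving two different searches would make each of them
skip entries, which is the hazard the note above warns about.
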
Calibration
===========

The profiler itself costs ticks or cycles. To get more accurate data, an
optional calibration system is provided.

The counter logging APIs can be called from the secure or non-secure side, and the
cost of calling functions from these two worlds is different. Therefore, the secure and
non-secure sides have different calibration data.

The system performance might fluctuate during initialization, for example when the
CPU frequency changes or the cache is enabled. It is therefore recommended that
calibration is done just before the first checkpoint.

* ``PROF_DO_CALIBRATE``: Call this macro to get the calibration value. The more ``rounds``,
  the more accurate the value.
* ``PROF_GET_CALI_VALUE_FROM_TAG``: Get the calibration value from the tag.
  The calibrated counter is ``current_counter - previous_counter - current_cali_value``.
  Here ``current_cali_value`` equals ``PROF_GET_CALI_VALUE_FROM_TAG(current_tag)``.

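The calibrated-counter formula above can be worked through with a short sketch.
The helper name ``demo_calibrated_diff`` is hypothetical and the 32-bit counter
width is an assumption; the arithmetic is the formula from the list above.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: elapsed count between two checkpoints with the
     * profiler's own overhead removed. Unsigned arithmetic wraps modulo 2^32,
     * so the result stays correct if the counter overflowed once in between.
     */
    uint32_t demo_calibrated_diff(uint32_t current_counter,
                                  uint32_t previous_counter,
                                  uint32_t current_cali_value)
    {
        return current_counter - previous_counter - current_cali_value;
    }

For example, counters of ``1000`` and ``1250`` with a calibration value of
``12`` give ``1250 - 1000 - 12 = 238`` counts attributed to the measured code.
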
Data Analysis
=============

Data analysis interfaces can be used to do some basic analysis. The data they
return is already calibrated.

``PROF_DATA_DIFF``: Get the counter value difference for the two tags. Returning
``0`` indicates errors.

If the checkpoints are logged multiple times, you can get the following counter
value differences between two tags:

* ``PROF_DATA_DIFF_MIN``: Get the minimum counter value difference for the two tags.
  Returning ``UINT32_MAX`` indicates errors.
* ``PROF_DATA_DIFF_MAX``: Get the maximum counter value difference for the two tags.
  Returning ``0`` indicates errors.
* ``PROF_DATA_DIFF_AVG``: Get the average counter value difference for the two tags.
  Returning ``0`` indicates errors.

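The minimum, maximum and average statistics can be illustrated with a minimal
sketch over an array of already calibrated differences. The ``demo_`` names are
hypothetical and this is not the profiler's implementation; it only shows how
the error values listed above fall out naturally when there are no samples.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical result type for the three statistics. */
    struct demo_stats {
        uint32_t min;
        uint32_t max;
        uint32_t avg;
    };

    /* Compute min, max and average over n calibrated counter differences. */
    struct demo_stats demo_diff_stats(const uint32_t *diffs, size_t n)
    {
        /* With no samples the result keeps these values: UINT32_MAX, 0, 0. */
        struct demo_stats s = { UINT32_MAX, 0u, 0u };
        uint64_t sum = 0; /* 64-bit accumulator avoids overflow while summing */

        for (size_t i = 0; i < n; i++) {
            if (diffs[i] < s.min) {
                s.min = diffs[i];
            }
            if (diffs[i] > s.max) {
                s.max = diffs[i];
            }
            sum += diffs[i];
        }
        if (n > 0) {
            s.avg = (uint32_t)(sum / n);
        }
        return s;
    }

For the differences ``238``, ``240`` and ``251``, the sketch reports a minimum
of ``238``, a maximum of ``251`` and an average of ``243``.
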
Custom software or tools can be used to generate the analysis report based
on the data.

Jianliang Shen | eba9772 | 2023-08-16 13:34:50 +0800 | [diff] [blame] | 196 | Profiler Self-test |
| 197 | ================== |
| 198 | |
| 199 | `profiler_self_test` is a quick test for all interfaces above. To build and run |
| 200 | in the Linux: |
| 201 | |
| 202 | .. code-block:: console |
| 203 | |
| 204 | cd profiler_self_test |
| 205 | mkdir build && cd build |
| 206 | cmake .. && make |
| 207 | ./prof_self_test |
| 208 | |
********************
TF-M Profiling Cases
********************

The profiler tool has already been integrated into TF-M to analyze the program
performance with the built-in profiling cases. Users can also add a new
profiling case to get a specific profiling report. TF-M profiling provides
example profiling cases in `profiling_cases`.

PSA Client API Profiling
========================

This profiling case analyzes the performance of PSA Client APIs called from SPE
and NSPE, including ``psa_connect()``, ``psa_call()``, ``psa_close()`` and stateless ``psa_call()``.
The main structure is:

::

    prof_psa_client_api/
    ├── cases
    │   ├── non_secure
    │   └── secure
    └── partitions
        ├── prof_server_partition
        └── prof_client_partition

* The `cases` folder contains the basic SPE and NSPE profiling log and analysis code.
* NSPE can use the `prof_log` library to print the analysis result.
* `prof_server_partition` is a dummy secure partition. It immediately returns
  once it receives a PSA client call from a client.
* `prof_client_partition` is the SPE profiling entry to trigger the secure profiling.

To make this profiling report more accurate, it is recommended to disable other
partitions and all irrelevant tests.

Adding New TF-M Profiling Case
==============================

Users can add a source folder `<prof_example>` under the path `profiling_cases` to
customize performance analysis of target processes, such as the APIs of secure
partitions, the functions in the SPM, or the user's interfaces. The
integration requires these steps:

1. Confirm the target process block to create profiling cases.
2. Enable or create the server partition if necessary. Note that the other
   irrelevant partitions shall be disabled.
3. Find ways to output profiling data.
4. Trigger profiling cases in SPE or NSPE.

   a. For SPE, a secure client partition can be created to trigger the secure profiling.
   b. For NSPE, the profiling case entry can be added to the 'tfm_ns' target under the `tfm_profiling` folder.

.. Note::

    If the profiling case requires an extra out-of-tree secure partition build, the
    paths of the extra partitions and manifest list file shall be appended to
    ``TFM_EXTRA_PARTITION_PATHS`` and ``TFM_EXTRA_MANIFEST_LIST_FILES``. Refer to
    :doc:`Adding Secure Partition<TF-M:integration_guide/services/tfm_secure_partition_addition>`.

--------------

*Copyright (c) 2022-2023, Arm Limited. All rights reserved.*