| ################################ |
| Profiler tool and TF-M Profiling |
| ################################ |
| |
| The profiler is a tool for profiling and benchmarking programs. Developers can |
| use it to collect runtime data of interest. |
| |
| Initially, the profiler supports only count logging. You can add "checkpoints" |
| to the program. The timer count or CPU cycle count at each checkpoint is |
| saved at runtime and can be analysed later. |
| |
| ********************************* |
| TF-M Profiling Build Instructions |
| ********************************* |
| |
| TF-M has integrated some built-in profiling cases. There are two configurations |
| for profiling: |
| |
| * ``CONFIG_TFM_ENABLE_PROFILING``: Enable the profiling build in TF-M SPE and NSPE. |
| It cannot be enabled together with any regression test configs, for example ``TEST_NS``. |
| * ``TFM_TOOLS_PATH``: Path to the tf-m-tools repo. The default value is ``DOWNLOAD``, |
| which fetches the remote source. |
| |
| The section `TF-M Profiling Cases`_ introduces the profiling cases in TF-M. |
| To enable the built-in profiling cases in TF-M, run: |
| |
| .. code-block:: console |
| |
| cd <path to tf-m-tools>/profiling/profiling_cases/tfm_profiling |
| mkdir build |
| |
| # Build SPE |
| cmake -S <path to tf-m> -B build/spe -DTFM_PLATFORM=arm/mps2/an521 \ |
| -DCONFIG_TFM_ENABLE_PROFILING=ON -DCMAKE_BUILD_TYPE=Release \ |
| -DTFM_EXTRA_PARTITION_PATHS="${PWD}/../prof_psa_client_api/partitions/prof_server_partition;${PWD}/../prof_psa_client_api/partitions/prof_client_partition" \ |
| -DTFM_EXTRA_MANIFEST_LIST_FILES=${PWD}/../prof_psa_client_api/partitions/prof_psa_client_api_manifest_list.yaml \ |
| -DTFM_PARTITION_LOG_LEVEL=TFM_PARTITION_LOG_LEVEL_INFO |
| |
| # Another simple way to configure SPE: |
| cmake -S <path to tf-m> -B build/spe -DTFM_PLATFORM=arm/mps2/an521 \ |
| -DTFM_EXTRA_CONFIG_PATH=${PWD}/../prof_psa_client_api/partitions/config_spe.cmake |
| cmake --build build/spe -- install -j |
| |
| # Build NSPE |
| cmake -S . -B build/nspe -DCONFIG_SPE_PATH=${PWD}/build/spe/api_ns \ |
| -DTFM_TOOLCHAIN_FILE=build/spe/api_ns/cmake/toolchain_ns_GNUARM.cmake |
| cmake --build build/nspe -- -j |
| |
| .. Note:: |
| |
| The TF-M profiling implementation relies on physical CPU cycle counts provided by a |
| hardware timer (refer to `Implement the HAL`_). It may not be supported on virtual |
| platforms or emulators. |
| |
| ****************************** |
| Profiler Integration Reference |
| ****************************** |
| |
| `profiler/profiler.c` is the main source file to be compiled with the target program. |
| |
| Initialization |
| ============== |
| |
| ``PROFILING_INIT()`` defined in `profiling/export/prof_intf_s.h` shall be called |
| on the secure side before calling any other API of the profiler. It initializes the |
| HAL and the backend database which can be customized by users. |
| |
| Implement the HAL |
| ----------------- |
| |
| `export/prof_hal.h` defines the HAL that should be implemented by the platform. |
| |
| * ``prof_hal_init()``: Initialize the counter hardware. |
| |
| * ``prof_hal_get_count()``: Get current counter value. |
| |
| Users shall implement platform-specific hardware support in ``prof_hal_init()`` |
| and ``prof_hal_get_count()`` under `export/platform`. |
| |
| For example, `export/platform/tfm_hal_dwt_prof.c` uses the Data Watchpoint |
| and Trace (DWT) unit to count CPU cycles, which can serve as a reference for |
| performance measurement. |
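| |
| As a rough host-side illustration of the two HAL hooks, the sketch below backs |
| them with a software counter instead of a hardware unit. The function signatures |
| (no parameters, a 32-bit return value) and the ``fake_cycles`` variable are |
| assumptions made for this example and are not copied from `export/prof_hal.h`. |

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware cycle counter (assumption). */
static uint32_t fake_cycles;

/* Assumed signature: reset the counter before profiling starts. */
void prof_hal_init(void)
{
    fake_cycles = 0u;
}

/* Assumed signature: return the current counter value.
 * Each read advances the fake counter by one "cycle", so successive
 * reads increase until the 32-bit value wraps, like a free-running timer. */
uint32_t prof_hal_get_count(void)
{
    return fake_cycles++;
}
```

| A real port would replace the body of each function with accesses to the |
| platform's timer or cycle counter registers. |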
| |
| Setup Database |
| -------------- |
| |
| The size of the database is determined by ``PROF_DB_MAX`` defined in |
| `export/prof_common.h`. |
| |
| The developer can override the size by redefining ``PROF_DB_MAX``. |
| |
| Add Checkpoints |
| =============== |
| |
| The developer should identify the places in the source code where checkpoints |
| are needed. At each checkpoint, the timer count or CPU cycle count is saved into the |
| database. The interface APIs for the secure side are defined in `export/prof_intf_s.h`. |
| |
| Checkpoints can also be added on the non-secure side. |
| Add `export/ns/prof_intf_ns.c` to the source file list of the non-secure side. |
| The interface APIs for the non-secure side are defined in `export/ns/prof_intf_ns.h`. |
| |
| The counter logging related APIs are defined in macros to keep the interface |
| consistent between the secure and non-secure sides. |
| |
| Users can call the macro ``PROF_TIMING_LOG()`` to log the counter value. |
| |
| .. code-block:: c |
| |
| PROF_TIMING_LOG(topic_id, cp_id); |
| |
| +------------+--------------------------------------------------------------+ |
| | Parameters | Description | |
| +============+==============================================================+ |
| | topic_id | Topic is used to gather a group of checkpoints. | |
| | | It's useful when you have many checkpoints for different | |
| | | purposes. Topic can help to organize them and filter the | |
| | | related information out. It's an 8-bit unsigned value. | |
| +------------+--------------------------------------------------------------+ |
| | cp_id | Checkpoint ID. Different topics can have same cp_id. | |
| | | It's a 16-bit unsigned value. | |
| +------------+--------------------------------------------------------------+ |
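| |
| Conceptually, each logged checkpoint pairs an identifying tag with the counter |
| value captured at that moment. The following is a minimal model of that idea, |
| not the profiler's implementation: ``EXAMPLE_DB_MAX``, ``example_make_tag()``, |
| ``example_log()`` and the bit layout of the tag (topic in bits 16..23, checkpoint |
| ID in bits 0..15) are all hypothetical names and choices made for illustration. |

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; overridable at compile time (assumption). */
#ifndef EXAMPLE_DB_MAX
#define EXAMPLE_DB_MAX 8
#endif

struct example_record {
    uint32_t tag;    /* packed topic_id and cp_id */
    uint32_t count;  /* counter value captured at the checkpoint */
};

static struct example_record example_db[EXAMPLE_DB_MAX];
static size_t example_db_used;

/* Hypothetical layout: topic_id in bits 16..23, cp_id in bits 0..15. */
uint32_t example_make_tag(uint8_t topic_id, uint16_t cp_id)
{
    return ((uint32_t)topic_id << 16) | (uint32_t)cp_id;
}

/* Store one checkpoint; returns false when the database is full. */
bool example_log(uint8_t topic_id, uint16_t cp_id, uint32_t count)
{
    if (example_db_used >= EXAMPLE_DB_MAX) {
        return false;
    }
    example_db[example_db_used].tag = example_make_tag(topic_id, cp_id);
    example_db[example_db_used].count = count;
    example_db_used++;
    return true;
}
```

| Because an 8-bit topic and a 16-bit checkpoint ID occupy separate bit fields in |
| this model, two topics can reuse the same ``cp_id`` without colliding. |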
| |
| Collect Data |
| ============ |
| |
| After the program has run successfully, the data is saved in the database. |
| The developer can dump the data through the interface defined in the header |
| files mentioned above. |
| |
| For the same consistency reason as counter logging, the same macros serve as |
| the interfaces on both the secure and non-secure sides. |
| |
| The data fetching interfaces work as a stream. ``PROF_FETCH_DATA_START`` and |
| ``PROF_FETCH_DATA_BY_TOPIC_START`` search for data that matches the given pattern, |
| starting from the beginning of the database. ``PROF_FETCH_DATA_CONTINUE`` and |
| ``PROF_FETCH_DATA_BY_TOPIC_CONTINUE`` continue the search from the data set that |
| follows the previous result. |
| |
| .. Note:: |
| |
| All the APIs advance the same internal search index. Be careful when mixing them |
| for different checkpoints and topics at the same time. |
| |
| The match condition of a search is controlled by the tag mask. A data set matches |
| when ``(tag_value & tag_mask) == tag_pattern``. To enumerate the whole database, set |
| ``tag_mask`` and ``tag_pattern`` both to ``0``. |
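| |
| The match condition can be written as a small predicate. This is a sketch only: |
| ``example_tag_matches()`` is a hypothetical helper, and the tag values in the |
| usage note assume an illustrative layout with the topic in bits 16..23. |

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the masked tag equals the pattern. */
bool example_tag_matches(uint32_t tag, uint32_t tag_pattern, uint32_t tag_mask)
{
    /* The parentheses are required: in C, == binds tighter than &. */
    return (tag & tag_mask) == tag_pattern;
}
```

| Under that assumed layout, a mask of ``0x00FF0000`` with a pattern of |
| ``0x00010000`` selects every entry of topic 1, while a mask and pattern of ``0`` |
| match any tag, because ``tag & 0`` is always ``0``. |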
| |
| * ``PROF_FETCH_DATA_XXX``: The generic interface for getting data. |
| * ``PROF_FETCH_DATA_BY_TOPIC_XXX``: Get data for a specific ``topic``. |
| |
| The APIs return ``false`` if no matching data is found before the end of the database is reached. |
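| |
| The stream behaviour described above can be modelled with a single shared search |
| index. The sketch below is an illustration under stated assumptions, not the |
| profiler's code: the ``example_*`` names, the entry structure and the sample |
| database contents are all hypothetical. |

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct example_entry {
    uint32_t tag;
    uint32_t count;
};

/* Hypothetical database contents used only for this illustration. */
static const struct example_entry example_entries[] = {
    { 0x00010001u, 100u },
    { 0x00020001u, 150u },
    { 0x00010002u, 180u },
};
#define EXAMPLE_ENTRY_NUM (sizeof(example_entries) / sizeof(example_entries[0]))

/* One search position shared by every fetch call. */
static size_t example_search_idx;

/* Continue from the entry after the previous result. */
bool example_fetch_continue(uint32_t *tag, uint32_t *count,
                            uint32_t tag_pattern, uint32_t tag_mask)
{
    while (example_search_idx < EXAMPLE_ENTRY_NUM) {
        const struct example_entry *e = &example_entries[example_search_idx++];

        if ((e->tag & tag_mask) == tag_pattern) {
            *tag = e->tag;
            *count = e->count;
            return true;
        }
    }
    return false; /* reached the end of the database without a match */
}

/* Restart the search from the beginning of the database. */
bool example_fetch_start(uint32_t *tag, uint32_t *count,
                         uint32_t tag_pattern, uint32_t tag_mask)
{
    example_search_idx = 0;
    return example_fetch_continue(tag, count, tag_pattern, tag_mask);
}
```

| Since both functions move the same index, interleaving two searches with |
| different patterns would make each one skip entries consumed by the other, |
| which is the hazard the note above warns about. |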
| |
| Calibration |
| =========== |
| |
| The profiler itself costs some ticks or cycles. To get more accurate data, a |
| calibration system is provided. Its use is optional. |
| |
| The counter logging APIs can be called from the secure or non-secure side, and the |
| cost of calling functions differs between these two worlds. The secure and |
| non-secure sides therefore have different calibration data. |
| |
| The system performance may fluctuate during initialization, for example when the |
| CPU frequency changes or the cache is enabled. It is therefore recommended to run the |
| calibration just before the first checkpoint. |
| |
| * ``PROF_DO_CALIBRATE``: Call this macro to get the calibration value. The more ``rounds``, |
| the more accurate the result. |
| * ``PROF_GET_CALI_VALUE_FROM_TAG``: Get the calibration value from the tag. |
| The calibrated counter is ``current_counter - previous_counter - current_cali_value``. |
| Here ``current_cali_value`` equals ``PROF_GET_CALI_VALUE_FROM_TAG(current_tag)``. |
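| |
| One plausible way to picture calibration is to time back-to-back counter reads |
| over several rounds, average the cost, and subtract it from each measured |
| interval. The sketch below follows that idea under stated assumptions; every |
| ``example_*`` name, including the fake counter that charges 3 "cycles" per read, |
| is hypothetical and does not describe how ``PROF_DO_CALIBRATE`` is implemented. |

```c
#include <stdint.h>

/* Hypothetical counter source, e.g. a wrapper around the platform HAL. */
typedef uint32_t (*example_counter_fn)(void);

/* Fake counter for host experiments: every read costs 3 "cycles". */
static uint32_t example_fake_now;

uint32_t example_fake_counter(void)
{
    example_fake_now += 3u;
    return example_fake_now;
}

/* Average cost of one measurement, taken over `rounds` back-to-back reads. */
uint32_t example_calibrate(example_counter_fn get_count, uint32_t rounds)
{
    uint64_t total = 0;

    if (rounds == 0u) {
        return 0u; /* nothing measured */
    }
    for (uint32_t i = 0; i < rounds; i++) {
        uint32_t start = get_count();
        uint32_t end = get_count();

        total += (uint32_t)(end - start); /* unsigned math tolerates wrap-around */
    }
    return (uint32_t)(total / rounds);
}

/* calibrated = current_counter - previous_counter - current_cali_value */
uint32_t example_calibrated_diff(uint32_t current_counter,
                                 uint32_t previous_counter,
                                 uint32_t current_cali_value)
{
    return current_counter - previous_counter - current_cali_value;
}
```

| Averaging over more rounds smooths out one-off disturbances such as a cache miss |
| on the first read, which is why a larger ``rounds`` value gives a steadier estimate. |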
| |
| Data Analysis |
| ============= |
| |
| The data analysis interfaces provide some basic analysis. The data they |
| return is already calibrated. |
| |
| ``PROF_DATA_DIFF``: Get the counter value difference for the two tags. A return |
| value of ``0`` indicates an error. |
| |
| If the checkpoints are logged multiple times, you can get the following counter |
| value differences between two tags: |
| |
| * ``PROF_DATA_DIFF_MIN``: Get the minimum counter value difference for the two tags. |
| A return value of ``UINT32_MAX`` indicates an error. |
| * ``PROF_DATA_DIFF_MAX``: Get the maximum counter value difference for the two tags. |
| A return value of ``0`` indicates an error. |
| * ``PROF_DATA_DIFF_AVG``: Get the average counter value difference for the two tags. |
| A return value of ``0`` indicates an error. |
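| |
| To make the three statistics concrete, the sketch below computes them over |
| arrays of start and end counter values from repeated runs. It is a host-side |
| illustration with hypothetical ``example_diff_*`` helpers, not the profiler's |
| macros; it only mirrors the error values listed above (``UINT32_MAX`` for the |
| minimum, ``0`` for the maximum and average) when no samples are available. |

```c
#include <stddef.h>
#include <stdint.h>

/* Minimum difference; UINT32_MAX signals that no sample was found. */
uint32_t example_diff_min(const uint32_t *start, const uint32_t *end, size_t n)
{
    uint32_t min = UINT32_MAX;

    for (size_t i = 0; i < n; i++) {
        uint32_t d = end[i] - start[i];

        if (d < min) {
            min = d;
        }
    }
    return min;
}

/* Maximum difference; 0 signals that no sample was found. */
uint32_t example_diff_max(const uint32_t *start, const uint32_t *end, size_t n)
{
    uint32_t max = 0u;

    for (size_t i = 0; i < n; i++) {
        uint32_t d = end[i] - start[i];

        if (d > max) {
            max = d;
        }
    }
    return max;
}

/* Average difference; 0 signals that no sample was found. */
uint32_t example_diff_avg(const uint32_t *start, const uint32_t *end, size_t n)
{
    uint64_t sum = 0;

    if (n == 0u) {
        return 0u;
    }
    for (size_t i = 0; i < n; i++) {
        sum += (uint32_t)(end[i] - start[i]);
    }
    return (uint32_t)(sum / n);
}
```

| Comparing the minimum and maximum against the average is a quick way to spot |
| outliers, for example a first iteration that ran with a cold cache. |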
| |
| Custom software or tools can be used to generate an analysis report based |
| on the data. |
| |
| Profiler Self-test |
| ================== |
| |
| `profiler_self_test` is a quick test for all the interfaces above. To build and run |
| it on Linux: |
| |
| .. code-block:: console |
| |
| cd profiler_self_test |
| mkdir build && cd build |
| cmake .. && make |
| ./prof_self_test |
| |
| ******************** |
| TF-M Profiling Cases |
| ******************** |
| |
| The profiler tool has already been integrated into TF-M to analyze the program |
| performance with the built-in profiling cases. Users can also add a new |
| profiling case to get a specific profiling report. TF-M profiling provides |
| example profiling cases in `profiling_cases`. |
| |
| PSA Client API Profiling |
| ======================== |
| |
| This profiling case analyzes the performance of PSA Client APIs called from SPE |
| and NSPE, including ``psa_connect()``, ``psa_call()``, ``psa_close()`` and stateless ``psa_call()``. |
| The main structure is: |
| |
| :: |
| |
| prof_psa_client_api/ |
| ├── cases |
| │ ├── non_secure |
| │ └── secure |
| └── partitions |
| ├── prof_server_partition |
| └── prof_client_partition |
| |
| * The `cases` folder contains the basic SPE and NSPE profiling log and analysis code. |
| * NSPE can use the `prof_log` library to print the analysis result. |
| * `prof_server_partition` is a dummy secure partition. It immediately returns |
| once it receives a PSA client call from a client. |
| * `prof_client_partition` is the SPE profiling entry to trigger the secure profiling. |
| |
| To make this profiling report more accurate, it is recommended to disable other |
| partitions and all irrelevant tests. |
| |
| Adding New TF-M Profiling Case |
| ============================== |
| |
| Users can add a source folder `<prof_example>` under the path `profiling_cases` to |
| customize performance analysis of target processes, such as the APIs of secure |
| partitions, the functions in the SPM, or the user's interfaces. The |
| integration requires these steps: |
| |
| 1. Identify the target code block for which to create profiling cases. |
| 2. Enable or create the server partition if necessary. Note that the other |
| irrelevant partitions shall be disabled. |
| 3. Find ways to output profiling data. |
| 4. Trigger profiling cases in SPE or NSPE. |
| |
| a. For SPE, a secure client partition can be created to trigger the secure profiling. |
| b. For NSPE, the profiling case entry can be added to the ``tfm_ns`` target under the `tfm_profiling` folder. |
| |
| .. Note:: |
| |
| If the profiling case requires extra out-of-tree secure partition build, the |
| paths of extra partitions and manifest list file shall be appended in |
| ``TFM_EXTRA_PARTITION_PATHS`` and ``TFM_EXTRA_MANIFEST_LIST_FILES``. Refer to |
| :doc:`Adding Secure Partition<TF-M:integration_guide/services/tfm_secure_partition_addition>`. |
| |
| -------------- |
| |
| *Copyright (c) 2022-2023, Arm Limited. All rights reserved.* |