feat(runtime): map only each CPU's own stack

This commit changes how the stack area is mapped. Instead of
flat-mapping the whole stack area on every CPU, each CPU now maps only
its own stack, in the High VA region.

For this to work, the following changes are made:
 - Move the mapping of the High VA region out of the realm lib and into
   the xlat library.
 - Move the build-time config option RMM_NUM_PAGES_PER_STACK to the xlat
   library.
 - Add a new mm region to the High VA space for the stack of the
   current CPU.
 - Reorder the segments in the linker script so that the `.percpu`
   segment is last, and change the flat-mapping configuration so that
   the `.percpu` segment is no longer flat-mapped.
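
The per-CPU stack mapping described above can be sketched roughly as
follows. This is an illustration only, not the code in this commit: the
names `GRANULE_SIZE`, `struct stack_mapping` and `cpu_stack_mapping()`
are hypothetical, and the sketch assumes that the `.percpu` segment
holds one contiguous stack per CPU and that every CPU maps its stack at
the same high VA.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from the build config. */
#define GRANULE_SIZE            4096UL
#define RMM_NUM_PAGES_PER_STACK 5UL
#define CPU_STACK_SIZE          (RMM_NUM_PAGES_PER_STACK * GRANULE_SIZE)

/* Hypothetical descriptor for one mapping: `size` bytes from `pa` to `va`. */
struct stack_mapping {
	uintptr_t pa;
	uintptr_t va;
	size_t size;
};

/*
 * Select the slice of the per-CPU stack area that belongs to `cpu_id`
 * and describe its mapping at `high_stack_va`. Stacks of other CPUs
 * are not part of the returned region.
 */
static inline struct stack_mapping cpu_stack_mapping(uintptr_t percpu_base_pa,
						     uintptr_t high_stack_va,
						     unsigned int cpu_id)
{
	struct stack_mapping m;

	m.pa = percpu_base_pa + ((uintptr_t)cpu_id * CPU_STACK_SIZE);
	m.va = high_stack_va;
	m.size = CPU_STACK_SIZE;

	return m;
}
```

Under these assumptions each CPU's translation context contains a
single stack region of RMM_NUM_PAGES_PER_STACK pages, so an overflow
faults instead of writing into a neighbouring CPU's stack.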

Change-Id: I997c1b5ff2b972807d75692b5efc9075c5a29a30
Signed-off-by: Mate Toth-Pal <mate.toth-pal@arm.com>
diff --git a/docs/getting_started/build-options.rst b/docs/getting_started/build-options.rst
index 6667fe8..b863cd5 100644
--- a/docs/getting_started/build-options.rst
+++ b/docs/getting_started/build-options.rst
@@ -246,7 +246,7 @@
    RMM_UART_ADDR		,			,0x0			,"Base addr of UART to be used for RMM logs"
    PLAT_CMN_CTX_MAX_XLAT_TABLES ,			,0			,"Maximum number of translation tables used by the runtime context"
    PLAT_CMN_EXTRA_MMAP_REGIONS	,			,0			,"Extra platform mmap regions that need to be mapped in S1 xlat tables"
-   RMM_NUM_PAGES_PER_STACK	,			,3			,"Number of pages to use per CPU stack"
+   RMM_NUM_PAGES_PER_STACK	,			,5			,"Number of pages to use per CPU stack"
    MBEDTLS_ECP_MAX_OPS		,248 -			,1000			,"Number of max operations per ECC signing iteration"
    RMM_FPU_USE_AT_REL2		,ON | OFF		,OFF(fake_host) ON(aarch64),"Enable FPU/SIMD usage in RMM."
    RMM_MAX_GRANULES		,			,0			,"Maximum number of memory granules available to the system"