Update Linux to v5.4.2
Change-Id: Idf6911045d9d382da2cfe01b1edff026404ac8fd
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 22b4b00..99ca040 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -45,6 +45,7 @@
3.9 /proc/<pid>/map_files - Information about memory mapped files
3.10 /proc/<pid>/timerslack_ns - Task timerslack value
3.11 /proc/<pid>/patch_state - Livepatch patch operation state
+ 3.12 /proc/<pid>/arch_status - Task architecture specific information
4 Configuring procfs
4.1 Mount options
@@ -125,6 +126,13 @@
The link self points to the process reading the file system. Each process
subdirectory has the entries listed in Table 1-1.
+Note that an open a file descriptor to /proc/<pid> or to any of its
+contained files or subdirectories does not prevent <pid> being reused
+for some other process in the event that <pid> exits. Operations on
+open /proc/<pid> file descriptors corresponding to dead processes
+never act on any new process that the kernel may, through chance, have
+also assigned the process ID <pid>. Instead, operations on these FDs
+usually fail with ESRCH.
Table 1-1: Process specific entries in /proc
..............................................................................
@@ -146,9 +154,11 @@
symbol the task is blocked in - or "0" if not blocked.
pagemap Page table
stack Report full stack trace, enable via CONFIG_STACKTRACE
- smaps an extension based on maps, showing the memory consumption of
+ smaps An extension based on maps, showing the memory consumption of
each mapping and flags associated with it
- numa_maps an extension based on maps, showing the memory locality and
+ smaps_rollup Accumulated smaps stats for all mappings of the process. This
+ can be derived from smaps, but is faster and more convenient
+ numa_maps An extension based on maps, showing the memory locality and
binding policy as well as mem usage (in pages) of each mapping.
..............................................................................
@@ -182,6 +192,7 @@
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
+ THP_enabled: 1
Threads: 1
SigQ: 0/28578
SigPnd: 0000000000000000
@@ -193,8 +204,10 @@
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
+ CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
+ Speculation_Store_Bypass: thread vulnerable
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
@@ -214,7 +227,7 @@
snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
It's slow but very precise.
-Table 1-2: Contents of the status files (as of 4.8)
+Table 1-2: Contents of the status files (as of 4.19)
..............................................................................
Field Content
Name filename of the executable
@@ -256,6 +269,8 @@
HugetlbPages size of hugetlb memory portions
CoreDumping process's memory is currently being dumped
(killing the process may lead to a corrupted core)
+ THP_enabled process is allowed to use THP (returns 0 when
+ PR_SET_THP_DISABLE is set on the process
Threads number of threads
SigQ number of signals queued/max. number for queue
SigPnd bitmap of pending signals for the thread
@@ -267,8 +282,10 @@
CapPrm bitmap of permitted capabilities
CapEff bitmap of effective capabilities
CapBnd bitmap of capabilities bounding set
+ CapAmb bitmap of ambient capabilities
NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...)
+ Speculation_Store_Bypass speculative store bypass mitigation status
Cpus_allowed mask of CPUs on which this process may run
Cpus_allowed_list Same as previous, but in "list format"
Mems_allowed mask of memory nodes allowed to this process
@@ -351,7 +368,7 @@
exit_code the thread's exit_code in the form reported by the waitpid system call
..............................................................................
-The /proc/PID/maps file containing the currently mapped memory regions and
+The /proc/PID/maps file contains the currently mapped memory regions and
their access permissions.
The format is:
@@ -402,11 +419,14 @@
or if empty, the mapping is anonymous.
The /proc/PID/smaps is an extension based on maps, showing the memory
-consumption for each of the process's mappings. For each of mappings there
-is a series of lines such as the following:
+consumption for each of the process's mappings. For each mapping (aka Virtual
+Memory Area, or VMA) there is a series of lines such as the following:
08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash
+
Size: 1084 kB
+KernelPageSize: 4 kB
+MMUPageSize: 4 kB
Rss: 892 kB
Pss: 374 kB
Shared_Clean: 892 kB
@@ -425,13 +445,17 @@
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
+THPeligible: 0
VmFlags: rd ex mr mw me dw
-the first of these lines shows the same information as is displayed for the
-mapping in /proc/PID/maps. The remaining lines show the size of the mapping
-(size), the amount of the mapping that is currently resident in RAM (RSS), the
-process' proportional share of this mapping (PSS), the number of clean and
-dirty private pages in the mapping.
+The first of these lines shows the same information as is displayed for the
+mapping in /proc/PID/maps. Following lines show the size of the mapping
+(size); the size of each page allocated when backing a VMA (KernelPageSize),
+which is usually the same as the size in the page table entries; the page size
+used by the MMU when backing a VMA (in most cases, the same as KernelPageSize);
+the amount of the mapping that is currently resident in RAM (RSS); the
+process' proportional share of this mapping (PSS); and the number of clean and
+dirty shared and private pages in the mapping.
The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing it.
@@ -462,6 +486,8 @@
"SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
does not take into account swapped out page of underlying shmem objects.
"Locked" indicates whether the mapping is locked in memory or not.
+"THPeligible" indicates whether the mapping is eligible for allocating THP
+pages - 1 if true, 0 otherwise. It just shows the current status.
"VmFlags" field deserves a separate description. This member represents the kernel
flags associated with the particular virtual memory area in two letter encoded
@@ -496,7 +522,9 @@
Note that there is no guarantee that every flag and associated mnemonic will
be present in all further kernel releases. Things get changed, the flags may
-be vanished or the reverse -- new added.
+be vanished or the reverse -- new added. Interpretation of their meaning
+might change in future as well. So each consumer of these flags has to
+follow each specific kernel version for the exact semantic.
This file is only present if the CONFIG_MMU kernel configuration option is
enabled.
@@ -512,6 +540,19 @@
2) If there is something at a given vaddr during the entirety of the
life of the smaps/maps walk, there will be some output for it.
+The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps,
+but their values are the sums of the corresponding values for all mappings of
+the process. Additionally, it contains these fields:
+
+Pss_Anon
+Pss_File
+Pss_Shmem
+
+They represent the proportional shares of anonymous, file, and shmem pages, as
+described for smaps above. These fields are omitted in smaps since each
+mapping identifies the type (anon, file, or shmem) of all pages it contains.
+Thus all information in smaps_rollup can be derived from smaps, but at a
+significantly higher cost.
The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
bits on both physical and virtual pages associated with a process, and the
@@ -858,6 +899,7 @@
AnonPages: 861800 kB
Mapped: 280372 kB
Shmem: 644 kB
+KReclaimable: 168048 kB
Slab: 284364 kB
SReclaimable: 159856 kB
SUnreclaim: 124508 kB
@@ -925,6 +967,9 @@
ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
with huge pages
ShmemPmdMapped: Shared memory mapped into userspace with huge pages
+KReclaimable: Kernel allocations that the kernel will attempt to reclaim
+ under memory pressure. Includes SReclaimable (below), and other
+ direct allocations with a shrinker.
Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
@@ -1455,7 +1500,7 @@
This chapter is heavily based on the documentation included in the pre 2.2
kernels, and became part of it in version 2.2.1 of the Linux kernel.
-Please see: Documentation/sysctl/ directory for descriptions of these
+Please see: Documentation/admin-guide/sysctl/ directory for descriptions of these
entries.
------------------------------------------------------------------------------
@@ -1925,6 +1970,45 @@
patched. If the patch is being disabled, then the task hasn't been
unpatched yet.
+3.12 /proc/<pid>/arch_status - task architecture specific status
+-------------------------------------------------------------------
+When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the
+architecture specific status of the task.
+
+Example
+-------
+ $ cat /proc/6753/arch_status
+ AVX512_elapsed_ms: 8
+
+Description
+-----------
+
+x86 specific entries:
+---------------------
+ AVX512_elapsed_ms:
+ ------------------
+ If AVX512 is supported on the machine, this entry shows the milliseconds
+ elapsed since the last time AVX512 usage was recorded. The recording
+ happens on a best effort basis when a task is scheduled out. This means
+ that the value depends on two factors:
+
+ 1) The time which the task spent on the CPU without being scheduled
+ out. With CPU isolation and a single runnable task this can take
+ several seconds.
+
+ 2) The time since the task was scheduled out last. Depending on the
+ reason for being scheduled out (time slice exhausted, syscall ...)
+ this can be arbitrary long time.
+
+ As a consequence the value cannot be considered precise and authoritative
+ information. The application which uses this information has to be aware
+ of the overall scenario on the system in order to determine whether a
+ task is a real AVX512 user or not. Precise information can be obtained
+ with performance counters.
+
+ A special value of '-1' indicates that no AVX512 usage was recorded, thus
+ the task is unlikely an AVX512 user, but depends on the workload and the
+ scheduling scenario, it also could be a false negative mentioned above.
------------------------------------------------------------------------------
Configuring procfs