==========================
PCIe Device AER statistics
==========================
These attributes show up under all the devices that are AER capable. These
statistical counters indicate the errors "as seen/reported by the device".
Note that if an endpoint is causing problems, the AER counters may
increment at its link partner (e.g. root port) instead, because the errors
may be "seen" / reported by the link partner and not by the problematic
endpoint itself (which may report all counters as 0 because it never saw
any problems).
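
When an endpoint reports all counters as 0, the counters of the device
directly upstream of it are worth checking too. The following is a minimal
Python sketch, assuming the usual sysfs layout in which a device's resolved
directory is nested inside the directory of the bridge or root port
upstream of it. The helper name upstream_device is hypothetical and not
part of any kernel interface.

```python
import os
import re

# Matches a PCI address such as 0000:00:1c.0 (domain:bus:device.function).
_BDF = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def upstream_device(resolved_path):
    """Return the sysfs directory of the upstream device, or None.

    resolved_path is assumed to be a fully resolved path, for example the
    result of os.path.realpath("/sys/bus/pci/devices/<dev>").
    """
    parent = os.path.dirname(resolved_path.rstrip("/"))
    # The parent is a PCI device only if its directory name is a PCI address;
    # a host bridge directory such as "pci0000:00" is not.
    if _BDF.match(os.path.basename(parent)):
        return parent
    return None
```

For a device directly below the host bridge the function returns None,
since there is no upstream PCI device whose counters could be read.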

What:           /sys/bus/pci/devices/<dev>/aer_dev_correctable
Date:           July 2018
KernelVersion:  4.19.0
Contact:        linux-pci@vger.kernel.org, rajatja@google.com
Description:    List of correctable errors seen and reported by this
                PCI device using ERR_COR. Because multiple errors may be
                reported using a single ERR_COR message, TOTAL_ERR_COR at
                the end of the file may not match the actual total of all
                the errors in the file. Sample output:
-------------------------------------------------------------------------
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
Receiver Error 2
Bad TLP 0
Bad DLLP 0
RELAY_NUM Rollover 0
Replay Timer Timeout 0
Advisory Non-Fatal 0
Corrected Internal Error 0
Header Log Overflow 0
TOTAL_ERR_COR 2
-------------------------------------------------------------------------
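
The sample output above can be turned into a name-to-count mapping with a
few lines of Python. This is only a sketch: it assumes every counter line
ends in a space followed by a decimal count, as in the sample, and the
function name parse_aer_counters is hypothetical.

```python
def parse_aer_counters(text):
    """Parse lines of the form '<error name> <count>' into a dict."""
    counters = {}
    for line in text.splitlines():
        # Error names contain spaces, so split on the last space only.
        name, _, value = line.strip().rpartition(" ")
        if name and value.isdigit():
            counters[name] = int(value)
    return counters


SAMPLE = """\
Receiver Error 2
Bad TLP 0
Bad DLLP 0
RELAY_NUM Rollover 0
Replay Timer Timeout 0
Advisory Non-Fatal 0
Corrected Internal Error 0
Header Log Overflow 0
TOTAL_ERR_COR 2
"""

counters = parse_aer_counters(SAMPLE)
# Sum the per-error counters, leaving out the TOTAL_* line.
per_error_sum = sum(v for k, v in counters.items() if not k.startswith("TOTAL_"))
print(per_error_sum, counters["TOTAL_ERR_COR"])  # prints: 2 2
```

The two numbers agree for this sample, but a consumer should not rely on
that, because one ERR_COR message may report several errors.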

What:           /sys/bus/pci/devices/<dev>/aer_dev_fatal
Date:           July 2018
KernelVersion:  4.19.0
Contact:        linux-pci@vger.kernel.org, rajatja@google.com
Description:    List of uncorrectable fatal errors seen and reported by this
                PCI device using ERR_FATAL. Because multiple errors may be
                reported using a single ERR_FATAL message, TOTAL_ERR_FATAL
                at the end of the file may not match the actual total of
                all the errors in the file. Sample output:
-------------------------------------------------------------------------
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
Undefined 0
Data Link Protocol 0
Surprise Down Error 0
Poisoned TLP 0
Flow Control Protocol 0
Completion Timeout 0
Completer Abort 0
Unexpected Completion 0
Receiver Overflow 0
Malformed TLP 0
ECRC 0
Unsupported Request 0
ACS Violation 0
Uncorrectable Internal Error 0
MC Blocked TLP 0
AtomicOp Egress Blocked 0
TLP Prefix Blocked Error 0
TOTAL_ERR_FATAL 0
-------------------------------------------------------------------------

What:           /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
Date:           July 2018
KernelVersion:  4.19.0
Contact:        linux-pci@vger.kernel.org, rajatja@google.com
Description:    List of uncorrectable nonfatal errors seen and reported by
                this PCI device using ERR_NONFATAL. Because multiple errors
                may be reported using a single ERR_NONFATAL message,
                TOTAL_ERR_NONFATAL at the end of the file may not match the
                actual total of all the errors in the file. Sample output:
-------------------------------------------------------------------------
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
Undefined 0
Data Link Protocol 0
Surprise Down Error 0
Poisoned TLP 0
Flow Control Protocol 0
Completion Timeout 0
Completer Abort 0
Unexpected Completion 0
Receiver Overflow 0
Malformed TLP 0
ECRC 0
Unsupported Request 0
ACS Violation 0
Uncorrectable Internal Error 0
MC Blocked TLP 0
AtomicOp Egress Blocked 0
TLP Prefix Blocked Error 0
TOTAL_ERR_NONFATAL 0
-------------------------------------------------------------------------
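
Since the three per-device files share one line format, a single pass can
report every counter that is above zero. The Python sketch below assumes
the file names listed in the What: entries above and the
'<error name> <count>' line format shown in the sample outputs. The
function name nonzero_aer_counters is hypothetical.

```python
import os

AER_DEV_FILES = ("aer_dev_correctable", "aer_dev_fatal", "aer_dev_nonfatal")


def nonzero_aer_counters(dev_dir):
    """Return {file name: {error name: count}} for counters above zero."""
    report = {}
    for fname in AER_DEV_FILES:
        try:
            with open(os.path.join(dev_dir, fname)) as f:
                lines = f.read().splitlines()
        except OSError:
            continue  # attribute missing or unreadable; skip it
        hits = {}
        for line in lines:
            # Error names contain spaces, so split on the last space only.
            name, _, value = line.strip().rpartition(" ")
            if name and value.isdigit() and int(value) > 0:
                hits[name] = int(value)
        if hits:
            report[fname] = hits
    return report
```

A call such as nonzero_aer_counters("/sys/bus/pci/devices/0000:00:1c.0")
returns an empty dict when the device has logged no errors or exposes
none of the attributes.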

============================
PCIe Rootport AER statistics
============================
These attributes show up only under the rootports (or root complex event
collectors) that are AER capable. They indicate the number of error messages
as "reported to" the rootport. Note that a rootport also transmits
(internally) the ERR_* messages for errors seen by its own internal PCI
device, so these counters include them and are thus cumulative over all the
error messages on the PCI hierarchy originating at that root port.

What:           /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_cor
Date:           July 2018
KernelVersion:  4.19.0
Contact:        linux-pci@vger.kernel.org, rajatja@google.com
Description:    Total number of ERR_COR messages reported to rootport.

What:           /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_fatal
Date:           July 2018
KernelVersion:  4.19.0
Contact:        linux-pci@vger.kernel.org, rajatja@google.com
Description:    Total number of ERR_FATAL messages reported to rootport.

What:           /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_nonfatal
Date:           July 2018
KernelVersion:  4.19.0
Contact:        linux-pci@vger.kernel.org, rajatja@google.com
Description:    Total number of ERR_NONFATAL messages reported to rootport.
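
The three rootport totals can be read together. The Python sketch below
assumes each file holds a single decimal number, which the descriptions
above imply but do not state. The caller passes the directory that
contains the three files, so the sketch does not depend on the exact
location given in the What: lines. The function name read_rootport_totals
is hypothetical.

```python
import os

ROOTPORT_FILES = {
    "ERR_COR": "aer_rootport_total_err_cor",
    "ERR_FATAL": "aer_rootport_total_err_fatal",
    "ERR_NONFATAL": "aer_rootport_total_err_nonfatal",
}


def read_rootport_totals(stats_dir):
    """Return {message type: total}, with None for files that cannot be read."""
    totals = {}
    for msg, fname in ROOTPORT_FILES.items():
        try:
            with open(os.path.join(stats_dir, fname)) as f:
                totals[msg] = int(f.read().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a plain number.
            totals[msg] = None
    return totals
```

A value of None for all three message types suggests that the directory
does not belong to an AER-capable rootport.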