Update Linux to v5.4.2
Change-Id: Idf6911045d9d382da2cfe01b1edff026404ac8fd
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
deleted file mode 100644
index 02a323c..0000000
--- a/Documentation/networking/00-INDEX
+++ /dev/null
@@ -1,234 +0,0 @@
-00-INDEX
- - this file
-3c509.txt
- - information on the 3Com Etherlink III Series Ethernet cards.
-6pack.txt
- - info on the 6pack protocol, an alternative to KISS for AX.25
-LICENSE.qla3xxx
- - GPLv2 for QLogic Linux Networking HBA Driver
-LICENSE.qlge
- - GPLv2 for QLogic Linux qlge NIC Driver
-LICENSE.qlcnic
- - GPLv2 for QLogic Linux qlcnic NIC Driver
-PLIP.txt
- - PLIP: The Parallel Line Internet Protocol device driver
-README.ipw2100
- - README for the Intel PRO/Wireless 2100 driver.
-README.ipw2200
- - README for the Intel PRO/Wireless 2915ABG and 2200BG driver.
-README.sb1000
- - info on General Instrument/NextLevel SURFboard1000 cable modem.
-altera_tse.txt
- - Altera Triple-Speed Ethernet controller.
-arcnet-hardware.txt
- - tons of info on ARCnet, hubs, jumper settings for ARCnet cards, etc.
-arcnet.txt
- - info on the using the ARCnet driver itself.
-atm.txt
- - info on where to get ATM programs and support for Linux.
-ax25.txt
- - info on using AX.25 and NET/ROM code for Linux
-baycom.txt
- - info on the driver for Baycom style amateur radio modems
-bonding.txt
- - Linux Ethernet Bonding Driver HOWTO: link aggregation in Linux.
-bridge.txt
- - where to get user space programs for ethernet bridging with Linux.
-cdc_mbim.txt
- - 3G/LTE USB modem (Mobile Broadband Interface Model)
-checksum-offloads.txt
- - Explanation of checksum offloads; LCO, RCO
-cops.txt
- - info on the COPS LocalTalk Linux driver
-cs89x0.txt
- - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver
-cxacru.txt
- - Conexant AccessRunner USB ADSL Modem
-cxacru-cf.py
- - Conexant AccessRunner USB ADSL Modem configuration file parser
-cxgb.txt
- - Release Notes for the Chelsio N210 Linux device driver.
-dccp.txt
- - the Datagram Congestion Control Protocol (DCCP) (RFC 4340..42).
-dctcp.txt
- - DataCenter TCP congestion control
-de4x5.txt
- - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver
-decnet.txt
- - info on using the DECnet networking layer in Linux.
-dl2k.txt
- - README for D-Link DL2000-based Gigabit Ethernet Adapters (dl2k.ko).
-dm9000.txt
- - README for the Simtec DM9000 Network driver.
-dmfe.txt
- - info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver.
-dns_resolver.txt
- - The DNS resolver module allows kernel servies to make DNS queries.
-driver.txt
- - Softnet driver issues.
-ena.txt
- - info on Amazon's Elastic Network Adapter (ENA)
-e100.txt
- - info on Intel's EtherExpress PRO/100 line of 10/100 boards
-e1000.txt
- - info on Intel's E1000 line of gigabit ethernet boards
-e1000e.txt
- - README for the Intel Gigabit Ethernet Driver (e1000e).
-eql.txt
- - serial IP load balancing
-fib_trie.txt
- - Level Compressed Trie (LC-trie) notes: a structure for routing.
-filter.txt
- - Linux Socket Filtering
-fore200e.txt
- - FORE Systems PCA-200E/SBA-200E ATM NIC driver info.
-framerelay.txt
- - info on using Frame Relay/Data Link Connection Identifier (DLCI).
-gen_stats.txt
- - Generic networking statistics for netlink users.
-generic-hdlc.txt
- - The generic High Level Data Link Control (HDLC) layer.
-generic_netlink.txt
- - info on Generic Netlink
-gianfar.txt
- - Gianfar Ethernet Driver.
-i40e.txt
- - README for the Intel Ethernet Controller XL710 Driver (i40e).
-i40evf.txt
- - Short note on the Driver for the Intel(R) XL710 X710 Virtual Function
-ieee802154.txt
- - Linux IEEE 802.15.4 implementation, API and drivers
-igb.txt
- - README for the Intel Gigabit Ethernet Driver (igb).
-igbvf.txt
- - README for the Intel Gigabit Ethernet Driver (igbvf).
-ip-sysctl.txt
- - /proc/sys/net/ipv4/* variables
-ip_dynaddr.txt
- - IP dynamic address hack e.g. for auto-dialup links
-ipddp.txt
- - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation
-iphase.txt
- - Interphase PCI ATM (i)Chip IA Linux driver info.
-ipsec.txt
- - Note on not compressing IPSec payload and resulting failed policy check.
-ipv6.txt
- - Options to the ipv6 kernel module.
-ipvs-sysctl.txt
- - Per-inode explanation of the /proc/sys/net/ipv4/vs interface.
-irda.txt
- - where to get IrDA (infrared) utilities and info for Linux.
-ixgb.txt
- - README for the Intel 10 Gigabit Ethernet Driver (ixgb).
-ixgbe.txt
- - README for the Intel 10 Gigabit Ethernet Driver (ixgbe).
-ixgbevf.txt
- - README for the Intel Virtual Function (VF) Driver (ixgbevf).
-l2tp.txt
- - User guide to the L2TP tunnel protocol.
-lapb-module.txt
- - programming information of the LAPB module.
-ltpc.txt
- - the Apple or Farallon LocalTalk PC card driver
-mac80211-auth-assoc-deauth.txt
- - authentication and association / deauth-disassoc with max80211
-mac80211-injection.txt
- - HOWTO use packet injection with mac80211
-multiqueue.txt
- - HOWTO for multiqueue network device support.
-netconsole.txt
- - The network console module netconsole.ko: configuration and notes.
-netdev-features.txt
- - Network interface features API description.
-netdevices.txt
- - info on network device driver functions exported to the kernel.
-netif-msg.txt
- - Design of the network interface message level setting (NETIF_MSG_*).
-netlink_mmap.txt
- - memory mapped I/O with netlink
-nf_conntrack-sysctl.txt
- - list of netfilter-sysctl knobs.
-nfc.txt
- - The Linux Near Field Communication (NFS) subsystem.
-openvswitch.txt
- - Open vSwitch developer documentation.
-operstates.txt
- - Overview of network interface operational states.
-packet_mmap.txt
- - User guide to memory mapped packet socket rings (PACKET_[RT]X_RING).
-phonet.txt
- - The Phonet packet protocol used in Nokia cellular modems.
-phy.txt
- - The PHY abstraction layer.
-pktgen.txt
- - User guide to the kernel packet generator (pktgen.ko).
-policy-routing.txt
- - IP policy-based routing
-ppp_generic.txt
- - Information about the generic PPP driver.
-proc_net_tcp.txt
- - Per inode overview of the /proc/net/tcp and /proc/net/tcp6 interfaces.
-radiotap-headers.txt
- - Background on radiotap headers.
-ray_cs.txt
- - Raylink Wireless LAN card driver info.
-rds.txt
- - Background on the reliable, ordered datagram delivery method RDS.
-regulatory.txt
- - Overview of the Linux wireless regulatory infrastructure.
-rxrpc.txt
- - Guide to the RxRPC protocol.
-s2io.txt
- - Release notes for Neterion Xframe I/II 10GbE driver.
-scaling.txt
- - Explanation of network scaling techniques: RSS, RPS, RFS, aRFS, XPS.
-sctp.txt
- - Notes on the Linux kernel implementation of the SCTP protocol.
-secid.txt
- - Explanation of the secid member in flow structures.
-skfp.txt
- - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
-smc9.txt
- - the driver for SMC's 9000 series of Ethernet cards
-spider_net.txt
- - README for the Spidernet Driver (as found in PS3 / Cell BE).
-stmmac.txt
- - README for the STMicro Synopsys Ethernet driver.
-tc-actions-env-rules.txt
- - rules for traffic control (tc) actions.
-timestamping.txt
- - overview of network packet timestamping variants.
-tcp.txt
- - short blurb on how TCP output takes place.
-tcp-thin.txt
- - kernel tuning options for low rate 'thin' TCP streams.
-team.txt
- - pointer to information for ethernet teaming devices.
-tlan.txt
- - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info.
-tproxy.txt
- - Transparent proxy support user guide.
-tuntap.txt
- - TUN/TAP device driver, allowing user space Rx/Tx of packets.
-udplite.txt
- - UDP-Lite protocol (RFC 3828) introduction.
-vortex.txt
- - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards.
-vxge.txt
- - README for the Neterion X3100 PCIe Server Adapter.
-vxlan.txt
- - Virtual extensible LAN overview
-x25.txt
- - general info on X.25 development.
-x25-iface.txt
- - description of the X.25 Packet Layer to LAPB device interface.
-xfrm_device.txt
- - description of XFRM offload API
-xfrm_proc.txt
- - description of the statistics package for XFRM.
-xfrm_sync.txt
- - sync patches for XFRM enable migration of an SA between hosts.
-xfrm_sysctl.txt
- - description of the XFRM configuration options.
-z8530drv.txt
- - info about Linux driver for Z8530 based HDLC cards for AX.25
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index ff929cf..83f7ae5 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,14 +153,16 @@
Frames passed to the kernel are used for the ingress path (RX rings).
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr. E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
-UMEM Completetion Ring
-~~~~~~~~~~~~~~~~~~~~~~
+UMEM Completion Ring
+~~~~~~~~~~~~~~~~~~~~
The Completion Ring is used transfer ownership of UMEM frames from
kernel-space to user-space. Just like the Fill ring, UMEM indicies are
@@ -220,7 +222,21 @@
In order to use AF_XDP sockets there are two parts needed. The
user-space application and the XDP program. For a complete setup and
usage example, please refer to the sample application. The user-space
-side is xdpsock_user.c and the XDP side xdpsock_kern.c.
+side is xdpsock_user.c and the XDP side is part of libbpf.
+
+The XDP code sample included in tools/lib/bpf/xsk.c is the following::
+
+ SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
+ {
+ int index = ctx->rx_queue_index;
+
+ // A set entry here means that the correspnding queue_id
+ // has an active AF_XDP socket bound to it.
+ if (bpf_map_lookup_elem(&xsks_map, &index))
+ return bpf_redirect_map(&xsks_map, index, 0);
+
+ return XDP_PASS;
+ }
Naive ring dequeue and enqueue could look like this::
@@ -295,6 +311,41 @@
For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
can be displayed with "-h", as usual.
+FAQ
+=======
+
+Q: I am not seeing any traffic on the socket. What am I doing wrong?
+
+A: When a netdev of a physical NIC is initialized, Linux usually
+ allocates one Rx and Tx queue pair per core. So on a 8 core system,
+ queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
+ bind call or the xsk_socket__create libbpf function call, you
+ specify a specific queue id to bind to and it is only the traffic
+ towards that queue you are going to get on you socket. So in the
+ example above, if you bind to queue 0, you are NOT going to get any
+ traffic that is distributed to queues 1 through 7. If you are
+ lucky, you will see the traffic, but usually it will end up on one
+ of the queues you have not bound to.
+
+ There are a number of ways to solve the problem of getting the
+ traffic you want to the queue id you bound to. If you want to see
+ all the traffic, you can force the netdev to only have 1 queue, queue
+ id 0, and then bind to queue 0. You can use ethtool to do this::
+
+ sudo ethtool -L <interface> combined 1
+
+ If you want to only see part of the traffic, you can program the
+ NIC through ethtool to filter out your traffic to a single queue id
+ that you can bind your XDP socket to. Here is one example in which
+ UDP traffic to and from port 4242 are sent to queue 2::
+
+ sudo ethtool -N <interface> rx-flow-hash udp4 fn
+ sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
+ 4242 action 2
+
+ A number of other ways are possible all up to the capabilitites of
+ the NIC you have.
+
Credits
=======
@@ -309,4 +360,3 @@
- Michael S. Tsirkin
- Qi Z Zhang
- Willem de Bruijn
-
diff --git a/Documentation/networking/batman-adv.rst b/Documentation/networking/batman-adv.rst
index 245fb6c..1802094 100644
--- a/Documentation/networking/batman-adv.rst
+++ b/Documentation/networking/batman-adv.rst
@@ -27,24 +27,8 @@
$ insmod batman-adv.ko
The module is now waiting for activation. You must add some interfaces on which
-batman can operate. After loading the module batman advanced will scan your
-systems interfaces to search for compatible interfaces. Once found, it will
-create subfolders in the ``/sys`` directories of each supported interface,
-e.g.::
-
- $ ls /sys/class/net/eth0/batman_adv/
- elp_interval iface_status mesh_iface throughput_override
-
-If an interface does not have the ``batman_adv`` subfolder, it probably is not
-supported. Not supported interfaces are: loopback, non-ethernet and batman's
-own interfaces.
-
-Note: After the module was loaded it will continuously watch for new
-interfaces to verify the compatibility. There is no need to reload the module
-if you plug your USB wifi adapter into your machine after batman advanced was
-initially loaded.
-
-The batman-adv soft-interface can be created using the iproute2 tool ``ip``::
+batman-adv can operate. The batman-adv soft-interface can be created using the
+iproute2 tool ``ip``::
$ ip link add name bat0 type batadv
@@ -52,57 +36,46 @@
$ ip link set dev eth0 master bat0
-Repeat this step for all interfaces you wish to add. Now batman starts
+Repeat this step for all interfaces you wish to add. Now batman-adv starts
using/broadcasting on this/these interface(s).
-By reading the "iface_status" file you can check its status::
-
- $ cat /sys/class/net/eth0/batman_adv/iface_status
- active
-
To deactivate an interface you have to detach it from the "bat0" interface::
$ ip link set dev eth0 nomaster
+The same can also be done using the batctl interface subcommand::
-All mesh wide settings can be found in batman's own interface folder::
+ batctl -m bat0 interface create
+ batctl -m bat0 interface add -M eth0
- $ ls /sys/class/net/bat0/mesh/
- aggregated_ogms fragmentation isolation_mark routing_algo
- ap_isolation gw_bandwidth log_level vlan0
- bonding gw_mode multicast_mode
- bridge_loop_avoidance gw_sel_class network_coding
- distributed_arp_table hop_penalty orig_interval
+To detach eth0 and destroy bat0::
-There is a special folder for debugging information::
+ batctl -m bat0 interface del -M eth0
+ batctl -m bat0 interface destroy
- $ ls /sys/kernel/debug/batman_adv/bat0/
- bla_backbone_table log neighbors transtable_local
- bla_claim_table mcast_flags originators
- dat_cache nc socket
- gateways nc_nodes transtable_global
+There are additional settings for each batadv mesh interface, vlan and hardif
+which can be modified using batctl. Detailed information about this can be found
+in its manual.
-Some of the files contain all sort of status information regarding the mesh
-network. For example, you can view the table of originators (mesh
-participants) with::
+For instance, you can check the current originator interval (value
+in milliseconds which determines how often batman-adv sends its broadcast
+packets)::
- $ cat /sys/kernel/debug/batman_adv/bat0/originators
-
-Other files allow to change batman's behaviour to better fit your requirements.
-For instance, you can check the current originator interval (value in
-milliseconds which determines how often batman sends its broadcast packets)::
-
- $ cat /sys/class/net/bat0/mesh/orig_interval
+ $ batctl -M bat0 orig_interval
1000
and also change its value::
- $ echo 3000 > /sys/class/net/bat0/mesh/orig_interval
+ $ batctl -M bat0 orig_interval 3000
In very mobile scenarios, you might want to adjust the originator interval to a
lower value. This will make the mesh more responsive to topology changes, but
will also increase the overhead.
+Information about the current state can be accessed via the batadv generic
+netlink family. batctl provides human readable version via its debug tables
+subcommands.
+
Usage
=====
@@ -147,43 +120,16 @@
menuconfig" and enable the option ``B.A.T.M.A.N. debugging``
(``CONFIG_BATMAN_ADV_DEBUG=y``).
-Those additional debug messages can be accessed using a special file in
-debugfs::
+Those additional debug messages can be accessed using the perf infrastructure::
- $ cat /sys/kernel/debug/batman_adv/bat0/log
+ $ trace-cmd stream -e batadv:batadv_dbg
The additional debug output is by default disabled. It can be enabled during
-run time. Following log_levels are defined:
+run time::
-.. flat-table::
+ $ batctl -m bat0 loglevel routes tt
- * - 0
- - All debug output disabled
- * - 1
- - Enable messages related to routing / flooding / broadcasting
- * - 2
- - Enable messages related to route added / changed / deleted
- * - 4
- - Enable messages related to translation table operations
- * - 8
- - Enable messages related to bridge loop avoidance
- * - 16
- - Enable messages related to DAT, ARP snooping and parsing
- * - 32
- - Enable messages related to network coding
- * - 64
- - Enable messages related to multicast
- * - 128
- - Enable messages related to throughput meter
- * - 255
- - Enable all messages
-
-The debug output can be changed at runtime using the file
-``/sys/class/net/bat0/mesh/log_level``. e.g.::
-
- $ echo 6 > /sys/class/net/bat0/mesh/log_level
-
-will enable debug messages for when routes change.
+will enable debug messages for when routes and translation table entries change.
Counters for different types of packets entering and leaving the batman-adv
module are available through ethtool::
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index d3e5dd2..e3abfbd 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -706,9 +706,9 @@
unsolicited IPv6 Neighbor Advertisements) to be issued after a
failover event. As soon as the link is up on the new slave
(possibly immediately) a peer notification is sent on the
- bonding device and each VLAN sub-device. This is repeated at
- each link monitor interval (arp_interval or miimon, whichever
- is active) if the number is greater than 1.
+ bonding device and each VLAN sub-device. This is repeated at
+ the rate specified by peer_notif_delay if the number is
+ greater than 1.
The valid range is 0 - 255; the default value is 1. These options
affect only the active-backup mode. These options were added for
@@ -727,6 +727,16 @@
The valid range is 0 - 65535; the default value is 1. This option
has effect only in balance-rr mode.
+peer_notif_delay
+
+ Specify the delay, in milliseconds, between each peer
+ notification (gratuitous ARP and unsolicited IPv6 Neighbor
+ Advertisement) when they are issued after a failover event.
+ This delay should be a multiple of the link monitor interval
+ (arp_interval or miimon, whichever is active). The default
+ value is 0 which means to match the value of the link monitor
+ interval.
+
primary
A string (eth0, eth2, etc) specifying which slave is the
diff --git a/Documentation/networking/caif/README b/Documentation/networking/caif/caif.rst
similarity index 70%
rename from Documentation/networking/caif/README
rename to Documentation/networking/caif/caif.rst
index 757ccfa..07afc80 100644
--- a/Documentation/networking/caif/README
+++ b/Documentation/networking/caif/caif.rst
@@ -1,18 +1,31 @@
-Copyright (C) ST-Ericsson AB 2010
-Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
-License terms: GNU General Public License (GPL) version 2
----------------------------------------------------------
+:orphan:
-=== Start ===
-If you have compiled CAIF for modules do:
-
-$modprobe crc_ccitt
-$modprobe caif
-$modprobe caif_socket
-$modprobe chnl_net
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
-=== Preparing the setup with a STE modem ===
+================
+Using Linux CAIF
+================
+
+
+:Copyright: |copy| ST-Ericsson AB 2010
+
+:Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
+
+Start
+=====
+
+If you have compiled CAIF for modules do::
+
+ $modprobe crc_ccitt
+ $modprobe caif
+ $modprobe caif_socket
+ $modprobe chnl_net
+
+
+Preparing the setup with a STE modem
+====================================
If you are working on integration of CAIF you should make sure
that the kernel is built with module support.
@@ -32,24 +45,30 @@
Normally Frame Checksum is always used on UART, but this is also provided as a
module parameter "ser_use_fcs".
-$ modprobe caif_serial ser_ttyname=/dev/ttyS0 ser_use_stx=yes
-$ ifconfig caif_ttyS0 up
+::
-PLEASE NOTE: There is a limitation in Android shell.
+ $ modprobe caif_serial ser_ttyname=/dev/ttyS0 ser_use_stx=yes
+ $ ifconfig caif_ttyS0 up
+
+PLEASE NOTE:
+ There is a limitation in Android shell.
It only accepts one argument to insmod/modprobe!
-=== Trouble shooting ===
+Trouble shooting
+================
There are debugfs parameters provided for serial communication.
/sys/kernel/debug/caif_serial/<tty-name>/
* ser_state: Prints the bit-mask status where
+
- 0x02 means SENDING, this is a transient state.
- 0x10 means FLOW_OFF_SENT, i.e. the previous frame has not been sent
- and is blocking further send operation. Flow OFF has been propagated
- to all CAIF Channels using this TTY.
+ and is blocking further send operation. Flow OFF has been propagated
+ to all CAIF Channels using this TTY.
* tty_status: Prints the bit-mask tty status information
+
- 0x01 - tty->warned is on.
- 0x02 - tty->low_latency is on.
- 0x04 - tty->packed is on.
@@ -58,13 +77,17 @@
- 0x20 - tty->stopped is on.
* last_tx_msg: Binary blob Prints the last transmitted frame.
- This can be printed with
- $od --format=x1 /sys/kernel/debug/caif_serial/<tty>/last_rx_msg.
- The first two tx messages sent look like this. Note: The initial
- byte 02 is start of frame extension (STX) used for re-syncing
- upon errors.
- - Enumeration:
+ This can be printed with::
+
+ $od --format=x1 /sys/kernel/debug/caif_serial/<tty>/last_rx_msg.
+
+ The first two tx messages sent look like this. Note: The initial
+ byte 02 is start of frame extension (STX) used for re-syncing
+ upon errors.
+
+ - Enumeration::
+
0000000 02 05 00 00 03 01 d2 02
| | | | | |
STX(1) | | | |
@@ -73,7 +96,9 @@
Command:Enumeration(1)
Link-ID(1)
Checksum(2)
- - Channel Setup:
+
+ - Channel Setup::
+
0000000 02 07 00 00 00 21 a1 00 48 df
| | | | | | | |
STX(1) | | | | | |
@@ -86,13 +111,18 @@
Checksum(2)
* last_rx_msg: Prints the last transmitted frame.
- The RX messages for LinkSetup look almost identical but they have the
- bit 0x20 set in the command bit, and Channel Setup has added one byte
- before Checksum containing Channel ID.
- NOTE: Several CAIF Messages might be concatenated. The maximum debug
+
+ The RX messages for LinkSetup look almost identical but they have the
+ bit 0x20 set in the command bit, and Channel Setup has added one byte
+ before Checksum containing Channel ID.
+
+ NOTE:
+ Several CAIF Messages might be concatenated. The maximum debug
buffer size is 128 bytes.
-== Error Scenarios:
+Error Scenarios
+===============
+
- last_tx_msg contains channel setup message and last_rx_msg is empty ->
The host seems to be able to send over the UART, at least the CAIF ldisc get
notified that sending is completed.
@@ -103,7 +133,9 @@
- if /sys/kernel/debug/caif_serial/<tty>/tty_status is non-zero there
might be problems transmitting over UART.
+
E.g. host and modem wiring is not correct you will typically see
tty_status = 0x10 (hw_stopped) and ser_state = 0x10 (FLOW_OFF_SENT).
+
You will probably see the enumeration message in last_tx_message
and empty last_rx_message.
diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst
new file mode 100644
index 0000000..905c8a8
--- /dev/null
+++ b/Documentation/networking/checksum-offloads.rst
@@ -0,0 +1,143 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Checksum Offloads
+=================
+
+
+Introduction
+============
+
+This document describes a set of techniques in the Linux networking stack to
+take advantage of checksum offload capabilities of various NICs.
+
+The following technologies are described:
+
+* TX Checksum Offload
+* LCO: Local Checksum Offload
+* RCO: Remote Checksum Offload
+
+Things that should be documented here but aren't yet:
+
+* RX Checksum Offload
+* CHECKSUM_UNNECESSARY conversion
+
+
+TX Checksum Offload
+===================
+
+The interface for offloading a transmit checksum to a device is explained in
+detail in comments near the top of include/linux/skbuff.h.
+
+In brief, it allows to request the device fill in a single ones-complement
+checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset.
+The device should compute the 16-bit ones-complement checksum (i.e. the
+'IP-style' checksum) from csum_start to the end of the packet, and fill in the
+result at (csum_start + csum_offset).
+
+Because csum_offset cannot be negative, this ensures that the previous value of
+the checksum field is included in the checksum computation, thus it can be used
+to supply any needed corrections to the checksum (such as the sum of the
+pseudo-header for UDP or TCP).
+
+This interface only allows a single checksum to be offloaded. Where
+encapsulation is used, the packet may have multiple checksum fields in
+different header layers, and the rest will have to be handled by another
+mechanism such as LCO or RCO.
+
+CRC32c can also be offloaded using this interface, by means of filling
+skb->csum_start and skb->csum_offset as described above, and setting
+skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.
+
+No offloading of the IP header checksum is performed; it is always done in
+software. This is OK because when we build the IP header, we obviously have it
+in cache, so summing it isn't expensive. It's also rather short.
+
+The requirements for GSO are more complicated, because when segmenting an
+encapsulated packet both the inner and outer checksums may need to be edited or
+recomputed for each resulting segment. See the skbuff.h comment (section 'E')
+for more details.
+
+A driver declares its offload capabilities in netdev->hw_features; see
+Documentation/networking/netdev-features.txt for more. Note that a device
+which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
+csum_offset given in the SKB; if it tries to deduce these itself in hardware
+(as some NICs do) the driver should check that the values in the SKB match
+those which the hardware will deduce, and if not, fall back to checksumming in
+software instead (with skb_csum_hwoffload_help() or one of the
+skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in
+include/linux/skbuff.h).
+
+The stack should, for the most part, assume that checksum offload is supported
+by the underlying device. The only place that should check is
+validate_xmit_skb(), and the functions it calls directly or indirectly. That
+function compares the offload features requested by the SKB (which may include
+other offloads besides TX Checksum Offload) and, if they are not supported or
+enabled on the device (determined by netdev->features), performs the
+corresponding offload in software. In the case of TX Checksum Offload, that
+means calling skb_csum_hwoffload_help(skb, features).
+
+
+LCO: Local Checksum Offload
+===========================
+
+LCO is a technique for efficiently computing the outer checksum of an
+encapsulated datagram when the inner checksum is due to be offloaded.
+
+The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
+to the complement of the sum of the pseudo header, because everything else gets
+'cancelled out' by the checksum field. This is because the sum was
+complemented before being written to the checksum field.
+
+More generally, this holds in any case where the 'IP-style' ones complement
+checksum is used, and thus any checksum that TX Checksum Offload supports.
+
+That is, if we have set up TX Checksum Offload with a start/offset pair, we
+know that after the device has filled in that checksum, the ones complement sum
+from csum_start to the end of the packet will be equal to the complement of
+whatever value we put in the checksum field beforehand. This allows us to
+compute the outer checksum without looking at the payload: we simply stop
+summing when we get to csum_start, then add the complement of the 16-bit word
+at (csum_start + csum_offset).
+
+Then, when the true inner checksum is filled in (either by hardware or by
+skb_checksum_help()), the outer checksum will become correct by virtue of the
+arithmetic.
+
+LCO is performed by the stack when constructing an outer UDP header for an
+encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the
+IPv6 equivalents, in udp6_set_csum().
+
+It is also performed when constructing an IPv4 GRE header, in
+net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
+constructing an IPv6 GRE header; the GRE checksum is computed over the whole
+packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
+LCO here as IPv6 GRE still uses an IP-style checksum.
+
+All of the LCO implementations use a helper function lco_csum(), in
+include/linux/skbuff.h.
+
+LCO can safely be used for nested encapsulations; in this case, the outer
+encapsulation layer will sum over both its own header and the 'middle' header.
+This does mean that the 'middle' header will get summed multiple times, but
+there doesn't seem to be a way to avoid that without incurring bigger costs
+(e.g. in SKB bloat).
+
+
+RCO: Remote Checksum Offload
+============================
+
+RCO is a technique for eliding the inner checksum of an encapsulated datagram,
+allowing the outer checksum to be offloaded. It does, however, involve a
+change to the encapsulation protocols, which the receiver must also support.
+For this reason, it is disabled by default.
+
+RCO is detailed in the following Internet-Drafts:
+
+* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
+* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
+
+In Linux, RCO is implemented individually in each encapsulation protocol, and
+most tunnel types have flags controlling its use. For instance, VXLAN has the
+flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
+used when transmitting to a given remote destination.
diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt
deleted file mode 100644
index 27bc09c..0000000
--- a/Documentation/networking/checksum-offloads.txt
+++ /dev/null
@@ -1,122 +0,0 @@
-Checksum Offloads in the Linux Networking Stack
-
-
-Introduction
-============
-
-This document describes a set of techniques in the Linux networking stack
- to take advantage of checksum offload capabilities of various NICs.
-
-The following technologies are described:
- * TX Checksum Offload
- * LCO: Local Checksum Offload
- * RCO: Remote Checksum Offload
-
-Things that should be documented here but aren't yet:
- * RX Checksum Offload
- * CHECKSUM_UNNECESSARY conversion
-
-
-TX Checksum Offload
-===================
-
-The interface for offloading a transmit checksum to a device is explained
- in detail in comments near the top of include/linux/skbuff.h.
-In brief, it allows to request the device fill in a single ones-complement
- checksum defined by the sk_buff fields skb->csum_start and
- skb->csum_offset. The device should compute the 16-bit ones-complement
- checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
- packet, and fill in the result at (csum_start + csum_offset).
-Because csum_offset cannot be negative, this ensures that the previous
- value of the checksum field is included in the checksum computation, thus
- it can be used to supply any needed corrections to the checksum (such as
- the sum of the pseudo-header for UDP or TCP).
-This interface only allows a single checksum to be offloaded. Where
- encapsulation is used, the packet may have multiple checksum fields in
- different header layers, and the rest will have to be handled by another
- mechanism such as LCO or RCO.
-CRC32c can also be offloaded using this interface, by means of filling
- skb->csum_start and skb->csum_offset as described above, and setting
- skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.
-No offloading of the IP header checksum is performed; it is always done in
- software. This is OK because when we build the IP header, we obviously
- have it in cache, so summing it isn't expensive. It's also rather short.
-The requirements for GSO are more complicated, because when segmenting an
- encapsulated packet both the inner and outer checksums may need to be
- edited or recomputed for each resulting segment. See the skbuff.h comment
- (section 'E') for more details.
-
-A driver declares its offload capabilities in netdev->hw_features; see
- Documentation/networking/netdev-features.txt for more. Note that a device
- which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start
- and csum_offset given in the SKB; if it tries to deduce these itself in
- hardware (as some NICs do) the driver should check that the values in the
- SKB match those which the hardware will deduce, and if not, fall back to
- checksumming in software instead (with skb_csum_hwoffload_help() or one of
- the skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in
- include/linux/skbuff.h).
-
-The stack should, for the most part, assume that checksum offload is
- supported by the underlying device. The only place that should check is
- validate_xmit_skb(), and the functions it calls directly or indirectly.
- That function compares the offload features requested by the SKB (which
- may include other offloads besides TX Checksum Offload) and, if they are
- not supported or enabled on the device (determined by netdev->features),
- performs the corresponding offload in software. In the case of TX
- Checksum Offload, that means calling skb_csum_hwoffload_help(skb, features).
-
-
-LCO: Local Checksum Offload
-===========================
-
-LCO is a technique for efficiently computing the outer checksum of an
- encapsulated datagram when the inner checksum is due to be offloaded.
-The ones-complement sum of a correctly checksummed TCP or UDP packet is
- equal to the complement of the sum of the pseudo header, because everything
- else gets 'cancelled out' by the checksum field. This is because the sum was
- complemented before being written to the checksum field.
-More generally, this holds in any case where the 'IP-style' ones complement
- checksum is used, and thus any checksum that TX Checksum Offload supports.
-That is, if we have set up TX Checksum Offload with a start/offset pair, we
- know that after the device has filled in that checksum, the ones
- complement sum from csum_start to the end of the packet will be equal to
- the complement of whatever value we put in the checksum field beforehand.
- This allows us to compute the outer checksum without looking at the payload:
- we simply stop summing when we get to csum_start, then add the complement of
- the 16-bit word at (csum_start + csum_offset).
-Then, when the true inner checksum is filled in (either by hardware or by
- skb_checksum_help()), the outer checksum will become correct by virtue of
- the arithmetic.
-
-LCO is performed by the stack when constructing an outer UDP header for an
- encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for
- the IPv6 equivalents, in udp6_set_csum().
-It is also performed when constructing an IPv4 GRE header, in
- net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
- constructing an IPv6 GRE header; the GRE checksum is computed over the
- whole packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be
- possible to use LCO here as IPv6 GRE still uses an IP-style checksum.
-All of the LCO implementations use a helper function lco_csum(), in
- include/linux/skbuff.h.
-
-LCO can safely be used for nested encapsulations; in this case, the outer
- encapsulation layer will sum over both its own header and the 'middle'
- header. This does mean that the 'middle' header will get summed multiple
- times, but there doesn't seem to be a way to avoid that without incurring
- bigger costs (e.g. in SKB bloat).
-
-
-RCO: Remote Checksum Offload
-============================
-
-RCO is a technique for eliding the inner checksum of an encapsulated
- datagram, allowing the outer checksum to be offloaded. It does, however,
- involve a change to the encapsulation protocols, which the receiver must
- also support. For this reason, it is disabled by default.
-RCO is detailed in the following Internet-Drafts:
-https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
-https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
-In Linux, RCO is implemented individually in each encapsulation protocol,
- and most tunnel types have flags controlling its use. For instance, VXLAN
- has the flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that
- RCO should be used when transmitting to a given remote destination.
diff --git a/Documentation/networking/conf.py b/Documentation/networking/conf.py
deleted file mode 100644
index 40f69e6..0000000
--- a/Documentation/networking/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Networking Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'networking.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/networking/decnet.txt b/Documentation/networking/decnet.txt
index e12a490..d192f8b 100644
--- a/Documentation/networking/decnet.txt
+++ b/Documentation/networking/decnet.txt
@@ -22,8 +22,6 @@
CONFIG_DECNET_ROUTER (to be able to add/delete routes)
CONFIG_NETFILTER (will be required for the DECnet routing daemon)
- CONFIG_DECNET_ROUTE_FWMARK is optional
-
Don't turn on SIOCGIFCONF support for DECnet unless you are really sure
that you need it, in general you won't and it can cause ifconfig to
malfunction.
diff --git a/Documentation/networking/defza.txt b/Documentation/networking/defza.txt
new file mode 100644
index 0000000..663e4a9
--- /dev/null
+++ b/Documentation/networking/defza.txt
@@ -0,0 +1,57 @@
+Notes on the DEC FDDIcontroller 700 (DEFZA-xx) driver v.1.1.4.
+
+
+DEC FDDIcontroller 700 is DEC's first-generation TURBOchannel FDDI
+network card, designed in 1990 specifically for the DECstation 5000
+model 200 workstation. The board is a single attachment station and
+it was manufactured in two variations, both of which are supported.
+
+First is the SAS MMF DEFZA-AA option, the original design implementing
+the standard MMF-PMD, however with a pair of ST connectors rather than
+the usual MIC connector. The other one is the SAS ThinWire/STP DEFZA-CA
+option, denoted 700-C, with the network medium selectable by a switch
+between the DEC proprietary ThinWire-PMD using a BNC connector and the
+standard STP-PMD using a DE-9F connector. This option can interface to
+a DECconcentrator 500 device and, in the case of the STP-PMD, also other
+FDDI equipment and was designed to make it easier to transition from
+existing IEEE 802.3 10BASE2 Ethernet and IEEE 802.5 Token Ring networks
+by providing means to reuse existing cabling.
+
+This driver handles any number of cards installed in a single system.
+They get fddi0, fddi1, etc. interface names assigned in the order of
+increasing TURBOchannel slot numbers.
+
+The board only supports DMA on the receive side. Transmission involves
+the use of PIO. As a result under a heavy transmission load there will
+be a significant impact on system performance.
+
+The board supports a 64-entry CAM for matching destination addresses.
+Two entries are preoccupied by the Directed Beacon and Ring Purger
+multicast addresses and the rest is used as a multicast filter. An
+all-multi mode is also supported for LLC frames and it is used if
+requested explicitly or if the CAM overflows. The promiscuous mode
+supports separate enables for LLC and SMT frames, but this driver
+doesn't support changing them individually.
+
+
+Known problems:
+
+None.
+
+
+To do:
+
+5. MAC address change. The card does not support changing the Media
+ Access Controller's address registers but a similar effect can be
+ achieved by adding an alias to the CAM. There is no way to disable
+ matching against the original address though.
+
+7. Queueing incoming/outgoing SMT frames in the driver if the SMT
+ receive/RMC transmit ring is full. (?)
+
+8. Retrieving/reporting FDDI/SNMP stats.
+
+
+Both success and failure reports are welcome.
+
+Maciej W. Rozycki <macro@linux-mips.org>
diff --git a/Documentation/networking/3c509.txt b/Documentation/networking/device_drivers/3com/3c509.txt
similarity index 100%
rename from Documentation/networking/3c509.txt
rename to Documentation/networking/device_drivers/3com/3c509.txt
diff --git a/Documentation/networking/vortex.txt b/Documentation/networking/device_drivers/3com/vortex.txt
similarity index 99%
rename from Documentation/networking/vortex.txt
rename to Documentation/networking/device_drivers/3com/vortex.txt
index ad3dead..587f3fc 100644
--- a/Documentation/networking/vortex.txt
+++ b/Documentation/networking/device_drivers/3com/vortex.txt
@@ -1,4 +1,4 @@
-Documentation/networking/vortex.txt
+Documentation/networking/device_drivers/3com/vortex.txt
Andrew Morton
30 April 2000
diff --git a/Documentation/networking/ena.txt b/Documentation/networking/device_drivers/amazon/ena.txt
similarity index 98%
rename from Documentation/networking/ena.txt
rename to Documentation/networking/device_drivers/amazon/ena.txt
index 2b4b6f5..1bb55c7 100644
--- a/Documentation/networking/ena.txt
+++ b/Documentation/networking/device_drivers/amazon/ena.txt
@@ -73,7 +73,7 @@
AQ is used for submitting management commands, and the
results/responses are reported asynchronously through ACQ.
-ENA introduces a very small set of management commands with room for
+ENA introduces a small set of management commands with room for
vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.
@@ -202,11 +202,14 @@
The user can enable/disable adaptive moderation, modify the interrupt
delay table and restore its default values through sysfs.
+RX copybreak:
+=============
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
and can be configured by the ETHTOOL_STUNABLE command of the
SIOCETHTOOL ioctl.
SKB:
+====
The driver-allocated SKB for frames received from Rx handling using
NAPI context. The allocation method depends on the size of the packet.
If the frame length is larger than rx_copybreak, napi_get_frags()
diff --git a/Documentation/networking/device_drivers/aquantia/atlantic.txt b/Documentation/networking/device_drivers/aquantia/atlantic.txt
new file mode 100644
index 0000000..d235cba
--- /dev/null
+++ b/Documentation/networking/device_drivers/aquantia/atlantic.txt
@@ -0,0 +1,439 @@
+aQuantia AQtion Driver for the aQuantia Multi-Gigabit PCI Express Family of
+Ethernet Adapters
+=============================================================================
+
+Contents
+========
+
+- Identifying Your Adapter
+- Configuration
+- Supported ethtool options
+- Command Line Parameters
+- Config file parameters
+- Support
+- License
+
+Identifying Your Adapter
+========================
+
+The driver in this release is compatible with AQC-100, AQC-107, AQC-108 based ethernet adapters.
+
+
+SFP+ Devices (for AQC-100 based adapters)
+----------------------------------
+
+This release tested with passive Direct Attach Cables (DAC) and SFP+/LC Optical Transceiver.
+
+Configuration
+=========================
+ Viewing Link Messages
+ ---------------------
+ Link messages will not be displayed to the console if the distribution is
+ restricting system messages. In order to see network driver link messages on
+ your console, set dmesg to eight by entering the following:
+
+ dmesg -n 8
+
+ NOTE: This setting is not saved across reboots.
+
+ Jumbo Frames
+ ------------
+ The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
+ enabled by changing the MTU to a value larger than the default of 1500.
+ The maximum value for the MTU is 16000. Use the `ip` command to
+ increase the MTU size. For example:
+
+ ip link set mtu 16000 dev enp1s0
+
+ ethtool
+ -------
+ The driver utilizes the ethtool interface for driver configuration and
+ diagnostics, as well as displaying statistical information. The latest
+ ethtool version is required for this functionality.
+
+ NAPI
+ ----
+ NAPI (Rx polling mode) is supported in the atlantic driver.
+
+Supported ethtool options
+============================
+ Viewing adapter settings
+ ---------------------
+ ethtool <ethX>
+
+ Output example:
+
+ Settings for enp1s0:
+ Supported ports: [ TP ]
+ Supported link modes: 100baseT/Full
+ 1000baseT/Full
+ 10000baseT/Full
+ 2500baseT/Full
+ 5000baseT/Full
+ Supported pause frame use: Symmetric
+ Supports auto-negotiation: Yes
+ Supported FEC modes: Not reported
+ Advertised link modes: 100baseT/Full
+ 1000baseT/Full
+ 10000baseT/Full
+ 2500baseT/Full
+ 5000baseT/Full
+ Advertised pause frame use: Symmetric
+ Advertised auto-negotiation: Yes
+ Advertised FEC modes: Not reported
+ Speed: 10000Mb/s
+ Duplex: Full
+ Port: Twisted Pair
+ PHYAD: 0
+ Transceiver: internal
+ Auto-negotiation: on
+ MDI-X: Unknown
+ Supports Wake-on: g
+ Wake-on: d
+ Link detected: yes
+
+ ---
+ Note: AQrate speeds (2.5/5 Gb/s) will be displayed only with linux kernels > 4.10.
+ But you can still use these speeds:
+ ethtool -s eth0 autoneg off speed 2500
+
+ Viewing adapter information
+ ---------------------
+ ethtool -i <ethX>
+
+ Output example:
+
+ driver: atlantic
+ version: 5.2.0-050200rc5-generic-kern
+ firmware-version: 3.1.78
+ expansion-rom-version:
+ bus-info: 0000:01:00.0
+ supports-statistics: yes
+ supports-test: no
+ supports-eeprom-access: no
+ supports-register-dump: yes
+ supports-priv-flags: no
+
+
+ Viewing Ethernet adapter statistics:
+ ---------------------
+ ethtool -S <ethX>
+
+ Output example:
+ NIC statistics:
+ InPackets: 13238607
+ InUCast: 13293852
+ InMCast: 52
+ InBCast: 3
+ InErrors: 0
+ OutPackets: 23703019
+ OutUCast: 23704941
+ OutMCast: 67
+ OutBCast: 11
+ InUCastOctects: 213182760
+ OutUCastOctects: 22698443
+ InMCastOctects: 6600
+ OutMCastOctects: 8776
+ InBCastOctects: 192
+ OutBCastOctects: 704
+ InOctects: 2131839552
+ OutOctects: 226938073
+ InPacketsDma: 95532300
+ OutPacketsDma: 59503397
+ InOctetsDma: 1137102462
+ OutOctetsDma: 2394339518
+ InDroppedDma: 0
+ Queue[0] InPackets: 23567131
+ Queue[0] OutPackets: 20070028
+ Queue[0] InJumboPackets: 0
+ Queue[0] InLroPackets: 0
+ Queue[0] InErrors: 0
+ Queue[1] InPackets: 45428967
+ Queue[1] OutPackets: 11306178
+ Queue[1] InJumboPackets: 0
+ Queue[1] InLroPackets: 0
+ Queue[1] InErrors: 0
+ Queue[2] InPackets: 3187011
+ Queue[2] OutPackets: 13080381
+ Queue[2] InJumboPackets: 0
+ Queue[2] InLroPackets: 0
+ Queue[2] InErrors: 0
+ Queue[3] InPackets: 23349136
+ Queue[3] OutPackets: 15046810
+ Queue[3] InJumboPackets: 0
+ Queue[3] InLroPackets: 0
+ Queue[3] InErrors: 0
+
+ Interrupt coalescing support
+ ---------------------------------
+ ITR mode, TX/RX coalescing timings could be viewed with:
+
+ ethtool -c <ethX>
+
+ and changed with:
+
+ ethtool -C <ethX> tx-usecs <usecs> rx-usecs <usecs>
+
+ To disable coalescing:
+
+ ethtool -C <ethX> tx-usecs 0 rx-usecs 0 tx-max-frames 1 tx-max-frames 1
+
+ Wake on LAN support
+ ---------------------------------
+
+ WOL support by magic packet:
+
+ ethtool -s <ethX> wol g
+
+ To disable WOL:
+
+ ethtool -s <ethX> wol d
+
+ Set and check the driver message level
+ ---------------------------------
+
+ Set message level
+
+ ethtool -s <ethX> msglvl <level>
+
+ Level values:
+
+ 0x0001 - general driver status.
+ 0x0002 - hardware probing.
+ 0x0004 - link state.
+ 0x0008 - periodic status check.
+ 0x0010 - interface being brought down.
+ 0x0020 - interface being brought up.
+ 0x0040 - receive error.
+ 0x0080 - transmit error.
+ 0x0200 - interrupt handling.
+ 0x0400 - transmit completion.
+ 0x0800 - receive completion.
+ 0x1000 - packet contents.
+ 0x2000 - hardware status.
+ 0x4000 - Wake-on-LAN status.
+
+ By default, the level of debugging messages is set 0x0001(general driver status).
+
+ Check message level
+
+ ethtool <ethX> | grep "Current message level"
+
+ If you want to disable the output of messages
+
+ ethtool -s <ethX> msglvl 0
+
+ RX flow rules (ntuple filters)
+ ---------------------------------
+ There are separate rules supported, that applies in that order:
+ 1. 16 VLAN ID rules
+ 2. 16 L2 EtherType rules
+ 3. 8 L3/L4 5-Tuple rules
+
+
+ The driver utilizes the ethtool interface for configuring ntuple filters,
+ via "ethtool -N <device> <filter>".
+
+ To enable or disable the RX flow rules:
+
+ ethtool -K ethX ntuple <on|off>
+
+ When disabling ntuple filters, all the user programed filters are
+ flushed from the driver cache and hardware. All needed filters must
+ be re-added when ntuple is re-enabled.
+
+ Because of the fixed order of the rules, the location of filters is also fixed:
+ - Locations 0 - 15 for VLAN ID filters
+ - Locations 16 - 31 for L2 EtherType filters
+ - Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6)
+
+ The L3/L4 5-tuple (protocol, source and destination IP address, source and
+ destination TCP/UDP/SCTP port) is compared against 8 filters. For IPv4, up to
+ 8 source and destination addresses can be matched. For IPv6, up to 2 pairs of
+ addresses can be supported. Source and destination ports are only compared for
+ TCP/UDP/SCTP packets.
+
+ To add a filter that directs packet to queue 5, use <-N|-U|--config-nfc|--config-ntuple> switch:
+
+ ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 2000 dst-port 2001 action 5 <loc 32>
+
+ - action is the queue number.
+ - loc is the rule number.
+
+ For "flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6" you must set the loc
+ number within 32 - 39.
+ For "flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6" you can set 8 rules
+ for traffic IPv4 or you can set 2 rules for traffic IPv6. Loc number traffic
+ IPv6 is 32 and 36.
+ At the moment you can not use IPv4 and IPv6 filters at the same time.
+
+ Example filter for IPv6 filter traffic:
+
+ sudo ethtool -N <ethX> flow-type tcp6 src-ip 2001:db8:0:f101::1 dst-ip 2001:db8:0:f101::2 action 1 loc 32
+ sudo ethtool -N <ethX> flow-type ip6 src-ip 2001:db8:0:f101::2 dst-ip 2001:db8:0:f101::5 action -1 loc 36
+
+ Example filter for IPv4 filter traffic:
+
+ sudo ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.4 dst-ip 10.0.0.7 src-port 2000 dst-port 2001 loc 32
+ sudo ethtool -N <ethX> flow-type tcp4 src-ip 10.0.0.3 dst-ip 10.0.0.9 src-port 2000 dst-port 2001 loc 33
+ sudo ethtool -N <ethX> flow-type ip4 src-ip 10.0.0.6 dst-ip 10.0.0.4 loc 34
+
+ If you set action -1, then all traffic corresponding to the filter will be discarded.
+ The maximum value action is 31.
+
+
+ The VLAN filter (VLAN id) is compared against 16 filters.
+ VLAN id must be accompanied by mask 0xF000. That is to distinguish VLAN filter
+ from L2 Ethertype filter with UserPriority since both User Priority and VLAN ID
+ are passed in the same 'vlan' parameter.
+
+ To add a filter that directs packets from VLAN 2001 to queue 5:
+ ethtool -N <ethX> flow-type ip4 vlan 2001 m 0xF000 action 1 loc 0
+
+
+ L2 EtherType filters allows filter packet by EtherType field or both EtherType
+ and User Priority (PCP) field of 802.1Q.
+ UserPriority (vlan) parameter must be accompanied by mask 0x1FFF. That is to
+ distinguish VLAN filter from L2 Ethertype filter with UserPriority since both
+ User Priority and VLAN ID are passed in the same 'vlan' parameter.
+
+ To add a filter that directs IP4 packess of priority 3 to queue 3:
+ ethtool -N <ethX> flow-type ether proto 0x800 vlan 0x600 m 0x1FFF action 3 loc 16
+
+
+ To see the list of filters currently present:
+
+ ethtool <-u|-n|--show-nfc|--show-ntuple> <ethX>
+
+ Rules may be deleted from the table itself. This is done using:
+
+ sudo ethtool <-N|-U|--config-nfc|--config-ntuple> <ethX> delete <loc>
+
+ - loc is the rule number to be deleted.
+
+ Rx filters is an interface to load the filter table that funnels all flow
+ into queue 0 unless an alternative queue is specified using "action". In that
+ case, any flow that matches the filter criteria will be directed to the
+ appropriate queue. RX filters is supported on all kernels 2.6.30 and later.
+
+ RSS for UDP
+ ---------------------------------
+ Currently, NIC does not support RSS for fragmented IP packets, which leads to
+ incorrect working of RSS for fragmented UDP traffic. To disable RSS for UDP the
+ RX Flow L3/L4 rule may be used.
+
+ Example:
+ ethtool -N eth0 flow-type udp4 action 0 loc 32
+
+Command Line Parameters
+=======================
+The following command line parameters are available on atlantic driver:
+
+aq_itr -Interrupt throttling mode
+----------------------------------------
+Accepted values: 0, 1, 0xFFFF
+Default value: 0xFFFF
+0 - Disable interrupt throttling.
+1 - Enable interrupt throttling and use specified tx and rx rates.
+0xFFFF - Auto throttling mode. Driver will choose the best RX and TX
+ interrupt throtting settings based on link speed.
+
+aq_itr_tx - TX interrupt throttle rate
+----------------------------------------
+Accepted values: 0 - 0x1FF
+Default value: 0
+TX side throttling in microseconds. Adapter will setup maximum interrupt delay
+to this value. Minimum interrupt delay will be a half of this value
+
+aq_itr_rx - RX interrupt throttle rate
+----------------------------------------
+Accepted values: 0 - 0x1FF
+Default value: 0
+RX side throttling in microseconds. Adapter will setup maximum interrupt delay
+to this value. Minimum interrupt delay will be a half of this value
+
+Note: ITR settings could be changed in runtime by ethtool -c means (see below)
+
+Config file parameters
+=======================
+For some fine tuning and performance optimizations,
+some parameters can be changed in the {source_dir}/aq_cfg.h file.
+
+AQ_CFG_RX_PAGEORDER
+----------------------------------------
+Default value: 0
+RX page order override. Thats a power of 2 number of RX pages allocated for
+each descriptor. Received descriptor size is still limited by AQ_CFG_RX_FRAME_MAX.
+Increasing pageorder makes page reuse better (actual on iommu enabled systems).
+
+AQ_CFG_RX_REFILL_THRES
+----------------------------------------
+Default value: 32
+RX refill threshold. RX path will not refill freed descriptors until the
+specified number of free descriptors is observed. Larger values may help
+better page reuse but may lead to packet drops as well.
+
+AQ_CFG_VECS_DEF
+------------------------------------------------------------
+Number of queues
+Valid Range: 0 - 8 (up to AQ_CFG_VECS_MAX)
+Default value: 8
+Notice this value will be capped by the number of cores available on the system.
+
+AQ_CFG_IS_RSS_DEF
+------------------------------------------------------------
+Enable/disable Receive Side Scaling
+
+This feature allows the adapter to distribute receive processing
+across multiple CPU-cores and to prevent from overloading a single CPU core.
+
+Valid values
+0 - disabled
+1 - enabled
+
+Default value: 1
+
+AQ_CFG_NUM_RSS_QUEUES_DEF
+------------------------------------------------------------
+Number of queues for Receive Side Scaling
+Valid Range: 0 - 8 (up to AQ_CFG_VECS_DEF)
+
+Default value: AQ_CFG_VECS_DEF
+
+AQ_CFG_IS_LRO_DEF
+------------------------------------------------------------
+Enable/disable Large Receive Offload
+
+This offload enables the adapter to coalesce multiple TCP segments and indicate
+them as a single coalesced unit to the OS networking subsystem.
+The system consumes less energy but it also introduces more latency in packets processing.
+
+Valid values
+0 - disabled
+1 - enabled
+
+Default value: 1
+
+AQ_CFG_TX_CLEAN_BUDGET
+----------------------------------------
+Maximum descriptors to cleanup on TX at once.
+Default value: 256
+
+After the aq_cfg.h file changed the driver must be rebuilt to take effect.
+
+Support
+=======
+
+If an issue is identified with the released source code on the supported
+kernel with a supported adapter, email the specific information related
+to the issue to support@aquantia.com
+
+License
+=======
+
+aQuantia Corporation Network Driver
+Copyright(c) 2014 - 2019 aQuantia Corporation.
+
+This program is free software; you can redistribute it and/or modify it
+under the terms and conditions of the GNU General Public License,
+version 2, as published by the Free Software Foundation.
diff --git a/Documentation/networking/cxgb.txt b/Documentation/networking/device_drivers/chelsio/cxgb.txt
similarity index 100%
rename from Documentation/networking/cxgb.txt
rename to Documentation/networking/device_drivers/chelsio/cxgb.txt
diff --git a/Documentation/networking/cs89x0.txt b/Documentation/networking/device_drivers/cirrus/cs89x0.txt
similarity index 100%
rename from Documentation/networking/cs89x0.txt
rename to Documentation/networking/device_drivers/cirrus/cs89x0.txt
diff --git a/Documentation/networking/dm9000.txt b/Documentation/networking/device_drivers/davicom/dm9000.txt
similarity index 100%
rename from Documentation/networking/dm9000.txt
rename to Documentation/networking/device_drivers/davicom/dm9000.txt
diff --git a/Documentation/networking/de4x5.txt b/Documentation/networking/device_drivers/dec/de4x5.txt
similarity index 98%
rename from Documentation/networking/de4x5.txt
rename to Documentation/networking/device_drivers/dec/de4x5.txt
index c8e4ca9..452aac5 100644
--- a/Documentation/networking/de4x5.txt
+++ b/Documentation/networking/device_drivers/dec/de4x5.txt
@@ -84,7 +84,7 @@
Automedia detection is included so that in principle you can disconnect
from, e.g. TP, reconnect to BNC and things will still work (after a
- pause whilst the driver figures out where its media went). My tests
+ pause while the driver figures out where its media went). My tests
using ping showed that it appears to work....
By default, the driver will now autodetect any DECchip based card.
diff --git a/Documentation/networking/dmfe.txt b/Documentation/networking/device_drivers/dec/dmfe.txt
similarity index 100%
rename from Documentation/networking/dmfe.txt
rename to Documentation/networking/device_drivers/dec/dmfe.txt
diff --git a/Documentation/networking/dl2k.txt b/Documentation/networking/device_drivers/dlink/dl2k.txt
similarity index 100%
rename from Documentation/networking/dl2k.txt
rename to Documentation/networking/device_drivers/dlink/dl2k.txt
diff --git a/Documentation/networking/dpaa.txt b/Documentation/networking/device_drivers/freescale/dpaa.txt
similarity index 100%
rename from Documentation/networking/dpaa.txt
rename to Documentation/networking/device_drivers/freescale/dpaa.txt
diff --git a/Documentation/networking/dpaa2/dpio-driver.rst b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
similarity index 91%
rename from Documentation/networking/dpaa2/dpio-driver.rst
rename to Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
index 1358810..17dbee1 100644
--- a/Documentation/networking/dpaa2/dpio-driver.rst
+++ b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
@@ -19,27 +19,27 @@
This document provides an overview the Linux DPIO driver, its
subcomponents, and its APIs.
-See Documentation/networking/dpaa2/overview.rst for a general overview of DPAA2
-and the general DPAA2 driver architecture in Linux.
+See Documentation/networking/device_drivers/freescale/dpaa2/overview.rst for
+a general overview of DPAA2 and the general DPAA2 driver architecture in Linux.
Driver Overview
---------------
The DPIO driver is bound to DPIO objects discovered on the fsl-mc bus and
provides services that:
- A) allow other drivers, such as the Ethernet driver, to enqueue and dequeue
+
+ A. allow other drivers, such as the Ethernet driver, to enqueue and dequeue
frames for their respective objects
- B) allow drivers to register callbacks for data availability notifications
+ B. allow drivers to register callbacks for data availability notifications
when data becomes available on a queue or channel
- C) allow drivers to manage hardware buffer pools
+ C. allow drivers to manage hardware buffer pools
The Linux DPIO driver consists of 3 primary components--
DPIO object driver-- fsl-mc driver that manages the DPIO object
DPIO service-- provides APIs to other Linux drivers for services
- QBman portal interface-- sends portal commands, gets responses
-::
+ QBman portal interface-- sends portal commands, gets responses::
fsl-mc other
bus drivers
@@ -59,6 +59,7 @@
The diagram below shows how the DPIO driver components fit with the other
DPAA2 Linux driver components::
+
+------------+
| OS Network |
| Stack |
@@ -140,11 +141,10 @@
The qbman-portal component provides APIs to do the low level hardware
bit twiddling for operations such as:
- -initializing Qman software portals
- -building and sending portal commands
-
- -portal interrupt configuration and processing
+ - initializing Qman software portals
+ - building and sending portal commands
+ - portal interrupt configuration and processing
The qbman-portal APIs are not public to other drivers, and are
only used by dpio-service.
diff --git a/Documentation/networking/device_drivers/freescale/dpaa2/ethernet-driver.rst b/Documentation/networking/device_drivers/freescale/dpaa2/ethernet-driver.rst
new file mode 100644
index 0000000..cb4c9a0
--- /dev/null
+++ b/Documentation/networking/device_drivers/freescale/dpaa2/ethernet-driver.rst
@@ -0,0 +1,185 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============================
+DPAA2 Ethernet driver
+===============================
+
+:Copyright: |copy| 2017-2018 NXP
+
+This file provides documentation for the Freescale DPAA2 Ethernet driver.
+
+Supported Platforms
+===================
+This driver provides networking support for Freescale DPAA2 SoCs, e.g.
+LS2080A, LS2088A, LS1088A.
+
+
+Architecture Overview
+=====================
+Unlike regular NICs, in the DPAA2 architecture there is no single hardware block
+representing network interfaces; instead, several separate hardware resources
+concur to provide the networking functionality:
+
+- network interfaces
+- queues, channels
+- buffer pools
+- MAC/PHY
+
+All hardware resources are allocated and configured through the Management
+Complex (MC) portals. MC abstracts most of these resources as DPAA2 objects
+and exposes ABIs through which they can be configured and controlled. A few
+hardware resources, like queues, do not have a corresponding MC object and
+are treated as internal resources of other objects.
+
+For a more detailed description of the DPAA2 architecture and its object
+abstractions see *Documentation/networking/device_drivers/freescale/dpaa2/overview.rst*.
+
+Each Linux net device is built on top of a Datapath Network Interface (DPNI)
+object and uses Buffer Pools (DPBPs), I/O Portals (DPIOs) and Concentrators
+(DPCONs).
+
+Configuration interface::
+
+ -----------------------
+ | DPAA2 Ethernet Driver |
+ -----------------------
+ . . .
+ . . .
+ . . . . . . . . . . . .
+ . . .
+ . . .
+ ---------- ---------- -----------
+ | DPBP API | | DPNI API | | DPCON API |
+ ---------- ---------- -----------
+ . . . software
+ ======= . ========== . ============ . ===================
+ . . . hardware
+ ------------------------------------------
+ | MC hardware portals |
+ ------------------------------------------
+ . . .
+ . . .
+ ------ ------ -------
+ | DPBP | | DPNI | | DPCON |
+ ------ ------ -------
+
+The DPNIs are network interfaces without a direct one-on-one mapping to PHYs.
+DPBPs represent hardware buffer pools. Packet I/O is performed in the context
+of DPCON objects, using DPIO portals for managing and communicating with the
+hardware resources.
+
+Datapath (I/O) interface::
+
+ -----------------------------------------------
+ | DPAA2 Ethernet Driver |
+ -----------------------------------------------
+ | ^ ^ | |
+ | | | | |
+ enqueue| dequeue| data | dequeue| seed |
+ (Tx) | (Rx, TxC)| avail.| request| buffers|
+ | | notify| | |
+ | | | | |
+ V | | V V
+ -----------------------------------------------
+ | DPIO Driver |
+ -----------------------------------------------
+ | | | | | software
+ | | | | | ================
+ | | | | | hardware
+ -----------------------------------------------
+ | I/O hardware portals |
+ -----------------------------------------------
+ | ^ ^ | |
+ | | | | |
+ | | | V |
+ V | ================ V
+ ---------------------- | -------------
+ queues ---------------------- | | Buffer pool |
+ ---------------------- | -------------
+ =======================
+ Channel
+
+Datapath I/O (DPIO) portals provide enqueue and dequeue services, data
+availability notifications and buffer pool management. DPIOs are shared between
+all DPAA2 objects (and implicitly all DPAA2 kernel drivers) that work with data
+frames, but must be affine to the CPUs for the purpose of traffic distribution.
+
+Frames are transmitted and received through hardware frame queues, which can be
+grouped in channels for the purpose of hardware scheduling. The Ethernet driver
+enqueues TX frames on egress queues and after transmission is complete a TX
+confirmation frame is sent back to the CPU.
+
+When frames are available on ingress queues, a data availability notification
+is sent to the CPU; notifications are raised per channel, so even if multiple
+queues in the same channel have available frames, only one notification is sent.
+After a channel fires a notification, is must be explicitly rearmed.
+
+Each network interface can have multiple Rx, Tx and confirmation queues affined
+to CPUs, and one channel (DPCON) for each CPU that services at least one queue.
+DPCONs are used to distribute ingress traffic to different CPUs via the cores'
+affine DPIOs.
+
+The role of hardware buffer pools is storage of ingress frame data. Each network
+interface has a privately owned buffer pool which it seeds with kernel allocated
+buffers.
+
+
+DPNIs are decoupled from PHYs; a DPNI can be connected to a PHY through a DPMAC
+object or to another DPNI through an internal link, but the connection is
+managed by MC and completely transparent to the Ethernet driver.
+
+::
+
+ --------- --------- ---------
+ | eth if1 | | eth if2 | | eth ifn |
+ --------- --------- ---------
+ . . .
+ . . .
+ . . .
+ ---------------------------
+ | DPAA2 Ethernet Driver |
+ ---------------------------
+ . . .
+ . . .
+ . . .
+ ------ ------ ------ -------
+ | DPNI | | DPNI | | DPNI | | DPMAC |----+
+ ------ ------ ------ ------- |
+ | | | | |
+ | | | | -----
+ =========== ================== | PHY |
+ -----
+
+Creating a Network Interface
+============================
+A net device is created for each DPNI object probed on the MC bus. Each DPNI has
+a number of properties which determine the network interface configuration
+options and associated hardware resources.
+
+DPNI objects (and the other DPAA2 objects needed for a network interface) can be
+added to a container on the MC bus in one of two ways: statically, through a
+Datapath Layout Binary file (DPL) that is parsed by MC at boot time; or created
+dynamically at runtime, via the DPAA2 objects APIs.
+
+
+Features & Offloads
+===================
+Hardware checksum offloading is supported for TCP and UDP over IPv4/6 frames.
+The checksum offloads can be independently configured on RX and TX through
+ethtool.
+
+Hardware offload of unicast and multicast MAC filtering is supported on the
+ingress path and permanently enabled.
+
+Scatter-gather frames are supported on both RX and TX paths. On TX, SG support
+is configurable via ethtool; on RX it is always enabled.
+
+The DPAA2 hardware can process jumbo Ethernet frames of up to 10K bytes.
+
+The Ethernet driver defines a static flow hashing scheme that distributes
+traffic based on a 5-tuple key: src IP, dst IP, IP proto, L4 src port,
+L4 dst port. No user configuration is supported for now.
+
+Hardware specific statistics for the network interface as well as some
+non-standard driver stats can be consulted through ethtool -S option.
diff --git a/Documentation/networking/dpaa2/index.rst b/Documentation/networking/device_drivers/freescale/dpaa2/index.rst
similarity index 85%
rename from Documentation/networking/dpaa2/index.rst
rename to Documentation/networking/device_drivers/freescale/dpaa2/index.rst
index 10bea11..67bd87f 100644
--- a/Documentation/networking/dpaa2/index.rst
+++ b/Documentation/networking/device_drivers/freescale/dpaa2/index.rst
@@ -7,3 +7,4 @@
overview
dpio-driver
+ ethernet-driver
diff --git a/Documentation/networking/dpaa2/overview.rst b/Documentation/networking/device_drivers/freescale/dpaa2/overview.rst
similarity index 100%
rename from Documentation/networking/dpaa2/overview.rst
rename to Documentation/networking/device_drivers/freescale/dpaa2/overview.rst
diff --git a/Documentation/networking/gianfar.txt b/Documentation/networking/device_drivers/freescale/gianfar.txt
similarity index 100%
rename from Documentation/networking/gianfar.txt
rename to Documentation/networking/device_drivers/freescale/gianfar.txt
diff --git a/Documentation/networking/device_drivers/google/gve.rst b/Documentation/networking/device_drivers/google/gve.rst
new file mode 100644
index 0000000..793693c
--- /dev/null
+++ b/Documentation/networking/device_drivers/google/gve.rst
@@ -0,0 +1,123 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==============================================================
+Linux kernel driver for Compute Engine Virtual Ethernet (gve):
+==============================================================
+
+Supported Hardware
+===================
+The GVE driver binds to a single PCI device id used by the virtual
+Ethernet device found in some Compute Engine VMs.
+
++--------------+----------+---------+
+|Field | Value | Comments|
++==============+==========+=========+
+|Vendor ID | `0x1AE0` | Google |
++--------------+----------+---------+
+|Device ID | `0x0042` | |
++--------------+----------+---------+
+|Sub-vendor ID | `0x1AE0` | Google |
++--------------+----------+---------+
+|Sub-device ID | `0x0058` | |
++--------------+----------+---------+
+|Revision ID | `0x0` | |
++--------------+----------+---------+
+|Device Class | `0x200` | Ethernet|
++--------------+----------+---------+
+
+PCI Bars
+========
+The gVNIC PCI device exposes three 32-bit memory BARS:
+- Bar0 - Device configuration and status registers.
+- Bar1 - MSI-X vector table
+- Bar2 - IRQ, RX and TX doorbells
+
+Device Interactions
+===================
+The driver interacts with the device in the following ways:
+ - Registers
+ - A block of MMIO registers
+ - See gve_register.h for more detail
+ - Admin Queue
+ - See description below
+ - Reset
+ - At any time the device can be reset
+ - Interrupts
+ - See supported interrupts below
+ - Transmit and Receive Queues
+ - See description below
+
+Registers
+---------
+All registers are MMIO and big endian.
+
+The registers are used for initializing and configuring the device as well as
+querying device status in response to management interrupts.
+
+Admin Queue (AQ)
+----------------
+The Admin Queue is a PAGE_SIZE memory block, treated as an array of AQ
+commands, used by the driver to issue commands to the device and set up
+resources.The driver and the device maintain a count of how many commands
+have been submitted and executed. To issue AQ commands, the driver must do
+the following (with proper locking):
+
+1) Copy new commands into next available slots in the AQ array
+2) Increment its counter by he number of new commands
+3) Write the counter into the GVE_ADMIN_QUEUE_DOORBELL register
+4) Poll the ADMIN_QUEUE_EVENT_COUNTER register until it equals
+ the value written to the doorbell, or until a timeout.
+
+The device will update the status field in each AQ command reported as
+executed through the ADMIN_QUEUE_EVENT_COUNTER register.
+
+Device Resets
+-------------
+A device reset is triggered by writing 0x0 to the AQ PFN register.
+This causes the device to release all resources allocated by the
+driver, including the AQ itself.
+
+Interrupts
+----------
+The following interrupts are supported by the driver:
+
+Management Interrupt
+~~~~~~~~~~~~~~~~~~~~
+The management interrupt is used by the device to tell the driver to
+look at the GVE_DEVICE_STATUS register.
+
+The handler for the management irq simply queues the service task in
+the workqueue to check the register and acks the irq.
+
+Notification Block Interrupts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The notification block interrupts are used to tell the driver to poll
+the queues associated with that interrupt.
+
+The handler for these irqs schedule the napi for that block to run
+and poll the queues.
+
+Traffic Queues
+--------------
+gVNIC's queues are composed of a descriptor ring and a buffer and are
+assigned to a notification block.
+
+The descriptor rings are power-of-two-sized ring buffers consisting of
+fixed-size descriptors. They advance their head pointer using a __be32
+doorbell located in Bar2. The tail pointers are advanced by consuming
+descriptors in-order and updating a __be32 counter. Both the doorbell
+and the counter overflow to zero.
+
+Each queue's buffers must be registered in advance with the device as a
+queue page list, and packet data can only be put in those pages.
+
+Transmit
+~~~~~~~~
+gve maps the buffers for transmit rings into a FIFO and copies the packets
+into the FIFO before sending them to the NIC.
+
+Receive
+~~~~~~~
+The buffers for receive rings are put into a data ring that is the same
+length as the descriptor ring and the head and tail pointers advance over
+the rings together.
diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst
new file mode 100644
index 0000000..c1f7f75
--- /dev/null
+++ b/Documentation/networking/device_drivers/index.rst
@@ -0,0 +1,34 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+Vendor Device Drivers
+=====================
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+ freescale/dpaa2/index
+ intel/e100
+ intel/e1000
+ intel/e1000e
+ intel/fm10k
+ intel/igb
+ intel/igbvf
+ intel/ixgb
+ intel/ixgbe
+ intel/ixgbevf
+ intel/i40e
+ intel/iavf
+ intel/ice
+ google/gve
+ mellanox/mlx5
+ netronome/nfp
+ pensando/ionic
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/networking/e100.rst b/Documentation/networking/device_drivers/intel/e100.rst
similarity index 93%
rename from Documentation/networking/e100.rst
rename to Documentation/networking/device_drivers/intel/e100.rst
index f81111e..caf023c 100644
--- a/Documentation/networking/e100.rst
+++ b/Documentation/networking/device_drivers/intel/e100.rst
@@ -1,6 +1,8 @@
-==============================================================
-Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters
-==============================================================
+.. SPDX-License-Identifier: GPL-2.0+
+
+=============================================================
+Linux Base Driver for the Intel(R) PRO/100 Family of Adapters
+=============================================================
June 1, 2018
@@ -19,7 +21,7 @@
In This Release
===============
-This file describes the Linux* Base Driver for the Intel(R) PRO/100 Family of
+This file describes the Linux Base Driver for the Intel(R) PRO/100 Family of
Adapters. This driver includes support for Itanium(R)2-based systems.
For questions related to hardware requirements, refer to the documentation
@@ -136,9 +138,9 @@
The latest release of ethtool can be found from
https://www.kernel.org/pub/software/network/ethtool/
-Enabling Wake on LAN* (WoL)
----------------------------
-WoL is provided through the ethtool* utility. For instructions on
+Enabling Wake on LAN (WoL)
+--------------------------
+WoL is provided through the ethtool utility. For instructions on
enabling WoL with ethtool, refer to the ethtool man page. WoL will be
enabled on the system during the next shut down or reboot. For this
driver version, in order to enable WoL, the e100 driver must be loaded
diff --git a/Documentation/networking/e1000.rst b/Documentation/networking/device_drivers/intel/e1000.rst
similarity index 97%
rename from Documentation/networking/e1000.rst
rename to Documentation/networking/device_drivers/intel/e1000.rst
index f10dd40..4aaae0f 100644
--- a/Documentation/networking/e1000.rst
+++ b/Documentation/networking/device_drivers/intel/e1000.rst
@@ -1,6 +1,8 @@
-===========================================================
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
+.. SPDX-License-Identifier: GPL-2.0+
+
+==========================================================
+Linux Base Driver for Intel(R) Ethernet Network Connection
+==========================================================
Intel Gigabit Linux driver.
Copyright(c) 1999 - 2013 Intel Corporation.
@@ -436,10 +438,10 @@
The latest release of ethtool can be found from
https://www.kernel.org/pub/software/network/ethtool/
-Enabling Wake on LAN* (WoL)
----------------------------
+Enabling Wake on LAN (WoL)
+--------------------------
- WoL is configured through the ethtool* utility.
+ WoL is configured through the ethtool utility.
WoL will be enabled on the system during the next shut down or reboot.
For this driver version, in order to enable WoL, the e1000 driver must be
diff --git a/Documentation/networking/device_drivers/intel/e1000e.rst b/Documentation/networking/device_drivers/intel/e1000e.rst
new file mode 100644
index 0000000..f49cd37
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/e1000e.rst
@@ -0,0 +1,383 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=====================================================
+Linux Driver for Intel(R) Ethernet Network Connection
+=====================================================
+
+Intel Gigabit Linux driver.
+Copyright(c) 2008-2018 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Command Line Parameters
+- Additional Configurations
+- Support
+
+
+Identifying Your Adapter
+========================
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+https://www.intel.com/support
+
+
+Command Line Parameters
+=======================
+If the driver is built as a module, the following optional parameters are used
+by entering them on the command line with the modprobe command using this
+syntax::
+
+ modprobe e1000e [<option>=<VAL1>,<VAL2>,...]
+
+There needs to be a <VAL#> for each network port in the system supported by
+this driver. The values will be applied to each instance, in function order.
+For example::
+
+ modprobe e1000e InterruptThrottleRate=16000,16000
+
+In this case, there are two network ports supported by e1000e in the system.
+The default value for each parameter is generally the recommended setting,
+unless otherwise noted.
+
+NOTE: A descriptor describes a data buffer and attributes related to the data
+buffer. This information is accessed by the hardware.
+
+InterruptThrottleRate
+---------------------
+:Valid Range: 0,1,3,4,100-100000
+:Default Value: 3
+
+Interrupt Throttle Rate controls the number of interrupts each interrupt
+vector can generate per second. Increasing ITR lowers latency at the cost of
+increased CPU utilization, though it may help throughput in some circumstances.
+
+Setting InterruptThrottleRate to a value greater or equal to 100
+will program the adapter to send out a maximum of that many interrupts
+per second, even if more packets have come in. This reduces interrupt
+load on the system and can lower CPU utilization under heavy load,
+but will increase latency as packets are not processed as quickly.
+
+The default behaviour of the driver previously assumed a static
+InterruptThrottleRate value of 8000, providing a good fallback value for
+all traffic types, but lacking in small packet performance and latency.
+The hardware can handle many more small packets per second however, and
+for this reason an adaptive interrupt moderation algorithm was implemented.
+
+The driver has two adaptive modes (setting 1 or 3) in which
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
+that it receives. After determining the type of incoming traffic in the last
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
+for that traffic.
+
+The algorithm classifies the incoming traffic every interval into
+classes. Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
+"Bulk traffic", for large amounts of packets of normal size; "Low latency",
+for small amounts of traffic and/or a significant percentage of small
+packets; and "Lowest latency", for almost completely small packets or
+minimal traffic.
+
+ - 0: Off
+ Turns off any interrupt moderation and may improve small packet latency.
+ However, this is generally not suitable for bulk throughput traffic due
+ to the increased CPU utilization of the higher interrupt rate.
+ - 1: Dynamic mode
+ This mode attempts to moderate interrupts per vector while maintaining
+ very low latency. This can sometimes cause extra CPU utilization. If
+ planning on deploying e1000e in a latency sensitive environment, this
+ parameter should be considered.
+ - 3: Dynamic Conservative mode (default)
+ In dynamic conservative mode, the InterruptThrottleRate value is set to
+ 4000 for traffic that falls in class "Bulk traffic". If traffic falls in
+ the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is
+ increased stepwise to 20000. This default mode is suitable for most
+ applications.
+ - 4: Simplified Balancing mode
+ In simplified mode the interrupt rate is based on the ratio of TX and
+ RX traffic. If the bytes per second rate is approximately equal, the
+ interrupt rate will drop as low as 2000 interrupts per second. If the
+ traffic is mostly transmit or mostly receive, the interrupt rate could
+ be as high as 8000.
+ - 100-100000:
+ Setting InterruptThrottleRate to a value greater or equal to 100
+ will program the adapter to send at most that many interrupts per second,
+ even if more packets have come in. This reduces interrupt load on the
+ system and can lower CPU utilization under heavy load, but will increase
+ latency as packets are not processed as quickly.
+
+NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
+RxAbsIntDelay parameters. In other words, minimizing the receive and/or
+transmit absolute delays does not force the controller to generate more
+interrupts than what the Interrupt Throttle Rate allows.
+
+RxIntDelay
+----------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 0
+
+This value delays the generation of receive interrupts in units of 1.024
+microseconds. Receive interrupt reduction can improve CPU efficiency if
+properly tuned for specific network traffic. Increasing this value adds extra
+latency to frame reception and can end up decreasing the throughput of TCP
+traffic. If the system is reporting dropped receives, this value may be set
+too high, causing the driver to run out of available receive descriptors.
+
+CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang
+(stop transmitting) under certain network conditions. If this occurs a NETDEV
+WATCHDOG message is logged in the system event log. In addition, the
+controller is automatically reset, restoring the network connection. To
+eliminate the potential for the hang ensure that RxIntDelay is set to 0.
+
+RxAbsIntDelay
+-------------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 8
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+receive interrupt is generated. This value ensures that an interrupt is
+generated after the initial packet is received within the set amount of time,
+which is useful only if RxIntDelay is non-zero. Proper tuning, along with
+RxIntDelay, may improve traffic throughput in specific network conditions.
+
+TxIntDelay
+----------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 8
+
+This value delays the generation of transmit interrupts in units of 1.024
+microseconds. Transmit interrupt reduction can improve CPU efficiency if
+properly tuned for specific network traffic. If the system is reporting
+dropped transmits, this value may be set too high causing the driver to run
+out of available transmit descriptors.
+
+TxAbsIntDelay
+-------------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 32
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+transmit interrupt is generated. It is useful only if TxIntDelay is non-zero.
+It ensures that an interrupt is generated after the initial Packet is sent on
+the wire within the set amount of time. Proper tuning, along with TxIntDelay,
+may improve traffic throughput in specific network conditions.
+
+copybreak
+---------
+:Valid Range: 0-xxxxxxx (0=off)
+:Default Value: 256
+
+The driver copies all packets below or equaling this size to a fresh receive
+buffer before handing it up the stack.
+This parameter differs from other parameters because it is a single (not 1,1,1
+etc.) parameter applied to all driver instances and it is also available
+during runtime at /sys/module/e1000e/parameters/copybreak.
+
+To use copybreak, type::
+
+ modprobe e1000e.ko copybreak=128
+
+SmartPowerDownEnable
+--------------------
+:Valid Range: 0,1
+:Default Value: 0 (disabled)
+
+Allows the PHY to turn off in lower power states. The user can turn off this
+parameter in supported chipsets.
+
+KumeranLockLoss
+---------------
+:Valid Range: 0,1
+:Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial silicon
+releases of ICH8 systems.
+
+IntMode
+-------
+:Valid Range: 0-2
+:Default Value: 0
+
+ +-------+----------------+
+ | Value | Interrupt Mode |
+ +=======+================+
+ | 0 | Legacy |
+ +-------+----------------+
+ | 1 | MSI |
+ +-------+----------------+
+ | 2 | MSI-X |
+ +-------+----------------+
+
+IntMode allows load time control over the type of interrupt registered for by
+the driver. MSI-X is required for multiple queue support, and some kernels and
+combinations of kernel .config options will force a lower level of interrupt
+support.
+
+This command will show different values for each type of interrupt::
+
+ cat /proc/interrupts
+
+CrcStripping
+------------
+:Valid Range: 0,1
+:Default Value: 1 (enabled)
+
+Strip the CRC from received packets before sending up the network stack. If
+you have a machine with a BMC enabled but cannot receive IPMI traffic after
+loading or enabling the driver, try disabling this feature.
+
+WriteProtectNVM
+---------------
+:Valid Range: 0,1
+:Default Value: 1 (enabled)
+
+If set to 1, configure the hardware to ignore all write/erase cycles to the
+GbE region in the ICHx NVM (in order to prevent accidental corruption of the
+NVM). This feature can be disabled by setting the parameter to 0 during initial
+driver load.
+
+NOTE: The machine must be power cycled (full off/on) when enabling NVM writes
+via setting the parameter to zero. Once the NVM has been locked (via the
+parameter at 1 when the driver loads) it cannot be unlocked except via power
+cycle.
+
+Debug
+-----
+:Valid Range: 0-16 (0=none,...,16=all)
+:Default Value: 0
+
+This parameter adjusts the level of debug messages displayed in the system logs.
+
+
+Additional Features and Configurations
+======================================
+
+Jumbo Frames
+------------
+Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
+to a value larger than the default value of 1500.
+
+Use the ifconfig command to increase the MTU size. For example, enter the
+following where <x> is the interface number::
+
+ ifconfig eth<x> mtu 9000 up
+
+Alternatively, you can use the ip command as follows::
+
+ ip link set mtu 9000 dev eth<x>
+ ip link set up dev eth<x>
+
+This setting is not saved across reboots. The setting change can be made
+permanent by adding 'MTU=9000' to the file:
+
+- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
+- For SLES: /etc/sysconfig/network/<config_file>
+
+NOTE: The maximum MTU setting for Jumbo Frames is 8996. This value coincides
+with the maximum Jumbo Frames size of 9018 bytes.
+
+NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
+poor performance or loss of link.
+
+NOTE: The following adapters limit Jumbo Frames sized packets to a maximum of
+4088 bytes:
+
+ - Intel(R) 82578DM Gigabit Network Connection
+ - Intel(R) 82577LM Gigabit Network Connection
+
+The following adapters do not support Jumbo Frames:
+
+ - Intel(R) PRO/1000 Gigabit Server Adapter
+ - Intel(R) PRO/1000 PM Network Connection
+ - Intel(R) 82562G 10/100 Network Connection
+ - Intel(R) 82562G-2 10/100 Network Connection
+ - Intel(R) 82562GT 10/100 Network Connection
+ - Intel(R) 82562GT-2 10/100 Network Connection
+ - Intel(R) 82562V 10/100 Network Connection
+ - Intel(R) 82562V-2 10/100 Network Connection
+ - Intel(R) 82566DC Gigabit Network Connection
+ - Intel(R) 82566DC-2 Gigabit Network Connection
+ - Intel(R) 82566DM Gigabit Network Connection
+ - Intel(R) 82566MC Gigabit Network Connection
+ - Intel(R) 82566MM Gigabit Network Connection
+ - Intel(R) 82567V-3 Gigabit Network Connection
+ - Intel(R) 82577LC Gigabit Network Connection
+ - Intel(R) 82578DC Gigabit Network Connection
+
+NOTE: Jumbo Frames cannot be configured on an 82579-based Network device if
+MACSec is enabled on the system.
+
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+
+https://www.kernel.org/pub/software/network/ethtool/
+
+NOTE: When validating enable/disable tests on some parts (for example, 82578),
+it is necessary to add a few seconds between tests when working with ethtool.
+
+
+Speed and Duplex Configuration
+------------------------------
+In addressing speed and duplex configuration issues, you need to distinguish
+between copper-based adapters and fiber-based adapters.
+
+In the default mode, an Intel(R) Ethernet Network Adapter using copper
+connections will attempt to auto-negotiate with its link partner to determine
+the best setting. If the adapter cannot establish link with the link partner
+using auto-negotiation, you may need to manually configure the adapter and link
+partner to identical settings to establish link and pass packets. This should
+only be needed when attempting to link with an older switch that does not
+support auto-negotiation or one that has been forced to a specific speed or
+duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
+and higher cannot be forced. Use the autonegotiation advertising setting to
+manually set devices for 1 Gbps and higher.
+
+Speed, duplex, and autonegotiation advertising are configured through the
+ethtool utility.
+
+Caution: Only experienced network administrators should force speed and duplex
+or change autonegotiation advertising manually. The settings at the switch must
+always match the adapter settings. Adapter performance may suffer or your
+adapter may not operate if you configure the adapter differently from your
+switch.
+
+An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
+will not attempt to auto-negotiate with its link partner since those adapters
+operate only in full duplex and only at their native speed.
+
+
+Enabling Wake on LAN (WoL)
+--------------------------
+WoL is configured through the ethtool utility.
+
+WoL will be enabled on the system during the next shut down or reboot. For
+this driver version, in order to enable WoL, the e1000e driver must be loaded
+prior to shutting down or suspending the system.
+
+NOTE: Wake on LAN is only supported on port A for the following devices:
+- Intel(R) PRO/1000 PT Dual Port Network Connection
+- Intel(R) PRO/1000 PT Dual Port Server Connection
+- Intel(R) PRO/1000 PT Dual Port Server Adapter
+- Intel(R) PRO/1000 PF Dual Port Server Adapter
+- Intel(R) PRO/1000 PT Quad Port Server Adapter
+- Intel(R) Gigabit PT Quad Port Server ExpressModule
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/device_drivers/intel/fm10k.rst b/Documentation/networking/device_drivers/intel/fm10k.rst
new file mode 100644
index 0000000..4d279e6
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/fm10k.rst
@@ -0,0 +1,142 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=============================================================
+Linux Base Driver for Intel(R) Ethernet Multi-host Controller
+=============================================================
+
+August 20, 2018
+Copyright(c) 2015-2018 Intel Corporation.
+
+Contents
+========
+- Identifying Your Adapter
+- Additional Configurations
+- Performance Tuning
+- Known Issues
+- Support
+
+Identifying Your Adapter
+========================
+The driver in this release is compatible with devices based on the Intel(R)
+Ethernet Multi-host Controller.
+
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+http://www.intel.com/support
+
+
+Flow Control
+------------
+The Intel(R) Ethernet Switch Host Interface Driver does not support Flow
+Control. It will not send pause frames. This may result in dropped frames.
+
+
+Virtual Functions (VFs)
+-----------------------
+Use sysfs to enable VFs.
+Valid Range: 0-64
+
+For example::
+
+ echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs //enable VFs
+ echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs
+
+NOTE: Neither the device nor the driver control how VFs are mapped into config
+space. Bus layout will vary by operating system. On operating systems that
+support it, you can check sysfs to find the mapping.
+
+NOTE: When SR-IOV mode is enabled, hardware VLAN filtering and VLAN tag
+stripping/insertion will remain enabled. Please remove the old VLAN filter
+before the new VLAN filter is added. For example::
+
+ ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0
+ ip link set eth0 vf 0 vlan 0 // Delete vlan 100
+ ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0
+
+
+Additional Features and Configurations
+======================================
+
+Jumbo Frames
+------------
+Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
+to a value larger than the default value of 1500.
+
+Use the ifconfig command to increase the MTU size. For example, enter the
+following where <x> is the interface number::
+
+ ifconfig eth<x> mtu 9000 up
+
+Alternatively, you can use the ip command as follows::
+
+ ip link set mtu 9000 dev eth<x>
+ ip link set up dev eth<x>
+
+This setting is not saved across reboots. The setting change can be made
+permanent by adding 'MTU=9000' to the file:
+
+- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
+- For SLES: /etc/sysconfig/network/<config_file>
+
+NOTE: The maximum MTU setting for Jumbo Frames is 15342. This value coincides
+with the maximum Jumbo Frames size of 15364 bytes.
+
+NOTE: This driver will attempt to use multiple page sized buffers to receive
+each jumbo packet. This should help to avoid buffer starvation issues when
+allocating receive packets.
+
+
+Generic Receive Offload, aka GRO
+--------------------------------
+The driver supports the in-kernel software implementation of GRO. GRO has
+shown that by coalescing Rx traffic into larger chunks of data, CPU
+utilization can be significantly reduced when under large Rx load. GRO is an
+evolution of the previously-used LRO interface. GRO is able to coalesce
+other protocols besides TCP. It's also safe to use with configurations that
+are problematic for LRO, namely bridging and iSCSI.
+
+
+
+Supported ethtool Commands and Options for Filtering
+----------------------------------------------------
+-n --show-nfc
+ Retrieves the receive network flow classification configurations.
+
+rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6
+ Retrieves the hash options for the specified network traffic type.
+
+-N --config-nfc
+ Configures the receive network flow classification.
+
+rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r
+ Configures the hash options for the specified network traffic type.
+
+- udp4: UDP over IPv4
+- udp6: UDP over IPv6
+- f Hash on bytes 0 and 1 of the Layer 4 header of the rx packet.
+- n Hash on bytes 2 and 3 of the Layer 4 header of the rx packet.
+
+
+Known Issues/Troubleshooting
+============================
+
+Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS under Linux KVM
+-------------------------------------------------------------------------------------
+KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This
+includes traditional PCIe devices, as well as SR-IOV-capable devices based on
+the Intel Ethernet Controller XL710.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/device_drivers/intel/i40e.rst b/Documentation/networking/device_drivers/intel/i40e.rst
new file mode 100644
index 0000000..8a9b185
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/i40e.rst
@@ -0,0 +1,771 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=================================================================
+Linux Base Driver for the Intel(R) Ethernet Controller 700 Series
+=================================================================
+
+Intel 40 Gigabit Linux driver.
+Copyright(c) 1999-2018 Intel Corporation.
+
+Contents
+========
+
+- Overview
+- Identifying Your Adapter
+- Intel(R) Ethernet Flow Director
+- Additional Configurations
+- Known Issues
+- Support
+
+
+Driver information can be obtained using ethtool, lspci, and ifconfig.
+Instructions on updating ethtool can be found in the section Additional
+Configurations later in this document.
+
+For questions related to hardware requirements, refer to the documentation
+supplied with your Intel adapter. All hardware requirements listed apply to use
+with Linux.
+
+
+Identifying Your Adapter
+========================
+The driver is compatible with devices based on the following:
+
+ * Intel(R) Ethernet Controller X710
+ * Intel(R) Ethernet Controller XL710
+ * Intel(R) Ethernet Network Connection X722
+ * Intel(R) Ethernet Controller XXV710
+
+For the best performance, make sure the latest NVM/FW is installed on your
+device.
+
+For information on how to identify your adapter, and for the latest NVM/FW
+images and Intel network drivers, refer to the Intel Support website:
+https://www.intel.com/support
+
+SFP+ and QSFP+ Devices
+----------------------
+For information about supported media, refer to this document:
+https://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf
+
+NOTE: Some adapters based on the Intel(R) Ethernet Controller 700 Series only
+support Intel Ethernet Optics modules. On these adapters, other modules are not
+supported and will not function. In all cases Intel recommends using Intel
+Ethernet Optics; other modules may function but are not validated by Intel.
+Contact Intel for supported media types.
+
+NOTE: For connections based on Intel(R) Ethernet Controller 700 Series, support
+is dependent on your system board. Please see your vendor for details.
+
+NOTE: In systems that do not have adequate airflow to cool the adapter and
+optical modules, you must use high temperature optical modules.
+
+Virtual Functions (VFs)
+-----------------------
+Use sysfs to enable VFs. For example::
+
+ #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs #enable VFs
+ #echo 0 > /sys/class/net/$dev/device/sriov_numvfs #disable VFs
+
+For example, the following instructions will configure PF eth0 and the first VF
+on VLAN 10::
+
+ $ ip link set dev eth0 vf 0 vlan 10
+
+VLAN Tag Packet Steering
+------------------------
+Allows you to send all packets with a specific VLAN tag to a particular SR-IOV
+virtual function (VF). Further, this feature allows you to designate a
+particular VF as trusted, and allows that trusted VF to request selective
+promiscuous mode on the Physical Function (PF).
+
+To set a VF as trusted or untrusted, enter the following command in the
+Hypervisor::
+
+ # ip link set dev eth0 vf 1 trust [on|off]
+
+Once the VF is designated as trusted, use the following commands in the VM to
+set the VF to promiscuous mode.
+
+::
+
+ For promiscuous all:
+ #ip link set eth2 promisc on
+ Where eth2 is a VF interface in the VM
+
+ For promiscuous Multicast:
+ #ip link set eth2 allmulticast on
+ Where eth2 is a VF interface in the VM
+
+NOTE: By default, the ethtool priv-flag vf-true-promisc-support is set to
+"off",meaning that promiscuous mode for the VF will be limited. To set the
+promiscuous mode for the VF to true promiscuous and allow the VF to see all
+ingress traffic, use the following command::
+
+ #ethtool -set-priv-flags p261p1 vf-true-promisc-support on
+
+The vf-true-promisc-support priv-flag does not enable promiscuous mode; rather,
+it designates which type of promiscuous mode (limited or true) you will get
+when you enable promiscuous mode using the ip link commands above. Note that
+this is a global setting that affects the entire device. However,the
+vf-true-promisc-support priv-flag is only exposed to the first PF of the
+device. The PF remains in limited promiscuous mode (unless it is in MFP mode)
+regardless of the vf-true-promisc-support setting.
+
+Now add a VLAN interface on the VF interface::
+
+ #ip link add link eth2 name eth2.100 type vlan id 100
+
+Note that the order in which you set the VF to promiscuous mode and add the
+VLAN interface does not matter (you can do either first). The end result in
+this example is that the VF will get all traffic that is tagged with VLAN 100.
+
+Intel(R) Ethernet Flow Director
+-------------------------------
+The Intel Ethernet Flow Director performs the following tasks:
+
+- Directs receive packets according to their flows to different queues.
+- Enables tight control on routing a flow in the platform.
+- Matches flows and CPU cores for flow affinity.
+- Supports multiple parameters for flexible flow classification and load
+ balancing (in SFP mode only).
+
+NOTE: The Linux i40e driver supports the following flow types: IPv4, TCPv4, and
+UDPv4. For a given flow type, it supports valid combinations of IP addresses
+(source or destination) and UDP/TCP ports (source and destination). For
+example, you can supply only a source IP address, a source IP address and a
+destination port, or any combination of one or more of these four parameters.
+
+NOTE: The Linux i40e driver allows you to filter traffic based on a
+user-defined flexible two-byte pattern and offset by using the ethtool user-def
+and mask fields. Only L3 and L4 flow types are supported for user-defined
+flexible filters. For a given flow type, you must clear all Intel Ethernet Flow
+Director filters before changing the input set (for that flow type).
+
+To enable or disable the Intel Ethernet Flow Director::
+
+ # ethtool -K ethX ntuple <on|off>
+
+When disabling ntuple filters, all the user programmed filters are flushed from
+the driver cache and hardware. All needed filters must be re-added when ntuple
+is re-enabled.
+
+To add a filter that directs packet to queue 2, use -U or -N switch::
+
+ # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
+ 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
+
+To set a filter using only the source and destination IP address::
+
+ # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
+ 192.168.10.2 action 2 [loc 1]
+
+To see the list of filters currently present::
+
+ # ethtool <-u|-n> ethX
+
+Application Targeted Routing (ATR) Perfect Filters
+--------------------------------------------------
+ATR is enabled by default when the kernel is in multiple transmit queue mode.
+An ATR Intel Ethernet Flow Director filter rule is added when a TCP-IP flow
+starts and is deleted when the flow ends. When a TCP-IP Intel Ethernet Flow
+Director rule is added from ethtool (Sideband filter), ATR is turned off by the
+driver. To re-enable ATR, the sideband can be disabled with the ethtool -K
+option. For example::
+
+ ethtool –K [adapter] ntuple [off|on]
+
+If sideband is re-enabled after ATR is re-enabled, ATR remains enabled until a
+TCP-IP flow is added. When all TCP-IP sideband rules are deleted, ATR is
+automatically re-enabled.
+
+Packets that match the ATR rules are counted in fdir_atr_match stats in
+ethtool, which also can be used to verify whether ATR rules still exist.
+
+Sideband Perfect Filters
+------------------------
+Sideband Perfect Filters are used to direct traffic that matches specified
+characteristics. They are enabled through ethtool's ntuple interface. To add a
+new filter use the following command::
+
+ ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
+ dst-port <port> action <queue>
+
+Where:
+ <device> - the ethernet device to program
+ <type> - can be ip4, tcp4, udp4, or sctp4
+ <ip> - the ip address to match on
+ <port> - the port number to match on
+ <queue> - the queue to direct traffic towards (-1 discards matching traffic)
+
+Use the following command to display all of the active filters::
+
+ ethtool -u <device>
+
+Use the following command to delete a filter::
+
+ ethtool -U <device> delete <N>
+
+Where <N> is the filter id displayed when printing all the active filters, and
+may also have been specified using "loc <N>" when adding the filter.
+
+The following example matches TCP traffic sent from 192.168.0.1, port 5300,
+directed to 192.168.0.5, port 80, and sends it to queue 7::
+
+ ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
+ src-port 5300 dst-port 80 action 7
+
+For each flow-type, the programmed filters must all have the same matching
+input set. For example, issuing the following two commands is acceptable::
+
+ ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+ ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
+
+Issuing the next two commands, however, is not acceptable, since the first
+specifies src-ip and the second specifies dst-ip::
+
+ ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+ ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
+
+The second command will fail with an error. You may program multiple filters
+with the same fields, using different values, but, on one device, you may not
+program two tcp4 filters with different matching fields.
+
+Matching on a sub-portion of a field is not supported by the i40e driver, thus
+partial mask fields are not supported.
+
+The driver also supports matching user-defined data within the packet payload.
+This flexible data is specified using the "user-def" field of the ethtool
+command in the following way:
+
++----------------------------+--------------------------+
+| 31 28 24 20 16 | 15 12 8 4 0 |
++----------------------------+--------------------------+
+| offset into packet payload | 2 bytes of flexible data |
++----------------------------+--------------------------+
+
+For example,
+
+::
+
+ ... user-def 0x4FFFF ...
+
+tells the filter to look 4 bytes into the payload and match that value against
+0xFFFF. The offset is based on the beginning of the payload, and not the
+beginning of the packet. Thus
+
+::
+
+ flow-type tcp4 ... user-def 0x8BEAF ...
+
+would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the
+TCP/IPv4 payload.
+
+Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload.
+Thus to match the first byte of the payload, you must actually add 4 bytes to
+the offset. Also note that ip4 filters match both ICMP frames as well as raw
+(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 frame.
+
+The maximum offset is 64. The hardware will only read up to 64 bytes of data
+from the payload. The offset must be even because the flexible data is 2 bytes
+long and must be aligned to byte 0 of the packet payload.
+
+The user-defined flexible offset is also considered part of the input set and
+cannot be programmed separately for multiple filters of the same type. However,
+the flexible data is not part of the input set and multiple filters may use the
+same offset but match against different data.
+
+To create filters that direct traffic to a specific Virtual Function, use the
+"action" parameter. Specify the action as a 64 bit value, where the lower 32
+bits represents the queue number, while the next 8 bits represent which VF.
+Note that 0 is the PF, so the VF identifier is offset by 1. For example::
+
+ ... action 0x800000002 ...
+
+specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
+that VF.
+
+Note that these filters will not break internal routing rules, and will not
+route traffic that otherwise would not have been sent to the specified Virtual
+Function.
+
+Setting the link-down-on-close Private Flag
+-------------------------------------------
+When the link-down-on-close private flag is set to "on", the port's link will
+go down when the interface is brought down using the ifconfig ethX down command.
+
+Use ethtool to view and set link-down-on-close, as follows::
+
+ ethtool --show-priv-flags ethX
+ ethtool --set-priv-flags ethX link-down-on-close [on|off]
+
+Viewing Link Messages
+---------------------
+Link messages will not be displayed to the console if the distribution is
+restricting system messages. In order to see network driver link messages on
+your console, set dmesg to eight by entering the following::
+
+ dmesg -n 8
+
+NOTE: This setting is not saved across reboots.
+
+Jumbo Frames
+------------
+Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
+to a value larger than the default value of 1500.
+
+Use the ifconfig command to increase the MTU size. For example, enter the
+following where <x> is the interface number::
+
+ ifconfig eth<x> mtu 9000 up
+
+Alternatively, you can use the ip command as follows::
+
+ ip link set mtu 9000 dev eth<x>
+ ip link set up dev eth<x>
+
+This setting is not saved across reboots. The setting change can be made
+permanent by adding 'MTU=9000' to the file::
+
+ /etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL
+ /etc/sysconfig/network/<config_file> // for SLES
+
+NOTE: The maximum MTU setting for Jumbo Frames is 9702. This value coincides
+with the maximum Jumbo Frames size of 9728 bytes.
+
+NOTE: This driver will attempt to use multiple page sized buffers to receive
+each jumbo packet. This should help to avoid buffer starvation issues when
+allocating receive packets.
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+https://www.kernel.org/pub/software/network/ethtool/
+
+Supported ethtool Commands and Options for Filtering
+----------------------------------------------------
+-n --show-nfc
+ Retrieves the receive network flow classification configurations.
+
+rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6
+ Retrieves the hash options for the specified network traffic type.
+
+-N --config-nfc
+ Configures the receive network flow classification.
+
+rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r...
+ Configures the hash options for the specified network traffic type.
+
+udp4 UDP over IPv4
+udp6 UDP over IPv6
+
+f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
+n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
+
+Speed and Duplex Configuration
+------------------------------
+In addressing speed and duplex configuration issues, you need to distinguish
+between copper-based adapters and fiber-based adapters.
+
+In the default mode, an Intel(R) Ethernet Network Adapter using copper
+connections will attempt to auto-negotiate with its link partner to determine
+the best setting. If the adapter cannot establish link with the link partner
+using auto-negotiation, you may need to manually configure the adapter and link
+partner to identical settings to establish link and pass packets. This should
+only be needed when attempting to link with an older switch that does not
+support auto-negotiation or one that has been forced to a specific speed or
+duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
+and higher cannot be forced. Use the autonegotiation advertising setting to
+manually set devices for 1 Gbps and higher.
+
+NOTE: You cannot set the speed for devices based on the Intel(R) Ethernet
+Network Adapter XXV710 based devices.
+
+Speed, duplex, and autonegotiation advertising are configured through the
+ethtool utility.
+
+Caution: Only experienced network administrators should force speed and duplex
+or change autonegotiation advertising manually. The settings at the switch must
+always match the adapter settings. Adapter performance may suffer or your
+adapter may not operate if you configure the adapter differently from your
+switch.
+
+An Intel(R) Ethernet Network Adapter using fiber-based connections, however,
+will not attempt to auto-negotiate with its link partner since those adapters
+operate only in full duplex and only at their native speed.
+
+NAPI
+----
+NAPI (Rx polling mode) is supported in the i40e driver.
+For more information on NAPI, see
+https://wiki.linuxfoundation.org/networking/napi
+
+Flow Control
+------------
+Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
+receiving and transmitting pause frames for i40e. When transmit is enabled,
+pause frames are generated when the receive packet buffer crosses a predefined
+threshold. When receive is enabled, the transmit unit will halt for the time
+delay specified when a pause frame is received.
+
+NOTE: You must have a flow control capable link partner.
+
+Flow Control is on by default.
+
+Use ethtool to change the flow control settings.
+
+To enable or disable Rx or Tx Flow Control::
+
+ ethtool -A eth? rx <on|off> tx <on|off>
+
+Note: This command only enables or disables Flow Control if auto-negotiation is
+disabled. If auto-negotiation is enabled, this command changes the parameters
+used for auto-negotiation with the link partner.
+
+To enable or disable auto-negotiation::
+
+ ethtool -s eth? autoneg <on|off>
+
+Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
+on your device, you may not be able to change the auto-negotiation setting.
+
+RSS Hash Flow
+-------------
+Allows you to set the hash bytes per flow type and any combination of one or
+more options for Receive Side Scaling (RSS) hash byte configuration.
+
+::
+
+ # ethtool -N <dev> rx-flow-hash <type> <option>
+
+Where <type> is:
+ tcp4 signifying TCP over IPv4
+ udp4 signifying UDP over IPv4
+ tcp6 signifying TCP over IPv6
+ udp6 signifying UDP over IPv6
+And <option> is one or more of:
+ s Hash on the IP source address of the Rx packet.
+ d Hash on the IP destination address of the Rx packet.
+ f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
+ n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
+
+MAC and VLAN anti-spoofing feature
+----------------------------------
+When a malicious driver attempts to send a spoofed packet, it is dropped by the
+hardware and not transmitted.
+NOTE: This feature can be disabled for a specific Virtual Function (VF)::
+
+ ip link set <pf dev> vf <vf id> spoofchk {off|on}
+
+IEEE 1588 Precision Time Protocol (PTP) Hardware Clock (PHC)
+------------------------------------------------------------
+Precision Time Protocol (PTP) is used to synchronize clocks in a computer
+network. PTP support varies among Intel devices that support this driver. Use
+"ethtool -T <netdev name>" to get a definitive list of PTP capabilities
+supported by the device.
+
+IEEE 802.1ad (QinQ) Support
+---------------------------
+The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
+IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
+"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
+allow L2 tunneling and the ability to segregate traffic within a particular
+VLAN ID, among other uses.
+
+The following are examples of how to configure 802.1ad (QinQ)::
+
+ ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
+ ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
+
+Where "24" and "371" are example VLAN IDs.
+
+NOTES:
+ Receive checksum offloads, cloud filters, and VLAN acceleration are not
+ supported for 802.1ad (QinQ) packets.
+
+VXLAN and GENEVE Overlay HW Offloading
+--------------------------------------
+Virtual Extensible LAN (VXLAN) allows you to extend an L2 network over an L3
+network, which may be useful in a virtualized or cloud environment. Some
+Intel(R) Ethernet Network devices perform VXLAN processing, offloading it from
+the operating system. This reduces CPU utilization.
+
+VXLAN offloading is controlled by the Tx and Rx checksum offload options
+provided by ethtool. That is, if Tx checksum offload is enabled, and the
+adapter has the capability, VXLAN offloading is also enabled.
+
+Support for VXLAN and GENEVE HW offloading is dependent on kernel support of
+the HW offloading features.
+
+Multiple Functions per Port
+---------------------------
+Some adapters based on the Intel Ethernet Controller X710/XL710 support
+multiple functions on a single physical port. Configure these functions through
+the System Setup/BIOS.
+
+Minimum TX Bandwidth is the guaranteed minimum data transmission bandwidth, as
+a percentage of the full physical port link speed, that the partition will
+receive. The bandwidth the partition is awarded will never fall below the level
+you specify.
+
+The range for the minimum bandwidth values is:
+1 to ((100 minus # of partitions on the physical port) plus 1)
+For example, if a physical port has 4 partitions, the range would be:
+1 to ((100 - 4) + 1 = 97)
+
+The Maximum Bandwidth percentage represents the maximum transmit bandwidth
+allocated to the partition as a percentage of the full physical port link
+speed. The accepted range of values is 1-100. The value is used as a limiter,
+should you chose that any one particular function not be able to consume 100%
+of a port's bandwidth (should it be available). The sum of all the values for
+Maximum Bandwidth is not restricted, because no more than 100% of a port's
+bandwidth can ever be used.
+
+NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions
+per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says
+"add vsi failed for VF N, aq_err 16". To workaround the issue, enable less than
+64 virtual functions (VFs).
+
+Data Center Bridging (DCB)
+--------------------------
+DCB is a configuration Quality of Service implementation in hardware. It uses
+the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
+different priorities that traffic can be filtered into. It also enables
+priority flow control (802.1Qbb) which can limit or eliminate the number of
+dropped packets during network stress. Bandwidth can be allocated to each of
+these priorities, which is enforced at the hardware level (802.1Qaz).
+
+Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
+802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
+and can accept settings from a DCBX capable peer. Software configuration of
+DCBX parameters via dcbtool/lldptool are not supported.
+
+NOTE: Firmware LLDP can be disabled by setting the private flag disable-fw-lldp.
+
+The i40e driver implements the DCB netlink interface layer to allow user-space
+to communicate with the driver and query DCB configuration for the port.
+
+NOTE:
+The kernel assumes that TC0 is available, and will disable Priority Flow
+Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
+enabled when setting up DCB on your switch.
+
+Interrupt Rate Limiting
+-----------------------
+:Valid Range: 0-235 (0=no limit)
+
+The Intel(R) Ethernet Controller XL710 family supports an interrupt rate
+limiting mechanism. The user can control, via ethtool, the number of
+microseconds between interrupts.
+
+Syntax::
+
+ # ethtool -C ethX rx-usecs-high N
+
+The range of 0-235 microseconds provides an effective range of 4,310 to 250,000
+interrupts per second. The value of rx-usecs-high can be set independently of
+rx-usecs and tx-usecs in the same ethtool command, and is also independent of
+the adaptive interrupt moderation algorithm. The underlying hardware supports
+granularity in 4-microsecond intervals, so adjacent values may result in the
+same interrupt rate.
+
+One possible use case is the following::
+
+ # ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs-high 20 rx-usecs \
+ 5 tx-usecs 5
+
+The above command would disable adaptive interrupt moderation, and allow a
+maximum of 5 microseconds before indicating a receive or transmit was complete.
+However, instead of resulting in as many as 200,000 interrupts per second, it
+limits total interrupts per second to 50,000 via the rx-usecs-high parameter.
+
+Performance Optimization
+========================
+Driver defaults are meant to fit a wide variety of workloads, but if further
+optimization is required we recommend experimenting with the following settings.
+
+NOTE: For better performance when processing small (64B) frame sizes, try
+enabling Hyper threading in the BIOS in order to increase the number of logical
+cores in the system and subsequently increase the number of queues available to
+the adapter.
+
+Virtualized Environments
+------------------------
+1. Disable XPS on both ends by using the included virt_perf_default script
+or by running the following command as root::
+
+ for file in `ls /sys/class/net/<ethX>/queues/tx-*/xps_cpus`;
+ do echo 0 > $file; done
+
+2. Using the appropriate mechanism (vcpupin) in the vm, pin the cpu's to
+individual lcpu's, making sure to use a set of cpu's included in the
+device's local_cpulist: /sys/class/net/<ethX>/device/local_cpulist.
+
+3. Configure as many Rx/Tx queues in the VM as available. Do not rely on
+the default setting of 1.
+
+
+Non-virtualized Environments
+----------------------------
+Pin the adapter's IRQs to specific cores by disabling the irqbalance service
+and using the included set_irq_affinity script. Please see the script's help
+text for further options.
+
+- The following settings will distribute the IRQs across all the cores evenly::
+
+ # scripts/set_irq_affinity -x all <interface1> , [ <interface2>, ... ]
+
+- The following settings will distribute the IRQs across all the cores that are
+ local to the adapter (same NUMA node)::
+
+ # scripts/set_irq_affinity -x local <interface1> ,[ <interface2>, ... ]
+
+For very CPU intensive workloads, we recommend pinning the IRQs to all cores.
+
+For IP Forwarding: Disable Adaptive ITR and lower Rx and Tx interrupts per
+queue using ethtool.
+
+- Setting rx-usecs and tx-usecs to 125 will limit interrupts to about 8000
+ interrupts per second per queue.
+
+::
+
+ # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 125 \
+ tx-usecs 125
+
+For lower CPU utilization: Disable Adaptive ITR and lower Rx and Tx interrupts
+per queue using ethtool.
+
+- Setting rx-usecs and tx-usecs to 250 will limit interrupts to about 4000
+ interrupts per second per queue.
+
+::
+
+ # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 250 \
+ tx-usecs 250
+
+For lower latency: Disable Adaptive ITR and ITR by setting Rx and Tx to 0 using
+ethtool.
+
+::
+
+ # ethtool -C <interface> adaptive-rx off adaptive-tx off rx-usecs 0 \
+ tx-usecs 0
+
+Application Device Queues (ADq)
+-------------------------------
+Application Device Queues (ADq) allows you to dedicate one or more queues to a
+specific application. This can reduce latency for the specified application,
+and allow Tx traffic to be rate limited per application. Follow the steps below
+to set ADq.
+
+1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
+The shaper bw_rlimit parameter is optional.
+
+Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
+to 1Gbit for tc0 and 3Gbit for tc1.
+
+::
+
+ # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
+ queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
+ max_rate 1Gbit 3Gbit
+
+map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
+sets priorities 0-3 to use tc0 and 4-7 to use tc1)
+
+queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
+16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
+number of queues for all tcs is 64 or number of cores, whichever is lower.)
+
+hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new new hardware
+offload mode in mqprio that makes full use of the mqprio options, the
+TCs, the queue configurations, and the QoS parameters.
+
+shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
+Totals must be equal or less than port speed.
+
+For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network
+monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
+
+2. Enable HW TC offload on interface::
+
+ # ethtool -K <interface> hw-tc-offload on
+
+3. Apply TCs to ingress (RX) flow of interface::
+
+ # tc qdisc add dev <interface> ingress
+
+NOTES:
+ - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
+ - ADq is not compatible with cloud filters.
+ - Setting up channels via ethtool (ethtool -L) is not supported when the
+ TCs are configured using mqprio.
+ - You must have iproute2 latest version
+ - NVM version 6.01 or later is required.
+ - ADq cannot be enabled when any the following features are enabled: Data
+ Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband
+ Filters.
+ - If another driver (for example, DPDK) has set cloud filters, you cannot
+ enable ADq.
+ - Tunnel filters are not supported in ADq. If encapsulated packets do
+ arrive in non-tunnel mode, filtering will be done on the inner headers.
+ For example, for VXLAN traffic in non-tunnel mode, PCTYPE is identified
+ as a VXLAN encapsulated packet, outer headers are ignored. Therefore,
+ inner headers are matched.
+ - If a TC filter on a PF matches traffic over a VF (on the PF), that
+ traffic will be routed to the appropriate queue of the PF, and will
+ not be passed on the VF. Such traffic will end up getting dropped higher
+ up in the TCP/IP stack as it does not match PF address data.
+ - If traffic matches multiple TC filters that point to different TCs,
+ that traffic will be duplicated and sent to all matching TC queues.
+ The hardware switch mirrors the packet to a VSI list when multiple
+ filters are matched.
+
+
+Known Issues/Troubleshooting
+============================
+
+NOTE: 1 Gb devices based on the Intel(R) Ethernet Network Connection X722 do
+not support the following features:
+
+ * Data Center Bridging (DCB)
+ * QOS
+ * VMQ
+ * SR-IOV
+ * Task Encapsulation offload (VXLAN, NVGRE)
+ * Energy Efficient Ethernet (EEE)
+ * Auto-media detect
+
+Unexpected Issues when the device driver and DPDK share a device
+----------------------------------------------------------------
+Unexpected issues may result when an i40e device is in multi driver mode and
+the kernel driver and DPDK driver are sharing the device. This is because
+access to the global NIC resources is not synchronized between multiple
+drivers. Any change to the global NIC configuration (writing to a global
+register, setting global configuration by AQ, or changing switch modes) will
+affect all ports and drivers on the device. Loading DPDK with the
+"multi-driver" module parameter may mitigate some of the issues.
+
+TC0 must be enabled when setting up DCB on a switch
+---------------------------------------------------
+The kernel assumes that TC0 is available, and will disable Priority Flow
+Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
+enabled when setting up DCB on your switch.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/device_drivers/intel/iavf.rst b/Documentation/networking/device_drivers/intel/iavf.rst
new file mode 100644
index 0000000..84ac7e7
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/iavf.rst
@@ -0,0 +1,331 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=================================================================
+Linux Base Driver for Intel(R) Ethernet Adaptive Virtual Function
+=================================================================
+
+Intel Ethernet Adaptive Virtual Function Linux driver.
+Copyright(c) 2013-2018 Intel Corporation.
+
+Contents
+========
+
+- Overview
+- Identifying Your Adapter
+- Additional Configurations
+- Known Issues/Troubleshooting
+- Support
+
+Overview
+========
+
+This file describes the iavf Linux Base Driver. This driver was formerly
+called i40evf.
+
+The iavf driver supports the below mentioned virtual function devices and
+can only be activated on kernels running the i40e or newer Physical Function
+(PF) driver compiled with CONFIG_PCI_IOV. The iavf driver requires
+CONFIG_PCI_MSI to be enabled.
+
+The guest OS loading the iavf driver must support MSI-X interrupts.
+
+Identifying Your Adapter
+========================
+
+The driver in this kernel is compatible with devices based on the following:
+ * Intel(R) XL710 X710 Virtual Function
+ * Intel(R) X722 Virtual Function
+ * Intel(R) XXV710 Virtual Function
+ * Intel(R) Ethernet Adaptive Virtual Function
+
+For the best performance, make sure the latest NVM/FW is installed on your
+device.
+
+For information on how to identify your adapter, and for the latest NVM/FW
+images and Intel network drivers, refer to the Intel Support website:
+http://www.intel.com/support
+
+
+Additional Features and Configurations
+======================================
+
+Viewing Link Messages
+---------------------
+Link messages will not be displayed to the console if the distribution is
+restricting system messages. In order to see network driver link messages on
+your console, set dmesg to eight by entering the following::
+
+ # dmesg -n 8
+
+NOTE:
+ This setting is not saved across reboots.
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+https://www.kernel.org/pub/software/network/ethtool/
+
+Setting VLAN Tag Stripping
+--------------------------
+If you have applications that require Virtual Functions (VFs) to receive
+packets with VLAN tags, you can disable VLAN tag stripping for the VF. The
+Physical Function (PF) processes requests issued from the VF to enable or
+disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF,
+then requests from that VF to set VLAN tag stripping will be ignored.
+
+To enable/disable VLAN tag stripping for a VF, issue the following command
+from inside the VM in which you are running the VF::
+
+ # ethtool -K <if_name> rxvlan on/off
+
+or alternatively::
+
+ # ethtool --offload <if_name> rxvlan on/off
+
+Adaptive Virtual Function
+-------------------------
+Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to
+adapt to changing feature sets of the physical function driver (PF) with which
+it is associated. This allows system administrators to update a PF without
+having to update all the VFs associated with it. All AVFs have a single common
+device ID and branding string.
+
+AVFs have a minimum set of features known as "base mode," but may provide
+additional features depending on what features are available in the PF with
+which the AVF is associated. The following are base mode features:
+
+- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
+ for Tx/Rx
+- i40e descriptors and ring format
+- Descriptor write-back completion
+- 1 control queue, with i40e descriptors, CSRs and ring format
+- 5 MSI-X interrupt vectors and corresponding i40e CSRs
+- 1 Interrupt Throttle Rate (ITR) index
+- 1 Virtual Station Interface (VSI) per VF
+- 1 Traffic Class (TC), TC0
+- Receive Side Scaling (RSS) with 64 entry indirection table and key,
+ configured through the PF
+- 1 unicast MAC address reserved per VF
+- 16 MAC address filters for each VF
+- Stateless offloads - non-tunneled checksums
+- AVF device ID
+- HW mailbox is used for VF to PF communications (including on Windows)
+
+IEEE 802.1ad (QinQ) Support
+---------------------------
+The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
+IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
+"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
+allow L2 tunneling and the ability to segregate traffic within a particular
+VLAN ID, among other uses.
+
+The following are examples of how to configure 802.1ad (QinQ)::
+
+ # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
+ # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
+
+Where "24" and "371" are example VLAN IDs.
+
+NOTES:
+ Receive checksum offloads, cloud filters, and VLAN acceleration are not
+ supported for 802.1ad (QinQ) packets.
+
+Application Device Queues (ADq)
+-------------------------------
+Application Device Queues (ADq) allows you to dedicate one or more queues to a
+specific application. This can reduce latency for the specified application,
+and allow Tx traffic to be rate limited per application. Follow the steps below
+to set ADq.
+
+Requirements:
+
+- The sch_mqprio, act_mirred and cls_flower modules must be loaded
+- The latest version of iproute2
+- If another driver (for example, DPDK) has set cloud filters, you cannot
+ enable ADQ
+- Depending on the underlying PF device, ADQ cannot be enabled when the
+ following features are enabled:
+
+ + Data Center Bridging (DCB)
+ + Multiple Functions per Port (MFP)
+ + Sideband Filters
+
+1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
+The shaper bw_rlimit parameter is optional.
+
+Example: Sets up two tcs, tc0 and tc1, with 16 queues each and max tx rate set
+to 1Gbit for tc0 and 3Gbit for tc1.
+
+::
+
+ tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
+ queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
+ max_rate 1Gbit 3Gbit
+
+map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
+sets priorities 0-3 to use tc0 and 4-7 to use tc1)
+
+queues: for each tc, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
+16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. Max total
+number of queues for all tcs is 64 or number of cores, whichever is lower.)
+
+hw 1 mode channel: ‘channel’ with ‘hw’ set to 1 is a new new hardware
+offload mode in mqprio that makes full use of the mqprio options, the
+TCs, the queue configurations, and the QoS parameters.
+
+shaper bw_rlimit: for each tc, sets minimum and maximum bandwidth rates.
+Totals must be equal or less than port speed.
+
+For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network
+monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
+
+NOTE:
+ Setting up channels via ethtool (ethtool -L) is not supported when the
+ TCs are configured using mqprio.
+
+2. Enable HW TC offload on interface::
+
+ # ethtool -K <interface> hw-tc-offload on
+
+3. Apply TCs to ingress (RX) flow of interface::
+
+ # tc qdisc add dev <interface> ingress
+
+NOTES:
+ - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
+ - ADq is not compatible with cloud filters
+ - Setting up channels via ethtool (ethtool -L) is not supported when the TCs
+ are configured using mqprio
+ - You must have iproute2 latest version
+ - NVM version 6.01 or later is required
+ - ADq cannot be enabled when any the following features are enabled: Data
+ Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters
+ - If another driver (for example, DPDK) has set cloud filters, you cannot
+ enable ADq
+ - Tunnel filters are not supported in ADq. If encapsulated packets do arrive
+ in non-tunnel mode, filtering will be done on the inner headers. For example,
+ for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
+ encapsulated packet, outer headers are ignored. Therefore, inner headers are
+ matched.
+ - If a TC filter on a PF matches traffic over a VF (on the PF), that traffic
+ will be routed to the appropriate queue of the PF, and will not be passed on
+ the VF. Such traffic will end up getting dropped higher up in the TCP/IP
+ stack as it does not match PF address data.
+ - If traffic matches multiple TC filters that point to different TCs, that
+ traffic will be duplicated and sent to all matching TC queues. The hardware
+ switch mirrors the packet to a VSI list when multiple filters are matched.
+
+
+Known Issues/Troubleshooting
+============================
+
+Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
+---------------------------------------------------------------------------------
+If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
+series based device, the VF slaves may fail when they become the active slave.
+If the MAC address of the VF is set by the PF (Physical Function) of the
+device, when you add a slave, or change the active-backup slave, Linux bonding
+tries to sync the backup slave's MAC address to the same MAC address as the
+active slave. Linux bonding will fail at this point. This issue will not occur
+if the VF's MAC address is not set by the PF.
+
+Traffic Is Not Being Passed Between VM and Client
+-------------------------------------------------
+You may not be able to pass traffic between a client system and a
+Virtual Machine (VM) running on a separate host if the Virtual Function
+(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled
+on the VF. Note that this situation can occur in any combination of client,
+host, and guest operating system. For information on how to set the VF to
+trusted mode, refer to the section "VLAN Tag Packet Steering" in this
+readme document. For information on setting spoof checking, refer to the
+section "MAC and VLAN anti-spoofing feature" in this readme document.
+
+Do not unload port driver if VF with active VM is bound to it
+-------------------------------------------------------------
+Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
+Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
+Once the VM shuts down, or otherwise releases the VF, the command will complete.
+
+Using four traffic classes fails
+--------------------------------
+Do not try to reserve more than three traffic classes in the iavf driver. Doing
+so will fail to set any traffic classes and will cause the driver to write
+errors to stdout. Use a maximum of three queues to avoid this issue.
+
+Multiple log error messages on iavf driver removal
+--------------------------------------------------
+If you have several VFs and you remove the iavf driver, several instances of
+the following log errors are written to the log::
+
+ Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
+ Unable to send the message to VF 2 aq_err 12
+ ARQ Overflow Error detected
+
+Virtual machine does not get link
+---------------------------------
+If the virtual machine has more than one virtual port assigned to it, and those
+virtual ports are bound to different physical ports, you may not get link on
+all of the virtual ports. The following command may work around the issue::
+
+ # ethtool -r <PF>
+
+Where <PF> is the PF interface in the host, for example: p5p1. You may need to
+run the command more than once to get link on all virtual ports.
+
+MAC address of Virtual Function changes unexpectedly
+----------------------------------------------------
+If a Virtual Function's MAC address is not assigned in the host, then the VF
+(virtual function) driver will use a random MAC address. This random MAC
+address may change each time the VF driver is reloaded. You can assign a static
+MAC address in the host machine. This static MAC address will survive
+a VF driver reload.
+
+Driver Buffer Overflow Fix
+--------------------------
+The fix to resolve CVE-2016-8105, referenced in Intel SA-00069
+https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html
+is included in this and future versions of the driver.
+
+Multiple Interfaces on Same Ethernet Broadcast Network
+------------------------------------------------------
+Due to the default ARP behavior on Linux, it is not possible to have one system
+on two IP networks in the same Ethernet broadcast domain (non-partitioned
+switch) behave as expected. All Ethernet interfaces will respond to IP traffic
+for any IP address assigned to the system. This results in unbalanced receive
+traffic.
+
+If you have multiple interfaces in a server, either turn on ARP filtering by
+entering::
+
+ # echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+
+NOTE:
+ This setting is not saved across reboots. The configuration change can be
+ made permanent by adding the following line to the file /etc/sysctl.conf::
+
+ net.ipv4.conf.all.arp_filter = 1
+
+Another alternative is to install the interfaces in separate broadcast domains
+(either in different switches or in a switch partitioned to VLANs).
+
+Rx Page Allocation Errors
+-------------------------
+'Page allocation failure. order:0' errors may occur under stress.
+This is caused by the way the Linux kernel reports this stressed condition.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://support.intel.com
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on the supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/device_drivers/intel/ice.rst b/Documentation/networking/device_drivers/intel/ice.rst
new file mode 100644
index 0000000..ee43ea5
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/ice.rst
@@ -0,0 +1,46 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==================================================================
+Linux Base Driver for the Intel(R) Ethernet Connection E800 Series
+==================================================================
+
+Intel ice Linux driver.
+Copyright(c) 2018 Intel Corporation.
+
+Contents
+========
+
+- Enabling the driver
+- Support
+
+The driver in this release supports Intel's E800 Series of products. For
+more information, visit Intel's support page at https://support.intel.com.
+
+Enabling the driver
+===================
+The driver is enabled via the standard kernel configuration system,
+using the make command::
+
+ make oldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+ -> Device Drivers
+ -> Network device support (NETDEVICES [=y])
+ -> Ethernet driver support
+ -> Intel devices
+ -> Intel(R) Ethernet Connection E800 Series Support
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/device_drivers/intel/igb.rst b/Documentation/networking/device_drivers/intel/igb.rst
new file mode 100644
index 0000000..87e560f
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/igb.rst
@@ -0,0 +1,213 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==========================================================
+Linux Base Driver for Intel(R) Ethernet Network Connection
+==========================================================
+
+Intel Gigabit Linux driver.
+Copyright(c) 1999-2018 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Command Line Parameters
+- Additional Configurations
+- Support
+
+
+Identifying Your Adapter
+========================
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+http://www.intel.com/support
+
+
+Command Line Parameters
+========================
+If the driver is built as a module, the following optional parameters are used
+by entering them on the command line with the modprobe command using this
+syntax::
+
+ modprobe igb [<option>=<VAL1>,<VAL2>,...]
+
+There needs to be a <VAL#> for each network port in the system supported by
+this driver. The values will be applied to each instance, in function order.
+For example::
+
+ modprobe igb max_vfs=2,4
+
+In this case, there are two network ports supported by igb in the system.
+
+NOTE: A descriptor describes a data buffer and attributes related to the data
+buffer. This information is accessed by the hardware.
+
+max_vfs
+-------
+:Valid Range: 0-7
+
+This parameter adds support for SR-IOV. It causes the driver to spawn up to
+max_vfs worth of virtual functions. If the value is greater than 0 it will
+also force the VMDq parameter to be 1 or more.
+
+The parameters for the driver are referenced by position. Thus, if you have a
+dual port adapter, or more than one adapter in your system, and want N virtual
+functions per port, you must specify a number for each port with each parameter
+separated by a comma. For example::
+
+ modprobe igb max_vfs=4
+
+This will spawn 4 VFs on the first port.
+
+::
+
+ modprobe igb max_vfs=2,4
+
+This will spawn 2 VFs on the first port and 4 VFs on the second port.
+
+NOTE: Caution must be used in loading the driver with these parameters.
+Depending on your system configuration, number of slots, etc., it is impossible
+to predict in all cases where the positions would be on the command line.
+
+NOTE: Neither the device nor the driver control how VFs are mapped into config
+space. Bus layout will vary by operating system. On operating systems that
+support it, you can check sysfs to find the mapping.
+
+NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
+and VLAN tag stripping/insertion will remain enabled. Please remove the old
+VLAN filter before the new VLAN filter is added. For example::
+
+ ip link set eth0 vf 0 vlan 100 // set vlan 100 for VF 0
+ ip link set eth0 vf 0 vlan 0 // Delete vlan 100
+ ip link set eth0 vf 0 vlan 200 // set a new vlan 200 for VF 0
+
+Debug
+-----
+:Valid Range: 0-16 (0=none,...,16=all)
+:Default Value: 0
+
+This parameter adjusts the level debug messages displayed in the system logs.
+
+
+Additional Features and Configurations
+======================================
+
+Jumbo Frames
+------------
+Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
+to a value larger than the default value of 1500.
+
+Use the ifconfig command to increase the MTU size. For example, enter the
+following where <x> is the interface number::
+
+ ifconfig eth<x> mtu 9000 up
+
+Alternatively, you can use the ip command as follows::
+
+ ip link set mtu 9000 dev eth<x>
+ ip link set up dev eth<x>
+
+This setting is not saved across reboots. The setting change can be made
+permanent by adding 'MTU=9000' to the file:
+
+- For RHEL: /etc/sysconfig/network-scripts/ifcfg-eth<x>
+- For SLES: /etc/sysconfig/network/<config_file>
+
+NOTE: The maximum MTU setting for Jumbo Frames is 9216. This value coincides
+with the maximum Jumbo Frames size of 9234 bytes.
+
+NOTE: Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
+poor performance or loss of link.
+
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+
+https://www.kernel.org/pub/software/network/ethtool/
+
+
+Enabling Wake on LAN (WoL)
+--------------------------
+WoL is configured through the ethtool utility.
+
+WoL will be enabled on the system during the next shut down or reboot. For
+this driver version, in order to enable WoL, the igb driver must be loaded
+prior to shutting down or suspending the system.
+
+NOTE: Wake on LAN is only supported on port A of multi-port devices. Also
+Wake On LAN is not supported for the following device:
+- Intel(R) Gigabit VT Quad Port Server Adapter
+
+
+Multiqueue
+----------
+In this mode, a separate MSI-X vector is allocated for each queue and one for
+"other" interrupts such as link status change and errors. All interrupts are
+throttled via interrupt moderation. Interrupt moderation must be used to avoid
+interrupt storms while the driver is processing one interrupt. The moderation
+value should be at least as large as the expected time for the driver to
+process an interrupt. Multiqueue is off by default.
+
+REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not found,
+the system will fallback to MSI or to Legacy interrupts. This driver supports
+receive multiqueue on all kernels that support MSI-X.
+
+NOTE: On some kernels a reboot is required to switch between single queue mode
+and multiqueue mode or vice-versa.
+
+
+MAC and VLAN anti-spoofing feature
+----------------------------------
+When a malicious driver attempts to send a spoofed packet, it is dropped by the
+hardware and not transmitted.
+
+An interrupt is sent to the PF driver notifying it of the spoof attempt. When a
+spoofed packet is detected, the PF driver will send the following message to
+the system log (displayed by the "dmesg" command):
+Spoof event(s) detected on VF(n), where n = the VF that attempted to do the
+spoofing
+
+
+Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool
+------------------------------------------------------------
+You can set a MAC address of a Virtual Function (VF), a default VLAN and the
+rate limit using the IProute2 tool. Download the latest version of the
+IProute2 tool from Sourceforge if your version does not have all the features
+you require.
+
+Credit Based Shaper (Qav Mode)
+------------------------------
+When enabling the CBS qdisc in the hardware offload mode, traffic shaping using
+the CBS (described in the IEEE 802.1Q-2018 Section 8.6.8.2 and discussed in the
+Annex L) algorithm will run in the i210 controller, so it's more accurate and
+uses less CPU.
+
+When using offloaded CBS, and the traffic rate obeys the configured rate
+(doesn't go above it), CBS should have little to no effect in the latency.
+
+The offloaded version of the algorithm has some limits, caused by how the idle
+slope is expressed in the adapter's registers. It can only represent idle slopes
+in 16.38431 kbps units, which means that if a idle slope of 2576kbps is
+requested, the controller will be configured to use a idle slope of ~2589 kbps,
+because the driver rounds the value up. For more details, see the comments on
+:c:func:`igb_config_tx_modes()`.
+
+NOTE: This feature is exclusive to i210 models.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/device_drivers/intel/igbvf.rst b/Documentation/networking/device_drivers/intel/igbvf.rst
new file mode 100644
index 0000000..557fc02
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/igbvf.rst
@@ -0,0 +1,65 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+===========================================================
+Linux Base Virtual Function Driver for Intel(R) 1G Ethernet
+===========================================================
+
+Intel Gigabit Virtual Function Linux driver.
+Copyright(c) 1999-2018 Intel Corporation.
+
+Contents
+========
+- Identifying Your Adapter
+- Additional Configurations
+- Support
+
+This driver supports Intel 82576-based virtual function devices-based virtual
+function devices that can only be activated on kernels that support SR-IOV.
+
+SR-IOV requires the correct platform and OS support.
+
+The guest OS loading this driver must support MSI-X interrupts.
+
+For questions related to hardware requirements, refer to the documentation
+supplied with your Intel adapter. All hardware requirements listed apply to use
+with Linux.
+
+Driver information can be obtained using ethtool, lspci, and ifconfig.
+Instructions on updating ethtool can be found in the section Additional
+Configurations later in this document.
+
+NOTE: There is a limit of a total of 32 shared VLANs to 1 or more VFs.
+
+
+Identifying Your Adapter
+========================
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+http://www.intel.com/support
+
+
+Additional Features and Configurations
+======================================
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+
+https://www.kernel.org/pub/software/network/ethtool/
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/README.ipw2100 b/Documentation/networking/device_drivers/intel/ipw2100.txt
similarity index 100%
rename from Documentation/networking/README.ipw2100
rename to Documentation/networking/device_drivers/intel/ipw2100.txt
diff --git a/Documentation/networking/README.ipw2200 b/Documentation/networking/device_drivers/intel/ipw2200.txt
similarity index 100%
rename from Documentation/networking/README.ipw2200
rename to Documentation/networking/device_drivers/intel/ipw2200.txt
diff --git a/Documentation/networking/device_drivers/intel/ixgb.rst b/Documentation/networking/device_drivers/intel/ixgb.rst
new file mode 100644
index 0000000..9450182
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/ixgb.rst
@@ -0,0 +1,468 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=====================================================================
+Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
+=====================================================================
+
+October 1, 2018
+
+
+Contents
+========
+
+- In This Release
+- Identifying Your Adapter
+- Command Line Parameters
+- Improving Performance
+- Additional Configurations
+- Known Issues/Troubleshooting
+- Support
+
+
+
+In This Release
+===============
+
+This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R)
+Network Connection. This driver includes support for Itanium(R)2-based
+systems.
+
+For questions related to hardware requirements, refer to the documentation
+supplied with your 10 Gigabit adapter. All hardware requirements listed apply
+to use with Linux.
+
+The following features are available in this kernel:
+ - Native VLANs
+ - Channel Bonding (teaming)
+ - SNMP
+
+Channel Bonding documentation can be found in the Linux kernel source:
+/Documentation/networking/bonding.txt
+
+The driver information previously displayed in the /proc filesystem is not
+supported in this release. Alternatively, you can use ethtool (version 1.6
+or later), lspci, and iproute2 to obtain the same information.
+
+Instructions on updating ethtool can be found in the section "Additional
+Configurations" later in this document.
+
+
+Identifying Your Adapter
+========================
+
+The following Intel network adapters are compatible with the drivers in this
+release:
+
++------------+------------------------------+----------------------------------+
+| Controller | Adapter Name | Physical Layer |
++============+==============================+==================================+
+| 82597EX | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber) |
+| | Server Adapters | - 10G Base-SR (fiber) |
+| | | - 10G Base-CX4 (copper) |
++------------+------------------------------+----------------------------------+
+
+For more information on how to identify your adapter, go to the Adapter &
+Driver ID Guide at:
+
+ https://support.intel.com
+
+
+Command Line Parameters
+=======================
+
+If the driver is built as a module, the following optional parameters are
+used by entering them on the command line with the modprobe command using
+this syntax::
+
+ modprobe ixgb [<option>=<VAL1>,<VAL2>,...]
+
+For example, with two 10GbE PCI adapters, entering::
+
+ modprobe ixgb TxDescriptors=80,128
+
+loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
+resources for the second adapter.
+
+The default value for each parameter is generally the recommended setting,
+unless otherwise noted.
+
+Copybreak
+---------
+:Valid Range: 0-XXXX
+:Default Value: 256
+
+ This is the maximum size of packet that is copied to a new buffer on
+ receive.
+
+Debug
+-----
+:Valid Range: 0-16 (0=none,...,16=all)
+:Default Value: 0
+
+ This parameter adjusts the level of debug messages displayed in the
+ system logs.
+
+FlowControl
+-----------
+:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
+:Default Value: 1 if no EEPROM, otherwise read from EEPROM
+
+ This parameter controls the automatic generation(Tx) and response(Rx) to
+ Ethernet PAUSE frames. There are hardware bugs associated with enabling
+ Tx flow control so beware.
+
+RxDescriptors
+-------------
+:Valid Range: 64-4096
+:Default Value: 1024
+
+ This value is the number of receive descriptors allocated by the driver.
+ Increasing this value allows the driver to buffer more incoming packets.
+ Each descriptor is 16 bytes. A receive buffer is also allocated for
+ each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
+ depending on the MTU setting. When the MTU size is 1500 or less, the
+ receive buffer size is 2048 bytes. When the MTU is greater than 1500 the
+ receive buffer size will be either 4056, 8192, or 16384 bytes. The
+ maximum MTU size is 16114.
+
+TxDescriptors
+-------------
+:Valid Range: 64-4096
+:Default Value: 256
+
+ This value is the number of transmit descriptors allocated by the driver.
+ Increasing this value allows the driver to queue more transmits. Each
+ descriptor is 16 bytes.
+
+RxIntDelay
+----------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 72
+
+ This value delays the generation of receive interrupts in units of
+ 0.8192 microseconds. Receive interrupt reduction can improve CPU
+ efficiency if properly tuned for specific network traffic. Increasing
+ this value adds extra latency to frame reception and can end up
+ decreasing the throughput of TCP traffic. If the system is reporting
+ dropped receives, this value may be set too high, causing the driver to
+ run out of available receive descriptors.
+
+TxIntDelay
+----------
+:Valid Range: 0-65535 (0=off)
+:Default Value: 32
+
+ This value delays the generation of transmit interrupts in units of
+ 0.8192 microseconds. Transmit interrupt reduction can improve CPU
+ efficiency if properly tuned for specific network traffic. Increasing
+ this value adds extra latency to frame transmission and can end up
+ decreasing the throughput of TCP traffic. If this value is set too high,
+ it will cause the driver to run out of available transmit descriptors.
+
+XsumRX
+------
+:Valid Range: 0-1
+:Default Value: 1
+
+ A value of '1' indicates that the driver should enable IP checksum
+ offload for received packets (both UDP and TCP) to the adapter hardware.
+
+RxFCHighThresh
+--------------
+:Valid Range: 1,536-262,136 (0x600 - 0x3FFF8, 8 byte granularity)
+:Default Value: 196,608 (0x30000)
+
+ Receive Flow control high threshold (when we send a pause frame)
+
+RxFCLowThresh
+-------------
+:Valid Range: 64-262,136 (0x40 - 0x3FFF8, 8 byte granularity)
+:Default Value: 163,840 (0x28000)
+
+ Receive Flow control low threshold (when we send a resume frame)
+
+FCReqTimeout
+------------
+:Valid Range: 1-65535
+:Default Value: 65535
+
+ Flow control request timeout (how long to pause the link partner's tx)
+
+IntDelayEnable
+--------------
+:Value Range: 0,1
+:Default Value: 1
+
+ Interrupt Delay, 0 disables transmit interrupt delay and 1 enables it.
+
+
+Improving Performance
+=====================
+
+With the 10 Gigabit server adapters, the default Linux configuration will
+very likely limit the total available throughput artificially. There is a set
+of configuration changes that, when applied together, will increase the ability
+of Linux to transmit and receive data. The following enhancements were
+originally acquired from settings published at http://www.spec.org/web99/ for
+various submitted results using Linux.
+
+NOTE:
+ These changes are only suggestions, and serve as a starting point for
+ tuning your network performance.
+
+The changes are made in three major ways, listed in order of greatest effect:
+
+- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen
+ parameter.
+- Use sysctl to modify /proc parameters (essentially kernel tuning)
+- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
+ transmit burst lengths on the bus.
+
+NOTE:
+ setpci modifies the adapter's configuration registers to allow it to read
+ up to 4k bytes at a time (for transmits). However, for some systems the
+ behavior after modifying this register may be undefined (possibly errors of
+ some kind). A power-cycle, hard reset or explicitly setting the e6 register
+ back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
+ stable configuration.
+
+- COPY these lines and paste them into ixgb_perf.sh:
+
+::
+
+ #!/bin/bash
+ echo "configuring network performance , edit this file to change the interface
+ or device ID of 10GbE card"
+ # set mmrbc to 4k reads, modify only Intel 10GbE device IDs
+ # replace 1a48 with appropriate 10GbE device's ID installed on the system,
+ # if needed.
+ setpci -d 8086:1a48 e6.b=2e
+ # set the MTU (max transmission unit) - it requires your switch and clients
+ # to change as well.
+ # set the txqueuelen
+ # your ixgb adapter should be loaded as eth1 for this to work, change if needed
+ ip li set dev eth1 mtu 9000 txqueuelen 1000 up
+ # call the sysctl utility to modify /proc/sys entries
+ sysctl -p ./sysctl_ixgb.conf
+
+- COPY these lines and paste them into sysctl_ixgb.conf:
+
+::
+
+ # some of the defaults may be different for your kernel
+ # call this file with sysctl -p <this file>
+ # these are just suggested values that worked well to increase throughput in
+ # several network benchmark tests, your mileage may vary
+
+ ### IPV4 specific settings
+ # turn TCP timestamp support off, default 1, reduces CPU use
+ net.ipv4.tcp_timestamps = 0
+ # turn SACK support off, default on
+ # on systems with a VERY fast bus -> memory interface this is the big gainer
+ net.ipv4.tcp_sack = 0
+ # set min/default/max TCP read buffer, default 4096 87380 174760
+ net.ipv4.tcp_rmem = 10000000 10000000 10000000
+ # set min/pressure/max TCP write buffer, default 4096 16384 131072
+ net.ipv4.tcp_wmem = 10000000 10000000 10000000
+ # set min/pressure/max TCP buffer space, default 31744 32256 32768
+ net.ipv4.tcp_mem = 10000000 10000000 10000000
+
+ ### CORE settings (mostly for socket and UDP effect)
+ # set maximum receive socket buffer size, default 131071
+ net.core.rmem_max = 524287
+ # set maximum send socket buffer size, default 131071
+ net.core.wmem_max = 524287
+ # set default receive socket buffer size, default 65535
+ net.core.rmem_default = 524287
+ # set default send socket buffer size, default 65535
+ net.core.wmem_default = 524287
+ # set maximum amount of option memory buffers, default 10240
+ net.core.optmem_max = 524287
+ # set number of unprocessed input packets before kernel starts dropping them; default 300
+ net.core.netdev_max_backlog = 300000
+
+Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
+your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's
+ID installed on the system.
+
+NOTE:
+ Unless these scripts are added to the boot process, these changes will
+ only last only until the next system reboot.
+
+
+Resolving Slow UDP Traffic
+--------------------------
+If your server does not seem to be able to receive UDP traffic as fast as it
+can receive TCP traffic, it could be because Linux, by default, does not set
+the network stack buffers as large as they need to be to support high UDP
+transfer rates. One way to alleviate this problem is to allow more memory to
+be used by the IP stack to store incoming data.
+
+For instance, use the commands::
+
+ sysctl -w net.core.rmem_max=262143
+
+and::
+
+ sysctl -w net.core.rmem_default=262143
+
+to increase the read buffer memory max and default to 262143 (256k - 1) from
+defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
+will increase the amount of memory used by the network stack for receives, and
+can be increased significantly more if necessary for your application.
+
+
+Additional Configurations
+=========================
+
+Configuring the Driver on Different Distributions
+-------------------------------------------------
+Configuring a network driver to load properly when the system is started is
+distribution dependent. Typically, the configuration process involves adding
+an alias line to /etc/modprobe.conf as well as editing other system startup
+scripts and/or configuration files. Many popular Linux distributions ship
+with tools to make these changes for you. To learn the proper way to
+configure a network device for your system, refer to your distribution
+documentation. If during this process you are asked for the driver or module
+name, the name for the Linux Base Driver for the Intel 10GbE Family of
+Adapters is ixgb.
+
+Viewing Link Messages
+---------------------
+Link messages will not be displayed to the console if the distribution is
+restricting system messages. In order to see network driver link messages on
+your console, set dmesg to eight by entering the following::
+
+ dmesg -n 8
+
+NOTE: This setting is not saved across reboots.
+
+Jumbo Frames
+------------
+The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
+enabled by changing the MTU to a value larger than the default of 1500.
+The maximum value for the MTU is 16114. Use the ip command to
+increase the MTU size. For example::
+
+ ip li set dev ethx mtu 9000
+
+The maximum MTU setting for Jumbo Frames is 16114. This value coincides
+with the maximum Jumbo Frames size of 16128.
+
+Ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The ethtool
+version 1.6 or later is required for this functionality.
+
+The latest release of ethtool can be found from
+https://www.kernel.org/pub/software/network/ethtool/
+
+NOTE:
+ The ethtool version 1.6 only supports a limited set of ethtool options.
+ Support for a more complete ethtool feature set can be enabled by
+ upgrading to the latest version.
+
+NAPI
+----
+NAPI (Rx polling mode) is supported in the ixgb driver.
+
+See https://wiki.linuxfoundation.org/networking/napi for more information on
+NAPI.
+
+
+Known Issues/Troubleshooting
+============================
+
+NOTE:
+ After installing the driver, if your Intel Network Connection is not
+ working, verify in the "In This Release" section of the readme that you have
+ installed the correct driver.
+
+Cable Interoperability Issue with Fujitsu XENPAK Module in SmartBits Chassis
+----------------------------------------------------------------------------
+Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
+Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
+chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
+The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
+Server adapter or the SmartBits. If this situation occurs using a different
+cable assembly may resolve the issue.
+
+Cable Interoperability Issues with HP Procurve 3400cl Switch Port
+-----------------------------------------------------------------
+Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
+adapter is connected to an HP Procurve 3400cl switch port using short cables
+(1 m or shorter). If this situation occurs, using a longer cable may resolve
+the issue.
+
+Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
+Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC
+errors may be received either by the CX4 Server adapter or at the switch. If
+this situation occurs, using a different cable assembly may resolve the issue.
+
+Jumbo Frames System Requirement
+-------------------------------
+Memory allocation failures have been observed on Linux systems with 64 MB
+of RAM or less that are running Jumbo Frames. If you are using Jumbo
+Frames, your system may require more than the advertised minimum
+requirement of 64 MB of system memory.
+
+Performance Degradation with Jumbo Frames
+-----------------------------------------
+Degradation in throughput performance may be observed in some Jumbo frames
+environments. If this is observed, increasing the application's socket buffer
+size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
+See the specific application manual and /usr/src/linux*/Documentation/
+networking/ip-sysctl.txt for more details.
+
+Allocating Rx Buffers when Using Jumbo Frames
+---------------------------------------------
+Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
+the available memory is heavily fragmented. This issue may be seen with PCI-X
+adapters or with packet split disabled. This can be reduced or eliminated
+by changing the amount of available memory for receive buffer allocation, by
+increasing /proc/sys/vm/min_free_kbytes.
+
+Multiple Interfaces on Same Ethernet Broadcast Network
+------------------------------------------------------
+Due to the default ARP behavior on Linux, it is not possible to have
+one system on two IP networks in the same Ethernet broadcast domain
+(non-partitioned switch) behave as expected. All Ethernet interfaces
+will respond to IP traffic for any IP address assigned to the system.
+This results in unbalanced receive traffic.
+
+If you have multiple interfaces in a server, do either of the following:
+
+ - Turn on ARP filtering by entering::
+
+ echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+
+ - Install the interfaces in separate broadcast domains - either in
+ different switches or in a switch partitioned to VLANs.
+
+UDP Stress Test Dropped Packet Issue
+--------------------------------------
+Under small packets UDP stress test with 10GbE driver, the Linux system
+may drop UDP packets due to the fullness of socket buffers. You may want
+to change the driver's Flow Control variables to the minimum value for
+controlling packet reception.
+
+Tx Hangs Possible Under Stress
+------------------------------
+Under stress conditions, if TX hangs occur, turning off TSO
+"ethtool -K eth0 tso off" may resolve the problem.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/device_drivers/intel/ixgbe.rst b/Documentation/networking/device_drivers/intel/ixgbe.rst
new file mode 100644
index 0000000..f1d5233
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/ixgbe.rst
@@ -0,0 +1,541 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+===========================================================================
+Linux Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters
+===========================================================================
+
+Intel 10 Gigabit Linux driver.
+Copyright(c) 1999-2018 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Command Line Parameters
+- Additional Configurations
+- Known Issues
+- Support
+
+Identifying Your Adapter
+========================
+The driver is compatible with devices based on the following:
+
+ * Intel(R) Ethernet Controller 82598
+ * Intel(R) Ethernet Controller 82599
+ * Intel(R) Ethernet Controller X520
+ * Intel(R) Ethernet Controller X540
+ * Intel(R) Ethernet Controller x550
+ * Intel(R) Ethernet Controller X552
+ * Intel(R) Ethernet Controller X553
+
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+https://www.intel.com/support
+
+SFP+ Devices with Pluggable Optics
+----------------------------------
+
+82599-BASED ADAPTERS
+~~~~~~~~~~~~~~~~~~~~
+NOTES:
+- If your 82599-based Intel(R) Network Adapter came with Intel optics or is an
+Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel optics
+and/or the direct attach cables listed below.
+- When 82599-based SFP+ devices are connected back to back, they should be set
+to the same Speed setting via ethtool. Results may vary if you mix speed
+settings.
+
++---------------+---------------------------------------+------------------+
+| Supplier | Type | Part Numbers |
++===============+=======================================+==================+
+| SR Modules |
++---------------+---------------------------------------+------------------+
+| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | FTLX8571D3BCV-IT |
++---------------+---------------------------------------+------------------+
+| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDZ-IN2 |
++---------------+---------------------------------------+------------------+
+| Intel | DUAL RATE 1G/10G SFP+ SR (bailed) | AFBR-703SDDZ-IN1 |
++---------------+---------------------------------------+------------------+
+| LR Modules |
++---------------+---------------------------------------+------------------+
+| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | FTLX1471D3BCV-IT |
++---------------+---------------------------------------+------------------+
+| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDZ-IN2 |
++---------------+---------------------------------------+------------------+
+| Intel | DUAL RATE 1G/10G SFP+ LR (bailed) | AFCT-701SDDZ-IN1 |
++---------------+---------------------------------------+------------------+
+
+The following is a list of 3rd party SFP+ modules that have received some
+testing. Not all modules are applicable to all devices.
+
++---------------+---------------------------------------+------------------+
+| Supplier | Type | Part Numbers |
++===============+=======================================+==================+
+| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL |
++---------------+---------------------------------------+------------------+
+| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ |
++---------------+---------------------------------------+------------------+
+| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL |
++---------------+---------------------------------------+------------------+
+| Finisar | DUAL RATE 1G/10G SFP+ SR (No Bail) | FTLX8571D3QCV-IT |
++---------------+---------------------------------------+------------------+
+| Avago | DUAL RATE 1G/10G SFP+ SR (No Bail) | AFBR-703SDZ-IN1 |
++---------------+---------------------------------------+------------------+
+| Finisar | DUAL RATE 1G/10G SFP+ LR (No Bail) | FTLX1471D3QCV-IT |
++---------------+---------------------------------------+------------------+
+| Avago | DUAL RATE 1G/10G SFP+ LR (No Bail) | AFCT-701SDZ-IN1 |
++---------------+---------------------------------------+------------------+
+| Finisar | 1000BASE-T SFP | FCLF8522P2BTL |
++---------------+---------------------------------------+------------------+
+| Avago | 1000BASE-T | ABCU-5710RZ |
++---------------+---------------------------------------+------------------+
+| HP | 1000BASE-SX SFP | 453153-001 |
++---------------+---------------------------------------+------------------+
+
+82599-based adapters support all passive and active limiting direct attach
+cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications.
+
+Laser turns off for SFP+ when ifconfig ethX down
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+"ifconfig ethX down" turns off the laser for 82599-based SFP+ fiber adapters.
+"ifconfig ethX up" turns on the laser.
+Alternatively, you can use "ip link set [down/up] dev ethX" to turn the
+laser off and on.
+
+
+82599-based QSFP+ Adapters
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+NOTES:
+- If your 82599-based Intel(R) Network Adapter came with Intel optics, it only
+supports Intel optics.
+- 82599-based QSFP+ adapters only support 4x10 Gbps connections. 1x40 Gbps
+connections are not supported. QSFP+ link partners must be configured for
+4x10 Gbps.
+- 82599-based QSFP+ adapters do not support automatic link speed detection.
+The link speed must be configured to either 10 Gbps or 1 Gbps to match the link
+partners speed capabilities. Incorrect speed configurations will result in
+failure to link.
+- Intel(R) Ethernet Converged Network Adapter X520-Q1 only supports the optics
+and direct attach cables listed below.
+
++---------------+---------------------------------------+------------------+
+| Supplier | Type | Part Numbers |
++===============+=======================================+==================+
+| Intel | DUAL RATE 1G/10G QSFP+ SRL (bailed) | E10GQSFPSR |
++---------------+---------------------------------------+------------------+
+
+82599-based QSFP+ adapters support all passive and active limiting QSFP+
+direct attach cables that comply with SFF-8436 v4.1 specifications.
+
+82598-BASED ADAPTERS
+~~~~~~~~~~~~~~~~~~~~
+NOTES:
+- Intel(r) Ethernet Network Adapters that support removable optical modules
+only support their original module type (for example, the Intel(R) 10 Gigabit
+SR Dual Port Express Module only supports SR optical modules). If you plug in
+a different type of module, the driver will not load.
+- Hot Swapping/hot plugging optical modules is not supported.
+- Only single speed, 10 gigabit modules are supported.
+- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module
+types are not supported. Please see your system documentation for details.
+
+The following is a list of SFP+ modules and direct attach cables that have
+received some testing. Not all modules are applicable to all devices.
+
++---------------+---------------------------------------+------------------+
+| Supplier | Type | Part Numbers |
++===============+=======================================+==================+
+| Finisar | SFP+ SR bailed, 10g single rate | FTLX8571D3BCL |
++---------------+---------------------------------------+------------------+
+| Avago | SFP+ SR bailed, 10g single rate | AFBR-700SDZ |
++---------------+---------------------------------------+------------------+
+| Finisar | SFP+ LR bailed, 10g single rate | FTLX1471D3BCL |
++---------------+---------------------------------------+------------------+
+
+82598-based adapters support all passive direct attach cables that comply with
+SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach cables
+are not supported.
+
+Third party optic modules and cables referred to above are listed only for the
+purpose of highlighting third party specifications and potential
+compatibility, and are not recommendations or endorsements or sponsorship of
+any third party's product by Intel. Intel is not endorsing or promoting
+products made by any third party and the third party reference is provided
+only to share information regarding certain optic modules and cables with the
+above specifications. There may be other manufacturers or suppliers, producing
+or supplying optic modules and cables with similar or matching descriptions.
+Customers must use their own discretion and diligence to purchase optic
+modules and cables from any third party of their choice. Customers are solely
+responsible for assessing the suitability of the product and/or devices and
+for the selection of the vendor for purchasing any product. THE OPTIC MODULES
+AND CABLES REFERRED TO ABOVE ARE NOT WARRANTED OR SUPPORTED BY INTEL. INTEL
+ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED
+WARRANTY, RELATING TO SALE AND/OR USE OF SUCH THIRD PARTY PRODUCTS OR
+SELECTION OF VENDOR BY CUSTOMERS.
+
+Command Line Parameters
+=======================
+
+max_vfs
+-------
+:Valid Range: 1-63
+
+This parameter adds support for SR-IOV. It causes the driver to spawn up to
+max_vfs worth of virtual functions.
+If the value is greater than 0 it will also force the VMDq parameter to be 1 or
+more.
+
+NOTE: This parameter is only used on kernel 3.7.x and below. On kernel 3.8.x
+and above, use sysfs to enable VFs. Also, for Red Hat distributions, this
+parameter is only used on version 6.6 and older. For version 6.7 and newer, use
+sysfs. For example::
+
+ #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs // enable VFs
+ #echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable VFs
+
+The parameters for the driver are referenced by position. Thus, if you have a
+dual port adapter, or more than one adapter in your system, and want N virtual
+functions per port, you must specify a number for each port with each parameter
+separated by a comma. For example::
+
+ modprobe ixgbe max_vfs=4
+
+This will spawn 4 VFs on the first port.
+
+::
+
+ modprobe ixgbe max_vfs=2,4
+
+This will spawn 2 VFs on the first port and 4 VFs on the second port.
+
+NOTE: Caution must be used in loading the driver with these parameters.
+Depending on your system configuration, number of slots, etc., it is impossible
+to predict in all cases where the positions would be on the command line.
+
+NOTE: Neither the device nor the driver control how VFs are mapped into config
+space. Bus layout will vary by operating system. On operating systems that
+support it, you can check sysfs to find the mapping.
+
+NOTE: When either SR-IOV mode or VMDq mode is enabled, hardware VLAN filtering
+and VLAN tag stripping/insertion will remain enabled. Please remove the old
+VLAN filter before the new VLAN filter is added. For example,
+
+::
+
+ ip link set eth0 vf 0 vlan 100 // set VLAN 100 for VF 0
+ ip link set eth0 vf 0 vlan 0 // Delete VLAN 100
+ ip link set eth0 vf 0 vlan 200 // set a new VLAN 200 for VF 0
+
+With kernel 3.6, the driver supports the simultaneous usage of max_vfs and DCB
+features, subject to the constraints described below. Prior to kernel 3.6, the
+driver did not support the simultaneous operation of max_vfs greater than 0 and
+the DCB features (multiple traffic classes utilizing Priority Flow Control and
+Extended Transmission Selection).
+
+When DCB is enabled, network traffic is transmitted and received through
+multiple traffic classes (packet buffers in the NIC). The traffic is associated
+with a specific class based on priority, which has a value of 0 through 7 used
+in the VLAN tag. When SR-IOV is not enabled, each traffic class is associated
+with a set of receive/transmit descriptor queue pairs. The number of queue
+pairs for a given traffic class depends on the hardware configuration. When
+SR-IOV is enabled, the descriptor queue pairs are grouped into pools. The
+Physical Function (PF) and each Virtual Function (VF) is allocated a pool of
+receive/transmit descriptor queue pairs. When multiple traffic classes are
+configured (for example, DCB is enabled), each pool contains a queue pair from
+each traffic class. When a single traffic class is configured in the hardware,
+the pools contain multiple queue pairs from the single traffic class.
+
+The number of VFs that can be allocated depends on the number of traffic
+classes that can be enabled. The configurable number of traffic classes for
+each enabled VF is as follows:
+0 - 15 VFs = Up to 8 traffic classes, depending on device support
+16 - 31 VFs = Up to 4 traffic classes
+32 - 63 VFs = 1 traffic class
+
+When VFs are configured, the PF is allocated one pool as well. The PF supports
+the DCB features with the constraint that each traffic class will only use a
+single queue pair. When zero VFs are configured, the PF can support multiple
+queue pairs per traffic class.
+
+allow_unsupported_sfp
+---------------------
+:Valid Range: 0,1
+:Default Value: 0 (disabled)
+
+This parameter allows unsupported and untested SFP+ modules on 82599-based
+adapters, as long as the type of module is known to the driver.
+
+debug
+-----
+:Valid Range: 0-16 (0=none,...,16=all)
+:Default Value: 0
+
+This parameter adjusts the level of debug messages displayed in the system
+logs.
+
+
+Additional Features and Configurations
+======================================
+
+Flow Control
+------------
+Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
+receiving and transmitting pause frames for ixgbe. When transmit is enabled,
+pause frames are generated when the receive packet buffer crosses a predefined
+threshold. When receive is enabled, the transmit unit will halt for the time
+delay specified when a pause frame is received.
+
+NOTE: You must have a flow control capable link partner.
+
+Flow Control is enabled by default.
+
+Use ethtool to change the flow control settings. To enable or disable Rx or
+Tx Flow Control::
+
+ ethtool -A eth? rx <on|off> tx <on|off>
+
+Note: This command only enables or disables Flow Control if auto-negotiation is
+disabled. If auto-negotiation is enabled, this command changes the parameters
+used for auto-negotiation with the link partner.
+
+To enable or disable auto-negotiation::
+
+ ethtool -s eth? autoneg <on|off>
+
+Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
+on your device, you may not be able to change the auto-negotiation setting.
+
+NOTE: For 82598 backplane cards entering 1 gigabit mode, flow control default
+behavior is changed to off. Flow control in 1 gigabit mode on these devices can
+lead to transmit hangs.
+
+Intel(R) Ethernet Flow Director
+-------------------------------
+The Intel Ethernet Flow Director performs the following tasks:
+
+- Directs receive packets according to their flows to different queues.
+- Enables tight control on routing a flow in the platform.
+- Matches flows and CPU cores for flow affinity.
+- Supports multiple parameters for flexible flow classification and load
+ balancing (in SFP mode only).
+
+NOTE: Intel Ethernet Flow Director masking works in the opposite manner from
+subnet masking. In the following command::
+
+ #ethtool -N eth11 flow-type ip4 src-ip 172.4.1.2 m 255.0.0.0 dst-ip \
+ 172.21.1.1 m 255.128.0.0 action 31
+
+The src-ip value that is written to the filter will be 0.4.1.2, not 172.0.0.0
+as might be expected. Similarly, the dst-ip value written to the filter will be
+0.21.1.1, not 172.0.0.0.
+
+To enable or disable the Intel Ethernet Flow Director::
+
+ # ethtool -K ethX ntuple <on|off>
+
+When disabling ntuple filters, all the user programmed filters are flushed from
+the driver cache and hardware. All needed filters must be re-added when ntuple
+is re-enabled.
+
+To add a filter that directs packet to queue 2, use -U or -N switch::
+
+ # ethtool -N ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip \
+ 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
+
+To see the list of filters currently present::
+
+ # ethtool <-u|-n> ethX
+
+Sideband Perfect Filters
+------------------------
+Sideband Perfect Filters are used to direct traffic that matches specified
+characteristics. They are enabled through ethtool's ntuple interface. To add a
+new filter use the following command::
+
+ ethtool -U <device> flow-type <type> src-ip <ip> dst-ip <ip> src-port <port> \
+ dst-port <port> action <queue>
+
+Where:
+ <device> - the ethernet device to program
+ <type> - can be ip4, tcp4, udp4, or sctp4
+ <ip> - the IP address to match on
+ <port> - the port number to match on
+ <queue> - the queue to direct traffic towards (-1 discards the matched traffic)
+
+Use the following command to delete a filter::
+
+ ethtool -U <device> delete <N>
+
+Where <N> is the filter id displayed when printing all the active filters, and
+may also have been specified using "loc <N>" when adding the filter.
+
+The following example matches TCP traffic sent from 192.168.0.1, port 5300,
+directed to 192.168.0.5, port 80, and sends it to queue 7::
+
+ ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
+ src-port 5300 dst-port 80 action 7
+
+For each flow-type, the programmed filters must all have the same matching
+input set. For example, issuing the following two commands is acceptable::
+
+ ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+ ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
+
+Issuing the next two commands, however, is not acceptable, since the first
+specifies src-ip and the second specifies dst-ip::
+
+ ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
+ ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
+
+The second command will fail with an error. You may program multiple filters
+with the same fields, using different values, but, on one device, you may not
+program two TCP4 filters with different matching fields.
+
+Matching on a sub-portion of a field is not supported by the ixgbe driver, thus
+partial mask fields are not supported.
+
+To create filters that direct traffic to a specific Virtual Function, use the
+"user-def" parameter. Specify the user-def as a 64 bit value, where the lower 32
+bits represents the queue number, while the next 8 bits represent which VF.
+Note that 0 is the PF, so the VF identifier is offset by 1. For example::
+
+ ... user-def 0x800000002 ...
+
+specifies to direct traffic to Virtual Function 7 (8 minus 1) into queue 2 of
+that VF.
+
+Note that these filters will not break internal routing rules, and will not
+route traffic that otherwise would not have been sent to the specified Virtual
+Function.
+
+Jumbo Frames
+------------
+Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
+to a value larger than the default value of 1500.
+
+Use the ifconfig command to increase the MTU size. For example, enter the
+following where <x> is the interface number::
+
+ ifconfig eth<x> mtu 9000 up
+
+Alternatively, you can use the ip command as follows::
+
+ ip link set mtu 9000 dev eth<x>
+ ip link set up dev eth<x>
+
+This setting is not saved across reboots. The setting change can be made
+permanent by adding 'MTU=9000' to the file::
+
+ /etc/sysconfig/network-scripts/ifcfg-eth<x> // for RHEL
+ /etc/sysconfig/network/<config_file> // for SLES
+
+NOTE: The maximum MTU setting for Jumbo Frames is 9710. This value coincides
+with the maximum Jumbo Frames size of 9728 bytes.
+
+NOTE: This driver will attempt to use multiple page sized buffers to receive
+each jumbo packet. This should help to avoid buffer starvation issues when
+allocating receive packets.
+
+NOTE: For 82599-based network connections, if you are enabling jumbo frames in
+a virtual function (VF), jumbo frames must first be enabled in the physical
+function (PF). The VF MTU setting cannot be larger than the PF MTU.
+
+Generic Receive Offload, aka GRO
+--------------------------------
+The driver supports the in-kernel software implementation of GRO. GRO has
+shown that by coalescing Rx traffic into larger chunks of data, CPU
+utilization can be significantly reduced when under large Rx load. GRO is an
+evolution of the previously-used LRO interface. GRO is able to coalesce
+other protocols besides TCP. It's also safe to use with configurations that
+are problematic for LRO, namely bridging and iSCSI.
+
+Data Center Bridging (DCB)
+--------------------------
+NOTE:
+The kernel assumes that TC0 is available, and will disable Priority Flow
+Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
+enabled when setting up DCB on your switch.
+
+DCB is a configuration Quality of Service implementation in hardware. It uses
+the VLAN priority tag (802.1p) to filter traffic. That means that there are 8
+different priorities that traffic can be filtered into. It also enables
+priority flow control (802.1Qbb) which can limit or eliminate the number of
+dropped packets during network stress. Bandwidth can be allocated to each of
+these priorities, which is enforced at the hardware level (802.1Qaz).
+
+Adapter firmware implements LLDP and DCBX protocol agents as per 802.1AB and
+802.1Qaz respectively. The firmware based DCBX agent runs in willing mode only
+and can accept settings from a DCBX capable peer. Software configuration of
+DCBX parameters via dcbtool/lldptool are not supported.
+
+The ixgbe driver implements the DCB netlink interface layer to allow user-space
+to communicate with the driver and query DCB configuration for the port.
+
+ethtool
+-------
+The driver utilizes the ethtool interface for driver configuration and
+diagnostics, as well as displaying statistical information. The latest ethtool
+version is required for this functionality. Download it at:
+https://www.kernel.org/pub/software/network/ethtool/
+
+FCoE
+----
+The ixgbe driver supports Fiber Channel over Ethernet (FCoE) and Data Center
+Bridging (DCB). This code has no default effect on the regular driver
+operation. Configuring DCB and FCoE is outside the scope of this README. Refer
+to http://www.open-fcoe.org/ for FCoE project information and contact
+ixgbe-eedc@lists.sourceforge.net for DCB information.
+
+MAC and VLAN anti-spoofing feature
+----------------------------------
+When a malicious driver attempts to send a spoofed packet, it is dropped by the
+hardware and not transmitted.
+
+An interrupt is sent to the PF driver notifying it of the spoof attempt. When a
+spoofed packet is detected, the PF driver will send the following message to
+the system log (displayed by the "dmesg" command)::
+
+ ixgbe ethX: ixgbe_spoof_check: n spoofed packets detected
+
+where "x" is the PF interface number; and "n" is number of spoofed packets.
+NOTE: This feature can be disabled for a specific Virtual Function (VF)::
+
+ ip link set <pf dev> vf <vf id> spoofchk {off|on}
+
+IPsec Offload
+-------------
+The ixgbe driver supports IPsec Hardware Offload. When creating Security
+Associations with "ip xfrm ..." the 'offload' tag option can be used to
+register the IPsec SA with the driver in order to get higher throughput in
+the secure communications.
+
+The offload is also supported for ixgbe's VFs, but the VF must be set as
+'trusted' and the support must be enabled with::
+
+ ethtool --set-priv-flags eth<x> vf-ipsec on
+ ip link set eth<x> vf <y> trust on
+
+
+Known Issues/Troubleshooting
+============================
+
+Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS
+---------------------------------------------------------------------
+Linux KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM.
+This includes traditional PCIe devices, as well as SR-IOV-capable devices based
+on the Intel Ethernet Controller XL710.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/device_drivers/intel/ixgbevf.rst b/Documentation/networking/device_drivers/intel/ixgbevf.rst
new file mode 100644
index 0000000..76bbde7
--- /dev/null
+++ b/Documentation/networking/device_drivers/intel/ixgbevf.rst
@@ -0,0 +1,67 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+============================================================
+Linux Base Virtual Function Driver for Intel(R) 10G Ethernet
+============================================================
+
+Intel 10 Gigabit Virtual Function Linux driver.
+Copyright(c) 1999-2018 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Known Issues
+- Support
+
+This driver supports 82599, X540, X550, and X552-based virtual function devices
+that can only be activated on kernels that support SR-IOV.
+
+For questions related to hardware requirements, refer to the documentation
+supplied with your Intel adapter. All hardware requirements listed apply to use
+with Linux.
+
+
+Identifying Your Adapter
+========================
+The driver is compatible with devices based on the following:
+
+ * Intel(R) Ethernet Controller 82598
+ * Intel(R) Ethernet Controller 82599
+ * Intel(R) Ethernet Controller X520
+ * Intel(R) Ethernet Controller X540
+ * Intel(R) Ethernet Controller x550
+ * Intel(R) Ethernet Controller X552
+ * Intel(R) Ethernet Controller X553
+
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+https://www.intel.com/support
+
+Known Issues/Troubleshooting
+============================
+
+SR-IOV requires the correct platform and OS support.
+
+The guest OS loading this driver must support MSI-X interrupts.
+
+This driver is only supported as a loadable module at this time. Intel is not
+supplying patches against the kernel source to allow for static linking of the
+drivers.
+
+VLANs: There is a limit of a total of 64 shared VLANs to 1 or more VFs.
+
+
+Support
+=======
+For general information, go to the Intel support website at:
+
+https://www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+https://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to e1000-devel@lists.sf.net.
diff --git a/Documentation/networking/device_drivers/mellanox/mlx5.rst b/Documentation/networking/device_drivers/mellanox/mlx5.rst
new file mode 100644
index 0000000..d071c6b
--- /dev/null
+++ b/Documentation/networking/device_drivers/mellanox/mlx5.rst
@@ -0,0 +1,300 @@
+.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+
+=================================================
+Mellanox ConnectX(R) mlx5 core VPI Network Driver
+=================================================
+
+Copyright (c) 2019, Mellanox Technologies LTD.
+
+Contents
+========
+
+- `Enabling the driver and kconfig options`_
+- `Devlink info`_
+- `Devlink parameters`_
+- `Devlink health reporters`_
+- `mlx5 tracepoints`_
+
+Enabling the driver and kconfig options
+================================================
+
+| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
+| at build time via kernel Kconfig flags.
+| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
+| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
+| For the list of advanced features please see below.
+
+**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
+
+| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
+| This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).
+
+
+**CONFIG_MLX5_CORE_EN=(y/n)**
+
+| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
+| mlx5e is the mlx5 ulp driver which provides netdevice kernel interface, when chosen, mlx5e will be
+| built-in into mlx5_core.ko.
+
+
+**CONFIG_MLX5_EN_ARFS=(y/n)**
+
+| Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
+| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4
+
+
+**CONFIG_MLX5_EN_RXNFC=(y/n)**
+
+| Enables ethtool receive network flow classification, which allows user defined
+| flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API.
+
+
+**CONFIG_MLX5_CORE_EN_DCB=(y/n)**:
+
+| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.
+
+
+**CONFIG_MLX5_MPFS=(y/n)**
+
+| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
+| MPFs is required for when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing
+| user configured unicast MAC addresses to the requesting PF.
+
+
+**CONFIG_MLX5_ESWITCH=(y/n)**
+
+| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
+| and switching for the enabled VFs and PF in two available modes:
+| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
+| 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.
+
+
+**CONFIG_MLX5_CORE_IPOIB=(y/n)**
+
+| IPoIB offloads & acceleration support.
+| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
+| IPoIB ulp netdevice.
+
+
+**CONFIG_MLX5_FPGA=(y/n)**
+
+| Build support for the Innova family of network cards by Mellanox Technologies.
+| Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board.
+| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
+| building sandbox-specific client drivers.
+
+
+**CONFIG_MLX5_EN_IPSEC=(y/n)**
+
+| Enables `IPSec XFRM cryptography-offload accelaration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
+
+**CONFIG_MLX5_EN_TLS=(y/n)**
+
+| TLS cryptography-offload accelaration.
+
+
+**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
+
+| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.
+
+
+**External options** ( Choose if the corresponding mlx5 feature is required )
+
+- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled
+- CONFIG_VXLAN: When chosen, mlx5 vxaln support will be enabled.
+- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
+
+Devlink info
+============
+
+The devlink info reports the running and stored firmware versions on device.
+It also prints the device PSID which represents the HCA board type ID.
+
+User command example::
+
+ $ devlink dev info pci/0000:00:06.0
+ pci/0000:00:06.0:
+ driver mlx5_core
+ versions:
+ fixed:
+ fw.psid MT_0000000009
+ running:
+ fw.version 16.26.0100
+ stored:
+ fw.version 16.26.0100
+
+Devlink parameters
+==================
+
+flow_steering_mode: Device flow steering mode
+---------------------------------------------
+The flow steering mode parameter controls the flow steering mode of the driver.
+Two modes are supported:
+1. 'dmfs' - Device managed flow steering.
+2. 'smfs - Software/Driver managed flow steering.
+
+In DMFS mode, the HW steering entities are created and managed through the
+Firmware.
+In SMFS mode, the HW steering entities are created and managed though by
+the driver directly into Hardware without firmware intervention.
+
+SMFS mode is faster and provides better rule inserstion rate compared to default DMFS mode.
+
+User command examples:
+
+- Set SMFS flow steering mode::
+
+ $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
+
+- Read device flow steering mode::
+
+ $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
+ pci/0000:06:00.0:
+ name flow_steering_mode type driver-specific
+ values:
+ cmode runtime value smfs
+
+
+Devlink health reporters
+========================
+
+tx reporter
+-----------
+The tx reporter is responsible for reporting and recovering of the following two error scenarios:
+
+- TX timeout
+ Report on kernel tx timeout detection.
+ Recover by searching lost interrupts.
+- TX error completion
+ Report on error tx completion.
+ Recover by flushing the TX queue and reset it.
+
+TX reporter also support on demand diagnose callback, on which it provides
+real time information of its send queues status.
+
+User commands examples:
+
+- Diagnose send queues status::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter tx
+
+NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
+
+- Show number of tx errors indicated, number of recover flows ended successfully,
+ is autorecover enabled and graceful period from last recover::
+
+ $ devlink health show pci/0000:82:00.0 reporter tx
+
+rx reporter
+-----------
+The rx reporter is responsible for reporting and recovering of the following two error scenarios:
+
+- RX queues initialization (population) timeout
+ RX queues descriptors population on ring initialization is done in
+ napi context via triggering an irq, in case of a failure to get
+ the minimum amount of descriptors, a timeout would occur and it
+ could be recoverable by polling the EQ (Event Queue).
+- RX completions with errors (reported by HW on interrupt context)
+ Report on rx completion error.
+ Recover (if needed) by flushing the related queue and reset it.
+
+RX reporter also supports on demand diagnose callback, on which it
+provides real time information of its receive queues status.
+
+- Diagnose rx queues status, and corresponding completion queue::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter rx
+
+NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
+
+- Show number of rx errors indicated, number of recover flows ended successfully,
+ is autorecover enabled and graceful period from last recover::
+
+ $ devlink health show pci/0000:82:00.0 reporter rx
+
+fw reporter
+-----------
+The fw reporter implements diagnose and dump callbacks.
+It follows symptoms of fw error such as fw syndrome by triggering
+fw core dump and storing it into the dump buffer.
+The fw reporter diagnose command can be triggered any time by the user to check
+current fw status.
+
+User commands examples:
+
+- Check fw heath status::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter fw
+
+- Read FW core dump if already stored or trigger new one::
+
+ $ devlink health dump show pci/0000:82:00.0 reporter fw
+
+NOTE: This command can run only on the PF which has fw tracer ownership,
+running it on other PF or any VF will return "Operation not permitted".
+
+fw fatal reporter
+-----------------
+The fw fatal reporter implements dump and recover callbacks.
+It follows fatal errors indications by CR-space dump and recover flow.
+The CR-space dump uses vsc interface which is valid even if the FW command
+interface is not functional, which is the case in most FW fatal errors.
+The recover function runs recover flow which reloads the driver and triggers fw
+reset if needed.
+
+User commands examples:
+
+- Run fw recover flow manually::
+
+ $ devlink health recover pci/0000:82:00.0 reporter fw_fatal
+
+- Read FW CR-space dump if already strored or trigger new one::
+
+ $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
+
+NOTE: This command can run only on PF.
+
+mlx5 tracepoints
+================
+
+mlx5 driver provides internal trace points for tracking and debugging using
+kernel tracepoints interfaces (refer to Documentation/trace/ftrase.rst).
+
+For the list of support mlx5 events check /sys/kernel/debug/tracing/events/mlx5/
+
+tc and eswitch offloads tracepoints:
+
+- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::
+
+ $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT
+
+- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::
+
+ $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL
+
+- mlx5e_stats_flower: trace flower stats request::
+
+ $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217
+
+- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::
+
+ $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1
+
+- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::
+
+ $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/device_drivers/microsoft/netvsc.txt
similarity index 86%
rename from Documentation/networking/netvsc.txt
rename to Documentation/networking/device_drivers/microsoft/netvsc.txt
index 92f5b31..3bfa635 100644
--- a/Documentation/networking/netvsc.txt
+++ b/Documentation/networking/device_drivers/microsoft/netvsc.txt
@@ -45,6 +45,15 @@
like packets and significantly reduces CPU usage under heavy Rx
load.
+ Large Receive Offload (LRO), or Receive Side Coalescing (RSC)
+ -------------------------------------------------------------
+ The driver supports LRO/RSC in the vSwitch feature. It reduces the per packet
+ processing overhead by coalescing multiple TCP segments when possible. The
+ feature is enabled by default on VMs running on Windows Server 2019 and
+ later. It may be changed by ethtool command:
+ ethtool -K eth0 lro on
+ ethtool -K eth0 lro off
+
SR-IOV support
--------------
Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/device_drivers/neterion/s2io.txt
similarity index 100%
rename from Documentation/networking/s2io.txt
rename to Documentation/networking/device_drivers/neterion/s2io.txt
diff --git a/Documentation/networking/vxge.txt b/Documentation/networking/device_drivers/neterion/vxge.txt
similarity index 100%
rename from Documentation/networking/vxge.txt
rename to Documentation/networking/device_drivers/neterion/vxge.txt
diff --git a/Documentation/networking/device_drivers/netronome/nfp.rst b/Documentation/networking/device_drivers/netronome/nfp.rst
new file mode 100644
index 0000000..6c08ac8
--- /dev/null
+++ b/Documentation/networking/device_drivers/netronome/nfp.rst
@@ -0,0 +1,133 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+=============================================
+Netronome Flow Processor (NFP) Kernel Drivers
+=============================================
+
+Copyright (c) 2019, Netronome Systems, Inc.
+
+Contents
+========
+
+- `Overview`_
+- `Acquiring Firmware`_
+
+Overview
+========
+
+This driver supports Netronome's line of Flow Processor devices,
+including the NFP4000, NFP5000, and NFP6000 models, which are also
+incorporated in the company's family of Agilio SmartNICs. The SR-IOV
+physical and virtual functions for these devices are supported by
+the driver.
+
+Acquiring Firmware
+==================
+
+The NFP4000 and NFP6000 devices require application specific firmware
+to function. Application firmware can be located either on the host file system
+or in the device flash (if supported by management firmware).
+
+Firmware files on the host filesystem contain card type (`AMDA-*` string), media
+config etc. They should be placed in `/lib/firmware/netronome` directory to
+load firmware from the host file system.
+
+Firmware for basic NIC operation is available in the upstream
+`linux-firmware.git` repository.
+
+Firmware in NVRAM
+-----------------
+
+Recent versions of management firmware supports loading application
+firmware from flash when the host driver gets probed. The firmware loading
+policy configuration may be used to configure this feature appropriately.
+
+Devlink or ethtool can be used to update the application firmware on the device
+flash by providing the appropriate `nic_AMDA*.nffw` file to the respective
+command. Users need to take care to write the correct firmware image for the
+card and media configuration to flash.
+
+Available storage space in flash depends on the card being used.
+
+Dealing with multiple projects
+------------------------------
+
+NFP hardware is fully programmable therefore there can be different
+firmware images targeting different applications.
+
+When using application firmware from host, we recommend placing
+actual firmware files in application-named subdirectories in
+`/lib/firmware/netronome` and linking the desired files, e.g.::
+
+ $ tree /lib/firmware/netronome/
+ /lib/firmware/netronome/
+ ├── bpf
+ │ ├── nic_AMDA0081-0001_1x40.nffw
+ │ └── nic_AMDA0081-0001_4x10.nffw
+ ├── flower
+ │ ├── nic_AMDA0081-0001_1x40.nffw
+ │ └── nic_AMDA0081-0001_4x10.nffw
+ ├── nic
+ │ ├── nic_AMDA0081-0001_1x40.nffw
+ │ └── nic_AMDA0081-0001_4x10.nffw
+ ├── nic_AMDA0081-0001_1x40.nffw -> bpf/nic_AMDA0081-0001_1x40.nffw
+ └── nic_AMDA0081-0001_4x10.nffw -> bpf/nic_AMDA0081-0001_4x10.nffw
+
+ 3 directories, 8 files
+
+You may need to use hard instead of symbolic links on distributions
+which use old `mkinitrd` command instead of `dracut` (e.g. Ubuntu).
+
+After changing firmware files you may need to regenerate the initramfs
+image. Initramfs contains drivers and firmware files your system may
+need to boot. Refer to the documentation of your distribution to find
+out how to update initramfs. Good indication of stale initramfs
+is system loading wrong driver or firmware on boot, but when driver is
+later reloaded manually everything works correctly.
+
+Selecting firmware per device
+-----------------------------
+
+Most commonly all cards on the system use the same type of firmware.
+If you want to load specific firmware image for a specific card, you
+can use either the PCI bus address or serial number. Driver will print
+which files it's looking for when it recognizes a NFP device::
+
+ nfp: Looking for firmware file in order of priority:
+ nfp: netronome/serial-00-12-34-aa-bb-cc-10-ff.nffw: not found
+ nfp: netronome/pci-0000:02:00.0.nffw: not found
+ nfp: netronome/nic_AMDA0081-0001_1x40.nffw: found, loading...
+
+In this case if file (or link) called *serial-00-12-34-aa-bb-5d-10-ff.nffw*
+or *pci-0000:02:00.0.nffw* is present in `/lib/firmware/netronome` this
+firmware file will take precedence over `nic_AMDA*` files.
+
+Note that `serial-*` and `pci-*` files are **not** automatically included
+in initramfs, you will have to refer to documentation of appropriate tools
+to find out how to include them.
+
+Firmware loading policy
+-----------------------
+
+Firmware loading policy is controlled via three HWinfo parameters
+stored as key value pairs in the device flash:
+
+app_fw_from_flash
+ Defines which firmware should take precedence, 'Disk' (0), 'Flash' (1) or
+ the 'Preferred' (2) firmware. When 'Preferred' is selected, the management
+ firmware makes the decision over which firmware will be loaded by comparing
+ versions of the flash firmware and the host supplied firmware.
+ This variable is configurable using the 'fw_load_policy'
+ devlink parameter.
+
+abi_drv_reset
+ Defines if the driver should reset the firmware when
+ the driver is probed, either 'Disk' (0) if firmware was found on disk,
+ 'Always' (1) reset or 'Never' (2) reset. Note that the device is always
+ reset on driver unload if firmware was loaded when the driver was probed.
+ This variable is configurable using the 'reset_dev_on_drv_probe'
+ devlink parameter.
+
+abi_drv_load_ifc
+ Defines a list of PF devices allowed to load FW on the device.
+ This variable is not currently user configurable.
diff --git a/Documentation/networking/device_drivers/pensando/ionic.rst b/Documentation/networking/device_drivers/pensando/ionic.rst
new file mode 100644
index 0000000..c17d680
--- /dev/null
+++ b/Documentation/networking/device_drivers/pensando/ionic.rst
@@ -0,0 +1,45 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+========================================================
+Linux Driver for the Pensando(R) Ethernet adapter family
+========================================================
+
+Pensando Linux Ethernet driver.
+Copyright(c) 2019 Pensando Systems, Inc
+
+Contents
+========
+
+- Identifying the Adapter
+- Support
+
+Identifying the Adapter
+=======================
+
+To find if one or more Pensando PCI Ethernet devices are installed on the
+host, check for the PCI devices::
+
+ $ lspci -d 1dd8:
+ b5:00.0 Ethernet controller: Device 1dd8:1002
+ b6:00.0 Ethernet controller: Device 1dd8:1002
+
+If such devices are listed as above, then the ionic.ko driver should find
+and configure them for use. There should be log entries in the kernel
+messages such as these::
+
+ $ dmesg | grep ionic
+ ionic Pensando Ethernet NIC Driver, ver 0.15.0-k
+ ionic 0000:b5:00.0 enp181s0: renamed from eth0
+ ionic 0000:b6:00.0 enp182s0: renamed from eth0
+
+Support
+=======
+For general Linux networking support, please use the netdev mailing
+list, which is monitored by Pensando personnel::
+
+ netdev@vger.kernel.org
+
+For more specific support needs, please use the Pensando driver support
+email::
+
+ drivers@pensando.io
diff --git a/Documentation/networking/LICENSE.qla3xxx b/Documentation/networking/device_drivers/qlogic/LICENSE.qla3xxx
similarity index 100%
rename from Documentation/networking/LICENSE.qla3xxx
rename to Documentation/networking/device_drivers/qlogic/LICENSE.qla3xxx
diff --git a/Documentation/networking/LICENSE.qlcnic b/Documentation/networking/device_drivers/qlogic/LICENSE.qlcnic
similarity index 100%
rename from Documentation/networking/LICENSE.qlcnic
rename to Documentation/networking/device_drivers/qlogic/LICENSE.qlcnic
diff --git a/Documentation/networking/LICENSE.qlge b/Documentation/networking/device_drivers/qlogic/LICENSE.qlge
similarity index 100%
rename from Documentation/networking/LICENSE.qlge
rename to Documentation/networking/device_drivers/qlogic/LICENSE.qlge
diff --git a/Documentation/networking/rmnet.txt b/Documentation/networking/device_drivers/qualcomm/rmnet.txt
similarity index 100%
rename from Documentation/networking/rmnet.txt
rename to Documentation/networking/device_drivers/qualcomm/rmnet.txt
diff --git a/Documentation/networking/README.sb1000 b/Documentation/networking/device_drivers/sb1000.txt
similarity index 100%
rename from Documentation/networking/README.sb1000
rename to Documentation/networking/device_drivers/sb1000.txt
diff --git a/Documentation/networking/smc9.txt b/Documentation/networking/device_drivers/smsc/smc9.txt
similarity index 100%
rename from Documentation/networking/smc9.txt
rename to Documentation/networking/device_drivers/smsc/smc9.txt
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/device_drivers/stmicro/stmmac.txt
similarity index 99%
rename from Documentation/networking/stmmac.txt
rename to Documentation/networking/device_drivers/stmicro/stmmac.txt
index 2bb0707..1ae979f 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/device_drivers/stmicro/stmmac.txt
@@ -267,7 +267,7 @@
During the board's device_init we can configure the first
MAC for fixed_link by calling:
- fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status, -1);
+ fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status);
and the second one, with a real PHY device attached to the bus,
by using the stmmac_mdio_bus_data structure (to provide the id, the
reset procedure etc).
diff --git a/Documentation/networking/ti-cpsw.txt b/Documentation/networking/device_drivers/ti/cpsw.txt
similarity index 100%
rename from Documentation/networking/ti-cpsw.txt
rename to Documentation/networking/device_drivers/ti/cpsw.txt
diff --git a/Documentation/networking/tlan.txt b/Documentation/networking/device_drivers/ti/tlan.txt
similarity index 100%
rename from Documentation/networking/tlan.txt
rename to Documentation/networking/device_drivers/ti/tlan.txt
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/device_drivers/toshiba/spider_net.txt
similarity index 100%
rename from Documentation/networking/spider_net.txt
rename to Documentation/networking/device_drivers/toshiba/spider_net.txt
diff --git a/Documentation/networking/devlink-health.txt b/Documentation/networking/devlink-health.txt
new file mode 100644
index 0000000..1db3fbe
--- /dev/null
+++ b/Documentation/networking/devlink-health.txt
@@ -0,0 +1,86 @@
+The health mechanism is targeted for Real Time Alerting, in order to know when
+something bad had happened to a PCI device
+- Provide alert debug information
+- Self healing
+- If problem needs vendor support, provide a way to gather all needed debugging
+ information.
+
+The main idea is to unify and centralize driver health reports in the
+generic devlink instance and allow the user to set different
+attributes of the health reporting and recovery procedures.
+
+The devlink health reporter:
+Device driver creates a "health reporter" per each error/health type.
+Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
+or unknown (driver specific).
+For each registered health reporter a driver can issue error/health reports
+asynchronously. All health reports handling is done by devlink.
+Device driver can provide specific callbacks for each "health reporter", e.g.
+ - Recovery procedures
+ - Diagnostics and object dump procedures
+ - OOB initial parameters
+Different parts of the driver can register different types of health reporters
+with different handlers.
+
+Once an error is reported, devlink health will do the following actions:
+ * A log is being send to the kernel trace events buffer
+ * Health status and statistics are being updated for the reporter instance
+ * Object dump is being taken and saved at the reporter instance (as long as
+ there is no other dump which is already stored)
+ * Auto recovery attempt is being done. Depends on:
+ - Auto-recovery configuration
+ - Grace period vs. time passed since last recover
+
+The user interface:
+User can access/change each reporter's parameters and driver specific callbacks
+via devlink, e.g per error type (per health reporter)
+ - Configure reporter's generic parameters (like: disable/enable auto recovery)
+ - Invoke recovery procedure
+ - Run diagnostics
+ - Object dump
+
+The devlink health interface (via netlink):
+DEVLINK_CMD_HEALTH_REPORTER_GET
+ Retrieves status and configuration info per DEV and reporter.
+DEVLINK_CMD_HEALTH_REPORTER_SET
+ Allows reporter-related configuration setting.
+DEVLINK_CMD_HEALTH_REPORTER_RECOVER
+ Triggers a reporter's recovery procedure.
+DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
+ Retrieves diagnostics data from a reporter on a device.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
+ Retrieves the last stored dump. Devlink health
+ saves a single dump. If an dump is not already stored by the devlink
+ for this reporter, devlink generates a new dump.
+ dump output is defined by the reporter.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
+ Clears the last saved dump file for the specified reporter.
+
+
+ netlink
+ +--------------------------+
+ | |
+ | + |
+ | | |
+ +--------------------------+
+ |request for ops
+ |(diagnose,
+ mlx5_core devlink |recover,
+ |dump)
++--------+ +--------------------------+
+| | | reporter| |
+| | | +---------v----------+ |
+| | ops execution | | | |
+| <----------------------------------+ | |
+| | | | | |
+| | | + ^------------------+ |
+| | | | request for ops |
+| | | | (recover, dump) |
+| | | | |
+| | | +-+------------------+ |
+| | health report | | health handler | |
+| +-------------------------------> | |
+| | | +--------------------+ |
+| | health reporter create | |
+| +----------------------------> |
++--------+ +--------------------------+
diff --git a/Documentation/networking/devlink-info-versions.rst b/Documentation/networking/devlink-info-versions.rst
new file mode 100644
index 0000000..4914f58
--- /dev/null
+++ b/Documentation/networking/devlink-info-versions.rst
@@ -0,0 +1,64 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+=====================
+Devlink info versions
+=====================
+
+board.id
+========
+
+Unique identifier of the board design.
+
+board.rev
+=========
+
+Board design revision.
+
+asic.id
+=======
+
+ASIC design identifier.
+
+asic.rev
+========
+
+ASIC design revision.
+
+board.manufacture
+=================
+
+An identifier of the company or the facility which produced the part.
+
+fw
+==
+
+Overall firmware version, often representing the collection of
+fw.mgmt, fw.app, etc.
+
+fw.mgmt
+=======
+
+Control unit firmware version. This firmware is responsible for house
+keeping tasks, PHY control etc. but not the packet-by-packet data path
+operation.
+
+fw.app
+======
+
+Data path microcode controlling high-speed packet processing.
+
+fw.undi
+=======
+
+UNDI software, may include the UEFI driver, firmware or both.
+
+fw.ncsi
+=======
+
+Version of the software responsible for supporting/handling the
+Network Controller Sideband Interface.
+
+fw.psid
+=======
+
+Unique identifier of the firmware parameter set.
diff --git a/Documentation/networking/devlink-params-bnxt.txt b/Documentation/networking/devlink-params-bnxt.txt
new file mode 100644
index 0000000..481aa30
--- /dev/null
+++ b/Documentation/networking/devlink-params-bnxt.txt
@@ -0,0 +1,18 @@
+enable_sriov [DEVICE, GENERIC]
+ Configuration mode: Permanent
+
+ignore_ari [DEVICE, GENERIC]
+ Configuration mode: Permanent
+
+msix_vec_per_pf_max [DEVICE, GENERIC]
+ Configuration mode: Permanent
+
+msix_vec_per_pf_min [DEVICE, GENERIC]
+ Configuration mode: Permanent
+
+gre_ver_check [DEVICE, DRIVER-SPECIFIC]
+ Generic Routing Encapsulation (GRE) version check will
+ be enabled in the device. If disabled, device skips
+ version checking for incoming packets.
+ Type: Boolean
+ Configuration mode: Permanent
diff --git a/Documentation/networking/devlink-params-mlxsw.txt b/Documentation/networking/devlink-params-mlxsw.txt
new file mode 100644
index 0000000..c63ea9f
--- /dev/null
+++ b/Documentation/networking/devlink-params-mlxsw.txt
@@ -0,0 +1,10 @@
+fw_load_policy [DEVICE, GENERIC]
+ Configuration mode: driverinit
+
+acl_region_rehash_interval [DEVICE, DRIVER-SPECIFIC]
+ Sets an interval for periodic ACL region rehashes.
+ The value is in milliseconds, minimal value is "3000".
+ Value "0" disables the periodic work.
+ The first rehash will be run right after value is set.
+ Type: u32
+ Configuration mode: runtime
diff --git a/Documentation/networking/devlink-params-nfp.txt b/Documentation/networking/devlink-params-nfp.txt
new file mode 100644
index 0000000..43e4d40
--- /dev/null
+++ b/Documentation/networking/devlink-params-nfp.txt
@@ -0,0 +1,5 @@
+fw_load_policy [DEVICE, GENERIC]
+ Configuration mode: permanent
+
+reset_dev_on_drv_probe [DEVICE, GENERIC]
+ Configuration mode: permanent
diff --git a/Documentation/networking/devlink-params.txt b/Documentation/networking/devlink-params.txt
new file mode 100644
index 0000000..ddba3e9
--- /dev/null
+++ b/Documentation/networking/devlink-params.txt
@@ -0,0 +1,67 @@
+Devlink configuration parameters
+================================
+Following is the list of configuration parameters via devlink interface.
+Each parameter can be generic or driver specific and are device level
+parameters.
+
+Note that the driver-specific files should contain the generic params
+they support to, with supported config modes.
+
+Each parameter can be set in different configuration modes:
+ runtime - set while driver is running, no reset required.
+ driverinit - applied while driver initializes, requires restart
+ driver by devlink reload command.
+ permanent - written to device's non-volatile memory, hard reset
+ required.
+
+Following is the list of parameters:
+====================================
+enable_sriov [DEVICE, GENERIC]
+ Enable Single Root I/O Virtualisation (SRIOV) in
+ the device.
+ Type: Boolean
+
+ignore_ari [DEVICE, GENERIC]
+ Ignore Alternative Routing-ID Interpretation (ARI)
+ capability. If enabled, adapter will ignore ARI
+ capability even when platforms has the support
+ enabled and creates same number of partitions when
+ platform does not support ARI.
+ Type: Boolean
+
+msix_vec_per_pf_max [DEVICE, GENERIC]
+ Provides the maximum number of MSIX interrupts that
+ a device can create. Value is same across all
+ physical functions (PFs) in the device.
+ Type: u32
+
+msix_vec_per_pf_min [DEVICE, GENERIC]
+ Provides the minimum number of MSIX interrupts required
+ for the device initialization. Value is same across all
+ physical functions (PFs) in the device.
+ Type: u32
+
+fw_load_policy [DEVICE, GENERIC]
+ Controls the device's firmware loading policy.
+ Valid values:
+ * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0)
+ Load firmware version preferred by the driver.
+ * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1)
+ Load firmware currently stored in flash.
+ * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DISK (2)
+ Load firmware currently available on host's disk.
+ Type: u8
+
+reset_dev_on_drv_probe [DEVICE, GENERIC]
+ Controls the device's reset policy on driver probe.
+ Valid values:
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_UNKNOWN (0)
+ Unknown or invalid value.
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_ALWAYS (1)
+ Always reset device on driver probe.
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_NEVER (2)
+ Never reset device on driver probe.
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_DISK (3)
+ Reset only if device firmware can be found in the
+ filesystem.
+ Type: u8
diff --git a/Documentation/networking/devlink-trap-netdevsim.rst b/Documentation/networking/devlink-trap-netdevsim.rst
new file mode 100644
index 0000000..b721c94
--- /dev/null
+++ b/Documentation/networking/devlink-trap-netdevsim.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Devlink Trap netdevsim
+======================
+
+Driver-specific Traps
+=====================
+
+.. list-table:: List of Driver-specific Traps Registered by ``netdevsim``
+ :widths: 5 5 90
+
+ * - Name
+ - Type
+ - Description
+ * - ``fid_miss``
+ - ``exception``
+ - When a packet enters the device it is classified to a filtering
+ indentifier (FID) based on the ingress port and VLAN. This trap is used
+ to trap packets for which a FID could not be found
diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
new file mode 100644
index 0000000..8e90a85
--- /dev/null
+++ b/Documentation/networking/devlink-trap.rst
@@ -0,0 +1,209 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Devlink Trap
+============
+
+Background
+==========
+
+Devices capable of offloading the kernel's datapath and perform functions such
+as bridging and routing must also be able to send specific packets to the
+kernel (i.e., the CPU) for processing.
+
+For example, a device acting as a multicast-aware bridge must be able to send
+IGMP membership reports to the kernel for processing by the bridge module.
+Without processing such packets, the bridge module could never populate its
+MDB.
+
+As another example, consider a device acting as router which has received an IP
+packet with a TTL of 1. Upon routing the packet the device must send it to the
+kernel so that it will route it as well and generate an ICMP Time Exceeded
+error datagram. Without letting the kernel route such packets itself, utilities
+such as ``traceroute`` could never work.
+
+The fundamental ability of sending certain packets to the kernel for processing
+is called "packet trapping".
+
+Overview
+========
+
+The ``devlink-trap`` mechanism allows capable device drivers to register their
+supported packet traps with ``devlink`` and report trapped packets to
+``devlink`` for further analysis.
+
+Upon receiving trapped packets, ``devlink`` will perform a per-trap packets and
+bytes accounting and potentially report the packet to user space via a netlink
+event along with all the provided metadata (e.g., trap reason, timestamp, input
+port). This is especially useful for drop traps (see :ref:`Trap-Types`)
+as it allows users to obtain further visibility into packet drops that would
+otherwise be invisible.
+
+The following diagram provides a general overview of ``devlink-trap``::
+
+ Netlink event: Packet w/ metadata
+ Or a summary of recent drops
+ ^
+ |
+ Userspace |
+ +---------------------------------------------------+
+ Kernel |
+ |
+ +-------+--------+
+ | |
+ | drop_monitor |
+ | |
+ +-------^--------+
+ |
+ |
+ |
+ +----+----+
+ | | Kernel's Rx path
+ | devlink | (non-drop traps)
+ | |
+ +----^----+ ^
+ | |
+ +-----------+
+ |
+ +-------+-------+
+ | |
+ | Device driver |
+ | |
+ +-------^-------+
+ Kernel |
+ +---------------------------------------------------+
+ Hardware |
+ | Trapped packet
+ |
+ +--+---+
+ | |
+ | ASIC |
+ | |
+ +------+
+
+.. _Trap-Types:
+
+Trap Types
+==========
+
+The ``devlink-trap`` mechanism supports the following packet trap types:
+
+ * ``drop``: Trapped packets were dropped by the underlying device. Packets
+ are only processed by ``devlink`` and not injected to the kernel's Rx path.
+ The trap action (see :ref:`Trap-Actions`) can be changed.
+ * ``exception``: Trapped packets were not forwarded as intended by the
+ underlying device due to an exception (e.g., TTL error, missing neighbour
+ entry) and trapped to the control plane for resolution. Packets are
+ processed by ``devlink`` and injected to the kernel's Rx path. Changing the
+ action of such traps is not allowed, as it can easily break the control
+ plane.
+
+.. _Trap-Actions:
+
+Trap Actions
+============
+
+The ``devlink-trap`` mechanism supports the following packet trap actions:
+
+ * ``trap``: The sole copy of the packet is sent to the CPU.
+ * ``drop``: The packet is dropped by the underlying device and a copy is not
+ sent to the CPU.
+
+Generic Packet Traps
+====================
+
+Generic packet traps are used to describe traps that trap well-defined packets
+or packets that are trapped due to well-defined conditions (e.g., TTL error).
+Such traps can be shared by multiple device drivers and their description must
+be added to the following table:
+
+.. list-table:: List of Generic Packet Traps
+ :widths: 5 5 90
+
+ * - Name
+ - Type
+ - Description
+ * - ``source_mac_is_multicast``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop because of a
+ multicast source MAC
+ * - ``vlan_tag_mismatch``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop in case of VLAN
+ tag mismatch: The ingress bridge port is not configured with a PVID and
+ the packet is untagged or prio-tagged
+ * - ``ingress_vlan_filter``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop in case they are
+ tagged with a VLAN that is not configured on the ingress bridge port
+ * - ``ingress_spanning_tree_filter``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop in case the STP
+ state of the ingress bridge port is not "forwarding"
+ * - ``port_list_is_empty``
+ - ``drop``
+ - Traps packets that the device decided to drop in case they need to be
+ flooded (e.g., unknown unicast, unregistered multicast) and there are
+ no ports the packets should be flooded to
+ * - ``port_loopback_filter``
+ - ``drop``
+ - Traps packets that the device decided to drop in case after layer 2
+ forwarding the only port from which they should be transmitted through
+ is the port from which they were received
+ * - ``blackhole_route``
+ - ``drop``
+ - Traps packets that the device decided to drop in case they hit a
+ blackhole route
+ * - ``ttl_value_is_too_small``
+ - ``exception``
+ - Traps unicast packets that should be forwarded by the device whose TTL
+ was decremented to 0 or less
+ * - ``tail_drop``
+ - ``drop``
+ - Traps packets that the device decided to drop because they could not be
+ enqueued to a transmission queue which is full
+
+Driver-specific Packet Traps
+============================
+
+Device drivers can register driver-specific packet traps, but these must be
+clearly documented. Such traps can correspond to device-specific exceptions and
+help debug packet drops caused by these exceptions. The following list includes
+links to the description of driver-specific traps registered by various device
+drivers:
+
+ * :doc:`/devlink-trap-netdevsim`
+
+Generic Packet Trap Groups
+==========================
+
+Generic packet trap groups are used to aggregate logically related packet
+traps. These groups allow the user to batch operations such as setting the trap
+action of all member traps. In addition, ``devlink-trap`` can report aggregated
+per-group packets and bytes statistics, in case per-trap statistics are too
+narrow. The description of these groups must be added to the following table:
+
+.. list-table:: List of Generic Packet Trap Groups
+ :widths: 10 90
+
+ * - Name
+ - Description
+ * - ``l2_drops``
+ - Contains packet traps for packets that were dropped by the device during
+ layer 2 forwarding (i.e., bridge)
+ * - ``l3_drops``
+ - Contains packet traps for packets that were dropped by the device or hit
+ an exception (e.g., TTL error) during layer 3 forwarding
+ * - ``buffer_drops``
+ - Contains packet traps for packets that were dropped by the device due to
+ an enqueue decision
+
+Testing
+=======
+
+See ``tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh`` for a
+test covering the core infrastructure. Test cases should be added for any new
+functionality.
+
+Device drivers should focus their tests on device-specific functionality, such
+as the triggering of supported packet traps.
diff --git a/Documentation/networking/dsa/b53.rst b/Documentation/networking/dsa/b53.rst
new file mode 100644
index 0000000..b41637c
--- /dev/null
+++ b/Documentation/networking/dsa/b53.rst
@@ -0,0 +1,183 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+Broadcom RoboSwitch Ethernet switch driver
+==========================================
+
+The Broadcom RoboSwitch Ethernet switch family is used in quite a range of
+xDSL router, cable modems and other multimedia devices.
+
+The actual implementation supports the devices BCM5325E, BCM5365, BCM539x,
+BCM53115 and BCM53125 as well as BCM63XX.
+
+Implementation details
+======================
+
+The driver is located in ``drivers/net/dsa/b53/`` and is implemented as a
+DSA driver; see ``Documentation/networking/dsa/dsa.rst`` for details on the
+subsystem and what it provides.
+
+The switch is, if possible, configured to enable a Broadcom specific 4-bytes
+switch tag which gets inserted by the switch for every packet forwarded to the
+CPU interface, conversely, the CPU network interface should insert a similar
+tag for packets entering the CPU port. The tag format is described in
+``net/dsa/tag_brcm.c``.
+
+The configuration of the device depends on whether or not tagging is
+supported.
+
+The interface names and example network configuration are used according the
+configuration described in the :ref:`dsa-config-showcases`.
+
+Configuration with tagging support
+----------------------------------
+
+The tagging based configuration is desired. It is not specific to the b53
+DSA driver and will work like all DSA drivers which supports tagging.
+
+See :ref:`dsa-tagged-configuration`.
+
+Configuration without tagging support
+-------------------------------------
+
+Older models (5325, 5365) support a different tag format that is not supported
+yet. 539x and 531x5 require managed mode and some special handling, which is
+also not yet supported. The tagging support is disabled in these cases and the
+switch need a different configuration.
+
+The configuration slightly differ from the :ref:`dsa-vlan-configuration`.
+
+The b53 tags the CPU port in all VLANs, since otherwise any PVID untagged
+VLAN programming would basically change the CPU port's default PVID and make
+it untagged, undesirable.
+
+In difference to the configuration described in :ref:`dsa-vlan-configuration`
+the default VLAN 1 has to be removed from the slave interface configuration in
+single port and gateway configuration, while there is no need to add an extra
+VLAN configuration in the bridge showcase.
+
+single port
+~~~~~~~~~~~
+The configuration can only be set up via VLAN tagging and bridge setup.
+By default packages are tagged with vid 1:
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+ ip link add link eth0 name eth0.3 type vlan id 3
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+ ip link set eth0.3 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev wan master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 2 pvid untagged
+ bridge vlan del dev lan1 vid 1
+ bridge vlan add dev lan2 vid 3 pvid untagged
+ bridge vlan del dev lan2 vid 1
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.1
+ ip addr add 192.0.2.5/30 dev eth0.2
+ ip addr add 192.0.2.9/30 dev eth0.3
+
+ # bring up the bridge devices
+ ip link set br0 up
+
+
+bridge
+~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridge
+ ip link set dev wan master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set eth0.1 master br0
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+gateway
+~~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev wan master br0
+ ip link set eth0.1 master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev wan vid 2 pvid untagged
+ bridge vlan del dev wan vid 1
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.2
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge devices
+ ip link set br0 up
diff --git a/Documentation/networking/dsa/bcm_sf2.txt b/Documentation/networking/dsa/bcm_sf2.rst
similarity index 83%
rename from Documentation/networking/dsa/bcm_sf2.txt
rename to Documentation/networking/dsa/bcm_sf2.rst
index eba3a24..dee2340 100644
--- a/Documentation/networking/dsa/bcm_sf2.txt
+++ b/Documentation/networking/dsa/bcm_sf2.rst
@@ -1,3 +1,4 @@
+=============================================
Broadcom Starfighter 2 Ethernet switch driver
=============================================
@@ -25,27 +26,27 @@
The switch hardware block is typically interfaced using MMIO accesses and
contains a bunch of sub-blocks/registers:
-* SWITCH_CORE: common switch registers
-* SWITCH_REG: external interfaces switch register
-* SWITCH_MDIO: external MDIO bus controller (there is another one in SWITCH_CORE,
+- ``SWITCH_CORE``: common switch registers
+- ``SWITCH_REG``: external interfaces switch register
+- ``SWITCH_MDIO``: external MDIO bus controller (there is another one in SWITCH_CORE,
which is used for indirect PHY accesses)
-* SWITCH_INDIR_RW: 64-bits wide register helper block
-* SWITCH_INTRL2_0/1: Level-2 interrupt controllers
-* SWITCH_ACB: Admission control block
-* SWITCH_FCB: Fail-over control block
+- ``SWITCH_INDIR_RW``: 64-bits wide register helper block
+- ``SWITCH_INTRL2_0/1``: Level-2 interrupt controllers
+- ``SWITCH_ACB``: Admission control block
+- ``SWITCH_FCB``: Fail-over control block
Implementation details
======================
-The driver is located in drivers/net/dsa/bcm_sf2.c and is implemented as a DSA
-driver; see Documentation/networking/dsa/dsa.txt for details on the subsystem
+The driver is located in ``drivers/net/dsa/bcm_sf2.c`` and is implemented as a DSA
+driver; see ``Documentation/networking/dsa/dsa.rst`` for details on the subsystem
and what it provides.
The SF2 switch is configured to enable a Broadcom specific 4-bytes switch tag
which gets inserted by the switch for every packet forwarded to the CPU
interface, conversely, the CPU network interface should insert a similar tag for
packets entering the CPU port. The tag format is described in
-net/dsa/tag_brcm.c.
+``net/dsa/tag_brcm.c``.
Overall, the SF2 driver is a fairly regular DSA driver; there are a few
specifics covered below.
@@ -54,7 +55,7 @@
-------------------
The DSA platform device driver is probed using a specific compatible string
-provided in net/dsa/dsa.c. The reason for that is because the DSA subsystem gets
+provided in ``net/dsa/dsa.c``. The reason for that is because the DSA subsystem gets
registered as a platform device driver currently. DSA will provide the needed
device_node pointers which are then accessible by the switch driver setup
function to setup resources such as register ranges and interrupts. This
@@ -70,7 +71,7 @@
in order to properly configure them. By default, the SF2 pseudo-PHY address, and
an external switch pseudo-PHY address will both be snooping for incoming MDIO
transactions, since they are at the same address (30), resulting in some kind of
-"double" programming. Using DSA, and setting ds->phys_mii_mask accordingly, we
+"double" programming. Using DSA, and setting ``ds->phys_mii_mask`` accordingly, we
selectively divert reads and writes towards external Broadcom switches
pseudo-PHY addresses. Newer revisions of the SF2 hardware have introduced a
configurable pseudo-PHY address which circumvents the initial design limitation.
@@ -86,7 +87,7 @@
MoCA interface carrier state and properly report this to the networking stack.
The MoCA interfaces are supported using the PHY library's fixed PHY/emulated PHY
-device and the switch driver registers a fixed_link_update callback for such
+device and the switch driver registers a ``fixed_link_update`` callback for such
PHYs which reflects the link state obtained from the interrupt handler.
diff --git a/Documentation/networking/dsa/configuration.rst b/Documentation/networking/dsa/configuration.rst
new file mode 100644
index 0000000..af029b3
--- /dev/null
+++ b/Documentation/networking/dsa/configuration.rst
@@ -0,0 +1,292 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+DSA switch configuration from userspace
+=======================================
+
+The DSA switch configuration is not integrated into the main userspace
+network configuration suites by now and has to be performed manualy.
+
+.. _dsa-config-showcases:
+
+Configuration showcases
+-----------------------
+
+To configure a DSA switch a couple of commands need to be executed. In this
+documentation some common configuration scenarios are handled as showcases:
+
+*single port*
+ Every switch port acts as a different configurable Ethernet port
+
+*bridge*
+ Every switch port is part of one configurable Ethernet bridge
+
+*gateway*
+ Every switch port except one upstream port is part of a configurable
+ Ethernet bridge.
+ The upstream port acts as different configurable Ethernet port.
+
+All configurations are performed with tools from iproute2, which is available
+at https://www.kernel.org/pub/linux/utils/net/iproute2/
+
+Through DSA every port of a switch is handled like a normal linux Ethernet
+interface. The CPU port is the switch port connected to an Ethernet MAC chip.
+The corresponding linux Ethernet interface is called the master interface.
+All other corresponding linux interfaces are called slave interfaces.
+
+The slave interfaces depend on the master interface. They can only brought up,
+when the master interface is up.
+
+In this documentation the following Ethernet interfaces are used:
+
+*eth0*
+ the master interface
+
+*lan1*
+ a slave interface
+
+*lan2*
+ another slave interface
+
+*lan3*
+ a third slave interface
+
+*wan*
+ A slave interface dedicated for upstream traffic
+
+Further Ethernet interfaces can be configured similar.
+The configured IPs and networks are:
+
+*single port*
+ * lan1: 192.0.2.1/30 (192.0.2.0 - 192.0.2.3)
+ * lan2: 192.0.2.5/30 (192.0.2.4 - 192.0.2.7)
+ * lan3: 192.0.2.9/30 (192.0.2.8 - 192.0.2.11)
+
+*bridge*
+ * br0: 192.0.2.129/25 (192.0.2.128 - 192.0.2.255)
+
+*gateway*
+ * br0: 192.0.2.129/25 (192.0.2.128 - 192.0.2.255)
+ * wan: 192.0.2.1/30 (192.0.2.0 - 192.0.2.3)
+
+.. _dsa-tagged-configuration:
+
+Configuration with tagging support
+----------------------------------
+
+The tagging based configuration is desired and supported by the majority of
+DSA switches. These switches are capable to tag incoming and outgoing traffic
+without using a VLAN based configuration.
+
+single port
+~~~~~~~~~~~
+
+.. code-block:: sh
+
+ # configure each interface
+ ip addr add 192.0.2.1/30 dev lan1
+ ip addr add 192.0.2.5/30 dev lan2
+ ip addr add 192.0.2.9/30 dev lan3
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan2 up
+ ip link set lan3 up
+
+bridge
+~~~~~~
+
+.. code-block:: sh
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan2 up
+ ip link set lan3 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # add ports to bridge
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set dev lan3 master br0
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+gateway
+~~~~~~~
+
+.. code-block:: sh
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # configure the upstream port
+ ip addr add 192.0.2.1/30 dev wan
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # add ports to bridge
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+.. _dsa-vlan-configuration:
+
+Configuration without tagging support
+-------------------------------------
+
+A minority of switches are not capable to use a taging protocol
+(DSA_TAG_PROTO_NONE). These switches can be configured by a VLAN based
+configuration.
+
+single port
+~~~~~~~~~~~
+The configuration can only be set up via VLAN tagging and bridge setup.
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+ ip link add link eth0 name eth0.3 type vlan id 3
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+ ip link set eth0.3 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan1 up
+ ip link set lan3 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set dev lan3 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 1 pvid untagged
+ bridge vlan add dev lan2 vid 2 pvid untagged
+ bridge vlan add dev lan3 vid 3 pvid untagged
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.1
+ ip addr add 192.0.2.5/30 dev eth0.2
+ ip addr add 192.0.2.9/30 dev eth0.3
+
+ # bring up the bridge devices
+ ip link set br0 up
+
+
+bridge
+~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan2 up
+ ip link set lan3 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridge
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set dev lan3 master br0
+ ip link set eth0.1 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 1 pvid untagged
+ bridge vlan add dev lan2 vid 1 pvid untagged
+ bridge vlan add dev lan3 vid 1 pvid untagged
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+gateway
+~~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev wan master br0
+ ip link set eth0.1 master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 1 pvid untagged
+ bridge vlan add dev lan2 vid 1 pvid untagged
+ bridge vlan add dev wan vid 2 pvid untagged
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.2
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge devices
+ ip link set br0 up
diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.rst
similarity index 67%
rename from Documentation/networking/dsa/dsa.txt
rename to Documentation/networking/dsa/dsa.rst
index 25170ad..563d56c 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.rst
@@ -1,10 +1,8 @@
-Distributed Switch Architecture
-===============================
-
-Introduction
+============
+Architecture
============
-This document describes the Distributed Switch Architecture (DSA) subsystem
+This document describes the **Distributed Switch Architecture (DSA)** subsystem
design principles, limitations, interactions with other subsystems, and how to
develop drivers for this subsystem as well as a TODO for developers interested
in joining the effort.
@@ -70,11 +68,11 @@
DSA currently supports 5 different tagging protocols, and a tag-less mode as
well. The different protocols are implemented in:
-net/dsa/tag_trailer.c: Marvell's 4 trailer tag mode (legacy)
-net/dsa/tag_dsa.c: Marvell's original DSA tag
-net/dsa/tag_edsa.c: Marvell's enhanced DSA tag
-net/dsa/tag_brcm.c: Broadcom's 4 bytes tag
-net/dsa/tag_qca.c: Qualcomm's 2 bytes tag
+- ``net/dsa/tag_trailer.c``: Marvell's 4 trailer tag mode (legacy)
+- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag
+- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag
+- ``net/dsa/tag_brcm.c``: Broadcom's 4 bytes tag
+- ``net/dsa/tag_qca.c``: Qualcomm's 2 bytes tag
The exact format of the tag protocol is vendor specific, but in general, they
all contain something which:
@@ -89,7 +87,7 @@
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry standard drivers:
-e1000e, mv643xx_eth etc. without having to introduce modifications to these
+``e1000e,`` ``mv643xx_eth`` etc. without having to introduce modifications to these
drivers. Such network devices are also often referred to as conduit network
devices since they act as a pipe between the host processor and the hardware
Ethernet switch.
@@ -100,40 +98,42 @@
When a master netdev is used with DSA, a small hook is placed in in the
networking stack is in order to have the DSA subsystem process the Ethernet
switch specific tagging protocol. DSA accomplishes this by registering a
-specific (and fake) Ethernet type (later becoming skb->protocol) with the
-networking stack, this is also known as a ptype or packet_type. A typical
+specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the
+networking stack, this is also known as a ``ptype`` or ``packet_type``. A typical
Ethernet Frame receive sequence looks like this:
Master network device (e.g.: e1000e):
-Receive interrupt fires:
-- receive function is invoked
-- basic packet processing is done: getting length, status etc.
-- packet is prepared to be processed by the Ethernet layer by calling
- eth_type_trans
+1. Receive interrupt fires:
-net/ethernet/eth.c:
+ - receive function is invoked
+ - basic packet processing is done: getting length, status etc.
+ - packet is prepared to be processed by the Ethernet layer by calling
+ ``eth_type_trans``
-eth_type_trans(skb, dev)
- if (dev->dsa_ptr != NULL)
- -> skb->protocol = ETH_P_XDSA
+2. net/ethernet/eth.c::
-drivers/net/ethernet/*:
+ eth_type_trans(skb, dev)
+ if (dev->dsa_ptr != NULL)
+ -> skb->protocol = ETH_P_XDSA
-netif_receive_skb(skb)
- -> iterate over registered packet_type
- -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()
+3. drivers/net/ethernet/\*::
-net/dsa/dsa.c:
- -> dsa_switch_rcv()
- -> invoke switch tag specific protocol handler in
- net/dsa/tag_*.c
+ netif_receive_skb(skb)
+ -> iterate over registered packet_type
+ -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()
-net/dsa/tag_*.c:
- -> inspect and strip switch tag protocol to determine originating port
- -> locate per-port network device
- -> invoke eth_type_trans() with the DSA slave network device
- -> invoked netif_receive_skb()
+4. net/dsa/dsa.c::
+
+ -> dsa_switch_rcv()
+ -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c'
+
+5. net/dsa/tag_*.c:
+
+ - inspect and strip switch tag protocol to determine originating port
+ - locate per-port network device
+ - invoke ``eth_type_trans()`` with the DSA slave network device
+ - invoked ``netif_receive_skb()``
Past this point, the DSA slave network devices get delivered regular Ethernet
frames that can be processed by the networking stack.
@@ -162,7 +162,7 @@
switch tag in the Ethernet frames.
These frames are then queued for transmission using the master network device
-ndo_start_xmit() function, since they contain the appropriate switch tag, the
+``ndo_start_xmit()`` function, since they contain the appropriate switch tag, the
Ethernet switch will be able to process these incoming frames from the
management interface and delivers these frames to the physical switch port.
@@ -170,23 +170,25 @@
------------------------
Summarized, this is basically how DSA looks like from a network device
-perspective:
+perspective::
- |---------------------------
- | CPU network device (eth0)|
- ----------------------------
- | <tag added by switch |
- | |
- | |
- | tag added by CPU> |
- |--------------------------------------------|
- | Switch driver |
- |--------------------------------------------|
- || || ||
- |-------| |-------| |-------|
- | sw0p0 | | sw0p1 | | sw0p2 |
- |-------| |-------| |-------|
+ |---------------------------
+ | CPU network device (eth0)|
+ ----------------------------
+ | <tag added by switch |
+ | |
+ | |
+ | tag added by CPU> |
+ |--------------------------------------------|
+ | Switch driver |
+ |--------------------------------------------|
+ || || ||
+ |-------| |-------| |-------|
+ | sw0p0 | | sw0p1 | | sw0p2 |
+ |-------| |-------| |-------|
+
+
Slave MDIO bus
--------------
@@ -207,53 +209,41 @@
Data structures
---------------
-DSA data structures are defined in include/net/dsa.h as well as
-net/dsa/dsa_priv.h.
+DSA data structures are defined in ``include/net/dsa.h`` as well as
+``net/dsa/dsa_priv.h``:
-dsa_chip_data: platform data configuration for a given switch device, this
-structure describes a switch device's parent device, its address, as well as
-various properties of its ports: names/labels, and finally a routing table
-indication (when cascading switches)
+- ``dsa_chip_data``: platform data configuration for a given switch device,
+ this structure describes a switch device's parent device, its address, as
+ well as various properties of its ports: names/labels, and finally a routing
+ table indication (when cascading switches)
-dsa_platform_data: platform device configuration data which can reference a
-collection of dsa_chip_data structure if multiples switches are cascaded, the
-master network device this switch tree is attached to needs to be referenced
+- ``dsa_platform_data``: platform device configuration data which can reference
+ a collection of dsa_chip_data structure if multiples switches are cascaded,
+ the master network device this switch tree is attached to needs to be
+ referenced
-dsa_switch_tree: structure assigned to the master network device under
-"dsa_ptr", this structure references a dsa_platform_data structure as well as
-the tagging protocol supported by the switch tree, and which receive/transmit
-function hooks should be invoked, information about the directly attached switch
-is also provided: CPU port. Finally, a collection of dsa_switch are referenced
-to address individual switches in the tree.
+- ``dsa_switch_tree``: structure assigned to the master network device under
+ ``dsa_ptr``, this structure references a dsa_platform_data structure as well as
+ the tagging protocol supported by the switch tree, and which receive/transmit
+ function hooks should be invoked, information about the directly attached
+ switch is also provided: CPU port. Finally, a collection of dsa_switch are
+ referenced to address individual switches in the tree.
-dsa_switch: structure describing a switch device in the tree, referencing a
-dsa_switch_tree as a backpointer, slave network devices, master network device,
-and a reference to the backing dsa_switch_ops
+- ``dsa_switch``: structure describing a switch device in the tree, referencing
+ a ``dsa_switch_tree`` as a backpointer, slave network devices, master network
+ device, and a reference to the backing``dsa_switch_ops``
-dsa_switch_ops: structure referencing function pointers, see below for a full
-description.
+- ``dsa_switch_ops``: structure referencing function pointers, see below for a
+ full description.
Design limitations
==================
-DSA is a platform device driver
--------------------------------
-
-DSA is implemented as a DSA platform device driver which is convenient because
-it will register the entire DSA switch tree attached to a master network device
-in one-shot, facilitating the device creation and simplifying the device driver
-model a bit, this comes however with a number of limitations:
-
-- building DSA and its switch drivers as modules is currently not working
-- the device driver parenting does not necessarily reflect the original
- bus/device the switch can be created from
-- supporting non-MDIO and non-MMIO (platform) switches is not possible
-
Limits on the number of devices and ports
-----------------------------------------
DSA currently limits the number of maximum switches within a tree to 4
-(DSA_MAX_SWITCHES), and the number of ports per switch to 12 (DSA_MAX_PORTS).
+(``DSA_MAX_SWITCHES``), and the number of ports per switch to 12 (``DSA_MAX_PORTS``).
These limits could be extended to support larger configurations would this need
arise.
@@ -292,15 +282,15 @@
DSA currently leverages the following subsystems:
-- MDIO/PHY library: drivers/net/phy/phy.c, mdio_bus.c
-- Switchdev: net/switchdev/*
+- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c``
+- Switchdev:``net/switchdev/*``
- Device Tree for various of_* functions
MDIO/PHY library
----------------
Slave network devices exposed by DSA may or may not be interfacing with PHY
-devices (struct phy_device as defined in include/linux/phy.h), but the DSA
+devices (``struct phy_device`` as defined in ``include/linux/phy.h)``, but the DSA
subsystem deals with all possible combinations:
- internal PHY devices, built into the Ethernet switch hardware
@@ -309,16 +299,16 @@
- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a
fixed PHYs
-The PHY configuration is done by the dsa_slave_phy_setup() function and the
+The PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the
logic basically looks like this:
- if Device Tree is used, the PHY device is looked up using the standard
"phy-handle" property, if found, this PHY device is created and registered
- using of_phy_connect()
+ using ``of_phy_connect()``
- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
the definition of a non-MDIO managed PHY as defined in
- Documentation/devicetree/bindings/net/fixed-link.txt, the PHY is registered
+ ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered
and connected transparently using the special fixed MDIO bus driver
- finally, if the PHY is built into the switch, as is very common with
@@ -344,8 +334,8 @@
-----------
DSA features a standardized binding which is documented in
-Documentation/devicetree/bindings/net/dsa/dsa.txt. PHY/MDIO library helper
-functions such as of_get_phy_mode(), of_phy_connect() are also used to query
+``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper
+functions such as ``of_get_phy_mode()``, ``of_phy_connect()`` are also used to query
per-port PHY specific details: interface connection, MDIO bus location etc..
Driver development
@@ -354,8 +344,8 @@
DSA switch drivers need to implement a dsa_switch_ops structure which will
contain the various members described below.
-register_switch_driver() registers this dsa_switch_ops in its internal list
-of drivers to probe for. unregister_switch_driver() does the exact opposite.
+``register_switch_driver()`` registers this dsa_switch_ops in its internal list
+of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite.
Unless requested differently by setting the priv_size member accordingly, DSA
does not allocate any driver private context space.
@@ -363,17 +353,17 @@
Switch configuration
--------------------
-- tag_protocol: this is to indicate what kind of tagging protocol is supported,
- should be a valid value from the dsa_tag_protocol enum
+- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported,
+ should be a valid value from the ``dsa_tag_protocol`` enum
-- probe: probe routine which will be invoked by the DSA platform device upon
+- ``probe``: probe routine which will be invoked by the DSA platform device upon
registration to test for the presence/absence of a switch device. For MDIO
devices, it is recommended to issue a read towards internal registers using
the switch pseudo-PHY and return whether this is a supported device. For other
buses, return a non-NULL string
-- setup: setup function for the switch, this function is responsible for setting
- up the dsa_switch_ops private structure with all it needs: register maps,
+- ``setup``: setup function for the switch, this function is responsible for setting
+ up the ``dsa_switch_ops`` private structure with all it needs: register maps,
interrupts, mutexes, locks etc.. This function is also expected to properly
configure the switch to separate all network interfaces from each other, that
is, they should be isolated by the switch hardware itself, typically by creating
@@ -388,27 +378,27 @@
PHY devices and link management
-------------------------------
-- get_phy_flags: Some switches are interfaced to various kinds of Ethernet PHYs,
+- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet PHYs,
if the PHY library PHY driver needs to know about information it cannot obtain
on its own (e.g.: coming from switch memory mapped registers), this function
should return a 32-bits bitmask of "flags", that is private between the switch
- driver and the Ethernet PHY driver in drivers/net/phy/*.
+ driver and the Ethernet PHY driver in ``drivers/net/phy/\*``.
-- phy_read: Function invoked by the DSA slave MDIO bus when attempting to read
+- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read
the switch port MDIO registers. If unavailable, return 0xffff for each read.
For builtin switch Ethernet PHYs, this function should allow reading the link
status, auto-negotiation results, link partner pages etc..
-- phy_write: Function invoked by the DSA slave MDIO bus when attempting to write
+- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write
to the switch port MDIO registers. If unavailable return a negative error
code.
-- adjust_link: Function invoked by the PHY library when a slave network device
+- ``adjust_link``: Function invoked by the PHY library when a slave network device
is attached to a PHY device. This function is responsible for appropriately
configuring the switch port link parameters: speed, duplex, pause based on
- what the phy_device is providing.
+ what the ``phy_device`` is providing.
-- fixed_link_update: Function invoked by the PHY library, and specifically by
+- ``fixed_link_update``: Function invoked by the PHY library, and specifically by
the fixed PHY driver asking the switch driver for link parameters that could
not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
This is particularly useful for specific kinds of hardware such as QSGMII,
@@ -418,87 +408,87 @@
Ethtool operations
------------------
-- get_strings: ethtool function used to query the driver's strings, will
+- ``get_strings``: ethtool function used to query the driver's strings, will
typically return statistics strings, private flags strings etc.
-- get_ethtool_stats: ethtool function used to query per-port statistics and
+- ``get_ethtool_stats``: ethtool function used to query per-port statistics and
return their values. DSA overlays slave network devices general statistics:
RX/TX counters from the network device, with switch driver specific statistics
per port
-- get_sset_count: ethtool function used to query the number of statistics items
+- ``get_sset_count``: ethtool function used to query the number of statistics items
-- get_wol: ethtool function used to obtain Wake-on-LAN settings per-port, this
+- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port, this
function may, for certain implementations also query the master network device
Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN
-- set_wol: ethtool function used to configure Wake-on-LAN settings per-port,
+- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port,
direct counterpart to set_wol with similar restrictions
-- set_eee: ethtool function which is used to configure a switch port EEE (Green
+- ``set_eee``: ethtool function which is used to configure a switch port EEE (Green
Ethernet) settings, can optionally invoke the PHY library to enable EEE at the
PHY level if relevant. This function should enable EEE at the switch port MAC
controller and data-processing logic
-- get_eee: ethtool function which is used to query a switch port EEE settings,
+- ``get_eee``: ethtool function which is used to query a switch port EEE settings,
this function should return the EEE state of the switch port MAC controller
and data-processing logic as well as query the PHY for its currently configured
EEE settings
-- get_eeprom_len: ethtool function returning for a given switch the EEPROM
+- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM
length/size in bytes
-- get_eeprom: ethtool function returning for a given switch the EEPROM contents
+- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents
-- set_eeprom: ethtool function writing specified data to a given switch EEPROM
+- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM
-- get_regs_len: ethtool function returning the register length for a given
+- ``get_regs_len``: ethtool function returning the register length for a given
switch
-- get_regs: ethtool function returning the Ethernet switch internal register
+- ``get_regs``: ethtool function returning the Ethernet switch internal register
contents. This function might require user-land code in ethtool to
pretty-print register values and registers
Power management
----------------
-- suspend: function invoked by the DSA platform device when the system goes to
+- ``suspend``: function invoked by the DSA platform device when the system goes to
suspend, should quiesce all Ethernet switch activities, but keep ports
participating in Wake-on-LAN active as well as additional wake-up logic if
supported
-- resume: function invoked by the DSA platform device when the system resumes,
+- ``resume``: function invoked by the DSA platform device when the system resumes,
should resume all Ethernet switch activities and re-configure the switch to be
in a fully active state
-- port_enable: function invoked by the DSA slave network device ndo_open
+- ``port_enable``: function invoked by the DSA slave network device ndo_open
function when a port is administratively brought up, this function should be
fully enabling a given switch port. DSA takes care of marking the port with
- BR_STATE_BLOCKING if the port is a bridge member, or BR_STATE_FORWARDING if it
+ ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING`` if it
was not, and propagating these changes down to the hardware
-- port_disable: function invoked by the DSA slave network device ndo_close
+- ``port_disable``: function invoked by the DSA slave network device ndo_close
function when a port is administratively brought down, this function should be
fully disabling a given switch port. DSA takes care of marking the port with
- BR_STATE_DISABLED and propagating changes to the hardware if this port is
+ ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is
disabled while being a bridge member
Bridge layer
------------
-- port_bridge_join: bridge layer function invoked when a given switch port is
+- ``port_bridge_join``: bridge layer function invoked when a given switch port is
added to a bridge, this function should be doing the necessary at the switch
level to permit the joining port from being added to the relevant logical
domain for it to ingress/egress traffic with other members of the bridge.
-- port_bridge_leave: bridge layer function invoked when a given switch port is
+- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
removed from a bridge, this function should be doing the necessary at the
switch level to deny the leaving port from ingress/egress traffic from the
remaining bridge members. When the port leaves the bridge, it should be aged
out at the switch hardware for the switch to (re) learn MAC addresses behind
this port.
-- port_stp_state_set: bridge layer function invoked when a given switch port STP
+- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP
state is computed by the bridge layer and should be propagated to switch
hardware to forward/block/learn traffic. The switch driver is responsible for
computing a STP state change based on current and asked parameters and perform
@@ -507,7 +497,7 @@
Bridge VLAN filtering
---------------------
-- port_vlan_filtering: bridge layer function invoked when the bridge gets
+- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets
configured for turning on or off VLAN filtering. If nothing specific needs to
be done at the hardware level, this callback does not need to be implemented.
When VLAN filtering is turned on, the hardware must be programmed with
@@ -517,65 +507,61 @@
accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are
allowed.
-- port_vlan_prepare: bridge layer function invoked when the bridge prepares the
+- ``port_vlan_prepare``: bridge layer function invoked when the bridge prepares the
configuration of a VLAN on the given port. If the operation is not supported
- by the hardware, this function should return -EOPNOTSUPP to inform the bridge
+ by the hardware, this function should return ``-EOPNOTSUPP`` to inform the bridge
code to fallback to a software implementation. No hardware setup must be done
in this function. See port_vlan_add for this and details.
-- port_vlan_add: bridge layer function invoked when a VLAN is configured
+- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
(tagged or untagged) for the given switch port
-- port_vlan_del: bridge layer function invoked when a VLAN is removed from the
+- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
given switch port
-- port_vlan_dump: bridge layer function invoked with a switchdev callback
+- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each VLAN the given port is a member
of. A switchdev object is used to carry the VID and bridge flags.
-- port_fdb_prepare: bridge layer function invoked when the bridge prepares the
- installation of a Forwarding Database entry. If the operation is not
- supported, this function should return -EOPNOTSUPP to inform the bridge code
- to fallback to a software implementation. No hardware setup must be done in
- this function. See port_fdb_add for this and details.
-
-- port_fdb_add: bridge layer function invoked when the bridge wants to install a
+- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a
Forwarding Database entry, the switch hardware should be programmed with the
specified address in the specified VLAN Id in the forwarding database
- associated with this VLAN ID
+ associated with this VLAN ID. If the operation is not supported, this
+ function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to
+ a software implementation.
-Note: VLAN ID 0 corresponds to the port private database, which, in the context
-of DSA, would be the its port-based VLAN, used by the associated bridge device.
+.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
+ of DSA, would be its port-based VLAN, used by the associated bridge device.
-- port_fdb_del: bridge layer function invoked when the bridge wants to remove a
+- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
Forwarding Database entry, the switch hardware should be programmed to delete
the specified MAC address from the specified VLAN ID if it was mapped into
this port forwarding database
-- port_fdb_dump: bridge layer function invoked with a switchdev callback
+- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and FDB info.
-- port_mdb_prepare: bridge layer function invoked when the bridge prepares the
+- ``port_mdb_prepare``: bridge layer function invoked when the bridge prepares the
installation of a multicast database entry. If the operation is not supported,
- this function should return -EOPNOTSUPP to inform the bridge code to fallback
+ this function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback
to a software implementation. No hardware setup must be done in this function.
- See port_fdb_add for this and details.
+ See ``port_fdb_add`` for this and details.
-- port_mdb_add: bridge layer function invoked when the bridge wants to install
+- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install
a multicast database entry, the switch hardware should be programmed with the
specified address in the specified VLAN ID in the forwarding database
associated with this VLAN ID.
-Note: VLAN ID 0 corresponds to the port private database, which, in the context
-of DSA, would be the its port-based VLAN, used by the associated bridge device.
+.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
+ of DSA, would be its port-based VLAN, used by the associated bridge device.
-- port_mdb_del: bridge layer function invoked when the bridge wants to remove a
+- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
multicast database entry, the switch hardware should be programmed to delete
the specified MAC address from the specified VLAN ID if it was mapped into
this port forwarding database.
-- port_mdb_dump: bridge layer function invoked with a switchdev callback
+- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and MDB info.
@@ -594,7 +580,7 @@
Other hanging fruits
--------------------
-- making the number of ports fully dynamic and not dependent on DSA_MAX_PORTS
+- making the number of ports fully dynamic and not dependent on ``DSA_MAX_PORTS``
- allowing more than one CPU/management interface:
http://comments.gmane.org/gmane.linux.network/365657
- porting more drivers from other vendors:
diff --git a/Documentation/networking/dsa/index.rst b/Documentation/networking/dsa/index.rst
new file mode 100644
index 0000000..ee631e2
--- /dev/null
+++ b/Documentation/networking/dsa/index.rst
@@ -0,0 +1,13 @@
+===============================
+Distributed Switch Architecture
+===============================
+
+.. toctree::
+ :maxdepth: 1
+
+ dsa
+ b53
+ bcm_sf2
+ lan9303
+ sja1105
+ configuration
diff --git a/Documentation/networking/dsa/lan9303.txt b/Documentation/networking/dsa/lan9303.rst
similarity index 84%
rename from Documentation/networking/dsa/lan9303.txt
rename to Documentation/networking/dsa/lan9303.rst
index 144b02b..e3c820d 100644
--- a/Documentation/networking/dsa/lan9303.txt
+++ b/Documentation/networking/dsa/lan9303.rst
@@ -1,3 +1,4 @@
+==============================
LAN9303 Ethernet switch driver
==============================
@@ -9,10 +10,9 @@
Driver details
==============
-The driver is implemented as a DSA driver, see
-Documentation/networking/dsa/dsa.txt.
+The driver is implemented as a DSA driver, see ``Documentation/networking/dsa/dsa.rst``.
-See Documentation/devicetree/bindings/net/dsa/lan9303.txt for device tree
+See ``Documentation/devicetree/bindings/net/dsa/lan9303.txt`` for device tree
binding.
The LAN9303 can be managed both via MDIO and I2C, both supported by this driver.
diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst
new file mode 100644
index 0000000..eef20d0
--- /dev/null
+++ b/Documentation/networking/dsa/sja1105.rst
@@ -0,0 +1,310 @@
+=========================
+NXP SJA1105 switch driver
+=========================
+
+Overview
+========
+
+The NXP SJA1105 is a family of 6 devices:
+
+- SJA1105E: First generation, no TTEthernet
+- SJA1105T: First generation, TTEthernet
+- SJA1105P: Second generation, no TTEthernet, no SGMII
+- SJA1105Q: Second generation, TTEthernet, no SGMII
+- SJA1105R: Second generation, no TTEthernet, SGMII
+- SJA1105S: Second generation, TTEthernet, SGMII
+
+These are SPI-managed automotive switches, with all ports being gigabit
+capable, and supporting MII/RMII/RGMII and optionally SGMII on one port.
+
+Being automotive parts, their configuration interface is geared towards
+set-and-forget use, with minimal dynamic interaction at runtime. They
+require a static configuration to be composed by software and packed
+with CRC and table headers, and sent over SPI.
+
+The static configuration is composed of several configuration tables. Each
+table takes a number of entries. Some configuration tables can be (partially)
+reconfigured at runtime, some not. Some tables are mandatory, some not:
+
+============================= ================== =============================
+Table Mandatory Reconfigurable
+============================= ================== =============================
+Schedule no no
+Schedule entry points if Scheduling no
+VL Lookup no no
+VL Policing if VL Lookup no
+VL Forwarding if VL Lookup no
+L2 Lookup no no
+L2 Policing yes no
+VLAN Lookup yes yes
+L2 Forwarding yes partially (fully on P/Q/R/S)
+MAC Config yes partially (fully on P/Q/R/S)
+Schedule Params if Scheduling no
+Schedule Entry Points Params if Scheduling no
+VL Forwarding Params if VL Forwarding no
+L2 Lookup Params no partially (fully on P/Q/R/S)
+L2 Forwarding Params yes no
+Clock Sync Params no no
+AVB Params no no
+General Params yes partially
+Retagging no yes
+xMII Params yes no
+SGMII no yes
+============================= ================== =============================
+
+
+Also the configuration is write-only (software cannot read it back from the
+switch except for very few exceptions).
+
+The driver creates a static configuration at probe time, and keeps it at
+all times in memory, as a shadow for the hardware state. When required to
+change a hardware setting, the static configuration is also updated.
+If that changed setting can be transmitted to the switch through the dynamic
+reconfiguration interface, it is; otherwise the switch is reset and
+reprogrammed with the updated static configuration.
+
+Traffic support
+===============
+
+The switches do not support switch tagging in hardware. But they do support
+customizing the TPID by which VLAN traffic is identified as such. The switch
+driver is leveraging ``CONFIG_NET_DSA_TAG_8021Q`` by requesting that special
+VLANs (with a custom TPID of ``ETH_P_EDSA`` instead of ``ETH_P_8021Q``) are
+installed on its ports when not in ``vlan_filtering`` mode. This does not
+interfere with the reception and transmission of real 802.1Q-tagged traffic,
+because the switch does no longer parse those packets as VLAN after the TPID
+change.
+The TPID is restored when ``vlan_filtering`` is requested by the user through
+the bridge layer, and general IP termination becomes no longer possible through
+the switch netdevices in this mode.
+
+The switches have two programmable filters for link-local destination MACs.
+These are used to trap BPDUs and PTP traffic to the master netdevice, and are
+further used to support STP and 1588 ordinary clock/boundary clock
+functionality.
+
+The following traffic modes are supported over the switch netdevices:
+
++--------------------+------------+------------------+------------------+
+| | Standalone | Bridged with | Bridged with |
+| | ports | vlan_filtering 0 | vlan_filtering 1 |
++====================+============+==================+==================+
+| Regular traffic | Yes | Yes | No (use master) |
++--------------------+------------+------------------+------------------+
+| Management traffic | Yes | Yes | Yes |
+| (BPDU, PTP) | | | |
++--------------------+------------+------------------+------------------+
+
+Switching features
+==================
+
+The driver supports the configuration of L2 forwarding rules in hardware for
+port bridging. The forwarding, broadcast and flooding domain between ports can
+be restricted through two methods: either at the L2 forwarding level (isolate
+one bridge's ports from another's) or at the VLAN port membership level
+(isolate ports within the same bridge). The final forwarding decision taken by
+the hardware is a logical AND of these two sets of rules.
+
+The hardware tags all traffic internally with a port-based VLAN (pvid), or it
+decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
+is not possible. Once attributed a VLAN tag, frames are checked against the
+port's membership rules and dropped at ingress if they don't match any VLAN.
+This behavior is available when switch ports are enslaved to a bridge with
+``vlan_filtering 1``.
+
+Normally the hardware is not configurable with respect to VLAN awareness, but
+by changing what TPID the switch searches 802.1Q tags for, the semantics of a
+bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
+untagged), and therefore this mode is also supported.
+
+Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
+all bridges should have the same level of VLAN awareness (either both have
+``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact
+that VLAN awareness is global at the switch level is that once a bridge with
+``vlan_filtering`` enslaves at least one switch port, the other un-bridged
+ports are no longer available for standalone traffic termination.
+
+Topology and loop detection through STP is supported.
+
+L2 FDB manipulation (add/delete/dump) is currently possible for the first
+generation devices. Aging time of FDB entries, as well as enabling fully static
+management (no address learning and no flooding of unknown traffic) is not yet
+configurable in the driver.
+
+A special comment about bridging with other netdevices (illustrated with an
+example):
+
+A board has eth0, eth1, swp0@eth1, swp1@eth1, swp2@eth1, swp3@eth1.
+The switch ports (swp0-3) are under br0.
+It is desired that eth0 is turned into another switched port that communicates
+with swp0-3.
+
+If br0 has vlan_filtering 0, then eth0 can simply be added to br0 with the
+intended results.
+If br0 has vlan_filtering 1, then a new br1 interface needs to be created that
+enslaves eth0 and eth1 (the DSA master of the switch ports). This is because in
+this mode, the switch ports beneath br0 are not capable of regular traffic, and
+are only used as a conduit for switchdev operations.
+
+Offloads
+========
+
+Time-aware scheduling
+---------------------
+
+The switch supports a variation of the enhancements for scheduled traffic
+specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
+ensure deterministic latency for priority traffic that is sent in-band with its
+gate-open event in the network schedule.
+
+This capability can be managed through the tc-taprio offload ('flags 2'). The
+difference compared to the software implementation of taprio is that the latter
+would only be able to shape traffic originated from the CPU, but not
+autonomously forwarded flows.
+
+The device has 8 traffic classes, and maps incoming frames to one of them based
+on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
+As described in the previous sections, depending on the value of
+``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
+either be the typical 0x8100 or a custom value used internally by the driver
+for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
+or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
+EtherType. In these modes, injecting into a particular TX queue can only be
+done by the DSA net devices, which populate the PCP field of the tagging header
+on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
+offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
+net devices are no longer able to do that. To inject frames into a hardware TX
+queue with VLAN awareness active, it is necessary to create a VLAN
+sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged
+towards the switch, with the VLAN PCP bits set appropriately.
+
+Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the
+notable exception: the switch always treats it with a fixed priority and
+disregards any VLAN PCP bits even if present. The traffic class for management
+traffic has a value of 7 (highest priority) at the moment, which is not
+configurable in the driver.
+
+Below is an example of configuring a 500 us cyclic schedule on egress port
+``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
+and the gates for all other traffic classes are open for 400 us::
+
+ #!/bin/bash
+
+ set -e -u -o pipefail
+
+ NSEC_PER_SEC="1000000000"
+
+ gatemask() {
+ local tc_list="$1"
+ local mask=0
+
+ for tc in ${tc_list}; do
+ mask=$((${mask} | (1 << ${tc})))
+ done
+
+ printf "%02x" ${mask}
+ }
+
+ if ! systemctl is-active --quiet ptp4l; then
+ echo "Please start the ptp4l service"
+ exit
+ fi
+
+ now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
+ # Phase-align the base time to the start of the next second.
+ sec=$(echo "${now}" | gawk -F. '{ print $1; }')
+ base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
+
+ tc qdisc add dev swp5 parent root handle 100 taprio \
+ num_tc 8 \
+ map 0 1 2 3 5 6 7 \
+ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
+ base-time ${base_time} \
+ sched-entry S $(gatemask 7) 100000 \
+ sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
+ flags 2
+
+It is possible to apply the tc-taprio offload on multiple egress ports. There
+are hardware restrictions related to the fact that no gate event may trigger
+simultaneously on two ports. The driver checks the consistency of the schedules
+against this restriction and errors out when appropriate. Schedule analysis is
+needed to avoid this, which is outside the scope of the document.
+
+At the moment, the time-aware scheduler can only be triggered based on a
+standalone clock and not based on PTP time. This means the base-time argument
+from tc-taprio is ignored and the schedule starts right away. It also means it
+is more difficult to phase-align the scheduler with the other devices in the
+network.
+
+Device Tree bindings and board design
+=====================================
+
+This section references ``Documentation/devicetree/bindings/net/dsa/sja1105.txt``
+and aims to showcase some potential switch caveats.
+
+RMII PHY role and out-of-band signaling
+---------------------------------------
+
+In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
+an external oscillator (but not by the PHY).
+But the spec is rather loose and devices go outside it in several ways.
+Some PHYs go against the spec and may provide an output pin where they source
+the 50 MHz clock themselves, in an attempt to be helpful.
+On the other hand, the SJA1105 is only binary configurable - when in the RMII
+MAC role it will also attempt to drive the clock signal. To prevent this from
+happening it must be put in RMII PHY role.
+But doing so has some unintended consequences.
+In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
+These are practically some extra code words (/J/ and /K/) sent prior to the
+preamble of each frame. The MAC does not have this out-of-band signaling
+mechanism defined by the RMII spec.
+So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
+clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105
+emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to
+frame preambles, which the real PHY is not expected to understand. So the PHY
+simply encodes the extra symbols received from the SJA1105-as-PHY onto the
+100Base-Tx wire.
+On the other side of the wire, some link partners might discard these extra
+symbols, while others might choke on them and discard the entire Ethernet
+frames that follow along. This looks like packet loss with some link partners
+but not with others.
+The take-away is that in RMII mode, the SJA1105 must be let to drive the
+reference clock if connected to a PHY.
+
+RGMII fixed-link and internal delays
+------------------------------------
+
+As mentioned in the bindings document, the second generation of devices has
+tunable delay lines as part of the MAC, which can be used to establish the
+correct RGMII timing budget.
+When powered up, these can shift the Rx and Tx clocks with a phase difference
+between 73.8 and 101.7 degrees.
+The catch is that the delay lines need to lock onto a clock signal with a
+stable frequency. This means that there must be at least 2 microseconds of
+silence between the clock at the old vs at the new frequency. Otherwise the
+lock is lost and the delay lines must be reset (powered down and back up).
+In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25
+MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the
+AN process.
+In the situation where the switch port is connected through an RGMII fixed-link
+to a link partner whose link state life cycle is outside the control of Linux
+(such as a different SoC), then the delay lines would remain unlocked (and
+inactive) until there is manual intervention (ifdown/ifup on the switch port).
+The take-away is that in RGMII mode, the switch's internal delays are only
+reliable if the link partner never changes link speeds, or if it does, it does
+so in a way that is coordinated with the switch port (practically, both ends of
+the fixed-link are under control of the same Linux system).
+As to why would a fixed-link interface ever change link speeds: there are
+Ethernet controllers out there which come out of reset in 100 Mbps mode, and
+their driver inevitably needs to change the speed and clock frequency if it's
+required to work at gigabit.
+
+MDIO bus and PHY management
+---------------------------
+
+The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
+Therefore there is no link state notification coming from the switch device.
+A board would need to hook up the PHYs connected to the switch to any other
+MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO
+bus). Link state management then works by the driver manually keeping in sync
+(over SPI commands) the MAC link speed with the settings negotiated by the PHY.
diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt
deleted file mode 100644
index 1208954..0000000
--- a/Documentation/networking/e1000e.txt
+++ /dev/null
@@ -1,312 +0,0 @@
-Linux* Driver for Intel(R) Ethernet Network Connection
-======================================================
-
-Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Command Line Parameters
-- Additional Configurations
-- Support
-
-Identifying Your Adapter
-========================
-
-The e1000e driver supports all PCI Express Intel(R) Gigabit Network
-Connections, except those that are 82575, 82576 and 82580-based*.
-
-* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by
- the e1000 driver, not the e1000e driver due to the 82546 part being used
- behind a PCI Express bridge.
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/go/network/adapter/idguide.htm
-
-For the latest Intel network drivers for Linux, refer to the following
-website. In the search field, enter your adapter name or type, or use the
-networking link on the left to search for your adapter:
-
- http://support.intel.com/support/go/network/adapter/home.htm
-
-Command Line Parameters
-=======================
-
-The default value for each parameter is generally the recommended setting,
-unless otherwise noted.
-
-NOTES: For more information about the InterruptThrottleRate,
- RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay
- parameters, see the application note at:
- http://www.intel.com/design/network/applnots/ap450.htm
-
-InterruptThrottleRate
----------------------
-Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
- 4=simplified balancing)
-Default Value: 3
-
-The driver can limit the amount of interrupts per second that the adapter
-will generate for incoming packets. It does this by writing a value to the
-adapter that is based on the maximum amount of interrupts that the adapter
-will generate per second.
-
-Setting InterruptThrottleRate to a value greater or equal to 100
-will program the adapter to send out a maximum of that many interrupts
-per second, even if more packets have come in. This reduces interrupt
-load on the system and can lower CPU utilization under heavy load,
-but will increase latency as packets are not processed as quickly.
-
-The default behaviour of the driver previously assumed a static
-InterruptThrottleRate value of 8000, providing a good fallback value for
-all traffic types, but lacking in small packet performance and latency.
-The hardware can handle many more small packets per second however, and
-for this reason an adaptive interrupt moderation algorithm was implemented.
-
-The driver has two adaptive modes (setting 1 or 3) in which
-it dynamically adjusts the InterruptThrottleRate value based on the traffic
-that it receives. After determining the type of incoming traffic in the last
-timeframe, it will adjust the InterruptThrottleRate to an appropriate value
-for that traffic.
-
-The algorithm classifies the incoming traffic every interval into
-classes. Once the class is determined, the InterruptThrottleRate value is
-adjusted to suit that traffic type the best. There are three classes defined:
-"Bulk traffic", for large amounts of packets of normal size; "Low latency",
-for small amounts of traffic and/or a significant percentage of small
-packets; and "Lowest latency", for almost completely small packets or
-minimal traffic.
-
-In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
-for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
-latency" or "Lowest latency" class, the InterruptThrottleRate is increased
-stepwise to 20000. This default mode is suitable for most applications.
-
-For situations where low latency is vital such as cluster or
-grid computing, the algorithm can reduce latency even more when
-InterruptThrottleRate is set to mode 1. In this mode, which operates
-the same as mode 3, the InterruptThrottleRate will be increased stepwise to
-70000 for traffic in class "Lowest latency".
-
-In simplified mode the interrupt rate is based on the ratio of TX and
-RX traffic. If the bytes per second rate is approximately equal, the
-interrupt rate will drop as low as 2000 interrupts per second. If the
-traffic is mostly transmit or mostly receive, the interrupt rate could
-be as high as 8000.
-
-Setting InterruptThrottleRate to 0 turns off any interrupt moderation
-and may improve small packet latency, but is generally not suitable
-for bulk throughput traffic.
-
-NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
- RxAbsIntDelay parameters. In other words, minimizing the receive
- and/or transmit absolute delays does not force the controller to
- generate more interrupts than what the Interrupt Throttle Rate
- allows.
-
-NOTE: When e1000e is loaded with default settings and multiple adapters
- are in use simultaneously, the CPU utilization may increase non-
- linearly. In order to limit the CPU utilization without impacting
- the overall throughput, we recommend that you load the driver as
- follows:
-
- modprobe e1000e InterruptThrottleRate=3000,3000,3000
-
- This sets the InterruptThrottleRate to 3000 interrupts/sec for
- the first, second, and third instances of the driver. The range
- of 2000 to 3000 interrupts per second works on a majority of
- systems and is a good starting point, but the optimal value will
- be platform-specific. If CPU utilization is not a concern, use
- RX_POLLING (NAPI) and default driver settings.
-
-RxIntDelay
-----------
-Valid Range: 0-65535 (0=off)
-Default Value: 0
-
-This value delays the generation of receive interrupts in units of 1.024
-microseconds. Receive interrupt reduction can improve CPU efficiency if
-properly tuned for specific network traffic. Increasing this value adds
-extra latency to frame reception and can end up decreasing the throughput
-of TCP traffic. If the system is reporting dropped receives, this value
-may be set too high, causing the driver to run out of available receive
-descriptors.
-
-CAUTION: When setting RxIntDelay to a value other than 0, adapters may
- hang (stop transmitting) under certain network conditions. If
- this occurs a NETDEV WATCHDOG message is logged in the system
- event log. In addition, the controller is automatically reset,
- restoring the network connection. To eliminate the potential
- for the hang ensure that RxIntDelay is set to 0.
-
-RxAbsIntDelay
--------------
-Valid Range: 0-65535 (0=off)
-Default Value: 8
-
-This value, in units of 1.024 microseconds, limits the delay in which a
-receive interrupt is generated. Useful only if RxIntDelay is non-zero,
-this value ensures that an interrupt is generated after the initial
-packet is received within the set amount of time. Proper tuning,
-along with RxIntDelay, may improve traffic throughput in specific network
-conditions.
-
-TxIntDelay
-----------
-Valid Range: 0-65535 (0=off)
-Default Value: 8
-
-This value delays the generation of transmit interrupts in units of
-1.024 microseconds. Transmit interrupt reduction can improve CPU
-efficiency if properly tuned for specific network traffic. If the
-system is reporting dropped transmits, this value may be set too high
-causing the driver to run out of available transmit descriptors.
-
-TxAbsIntDelay
--------------
-Valid Range: 0-65535 (0=off)
-Default Value: 32
-
-This value, in units of 1.024 microseconds, limits the delay in which a
-transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
-this value ensures that an interrupt is generated after the initial
-packet is sent on the wire within the set amount of time. Proper tuning,
-along with TxIntDelay, may improve traffic throughput in specific
-network conditions.
-
-Copybreak
----------
-Valid Range: 0-xxxxxxx (0=off)
-Default Value: 256
-
-Driver copies all packets below or equaling this size to a fresh RX
-buffer before handing it up the stack.
-
-This parameter is different than other parameters, in that it is a
-single (not 1,1,1 etc.) parameter applied to all driver instances and
-it is also available during runtime at
-/sys/module/e1000e/parameters/copybreak
-
-SmartPowerDownEnable
---------------------
-Valid Range: 0-1
-Default Value: 0 (disabled)
-
-Allows PHY to turn off in lower power states. The user can set this parameter
-in supported chipsets.
-
-KumeranLockLoss
----------------
-Valid Range: 0-1
-Default Value: 1 (enabled)
-
-This workaround skips resetting the PHY at shutdown for the initial
-silicon releases of ICH8 systems.
-
-IntMode
--------
-Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X)
-Default Value: 2
-
-Allows changing the interrupt mode at module load time, without requiring a
-recompile. If the driver load fails to enable a specific interrupt mode, the
-driver will try other interrupt modes, from least to most compatible. The
-interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1)
-interrupts, only MSI and Legacy will be attempted.
-
-CrcStripping
-------------
-Valid Range: 0-1
-Default Value: 1 (enabled)
-
-Strip the CRC from received packets before sending up the network stack. If
-you have a machine with a BMC enabled but cannot receive IPMI traffic after
-loading or enabling the driver, try disabling this feature.
-
-WriteProtectNVM
----------------
-Valid Range: 0,1
-Default Value: 1
-
-If set to 1, configure the hardware to ignore all write/erase cycles to the
-GbE region in the ICHx NVM (in order to prevent accidental corruption of the
-NVM). This feature can be disabled by setting the parameter to 0 during initial
-driver load.
-NOTE: The machine must be power cycled (full off/on) when enabling NVM writes
-via setting the parameter to zero. Once the NVM has been locked (via the
-parameter at 1 when the driver loads) it cannot be unlocked except via power
-cycle.
-
-Additional Configurations
-=========================
-
- Jumbo Frames
- ------------
- Jumbo Frames support is enabled by changing the MTU to a value larger than
- the default of 1500. Use the ifconfig command to increase the MTU size.
- For example:
-
- ifconfig eth<x> mtu 9000 up
-
- This setting is not saved across reboots.
-
- Notes:
-
- - The maximum MTU setting for Jumbo Frames is 9216. This value coincides
- with the maximum Jumbo Frames size of 9234 bytes.
-
- - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
- poor performance or loss of link.
-
- - Some adapters limit Jumbo Frames sized packets to a maximum of
- 4096 bytes and some adapters do not support Jumbo Frames.
-
- - Jumbo Frames cannot be configured on an 82579-based Network device, if
- MACSec is enabled on the system.
-
- ethtool
- -------
- The driver utilizes the ethtool interface for driver configuration and
- diagnostics, as well as displaying statistical information. We
- strongly recommend downloading the latest version of ethtool at:
-
- https://kernel.org/pub/software/network/ethtool/
-
- NOTE: When validating enable/disable tests on some parts (82578, for example)
- you need to add a few seconds between tests when working with ethtool.
-
- Speed and Duplex
- ----------------
- Speed and Duplex are configured through the ethtool* utility. For
- instructions, refer to the ethtool man page.
-
- Enabling Wake on LAN* (WoL)
- ---------------------------
- WoL is configured through the ethtool* utility. For instructions on
- enabling WoL with ethtool, refer to the ethtool man page.
-
- WoL will be enabled on the system during the next shut down or reboot.
- For this driver version, in order to enable WoL, the e1000e driver must be
- loaded when shutting down or rebooting the system.
-
- In most cases Wake On LAN is only supported on port A for multiple port
- adapters. To verify if a port supports Wake on Lan run ethtool eth<X>.
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- www.intel.com/support/
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index e6b4ebb..319e5e0 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -203,11 +203,11 @@
Instruction Addressing mode Description
- ld 1, 2, 3, 4, 10 Load word into A
+ ld 1, 2, 3, 4, 12 Load word into A
ldi 4 Load word into A
ldh 1, 2 Load half-word into A
ldb 1, 2 Load byte into A
- ldx 3, 4, 5, 10 Load word into X
+ ldx 3, 4, 5, 12 Load word into X
ldxi 4 Load word into X
ldxb 5 Load byte into X
@@ -216,14 +216,14 @@
jmp 6 Jump to label
ja 6 Jump to label
- jeq 7, 8 Jump on A == k
- jneq 8 Jump on A != k
- jne 8 Jump on A != k
- jlt 8 Jump on A < k
- jle 8 Jump on A <= k
- jgt 7, 8 Jump on A > k
- jge 7, 8 Jump on A >= k
- jset 7, 8 Jump on A & k
+ jeq 7, 8, 9, 10 Jump on A == <x>
+ jneq 9, 10 Jump on A != <x>
+ jne 9, 10 Jump on A != <x>
+ jlt 9, 10 Jump on A < <x>
+ jle 9, 10 Jump on A <= <x>
+ jgt 7, 8, 9, 10 Jump on A > <x>
+ jge 7, 8, 9, 10 Jump on A >= <x>
+ jset 7, 8, 9, 10 Jump on A & <x>
add 0, 4 A + <x>
sub 0, 4 A - <x>
@@ -240,7 +240,7 @@
tax Copy A into X
txa Copy X into A
- ret 4, 9 Return
+ ret 4, 11 Return
The next table shows addressing formats from the 2nd column:
@@ -254,9 +254,11 @@
5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet
6 L Jump label L
7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf
- 8 #k,Lt Jump to Lt if predicate is true
- 9 a/%a Accumulator A
- 10 extension BPF extension
+ 8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf
+ 9 #k,Lt Jump to Lt if predicate is true
+ 10 x/%x,Lt Jump to Lt if predicate is true
+ 11 a/%a Accumulator A
+ 12 extension BPF extension
The Linux kernel also has a couple of BPF extensions that are used along
with the class of load instructions by "overloading" the k argument with
@@ -462,10 +464,11 @@
JIT compiler
------------
-The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC, PowerPC,
-ARM, ARM64, MIPS and s390 and can be enabled through CONFIG_BPF_JIT. The JIT
-compiler is transparently invoked for each attached filter from user space
-or for internal kernel users if it has been previously enabled by root:
+The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
+PowerPC, ARM, ARM64, MIPS, RISC-V and s390 and can be enabled through
+CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
+attached filter from user space or for internal kernel users if it has
+been previously enabled by root:
echo 1 > /proc/sys/net/core/bpf_jit_enable
@@ -601,9 +604,10 @@
skb pointer). All constraints and restrictions from bpf_check_classic() apply
before a conversion to the new layout is being done behind the scenes!
-Currently, the classic BPF format is being used for JITing on most 32-bit
-architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64, arm32 perform
-JIT compilation from eBPF instruction set.
+Currently, the classic BPF format is being used for JITing on most
+32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
+sparc64, arm32, riscv (RV64G) perform JIT compilation from eBPF
+instruction set.
Some core changes of the new internal format:
@@ -825,7 +829,7 @@
is not used by socket filters either, but more complex filters may be running
out of registers and would have to resort to spill/fill to stack.
-Internal BPF can used as generic assembler for last step performance
+Internal BPF can be used as a generic assembler for last step performance
optimizations, socket filters and seccomp are using it as assembler. Tracing
filters may use it as assembler to generate code from kernel. In kernel usage
may not be bounded by security considerations, since generated internal BPF code
@@ -863,7 +867,7 @@
BPF_STX 0x03 BPF_STX 0x03
BPF_ALU 0x04 BPF_ALU 0x04
BPF_JMP 0x05 BPF_JMP 0x05
- BPF_RET 0x06 [ class 6 unused, for future if needed ]
+ BPF_RET 0x06 BPF_JMP32 0x06
BPF_MISC 0x07 BPF_ALU64 0x07
When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...
@@ -900,9 +904,9 @@
BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */
BPF_END 0xd0 /* eBPF only: endianness conversion */
-If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:
+If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:
- BPF_JA 0x00
+ BPF_JA 0x00 /* BPF_JMP only */
BPF_JEQ 0x10
BPF_JGT 0x20
BPF_JGE 0x30
@@ -910,8 +914,8 @@
BPF_JNE 0x50 /* eBPF only: jump != */
BPF_JSGT 0x60 /* eBPF only: signed '>' */
BPF_JSGE 0x70 /* eBPF only: signed '>=' */
- BPF_CALL 0x80 /* eBPF only: function call */
- BPF_EXIT 0x90 /* eBPF only: function return */
+ BPF_CALL 0x80 /* eBPF BPF_JMP only: function call */
+ BPF_EXIT 0x90 /* eBPF BPF_JMP only: function return */
BPF_JLT 0xa0 /* eBPF only: unsigned '<' */
BPF_JLE 0xb0 /* eBPF only: unsigned '<=' */
BPF_JSLT 0xc0 /* eBPF only: signed '<' */
@@ -934,8 +938,9 @@
operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return
-value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is currently
-unused and reserved for future use.
+value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
+BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
+operands for the comparisons instead.
For load and store instructions the 8-bit 'code' field is divided as:
@@ -1125,6 +1130,14 @@
PTR_TO_STACK Frame pointer.
PTR_TO_PACKET skb->data.
PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden.
+ PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted.
+ PTR_TO_SOCKET_OR_NULL
+ Either a pointer to a socket, or NULL; socket lookup
+ returns this type, which becomes a PTR_TO_SOCKET when
+ checked != NULL. PTR_TO_SOCKET is reference-counted,
+ so programs must release the reference through the
+ socket release function before the end of the program.
+ Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
@@ -1171,6 +1184,13 @@
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.
+The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
+to all copies of the pointer returned from a socket lookup. This has similar
+behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
+it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
+represents a reference to the corresponding 'struct sock'. To ensure that the
+reference is not leaked, it is imperative to NULL-check the reference and in
+the non-NULL case, and pass the valid reference to the socket release function.
Direct packet access
--------------------
@@ -1444,6 +1464,55 @@
8: (7a) *(u64 *)(r0 +0) = 1
R0 invalid mem access 'imm'
+Program that performs a socket lookup then sets the pointer to NULL without
+checking it:
+value:
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_MOV64_IMM(BPF_REG_3, 4),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_MOV64_IMM(BPF_REG_5, 0),
+ BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+Error:
+ 0: (b7) r2 = 0
+ 1: (63) *(u32 *)(r10 -8) = r2
+ 2: (bf) r2 = r10
+ 3: (07) r2 += -8
+ 4: (b7) r3 = 4
+ 5: (b7) r4 = 0
+ 6: (b7) r5 = 0
+ 7: (85) call bpf_sk_lookup_tcp#65
+ 8: (b7) r0 = 0
+ 9: (95) exit
+ Unreleased reference id=1, alloc_insn=7
+
+Program that performs a socket lookup but does not NULL-check the returned
+value:
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_MOV64_IMM(BPF_REG_3, 4),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_MOV64_IMM(BPF_REG_5, 0),
+ BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
+ BPF_EXIT_INSN(),
+Error:
+ 0: (b7) r2 = 0
+ 1: (63) *(u32 *)(r10 -8) = r2
+ 2: (bf) r2 = r10
+ 3: (07) r2 += -8
+ 4: (b7) r3 = 4
+ 5: (b7) r4 = 0
+ 6: (b7) r5 = 0
+ 7: (85) call bpf_sk_lookup_tcp#65
+ 8: (95) exit
+ Unreleased reference id=1, alloc_insn=7
+
Testing
-------
diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt
deleted file mode 100644
index c2d6e18..0000000
--- a/Documentation/networking/i40e.txt
+++ /dev/null
@@ -1,190 +0,0 @@
-Linux Base Driver for the Intel(R) Ethernet Controller XL710 Family
-===================================================================
-
-Intel i40e Linux driver.
-Copyright(c) 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Additional Configurations
-- Performance Tuning
-- Known Issues
-- Support
-
-
-Identifying Your Adapter
-========================
-
-The driver in this release is compatible with the Intel Ethernet
-Controller XL710 Family.
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/network/sb/CS-012904.htm
-
-
-Enabling the driver
-===================
-
-The driver is enabled via the standard kernel configuration system,
-using the make command:
-
- make config/oldconfig/menuconfig/etc.
-
-The driver is located in the menu structure at:
-
- -> Device Drivers
- -> Network device support (NETDEVICES [=y])
- -> Ethernet driver support
- -> Intel devices
- -> Intel(R) Ethernet Controller XL710 Family
-
-Additional Configurations
-=========================
-
- Generic Receive Offload (GRO)
- -----------------------------
- The driver supports the in-kernel software implementation of GRO. GRO has
- shown that by coalescing Rx traffic into larger chunks of data, CPU
- utilization can be significantly reduced when under large Rx load. GRO is
- an evolution of the previously-used LRO interface. GRO is able to coalesce
- other protocols besides TCP. It's also safe to use with configurations that
- are problematic for LRO, namely bridging and iSCSI.
-
- Ethtool
- -------
- The driver utilizes the ethtool interface for driver configuration and
- diagnostics, as well as displaying statistical information. The latest
- ethtool version is required for this functionality.
-
- The latest release of ethtool can be found from
- https://www.kernel.org/pub/software/network/ethtool
-
-
- Flow Director n-ntuple traffic filters (FDir)
- ---------------------------------------------
- The driver utilizes the ethtool interface for configuring ntuple filters,
- via "ethtool -N <device> <filter>".
-
- The sctp4, ip4, udp4, and tcp4 flow types are supported with the standard
- fields including src-ip, dst-ip, src-port and dst-port. The driver only
- supports fully enabling or fully masking the fields, so use of the mask
- fields for partial matches is not supported.
-
- Additionally, the driver supports using the action to specify filters for a
- Virtual Function. You can specify the action as a 64bit value, where the
- lower 32 bits represents the queue number, while the next 8 bits represent
- which VF. Note that 0 is the PF, so the VF identifier is offset by 1. For
- example:
-
- ... action 0x800000002 ...
-
- Would indicate to direct traffic for Virtual Function 7 (8 minus 1) on queue
- 2 of that VF.
-
- The driver also supports using the user-defined field to specify 2 bytes of
- arbitrary data to match within the packet payload in addition to the regular
- fields. The data is specified in the lower 32bits of the user-def field in
- the following way:
-
- +----------------------------+---------------------------+
- | 31 28 24 20 16 | 15 12 8 4 0|
- +----------------------------+---------------------------+
- | offset into packet payload | 2 bytes of flexible data |
- +----------------------------+---------------------------+
-
- As an example,
-
- ... user-def 0x4FFFF ....
-
- means to match the value 0xFFFF 4 bytes into the packet payload. Note that
- the offset is based on the beginning of the payload, and not the beginning
- of the packet. Thus
-
- flow-type tcp4 ... user-def 0x8BEAF ....
-
- would match TCP/IPv4 packets which have the value 0xBEAF 8bytes into the
- TCP/IPv4 payload.
-
- For ICMP, the hardware parses the ICMP header as 4 bytes of header and 4
- bytes of payload, so if you want to match an ICMP frames payload you may need
- to add 4 to the offset in order to match the data.
-
- Furthermore, the offset can only be up to a value of 64, as the hardware
- will only read up to 64 bytes of data from the payload. It must also be even
- as the flexible data is 2 bytes long and must be aligned to byte 0 of the
- packet payload.
-
- When programming filters, the hardware is limited to using a single input
- set for each flow type. This means that it is an error to program two
- different filters with the same type that don't match on the same fields.
- Thus the second of the following two commands will fail:
-
- ethtool -N <device> flow-type tcp4 src-ip 192.168.0.7 action 5
- ethtool -N <device> flow-type tcp4 dst-ip 192.168.15.18 action 1
-
- This is because the first filter will be accepted and reprogram the input
- set for TCPv4 filters, but the second filter will be unable to reprogram the
- input set until all the conflicting TCPv4 filters are first removed.
-
- Note that the user-defined flexible offset is also considered part of the
- input set and cannot be programmed separately for multiple filters of the
- same type. However, the flexible data is not part of the input set and
- multiple filters may use the same offset but match against different data.
-
- Data Center Bridging (DCB)
- --------------------------
- DCB configuration is not currently supported.
-
- FCoE
- ----
- The driver supports Fiber Channel over Ethernet (FCoE) and Data Center
- Bridging (DCB) functionality. Configuring DCB and FCoE is outside the scope
- of this driver doc. Refer to http://www.open-fcoe.org/ for FCoE project
- information and http://www.open-lldp.org/ or email list
- e1000-eedc@lists.sourceforge.net for DCB information.
-
- MAC and VLAN anti-spoofing feature
- ----------------------------------
- When a malicious driver attempts to send a spoofed packet, it is dropped by
- the hardware and not transmitted. An interrupt is sent to the PF driver
- notifying it of the spoof attempt.
-
- When a spoofed packet is detected the PF driver will send the following
- message to the system log (displayed by the "dmesg" command):
-
- Spoof event(s) detected on VF (n)
-
- Where n=the VF that attempted to do the spoofing.
-
-
-Performance Tuning
-==================
-
-An excellent article on performance tuning can be found at:
-
-http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf
-
-
-Known Issues
-============
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://e1000.sourceforge.net
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sourceforge.net and copy
-netdev@vger.kernel.org.
diff --git a/Documentation/networking/i40evf.txt b/Documentation/networking/i40evf.txt
deleted file mode 100644
index e9b3035..0000000
--- a/Documentation/networking/i40evf.txt
+++ /dev/null
@@ -1,54 +0,0 @@
-Linux* Base Driver for Intel(R) Network Connection
-==================================================
-
-Intel Ethernet Adaptive Virtual Function Linux driver.
-Copyright(c) 2013-2017 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Known Issues/Troubleshooting
-- Support
-
-This file describes the i40evf Linux* Base Driver.
-
-The i40evf driver supports the below mentioned virtual function
-devices and can only be activated on kernels running the i40e or
-newer Physical Function (PF) driver compiled with CONFIG_PCI_IOV.
-The i40evf driver requires CONFIG_PCI_MSI to be enabled.
-
-The guest OS loading the i40evf driver must support MSI-X interrupts.
-
-Supported Hardware
-==================
-Intel XL710 X710 Virtual Function
-Intel Ethernet Adaptive Virtual Function
-Intel X722 Virtual Function
-
-Identifying Your Adapter
-========================
-
-For more information on how to identify your adapter, go to the
-Adapter & Driver ID Guide at:
-
- http://support.intel.com/support/go/network/adapter/idguide.htm
-
-Known Issues/Troubleshooting
-============================
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/ice.txt b/Documentation/networking/ice.txt
deleted file mode 100644
index 6261c46..0000000
--- a/Documentation/networking/ice.txt
+++ /dev/null
@@ -1,39 +0,0 @@
-Intel(R) Ethernet Connection E800 Series Linux Driver
-===================================================================
-
-Intel ice Linux driver.
-Copyright(c) 2018 Intel Corporation.
-
-Contents
-========
-- Enabling the driver
-- Support
-
-The driver in this release supports Intel's E800 Series of products. For
-more information, visit Intel's support page at http://support.intel.com.
-
-Enabling the driver
-===================
-
-The driver is enabled via the standard kernel configuration system,
-using the make command:
-
- Make oldconfig/silentoldconfig/menuconfig/etc.
-
-The driver is located in the menu structure at:
-
- -> Device Drivers
- -> Network device support (NETDEVICES [=y])
- -> Ethernet driver support
- -> Intel devices
- -> Intel(R) Ethernet Connection E800 Series Support
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-If an issue is identified with the released source code, please email
-the maintainer listed in the MAINTAINERS file.
diff --git a/Documentation/networking/ieee802154.rst b/Documentation/networking/ieee802154.rst
new file mode 100644
index 0000000..36ca823
--- /dev/null
+++ b/Documentation/networking/ieee802154.rst
@@ -0,0 +1,180 @@
+===============================
+IEEE 802.15.4 Developer's Guide
+===============================
+
+Introduction
+============
+The IEEE 802.15.4 working group focuses on standardization of the bottom
+two layers: Medium Access Control (MAC) and Physical access (PHY). And there
+are mainly two options available for upper layers:
+
+- ZigBee - proprietary protocol from the ZigBee Alliance
+- 6LoWPAN - IPv6 networking over low rate personal area networks
+
+The goal of the Linux-wpan is to provide a complete implementation
+of the IEEE 802.15.4 and 6LoWPAN protocols. IEEE 802.15.4 is a stack
+of protocols for organizing Low-Rate Wireless Personal Area Networks.
+
+The stack is composed of three main parts:
+
+- IEEE 802.15.4 layer; We have chosen to use plain Berkeley socket API,
+ the generic Linux networking stack to transfer IEEE 802.15.4 data
+ messages and a special protocol over netlink for configuration/management
+- MAC - provides access to shared channel and reliable data delivery
+- PHY - represents device drivers
+
+Socket API
+==========
+
+.. c:function:: int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
+
+The address family, socket addresses etc. are defined in the
+include/net/af_ieee802154.h header or in the special header
+in the userspace package (see either http://wpan.cakelab.org/ or the
+git tree at https://github.com/linux-wpan/wpan-tools).
+
+6LoWPAN Linux implementation
+============================
+
+The IEEE 802.15.4 standard specifies an MTU of 127 bytes, yielding about 80
+octets of actual MAC payload once security is turned on, on a wireless link
+with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format
+[RFC4944] was specified to carry IPv6 datagrams over such constrained links,
+taking into account limited bandwidth, memory, or energy resources that are
+expected in applications such as wireless Sensor Networks. [RFC4944] defines
+a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header
+to support the IPv6 minimum MTU requirement [RFC2460], and stateless header
+compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the
+relatively large IPv6 and UDP headers down to (in the best case) several bytes.
+
+In September 2011 the standard update was published - [RFC6282].
+It deprecates HC1 and HC2 compression and defines IPHC encoding format which is
+used in this Linux implementation.
+
+All the code related to 6lowpan you may find in files: net/6lowpan/*
+and net/ieee802154/6lowpan/*
+
+To setup a 6LoWPAN interface you need:
+1. Add IEEE802.15.4 interface and set channel and PAN ID;
+2. Add 6lowpan interface by command like:
+# ip link add link wpan0 name lowpan0 type lowpan
+3. Bring up 'lowpan0' interface
+
+Drivers
+=======
+
+Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
+1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
+exports a management (e.g. MLME) and data API.
+2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
+possibly with some kinds of acceleration like automatic CRC computation and
+comparation, automagic ACK handling, address matching, etc.
+
+Those types of devices require different approach to be hooked into Linux kernel.
+
+HardMAC
+-------
+
+See the header include/net/ieee802154_netdev.h. You have to implement Linux
+net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
+code via plain sk_buffs. On skb reception skb->cb must contain additional
+info as described in the struct ieee802154_mac_cb. During packet transmission
+the skb->cb is used to provide additional data to device's header_ops->create
+function. Be aware that this data can be overridden later (when socket code
+submits skb to qdisc), so if you need something from that cb later, you should
+store info in the skb->data on your own.
+
+To hook the MLME interface you have to populate the ml_priv field of your
+net_device with a pointer to struct ieee802154_mlme_ops instance. The fields
+assoc_req, assoc_resp, disassoc_req, start_req, and scan_req are optional.
+All other fields are required.
+
+SoftMAC
+-------
+
+The MAC is the middle layer in the IEEE 802.15.4 Linux stack. This moment it
+provides interface for drivers registration and management of slave interfaces.
+
+NOTE: Currently the only monitor device type is supported - it's IEEE 802.15.4
+stack interface for network sniffers (e.g. WireShark).
+
+This layer is going to be extended soon.
+
+See header include/net/mac802154.h and several drivers in
+drivers/net/ieee802154/.
+
+Fake drivers
+------------
+
+In addition there is a driver available which simulates a real device with
+SoftMAC (fakelb - IEEE 802.15.4 loopback driver) interface. This option
+provides a possibility to test and debug the stack without usage of real hardware.
+
+Device drivers API
+==================
+
+The include/net/mac802154.h defines following functions:
+
+.. c:function:: struct ieee802154_dev *ieee802154_alloc_device (size_t priv_size, struct ieee802154_ops *ops)
+
+Allocation of IEEE 802.15.4 compatible device.
+
+.. c:function:: void ieee802154_free_device(struct ieee802154_dev *dev)
+
+Freeing allocated device.
+
+.. c:function:: int ieee802154_register_device(struct ieee802154_dev *dev)
+
+Register PHY in the system.
+
+.. c:function:: void ieee802154_unregister_device(struct ieee802154_dev *dev)
+
+Freeing registered PHY.
+
+.. c:function:: void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb, u8 lqi):
+
+Telling 802.15.4 module there is a new received frame in the skb with
+the RF Link Quality Indicator (LQI) from the hardware device.
+
+.. c:function:: void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb, bool ifs_handling):
+
+Telling 802.15.4 module the frame in the skb is or going to be
+transmitted through the hardware device
+
+The device driver must implement the following callbacks in the IEEE 802.15.4
+operations structure at least::
+
+ struct ieee802154_ops {
+ ...
+ int (*start)(struct ieee802154_hw *hw);
+ void (*stop)(struct ieee802154_hw *hw);
+ ...
+ int (*xmit_async)(struct ieee802154_hw *hw, struct sk_buff *skb);
+ int (*ed)(struct ieee802154_hw *hw, u8 *level);
+ int (*set_channel)(struct ieee802154_hw *hw, u8 page, u8 channel);
+ ...
+ };
+
+.. c:function:: int start(struct ieee802154_hw *hw):
+
+Handler that 802.15.4 module calls for the hardware device initialization.
+
+.. c:function:: void stop(struct ieee802154_hw *hw):
+
+Handler that 802.15.4 module calls for the hardware device cleanup.
+
+.. c:function:: int xmit_async(struct ieee802154_hw *hw, struct sk_buff *skb):
+
+Handler that 802.15.4 module calls for each frame in the skb going to be
+transmitted through the hardware device.
+
+.. c:function:: int ed(struct ieee802154_hw *hw, u8 *level):
+
+Handler that 802.15.4 module calls for Energy Detection from the hardware
+device.
+
+.. c:function:: int set_channel(struct ieee802154_hw *hw, u8 page, u8 channel):
+
+Set radio for listening on specific channel of the hardware device.
+
+Moreover IEEE 802.15.4 device operations structure should be filled.
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt
deleted file mode 100644
index e74d8e1..0000000
--- a/Documentation/networking/ieee802154.txt
+++ /dev/null
@@ -1,177 +0,0 @@
-
- Linux IEEE 802.15.4 implementation
-
-
-Introduction
-============
-The IEEE 802.15.4 working group focuses on standardization of the bottom
-two layers: Medium Access Control (MAC) and Physical access (PHY). And there
-are mainly two options available for upper layers:
- - ZigBee - proprietary protocol from the ZigBee Alliance
- - 6LoWPAN - IPv6 networking over low rate personal area networks
-
-The goal of the Linux-wpan is to provide a complete implementation
-of the IEEE 802.15.4 and 6LoWPAN protocols. IEEE 802.15.4 is a stack
-of protocols for organizing Low-Rate Wireless Personal Area Networks.
-
-The stack is composed of three main parts:
- - IEEE 802.15.4 layer; We have chosen to use plain Berkeley socket API,
- the generic Linux networking stack to transfer IEEE 802.15.4 data
- messages and a special protocol over netlink for configuration/management
- - MAC - provides access to shared channel and reliable data delivery
- - PHY - represents device drivers
-
-
-Socket API
-==========
-
-int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
-.....
-
-The address family, socket addresses etc. are defined in the
-include/net/af_ieee802154.h header or in the special header
-in the userspace package (see either http://wpan.cakelab.org/ or the
-git tree at https://github.com/linux-wpan/wpan-tools).
-
-
-Kernel side
-=============
-
-Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
-1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
- exports a management (e.g. MLME) and data API.
-2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
- possibly with some kinds of acceleration like automatic CRC computation and
- comparation, automagic ACK handling, address matching, etc.
-
-Those types of devices require different approach to be hooked into Linux kernel.
-
-
-HardMAC
-=======
-
-See the header include/net/ieee802154_netdev.h. You have to implement Linux
-net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
-code via plain sk_buffs. On skb reception skb->cb must contain additional
-info as described in the struct ieee802154_mac_cb. During packet transmission
-the skb->cb is used to provide additional data to device's header_ops->create
-function. Be aware that this data can be overridden later (when socket code
-submits skb to qdisc), so if you need something from that cb later, you should
-store info in the skb->data on your own.
-
-To hook the MLME interface you have to populate the ml_priv field of your
-net_device with a pointer to struct ieee802154_mlme_ops instance. The fields
-assoc_req, assoc_resp, disassoc_req, start_req, and scan_req are optional.
-All other fields are required.
-
-
-SoftMAC
-=======
-
-The MAC is the middle layer in the IEEE 802.15.4 Linux stack. This moment it
-provides interface for drivers registration and management of slave interfaces.
-
-NOTE: Currently the only monitor device type is supported - it's IEEE 802.15.4
-stack interface for network sniffers (e.g. WireShark).
-
-This layer is going to be extended soon.
-
-See header include/net/mac802154.h and several drivers in
-drivers/net/ieee802154/.
-
-
-Device drivers API
-==================
-
-The include/net/mac802154.h defines following functions:
- - struct ieee802154_hw *
- ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops):
- allocation of IEEE 802.15.4 compatible hardware device
-
- - void ieee802154_free_hw(struct ieee802154_hw *hw):
- freeing allocated hardware device
-
- - int ieee802154_register_hw(struct ieee802154_hw *hw):
- register PHY which is the allocated hardware device, in the system
-
- - void ieee802154_unregister_hw(struct ieee802154_hw *hw):
- freeing registered PHY
-
- - void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb,
- u8 lqi):
- telling 802.15.4 module there is a new received frame in the skb with
- the RF Link Quality Indicator (LQI) from the hardware device
-
- - void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
- bool ifs_handling):
- telling 802.15.4 module the frame in the skb is or going to be
- transmitted through the hardware device
-
-The device driver must implement the following callbacks in the IEEE 802.15.4
-operations structure at least:
-struct ieee802154_ops {
- ...
- int (*start)(struct ieee802154_hw *hw);
- void (*stop)(struct ieee802154_hw *hw);
- ...
- int (*xmit_async)(struct ieee802154_hw *hw, struct sk_buff *skb);
- int (*ed)(struct ieee802154_hw *hw, u8 *level);
- int (*set_channel)(struct ieee802154_hw *hw, u8 page, u8 channel);
- ...
-};
-
- - int start(struct ieee802154_hw *hw):
- handler that 802.15.4 module calls for the hardware device initialization.
-
- - void stop(struct ieee802154_hw *hw):
- handler that 802.15.4 module calls for the hardware device cleanup.
-
- - int xmit_async(struct ieee802154_hw *hw, struct sk_buff *skb):
- handler that 802.15.4 module calls for each frame in the skb going to be
- transmitted through the hardware device.
-
- - int ed(struct ieee802154_hw *hw, u8 *level):
- handler that 802.15.4 module calls for Energy Detection from the hardware
- device.
-
- - int set_channel(struct ieee802154_hw *hw, u8 page, u8 channel):
- set radio for listening on specific channel of the hardware device.
-
-Moreover IEEE 802.15.4 device operations structure should be filled.
-
-Fake drivers
-============
-
-In addition there is a driver available which simulates a real device with
-SoftMAC (fakelb - IEEE 802.15.4 loopback driver) interface. This option
-provides a possibility to test and debug the stack without usage of real hardware.
-
-See sources in drivers/net/ieee802154 folder for more details.
-
-
-6LoWPAN Linux implementation
-============================
-
-The IEEE 802.15.4 standard specifies an MTU of 127 bytes, yielding about 80
-octets of actual MAC payload once security is turned on, on a wireless link
-with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format
-[RFC4944] was specified to carry IPv6 datagrams over such constrained links,
-taking into account limited bandwidth, memory, or energy resources that are
-expected in applications such as wireless Sensor Networks. [RFC4944] defines
-a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header
-to support the IPv6 minimum MTU requirement [RFC2460], and stateless header
-compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the
-relatively large IPv6 and UDP headers down to (in the best case) several bytes.
-
-In September 2011 the standard update was published - [RFC6282].
-It deprecates HC1 and HC2 compression and defines IPHC encoding format which is
-used in this Linux implementation.
-
-All the code related to 6lowpan you may find in files: net/6lowpan/*
-and net/ieee802154/6lowpan/*
-
-To setup a 6LoWPAN interface you need:
-1. Add IEEE802.15.4 interface and set channel and PAN ID;
-2. Add 6lowpan interface by command like:
- # ip link add link wpan0 name lowpan0 type lowpan
-3. Bring up 'lowpan0' interface
diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt
deleted file mode 100644
index f90643e..0000000
--- a/Documentation/networking/igb.txt
+++ /dev/null
@@ -1,129 +0,0 @@
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
-
-Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Additional Configurations
-- Support
-
-Identifying Your Adapter
-========================
-
-This driver supports all 82575, 82576 and 82580-based Intel (R) gigabit network
-connections.
-
-For specific information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/go/network/adapter/idguide.htm
-
-Command Line Parameters
-=======================
-
-The default value for each parameter is generally the recommended setting,
-unless otherwise noted.
-
-max_vfs
--------
-Valid Range: 0-7
-Default Value: 0
-
-This parameter adds support for SR-IOV. It causes the driver to spawn up to
-max_vfs worth of virtual function.
-
-Additional Configurations
-=========================
-
- Jumbo Frames
- ------------
- Jumbo Frames support is enabled by changing the MTU to a value larger than
- the default of 1500. Use the ip command to increase the MTU size.
- For example:
-
- ip link set dev eth<x> mtu 9000
-
- This setting is not saved across reboots.
-
- Notes:
-
- - The maximum MTU setting for Jumbo Frames is 9216. This value coincides
- with the maximum Jumbo Frames size of 9234 bytes.
-
- - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
- poor performance or loss of link.
-
- ethtool
- -------
- The driver utilizes the ethtool interface for driver configuration and
- diagnostics, as well as displaying statistical information. The latest
- version of ethtool can be found at:
-
- https://www.kernel.org/pub/software/network/ethtool/
-
- Enabling Wake on LAN* (WoL)
- ---------------------------
- WoL is configured through the ethtool* utility.
-
- For instructions on enabling WoL with ethtool, refer to the ethtool man page.
-
- WoL will be enabled on the system during the next shut down or reboot.
- For this driver version, in order to enable WoL, the igb driver must be
- loaded when shutting down or rebooting the system.
-
- Wake On LAN is only supported on port A of multi-port adapters.
-
- Wake On LAN is not supported for the Intel(R) Gigabit VT Quad Port Server
- Adapter.
-
- Multiqueue
- ----------
- In this mode, a separate MSI-X vector is allocated for each queue and one
- for "other" interrupts such as link status change and errors. All
- interrupts are throttled via interrupt moderation. Interrupt moderation
- must be used to avoid interrupt storms while the driver is processing one
- interrupt. The moderation value should be at least as large as the expected
- time for the driver to process an interrupt. Multiqueue is off by default.
-
- REQUIREMENTS: MSI-X support is required for Multiqueue. If MSI-X is not
- found, the system will fallback to MSI or to Legacy interrupts.
-
- MAC and VLAN anti-spoofing feature
- ----------------------------------
- When a malicious driver attempts to send a spoofed packet, it is dropped by
- the hardware and not transmitted. An interrupt is sent to the PF driver
- notifying it of the spoof attempt.
-
- When a spoofed packet is detected the PF driver will send the following
- message to the system log (displayed by the "dmesg" command):
-
- Spoof event(s) detected on VF(n)
-
- Where n=the VF that attempted to do the spoofing.
-
- Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool
- ------------------------------------------------------------
- You can set a MAC address of a Virtual Function (VF), a default VLAN and the
- rate limit using the IProute2 tool. Download the latest version of the
- iproute2 tool from Sourceforge if your version does not have all the
- features you require.
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- www.intel.com/support/
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/igbvf.txt b/Documentation/networking/igbvf.txt
deleted file mode 100644
index bd40473..0000000
--- a/Documentation/networking/igbvf.txt
+++ /dev/null
@@ -1,80 +0,0 @@
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
-
-Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Additional Configurations
-- Support
-
-This file describes the igbvf Linux* Base Driver for Intel Network Connection.
-
-The igbvf driver supports 82576-based virtual function devices that can only
-be activated on kernels that support SR-IOV. SR-IOV requires the correct
-platform and OS support.
-
-The igbvf driver requires the igb driver, version 2.0 or later. The igbvf
-driver supports virtual functions generated by the igb driver with a max_vfs
-value of 1 or greater. For more information on the max_vfs parameter refer
-to the README included with the igb driver.
-
-The guest OS loading the igbvf driver must support MSI-X interrupts.
-
-This driver is only supported as a loadable module at this time. Intel is
-not supplying patches against the kernel source to allow for static linking
-of the driver. For questions related to hardware requirements, refer to the
-documentation supplied with your Intel Gigabit adapter. All hardware
-requirements listed apply to use with Linux.
-
-Instructions on updating ethtool can be found in the section "Additional
-Configurations" later in this document.
-
-VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs.
-
-Identifying Your Adapter
-========================
-
-The igbvf driver supports 82576-based virtual function devices that can only
-be activated on kernels that support SR-IOV.
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/go/network/adapter/idguide.htm
-
-For the latest Intel network drivers for Linux, refer to the following
-website. In the search field, enter your adapter name or type, or use the
-networking link on the left to search for your adapter:
-
- http://downloadcenter.intel.com/scripts-df-external/Support_Intel.aspx
-
-Additional Configurations
-=========================
-
- ethtool
- -------
- The driver utilizes the ethtool interface for driver configuration and
- diagnostics, as well as displaying statistical information. The ethtool
- version 3.0 or later is required for this functionality, although we
- strongly recommend downloading the latest version at:
-
- https://www.kernel.org/pub/software/network/ethtool/
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index fcd710f..d4dca42 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -11,18 +11,30 @@
batman-adv
can
can_ucan_protocol
- dpaa2/index
- e100
- e1000
+ device_drivers/index
+ dsa/index
+ devlink-info-versions
+ devlink-trap
+ devlink-trap-netdevsim
+ ieee802154
+ j1939
kapi
z8530book
msg_zerocopy
failover
net_failover
+ phy
+ sfp-phylink
alias
bridge
+ snmp_counter
+ checksum-offloads
+ segmentation-offloads
+ scaling
+ tls
+ tls-offload
-.. only:: subproject
+.. only:: subproject and html
Indices
=======
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 960de8f..8d4ad1d 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -80,6 +80,12 @@
Possible values:
0 - Layer 3
1 - Layer 4
+ 2 - Layer 3 or inner Layer 3 if present
+
+fib_sync_mem - UNSIGNED INTEGER
+ Amount of dirty memory from fib entries that can be backlogged before
+ synchronize_rcu is forced.
+ Default: 512kB Minimum: 64kB Maximum: 64MB
ip_forward_update_priority - INTEGER
Whether to update SKB priority from "TOS" field in IPv4 header after it
@@ -108,8 +114,8 @@
Default: 512
neigh/default/gc_thresh3 - INTEGER
- Maximum number of neighbor entries allowed. Increase this
- when using large numbers of interfaces and when communicating
+ Maximum number of non-PERMANENT neighbor entries allowed. Increase
+ this when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
Default: 1024
@@ -201,8 +207,8 @@
somaxconn - INTEGER
Limit of socket listen() backlog, known in userspace as SOMAXCONN.
- Defaults to 128. See also tcp_max_syn_backlog for additional tuning
- for TCP sockets.
+ Defaults to 4096. (Was 128 before linux-5.4)
+ See also tcp_max_syn_backlog for additional tuning for TCP sockets.
tcp_abort_on_overflow - BOOLEAN
If listening service is too slow to accept new connections,
@@ -250,6 +256,20 @@
Path MTU discovery (MTU probing). If MTU probing is enabled,
this is the initial MSS used by the connection.
+tcp_mtu_probe_floor - INTEGER
+ If MTU probing is enabled this caps the minimum MSS used for search_low
+ for the connection.
+
+ Default : 48
+
+tcp_min_snd_mss - INTEGER
+ TCP SYN and SYNACK messages usually advertise an ADVMSS option,
+ as described in RFC 1122 and RFC 6691.
+ If this ADVMSS option is smaller than tcp_min_snd_mss,
+ it is silently capped to tcp_min_snd_mss.
+
+ Default : 48 (at least 8 bytes of payload per segment)
+
tcp_congestion_control - STRING
Set the congestion control algorithm to be used for new
connections. The algorithm "reno" is always available, but
@@ -316,6 +336,17 @@
By default it's enabled with a non-zero value. 0 disables F-RTO.
+tcp_fwmark_accept - BOOLEAN
+ If set, incoming connections to listening sockets that do not have a
+ socket mark will set the mark of the accepting socket to the fwmark of
+ the incoming SYN packet. This will cause all packets on that connection
+ (starting from the first SYNACK) to be sent with that fwmark. The
+ listening socket's mark is unchanged. Listening sockets that already
+ have a fwmark set via setsockopt(SOL_SOCKET, SO_MARK, ...) are
+ unaffected.
+
+ Default: 0
+
tcp_invalid_ratelimit - INTEGER
Limit the maximal rate for sending duplicate acknowledgments
in response to incoming TCP packets that are for an existing
@@ -359,6 +390,7 @@
derived from the listen socket to be bound to the L3 domain in
which the packets originated. Only valid when the kernel was
compiled with CONFIG_NET_L3_MASTER_DEV.
+ Default: 0 (disabled)
tcp_low_latency - BOOLEAN
This is a legacy option, it has no effect anymore.
@@ -376,11 +408,14 @@
up to ~64K of unswappable memory.
tcp_max_syn_backlog - INTEGER
- Maximal number of remembered connection requests, which have not
- received an acknowledgment from connecting client.
+ Maximal number of remembered connection requests (SYN_RECV),
+ which have not received an acknowledgment from connecting client.
+ This is a per-listener limit.
The minimal value is 128 for low memory machines, and it will
increase in proportion to the memory of machine.
If server suffers from overload, try increasing this number.
+ Remember to also check /proc/sys/net/core/somaxconn
+ A SYN_RECV request socket consumes about 304 bytes of memory.
tcp_max_tw_buckets - INTEGER
Maximal number of timewait sockets held by system simultaneously.
@@ -410,6 +445,7 @@
minimum RTT when it is moved to a longer path (e.g., due to traffic
engineering). A longer window makes the filter more resistant to RTT
inflations such as transient congestion. The unit is seconds.
+ Possible values: 0 - 86400 (1 day)
Default: 300
tcp_moderate_rcvbuf - BOOLEAN
@@ -542,10 +578,10 @@
Default : 1,000,000 ns (1 ms)
tcp_comp_sack_nr - INTEGER
- Max numer of SACK that can be compressed.
+ Max number of SACK that can be compressed.
Using 0 disables SACK compression.
- Detault : 44
+ Default : 44
tcp_slow_start_after_idle - BOOLEAN
If set, provide RFC2861 behavior and time out the congestion
@@ -630,6 +666,26 @@
0 to disable the blackhole detection.
By default, it is set to 1hr.
+tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
+ The list consists of a primary key and an optional backup key. The
+ primary key is used for both creating and validating cookies, while the
+ optional backup key is only used for validating cookies. The purpose of
+ the backup key is to maximize TFO validation when keys are rotated.
+
+ A randomly chosen primary key may be configured by the kernel if
+ the tcp_fastopen sysctl is set to 0x400 (see above), or if the
+ TCP_FASTOPEN setsockopt() optname is set and a key has not been
+ previously configured via sysctl. If keys are configured via
+ setsockopt() by using the TCP_FASTOPEN_KEY optname, then those
+ per-socket keys will be used instead of any keys that are specified via
+ sysctl.
+
+ A key is specified as 4 8-digit hexadecimal integers which are separated
+ by a '-' as: xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx. Leading zeros may be
+ omitted. A primary and a backup key may be specified by separating them
+ by a comma. If only one key is specified, it becomes the primary key and
+ any previously configured backup keys are removed.
+
tcp_syn_retries - INTEGER
Number of times initial SYNs for an active TCP connection attempt
will be retransmitted. Should not be higher than 127. Default value
@@ -747,13 +803,21 @@
flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes
limits the number of bytes on qdisc or device to reduce artificial
RTT/cwnd and reduce bufferbloat.
- Default: 262144
+ Default: 1048576 (16 * 65536)
tcp_challenge_ack_limit - INTEGER
Limits number of Challenge ACK sent per second, as recommended
in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
Default: 100
+tcp_rx_skb_cache - BOOLEAN
+ Controls a per TCP socket cache of one skb, that might help
+ performance of some workloads. This might be dangerous
+ on systems with a lot of TCP sockets, since it increases
+ memory usage.
+
+ Default: 0 (disabled)
+
UDP variables:
udp_l3mdev_accept - BOOLEAN
@@ -762,6 +826,7 @@
being received regardless of the L3 domain in which they
originated. Only valid when the kernel was compiled with
CONFIG_NET_L3_MASTER_DEV.
+ Default: 0 (disabled)
udp_mem - vector of 3 INTEGERs: min, pressure, max
Number of pages allowed for queueing by all UDP sockets.
@@ -788,6 +853,16 @@
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
Default: 4K
+RAW variables:
+
+raw_l3mdev_accept - BOOLEAN
+ Enabling this option allows a "global" bound socket to work
+ across L3 master domains (e.g., VRFs) with packets capable of
+ being received regardless of the L3 domain in which they
+ originated. Only valid when the kernel was compiled with
+ CONFIG_NET_L3_MASTER_DEV.
+ Default: 1 (enabled)
+
CIPSOv4 Variables:
cipso_cache_enable - BOOLEAN
@@ -1313,6 +1388,7 @@
Default value is 0.
xfrm4_gc_thresh - INTEGER
+ (Obsolete since linux-4.14)
The threshold at which we will start garbage collecting for IPv4
destination cache entries. At twice this value the system will
refuse new allocations.
@@ -1379,14 +1455,26 @@
FALSE: disabled
Default: true
-flowlabel_reflect - BOOLEAN
- Automatically reflect the flow label. Needed for Path MTU
+flowlabel_reflect - INTEGER
+ Control flow label reflection. Needed for Path MTU
Discovery to work with Equal Cost Multipath Routing in anycast
environments. See RFC 7690 and:
https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
- TRUE: enabled
- FALSE: disabled
- Default: FALSE
+
+ This is a bitmask.
+ 1: enabled for established flows
+
+ Note that this prevents automatic flowlabel changes, as done
+ in "tcp: change IPv6 flow-label upon receiving spurious retransmission"
+ and "tcp: Change txhash on every SYN and RTO retransmit"
+
+ 2: enabled for TCP RESET packets (no active listener)
+ If set, a RST packet sent in response to a SYN packet on a closed
+ port will reflect the incoming flow label.
+
+ 4: enabled for ICMPv6 echo reply messages.
+
+ Default: 0
fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes.
@@ -1394,6 +1482,7 @@
Possible values:
0 - Layer 3 (source and destination addresses plus flow label)
1 - Layer 4 (standard 5-tuple)
+ 2 - Layer 3 or inner Layer 3 if present
anycast_src_echo_reply - BOOLEAN
Controls the use of anycast addresses as source addresses for ICMPv6
@@ -1442,6 +1531,14 @@
header.
Default: INT_MAX (unlimited)
+skip_notify_on_dev_down - BOOLEAN
+ Controls whether an RTM_DELROUTE message is generated for routes
+ removed when a device is taken down or deleted. IPv4 does not
+ generate this message; IPv6 does by default. Setting this sysctl
+ to true skips the message, making IPv4 and IPv6 on par in relying
+ on userspace caches to track link events and evict routes.
+ Default: false (generate message)
+
IPv6 Fragmentation:
ip6frag_high_thresh - INTEGER
@@ -1877,17 +1974,43 @@
icmp/*:
ratelimit - INTEGER
- Limit the maximal rates for sending ICMPv6 packets.
+ Limit the maximal rates for sending ICMPv6 messages.
0 to disable any limiting,
otherwise the minimal space between responses in milliseconds.
Default: 1000
+ratemask - list of comma separated ranges
+ For ICMPv6 message types matching the ranges in the ratemask, limit
+ the sending of the message according to ratelimit parameter.
+
+ The format used for both input and output is a comma separated
+ list of ranges (e.g. "0-127,129" for ICMPv6 message type 0 to 127 and
+ 129). Writing to the file will clear all previous ranges of ICMPv6
+ message types and update the current list with the input.
+
+ Refer to: https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml
+ for numerical values of ICMPv6 message types, e.g. echo request is 128
+ and echo reply is 129.
+
+ Default: 0-1,3-127 (rate limit ICMPv6 errors except Packet Too Big)
+
echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it over the IPv6 protocol.
Default: 0
+echo_ignore_multicast - BOOLEAN
+ If set non-zero, then the kernel will ignore all ICMP ECHO
+ requests sent to it over the IPv6 protocol via multicast.
+ Default: 0
+
+echo_ignore_anycast - BOOLEAN
+ If set non-zero, then the kernel will ignore all ICMP ECHO
+ requests sent to it over the IPv6 protocol destined to anycast address.
+ Default: 0
+
xfrm6_gc_thresh - INTEGER
+ (Obsolete since linux-4.14)
The threshold at which we will start garbage collecting for IPv6
destination cache entries. At twice this value the system will
refuse new allocations.
@@ -2173,7 +2296,7 @@
/proc/sys/net/core/*
- Please see: Documentation/sysctl/net.txt for descriptions of these entries.
+ Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
/proc/sys/net/unix/*
diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt
deleted file mode 100644
index 09f71d7..0000000
--- a/Documentation/networking/ixgb.txt
+++ /dev/null
@@ -1,433 +0,0 @@
-Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
-=====================================================================
-
-March 14, 2011
-
-
-Contents
-========
-
-- In This Release
-- Identifying Your Adapter
-- Building and Installation
-- Command Line Parameters
-- Improving Performance
-- Additional Configurations
-- Known Issues/Troubleshooting
-- Support
-
-
-
-In This Release
-===============
-
-This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R)
-Network Connection. This driver includes support for Itanium(R)2-based
-systems.
-
-For questions related to hardware requirements, refer to the documentation
-supplied with your 10 Gigabit adapter. All hardware requirements listed apply
-to use with Linux.
-
-The following features are available in this kernel:
- - Native VLANs
- - Channel Bonding (teaming)
- - SNMP
-
-Channel Bonding documentation can be found in the Linux kernel source:
-/Documentation/networking/bonding.txt
-
-The driver information previously displayed in the /proc filesystem is not
-supported in this release. Alternatively, you can use ethtool (version 1.6
-or later), lspci, and iproute2 to obtain the same information.
-
-Instructions on updating ethtool can be found in the section "Additional
-Configurations" later in this document.
-
-
-Identifying Your Adapter
-========================
-
-The following Intel network adapters are compatible with the drivers in this
-release:
-
-Controller Adapter Name Physical Layer
----------- ------------ --------------
-82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber)
- Server Adapters 10G Base-SR (850 nm optical fiber)
- 10G Base-CX4(twin-axial copper cabling)
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/network/sb/CS-012904.htm
-
-
-Building and Installation
-=========================
-
-select m for "Intel(R) PRO/10GbE support" located at:
- Location:
- -> Device Drivers
- -> Network device support (NETDEVICES [=y])
- -> Ethernet (10000 Mbit) (NETDEV_10000 [=y])
-1. make modules && make modules_install
-
-2. Load the module:
-
- modprobe ixgb <parameter>=<value>
-
- The insmod command can be used if the full
- path to the driver module is specified. For example:
-
- insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgb/ixgb.ko
-
- With 2.6 based kernels also make sure that older ixgb drivers are
- removed from the kernel, before loading the new module:
-
- rmmod ixgb; modprobe ixgb
-
-3. Assign an IP address to the interface by entering the following, where
- x is the interface number:
-
- ip addr add ethx <IP_address>
-
-4. Verify that the interface works. Enter the following, where <IP_address>
- is the IP address for another machine on the same subnet as the interface
- that is being tested:
-
- ping <IP_address>
-
-
-Command Line Parameters
-=======================
-
-If the driver is built as a module, the following optional parameters are
-used by entering them on the command line with the modprobe command using
-this syntax:
-
- modprobe ixgb [<option>=<VAL1>,<VAL2>,...]
-
-For example, with two 10GbE PCI adapters, entering:
-
- modprobe ixgb TxDescriptors=80,128
-
-loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
-resources for the second adapter.
-
-The default value for each parameter is generally the recommended setting,
-unless otherwise noted.
-
-FlowControl
-Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
-Default: Read from the EEPROM
- If EEPROM is not detected, default is 1
- This parameter controls the automatic generation(Tx) and response(Rx) to
- Ethernet PAUSE frames. There are hardware bugs associated with enabling
- Tx flow control so beware.
-
-RxDescriptors
-Valid Range: 64-512
-Default Value: 512
- This value is the number of receive descriptors allocated by the driver.
- Increasing this value allows the driver to buffer more incoming packets.
- Each descriptor is 16 bytes. A receive buffer is also allocated for
- each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
- depending on the MTU setting. When the MTU size is 1500 or less, the
- receive buffer size is 2048 bytes. When the MTU is greater than 1500 the
- receive buffer size will be either 4056, 8192, or 16384 bytes. The
- maximum MTU size is 16114.
-
-RxIntDelay
-Valid Range: 0-65535 (0=off)
-Default Value: 72
- This value delays the generation of receive interrupts in units of
- 0.8192 microseconds. Receive interrupt reduction can improve CPU
- efficiency if properly tuned for specific network traffic. Increasing
- this value adds extra latency to frame reception and can end up
- decreasing the throughput of TCP traffic. If the system is reporting
- dropped receives, this value may be set too high, causing the driver to
- run out of available receive descriptors.
-
-TxDescriptors
-Valid Range: 64-4096
-Default Value: 256
- This value is the number of transmit descriptors allocated by the driver.
- Increasing this value allows the driver to queue more transmits. Each
- descriptor is 16 bytes.
-
-XsumRX
-Valid Range: 0-1
-Default Value: 1
- A value of '1' indicates that the driver should enable IP checksum
- offload for received packets (both UDP and TCP) to the adapter hardware.
-
-
-Improving Performance
-=====================
-
-With the 10 Gigabit server adapters, the default Linux configuration will
-very likely limit the total available throughput artificially. There is a set
-of configuration changes that, when applied together, will increase the ability
-of Linux to transmit and receive data. The following enhancements were
-originally acquired from settings published at http://www.spec.org/web99/ for
-various submitted results using Linux.
-
-NOTE: These changes are only suggestions, and serve as a starting point for
- tuning your network performance.
-
-The changes are made in three major ways, listed in order of greatest effect:
-- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen
- parameter.
-- Use sysctl to modify /proc parameters (essentially kernel tuning)
-- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
- transmit burst lengths on the bus.
-
-NOTE: setpci modifies the adapter's configuration registers to allow it to read
-up to 4k bytes at a time (for transmits). However, for some systems the
-behavior after modifying this register may be undefined (possibly errors of
-some kind). A power-cycle, hard reset or explicitly setting the e6 register
-back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
-stable configuration.
-
-- COPY these lines and paste them into ixgb_perf.sh:
-#!/bin/bash
-echo "configuring network performance , edit this file to change the interface
-or device ID of 10GbE card"
-# set mmrbc to 4k reads, modify only Intel 10GbE device IDs
-# replace 1a48 with appropriate 10GbE device's ID installed on the system,
-# if needed.
-setpci -d 8086:1a48 e6.b=2e
-# set the MTU (max transmission unit) - it requires your switch and clients
-# to change as well.
-# set the txqueuelen
-# your ixgb adapter should be loaded as eth1 for this to work, change if needed
-ip li set dev eth1 mtu 9000 txqueuelen 1000 up
-# call the sysctl utility to modify /proc/sys entries
-sysctl -p ./sysctl_ixgb.conf
-- END ixgb_perf.sh
-
-- COPY these lines and paste them into sysctl_ixgb.conf:
-# some of the defaults may be different for your kernel
-# call this file with sysctl -p <this file>
-# these are just suggested values that worked well to increase throughput in
-# several network benchmark tests, your mileage may vary
-
-### IPV4 specific settings
-# turn TCP timestamp support off, default 1, reduces CPU use
-net.ipv4.tcp_timestamps = 0
-# turn SACK support off, default on
-# on systems with a VERY fast bus -> memory interface this is the big gainer
-net.ipv4.tcp_sack = 0
-# set min/default/max TCP read buffer, default 4096 87380 174760
-net.ipv4.tcp_rmem = 10000000 10000000 10000000
-# set min/pressure/max TCP write buffer, default 4096 16384 131072
-net.ipv4.tcp_wmem = 10000000 10000000 10000000
-# set min/pressure/max TCP buffer space, default 31744 32256 32768
-net.ipv4.tcp_mem = 10000000 10000000 10000000
-
-### CORE settings (mostly for socket and UDP effect)
-# set maximum receive socket buffer size, default 131071
-net.core.rmem_max = 524287
-# set maximum send socket buffer size, default 131071
-net.core.wmem_max = 524287
-# set default receive socket buffer size, default 65535
-net.core.rmem_default = 524287
-# set default send socket buffer size, default 65535
-net.core.wmem_default = 524287
-# set maximum amount of option memory buffers, default 10240
-net.core.optmem_max = 524287
-# set number of unprocessed input packets before kernel starts dropping them; default 300
-net.core.netdev_max_backlog = 300000
-- END sysctl_ixgb.conf
-
-Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
-your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's
-ID installed on the system.
-
-NOTE: Unless these scripts are added to the boot process, these changes will
- only last only until the next system reboot.
-
-
-Resolving Slow UDP Traffic
---------------------------
-If your server does not seem to be able to receive UDP traffic as fast as it
-can receive TCP traffic, it could be because Linux, by default, does not set
-the network stack buffers as large as they need to be to support high UDP
-transfer rates. One way to alleviate this problem is to allow more memory to
-be used by the IP stack to store incoming data.
-
-For instance, use the commands:
- sysctl -w net.core.rmem_max=262143
-and
- sysctl -w net.core.rmem_default=262143
-to increase the read buffer memory max and default to 262143 (256k - 1) from
-defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
-will increase the amount of memory used by the network stack for receives, and
-can be increased significantly more if necessary for your application.
-
-
-Additional Configurations
-=========================
-
- Configuring the Driver on Different Distributions
- -------------------------------------------------
- Configuring a network driver to load properly when the system is started is
- distribution dependent. Typically, the configuration process involves adding
- an alias line to /etc/modprobe.conf as well as editing other system startup
- scripts and/or configuration files. Many popular Linux distributions ship
- with tools to make these changes for you. To learn the proper way to
- configure a network device for your system, refer to your distribution
- documentation. If during this process you are asked for the driver or module
- name, the name for the Linux Base Driver for the Intel 10GbE Family of
- Adapters is ixgb.
-
- Viewing Link Messages
- ---------------------
- Link messages will not be displayed to the console if the distribution is
- restricting system messages. In order to see network driver link messages on
- your console, set dmesg to eight by entering the following:
-
- dmesg -n 8
-
- NOTE: This setting is not saved across reboots.
-
-
- Jumbo Frames
- ------------
- The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
- enabled by changing the MTU to a value larger than the default of 1500.
- The maximum value for the MTU is 16114. Use the ip command to
- increase the MTU size. For example:
-
- ip li set dev ethx mtu 9000
-
- The maximum MTU setting for Jumbo Frames is 16114. This value coincides
- with the maximum Jumbo Frames size of 16128.
-
-
- ethtool
- -------
- The driver utilizes the ethtool interface for driver configuration and
- diagnostics, as well as displaying statistical information. The ethtool
- version 1.6 or later is required for this functionality.
-
- The latest release of ethtool can be found from
- https://www.kernel.org/pub/software/network/ethtool/
-
- NOTE: The ethtool version 1.6 only supports a limited set of ethtool options.
- Support for a more complete ethtool feature set can be enabled by
- upgrading to the latest version.
-
-
- NAPI
- ----
-
- NAPI (Rx polling mode) is supported in the ixgb driver. NAPI is enabled
- or disabled based on the configuration of the kernel. see CONFIG_IXGB_NAPI
-
- See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
-
-
-Known Issues/Troubleshooting
-============================
-
- NOTE: After installing the driver, if your Intel Network Connection is not
- working, verify in the "In This Release" section of the readme that you have
- installed the correct driver.
-
- Intel(R) PRO/10GbE CX4 Server Adapter Cable Interoperability Issue with
- Fujitsu XENPAK Module in SmartBits Chassis
- ---------------------------------------------------------------------
- Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
- Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
- chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
- The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
- Server adapter or the SmartBits. If this situation occurs using a different
- cable assembly may resolve the issue.
-
- CX4 Server Adapter Cable Interoperability Issues with HP Procurve 3400cl
- Switch Port
- ------------------------------------------------------------------------
- Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
- adapter is connected to an HP Procurve 3400cl switch port using short cables
- (1 m or shorter). If this situation occurs, using a longer cable may resolve
- the issue.
-
- Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
- Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC
- errors may be received either by the CX4 Server adapter or at the switch. If
- this situation occurs, using a different cable assembly may resolve the issue.
-
-
- Jumbo Frames System Requirement
- -------------------------------
- Memory allocation failures have been observed on Linux systems with 64 MB
- of RAM or less that are running Jumbo Frames. If you are using Jumbo
- Frames, your system may require more than the advertised minimum
- requirement of 64 MB of system memory.
-
-
- Performance Degradation with Jumbo Frames
- -----------------------------------------
- Degradation in throughput performance may be observed in some Jumbo frames
- environments. If this is observed, increasing the application's socket buffer
- size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
- See the specific application manual and /usr/src/linux*/Documentation/
- networking/ip-sysctl.txt for more details.
-
-
- Allocating Rx Buffers when Using Jumbo Frames
- ---------------------------------------------
- Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
- the available memory is heavily fragmented. This issue may be seen with PCI-X
- adapters or with packet split disabled. This can be reduced or eliminated
- by changing the amount of available memory for receive buffer allocation, by
- increasing /proc/sys/vm/min_free_kbytes.
-
-
- Multiple Interfaces on Same Ethernet Broadcast Network
- ------------------------------------------------------
- Due to the default ARP behavior on Linux, it is not possible to have
- one system on two IP networks in the same Ethernet broadcast domain
- (non-partitioned switch) behave as expected. All Ethernet interfaces
- will respond to IP traffic for any IP address assigned to the system.
- This results in unbalanced receive traffic.
-
- If you have multiple interfaces in a server, do either of the following:
-
- - Turn on ARP filtering by entering:
- echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
-
- - Install the interfaces in separate broadcast domains - either in
- different switches or in a switch partitioned to VLANs.
-
-
- UDP Stress Test Dropped Packet Issue
- --------------------------------------
- Under small packets UDP stress test with 10GbE driver, the Linux system
- may drop UDP packets due to the fullness of socket buffers. You may want
- to change the driver's Flow Control variables to the minimum value for
- controlling packet reception.
-
-
- Tx Hangs Possible Under Stress
- ------------------------------
- Under stress conditions, if TX hangs occur, turning off TSO
- "ethtool -K eth0 tso off" may resolve the problem.
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/ixgbe.txt b/Documentation/networking/ixgbe.txt
deleted file mode 100644
index 6878354..0000000
--- a/Documentation/networking/ixgbe.txt
+++ /dev/null
@@ -1,349 +0,0 @@
-Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Family of
-Adapters
-=============================================================================
-
-Intel 10 Gigabit Linux driver.
-Copyright(c) 1999 - 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Additional Configurations
-- Performance Tuning
-- Known Issues
-- Support
-
-Identifying Your Adapter
-========================
-
-The driver in this release is compatible with 82598, 82599 and X540-based
-Intel Network Connections.
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/network/sb/CS-012904.htm
-
-SFP+ Devices with Pluggable Optics
-----------------------------------
-
-82599-BASED ADAPTERS
-
-NOTES: If your 82599-based Intel(R) Network Adapter came with Intel optics, or
-is an Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel
-optics and/or the direct attach cables listed below.
-
-When 82599-based SFP+ devices are connected back to back, they should be set to
-the same Speed setting via ethtool. Results may vary if you mix speed settings.
-82598-based adapters support all passive direct attach cables that comply
-with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach
-cables are not supported.
-
-Supplier Type Part Numbers
-
-SR Modules
-Intel DUAL RATE 1G/10G SFP+ SR (bailed) FTLX8571D3BCV-IT
-Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDDZ-IN1
-Intel DUAL RATE 1G/10G SFP+ SR (bailed) AFBR-703SDZ-IN2
-LR Modules
-Intel DUAL RATE 1G/10G SFP+ LR (bailed) FTLX1471D3BCV-IT
-Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDDZ-IN1
-Intel DUAL RATE 1G/10G SFP+ LR (bailed) AFCT-701SDZ-IN2
-
-The following is a list of 3rd party SFP+ modules and direct attach cables that
-have received some testing. Not all modules are applicable to all devices.
-
-Supplier Type Part Numbers
-
-Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL
-Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ
-Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL
-
-Finisar DUAL RATE 1G/10G SFP+ SR (No Bail) FTLX8571D3QCV-IT
-Avago DUAL RATE 1G/10G SFP+ SR (No Bail) AFBR-703SDZ-IN1
-Finisar DUAL RATE 1G/10G SFP+ LR (No Bail) FTLX1471D3QCV-IT
-Avago DUAL RATE 1G/10G SFP+ LR (No Bail) AFCT-701SDZ-IN1
-Finistar 1000BASE-T SFP FCLF8522P2BTL
-Avago 1000BASE-T SFP ABCU-5710RZ
-
-82599-based adapters support all passive and active limiting direct attach
-cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications.
-
-Laser turns off for SFP+ when device is down
--------------------------------------------
-"ip link set down" turns off the laser for 82599-based SFP+ fiber adapters.
-"ip link set up" turns on the laser.
-
-
-82598-BASED ADAPTERS
-
-NOTES for 82598-Based Adapters:
-- Intel(R) Network Adapters that support removable optical modules only support
- their original module type (i.e., the Intel(R) 10 Gigabit SR Dual Port
- Express Module only supports SR optical modules). If you plug in a different
- type of module, the driver will not load.
-- Hot Swapping/hot plugging optical modules is not supported.
-- Only single speed, 10 gigabit modules are supported.
-- LAN on Motherboard (LOMs) may support DA, SR, or LR modules. Other module
- types are not supported. Please see your system documentation for details.
-
-The following is a list of 3rd party SFP+ modules and direct attach cables that
-have received some testing. Not all modules are applicable to all devices.
-
-Supplier Type Part Numbers
-
-Finisar SFP+ SR bailed, 10g single rate FTLX8571D3BCL
-Avago SFP+ SR bailed, 10g single rate AFBR-700SDZ
-Finisar SFP+ LR bailed, 10g single rate FTLX1471D3BCL
-
-82598-based adapters support all passive direct attach cables that comply
-with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach
-cables are not supported.
-
-
-Flow Control
-------------
-Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
-receiving and transmitting pause frames for ixgbe. When TX is enabled, PAUSE
-frames are generated when the receive packet buffer crosses a predefined
-threshold. When rx is enabled, the transmit unit will halt for the time delay
-specified when a PAUSE frame is received.
-
-Flow Control is enabled by default. If you want to disable a flow control
-capable link partner, use ethtool:
-
- ethtool -A eth? autoneg off RX off TX off
-
-NOTE: For 82598 backplane cards entering 1 gig mode, flow control default
-behavior is changed to off. Flow control in 1 gig mode on these devices can
-lead to Tx hangs.
-
-Intel(R) Ethernet Flow Director
--------------------------------
-Supports advanced filters that direct receive packets by their flows to
-different queues. Enables tight control on routing a flow in the platform.
-Matches flows and CPU cores for flow affinity. Supports multiple parameters
-for flexible flow classification and load balancing.
-
-Flow director is enabled only if the kernel is multiple TX queue capable.
-
-An included script (set_irq_affinity.sh) automates setting the IRQ to CPU
-affinity.
-
-You can verify that the driver is using Flow Director by looking at the counter
-in ethtool: fdir_miss and fdir_match.
-
-Other ethtool Commands:
-To enable Flow Director
- ethtool -K ethX ntuple on
-To add a filter
- Use -U switch. e.g., ethtool -U ethX flow-type tcp4 src-ip 10.0.128.23
- action 1
-To see the list of filters currently present:
- ethtool -u ethX
-
-Perfect Filter: Perfect filter is an interface to load the filter table that
-funnels all flow into queue_0 unless an alternative queue is specified using
-"action". In that case, any flow that matches the filter criteria will be
-directed to the appropriate queue.
-
-If the queue is defined as -1, filter will drop matching packets.
-
-To account for filter matches and misses, there are two stats in ethtool:
-fdir_match and fdir_miss. In addition, rx_queue_N_packets shows the number of
-packets processed by the Nth queue.
-
-NOTE: Receive Packet Steering (RPS) and Receive Flow Steering (RFS) are not
-compatible with Flow Director. IF Flow Director is enabled, these will be
-disabled.
-
-The following three parameters impact Flow Director.
-
-FdirMode
---------
-Valid Range: 0-2 (0=off, 1=ATR, 2=Perfect filter mode)
-Default Value: 1
-
- Flow Director filtering modes.
-
-FdirPballoc
------------
-Valid Range: 0-2 (0=64k, 1=128k, 2=256k)
-Default Value: 0
-
- Flow Director allocated packet buffer size.
-
-AtrSampleRate
---------------
-Valid Range: 1-100
-Default Value: 20
-
- Software ATR Tx packet sample rate. For example, when set to 20, every 20th
- packet, looks to see if the packet will create a new flow.
-
-Node
-----
-Valid Range: 0-n
-Default Value: 1 (off)
-
- 0 - n: where n is the number of NUMA nodes (i.e. 0 - 3) currently online in
- your system
- 1: turns this option off
-
- The Node parameter will allow you to pick which NUMA node you want to have
- the adapter allocate memory on.
-
-max_vfs
--------
-Valid Range: 1-63
-Default Value: 0
-
- If the value is greater than 0 it will also force the VMDq parameter to be 1
- or more.
-
- This parameter adds support for SR-IOV. It causes the driver to spawn up to
- max_vfs worth of virtual function.
-
-
-Additional Configurations
-=========================
-
- Jumbo Frames
- ------------
- The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
- enabled by changing the MTU to a value larger than the default of 1500.
- The maximum value for the MTU is 16110. Use the ip command to
- increase the MTU size. For example:
-
- ip link set dev ethx mtu 9000
-
- The maximum MTU setting for Jumbo Frames is 9710. This value coincides
- with the maximum Jumbo Frames size of 9728.
-
- Generic Receive Offload, aka GRO
- --------------------------------
- The driver supports the in-kernel software implementation of GRO. GRO has
- shown that by coalescing Rx traffic into larger chunks of data, CPU
- utilization can be significantly reduced when under large Rx load. GRO is an
- evolution of the previously-used LRO interface. GRO is able to coalesce
- other protocols besides TCP. It's also safe to use with configurations that
- are problematic for LRO, namely bridging and iSCSI.
-
- Data Center Bridging, aka DCB
- -----------------------------
- DCB is a configuration Quality of Service implementation in hardware.
- It uses the VLAN priority tag (802.1p) to filter traffic. That means
- that there are 8 different priorities that traffic can be filtered into.
- It also enables priority flow control which can limit or eliminate the
- number of dropped packets during network stress. Bandwidth can be
- allocated to each of these priorities, which is enforced at the hardware
- level.
-
- To enable DCB support in ixgbe, you must enable the DCB netlink layer to
- allow the userspace tools (see below) to communicate with the driver.
- This can be found in the kernel configuration here:
-
- -> Networking support
- -> Networking options
- -> Data Center Bridging support
-
- Once this is selected, DCB support must be selected for ixgbe. This can
- be found here:
-
- -> Device Drivers
- -> Network device support (NETDEVICES [=y])
- -> Ethernet (10000 Mbit) (NETDEV_10000 [=y])
- -> Intel(R) 10GbE PCI Express adapters support
- -> Data Center Bridging (DCB) Support
-
- After these options are selected, you must rebuild your kernel and your
- modules.
-
- In order to use DCB, userspace tools must be downloaded and installed.
- The dcbd tools can be found at:
-
- http://e1000.sf.net
-
- Ethtool
- -------
- The driver utilizes the ethtool interface for driver configuration and
- diagnostics, as well as displaying statistical information. The latest
- ethtool version is required for this functionality.
-
- The latest release of ethtool can be found from
- https://www.kernel.org/pub/software/network/ethtool/
-
- FCoE
- ----
- This release of the ixgbe driver contains new code to enable users to use
- Fiber Channel over Ethernet (FCoE) and Data Center Bridging (DCB)
- functionality that is supported by the 82598-based hardware. This code has
- no default effect on the regular driver operation, and configuring DCB and
- FCoE is outside the scope of this driver README. Refer to
- http://www.open-fcoe.org/ for FCoE project information and contact
- e1000-eedc@lists.sourceforge.net for DCB information.
-
- MAC and VLAN anti-spoofing feature
- ----------------------------------
- When a malicious driver attempts to send a spoofed packet, it is dropped by
- the hardware and not transmitted. An interrupt is sent to the PF driver
- notifying it of the spoof attempt.
-
- When a spoofed packet is detected the PF driver will send the following
- message to the system log (displayed by the "dmesg" command):
-
- Spoof event(s) detected on VF (n)
-
- Where n=the VF that attempted to do the spoofing.
-
-
-Performance Tuning
-==================
-
-An excellent article on performance tuning can be found at:
-
-http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf
-
-
-Known Issues
-============
-
- Enabling SR-IOV in a 32-bit or 64-bit Microsoft* Windows* Server 2008/R2
- Guest OS using Intel (R) 82576-based GbE or Intel (R) 82599-based 10GbE
- controller under KVM
- ------------------------------------------------------------------------
- KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This
- includes traditional PCIe devices, as well as SR-IOV-capable devices using
- Intel 82576-based and 82599-based controllers.
-
- While direct assignment of a PCIe device or an SR-IOV Virtual Function (VF)
- to a Linux-based VM running 2.6.32 or later kernel works fine, there is a
- known issue with Microsoft Windows Server 2008 VM that results in a "yellow
- bang" error. This problem is within the KVM VMM itself, not the Intel driver,
- or the SR-IOV logic of the VMM, but rather that KVM emulates an older CPU
- model for the guests, and this older CPU model does not support MSI-X
- interrupts, which is a requirement for Intel SR-IOV.
-
- If you wish to use the Intel 82576 or 82599-based controllers in SR-IOV mode
- with KVM and a Microsoft Windows Server 2008 guest try the following
- workaround. The workaround is to tell KVM to emulate a different model of CPU
- when using qemu to create the KVM guest:
-
- "-cpu qemu64,model=13"
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://e1000.sourceforge.net
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
deleted file mode 100644
index 53d8d2a..0000000
--- a/Documentation/networking/ixgbevf.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
-
-Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2013 Intel Corporation.
-
-Contents
-========
-
-- Identifying Your Adapter
-- Known Issues/Troubleshooting
-- Support
-
-This file describes the ixgbevf Linux* Base Driver for Intel Network
-Connection.
-
-The ixgbevf driver supports 82599-based virtual function devices that can only
-be activated on kernels with CONFIG_PCI_IOV enabled.
-
-The ixgbevf driver supports virtual functions generated by the ixgbe driver
-with a max_vfs value of 1 or greater.
-
-The guest OS loading the ixgbevf driver must support MSI-X interrupts.
-
-VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs.
-
-Identifying Your Adapter
-========================
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- http://support.intel.com/support/go/network/adapter/idguide.htm
-
-Known Issues/Troubleshooting
-============================
-
-
-Support
-=======
-
-For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on the supported
-kernel with a supported adapter, email the specific information related
-to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/j1939.rst b/Documentation/networking/j1939.rst
new file mode 100644
index 0000000..dc60b13
--- /dev/null
+++ b/Documentation/networking/j1939.rst
@@ -0,0 +1,422 @@
+.. SPDX-License-Identifier: (GPL-2.0 OR MIT)
+
+===================
+J1939 Documentation
+===================
+
+Overview / What Is J1939
+========================
+
+SAE J1939 defines a higher layer protocol on CAN. It implements a more
+sophisticated addressing scheme and extends the maximum packet size above 8
+bytes. Several derived specifications exist, which differ from the original
+J1939 on the application level, like MilCAN A, NMEA2000 and especially
+ISO-11783 (ISOBUS). This last one specifies the so-called ETP (Extended
+Transport Protocol) which is has been included in this implementation. This
+results in a maximum packet size of ((2 ^ 24) - 1) * 7 bytes == 111 MiB.
+
+Specifications used
+-------------------
+
+* SAE J1939-21 : data link layer
+* SAE J1939-81 : network management
+* ISO 11783-6 : Virtual Terminal (Extended Transport Protocol)
+
+.. _j1939-motivation:
+
+Motivation
+==========
+
+Given the fact there's something like SocketCAN with an API similar to BSD
+sockets, we found some reasons to justify a kernel implementation for the
+addressing and transport methods used by J1939.
+
+* **Addressing:** when a process on an ECU communicates via J1939, it should
+ not necessarily know its source address. Although at least one process per
+ ECU should know the source address. Other processes should be able to reuse
+ that address. This way, address parameters for different processes
+ cooperating for the same ECU, are not duplicated. This way of working is
+ closely related to the UNIX concept where programs do just one thing, and do
+ it well.
+
+* **Dynamic addressing:** Address Claiming in J1939 is time critical.
+ Furthermore data transport should be handled properly during the address
+ negotiation. Putting this functionality in the kernel eliminates it as a
+ requirement for _every_ user space process that communicates via J1939. This
+ results in a consistent J1939 bus with proper addressing.
+
+* **Transport:** both TP & ETP reuse some PGNs to relay big packets over them.
+ Different processes may thus use the same TP & ETP PGNs without actually
+ knowing it. The individual TP & ETP sessions _must_ be serialized
+ (synchronized) between different processes. The kernel solves this problem
+ properly and eliminates the serialization (synchronization) as a requirement
+ for _every_ user space process that communicates via J1939.
+
+J1939 defines some other features (relaying, gateway, fast packet transport,
+...). In-kernel code for these would not contribute to protocol stability.
+Therefore, these parts are left to user space.
+
+The J1939 sockets operate on CAN network devices (see SocketCAN). Any J1939
+user space library operating on CAN raw sockets will still operate properly.
+Since such library does not communicate with the in-kernel implementation, care
+must be taken that these two do not interfere. In practice, this means they
+cannot share ECU addresses. A single ECU (or virtual ECU) address is used by
+the library exclusively, or by the in-kernel system exclusively.
+
+J1939 concepts
+==============
+
+PGN
+---
+
+The PGN (Parameter Group Number) is a number to identify a packet. The PGN
+is composed as follows:
+1 bit : Reserved Bit
+1 bit : Data Page
+8 bits : PF (PDU Format)
+8 bits : PS (PDU Specific)
+
+In J1939-21 distinction is made between PDU1 format (where PF < 240) and PDU2
+format (where PF >= 240). Furthermore, when using PDU2 format, the PS-field
+contains a so-called Group Extension, which is part of the PGN. When using PDU2
+format, the Group Extension is set in the PS-field.
+
+On the other hand, when using PDU1 format, the PS-field contains a so-called
+Destination Address, which is _not_ part of the PGN. When communicating a PGN
+from user space to kernel (or visa versa) and PDU2 format is used, the PS-field
+of the PGN shall be set to zero. The Destination Address shall be set
+elsewhere.
+
+Regarding PGN mapping to 29-bit CAN identifier, the Destination Address shall
+be get/set from/to the appropriate bits of the identifier by the kernel.
+
+
+Addressing
+----------
+
+Both static and dynamic addressing methods can be used.
+
+For static addresses, no extra checks are made by the kernel, and provided
+addresses are considered right. This responsibility is for the OEM or system
+integrator.
+
+For dynamic addressing, so-called Address Claiming, extra support is foreseen
+in the kernel. In J1939 any ECU is known by it's 64-bit NAME. At the moment of
+a successful address claim, the kernel keeps track of both NAME and source
+address being claimed. This serves as a base for filter schemes. By default,
+packets with a destination that is not locally, will be rejected.
+
+Mixed mode packets (from a static to a dynamic address or vice versa) are
+allowed. The BSD sockets define separate API calls for getting/setting the
+local & remote address and are applicable for J1939 sockets.
+
+Filtering
+---------
+
+J1939 defines white list filters per socket that a user can set in order to
+receive a subset of the J1939 traffic. Filtering can be based on:
+
+* SA
+* SOURCE_NAME
+* PGN
+
+When multiple filters are in place for a single socket, and a packet comes in
+that matches several of those filters, the packet is only received once for
+that socket.
+
+How to Use J1939
+================
+
+API Calls
+---------
+
+On CAN, you first need to open a socket for communicating over a CAN network.
+To use J1939, #include <linux/can/j1939.h>. From there, <linux/can.h> will be
+included too. To open a socket, use:
+
+.. code-block:: C
+
+ s = socket(PF_CAN, SOCK_DGRAM, CAN_J1939);
+
+J1939 does use SOCK_DGRAM sockets. In the J1939 specification, connections are
+mentioned in the context of transport protocol sessions. These still deliver
+packets to the other end (using several CAN packets). SOCK_STREAM is not
+supported.
+
+After the successful creation of the socket, you would normally use the bind(2)
+and/or connect(2) system call to bind the socket to a CAN interface. After
+binding and/or connecting the socket, you can read(2) and write(2) from/to the
+socket or use send(2), sendto(2), sendmsg(2) and the recv*() counterpart
+operations on the socket as usual. There are also J1939 specific socket options
+described below.
+
+In order to send data, a bind(2) must have been successful. bind(2) assigns a
+local address to a socket.
+
+Different from CAN is that the payload data is just the data that get send,
+without it's header info. The header info is derived from the sockaddr supplied
+to bind(2), connect(2), sendto(2) and recvfrom(2). A write(2) with size 4 will
+result in a packet with 4 bytes.
+
+The sockaddr structure has extensions for use with J1939 as specified below:
+
+.. code-block:: C
+
+ struct sockaddr_can {
+ sa_family_t can_family;
+ int can_ifindex;
+ union {
+ struct {
+ __u64 name;
+ /* pgn:
+ * 8 bit: PS in PDU2 case, else 0
+ * 8 bit: PF
+ * 1 bit: DP
+ * 1 bit: reserved
+ */
+ __u32 pgn;
+ __u8 addr;
+ } j1939;
+ } can_addr;
+ }
+
+can_family & can_ifindex serve the same purpose as for other SocketCAN sockets.
+
+can_addr.j1939.pgn specifies the PGN (max 0x3ffff). Individual bits are
+specified above.
+
+can_addr.j1939.name contains the 64-bit J1939 NAME.
+
+can_addr.j1939.addr contains the address.
+
+The bind(2) system call assigns the local address, i.e. the source address when
+sending packages. If a PGN during bind(2) is set, it's used as a RX filter.
+I.e. only packets with a matching PGN are received. If an ADDR or NAME is set
+it is used as a receive filter, too. It will match the destination NAME or ADDR
+of the incoming packet. The NAME filter will work only if appropriate Address
+Claiming for this name was done on the CAN bus and registered/cached by the
+kernel.
+
+On the other hand connect(2) assigns the remote address, i.e. the destination
+address. The PGN from connect(2) is used as the default PGN when sending
+packets. If ADDR or NAME is set it will be used as the default destination ADDR
+or NAME. Further a set ADDR or NAME during connect(2) is used as a receive
+filter. It will match the source NAME or ADDR of the incoming packet.
+
+Both write(2) and send(2) will send a packet with local address from bind(2) and
+the remote address from connect(2). Use sendto(2) to overwrite the destination
+address.
+
+If can_addr.j1939.name is set (!= 0) the NAME is looked up by the kernel and
+the corresponding ADDR is used. If can_addr.j1939.name is not set (== 0),
+can_addr.j1939.addr is used.
+
+When creating a socket, reasonable defaults are set. Some options can be
+modified with setsockopt(2) & getsockopt(2).
+
+RX path related options:
+
+- SO_J1939_FILTER - configure array of filters
+- SO_J1939_PROMISC - disable filters set by bind(2) and connect(2)
+
+By default no broadcast packets can be send or received. To enable sending or
+receiving broadcast packets use the socket option SO_BROADCAST:
+
+.. code-block:: C
+
+ int value = 1;
+ setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &value, sizeof(value));
+
+The following diagram illustrates the RX path:
+
+.. code::
+
+ +--------------------+
+ | incoming packet |
+ +--------------------+
+ |
+ V
+ +--------------------+
+ | SO_J1939_PROMISC? |
+ +--------------------+
+ | |
+ no | | yes
+ | |
+ .---------' `---------.
+ | |
+ +---------------------------+ |
+ | bind() + connect() + | |
+ | SOCK_BROADCAST filter | |
+ +---------------------------+ |
+ | |
+ |<---------------------'
+ V
+ +---------------------------+
+ | SO_J1939_FILTER |
+ +---------------------------+
+ |
+ V
+ +---------------------------+
+ | socket recv() |
+ +---------------------------+
+
+TX path related options:
+SO_J1939_SEND_PRIO - change default send priority for the socket
+
+Message Flags during send() and Related System Calls
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+send(2), sendto(2) and sendmsg(2) take a 'flags' argument. Currently
+supported flags are:
+
+* MSG_DONTWAIT, i.e. non-blocking operation.
+
+recvmsg(2)
+^^^^^^^^^^
+
+In most cases recvmsg(2) is needed if you want to extract more information than
+recvfrom(2) can provide. For example package priority and timestamp. The
+Destination Address, name and packet priority (if applicable) are attached to
+the msghdr in the recvmsg(2) call. They can be extracted using cmsg(3) macros,
+with cmsg_level == SOL_J1939 && cmsg_type == SCM_J1939_DEST_ADDR,
+SCM_J1939_DEST_NAME or SCM_J1939_PRIO. The returned data is a uint8_t for
+priority and dst_addr, and uint64_t for dst_name.
+
+.. code-block:: C
+
+ uint8_t priority, dst_addr;
+ uint64_t dst_name;
+
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+ switch (cmsg->cmsg_level) {
+ case SOL_CAN_J1939:
+ if (cmsg->cmsg_type == SCM_J1939_DEST_ADDR)
+ dst_addr = *CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_J1939_DEST_NAME)
+ memcpy(&dst_name, CMSG_DATA(cmsg), cmsg->cmsg_len - CMSG_LEN(0));
+ else if (cmsg->cmsg_type == SCM_J1939_PRIO)
+ priority = *CMSG_DATA(cmsg);
+ break;
+ }
+ }
+
+Dynamic Addressing
+------------------
+
+Distinction has to be made between using the claimed address and doing an
+address claim. To use an already claimed address, one has to fill in the
+j1939.name member and provide it to bind(2). If the name had claimed an address
+earlier, all further messages being sent will use that address. And the
+j1939.addr member will be ignored.
+
+An exception on this is PGN 0x0ee00. This is the "Address Claim/Cannot Claim
+Address" message and the kernel will use the j1939.addr member for that PGN if
+necessary.
+
+To claim an address following code example can be used:
+
+.. code-block:: C
+
+ struct sockaddr_can baddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .name = name,
+ .addr = J1939_IDLE_ADDR,
+ .pgn = J1939_NO_PGN, /* to disable bind() rx filter for PGN */
+ },
+ .can_ifindex = if_nametoindex("can0"),
+ };
+
+ bind(sock, (struct sockaddr *)&baddr, sizeof(baddr));
+
+ /* for Address Claiming broadcast must be allowed */
+ int value = 1;
+ setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &value, sizeof(value));
+
+ /* configured advanced RX filter with PGN needed for Address Claiming */
+ const struct j1939_filter filt[] = {
+ {
+ .pgn = J1939_PGN_ADDRESS_CLAIMED,
+ .pgn_mask = J1939_PGN_PDU1_MAX,
+ }, {
+ .pgn = J1939_PGN_ADDRESS_REQUEST,
+ .pgn_mask = J1939_PGN_PDU1_MAX,
+ }, {
+ .pgn = J1939_PGN_ADDRESS_COMMANDED,
+ .pgn_mask = J1939_PGN_MAX,
+ },
+ };
+
+ setsockopt(sock, SOL_CAN_J1939, SO_J1939_FILTER, &filt, sizeof(filt));
+
+ uint64_t dat = htole64(name);
+ const struct sockaddr_can saddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .pgn = J1939_PGN_ADDRESS_CLAIMED,
+ .addr = J1939_NO_ADDR,
+ },
+ };
+
+ /* Afterwards do a sendto(2) with data set to the NAME (Little Endian). If the
+ * NAME provided, does not match the j1939.name provided to bind(2), EPROTO
+ * will be returned.
+ */
+ sendto(sock, dat, sizeof(dat), 0, (const struct sockaddr *)&saddr, sizeof(saddr));
+
+If no-one else contests the address claim within 250ms after transmission, the
+kernel marks the NAME-SA assignment as valid. The valid assignment will be kept
+among other valid NAME-SA assignments. From that point, any socket bound to the
+NAME can send packets.
+
+If another ECU claims the address, the kernel will mark the NAME-SA expired.
+No socket bound to the NAME can send packets (other than address claims). To
+claim another address, some socket bound to NAME, must bind(2) again, but with
+only j1939.addr changed to the new SA, and must then send a valid address claim
+packet. This restarts the state machine in the kernel (and any other
+participant on the bus) for this NAME.
+
+can-utils also include the jacd tool, so it can be used as code example or as
+default Address Claiming daemon.
+
+Send Examples
+-------------
+
+Static Addressing
+^^^^^^^^^^^^^^^^^
+
+This example will send a PGN (0x12300) from SA 0x20 to DA 0x30.
+
+Bind:
+
+.. code-block:: C
+
+ struct sockaddr_can baddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .name = J1939_NO_NAME,
+ .addr = 0x20,
+ .pgn = J1939_NO_PGN,
+ },
+ .can_ifindex = if_nametoindex("can0"),
+ };
+
+ bind(sock, (struct sockaddr *)&baddr, sizeof(baddr));
+
+Now, the socket 'sock' is bound to the SA 0x20. Since no connect(2) was called,
+at this point we can use only sendto(2) or sendmsg(2).
+
+Send:
+
+.. code-block:: C
+
+ const struct sockaddr_can saddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .name = J1939_NO_NAME;
+ .pgn = 0x30,
+ .addr = 0x12300,
+ },
+ };
+
+ sendto(sock, dat, sizeof(dat), 0, (const struct sockaddr *)&saddr, sizeof(saddr));
diff --git a/Documentation/networking/mac80211_hwsim/README b/Documentation/networking/mac80211_hwsim/mac80211_hwsim.rst
similarity index 81%
rename from Documentation/networking/mac80211_hwsim/README
rename to Documentation/networking/mac80211_hwsim/mac80211_hwsim.rst
index 3566a72..d2266ce 100644
--- a/Documentation/networking/mac80211_hwsim/README
+++ b/Documentation/networking/mac80211_hwsim/mac80211_hwsim.rst
@@ -1,5 +1,13 @@
+:orphan:
+
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===================================================================
mac80211_hwsim - software simulator of 802.11 radio(s) for mac80211
-Copyright (c) 2008, Jouni Malinen <j@w1.fi>
+===================================================================
+
+:Copyright: |copy| 2008, Jouni Malinen <j@w1.fi>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
@@ -7,6 +15,7 @@
Introduction
+============
mac80211_hwsim is a Linux kernel module that can be used to simulate
arbitrary number of IEEE 802.11 radios for mac80211. It can be used to
@@ -43,6 +52,7 @@
Simple example
+==============
This example shows how to use mac80211_hwsim to simulate two radios:
one to act as an access point and the other as a station that
@@ -50,17 +60,19 @@
care of WPA2-PSK authentication. In addition, hostapd is also
processing access point side of association.
+::
-# Build mac80211_hwsim as part of kernel configuration
-# Load the module
-modprobe mac80211_hwsim
+ # Build mac80211_hwsim as part of kernel configuration
-# Run hostapd (AP) for wlan0
-hostapd hostapd.conf
+ # Load the module
+ modprobe mac80211_hwsim
-# Run wpa_supplicant (station) for wlan1
-wpa_supplicant -Dnl80211 -iwlan1 -c wpa_supplicant.conf
+ # Run hostapd (AP) for wlan0
+ hostapd hostapd.conf
+
+ # Run wpa_supplicant (station) for wlan1
+ wpa_supplicant -Dnl80211 -iwlan1 -c wpa_supplicant.conf
More test cases are available in hostap.git:
diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 2f24a19..025cc9b 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -30,7 +30,7 @@
0 - disabled / RFC 3443 [Short] Pipe Model
1 - enabled / RFC 3443 Uniform Model (default)
-default_ttl - BOOL
+default_ttl - INTEGER
Default TTL value to use for MPLS packets where it cannot be
propagated from an IP header, either because one isn't present
or ip_ttl_propagate has been disabled.
diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst
index fe46d48..ace5620 100644
--- a/Documentation/networking/msg_zerocopy.rst
+++ b/Documentation/networking/msg_zerocopy.rst
@@ -7,7 +7,7 @@
=====
The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
-The feature is currently implemented for TCP sockets.
+The feature is currently implemented for TCP and UDP sockets.
Opportunity and Caveats
@@ -50,7 +50,7 @@
patchset
[PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY
- http://lkml.kernel.org/r/20170803202945.70750-1-willemdebruijn.kernel@gmail.com
+ https://lkml.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com
Interface
diff --git a/Documentation/networking/net_dim.txt b/Documentation/networking/net_dim.txt
index 9cb31c5..9bdb7d5 100644
--- a/Documentation/networking/net_dim.txt
+++ b/Documentation/networking/net_dim.txt
@@ -92,16 +92,16 @@
Part III: Registering a Network Device to DIM
==============================================
-Net DIM API exposes the main function net_dim(struct net_dim *dim,
-struct net_dim_sample end_sample). This function is the entry point to the Net
+Net DIM API exposes the main function net_dim(struct dim *dim,
+struct dim_sample end_sample). This function is the entry point to the Net
DIM algorithm and has to be called every time the driver would like to check if
it should change interrupt moderation parameters. The driver should provide two
-data structures: struct net_dim and struct net_dim_sample. Struct net_dim
+data structures: struct dim and struct dim_sample. Struct dim
describes the state of DIM for a specific object (RX queue, TX queue,
other queues, etc.). This includes the current selected profile, previous data
samples, the callback function provided by the driver and more.
-Struct net_dim_sample describes a data sample, which will be compared to the
-data sample stored in struct net_dim in order to decide on the algorithm's next
+Struct dim_sample describes a data sample, which will be compared to the
+data sample stored in struct dim in order to decide on the algorithm's next
step. The sample should include bytes, packets and interrupts, measured by
the driver.
@@ -110,9 +110,9 @@
interrupt. Since Net DIM has a built-in moderation and it might decide to skip
iterations under certain conditions, there is no need to moderate the net_dim()
calls as well. As mentioned above, the driver needs to provide an object of type
-struct net_dim to the net_dim() function call. It is advised for each entity
-using Net DIM to hold a struct net_dim as part of its data structure and use it
-as the main Net DIM API object. The struct net_dim_sample should hold the latest
+struct dim to the net_dim() function call. It is advised for each entity
+using Net DIM to hold a struct dim as part of its data structure and use it
+as the main Net DIM API object. The struct dim_sample should hold the latest
bytes, packets and interrupts count. No need to perform any calculations, just
include the raw data.
@@ -132,19 +132,19 @@
my_driver.c:
-#include <linux/net_dim.h>
+#include <linux/dim.h>
/* Callback for net DIM to schedule on a decision to change moderation */
void my_driver_do_dim_work(struct work_struct *work)
{
- /* Get struct net_dim from struct work_struct */
- struct net_dim *dim = container_of(work, struct net_dim,
- work);
+ /* Get struct dim from struct work_struct */
+ struct dim *dim = container_of(work, struct dim,
+ work);
/* Do interrupt moderation related stuff */
...
/* Signal net DIM work is done and it should move to next iteration */
- dim->state = NET_DIM_START_MEASURE;
+ dim->state = DIM_START_MEASURE;
}
/* My driver's interrupt handler */
@@ -152,13 +152,13 @@
{
...
/* A struct to hold current measured data */
- struct net_dim_sample dim_sample;
+ struct dim_sample dim_sample;
...
/* Initiate data sample struct with current data */
- net_dim_sample(my_entity->events,
- my_entity->packets,
- my_entity->bytes,
- &dim_sample);
+ dim_update_sample(my_entity->events,
+ my_entity->packets,
+ my_entity->bytes,
+ &dim_sample);
/* Call net DIM */
net_dim(&my_entity->dim, dim_sample);
...
diff --git a/Documentation/networking/netdev-FAQ.rst b/Documentation/networking/netdev-FAQ.rst
index 0ac5fa7..642fa96 100644
--- a/Documentation/networking/netdev-FAQ.rst
+++ b/Documentation/networking/netdev-FAQ.rst
@@ -131,6 +131,19 @@
version that should be applied. If there is any doubt, the maintainer
will reply and ask what should be done.
+Q: I made changes to only a few patches in a patch series should I resend only those changed?
+---------------------------------------------------------------------------------------------
+A: No, please resend the entire patch series and make sure you do number your
+patches such that it is clear this is the latest and greatest set of patches
+that can be applied.
+
+Q: I submitted multiple versions of a patch series and it looks like a version other than the last one has been accepted, what should I do?
+-------------------------------------------------------------------------------------------------------------------------------------------
+A: There is no revert possible, once it is pushed out, it stays like that.
+Please send incremental versions on top of what has been merged in order to fix
+the patches the way they would look like if your latest patch series was to be
+merged.
+
Q: How can I tell what patches are queued up for backporting to the various stable releases?
--------------------------------------------------------------------------------------------
A: Normally Greg Kroah-Hartman collects stable commits himself, but for
diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
index c4a54c1..58dd1c1 100644
--- a/Documentation/networking/netdev-features.txt
+++ b/Documentation/networking/netdev-features.txt
@@ -115,7 +115,7 @@
* Transmit UDP segmentation offload
-NETIF_F_GSO_UDP_GSO_L4 accepts a single UDP header with a payload that exceeds
+NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if less than
gso_size).
diff --git a/Documentation/networking/nf_conntrack-sysctl.txt b/Documentation/networking/nf_conntrack-sysctl.txt
index 1669dc2..f75c2ce 100644
--- a/Documentation/networking/nf_conntrack-sysctl.txt
+++ b/Documentation/networking/nf_conntrack-sysctl.txt
@@ -157,7 +157,16 @@
default 30
nf_conntrack_udp_timeout_stream - INTEGER (seconds)
- default 180
+ default 120
This extended timeout will be used in case there is an UDP stream
detected.
+
+nf_conntrack_gre_timeout - INTEGER (seconds)
+ default 30
+
+nf_conntrack_gre_timeout_stream - INTEGER (seconds)
+ default 180
+
+ This extended timeout will be used in case there is an GRE stream
+ detected.
diff --git a/Documentation/networking/nf_flowtable.txt b/Documentation/networking/nf_flowtable.txt
index 54128c5..ca2136c 100644
--- a/Documentation/networking/nf_flowtable.txt
+++ b/Documentation/networking/nf_flowtable.txt
@@ -44,10 +44,10 @@
/ \ / \ |Routing | / \
--> ingress ---> prerouting ---> |decision| | postrouting |--> neigh_xmit
\_________/ \__________/ ---------- \____________/ ^
- | ^ | | ^ |
- flowtable | | ____\/___ | |
- | | | / \ | |
- __\/___ | --------->| forward |------------ |
+ | ^ | ^ |
+ flowtable | ____\/___ | |
+ | | / \ | |
+ __\/___ | | forward |------------ |
|-----| | \_________/ |
|-----| | 'flow offload' rule |
|-----| | adds entry to |
diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt
index 355c6d8..b203d13 100644
--- a/Documentation/networking/operstates.txt
+++ b/Documentation/networking/operstates.txt
@@ -22,8 +22,9 @@
2. Querying from userspace
Both admin and operational state can be queried via the netlink
-operation RTM_GETLINK. It is also possible to subscribe to RTMGRP_LINK
-to be notified of updates. This is important for setting from userspace.
+operation RTM_GETLINK. It is also possible to subscribe to RTNLGRP_LINK
+to be notified of updates while the interface is admin up. This is
+important for setting from userspace.
These values contain interface state:
@@ -101,8 +102,9 @@
complete. Corresponding functions are netif_dormant_on() to set the
flag, netif_dormant_off() to clear it and netif_dormant() to query.
-On device allocation, networking core sets the flags equivalent to
-netif_carrier_ok() and !netif_dormant().
+On device allocation, both flags __LINK_STATE_NOCARRIER and
+__LINK_STATE_DORMANT are cleared, so the effective state is equivalent
+to netif_carrier_ok() and !netif_dormant().
Whenever the driver CHANGES one of these flags, a workqueue event is
@@ -133,11 +135,11 @@
driver. Afterwards, the userspace application can set IFLA_OPERSTATE
to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
netif_carrier_off() or netif_dormant_on(). Changes made by userspace
-are multicasted on the netlink group RTMGRP_LINK.
+are multicasted on the netlink group RTNLGRP_LINK.
So basically a 802.1X supplicant interacts with the kernel like this:
--subscribe to RTMGRP_LINK
+-subscribe to RTNLGRP_LINK
-set IFLA_LINKMODE to 1 via RTM_SETLINK
-query RTM_GETLINK once to get initial state
-if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until
diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst
new file mode 100644
index 0000000..a689966
--- /dev/null
+++ b/Documentation/networking/phy.rst
@@ -0,0 +1,490 @@
+=====================
+PHY Abstraction Layer
+=====================
+
+Purpose
+=======
+
+Most network devices consist of set of registers which provide an interface
+to a MAC layer, which communicates with the physical connection through a
+PHY. The PHY concerns itself with negotiating link parameters with the link
+partner on the other side of the network connection (typically, an ethernet
+cable), and provides a register interface to allow drivers to determine what
+settings were chosen, and to configure what settings are allowed.
+
+While these devices are distinct from the network devices, and conform to a
+standard layout for the registers, it has been common practice to integrate
+the PHY management code with the network driver. This has resulted in large
+amounts of redundant code. Also, on embedded systems with multiple (and
+sometimes quite different) ethernet controllers connected to the same
+management bus, it is difficult to ensure safe use of the bus.
+
+Since the PHYs are devices, and the management busses through which they are
+accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
+In doing so, it has these goals:
+
+#. Increase code-reuse
+#. Increase overall code-maintainability
+#. Speed development time for new network drivers, and for new systems
+
+Basically, this layer is meant to provide an interface to PHY devices which
+allows network driver writers to write as little code as possible, while
+still providing a full feature set.
+
+The MDIO bus
+============
+
+Most network devices are connected to a PHY by means of a management bus.
+Different devices use different busses (though some share common interfaces).
+In order to take advantage of the PAL, each bus interface needs to be
+registered as a distinct device.
+
+#. read and write functions must be implemented. Their prototypes are::
+
+ int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
+ int read(struct mii_bus *bus, int mii_id, int regnum);
+
+ mii_id is the address on the bus for the PHY, and regnum is the register
+ number. These functions are guaranteed not to be called from interrupt
+ time, so it is safe for them to block, waiting for an interrupt to signal
+ the operation is complete
+
+#. A reset function is optional. This is used to return the bus to an
+ initialized state.
+
+#. A probe function is needed. This function should set up anything the bus
+ driver needs, setup the mii_bus structure, and register with the PAL using
+ mdiobus_register. Similarly, there's a remove function to undo all of
+ that (use mdiobus_unregister).
+
+#. Like any driver, the device_driver structure must be configured, and init
+ exit functions are used to register the driver.
+
+#. The bus must also be declared somewhere as a device, and registered.
+
+As an example for how one driver implemented an mdio bus driver, see
+drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
+for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
+
+(RG)MII/electrical interface considerations
+===========================================
+
+The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
+electrical signal interface using a synchronous 125Mhz clock signal and several
+data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+sink) have enough setup and hold times to sample the data lines correctly. The
+PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+the PHY driver and optionally the MAC driver, implement the required delay. The
+values of phy_interface_t must be understood from the perspective of the PHY
+device itself, leading to the following:
+
+* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+ internal delay by itself, it assumes that either the Ethernet MAC (if capable
+ or the PCB traces) insert the correct 1.5-2ns delay
+
+* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
+ for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
+ for the receive data lines (RXD[3:0]) processed by the PHY device
+
+* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
+ both transmit AND receive data lines from/to the PHY device
+
+Whenever possible, use the PHY side RGMII delay for these reasons:
+
+* PHY devices may offer sub-nanosecond granularity in how they allow a
+ receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+ precision may be required to account for differences in PCB trace lengths
+
+* PHY devices are typically qualified for a large range of applications
+ (industrial, medical, automotive...), and they provide a constant and
+ reliable delay across temperature/pressure/voltage ranges
+
+* PHY device drivers in PHYLIB being reusable by nature, being able to
+ configure correctly a specified delay enables more designs with similar delay
+ requirements to be operate correctly
+
+For cases where the PHY is not capable of providing this delay, but the
+Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
+should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+configured correctly in order to provide the required transmit and/or receive
+side delay from the perspective of the PHY device. Conversely, if the Ethernet
+MAC driver looks at the phy_interface_t value, for any other mode but
+PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+disabled.
+
+In case neither the Ethernet MAC, nor the PHY are capable of providing the
+required delays, as defined per the RGMII standard, several options may be
+available:
+
+* Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+ set of pins'strength, delays, and voltage; and it may be a suitable
+ option to insert the expected 2ns RGMII delay.
+
+* Modifying the PCB design to include a fixed delay (e.g: using a specifically
+ designed serpentine), which may not require software configuration at all.
+
+Common problems with RGMII delay mismatch
+-----------------------------------------
+
+When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
+will most likely result in the clock and data line signals to be unstable when
+the PHY or MAC take a snapshot of these signals to translate them into logical
+1 or 0 states and reconstruct the data being transmitted/received. Typical
+symptoms include:
+
+* Transmission/reception partially works, and there is frequent or occasional
+ packet loss observed
+
+* Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
+ or just discard them all
+
+* Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+ (since there is enough setup/hold time in that case)
+
+Connecting to a PHY
+===================
+
+Sometime during startup, the network driver needs to establish a connection
+between the PHY device, and the network device. At this time, the PHY's bus
+and drivers need to all have been loaded, so it is ready for the connection.
+At this point, there are several ways to connect to the PHY:
+
+#. The PAL handles everything, and only calls the network driver when
+ the link state changes, so it can react.
+
+#. The PAL handles everything except interrupts (usually because the
+ controller has the interrupt registers).
+
+#. The PAL handles everything, but checks in with the driver every second,
+ allowing the network driver to react first to any changes before the PAL
+ does.
+
+#. The PAL serves only as a library of functions, with the network device
+ manually calling functions to update status, and configure the PHY
+
+
+Letting the PHY Abstraction Layer do Everything
+===============================================
+
+If you choose option 1 (The hope is that every driver can, but to still be
+useful to drivers that can't), connecting to the PHY is simple:
+
+First, you need a function to react to changes in the link state. This
+function follows this protocol::
+
+ static void adjust_link(struct net_device *dev);
+
+Next, you need to know the device name of the PHY connected to this device.
+The name will look something like, "0:00", where the first number is the
+bus id, and the second is the PHY's address on that bus. Typically,
+the bus is responsible for making its ID unique.
+
+Now, to connect, just call this function::
+
+ phydev = phy_connect(dev, phy_name, &adjust_link, interface);
+
+*phydev* is a pointer to the phy_device structure which represents the PHY.
+If phy_connect is successful, it will return the pointer. dev, here, is the
+pointer to your net_device. Once done, this function will have started the
+PHY's software state machine, and registered for the PHY's interrupt, if it
+has one. The phydev structure will be populated with information about the
+current state, though the PHY will not yet be truly operational at this
+point.
+
+PHY-specific flags should be set in phydev->dev_flags prior to the call
+to phy_connect() such that the underlying PHY driver can check for flags
+and perform specific operations based on them.
+This is useful if the system has put hardware restrictions on
+the PHY/controller, of which the PHY needs to be aware.
+
+*interface* is a u32 which specifies the connection type used
+between the controller and the PHY. Examples are GMII, MII,
+RGMII, and SGMII. See "PHY interface mode" below. For a full
+list, see include/linux/phy.h
+
+Now just make sure that phydev->supported and phydev->advertising have any
+values pruned from them which don't make sense for your controller (a 10/100
+controller may be connected to a gigabit capable PHY, so you would need to
+mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
+for these bitfields. Note that you should not SET any bits, except the
+SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
+put into an unsupported state.
+
+Lastly, once the controller is ready to handle network traffic, you call
+phy_start(phydev). This tells the PAL that you are ready, and configures the
+PHY to connect to the network. If the MAC interrupt of your network driver
+also handles PHY status changes, just set phydev->irq to PHY_IGNORE_INTERRUPT
+before you call phy_start and use phy_mac_interrupt() from the network
+driver. If you don't want to use interrupts, set phydev->irq to PHY_POLL.
+phy_start() enables the PHY interrupts (if applicable) and starts the
+phylib state machine.
+
+When you want to disconnect from the network (even if just briefly), you call
+phy_stop(phydev). This function also stops the phylib state machine and
+disables PHY interrupts.
+
+PHY interface modes
+===================
+
+The PHY interface mode supplied in the phy_connect() family of functions
+defines the initial operating mode of the PHY interface. This is not
+guaranteed to remain constant; there are PHYs which dynamically change
+their interface mode without software interaction depending on the
+negotiation results.
+
+Some of the interface modes are described below:
+
+``PHY_INTERFACE_MODE_1000BASEX``
+ This defines the 1000BASE-X single-lane serdes link as defined by the
+ 802.3 standard section 36. The link operates at a fixed bit rate of
+ 1.25Gbaud using a 10B/8B encoding scheme, resulting in an underlying
+ data rate of 1Gbps. Embedded in the data stream is a 16-bit control
+ word which is used to negotiate the duplex and pause modes with the
+ remote end. This does not include "up-clocked" variants such as 2.5Gbps
+ speeds (see below.)
+
+``PHY_INTERFACE_MODE_2500BASEX``
+ This defines a variant of 1000BASE-X which is clocked 2.5 times faster,
+ than the 802.3 standard giving a fixed bit rate of 3.125Gbaud.
+
+``PHY_INTERFACE_MODE_SGMII``
+ This is used for Cisco SGMII, which is a modification of 1000BASE-X
+ as defined by the 802.3 standard. The SGMII link consists of a single
+ serdes lane running at a fixed bit rate of 1.25Gbaud with 10B/8B
+ encoding. The underlying data rate is 1Gbps, with the slower speeds of
+ 100Mbps and 10Mbps being achieved through replication of each data symbol.
+ The 802.3 control word is re-purposed to send the negotiated speed and
+ duplex information from to the MAC, and for the MAC to acknowledge
+ receipt. This does not include "up-clocked" variants such as 2.5Gbps
+ speeds.
+
+ Note: mismatched SGMII vs 1000BASE-X configuration on a link can
+ successfully pass data in some circumstances, but the 16-bit control
+ word will not be correctly interpreted, which may cause mismatches in
+ duplex, pause or other settings. This is dependent on the MAC and/or
+ PHY behaviour.
+
+
+Pause frames / flow control
+===========================
+
+The PHY does not participate directly in flow control/pause frames except by
+making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
+MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
+controller supports such a thing. Since flow control/pause frames generation
+involves the Ethernet MAC driver, it is recommended that this driver takes care
+of properly indicating advertisement and support for such features by setting
+the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
+either before or after phy_connect() and/or as a result of implementing the
+ethtool::set_pauseparam feature.
+
+
+Keeping Close Tabs on the PAL
+=============================
+
+It is possible that the PAL's built-in state machine needs a little help to
+keep your network device and the PHY properly in sync. If so, you can
+register a helper function when connecting to the PHY, which will be called
+every second before the state machine reacts to any changes. To do this, you
+need to manually call phy_attach() and phy_prepare_link(), and then call
+phy_start_machine() with the second argument set to point to your special
+handler.
+
+Currently there are no examples of how to use this functionality, and testing
+on it has been limited because the author does not have any drivers which use
+it (they all use option 1). So Caveat Emptor.
+
+Doing it all yourself
+=====================
+
+There's a remote chance that the PAL's built-in state machine cannot track
+the complex interactions between the PHY and your network device. If this is
+so, you can simply call phy_attach(), and not call phy_start_machine or
+phy_prepare_link(). This will mean that phydev->state is entirely yours to
+handle (phy_start and phy_stop toggle between some of the states, so you
+might need to avoid them).
+
+An effort has been made to make sure that useful functionality can be
+accessed without the state-machine running, and most of these functions are
+descended from functions which did not interact with a complex state-machine.
+However, again, no effort has been made so far to test running without the
+state machine, so tryer beware.
+
+Here is a brief rundown of the functions::
+
+ int phy_read(struct phy_device *phydev, u16 regnum);
+ int phy_write(struct phy_device *phydev, u16 regnum, u16 val);
+
+Simple read/write primitives. They invoke the bus's read/write function
+pointers.
+::
+
+ void phy_print_status(struct phy_device *phydev);
+
+A convenience function to print out the PHY status neatly.
+::
+
+ void phy_request_interrupt(struct phy_device *phydev);
+
+Requests the IRQ for the PHY interrupts.
+::
+
+ struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
+ phy_interface_t interface);
+
+Attaches a network device to a particular PHY, binding the PHY to a generic
+driver if none was found during bus initialization.
+::
+
+ int phy_start_aneg(struct phy_device *phydev);
+
+Using variables inside the phydev structure, either configures advertising
+and resets autonegotiation, or disables autonegotiation, and configures
+forced settings.
+::
+
+ static inline int phy_read_status(struct phy_device *phydev);
+
+Fills the phydev structure with up-to-date information about the current
+settings in the PHY.
+::
+
+ int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
+
+Ethtool convenience functions.
+::
+
+ int phy_mii_ioctl(struct phy_device *phydev,
+ struct mii_ioctl_data *mii_data, int cmd);
+
+The MII ioctl. Note that this function will completely screw up the state
+machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
+use this only to write registers which are not standard, and don't set off
+a renegotiation.
+
+PHY Device Drivers
+==================
+
+With the PHY Abstraction Layer, adding support for new PHYs is
+quite easy. In some cases, no work is required at all! However,
+many PHYs require a little hand-holding to get up-and-running.
+
+Generic PHY driver
+------------------
+
+If the desired PHY doesn't have any errata, quirks, or special
+features you want to support, then it may be best to not add
+support, and let the PHY Abstraction Layer's Generic PHY Driver
+do all of the work.
+
+Writing a PHY driver
+--------------------
+
+If you do need to write a PHY driver, the first thing to do is
+make sure it can be matched with an appropriate PHY device.
+This is done during bus initialization by reading the device's
+UID (stored in registers 2 and 3), then comparing it to each
+driver's phy_id field by ANDing it with each driver's
+phy_id_mask field. Also, it needs a name. Here's an example::
+
+ static struct phy_driver dm9161_driver = {
+ .phy_id = 0x0181b880,
+ .name = "Davicom DM9161E",
+ .phy_id_mask = 0x0ffffff0,
+ ...
+ }
+
+Next, you need to specify what features (speed, duplex, autoneg,
+etc) your PHY device and driver support. Most PHYs support
+PHY_BASIC_FEATURES, but you can look in include/mii.h for other
+features.
+
+Each driver consists of a number of function pointers, documented
+in include/linux/phy.h under the phy_driver structure.
+
+Of these, only config_aneg and read_status are required to be
+assigned by the driver code. The rest are optional. Also, it is
+preferred to use the generic phy driver's versions of these two
+functions if at all possible: genphy_read_status and
+genphy_config_aneg. If this is not possible, it is likely that
+you only need to perform some actions before and after invoking
+these functions, and so your functions will wrap the generic
+ones.
+
+Feel free to look at the Marvell, Cicada, and Davicom drivers in
+drivers/net/phy/ for examples (the lxt and qsemi drivers have
+not been tested as of this writing).
+
+The PHY's MMD register accesses are handled by the PAL framework
+by default, but can be overridden by a specific PHY driver if
+required. This could be the case if a PHY was released for
+manufacturing before the MMD PHY register definitions were
+standardized by the IEEE. Most modern PHYs will be able to use
+the generic PAL framework for accessing the PHY's MMD registers.
+An example of such usage is for Energy Efficient Ethernet support,
+implemented in the PAL. This support uses the PAL to access MMD
+registers for EEE query and configuration if the PHY supports
+the IEEE standard access mechanisms, or can use the PHY's specific
+access interfaces if overridden by the specific PHY driver. See
+the Micrel driver in drivers/net/phy/ for an example of how this
+can be implemented.
+
+Board Fixups
+============
+
+Sometimes the specific interaction between the platform and the PHY requires
+special handling. For instance, to change where the PHY's clock input is,
+or to add a delay to account for latency issues in the data path. In order
+to support such contingencies, the PHY Layer allows platform code to register
+fixups to be run when the PHY is brought up (or subsequently reset).
+
+When the PHY Layer brings up a PHY it checks to see if there are any fixups
+registered for it, matching based on UID (contained in the PHY device's phy_id
+field) and the bus identifier (contained in phydev->dev.bus_id). Both must
+match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
+wildcards for the bus ID and UID, respectively.
+
+When a match is found, the PHY layer will invoke the run function associated
+with the fixup. This function is passed a pointer to the phy_device of
+interest. It should therefore only operate on that PHY.
+
+The platform code can either register the fixup using phy_register_fixup()::
+
+ int phy_register_fixup(const char *phy_id,
+ u32 phy_uid, u32 phy_uid_mask,
+ int (*run)(struct phy_device *));
+
+Or using one of the two stubs, phy_register_fixup_for_uid() and
+phy_register_fixup_for_id()::
+
+ int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
+ int (*run)(struct phy_device *));
+ int phy_register_fixup_for_id(const char *phy_id,
+ int (*run)(struct phy_device *));
+
+The stubs set one of the two matching criteria, and set the other one to
+match anything.
+
+When phy_register_fixup() or \*_for_uid()/\*_for_id() is called at module,
+unregister fixup and free allocate memory are required.
+
+Call one of following function before unloading module::
+
+ int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
+ int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
+ int phy_register_fixup_for_id(const char *phy_id);
+
+Standards
+=========
+
+IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
+http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
+
+RGMII v1.3:
+http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
+
+RGMII v2.0:
+http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
deleted file mode 100644
index bdec0f7..0000000
--- a/Documentation/networking/phy.txt
+++ /dev/null
@@ -1,427 +0,0 @@
-
--------
-PHY Abstraction Layer
-(Updated 2008-04-08)
-
-Purpose
-
- Most network devices consist of set of registers which provide an interface
- to a MAC layer, which communicates with the physical connection through a
- PHY. The PHY concerns itself with negotiating link parameters with the link
- partner on the other side of the network connection (typically, an ethernet
- cable), and provides a register interface to allow drivers to determine what
- settings were chosen, and to configure what settings are allowed.
-
- While these devices are distinct from the network devices, and conform to a
- standard layout for the registers, it has been common practice to integrate
- the PHY management code with the network driver. This has resulted in large
- amounts of redundant code. Also, on embedded systems with multiple (and
- sometimes quite different) ethernet controllers connected to the same
- management bus, it is difficult to ensure safe use of the bus.
-
- Since the PHYs are devices, and the management busses through which they are
- accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
- In doing so, it has these goals:
-
- 1) Increase code-reuse
- 2) Increase overall code-maintainability
- 3) Speed development time for new network drivers, and for new systems
-
- Basically, this layer is meant to provide an interface to PHY devices which
- allows network driver writers to write as little code as possible, while
- still providing a full feature set.
-
-The MDIO bus
-
- Most network devices are connected to a PHY by means of a management bus.
- Different devices use different busses (though some share common interfaces).
- In order to take advantage of the PAL, each bus interface needs to be
- registered as a distinct device.
-
- 1) read and write functions must be implemented. Their prototypes are:
-
- int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
- int read(struct mii_bus *bus, int mii_id, int regnum);
-
- mii_id is the address on the bus for the PHY, and regnum is the register
- number. These functions are guaranteed not to be called from interrupt
- time, so it is safe for them to block, waiting for an interrupt to signal
- the operation is complete
-
- 2) A reset function is optional. This is used to return the bus to an
- initialized state.
-
- 3) A probe function is needed. This function should set up anything the bus
- driver needs, setup the mii_bus structure, and register with the PAL using
- mdiobus_register. Similarly, there's a remove function to undo all of
- that (use mdiobus_unregister).
-
- 4) Like any driver, the device_driver structure must be configured, and init
- exit functions are used to register the driver.
-
- 5) The bus must also be declared somewhere as a device, and registered.
-
- As an example for how one driver implemented an mdio bus driver, see
- drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
- for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
-
-(RG)MII/electrical interface considerations
-
- The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
- electrical signal interface using a synchronous 125Mhz clock signal and several
- data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
- between the clock line (RXC or TXC) and the data lines to let the PHY (clock
- sink) have enough setup and hold times to sample the data lines correctly. The
- PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
- the PHY driver and optionally the MAC driver, implement the required delay. The
- values of phy_interface_t must be understood from the perspective of the PHY
- device itself, leading to the following:
-
- * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
- internal delay by itself, it assumes that either the Ethernet MAC (if capable
- or the PCB traces) insert the correct 1.5-2ns delay
-
- * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
- for the transmit data lines (TXD[3:0]) processed by the PHY device
-
- * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
- for the receive data lines (RXD[3:0]) processed by the PHY device
-
- * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
- both transmit AND receive data lines from/to the PHY device
-
- Whenever possible, use the PHY side RGMII delay for these reasons:
-
- * PHY devices may offer sub-nanosecond granularity in how they allow a
- receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
- precision may be required to account for differences in PCB trace lengths
-
- * PHY devices are typically qualified for a large range of applications
- (industrial, medical, automotive...), and they provide a constant and
- reliable delay across temperature/pressure/voltage ranges
-
- * PHY device drivers in PHYLIB being reusable by nature, being able to
- configure correctly a specified delay enables more designs with similar delay
- requirements to be operate correctly
-
- For cases where the PHY is not capable of providing this delay, but the
- Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
- should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
- configured correctly in order to provide the required transmit and/or receive
- side delay from the perspective of the PHY device. Conversely, if the Ethernet
- MAC driver looks at the phy_interface_t value, for any other mode but
- PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
- disabled.
-
- In case neither the Ethernet MAC, nor the PHY are capable of providing the
- required delays, as defined per the RGMII standard, several options may be
- available:
-
- * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
- set of pins'strength, delays, and voltage; and it may be a suitable
- option to insert the expected 2ns RGMII delay.
-
- * Modifying the PCB design to include a fixed delay (e.g: using a specifically
- designed serpentine), which may not require software configuration at all.
-
-Common problems with RGMII delay mismatch
-
- When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
- will most likely result in the clock and data line signals to be unstable when
- the PHY or MAC take a snapshot of these signals to translate them into logical
- 1 or 0 states and reconstruct the data being transmitted/received. Typical
- symptoms include:
-
- * Transmission/reception partially works, and there is frequent or occasional
- packet loss observed
-
- * Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
- or just discard them all
-
- * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
- (since there is enough setup/hold time in that case)
-
-
-Connecting to a PHY
-
- Sometime during startup, the network driver needs to establish a connection
- between the PHY device, and the network device. At this time, the PHY's bus
- and drivers need to all have been loaded, so it is ready for the connection.
- At this point, there are several ways to connect to the PHY:
-
- 1) The PAL handles everything, and only calls the network driver when
- the link state changes, so it can react.
-
- 2) The PAL handles everything except interrupts (usually because the
- controller has the interrupt registers).
-
- 3) The PAL handles everything, but checks in with the driver every second,
- allowing the network driver to react first to any changes before the PAL
- does.
-
- 4) The PAL serves only as a library of functions, with the network device
- manually calling functions to update status, and configure the PHY
-
-
-Letting the PHY Abstraction Layer do Everything
-
- If you choose option 1 (The hope is that every driver can, but to still be
- useful to drivers that can't), connecting to the PHY is simple:
-
- First, you need a function to react to changes in the link state. This
- function follows this protocol:
-
- static void adjust_link(struct net_device *dev);
-
- Next, you need to know the device name of the PHY connected to this device.
- The name will look something like, "0:00", where the first number is the
- bus id, and the second is the PHY's address on that bus. Typically,
- the bus is responsible for making its ID unique.
-
- Now, to connect, just call this function:
-
- phydev = phy_connect(dev, phy_name, &adjust_link, interface);
-
- phydev is a pointer to the phy_device structure which represents the PHY. If
- phy_connect is successful, it will return the pointer. dev, here, is the
- pointer to your net_device. Once done, this function will have started the
- PHY's software state machine, and registered for the PHY's interrupt, if it
- has one. The phydev structure will be populated with information about the
- current state, though the PHY will not yet be truly operational at this
- point.
-
- PHY-specific flags should be set in phydev->dev_flags prior to the call
- to phy_connect() such that the underlying PHY driver can check for flags
- and perform specific operations based on them.
- This is useful if the system has put hardware restrictions on
- the PHY/controller, of which the PHY needs to be aware.
-
- interface is a u32 which specifies the connection type used
- between the controller and the PHY. Examples are GMII, MII,
- RGMII, and SGMII. For a full list, see include/linux/phy.h
-
- Now just make sure that phydev->supported and phydev->advertising have any
- values pruned from them which don't make sense for your controller (a 10/100
- controller may be connected to a gigabit capable PHY, so you would need to
- mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
- for these bitfields. Note that you should not SET any bits, except the
- SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
- put into an unsupported state.
-
- Lastly, once the controller is ready to handle network traffic, you call
- phy_start(phydev). This tells the PAL that you are ready, and configures the
- PHY to connect to the network. If you want to handle your own interrupts,
- just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start.
- Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL.
-
- When you want to disconnect from the network (even if just briefly), you call
- phy_stop(phydev).
-
-Pause frames / flow control
-
- The PHY does not participate directly in flow control/pause frames except by
- making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
- MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
- controller supports such a thing. Since flow control/pause frames generation
- involves the Ethernet MAC driver, it is recommended that this driver takes care
- of properly indicating advertisement and support for such features by setting
- the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
- either before or after phy_connect() and/or as a result of implementing the
- ethtool::set_pauseparam feature.
-
-
-Keeping Close Tabs on the PAL
-
- It is possible that the PAL's built-in state machine needs a little help to
- keep your network device and the PHY properly in sync. If so, you can
- register a helper function when connecting to the PHY, which will be called
- every second before the state machine reacts to any changes. To do this, you
- need to manually call phy_attach() and phy_prepare_link(), and then call
- phy_start_machine() with the second argument set to point to your special
- handler.
-
- Currently there are no examples of how to use this functionality, and testing
- on it has been limited because the author does not have any drivers which use
- it (they all use option 1). So Caveat Emptor.
-
-Doing it all yourself
-
- There's a remote chance that the PAL's built-in state machine cannot track
- the complex interactions between the PHY and your network device. If this is
- so, you can simply call phy_attach(), and not call phy_start_machine or
- phy_prepare_link(). This will mean that phydev->state is entirely yours to
- handle (phy_start and phy_stop toggle between some of the states, so you
- might need to avoid them).
-
- An effort has been made to make sure that useful functionality can be
- accessed without the state-machine running, and most of these functions are
- descended from functions which did not interact with a complex state-machine.
- However, again, no effort has been made so far to test running without the
- state machine, so tryer beware.
-
- Here is a brief rundown of the functions:
-
- int phy_read(struct phy_device *phydev, u16 regnum);
- int phy_write(struct phy_device *phydev, u16 regnum, u16 val);
-
- Simple read/write primitives. They invoke the bus's read/write function
- pointers.
-
- void phy_print_status(struct phy_device *phydev);
-
- A convenience function to print out the PHY status neatly.
-
- int phy_start_interrupts(struct phy_device *phydev);
- int phy_stop_interrupts(struct phy_device *phydev);
-
- Requests the IRQ for the PHY interrupts, then enables them for
- start, or disables then frees them for stop.
-
- struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
- phy_interface_t interface);
-
- Attaches a network device to a particular PHY, binding the PHY to a generic
- driver if none was found during bus initialization.
-
- int phy_start_aneg(struct phy_device *phydev);
-
- Using variables inside the phydev structure, either configures advertising
- and resets autonegotiation, or disables autonegotiation, and configures
- forced settings.
-
- static inline int phy_read_status(struct phy_device *phydev);
-
- Fills the phydev structure with up-to-date information about the current
- settings in the PHY.
-
- int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
-
- Ethtool convenience functions.
-
- int phy_mii_ioctl(struct phy_device *phydev,
- struct mii_ioctl_data *mii_data, int cmd);
-
- The MII ioctl. Note that this function will completely screw up the state
- machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
- use this only to write registers which are not standard, and don't set off
- a renegotiation.
-
-
-PHY Device Drivers
-
- With the PHY Abstraction Layer, adding support for new PHYs is
- quite easy. In some cases, no work is required at all! However,
- many PHYs require a little hand-holding to get up-and-running.
-
-Generic PHY driver
-
- If the desired PHY doesn't have any errata, quirks, or special
- features you want to support, then it may be best to not add
- support, and let the PHY Abstraction Layer's Generic PHY Driver
- do all of the work.
-
-Writing a PHY driver
-
- If you do need to write a PHY driver, the first thing to do is
- make sure it can be matched with an appropriate PHY device.
- This is done during bus initialization by reading the device's
- UID (stored in registers 2 and 3), then comparing it to each
- driver's phy_id field by ANDing it with each driver's
- phy_id_mask field. Also, it needs a name. Here's an example:
-
- static struct phy_driver dm9161_driver = {
- .phy_id = 0x0181b880,
- .name = "Davicom DM9161E",
- .phy_id_mask = 0x0ffffff0,
- ...
- }
-
- Next, you need to specify what features (speed, duplex, autoneg,
- etc) your PHY device and driver support. Most PHYs support
- PHY_BASIC_FEATURES, but you can look in include/mii.h for other
- features.
-
- Each driver consists of a number of function pointers, documented
- in include/linux/phy.h under the phy_driver structure.
-
- Of these, only config_aneg and read_status are required to be
- assigned by the driver code. The rest are optional. Also, it is
- preferred to use the generic phy driver's versions of these two
- functions if at all possible: genphy_read_status and
- genphy_config_aneg. If this is not possible, it is likely that
- you only need to perform some actions before and after invoking
- these functions, and so your functions will wrap the generic
- ones.
-
- Feel free to look at the Marvell, Cicada, and Davicom drivers in
- drivers/net/phy/ for examples (the lxt and qsemi drivers have
- not been tested as of this writing).
-
- The PHY's MMD register accesses are handled by the PAL framework
- by default, but can be overridden by a specific PHY driver if
- required. This could be the case if a PHY was released for
- manufacturing before the MMD PHY register definitions were
- standardized by the IEEE. Most modern PHYs will be able to use
- the generic PAL framework for accessing the PHY's MMD registers.
- An example of such usage is for Energy Efficient Ethernet support,
- implemented in the PAL. This support uses the PAL to access MMD
- registers for EEE query and configuration if the PHY supports
- the IEEE standard access mechanisms, or can use the PHY's specific
- access interfaces if overridden by the specific PHY driver. See
- the Micrel driver in drivers/net/phy/ for an example of how this
- can be implemented.
-
-Board Fixups
-
- Sometimes the specific interaction between the platform and the PHY requires
- special handling. For instance, to change where the PHY's clock input is,
- or to add a delay to account for latency issues in the data path. In order
- to support such contingencies, the PHY Layer allows platform code to register
- fixups to be run when the PHY is brought up (or subsequently reset).
-
- When the PHY Layer brings up a PHY it checks to see if there are any fixups
- registered for it, matching based on UID (contained in the PHY device's phy_id
- field) and the bus identifier (contained in phydev->dev.bus_id). Both must
- match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
- wildcards for the bus ID and UID, respectively.
-
- When a match is found, the PHY layer will invoke the run function associated
- with the fixup. This function is passed a pointer to the phy_device of
- interest. It should therefore only operate on that PHY.
-
- The platform code can either register the fixup using phy_register_fixup():
-
- int phy_register_fixup(const char *phy_id,
- u32 phy_uid, u32 phy_uid_mask,
- int (*run)(struct phy_device *));
-
- Or using one of the two stubs, phy_register_fixup_for_uid() and
- phy_register_fixup_for_id():
-
- int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
- int (*run)(struct phy_device *));
- int phy_register_fixup_for_id(const char *phy_id,
- int (*run)(struct phy_device *));
-
- The stubs set one of the two matching criteria, and set the other one to
- match anything.
-
- When phy_register_fixup() or *_for_uid()/*_for_id() is called at module,
- unregister fixup and free allocate memory are required.
-
- Call one of following function before unloading module.
-
- int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
- int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
- int phy_register_fixup_for_id(const char *phy_id);
-
-Standards
-
- IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
- http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
-
- RGMII v1.3:
- http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
-
- RGMII v2.0:
- http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt
index 0235ae6..f2a0147 100644
--- a/Documentation/networking/rds.txt
+++ b/Documentation/networking/rds.txt
@@ -389,7 +389,7 @@
a common (to all paths) part, and a per-path struct rds_conn_path. All
I/O workqs and reconnect threads are driven from the rds_conn_path.
Transports such as TCP that are multipath capable may then set up a
- TPC socket per rds_conn_path, and this is managed by the transport via
+ TCP socket per rds_conn_path, and this is managed by the transport via
the transport privatee cp_transport_data pointer.
Transports announce themselves as multipath capable by setting the
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index b540716..180e07d 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -661,7 +661,7 @@
setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7);
The keyring can be manipulated after it has been given to the socket. This
- permits the server to add more keys, replace keys, etc. whilst it is live.
+ permits the server to add more keys, replace keys, etc. while it is live.
(3) A local address must then be bound:
@@ -796,7 +796,9 @@
s64 tx_total_len,
gfp_t gfp,
rxrpc_notify_rx_t notify_rx,
- bool upgrade);
+ bool upgrade,
+ bool intr,
+ unsigned int debug_id);
This allocates the infrastructure to make a new RxRPC call and assigns
call and connection numbers. The call will be made on the UDP port that
@@ -824,6 +826,13 @@
the server upgrade the service to a better one. The resultant service ID
is returned by rxrpc_kernel_recv_data().
+ intr should be set to true if the call should be interruptible. If this
+ is not set, this function may not return until a channel has been
+ allocated; if it is set, the function may return -ERESTARTSYS.
+
+ debug_id is the call debugging ID to be used for tracing. This can be
+ obtained by atomically incrementing rxrpc_debug_id.
+
If this function is successful, an opaque reference to the RxRPC call is
returned. The caller now holds a reference on this and it must be
properly ended.
@@ -1000,51 +1009,6 @@
size should be set when the call is begun. tx_total_len may not be less
than zero.
- (*) Check to see the completion state of a call so that the caller can assess
- whether it needs to be retried.
-
- enum rxrpc_call_completion {
- RXRPC_CALL_SUCCEEDED,
- RXRPC_CALL_REMOTELY_ABORTED,
- RXRPC_CALL_LOCALLY_ABORTED,
- RXRPC_CALL_LOCAL_ERROR,
- RXRPC_CALL_NETWORK_ERROR,
- };
-
- int rxrpc_kernel_check_call(struct socket *sock, struct rxrpc_call *call,
- enum rxrpc_call_completion *_compl,
- u32 *_abort_code);
-
- On return, -EINPROGRESS will be returned if the call is still ongoing; if
- it is finished, *_compl will be set to indicate the manner of completion,
- *_abort_code will be set to any abort code that occurred. 0 will be
- returned on a successful completion, -ECONNABORTED will be returned if the
- client failed due to a remote abort and anything else will return an
- appropriate error code.
-
- The caller should look at this information to decide if it's worth
- retrying the call.
-
- (*) Retry a client call.
-
- int rxrpc_kernel_retry_call(struct socket *sock,
- struct rxrpc_call *call,
- struct sockaddr_rxrpc *srx,
- struct key *key);
-
- This attempts to partially reinitialise a call and submit it again whilst
- reusing the original call's Tx queue to avoid the need to repackage and
- re-encrypt the data to be sent. call indicates the call to retry, srx the
- new address to send it to and key the encryption key to use for signing or
- encrypting the packets.
-
- For this to work, the first Tx data packet must still be in the transmit
- queue, and currently this is only permitted for local and network errors
- and the call must not have been aborted. Any partially constructed Tx
- packet is left as is and can continue being filled afterwards.
-
- It returns 0 if the call was requeued and an error otherwise.
-
(*) Get call RTT.
u64 rxrpc_kernel_get_rtt(struct socket *sock, struct rxrpc_call *call);
@@ -1054,20 +1018,62 @@
(*) Check call still alive.
- u32 rxrpc_kernel_check_life(struct socket *sock,
- struct rxrpc_call *call);
+ bool rxrpc_kernel_check_life(struct socket *sock,
+ struct rxrpc_call *call,
+ u32 *_life);
+ void rxrpc_kernel_probe_life(struct socket *sock,
+ struct rxrpc_call *call);
- This returns a number that is updated when ACKs are received from the peer
- (notably including PING RESPONSE ACKs which we can elicit by sending PING
- ACKs to see if the call still exists on the server). The caller should
- compare the numbers of two calls to see if the call is still alive after
- waiting for a suitable interval.
+ The first function passes back in *_life a number that is updated when
+ ACKs are received from the peer (notably including PING RESPONSE ACKs
+ which we can elicit by sending PING ACKs to see if the call still exists
+ on the server). The caller should compare the numbers of two calls to see
+ if the call is still alive after waiting for a suitable interval. It also
+ returns true as long as the call hasn't yet reached the completed state.
This allows the caller to work out if the server is still contactable and
- if the call is still alive on the server whilst waiting for the server to
+ if the call is still alive on the server while waiting for the server to
process a client operation.
- This function may transmit a PING ACK.
+ The second function causes a ping ACK to be transmitted to try to provoke
+ the peer into responding, which would then cause the value returned by the
+ first function to change. Note that this must be called in TASK_RUNNING
+ state.
+
+ (*) Get reply timestamp.
+
+ bool rxrpc_kernel_get_reply_time(struct socket *sock,
+ struct rxrpc_call *call,
+ ktime_t *_ts)
+
+ This allows the timestamp on the first DATA packet of the reply of a
+ client call to be queried, provided that it is still in the Rx ring. If
+ successful, the timestamp will be stored into *_ts and true will be
+ returned; false will be returned otherwise.
+
+ (*) Get remote client epoch.
+
+ u32 rxrpc_kernel_get_epoch(struct socket *sock,
+ struct rxrpc_call *call)
+
+ This allows the epoch that's contained in packets of an incoming client
+ call to be queried. This value is returned. The function always
+ successful if the call is still in progress. It shouldn't be called once
+ the call has expired. Note that calling this on a local client call only
+ returns the local epoch.
+
+ This value can be used to determine if the remote client has been
+ restarted as it shouldn't change otherwise.
+
+ (*) Set the maxmimum lifespan on a call.
+
+ void rxrpc_kernel_set_max_life(struct socket *sock,
+ struct rxrpc_call *call,
+ unsigned long hard_timeout)
+
+ This sets the maximum lifespan on a call to hard_timeout (which is in
+ jiffies). In the event of the timeout occurring, the call will be
+ aborted and -ETIME or -ETIMEDOUT will be returned.
=======================
@@ -1119,14 +1125,14 @@
(*) connection_expiry
The amount of time in seconds after a connection was last used before we
- remove it from the connection list. Whilst a connection is in existence,
+ remove it from the connection list. While a connection is in existence,
it serves as a placeholder for negotiated security; when it is deleted,
the security must be renegotiated.
(*) transport_expiry
The amount of time in seconds after a transport was last used before we
- remove it from the transport list. Whilst a transport is in existence, it
+ remove it from the transport list. While a transport is in existence, it
serves to anchor the peer data and keeps the connection ID counter.
(*) rxrpc_rx_window_size
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.rst
similarity index 92%
rename from Documentation/networking/scaling.txt
rename to Documentation/networking/scaling.rst
index b7056a8..f78d7bf 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.rst
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
Scaling in the Linux Networking Stack
+=====================================
Introduction
@@ -10,11 +14,11 @@
The following technologies are described:
- RSS: Receive Side Scaling
- RPS: Receive Packet Steering
- RFS: Receive Flow Steering
- Accelerated Receive Flow Steering
- XPS: Transmit Packet Steering
+- RSS: Receive Side Scaling
+- RPS: Receive Packet Steering
+- RFS: Receive Flow Steering
+- Accelerated Receive Flow Steering
+- XPS: Transmit Packet Steering
RSS: Receive Side Scaling
@@ -45,7 +49,9 @@
can be directed to their own receive queue. Such “n-tuple” filters can
be configured from ethtool (--config-ntuple).
-==== RSS Configuration
+
+RSS Configuration
+-----------------
The driver for a multi-queue capable NIC typically provides a kernel
module parameter for specifying the number of hardware queues to
@@ -63,7 +69,9 @@
indirection table could be done to give different queues different
relative weights.
-== RSS IRQ Configuration
+
+RSS IRQ Configuration
+~~~~~~~~~~~~~~~~~~~~~
Each receive queue has a separate IRQ associated with it. The NIC triggers
this to notify a CPU when new packets arrive on the given queue. The
@@ -77,7 +85,9 @@
will be running irqbalance, a daemon that dynamically optimizes IRQ
assignments and as a result may override any manual settings.
-== Suggested Configuration
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
RSS should be enabled when latency is a concern or whenever receive
interrupt processing forms a bottleneck. Spreading load between CPUs
@@ -105,10 +115,12 @@
interrupt handler, RPS selects the CPU to perform protocol processing
above the interrupt handler. This is accomplished by placing the packet
on the desired CPU’s backlog queue and waking up the CPU for processing.
-RPS has some advantages over RSS: 1) it can be used with any NIC,
-2) software filters can easily be added to hash over new protocols,
+RPS has some advantages over RSS:
+
+1) it can be used with any NIC
+2) software filters can easily be added to hash over new protocols
3) it does not increase hardware device interrupt rate (although it does
-introduce inter-processor interrupts (IPIs)).
+ introduce inter-processor interrupts (IPIs))
RPS is called during bottom half of the receive interrupt handler, when
a driver sends a packet up the network stack with netif_rx() or
@@ -135,21 +147,25 @@
processing on the remote CPU, and any queued packets are then processed
up the networking stack.
-==== RPS Configuration
+
+RPS Configuration
+-----------------
RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on
by default for SMP). Even when compiled in, RPS remains disabled until
explicitly configured. The list of CPUs to which RPS may forward traffic
-can be configured for each receive queue using a sysfs file entry:
+can be configured for each receive queue using a sysfs file entry::
- /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
+ /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
This file implements a bitmap of CPUs. RPS is disabled when it is zero
(the default), in which case packets are processed on the interrupting
CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to
the bitmap.
-== Suggested Configuration
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
For a single queue device, a typical RPS configuration would be to set
the rps_cpus to the CPUs in the same memory domain of the interrupting
@@ -163,7 +179,9 @@
RPS might be beneficial if the rps_cpus for each queue are the ones that
share the same memory domain as the interrupting CPU for that queue.
-==== RPS Flow Limit
+
+RPS Flow Limit
+--------------
RPS scales kernel receive processing across CPUs without introducing
reordering. The trade-off to sending all packets from the same flow
@@ -187,29 +205,33 @@
the threshold, so flow limit does not sever connections outright:
even large flows maintain connectivity.
-== Interface
+
+Interface
+~~~~~~~~~
Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not
turned on. It is implemented for each CPU independently (to avoid lock
and cache contention) and toggled per CPU by setting the relevant bit
in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU
-bitmap interface as rps_cpus (see above) when called from procfs:
+bitmap interface as rps_cpus (see above) when called from procfs::
- /proc/sys/net/core/flow_limit_cpu_bitmap
+ /proc/sys/net/core/flow_limit_cpu_bitmap
Per-flow rate is calculated by hashing each packet into a hashtable
bucket and incrementing a per-bucket counter. The hash function is
the same that selects a CPU in RPS, but as the number of buckets can
be much larger than the number of CPUs, flow limit has finer-grained
identification of large flows and fewer false positives. The default
-table has 4096 buckets. This value can be modified through sysctl
+table has 4096 buckets. This value can be modified through sysctl::
- net.core.flow_limit_table_len
+ net.core.flow_limit_table_len
The value is only consulted when a new table is allocated. Modifying
it does not update active tables.
-== Suggested Configuration
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
Flow limit is useful on systems with many concurrent connections,
where a single connection taking up 50% of a CPU indicates a problem.
@@ -280,10 +302,10 @@
the current CPU is updated to match the desired CPU if one of the
following is true:
-- The current CPU's queue head counter >= the recorded tail counter
- value in rps_dev_flow[i]
-- The current CPU is unset (>= nr_cpu_ids)
-- The current CPU is offline
+ - The current CPU's queue head counter >= the recorded tail counter
+ value in rps_dev_flow[i]
+ - The current CPU is unset (>= nr_cpu_ids)
+ - The current CPU is offline
After this check, the packet is sent to the (possibly updated) current
CPU. These rules aim to ensure that a flow only moves to a new CPU when
@@ -291,19 +313,23 @@
packets could arrive later than those about to be processed on the new
CPU.
-==== RFS Configuration
+
+RFS Configuration
+-----------------
RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on
by default for SMP). The functionality remains disabled until explicitly
-configured. The number of entries in the global flow table is set through:
+configured. The number of entries in the global flow table is set through::
- /proc/sys/net/core/rps_sock_flow_entries
+ /proc/sys/net/core/rps_sock_flow_entries
-The number of entries in the per-queue flow table are set through:
+The number of entries in the per-queue flow table are set through::
- /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
+ /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
-== Suggested Configuration
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
Both of these need to be set before RFS is enabled for a receive queue.
Values for both are rounded up to the nearest power of two. The
@@ -347,7 +373,9 @@
to populate the map. For each CPU, the corresponding queue in the map is
set to be one whose processing CPU is closest in cache locality.
-==== Accelerated RFS Configuration
+
+Accelerated RFS Configuration
+-----------------------------
Accelerated RFS is only available if the kernel is compiled with
CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
@@ -356,11 +384,14 @@
configured for each receive queue by the driver, so no additional
configuration should be necessary.
-== Suggested Configuration
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
This technique should be enabled whenever one wants to use RFS and the
NIC supports hardware acceleration.
+
XPS: Transmit Packet Steering
=============================
@@ -430,20 +461,25 @@
for instance, sets the flag when all data for a connection has been
acknowledged.
-==== XPS Configuration
+XPS Configuration
+-----------------
XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
default for SMP). The functionality remains disabled until explicitly
configured. To enable XPS, the bitmap of CPUs/receive-queues that may
use a transmit queue is configured using the sysfs file entry:
-For selection based on CPUs map:
-/sys/class/net/<dev>/queues/tx-<n>/xps_cpus
+For selection based on CPUs map::
-For selection based on receive-queues map:
-/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
+ /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
-== Suggested Configuration
+For selection based on receive-queues map::
+
+ /sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
+
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
For a network device with a single transmission queue, XPS configuration
has no effect, since there is no choice in this case. In a multi-queue
@@ -460,16 +496,18 @@
user configuration for receive-queue map does not apply, then the transmit
queue is selected based on the CPUs map.
-Per TX Queue rate limitation:
-=============================
+
+Per TX Queue rate limitation
+============================
These are rate-limitation mechanisms implemented by HW, where currently
-a max-rate attribute is supported, by setting a Mbps value to
+a max-rate attribute is supported, by setting a Mbps value to::
-/sys/class/net/<dev>/queues/tx-<n>/tx_maxrate
+ /sys/class/net/<dev>/queues/tx-<n>/tx_maxrate
A value of zero means disabled, and this is the default.
+
Further Information
===================
RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
@@ -480,5 +518,6 @@
submitted by Ben Hutchings (bwh@kernel.org)
Authors:
-Tom Herbert (therbert@google.com)
-Willem de Bruijn (willemb@google.com)
+
+- Tom Herbert (therbert@google.com)
+- Willem de Bruijn (willemb@google.com)
diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.rst
similarity index 87%
rename from Documentation/networking/segmentation-offloads.txt
rename to Documentation/networking/segmentation-offloads.rst
index aca542e..085e8fa 100644
--- a/Documentation/networking/segmentation-offloads.txt
+++ b/Documentation/networking/segmentation-offloads.rst
@@ -1,4 +1,9 @@
-Segmentation Offloads in the Linux Networking Stack
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Segmentation Offloads
+=====================
+
Introduction
============
@@ -13,7 +18,8 @@
* Generic Segmentation Offload - GSO
* Generic Receive Offload - GRO
* Partial Generic Segmentation Offload - GSO_PARTIAL
- * SCTP accelleration with GSO - GSO_BY_FRAGS
+ * SCTP acceleration with GSO - GSO_BY_FRAGS
+
TCP Segmentation Offload
========================
@@ -42,6 +48,7 @@
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.
+
UDP Fragmentation Offload
=========================
@@ -54,6 +61,7 @@
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.
+
IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================
@@ -71,17 +79,19 @@
data is normally referred to as the inner headers. Below is the list of
calls to access the given headers:
-IPIP/SIT Tunnel:
- Outer Inner
-MAC skb_mac_header
-Network skb_network_header skb_inner_network_header
-Transport skb_transport_header
+IPIP/SIT Tunnel::
-UDP/GRE Tunnel:
- Outer Inner
-MAC skb_mac_header skb_inner_mac_header
-Network skb_network_header skb_inner_network_header
-Transport skb_transport_header skb_inner_transport_header
+ Outer Inner
+ MAC skb_mac_header
+ Network skb_network_header skb_inner_network_header
+ Transport skb_transport_header
+
+UDP/GRE Tunnel::
+
+ Outer Inner
+ MAC skb_mac_header skb_inner_mac_header
+ Network skb_network_header skb_inner_network_header
+ Transport skb_transport_header skb_inner_transport_header
In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
@@ -93,6 +103,7 @@
headers will be left with a partial checksum and only the outer header
checksum will be computed.
+
Generic Segmentation Offload
============================
@@ -106,6 +117,7 @@
offload is required in GSO. Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.
+
Generic Receive Offload
=======================
@@ -117,6 +129,7 @@
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is when a frame assembled via GRO is segmented via GSO.
+
Partial Generic Segmentation Offload
====================================
@@ -134,7 +147,8 @@
that the IPv4 ID field is incremented in the case that a given header does
not have the DF bit set.
-SCTP accelleration with GSO
+
+SCTP acceleration with GSO
===========================
SCTP - despite the lack of hardware support - can still take advantage of
@@ -157,14 +171,14 @@
There are some helpers to make this easier:
- - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
- an skb is an SCTP GSO skb.
+- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
+ an skb is an SCTP GSO skb.
- - For size checks, the skb_gso_validate_*_len family of helpers correctly
- considers GSO_BY_FRAGS.
+- For size checks, the skb_gso_validate_*_len family of helpers correctly
+ considers GSO_BY_FRAGS.
- - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
- will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
+- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
+ will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst
new file mode 100644
index 0000000..a5e00a1
--- /dev/null
+++ b/Documentation/networking/sfp-phylink.rst
@@ -0,0 +1,272 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+phylink
+=======
+
+Overview
+========
+
+phylink is a mechanism to support hot-pluggable networking modules
+directly connected to a MAC without needing to re-initialise the
+adapter on hot-plug events.
+
+phylink supports conventional phylib-based setups, fixed link setups
+and SFP (Small Formfactor Pluggable) modules at present.
+
+Modes of operation
+==================
+
+phylink has several modes of operation, which depend on the firmware
+settings.
+
+1. PHY mode
+
+ In PHY mode, we use phylib to read the current link settings from
+ the PHY, and pass them to the MAC driver. We expect the MAC driver
+ to configure exactly the modes that are specified without any
+ negotiation being enabled on the link.
+
+2. Fixed mode
+
+ Fixed mode is the same as PHY mode as far as the MAC driver is
+ concerned.
+
+3. In-band mode
+
+ In-band mode is used with 802.3z, SGMII and similar interface modes,
+ and we are expecting to use and honor the in-band negotiation or
+ control word sent across the serdes channel.
+
+By example, what this means is that:
+
+.. code-block:: none
+
+ ð {
+ phy = <&phy>;
+ phy-mode = "sgmii";
+ };
+
+does not use in-band SGMII signalling. The PHY is expected to follow
+exactly the settings given to it in its :c:func:`mac_config` function.
+The link should be forced up or down appropriately in the
+:c:func:`mac_link_up` and :c:func:`mac_link_down` functions.
+
+.. code-block:: none
+
+ ð {
+ managed = "in-band-status";
+ phy = <&phy>;
+ phy-mode = "sgmii";
+ };
+
+uses in-band mode, where results from the PHY's negotiation are passed
+to the MAC through the SGMII control word, and the MAC is expected to
+acknowledge the control word. The :c:func:`mac_link_up` and
+:c:func:`mac_link_down` functions must not force the MAC side link
+up and down.
+
+Rough guide to converting a network driver to sfp/phylink
+=========================================================
+
+This guide briefly describes how to convert a network driver from
+phylib to the sfp/phylink support. Please send patches to improve
+this documentation.
+
+1. Optionally split the network driver's phylib update function into
+ three parts dealing with link-down, link-up and reconfiguring the
+ MAC settings. This can be done as a separate preparation commit.
+
+ An example of this preparation can be found in git commit fc548b991fb0.
+
+2. Replace::
+
+ select FIXED_PHY
+ select PHYLIB
+
+ with::
+
+ select PHYLINK
+
+ in the driver's Kconfig stanza.
+
+3. Add::
+
+ #include <linux/phylink.h>
+
+ to the driver's list of header files.
+
+4. Add::
+
+ struct phylink *phylink;
+ struct phylink_config phylink_config;
+
+ to the driver's private data structure. We shall refer to the
+ driver's private data pointer as ``priv`` below, and the driver's
+ private data structure as ``struct foo_priv``.
+
+5. Replace the following functions:
+
+ .. flat-table::
+ :header-rows: 1
+ :widths: 1 1
+ :stub-columns: 0
+
+ * - Original function
+ - Replacement function
+ * - phy_start(phydev)
+ - phylink_start(priv->phylink)
+ * - phy_stop(phydev)
+ - phylink_stop(priv->phylink)
+ * - phy_mii_ioctl(phydev, ifr, cmd)
+ - phylink_mii_ioctl(priv->phylink, ifr, cmd)
+ * - phy_ethtool_get_wol(phydev, wol)
+ - phylink_ethtool_get_wol(priv->phylink, wol)
+ * - phy_ethtool_set_wol(phydev, wol)
+ - phylink_ethtool_set_wol(priv->phylink, wol)
+ * - phy_disconnect(phydev)
+ - phylink_disconnect_phy(priv->phylink)
+
+ Please note that some of these functions must be called under the
+ rtnl lock, and will warn if not. This will normally be the case,
+ except if these are called from the driver suspend/resume paths.
+
+6. Add/replace ksettings get/set methods with:
+
+ .. code-block:: c
+
+ static int foo_ethtool_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_set(priv->phylink, cmd);
+ }
+
+ static int foo_ethtool_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_get(priv->phylink, cmd);
+ }
+
+7. Replace the call to:
+
+ phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface);
+
+ and associated code with a call to:
+
+ err = phylink_of_phy_connect(priv->phylink, node, flags);
+
+ For the most part, ``flags`` can be zero; these flags are passed to
+ the of_phy_attach() inside this function call if a PHY is specified
+ in the DT node ``node``.
+
+ ``node`` should be the DT node which contains the network phy property,
+ fixed link properties, and will also contain the sfp property.
+
+ The setup of fixed links should also be removed; these are handled
+ internally by phylink.
+
+ of_phy_connect() was also passed a function pointer for link updates.
+ This function is replaced by a different form of MAC updates
+ described below in (8).
+
+ Manipulation of the PHY's supported/advertised happens within phylink
+ based on the validate callback, see below in (8).
+
+ Note that the driver no longer needs to store the ``phy_interface``,
+ and also note that ``phy_interface`` becomes a dynamic property,
+ just like the speed, duplex etc. settings.
+
+ Finally, note that the MAC driver has no direct access to the PHY
+ anymore; that is because in the phylink model, the PHY can be
+ dynamic.
+
+8. Add a :c:type:`struct phylink_mac_ops <phylink_mac_ops>` instance to
+ the driver, which is a table of function pointers, and implement
+ these functions. The old link update function for
+ :c:func:`of_phy_connect` becomes three methods: :c:func:`mac_link_up`,
+ :c:func:`mac_link_down`, and :c:func:`mac_config`. If step 1 was
+ performed, then the functionality will have been split there.
+
+ It is important that if in-band negotiation is used,
+ :c:func:`mac_link_up` and :c:func:`mac_link_down` do not prevent the
+ in-band negotiation from completing, since these functions are called
+ when the in-band link state changes - otherwise the link will never
+ come up.
+
+ The :c:func:`validate` method should mask the supplied supported mask,
+ and ``state->advertising`` with the supported ethtool link modes.
+ These are the new ethtool link modes, so bitmask operations must be
+ used. For an example, see drivers/net/ethernet/marvell/mvneta.c.
+
+ The :c:func:`mac_link_state` method is used to read the link state
+ from the MAC, and report back the settings that the MAC is currently
+ using. This is particularly important for in-band negotiation
+ methods such as 1000base-X and SGMII.
+
+ The :c:func:`mac_config` method is used to update the MAC with the
+ requested state, and must avoid unnecessarily taking the link down
+ when making changes to the MAC configuration. This means the
+ function should modify the state and only take the link down when
+ absolutely necessary to change the MAC configuration. An example
+ of how to do this can be found in :c:func:`mvneta_mac_config` in
+ drivers/net/ethernet/marvell/mvneta.c.
+
+ For further information on these methods, please see the inline
+ documentation in :c:type:`struct phylink_mac_ops <phylink_mac_ops>`.
+
+9. Remove calls to of_parse_phandle() for the PHY,
+ of_phy_register_fixed_link() for fixed links etc. from the probe
+ function, and replace with:
+
+ .. code-block:: c
+
+ struct phylink *phylink;
+ priv->phylink_config.dev = &dev.dev;
+ priv->phylink_config.type = PHYLINK_NETDEV;
+
+ phylink = phylink_create(&priv->phylink_config, node, phy_mode, &phylink_ops);
+ if (IS_ERR(phylink)) {
+ err = PTR_ERR(phylink);
+ fail probe;
+ }
+
+ priv->phylink = phylink;
+
+ and arrange to destroy the phylink in the probe failure path as
+ appropriate and the removal path too by calling:
+
+ .. code-block:: c
+
+ phylink_destroy(priv->phylink);
+
+10. Arrange for MAC link state interrupts to be forwarded into
+ phylink, via:
+
+ .. code-block:: c
+
+ phylink_mac_change(priv->phylink, link_is_up);
+
+ where ``link_is_up`` is true if the link is currently up or false
+ otherwise.
+
+11. Verify that the driver does not call::
+
+ netif_carrier_on()
+ netif_carrier_off()
+
+ as these will interfere with phylink's tracking of the link state,
+ and cause phylink to omit calls via the :c:func:`mac_link_up` and
+ :c:func:`mac_link_down` methods.
+
+Network drivers should call phylink_stop() and phylink_start() via their
+suspend/resume paths, which ensures that the appropriate
+:c:type:`struct phylink_mac_ops <phylink_mac_ops>` methods are called
+as necessary.
+
+For information describing the SFP cage in DT, please see the binding
+documentation in the kernel source tree
+``Documentation/devicetree/bindings/net/sff,sfp.txt``
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
new file mode 100644
index 0000000..38a4edc
--- /dev/null
+++ b/Documentation/networking/snmp_counter.rst
@@ -0,0 +1,1793 @@
+============
+SNMP counter
+============
+
+This document explains the meaning of SNMP counters.
+
+General IPv4 counters
+=====================
+All layer 4 packets and ICMP packets will change these counters, but
+these counters won't be changed by layer 2 packets (such as STP) or
+ARP packets.
+
+* IpInReceives
+
+Defined in `RFC1213 ipInReceives`_
+
+.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26
+
+The number of packets received by the IP layer. It gets increasing at the
+beginning of ip_rcv function, always be updated together with
+IpExtInOctets. It will be increased even if the packet is dropped
+later (e.g. due to the IP header is invalid or the checksum is wrong
+and so on). It indicates the number of aggregated segments after
+GRO/LRO.
+
+* IpInDelivers
+
+Defined in `RFC1213 ipInDelivers`_
+
+.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28
+
+The number of packets delivers to the upper layer protocols. E.g. TCP, UDP,
+ICMP and so on. If no one listens on a raw socket, only kernel
+supported protocols will be delivered, if someone listens on the raw
+socket, all valid IP packets will be delivered.
+
+* IpOutRequests
+
+Defined in `RFC1213 ipOutRequests`_
+
+.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28
+
+The number of packets sent via IP layer, for both single cast and
+multicast packets, and would always be updated together with
+IpExtOutOctets.
+
+* IpExtInOctets and IpExtOutOctets
+
+They are Linux kernel extensions, no RFC definitions. Please note,
+RFC1213 indeed defines ifInOctets and ifOutOctets, but they
+are different things. The ifInOctets and ifOutOctets include the MAC
+layer header size but IpExtInOctets and IpExtOutOctets don't, they
+only include the IP layer header and the IP layer data.
+
+* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts
+
+They indicate the number of four kinds of ECN IP packets, please refer
+`Explicit Congestion Notification`_ for more details.
+
+.. _Explicit Congestion Notification: https://tools.ietf.org/html/rfc3168#page-6
+
+These 4 counters calculate how many packets received per ECN
+status. They count the real frame number regardless the LRO/GRO. So
+for the same packet, you might find that IpInReceives count 1, but
+IpExtInNoECTPkts counts 2 or more.
+
+* IpInHdrErrors
+
+Defined in `RFC1213 ipInHdrErrors`_. It indicates the packet is
+dropped due to the IP header error. It might happen in both IP input
+and IP forward paths.
+
+.. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27
+
+* IpInAddrErrors
+
+Defined in `RFC1213 ipInAddrErrors`_. It will be increased in two
+scenarios: (1) The IP address is invalid. (2) The destination IP
+address is not a local address and IP forwarding is not enabled
+
+.. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27
+
+* IpExtInNoRoutes
+
+This counter means the packet is dropped when the IP stack receives a
+packet and can't find a route for it from the route table. It might
+happen when IP forwarding is enabled and the destination IP address is
+not a local address and there is no route for the destination IP
+address.
+
+* IpInUnknownProtos
+
+Defined in `RFC1213 ipInUnknownProtos`_. It will be increased if the
+layer 4 protocol is unsupported by kernel. If an application is using
+raw socket, kernel will always deliver the packet to the raw socket
+and this counter won't be increased.
+
+.. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27
+
+* IpExtInTruncatedPkts
+
+For IPv4 packet, it means the actual data size is smaller than the
+"Total Length" field in the IPv4 header.
+
+* IpInDiscards
+
+Defined in `RFC1213 ipInDiscards`_. It indicates the packet is dropped
+in the IP receiving path and due to kernel internal reasons (e.g. no
+enough memory).
+
+.. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28
+
+* IpOutDiscards
+
+Defined in `RFC1213 ipOutDiscards`_. It indicates the packet is
+dropped in the IP sending path and due to kernel internal reasons.
+
+.. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28
+
+* IpOutNoRoutes
+
+Defined in `RFC1213 ipOutNoRoutes`_. It indicates the packet is
+dropped in the IP sending path and no route is found for it.
+
+.. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29
+
+ICMP counters
+=============
+* IcmpInMsgs and IcmpOutMsgs
+
+Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_
+
+.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
+.. _RFC1213 icmpOutMsgs: https://tools.ietf.org/html/rfc1213#page-43
+
+As mentioned in the RFC1213, these two counters include errors, they
+would be increased even if the ICMP packet has an invalid type. The
+ICMP output path will check the header of a raw socket, so the
+IcmpOutMsgs would still be updated if the IP header is constructed by
+a userspace program.
+
+* ICMP named types
+
+| These counters include most of common ICMP types, they are:
+| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
+| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
+| IcmpInParmProbs: `RFC1213 icmpInParmProbs`_
+| IcmpInSrcQuenchs: `RFC1213 icmpInSrcQuenchs`_
+| IcmpInRedirects: `RFC1213 icmpInRedirects`_
+| IcmpInEchos: `RFC1213 icmpInEchos`_
+| IcmpInEchoReps: `RFC1213 icmpInEchoReps`_
+| IcmpInTimestamps: `RFC1213 icmpInTimestamps`_
+| IcmpInTimestampReps: `RFC1213 icmpInTimestampReps`_
+| IcmpInAddrMasks: `RFC1213 icmpInAddrMasks`_
+| IcmpInAddrMaskReps: `RFC1213 icmpInAddrMaskReps`_
+| IcmpOutDestUnreachs: `RFC1213 icmpOutDestUnreachs`_
+| IcmpOutTimeExcds: `RFC1213 icmpOutTimeExcds`_
+| IcmpOutParmProbs: `RFC1213 icmpOutParmProbs`_
+| IcmpOutSrcQuenchs: `RFC1213 icmpOutSrcQuenchs`_
+| IcmpOutRedirects: `RFC1213 icmpOutRedirects`_
+| IcmpOutEchos: `RFC1213 icmpOutEchos`_
+| IcmpOutEchoReps: `RFC1213 icmpOutEchoReps`_
+| IcmpOutTimestamps: `RFC1213 icmpOutTimestamps`_
+| IcmpOutTimestampReps: `RFC1213 icmpOutTimestampReps`_
+| IcmpOutAddrMasks: `RFC1213 icmpOutAddrMasks`_
+| IcmpOutAddrMaskReps: `RFC1213 icmpOutAddrMaskReps`_
+
+.. _RFC1213 icmpInDestUnreachs: https://tools.ietf.org/html/rfc1213#page-41
+.. _RFC1213 icmpInTimeExcds: https://tools.ietf.org/html/rfc1213#page-41
+.. _RFC1213 icmpInParmProbs: https://tools.ietf.org/html/rfc1213#page-42
+.. _RFC1213 icmpInSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-42
+.. _RFC1213 icmpInRedirects: https://tools.ietf.org/html/rfc1213#page-42
+.. _RFC1213 icmpInEchos: https://tools.ietf.org/html/rfc1213#page-42
+.. _RFC1213 icmpInEchoReps: https://tools.ietf.org/html/rfc1213#page-42
+.. _RFC1213 icmpInTimestamps: https://tools.ietf.org/html/rfc1213#page-42
+.. _RFC1213 icmpInTimestampReps: https://tools.ietf.org/html/rfc1213#page-43
+.. _RFC1213 icmpInAddrMasks: https://tools.ietf.org/html/rfc1213#page-43
+.. _RFC1213 icmpInAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-43
+
+.. _RFC1213 icmpOutDestUnreachs: https://tools.ietf.org/html/rfc1213#page-44
+.. _RFC1213 icmpOutTimeExcds: https://tools.ietf.org/html/rfc1213#page-44
+.. _RFC1213 icmpOutParmProbs: https://tools.ietf.org/html/rfc1213#page-44
+.. _RFC1213 icmpOutSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-44
+.. _RFC1213 icmpOutRedirects: https://tools.ietf.org/html/rfc1213#page-44
+.. _RFC1213 icmpOutEchos: https://tools.ietf.org/html/rfc1213#page-45
+.. _RFC1213 icmpOutEchoReps: https://tools.ietf.org/html/rfc1213#page-45
+.. _RFC1213 icmpOutTimestamps: https://tools.ietf.org/html/rfc1213#page-45
+.. _RFC1213 icmpOutTimestampReps: https://tools.ietf.org/html/rfc1213#page-45
+.. _RFC1213 icmpOutAddrMasks: https://tools.ietf.org/html/rfc1213#page-45
+.. _RFC1213 icmpOutAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-46
+
+Every ICMP type has two counters: 'In' and 'Out'. E.g., for the ICMP
+Echo packet, they are IcmpInEchos and IcmpOutEchos. Their meanings are
+straightforward. The 'In' counter means kernel receives such a packet
+and the 'Out' counter means kernel sends such a packet.
+
+* ICMP numeric types
+
+They are IcmpMsgInType[N] and IcmpMsgOutType[N], the [N] indicates the
+ICMP type number. These counters track all kinds of ICMP packets. The
+ICMP type number definition could be found in the `ICMP parameters`_
+document.
+
+.. _ICMP parameters: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml
+
+For example, if the Linux kernel sends an ICMP Echo packet, the
+IcmpMsgOutType8 would increase 1. And if kernel gets an ICMP Echo Reply
+packet, IcmpMsgInType0 would increase 1.
+
+* IcmpInCsumErrors
+
+This counter indicates the checksum of the ICMP packet is
+wrong. Kernel verifies the checksum after updating the IcmpInMsgs and
+before updating IcmpMsgInType[N]. If a packet has bad checksum, the
+IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be updated.
+
+* IcmpInErrors and IcmpOutErrors
+
+Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_
+
+.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
+.. _RFC1213 icmpOutErrors: https://tools.ietf.org/html/rfc1213#page-43
+
+When an error occurs in the ICMP packet handler path, these two
+counters would be updated. The receiving packet path use IcmpInErrors
+and the sending packet path use IcmpOutErrors. When IcmpInCsumErrors
+is increased, IcmpInErrors would always be increased too.
+
+relationship of the ICMP counters
+---------------------------------
+The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
+are updated at the same time. The sum of IcmpMsgInType[N] plus
+IcmpInErrors should be equal or larger than IcmpInMsgs. When kernel
+receives an ICMP packet, kernel follows below logic:
+
+1. increase IcmpInMsgs
+2. if has any error, update IcmpInErrors and finish the process
+3. update IcmpMsgOutType[N]
+4. handle the packet depending on the type, if has any error, update
+ IcmpInErrors and finish the process
+
+So if all errors occur in step (2), IcmpInMsgs should be equal to the
+sum of IcmpMsgOutType[N] plus IcmpInErrors. If all errors occur in
+step (4), IcmpInMsgs should be equal to the sum of
+IcmpMsgOutType[N]. If the errors occur in both step (2) and step (4),
+IcmpInMsgs should be less than the sum of IcmpMsgOutType[N] plus
+IcmpInErrors.
+
+General TCP counters
+====================
+* TcpInSegs
+
+Defined in `RFC1213 tcpInSegs`_
+
+.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48
+
+The number of packets received by the TCP layer. As mentioned in
+RFC1213, it includes the packets received in error, such as checksum
+error, invalid TCP header and so on. Only one error won't be included:
+if the layer 2 destination address is not the NIC's layer 2
+address. It might happen if the packet is a multicast or broadcast
+packet, or the NIC is in promiscuous mode. In these situations, the
+packets would be delivered to the TCP layer, but the TCP layer will discard
+these packets before increasing TcpInSegs. The TcpInSegs counter
+isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
+counter would only increase 1.
+
+* TcpOutSegs
+
+Defined in `RFC1213 tcpOutSegs`_
+
+.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48
+
+The number of packets sent by the TCP layer. As mentioned in RFC1213,
+it excludes the retransmitted packets. But it includes the SYN, ACK
+and RST packets. Doesn't like TcpInSegs, the TcpOutSegs is aware of
+GSO, so if a packet would be split to 2 by GSO, TcpOutSegs will
+increase 2.
+
+* TcpActiveOpens
+
+Defined in `RFC1213 tcpActiveOpens`_
+
+.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47
+
+It means the TCP layer sends a SYN, and come into the SYN-SENT
+state. Every time TcpActiveOpens increases 1, TcpOutSegs should always
+increase 1.
+
+* TcpPassiveOpens
+
+Defined in `RFC1213 tcpPassiveOpens`_
+
+.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47
+
+It means the TCP layer receives a SYN, replies a SYN+ACK, come into
+the SYN-RCVD state.
+
+* TcpExtTCPRcvCoalesce
+
+When packets are received by the TCP layer and are not be read by the
+application, the TCP layer will try to merge them. This counter
+indicate how many packets are merged in such situation. If GRO is
+enabled, lots of packets would be merged by GRO, these packets
+wouldn't be counted to TcpExtTCPRcvCoalesce.
+
+* TcpExtTCPAutoCorking
+
+When sending packets, the TCP layer will try to merge small packets to
+a bigger one. This counter increase 1 for every packet merged in such
+situation. Please refer to the LWN article for more details:
+https://lwn.net/Articles/576263/
+
+* TcpExtTCPOrigDataSent
+
+This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
+explaination below::
+
+ TCPOrigDataSent: number of outgoing packets with original data (excluding
+ retransmission but including data-in-SYN). This counter is different from
+ TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
+ more useful to track the TCP retransmission rate.
+
+* TCPSynRetrans
+
+This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
+explaination below::
+
+ TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
+ retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
+
+* TCPFastOpenActiveFail
+
+This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
+explaination below::
+
+ TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
+ the remote does not accept it or the attempts timed out.
+
+.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd
+
+* TcpExtListenOverflows and TcpExtListenDrops
+
+When kernel receives a SYN from a client, and if the TCP accept queue
+is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows.
+At the same time kernel will also add 1 to TcpExtListenDrops. When a
+TCP socket is in LISTEN state, and kernel need to drop a packet,
+kernel would always add 1 to TcpExtListenDrops. So increase
+TcpExtListenOverflows would let TcpExtListenDrops increasing at the
+same time, but TcpExtListenDrops would also increase without
+TcpExtListenOverflows increasing, e.g. a memory allocation fail would
+also let TcpExtListenDrops increase.
+
+Note: The above explanation is based on kernel 4.10 or above version, on
+an old kernel, the TCP stack has different behavior when TCP accept
+queue is full. On the old kernel, TCP stack won't drop the SYN, it
+would complete the 3-way handshake. As the accept queue is full, TCP
+stack will keep the socket in the TCP half-open queue. As it is in the
+half open queue, TCP stack will send SYN+ACK on an exponential backoff
+timer, after client replies ACK, TCP stack checks whether the accept
+queue is still full, if it is not full, moves the socket to the accept
+queue, if it is full, keeps the socket in the half-open queue, at next
+time client replies ACK, this socket will get another chance to move
+to the accept queue.
+
+
+TCP Fast Open
+=============
+* TcpEstabResets
+
+Defined in `RFC1213 tcpEstabResets`_.
+
+.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpAttemptFails
+
+Defined in `RFC1213 tcpAttemptFails`_.
+
+.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpOutRsts
+
+Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
+the 'segments sent containing the RST flag', but in linux kernel, this
+couner indicates the segments kerenl tried to send. The sending
+process might be failed due to some errors (e.g. memory alloc failed).
+
+.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
+
+* TcpExtTCPSpuriousRtxHostQueues
+
+When the TCP stack wants to retransmit a packet, and finds that packet
+is not lost in the network, but the packet is not sent yet, the TCP
+stack would give up the retransmission and update this counter. It
+might happen if a packet stays too long time in a qdisc or driver
+queue.
+
+* TcpEstabResets
+
+The socket receives a RST packet in Establish or CloseWait state.
+
+* TcpExtTCPKeepAlive
+
+This counter indicates many keepalive packets were sent. The keepalive
+won't be enabled by default. A userspace program could enable it by
+setting the SO_KEEPALIVE socket option.
+
+* TcpExtTCPSpuriousRTOs
+
+The spurious retransmission timeout detected by the `F-RTO`_
+algorithm.
+
+.. _F-RTO: https://tools.ietf.org/html/rfc5682
+
+TCP Fast Path
+=============
+When kernel receives a TCP packet, it has two paths to handler the
+packet, one is fast path, another is slow path. The comment in kernel
+code provides a good explanation of them, I pasted them below::
+
+ It is split into a fast path and a slow path. The fast path is
+ disabled when:
+
+ - A zero window was announced from us
+ - zero window probing
+ is only handled properly on the slow path.
+ - Out of order segments arrived.
+ - Urgent data is expected.
+ - There is no buffer space left
+ - Unexpected TCP flags/window values/header lengths are received
+ (detected by checking the TCP header against pred_flags)
+ - Data is sent in both directions. The fast path only supports pure senders
+ or pure receivers (this means either the sequence number or the ack
+ value must stay constant)
+ - Unexpected TCP option.
+
+Kernel will try to use fast path unless any of the above conditions
+are satisfied. If the packets are out of order, kernel will handle
+them in slow path, which means the performance might be not very
+good. Kernel would also come into slow path if the "Delayed ack" is
+used, because when using "Delayed ack", the data is sent in both
+directions. When the TCP window scale option is not used, kernel will
+try to enable fast path immediately when the connection comes into the
+established state, but if the TCP window scale option is used, kernel
+will disable the fast path at first, and try to enable it after kernel
+receives packets.
+
+* TcpExtTCPPureAcks and TcpExtTCPHPAcks
+
+If a packet set ACK flag and has no data, it is a pure ACK packet, if
+kernel handles it in the fast path, TcpExtTCPHPAcks will increase 1,
+if kernel handles it in the slow path, TcpExtTCPPureAcks will
+increase 1.
+
+* TcpExtTCPHPHits
+
+If a TCP packet has data (which means it is not a pure ACK packet),
+and this packet is handled in the fast path, TcpExtTCPHPHits will
+increase 1.
+
+
+TCP abort
+=========
+* TcpExtTCPAbortOnData
+
+It means TCP layer has data in flight, but need to close the
+connection. So TCP layer sends a RST to the other side, indicate the
+connection is not closed very graceful. An easy way to increase this
+counter is using the SO_LINGER option. Please refer to the SO_LINGER
+section of the `socket man page`_:
+
+.. _socket man page: http://man7.org/linux/man-pages/man7/socket.7.html
+
+By default, when an application closes a connection, the close function
+will return immediately and kernel will try to send the in-flight data
+async. If you use the SO_LINGER option, set l_onoff to 1, and l_linger
+to a positive number, the close function won't return immediately, but
+wait for the in-flight data are acked by the other side, the max wait
+time is l_linger seconds. If set l_onoff to 1 and set l_linger to 0,
+when the application closes a connection, kernel will send a RST
+immediately and increase the TcpExtTCPAbortOnData counter.
+
+* TcpExtTCPAbortOnClose
+
+This counter means the application has unread data in the TCP layer when
+the application wants to close the TCP connection. In such a situation,
+kernel will send a RST to the other side of the TCP connection.
+
+* TcpExtTCPAbortOnMemory
+
+When an application closes a TCP connection, kernel still need to track
+the connection, let it complete the TCP disconnect process. E.g. an
+app calls the close method of a socket, kernel sends fin to the other
+side of the connection, then the app has no relationship with the
+socket any more, but kernel need to keep the socket, this socket
+becomes an orphan socket, kernel waits for the reply of the other side,
+and would come to the TIME_WAIT state finally. When kernel has no
+enough memory to keep the orphan socket, kernel would send an RST to
+the other side, and delete the socket, in such situation, kernel will
+increase 1 to the TcpExtTCPAbortOnMemory. Two conditions would trigger
+TcpExtTCPAbortOnMemory:
+
+1. the memory used by the TCP protocol is higher than the third value of
+the tcp_mem. Please refer the tcp_mem section in the `TCP man page`_:
+
+.. _TCP man page: http://man7.org/linux/man-pages/man7/tcp.7.html
+
+2. the orphan socket count is higher than net.ipv4.tcp_max_orphans
+
+
+* TcpExtTCPAbortOnTimeout
+
+This counter will increase when any of the TCP timers expire. In such
+situation, kernel won't send RST, just give up the connection.
+
+* TcpExtTCPAbortOnLinger
+
+When a TCP connection comes into FIN_WAIT_2 state, instead of waiting
+for the fin packet from the other side, kernel could send a RST and
+delete the socket immediately. This is not the default behavior of
+Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
+you could let kernel follow this behavior.
+
+* TcpExtTCPAbortFailed
+
+The kernel TCP layer will send RST if the `RFC2525 2.17 section`_ is
+satisfied. If an internal error occurs during this process,
+TcpExtTCPAbortFailed will be increased.
+
+.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50
+
+TCP Hybrid Slow Start
+=====================
+The Hybrid Slow Start algorithm is an enhancement of the traditional
+TCP congestion window Slow Start algorithm. It uses two pieces of
+information to detect whether the max bandwidth of the TCP path is
+approached. The two pieces of information are ACK train length and
+increase in packet delay. For detail information, please refer the
+`Hybrid Slow Start paper`_. Either ACK train length or packet delay
+hits a specific threshold, the congestion control algorithm will come
+into the Congestion Avoidance state. Until v4.20, two congestion
+control algorithms are using Hybrid Slow Start, they are cubic (the
+default congestion control algorithm) and cdg. Four snmp counters
+relate with the Hybrid Slow Start algorithm.
+
+.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf
+
+* TcpExtTCPHystartTrainDetect
+
+How many times the ACK train length threshold is detected
+
+* TcpExtTCPHystartTrainCwnd
+
+The sum of CWND detected by ACK train length. Dividing this value by
+TcpExtTCPHystartTrainDetect is the average CWND which detected by the
+ACK train length.
+
+* TcpExtTCPHystartDelayDetect
+
+How many times the packet delay threshold is detected.
+
+* TcpExtTCPHystartDelayCwnd
+
+The sum of CWND detected by packet delay. Dividing this value by
+TcpExtTCPHystartDelayDetect is the average CWND which detected by the
+packet delay.
+
+TCP retransmission and congestion control
+=========================================
+The TCP protocol has two retransmission mechanisms: SACK and fast
+recovery. They are exclusive with each other. When SACK is enabled,
+the kernel TCP stack would use SACK, or kernel would use fast
+recovery. The SACK is a TCP option, which is defined in `RFC2018`_,
+the fast recovery is defined in `RFC6582`_, which is also called
+'Reno'.
+
+The TCP congestion control is a big and complex topic. To understand
+the related snmp counter, we need to know the states of the congestion
+control state machine. There are 5 states: Open, Disorder, CWR,
+Recovery and Loss. For details about these states, please refer page 5
+and page 6 of this document:
+https://pdfs.semanticscholar.org/0e9c/968d09ab2e53e24c4dca5b2d67c7f7140f8e.pdf
+
+.. _RFC2018: https://tools.ietf.org/html/rfc2018
+.. _RFC6582: https://tools.ietf.org/html/rfc6582
+
+* TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery
+
+When the congestion control comes into Recovery state, if sack is
+used, TcpExtTCPSackRecovery increases 1, if sack is not used,
+TcpExtTCPRenoRecovery increases 1. These two counters mean the TCP
+stack begins to retransmit the lost packets.
+
+* TcpExtTCPSACKReneging
+
+A packet was acknowledged by SACK, but the receiver has dropped this
+packet, so the sender needs to retransmit this packet. In this
+situation, the sender adds 1 to TcpExtTCPSACKReneging. A receiver
+could drop a packet which has been acknowledged by SACK, although it is
+unusual, it is allowed by the TCP protocol. The sender doesn't really
+know what happened on the receiver side. The sender just waits until
+the RTO expires for this packet, then the sender assumes this packet
+has been dropped by the receiver.
+
+* TcpExtTCPRenoReorder
+
+The reorder packet is detected by fast recovery. It would only be used
+if SACK is disabled. The fast recovery algorithm detects recorder by
+the duplicate ACK number. E.g., if retransmission is triggered, and
+the original retransmitted packet is not lost, it is just out of
+order, the receiver would acknowledge multiple times, one for the
+retransmitted packet, another for the arriving of the original out of
+order packet. Thus the sender would find more ACks than its
+expectation, and the sender knows out of order occurs.
+
+* TcpExtTCPTSReorder
+
+The reorder packet is detected when a hole is filled. E.g., assume the
+sender sends packet 1,2,3,4,5, and the receiving order is
+1,2,4,5,3. When the sender receives the ACK of packet 3 (which will
+fill the hole), two conditions will let TcpExtTCPTSReorder increase
+1: (1) if the packet 3 is not re-retransmitted yet. (2) if the packet
+3 is retransmitted but the timestamp of the packet 3's ACK is earlier
+than the retransmission timestamp.
+
+* TcpExtTCPSACKReorder
+
+The reorder packet detected by SACK. The SACK has two methods to
+detect reorder: (1) DSACK is received by the sender. It means the
+sender sends the same packet more than one times. And the only reason
+is the sender believes an out of order packet is lost so it sends the
+packet again. (2) Assume packet 1,2,3,4,5 are sent by the sender, and
+the sender has received SACKs for packet 2 and 5, now the sender
+receives SACK for packet 4 and the sender doesn't retransmit the
+packet yet, the sender would know packet 4 is out of order. The TCP
+stack of kernel will increase TcpExtTCPSACKReorder for both of the
+above scenarios.
+
+* TcpExtTCPSlowStartRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is 'Loss'.
+
+* TcpExtTCPFastRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is not 'Loss'.
+
+* TcpExtTCPLostRetransmit
+
+A SACK points out that a retransmission packet is lost again.
+
+* TcpExtTCPRetransFail
+
+The TCP stack tries to deliver a retransmission packet to lower layers
+but the lower layers return an error.
+
+* TcpExtTCPSynRetrans
+
+The TCP stack retransmits a SYN packet.
+
+DSACK
+=====
+The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
+duplicate packets to the sender. There are two kinds of
+duplications: (1) a packet which has been acknowledged is
+duplicate. (2) an out of order packet is duplicate. The TCP stack
+counts these two kinds of duplications on both receiver side and
+sender side.
+
+.. _RFC2883 : https://tools.ietf.org/html/rfc2883
+
+* TcpExtTCPDSACKOldSent
+
+The TCP stack receives a duplicate packet which has been acked, so it
+sends a DSACK to the sender.
+
+* TcpExtTCPDSACKOfoSent
+
+The TCP stack receives an out of order duplicate packet, so it sends a
+DSACK to the sender.
+
+* TcpExtTCPDSACKRecv
+
+The TCP stack receives a DSACK, which indicates an acknowledged
+duplicate packet is received.
+
+* TcpExtTCPDSACKOfoRecv
+
+The TCP stack receives a DSACK, which indicate an out of order
+duplicate packet is received.
+
+invalid SACK and DSACK
+======================
+When a SACK (or DSACK) block is invalid, a corresponding counter would
+be updated. The validation method is base on the start/end sequence
+number of the SACK block. For more details, please refer the comment
+of the function tcp_is_sackblock_valid in the kernel source code. A
+SACK option could have up to 4 blocks, they are checked
+individually. E.g., if 3 blocks of a SACk is invalid, the
+corresponding counter would be updated 3 times. The comment of the
+`Add counters for discarded SACK blocks`_ patch has additional
+explaination:
+
+.. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32
+
+* TcpExtTCPSACKDiscard
+
+This counter indicates how many SACK blocks are invalid. If the invalid
+SACK block is caused by ACK recording, the TCP stack will only ignore
+it and won't update this counter.
+
+* TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo
+
+When a DSACK block is invalid, one of these two counters would be
+updated. Which counter will be updated depends on the undo_marker flag
+of the TCP socket. If the undo_marker is not set, the TCP stack isn't
+likely to re-transmit any packets, and we still receive an invalid
+DSACK block, the reason might be that the packet is duplicated in the
+middle of the network. In such scenario, TcpExtTCPDSACKIgnoredNoUndo
+will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld
+will be updated. As implied in its name, it might be an old packet.
+
+SACK shift
+==========
+The linux networking stack stores data in sk_buff struct (skb for
+short). If a SACK block acrosses multiple skb, the TCP stack will try
+to re-arrange data in these skb. E.g. if a SACK block acknowledges seq
+10 to 15, skb1 has seq 10 to 13, skb2 has seq 14 to 20. The seq 14 and
+15 in skb2 would be moved to skb1. This operation is 'shift'. If a
+SACK block acknowledges seq 10 to 20, skb1 has seq 10 to 13, skb2 has
+seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be
+discard, this operation is 'merge'.
+
+* TcpExtTCPSackShifted
+
+A skb is shifted
+
+* TcpExtTCPSackMerged
+
+A skb is merged
+
+* TcpExtTCPSackShiftFallback
+
+A skb should be shifted or merged, but the TCP stack doesn't do it for
+some reasons.
+
+TCP out of order
+================
+* TcpExtTCPOFOQueue
+
+The TCP layer receives an out of order packet and has enough memory
+to queue it.
+
+* TcpExtTCPOFODrop
+
+The TCP layer receives an out of order packet but doesn't have enough
+memory, so drops it. Such packets won't be counted into
+TcpExtTCPOFOQueue.
+
+* TcpExtTCPOFOMerge
+
+The received out of order packet has an overlay with the previous
+packet. the overlay part will be dropped. All of TcpExtTCPOFOMerge
+packets will also be counted into TcpExtTCPOFOQueue.
+
+TCP PAWS
+========
+PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
+which is used to drop old packets. It depends on the TCP
+timestamps. For detail information, please refer the `timestamp wiki`_
+and the `RFC of PAWS`_.
+
+.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
+.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps
+
+* TcpExtPAWSActive
+
+Packets are dropped by PAWS in Syn-Sent status.
+
+* TcpExtPAWSEstab
+
+Packets are dropped by PAWS in any status other than Syn-Sent.
+
+TCP ACK skip
+============
+In some scenarios, kernel would avoid sending duplicate ACKs too
+frequently. Please find more details in the tcp_invalid_ratelimit
+section of the `sysctl document`_. When kernel decides to skip an ACK
+due to tcp_invalid_ratelimit, kernel would update one of below
+counters to indicate the ACK is skipped in which scenario. The ACK
+would only be skipped if the received packet is either a SYN packet or
+it has no data.
+
+.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
+
+* TcpExtTCPACKSkippedSynRecv
+
+The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
+TCP stack receives a SYN and replies SYN+ACK. Now the TCP stack is
+waiting for an ACK. Generally, the TCP stack doesn't need to send ACK
+in the Syn-Recv status. But in several scenarios, the TCP stack need
+to send an ACK. E.g., the TCP stack receives the same SYN packet
+repeately, the received packet does not pass the PAWS check, or the
+received packet sequence number is out of window. In these scenarios,
+the TCP stack needs to send ACK. If the ACk sending frequency is higher than
+tcp_invalid_ratelimit allows, the TCP stack will skip sending ACK and
+increase TcpExtTCPACKSkippedSynRecv.
+
+
+* TcpExtTCPACKSkippedPAWS
+
+The ACK is skipped due to PAWS (Protect Against Wrapped Sequence
+numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2
+or Time-Wait statuses, the skipped ACK would be counted to
+TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
+TcpExtTCPACKSkippedTimeWait. In all other statuses, the skipped ACK
+would be counted to TcpExtTCPACKSkippedPAWS.
+
+* TcpExtTCPACKSkippedSeq
+
+The sequence number is out of window and the timestamp passes the PAWS
+check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait.
+
+* TcpExtTCPACKSkippedFinWait2
+
+The ACK is skipped in Fin-Wait-2 status, the reason would be either
+PAWS check fails or the received sequence number is out of window.
+
+* TcpExtTCPACKSkippedTimeWait
+
+Tha ACK is skipped in Time-Wait status, the reason would be either
+PAWS check failed or the received sequence number is out of window.
+
+* TcpExtTCPACKSkippedChallenge
+
+The ACK is skipped if the ACK is a challenge ACK. The RFC 5961 defines
+3 kind of challenge ACK, please refer `RFC 5961 section 3.2`_,
+`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
+three scenarios, In some TCP status, the linux TCP stack would also
+send challenge ACKs if the ACK number is before the first
+unacknowledged number (more strict than `RFC 5961 section 5.2`_).
+
+.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
+.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
+.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
+
+TCP receive window
+==================
+* TcpExtTCPWantZeroWindowAdv
+
+Depending on current memory usage, the TCP stack tries to set receive
+window to zero. But the receive window might still be a no-zero
+value. For example, if the previous window size is 10, and the TCP
+stack receives 3 bytes, the current window size would be 7 even if the
+window size calculated by the memory usage is zero.
+
+* TcpExtTCPToZeroWindowAdv
+
+The TCP receive window is set to zero from a no-zero value.
+
+* TcpExtTCPFromZeroWindowAdv
+
+The TCP receive window is set to no-zero value from zero.
+
+
+Delayed ACK
+===========
+The TCP Delayed ACK is a technique which is used for reducing the
+packet count in the network. For more details, please refer the
+`Delayed ACK wiki`_
+
+.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
+
+* TcpExtDelayedACKs
+
+A delayed ACK timer expires. The TCP stack will send a pure ACK packet
+and exit the delayed ACK mode.
+
+* TcpExtDelayedACKLocked
+
+A delayed ACK timer expires, but the TCP stack can't send an ACK
+immediately due to the socket is locked by a userspace program. The
+TCP stack will send a pure ACK later (after the userspace program
+unlock the socket). When the TCP stack sends the pure ACK later, the
+TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
+mode.
+
+* TcpExtDelayedACKLost
+
+It will be updated when the TCP stack receives a packet which has been
+ACKed. A Delayed ACK loss might cause this issue, but it would also be
+triggered by other reasons, such as a packet is duplicated in the
+network.
+
+Tail Loss Probe (TLP)
+=====================
+TLP is an algorithm which is used to detect TCP packet loss. For more
+details, please refer the `TLP paper`_.
+
+.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
+
+* TcpExtTCPLossProbes
+
+A TLP probe packet is sent.
+
+* TcpExtTCPLossProbeRecovery
+
+A packet loss is detected and recovered by TLP.
+
+TCP Fast Open
+=============
+TCP Fast Open is a technology which allows data transfer before the
+3-way handshake complete. Please refer the `TCP Fast Open wiki`_ for a
+general description.
+
+.. _TCP Fast Open wiki: https://en.wikipedia.org/wiki/TCP_Fast_Open
+
+* TcpExtTCPFastOpenActive
+
+When the TCP stack receives an ACK packet in the SYN-SENT status, and
+the ACK packet acknowledges the data in the SYN packet, the TCP stack
+understand the TFO cookie is accepted by the other side, then it
+updates this counter.
+
+* TcpExtTCPFastOpenActiveFail
+
+This counter indicates that the TCP stack initiated a TCP Fast Open,
+but it failed. This counter would be updated in three scenarios: (1)
+the other side doesn't acknowledge the data in the SYN packet. (2) The
+SYN packet which has the TFO cookie is timeout at least once. (3)
+after the 3-way handshake, the retransmission timeout happens
+net.ipv4.tcp_retries1 times, because some middle-boxes may black-hole
+fast open after the handshake.
+
+* TcpExtTCPFastOpenPassive
+
+This counter indicates how many times the TCP stack accepts the fast
+open request.
+
+* TcpExtTCPFastOpenPassiveFail
+
+This counter indicates how many times the TCP stack rejects the fast
+open request. It is caused by either the TFO cookie is invalid or the
+TCP stack finds an error during the socket creating process.
+
+* TcpExtTCPFastOpenListenOverflow
+
+When the pending fast open request number is larger than
+fastopenq->max_qlen, the TCP stack will reject the fast open request
+and update this counter. When this counter is updated, the TCP stack
+won't update TcpExtTCPFastOpenPassive or
+TcpExtTCPFastOpenPassiveFail. The fastopenq->max_qlen is set by the
+TCP_FASTOPEN socket operation and it could not be larger than
+net.core.somaxconn. For example:
+
+setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
+
+* TcpExtTCPFastOpenCookieReqd
+
+This counter indicates how many times a client wants to request a TFO
+cookie.
+
+SYN cookies
+===========
+SYN cookies are used to mitigate SYN flood, for details, please refer
+the `SYN cookies wiki`_.
+
+.. _SYN cookies wiki: https://en.wikipedia.org/wiki/SYN_cookies
+
+* TcpExtSyncookiesSent
+
+It indicates how many SYN cookies are sent.
+
+* TcpExtSyncookiesRecv
+
+How many reply packets of the SYN cookies the TCP stack receives.
+
+* TcpExtSyncookiesFailed
+
+The MSS decoded from the SYN cookie is invalid. When this counter is
+updated, the received packet won't be treated as a SYN cookie and the
+TcpExtSyncookiesRecv counter wont be updated.
+
+Challenge ACK
+=============
+For details of challenge ACK, please refer the explaination of
+TcpExtTCPACKSkippedChallenge.
+
+* TcpExtTCPChallengeACK
+
+The number of challenge acks sent.
+
+* TcpExtTCPSYNChallenge
+
+The number of challenge acks sent in response to SYN packets. After
+updates this counter, the TCP stack might send a challenge ACK and
+update the TcpExtTCPChallengeACK counter, or it might also skip to
+send the challenge and update the TcpExtTCPACKSkippedChallenge.
+
+prune
+=====
+When a socket is under memory pressure, the TCP stack will try to
+reclaim memory from the receiving queue and out of order queue. One of
+the reclaiming method is 'collapse', which means allocate a big sbk,
+copy the contiguous skbs to the single big skb, and free these
+contiguous skbs.
+
+* TcpExtPruneCalled
+
+The TCP stack tries to reclaim memory for a socket. After updates this
+counter, the TCP stack will try to collapse the out of order queue and
+the receiving queue. If the memory is still not enough, the TCP stack
+will try to discard packets from the out of order queue (and update the
+TcpExtOfoPruned counter)
+
+* TcpExtOfoPruned
+
+The TCP stack tries to discard packet on the out of order queue.
+
+* TcpExtRcvPruned
+
+After 'collapse' and discard packets from the out of order queue, if
+the actually used memory is still larger than the max allowed memory,
+this counter will be updated. It means the 'prune' fails.
+
+* TcpExtTCPRcvCollapsed
+
+This counter indicates how many skbs are freed during 'collapse'.
+
+examples
+========
+
+ping test
+---------
+Run the ping command against the public dns server 8.8.8.8::
+
+ nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
+ PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
+ 64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms
+
+ --- 8.8.8.8 ping statistics ---
+ 1 packets transmitted, 1 received, 0% packet loss, time 0ms
+ rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms
+
+The nstayt result::
+
+ nstatuser@nstat-a:~$ nstat
+ #kernel
+ IpInReceives 1 0.0
+ IpInDelivers 1 0.0
+ IpOutRequests 1 0.0
+ IcmpInMsgs 1 0.0
+ IcmpInEchoReps 1 0.0
+ IcmpOutMsgs 1 0.0
+ IcmpOutEchos 1 0.0
+ IcmpMsgInType0 1 0.0
+ IcmpMsgOutType8 1 0.0
+ IpExtInOctets 84 0.0
+ IpExtOutOctets 84 0.0
+ IpExtInNoECTPkts 1 0.0
+
+The Linux server sent an ICMP Echo packet, so IpOutRequests,
+IcmpOutMsgs, IcmpOutEchos and IcmpMsgOutType8 were increased 1. The
+server got ICMP Echo Reply from 8.8.8.8, so IpInReceives, IcmpInMsgs,
+IcmpInEchoReps and IcmpMsgInType0 were increased 1. The ICMP Echo Reply
+was passed to the ICMP layer via IP layer, so IpInDelivers was
+increased 1. The default ping data size is 48, so an ICMP Echo packet
+and its corresponding Echo Reply packet are constructed by:
+
+* 14 bytes MAC header
+* 20 bytes IP header
+* 16 bytes ICMP header
+* 48 bytes data (default value of the ping command)
+
+So the IpExtInOctets and IpExtOutOctets are 20+16+48=84.
+
+tcp 3-way handshake
+-------------------
+On server side, we run::
+
+ nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
+ Listening on [0.0.0.0] (family 0, port 9000)
+
+On client side, we run::
+
+ nstatuser@nstat-a:~$ nc -nv 192.168.122.251 9000
+ Connection to 192.168.122.251 9000 port [tcp/*] succeeded!
+
+The server listened on tcp 9000 port, the client connected to it, they
+completed the 3-way handshake.
+
+On server side, we can find below nstat output::
+
+ nstatuser@nstat-b:~$ nstat | grep -i tcp
+ TcpPassiveOpens 1 0.0
+ TcpInSegs 2 0.0
+ TcpOutSegs 1 0.0
+ TcpExtTCPPureAcks 1 0.0
+
+On client side, we can find below nstat output::
+
+ nstatuser@nstat-a:~$ nstat | grep -i tcp
+ TcpActiveOpens 1 0.0
+ TcpInSegs 1 0.0
+ TcpOutSegs 2 0.0
+
+When the server received the first SYN, it replied a SYN+ACK, and came into
+SYN-RCVD state, so TcpPassiveOpens increased 1. The server received
+SYN, sent SYN+ACK, received ACK, so server sent 1 packet, received 2
+packets, TcpInSegs increased 2, TcpOutSegs increased 1. The last ACK
+of the 3-way handshake is a pure ACK without data, so
+TcpExtTCPPureAcks increased 1.
+
+When the client sent SYN, the client came into the SYN-SENT state, so
+TcpActiveOpens increased 1, the client sent SYN, received SYN+ACK, sent
+ACK, so client sent 2 packets, received 1 packet, TcpInSegs increased
+1, TcpOutSegs increased 2.
+
+TCP normal traffic
+------------------
+Run nc on server::
+
+ nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
+ Listening on [0.0.0.0] (family 0, port 9000)
+
+Run nc on client::
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9000
+ Connection to nstat-b 9000 port [tcp/*] succeeded!
+
+Input a string in the nc client ('hello' in our example)::
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9000
+ Connection to nstat-b 9000 port [tcp/*] succeeded!
+ hello
+
+The client side nstat output::
+
+ nstatuser@nstat-a:~$ nstat
+ #kernel
+ IpInReceives 1 0.0
+ IpInDelivers 1 0.0
+ IpOutRequests 1 0.0
+ TcpInSegs 1 0.0
+ TcpOutSegs 1 0.0
+ TcpExtTCPPureAcks 1 0.0
+ TcpExtTCPOrigDataSent 1 0.0
+ IpExtInOctets 52 0.0
+ IpExtOutOctets 58 0.0
+ IpExtInNoECTPkts 1 0.0
+
+The server side nstat output::
+
+ nstatuser@nstat-b:~$ nstat
+ #kernel
+ IpInReceives 1 0.0
+ IpInDelivers 1 0.0
+ IpOutRequests 1 0.0
+ TcpInSegs 1 0.0
+ TcpOutSegs 1 0.0
+ IpExtInOctets 58 0.0
+ IpExtOutOctets 52 0.0
+ IpExtInNoECTPkts 1 0.0
+
+Input a string in nc client side again ('world' in our exmaple)::
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9000
+ Connection to nstat-b 9000 port [tcp/*] succeeded!
+ hello
+ world
+
+Client side nstat output::
+
+ nstatuser@nstat-a:~$ nstat
+ #kernel
+ IpInReceives 1 0.0
+ IpInDelivers 1 0.0
+ IpOutRequests 1 0.0
+ TcpInSegs 1 0.0
+ TcpOutSegs 1 0.0
+ TcpExtTCPHPAcks 1 0.0
+ TcpExtTCPOrigDataSent 1 0.0
+ IpExtInOctets 52 0.0
+ IpExtOutOctets 58 0.0
+ IpExtInNoECTPkts 1 0.0
+
+
+Server side nstat output::
+
+ nstatuser@nstat-b:~$ nstat
+ #kernel
+ IpInReceives 1 0.0
+ IpInDelivers 1 0.0
+ IpOutRequests 1 0.0
+ TcpInSegs 1 0.0
+ TcpOutSegs 1 0.0
+ TcpExtTCPHPHits 1 0.0
+ IpExtInOctets 58 0.0
+ IpExtOutOctets 52 0.0
+ IpExtInNoECTPkts 1 0.0
+
+Compare the first client-side nstat and the second client-side nstat,
+we could find one difference: the first one had a 'TcpExtTCPPureAcks',
+but the second one had a 'TcpExtTCPHPAcks'. The first server-side
+nstat and the second server-side nstat had a difference too: the
+second server-side nstat had a TcpExtTCPHPHits, but the first
+server-side nstat didn't have it. The network traffic patterns were
+exactly the same: the client sent a packet to the server, the server
+replied an ACK. But kernel handled them in different ways. When the
+TCP window scale option is not used, kernel will try to enable fast
+path immediately when the connection comes into the established state,
+but if the TCP window scale option is used, kernel will disable the
+fast path at first, and try to enable it after kerenl receives
+packets. We could use the 'ss' command to verify whether the window
+scale option is used. e.g. run below command on either server or
+client::
+
+ nstatuser@nstat-a:~$ ss -o state established -i '( dport = :9000 or sport = :9000 )
+ Netid Recv-Q Send-Q Local Address:Port Peer Address:Port
+ tcp 0 0 192.168.122.250:40654 192.168.122.251:9000
+ ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:1 send 118.2Mbps lastsnd:46572 lastrcv:46572 lastack:46572 pacing_rate 236.4Mbps rcv_space:29200 rcv_ssthresh:29200 minrtt:0.98
+
+The 'wscale:7,7' means both server and client set the window scale
+option to 7. Now we could explain the nstat output in our test:
+
+In the first nstat output of client side, the client sent a packet, server
+reply an ACK, when kernel handled this ACK, the fast path was not
+enabled, so the ACK was counted into 'TcpExtTCPPureAcks'.
+
+In the second nstat output of client side, the client sent a packet again,
+and received another ACK from the server, in this time, the fast path is
+enabled, and the ACK was qualified for fast path, so it was handled by
+the fast path, so this ACK was counted into TcpExtTCPHPAcks.
+
+In the first nstat output of server side, fast path was not enabled,
+so there was no 'TcpExtTCPHPHits'.
+
+In the second nstat output of server side, the fast path was enabled,
+and the packet received from client qualified for fast path, so it
+was counted into 'TcpExtTCPHPHits'.
+
+TcpExtTCPAbortOnClose
+---------------------
+On the server side, we run below python script::
+
+ import socket
+ import time
+
+ port = 9000
+
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.bind(('0.0.0.0', port))
+ s.listen(1)
+ sock, addr = s.accept()
+ while True:
+ time.sleep(9999999)
+
+This python script listen on 9000 port, but doesn't read anything from
+the connection.
+
+On the client side, we send the string "hello" by nc::
+
+ nstatuser@nstat-a:~$ echo "hello" | nc nstat-b 9000
+
+Then, we come back to the server side, the server has received the "hello"
+packet, and the TCP layer has acked this packet, but the application didn't
+read it yet. We type Ctrl-C to terminate the server script. Then we
+could find TcpExtTCPAbortOnClose increased 1 on the server side::
+
+ nstatuser@nstat-b:~$ nstat | grep -i abort
+ TcpExtTCPAbortOnClose 1 0.0
+
+If we run tcpdump on the server side, we could find the server sent a
+RST after we type Ctrl-C.
+
+TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
+---------------------------------------------------
+Below is an example which let the orphan socket count be higher than
+net.ipv4.tcp_max_orphans.
+Change tcp_max_orphans to a smaller value on client::
+
+ sudo bash -c "echo 10 > /proc/sys/net/ipv4/tcp_max_orphans"
+
+Client code (create 64 connection to server)::
+
+ nstatuser@nstat-a:~$ cat client_orphan.py
+ import socket
+ import time
+
+ server = 'nstat-b' # server address
+ port = 9000
+
+ count = 64
+
+ connection_list = []
+
+ for i in range(64):
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.connect((server, port))
+ connection_list.append(s)
+ print("connection_count: %d" % len(connection_list))
+
+ while True:
+ time.sleep(99999)
+
+Server code (accept 64 connection from client)::
+
+ nstatuser@nstat-b:~$ cat server_orphan.py
+ import socket
+ import time
+
+ port = 9000
+ count = 64
+
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.bind(('0.0.0.0', port))
+ s.listen(count)
+ connection_list = []
+ while True:
+ sock, addr = s.accept()
+ connection_list.append((sock, addr))
+ print("connection_count: %d" % len(connection_list))
+
+Run the python scripts on server and client.
+
+On server::
+
+ python3 server_orphan.py
+
+On client::
+
+ python3 client_orphan.py
+
+Run iptables on server::
+
+ sudo iptables -A INPUT -i ens3 -p tcp --destination-port 9000 -j DROP
+
+Type Ctrl-C on client, stop client_orphan.py.
+
+Check TcpExtTCPAbortOnMemory on client::
+
+ nstatuser@nstat-a:~$ nstat | grep -i abort
+ TcpExtTCPAbortOnMemory 54 0.0
+
+Check orphane socket count on client::
+
+ nstatuser@nstat-a:~$ ss -s
+ Total: 131 (kernel 0)
+ TCP: 14 (estab 1, closed 0, orphaned 10, synrecv 0, timewait 0/0), ports 0
+
+ Transport Total IP IPv6
+ * 0 - -
+ RAW 1 0 1
+ UDP 1 1 0
+ TCP 14 13 1
+ INET 16 14 2
+ FRAG 0 0 0
+
+The explanation of the test: after run server_orphan.py and
+client_orphan.py, we set up 64 connections between server and
+client. Run the iptables command, the server will drop all packets from
+the client, type Ctrl-C on client_orphan.py, the system of the client
+would try to close these connections, and before they are closed
+gracefully, these connections became orphan sockets. As the iptables
+of the server blocked packets from the client, the server won't receive fin
+from the client, so all connection on clients would be stuck on FIN_WAIT_1
+stage, so they will keep as orphan sockets until timeout. We have echo
+10 to /proc/sys/net/ipv4/tcp_max_orphans, so the client system would
+only keep 10 orphan sockets, for all other orphan sockets, the client
+system sent RST for them and delete them. We have 64 connections, so
+the 'ss -s' command shows the system has 10 orphan sockets, and the
+value of TcpExtTCPAbortOnMemory was 54.
+
+An additional explanation about orphan socket count: You could find the
+exactly orphan socket count by the 'ss -s' command, but when kernel
+decide whither increases TcpExtTCPAbortOnMemory and sends RST, kernel
+doesn't always check the exactly orphan socket count. For increasing
+performance, kernel checks an approximate count firstly, if the
+approximate count is more than tcp_max_orphans, kernel checks the
+exact count again. So if the approximate count is less than
+tcp_max_orphans, but exactly count is more than tcp_max_orphans, you
+would find TcpExtTCPAbortOnMemory is not increased at all. If
+tcp_max_orphans is large enough, it won't occur, but if you decrease
+tcp_max_orphans to a small value like our test, you might find this
+issue. So in our test, the client set up 64 connections although the
+tcp_max_orphans is 10. If the client only set up 11 connections, we
+can't find the change of TcpExtTCPAbortOnMemory.
+
+Continue the previous test, we wait for several minutes. Because of the
+iptables on the server blocked the traffic, the server wouldn't receive
+fin, and all the client's orphan sockets would timeout on the
+FIN_WAIT_1 state finally. So we wait for a few minutes, we could find
+10 timeout on the client::
+
+ nstatuser@nstat-a:~$ nstat | grep -i abort
+ TcpExtTCPAbortOnTimeout 10 0.0
+
+TcpExtTCPAbortOnLinger
+----------------------
+The server side code::
+
+ nstatuser@nstat-b:~$ cat server_linger.py
+ import socket
+ import time
+
+ port = 9000
+
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.bind(('0.0.0.0', port))
+ s.listen(1)
+ sock, addr = s.accept()
+ while True:
+ time.sleep(9999999)
+
+The client side code::
+
+ nstatuser@nstat-a:~$ cat client_linger.py
+ import socket
+ import struct
+
+ server = 'nstat-b' # server address
+ port = 9000
+
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
+ s.setsockopt(socket.SOL_TCP, socket.TCP_LINGER2, struct.pack('i', -1))
+ s.connect((server, port))
+ s.close()
+
+Run server_linger.py on server::
+
+ nstatuser@nstat-b:~$ python3 server_linger.py
+
+Run client_linger.py on client::
+
+ nstatuser@nstat-a:~$ python3 client_linger.py
+
+After run client_linger.py, check the output of nstat::
+
+ nstatuser@nstat-a:~$ nstat | grep -i abort
+ TcpExtTCPAbortOnLinger 1 0.0
+
+TcpExtTCPRcvCoalesce
+--------------------
+On the server, we run a program which listen on TCP port 9000, but
+doesn't read any data::
+
+ import socket
+ import time
+ port = 9000
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.bind(('0.0.0.0', port))
+ s.listen(1)
+ sock, addr = s.accept()
+ while True:
+ time.sleep(9999999)
+
+Save the above code as server_coalesce.py, and run::
+
+ python3 server_coalesce.py
+
+On the client, save below code as client_coalesce.py::
+
+ import socket
+ server = 'nstat-b'
+ port = 9000
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.connect((server, port))
+
+Run::
+
+ nstatuser@nstat-a:~$ python3 -i client_coalesce.py
+
+We use '-i' to come into the interactive mode, then a packet::
+
+ >>> s.send(b'foo')
+ 3
+
+Send a packet again::
+
+ >>> s.send(b'bar')
+ 3
+
+On the server, run nstat::
+
+ ubuntu@nstat-b:~$ nstat
+ #kernel
+ IpInReceives 2 0.0
+ IpInDelivers 2 0.0
+ IpOutRequests 2 0.0
+ TcpInSegs 2 0.0
+ TcpOutSegs 2 0.0
+ TcpExtTCPRcvCoalesce 1 0.0
+ IpExtInOctets 110 0.0
+ IpExtOutOctets 104 0.0
+ IpExtInNoECTPkts 2 0.0
+
+The client sent two packets, server didn't read any data. When
+the second packet arrived at server, the first packet was still in
+the receiving queue. So the TCP layer merged the two packets, and we
+could find the TcpExtTCPRcvCoalesce increased 1.
+
+TcpExtListenOverflows and TcpExtListenDrops
+-------------------------------------------
+On server, run the nc command, listen on port 9000::
+
+ nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
+ Listening on [0.0.0.0] (family 0, port 9000)
+
+On client, run 3 nc commands in different terminals::
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9000
+ Connection to nstat-b 9000 port [tcp/*] succeeded!
+
+The nc command only accepts 1 connection, and the accept queue length
+is 1. On current linux implementation, set queue length to n means the
+actual queue length is n+1. Now we create 3 connections, 1 is accepted
+by nc, 2 in accepted queue, so the accept queue is full.
+
+Before running the 4th nc, we clean the nstat history on the server::
+
+ nstatuser@nstat-b:~$ nstat -n
+
+Run the 4th nc on the client::
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9000
+
+If the nc server is running on kernel 4.10 or higher version, you
+won't see the "Connection to ... succeeded!" string, because kernel
+will drop the SYN if the accept queue is full. If the nc client is running
+on an old kernel, you would see that the connection is succeeded,
+because kernel would complete the 3 way handshake and keep the socket
+on half open queue. I did the test on kernel 4.15. Below is the nstat
+on the server::
+
+ nstatuser@nstat-b:~$ nstat
+ #kernel
+ IpInReceives 4 0.0
+ IpInDelivers 4 0.0
+ TcpInSegs 4 0.0
+ TcpExtListenOverflows 4 0.0
+ TcpExtListenDrops 4 0.0
+ IpExtInOctets 240 0.0
+ IpExtInNoECTPkts 4 0.0
+
+Both TcpExtListenOverflows and TcpExtListenDrops were 4. If the time
+between the 4th nc and the nstat was longer, the value of
+TcpExtListenOverflows and TcpExtListenDrops would be larger, because
+the SYN of the 4th nc was dropped, the client was retrying.
+
+IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes
+-------------------------------------------------
+server A IP address: 192.168.122.250
+server B IP address: 192.168.122.251
+Prepare on server A, add a route to server B::
+
+ $ sudo ip route add 8.8.8.8/32 via 192.168.122.251
+
+Prepare on server B, disable send_redirects for all interfaces::
+
+ $ sudo sysctl -w net.ipv4.conf.all.send_redirects=0
+ $ sudo sysctl -w net.ipv4.conf.ens3.send_redirects=0
+ $ sudo sysctl -w net.ipv4.conf.lo.send_redirects=0
+ $ sudo sysctl -w net.ipv4.conf.default.send_redirects=0
+
+We want to let sever A send a packet to 8.8.8.8, and route the packet
+to server B. When server B receives such packet, it might send a ICMP
+Redirect message to server A, set send_redirects to 0 will disable
+this behavior.
+
+First, generate InAddrErrors. On server B, we disable IP forwarding::
+
+ $ sudo sysctl -w net.ipv4.conf.all.forwarding=0
+
+On server A, we send packets to 8.8.8.8::
+
+ $ nc -v 8.8.8.8 53
+
+On server B, we check the output of nstat::
+
+ $ nstat
+ #kernel
+ IpInReceives 3 0.0
+ IpInAddrErrors 3 0.0
+ IpExtInOctets 180 0.0
+ IpExtInNoECTPkts 3 0.0
+
+As we have let server A route 8.8.8.8 to server B, and we disabled IP
+forwarding on server B, Server A sent packets to server B, then server B
+dropped packets and increased IpInAddrErrors. As the nc command would
+re-send the SYN packet if it didn't receive a SYN+ACK, we could find
+multiple IpInAddrErrors.
+
+Second, generate IpExtInNoRoutes. On server B, we enable IP
+forwarding::
+
+ $ sudo sysctl -w net.ipv4.conf.all.forwarding=1
+
+Check the route table of server B and remove the default route::
+
+ $ ip route show
+ default via 192.168.122.1 dev ens3 proto static
+ 192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.251
+ $ sudo ip route delete default via 192.168.122.1 dev ens3 proto static
+
+On server A, we contact 8.8.8.8 again::
+
+ $ nc -v 8.8.8.8 53
+ nc: connect to 8.8.8.8 port 53 (tcp) failed: Network is unreachable
+
+On server B, run nstat::
+
+ $ nstat
+ #kernel
+ IpInReceives 1 0.0
+ IpOutRequests 1 0.0
+ IcmpOutMsgs 1 0.0
+ IcmpOutDestUnreachs 1 0.0
+ IcmpMsgOutType3 1 0.0
+ IpExtInNoRoutes 1 0.0
+ IpExtInOctets 60 0.0
+ IpExtOutOctets 88 0.0
+ IpExtInNoECTPkts 1 0.0
+
+We enabled IP forwarding on server B, when server B received a packet
+which destination IP address is 8.8.8.8, server B will try to forward
+this packet. We have deleted the default route, there was no route for
+8.8.8.8, so server B increase IpExtInNoRoutes and sent the "ICMP
+Destination Unreachable" message to server A.
+
+Third, generate IpOutNoRoutes. Run ping command on server B::
+
+ $ ping -c 1 8.8.8.8
+ connect: Network is unreachable
+
+Run nstat on server B::
+
+ $ nstat
+ #kernel
+ IpOutNoRoutes 1 0.0
+
+We have deleted the default route on server B. Server B couldn't find
+a route for the 8.8.8.8 IP address, so server B increased
+IpOutNoRoutes.
+
+TcpExtTCPACKSkippedSynRecv
+--------------------------
+In this test, we send 3 same SYN packets from client to server. The
+first SYN will let server create a socket, set it to Syn-Recv status,
+and reply a SYN/ACK. The second SYN will let server reply the SYN/ACK
+again, and record the reply time (the duplicate ACK reply time). The
+third SYN will let server check the previous duplicate ACK reply time,
+and decide to skip the duplicate ACK, then increase the
+TcpExtTCPACKSkippedSynRecv counter.
+
+Run tcpdump to capture a SYN packet::
+
+ nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
+ tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
+
+Open another terminal, run nc command::
+
+ nstatuser@nstat-a:~$ nc nstat-b 9000
+
+As the nstat-b didn't listen on port 9000, it should reply a RST, and
+the nc command exited immediately. It was enough for the tcpdump
+command to capture a SYN packet. A linux server might use hardware
+offload for the TCP checksum, so the checksum in the /tmp/syn.pcap
+might be not correct. We call tcprewrite to fix it::
+
+ nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum
+
+On nstat-b, we run nc to listen on port 9000::
+
+ nstatuser@nstat-b:~$ nc -lkv 9000
+ Listening on [0.0.0.0] (family 0, port 9000)
+
+On nstat-a, we blocked the packet from port 9000, or nstat-a would send
+RST to nstat-b::
+
+ nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP
+
+Send 3 SYN repeatly to nstat-b::
+
+ nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done
+
+Check snmp cunter on nstat-b::
+
+ nstatuser@nstat-b:~$ nstat | grep -i skip
+ TcpExtTCPACKSkippedSynRecv 1 0.0
+
+As we expected, TcpExtTCPACKSkippedSynRecv is 1.
+
+TcpExtTCPACKSkippedPAWS
+-----------------------
+To trigger PAWS, we could send an old SYN.
+
+On nstat-b, let nc listen on port 9000::
+
+ nstatuser@nstat-b:~$ nc -lkv 9000
+ Listening on [0.0.0.0] (family 0, port 9000)
+
+On nstat-a, run tcpdump to capture a SYN::
+
+ nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
+ tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
+
+On nstat-a, run nc as a client to connect nstat-b::
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9000
+ Connection to nstat-b 9000 port [tcp/*] succeeded!
+
+Now the tcpdump has captured the SYN and exit. We should fix the
+checksum::
+
+ nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum
+
+Send the SYN packet twice::
+
+ nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done
+
+On nstat-b, check the snmp counter::
+
+ nstatuser@nstat-b:~$ nstat | grep -i skip
+ TcpExtTCPACKSkippedPAWS 1 0.0
+
+We sent two SYN via tcpreplay, both of them would let PAWS check
+failed, the nstat-b replied an ACK for the first SYN, skipped the ACK
+for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
+
+TcpExtTCPACKSkippedSeq
+----------------------
+To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid
+timestamp (to pass PAWS check) but the sequence number is out of
+window. The linux TCP stack would avoid to skip if the packet has
+data, so we need a pure ACK packet. To generate such a packet, we
+could create two sockets: one on port 9000, another on port 9001. Then
+we capture an ACK on port 9001, change the source/destination port
+numbers to match the port 9000 socket. Then we could trigger
+TcpExtTCPACKSkippedSeq via this packet.
+
+On nstat-b, open two terminals, run two nc commands to listen on both
+port 9000 and port 9001::
+
+ nstatuser@nstat-b:~$ nc -lkv 9000
+ Listening on [0.0.0.0] (family 0, port 9000)
+
+ nstatuser@nstat-b:~$ nc -lkv 9001
+ Listening on [0.0.0.0] (family 0, port 9001)
+
+On nstat-a, run two nc clients::
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9000
+ Connection to nstat-b 9000 port [tcp/*] succeeded!
+
+ nstatuser@nstat-a:~$ nc -v nstat-b 9001
+ Connection to nstat-b 9001 port [tcp/*] succeeded!
+
+On nstat-a, run tcpdump to capture an ACK::
+
+ nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
+ tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
+
+On nstat-b, send a packet via the port 9001 socket. E.g. we sent a
+string 'foo' in our example::
+
+ nstatuser@nstat-b:~$ nc -lkv 9001
+ Listening on [0.0.0.0] (family 0, port 9001)
+ Connection from nstat-a 42132 received!
+ foo
+
+On nstat-a, the tcpdump should have caputred the ACK. We should check
+the source port numbers of the two nc clients::
+
+ nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
+ State Recv-Q Send-Q Local Address:Port Peer Address:Port
+ ESTAB 0 0 192.168.122.250:50208 192.168.122.251:9000
+ ESTAB 0 0 192.168.122.250:42132 192.168.122.251:9001
+
+Run tcprewrite, change port 9001 to port 9000, chagne port 42132 to
+port 50208::
+
+ nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum
+
+Now the /tmp/seq.pcap is the packet we need. Send it to nstat-b::
+
+ nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done
+
+Check TcpExtTCPACKSkippedSeq on nstat-b::
+
+ nstatuser@nstat-b:~$ nstat | grep -i skip
+ TcpExtTCPACKSkippedSeq 1 0.0
diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 82236a1..86174ce 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -92,11 +92,11 @@
Switch ID
^^^^^^^^^
-The switchdev driver must implement the switchdev op switchdev_port_attr_get
-for SWITCHDEV_ATTR_ID_PORT_PARENT_ID for each port netdev, returning the same
-physical ID for each port of a switch. The ID must be unique between switches
-on the same system. The ID does not need to be unique between switches on
-different systems.
+The switchdev driver must implement the net_device operation
+ndo_get_port_parent_id for each port netdev, returning the same physical ID for
+each port of a switch. The ID must be unique between switches on the same
+system. The ID does not need to be unique between switches on different
+systems.
The switch ID is used to locate ports on a switch and to know if aggregated
ports belong to the same switch.
@@ -196,7 +196,7 @@
and notify the switch driver of the mac/vlan/port tuples. The switch driver,
in turn, will notify the bridge driver using the switchdev notifier call:
- err = call_switchdev_notifiers(val, dev, info);
+ err = call_switchdev_notifiers(val, dev, info, extack);
Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info. On
@@ -232,10 +232,8 @@
the bridge's FDB. It's possible, but not optimal, to enable learning on the
device port and on the bridge port, and disable learning_sync.
-To support learning and learning_sync port attributes, the driver implements
-switchdev op switchdev_port_attr_get/set for
-SWITCHDEV_ATTR_PORT_ID_BRIDGE_FLAGS. The driver should initialize the attributes
-to the hardware defaults.
+To support learning, the driver implements switchdev op
+switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_ID_{PRE}_BRIDGE_FLAGS.
FDB Ageing
^^^^^^^^^^
@@ -373,22 +371,3 @@
NETEVENT_NEIGH_UPDATE. The device can be programmed with resolved nexthops
for the routes as arp_tbl updates. The driver implements ndo_neigh_destroy
to know when arp_tbl neighbor entries are purged from the port.
-
-Transaction item queue
-^^^^^^^^^^^^^^^^^^^^^^
-
-For switchdev ops attr_set and obj_add, there is a 2 phase transaction model
-used. First phase is to "prepare" anything needed, including various checks,
-memory allocation, etc. The goal is to handle the stuff that is not unlikely
-to fail here. The second phase is to "commit" the actual changes.
-
-Switchdev provides an infrastructure for sharing items (for example memory
-allocations) between the two phases.
-
-The object created by a driver in "prepare" phase and it is queued up by:
-switchdev_trans_item_enqueue()
-During the "commit" phase, the driver gets the object by:
-switchdev_trans_item_dequeue()
-
-If a transaction is aborted during "prepare" phase, switchdev code will handle
-cleanup of the queued-up objects.
diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt
deleted file mode 100644
index 9c7139d..0000000
--- a/Documentation/networking/tcp.txt
+++ /dev/null
@@ -1,101 +0,0 @@
-TCP protocol
-============
-
-Last updated: 3 June 2017
-
-Contents
-========
-
-- Congestion control
-- How the new TCP output machine [nyi] works
-
-Congestion control
-==================
-
-The following variables are used in the tcp_sock for congestion control:
-snd_cwnd The size of the congestion window
-snd_ssthresh Slow start threshold. We are in slow start if
- snd_cwnd is less than this.
-snd_cwnd_cnt A counter used to slow down the rate of increase
- once we exceed slow start threshold.
-snd_cwnd_clamp This is the maximum size that snd_cwnd can grow to.
-snd_cwnd_stamp Timestamp for when congestion window last validated.
-snd_cwnd_used Used as a highwater mark for how much of the
- congestion window is in use. It is used to adjust
- snd_cwnd down when the link is limited by the
- application rather than the network.
-
-As of 2.6.13, Linux supports pluggable congestion control algorithms.
-A congestion control mechanism can be registered through functions in
-tcp_cong.c. The functions used by the congestion control mechanism are
-registered via passing a tcp_congestion_ops struct to
-tcp_register_congestion_control. As a minimum, the congestion control
-mechanism must provide a valid name and must implement either ssthresh,
-cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook.
-
-Private data for a congestion control mechanism is stored in tp->ca_priv.
-tcp_ca(tp) returns a pointer to this space. This is preallocated space - it
-is important to check the size of your private data will fit this space, or
-alternatively, space could be allocated elsewhere and a pointer to it could
-be stored here.
-
-There are three kinds of congestion control algorithms currently: The
-simplest ones are derived from TCP reno (highspeed, scalable) and just
-provide an alternative congestion window calculation. More complex
-ones like BIC try to look at other events to provide better
-heuristics. There are also round trip time based algorithms like
-Vegas and Westwood+.
-
-Good TCP congestion control is a complex problem because the algorithm
-needs to maintain fairness and performance. Please review current
-research and RFC's before developing new modules.
-
-The default congestion control mechanism is chosen based on the
-DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default
-value then you can set it using sysctl net.ipv4.tcp_congestion_control. The
-module will be autoloaded if needed and you will get the expected protocol. If
-you ask for an unknown congestion method, then the sysctl attempt will fail.
-
-If you remove a TCP congestion control module, then you will get the next
-available one. Since reno cannot be built as a module, and cannot be
-removed, it will always be available.
-
-How the new TCP output machine [nyi] works.
-===========================================
-
-Data is kept on a single queue. The skb->users flag tells us if the frame is
-one that has been queued already. To add a frame we throw it on the end. Ack
-walks down the list from the start.
-
-We keep a set of control flags
-
-
- sk->tcp_pend_event
-
- TCP_PEND_ACK Ack needed
- TCP_ACK_NOW Needed now
- TCP_WINDOW Window update check
- TCP_WINZERO Zero probing
-
-
- sk->transmit_queue The transmission frame begin
- sk->transmit_new First new frame pointer
- sk->transmit_end Where to add frames
-
- sk->tcp_last_tx_ack Last ack seen
- sk->tcp_dup_ack Dup ack count for fast retransmit
-
-
-Frames are queued for output by tcp_write. We do our best to send the frames
-off immediately if possible, but otherwise queue and compute the body
-checksum in the copy.
-
-When a write is done we try to clear any pending events and piggy back them.
-If the window is full we queue full sized frames. On the first timeout in
-zero window we split this.
-
-On a timer we walk the retransmit list to send any retransmits, update the
-backoff timers etc. A change of route table stamp causes a change of header
-and recompute. We add any new tcp level headers and refinish the checksum
-before sending.
-
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 1be0b6f..8dd6333 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -6,11 +6,21 @@
* SO_TIMESTAMP
Generates a timestamp for each incoming packet in (not necessarily
monotonic) system time. Reports the timestamp via recvmsg() in a
- control message as struct timeval (usec resolution).
+ control message in usec resolution.
+ SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD
+ based on the architecture type and time_t representation of libc.
+ Control message format is in struct __kernel_old_timeval for
+ SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for
+ SO_TIMESTAMP_NEW options respectively.
* SO_TIMESTAMPNS
Same timestamping mechanism as SO_TIMESTAMP, but reports the
- timestamp as struct timespec (nsec resolution).
+ timestamp as struct timespec in nsec resolution.
+ SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD
+ based on the architecture type and time_t representation of libc.
+ Control message format is in struct timespec for SO_TIMESTAMPNS_OLD
+ and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options
+ respectively.
* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
Only for multicast:approximate transmit timestamp obtained by
@@ -22,7 +32,7 @@
timestamps for stream sockets.
-1.1 SO_TIMESTAMP:
+1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW):
This socket option enables timestamping of datagrams on the reception
path. Because the destination socket, if any, is not known early in
@@ -31,15 +41,25 @@
For interface details, see `man 7 socket`.
+Always use SO_TIMESTAMP_NEW timestamp to always get timestamp in
+struct __kernel_sock_timeval format.
-1.2 SO_TIMESTAMPNS:
+SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038
+on 32 bit machines.
+
+1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW):
This option is identical to SO_TIMESTAMP except for the returned data type.
Its struct timespec allows for higher resolution (ns) timestamps than the
timeval of SO_TIMESTAMP (ms).
+Always use SO_TIMESTAMPNS_NEW timestamp to always get timestamp in
+struct __kernel_timespec format.
-1.3 SO_TIMESTAMPING:
+SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
+on 32 bit machines.
+
+1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW):
Supports multiple types of timestamp requests. As a result, this
socket option takes a bitmap of flags, not a boolean. In
@@ -323,10 +343,23 @@
These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type
+For SO_TIMESTAMPING_OLD:
+
struct scm_timestamping {
struct timespec ts[3];
};
+For SO_TIMESTAMPING_NEW:
+
+struct scm_timestamping64 {
+ struct __kernel_timespec ts[3];
+
+Always use SO_TIMESTAMPING_NEW timestamp to always get timestamp in
+struct scm_timestamping64 format.
+
+SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038
+on 32 bit machines.
+
The structure can return up to three timestamps. This is a legacy
feature. At least one field is non-zero at any time. Most timestamps
are passed in ts[0]. Hardware timestamps are passed in ts[2].
@@ -335,7 +368,7 @@
Instead, expose the hardware clock device on the NIC directly as
a HW PTP clock source, to allow time conversion in userspace and
optionally synchronize system time with a userspace PTP stack such
-as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt.
+as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst.
Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
@@ -417,7 +450,7 @@
Hardware time stamping must also be initialized for each device driver
that is expected to do hardware time stamping. The parameter is defined in
-/include/linux/net_tstamp.h as:
+include/uapi/linux/net_tstamp.h as:
struct hwtstamp_config {
int flags; /* no flags defined right now, must be zero */
@@ -487,7 +520,7 @@
HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
/* for the complete list of values, please check
- * the include file /include/linux/net_tstamp.h
+ * the include file include/uapi/linux/net_tstamp.h
*/
};
diff --git a/Documentation/networking/tls-offload-layers.svg b/Documentation/networking/tls-offload-layers.svg
new file mode 100644
index 0000000..cf72f05
--- /dev/null
+++ b/Documentation/networking/tls-offload-layers.svg
@@ -0,0 +1 @@
+<svg version="1.1" viewBox="0.0 0.0 460.0 500.0" fill="none" stroke="none" stroke-linecap="square" stroke-miterlimit="10" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg"><clipPath id="p.0"><path d="m0 0l960.0 0l0 720.0l-960.0 0l0 -720.0z" clip-rule="nonzero"/></clipPath><g clip-path="url(#p.0)"><path fill="#000000" fill-opacity="0.0" d="m0 0l960.0 0l0 720.0l-960.0 0z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m117.02887 0l72.28346 0l0 40.25197l-72.28346 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m117.02887 0l72.28346 0l0 40.25197l-72.28346 0z" fill-rule="evenodd"/><path fill="#000000" d="m135.71944 27.045982l0 -9.671875l1.46875 0l0 1.46875q0.5625 -1.03125 1.03125 -1.359375q0.484375 -0.328125 1.0625 -0.328125q0.828125 0 1.6875 0.53125l-0.5625 1.515625q-0.609375 -0.359375 -1.203125 -0.359375q-0.546875 0 -0.96875 0.328125q-0.421875 0.328125 -0.609375 0.890625q-0.28125 0.875 -0.28125 1.921875l0 5.0625l-1.625 0zm12.853302 -3.109375l1.6875 0.203125q-0.40625 1.484375 -1.484375 2.3125q-1.078125 0.8125 -2.765625 0.8125q-2.125 0 -3.375 -1.296875q-1.234375 -1.3125 -1.234375 -3.671875q0 -2.453125 1.25 -3.796875q1.265625 -1.34375 3.265625 -1.34375q1.9375 0 3.15625 1.328125q1.234375 1.3125 1.234375 3.703125q0 0.15625 0 0.4375l-7.21875 0q0.09375 1.59375 0.90625 2.453125q0.8125 0.84375 2.015625 0.84375q0.90625 0 1.546875 -0.46875q0.640625 -0.484375 1.015625 -1.515625zm-5.390625 -2.65625l5.40625 0q-0.109375 -1.21875 -0.625 -1.828125q-0.78125 -0.953125 -2.03125 -0.953125q-1.125 0 -1.90625 0.765625q-0.765625 0.75 -0.84375 2.015625zm15.453842 4.578125q-0.921875 0.765625 -1.765625 1.09375q-0.828125 0.3125 -1.796875 0.3125q-1.59375 0 -2.453125 -0.78125q-0.859375 -0.78125 -0.859375 -1.984375q0 -0.71875 0.328125 -1.296875q0.328125 -0.59375 0.84375 -0.9375q0.53125 -0.359375 1.1875 -0.546875q0.46875 -0.125 1.453125 -0.25q1.984375 -0.234375 2.921875 -0.5625q0.015625 -0.34375 0.015625 -0.421875q0 -1.0 -0.46875 -1.421875q-0.625 -0.546875 -1.875 -0.546875q-1.15625 0 -1.703125 0.40625q-0.546875 0.40625 -0.8125 1.421875l-1.609375 -0.21875q0.21875 -1.015625 0.71875 -1.640625q0.5 -0.640625 1.453125 -0.984375q0.953125 -0.34375 2.1875 -0.34375q1.25 0 2.015625 0.296875q0.78125 0.28125 1.140625 0.734375q0.375 0.4375 0.515625 1.109375q0.078125 0.421875 0.078125 1.515625l0 2.1875q0 2.28125 0.109375 2.890625q0.109375 0.59375 0.40625 1.15625l-1.703125 0q-0.265625 -0.515625 -0.328125 -1.1875zm-0.140625 -3.671875q-0.890625 0.375 -2.671875 0.625q-1.015625 0.140625 -1.4375 0.328125q-0.421875 0.1875 -0.65625 0.53125q-0.21875 0.34375 -0.21875 0.78125q0 0.65625 0.5 1.09375q0.5 0.4375 1.453125 0.4375q0.9375 0 1.671875 -0.40625q0.75 -0.421875 1.09375 -1.140625q0.265625 -0.5625 0.265625 -1.640625l0 -0.609375zm10.469467 4.859375l0 -1.21875q-0.90625 1.4375 -2.703125 1.4375q-1.15625 0 -2.125 -0.640625q-0.96875 -0.640625 -1.5 -1.78125q-0.53125 -1.140625 -0.53125 -2.625q0 -1.453125 0.484375 -2.625q0.484375 -1.1875 1.4375 -1.8125q0.96875 -0.625 2.171875 -0.625q0.875 0 1.546875 0.375q0.6875 0.359375 1.109375 0.953125l0 -4.796875l1.640625 0l0 13.359375l-1.53125 0zm-5.171875 -4.828125q0 1.859375 0.78125 2.78125q0.78125 0.921875 1.84375 0.921875q1.078125 0 1.828125 -0.875q0.75 -0.890625 0.75 -2.6875q0 -1.984375 -0.765625 -2.90625q-0.765625 -0.9375 -1.890625 -0.9375q-1.078125 0 -1.8125 0.890625q-0.734375 0.890625 -0.734375 2.8125z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m309.02887 0l72.28348 0l0 40.25197l-72.28348 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m309.02887 0l72.28348 0l0 40.25197l-72.28348 0z" fill-rule="evenodd"/><path fill="#000000" d="m328.4915 27.045982l-2.96875 -9.671875l1.703125 0l1.53125 5.578125l0.578125 2.078125q0.046875 -0.15625 0.5 -2.0l1.546875 -5.65625l1.6875 0l1.4375 5.609375l0.484375 1.84375l0.5625 -1.859375l1.65625 -5.59375l1.59375 0l-3.03125 9.671875l-1.703125 0l-1.53125 -5.796875l-0.375 -1.640625l-1.953125 7.4375l-1.71875 0zm11.676086 0l0 -9.671875l1.46875 0l0 1.46875q0.5625 -1.03125 1.03125 -1.359375q0.484375 -0.328125 1.0625 -0.328125q0.828125 0 1.6875 0.53125l-0.5625 1.515625q-0.609375 -0.359375 -1.203125 -0.359375q-0.546875 0 -0.96875 0.328125q-0.421875 0.328125 -0.609375 0.890625q-0.28125 0.875 -0.28125 1.921875l0 5.0625l-1.625 0zm6.228302 -11.46875l0 -1.890625l1.640625 0l0 1.890625l-1.640625 0zm0 11.46875l0 -9.671875l1.640625 0l0 9.671875l-1.640625 0zm7.722931 -1.46875l0.234375 1.453125q-0.6875 0.140625 -1.234375 0.140625q-0.890625 0 -1.390625 -0.28125q-0.484375 -0.28125 -0.6875 -0.734375q-0.203125 -0.46875 -0.203125 -1.9375l0 -5.578125l-1.203125 0l0 -1.265625l1.203125 0l0 -2.390625l1.625 -0.984375l0 3.375l1.65625 0l0 1.265625l-1.65625 0l0 5.671875q0 0.6875 0.078125 0.890625q0.09375 0.203125 0.28125 0.328125q0.203125 0.109375 0.578125 0.109375q0.265625 0 0.71875 -0.0625zm8.230194 -1.640625l1.6875 0.203125q-0.40625 1.484375 -1.484375 2.3125q-1.078125 0.8125 -2.765625 0.8125q-2.125 0 -3.375 -1.296875q-1.234375 -1.3125 -1.234375 -3.671875q0 -2.453125 1.25 -3.796875q1.265625 -1.34375 3.265625 -1.34375q1.9375 0 3.15625 1.328125q1.234375 1.3125 1.234375 3.703125q0 0.15625 0 0.4375l-7.21875 0q0.09375 1.59375 0.90625 2.453125q0.8125 0.84375 2.015625 0.84375q0.90625 0 1.546875 -0.46875q0.640625 -0.484375 1.015625 -1.515625zm-5.390625 -2.65625l5.40625 0q-0.109375 -1.21875 -0.625 -1.828125q-0.78125 -0.953125 -2.03125 -0.953125q-1.125 0 -1.90625 0.765625q-0.765625 0.75 -0.84375 2.015625z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m73.64304 101.46588l351.0551 0l0 53.70079l-351.0551 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m73.64304 101.46588l351.0551 0l0 53.70079l-351.0551 0z" fill-rule="evenodd"/><path fill="#000000" d="m215.67503 135.23627l0 -13.359367l1.640625 0l0 7.6249924l3.890625 -3.9374924l2.109375 0l-3.6875 3.5937424l4.0625 6.078125l-2.015625 0l-3.203125 -4.953125l-1.15625 1.125l0 3.828125l-1.640625 0zm12.90625 -1.46875l0.234375 1.453125q-0.6875 0.140625 -1.234375 0.140625q-0.890625 0 -1.390625 -0.28125q-0.484375 -0.28125 -0.6875 -0.734375q-0.203125 -0.46875 -0.203125 -1.9375l0 -5.5781174l-1.203125 0l0 -1.265625l1.203125 0l0 -2.390625l1.625 -0.984375l0 3.375l1.65625 0l0 1.265625l-1.65625 0l0 5.6718674q0 0.6875 0.078125 0.890625q0.09375 0.203125 0.28125 0.328125q0.203125 0.109375 0.578125 0.109375q0.265625 0 0.71875 -0.0625zm1.5583038 1.46875l0 -13.359367l1.640625 0l0 13.359367l-1.640625 0zm3.5354462 -2.890625l1.625 -0.25q0.125 0.96875 0.75 1.5q0.625 0.515625 1.75 0.515625q1.125 0 1.671875 -0.453125q0.546875 -0.46875 0.546875 -1.09375q0 -0.546875 -0.484375 -0.875q-0.328125 -0.21875 -1.671875 -0.546875q-1.8125 -0.46875 -2.515625 -0.796875q-0.6875 -0.328125 -1.046875 -0.90625q-0.359375 -0.59375 -0.359375 -1.3125q0 -0.6406174 0.296875 -1.1874924q0.296875 -0.5625 0.8125 -0.921875q0.375 -0.28125 1.03125 -0.46875q0.671875 -0.203125 1.421875 -0.203125q1.140625 0 2.0 0.328125q0.859375 0.328125 1.265625 0.890625q0.421875 0.5625 0.578125 1.4999924l-1.609375 0.21875q-0.109375 -0.7499924 -0.640625 -1.1718674q-0.515625 -0.421875 -1.46875 -0.421875q-1.140625 0 -1.625 0.375q-0.46875 0.375 -0.46875 0.875q0 0.31249237 0.1875 0.5781174q0.203125 0.265625 0.640625 0.4375q0.234375 0.09375 1.4375 0.421875q1.75 0.453125 2.4375 0.75q0.6875 0.296875 1.078125 0.859375q0.390625 0.5625 0.390625 1.40625q0 0.828125 -0.484375 1.546875q-0.46875 0.71875 -1.375 1.125q-0.90625 0.390625 -2.046875 0.390625q-1.875 0 -2.875 -0.78125q-0.984375 -0.78125 -1.25 -2.328125zm24.136429 -10.468742l1.765625 0l0 7.7187424q0 2.015625 -0.453125 3.203125q-0.453125 1.1875 -1.640625 1.9375q-1.1875 0.734375 -3.125 0.734375q-1.875 0 -3.078125 -0.640625q-1.1875 -0.65625 -1.703125 -1.875q-0.5 -1.234375 -0.5 -3.359375l0 -7.7187424l1.765625 0l0 7.7187424q0 1.734375 0.3125 2.5625q0.328125 0.8125 1.109375 1.265625q0.796875 0.453125 1.9375 0.453125q1.953125 0 2.78125 -0.890625q0.828125 -0.890625 0.828125 -3.390625l0 -7.7187424zm4.629181 13.359367l0 -13.359367l1.78125 0l0 11.781242l6.5625 0l0 1.578125l-8.34375 0zm10.453857 0l0 -13.359367l5.046875 0q1.328125 0 2.03125 0.125q0.96875 0.171875 1.640625 0.640625q0.671875 0.453125 1.078125 1.28125q0.40625 0.828125 0.40625 1.828125q0 1.703125 -1.09375 2.8906174q-1.078125 1.171875 -3.921875 1.171875l-3.421875 0l0 5.421875l-1.765625 0zm1.765625 -7.0l3.453125 0q1.71875 0 2.4375 -0.6406174q0.71875 -0.640625 0.71875 -1.796875q0 -0.84375 -0.421875 -1.4375q-0.421875 -0.59375 -1.125 -0.78125q-0.4375 -0.125 -1.640625 -0.125l-3.421875 0l0 4.7812424z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m73.64304 216.38058l351.0551 0l0 53.700775l-351.0551 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m73.64304 216.38058l351.0551 0l0 53.700775l-351.0551 0z" fill-rule="evenodd"/><path fill="#000000" d="m211.16338 250.15097l0 -11.78125l-4.40625 0l0 -1.578125l10.578125 0l0 1.578125l-4.40625 0l0 11.78125l-1.765625 0zm17.52098 -4.6875l1.765625 0.453125q-0.5625 2.171875 -2.0 3.328125q-1.4375 1.140625 -3.53125 1.140625q-2.15625 0 -3.515625 -0.875q-1.34375 -0.890625 -2.0625 -2.546875q-0.703125 -1.671875 -0.703125 -3.59375q0 -2.078125 0.796875 -3.625q0.796875 -1.5625 2.265625 -2.359375q1.484375 -0.8125 3.25 -0.8125q2.0 0 3.359375 1.015625q1.375 1.015625 1.90625 2.875l-1.734375 0.40625q-0.46875 -1.453125 -1.359375 -2.109375q-0.875 -0.671875 -2.203125 -0.671875q-1.546875 0 -2.578125 0.734375q-1.03125 0.734375 -1.453125 1.984375q-0.421875 1.234375 -0.421875 2.5625q0 1.703125 0.5 2.96875q0.5 1.265625 1.546875 1.90625q1.046875 0.625 2.265625 0.625q1.484375 0 2.515625 -0.859375q1.03125 -0.859375 1.390625 -2.546875zm3.9416962 4.6875l0 -13.359375l5.046875 0q1.328125 0 2.03125 0.125q0.96875 0.171875 1.640625 0.640625q0.671875 0.453125 1.078125 1.28125q0.40625 0.828125 0.40625 1.828125q0 1.703125 -1.09375 2.890625q-1.078125 1.171875 -3.921875 1.171875l-3.421875 0l0 5.421875l-1.765625 0zm1.765625 -7.0l3.453125 0q1.71875 0 2.4375 -0.640625q0.71875 -0.640625 0.71875 -1.796875q0 -0.84375 -0.421875 -1.4375q-0.421875 -0.59375 -1.125 -0.78125q-0.4375 -0.125 -1.640625 -0.125l-3.421875 0l0 4.78125zm14.664642 4.109375l1.625 -0.25q0.125 0.96875 0.75 1.5q0.625 0.515625 1.75 0.515625q1.125 0 1.671875 -0.453125q0.546875 -0.46875 0.546875 -1.09375q0 -0.546875 -0.484375 -0.875q-0.328125 -0.21875 -1.671875 -0.546875q-1.8125 -0.46875 -2.515625 -0.796875q-0.6875 -0.328125 -1.046875 -0.90625q-0.359375 -0.59375 -0.359375 -1.3125q0 -0.640625 0.296875 -1.1875q0.296875 -0.5625 0.8125 -0.921875q0.375 -0.28125 1.03125 -0.46875q0.671875 -0.203125 1.421875 -0.203125q1.140625 0 2.0 0.328125q0.859375 0.328125 1.2656097 0.890625q0.421875 0.5625 0.578125 1.5l-1.6093597 0.21875q-0.109375 -0.75 -0.640625 -1.171875q-0.515625 -0.421875 -1.46875 -0.421875q-1.140625 0 -1.625 0.375q-0.46875 0.375 -0.46875 0.875q0 0.3125 0.1875 0.578125q0.203125 0.265625 0.640625 0.4375q0.234375 0.09375 1.4375 0.421875q1.75 0.453125 2.4375 0.75q0.68748474 0.296875 1.0781097 0.859375q0.390625 0.5625 0.390625 1.40625q0 0.828125 -0.484375 1.546875q-0.46875 0.71875 -1.3749847 1.125q-0.90625 0.390625 -2.046875 0.390625q-1.875 0 -2.875 -0.78125q-0.984375 -0.78125 -1.25 -2.328125zm13.562485 1.421875l0.234375 1.453125q-0.6875 0.140625 -1.234375 0.140625q-0.890625 0 -1.390625 -0.28125q-0.484375 -0.28125 -0.6875 -0.734375q-0.203125 -0.46875 -0.203125 -1.9375l0 -5.578125l-1.203125 0l0 -1.265625l1.203125 0l0 -2.390625l1.625 -0.984375l0 3.375l1.65625 0l0 1.265625l-1.65625 0l0 5.671875q0 0.6875 0.078125 0.890625q0.09375 0.203125 0.28125 0.328125q0.203125 0.109375 0.578125 0.109375q0.265625 0 0.71875 -0.0625zm7.917694 0.28125q-0.921875 0.765625 -1.765625 1.09375q-0.828125 0.3125 -1.796875 0.3125q-1.59375 0 -2.453125 -0.78125q-0.859375 -0.78125 -0.859375 -1.984375q0 -0.71875 0.328125 -1.296875q0.328125 -0.59375 0.84375 -0.9375q0.53125 -0.359375 1.1875 -0.546875q0.46875 -0.125 1.453125 -0.25q1.984375 -0.234375 2.921875 -0.5625q0.015625 -0.34375 0.015625 -0.421875q0 -1.0 -0.46875 -1.421875q-0.625 -0.546875 -1.875 -0.546875q-1.15625 0 -1.703125 0.40625q-0.546875 0.40625 -0.8125 1.421875l-1.609375 -0.21875q0.21875 -1.015625 0.71875 -1.640625q0.5 -0.640625 1.453125 -0.984375q0.953125 -0.34375 2.1875 -0.34375q1.25 0 2.015625 0.296875q0.78125 0.28125 1.140625 0.734375q0.375 0.4375 0.515625 1.109375q0.078125 0.421875 0.078125 1.515625l0 2.1875q0 2.28125 0.109375 2.890625q0.109375 0.59375 0.40625 1.15625l-1.703125 0q-0.265625 -0.515625 -0.328125 -1.1875zm-0.140625 -3.671875q-0.890625 0.375 -2.671875 0.625q-1.015625 0.140625 -1.4375 0.328125q-0.421875 0.1875 -0.65625 0.53125q-0.21875 0.34375 -0.21875 0.78125q0 0.65625 0.5 1.09375q0.5 0.4375 1.453125 0.4375q0.9375 0 1.671875 -0.40625q0.75 -0.421875 1.09375 -1.140625q0.265625 -0.5625 0.265625 -1.640625l0 -0.609375zm10.516327 1.3125l1.609375 0.21875q-0.265625 1.65625 -1.359375 2.609375q-1.078125 0.9375 -2.671875 0.9375q-1.984375 0 -3.1875 -1.296875q-1.203125 -1.296875 -1.203125 -3.71875q0 -1.578125 0.515625 -2.75q0.515625 -1.171875 1.578125 -1.75q1.0625 -0.59375 2.3125 -0.59375q1.578125 0 2.578125 0.796875q1.0 0.796875 1.28125 2.265625l-1.59375 0.234375q-0.234375 -0.96875 -0.8125 -1.453125q-0.578125 -0.5 -1.390625 -0.5q-1.234375 0 -2.015625 0.890625q-0.78125 0.890625 -0.78125 2.8125q0 1.953125 0.75 2.84375q0.75 0.875 1.953125 0.875q0.96875 0 1.609375 -0.59375q0.65625 -0.59375 0.828125 -1.828125zm3.015625 3.546875l0 -13.359375l1.640625 0l0 7.625l3.890625 -3.9375l2.109375 0l-3.6875 3.59375l4.0625 6.078125l-2.015625 0l-3.203125 -4.953125l-1.15625 1.125l0 3.828125l-1.640625 0z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m73.64304 331.2953l351.0551 0l0 53.700775l-351.0551 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m73.64304 331.2953l351.0551 0l0 53.700775l-351.0551 0z" fill-rule="evenodd"/><path fill="#000000" d="m225.73463 365.06567l0 -13.359375l4.609375 0q1.546875 0 2.375 0.203125q1.140625 0.25 1.953125 0.953125q1.0625 0.890625 1.578125 2.28125q0.53125 1.390625 0.53125 3.171875q0 1.515625 -0.359375 2.703125q-0.359375 1.171875 -0.921875 1.9375q-0.546875 0.765625 -1.203125 1.21875q-0.65625 0.4375 -1.59375 0.671875q-0.9375 0.21875 -2.140625 0.21875l-4.828125 0zm1.765625 -1.578125l2.859375 0q1.3125 0 2.0625 -0.234375q0.75 -0.25 1.203125 -0.703125q0.625 -0.625 0.96875 -1.6875q0.359375 -1.0625 0.359375 -2.578125q0 -2.09375 -0.6875 -3.21875q-0.6875 -1.125 -1.671875 -1.5q-0.703125 -0.28125 -2.28125 -0.28125l-2.8125 0l0 10.203125zm11.488571 1.578125l0 -9.671875l1.46875 0l0 1.46875q0.5625 -1.03125 1.03125 -1.359375q0.484375 -0.328125 1.0625 -0.328125q0.828125 0 1.6875 0.53125l-0.5625 1.515625q-0.609375 -0.359375 -1.203125 -0.359375q-0.546875 0 -0.96875 0.328125q-0.421875 0.328125 -0.609375 0.890625q-0.28125 0.875 -0.28125 1.921875l0 5.0625l-1.625 0zm6.228302 -11.46875l0 -1.890625l1.640625 0l0 1.890625l-1.640625 0zm0 11.46875l0 -9.671875l1.640625 0l0 9.671875l-1.640625 0zm6.832321 0l-3.6875 -9.671875l1.734375 0l2.078125 5.796875q0.328125 0.9375 0.625 1.9375q0.203125 -0.765625 0.609375 -1.828125l2.140625 -5.90625l1.6874847 0l-3.6562347 9.671875l-1.53125 0zm13.26561 -3.109375l1.6875 0.203125q-0.40625 1.484375 -1.484375 2.3125q-1.078125 0.8125 -2.765625 0.8125q-2.125 0 -3.375 -1.296875q-1.234375 -1.3125 -1.234375 -3.671875q0 -2.453125 1.25 -3.796875q1.265625 -1.34375 3.265625 -1.34375q1.9375 0 3.15625 1.328125q1.234375 1.3125 1.234375 3.703125q0 0.15625 0 0.4375l-7.21875 0q0.09375 1.59375 0.90625 2.453125q0.8125 0.84375 2.015625 0.84375q0.90625 0 1.546875 -0.46875q0.640625 -0.484375 1.015625 -1.515625zm-5.390625 -2.65625l5.40625 0q-0.109375 -1.21875 -0.625 -1.828125q-0.78125 -0.953125 -2.03125 -0.953125q-1.125 0 -1.90625 0.765625q-0.765625 0.75 -0.84375 2.015625zm9.125732 5.765625l0 -9.671875l1.46875 0l0 1.46875q0.5625 -1.03125 1.03125 -1.359375q0.484375 -0.328125 1.0625 -0.328125q0.828125 0 1.6875 0.53125l-0.5625 1.515625q-0.609375 -0.359375 -1.203125 -0.359375q-0.546875 0 -0.96875 0.328125q-0.421875 0.328125 -0.609375 0.890625q-0.28125 0.875 -0.28125 1.921875l0 5.0625l-1.625 0z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m73.64304 446.20996l351.0551 0l0 53.700806l-351.0551 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m73.64304 446.20996l351.0551 0l0 53.700806l-351.0551 0z" fill-rule="evenodd"/><path fill="#000000" d="m222.09538 479.98038l0 -13.359375l4.609375 0q1.546875 0 2.375 0.203125q1.140625 0.25 1.953125 0.953125q1.0625 0.890625 1.578125 2.28125q0.53125 1.390625 0.53125 3.171875q0 1.515625 -0.359375 2.703125q-0.359375 1.171875 -0.921875 1.9375q-0.546875 0.765625 -1.203125 1.21875q-0.65625 0.4375 -1.59375 0.671875q-0.9375 0.21875 -2.140625 0.21875l-4.828125 0zm1.765625 -1.578125l2.859375 0q1.3125 0 2.0625 -0.234375q0.75 -0.25 1.203125 -0.703125q0.625 -0.625 0.96875 -1.6875q0.359375 -1.0625 0.359375 -2.578125q0 -2.09375 -0.6875 -3.21875q-0.6875 -1.125 -1.671875 -1.5q-0.703125 -0.28125 -2.28125 -0.28125l-2.8125 0l0 10.203125zm18.129196 -1.53125l1.6875 0.203125q-0.40625 1.484375 -1.484375 2.3125q-1.078125 0.8125 -2.765625 0.8125q-2.125 0 -3.375 -1.296875q-1.234375 -1.3125 -1.234375 -3.671875q0 -2.453125 1.25 -3.796875q1.265625 -1.34375 3.265625 -1.34375q1.9375 0 3.15625 1.328125q1.234375 1.3125 1.234375 3.703125q0 0.15625 0 0.4375l-7.21875 0q0.09375 1.59375 0.90625 2.453125q0.8125 0.84375 2.015625 0.84375q0.90625 0 1.546875 -0.46875q0.640625 -0.484375 1.015625 -1.515625zm-5.390625 -2.65625l5.40625 0q-0.109375 -1.21875 -0.625 -1.828125q-0.78125 -0.953125 -2.03125 -0.953125q-1.125 0 -1.90625 0.765625q-0.765625 0.75 -0.84375 2.015625zm11.828842 5.765625l-3.6875 -9.671875l1.734375 0l2.078125 5.796875q0.328125 0.9375 0.625 1.9375q0.203125 -0.765625 0.609375 -1.828125l2.140625 -5.90625l1.6875 0l-3.65625 9.671875l-1.53125 0zm6.640625 -11.46875l0 -1.890625l1.6406097 0l0 1.890625l-1.6406097 0zm0 11.46875l0 -9.671875l1.6406097 0l0 9.671875l-1.6406097 0zm10.457321 -3.546875l1.609375 0.21875q-0.265625 1.65625 -1.359375 2.609375q-1.078125 0.9375 -2.671875 0.9375q-1.984375 0 -3.1875 -1.296875q-1.203125 -1.296875 -1.203125 -3.71875q0 -1.578125 0.515625 -2.75q0.515625 -1.171875 1.578125 -1.75q1.0625 -0.59375 2.3125 -0.59375q1.578125 0 2.578125 0.796875q1.0 0.796875 1.28125 2.265625l-1.59375 0.234375q-0.234375 -0.96875 -0.8125 -1.453125q-0.578125 -0.5 -1.390625 -0.5q-1.234375 0 -2.015625 0.890625q-0.78125 0.890625 -0.78125 2.8125q0 1.953125 0.75 2.84375q0.75 0.875 1.953125 0.875q0.96875 0 1.609375 -0.59375q0.65625 -0.59375 0.828125 -1.828125zm9.640625 0.4375l1.6875 0.203125q-0.40625 1.484375 -1.484375 2.3125q-1.078125 0.8125 -2.765625 0.8125q-2.125 0 -3.375 -1.296875q-1.234375 -1.3125 -1.234375 -3.671875q0 -2.453125 1.25 -3.796875q1.265625 -1.34375 3.265625 -1.34375q1.9375 0 3.15625 1.328125q1.234375 1.3125 1.234375 3.703125q0 0.15625 0 0.4375l-7.21875 0q0.09375 1.59375 0.90625 2.453125q0.8125 0.84375 2.015625 0.84375q0.90625 0 1.546875 -0.46875q0.640625 -0.484375 1.015625 -1.515625zm-5.390625 -2.65625l5.40625 0q-0.109375 -1.21875 -0.625 -1.828125q-0.78125 -0.953125 -2.03125 -0.953125q-1.125 0 -1.90625 0.765625q-0.765625 0.75 -0.84375 2.015625z" fill-rule="nonzero"/><path fill="#000000" fill-opacity="0.0" d="m153.17061 40.25197l0 0" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m153.17061 40.25197l0 0" fill-rule="evenodd"/><path fill="#000000" fill-opacity="0.0" d="m48.435696 73.03937l402.67715 0" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" stroke-dasharray="4.0,3.0" d="m48.435696 73.03937l402.67715 0" fill-rule="evenodd"/><path fill="#cfe2f3" d="m177.95801 71.49061l-12.393707 0l0 12.897636l-24.7874 0l0 -12.897636l-12.393692 0l24.7874 -12.89764z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m177.95801 71.49061l-12.393707 0l0 12.897636l-24.7874 0l0 -12.897636l-12.393692 0l24.7874 -12.89764z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m320.3832 71.49061l12.393707 0l0 -12.89764l24.787384 0l0 12.89764l12.393707 0l-24.787415 12.897636z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m320.3832 71.49061l12.393707 0l0 -12.89764l24.787384 0l0 12.89764l12.393707 0l-24.787415 12.897636z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m177.95801 188.3931l-12.393707 0l0 12.897629l-24.7874 0l0 -12.897629l-12.393692 0l24.7874 -12.897644z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m177.95801 188.3931l-12.393707 0l0 12.897629l-24.7874 0l0 -12.897629l-12.393692 0l24.7874 -12.897644z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m320.3832 188.3931l12.393707 0l0 -12.897644l24.787384 0l0 12.897644l12.393707 0l-24.787415 12.897629z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m320.3832 188.3931l12.393707 0l0 -12.897644l24.787384 0l0 12.897644l12.393707 0l-24.787415 12.897629z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m177.95801 301.4256l-12.393707 0l0 12.897644l-24.7874 0l0 -12.897644l-12.393692 0l24.7874 -12.897644z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m177.95801 301.4256l-12.393707 0l0 12.897644l-24.7874 0l0 -12.897644l-12.393692 0l24.7874 -12.897644z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m320.3832 301.4256l12.393707 0l0 -12.897644l24.787384 0l0 12.897644l12.393707 0l-24.787415 12.897644z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m320.3832 301.4256l12.393707 0l0 -12.897644l24.787384 0l0 12.897644l12.393707 0l-24.787415 12.897644z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m177.95801 415.4906l-12.393707 0l0 12.897644l-24.7874 0l0 -12.897644l-12.393692 0l24.7874 -12.897644z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m177.95801 415.4906l-12.393707 0l0 12.897644l-24.7874 0l0 -12.897644l-12.393692 0l24.7874 -12.897644z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m320.3832 415.4906l12.393707 0l0 -12.897644l24.787384 0l0 12.897644l12.393707 0l-24.787415 12.897644z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m320.3832 415.4906l12.393707 0l0 -12.897644l24.787384 0l0 12.897644l12.393707 0l-24.787415 12.897644z" fill-rule="evenodd"/><path fill="#000000" fill-opacity="0.0" d="m198.14961 44.009186l109.44881 0l0 53.70079l-109.44881 0z" fill-rule="evenodd"/><path fill="#000000" d="m224.98174 70.929184l2.78125 -13.359375l1.65625 0l-1.0 4.78125q0.78125 -0.71875 1.421875 -1.015625q0.640625 -0.296875 1.328125 -0.296875q1.359375 0 2.265625 1.015625q0.90625 1.0 0.90625 2.9375q0 1.28125 -0.375 2.359375q-0.359375 1.0625 -0.890625 1.78125q-0.53125 0.71875 -1.109375 1.15625q-0.578125 0.4375 -1.1875 0.640625q-0.59375 0.21875 -1.140625 0.21875q-0.96875 0 -1.703125 -0.5q-0.71875 -0.515625 -1.125 -1.546875l-0.390625 1.828125l-1.4375 0zm2.4375 -3.96875l-0.015625 0.3125q0 1.234375 0.59375 1.890625q0.59375 0.640625 1.484375 0.640625q0.859375 0 1.578125 -0.609375q0.734375 -0.609375 1.1875 -1.890625q0.46875 -1.28125 0.46875 -2.375q0 -1.21875 -0.59375 -1.890625q-0.578125 -0.671875 -1.4375 -0.671875q-0.90625 0 -1.65625 0.6875q-0.734375 0.6875 -1.234375 2.125q-0.375 1.0625 -0.375 1.78125zm14.531967 2.21875q-1.734375 1.96875 -3.5625 1.96875q-1.109375 0 -1.796875 -0.640625q-0.6875 -0.640625 -0.6875 -1.578125q0 -0.609375 0.296875 -2.09375l1.171875 -5.578125l1.65625 0l-1.296875 6.1875q-0.171875 0.765625 -0.171875 1.203125q0 0.546875 0.328125 0.859375q0.34375 0.296875 0.984375 0.296875q0.703125 0 1.359375 -0.328125q0.65625 -0.34375 1.125 -0.921875q0.484375 -0.578125 0.796875 -1.359375q0.1875 -0.5 0.453125 -1.765625l0.875 -4.171875l1.65625 0l-2.03125 9.671875l-1.515625 0l0.359375 -1.75zm4.000717 1.75l1.765625 -8.40625l-1.484375 0l0.265625 -1.265625l1.484375 0l0.28125 -1.375q0.21875 -1.03125 0.4375 -1.484375q0.234375 -0.453125 0.75 -0.75q0.53125 -0.296875 1.4375 -0.296875q0.625 0 1.828125 0.265625l-0.296875 1.4375q-0.84375 -0.21875 -1.40625 -0.21875q-0.484375 0 -0.734375 0.25q-0.25 0.234375 -0.4375 1.125l-0.21875 1.046875l1.84375 0l-0.265625 1.265625l-1.84375 0l-1.75 8.40625l-1.65625 0zm5.183304 0l1.765625 -8.40625l-1.484375 0l0.265625 -1.265625l1.484375 0l0.28125 -1.375q0.21875 -1.03125 0.4375 -1.484375q0.234375 -0.453125 0.75 -0.75q0.53125 -0.296875 1.4375 -0.296875q0.625 0 1.828125 0.265625l-0.296875 1.4375q-0.84375 -0.21875 -1.40625 -0.21875q-0.484375 0 -0.734375 0.25q-0.25 0.234375 -0.4375 1.125l-0.21875 1.046875l1.84375 0l-0.265625 1.265625l-1.84375 0l-1.75 8.40625l-1.65625 0zm12.058319 -3.28125l1.609375 0.15625q-0.34375 1.1875 -1.59375 2.265625q-1.234375 1.078125 -2.96875 1.078125q-1.0625 0 -1.96875 -0.5q-0.890625 -0.5 -1.359375 -1.4375q-0.46875 -0.953125 -0.46875 -2.15625q0 -1.59375 0.734375 -3.078125q0.734375 -1.484375 1.890625 -2.203125q1.171875 -0.734375 2.53125 -0.734375q1.734375 0 2.765625 1.078125q1.03125 1.0625 1.03125 2.921875q0 0.71875 -0.125 1.46875l-7.125 0q-0.046875 0.28125 -0.046875 0.5q0 1.359375 0.625 2.078125q0.625 0.71875 1.53125 0.71875q0.84375 0 1.65625 -0.546875q0.828125 -0.5625 1.28125 -1.609375zm-4.78125 -2.40625l5.421875 0q0.015625 -0.25 0.015625 -0.359375q0 -1.234375 -0.625 -1.890625q-0.625 -0.671875 -1.59375 -0.671875q-1.0625 0 -1.9375 0.734375q-0.859375 0.71875 -1.28125 2.1875zm8.063202 5.6875l2.015625 -9.671875l1.453125 0l-0.40625 1.96875q0.75 -1.109375 1.453125 -1.640625q0.71875 -0.546875 1.46875 -0.546875q0.5 0 1.21875 0.359375l-0.671875 1.53125q-0.4375 -0.3125 -0.9375 -0.3125q-0.875 0 -1.78125 0.96875q-0.90625 0.953125 -1.4375 3.46875l-0.8125 3.875l-1.5625 0zm6.368927 -3.3125l1.640625 -0.09375q0 0.703125 0.21875 1.21875q0.21875 0.5 0.796875 0.8125q0.59375 0.3125 1.375 0.3125q1.09375 0 1.640625 -0.4375q0.546875 -0.4375 0.546875 -1.015625q0 -0.4375 -0.328125 -0.8125q-0.328125 -0.390625 -1.640625 -0.953125q-1.3125 -0.5625 -1.671875 -0.78125q-0.609375 -0.375 -0.921875 -0.875q-0.3125 -0.515625 -0.3125 -1.171875q0 -1.140625 0.90625 -1.953125q0.921875 -0.828125 2.5625 -0.828125q1.828125 0 2.765625 0.84375q0.953125 0.84375 1.0 2.21875l-1.609375 0.109375q-0.046875 -0.875 -0.625 -1.390625q-0.578125 -0.515625 -1.65625 -0.515625q-0.84375 0 -1.328125 0.40625q-0.46875 0.390625 -0.46875 0.84375q0 0.453125 0.40625 0.796875q0.28125 0.234375 1.421875 0.734375q1.890625 0.8125 2.375 1.28125q0.78125 0.765625 0.78125 1.84375q0 0.71875 -0.4375 1.421875q-0.4375 0.6875 -1.34375 1.109375q-0.90625 0.40625 -2.140625 0.40625q-1.671875 0 -2.84375 -0.828125q-1.1875 -0.828125 -1.109375 -2.703125z" fill-rule="nonzero"/><path fill="#000000" fill-opacity="0.0" d="m198.14961 164.00919l109.44881 0l0 53.70079l-109.44881 0z" fill-rule="evenodd"/><path fill="#000000" d="m211.6696 187.61668l1.640625 -0.09375q0 0.703125 0.21875 1.21875q0.21875 0.5 0.796875 0.8125q0.59375 0.3125 1.375 0.3125q1.09375 0 1.640625 -0.4375q0.546875 -0.4375 0.546875 -1.015625q0 -0.4375 -0.328125 -0.8125q-0.328125 -0.390625 -1.640625 -0.953125q-1.3125 -0.5625 -1.671875 -0.78125q-0.609375 -0.375 -0.921875 -0.875q-0.3125 -0.515625 -0.3125 -1.171875q0 -1.140625 0.90625 -1.953125q0.921875 -0.828125 2.5625 -0.828125q1.828125 0 2.765625 0.84375q0.953125 0.84375 1.0 2.21875l-1.609375 0.109375q-0.046875 -0.875 -0.625 -1.390625q-0.578125 -0.515625 -1.65625 -0.515625q-0.84375 0 -1.328125 0.40625q-0.46875 0.390625 -0.46875 0.84375q0 0.453125 0.40625 0.796875q0.28125 0.234375 1.421875 0.734375q1.890625 0.8125 2.375 1.28125q0.78125 0.765625 0.78125 1.84375q0 0.71875 -0.4375 1.421875q-0.4375 0.6875 -1.34375 1.109375q-0.90625 0.40625 -2.140625 0.40625q-1.671875 0 -2.84375 -0.828125q-1.1875 -0.828125 -1.109375 -2.703125zm15.84375 -0.21875l1.65625 0.171875q-0.625 1.8125 -1.765625 2.703125q-1.140625 0.875 -2.609375 0.875q-1.578125 0 -2.5625 -1.015625q-0.96875 -1.03125 -0.96875 -2.859375q0 -1.578125 0.625 -3.109375q0.640625 -1.53125 1.796875 -2.328125q1.171875 -0.796875 2.6875 -0.796875q1.546875 0 2.453125 0.875q0.921875 0.875 0.921875 2.328125l-1.625 0.109375q-0.015625 -0.921875 -0.546875 -1.4375q-0.515625 -0.515625 -1.359375 -0.515625q-1.0 0 -1.734375 0.625q-0.71875 0.625 -1.140625 1.90625q-0.40625 1.28125 -0.40625 2.46875q0 1.234375 0.546875 1.859375q0.546875 0.609375 1.34375 0.609375q0.796875 0 1.53125 -0.609375q0.734375 -0.609375 1.15625 -1.859375zm9.171875 2.328125q-0.859375 0.734375 -1.65625 1.078125q-0.78125 0.34375 -1.6875 0.34375q-1.34375 0 -2.171875 -0.78125q-0.8125 -0.796875 -0.8125 -2.03125q0 -0.796875 0.375 -1.421875q0.375 -0.625 0.984375 -1.0q0.609375 -0.390625 1.484375 -0.546875q0.5625 -0.109375 2.109375 -0.171875q1.5625 -0.0625 2.234375 -0.328125q0.1875 -0.671875 0.1875 -1.125q0 -0.578125 -0.421875 -0.90625q-0.5625 -0.453125 -1.671875 -0.453125q-1.03125 0 -1.703125 0.46875q-0.65625 0.453125 -0.953125 1.296875l-1.671875 -0.140625q0.515625 -1.4375 1.609375 -2.203125q1.109375 -0.765625 2.796875 -0.765625q1.796875 0 2.84375 0.859375q0.796875 0.625 0.796875 1.65625q0 0.765625 -0.21875 1.796875l-0.53125 2.40625q-0.265625 1.140625 -0.265625 1.859375q0 0.453125 0.203125 1.3125l-1.671875 0q-0.125 -0.46875 -0.1875 -1.203125zm0.609375 -3.703125q-0.34375 0.140625 -0.75 0.21875q-0.390625 0.0625 -1.3125 0.15625q-1.4375 0.125 -2.03125 0.328125q-0.59375 0.1875 -0.90625 0.625q-0.296875 0.421875 -0.296875 0.9375q0 0.6875 0.484375 1.140625q0.484375 0.4375 1.359375 0.4375q0.828125 0 1.578125 -0.421875q0.75 -0.4375 1.1875 -1.203125q0.4375 -0.78125 0.6875 -2.21875zm7.094467 3.5625l-0.265625 1.359375q-0.59375 0.140625 -1.15625 0.140625q-0.984375 0 -1.5625 -0.46875q-0.4375 -0.375 -0.4375 -1.0q0 -0.3125 0.234375 -1.46875l1.171875 -5.625l-1.296875 0l0.265625 -1.265625l1.296875 0l0.5 -2.375l1.890625 -1.140625l-0.734375 3.515625l1.625 0l-0.28125 1.265625l-1.609375 0l-1.125 5.359375q-0.203125 1.015625 -0.203125 1.21875q0 0.28125 0.15625 0.4375q0.171875 0.15625 0.5625 0.15625q0.546875 0 0.96875 -0.109375zm5.183304 0l-0.265625 1.359375q-0.59375 0.140625 -1.15625 0.140625q-0.984375 0 -1.5625 -0.46875q-0.4375 -0.375 -0.4375 -1.0q0 -0.3125 0.234375 -1.46875l1.171875 -5.625l-1.296875 0l0.265625 -1.265625l1.296875 0l0.5 -2.375l1.890625 -1.140625l-0.734375 3.515625l1.625 0l-0.28125 1.265625l-1.609375 0l-1.125 5.359375q-0.203125 1.015625 -0.203125 1.21875q0 0.28125 0.15625 0.4375q0.171875 0.15625 0.5625 0.15625q0.546875 0 0.96875 -0.109375zm8.433304 -1.9375l1.609375 0.15625q-0.34375 1.1875 -1.59375 2.265625q-1.234375 1.078125 -2.96875 1.078125q-1.0625 0 -1.96875 -0.5q-0.890625 -0.5 -1.359375 -1.4375q-0.46875 -0.953125 -0.46875 -2.15625q0 -1.59375 0.734375 -3.078125q0.734375 -1.484375 1.890625 -2.203125q1.171875 -0.734375 2.53125 -0.734375q1.734375 0 2.765625 1.078125q1.03125 1.0625 1.03125 2.921875q0 0.71875 -0.125 1.46875l-7.125 0q-0.046875 0.28125 -0.046875 0.5q0 1.359375 0.625 2.078125q0.625 0.71875 1.53125 0.71875q0.84375 0 1.65625 -0.546875q0.828125 -0.5625 1.28125 -1.609375zm-4.78125 -2.40625l5.421875 0q0.015625 -0.25 0.015625 -0.359375q0 -1.234375 -0.625 -1.890625q-0.625 -0.671875 -1.59375 -0.671875q-1.0625 0 -1.9375 0.734375q-0.859375 0.71875 -1.28125 2.1875zm8.063202 5.6875l2.015625 -9.671875l1.453125 0l-0.40625 1.96875q0.75 -1.109375 1.453125 -1.640625q0.71875 -0.546875 1.46875 -0.546875q0.5 0 1.21875 0.359375l-0.671875 1.53125q-0.4375 -0.3125 -0.9375 -0.3125q-0.875 0 -1.78125 0.96875q-0.90625 0.953125 -1.4375 3.46875l-0.8125 3.875l-1.5625 0zm11.255371 0l2.796875 -13.359375l1.640625 0l-2.78125 13.359375l-1.65625 0zm6.613556 -11.484375l0.40625 -1.875l1.625 0l-0.390625 1.875l-1.640625 0zm-2.390625 11.484375l2.015625 -9.671875l1.65625 0l-2.03125 9.671875l-1.640625 0zm4.3635864 -3.3125l1.640625 -0.09375q0 0.703125 0.21875 1.21875q0.21875 0.5 0.796875 0.8125q0.59375 0.3125 1.375 0.3125q1.09375 0 1.640625 -0.4375q0.546875 -0.4375 0.546875 -1.015625q0 -0.4375 -0.328125 -0.8125q-0.328125 -0.390625 -1.640625 -0.953125q-1.3125 -0.5625 -1.671875 -0.78125q-0.609375 -0.375 -0.921875 -0.875q-0.3125 -0.515625 -0.3125 -1.171875q0 -1.140625 0.90625 -1.953125q0.921875 -0.828125 2.5625 -0.828125q1.828125 0 2.765625 0.84375q0.953125 0.84375 1.0 2.21875l-1.609375 0.109375q-0.046875 -0.875 -0.625 -1.390625q-0.578125 -0.515625 -1.65625 -0.515625q-0.84375 0 -1.328125 0.40625q-0.46875 0.390625 -0.46875 0.84375q0 0.453125 0.40625 0.796875q0.28125 0.234375 1.421875 0.734375q1.890625 0.8125 2.375 1.28125q0.78125 0.765625 0.78125 1.84375q0 0.71875 -0.4375 1.421875q-0.4375 0.6875 -1.34375 1.109375q-0.90625 0.40625 -2.140625 0.40625q-1.671875 0 -2.84375 -0.828125q-1.1875 -0.828125 -1.109375 -2.703125zm13.015625 1.96875l-0.265625 1.359375q-0.59375 0.140625 -1.15625 0.140625q-0.984375 0 -1.5625 -0.46875q-0.4375 -0.375 -0.4375 -1.0q0 -0.3125 0.234375 -1.46875l1.171875 -5.625l-1.296875 0l0.265625 -1.265625l1.296875 0l0.5 -2.375l1.890625 -1.140625l-0.734375 3.515625l1.625 0l-0.28125 1.265625l-1.609375 0l-1.125 5.359375q-0.203125 1.015625 -0.203125 1.21875q0 0.28125 0.15625 0.4375q0.171875 0.15625 0.5625 0.15625q0.546875 0 0.96875 -0.109375z" fill-rule="nonzero"/><path fill="#000000" fill-opacity="0.0" d="m198.14961 280.1392l109.44881 0l0 53.700806l-109.44881 0z" fill-rule="evenodd"/><path fill="#000000" d="m223.58026 303.7467l1.640625 -0.09375q0 0.703125 0.21875 1.21875q0.21875 0.5 0.796875 0.8125q0.59375 0.3125 1.375 0.3125q1.09375 0 1.640625 -0.4375q0.546875 -0.4375 0.546875 -1.015625q0 -0.4375 -0.328125 -0.8125q-0.328125 -0.390625 -1.640625 -0.953125q-1.3125 -0.5625 -1.671875 -0.78125q-0.609375 -0.375 -0.921875 -0.875q-0.3125 -0.515625 -0.3125 -1.171875q0 -1.140625 0.90625 -1.953125q0.921875 -0.828125 2.5625 -0.828125q1.828125 0 2.765625 0.84375q0.953125 0.84375 1.0 2.21875l-1.609375 0.109375q-0.046875 -0.875 -0.625 -1.390625q-0.578125 -0.515625 -1.65625 -0.515625q-0.84375 0 -1.328125 0.40625q-0.46875 0.390625 -0.46875 0.84375q0 0.453125 0.40625 0.796875q0.28125 0.234375 1.421875 0.734375q1.890625 0.8125 2.375 1.28125q0.78125 0.765625 0.78125 1.84375q0 0.71875 -0.4375 1.421875q-0.4375 0.6875 -1.34375 1.109375q-0.90625 0.40625 -2.140625 0.40625q-1.671875 0 -2.84375 -0.828125q-1.1875 -0.828125 -1.109375 -2.703125zm9.1875 3.3125l2.796875 -13.359375l1.640625 0l-1.734375 8.28125l4.8125 -4.59375l2.171875 0l-4.125 3.609375l2.5 6.0625l-1.8125 0l-1.921875 -4.96875l-2.015625 1.734375l-0.671875 3.234375l-1.640625 0zm7.5 3.703125l0 -1.1875l10.875 0l0 1.1875l-10.875 0zm12.188217 -3.703125l2.78125 -13.359375l1.6562653 0l-1.0000153 4.78125q0.78126526 -0.71875 1.4218903 -1.015625q0.640625 -0.296875 1.328125 -0.296875q1.359375 0 2.265625 1.015625q0.90625 1.0 0.90625 2.9375q0 1.28125 -0.375 2.359375q-0.359375 1.0625 -0.890625 1.78125q-0.53125 0.71875 -1.109375 1.15625q-0.578125 0.4375 -1.1875 0.640625q-0.59375 0.21875 -1.140625 0.21875q-0.96875 0 -1.7031403 -0.5q-0.71875 -0.515625 -1.125 -1.546875l-0.390625 1.828125l-1.4375 0zm2.4375 -3.96875l-0.015625 0.3125q0 1.234375 0.59375 1.890625q0.59376526 0.640625 1.4843903 0.640625q0.859375 0 1.578125 -0.609375q0.734375 -0.609375 1.1875 -1.890625q0.46875 -1.28125 0.46875 -2.375q0 -1.21875 -0.59375 -1.890625q-0.578125 -0.671875 -1.4375 -0.671875q-0.90625 0 -1.65625 0.6875q-0.73439026 0.6875 -1.2343903 2.125q-0.375 1.0625 -0.375 1.78125zm14.531967 2.21875q-1.734375 1.96875 -3.5625 1.96875q-1.109375 0 -1.796875 -0.640625q-0.6875 -0.640625 -0.6875 -1.578125q0 -0.609375 0.296875 -2.09375l1.171875 -5.578125l1.65625 0l-1.296875 6.1875q-0.171875 0.765625 -0.171875 1.203125q0 0.546875 0.328125 0.859375q0.34375 0.296875 0.984375 0.296875q0.703125 0 1.359375 -0.328125q0.65625 -0.34375 1.125 -0.921875q0.484375 -0.578125 0.796875 -1.359375q0.1875 -0.5 0.453125 -1.765625l0.875 -4.171875l1.65625 0l-2.03125 9.671875l-1.515625 0l0.359375 -1.75zm4.0007324 1.75l1.765625 -8.40625l-1.484375 0l0.265625 -1.265625l1.484375 0l0.28125 -1.375q0.21875 -1.03125 0.4375 -1.484375q0.234375 -0.453125 0.75 -0.75q0.53125 -0.296875 1.4375 -0.296875q0.625 0 1.828125 0.265625l-0.296875 1.4375q-0.84375 -0.21875 -1.40625 -0.21875q-0.484375 0 -0.734375 0.25q-0.25 0.234375 -0.4375 1.125l-0.21875 1.046875l1.84375 0l-0.265625 1.265625l-1.84375 0l-1.75 8.40625l-1.65625 0zm5.1832886 0l1.765625 -8.40625l-1.484375 0l0.265625 -1.265625l1.484375 0l0.28125 -1.375q0.21875 -1.03125 0.4375 -1.484375q0.234375 -0.453125 0.75 -0.75q0.53125 -0.296875 1.4375 -0.296875q0.625 0 1.828125 0.265625l-0.296875 1.4375q-0.84375 -0.21875 -1.40625 -0.21875q-0.484375 0 -0.734375 0.25q-0.25 0.234375 -0.4375 1.125l-0.21875 1.046875l1.84375 0l-0.265625 1.265625l-1.84375 0l-1.75 8.40625l-1.65625 0z" fill-rule="nonzero"/><path fill="#000000" fill-opacity="0.0" d="m163.81627 395.2362l178.11024 0l0 53.700806l-178.11024 0z" fill-rule="evenodd"/><path fill="#000000" d="m195.60925 418.62497l1.65625 0.171875q-0.625 1.8125 -1.765625 2.703125q-1.140625 0.875 -2.609375 0.875q-1.578125 0 -2.5625 -1.015625q-0.96875 -1.03125 -0.96875 -2.859375q0 -1.578125 0.625 -3.109375q0.640625 -1.53125 1.796875 -2.328125q1.171875 -0.796875 2.6875 -0.796875q1.546875 0 2.453125 0.875q0.921875 0.875 0.921875 2.328125l-1.625 0.109375q-0.015625 -0.921875 -0.546875 -1.4375q-0.515625 -0.515625 -1.359375 -0.515625q-1.0 0 -1.734375 0.625q-0.71875 0.625 -1.140625 1.90625q-0.40625 1.28125 -0.40625 2.46875q0 1.234375 0.546875 1.859375q0.546875 0.609375 1.34375 0.609375q0.796875 0 1.53125 -0.609375q0.734375 -0.609375 1.15625 -1.859375zm2.9375 -0.140625q0 -2.828125 1.671875 -4.6875q1.375 -1.53125 3.609375 -1.53125q1.75 0 2.8125 1.09375q1.078125 1.09375 1.078125 2.953125q0 1.65625 -0.671875 3.09375q-0.671875 1.4375 -1.921875 2.203125q-1.25 0.765625 -2.625 0.765625q-1.125 0 -2.046875 -0.484375q-0.921875 -0.484375 -1.421875 -1.359375q-0.484375 -0.890625 -0.484375 -2.046875zm1.65625 -0.15625q0 1.359375 0.65625 2.0625q0.65625 0.703125 1.65625 0.703125q0.53125 0 1.046875 -0.203125q0.53125 -0.21875 0.96875 -0.65625q0.453125 -0.4375 0.765625 -1.0q0.3125 -0.5625 0.5 -1.203125q0.28125 -0.90625 0.28125 -1.734375q0 -1.3125 -0.65625 -2.03125q-0.65625 -0.734375 -1.65625 -0.734375q-0.78125 0 -1.421875 0.375q-0.625 0.375 -1.140625 1.09375q-0.515625 0.703125 -0.765625 1.640625q-0.234375 0.9375 -0.234375 1.6875zm8.438217 3.828125l2.015625 -9.671875l1.5 0l-0.359375 1.6875q0.96875 -1.0 1.8125 -1.453125q0.859375 -0.453125 1.734375 -0.453125q1.1875 0 1.84375 0.640625q0.671875 0.625 0.671875 1.703125q0 0.53125 -0.234375 1.6875l-1.234375 5.859375l-1.640625 0l1.28125 -6.125q0.1875 -0.890625 0.1875 -1.328125q0 -0.484375 -0.328125 -0.78125q-0.328125 -0.296875 -0.953125 -0.296875q-1.28125 0 -2.265625 0.90625q-0.984375 0.90625 -1.453125 3.125l-0.9375 4.5l-1.640625 0zm14.219467 -1.34375l-0.265625 1.359375q-0.59375 0.140625 -1.15625 0.140625q-0.984375 0 -1.5625 -0.46875q-0.4375 -0.375 -0.4375 -1.0q0 -0.3125 0.234375 -1.46875l1.171875 -5.625l-1.296875 0l0.265625 -1.265625l1.296875 0l0.5 -2.375l1.890625 -1.140625l-0.734375 3.515625l1.625 0l-0.28125 1.265625l-1.609375 0l-1.125 5.359375q-0.203125 1.015625 -0.203125 1.21875q0 0.28125 0.15625 0.4375q0.171875 0.15625 0.5625 0.15625q0.546875 0 0.96875 -0.109375zm8.433304 -1.9375l1.609375 0.15625q-0.34375 1.1875 -1.59375 2.265625q-1.234375 1.078125 -2.96875 1.078125q-1.0625 0 -1.96875 -0.5q-0.890625 -0.5 -1.359375 -1.4375q-0.46875 -0.953125 -0.46875 -2.15625q0 -1.59375 0.734375 -3.078125q0.734375 -1.484375 1.890625 -2.203125q1.171875 -0.734375 2.53125 -0.734375q1.734375 0 2.765625 1.078125q1.03125 1.0625 1.03125 2.921875q0 0.71875 -0.125 1.46875l-7.125 0q-0.046875 0.28125 -0.046875 0.5q0 1.359375 0.625 2.078125q0.625 0.71875 1.53125 0.71875q0.84375 0 1.65625 -0.546875q0.828125 -0.5625 1.28125 -1.609375zm-4.78125 -2.40625l5.421875 0q0.015625 -0.25 0.015625 -0.359375q0 -1.234375 -0.625 -1.890625q-0.625 -0.671875 -1.59375 -0.671875q-1.0625 0 -1.9375 0.734375q-0.859375 0.71875 -1.28125 2.1875zm7.406967 5.6875l4.21875 -4.90625l-2.421875 -4.765625l1.828125 0l0.8125 1.71875q0.453125 0.96875 0.828125 1.84375l2.78125 -3.5625l2.015625 0l-4.0625 4.890625l2.4375 4.78125l-1.8125 0l-0.96875 -1.96875q-0.3125 -0.625 -0.703125 -1.5625l-2.875 3.53125l-2.078125 0zm13.828125 -1.34375l-0.265625 1.359375q-0.59375 0.140625 -1.15625 0.140625q-0.984375 0 -1.5625 -0.46875q-0.4375 -0.375 -0.4375 -1.0q0 -0.3125 0.234375 -1.46875l1.171875 -5.625l-1.296875 0l0.265625 -1.265625l1.296875 0l0.5 -2.375l1.890625 -1.140625l-0.734375 3.515625l1.625 0l-0.28125 1.265625l-1.609375 0l-1.125 5.359375q-0.203125 1.015625 -0.203125 1.21875q0 0.28125 0.15625 0.4375q0.171875 0.15625 0.5625 0.15625q0.546875 0 0.96875 -0.109375zm11.210358 -0.8125l0 -3.671875l-3.640625 0l0 -1.515625l3.640625 0l0 -3.640625l1.546875 0l0 3.640625l3.640625 0l0 1.515625l-3.640625 0l0 3.671875l-1.546875 0zm11.390778 2.15625l2.78125 -13.359375l1.65625 0l-1.0 4.78125q0.78125 -0.71875 1.421875 -1.015625q0.640625 -0.296875 1.328125 -0.296875q1.359375 0 2.265625 1.015625q0.90625 1.0 0.90625 2.9375q0 1.28125 -0.375 2.359375q-0.359375 1.0625 -0.890625 1.78125q-0.53125 0.71875 -1.109375 1.15625q-0.578125 0.4375 -1.1875 0.640625q-0.59375 0.21875 -1.140625 0.21875q-0.96875 0 -1.703125 -0.5q-0.71875 -0.515625 -1.125 -1.546875l-0.390625 1.828125l-1.4375 0zm2.4375 -3.96875l-0.015625 0.3125q0 1.234375 0.59375 1.890625q0.59375 0.640625 1.484375 0.640625q0.859375 0 1.578125 -0.609375q0.734375 -0.609375 1.1875 -1.890625q0.46875 -1.28125 0.46875 -2.375q0 -1.21875 -0.59375 -1.890625q-0.578125 -0.671875 -1.4375 -0.671875q-0.90625 0 -1.65625 0.6875q-0.734375 0.6875 -1.234375 2.125q-0.375 1.0625 -0.375 1.78125zm14.531952 2.21875q-1.734375 1.96875 -3.5625 1.96875q-1.109375 0 -1.796875 -0.640625q-0.6875 -0.640625 -0.6875 -1.578125q0 -0.609375 0.296875 -2.09375l1.171875 -5.578125l1.65625 0l-1.296875 6.1875q-0.171875 0.765625 -0.171875 1.203125q0 0.546875 0.328125 0.859375q0.34375 0.296875 0.984375 0.296875q0.703125 0 1.359375 -0.328125q0.65625 -0.34375 1.125 -0.921875q0.484375 -0.578125 0.796875 -1.359375q0.1875 -0.5 0.453125 -1.765625l0.875 -4.171875l1.65625 0l-2.03125 9.671875l-1.515625 0l0.359375 -1.75zm4.0007324 1.75l1.765625 -8.40625l-1.484375 0l0.265625 -1.265625l1.484375 0l0.28125 -1.375q0.21875 -1.03125 0.4375 -1.484375q0.234375 -0.453125 0.75 -0.75q0.53125 -0.296875 1.4375 -0.296875q0.625 0 1.828125 0.265625l-0.296875 1.4375q-0.84375 -0.21875 -1.40625 -0.21875q-0.484375 0 -0.734375 0.25q-0.25 0.234375 -0.4375 1.125l-0.21875 1.046875l1.84375 0l-0.265625 1.265625l-1.84375 0l-1.75 8.40625l-1.65625 0zm5.1832886 0l1.765625 -8.40625l-1.484375 0l0.265625 -1.265625l1.484375 0l0.28125 -1.375q0.21875 -1.03125 0.4375 -1.484375q0.234375 -0.453125 0.75 -0.75q0.53125 -0.296875 1.4375 -0.296875q0.625 0 1.828125 0.265625l-0.296875 1.4375q-0.84375 -0.21875 -1.40625 -0.21875q-0.484375 0 -0.734375 0.25q-0.25 0.234375 -0.4375 1.125l-0.21875 1.046875l1.84375 0l-0.265625 1.265625l-1.84375 0l-1.75 8.40625l-1.65625 0zm12.058319 -3.28125l1.609375 0.15625q-0.34375 1.1875 -1.59375 2.265625q-1.234375 1.078125 -2.96875 1.078125q-1.0625 0 -1.96875 -0.5q-0.890625 -0.5 -1.359375 -1.4375q-0.46875 -0.953125 -0.46875 -2.15625q0 -1.59375 0.734375 -3.078125q0.734375 -1.484375 1.890625 -2.203125q1.171875 -0.734375 2.53125 -0.734375q1.734375 0 2.765625 1.078125q1.03125 1.0625 1.03125 2.921875q0 0.71875 -0.125 1.46875l-7.125 0q-0.046875 0.28125 -0.046875 0.5q0 1.359375 0.625 2.078125q0.625 0.71875 1.53125 0.71875q0.84375 0 1.65625 -0.546875q0.828125 -0.5625 1.28125 -1.609375zm-4.78125 -2.40625l5.421875 0q0.015625 -0.25 0.015625 -0.359375q0 -1.234375 -0.625 -1.890625q-0.625 -0.671875 -1.59375 -0.671875q-1.0625 0 -1.9375 0.734375q-0.859375 0.71875 -1.28125 2.1875zm8.063202 5.6875l2.015625 -9.671875l1.453125 0l-0.40625 1.96875q0.75 -1.109375 1.453125 -1.640625q0.71875 -0.546875 1.46875 -0.546875q0.5 0 1.21875 0.359375l-0.671875 1.53125q-0.4375 -0.3125 -0.9375 -0.3125q-0.875 0 -1.78125 0.96875q-0.90625 0.953125 -1.4375 3.46875l-0.8125 3.875l-1.5625 0z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m0 165.96588l118.74016 0l0 40.25197l-118.74016 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m0 165.96588l118.74016 0l0 40.25197l-118.74016 0z" fill-rule="evenodd"/><path fill="#000000" d="m23.145836 190.12123l1.625 -0.25q0.125 0.96875 0.75 1.5q0.625 0.515625 1.75 0.515625q1.125 0 1.671875 -0.453125q0.546875 -0.46875 0.546875 -1.09375q0 -0.546875 -0.484375 -0.875q-0.328125 -0.21875 -1.671875 -0.546875q-1.8125 -0.46875 -2.515625 -0.796875q-0.6875 -0.328125 -1.046875 -0.90625q-0.359375 -0.59375 -0.359375 -1.3125q0 -0.640625 0.296875 -1.1875q0.296875 -0.5625 0.8125 -0.921875q0.375 -0.28125 1.03125 -0.46875q0.671875 -0.203125 1.421875 -0.203125q1.140625 0 2.0 0.328125q0.859375 0.328125 1.265625 0.890625q0.421875 0.5625 0.578125 1.5l-1.609375 0.21875q-0.109375 -0.75 -0.640625 -1.171875q-0.515625 -0.421875 -1.46875 -0.421875q-1.140625 0 -1.625 0.375q-0.46875 0.375 -0.46875 0.875q0 0.3125 0.1875 0.578125q0.203125 0.265625 0.640625 0.4375q0.234375 0.09375 1.4375 0.421875q1.75 0.453125 2.4375 0.75q0.6875 0.296875 1.078125 0.859375q0.390625 0.5625 0.390625 1.40625q0 0.828125 -0.484375 1.546875q-0.46875 0.71875 -1.375 1.125q-0.90625 0.390625 -2.046875 0.390625q-1.875 0 -2.875 -0.78125q-0.984375 -0.78125 -1.25 -2.328125zm13.5625 1.421875l0.234375 1.453125q-0.6875 0.140625 -1.234375 0.140625q-0.890625 0 -1.390625 -0.28125q-0.484375 -0.28125 -0.6875 -0.734375q-0.203125 -0.46875 -0.203125 -1.9375l0 -5.578125l-1.203125 0l0 -1.265625l1.203125 0l0 -2.390625l1.625 -0.984375l0 3.375l1.65625 0l0 1.265625l-1.65625 0l0 5.671875q0 0.6875 0.078125 0.890625q0.09375 0.203125 0.28125 0.328125q0.203125 0.109375 0.578125 0.109375q0.265625 0 0.71875 -0.0625zm1.5895538 1.46875l0 -9.671875l1.46875 0l0 1.46875q0.5625 -1.03125 1.03125 -1.359375q0.484375 -0.328125 1.0625 -0.328125q0.828125 0 1.6875 0.53125l-0.5625 1.515625q-0.609375 -0.359375 -1.203125 -0.359375q-0.546875 0 -0.96875 0.328125q-0.421875 0.328125 -0.609375 0.890625q-0.28125 0.875 -0.28125 1.921875l0 5.0625l-1.625 0zm6.228302 3.703125l0 -13.375l1.484375 0l0 1.25q0.53125 -0.734375 1.1875 -1.09375q0.671875 -0.375 1.625 -0.375q1.234375 0 2.171875 0.640625q0.953125 0.625 1.4375 1.796875q0.484375 1.15625 0.484375 2.546875q0 1.484375 -0.53125 2.671875q-0.53125 1.1875 -1.546875 1.828125q-1.015625 0.625 -2.140625 0.625q-0.8125 0 -1.46875 -0.34375q-0.65625 -0.34375 -1.0625 -0.875l0 4.703125l-1.640625 0zm1.484375 -8.484375q0 1.859375 0.75 2.765625q0.765625 0.890625 1.828125 0.890625q1.09375 0 1.875 -0.921875q0.78125 -0.9375 0.78125 -2.875q0 -1.84375 -0.765625 -2.765625q-0.75 -0.921875 -1.8125 -0.921875q-1.046875 0 -1.859375 0.984375q-0.796875 0.96875 -0.796875 2.84375zm15.203842 3.59375q-0.921875 0.765625 -1.765625 1.09375q-0.828125 0.3125 -1.796875 0.3125q-1.59375 0 -2.453125 -0.78125q-0.859375 -0.78125 -0.859375 -1.984375q0 -0.71875 0.328125 -1.296875q0.328125 -0.59375 0.84375 -0.9375q0.53125 -0.359375 1.1875 -0.546875q0.46875 -0.125 1.453125 -0.25q1.984375 -0.234375 2.921875 -0.5625q0.015625 -0.34375 0.015625 -0.421875q0 -1.0 -0.46875 -1.421875q-0.625 -0.546875 -1.875 -0.546875q-1.15625 0 -1.703125 0.40625q-0.546875 0.40625 -0.8125 1.421875l-1.609375 -0.21875q0.21875 -1.015625 0.71875 -1.640625q0.5 -0.640625 1.453125 -0.984375q0.953125 -0.34375 2.1875 -0.34375q1.25 0 2.015625 0.296875q0.78125 0.28125 1.140625 0.734375q0.375 0.4375 0.515625 1.109375q0.078125 0.421875 0.078125 1.515625l0 2.1875q0 2.28125 0.109375 2.890625q0.109375 0.59375 0.40625 1.15625l-1.703125 0q-0.265625 -0.515625 -0.328125 -1.1875zm-0.140625 -3.671875q-0.890625 0.375 -2.671875 0.625q-1.015625 0.140625 -1.4375 0.328125q-0.421875 0.1875 -0.65625 0.53125q-0.21875 0.34375 -0.21875 0.78125q0 0.65625 0.5 1.09375q0.5 0.4375 1.453125 0.4375q0.9375 0 1.671875 -0.40625q0.75 -0.421875 1.09375 -1.140625q0.265625 -0.5625 0.265625 -1.640625l0 -0.609375zm4.188217 4.859375l0 -9.671875l1.46875 0l0 1.46875q0.5625 -1.03125 1.03125 -1.359375q0.484375 -0.328125 1.0625 -0.328125q0.828125 0 1.6875 0.53125l-0.5625 1.515625q-0.609375 -0.359375 -1.203125 -0.359375q-0.546875 0 -0.96875 0.328125q-0.421875 0.328125 -0.609375 0.890625q-0.28125 0.875 -0.28125 1.921875l0 5.0625l-1.625 0zm5.572052 -2.890625l1.625 -0.25q0.125 0.96875 0.75 1.5q0.625 0.515625 1.75 0.515625q1.125 0 1.671875 -0.453125q0.546875 -0.46875 0.546875 -1.09375q0 -0.546875 -0.484375 -0.875q-0.328125 -0.21875 -1.671875 -0.546875q-1.8125 -0.46875 -2.515625 -0.796875q-0.6875 -0.328125 -1.046875 -0.90625q-0.359375 -0.59375 -0.359375 -1.3125q0 -0.640625 0.296875 -1.1875q0.296875 -0.5625 0.8125 -0.921875q0.375 -0.28125 1.03125 -0.46875q0.671875 -0.203125 1.421875 -0.203125q1.140625 0 2.0 0.328125q0.859375 0.328125 1.265625 0.890625q0.421875 0.5625 0.578125 1.5l-1.609375 0.21875q-0.109375 -0.75 -0.640625 -1.171875q-0.515625 -0.421875 -1.46875 -0.421875q-1.140625 0 -1.625 0.375q-0.46875 0.375 -0.46875 0.875q0 0.3125 0.1875 0.578125q0.203125 0.265625 0.640625 0.4375q0.234375 0.09375 1.4375 0.421875q1.75 0.453125 2.4375 0.75q0.6875 0.296875 1.078125 0.859375q0.390625 0.5625 0.390625 1.40625q0 0.828125 -0.484375 1.546875q-0.46875 0.71875 -1.375 1.125q-0.90625 0.390625 -2.046875 0.390625q-1.875 0 -2.875 -0.78125q-0.984375 -0.78125 -1.25 -2.328125zm16.609375 -0.21875l1.6875 0.203125q-0.40625 1.484375 -1.484375 2.3125q-1.078125 0.8125 -2.765625 0.8125q-2.125 0 -3.375 -1.296875q-1.234375 -1.3125 -1.234375 -3.671875q0 -2.453125 1.25 -3.796875q1.265625 -1.34375 3.265625 -1.34375q1.9375 0 3.15625 1.328125q1.234375 1.3125 1.234375 3.703125q0 0.15625 0 0.4375l-7.21875 0q0.09375 1.59375 0.90625 2.453125q0.8125 0.84375 2.015625 0.84375q0.90625 0 1.546875 -0.46875q0.640625 -0.484375 1.015625 -1.515625zm-5.390625 -2.65625l5.40625 0q-0.109375 -1.21875 -0.625 -1.828125q-0.78125 -0.953125 -2.03125 -0.953125q-1.125 0 -1.90625 0.765625q-0.765625 0.75 -0.84375 2.015625zm9.125717 5.765625l0 -9.671875l1.46875 0l0 1.46875q0.5625 -1.03125 1.03125 -1.359375q0.484375 -0.328125 1.0625 -0.328125q0.828125 0 1.6875 0.53125l-0.5625 1.515625q-0.609375 -0.359375 -1.203125 -0.359375q-0.546875 0 -0.96875 0.328125q-0.421875 0.328125 -0.609375 0.890625q-0.28125 0.875 -0.28125 1.921875l0 5.0625l-1.625 0z" fill-rule="nonzero"/></g></svg>
diff --git a/Documentation/networking/tls-offload-reorder-bad.svg b/Documentation/networking/tls-offload-reorder-bad.svg
new file mode 100644
index 0000000..d107aaf
--- /dev/null
+++ b/Documentation/networking/tls-offload-reorder-bad.svg
@@ -0,0 +1 @@
+<svg version="1.1" viewBox="0.0 0.0 672.0 68.0" fill="none" stroke="none" stroke-linecap="square" stroke-miterlimit="10" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg"><clipPath id="p.0"><path d="m0 0l960.0 0l0 720.0l-960.0 0l0 -720.0z" clip-rule="nonzero"/></clipPath><g clip-path="url(#p.0)"><path fill="#000000" fill-opacity="0.0" d="m0 0l960.0 0l0 720.0l-960.0 0z" fill-rule="evenodd"/><path fill="#b6d7a8" d="m0 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m0 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m15.953125 52.942722l-1.640625 0l0 -10.453125q-0.59375 0.5625 -1.5625 1.140625q-0.953125 0.5625 -1.71875 0.84375l0 -1.59375q1.375 -0.640625 2.40625 -1.5625q1.03125 -0.921875 1.453125 -1.78125l1.0625 0l0 13.40625z" fill-rule="nonzero"/><path fill="#c9daf8" d="m340.69897 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m340.69897 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m355.73022 52.942722l0 -3.203125l-5.796875 0l0 -1.5l6.09375 -8.65625l1.34375 0l0 8.65625l1.796875 0l0 1.5l-1.796875 0l0 3.203125l-1.640625 0zm0 -4.703125l0 -6.015625l-4.1875 6.015625l4.1875 0z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m225.37527 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m225.37527 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m235.15652 49.411472l1.640625 -0.21875q0.28125 1.40625 0.953125 2.015625q0.6875 0.609375 1.65625 0.609375q1.15625 0 1.953125 -0.796875q0.796875 -0.796875 0.796875 -1.984375q0 -1.125 -0.734375 -1.859375q-0.734375 -0.734375 -1.875 -0.734375q-0.46875 0 -1.15625 0.171875l0.1875 -1.4375q0.15625 0.015625 0.265625 0.015625q1.046875 0 1.875 -0.546875q0.84375 -0.546875 0.84375 -1.671875q0 -0.90625 -0.609375 -1.5q-0.609375 -0.59375 -1.578125 -0.59375q-0.953125 0 -1.59375 0.609375q-0.640625 0.59375 -0.8125 1.796875l-1.640625 -0.296875q0.296875 -1.640625 1.359375 -2.546875q1.0625 -0.90625 2.65625 -0.90625q1.09375 0 2.0 0.46875q0.921875 0.46875 1.40625 1.28125q0.5 0.8125 0.5 1.71875q0 0.859375 -0.46875 1.578125q-0.46875 0.703125 -1.375 1.125q1.1875 0.28125 1.84375 1.140625q0.65625 0.859375 0.65625 2.15625q0 1.734375 -1.28125 2.953125q-1.265625 1.21875 -3.21875 1.21875q-1.765625 0 -2.921875 -1.046875q-1.15625 -1.046875 -1.328125 -2.71875z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m572.3295 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m572.3295 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m590.72015 51.364597l0 1.578125l-8.828125 0q-0.015625 -0.59375 0.1875 -1.140625q0.34375 -0.90625 1.078125 -1.78125q0.75 -0.875 2.15625 -2.015625q2.171875 -1.78125 2.9375 -2.828125q0.765625 -1.046875 0.765625 -1.96875q0 -0.984375 -0.703125 -1.640625q-0.6875 -0.671875 -1.8125 -0.671875q-1.1875 0 -1.90625 0.71875q-0.703125 0.703125 -0.703125 1.953125l-1.6875 -0.171875q0.171875 -1.890625 1.296875 -2.875q1.140625 -0.984375 3.03125 -0.984375q1.921875 0 3.046875 1.0625q1.125 1.0625 1.125 2.640625q0 0.796875 -0.328125 1.578125q-0.328125 0.78125 -1.09375 1.640625q-0.75 0.84375 -2.53125 2.34375q-1.46875 1.234375 -1.890625 1.6875q-0.421875 0.4375 -0.6875 0.875l6.546875 0z" fill-rule="nonzero"/><path fill="#e06666" d="m615.56793 24.999102l6.5512085 0l0 42.04725l-6.5512085 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m615.56793 24.999102l6.5512085 0l0 42.04725l-6.5512085 0z" fill-rule="evenodd"/><path fill="#c9daf8" d="m456.51425 24.999102l99.02365 0l0 42.04725l-99.02365 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m456.51425 24.999102l99.02365 0l0 42.04725l-99.02365 0z" fill-rule="evenodd"/><path fill="#000000" d="m466.2955 49.442722l1.71875 -0.140625q0.1875 1.25 0.875 1.890625q0.703125 0.625 1.6875 0.625q1.1875 0 2.0 -0.890625q0.828125 -0.890625 0.828125 -2.359375q0 -1.40625 -0.796875 -2.21875q-0.78125 -0.8125 -2.0625 -0.8125q-0.78125 0 -1.421875 0.359375q-0.640625 0.359375 -1.0 0.9375l-1.546875 -0.203125l1.296875 -6.859375l6.640625 0l0 1.5625l-5.328125 0l-0.71875 3.59375q1.203125 -0.84375 2.515625 -0.84375q1.75 0 2.953125 1.21875q1.203125 1.203125 1.203125 3.109375q0 1.8125 -1.046875 3.140625q-1.296875 1.625 -3.515625 1.625q-1.8125 0 -2.96875 -1.015625q-1.15625 -1.03125 -1.3125 -2.71875z" fill-rule="nonzero"/><path fill="#e06666" d="m391.1985 24.999102l6.551178 0l0 42.04725l-6.551178 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m391.1985 24.999102l6.551178 0l0 42.04725l-6.551178 0z" fill-rule="evenodd"/><path fill="#000000" fill-opacity="0.0" d="m114.43843 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" stroke-dasharray="4.0,3.0" d="m114.43843 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" fill-opacity="0.0" d="m163.95024 24.999102c0 -12.5 114.47246 -25.007874 228.9449 -25.0c114.47241 0.007874016 228.94489 12.531496 228.94489 25.062992" fill-rule="evenodd"/><path stroke="#000000" stroke-width="2.0" stroke-linejoin="round" stroke-linecap="butt" d="m163.95024 24.9991c0 -12.499998 114.47246 -25.007872 228.9449 -24.999998c57.236206 0.003937008 114.47244 3.136811 157.3996 7.835138c21.463562 2.3491635 39.349915 5.0896897 51.8703 8.026144c3.130127 0.7341137 5.9248657 1.4804726 8.356262 2.236023c0.30395508 0.09444237 0.6022339 0.1890316 0.89471436 0.28375435l0.37457275 0.12388611" fill-rule="evenodd"/><path fill="#000000" stroke="#000000" stroke-width="2.0" stroke-linecap="butt" d="m609.98517 21.270555l9.406311 2.1936665l-5.7955933 -7.7266836z" fill-rule="evenodd"/><path fill="#e06666" d="m47.56793 24.999102l6.551182 0l0 42.04725l-6.551182 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m47.56793 24.999102l6.551182 0l0 42.04725l-6.551182 0z" fill-rule="evenodd"/></g></svg>
diff --git a/Documentation/networking/tls-offload-reorder-good.svg b/Documentation/networking/tls-offload-reorder-good.svg
new file mode 100644
index 0000000..10e17d9
--- /dev/null
+++ b/Documentation/networking/tls-offload-reorder-good.svg
@@ -0,0 +1 @@
+<svg version="1.1" viewBox="0.0 0.0 672.0 68.0" fill="none" stroke="none" stroke-linecap="square" stroke-miterlimit="10" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg"><clipPath id="p.0"><path d="m0 0l960.0 0l0 720.0l-960.0 0l0 -720.0z" clip-rule="nonzero"/></clipPath><g clip-path="url(#p.0)"><path fill="#000000" fill-opacity="0.0" d="m0 0l960.0 0l0 720.0l-960.0 0z" fill-rule="evenodd"/><path fill="#b6d7a8" d="m0 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m0 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m15.953125 52.942722l-1.640625 0l0 -10.453125q-0.59375 0.5625 -1.5625 1.140625q-0.953125 0.5625 -1.71875 0.84375l0 -1.59375q1.375 -0.640625 2.40625 -1.5625q1.03125 -0.921875 1.453125 -1.78125l1.0625 0l0 13.40625z" fill-rule="nonzero"/><path fill="#b6d7a8" d="m340.69897 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m340.69897 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m355.73022 52.942722l0 -3.203125l-5.796875 0l0 -1.5l6.09375 -8.65625l1.34375 0l0 8.65625l1.796875 0l0 1.5l-1.796875 0l0 3.203125l-1.640625 0zm0 -4.703125l0 -6.015625l-4.1875 6.015625l4.1875 0z" fill-rule="nonzero"/><path fill="#cfe2f3" d="m225.37527 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m225.37527 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m235.15652 49.411472l1.640625 -0.21875q0.28125 1.40625 0.953125 2.015625q0.6875 0.609375 1.65625 0.609375q1.15625 0 1.953125 -0.796875q0.796875 -0.796875 0.796875 -1.984375q0 -1.125 -0.734375 -1.859375q-0.734375 -0.734375 -1.875 -0.734375q-0.46875 0 -1.15625 0.171875l0.1875 -1.4375q0.15625 0.015625 0.265625 0.015625q1.046875 0 1.875 -0.546875q0.84375 -0.546875 0.84375 -1.671875q0 -0.90625 -0.609375 -1.5q-0.609375 -0.59375 -1.578125 -0.59375q-0.953125 0 -1.59375 0.609375q-0.640625 0.59375 -0.8125 1.796875l-1.640625 -0.296875q0.296875 -1.640625 1.359375 -2.546875q1.0625 -0.90625 2.65625 -0.90625q1.09375 0 2.0 0.46875q0.921875 0.46875 1.40625 1.28125q0.5 0.8125 0.5 1.71875q0 0.859375 -0.46875 1.578125q-0.46875 0.703125 -1.375 1.125q1.1875 0.28125 1.84375 1.140625q0.65625 0.859375 0.65625 2.15625q0 1.734375 -1.28125 2.953125q-1.265625 1.21875 -3.21875 1.21875q-1.765625 0 -2.921875 -1.046875q-1.15625 -1.046875 -1.328125 -2.71875z" fill-rule="nonzero"/><path fill="#e06666" d="m271.56793 24.999102l6.551178 0l0 42.04725l-6.551178 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m271.56793 24.999102l6.551178 0l0 42.04725l-6.551178 0z" fill-rule="evenodd"/><path fill="#cfe2f3" d="m572.3295 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m572.3295 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" d="m590.72015 51.364597l0 1.578125l-8.828125 0q-0.015625 -0.59375 0.1875 -1.140625q0.34375 -0.90625 1.078125 -1.78125q0.75 -0.875 2.15625 -2.015625q2.171875 -1.78125 2.9375 -2.828125q0.765625 -1.046875 0.765625 -1.96875q0 -0.984375 -0.703125 -1.640625q-0.6875 -0.671875 -1.8125 -0.671875q-1.1875 0 -1.90625 0.71875q-0.703125 0.703125 -0.703125 1.953125l-1.6875 -0.171875q0.171875 -1.890625 1.296875 -2.875q1.140625 -0.984375 3.03125 -0.984375q1.921875 0 3.046875 1.0625q1.125 1.0625 1.125 2.640625q0 0.796875 -0.328125 1.578125q-0.328125 0.78125 -1.09375 1.640625q-0.75 0.84375 -2.53125 2.34375q-1.46875 1.234375 -1.890625 1.6875q-0.421875 0.4375 -0.6875 0.875l6.546875 0z" fill-rule="nonzero"/><path fill="#b6d7a8" d="m456.51425 24.999102l99.02365 0l0 42.04725l-99.02365 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m456.51425 24.999102l99.02365 0l0 42.04725l-99.02365 0z" fill-rule="evenodd"/><path fill="#000000" d="m466.2955 49.442722l1.71875 -0.140625q0.1875 1.25 0.875 1.890625q0.703125 0.625 1.6875 0.625q1.1875 0 2.0 -0.890625q0.828125 -0.890625 0.828125 -2.359375q0 -1.40625 -0.796875 -2.21875q-0.78125 -0.8125 -2.0625 -0.8125q-0.78125 0 -1.421875 0.359375q-0.640625 0.359375 -1.0 0.9375l-1.546875 -0.203125l1.296875 -6.859375l6.640625 0l0 1.5625l-5.328125 0l-0.71875 3.59375q1.203125 -0.84375 2.515625 -0.84375q1.75 0 2.953125 1.21875q1.203125 1.203125 1.203125 3.109375q0 1.8125 -1.046875 3.140625q-1.296875 1.625 -3.515625 1.625q-1.8125 0 -2.96875 -1.015625q-1.15625 -1.03125 -1.3125 -2.71875z" fill-rule="nonzero"/><path fill="#e06666" d="m503.1985 24.999102l6.551178 0l0 42.04725l-6.551178 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m503.1985 24.999102l6.551178 0l0 42.04725l-6.551178 0z" fill-rule="evenodd"/><path fill="#000000" fill-opacity="0.0" d="m114.43843 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" stroke-dasharray="4.0,3.0" d="m114.43843 24.999102l99.02362 0l0 42.04725l-99.02362 0z" fill-rule="evenodd"/><path fill="#000000" fill-opacity="0.0" d="m163.95024 24.999102c0 -12.5 114.47246 -25.007874 228.9449 -25.0c114.47241 0.007874016 228.94489 12.531496 228.94489 25.062992" fill-rule="evenodd"/><path stroke="#000000" stroke-width="2.0" stroke-linejoin="round" stroke-linecap="butt" d="m163.95024 24.9991c0 -12.499998 114.47246 -25.007872 228.9449 -24.999998c57.236206 0.003937008 114.47244 3.136811 157.3996 7.835138c21.463562 2.3491635 39.349915 5.0896897 51.8703 8.026144c3.130127 0.7341137 5.9248657 1.4804726 8.356262 2.236023c0.30395508 0.09444237 0.6022339 0.1890316 0.89471436 0.28375435l0.37457275 0.12388611" fill-rule="evenodd"/><path fill="#000000" stroke="#000000" stroke-width="2.0" stroke-linecap="butt" d="m609.98517 21.270555l9.406311 2.1936665l-5.7955933 -7.7266836z" fill-rule="evenodd"/><path fill="#e06666" d="m47.56793 24.999102l6.551182 0l0 42.04725l-6.551182 0z" fill-rule="evenodd"/><path stroke="#000000" stroke-width="1.0" stroke-linejoin="round" stroke-linecap="butt" d="m47.56793 24.999102l6.551182 0l0 42.04725l-6.551182 0z" fill-rule="evenodd"/></g></svg>
diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst
new file mode 100644
index 0000000..f914e81
--- /dev/null
+++ b/Documentation/networking/tls-offload.rst
@@ -0,0 +1,512 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+==================
+Kernel TLS offload
+==================
+
+Kernel TLS operation
+====================
+
+Linux kernel provides TLS connection offload infrastructure. Once a TCP
+connection is in ``ESTABLISHED`` state user space can enable the TLS Upper
+Layer Protocol (ULP) and install the cryptographic connection state.
+For details regarding the user-facing interface refer to the TLS
+documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.
+
+``ktls`` can operate in three modes:
+
+ * Software crypto mode (``TLS_SW``) - CPU handles the cryptography.
+ In most basic cases only crypto operations synchronous with the CPU
+ can be used, but depending on calling context CPU may utilize
+ asynchronous crypto accelerators. The use of accelerators introduces extra
+ latency on socket reads (decryption only starts when a read syscall
+ is made) and additional I/O load on the system.
+ * Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
+ on a packet by packet basis, provided the packets arrive in order.
+ This mode integrates best with the kernel stack and is described in detail
+ in the remaining part of this document
+ (``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
+ * Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
+ NIC driver and firmware replace the kernel networking stack
+ with its own TCP handling, it is not usable in production environments
+ making use of the Linux networking stack for example any firewalling
+ abilities or QoS and packet scheduling (``ethtool`` flag ``tls-hw-record``).
+
+The operation mode is selected automatically based on device configuration,
+offload opt-in or opt-out on per-connection basis is not currently supported.
+
+TX
+--
+
+At a high level user write requests are turned into a scatter list, the TLS ULP
+intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
+mode) and then hands the modified scatter list to the TCP layer. From this
+point on the TCP stack proceeds as normal.
+
+In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
+Instead packets reach a device driver, the driver will mark the packets
+for crypto offload based on the socket the packet is attached to,
+and send them to the device for encryption and transmission.
+
+RX
+--
+
+On the receive side if the device handled decryption and authentication
+successfully, the driver will set the decrypted bit in the associated
+:c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
+are handled normally. ``ktls`` is informed when data is queued to the socket
+and the ``strparser`` mechanism is used to delineate the records. Upon read
+request, records are retrieved from the socket and passed to decryption routine.
+If device decrypted all the segments of the record the decryption is skipped,
+otherwise software path handles decryption.
+
+.. kernel-figure:: tls-offload-layers.svg
+ :alt: TLS offload layers
+ :align: center
+ :figwidth: 28em
+
+ Layers of Kernel TLS stack
+
+Device configuration
+====================
+
+During driver initialization device sets the ``NETIF_F_HW_TLS_RX`` and
+``NETIF_F_HW_TLS_TX`` features and installs its
+:c:type:`struct tlsdev_ops <tlsdev_ops>`
+pointer in the :c:member:`tlsdev_ops` member of the
+:c:type:`struct net_device <net_device>`.
+
+When TLS cryptographic connection state is installed on a ``ktls`` socket
+(note that it is done twice, once for RX and once for TX direction,
+and the two are completely independent), the kernel checks if the underlying
+network device is offload-capable and attempts the offload. In case offload
+fails the connection is handled entirely in software using the same mechanism
+as if the offload was never tried.
+
+Offload request is performed via the :c:member:`tls_dev_add` callback of
+:c:type:`struct tlsdev_ops <tlsdev_ops>`:
+
+.. code-block:: c
+
+ int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
+ enum tls_offload_ctx_dir direction,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn);
+
+``direction`` indicates whether the cryptographic information is for
+the received or transmitted packets. Driver uses the ``sk`` parameter
+to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
+Cryptographic information in ``crypto_info`` includes the key, iv, salt
+as well as TLS record sequence number. ``start_offload_tcp_sn`` indicates
+which TCP sequence number corresponds to the beginning of the record with
+sequence number from ``crypto_info``. The driver can add its state
+at the end of kernel structures (see :c:member:`driver_state` members
+in ``include/net/tls.h``) to avoid additional allocations and pointer
+dereferences.
+
+TX
+--
+
+After TX state is installed, the stack guarantees that the first segment
+of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
+number, simplifying TCP sequence number matching.
+
+TX offload being fully initialized does not imply that all segments passing
+through the driver and which belong to the offloaded socket will be after
+the expected sequence number and will have kernel record information.
+In particular, already encrypted data may have been queued to the socket
+before installing the connection state in the kernel.
+
+RX
+--
+
+In RX direction local networking stack has little control over the segmentation,
+so the initial records' TCP sequence number may be anywhere inside the segment.
+
+Normal operation
+================
+
+At the minimum the device maintains the following state for each connection, in
+each direction:
+
+ * crypto secrets (key, iv, salt)
+ * crypto processing state (partial blocks, partial authentication tag, etc.)
+ * record metadata (sequence number, processing offset and length)
+ * expected TCP sequence number
+
+There are no guarantees on record length or record segmentation. In particular
+segments may start at any point of a record and contain any number of records.
+Assuming segments are received in order, the device should be able to perform
+crypto operations and authentication regardless of segmentation. For this
+to be possible device has to keep small amount of segment-to-segment state.
+This includes at least:
+
+ * partial headers (if a segment carried only a part of the TLS header)
+ * partial data block
+ * partial authentication tag (all data had been seen but part of the
+ authentication tag has to be written or read from the subsequent segment)
+
+Record reassembly is not necessary for TLS offload. If the packets arrive
+in order the device should be able to handle them separately and make
+forward progress.
+
+TX
+--
+
+The kernel stack performs record framing reserving space for the authentication
+tag and populating all other TLS header and tailer fields.
+
+Both the device and the driver maintain expected TCP sequence numbers
+due to the possibility of retransmissions and the lack of software fallback
+once the packet reaches the device.
+For segments passed in order, the driver marks the packets with
+a connection identifier (note that a 5-tuple lookup is insufficient to identify
+packets requiring HW offload, see the :ref:`5tuple_problems` section)
+and hands them to the device. The device identifies the packet as requiring
+TLS handling and confirms the sequence number matches its expectation.
+The device performs encryption and authentication of the record data.
+It replaces the authentication tag and TCP checksum with correct values.
+
+RX
+--
+
+Before a packet is DMAed to the host (but after NIC's embedded switching
+and packet transformation functions) the device validates the Layer 4
+checksum and performs a 5-tuple lookup to find any TLS connection the packet
+may belong to (technically a 4-tuple
+lookup is sufficient - IP addresses and TCP port numbers, as the protocol
+is always TCP). If connection is matched device confirms if the TCP sequence
+number is the expected one and proceeds to TLS handling (record delineation,
+decryption, authentication for each record in the packet). The device leaves
+the record framing unmodified, the stack takes care of record decapsulation.
+Device indicates successful handling of TLS offload in the per-packet context
+(descriptor) passed to the host.
+
+Upon reception of a TLS offloaded packet, the driver sets
+the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
+corresponding to the segment. Networking stack makes sure decrypted
+and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
+and takes care of partial decryption.
+
+Resync handling
+===============
+
+In presence of packet drops or network packet reordering, the device may lose
+synchronization with the TLS stream, and require a resync with the kernel's
+TCP stack.
+
+Note that resync is only attempted for connections which were successfully
+added to the device table and are in TLS_HW mode. For example,
+if the table was full when cryptographic state was installed in the kernel,
+such connection will never get offloaded. Therefore the resync request
+does not carry any cryptographic connection state.
+
+TX
+--
+
+Segments transmitted from an offloaded socket can get out of sync
+in similar ways to the receive side-retransmissions - local drops
+are possible, though network reorders are not. There are currently
+two mechanisms for dealing with out of order segments.
+
+Crypto state rebuilding
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Whenever an out of order segment is transmitted the driver provides
+the device with enough information to perform cryptographic operations.
+This means most likely that the part of the record preceding the current
+segment has to be passed to the device as part of the packet context,
+together with its TCP sequence number and TLS record number. The device
+can then initialize its crypto state, process and discard the preceding
+data (to be able to insert the authentication tag) and move onto handling
+the actual packet.
+
+In this mode depending on the implementation the driver can either ask
+for a continuation with the crypto state and the new sequence number
+(next expected segment is the one after the out of order one), or continue
+with the previous stream state - assuming that the out of order segment
+was just a retransmission. The former is simpler, and does not require
+retransmission detection therefore it is the recommended method until
+such time it is proven inefficient.
+
+Next record sync
+~~~~~~~~~~~~~~~~
+
+Whenever an out of order segment is detected the driver requests
+that the ``ktls`` software fallback code encrypt it. If the segment's
+sequence number is lower than expected the driver assumes retransmission
+and doesn't change device state. If the segment is in the future, it
+may imply a local drop, the driver asks the stack to sync the device
+to the next record state and falls back to software.
+
+Resync request is indicated with:
+
+.. code-block:: c
+
+ void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq)
+
+Until resync is complete driver should not access its expected TCP
+sequence number (as it will be updated from a different context).
+Following helper should be used to test if resync is complete:
+
+.. code-block:: c
+
+ bool tls_offload_tx_resync_pending(struct sock *sk)
+
+Next time ``ktls`` pushes a record it will first send its TCP sequence number
+and TLS record number to the driver. Stack will also make sure that
+the new record will start on a segment boundary (like it does when
+the connection is initially added).
+
+RX
+--
+
+A small amount of RX reorder events may not require a full resynchronization.
+In particular the device should not lose synchronization
+when record boundary can be recovered:
+
+.. kernel-figure:: tls-offload-reorder-good.svg
+ :alt: reorder of non-header segment
+ :align: center
+
+ Reorder of non-header segment
+
+Green segments are successfully decrypted, blue ones are passed
+as received on wire, red stripes mark start of new records.
+
+In above case segment 1 is received and decrypted successfully.
+Segment 2 was dropped so 3 arrives out of order. The device knows
+the next record starts inside 3, based on record length in segment 1.
+Segment 3 is passed untouched, because due to lack of data from segment 2
+the remainder of the previous record inside segment 3 cannot be handled.
+The device can, however, collect the authentication algorithm's state
+and partial block from the new record in segment 3 and when 4 and 5
+arrive continue decryption. Finally when 2 arrives it's completely outside
+of expected window of the device so it's passed as is without special
+handling. ``ktls`` software fallback handles the decryption of record
+spanning segments 1, 2 and 3. The device did not get out of sync,
+even though two segments did not get decrypted.
+
+Kernel synchronization may be necessary if the lost segment contained
+a record header and arrived after the next record header has already passed:
+
+.. kernel-figure:: tls-offload-reorder-bad.svg
+ :alt: reorder of header segment
+ :align: center
+
+ Reorder of segment with a TLS header
+
+In this example segment 2 gets dropped, and it contains a record header.
+Device can only detect that segment 4 also contains a TLS header
+if it knows the length of the previous record from segment 2. In this case
+the device will lose synchronization with the stream.
+
+Stream scan resynchronization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When the device gets out of sync and the stream reaches TCP sequence
+numbers more than a max size record past the expected TCP sequence number,
+the device starts scanning for a known header pattern. For example
+for TLS 1.2 and TLS 1.3 subsequent bytes of value ``0x03 0x03`` occur
+in the SSL/TLS version field of the header. Once pattern is matched
+the device continues attempting parsing headers at expected locations
+(based on the length fields at guessed locations).
+Whenever the expected location does not contain a valid header the scan
+is restarted.
+
+When the header is matched the device sends a confirmation request
+to the kernel, asking if the guessed location is correct (if a TLS record
+really starts there), and which record sequence number the given header had.
+The kernel confirms the guessed location was correct and tells the device
+the record sequence number. Meanwhile, the device had been parsing
+and counting all records since the just-confirmed one, it adds the number
+of records it had seen to the record number provided by the kernel.
+At this point the device is in sync and can resume decryption at next
+segment boundary.
+
+In a pathological case the device may latch onto a sequence of matching
+headers and never hear back from the kernel (there is no negative
+confirmation from the kernel). The implementation may choose to periodically
+restart scan. Given how unlikely falsely-matching stream is, however,
+periodic restart is not deemed necessary.
+
+Special care has to be taken if the confirmation request is passed
+asynchronously to the packet stream and record may get processed
+by the kernel before the confirmation request.
+
+Stack-driven resynchronization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The driver may also request the stack to perform resynchronization
+whenever it sees the records are no longer getting decrypted.
+If the connection is configured in this mode the stack automatically
+schedules resynchronization after it has received two completely encrypted
+records.
+
+The stack waits for the socket to drain and informs the device about
+the next expected record number and its TCP sequence number. If the
+records continue to be received fully encrypted stack retries the
+synchronization with an exponential back off (first after 2 encrypted
+records, then after 4 records, after 8, after 16... up until every
+128 records).
+
+Error handling
+==============
+
+TX
+--
+
+Packets may be redirected or rerouted by the stack to a different
+device than the selected TLS offload device. The stack will handle
+such condition using the :c:func:`sk_validate_xmit_skb` helper
+(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
+Offload maintains information about all records until the data is
+fully acknowledged, so if skbs reach the wrong device they can be handled
+by software fallback.
+
+Any device TLS offload handling error on the transmission side must result
+in the packet being dropped. For example if a packet got out of order
+due to a bug in the stack or the device, reached the device and can't
+be encrypted such packet must be dropped.
+
+RX
+--
+
+If the device encounters any problems with TLS offload on the receive
+side it should pass the packet to the host's networking stack as it was
+received on the wire.
+
+For example authentication failure for any record in the segment should
+result in passing the unmodified packet to the software fallback. This means
+packets should not be modified "in place". Splitting segments to handle partial
+decryption is not advised. In other words either all records in the packet
+had been handled successfully and authenticated or the packet has to be passed
+to the host's stack as it was on the wire (recovering original packet in the
+driver if device provides precise error is sufficient).
+
+The Linux networking stack does not provide a way of reporting per-packet
+decryption and authentication errors, packets with errors must simply not
+have the :c:member:`decrypted` mark set.
+
+A packet should also not be handled by the TLS offload if it contains
+incorrect checksums.
+
+Performance metrics
+===================
+
+TLS offload can be characterized by the following basic metrics:
+
+ * max connection count
+ * connection installation rate
+ * connection installation latency
+ * total cryptographic performance
+
+Note that each TCP connection requires a TLS session in both directions,
+the performance may be reported treating each direction separately.
+
+Max connection count
+--------------------
+
+The number of connections device can support can be exposed via
+``devlink resource`` API.
+
+Total cryptographic performance
+-------------------------------
+
+Offload performance may depend on segment and record size.
+
+Overload of the cryptographic subsystem of the device should not have
+significant performance impact on non-offloaded streams.
+
+Statistics
+==========
+
+Following minimum set of TLS-related statistics should be reported
+by the driver:
+
+ * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
+ which were part of a TLS stream.
+ * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
+ which were successfully decrypted.
+ * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
+ for encryption of their TLS payload.
+ * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
+ passed to the device for encryption.
+ * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
+ encryption.
+ * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
+ but did not arrive in the expected order.
+ * ``tx_tls_skip_no_sync_data`` - number of TX packets which were part of
+ a TLS stream and arrived out-of-order, but skipped the HW offload routine
+ and went to the regular transmit flow as they were retransmissions of the
+ connection handshake.
+ * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
+ a TLS stream dropped, because they arrived out of order and associated
+ record could not be found.
+ * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
+ stream dropped, because they contain both data that has been encrypted by
+ software and data that expects hardware crypto offload.
+
+Notable corner cases, exceptions and additional requirements
+============================================================
+
+.. _5tuple_problems:
+
+5-tuple matching limitations
+----------------------------
+
+The device can only recognize received packets based on the 5-tuple
+of the socket. Current ``ktls`` implementation will not offload sockets
+routed through software interfaces such as those used for tunneling
+or virtual networking. However, many packet transformations performed
+by the networking stack (most notably any BPF logic) do not require
+any intermediate software device, therefore a 5-tuple match may
+consistently miss at the device level. In such cases the device
+should still be able to perform TX offload (encryption) and should
+fallback cleanly to software decryption (RX).
+
+Out of order
+------------
+
+Introducing extra processing in NICs should not cause packets to be
+transmitted or received out of order, for example pure ACK packets
+should not be reordered with respect to data segments.
+
+Ingress reorder
+---------------
+
+A device is permitted to perform packet reordering for consecutive
+TCP segments (i.e. placing packets in the correct order) but any form
+of additional buffering is disallowed.
+
+Coexistence with standard networking offload features
+-----------------------------------------------------
+
+Offloaded ``ktls`` sockets should support standard TCP stack features
+transparently. Enabling device TLS offload should not cause any difference
+in packets as seen on the wire.
+
+Transport layer transparency
+----------------------------
+
+The device should not modify any packet headers for the purpose
+of the simplifying TLS offload.
+
+The device should not depend on any packet headers beyond what is strictly
+necessary for TLS offload.
+
+Segment drops
+-------------
+
+Dropping packets is acceptable only in the event of catastrophic
+system errors and should never be used as an error handling mechanism
+in cases arising from normal operation. In other words, reliance
+on TCP retransmissions to handle corner cases is not acceptable.
+
+TLS device features
+-------------------
+
+Drivers should ignore the changes to TLS the device feature flags.
+These flags will be acted upon accordingly by the core ``ktls`` code.
+TLS device feature flags only control adding of new TLS connection
+offloads, old connections will remain active after flags are cleared.
diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.rst
similarity index 88%
rename from Documentation/networking/tls.txt
rename to Documentation/networking/tls.rst
index 58b5ef7..5bcbf75 100644
--- a/Documentation/networking/tls.txt
+++ b/Documentation/networking/tls.rst
@@ -1,3 +1,9 @@
+.. _kernel_tls:
+
+==========
+Kernel TLS
+==========
+
Overview
========
@@ -12,6 +18,8 @@
First create a new TCP socket and set the TLS ULP.
+.. code-block:: c
+
sock = socket(AF_INET, SOCK_STREAM, 0);
setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
@@ -21,6 +29,8 @@
data-path to the kernel. There is a separate socket option for moving
the transmit and the receive into the kernel.
+.. code-block:: c
+
/* From linux/tls.h */
struct tls_crypto_info {
unsigned short version;
@@ -58,6 +68,8 @@
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:
+.. code-block:: c
+
const char *msg = "hello world\n";
send(sock, msg, strlen(msg));
@@ -67,6 +79,8 @@
The sendfile system call will send the file's data over TLS records of maximum
length (2^14).
+.. code-block:: c
+
file = open(filename, O_RDONLY);
fstat(file, &stat);
sendfile(sock, file, &offset, stat.st_size);
@@ -89,6 +103,8 @@
are decrypted using TLS parameters provided. A full TLS record must
be received before decryption can happen.
+.. code-block:: c
+
char buffer[16384];
recv(sock, buffer, 16384);
@@ -97,12 +113,12 @@
buffer is too small, data is decrypted in the kernel and copied to
userspace.
-EINVAL is returned if the TLS version in the received message does not
+``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.
-EMSGSIZE is returned if the received message is too big.
+``EMSGSIZE`` is returned if the received message is too big.
-EBADMSG is returned if decryption failed for any other reason.
+``EBADMSG`` is returned if decryption failed for any other reason.
Send TLS control messages
-------------------------
@@ -113,9 +129,11 @@
via a CMSG. For example the following function sends @data of @length bytes
using a record of type @record_type.
-/* send TLS control message using record_type */
+.. code-block:: c
+
+ /* send TLS control message using record_type */
static int klts_send_ctrl_message(int sock, unsigned char record_type,
- void *data, size_t length)
+ void *data, size_t length)
{
struct msghdr msg = {0};
int cmsg_len = sizeof(record_type);
@@ -151,6 +169,8 @@
returned if a control message is received. Data messages may be
received without a cmsg buffer set.
+.. code-block:: c
+
char buffer[16384];
char cmsg[CMSG_SPACE(sizeof(unsigned char))];
struct msghdr msg = {0};
@@ -186,12 +206,10 @@
At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.
-A patchset to OpenSSL to use ktls as the record layer is here:
+A patchset to OpenSSL to use ktls as the record layer is
+`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.
-https://github.com/Mellanox/openssl/commits/tls_rx2
-
-An example of calling send directly after a handshake using
-gnutls. Since it doesn't implement a full record layer, control
-messages are not supported:
-
-https://github.com/ktls/af_ktls-tool/commits/RX
+`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
+of calling send directly after a handshake using gnutls.
+Since it doesn't implement a full record layer, control
+messages are not supported.
diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.txt
index 949d5dc..0104830 100644
--- a/Documentation/networking/tuntap.txt
+++ b/Documentation/networking/tuntap.txt
@@ -204,8 +204,8 @@
media, receives them from user space program and instead of sending
packets via physical media sends them to the user space program.
-Let's say that you configured IPX on the tap0, then whenever
-the kernel sends an IPX packet to tap0, it is passed to the application
+Let's say that you configured IPv6 on the tap0, then whenever
+the kernel sends an IPv6 packet to tap0, it is passed to the application
(VTun for example). The application encrypts, compresses and sends it to
the other side over TCP or UDP. The application on the other side decompresses
and decrypts the data received and writes the packet to the TAP device,
diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
index 8ff7b4c..a5f103b 100644
--- a/Documentation/networking/vrf.txt
+++ b/Documentation/networking/vrf.txt
@@ -103,19 +103,33 @@
or to specify the output device using cmsg and IP_PKTINFO.
+By default the scope of the port bindings for unbound sockets is
+limited to the default VRF. That is, it will not be matched by packets
+arriving on interfaces enslaved to an l3mdev and processes may bind to
+the same port if they bind to an l3mdev.
+
TCP & UDP services running in the default VRF context (ie., not bound
to any VRF device) can work across all VRF domains by enabling the
tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:
+
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1
+These options are disabled by default so that a socket in a VRF is only
+selected for packets in that VRF. There is a similar option for RAW
+sockets, which is enabled by default for reasons of backwards compatibility.
+This is so as to specify the output device with cmsg and IP_PKTINFO, but
+using a socket not bound to the corresponding VRF. This allows e.g. older ping
+implementations to be run with specifying the device but without executing it
+in the VRF. This option can be disabled so that packets received in a VRF
+context are only handled by a raw socket bound to the VRF, and packets in the
+default VRF are only handled by a socket not bound to any VRF:
+
+ sysctl -w net.ipv4.raw_l3mdev_accept=0
+
netfilter rules on the VRF device can be used to limit access to services
running in the default VRF context as well.
-The default VRF does not have limited scope with respect to port bindings.
-That is, if a process does a wildcard bind to a port in the default VRF it
-owns the port across all VRF domains within the network namespace.
-
################################################################################
Using iproute2 for VRFs
diff --git a/Documentation/networking/xfrm_device.txt b/Documentation/networking/xfrm_device.txt
index 50c34ca..a1c904d 100644
--- a/Documentation/networking/xfrm_device.txt
+++ b/Documentation/networking/xfrm_device.txt
@@ -68,6 +68,10 @@
- verify the algorithm is supported for offloads
- store the SA information (key, salt, target-ip, protocol, etc)
- enable the HW offload of the SA
+ - return status value:
+ 0 success
+ -EOPNETSUPP offload not supported, try SW IPsec
+ other fail the request
The driver can also set an offload_handle in the SA, an opaque void pointer
that can be used to convey context into the fast-path offload requests.
@@ -107,9 +111,10 @@
xfrm_state_hold(xs);
store the state information into the skb
- skb->sp = secpath_dup(skb->sp);
- skb->sp->xvec[skb->sp->len++] = xs;
- skb->sp->olen++;
+ sp = secpath_set(skb);
+ if (!sp) return;
+ sp->xvec[sp->len++] = xs;
+ sp->olen++;
indicate the success and/or error status of the offload
xo = xfrm_offload(skb);