About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Documentation / networking / openvswitch.txt




Based on kernel version 3.9. Page generated on 2013-05-02 23:11 EST.

1	Open vSwitch datapath developer documentation
2	=============================================
3	
4	The Open vSwitch kernel module allows flexible userspace control over
5	flow-level packet processing on selected network devices.  It can be
6	used to implement a plain Ethernet switch, network device bonding,
7	VLAN processing, network access control, flow-based network control,
8	and so on.
9	
10	The kernel module implements multiple "datapaths" (analogous to
11	bridges), each of which can have multiple "vports" (analogous to ports
12	within a bridge).  Each datapath also has associated with it a "flow
13	table" that userspace populates with "flows" that map from keys based
14	on packet headers and metadata to sets of actions.  The most common
15	action forwards the packet to another vport; other actions are also
16	implemented.
17	
18	When a packet arrives on a vport, the kernel module processes it by
19	extracting its flow key and looking it up in the flow table.  If there
20	is a matching flow, it executes the associated actions.  If there is
21	no match, it queues the packet to userspace for processing (as part of
22	its processing, userspace will likely set up a flow to handle further
23	packets of the same type entirely in-kernel).
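
The receive path described above can be modelled roughly as follows.
This is only a sketch in Python, not the kernel implementation: the
Datapath class, extract_flow_key(), and the toy key layout (ingress
vport plus Ethernet source, destination, and type) are hypothetical
names chosen for illustration.

```python
class Datapath:
    """Toy model of one datapath: a flow table plus an upcall queue."""

    def __init__(self):
        self.flow_table = {}   # flow key -> list of actions
        self.upcalls = []      # (key, packet) pairs queued to userspace

    def receive(self, in_port, packet):
        """Process one packet; return the actions to execute (may be empty)."""
        key = extract_flow_key(in_port, packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # No matching flow: queue the packet, with the key the
            # datapath parsed from it, for userspace to handle.
            self.upcalls.append((key, packet))
            return []
        return actions


def extract_flow_key(in_port, packet):
    """Hypothetical key: ingress vport and the Ethernet header fields."""
    dst, src, eth_type = packet[0:6], packet[6:12], packet[12:14]
    return (in_port, src, dst, eth_type)
```

In this model, userspace would pop an entry from "upcalls", decide how
to forward it, and store the resulting actions in "flow_table" under
the key it was given, so that later packets with the same key never
leave the datapath.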
24	
25	
26	Flow key compatibility
27	----------------------
28	
29	Network protocols evolve over time.  New protocols become important
30	and existing protocols lose their prominence.  For the Open vSwitch
31	kernel module to remain relevant, it must be possible for newer
32	versions to parse additional protocols as part of the flow key.  It
33	might even be desirable, someday, to drop support for parsing
34	protocols that have become obsolete.  Therefore, the Netlink interface
35	to Open vSwitch is designed to allow carefully written userspace
36	applications to work with any version of the flow key, past or future.
37	
38	To support this forward and backward compatibility, whenever the
39	kernel module passes a packet to userspace, it also passes along the
40	flow key that it parsed from the packet.  Userspace then extracts its
41	own notion of a flow key from the packet and compares it against the
42	kernel-provided version:
43	
44	    - If userspace's notion of the flow key for the packet matches the
45	      kernel's, then nothing special is necessary.
46	
47	    - If the kernel's flow key includes more fields than the userspace
48	      version of the flow key, for example if the kernel decoded IPv6
49	      headers but userspace stopped at the Ethernet type (because it
50	      does not understand IPv6), then again nothing special is
51	      necessary.  Userspace can still set up a flow in the usual way,
52	      as long as it uses the kernel-provided flow key to do it.
53	
54	    - If the userspace flow key includes more fields than the
55	      kernel's, for example if userspace decoded an IPv6 header but
56	      the kernel stopped at the Ethernet type, then userspace can
57	      forward the packet manually, without setting up a flow in the
58	      kernel.  This case is bad for performance because every packet
59	      that the kernel considers part of the flow must go to userspace,
60	      but the forwarding behavior is correct.  (If userspace can
61	      determine that the values of the extra fields would not affect
62	      forwarding behavior, then it could set up a flow anyway.)
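
The three cases above reduce to a small decision procedure.  The
sketch below assumes, purely for illustration, that a flow key is a
Python dict mapping attribute names to values; decide() and its
extra_fields_matter parameter are hypothetical and are not part of any
Open vSwitch API.

```python
def decide(kernel_key, user_key, extra_fields_matter=True):
    """Return ("install", key) or ("forward_manually", None).

    kernel_key: key the kernel parsed and passed up with the packet.
    user_key:   key userspace parsed from the same packet.
    """
    kernel_fields = set(kernel_key)
    user_fields = set(user_key)

    if user_fields <= kernel_fields:
        # Cases 1 and 2: the keys agree, or the kernel parsed more.
        # Install a flow, but always with the kernel-provided key.
        return ("install", kernel_key)

    # Case 3: userspace parsed fields that the kernel did not.
    if not extra_fields_matter:
        # Userspace determined that the extra fields cannot change
        # the forwarding decision, so a kernel flow is still safe.
        return ("install", kernel_key)

    # Correct but slow: every packet of this flow goes to userspace.
    return ("forward_manually", None)
```

Note that the key installed is the kernel's in every "install" case,
which is what lets userspace work with kernel versions that parse
protocols it does not know about.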
63	
64	How flow keys evolve over time is important to making this work, so
65	the following sections go into detail.
66	
67	
68	Flow key format
69	---------------
70	
71	A flow key is passed over a Netlink socket as a sequence of Netlink
72	attributes.  Some attributes represent packet metadata, defined as any
73	information about a packet that cannot be extracted from the packet
74	itself, e.g. the vport on which the packet was received.  Most
75	attributes, however, are extracted from headers within the packet,
76	e.g. source and destination addresses from Ethernet, IP, or TCP
77	headers.
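
As a rough illustration of "a sequence of Netlink attributes", the
sketch below packs and unpacks attributes using the generic Netlink
layout: a 16-bit length that includes the 4-byte header, a 16-bit
type, then the payload padded to a 4-byte boundary.  The numeric
values given for OVS_KEY_ATTR_IN_PORT and OVS_KEY_ATTR_ETHERTYPE, and
the payload encodings, are assumptions that should be checked against
<linux/openvswitch.h>; nla_pack() and nla_unpack() are hypothetical
helper names.

```python
import struct

NLA_HDRLEN = 4  # struct nlattr: u16 nla_len, u16 nla_type

# Assumed values; verify against <linux/openvswitch.h>.
OVS_KEY_ATTR_IN_PORT = 3
OVS_KEY_ATTR_ETHERTYPE = 6


def nla_pack(attr_type, payload):
    """Encode one attribute; nla_len counts header plus payload only."""
    length = NLA_HDRLEN + len(payload)
    padding = (-length) % 4
    return struct.pack("=HH", length, attr_type) + payload + b"\x00" * padding


def nla_unpack(buf):
    """Decode a buffer into a list of (type, payload) pairs."""
    attrs = []
    offset = 0
    while offset + NLA_HDRLEN <= len(buf):
        length, attr_type = struct.unpack_from("=HH", buf, offset)
        if length < NLA_HDRLEN or offset + length > len(buf):
            raise ValueError("malformed attribute at offset %d" % offset)
        attrs.append((attr_type, buf[offset + NLA_HDRLEN:offset + length]))
        offset += (length + 3) & ~3  # advance to the next aligned attribute
    return attrs


# in_port(1), eth_type(0x0800): metadata first, then a header field.
key = (nla_pack(OVS_KEY_ATTR_IN_PORT, struct.pack("=I", 1)) +
       nla_pack(OVS_KEY_ATTR_ETHERTYPE, struct.pack("!H", 0x0800)))
```

A nested attribute is encoded the same way: its payload is itself a
sequence of attributes, so nla_unpack() can be applied to it again.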
78	
79	The <linux/openvswitch.h> header file defines the exact format of the
80	flow key attributes.  For informal explanatory purposes here, we write
81	them as comma-separated strings, with parentheses indicating arguments
82	and nesting.  For example, the following could represent a flow key
83	corresponding to a TCP packet that arrived on vport 1:
84	
85	    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
86	    eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
87	    frag=no), tcp(src=49163, dst=80)
88	
89	Often we ellipsize arguments not important to the discussion, e.g.:
90	
91	    in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
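
For readers who want to generate this informal notation themselves,
here is one possible formatter.  The in-memory representation is an
assumption made for this sketch only: a key is a list of (name, value)
pairs, where a dict holds named arguments and a list holds nested
attributes.

```python
def format_key(attrs):
    """Render a list of (name, value) pairs in the informal notation."""
    parts = []
    for name, value in attrs:
        if isinstance(value, list):
            inner = format_key(value)          # nested attributes
        elif isinstance(value, dict):
            inner = ", ".join("%s=%s" % (k, v) for k, v in value.items())
        else:
            inner = str(value)                 # single unnamed argument
        parts.append("%s(%s)" % (name, inner))
    return ", ".join(parts)


example = [
    ("in_port", 1),
    ("eth_type", "0x0800"),
    ("tcp", {"src": 49163, "dst": 80}),
]
```

Here format_key(example) yields
"in_port(1), eth_type(0x0800), tcp(src=49163, dst=80)".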
92	
93	
94	Basic rule for evolving flow keys
95	---------------------------------
96	
97	Some care is needed to really maintain forward and backward
98	compatibility for applications that follow the rules listed under
99	"Flow key compatibility" above.
100	
101	The basic rule is obvious:
102	
103	    ------------------------------------------------------------------
104	    New network protocol support must only supplement existing flow
105	    key attributes.  It must not change the meaning of already defined
106	    flow key attributes.
107	    ------------------------------------------------------------------
108	
109	This rule has less obvious consequences, so it is worth working
110	through a few examples.  Suppose, for example, that the kernel module
111	did not already implement VLAN parsing.  Instead, it just interpreted
112	the 802.1Q TPID (0x8100) as the Ethertype and then stopped parsing the
113	packet.  The flow key for any packet with an 802.1Q header would look
114	essentially like this, ignoring metadata:
115	
116	    eth(...), eth_type(0x8100)
117	
118	Naively, to add VLAN support, it makes sense to add a new "vlan" flow
119	key attribute to contain the VLAN tag, then continue to decode the
120	encapsulated headers beyond the VLAN tag using the existing field
121	definitions.  With this change, a TCP packet in VLAN 10 would have a
122	flow key much like this:
123	
124	    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
125	
126	But this change would negatively affect a userspace application that
127	has not been updated to understand the new "vlan" flow key attribute.
128	The application could, following the flow compatibility rules above,
129	ignore the "vlan" attribute that it does not understand and therefore
130	assume that the flow contained IP packets.  This is a bad assumption
131	(the flow only contains IP packets if one parses and skips over the
132	802.1Q header) and it could cause the application's behavior to change
133	across kernel versions even though it follows the compatibility rules.
134	
135	The solution is to use a set of nested attributes.  This is, for
136	example, why 802.1Q support uses nested attributes.  A TCP packet in
137	VLAN 10 is actually expressed as:
138	
139	    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
140	    ip(proto=6, ...), tcp(...))
141	
142	Notice how the "eth_type", "ip", and "tcp" flow key attributes are
143	nested inside the "encap" attribute.  Thus, an application that does
144	not understand the "vlan" key will not see any of those attributes
145	and therefore will not misinterpret them.  (Also, the outer eth_type
146	is still 0x8100, not changed to 0x0800.)
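
The difference between the two layouts can be seen by modelling an
application that predates VLAN support and skips every attribute it
does not recognize.  This is a sketch under assumed names: the
KNOWN_TO_OLD_APP set, visible_to_old_app(), and the list-of-pairs key
representation are hypothetical.

```python
# Attributes understood by an application written before "vlan" and
# "encap" existed.
KNOWN_TO_OLD_APP = {"eth", "eth_type", "ip", "tcp"}


def visible_to_old_app(key):
    """Top-level attributes the old application would act on.

    Unknown attributes are skipped whole, including anything nested
    inside them.
    """
    return [(name, value) for name, value in key if name in KNOWN_TO_OLD_APP]


# Naive layout: new attribute added, inner headers left at top level.
naive = [
    ("eth", "..."), ("vlan", "vid=10, pcp=0"), ("eth_type", 0x0800),
    ("ip", "proto=6, ..."), ("tcp", "..."),
]

# Nested layout: inner headers hidden inside "encap".
nested = [
    ("eth", "..."), ("eth_type", 0x8100), ("vlan", "vid=10, pcp=0"),
    ("encap", [("eth_type", 0x0800), ("ip", "proto=6, ..."), ("tcp", "...")]),
]
```

With the naive layout the old application is left with eth, eth_type
0x0800, ip, and tcp, and so treats the flow as untagged IP traffic.
With the nested layout it is left with only eth and eth_type 0x8100,
which is exactly what an older kernel would have reported.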
147	
148	Handling malformed packets
149	--------------------------
150	
151	Don't drop packets in the kernel for malformed protocol headers, bad
152	checksums, etc.  This would prevent userspace from implementing a
153	simple Ethernet switch that forwards every packet.
154	
155	Instead, in such a case, include an attribute with "empty" content.
156	It doesn't matter if the empty content could be valid protocol values,
157	as long as those values are rarely seen in practice, because userspace
158	can always have the kernel send all packets with those values to
159	userspace and handle them individually.
160	
161	For example, consider a packet that contains an IP header that
162	indicates protocol 6 for TCP, but which is truncated just after the IP
163	header, so that the TCP header is missing.  The flow key for this
164	packet would include a tcp attribute with all-zero src and dst, like
165	this:
166	
167	    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
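
A sketch of this behavior, assuming the parser is handed whatever
bytes follow the IP header: tcp_key() is a hypothetical helper, and
treating anything shorter than the 20-byte minimum TCP header as
"missing" is an assumption of this example.

```python
import struct

TCP_MIN_HEADER_LEN = 20


def tcp_key(l4_bytes):
    """Return the tcp attribute for the bytes after the IP header.

    A missing or truncated TCP header still produces a tcp attribute,
    with "empty" (all-zero) ports, rather than causing a drop.
    """
    if len(l4_bytes) < TCP_MIN_HEADER_LEN:
        return ("tcp", {"src": 0, "dst": 0})
    src, dst = struct.unpack_from("!HH", l4_bytes, 0)
    return ("tcp", {"src": src, "dst": dst})
```

Because real traffic rarely uses source and destination port 0
together, userspace can choose to handle every packet with this key
individually.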
168	
169	As another example, consider a packet with an Ethernet type of 0x8100,
170	indicating that a VLAN TCI should follow, but which is truncated just
171	after the Ethernet type.  The flow key for this packet would include
172	an all-zero-bits vlan and an empty encap attribute, like this:
173	
174	    eth(...), eth_type(0x8100), vlan(0), encap()
175	
176	Unlike a TCP packet with source and destination ports 0, an
177	all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
178	VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
179	attribute expressly to allow this situation to be distinguished.
180	Thus, the flow key in this second example unambiguously indicates a
181	missing or malformed VLAN TCI.
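
The role of the CFI bit can be sketched as follows.  The value 0x1000
is the CFI bit position within the 16-bit TCI; vlan_attr() and
tci_missing() are hypothetical helper names for this example.

```python
import struct

VLAN_TAG_PRESENT = 0x1000  # CFI bit of the 16-bit TCI


def vlan_attr(after_tpid):
    """Value of the vlan attribute for the bytes following the 0x8100 TPID."""
    if len(after_tpid) < 2:
        # Truncated before the TCI: report all-zero bits, CFI clear.
        return 0
    (tci,) = struct.unpack_from("!H", after_tpid, 0)
    # A TCI that was really present always has the CFI bit forced on,
    # even if every bit in the packet's TCI was zero.
    return tci | VLAN_TAG_PRESENT


def tci_missing(vlan_value):
    """True if the vlan attribute marks a missing or malformed TCI."""
    return not (vlan_value & VLAN_TAG_PRESENT)
```

A packet whose TCI is genuinely all zeros therefore yields 0x1000, not
0, and cannot be confused with the truncated case.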
182	
183	Other rules
184	-----------
185	
186	The other rules for flow keys are much less subtle:
187	
188	    - Duplicate attributes are not allowed at a given nesting level.
189	
190	    - Ordering of attributes is not significant.
191	
192	    - When the kernel sends a given flow key to userspace, it always
193	      composes it the same way.  This allows userspace to hash and
194	      compare entire flow keys that it may not be able to fully
195	      interpret.
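
The first two rules can be checked mechanically.  The sketch below
reuses an assumed representation in which a key is a list of
(name, value) pairs and nested attributes are lists; the function
names are hypothetical.

```python
def check_no_duplicates(key):
    """Raise ValueError if a name repeats within one nesting level."""
    seen = set()
    for name, value in key:
        if name in seen:
            raise ValueError("duplicate attribute %r" % name)
        seen.add(name)
        if isinstance(value, list):
            # Each nesting level gets its own namespace.
            check_no_duplicates(value)


def canonical(key):
    """Order-insensitive form of a key, usable for comparison."""
    return frozenset(
        (name, canonical(value) if isinstance(value, list) else value)
        for name, value in key
    )


def same_flow(key_a, key_b):
    """Compare two parsed keys without regard to attribute order."""
    return canonical(key_a) == canonical(key_b)
```

The third rule is what makes a cheaper path possible for keys that
userspace cannot fully parse: since the kernel always composes a given
key identically, the raw attribute bytes received from the kernel can
be used directly as a dictionary key or hashed as an opaque blob.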

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.