Based on kernel version 3.9. Page generated on 2013-05-02 23:11 EST.
Open vSwitch datapath developer documentation
=============================================

The Open vSwitch kernel module allows flexible userspace control over
flow-level packet processing on selected network devices. It can be
used to implement a plain Ethernet switch, network device bonding,
VLAN processing, network access control, flow-based network control,
and so on.

The kernel module implements multiple "datapaths" (analogous to
bridges), each of which can have multiple "vports" (analogous to ports
within a bridge). Each datapath also has associated with it a "flow
table" that userspace populates with "flows" that map from keys based
on packet headers and metadata to sets of actions. The most common
action forwards the packet to another vport; other actions are also
implemented.

When a packet arrives on a vport, the kernel module processes it by
extracting its flow key and looking it up in the flow table. If there
is a matching flow, it executes the associated actions. If there is
no match, it queues the packet to userspace for processing (as part of
its processing, userspace will likely set up a flow to handle further
packets of the same type entirely in-kernel).


Flow key compatibility
----------------------

Network protocols evolve over time. New protocols become important
and existing protocols lose their prominence. For the Open vSwitch
kernel module to remain relevant, it must be possible for newer
versions to parse additional protocols as part of the flow key. It
might even be desirable, someday, to drop support for parsing
protocols that have become obsolete. Therefore, the Netlink interface
to Open vSwitch is designed to allow carefully written userspace
applications to work with any version of the flow key, past or future.

To support this forward and backward compatibility, whenever the
kernel module passes a packet to userspace, it also passes along the
flow key that it parsed from the packet. Userspace then extracts its
own notion of a flow key from the packet and compares it against the
kernel-provided version:

    - If userspace's notion of the flow key for the packet matches the
      kernel's, then nothing special is necessary.

    - If the kernel's flow key includes more fields than the userspace
      version of the flow key, for example if the kernel decoded IPv6
      headers but userspace stopped at the Ethernet type (because it
      does not understand IPv6), then again nothing special is
      necessary. Userspace can still set up a flow in the usual way,
      as long as it uses the kernel-provided flow key to do it.

    - If the userspace flow key includes more fields than the
      kernel's, for example if userspace decoded an IPv6 header but
      the kernel stopped at the Ethernet type, then userspace can
      forward the packet manually, without setting up a flow in the
      kernel. This case is bad for performance because every packet
      that the kernel considers part of the flow must go to userspace,
      but the forwarding behavior is correct. (If userspace can
      determine that the values of the extra fields would not affect
      forwarding behavior, then it could set up a flow anyway.)

How flow keys evolve over time is important to making this work, so
the following sections go into detail.


Flow key format
---------------

A flow key is passed over a Netlink socket as a sequence of Netlink
attributes. Some attributes represent packet metadata, defined as any
information about a packet that cannot be extracted from the packet
itself, e.g. the vport on which the packet was received. Most
attributes, however, are extracted from headers within the packet,
e.g. source and destination addresses from Ethernet, IP, or TCP
headers.
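The three cases listed under "Flow key compatibility" above can be
sketched roughly as follows. This is only an illustration, not code
from Open vSwitch: it assumes a flow key is modeled as a Python dict
mapping attribute names to values, and the names Decision and
compare_flow_keys are hypothetical. The real interface carries Netlink
attributes.

```python
from enum import Enum


class Decision(Enum):
    # Userspace may install a flow, using the kernel-provided key.
    INSTALL_FLOW = "install flow with kernel-provided key"
    # Userspace forwards this packet itself and installs nothing.
    FORWARD_IN_USERSPACE = "forward manually, no kernel flow"


def compare_flow_keys(kernel_key, userspace_key):
    """Pick a strategy from the attributes each side managed to parse.

    Both arguments are dicts of attribute name -> value (an assumed,
    simplified model of a flow key).
    """
    kernel_attrs = set(kernel_key)
    userspace_attrs = set(userspace_key)

    if userspace_attrs <= kernel_attrs:
        # Cases 1 and 2: the keys match, or the kernel parsed more
        # than userspace did. Nothing special is needed.
        return Decision.INSTALL_FLOW

    # Case 3: userspace parsed fields the kernel does not know about.
    # Forwarding manually is slower but keeps behavior correct.
    return Decision.FORWARD_IN_USERSPACE
```

The sketch is deliberately conservative: it ignores the optimization
in which userspace installs a flow anyway after determining that the
extra fields cannot affect forwarding.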

The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes. For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting. For example, the following could represent a flow key
corresponding to a TCP packet that arrived on vport 1:

    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
    eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
    frag=no), tcp(src=49163, dst=80)

Often we ellipsize arguments not important to the discussion, e.g.:

    in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)


Basic rule for evolving flow keys
---------------------------------

Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.

The basic rule is obvious:

    ------------------------------------------------------------------
    New network protocol support must only supplement existing flow
    key attributes. It must not change the meaning of already defined
    flow key attributes.
    ------------------------------------------------------------------

This rule does have less-obvious consequences so it is worth working
through a few examples. Suppose, for example, that the kernel module
did not already implement VLAN parsing. Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
packet. The flow key for any packet with an 802.1Q header would look
essentially like this, ignoring metadata:

    eth(...), eth_type(0x8100)

Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions.
With this change, a TCP packet in VLAN 10 would have a
flow key much like this:

    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)

But this change would negatively affect a userspace application that
has not been updated to understand the new "vlan" flow key attribute.
The application could, following the flow compatibility rules above,
ignore the "vlan" attribute that it does not understand and therefore
assume that the flow contained IP packets. This is a bad assumption
(the flow only contains IP packets if one parses and skips over the
802.1Q header) and it could cause the application's behavior to change
across kernel versions even though it follows the compatibility rules.

The solution is to use a set of nested attributes. This is, for
example, why 802.1Q support uses nested attributes. A TCP packet in
VLAN 10 is actually expressed as:

    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
    ip(proto=6, ...), tcp(...))

Notice how the "eth_type", "ip", and "tcp" flow key attributes are
nested inside the "encap" attribute. Thus, an application that does
not understand the "vlan" key will not see any of those attributes
and therefore will not misinterpret them. (Also, the outer eth_type
is still 0x8100, not changed to 0x0800.)

Handling malformed packets
--------------------------

Don't drop packets in the kernel for malformed protocol headers, bad
checksums, etc. This would prevent userspace from implementing a
simple Ethernet switch that forwards every packet.

Instead, in such a case, include an attribute with "empty" content.
It doesn't matter if the empty content could be valid protocol values,
as long as those values are rarely seen in practice, because userspace
can always arrange for all packets with those values to be sent to
userspace and handle them individually.
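As a rough illustration of the "empty content" rule, the sketch below
builds a tcp attribute from whatever bytes follow the IP header. It is
not the kernel's parser: the function name tcp_attribute and the dict
representation are hypothetical, and it assumes that anything shorter
than the 20-byte minimum TCP header counts as a missing header.

```python
import struct

# Minimum length of a TCP header without options, in bytes.
TCP_MIN_HEADER_LEN = 20


def tcp_attribute(l4_bytes):
    """Return a tcp flow key attribute for the bytes after the IP header.

    A truncated or missing TCP header does not cause the packet to be
    rejected; it yields an attribute with all-zero ports instead.
    """
    if len(l4_bytes) < TCP_MIN_HEADER_LEN:
        # Malformed packet: keep it, but report "empty" content.
        return {"src": 0, "dst": 0}

    # Source and destination ports are the first two 16-bit fields,
    # in network byte order.
    src, dst = struct.unpack("!HH", l4_bytes[:4])
    return {"src": src, "dst": dst}
```

Because the packet still produces a flow key, userspace keeps the
option of forwarding it like any other packet.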

For example, consider a packet that contains an IP header that
indicates protocol 6 for TCP, but which is truncated just after the IP
header, so that the TCP header is missing. The flow key for this
packet would include a tcp attribute with all-zero src and dst, like
this:

    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)

As another example, consider a packet with an Ethernet type of 0x8100,
indicating that a VLAN TCI should follow, but which is truncated just
after the Ethernet type. The flow key for this packet would include
an all-zero-bits vlan and an empty encap attribute, like this:

    eth(...), eth_type(0x8100), vlan(0), encap()

Unlike a TCP packet with source and destination ports 0, an
all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
attribute expressly to allow this situation to be distinguished.
Thus, the flow key in this second example unambiguously indicates a
missing or malformed VLAN TCI.

Other rules
-----------

The other rules for flow keys are much less subtle:

    - Duplicate attributes are not allowed at a given nesting level.

    - Ordering of attributes is not significant.

    - When the kernel sends a given flow key to userspace, it always
      composes it the same way. This allows userspace to hash and
      compare entire flow keys that it may not be able to fully
      interpret.
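
The last rule can be illustrated with a small userspace-side sketch
that treats a kernel-provided flow key as an opaque byte string. The
class OpaqueFlowTable and its methods are hypothetical, and the byte
strings used with it are placeholders rather than real Netlink
encodings.

```python
class OpaqueFlowTable:
    """Maps raw flow key bytes to actions without parsing the key."""

    def __init__(self):
        self._flows = {}

    def install(self, key_bytes, actions):
        # bytes objects are hashable, so the serialized key can be
        # used directly as a dictionary key.
        self._flows[bytes(key_bytes)] = list(actions)

    def lookup(self, key_bytes):
        # Byte-for-byte comparison is only meaningful because the
        # kernel composes a given flow key the same way every time.
        return self._flows.get(bytes(key_bytes))
```

A lookup with an identical byte string returns the stored actions; any
other byte string returns None, which a caller could treat as a miss.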