Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.
TCP protocol
============

Last updated: 3 June 2017

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables are used in the tcp_sock for congestion control:

snd_cwnd        The size of the congestion window.
snd_ssthresh    Slow start threshold. We are in slow start if
                snd_cwnd is less than this.
snd_cwnd_cnt    A counter used to slow down the rate of increase
                once we exceed the slow start threshold.
snd_cwnd_clamp  The maximum size that snd_cwnd can grow to.
snd_cwnd_stamp  Timestamp for when the congestion window was last validated.
snd_cwnd_used   Used as a high-water mark for how much of the
                congestion window is in use. It is used to adjust
                snd_cwnd down when the link is limited by the
                application rather than the network.

As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. As a minimum, the congestion control
mechanism must provide a valid name and must implement either the ssthresh,
cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook.

Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space. This space is preallocated, so
it is important to check that your private data fits in it. Alternatively,
space can be allocated elsewhere and a pointer to it stored here.

There are three kinds of congestion control algorithms currently: The
simplest ones are derived from TCP Reno (highspeed, scalable) and just
provide an alternative congestion window calculation.
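
As a rough illustration of how the variables listed above interact in a
Reno-style window calculation, here is a minimal userspace sketch. It is not
kernel code: struct cc_state and the cc_* helpers are hypothetical names
invented for this example, only the field names mirror tcp_sock, and each
ACK is assumed to cover exactly one packet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the congestion fields of tcp_sock. */
struct cc_state {
	uint32_t snd_cwnd;       /* congestion window, in packets */
	uint32_t snd_ssthresh;   /* slow start threshold */
	uint32_t snd_cwnd_cnt;   /* ACK counter used above the threshold */
	uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* We are in slow start while snd_cwnd is less than snd_ssthresh. */
static int cc_in_slow_start(const struct cc_state *s)
{
	return s->snd_cwnd < s->snd_ssthresh;
}

/* Process one ACK (assumption: it acknowledges exactly one packet). */
static void cc_on_ack(struct cc_state *s)
{
	assert(s->snd_cwnd > 0);

	if (cc_in_slow_start(s)) {
		/* Slow start: grow by one packet per ACK. */
		s->snd_cwnd++;
	} else if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
		/* Above the threshold: one packet per full window of ACKs. */
		s->snd_cwnd_cnt = 0;
		s->snd_cwnd++;
	}

	/* Never grow past the clamp. */
	if (s->snd_cwnd > s->snd_cwnd_clamp)
		s->snd_cwnd = s->snd_cwnd_clamp;
}

/* Reno-style response to loss: half the window, but at least 2. */
static uint32_t cc_ssthresh_on_loss(const struct cc_state *s)
{
	uint32_t half = s->snd_cwnd / 2;

	return half > 2 ? half : 2;
}
```

In this sketch snd_cwnd_cnt is what slows growth once the threshold is
passed: the window only advances after a full window's worth of ACKs has
been counted. A real module would express the same logic through the
cong_avoid and ssthresh hooks of tcp_congestion_ops.
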
More complex ones like BIC try to look at other events to provide better
heuristics. There are also round trip time based algorithms like
Vegas and Westwood+.

Good TCP congestion control is a complex problem because the algorithm
needs to maintain fairness and performance. Please review current
research and RFCs before developing new modules.

The default congestion control mechanism is chosen based on the
DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default
value then you can set it using sysctl net.ipv4.tcp_congestion_control. The
module will be autoloaded if needed and you will get the expected protocol. If
you ask for an unknown congestion method, then the sysctl attempt will fail.

If you remove a TCP congestion control module, then you will get the next
available one. Since reno cannot be built as a module, and cannot be
removed, it will always be available.

How the new TCP output machine [nyi] works.
===========================================

Data is kept on a single queue. The skb->users flag tells us if the frame is
one that has been queued already. To add a frame we throw it on the end. Ack
walks down the list from the start.

We keep a set of control flags:

	sk->tcp_pend_event

		TCP_PEND_ACK		Ack needed
		TCP_ACK_NOW		Needed now
		TCP_WINDOW		Window update check
		TCP_WINZERO		Zero probing

	sk->transmit_queue		The transmission frame begin
	sk->transmit_new		First new frame pointer
	sk->transmit_end		Where to add frames

	sk->tcp_last_tx_ack		Last ack seen
	sk->tcp_dup_ack			Dup ack count for fast retransmit

Frames are queued for output by tcp_write. We do our best to send the frames
off immediately if possible, but otherwise queue and compute the body
checksum in the copy.

When a write is done we try to clear any pending events and piggyback them.
If the window is full we queue full sized frames.
On the first timeout in zero window we split this.

On a timer we walk the retransmit list to send any retransmits, update the
backoff timers etc. A change of route table stamp causes a change of header
and recompute. We add any new tcp level headers and refinish the checksum
before sending.
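
The single-queue layout described in this section can be pictured with a
small userspace sketch. Since the output machine is marked [nyi], nothing
below corresponds to real kernel code: struct frame, struct tx_queue and the
tx_* helpers are hypothetical, and only the three pointer names are taken
from the list above. Sequence number wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame: just an ending sequence number and a list link. */
struct frame {
	unsigned int end_seq;  /* sequence number just past this frame */
	struct frame *next;
};

/* Hypothetical stand-in for the sk->transmit_* pointers. */
struct tx_queue {
	struct frame *transmit_queue; /* oldest frame still held */
	struct frame *transmit_new;   /* first frame not yet sent */
	struct frame *transmit_end;   /* last frame; new ones go after it */
};

/* To add a frame we throw it on the end. */
static void tx_enqueue(struct tx_queue *q, struct frame *f)
{
	assert(f != NULL);

	f->next = NULL;
	if (q->transmit_end)
		q->transmit_end->next = f;
	else
		q->transmit_queue = f;
	if (!q->transmit_new)
		q->transmit_new = f;
	q->transmit_end = f;
}

/* Hand out the first unsent frame, or NULL if everything was sent. */
static struct frame *tx_send_next(struct tx_queue *q)
{
	struct frame *f = q->transmit_new;

	if (f)
		q->transmit_new = f->next;
	return f;
}

/*
 * Ack walks down the list from the start, releasing frames that were
 * sent and are fully covered by ack_seq. Returns the number released.
 */
static int tx_ack(struct tx_queue *q, unsigned int ack_seq)
{
	int released = 0;

	while (q->transmit_queue &&
	       q->transmit_queue != q->transmit_new &&
	       q->transmit_queue->end_seq <= ack_seq) {
		q->transmit_queue = q->transmit_queue->next;
		released++;
	}
	if (!q->transmit_queue)
		q->transmit_end = NULL;
	return released;
}
```

Keeping sent and unsent frames on one list means a retransmit timer only
has to walk from transmit_queue up to transmit_new, while new data is always
appended at transmit_end.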