About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Based on kernel version 3.13. Page generated on 2014-01-20 22:04 EST.

TCP protocol
============

Last updated: 9 February 2008

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables in tcp_sock are used for congestion control:
snd_cwnd		The size of the congestion window.
snd_ssthresh		Slow start threshold. We are in slow start if
			snd_cwnd is less than this.
snd_cwnd_cnt		A counter used to slow down the rate of increase
			once we exceed the slow start threshold.
snd_cwnd_clamp		The maximum size that snd_cwnd can grow to.
snd_cwnd_stamp		Timestamp for when the congestion window was last
			validated.
snd_cwnd_used		Used as a high-water mark for how much of the
			congestion window is in use. It is used to adjust
			snd_cwnd down when the link is limited by the
			application rather than the network.

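To make the interplay of these fields concrete, here is a minimal userspace sketch of Reno-style window growth. The struct `cwnd_state` is a hypothetical stand-in that borrows the field names above; it is not the kernel's `struct tcp_sock`, and the sketch assumes every ACK covers exactly one full segment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tcp_sock fields listed above
 * (NOT the kernel's struct tcp_sock). Windows are counted in segments. */
struct cwnd_state {
	uint32_t snd_cwnd;       /* congestion window */
	uint32_t snd_ssthresh;   /* slow start threshold */
	uint32_t snd_cwnd_cnt;   /* ACKs counted toward the next increase */
	uint32_t snd_cwnd_clamp; /* snd_cwnd never grows past this */
};

/* Grow the window for one ACK, assuming it acknowledges one full segment. */
void on_ack(struct cwnd_state *s)
{
	assert(s->snd_cwnd > 0);

	if (s->snd_cwnd < s->snd_ssthresh) {
		/* Slow start: +1 segment per ACK, so the window
		 * roughly doubles every round trip. */
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
		return;
	}

	/* Congestion avoidance: snd_cwnd_cnt slows growth down to
	 * +1 segment per window's worth of ACKs (about one per RTT). */
	if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
		s->snd_cwnd_cnt = 0;
	}
}
```

Starting from snd_cwnd = 1 and snd_ssthresh = 4, three ACKs take the window to 4; from there it takes four more ACKs (one full window) to reach 5.
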
As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. At a minimum, name, ssthresh,
cong_avoid and min_cwnd must be valid.

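The registration rule can be sketched in userspace as follows. `struct cc_ops` and `cc_register()` are hypothetical simplifications of `struct tcp_congestion_ops` and `tcp_register_congestion_control()` that keep only the members the text calls mandatory; the real callbacks take a socket argument, and the duplicate-name and capacity checks are assumptions of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct tcp_congestion_ops. */
struct cc_ops {
	const char *name;
	unsigned (*ssthresh)(unsigned cwnd);  /* new threshold after loss */
	void (*cong_avoid)(unsigned *cwnd);   /* window growth on ACK */
	unsigned (*min_cwnd)(unsigned cwnd);  /* lower bound for cwnd */
};

#define CC_MAX 8
static const struct cc_ops *cc_table[CC_MAX];
static size_t cc_count;

/* Reject an algorithm that lacks a mandatory member or reuses a name. */
int cc_register(const struct cc_ops *ca)
{
	if (!ca->name || !ca->name[0] || !ca->ssthresh ||
	    !ca->cong_avoid || !ca->min_cwnd)
		return -EINVAL;
	for (size_t i = 0; i < cc_count; i++)
		if (strcmp(cc_table[i]->name, ca->name) == 0)
			return -EEXIST;
	if (cc_count == CC_MAX)
		return -ENOSPC;
	cc_table[cc_count++] = ca;
	return 0;
}

/* Example Reno-like callbacks, for illustration only. */
unsigned halve(unsigned cwnd) { return cwnd / 2 > 2 ? cwnd / 2 : 2; }
void grow_by_one(unsigned *cwnd) { (*cwnd)++; }
unsigned min_two(unsigned cwnd) { (void)cwnd; return 2; }
```

An ops table built from the three example callbacks registers successfully; the same table with `cong_avoid` set to NULL is refused with `-EINVAL`.
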
Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space. This is preallocated space, so
it is important to check that your private data will fit in it; alternatively,
the data can be allocated elsewhere and a pointer to it stored here.

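A userspace sketch of the size check: `fake_tcp_sock`, `fake_tcp_ca()` and `CA_PRIV_SIZE` are hypothetical stand-ins for the socket, the `tcp_ca(tp)` accessor and the preallocated area, and the size chosen here is illustrative only. A compile-time assertion turns an oversized private struct into a build failure rather than silent memory corruption; several in-tree modules make the same check with `BUILD_BUG_ON`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the preallocated per-socket private area. */
#define CA_PRIV_SIZE (16 * sizeof(uint64_t))

/* Hypothetical socket holding the preallocated space (u64 keeps it aligned). */
struct fake_tcp_sock {
	uint32_t snd_cwnd;
	uint64_t ca_priv[CA_PRIV_SIZE / sizeof(uint64_t)];
};

/* Plays the role of tcp_ca(tp): return a pointer to the private area. */
static inline void *fake_tcp_ca(struct fake_tcp_sock *tp)
{
	return tp->ca_priv;
}

/* One algorithm's private state (fields are illustrative). */
struct my_ca {
	uint32_t last_max_cwnd;
	uint32_t epoch_start;
	uint32_t ack_cnt;
};

/* The build fails here if the state ever outgrows the preallocated space. */
_Static_assert(sizeof(struct my_ca) <= CA_PRIV_SIZE,
	       "struct my_ca does not fit in the private area");

void my_ca_init(struct fake_tcp_sock *tp)
{
	struct my_ca *ca = fake_tcp_ca(tp);

	memset(ca, 0, sizeof(*ca));
	ca->last_max_cwnd = tp->snd_cwnd;
}
```

If the state must be larger, the alternative described above applies: allocate it elsewhere and keep only a pointer in the private area, which always fits.
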
There are currently three kinds of congestion control algorithms. The
simplest ones are derived from TCP reno (highspeed, scalable) and just
provide an alternative congestion window calculation. More complex
ones like BIC try to look at other events to provide better
heuristics. There are also round trip time based algorithms like
Vegas and Westwood+.

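For the Reno-derived kind, the "alternative calculation" can be as small as two formulas. The sketch below contrasts Reno's halving of the window on loss with a Scalable-TCP-style rule that shrinks it by only 1/8 and caps the number of ACKs needed per one-segment increase. The 1/8 decrease follows the published Scalable TCP proposal; the cap of 50 and all function names are assumptions of this sketch, not code quoted from any module.

```c
#include <stdint.h>

/* Reno: on loss, the new threshold is half the window (at least 2). */
uint32_t reno_ssthresh(uint32_t cwnd)
{
	uint32_t v = cwnd >> 1;

	return v > 2 ? v : 2;
}

/* Scalable-style: shrink by only 1/8 on loss (at least 2). */
uint32_t scalable_ssthresh(uint32_t cwnd)
{
	uint32_t v = cwnd - (cwnd >> 3);

	return v > 2 ? v : 2;
}

/* ACKs needed in congestion avoidance before cwnd grows by one segment. */
uint32_t reno_acks_per_step(uint32_t cwnd)
{
	return cwnd;                   /* one segment per window: linear per RTT */
}

uint32_t scalable_acks_per_step(uint32_t cwnd)
{
	return cwnd < 50 ? cwnd : 50;  /* capped, so big windows grow faster */
}
```

At cwnd = 100 a loss drops Reno's threshold to 50 but the Scalable-style one only to 88, and at cwnd = 1000 Reno needs 1000 ACKs per extra segment versus 50.
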
Good TCP congestion control is a complex problem because the algorithm
needs to maintain fairness and performance. Please review current
research and RFCs before developing new modules.

Which congestion control mechanism is used is determined by the setting of
the sysctl net.ipv4.tcp_congestion_control. The default congestion control
will be the last one registered (LIFO), so if you built everything as
modules, the default will be reno. If you build with the defaults from
Kconfig, then CUBIC will be built in (not a module) and it will end up as
the default.

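The LIFO rule, exactly as stated above, can be modeled with a tiny registry. This is a hypothetical illustration of the rule as described, not the kernel's list handling; `cc_add()` and `cc_default()` are invented names.

```c
#include <stddef.h>
#include <string.h>

#define CC_MAX 8

/* Names in registration order; the newest is at the end. */
static const char *cc_names[CC_MAX];
static size_t cc_count;

int cc_add(const char *name)
{
	if (cc_count == CC_MAX)
		return -1;
	cc_names[cc_count++] = name;
	return 0;
}

/* The rule as stated: the last one registered is the default. */
const char *cc_default(void)
{
	return cc_count ? cc_names[cc_count - 1] : NULL;
}
```

Registering reno (built in) and then a built-in cubic leaves cubic as the default, matching the Kconfig case; with everything else as unloaded modules, only reno is registered and it stays the default.
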
If you really want a particular default value then you will need
to set it with the sysctl. If you use a sysctl, the module will be autoloaded
if needed and you will get the expected protocol. If you ask for an
unknown congestion method, then the sysctl attempt will fail.

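On a running system the setting is changed with, for example, `sysctl -w net.ipv4.tcp_congestion_control=vegas`. The sketch below models the behavior the text describes: a registered name is accepted, a loadable one is "autoloaded" first, and an unknown one is rejected, leaving the old default in place. The module lists, the `-ENOENT` return value and every function name are assumptions of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define CC_MAX 8

/* Hypothetical state: algorithms already registered... */
static const char *cc_registered[CC_MAX] = { "reno", "cubic" };
static size_t cc_count = 2;
/* ...and modules that could be loaded on demand. */
static const char *const cc_loadable[] = { "vegas", "westwood", "htcp" };
static const char *cc_current = "cubic";

static int cc_find(const char *name)
{
	for (size_t i = 0; i < cc_count; i++)
		if (strcmp(cc_registered[i], name) == 0)
			return (int)i;
	return -1;
}

/* Stand-in for module autoloading: "load" the module if we know it. */
static int cc_autoload(const char *name)
{
	for (size_t i = 0; i < sizeof(cc_loadable) / sizeof(cc_loadable[0]); i++) {
		if (strcmp(cc_loadable[i], name) == 0 && cc_count < CC_MAX) {
			cc_registered[cc_count++] = cc_loadable[i];
			return 0;
		}
	}
	return -ENOENT;
}

/* Model of writing a name to net.ipv4.tcp_congestion_control. */
int cc_set_default(const char *name)
{
	int idx = cc_find(name);

	if (idx < 0) {
		if (cc_autoload(name) < 0)
			return -ENOENT;   /* unknown method: the write fails */
		idx = cc_find(name);
	}
	cc_current = cc_registered[idx];
	return 0;
}

const char *cc_get_default(void)
{
	return cc_current;
}
```

Setting "vegas" succeeds after the simulated autoload; a subsequent request for an unknown name fails and "vegas" remains the default.
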
If you remove a TCP congestion control module, then you will get the next
available one. Since reno cannot be built as a module, and cannot be
deleted, it will always be available.

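A matching sketch of removal under the LIFO rule: unloading the current default hands the role to the next most recently registered algorithm, and because reno is built in, the list can never become empty. The entries and `cc_unregister()` are hypothetical, and the `-EBUSY` refusal merely models "cannot be deleted".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct cc_entry {
	const char *name;
	bool builtin;     /* built-in entries (reno) can never be removed */
};

/* Hypothetical registrations, oldest first; the last one is the default. */
static struct cc_entry cc_list[8] = {
	{ "reno", true }, { "vegas", false }, { "htcp", false },
};
static size_t cc_count = 3;

/* LIFO rule: the most recently registered entry is the default. */
const char *cc_default(void)
{
	return cc_list[cc_count - 1].name;   /* never empty: reno stays */
}

/* Remove a module's entry; the default falls back to the next newest. */
int cc_unregister(const char *name)
{
	for (size_t i = 0; i < cc_count; i++) {
		if (strcmp(cc_list[i].name, name) != 0)
			continue;
		if (cc_list[i].builtin)
			return -EBUSY;
		memmove(&cc_list[i], &cc_list[i + 1],
			(cc_count - i - 1) * sizeof(cc_list[0]));
		cc_count--;
		return 0;
	}
	return -ENOENT;
}
```

Removing htcp and then vegas walks the default back to reno, and any attempt to remove reno itself is refused.
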
How the new TCP output machine [nyi] works
==========================================

Data is kept on a single queue. The skb->users flag tells us if the frame is
one that has been queued already. To add a frame, we append it to the end of
the queue. ACK processing walks down the list from the start.

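Because this section describes a design marked [nyi] (not yet implemented), the following is only a userspace sketch of the single-queue idea: frames are appended at the tail, and a cumulative ACK frees fully acknowledged frames from the head. `struct frame`, its `queued` flag (standing in for the role the text gives `skb->users`) and all functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical frame; 'queued' plays the role the text gives skb->users. */
struct frame {
	struct frame *next;
	uint32_t seq;   /* first sequence number carried */
	uint32_t len;   /* payload length in bytes */
	bool queued;    /* already handed down for transmission */
};

struct tx_queue {
	struct frame *head;  /* oldest unacknowledged frame */
	struct frame *tail;  /* where new frames are added */
};

/* a <= b in 32-bit sequence space, correct across wraparound. */
static bool seq_leq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

struct frame *frame_new(uint32_t seq, uint32_t len)
{
	struct frame *f = calloc(1, sizeof(*f));

	if (f) {
		f->seq = seq;
		f->len = len;
	}
	return f;
}

/* "To add a frame we throw it on the end." */
void txq_add(struct tx_queue *q, struct frame *f)
{
	f->next = NULL;
	if (q->tail)
		q->tail->next = f;
	else
		q->head = f;
	q->tail = f;
}

/* First frame that has not been queued for transmission yet. */
struct frame *txq_first_new(const struct tx_queue *q)
{
	struct frame *f = q->head;

	while (f && f->queued)
		f = f->next;
	return f;
}

/* Walk from the start, freeing every frame the cumulative ACK covers
 * completely. Returns the number of frames freed. */
int txq_ack(struct tx_queue *q, uint32_t ack)
{
	int freed = 0;

	while (q->head && seq_leq(q->head->seq + q->head->len, ack)) {
		struct frame *f = q->head;

		q->head = f->next;
		free(f);
		freed++;
	}
	if (!q->head)
		q->tail = NULL;
	return freed;
}
```

With frames covering bytes 1000 to 1299 in 100-byte pieces, an ACK of 1150 frees only the first frame (the second is partly unacknowledged), and an ACK of 1300 frees the remaining two.
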
We keep a set of control flags and queue pointers:

	sk->tcp_pend_event

		TCP_PEND_ACK			Ack needed
		TCP_ACK_NOW			Ack needed now
		TCP_WINDOW			Window update check
		TCP_WINZERO			Zero window probing

	sk->transmit_queue		Start of the transmission queue
	sk->transmit_new		Pointer to the first new frame
	sk->transmit_end		Where to add frames

	sk->tcp_last_tx_ack		Last ack seen
	sk->tcp_dup_ack			Dup ack count for fast retransmit

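The pending-event word and the duplicate-ACK counter can be sketched like this. The bit values, the `out_state` struct and both functions are hypothetical; the threshold of three duplicates is the conventional fast-retransmit trigger from RFC 5681, and the duplicate test is simplified to "same cumulative ACK as last time".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for the sk->tcp_pend_event flags. */
enum {
	TCP_PEND_ACK = 1 << 0,   /* ack needed */
	TCP_ACK_NOW  = 1 << 1,   /* ack needed now */
	TCP_WINDOW   = 1 << 2,   /* window update check */
	TCP_WINZERO  = 1 << 3,   /* zero window probing */
};

/* Conventional fast-retransmit trigger (RFC 5681). */
#define DUPACK_THRESHOLD 3

/* Hypothetical mirror of the per-socket fields listed above. */
struct out_state {
	unsigned tcp_pend_event;   /* OR of the flags above */
	uint32_t tcp_last_tx_ack;  /* last ack seen */
	unsigned tcp_dup_ack;      /* duplicate ack count */
};

/* Mark that an ACK is owed; "now" also requests immediate sending. */
void schedule_ack(struct out_state *s, bool now)
{
	s->tcp_pend_event |= TCP_PEND_ACK;
	if (now)
		s->tcp_pend_event |= TCP_ACK_NOW;
}

/* Process an incoming cumulative ACK. Returns true exactly once, on the
 * third duplicate, signalling that a fast retransmit is due. */
bool on_incoming_ack(struct out_state *s, uint32_t ack)
{
	if (ack == s->tcp_last_tx_ack)
		return ++s->tcp_dup_ack == DUPACK_THRESHOLD;

	/* New data acknowledged: remember it and reset the count. */
	s->tcp_last_tx_ack = ack;
	s->tcp_dup_ack = 0;
	return false;
}
```

Three repeats of the same ACK trigger one fast retransmit; a fourth repeat does not trigger another, and an ACK for new data resets the count.
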
Frames are queued for output by tcp_write. We do our best to send the frames
off immediately if possible; otherwise we queue them and compute the body
checksum during the copy.

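Computing the checksum "in the copy" means summing the data while it is being copied, so the payload is only touched once. A minimal sketch using the 16-bit ones'-complement Internet checksum from RFC 1071; `copy_and_csum()` is a hypothetical name, and its result still has to be combined with the header and pseudo-header sums and complemented before it goes on the wire.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and return the folded 16-bit
 * ones'-complement sum of the data (RFC 1071), computed in the same
 * pass. An odd trailing byte is padded with zero for the sum only. */
uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint32_t)src[i] << 8) | src[i + 1]; /* big-endian word */
	}
	if (i < len) {
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}
```

For the sample bytes 00 01 f2 03 f4 f5 f6 f7 from RFC 1071 this returns 0xddf2, whose complement 0x220d is the checksum value.
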
When a write is done, we try to clear any pending events by piggybacking them
on the outgoing frames. If the window is full, we queue full-sized frames. On
the first timeout in a zero window, we split such a frame.

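The two behaviors in the paragraph above can be sketched as follows, under stated assumptions: `PEND_ACK`, `struct seg` and both functions are hypothetical, and reading "we split this" as "split a queued full-sized frame into a small probe plus the remainder" is an interpretation; the probe size is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pending-event bit meaning "an ACK is owed to the peer". */
#define PEND_ACK 0x1u

struct seg {
	uint32_t len;      /* payload bytes */
	bool carries_ack;  /* true if this segment satisfies a pending ACK */
};

/* Build an outgoing data segment, piggybacking any pending ACK on it so
 * that no separate bare ACK has to be sent. */
void build_segment(struct seg *s, uint32_t len, unsigned *pend_events)
{
	s->len = len;
	s->carries_ack = (*pend_events & PEND_ACK) != 0;
	*pend_events &= ~PEND_ACK;   /* the pending event is now cleared */
}

/* On the first zero-window timeout, split a full-sized queued frame into
 * a probe of at most 'probe' bytes and the remainder. */
void split_for_probe(uint32_t len, uint32_t probe,
		     uint32_t *head, uint32_t *rest)
{
	*head = len < probe ? len : probe;
	*rest = len - *head;
}
```

The first write after an ACK becomes due carries it and clears the event, so the next write carries nothing extra; splitting a 1460-byte frame with a 1-byte probe leaves 1459 bytes queued.
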
On a timer we walk the retransmit list to send any retransmits, update the
backoff timers, etc. A change of the route table stamp causes the header to
be changed and recomputed. We add any new TCP-level headers and refinish the
checksum before sending.
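
A sketch of that timer pass, assuming a plain array as the retransmit list and a millisecond timeout that doubles on every expiry. `struct rtx_frame`, `struct rtx_state` and `on_rtx_timer()` are hypothetical; incrementing `resends` stands in for actually transmitting, and the 120 s cap matches the value Linux uses for TCP_RTO_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Cap on the timeout; 120 s matches Linux's TCP_RTO_MAX. */
#define RTO_MAX_MS 120000u

struct rtx_frame {
	uint32_t seq;        /* first sequence number of the frame */
	unsigned resends;    /* how many times it has been retransmitted */
	uint32_t hdr_stamp;  /* route table stamp its header was built for */
};

struct rtx_state {
	uint32_t rto_ms;     /* current retransmission timeout */
	unsigned backoff;    /* consecutive timeouts so far */
};

/* Timer handler: walk the retransmit list, rebuild headers built against
 * a stale route stamp, "resend" every frame, then double the timeout.
 * Returns how many headers (and checksums) had to be recomputed. */
size_t on_rtx_timer(struct rtx_state *st, struct rtx_frame *q, size_t n,
		    uint32_t route_stamp)
{
	size_t rebuilt = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].hdr_stamp != route_stamp) {
			/* Route changed: new header, checksum refinished. */
			q[i].hdr_stamp = route_stamp;
			rebuilt++;
		}
		q[i].resends++;   /* stands in for handing the frame down */
	}

	/* Exponential backoff, saturating at RTO_MAX_MS. */
	st->backoff++;
	st->rto_ms = st->rto_ms > RTO_MAX_MS / 2 ? RTO_MAX_MS : st->rto_ms * 2;
	return rebuilt;
}
```

With a 200 ms timeout and one frame whose header predates the current route stamp, one pass rebuilds that single header and raises the timeout to 400 ms; repeated expiries saturate at 120000 ms.
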

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.