Documentation / networking / packet_mmap.txt




Based on kernel version 3.9. Page generated on 2013-05-02 23:11 EST.

1	--------------------------------------------------------------------------------
2	+ ABSTRACT
3	--------------------------------------------------------------------------------
4	
This file documents the mmap() facility available with the PACKET
socket interface on 2.4/2.6/3.x kernels. This type of socket is used to
i) capture network traffic with utilities like tcpdump, ii) transmit network
traffic, or for any other purpose that needs raw access to a network interface.
9	
10	You can find the latest version of this document at:
11	    http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap
12	
13	Howto can be found at:
14	    http://wiki.gnu-log.net (packet_mmap)
15	
16	Please send your comments to
17	    Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
18	    Johann Baudy <johann.baudy@gnu-log.net>
19	
20	-------------------------------------------------------------------------------
21	+ Why use PACKET_MMAP
22	--------------------------------------------------------------------------------
23	
In Linux 2.4/2.6/3.x, if PACKET_MMAP is not enabled, the capture process is very
inefficient. It uses very limited buffers and requires one system call to
capture each packet; it requires two if you want to get the packet's timestamp
(like libpcap always does).

On the other hand, PACKET_MMAP is very efficient. PACKET_MMAP provides a
size-configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way, reading packets just requires waiting for them;
most of the time there is no need to issue a single system call. Concerning
transmission, multiple packets can be sent through one system call to get the
highest bandwidth. Using a buffer shared between the kernel and the user
also has the benefit of minimizing packet copies.
36	
It's fine to use PACKET_MMAP to improve the performance of the capture and
transmission process, but it isn't everything. At the very least, if you are
capturing at high speeds (this is relative to the CPU speed), you should check
whether the device driver of your network interface card supports some sort of
interrupt load mitigation or (even better) NAPI, and make sure it is
enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
supported by the devices on your network. Pinning the IRQs of your network
interface card to a CPU can also be an advantage.
45	
46	--------------------------------------------------------------------------------
47	+ How to use mmap() to improve capture process
48	--------------------------------------------------------------------------------
49	
From the user's standpoint, you should use the higher level libpcap library, which
is a de facto standard, portable across nearly all operating systems
including Win32.

That said, at the time of this writing, the official libpcap 0.8.1 is out and
doesn't include support for PACKET_MMAP, and the libpcap included in your
distribution probably doesn't either.
56	
57	I'm aware of two implementations of PACKET_MMAP in libpcap:
58	
    http://wiki.ipxwarzone.com/              (by Simon Patarin, based on libpcap 0.6.2)
    http://public.lanl.gov/cpw/              (by Phil Wood, based on latest libpcap)
61	
62	The rest of this document is intended for people who want to understand
63	the low level details or want to improve libpcap by including PACKET_MMAP
64	support.
65	
66	--------------------------------------------------------------------------------
67	+ How to use mmap() directly to improve capture process
68	--------------------------------------------------------------------------------
69	
From the system call standpoint, the use of PACKET_MMAP involves
the following process:
72	
73	
74	[setup]     socket() -------> creation of the capture socket
75	            setsockopt() ---> allocation of the circular buffer (ring)
76	                              option: PACKET_RX_RING
77	            mmap() ---------> mapping of the allocated buffer to the
78	                              user process
79	
80	[capture]   poll() ---------> to wait for incoming packets
81	
82	[shutdown]  close() --------> destruction of the capture socket and
83	                              deallocation of all associated 
84	                              resources.
85	
86	
Socket creation and destruction are straightforward, and are done
the same way with or without PACKET_MMAP:

 int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));

where mode is SOCK_RAW for the raw interface, where link level
information can be captured, or SOCK_DGRAM for the cooked
interface, where link level information capture is not
supported and a link level pseudo-header is provided
by the kernel.

The destruction of the socket and all associated resources
is done by a simple call to close(fd).
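
The [setup] steps of the diagram above can be sketched as follows. This is a
minimal sketch, not a reference implementation: the helper name open_rx_ring()
is hypothetical, the caller is assumed to pass a struct tpacket_req that
satisfies the constraints described later in this document, and opening a
packet socket normally requires the CAP_NET_RAW capability.

```c
#include <stddef.h>
#include <arpa/inet.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: create a capture socket, allocate an RX ring
 * described by *req and map it into the process.
 * Returns the socket on success (ring address in *ring, length in
 * *ring_len), or -1 on failure.
 */
static int open_rx_ring(const struct tpacket_req *req,
			void **ring, size_t *ring_len)
{
	size_t len = (size_t)req->tp_block_size * req->tp_block_nr;
	void *map;
	int fd;

	fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0)
		return -1;

	/* allocate the circular buffer in the kernel */
	if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0) {
		close(fd);
		return -1;
	}

	/* one mmap() call covers all blocks of the ring */
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	*ring = map;
	*ring_len = len;
	return fd;
}
```

At shutdown the caller would typically call munmap(ring, ring_len) before
close(fd).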
100	
Next I will describe the PACKET_MMAP settings and their constraints,
as well as the mapping of the circular buffer in the user process and
the use of this buffer.
104	
105	--------------------------------------------------------------------------------
106	+ How to use mmap() directly to improve transmission process
107	--------------------------------------------------------------------------------
The transmission process is similar to capture, as shown below.
109	
110	[setup]          socket() -------> creation of the transmission socket
111	                 setsockopt() ---> allocation of the circular buffer (ring)
112	                                   option: PACKET_TX_RING
113	                 bind() ---------> bind transmission socket with a network interface
114	                 mmap() ---------> mapping of the allocated buffer to the
115	                                   user process
116	
117	[transmission]   poll() ---------> wait for free packets (optional)
118	                 send() ---------> send all packets that are set as ready in
119	                                   the ring
120	                                   The flag MSG_DONTWAIT can be used to return
121	                                   before end of transfer.
122	
123	[shutdown]  close() --------> destruction of the transmission socket and
124	                              deallocation of all associated resources.
125	
Binding the socket to your network interface is mandatory (with zero copy) to
know the header size of frames used in the circular buffer.

As with capture, each frame contains two parts:
130	
 --------------------
| struct tpacket_hdr | Header. It contains the status
|                    | of this frame
|--------------------|
| data buffer        |
.                    .  Data that will be sent over the network interface.
.                    .
 --------------------
139	
 bind() associates the socket with your network interface through the
 sll_ifindex field of struct sockaddr_ll.

 Initialization example:

 struct sockaddr_ll my_addr;
 struct ifreq s_ifr;
 ...

 /* zero both structures so that unused fields are well defined */
 memset(&s_ifr, 0, sizeof(s_ifr));
 memset(&my_addr, 0, sizeof(my_addr));
 strncpy(s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name) - 1);

 /* get interface index of eth0 */
 ioctl(fd, SIOCGIFINDEX, &s_ifr);

 /* fill sockaddr_ll struct to prepare binding */
 my_addr.sll_family = AF_PACKET;
 my_addr.sll_protocol = htons(ETH_P_ALL);
 my_addr.sll_ifindex = s_ifr.ifr_ifindex;

 /* bind socket to eth0 */
 bind(fd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
161	
162	 A complete tutorial is available at: http://wiki.gnu-log.net/
163	
By default, the user should put data at:
 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)

So, whichever socket mode you choose (SOCK_DGRAM or SOCK_RAW),
the beginning of the user data will be at:
 frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

If you wish to put user data at a custom offset from the beginning of
the frame (for payload alignment with SOCK_RAW mode, for instance), you
can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). For this
to work, it must be enabled beforehand with setsockopt()
and the PACKET_TX_HAS_OFF option.
176	
177	--------------------------------------------------------------------------------
178	+ PACKET_MMAP settings
179	--------------------------------------------------------------------------------
180	
PACKET_MMAP is set up from user level code with a call like

 - Capture process
     setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
 - Transmission process
     setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))

The most significant argument in the previous call is the req parameter,
which must have the following structure:
190	
191	    struct tpacket_req
192	    {
193	        unsigned int    tp_block_size;  /* Minimal size of contiguous block */
194	        unsigned int    tp_block_nr;    /* Number of blocks */
195	        unsigned int    tp_frame_size;  /* Size of frame */
196	        unsigned int    tp_frame_nr;    /* Total number of frames */
197	    };
198	
This structure is defined in /usr/include/linux/if_packet.h and establishes a
circular buffer (ring) of unswappable memory.
Because the ring is mapped in the capture process, captured frames and
related meta-information like timestamps can be read without requiring a
system call.

Frames are grouped in blocks. Each block is a physically contiguous
region of memory and holds tp_block_size/tp_frame_size frames. The total number
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because

    frames_per_block = tp_block_size/tp_frame_size

Indeed, packet_set_ring checks that the following condition is true:

    frames_per_block * tp_block_nr == tp_frame_nr

Let's see an example with the following values:
215	
216	     tp_block_size= 4096
217	     tp_frame_size= 2048
218	     tp_block_nr  = 4
219	     tp_frame_nr  = 8
220	
221	we will get the following buffer structure:
222	
223	        block #1                 block #2         
224	+---------+---------+    +---------+---------+    
225	| frame 1 | frame 2 |    | frame 3 | frame 4 |    
226	+---------+---------+    +---------+---------+    
227	
228	        block #3                 block #4
229	+---------+---------+    +---------+---------+
230	| frame 5 | frame 6 |    | frame 7 | frame 8 |
231	+---------+---------+    +---------+---------+
232	
A frame can be of any size, provided that it fits in a block. A block
can only hold an integer number of frames; in other words, a frame cannot
span two blocks, so there are some details you have to take into
account when choosing the frame size. See "Mapping and use of the circular
buffer (ring)".
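
Because tp_frame_nr is redundant, it can be derived instead of being typed by
hand. The following is a minimal sketch; the helper name fill_ring_req() is
hypothetical and is not part of any kernel or libc interface.

```c
#include <linux/if_packet.h>

/*
 * Hypothetical helper: fill *req from a block size, a block count and a
 * frame size, deriving tp_frame_nr so that
 * frames_per_block * tp_block_nr == tp_frame_nr holds.
 * Returns 0 on success, -1 if no frame fits in a block.
 */
static int fill_ring_req(struct tpacket_req *req, unsigned int block_size,
			 unsigned int block_nr, unsigned int frame_size)
{
	unsigned int frames_per_block;

	if (frame_size == 0 || frame_size > block_size)
		return -1;

	/* integer division: a frame never spans two blocks */
	frames_per_block = block_size / frame_size;

	req->tp_block_size = block_size;
	req->tp_block_nr = block_nr;
	req->tp_frame_size = frame_size;
	req->tp_frame_nr = frames_per_block * block_nr;
	return 0;
}
```

With the values of the example above (block size 4096, 4 blocks, frame size
2048) the helper sets tp_frame_nr to 8.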
238	
239	--------------------------------------------------------------------------------
240	+ PACKET_MMAP setting constraints
241	--------------------------------------------------------------------------------
242	
243	In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 branch),
244	the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit architecture or
245	16384 in a 64 bit architecture. For information on these kernel versions
246	see http://pusa.uv.es/~ulisses/packet_mmap/packet_mmap.pre-2.4.26_2.6.5.txt
247	
248	 Block size limit
249	------------------
250	
As stated earlier, each block is a contiguous physical region of memory. These
memory regions are allocated with calls to the __get_free_pages() function. As
the name indicates, this function allocates pages of memory, and the second
argument is "order", the base-two logarithm of the number of pages; that is,
(for PAGE_SIZE == 4096) order=0 ==> 4096 bytes, order=1 ==> 8192 bytes,
order=2 ==> 16384 bytes, etc. The maximum size of a
region allocated by __get_free_pages is determined by the MAX_ORDER macro. More
precisely, the limit can be calculated as:

   PAGE_SIZE << MAX_ORDER

   In an i386 architecture PAGE_SIZE is 4096 bytes
   In a 2.4/i386 kernel MAX_ORDER is 10
   In a 2.6/i386 kernel MAX_ORDER is 11

So __get_free_pages can allocate as much as 4MB or 8MB in a 2.4/2.6 kernel
respectively, with an i386 architecture.

User space programs can include /usr/include/sys/user.h and
/usr/include/linux/mmzone.h to get the PAGE_SIZE and MAX_ORDER declarations.

The page size can also be determined dynamically with the getpagesize(2)
system call.
274	
275	 Block number limit
276	--------------------
277	
To understand the constraints of PACKET_MMAP, we have to look at the structure
used to hold the pointers to each block.

Currently, this structure is a vector called pg_vec, dynamically allocated with
kmalloc; its size limits the number of blocks that can be allocated.
283	
284	    +---+---+---+---+
285	    | x | x | x | x |
286	    +---+---+---+---+
287	      |   |   |   |
288	      |   |   |   v
289	      |   |   v  block #4
290	      |   v  block #3
291	      v  block #2
292	     block #1
293	
kmalloc allocates any number of bytes of physically contiguous memory from
a pool of predetermined sizes. This pool of memory is maintained by the slab
allocator, which is ultimately responsible for doing the allocation and
hence imposes the maximum amount of memory that kmalloc can allocate.

In a 2.4/2.6 kernel and the i386 architecture, the limit is 131072 bytes. The
predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
entries of /proc/slabinfo.

In a 32 bit architecture, pointers are 4 bytes long, so the total number of
pointers to blocks is
305	
306	     131072/4 = 32768 blocks
307	
308	 PACKET_MMAP buffer size calculator
309	------------------------------------
310	
311	Definitions:
312	
<size-max>    : is the maximum size that can be allocated with kmalloc (see /proc/slabinfo)
<pointer size>: depends on the architecture -- sizeof(void *)
<page size>   : depends on the architecture -- PAGE_SIZE or getpagesize(2)
<max-order>   : is the value defined with MAX_ORDER
<frame size>  : it's an upper bound of frame's capture size (more on this later)

From these definitions we derive

	<block number> = <size-max>/<pointer size>
	<block size> = <page size> << <max-order>

so the maximum buffer size is

	<block number> * <block size>

and the number of frames is

	<block number> * <block size> / <frame size>

Suppose the following parameters, which apply to a 2.6 kernel and an
i386 architecture:

	<size-max> = 131072 bytes
	<pointer size> = 4 bytes
	<page size> = 4096 bytes
	<max-order> = 11

and a value for <frame size> of 2048 bytes. These parameters will yield

	<block number> = 131072/4 = 32768 blocks
	<block size> = 4096 << 11 = 8 MiB.

and hence the buffer will have a size of 262144 MiB. So it can hold
262144 MiB / 2048 bytes = 134217728 frames.

Actually, this buffer size is not possible with an i386 architecture.
Remember that the memory is allocated in kernel space; in the case of
an i386 kernel, the kernel's memory size is limited to 1GiB.

Memory allocations are not freed until the socket is closed. The memory
allocations are done with GFP_KERNEL priority, which basically means that
the allocation can wait and swap other processes' memory in order to allocate
the necessary memory, so normally the limits can be reached.
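
The calculation above can be written down as a few small functions. This is
only an illustration of the arithmetic: the function names are hypothetical,
and all inputs are supplied by the caller rather than read from the running
kernel. 64 bit integers are used because the intermediate buffer size
(262144 MiB in the example) does not fit in 32 bits.

```c
#include <stdint.h>

/* Upper bound on the number of blocks: <size-max> / <pointer size>. */
static uint64_t max_block_number(uint64_t size_max, uint64_t pointer_size)
{
	return size_max / pointer_size;
}

/* Upper bound on the size of one block: <page size> << <max-order>. */
static uint64_t max_block_size(uint64_t page_size, unsigned int max_order)
{
	return page_size << max_order;
}

/* Upper bound on the number of frames the whole buffer can hold. */
static uint64_t max_frame_number(uint64_t size_max, uint64_t pointer_size,
				 uint64_t page_size, unsigned int max_order,
				 uint64_t frame_size)
{
	return max_block_number(size_max, pointer_size) *
	       max_block_size(page_size, max_order) / frame_size;
}
```

Called with the 2.6/i386 parameters listed above, that is
max_frame_number(131072, 4, 4096, 11, 2048), the result is 134217728 frames.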
356	
357	 Other constraints
358	-------------------
359	
If you check the source code you will see that what I draw here as a frame
is not only the link level frame. At the beginning of each frame there is a
header called struct tpacket_hdr, used in PACKET_MMAP to hold the link level
frame's meta information, such as the timestamp. So what we draw here as a
frame is really the following (from include/linux/if_packet.h):
365	
366	/*
367	   Frame structure:
368	
369	   - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
370	   - struct tpacket_hdr
371	   - pad to TPACKET_ALIGNMENT=16
372	   - struct sockaddr_ll
373	   - Gap, chosen so that packet data (Start+tp_net) aligns to 
374	     TPACKET_ALIGNMENT=16
375	   - Start+tp_mac: [ Optional MAC header ]
376	   - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
377	   - Pad to align to TPACKET_ALIGNMENT=16
378	 */
379	 
 The following conditions are checked in packet_set_ring:

   tp_block_size must be a multiple of PAGE_SIZE (1)
   tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
   tp_frame_size must be a multiple of TPACKET_ALIGNMENT
   tp_frame_nr   must be exactly frames_per_block*tp_block_nr

Note that tp_block_size should be chosen to be a power of two, or memory
will be wasted.
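
A program can run the same checks itself before calling setsockopt(), which
makes a rejected request easier to diagnose. The sketch below mirrors the four
conditions as they are worded above; the function name is hypothetical, and
the kernel remains the authority on what it accepts.

```c
#include <linux/if_packet.h>

/*
 * Hypothetical pre-check mirroring the four conditions listed above.
 * page_size is supplied by the caller, e.g. from getpagesize().
 * Returns 1 if req passes all of them, 0 otherwise.
 */
static int ring_req_is_valid(const struct tpacket_req *req,
			     unsigned int page_size)
{
	unsigned int frames_per_block;

	/* tp_block_size must be a multiple of the page size */
	if (page_size == 0 || req->tp_block_size == 0 ||
	    req->tp_block_size % page_size != 0)
		return 0;
	/* tp_frame_size must be greater than TPACKET_HDRLEN */
	if (req->tp_frame_size <= TPACKET_HDRLEN)
		return 0;
	/* tp_frame_size must be a multiple of TPACKET_ALIGNMENT */
	if (req->tp_frame_size % TPACKET_ALIGNMENT != 0)
		return 0;

	/* tp_frame_nr must be exactly frames_per_block * tp_block_nr */
	frames_per_block = req->tp_block_size / req->tp_frame_size;
	if (frames_per_block == 0)
		return 0;
	return frames_per_block * req->tp_block_nr == req->tp_frame_nr;
}
```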
389	
390	--------------------------------------------------------------------------------
391	+ Mapping and use of the circular buffer (ring)
392	--------------------------------------------------------------------------------
393	
The mapping of the buffer in the user process is done with the conventional
mmap function. Even though the circular buffer is composed of several physically
discontiguous blocks of memory, they are contiguous in user space, hence
just one call to mmap is needed:

    mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

If tp_frame_size is a divisor of tp_block_size, frames will be
contiguously spaced by tp_frame_size bytes. If not, there will be a gap
after every tp_block_size/tp_frame_size frames. This is because a frame
cannot span two blocks.
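
The position of a frame inside the mapping follows directly from this rule.
The sketch below computes it; the function name is hypothetical, and frames
are counted from 0 here, whereas the diagrams in this document count from 1.

```c
#include <stddef.h>

/*
 * Byte offset of frame number idx (counted from 0) from the start of the
 * mapped ring. Frames never span two blocks, so any space left over at
 * the end of a block is skipped.
 */
static size_t frame_offset(size_t idx, size_t block_size, size_t frame_size)
{
	size_t frames_per_block = block_size / frame_size;

	return (idx / frames_per_block) * block_size +
	       (idx % frames_per_block) * frame_size;
}
```

For example, with a block size of 4096 and a frame size of 1536, only two
frames fit in a block, so frame 2 starts at offset 4096 and not at 3072.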
406	
At the beginning of each frame there is a status field (see
struct tpacket_hdr). If this field is 0, the frame is ready
to be used by the kernel. If not, there is a frame the user can read
and the following flags apply:
411	
412	+++ Capture process:
413	     from include/linux/if_packet.h
414	
415	     #define TP_STATUS_COPY          2 
416	     #define TP_STATUS_LOSING        4 
417	     #define TP_STATUS_CSUMNOTREADY  8 
418	
TP_STATUS_COPY        : This flag indicates that the frame (and associated
                        meta information) has been truncated because it's
                        larger than tp_frame_size. This packet can be
                        read entirely with recvfrom().

                        For this to work, it must be
                        enabled beforehand with setsockopt() and
                        the PACKET_COPY_THRESH option.

                        The number of frames that can be buffered to
                        be read with recvfrom is limited like a normal socket.
                        See the SO_RCVBUF option in the socket(7) man page.

TP_STATUS_LOSING      : indicates there were packet drops since the last time
                        statistics were checked with getsockopt() and
                        the PACKET_STATISTICS option.

TP_STATUS_CSUMNOTREADY: currently it's used for outgoing IP packets whose
                        checksum will be computed in hardware. So while
                        reading the packet we should not try to check the
                        checksum.
440	
For convenience there are also the following defines:

     #define TP_STATUS_KERNEL        0
     #define TP_STATUS_USER          1

The kernel initializes all frames to TP_STATUS_KERNEL. When the kernel
receives a packet, it puts it in the buffer and updates the status with
at least the TP_STATUS_USER flag. Then the user can read the packet;
once the packet is read, the user must zero the status field so the kernel
can use that frame buffer again.

The user can use poll (any other variant should apply too) to check if new
packets are in the ring:

    struct pollfd pfd;

    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLIN|POLLRDNORM|POLLERR;

    if (status == TP_STATUS_KERNEL)
        retval = poll(&pfd, 1, timeout);

Checking the status value first and then polling for frames does not
cause a race condition.
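
The ownership handshake described above can be sketched as a loop over the
ring. This is a minimal sketch under several assumptions: the helper name is
hypothetical, the ring uses TPACKET_V1 headers, frames are spaced exactly
frame_size bytes apart, and the GCC/Clang builtin __sync_synchronize() is
used as a conservative memory barrier. A real program would process the
packet data where the sketch only adds up the captured lengths.

```c
#include <stddef.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: starting at frame *next, consume every frame that
 * the kernel has marked TP_STATUS_USER and hand it back to the kernel.
 * Adds the captured length of each frame to *bytes.
 * Returns the number of frames consumed; *next is left at the first
 * frame still owned by the kernel.
 */
static unsigned int drain_rx_ring(void *ring, unsigned int frame_nr,
				  unsigned int frame_size, unsigned int *next,
				  unsigned long *bytes)
{
	unsigned int consumed = 0;

	while (consumed < frame_nr) {
		struct tpacket_hdr *hdr = (struct tpacket_hdr *)
			((char *)ring + (size_t)*next * frame_size);

		if (!(hdr->tp_status & TP_STATUS_USER))
			break;		/* frame still belongs to the kernel */

		/* read the frame contents only after the status */
		__sync_synchronize();

		/* with SOCK_RAW the packet starts at (char *)hdr + hdr->tp_mac */
		*bytes += hdr->tp_snaplen;

		/* finish reading before releasing the frame */
		__sync_synchronize();
		hdr->tp_status = TP_STATUS_KERNEL;

		*next = (*next + 1) % frame_nr;
		consumed++;
	}
	return consumed;
}
```

When the function returns 0, the frame at *next has status TP_STATUS_KERNEL
and the caller can block in poll() as shown above.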
466	
+++ Transmission process:
These defines are also used for transmission:

     #define TP_STATUS_AVAILABLE        0 // Frame is available
     #define TP_STATUS_SEND_REQUEST     1 // Frame will be sent on next send()
     #define TP_STATUS_SENDING          2 // Frame is currently in transmission
     #define TP_STATUS_WRONG_FORMAT     4 // Frame format is not correct

First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
packet, the user fills the data buffer of an available frame, sets tp_len to
the current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
This can be done on multiple frames. Once the user is ready to transmit, it
calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
forwarded to the network device. The kernel sets the status of each frame
being sent to TP_STATUS_SENDING until the end of the transfer.
At the end of each transfer, the buffer status returns to TP_STATUS_AVAILABLE.

    header->tp_len = in_i_size;
    header->tp_status = TP_STATUS_SEND_REQUEST;
    retval = send(fd, NULL, 0, 0);

The user can also use poll() to wait until a buffer becomes available
(that is, while status == TP_STATUS_SENDING):

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLOUT;
    retval = poll(&pfd, 1, timeout);
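
Filling one frame for transmission can be sketched as follows. The helper
name is hypothetical, the frame is assumed to use the TPACKET_V1 header and
the default data offset given earlier (frame base + TPACKET_HDRLEN -
sizeof(struct sockaddr_ll)), and __sync_synchronize() is a GCC/Clang builtin
used here as a conservative memory barrier.

```c
#include <string.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: copy len bytes into one TPACKET_V1 frame and mark
 * it for transmission. Returns 0 on success, -1 if the frame is not
 * available or the data does not fit.
 */
static int queue_tx_frame(void *frame, unsigned int frame_size,
			  const void *data, unsigned int len)
{
	struct tpacket_hdr *hdr = frame;
	unsigned int off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

	if (hdr->tp_status != TP_STATUS_AVAILABLE)
		return -1;
	if (frame_size < off || len > frame_size - off)
		return -1;

	memcpy((char *)frame + off, data, len);
	hdr->tp_len = len;

	/* publish the data before the status change becomes visible */
	__sync_synchronize();
	hdr->tp_status = TP_STATUS_SEND_REQUEST;
	return 0;
}
```

After one or more frames have been queued this way, a single
send(fd, NULL, 0, 0) call hands all of them to the kernel.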
496	
497	-------------------------------------------------------------------------------
498	+ What TPACKET versions are available and when to use them?
499	-------------------------------------------------------------------------------
500	
 int val = tpacket_version;
 socklen_t len = sizeof(val);
 setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
 getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, &len);

where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, or TPACKET_V3.
506	
507	TPACKET_V1:
508		- Default if not otherwise specified by setsockopt(2)
509		- RX_RING, TX_RING available
510		- VLAN metadata information available for packets
511		  (TP_STATUS_VLAN_VALID)
512	
513	TPACKET_V1 --> TPACKET_V2:
514		- Made 64 bit clean due to unsigned long usage in TPACKET_V1
515		  structures, thus this also works on 64 bit kernel with 32 bit
516		  userspace and the like
517		- Timestamp resolution in nanoseconds instead of microseconds
518		- RX_RING, TX_RING available
519		- How to switch to TPACKET_V2:
520			1. Replace struct tpacket_hdr by struct tpacket2_hdr
521			2. Query header len and save
522			3. Set protocol version to 2, set up ring as usual
523			4. For getting the sockaddr_ll,
524			   use (void *)hdr + TPACKET_ALIGN(hdrlen) instead of
525			   (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
526	
527	TPACKET_V2 --> TPACKET_V3:
528		- Flexible buffer implementation:
529			1. Blocks can be configured with non-static frame-size
530			2. Read/poll is at a block-level (as opposed to packet-level)
531			3. Added poll timeout to avoid indefinite user-space wait
532			   on idle links
533			4. Added user-configurable knobs:
534				4.1 block::timeout
535				4.2 tpkt_hdr::sk_rxhash
536		- RX Hash data available in user space
537		- Currently only RX_RING available
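
Steps 2 and 3 of the switch to TPACKET_V2 listed above can be sketched as
follows. The helper name is hypothetical. The sketch relies on the
PACKET_HDRLEN socket option from linux/if_packet.h, which takes the version
as input and returns the matching header length; treat the exact call
sequence as an assumption to verify against your kernel headers.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: query the TPACKET_V2 header length, then switch
 * the socket to TPACKET_V2. It is meant to be called before the ring is
 * set up. Returns 0 on success with the header length in *hdrlen,
 * -1 on failure.
 */
static int select_tpacket_v2(int fd, int *hdrlen)
{
	int val = TPACKET_V2;
	socklen_t len = sizeof(val);

	/* in: the version asked about, out: its header length */
	if (getsockopt(fd, SOL_PACKET, PACKET_HDRLEN, &val, &len) < 0)
		return -1;
	*hdrlen = val;

	val = TPACKET_V2;
	if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val)) < 0)
		return -1;
	return 0;
}
```

The saved header length is then used in step 4 to locate the sockaddr_ll at
(void *)hdr + TPACKET_ALIGN(hdrlen).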
538	
539	-------------------------------------------------------------------------------
540	+ AF_PACKET fanout mode
541	-------------------------------------------------------------------------------
542	
543	In the AF_PACKET fanout mode, packet reception can be load balanced among
544	processes. This also works in combination with mmap(2) on packet sockets.
545	
546	Minimal example code by David S. Miller (try things like "./test eth0 hash",
547	"./test eth0 lb", etc.):
548	
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include <sys/types.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <unistd.h>
#include <arpa/inet.h>

#include <linux/if_ether.h>
#include <linux/if_packet.h>

#include <net/if.h>

static const char *device_name;
static int fanout_type;
static int fanout_id;

#ifndef PACKET_FANOUT
# define PACKET_FANOUT			18
# define PACKET_FANOUT_HASH		0
# define PACKET_FANOUT_LB		1
#endif

/* Returns a bound socket that has joined the fanout group, or -1 on error. */
static int setup_socket(void)
{
	int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
	struct sockaddr_ll ll;
	struct ifreq ifr;
	int fanout_arg;

	if (fd < 0) {
		perror("socket");
		return -1;
	}

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, device_name, sizeof(ifr.ifr_name) - 1);
	err = ioctl(fd, SIOCGIFINDEX, &ifr);
	if (err < 0) {
		perror("SIOCGIFINDEX");
		close(fd);
		return -1;
	}

	memset(&ll, 0, sizeof(ll));
	ll.sll_family = AF_PACKET;
	ll.sll_ifindex = ifr.ifr_ifindex;
	err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
	if (err < 0) {
		perror("bind");
		close(fd);
		return -1;
	}

	/* low 16 bits: group id, high 16 bits: fanout type */
	fanout_arg = (fanout_id | (fanout_type << 16));
	err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
			 &fanout_arg, sizeof(fanout_arg));
	if (err) {
		perror("setsockopt");
		close(fd);
		return -1;
	}

	return fd;
}

/* Runs in each child process; never returns. */
static void fanout_thread(void)
{
	int fd = setup_socket();
	int limit = 10000;

	if (fd < 0)
		exit(EXIT_FAILURE);

	while (limit-- > 0) {
		char buf[1600];
		ssize_t err;

		err = read(fd, buf, sizeof(buf));
		if (err < 0) {
			perror("read");
			exit(EXIT_FAILURE);
		}
		if ((limit % 10) == 0)
			fprintf(stdout, "(%d) \n", getpid());
	}

	fprintf(stdout, "%d: Received 10000 packets\n", getpid());

	close(fd);
	exit(0);
}

int main(int argc, char **argp)
{
	int i;

	if (argc != 3) {
		fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
		return EXIT_FAILURE;
	}

	if (!strcmp(argp[2], "hash"))
		fanout_type = PACKET_FANOUT_HASH;
	else if (!strcmp(argp[2], "lb"))
		fanout_type = PACKET_FANOUT_LB;
	else {
		fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
		exit(EXIT_FAILURE);
	}

	device_name = argp[1];
	fanout_id = getpid() & 0xffff;

	for (i = 0; i < 4; i++) {
		pid_t pid = fork();

		switch (pid) {
		case 0:
			/* child: fanout_thread() exits and never returns */
			fanout_thread();

		case -1:
			perror("fork");
			exit(EXIT_FAILURE);
		}
	}

	for (i = 0; i < 4; i++) {
		int status;

		wait(&status);
	}

	return 0;
}
686	
687	-------------------------------------------------------------------------------
688	+ PACKET_TIMESTAMP
689	-------------------------------------------------------------------------------
690	
The PACKET_TIMESTAMP setting determines the source of the timestamp in
the packet meta information.  If your NIC is capable of timestamping
packets in hardware, you can request those hardware timestamps to be used.
Note: you may need to enable the generation of hardware timestamps with
SIOCSHWTSTAMP.

PACKET_TIMESTAMP accepts the same integer bit field as
SO_TIMESTAMPING.  However, only the SOF_TIMESTAMPING_SYS_HARDWARE
and SOF_TIMESTAMPING_RAW_HARDWARE values are recognized by
PACKET_TIMESTAMP.  SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.

    int req = 0;
    req |= SOF_TIMESTAMPING_SYS_HARDWARE;
    setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req));
706	
707	If PACKET_TIMESTAMP is not set, a software timestamp generated inside
708	the networking stack is used (the behavior before this setting was added).
709	
710	See include/linux/net_tstamp.h and Documentation/networking/timestamping
711	for more information on hardware timestamps.
712	
713	-------------------------------------------------------------------------------
714	+ Miscellaneous bits
715	-------------------------------------------------------------------------------
716	
717	- Packet sockets work well together with Linux socket filters, thus you also
718	  might want to have a look at Documentation/networking/filter.txt
719	
720	--------------------------------------------------------------------------------
721	+ THANKS
722	--------------------------------------------------------------------------------
723	   
   Jesse Brandeburg, for fixing my grammatical/spelling errors
Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.