Documentation / accounting / taskstats.txt


Based on kernel version 4.16.1. Page generated on 2018-04-09 11:52 EST.

Per-task statistics interface
-----------------------------


Taskstats is a netlink-based interface for sending per-task and
per-process statistics from the kernel to userspace.

Taskstats was designed for the following benefits:

- efficiently provide statistics during lifetime of a task and on its exit
- unified interface for multiple accounting subsystems
- extensibility for use by future accounting patches

Terminology
-----------

"pid", "tid" and "task" are used interchangeably and refer to the standard
Linux task defined by struct task_struct. Per-pid stats are the same as
per-task stats.

"tgid", "process" and "thread group" are used interchangeably and refer to the
tasks that share an mm_struct, i.e. the traditional Unix process. Despite the
use of tgid, there is no special treatment for the task that is the thread
group leader: a process is deemed alive as long as it has any task belonging
to it.

Usage
-----

To get statistics during a task's lifetime, userspace opens a unicast netlink
socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
The response contains statistics for a task (if pid is specified) or the sum of
statistics for all tasks of the process (if tgid is specified).

To obtain statistics for tasks which are exiting, the userspace listener
sends a register command and specifies a cpumask. Whenever a task exits on
one of the cpus in the cpumask, its per-pid statistics are sent to the
registered listener. Using cpumasks limits the data received by any one
listener and assists in flow control over the netlink interface, as
explained in more detail below.
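
The cpumask passed with a register command is an ASCII string of
comma-separated cpu ranges (see the Interface section). The following is a
minimal sketch of producing such strings in userspace. The helper names
format_cpumask and partition_cpus are hypothetical and are not part of
getdelays.c or of the kernel interface.

```python
def format_cpumask(cpus):
    """Render an iterable of cpu numbers as comma-separated ranges."""
    ordered = sorted(set(cpus))
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu == prev + 1:
            # Still inside the current run of consecutive cpus.
            prev = cpu
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def partition_cpus(ncpus, nlisteners):
    """Split cpus 0..ncpus-1 into contiguous chunks, one cpumask per listener."""
    base, extra = divmod(ncpus, nlisteners)
    masks, first = [], 0
    for i in range(nlisteners):
        size = base + (1 if i < extra else 0)
        if size:
            masks.append(format_cpumask(range(first, first + size)))
        first += size
    return masks
```

For example, format_cpumask([1, 2, 3, 5, 7, 8]) yields "1-3,5,7-8", and
partition_cpus(8, 3) yields one mask for each of three listeners.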

If the exiting task is the last thread exiting its thread group,
an additional record containing the per-tgid stats is also sent to userspace.
The latter contains the sum of per-pid stats for all threads in the thread
group, both past and present.

getdelays.c is a simple utility demonstrating usage of the taskstats interface
for reporting delay accounting statistics. Users can register cpumasks,
send commands and process responses, listen for per-tid/tgid exit data,
write the data received to a file and do basic flow control by increasing
receive buffer sizes.

Interface
---------

The user-kernel interface is encapsulated in include/linux/taskstats.h.

To avoid this documentation becoming obsolete as the interface evolves, only
an outline of the current version is given. taskstats.h always overrides the
description here.

struct taskstats is the common accounting structure for both per-pid and
per-tgid data. It is versioned and can be extended by each accounting subsystem
that is added to the kernel. The fields and their semantics are defined in the
taskstats.h file.

The data exchanged between user and kernel space is a netlink message belonging
to the NETLINK_GENERIC family and using the netlink attributes interface.
The messages are in the format

    +----------+- - -+-------------+-------------------+
    | nlmsghdr | Pad |  genlmsghdr | taskstats payload |
    +----------+- - -+-------------+-------------------+
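
The layout above can be sketched with Python's struct module. This is an
illustration under stated assumptions, not a replacement for the headers. The
header sizes and field order are my reading of struct nlmsghdr and struct
genlmsghdr. The pad is zero bytes here because the 16-byte nlmsghdr is already
4-byte aligned. The family id is assigned at runtime and has to be looked up
through the generic netlink controller, so the value 23 in the usage note is
only a stand-in. frame and unframe are hypothetical helper names.

```python
import struct

NLMSG_HDRLEN = 16     # len(u32) type(u16) flags(u16) seq(u32) port id(u32)
GENL_HDRLEN = 4       # cmd(u8) version(u8) reserved(u16)
NLM_F_REQUEST = 0x1


def frame(family_id, cmd, payload, seq=1, port_id=0, version=1):
    """Prefix an attribute payload with nlmsghdr and genlmsghdr."""
    total = NLMSG_HDRLEN + GENL_HDRLEN + len(payload)
    nlh = struct.pack("=IHHII", total, family_id, NLM_F_REQUEST, seq, port_id)
    genl = struct.pack("=BBH", cmd, version, 0)
    return nlh + genl + payload


def unframe(msg):
    """Return (family_id, cmd, payload) from one framed message."""
    total, family_id, _flags, _seq, _port = struct.unpack_from("=IHHII", msg, 0)
    cmd, _version, _reserved = struct.unpack_from("=BBH", msg, NLMSG_HDRLEN)
    return family_id, cmd, msg[NLMSG_HDRLEN + GENL_HDRLEN:total]
```

Calling frame(23, 1, payload) with an 8-byte payload gives a 28-byte message,
and unframe returns the same family id, command and payload.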


The taskstats payload is one of the following three kinds:

1. Commands: sent from user to kernel. Commands to get data on
   a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
   containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
   the task/process for which userspace wants statistics.

   Commands to register/deregister interest in exit data from a set of cpus
   consist of one attribute, of type
   TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK, containing a cpumask in the
   attribute payload. The cpumask is specified as an ASCII string of
   comma-separated cpu ranges, e.g. to listen to exit data from cpus
   1,2,3,5,7,8 the cpumask would be "1-3,5,7-8". If userspace forgets to
   deregister interest in cpus before closing the listening socket, the kernel
   cleans up its interest set over time. However, for the sake of efficiency,
   an explicit deregistration is advisable.

2. Response for a command: sent from the kernel in response to a userspace
   command. The payload is a series of three attributes of type:

   a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute containing no payload; it
      indicates that a pid/tgid followed by some stats comes next.

   b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose
      stats are being returned.

   c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
      same structure is used for both per-pid and per-tgid stats.

3. New message sent by kernel whenever a task exits. The payload consists of a
   series of attributes of the following type:

   a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
   b) TASKSTATS_TYPE_PID: contains exiting task's pid
   c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
   d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
   e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
   f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process

   Attributes d) to f) are present only when the exiting task is the last
   thread of its thread group.
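
The command and response payloads described above can be sketched as follows.
The numeric attribute types are my reading of taskstats.h and should be checked
against the header, which overrides anything here. Two things are assumptions
on my part: that the cpumask string is sent NUL-terminated, and that an AGGR
attribute may arrive either empty and followed by its two attributes (as
described above) or as a container wrapping them. The walker accepts both
layouts. put_attr, pid_query, register_cpumask and walk_attrs are hypothetical
helper names.

```python
import struct

NLA_HDRLEN = 4          # len(u16) type(u16)
NLA_TYPE_MASK = 0x3FFF  # strips the nested/byte-order flag bits

TASKSTATS_CMD_ATTR_PID = 1
TASKSTATS_CMD_ATTR_REGISTER_CPUMASK = 3
TASKSTATS_TYPE_PID = 1
TASKSTATS_TYPE_TGID = 2
TASKSTATS_TYPE_STATS = 3
TASKSTATS_TYPE_AGGR_PID = 4
TASKSTATS_TYPE_AGGR_TGID = 5


def put_attr(attr_type, data):
    """Encode one netlink attribute, padded to a 4-byte boundary."""
    length = NLA_HDRLEN + len(data)
    return struct.pack("=HH", length, attr_type) + data + b"\0" * (-length % 4)


def pid_query(pid):
    """Payload of a command asking for the stats of one pid."""
    return put_attr(TASKSTATS_CMD_ATTR_PID, struct.pack("=I", pid))


def register_cpumask(mask):
    """Payload of a register command; NUL termination is an assumption."""
    data = mask.encode("ascii") + b"\0"
    return put_attr(TASKSTATS_CMD_ATTR_REGISTER_CPUMASK, data)


def walk_attrs(buf):
    """Yield (type, payload) pairs; a non-empty AGGR attribute is descended into."""
    off = 0
    while off + NLA_HDRLEN <= len(buf):
        length, raw_type = struct.unpack_from("=HH", buf, off)
        if length < NLA_HDRLEN or off + length > len(buf):
            break  # truncated or malformed attribute, stop walking
        attr_type = raw_type & NLA_TYPE_MASK
        data = buf[off + NLA_HDRLEN:off + length]
        aggr = (TASKSTATS_TYPE_AGGR_PID, TASKSTATS_TYPE_AGGR_TGID)
        if attr_type in aggr and data:
            yield attr_type, b""
            yield from walk_attrs(data)
        else:
            yield attr_type, data
        off += (length + 3) & ~3
```

A reader would typically loop over walk_attrs(payload), remember the most
recent PID or TGID attribute, and decode the STATS attribute that follows it.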


per-tgid stats
--------------

Taskstats provides per-process stats, in addition to per-task stats, since
resource management is often done at a process granularity and aggregating task
stats in userspace alone is inefficient and potentially inaccurate (due to lack
of atomicity).

However, maintaining per-process stats in addition to per-task stats within the
kernel has space and time overheads. To address this, the taskstats code
accumulates each exiting task's statistics into a process-wide data structure.
When the last task of a process exits, the accumulated process-level data also
gets sent to userspace (along with the per-task data).

When a user queries per-tgid data, the stats of all live threads in the group
are summed and added to the accumulated total for previously exited threads of
the same thread group.
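
The accumulation scheme described above can be modelled in a few lines. This is
a userspace toy and not the kernel code: the ThreadGroup class and the cpu_ns
field are hypothetical and only illustrate how exited and live threads combine
into a per-tgid total.

```python
from collections import Counter


class ThreadGroup:
    def __init__(self):
        self.live = {}            # tid -> Counter holding that thread's stats
        self.exited = Counter()   # running total of threads that have exited

    def update(self, tid, **deltas):
        """Add to the stats of a live thread, creating it if needed."""
        self.live.setdefault(tid, Counter()).update(deltas)

    def exit(self, tid):
        """Fold an exiting thread into the group total.

        Returns (per_task, per_group); per_group is None unless this was
        the last thread of the group.
        """
        stats = self.live.pop(tid)
        self.exited.update(stats)
        per_group = Counter(self.exited) if not self.live else None
        return stats, per_group

    def query(self):
        """Per-tgid stats on demand: exited total plus every live thread."""
        total = Counter(self.exited)
        for stats in self.live.values():
            total.update(stats)
        return total
```

The per-group total is the same whether it is computed by query() while threads
are alive or delivered when the last thread exits.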

Extending taskstats
-------------------

There are two ways to extend the taskstats interface to export more
per-task/process stats as patches to collect them get added to the kernel
in the future:

1. Adding more fields to the end of the existing struct taskstats. Backward
   compatibility is ensured by the version number within the
   structure. Userspace will use only the fields of the struct that correspond
   to the version it is using.

2. Defining separate statistic structs and using the netlink attributes
   interface to return them. Since userspace processes each netlink attribute
   independently, it can always ignore attributes whose type it does not
   understand (because it is using an older version of the interface).


Choosing between 1. and 2. is a matter of trading off flexibility and
overhead. If only a few fields need to be added, then 1. is the preferable
path since the kernel and userspace don't need to incur the overhead of
processing new netlink attributes. But if the new fields expand the existing
struct too much, requiring disparate userspace accounting utilities to
unnecessarily receive large structures whose fields are of no interest, then
extending the attributes structure would be worthwhile.
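
Option 1 can be illustrated with a toy decoder. The two layouts below are
hypothetical and deliberately tiny; they are not the real struct taskstats.
The only property carried over from the real structure is that a 16-bit
version comes first and that newer versions only append fields.

```python
import struct

# Hypothetical layouts: version 2 appends one field to version 1.
LAYOUTS = {
    1: ("=HxxI", ("version", "exitcode")),
    2: ("=HxxIQ", ("version", "exitcode", "cpu_ns")),
}


def decode(buf, max_known=max(LAYOUTS)):
    """Decode only the fields that both the kernel and this reader know about."""
    (version,) = struct.unpack_from("=H", buf, 0)
    usable = min(version, max_known)
    if usable not in LAYOUTS:
        raise ValueError(f"unsupported taskstats version {version}")
    fmt, names = LAYOUTS[usable]
    return dict(zip(names, struct.unpack_from(fmt, buf, 0)))
```

A reader built against version 1 (max_known=1) still decodes a version 2
buffer and simply never looks at the appended bytes. Option 2 corresponds to
skipping netlink attributes whose type the reader does not recognise.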

Flow control for taskstats
--------------------------

When the rate of task exits becomes large, a listener may not be able to keep
up with the kernel's rate of sending per-tid/tgid exit data, leading to data
loss. This possibility gets compounded when the taskstats structure gets
extended and the number of cpus grows large.

To avoid losing statistics, userspace should do one or more of the following:

- increase the receive buffer sizes for the netlink sockets opened by
  listeners to receive exit data.

- create more listeners and reduce the number of cpus being listened to by
  each listener. In the extreme case, there could be one listener for each cpu.
  Users may also consider setting the cpu affinity of the listener to the subset
  of cpus to which it listens, especially if they are listening to just one cpu.

Despite these measures, if userspace receives ENOBUFS error messages
indicating overflow of receive buffers, it should take measures to handle the
loss of data.
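
The first measure and the ENOBUFS case can be sketched as below, assuming the
listener already holds an open socket object. grow_rcvbuf and recv_exit_data
are hypothetical helper names. On Linux the kernel may cap the requested size
at a system-wide limit and report back a different value than was asked for,
so the sketch returns whatever the socket reports afterwards.

```python
import errno
import socket


def grow_rcvbuf(sock, size):
    """Ask for a larger receive buffer and return the size now in effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def recv_exit_data(sock, bufsize=65536):
    """Return (data, lost); lost is True when the receive buffer overflowed."""
    try:
        return sock.recv(bufsize), False
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            # Some exit records were dropped; the caller decides how to recover.
            return b"", True
        raise
```

What to do when lost is True is a policy choice for the listener, for example
recording that a gap occurred before it continues to read.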

----