
Documentation / kernel-per-CPU-kthreads.txt


Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

==========================================
Reducing OS jitter due to per-CPU kthreads
==========================================

This document lists per-CPU kthreads in the Linux kernel and presents
options to control their OS jitter.  Note that non-per-CPU kthreads are
not listed here.  To reduce OS jitter from non-per-CPU kthreads, bind
them to a "housekeeping" CPU dedicated to such work.
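
For example, the binding could be scripted roughly as follows.  This is a
sketch under stated assumptions, not a supported tool: it assumes that the
kthreads are kthreadd (PID 2 on typical systems) and its children, and that
taskset from util-linux is installed.  The helper names list_kthreads and
bind_kthreads are hypothetical.  Per-CPU kthreads are expected to reject
the affinity change, so only non-per-CPU kthreads should actually move.

```sh
#!/bin/sh
# Hypothetical helpers: move non-per-CPU kthreads to housekeeping CPUs.

# Print the PID of kthreadd (assumed to be PID 2) and of each of its
# children.  The optional argument points at an alternate /proc tree.
list_kthreads() {
    proc_root=${1:-/proc}
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        pid=${status%/status}
        pid=${pid##*/}
        ppid=$(awk '$1 == "PPid:" { print $2 }' "$status")
        if [ "$pid" = 2 ] || [ "$ppid" = 2 ]; then
            echo "$pid"
        fi
    done
}

# Try to bind every kthread to the CPU list $1 (for example "0").
# Per-CPU kthreads are expected to reject the change; those errors are
# ignored so that only the movable kthreads end up on housekeeping CPUs.
bind_kthreads() {
    cpus=$1
    for pid in $(list_kthreads "${2:-/proc}"); do
        taskset -pc "$cpus" "$pid" >/dev/null 2>&1 || true
    done
}
```

Run as root, for example "bind_kthreads 0" to use CPU 0 for housekeeping.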

References
==========

-	Documentation/IRQ-affinity.txt:  Binding interrupts to sets of CPUs.

-	Documentation/cgroup-v1:  Using cgroups to bind tasks to sets of CPUs.

-	man taskset:  Using the taskset command to bind tasks to sets
	of CPUs.

-	man sched_setaffinity:  Using the sched_setaffinity() system
	call to bind tasks to sets of CPUs.

-	/sys/devices/system/cpu/cpuN/online:  Control CPU N's hotplug state,
	writing "0" to offline and "1" to online.

-	In order to locate kernel-generated OS jitter on CPU N:

		cd /sys/kernel/debug/tracing
		echo 1 > max_graph_depth # Increase the "1" for more detail
		echo function_graph > current_tracer
		# run workload
		cat per_cpu/cpuN/trace
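
If other CPUs are busy, the trace can optionally be restricted to CPU N by
writing a hexadecimal CPU mask to the tracing_cpumask file in the same
tracing directory.  The hypothetical helper cpu_mask below computes that
mask; it assumes CPU numbers below 32, since larger masks are written as
comma-separated 32-bit groups.

```sh
#!/bin/sh
# Hypothetical helper: print the hexadecimal mask selecting only CPU $1.
# Handles CPUs 0-31 only; larger masks need comma-separated 32-bit groups.
cpu_mask() {
    cpu=$1
    case $cpu in
        ''|*[!0-9]*) echo "cpu_mask: not a CPU number: $cpu" >&2; return 1 ;;
    esac
    if [ "$cpu" -gt 31 ]; then
        echo "cpu_mask: CPU $cpu is out of range 0-31" >&2
        return 1
    fi
    printf '%x\n' $((1 << $cpu))
}

# Usage sketch (as root, debugfs mounted), tracing only CPU 3:
#   cd /sys/kernel/debug/tracing
#   cpu_mask 3 > tracing_cpumask
```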

kthreads
========

Name:
  ehca_comp/%u

Purpose:
  Periodically process Infiniband-related work.

To reduce its OS jitter, do any of the following:

1.	Don't use eHCA Infiniband hardware, instead choosing hardware
	that does not require per-CPU kthreads.  This will prevent these
	kthreads from being created in the first place.  (This will
	work for most people, as this hardware, though important, is
	relatively old and is produced in relatively low unit volumes.)
2.	Do all eHCA-Infiniband-related work on other CPUs, including
	interrupts.
3.	Rework the eHCA driver so that its per-CPU kthreads are
	provisioned only on selected CPUs.

Name:
  irq/%d-%s

Purpose:
  Handle threaded interrupts.

To reduce its OS jitter, do the following:

1.	Use IRQ affinity to force the IRQ threads to execute on
	some other CPU.
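
A minimal sketch of the IRQ-affinity approach, assuming the
/proc/irq/<N>/smp_affinity_list interface described in
Documentation/IRQ-affinity.txt and a housekeeping CPU list such as "0".
The helper steer_irqs is hypothetical, and its optional second argument
exists only so that it can be tried on a scratch directory.

```sh
#!/bin/sh
# Hypothetical helper: steer every IRQ to the housekeeping CPU list $1
# (for example "0" or "0-1").  The optional second argument replaces /proc,
# which is useful only for trying the function on a scratch directory.
steer_irqs() {
    cpus=$1
    proc_root=${2:-/proc}
    for f in "$proc_root"/irq/[0-9]*/smp_affinity_list; do
        [ -e "$f" ] || continue
        # Some interrupts (per-CPU ones, for example) reject affinity
        # changes; report them and keep going.
        if ! { echo "$cpus" > "$f"; } 2>/dev/null; then
            echo "steer_irqs: could not move ${f%/smp_affinity_list}" >&2
        fi
    done
}

# Usage sketch (as root): steer_irqs 0
```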

Name:
  kcmtpd_ctr_%d

Purpose:
  Handle Bluetooth work.

To reduce its OS jitter, do one of the following:

1.	Don't use Bluetooth, in which case these kthreads won't be
	created in the first place.
2.	Use IRQ affinity to force Bluetooth-related interrupts to
	occur on some other CPU and furthermore initiate all
	Bluetooth activity on some other CPU.

Name:
  ksoftirqd/%u

Purpose:
  Execute softirq handlers when threaded or when under heavy load.

To reduce its OS jitter, each softirq vector must be handled
separately as follows:

TIMER_SOFTIRQ
-------------

Do all of the following:

1.	To the extent possible, keep the CPU out of the kernel when it
	is non-idle, for example, by avoiding system calls and by forcing
	both kernel threads and interrupts to execute elsewhere.
2.	Build with CONFIG_HOTPLUG_CPU=y.  After boot completes, force
	the CPU offline, then bring it back online.  This forces
	recurring timers to migrate elsewhere.  If you are concerned
	with multiple CPUs, force them all offline before bringing the
	first one back online.  Once you have onlined the CPUs in question,
	do not offline any other CPUs, because doing so could force the
	timer back onto one of the CPUs in question.
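
The offline-then-online sequence in step 2 can be sketched as a small
shell function.  It assumes the /sys/devices/system/cpu/cpuN/online files
listed under References; cycle_cpus and the SYSFS_ROOT override are
hypothetical, the latter only for dry runs against a scratch directory.

```sh
#!/bin/sh
# Hypothetical helper: take every CPU named in the arguments offline, and
# only then bring them all back online, so that recurring timers migrate
# to other CPUs.  On a real system this needs root and CONFIG_HOTPLUG_CPU=y.
# SYSFS_ROOT defaults to /sys.
cycle_cpus() {
    sysfs_root=${SYSFS_ROOT:-/sys}
    for cpu in "$@"; do
        echo 0 > "$sysfs_root/devices/system/cpu/cpu$cpu/online" || return 1
    done
    for cpu in "$@"; do
        echo 1 > "$sysfs_root/devices/system/cpu/cpu$cpu/online" || return 1
    done
}
```

For example, "cycle_cpus 2 3" run as root shortly after boot offlines both
CPUs before onlining either, so neither can pick up the other's timers in
between.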

NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
---------------------------------

Do all of the following:

1.	Force networking interrupts onto other CPUs.
2.	Initiate any network I/O on other CPUs.
3.	Once your application has started, prevent CPU-hotplug operations
	from being initiated from tasks that might run on the CPU to
	be de-jittered.  (It is OK to force this CPU offline and then
	bring it back online before you start your application.)
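
To find which interrupts to move in step 1, one rough approach is to search
/proc/interrupts for the interface name.  The sketch assumes that the
driver labels its IRQs with that name (as many multiqueue drivers do, for
example "eth0-TxRx-0"); irqs_for is a hypothetical helper.  Because the
match is a plain substring test, "eth1" would also match "eth10".

```sh
#!/bin/sh
# Hypothetical helper: print the IRQ numbers whose /proc/interrupts line
# mentions the device name $1 (for example "eth0").  The optional second
# argument replaces /proc/interrupts.  This is a plain substring match.
irqs_for() {
    dev=$1
    file=${2:-/proc/interrupts}
    awk -v dev="$dev" '
        $1 ~ /^[0-9]+:$/ && index($0, dev) > 0 {
            sub(/:$/, "", $1)
            print $1
        }
    ' "$file"
}

# Usage sketch (as root): move eth0's IRQs to CPU 0, then start the
# network I/O from CPU 0 too (./net_client is a placeholder program).
#   for irq in $(irqs_for eth0); do
#       echo 0 > "/proc/irq/$irq/smp_affinity_list"
#   done
#   taskset -c 0 ./net_client
```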

BLOCK_SOFTIRQ
-------------

Do all of the following:

1.	Force block-device interrupts onto some other CPU.
2.	Initiate any block I/O on other CPUs.
3.	Once your application has started, prevent CPU-hotplug operations
	from being initiated from tasks that might run on the CPU to
	be de-jittered.  (It is OK to force this CPU offline and then
	bring it back online before you start your application.)
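
Before initiating block I/O (step 2), it can help to confirm that the
issuing shell cannot run on the de-jittered CPU.  The sketch assumes the
kernel's "cpulist" syntax (for example "0-2,5"), ignores the less common
stride form, and uses a hypothetical helper name, cpulist_has.

```sh
#!/bin/sh
# Hypothetical helper: succeed if CPU $2 appears in the cpulist $1
# (for example "0-2,5").  Stride forms such as "0-7:2" are not handled.
cpulist_has() {
    list=$1
    cpu=$2
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        lo=${range%-*}    # "5" stays "5"; "0-2" becomes "0"
        hi=${range#*-}    # "5" stays "5"; "0-2" becomes "2"
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage sketch: warn if this shell may run on de-jittered CPU 3, and
# start the I/O pinned to housekeeping CPUs (./block_io_job is a placeholder).
#   mask=$(taskset -pc $$ | sed 's/.*: //')
#   cpulist_has "$mask" 3 && echo "this shell may run on CPU 3" >&2
#   taskset -c 0-1 ./block_io_job
```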

IRQ_POLL_SOFTIRQ
----------------

Do all of the following:

1.	Force block-device interrupts onto some other CPU.
2.	Initiate any block I/O and block-I/O polling on other CPUs.
3.	Once your application has started, prevent CPU-hotplug operations
	from being initiated from tasks that might run on the CPU to
	be de-jittered.  (It is OK to force this CPU offline and then
	bring it back online before you start your application.)

TASKLET_SOFTIRQ
---------------

Do one or more of the following:

1.	Avoid use of drivers that use tasklets.  (Such drivers will contain
	calls to things like tasklet_schedule().)
2.	Convert all drivers that you must use from tasklets to workqueues.
3.	Force interrupts for drivers using tasklets onto other CPUs,
	and also do I/O involving these drivers on other CPUs.
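
For option 1, a crude first pass is to search a driver's source for
tasklet calls.  The sketch assumes GNU grep (for -r and --include), and
its list of function names is illustrative rather than complete;
find_tasklet_users is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: list the C files under directory $1 that mention
# common tasklet calls.  A textual match is only a hint that the driver
# raises TASKLET_SOFTIRQ; the function names below are not exhaustive.
find_tasklet_users() {
    grep -rlE --include='*.c' 'tasklet_(hi_)?schedule|tasklet_init' "$1" | sort
}

# Usage sketch, from the top of a kernel source tree:
#   find_tasklet_users drivers/net/ethernet
```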

SCHED_SOFTIRQ
-------------

Do all of the following:

1.	Avoid sending scheduler IPIs to the CPU to be de-jittered,
	for example, by ensuring that at most one runnable kthread is
	present on that CPU.  If a thread that expects to run on the
	de-jittered CPU awakens, the scheduler will send an IPI that can
	result in a subsequent SCHED_SOFTIRQ.
2.	Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
	de-jittered is marked as an adaptive-ticks CPU using the
	"nohz_full=" boot parameter.  This reduces the number of
	scheduler-clock interrupts that the de-jittered CPU receives,
	minimizing its chances of being selected to do the load
	balancing work that runs in SCHED_SOFTIRQ context.
3.	To the extent possible, keep the CPU out of the kernel when it
	is non-idle, for example, by avoiding system calls and by
	forcing both kernel threads and interrupts to execute elsewhere.
	This further reduces the number of scheduler-clock interrupts
	received by the de-jittered CPU.
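
To check step 2 on a running system, the nohz_full= value can be read back
from the kernel command line.  The sketch assumes a space-separated
/proc/cmdline, and boot_param is a hypothetical helper.  Kernels built with
CONFIG_NO_HZ_FULL=y also report the active set in
/sys/devices/system/cpu/nohz_full.

```sh
#!/bin/sh
# Hypothetical helper: print the value of boot parameter $1 (for example
# "nohz_full") from a kernel command line file, /proc/cmdline by default.
# Prints nothing if the parameter is absent; if it appears more than once,
# the last occurrence is printed.
boot_param() {
    name=$1
    file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$file" | sed -n "s/^$name=//p" | tail -n 1
}

# Usage sketch:
#   boot_param nohz_full                    # e.g. "1-3" if booted with nohz_full=1-3
#   cat /sys/devices/system/cpu/nohz_full   # the kernel's own view
```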

HRTIMER_SOFTIRQ
---------------

Do all of the following:

1.	To the extent possible, keep the CPU out of the kernel when it
	is non-idle.  For example, avoid system calls and force both
	kernel threads and interrupts to execute elsewhere.
2.	Build with CONFIG_HOTPLUG_CPU=y.  Once boot completes, force the
	CPU offline, then bring it back online.  This forces recurring
	timers to migrate elsewhere.  If you are concerned with multiple
	CPUs, force them all offline before bringing the first one
	back online.  Once you have onlined the CPUs in question, do not
	offline any other CPUs, because doing so could force the timer
	back onto one of the CPUs in question.

RCU_SOFTIRQ
-----------

Do at least one of the following:

1.	Offload callbacks and keep the CPU in either dyntick-idle or
	adaptive-ticks state by doing all of the following:

	a.	Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU
		to be de-jittered is marked as an adaptive-ticks CPU
		using the "nohz_full=" boot parameter.  Bind the rcuo
		kthreads to housekeeping CPUs, which can tolerate OS
		jitter.
	b.	To the extent possible, keep the CPU out of the kernel
		when it is non-idle, for example, by avoiding system
		calls and by forcing both kernel threads and interrupts
		to execute elsewhere.

2.	Enable RCU to do its processing remotely via dyntick-idle by
	doing all of the following:

	a.	Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
	b.	Ensure that the CPU goes idle frequently, allowing other
		CPUs to detect that it has passed through an RCU quiescent
		state.  If the kernel is built with CONFIG_NO_HZ_FULL=y,
		userspace execution also allows other CPUs to detect that
		the CPU in question has passed through a quiescent state.
	c.	To the extent possible, keep the CPU out of the kernel
		when it is non-idle, for example, by avoiding system
		calls and by forcing both kernel threads and interrupts
		to execute elsewhere.
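
Whether a given kernel has the options named above can be checked from its
configuration file.  The sketch assumes the file is installed as
/boot/config-$(uname -r), which is a distribution convention rather than a
kernel guarantee; kconfig_value is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: print the setting of Kconfig option $1 (for example
# CONFIG_NO_HZ_FULL) from a kernel .config file, or "unset" if the file has
# no "OPTION=value" line for it.  The default path is an assumption.
kconfig_value() {
    opt=$1
    file=${2:-/boot/config-$(uname -r)}
    val=$(sed -n "s/^$opt=//p" "$file")
    echo "${val:-unset}"
}

# Usage sketch: report the options that the RCU_SOFTIRQ recipes rely on.
#   for opt in CONFIG_NO_HZ_FULL CONFIG_NO_HZ CONFIG_RCU_FAST_NO_HZ; do
#       echo "$opt=$(kconfig_value "$opt")"
#   done
```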

Name:
  kworker/%u:%d%s (cpu, id, priority)

Purpose:
  Execute workqueue requests.

To reduce its OS jitter, do any of the following:

1.	Run your workload at a real-time priority, which will allow
	preempting the kworker daemons.
2.	A given workqueue can be made visible in the sysfs filesystem
	by passing the WQ_SYSFS flag to that workqueue's alloc_workqueue()
	call.  Such a workqueue can be confined to a given subset of the
	CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
	files.  The set of WQ_SYSFS workqueues can be displayed using
	"ls /sys/devices/virtual/workqueue".  That said, the workqueues
	maintainer would like to caution people against indiscriminately
	sprinkling WQ_SYSFS across all the workqueues.  The reason for
	caution is that it is easy to add WQ_SYSFS, but because sysfs is
	part of the formal user/kernel API, it can be nearly impossible
	to remove it, even if its addition was a mistake.
3.	Do any of the following needed to avoid jitter that your
	application cannot tolerate:

	a.	Build your kernel with CONFIG_SLUB=y rather than
		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
		use of each CPU's workqueues to run its cache_reap()
		function.
	b.	Avoid using oprofile, thus avoiding OS jitter from
		wq_sync_buffer().
	c.	Limit your CPU frequency so that a CPU-frequency
		governor is not required, possibly enlisting the aid of
		special heatsinks or other cooling technologies.  If done
		correctly, and if your CPU architecture permits, you should
		be able to build your kernel with CONFIG_CPU_FREQ=n to
		avoid the CPU-frequency governor periodically running
		on each CPU, including cs_dbs_timer() and od_dbs_timer().

		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	d.	As of v3.18, Christoph Lameter's on-demand vmstat workers
		commit prevents OS jitter due to vmstat_update() on
		CONFIG_SMP=y systems.  On kernels before v3.18, it is not
		possible to entirely get rid of the OS jitter, but you can
		decrease its frequency by writing a large value to
		/proc/sys/vm/stat_interval.  The default value is HZ,
		for an interval of one second.  Of course, larger values
		will make your virtual-memory statistics update more
		slowly.  Alternatively, you can run your workload at
		a real-time priority, thus preempting vmstat_update(),
		but if your workload is CPU-bound, this is a bad idea.
		However, there is an RFC patch from Christoph Lameter
		(based on an earlier one from Gilad Ben-Yossef) that
		reduces or even eliminates vmstat overhead for some
		workloads at https://lkml.org/lkml/2013/9/4/379.
	e.	Boot with "elevator=noop" to avoid workqueue use by
		the block layer.
	f.	If running on high-end powerpc servers, build with
		CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
		daemon from running on each CPU every second or so.
		(This will require editing Kconfig files and will defeat
		this platform's RAS functionality.)  This avoids jitter
		due to the rtas_event_scan() function.
		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	g.	If running on a Cell processor, build your kernel with
		CONFIG_CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
		spu_gov_work().
		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	h.	If running on PowerMac, build your kernel with
		CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
		avoiding OS jitter from rackmeter_do_timer().
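
Option 2 can be applied to all WQ_SYSFS workqueues at once with a loop
like the one below.  It assumes that the cpumask files take a hexadecimal
CPU mask; confine_workqueues and SYSFS_ROOT are hypothetical names, the
latter only for dry runs.  Workqueues without a cpumask file are skipped.

```sh
#!/bin/sh
# Hypothetical helper: confine every WQ_SYSFS workqueue that exposes a
# cpumask file to the housekeeping CPUs in hex mask $1 (for example "1"
# for CPU 0, "3" for CPUs 0-1).  SYSFS_ROOT defaults to /sys.
confine_workqueues() {
    mask=$1
    dir=${SYSFS_ROOT:-/sys}/devices/virtual/workqueue
    for f in "$dir"/*/cpumask; do
        [ -f "$f" ] || continue
        if ! { echo "$mask" > "$f"; } 2>/dev/null; then
            echo "confine_workqueues: could not set ${f%/cpumask}" >&2
        fi
    done
}

# Usage sketch (as root): confine_workqueues 1
```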

Name:
  rcuc/%u

Purpose:
  Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.

To reduce its OS jitter, do at least one of the following:

1.	Build the kernel with CONFIG_PREEMPT=n.  This prevents these
	kthreads from being created in the first place, and also obviates
	the need for RCU priority boosting.  This approach is feasible
	for workloads that do not require high degrees of responsiveness.
2.	Build the kernel with CONFIG_RCU_BOOST=n.  This prevents these
	kthreads from being created in the first place.  This approach
	is feasible only if your workload never requires RCU priority
	boosting, for example, if you ensure frequent idle time on all
	CPUs that might execute within the kernel.
3.	Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
	boot parameter, offloading RCU callbacks from all CPUs susceptible
	to OS jitter.  This approach prevents the rcuc/%u kthreads from
	having any work to do, so that they are never awakened.
4.	Ensure that the CPU never enters the kernel, and, in particular,
	avoid initiating any CPU hotplug operations on this CPU.  This is
	another way of preventing any callbacks from being queued on the
	CPU, again preventing the rcuc/%u kthreads from having any work
	to do.
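
To confirm that option 3 or 4 is working, one can sample the
context-switch counters that /proc/<pid>/status reports for an rcuc
kthread before and after running the workload; an unchanged count suggests
that the kthread was not awakened.  The helper name ctxt_switches is
hypothetical, and matching on the exact command name (for example
"rcuc/3") is an assumption about how the kthread appears on your system.

```sh
#!/bin/sh
# Hypothetical helper: print the total context switches (voluntary plus
# involuntary) of the first task whose comm is exactly $1, e.g. "rcuc/3".
# The optional second argument replaces /proc.  Fails if no task matches.
ctxt_switches() {
    name=$1
    proc_root=${2:-/proc}
    for task_dir in "$proc_root"/[0-9]*; do
        [ -r "$task_dir/comm" ] || continue
        [ "$(cat "$task_dir/comm")" = "$name" ] || continue
        awk '/^(voluntary|nonvoluntary)_ctxt_switches:/ { n += $2 }
             END { print n + 0 }' "$task_dir/status"
        return 0
    done
    return 1
}

# Usage sketch: see whether rcuc/3 ran while the workload was active.
#   before=$(ctxt_switches rcuc/3); sleep 10; after=$(ctxt_switches rcuc/3)
#   echo "rcuc/3 context switches: $((after - before))"
```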

Name:
  rcuob/%d, rcuop/%d, and rcuos/%d

Purpose:
  Offload RCU callbacks from the corresponding CPU.

To reduce its OS jitter, do at least one of the following:

1.	Use affinity, cgroups, or some other mechanism to force these
	kthreads to execute on some other CPU.
2.	Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
	kthreads from being created in the first place.  However, please
	note that this will not eliminate OS jitter, but will instead
	shift it to RCU_SOFTIRQ.
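
For option 1, the PIDs of these kthreads can be collected from /proc by
command name and then handed to taskset.  The sketch assumes that all of
their names start with "rcuo"; rcuo_pids is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: print the PIDs of tasks whose comm starts with
# "rcuo" (rcuob/N, rcuop/N, rcuos/N).  The optional argument replaces /proc.
rcuo_pids() {
    proc_root=${1:-/proc}
    for task_dir in "$proc_root"/[0-9]*; do
        [ -r "$task_dir/comm" ] || continue
        case $(cat "$task_dir/comm") in
            rcuo*) echo "${task_dir##*/}" ;;
        esac
    done
}

# Usage sketch (as root), binding them to housekeeping CPU 0:
#   for pid in $(rcuo_pids); do taskset -pc 0 "$pid"; done
```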

Name:
  watchdog/%u

Purpose:
  Detect software lockups on each CPU.

To reduce its OS jitter, do at least one of the following:

1.	Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
	kthreads from being created in the first place.
2.	Boot with "nosoftlockup=0", which will also prevent these kthreads
	from being created.  Other related watchdog and softlockup boot
	parameters may be found in Documentation/admin-guide/kernel-parameters.rst
	and Documentation/watchdog/watchdog-parameters.txt.
3.	Echo a zero to /proc/sys/kernel/watchdog to disable the
	watchdog timer.
4.	Echo a large number to /proc/sys/kernel/watchdog_thresh in
	order to reduce the frequency of OS jitter due to the watchdog
	timer down to a level that is acceptable for your workload.
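
Options 3 and 4 can be wrapped in a small function.  The sketch assumes
the two /proc/sys/kernel files named above and that watchdog_thresh is
expressed in seconds; set_watchdog and the PROC_ROOT override are
hypothetical, the latter only for dry runs.

```sh
#!/bin/sh
# Hypothetical helper: "set_watchdog off" writes 0 to the watchdog file
# (option 3); "set_watchdog N" writes N to watchdog_thresh (option 4).
# PROC_ROOT defaults to /proc; on a real system this needs root.
set_watchdog() {
    sys=${PROC_ROOT:-/proc}/sys/kernel
    case $1 in
        off) echo 0 > "$sys/watchdog" ;;
        ''|*[!0-9]*) echo "set_watchdog: expected off or a number" >&2; return 1 ;;
        *) echo "$1" > "$sys/watchdog_thresh" ;;
    esac
}

# Usage sketch (as root): set_watchdog 30
```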