
=========================
CPU hotplug in the Kernel
=========================

:Date: December, 2016
:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
          Rusty Russell <rusty@rustcorp.com.au>,
          Srivatsa Vaddagiri <vatsa@in.ibm.com>,
          Ashok Raj <ashok.raj@intel.com>,
          Joel Schopp <jschopp@austin.ibm.com>

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. There are a couple of
OEMs that support NUMA hardware which is hot pluggable as well, where
physical node insertion and removal require support for CPU hotplug.

Such advances require that CPUs available to the kernel can be removed,
either for provisioning reasons or for RAS purposes, to keep an offending
CPU off the system execution path. Hence the need for CPU hotplug support
in the Linux kernel.

A more novel use of CPU-hotplug support is its use today in suspend/resume
support for SMP. Dual-core and HT support makes even a laptop run SMP
kernels, which previously did not support suspend/resume.


Command Line Switches
=====================
``maxcpus=n``
  Restrict boot time CPUs to *n*. Say if you have four CPUs, using
  ``maxcpus=2`` will only boot two. You can choose to bring the
  other CPUs online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  those CPUs cannot be brought online later.

``additional_cpus=n``
  Use this to limit hotpluggable CPUs. This option sets
  ``cpu_possible_mask = cpu_present_mask + additional_cpus``

  This option is limited to the IA64 architecture.

``possible_cpus=n``
  This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

``cede_offline={"off","on"}``
  Use this option to disable/enable putting offlined processors to an extended
  ``H_CEDE`` state on supported pseries platforms. If nothing is specified,
  ``cede_offline`` is set to "on".

  This option is limited to the PowerPC architecture.

``cpu0_hotplug``
  Allow shutting down CPU0.

  This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are added or removed at any time. Trimming it accurately for your
  system needs upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g. ACPI), the map can change and a bit is either added to or
  removed from it, depending on whether the event is a hot-add or a
  hot-remove. There are currently no locking rules. Typical usage is to init
  topology during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most use. When setting up per-cpu resources almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.
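
For example, a driver keeping per-CPU statistics would allocate storage for
all possible CPUs once and sum over the possible mask when reading, so counts
accumulated on CPUs which later went offline are not lost. A minimal sketch
(the counter name and helpers are illustrative, not from any existing
driver): ::

  #include <linux/percpu.h>
  #include <linux/cpumask.h>

  static unsigned long __percpu *pkt_count;

  static int counter_init(void)
  {
          /* Allocate for every possible CPU so the storage never needs
           * to grow or shrink as CPUs come and go. */
          pkt_count = alloc_percpu(unsigned long);
          return pkt_count ? 0 : -ENOMEM;
  }

  static unsigned long counter_total(void)
  {
          unsigned long sum = 0;
          unsigned int cpu;

          /* Iterate over all possible CPUs, not just the online ones,
           * so counts from CPUs that went offline are included. */
          for_each_possible_cpu(cpu)
                  sum += *per_cpu_ptr(pkt_count, cpu);
          return sum;
  }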

Using CPU hotplug
=================
The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86. The
configuration is done via the sysfs interface: ::

 $ ls -lh /sys/devices/system/cpu
 total 0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
 drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
 -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
 -r--r--r--  1 root root 4.0K Dec 21 16:33 online
 -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
 -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shutdown CPU4: ::

 $ echo 0 > /sys/devices/system/cpu/cpu4/online
  smpboot: CPU 4 is now offline

Once the CPU is shutdown, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and should also no longer be visible in the *top* command. To
bring CPU4 back online: ::

 $ echo 1 > /sys/devices/system/cpu/cpu4/online
 smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs. CPU0 is often special
and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shutdown CPU0. Alternatively the kernel command line option *cpu0_hotplug* can
be used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.

The CPU hotplug coordination
============================

The offline case
----------------
Once a CPU has been logically shutdown the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an architecture-specific
  routine ``__cpu_disable()`` to perform architecture-specific cleanup.

Using the hotplug API
---------------------
It is possible to receive notifications once a CPU is offlined or onlined. This
might be important to certain drivers which need to perform some kind of setup
or clean up functions based on the number of available CPUs: ::

  #include <linux/cpuhotplug.h>

  ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                          Y_online, Y_prepare_down);

*X* is the subsystem and *Y* the particular driver. The *Y_online* callback
will be invoked during registration on all online CPUs. If an error
occurs during the online callback the *Y_prepare_down* callback will be
invoked on all CPUs on which the online callback was previously invoked.
After registration completed, the *Y_online* callback will be invoked
once a CPU is brought online and *Y_prepare_down* will be invoked when a
CPU is shutdown. All resources which were previously allocated in
*Y_online* should be released in *Y_prepare_down*.
The return value *ret* is negative if an error occurred during the
registration process. Otherwise a positive value is returned which
contains the allocated hotplug state for dynamically allocated states
(*CPUHP_AP_ONLINE_DYN*). It will return zero for predefined states.

The callback can be removed by invoking ``cpuhp_remove_state()``. In case of a
dynamically allocated state (*CPUHP_AP_ONLINE_DYN*) use the returned state.
During the removal of a hotplug state the teardown callback will be invoked.
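
Putting both halves together, a driver module might register its callbacks at
load time and remove them at unload time. A minimal sketch, keeping the
hypothetical *X*/*Y* names from above: ::

  #include <linux/cpuhotplug.h>
  #include <linux/module.h>

  static enum cpuhp_state y_hp_state;

  static int Y_online(unsigned int cpu)
  {
          /* Set up the per-CPU resources of driver Y for this CPU. */
          return 0;
  }

  static int Y_prepare_down(unsigned int cpu)
  {
          /* Release whatever Y_online() set up for this CPU. */
          return 0;
  }

  static int __init y_init(void)
  {
          int ret;

          ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                                  Y_online, Y_prepare_down);
          if (ret < 0)
                  return ret;

          /* For CPUHP_AP_ONLINE_DYN the actually allocated state is
           * returned and must be kept for later removal. */
          y_hp_state = ret;
          return 0;
  }

  static void __exit y_exit(void)
  {
          /* Invokes Y_prepare_down() on all online CPUs. */
          cpuhp_remove_state(y_hp_state);
  }

  module_init(y_init);
  module_exit(y_exit);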

Multiple instances
~~~~~~~~~~~~~~~~~~
If a driver has multiple instances and each instance needs to perform the
callback independently then it is likely that a *multi-state* should be used.
First a multi-state needs to be registered: ::

  ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                                Y_online, Y_prepare_down);
  Y_hp_online = ret;

``cpuhp_setup_state_multi()`` behaves similarly to ``cpuhp_setup_state()``
except it prepares the callbacks for a multi-state and does not invoke
the callbacks. This is a one time setup.
Once a new instance is allocated, you need to register this new instance: ::

  ret = cpuhp_state_add_instance(Y_hp_online, &d->node);

This function will add this instance to your previously allocated
*Y_hp_online* state and invoke the previously registered callback
(*Y_online*) on all online CPUs. The *node* element is a ``struct
hlist_node`` member of your per-instance data structure.

On removal of the instance: ::

  cpuhp_state_remove_instance(Y_hp_online, &d->node)

should be invoked which will invoke the teardown callback on all online
CPUs.
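
Put together, the per-instance data structure only needs to embed the
``struct hlist_node`` which the hotplug core uses to track the instance. A
minimal sketch, again with hypothetical names: ::

  #include <linux/cpuhotplug.h>
  #include <linux/slab.h>

  /* Returned by cpuhp_setup_state_multi() at driver init time. */
  static enum cpuhp_state Y_hp_online;

  struct y_instance {
          struct hlist_node node;   /* used by the hotplug core */
          /* ... per-instance driver data ... */
  };

  static struct y_instance *y_instance_create(void)
  {
          struct y_instance *d;

          d = kzalloc(sizeof(*d), GFP_KERNEL);
          if (!d)
                  return NULL;

          /* Invokes the online callback for this instance on all
           * online CPUs. */
          if (cpuhp_state_add_instance(Y_hp_online, &d->node)) {
                  kfree(d);
                  return NULL;
          }
          return d;
  }

  static void y_instance_destroy(struct y_instance *d)
  {
          /* Invokes the teardown callback on all online CPUs. */
          cpuhp_state_remove_instance(Y_hp_online, &d->node);
          kfree(d);
  }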

Manual setup
~~~~~~~~~~~~
Usually it is handy to invoke setup and teardown callbacks on registration or
removal of a state because usually the operation needs to be performed once a
CPU goes online (offline) and during initial setup (shutdown) of the driver.
However, if the invocation of the callbacks is not desired, each registration
and removal function is also available with a ``_nocalls`` suffix which does
not invoke the provided callbacks. During the manual setup (or teardown) the
functions ``get_online_cpus()`` and ``put_online_cpus()`` should be used to
inhibit CPU hotplug operations.
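
One possible shape of such a manual setup, assuming a hypothetical helper
``Y_setup_cpu()`` and using the ``_cpuslocked`` registration variant because
the hotplug lock is already held: ::

  #include <linux/cpu.h>
  #include <linux/cpuhotplug.h>

  int ret;
  unsigned int cpu;

  get_online_cpus();

  /* No CPU can come or go here: perform the setup that the online
   * callback would otherwise have done for the online CPUs. */
  for_each_online_cpu(cpu)
          Y_setup_cpu(cpu);

  /* Register the callbacks without invoking them. */
  ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
                                             "X/Y:online",
                                             Y_online, Y_prepare_down);

  put_online_cpus();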


The ordering of the events
--------------------------
The hotplug states are defined in ``include/linux/cpuhotplug.h``:

* The states *CPUHP_OFFLINE* … *CPUHP_AP_OFFLINE* are invoked before the
  CPU is up.
* The states *CPUHP_AP_OFFLINE* … *CPUHP_AP_ONLINE* are invoked
  just after the CPU has been brought up. The interrupts are off and
  the scheduler is not yet active on this CPU. Starting with *CPUHP_AP_OFFLINE*
  the callbacks are invoked on the target CPU.
* The states between *CPUHP_AP_ONLINE_DYN* and *CPUHP_AP_ONLINE_DYN_END* are
  reserved for dynamic allocation.
* The states are invoked in the reverse order on CPU shutdown, starting with
  *CPUHP_ONLINE* and stopping at *CPUHP_OFFLINE*. Here the callbacks are
  invoked on the CPU that will be shut down until *CPUHP_AP_OFFLINE*.

A dynamically allocated state via *CPUHP_AP_ONLINE_DYN* is often enough.
However, if an earlier invocation during bring up or shutdown is required,
then an explicit state should be acquired. An explicit state might also be
required if the hotplug event requires specific ordering with respect to
another hotplug event.
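
An abridged look at that enum shows how the ranges above map onto concrete
states (most entries are elided here; see the header for the full, current
list): ::

  enum cpuhp_state {
          CPUHP_OFFLINE = 0,
          CPUHP_CREATE_THREADS,
          /* ... preparation states, invoked on a control CPU ... */
          CPUHP_BRINGUP_CPU,
          CPUHP_AP_IDLE_DEAD,
          CPUHP_AP_OFFLINE,
          /* ... starting states, invoked on the hotplugged CPU ... */
          CPUHP_AP_ONLINE,
          CPUHP_TEARDOWN_CPU,
          /* ... online states, invoked on the hotplugged CPU ... */
          CPUHP_AP_ONLINE_DYN,
          CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30,
          /* ... */
          CPUHP_ONLINE,
  };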

Testing of hotplug states
=========================
One way to verify whether a custom state is working as expected or not is to
shutdown a CPU and then put it online again. It is also possible to put the CPU
into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states``: ::

 $ tail /sys/devices/system/cpu/hotplug/states
 138: mm/vmscan:online
 139: mm/vmstat:online
 140: lib/percpu_cnt:online
 141: acpi/cpu-drv:online
 142: base/cacheinfo:online
 143: virtio/net:online
 144: x86/mce:online
 145: printk:online
 168: sched:active
 169: online

To roll back CPU4 to ``lib/percpu_cnt:online`` and back online just issue: ::

  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169
  $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  140

It is important to note that the teardown callbacks of all states above 140
have been invoked; the CPU remains at state 140. And now get back online: ::

  $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169

With trace events enabled, the individual steps are visible, too: ::

  #  TASK-PID   CPU#    TIMESTAMP  FUNCTION
  #     | |       |        |         |
      bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
   cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
   cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
   cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
   cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
   cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
      bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
      bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
   cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
   cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
   cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
   cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
   cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
   cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
   cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
   cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
   cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
      bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================
The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig.

``__cpu_up()``
  Arch interface to bring up a CPU.

``__cpu_disable()``
  Arch interface to shutdown a CPU; no more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is actually supposed to ensure death of the CPU. Look at some example
  code in other architectures that implement CPU hotplug. The processor is
  taken down from the ``idle()`` loop for that specific architecture.
  ``__cpu_die()`` typically waits for some per_cpu state to be set, to
  positively ensure that the processor dead routine has been called.
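
A minimal sketch of that waiting pattern, not taken from any particular
architecture (the per-cpu flag and the timeout are illustrative
assumptions): ::

  #include <linux/delay.h>
  #include <linux/percpu.h>
  #include <linux/printk.h>

  /* Set by the dying CPU from its idle loop once it is really gone. */
  static DEFINE_PER_CPU(int, cpu_dead_flag);

  void __cpu_die(unsigned int cpu)
  {
          int i;

          /* Poll for up to ten seconds for the dying CPU to signal
           * that it has reached its processor dead routine. */
          for (i = 0; i < 100; i++) {
                  if (per_cpu(cpu_dead_flag, cpu)) {
                          per_cpu(cpu_dead_flag, cpu) = 0;
                          return;
                  }
                  msleep(100);
          }
          pr_err("CPU %u did not die\n", cpu);
  }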

User Space Notification
=======================
After a CPU has been successfully onlined or offlined, udev events are sent.
A udev rule like: ::

  SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like: ::

  #!/bin/sh

  if [ "${ACTION}" = "offline" ]
  then
      echo "CPU ${DEVPATH##*/} offline"

  elif [ "${ACTION}" = "online" ]
  then
      echo "CPU ${DEVPATH##*/} online"

  fi

can process the event further.

Kernel Inline Documentation Reference
=====================================

.. kernel-doc:: include/linux/cpuhotplug.h