Documentation / power / basic-pm-debugging.txt

Based on kernel version 3.13. Page generated on 2014-01-20 22:04 EST.

Debugging hibernation and suspend
	(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)

To check if hibernation works, you can try to hibernate in the "reboot" mode:

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you started the transition.  If that happens,
hibernation most likely works correctly.  Still, you need to repeat the
test at least a couple of times in a row for confidence.  [This is necessary,
because some problems only show up on a second attempt at suspending and
resuming the system.]  Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work.  Thus, if your machine fails
to hibernate or resume in the "reboot" mode, you should try the "platform" mode:

# echo platform > /sys/power/disk
# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes.  In such cases the "shutdown" mode of hibernation might
work:

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).
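
The command pairs shown in this section differ only in the string written to
/sys/power/disk, so they can be wrapped in one shell function.  The following
is a minimal sketch and not part of the kernel tree: the function name
hibernate_in_mode and the SYS_POWER variable (which lets you point the function
at a scratch directory instead of the real sysfs) are made up for this example.

```shell
# Sketch only: select a hibernation mode, then start the transition.
# SYS_POWER is a hypothetical override used for dry runs; by default the
# real /sys/power directory is used and root privileges are required.
hibernate_in_mode() {
    mode=$1
    dir=${SYS_POWER:-/sys/power}

    # Only the three modes discussed in this document are accepted.
    case $mode in
        reboot|platform|shutdown) ;;
        *) echo "unsupported hibernation mode: $mode" >&2; return 2 ;;
    esac

    # Same two writes as the manual commands: mode first, then "disk".
    echo "$mode" > "$dir/disk" || return 1
    echo disk > "$dir/state"
}
```

With the function defined in a root shell, "hibernate_in_mode reboot" performs
the same two writes as the first pair of commands in this section.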

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

a) Test modes of hibernation

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.  There are 5 test modes available:

freezer
- test the freezing of processes

devices
- test the freezing of processes and suspending of devices

platform
- test the freezing of processes, suspending of devices and platform
  global control methods(*)

processors
- test the freezing of processes, suspending of devices, platform
  global control methods(*) and the disabling of nonboot CPUs

core
- test the freezing of processes, suspending of devices, platform global
  control methods(*), the disabling of nonboot CPUs and suspending of
  platform/system devices

(*) the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
the suspending of devices) and issue the standard hibernation commands.  For
example, to use the "devices" test mode along with the "platform" mode of
hibernation, you should do the following:

# echo devices > /sys/power/pm_test
# echo platform > /sys/power/disk
# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait 5 seconds,
resume devices and thaw processes.  If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation.  Next, it will wait 5 seconds and
invoke the platform (e.g. ACPI) global methods used to cancel hibernation etc.

Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations.  Also, when read, /sys/power/pm_test
contains a space-separated list of all available tests (including "none" that
represents the normal functionality) in which the current test level is
indicated by square brackets.
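
Because the current test level is the only bracketed entry in that list, it can
be extracted with standard text tools.  The following is a minimal sketch; the
function name pm_test_current is made up for this example, and the optional
file argument exists only so that the function can be tried on a sample file.

```shell
# Sketch only: print the currently selected test level, i.e. the entry of
# the space-separated list that is enclosed in square brackets.
pm_test_current() {
    file=${1:-/sys/power/pm_test}

    # Put every entry on its own line, then print the bracketed one
    # with the brackets removed.
    tr ' ' '\n' < "$file" | sed -n 's/^\[\(.*\)\]$/\1/p'
}
```

For instance, if the file contains "none core processors platform [devices]
freezer", the function prints "devices".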

Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image.  Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to rule out random factors).
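
This rule of thumb can be scripted.  The sketch below is only an illustration:
the function name run_pm_tests and the SYS_POWER override are made up, and the
script assumes that a failing test makes the write to /sys/power/state return
an error.  A test that hangs the machine cannot be detected this way, so watch
the console while it runs.

```shell
# Sketch only: walk the test levels from least to most invasive and repeat
# each one a few times.  SYS_POWER is a hypothetical override for dry runs.
run_pm_tests() {
    repeats=${1:-2}
    dir=${SYS_POWER:-/sys/power}

    for level in freezer devices platform processors core; do
        i=0
        while [ "$i" -lt "$repeats" ]; do
            echo "$level" > "$dir/pm_test" || return 1
            # In a test mode this write returns after the test cycle.
            if ! echo disk > "$dir/state"; then
                echo "test level '$level' failed" >&2
                echo none > "$dir/pm_test"
                return 1
            fi
            i=$((i + 1))
        done
        echo "test level '$level' passed"
    done

    # Switch back to the normal hibernation/suspend operations.
    echo none > "$dir/pm_test"
}
```

The hibernation mode in effect is whatever was last written to /sys/power/disk,
so set it before calling the function if you want the platform global control
methods to be tested.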

If the "freezer" test fails, there is a task that cannot be frozen (in that case
it usually is possible to identify the offending task by analysing the output of
dmesg obtained after the failing test).  Failure at this level usually means
that there is a problem with the tasks freezer subsystem that should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration).  To find this driver,
you can carry out a binary search according to the rules:
- if the test fails, unload half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note what drivers
  have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most
  recently and repeat.
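
The bookkeeping for this binary search can be sketched with two small helpers.
Both function names are made up for this example.  The sketch reads module
names from /proc/modules (first column) and does not unload anything by itself.

```shell
# Sketch only: helpers for the binary search over loaded modules.

# Print the names of the currently loaded modules, one per line.
# The optional argument lets you read a saved copy of /proc/modules.
list_modules() {
    awk '{ print $1 }' "${1:-/proc/modules}"
}

# Read names on standard input and print the first half of them
# (rounded up), i.e. the candidates to unload in the next step.
first_half() {
    awk '{ name[NR] = $0 }
         END { for (i = 1; i <= int((NR + 1) / 2); i++) print name[i] }'
}

# Possible use, as root (the file name is only an example):
#   list_modules > /root/modules.before    # note what was loaded
#   first_half < /root/modules.before | xargs -n 1 modprobe -r
```

Modules that are still in use will refuse to unload, so expect to adjust the
candidate list by hand.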

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation.  In that case please
make sure to report the problem with the driver.

It is also possible that the "devices" test will still fail after you have
unloaded all modules.  In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules).  You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system.  In that case the "platform" mode
of hibernation is not likely to work.  You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this can only be an issue on SMP systems) and the problem
should be reported.  In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.
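
Switching the CPUs off and on by hand can be sketched as follows.  The function
name cycle_nonboot_cpus is made up for this example, and the optional directory
argument exists only so that the function can be tried on a scratch directory.
CPUs that have no "online" attribute are skipped.

```shell
# Sketch only: take every CPU that has an "online" attribute offline and
# bring it back, reporting the CPUs for which either write fails.
cycle_nonboot_cpus() {
    root=${1:-/sys/devices/system/cpu}
    rc=0

    for f in "$root"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue        # the pattern matched nothing
        cpu=${f%/online}
        cpu=${cpu##*/}
        if echo 0 > "$f" && echo 1 > "$f"; then
            echo "$cpu: ok"
        else
            echo "$cpu: FAILED" >&2
            rc=1
        fi
    done

    return "$rc"
}
```

Run as root without arguments, the function reports one line per CPU and
returns a nonzero status if any CPU could not be cycled.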

If the "core" test fails, the suspending of the system/platform devices has
failed (these devices are suspended on one CPU with interrupts off).  The
problem is most probably hardware-related and serious, so it should be
reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware.  Such a failure usually
indicates a serious problem that may very well be related to the hardware, but
please report it anyway.

b) Testing minimal configuration

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes.  If that does not work, there
probably is a problem with a driver statically compiled into the kernel and you
can try to compile more drivers as modules, so that they can be tested
individually.  Otherwise, there is a problem with a modular driver and you can
find it by loading half of the modules you normally use and binary searching
in accordance with the algorithm:
- if there are n modules loaded and the attempt to suspend and resume fails,
  unload n/2 of the modules and try again (that would probably involve rebooting
  the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
  load n/2 more modules and try again.

Again, if you find the offending module or modules, they must be unloaded every
time before hibernation, and please report the problem with them.

c) Advanced debugging

If hibernation does not work on your system even in the minimal
configuration and compiling more drivers as modules is not practical or some
modules cannot be unloaded, you can use one of the more advanced debugging
techniques to find the problem.  First, if there is a serial port in your box,
you can boot the kernel with the 'no_console_suspend' parameter and try to log
kernel messages using the serial console.  This may provide you with some
information about the reasons for the suspend (resume) failure.  Alternatively,
it may be possible to use a FireWire port for debugging with firescope
(ftp://ftp.firstfloor.org/pub/ak/firescope/).  On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt.
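
Before relying on the serial console it is worth checking that the parameter
really reached the running kernel.  The following is a minimal sketch; the
function name cmdline_has is made up for this example, and the optional file
argument exists only so that the check can be tried on a sample file.

```shell
# Sketch only: succeed if the given parameter appears as a whole word on
# the kernel command line (read from /proc/cmdline by default).
cmdline_has() {
    param=$1
    file=${2:-/proc/cmdline}

    # One parameter per line, then an exact fixed-string match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Possible use:
#   cmdline_has no_console_suspend || echo "no_console_suspend is not set"
```

The match is exact, so a parameter that takes a value has to be given together
with that value.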

2. Testing suspend to RAM (STR)

To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).

The test modes described in Section 1 are also available for STR.  After
writing "freezer", "devices", "platform", "processors", or "core" into
/sys/power/pm_test (available if the kernel is compiled with CONFIG_PM_DEBUG
set) the suspend code will work in the test mode corresponding to the given
string.  The STR test modes are defined in the same way as for hibernation, so
please refer to Section 1 for more information about them.  In particular, the
"core" test allows you to test everything except for the actual invocation of
the platform firmware in order to put the system into the sleep state.
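
A single STR test cycle can be sketched as follows.  The function name str_test
and the SYS_POWER override are made up for this example.  The sketch relies on
"mem" being the string that selects suspend to RAM in /sys/power/state, which
this document does not spell out.

```shell
# Sketch only: run one suspend-to-RAM cycle in the given test mode and
# switch back to normal operation afterwards.
str_test() {
    level=$1
    dir=${SYS_POWER:-/sys/power}

    case $level in
        freezer|devices|platform|processors|core) ;;
        *) echo "unknown test level: $level" >&2; return 2 ;;
    esac

    echo "$level" > "$dir/pm_test" || return 1
    echo mem > "$dir/state"
    rc=$?

    # Restore the normal suspend behaviour whatever the outcome was.
    echo none > "$dir/pm_test"
    return "$rc"
}
```

As root, "str_test core" exercises everything except the final invocation of
the platform firmware.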

Among other things, the testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices.  They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if
it does not work "out of the box", you may need to boot it with
"init=/bin/bash" and test s2ram in the minimal configuration.  In that case,
you may be able to search for failing drivers by following a procedure
analogous to the one described in Section 1.  If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend to RAM statistics.  Here is an
example of its output:

	# mount -t debugfs none /sys/kernel/debug
	# cat /sys/kernel/debug/suspend_stats
	success: 20
	fail: 5
	failed_freeze: 0
	failed_prepare: 0
	failed_suspend: 5
	failed_suspend_noirq: 0
	failed_resume: 0
	failed_resume_noirq: 0
	failures:
	  last_failed_dev:	alarm
				adc
	  last_failed_errno:	-16
				-16
	  last_failed_step:	suspend
				suspend

The "success" field is the number of successful suspend to RAM transitions and
the "fail" field is the number of failed ones.  The "failed_*" fields count the
failures in the individual steps of suspend to RAM.  The "failures" part lists
only the last 2 failed devices, together with the error number and the suspend
step at which each of them failed.
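
The counters are easy to summarize with awk.  The following is a minimal
sketch; the function name suspend_stats_summary is made up for this example,
and the optional file argument exists only so that the function can be tried
on a saved copy of the file.

```shell
# Sketch only: print how many suspend attempts failed and which steps
# have a nonzero failure count.
suspend_stats_summary() {
    file=${1:-/sys/kernel/debug/suspend_stats}

    awk -F': *' '
        $1 == "success"      { ok = $2 }
        $1 == "fail"         { bad = $2 }
        /^failed_/ && $2 > 0 { steps = steps " " $1 "=" $2 }
        END {
            printf "%d of %d suspend attempts failed;%s\n",
                   bad, ok + bad, steps
        }
    ' "$file"
}
```

For the example output shown in this section the function prints
"5 of 25 suspend attempts failed; failed_suspend=5".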

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.