Based on kernel version 4.9. Page generated on 2016-12-21 14:36 EST.
Debugging hibernation and suspend
(C) 2007 Rafael J. Wysocki <firstname.lastname@example.org>, GPL

1. Testing hibernation (aka suspend to disk or STD)

To check if hibernation works, you can try to hibernate in the "reboot" mode:

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you have started the transition. If that happens,
hibernation is most likely to work correctly. Still, you need to repeat the
test at least a couple of times in a row for confidence. [This is necessary,
because some problems only show up on a second attempt at suspending and
resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work. Thus, if your machine fails
to hibernate or resume in the "reboot" mode, you should try the "platform" mode:

# echo platform > /sys/power/disk
# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes. In such cases the "shutdown" mode of hibernation might
work:

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

a) Test modes of hibernation

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.
There are 5 test modes available:

freezer
- test the freezing of processes

devices
- test the freezing of processes and suspending of devices

platform
- test the freezing of processes, suspending of devices and platform
  global control methods(*)

processors
- test the freezing of processes, suspending of devices, platform
  global control methods(*) and the disabling of nonboot CPUs

core
- test the freezing of processes, suspending of devices, platform global
  control methods(*), the disabling of nonboot CPUs and suspending of
  platform/system devices

(*) the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
suspending of devices) and issue the standard hibernation commands. For example,
to use the "devices" test mode along with the "platform" mode of hibernation,
you should do the following:

# echo devices > /sys/power/pm_test
# echo platform > /sys/power/disk
# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few
seconds (5 by default, but configurable by the suspend.pm_test_delay module
parameter), resume devices and thaw processes. If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation. Next, it will wait a
configurable number of seconds and invoke the platform (e.g. ACPI) global
methods used to cancel hibernation etc.

Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations.
Also, when open for reading, /sys/power/pm_test
contains a space-separated list of all available tests (including "none" that
represents the normal functionality) in which the current test level is
indicated by square brackets.

Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image. Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on. Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to make sure that any random factors are avoided).

If the "freezer" test fails, there is a task that cannot be frozen (in that case
it usually is possible to identify the offending task by analysing the output of
dmesg obtained after the failing test). Failure at this level usually means
that there is a problem with the tasks freezer subsystem that should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration). To find this driver,
you can carry out a binary search according to the rules:
- if the test fails, unload a half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note what drivers
  have been loaded before the test),
- if the test succeeds, load a half of the drivers you have unloaded most
  recently and repeat.

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation. In that case please
make sure to report the problem with the driver.
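
The rule of thumb above (walk the test levels from "freezer" up to "core") and
the square-bracket convention of /sys/power/pm_test can be sketched as a pair of
shell helpers. This is only an illustration: the names pm_test_current,
PM_TEST_LEVELS and pm_test_ladder are made up for this example and are not
provided by the kernel. The sketch assumes that a failing test makes the write
to /sys/power/state return an error, and it cannot detect a system that simply
hangs.

```shell
#!/bin/sh
# Sketch only: all helper names below are hypothetical.

# Print the entry marked with square brackets in a pm_test-style list,
# for example "none core [devices] freezer" gives "devices".
pm_test_current() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^\[\(.*\)\]$/\1/p'
}

# Test levels in the recommended order, least invasive first.
PM_TEST_LEVELS="freezer devices platform processors core"

# Run every test level once in the given hibernation mode (default:
# "platform").  Needs root and a kernel built with CONFIG_PM_DEBUG.
pm_test_ladder() {
    mode=${1:-platform}
    for level in $PM_TEST_LEVELS; do
        echo "$level" > /sys/power/pm_test || return 1
        echo "$mode" > /sys/power/disk || return 1
        if ! echo disk > /sys/power/state; then
            echo "pm_test level '$level' failed, check dmesg" >&2
            echo none > /sys/power/pm_test
            return 1
        fi
    done
    # Switch back to the normal hibernation/suspend operations.
    echo none > /sys/power/pm_test
}
```

After sourcing the file, something like

    pm_test_current "$(cat /sys/power/pm_test)"

shows the active test level, and running pm_test_ladder a couple of times in a
row follows the advice to repeat each level.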

It is also possible that the "devices" test will still fail after you have
unloaded all modules. In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules). You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system. In that case the "platform" mode
of hibernation is not likely to work. You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this may only be an issue on SMP systems) and the problem
should be reported. In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.

If the "core" test fails, which means that suspending of the system/platform
devices has failed (these devices are suspended on one CPU with interrupts off),
the problem is most probably hardware-related and serious, so it should be
reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware. Such a failure usually
indicates a serious problem that very well may be related to the hardware, but
please report it anyway.

b) Testing minimal configuration

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes.
If that does not work, there
probably is a problem with a driver statically compiled into the kernel and you
can try to compile more drivers as modules, so that they can be tested
individually. Otherwise, there is a problem with a modular driver and you can
find it by loading a half of the modules you normally use and binary searching
in accordance with the algorithm:
- if there are n modules loaded and the attempt to suspend and resume fails,
  unload n/2 of the modules and try again (that would probably involve rebooting
  the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
  load n/2 modules more and try again.

Again, if you find the offending module(s), it(they) must be unloaded every time
before hibernation, and please report the problem with it(them).

c) Using the "test_resume" hibernation option

/sys/power/disk generally tells the kernel what to do after creating a
hibernation image. One of the available options is "test_resume" which
causes the just created image to be used for immediate restoration. Namely,
after doing:

# echo test_resume > /sys/power/disk
# echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered
immediately without involving the platform firmware in any way.

That test can be used to check if failures to resume from hibernation are
related to bad interactions with the platform firmware. That is, if the above
works every time, but resume from actual hibernation does not work or is
unreliable, the platform firmware may be responsible for the failures.
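
Since the conclusion above depends on "test_resume" working every time, it can
help to repeat the test in a loop. The following is a minimal sketch and not a
kernel-provided tool: the function names are hypothetical, and it assumes that
reading /sys/power/disk returns a space-separated list of the supported modes
with the current one in square brackets.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Succeed if mode $1 appears in the /sys/power/disk-style list $2.
# The currently selected entry is bracketed, so the brackets are
# stripped before comparing.
hibernation_mode_listed() {
    for entry in $(printf '%s' "$2" | sed 's/[][]//g'); do
        [ "$entry" = "$1" ] && return 0
    done
    return 1
}

# Run the "test_resume" cycle $1 times (default: 3) and stop at the first
# failure.  Needs root; every run creates a real hibernation image.
test_resume_repeat() {
    runs=${1:-3}
    modes=$(cat /sys/power/disk) || return 1
    if ! hibernation_mode_listed test_resume "$modes"; then
        echo "test_resume is not offered by this kernel" >&2
        return 1
    fi
    i=1
    while [ "$i" -le "$runs" ]; do
        echo test_resume > /sys/power/disk || return 1
        if ! echo disk > /sys/power/state; then
            echo "test_resume run $i failed, check dmesg" >&2
            return 1
        fi
        i=$((i + 1))
    done
}
```

Note that test_resume_repeat leaves /sys/power/disk set to "test_resume", so
write the mode you normally use back to that file afterwards.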

On architectures and platforms that support using different kernels to restore
hibernation images (that is, the kernel used to read the image from storage and
load it into memory is different from the one included in the image) or support
kernel address space randomization, it also can be used to check if failures
to resume may be related to the differences between the restore and image
kernels.

d) Advanced debugging

If hibernation does not work on your system even in the minimal
configuration and compiling more drivers as modules is not practical or some
modules cannot be unloaded, you can use one of the more advanced debugging
techniques to find the problem. First, if there is a serial port in your box,
you can boot the kernel with the 'no_console_suspend' parameter and try to log
kernel messages using the serial console. This may provide you with some
information about the reasons for the suspend (resume) failure. Alternatively,
it may be possible to use a FireWire port for debugging with firescope
(http://v3.sk/~lkundrak/firescope/). On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt.

2. Testing suspend to RAM (STR)

To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).

Namely, after writing "freezer", "devices", "platform", "processors", or "core"
into /sys/power/pm_test (available if the kernel is compiled with
CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding
to the given string. The STR test modes are defined in the same way as for
hibernation, so please refer to Section 1 for more information about them.
In
particular, the "core" test allows you to test everything except for the actual
invocation of the platform firmware in order to put the system into the sleep
state.

Among other things, the testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices. They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if
it does not work "out of the box", you may need to boot it with
"init=/bin/bash" and test s2ram in the minimal configuration. In that case,
you may be able to search for failing drivers by following the procedure
analogous to the one described in section 1. If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend to RAM statistics. Here is an
example of its output.
# mount -t debugfs none /sys/kernel/debug
# cat /sys/kernel/debug/suspend_stats
success: 20
fail: 5
failed_freeze: 0
failed_prepare: 0
failed_suspend: 5
failed_suspend_noirq: 0
failed_resume: 0
failed_resume_noirq: 0
failures:
  last_failed_dev:    alarm
                      adc
  last_failed_errno:  -16
                      -16
  last_failed_step:   suspend
                      suspend
The "success" field is the number of successful suspend to RAM transitions and
the "fail" field is the number of failed ones. The other counters give the
number of failures in the individual steps of suspend to RAM. suspend_stats
lists only the last 2 failed devices, along with their error numbers and the
steps of suspend in which they failed.
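
The counters in suspend_stats are plain "name: value" lines, so they are easy
to pick apart with awk. The sketch below uses two hypothetical helper names
(suspend_stat and suspend_failure_summary) and assumes that the total number
of suspend attempts is the sum of the "success" and "fail" fields.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Print the value of the counter named $1 from suspend_stats text on stdin.
suspend_stat() {
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Print "failed/total" suspend attempts from suspend_stats text on stdin,
# taking total as success + fail (an assumption of this sketch).
suspend_failure_summary() {
    awk '$1 == "success:" { ok = $2 }
         $1 == "fail:"    { bad = $2 }
         END { printf "%d/%d\n", bad, ok + bad }'
}
```

For example, with debugfs mounted as shown above:

    # suspend_failure_summary < /sys/kernel/debug/suspend_stats
    # suspend_stat failed_suspend < /sys/kernel/debug/suspend_stats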