Based on kernel version 3.3. Page generated on 2012-03-23 21:36 EST.
                   Hypervisor-Assisted Dump
                   ------------------------
                       November 2007

The goal of hypervisor-assisted dump is to enable the dump of
a crashed system, and to do so from a fully-reset system, and
to minimize the total elapsed time until the system is back
in production use.

As compared to kdump or other strategies, hypervisor-assisted
dump offers several strong, practical advantages:

-- Unlike kdump, the system has been reset, and loaded
   with a fresh copy of the kernel.  In particular,
   PCI and I/O devices have been reinitialized and are
   in a clean, consistent state.
-- As the dump is performed, the dumped memory becomes
   immediately available to the system for normal use.
-- After the dump is completed, no further reboots are
   required; the system will be fully usable, and running
   in its normal, production mode on its normal kernel.

The above can only be accomplished by coordination with,
and assistance from, the hypervisor. The procedure is
as follows:

-- When a system crashes, the hypervisor will save
   the low 256MB of RAM to a previously registered
   save region. It will also save system state, system
   registers, and hardware PTEs.

-- After the low 256MB area has been saved, the
   hypervisor will reset PCI and other hardware state.
   It will *not* clear RAM. It will then launch the
   bootloader, as normal.

-- The freshly booted kernel will notice that there
   is a new node (ibm,dump-kernel) in the device tree,
   indicating that there is crash data available from
   a previous boot. It will boot into only 256MB of RAM,
   reserving the rest of system memory.

-- Userspace tools will parse /sys/kernel/release_region
   and read /proc/vmcore to obtain the contents of memory,
   which holds the image of the previously crashed kernel.
   The userspace tools may copy this data to disk, network,
   NAS, SAN, iSCSI, etc. as desired.

   For example, the values in /sys/kernel/release_region
   would look something like this (address-range pairs):
   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A

-- As the userspace tools complete saving a portion of
   the dump, they echo an offset and size to
   /sys/kernel/release_region to release the reserved
   memory back to general use.

   An example of this is:
     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
   which will release 256MB at the 1GB boundary.

Please note that the hypervisor-assisted dump feature
is only available on Power6-based systems with recent
firmware versions.

Implementation details:
-----------------------

During boot, a check is made to see if firmware supports
this feature on this particular machine. If it does, then
we check to see if an active dump is waiting for us. If so,
everything but 256MB of RAM is reserved during early
boot. This area is released once userland scripts have
collected the dump. If there is dump data, then
the /sys/kernel/release_region file is created, and
the reserved memory is held.

If there is no waiting dump data, then only the highest
256MB of RAM is reserved as a scratch area. This area
is *not* released: this region will be kept permanently
reserved, so that it can act as a receptacle for a copy
of the low 256MB in the case a crash does occur. See,
however, "open issues" below, as to whether
such a reserved region is really needed.

Currently the dump will be copied from /proc/vmcore to
a new file upon user intervention. The starting address
to be read and the range for each data point are provided
in /sys/kernel/release_region.

The tools to examine the dump will be the same as the ones
used for kdump.
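
The release_region contents and the release step described above
can be sketched in Python as follows. This is only an illustration:
the parsing rules are inferred from the single example line quoted
in this document (the "/" line continuation is dropped), the names
parse_release_region and release_string are hypothetical, and the
two numbers of each pair are kept as raw integers without further
interpretation.

```python
import re

# Example contents as quoted in this document, joined onto one line.
EXAMPLE = ("CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: "
           "DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A")

# Assumption: a label is a run of capitals at the start of the text or
# after whitespace, followed by a colon.
LABEL = re.compile(r"(?:^|\s)([A-Z]+):")
# Assumption: each pair is written as two hex numbers joined by "-".
PAIR = re.compile(r"(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)")


def parse_release_region(text):
    """Return {label: [(first, second), ...]} with integer values."""
    regions = {}
    labels = list(LABEL.finditer(text))
    for i, match in enumerate(labels):
        # The body of a label runs up to the next label (or end of text).
        end = labels[i + 1].start() if i + 1 < len(labels) else len(text)
        body = text[match.end():end]
        regions[match.group(1)] = [
            (int(a, 16), int(b, 16)) for a, b in PAIR.findall(body)
        ]
    return regions


def release_string(offset, size):
    """Format an offset and size the way the echo example writes them."""
    return f"{offset:#x} {size:#x}"


for label, pairs in parse_release_region(EXAMPLE).items():
    print(label, [(hex(a), hex(b)) for a, b in pairs])
print(release_string(0x40000000, 0x10000000))
```

The last line rebuilds the string used in the echo example:
0x40000000 is the 1GB boundary and 0x10000000 is 256MB.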

General notes:
--------------
Security: please note that there are potential security issues
with any sort of dump mechanism. In particular, plaintext
(unencrypted) data, and possibly passwords, may be present in
the dump data. Userspace tools must take adequate precautions to
preserve security.

Open issues/ToDo:
-----------------
 o The various code paths that tell the hypervisor that a crash
   occurred, vs. it simply being a normal reboot, should be
   reviewed, and possibly clarified/fixed.

 o Instead of using /sys/kernel, should there be a /sys/dump
   instead? There is a dump_subsys being created by the s390 code,
   perhaps the pseries code should use a similar layout as well.

 o Is reserving a 256MB region really required? The goal of
   reserving a 256MB scratch area is to make sure that no
   important crash data is clobbered when the hypervisor
   saves low memory to the scratch area. But, if one could assure
   that nothing important is located in some 256MB area, then
   it would not need to be reserved. This is something that can
   be improved in subsequent versions.

 o Still working with the kdump team to integrate this with kdump;
   some work remains, but this would not affect the current
   patches.

 o Still need to write a shell script to copy the dump away.
   Currently I am parsing it manually.
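
As a possible starting point for the copy script mentioned in the
last item, a minimal sketch is given here in Python rather than
shell. It assumes that /proc/vmcore can be read sequentially like
a regular file and that release_region accepts one "offset size"
line per write, as in the echo example earlier in this document.
The names copy_dump, release_chunks and release are hypothetical,
and the question of which regions to release, and when, is not
addressed.

```python
def copy_dump(src="/proc/vmcore", dst="vmcore.saved", bufsize=1 << 20):
    """Copy src to dst in bufsize pieces; return the bytes written."""
    total = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(bufsize)
            if not buf:
                break
            fout.write(buf)
            total += len(buf)
    return total


def release_chunks(start, size, chunk=0x10000000):
    """Yield "offset size" strings covering the range in chunk-sized steps.

    The default chunk of 0x10000000 (256MB) matches the echo example.
    """
    end = start + size
    while start < end:
        step = min(chunk, end - start)
        yield f"{start:#x} {step:#x}"
        start += step


def release(start, size, path="/sys/kernel/release_region"):
    """Write each "offset size" line to path, one write per chunk."""
    for line in release_chunks(start, size):
        with open(path, "w") as f:
            f.write(line + "\n")
```

Both paths are parameters so that the functions can be exercised
against ordinary files before being pointed at a real system.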