Documentation / fault-injection / fault-injection.txt


Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

Fault injection capabilities infrastructure
===========================================

See also drivers/md/md-faulty.c and "every_nth" module option for scsi_debug.


Available fault injection capabilities
--------------------------------------

o failslab

  injects slab allocation failures. (kmalloc(), kmem_cache_alloc(), ...)

o fail_page_alloc

  injects page allocation failures. (alloc_pages(), get_free_pages(), ...)

o fail_futex

  injects futex deadlock and uaddr fault errors.

o fail_make_request

  injects disk IO errors on devices permitted by setting
  /sys/block/<device>/make-it-fail or
  /sys/block/<device>/<partition>/make-it-fail. (generic_make_request())

o fail_mmc_request

  injects MMC data errors on devices permitted by setting
  debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request

o fail_function

  injects error return on specific functions, which are marked by
  ALLOW_ERROR_INJECTION() macro, by setting debugfs entries
  under /sys/kernel/debug/fail_function. No boot option supported.

Configure fault-injection capabilities behavior
-----------------------------------------------

o debugfs entries

fault-inject-debugfs kernel module provides some debugfs entries for runtime
configuration of fault-injection capabilities.

- /sys/kernel/debug/fail*/probability:

	likelihood of failure injection, in percent.
	Format: <percent>

	Note that one-failure-per-hundred is a very high error rate
	for some testcases.  Consider setting probability=100 and configuring
	/sys/kernel/debug/fail*/interval for such testcases.

- /sys/kernel/debug/fail*/interval:

	specifies the interval between failures, for calls to
	should_fail() that pass all the other tests.

	Note that if you enable this, by setting interval>1, you will
	probably want to set probability=100.

- /sys/kernel/debug/fail*/times:

	specifies how many times failures may happen at most.
	A value of -1 means "no limit".

- /sys/kernel/debug/fail*/space:

	specifies an initial resource "budget", decremented by "size"
	on each call to should_fail(,size).  Failure injection is
	suppressed until "space" reaches zero.

- /sys/kernel/debug/fail*/verbose:

	Format: { 0 | 1 | 2 }
	specifies the verbosity of the messages when failure is
	injected.  '0' means no messages; '1' will print only a single
	log line per failure; '2' will print a call trace too -- useful
	to debug the problems revealed by fault injection.

- /sys/kernel/debug/fail*/task-filter:

	Format: { 'Y' | 'N' }
	A value of 'N' disables filtering by process (default).
	A value of 'Y' limits failures to only processes indicated by
	/proc/<pid>/make-it-fail==1.

- /sys/kernel/debug/fail*/require-start:
- /sys/kernel/debug/fail*/require-end:
- /sys/kernel/debug/fail*/reject-start:
- /sys/kernel/debug/fail*/reject-end:

	specifies the range of virtual addresses tested during
	stacktrace walking.  Failure is injected only if some caller
	in the walked stacktrace lies within the required range, and
	none lies within the rejected range.
	Default required range is [0,ULONG_MAX) (whole of virtual address space).
	Default rejected range is [0,0).

- /sys/kernel/debug/fail*/stacktrace-depth:

	specifies the maximum stacktrace depth walked during search
	for a caller within [require-start,require-end) OR
	[reject-start,reject-end).

- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:

	Format: { 'Y' | 'N' }
	default is 'N', setting it to 'Y' won't inject failures into
	highmem/user allocations.

- /sys/kernel/debug/failslab/ignore-gfp-wait:
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:

	Format: { 'Y' | 'N' }
	default is 'N', setting it to 'Y' will inject failures
	only into non-sleep allocations (GFP_ATOMIC allocations).

- /sys/kernel/debug/fail_page_alloc/min-order:

	specifies the minimum page allocation order for which failures
	are injected.

- /sys/kernel/debug/fail_futex/ignore-private:

	Format: { 'Y' | 'N' }
	default is 'N', setting it to 'Y' will disable failure injections
	when dealing with private (address space) futexes.

- /sys/kernel/debug/fail_function/inject:

	Format: { 'function-name' | '!function-name' | '' }
	specifies the target function of error injection by name.
	If the function name is prefixed with '!', the given function is
	removed from the injection list. If nothing is specified (''),
	the injection list is cleared.

- /sys/kernel/debug/fail_function/injectable:

	(read only) shows error injectable functions and what type of
	error values can be specified. The error type will be one of
	the following:
	- NULL:	retval must be 0.
	- ERRNO: retval must be -1 to -MAX_ERRNO (-4095).
	- ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4095).

- /sys/kernel/debug/fail_function/<function-name>/retval:

	specifies the "error" return value to inject into the given
	function. This entry is created when the user specifies a new
	injection entry.

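As a rough illustration of how interval, times and space interact, the
sketch below models the decision in plain shell. It is not kernel code: it
assumes probability=100, ignores task-filter and the stacktrace ranges, and
assumes the budget test compares the remaining "space" against "size".
fi_should_fail and the FI_* variables are hypothetical names used only here;
lib/fault-inject.c remains the authoritative reference.

```shell
#!/bin/bash
# Simplified userspace model, NOT kernel code. All names are hypothetical.

FI_INTERVAL=1	# models .../interval
FI_TIMES=-1	# models .../times (-1 means "no limit")
FI_SPACE=0	# models .../space
FI_COUNT=0	# calls seen after the "space" budget was used up

# fi_should_fail SIZE
# Returns 0 (shell "true") if this model would inject a failure.
fi_should_fail()
{
	local size=$1

	# "times" exhausted: no more failures.
	if [ "$FI_TIMES" -eq 0 ]; then
		return 1
	fi

	# Budget not used up yet: consume it and do not fail.
	if [ "$FI_SPACE" -gt "$size" ]; then
		FI_SPACE=$((FI_SPACE - size))
		return 1
	fi

	# With interval > 1, only every interval-th remaining call fails.
	if [ "$FI_INTERVAL" -gt 1 ]; then
		FI_COUNT=$((FI_COUNT + 1))
		if [ $((FI_COUNT % FI_INTERVAL)) -ne 0 ]; then
			return 1
		fi
	fi

	# Inject, and count the failure against "times" unless unlimited.
	if [ "$FI_TIMES" -ne -1 ]; then
		FI_TIMES=$((FI_TIMES - 1))
	fi
	return 0
}
```

For example, with FI_INTERVAL=3 and FI_TIMES=2, nine consecutive calls to
"fi_should_fail 1" report a failure on the 3rd and 6th calls only.
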
o Boot option

In order to inject faults while debugfs is not available (early boot time),
use the boot option. Each option takes the same value format:

	failslab=
	fail_page_alloc=
	fail_make_request=
	fail_futex=
	mmc_core.fail_request=<interval>,<probability>,<space>,<times>

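A boot option string can be assembled from the four fields in the order
given above. The helper below is a minimal sketch; fault_boot_param is a
hypothetical name, and the values in the usage note are arbitrary.

```shell
#!/bin/bash
# fault_boot_param NAME INTERVAL PROBABILITY SPACE TIMES
# Prints NAME=<interval>,<probability>,<space>,<times>.
# Hypothetical helper: it only formats the string and does not validate it.
fault_boot_param()
{
	printf '%s=%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

"fault_boot_param failslab 100 10 0 -1" prints "failslab=100,10,0,-1",
which could then be appended to the kernel command line.
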
o proc entries

- /proc/<pid>/fail-nth:
- /proc/self/task/<tid>/fail-nth:

	Writing an integer N to this file makes the N-th call in the task fail.
	Reading from this file returns an integer value. A value of '0' indicates
	that the fault set up by a previous write to this file was injected.
	A positive integer N indicates that the fault wasn't yet injected.
	Note that this file enables all types of faults (slab, futex, etc).
	This setting takes precedence over all other generic debugfs settings
	like probability, interval, times, etc. But per-capability settings
	(e.g. fail_futex/ignore-private) take precedence over it.

	This feature is intended for systematic testing of faults in a single
	system call. See an example below.

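For quick experiments from a shell, the write to fail-nth can be combined
with exec, in the same way the make-it-fail examples in this document do.
This is only a sketch: run_with_fail_nth and the FAIL_NTH_FILE override are
hypothetical names, and the count also covers fault-capable calls made by
exec itself and by program startup, so it is much coarser than arming the
fault directly around a single system call.

```shell
#!/bin/bash
# run_with_fail_nth N COMMAND [ARG...]
# Writes N to the fail-nth file of a fresh shell, then replaces that shell
# with COMMAND so the setting applies to the same task.
# FAIL_NTH_FILE is a hypothetical override, useful for dry runs on systems
# without fault injection; it defaults to /proc/self/fail-nth.
run_with_fail_nth()
{
	local n=$1
	shift
	FAIL_NTH_FILE=${FAIL_NTH_FILE:-/proc/self/fail-nth} \
	FAIL_NTH_N=$n \
	sh -c 'echo "$FAIL_NTH_N" > "$FAIL_NTH_FILE" && exec "$@"' sh "$@"
}
```

For example, "run_with_fail_nth 20 ls /tmp" arms the 20th fault-capable call
and then runs ls. If the write fails, the command is not executed and the
function returns a non-zero status.
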
How to add new fault injection capability
-----------------------------------------

o #include <linux/fault-inject.h>

o define the fault attributes

  DECLARE_FAULT_ATTR(name);

  Please see the definition of struct fault_attr in fault-inject.h
  for details.

o provide a way to configure fault attributes

- boot option

  If you need to enable the fault injection capability from boot time, you can
  provide a boot option to configure it. There is a helper function for it:

	setup_fault_attr(attr, str);

- debugfs entries

  failslab, fail_page_alloc, and fail_make_request use this method.
  Helper function:

	fault_create_debugfs_attr(name, parent, attr);

- module parameters

  If the scope of the fault injection capability is limited to a
  single kernel module, it is better to provide module parameters to
  configure the fault attributes.

o add a hook to insert failures

  Upon should_fail() returning true, client code should inject a failure.

	should_fail(attr, size);

Application Examples
--------------------

o Inject slab allocation failures into module init/exit code

#!/bin/bash

FAILTYPE=failslab
echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
echo 10 > /sys/kernel/debug/$FAILTYPE/probability
echo 100 > /sys/kernel/debug/$FAILTYPE/interval
echo -1 > /sys/kernel/debug/$FAILTYPE/times
echo 0 > /sys/kernel/debug/$FAILTYPE/space
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait

# Run a command with make-it-fail set, so task-filter selects it.
faulty_system()
{
	bash -c "echo 1 > /proc/self/make-it-fail && exec $*"
}

if [ $# -eq 0 ]
then
	echo "Usage: $0 modulename [ modulename ... ]"
	exit 1
fi

for m in "$@"
do
	echo "inserting $m..."
	faulty_system modprobe "$m"

	echo "removing $m..."
	faulty_system modprobe -r "$m"
done

------------------------------------------------------------------------------

o Inject page allocation failures only for a specific module

#!/bin/bash

FAILTYPE=fail_page_alloc
module=$1

if [ -z "$module" ]
then
	echo "Usage: $0 <modulename>"
	exit 1
fi

modprobe "$module"

if [ ! -d "/sys/module/$module/sections" ]
then
	echo "Module $module is not loaded"
	exit 1
fi

# Restrict injection to callers between the module's .text and .data addresses.
cat "/sys/module/$module/sections/.text" > /sys/kernel/debug/$FAILTYPE/require-start
cat "/sys/module/$module/sections/.data" > /sys/kernel/debug/$FAILTYPE/require-end

echo N > /sys/kernel/debug/$FAILTYPE/task-filter
echo 10 > /sys/kernel/debug/$FAILTYPE/probability
echo 100 > /sys/kernel/debug/$FAILTYPE/interval
echo -1 > /sys/kernel/debug/$FAILTYPE/times
echo 0 > /sys/kernel/debug/$FAILTYPE/space
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth

# Stop injecting when the script exits or is interrupted.
trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT

echo "Injecting errors into the module $module... (interrupt to stop)"
sleep 1000000

------------------------------------------------------------------------------

o Inject an open_ctree error during btrfs mount

#!/bin/bash

rm -f testfile.img
dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
DEVICE=$(losetup --show -f testfile.img)
mkfs.btrfs -f "$DEVICE"
mkdir -p tmpmnt

FAILTYPE=fail_function
FAILFUNC=open_ctree
echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject
# -12 is -ENOMEM
echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval
echo N > /sys/kernel/debug/$FAILTYPE/task-filter
echo 100 > /sys/kernel/debug/$FAILTYPE/probability
echo 0 > /sys/kernel/debug/$FAILTYPE/interval
echo -1 > /sys/kernel/debug/$FAILTYPE/times
echo 0 > /sys/kernel/debug/$FAILTYPE/space
echo 1 > /sys/kernel/debug/$FAILTYPE/verbose

# The mount is expected to fail, so a non-zero status means the
# injection worked.
mount -t btrfs "$DEVICE" tmpmnt
if [ $? -ne 0 ]
then
	echo "SUCCESS!"
else
	echo "FAILED!"
	umount tmpmnt
fi

# Clear the injection list.
echo > /sys/kernel/debug/$FAILTYPE/inject

rmdir tmpmnt
losetup -d "$DEVICE"
rm testfile.img


Tool to run command with failslab or fail_page_alloc
----------------------------------------------------
In order to make it easier to accomplish the tasks mentioned above, we can use
tools/testing/fault-injection/failcmd.sh.  Please run a command
"./tools/testing/fault-injection/failcmd.sh --help" for more information and
see the following examples.

Examples:

Run the command "make -C tools/testing/selftests/ run_tests" while injecting
slab allocation failures.

	# ./tools/testing/fault-injection/failcmd.sh \
		-- make -C tools/testing/selftests/ run_tests

Same as above, but allow at most 100 failures instead of the default of at
most one.

	# ./tools/testing/fault-injection/failcmd.sh --times=100 \
		-- make -C tools/testing/selftests/ run_tests

Same as above, but inject page allocation failures instead of slab
allocation failures.

	# env FAILCMD_TYPE=fail_page_alloc \
		./tools/testing/fault-injection/failcmd.sh --times=100 \
		-- make -C tools/testing/selftests/ run_tests

Systematic faults using fail-nth
--------------------------------

The following code systematically faults the 1st, 2nd, 3rd and so on
fault-capable calls in the socketpair() system call, until a call completes
without the armed fault being injected.

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
	int i, err, res, fail_nth, fds[2];
	ssize_t len;
	char buf[128];

	system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
	snprintf(buf, sizeof(buf), "/proc/self/task/%ld/fail-nth",
		 syscall(SYS_gettid));
	fail_nth = open(buf, O_RDWR);
	if (fail_nth < 0) {
		perror("open fail-nth");
		return 1;
	}
	for (i = 1;; i++) {
		/* Arm the i-th fault-capable call in this task. */
		snprintf(buf, sizeof(buf), "%d", i);
		write(fail_nth, buf, strlen(buf));
		res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
		err = errno;
		/* Reads back 0 if the fault was injected. */
		len = pread(fail_nth, buf, sizeof(buf) - 1, 0);
		buf[len > 0 ? len : 0] = '\0';
		if (res == 0) {
			close(fds[0]);
			close(fds[1]);
		}
		printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
			res, err);
		if (atoi(buf))
			break;
	}
	return 0;
}

An example output (errno 12 is ENOMEM and 23 is ENFILE; the errno value on
the last line is stale because socketpair() succeeded):

1-th fault Y: res=-1/23
2-th fault Y: res=-1/23
3-th fault Y: res=-1/12
4-th fault Y: res=-1/12
5-th fault Y: res=-1/23
6-th fault Y: res=-1/23
7-th fault Y: res=-1/23
8-th fault Y: res=-1/12
9-th fault Y: res=-1/12
10-th fault Y: res=-1/12
11-th fault Y: res=-1/12
12-th fault Y: res=-1/12
13-th fault Y: res=-1/12
14-th fault Y: res=-1/12
15-th fault Y: res=-1/12
16-th fault N: res=0/12