
Documentation / kprobes.txt

Based on kernel version 2.6.25. Page generated on 2008-04-18 21:22 EST.

Title	: Kernel Probes (Kprobes)
Authors	: Jim Keniston <jkenisto[AT]us.ibm[DOT]com>
	: Prasanna S Panchamukhi <prasanna[AT]in.ibm[DOT]com>

CONTENTS

1. Concepts: Kprobes, Jprobes, Return Probes
2. Architectures Supported
3. Configuring Kprobes
4. API Reference
5. Kprobes Features and Limitations
6. Probe Overhead
7. TODO
8. Kprobes Example
9. Jprobes Example
10. Kretprobes Example
Appendix A: The kprobes debugfs interface

1. Concepts: Kprobes, Jprobes, Return Probes

Kprobes enables you to dynamically break into any kernel routine and
collect debugging and performance information non-disruptively. You
can trap at almost any kernel code address, specifying a handler
routine to be invoked when the breakpoint is hit.

There are currently three types of probes: kprobes, jprobes, and
kretprobes (also called return probes).  A kprobe can be inserted
on virtually any instruction in the kernel.  A jprobe is inserted at
the entry to a kernel function, and provides convenient access to the
function's arguments.  A return probe fires when a specified function
returns.

In the typical case, Kprobes-based instrumentation is packaged as
a kernel module.  The module's init function installs ("registers")
one or more probes, and the exit function unregisters them.  A
registration function such as register_kprobe() specifies where
the probe is to be inserted and what handler is to be called when
the probe is hit.

The next three subsections explain how the different types of
probes work.  They explain certain things that you'll need to
know in order to make the best use of Kprobes -- e.g., the
difference between a pre_handler and a post_handler, and how
to use the maxactive and nmissed fields of a kretprobe.  But
if you're in a hurry to start using Kprobes, you can skip ahead
to section 2.

1.1 How Does a Kprobe Work?

When a kprobe is registered, Kprobes makes a copy of the probed
instruction and replaces the first byte(s) of the probed instruction
with a breakpoint instruction (e.g., int3 on i386 and x86_64).

When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
registers are saved, and control passes to Kprobes via the
notifier_call_chain mechanism.  Kprobes executes the "pre_handler"
associated with the kprobe, passing the handler the addresses of the
kprobe struct and the saved registers.

Next, Kprobes single-steps its copy of the probed instruction.
(It would be simpler to single-step the actual instruction in place,
but then Kprobes would have to temporarily remove the breakpoint
instruction.  This would open a small time window when another CPU
could sail right past the probepoint.)

After the instruction is single-stepped, Kprobes executes the
"post_handler," if any, that is associated with the kprobe.
Execution then continues with the instruction following the probepoint.
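The sequence above can be modeled in ordinary userspace C. This is a simulation, not kernel code: a byte array stands in for kernel text, 0xCC is the int3 opcode mentioned above, and sim_kprobe, sim_register(), sim_hit() and the demo handlers are hypothetical names. It illustrates the ordering only: the original byte is saved at registration, the breakpoint stays in place while the saved copy is "single-stepped", and the pre_handler runs before the post_handler.

```c
#include <string.h>

#define SIM_INT3 0xCC	/* int3 breakpoint opcode on i386/x86_64 */

/* Hypothetical, heavily simplified stand-in for struct kprobe. */
struct sim_kprobe {
	unsigned char *addr;		/* probed "instruction" in fake text */
	unsigned char saved_insn;	/* copy of the original first byte */
	void (*pre_handler)(struct sim_kprobe *p);
	void (*post_handler)(struct sim_kprobe *p);
};

/* Records the order of events so it can be checked. */
char sim_log[64];

static void log_event(const char *s)
{
	strncat(sim_log, s, sizeof(sim_log) - strlen(sim_log) - 1);
}

/* "Register": keep a copy of the original byte, patch in int3. */
void sim_register(struct sim_kprobe *p)
{
	p->saved_insn = *p->addr;
	*p->addr = SIM_INT3;
}

/* "Unregister": put the original byte back. */
void sim_unregister(struct sim_kprobe *p)
{
	*p->addr = p->saved_insn;
}

/* A breakpoint hit: pre_handler, single-step the copy, post_handler.
 * *addr keeps the breakpoint throughout, so another CPU arriving now
 * would still trap. Returns the byte that was "executed". */
unsigned char sim_hit(struct sim_kprobe *p)
{
	unsigned char executed;

	if (p->pre_handler)
		p->pre_handler(p);
	executed = p->saved_insn;	/* the copy, never *addr */
	log_event("step;");
	if (p->post_handler)
		p->post_handler(p);
	return executed;
}

void demo_pre(struct sim_kprobe *p)  { (void)p; log_event("pre;"); }
void demo_post(struct sim_kprobe *p) { (void)p; log_event("post;"); }
```

Because sim_hit() executes the saved copy rather than the patched byte, the breakpoint is never removed while a hit is being processed, which is exactly the window the text says Kprobes avoids.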

1.2 How Does a Jprobe Work?

A jprobe is implemented using a kprobe that is placed on a function's
entry point.  It employs a simple mirroring principle to allow
seamless access to the probed function's arguments.  The jprobe
handler routine should have the same signature (arg list and return
type) as the function being probed, and must always end by calling
the Kprobes function jprobe_return().

Here's how it works.  When the probe is hit, Kprobes makes a copy of
the saved registers and a generous portion of the stack (see below).
Kprobes then points the saved instruction pointer at the jprobe's
handler routine, and returns from the trap.  As a result, control
passes to the handler, which is presented with the same register and
stack contents as the probed function.  When it is done, the handler
calls jprobe_return(), which traps again to restore the original stack
contents and processor state and switch to the probed function.

By convention, the callee owns its arguments, so gcc may produce code
that unexpectedly modifies that portion of the stack.  This is why
Kprobes saves a copy of the stack and restores it after the jprobe
handler has run.  Up to MAX_STACK_SIZE bytes are copied -- e.g.,
64 bytes on i386.
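The save/restore step can be sketched in userspace as follows. It is only a model: a byte buffer stands in for the top of the probed function's stack, SIM_MAX_STACK_SIZE mirrors the 64-byte i386 value of MAX_STACK_SIZE, and sim_jprobe_hit() and clobbering_handler() are hypothetical names. A handler that scribbles on its argument area does not change what the probed function later sees, because the region is copied before the handler runs and restored afterwards.

```c
#include <string.h>

#define SIM_MAX_STACK_SIZE 64	/* the i386 value quoted in the text */

/* Hypothetical model of the jprobe stack handling. stack points at the
 * top of the probed function's stack; the handler gets the same view. */
void sim_jprobe_hit(unsigned char *stack, size_t live_bytes,
		    void (*handler)(unsigned char *stack))
{
	unsigned char copy[SIM_MAX_STACK_SIZE];
	size_t n = live_bytes < SIM_MAX_STACK_SIZE ?
		   live_bytes : SIM_MAX_STACK_SIZE;

	memcpy(copy, stack, n);		/* save before the handler runs */
	handler(stack);			/* handler sees the real arguments */
	memcpy(stack, copy, n);		/* jprobe_return(): restore */
}

/* A handler that, like gcc-generated callee code may, overwrites
 * part of its argument area. */
void clobbering_handler(unsigned char *stack)
{
	memset(stack, 0xAA, 8);
}
```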

Note that the probed function's args may be passed on the stack
or in registers.  The jprobe will work in either case, so long as the
handler's prototype matches that of the probed function.

1.3 Return Probes

1.3.1 How Does a Return Probe Work?

When you call register_kretprobe(), Kprobes establishes a kprobe at
the entry to the function.  When the probed function is called and this
probe is hit, Kprobes saves a copy of the return address, and replaces
the return address with the address of a "trampoline."  The trampoline
is an arbitrary piece of code -- typically just a nop instruction.
At boot time, Kprobes registers a kprobe at the trampoline.

When the probed function executes its return instruction, control
passes to the trampoline and that probe is hit.  Kprobes' trampoline
handler calls the user-specified return handler associated with the
kretprobe, then sets the saved instruction pointer to the saved return
address, and that's where execution resumes upon return from the trap.
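The return-address swap can be sketched the same way. In this userspace model a uintptr_t variable plays the stack slot that holds the return address, SIM_TRAMPOLINE is an arbitrary made-up address, and all names are hypothetical. The entry probe stashes the real address in a kretprobe_instance-like object and substitutes the trampoline; the trampoline handler calls the user's handler and then returns the saved address at which execution resumes.

```c
#include <stdint.h>

/* Made-up address standing in for the trampoline's location. */
#define SIM_TRAMPOLINE ((uintptr_t)0x1000)

struct sim_kretprobe_instance {
	uintptr_t ret_addr;	/* real return address, saved at entry */
};

/* Entry kprobe: stash the real return address, divert to trampoline. */
void sim_entry(uintptr_t *ret_slot, struct sim_kretprobe_instance *ri)
{
	ri->ret_addr = *ret_slot;
	*ret_slot = SIM_TRAMPOLINE;
}

/* Trampoline kprobe: run the user's return handler, then report where
 * execution should resume (the saved instruction pointer). */
uintptr_t sim_trampoline(struct sim_kretprobe_instance *ri,
			 void (*handler)(struct sim_kretprobe_instance *))
{
	if (handler)
		handler(ri);
	return ri->ret_addr;
}

int handler_calls;

void count_handler(struct sim_kretprobe_instance *ri)
{
	(void)ri;
	handler_calls++;
}
```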

While the probed function is executing, its return address is
stored in an object of type kretprobe_instance.  Before calling
register_kretprobe(), the user sets the maxactive field of the
kretprobe struct to specify how many instances of the specified
function can be probed simultaneously.  register_kretprobe()
pre-allocates the indicated number of kretprobe_instance objects.

For example, if the function is non-recursive and is called with a
spinlock held, maxactive = 1 should be enough.  If the function is
non-recursive and can never relinquish the CPU (e.g., via a semaphore
or preemption), NR_CPUS should be enough.  If maxactive <= 0, it is
set to a default value.  If CONFIG_PREEMPT is enabled, the default
is max(10, 2*NR_CPUS).  Otherwise, the default is NR_CPUS.
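The default rule for maxactive fits in a few lines. The function below is a hypothetical helper, with nr_cpus and preempt standing in for NR_CPUS and CONFIG_PREEMPT; it only restates the rule given above.

```c
/* Hypothetical restatement of the documented default:
 * maxactive > 0 is kept; otherwise max(10, 2*NR_CPUS) with
 * CONFIG_PREEMPT, or NR_CPUS without it. */
int sim_default_maxactive(int maxactive, int nr_cpus, int preempt)
{
	if (maxactive > 0)
		return maxactive;		/* caller's choice stands */
	if (preempt)
		return 2 * nr_cpus > 10 ? 2 * nr_cpus : 10;
	return nr_cpus;
}
```

For example, with CONFIG_PREEMPT and NR_CPUS = 4, 2*NR_CPUS is 8, so the default is 10; with NR_CPUS = 8 it is 16.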

It's not a disaster if you set maxactive too low; you'll just miss
some probes.  In the kretprobe struct, the nmissed field is set to
zero when the return probe is registered, and is incremented every
time the probed function is entered but there is no kretprobe_instance
object available for establishing the return probe.
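A userspace model of the preallocated pool and the nmissed counter follows. The struct and function names are hypothetical, and a simple free count replaces the kernel's real instance lists; what it shows is that a call arriving while all maxactive instances are in use is not probed, only counted.

```c
#define SIM_MAXACTIVE 2

/* Hypothetical, simplified kretprobe: a pool of instances allocated
 * at registration time, plus the nmissed counter. */
struct sim_kretprobe {
	int free_instances;	/* starts at maxactive */
	int nmissed;		/* zero at registration */
};

/* Function entry: take an instance, or count a miss.
 * Returns 1 if a return probe was armed for this call. */
int sim_enter(struct sim_kretprobe *rp)
{
	if (rp->free_instances == 0) {
		rp->nmissed++;
		return 0;
	}
	rp->free_instances--;
	return 1;
}

/* Function return (via the trampoline): give the instance back. */
void sim_return(struct sim_kretprobe *rp)
{
	rp->free_instances++;
}
```

With maxactive = 2, a third simultaneous activation finds the pool empty and only bumps nmissed; once one activation returns, the next call is probed again.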

1.3.2 Kretprobe entry-handler

Kretprobes also provides an optional user-specified handler which runs
on function entry. This handler is specified by setting the entry_handler
field of the kretprobe struct. Whenever the kprobe placed by kretprobe at the
function entry is hit, the user-defined entry_handler, if any, is invoked.
If the entry_handler returns 0 (success) then a corresponding return handler
is guaranteed to be called upon function return. If the entry_handler
returns a non-zero error then Kprobes leaves the return address as is, and
the kretprobe has no further effect for that particular function instance.

Multiple entry and return handler invocations are matched using the unique
kretprobe_instance object associated with them. Additionally, a user
may also specify per return-instance private data to be part of each
kretprobe_instance object. This is especially useful when sharing private
data between corresponding user entry and return handlers. The size of each
private data object can be specified at kretprobe registration time by
setting the data_size field of the kretprobe struct. This data can be
accessed through the data field of each kretprobe_instance object.

If the probed function is entered but no kretprobe_instance object is
available, then in addition to incrementing the nmissed count, the
user's entry_handler invocation is also skipped.
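The pairing of entry and return handlers through per-instance private data can be modeled in userspace as below. Everything here is hypothetical: sim_instance carries a fixed 16-byte area standing in for data_size bytes, the pool holds two instances (maxactive = 2), and "now" is a caller-supplied timestamp. The example handlers record the entry time and compute the elapsed time on return. The sketch also shows both skip rules: a non-zero entry_handler return leaves the call unprobed, and an exhausted pool bumps nmissed without calling entry_handler.

```c
#include <stddef.h>
#include <string.h>

#define SIM_MAXACTIVE 2

/* Hypothetical kretprobe_instance with a small private data area
 * standing in for data_size bytes (16 is enough for this demo). */
struct sim_instance {
	int in_use;
	unsigned char data[16];
};

struct sim_rp {
	struct sim_instance pool[SIM_MAXACTIVE];
	int nmissed;
	int (*entry_handler)(struct sim_instance *ri, long now);
	void (*handler)(struct sim_instance *ri, long now);
};

/* Function entry. Returns the armed instance, or NULL if no return
 * probe was established for this call. */
struct sim_instance *sim_rp_enter(struct sim_rp *rp, long now)
{
	for (int i = 0; i < SIM_MAXACTIVE; i++) {
		struct sim_instance *ri = &rp->pool[i];

		if (ri->in_use)
			continue;
		if (rp->entry_handler && rp->entry_handler(ri, now) != 0)
			return NULL;	/* non-zero: leave this call alone */
		ri->in_use = 1;
		return ri;
	}
	rp->nmissed++;	/* pool empty: entry_handler is skipped too */
	return NULL;
}

/* Function return: run the return handler on the same instance. */
void sim_rp_return(struct sim_rp *rp, struct sim_instance *ri, long now)
{
	if (rp->handler)
		rp->handler(ri, now);
	ri->in_use = 0;
}

/* Example pair: stash the entry time, report elapsed time on return. */
int entry_calls;
long last_elapsed;

int record_time(struct sim_instance *ri, long now)
{
	entry_calls++;
	memcpy(ri->data, &now, sizeof now);
	return 0;
}

void report_time(struct sim_instance *ri, long now)
{
	long start;

	memcpy(&start, ri->data, sizeof start);
	last_elapsed = now - start;
}
```

Because each activation keeps its own instance, overlapping calls do not mix up their start times: the return handler always reads the data written by the matching entry handler.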

2. Architectures Supported

Kprobes, jprobes, and return probes are implemented on the following
architectures:

- i386
- x86_64 (AMD-64, EM64T)
- ppc64
- ia64 (Does not support probes on instruction slot1.)
- sparc64 (Return probes not yet implemented.)
- arm

3. Configuring Kprobes

When configuring the kernel using make menuconfig/xconfig/oldconfig,
ensure that CONFIG_KPROBES is set to "y".  Under "Instrumentation
Support", look for "Kprobes".

So that you can load and unload Kprobes-based instrumentation modules,
make sure "Loadable module support" (CONFIG_MODULES) and "Module
unloading" (CONFIG_MODULE_UNLOAD) are set to "y".

Also make sure that CONFIG_KALLSYMS and perhaps even CONFIG_KALLSYMS_ALL
are set to "y", since kallsyms_lookup_name() is used by the in-kernel
kprobe address resolution code.

If you need to insert a probe in the middle of a function, you may find
it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
so you can use "objdump -d -l vmlinux" to see the source-to-object
code mapping.

4. API Reference

The Kprobes API includes a "register" function and an "unregister"
function for each type of probe.  Here are terse, mini-man-page
specifications for these functions and the associated probe handlers
that you'll write.  See the files in the samples/kprobes/ sub-directory
for examples.

4.1 register_kprobe

#include <linux/kprobes.h>
int register_kprobe(struct kprobe *kp);

Sets a breakpoint at the address kp->addr.  When the breakpoint is
hit, Kprobes calls kp->pre_handler.  After the probed instruction
is single-stepped, Kprobes calls kp->post_handler.  If a fault
occurs during execution of kp->pre_handler or kp->post_handler,
or during single-stepping of the probed instruction, Kprobes calls
kp->fault_handler.  Any or all handlers can be NULL.

NOTE:
1. With the introduction of the "symbol_name" field to struct kprobe,
the probepoint address resolution will now be taken care of by the kernel.
The following will now work:

	kp.symbol_name = "symbol_name";

(64-bit powerpc intricacies such as function descriptors are handled
transparently)

2. Use the "offset" field of struct kprobe if the offset into the symbol
to install a probepoint is known. This field is used to calculate the
probepoint.

3. Specify either the kprobe "symbol_name" OR the "addr". If both are
specified, kprobe registration will fail with -EINVAL.

4. With CISC architectures (such as i386 and x86_64), the kprobes code
does not validate whether kprobe.addr is at an instruction boundary.
Use "offset" with caution.

register_kprobe() returns 0 on success, or a negative errno otherwise.
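The address rules in notes 1 to 3 can be sketched as a small resolver. This is a simplified userspace model: a two-entry table (addresses borrowed from the debugfs listing in Appendix A) stands in for kallsyms_lookup_name(), and sim_kprobe_spec and sim_resolve_probepoint() are hypothetical names. Registration fails with -EINVAL when both symbol_name and addr are given; as an assumption of this sketch, it also fails when neither is set or the symbol is unknown. The offset is then added to the resolved base, with no instruction-boundary check, matching note 4.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table standing in for kallsyms_lookup_name(). */
struct sim_sym { const char *name; unsigned long addr; };
static const struct sim_sym sim_syms[] = {
	{ "do_fork",  0xc011a316UL },
	{ "vfs_read", 0xc015d71aUL },
};

/* Simplified stand-in for the struct kprobe fields discussed above. */
struct sim_kprobe_spec {
	unsigned long addr;		/* 0 if unset */
	const char *symbol_name;	/* NULL if unset */
	unsigned int offset;
};

/* Returns 0 and stores the probepoint in *out, or -EINVAL. */
int sim_resolve_probepoint(const struct sim_kprobe_spec *kp,
			   unsigned long *out)
{
	unsigned long base = 0;

	if (kp->addr && kp->symbol_name)
		return -EINVAL;		/* note 3: not both */
	if (kp->symbol_name) {
		for (size_t i = 0; i < sizeof(sim_syms) / sizeof(sim_syms[0]); i++)
			if (strcmp(sim_syms[i].name, kp->symbol_name) == 0)
				base = sim_syms[i].addr;
	} else {
		base = kp->addr;
	}
	if (!base)
		return -EINVAL;		/* assumed: nothing to resolve */
	*out = base + kp->offset;	/* note 2; no boundary check */
	return 0;
}
```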

User's pre-handler (kp->pre_handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int pre_handler(struct kprobe *p, struct pt_regs *regs);

Called with p pointing to the kprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit.  Return 0 here unless you're a Kprobes geek.

User's post-handler (kp->post_handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
void post_handler(struct kprobe *p, struct pt_regs *regs,
	unsigned long flags);

p and regs are as described for the pre_handler.  flags always seems
to be zero.

User's fault-handler (kp->fault_handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);

p and regs are as described for the pre_handler.  trapnr is the
architecture-specific trap number associated with the fault (e.g.,
on i386, 13 for a general protection fault or 14 for a page fault).
Returns 1 if it successfully handled the exception.

4.2 register_jprobe

#include <linux/kprobes.h>
int register_jprobe(struct jprobe *jp);

Sets a breakpoint at the address jp->kp.addr, which must be the address
of the first instruction of a function.  When the breakpoint is hit,
Kprobes runs the handler whose address is jp->entry.

The handler should have the same arg list and return type as the probed
function; and just before it returns, it must call jprobe_return().
(The handler never actually returns, since jprobe_return() returns
control to Kprobes.)  If the probed function is declared asmlinkage
or anything else that affects how args are passed, the handler's
declaration must match.

register_jprobe() returns 0 on success, or a negative errno otherwise.

4.3 register_kretprobe

#include <linux/kprobes.h>
int register_kretprobe(struct kretprobe *rp);

Establishes a return probe for the function whose address is
rp->kp.addr.  When that function returns, Kprobes calls rp->handler.
You must set rp->maxactive appropriately before you call
register_kretprobe(); see "How Does a Return Probe Work?" for details.

register_kretprobe() returns 0 on success, or a negative errno
otherwise.

User's return-probe handler (rp->handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs);

regs is as described for kprobe.pre_handler.  ri points to the
kretprobe_instance object, of which the following fields may be
of interest:
- ret_addr: the return address
- rp: points to the corresponding kretprobe object
- task: points to the corresponding task struct
- data: points to per return-instance private data; see "Kretprobe
	entry-handler" for details.

The regs_return_value(regs) macro provides a simple abstraction to
extract the return value from the appropriate register as defined by
the architecture's ABI.

The handler's return value is currently ignored.

4.4 unregister_*probe

#include <linux/kprobes.h>
void unregister_kprobe(struct kprobe *kp);
void unregister_jprobe(struct jprobe *jp);
void unregister_kretprobe(struct kretprobe *rp);

Removes the specified probe.  The unregister function can be called
at any time after the probe has been registered.

5. Kprobes Features and Limitations

Kprobes allows multiple probes at the same address.  Currently,
however, there cannot be multiple jprobes on the same function at
the same time.

In general, you can install a probe anywhere in the kernel.
In particular, you can probe interrupt handlers.  Known exceptions
are discussed in this section.

The register_*probe functions will return -EINVAL if you attempt
to install a probe in the code that implements Kprobes (mostly
kernel/kprobes.c and arch/*/kernel/kprobes.c, but also functions such
as do_page_fault and notifier_call_chain).

If you install a probe in an inline-able function, Kprobes makes
no attempt to chase down all inline instances of the function and
install probes there.  gcc may inline a function without being asked,
so keep this in mind if you're not seeing the probe hits you expect.

A probe handler can modify the environment of the probed function
-- e.g., by modifying kernel data structures, or by modifying the
contents of the pt_regs struct (which are restored to the registers
upon return from the breakpoint).  So Kprobes can be used, for example,
to install a bug fix or to inject faults for testing.  Kprobes, of
course, has no way to distinguish the deliberately injected faults
from the accidental ones.  Don't drink and probe.

Kprobes makes no attempt to prevent probe handlers from stepping on
each other -- e.g., probing printk() and then calling printk() from a
probe handler.  If a probe handler hits a probe, that second probe's
handlers won't be run in that instance, and the kprobe.nmissed member
of the second probe will be incremented.
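The nesting rule can be modeled with a single "probe currently running" slot, as on one CPU. The names are hypothetical and the real bookkeeping is per-CPU; the example probes a stand-in for printk() whose handler itself calls that function, so the inner hit is skipped and counted in nmissed.

```c
#include <stddef.h>

/* Hypothetical, simplified probe with hit and miss counters. */
struct sim_probe {
	void (*pre_handler)(struct sim_probe *p);
	unsigned long nhit;	/* times the handler actually ran */
	unsigned long nmissed;	/* nested hits that were skipped */
};

/* Probe whose handler is running right now (per-CPU in the kernel). */
static struct sim_probe *sim_running;

/* Breakpoint hit. Inside another handler, skip and count a miss. */
void sim_probe_hit(struct sim_probe *p)
{
	if (sim_running) {
		p->nmissed++;
		return;
	}
	sim_running = p;
	p->nhit++;
	if (p->pre_handler)
		p->pre_handler(p);
	sim_running = NULL;
}

/* A probed stand-in for printk() whose handler calls it again. */
struct sim_probe printk_probe;

void fake_printk(void) { sim_probe_hit(&printk_probe); }

void printk_handler(struct sim_probe *p) { (void)p; fake_printk(); }
```

Each outer call runs the handler once and records one miss for the nested call, rather than recursing without bound.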

As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of
the same handler) may run concurrently on different CPUs.

Kprobes does not use mutexes or allocate memory except during
registration and unregistration.

Probe handlers are run with preemption disabled.  Depending on the
architecture, handlers may also run with interrupts disabled.  In any
case, your handler should not yield the CPU (e.g., by attempting to
acquire a semaphore).

Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
to __builtin_return_address() will typically yield the trampoline's
address instead of the real return address for kretprobed functions.
(As far as we can tell, __builtin_return_address() is used only
for instrumentation and error reporting.)

If the number of times a function is called does not match the number
of times it returns, registering a return probe on that function may
produce undesirable results. In such a case, a line like the following
gets printed:

	kretprobe BUG!: Processing kretprobe d000000000041aa8 @ c00000000004f48c

With this information, one will be able to correlate the exact instance
of the kretprobe that caused the problem. We have the do_exit() case
covered. do_execve() and do_fork() are not an issue. We're unaware of
other specific cases where this could be a problem.

If, upon entry to or exit from a function, the CPU is running on
a stack other than that of the current task, registering a return
probe on that function may produce undesirable results.  For this
reason, Kprobes doesn't support return probes (or kprobes or jprobes)
on the x86_64 version of __switch_to(); the registration functions
return -EINVAL.

6. Probe Overhead

On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
microseconds to process.  Specifically, a benchmark that hits the same
probepoint repeatedly, firing a simple handler each time, reports 1-2
million hits per second, depending on the architecture.  A jprobe or
return-probe hit typically takes 50-75% longer than a kprobe hit.
When you have a return probe set on a function, adding a kprobe at
the entry to that function adds essentially no overhead.

Here are sample overhead figures (in usec) for different architectures.
k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe
on same function; jr = jprobe + return probe on same function

i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40

x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07

ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
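The figures above relate by simple arithmetic. A cost of c microseconds per hit allows about 1,000,000 / c hits per second, so 0.5 to 1.0 usec corresponds to the 1-2 million hits per second quoted above, and the relative cost of a jprobe or return probe is its time divided by the kprobe time. The two helpers below are a small sketch of that arithmetic; the function names are hypothetical.

```c
/* Hits per second implied by a per-hit cost in microseconds. */
double hits_per_second(double usec_per_hit)
{
	return 1e6 / usec_per_hit;
}

/* How much slower (in percent) a probe type is than a plain kprobe. */
double percent_slower(double probe_usec, double kprobe_usec)
{
	return (probe_usec / kprobe_usec - 1.0) * 100.0;
}
```

Applied to the table, the jprobe and return-probe figures come out between about 55% slower (x86_64 jprobe, 0.76 vs 0.49) and about 75% slower (i386 jprobe, 1.00 vs 0.57) than a kprobe hit, in line with the 50-75% estimate.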

7. TODO

a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
programming interface for probe-based instrumentation.  Try it out.
b. Kernel return probes for sparc64.
c. Support for other architectures.
d. User-space probes.
e. Watchpoint probes (which fire on data references).

8. Kprobes Example

See samples/kprobes/kprobe_example.c

9. Jprobes Example

See samples/kprobes/jprobe_example.c

10. Kretprobes Example

See samples/kprobes/kretprobe_example.c

For additional information on Kprobes, refer to the following URLs:
http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe
http://www.redhat.com/magazine/005mar05/features/kprobes/
http://www-users.cs.umn.edu/~boutcher/kprobes/
http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115)


Appendix A: The kprobes debugfs interface

With recent kernels (> 2.6.20) the list of registered kprobes is visible
under the /debug/kprobes/ directory (assuming debugfs is mounted at /debug).

/debug/kprobes/list: Lists all registered probes on the system

c015d71a  k  vfs_read+0x0
c011a316  j  do_fork+0x0
c03dedc5  r  tcp_v4_rcv+0x0

The first column provides the kernel address where the probe is inserted.
The second column identifies the type of probe (k - kprobe, r - kretprobe
and j - jprobe), while the third column specifies the symbol+offset of
the probe. If the probed function belongs to a module, the module name
is also specified.
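A tool reading /debug/kprobes/list could split each line into its columns roughly as follows. This is a userspace sketch; the struct and function names are hypothetical, and the handling of the optional module-name column (a single whitespace-separated token after symbol+offset) is an assumption, since the listing above shows no module probes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed line of /debug/kprobes/list. */
struct probe_entry {
	unsigned long addr;
	char type;		/* 'k', 'j' or 'r' */
	char symbol[64];
	unsigned long offset;
	char module[64];	/* empty if none was listed */
};

/* Parse e.g. "c015d71a  k  vfs_read+0x0". Returns 0 on success,
 * -1 on a malformed line. */
int parse_probe_line(const char *line, struct probe_entry *e)
{
	char symoff[128];
	char *plus;

	e->module[0] = '\0';
	if (sscanf(line, "%lx %c %127s %63s",
		   &e->addr, &e->type, symoff, e->module) < 3)
		return -1;
	plus = strrchr(symoff, '+');	/* split "symbol+0xoffset" */
	if (!plus)
		return -1;
	*plus = '\0';
	if (strlen(symoff) >= sizeof(e->symbol))
		return -1;
	strcpy(e->symbol, symoff);
	e->offset = strtoul(plus + 1, NULL, 16);	/* accepts "0x" */
	return 0;
}
```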

/debug/kprobes/enabled: Turn kprobes ON/OFF

Provides a knob to globally turn registered kprobes ON or OFF. By default,
all kprobes are enabled. Echoing "0" to this file disarms all registered
probes until a "1" is echoed to this file.
Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.