Documentation / local_ops.txt


Based on kernel version 4.9. Page generated on 2016-12-21 14:35 EST.

	     Semantics and Behavior of Local Atomic Operations

			    Mathieu Desnoyers


	This document explains the purpose of the local atomic operations, how
to implement them for any given architecture, and shows how they can be used
properly. It also stresses the precautions that must be taken when reading
those local variables across CPUs when the order of memory writes matters.

Note that local_t based operations are not recommended for general kernel use.
Please use the this_cpu operations instead unless there is really a special
purpose. Most uses of local_t in the kernel have been replaced by this_cpu
operations. this_cpu operations combine the relocation with the local_t-like
semantics in a single instruction and yield more compact and faster executing
code.


* Purpose of local atomic operations

Local atomic operations are meant to provide fast and highly reentrant per CPU
counters. They minimize the performance cost of standard atomic operations by
removing the LOCK prefix and memory barriers normally required to synchronize
across CPUs.

Having fast per CPU atomic counters is useful in many cases: it does not
require disabling interrupts to protect from interrupt handlers and it permits
coherent counters in NMI handlers. It is especially useful for tracing purposes
and for various performance monitoring counters.

Local atomic operations only guarantee variable modification atomicity with
respect to the CPU which owns the data. Therefore, care must be taken to make
sure that only one CPU writes to the local_t data. This is done by using per
cpu data and making sure that we modify it from within a preemption safe
context. It is however permitted to read local_t data from any CPU: it will
then appear to be written out of order with respect to other memory writes by
the owner CPU.


* Implementation for a given architecture

Local atomic operations can be implemented by slightly modifying the standard
atomic operations: only their UP variant must be kept. This typically means
removing the LOCK prefix (on i386 and x86_64) and any SMP synchronization
barrier. If the architecture does not have a different behavior between SMP and
UP, including asm-generic/local.h in your architecture's local.h is sufficient.

The local_t type is defined as an opaque signed long by embedding an
atomic_long_t inside a structure. This is done so that a cast from this type to
a long fails. The definition looks like:

typedef struct { atomic_long_t a; } local_t;

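The effect of the opaque wrapper can be illustrated outside the kernel. The
following is a minimal userspace sketch, not the kernel implementation: it
assumes a C11 compiler, uses atomic_long from <stdatomic.h> in place of the
kernel's atomic_long_t, and the user_local_* names are hypothetical stand-ins
for local_t and its accessors. Relaxed C11 atomics only model the lack of
cross-CPU ordering; they do not remove the LOCK prefix as real local ops do.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for local_t. The value is only reachable
 * through the accessors below, so "(long)counter" does not compile. */
typedef struct { atomic_long a; } user_local_t;

/* Argument order mirrors local_read(l), local_set(l, i), local_add(i, l). */
static inline long user_local_read(user_local_t *l)
{
	return atomic_load_explicit(&l->a, memory_order_relaxed);
}

static inline void user_local_set(user_local_t *l, long i)
{
	atomic_store_explicit(&l->a, i, memory_order_relaxed);
}

static inline void user_local_add(long i, user_local_t *l)
{
	atomic_fetch_add_explicit(&l->a, i, memory_order_relaxed);
}

static inline void user_local_inc(user_local_t *l)
{
	user_local_add(1, l);
}
```

Because user_local_t is a structure, the compiler rejects a direct cast or
arithmetic on it, which forces every access through the accessor functions.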

* Rules to follow when using local atomic operations

- Variables touched by local ops must be per cpu variables.
- _Only_ the CPU owner of these variables may write to them.
- This CPU can use local ops from any context (process, irq, softirq, nmi, ...)
  to update its local_t variables.
- Preemption (or interrupts) must be disabled when using local ops in
  process context to make sure the process won't be migrated to a
  different CPU between getting the per-cpu variable and doing the
  actual local op.
- When using local ops in interrupt context, no special care needs to be
  taken on a mainline kernel, since they will run on the local CPU with
  preemption already disabled. It is advisable, however, to explicitly
  disable preemption anyway to make sure it will still work correctly on
  -rt kernels.
- Reading the local cpu variable will provide the current copy of the
  variable.
- Reads of these variables can be done from any CPU, because updates to
  aligned "long" variables are always atomic. Since no memory
  synchronization is done by the writer CPU, an outdated copy of the
  variable can be read when reading some _other_ cpu's variables.


* How to use local atomic operations

#include <linux/percpu.h>
#include <asm/local.h>

static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);


* Counting

Counting is done on all the bits of a signed long.

In preemptible context, use get_cpu_var() and put_cpu_var() around local atomic
operations: this makes sure that preemption is disabled around write access to
the per cpu variable. For instance:

	local_inc(&get_cpu_var(counters));
	put_cpu_var(counters);

If you are already in a preemption-safe context, you can use
this_cpu_ptr() instead.

	local_inc(this_cpu_ptr(&counters));


* Reading the counters

Those local counters can be read from foreign CPUs to sum the count. Note that
the data seen by local_read across CPUs must be considered to be out of order
relative to other memory writes happening on the CPU that owns the data.

	long sum = 0;
	int cpu;

	for_each_online_cpu(cpu)
		sum += local_read(&per_cpu(counters, cpu));

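The summation pattern can also be modelled in userspace. The sketch below is
an illustration under stated assumptions rather than kernel code: NR_SLOTS,
slot_counters, slot_inc() and slots_sum() are hypothetical names, each array
slot stands in for one CPU's per cpu variable, and relaxed C11 atomics stand
in for local_inc() and local_read().

```c
#include <stdatomic.h>

#define NR_SLOTS 4	/* hypothetical number of "CPUs" in this model */

/* One counter per slot; statically zero-initialized. */
static atomic_long slot_counters[NR_SLOTS];

/* Must be called only by the owner of "slot" (single-writer rule). */
static void slot_inc(int slot)
{
	atomic_fetch_add_explicit(&slot_counters[slot], 1,
				  memory_order_relaxed);
}

/* May be called by anyone. The relaxed loads imply no ordering against
 * other memory writes done by the slot owners. */
static long slots_sum(void)
{
	long sum = 0;
	int slot;

	for (slot = 0; slot < NR_SLOTS; slot++)
		sum += atomic_load_explicit(&slot_counters[slot],
					    memory_order_relaxed);
	return sum;
}
```

The value returned by slots_sum() is a snapshot: owners may keep incrementing
while the loop runs, so it can already be stale when the function returns.
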
If you want to use a remote local_read to synchronize access to a resource
between CPUs, explicit smp_wmb() and smp_rmb() memory barriers must be used
on the writer and the reader CPUs respectively. This would be the case if you
use the local_t variable as a counter of bytes written in a buffer: there
should be an smp_wmb() between the buffer write and the counter increment and
also an smp_rmb() between the counter read and the buffer read.

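The barrier pairing for the byte counter case can be sketched in userspace
C11. This is an analogy under stated assumptions, not kernel code:
atomic_thread_fence() with memory_order_release and memory_order_acquire
plays the role of smp_wmb() and smp_rmb(), a relaxed atomic_long stands in
for the local_t byte counter, and BUF_SIZE, buf, bytes_written, buf_write()
and buf_read() are hypothetical names. buf_write() is meant to be called by a
single writer thread, buf_read() by any reader thread.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 256	/* hypothetical buffer size */

static char buf[BUF_SIZE];
static atomic_long bytes_written;	/* models the local_t byte counter */

/* Writer side: fill the buffer first, then publish the new byte count.
 * Returns 0 on success, -1 if the data does not fit. */
static int buf_write(const char *data, size_t len)
{
	long pos = atomic_load_explicit(&bytes_written, memory_order_relaxed);

	if (len > (size_t)(BUF_SIZE - pos))
		return -1;
	memcpy(buf + pos, data, len);
	/* Role of smp_wmb(): buffer stores before the counter increment. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&bytes_written, (long)len,
				  memory_order_relaxed);
	return 0;
}

/* Reader side: read the counter first, then only the bytes it covers.
 * Copies at most "max" bytes into "out" and returns the number copied. */
static size_t buf_read(char *out, size_t max)
{
	long avail = atomic_load_explicit(&bytes_written, memory_order_relaxed);
	size_t n = (size_t)avail < max ? (size_t)avail : max;

	/* Role of smp_rmb(): counter load before the buffer loads. */
	atomic_thread_fence(memory_order_acquire);
	memcpy(out, buf, n);
	return n;
}
```

Without the two fences, a reader could observe the incremented counter and
still read buffer bytes that the writer has not yet made visible.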

Here is a sample module which implements a basic per cpu counter using local.h.

--- BEGIN ---
/* test-local.c
 *
 * Sample module for local.h usage.
 */


#include <asm/local.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/smp.h>
#include <linux/timer.h>

static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);

static struct timer_list test_timer;

/* IPI called on each CPU. */
static void test_each(void *info)
{
	/* Increment the counter from a non preemptible context */
	printk("Increment on cpu %d\n", smp_processor_id());
	local_inc(this_cpu_ptr(&counters));

	/* This is what incrementing the variable would look like within a
	 * preemptible context (it disables preemption):
	 *
	 * local_inc(&get_cpu_var(counters));
	 * put_cpu_var(counters);
	 */
}

/* Timer callback: increment every CPU's counter, print them, then re-arm. */
static void do_test_timer(unsigned long data)
{
	int cpu;

	/* Increment the counters */
	on_each_cpu(test_each, NULL, 1);
	/* Read all the counters */
	printk("Counters read from CPU %d\n", smp_processor_id());
	for_each_online_cpu(cpu) {
		printk("Read : CPU %d, count %ld\n", cpu,
			local_read(&per_cpu(counters, cpu)));
	}
	del_timer(&test_timer);
	test_timer.expires = jiffies + 1000;
	add_timer(&test_timer);
}

static int __init test_init(void)
{
	/* initialize the timer that will increment the counter */
	init_timer(&test_timer);
	test_timer.function = do_test_timer;
	test_timer.expires = jiffies + 1;
	add_timer(&test_timer);

	return 0;
}

static void __exit test_exit(void)
{
	del_timer_sync(&test_timer);
}

module_init(test_init);
module_exit(test_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mathieu Desnoyers");
MODULE_DESCRIPTION("Local Atomic Ops");
--- END ---