		Semantics and Behavior of Atomic and
			 Bitmask Operations

			  David S. Miller

	This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.

	The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer.  Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail.  Something like the following should suffice:

	typedef struct { int counter; } atomic_t;
	typedef struct { long counter; } atomic_long_t;

Historically, counter has been declared volatile.  This is now discouraged.
See Documentation/volatile-considered-harmful.txt for the complete rationale.

local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate. Please see
Documentation/local_ops.txt for the semantics of local_t.
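
As an illustration only (this sketch is not part of the original text,
and the counter name is made up), a per-CPU event counter might use
local_t as follows; Documentation/local_ops.txt is the authoritative
reference for this interface:

	#include <linux/percpu.h>
	#include <asm/local.h>

	/* One counter per CPU; only the owning CPU ever updates it. */
	static DEFINE_PER_CPU(local_t, hypothetical_event_count);

	static void count_event(void)
	{
		/* get_cpu_var() disables preemption, so we stay on this CPU. */
		local_inc(&get_cpu_var(hypothetical_event_count));
		put_cpu_var(hypothetical_event_count);
	}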

The first operations to implement for atomic_t's are the initializers and
plain reads.

	#define ATOMIC_INIT(i)		{ (i) }
	#define atomic_set(v, i)	((v)->counter = (i))

The first macro is used in definitions, such as:

static atomic_t my_counter = ATOMIC_INIT(1);

The initializer is atomic in that the return values of the atomic operations
are guaranteed to correctly reflect the initialized value if the
initializer is used before runtime.  If the initializer is used at runtime, a
proper implicit or explicit read memory barrier is needed before reading the
value with atomic_read from another thread.

As with all of the atomic_ interfaces, replace the leading "atomic_"
with "atomic_long_" to operate on atomic_long_t.
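
For instance, a minimal sketch (not from the original document; the
counter and function names are invented) of a long-sized counter would
look like:

	#include <linux/atomic.h>

	static atomic_long_t total_bytes = ATOMIC_LONG_INIT(0);

	static void account_bytes(long n)
	{
		atomic_long_add(n, &total_bytes);	/* no barrier implied */
	}

	static long read_total(void)
	{
		return atomic_long_read(&total_bytes);
	}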

The second interface can be used at runtime, as in:

	struct foo { atomic_t counter; };
	...

	struct foo *k;

	k = kmalloc(sizeof(*k), GFP_KERNEL);
	if (!k)
		return -ENOMEM;
	atomic_set(&k->counter, 0);

The setting is atomic in that the return values of the atomic operations by
all threads are guaranteed to correctly reflect either the value that has
been set with this operation or set with another operation.  A proper implicit
or explicit memory barrier is needed before the value set with the operation
is guaranteed to be readable with atomic_read from another thread.

Next, we have:

	#define atomic_read(v)	((v)->counter)

which simply reads the counter value currently visible to the calling thread.
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations if a proper
implicit or explicit memory barrier is used after possible runtime
initialization by any other thread and the value is modified only with the
interface operations.  atomic_read does not guarantee that the runtime
initialization by any other thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.
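
To make this concrete, here is a hedged sketch (not from the original
text) of publishing a runtime-initialized counter to another thread.
The smp_mb() calls stand in for whatever implicit barrier (a lock, for
example) the surrounding code may already provide, and the structure
and flag names are invented for illustration:

	struct widget {
		atomic_t refcnt;
		int ready;		/* set once initialization is done */
	};

	/* Thread A: initialize, then publish. */
	void widget_publish(struct widget *w)
	{
		atomic_set(&w->refcnt, 1);
		smp_mb();		/* order the init before the "ready" store */
		ACCESS_ONCE(w->ready) = 1;
	}

	/* Thread B: only trust the counter once the publish flag is seen. */
	int widget_peek_refcnt(struct widget *w)
	{
		if (!ACCESS_ONCE(w->ready))
			return -1;
		smp_mb();		/* order the flag load before atomic_read() */
		return atomic_read(&w->refcnt);
	}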

*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***

Some architectures may choose to use the volatile keyword, barriers, or inline
assembly to guarantee some degree of immediacy for atomic_read() and
atomic_set().  This is not uniformly guaranteed, and may change in the future,
so all users of atomic_t should treat atomic_read() and atomic_set() as simple
C statements that may be reordered or optimized away entirely by the compiler
or processor, and explicitly invoke the appropriate compiler and/or memory
barrier for each use case.  Failure to do so will result in code that may
suddenly break when used with different architectures or compiler
optimizations, or even changes in unrelated code which change how the
compiler optimizes the section accessing atomic_t variables.

*** YOU HAVE BEEN WARNED! ***

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set().  The ACCESS_ONCE()
macro should be used to prevent the compiler from using optimizations
that might otherwise optimize accesses out of existence on the one hand,
or that might create unsolicited accesses on the other.

For example consider the following code:

	while (a > 0)
		do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following:

	tmp = a;
	if (tmp > 0)
		for (;;)
			do_something();

If you don't want the compiler to do this (and you probably don't), then
you should use something like the following:

	while (ACCESS_ONCE(a) > 0)
		do_something();

Alternatively, you could place a barrier() call in the loop.
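
The barrier() form of that loop might look like the following sketch
(not in the original text); barrier() is a compiler barrier, so the
compiler must reload "a" on each pass:

	while (a > 0) {
		do_something();
		barrier();	/* force the compiler to reload "a" */
	}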

For another example, consider the following code:

	tmp_a = a;
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows:

	tmp_a = a;
	do_something_with(tmp_a);
	tmp_a = a;
	do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload.  To prevent the compiler from attacking your
code in this manner, write the following:

	tmp_a = ACCESS_ONCE(a);
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed:

	if (a)
		b = 9;
	else
		b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following:

	b = 42;
	if (a)
		b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero.  To prevent
the compiler from doing this, write something like:

	if (a)
		ACCESS_ONCE(b) = 9;
	else
		ACCESS_ONCE(b) = 42;

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***

Now, we move on to the atomic operation interfaces typically implemented with
the help of assembly code.

	void atomic_add(int i, atomic_t *v);
	void atomic_sub(int i, atomic_t *v);
	void atomic_inc(atomic_t *v);
	void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
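
A typical use is a statistics counter where only the count itself must
be SMP safe; this short sketch (the counter name is invented here)
needs no extra barriers:

	static atomic_t rx_drops = ATOMIC_INIT(0);

	static void note_rx_drop(void)
	{
		atomic_inc(&rx_drops);	/* SMP-safe update, no ordering implied */
	}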

Next, we have:

	int atomic_inc_return(atomic_t *v);
	int atomic_dec_return(atomic_t *v);

These routines add 1 and subtract 1, respectively, from the given
atomic_t and return the new counter value after the operation is
performed.

Unlike the above routines, it is required that explicit memory
barriers are performed before and after the operation.  It must be
done such that all memory operations before and after the atomic
operation calls are strongly ordered with respect to the atomic
operation itself.

For example, it should behave as if an smp_mb() call existed both
before and after the atomic operation.

If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.
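
One common pattern built on these routines is handing out unique,
monotonically increasing identifiers.  The sketch below is illustrative
only (the names are made up) and relies on the full-barrier semantics
described above:

	static atomic_t next_id = ATOMIC_INIT(0);

	static int allocate_id(void)
	{
		/* Full barriers before and after the increment, per the rules above. */
		return atomic_inc_return(&next_id);
	}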

Let's move on:

	int atomic_add_return(int i, atomic_t *v);
	int atomic_sub_return(int i, atomic_t *v);

These behave just like atomic_{inc,dec}_return() except that an
explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.

Next:

	int atomic_inc_and_test(atomic_t *v);
	int atomic_dec_and_test(atomic_t *v);

These two routines increment and decrement by 1, respectively, the
given atomic counter.  They return a boolean indicating whether the
resulting counter value was zero or not.

They require explicit memory barrier semantics around the operation as
above.
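
The classic consumer of atomic_dec_and_test() is a reference-count
"put" operation.  A minimal sketch (echoing the struct obj used in the
larger example later in this document) looks like:

	#include <linux/atomic.h>
	#include <linux/slab.h>

	struct obj {
		atomic_t refcnt;
		/* ... */
	};

	void obj_put(struct obj *obj)
	{
		/* Free only when the last reference is dropped. */
		if (atomic_dec_and_test(&obj->refcnt))
			kfree(obj);
	}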

	int atomic_sub_and_test(int i, atomic_t *v);

This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1".  It requires explicit
memory barrier semantics around the operation.

	int atomic_add_negative(int i, atomic_t *v);

The given increment is added to the given atomic counter value.  A
boolean is returned which indicates whether the resulting counter value
is negative.  It requires explicit memory barrier semantics around the
operation.

Then:

	int atomic_xchg(atomic_t *v, int new);

This performs an atomic exchange operation on the atomic variable v, setting
the given new value.  It returns the old value that the atomic variable v had
just before the operation.

atomic_xchg requires explicit memory barriers around the operation.
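
One way atomic_xchg() is commonly used is to claim a one-shot action
exactly once; the flag and function names below are hypothetical:

	#include <linux/atomic.h>
	#include <linux/kernel.h>

	static atomic_t warned = ATOMIC_INIT(0);

	void warn_once(void)
	{
		/* Only the first caller observes the old value 0 and wins. */
		if (atomic_xchg(&warned, 1) == 0)
			printk(KERN_WARNING "something went wrong\n");
	}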

	int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of *v are performed through atomic_xxx operations.

atomic_cmpxchg requires explicit memory barriers around the operation.

The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.

Finally:

	int atomic_add_unless(atomic_t *v, int a, int u);

If the atomic value v is not equal to u, this function adds a to v, and
returns non-zero. If v is equal to u then it returns zero. This is done as
an atomic operation.

atomic_add_unless requires explicit memory barriers around the operation
unless it fails (returns 0).

atomic_inc_not_zero(v) is equivalent to atomic_add_unless(v, 1, 0).
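
For illustration only (this is a sketch, not the kernel's actual
implementation), atomic_add_unless() can be understood as a cmpxchg
loop along these lines:

	static int example_add_unless(atomic_t *v, int a, int u)
	{
		int c = atomic_read(v);

		while (c != u) {
			int old = atomic_cmpxchg(v, c, c + a);

			if (old == c)
				return 1;	/* we performed the add */
			c = old;		/* someone else changed v; retry */
		}
		return 0;			/* v was equal to u; nothing done */
	}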


If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this:

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

For example, smp_mb__before_atomic() can be used like so:

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation.  In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update to become
visible to other cpus before the "obj->dead = 1;" assignment.
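
smp_mb__after_atomic() is the mirror image: it orders the barrier-less
atomic update before all subsequent memory operations.  A hedged sketch
(the "published" field here is invented for illustration) might be:

	atomic_inc(&obj->ref_count);
	smp_mb__after_atomic();
	obj->published = 1;	/* the increment is ordered before this store */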

A missing memory barrier in the cases where one is required by the
atomic_t implementation above can have disastrous results.  Here is
an example, which follows a pattern occurring frequently in the Linux
kernel.  It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object:

static void obj_list_add(struct obj *obj, struct list_head *head)
{
	obj->active = 1;
	list_add(&obj->list, head);
}

static void obj_list_del(struct obj *obj)
{
	list_del(&obj->list);
	obj->active = 0;
}

static void obj_destroy(struct obj *obj)
{
	BUG_ON(obj->active);
	kfree(obj);
}

struct obj *obj_list_peek(struct list_head *head)
{
	if (!list_empty(head)) {
		struct obj *obj;

		obj = list_entry(head->next, struct obj, list);
		atomic_inc(&obj->refcnt);
		return obj;
	}
	return NULL;
}

void obj_poke(void)
{
	struct obj *obj;

	spin_lock(&global_list_lock);
	obj = obj_list_peek(&global_list);
	spin_unlock(&global_list_lock);

	if (obj) {
		obj->ops->poke(obj);
		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}
}

void obj_timeout(struct obj *obj)
{
	spin_lock(&global_list_lock);
	obj_list_del(obj);
	spin_unlock(&global_list_lock);

	if (atomic_dec_and_test(&obj->refcnt))
		obj_destroy(obj);
}

(This is a simplification of the ARP queue management in the
 generic neighbour discovery code of the networking subsystem.  Olaf Kirch
 found a bug wrt. memory barriers in kfree_skb() that exposed
 the atomic_t memory barrier requirements quite clearly.)

Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
before the atomic counter decrement is performed.

Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy().  The error
sequence looks like this:

	cpu 0				cpu 1
	obj_poke()			obj_timeout()
	obj = obj_list_peek();
	... gains ref to obj, refcnt=2
					obj_list_del(obj);
					obj->active = 0 ...
					... visibility delayed ...
					atomic_dec_and_test()
					... refcnt drops to 1 ...
	atomic_dec_and_test()
	... refcnt drops to 0 ...
	obj_destroy()
	BUG() triggers since obj->active
	still seen as one
					obj->active update visibility occurs

With the memory barrier semantics required of the atomic_t operations
which return values, the above sequence of memory visibility can never
happen.  Specifically, in the above case the atomic_dec_and_test()
counter decrement would not become globally visible until the
obj->active update does.

As a historical note, 32-bit Sparc used to only allow usage of
24 bits of its atomic_t type.  This was because it used 8 bits
as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
type instruction.  However, 32-bit Sparc has since been moved over
to a "hash table of spinlocks" scheme that allows the full 32-bit
counter to be realized.  Essentially, an array of spinlocks is
indexed into based upon the address of the atomic_t being operated
on, and that lock protects the atomic operation.  Parisc uses the
same scheme.

Another note is that the atomic_t operations returning values are
extremely slow on an old 386.

We will now cover the atomic bitmask operations.  You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.

Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size.  The endianness of the bits within each "unsigned long" is the
native endianness of the cpu.

	void set_bit(unsigned long nr, volatile unsigned long *addr);
	void clear_bit(unsigned long nr, volatile unsigned long *addr);
	void change_bit(unsigned long nr, volatile unsigned long *addr);

These routines set, clear, and change, respectively, the bit number
indicated by "nr" in the bit mask pointed to by "addr".

They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces.
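
As a usage sketch (not from the original text; the bitmap and function
names are invented), a driver-private bitmap of, say, 64 flags might be
manipulated like this.  DECLARE_BITMAP() simply sizes an unsigned long
array appropriately:

	#include <linux/bitops.h>

	#define MY_NR_FLAGS	64
	static DECLARE_BITMAP(my_flags, MY_NR_FLAGS);

	void mark_busy(int nr)
	{
		set_bit(nr, my_flags);		/* atomic, no barrier implied */
	}

	void mark_idle(int nr)
	{
		clear_bit(nr, my_flags);	/* atomic, no barrier implied */
	}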

	int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.

WARNING! It is incredibly important that the value be a boolean,
ie. "0" or "1".  Do not try to be fancy and save a few instructions by
declaring the above to return "long" and just returning something like
"old_val & mask" because that will not work.

For one thing, this return value gets truncated to int in many code
paths using these interfaces, so on 64-bit if the bit is set in the
upper 32-bits then testers will never see that.

One great example of where this problem crops up is the thread_info
flag operations.  Routines such as test_and_set_ti_thread_flag() chop
the return value into an int.  There are other places where things
like this occur as well.

These routines, like the atomic_t counter operations returning values,
require explicit memory barrier semantics around their execution.  All
memory operations before the atomic bit operation call must be made
visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible.  For example:

	obj->dead = 1;
	if (test_and_set_bit(0, &obj->flags))
		/* ... */;
	obj->killed = 1;

The implementation of test_and_set_bit() must guarantee that
"obj->dead = 1;" is visible to cpus before the atomic memory operation
done by test_and_set_bit() becomes visible.  Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.

Finally there is the basic operation:

	int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);

Which returns a boolean indicating if bit "nr" is set in the bitmask
pointed to by "addr".

If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided:

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

They are used as follows, and are akin to their atomic_t operation
brothers:

	/* All memory operations before this call will
	 * be globally visible before the clear_bit().
	 */
	smp_mb__before_atomic();
	clear_bit( ... );

	/* The clear_bit() will be visible before all
	 * subsequent memory operations.
	 */
	smp_mb__after_atomic();

There are two special bitops with lock barrier semantics (acquire/release,
same as spinlocks). These operate in the same way as their non-_lock/unlock
postfixed variants, except that they provide acquire/release semantics,
respectively. This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers.

	int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
	void clear_bit_unlock(unsigned long nr, unsigned long *addr);
	void __clear_bit_unlock(unsigned long nr, unsigned long *addr);

The __clear_bit_unlock version is non-atomic, however it still implements
unlock barrier semantics. This can be useful if the lock itself is protecting
the other bits in the word.
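
As a rough sketch of the kind of usage intended here (simplified and
illustrative only; the real helpers live in include/linux/bit_spinlock.h),
a single bit can serve as a tiny lock protecting the other bits of the
same word:

	/* Bit 0 of *word is the lock bit; the other bits are the data. */

	static int my_bit_trylock(unsigned long *word)
	{
		/* Acquire semantics: later accesses cannot move above this. */
		return !test_and_set_bit_lock(0, word);
	}

	static void my_bit_unlock(unsigned long *word)
	{
		/* Release semantics: earlier accesses cannot move below this. */
		clear_bit_unlock(0, word);
	}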

Finally, there are non-atomic versions of the bitmask operations
provided.  They are used in contexts where some other higher-level SMP
locking scheme is being used to protect the bitmask, and thus less
expensive non-atomic operations may be used in the implementation.
They have names similar to the above bitmask operation interfaces,
except that two underscores are prefixed to the interface name.

	void __set_bit(unsigned long nr, volatile unsigned long *addr);
	void __clear_bit(unsigned long nr, volatile unsigned long *addr);
	void __change_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

These non-atomic variants also do not require any special memory
barrier semantics.
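
For example (a sketch with a made-up lock and bitmap), when a spinlock
already serializes all access to the bitmap, the cheaper variants are
sufficient:

	spin_lock(&map_lock);		/* map_lock protects "map" entirely */
	if (!__test_and_set_bit(nr, map))
		allocated++;
	spin_unlock(&map_lock);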

The routines xchg() and cmpxchg() need the same exact memory barriers
as the atomic and bit operations returning values.
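
For instance, cmpxchg() on an ordinary word can be used to claim a slot
exactly once, with the same full-barrier expectations; the names here
are illustrative only:

	static unsigned long owner;	/* 0 means unclaimed */

	int claim_slot(unsigned long me)
	{
		/* Succeeds only for the caller that observes the old value 0. */
		return cmpxchg(&owner, 0UL, me) == 0UL;
	}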

Spinlocks and rwlocks have memory barrier expectations as well.
The rule to follow is simple:

1) When acquiring a lock, the implementation must make it globally
   visible before any subsequent memory operation.

2) When releasing a lock, the implementation must make it such that
   all previous memory operations are globally visible before the
   lock release.

Which finally brings us to _atomic_dec_and_lock().  There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler.

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);

Atomically decrement the given counter, and if it will drop to zero
atomically acquire the given spinlock and perform the decrement
of the counter to zero.  If it does not drop to zero, do nothing
with the spinlock.

It is actually pretty simple to get the memory barrier correct.
Simply satisfy the spinlock grab requirements, which is to make
sure the spinlock operation is globally visible before any
subsequent memory operation.

We can demonstrate this operation more clearly if we define
an abstract atomic operation:

	long cas(long *mem, long old, long new);

"cas" stands for "compare and swap".  It atomically:

1) Compares "old" with the value currently at "mem".
2) If they are equal, "new" is written to "mem".
3) Regardless, the current value at "mem" is returned.

As an example usage, here is what an atomic counter update
might look like:

void example_atomic_inc(long *counter)
{
	long old, new, ret;

	while (1) {
		old = *counter;
		new = old + 1;

		ret = cas(counter, old, new);
		if (ret == old)
			break;
	}
}

Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():

int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
	long old, new, ret;
	int went_to_zero;

	went_to_zero = 0;
	while (1) {
		old = atomic_read(atomic);
		new = old - 1;
		if (new == 0) {
			went_to_zero = 1;
			spin_lock(lock);
		}
		ret = cas(atomic, old, new);
		if (ret == old)
			break;
		if (went_to_zero) {
			spin_unlock(lock);
			went_to_zero = 0;
		}
	}

	return went_to_zero;
}

Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
the cas()) with respect to itself, things will be fine.

Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock being acquired.

Note that this also means that for the case where the counter
is not dropping to zero, there are no memory ordering
requirements.
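
To close with a usage sketch (not part of the original text): callers
normally reach this helper through the atomic_dec_and_lock() wrapper,
and the common pattern of dropping the last reference to an object on
a locked list, reusing the struct obj and global_list_lock from the
example earlier in this document, looks roughly like:

	void obj_put_locked_list(struct obj *obj)
	{
		/* Returns true, with the lock held, only when the count hit zero. */
		if (atomic_dec_and_lock(&obj->refcnt, &global_list_lock)) {
			list_del(&obj->list);
			spin_unlock(&global_list_lock);
			kfree(obj);
		}
	}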