
Documentation/atomic_ops.txt


Based on kernel version 4.9.

1			Semantics and Behavior of Atomic and
2			         Bitmask Operations
3	
4				  David S. Miller	 
5	
6		This document is intended to serve as a guide to Linux port
7	maintainers on how to implement atomic counter, bitops, and spinlock
8	interfaces properly.
9	
10		The atomic_t type should be defined as a signed integer and
11	the atomic_long_t type as a signed long integer.  Also, they should
12	be made opaque such that any kind of cast to a normal C integer type
13	will fail.  Something like the following should suffice:
14	
15		typedef struct { int counter; } atomic_t;
16		typedef struct { long counter; } atomic_long_t;
17	
18	Historically, counter has been declared volatile.  This is now discouraged.
19	See Documentation/volatile-considered-harmful.txt for the complete rationale.
20	
21	local_t is very similar to atomic_t. If the counter is per CPU and only
22	updated by one CPU, local_t is probably more appropriate. Please see
23	Documentation/local_ops.txt for the semantics of local_t.
24	
25	The first operations to implement for atomic_t's are the initializers and
26	plain writes.
27	
28		#define ATOMIC_INIT(i)		{ (i) }
29		#define atomic_set(v, i)	((v)->counter = (i))
30	
31	The first macro is used in definitions, such as:
32	
33	static atomic_t my_counter = ATOMIC_INIT(1);
34	
35	The initializer is atomic in that the return values of the atomic operations
36	are guaranteed to be correct reflecting the initialized value if the
37	initializer is used before runtime.  If the initializer is used at runtime, a
38	proper implicit or explicit read memory barrier is needed before reading the
39	value with atomic_read from another thread.
40	
41	As with all of the atomic_ interfaces, replace the leading "atomic_"
42	with "atomic_long_" to operate on atomic_long_t.
43	
44	The second interface can be used at runtime, as in:
45	
46		struct foo { atomic_t counter; };
47		...
48	
49		struct foo *k;
50	
51		k = kmalloc(sizeof(*k), GFP_KERNEL);
52		if (!k)
53			return -ENOMEM;
54		atomic_set(&k->counter, 0);
55	
56	The setting is atomic in that the return values of the atomic operations by
57	all threads are guaranteed to be correct reflecting either the value that has
58	been set with this operation or set with another operation.  A proper implicit
59	or explicit memory barrier is needed before the value set with the operation
60	is guaranteed to be readable with atomic_read from another thread.
61	
62	Next, we have:
63	
64		#define atomic_read(v)	((v)->counter)
65	
66	which simply reads the counter value currently visible to the calling thread.
67	The read is atomic in that the return value is guaranteed to be one of the
68	values initialized or modified with the interface operations if a proper
69	implicit or explicit memory barrier is used after possible runtime
70	initialization by any other thread and the value is modified only with the
71	interface operations.  atomic_read does not guarantee that the runtime
72	initialization by any other thread is visible yet, so the user of the
73	interface must take care of that with a proper implicit or explicit memory
74	barrier.
75	
76	*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
77	
78	Some architectures may choose to use the volatile keyword, barriers, or inline
79	assembly to guarantee some degree of immediacy for atomic_read() and
80	atomic_set().  This is not uniformly guaranteed, and may change in the future,
81	so all users of atomic_t should treat atomic_read() and atomic_set() as simple
82	C statements that may be reordered or optimized away entirely by the compiler
83	or processor, and explicitly invoke the appropriate compiler and/or memory
84	barrier for each use case.  Failure to do so will result in code that may
85	suddenly break when used with different architectures or compiler
86	optimizations, or even changes in unrelated code which changes how the
87	compiler optimizes the section accessing atomic_t variables.
88	
89	*** YOU HAVE BEEN WARNED! ***
90	
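As an illustration of the above warnings, here is a minimal sketch of
publishing a runtime-initialized atomic_t to another cpu.  The struct and
field names are hypothetical; the point is that the ordering comes from the
explicit barriers, not from atomic_set() or atomic_read() themselves:

	/* Writer, e.g. during setup; "shared", "counter" and "ready"
	 * are hypothetical names.
	 */
	atomic_set(&shared->counter, 42);
	smp_wmb();				/* order the set before the flag */
	ACCESS_ONCE(shared->ready) = 1;

	/* Reader on another cpu: */
	if (ACCESS_ONCE(shared->ready)) {
		smp_rmb();			/* pairs with the smp_wmb() above */
		val = atomic_read(&shared->counter);
	}
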
91	Properly aligned pointers, longs, ints, and chars (and unsigned
92	equivalents) may be atomically loaded from and stored to in the same
93	sense as described for atomic_read() and atomic_set().  The ACCESS_ONCE()
94	macro should be used to prevent the compiler from using optimizations
95	that might otherwise optimize accesses out of existence on the one hand,
96	or that might create unsolicited accesses on the other.
97	
98	For example consider the following code:
99	
100		while (a > 0)
101			do_something();
102	
103	If the compiler can prove that do_something() does not store to the
104	variable a, then the compiler is within its rights transforming this to
105	the following:
106	
107		tmp = a;
108	if (tmp > 0)
109			for (;;)
110				do_something();
111	
112	If you don't want the compiler to do this (and you probably don't), then
113	you should use something like the following:
114	
115	while (ACCESS_ONCE(a) > 0)
116			do_something();
117	
118	Alternatively, you could place a barrier() call in the loop.
119	
120	For another example, consider the following code:
121	
122		tmp_a = a;
123		do_something_with(tmp_a);
124		do_something_else_with(tmp_a);
125	
126	If the compiler can prove that do_something_with() does not store to the
127	variable a, then the compiler is within its rights to manufacture an
128	additional load as follows:
129	
130		tmp_a = a;
131		do_something_with(tmp_a);
132		tmp_a = a;
133		do_something_else_with(tmp_a);
134	
135	This could fatally confuse your code if it expected the same value
136	to be passed to do_something_with() and do_something_else_with().
137	
138	The compiler would be likely to manufacture this additional load if
139	do_something_with() was an inline function that made very heavy use
140	of registers: reloading from variable a could save a flush to the
141	stack and later reload.  To prevent the compiler from attacking your
142	code in this manner, write the following:
143	
144		tmp_a = ACCESS_ONCE(a);
145		do_something_with(tmp_a);
146		do_something_else_with(tmp_a);
147	
148	For a final example, consider the following code, assuming that the
149	variable a is set at boot time before the second CPU is brought online
150	and never changed later, so that memory barriers are not needed:
151	
152		if (a)
153			b = 9;
154		else
155			b = 42;
156	
157	The compiler is within its rights to manufacture an additional store
158	by transforming the above code into the following:
159	
160		b = 42;
161		if (a)
162			b = 9;
163	
164	This could come as a fatal surprise to other code running concurrently
165	that expected b to never have the value 42 if a was zero.  To prevent
166	the compiler from doing this, write something like:
167	
168		if (a)
169			ACCESS_ONCE(b) = 9;
170		else
171			ACCESS_ONCE(b) = 42;
172	
173	Don't even -think- about doing this without proper use of memory barriers,
174	locks, or atomic operations if variable a can change at runtime!
175	
176	*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***
177	
178	Now, we move onto the atomic operation interfaces typically implemented with
179	the help of assembly code.
180	
181		void atomic_add(int i, atomic_t *v);
182		void atomic_sub(int i, atomic_t *v);
183		void atomic_inc(atomic_t *v);
184		void atomic_dec(atomic_t *v);
185	
186	These four routines add and subtract integral values to/from the given
187	atomic_t value.  The first two routines pass explicit integers by
188	which to make the adjustment, whereas the latter two use an implicit
189	adjustment value of "1".
190	
191	One very important aspect of these routines is that they DO NOT
192	require any explicit memory barriers.  They need only perform the
193	atomic_t counter update in an SMP safe manner.
194	
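A typical use is a plain statistics counter, where nothing needs to be
ordered against the update; only the update itself needs to be SMP safe.
A hedged sketch:

	static atomic_t rx_packets = ATOMIC_INIT(0);
	static atomic_t rx_bytes   = ATOMIC_INIT(0);

	/* May be called concurrently on every cpu; no barriers needed. */
	static void account_rx(int len)
	{
		atomic_inc(&rx_packets);
		atomic_add(len, &rx_bytes);
	}
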
195	Next, we have:
196	
197		int atomic_inc_return(atomic_t *v);
198		int atomic_dec_return(atomic_t *v);
199	
200	These routines add 1 and subtract 1, respectively, from the given
201	atomic_t and return the new counter value after the operation is
202	performed.
203	
204	Unlike the above routines, it is required that these primitives
205	include explicit memory barriers that are performed before and after
206	the operation.  It must be done such that all memory operations before
207	and after the atomic operation calls are strongly ordered with respect
208	to the atomic operation itself.
209	
210	For example, it should behave as if a smp_mb() call existed both
211	before and after the atomic operation.
212	
213	If the atomic instructions used in an implementation provide explicit
214	memory barrier semantics which satisfy the above requirements, that is
215	fine as well.
216	
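As a hypothetical illustration of what those barriers buy you (the
obj->state, obj->users and first_user_setup() names are made up):

	obj->state = STATE_READY;	/* visible before the counter update */
	if (atomic_inc_return(&obj->users) == 1)
		first_user_setup(obj);	/* ordered after the counter update */
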
217	Let's move on:
218	
219		int atomic_add_return(int i, atomic_t *v);
220		int atomic_sub_return(int i, atomic_t *v);
221	
222	These behave just like atomic_{inc,dec}_return() except that an
223	explicit counter adjustment is given instead of the implicit "1".
224	This means that like atomic_{inc,dec}_return(), the memory barrier
225	semantics are required.
226	
227	Next:
228	
229		int atomic_inc_and_test(atomic_t *v);
230		int atomic_dec_and_test(atomic_t *v);
231	
232	These two routines increment and decrement by 1, respectively, the
233	given atomic counter.  They return a boolean indicating whether the
234	resulting counter value was zero or not.
235	
236	Again, these primitives provide explicit memory barrier semantics around
237	the atomic operation.
238	
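These are the natural building block for reference counting.  A sketch of
a typical "put" operation, reusing the struct obj reference counting
example shown later in this document:

	void obj_put(struct obj *obj)
	{
		/* The implied barriers order all prior accesses to *obj
		 * before the decrement, so the thread seeing zero really
		 * is the last one touching the object.
		 */
		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}
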
239		int atomic_sub_and_test(int i, atomic_t *v);
240	
241	This is identical to atomic_dec_and_test() except that an explicit
242	decrement is given instead of the implicit "1".  This primitive must
243	provide explicit memory barrier semantics around the operation.
244	
245		int atomic_add_negative(int i, atomic_t *v);
246	
247	The given increment is added to the given atomic counter value.  A boolean
248	is returned which indicates whether the resulting counter value is negative.
249	This primitive must provide explicit memory barrier semantics around
250	the operation.
251	
252	Then:
253	
254		int atomic_xchg(atomic_t *v, int new);
255	
256	This performs an atomic exchange operation on the atomic variable v, setting
257	the given new value.  It returns the old value that the atomic variable v had
258	just before the operation.
259	
260	atomic_xchg must provide explicit memory barriers around the operation.
261	
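For example, atomic_xchg() can be used to claim a one-shot event, so that
exactly one cpu performs the work (the names here are hypothetical):

	/* evt->triggered is an atomic_t initialized to 0. */
	if (atomic_xchg(&evt->triggered, 1) == 0)
		fire_event(evt);	/* only the first caller gets here */
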
262		int atomic_cmpxchg(atomic_t *v, int old, int new);
263	
264	This performs an atomic compare exchange operation on the atomic value v,
265	with the given old and new values. Like all atomic_xxx operations,
266	atomic_cmpxchg will only satisfy its atomicity semantics as long as all
267	other accesses of *v are performed through atomic_xxx operations.
268	
269	atomic_cmpxchg must provide explicit memory barriers around the operation,
270	although if the comparison fails then no memory ordering guarantees are
271	required.
272	
273	The semantics for atomic_cmpxchg are the same as those defined for 'cas'
274	below.
275	
276	Finally:
277	
278		int atomic_add_unless(atomic_t *v, int a, int u);
279	
280	If the atomic value v is not equal to u, this function adds a to v, and
281	returns non-zero. If v is equal to u then it returns zero. This is done as
282	an atomic operation.
283	
284	atomic_add_unless must provide explicit memory barriers around the
285	operation unless it fails (returns 0).
286	
287	atomic_inc_not_zero(v) is equivalent to atomic_add_unless(v, 1, 0).
288	
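To show how such a primitive follows from atomic_cmpxchg(), here is a
sketch of a generic implementation; architectures may of course provide
something more efficient:

	static inline int example_add_unless(atomic_t *v, int a, int u)
	{
		int c = atomic_read(v);

		while (c != u) {
			int old = atomic_cmpxchg(v, c, c + a);

			if (old == c)
				return 1;	/* added successfully */
			c = old;		/* lost a race, retry */
		}
		return 0;			/* counter was equal to u */
	}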
289	
290	If a caller requires memory barrier semantics around an atomic_t
291	operation which does not return a value, a set of interfaces are
292	defined which accomplish this:
293	
294		void smp_mb__before_atomic(void);
295		void smp_mb__after_atomic(void);
296	
297	For example, smp_mb__before_atomic() can be used like so:
298	
299		obj->dead = 1;
300		smp_mb__before_atomic();
301		atomic_dec(&obj->ref_count);
302	
303	It makes sure that all memory operations preceding the atomic_dec()
304	call are strongly ordered with respect to the atomic counter
305	operation.  In the above example, it guarantees that the assignment of
306	"1" to obj->dead will be globally visible to other cpus before the
307	atomic counter decrement.
308	
309	Without the explicit smp_mb__before_atomic() call, the
310	implementation could legally allow the atomic counter update visible
311	to other cpus before the "obj->dead = 1;" assignment.
312	
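smp_mb__after_atomic() is the mirror image: it orders the atomic update
before all memory operations which follow it.  A sketch (obj->published
is a hypothetical field):

	atomic_inc(&obj->ref_count);
	smp_mb__after_atomic();
	obj->published = 1;	/* cannot be seen before the increment */
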
313	A missing memory barrier in the cases where they are required by the
314	atomic_t implementation above can have disastrous results.  Here is
315	an example, which follows a pattern occurring frequently in the Linux
316	kernel.  It is the use of atomic counters to implement reference
317	counting, and it works such that once the counter falls to zero it can
318	be guaranteed that no other entity can be accessing the object:
319	
320	static void obj_list_add(struct obj *obj, struct list_head *head)
321	{
322		obj->active = 1;
323		list_add(&obj->list, head);
324	}
325	
326	static void obj_list_del(struct obj *obj)
327	{
328		list_del(&obj->list);
329		obj->active = 0;
330	}
331	
332	static void obj_destroy(struct obj *obj)
333	{
334		BUG_ON(obj->active);
335		kfree(obj);
336	}
337	
338	struct obj *obj_list_peek(struct list_head *head)
339	{
340		if (!list_empty(head)) {
341			struct obj *obj;
342	
343			obj = list_entry(head->next, struct obj, list);
344			atomic_inc(&obj->refcnt);
345			return obj;
346		}
347		return NULL;
348	}
349	
350	void obj_poke(void)
351	{
352		struct obj *obj;
353	
354		spin_lock(&global_list_lock);
355		obj = obj_list_peek(&global_list);
356		spin_unlock(&global_list_lock);
357	
358		if (obj) {
359			obj->ops->poke(obj);
360			if (atomic_dec_and_test(&obj->refcnt))
361				obj_destroy(obj);
362		}
363	}
364	
365	void obj_timeout(struct obj *obj)
366	{
367		spin_lock(&global_list_lock);
368		obj_list_del(obj);
369		spin_unlock(&global_list_lock);
370	
371		if (atomic_dec_and_test(&obj->refcnt))
372			obj_destroy(obj);
373	}
374	
375	(This is a simplification of the ARP queue management in the
376	 generic neighbour discovery code of the networking subsystem.  Olaf Kirch
377	 found a bug wrt. memory barriers in kfree_skb() that exposed
378	 the atomic_t memory barrier requirements quite clearly.)
379	
380	Given the above scheme, it must be the case that the obj->active
381	update done by the obj list deletion be visible to other processors
382	before the atomic counter decrement is performed.
383	
384	Otherwise, the counter could fall to zero, yet obj->active would still
385	be set, thus triggering the assertion in obj_destroy().  The error
386	sequence looks like this:
387	
388		cpu 0				cpu 1
389		obj_poke()			obj_timeout()
390		obj = obj_list_peek();
391		... gains ref to obj, refcnt=2
392						obj_list_del(obj);
393						obj->active = 0 ...
394						... visibility delayed ...
395						atomic_dec_and_test()
396						... refcnt drops to 1 ...
397		atomic_dec_and_test()
398		... refcount drops to 0 ...
399		obj_destroy()
400		BUG() triggers since obj->active
401		still seen as one
402						obj->active update visibility occurs
403	
404	With the memory barrier semantics required of the atomic_t operations
405	which return values, the above sequence of memory visibility can never
406	happen.  Specifically, in the above case the atomic_dec_and_test()
407	counter decrement would not become globally visible until the
408	obj->active update does.
409	
410	As a historical note, 32-bit Sparc used to only allow usage of
411	24-bits of its atomic_t type.  This was because it used 8 bits
412	as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
413	type instruction.  However, 32-bit Sparc has since been moved over
414	to a "hash table of spinlocks" scheme that allows the full 32-bit
415	counter to be realized.  Essentially, an array of spinlocks is
416	indexed into based upon the address of the atomic_t being operated
417	on, and that lock protects the atomic operation.  Parisc uses the
418	same scheme.
419	
420	Another note is that the atomic_t operations returning values are
421	extremely slow on an old 386.
422	
423	We will now cover the atomic bitmask operations.  You will find that
424	their SMP and memory barrier semantics are similar in shape and scope
425	to the atomic_t ops above.
426	
427	Native atomic bit operations are defined to operate on objects aligned
428	to the size of an "unsigned long" C data type, and are at least of that
429	size.  The endianness of the bits within each "unsigned long" is the
430	native endianness of the cpu.
431	
432		void set_bit(unsigned long nr, volatile unsigned long *addr);
433		void clear_bit(unsigned long nr, volatile unsigned long *addr);
434		void change_bit(unsigned long nr, volatile unsigned long *addr);
435	
436	These routines set, clear, and change, respectively, the bit number
437	indicated by "nr" in the bit mask pointed to by "addr".
438	
439	They must execute atomically, yet there are no implicit memory barrier
440	semantics required of these interfaces.
441	
442		int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
443		int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
444		int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
445	
446	Like the above, except that these routines return a boolean which
447	indicates whether the changed bit was set _BEFORE_ the atomic bit
448	operation.
449	
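For example, the returned value lets a caller make sure one-time work is
performed exactly once (INIT_DONE_BIT and state->flags are hypothetical):

	if (!test_and_set_bit(INIT_DONE_BIT, &state->flags))
		do_one_time_setup(state);	/* only the first caller */
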
450	WARNING! It is incredibly important that the value be a boolean,
451	ie. "0" or "1".  Do not try to be fancy and save a few instructions by
452	declaring the above to return "long" and just returning something like
453	"old_val & mask" because that will not work.
454	
455	For one thing, this return value gets truncated to int in many code
456	paths using these interfaces, so on 64-bit if the bit is set in the
457	upper 32-bits then testers will never see that.
458	
459	One great example of where this problem crops up are the thread_info
460	flag operations.  Routines such as test_and_set_ti_thread_flag() chop
461	the return value into an int.  There are other places where things
462	like this occur as well.
463	
464	These routines, like the atomic_t counter operations returning values,
465	must provide explicit memory barrier semantics around their execution.
466	All memory operations before the atomic bit operation call must be
467	made visible globally before the atomic bit operation is made visible.
468	Likewise, the atomic bit operation must be visible globally before any
469	subsequent memory operation is made visible.  For example:
470	
471		obj->dead = 1;
472		if (test_and_set_bit(0, &obj->flags))
473			/* ... */;
474		obj->killed = 1;
475	
476	The implementation of test_and_set_bit() must guarantee that
477	"obj->dead = 1;" is visible to cpus before the atomic memory operation
478	done by test_and_set_bit() becomes visible.  Likewise, the atomic
479	memory operation done by test_and_set_bit() must become visible before
480	"obj->killed = 1;" is visible.
481	
482	Finally there is the basic operation:
483	
484		int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);
485	
486	Which returns a boolean indicating if bit "nr" is set in the bitmask
487	pointed to by "addr".
488	
489	If explicit memory barriers are required around {set,clear}_bit() (which do
490	not return a value, and thus do not need to provide memory barrier
491	semantics), two interfaces are provided:
492	
493		void smp_mb__before_atomic(void);
494		void smp_mb__after_atomic(void);
495	
496	They are used as follows, and are akin to their atomic_t operation
497	brothers:
498	
499		/* All memory operations before this call will
500		 * be globally visible before the clear_bit().
501		 */
502		smp_mb__before_atomic();
503		clear_bit( ... );
504	
505		/* The clear_bit() will be visible before all
506		 * subsequent memory operations.
507		 */
508		 smp_mb__after_atomic();
509	
510	There are two special bitops with lock barrier semantics (acquire/release,
511	same as spinlocks). These operate in the same way as their non-_lock/_unlock
512	suffixed variants, except that they provide acquire and release semantics,
513	respectively. This means they can be used for bit_spin_trylock and
514	bit_spin_unlock type operations without specifying any more barriers.
515	
516		int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
517		void clear_bit_unlock(unsigned long nr, unsigned long *addr);
518		void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
519	
520	The __clear_bit_unlock version is non-atomic, however it still implements
521	unlock barrier semantics. This can be useful if the lock itself is protecting
522	the other bits in the word.
523	
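A sketch of using these as a tiny lock protecting the other bits of the
same word; LOCK_BIT and word are hypothetical, and real code would normally
use the bit_spin_lock() helpers built on top of these:

	while (test_and_set_bit_lock(LOCK_BIT, &word))
		cpu_relax();			/* acquire */

	/* ... the other bits of "word" may be updated freely here ... */

	clear_bit_unlock(LOCK_BIT, &word);	/* release */
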
524	Finally, there are non-atomic versions of the bitmask operations
525	provided.  They are used in contexts where some other higher-level SMP
526	locking scheme is being used to protect the bitmask, and thus less
527	expensive non-atomic operations may be used in the implementation.
528	They have names similar to the above bitmask operation interfaces,
529	except that two underscores are prefixed to the interface name.
530	
531		void __set_bit(unsigned long nr, volatile unsigned long *addr);
532		void __clear_bit(unsigned long nr, volatile unsigned long *addr);
533		void __change_bit(unsigned long nr, volatile unsigned long *addr);
534		int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
535		int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
536		int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
537	
538	These non-atomic variants also do not require any special memory
539	barrier semantics.
540	
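For example, when the bitmap is already serialized by a spinlock, the
cheaper variants are sufficient (the map structure is hypothetical):

	spin_lock(&map->lock);
	__set_bit(new_nr, map->bits);
	__clear_bit(old_nr, map->bits);
	spin_unlock(&map->lock);
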
541	The routines xchg() and cmpxchg() must provide the same exact
542	memory-barrier semantics as the atomic and bit operations returning
543	values.
544	
545	Note: If someone wants to use xchg(), cmpxchg() and their variants,
546	linux/atomic.h should be included rather than asm/cmpxchg.h, unless
547	the code is in arch/* and can take care of itself.
548	
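For instance, with the proper header included, xchg() can atomically hand
over a word-sized state variable (ctx->state and STATE_CLOSED are
hypothetical):

	#include <linux/atomic.h>

	old_state = xchg(&ctx->state, STATE_CLOSED);
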
549	Spinlocks and rwlocks have memory barrier expectations as well.
550	The rule to follow is simple:
551	
552	1) When acquiring a lock, the implementation must make it globally
553	   visible before any subsequent memory operation.
554	
555	2) When releasing a lock, the implementation must make it such that
556	   all previous memory operations are globally visible before the
557	   lock release.
558	
559	Which finally brings us to _atomic_dec_and_lock().  There is an
560	architecture-neutral version implemented in lib/dec_and_lock.c,
561	but most platforms will wish to optimize this in assembler.
562	
563		int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
564	
565	Atomically decrement the given counter; if the counter will drop to
566	zero, acquire the given spinlock first and then perform the decrement
567	of the counter to zero.  If it does not drop to zero, do nothing
568	with the spinlock.
569	
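A typical caller, using the names from the reference counting example
earlier in this document, drops a reference and, only when the count hits
zero, tears the object down with the lock already held:

	if (_atomic_dec_and_lock(&obj->refcnt, &global_list_lock)) {
		list_del(&obj->list);
		spin_unlock(&global_list_lock);
		kfree(obj);
	}
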
570	It is actually pretty simple to get the memory barrier correct.
571	Simply satisfy the spinlock grab requirements, which is to make
572	sure the spinlock operation is globally visible before any
573	subsequent memory operation.
574	
575	We can demonstrate this operation more clearly if we define
576	an abstract atomic operation:
577	
578		long cas(long *mem, long old, long new);
579	
580	"cas" stands for "compare and swap".  It atomically:
581	
582	1) Compares "old" with the value currently at "mem".
583	2) If they are equal, "new" is written to "mem".
584	3) Regardless, the current value at "mem" is returned.
585	
586	As an example usage, here is what an atomic counter update
587	might look like:
588	
589	void example_atomic_inc(long *counter)
590	{
591		long old, new, ret;
592	
593		while (1) {
594			old = *counter;
595			new = old + 1;
596	
597			ret = cas(counter, old, new);
598			if (ret == old)
599				break;
600		}
601	}
602	
603	Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():
604	
605	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
606	{
607		long old, new, ret;
608		int went_to_zero;
609	
610		went_to_zero = 0;
611		while (1) {
612			old = atomic_read(atomic);
613			new = old - 1;
614			if (new == 0) {
615				went_to_zero = 1;
616				spin_lock(lock);
617			}
618			ret = cas(atomic, old, new);
619			if (ret == old)
620				break;
621			if (went_to_zero) {
622				spin_unlock(lock);
623				went_to_zero = 0;
624			}
625		}
626	
627		return went_to_zero;
628	}
629	
630	Now, as far as memory barriers go, as long as spin_lock()
631	strictly orders all subsequent memory operations (including
632	the cas()) with respect to itself, things will be fine.
633	
634	Said another way, _atomic_dec_and_lock() must guarantee that
635	a counter dropping to zero is never made visible before the
636	spinlock being acquired.
637	
638	Note that this also means that for the case where the counter
639	is not dropping to zero, there are no memory ordering
640	requirements.