About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Documentation / atomic_ops.txt

Custom Search

Based on kernel version 4.1. Page generated on 2015-06-28 12:07 EST.

1			Semantics and Behavior of Atomic and
2			         Bitmask Operations
4				  David S. Miller	 
6		This document is intended to serve as a guide to Linux port
7	maintainers on how to implement atomic counter, bitops, and spinlock
8	interfaces properly.
10		The atomic_t type should be defined as a signed integer and
11	the atomic_long_t type as a signed long integer.  Also, they should
12	be made opaque such that any kind of cast to a normal C integer type
13	will fail.  Something like the following should suffice:
15		typedef struct { int counter; } atomic_t;
16		typedef struct { long counter; } atomic_long_t;
18	Historically, counter has been declared volatile.  This is now discouraged.
19	See Documentation/volatile-considered-harmful.txt for the complete rationale.
21	local_t is very similar to atomic_t. If the counter is per CPU and only
22	updated by one CPU, local_t is probably more appropriate. Please see
23	Documentation/local_ops.txt for the semantics of local_t.
25	The first operations to implement for atomic_t's are the initializers and
26	plain reads.
28		#define ATOMIC_INIT(i)		{ (i) }
29		#define atomic_set(v, i)	((v)->counter = (i))
31	The first macro is used in definitions, such as:
33	static atomic_t my_counter = ATOMIC_INIT(1);
35	The initializer is atomic in that the return values of the atomic operations
36	are guaranteed to be correct reflecting the initialized value if the
37	initializer is used before runtime.  If the initializer is used at runtime, a
38	proper implicit or explicit read memory barrier is needed before reading the
39	value with atomic_read from another thread.
41	As with all of the atomic_ interfaces, replace the leading "atomic_"
42	with "atomic_long_" to operate on atomic_long_t.
44	The second interface can be used at runtime, as in:
46		struct foo { atomic_t counter; };
47		...
49		struct foo *k;
51		k = kmalloc(sizeof(*k), GFP_KERNEL);
52		if (!k)
53			return -ENOMEM;
54		atomic_set(&k->counter, 0);
56	The setting is atomic in that the return values of the atomic operations by
57	all threads are guaranteed to be correct reflecting either the value that has
58	been set with this operation or set with another operation.  A proper implicit
59	or explicit memory barrier is needed before the value set with the operation
60	is guaranteed to be readable with atomic_read from another thread.
62	Next, we have:
64		#define atomic_read(v)	((v)->counter)
66	which simply reads the counter value currently visible to the calling thread.
67	The read is atomic in that the return value is guaranteed to be one of the
68	values initialized or modified with the interface operations if a proper
69	implicit or explicit memory barrier is used after possible runtime
70	initialization by any other thread and the value is modified only with the
71	interface operations.  atomic_read does not guarantee that the runtime
72	initialization by any other thread is visible yet, so the user of the
73	interface must take care of that with a proper implicit or explicit memory
74	barrier.
76	*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
78	Some architectures may choose to use the volatile keyword, barriers, or inline
79	assembly to guarantee some degree of immediacy for atomic_read() and
80	atomic_set().  This is not uniformly guaranteed, and may change in the future,
81	so all users of atomic_t should treat atomic_read() and atomic_set() as simple
82	C statements that may be reordered or optimized away entirely by the compiler
83	or processor, and explicitly invoke the appropriate compiler and/or memory
84	barrier for each use case.  Failure to do so will result in code that may
85	suddenly break when used with different architectures or compiler
86	optimizations, or even changes in unrelated code which changes how the
87	compiler optimizes the section accessing atomic_t variables.
91	Properly aligned pointers, longs, ints, and chars (and unsigned
92	equivalents) may be atomically loaded from and stored to in the same
93	sense as described for atomic_read() and atomic_set().  The ACCESS_ONCE()
94	macro should be used to prevent the compiler from using optimizations
95	that might otherwise optimize accesses out of existence on the one hand,
96	or that might create unsolicited accesses on the other.
98	For example consider the following code:
100		while (a > 0)
101			do_something();
103	If the compiler can prove that do_something() does not store to the
104	variable a, then the compiler is within its rights transforming this to
105	the following:
107		tmp = a;
108		if (a > 0)
109			for (;;)
110				do_something();
112	If you don't want the compiler to do this (and you probably don't), then
113	you should use something like the following:
115		while (ACCESS_ONCE(a) < 0)
116			do_something();
118	Alternatively, you could place a barrier() call in the loop.
120	For another example, consider the following code:
122		tmp_a = a;
123		do_something_with(tmp_a);
124		do_something_else_with(tmp_a);
126	If the compiler can prove that do_something_with() does not store to the
127	variable a, then the compiler is within its rights to manufacture an
128	additional load as follows:
130		tmp_a = a;
131		do_something_with(tmp_a);
132		tmp_a = a;
133		do_something_else_with(tmp_a);
135	This could fatally confuse your code if it expected the same value
136	to be passed to do_something_with() and do_something_else_with().
138	The compiler would be likely to manufacture this additional load if
139	do_something_with() was an inline function that made very heavy use
140	of registers: reloading from variable a could save a flush to the
141	stack and later reload.  To prevent the compiler from attacking your
142	code in this manner, write the following:
144		tmp_a = ACCESS_ONCE(a);
145		do_something_with(tmp_a);
146		do_something_else_with(tmp_a);
148	For a final example, consider the following code, assuming that the
149	variable a is set at boot time before the second CPU is brought online
150	and never changed later, so that memory barriers are not needed:
152		if (a)
153			b = 9;
154		else
155			b = 42;
157	The compiler is within its rights to manufacture an additional store
158	by transforming the above code into the following:
160		b = 42;
161		if (a)
162			b = 9;
164	This could come as a fatal surprise to other code running concurrently
165	that expected b to never have the value 42 if a was zero.  To prevent
166	the compiler from doing this, write something like:
168		if (a)
169			ACCESS_ONCE(b) = 9;
170		else
171			ACCESS_ONCE(b) = 42;
173	Don't even -think- about doing this without proper use of memory barriers,
174	locks, or atomic operations if variable a can change at runtime!
178	Now, we move onto the atomic operation interfaces typically implemented with
179	the help of assembly code.
181		void atomic_add(int i, atomic_t *v);
182		void atomic_sub(int i, atomic_t *v);
183		void atomic_inc(atomic_t *v);
184		void atomic_dec(atomic_t *v);
186	These four routines add and subtract integral values to/from the given
187	atomic_t value.  The first two routines pass explicit integers by
188	which to make the adjustment, whereas the latter two use an implicit
189	adjustment value of "1".
191	One very important aspect of these two routines is that they DO NOT
192	require any explicit memory barriers.  They need only perform the
193	atomic_t counter update in an SMP safe manner.
195	Next, we have:
197		int atomic_inc_return(atomic_t *v);
198		int atomic_dec_return(atomic_t *v);
200	These routines add 1 and subtract 1, respectively, from the given
201	atomic_t and return the new counter value after the operation is
202	performed.
204	Unlike the above routines, it is required that these primitives
205	include explicit memory barriers that are performed before and after
206	the operation.  It must be done such that all memory operations before
207	and after the atomic operation calls are strongly ordered with respect
208	to the atomic operation itself.
210	For example, it should behave as if a smp_mb() call existed both
211	before and after the atomic operation.
213	If the atomic instructions used in an implementation provide explicit
214	memory barrier semantics which satisfy the above requirements, that is
215	fine as well.
217	Let's move on:
219		int atomic_add_return(int i, atomic_t *v);
220		int atomic_sub_return(int i, atomic_t *v);
222	These behave just like atomic_{inc,dec}_return() except that an
223	explicit counter adjustment is given instead of the implicit "1".
224	This means that like atomic_{inc,dec}_return(), the memory barrier
225	semantics are required.
227	Next:
229		int atomic_inc_and_test(atomic_t *v);
230		int atomic_dec_and_test(atomic_t *v);
232	These two routines increment and decrement by 1, respectively, the
233	given atomic counter.  They return a boolean indicating whether the
234	resulting counter value was zero or not.
236	Again, these primitives provide explicit memory barrier semantics around
237	the atomic operation.
239		int atomic_sub_and_test(int i, atomic_t *v);
241	This is identical to atomic_dec_and_test() except that an explicit
242	decrement is given instead of the implicit "1".  This primitive must
243	provide explicit memory barrier semantics around the operation.
245		int atomic_add_negative(int i, atomic_t *v);
247	The given increment is added to the given atomic counter value.  A boolean
248	is return which indicates whether the resulting counter value is negative.
249	This primitive must provide explicit memory barrier semantics around
250	the operation.
252	Then:
254		int atomic_xchg(atomic_t *v, int new);
256	This performs an atomic exchange operation on the atomic variable v, setting
257	the given new value.  It returns the old value that the atomic variable v had
258	just before the operation.
260	atomic_xchg must provide explicit memory barriers around the operation.
262		int atomic_cmpxchg(atomic_t *v, int old, int new);
264	This performs an atomic compare exchange operation on the atomic value v,
265	with the given old and new values. Like all atomic_xxx operations,
266	atomic_cmpxchg will only satisfy its atomicity semantics as long as all
267	other accesses of *v are performed through atomic_xxx operations.
269	atomic_cmpxchg must provide explicit memory barriers around the operation.
271	The semantics for atomic_cmpxchg are the same as those defined for 'cas'
272	below.
274	Finally:
276		int atomic_add_unless(atomic_t *v, int a, int u);
278	If the atomic value v is not equal to u, this function adds a to v, and
279	returns non zero. If v is equal to u then it returns zero. This is done as
280	an atomic operation.
282	atomic_add_unless must provide explicit memory barriers around the
283	operation unless it fails (returns 0).
285	atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
288	If a caller requires memory barrier semantics around an atomic_t
289	operation which does not return a value, a set of interfaces are
290	defined which accomplish this:
292		void smp_mb__before_atomic(void);
293		void smp_mb__after_atomic(void);
295	For example, smp_mb__before_atomic() can be used like so:
297		obj->dead = 1;
298		smp_mb__before_atomic();
299		atomic_dec(&obj->ref_count);
301	It makes sure that all memory operations preceding the atomic_dec()
302	call are strongly ordered with respect to the atomic counter
303	operation.  In the above example, it guarantees that the assignment of
304	"1" to obj->dead will be globally visible to other cpus before the
305	atomic counter decrement.
307	Without the explicit smp_mb__before_atomic() call, the
308	implementation could legally allow the atomic counter update visible
309	to other cpus before the "obj->dead = 1;" assignment.
311	A missing memory barrier in the cases where they are required by the
312	atomic_t implementation above can have disastrous results.  Here is
313	an example, which follows a pattern occurring frequently in the Linux
314	kernel.  It is the use of atomic counters to implement reference
315	counting, and it works such that once the counter falls to zero it can
316	be guaranteed that no other entity can be accessing the object:
318	static void obj_list_add(struct obj *obj, struct list_head *head)
319	{
320		obj->active = 1;
321		list_add(&obj->list, head);
322	}
324	static void obj_list_del(struct obj *obj)
325	{
326		list_del(&obj->list);
327		obj->active = 0;
328	}
330	static void obj_destroy(struct obj *obj)
331	{
332		BUG_ON(obj->active);
333		kfree(obj);
334	}
336	struct obj *obj_list_peek(struct list_head *head)
337	{
338		if (!list_empty(head)) {
339			struct obj *obj;
341			obj = list_entry(head->next, struct obj, list);
342			atomic_inc(&obj->refcnt);
343			return obj;
344		}
345		return NULL;
346	}
348	void obj_poke(void)
349	{
350		struct obj *obj;
352		spin_lock(&global_list_lock);
353		obj = obj_list_peek(&global_list);
354		spin_unlock(&global_list_lock);
356		if (obj) {
357			obj->ops->poke(obj);
358			if (atomic_dec_and_test(&obj->refcnt))
359				obj_destroy(obj);
360		}
361	}
363	void obj_timeout(struct obj *obj)
364	{
365		spin_lock(&global_list_lock);
366		obj_list_del(obj);
367		spin_unlock(&global_list_lock);
369		if (atomic_dec_and_test(&obj->refcnt))
370			obj_destroy(obj);
371	}
373	(This is a simplification of the ARP queue management in the
374	 generic neighbour discover code of the networking.  Olaf Kirch
375	 found a bug wrt. memory barriers in kfree_skb() that exposed
376	 the atomic_t memory barrier requirements quite clearly.)
378	Given the above scheme, it must be the case that the obj->active
379	update done by the obj list deletion be visible to other processors
380	before the atomic counter decrement is performed.
382	Otherwise, the counter could fall to zero, yet obj->active would still
383	be set, thus triggering the assertion in obj_destroy().  The error
384	sequence looks like this:
386		cpu 0				cpu 1
387		obj_poke()			obj_timeout()
388		obj = obj_list_peek();
389		... gains ref to obj, refcnt=2
390						obj_list_del(obj);
391						obj->active = 0 ...
392						... visibility delayed ...
393						atomic_dec_and_test()
394						... refcnt drops to 1 ...
395		atomic_dec_and_test()
396		... refcount drops to 0 ...
397		obj_destroy()
398		BUG() triggers since obj->active
399		still seen as one
400						obj->active update visibility occurs
402	With the memory barrier semantics required of the atomic_t operations
403	which return values, the above sequence of memory visibility can never
404	happen.  Specifically, in the above case the atomic_dec_and_test()
405	counter decrement would not become globally visible until the
406	obj->active update does.
408	As a historical note, 32-bit Sparc used to only allow usage of
409	24-bits of its atomic_t type.  This was because it used 8 bits
410	as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
411	type instruction.  However, 32-bit Sparc has since been moved over
412	to a "hash table of spinlocks" scheme, that allows the full 32-bit
413	counter to be realized.  Essentially, an array of spinlocks are
414	indexed into based upon the address of the atomic_t being operated
415	on, and that lock protects the atomic operation.  Parisc uses the
416	same scheme.
418	Another note is that the atomic_t operations returning values are
419	extremely slow on an old 386.
421	We will now cover the atomic bitmask operations.  You will find that
422	their SMP and memory barrier semantics are similar in shape and scope
423	to the atomic_t ops above.
425	Native atomic bit operations are defined to operate on objects aligned
426	to the size of an "unsigned long" C data type, and are least of that
427	size.  The endianness of the bits within each "unsigned long" are the
428	native endianness of the cpu.
430		void set_bit(unsigned long nr, volatile unsigned long *addr);
431		void clear_bit(unsigned long nr, volatile unsigned long *addr);
432		void change_bit(unsigned long nr, volatile unsigned long *addr);
434	These routines set, clear, and change, respectively, the bit number
435	indicated by "nr" on the bit mask pointed to by "ADDR".
437	They must execute atomically, yet there are no implicit memory barrier
438	semantics required of these interfaces.
440		int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
441		int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
442		int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
444	Like the above, except that these routines return a boolean which
445	indicates whether the changed bit was set _BEFORE_ the atomic bit
446	operation.
448	WARNING! It is incredibly important that the value be a boolean,
449	ie. "0" or "1".  Do not try to be fancy and save a few instructions by
450	declaring the above to return "long" and just returning something like
451	"old_val & mask" because that will not work.
453	For one thing, this return value gets truncated to int in many code
454	paths using these interfaces, so on 64-bit if the bit is set in the
455	upper 32-bits then testers will never see that.
457	One great example of where this problem crops up are the thread_info
458	flag operations.  Routines such as test_and_set_ti_thread_flag() chop
459	the return value into an int.  There are other places where things
460	like this occur as well.
462	These routines, like the atomic_t counter operations returning values,
463	must provide explicit memory barrier semantics around their execution.
464	All memory operations before the atomic bit operation call must be
465	made visible globally before the atomic bit operation is made visible.
466	Likewise, the atomic bit operation must be visible globally before any
467	subsequent memory operation is made visible.  For example:
469		obj->dead = 1;
470		if (test_and_set_bit(0, &obj->flags))
471			/* ... */;
472		obj->killed = 1;
474	The implementation of test_and_set_bit() must guarantee that
475	"obj->dead = 1;" is visible to cpus before the atomic memory operation
476	done by test_and_set_bit() becomes visible.  Likewise, the atomic
477	memory operation done by test_and_set_bit() must become visible before
478	"obj->killed = 1;" is visible.
480	Finally there is the basic operation:
482		int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);
484	Which returns a boolean indicating if bit "nr" is set in the bitmask
485	pointed to by "addr".
487	If explicit memory barriers are required around {set,clear}_bit() (which do
488	not return a value, and thus does not need to provide memory barrier
489	semantics), two interfaces are provided:
491		void smp_mb__before_atomic(void);
492		void smp_mb__after_atomic(void);
494	They are used as follows, and are akin to their atomic_t operation
495	brothers:
497		/* All memory operations before this call will
498		 * be globally visible before the clear_bit().
499		 */
500		smp_mb__before_atomic();
501		clear_bit( ... );
503		/* The clear_bit() will be visible before all
504		 * subsequent memory operations.
505		 */
506		 smp_mb__after_atomic();
508	There are two special bitops with lock barrier semantics (acquire/release,
509	same as spinlocks). These operate in the same way as their non-_lock/unlock
510	postfixed variants, except that they are to provide acquire/release semantics,
511	respectively. This means they can be used for bit_spin_trylock and
512	bit_spin_unlock type operations without specifying any more barriers.
514		int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
515		void clear_bit_unlock(unsigned long nr, unsigned long *addr);
516		void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
518	The __clear_bit_unlock version is non-atomic, however it still implements
519	unlock barrier semantics. This can be useful if the lock itself is protecting
520	the other bits in the word.
522	Finally, there are non-atomic versions of the bitmask operations
523	provided.  They are used in contexts where some other higher-level SMP
524	locking scheme is being used to protect the bitmask, and thus less
525	expensive non-atomic operations may be used in the implementation.
526	They have names similar to the above bitmask operation interfaces,
527	except that two underscores are prefixed to the interface name.
529		void __set_bit(unsigned long nr, volatile unsigned long *addr);
530		void __clear_bit(unsigned long nr, volatile unsigned long *addr);
531		void __change_bit(unsigned long nr, volatile unsigned long *addr);
532		int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
533		int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
534		int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
536	These non-atomic variants also do not require any special memory
537	barrier semantics.
539	The routines xchg() and cmpxchg() must provide the same exact
540	memory-barrier semantics as the atomic and bit operations returning
541	values.
543	Spinlocks and rwlocks have memory barrier expectations as well.
544	The rule to follow is simple:
546	1) When acquiring a lock, the implementation must make it globally
547	   visible before any subsequent memory operation.
549	2) When releasing a lock, the implementation must make it such that
550	   all previous memory operations are globally visible before the
551	   lock release.
553	Which finally brings us to _atomic_dec_and_lock().  There is an
554	architecture-neutral version implemented in lib/dec_and_lock.c,
555	but most platforms will wish to optimize this in assembler.
557		int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
559	Atomically decrement the given counter, and if will drop to zero
560	atomically acquire the given spinlock and perform the decrement
561	of the counter to zero.  If it does not drop to zero, do nothing
562	with the spinlock.
564	It is actually pretty simple to get the memory barrier correct.
565	Simply satisfy the spinlock grab requirements, which is make
566	sure the spinlock operation is globally visible before any
567	subsequent memory operation.
569	We can demonstrate this operation more clearly if we define
570	an abstract atomic operation:
572		long cas(long *mem, long old, long new);
574	"cas" stands for "compare and swap".  It atomically:
576	1) Compares "old" with the value currently at "mem".
577	2) If they are equal, "new" is written to "mem".
578	3) Regardless, the current value at "mem" is returned.
580	As an example usage, here is what an atomic counter update
581	might look like:
583	void example_atomic_inc(long *counter)
584	{
585		long old, new, ret;
587		while (1) {
588			old = *counter;
589			new = old + 1;
591			ret = cas(counter, old, new);
592			if (ret == old)
593				break;
594		}
595	}
597	Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():
599	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
600	{
601		long old, new, ret;
602		int went_to_zero;
604		went_to_zero = 0;
605		while (1) {
606			old = atomic_read(atomic);
607			new = old - 1;
608			if (new == 0) {
609				went_to_zero = 1;
610				spin_lock(lock);
611			}
612			ret = cas(atomic, old, new);
613			if (ret == old)
614				break;
615			if (went_to_zero) {
616				spin_unlock(lock);
617				went_to_zero = 0;
618			}
619		}
621		return went_to_zero;
622	}
624	Now, as far as memory barriers go, as long as spin_lock()
625	strictly orders all subsequent memory operations (including
626	the cas()) with respect to itself, things will be fine.
628	Said another way, _atomic_dec_and_lock() must guarantee that
629	a counter dropping to zero is never made visible before the
630	spinlock being acquired.
632	Note that this also means that for the case where the counter
633	is not dropping to zero, there are no memory ordering
634	requirements.
Hide Line Numbers
About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.