		Semantics and Behavior of Atomic and
			 Bitmask Operations

			  David S. Miller

	This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.

	The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer.  Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail.  Something like the following should suffice:

	typedef struct { int counter; } atomic_t;
	typedef struct { long counter; } atomic_long_t;

Historically, counter has been declared volatile.  This is now discouraged.
See Documentation/volatile-considered-harmful.txt for the complete rationale.

local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate. Please see
Documentation/local_ops.txt for the semantics of local_t.

The first operations to implement for atomic_t's are the initializers and
plain reads.

	#define ATOMIC_INIT(i)		{ (i) }
	#define atomic_set(v, i)	((v)->counter = (i))

The first macro is used in definitions, such as:

static atomic_t my_counter = ATOMIC_INIT(1);

The initializer is atomic in that the return values of the atomic operations
are guaranteed to be correct reflecting the initialized value if the
initializer is used before runtime.  If the initializer is used at runtime, a
proper implicit or explicit read memory barrier is needed before reading the
value with atomic_read from another thread.

As with all of the atomic_ interfaces, replace the leading "atomic_"
with "atomic_long_" to operate on atomic_long_t.
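
As a hypothetical illustration of that naming rule (the counter and the
variables len and total here are made up for this sketch, not taken from
real kernel code):

	static atomic_long_t bytes_sent = ATOMIC_LONG_INIT(0);

	/* same semantics as atomic_add()/atomic_read(), just on a long */
	atomic_long_add(len, &bytes_sent);
	total = atomic_long_read(&bytes_sent);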

The second interface can be used at runtime, as in:

	struct foo { atomic_t counter; };
	...

	struct foo *k;

	k = kmalloc(sizeof(*k), GFP_KERNEL);
	if (!k)
		return -ENOMEM;
	atomic_set(&k->counter, 0);

The setting is atomic in that the return values of the atomic operations by
all threads are guaranteed to be correct reflecting either the value that has
been set with this operation or set with another operation.  A proper implicit
or explicit memory barrier is needed before the value set with the operation
is guaranteed to be readable with atomic_read from another thread.

Next, we have:

	#define atomic_read(v)	((v)->counter)

which simply reads the counter value currently visible to the calling thread.
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations if a proper
implicit or explicit memory barrier is used after possible runtime
initialization by any other thread and the value is modified only with the
interface operations.  atomic_read does not guarantee that the runtime
initialization by any other thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.

*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***

Some architectures may choose to use the volatile keyword, barriers, or inline
assembly to guarantee some degree of immediacy for atomic_read() and
atomic_set().  This is not uniformly guaranteed, and may change in the future,
so all users of atomic_t should treat atomic_read() and atomic_set() as simple
C statements that may be reordered or optimized away entirely by the compiler
or processor, and explicitly invoke the appropriate compiler and/or memory
barrier for each use case.  Failure to do so will result in code that may
suddenly break when used with different architectures or compiler
optimizations, or even changes in unrelated code which changes how the
compiler optimizes the section accessing atomic_t variables.
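
For instance, a producer/consumer hand-off built only on atomic_set() and
atomic_read() must supply its own ordering.  A minimal sketch, assuming
hypothetical names (shared_buf, data_ready, prepare_buffer(), consume()):

	/* producer */
	shared_buf = prepare_buffer();	/* plain store */
	smp_wmb();			/* order the data before the flag */
	atomic_set(&data_ready, 1);

	/* consumer */
	while (!atomic_read(&data_ready))
		cpu_relax();
	smp_rmb();			/* order the flag before the data */
	consume(shared_buf);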

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set().  The ACCESS_ONCE()
macro should be used to prevent the compiler from using optimizations
that might otherwise optimize accesses out of existence on the one hand,
or that might create unsolicited accesses on the other.

For example consider the following code:

	while (a > 0)
		do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following:

	tmp = a;
	if (tmp > 0)
		for (;;)
			do_something();

If you don't want the compiler to do this (and you probably don't), then
you should use something like the following:

	while (ACCESS_ONCE(a) > 0)
		do_something();

Alternatively, you could place a barrier() call in the loop.

For another example, consider the following code:

	tmp_a = a;
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows:

	tmp_a = a;
	do_something_with(tmp_a);
	tmp_a = a;
	do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload.  To prevent the compiler from attacking your
code in this manner, write the following:

	tmp_a = ACCESS_ONCE(a);
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed:

	if (a)
		b = 9;
	else
		b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following:

	b = 42;
	if (a)
		b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero.  To prevent
the compiler from doing this, write something like:

	if (a)
		ACCESS_ONCE(b) = 9;
	else
		ACCESS_ONCE(b) = 42;

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

Now, we move onto the atomic operation interfaces typically implemented with
the help of assembly code.

	void atomic_add(int i, atomic_t *v);
	void atomic_sub(int i, atomic_t *v);
	void atomic_inc(atomic_t *v);
	void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
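
A typical use is therefore plain event counting, where SMP-safe arithmetic
is all that is wanted.  A hypothetical sketch:

	static atomic_t rx_dropped = ATOMIC_INIT(0);

	/* hot path: no ordering against other memory accesses is needed,
	 * only that concurrent increments are not lost
	 */
	atomic_inc(&rx_dropped);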

Next, we have:

	int atomic_inc_return(atomic_t *v);
	int atomic_dec_return(atomic_t *v);

These routines add 1 and subtract 1, respectively, from the given
atomic_t and return the new counter value after the operation is
performed.

Unlike the above routines, it is required that these primitives
include explicit memory barriers that are performed before and after
the operation.  It must be done such that all memory operations before
and after the atomic operation calls are strongly ordered with respect
to the atomic operation itself.

For example, it should behave as if a smp_mb() call existed both
before and after the atomic operation.

If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.
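
One common pattern that relies on both the returned value and the implied
full-barrier semantics is handing out sequence numbers.  A hypothetical
sketch:

	static atomic_t next_seq = ATOMIC_INIT(0);

	int new_sequence_number(void)
	{
		/* behaves as if smp_mb() were issued before and after */
		return atomic_inc_return(&next_seq);
	}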

Let's move on:

	int atomic_add_return(int i, atomic_t *v);
	int atomic_sub_return(int i, atomic_t *v);

These behave just like atomic_{inc,dec}_return() except that an
explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.

Next:

	int atomic_inc_and_test(atomic_t *v);
	int atomic_dec_and_test(atomic_t *v);

These two routines increment and decrement by 1, respectively, the
given atomic counter.  They return a boolean indicating whether the
resulting counter value was zero or not.

Again, these primitives provide explicit memory barrier semantics around
the atomic operation.
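
The classic user of atomic_dec_and_test() is reference counting, as in this
hypothetical put routine (a fuller treatment of this pattern appears later
in this document):

	void obj_put(struct obj *obj)
	{
		/* the implied barriers order all prior accesses to *obj
		 * before the decrement that may free it
		 */
		if (atomic_dec_and_test(&obj->refcnt))
			kfree(obj);
	}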

	int atomic_sub_and_test(int i, atomic_t *v);

This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1".  This primitive must
provide explicit memory barrier semantics around the operation.

	int atomic_add_negative(int i, atomic_t *v);

The given increment is added to the given atomic counter value.  A boolean
is returned which indicates whether the resulting counter value is negative.
This primitive must provide explicit memory barrier semantics around
the operation.

Then:

	int atomic_xchg(atomic_t *v, int new);

This performs an atomic exchange operation on the atomic variable v, setting
the given new value.  It returns the old value that the atomic variable v had
just before the operation.

atomic_xchg must provide explicit memory barriers around the operation.
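
A hypothetical use of atomic_xchg() is claiming pending work exactly once,
relying on the old value it returns:

	/* only one caller observes the 1 -> 0 transition */
	if (atomic_xchg(&work_pending, 0))
		process_pending_work();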

	int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of *v are performed through atomic_xxx operations.

atomic_cmpxchg must provide explicit memory barriers around the operation,
although if the comparison fails then no memory ordering guarantees are
required.

The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.
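
atomic_cmpxchg() is usually wrapped in a retry loop.  As a sketch (this
helper is invented for illustration, not an existing kernel interface), a
saturating increment could be built like so:

	static int atomic_inc_saturating(atomic_t *v)
	{
		int old, new;

		do {
			old = atomic_read(v);
			if (old == INT_MAX)
				return old;	/* already saturated */
			new = old + 1;
			/* retry if another cpu changed *v meanwhile */
		} while (atomic_cmpxchg(v, old, new) != old);

		return new;
	}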

Finally:

	int atomic_add_unless(atomic_t *v, int a, int u);

If the atomic value v is not equal to u, this function adds a to v, and
returns non zero. If v is equal to u then it returns zero. This is done as
an atomic operation.

atomic_add_unless must provide explicit memory barriers around the
operation unless it fails (returns 0).

atomic_inc_not_zero() is equivalent to atomic_add_unless(v, 1, 0).
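
atomic_inc_not_zero() is the usual way to take a new reference only if the
object has not already started dying.  A hypothetical sketch:

	/* refuse to resurrect an object whose refcount already hit zero */
	if (!atomic_inc_not_zero(&obj->refcnt))
		return NULL;
	return obj;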

If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this:

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

For example, smp_mb__before_atomic() can be used like so:

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation.  In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update to become
visible to other cpus before the "obj->dead = 1;" assignment.
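
smp_mb__after_atomic() is the mirror image, ordering the atomic update
before later memory accesses.  A hypothetical sketch:

	atomic_inc(&obj->stage);
	smp_mb__after_atomic();
	/* the increment is globally visible before obj->cancelled is read */
	if (obj->cancelled)
		undo_work(obj);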

A missing memory barrier in the cases where they are required by the
atomic_t implementation above can have disastrous results.  Here is
an example, which follows a pattern occurring frequently in the Linux
kernel.  It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object:

static void obj_list_add(struct obj *obj, struct list_head *head)
{
	obj->active = 1;
	list_add(&obj->list, head);
}

static void obj_list_del(struct obj *obj)
{
	list_del(&obj->list);
	obj->active = 0;
}

static void obj_destroy(struct obj *obj)
{
	BUG_ON(obj->active);
	kfree(obj);
}

struct obj *obj_list_peek(struct list_head *head)
{
	if (!list_empty(head)) {
		struct obj *obj;

		obj = list_entry(head->next, struct obj, list);
		atomic_inc(&obj->refcnt);
		return obj;
	}
	return NULL;
}

void obj_poke(void)
{
	struct obj *obj;

	spin_lock(&global_list_lock);
	obj = obj_list_peek(&global_list);
	spin_unlock(&global_list_lock);

	if (obj) {
		obj->ops->poke(obj);
		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}
}

void obj_timeout(struct obj *obj)
{
	spin_lock(&global_list_lock);
	obj_list_del(obj);
	spin_unlock(&global_list_lock);

	if (atomic_dec_and_test(&obj->refcnt))
		obj_destroy(obj);
}

(This is a simplification of the ARP queue management in the generic
 neighbour discovery code of the networking subsystem.  Olaf Kirch
 found a bug wrt. memory barriers in kfree_skb() that exposed
 the atomic_t memory barrier requirements quite clearly.)

Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
before the atomic counter decrement is performed.

Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy().  The error
sequence looks like this:

	cpu 0				cpu 1
	obj_poke()			obj_timeout()
	obj = obj_list_peek();
	... gains ref to obj, refcnt=2
					obj_list_del(obj);
					obj->active = 0 ...
					... visibility delayed ...
					atomic_dec_and_test()
					... refcnt drops to 1 ...
	atomic_dec_and_test()
	... refcount drops to 0 ...
	obj_destroy()
	BUG() triggers since obj->active
	still seen as one
					obj->active update visibility occurs

With the memory barrier semantics required of the atomic_t operations
which return values, the above sequence of memory visibility can never
happen.  Specifically, in the above case the atomic_dec_and_test()
counter decrement would not become globally visible until the
obj->active update does.

As a historical note, 32-bit Sparc used to only allow usage of
24-bits of its atomic_t type.  This was because it used 8 bits
as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
type instruction.  However, 32-bit Sparc has since been moved over
to a "hash table of spinlocks" scheme, that allows the full 32-bit
counter to be realized.  Essentially, an array of spinlocks is
indexed into based upon the address of the atomic_t being operated
on, and that lock protects the atomic operation.  Parisc uses the
same scheme.

Another note is that the atomic_t operations returning values are
extremely slow on an old 386.

We will now cover the atomic bitmask operations.  You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.

Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size.  The endianness of the bits within each "unsigned long" is the
native endianness of the cpu.

	void set_bit(unsigned long nr, volatile unsigned long *addr);
	void clear_bit(unsigned long nr, volatile unsigned long *addr);
	void change_bit(unsigned long nr, volatile unsigned long *addr);

These routines set, clear, and change, respectively, the bit number
indicated by "nr" on the bit mask pointed to by "addr".

They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces.

	int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.
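
A hypothetical use of that returned boolean is claiming exclusive ownership
of a resource guarded by a flag bit (OBJ_BUSY is a made-up bit number):

	if (test_and_set_bit(OBJ_BUSY, &obj->flags))
		return -EBUSY;		/* somebody else owns it already */
	/* ... the bit was clear before, so we now own the object ... */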

WARNING! It is incredibly important that the value be a boolean,
ie. "0" or "1".  Do not try to be fancy and save a few instructions by
declaring the above to return "long" and just returning something like
"old_val & mask" because that will not work.

For one thing, this return value gets truncated to int in many code
paths using these interfaces, so on 64-bit if the bit is set in the
upper 32-bits then testers will never see that.

One great example of where this problem crops up is the thread_info
flag operations.  Routines such as test_and_set_ti_thread_flag() chop
the return value into an int.  There are other places where things
like this occur as well.

These routines, like the atomic_t counter operations returning values,
must provide explicit memory barrier semantics around their execution.
All memory operations before the atomic bit operation call must be
made visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible.  For example:

	obj->dead = 1;
	if (test_and_set_bit(0, &obj->flags))
		/* ... */;
	obj->killed = 1;

The implementation of test_and_set_bit() must guarantee that
"obj->dead = 1;" is visible to cpus before the atomic memory operation
done by test_and_set_bit() becomes visible.  Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.

Finally there is the basic operation:

	int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);

Which returns a boolean indicating if bit "nr" is set in the bitmask
pointed to by "addr".

If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided:

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

They are used as follows, and are akin to their atomic_t operation
brothers:

	/* All memory operations before this call will
	 * be globally visible before the clear_bit().
	 */
	smp_mb__before_atomic();
	clear_bit( ... );

	/* The clear_bit() will be visible before all
	 * subsequent memory operations.
	 */
	smp_mb__after_atomic();

There are two special bitops with lock barrier semantics (acquire/release,
same as spinlocks). These operate in the same way as their non-_lock/unlock
postfixed variants, except that they are to provide acquire/release semantics,
respectively. This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers.

	int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
	void clear_bit_unlock(unsigned long nr, unsigned long *addr);
	void __clear_bit_unlock(unsigned long nr, unsigned long *addr);

The __clear_bit_unlock version is non-atomic, however it still implements
unlock barrier semantics. This can be useful if the lock itself is protecting
the other bits in the word.
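
As a sketch of how these might be used to build a simple bit-based lock
protecting the remaining bits of the same word (the bit number LOCK_BIT is
hypothetical):

	/* take the lock bit with acquire semantics */
	while (test_and_set_bit_lock(LOCK_BIT, &word))
		cpu_relax();

	/* ... critical section: the other bits in "word" may be
	 *     updated while the lock bit is held ...
	 */

	/* drop the lock bit with release semantics */
	clear_bit_unlock(LOCK_BIT, &word);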

Finally, there are non-atomic versions of the bitmask operations
provided.  They are used in contexts where some other higher-level SMP
locking scheme is being used to protect the bitmask, and thus less
expensive non-atomic operations may be used in the implementation.
They have names similar to the above bitmask operation interfaces,
except that two underscores are prefixed to the interface name.

	void __set_bit(unsigned long nr, volatile unsigned long *addr);
	void __clear_bit(unsigned long nr, volatile unsigned long *addr);
	void __change_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

These non-atomic variants also do not require any special memory
barrier semantics.
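
A hypothetical sketch of when the double-underscore variants are
appropriate, with an external lock serializing every access to the bitmap:

	spin_lock(&map->lock);
	__set_bit(nr, map->bits);	/* map->lock already serializes
					 * all updates to map->bits
					 */
	spin_unlock(&map->lock);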

The routines xchg() and cmpxchg() must provide the same exact
memory-barrier semantics as the atomic and bit operations returning
values.

Spinlocks and rwlocks have memory barrier expectations as well.
The rule to follow is simple (a sketch follows the two rules):

1) When acquiring a lock, the implementation must make it globally
   visible before any subsequent memory operation.

2) When releasing a lock, the implementation must make it such that
   all previous memory operations are globally visible before the
   lock release.
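
Expressed as a sketch (the names are hypothetical), these two rules are
what make the usual publish-under-lock idiom work:

	spin_lock(&lock);	/* rule 1: accesses below cannot become
				 * visible before the lock is taken
				 */
	shared->value = new_value;
	spin_unlock(&lock);	/* rule 2: the store above is visible
				 * before the lock appears released
				 */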

Which finally brings us to _atomic_dec_and_lock().  There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler.

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);

Atomically decrement the given counter, and if it will drop to zero
atomically acquire the given spinlock and perform the decrement
of the counter to zero.  If it does not drop to zero, do nothing
with the spinlock.

It is actually pretty simple to get the memory barrier correct.
Simply satisfy the spinlock grab requirements, which is to make
sure the spinlock operation is globally visible before any
subsequent memory operation.

We can demonstrate this operation more clearly if we define
an abstract atomic operation:

	long cas(long *mem, long old, long new);

"cas" stands for "compare and swap".  It atomically:

1) Compares "old" with the value currently at "mem".
2) If they are equal, "new" is written to "mem".
3) Regardless, the current value at "mem" is returned.

As an example usage, here is what an atomic counter update
might look like:

void example_atomic_inc(long *counter)
{
	long old, new, ret;

	while (1) {
		old = *counter;
		new = old + 1;

		ret = cas(counter, old, new);
		if (ret == old)
			break;
	}
}

Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():

int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
	long old, new, ret;
	int went_to_zero;

	went_to_zero = 0;
	while (1) {
		old = atomic_read(atomic);
		new = old - 1;
		if (new == 0) {
			went_to_zero = 1;
			spin_lock(lock);
		}
		ret = cas(atomic, old, new);
		if (ret == old)
			break;
		if (went_to_zero) {
			spin_unlock(lock);
			went_to_zero = 0;
		}
	}

	return went_to_zero;
}

Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
the cas()) with respect to itself, things will be fine.

Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock being acquired.

Note that this also means that for the case where the counter
is not dropping to zero, there are no memory ordering
requirements.
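
For completeness, a hypothetical caller of the kernel's atomic_dec_and_lock()
wrapper (which invokes _atomic_dec_and_lock() underneath) typically looks
like:

	if (atomic_dec_and_lock(&obj->refcnt, &obj_list_lock)) {
		/* we hold obj_list_lock and the count just reached zero */
		list_del(&obj->list);
		spin_unlock(&obj_list_lock);
		kfree(obj);
	}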