=======================================================
Semantics and Behavior of Atomic and Bitmask Operations
=======================================================

:Author: David S. Miller

This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.

Atomic Type And Operations
==========================

The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer.  Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail.  Something like the following should suffice::

	typedef struct { int counter; } atomic_t;
	typedef struct { long counter; } atomic_long_t;

Historically, counter has been declared volatile.  This is now discouraged.
See :ref:`Documentation/process/volatile-considered-harmful.rst
<volatile_considered_harmful>` for the complete rationale.

local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate. Please see
:ref:`Documentation/core-api/local_ops.rst <local_ops>` for the semantics of
local_t.

The first operations to implement for atomic_t's are the initializers and
plain reads. ::

	#define ATOMIC_INIT(i)		{ (i) }
	#define atomic_set(v, i)	((v)->counter = (i))

The first macro is used in definitions, such as::

	static atomic_t my_counter = ATOMIC_INIT(1);

The initializer is atomic in that the return values of the atomic operations
are guaranteed to correctly reflect the initialized value if the
initializer is used before runtime.  If the initializer is used at runtime, a
proper implicit or explicit read memory barrier is needed before reading the
value with atomic_read from another thread.

As with all of the ``atomic_`` interfaces, replace the leading ``atomic_``
with ``atomic_long_`` to operate on atomic_long_t.
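
For example, the long flavour is used in exactly the same way; a minimal
sketch (the counter name here is hypothetical)::

	static atomic_long_t total_bytes = ATOMIC_LONG_INIT(0);

	atomic_long_add(4096, &total_bytes);
	pr_info("bytes so far: %ld\n", atomic_long_read(&total_bytes));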

The second interface can be used at runtime, as in::

	struct foo { atomic_t counter; };
	...

	struct foo *k;

	k = kmalloc(sizeof(*k), GFP_KERNEL);
	if (!k)
		return -ENOMEM;
	atomic_set(&k->counter, 0);

The setting is atomic in that the return values of the atomic operations by
all threads are guaranteed to correctly reflect either the value that has
been set with this operation or the value set with another operation.  A
proper implicit or explicit memory barrier is needed before the value set
with the operation is guaranteed to be readable with atomic_read from
another thread.

Next, we have::

	#define atomic_read(v)	((v)->counter)

which simply reads the counter value currently visible to the calling thread.
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations if a proper
implicit or explicit memory barrier is used after possible runtime
initialization by any other thread and the value is modified only with the
interface operations.  atomic_read does not guarantee that the runtime
initialization by any other thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.

.. warning::

	``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS!

	Some architectures may choose to use the volatile keyword, barriers, or
	inline assembly to guarantee some degree of immediacy for atomic_read()
	and atomic_set().  This is not uniformly guaranteed, and may change in
	the future, so all users of atomic_t should treat atomic_read() and
	atomic_set() as simple C statements that may be reordered or optimized
	away entirely by the compiler or processor, and explicitly invoke the
	appropriate compiler and/or memory barrier for each use case.  Failure
	to do so will result in code that may suddenly break when used with
	different architectures or compiler optimizations, or even changes in
	unrelated code which changes how the compiler optimizes the section
	accessing atomic_t variables.

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set().  The READ_ONCE()
and WRITE_ONCE() macros should be used to prevent the compiler from using
optimizations that might otherwise optimize accesses out of existence on
the one hand, or that might create unsolicited accesses on the other.

For example consider the following code::

	while (a > 0)
		do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following::

	tmp = a;
	if (tmp > 0)
		for (;;)
			do_something();

If you don't want the compiler to do this (and you probably don't), then
you should use something like the following::

	while (READ_ONCE(a) > 0)
		do_something();

Alternatively, you could place a barrier() call in the loop.
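
A sketch of the barrier() variant, under the same assumptions as the loop
above, might look like this::

	while (a > 0) {
		do_something();
		barrier();	/* forces the compiler to re-read a on each pass */
	}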

For another example, consider the following code::

	tmp_a = a;
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows::

	tmp_a = a;
	do_something_with(tmp_a);
	tmp_a = a;
	do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload.  To prevent the compiler from attacking your
code in this manner, write the following::

	tmp_a = READ_ONCE(a);
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed::

	if (a)
		b = 9;
	else
		b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following::

	b = 42;
	if (a)
		b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero.  To prevent
the compiler from doing this, write something like::

	if (a)
		WRITE_ONCE(b, 9);
	else
		WRITE_ONCE(b, 42);

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

.. warning::

	``READ_ONCE()`` AND ``WRITE_ONCE()`` DO NOT IMPLY BARRIERS!

Now, we move onto the atomic operation interfaces typically implemented with
the help of assembly code. ::

	void atomic_add(int i, atomic_t *v);
	void atomic_sub(int i, atomic_t *v);
	void atomic_inc(atomic_t *v);
	void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.

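For instance, a per-system statistics counter that nothing ever orders
against is a typical user of these barrier-less operations; a minimal
sketch (the counter and function names are hypothetical)::

	static atomic_t nr_dropped = ATOMIC_INIT(0);

	static void note_dropped_packet(void)
	{
		/* No ordering against other memory operations is needed,
		 * so a plain atomic_inc() is sufficient and cheapest.
		 */
		atomic_inc(&nr_dropped);
	}
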
Next, we have::

	int atomic_inc_return(atomic_t *v);
	int atomic_dec_return(atomic_t *v);

These routines add 1 and subtract 1, respectively, from the given
atomic_t and return the new counter value after the operation is
performed.

Unlike the above routines, it is required that these primitives
include explicit memory barriers that are performed before and after
the operation.  It must be done such that all memory operations before
and after the atomic operation calls are strongly ordered with respect
to the atomic operation itself.

For example, it should behave as if a smp_mb() call existed both
before and after the atomic operation.

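Conceptually, on an architecture whose atomic instructions are not ordered,
an implementation could satisfy this with explicit barriers; the following
is a sketch only, with a hypothetical helper, not a reference
implementation::

	int atomic_inc_return(atomic_t *v)
	{
		int ret;

		smp_mb();			/* order all prior memory operations */
		ret = arch_unordered_inc_return(v);	/* hypothetical unordered op */
		smp_mb();			/* order all later memory operations */
		return ret;
	}
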
If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.

Let's move on::

	int atomic_add_return(int i, atomic_t *v);
	int atomic_sub_return(int i, atomic_t *v);

These behave just like atomic_{inc,dec}_return() except that an
explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.

Next::

	int atomic_inc_and_test(atomic_t *v);
	int atomic_dec_and_test(atomic_t *v);

These two routines increment and decrement by 1, respectively, the
given atomic counter.  They return a boolean indicating whether the
resulting counter value was zero or not.

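The classic user is the reference-count "put" path; a minimal sketch, with
hypothetical structure and function names::

	static void obj_put(struct obj *obj)
	{
		/* The implied full barrier ensures our prior accesses to
		 * *obj are globally visible before the count can hit zero.
		 */
		if (atomic_dec_and_test(&obj->refcnt))
			kfree(obj);
	}
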
Again, these primitives provide explicit memory barrier semantics around
the atomic operation::

	int atomic_sub_and_test(int i, atomic_t *v);

This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1".  This primitive must
provide explicit memory barrier semantics around the operation::

	int atomic_add_negative(int i, atomic_t *v);

The given increment is added to the given atomic counter value.  A boolean
is returned which indicates whether the resulting counter value is negative.
This primitive must provide explicit memory barrier semantics around
the operation.

Then::

	int atomic_xchg(atomic_t *v, int new);

This performs an atomic exchange operation on the atomic variable v, setting
the given new value.  It returns the old value that the atomic variable v had
just before the operation.
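
Returning the old value makes it easy to claim something exactly once; a
sketch, with a hypothetical flag field and helper::

	/* Only the first caller sees the old value 0 and does the setup. */
	if (atomic_xchg(&obj->initialized, 1) == 0)
		do_one_time_setup(obj);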

atomic_xchg must provide explicit memory barriers around the operation. ::

	int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of \*v are performed through atomic_xxx operations.

atomic_cmpxchg must provide explicit memory barriers around the operation,
although if the comparison fails then no memory ordering guarantees are
required.

The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.

Finally::

	int atomic_add_unless(atomic_t *v, int a, int u);

If the atomic value v is not equal to u, this function adds a to v and
returns non-zero. If v is equal to u then it returns zero. This is done as
an atomic operation.

atomic_add_unless must provide explicit memory barriers around the
operation unless it fails (returns 0).

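As an illustration of the semantics, atomic_add_unless() is commonly built
on top of an atomic_cmpxchg() loop; the following is only a sketch (with a
hypothetical name), not the kernel's reference implementation::

	static inline int example_add_unless(atomic_t *v, int a, int u)
	{
		int c = atomic_read(v);

		while (c != u) {
			int old = atomic_cmpxchg(v, c, c + a);

			if (old == c)
				return 1;	/* we added a to v */
			c = old;		/* lost a race; retry with new value */
		}
		return 0;			/* v was equal to u, nothing done */
	}
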
atomic_inc_not_zero() is equivalent to atomic_add_unless(v, 1, 0).

If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces is
defined which accomplish this::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

Preceding a non-value-returning read-modify-write atomic operation with
smp_mb__before_atomic() and following it with smp_mb__after_atomic()
provides the same full ordering that is provided by value-returning
read-modify-write atomic operations.

For example, smp_mb__before_atomic() can be used like so::

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation.  In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update to become
visible to other cpus before the "obj->dead = 1;" assignment.

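The companion barrier is used the same way on the other side of an atomic
operation; a sketch only, with hypothetical field names::

	atomic_inc(&obj->ref_count);
	smp_mb__after_atomic();
	obj->published = 1;	/* cannot become visible before the increment */
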
A missing memory barrier in the cases where they are required by the
atomic_t implementation above can have disastrous results.  Here is
an example, which follows a pattern occurring frequently in the Linux
kernel.  It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object::

	static void obj_list_add(struct obj *obj, struct list_head *head)
	{
		obj->active = 1;
		list_add(&obj->list, head);
	}

	static void obj_list_del(struct obj *obj)
	{
		list_del(&obj->list);
		obj->active = 0;
	}

	static void obj_destroy(struct obj *obj)
	{
		BUG_ON(obj->active);
		kfree(obj);
	}

	struct obj *obj_list_peek(struct list_head *head)
	{
		if (!list_empty(head)) {
			struct obj *obj;

			obj = list_entry(head->next, struct obj, list);
			atomic_inc(&obj->refcnt);
			return obj;
		}
		return NULL;
	}

	void obj_poke(void)
	{
		struct obj *obj;

		spin_lock(&global_list_lock);
		obj = obj_list_peek(&global_list);
		spin_unlock(&global_list_lock);

		if (obj) {
			obj->ops->poke(obj);
			if (atomic_dec_and_test(&obj->refcnt))
				obj_destroy(obj);
		}
	}

	void obj_timeout(struct obj *obj)
	{
		spin_lock(&global_list_lock);
		obj_list_del(obj);
		spin_unlock(&global_list_lock);

		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}

.. note::

	This is a simplification of the ARP queue management in the generic
	neighbour discovery code of the networking subsystem.  Olaf Kirch found a bug wrt.
	memory barriers in kfree_skb() that exposed the atomic_t memory barrier
	requirements quite clearly.

Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
before the atomic counter decrement is performed.

Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy().  The error
sequence looks like this::

	cpu 0				cpu 1
	obj_poke()			obj_timeout()
	obj = obj_list_peek();
	... gains ref to obj, refcnt=2
					obj_list_del(obj);
					obj->active = 0 ...
					... visibility delayed ...
					atomic_dec_and_test()
					... refcnt drops to 1 ...
	atomic_dec_and_test()
	... refcount drops to 0 ...
	obj_destroy()
	BUG() triggers since obj->active
	still seen as one
					obj->active update visibility occurs

With the memory barrier semantics required of the atomic_t operations
which return values, the above sequence of memory visibility can never
happen.  Specifically, in the above case the atomic_dec_and_test()
counter decrement would not become globally visible until the
obj->active update does.

As a historical note, 32-bit Sparc used to only allow usage of
24 bits of its atomic_t type.  This was because it used 8 bits
as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
type instruction.  However, 32-bit Sparc has since been moved over
to a "hash table of spinlocks" scheme that allows the full 32-bit
counter to be realized.  Essentially, an array of spinlocks is
indexed into based upon the address of the atomic_t being operated
on, and that lock protects the atomic operation.  Parisc uses the
same scheme.
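
The idea behind the "hash table of spinlocks" scheme can be sketched as
follows; this is only an illustration, not the actual sparc32 or parisc
implementation::

	#define ATOMIC_HASH_SIZE	4
	static spinlock_t atomic_locks[ATOMIC_HASH_SIZE];	/* assumed initialized */

	static spinlock_t *atomic_hash(atomic_t *v)
	{
		/* Pick a lock based on the address of the counter. */
		return &atomic_locks[((unsigned long)v / sizeof(*v)) % ATOMIC_HASH_SIZE];
	}

	int example_atomic_add_return(int i, atomic_t *v)
	{
		unsigned long flags;
		int ret;

		spin_lock_irqsave(atomic_hash(v), flags);
		ret = (v->counter += i);
		spin_unlock_irqrestore(atomic_hash(v), flags);

		return ret;
	}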

Another note is that the atomic_t operations returning values are
extremely slow on an old 386.


Atomic Bitmask
==============

We will now cover the atomic bitmask operations.  You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.

Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size.  The endianness of the bits within each "unsigned long" is the
native endianness of the cpu. ::

	void set_bit(unsigned long nr, volatile unsigned long *addr);
	void clear_bit(unsigned long nr, volatile unsigned long *addr);
	void change_bit(unsigned long nr, volatile unsigned long *addr);

These routines set, clear, and change, respectively, the bit number
indicated by "nr" on the bit mask pointed to by "addr".

They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces. ::

	int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.

WARNING! It is incredibly important that the value be a boolean,
i.e. "0" or "1".  Do not try to be fancy and save a few instructions by
declaring the above to return "long" and just returning something like
"old_val & mask" because that will not work.

For one thing, this return value gets truncated to int in many code
paths using these interfaces, so on 64-bit if the bit is set in the
upper 32-bits then testers will never see that.

One great example of where this problem crops up is the thread_info
flag operations.  Routines such as test_and_set_ti_thread_flag() chop
the return value into an int.  There are other places where things
like this occur as well.

These routines, like the atomic_t counter operations returning values,
must provide explicit memory barrier semantics around their execution.
All memory operations before the atomic bit operation call must be
made visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible.  For example::

	obj->dead = 1;
	if (test_and_set_bit(0, &obj->flags))
		/* ... */;
	obj->killed = 1;

The implementation of test_and_set_bit() must guarantee that
"obj->dead = 1;" is visible to cpus before the atomic memory operation
done by test_and_set_bit() becomes visible.  Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.

Finally there is the basic operation::

	int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);

This returns a boolean indicating whether bit "nr" is set in the bitmask
pointed to by "addr".

If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

They are used as follows, and are akin to their atomic_t operation
brothers::

	/* All memory operations before this call will
	 * be globally visible before the clear_bit().
	 */
	smp_mb__before_atomic();
	clear_bit( ... );

	/* The clear_bit() will be visible before all
	 * subsequent memory operations.
	 */
	smp_mb__after_atomic();

There are two special bitops with lock barrier semantics (acquire/release,
same as spinlocks). These operate in the same way as their non-_lock/unlock
postfixed variants, except that they provide acquire/release semantics,
respectively. This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers. ::

	int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
	void clear_bit_unlock(unsigned long nr, unsigned long *addr);
	void __clear_bit_unlock(unsigned long nr, unsigned long *addr);

The __clear_bit_unlock version is non-atomic, however it still implements
unlock barrier semantics. This can be useful if the lock itself is protecting
the other bits in the word.

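For example, a simple bit spinlock can be built from these primitives; the
following is a sketch only (the kernel's real implementation lives in
linux/bit_spinlock.h)::

	static inline void example_bit_lock(unsigned long *word)
	{
		/* Acquire semantics: later accesses stay after the lock. */
		while (test_and_set_bit_lock(0, word))
			cpu_relax();
	}

	static inline void example_bit_unlock(unsigned long *word)
	{
		/* Release semantics: earlier accesses are visible first. */
		clear_bit_unlock(0, word);
	}
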
Finally, there are non-atomic versions of the bitmask operations
provided.  They are used in contexts where some other higher-level SMP
locking scheme is being used to protect the bitmask, and thus less
expensive non-atomic operations may be used in the implementation.
They have names similar to the above bitmask operation interfaces,
except that two underscores are prefixed to the interface name. ::

	void __set_bit(unsigned long nr, volatile unsigned long *addr);
	void __clear_bit(unsigned long nr, volatile unsigned long *addr);
	void __change_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

These non-atomic variants also do not require any special memory
barrier semantics.

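For instance, if every access to a bitmap is already serialized by a
spinlock the caller holds, the cheaper variants are sufficient; a sketch,
with hypothetical names::

	spin_lock(&map->lock);
	__set_bit(nr, map->bits);
	__clear_bit(old_nr, map->bits);
	spin_unlock(&map->lock);
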
The routines xchg() and cmpxchg() must provide the same exact
memory-barrier semantics as the atomic and bit operations returning
values.

.. note::

	If someone wants to use xchg(), cmpxchg() and their variants,
	linux/atomic.h should be included rather than asm/cmpxchg.h, unless the
	code is in arch/* and can take care of itself.

Spinlocks and rwlocks have memory barrier expectations as well.
The rule to follow is simple:

1) When acquiring a lock, the implementation must make it globally
   visible before any subsequent memory operation.

2) When releasing a lock, the implementation must make it such that
   all previous memory operations are globally visible before the
   lock release.

Which finally brings us to _atomic_dec_and_lock().  There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler. ::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);

Atomically decrement the given counter, and if it will drop to zero
atomically acquire the given spinlock and perform the decrement
of the counter to zero.  If it does not drop to zero, do nothing
with the spinlock.

It is actually pretty simple to get the memory barrier correct.
Simply satisfy the spinlock grab requirements, which is to make
sure the spinlock operation is globally visible before any
subsequent memory operation.

We can demonstrate this operation more clearly if we define
an abstract atomic operation::

	long cas(long *mem, long old, long new);

"cas" stands for "compare and swap".  It atomically:

1) Compares "old" with the value currently at "mem".
2) If they are equal, "new" is written to "mem".
3) Regardless, the current value at "mem" is returned.

As an example usage, here is what an atomic counter update
might look like::

	void example_atomic_inc(long *counter)
	{
		long old, new, ret;

		while (1) {
			old = *counter;
			new = old + 1;

			ret = cas(counter, old, new);
			if (ret == old)
				break;
		}
	}

Let's use cas() in order to build a pseudo-C atomic_dec_and_lock()::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
	{
		long old, new, ret;
		int went_to_zero;

		went_to_zero = 0;
		while (1) {
			old = atomic_read(atomic);
			new = old - 1;
			if (new == 0) {
				went_to_zero = 1;
				spin_lock(lock);
			}
			ret = cas(atomic, old, new);
			if (ret == old)
				break;
			if (went_to_zero) {
				spin_unlock(lock);
				went_to_zero = 0;
			}
		}

		return went_to_zero;
	}

Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
the cas()) with respect to itself, things will be fine.

Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock is acquired.

.. note::

	Note that this also means that for the case where the counter is not
	dropping to zero, there are no memory ordering requirements.