
Documentation / RCU / rculist_nulls.txt





Based on kernel version 3.13. Page generated on 2014-01-20 22:04 EST.

Using hlist_nulls to protect read-mostly linked lists and
objects using SLAB_DESTROY_BY_RCU allocations.

Please read the basics in Documentation/RCU/listRCU.txt

Using special markers (called 'nulls') is a convenient way
to solve the following problem:

A typical RCU linked list managing objects which are
allocated from a SLAB_DESTROY_BY_RCU kmem_cache can
use the following algorithms:

1) Lookup algo
--------------
rcu_read_lock();
begin:
obj = lockless_lookup(key);
if (obj) {
  if (!try_get_ref(obj)) // might fail for free objects
    goto begin;
  /*
   * Because a writer could delete the object, and a writer could
   * reuse the object before the end of the RCU grace period, we
   * must check the key after getting the reference on the object.
   */
  if (obj->key != key) { // not the object we expected
    put_ref(obj);
    goto begin;
  }
}
rcu_read_unlock();

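The lookup above relies on try_get_ref() failing for an object whose
reference count has already dropped to zero. A minimal userspace sketch
of that "increment unless zero" behaviour, using C11 atomics, could look
like the following. The struct layout and the C11 implementation are
assumptions made for illustration; kernel code would more likely build
this on a primitive such as atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object layout; only refcnt and key matter here. */
struct obj {
	atomic_int refcnt;	/* 0 means the object is free */
	int key;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false for an object that has already been freed.
 */
static bool try_get_ref(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	assert(old >= 0);
	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

Because the count is never raised from zero, a reader that races with
the final put simply fails here and restarts its lookup.
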
Beware that lockless_lookup(key) cannot use the traditional
hlist_for_each_entry_rcu(); it needs a version with an additional
memory barrier (smp_rmb()):

lockless_lookup(key)
{
   struct hlist_node *pos, *next;
   for (pos = rcu_dereference((head)->first);
          pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
          ({ obj = hlist_entry(pos, typeof(*obj), member); 1; });
          pos = rcu_dereference(next))
      if (obj->key == key)
         return obj;
   return NULL;
}

Note that the traditional hlist_for_each_entry_rcu() lacks this smp_rmb():

   struct hlist_node *pos;
   for (pos = rcu_dereference((head)->first);
          pos && ({ prefetch(pos->next); 1; }) &&
          ({ obj = hlist_entry(pos, typeof(*obj), member); 1; });
          pos = rcu_dereference(pos->next))
      if (obj->key == key)
         return obj;
   return NULL;

Quoting Corey Minyard:

"If the object is moved from one list to another list in-between the
 time the hash is calculated and the next field is accessed, and the
 object has moved to the end of a new list, the traversal will not
 complete properly on the list it should have, since the object will
 be on the end of the new list and there's not a way to tell it's on a
 new list and restart the list traversal.  I think that this can be
 solved by pre-fetching the "next" field (with proper barriers) before
 checking the key."

2) Insert algo
--------------

We need to make sure a reader cannot read the new 'obj->obj_next' value
together with the previous value of 'obj->key'. Otherwise, an item could
be deleted from a chain and inserted into another chain. If the new chain
was empty before the move, the 'next' pointer is NULL, and a lockless
reader cannot detect that it missed the following items in the original
chain.

/*
 * Please note that new inserts are done at the head of list,
 * not in the middle or end.
 */
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
 * we need to make sure obj->key is updated before obj->next
 * or obj->refcnt
 */
smp_wmb();
atomic_set(&obj->refcnt, 1);
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()

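The store ordering in the insert algorithm can be approximated in
userspace with C11 atomics. The sketch below is not kernel code: the
struct layout and the insert_head() name are hypothetical, the chain
lock is not modeled, a release fence stands in for smp_wmb() (it is
somewhat stronger), and a release store stands in for the publication
done by hlist_add_head_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly linked stand-ins for the object and its chain. */
struct obj {
	int key;
	atomic_int refcnt;
	struct obj *_Atomic next;
};

struct chain {
	struct obj *_Atomic first;
};

/* Caller is assumed to hold the chain lock (not modeled here). */
static void insert_head(struct chain *c, struct obj *o, int key)
{
	assert(o != NULL);
	o->key = key;
	/* Stand-in for smp_wmb(): order the key store before later stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&o->refcnt, 1, memory_order_relaxed);
	/* New objects go at the head, never in the middle or at the end. */
	atomic_store_explicit(&o->next,
			      atomic_load_explicit(&c->first,
						   memory_order_relaxed),
			      memory_order_relaxed);
	/* Release store makes the fully initialized object reachable. */
	atomic_store_explicit(&c->first, o, memory_order_release);
}
```

A reader that observes the new refcnt or the new next pointer with
matching acquire-side ordering is then expected to observe the new key
as well, which is the property the insert algorithm asks for.
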
3) Remove algo
--------------
Nothing special here, we can use a standard RCU hlist deletion.
But because of SLAB_DESTROY_BY_RCU, beware that a deleted object can be
reused very quickly (before the end of the RCU grace period).

if (put_last_reference_on(obj)) {
   lock_chain(); // typically a spin_lock()
   hlist_del_init_rcu(&obj->obj_node);
   unlock_chain(); // typically a spin_unlock()
   kmem_cache_free(cachep, obj);
}

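The removal path only unlinks and frees the object when the caller
dropped the last reference. One possible userspace sketch of
put_last_reference_on(), assuming a C11 atomic counter and a
hypothetical struct layout, is shown below; in kernel code the same
test is commonly written with atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object layout; only refcnt matters here. */
struct obj {
	atomic_int refcnt;
	int key;
};

/*
 * Drop one reference. Returns true only for the caller that moved
 * the count from 1 to 0, so exactly one caller performs the removal.
 */
static bool put_last_reference_on(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* dropping a reference we do not hold is a bug */
	return old == 1;
}
```
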
--------------------------------------------------------------------------
With hlist_nulls we can avoid the extra smp_rmb() in lockless_lookup()
and the extra smp_wmb() in the insert function.

For example, if we choose to store the slot number as the 'nulls'
end-of-list marker for each slot of the hash table, we can detect
a race (some writer deleted an object and/or moved it to another
chain) by checking the final 'nulls' value when the lookup reaches
the end of the chain. If the final 'nulls' value is not the slot
number, then we must restart the lookup at the beginning. If the
object was moved to the same chain, then the reader doesn't care:
it might eventually scan the list again without harm.

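A 'nulls' marker has to be distinguishable from a real node pointer
while still carrying a value such as the slot number. One common way to
do this, sketched below in userspace C, is to set the low bit of the
pointer-sized word and keep the value in the remaining bits. The
make_nulls() helper and the struct are hypothetical names for this
illustration; is_a_nulls() and get_nulls_value() mirror helper names
from the kernel's list_nulls.h, but these bodies are a sketch rather
than a copy of the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nulls_node {
	struct nulls_node *next;	/* real pointer or nulls marker */
};

/*
 * Encode 'value' as an odd pseudo-pointer. Real nodes are at least
 * 2-byte aligned, so their addresses always have the low bit clear.
 */
static struct nulls_node *make_nulls(uintptr_t value)
{
	assert(value <= (UINTPTR_MAX >> 1));	/* one bit is used as the tag */
	return (struct nulls_node *)((value << 1) | 1);
}

/* True if 'ptr' is an end-of-list marker rather than a node. */
static bool is_a_nulls(const struct nulls_node *ptr)
{
	return ((uintptr_t)ptr & 1) != 0;
}

/* Recover the value stored in a marker. */
static uintptr_t get_nulls_value(const struct nulls_node *ptr)
{
	return (uintptr_t)ptr >> 1;
}
```
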
1) Lookup algo
--------------

 head = &table[slot];
 rcu_read_lock();
begin:
 hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
   if (obj->key == key) {
      if (!try_get_ref(obj)) // might fail for free objects
         goto begin;
      if (obj->key != key) { // not the object we expected
         put_ref(obj);
         goto begin;
      }
      goto out;
   }
 }
/*
 * if the nulls value we got at the end of this lookup is
 * not the expected one, we must restart lookup.
 * We probably met an item that was moved to another chain.
 */
 if (get_nulls_value(node) != slot)
   goto begin;
 obj = NULL;

out:
 rcu_read_unlock();

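The end-of-chain check in the lookup above can be illustrated with a
single-threaded userspace model. The sketch below performs one pass
over a chain and reports whether the key was found, is absent, or
whether the pass ended on another slot's marker and must be restarted.
All names here (struct node, scan_chain(), the NULLS macros) are
hypothetical, and no RCU, reference counting or concurrency is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
	struct node *next;	/* real pointer, or odd-valued nulls marker */
	int key;
};

#define NULLS(slot)	((struct node *)((((uintptr_t)(slot)) << 1) | 1))
#define IS_NULLS(p)	(((uintptr_t)(p)) & 1)
#define NULLS_VALUE(p)	(((uintptr_t)(p)) >> 1)

enum scan_result { SCAN_FOUND, SCAN_ABSENT, SCAN_RESTART };

/* One pass over a chain that is expected to belong to 'slot'. */
static enum scan_result scan_chain(struct node *first, uintptr_t slot,
				   int key, struct node **found)
{
	struct node *pos;

	assert(found != NULL);
	for (pos = first; !IS_NULLS(pos); pos = pos->next) {
		if (pos->key == key) {
			*found = pos;
			return SCAN_FOUND;
		}
	}
	/* A foreign marker means the walk drifted onto another chain. */
	return NULLS_VALUE(pos) == slot ? SCAN_ABSENT : SCAN_RESTART;
}
```

A caller would loop while scan_chain() returns SCAN_RESTART, which
corresponds to the 'goto begin' in the lookup algorithm.
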
2) Insert function
------------------

/*
 * Please note that new inserts are done at the head of list,
 * not in the middle or end.
 */
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
 * changes to obj->key must be visible before refcnt one
 */
smp_wmb();
atomic_set(&obj->refcnt, 1);
/*
 * insert obj in RCU way (readers might be traversing chain)
 */
hlist_nulls_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
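
Inserting only at the head matters for the 'nulls' scheme: the marker
written when the table is initialized stays at the tail of each chain,
so a reader that reaches the end can always tell which slot it ended
in. The single-threaded sketch below models this with hypothetical
names (struct head, slot_marker(), table_init(), add_head()); it leaves
out locking, memory barriers and RCU publication.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
	struct node *next;	/* real pointer, or odd-valued nulls marker */
	int key;
};

struct head {
	struct node *first;
};

/* Hypothetical helper: encode a slot number as an odd pseudo-pointer. */
static struct node *slot_marker(size_t slot)
{
	return (struct node *)(((uintptr_t)slot << 1) | 1);
}

/* Every empty chain starts out pointing at its own slot's marker. */
static void table_init(struct head *table, size_t nslots)
{
	for (size_t i = 0; i < nslots; i++)
		table[i].first = slot_marker(i);
}

/* Head insertion keeps the slot's marker at the tail of the chain. */
static void add_head(struct head *h, struct node *n, int key)
{
	assert(((uintptr_t)n & 1) == 0);	/* real nodes have even addresses */
	n->key = key;
	n->next = h->first;
	h->first = n;
}
```
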

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.