About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Documentation / futex-requeue-pi.txt




Custom Search

Based on kernel version 4.13.3. Page generated on 2017-09-23 13:55 EST.

1	================
2	Futex Requeue PI
3	================
4	
5	Requeueing of tasks from a non-PI futex to a PI futex requires
6	special handling in order to ensure the underlying rt_mutex is never
7	left without an owner if it has waiters; doing so would break the PI
8	boosting logic [see rt-mutex-desgin.txt] For the purposes of
9	brevity, this action will be referred to as "requeue_pi" throughout
10	this document.  Priority inheritance is abbreviated throughout as
11	"PI".
12	
13	Motivation
14	----------
15	
16	Without requeue_pi, the glibc implementation of
17	pthread_cond_broadcast() must resort to waking all the tasks waiting
18	on a pthread_condvar and letting them try to sort out which task
19	gets to run first in classic thundering-herd formation.  An ideal
20	implementation would wake the highest-priority waiter, and leave the
21	rest to the natural wakeup inherent in unlocking the mutex
22	associated with the condvar.
23	
24	Consider the simplified glibc calls::
25	
26		/* caller must lock mutex */
27		pthread_cond_wait(cond, mutex)
28		{
29			lock(cond->__data.__lock);
30			unlock(mutex);
31			do {
32			unlock(cond->__data.__lock);
33			futex_wait(cond->__data.__futex);
34			lock(cond->__data.__lock);
35			} while(...)
36			unlock(cond->__data.__lock);
37			lock(mutex);
38		}
39	
40		pthread_cond_broadcast(cond)
41		{
42			lock(cond->__data.__lock);
43			unlock(cond->__data.__lock);
44			futex_requeue(cond->data.__futex, cond->mutex);
45		}
46	
47	Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
48	has waiters. Note that pthread_cond_wait() attempts to lock the
49	mutex only after it has returned to user space.  This will leave the
50	underlying rt_mutex with waiters, and no owner, breaking the
51	previously mentioned PI-boosting algorithms.
52	
53	In order to support PI-aware pthread_condvar's, the kernel needs to
54	be able to requeue tasks to PI futexes.  This support implies that
55	upon a successful futex_wait system call, the caller would return to
56	user space already holding the PI futex.  The glibc implementation
57	would be modified as follows::
58	
59	
60		/* caller must lock mutex */
61		pthread_cond_wait_pi(cond, mutex)
62		{
63			lock(cond->__data.__lock);
64			unlock(mutex);
65			do {
66			unlock(cond->__data.__lock);
67			futex_wait_requeue_pi(cond->__data.__futex);
68			lock(cond->__data.__lock);
69			} while(...)
70			unlock(cond->__data.__lock);
71			/* the kernel acquired the mutex for us */
72		}
73	
74		pthread_cond_broadcast_pi(cond)
75		{
76			lock(cond->__data.__lock);
77			unlock(cond->__data.__lock);
78			futex_requeue_pi(cond->data.__futex, cond->mutex);
79		}
80	
81	The actual glibc implementation will likely test for PI and make the
82	necessary changes inside the existing calls rather than creating new
83	calls for the PI cases.  Similar changes are needed for
84	pthread_cond_timedwait() and pthread_cond_signal().
85	
86	Implementation
87	--------------
88	
89	In order to ensure the rt_mutex has an owner if it has waiters, it
90	is necessary for both the requeue code, as well as the waiting code,
91	to be able to acquire the rt_mutex before returning to user space.
92	The requeue code cannot simply wake the waiter and leave it to
93	acquire the rt_mutex as it would open a race window between the
94	requeue call returning to user space and the waiter waking and
95	starting to run.  This is especially true in the uncontended case.
96	
97	The solution involves two new rt_mutex helper routines,
98	rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
99	allow the requeue code to acquire an uncontended rt_mutex on behalf
100	of the waiter and to enqueue the waiter on a contended rt_mutex.
101	Two new system calls provide the kernel<->user interface to
102	requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
103	
104	FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
105	and pthread_cond_timedwait()) to block on the initial futex and wait
106	to be requeued to a PI-aware futex.  The implementation is the
107	result of a high-speed collision between futex_wait() and
108	futex_lock_pi(), with some extra logic to check for the additional
109	wake-up scenarios.
110	
111	FUTEX_CMP_REQUEUE_PI is called by the waker
112	(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
113	possibly wake the waiting tasks. Internally, this system call is
114	still handled by futex_requeue (by passing requeue_pi=1).  Before
115	requeueing, futex_requeue() attempts to acquire the requeue target
116	PI futex on behalf of the top waiter.  If it can, this waiter is
117	woken.  futex_requeue() then proceeds to requeue the remaining
118	nr_wake+nr_requeue tasks to the PI futex, calling
119	rt_mutex_start_proxy_lock() prior to each requeue to prepare the
120	task as a waiter on the underlying rt_mutex.  It is possible that
121	the lock can be acquired at this stage as well, if so, the next
122	waiter is woken to finish the acquisition of the lock.
123	
124	FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
125	their sum is all that really matters.  futex_requeue() will wake or
126	requeue up to nr_wake + nr_requeue tasks.  It will wake only as many
127	tasks as it can acquire the lock for, which in the majority of cases
128	should be 0 as good programming practice dictates that the caller of
129	either pthread_cond_broadcast() or pthread_cond_signal() acquire the
130	mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that
131	nr_wake=1.  nr_requeue should be INT_MAX for broadcast and 0 for
132	signal.
Hide Line Numbers
About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.