About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog

Based on kernel version 4.10.8. Page generated on 2017-04-01 14:44 EST.

unshare system call:
--------------------
This document describes the new system call, unshare. The document
provides an overview of the feature, why it is needed, how it can
be used, its interface specification, design, implementation and
how it can be tested.

Change Log:
-----------
version 0.1  Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006

Contents:
---------
	1) Overview
	2) Benefits
	3) Cost
	4) Requirements
	5) Functional Specification
	6) High Level Design
	7) Low Level Design
	8) Test Specification
	9) Future Work

1) Overview
-----------
Most legacy operating system kernels support an abstraction of threads
as multiple execution contexts within a process. These kernels provide
special resources and mechanisms to maintain these "threads". The Linux
kernel, in a clever and simple manner, does not make a distinction
between processes and "threads". The kernel allows processes to share
resources, and thus they can achieve legacy "threads" behavior without
requiring additional data structures and mechanisms in the kernel. The
power of implementing threads in this manner comes not only from
its simplicity but also from allowing application programmers to work
outside the confinement of the all-or-nothing resource sharing of legacy
threads. On Linux, at the time of thread creation using the clone system
call, applications can selectively choose which resources to share
between threads.

The unshare system call adds a primitive to the Linux thread model that
allows threads to selectively 'unshare' any resources that were being
shared at the time of their creation. unshare was conceptualized by
Al Viro in August 2000, on the Linux-Kernel mailing list, as part
of the discussion on POSIX threads on Linux. unshare augments the
usefulness of Linux threads for applications that would like to control
shared resources without creating a new process. unshare is a natural
addition to the set of available primitives on Linux that implement
the concept of process/thread as a virtual machine.

2) Benefits
-----------
unshare would be useful to large application frameworks such as PAM
where creating a new process to control sharing/unsharing of process
resources is not possible. Since namespaces are shared by default
when creating a new process using fork or clone, unshare can benefit
even non-threaded applications if they need to disassociate
from the default shared namespace. The following are two use cases
where unshare can be used.

2.1 Per-security context namespaces
-----------------------------------
unshare can be used to implement polyinstantiated directories using
the kernel's per-process namespace mechanism. Polyinstantiated directories,
such as per-user and/or per-security context instances of /tmp, /var/tmp or
per-security context instances of a user's home directory, isolate user
processes when working with these directories. Using unshare, a PAM
module can easily set up a private namespace for a user at login.
Polyinstantiated directories are required for Common Criteria certification
with the Labeled Security Protection Profile. However, with the availability
of the shared-subtree feature in the Linux kernel, even regular Linux systems
can benefit from setting up private namespaces at login and
polyinstantiating /tmp, /var/tmp and other directories deemed
appropriate by system administrators.

2.2 unsharing of virtual memory and/or open files
-------------------------------------------------
Consider a client/server application where the server is processing
client requests by creating processes that share resources such as
virtual memory and open files. Without unshare, the server has to
decide what needs to be shared at the time of creating the process
that services the request. unshare gives the server the ability to
disassociate parts of the context while servicing the
request. For large and complex middleware application frameworks, this
ability to unshare after the process has been created can be very
useful.

3) Cost
-------
In order to avoid duplicating code, and to handle the fact that unshare
works on an active task (as opposed to clone/fork working on a newly
allocated inactive task), unshare required minor reorganizational
changes to the copy_* functions used by the clone/fork system calls.
There is a cost associated with altering existing, well tested and
stable code to implement a new feature that may not get exercised
extensively in the beginning. However, with proper design and code
review of the changes, and the creation of an unshare test for the
Linux Test Project (LTP), the benefits of this new feature can exceed
its cost.

4) Requirements
---------------
unshare reverses sharing that was done using the clone(2) system call,
so unshare should have an interface similar to that of clone(2). That is,
since the flags in clone(int flags, void *stack) specify what should
be shared, similar flags in unshare(int flags) should specify
what should be unshared. Unfortunately, this may appear to invert
the meaning of the flags from the way they are used in clone(2).
However, there was no easy solution that was less confusing and that
allowed incremental context unsharing in the future without an ABI change.

The unshare interface should accommodate the possible future addition of
new context flags without requiring a rebuild of old applications.
If and when new context flags are added, the unshare design should allow
incremental unsharing of those resources on an as-needed basis.

5) Functional Specification
---------------------------
NAME
	unshare - disassociate parts of the process execution context

SYNOPSIS
	#include <sched.h>

	int unshare(int flags);

DESCRIPTION
	unshare allows a process to disassociate parts of its execution
	context that are currently being shared with other processes. Parts
	of the execution context, such as the namespace, are shared by default
	when a new process is created using fork(2), while other parts,
	such as the virtual memory, open file descriptors, etc., may be
	shared by explicitly requesting it when creating a process
	using clone(2).

	The main use of unshare is to allow a process to control its
	shared execution context without creating a new process.

	The flags argument specifies one, or the bitwise OR of several, of
	the following constants.

	CLONE_FS
		If CLONE_FS is set, file system information of the caller
		is disassociated from the shared file system information.

	CLONE_FILES
		If CLONE_FILES is set, the file descriptor table of the
		caller is disassociated from the shared file descriptor
		table.

	CLONE_NEWNS
		If CLONE_NEWNS is set, the namespace of the caller is
		disassociated from the shared namespace.

	CLONE_VM
		If CLONE_VM is set, the virtual memory of the caller is
		disassociated from the shared virtual memory.

RETURN VALUE
	On success, zero is returned. On failure, -1 is returned and errno is
	set to indicate the error.

ERRORS
	EPERM	CLONE_NEWNS was specified by a non-root process (process
		without CAP_SYS_ADMIN).

	ENOMEM	Cannot allocate sufficient memory to copy parts of the caller's
		context that need to be unshared.

	EINVAL	Invalid flag was specified as an argument.

CONFORMING TO
	The unshare() call is Linux-specific and should not be used
	in programs intended to be portable.

SEE ALSO
	clone(2), fork(2)

6) High Level Design
--------------------
Depending on the flags argument, the unshare system call allocates
appropriate process context structures, populates them with values from
the current shared versions, associates the newly duplicated structures
with the current task structure and releases the corresponding shared
versions. The helper functions of clone (copy_*) could not be used
directly by unshare for the following two reasons.
  1) clone operates on a newly allocated, not-yet-active task
     structure, whereas unshare operates on the current active
     task. Therefore unshare has to take the appropriate task_lock()
     before associating newly duplicated context structures.
  2) unshare has to allocate and duplicate all context structures
     that are being unshared before associating them with the
     current task and releasing the older shared structures. Failure
     to do so would create race conditions and/or oopses when trying
     to back out due to an error. Consider the case of unsharing
     both virtual memory and namespace. After successfully unsharing
     vm, if the system call encounters an error while allocating a
     new namespace structure, the error return code would have to
     reverse the unsharing of vm. As part of the reversal, the
     system call would have to go back to the older, shared vm
     structure, which may not exist anymore.

Therefore, the code from the copy_* functions that allocated and duplicated
the current context structure was moved into new dup_* functions. Now,
copy_* functions call dup_* functions to allocate and duplicate
appropriate context structures and then associate them with the
task structure that is being constructed. The unshare system call, on
the other hand, performs the following:
  1) Check flags to force missing, but implied, flags.
  2) For each context structure, call the corresponding unshare
     helper function to allocate and duplicate a new context
     structure, if the appropriate bit is set in the flags argument.
  3) If there is no error in allocation and duplication and there
     are new context structures, then lock the current task structure,
     associate the new context structures with the current task structure,
     and release the lock on the current task structure.
  4) Appropriately release the older, shared context structures.

7) Low Level Design
-------------------
The implementation of unshare can be grouped into the following four
items:
  a) Reorganization of existing copy_* functions
  b) unshare system call service function
  c) unshare helper functions for each different process context
  d) Registration of the system call number for different architectures

  7.1) Reorganization of copy_* functions
       Each copy function, such as copy_mm, copy_namespace, copy_files,
       etc., had roughly two components. The first component allocated
       and duplicated the appropriate structure, and the second component
       linked it to the task structure passed in as an argument to the copy
       function. The first component was split into its own function.
       These dup_* functions allocated and duplicated the appropriate
       context structure. The reorganized copy_* functions invoked
       their corresponding dup_* functions and then linked the newly
       duplicated structures to the task structure with which the
       copy function was called.

  7.2) unshare system call service function
       * Check flags
         Force implied flags. If CLONE_THREAD is set, force CLONE_VM.
         If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is
         set and signals are also being shared, force CLONE_THREAD. If
         CLONE_NEWNS is set, force CLONE_FS.
       * For each context flag, invoke the corresponding unshare_*
         helper routine with the flags passed into the system call and a
         reference to the pointer that will point to the new unshared
         structure.
       * If any new structures are created by the unshare_* helper
         functions, take the task_lock() on the current task,
         modify the appropriate context pointers, and release the
         task lock.
       * For all newly unshared structures, release the corresponding
         older, shared structures.

  7.3) unshare_* helper functions
       For the unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND,
       and CLONE_THREAD, return -EINVAL since they are not implemented yet.
       For the others, check the flag value to see if unsharing is
       required for that structure. If it is, invoke the corresponding
       dup_* function to allocate and duplicate the structure and return
       a pointer to it.

  7.4) Appropriately modify architecture-specific code to register the
       new system call.

8) Test Specification
---------------------
The test for unshare should test the following:
  1) Valid flags: Test to check that clone flags for signals and
     signal handlers, for which unsharing is not implemented
     yet, return -EINVAL.
  2) Missing/implied flags: Test to make sure that unsharing the
     namespace without specifying unsharing of the filesystem
     information correctly unshares both the namespace and the
     filesystem information.
  3) For each of the four supported kinds of unsharing (namespace,
     filesystem, files and vm), verify that the system call correctly
     unshares the appropriate structure. Verify that unsharing
     them individually as well as in combination with each
     other works as expected.
  4) Concurrent execution: Use shared memory segments and a futex on
     an address in the shm segment to synchronize the execution of
     about 10 threads. Have a couple of threads execute execve,
     a couple call _exit and the rest call unshare with different
     combinations of flags. Verify that unsharing is performed as
     expected and that there are no oopses or hangs.

9) Future Work
--------------
The current implementation of unshare does not allow unsharing of
signals and signal handlers. Signals are complex to begin with, and
unsharing the signals and/or signal handlers of a currently running
process is even more complex. If in the future there is a specific
need to allow unsharing of signals and/or signal handlers, it can
be incrementally added to unshare without affecting legacy
applications using unshare.

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.