Linux Kernel Documentation :: credentials.txt

Based on kernel version 2.6.39.1. Page generated on 2011-06-03 13:45 EST.
1				     ====================
2				     CREDENTIALS IN LINUX
3				     ====================
4	
5	By: David Howells <dhowells@redhat.com>
6	
7	Contents:
8	
9	 (*) Overview.
10	
11	 (*) Types of credentials.
12	
13	 (*) File markings.
14	
15	 (*) Task credentials.
16	
17	     - Immutable credentials.
18	     - Accessing task credentials.
19	     - Accessing another task's credentials.
20	     - Altering credentials.
21	     - Managing credentials.
22	
23	 (*) Open file credentials.
24	
25	 (*) Overriding the VFS's use of credentials.
26	
27	
28	========
29	OVERVIEW
30	========
31	
32	There are several parts to the security check performed by Linux when one
33	object acts upon another:
34	
35	 (1) Objects.
36	
37	     Objects are things in the system that may be acted upon directly by
38	     userspace programs.  Linux has a variety of actionable objects, including:
39	
40		- Tasks
41		- Files/inodes
42		- Sockets
43		- Message queues
44		- Shared memory segments
45		- Semaphores
46		- Keys
47	
48	     As a part of the description of all these objects there is a set of
49	     credentials.  What's in the set depends on the type of object.
50	
51	 (2) Object ownership.
52	
53	     Amongst the credentials of most objects, there will be a subset that
54	     indicates the ownership of that object.  This is used for resource
55	     accounting and limitation (disk quotas and task rlimits for example).
56	
57	     In a standard UNIX filesystem, for instance, this will be defined by the
58	     UID marked on the inode.
59	
60	 (3) The objective context.
61	
62	     Also amongst the credentials of those objects, there will be a subset that
63	     indicates the 'objective context' of that object.  This may or may not be
64	     the same set as in (2) - in standard UNIX files, for instance, this is
65	     defined by the UID and the GID marked on the inode.
66	
67	     The objective context is used as part of the security calculation that is
68	     carried out when an object is acted upon.
69	
70	 (4) Subjects.
71	
72	     A subject is an object that is acting upon another object.
73	
74	     Most of the objects in the system are inactive: they don't act on other
75	     objects within the system.  Processes/tasks are the obvious exception:
76	     they do stuff; they access and manipulate things.
77	
78	     Objects other than tasks may under some circumstances also be subjects.
79	     For instance an open file may send SIGIO to a task using the UID and EUID
80	     given to it by a task that called fcntl(F_SETOWN) upon it.  In this case,
81	     the file struct will have a subjective context too.
82	
83	 (5) The subjective context.
84	
85	     A subject has an additional interpretation of its credentials.  A subset
86	     of its credentials forms the 'subjective context'.  The subjective context
87	     is used as part of the security calculation that is carried out when a
88	     subject acts.
89	
90	     A Linux task, for example, has the FSUID, FSGID and the supplementary
91	     group list for when it is acting upon a file - which are quite separate
92	     from the real UID and GID that normally form the objective context of the
93	     task.
94	
95	 (6) Actions.
96	
97	     Linux has a number of actions available that a subject may perform upon an
98	     object.  The set of actions available depends on the nature of the subject
99	     and the object.
100	
101	     Actions include reading, writing, creating and deleting files; forking or
102	     signalling and tracing tasks.
103	
104	 (7) Rules, access control lists and security calculations.
105	
106	     When a subject acts upon an object, a security calculation is made.  This
107	     involves taking the subjective context, the objective context and the
108	     action, and searching one or more sets of rules to see whether the subject
109	     is granted or denied permission to act in the desired manner on the
110	     object, given those contexts.
111	
112	     There are two main sources of rules:
113	
114	     (a) Discretionary access control (DAC):
115	
116		 Sometimes the object will include sets of rules as part of its
117		 description.  This is an 'Access Control List' or 'ACL'.  A Linux
118		 file may supply more than one ACL.
119	
120		 A traditional UNIX file, for example, includes a permissions mask that
121		 is an abbreviated ACL with three fixed classes of subject ('user',
122		 'group' and 'other'), each of which may be granted certain privileges
123		 ('read', 'write' and 'execute' - whatever those map to for the object
124		 in question).  UNIX file permissions do not allow the arbitrary
125		 specification of subjects, however, and so are of limited use.
126	
127		 A Linux file might also sport a POSIX ACL.  This is a list of rules
128		 that grants various permissions to arbitrary subjects.
129	
130	     (b) Mandatory access control (MAC):
131	
132		 The system as a whole may have one or more sets of rules that get
133		 applied to all subjects and objects, regardless of their source.
134		 SELinux and Smack are examples of this.
135	
136		 In the case of SELinux and Smack, each object is given a label as part
137		 of its credentials.  When an action is requested, they take the
138		 subject label, the object label and the action and look for a rule
139		 that says that this action is either granted or denied.
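
As a rough illustration of the DAC case, the traditional permissions-mask
calculation can be modelled in userspace C as below.  This is a minimal sketch
and not the kernel's implementation: the types subject_ctx and object_ctx, the
WANT_* constants and dac_permits() are hypothetical names invented for this
example, and the sketch deliberately ignores capabilities, POSIX ACLs and any
LSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of a task's subjective context for file access. */
struct subject_ctx {
	uid_t fsuid;		/* UID used for file access */
	gid_t fsgid;		/* GID used for file access */
	const gid_t *groups;	/* supplementary group list */
	size_t ngroups;
};

/* Hypothetical model of a file's objective context. */
struct object_ctx {
	uid_t uid;		/* owner marked on the inode */
	gid_t gid;		/* group marked on the inode */
	unsigned int mode;	/* rwxrwxrwx permission bits */
};

/* The action, expressed as the permission bits it needs. */
enum { WANT_EXEC = 1, WANT_WRITE = 2, WANT_READ = 4 };

static bool subject_in_group(const struct subject_ctx *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/*
 * Select exactly one class ('user', 'group' or 'other') for the subject,
 * then check that every requested bit is granted to that class.
 */
bool dac_permits(const struct subject_ctx *s, const struct object_ctx *o,
		 unsigned int want)
{
	unsigned int granted;

	assert(s != NULL && o != NULL);
	if (s->fsuid == o->uid)
		granted = (o->mode >> 6) & 7;
	else if (subject_in_group(s, o->gid))
		granted = (o->mode >> 3) & 7;
	else
		granted = o->mode & 7;
	return (granted & want) == want;
}
```

Note that the class is chosen first and only that class's bits are consulted:
an owner who has been denied a permission does not fall through to the 'group'
or 'other' bits.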
140	
141	
142	====================
143	TYPES OF CREDENTIALS
144	====================
145	
146	The Linux kernel supports the following types of credentials:
147	
148	 (1) Traditional UNIX credentials.
149	
150		Real User ID
151		Real Group ID
152	
153	     The UID and GID are carried by most, if not all, Linux objects, even if in
154	     some cases they have to be invented (FAT or CIFS files for example, which are
155	     derived from Windows).  These (mostly) define the objective context of
156	     that object, with tasks being slightly different in some cases.
157	
158		Effective, Saved and FS User ID
159		Effective, Saved and FS Group ID
160		Supplementary groups
161	
162	     These are additional credentials used by tasks only.  Usually, an
163	     EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
164	     will be used as the objective.  For tasks, it should be noted that this is
165	     not always true.
166	
167	 (2) Capabilities.
168	
169		Set of permitted capabilities
170		Set of inheritable capabilities
171		Set of effective capabilities
172		Capability bounding set
173	
174	     These are only carried by tasks.  They indicate superior capabilities
175	     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
176	     These are manipulated implicitly by changes to the traditional UNIX
177	     credentials, but can also be manipulated directly by the capset() system
178	     call.
179	
180	     The permitted capabilities are those caps that the process might grant
181	     itself to its effective or permitted sets through capset().  The
182	     inheritable set might also be so constrained.
183	
184	     The effective capabilities are the ones that a task is actually allowed to
185	     make use of itself.
186	
187	     The inheritable capabilities are the ones that may get passed across
188	     execve().
189	
190	     The bounding set limits the capabilities that may be inherited across
191	     execve(), especially when a binary is executed that will execute as UID 0.
192	
193	 (3) Secure management flags (securebits).
194	
195	     These are only carried by tasks.  These govern the way the above
196	     credentials are manipulated and inherited over certain operations such as
197	     execve().  They aren't used directly as objective or subjective
198	     credentials.
199	
200	 (4) Keys and keyrings.
201	
202	     These are only carried by tasks.  They carry and cache security tokens
203	     that don't fit into the other standard UNIX credentials.  They are for
204	     making such things as network filesystem keys available to the file
205	     accesses performed by processes, without the necessity of ordinary
206	     programs having to know about security details involved.
207	
208	     Keyrings are a special type of key.  They carry sets of other keys and can
209	     be searched for the desired key.  Each process may subscribe to a number
210	     of keyrings:
211	
212		Per-thread keyring
213		Per-process keyring
214		Per-session keyring
215	
216	     When a process accesses a key, if not already present, it will normally be
217	     cached on one of these keyrings for future accesses to find.
218	
219	     For more information on using keys, see Documentation/keys.txt.
220	
221	 (5) LSM
222	
223	     The Linux Security Module allows extra controls to be placed over the
224	     operations that a task may do.  Currently Linux supports two main
225	     alternate LSM options: SELinux and Smack.
226	
227	     Both work by labelling the objects in a system and then applying sets of
228	     rules (policies) that say what operations a task with one label may do to
229	     an object with another label.
230	
231	 (6) AF_KEY
232	
233	     This is a socket-based approach to credential management for networking
234	     stacks [RFC 2367].  It isn't discussed by this document as it doesn't
235	     interact directly with task and file credentials; rather it keeps system
236	     level credentials.
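
The traditional UNIX credentials in (1) can be observed from userspace.  The
following is a small sketch using only standard POSIX calls; the struct
task_ids type and read_task_ids() are hypothetical names made up for the
example.  The saved IDs are not read here, as that would need the non-POSIX
getresuid() and getresgid() calls.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the IDs of the calling process. */
struct task_ids {
	uid_t ruid, euid;	/* real and effective user ID */
	gid_t rgid, egid;	/* real and effective group ID */
	int ngroups;		/* number of supplementary groups */
};

/* Fill in *ids for the calling process; returns 0, or -1 on error. */
int read_task_ids(struct task_ids *ids)
{
	assert(ids != NULL);

	ids->ruid = getuid();
	ids->euid = geteuid();
	ids->rgid = getgid();
	ids->egid = getegid();

	/* A size of 0 asks getgroups() for the count only. */
	ids->ngroups = getgroups(0, NULL);
	return ids->ngroups < 0 ? -1 : 0;
}
```

For an ordinary process the real and effective IDs are equal; they differ
typically after executing a SUID or SGID binary.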
237	
238	
239	When a file is opened, part of the opening task's subjective context is
240	recorded in the file struct created.  This allows operations using that file
241	struct to use those credentials instead of the subjective context of the task
242	that issued the operation.  An example of this would be a file opened on a
243	network filesystem where the credentials of the opened file should be presented
244	to the server, regardless of who is actually doing a read or a write upon it.
245	
246	
247	=============
248	FILE MARKINGS
249	=============
250	
251	Files on disk or obtained over the network may have annotations that form the
252	objective security context of that file.  Depending on the type of filesystem,
253	this may include one or more of the following:
254	
255	 (*) UNIX UID, GID, mode;
256	
257	 (*) Windows user ID;
258	
259	 (*) Access control list;
260	
261	 (*) LSM security label;
262	
263	 (*) UNIX exec privilege escalation bits (SUID/SGID);
264	
265	 (*) File capabilities exec privilege escalation bits.
266	
267	These are compared to the task's subjective security context, and certain
268	operations allowed or disallowed as a result.  In the case of execve(), the
269	privilege escalation bits come into play, and may allow the resulting process
270	extra privileges, based on the annotations on the executable file.
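
Some of these markings are visible to userspace through stat().  The sketch
below reads the UID, GID, permission bits and the SUID/SGID bits of a file.
The struct file_markings type and read_file_markings() are hypothetical names
for this example; ACLs, LSM labels and file capabilities are not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical summary of the UNIX markings on a file. */
struct file_markings {
	uid_t uid;	/* owning user */
	gid_t gid;	/* owning group */
	mode_t perms;	/* rwx bits for user, group and other */
	bool suid;	/* set-user-ID on exec */
	bool sgid;	/* set-group-ID on exec */
};

/* Read the markings of 'path'; returns 0, or -1 if stat() fails. */
int read_file_markings(const char *path, struct file_markings *m)
{
	struct stat st;

	assert(path != NULL && m != NULL);
	if (stat(path, &st) < 0)
		return -1;

	m->uid = st.st_uid;
	m->gid = st.st_gid;
	m->perms = st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
	m->suid = (st.st_mode & S_ISUID) != 0;
	m->sgid = (st.st_mode & S_ISGID) != 0;
	return 0;
}
```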
271	
272	
273	================
274	TASK CREDENTIALS
275	================
276	
277	In Linux, all of a task's credentials are held in (uid, gid) or through
278	(groups, keys, LSM security) a refcounted structure of type 'struct cred'.
279	Each task points to its credentials by a pointer called 'cred' in its
280	task_struct.
281	
282	Once a set of credentials has been prepared and committed, it may not be
283	changed, barring the following exceptions:
284	
285	 (1) its reference count may be changed;
286	
287	 (2) the reference count on the group_info struct it points to may be changed;
288	
289	 (3) the reference count on the security data it points to may be changed;
290	
291	 (4) the reference count on any keyrings it points to may be changed;
292	
293	 (5) any keyrings it points to may be revoked, expired or have their security
294	     attributes changed; and
295	
296	 (6) the contents of any keyrings to which it points may be changed (the whole
297	     point of keyrings being a shared set of credentials, modifiable by anyone
298	     with appropriate access).
299	
300	To alter anything in the cred struct, the copy-and-replace principle must be
301	adhered to.  First take a copy, then alter the copy and then use RCU to change
302	the task pointer to make it point to the new copy.  There are wrappers to aid
303	with this (see below).
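
The copy-and-replace principle can be modelled in plain userspace C as
follows.  This is only a single-threaded sketch under stated assumptions: the
model_* names are hypothetical, the reference count is a plain int rather than
an atomic, and a simple pointer assignment stands in for the RCU publication
step, so none of the kernel's concurrency guarantees are reproduced.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cred. */
struct model_cred {
	int usage;			/* reference count */
	unsigned int uid, gid, suid;
};

/* Hypothetical stand-in for task_struct: a const pointer to the creds. */
struct model_task {
	const struct model_cred *cred;
};

/* Drop one reference and free the set when the last one goes. */
static void model_put_cred(const struct model_cred *c)
{
	struct model_cred *m = (struct model_cred *)c;

	assert(m->usage > 0);
	if (--m->usage == 0)
		free(m);
}

/* Step 1: take a private, mutable copy of the task's credentials. */
struct model_cred *model_prepare_creds(const struct model_task *t)
{
	struct model_cred *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	*copy = *t->cred;
	copy->usage = 1;	/* the caller holds the only reference */
	return copy;
}

/*
 * Step 3: publish the altered copy and drop the task's reference to the
 * old set.  The caller's reference to 'copy' is handed over to the task.
 */
void model_commit_creds(struct model_task *t, struct model_cred *copy)
{
	const struct model_cred *old = t->cred;

	t->cred = copy;		/* the kernel publishes this pointer via RCU */
	model_put_cred(old);
}

/* Error path: discard a copy that was never committed. */
void model_abort_creds(struct model_cred *copy)
{
	model_put_cred(copy);
}
```

Step 2, altering the copy, happens in the caller between prepare and commit.
The old set is never written to, so anything still holding a reference to it
continues to see a consistent snapshot.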
304	
305	A task may only alter its _own_ credentials; it is no longer permitted for a
306	task to alter another's credentials.  This means the capset() system call is no
307	longer permitted to take any PID other than the one of the current process.
308	Also keyctl_instantiate() and keyctl_negate() functions no longer permit
309	attachment to process-specific keyrings in the requesting process as the
310	instantiating process may need to create them.
311	
312	
313	IMMUTABLE CREDENTIALS
314	---------------------
315	
316	Once a set of credentials has been made public (by calling commit_creds() for
317	example), it must be considered immutable, barring two exceptions:
318	
319	 (1) The reference count may be altered.
320	
321	 (2) Whilst the keyring subscriptions of a set of credentials may not be
322	     changed, the keyrings subscribed to may have their contents altered.
323	
324	To catch accidental credential alteration at compile time, struct task_struct
325	has _const_ pointers to its credential sets, as does struct file.  Furthermore,
326	certain functions such as get_cred() and put_cred() operate on const pointers,
327	so callers need no casts; internally these functions temporarily drop the
328	const qualification so that they can alter the reference count.
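
The const handling can be illustrated with a small userspace model.  This is
a sketch only: model_cred, model_get_cred() and model_put_cred() are
hypothetical names, and a plain int stands in for the kernel's atomic
reference count.

```c
#include <assert.h>

/* Hypothetical stand-in for struct cred. */
struct model_cred {
	int usage;		/* reference count */
	unsigned int uid;
};

/*
 * Take a reference through a const pointer.  The cast is confined to this
 * helper so that callers never need one.  This is only valid because the
 * object itself was not defined const; only the pointers to it are const.
 */
const struct model_cred *model_get_cred(const struct model_cred *cred)
{
	struct model_cred *nonconst = (struct model_cred *)cred;

	assert(nonconst->usage > 0);
	nonconst->usage++;
	return cred;
}

/* Drop a reference; returns the number of references remaining. */
int model_put_cred(const struct model_cred *cred)
{
	struct model_cred *nonconst = (struct model_cred *)cred;

	assert(nonconst->usage > 0);
	return --nonconst->usage;
}
```

With this arrangement an accidental write such as 'cred->uid = 0' through the
const pointer is rejected by the compiler, whilst reference counting still
works.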
329	
330	
331	ACCESSING TASK CREDENTIALS
332	--------------------------
333	
334	A task being able to alter only its own credentials permits the current process
335	to read or replace its own credentials without the need for any form of locking
336	- which simplifies things greatly.  It can just call:
337	
338		const struct cred *current_cred()
339	
340	to get a pointer to its credentials structure, and it doesn't have to release
341	it afterwards.
342	
343	There are convenience wrappers for retrieving specific aspects of a task's
344	credentials (the value is simply returned in each case):
345	
346		uid_t current_uid(void)		Current's real UID
347		gid_t current_gid(void)		Current's real GID
348		uid_t current_euid(void)	Current's effective UID
349		gid_t current_egid(void)	Current's effective GID
350		uid_t current_fsuid(void)	Current's file access UID
351		gid_t current_fsgid(void)	Current's file access GID
352		kernel_cap_t current_cap(void)	Current's effective capabilities
353		void *current_security(void)	Current's LSM security pointer
354		struct user_struct *current_user(void)  Current's user account
355	
356	There are also convenience wrappers for retrieving specific associated pairs of
357	a task's credentials:
358	
359		void current_uid_gid(uid_t *, gid_t *);
360		void current_euid_egid(uid_t *, gid_t *);
361		void current_fsuid_fsgid(uid_t *, gid_t *);
362	
363	which return these pairs of values through their arguments after retrieving
364	them from the current task's credentials.
365	
366	
367	In addition, there is a function for obtaining a reference on the current
368	process's current set of credentials:
369	
370		const struct cred *get_current_cred(void);
371	
372	and functions for getting references to one of the credentials that don't
373	actually live in struct cred:
374	
375		struct user_struct *get_current_user(void);
376		struct group_info *get_current_groups(void);
377	
378	which get references to the current process's user accounting structure and
379	supplementary groups list respectively.
380	
381	Once a reference has been obtained, it must be released with put_cred(),
382	free_uid() or put_group_info() as appropriate.
383	
384	
385	ACCESSING ANOTHER TASK'S CREDENTIALS
386	------------------------------------
387	
388	Whilst a task may access its own credentials without the need for locking, the
389	same is not true of a task wanting to access another task's credentials.  It
390	must use the RCU read lock and rcu_dereference().
391	
392	The rcu_dereference() is wrapped by:
393	
394		const struct cred *__task_cred(struct task_struct *task);
395	
396	This should be used inside the RCU read lock, as in the following example:
397	
398		void foo(struct task_struct *t, struct foo_data *f)
399		{
400			const struct cred *tcred;
401			...
402			rcu_read_lock();
403			tcred = __task_cred(t);
404			f->uid = tcred->uid;
405			f->gid = tcred->gid;
406			f->groups = get_group_info(tcred->groups);
407			rcu_read_unlock();
408			...
409		}
410	
411	Should it be necessary to hold another task's credentials for a long period of
412	time, and possibly to sleep whilst doing so, then the caller should get a
413	reference on them using:
414	
415		const struct cred *get_task_cred(struct task_struct *task);
416	
417	This does all the RCU magic inside of it.  The caller must call put_cred() on
418	the credentials so obtained when they're finished with.
419	
420	 [*] Note: The result of __task_cred() should not be passed directly to
421	     get_cred() as this may race with commit_creds().
422	
423	There are a couple of convenience functions to access bits of another task's
424	credentials, hiding the RCU magic from the caller:
425	
426		uid_t task_uid(task)		Task's real UID
427		uid_t task_euid(task)		Task's effective UID
428	
429	If the caller is holding the RCU read lock at the time anyway, then:
430	
431		__task_cred(task)->uid
432		__task_cred(task)->euid
433	
434	should be used instead.  Similarly, if multiple aspects of a task's credentials
435	need to be accessed, the RCU read lock should be taken, __task_cred() called, the
436	result stored in a temporary pointer and then the credential aspects read
437	from that before dropping the lock.  This prevents the potentially expensive
438	RCU magic from being invoked multiple times.
439	
440	Should some other single aspect of another task's credentials need to be
441	accessed, then this can be used:
442	
443		task_cred_xxx(task, member)
444	
445	where 'member' is a non-pointer member of the cred struct.  For instance:
446	
447		uid_t task_cred_xxx(task, suid);
448	
449	will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
450	magic.  This may not be used for pointer members as what they point to may
451	disappear the moment the RCU read lock is dropped.
452	
453	
454	ALTERING CREDENTIALS
455	--------------------
456	
457	As previously mentioned, a task may only alter its own credentials, and may not
458	alter those of another task.  This means that it doesn't need to use any
459	locking to alter its own credentials.
460	
461	To alter the current process's credentials, a function should first prepare a
462	new set of credentials by calling:
463	
464		struct cred *prepare_creds(void);
465	
466	This locks current->cred_replace_mutex and then allocates and constructs a
467	duplicate of the current process's credentials, returning with the mutex still
468	held if successful.  It returns NULL if not successful (out of memory).
469	
470	The mutex prevents ptrace() from altering the ptrace state of a process whilst
471	security checks on credentials construction and changing are taking place as
472	the ptrace state may alter the outcome, particularly in the case of execve().
473	
474	The new credentials set should be altered appropriately, and any security
475	checks and hooks done.  Both the current and the proposed sets of credentials
476	are available for this purpose as current_cred() will return the current set
477	still at this point.
478	
479	
480	When the credential set is ready, it should be committed to the current process
481	by calling:
482	
483		int commit_creds(struct cred *new);
484	
485	This will alter various aspects of the credentials and the process, giving the
486	LSM a chance to do likewise.  It then uses rcu_assign_pointer() to actually
487	commit the new credentials to current->cred, releases
488	current->cred_replace_mutex to allow ptrace() to take place, and notifies
489	the scheduler and others of the changes.
490	
491	This function is guaranteed to return 0, so that it can be tail-called at the
492	end of such functions as sys_setresuid().
493	
494	Note that this function consumes the caller's reference to the new credentials.
495	The caller should _not_ call put_cred() on the new credentials afterwards.
496	
497	Furthermore, once this function has been called on a new set of credentials,
498	those credentials may _not_ be changed further.
499	
500	
501	Should the security checks fail or some other error occur after prepare_creds()
502	has been called, then the following function should be invoked:
503	
504		void abort_creds(struct cred *new);
505	
506	This releases the lock on current->cred_replace_mutex that prepare_creds() got
507	and then releases the new credentials.
508	
509	
510	A typical credentials alteration function would look something like this:
511	
512		int alter_suid(uid_t suid)
513		{
514			struct cred *new;
515			int ret;
516	
517			new = prepare_creds();
518			if (!new)
519				return -ENOMEM;
520	
521			new->suid = suid;
522			ret = security_alter_suid(new);
523			if (ret < 0) {
524				abort_creds(new);
525				return ret;
526			}
527	
528			return commit_creds(new);
529		}
530	
531	
532	MANAGING CREDENTIALS
533	--------------------
534	
535	There are some functions to help manage credentials:
536	
537	 (*) void put_cred(const struct cred *cred);
538	
539	     This releases a reference to the given set of credentials.  If the
540	     reference count reaches zero, the credentials will be scheduled for
541	     destruction by the RCU system.
542	
543	 (*) const struct cred *get_cred(const struct cred *cred);
544	
545	     This gets a reference on a live set of credentials, returning a pointer to
546	     that set of credentials.
547	
548	 (*) struct cred *get_new_cred(struct cred *cred);
549	
550	     This gets a reference on a set of credentials that is under construction
551	     and is thus still mutable, returning a pointer to that set of credentials.
552	
553	
554	=====================
555	OPEN FILE CREDENTIALS
556	=====================
557	
558	When a new file is opened, a reference is obtained on the opening task's
559	credentials and this is attached to the file struct as 'f_cred' in place of
560	'f_uid' and 'f_gid'.  Code that used to access file->f_uid and file->f_gid
561	should now access file->f_cred->fsuid and file->f_cred->fsgid.
562	
563	It is safe to access f_cred without the use of RCU or locking because the
564	pointer will not change over the lifetime of the file struct, and nor will the
565	contents of the cred struct pointed to, barring the exceptions listed above
566	(see the Task Credentials section).
567	
568	
569	=======================================
570	OVERRIDING THE VFS'S USE OF CREDENTIALS
571	=======================================
572	
573	Under some circumstances it is desirable to override the credentials used by
574	the VFS, and that can be done by calling into functions such as vfs_mkdir() with a
575	different set of credentials.  This is done in the following places:
576	
577	 (*) sys_faccessat().
578	
579	 (*) do_coredump().
580	
581	 (*) nfs4recover.c.
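
The effect of the first of these can be seen from userspace: a plain access
check is made with the caller's real UID and GID rather than the IDs normally
used for file access.  The sketch below compares that with a check made using
the effective IDs, via the standard faccessat() call and its AT_EACCESS flag.
The struct access_view type and probe_read_access() are hypothetical names for
this example.

```c
#define _POSIX_C_SOURCE 200809L	/* for faccessat() and AT_EACCESS */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result: read permission as seen by each pair of IDs. */
struct access_view {
	bool real_ok;		/* checked with the real UID/GID */
	bool effective_ok;	/* checked with the effective UID/GID */
};

/* Probe whether 'path' is readable under each set of IDs. */
void probe_read_access(const char *path, struct access_view *v)
{
	assert(path != NULL && v != NULL);

	/* With no flags the check uses the real IDs, as access() does. */
	v->real_ok = faccessat(AT_FDCWD, path, R_OK, 0) == 0;

	/* AT_EACCESS asks for the effective IDs to be used instead. */
	v->effective_ok = faccessat(AT_FDCWD, path, R_OK, AT_EACCESS) == 0;
}
```

The two results can only differ when the real and effective IDs of the caller
differ, as in a SUID program.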
Information is copyright its respective author.
All material is available from the Linux Kernel Source distributed under a GPL License.