
Documentation / filesystems / mandatory-locking.txt





Based on kernel version 3.15.4. Page generated on 2014-07-07 09:03 EST.

	Mandatory File Locking For The Linux Operating System

		Andy Walker <andy@lysaker.kvaerner.no>

			   15 April 1996
		     (Updated September 2007)

0. Why you should avoid mandatory locking
-----------------------------------------

The Linux implementation is prey to a number of difficult-to-fix race
conditions which in practice make it undependable:

	- The write system call checks for a mandatory lock only once
	  at its start.  It is therefore possible for a lock request to
	  be granted after this check but before the data is modified.
	  A process may then see file data change even while a mandatory
	  lock is held.
	- Similarly, an exclusive lock may be granted on a file after
	  the kernel has decided to proceed with a read, but before the
	  read has actually completed, and the reading process may see
	  the file data in a state which should not have been visible
	  to it.
	- Comparable races make the claimed mutual exclusion between
	  locks and mmap() equally unreliable.

1. What is mandatory locking?
-----------------------------

Mandatory locking is kernel-enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine, which is a wrapper around fcntl()). It is
normally a process's responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.
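
The cooperative sequence just described (lock, update, unlock) might look
like the sketch below. locked_append() is a name invented for this example.
It takes an ordinary advisory fcntl() lock, so it only protects against
other processes that follow the same convention. F_SETLKW combines the
"check for locks" and "apply my own lock" steps by waiting until the lock
can be granted.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append `len` bytes to `path` while holding an advisory write lock on
 * the whole file. Returns 0 on success, -1 on error (errno is set).
 * Hypothetical helper, not part of any library. */
int locked_append(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
	if (fd < 0)
		return -1;

	struct flock fl;
	memset(&fl, 0, sizeof fl);
	fl.l_type = F_WRLCK;	/* exclusive ("write") lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "to end of file": the whole file */

	/* Wait until no other process holds a conflicting lock. */
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		close(fd);
		return -1;
	}

	ssize_t n = write(fd, buf, len);

	fl.l_type = F_UNLCK;	/* release the lock before closing */
	fcntl(fd, F_SETLK, &fl);
	close(fd);

	return (n == (ssize_t)len) ? 0 : -1;
}
```

A mail delivery agent written this way would call something like
locked_append(mailbox, message, message_len) once per message. A reader
that never takes a lock is not held back at all, which is exactly the
weakness mandatory locking was meant to address.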

In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" (or "shared") lock on, and block attempts both to read from and to
write to a file that a process holds a "write" (or "exclusive") lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition
to entire files, so the mandatory locking rules also have byte level
granularity.
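
To make the byte-range granularity of Note 1 concrete, here is a small
sketch built on struct flock. lock_range() and range_is_locked() are
helper names made up for this example. Whether a lock placed this way is
advisory or mandatory depends only on how the file is marked, not on
anything in the call.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply or release a lock on bytes [start, start + len) of `fd`.
 * `type` is F_RDLCK, F_WRLCK or F_UNLCK; len == 0 means "to end of file".
 * Does not wait: returns -1 with errno EAGAIN or EACCES on conflict. */
int lock_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl;
	memset(&fl, 0, sizeof fl);
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;
	return fcntl(fd, F_SETLK, &fl);
}

/* Ask whether ANOTHER process holds a lock that would conflict with a
 * lock of `type` on the same range. Returns 1 if so, 0 if not, -1 on
 * error. Locks held by the calling process are never reported. */
int range_is_locked(int fd, short type, off_t start, off_t len)
{
	struct flock fl;
	memset(&fl, 0, sizeof fl);
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;
	if (fcntl(fd, F_GETLK, &fl) < 0)
		return -1;
	return fl.l_type != F_UNLCK;
}
```

For instance, lock_range(fd, F_WRLCK, 0, 10) followed by
lock_range(fd, F_RDLCK, 100, 50) leaves two independent locked regions
in the same file, and lock_range(fd, F_UNLCK, 0, 0) drops both.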

Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.
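
As an illustration of the mode-bit combination, the sketch below sets the
group-id bit and clears group-execute using stat() and chmod(). The
function names are invented for this example. Note also that, according
to the fcntl(2) manual page, Linux only enforces mandatory locks on
filesystems mounted with the "mand" option; that step is assumed here and
not shown.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Return `mode` with set-group-ID turned on and group-execute off. */
mode_t mandatory_mode(mode_t mode)
{
	return (mode | S_ISGID) & ~(mode_t)S_IXGRP;
}

/* Non-zero if `mode` marks a mandatory locking candidate. */
int is_mandatory_candidate(mode_t mode)
{
	return (mode & S_ISGID) && !(mode & S_IXGRP);
}

/* Mark `path` as a candidate. Returns 0 on success, -1 on error. */
int mark_mandatory(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	/* Keep only the permission bits; chmod() does not take the
	 * file type bits that st_mode also carries. */
	return chmod(path, mandatory_mode(st.st_mode) & 07777);
}
```

From a shell the same change is "chmod g+s,g-x file". A file with mode
0664 ends up as 02664, and one with mode 0775 ends up as 02765.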

3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag, in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag, in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.
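
From a program's point of view, rules 2 to 4 all show up as an EAGAIN
error when it has asked not to block. The sketch below shows one way a
caller might separate "held off by a mandatory lock" from other failures.
The names io_result, classify(), try_write() and try_truncate_open() are
made up for this illustration, and it assumes that EAGAIN on a regular
file can only come from a conflicting mandatory lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum io_result { IO_DONE = 0, IO_LOCKED = 1, IO_ERROR = -1 };

/* Classify the return value of a system call that has just returned.
 * Must be called before anything else can change errno. */
enum io_result classify(long ret)
{
	if (ret >= 0)
		return IO_DONE;
	return (errno == EAGAIN) ? IO_LOCKED : IO_ERROR;
}

/* Rules 2 and 3: `fd` is expected to have been opened with O_NONBLOCK,
 * so a conflicting mandatory lock yields EAGAIN instead of blocking. */
enum io_result try_write(int fd, const void *buf, size_t len)
{
	return classify(write(fd, buf, len));
}

/* Rule 4: truncating an existing file on open is rejected with EAGAIN
 * while another process holds a mandatory lock on it. On success the
 * new descriptor is stored in *fd_out. */
enum io_result try_truncate_open(const char *path, int *fd_out)
{
	int fd = open(path, O_WRONLY | O_TRUNC | O_NONBLOCK);

	if (fd >= 0)
		*fd_out = fd;
	return classify(fd);
}
```

A caller that gets IO_LOCKED can retry later or report that the file is
busy. Given the races listed in section 0, IO_DONE should not be read as
proof that no lock was in force for the whole operation.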

5. Which system calls are affected?
-----------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes).
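
The region rule can be written down as a few lines of arithmetic. What
follows is only an illustration of the rule as stated above, not the
kernel's code; struct byte_range and the three functions are names
invented for the example. A lock length of 0 is taken to mean "to the end
of the file and beyond", following the fcntl() convention.

```c
#include <sys/types.h>

struct byte_range {
	off_t start;
	off_t len;
};

/* Region touched by reading or writing `count` bytes at `pos`. */
struct byte_range io_region(off_t pos, size_t count)
{
	struct byte_range r = { pos, (off_t)count };
	return r;
}

/* Region touched by a truncate: the bytes removed when the file
 * shrinks, or the bytes added when it grows. */
struct byte_range truncate_region(off_t old_size, off_t new_size)
{
	struct byte_range r;

	if (new_size < old_size) {
		r.start = new_size;
		r.len = old_size - new_size;
	} else {
		r.start = old_size;
		r.len = new_size - old_size;
	}
	return r;
}

/* Non-zero if `r` overlaps a lock starting at `lock_start` that covers
 * `lock_len` bytes (0 meaning "everything from lock_start onwards"). */
int region_conflicts(struct byte_range r, off_t lock_start, off_t lock_len)
{
	off_t r_end = r.start + r.len;

	if (r.len == 0)
		return 0;
	if (lock_len == 0)
		return r_end > lock_start;
	return r.start < lock_start + lock_len && lock_start < r_end;
}
```

Growing a 100 byte file to 150 bytes touches bytes 100 to 149. No lock on
a specific range inside the old file covers those, but a whole-file lock
(start 0, length 0) does, which is why added bytes have to be considered.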

Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read from or write to it.
Of course, that might be a bit tricky if the system is hung :-(
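
The recovery step amounts to clearing one mode bit. A minimal sketch,
with function names invented for this example, and assuming the caller
has permission to change the file's mode:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Return `mode` with the set-group-ID bit cleared. */
mode_t without_setgid(mode_t mode)
{
	return mode & ~(mode_t)S_ISGID;
}

/* Stop `path` from being a mandatory locking candidate, so that locks
 * on it are treated as advisory. Returns 0 on success, -1 on error. */
int unmark_mandatory(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	/* Pass only the permission bits on to chmod(). */
	return chmod(path, without_setgid(st.st_mode) & 07777);
}
```

The shell equivalent is "chmod g-s file". Neither form touches the locks
themselves. They simply stop being enforced on reads and writes.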

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.