
Based on kernel version 3.16. Page generated on 2014-08-06 21:39 EST.

Shared Subtrees
---------------

Contents:
	1) Overview
	2) Features
	3) Setting mount states
	4) Use cases
	5) Detailed semantics
	6) Quiz
	7) FAQ
	8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish this.

They also provide the necessary building blocks for features like per-user
namespaces and versioned filesystems.

2) Features
-----------

Shared subtree provides four different flavors of mounts (struct vfsmount, to
be precise):

	a. shared mount
	b. slave mount
	c. private mount
	d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as desired, and
all the replicas continue to be exactly the same.

	Here is an example:

	Let's say /mnt has a mount that is shared.
	# mount --make-shared /mnt

	Note: the mount(8) command now supports the --make-shared flag,
	so the sample 'smount' program is no longer needed and has been
	removed.

	# mount --bind /mnt /tmp
	The above command replicates the mount at /mnt to the mountpoint /tmp
	and the contents of both the mounts remain identical.

	# ls /mnt
	a b c

	# ls /tmp
	a b c

	Now let's say we mount a device at /tmp/a:
	# mount /dev/sd0 /tmp/a

	# ls /tmp/a
	t1 t2 t3

	# ls /mnt/a
	t1 t2 t3

	Note that the mount has propagated to the mount at /mnt as well.

	And the same is true even when /dev/sd0 is mounted on /mnt/a. The
	contents will be visible under /tmp/a too.
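
	For illustration, the same two steps can be issued from C through the
	mount(2) system call. This is a minimal sketch, assuming Linux with
	glibc's <sys/mount.h> and a caller holding CAP_SYS_ADMIN; the helper
	name share_and_bind() is hypothetical. When only the propagation type
	is changed, mount(2) uses the target and the flag and ignores the
	source, filesystem type and data arguments, hence "none" and NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: make the mount at 'src' shared, then replicate it
 * at 'dst' with a bind mount, like:
 *     mount --make-shared src
 *     mount --bind src dst
 * Returns 0 on success, -1 on failure (errno is set by mount(2)).
 */
int share_and_bind(const char *src, const char *dst)
{
	assert(src != NULL && dst != NULL);

	/* Propagation changes use only the target and the flag. */
	if (mount("none", src, NULL, MS_SHARED, NULL) == -1)
		return -1;

	/* Since 'src' is shared, the new mount at 'dst' joins its peer group. */
	if (mount(src, dst, NULL, MS_BIND, NULL) == -1)
		return -1;

	return 0;
}
```

	Run as root, share_and_bind("/mnt", "/tmp") corresponds to the two
	commands above; on failure errno gives the reason (for example EPERM
	without the required capability, or ENOENT for a missing path).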


2b) A slave mount is like a shared mount except that mount and umount
	events only propagate towards it.

	All slave mounts have a master mount which is a shared mount.

	Here is an example:

	Let's say /mnt has a mount which is shared.
	# mount --make-shared /mnt

	Let's bind mount /mnt to /tmp:
	# mount --bind /mnt /tmp

	The new mount at /tmp becomes a shared mount and it is a replica of
	the mount at /mnt.

	Now let's make the mount at /tmp a slave of /mnt:
	# mount --make-slave /tmp

	Let's mount /dev/sd0 on /mnt/a:
	# mount /dev/sd0 /mnt/a

	# ls /mnt/a
	t1 t2 t3

	# ls /tmp/a
	t1 t2 t3

	Note that the mount event has propagated to the mount at /tmp.

	However, let's see what happens if we mount something on the mount
	at /tmp:

	# mount /dev/sd1 /tmp/b

	# ls /tmp/b
	s1 s2 s3

	# ls /mnt/b

	Note how the mount event has not propagated to the mount at /mnt.
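
	The difference between 2a and 2b comes down to the direction in which
	events flow. The following user-space model is only a sketch (struct
	model_mount and propagates() are hypothetical, not kernel code): each
	mount records the peer group it belongs to and the peer group it is a
	slave of, with 0 meaning "none".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 64	/* group ids must lie in 0..MAX_GROUPS-1 */

/* Hypothetical model of one mount's propagation state (not kernel code). */
struct model_mount {
	int peer_group;		/* peer group id, 0 if the mount is not shared */
	int master_group;	/* peer group this mount is a slave of, 0 if none */
};

/*
 * Returns true if a mount or umount event on mounts[from] propagates to
 * mounts[to].  The event reaches the peer group of 'from'; any mount that
 * is a slave of a reached group receives it, and if that slave is itself
 * shared, its own peer group is reached too.  Nothing flows to a master.
 */
bool propagates(const struct model_mount *mounts, size_t n,
		size_t from, size_t to)
{
	bool reached[MAX_GROUPS] = { false };
	bool changed = true;
	int tp, tm;

	assert(from < n && to < n);
	if (from == to || mounts[from].peer_group == 0)
		return false;	/* private and pure slave mounts forward nothing */
	reached[mounts[from].peer_group] = true;

	/* Keep extending the set of reached groups until it stops growing. */
	while (changed) {
		changed = false;
		for (size_t i = 0; i < n; i++) {
			int pg = mounts[i].peer_group;
			int mg = mounts[i].master_group;

			assert(pg >= 0 && pg < MAX_GROUPS);
			assert(mg >= 0 && mg < MAX_GROUPS);
			if (mg != 0 && reached[mg] && pg != 0 && !reached[pg]) {
				reached[pg] = true;
				changed = true;
			}
		}
	}

	tp = mounts[to].peer_group;
	tm = mounts[to].master_group;
	return (tp != 0 && reached[tp]) || (tm != 0 && reached[tm]);
}
```

	With /mnt in peer group 1 and /tmp a slave of group 1, propagates()
	reports that an event on /mnt reaches /tmp but not the reverse, which
	matches the 2b transcript; two members of one peer group (2a) reach
	each other in both directions.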


2c) A private mount does not forward or receive propagation.

	This is the mount we are familiar with. It is the default type.


2d) An unbindable mount is a private mount that cannot be bind mounted.

	Let's say we have a mount at /mnt and we make it unbindable:

	# mount --make-unbindable /mnt

	Let's try to bind mount this mount somewhere else:
	# mount --bind /mnt /tmp
	mount: wrong fs type, bad option, bad superblock on /mnt,
	       or too many mounted file systems

	Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

	The mount command (util-linux package) can be used to set mount
	states:

	mount --make-shared mountpoint
	mount --make-slave mountpoint
	mount --make-private mountpoint
	mount --make-unbindable mountpoint

	Each option also has a recursive variant (--make-rshared,
	--make-rslave, --make-rprivate and --make-runbindable) that changes
	the state of every mount in the subtree under mountpoint.
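
	Programs can request the same changes with mount(2), passing one of the
	propagation flags from <sys/mount.h> and adding MS_REC for the
	recursive variants. The lookup below is a sketch: propagation_flags()
	is a hypothetical helper, while the flag names are the ones glibc
	defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Map a mount(8) propagation option to the mount(2) flags that request
 * the same change, or 0 if the option is unknown.  The "r" variants add
 * MS_REC so that every mount in the subtree is changed.
 */
unsigned long propagation_flags(const char *opt)
{
	static const struct {
		const char *name;
		unsigned long flags;
	} table[] = {
		{ "--make-shared",      MS_SHARED },
		{ "--make-slave",       MS_SLAVE },
		{ "--make-private",     MS_PRIVATE },
		{ "--make-unbindable",  MS_UNBINDABLE },
		{ "--make-rshared",     MS_SHARED | MS_REC },
		{ "--make-rslave",      MS_SLAVE | MS_REC },
		{ "--make-rprivate",    MS_PRIVATE | MS_REC },
		{ "--make-runbindable", MS_UNBINDABLE | MS_REC },
	};

	assert(opt != NULL);
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(opt, table[i].name) == 0)
			return table[i].flags;
	return 0;
}
```

	For example, mount("none", "/", NULL, propagation_flags("--make-rshared"),
	NULL) asks for the same change as "mount --make-rshared /".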


4) Use cases
------------

	A) A process wants to clone its own namespace, but still wants to
	   access the CD that got mounted recently.

	   Solution:

		The system administrator can make the mount at /cdrom shared.
		Binding /cdrom onto itself first ensures that /cdrom is a
		mountpoint, which --make-shared requires:

		mount --bind /cdrom /cdrom
		mount --make-shared /cdrom

		Now any process that clones off a new namespace will have a
		mount at /cdrom which is a replica of the same mount in the
		parent namespace.

		So when a CD is inserted and mounted at /cdrom, that mount gets
		propagated to the mount at /cdrom in all the other cloned
		namespaces.

	B) A process wants its mounts to be invisible to any other process,
	   but still be able to see the other system mounts.

	   Solution:

		To begin with, the administrator can mark the entire mount tree
		as shareable:

		mount --make-rshared /

		A new process can clone off a new namespace and mark some part
		of its namespace as slave:

		mount --make-rslave /myprivatetree

		Henceforth, any mounts made by the process within
		/myprivatetree will not show up in any other namespace. However,
		mounts made in the parent namespace under /myprivatetree still
		show up in the process's namespace.
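
	Use case B can be sketched in C with unshare(2) and mount(2). This
	assumes Linux, glibc and CAP_SYS_ADMIN, and that the administrator has
	already run "mount --make-rshared /"; private_subtree() is a
	hypothetical helper, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper for use case B.  Moves the calling process into a
 * new mount namespace, then makes 'tree' a recursive slave, like:
 *     (clone off a new namespace)
 *     mount --make-rslave tree
 * Requires CAP_SYS_ADMIN.  Returns 0 on success, -1 on failure with errno
 * set by unshare(2) or mount(2).
 */
int private_subtree(const char *tree)
{
	assert(tree != NULL);

	/* The new namespace starts with copies of all the parent's mounts;
	 * with / rshared, each copy is a peer of its original. */
	if (unshare(CLONE_NEWNS) == -1)
		return -1;

	/* Mounts under 'tree' still arrive from the parent, but new mounts
	 * made here no longer propagate back. */
	if (mount("none", tree, NULL, MS_SLAVE | MS_REC, NULL) == -1)
		return -1;

	return 0;
}
```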


	Apart from the above semantics, this feature provides the
	building blocks to solve the following problems:

	C)  Per-user namespace

		The above semantics allow mounts to be shared across
		namespaces.  But namespaces are associated with processes. If
		namespaces were made first-class objects with a user API to
		associate/disassociate a namespace with a userid, then each user
		could have his/her own namespace and tailor it to his/her
		requirements. Of course, this needs support from PAM.

	D)  Versioned files

		If the entire mount tree is visible at multiple locations, then
		an underlying versioning file system can return different
		versions of a file depending on the path used to access it.

		An example is:

		mount --make-shared /
		mount --rbind / /view/v1
		mount --rbind / /view/v2
		mount --rbind / /view/v3
		mount --rbind / /view/v4

		If /usr has a versioning filesystem mounted, then that mount
		appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
		/view/v4/usr too.

		A user can request the v3 version of the file
		/usr/fs/namespace.c by accessing /view/v3/usr/fs/namespace.c.
		The underlying versioning filesystem can then decipher that the
		v3 version of the filesystem is being requested and return the
		corresponding inode.

5) Detailed semantics
---------------------

	The section below explains the detailed semantics of the
	bind, rbind, move, mount, umount and clone-namespace operations.

	Note: the word 'vfsmount' and the noun 'mount' are used to mean
	the same thing throughout this document.

5a) Mount states

	A given mount can be in one of the following states:
	1) shared
	2) slave
	3) shared and slave
	4) private
	5) unbindable

	A 'propagation event' is defined as an event generated on a vfsmount
	that leads to mount or unmount actions in other vfsmounts.

	A 'peer group' is defined as a group of vfsmounts that propagate
	events to each other.

	(1) Shared mounts

		A 'shared mount' is defined as a vfsmount that belongs to a
		'peer group'.

		For example:
			mount --make-shared /mnt
			mount --bind /mnt /tmp

		The mount at /mnt and that at /tmp are both shared and belong
		to the same peer group. Anything mounted or unmounted under
		/mnt or /tmp is reflected in all the other mounts of its peer
		group.


	(2) Slave mounts

		A 'slave mount' is defined as a vfsmount that receives
		propagation events and does not forward propagation events.

		A slave mount, as the name implies, has a master mount from
		which mount/unmount events are received. Events do not
		propagate from the slave mount to the master.  Only a shared
		mount can be made a slave, by executing the following command:

			mount --make-slave mount

		A shared mount that is made a slave is no longer shared unless
		it is modified to become shared again.

	(3) Shared and Slave

		A vfsmount can be both shared and slave.  This state indicates
		that the mount is a slave of some vfsmount, and has its own
		peer group too.  This vfsmount receives propagation events from
		its master vfsmount, and also forwards propagation events to its
		'peer group' and to its slave vfsmounts.

		Strictly speaking, the vfsmount is shared, having its own
		peer group, and this peer group is a slave of some other
		peer group.

		Only a slave vfsmount can be made 'shared and slave', either
		by executing the following command
			mount --make-shared mount
		or by moving the slave vfsmount under a shared vfsmount.

	(4) Private mount

		A 'private mount' is defined as a vfsmount that does not
		receive or forward any propagation events.

	(5) Unbindable mount

		An 'unbindable mount' is defined as a vfsmount that does not
		receive or forward any propagation events and cannot
		be bind mounted.


	State diagram:
	The state diagram below explains the state transitions of a mount
	in response to the various commands.

	|-------------|-------------|----------------|--------------|-----------------|
	|             | make-shared |  make-slave    | make-private | make-unbindable |
	|-------------|-------------|----------------|--------------|-----------------|
	| shared      | shared      | *slave/private | private      | unbindable      |
	|-------------|-------------|----------------|--------------|-----------------|
	| slave       | shared      | **slave        | private      | unbindable      |
	|             | and slave   |                |              |                 |
	|-------------|-------------|----------------|--------------|-----------------|
	| shared      | shared      | slave          | private      | unbindable      |
	| and slave   | and slave   |                |              |                 |
	|-------------|-------------|----------------|--------------|-----------------|
	| private     | shared      | **private      | private      | unbindable      |
	|-------------|-------------|----------------|--------------|-----------------|
	| unbindable  | shared      | **unbindable   | private      | unbindable      |
	|-------------|-------------|----------------|--------------|-----------------|

	* If the shared mount is the only mount in its peer group, making it
	a slave makes it private automatically, since there is no master to
	which it can be slaved.

	** Slaving a non-shared mount has no effect on the mount.

	Apart from the commands listed above, the 'move' operation also
	changes the state of a mount depending on the type of the destination
	mount. This is explained in section 5d.
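
	The state diagram can be encoded as a pure function. The sketch below
	uses hypothetical enum and function names; 'has_peers' stands for the
	condition in footnote *, and the move operation is left out because
	section 5d covers it.

```c
#include <assert.h>
#include <stdbool.h>

/* Mount states from section 5a and the four commands (hypothetical names). */
enum mnt_state { ST_SHARED, ST_SLAVE, ST_SHARED_AND_SLAVE, ST_PRIVATE,
		 ST_UNBINDABLE };
enum mnt_cmd { CMD_MAKE_SHARED, CMD_MAKE_SLAVE, CMD_MAKE_PRIVATE,
	       CMD_MAKE_UNBINDABLE };

/*
 * Next state of a mount in state 'cur' after 'cmd', following the state
 * diagram.  'has_peers' (other members in the peer group) only matters for
 * make-slave on a shared mount: without peers there is no master to slave
 * it to, so it becomes private (footnote *).
 */
enum mnt_state next_state(enum mnt_state cur, enum mnt_cmd cmd, bool has_peers)
{
	switch (cmd) {
	case CMD_MAKE_SHARED:
		/* A slave keeps its master and gains a peer group of its own. */
		if (cur == ST_SLAVE || cur == ST_SHARED_AND_SLAVE)
			return ST_SHARED_AND_SLAVE;
		return ST_SHARED;
	case CMD_MAKE_SLAVE:
		if (cur == ST_SHARED)
			return has_peers ? ST_SLAVE : ST_PRIVATE;
		if (cur == ST_SHARED_AND_SLAVE)
			return ST_SLAVE;
		return cur;	/* footnote **: no effect on non-shared mounts */
	case CMD_MAKE_PRIVATE:
		return ST_PRIVATE;
	case CMD_MAKE_UNBINDABLE:
		return ST_UNBINDABLE;
	}
	assert(!"unknown command");
	return cur;
}
```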

5b) Bind semantics

	Consider the following command:

	mount --bind A/a  B/b

	where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
	is the destination mount and 'b' is the dentry in the destination mount.

	The outcome depends on the type of mount of 'A' and 'B'. The table
	below is a quick reference.
   ---------------------------------------------------------------------------
   |         BIND MOUNT OPERATION                                            |
   |**************************************************************************
   | source(A)->| shared      |    private     |     slave      | unbindable |
   | dest(B)    |             |                |                |            |
   |    |       |             |                |                |            |
   |    v       |             |                |                |            |
   |**************************************************************************
   |   shared   | shared      |    shared      | shared & slave |  invalid   |
   |            |             |                |                |            |
   | non-shared | shared      |    private     |     slave      |  invalid   |
   ***************************************************************************

	Details:

	1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C',
	which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts where 'B'
	propagates to. A new propagation tree containing 'C1',..,'Cn' is
	created. This propagation tree is identical to the propagation tree of
	'B'.  And finally the peer group of 'C' is merged with the peer group
	of 'A'.

	2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C',
	which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts where 'B'
	propagates to. A new propagation tree is set up containing all the new
	mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
	propagation tree for 'B'.

	3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
	mount 'C', which is a clone of 'A', is created. Its root dentry is 'a'.
	'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
	'C3' ... are created and mounted at the dentry 'b' on all mounts where
	'B' propagates to. A new propagation tree containing the new mounts
	'C','C1',..  'Cn' is created. This propagation tree is identical to the
	propagation tree for 'B'. And finally the mount 'C' and its peer group
	are made slaves of mount 'Z'.  In other words, mount 'C' is in the
	state 'shared and slave'.

	4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
	invalid operation.

	5. 'A' is a private mount and 'B' is a non-shared (private, slave or
	unbindable) mount. A new mount 'C', which is a clone of 'A', is created.
	Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

	6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C',
	which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
	peer group of 'A'.

	7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
	new mount 'C', which is a clone of 'A', is created. Its root dentry is
	'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
	slave mount of 'Z'. In other words, 'A' and 'C' are both slave mounts
	of 'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
	mount/unmount events on 'A' do not propagate anywhere else. Similarly,
	mount/unmount events on 'C' do not propagate anywhere else.

	8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
	invalid operation. An unbindable mount cannot be bind mounted.
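
	The bind table and its eight cases reduce to a small lookup. In this
	sketch (hypothetical names), the result is the state of the new mount
	'C'; a shared-and-slave source is not covered by the table, so the
	function rejects it with an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Source types from the bind table plus the possible outcomes (hypothetical). */
enum bt_type { BT_SHARED, BT_PRIVATE, BT_SLAVE, BT_UNBINDABLE,
	       BT_SHARED_AND_SLAVE, BT_INVALID };

/*
 * State of the new mount 'C' created by "mount --bind A/a B/b", given the
 * type of the source mount 'A' and whether the destination 'B' is shared.
 */
enum bt_type bind_result(enum bt_type src, bool dest_shared)
{
	switch (src) {
	case BT_SHARED:
		return BT_SHARED;	/* 'C' joins the peer group of 'A' (1, 6) */
	case BT_PRIVATE:
		/* 'C' gets peers only if 'B' propagates (2, 5). */
		return dest_shared ? BT_SHARED : BT_PRIVATE;
	case BT_SLAVE:
		/* 'C' is a slave of 'Z'; under a shared 'B' it also has
		 * a peer group of its own (3, 7). */
		return dest_shared ? BT_SHARED_AND_SLAVE : BT_SLAVE;
	case BT_UNBINDABLE:
		return BT_INVALID;	/* unbindable mounts cannot be bound (4, 8) */
	default:
		assert(!"source type not covered by the bind table");
		return BT_INVALID;
	}
}
```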

5c) Rbind semantics

	rbind is the same as bind, except that bind replicates only the
	specified mount, whereas rbind replicates all the mounts in the tree
	belonging to the specified mount. An rbind mount is a bind mount
	applied to all the mounts in the tree.

	If the source tree that is rbind-mounted has some unbindable mounts,
	then each unbindable mount, along with the subtree under it, is
	pruned in the new location.

	For example, let's say we have the following mount tree:

	        A
	      /   \
	      B   C
	     / \ / \
	     D E F G

	Let's say all the mounts in the tree except C are of a type other
	than unbindable, and C is unbindable.

	If this tree is rbind-mounted at, say, Z, we will have the following
	tree at the new location:

	        Z
	        |
	        A'
	       /
	      B'
	     / \
	    D' E'

	Note how C and the tree under it are pruned in the new location.
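
	A minimal sketch of the pruning rule on a toy tree of mount nodes
	(struct tnode, rbind_copy(), count_mounts() and free_tree() are
	hypothetical and ignore propagation entirely): the copy skips each
	unbindable mount together with everything mounted beneath it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CHILDREN 4

/* Hypothetical mount-tree node; 'name' is only a label for the example. */
struct tnode {
	char name;
	bool unbindable;
	size_t nchildren;
	struct tnode *children[MAX_CHILDREN];
};

void free_tree(struct tnode *t)
{
	if (t == NULL)
		return;
	for (size_t i = 0; i < t->nchildren; i++)
		free_tree(t->children[i]);
	free(t);
}

size_t count_mounts(const struct tnode *t)
{
	size_t n;

	if (t == NULL)
		return 0;
	n = 1;
	for (size_t i = 0; i < t->nchildren; i++)
		n += count_mounts(t->children[i]);
	return n;
}

/*
 * Sketch of rbind: copy every mount in the tree rooted at 'src', except
 * that an unbindable mount is skipped together with everything mounted
 * under it.  Returns NULL if 'src' itself is unbindable (binding it is
 * invalid) or if an allocation fails.
 */
struct tnode *rbind_copy(const struct tnode *src)
{
	struct tnode *copy;

	assert(src != NULL);
	if (src->unbindable)
		return NULL;

	copy = calloc(1, sizeof(*copy));
	if (copy == NULL)
		return NULL;
	copy->name = src->name;

	for (size_t i = 0; i < src->nchildren; i++) {
		struct tnode *child;

		if (src->children[i]->unbindable)
			continue;	/* prune this subtree */
		child = rbind_copy(src->children[i]);
		if (child == NULL) {
			free_tree(copy);	/* out of memory: undo the partial copy */
			return NULL;
		}
		copy->children[copy->nchildren++] = child;
	}
	return copy;
}
```

	Built from the example tree with C marked unbindable, the copy has
	four mounts: A', B', D' and E'.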

5d) Move semantics

	Consider the following command:

	mount --move A  B/b

	where 'A' is the source mount, 'B' is the destination mount and 'b' is
	the dentry in the destination mount.

	The outcome depends on the type of the mount of 'A' and 'B'. The table
	below is a quick reference.
   ---------------------------------------------------------------------------
   |         MOVE MOUNT OPERATION                                            |
   |**************************************************************************
   | source(A)->| shared      |    private     |     slave      | unbindable |
   | dest(B)    |             |                |                |            |
   |    |       |             |                |                |            |
   |    v       |             |                |                |            |
   |**************************************************************************
   |   shared   | shared      |    shared      |shared and slave|  invalid   |
   |            |             |                |                |            |
   | non-shared | shared      |    private     |     slave      | unbindable |
   ***************************************************************************
	NOTE: moving a mount residing under a shared mount is invalid.

	Details follow:

	1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
	mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
	are created and mounted at dentry 'b' on all mounts that receive
	propagation from mount 'B'. A new propagation tree is created in the
	exact same configuration as that of 'B'. This new propagation tree
	contains all the new mounts 'A1', 'A2'...  'An'.  And this new
	propagation tree is appended to the already existing propagation tree
	of 'A'.

	2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
	are created and mounted at dentry 'b' on all mounts that receive
	propagation from mount 'B'. The mount 'A' becomes a shared mount and a
	propagation tree is created which is identical to that of
	'B'. This new propagation tree contains all the new mounts 'A1',
	'A2'...  'An'.

	3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
	mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
	'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
	receive propagation from mount 'B'. A new propagation tree is created
	in the exact same configuration as that of 'B'. This new propagation
	tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
	propagation tree is appended to the already existing propagation tree
	of 'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
	becomes 'shared'.

	4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
	is invalid, because mounting anything on the shared mount 'B' can
	create new mounts that get mounted on the mounts that receive
	propagation from 'B'.  And since the mount 'A' is unbindable, cloning
	it to mount at other mountpoints is not possible.

	5. 'A' is a private mount and 'B' is a non-shared (private, slave or
	unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

	6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
	is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
	shared mount.

	7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
	The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
	continues to be a slave mount of mount 'Z'.

	8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
	'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
	unbindable mount.
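
	As with bind, the move table can be written as a lookup. This sketch
	uses hypothetical names and also encodes the NOTE under the table: if
	'A' currently resides under a shared mount, the move is rejected. The
	last row differs from bind: moving an unbindable mount under a
	non-shared mount is allowed because nothing has to be cloned.

```c
#include <assert.h>
#include <stdbool.h>

/* Source types from the move table plus the possible outcomes (hypothetical). */
enum mv_type { MV_SHARED, MV_PRIVATE, MV_SLAVE, MV_UNBINDABLE,
	       MV_SHARED_AND_SLAVE, MV_INVALID };

/*
 * State of mount 'A' after "mount --move A B/b".  'dest_shared' says
 * whether 'B' is shared; 'under_shared' says whether 'A' currently
 * resides under a shared mount, which makes the move invalid.
 */
enum mv_type move_result(enum mv_type src, bool dest_shared, bool under_shared)
{
	if (under_shared)
		return MV_INVALID;

	switch (src) {
	case MV_SHARED:
		return MV_SHARED;				/* cases 1, 6 */
	case MV_PRIVATE:
		return dest_shared ? MV_SHARED : MV_PRIVATE;	/* cases 2, 5 */
	case MV_SLAVE:
		/* Still a slave of 'Z'; also shared under a shared 'B' (3, 7). */
		return dest_shared ? MV_SHARED_AND_SLAVE : MV_SLAVE;
	case MV_UNBINDABLE:
		/* Invalid only if 'B' would force clones of 'A' (4, 8). */
		return dest_shared ? MV_INVALID : MV_UNBINDABLE;
	default:
		assert(!"source type not covered by the move table");
		return MV_INVALID;
	}
}
```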

5e) Mount semantics

	Consider the following command:

	mount device  B/b

	'B' is the destination mount and 'b' is the dentry in the destination
	mount.

	The above operation is the same as a bind operation, with the
	exception that the source mount is always a private mount.


5f) Unmount semantics

	Consider the following command:

	umount A

	where 'A' is a mount mounted on mount 'B' at dentry 'b'.

	If mount 'B' is shared, then all most-recently-mounted mounts at dentry
	'b' on mounts that receive propagation from mount 'B', and that do not
	have sub-mounts within them, are unmounted.

	Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
	each other.

	Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
	'B1', 'B2' and 'B3' respectively.

	Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
	mount 'B1', 'B2' and 'B3' respectively.

	If 'C1' is unmounted, all the mounts that are most-recently-mounted on
	'B1' and on the mounts that 'B1' propagates to are unmounted.

	'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
	on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

	So 'C1', 'C2' and 'C3' should all be unmounted.

	If either 'C2' or 'C3' has some child mounts, then that mount is not
	unmounted, but all the other mounts are unmounted. However, if 'C1'
	is to be unmounted and 'C1' has some sub-mounts, the umount operation
	fails entirely.
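
	The example can be replayed with a small model. In this sketch
	(hypothetical types; only the peers of 'B' are modelled, not slaves),
	each peer keeps the stack of mounts at dentry 'b' and, for each stacked
	mount, how many sub-mounts it has.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK 8

/* Hypothetical model: the mounts stacked at dentry 'b' of one peer. */
struct stack_at_b {
	size_t depth;			/* number of mounts stacked at 'b' */
	int child_mounts[MAX_STACK];	/* sub-mounts of each stacked mount */
};

/*
 * Unmount the most recently mounted mount at 'b' on peers[origin] and
 * propagate the umount to the other peers.  Returns the number of mounts
 * removed, or -1 if the origin's mount has sub-mounts, in which case the
 * whole operation fails.  A peer whose top mount has sub-mounts keeps it.
 */
int umount_propagate(struct stack_at_b *peers, size_t npeers, size_t origin)
{
	struct stack_at_b *o;
	int removed = 0;

	assert(origin < npeers);
	o = &peers[origin];
	assert(o->depth > 0);
	if (o->child_mounts[o->depth - 1] > 0)
		return -1;

	for (size_t i = 0; i < npeers; i++) {
		struct stack_at_b *s = &peers[i];

		if (s->depth > 0 && s->child_mounts[s->depth - 1] == 0) {
			s->depth--;	/* unmount the top mount at 'b' */
			removed++;
		}
	}
	return removed;
}
```

	With 'A1'..'A3' and 'C1'..'C3' stacked on 'B1'..'B3' and one child
	mount under 'C2', unmounting 'C1' removes 'C1' and 'C3' and leaves
	'C2' in place.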

5g) Clone Namespace

	A cloned namespace contains all the mounts of the parent
	namespace.

	Let's say 'A' and 'B' are corresponding mounts in the parent and the
	child namespace, respectively.

	If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
	each other.

	If 'A' is a slave mount of 'Z', then 'B' is also a slave mount of
	'Z'.

	If 'A' is a private mount, then 'B' is a private mount too.

	If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

	A. What is the result of the following command sequence?

		mount --bind /mnt /mnt
		mount --make-shared /mnt
		mount --bind /mnt /tmp
		mount --move /tmp /mnt/1

		What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
		Should they all be identical, or should only /mnt and /mnt/1
		be identical?


	B. What is the result of the following command sequence?

		mount --make-rshared /
		mkdir -p /v/1
		mount --rbind / /v/1

		What should the content of /v/1/v/1 be?


	C. What is the result of the following command sequence?

		mount --bind /mnt /mnt
		mount --make-shared /mnt
		mkdir -p /mnt/1/2/3 /mnt/1/test
		mount --bind /mnt/1 /tmp
		mount --make-slave /mnt
		mount --make-shared /mnt
		mount --bind /mnt/1/2 /tmp1
		mount --make-slave /mnt

		At this point we have the first mount at /tmp, and
		its root dentry is 1. Let's call this mount 'A'.
		Then we have a second mount at /tmp1 with root
		dentry 2. Let's call this mount 'B'.
		Next we have a third mount at /mnt with root dentry
		mnt. Let's call this mount 'C'.

		'B' is a slave of 'A' and 'C' is a slave of 'B':
		A -> B -> C

		At this point, if we execute the following command

		mount --bind /bin /tmp/test

		the mount is attempted on 'A'.

		Will the mount propagate to 'B' and 'C'?

		What would be the contents of /mnt/1/test?

7) FAQ
------

	Q1. Why is a bind mount needed? How is it different from a symbolic link?
		Symbolic links can become stale if the destination mount gets
		unmounted or moved. Bind mounts continue to exist even if the
		other mount is unmounted or moved.

	Q2. Why can't the shared subtree be implemented using exportfs?

		exportfs is a heavyweight way of accomplishing part of what
		shared subtree can do. I cannot imagine a way to implement the
		semantics of slave mounts using exportfs.

	Q3. Why is an unbindable mount needed?

		Let's say we want to replicate the mount tree at multiple
		locations within the same subtree.

		If one rbind mounts a tree within the same subtree 'n' times,
		the number of mounts created grows at least exponentially
		with 'n'. Unbindable mounts can help prune the unneeded bind
		mounts. Here is an example.

		step 1:
		   Let's say the root tree has just two directories with
		   one vfsmount.

			            root
			           /    \
			          tmp    usr

		   And we want to replicate the tree at multiple
		   mountpoints under /root/tmp.

		step 2:
			mount --make-shared /root

			mkdir -p /tmp/m1

			mount --rbind /root /tmp/m1

			The new tree now looks like this:

			            root
			           /    \
			         tmp    usr
			        /
			       m1
			      /  \
			     tmp  usr
			     /
			    m1

			It has two vfsmounts.

		step 3:
			mkdir -p /tmp/m2
			mount --rbind /root /tmp/m2

			The new tree now looks like this:

			              root
			             /    \
			           tmp     usr
			          /    \
			        m1       m2
			       / \       /  \
			     tmp  usr   tmp  usr
			     / \          /
			    m1  m2      m1
			        / \     /  \
			      tmp usr  tmp   usr
			      /        / \
			     m1       m1  m2
			    /  \
			  tmp   usr
			  /  \
			 m1   m2

			It has 6 vfsmounts.

		step 4:
			mkdir -p /tmp/m3
			mount --rbind /root /tmp/m3

			I won't draw the tree, but it has 24 vfsmounts.


		At step i the number of vfsmounts is V[i] = i*V[i-1], so
		V[i] = i!, which grows even faster than an exponential
		function. And this tree has way more mounts than what we
		really needed in the first place.
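
	The recurrence can be checked numerically. The sketch below
	(rbind_mounts() is a hypothetical helper) unrolls V[i] = i*V[i-1]
	with V[1] = 1, the single vfsmount of step 1, which gives V[i] = i!;
	it stops at i = 20 because 21! does not fit in an unsigned long long.

```c
#include <assert.h>

/*
 * Number of vfsmounts after step i of the FAQ example, from the
 * recurrence V[i] = i * V[i-1] with V[1] = 1.  This is i!, so the
 * result overflows an unsigned long long beyond i = 20.
 */
unsigned long long rbind_mounts(unsigned int i)
{
	unsigned long long v = 1;	/* V[1]: the original vfsmount */

	assert(i >= 1 && i <= 20);
	for (unsigned int k = 2; k <= i; k++)
		v *= k;			/* V[k] = k * V[k-1] */
	return v;
}
```

	rbind_mounts(2), rbind_mounts(3) and rbind_mounts(4) give 2, 6 and 24,
	the counts quoted for steps 2 to 4.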

		One could use a series of umounts at each step to prune
		out the unneeded mounts. But there is a better solution:
		unbindable mounts come in handy here.

		step 1:
		   Let's say the root tree has just two directories with
		   one vfsmount.

			            root
			           /    \
			          tmp    usr

		   How do we set up the same tree at multiple locations under
		   /root/tmp?

		step 2:
			mount --bind /root/tmp /root/tmp

			mount --make-rshared /root
			mount --make-unbindable /root/tmp

			mkdir -p /tmp/m1

			mount --rbind /root /tmp/m1

			(Binding /root/tmp onto itself makes it a separate
			mount, so that it can be marked unbindable on its own.)

			The new tree now looks like this:

			            root
			           /    \
			         tmp    usr
			        /
			       m1
			      /  \
			     tmp  usr

		step 3:
			mkdir -p /tmp/m2
			mount --rbind /root /tmp/m2

			The new tree now looks like this:

			            root
			           /    \
			         tmp    usr
			        /   \
			       m1     m2
			      /  \     / \
			     tmp  usr tmp usr

		step 4:
			mkdir -p /tmp/m3
			mount --rbind /root /tmp/m3

			The new tree now looks like this:

			               root
			              /    \
			            tmp    usr
			        /    |    \
			    m1       m2       m3
			   /  \     /  \     /  \
			 tmp  usr tmp  usr tmp  usr

8) Implementation
-----------------

8A) Data structures

	Four new fields are introduced to struct vfsmount:
	->mnt_share
	->mnt_slave_list
	->mnt_slave
	->mnt_master

	->mnt_share links together all the mounts to/from which this vfsmount
		sends/receives propagation events.

	->mnt_slave_list links all the mounts to which this vfsmount
		propagates.

	->mnt_slave links together all the slaves that its master vfsmount
		propagates to.

	->mnt_master points to the master vfsmount from which this vfsmount
		receives propagation.

	->mnt_flags takes two more flags to indicate the propagation status of
		the vfsmount.  MNT_SHARED indicates that the vfsmount is a shared
		vfsmount.  MNT_UNBINDABLE indicates that the vfsmount cannot be
		replicated.

	All the shared vfsmounts in a peer group form a cyclic list through
	->mnt_share.

	All vfsmounts with the same ->mnt_master form a cyclic list anchored
	in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

	->mnt_master can point to arbitrary (and possibly different) members
	of the master peer group.  To find all immediate slaves of a peer
	group, you need to go through _all_ ->mnt_slave_list of its members.
	Conceptually it's just a single set; the distribution among the
	individual lists does not affect propagation or the way the
	propagation tree is modified by operations.

	All vfsmounts in a peer group have the same ->mnt_master.  If it is
	non-NULL, they form a contiguous (ordered) segment of the slave list.

	An example propagation tree looks as shown in the figure below.
	[ NOTE: Though it looks like a forest, if we consider all the shared
	mounts as a conceptual entity called 'pnode', it becomes a tree. ]


		        A <--> B <--> C <---> D
		       /|\            /|      |\
		      / F G          J K      H I
		     /
		    E<-->K
		        /|\
		       M L N

	In the above figure, A, B, C and D are all shared and propagate to
	each other.  'A' has 3 slave mounts, 'E', 'F' and 'G'; 'C' has 2 slave
	mounts, 'J' and 'K'; and 'D' has 2 slave mounts, 'H' and 'I'.
	'E' is also shared with 'K' and they propagate to each other.  And
	'K' has 3 slaves, 'M', 'L' and 'N'.

	A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'.

	A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'.

	E's ->mnt_share links with ->mnt_share of 'K'.
	'E', 'K', 'F' and 'G' have their ->mnt_master point to struct
	vfsmount of 'A'.
	'M', 'L' and 'N' have their ->mnt_master point to struct vfsmount of
	'K'.
	K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'.

	C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'.
	J's and K's ->mnt_master point to struct vfsmount of 'C'.
	And finally, D's ->mnt_slave_list links with ->mnt_slave of 'H' and
	'I', and 'H' and 'I' have their ->mnt_master pointing to struct
	vfsmount of 'D'.


	NOTE: The propagation tree is orthogonal to the mount tree.
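
	The list layout can be modelled in user space. The sketch below uses a
	minimal list_head and a hypothetical struct vmnt that carries only the
	four propagation fields plus a scratch 'visited' flag; it is not the
	kernel's code, and the recursive walk in count_receivers() only
	illustrates how going through _all_ ->mnt_slave_list of a peer group's
	members reaches every mount that an event propagates to.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the propagation fields of struct vfsmount. */
struct vmnt {
	struct list_head mnt_share;	 /* cyclic list of peers */
	struct list_head mnt_slave_list; /* head: my slaves */
	struct list_head mnt_slave;	 /* my entry in my master's slave list */
	struct vmnt *mnt_master;	 /* NULL if not a slave */
	int visited;			 /* scratch flag used by the walk below */
};

void vmnt_init(struct vmnt *m)
{
	list_init(&m->mnt_share);
	list_init(&m->mnt_slave_list);
	list_init(&m->mnt_slave);
	m->mnt_master = NULL;
	m->visited = 0;
}

/* Put 'b', which has no peers yet, into the peer group of 'a'. */
void add_peer(struct vmnt *a, struct vmnt *b)
{
	list_add_tail(&b->mnt_share, &a->mnt_share);
}

void add_slave(struct vmnt *slave, struct vmnt *master)
{
	slave->mnt_master = master;
	list_add_tail(&slave->mnt_slave, &master->mnt_slave_list);
}

/* Mark every member of m's peer group; return how many were newly marked. */
static size_t mark_group(struct vmnt *m)
{
	struct list_head *p = &m->mnt_share;
	size_t n = 0;

	do {
		struct vmnt *peer = container_of(p, struct vmnt, mnt_share);

		n += !peer->visited;
		peer->visited = 1;
		p = p->next;
	} while (p != &m->mnt_share);
	return n;
}

/* Go through the ->mnt_slave_list of _all_ members of m's peer group. */
static size_t walk_slaves(struct vmnt *m)
{
	struct list_head *p = &m->mnt_share;
	size_t n = 0;

	do {
		struct vmnt *peer = container_of(p, struct vmnt, mnt_share);
		struct list_head *s;

		for (s = peer->mnt_slave_list.next; s != &peer->mnt_slave_list;
		     s = s->next) {
			struct vmnt *slave = container_of(s, struct vmnt, mnt_slave);

			if (slave->visited)
				continue;		/* already reached via a peer */
			n += mark_group(slave);		/* the slave and its peers */
			n += walk_slaves(slave);	/* and everything below them */
		}
		p = p->next;
	} while (p != &m->mnt_share);
	return n;
}

/*
 * Count the mounts that receive a propagation event generated on 'm'.
 * All 'visited' flags must be clear before the call.
 */
size_t count_receivers(struct vmnt *m)
{
	return (mark_group(m) - 1) + walk_slaves(m);	/* minus m itself */
}
```

	With A <--> B as peers, E <--> K as peers that are both slaves of A,
	F a slave of A, H a slave of B and M a slave of K, count_receivers(&A)
	returns 6 (B, E, K, F, H and M).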

8B) Locking

	->mnt_share, ->mnt_slave, ->mnt_slave_list and ->mnt_master are
	protected by namespace_sem (exclusive for modifications, shared for
	reading).

	Normally, ->mnt_flags modifications are serialized by vfsmount_lock.
	There are two exceptions: do_add_mount() and clone_mnt().
	The former modifies a vfsmount that has not yet been visible in any
	shared data structures.
	The latter holds namespace_sem, and the only references to the
	vfsmount are in lists that can't be traversed without namespace_sem.

8C) Algorithm

	The crux of the implementation resides in the rbind/move operations.

	The overall algorithm breaks the operation into 3 phases (see
	attach_recursive_mnt() and propagate_mnt()):

	1. prepare phase
	2. commit phase
	3. abort phase

	Prepare phase:

	For each mount in the source tree:
		a) Create the necessary number of mount trees to be attached
		   to each of the mounts that receive propagation from the
		   destination mount.
		b) Do not attach any of the trees to its destination.
		   However, note down its ->mnt_parent and ->mnt_mountpoint.
		c) Link all the new mounts to form a propagation tree that
		   is identical to the propagation tree of the destination
		   mount.

		If this phase is successful, there should be 'n' new
		propagation trees, where 'n' is the number of mounts in the
		source tree, and 'm' new mount trees, where 'm' is the number
		of mounts to which the destination mount propagates. Go to the
		commit phase.

		If any memory allocation fails, go to the abort phase.

	Commit phase:
		Attach each of the mount trees to its corresponding
		destination mount.

	Abort phase:
		Delete all the newly created trees.

	NOTE: all the propagation-related functionality resides in the file
	fs/pnode.c.
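
	The prepare/commit/abort split is an all-or-nothing pattern. The
	sketch below shows only that pattern, with hypothetical types (struct
	copy, struct dest) and a hypothetical allocator alloc_copy() that can
	be told to fail through fail_at; it does not reproduce
	attach_recursive_mnt() or propagate_mnt().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct copy {
	int id;
};

struct dest {
	struct copy *attached;	/* NULL until the commit phase */
};

/* Example allocator: fails when id == fail_at, to simulate running out of memory. */
int fail_at = -1;

struct copy *alloc_copy(int id)
{
	struct copy *c;

	if (id == fail_at)
		return NULL;
	c = malloc(sizeof(*c));
	if (c != NULL)
		c->id = id;
	return c;
}

/*
 * Attach a fresh copy to each of the 'n' destinations, all or nothing:
 *  prepare: allocate every copy without attaching any of them;
 *  commit:  attach the copies once every allocation has succeeded;
 *  abort:   on any failure, free what was allocated and leave the
 *           destinations untouched.
 * Returns true if the operation was committed.
 */
bool attach_all(struct dest *dests, size_t n, struct copy *(*alloc)(int))
{
	struct copy **pending = calloc(n ? n : 1, sizeof(*pending));
	size_t i;

	assert(dests != NULL || n == 0);
	if (pending == NULL)
		return false;

	for (i = 0; i < n; i++) {		/* prepare phase */
		pending[i] = alloc((int)i);
		if (pending[i] == NULL)
			goto undo;
	}
	for (i = 0; i < n; i++)			/* commit phase */
		dests[i].attached = pending[i];
	free(pending);
	return true;

undo:						/* abort phase */
	while (i-- > 0)
		free(pending[i]);
	free(pending);
	return false;
}
```

	Keeping every allocation ahead of the first attachment is what makes
	the abort phase simple: nothing visible has changed yet, so undoing
	only means freeing the pending copies.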


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
version 0.2  (Incorporated comments from Al Viro)

Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.