Documentation / filesystems / files.txt


Based on kernel version 5.7.10. Page generated on 2020-07-23 22:17 EST.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
File management in the Linux kernel
-----------------------------------

This document describes how locking for files (struct file)
and file descriptor table (struct files) works.

Up until 2.6.12, the file descriptor table has been protected
with a lock (files->file_lock) and reference count (files->count).
->file_lock protected accesses to all the file related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with CLONE_FILES flag. Typically
this would be the case for posix threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements - the fd sets (open_fds and close_on_exec, the
array of file pointers, the sizes of the sets and the array
etc.). In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure - struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdtab points to the new structure. The fdtable
structure is freed with RCU and lock-free readers either
see the old fdtable or the new fdtable making the update
appear atomic. Here are the locking rules for
the fdtable structure -

1. All references to the fdtable must be done through
   the files_fdtable() macro :

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n <= fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses rcu_dereference() macro which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either fcheck() or fcheck_files() APIs. These
   take care of barrier requirements due to lock-free lookup.
   An example :

	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) are lock-free, it is possible
   that look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count :

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
		/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() detects if refcounts is already zero or
   goes to zero during increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However it is advisable to use files_fdtable()
   and fcheck()/fcheck_files() which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread expand the files thereby creating a new
   fdtable and making the earlier fdtable pointer stale.
   For example :

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
		spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().