Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

1	
2	The SGI XFS Filesystem
3	======================
4	
5	XFS is a high performance journaling filesystem which originated
6	on the SGI IRIX platform.  It is completely multi-threaded, can
7	support large files and large filesystems, extended attributes,
8	variable block sizes, is extent based, and makes extensive use of
9	Btrees (directories, extents, free space) to aid both performance
10	and scalability.
11	
12	Refer to the documentation at http://oss.sgi.com/projects/xfs/
13	for further details.  This implementation is on-disk compatible
14	with the IRIX version of XFS.
15	
16	
17	Mount Options
18	=============
19	
20	When mounting an XFS filesystem, the following options are accepted.
21	For boolean mount options, the name with the (*) suffix is the
22	default behaviour.
23	
24	  allocsize=size
25		Sets the buffered I/O end-of-file preallocation size when
26		doing delayed allocation writeout (default size is 64KiB).
27		Valid values for this option are page size (typically 4KiB)
28		through to 1GiB, inclusive, in power-of-2 increments.
29	
30		The default behaviour is for dynamic end-of-file
31		preallocation size, which uses a set of heuristics to
32		optimise the preallocation size based on the current
33		allocation patterns within the file and the access patterns
34		to the file. Specifying a fixed allocsize value turns off
35		the dynamic behaviour.
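
The allocsize constraint above (a power of two between the page size and
1GiB, inclusive) can be checked with a small helper. This is a minimal
sketch, not kernel code; the function name is hypothetical and the 4KiB
default page size is an assumption taken from the "typically 4KiB" remark.

```python
def is_valid_allocsize(size_bytes, page_size=4096):
    """Return True if size_bytes is a power of two in [page_size, 1 GiB].

    page_size defaults to 4 KiB, which is an assumption; use the real
    page size of the target machine.
    """
    one_gib = 1 << 30
    # A positive integer is a power of two when exactly one bit is set.
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    return is_power_of_two and page_size <= size_bytes <= one_gib
```

For example, 65536 (the documented 64KiB default) passes, while 12288
(not a power of two) and 2GiB (above the limit) do not.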
36	
37	  attr2
38	  noattr2
39		The options enable/disable an "opportunistic" improvement to
40		be made in the way inline extended attributes are stored
41		on-disk.  When the new form is used for the first time when
42		attr2 is selected (either when setting or removing extended
43		attributes) the on-disk superblock feature bit field will be
44		updated to reflect this format being in use.
45	
46		The default behaviour is determined by the on-disk feature
47		bit indicating that attr2 behaviour is active. If either
48		mount option is set, then that becomes the new default used
49		by the filesystem.
50	
51		CRC enabled filesystems always use the attr2 format, and so
52		will reject the noattr2 mount option if it is set.
53	
54	  discard
55	  nodiscard (*)
56		Enable/disable the issuing of commands to let the block
57		device reclaim space freed by the filesystem.  This is
58		useful for SSD devices, thinly provisioned LUNs and virtual
59		machine images, but may have a performance impact.
60	
61		Note: It is currently recommended that you use the fstrim
62		application to discard unused blocks rather than the discard
63		mount option because the performance impact of this option
64		is quite severe.
65	
66	  grpid/bsdgroups
67	  nogrpid/sysvgroups (*)
68		These options define what group ID a newly created file
69		gets.  When grpid is set, it takes the group ID of the
70		directory in which it is created; otherwise it takes the
71		fsgid of the current process, unless the directory has the
72		setgid bit set, in which case it takes the gid from the
73		parent directory, and also gets the setgid bit set if it is
74		a directory itself.
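
The group ID selection described above can be summarised as a small
decision function. This is only an illustration of the documented rule;
the function and parameter names are hypothetical and do not correspond
to kernel symbols.

```python
def new_file_gid(grpid, parent_gid, parent_has_setgid, fsgid):
    """Pick the group ID for a newly created file.

    grpid:             True for grpid/bsdgroups, False for nogrpid/sysvgroups
    parent_gid:        group ID of the directory the file is created in
    parent_has_setgid: whether that directory has the setgid bit set
    fsgid:             fsgid of the creating process
    """
    # With grpid, or with a setgid parent directory, inherit from the parent.
    if grpid or parent_has_setgid:
        return parent_gid
    # Otherwise fall back to the fsgid of the current process.
    return fsgid
```

The sketch covers only the group ID; per the text above, a new directory
created under a setgid parent also gets the setgid bit set.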
75	
76	  filestreams
77		Make the data allocator use the filestreams allocation mode
78		across the entire filesystem rather than just on directories
79		configured to use it.
80	
81	  ikeep
82	  noikeep (*)
83		When ikeep is specified, XFS does not delete empty inode
84		clusters and keeps them around on disk.  When noikeep is
85		specified, empty inode clusters are returned to the free
86		space pool.
87	
88	  inode32
89	  inode64 (*)
90		When inode32 is specified, it indicates that XFS limits
91		inode creation to locations which will not result in inode
92		numbers with more than 32 bits of significance.
93	
94		When inode64 is specified, it indicates that XFS is allowed
95		to create inodes at any location in the filesystem,
96		including those which will result in inode numbers occupying
97		more than 32 bits of significance. 
98	
99		inode32 is provided for backwards compatibility with older
100		systems and applications, since 64-bit inode numbers might
101		cause problems for some applications that cannot handle
102		large inode numbers.  If applications are in use which do
103		not handle inode numbers bigger than 32 bits, the inode32
104		option should be specified.
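
As a rough illustration of "more than 32 bits of significance", the
following sketch tests whether a given inode number would be a problem
for an application limited to 32-bit inode numbers. The function name is
hypothetical.

```python
def needs_inode64(inode_number):
    """Return True if inode_number has more than 32 significant bits."""
    return inode_number.bit_length() > 32
```

An inode number of 2**32 - 1 still fits in 32 bits, whereas 2**32 does
not.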
105	
106	
107	  largeio
108	  nolargeio (*)
109		If "nolargeio" is specified, the optimal I/O reported in
110		st_blksize by stat(2) will be as small as possible to allow
111		user applications to avoid inefficient read/modify/write
112		I/O.  This is typically the page size of the machine, as
113		this is the granularity of the page cache.
114	
115		If "largeio" is specified, a filesystem that was created with a
116		"swidth" specified will return the "swidth" value (in bytes)
117		in st_blksize. If the filesystem does not have a "swidth"
118		specified but does specify an "allocsize" then "allocsize"
119		(in bytes) will be returned instead. Otherwise the behaviour
120		is the same as if "nolargeio" was specified.
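
The st_blksize selection described above can be sketched as follows.
This is an illustration only; the function and parameter names are
hypothetical, a value of 0 is assumed to mean "not specified", and the
page size is used for the nolargeio case because the text says this is
typical.

```python
def reported_blksize(largeio, page_size=4096, swidth_bytes=0, allocsize_bytes=0):
    """Return the st_blksize value suggested by the largeio rules.

    swidth_bytes / allocsize_bytes of 0 mean "not specified" (an
    assumption made for this sketch).
    """
    if largeio:
        if swidth_bytes:
            # A filesystem created with a swidth reports that value.
            return swidth_bytes
        if allocsize_bytes:
            # Otherwise a configured allocsize is reported instead.
            return allocsize_bytes
    # nolargeio, or largeio with neither value set: typically the page size.
    return page_size
```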
121	
122	  logbufs=value
123		Set the number of in-memory log buffers.  Valid numbers
124		range from 2-8 inclusive.
125	
126		The default value is 8 buffers.
127	
128		If the memory cost of 8 log buffers is too high on small
129		systems, then it may be reduced at some cost to performance
130		on metadata intensive workloads. The logbsize option below
131		controls the size of each buffer and so is also relevant to
132		this case.
133	
134	  logbsize=value
135		Set the size of each in-memory log buffer.  The size may be
136		specified in bytes, or in kilobytes with a "k" suffix.
137		Valid sizes for version 1 and version 2 logs are 16384 (16k)
138		and 32768 (32k).  Valid sizes for version 2 logs also
139		include 65536 (64k), 131072 (128k) and 262144 (256k). The
140		logbsize must be an integer multiple of the log
141		stripe unit configured at mkfs time.
142	
143		The default value for version 1 logs is 32768, while the
144		default value for version 2 logs is MAX(32768, log_sunit).
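
The logbsize rules above can be expressed as a short validity check plus
the documented default. This is a sketch under stated assumptions: the
names are hypothetical, log_sunit is taken to be in bytes, and a
log_sunit of 0 is taken to mean that no log stripe unit was configured.

```python
V1_SIZES = (16384, 32768)
V2_SIZES = V1_SIZES + (65536, 131072, 262144)


def is_valid_logbsize(size, log_version=2, log_sunit=0):
    """Check size against the allowed list and the log stripe unit."""
    allowed = V2_SIZES if log_version == 2 else V1_SIZES
    if size not in allowed:
        return False
    # Must be an integer multiple of the log stripe unit, if one is set.
    return log_sunit == 0 or size % log_sunit == 0


def default_logbsize(log_version=2, log_sunit=0):
    """Return 32768 for version 1 logs, MAX(32768, log_sunit) for version 2."""
    return 32768 if log_version == 1 else max(32768, log_sunit)
```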
145	
146	  logdev=device and rtdev=device
147		Use an external log (metadata journal) and/or real-time device.
148		An XFS filesystem has up to three parts: a data section, a log
149		section, and a real-time section.  The real-time section is
150		optional, and the log section can be separate from the data
151		section or contained within it.
152	
153	  noalign
154		Data allocations will not be aligned at stripe unit
155		boundaries. This is only relevant to filesystems created
156		with non-zero data alignment parameters (sunit, swidth) by
157		mkfs.
158	
159	  norecovery
160		The filesystem will be mounted without running log recovery.
161		If the filesystem was not cleanly unmounted, it is likely to
162		be inconsistent when mounted in "norecovery" mode.
163		Some files or directories may not be accessible because of this.
164		Filesystems mounted "norecovery" must be mounted read-only or
165		the mount will fail.
166	
167	  nouuid
168		Don't check for double mounted file systems using the file
169		system uuid.  This is useful to mount LVM snapshot volumes,
170		and often used in combination with "norecovery" for mounting
171		read-only snapshots.
172	
173	  noquota
174		Forcibly turns off all quota accounting and enforcement
175		within the filesystem.
176	
177	  uquota/usrquota/uqnoenforce/quota
178		User disk quota accounting enabled, and limits (optionally)
179		enforced.  Refer to xfs_quota(8) for further details.
180	
181	  gquota/grpquota/gqnoenforce
182		Group disk quota accounting enabled and limits (optionally)
183		enforced.  Refer to xfs_quota(8) for further details.
184	
185	  pquota/prjquota/pqnoenforce
186		Project disk quota accounting enabled and limits (optionally)
187		enforced.  Refer to xfs_quota(8) for further details.
188	
189	  sunit=value and swidth=value
190		Used to specify the stripe unit and width for a RAID device
191		or a stripe volume.  "value" must be specified in 512-byte
192		block units. These options are only relevant to filesystems
193		that were created with non-zero data alignment parameters.
194	
195		The sunit and swidth parameters specified must be compatible
196		with the existing filesystem alignment characteristics.  In
197		general, that means the only valid changes to sunit are
198		increasing it by a power-of-2 multiple. Valid swidth values
199		are any integer multiple of a valid sunit value.
200	
201		Typically the only time these mount options are necessary is
202		after an underlying RAID device has had its geometry
203		modified, such as adding a new disk to a RAID5 lun and
204		reshaping it.
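
Because sunit and swidth are given in 512-byte units, a byte-sized
stripe geometry has to be converted first. The sketch below shows one
way to do that and to check the "power-of-2 multiple" rule. The function
names are hypothetical, and computing swidth as sunit times the number
of data disks is a common convention assumed here rather than something
stated above.

```python
SECTOR = 512  # sunit and swidth are expressed in 512-byte units


def stripe_mount_options(stripe_unit_bytes, data_disks):
    """Build a 'sunit=...,swidth=...' option string from a byte geometry."""
    if stripe_unit_bytes <= 0 or stripe_unit_bytes % SECTOR:
        raise ValueError("stripe unit must be a positive multiple of 512 bytes")
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    sunit = stripe_unit_bytes // SECTOR
    swidth = sunit * data_disks  # always an integer multiple of sunit
    return f"sunit={sunit},swidth={swidth}"


def is_power_of_two_multiple(old_sunit, new_sunit):
    """Return True if new_sunit is old_sunit times a power of two (>= 1)."""
    if old_sunit <= 0 or new_sunit < old_sunit or new_sunit % old_sunit:
        return False
    factor = new_sunit // old_sunit
    return (factor & (factor - 1)) == 0
```

For a 64KiB stripe unit across 4 data disks, this yields
"sunit=128,swidth=512".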
205	
206	  swalloc
207		Data allocations will be rounded up to stripe width boundaries
208		when the current end of file is being extended and the file
209		size is larger than the stripe width size.
210	
211	  wsync
212		When specified, all filesystem namespace operations are
213		executed synchronously. This ensures that when the namespace
214		operation (create, unlink, etc) completes, the change to the
215		namespace is on stable storage. This is useful in HA setups
216		where failover must not result in clients seeing
217		inconsistent namespace presentation during or after a
218		failover event.
219	
220	
221	Deprecated Mount Options
222	========================
223	
224	  Name				Removal Schedule
225	  ----				----------------
226	  barrier			no earlier than v4.15
227	  nobarrier			no earlier than v4.15
228	
229	
230	Removed Mount Options
231	=====================
232	
233	  Name				Removed
234	  ----				-------
235	  delaylog/nodelaylog		v4.0
236	  ihashsize			v4.0
237	  irixsgid			v4.0
238	  osyncisdsync/osyncisosync	v4.0
239	
240	
241	sysctls
242	=======
243	
244	The following sysctls are available for the XFS filesystem:
245	
246	  fs.xfs.stats_clear		(Min: 0  Default: 0  Max: 1)
247		Setting this to "1" clears accumulated XFS statistics
248		in /proc/fs/xfs/stat.  It then immediately resets to "0".
249	
250	  fs.xfs.xfssyncd_centisecs	(Min: 100  Default: 3000  Max: 720000)
251		The interval at which the filesystem flushes metadata
252		out to disk and runs internal cache cleanup routines.
253	
254	  fs.xfs.filestream_centisecs	(Min: 1  Default: 3000  Max: 360000)
255		The interval at which the filesystem ages filestreams cache
256		references and returns timed-out AGs back to the free stream
257		pool.
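
The two interval sysctls above take values in centiseconds, so the
default of 3000 corresponds to 30 seconds. A small conversion helper
with the documented ranges might look like this; the helper name and the
LIMITS table are hypothetical conveniences, not part of any XFS tool.

```python
# (min, max) in centiseconds, copied from the ranges listed above.
LIMITS = {
    "fs.xfs.xfssyncd_centisecs": (100, 720000),
    "fs.xfs.filestream_centisecs": (1, 360000),
}


def seconds_to_centisecs(name, seconds):
    """Convert seconds to centiseconds and enforce the sysctl's range."""
    value = round(seconds * 100)
    low, high = LIMITS[name]
    if not low <= value <= high:
        raise ValueError(f"{name}: {value} is outside {low}..{high}")
    return value
```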
258	
259	  fs.xfs.speculative_prealloc_lifetime
260			(Units: seconds   Min: 1  Default: 300  Max: 86400)
261		The interval at which the background scanning for inodes
262		with unused speculative preallocation runs. The scan
263		removes unused preallocation from clean inodes and releases
264		the unused space back to the free pool.
265	
266	  fs.xfs.error_level		(Min: 0  Default: 3  Max: 11)
267		A volume knob for error reporting when internal errors occur.
268		This will generate detailed messages & backtraces for filesystem
269		shutdowns, for example.  Current threshold values are:
270	
271			XFS_ERRLEVEL_OFF:       0
272			XFS_ERRLEVEL_LOW:       1
273			XFS_ERRLEVEL_HIGH:      5
274	
275	  fs.xfs.panic_mask		(Min: 0  Default: 0  Max: 255)
276		Causes certain error conditions to call BUG(). Value is a bitmask;
277		OR together the tags which represent errors which should cause panics:
278	
279			XFS_NO_PTAG                     0
280			XFS_PTAG_IFLUSH                 0x00000001
281			XFS_PTAG_LOGRES                 0x00000002
282			XFS_PTAG_AILDELETE              0x00000004
283			XFS_PTAG_ERROR_REPORT           0x00000008
284			XFS_PTAG_SHUTDOWN_CORRUPT       0x00000010
285			XFS_PTAG_SHUTDOWN_IOERROR       0x00000020
286			XFS_PTAG_SHUTDOWN_LOGERROR      0x00000040
287			XFS_PTAG_FSBLOCK_ZERO           0x00000080
288	
289		This option is intended for debugging only.
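
To make the bitmask concrete, the sketch below ORs tags together into a
panic_mask value. The tag values are copied from the table above; the
helper function is hypothetical.

```python
XFS_PTAG_IFLUSH = 0x00000001
XFS_PTAG_LOGRES = 0x00000002
XFS_PTAG_AILDELETE = 0x00000004
XFS_PTAG_ERROR_REPORT = 0x00000008
XFS_PTAG_SHUTDOWN_CORRUPT = 0x00000010
XFS_PTAG_SHUTDOWN_IOERROR = 0x00000020
XFS_PTAG_SHUTDOWN_LOGERROR = 0x00000040
XFS_PTAG_FSBLOCK_ZERO = 0x00000080


def panic_mask(*tags):
    """OR the given tags together into a single panic_mask value."""
    mask = 0
    for tag in tags:
        mask |= tag
    return mask
```

Selecting XFS_PTAG_SHUTDOWN_CORRUPT and XFS_PTAG_SHUTDOWN_IOERROR gives
0x30 (48 decimal); selecting all eight tags gives the documented maximum
of 255.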
290	
291	  fs.xfs.irix_symlink_mode	(Min: 0  Default: 0  Max: 1)
292		Controls whether symlinks are created with mode 0777 (default)
293		or whether their mode is affected by the umask (irix mode).
294	
295	  fs.xfs.irix_sgid_inherit	(Min: 0  Default: 0  Max: 1)
296		Controls files created in SGID directories.
297		If the group ID of the new file does not match the effective group
298		ID or one of the supplementary group IDs of the parent dir, the
299		ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl
300		is set.
301	
302	  fs.xfs.inherit_sync		(Min: 0  Default: 1  Max: 1)
303		Setting this to "1" will cause the "sync" flag set
304		by the xfs_io(8) chattr command on a directory to be
305		inherited by files in that directory.
306	
307	  fs.xfs.inherit_nodump		(Min: 0  Default: 1  Max: 1)
308		Setting this to "1" will cause the "nodump" flag set
309		by the xfs_io(8) chattr command on a directory to be
310		inherited by files in that directory.
311	
312	  fs.xfs.inherit_noatime	(Min: 0  Default: 1  Max: 1)
313		Setting this to "1" will cause the "noatime" flag set
314		by the xfs_io(8) chattr command on a directory to be
315		inherited by files in that directory.
316	
317	  fs.xfs.inherit_nosymlinks	(Min: 0  Default: 1  Max: 1)
318		Setting this to "1" will cause the "nosymlinks" flag set
319		by the xfs_io(8) chattr command on a directory to be
320		inherited by files in that directory.
321	
322	  fs.xfs.inherit_nodefrag	(Min: 0  Default: 1  Max: 1)
323		Setting this to "1" will cause the "nodefrag" flag set
324		by the xfs_io(8) chattr command on a directory to be
325		inherited by files in that directory.
326	
327	  fs.xfs.rotorstep		(Min: 1  Default: 1  Max: 256)
328		In "inode32" allocation mode, this option determines how many
329		files the allocator attempts to allocate in the same allocation
330		group before moving to the next allocation group.  The intent
331		is to control the rate at which the allocator moves between
332		allocation groups when allocating extents for new files.
333	
334	Deprecated Sysctls
335	==================
336	
337	None at present.
338	
339	
340	Removed Sysctls
341	===============
342	
343	  Name				Removed
344	  ----				-------
345	  fs.xfs.xfsbufd_centisec	v4.0
346	  fs.xfs.age_buffer_centisecs	v4.0
347	
348	
349	Error handling
350	==============
351	
352	XFS can act differently according to the type of error found during its
353	operation. The implementation introduces the following concepts to the error
354	handler:
355	
356	 -failure speed:
357		Defines how fast XFS should propagate an error upwards when a specific
358		error is found during the filesystem operation. It can propagate
359		immediately, after a defined number of retries, after a set time period,
360		or simply retry forever.
361	
362	 -error classes:
363		Specifies the subsystem the error configuration will apply to, such as
364		metadata IO or memory allocation. Different subsystems will have
365		different error handlers for which behaviour can be configured.
366	
367	 -error handlers:
368		Defines the behavior for a specific error.
369	
370	The filesystem behavior during an error can be set via sysfs files. Each
371	error handler works independently - the first condition met by an error handler
372	for a specific class will cause the error to be propagated rather than reset and
373	retried.
374	
375	The action taken by the filesystem when the error is propagated is context
376	dependent - it may cause a shut down in the case of an unrecoverable error,
377	it may be reported back to userspace, or it may even be ignored because
378	there's nothing useful we can do with the error or anyone we can report it
379	to (e.g. during unmount).
380	
381	The configuration files are organized into the following hierarchy for each
382	mounted filesystem:
383	
384	  /sys/fs/xfs/<dev>/error/<class>/<error>/
385	
386	Where:
387	  <dev>
388		The short device name of the mounted filesystem. This is the same device
389		name that shows up in XFS kernel error messages as "XFS(<dev>): ..."
390	
391	  <class>
392		The subsystem the error configuration belongs to. As of 4.9, the defined
393		classes are:
394	
395			- "metadata": applies to metadata buffer write IO
396	
397	  <error>
398		The individual error handler configurations.
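
The hierarchy above maps directly onto a path. As a minimal sketch, the
helper below assembles the directory for one error handler; the function
name is hypothetical, and the device name "sda1" used in the usage note
is only an example.

```python
import posixpath


def error_config_dir(dev, error_class="metadata", error="default"):
    """Return /sys/fs/xfs/<dev>/error/<class>/<error> for the given names."""
    return posixpath.join("/sys/fs/xfs", dev, "error", error_class, error)
```

For instance, error_config_dir("sda1", "metadata", "ENODEV") yields
"/sys/fs/xfs/sda1/error/metadata/ENODEV".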
399	
400	
401	Each filesystem has "global" error configuration options defined in their top
402	level directory:
403	
404	  /sys/fs/xfs/<dev>/error/
405	
406	  fail_at_unmount		(Min:  0  Default:  1  Max: 1)
407		Defines the filesystem error behavior at unmount time.
408	
409		If set to a value of 1, XFS will override all other error configurations
410		during unmount and replace them with "immediate fail" characteristics.
411		i.e. no retries, no retry timeout. This will always allow unmount to
412		succeed when there are persistent errors present.
413	
414		If set to 0, the configured retry behaviour will continue until all
415		retries and/or timeouts have been exhausted. This will delay unmount
416		completion when there are persistent errors, and it may prevent the
417		filesystem from ever unmounting fully in the case of "retry forever"
418		handler configurations.
419	
420		Note: there is no guarantee that fail_at_unmount can be set whilst an
421		unmount is in progress. It is possible that the sysfs entries are
422		removed by the unmounting filesystem before a "retry forever" error
423		handler configuration causes unmount to hang, and hence the filesystem
424		must be configured appropriately before unmount begins to prevent
425		unmount hangs.
426	
427	Each filesystem has specific error class handlers that define the error
428	propagation behaviour for specific errors. There is also a "default" error
429	handler defined, which defines the behaviour for all errors that don't have
430	specific handlers defined. Where multiple retry constraints are configured for
431	a single error, the first retry configuration that expires will cause the error
432	to be propagated. The handler configurations are found in the directory:
433	
434	  /sys/fs/xfs/<dev>/error/<class>/<error>/
435	
436	  max_retries			(Min: -1  Default: Varies  Max: INTMAX)
437		Defines the allowed number of retries of a specific error before
438		the filesystem will propagate the error. The retry count for a given
439		error context (e.g. a specific metadata buffer) is reset every time
440		there is a successful completion of the operation.
441	
442		Setting the value to "-1" will cause XFS to retry forever for this
443		specific error.
444	
445		Setting the value to "0" will cause XFS to fail immediately when the
446		specific error is reported.
447	
448		Setting the value to "N" (where 0 < N < Max) will make XFS retry the
449		operation "N" times before propagating the error.
450	
451	  retry_timeout_seconds		(Min:  -1  Default:  Varies  Max: 1 day)
452		Define the amount of time (in seconds) that the filesystem is
453		allowed to retry its operations when the specific error is
454		found.
455	
456		Setting the value to "-1" will allow XFS to retry forever for this
457		specific error.
458	
459		Setting the value to "0" will cause XFS to fail immediately when the
460		specific error is reported.
461	
462		Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the
463		operation for up to "N" seconds before propagating the error.
464	
465	Note: The default behaviour for a specific error handler is dependent on both
466	the class and error context. For example, the default values for
467	"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
468	to "fail immediately" behaviour. This is done because ENODEV is a fatal,
469	unrecoverable error no matter how many times the metadata IO is retried.
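
The interaction of max_retries and retry_timeout_seconds can be modelled
as below. This is a simplified sketch of the documented semantics, not
the kernel's implementation; the function and parameter names are
hypothetical.

```python
def should_propagate(retries_done, elapsed_seconds,
                     max_retries, retry_timeout_seconds):
    """Return True once either configured retry limit has expired.

    A limit of -1 means "retry forever" for that constraint; 0 means
    "fail immediately". The first constraint to expire wins.
    """
    retries_exhausted = max_retries != -1 and retries_done >= max_retries
    timeout_expired = (retry_timeout_seconds != -1
                       and elapsed_seconds >= retry_timeout_seconds)
    return retries_exhausted or timeout_expired
```

With both values set to 0, as for "metadata/ENODEV", the error is
propagated on first occurrence; with both set to -1 it is never
propagated by these limits.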