Based on kernel version 3.16. Page generated on 2014-08-06 21:41 EST.
This file contains brief information about the SCSI tape driver.
The driver is currently maintained by Kai Mäkisara (email
Kai.Makisara@kolumbus.fi)

Last modified: Sun Aug 29 18:25:47 2010 by kai.makisara


BASICS

The driver is generic, i.e., it does not contain any code tailored
to any specific tape drive. The tape parameters can be specified with
one of the following three methods:

1. Each user can specify the tape parameters he/she wants to use
directly with ioctls. This is administratively a very simple and
flexible method and applicable to single-user workstations. However,
in a multiuser environment the next user finds the tape parameters in
the state the previous user left them.

2. The system manager (root) can define default values for some tape
parameters, like block size and density, using the MTSETDRVBUFFER ioctl.
These parameters can be programmed to come into effect either when a
new tape is loaded into the drive or when writing begins at the
beginning of the tape. The second alternative (defaults applied when
writing starts at the beginning of the tape) is applicable if the tape
drive performs auto-detection of the tape format well (like some
QIC-drives). The result is that any tape can be read, writing can be
continued using the existing format, and the default format is used if
the tape is rewritten from the beginning (or a new tape is written
for the first time). The first alternative (defaults applied when a new
tape is loaded) is applicable if the drive does not perform
auto-detection well enough and there is a single "sensible" mode for
the device. An example is a DAT drive that is used only in variable
block mode (I don't know if this is sensible or not :-).

The user can override the parameters defined by the system
manager. The changes persist until the defaults again come into
effect.

3. By default, up to four modes can be defined and selected using the minor
number (bits 5 and 6). The number of modes can be changed by changing
ST_NBR_MODE_BITS in st.h.
Mode 0 corresponds to the defaults discussed
above. Additional modes are dormant until they are defined by the
system manager (root). When specification of a new mode is started,
the configuration of mode 0 is used to provide a starting point for
definition of the new mode.

Using the modes allows the system manager to give the users choices
over some of the buffering parameters not directly accessible to the
users (buffered and asynchronous writes). The modes also allow choices
between formats in multi-tape operations (the explicitly overridden
parameters are reset when a new tape is loaded).

If more than one mode is used, all modes should contain definitions
for the same set of parameters.

Many Unices contain internal tables that associate different modes with
supported devices. The Linux SCSI tape driver does not contain such
tables (and will not do so in the future). Instead, a utility
program can be written that fetches the inquiry data sent by the device,
scans its database, and sets up the modes using the ioctls. Another
alternative is to write a small script that uses mt to set the defaults
tailored to the system.

The driver supports fixed and variable block size (within buffer
limits). Both the auto-rewind (minor equals device number) and
non-rewind devices (minor is 128 + device number) are implemented.

In variable block mode, the byte count in write() determines the size
of the physical block on tape. When reading, the drive reads the next
tape block and returns the data to the user if the read() byte count
is at least the block size. Otherwise, error ENOMEM is returned.

In fixed block mode, the data transfer between the drive and the
driver is in multiples of the block size. The write() byte count must
be a multiple of the block size. This is not required when reading but
may be advisable for portability.

Support is provided for changing the tape partition and partitioning
of the tape with one or two partitions. By default support for
partitioned tape is disabled for each drive and it can be enabled
with the ioctl MTSETDRVBUFFER.

By default the driver writes one filemark when the device is closed after
writing and the last operation has been a write. Two filemarks can be
optionally written. In both cases end of data is signified by
returning zero bytes for two consecutive reads.

Writing filemarks without the immediate bit set in the SCSI command block acts
as a synchronization point, i.e., all remaining data from the drive buffers is
written to tape before the command returns. This makes sure that write errors
are caught at that point, but this takes time. In some applications, several
consecutive files must be written fast. The MTWEOFI operation can be used to
write the filemarks without flushing the drive buffer. Writing a filemark at
close() always flushes the drive buffers. However, if the previous
operation is MTWEOFI, close() does not write a filemark. This can be used if
the program wants to close/open the tape device between files and wants to
skip waiting.

If rewind, offline, bsf, or seek is done and the previous tape operation was
a write, a filemark is written before moving the tape.

The compile options are defined in the file linux/drivers/scsi/st_options.h.

4. If the open option O_NONBLOCK is used, open succeeds even if the
drive is not ready. If O_NONBLOCK is not used, the driver waits for
the drive to become ready. If this does not happen in ST_BLOCK_SECONDS
seconds, open fails with the errno value EIO. With O_NONBLOCK the
device can be opened for writing even if there is a write-protected
tape in the drive (commands trying to write something return an error if
attempted).
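
As a rough illustration of two points in BASICS, the C sketch below shows how root might set a default block size with MTSETDRVBUFFER (method 2). It also shows how a program might write one file in fixed block mode and end it with MTWEOFI, so that the call does not wait for the drive buffer to be flushed. The sketch assumes the definitions from <linux/mtio.h> (struct mtop, MTIOCTOP, MTSETDRVBUFFER, MT_ST_DEF_BLKSIZE, MT_ST_OPTIONS, MTWEOFI). The helper names set_default_blksize() and write_tape_file() are hypothetical. The caller is assumed to have opened a tape device (for example the non-rewind node /dev/nst0) and to know the current fixed block size.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/mtio.h>

/* Hypothetical helper (must run as root): make `blksize` the default block
 * size for the mode selected by the opened device node.  MTSETDRVBUFFER
 * takes the subcommand in the MT_ST_OPTIONS bits of mt_count and the
 * argument in the remaining low-order bits. */
static int set_default_blksize(int fd, unsigned int blksize)
{
    struct mtop op;

    /* The argument must not spill into the subcommand field. */
    if ((blksize & MT_ST_OPTIONS) != 0) {
        errno = EINVAL;
        return -1;
    }
    op.mt_op = MTSETDRVBUFFER;
    op.mt_count = MT_ST_DEF_BLKSIZE | (int)blksize;
    return ioctl(fd, MTIOCTOP, &op);
}

/* Hypothetical helper: write one file of `len` bytes in fixed block mode and
 * terminate it with a single filemark written with the immediate bit set
 * (MTWEOFI).  The ioctl does not wait for the drive buffer to be flushed,
 * so write errors may only be reported by a later tape operation. */
static int write_tape_file(int fd, const void *buf, size_t len,
                           size_t blksize)
{
    struct mtop op;
    ssize_t n;

    /* In fixed block mode the write() byte count must be a multiple of the
     * block size; reject other lengths before touching the drive. */
    if (blksize == 0 || len % blksize != 0) {
        errno = EINVAL;
        return -1;
    }
    n = write(fd, buf, len);
    if (n < 0)
        return -1;
    if ((size_t)n != len) {
        /* This sketch treats a short write as an error (EIO is an
         * arbitrary choice). */
        errno = EIO;
        return -1;
    }
    op.mt_op = MTWEOFI;
    op.mt_count = 1;
    return ioctl(fd, MTIOCTOP, &op);
}
```

A program writing several files could call write_tape_file() once per file on the non-rewind device and then close it. Because the last operation was MTWEOFI, close() does not add another filemark. This trades the synchronization point described above for speed.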


MINOR NUMBERS

The tape driver currently supports up to 2^17 drives if 4 modes for
each drive are used.

The minor numbers consist of the following bit fields:

  dev_upper non-rew mode dev-lower
    20 - 8     7    6 5   4      0

The non-rewind bit is always bit 7 (the uppermost bit in the lowermost
byte). The bits defining the mode are below the non-rewind bit. The
remaining bits define the tape device number. This numbering is
backward compatible with the numbering used when the minor number was
only 8 bits wide.


SYSFS SUPPORT

The driver creates the directory /sys/class/scsi_tape and populates it with
directories corresponding to the existing tape devices. There are autorewind
and non-rewind entries for each mode. The names are stxy and nstxy, where x
is the tape number and y a character corresponding to the mode (none, l, m,
a). For example, the directories for the first tape device are (assuming four
modes): st0 nst0 st0l nst0l st0m nst0m st0a nst0a.

Each directory contains the entries: default_blksize default_compression
default_density defined dev device driver. The file 'defined' contains 1
if the mode is defined and zero if not defined. The files 'default_*' contain
the defaults set by the user. The value -1 means the default is not set. The
file 'dev' contains the device numbers corresponding to this device. The links
'device' and 'driver' point to the SCSI device and driver entries.

Each directory also contains the entry 'options' which shows the currently
enabled driver and mode options. The value in the file is a bit mask where the
bit definitions are the same as those used with MTSETDRVBUFFER in setting the
options.

A link named 'tape' is made from the SCSI device directory to the class
directory corresponding to the mode 0 auto-rewind device (e.g., st0).
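
The bit layout under MINOR NUMBERS and the naming scheme under SYSFS SUPPORT can be combined in a small C sketch. It builds a minor number from a tape number, a mode and a rewind flag, decodes it again, and derives the /sys/class/scsi_tape entry name. The sketch assumes the default of four modes (ST_NBR_MODE_BITS equal to 2). The function names are hypothetical and are not part of any kernel or library interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Field layout from MINOR NUMBERS, assuming four modes:
 *   bits 8 and up : upper bits of the tape number
 *   bit 7         : non-rewind flag
 *   bits 6-5      : mode
 *   bits 4-0      : lower five bits of the tape number */

static unsigned st_minor(unsigned tape, unsigned mode, bool nonrewind)
{
    return ((tape >> 5) << 8)             /* upper tape-number bits */
         | ((nonrewind ? 1u : 0u) << 7)   /* non-rewind flag */
         | ((mode & 3u) << 5)             /* mode field */
         | (tape & 0x1fu);                /* lower tape-number bits */
}

static unsigned st_tape_number(unsigned minor)
{
    return ((minor >> 8) << 5) | (minor & 0x1fu);
}

static unsigned st_mode(unsigned minor)
{
    return (minor >> 5) & 3u;
}

static bool st_nonrewind(unsigned minor)
{
    return (minor >> 7) & 1u;
}

/* Build the class directory name (stxy or nstxy) for a minor number;
 * returns the snprintf() result, i.e., the length of the full name. */
static int st_sysfs_name(unsigned minor, char *out, size_t size)
{
    static const char *const mode_suffix[4] = { "", "l", "m", "a" };

    return snprintf(out, size, "%sst%u%s",
                    st_nonrewind(minor) ? "n" : "",
                    st_tape_number(minor),
                    mode_suffix[st_mode(minor)]);
}
```

For example, st_minor(0, 0, true) yields 128, matching the "128 + device number" rule for non-rewind devices. The non-rewind node for mode 2 of tape 33 has minor 449 and appears as nst33m.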


BSD AND SYS V SEMANTICS

The user can choose between these two behaviours of the tape driver by
defining the value of the symbol ST_SYSV. The semantics differ when a
file being read is closed. The BSD semantics leaves the tape where it
currently is whereas the SYS V semantics moves the tape past the next
filemark unless the filemark has just been crossed.

The default is BSD semantics.


BUFFERING

The driver tries to do transfers directly to/from user space. If this
is not possible, a driver buffer allocated at run-time is used. If
direct i/o is not possible for the whole transfer, the driver buffer
is used (i.e., bounce buffers for individual pages are not
used). Direct i/o can be impossible for several reasons, e.g.:
- one or more pages are at addresses not reachable by the HBA
- the number of pages in the transfer exceeds the number of
  scatter/gather segments permitted by the HBA
- one or more pages can't be locked into memory (should not happen in
  any reasonable situation)

The size of the driver buffers is always at least one tape block. In fixed
block mode, the minimum buffer size is defined (in 1024 byte units) by
ST_FIXED_BUFFER_BLOCKS. With small block size this allows buffering of
several blocks and using one SCSI read or write to transfer all of the
blocks. Buffering of data across write calls in fixed block mode is
allowed if ST_BUFFER_WRITES is non-zero and direct i/o is not used.
Buffer allocation uses chunks of memory having sizes 2^n * (page
size). Because of this the actual buffer size may be larger than the
minimum allowable buffer size.

NOTE that if direct i/o is used, the small writes are not buffered. This may
cause a surprise when moving from 2.4. In 2.4, small writes (e.g., tar without
the -b option) may have had good throughput, but this is no longer true with
2.6.
Direct i/o can be turned off to solve this problem but a better solution
is to use bigger write() byte counts (e.g., tar -b 64).

Asynchronous writing. Writing the buffer contents to the tape is
started and the write call returns immediately. The status is checked
at the next tape operation. Asynchronous writes are not done with
direct i/o and not in fixed block mode.

Buffered writes and asynchronous writes may in some rare cases cause
problems in multivolume operations if there is not enough space on the
tape after the early-warning mark to flush the driver buffer.

Read ahead for fixed block mode (ST_READ_AHEAD). Filling the buffer is
attempted even if the user does not want to get all of the data at
this read command. It should be disabled for those drives that don't like
a filemark to truncate a read request or that don't like backspacing.

Scatter/gather buffers (buffers that consist of chunks non-contiguous
in the physical memory) are used if contiguous buffers can't be
allocated. To support all SCSI adapters (including those not
supporting scatter/gather), buffer allocation uses the following
three kinds of chunks:
1. The initial segment that is used for all SCSI adapters including
   those not supporting scatter/gather. The size of this buffer will be
   (PAGE_SIZE << ST_FIRST_ORDER) bytes if the system can give a chunk of
   this size (and it is not larger than the buffer size specified by
   ST_BUFFER_BLOCKS). If this size is not available, the driver halves
   the size and tries again, down to the size of one page. The default
   settings in st_options.h make the driver try to allocate all of the
   buffer as one chunk.
2. The scatter/gather segments to fill the specified buffer size are
   allocated so that as many segments as possible are used but the number
   of segments does not exceed ST_FIRST_SG.
3. The remaining segments between ST_MAX_SG (or the module parameter
   max_sg_segs) and the number of segments used in phases 1 and 2
   are used to extend the buffer at run-time if this is necessary. The
   number of scatter/gather segments allowed for the SCSI adapter is not
   exceeded if it is smaller than the maximum number of scatter/gather
   segments specified. If the maximum number allowed for the SCSI adapter
   is smaller than the number of segments used in phases 1 and 2,
   extending the buffer will always fail.


EOM BEHAVIOUR WHEN WRITING

When the end of medium early warning is encountered, the current write
is finished and the number of bytes is returned. The next write
returns -1 and errno is set to ENOSPC. To enable writing a trailer,
the next write is allowed to proceed and, if successful, the number of
bytes is returned. After this, -1 and the number of bytes are
alternately returned until the physical end of medium (or some other
error) is encountered.


MODULE PARAMETERS

The buffer size, write threshold, and the maximum number of allocated buffers
are configurable when the driver is loaded as a module. The keywords are:

buffer_kbs=xxx             the buffer size for fixed block mode is set
                           to xxx kilobytes
write_threshold_kbs=xxx    the write threshold in kilobytes set to xxx
max_sg_segs=xxx            the maximum number of scatter/gather
                           segments
try_direct_io=x            try direct transfer between user buffer and
                           tape drive if this is non-zero

Note that if the buffer size is changed but the write threshold is not
set, the write threshold is set to the new buffer size - 2 kB.


BOOT TIME CONFIGURATION

If the driver is compiled into the kernel, the same parameters can
also be set using, e.g., the LILO command line. The preferred syntax is
to use the same keyword used when loading as a module but prepended
with 'st.'.
For instance, to set the maximum number of scatter/gather
segments, the parameter 'st.max_sg_segs=xx' should be used (xx is the
number of scatter/gather segments).

For compatibility, the old syntax from early 2.5 and 2.4 kernel
versions is supported. The same keywords can be used as when loading
the driver as a module. If several parameters are set, the keyword-value
pairs are separated with a comma (no spaces allowed). A colon can be
used instead of the equals sign. The definition is prepended by the
string st=. Here is an example:

        st=buffer_kbs:64,write_threshold_kbs:60

The following syntax used by the old kernel versions is also supported:

        st=aa[,bb[,dd]]

where
  aa is the buffer size for fixed block mode in 1024 byte units
  bb is the write threshold in 1024 byte units
  dd is the maximum number of scatter/gather segments


IOCTLS

The tape is positioned and the drive parameters are set with ioctls
defined in mtio.h. The tape control program 'mt' uses these ioctls. Try
to find an mt that supports all of the Linux SCSI tape ioctls and
opens the device for writing if the tape contents will be modified
(look for a package mt-st* from the Linux ftp sites; the GNU mt does
not open for writing for, e.g., erase).

The supported ioctls are:

The following use the structure mtop:

MTFSF   Space forward over count filemarks. Tape positioned after filemark.
MTFSFM  As above but tape positioned before filemark.
MTBSF   Space backward over count filemarks. Tape positioned before
        filemark.
MTBSFM  As above but tape positioned after filemark.
MTFSR   Space forward over count records.
MTBSR   Space backward over count records.
MTFSS   Space forward over count setmarks.
MTBSS   Space backward over count setmarks.
MTWEOF  Write count filemarks.
MTWEOFI Write count filemarks with immediate bit set (i.e., does not
        wait until data is on tape)
MTWSM   Write count setmarks.
MTREW   Rewind tape.
MTOFFL  Set device off line (often rewind plus eject).
MTNOP   Do nothing except flush the buffers.
MTRETEN Re-tension tape.
MTEOM   Space to end of recorded data.
MTERASE Erase tape. If the argument is zero, the short erase command
        is used. The long erase command is used with all other values
        of the argument.
MTSEEK  Seek to tape block count. Uses Tandberg-compatible seek (QFA)
        for SCSI-1 drives and SCSI-2 seek for SCSI-2 drives. The file and
        block numbers in the status are not valid after a seek.
MTSETBLK Set the drive block size. Setting to zero sets the drive into
        variable block mode (if applicable).
MTSETDENSITY Sets the drive density code to arg. See drive
        documentation for available codes.
MTLOCK and MTUNLOCK Explicitly lock/unlock the tape drive door.
MTLOAD and MTUNLOAD Explicitly load and unload the tape. If the
        command argument x is between MT_ST_HPLOADER_OFFSET + 1 and
        MT_ST_HPLOADER_OFFSET + 6, the number x is sent to the
        drive with the command and it selects the tape slot to use in
        the HP C1553A changer.
MTCOMPRESSION Sets compressing or uncompressing drive mode using the
        SCSI mode page 15. Note that some drives use other methods for
        control of compression. Some drives (like the Exabytes) use
        density codes for compression control. Some drives use another
        mode page but this page has not been implemented in the
        driver. Some drives without compression capability will accept
        any compression mode without error.
MTSETPART Moves the tape to the partition given by the argument at the
        next tape operation. The block at which the tape is positioned
        is the block where the tape was previously positioned in the
        new active partition unless the next tape operation is
        MTSEEK.
        In this case the tape is moved directly to the block
        specified by MTSEEK. MTSETPART is inactive unless
        MT_ST_CAN_PARTITIONS is set.
MTMKPART Formats the tape with one partition (argument zero) or two
        partitions (the argument gives in megabytes the size of
        partition 1 that is physically the first partition of the
        tape). The drive has to support partitions with size specified
        by the initiator. Inactive unless MT_ST_CAN_PARTITIONS is set.
MTSETDRVBUFFER
        Is used for several purposes. The command is obtained from count
        with mask MT_ST_OPTIONS, the low order bits are used as argument.
        This command is only allowed for the superuser (root). The
        subcommands are:
        0
           The drive buffer option is set to the argument. Zero means
           no buffering.
        MT_ST_BOOLEANS
           Sets the buffering options. The bits are the new states
           (enabled/disabled) of the following options (the note in
           parentheses specifies whether the option is global or
           can be specified differently for each mode):
             MT_ST_BUFFER_WRITES write buffering (mode)
             MT_ST_ASYNC_WRITES asynchronous writes (mode)
             MT_ST_READ_AHEAD read ahead (mode)
             MT_ST_TWO_FM writing of two filemarks (global)
             MT_ST_FAST_EOM using the SCSI spacing to EOD (global)
             MT_ST_AUTO_LOCK automatic locking of the drive door (global)
             MT_ST_DEF_WRITES the defaults are meant only for writes (mode)
             MT_ST_CAN_BSR backspacing over more than one record can
                be used for repositioning the tape (global)
             MT_ST_NO_BLKLIMS the driver does not ask the block limits
                from the drive (block size can be changed only to
                variable) (global)
             MT_ST_CAN_PARTITIONS enables support for partitioned
                tapes (global)
             MT_ST_SCSI2LOGICAL the logical block number is used in
                the MTSEEK and MTIOCPOS for SCSI-2 drives instead of
                the device dependent address.
                It is recommended to set
                this flag unless there are tapes written using device
                dependent addresses (from the old times) (global)
             MT_ST_SYSV sets the SYSV semantics (mode)
             MT_ST_NOWAIT enables immediate mode (i.e., don't wait for
                the command to finish) for some commands (e.g., rewind)
             MT_ST_NOWAIT_EOF enables immediate filemark mode (i.e., when
                writing a filemark, don't wait for it to complete). Please
                see the BASICS note about MTWEOFI with respect to the
                possible dangers of writing immediate filemarks.
             MT_ST_SILI enables setting the SILI bit in SCSI commands when
                reading in variable block mode to enhance performance when
                reading blocks shorter than the byte count; set this only
                if you are sure that the drive supports SILI and the HBA
                correctly returns transfer residuals
             MT_ST_DEBUGGING debugging (global; debugging must be
                compiled into the driver)
        MT_ST_SETBOOLEANS
        MT_ST_CLEARBOOLEANS
           Sets or clears the option bits.
        MT_ST_WRITE_THRESHOLD
           Sets the write threshold for this device to kilobytes
           specified by the lowest bits.
        MT_ST_DEF_BLKSIZE
           Defines the default block size set automatically. Value
           0xffffff means that the default is not used any more.
        MT_ST_DEF_DENSITY
        MT_ST_DEF_DRVBUFFER
           Used to set or clear the density (8 bits), and drive buffer
           state (3 bits). If the value is MT_ST_CLEAR_DEFAULT
           (0xfffff) the default will not be used any more. Otherwise
           the lowermost bits of the value contain the new value of
           the parameter.
        MT_ST_DEF_COMPRESSION
           The compression default will not be used if the value of
           the lowermost byte is 0xff. Otherwise the lowermost bit
           contains the new default. If the bits 8-15 are set to a
           non-zero number, and this number is not 0xff, the number is
           used as the compression algorithm. The value
           MT_ST_CLEAR_DEFAULT can be used to clear the compression
           default.
        MT_ST_SET_TIMEOUT
           Set the normal timeout in seconds for this device. The
           default is 900 seconds (15 minutes). The timeout should be
           long enough for the retries done by the device while
           reading/writing.
        MT_ST_SET_LONG_TIMEOUT
           Set the long timeout that is used for operations that are
           known to take a long time. The default is 14000 seconds
           (3.9 hours). For erase this value is further multiplied by
           eight.
        MT_ST_SET_CLN
           Set the cleaning request interpretation parameters using
           the lowest 24 bits of the argument. The driver can set the
           generic status bit GMT_CLN if a cleaning request bit pattern
           is found in the extended sense data. Many drives set one or
           more bits in the extended sense data when the drive needs
           cleaning. The bits are device-dependent. The driver is
           given the number of the sense data byte (the lowest eight
           bits of the argument; must be >= 18 (values 1 - 17
           reserved) and <= the maximum requested sense data size),
           a mask to select the relevant bits (bits 8-15), and the
           bit pattern (bits 16-23). If the bit pattern is zero, one
           or more bits under the mask indicate a cleaning request. If
           the pattern is non-zero, the pattern must match the masked
           sense data byte.

           (The cleaning bit is set if the additional sense code and
           qualifier 00h 17h are seen regardless of the setting of
           MT_ST_SET_CLN.)

The following ioctl uses the structure mtpos:
MTIOCPOS Reads the current position from the drive. Uses
        Tandberg-compatible QFA for SCSI-1 drives and the SCSI-2
        command for the SCSI-2 drives.

The following ioctl uses the structure mtget to return the status:
MTIOCGET Returns some status information.
        The file number and block number within file are returned. The
        block is -1 when it can't be determined (e.g., after MTBSF).
        The drive type is either MTISSCSI1 or MTISSCSI2.
        The number of recovered errors since the previous status call
        is stored in the lower word of the field mt_erreg.
        The current block size and the density code are stored in the field
        mt_dsreg (shifts for the subfields are MT_ST_BLKSIZE_SHIFT and
        MT_ST_DENSITY_SHIFT).
        The GMT_xxx status bits reflect the drive status. GMT_DR_OPEN
        is set if there is no tape in the drive. GMT_EOD means either
        end of recorded data or end of tape. GMT_EOT means end of tape.


MISCELLANEOUS COMPILE OPTIONS

The recovered write errors are considered fatal if ST_RECOVERED_WRITE_FATAL
is defined.

The maximum number of tape devices is determined by the define
ST_MAX_TAPES. If more tapes are detected at driver initialization, the
maximum is adjusted accordingly.

Immediate return from tape positioning SCSI commands can be enabled by
defining ST_NOWAIT. If this is defined, the user should take care that
the next tape operation is not started before the previous one has
finished. The drives and SCSI adapters should handle this condition
gracefully, but some drive/adapter combinations are known to hang the
SCSI bus in this case.

The MTEOM command is by default implemented as spacing over 32767
filemarks. With this method the file number in the status is
correct. The user can request direct spacing to EOD by setting
ST_FAST_EOM 1 (or by setting the corresponding option with the
MTSETDRVBUFFER ioctl). In this case the file number will be invalid.

When using read ahead or buffered writes the position within the file
may not be correct after the file is closed (the correct position may
require backspacing over more than one record). The correct position
within the file can be obtained if ST_IN_FILE_POS is defined at compile
time or the MT_ST_CAN_BSR bit is set for the drive with an ioctl.
(The driver always backs over a filemark crossed by read ahead if the
user does not request data that far.)


DEBUGGING HINTS

To enable debugging messages, edit st.c and #define DEBUG 1. As seen
above, debugging can be switched off with an ioctl if debugging is
compiled into the driver. The debugging output is not voluminous.

If the tape seems to hang, I would be very interested to hear where
the driver is waiting. With the command 'ps -l' you can see the state
of the process using the tape. If the state is D, the process is
waiting for something. The field WCHAN tells where the driver is
waiting. If you have the current System.map in the correct place (in
/boot for the procps I use) or have updated /etc/psdatabase (for kmem
ps), ps writes the function name in the WCHAN field. If not, you have
to look up the function from System.map.

Note also that the timeouts are very long compared to most other
drivers. This means that the Linux driver may appear hung although the
real reason is that the tape firmware has got confused.
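
When a drive seems to misbehave, it can also help to look at the status that MTIOCGET returns (see IOCTLS). The C sketch below decodes the fields described there using the shift and mask macros and the GMT_xxx tests from <linux/mtio.h>. struct tape_status, decode_mtget() and read_tape_status() are hypothetical names introduced for illustration only.

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Hypothetical summary of the MTIOCGET fields described under IOCTLS. */
struct tape_status {
    long file_no;               /* current file number */
    long block_no;              /* block within file, -1 if unknown */
    unsigned long block_size;   /* current block size */
    unsigned int density;       /* density code */
    unsigned int recovered;     /* recovered errors since last status call */
    bool eod;                   /* end of recorded data or end of tape */
    bool eot;                   /* end of tape */
    bool no_tape;               /* GMT_DR_OPEN: no tape in the drive */
};

static struct tape_status decode_mtget(const struct mtget *g)
{
    struct tape_status s;
    unsigned long ds = (unsigned long)g->mt_dsreg;

    s.file_no = g->mt_fileno;
    s.block_no = g->mt_blkno;
    /* Block size and density share mt_dsreg. */
    s.block_size = (ds & MT_ST_BLKSIZE_MASK) >> MT_ST_BLKSIZE_SHIFT;
    s.density = (unsigned int)((ds & MT_ST_DENSITY_MASK) >> MT_ST_DENSITY_SHIFT);
    /* The recovered error count is in the lower word of mt_erreg. */
    s.recovered = (unsigned int)(((unsigned long)g->mt_erreg & MT_ST_SOFTERR_MASK)
                                 >> MT_ST_SOFTERR_SHIFT);
    s.eod = GMT_EOD(g->mt_gstat) != 0;
    s.eot = GMT_EOT(g->mt_gstat) != 0;
    s.no_tape = GMT_DR_OPEN(g->mt_gstat) != 0;
    return s;
}

/* Query the drive and decode the result; returns -1 with errno set on
 * failure (for example, if fd is not an open tape device). */
static int read_tape_status(int fd, struct tape_status *out)
{
    struct mtget g;

    if (ioctl(fd, MTIOCGET, &g) < 0)
        return -1;
    *out = decode_mtget(&g);
    return 0;
}
```

Keeping the decoding separate from the ioctl call makes it possible to check the field handling without a tape drive attached.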