Documentation / device-mapper / switch.txt

Based on kernel version 4.13.3. Page generated on 2017-09-23 13:54 EST.

dm-switch
=========

The device-mapper switch target creates a device that supports an
arbitrary mapping of fixed-size regions of I/O across a fixed set of
paths.  The path used for any specific region can be switched
dynamically by sending the target a message.

It maps I/O to underlying block devices efficiently when there is a large
number of fixed-size address regions but there is no simple pattern
that would allow for a compact representation of the mapping such as
dm-stripe.

Background
----------

Dell EqualLogic and some other iSCSI storage arrays use a distributed
frameless architecture.  In this architecture, the storage group
consists of a number of distinct storage arrays ("members"), each having
independent controllers, disk storage and network adapters.  When a LUN
is created it is spread across multiple members.  The details of the
spreading are hidden from initiators connected to this storage system.
The storage group exposes a single target discovery portal, no matter
how many members are being used.  When iSCSI sessions are created, each
session is connected to an Ethernet port on a single member.  Data to a
LUN can be sent on any iSCSI session, and if the blocks being accessed
are stored on another member, the I/O will be forwarded as required.
This forwarding is invisible to the initiator.  The storage layout is
also dynamic, and the blocks stored on disk may be moved from member to
member as needed to balance the load.

This architecture simplifies the management and configuration of both
the storage group and initiators.  In a multipathing configuration, it
is possible to set up multiple iSCSI sessions to use multiple network
interfaces on both the host and target to take advantage of the
increased network bandwidth.  An initiator could use a simple
round-robin algorithm to send I/O across all paths and let the storage
array members forward it as necessary, but there is a performance
advantage to sending data directly to the correct member.

A device-mapper table already lets you map different regions of a
device onto different targets.  However, in this architecture the LUN is
spread with an address region size on the order of tens of megabytes,
which means the resulting table could have more than a million entries
and consume far too much memory.

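The scale of the problem can be checked with a quick calculation.  This
is a minimal sketch: the 15 TiB LUN and 15 MiB region size are
hypothetical figures chosen only to illustrate the order of magnitude,
and table_entries is a helper name invented for this example.

```python
MIB = 1024 * 1024
TIB = 1024 * 1024 * MIB


def table_entries(lun_bytes, region_bytes):
    """Number of table lines needed if every region gets its own line."""
    # Round up so that a trailing partial region still gets an entry.
    return -(-lun_bytes // region_bytes)


# Hypothetical sizes, for illustration only.
entries = table_entries(15 * TIB, 15 * MIB)
print(entries)  # 1048576
```
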
Using this device-mapper switch target we can now build a two-layer
device hierarchy:

    Upper Tier - Determine which array member the I/O should be sent to.
    Lower Tier - Load balance amongst paths to a particular member.

The lower tier consists of a single dm multipath device for each member.
Each of these multipath devices contains the set of paths directly to
the array member in one priority group, and leverages existing path
selectors to load balance amongst these paths.  We also build a
non-preferred priority group containing paths to other array members for
failover reasons.

The upper tier consists of a single dm-switch device.  This device uses
a bitmap to look up the location of the I/O and choose the appropriate
lower tier device to route the I/O.  By using a bitmap we are able to
use 4 bits for each address range in a 16-member group (which is very
large for us).  This is a much denser representation than the dm table
b-tree can achieve.

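The packed lookup described above can be sketched as follows.  This is
an illustration of the idea, not the kernel's code: the class name, the
64-bit slot width and the method names are assumptions made for this
example.

```python
class PackedRegionTable:
    """Region-to-path table that packs several entries into each slot."""

    SLOT_BITS = 64  # assumed slot width for this sketch

    def __init__(self, nr_regions, nr_paths):
        if nr_regions < 1 or nr_paths < 1:
            raise ValueError("need at least one region and one path")
        self.nr_regions = nr_regions
        self.nr_paths = nr_paths
        # Bits needed to hold any path number in 0 .. nr_paths - 1.
        self.entry_bits = max(1, (nr_paths - 1).bit_length())
        self.entries_per_slot = self.SLOT_BITS // self.entry_bits
        self.mask = (1 << self.entry_bits) - 1
        nr_slots = -(-nr_regions // self.entries_per_slot)  # round up
        self.slots = [0] * nr_slots

    def _locate(self, region):
        """Return (slot index, bit shift inside the slot) for a region."""
        if not 0 <= region < self.nr_regions:
            raise IndexError("region out of range")
        slot, pos = divmod(region, self.entries_per_slot)
        return slot, pos * self.entry_bits

    def read(self, region):
        slot, shift = self._locate(region)
        return (self.slots[slot] >> shift) & self.mask

    def write(self, region, path_nr):
        if not 0 <= path_nr < self.nr_paths:
            raise ValueError("path number out of range")
        slot, shift = self._locate(region)
        cleared = self.slots[slot] & ~(self.mask << shift)
        self.slots[slot] = cleared | (path_nr << shift)


table = PackedRegionTable(nr_regions=1 << 20, nr_paths=16)
table.write(5, 9)
# Entry width in bits, the stored path, and the table size in bytes.
print(table.entry_bits, table.read(5), len(table.slots) * 8)
```

With 16 paths each entry takes 4 bits, so in this sketch about a million
regions fit in 512 KiB of slots.
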
Construction Parameters
=======================

    <num_paths> <region_size> <num_optional_args> [<optional_args>...]
    [<dev_path> <offset>]+

<num_paths>
    The number of paths across which to distribute the I/O.

<region_size>
    The number of 512-byte sectors in a region. Each region can be redirected
    to any of the available paths.

<num_optional_args>
    The number of optional arguments. Currently, no optional arguments
    are supported and so this must be zero.

<dev_path>
    The block device that represents a specific path to the device.

<offset>
    The offset of the start of data on the specific <dev_path> (in units
    of 512-byte sectors). This number is added to the sector number when
    forwarding the request to the specific path. Typically it is zero.

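As a rough illustration of how these parameters fit together, the sketch
below assembles a table line and shows how a sector maps to a region and
a path.  The helper names switch_table_line and route, the plain list
used as the region table, and the 2097152-sector device length are all
assumptions made for this example.

```python
def switch_table_line(length_sectors, region_size, paths):
    """Build a dm table line; paths is a list of (dev_path, offset)."""
    if not paths:
        raise ValueError("at least one path is required")
    # <start> <length> switch <num_paths> <region_size> <num_optional_args>
    fields = [0, length_sectors, "switch", len(paths), region_size, 0]
    for dev_path, offset in paths:
        fields += [dev_path, offset]
    return " ".join(str(f) for f in fields)


def route(sector, region_size, region_to_path, paths):
    """Return (dev_path, sector on that device) for I/O at `sector`."""
    region = sector // region_size  # region_size is in 512-byte sectors
    dev_path, offset = paths[region_to_path[region]]
    return dev_path, sector + offset  # <offset> is added when forwarding


paths = [("/dev/vg1/switch0", 0),
         ("/dev/vg1/switch1", 0),
         ("/dev/vg1/switch2", 0)]
# Hypothetical device length of 2097152 sectors (1 GiB).
line = switch_table_line(2097152, 128, paths)
print(line)
# Sector 300 falls in region 2 when regions are 128 sectors long.
print(route(300, 128, [0, 1, 2], paths))
```
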
Messages
========

set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...

Modify the region table by specifying which regions are redirected to
which paths.

<index>
    The region number (region size was specified in constructor parameters).
    If index is omitted, the next region (previous index + 1) is used.
    Expressed in hexadecimal (WITHOUT any prefix like 0x).

<path_nr>
    The path number in the range 0 ... (<num_paths> - 1).
    Expressed in hexadecimal (WITHOUT any prefix like 0x).

R<n>,<m>
    This parameter allows repetitive patterns to be loaded quickly. <n> and <m>
    are hexadecimal numbers. The last <n> mappings are repeated in the next <m>
    slots.

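The message grammar above can be modelled with a small interpreter.
This is a sketch written from the description in this document, not a
copy of the kernel's parser: the function names are invented, the region
table is a plain dict, and rejecting a first argument that omits its
index is an assumption of this example.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_hex(text):
    """Parse unprefixed hexadecimal, rejecting forms such as 0x10."""
    if not text or any(c not in HEX_DIGITS for c in text):
        raise ValueError("expected hexadecimal digits: %r" % text)
    return int(text, 16)


def apply_region_mappings(table, args):
    """Apply set_region_mappings arguments to table (region -> path)."""
    index = None  # last region written by this message
    for arg in args:
        if arg.startswith("R"):
            n_text, sep, m_text = arg[1:].partition(",")
            if not sep:
                raise ValueError("expected R<n>,<m>: %r" % arg)
            n, m = parse_hex(n_text), parse_hex(m_text)
            if index is None or n < 1:
                raise ValueError("nothing to repeat for %r" % arg)
            # Repeat the last n mappings in the next m slots.
            for _ in range(m):
                index += 1
                table[index] = table[index - n]
            continue
        index_text, sep, path_text = arg.partition(":")
        if not sep:
            raise ValueError("expected [<index>]:<path_nr>: %r" % arg)
        if index_text:
            index = parse_hex(index_text)
        elif index is None:
            # Assumption of this sketch: the first mapping names its index.
            raise ValueError("first mapping needs an explicit index")
        else:
            index += 1
        table[index] = parse_hex(path_text)
    return table


short = apply_region_mappings({}, "1000:1 :2 R2,10".split())
print(len(short), short[0x1000], short[0x1011])  # 18 1 2
```
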
Status
======

No status line is reported.

Example
=======

Assume that you have three volumes of the same size: vg1/switch0,
vg1/switch1 and vg1/switch2.

Create a switch device with a 64 KiB region size (128 sectors):
    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"

Set mappings for the first 7 entries to point to devices switch0, switch1,
switch2, switch0, switch1, switch2, switch1:
    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1

Set a repetitive mapping.  Because <n> and <m> are hexadecimal, R2,10
repeats the last 2 mappings in the next 16 slots.  This command:
    dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
is equivalent to:
    dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
	:1 :2 :1 :2 :1 :2 :1 :2 :1 :2

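Messages like the ones above can also be generated by a script.  The
sketch below only builds the argument list and does not run dmsetup; the
helper names encode_mappings, encode_repeating and message_command are
assumptions made for this example.

```python
def encode_mappings(start_region, path_numbers):
    """Encode consecutive mappings as set_region_mappings arguments."""
    args = []
    for i, path_nr in enumerate(path_numbers):
        # Only the first argument carries an explicit region index.
        prefix = format(start_region, "x") if i == 0 else ""
        args.append("%s:%x" % (prefix, path_nr))
    return args


def encode_repeating(start_region, cycle, total):
    """Encode `total` mappings that repeat `cycle`, using R<n>,<m>."""
    if not cycle or total < len(cycle):
        raise ValueError("total must cover at least one full cycle")
    args = encode_mappings(start_region, cycle)
    remaining = total - len(cycle)
    if remaining:
        args.append("R%x,%x" % (len(cycle), remaining))
    return args


def message_command(device, args):
    """Build the dmsetup argv for a set_region_mappings message."""
    return ["dmsetup", "message", device, "0", "set_region_mappings"] + args


print(" ".join(message_command("switch", encode_repeating(0x1000, [1, 2], 18))))
```

The printed command matches the R2,10 example above; passing the list to
a process runner is left to the caller.
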
Information is copyright its respective author. All material is available from the Linux Kernel Source distributed under a GPL License. This page is provided as a free service by mjmwired.net.