1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
|
Using GlusterFS volumes to host VM images and data was sub-optimal due to the FUSE overhead involved in accessing gluster volumes via GlusterFS native client. However this has changed now with two specific enhancements:
- A new library called libgfapi is now available as part of GlusterFS that provides POSIX-like C APIs for accessing gluster volumes. libgfapi support is available from GlusterFS-3.4 release.
- QEMU (starting from QEMU-1.3) will have GlusterFS block driver that uses libgfapi and hence there is no FUSE overhead any longer when QEMU works with VM images on gluster volumes.
GlusterFS with its pluggable translator model can serve as a flexible storage backend for QEMU. QEMU has to just talk to GlusterFS and GlusterFS will hide different file systems and storage types underneath. Various GlusterFS storage features like replication and striping will automatically be available for QEMU. Efforts are also on to add block device backend in Gluster via Block Device (BD) translator that will expose underlying block devices as files to QEMU. This allows GlusterFS to be a single storage backend for both file and block based storage types.
###GlusterFS specifcation in QEMU
VM image residing on gluster volume can be specified on QEMU command line using URI format
gluster[+transport]://[server[:port]]/volname/image[?socket=...]
* `gluster` is the protocol.
* `transport` specifies the transport type used to connect to gluster management daemon (glusterd). Valid transport types are `tcp, unix and rdma.` If a transport type isn’t specified, then tcp type is assumed.
* `server` specifies the server where the volume file specification for the given volume resides. This can be either hostname, ipv4 address or ipv6 address. ipv6 address needs to be within square brackets [ ]. If transport type is unix, then server field should not be specified. Instead the socket field needs to be populated with the path to unix domain socket.
* `port` is the port number on which glusterd is listening. This is optional and if not specified, QEMU will send 0 which will make gluster to use the default port. If the transport type is unix, then port should not be specified.
* `volname` is the name of the gluster volume which contains the VM image.
* `image` is the path to the actual VM image that resides on gluster volume.
###Examples:
gluster://1.2.3.4/testvol/a.img
gluster+tcp://1.2.3.4/testvol/a.img
gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
gluster+tcp://server.domain.com:24007/testvol/dir/a.img
gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
gluster+rdma://1.2.3.4:24007/testvol/a.img
NOTE: (GlusterFS URI description and above examples are taken from QEMU documentation)
###Configuring QEMU with GlusterFS backend
While building QEMU from source, in addition to the normal configuration options, ensure that –enable-glusterfs options are specified explicitly with ./configure script to get glusterfs support in qemu.
Starting with QEMU-1.6, pkg-config is used to configure the GlusterFS backend in QEMU. If you are using GlusterFS compiled and installed from sources, then the GlusterFS package config file (glusterfs-api.pc) might not be present at the standard path and you will have to explicitly add the path by executing this command before running the QEMU configure script:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
Without this, GlusterFS driver will not be compiled into QEMU even when GlusterFS is present in the system.
* Creating a VM image on GlusterFS backend
qemu-img command can be used to create VM images on gluster backend. The general syntax for image creation looks like this:
For ex:
qemu-img create gluster://server/volname/path/to/image size
## How to setup the environment:
This usecase ( using glusterfs backend for VM disk store), is known as 'Virt-Store' usecase. Steps for the entire procedure could be split to:
* Steps to be done on gluster volume side
* Steps to be done on Hypervisor side
##Steps to be done on gluster side
These are the steps that needs to be done on the gluster side. Precisely this involves
Creating "Trusted Storage Pool"
Creating a volume
Tuning the volume for virt-store
Tuning glusterd to accept requests from QEMU
Tuning glusterfsd to accept requests from QEMU
Setting ownership on the volume
Starting the volume
* Creating "Trusted Storage Pool"
Install glusterfs rpms on the NODE. You can create a volume with a single node. You can also scale up the cluster, as we call as `Trusted Storage Pool`, by adding more nodes to the cluster
gluster peer probe <hostname>
* Creating a volume
It is highly recommended to have replicate volume or distribute-replicate volume for virt-store usecase, as it would add high availability and fault-tolerance. Remember the plain distribute works equally well
gluster volume create replica 2 <brick1> .. <brickN>
where, `<brick1> is <hostname>:/<path-of-dir> `
Note: It is recommended to create sub-directories inside brick and that could be used to create a volume.For example, say, /home/brick1 is the mountpoint of XFS, then you can create a sub-directory inside it /home/brick1/b1 and use it while creating a volume.You can also use space available in root filesystem for bricks. Gluster cli, by default, throws warning in that case. You can override it by using force option
gluster volume create replica 2 <brick1> .. <brickN> force
If you are new to GlusterFS, you can take a look at QuickStart (http://www.gluster.org/community/documentation/index.php/QuickStart) guide.
* Tuning the volume for virt-store
There are recommended settings available for virt-store. This provide good performance characteristics when enabled on the volume that was used for virt-store
Refer to http://www.gluster.org/community/documentation/index.php/Virt-store-usecase#Tunables for recommended tunables and for applying them on the volume, http://www.gluster.org/community/documentation/index.php/Virt-store-usecase#Applying_the_Tunables_on_the_volume
* Tuning glusterd to accept requests from QEMU
glusterd receives the request only from the applications that run with port number less than 1024 and it blocks otherwise. QEMU uses port number greater than 1024 and to make glusterd accept requests from QEMU, edit the glusterd vol file, /etc/glusterfs/glusterd.vol and add the following,
option rpc-auth-allow-insecure on
Note: If you have installed glusterfs from source, you can find glusterd vol file at `/usr/local/etc/glusterfs/glusterd.vol`
Restart glusterd after adding that option to glusterd vol file
service glusterd restart
* Tuning glusterfsd to accept requests from QEMU
Enable the option `allow-insecure` on the particular volume
gluster volume set <volname> server.allow-insecure on
IMPORTANT : As of now(april 2,2014)there is a bug, as allow-insecure is not dynamically set on a volume.You need to restart the volume for the change to take effect
* Setting ownership on the volume
Set the ownership of qemu:qemu on to the volume
gluster volume set <vol-name> storage.owner-uid 107
gluster volume set <vol-name> storage.owner-gid 107
* Starting the volume
Start the volume
gluster volume start <vol-name>
## Steps to be done on Hypervisor Side:
To create a raw image,
qemu-img create gluster://1.2.3.4/testvol/dir/a.img 5G
To create a qcow2 image,
qemu-img create -f qcow2 gluster://server.domain.com:24007/testvol/a.img 5G
## Booting VM image from GlusterFS backend
A VM image 'a.img' residing on gluster volume testvol can be booted using QEMU like this:
qemu-system-x86_64 -drive file=gluster://1.2.3.4/testvol/a.img,if=virtio
In addition to VM images, gluster drives can also be used as data drives:
qemu-system-x86_64 -drive file=gluster://1.2.3.4/testvol/a.img,if=virtio -drive file=gluster://1.2.3.4/datavol/a-data.img,if=virtio
Here 'a-data.img' from datavol gluster volume appears as a 2nd drive for the guest.
It is also possible to make use of libvirt to define a disk and use it with qemu:
### Create libvirt XML to define Virtual Machine
virt-install is python wrapper which is mostly used to create VM using set of params. How-ever virt-install doesn't support any network filesystem [ https://bugzilla.redhat.com/show_bug.cgi?id=1017308 ]
Create a libvirt VM xml - http://libvirt.org/formatdomain.html where the disk section is formatted in such a way, qemu driver for glusterfs is being used. This can be seen in the following example xml description
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source protocol='gluster' name='distrepvol/vm3.img'>
<host name='10.70.37.106' port='24007'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
* Define the VM from the XML file that was created earlier
virsh define <xml-file-description>
* Verify that the VM is created successfully
virsh list --all
* Start the VM
virsh start <VM>
* Verification
You can verify the disk image file that is being used by VM
virsh domblklist <VM-Domain-Name/ID>
The above should show the volume name and image name. Here is the example,
[root@test ~]# virsh domblklist vm-test2
Target Source
------------------------------------------------
vda distrepvol/test.img
hdc -
Reference:
For more details on this feature implementation and its advantages, please refer:
http://raobharata.wordpress.com/2012/10/29/qemu-glusterfs-native-integration/
http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt
|