summaryrefslogtreecommitdiffstats
path: root/xlators/mgmt
Commit message (Collapse)AuthorAgeFilesLines
...
* cli: Add an option to fetch just incremental or cumulative I/0Dawit Alemu2013-12-031-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | information 'volume profile info' fetches both cumulative and incremental I/O statistics. There isn't a way to fetch just cumulative or incremental statistics. This change introduces two optional arguments, namely "incremental" and "cumulative", that can be tacked on to 'volume profile info'. In other words, the new command format is volume profile <VOLNAME> {start | info [incremental | cumulative] | stop} [nfs] 'volume profile info incremental' - fetches incremental stats 'volume profile info cumulative' - fetches cumulative stats 'volume profile info' - fetches incremental and cumulative stats Change-Id: I5ddb45d990542ea611d23d251efebfec46f472d0 BUG: 1030580 Signed-off-by: Dawit Alemu <dalemu@redhat.com> Reviewed-on: http://review.gluster.org/6264 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* cli, glusterd: More quota fixes ...Krutika Dhananjay2013-11-303-311/+283
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... which may be grouped under the following categories: 1. Fix incorrect cli exit status for 'quota list' cmd 2. Print appropriate error message on quota parse errors in cli Authored by: Anuradha Talur <atalur@redhat.com> 3. glusterd: Improve quota validation during stage-op 4. Fix peer probe issues resulting from quota conf checksum mismatches 5. Enhancements to CLI output in the event of quota command failures Authored by: Kaushal Madappa <kmadappa@redhat.com> 7. Move aux mount location from /tmp to /var/run/gluster Authored by: Krishnan Parthasarathi <kparthas@redhat.com> 8. Fix performance issues in quota limit-usage Authored by: Krutika Dhananjay <kdhananj@redhat.com> Note: Some functions that were used in earlier version of quota, that aren't called anymore have been removed. Change-Id: I9d874f839ae5fdcfbe6d4f2d727eac091f27ac57 BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6366 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota: Add the quota config xattr to newly added brickVarun Shastry2013-11-261-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: Quota directory limit configuration is stored in the xattrs. When a new brick is added these 'limit-set' xattrs have to be created to the directory in the new brick. This is done by the dht directory healing when the directory is created in the new brick. Since 'root' directory is already created DHT doesn't heal the limit-set xattr root. Solution: When the add-brick command is issued run the below hook script to heal the 'limit-set' xattr. The hook script does the following only if limit is configured on root. 1. Create an auxiliary mount. 2. getxattr 'limit-set' on the root 3. setxattr the same value on the root But this script needs the volume to be started to make the auxiliary mount. To handle the case when the add-brick is issued when the volume was stopped, symlink is created by the 'master' script to the corresponding location and these two are by default disabled. So, a 'master' script is added in the add-brick/pre. When add-brick command is issued, it enables one of the scripts mentioned above based on the condition, if volume is started - enable add-brick/post script else - enable start/post script After the actual script completes its job, it disables itself. Note: The enabling and disabling of the script is based on the glusterd's logic, that it only runs the scripts which starts its name with 'S'. So, Enable - symlink the file to 'S'* Disable - unlink the symlink. Change-Id: I2d3947a4d686c54417ec95f530af3bdd3444f4e2 BUG: 969461 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6104 Reviewed-by: Brian Foster <bfoster@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: use proper copy to set node-nameBala.FA2013-11-261-2/+19
| | | | | | | | | | | | Previously node-name is set to point to node-uuid which could cause memory leak. This is fixed by having memory copy of node-uuid. BUG: 1012296 Change-Id: I3b638ec289d5b167c6e752ef1ba41f41efacb9da Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6330 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli/glusterd: Changes to quota command Quota featureRaghavendra G2013-11-2617-584/+2536
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | re-work. Following are the cli commands that are new/re-worked: ====================================================== volume quota <VOLNAME> {enable|disable|list [<path> ...]|remove <path>| default-soft-limit <percent>} | volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} | volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>} volume status [all | <VOLNAME> [nfs|shd|<BRICK>|quotad]] [detail|clients|mem|inode|fd|callpool] volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history] glusterd changes: ================= * Quota limits are now set as extended attributes by glusterd from the aux mount created by the cli. * The gfids of the directories on which quota limits are set for a given volume are stored in /var/lib/glusterd/vols/<volname>/quota.conf file in binary format, and whose cksum and version is stored in /var/lib/glusterd/vols/<volname>/quota.cksum. Original-author: Krutika Dhananjay <kdhananj@redhat.com> Original-author: Krishnan Parthasarathi <kparthas@redhat.com> BUG: 969461 Change-Id: If32bba36c67f9c2a30417af9c6389045b2b7c13b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6003 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* posix: placeholders for GFID to path conversionRaghavendra G2013-11-262-19/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | what? ===== The following is an attempt to generate the paths of a file when only its gfid is known. To find the path of a directory, the symlink handle to the directory maintained in the ".glusterfs" backend directory is read. The symlink handle is generated using the gfid of the directory. It (handle) contains the directory's name and parent gfid, which are used to recursively construct the absolute path as seen by the user from the mount point. A similar approach cannot be used for a regular file or a symbolic link since its hardlink handle, generated using its gfid, doesn't contain its parent gfid and basename. So xattrs are set to store the parent gfids and the number of hardlinks to a file or a symlink having the same parent gfid. When an user/application requests for the paths of a regular file or a symlink with multiple hardlinks, using the parent gfids stored in the xattrs, the paths of the parent directories are generated as mentioned earlier. The base names of the hardlinks (with the same parent gfid) are determined by matching the actual backend inode numbers of each entry in the parent directory with that of the hardlink handle. Xattr is set on a regular file, link, and symbolic link as follows, Xattr name : trusted.pgfid.<pargfidstr> Xattr value : <number of hardlinks to a regular file/symlink with the same parentgfid> If a regular file, hard link, symbolic link is created then an xattr in the above format is set in the backend. how to use? =========== This functionality can be used through getxattr interface. Two keys - glusterfs.ancestry.dentry and glusterfs.ancestry.path - enable usage of this functionality. A successful getxattr will have the result stored under same keys. Values will be, glusterfs.ancestry.dentry: -------------------------- A linked list of gf-dirent structures for all possible paths from root to this gfid. If there are multiple paths, the linked-list will be a series of paths one after another. Each path will be a series of dentries representing all components of the path. This key is primarily for internal usage within glusterfs. glusterfs.ancestry.path: ------------------------ A string containing all possible paths from root to this gfid. Multiple hardlinks of a file or a symlink are displayed as a colon seperated list (this could interfere with path components containing ':'). e.g. If there is a file "file1" in root directory with two hardlinks, "/dir2/link2tofile1" and "/dir1/link1tofile1", then [root@alpha gfsmntpt]# getfattr -n glusterfs.ancestry.path -e text file1 glusterfs.ancestry.path="/file1:/dir2/link2tofile1:/dir1/link1tofile1" Thanks Amar, Avati and Venky for the inputs. Original Author: Ramana Raja <rraja@redhat.com> BUG: 990028 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I0eaa9101e333e0c1f66ccefd9e95944dd4a27497 Reviewed-on: http://review.gluster.org/5951 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: fix possible memory leaksBala.FA2013-11-212-1/+6
| | | | | | | | | BUG: 955548 Change-Id: Iae410712e7e6d7a76cd537c77f1919e3b4cdf6bb Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6328 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* bd: Add Zerofill FOP supportM. Mohan Kumar2013-11-202-0/+17
| | | | | | | | | | BUG: 1028673 Change-Id: I9ba8e3e6cf2f888640b4d2a2eb934a27ff903c42 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6290 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: Coverity fix for CID 1128906Santosh Kumar Pradhan2013-11-201-11/+13
| | | | | | | | | | | | | Fix the Coverity issue introduced in RFE: NFS volume set/reset commit i.e. http://review.gluster.org/6236 Change-Id: I817b9da03a3ce7f5511303faea0c50dfdad60ff4 BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6307 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Start rebalance only where requiredKaushal M2013-11-203-1/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | Gluster was starting rebalance processes on peers where it wasn't required in two cases. - For a normal rebalance command on a volume, rebalance processes were started on all peers instead of just the peers which contain bricks of the volume - For rebalance process being restarted by a volume sync, caused by a new peer being probed or a peer restarting, rebalance processes were started on all peers, for both a normal rebalance and for remove-brick needing rebalance. This patch adds a new check before starting rebalance process in the above two cases. - For rebalance process required by a rebalance command, each peer will check if it contains atleast one brick of the volume - For rebalance process required by a remove-brick command, each peer will check if it contains atleast one of the bricks being removed Change-Id: I512da16994f0d5482889c3a009c46dc20a8a15bb BUG: 1031887 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6301 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: fix undefined sybmol error related to BDPranith Kumar K2013-11-193-1/+5
| | | | | | | | | | Change-Id: I2210f1ac7de04c6025c0ec02d998b626d41466ae BUG: 1028672 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6303 Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: add peerid to volume status xml outputBala.FA2013-11-142-0/+36
| | | | | | | | | | | | This patch adds <peerid> tag to bricks and nfs/shd like services to volume status xml output. BUG: 955548 Change-Id: I9aaa9266e4d56f632235eaeef565e92d757c0694 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* gNFS: RFE for NFS connection behaviorSantosh Kumar Pradhan2013-11-144-0/+148
| | | | | | | | | | | | | | | Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Transparent data encryption and metadata authenticationEdward Shishkin2013-11-132-11/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. in the systems with non-trusted server This new functionality can be useful in various cloud technologies. It is implemented via a special encryption/crypt translator,which works on the client side and performs encryption and authentication; 1. Class of supported algorithms The crypt translator can support any atomic symmetric block cipher algorithms (which require to pad plain/cipher text before performing encryption/decryption transform (see glossary in atom.c for definitions). In particular, it can support algorithms with the EOF issue (which require to pad the end of file by extra-data). Crypt translator performs translations user -> (offset, size) -> (aligned-offset, padded-size) ->server (and backward), and resolves individual FOPs (write(), truncate(), etc) to read-modify-write sequences. A volume can contain files encrypted by different algorithms of the mentioned class. To change some option value just reconfigure the volume. Currently only one algorithm is supported: AES_XTS. Example of algorithms, which can not be supported by the crypt translator: 1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA; 2. Symmetric block cipher algorithms with inline MACs for data authentication. 2. Implementation notes. a) Atomic algorithms Since any process in a stackable file system manipulates with local data (which can be obsoleted by local data of another process), any atomic cipher algorithm without proper support can lead to non-POSIX behavior. To resolve the "collisions" we introduce locks: before performing FOP->read(), FOP->write(), etc. the process should first lock the file. b) Algorithms with EOF issue Such algorithms require to pad the end of file with some extra-data. Without proper support this will result in losing information about real file size. Keeping a track of real file size is a responsibility of the crypt translator. A special extended attribute with the name "trusted.glusterfs.crypt.att.size" is used for this purpose. All files contained in bricks of encrypted volume do have "padded" sizes. 3. Non-trusted servers and Metadata authentication We assume that server, where user's data is stored on is non-trusted. It means that the server can be subjected to various attacks directed to reveal user's encrypted personal data. We provide protection against such attacks. Every encrypted file has specific private attributes (cipher algorithm id, atom size, etc), which are packed to a string (so-called "format string") and stored as a special extended attribute with the name "trusted.glusterfs.crypt.att.cfmt". We protect the string from tampering. This protection is mandatory, hardcoded and is always on. Without such protection various attacks (based on extending the scope of per-file secret keys) are possible. Our authentication method has been developed in tight collaboration with Red Hat security team and is implemented as "metadata loader of version 1" (see file metadata.c). This method is NIST-compliant and is based on checking 8-byte per-hardlink MACs created(updated) by FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the following unique entities: . file (hardlink) name; . verified file's object id (gfid). Every time, before manipulating with a file, we check it's MACs at FOP->open() time. Some FOPs don't require a file to be opened (e.g. FOP->truncate()). In such cases the crypt translator opens the file mandatory. 4. Generating keys Unique per-file keys are derived by NIST-compliant methods from the a) parent key; b) unique verified object-id of the file (gfid); Per-volume master key, provided by user at mount time is in the root of this "tree of keys". Those keys are used to: 1) encrypt/decrypt file data; 2) encrypt/decrypt file metadata; 3) create per-file and per-link MACs for metadata authentication. 5. Instructions Getting started with crypt translator Example: 1) Create a volume "myvol" and enable encryption: # gluster volume create myvol pepelac:/vols/xvol # gluster volume set myvol encryption on 2) Set location (absolute pathname) of your master key: # gluster volume set myvol encryption.master-key /home/me/mykey 3) Set other options to override default options, if needed. Start the volume. 4) On the client side make sure that the file /home/me/mykey exists and contains proper per-volume master key (that is 256-bit AES key). This key has to be in hex form, i.e. should be represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}. The key should start at the beginning of the file. All symbols at offsets >= 64 are ignored. 5) Mount the volume "myvol" on the client side: # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt After successful mount the file which contains master key may be removed. NOTE: Keeping the master key between mount sessions is in user's competence. ********************************************************************** WARNING! Losing the master key will make content of all regular files inaccessible. Mount with improper master key allows to access content of directories: file names are not encrypted. ********************************************************************** 6. Options of crypt translator 1) "master-key": specifies location (absolute pathname) of the file which contains per-volume master key. There is no default location for master key. 2) "data-key-size": specifies size of per-file key for data encryption Possible values: . "256" default value . "512" 3) "block-size": specifies atom size. Possible values: . "512" . "1024" . "2048" . "4096" default value; 7. Test cases Any workload, which involves the following file operations: ->create(); ->open(); ->readv(); ->writev(); ->truncate(); ->ftruncate(); ->link(); ->unlink(); ->rename(); ->readdirp(). 8. TODOs: 1) Currently size of IOs issued by crypt translator is restricted by block_size (4K by default). We can use larger IOs to improve performance. Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee BUG: 1030058 Signed-off-by: Edward Shishkin <edward@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4667 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* bd: Add support to create clone, snapshot and merge of LV images.M. Mohan Kumar2013-11-134-10/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | Special xattr names "clone" & "snapshot" can be used to create full and linked clone of the LV images. GFID of destination posix file (to be mapped) is passed as a value to the xattr. Destination posix file must exist before running this operation. These operations form a basis for offloading storage related operations from QEMU to GlusterFS. Syntax for full clone: xattr name: "clone" value: "gfid-of-dest-file" Syntax for linked clone: xattr name: "snapshot" value: "gfid-of-dest-file" Syntax for merging: xattr name: "merge" value: "path-to-snapshot-file" Example: setfattr -n clone -v <gfid-of-dest-file> /media/source setfattr -n snapshot -v <gfid-of-dest-file> /media/source setfattr -n merge -v "/media/sn" /media/sn Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add aio support to BD xlatorM. Mohan Kumar2013-11-131-0/+4
| | | | | | | | | | | | Volume option bd-aio controls AIO feature for BD xlator. Code taken from posix-aio.c Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5748 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: posix/multi-brick support to BD xlatorM. Mohan Kumar2013-11-1311-1/+367
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current BD xlator (block backend) has a few limitations such as * Creation of directories not supported * Supports only single brick * Does not use extended attributes (and client gfid) like posix xlator * Creation of special files (symbolic links, device nodes etc) not supported Basic limitation of not allowing directory creation is blocking oVirt/VDSM to consume BD xlator as part of Gluster domain since VDSM creates multi-level directories when GlusterFS is used as storage backend for storing VM images. To overcome these limitations a new BD xlator with following improvements is suggested. * New hybrid BD xlator that handles both regular files and block device files * The volume will have both POSIX and BD bricks. Regular files are created on POSIX bricks, block devices are created on the BD brick (VG) * BD xlator leverages exiting POSIX xlator for most POSIX calls and hence sits above the POSIX xlator * Block device file is differentiated from regular file by an extended attribute * The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a posix file to Logical Volume (LV). * When a client sends a request to set BD_XATTR on a posix file, a new LV is created and mapped to posix file. So every block device will have a representative file in POSIX brick with 'user.glusterfs.bd' (BD_XATTR) set. * Here after all operations on this file results in LV related operations. For example opening a file that has BD_XATTR set results in opening the LV block device, reading results in reading the corresponding LV block device. When BD xlator gets request to set BD_XATTR via setxattr call, it creates a LV and information about this LV is placed in the xattr of the posix file. xattr "user.glusterfs.bd" used to identify that posix file is mapped to BD. Usage: Server side: [root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2 It creates a distributed gluster volume 'bdvol' with Volume Group vg1 using posix brick /storage/vg1_info in host1 and Volume Group vg2 using /storage/vg2_info in host2. [root@host1 ~]# gluster volume start bdvol Client side: [root@node ~]# mount -t glusterfs host1:/bdvol /media [root@node ~]# touch /media/posix It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# mkdir /media/image [root@node ~]# touch /media/image/lv1 It also creates regular posix file 'lv1' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1 [root@node ~]# Above setxattr results in creating a new LV in corresponding brick's VG and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size' [root@node ~]# truncate -s5G /media/image/lv1 It results in resizig LV 'lv1'to 5G New BD xlator code is placed in xlators/storage/bd directory. Also add volume-uuid to the VG so that same VG can't be used for other bricks/volumes. After deleting a gluster volume, one has to manually remove the associated tag using vgchange <vg-name> --deltag <trusted.glusterfs.volume-id:<volume-id>> Changes from previous version V5: * Removed support for delayed deleting of LVs Changes from previous version V4: * Consolidated the patches * Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR. Changes from previous version V3: * Added support in FUSE to support full/linked clone * Added support to merge snapshots and provide information about origin * bd_map xlator removed * iatt structure used in inode_ctx. iatt is cached and updated during fsync/flush * aio support * Type and capabilities of volume are exported through getxattr Changes from version 2: * Used inode_context for caching BD size and to check if loc/fd is BD or not. * Added GlusterFS server offloaded copy and snapshot through setfattr FOP. As part of this libgfapi is modified. * BD xlator supports stripe * During unlinking if a LV file is already opened, its added to delete list and bd_del_thread tries to delete from this list when a last reference to that file is closed. Changes from previous version: * gfid is used as name of LV * ? is used to specify VG name for creating BD volume in volume create, add-brick. gluster volume create volname host:/path?vg * open-behind issue is fixed * A replicate brick can be added dynamically and LVs from source brick are replicated to destination brick * A distribute brick can be added dynamically and rebalance operation distributes existing LVs/files to the new brick * Thin provisioning support added. * bd_map xlator support retained * setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and setfattr -n user.glusterfs.bd -v "thin" creates thin LV * Capability and backend information added to gluster volume info (and --xml) so that management tools can exploit BD xlator. * tracing support for bd xlator added TODO: * Add support to display snapshots for a given LV * Display posix filename for list-origin instead of gfid Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/4809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd_map: Remove bd_map xlatorM. Mohan Kumar2013-11-1311-423/+16
| | | | | | | | | | | Remove bd_map xlator and CLI related changes. Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5747 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/compress: Compression/DeCompression translatorPrashanth Pai2013-11-112-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * When a writev call occurs, the client compresses the data before sending it to server. On the server, compressed data is decompressed. Similarly, when a readv call occurs, the server compresses the data before sending it to client. On the client, the compressed data is decompressed. Thus the amount of data sent over the wire is minimized. * Compression/Decompression is done using Zlib library. * During normal operation, this is the format of data sent over wire : <compressed-data> + trailer(8) The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation. HOW TO USE ---------- Turning on compression xlator: gluster volume set <vol_name> compress on Configurable options: gluster volume set <vol_name> compress.compression-level 8 gluster volume set <vol_name> compress.min-size 50 Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805 BUG: 923540 Original-author : Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Prashanth Pai <nullpai@gmail.com> Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/3251 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: modify remove-brick CLI message.Ravishankar N ravishankar@redhat.com2013-11-071-5/+4
| | | | | | | | | | | | | | | | | | | | | In the current context "replica_cnt" is used just to know whether the specific key exists or not by calling "dict_get_int32", which we can replace by "dict_get ()". And changing the log message as it is more appropriate to say "migration of data" rather than "rebalance". This patch refactors commit 51c6fa7a354826744de98 against BZ 961669 reviewed on : http://review.gluster.org/5566 Change-Id: I48eae206a28d4083975e64407ed8fe4539f9c24b BUG: 1027270 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Original patch: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/6001 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: susant palai <spalai@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: add option to specify a different base-portKaleb S. KEITHLEY2013-10-314-117/+127
| | | | | | | | | | | | | | | | | | This is (arguably) a hack to work around a bug in libvirt which is not well behaved wrt to using TCP ports in the unreserved space between 49152-65535. (See RFC 6335) Normally glusterd starts and binds to the first available port in range, usually 49152. libvirt's live migration also tries to use ports in this range, but has no fallback to use (an)other port(s) when the one it wants is already in use. Change-Id: Id8fe35c08b6ce4f268d46804bbb6dddab7a6b7bb BUG: 1018178 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/6210 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Release big-lock after log-rotate handler returnsKrutika Dhananjay2013-10-301-1/+8
| | | | | | | | | | | | ... so that subsequent volume commands don't block waiting forever, for the lock to be released. Change-Id: I24b5ec47f6982900ab74ff1b492d523f31ecfb7f BUG: 1022055 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6122 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd : Improved quota volume reset commandAnuradha2013-10-282-10/+23
| | | | | | | | | | | | | | | | | | | | | | | Quota volume reset command without "force" option fixed, doesn't fail anymore. It resets unprotected fields and not the protected ones. Also, an appropriate message is provided to the user for the following cases : 1. only unprotected fields are reset, "force" option should be used to reset protected fields. 2. Both protected and unprotected fields are reset. 3. No field was reset, "force" option required. Test case for the same also added. Change-Id: I24e8f1be87b79ccd81bf6f933e00608b861c7a16 BUG: 1022905 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/6135 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpcsvc: implement per-client RPC throttlingAnand Avati2013-10-281-0/+12
| | | | | | | | | | | | | | | | Implement a limit on the total number of outstanding RPC requests from a given cient. Once the limit is reached the client socket is removed from POLL-IN event polling. Change-Id: I8071b8c89b78d02e830e6af5a540308199d6bdcd BUG: 1008301 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6114 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mgmt/glusterd: Relax extended attribute checks for volume create and add ↵Vijay Bellur2013-10-172-5/+10
| | | | | | | | | | | | | | brick force. Expectation with force is that user is aware of the consequences of sanity checks not being triggered. Change-Id: I79dfeed16a23829a7217cef33ab83f9f0ffae336 Signed-off-by: Vijay Bellur <vbellur@redhat.com> BUG: 1007509 Reviewed-on: http://review.gluster.org/5746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli,glusterd: Changes to cli-glusterd communicationKaushal M2013-10-174-10/+200
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Glusterd changes: With this patch, glusterd creates a socket file in DATADIR/run/glusterd.socket , and listen on it for cli requests. It listens for 2 rpc programs on the socket file, - The glusterd cli rpc program, for all cli commands - A reduced glusterd handshake program, just for the 'system:: getspec' command The location of the socket file can be changed with the glusterd option 'glusterd-sockfile'. To retain compatibility with the '--remote-host' cli option, glusterd also listens for the cli requests on port 24007. But, for the sake of security, it listens using a reduced cli rpc program on the port. The reduced rpc program only contains read-only procs used for 'volume (info|list|status)', 'peer status' and 'system:: getwd' cli commands. CLI changes: The gluster cli now uses the glusterd socket file for communicating with glusterd by default. A new option '--gluster-sock' has been added to allow specifying the sockfile used to connect. Using the '--remote-host' option will make cli connect to the given host & port. Tests changes: cluster.rc has been modified to make use of socket files and use different log files for each glusterd. Some of the tests using cluster.rc have been fixed. Change-Id: Iaf24bc22f42f8014a5fa300ce37c7fc9b1b92b53 BUG: 980754 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5280 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Add monotonic clocking counter for timer threadHarshavardhana2013-10-151-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gettimeofday() returns the current wall clock time and timezone. Using these functions in order to measure the passage of time (how long an operation took) therefore seems like a no-brainer. This time suffer's from some limitations: a. They have a low resolution: “High-performance” timing by definition, requires clock resolutions into the microseconds or better. b. They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled which periodically adjusts the system clock to keep them in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive). In such cases timer thread could go into an infinite loop. From 'man gettimeofday': ---------- .. .. The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2). .. .. ---------- Rationale: For calculating interval timing for Timer thread, all that’s needed should be clock as a simple counter that increments at a stable rate. This is necessary to avoid the jumps which are caused by using "wall time", this counter must be monotonic that can never “tick” backwards, ever. Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb BUG: 1017993 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: [Feature] Command implementation to get heal-countVenkatesh Somyajulu2013-10-144-25/+178
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently to know the number of files to be healed, either user has to go to backend and check the number of entries present in indices/xattrop directory. But if a volume consists of large number of bricks, going to each backend and counting the number of entries is a time-taking task. Otherwise user can give gluster volume heal vol-name info command but with this approach if no. of entries are very hugh in the indices/ xattrop directory, it will comsume time. So as a feature, new command is implemented. Command 1: gluster volume heal vn statistics heal-count This command will get the number of entries present in every brick of a volume. The output displays only entries count. Command 2: gluster volume heal vn statistics heal-count replica 192.168.122.1:/home/user/brickname Here if we are concerned with just one replica. So providing any one of the brick of a replica will get the number of entries to be healed for that replica only. Example: Replicate volume with replica count 2. Backend status: -------------- [root@dhcp-0-17 xattrop]# ls -lia | wc -l 1918 NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy entries so actual no. of entries to be healed are 1916. [root@dhcp-0-17 xattrop]# pwd /home/user/2ty/.glusterfs/indices/xattrop Command output: -------------- Gathering count of entries to be healed on volume volume3 has been successful Brick 192.168.122.1:/home/user/22iu Status: Brick is Not connected Entries count is not available Brick 192.168.122.1:/home/user/2ty Number of entries: 1916 Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290 BUG: 1015990 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6044 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr : Implementation of command "gluster volume heal vn statistics"Venkatesh Somyajulu2013-10-142-1/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "gluster volume heal volumename statistics" command gives the summary of the afr crawl done based on the entries present in the xattrop directory. Whenever afr crawls are attempted, the beginning time of crawl, end time of crawl, no of files healed, heal-failed count and number of files in split brain are shown along with the type of the crawl. If crawl is already in progress then it will give the number of files healed, heal failed count and number of files in split-brain from the beginning of the crawl and instead of telling the end time of the crawl, "CRAWL IN PROGRESS" message will be shown. Output format: command: "gluster volume heal volume-name statistics" Output: Gathering afr crawl statistics crawl statistics on volume volume-name has been successful ------------------------------------------------ Crawl statistics for brick no 0 Hostname of brick 192.168.122.248 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 ------------------------------------------------ Crawl statistics for brick no 1 Hostname of brick 192.168.122.1 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 -------------------------------------------------- Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9 BUG: 949400 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4790 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli,glusterd: Implement 'volume status tasks'Krutika Dhananjay2013-10-084-27/+91
| | | | | | | | | | | | | | | | | | | oVirt's Gluster Integration needs an inexpensive command that can be executed every 10 seconds to monitor async tasks and their parameters, for all volumes. The solution involves adding a 'tasks' sub-command to 'volume status' to fetch only the async task IDs, type and other relevant parameters. Only the originator glusterd participates in this command as all the information needed is available on all the nodes. This is to make the command suitable for being executed every 10 seconds. Change-Id: I1edc607baf29b001a5585079dec681d7c641b3d1 BUG: 1012346 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6006 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* cli: add node uuid in rebalance and remove brick status xml outputBala.FA2013-10-031-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds node uuid in rebalance/remove-brick status xml output. Output XML will look like <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <cliOutput> <opRet>0</opRet> <opErrno>0</opErrno> <opErrstr/> <volRebalance> <op>3</op> <nodeCount>1</nodeCount> <node> <nodeName>localhost</nodeName> ==>> <id>883626f8-4d29-4d02-8c5d-c9f48c5b2445</id> <files>0</files> <size>0</size> <lookups>0</lookups> <failures>0</failures> <status>3</status> <statusStr>completed</statusStr> </node> <aggregate> <files>0</files> <size>0</size> <lookups>0</lookups> <failures>0</failures> <status>3</status> <statusStr>completed</statusStr> </aggregate> </volRebalance> </cliOutput> Change-Id: I5a1d4f9043b33b9e88150647a243ddb16154e843 BUG: 1012296 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6005 Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Validating invalid log-level under geo-rep config options.Ajeet Jha2013-10-032-5/+18
| | | | | | | | | | | Change-Id: I8ff6b48ef41fd6e9ea68c42dfb9878f8a08ed627 BUG: 1010874 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/5989 Reviewed-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* glusterd: Calculate volume op-versions only on set/resetKaushal M2013-09-293-6/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The volume op-versions are calculated during a volume set/reset, reading a volume from disk and importing a volume during probe or volume sync. The calculation of the volume op-version depends on the clusters op-version as some features are enabled automatically depending on the clusters op-version. We also don't store the volume op-versions persistently and don't export the volume op-versions during sync. Due to this, there can occur cases which will lead to inconsistencies in volumes in different peers. One such case is below, Consider, a cluster made up 3 peers P1, P2 and P3, operating at op-version N. The cluster has two volumes V1 and V2, which have volume op-versions N (since volume op-version cannot be greater than cluster op-version). We have, Cluster-op-version = N V1 op-version = N V2 op-version = N A set operation on V1 causes the clusters op-version to be bumped up to N+1. Assume that there exist some features that are automatically enabled on op-version N+1. The op-version of V2 remains at N as no operation has been performed on it. So, Cluster op-version = N+1 V1 op-version = N+1 V2 op-version = N Now, we probe a new peer P4. On the new peer we will have the following op-versions, Cluster op-version = N+1 V1 op-version = N+1 V2 op-version = N+1 This happens because we don't send volume op-versions during the sync after probe. P4 will freshly calculate the op-version of V2 (assuming features have been auto enabled due to the cluster op-version being N+1) as N+1. Another case is when glusterd on a peer restarts. Assume P3 was restarted, glusterd will recalculate the volume op-versions during the restore state. Again, op-version of V2 will be calculated as N+1 assuming auto enabled features. This will lead to inconsistency in the volume representation in memory and on disk, as glusterd will assume the volume contains auto enabled features, but the volfiles don't contain them as they were not regenrated. These kind of issues can be solved by calculating the volume op-version only when features are enabled and disabled (ie. during volume set/reset), persisting the volume-op-versions and exporting/importing them. Change-Id: I52de0668c92628622e85f4588fb28829a7231132 BUG: 1005043 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5568 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Fix storing volumes on setting global optsKaushal M2013-09-291-3/+6
| | | | | | | | | | | | | | | | | | Glusterd would not store all the volumes when a global options were set. When setting a global option, like 'nfs.*' options, glusterd used to modify the volinfo for all the volumes, but would store only the volinfo for the named volume. This lead to mismatch in the persisted and the in-memory representation of those volumes, which lead to problems like peers being rejected because of volume mismatches. Change-Id: I8bca10585e34b7135cb32af0055dbd462b3fb9b5 BUG: 1012400 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6007 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Set errstr appropriately on peer op failureKrutika Dhananjay2013-09-291-11/+17
| | | | | | | | | | Change-Id: I27f5f7cd54115d7b236b42f6beaaa05a8b379dd7 BUG: 1010153 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/5978 Reviewed-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gNFS: Incorrect NFS ACL encoding for XFSSantosh Kumar Pradhan2013-09-291-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds) which causes setfacl()->setxattr() to fail silently. NFS client adds NFS_ACL_DEFAULT(0x1000) for default ACL. FIX: Mask the prefix (added by NFS client) OFF, so the setfacl is not rejected when it hits the FS. Original patch by: "Richard Wareing" Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5980 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* logging: Remove multiple definitions of DEFAULT_LOG_FILE_DIRECTORYVijay Bellur2013-09-241-1/+0
| | | | | | | | | Change-Id: I8d670a228d3c1282aa7d70b151f166d04abc40e5 BUG: 764890 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/5909 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* glusterd/cli: Status detail cli parse check and vol geo status crash fixAvra Sengupta2013-09-211-7/+12
| | | | | | | | | | Change-Id: I1841864273fc4242de15fbfcf76fd5de40269f28 BUG: 1006249 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5889 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Adding transaction checks for cluster unlock.Avra Sengupta2013-09-201-3/+12
| | | | | | | | | | | | | | | | | | | | | | While a gluster command holding lock is in execution, any other gluster command which tries to run will fail to acquire the lock. As a result command#2 will follow the cleanup code flow, which also includes unlocking the held locks. As both the commands are run from the same node, command#2 will end up releasing the locks held by command#1 even before command#1 reaches completion. Now we call the unlock routine in the code path, of the cluster has been locked during the same transaction. Signed-off-by: Avra Sengupta <asengupt@redhat.com> Change-Id: I7b7aa4d4c7e565e982b75b8ed1e550fca528c834 BUG: 1008172 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5937 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: NFS daemon is limiting IOs to 64KBSantosh Kumar Pradhan2013-09-201-0/+18
| | | | | | | | | | | | | | | | | | | | | | | Problem: Gluster NFS server is hard-coding the max rsize/wsize to 64KB which is very less for NFS running over 10GE NIC. The existing options nfs.read-size, nfs.write-size are not working as expected. FIX: Make the options nfs.read-size (for rsize) and nfs.write-size (for wsize) work to tune the NFS I/O size. Value range would be 4KB(Min)-64KB(Default)-1MB(max). NB: Credit to "Richard Wareing" for catching it. Change-Id: I2754ecb0975692304308be8bcf496c713355f1c8 BUG: 1009223 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5964 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli/glusterd: improve rebalance fix-layout status reportingRavishankar N2013-09-192-2/+97
| | | | | | | | | | | | | | | | | | | Problem: Currenly the CLI rebalance status command output does not indicate the 'type' of rebalance, i.e. whether a full rebalance or only a fix-layout was carried out. Fix: After the rebalance status of all peers is received by the originator glusterd, alter it to reflect the type of rebalance before passing it on to the CLI process. Change-Id: I1940ffda0d36e25e5b33c84a0ea210394cc9e1d3 BUG: 1004744 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/5826 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Don't reset rebalance status on add-brickKaushal M2013-09-181-9/+0
| | | | | | | | | | | | | | | | | | | | The rebalance status was being reset to 'Not started' when add-brick was performed. This would lead to odd cases where a 'rebalance status' on a volume would show status as 'not started' but would also include the rebalance statistics. This also affected the showing of asynchronus task status in 'volume status' command. By not resetting the status prevent the above issues from happening. Since we use the running/not-running of the rebalance process as the check when performing other operations we can safely leave the rebalance stats collected on an add-brick. Change-Id: I4c69d9c789d081c6de7e7a81dd0d4eba2e83ec17 BUG: 1006247 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5895 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd : Blocking invalid values for geo-rep config optionsAvra Sengupta2013-09-172-9/+64
| | | | | | | | | | | Change-Id: Ia9ee44763a9c2798b26d3225bf03a974d7ece21f BUG: 998962 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5666 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli,glusterd: Task parameters in xml outputKaushal M2013-09-133-6/+165
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces task parameters for the asynchronus task shown in volume status. The parameters are only given for xml output. The parameters shown currently are, - source and destination bricks for replace-brick tasks ...... <tasks> <task> <type>Replace brick</type> <id>3d1a1005-9d2e-4ae0-bd62-577bc1d333a3</id> <status>1</status> <params> <srcBrick>archm:/export/test4</srcBrick> <dstBrick>archm:/export/test-replace1</dstBrick> </params> </task> </tasks> ...... - list of bricks being removed for remove-brick tasks ...... <tasks> <task> <type>Remove brick</type> <id>901c20ca-0da2-41de-8669-5f0caca6b846</id> <status>1</status> <params> <brick>archm:/export/test2</brick> <brick>archm:/export/test3</brick> </params> </task> </tasks> ...... The changes for non-xml output will be done in a subsequent patch. Change-Id: I322afe2f83ed8adeddb99f7962c25911204dc204 BUG: 916577 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5771 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: Update sub_count on remove brickVijay Bellur2013-09-111-0/+1
| | | | | | | | | | Change-Id: I7c17de39da03c6b2764790581e097936da406695 BUG: 1002556 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/5893 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Allow bumping down a peer's op-version during probeKaushal M2013-09-101-15/+9
| | | | | | | | | | | | | | | | | | Earlier, a peer running a higher op-version couldn't be probed into a cluster running at a lower op-version. This created issues when trying to expand an upgraded cluster. This patch changes this behaviour. The cluster no longer rejects a peer being probed if its op-version is higher than the cluster op-version. The peer will reduce its op-version if it doesn't have any volumes. If the peer contains volumes and needs to reduce its op-version, it fails the handshake and the probe fails. Change-Id: I12c6c873922799e1557b7184e956baea643d0dea BUG: 1005038 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5715 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Added missing MY_UUID conversion in gsync stagingAvra Sengupta2013-09-101-0/+2
| | | | | | | | | Change-Id: Ia4bf607e044d50636fb0a599a2ff91059f5b17aa BUG: 1006177 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5887 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gsyncd / geo-rep: "disjoint" cascading geo-replication sessionsVenky Shankar2013-09-041-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | Slave's xtime is now stored on the master itself (and that too only on the root), which implies it cannot be propogated to the cascaded slave. Thus the intermediate master now makes use of it's own volume information to propogate volume-mark and xtime. On starting Geo-Replication "geo-replication.ignore-pid-check" marker option is enabled, which is an override for the client-pid check in marker. This options triggers marker update only for geo-replication auxillary mount (client-pid == -1). Since gsyncd not does setxattr() directly on the bricks, this option won't trigger a chain of spurious metadata updates that would need to be processed by gsyncd. Change-Id: If50c5ef275dfb6b4ff4fd35be2565587e2fdf3e1 BUG: 996371 Original Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5592 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/marker: force xtime updates (configurable) for client-pid = -1Venky Shankar2013-09-042-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is required by Geo-Replication that does auxillary mount with client-pid as -1 (which has special treatment at specific places in GlusterFS), to trigger xtime updates on the intermediate master in a cascading setup. Marker too had a check to "not" mark updates for geo-replication's auxillary mounts. With the new geo-replication design, xtimes are not set by the master on the slave for all entities. Due to this cascading setups were broken. This patch introduces "geo-replication.ignore-pid-check" option as a "override" for the client-pid check for gsyncd's client-pid. When this options is enabled, marker start "marking" even if the updates are from the special client. Geo-Replication on the detection of itself being an intermediate master, enables this option. Change-Id: I9f7140edd12fef5480595ee0f93f35b94cdb8345 BUG: 996371 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5591 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Added op-version checks to geo-rep commands.Venky Shankar2013-09-041-0/+43
| | | | | | | | | | | | | | Added op-version checks to all geo-rep commands. Min op-version should be 2. Change-Id: I942d897404e11e4d53123409731ba5cd252668fe BUG: 847839 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5732 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>