path: root/xlators/mgmt/glusterd/src/glusterd.h
* glusterd/snapshot: Recording the snapshots missed in each brick. (Avra Sengupta, 2014-04-02, 1 file, -3/+34)

Persisting missing snapshot info on disk as well as in memory in the following format:
-------------NODE-UUID--------------:---------SNAP-UUID--------------=BRICKNUM:-------BRICKPATH--------:OPERATION:STATUS
927cb5fe-63da-48f5-82f6-e6a09ddc81c4:a17b4fe42c5a45f7a916438643edaa13= 3 :/brick/brick-dirs/brick3: 1 : 1
927cb5fe-63da-48f5-82f6-e6a09ddc81c4:a17b4fe42c5a45f7a916438643edaa13= 3 :/brick/brick-dirs/brick3: 3 : 1
927cb5fe-63da-48f5-82f6-e6a09ddc81c4:83a3cc05453b46b2a7eda4c9a9208638= 3 :/brick/brick-dirs/brick3: 1 : 1

This data will be stored on disk at /var/lib/glusterd/snaps/missed_snaps_list.

In memory we maintain the data as a list of glusterd_missed_snap_info in conf; the keys for this list are the first two fields, i.e. NODE-UUID:SNAP-UUID. For every NODE-UUID:SNAP-UUID there can be multiple operations missed on multiple bricks, so we maintain a list of glusterd_snap_op_t for every node of glusterd_missed_snap_info.

This list is maintained or updated during snapshot create, delete, and restore operations, which are the only operations that, if missed, are recorded in this list.

During snapshot create, if a node is down, or a brick is down, we don't receive their mount point info. The snap_status of such bricks is marked as -1, and their brick details are added to this list.

During snapshot delete, we check from the originator node whether any other nodes holding bricks of the said snap are down. Those are also added to the list. Also, if a node is up but the snapshot was pending for a snap brick and its snap_status is -1, we add that to the list too. When a subsequent delete entry is processed for an already existing create entry, we just mark the create entry's status as done (2), and don't add the delete entry to the list.

During snapshot restore, we check from the originator node whether any other nodes holding bricks of the said snap are down. Those are also added to the list. Also, if a node is up but the snapshot was pending for a snap brick and its snap_status is -1, we add that to the list too.

Change-Id: I22578d14f81a54e13f6832966b70cd4cfdfd5b44
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/7208
Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
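The entry format above maps naturally onto a small data model. The following is an illustrative C sketch only: the names glusterd_missed_snap_info and glusterd_snap_op_t and the on-disk field order come from the commit message, but the struct layouts, helper name and numeric encodings here are assumptions, not the actual glusterd code.

    #include <stdio.h>

    /* one missed operation on one brick (layout assumed) */
    typedef struct {
        int brick_num;
        const char *brick_path;
        int op;      /* snapshot operation that was missed */
        int status;  /* 1 = pending, 2 = done, per the commit message */
    } snap_op_example;

    /* all missed ops for one NODE-UUID:SNAP-UUID key (layout assumed) */
    typedef struct {
        const char *node_uuid;
        const char *snap_uuid;
        const snap_op_example *ops;
        int nops;
    } missed_snap_example;

    /* serialize entries in the missed_snaps_list line format:
     * NODE-UUID:SNAP-UUID=BRICKNUM:BRICKPATH:OPERATION:STATUS */
    static void write_missed_entries(FILE *fp, const missed_snap_example *m)
    {
        for (int i = 0; i < m->nops; i++)
            fprintf(fp, "%s:%s=%d:%s:%d:%d\n",
                    m->node_uuid, m->snap_uuid,
                    m->ops[i].brick_num, m->ops[i].brick_path,
                    m->ops[i].op, m->ops[i].status);
    }

    int main(void)
    {
        snap_op_example ops[] = {
            { 3, "/brick/brick-dirs/brick3", 1, 1 },
            { 3, "/brick/brick-dirs/brick3", 3, 1 },
        };
        missed_snap_example m = {
            "927cb5fe-63da-48f5-82f6-e6a09ddc81c4",
            "a17b4fe42c5a45f7a916438643edaa13",
            ops, 2
        };
        write_missed_entries(stdout, &m);  /* real code writes missed_snaps_list */
        return 0;
    }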
* glusterd/snapshot: Making snap operations crash consistent (Avra Sengupta, 2014-04-02, 1 file, -0/+1)

In the event of a volume's brick being down, or a node being down, make snap ops like create, delete, restore, and status crash consistent.

Snap bricks which were not snapshotted because the volume brick was down have their snap status marked as -1, and those snap bricks are not started till the snapshot is taken.

During delete, bypass the lvm snapshot remove for snap bricks whose snap status is -1.

During restore, bypass replacing xattrs on the snapshot bricks whose snap status is -1. Also bump the restored volume's version so as to handle nodes being down. On handshake of a restored volume, pass the brick's snap_status as well.

During snapshot status, the details of a non-snapshotted brick display "N/A". If a node is down, the entry itself will not be displayed.

Change-Id: Id042efd7507829995270da0b2b2a6282a08a053d
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/7241
Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
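A minimal sketch of the convention described above, assuming a simplified brickinfo structure; only the -1 marker itself comes from the commit message, the type and helper names are illustrative.

    /* -1: brick (or its node) was down at create time, never snapshotted */
    #define SNAP_STATUS_PENDING (-1)

    struct example_brickinfo {
        int snap_status;
    };

    /* create marks the brick -1; delete then skips the lvm snapshot remove,
     * restore skips the xattr replacement, and start does not spawn a brick
     * process for it */
    static int snap_brick_was_snapshotted(const struct example_brickinfo *brick)
    {
        return brick->snap_status != SNAP_STATUS_PENDING;
    }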
* glusterd/snapshot: Using original brickinfo for snap brick path (Avra Sengupta, 2014-04-02, 1 file, -0/+1)

Using /var/run/gluster/snaps/<snap-uuid>/<original brick number>/<original brick's mount dir> as the snapshot brick path.

Change-Id: Icdd838512e662ed687662188bd197ed1a889fbea
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/7231
Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
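A sketch of the path construction implied by the commit message; the function name and error handling are assumptions, only the /var/run/gluster/snaps/<snap-uuid>/<brick-number><mount-dir> layout is taken from the text.

    #include <stdio.h>

    /* /var/run/gluster/snaps/<snap-uuid>/<orig-brick-number><orig-mount-dir>
     * (mount_dir is expected to start with '/') */
    static int build_snap_brick_path(char *buf, size_t len,
                                     const char *snap_uuid,
                                     int orig_brick_num,
                                     const char *orig_mount_dir)
    {
        int n = snprintf(buf, len, "/var/run/gluster/snaps/%s/%d%s",
                         snap_uuid, orig_brick_num, orig_mount_dir);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }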
* glusterd/snapshot : remove snap_volume variable from struct glusterd_snap_ (Vijaikumar M, 2014-03-25, 1 file, -2/+0)

Change-Id: I3c5eae3f199d97a2fe70599e2cdb4c357d20f20d
Signed-off-by: Vijaikumar M <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/7197
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* snapshot: cleanup unwanted CG-code (Vijaikumar M, 2014-03-25, 1 file, -40/+1)

Change-Id: Id3da2a7aac84027aeff050f6dd9a3484719362bc
BUG: 1073780
Signed-off-by: Vijaikumar M <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/7204
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd: do cleanup of snapshots in post-validate phase if half baked objects are there (Raghavendra Bhat, 2014-03-24, 1 file, -0/+5)

Change-Id: I372cac98ad054cdc1a6fbc7f6c77c25981063b2f
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/7237
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot : Restore fix. (Sachin Pandit, 2014-03-17, 1 file, -0/+3)

Copy the snapshot present in origin volume to the restored volume.

Change-Id: Icea9f712ae062b50b570d27b85cd231903446198
BUG: 1077142
Signed-off-by: Sachin Pandit <spandit@redhat.com>
Reviewed-on: http://review.gluster.org/7282
Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Validate snapshot name length while creation (Vijaikumar M, 2014-03-13, 1 file, -1/+0)

Change-Id: Iaee04f5209e278d83c086e32c4cafd6c571370cd
Signed-off-by: Vijaikumar M <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/7242
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Modified restore backend (Rajesh Joseph, 2014-03-10, 1 file, -0/+5)

Now, instead of creating the volume store files first, we construct the in-memory volinfo first and then generate the backend store files. This gives a lot of flexibility in the restore operation. This patch also fixes the read-only issue with restored snaps.

Change-Id: I51032228a5212fc3b90dc6e93f3539af3eb36074
BUG: 1064688
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/7209
Reviewed-by: Sachin Pandit <spandit@redhat.com>
Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
* glusterd/mgmt_v3 locks: Generalization of the volume wide locks. (Avra Sengupta, 2014-03-07, 1 file, -4/+4)

Renaming volume locks as mgmt_v3 locks.

Change-Id: I2a324e2b8e1772d7b165fe96ce8ba5b902c2ed9a
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/7187
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* gluster/snapshot: Create missing backend snapshot folder (Rajesh Joseph, 2014-03-07, 1 file, -8/+3)

Snapshot volume bricks are mounted under the /var/run/gluster/snaps folder. On newer machines /var/run is a symbolic link to the /run folder. In such cases, if we use /var/run, there would be a mismatch between the mtab entry for the mount and the folder where we actually mount. Therefore this patch gets the correct folder and also creates it if it is missing.

Change-Id: I267aa7f3e171b486c5b3bb2a9f88cbd4be0e47ea
BUG: 1072253
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-on: http://review.gluster.org/7140
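The fix amounts to resolving symlinks before mounting so that the recorded path and the mtab entry agree. A standalone sketch of that idea, with an assumed directory and none of the glusterd specifics:

    #include <errno.h>
    #include <limits.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>

    /* Resolve the snap mount root (so /var/run vs /run cannot diverge from
     * what mount(8) records), creating the leaf directory if it is missing.
     * Parent directories are assumed to exist in this sketch. */
    static int resolve_snap_mount_root(char *out, size_t outlen)
    {
        const char *root = "/var/run/gluster/snaps";
        char real[PATH_MAX];

        if (!realpath(root, real)) {
            if (errno != ENOENT)
                return -1;
            if (mkdir(root, 0755) != 0 && errno != EEXIST)
                return -1;
            if (!realpath(root, real))
                return -1;
        }
        if (strlen(real) + 1 > outlen)
            return -1;
        strcpy(out, real);
        return 0;
    }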
* glusterd/snapshot: store location for snap driven changes (Vijaikumar M, 2014-03-03, 1 file, -39/+48)

Currently snapshot volfiles are stored at:
<workdir>/vols/<volname>/snaps/<snapvol>

With snap driven approach we need to store the volfiles at:
<workdir>/snaps/<snapname>/<snapvol>

Change-Id: I8efdd5db29833b2b06b64a900cbb4c9b9a5d36b6
Signed-off-by: Vijaikumar M <vmallika@redhat.com>
Signed-off-by: Sachin Pandit <spandit@redhat.com>
Reviewed-on: http://review.gluster.org/7006
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Use snap uuid to create lvm snapshot (Vijaikumar M, 2014-02-13, 1 file, -0/+14)

Using the snap uuid to create the lvm snapshot solves the problems of having '-' in the snap name, or of the snap name being too long.

Change-Id: If204f02a8f5de599fb409d06c7893ef3542a6300
BUG: 1045333
Signed-off-by: Vijaikumar M <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/6709
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: Rajesh Joseph <rjoseph@redhat.com>
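The idea is simply to name the LV after the snap's UUID rather than the user-supplied name. A hedged sketch using libuuid (link with -luuid); the real code goes through glusterd's own uuid helpers and run-command wrappers, and the lvcreate invocation below is deliberately simplified.

    #include <stdio.h>
    #include <uuid/uuid.h>

    /* Build a (simplified) lvcreate command line; using the canonical UUID
     * string avoids '-' clashes and over-long names coming from the
     * user-chosen snap name. */
    static int build_lvcreate_cmd(char *cmd, size_t len,
                                  const uuid_t snap_uuid,
                                  const char *origin_lv /* e.g. "vg0/brick1" */)
    {
        char lvname[37];              /* 36-char canonical UUID + NUL */

        uuid_unparse(snap_uuid, lvname);
        int n = snprintf(cmd, len, "/sbin/lvcreate -s %s --name %s",
                         origin_lv, lvname);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }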
* glusterd/vol-locks: Dict_ref while adding req_ctx->dict to txn_opinfos (Avra Sengupta, 2014-01-21, 1 file, -2/+5)

Introducing a wrapper function glusterd_txn_opinfo_init(), to initialize the opinfo to be set in the txn_id engine.

Removed glusterd_op_fini_ctx() as the txn opinfo should only be cleared by glusterd_clear_txn_opinfo().

Change-Id: I17e85a162d6a3bca79941f8603d0c2b579f0d194
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* mgmt/glusterd : Having a separate list for snapshots. (Sachin Pandit, 2014-01-15, 1 file, -0/+1)

Creating a separate list for snaps taken, as cluttering snaps in the volume list does not look neat.

Change-Id: Ida4a183e95e8694b85ebb5a680d06b7d29a460a0
BUG: 1040947
Signed-off-by: Sachin Pandit <spandit@redhat.com>
* Snapshot: Gluster snapshot restore feature (Rajesh Joseph, 2014-01-15, 1 file, -8/+10)

Implemented gluster snapshot restore feature. The restore is done by replacing the origin volume with the snap volume.

TODO: After the restore the snapshot volume should be deleted. As of now the deletion work is pending.

Change-Id: Ib137fb6bb84a74030607ffa47f89cd705dc7e1ff
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Defining snap-max-soft-limit as a percentage of snap-max-hard-limit. (Avra Sengupta, 2014-01-07, 1 file, -2/+3)

This patch also prohibits configuration of snap-max-hard-limit and snap-max-soft-limit for snap volumes.

Also displaying the snapshot configs by reading data only from local node, as all config data will be in sync across the cluster.

Change-Id: I635b925c02ed5b108cd10c7193b154ad82d5afad
BUG: 1043792
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
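The relationship is plain percentage arithmetic; a one-line sketch (names illustrative):

    /* e.g. a hard limit of 256 with a 90% soft limit gives 230 */
    static unsigned long effective_soft_limit(unsigned long snap_max_hard_limit,
                                              unsigned long soft_limit_percent)
    {
        return (snap_max_hard_limit * soft_limit_percent) / 100;
    }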
* glusterd/snapshot: Introducing snap-max-hard-limit and snap-max-soft-limit (Avra Sengupta, 2014-01-06, 1 file, -2/+4)

Note: Manually adding this patch again as it got missed in a git reset done on the remote development branch.

Change-Id: I9e81c5ec003c1e1722d0fcb27dd87c365ee43ff4
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Fix Displaying Port, Online Status and Pid for snap vols in volume status (Avra Sengupta, 2013-12-10, 1 file, -0/+7)

Added a parent_volname member in the glusterd_volinfo_ structure to help point the snap vol to the parent volname. Using this to fetch the pidfile location during volume status.

Change-Id: I30a16646561394d0f7d16f66abff14c425f31f06
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
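A sketch of how such a parent_volname field can be used when resolving the pidfile directory for a snap brick; the structure, helper and directory shown here are placeholders, not the real glusterd_volinfo_ definition or run-directory layout.

    #include <stdio.h>

    struct example_volinfo {
        char volname[256];
        char parent_volname[256];   /* the member added by this change */
        int  is_snap_volume;
    };

    /* volume status must look snap bricks up under the parent volume's
     * run directory, hence the parent_volname indirection */
    static void example_pidfile_dir(const struct example_volinfo *vol,
                                    char *buf, size_t len)
    {
        const char *name = vol->is_snap_volume ? vol->parent_volname
                                               : vol->volname;
        snprintf(buf, len, "/var/run/gluster/vols/%s", name);  /* placeholder dir */
    }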
* mgmt/glusterd: handle issues present in snapshot create and snap management (Raghavendra Bhat, 2013-12-02, 1 file, -2/+11)

Change-Id: I94b5f6e00be7d1ff0c454e291c779dae7b423748
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: create the pid file for the brick process of the snap volume (Raghavendra Bhat, 2013-11-29, 1 file, -0/+10)

Change-Id: I8fafb19c2c13caac2a509c36f99d2dd782865b13
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* snapshot: Snapshot restore (Rajesh Joseph, 2013-11-15, 1 file, -0/+1)

GL-31: Ability to restore snapshot

Implemented snapshot restore for thin logical volume. As of now snapshot restore for CG is not tested.

Testing for snapshot restore of a volume is done by changing the snapshot create process to create a thick snapshot. This is done because the --merge option to restore a thin volume is not working in the latest kernel.

Change-Id: Ia3ded7e6c4da5957a74e269a25ba3200e6fb2d8b
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd: snapshot config changes (shishir gowda, 2013-11-15, 1 file, -0/+3)

Also refactored code in glusterd for the create command. Additionally, removed the brick-op func from mgmt_iniate_all_phases.

Change-Id: Iddcc332009c5716adee7f2b04c93b352fb983446
Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: store for CG (shishir gowda, 2013-11-15, 1 file, -0/+5)

Change-Id: I6ec888a5553ad29ded032c02c80dd940b2aae007
Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: snapshot create command (Raghavendra Bhat, 2013-11-15, 1 file, -2/+11)

This is still a work in progress. As of now, these things are done:
* Take the snapshot of the backend brick
* Create the new volume for the snapshot
* Create the brick and the client volfiles
* Store the snapshot related info in /var/lib/glusterd
* Create the snap object representing the snapshot

TODO: Start the brick processes for the snapshot

Change-Id: I26fbb0f8e5cf004d4c1dbca51819bab1cd1bac15
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: Snapshot list support (Rajesh Joseph, 2013-11-15, 1 file, -0/+1)

Handles the snapshot list command issued by cli. Details of all the snapshots will be sent back to the caller in the required format.

Change-Id: I01e512290548007c06e90b40a59cdde048fab954
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd: Store for snapshot (shishir gowda, 2013-11-15, 1 file, -0/+16)

Introduced a new store for storing the snapshot list for a given volume:
$GLUSTERD_INSTALL_PATH/vols/<volname>/snap_list.info
$GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/
$GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/info  <- snapshot volume info
$GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/bricks  <- snapshot volume brick dir
$GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/bricks/<infos>  <- snapshot volume brick info files

store delete options

TODO -
$GLUSTERD_INSTALL_PATH/CG/  <- place holder for all cg's
.../CG/<cg-name>/info  <- per cg information placeholder

Change-Id: I1f9fd8ff7cc0682d05b33965736a43dca6adb3e9
Signed-off-by: shishir gowda <sgowda@redhat.com>
* cli: snapshot create cli interface. (Avra Sengupta, 2013-11-15, 1 file, -0/+4)

$ gluster snapshot help
snapshot help - display help for snapshot commands
snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>] - Snapshot Create.

$ gluster snapshot create vol1
snapshot create: ???: snap created successfully

$ gluster snapshot create vol1 vol2
snapshot create: ???: consistency group created successfully
(The ??? will be replaced by the glusterd snap create command with the generated snap-name or cg-name)

$ gluster snapshot create vol1 vol2 -n CG1
snapshot create: CG1: consistency group created successfully

$ gluster snapshot create vol1 -n snap1 -d Description
snapshot create: snap1: snap created successfully

$ gluster snapshot create vol1 -n snap1 -d "Description can have -d within quotes"
snapshot create: snap1: snap created successfully

$ gluster snapshot create vol1 -n snap1 -d Description cant have -d without quotes
snapshot create: failed: Options(-n/-d) are not valid descriptions
Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>]

$ gluster snapshot create vol1 -n "Multi word snap name" -d Description
snapshot create: failed: Invalid snap name
Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>]

$ gluster snapshot create vol1 -d Description -n "-d"
snapshot create: failed: Options(-n/-d) are not valid snap names
Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>]

$ gluster snapshot create vol1 -d -n snap1
snapshot create: failed: No description provided
Usage: snapshot create <volnames> [-n <snap-name/cg-name>] [-d <description>]

Change-Id: I74b5a8406d72282fbb7ba7d07e0c7fe395148d38
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* mgmt/glusterd: Introduce snapshot infrastructure (shishir gowda, 2013-11-15, 1 file, -0/+89)

APIs for creating, adding, finding and removing snapshots and consistency groups are provided.

Change-Id: Ic28da69a075b062aefdf14754c68259ca58bd427
Signed-off-by: shishir gowda <sgowda@redhat.com>
* bd: Add support to create clone, snapshot and merge of LV images. (M. Mohan Kumar, 2013-11-13, 1 file, -2/+4)

Special xattr names "clone" & "snapshot" can be used to create full and linked clones of the LV images. The GFID of the destination posix file (to be mapped) is passed as a value to the xattr. The destination posix file must exist before running this operation. These operations form a basis for offloading storage related operations from QEMU to GlusterFS.

Syntax for full clone:
xattr name: "clone" value: "gfid-of-dest-file"

Syntax for linked clone:
xattr name: "snapshot" value: "gfid-of-dest-file"

Syntax for merging:
xattr name: "merge" value: "path-to-snapshot-file"

Example:
setfattr -n clone -v <gfid-of-dest-file> /media/source
setfattr -n snapshot -v <gfid-of-dest-file> /media/source
setfattr -n merge -v "/media/sn" /media/sn

Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* bd: posix/multi-brick support to BD xlator (M. Mohan Kumar, 2013-11-13, 1 file, -0/+7)

Current BD xlator (block backend) has a few limitations such as
* Creation of directories not supported
* Supports only single brick
* Does not use extended attributes (and client gfid) like posix xlator
* Creation of special files (symbolic links, device nodes etc) not supported

Basic limitation of not allowing directory creation is blocking oVirt/VDSM from consuming the BD xlator as part of a Gluster domain, since VDSM creates multi-level directories when GlusterFS is used as the storage backend for storing VM images.

To overcome these limitations a new BD xlator with the following improvements is suggested:
* New hybrid BD xlator that handles both regular files and block device files
* The volume will have both POSIX and BD bricks. Regular files are created on POSIX bricks, block devices are created on the BD brick (VG)
* BD xlator leverages the existing POSIX xlator for most POSIX calls and hence sits above the POSIX xlator
* Block device file is differentiated from regular file by an extended attribute
* The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a posix file to a Logical Volume (LV).
* When a client sends a request to set BD_XATTR on a posix file, a new LV is created and mapped to the posix file. So every block device will have a representative file in the POSIX brick with 'user.glusterfs.bd' (BD_XATTR) set.
* Hereafter all operations on this file result in LV related operations. For example, opening a file that has BD_XATTR set results in opening the LV block device, reading results in reading the corresponding LV block device.

When the BD xlator gets a request to set BD_XATTR via a setxattr call, it creates an LV and information about this LV is placed in the xattr of the posix file. The xattr "user.glusterfs.bd" is used to identify that the posix file is mapped to a BD.

Usage:

Server side:
[root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2
It creates a distributed gluster volume 'bdvol' with Volume Group vg1 using posix brick /storage/vg1_info in host1 and Volume Group vg2 using /storage/vg2_info in host2.
[root@host1 ~]# gluster volume start bdvol

Client side:
[root@node ~]# mount -t glusterfs host1:/bdvol /media
[root@node ~]# touch /media/posix
It creates a regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick.
[root@node ~]# mkdir /media/image
[root@node ~]# touch /media/image/lv1
It also creates a regular posix file 'lv1' in either host1:/vg1 or host2:/vg2 brick.
[root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1
[root@node ~]#
The above setxattr results in creating a new LV in the corresponding brick's VG and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size>'.
[root@node ~]# truncate -s5G /media/image/lv1
It results in resizing LV 'lv1' to 5G.

New BD xlator code is placed in the xlators/storage/bd directory.

Also add volume-uuid to the VG so that the same VG can't be used for other bricks/volumes. After deleting a gluster volume, one has to manually remove the associated tag using vgchange <vg-name> --deltag <trusted.glusterfs.volume-id:<volume-id>>

Changes from previous version V5:
* Removed support for delayed deleting of LVs

Changes from previous version V4:
* Consolidated the patches
* Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR.
Changes from previous version V3:
* Added support in FUSE to support full/linked clone
* Added support to merge snapshots and provide information about origin
* bd_map xlator removed
* iatt structure used in inode_ctx. iatt is cached and updated during fsync/flush
* aio support
* Type and capabilities of volume are exported through getxattr

Changes from version 2:
* Used inode_context for caching BD size and to check if loc/fd is BD or not.
* Added GlusterFS server offloaded copy and snapshot through setfattr FOP. As part of this libgfapi is modified.
* BD xlator supports stripe
* During unlinking, if an LV file is already opened, it's added to a delete list and bd_del_thread tries to delete from this list when the last reference to that file is closed.

Changes from previous version:
* gfid is used as the name of the LV
* ? is used to specify the VG name for creating a BD volume in volume create, add-brick: gluster volume create volname host:/path?vg
* open-behind issue is fixed
* A replicate brick can be added dynamically and LVs from the source brick are replicated to the destination brick
* A distribute brick can be added dynamically and the rebalance operation distributes existing LVs/files to the new brick
* Thin provisioning support added.
* bd_map xlator support retained
* setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and setfattr -n user.glusterfs.bd -v "thin" creates a thin LV
* Capability and backend information added to gluster volume info (and --xml) so that management tools can exploit the BD xlator.
* tracing support for bd xlator added

TODO:
* Add support to display snapshots for a given LV
* Display posix filename for list-origin instead of gfid

Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/4809
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* bd_map: Remove bd_map xlator (M. Mohan Kumar, 2013-11-13, 1 file, -8/+0)

Remove bd_map xlator and CLI related changes.

Change-Id: If7086205df1907127c1a1fa4ba603f1c48421d09
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5747
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: add option to specify a different base-port (Kaleb S. KEITHLEY, 2013-10-31, 1 file, -99/+100)

This is (arguably) a hack to work around a bug in libvirt, which is not well behaved with respect to using TCP ports in the unreserved space between 49152-65535. (See RFC 6335.)

Normally glusterd starts and binds to the first available port in range, usually 49152. libvirt's live migration also tries to use ports in this range, but has no fallback to use (an)other port(s) when the one it wants is already in use.

Change-Id: Id8fe35c08b6ce4f268d46804bbb6dddab7a6b7bb
BUG: 1018178
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/6210
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cli,glusterd: Changes to cli-glusterd communication (Kaushal M, 2013-10-17, 1 file, -0/+1)

Glusterd changes:
With this patch, glusterd creates a socket file in DATADIR/run/glusterd.socket, and listens on it for cli requests. It listens for 2 rpc programs on the socket file:
- The glusterd cli rpc program, for all cli commands
- A reduced glusterd handshake program, just for the 'system:: getspec' command

The location of the socket file can be changed with the glusterd option 'glusterd-sockfile'.

To retain compatibility with the '--remote-host' cli option, glusterd also listens for the cli requests on port 24007. But, for the sake of security, it listens using a reduced cli rpc program on the port. The reduced rpc program only contains read-only procs used for 'volume (info|list|status)', 'peer status' and 'system:: getwd' cli commands.

CLI changes:
The gluster cli now uses the glusterd socket file for communicating with glusterd by default. A new option '--gluster-sock' has been added to allow specifying the sockfile used to connect. Using the '--remote-host' option will make cli connect to the given host & port.

Tests changes:
cluster.rc has been modified to make use of socket files and use different log files for each glusterd. Some of the tests using cluster.rc have been fixed.

Change-Id: Iaf24bc22f42f8014a5fa300ce37c7fc9b1b92b53
BUG: 980754
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5280
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Validating invalid log-level under geo-rep config options. (Ajeet Jha, 2013-10-03, 1 file, -3/+4)

Change-Id: I8ff6b48ef41fd6e9ea68c42dfb9878f8a08ed627
BUG: 1010874
Signed-off-by: Ajeet Jha <ajha@redhat.com>
Reviewed-on: http://review.gluster.org/5989
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
* logging: Remove multiple definitions of DEFAULT_LOG_FILE_DIRECTORY (Vijay Bellur, 2013-09-24, 1 file, -1/+0)

Change-Id: I8d670a228d3c1282aa7d70b151f166d04abc40e5
BUG: 764890
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/5909
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
* glusterd : Blocking invalid values for geo-rep config options (Avra Sengupta, 2013-09-17, 1 file, -0/+7)

Change-Id: Ia9ee44763a9c2798b26d3225bf03a974d7ece21f
BUG: 998962
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5666
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli,glusterd: Task parameters in xml output (Kaushal M, 2013-09-13, 1 file, -0/+2)

This patch introduces task parameters for the asynchronous tasks shown in volume status. The parameters are only given for xml output. The parameters shown currently are:

- source and destination bricks for replace-brick tasks
......
<tasks>
  <task>
    <type>Replace brick</type>
    <id>3d1a1005-9d2e-4ae0-bd62-577bc1d333a3</id>
    <status>1</status>
    <params>
      <srcBrick>archm:/export/test4</srcBrick>
      <dstBrick>archm:/export/test-replace1</dstBrick>
    </params>
  </task>
</tasks>
......

- list of bricks being removed for remove-brick tasks
......
<tasks>
  <task>
    <type>Remove brick</type>
    <id>901c20ca-0da2-41de-8669-5f0caca6b846</id>
    <status>1</status>
    <params>
      <brick>archm:/export/test2</brick>
      <brick>archm:/export/test3</brick>
    </params>
  </task>
</tasks>
......

The changes for non-xml output will be done in a subsequent patch.

Change-Id: I322afe2f83ed8adeddb99f7962c25911204dc204
BUG: 916577
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/5771
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
* glusterd/gverify: Check for passwordless ssh in gverify. (Venky Shankar, 2013-09-04, 1 file, -0/+3)

Change-Id: I8c2d398114ad4534bcc052f9a5be8bbb2e7e2582
BUG: 999531
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/5677
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Treat migration failures due to space constraints as skipped (shishir gowda, 2013-07-30, 1 file, -0/+1)

Currently rebalance/remove-brick ops display a failed migration count even for files which failed due to space issues (not enough space for the file, or migration leading to cluster imbalance).

These will now be counted as skipped, and rebalance/remove-brick status will display the additional counter.

Change-Id: I674904d380b5f8300e9ca9e6af557c3d30d6cff4
BUG: 989846
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5399
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd/cli changes for distributed geo-rep (Avra Sengupta, 2013-07-26, 1 file, -1/+17)

Commands:
gluster system:: execute gsec_create
gluster volume geo-rep <master> <slave-url> create [push-pem] [force]
gluster volume geo-rep <master> <slave-url> start [force]
gluster volume geo-rep <master> <slave-url> stop [force]
gluster volume geo-rep <master> <slave-url> delete
gluster volume geo-rep <master> <slave-url> config
gluster volume geo-rep <master> <slave-url> status

The geo-replication is distributed. The session will be created, and gsyncd will be spawned on all relevant nodes, instead of only one node.

geo-rep: Collecting status detail related data
Added persistent store for saving information about TotalFilesSynced, TotalSyncTime, TotalBytesSynced.

Changes in the status information in socket:
Existing(Ex): FilesSynced=2;BytesSynced=2507;Uptime=00:26:01;
New(Ex): FilesSynced=2;BytesSynced=2507;Uptime=00:26:01;SyncTime=0.69978;TotalSyncTime=2.890044;TotalFilesSynced=6;TotalBytesSynced=143640;

Persistent details stored in /var/lib/glusterd/geo-replication/${mastervol}/${eSlave}-detail.status

Change-Id: I1db7fc13ffca2e415c05200b0109b1254067f111
BUG: 847839
Original Author: Avra Sengupta <asengupt@redhat.com>
Original Author: Venky Shankar <vshankar@redhat.com>
Original Author: Aravinda VK <avishwan@redhat.com>
Original Author: Amar Tumballi <amarts@redhat.com>
Original Author: Csaba Henk <csaba@redhat.com>
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/5132
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
* store: move glusterd_store functions from mgmt/glusterd to libglusterfs (Niels de Vos, 2013-06-20, 1 file, -12/+6)

Making the glusterd_store_* functions re-usable will help with future changes that need to read/write lists of items.

BUG: 904065
Change-Id: I99fb8eced76d12d5a254567eccff9790b43d8da3
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/4676
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Log peer op status at the appropriate time (Krutika Dhananjay, 2013-06-18, 1 file, -4/+5)

Change-Id: Ia8e1af082078f2f791708ba4faa4992bf291dd6e
BUG: 961339
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/5023
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix misleading log messages of the kind "Node <n> responded to <n>" (Krutika Dhananjay, 2013-05-16, 1 file, -2/+3)

PROBLEM:
glusterd logs coming from glusterd_xfer_friend_add_resp() (wrongly) indicate that a node responded to itself, although it actually responded to one of its peers.

FIX:
Make glusterd_xfer_friend_add_resp() distinguish between remote host and self and print the appropriate hostname.

Change-Id: I2a504eeb058c08a0d378443888eb6f1dc7edc76f
BUG: 963537
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/5017
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Start bricks on glusterd startup, only once (Krishnan Parthasarathi, 2013-05-15, 1 file, -0/+1)

The restarting of bricks has been deferred until the cluster 'stabilizes' its view of the volumes. Since glusterd_spawn_daemons is executed every time a peer 'joins' the cluster, it may inadvertently restart bricks that were taken offline for, say, maintenance purposes. This fix avoids that.

Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba
BUG: 960190
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4973
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Introduce volume op-versions (Kaushal M, 2013-04-26, 1 file, -0/+3)

Each volume is now associated with two op-versions:
* op_version - the op-version of the highest op-versioned feature enabled
* client_op_version - the op-version of the highest op-versioned feature enabled which affects the clients only

These two op-versions are generated dynamically and kept updated during runtime. Glusterd now uses the respective volumes' client-op-version during getspec requests.

To achieve the above, a new field is introduced in the vme table, client_option; this boolean field tells if the option is a client side option.

Change-Id: I12c83b1dd29ab506026efd50d448cebbcee53c27
BUG: 907311
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/4584
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
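A sketch of the op-version computation described above, over a simplified option table; the entry type and the client_option flag mirror the description, everything else is illustrative.

    struct example_vme {
        const char *key;
        unsigned int op_version;   /* op-version that introduced the option */
        int client_option;         /* the new boolean field */
    };

    /* scan the options actually set on a volume and keep the maxima */
    static void compute_volume_op_versions(const struct example_vme *set_opts,
                                           int count,
                                           unsigned int *op_version,
                                           unsigned int *client_op_version)
    {
        *op_version = 1;
        *client_op_version = 1;
        for (int i = 0; i < count; i++) {
            if (set_opts[i].op_version > *op_version)
                *op_version = set_opts[i].op_version;
            if (set_opts[i].client_option &&
                set_opts[i].op_version > *client_op_version)
                *client_op_version = set_opts[i].op_version;
        }
    }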
* glusterd: big lock - a coarse-grained locking to prevent races (Krishnan Parthasarathi, 2013-04-12, 1 file, -0/+11)

There are primarily three lists that are part of the glusterd process that are concurrently accessed. Namely, priv->volumes, priv->peers and volinfo->bricks_list.

Big-lock approach
-----------------
WHAT IS IT?
Big lock is a coarse-grained lock which protects all three lists, mentioned above, from racy access.

HOW DOES IT WORK?
At any given point in time, glusterd's thread(s) are in execution _iff_ there is a preceding, inbound network event. Of course, the sigwaiter thread and timer thread are exceptions. A network event is an external trigger to glusterd, via the epoll thread, in the form of POLLIN and POLLERR.

As long as we take the big-lock at all such entry points and yield it when we are done, we are guaranteed that all the network events, accessing the global lists, are serialised. This amounts to holding the big lock at
- all the handlers of all the actors in glusterd (POLLIN)
- all the cbks in glusterd (POLLIN)
- rpc_notify (DISCONNECT event), if we access/modify one of the three lists (POLLERR)

In the case of synctask'ized volume operations, we must remember that, if we held the big lock for the entire duration of the handler, we may block other non-synctask rpc actors from executing. For eg, volume-start would block in PMAP SIGNIN, if done incorrectly. To prevent this, we need to yield the big lock when we yield the synctask, and reacquire it on waking up of the synctask.

Change-Id: Ib929f9905b55fb6c3fc27fefb497a26dba058e4f
BUG: 948686
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4784
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
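The usage pattern reduces to "lock on entry from the epoll thread, unlock on exit (and around synctask yields)". A stand-in sketch with a plain pthread mutex; the real big lock is a synctask-aware lock so that synctasks can drop and reacquire it across yields, but the shape of the pattern is the same.

    #include <pthread.h>

    /* stand-in for glusterd's big lock protecting priv->volumes,
     * priv->peers and volinfo->bricks_list */
    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

    static int example_rpc_actor(void *req)
    {
        (void)req;
        pthread_mutex_lock(&big_lock);
        /* ... handler/cbk work that touches the shared lists ... */
        pthread_mutex_unlock(&big_lock);   /* yield before blocking waits
                                              (e.g. PMAP SIGNIN) so other
                                              actors are not starved */
        return 0;
    }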
* mgmt/glusterd: enable valgrind usage even in non DEBUG build (Raghavendra Bhat, 2013-04-09, 1 file, -2/+0)

* Till now, running glusterfs processes were allowed to run in valgrind mode only when built with debug mode enabled.

Change-Id: I11e07ea2a4da4f82f70cdded6258a22d65d6db64
BUG: 922877
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4688
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
* glusterd: Removed fd leaks in glusterfs_start utility function (Krishnan Parthasarathi, 2013-03-25, 1 file, -6/+8)

PROBLEM:
The FILE* associated with the pidfile was leaked if pmap_registry_search on the brickinfo's path failed.

FIX:
Eliminates the use of the FILE* that was leaked. Uses the glusterd_is_service_running utility function in place of the earlier attempt to check for the same.

Change-Id: I94082bd5a94b8a6340f8cc11726d3264e364efe6
BUG: 916549
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/4596
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
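The shape of the fix, checking liveness from the pidfile without keeping a FILE* open on any path, looks roughly like the standalone sketch below; glusterd_is_service_running is the real helper name from the message, but this version only mirrors the idea and is not its implementation.

    #include <signal.h>
    #include <stdio.h>

    static int example_is_service_running(const char *pidfile, int *pid_out)
    {
        FILE *fp = fopen(pidfile, "r");
        int pid = -1;
        int running = 0;

        if (!fp)
            return 0;
        if (fscanf(fp, "%d", &pid) == 1 && pid > 0)
            running = (kill(pid, 0) == 0);  /* signal 0 = existence check */
        fclose(fp);                          /* closed on every path: no leak */
        if (pid_out)
            *pid_out = pid;
        return running;
    }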
* glusterd: Mark vol as deleted by renaming voldir before cleaning up the store (Krutika Dhananjay, 2013-03-11, 1 file, -0/+1)

PROBLEM:
During 'volume delete', when glusterd fails to erase all information about a volume from the backend store (for instance because rmdir() failed on non-empty directories), not only does volume delete fail on that node, but also subsequent attempts to restart glusterd fail because the volume store is left in an inconsistent state.

FIX:
Rename the volume directory path to a new location <working-dir>/trash/<volume-id>.deleted, and then go on to clean up its contents. The volume is considered deleted once rename() succeeds, irrespective of whether the cleanup succeeds or not.

Change-Id: Iaf18e1684f0b101808bd5e1cd53a5d55790541a8
BUG: 889630
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/4639
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
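A sketch of the "mark deleted by rename" idea; paths, names and error handling are illustrative rather than the exact glusterd store layout.

    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>

    static int example_mark_volume_deleted(const char *workdir,
                                           const char *volname,
                                           const char *volume_id)
    {
        char voldir[4096], trashroot[4096], trashdir[4096];

        snprintf(voldir, sizeof(voldir), "%s/vols/%s", workdir, volname);
        snprintf(trashroot, sizeof(trashroot), "%s/trash", workdir);
        snprintf(trashdir, sizeof(trashdir), "%s/trash/%s.deleted",
                 workdir, volume_id);

        if (mkdir(trashroot, 0755) != 0 && errno != EEXIST)
            return -1;

        /* the volume counts as deleted once this rename succeeds; cleaning
         * up trashdir afterwards may fail without leaving the store in an
         * inconsistent state */
        return rename(voldir, trashdir);
    }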