path: root/xlators/mgmt/glusterd/src/glusterd-utils.c
Commit message | Author | Age | Files | Lines
* glusterd/snapshot: Addressed upstream review comments (HEAD, development) | Rajesh Joseph | 2014-04-07 | 1 | -18/+16
  Code cleanup and review comments fixed.
  Change-Id: Ie51e3a2f995295632cb1db05bab8b38603ab9a0a
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-on: http://review.gluster.org/7399
  Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/snapshot: Recording the snapshots missed in each brick. | Avra Sengupta | 2014-04-02 | 1 | -1/+200
  Persisting missing snapshot info on disk as well as in memory in the following format:
  -------------NODE-UUID--------------:---------SNAP-UUID--------------=BRICKNUM:-------BRICKPATH--------:OPERATION:STATUS
  927cb5fe-63da-48f5-82f6-e6a09ddc81c4:a17b4fe42c5a45f7a916438643edaa13= 3 :/brick/brick-dirs/brick3: 1 : 1
  927cb5fe-63da-48f5-82f6-e6a09ddc81c4:a17b4fe42c5a45f7a916438643edaa13= 3 :/brick/brick-dirs/brick3: 3 : 1
  927cb5fe-63da-48f5-82f6-e6a09ddc81c4:83a3cc05453b46b2a7eda4c9a9208638= 3 :/brick/brick-dirs/brick3: 1 : 1
  This data is stored on disk at /var/lib/glusterd/snaps/missed_snaps_list. In memory the data is maintained as a list of glusterd_missed_snap_info in conf; the key for this list is the first two fields, i.e. NODE-UUID:SNAP-UUID. For every NODE-UUID:SNAP-UUID there can be multiple operations missed on multiple bricks, so a list of glusterd_snap_op_t is maintained for every node of glusterd_missed_snap_info.
  This list is maintained or updated during snapshot create, delete, and restore operations, which are the only operations that are recorded in this list if missed.
  During snapshot create, if a node or a brick is down, its mount point info is not received. The snap_status of such bricks is marked as -1, and their brick details are added to this list.
  During snapshot delete, the originator node checks whether any other nodes holding bricks of the said snap are down; those are added to the list. Also, if a node is up but the snapshot was pending for a snap brick and its snap_status is -1, that is added to the list too. When a subsequent delete entry is processed for an already existing create entry, the create entry's status is simply marked as done (2) and the delete entry is not added to the list.
  During snapshot restore, the same checks are made: bricks on nodes that are down, and snap bricks whose snap_status is -1, are added to the list.
  Change-Id: I22578d14f81a54e13f6832966b70cd4cfdfd5b44
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/7208
  Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
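  The format above can be read back mechanically. Below is a minimal standalone sketch of parsing one such line; the struct, field names and the op/status encodings are illustrative assumptions for this example, not glusterd's own definitions (the commit itself uses glusterd_missed_snap_info and glusterd_snap_op_t).

    /* Sketch only: parse one line of
     * "NODE-UUID:SNAP-UUID=BRICKNUM:BRICKPATH:OPERATION:STATUS". */
    #include <stdio.h>

    struct missed_snap_line {
        char node_uuid[64];
        char snap_uuid[64];
        int  brick_num;
        char brick_path[256];
        int  op;        /* operation code as stored in the file */
        int  status;    /* status code as stored in the file */
    };

    static int parse_missed_snap(const char *line, struct missed_snap_line *e)
    {
        /* %[^:] and %[^=] read up to the next delimiter; %d skips the
         * padding spaces seen around the numeric fields in the samples. */
        int n = sscanf(line, "%63[^:]:%63[^=]=%d :%255[^:]: %d : %d",
                       e->node_uuid, e->snap_uuid, &e->brick_num,
                       e->brick_path, &e->op, &e->status);
        return (n == 6) ? 0 : -1;
    }

    int main(void)
    {
        struct missed_snap_line e;
        const char *sample =
            "927cb5fe-63da-48f5-82f6-e6a09ddc81c4:"
            "a17b4fe42c5a45f7a916438643edaa13= 3 :/brick/brick-dirs/brick3: 1 : 1";

        if (parse_missed_snap(sample, &e) == 0)
            printf("node=%s snap=%s brick#%d path=%s op=%d status=%d\n",
                   e.node_uuid, e.snap_uuid, e.brick_num,
                   e.brick_path, e.op, e.status);
        return 0;
    }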
* glusterd/snapshot: Making snap operations crash consistent | Avra Sengupta | 2014-04-02 | 1 | -2/+49
  In the event of a volume's brick or a node being down, this makes snap operations like create, delete, restore, and status crash consistent.
  Snap bricks that were not snapshotted because the volume brick was down have their snap status marked as -1, and those snap bricks are not started until the snapshot is taken.
  During delete, the lvm snapshot remove is bypassed for snap bricks whose snap status is -1.
  During restore, replacing xattrs on snapshot bricks whose snap status is -1 is bypassed. The restored volume's version is also bumped so as to handle nodes being down. On handshake of a restored volume, the brick's snap_status is passed as well.
  During snapshot status, the details of non-snapshotted bricks are displayed as "N/A". If a node is down, the entry itself is not displayed.
  Change-Id: Id042efd7507829995270da0b2b2a6282a08a053d
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/7241
  Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Using original brickinfo for snap brick path | Avra Sengupta | 2014-04-02 | 1 | -0/+1
  Using /var/run/gluster/snaps/<snap-uuid>/<original brick number>/<original brick's mount dir> as the snapshot brick path.
  Change-Id: Icdd838512e662ed687662188bd197ed1a889fbea
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/7231
  Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* Review Comments | Avra Sengupta | 2014-03-27 | 1 | -2/+5
  Change-Id: Ifce90b0b617bc0b43a9af0bd692a7290820ac62c
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/7358
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot : glusterd-utils related fix for upstream review comments | Sachin Pandit | 2014-03-27 | 1 | -30/+28
  Change-Id: I7add37a8e8a7d68de9d4c09110987ed1afd2f86c
  Signed-off-by: Sachin Pandit <spandit@redhat.com>
  Reviewed-on: http://review.gluster.org/7356
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot : remove snap_volume variable from struct glusterd_snap_ | Vijaikumar M | 2014-03-25 | 1 | -11/+11
  Change-Id: I3c5eae3f199d97a2fe70599e2cdb4c357d20f20d
  Signed-off-by: Vijaikumar M <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/7197
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* snapshot/config : snapshot config changes. | Sachin Pandit | 2014-03-25 | 1 | -1/+16
  Syntax: gluster snapshot config [volname] [snap-max-hard-limit <value>] [snap-max-soft-limit <value>]
  1) If the volume name is not specified, the value is set system wide.
  2) If the volume name is specified, the value is set for that particular volume.
  NOTE: snap-max-soft-limit cannot be specified for individual volumes.
  Case 1) System limit is greater than the volume limit. Sample output:
  Snapshot System Configuration:
  snap-max-hard-limit : 200
  snap-max-soft-limit : 50%
  Snapshot Volume Configuration:
  Volume : vol2
  snap-max-hard-limit : 100
  Effective snap-max-hard-limit : 100
  Effective snap-max-soft-limit : 50 (50%)
  Volume : vol1
  snap-max-hard-limit : 150
  Effective snap-max-hard-limit : 150
  Effective snap-max-soft-limit : 75 (50%)
  Case 2) System limit is less than the volume limit. Sample output:
  Snapshot System Configuration:
  snap-max-hard-limit : 50
  snap-max-soft-limit : 50%
  Snapshot Volume Configuration:
  Volume : vol2
  snap-max-hard-limit : 100
  Effective snap-max-hard-limit : 50
  Effective snap-max-soft-limit : 25 (50%)
  Volume : vol1
  snap-max-hard-limit : 150
  Effective snap-max-hard-limit : 50
  Effective snap-max-soft-limit : 25 (50%)
  Change-Id: I97b5daefec7205bb9ab7b5b51d38f504cc5ee940
  BUG: 1075034
  Signed-off-by: Sachin Pandit <spandit@redhat.com>
  Reviewed-on: http://review.gluster.org/7303
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
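  The "Effective" values in the sample output above follow simple arithmetic: the effective hard limit is the lower of the system-wide and per-volume hard limits, and the effective soft limit is the soft-limit percentage applied to that value. A small illustrative sketch (names are hypothetical, not glusterd's):

    #include <stdio.h>

    /* Effective hard limit = min(system hard limit, volume hard limit);
     * effective soft limit = soft-limit percentage of the effective hard limit. */
    static void effective_limits(unsigned sys_hard, unsigned vol_hard,
                                 unsigned soft_pct,
                                 unsigned *eff_hard, unsigned *eff_soft)
    {
        *eff_hard = (vol_hard < sys_hard) ? vol_hard : sys_hard;
        *eff_soft = (*eff_hard * soft_pct) / 100;
    }

    int main(void)
    {
        unsigned hard, soft;

        /* Case 1, vol2: system 200, volume 100, soft 50% -> 100 and 50 */
        effective_limits(200, 100, 50, &hard, &soft);
        printf("vol2: hard=%u soft=%u\n", hard, soft);

        /* Case 2, vol1: system 50, volume 150, soft 50% -> 50 and 25 */
        effective_limits(50, 150, 50, &hard, &soft);
        printf("vol1: hard=%u soft=%u\n", hard, soft);
        return 0;
    }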
* glusterd/snapshot: populate the snapshot volume list in the order when the glusterd is restarted | Vijaikumar M | 2014-03-25 | 1 | -0/+18
  Snapshot objects are already stored in order; the same needs to be done for snapshot volumes.
  Change-Id: Iea9594632e52d069f167cc8a840c78d0f7109947
  Signed-off-by: Vijaikumar M <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/7307
  Reviewed-by: Sachin Pandit <spandit@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd: do cleanup of snapshots in post-validate phase if half baked objects are there | Raghavendra Bhat | 2014-03-24 | 1 | -0/+1
  Change-Id: I372cac98ad054cdc1a6fbc7f6c77c25981063b2f
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/7237
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Fix brick status | Vijaikumar M | 2014-03-13 | 1 | -0/+36
  Change-Id: I04ff2ddf5c644dde2051b8a692d287e87ba59942
  Signed-off-by: Vijaikumar M <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/7240
  Reviewed-by: Sachin Pandit <spandit@redhat.com>
  Tested-by: Sachin Pandit <spandit@redhat.com>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Tested-by: Avra Sengupta <asengupt@redhat.com>
* snapshot/restore: Restore failed in multiple node test | Rajesh Joseph | 2014-03-13 | 1 | -15/+18
  Setting the extended attribute failed on bricks on some nodes; the extended attribute should only be set for the local node.
  Change-Id: Ib5b47b15d2f6f93e146d3d75e9061fccc9570a80
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-on: http://review.gluster.org/7213
  Reviewed-by: Sachin Pandit <spandit@redhat.com>
  Tested-by: Sachin Pandit <spandit@redhat.com>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Tested-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/snapshot: Modified restore backend | Rajesh Joseph | 2014-03-10 | 1 | -31/+138
  Instead of creating the volume store files first, the in-memory volinfo is now constructed first and the backend store files are generated from it. This gives a lot of flexibility in the restore operation. This patch also fixes the read-only issue with restored snaps.
  Change-Id: I51032228a5212fc3b90dc6e93f3539af3eb36074
  BUG: 1064688
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-on: http://review.gluster.org/7209
  Reviewed-by: Sachin Pandit <spandit@redhat.com>
  Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
* glusterd/snapshot: Snapshot create and delete changes | Vijaikumar M | 2014-03-06 | 1 | -0/+1
  With the snap-driven approach, while creating a snapshot the snap-name has to be mentioned first, followed by the volumes to be associated with it. While deleting a snapshot, only the snapname has to be mentioned. Corresponding changes have been made in glusterd.
  CLI changes for the same can be found here: http://review.gluster.org/#/c/6947/
  Change-Id: I8bd8f471da5b728165da5f331faad3dde3486823
  Signed-off-by: Vijaikumar M <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/7123
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: store location for snap driven changes | Vijaikumar M | 2014-03-03 | 1 | -511/+103
  Currently snapshot volfiles are stored at:
  <workdir>/vols/<volname>/snaps/<snapvol>
  With the snap-driven approach the volfiles need to be stored at:
  <workdir>/snaps/<snapname>/<snapvol>
  Change-Id: I8efdd5db29833b2b06b64a900cbb4c9b9a5d36b6
  Signed-off-by: Vijaikumar M <vmallika@redhat.com>
  Signed-off-by: Sachin Pandit <spandit@redhat.com>
  Reviewed-on: http://review.gluster.org/7006
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
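  A quick illustration of the layout change described above; the workdir value and the volume/snapshot names below are assumptions for the example, not taken from the commit:

    #include <stdio.h>

    int main(void)
    {
        const char *workdir  = "/var/lib/glusterd";   /* assumed default workdir */
        const char *volname  = "vol1";                /* hypothetical names */
        const char *snapname = "snap1";
        const char *snapvol  = "snap1-vol";
        char path[512];

        /* Old layout: snapshot volfiles kept under the parent volume. */
        snprintf(path, sizeof(path), "%s/vols/%s/snaps/%s",
                 workdir, volname, snapvol);
        printf("old: %s\n", path);

        /* New snap-driven layout: volfiles kept under the snapshot itself. */
        snprintf(path, sizeof(path), "%s/snaps/%s/%s",
                 workdir, snapname, snapvol);
        printf("new: %s\n", path);
        return 0;
    }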
* mgmt/glusterd: check the status of the snap volume while stopping snap bricks | Raghavendra Bhat | 2014-02-13 | 1 | -1/+1
  Change-Id: I60b635f275498b285aa34702ce6ca41bfb7e01a0
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/6995
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Tested-by: Rajesh Joseph <rjoseph@redhat.com>
* mgmt/glusterd : Having a separate list for snapshots. | Sachin Pandit | 2014-01-15 | 1 | -2/+32
  Creating a separate list for the snapshots taken, as cluttering the volume list with snaps does not look neat.
  Change-Id: Ida4a183e95e8694b85ebb5a680d06b7d29a460a0
  BUG: 1040947
  Signed-off-by: Sachin Pandit <spandit@redhat.com>
* Snapshot: Gluster snapshot restore feature | Rajesh Joseph | 2014-01-15 | 1 | -12/+19
  Implemented the gluster snapshot restore feature. The restore is done by replacing the origin volume with the snap volume.
  TODO: After the restore, the snapshot volume should be deleted. As of now the deletion work is pending.
  Change-Id: Ib137fb6bb84a74030607ffa47f89cd705dc7e1ff
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Updating snapshot options during glusterd-handshake. | Avra Sengupta | 2014-01-09 | 1 | -0/+53
  Change-Id: Icdd20825f51a01f4186841639fb645cb56a2fd12
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/snapshot: Defining snap-max-soft-limit as a percentage of snap-max-hard-limit. | Avra Sengupta | 2014-01-07 | 1 | -6/+74
  This patch also prohibits configuration of snap-max-hard-limit and snap-max-soft-limit for snap volumes. The snapshot configs are also displayed by reading data only from the local node, as all config data will be in sync across the cluster.
  Change-Id: I635b925c02ed5b108cd10c7193b154ad82d5afad
  BUG: 1043792
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/snapshot: Introducing snap-max-hard-limit and snap-max-soft-limit | Avra Sengupta | 2014-01-06 | 1 | -15/+19
  Note: Manually adding this patch again, as it got missed in a git reset done on the remote development branch.
  Change-Id: I9e81c5ec003c1e1722d0fcb27dd87c365ee43ff4
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
* glusterd/snapshot: Fixing glusterd_do_quorum_action to start snap bricks for snap volumes | Avra Sengupta | 2013-12-17 | 1 | -7/+29
  Change-Id: I97f59fcf1e78ded35fd15996d587ecd043c7dc17
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/snapshot: Fix for cksum mismatches at snap create. | Avra Sengupta | 2013-12-17 | 1 | -2/+4
  Also fixes peer rejects on glusterd restart.
  Change-Id: I1671416c1f3fd2afea450cc3b4c632de187351ca
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* cli/glusterd: implement the snap and cg delete functionalities | Raghavendra Bhat | 2013-12-12 | 1 | -0/+186
  Change-Id: Icdb66c89acdd043d0d6368c48ce2e01b1a40966f
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: snapshot related cleanup | Raghavendra Bhat | 2013-12-12 | 1 | -2/+12
  Change-Id: I277a70f732666d047ba5dff7a7e6925e0679741b
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* glusterd/snapshot: Fix Displaying Port, Online Status and Pid for snap vols in volume status | Avra Sengupta | 2013-12-10 | 1 | -1/+12
  Added a parent_volname member in the glusterd_volinfo_ structure to help point the snap volume to its parent volname. This is used to fetch the pidfile location during volume status.
  Change-Id: I30a16646561394d0f7d16f66abff14c425f31f06
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* glusterd/snap: Fix for socket connect failures on brick start | Avra Sengupta | 2013-12-10 | 1 | -32/+33
  Change-Id: I9f53bd4d83794c69c54e4a03f59425a1ca6a4ac3
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* mgmt/glusterd: create pid file for the snap bricks in proper path | Raghavendra Bhat | 2013-12-03 | 1 | -1/+1
  Change-Id: I98fcd25e80b1b39e0292eae059e2d624ca9094fb
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: handle issues present in snapshot create and snap management | Raghavendra Bhat | 2013-12-02 | 1 | -7/+55
  Change-Id: I94b5f6e00be7d1ff0c454e291c779dae7b423748
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: create the pid file for the brick process of the snap volume | Raghavendra Bhat | 2013-11-29 | 1 | -2/+26
  Change-Id: I8fafb19c2c13caac2a509c36f99d2dd782865b13
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* glusterd/Jarvis: Added aggr rsp dict in mgmt framework | Avra Sengupta | 2013-11-15 | 1 | -0/+82
  Also fixes the snapshot config output.
  Change-Id: Ia50d94492009cf73dbb99ba20117b9fa4c41048a
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
* mgmt/glusterd: snapshot config changes | shishir gowda | 2013-11-15 | 1 | -0/+11
  Also refactored code in glusterd for the create command. Additionally, removed brick-op func from mgmt_iniate_all_phases.
  Change-Id: Iddcc332009c5716adee7f2b04c93b352fb983446
  Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: handle snap volume store and actual volume store separately | Raghavendra Bhat | 2013-11-15 | 1 | -6/+17
  Change-Id: I8b88fe94d0f9ee1089cafdda037abcf2f7a180ca
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: changes to start the brick process of the snap volume | Raghavendra Bhat | 2013-11-15 | 1 | -2/+6
  Change-Id: I54db2fa67ebb6b57629f9536c296fbae07a1d159
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: snapshot create command | Raghavendra Bhat | 2013-11-15 | 1 | -18/+272
  This is still a work in progress. As of now, these things are done:
  * Take the snapshot of the backend brick
  * Create the new volume for the snapshot
  * Create the brick and the client volfiles
  * Store the snapshot related info in /var/lib/glusterd
  * Create the snap object representing the snapshot
  TODO: Start the brick processes for the snapshot
  Change-Id: I26fbb0f8e5cf004d4c1dbca51819bab1cd1bac15
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* mgmt/glusterd: Store for snapshot | shishir gowda | 2013-11-15 | 1 | -1/+1
  Introduced a new store for storing the snapshot list for a given volume:
  $GLUSTERD_INSTALL_PATH/vols/<volname>/snap_list.info
  $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/
  $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/info <- snapshot volume info
  $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/bricks <- snapshot volume brick dir
  $GLUSTERD_INSTALL_PATH/vols/<volname>/snaps/<snap-name>/bricks/<infos> <- snapshot volume brick info files
  store delete options
  TODO -
  $GLUSTERD_INSTALL_PATH/CG/ <- place holder for all cg's
  .../CG/<cg-name>/info <- per cg information placeholder
  Change-Id: I1f9fd8ff7cc0682d05b33965736a43dca6adb3e9
  Signed-off-by: shishir gowda <sgowda@redhat.com>
* glusterd/utils: Get brick mount's device name | shishir gowda | 2013-11-15 | 1 | -0/+41
  Change-Id: I03ff9e8094e7e36b28b521380949c7e9044c2e4e
  Signed-off-by: shishir gowda <sgowda@redhat.com>
* mgmt/glusterd: Introduce snapshot infrastructure | shishir gowda | 2013-11-15 | 1 | -8/+26
  APIs for creating, adding, finding, and removing snapshots and consistency groups are provided.
  Change-Id: Ic28da69a075b062aefdf14754c68259ca58bd427
  Signed-off-by: shishir gowda <sgowda@redhat.com>
* gNFS: RFE for NFS connection behavior | Santosh Kumar Pradhan | 2013-11-14 | 1 | -0/+92
  Implement reconfigure() for the NFS xlator so that volume set/reset won't restart the NFS server process. A few options cannot be reconfigured dynamically, e.g. nfs.mem-factor and nfs.port, which need NFS to be restarted.
  Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c
  BUG: 1027409
  Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
  Reviewed-on: http://review.gluster.org/6236
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* bd: posix/multi-brick support to BD xlator | M. Mohan Kumar | 2013-11-13 | 1 | -0/+81
  The current BD xlator (block backend) has a few limitations:
  * Creation of directories not supported
  * Supports only a single brick
  * Does not use extended attributes (and client gfid) like the posix xlator
  * Creation of special files (symbolic links, device nodes etc.) not supported
  The basic limitation of not allowing directory creation blocks oVirt/VDSM from consuming the BD xlator as part of a Gluster domain, since VDSM creates multi-level directories when GlusterFS is used as the storage backend for VM images.
  To overcome these limitations a new BD xlator with the following improvements is suggested:
  * A new hybrid BD xlator that handles both regular files and block device files
  * The volume will have both POSIX and BD bricks. Regular files are created on POSIX bricks, block devices are created on the BD brick (VG)
  * The BD xlator leverages the existing POSIX xlator for most POSIX calls and hence sits above the POSIX xlator
  * A block device file is differentiated from a regular file by an extended attribute
  * The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a posix file to a Logical Volume (LV)
  * When a client sends a request to set BD_XATTR on a posix file, a new LV is created and mapped to the posix file. So every block device will have a representative file in the POSIX brick with 'user.glusterfs.bd' (BD_XATTR) set
  * Thereafter all operations on this file result in LV-related operations. For example, opening a file that has BD_XATTR set results in opening the LV block device, and reading results in reading the corresponding LV block device
  When the BD xlator gets a request to set BD_XATTR via a setxattr call, it creates an LV and places information about this LV in the xattr of the posix file. The xattr "user.glusterfs.bd" is used to identify that the posix file is mapped to a BD.
  Usage:
  Server side:
  [root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2
  It creates a distributed gluster volume 'bdvol' with Volume Group vg1 using posix brick /storage/vg1_info in host1 and Volume Group vg2 using /storage/vg2_info in host2.
  [root@host1 ~]# gluster volume start bdvol
  Client side:
  [root@node ~]# mount -t glusterfs host1:/bdvol /media
  [root@node ~]# touch /media/posix
  It creates a regular posix file 'posix' in either the host1:/vg1 or host2:/vg2 brick.
  [root@node ~]# mkdir /media/image
  [root@node ~]# touch /media/image/lv1
  It also creates a regular posix file 'lv1' in either the host1:/vg1 or host2:/vg2 brick.
  [root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1
  The above setxattr results in creating a new LV in the corresponding brick's VG and sets 'user.glusterfs.bd' with the value 'lv:<default-extent-size'.
  [root@node ~]# truncate -s5G /media/image/lv1
  It results in resizing LV 'lv1' to 5G.
  The new BD xlator code is placed in the xlators/storage/bd directory. A volume-uuid is also added to the VG so that the same VG can't be used for other bricks/volumes. After deleting a gluster volume, one has to manually remove the associated tag using vgchange <vg-name> --deltag <trusted.glusterfs.volume-id:<volume-id>>
  Changes from previous version V5:
  * Removed support for delayed deleting of LVs
  Changes from previous version V4:
  * Consolidated the patches
  * Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR
  Changes from previous version V3:
  * Added support in FUSE to support full/linked clone
  * Added support to merge snapshots and provide information about origin
  * bd_map xlator removed
  * iatt structure used in inode_ctx. iatt is cached and updated during fsync/flush
  * aio support
  * Type and capabilities of volume are exported through getxattr
  Changes from version 2:
  * Used inode_context for caching BD size and to check if loc/fd is BD or not
  * Added GlusterFS server offloaded copy and snapshot through setfattr FOP. As part of this libgfapi is modified
  * BD xlator supports stripe
  * During unlinking, if an LV file is already opened, it is added to a delete list and bd_del_thread tries to delete from this list when the last reference to that file is closed
  Changes from previous version:
  * gfid is used as the name of the LV
  * ? is used to specify the VG name for creating a BD volume in volume create and add-brick: gluster volume create volname host:/path?vg
  * open-behind issue is fixed
  * A replicate brick can be added dynamically and LVs from the source brick are replicated to the destination brick
  * A distribute brick can be added dynamically and the rebalance operation distributes existing LVs/files to the new brick
  * Thin provisioning support added
  * bd_map xlator support retained
  * setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and setfattr -n user.glusterfs.bd -v "thin" creates a thin LV
  * Capability and backend information added to gluster volume info (and --xml) so that management tools can exploit the BD xlator
  * tracing support for the bd xlator added
  TODO:
  * Add support to display snapshots for a given LV
  * Display posix filename for list-origin instead of gfid
  Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2
  BUG: 1028672
  Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
  Reviewed-on: http://review.gluster.org/4809
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
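  An illustrative client-side sketch of the setfattr step from the usage example above, expressed through the xattr syscalls. This is not BD xlator code; only the path, the xattr name and the "lv"/"lv:<default-extent-size" value format come from the commit message.

    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    #define BD_XATTR "user.glusterfs.bd"

    int main(void)
    {
        const char *path = "/media/image/lv1";   /* file on the glusterfs mount */
        char value[64] = {0};

        /* Equivalent of: setfattr -n user.glusterfs.bd -v "lv" /media/image/lv1
         * On a BD-enabled volume this triggers LV creation in the brick's VG. */
        if (setxattr(path, BD_XATTR, "lv", strlen("lv"), 0) != 0) {
            perror("setxattr");
            return 1;
        }

        /* Read the xattr back; it now marks the file as LV-backed. */
        if (getxattr(path, BD_XATTR, value, sizeof(value) - 1) > 0)
            printf("%s = %s\n", BD_XATTR, value);
        return 0;
    }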
* mgmt/glusterd: Relax extended attribute checks for volume create and add brick force. | Vijay Bellur | 2013-10-17 | 1 | -4/+9
  The expectation with force is that the user is aware of the consequences of sanity checks not being triggered.
  Change-Id: I79dfeed16a23829a7217cef33ab83f9f0ffae336
  Signed-off-by: Vijay Bellur <vbellur@redhat.com>
  BUG: 1007509
  Reviewed-on: http://review.gluster.org/5746
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: [Feature] Command implementation to get heal-count | Venkatesh Somyajulu | 2013-10-14 | 1 | -0/+1
  Currently, to know the number of files to be healed, the user either has to go to the backend and check the number of entries present in the indices/xattrop directory, or run "gluster volume heal vol-name info". If a volume consists of a large number of bricks, going to each backend and counting the entries is a time-taking task, and if the number of entries in the indices/xattrop directory is very large, the info command also consumes time. So a new command is implemented as a feature.
  Command 1: gluster volume heal vn statistics heal-count
  This command gets the number of entries present in every brick of a volume. The output displays only the entry count.
  Command 2: gluster volume heal vn statistics heal-count replica 192.168.122.1:/home/user/brickname
  If only one replica is of interest, providing any one brick of that replica gets the number of entries to be healed for that replica only.
  Example: Replicate volume with replica count 2.
  Backend status:
  [root@dhcp-0-17 xattrop]# ls -lia | wc -l
  1918
  NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy entries, so the actual number of entries to be healed is 1916.
  [root@dhcp-0-17 xattrop]# pwd
  /home/user/2ty/.glusterfs/indices/xattrop
  Command output:
  Gathering count of entries to be healed on volume volume3 has been successful
  Brick 192.168.122.1:/home/user/22iu
  Status: Brick is Not connected
  Entries count is not available
  Brick 192.168.122.1:/home/user/2ty
  Number of entries: 1916
  Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290
  BUG: 1015990
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/6044
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
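  A minimal sketch of the backend count shown above (listing indices/xattrop and skipping the dummy entries). The assumption that the dummy entries' names begin with "xattrop-" is taken from the commit's "<xattrop-gfid>" note, not from the afr source.

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *dir = "/home/user/2ty/.glusterfs/indices/xattrop";
        DIR *dp = opendir(dir);
        struct dirent *entry;
        unsigned long count = 0;

        if (!dp) {
            perror("opendir");
            return 1;
        }
        while ((entry = readdir(dp)) != NULL) {
            if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
                continue;
            if (!strncmp(entry->d_name, "xattrop-", strlen("xattrop-")))
                continue;   /* base/dummy entry, not a pending-heal marker */
            count++;
        }
        closedir(dp);
        printf("Number of entries: %lu\n", count);
        return 0;
    }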
* cluster/afr : Implementation of command "gluster volume heal vn statistics" | Venkatesh Somyajulu | 2013-10-14 | 1 | -1/+84
  The "gluster volume heal volumename statistics" command gives a summary of the afr crawls done, based on the entries present in the xattrop directory. Whenever afr crawls are attempted, the beginning time of the crawl, the end time, the number of files healed, the heal-failed count and the number of files in split-brain are shown along with the type of the crawl. If a crawl is already in progress, the counts from the beginning of the crawl are given and, instead of the end time, a "CRAWL IN PROGRESS" message is shown.
  Output format:
  command: "gluster volume heal volume-name statistics"
  Output:
  Gathering afr crawl statistics crawl statistics on volume volume-name has been successful
  ------------------------------------------------
  Crawl statistics for brick no 0
  Hostname of brick 192.168.122.248
  Starting time of crawl: Wed Jul 10 15:52:38 2013
  Ending time of crawl: Wed Jul 10 15:52:38 2013
  Type of crawl: INDEX
  No. of entries healed: 0
  No. of entries in split-brain: 0
  No. of heal failed entries: 0
  Starting time of crawl: Wed Jul 10 15:52:38 2013
  Ending time of crawl: Wed Jul 10 15:52:38 2013
  Type of crawl: INDEX
  No. of entries healed: 0
  No. of entries in split-brain: 0
  No. of heal failed entries: 0
  ------------------------------------------------
  Crawl statistics for brick no 1
  Hostname of brick 192.168.122.1
  Starting time of crawl: Wed Jul 10 15:52:42 2013
  Ending time of crawl: Wed Jul 10 15:52:42 2013
  Type of crawl: INDEX
  No. of entries healed: 0
  No. of entries in split-brain: 0
  No. of heal failed entries: 0
  Starting time of crawl: Wed Jul 10 15:52:42 2013
  Ending time of crawl: Wed Jul 10 15:52:42 2013
  Type of crawl: INDEX
  No. of entries healed: 0
  No. of entries in split-brain: 0
  No. of heal failed entries: 0
  --------------------------------------------------
  Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9
  BUG: 949400
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/4790
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cli,glusterd: Implement 'volume status tasks' | Krutika Dhananjay | 2013-10-08 | 1 | -0/+29
  oVirt's Gluster Integration needs an inexpensive command that can be executed every 10 seconds to monitor async tasks and their parameters, for all volumes.
  The solution involves adding a 'tasks' sub-command to 'volume status' to fetch only the async task IDs, type and other relevant parameters. Only the originator glusterd participates in this command as all the information needed is available on all the nodes. This is to make the command suitable for being executed every 10 seconds.
  Change-Id: I1edc607baf29b001a5585079dec681d7c641b3d1
  BUG: 1012346
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/6006
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: Calculate volume op-versions only on set/reset | Kaushal M | 2013-09-29 | 1 | -2/+47
  The volume op-versions are calculated during a volume set/reset, when reading a volume from disk, and when importing a volume during probe or volume sync. The calculation of the volume op-version depends on the cluster's op-version, as some features are enabled automatically depending on the cluster op-version. The volume op-versions are also not stored persistently and not exported during sync. Due to this, inconsistencies can occur between volumes on different peers.
  One such case: consider a cluster of 3 peers P1, P2 and P3, operating at op-version N. The cluster has two volumes V1 and V2 with volume op-version N (a volume op-version cannot be greater than the cluster op-version). We have:
  Cluster op-version = N
  V1 op-version = N
  V2 op-version = N
  A set operation on V1 causes the cluster op-version to be bumped up to N+1. Assume that some features are automatically enabled at op-version N+1. The op-version of V2 remains at N as no operation has been performed on it. So:
  Cluster op-version = N+1
  V1 op-version = N+1
  V2 op-version = N
  Now a new peer P4 is probed. On the new peer the op-versions will be:
  Cluster op-version = N+1
  V1 op-version = N+1
  V2 op-version = N+1
  This happens because volume op-versions are not sent during the sync after probe; P4 freshly calculates the op-version of V2 as N+1 (assuming features have been auto-enabled due to the cluster op-version being N+1).
  Another case is a glusterd restart on a peer. If P3 is restarted, glusterd recalculates the volume op-versions during the restore state, and the op-version of V2 will again be calculated as N+1, assuming auto-enabled features. This leads to an inconsistency between the in-memory and on-disk representations of the volume, as glusterd assumes the volume contains auto-enabled features but the volfiles don't contain them, since they were not regenerated.
  These kinds of issues are solved by calculating the volume op-version only when features are enabled or disabled (i.e. during volume set/reset), persisting the volume op-versions, and exporting/importing them.
  Change-Id: I52de0668c92628622e85f4588fb28829a7231132
  BUG: 1005043
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/5568
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
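  A sketch of the general idea of deriving a volume op-version from the options enabled on it. The option table and op-version numbers below are hypothetical; glusterd's actual calculation is driven by its own volume-option tables, so treat this only as an illustration of why auto-enabled features change the result.

    #include <stdio.h>

    struct vol_option {
        const char *key;
        int         op_version;   /* minimum op-version the option needs */
        int         enabled;      /* set on this volume? */
    };

    /* Volume op-version = highest op-version required by any enabled option,
     * never below the baseline of 1. */
    static int calc_volume_op_version(const struct vol_option *opts, int n)
    {
        int op_version = 1;

        for (int i = 0; i < n; i++)
            if (opts[i].enabled && opts[i].op_version > op_version)
                op_version = opts[i].op_version;
        return op_version;
    }

    int main(void)
    {
        struct vol_option opts[] = {
            { "performance.open-behind",  2, 1 },   /* hypothetical values */
            { "cluster.some-new-feature", 3, 0 },
        };

        printf("volume op-version = %d\n", calc_volume_op_version(opts, 2));
        return 0;
    }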
* gNFS: Incorrect NFS ACL encoding for XFS | Santosh Kumar Pradhan | 2013-09-29 | 1 | -3/+1
  Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on the XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds), which causes setfacl()->setxattr() to fail silently. The NFS client adds NFS_ACL_DEFAULT (0x1000) for default ACLs.
  FIX: Mask the prefix (added by the NFS client) OFF, so the setfacl is not rejected when it hits the FS.
  Original patch by: "Richard Wareing"
  Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396
  BUG: 1009210
  Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
  Reviewed-on: http://review.gluster.org/5980
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
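  A small sketch of the masking described in the FIX above. The ACL entry struct is hypothetical; only the NFS_ACL_DEFAULT (0x1000) marker comes from the commit message.

    #include <stdio.h>
    #include <stdint.h>

    #define NFS_ACL_DEFAULT 0x1000   /* marker the Linux NFS client adds */

    struct acl_entry {
        uint16_t type;   /* ACL tag type, e.g. user/group/other */
        uint16_t perm;
        uint32_t id;
    };

    /* Strip the default-ACL marker before handing the type to the local FS,
     * so XFS does not reject the resulting setxattr. */
    static void strip_nfs_acl_default(struct acl_entry *entries, int count)
    {
        for (int i = 0; i < count; i++)
            entries[i].type &= ~NFS_ACL_DEFAULT;
    }

    int main(void)
    {
        struct acl_entry e = { .type = NFS_ACL_DEFAULT | 0x4, .perm = 7, .id = 0 };

        strip_nfs_acl_default(&e, 1);
        printf("type after masking: 0x%x\n", e.type);   /* prints 0x4 */
        return 0;
    }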
* glusterd: Saving geo-rep session details in a more specific path | Venky Shankar | 2013-09-04 | 1 | -0/+12
  Session details are now saved in the /var/lib/glusterd/geo-replication/<mastervol>_<slaveip>_<slavevol> repo, to distinguish between two master-slave sessions where the slave name is the same across two different clusters.
  Change-Id: I57c93f55cc9bd4fe2bffe579028aaf5e4335b223
  BUG: 991501
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/5488
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: Regenerate client volfiles during upgrade | shishir gowda | 2013-09-03 | 1 | -6/+27
  Change-Id: I1442bc1d115a9c6ecf139a0ca9da74d07e0fe928
  BUG: 1003855
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/5764
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Use volume op-versions during volgen | Kaushal M | 2013-08-02 | 1 | -2/+2
  Instead of using the cluster op-version, the volume op-version is used to enable open-behind during volgen. For doing this, the volume op-versions are updated before regenerating the volfiles.
  Change-Id: I675bb549bf7c7c0279030dca698fb530781addc6
  BUG: 990830
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/5385
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Re-initialize skipped file count in glusterd | shishir gowda | 2013-07-31 | 1 | -0/+1
  Change-Id: I42d08b3a6a7a3839f5e9953e1f83959222c080f8
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  BUG: 989846
  Reviewed-on: http://review.gluster.org/5446
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>