path: root/xlators/mgmt/glusterd/src/glusterd-utils.c
* glusterd: op-version check for brickops. (Ravishankar N, 2014-03-24; 1 file, -0/+21)
  The cluster op-version must be at least 4 for add-brick/remove-brick to proceed. This change is required for the new AFR changelog xattr changes that will be done for glusterFS 3.6 (http://review.gluster.org/#/c/7155/). In add-brick, the check is done only when the replica count is increased, because only that affects the AFR xattrs. In remove-brick, the check is unconditional; without it there would be inconsistencies in the client xlator names across the volfiles of different peers. Change-Id: If981da2f33899aed585ab70bb11c09a093c9d8e6 BUG: 1066778 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7122 Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
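  A minimal sketch of such a gate, assuming an illustrative helper name and error-string handling (the real staging code in glusterd is more involved):

      #include <stdio.h>

      #define GD_OP_VERSION_MIN_BRICKOPS 4   /* minimum cluster op-version */

      /* Reject a replica-changing add-brick, or any remove-brick, when the
       * cluster op-version is below 4. */
      static int
      check_brickop_allowed(int cluster_op_version, char *errbuf, size_t errlen)
      {
          if (cluster_op_version < GD_OP_VERSION_MIN_BRICKOPS) {
              snprintf(errbuf, errlen,
                       "cluster op-version must be at least %d for this brick "
                       "operation", GD_OP_VERSION_MIN_BRICKOPS);
              return -1;
          }
          return 0;
      }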
* glusterd: persistent client xlator/afr changelog names (Ravishankar N, 2014-03-24; 1 file, -0/+46)
| | | | | | | | | | | | | | | | | | | | | | | | | -Add a unique brick-id field to glusterd_brickinfo_t -Persist the id to the brickinfo file -Use the brick-id as the client xlator name during vol create, add-brick and replace-brick operations. -For older volumes,generate the id in-memory during glusterd restore but defer writing it to the brickinfo file until the next volume set operation. -send and receive the brick-ids during peer probe. Feature page: www.gluster.org/community/documentation/index.php/Features/persistent-AFR-changelog-xattributes Related patch: http://review.gluster.org/#/c/7122 Change-Id: Ib7f1570004e33f4144476410eec2b84df4e41448 BUG: 1066778 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7155 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: send/receive volinfo->caps during peer probe. (Ravishankar N, 2014-03-08; 1 file, -0/+12)
| | | | | | | | | | | | | | | Problem: volinfo->caps was not sent over to newly probed peers, resulting in a 'Peer Rejected' state due to volinfo checksum mismatch. Fix: send/receive volinfo capability when peer probing. Change-Id: I2508d3fc7a6e4aeac9c22dd7fb2d3b362f4c21ff BUG: 1072720 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/7186 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fixed typo in console message during volume create (Satheesaran, 2014-03-08; 1 file, -1/+1)
  While creating a volume, if the brick is created on the root partition, an error message is printed. This error message contained two occurrences of "is"; one of them has been removed. Change-Id: I0d83f0feccda34989f7e2b97041d1f15ec9e2f00 BUG: 1065551 Signed-off-by: Satheesaran <satheesaran@gmail.com> Reviewed-on: http://review.gluster.org/7198 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Volume locks and transaction specific opinfos (Avra Sengupta, 2014-02-10; 1 file, -8/+55)
  With this patch we replace the existing cluster-wide lock taken on glusterds across the cluster with volume locks, which are also taken on glusterds across the cluster but are volume specific. With volume locks we can perform more than one gluster operation at the same time, as long as the operations are being performed on different volumes. We maintain a global list of volume locks (using a dict for a list) where the key is the volume name and the value is the uuid of the originator glusterd. These locks are held and released per volume transaction. In order to achieve multiple gluster operations occurring at the same time, we also separate opinfos in the op-state-machine as part of this patch. To do so, we generate a unique transaction-id (uuid) per gluster transaction. An opinfo is then associated with this transaction id, which is used throughout the transaction. We maintain a run-time global list (using a dict) of transaction-ids and their respective opinfos to achieve this. Upstream Feature Page: http://www.gluster.org/community/documentation/index.php/Features/glusterd-volume-locks Change-Id: Iaad505a854bac8de8f83beec0357eb6cde3f7ea8 BUG: 1011470 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5994 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
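  Conceptually, each volume lock is an entry in a global table keyed by volume name that records the uuid of the originator glusterd; a self-contained sketch with illustrative types (the real code keeps these entries in a dict and grants them over RPC):

      #include <string.h>
      #include <uuid/uuid.h>

      struct vol_lock { char volname[256]; uuid_t owner; int held; };
      static struct vol_lock locks[64];   /* hypothetical fixed-size table */

      /* Grant the per-volume lock to 'owner' unless it is already held. */
      static int
      volume_lock(const char *volname, const uuid_t owner)
      {
          int free_slot = -1;
          for (int i = 0; i < 64; i++) {
              if (locks[i].held && strcmp(locks[i].volname, volname) == 0)
                  return -1;               /* volume already locked */
              if (!locks[i].held && free_slot < 0)
                  free_slot = i;
          }
          if (free_slot < 0)
              return -1;
          strncpy(locks[free_slot].volname, volname,
                  sizeof(locks[free_slot].volname) - 1);
          uuid_copy(locks[free_slot].owner, owner);
          locks[free_slot].held = 1;
          return 0;
      }

      /* Release the lock, but only if 'owner' is the holder. */
      static void
      volume_unlock(const char *volname, const uuid_t owner)
      {
          for (int i = 0; i < 64; i++)
              if (locks[i].held && strcmp(locks[i].volname, volname) == 0 &&
                  uuid_compare(locks[i].owner, owner) == 0)
                  locks[i].held = 0;
      }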
* glusterd: add volinfo to the list data structure in an order (Vijaykumar M, 2014-02-08; 1 file, -1/+12)
| | | | | | | | | | | | | | | | Currently volinfo is added at the end of the list while creating a volume. On gluster restart, readdir will not provide the ordered list and the data is populated in the same order as readdir. Solution is to insert the volinfo to the list in an order Change-Id: I1716ac6abbd7dd301a7125425fc413c6833f7a48 BUG: 1039912 Signed-off-by: Vijaykumar M <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/6472 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: Fix race in pid file update (Ravishankar N, 2014-02-03; 1 file, -12/+0)
  This patch only removes lines of code. For personal gratification, a detailed explanation of what the problem was: when glusterd spawns the local brick process, say when a reboot of the node occurs, glusterd_brick_start() and subsequently glusterd_volume_start_glusterfs() get called twice, from glusterd_spawn_daemons() and glusterd_do_volume_quorum_action() respectively. This causes a race, best described by pseudo-code of the current behaviour:

      glusterd_volume_start_glusterfs() {
          if (!brick process running) {
              step-a) reap pid file (i.e. unlink it)
              step-b) fork a brick process which creates and locks the pid file
                      and binds the process to a socket.
          }
      }

      Time  Event
      ----  -----
      T1    Call-1 arrives, completes step-a, starts step-b.
      T2    Call-2 arrives, enters step-a as Call-1's forked child is not yet running.
      T3    Call-1's forked child is alive, creates the pidfile and locks it, binds its address to a socket.
      T4    Call-2 performs step-a, i.e. unlinks the pid file created by Call-1 (files can still be unlinked despite a lockf on them).
      T5    Call-2 does step-b, and the forked child process creates a *new* pid file with its pid and locks this file.
      T6    Call-2's brick process is not able to bind to the socket as it is already in use (courtesy T3) and hence exits (so no locks remain on the pidfile).

  Result: the pid file now contains the PID of an extinct brick process. `gluster volume status` shows this PID value; it also notices that no lock is held on the pid file by the currently running brick process (created by Call-1) and hence shows N/A for the online status. Also, as a result of the events at T4, "ls -l /proc/<brick process PID>/fd/pidfile" shows up as deleted.
  Fix: 1. Do not unlink the pid file, i.e. avoid step-a. Now at T5, Call-2's brick process cannot obtain the lock on the pid file (Call-1's process still holds it) and exits. 2. Unrelated, but remove the lock-unlock sequence in glusterfs_pidfile_setup(), which does not seem to be doing anything useful.
  Change-Id: I18d3e577bb56f68d85d3e0d0bfd268e50ed4f038 BUG: 1035586 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/6786 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
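  The essence of the fix, in outline: never unlink the pid file, and let the lockf() held by the live brick reject the second start attempt. A simplified sketch, not the actual glusterd code:

      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>

      /* Returns 0 if we obtained the pidfile lock (no brick is running),
       * -1 if a live brick process already holds it. */
      static int
      take_pidfile(const char *path, pid_t pid)
      {
          int fd = open(path, O_RDWR | O_CREAT, 0644);  /* never unlink it */
          if (fd < 0)
              return -1;
          if (lockf(fd, F_TLOCK, 0) < 0) {              /* held by live brick */
              close(fd);
              return -1;
          }
          ftruncate(fd, 0);
          dprintf(fd, "%d\n", (int)pid);                /* lock stays with fd */
          return 0;
      }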
* mgmt/glusterd: make sure quota enforcer has established connection with quotad before marking quota as enabled. (Raghavendra G, 2014-01-25; 1 file, -5/+14)
  Without this patch there is a window of time when quota is marked as enabled in the quota-enforcer, but the connection to quotad would not yet have been established. Any checklimit done during this period can result in a failed fop because quotad is unavailable. Change-Id: I0d509fabc434dd55ce9ec59157123524197fcc80 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6572 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Free op_errstr before it goes out of scope. (Ira Cooper, 2014-01-25; 1 file, -2/+2)
| | | | | | | | | | | | Also removed the NULL check on path_list, as it shouldn't be needed. Change-Id: I5b655f7b383f301afa8fc1c1c09b31e2afe47f0f CID: 1138527 BUG: 789278 Signed-off-by: Ira Cooper <ira@samba.org> Reviewed-on: http://review.gluster.org/6784 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt: Fix memory leak of brickid from gf_asprintf. (Ira Cooper, 2014-01-24; 1 file, -1/+4)
| | | | | | | | | | | | No GF_FREE, so one was put into the return path with proper checking. Change-Id: Idde2803608409dcbf216062f83b7f4493946ba70 CID: 1124718 BUG: 789278 Signed-off-by: Ira Cooper <ira@samba.org> Reviewed-on: http://review.gluster.org/6755 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Relocate rebalance sockfile (Kaushal M, 2014-01-10; 1 file, -1/+1)
  The defrag sockfile was moved from priv->workdir to DEFAULT_VAR_RUN_DIRECTORY. The new path of the defrag sockfile is 'DEFAULT_VAR_RUN_DIRECTORY/gluster-rebalance-<vol-id>.sock'. This was needed because the earlier location did not have a fixed length and could exceed UNIX_PATH_MAX characters, which could cause the rebalance process to fail to start because the socket file could not be created. Also, to keep backward compatibility, glusterd_rebalance_rpc_create will try both the new and the old sockfile locations when attempting reconnection. Change-Id: I6740ea665de84ebce1ef7199c412f426de54e3d0 BUG: 1049726 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6616 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
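  The new path is derived from the volume id, so its length is fixed and can be checked against the sun_path limit up front; a sketch of the construction (helper name illustrative, "/var/run" used as a stand-in for DEFAULT_VAR_RUN_DIRECTORY):

      #include <stdio.h>
      #include <sys/un.h>
      #include <uuid/uuid.h>

      /* Build "<run-dir>/gluster-rebalance-<vol-id>.sock" and make sure it
       * fits in sockaddr_un.sun_path (UNIX_PATH_MAX). */
      static int
      build_rebalance_sockpath(char *buf, size_t len, const uuid_t volume_id)
      {
          char id[37];
          uuid_unparse(volume_id, id);
          int n = snprintf(buf, len, "%s/gluster-rebalance-%s.sock",
                           "/var/run", id);
          if (n < 0 || (size_t)n >= len ||
              (size_t)n >= sizeof(((struct sockaddr_un *)0)->sun_path))
              return -1;   /* would exceed the socket path limit */
          return 0;
      }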
* glusterd: update volinfo->subvol_count in newly added peers (Ravishankar N, 2014-01-03; 1 file, -1/+2)
| | | | | | | | | | | | Update the subvol_count when a peer imports information about the friend volumes. Change-Id: Id3884bd5727ff22be7ed87f43a1ec1b5fe34813c BUG: 1047955 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/6629 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: make volinfo a refcnt'ed object. (Krishnan Parthasarathi, 2013-12-20; 1 file, -3/+44)
| | | | | | | | | | | | | Add glusterd_volinfo_remove(..) which removes @volinfo from the list of volumes in the cluster and performs an unref on @volinfo Change-Id: I5f546ca58f61bc334ab1bab4c51c4a21e1f66161 BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6521 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: rebalance to ref volinfo before starting (Krishnan Parthasarathi, 2013-12-20; 1 file, -5/+7)
| | | | | | | | | Change-Id: Ib316897dcbd0748bfb3bfcda186b9fe30c07f80f BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6522 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: ignore failure to stop a stopped service. (Krishnan Parthasarathi, 2013-12-19; 1 file, -0/+12)
  kill(2) returns -1 with errno set to ESRCH when the pid of the process being killed doesn't exist. Failing glusterd_brick_stop on a stopped brick could result in volume-stop failing in the commit phase. This fix prevents that from happening. Change-Id: I00f46fa06e489a671efbb8e4119f545f8ccea329 BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6525 Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
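  The change amounts to treating ESRCH from kill(2) as success when stopping a brick; in outline:

      #include <errno.h>
      #include <signal.h>
      #include <sys/types.h>

      /* Stopping an already-stopped brick must not fail the volume-stop
       * commit: ESRCH simply means "no such process". */
      static int
      stop_brick_process(pid_t pid)
      {
          if (kill(pid, SIGTERM) == -1) {
              if (errno == ESRCH)
                  return 0;    /* already gone; treat as stopped */
              return -1;
          }
          return 0;
      }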
* glusterd: Fix stopping of stale rebalance processes (Kaushal M, 2013-12-18; 1 file, -102/+12)
| | | | | | | | | | | | | | | | | | | | | | | Trying to stop rebalance process via RPC using the GD_SYNCOP macro, could lead to glusterd crashing. In case of an implicit volume update, which happens when a peer comes back up, the stop function would be called in the epoll thread. This would lead to glusterd crashing as the epoll thread doesn't have synctasks for the GD_SYNCOP macro to make use of. Instead of using the RPC method, we now terminate the rebalance process by kill(). The rebalance process has been designed to be resistant to interruption, so this will not lead to any data corruption. Also, when checking for stale rebalance task, make sure that the old task-id is not null. Change-Id: I54dd93803954ee55316cc58b5877f38d1ebc40b9 BUG: 1044327 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6531 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rpc,glusterd: Use rpc_clnt notifyfn to cleanup mydata (Kaushal M, 2013-12-16; 1 file, -5/+19)
  rpc: - On a RPC_TRANSPORT_CLEANUP event, rpc_clnt_notify calls the registered notifyfn with a RPC_CLNT_DESTROY event. The notifyfn should properly clean up the saved mydata on this event. - Break the reconnect chain when an rpc client is disabled. This prevents new disconnect events which can lead to crashes. glusterd: - Added support for RPC_CLNT_DESTROY in glusterd_brick_rpc_notify. - Use a common glusterd_rpc_clnt_unref() function throughout glusterd in place of rpc_clnt_unref(). This function correctly gives up the big-lock before performing the unref. Change-Id: I93230441c5089039643fc9f5632477ef1b695348 BUG: 962619 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5512 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Save/restore/sync rebalance dict (Kaushal M, 2013-12-16; 1 file, -26/+99)
  A dictionary was added to store additional information about a rebalance process, such as the bricks being removed in the case of a rebalance started by remove-brick. This dictionary wasn't being stored/restored or synced during volume sync, leading to errors such as a volume status command failing. These issues have been fixed in this patch: the rebalance dict is now stored/restored and also exported/imported during volume sync. This also makes sure that the rebalance dict is only created on remove-brick start, and adds a bricks-decommissioned status to the information imported/exported during volume sync. Change-Id: I56fed23dc2de80a96648055fe705e9c3ffd55227 BUG: 1040809 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6492 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd/geo-rep: more glusterd and cli fixes for geo-rep. (Ajeet Jha, 2013-12-12; 1 file, -55/+81)
| | | | | | | | | | | | | | | | | | | | | | | | -> handle option validation cases in reset case. -> Creating valid conf path when glusterd restarts. -> Reading the gsyncd worker thread status and displaying it. -> Displaying status-detail per worker. -> Fetch checkpoint info in geo-rep status. -> use-tarssh value validation added. misc: misc geo-rep fixes based on cluster, logrotate etc.. -> cluster/dht: fix 'stime' getxattr getting overwritten. -> cluster/afr: return max of 'stime' values in subvol. -> geo-rep-logrotate: Sending SIGHUP to geo-rep auxiliary. -> cluster/dht: fix convoluted logic while aggregating. -> cluster/*: fix 'stime' min/max fetch logic. Change-Id: I811acea0bbd6194797a3e55d89295d1ea021ac85 BUG: 1036552 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6405 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Improve rebalance handling during volume sync (Kaushal M, 2013-12-10; 1 file, -10/+197)
  Glusterd will now correctly copy existing rebalance information when a volinfo is updated during volume sync. If the existing rebalance information was stale, any existing rebalance process will be terminated. A new rebalance process will be started only if there is no existing rebalance process; it will not be started if the existing rebalance session had completed, failed or been stopped. Change-Id: I68c5984267c188734da76770ba557662d4ea3ee0 BUG: 1036464 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6334 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Aggregate tasks status in 'volume status [tasks]' (Kaushal M, 2013-12-04; 1 file, -7/+209)
  Previously, glusterd used to send back only the local status of a task in a 'volume status [tasks]' command. As the rebalance operation is distributed and asynchronous, different peers could give different status values for a rebalance or remove-brick task. With this patch, all the peers send back the task status as part of the 'volume status' commit op, and the origin peer aggregates these to arrive at a final status for the task. The aggregation is only done for rebalance or remove-brick tasks. The replace-brick task has the same status on all the peers (see the comment in glusterd_volume_status_aggregate_tasks_status() for more information) and need not be aggregated.
  The rebalance process has 5 states:
      NOT_STARTED - rebalance process has not been started on this node
      STARTED     - rebalance process has been started and is still running
      STOPPED     - rebalance process was stopped by a 'rebalance/remove-brick stop' command
      COMPLETED   - rebalance process completed successfully
      FAILED      - rebalance process failed to complete successfully
  The aggregation is done using the following precedence: STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED.
  The new changes make the 'volume status tasks' command a distributed command, as the task status must be fetched from all peers. The following tests were performed: - Start a remove-brick task and run the status command on a peer which doesn't have the brick being removed; the remove-brick status was given correctly as 'in progress' and 'completed', instead of 'not started'. - Start a rebalance task and run the status command; the status moved to 'completed' only after rebalance completed on all nodes. Also, the CLI xml output code for rebalance status was changed to use the same algorithm for status aggregation.
  Change-Id: Ifd4aff705aa51609a612d5a9194acc73e10a82c0 BUG: 1027094 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6230 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
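  The aggregation itself reduces to picking the highest-precedence status reported by any peer; a small sketch (enum values are illustrative, not the on-wire status codes):

      /* Precedence: STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED,
       * encoded here directly in the enum ordering. */
      enum task_status { NOT_STARTED, COMPLETED, STOPPED, FAILED, STARTED };

      static enum task_status
      aggregate_task_status(const enum task_status *peer_status, int npeers)
      {
          enum task_status result = NOT_STARTED;
          for (int i = 0; i < npeers; i++)
              if (peer_status[i] > result)
                  result = peer_status[i];
          return result;
      }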
* glusterd: create rpc obj for rebalance only if absent (Krishnan Parthasarathi, 2013-12-04; 1 file, -1/+1)
| | | | | | | | | Change-Id: Iff305023577ff92a8f43f24dafcf201f86805769 BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6423 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: submit RPC requests without holding big lock (Anand Avati, 2013-12-03; 1 file, -4/+27)
  If the endpoint of an RPC is not connected, the callback is called synchronously within rpc_clnt_submit(). Since callbacks typically hold the big lock, give up the big lock before calling rpc_clnt_submit and acquire it freshly after the call. Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b BUG: 1037849 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6413 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
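  The pattern is simply to drop the big lock around the submit call so a synchronous callback cannot deadlock on it; a self-contained illustration (names are not the real glusterd symbols):

      #include <pthread.h>

      static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

      /* submit_fn may invoke the callback inline when the endpoint is down,
       * and the callback takes big_lock, so give the lock up across the call. */
      static int
      submit_without_biglock(int (*submit_fn)(void *req), void *req)
      {
          int ret;
          pthread_mutex_unlock(&big_lock);   /* give up the big lock */
          ret = submit_fn(req);              /* callback may run here */
          pthread_mutex_lock(&big_lock);     /* re-acquire before returning */
          return ret;
      }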
* cli, glusterd: More quota fixes ... (Krutika Dhananjay, 2013-11-30; 1 file, -2/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... which may be grouped under the following categories: 1. Fix incorrect cli exit status for 'quota list' cmd 2. Print appropriate error message on quota parse errors in cli Authored by: Anuradha Talur <atalur@redhat.com> 3. glusterd: Improve quota validation during stage-op 4. Fix peer probe issues resulting from quota conf checksum mismatches 5. Enhancements to CLI output in the event of quota command failures Authored by: Kaushal Madappa <kmadappa@redhat.com> 7. Move aux mount location from /tmp to /var/run/gluster Authored by: Krishnan Parthasarathi <kparthas@redhat.com> 8. Fix performance issues in quota limit-usage Authored by: Krutika Dhananjay <kdhananj@redhat.com> Note: Some functions that were used in earlier version of quota, that aren't called anymore have been removed. Change-Id: I9d874f839ae5fdcfbe6d4f2d727eac091f27ac57 BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6366 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli/glusterd: Changes to quota command Quota feature (Raghavendra G, 2013-11-26; 1 file, -109/+888)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | re-work. Following are the cli commands that are new/re-worked: ====================================================== volume quota <VOLNAME> {enable|disable|list [<path> ...]|remove <path>| default-soft-limit <percent>} | volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} | volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>} volume status [all | <VOLNAME> [nfs|shd|<BRICK>|quotad]] [detail|clients|mem|inode|fd|callpool] volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history] glusterd changes: ================= * Quota limits are now set as extended attributes by glusterd from the aux mount created by the cli. * The gfids of the directories on which quota limits are set for a given volume are stored in /var/lib/glusterd/vols/<volname>/quota.conf file in binary format, and whose cksum and version is stored in /var/lib/glusterd/vols/<volname>/quota.cksum. Original-author: Krutika Dhananjay <kdhananj@redhat.com> Original-author: Krishnan Parthasarathi <kparthas@redhat.com> BUG: 969461 Change-Id: If32bba36c67f9c2a30417af9c6389045b2b7c13b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6003 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: fix possible memory leaks (Bala.FA, 2013-11-21; 1 file, -1/+3)
| | | | | | | | | BUG: 955548 Change-Id: Iae410712e7e6d7a76cd537c77f1919e3b4cdf6bb Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6328 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gNFS: Coverity fix for CID 1128906 (Santosh Kumar Pradhan, 2013-11-20; 1 file, -11/+13)
| | | | | | | | | | | | | Fix the Coverity issue introduced in RFE: NFS volume set/reset commit i.e. http://review.gluster.org/6236 Change-Id: I817b9da03a3ce7f5511303faea0c50dfdad60ff4 BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6307 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Start rebalance only where required (Kaushal M, 2013-11-20; 1 file, -0/+65)
  Gluster was starting rebalance processes on peers where they weren't required, in two cases: - For a normal rebalance command on a volume, rebalance processes were started on all peers instead of just the peers which contain bricks of the volume. - For rebalance processes being restarted by a volume sync, caused by a new peer being probed or a peer restarting, rebalance processes were started on all peers, both for a normal rebalance and for a remove-brick needing rebalance. This patch adds a new check before starting a rebalance process in the above two cases: - For a rebalance process required by a rebalance command, each peer checks whether it contains at least one brick of the volume. - For a rebalance process required by a remove-brick command, each peer checks whether it contains at least one of the bricks being removed. Change-Id: I512da16994f0d5482889c3a009c46dc20a8a15bb BUG: 1031887 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6301 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
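  Each peer's check boils down to scanning the volume's brick list for a brick it hosts (and, for remove-brick, one that is actually being removed); a sketch with illustrative types:

      #include <uuid/uuid.h>

      struct brick { uuid_t host_uuid; int being_removed; };

      /* Should this node spawn a rebalance process?  Only if it hosts at
       * least one relevant brick of the volume. */
      static int
      should_start_rebalance(const struct brick *bricks, int nbricks,
                             const uuid_t my_uuid, int is_remove_brick)
      {
          for (int i = 0; i < nbricks; i++) {
              if (uuid_compare(bricks[i].host_uuid, my_uuid) != 0)
                  continue;
              if (!is_remove_brick || bricks[i].being_removed)
                  return 1;
          }
          return 0;
      }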
* cli: add peerid to volume status xml output (Bala.FA, 2013-11-14; 1 file, -0/+13)
| | | | | | | | | | | | This patch adds <peerid> tag to bricks and nfs/shd like services to volume status xml output. BUG: 955548 Change-Id: I9aaa9266e4d56f632235eaeef565e92d757c0694 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* gNFS: RFE for NFS connection behavior (Santosh Kumar Pradhan, 2013-11-14; 1 file, -0/+92)
| | | | | | | | | | | | | | | Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: posix/multi-brick support to BD xlator (M. Mohan Kumar, 2013-11-13; 1 file, -0/+81)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current BD xlator (block backend) has a few limitations such as * Creation of directories not supported * Supports only single brick * Does not use extended attributes (and client gfid) like posix xlator * Creation of special files (symbolic links, device nodes etc) not supported Basic limitation of not allowing directory creation is blocking oVirt/VDSM to consume BD xlator as part of Gluster domain since VDSM creates multi-level directories when GlusterFS is used as storage backend for storing VM images. To overcome these limitations a new BD xlator with following improvements is suggested. * New hybrid BD xlator that handles both regular files and block device files * The volume will have both POSIX and BD bricks. Regular files are created on POSIX bricks, block devices are created on the BD brick (VG) * BD xlator leverages exiting POSIX xlator for most POSIX calls and hence sits above the POSIX xlator * Block device file is differentiated from regular file by an extended attribute * The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a posix file to Logical Volume (LV). * When a client sends a request to set BD_XATTR on a posix file, a new LV is created and mapped to posix file. So every block device will have a representative file in POSIX brick with 'user.glusterfs.bd' (BD_XATTR) set. * Here after all operations on this file results in LV related operations. For example opening a file that has BD_XATTR set results in opening the LV block device, reading results in reading the corresponding LV block device. When BD xlator gets request to set BD_XATTR via setxattr call, it creates a LV and information about this LV is placed in the xattr of the posix file. xattr "user.glusterfs.bd" used to identify that posix file is mapped to BD. Usage: Server side: [root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2 It creates a distributed gluster volume 'bdvol' with Volume Group vg1 using posix brick /storage/vg1_info in host1 and Volume Group vg2 using /storage/vg2_info in host2. [root@host1 ~]# gluster volume start bdvol Client side: [root@node ~]# mount -t glusterfs host1:/bdvol /media [root@node ~]# touch /media/posix It creates regular posix file 'posix' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# mkdir /media/image [root@node ~]# touch /media/image/lv1 It also creates regular posix file 'lv1' in either host1:/vg1 or host2:/vg2 brick [root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1 [root@node ~]# Above setxattr results in creating a new LV in corresponding brick's VG and it sets 'user.glusterfs.bd' with value 'lv:<default-extent-size' [root@node ~]# truncate -s5G /media/image/lv1 It results in resizig LV 'lv1'to 5G New BD xlator code is placed in xlators/storage/bd directory. Also add volume-uuid to the VG so that same VG can't be used for other bricks/volumes. After deleting a gluster volume, one has to manually remove the associated tag using vgchange <vg-name> --deltag <trusted.glusterfs.volume-id:<volume-id>> Changes from previous version V5: * Removed support for delayed deleting of LVs Changes from previous version V4: * Consolidated the patches * Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR. 
Changes from previous version V3: * Added support in FUSE to support full/linked clone * Added support to merge snapshots and provide information about origin * bd_map xlator removed * iatt structure used in inode_ctx. iatt is cached and updated during fsync/flush * aio support * Type and capabilities of volume are exported through getxattr Changes from version 2: * Used inode_context for caching BD size and to check if loc/fd is BD or not. * Added GlusterFS server offloaded copy and snapshot through setfattr FOP. As part of this libgfapi is modified. * BD xlator supports stripe * During unlinking if a LV file is already opened, its added to delete list and bd_del_thread tries to delete from this list when a last reference to that file is closed. Changes from previous version: * gfid is used as name of LV * ? is used to specify VG name for creating BD volume in volume create, add-brick. gluster volume create volname host:/path?vg * open-behind issue is fixed * A replicate brick can be added dynamically and LVs from source brick are replicated to destination brick * A distribute brick can be added dynamically and rebalance operation distributes existing LVs/files to the new brick * Thin provisioning support added. * bd_map xlator support retained * setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and setfattr -n user.glusterfs.bd -v "thin" creates thin LV * Capability and backend information added to gluster volume info (and --xml) so that management tools can exploit BD xlator. * tracing support for bd xlator added TODO: * Add support to display snapshots for a given LV * Display posix filename for list-origin instead of gfid Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/4809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: Relax extended attribute checks for volume create and add-brick force. (Vijay Bellur, 2013-10-17; 1 file, -4/+9)
  The expectation with force is that the user is aware of the consequences of the sanity checks not being triggered. Change-Id: I79dfeed16a23829a7217cef33ab83f9f0ffae336 Signed-off-by: Vijay Bellur <vbellur@redhat.com> BUG: 1007509 Reviewed-on: http://review.gluster.org/5746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: [Feature] Command implementation to get heal-count (Venkatesh Somyajulu, 2013-10-14; 1 file, -0/+1)
  Currently, to know the number of files to be healed, the user either has to go to the backend and count the entries present in the indices/xattrop directory, or run 'gluster volume heal vol-name info'. If a volume consists of a large number of bricks, going to each backend and counting the entries is time-consuming, and if the number of entries in indices/xattrop is very large, the info command also takes a long time. So, as a feature, a new command is implemented.
  Command 1: gluster volume heal vn statistics heal-count
  This command gets the number of entries present in every brick of a volume; the output displays only the entry counts.
  Command 2: gluster volume heal vn statistics heal-count replica 192.168.122.1:/home/user/brickname
  Here, if we are concerned with just one replica, providing any one brick of that replica gets the number of entries to be healed for that replica only.
  Example: replicate volume with replica count 2. Backend status:
      [root@dhcp-0-17 xattrop]# ls -lia | wc -l
      1918
  NOTE: out of 1918, 2 entries are <xattrop-gfid> dummy entries, so the actual number of entries to be healed is 1916.
      [root@dhcp-0-17 xattrop]# pwd
      /home/user/2ty/.glusterfs/indices/xattrop
  Command output:
      Gathering count of entries to be healed on volume volume3 has been successful
      Brick 192.168.122.1:/home/user/22iu
      Status: Brick is Not connected
      Entries count is not available
      Brick 192.168.122.1:/home/user/2ty
      Number of entries: 1916
  Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290 BUG: 1015990 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6044 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Implementation of command "gluster volume heal vn statistics" (Venkatesh Somyajulu, 2013-10-14; 1 file, -1/+84)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "gluster volume heal volumename statistics" command gives the summary of the afr crawl done based on the entries present in the xattrop directory. Whenever afr crawls are attempted, the beginning time of crawl, end time of crawl, no of files healed, heal-failed count and number of files in split brain are shown along with the type of the crawl. If crawl is already in progress then it will give the number of files healed, heal failed count and number of files in split-brain from the beginning of the crawl and instead of telling the end time of the crawl, "CRAWL IN PROGRESS" message will be shown. Output format: command: "gluster volume heal volume-name statistics" Output: Gathering afr crawl statistics crawl statistics on volume volume-name has been successful ------------------------------------------------ Crawl statistics for brick no 0 Hostname of brick 192.168.122.248 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:38 2013 Ending time of crawl: Wed Jul 10 15:52:38 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 ------------------------------------------------ Crawl statistics for brick no 1 Hostname of brick 192.168.122.1 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 Starting time of crawl: Wed Jul 10 15:52:42 2013 Ending time of crawl: Wed Jul 10 15:52:42 2013 Type of crawl: INDEX No. of entries healed: 0 No. of entries in split-brain: 0 No. of heal failed entries: 0 -------------------------------------------------- Change-Id: I10bf9d10b005741db9973fb1352e0dd59ed99aa9 BUG: 949400 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4790 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli,glusterd: Implement 'volume status tasks' (Krutika Dhananjay, 2013-10-08; 1 file, -0/+29)
| | | | | | | | | | | | | | | | | | | oVirt's Gluster Integration needs an inexpensive command that can be executed every 10 seconds to monitor async tasks and their parameters, for all volumes. The solution involves adding a 'tasks' sub-command to 'volume status' to fetch only the async task IDs, type and other relevant parameters. Only the originator glusterd participates in this command as all the information needed is available on all the nodes. This is to make the command suitable for being executed every 10 seconds. Change-Id: I1edc607baf29b001a5585079dec681d7c641b3d1 BUG: 1012346 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6006 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: Calculate volume op-versions only on set/reset (Kaushal M, 2013-09-29; 1 file, -2/+47)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The volume op-versions are calculated during a volume set/reset, reading a volume from disk and importing a volume during probe or volume sync. The calculation of the volume op-version depends on the clusters op-version as some features are enabled automatically depending on the clusters op-version. We also don't store the volume op-versions persistently and don't export the volume op-versions during sync. Due to this, there can occur cases which will lead to inconsistencies in volumes in different peers. One such case is below, Consider, a cluster made up 3 peers P1, P2 and P3, operating at op-version N. The cluster has two volumes V1 and V2, which have volume op-versions N (since volume op-version cannot be greater than cluster op-version). We have, Cluster-op-version = N V1 op-version = N V2 op-version = N A set operation on V1 causes the clusters op-version to be bumped up to N+1. Assume that there exist some features that are automatically enabled on op-version N+1. The op-version of V2 remains at N as no operation has been performed on it. So, Cluster op-version = N+1 V1 op-version = N+1 V2 op-version = N Now, we probe a new peer P4. On the new peer we will have the following op-versions, Cluster op-version = N+1 V1 op-version = N+1 V2 op-version = N+1 This happens because we don't send volume op-versions during the sync after probe. P4 will freshly calculate the op-version of V2 (assuming features have been auto enabled due to the cluster op-version being N+1) as N+1. Another case is when glusterd on a peer restarts. Assume P3 was restarted, glusterd will recalculate the volume op-versions during the restore state. Again, op-version of V2 will be calculated as N+1 assuming auto enabled features. This will lead to inconsistency in the volume representation in memory and on disk, as glusterd will assume the volume contains auto enabled features, but the volfiles don't contain them as they were not regenrated. These kind of issues can be solved by calculating the volume op-version only when features are enabled and disabled (ie. during volume set/reset), persisting the volume-op-versions and exporting/importing them. Change-Id: I52de0668c92628622e85f4588fb28829a7231132 BUG: 1005043 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5568 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: Incorrect NFS ACL encoding for XFS (Santosh Kumar Pradhan, 2013-09-29; 1 file, -3/+1)
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds) which causes setfacl()->setxattr() to fail silently. NFS client adds NFS_ACL_DEFAULT(0x1000) for default ACL. FIX: Mask the prefix (added by NFS client) OFF, so the setfacl is not rejected when it hits the FS. Original patch by: "Richard Wareing" Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396 BUG: 1009210 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/5980 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
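  In effect, the fix masks the NFS-client-only prefix off the ACL type before the xattr reaches the brick filesystem; schematically (constant value as given in the message):

      /* The Linux NFS client tags default-ACL requests with
       * NFS_ACL_DEFAULT (0x1000); strip it so the resulting
       * system.posix_acl_default setxattr is not rejected by XFS. */
      #define NFS_ACL_DEFAULT 0x1000

      static unsigned int
      nfs_acl_type_to_posix(unsigned int nfs_acl_type)
      {
          return nfs_acl_type & ~(unsigned int)NFS_ACL_DEFAULT;
      }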
* glusterd: Saving geo-rep session details in a more specific path (Venky Shankar, 2013-09-04; 1 file, -0/+12)
| | | | | | | | | | | | | | | Now saving the session details in /var/lib/glusterd/geo-replication/<mastervol>_<slaveip>_<slavevol> repo to distinguish between two master-slave sessions where the slavename is same across two different clusters. Change-Id: I57c93f55cc9bd4fe2bffe579028aaf5e4335b223 BUG: 991501 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5488 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: Regenerate client volfiles during upgrade (shishir gowda, 2013-09-03; 1 file, -6/+27)
| | | | | | | | | | | Change-Id: I1442bc1d115a9c6ecf139a0ca9da74d07e0fe928 BUG: 1003855 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5764 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Use volume op-versions during volgen (Kaushal M, 2013-08-02; 1 file, -2/+2)
| | | | | | | | | | | | | | Instead of using the cluster op-version, volume op-version is used to enable open-behind during volgen. For doing this, the volume op-versions are updated before regenerating the volfiles. Change-Id: I675bb549bf7c7c0279030dca698fb530781addc6 BUG: 990830 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5385 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Re-initialize skipped file count in glusterd (shishir gowda, 2013-07-31; 1 file, -0/+1)
| | | | | | | | | Change-Id: I42d08b3a6a7a3839f5e9953e1f83959222c080f8 Signed-off-by: shishir gowda <sgowda@redhat.com> BUG: 989846 Reviewed-on: http://review.gluster.org/5446 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Treat migration failures due to space constraints as skipped (shishir gowda, 2013-07-30; 1 file, -0/+27)
| | | | | | | | | | | | | | | | Currently rebalance/remove-brick op's display migration failed count even for files which failed due to space issues (not enough space for file, or migration leading to cluster imbalance) These will now be counted as skipped, and rebalance/remove-brick status will display the additional counter Change-Id: I674904d380b5f8300e9ca9e6af557c3d30d6cff4 BUG: 989846 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5399 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: let each brick write the valgrind o/p to different file (Raghavendra Bhat, 2013-07-26; 1 file, -1/+5)
| | | | | | | | | | | | | Till now all the brick processes were writing the valgrind information to the same log file. Change-Id: I0251c943935e2901b729c71f21d0677edb9f6867 BUG: 922877 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/5394 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd/cli changes for distributed geo-rep (Avra Sengupta, 2013-07-26; 1 file, -25/+208)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commands: gluster system:: execute gsec_create gluster volume geo-rep <master> <slave-url> create [push-pem] [force] gluster volume geo-rep <master> <slave-url> start [force] gluster volume geo-rep <master> <slave-url> stop [force] gluster volume geo-rep <master> <slave-url> delete gluster volume geo-rep <master> <slave-url> config gluster volume geo-rep <master> <slave-url> status The geo-replication is distributed. The session will be created, and gsyncd will be spawned on all relevant nodes, instead of only one node. geo-rep: Collecting status detail related data Added persistent store for saving information about TotalFilesSynced, TotalSyncTime, TotalBytesSynced Changes in the status information in socket: Existing(Ex): FilesSynced=2;BytesSynced=2507;Uptime=00:26:01; New(Ex): FilesSynced=2;BytesSynced=2507;Uptime=00:26:01;SyncTime=0.69978; TotalSyncTime=2.890044;TotalFilesSynced=6;TotalBytesSynced=143640; Persistent details stored in /var/lib/glusterd/geo-replication/${mastervol}/${eSlave}-detail.status Change-Id: I1db7fc13ffca2e415c05200b0109b1254067f111 BUG: 847839 Original Author: Avra Sengupta <asengupt@redhat.com> Original Author: Venky Shankar <vshankar@redhat.com> Original Author: Aravinda VK <avishwan@redhat.com> Original Author: Amar Tumballi <amarts@redhat.com> Original Author: Csaba Henk <csaba@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5132 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: changelog translator (Avra Sengupta, 2013-07-22; 1 file, -11/+0)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the initial version of the Changelog Translator. What is it ----------- Goal is to capture changes performed on a GlusterFS volume. The translator needs to be loaded on the server (bricks) and captures changes in a plain text file inside a configured directory path (controlled by "changelog-dir", should be somewhere in <export>/.glusterfs/changelog by default). Changes are classified into 3 types: - Data: : TYPE-I - Metadata : TYPE-II - Entry : TYPE-III Changelog file is rolled over after a certain time interval (defauls to 60 seconds) after which a changelog is started. The thing to be noted here is that for a time interval (time slice) multiple changes for an inode are recorded only once (ie. say for 100+ writes on an inode that happens within the time slice has only a single corresponding entry in the changelog file). That way we do not bloat up the changelog and also save lots of writes. Changelog Format ----------------- TYPE-I and TYPE-II changes have the gfid on the entity on which the operation happened. TYPE-III being a entry op requires the parent gfid and the basename. Changelog format has been kept to a minimal and it's upto the consumers to do the heavy loading of figuring out deletes, renames etc.. A single changelog file records all three types of changes, with each change starting with an identifier ("D": DATA, "M": METADATA and "E": ENTRY). Option is provided for the encoding type (See TUNABLES). Consumers ---------- The only consumer as of today would be geo-replication, although backup utilities, self-heal, bit-rot detection could be possible consumers in the future. CLI ---- By default, change-logging is disabled (the translator is present in the server graph but does nothing). When enabled (via cli) each brick starts to log the changes. There are a set of tunable that can be used to change the translators behaviour: - enable/disable changelog (disabled by default) gluster volume set <volume> changelog {on|off} - set the logging directory (<brick>/.glusterfs/changelogs is the default) gluster volume set <volume> changelog-dir /path/to/dir - select encoding type (binary (default) or ascii) gluster volume set <volume> encoding {binary|ascii} - change the rollover time for the logs (60 secs by default) gluster volume set <volume> rollover-time <secs> - when secs > 0, changelog file is not open()'d with O_SYNC flag - and fsync is trigerred periodically every <secs> seconds. gluster volume set <volume> fsync-interval <secs> features/changelog: changelog consumer library (libgfchangelog) A shared library is provided for the consumer of the changelogs for easy acess via APIs. Application can link against this library and request for changelog updates. Conversion of binary logs to human-readable ascii format is also taken care by the library which keeps a copy of the changelog in application provided working directory. Change-Id: I75575fb7f1c53d2bec3dba1a329ea7bb3c628497 BUG: 847839 Original Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5127 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
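  On the consumer side, each ascii record can be classified by its leading identifier; a tiny sketch (libgfchangelog has its own API, this only illustrates the D/M/E convention described above):

      /* Classify one ascii-encoded changelog record:
       * 'D' = data, 'M' = metadata, 'E' = entry. */
      static const char *
      classify_changelog_record(const char *record)
      {
          switch (record[0]) {
          case 'D': return "data";       /* followed by the gfid */
          case 'M': return "metadata";   /* followed by the gfid */
          case 'E': return "entry";      /* followed by parent gfid + basename */
          default:  return "unknown";
          }
      }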
* glusterd: Give up biglock before brick's rpc unref (Krishnan Parthasarathi, 2013-07-11; 1 file, -1/+5)
| | | | | | | | | | | | | This is to prevent the possibility of a deadlock when rpc_connection_cleanup being called in the same thread as rpc_clnt_unref Change-Id: Ia4dcc0a8a6e6158d4ddec68b780fccbc4cd64adb BUG: 962619 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5321 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd/common-utils: move hostname helper functions to common-utils (Krishnan Parthasarathi, 2013-07-04; 1 file, -234/+2)
| | | | | | | | | Change-Id: If47e209cb61ea0eb74ee2d6ef9e9342b2d6ee13a BUG: 980838 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5261 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: More checks before starting rebalance/remove-brick (Kaushal M, 2013-07-02; 1 file, -0/+13)
| | | | | | | | | | | | Check if a previous remove-brick operation has been committed before starting a new rebalance/remove-brick task. Change-Id: I553e5ba64a6a352ca91032ab1a17997051a4494e BUG: 963541 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5019 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* store: move glusterd_store functions from mgmt/glusterd to libglusterfs (Niels de Vos, 2013-06-20; 1 file, -1/+1)
| | | | | | | | | | | | | Making the glusterd_store_* functions re-usable will help with future changes that need to read/write lists of items. BUG: 904065 Change-Id: I99fb8eced76d12d5a254567eccff9790b43d8da3 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/4676 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Disable transport before cleaning up rpc object (Krishnan Parthasarathi, 2013-06-18; 1 file, -17/+50)
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: rpc_transport object, which is part of rpc_clnt, is destroyed prematurely. This is because, rpc_transport object is ref'd by socket layer and rpc layer. These ref's, until the synctask'izing of operations, were unref'd sequentially in the epoll thread. With more threads at play, the sequential unref guarantee is off. Fix: Shutting down the transport before proceeding with cleaning up of rpc_clnt object would serialize the unref's on the rpc_transport object and thus eliminating the race. Also, we don't store the address of brickinfo in brick's rpc notify function, to avoid the possibility of referring a freed brickinfo. Instead we use a string based id to 'reach' the corresponding brickinfo. Change-Id: If2739e2eeaee1e8b071ab2b6754b7ea0f81cfceb BUG: 962619 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5000 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>