path: root/tests
Commit message (Author, Date; Files, Lines -deleted/+added)
...
* geo-rep: Fix sync hang with tarssh (Kotresh HR, 2019-05-13; 2 files, -0/+145)

  Problem: Geo-rep sync hangs when tarssh is used as the sync engine
  under heavy workload.

  Analysis and Root cause: It was found that the tar process was hung.
  On further debugging, the stderr buffer of the tar process on the
  master was full, i.e., 64k. When the buffer was copied to a file from
  /proc/pid/fd/2, the hang resolved. This can happen when files picked
  by the tar process to sync no longer exist on the master. If this
  count grows to around 1k, the stderr buffer fills up.

  Fix: The tar process is executed using Popen with stderr as PIPE.
  The final execution is something like below.

      tar | ssh <args> root@slave tar --overwrite -xf - -C <path>

  It was waiting on the ssh process first using communicate() and then
  on tar. Note that communicate() reads stdout and stderr, so when the
  stderr of the tar process fills up, there is no one to read it until
  the untar via ssh is completed. That can never happen, which leads to
  a deadlock. Hence we should wait on both processes in parallel, so
  that stderr is read on both.

  Change-Id: I609c7cc5c07e210c504771115b4d551a2e891adf
  fixes: bz#1707728
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

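A minimal Python sketch of the deadlock-free pattern described above; the helper names and tar arguments are assumptions for illustration, not the actual gsyncd code:

```python
import subprocess
import threading

def drain_and_wait(proc, results, key):
    # Reading stderr to EOF keeps the 64k pipe buffer from filling up
    # and blocking the child; wait() then reaps the exit status.
    err = proc.stderr.read()
    results[key] = (proc.wait(), err)

def sync_with_tarssh(files_from, slave, slave_path):
    tar = subprocess.Popen(
        ["tar", "-cf", "-", "--files-from", files_from],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    ssh = subprocess.Popen(
        ["ssh", slave, "tar", "--overwrite", "-xf", "-", "-C", slave_path],
        stdin=tar.stdout, stderr=subprocess.PIPE)
    tar.stdout.close()  # the slave-side tar is now the only reader

    results = {}
    workers = [threading.Thread(target=drain_and_wait, args=(p, results, k))
               for k, p in (("tar", tar), ("ssh", ssh))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # wait on both processes in parallel, not one after the other
    return results
```
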
* cli: Validate invalid slave url (Kotresh HR, 2019-05-11; 1 file, -0/+4)

  This patch validates an invalid slave url in the cli itself and throws
  an appropriate error.

  fixes: bz#1098991
  Change-Id: I278e2a04a4d619d2c2d1db0dd56ab5bdf7e7f469
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* glusterd: Add gluster volume stop operation to glusterd_validate_quorum() (Vishal Pandey, 2019-05-11; 1 file, -1/+3)

  ISSUE: gluster volume stop succeeds even if quorum is not met.

  Fix: Add GD_OP_STOP_VOLUME to glusterd_validate_quorum() in
  glusterd_mgmt_v3_pre_validate(). Since the volume stop command has
  been ported from synctask to mgmt_v3, the quorum check was missed out.

  Change-Id: I7a634ad89ec2e286ea262d7952061efad5360042
  fixes: bz#1690753
  Signed-off-by: Vishal Pandey <vpandey@redhat.com>

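Conceptually, the fix adds the stop-volume op to the set of operations gated by the server quorum check. An illustrative Python model (glusterd itself is C; the op names follow the commit text, the quorum rule here is a simplifying assumption):

```python
# Ops whose pre-validation must pass the server quorum check. The fix
# adds GD_OP_STOP_VOLUME, which was missed when stop moved to mgmt_v3.
QUORUM_VALIDATED_OPS = {
    "GD_OP_START_VOLUME",
    "GD_OP_ADD_BRICK",
    "GD_OP_REPLACE_BRICK",
    "GD_OP_STOP_VOLUME",   # previously missing
}

def pre_validate(op, active_peers, total_peers):
    # Assume server quorum means more than half of the peers are up.
    if op in QUORUM_VALIDATED_OPS and 2 * active_peers <= total_peers:
        raise RuntimeError("quorum not met, rejecting " + op)
```
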
* tests: fix bug-1319374.c compile warnings. (Ravishankar N, 2019-05-10; 1 file, -0/+1)

  I was looking at a downstream failure of bug-1319374-THIS-crash.t when
  I saw the compiler was throwing a warning while running the test:

      tests/bugs/gfapi/bug-1319374.c:17:61: warning: implicit declaration
      of function 'strerror'; did you mean 'perror'?
      [-Wimplicit-function-declaration]
        fprintf(stderr, "\nglfs_new: returned NULL (%s)\n", strerror(errno));
                                                            ^~~~~~~~
                                                            perror

  So I compiled the .c with -Wall and saw many more warnings, all due to
  a missing header. This patch fixes it.

  fixes: bz#1708163
  Change-Id: I8b6dd8e1404178a3d99b2d92d01f4575f5203e58
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>

* shd/glusterd: Serialize shd manager to prevent race condition (Mohammed Rafi KC, 2019-05-10; 1 file, -0/+54)

  At the time of a glusterd restart, while doing a handshake there is a
  possibility that multiple shd managers might get executed. Because of
  this, there is a chance that multiple shd daemons get spawned during a
  glusterd restart.

  Change-Id: Ie20798441e07d7d7a93b7d38dfb924cea178a920
  fixes: bz#1707081
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>

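The serialization idea, reduced to a hedged Python sketch (glusterd itself is C; names here are illustrative): concurrent handshake paths share one lock so only a single shd manager instance runs at a time:

```python
import threading

_shd_lock = threading.Lock()
_shd_running = False

def shd_manager(spawn_shd):
    """Serialized entry point: concurrent handshake callbacks queue up
    on the lock instead of racing to spawn duplicate shd processes."""
    global _shd_running
    with _shd_lock:
        if not _shd_running:
            spawn_shd()          # at most one shd is spawned per restart
            _shd_running = True
```
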
* tests: improve and fix some test scripts (Xavier Hernandez, 2019-05-09; 18 files, -69/+162)

  Change-Id: Iceefe22af754096c599dc570d4894d14fce4deae
  Updates: bz#1193929
  Signed-off-by: Xavier Hernandez <xhernandez@redhat.com>

* geo-rep: Fix sync-method config (Kotresh HR, 2019-05-09; 2 files, -2/+2)

  Problem: When 'use_tarssh' is set to true, it exits with a success
  message but the default 'rsync' is used as the sync engine. The new
  config 'sync-method' is not allowed to be set from the cli.

  Analysis and Fix: The 'use_tarssh' config is deprecated with the new
  config framework, and 'sync-method' is the new config to choose the
  sync method, i.e. tarssh or rsync. This patch fixes the 'sync-method'
  config. The allowed values are tarssh and rsync.

  Change-Id: I0edb0319cad0455b29e49f2f08a64ce324735e84
  fixes: bz#1707686
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

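A small sketch of the resulting config rule (the option name and allowed values come from the commit text; the helper itself is hypothetical):

```python
ALLOWED_SYNC_METHODS = ("rsync", "tarssh")  # rsync stays the default

def set_sync_method(value):
    # Reject anything outside the allowed values at "config set" time,
    # instead of silently falling back to rsync.
    if value not in ALLOWED_SYNC_METHODS:
        raise ValueError("sync-method must be one of %s, got %r"
                         % ("/".join(ALLOWED_SYNC_METHODS), value))
    return value
```
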
* tests/geo-rep: Fix arequal checksum comparison (Kotresh HR, 2019-05-09; 5 files, -9/+10)

  The arequal checksum comparison was always returning success, even
  when it was not successful. Fixed the same.

  Change-Id: I5083da25c0954126e452d06311d2d376f8540555
  fixes: bz#1707742
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* tests: enhance the auth.allow test to validate all failures of 'login' module (Amar Tumballi, 2019-05-08; 1 file, -4/+49)

  The enhanced test now covers most of the code in the auth.login and
  auth.addr modules.

  updates: bz#1693692
  Change-Id: I1f43c7dc414e2e4d443a93e9a37051359fd46ea4
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* dht: Custom xattrs are not healed in case of add-brick (root, 2019-05-08; 1 file, -0/+67)

  Problem: If any custom xattrs are set on a directory before adding a
  brick, the xattrs are not healed on the directory after adding the
  brick.

  Solution: The xattrs are not healed because
  dht_selfheal_dir_mkdir_lookup_cbk checks the value of MDS, and if the
  MDS value is not negative, the selfheal code path does not take a
  reference to the MDS xattrs. Change the condition to take a reference
  to the MDS xattr so that custom xattrs are populated on the newly
  added brick.

  Updates: bz#1702299
  Change-Id: Id14beedb98cce6928055f294e1594b22132e811c
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>

* tests: delete the snapshots and the volume after the tests (Raghavendra Bhat, 2019-05-06; 1 file, -0/+22)

  In uss.t multiple snapshots are taken, and after all the tests,
  everything is left for the cleanup() function to remove. Instead of
  that, delete the snapshots and the volume once all the tests are over,
  so that cleanup becomes a relatively light operation.

  Change-Id: I2342740bbb185cd6c9a450eb3b4f5cbbba78974c
  fixes: bz#1704888
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>

* tests: validate volfile grammar - strings in volfile (Amar Tumballi, 2019-05-06; 1 file, -0/+73)

  * libglusterfs/graph-print: remove unused code

  updates: bz#1693692
  Change-Id: Iae81bb6a3af5911c3da07ab8f1d8f58f27e06905
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests/cli: add .t file to increase line coverage in cli (Sanju Rakonde, 2019-05-02; 1 file, -0/+21)

  updates: bz#1693692
  Change-Id: Ib188c5fddea8c762e89ff15aa83b08c35cdb21e1
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>

* tests: add .t files to increase cli code coverage (rishubhjain, 2019-05-02; 2 files, -2/+3)

  Different volume profile sub-options are added in the test.

  Change-Id: I93100c37f51afc10870e60b91fcd86e7859e734a
  updates: bz#1693692
  Signed-off-by: rishubhjain <rishubhjain47@gmail.com>

* tests: Add changelog snapshot testcase (Kotresh HR, 2019-05-02; 1 file, -0/+60)

  Add testcase to test snapshot creation while I/O is happening with
  changelog enabled.

  updates: bz#1193929
  Change-Id: Ice4cb596286c583ed7308484d65902007a48396c
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* nl-cache: add test to increase code coverage (Sheetal Pamecha, 2019-04-29; 1 file, -0/+30)

  Change-Id: Ie0a5c522dfa0123ca45f9decf5015d39b92cb0f3
  updates: bz#1693692
  Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>

* performance/decompounder: remove the translator as the feature is not used anymore (Amar Tumballi, 2019-04-29; 1 file, -6/+1)

  updates: bz#1693692
  Change-Id: Id5932b11e115ca6da1c2bfff7ae1460787109e06
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* glusterd: define dumpops in the xlator_api of glusterd (Sanju Rakonde, 2019-04-27; 1 file, -0/+13)

  Problem: statedump is not capturing information related to glusterd.

  Solution: statedump is not capturing glusterd info because
  trav->dumpops is null in gf_proc_dump_single_xlator_info(), where trav
  is the glusterd xlator object. trav->dumpops is null because we missed
  defining dumpops in the xlator_api of glusterd. Defining dumpops in
  the xlator_api of glusterd fixes the issue.

  fixes: bz#1703629
  Change-Id: If85429ecb1ef580aced8d5b88d09fc15258bfc4c
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>

* geo-rep: Fix rename with existing destination with same gfid (Sunny Kumar, 2019-04-26; 5 files, -0/+56)

  Problem: Geo-rep fails to sync a rename properly if the destination
  exists. It results in the source remaining on the slave, causing a
  higher number of files on the slave. Heavy rename workloads like
  logrotate also caused a lot of ESTALE errors.

  Cause: Geo-rep fails to sync a rename when the destination exists and
  the creation of the source file falls into the same batch of
  changelogs being processed. This is because, after fixing problematic
  gfids by verifying from the master, while re-processing the original
  entries, the CREATE was also re-processed, causing extra files on the
  slave and the rename to fail.

  Solution: Entries need to be removed from the retrial list after
  fixing problematic gfids on the slave, so that the file is not
  re-created again on the slave. Also treat ESTALE as EEXIST so that the
  error is properly handled by verifying the op on the master volume.

  Change-Id: I50cf289e06b997adddff0552bf2466d9201dd1f9
  fixes: bz#1694820
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>

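In outline, the two fixes might look like this hypothetical Python model of the retry loop (not the actual gsyncd entry code; entry layout and helper names are assumptions):

```python
import errno

def reprocess_entries(retries, fixed_gfids, apply_on_slave, verify_on_master):
    for entry in list(retries):
        if entry["gfid"] in fixed_gfids:
            # Already fixed up on the slave: drop it from the retrial
            # list so the CREATE is not replayed and re-created there.
            retries.remove(entry)
            continue
        err = apply_on_slave(entry)
        if err in (errno.EEXIST, errno.ESTALE):
            # Treat ESTALE like EEXIST: re-verify the op against the
            # master volume before deciding what to do.
            verify_on_master(entry)
```
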
* features/bit-rot: Unconditionally sign the files during oneshot crawl (Raghavendra Bhat, 2019-04-25; 1 file, -0/+87)

  Currently the bit-rot feature has an issue with disabling and
  re-enabling it on the same volume. Consider enabling bit-rot
  detection, which goes on to crawl and sign all the files present in
  the volume. Then some files are modified and the bit-rot daemon goes
  on to sign the modified files with the correct signature. Now, disable
  the bit-rot feature. While signing and scrubbing are not happening,
  previous checksums of the files continue to exist as extended
  attributes. Now, if some files with checksum xattrs get modified, they
  are not signed with a new signature as the feature is off.

  At this point, if the feature is enabled again, the bit-rot daemon
  will go and sign those files which do not have any bit-rot specific
  xattrs (i.e. those files which were created after bit-rot was
  disabled), whereas the files with bit-rot xattrs won't get signed with
  the proper new checksum. At this point if the scrubber runs, it finds
  the on-disk checksum and the actual checksum of the file to be
  different (because the file got modified) and marks the file as
  corrupted.

  FIX: The fix is to unconditionally sign the files when the bit-rot
  daemon comes up (instead of skipping the files with bit-rot xattrs).

  Change-Id: Iadfb47dd39f7e2e77f22d549a4a07a385284f4f5
  fixes: bz#1700078
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>

* tests/geo-rep: Add pause and resume test case for geo-rep (Shwetha K Acharya, 2019-04-24; 2 files, -0/+12)

  Added pause and resume test case for geo-rep.

  fixes: bz#1696077
  Change-Id: Ib6fcc1926c3be1263bca1235194f737b895c8333
  Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* tests: add .t files to increase cli code coverage (rishubhjain, 2019-04-24; 1 file, -0/+62)

  Tests added for gluster volume top and profile, with and without xml
  output.

  Change-Id: I66aa6390b53ca448014059a3d27dc72e405216d2
  updates: bz#1693692
  Signed-off-by: rishubhjain <rishubhjain47@gmail.com>

* tests: add .t file to increase cli code coverage (Sanju Rakonde, 2019-04-24; 3 files, -1/+97)

  updates: bz#1693692
  Change-Id: I848e622d7b8562e864f0e208aafdc21d9cb757d3
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>

* cluster/ec: fix fd reopen (Xavi Hernandez, 2019-04-23; 2 files, -11/+49)

  Currently EC tries to reopen fd's that have been opened while a brick
  was down. This is done as part of regular write operations, just after
  having acquired the locks, and it's sent as a sub-fop of the main
  write fop. There were two problems:

  1. The reopen was attempted on all UP bricks, even if a previous lock
     didn't succeed. This is incorrect because most probably the open
     will fail.

  2. If a reopen is sent and fails, the error is propagated to the main
     operation, causing it to fail when it shouldn't.

  To fix this, we only attempt reopens on bricks where the current fop
  owns a lock, and we prevent any error from being propagated to the
  main fop. To implement this behaviour, an argument used to indicate
  the minimum number of required answers has been overloaded to also
  include some flags. To make the change consistent, it has been
  necessary to rename the argument, which means that a lot of files have
  been changed. However there are no functional changes.

  This change has also uncovered a problem in the discard code, which
  didn't correctly process requests of small sizes because no real
  discard fop was being processed, only a write of 0's on some region.
  In this case some fields of the fop remained uninitialized or with
  incorrect values. To fix this, a new function has been created to
  simulate success on a fop, and it's used in the discard case.

  Thanks to Pranith for providing a test script that has also detected
  an issue in this patch. This patch includes a small modification of
  this script to force data to be written into bricks before stopping
  them.

  Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec
  Fixes: bz#1699866
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

* extras/hooks: syntactical errors in SELinux hooks, script logic improved (Milan Zink, 2019-04-18; 1 file, -1/+3)

  Fixes: bz#1542072
  Change-Id: Ia5fa1df81bbaec3a84653d136a331c76b457f42c
  Signed-off-by: Milan Zink <zeten30@gmail.com>

* tests: Heal should fail when read/write fails (Pranith Kumar K, 2019-04-16; 1 file, -0/+65)

  updates: bz#1699866
  Change-Id: I7ccd1fc5fc134eeb6d443c755962a20819320d48
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

* glusterd: Optimize glusterd handshaking code path (Mohit Agrawal, 2019-04-15; 1 file, -0/+69)

  Problem: At the time of handshaking, glusterd populates volume data in
  a dictionary. When more than 1500 volumes are configured, glusterd
  takes more than 10 minutes to generate the data. Because it takes so
  long, the rpc request times out and rpc starts bailing out call
  frames.

  Solution: To optimize the code, the following changes were made:

  1) Spawn multiple threads to populate volume data in bulk into
     separate dictionaries, and introduce an option
     glusterd.brick-dict-thread-count to configure the number of threads
     used to populate volume data (sketched below).
  2) Populate tier data only when the volume type is tier.
  3) Compare snap data only when snap_count is non-zero.

  Fixes: bz#1699339
  Change-Id: I38dc71970c049217f9d1a06fc0aaf4c26eab18f5
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>

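Change 1) might look roughly like this in Python terms (the real code is C and pthread-based; the helper names are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

def populate_volumes(volumes, build_volume_dict, thread_count=4):
    # Each worker fills a separate dictionary for its share of the
    # volumes, so 1500+ volumes are serialized in parallel rather than
    # one by one on a single thread.
    def worker(chunk):
        d = {}
        for vol in chunk:
            d.update(build_volume_dict(vol))
        return d

    chunks = [volumes[i::thread_count] for i in range(thread_count)]
    merged = {}
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        for partial in pool.map(worker, chunks):
            merged.update(partial)  # merge per-thread dicts into the reply
    return merged
```
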
* cluster/afr: Remove local from owners_list on failure of lock-acquisition (Pranith Kumar K, 2019-04-15; 1 file, -0/+47)

  When eager-lock lock acquisition fails because of, say, network
  failures, the local is not removed from owners_list. This leads to an
  accumulation of waiting frames, and the application hangs because the
  waiting frames assume that another transaction is in the process of
  acquiring the lock, since the owners list is not empty. This patch
  handles that case as well, and adds asserts to make it easier to find
  such problems in the future.

  fixes bz#1696599
  Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

* core: Log level changes do not take effect on a running client process (Mohit Agrawal, 2019-04-15; 1 file, -0/+113)

  Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set the
  log-level per xlator during reconfigure only for a brick process, not
  for the client process.

  Solution: 1) Change the per-xlator log-level only if brick_mux is
  enabled. To detect brick multiplexing, introduce a flag brick_mux in
  ctx->cmd_args.

  Note: there are two other changes in this patch:
  1) Ignore the client-log-level option when attaching a brick to an
     already running brick if brick_mux is enabled.
  2) Add a log printing the pid of the running process to make
     debugging easier.

  Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
  Fixes: bz#1696046

* posix/ctime: Fix stat (time attributes) inconsistency during readdirp (Kotresh HR, 2019-04-15; 2 files, -0/+79)

  Problem: Creating a tar file on a gluster volume throws the warning
  'file changed as we read it'.

  Cause: During readdirp, for a few files whose inode was not present,
  the time attributes were served from the backend. This caused the
  ctime of those files to differ between before and after the readdir
  done by tar.

  Solution: If the ctime feature is enabled and the inode is not
  present, don't serve the time attributes from the backend file; serve
  them from the xattr.

  fixes: bz#1698078
  Change-Id: I427ef865f97399475faf5aa6ca495f7e317603ae
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

* core: Brick is not able to detach successfully in brick_mux environment (Mohit Agrawal, 2019-04-14; 1 file, -0/+33)

  Problem: In a brick_mux environment, while volumes are stopped in a
  loop, bricks are not detached successfully. Bricks are not detached
  because xprtrefcnt has not become 0 for the detached brick. At the
  time of initiating the brick detach process, server_notify saves
  xprtrefcnt on the detached brick, and once the counter has become 0,
  server_rpc_notify spawns server_graph_janitor_threads to clean up
  brick resources. xprtrefcnt has not become 0 because the socket
  framework is not working, due to 0 being assigned as an fd for the
  socket. In commit dc25d2c1eeace91669052e3cecc083896e7329b2 there was a
  change in changelog fini to close htime_fd if htime_fd is not
  negative; by default htime_fd is 0, so it closed fd 0 as well.

  Solution: Initialize htime_fd to -1 just after allocating
  changelog_priv with GF_CALLOC.

  Fixes: bz#1699025
  Change-Id: I5f7ca62a0eb1c0510c3e9b880d6ab8af8d736a25
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>

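The bug pattern in miniature, as a Python sketch (the real fix is one line of C in changelog init): 0 is a valid file descriptor, so an fd field must default to -1, not to the 0 a zeroed allocation gives:

```python
import os

class ChangelogPriv:
    def __init__(self):
        # Fix: use a -1 sentinel. A zeroed allocation (GF_CALLOC)
        # leaves the field at 0, which is a live fd owned by someone
        # else (here, the socket framework).
        self.htime_fd = -1

    def fini(self):
        if self.htime_fd >= 0:   # "not negative" now really means "opened"
            os.close(self.htime_fd)
            self.htime_fd = -1
```
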
* tests/dht: Test that lookups are sent post brick up (N Balachandran, 2019-04-12; 1 file, -0/+83)

  Change-Id: I3556793c5e9d58cc6a08644b41dc5740fab2610b
  updates: bz#1628194
  Signed-off-by: N Balachandran <nbalacha@redhat.com>

* test: Change glustershd_pid update in .t file (Mohit Agrawal, 2019-04-12; 2 files, -3/+4)

  Problem: bug-1650403.t and bug-858215.t throw an error at the time of
  accessing the glustershd pidfile.

  Solution: Use the ps command to find out the glustershd pid.

  Change-Id: I3477345b6220aa039e012e674cba21d741e9abab
  fixes: bz#1697486
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

* tests: make sure to traverse all of meta dir (Amar Tumballi, 2019-04-12; 1 file, -0/+27)

  Just to make sure all files get listed, which means we have maximum
  code coverage.

  updates: bz#1693692
  Change-Id: I11d36ac2f4d6d4fb91223aacd423ad23242eb454
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests: correctly check open fd's when gfid is missing (Xavi Hernandez, 2019-04-10; 1 file, -0/+3)

  The helper function get_fd_count() returns how many open fds a given
  gfid has on a brick. It could happen that the brick doesn't have
  information about that inode because it has not been previously
  accessed. Before this patch, the function returned "" when the inode
  was not present. This caused basic/ec/ec-fix-openfd.t to fail because
  it was expecting '0' as the result. This patch forces get_fd_count()
  to return '0' when the gfid is not present in the state dump.

  Change-Id: I848b57744e96656bf81fbb7b126a5faf44e535eb
  updates: bz#1193929
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

* mgmt/glusterd: Make changes related to cloudsync xlator (Anuradha Talur, 2019-04-10; 2 files, -0/+69)

  1) The placement of the cloudsync xlator has been changed to make it
  the shard xlator's child. If cloudsync has to work with shard in the
  graph, it needs to be a child of shard.

  Change-Id: Ib55424fdcb7ce8edae9f19b8a6e3d3ba86c1f0c4
  fixes: bz#1642168
  Signed-off-by: Anuradha Talur <atalur@commvault.com>

* protocol: add an option to force using old-protocol (Amar Tumballi, 2019-04-10; 1 file, -0/+31)

  The protocol layer implements every fop and, in general, a large part
  of the codebase. Considering that our regression runs mostly on one
  machine, there was no way of forcing the client to use the old
  protocol while the new one is available. With this patch, a new
  'testing' option is provided which forces the client to use the old
  protocol if found. This should help increase the code coverage by at
  least 10k lines overall.

  updates: bz#1693692
  Change-Id: Ie45256f7dea250671b689c72b4b6f25037cef948
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* ec: increase line coverage of ec (Xavi Hernandez, 2019-04-10; 1 file, -1/+2)

  Test ec-cpu-extensions.t has been modified so that it uses a bigger
  matrix. This makes use of more functions from ec-code-c.c. Changing
  read-policy to round-robin increases the functions used even more,
  reaching 100% line and function coverage for this file.

  Change-Id: I26e4d33269cbd67f5d76d862f4cf1e69285e85e1
  updates: bz#1193929
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

* tests: add a test for trace xlator (Amar Tumballi, 2019-04-10; 1 file, -0/+33)

  This test alone covers most of the code of the trace xlator.

  updates: bz#1693692
  Change-Id: I287c72ee89bd1c02d992b020d5644e8dac0b77ab
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* cluster/dht: refactor dht lookup functions (N Balachandran, 2019-04-05; 1 file, -0/+145)

  Part 1: refactor the dht_lookup_dir_cbk and dht_selfheal_directory
  functions. Added a simple dht selfheal directory test.

  Change-Id: I1410c26359e3c14b396adbe751937a52bd2fcff9
  updates: bz#1590385
  Signed-off-by: N Balachandran <nbalacha@redhat.com>

* cluster/afr: Invalidate inode on change of split-brain-choice (Pranith Kumar K, 2019-04-05; 1 file, -0/+12)

  When the split-brain choice is changed from one brick to another,
  inode-invalidate is not called, so the readv call is served from the
  cache, leading to failures in split-brain-resolution.t. Fixed it by
  calling inode_invalidate() when this happens.

  updates bz#1193929
  Change-Id: I2624614eec38c0303f3e1dc55dfae3d4b864218b
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

* tests/bitrot: enable self-heal daemon before accessing the files (Raghavendra Bhat, 2019-04-04; 1 file, -0/+3)

  For testing the recovery of bad (or corrupted) files in a dispersed
  volume, first enable the self-heal daemon and let heal happen. In the
  bitrot feature, if a file becomes corrupted, the recommended solution
  is to remove that file directly from the backend and then allow heal
  to happen. Hence turn on the self-heal daemon and allow the heal to
  happen after removing the corrupted copy from the backend.

  Change-Id: I7186110398ec1aee7e5727b9d1aac9a01db4d831
  fixes: bz#1695327
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>

* posix-acl: remove default functions, and use library fn instead (Amar Tumballi, 2019-04-03; 1 file, -0/+5)

  This works as a better solution, as we reuse more functions from the
  library. Also, just do a write/read on a file when acl is enabled, so
  we can see an improvement in code coverage.

  updates: bz#1693692
  Change-Id: If3359260c8ec2cf4fcf148fb4b95fdecc922c252
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* tests: add statedump to playground (Amar Tumballi, 2019-04-01; 1 file, -0/+4)

  It helps increase the code coverage of playground.

  updates: bz#1693692
  Change-Id: I81bcf30be1450948a6360d8915f06b973387a560
  Signed-off-by: Amar Tumballi <amarts@redhat.com>

* mgmt/shd: Implement multiplexing in self heal daemon (Mohammed Rafi KC, 2019-04-01; 3 files, -26/+37)

  Problem: The shd daemon is per node, which means it creates a graph
  with all volumes in it. While this is great for utilizing resources,
  it is not so good in terms of performance and manageability, because
  self-heal daemons don't have the capability to automatically
  reconfigure their graphs. So each time any configuration change
  happens to a volume (replicate/disperse), we need to restart shd to
  bring the changes into the graph. Because of this, all ongoing heals
  for all other volumes have to be stopped in the middle and restarted
  all over again.

  Solution: This change makes shd a per-volume daemon, so that a graph
  is generated for each volume. When we want to start/reconfigure shd
  for a volume, we first search for an existing shd running on the node;
  if there is none, we start a new process. If a daemon is already
  running for shd, then we simply detach the graph for the volume and
  reattach the updated graph for the volume. This won't touch any of the
  ongoing operations for any other volumes on the shd daemon.

  Example of an shd graph when it is a per-volume graph:

             -----------------------
             |    debug-iostat     |
             -----------------------
              /        |        \
             /         |         \
       ---------   ---------   ----------
       | AFR-1 |   | AFR-2 |   | AFR-3  |
       ---------   ---------   ----------

  A running shd daemon with 3 volumes will have a graph like:

             -----------------------
             |    debug-iostat     |
             -----------------------
              /         |         \
             /          |          \
     ------------  ------------  ------------
     | volume-1 |  | volume-2 |  | volume-3 |
     ------------  ------------  ------------

  Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99
  fixes: bz#1659708
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>

* protocol/client: Do not fallback to anon-fd if fd is not open (Pranith Kumar K, 2019-03-31; 1 file, -0/+36)

  If an open comes on a file when a brick is down, and after the brick
  comes up a fop comes on that fd, the client xlator would still wind
  the fop on an anon-fd, leading to wrong behavior of the fops in some
  cases.

  Example: If an lk fop is issued on the fd just after the brick is up
  in the scenario above, the lk fop will be sent on an anon-fd instead
  of failing on that client xlator. This lock will never be freed upon
  close of the fd, as flush on an anon-fd is invalid and is not wound
  below the server xlator.

  As a fix, fail the fop unless the fd has the FALLBACK_TO_ANON_FD flag.

  Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc
  fixes bz#1390914
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

* afr: thin-arbiter read txn fixes (Ravishankar N, 2019-03-29; 1 file, -0/+40)

  - Fixes afr_ta_read_txn() to handle inode refresh failures in that
    code path.
  - Fixes a double free issue of dict.

  Note: This patch addresses post-merge review comments for commit
  69532c141be160b3fea03c1579ae4ac13018dcdf.

  fixes: bz#1686398
  Change-Id: Id5299b45b68569d47df6b73755918237a1592cb4
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>

* geo-rep: Fix syncing multiple rename of symlink (Kotresh HR, 2019-03-29; 2 files, -0/+13)

  Problem: Geo-rep fails to sync the rename of a symlink if it's renamed
  multiple times, when the creation and renames happened in quick
  succession.

  Worker crash at slave:

      Traceback (most recent call last):
        File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", in worker
          res = getattr(self.obj, rmeth)(*in_data[2:])
        File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", in entry_ops
          [ESTALE, EINVAL, EBUSY])
        File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", in errno_wrap
          return call(*arg)
        File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", in lsetxattr
          cls.raise_oserr()
        File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", in raise_oserr
          raise OSError(errn, os.strerror(errn))
      OSError: [Errno 12] Cannot allocate memory

  Geo-rep behaviour:
  1. SYMLINK doesn't record the target path in the changelog, so while
     syncing a SYMLINK, a readlink is done on the master to get the
     target path.
  2. Geo-rep will create the destination if the source is not present
     while syncing a RENAME. Hence while syncing a RENAME of a SYMLINK,
     the target path is collected from the destination.

  Cause: If a symlink is created and renamed multiple times, the
  creation of the symlink is ignored, as it's no longer present on the
  master at that path. When the symlink is renamed multiple times on the
  master, while syncing the first RENAME of the SYMLINK, both source and
  destination are not present, hence the target path is not known. In
  this case, while creating the destination directly at the slave,
  regular file attributes were encoded into the blob instead of symlink
  attributes, causing a failure in the gfid-access translator while
  decoding the blob.

  Solution: While syncing a RENAME of a SYMLINK, when the target is not
  known and both source and destination are not present on the master,
  don't create the destination. Ignore the rename. It's ok to ignore:
  if it's unlinked, that's fine; if it's renamed to something else, it
  will be synced then.

  Change-Id: Ibdfa495513b7c05b5370ab0b89c69a6802338d87
  fixes: bz#1693648
  Signed-off-by: Kotresh HR <khiremat@redhat.com>

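The Solution rule, sketched in Python (hypothetical helper and entry layout; the real logic lives in gsyncd's entry_ops):

```python
def sync_symlink_rename(entry, lexists_on_master):
    # entry["link"] is None when the SYMLINK creation was already gone
    # from the master by the time this changelog batch was processed.
    if (entry["link"] is None
            and not lexists_on_master(entry["src"])
            and not lexists_on_master(entry["dst"])):
        return "ignore"   # safe: a later RENAME/UNLINK changelog covers it
    return "apply"
```
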
* geo-rep: fix integer config validation (Aravinda VK, 2019-03-27; 1 file, -0/+3)

  ssh-port validation is specified as `validation=int` in the template
  `gsyncd.conf`, but this was not handled during geo-rep config set.

  Fixes: bz#1692666
  Change-Id: I3f19d9b471b0a3327e4d094dfbefcc58ed2c34f6
  Signed-off-by: Aravinda VK <avishwan@redhat.com>

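A minimal sketch of the missing check (the helper name is illustrative, not the actual gsyncd function):

```python
def validate_int_option(name, value):
    # Options tagged validation=int in gsyncd.conf (e.g. ssh-port) must
    # parse as integers at "config set" time, not fail later at runtime.
    try:
        int(value)
    except ValueError:
        raise SystemExit("Invalid value for %s: %r (expected an integer)"
                         % (name, value))
    return value
```
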
* cluster/ec: Don't enqueue an entry if it is already healing (Ashish Pandey, 2019-03-27; 1 file, -1/+0)

  Problem:
  1 - heal-wait-qlength is 128 by default. If shd is disabled and we
  need to heal files, client-side heal is needed, and accessing these
  files will trigger the heal. However, it has been observed that a file
  can be enqueued multiple times in the heal wait queue, which in turn
  causes the queue to fill up and prevents other files from being
  enqueued.
  2 - While a file is going through healing and a write fop from the
  mount comes in on that file, the write is sent on all the bricks,
  including the healing one. At the end it updates version and size on
  all the bricks. However, it does not unset the dirty flag on all the
  bricks, even if this write fop was successful on all of them. After
  healing completes, this dirty flag remains set and never gets cleaned
  up if SHD is disabled.

  Solution:
  1 - If an entry is already in the queue or going through the heal
  process, don't enqueue the next client-side request to heal the same
  file (sketched below).
  2 - Unset dirty on all the bricks at the end if the fop has succeeded
  on all the bricks, even if some of the bricks are going through heal.

  Change-Id: Ia61ffe230c6502ce6cb934425d55e2f40dd1a727
  updates: bz#1593224
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>

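Solution 1 as a toy Python model (the actual change is in EC's C heal path; class and method names are illustrative):

```python
from collections import deque

class HealWaitQueue:
    def __init__(self, qlength=128):            # heal-wait-qlength default
        self.qlength = qlength
        self.queue = deque()
        self.tracked = set()                    # gfids queued or healing

    def enqueue(self, gfid):
        # Skip entries that are already waiting or being healed, so one
        # hot file can't fill all 128 slots and starve other files.
        if gfid in self.tracked or len(self.queue) >= self.qlength:
            return False
        self.tracked.add(gfid)
        self.queue.append(gfid)
        return True

    def heal_done(self, gfid):
        self.tracked.discard(gfid)              # allow future heal requests
```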