summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/dht: Fix rename journal in changelogKotresh HR2019-04-021-0/+11
| | | | | | | | | | | | | | | | | | | | | With patch [1], renames are journalled only on cached subvolume. The dht sends the special key on the cached subvolume so that the changelog journals the rename. With single distribute sub-volume, the key is not being set. This patch fixes the same. [1] https://review.gluster.org/10410 Backport of: > Patch: https://review.gluster.org/20093 > BUG: 1583018 > Change-Id: Ic2e35b40535916fa506a714f257ba325e22d0961 > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1660225 Change-Id: Ic2e35b40535916fa506a714f257ba325e22d0961 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/dht: sync brick root perms on add brickN Balachandran2019-03-271-18/+8
| | | | | | | | | | | | | | | | | If a single brick is added to the volume and the newly added brick is the first to respond to a dht_revalidate call, its stbuf will not be merged into local->stbuf as the brick does not yet have a layout. The is_permission_different check therefore fails to detect that an attr heal is required as it only considers the stbuf values from existing bricks. To fix this, merge all stbuf values into local->stbuf and use local->prebuf to store the correct directory attributes. Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc fixes: bz#1693057 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Send truncate on arbiter brick from SHDkarthik-us2019-03-211-14/+12
| | | | | | | | | | | | | | | | | | | Problem: In an arbiter volume configuration SHD will not send any writes onto the arbiter brick even if there is data pending marker for the arbiter brick. If we have a arbiter setup on the geo-rep master and there are data pending markers for the files on arbiter brick, SHD will not mark any data changelog during healing. While syncing the data from master to slave, if the arbiter-brick is considered as ACTIVE, then there is a chance that slave will miss out some data. If the arbiter brick is being newly added or replaced there is a chance of slave missing all the data during sync. Fix: If there is data pending marker for the arbiter brick, send truncate on the arbiter brick during heal, so that it will record truncate as the data transaction in changelog. Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec fixes: bz#1687746 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* afr: open_ftruncate_cbk should read fd from local->cont.open structSoumya Koduri2018-12-261-1/+1
| | | | | | | | | | | | afr_open stores the fd as part of its local->cont.open struct but when it calls ftruncate (if open flags contain O_TRUNC), the corresponding cbk function (afr_ open_ftruncate_cbk) is incorrectly referencing uninitialized local->fd. This patch fixes the same. Change-Id: Icbdedbd1b8cfea11d8f41b6e5c4cb4b44d989aba updates: bz#1655527 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* afr: assign gfid during name heal when no 'source' is presentRavishankar N2018-12-034-51/+55
| | | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/21297/ Problem: If parent dir is in split-brain or has dirty xattrs set, and the file has gfid missing on one of the bricks, then name heal won't assign the gfid. Fix: Use the brick we select the gfid from as the 'source'. Note: Problem was found while trying to debug a split-brain issue on Cynthia Zhou's setup. fixes: bz#1655561 Change-Id: Id088d4f0fb017aa35122de426654194e581ed742 Reported-by: Cynthia Zhou <cynthia.zhou@nokia-sbell.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr/lease: Read child nodes from lease structureroot2018-10-311-1/+1
| | | | | | | | | | For lease operation, we allocate and store child nodes data in lease structure. Use the same in afr_lease_cbk() while checking for the quorum. Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6 fixes: bz#1644474 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* afr: prevent winding inodelks twice for arbiter volumesRavishankar N2018-10-101-1/+1
| | | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/21380/ Problem: In an arbiter volume, if there is a pending data heal of a file only on arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks it only once, leaving behind a stale lock on the brick. This causes the next write to the file to hang. Fix: Fix the code-bug to take lock only once. This bug was introduced master with commit eb472d82a083883335bc494b87ea175ac43471ff Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA. fixes: bz#1637953 Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* afr: fix incorrect reporting of directory split-brainRavishankar N2018-10-057-16/+22
| | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/glusterfs/+/21135/ Problem: When a directory has dirty xattrs due to failed post-ops or when replace/reset brick is performed, AFR does a conservative merge as expected, but heal-info reports it as split-brain because there are no clear sources. Fix: Modify pending flag to contain information about pending heals and split-brains. For directories, if spit-brain flag is not set,just show them as needing heal and not being in split-brain. Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383 fixes: bz#1633634 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Make data eager-lock decision based on number of locksPranith Kumar K2018-10-053-10/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For both Virt and block workloads the file is opened multiple times leading to dynamically setting eager-lock to off for the workload. Instead of depending on the number-of-open-fds, if we change the logic to depend on number of inodelks, then it will give better performance than the earlier logic. When there is an eager-lock and number of inodelks is more than 1 we know that there is a conflicting lock, so depend on that information to decide whether to keep the current transaction go through delayed-post-op or not. Locks xlator doesn't have implementation to query number of locks in fxattrop in releases older than 3.10 so to keep things backward compatible in 3.12, data transactions will use new logic where as fxattrop transactions will use old logic. I am planning to send one more patch which makes metadata domain locks also depend on inodelk-count Profile info for a dd of 500MB to a file with another fd opened on the file using exec 250>filename Without this patch: 0.14 67.41 us 16.72 us 3870.82 us 892 FINODELK 0.59 279.87 us 95.71 us 2085.89 us 898 FXATTROP 3.46 366.43 us 81.75 us 6952.79 us 4000 WRITE 95.79 148733.99 us 50568.12 us 919127.86 us 273 FSYNC With this patch: 0.00 51.01 us 38.07 us 80.16 us 4 FINODELK 0.00 235.43 us 235.43 us 235.43 us 1 TRUNCATE 0.00 125.07 us 56.80 us 193.33 us 2 GETXATTR 0.00 135.86 us 62.13 us 209.59 us 2 INODELK 0.00 197.88 us 155.39 us 253.90 us 4 FXATTROP 0.00 450.59 us 394.28 us 506.89 us 2 XATTROP 0.00 56.96 us 19.06 us 406.59 us 23 FLUSH 37.81 273648.93 us 48.43 us 6017657.05 us 44 LOOKUP 62.18 4951.86 us 93.80 us 1143154.75 us 3999 WRITE postgresql benchmark performance changed from ~1130 TPS to ~2300TPS randio fio job inside Ovirt based VM went from ~600IOPs to ~2000IOPS fixes bz#1635980 Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Batch writes in same lock even when multiple fds are openPranith Kumar K2018-10-051-9/+2
| | | | | | | | | | | | | | | | | | | | | | Problem: When eager-lock is disabled because of multiple-fds opened and app writes come on conflicting regions, the number of locks grows very fast leading to all the CPU being spent just in locking and unlocking by traversing huge queues in locks xlator for granting locks. Fix: Reduce the number of locks in transit by bundling the writes in the same lock and disable delayed piggy-pack when we learn that multiple fds are open on the file. This will reduce the size of queues in the locks xlator. This also reduces the number of network calls like inodelk/fxattrop. Please note that this problem can still happen if eager-lock is disabled as the writes will not be bundled in the same lock. fixes bz#1635979 Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Delegate name-heal when possiblePranith Kumar K2018-09-212-27/+85
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: When name-self-heal is triggered on the mount, it blocks lookup until name-self-heal completes. But that can lead to hangs when lot of clients are accessing a directory which needs name heal and all of them trigger heals waiting for other clients to complete heal. Fix: When a name-heal is needed but quorum number of names have the file and pending xattrs exist on the parent, then better to delegate the heal to SHD which will be completed as part of entry-heal of the parent directory. We could also do the same for quorum-number of names not present but we don't have any known use-case where this is a frequent occurrence so not changing that part at the moment. When there is a gfid mismatch or missing gfid it is important to complete the heal so that next rename doesn't assume everything is fine and perform a rename etc fixes bz#1625575 Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/afr: Delegate metadata heal with pending xattrs to SHDPranith Kumar K2018-09-213-38/+47
| | | | | | | | | | | | | | | | | | | Problem: When metadata-self-heal is triggered on the mount, it blocks lookup until metadata-self-heal completes. But that can lead to hangs when lot of clients are accessing a directory which needs metadata heal and all of them trigger heals waiting for other clients to complete heal. Fix: Only when the heal is needed but the pending xattrs are not set, trigger metadata heal that could block lookup. This is the only case where different clients may give different metadata to the clients without heals, which should be avoided. Updates bz#1625575 Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht: Delete MDS internal xattr from dict in dht_getxattr_cbkMohit Agrawal2018-08-161-0/+4
| | | | | | | | | | | | | | | | | | Problem: At the time of fetching xattr to heal xattr by afr it is not able to fetch xattr because posix_getxattr has a check to ignore if xattr name is MDS Solution: To ignore same xattr update a check in dht_getxattr_cbk instead of having a check in posix_getxattr Backport of: > BUG: 1584098 > Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc fixes: bz#1611116 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* afr: switch lk_owner only when pre-op succeedsRavishankar N2018-07-231-10/+10
| | | | | | | | | | | | | | | | | | | | Problem: In a disk full scenario, we take a failure path in afr_transaction_perform_fop() and go to unlock phase. But we change the lk-owner before that, causing unlock to fail. When mount issues another fop that takes locks on that file, it hangs. Fix: Change lk-owner only when we are about to perform the fop phase. Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop. Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise truncate to zero will fail pre-op phase with ENOSPC when the user is actually trying to freee up space. Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353 fixes: bz#1603056 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit ec0d7d77de3e4bd485a4fa2e53c9137e25c71ce7)
* cluster/afr: Prevent execution of code after call_count decrementingPranith Kumar K2018-07-101-7/+8
| | | | | | | | | | | | | | | Problem: When call_count is decremented by one thread, another thread can go ahead with the operation leading to undefined behavior for the thread executing statements after decrementing call count. Fix: Do the operations necessary before decrementing call count. fixes bz#1599629 Change-Id: Icc90cd92ac16e5fbdfe534d9f0a61312943393fe Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> (cherry picked from commit 03f1f5bdc46076178f1afdf8e2a76c5b973fe11f)
* afr: heal gfids when file is not present on all bricksRavishankar N2018-07-095-12/+51
| | | | | | | | | | | | commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression wherein if a file is present in only 1 brick of replica *and* doesn't have a gfid associated with it, it doesn't get healed upon the next lookup from the client. Fix it. Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599 fixes: bz#1597117 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit eb472d82a083883335bc494b87ea175ac43471ff)
* cluster/afr: Make sure lk-owner is assigned at the time of lockPranith Kumar K2018-07-041-2/+1
| | | | | | | | | | | | | | | Problem: In the new eager-lock implementation lk-owner is assigned after the 'local' is added to the eager-lock list, so there exists a possibility of lock being sent even before lk-owner is assigned. Fix: Make sure to assign lk-owner before adding local to eager-lock list fixes bz#1598193 Change-Id: I26d1b7bcf3e8b22531f1dc0b952cae2d92889ef2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> (cherry picked from commit c6f93e422855f656d3a86461a8458f37ad0103eb)
* afr: don't update readables if inode refresh failed on all childrenRavishankar N2018-07-024-32/+50
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: If inode refresh failed on all children of afr due to ENOENT (say file migrated by dht), it resets the readables to zero. Any inflight txn which then later comes on the inode fails with EIO because no readable children present for the inode. Fix: Don't update readables when inode refresh fails on *all* children of afr. In that way any inflight txns will either proceed with its own inode refresh if needed and fail it with the right errno or use the old value of readables and continue with the txn. Also, add quorum checks to the beginning of afr_transaction(). Otherwise, we seem to be winding the lock and checking for quorum only in pre-op pahse. Note: This should ideally fix BZ 1329505 since the stop gap fix for it is has been reverted at https://review.gluster.org/#/c/20028. Change-Id: Ia638c092d8d12dc27afb3cdad133394845061319 updates: bz#1597116 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 0f13eed0c1fa74cefed486538b02e0c8a8708456)
* cluster/dht: Fix rebalance log msgN Balachandran2018-05-311-2/+2
| | | | | | | | | | Corrected the name of the xattr and fixed the code to log an error only if op_errno is not ENODATA or ENOATTR. Change-Id: I42c5b1d838eec586ac7bed2471eb1d27ff09a9ea fixes: bz#1583769 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: fix bug-1363721.t failureRavishankar N2018-05-253-0/+55
| | | | | | | | | | | | | | | | | | | | | | | Problem: In the .t, when the only good brick was brought down, writes on the fd were still succeeding on the bad bricks. The inflight split-brain check was marking the write as failure but since the write succeeded on all the bad bricks, afr_txn_nothing_failed() was set to true and we were unwinding writev with success to DHT and then catching the failure in post-op in the background. Fix: Don't wind the FOP phase if the write_subvol (which is populated with readable subvols obtained in pre-op cbk) does not have at least 1 good brick which was up when the transaction started. Note: This fix is not related to brick muliplexing. I ran the .t 10 times with this fix and brick-mux enabled without any failures. Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85 updates: bz#1581548 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 985a1d15db910e012ddc1dcdc2e333cc28a9968b)
* cluster/ec: Fix pre-op xattrop managementXavi Hernandez2018-05-253-32/+66
| | | | | | | | | | | | | | | | | | | | | | | Multiple pre-op xattrop can be simultaneously being processed. On the cbk it was checked if the fop was waiting for some specific data (like size and version) and, if so, it was assumed that this answer should contain that data. This is not true, since a fop can be waiting for some data, but it may come from the xattrop of another fop. This patch differentiates between needing some information and providing it. This is related to parallel writes. Disabling them fixed the problem, but also prevented concurrent reads. A change has been made so that disabling parallel writes still allows parallel reads. Backport of: > BUG: 1578325 Fixes: bz#1582056 Change-Id: I74772ad6b80b7b37805da93d5ec3ae099e96b041 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/dht: Remove EIO from dht_inode_missingN Balachandran2018-05-222-4/+2
| | | | | | | | | | | Removed EIO from the list of errnos that triggered a migrate check task. (cherry picked from commit c925962b91c67c8cd2391df7dd0251e0cbf66648) Change-Id: I7f89c7a16056421588f1af2377cebe6affddcb47 fixes: bz#1579674 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Debug logs in dht_readdir(p)_cbkN Balachandran2018-05-221-2/+27
| | | | | | | | | | Additional log messages to help debug issues with file listings. (cherry picked from commit d3e3b11d38b927cf849d2d7a20460650963fd438) Change-Id: Iccd07498ba01d597c0c40f026f4177dd06d7e901 fixes: bz#1579736 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dht: Excessive 'dict is null' logs in dht_discover_completeMohit Agrawal2018-05-221-1/+2
| | | | | | | | | | | | | Problem: In Geo-Rep setup excessive "dict is null" logs in dht_discover_complete while xattr is NULL Solution: To avoid the logs update a condition in dht_discover_complete BUG: 1580215 Change-Id: Ic7aad712d9b6d69b85b76e4fdf2881adb0512237 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* dht: Avoid dict log flooding for internal MDS xattrMohit Agrawal2018-05-221-1/+1
| | | | | | | | | | | | | | Problem: Before populate MDS internal xattr first dht checks if MDS is present in xattr or not.If xattr dictionary is NULL dict_get log the message either dict or key is NULL Solution: Before call dict_get check xattr, if it is NULL then no need to call dict_get. BUG: 1579757 Change-Id: I81604ec5945b85eba14b42f4583d06ec713028f4 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* Revert "gfapi: return pre/post attributes from glfs_pread/pwrite"ShyamsundarR2018-05-082-6/+5
| | | | | | | | | | | | | | | | | | This reverts commit d01f7244e9d9f7e3ef84e0ba7b48ef1b1b09d809. This is being reverted as the API signatures should adapt to a statx like structure, and also all APIs that need to return pre/post attrs are not complete. As a result, instead of fixing up part of the APIs and then refixing the same in a later release, removing these set of fixes from the branch Additionally fixed up posix-entry-ops.c which was using the new syncop signature Updates: bz#1575386 Change-Id: I35222dadc4a2e97010bc1e6b97b6f83583c311f6
* Revert "gfapi: return pre/post attributes from glfs_fsync/fdatasync"ShyamsundarR2018-05-081-1/+1
| | | | | | | | | | | | | | | This reverts commit 09943beb499617212f2985ca8ea9ecd1ed1b470e. This is being reverted as the API signatures should adapt to a statx like structure, and also all APIs that need to return pre/post attrs are not complete. As a result, instead of fixing up part of the APIs and then refixing the same in a later release, removing these set of fixes from the branch. Updates: bz#1575386 Change-Id: I3e0803c114dc6b9126d8a90f43812bca501e6338
* Revert "gfapi: return pre/post attributes from glfs_ftruncate"ShyamsundarR2018-05-081-7/+4
| | | | | | | | | | | | | | | | | | This reverts commit 248152767b0599986bbb6bb35fc27197f6be6964. This is being reverted as the API signatures should adapt to a statx like structure, and also all APIs that need to return pre/post attrs are not complete. As a result, instead of fixing up part of the APIs and then refixing the same in a later release, removing these set of fixes from the branch. Additionally fixed up cloudsync.c code that was using the new syncop signature. Updates: bz#1575386 Change-Id: Idb59d20666c0d7b0c83e7fdc31dd68b8c7db9550
* afr: Add lease() fopPoornima G2018-05-053-0/+157
| | | | | | | | Change-Id: Ied047dd5ee44e9d5a5d3db214826f7df30332ef9 updates: #350 BUG: 1319992 Signed-off-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* cluster/dht: unwind if dht_selfheal_dir_mkdir returns an errorRaghavendra G2018-05-031-1/+5
| | | | | | | | | | If dht_selfheal_dir_mkdir returns an error, cbk passed to dht_selfheal_directory is not invoked. So, Current codepath leaves an unwound frame resulting in a hung fop forever. Change-Id: I422308b8a34a074301ca46b029ffe676f5e0f66c fixes: bz#1574305 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE errorSusant Palai2018-04-301-1/+8
| | | | | | | | | | | Problem: A directory deletion can happen just before gf_defrag_settle_hash which internally does a setxattr operation on a directory. Solution: Ignore ENOENT and ESTALE errors Fixes: bz#1572581 Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/afr: shd changes for thin arbiterkarthik-us2018-04-301-0/+184
| | | | | | | Updates #352 Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d Signed-off-by: karthik-us <ksubrahm@redhat.com>
* afr: initial changes for thin arbiterRavishankar N2018-04-306-8/+229
| | | | | | | | | 1. Create thin arbiter index file during mount. 2. Set pending marker in thin arbiter id file in case of failure. Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82 updates: #352 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* libglusterfs: Capture the dict response in syncop_xattrop_cbkkarthik-us2018-04-271-1/+1
| | | | | | | | | | | | | | | Problem: Currently it is not possible to capture the xattrs values which are set on the bricks by calling syncop_(f)xattrop, because the response dict is not being assigned to any of the dictionaries. Fix: In the xattrop callback capture the response dict and send it back to the caller if it is requested. Change-Id: I9de9bcd97d6008091c9b060bcca3676cb9ae8ef9 fixes: bz#1572076 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/afr: Keep child-up until ping-eventPranith Kumar K2018-04-253-25/+40
| | | | | | | | | | | | | | | | | | | | | Problem: If we have 2 bricks, brick-A and brick-B with brick-A within halo-max-latency and brick-B more than halo-max-latency. If we set both halo-min, halo-max replicas as '1'. In this case, brick-A comes online and then ping-latency will be updated for it. When brick-B comes online, we have 2 up-bricks, so the code tries to find the brick with worst latency to mark it down. Since Brick-B just came online it always had '0' latency so brick-B used to be marked offline and Brick-B would eventually be the one to be online even when brick-A is more suited. Fix: Consider latency of just-up child as HALO_MAX_LATENCY so that worst-child until ping-latency is found as the just-up brick. Also keep ping-latency as -1 until child-up during initialization. BUG: 1567881 fixes bz#1567881 Change-Id: I148262fe505468190f0eb99225d0f6d57cdb6f04 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: Fix dht_rename lock orderN Balachandran2018-04-231-18/+47
| | | | | | | | | | Fixed dht_order_rename_lock to use the same inodelk ordering as that of the dht selfheal locks (dictionary order of lock subvolumes). Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d fixes: bz#1568348 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Need heal-timeout to be configured as low as 5 secondsPranith Kumar K2018-04-201-1/+1
| | | | | | | | | | | In Halo replication, there are pending heals more often than not. It makes sense to give users the capability to configure it as low as 5 seconds. BUG: 1569489 fixes bz#1569489 Change-Id: I451c1975827f66398b903f659c981ef3121d5376 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* fuse: do fd_resolve in fuse_getattr if fd is receivedSusant Palai2018-04-181-2/+2
| | | | | | | | | | | | | | | | | | | | problem: With the current code, post graph switch the old fd is received for fuse_getattr and since it is associated with old inode, it does not have the inode ctx across xlators in new graph. Hence, dht errored out saying "no layout" for fstat call. Hence the EINVAL. Solution: if fd is passed, init and resolve fd to carry on getattr test case: - Created a single brick distributed volume - Started untar - Added a new-brick Without this fix, untar used to abort with ERROR. Change-Id: I5805c463fb9a04ba5c24829b768127097ff8b9f9 fixes: bz#1566207 Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/afr: Make sure latency-arg is passed to afrPranith Kumar K2018-04-181-0/+2
| | | | | | | | | | | xlator_notify doesn't pass the extra arguments that come in the input function, so XLATOR_NOTIFY macro should be used instead to pass the extra arguments to the function. BUG: 1567881 fixes bz#1567881 Change-Id: Ic15b6c446638cbacf3149693147a754219037c47 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr: fixes to afr-eager lockingRavishankar N2018-04-181-0/+2
| | | | | | | | | | | | | 1. If pre-op fails on all bricks,set lock->release to true in afr_handle_lock_acquire_failure so that the GF_ASSERT in afr_unlock() does not crash. 2. Added a missing 'return' after handling pre-op failure in afr_transaction_perform_fop(), fixing a use-after-free issue. Change-Id: If0627a9124cb5d6405037cab3f17f8325eed2d83 fixes: bz#1561129 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: Handle file migrations when brick downN Balachandran2018-04-131-5/+51
| | | | | | | | | | | | | | | The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated. Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file. Change-Id: I72554c107792c7d534e0f25640654b6f8417d373 fixes: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Wind open to all subvolsN Balachandran2018-04-111-10/+5
| | | | | | | | | | dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols. Change-Id: I67a96b06dad14a08967c3721301e88555aa01017 updates: bz#1564198 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: act as passthrough for renames on single child DHTRaghavendra G2018-04-101-7/+15
| | | | | | | | | | Various synchronization present in dht_rename while handling directories and files is necessary only if we have more than only one child. Change-Id: Ie21ad419125504ca2f391b1ae2e5c1d166fee247 fixes: bz#1563511 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Turn ON the stripe-cache option by defaultAshish Pandey2018-04-061-1/+1
| | | | | | Change-Id: I0a290396c30c635b13ee73004d20259efb76a954 fixes: bz#1563945 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* afr: add quorum checks in pre-opRavishankar N2018-04-051-33/+31
| | | | | | | | | | | | | | | | | Problem: We seem to be winding the FOP if pre-op did not succeed on quorum bricks and then failing the FOP with EROFS since the fop did not meet quorum. This essentially masks the actual error due to which pre-op failed. (See BZ). Fix: Skip FOP phase if pre-op quorum is not met and go to post-op. Fixes: 1561129 Change-Id: Ie58a41e8fa1ad79aa06093706e96db8eef61b6d9 fixes: bz#1561129 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: enable lookup-optimize by defaultN Balachandran2018-04-042-2/+4
| | | | | | | | | | | | | | Lookup-optimize has been shown to improve create performance. The code has been in the project for several years and is considered stable. Enabling this by default in order to test this in the upstream regression runs. Change-Id: Iab792979ee34f0af4713931e0b5b399c23f65313 updates: bz#1557435 BUG: 1557435 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Prevent ping-event handling on shdPranith Kumar K2018-04-031-0/+2
| | | | | | | | | On shd, we shouldn't treat any brick down based on latency, otherwise self-heal will never happen fixes: bz#1562717 Change-Id: Ica07fcc4fae91a6bfd9c9a670e2be464704d94b7 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: Update dht option levelsN Balachandran2018-04-021-2/+16
| | | | | | | | | Set the levels for DHT options based on https://review.gluster.org/#/c/19466/ Change-Id: I51b31a706a0b9517404e83224c89de145fd5d7e1 updates: #430 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Update layout in inode only on successN Balachandran2018-04-022-4/+24
| | | | | | | | | | | | | | | | | | | | | With lookup-optimize enabled, gf_defrag_settle_hash in rebalance sometimes flips the on-disk layout on volume root post the migration of all files in the directory. This is sometimes seen when attempting to fix the layout of a directory multiple times before calling gf_defrag_settle_hash. dht_fix_layout_of_directory generates a new layout in memory but updates it in the inode ctx before it is set on disk. The layout may be different the second time around due to dht_selfheal_layout_maximize_overlap. If the layout is then not written to the disk, the inode now contains the wrong layout. gf_defrag_settle_hash does not check the correctness of the layout in the inode before updating the commit-hash and writing it to the disk thus changing the layout of the directory. Change-Id: Ie1407d92982518f2a0c40ec70ad370b34a87b4d4 updates: bz#1557435 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: add new value for read-hash-mode volume optionRavishankar N2018-03-296-32/+119
| | | | | | | | | | Updates: #363 This new value (3) will try to wind read requests to the child of AFR having the least amount of pending requests in its queue. Change-Id: If6bda2aac9bf7aec3fc39622f78659313c4b6508 Signed-off-by: Ravishankar N <ravishankar@redhat.com>