glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	cluster/ec: Don't read from bad subvols	Pranith Kumar K	2015-07-06	1	-18/+23
\| \| \| \| \| \| \| \| \| \|	Change-Id: Ic22813371faca4e8198c9b0b20518e68d275f3c1 BUG: 1232678 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11531 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: Remove failed subvols from source/sink computation	Pranith Kumar K	2015-07-06	1	-1/+6
\| \| \| \| \| \| \| \| \| \|	Change-Id: Ib0de34c86ee25de361ec821d4015296c514742e0 BUG: 1240210 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11546 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: Make background healing optional behavior	Pranith Kumar K	2015-07-06	3	-11/+55
\| \| \| \| \| \| \| \| \| \| \|	Provide options to control number of active background heal count and qlen. Change-Id: Idc2419219d881f47e7d2e9bbc1dcdd999b372033 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11473 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: Add throttling in background healing	Pranith Kumar K	2015-07-01	6	-5/+114
	- 8 parallel heals can happen.
	- 128 heals will wait for their turn.
	- Heals will be rejected if 128 heals are already waiting.
	Change-Id: I2e99bf064db7bce71838ed9901a59ffd565ac390 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11471 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
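	The admission rule in this commit message can be modelled in a few lines. This is a minimal Python sketch, not the actual C code of the patch: the class and method names are hypothetical, and only the limits 8 and 128 come from the message.

```python
from collections import deque

MAX_ACTIVE = 8     # heals allowed to run in parallel (from the commit message)
MAX_WAITING = 128  # heals allowed to wait for their turn (from the commit message)


class HealThrottle:
    """Toy admission control for background heals (hypothetical names)."""

    def __init__(self, max_active=MAX_ACTIVE, max_waiting=MAX_WAITING):
        self.max_active = max_active
        self.max_waiting = max_waiting
        self.active = set()
        self.waiting = deque()

    def submit(self, heal_id):
        """Return 'started', 'queued' or 'rejected' for a new heal request."""
        if len(self.active) < self.max_active:
            self.active.add(heal_id)
            return "started"
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(heal_id)
            return "queued"
        return "rejected"

    def finish(self, heal_id):
        """Mark a heal as done and start the oldest waiting heal, if any."""
        self.active.remove(heal_id)
        if self.waiting:
            nxt = self.waiting.popleft()
            self.active.add(nxt)
            return nxt
        return None
```

	With the default limits, the first 8 submissions start, the next 128 are queued, and further submissions are rejected until a running heal finishes.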
*	cluster/ec: Remove dead code	Pranith Kumar K	2015-07-01	1	-1376/+2
\| \| \| \| \| \| \| \| \| \|	Change-Id: I99d7a038f29cebe823e17a8dda40d335441185bc BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11472 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
*	quota: Fix statfs values in EC when quota_deem_statfs is enabled	vmallika	2015-06-27	1	-4/+12
	When quota_deem_statfs is enabled, quota sends aggregated statfs values. In EC we should not multiply statfs values by the fragment number. Change-Id: I7ef8ea1598d84b86ba5c5941a2bbe0a6ab43c101 BUG: 1233162 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11315 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
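	The arithmetic behind this fix can be sketched as follows. This Python fragment is only an illustration: the function name is hypothetical, and it assumes that without quota_deem_statfs each brick reports a per-fragment value that EC scales up by the fragment count.

```python
def ec_statfs_blocks(brick_blocks, fragments, quota_deem_statfs):
    """Return the block count EC would report upwards (hypothetical helper)."""
    if quota_deem_statfs:
        # Quota already aggregated the value, so scaling would inflate it.
        return brick_blocks
    # Assumption: the brick value covers one fragment's share of the data.
    return brick_blocks * fragments
```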
*	ec: Porting messages to new logging framework	Nandaja Varma	2015-06-26	15	-463/+1323
\| \| \| \| \| \| \| \| \|	Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c BUG: 1194640 Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Reviewed-on: http://review.gluster.org/10465 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	EC : While Healing a file, set the config xattr	Ashish Pandey	2015-06-26	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem : trusted.ec.config attr was missing for the healed file Solution: Writing trusted.ec.config while healing a file. Change-Id: I340dd45ff8ab5bc1cd6e9b0cd2b2ded236e5acf0 BUG: 1235246 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11407 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
*	cluster/ec: wind fops on good subvols for access/readdir[p]	Pranith Kumar K	2015-06-26	7	-137/+203
\| \| \| \| \| \| \| \| \|	Change-Id: I1e629a6adc803c4b7164a5a7a81ee5cb1d0e139c BUG: 1232172 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11246 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Do not handle GF_CONTENT_KEY	Pranith Kumar K	2015-06-23	1	-85/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GF_CONTENT_KEY aggregation requires that the fragments on the bricks belong to same data i.e. no operations are modifying the content while lookup is performed on it. The only way to know it is to get at least ec->fragments+1 number of responses and see that two different sets of ec->fragments number of fragments give same data. But at the moment we feel that this slows down ec-lookup. So removing handling of this for now. Change-Id: I2da5087f1311d5cdde999062607b143b48c17713 BUG: 1226279 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11003 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
*	ec: Display correct message after successful heal start	Ashish Pandey	2015-06-22	1	-1/+1
	Problem: While launching heal, it shows that the heal launch was unsuccessful. However, internally it was successfully launched. Solution: Don't reset op_ret to -1 in the for loop for every brick. Change-Id: Iff89fdaf6082767ed67523a56430a9e83e6984d3 BUG: 1203089 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11267 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
*	cluster/ec: Avoid parallel executions of the same state machine	Xavier Hernandez	2015-06-18	1	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In very rare circumstances it was possible that a subfop started by another fop could finish fast enough to cause that two or more instances of the same state machine be executing at the same time. Change-Id: I319924a18bd3f88115e751a66f8f4560435e0e0e BUG: 1233258 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11317 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Prevent Null dereference in dht-rename	Pranith Kumar K	2015-06-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Change-Id: I3059f3b577f550c92fb77c6b6b44defd0584cd2e BUG: 1230647 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11178 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/ec: Wind unlock fops at all cost	Pranith Kumar K	2015-06-10	2	-8/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: While files are being created if more than redundancy number of bricks go down, then unlock for these fops do not go to the bricks. This will lead to stale locks leading to hangs. Fix: Wind unlock fops at all costs. Change-Id: I50a87e8b4d6d2dde5bf7405b82e3aeecd95ad00e BUG: 1220348 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Prevent double unwind	Pranith Kumar K	2015-06-08	4	-13/+12
	Problem:
	1) ec_access/ec_readlink_/ec_readdir[p] _cbks are trying to recover only from ENOTCONN.
	2) When the fop succeeds it unwinds right away. But when its ec_fop_manager resumes, if the number of bricks that are up is less than ec->fragments, the state machine will resume with -EC_STATE_REPORT, which unwinds again. This will lead to crashes.
	Fix:
	- If the fop fails, retry on other subvols, as ESTALE/ENOENT/EBADFD etc. are also recoverable.
	- Unwind success/failure in _cbks.
	Change-Id: I2cac3c2f9669a4e6160f1ff4abc39f0299303222 BUG: 1228952 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11111 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	Changing log level to DEBUG in case of ENOENT	Ashish Pandey	2015-06-08	1	-2/+3
\| \| \| \| \| \| \| \| \|	Change-Id: I264e47ca679d8b57cd8c80306c07514e826f92d8 BUG: 1193388 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10784 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Don't handle EC_XATTR_DIRTY in response	Pranith Kumar K	2015-06-06	2	-27/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: ec_update_size_version expects all the keys it did xattrop with to come in response so that it can set the values again in ec_update_size_version_done. But EC_XATTR_DIRTY is not combined so the value won't be present in the response. So ctx->post/pre_dirty are not updated in ec_update_size_version_done. So these values are still non-zero. When ec_unlock_now is called as part of flush's unlock phase it again tries to perform same xattrop for EC_XATTR_DIRTY. But ec_update_size_version is not expected to be called in unlock phase of flush because ec_flush_size_version should have reset everything to zero and unlock is never invoked from ec_update_size_version_done for flush/fsync/fsyncdir. This leads to stale lock which leads to hang. Fix: EC_XATTR_DIRTY is removed in ex_xattrop_cbk and is never combined with other answers. So remove handling of this in the response. Change-Id: If0ea3efec3235a6e312465d8838585fbe752c7ea BUG: 1227654 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11078 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/ec: Fix incorrect check for iatt differences	Xavier Hernandez	2015-06-02	1	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A previous patch (http://review.gluster.org/10974) introduced a bug that caused that some metadata differences could not be detected in some circumstances. This could cause that self-heal is not triggered and the file not repaired. We also need to consider all differences for lookup requests, even if there isn't any lock. Special handling of differences in lookup is already done in lookup specific code. Change-Id: I3766b0f412b3201ae8a04664349578713572edc6 BUG: 1225793 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11018 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Ignore differences in non locked inodes	Xavier Hernandez	2015-05-30	5	-28/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ec combines iatt structures from multiple bricks, it checks for equality in important fields. This is ok for iatt related to inodes involved in the operation that have been locked before starting execution. However some fops return iatt information from other inodes. For example a rename locks source and destination parent directories, but it also returns an iatt from the entry itself. In these cases we ignore differences in some fields to avoid false detection of inconsistencies and trigger unnecessary self-heals. Another issue is solved in this patch that caused that the real size of the file stored into the inode context was lost during self-heal. Change-Id: I8b8eca30b2a6c39c7b9bbd3b3b6ba95228fcc041 BUG: 1225793 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/10974 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System
*	cluster/dht: Fix dht_setxattr to follow files under migration	Nithya Balachandran	2015-05-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a file is under migration, any xattrs created on it are lost post migration of the file. This is because the xattrs are set only on the cached subvol of the source and as the source is under migration, it becomes a linkto file post migration. Change-Id: Ib8e233b519cf954e7723c6e26b38fa8f9b8c85c0 BUG: 1193636 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10212 Tested-by: NetBSD Build System Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
*	cluster/ec: Forced unlock when lock contention is detected	Xavier Hernandez	2015-05-27	13	-999/+1139
	EC uses an eager lock mechanism to optimize multiple read/write requests on the same entry or inode. This increases performance but can have adverse results when other clients try to access the same entry/inode. To solve this, this patch adds functionality to detect when this happens and force an earlier release to not block other clients.
	The method consists of requesting GF_GLUSTERFS_INODELK_COUNT and GF_GLUSTERFS_ENTRYLK_COUNT for all fops that take a lock. When this count is greater than one, the lock is marked to be released. All fops already waiting for this lock will be executed normally before releasing the lock, but new requests that also require it will be blocked and restarted after the lock has been released and reacquired again.
	Another problem was that some operations did correctly lock the parent of an entry when needed, but got the size and version xattrs from the entry instead of the parent. This patch solves this problem by binding all queries of size and version to each lock and replacing all entrylk calls by inodelk ones to remove concurrent updates on directory metadata. This also allows rename to correctly update source and destination directories.
	Change-Id: I2df0b22bc6f407d49f3cbf0733b0720015bacfbd BUG: 1165041 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/10852 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
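	The contention rule described in this commit message (mark the lock for release when the lock count exceeds one, let already queued fops run, hold back new ones until the lock is reacquired) can be pictured with a toy model. This is a Python sketch under stated assumptions, not the patch itself: the class, its fields and its methods are hypothetical, and `count` merely stands in for the value returned for GF_GLUSTERFS_INODELK_COUNT or GF_GLUSTERFS_ENTRYLK_COUNT.

```python
class EagerLock:
    """Toy model of an eager lock that yields under contention."""

    def __init__(self):
        self.release = False  # set once another client is seen on the lock
        self.queued = []      # fops accepted before contention was noticed
        self.blocked = []     # fops that arrived after the lock was marked

    def on_lock_count(self, count):
        # A count greater than one means someone else wants the lock.
        if count > 1:
            self.release = True

    def submit(self, fop):
        if self.release:
            self.blocked.append(fop)
        else:
            self.queued.append(fop)

    def release_and_reacquire(self):
        """Return the fops that run before release; requeue the blocked ones."""
        ran = self.queued
        self.queued = self.blocked
        self.blocked = []
        self.release = False
        return ran
```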
*	cluster/ec: Fix use after free crash	Pranith Kumar K	2015-05-21	4	-23/+47
	ec_heal creates ec_fop_data but doesn't run ec_manager. ec_fop_data_allocate adds this fop to ec->pending_fops; because ec_manager is not run on this heal fop, it is never removed from ec->pending_fops. When it is accessed after free it leads to a crash. It is better not to add HEAL fops to ec->pending_fops because we don't want graph switch to hang the mount because of a BIG file/directory heal. BUG: 1188145 Change-Id: I8abdc92f06e0563192300ca4abca3909efcca9c3 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10868 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
*	cluster/ec: Correctly cleanup delayed locks	Xavier Hernandez	2015-05-20	7	-65/+163
	When a delayed lock is pending, a graph switch doesn't correctly terminate it. This means that the update of version and size xattrs is lost, causing EIO errors. This patch handles the GF_EVENT_PARENT_DOWN event to correctly finish pending updates before completing the graph switch. Change-Id: I394f3b8d41df8d83cdd36636aeb62330f30a66d5 BUG: 1188145 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/10787 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Handle lookup failures while op in progress	Pranith Kumar K	2015-05-19	2	-12/+23
\| \| \| \| \| \| \| \| \| \|	Change-Id: Ia1834ec23d5de615526d4d4e4d2e32aff155b7f7 BUG: 1211962 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10806 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Prevent unnecessary self-heals	Pranith Kumar K	2015-05-15	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a blocking lock is requested, lock request is succeeded even when ec->fragment number of locks are acquired successfully in non-blocking locking phase. This will lead to fop succeeding only on the bricks where the locks are acquired, leading to the necessity of self-heals. To prevent these un-necessary self-heals, if the remaining locks fail with EAGAIN in non-blocking lock phase try blocking locking phase instead. Change-Id: I940969e39acc620ccde2a876546cea77f7e130b6 BUG: 1221145 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10770 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
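	The decision described in this commit message, taken after the non-blocking phase of a lock that was requested as blocking, can be sketched like this. It is a Python illustration only: the function name and return strings are hypothetical, and the "fail" branch is an assumption that the message does not spell out.

```python
import errno


def after_nonblocking_phase(results, fragments):
    """results: per-brick outcome, 0 for a granted lock or an errno value."""
    # If any brick answered EAGAIN, do not settle for a partial set of locks:
    # fall back to the blocking phase so the fop can reach every brick.
    if any(r == errno.EAGAIN for r in results):
        return "retry-blocking"
    granted = sum(1 for r in results if r == 0)
    # Assumption: other errors (for example ENOTCONN) are tolerated as long
    # as at least `fragments` locks were granted.
    return "proceed" if granted >= fragments else "fail"
```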
*	ec: Fix failures with missing files	Xavier Hernandez	2015-05-09	5	-283/+164
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a file does not exist on a brick but it does on others, there could be problems trying to access it because there was some loc_t structures with null 'pargfid' but 'name' was set. This forced inode resolution based on <pargfid>/name instead of <gfid> which would be the correct one. To solve this problem, 'name' is always set to NULL when 'pargfid' is not present. Another problem was caused by an incorrect management of errors while doing incremental locking. The only allowed error during an incremental locking was ENOTCONN, but missing files on a brick can be returned as ESTALE. This caused an EIO on the operation. This patch doesn't care of errors during an incremental locking. At the end of the operation it will check if there are enough successfully locked bricks to continue or not. Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a BUG: 1176062 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9407 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
*	cluster/ec: Change meaning of trusted.ec.dirty	Pranith Kumar K	2015-05-08	7	-208/+555
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- With this change, the xattr will represent if the file needs to be healed or not. It will have different values for data/entry and metadata changes. - inode ref leaks and dict_set_dynstr related leaks fixed - Added support for trylock/lock based on heal-cmd execution or not in data heal. - Made fixes to pass regression runs Change-Id: I9d8def4c2badde18a76b7898816fecfac113737a BUG: 1215265 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10385 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/ec: data heal implementation for ec	Pranith Kumar K	2015-05-08	3	-8/+788
	Data self-heal:
	1) Take inode lock in domain 'this->name:self-heal' on 0-0 range (full file), So that no other processes try to do self-heal at the same time.
	2) Take inode lock in domain 'this->name' on 0-0 range (full file),
	3) perform fxattrop+fstat and get the xattrs on all the bricks
	3) Choose the brick with ec->fragment number of same version as source
	4) Truncate sinks
	5) Unlock lock taken in 2)
	5) For each block take full file lock, Read from sources write to the sinks, Unlock
	6) Take full file lock and see if the file is still sane copy i.e. File didn't become unusable while the bricks are offline. Update mtime to before healing
	7) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr
	8) unlock lock acquired in 6)
	9) unlock lock acquired in 1)
	Change-Id: I6f4d42cd5423c767262c9d7bb5ca7767adb3e5fd BUG: 1215265 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10384 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
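	The source selection step of the data self-heal above ("Choose the brick with ec->fragment number of same version as source") can be illustrated with a short Python sketch. It is not the patch's code: the function name and the brick-to-version mapping are hypothetical, and picking the highest qualifying version as a tie-break is an assumption.

```python
from collections import Counter


def pick_sources(versions, fragments):
    """versions maps brick name -> version; returns (sources, sinks) or None."""
    counts = Counter(versions.values())
    # A version qualifies if at least `fragments` bricks agree on it.
    candidates = [v for v, n in counts.items() if n >= fragments]
    if not candidates:
        return None  # not enough consistent bricks to rebuild the data
    chosen = max(candidates)  # assumption: prefer the highest such version
    sources = sorted(b for b, v in versions.items() if v == chosen)
    sinks = sorted(b for b, v in versions.items() if v != chosen)
    return sources, sinks
```

	For a 2+1 volume where bricks A and B are at version 5 and C is at version 3, this yields A and B as sources and C as the sink to be truncated and rewritten.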
*	cluster/ec: metadata/name/entry heal implementation for ec	Pranith Kumar K	2015-05-08	2	-0/+1058
	Metadata self-heal:
	1) Take inode lock in domain 'this->name' on 0-0 range (full file)
	2) perform lookup and get the xattrs on all the bricks
	3) Choose the brick with highest version as source
	4) Setattr uid/gid/permissions
	5) removexattr stale xattrs
	6) Setxattr existing/new xattrs
	7) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr
	8) unlock lock acquired in 1)

	Entry self-heal:
	1) take directory lock in domain 'this->name:self-heal' on 'NULL' to prevent more than one self-heal
	2) we take directory lock in domain 'this->name' on 'NULL'
	3) Perform lookup on version, dirty and remember the values
	4) unlock lock acquired in 2)
	5) readdir on all the bricks and trigger name heals
	6) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr
	7) unlock lock acquired in 1)

	Name heal:
	1) Take 'name' lock in 'this->name' on 'NULL'
	2) Perform lookup on 'name' and get stat and xattr structures
	3) Build gfid_db where for each gfid we know what subvolumes/bricks have a file with 'name'
	4) Delete all the stale files i.e. the file does not exist on more than ec->redundancy number of bricks
	5) On all the subvolumes/bricks with missing entry create 'name' with same type,gfid,permissions etc.
	6) Unlock lock acquired in 1)

	Known limitation: At the moment with present design, it conservatively preserves the 'name' in case it can not decide whether to delete it. This can happen in the following scenario:
	1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A)
	2) rename d1/f1 -> d2/f2 is performed but the rename is successful only on one of the bricks (Lets say B)
	3) Now name self-heal on d1 and d2 would re-create the file on both d1 and d2 resulting in d1/f1 and d2/f2.

	Because we wanted to prevent data loss in the case above, the following scenario is not healable, i.e. it needs manual intervention:
	1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A)
	2) We have two hard links: d1/a, d2/b and another file d3/c even before the brick went down
	3) rename d3/c -> d2/b is performed
	4) Now name self-heal on d2/b doesn't heal because d2/b with older gfid will not be deleted.

	One could think why not delete the link if there is more than 1 hardlink, but that leads to similar data loss issue I described earlier:
	Scenario:
	1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A)
	2) We have two hard links: d1/a, d2/b
	3) rename d1/a -> d3/c, d2/b -> d4/d is performed and both the operations are successful only on one of the bricks (Lets say B)
	4) Now name self-heal on the 'names' above which can happen in parallel can decide to delete the file thinking it has 2 links but after all the self-heals do unlinks we are left with data loss.

	Change-Id: I3a68218a47bb726bd684604efea63cf11cfd11be BUG: 1213358 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10298 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
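	Step 3 of the name heal above (building gfid_db) and the input to step 5 (bricks with a missing entry) can be sketched in Python. This is a hedged illustration: the function name and the shape of the lookup results are hypothetical, and the stale-file decision of step 4 is deliberately left out because the commit message describes it as conservative and its exact rule is not reproduced here.

```python
def build_gfid_db(lookups):
    """lookups maps brick -> gfid found for 'name', or None if it is absent.

    Returns (gfid_db, missing): gfid_db maps each gfid to the bricks that
    hold a file called 'name' with that gfid; missing lists the bricks on
    which 'name' does not exist at all.
    """
    gfid_db = {}
    missing = []
    for brick, gfid in lookups.items():
        if gfid is None:
            missing.append(brick)
        else:
            gfid_db.setdefault(gfid, []).append(brick)
    return gfid_db, missing
```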
*	cluster/ec: add separate versions for data/entry, metadata	Ashish Pandey	2015-05-06	8	-29/+104
	Add 64 bits to the "version" key of the extended attributes. The first 64 bits (left) represent the data version. The last 64 bits (right) represent the metadata version. Note: 3.6 and 3.7 versions of ec can't co-exist with this change, because xattrop in 3.6 will fail with ERANGE: the buffer passed to it is 8 bytes, whereas the value is 16 bytes in 3.7. 3.7 version clients, however, can work with old version files. For upgrades we need to tell users to complete heals and then upgrade. BUG: 1215265 Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10312 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
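	The 16-byte layout described in this commit message can be sketched with Python's struct module. The helper names are hypothetical, and big-endian (network) byte order inside each 64-bit half is an assumption; the message only states that the data version comes first and the metadata version second.

```python
import struct

# Assumption: two unsigned 64-bit integers in network byte order.
VERSION_FMT = ">QQ"


def pack_version(data_version, metadata_version):
    """Encode both versions into one 16-byte xattr value."""
    return struct.pack(VERSION_FMT, data_version, metadata_version)


def unpack_version(raw):
    """Decode a 16-byte value; any other size is refused."""
    if len(raw) != struct.calcsize(VERSION_FMT):
        raise ValueError("expected a 16-byte version value")
    return struct.unpack(VERSION_FMT, raw)
```

	The size check mirrors the incompatibility noted in the message: a reader that only provides an 8-byte buffer cannot hold the new 16-byte value.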
*	cluster/ec: Fix dictionary compare function	Pranith Kumar K	2015-05-04	3	-83/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If both dicts are NULL then equal. If one of the dicts is NULL but the other has only ignorable keys then also they are equal. If both dicts are non-null then check if for each non-ignorable key, values are same or not. value_ignore function is used to skip comparing values for the keys which must be present in both the dictionaries but the value could be different. geo-rep's stime xattr doesn't need to be present in list xattr but when getxattr comes on stime xattr even if there aren't enough responses with the xattr we should still give out an answer which is maximum of the stimes available. Change-Id: I8de2ceaa2db785b797f302f585d88e73b154167d BUG: 1207712 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10078 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
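	The comparison rules listed in this commit message map onto a short function. This is a Python sketch with hypothetical names: `ignorable` stands for keys that need not be present at all, and `value_ignore` for keys that must be present on both sides but whose values may differ. The stime handling for geo-rep is not modelled.

```python
def dicts_equal(a, b, ignorable=frozenset(), value_ignore=frozenset()):
    """Compare two xattr dictionaries following the rules in the message."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        # One side is missing: equal only if the other holds nothing but
        # ignorable keys.
        other = a if a is not None else b
        return all(key in ignorable for key in other)
    keys_a = {k for k in a if k not in ignorable}
    keys_b = {k for k in b if k not in ignorable}
    if keys_a != keys_b:
        return False
    # Keys in value_ignore must exist on both sides, values are not compared.
    return all(k in value_ignore or a[k] == b[k] for k in keys_a)
```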
*	cluster/ec: Use fd instead of loc for get_size_version	Ashish Pandey	2015-05-04	3	-57/+63
\| \| \| \| \| \| \| \| \|	Change-Id: Ia7d43cb3b222db34ecb0e35424f1766715ed8e6a BUG: 1188242 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10176 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
*	cluster/ec: Handle unhandled states	Pranith Kumar K	2015-04-28	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	This was leading to hangs when get_size_and_version fails Change-Id: Iad9408c2dacc9a74594b8d2f94c95f402533b0f1 BUG: 1215265 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10390 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Perform inode-write on healing subvols	Pranith Kumar K	2015-04-24	3	-129/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the version numbers do not match, then writes are performed only on at least N-R bricks which have same version. But if we want to do healing of files which are constantly modified we need to allow writes on subvols that are undergoing heal. Data healing will mark 62nd bit while the heal is going on. When the data transaction sees that this bit is set it needs to perform the fop on that subvol irrespective of whether the versions match or do not match. Fop is considered successful only if N-R non-healing bricks succeed. Change-Id: I69a17582df397aaf6e8ca4b5e746c7ca802cbbde BUG: 1215265 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10372 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
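	The healing marker and the success rule in this commit message can be sketched as follows. This is a Python illustration under assumptions: the names are hypothetical, "62nd bit" is read as bit index 62 of a 64-bit word, and which xattr carries the bit is not stated in the message.

```python
HEALING_BIT = 1 << 62  # assumption: zero-based bit 62 of a 64-bit value


def is_healing(word):
    """True if the brick's value carries the healing marker."""
    return bool(word & HEALING_BIT)


def fop_succeeded(results, nodes, redundancy):
    """results maps brick -> (word, ok).

    Writes are also sent to healing bricks, but only successes on
    non-healing bricks count towards the N-R threshold.
    """
    good = sum(1 for word, ok in results.values()
               if ok and not is_healing(word))
    return good >= nodes - redundancy
```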
*	quota/disperse: handle inode quotas in xattr aggregate	vmallika	2015-04-14	1	-13/+22
	With the inode quota feature, the quota size xattr grows from 64 bits to 192 bits, which contain the values of 'file size', 'file count' and 'dir count'. This change in the quota size xattr needs to be handled in disperse xattr aggregation. Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: I5fd28aa9f5b8b6cba83a98360236417a97ac16ee BUG: 1207967 Reviewed-on: http://review.gluster.org/10112 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Sachin Pandit <spandit@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
*	cluster/ec: Entry self-heal fixes	Pranith Kumar K	2015-04-13	1	-79/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Directory deletion should always happen with 'rm -rf' flag, otherwise the call may fail with ENOTEMPTY. - Instead of doing an explicit 'link' call, perform mknod call with GLUSTERFS_INTERNAL_FOP_KEY which acts as 'link' if the gfid already exists. Change-Id: I8826f92170421db37efb67dfc00afad4ab695907 BUG: 1207085 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10045 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Fix readdir de-itransform	Pranith Kumar K	2015-04-11	3	-8/+73
	Problem: gf_deitransform returns the global client-id in the complete graph. So except for the first disperse subvolume under dht, all the other disperse subvolumes will return a client-id greater than ec->nodes, so readdir will always error out in those subvolumes. Fix: Get the client subvolume whose client-id matches the client-id returned by gf_deitransform of offset. Change-Id: I26aa17504352d48d7ff14b390b62f49d7ab2d699 BUG: 1209113 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10165 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
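	The fix described in this commit message amounts to looking the client-id up among the subvolume's own children instead of using it as a local index. A minimal Python sketch, with a hypothetical function name and a plain list standing in for the children's client-ids:

```python
def child_index_for(client_id, child_client_ids):
    """Return the local child index whose global client-id matches, or None."""
    for idx, cid in enumerate(child_client_ids):
        if cid == client_id:
            return idx
    return None
```

	For a second disperse subvolume of three bricks whose children carry the global ids 3, 4 and 5, client-id 4 resolves to local child 1, whereas a range check against the number of nodes (3) would reject it.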
*	cluster/ec: Ignore volume-mark key for comparing dicts	Pranith Kumar K	2015-04-10	1	-0/+1
\| \| \| \| \| \| \| \| \|	Change-Id: Id60107e9fb96588d24fa2f3be85c764b7f08e3d1 BUG: 1207712 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10077 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Fix iobuf mem leak	Raghavendra Talur	2015-04-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	iobuf_get and iobref_add implicitly ref the iobuf. Hence, it is necessary to unref iobuf before setting it to NULL. Change-Id: Icadd8925574cf04fe708d8090868e49356653a8e Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9818 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	libglusterfs/syncop: Add xdata to all syncop calls	Raghavendra Talur	2015-04-08	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for xdata in both the request and response path of syncops. Few calls like lookup already had the support; have renamed variables in few places to maintain uniformity. xdata passed downwards is known as xdata_in and xdata passed upwards is known as xdata_out. There is an old patch by Jeff Darcy at http://review.gluster.org/#/c/8769/3 which does the same for some selected calls. It also brings in xdata support at gfapi level. xdata support at gfapi level would be introduced in subsequent patches. Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc BUG: 1158621 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9859 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/ec: Have same ec_manager_* for [f]set/[f]removexattr	Pranith Kumar K	2015-04-08	1	-206/+80
\| \| \| \| \| \| \| \| \| \| \| \| \|	The ec_manager_xxx() functions for [f]set/[f]removexattr are exactly the same except for the reporting part. So moved that to a common function and used the same ec_manager_xattr() function for all these fops. Change-Id: Iaa57023b800f8d1f3f6a827f4ceba9b0a0337336 BUG: 1199767 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10036 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
*	cluster/ec: Refactor inode-writev	Pranith Kumar K	2015-04-06	4	-478/+145
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	All _cbk() functions in inode-write.c do the same things, i.e. store op_ret/op_errno and the stat structures if they are available, and combine them. Moved this common operation into one function, ec_inode_write_cbk(), and made all the other _cbk() functions use it instead. Change-Id: I2387b9f2d9598ced6299a26ea1900e9deb9fadc4 BUG: 1199767 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9981 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	Avoid conflict between contrib/uuid and system uuid	Emmanuel Dreyfus	2015-04-04	5	-23/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glusterfs relies on the Linux uuid implementation, whose API is incompatible with most other systems' uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non-Linux systems. This implementation is incompatible with the system's built-in one, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs dlopen()s a glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
*	cluster/ec: Implement heal info for ec	Pranith Kumar K	2015-03-30	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \|	This also lists the files that have ongoing I/O, which will be fixed later. Change-Id: Ib3f60a8b7e8798d068658cf38eaef2a904f9e327 BUG: 1203581 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10020 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	cluster/ec: Use fd when appropriate for updating size/version	Pranith Kumar K	2015-03-27	1	-3/+9
\| \| \| \| \| \| \| \| \| \|	Change-Id: I5d3aca101c8cdda406d31d06c40404fa6a2b7170 BUG: 1192378 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
*	libxlator: Change marker xattr handling interface	Pranith Kumar K	2015-03-25	2	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Changed the implementation of marker xattr handling to take just a function which populates important data that is different from default 'gauge' values and subvolumes where the call needs to be wound. - Removed duplicate code I found while reading the code and moved it to cluster_marker_unwind. Removed unused structure members. - Changed dht/afr/stripe implementations to follow the new implementation - Implemented marker xattr handling for ec. Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4 BUG: 1200372 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9892 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/ec: Refactor ec-dir-write	Pranith Kumar K	2015-03-24	2	-670/+158
\| \| \| \| \| \| \| \| \| \| \| \|	- Also fixed iatt_combine to go over all the valid iatts Change-Id: I1d52d705ed0437f602357acde3e479cedb748681 BUG: 1199767 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9827 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/dht: Change the subvolume encoding in d_off to be a "global"	Dan Lambright	2015-03-18	3	-53/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	position in the graph rather than relative (local) to a particular translator. Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the Tiering feature. The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-dev discussions [1]. DHT and AFR/EC are responsible (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the dht_t. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick. Tracking of leaf nodes is done in common utility functions. Leaf node counts and positional information are updated on a graph switch. [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8 BUG: 1190734 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/ec: Add self-heal-daemon command handlers	Pranith Kumar K	2015-03-09	6	-27/+707
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces the changes required in ec xlator to handle index/full heal. Index healer threads: Ec xlator starts an index healer thread per local brick. This thread wakes up every minute to check if there are any files to be healed based on the indices kept in the index directory. Whenever a child_up event comes, this index healer thread also wakes up, crawls the indices and triggers heal. When self-heal-daemon is disabled on this particular volume, the healer thread keeps waiting until it is enabled again to perform heals. Full healer threads: Ec xlator starts a full healer thread for the local subvolume provided by glusterd to perform a full crawl on the directory hierarchy to perform heals. Once the crawl completes, the thread exits if no more full heals are issued. Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic. Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9787 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/ec: Allow heal on name less loc	Pranith Kumar K	2015-03-05	3	-14/+39
\| \| \| \| \| \| \| \| \| \| \| \|	loc->parent may not always be populated. Even in those cases, self-heal should happen if it can be completed using nameless loc. Change-Id: I8871fc811bec8b881ae7fb09dcd202c6693b9877 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9717 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>