summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* dict: dict_set_bin() should never free the pointer on errorNiels de Vos2015-07-244-5/+27
| | | | | | | | | | | | | | | | | | | | | | dict_set_bin() is handling the pointer that it passed inconsistently. Depending on the errors that can occur, the pointer passed to the dict can be free'd, but there is no guarantee. It is cleaner to have the caller free the pointer that allocated it and dict_set_bin() returned an error. When dict_set_bin() returned success, the given pointer will be free'd when dict_unref() calls data_destroy(). Many callers of dict_set_bin() already take care of free'ing the pointer on error. The ones that did not, are corrected with this change too. Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966 BUG: 1242280 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Handle race between unlock-timer, new lockPranith Kumar K2015-07-233-50/+17
| | | | | | | | | | | | | | | | | | | | | | | | Problem: New lock could come at the time timer is on the way to unlock. This was leading to crash in timer thread because thread executing new lock can free up the timer_link->fop and then timer thread will try to access structures already freed. Fix: If the timer event is fired, set lock->release to true and wait for unlock to complete. Thanks to Xavi and Bhaskar for helping in confirming that this race is the RC. Thanks to Kritika for pointing out and explaining how Avati's patch can be used to fix this bug. Change-Id: I45fa5470bbc1f03b5f3d133e26d1e0ab24303378 BUG: 1243187 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11670 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht: send lookup even for fd based operations during rebalanceRavishankar N2015-07-221-23/+30
| | | | | | | | | | | | | | | | | | | | | Problem: dht_rebalance_inprogress_task() was not sending lookups to the destination subvolume for a file undergoing writes during rebalance. Due to this, afr was not able to populate the read_subvol and failed the write with EIO. Fix: Send lookup for fd based operations as well. Thanks to Raghavendra G for helping with the RCA. Change-Id: I638c203abfaa45b29aa5902ffd76e692a8212a19 BUG: 1244165 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11713 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Use xattrop (as opposed to setxattr) for updates to size xattrKrutika Dhananjay2015-07-151-2/+2
| | | | | | | | | | Change-Id: Icd8984976812bb47ae7129426f6c1aa9393b3ab9 BUG: 1232391 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11467 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Propogate correct errno in case of failuresPranith Kumar K2015-07-141-1/+4
| | | | | | | | | | | | | | | | | | - Also remove internal-fop setting in create/mknod etc xattrs. Rebalance was failing because ec was giving EIO when lock acquiring fails as the file/dir doesn't exist. Posix_create/mknod are not setting config xattr because internal-fop key is present in dict and setxattr for this fails leading to failure in setting rest of xattrs. Change-Id: Ifb429c8db9df7cd51e4f8ce53fdf1e1b975c9993 BUG: 1242254 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11639 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: wind readlink on good subvol(s)Pranith Kumar K2015-07-146-48/+90
| | | | | | | | | | BUG: 1232172 Change-Id: I3a56e487840d86147dd85bf5fbe79b165eae289f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11589 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Prevent data corruptionsPranith Kumar K2015-07-143-14/+32
| | | | | | | | | | | | | | - On lock reuse preserve 'healing' bits - Don't set ctx->size outside locks in healing code - Allow xattrop internal fops also on the fop->mask. Change-Id: I6b76da5d7ebe367d8f3552cbf9fd18e556f2a171 BUG: 1232678 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11640 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* ec : Implement check for the cluster.heal-timeout valuesAshish Pandey2015-07-141-0/+10
| | | | | | | | | | | | | | | | | | | for disperse volume. Problem : User can set wrong values for cluster.heal-timeout using cli. Any value like string, negative numbers or 0 could be set. Solution : Check the value given by user. It should be numerical, non negative and within the range. Change-Id: I5184ef1a11bb2c225f42ac9ccb1ba680a86cfe09 BUG: 1239037 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11573 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/tier: fixes for migration over ec as cold tierDan Lambright2015-07-132-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | An opendir is done in rebalance. The graph constructed when EC is used in tiering may have no local volumes (if all the hot volumes are on one node and all the others on another node). Previously the opendir only sent fops down the local subvolumes for migration. They must be sent down both the hot and cold subvolumes for tiering. When setxattr2() received a NULL subvolume; this dereferenced an uninitialized variable. When a lookup is done during creation of the destination file, the xattr dict is "polluted" with virtual xattrs. These cause subsequent xattrs in the new file to not be written by posix. They are required by EC. The inode gfid for "entry_loc" in gf_defrag_migrate_single_file() was not initialized. This made underlying translators think the gfid was 0, and failed migration. Change-Id: I6ccda8ca8e43485b9b354341bbfcb302496f632c BUG: 1236212 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11433 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* syncop: Include iatt to 'syncop_link' argsSoumya Koduri2015-07-102-2/+2
| | | | | | | | | | | | | | Include iatt to 'syncop_link' args to fetch proper attributes of the newly linked inode. Signed-off-by: Soumya Koduri <skoduri@redhat.com> Change-Id: If6b92961bd7a89add3791ed3a9b494087348b492 BUG: 1241788 Reviewed-on: http://review.gluster.org/11611 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/ec: Don't read from bricks that are healingPranith Kumar K2015-07-091-1/+1
| | | | | | | | | | BUG: 1232678 Change-Id: I35503039e4723cf7f33d6797f0ba90dd0aca130b Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11580 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Remove locks in opendirPranith Kumar K2015-07-081-21/+1
| | | | | | | | | | | | | With readdir[p] taking locks to figure out which bricks are good/bad, no need to take any locks on opendir. BUG: 1232172 Change-Id: I4d924aeeaecab23af08c4598548a20d2a44cd849 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11506 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Fix use after free bugPranith Kumar K2015-07-072-0/+9
| | | | | | | | | | | | | | | In ec_lock() there is a chance that ec_resume is called on fop even before ec_sleep. This can result in refs == 0 for fop leading to use after free in this function when it calls ec_sleep so do ec_sleep at start and ec_resume at end of this function. Change-Id: I879b2667bf71eaa56be1b53b5bdc91b7bb56c650 BUG: 1240284 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11558 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Don't read from bad subvolsPranith Kumar K2015-07-061-18/+23
| | | | | | | | | | Change-Id: Ic22813371faca4e8198c9b0b20518e68d275f3c1 BUG: 1232678 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11531 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Remove failed subvols from source/sink computationPranith Kumar K2015-07-061-1/+6
| | | | | | | | | | Change-Id: Ib0de34c86ee25de361ec821d4015296c514742e0 BUG: 1240210 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11546 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Make background healing optional behaviorPranith Kumar K2015-07-063-11/+55
| | | | | | | | | | | Provide options to control number of active background heal count and qlen. Change-Id: Idc2419219d881f47e7d2e9bbc1dcdd999b372033 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11473 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr : expunge first, impunge next in entry selfhealAnuradha Talur2015-07-061-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When entry self-heals are performed, the files/directories that are to be expunged should be removed first and then impunge should be done. Consider the following scenario : A volume with 2 bricks : b0 and b1. 1) With following hierarchy on both bricks: olddir |__ oldfile 2) Bring down b1 and do 'mv olddir newdir'. 3) Bring up b1 and self-heal. 4) Without patch, during self-heal the events occur in following order, a) Creation of newdir on the sink brick. Notice that gfid of olddir and newdir are same. As a result of which gfid-link file in .glusterfs directory still points to olddir and not to newdir. b) Deletion of olddir on the sink brick. As a part of this deletion, the gfid link file is also deleted. Now, there is no link file pointing to newdir. 5) Files under newdir will not get listed as part of readdir. To tackle this kind of scenario, an expunge should be done first and impunge later; which is the purpose of this patch. Change-Id: Idc8546f652adf11a13784ff989077cf79986bbd5 BUG: 1238508 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/11498 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Add throttling in background healingPranith Kumar K2015-07-016-5/+114
| | | | | | | | | | | | | | - 8 parallel heals can happen. - 128 heals will wait for their turn - Heals will be rejected if 128 heals are already waiting. Change-Id: I2e99bf064db7bce71838ed9901a59ffd565ac390 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11471 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Remove dead codePranith Kumar K2015-07-011-1376/+2
| | | | | | | | | | Change-Id: I99d7a038f29cebe823e17a8dda40d335441185bc BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11472 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/dht: use refcount to manage memory used to store migrationRaghavendra G2015-07-012-21/+46
| | | | | | | | | | | | | | | | information. Without refcounting, we might free up memory while other fops are still accessing it. BUG: 1235927 Change-Id: Ia4fa4a651cd6fe2394a0c20cef83c8d2cbc8750f Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/11418 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: Porting messages to new logging frameworkarao2015-06-2716-473/+748
| | | | | | | | | | | | | updated Change-Id: I94ac7b2cb0d43a82cf0eeee21407cff9b575c458 BUG: 1194640 Signed-off-by: arao <arao@redhat.com> Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com> Reviewed-on: http://review.gluster.org/9897 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* quota: Fix statfs values in EC when quota_deem_statfs is enabledvmallika2015-06-271-4/+12
| | | | | | | | | | | | | | When quota_deem_statfs is enabled, quota sends aggregated statfs values In EC we should not multiply statfs values with fragment number Change-Id: I7ef8ea1598d84b86ba5c5941a2bbe0a6ab43c101 BUG: 1233162 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11315 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/afr : truncate all sinks filesAnuradha2015-06-261-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem : During data self-heal of sparse files, sparseness of files is lost. Cause : Earlier, only files with larger ia_size in sinks were being truncated to ia_size of source. This caused checksum mismatch of sparse blocks when ia_size of files in sinks were lesser than ia_size of source file. Leading to unnecessary healing of sparse blocks. As a result of which sparseness of files was lost. Solution : truncate files in all the sinks irrespective of their size with respect to the source file. After this change, checksum won't mismatch for sparse blocks and heal won't be triggered. As a result, sparseness of the files will be preserved. Other fixes in this patch : 1) in afr_does_size_mismatch(), check for mismatch only in sources. Previously, the check was being done for all children in a replica. 2) in __afr_selfheal_data_checksums_match(), check checksum mismatch only for children with valid responses. Change-Id: Ifcdb1cdc9b16c4a8a7867aecf9fa94b66e5301c2 BUG: 1232238 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/11252 Reviewed-by: Prasanna Kumar Kalever Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: Block fops when file is in split-brainRavishankar N2015-06-264-14/+76
| | | | | | | | | | | | | For directories, block metadata FOPS. For non-directories, block data and metadata FOPS. Do not block entry FOPS. Change-Id: Id7f656f4a513b9d33c457dd7f2d58028dbef8e61 BUG: 1235007 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11371 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/tier: stop tier migration after graph switchDan Lambright2015-06-261-0/+16
| | | | | | | | | | | | | | | | | On a graph switch, a new xlator and private structures are created. The tier migration daemon must stop using the old xlator and private structures and begin using the new ones. Otherwise, when RPCs arrive (such as counter queries from glusterd), the new xlator will be consulted but it will not have up to date information. The fix detects a graph switch and exits the daemon in this case. Typical graph switches for the tier case would be turning off performance translators. Change-Id: Ibfbd4720dc82ea179b77c81b8f534abced21e3c8 BUG: 1226005 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11372
* tier/ctr: Ignore creation of T file and Ctr Lookup heal improvememntsJoseph Fernandes2015-06-261-3/+2
| | | | | | | | | | | | | | | | | | 1) Ignore creation of T file in ctr_mknod 2) Ignore lookup for T file in ctr_lookup 3) Ctr_lookup: a. If the gfid and pgfid in empty dont record b. Decreased log level for multiple heal attempts c. Inode/File heal happens after an expiry period, which is configurable. d. Hardlink heal happens after an expiry period, which is configurable. Change-Id: Id8eb5092e78beaec22d05f5283645081619e2452 BUG: 1235269 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/11334 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* ec: Porting messages to new logging frameworkNandaja Varma2015-06-2615-463/+1323
| | | | | | | | | Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c BUG: 1194640 Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Reviewed-on: http://review.gluster.org/10465 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* EC : While Healing a file, set the config xattrAshish Pandey2015-06-261-1/+14
| | | | | | | | | | | | | | Problem : trusted.ec.config attr was missing for the healed file Solution: Writing trusted.ec.config while healing a file. Change-Id: I340dd45ff8ab5bc1cd6e9b0cd2b2ded236e5acf0 BUG: 1235246 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11407 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: wind fops on good subvols for access/readdir[p]Pranith Kumar K2015-06-267-137/+203
| | | | | | | | | Change-Id: I1e629a6adc803c4b7164a5a7a81ee5cb1d0e139c BUG: 1232172 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11246 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* tiering/rebalance: tier daemon stopped with out updating statusMohammed Rafi KC2015-06-251-3/+0
| | | | | | | | | | | | | | | | When a subvol goes down, tier daemon stopped immediately, and the status shows as "Progressing". With this change, with respect to tier xlator, when a subvol goes offline it will update the status as failed. Change-Id: I9f722ed0d35cda8c7fc1a7e75af52222e2d0fdb7 BUG: 1227803 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11068 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/afr : set pending xattrs for replaced brickAnuradha2015-06-255-10/+250
| | | | | | | | | | | | | | | | | | | | | | This patch is part two change to prevent data loss in a replicate volume on doing a replace-brick commit force operation. Problem: After doing replace-brick commit force, there is a chance that self heal might happen from the replaced (sink) brick rather than the source brick leading to data loss. Solution: Mark pending changelogs on afr children for the replaced afr-child so that heal is performed in the correct direction. Change-Id: Icb9807e49b4c1c4f1dcab115318d9a58ccf95675 BUG: 1207829 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10448 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/afr: Pick gfid from poststat during fresh lookup for read child ↵Krutika Dhananjay2015-06-246-41/+68
| | | | | | | | | | | calculation Change-Id: I12c1e4f67f4ec4affbe13d7daf871044a8a2a12e BUG: 1235216 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11373 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht: Adding log messages to the new logging frameworkarao2015-06-2315-342/+975
| | | | | | | | | | | | | Change-Id: Ib3bb61c5223f409c23c68100f3fe884918d2dc3f BUG: 1194640 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/10021 Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Do not handle GF_CONTENT_KEYPranith Kumar K2015-06-231-85/+7
| | | | | | | | | | | | | | | | | GF_CONTENT_KEY aggregation requires that the fragments on the bricks belong to same data i.e. no operations are modifying the content while lookup is performed on it. The only way to know it is to get at least ec->fragments+1 number of responses and see that two different sets of ec->fragments number of fragments give same data. But at the moment we feel that this slows down ec-lookup. So removing handling of this for now. Change-Id: I2da5087f1311d5cdde999062607b143b48c17713 BUG: 1226279 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11003 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* ec: Display correct message after successful heal startAshish Pandey2015-06-221-1/+1
| | | | | | | | | | | | | | | | Problem : While launching heal, it shows heal launch was unsuccessful. However, internaly it was successfully launched. Solution : Don't reset op_ret to -1 in for loop for every brick. Change-Id: Iff89fdaf6082767ed67523a56430a9e83e6984d3 BUG: 1203089 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11267 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: complete conservative merge even in case of gfid split-brain.Ravishankar N2015-06-221-2/+18
| | | | | | | | | | | | | | | | | | | | Problem: While performing conservative merge, we bail out of the merge if we encounter a file with mismatching gfid or type. What this means is all entries that come after the mismatching file (during the merge) never get healed, no matter how many index heals are done. Fix: Continue with the merging of rest of the entries even if a gfid/type mismatch is found, but ensure that post-op does not happen on the parent dir in such a case. Change-Id: I9bbfccc8906007daa53a0750ddd401dcf83943f8 BUG: 1180545 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9429 Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* dht : Error value check before performing rebalance completeSakshi2015-06-191-5/+13
| | | | | | | | | | | | | Change-Id: I7a0cd288d16f27b887c7820162efdbe99a039d95 BUG: 1188242 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/11097 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Avoid parallel executions of the same state machineXavier Hernandez2015-06-181-11/+13
| | | | | | | | | | | | | | In very rare circumstances it was possible that a subfop started by another fop could finish fast enough to cause that two or more instances of the same state machine be executing at the same time. Change-Id: I319924a18bd3f88115e751a66f8f4560435e0e0e BUG: 1233258 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11317 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: Fix Null pointer dereference while loggingRaghavendra G2015-06-181-8/+8
| | | | | | | | | Change-Id: I1ea358b83267b0bcdf654ce18fe881fd4a6bf08d BUG: 1233139 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/11313 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/dht: Prevent use after free bugPranith Kumar K2015-06-171-1/+3
| | | | | | | | | Change-Id: I2d1f5bb2dd27f6cea52c059b4ff08ca0fa63b140 BUG: 1231425 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11209 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Prevent Null dereference in dht-renamePranith Kumar K2015-06-121-1/+1
| | | | | | | | | | Change-Id: I3059f3b577f550c92fb77c6b6b44defd0584cd2e BUG: 1230647 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11178 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tier/dht: Fixing non atomic promotion/demotion w.r.t to frequency periodJoseph Fernandes2015-06-111-40/+59
| | | | | | | | | | | | | | | | | | | | | This fixes the ping-pong issue i.e files getting demoted immediately after promition, caused by off-sync promotion/demotion processes. The solution is do promotion/demotion refering to the system time. To have the fix working all the file serving nodes should have thier system time synchronized with each other either manually or using a NTP Server. NOTE: The ping-pong issue can re-appear even with this fix, if the admin have different promotion freq period and demotion freq period, but this would be under the control of the admin. Change-Id: I1b33a5881d0cac143662ddb48e5b7b653aeb1271 BUG: 1218717 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/11110 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/tier: account for reordered layoutsDan Lambright2015-06-112-14/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a tiered volume the cold subvolume is always at a fixed position in the graph. DHT's layout array, on the other hand, may have the cold subvolume in either the first or second index, therefore code cannot make any assumptions. The fix searches the layout for the correct position dynamically rather than statically. The bug manifested itself in NFS, in which a newly attached subvolume had not received an existing directory. This case is a "stale entry" and marked as such in the layout for that directory. The code did not see this, because it looked at the wrong index in the layout array. The fix also adds the check for decomissioned bricks, and fixes a problem in detach tier related to starting the rebalance process: we never received the right defrag command and it did not get directed to the tier translator. Change-Id: I77cdf9fbb0a777640c98003188565a79be9d0b56 BUG: 1214289 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Joseph Fernandes <josferna@redhat.com> Reviewed-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11092
* features/changelog: Avoid setattr fop logging during renameSaravanakumar Arumugam2015-06-113-23/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When a file is renamed and the (renamed)file's Hashing falls into a different brick, DHT creates a special file(linkto file) in the brick(Hashed subvolume) and carries out setattr operation on that file. Currently, Changelog records this(setattr) operation in Hashed subvolume. glusterfind in turn records this operation as MODIFY operation. So, there is a NEW entry in Cached subvolume and MODIFY entry in Hashed subvolume for the same file. Solution: Avoid logging setattr operation carried out, by marking the operation as internal fop using xdata. In changelog translator, check whether setattr is set as internal fop and skip accordingly. Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd BUG: 1230007 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/11137 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Wind unlock fops at all costPranith Kumar K2015-06-102-8/+45
| | | | | | | | | | | | | | | | | Problem: While files are being created if more than redundancy number of bricks go down, then unlock for these fops do not go to the bricks. This will lead to stale locks leading to hangs. Fix: Wind unlock fops at all costs. Change-Id: I50a87e8b4d6d2dde5bf7405b82e3aeecd95ad00e BUG: 1220348 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* tier/volume set: Validate volume set option for tierMohammed Rafi KC2015-06-101-0/+6
| | | | | | | | | | | | | | | | Volume set option related to tier volume can only be set for tier volume, also currently all volume set i for tier option accepts a non-negative integer. This patch validate both condition. Change-Id: I3611af048ff4ab193544058cace8db205ea92336 BUG: 1216960 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10751 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Joseph Fernandes
* cluster/ec: Prevent double unwindPranith Kumar K2015-06-084-13/+12
| | | | | | | | | | | | | | | | | | | | | | Problem: 1) ec_access/ec_readlink_/ec_readdir[p] _cbks are trying to recover only from ENOTCONN. 2) When the fop succeeds it unwinds right away. But when its ec_fop_manager resumes, if the number of bricks that are up is less than ec->fragments, the the state machine will resume with -EC_STATE_REPORT which unwinds again. This will lead to crashes. Fix: - If fop fails retry on other subvols, as ESTALE/ENOENT/EBADFD etc are also recoverable. - unwind success/failure in _cbks Change-Id: I2cac3c2f9669a4e6160f1ff4abc39f0299303222 BUG: 1228952 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11111 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Do not attempt entry self-heal if the last lookup on entry ↵Krutika Dhananjay2015-06-081-2/+9
| | | | | | | | | | | | | | | | | | | failed on src Test bug-948686.t was causing shd to dump core due to gfid being NULL. This was due to the volume being stopped while index heal's in progress, causing afr_selfheal_unlocked_lookup_on() to fail sometimes on the src brick with ENOTCONN. And when afr_selfheal_newentry_mark() copies the gfid off the src iatt, it essentially copies null gfid. This was causing the assertion as part of xattrop in protocol/client to fail. Change-Id: I237a0d6b1849e4c48d7645a2cc16d9bc1441ef95 BUG: 1229172 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11119 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Changing log level to DEBUG in case of ENOENTAshish Pandey2015-06-081-2/+3
| | | | | | | | | Change-Id: I264e47ca679d8b57cd8c80306c07514e826f92d8 BUG: 1193388 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10784 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Don't handle EC_XATTR_DIRTY in responsePranith Kumar K2015-06-062-27/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: ec_update_size_version expects all the keys it did xattrop with to come in response so that it can set the values again in ec_update_size_version_done. But EC_XATTR_DIRTY is not combined so the value won't be present in the response. So ctx->post/pre_dirty are not updated in ec_update_size_version_done. So these values are still non-zero. When ec_unlock_now is called as part of flush's unlock phase it again tries to perform same xattrop for EC_XATTR_DIRTY. But ec_update_size_version is not expected to be called in unlock phase of flush because ec_flush_size_version should have reset everything to zero and unlock is never invoked from ec_update_size_version_done for flush/fsync/fsyncdir. This leads to stale lock which leads to hang. Fix: EC_XATTR_DIRTY is removed in ex_xattrop_cbk and is never combined with other answers. So remove handling of this in the response. Change-Id: If0ea3efec3235a6e312465d8838585fbe752c7ea BUG: 1227654 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11078 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>