summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: Use parallel dir scan functionalityPranith Kumar K2016-04-044-13/+59
| | | | | | | | | | | BUG: 1221737 Change-Id: I0ed71a72f0e33bd733723e00a01cf28378c5534e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13755 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: Don't lookup/forget inodesPranith Kumar K2016-03-314-32/+5
| | | | | | | | | | | | | | | | | | | | | | | | Problem: All inodes that are looked-up are always forgotten without fail in afr removing the benefits of them being in lru. This same code can cause crashes if between inode_lookup, inode_forget in afr if the top xlator does inode_forget(0). Fix: Don't use lookup/forget in afr. No benefits are there at the moment for keeping this code. It is impossible to prevent top xlators to do inode_forget(0). Found similar instances in ec and removed them even though those code paths are not going to be executed in any place other than heal-daemon. BUG: 1321554 Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13834 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com>
* afr: add mtime based split-brain resolution to CLIRavishankar N2016-03-291-10/+71
| | | | | | | | | | | | | | | | | | | Extended the CLI to include support for split-brain resolution based on mtime. The command syntax is: $:gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> where <FILE> can be either the full file name as seen from the root of the volume (or) the gfid-string representation of the file. Change-Id: I7a16f72ff1a4495aa69f43f22758a9404e958b4f BUG: 1321322 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13828 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Fix read-child selection in entry create fopPranith Kumar K2016-03-282-3/+46
| | | | | | | | | | | | | | | | | | When an entry is being created the inode is yet to be linked so args must be filled with gfid and ia_type for it to give consistent iatt. Also handle Dht sending fops on inode not yet linked. BUG: 1302948 Change-Id: I6969cacb437cad02f66716f3bf8ec004ffe7c691 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13827 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* afr : Enable auto heal when replica count increasesAnuradha Talur2016-03-215-82/+134
| | | | | | | | | | | | | | | | | | | | | | | This patch is part two change to prevent data loss in a replicate volume on doing a add-brick operation. Problem: After doing add-brick, there is a chance that self heal might happen from the newly added brick rather than the source brick, leading to data loss. Solution: Mark pending changelogs on afr children for the new afr-child so that heal is performed in the correct direction. Change-Id: I11871e55eef3593aec874f92214a2d97da229b17 BUG: 1276203 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12454 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Choose local child as source if possiblePranith Kumar K2016-03-119-31/+57
| | | | | | | | | | | | | | | | | It is better to choose local brick as source if possible to prevent over the wire read thus saving on bandwidth. Also changed code to not attempt data-heal if 'source' is selected as arbiter. Change-Id: I9a328d0198422280b13a30ab99545370a301dfea BUG: 1314150 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13585 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Krutika Dhananjay <kdhananj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* afr: misc performance improvementsRavishankar N2016-03-082-29/+56
| | | | | | | | | | | | | | | | | | | | | 1. In afr_getxattr_cbk, consider the errno value before blindly launching an inode refresh and a subsequent retry on other children. 2. We want to accuse small files only when we know for sure that there is no IO happening on that inode. Otherwise, the ia_sizes obtained in the post-inode-refresh replies may mismatch due to a race between inode-refresh and ongoing writes, causing spurious heal launches. Change-Id: Ife180f4fa5e584808c1077aacdc2423897675d33 BUG: 1309462 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13595 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: do not set arbiter as a readable subvol in inode contextRavishankar N2016-03-041-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If afr_lookup_done() or afr_read_subvol_select_by_policy() chooses the arbiter brick to serve the stat() data, file size will be reported as zero from the mount, despite other data bricks being available. This can break programs like tar which use the stat info to decide how much to read. Fix: In the inode-context, mark arbiter as a non-readable subvol for both data and metadata. It it to be noted that by making this fix, we are *not* going to serve metadata FOPS anymore from the arbiter brick despite the brick storing the metadata. It makes sense to do this because the ever increasing over-loaded FOPs (getxattr returning stat data etc.) and compound FOPS in gluster will otherwise make it difficult to add checks in code to handle corner cases. Change-Id: Ic60b25d77fd05e0897481b7fcb3716d4f2101001 BUG: 1310171 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Mat Clayton <mat@mixcloud.com> Reviewed-on: http://review.gluster.org/13539 Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: Don't delete gfid-req from lookup requestPranith Kumar K2016-03-022-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Afr does dict_ref of the xattr_req that comes to it and deletes "gfid-req" key. Dht uses same dict to send lookup to other subvolumes. So in case of directories and more than 1 dht subvolumes, second subvolume till the last subvolume won't get a lookup request with "gfid-req". So gfid reset never happens on the directories in distributed replicate subvolume for 2nd till last subvolumes. Fix: Make a copy of lookup xattr request. Also fixed replies_wipe possibly resetting gfid to NULL gfid BUG: 1312816 Change-Id: Ic16260e5a4664837d069c1dc05b9e96ca05bda88 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13545 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* afr: Add throttled background client-side healsRavishankar N2016-03-016-58/+200
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a heal is needed after inode refresh (lookup, read_txn), launch it in the background instead of blocking the fop (that triggered refresh) until the heal happens. afr_replies_interpret() is modified such that the heal is launched only if atleast one sink brick is up. Max. no of heals that can happen in parallel is configurable via the 'background-self-heal-count' volume option. Any number greater than that is put in a wait queue whose length is configurable via 'heal-wait-queue-leng' volume option. If the wait queue is also full, further heals will be ignored. Default values: background-self-heal-count=8, heal-wait-queue-leng=128 Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70 BUG: 1297172 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13207 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* all: fixes for clang compile warningsKaleb S KEITHLEY2016-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cli/src/cli-cmd-parser.c (chenk) cli/src/cli-xml-output.c (spandit) cli/src/cli.c (chenk) libglusterfs/src/common-utils.c (vmallika) libglusterfs/src/gfdb/gfdb_sqlite3.c (jfernand +1) rpc/rpc-transport/socket/src/socket.c (?) xlators/cluster/afr/src/afr-transaction.c (?) xlators/cluster/dht/src/dht-common.h (srangana +2) xlators/cluster/dht/src/dht-selfheal.c (srangana +2) xlators/debug/io-stats/src/io-stats.c (R. Wareing) xlators/features/barrier/src/barrier.c (vshastry) xlators/features/bit-rot/src/bitd/bit-rot-scrub.h (vshankar +1) xlators/features/shard/src/shard.c (kdhananj +1) xlators/mgmt/glusterd/src/glusterd-ganesha.c (skoduri) xlators/mgmt/glusterd/src/glusterd-handler.c (atinmu) xlators/mgmt/glusterd/src/glusterd-op-sm.h (atinmu) xlators/mgmt/glusterd/src/glusterd-snapshot.c (spandit) xlators/mgmt/glusterd/src/glusterd-syncop.c (atinmu) xlators/mgmt/glusterd/src/glusterd-volgen.c (atinmu) xlators/protocol/client/src/client-messages.h (mselvaga +1) xlators/storage/bd/src/bd-helper.c (M. Mohan Kumar) xlators/storage/bd/src/bd.c (M. Mohan Kumar) xlators/storage/posix/src/posix.c (nbalacha +1) Change-Id: I85934fbcaf485932136ef3acd206f6ebecde61dd BUG: 1293133 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13031 CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cli/ afr: op_ret for index heal launchRavishankar N2016-02-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If index heal is launched when some of the bricks are down, glustershd of that node sends a -1 op_ret to glusterd which eventually propagates it to the CLI. Also, glusterd sometimes sends an err_str and sometimes not (depending on the failure happening in the brick-op phase or commit-op phase). So the message that gets displayed varies in each case: "Launching heal operation to perform index self heal on volume testvol has been unsuccessful" (OR) "Commit failed on <host>. Please check log file for details." Fix: 1. Modify afr_xl_op() to return -1 even if index healing of atleast one brick fails. 2. Ignore glusterd's error string in gf_cli_heal_volume_cbk and print a more meaningful message. The patch also fixes a bug in glusterfs_handle_translator_op() where if we encounter an error in notify of one xlator, we break out of the loop instead of sending the notify to other xlators. Change-Id: I957f6c4b4d0a45453ffd5488e425cab5a3e0acca BUG: 1302291 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13303 Reviewed-by: Anuradha Talur <atalur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: add seek() FOPNiels de Vos2016-02-052-0/+83
| | | | | | | | | | | | | | seek() is like a read(), copied the same semantics. Change-Id: I100b741d9bfacb799df318bb081f2497c0664927 BUG: 1220173 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11483 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Fix heal-info slow response while IO is in progressKrutika Dhananjay2016-02-033-19/+39
| | | | | | | | | | | | | | | | | | Now heal-info does an open() on the file being examined so that the client at some point sees open-fd count being > 1 and releases the eager-lock so that heal-info doesn't remain blocked forever until IO completes. Change-Id: Icc478098e2bc7234408728b54d8185102b3540dc BUG: 1297695 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/13326 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Move remaining gf_logs to gf_msgsPranith Kumar K2016-01-273-14/+19
| | | | | | | | | | | | Change-Id: I48d9e5313bd3ccf9fe26c90a7051a8a174d75c49 BUG: 1296818 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13195 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr : Check if dict is valid in afr_replies_interpret()Ravishankar N2016-01-211-1/+2
| | | | | | | | | | | | | | | | | posix_mkdir does not send response xdata. So even though replies are valid, the response xdata dict is NULL. Check if dict is non-null in afr_replies_interpret before doing dict_get Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: If543d68d8bfd2433519105839d5be106076cc276 BUG: 1294053 Reviewed-on: http://review.gluster.org/13185 Tested-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: skip healing data blocks for arbiterRavishankar N2016-01-181-9/+49
| | | | | | | | | | | | | | | | | | | 1 ....but still do other parts of data-self-heal like restoring the time and undo pending xattrs. 2. Perform undo_pending inside inodelks. 3. If arbiter is the only sink, do these other parts of data-self-heal inside a single lock-unlock sequence. Change-Id: I64c9d5b594375f852bfb73dee02c66a9a67a7176 BUG: 1286017 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12777 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Fix excessive logging in afr_accuse_smallfiles()Ravishankar N2015-12-281-1/+2
| | | | | | | | | | | | | Commit 2b7226f9d3470d8fe4c98c1fddb06e7f641e364d did not check for the validity of a dict before doing a dict_get. Fix that. Change-Id: Ie21f4da19256b17196f242cd8fd5bb76b0a69c1e BUG: 1294053 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13077 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* build: export minimum symbols from xlators for correct resolutionKaleb S KEITHLEY2015-12-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | Revisiting http://review.gluster.org/#/c/11814/, which unintentionally introduced warnings from libtool about the xlator .so names. According to [1], the -module option must appear in the Makefile.am file(s); if -module is defined in a macro, e.g. in configure(.ac), then libtool will not recognize that this is a module and will emit a warning. [1] http://www.gnu.org/software/automake/manual/automake.html#Libtool-Modules Change-Id: Ifa5f9327d18d139597791c305aa10cc4410fb078 BUG: 1248669 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13003 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* afr: warn if pending xattrs missing during init()Ravishankar N2015-12-212-1/+17
| | | | | | | | | | | | | | | | | | Since commit 6e635284a4411b816d4d860a28262c9e6dc4bd6a (glusterfs-3.7.7), the afr pending xattrs are stored in the volfile and used by afr when it initializes. If a cluster is upgraded, prevent afr from loading until the op-version has been bumped up to 3.7.7 and the volfiles have been regenerated using a volume set command. Without this fix, AFR will crash when initialzing. Change-Id: I14249dedb3f2f77cd754d78d8a9a70fdc5fc8c10 BUG: 1293293 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/13038 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: handle bad objects during lookup/inode_refreshRavishankar N2015-12-201-1/+10
| | | | | | | | | | | | | | | | | If an object (file) is marked bad by bitrot, do not consider the brick on which the object is present as a potential read subvolume for AFR irrespective of the pending xattr values. Also do not consider the brick containing the bad object while performing afr_accuse_smallfiles(). Otherwise if the bad object's size is bigger, we may end up considering that as the source. Change-Id: I4abc68e51e5c43c5adfa56e1c00b46db22c88cf7 BUG: 1290965 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12955 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: refresh inode using fstatRavishankar N2015-12-201-26/+91
| | | | | | | | | | | | | | | For fd based operations (fgetxattr, readv etc.) if an inode refresh is required, do so using fstat instead of lookup. This is because the file might have been deleted by another client before refresh but posix mandates that FOPS using already open fds must still succeed. Change-Id: Id5f71c3af4892b648eb747f363dffe6208e7ac09 BUG: 1285230 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12894 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: During name heal, propagate EIO only on gfid or type mismatchKrutika Dhananjay2015-12-161-5/+13
| | | | | | | | | | | | | | | | When the disk associated with a brick returns EIO during lookup, chances are that name heal would return an EIO because one of the syncop_XXX() operations as part of it returned an EIO. This is inherently treated by afr_lookup_selfheal_wrap() as a split-brain and the lookup is aborted prematurely with EIO even if it succeeded on the other replica(s). Change-Id: Ib9b7f2974bff8e206897bb4f689f0482264c61e5 BUG: 1291701 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/12973 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd/afr: store afr pending xattrs as a volume optionRavishankar N2015-12-151-9/+23
| | | | | | | | | | | | | | | | | | | | | | | Problem: When AFR xlator initialises, it uses the name of the client xlators below it for storing the pending changelogs (xattrs). This can be problem when some other xlator is loaded in between AFR and the client. Though that is a trivial 'traverse-graph-till-the-client-and-use-the-name' fix in AFR's init(), there are other issues like when there's no client xlator at all when, say, AFR is moved to the server side. Fix: The client xlator names are currenly unique and stored as brickinfo->brick_ids. So persist these ids as comma separated values in AFR's volume_options and use them as xattr values during init(). Change-Id: Ie761ffeb3373a4c4d85ad05c84a768c4188aa90d BUG: 1285152 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12738 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* heal : Changed heal info to process all indices directoriesAnuradha Talur2015-12-027-58/+102
| | | | | | | | | | Change-Id: Ida863844e14309b6526c1b8434273fbf05c410d2 BUG: 1250803 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12658 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : Readdirp performance enhancementAnuradha Talur2015-11-305-82/+162
| | | | | | | | | | | | | | | | | | | | Things done : 1) during lookup and inode_refresh as part of read_txn, request is sent to detect if heal is required or not. 2) If heal is required, be conservative in setting the readdirp entry inodes to NULL, otherwise don't be. 3) Self-heal-daemon now crawls both indices/xattrop and indices/dirty directory while healing. Change-Id: Ic4a4da63fb7e0726eab5f341a200859b29cf7eb7 BUG: 1250803 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12507 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: change data self-heal size check for arbiterPranith Kumar K2015-11-261-0/+4
| | | | | | | | | | | | | Size mismatch should consider that arbiter brick will have zero size file to prevent data self-heal to spuriously trigger/assuming need of self-heals. Change-Id: I179775d604236b9c8abfa360657abbb36abae829 BUG: 1285634 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12755 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Remember flags sent by create fopPranith Kumar K2015-11-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: In create fop, afr doesn't remember the flags. When afr has to perform fixing of the fd that needs to be opened on other bricks because the brick was down at the time of create, the flags with which it needs to send open are not present. Fix: Remember the flags in the fd_ctx. Thanks to Nitya for showing us the problem in re-open with the flags. Change-Id: I8ce1eb50c35fc0722cfc25cb4b6d234ef56180e5 BUG: 1285173 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12739 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: Drop compatibility lock for data self-healRavishankar N2015-11-181-18/+0
| | | | | | | | | | | | | | | | | | | In glusterfs 3.4 and older, AFR did not take locks in self-heal domain during data self-heal. So this compat lock in data domain was added to prevent older clients from trying to heal a file while an existing self-heal was going on by a newer client. But the side effect was that all appending writes (which take full locks in data domain) from mounts would be stalled until self-heal was complete. Since glusterfs 3.4 is not supported anymore, remove the compat lock. Change-Id: I31c8e4d7f3364f769a14eec295154e3c40d9f78e BUG: 1283032 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12602 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Mark self-heal fops as internalPranith Kumar K2015-11-184-8/+8
| | | | | | | | | | Change-Id: I8ae7af266d3e00460f0cfdc9389a926e5f2fee36 BUG: 1282761 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12598 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* afr: replica pair going offline does not require CHILD_MODIFIED eventSakshi Bansal2015-11-161-1/+1
| | | | | | | | | | | | | | | | | As a part of CHILD_MODIFIED event DHT forgets the current layout and performs fresh lookup. However this is not required when a replica pair goes offline as the xattrs can be read from other replica pairs. Hence setting different event to handle replica pair going down. Change-Id: I5ede2a6398e63f34f89f9d3c9bc30598974402e3 BUG: 1281230 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12573 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr : Remove stale indicesAshish Pandey2015-11-161-1/+22
| | | | | | | | | Change-Id: Iba23338a452b49dc9fe6ae7b4ca108ebc377fe42 BUG: 1270668 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/12336 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: disable self-heal lock compatibility for arbiter volumesPranith Kumar K2015-10-291-8/+12
| | | | | | | | | | | | | | | | | | | | Problem: afrv2 takes locks from infinity-2 to infinity-1 to be compatible with <=3.5.x clients. For arbiter volumes this leads to problems as the I/O takes full file locks. Solution: Don't be compatible with <=3.5.x clients on arbiter volumes as arbiter volumes are introduced in 3.7 Change-Id: I48d6aab2000cab29c0c4acbf0ad356a3fa9e7bab BUG: 1275247 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12426 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: write zeros to sink for non-sparse filesRavishankar N2015-10-282-16/+43
| | | | | | | | | | | | | | | | | Problem: If a file is created with zeroes ('dd', 'fallocate' etc.) when a brick is down, the self-heal does not write the zeroes to the sink after it comes up. Consequenty, there is a mismatch in disk-usage amongst the bricks of the replica. Fix: If we definitely know that the file is not sparse, then write the zeroes to the sink even if the checksums match. Change-Id: Ic739b3da5dbf47d99801c0e1743bb13aeb3af864 BUG: 1272460 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12371 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: wind writes only on subvols where preop succeededRavishankar N2015-10-261-2/+4
| | | | | | | | | | | | | | | | | 1. Call local->transaction.wind() only on subvols where pre-op succeeded. 2. Update op_errno in afr_changelog_cbk call path. This fixes a bug in commit 7945121dda340ec8f25711b2ad3ca70b544de967 where we return EUCLEAN to the application if pre-op fails on all bricks. Change-Id: Iab8776e49a992e7a255314bba542742f7607f3ec BUG: 1272362 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12415 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* libglusterfs: replace default functions with generated versionsJeff Darcy2015-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replacing repetitive code like this with code generated from a more compact "canonical" definition carries several advantages. * Ease the process of adding new fops (e.g. GF_FOP_IPC). * Ease the process of making global changes to existing fops (e.g. adding "xdata"). * Ensure strict consistency between all of the pieces that must be compatible with each other, through both kinds of changes. What we have right now is just a start. The above benefits will only truly be realized when we use the same definitions to generate stubs, syncops, and perhaps even parts of gfapi or glupy. This same infrastructure can also be used to reduce code duplication and potential for error in many of our translators. NSR already uses a similar technique, using a few hundred lines of templates to generate a few *thousand* lines of code. The ability to make a global "aspect" change (e.g. to quorum checking) in one place instead of seventy has already been demonstrated there. Other candidates for code generation include the AFR/EC transaction infrastructure, or stub creation/resumption in io-threads. Change-Id: If7d59de7a088848b557f5aea00741b4fe19017c1 BUG: 1271325 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/9411 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: do not wind write if pre-op fails on all childrenRavishankar N2015-10-203-13/+7
| | | | | | | | | | | | | | | | | | 1. When winding the pre-op, transaction.pre_op[i] is set. If the pre-op fails, transaction.failed_subvols[i] is set. If if fails on all chidren, we can directly proceed to unlock (via afr_changelog_post_op_now) without trying to wind the write, fail and then go to unlock. 2. 'fop_subvols' seems to be an unused variable, hence removing it. Change-Id: I9525628daf48082e979b0093fa0478934495e61f BUG: 1272362 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12368 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com>
* cluster/afr: Handle stack reset failuresPranith Kumar K2015-10-072-0/+8
| | | | | | | | | | | | | | | When all the bricks go down in the middle of the self-heal, in AFR_STACK_RESET afr_local_init will fail because all the bricks are down. So local will remain NULL for the frame. This leads to crashes as this failure is not handled in both entry and data self-heals. Change-Id: I71a02f161f2c4dbfdc8bb7f2a6f32807191ed253 BUG: 1269470 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12309 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* build: export minimum symbols from xlators for correct resolutionKaleb S. KEITHLEY2015-09-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've been lucky that we haven't had any symbol collisions until now. Now we have a collision between the snapview-client's svc_lookup() and libntirpc's svc_lookup() with nfs-ganesha's FSAL_GLUSTER and libgfapi. As a short term solution all the snapview-client's FOP methods were changed to static scope. See http://review.gluster.org/11805. This works in snapview-client because all the FOP methods are defined in a single source file. This solution doesn't work for other xlators with FOP methods defined in multiple source files. To address this we link with libtool's '-export-symbols $symbol-file' (a wrapper around `ld --version-script ...` --- on linux anyway) and only export the minimum required symbols from the xlator sharedlib. N.B. the libtool man page says that the symbol file should be named foo.sym, thus the rename of *.exports to *.sym. While foo.exports worked, we will follow the documentation. Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> BUG: 1248669 Change-Id: I1de68b3e3be58ae690d8bfb2168bfc019983627c Reviewed-on: http://review.gluster.org/11814 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* afr: perform replace-brick in a synctaskRavishankar N2015-09-154-14/+73
| | | | | | | | | | | | | | | | | | | | | | Problem: replace-brick setxattr is not performed inside a synctask. This can lead to hangs if the setxattr is executed by epoll thread, as the epoll thread will be waiting for replies to come where as epoll thread is the thread that needs to epoll_ctl for reading from socket and listen. Fix: Move replace-brick to synctask to prevent epoll thread hang. This patch is in line with the fix performed in http://review.gluster.org/#/c/12163/ Change-Id: I6a71038bb7819f9e98f7098b18a6cee34805868f BUG: 1262345 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12169 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr : get split-brain-status in a synctaskAnuradha Talur2015-09-146-22/+103
| | | | | | | | | | | | | | | On executing `getfattr -n replica.split-brain-status <file>` on mount, there is a possibility that the mount hangs. To avoid this hang, fetch the split-brain-status of a file in synctask. Change-Id: I87b781419ffc63248f915325b845e3233143d385 BUG: 1262345 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12163 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* afr: Do not wind the full writev payload to arbiter brickRavishankar N2015-09-071-0/+30
| | | | | | | | | | | | | | | ...because the arbiter xlator just unwinds it without passing it down till posix anyway. Instead, send a one-byte vector so that afr write transaction works as expected. Change-Id: I52913ca51dfee0c8472cbadb62c5d39b7badef77 BUG: 1259572 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12095 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com>
* afr: Unset dirty xattr after setting pending xattr during post-opRavishankar N2015-09-021-13/+13
| | | | | | | | | | | | | | | | | | | | In AFR transaction, in the pre-op, the dirty xattr is set. In the post-op, if the transaction fails on one of the bricks, then on the healthy brick, the dirty xattr is unset and then the pending xattr (for the brick that went down) is set in that order. If the brick crashes after unsetting the dirty xattr, we have lost information about a pending heal. Hence we need to reverse the order, i.e. set pending xattr first followed by unsetting the dirty. Change-Id: I0b8a872cb4579a1bad602f70c76f09691bd582b2 BUG: 1258801 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12078 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* all: reduce "inline" usageJeff Darcy2015-09-014-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect. In the process, several pieces of dead code were flagged by the compiler, and were removed. Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155 BUG: 1245331 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/11769 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/afr: Make [f]xattrop metadata transactionPranith Kumar K2015-08-304-183/+234
| | | | | | | | | | | | | | | | | | | Problem: When xlators above afr do [f]xattrop when one of the bricks is down, after the brick comes backup, the metadata is not healed because [f]xattrop is not considered a transaction. Fix: Treat [f]xattrop as transaction so that changes done by xlators above afr are marked for heal when some of the bricks were down at the time of [f]xattrop. Change-Id: Iea180f9a456509847c3cd8d5d59a0cdc2712d334 BUG: 1248887 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr : Examine data/metadata readable for read-subvolAnuradha Talur2015-08-252-23/+70
| | | | | | | | | | | | | | | | | | During lookup and discover, currently read_subvol is based only on data_readable. read_subvol should be decided based on both data_readable and metadata_readable. Credits to Ravishankar N for the logic of afr_first_up_child from http://review.gluster.org/10905/ . Change-Id: I98580b23c278172ee2902be08eeaafb6722e830c BUG: 1240244 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/11551 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: modify afr_txn_nothing_failed()Ravishankar N2015-08-251-12/+3
| | | | | | | | | | | | | | | | | | In an AFR transaction, we need to consider something as failed only if the failure (either in the pre-op or the FOP phase) occurs on the bricks on which a transaction lock was obtained. Without this, we would end up considering the transaction as failure even on the bricks on which the lock was not obtained, resulting in unnecessary fsyncs during the post-op phase of every write transaction for non-appending writes. Change-Id: Iee79e5d85dc7b4c41459d8bdd04a8454bdaf9a9d BUG: 1250170 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11827 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: launch index heal on local subvols up on a child-up eventRavishankar N2015-08-211-17/+11
| | | | | | | | | | | | | | | | | | | | | | | | Problem: When a replica's child goes down and comes up, the index heal is triggered only on the child that just came up. This does not serve the intended purpose as the list of files that need to be healed to this child is actually captured on the other child of the replica. Fix: Launch index-heal on all local children of the replica xlator which just received a child up. Note that afr_selfheal_childup() eventually calls afr_shd_index_healer() which will not run the heal on non-local children. Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: Ia23e47d197f983c695ec0bcd283e74931119ee55 BUG: 1253309 Reviewed-on: http://review.gluster.org/11912 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Do not wind statfs to arbiter brickRavishankar N2015-08-073-5/+8
| | | | | | | | | | | | | | | | | | Problem: AFR serves statfs from the brick having the least free space available. Since the size to be allocated to the arbiter brick in a 3 way replica is supposed to be considerably lesser than the other 2 bricks, statfs will be served from this brick which is incorrect. Fix: Don't serve statfs from the arbiter brick. Change-Id: I5af098b9c50626f52cf3d7dbb060bf754c797f05 BUG: 1251346 Reported-by: Fredrik Brandt <fredrikb@denlillaplaneten.se> Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11857 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/afr: Fix incorrect logging in read transactionsKrutika Dhananjay2015-07-261-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | afr_read_txn_refresh_done() at its entry point can fail for reasons like ENOENT/ESTALE but seldom due to EIO, which is something _AFR_ would internally generate and not receive in response from a child translator. AFR is reporting "split-brain" for _any_ kind of failure in read txn, of the following kind: [2015-07-07 18:04:34.787612] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-3: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error] This patch fixes such misleading errors. To-Do: Avoid logging EIO if/when split-brain choice is set. Will do that as part of a separate commit. Change-Id: Ib513c75168f7026118ad5b3f0b35e9dd498cfe1e BUG: 1246052 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11756 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>