path: root/xlators/cluster/afr/src/afr-common.c
Commit message (Author, Date, Files, Lines)
* cluster/afr: Handle EAGAIN properly in inodelk (Pranith Kumar K, 2014-10-30, 1 file, -14/+150)
Backport of http://review.gluster.org/8739

Problem: When one of the bricks in a replica pair is taken down and brought back up, locks will be granted on that brick. AFR returns inodelk success even when one of the bricks already has the lock taken.

Fix: If any brick returns EAGAIN, return failure to the parent xlator.

Note: This change only works for non-blocking inodelks. This patch addresses dht-synchronization, which uses non-blocking locks for rename. Blocking locks are issued by only one of the rebalance processes, so for now there is no possibility of deadlock.

BUG: 1151308
Change-Id: I72f15d8789442c29b5c7be2d5dabf7bae6bfa845
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/8923
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
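The EAGAIN rule above can be sketched as a reduction over per-brick replies. This is a minimal illustration, not AFR's code: struct brick_reply and combine_nb_inodelk_replies are hypothetical, and treating the lock as held when at least one brick granted it (and none returned EAGAIN) is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-brick reply: op_ret is 0 on success, -1 on failure. */
struct brick_reply {
    int op_ret;
    int op_errno;
};

/*
 * Combine the replies of a non-blocking inodelk sent to every brick.
 * Returns 0 if the lock is considered held, or -1 with *op_errno set.
 * A single EAGAIN (lock already held on that brick) fails the whole call;
 * a real implementation would also have to release the locks that the
 * other bricks did grant, which is not shown here.
 */
static int
combine_nb_inodelk_replies(const struct brick_reply *replies, size_t count,
                           int *op_errno)
{
    size_t granted = 0;

    for (size_t i = 0; i < count; i++) {
        if (replies[i].op_ret == 0) {
            granted++;
        } else if (replies[i].op_errno == EAGAIN) {
            *op_errno = EAGAIN;
            return -1;
        }
    }

    if (granted == 0) {
        /* Nobody granted the lock: report the first brick's error. */
        *op_errno = count > 0 ? replies[0].op_errno : EINVAL;
        return -1;
    }

    *op_errno = 0;
    return 0;
}
```

A brick that is merely down (for example ENOTCONN) does not fail the lock in this sketch; only a contended brick does.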
* cluster/afr: Don't queue transactions during open-fd fix (Pranith Kumar K, 2013-05-08, 1 file, -19/+13)
Before anonymous fds were available, afr had to queue up transactions if the file was not opened on one of its subvolumes. Transactions stayed queued until the attempt to open the file either succeeded or failed, and open attempts were repeated until the file was successfully opened on the subvolume.

Now the client xlator uses anonymous fds to perform fops if the fd used for the fop is not 'opened'. Fops succeed even when the file is not opened, so there is no longer any need to queue up transactions in afr. Open is attempted on the subvolume where the file is not opened, independently of the fop.

Change-Id: I6d59293023e2de41c606395028c8980b83faca3f
BUG: 953887
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4868
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Turn on eager-lock for fd DATA transactions (Emmanuel Dreyfus, 2013-05-07, 1 file, -1/+2)
Problem: With the present implementation, eager-lock is issued for any fd fop, so the eager-lock is also carried over to METADATA transactions. However, the lk-owner is set to the local->fd address only for DATA transactions; for METADATA transactions it is frame->root. Because of this, the unlock of the eager-lock fails and rebalance hangs.

Fix: Enable eager-lock for fd DATA transactions.

This is a backport of change If30df7486a0b2f5e4150d3259d1261f81473ce8a (http://review.gluster.org/#/c/4588/).

BUG: 916226
Change-Id: Id41ac17f467c37e7fd8863e0c19932d7b16344f8
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/4899
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
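A small sketch of the owner mismatch, treating pointers as lock-owner values. The enum and both helpers (txn_lk_owner, txn_may_eager_lock) are hypothetical; only the rule "fd address for DATA, frame->root otherwise" comes from the commit. A lock can only be released by the owner that took it, so, as we read the fix, limiting eager-lock to fd-based DATA transactions keeps lock and unlock under the same owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum txn_type { TXN_DATA, TXN_METADATA, TXN_ENTRY };   /* hypothetical */

/* Lock owner for a transaction: fd address for DATA, call-stack root
 * (frame->root in AFR) for everything else. */
static uintptr_t
txn_lk_owner(enum txn_type type, const void *fd, const void *frame_root)
{
    return type == TXN_DATA ? (uintptr_t)fd : (uintptr_t)frame_root;
}

/* Eager-lock is allowed only when the option is on and the transaction
 * is an fd-based DATA transaction, whose owner is the fd itself. */
static bool
txn_may_eager_lock(enum txn_type type, const void *fd, bool eager_lock_opt)
{
    return eager_lock_opt && fd != NULL && type == TXN_DATA;
}
```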
* cluster/afr: do complete split-brain check in all the fd based fops (Raghavendra Bhat, 2013-03-05, 1 file, -15/+0)
Fd-based operations such as readv checked only for data split-brain instead of complete split-brain (i.e. both data and metadata), assuming that open would have done the complete split-brain check. However, open-behind may unwind the open without winding it to afr, which skips the complete split-brain check, so some applications are able to read the contents of a file even though it has a metadata split-brain. So let all the fd-based fops do a defensive check for complete split-brain.

Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28
BUG: 846240
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/4620
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Avoid priv->eager_lock value update race (Pranith Kumar K, 2013-02-06, 1 file, -0/+1)
Change-Id: I7049c0c64e36a9dfa4cc0e0b34de7ec111d2f6c1
BUG: 908302
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4076
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: wakeup delayed post op on fsync (Pranith Kumar K, 2013-01-29, 1 file, -5/+3)
Change-Id: I5d84ef72615f9d71b4af210976e2449de6e02326
BUG: 888174
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4446
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* afr: Modified book-keeping structures for entrylks (Krishnan Parthasarathi, 2013-01-23, 1 file, -7/+1)
  * There are up to 3 entry lockees that may be needed to perform entrylk'ing in posix dir-write operations.
  * For example, rmdir ("/a/b") needs to acquire locks on two entities:
    - entrylk ("/a", "b")
    - entrylk ("/a/b", null)
  * Changed the existing entrylk/rename/selfheal (entrylk) transactions to use the new book-keeping structures.
  * Fixed a few issues in the afr_trace_entry_lk{in,out} functions. Tracing is now aware of the new entry lockee structure.

Implementation notes:
  * Changed the 'cookie' sent in stack_wind to encode lockee_entity_no and subvol_no. cookie is a non-negative integer such that 0 <= cookie < replica_count. When more than one lock is being acquired across the subvolumes, cookie % replica_count gives the subvol_no and cookie / replica_count gives the lockee_entity_no.

Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153
BUG: 765564
Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
Reviewed-on: http://review.gluster.org/2828
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
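The cookie decoding rules can be paired with the encoding they imply. This sketch assumes cookie = lockee_entity_no * replica_count + subvol_no, which is the unique inverse of the % and / rules when 0 <= subvol_no < replica_count; the three helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical encoder implied by the decoding rules in the commit. */
static inline int
lk_cookie_encode(int lockee_no, int subvol_no, int replica_count)
{
    return lockee_no * replica_count + subvol_no;
}

/* cookie % replica_count recovers the subvolume index. */
static inline int
lk_cookie_subvol(int cookie, int replica_count)
{
    return cookie % replica_count;
}

/* cookie / replica_count recovers the lockee (entity) index. */
static inline int
lk_cookie_lockee(int cookie, int replica_count)
{
    return cookie / replica_count;
}
```

Packing both indices into one integer lets a single callback identify which lockee and which subvolume a reply belongs to without extra allocation.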
* cluster/afr: Fail readv on data-split-brain (Pranith Kumar K, 2013-01-18, 1 file, -0/+15)
Problem: AFR prevents opens on a file in split-brain, but an fd that is already open can still be used to both read from and write to the file.

Fix: Fail readv on such a file with EIO.

Change-Id: I8e07f24c36fab800499b36ab374f984b743332cd
BUG: 873962
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4199
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* afr: conditionally prioritize EIO errors over ENOENT (Brian Foster, 2013-01-18, 1 file, -5/+8)
The most-important-errno logic historically prioritized only ESTALE over ENOENT. Commit c8c0942d added EIO prioritization over ENOENT to ensure that split-brain is reported when it occurs in conjunction with bricks missing the file entry. The unintended side effect of that change is that (non-split-brain) EIO errors reported by the bricks themselves are now passed to the client, when the expectation is that afr should squash such errors in favor of marking the file inconsistent.

The high-level problem is that EIO is overloaded with different meanings in different contexts. This commit adds an eio parameter to the errno priority logic to flag, conditionally, when EIO has higher priority and should be propagated to the client.

BUG: 892730
Change-Id: Ib692a8a1f1737ef190d57894f392ec53ffb33aab
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4376
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
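A sketch of the conditional priority: ESTALE always wins, EIO outranks ENOENT only when the caller passes eio = true, and otherwise the latest error wins. The body is our reconstruction from the description, using plain bool; it is not the verbatim afr_most_important_error().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* eio: true when EIO in this context means split-brain and must reach
 * the client; false when it should be squashed like any other error. */
static int
most_important_error(int old_err, int new_err, bool eio)
{
    if (old_err == ESTALE || new_err == ESTALE)
        return ESTALE;
    if (eio && (old_err == EIO || new_err == EIO))
        return EIO;
    if (old_err == ENOENT || new_err == ENOENT)
        return ENOENT;
    return new_err;  /* otherwise the latest error wins */
}
```

With eio = false, a brick-level EIO no longer hides a missing entry: most_important_error(ENOENT, EIO, false) yields ENOENT.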
* afr: replace afr_more_important_error with afr_most_important_error (Brian Foster, 2013-01-17, 1 file, -22/+15)
afr_more_important_error() returns whether a new errno should override an existing errno for high-level operations that can span multiple sub-operations. It prioritizes ESTALE over EIO over ENOENT, and otherwise gives priority to the latest error passed.

This change preserves the current behavior but rewrites the logic to return the higher-priority error of the existing and new errno. The purpose of the change is to make the logic a bit clearer and to set the stage for future changes that make the logic flexible based on context.

BUG: 892730
Change-Id: Id1aa48855dfb0507abc9d1ef22f2259b30472576
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4375
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
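The two shapes of the same rule can be compared side by side. In this sketch err_rank, more_important_error and most_important_error are hypothetical names; the ranking encodes ESTALE > EIO > ENOENT > anything else, with ties going to the newer error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Priority described in the commit: ESTALE > EIO > ENOENT > others. */
static int
err_rank(int err)
{
    switch (err) {
    case ESTALE: return 3;
    case EIO:    return 2;
    case ENOENT: return 1;
    default:     return 0;
    }
}

/* Old shape: answer "should new_err replace old_err?" */
static bool
more_important_error(int old_err, int new_err)
{
    return err_rank(new_err) >= err_rank(old_err);
}

/* New shape: return the winning errno directly; ties go to new_err. */
static int
most_important_error(int old_err, int new_err)
{
    return err_rank(old_err) > err_rank(new_err) ? old_err : new_err;
}
```

Both functions pick the same errno for every pair of inputs; the second simply returns the answer instead of a yes/no that every caller has to act on.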
* cluster/afr: Remember type of split-brain in inode-ctx (Pranith Kumar K, 2012-12-11, 1 file, -86/+83)
This change also fixes a race in setting the split-brain status in inode-ctx after the fop has been unwound from self-heal, in the case of background self-heal.

Change-Id: Ifc829300df485f50f139443802e8b6dc7038b4ad
BUG: 873962
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4198
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: mark new entry changelog for create/mknod failures (Pranith Kumar K, 2012-12-04, 1 file, -4/+47)
Problem: When create/mknod fails on some of the nodes, the appropriate pending data/metadata changelogs are not assigned. This was not considered an issue because entry self-heal assigns the appropriate changelog after creating the new entries. However, using a combination of rebalance and remove-brick, we can construct a case where a file with the same name and gfid is created in a directory with different data and link-to xattr, without any changelog.

Fix: When a create/mknod failure is observed, mark the appropriate changelog on the newly created file.

Change-Id: I4c32cbf5594a13fb14deaf97ff30b2fff11cbfd6
BUG: 858212
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4207
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* afr: make flush non-transactional (Brian Foster, 2012-12-04, 1 file, -136/+34)
Flush has historically been a transaction, to ensure that all previous writes were complete. This is no longer required because write-behind has learned to make flush a barrier operation (re: conversation with Avati). Flush taking a full file lock causes VMs running on afr volumes to stall when a migration occurs while self-heal is in progress.

Make afr_flush() a non-transactional operation.

BUG: 874045
Change-Id: If2db83823e280c86b1b29b41361eed7081601632
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4261
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to disable readdir failover (Pranith Kumar K, 2012-12-03, 1 file, -0/+1)
In a replica pair, unlike files, directories may not have their entries in the same order, so readdir for the same (offset, size) may not return the same entries on both subvolumes of the replica pair. Switching over from one subvolume to another is therefore not always a good idea: it may lead to duplicate entries, missing entries, or both. This patch provides a way to disable readdir-failover so that applications like rebalance can retry if they want to.

Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0
BUG: 859387
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4159
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: handle short writes in afr_writev_wind and self-heal to avoid corruption (Brian Foster, 2012-11-29, 1 file, -0/+2)
The current failure to handle short writes on writev fops leaves us open to file corruption. A short write on a user request is ignored and leaves the replicas in an inconsistent state. A short write during a self-heal is ignored and incorrectly marks the files as consistent if the heal completes.

Modify user writev handling to return the best-case return value across the replicas. Short writes relative to this value are marked as failed and will require a heal.

Modify self-heal to set an error on a short write and abort the heal.

BUG: 853690
Change-Id: I18b30f58702326249230eeebb361b29e40b535f5
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4150
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
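The user-writev rule can be sketched as "take the largest byte count, then flag every replica that fell short of it". writev_best_ret is a hypothetical helper operating on plain arrays rather than AFR's local structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * op_ret[i] is the byte count returned by replica i (-1 on error).
 * Returns the best (largest) count and sets failed[i] for every replica
 * that errored or wrote fewer bytes than that count, so those replicas
 * can be marked as needing heal.
 */
static ssize_t
writev_best_ret(const ssize_t *op_ret, bool *failed, size_t n)
{
    ssize_t best = -1;

    for (size_t i = 0; i < n; i++)
        if (op_ret[i] > best)
            best = op_ret[i];

    for (size_t i = 0; i < n; i++)
        failed[i] = (op_ret[i] < 0 || op_ret[i] < best);

    return best;
}
```

Returning the best value keeps the application's view as optimistic as the healthiest replica allows, while the short replicas are tracked for heal instead of silently diverging.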
* replicate: don't stop checking xattrs because one was absent (Jeff Darcy, 2012-11-26, 1 file, -15/+41)
The functional issue is described by the subject line. This patch also addresses several efficiency/structure issues, such as:
  * Calling dict_set_ptr once for each txn type, instead of once overall.
  * Calling afr_index_for_transaction_type once per iteration instead of once per call (or, better yet, zero times, since the conversion is unnecessary).
  * Implementing inner functions in a different file than their only caller, creating a spurious header-file dependency.

Change-Id: I29e0df906a820533b66b9ced73e015dfe77267d2
BUG: 865825
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4070
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* Cluster/afr: Fix output for gluster volume heal vn info healed (Venkatesh Somyajula, 2012-11-26, 1 file, -0/+11)
Problem: Whenever the gluster volume heal vol full command is executed, the entries stored in the circular buffer for sh->healed are added to the dictionary in the _crawl_post_sh_action function, irrespective of whether an actual self-heal (due to non-zero values in the change log) takes place or not.

Fix: The value of the key actual-sh-done is set to 1 whenever self-heal takes place due to non-zero change-log values. If, after examining the pending matrix, the self-heal daemon finds that no self-heal is required for some FOP, the value is 0.

Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc
BUG: 863068
Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4181
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Preventing client crashing as the callings of GF_CALLOC has been failed. (linbaiye, 2012-10-11, 1 file, -2/+5)
Calls to GF_CALLOC can fail, although rarely, and when they do the glusterfs client crashes with a segmentation fault. We should return as soon as the transaction's local variables cannot be allocated.

Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b
BUG: 861335
Signed-off-by: linbaiye <linbaiye@gmail.com>
Reviewed-on: http://review.gluster.org/4005
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
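The allocate-or-bail pattern the commit asks for, sketched with standard calloc standing in for GlusterFS's GF_CALLOC macro. struct txn_local and both functions are hypothetical stand-ins for the transaction's local state.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-transaction local state. */
struct txn_local {
    int *pending;              /* one counter per child */
    unsigned char *child_up;   /* one flag per child */
};

/* Allocate the per-child arrays. On any failure, free what was already
 * allocated and return -ENOMEM instead of leaving a NULL pointer to be
 * dereferenced later in the transaction. */
static int
txn_local_init(struct txn_local *local, size_t child_count)
{
    local->child_up = NULL;
    local->pending = calloc(child_count, sizeof(*local->pending));
    if (local->pending == NULL)
        return -ENOMEM;

    local->child_up = calloc(child_count, sizeof(*local->child_up));
    if (local->child_up == NULL) {
        free(local->pending);
        local->pending = NULL;
        return -ENOMEM;
    }
    return 0;
}

static void
txn_local_cleanup(struct txn_local *local)
{
    free(local->pending);
    free(local->child_up);
    local->pending = NULL;
    local->child_up = NULL;
}
```

The caller is then expected to unwind the fop with ENOMEM rather than continue the transaction.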
* Clean up of typepunning errors ( Strict aliasing warnings ) (Varun Shastry, 2012-09-17, 1 file, -1/+3)
Change-Id: I48733967facc526fb523a8dc9bd068f8c5cc5971
BUG: 764282
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/3950
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* afr: Avoid excessive logging in self-heal. (Krishnan Parthasarathi, 2012-08-23, 1 file, -9/+6)
  - (Excessive) logging has been very useful as 'bread-crumbs' in many a root-cause analysis. This patch aims to avoid logging when the information can be reconstructed using the xattrs, statedump, and/or "volume heal" CLI commands.

Change-Id: Iebc6b10ae18f0dd9704bdc6dd03bcfe0f2a09abd
BUG: 844804
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/3805
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Modified split-brain handling (Pranith Kumar K, 2012-07-26, 1 file, -16/+65)
RCA: The bug is observed because the decision to mark a file in split-brain is taken outside the appropriate locks. Lookup gathers xattrs outside any lock, so xattrs that look split-brained in lookup should only be taken as a hint; the appropriate inodelks must be taken before confirming a split-brain. Self-heal currently performs this confirmation. If data/metadata self-heal is turned off, the xattrs are not inspected, so split-brain handling does not work correctly when the self-heal options are turned off.

Fix: Self-heals are launched to inspect xattrs even when the data/metadata self-heal options are turned off. Whether data/metadata is actually healed after the xattrs are inspected depends on whether those options are on or off. This way, the decision to set/reset the split-brain flag is taken inside the appropriate locks.

Testcases: tests 33-36 in https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh

Change-Id: Ia8aeab08208b50c06609ad35a9d72f3d553ee343
BUG: 833727
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* remove useless if-before-free (and free-like) functions (Jim Meyering, 2012-07-13, 1 file, -70/+35)
See the comments in http://bugzilla.redhat.com/839925 for the code used to perform this change.

Signed-off-by: Jim Meyering <meyering@redhat.com>
BUG: 839925
Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a
Reviewed-on: http://review.gluster.com/3661
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: post-op-delay support (Anand Avati, 2012-07-04, 1 file, -0/+3)
post-op-delay introduces an artificial delay between the OP and POST-OP-CHANGELOG phases of a write transaction, so that changelog piggybacking and eager-locking are more likely to work efficiently.

Also enable eager-locking by default.

Change-Id: I865ca4b68512c44818719c7e388952f15d53e6c2
BUG: 836033
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.com/3621
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
* cluster/afr: fix for read-subvolume option does not work (Jeff Darcy, 2012-07-03, 1 file, -2/+2)
Changed the order of the preferred read child in afr_select_read_child_from_policy: when a read child is set through the config option read-subvolume, it should be the first one returned.

Change-Id: I1c5a8171379bb2bad76f6653e9d68a9349d55142
BUG: 833750
Original-author: domwo <glusterfs@wollina.de>
Signed-off-by: domwo <glusterfs@wollina.de>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3614
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
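The ordering fix can be sketched as a priority chain in which the configured read-subvolume is checked before anything else. select_read_child is hypothetical, and the fallbacks after the configured child (a previously used child, then the first child that is up) are assumptions for illustration, not a description of afr_select_read_child_from_policy.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * up[i]        : whether child i is currently up
 * config_child : index set via read-subvolume, or -1 if unset
 * prev_child   : some previously chosen child, or -1 (assumed fallback)
 * Returns the chosen child index, or -1 if no child is up.
 */
static int
select_read_child(const bool *up, int n, int config_child, int prev_child)
{
    /* The configured read-subvolume is honoured first whenever it is up. */
    if (config_child >= 0 && config_child < n && up[config_child])
        return config_child;
    if (prev_child >= 0 && prev_child < n && up[prev_child])
        return prev_child;
    for (int i = 0; i < n; i++)
        if (up[i])
            return i;
    return -1;
}
```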
* cluster/afr: Don't reset split-brain when data-self-heal is off (Pranith Kumar K, 2012-06-19, 1 file, -1/+0)
BUG: 804606
Change-Id: I8cefcb6efa687fac4ad412403c085b3767218f72
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3586
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Check for null gfid_req (Pranith Kumar K, 2012-06-06, 1 file, -1/+1)
gfid_req is set only by the fuse xlator. Fresh lookups performed by the self-heal daemon or by rebalance will not have a gfid at all.

Change-Id: I6712e3063067ecc5f19956e75d28c86bfc19fc65
BUG: 829203
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3529
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: default read_child to a local brick if there is one. (Jeff Darcy, 2012-06-05, 1 file, -2/+86)
Controlled by the "choose-local" option (on by default).

Change-Id: I560f27c81703f2c9c62fdb51532c8eb763826df7
BUG: 806462
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/3005
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: add hashed read-child method. (Jeff Darcy, 2012-05-31, 1 file, -8/+59)
Both the first-to-respond method and the round-robin method are susceptible to clients repeatedly choosing the same servers across a series of opens, creating hot spots. Also, the code that handles a replica being down ignores both methods and just chooses the first remaining replica (which is not an issue for two-way replication, but can be otherwise). The hashed method avoids such hot spots more reliably.

There are three values/modes:
  0: use the old (broken) methods.
  1: select a read-child based on a hash of the file's GFID, so all clients choose the same subvolume for a file (ensuring maximum consistency) but distribute load across a set of files.
  2: select a read-child based on a hash of the file's GFID plus the client's PID, so load is distributed across children even for a single file.

Mode 2 will probably be optimal for most cases. Using the response time at open is problematic, both because a single sample might not have been representative even then and because load might have shifted in the hours or days since (for long-lived files). Trying to use more current load information can lead to "herd following" behavior, which is just as bad. Pseudo-random distribution is likely to be the best we can reasonably do, just as it is for DHT.

Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461
BUG: 802513
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.com/2926
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
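Modes 1 and 2 can be sketched as hashing the 16-byte GFID, optionally folding in the PID, and reducing modulo the child count. This is an illustration under stated assumptions: FNV-1a stands in for whatever hash AFR actually uses, stepping forward past down children is our choice for handling failures, and mode 0 (the old methods) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* FNV-1a, used here only as a convenient deterministic hash. */
static uint32_t
fnv1a(const unsigned char *buf, size_t len, uint32_t h)
{
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * mode 1: hash(gfid)       -> every client picks the same child per file
 * mode 2: hash(gfid, pid)  -> clients spread load even for a single file
 * Returns the chosen child, stepping past children that are down, or -1.
 */
static int
hashed_read_child(const unsigned char gfid[16], pid_t pid, int mode,
                  const bool *up, int n)
{
    uint32_t h = fnv1a(gfid, 16, 2166136261u);

    if (mode == 2)
        h = fnv1a((const unsigned char *)&pid, sizeof(pid), h);
    if (n <= 0)
        return -1;

    int start = (int)(h % (uint32_t)n);
    for (int k = 0; k < n; k++) {
        int child = (start + k) % n;
        if (up[child])
            return child;
    }
    return -1;
}
```

Unlike "first remaining child", stepping forward from the hashed slot keeps the load from a failed child spread according to the hash rather than piling onto child 0.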
* storage/posix: Move landfill inside .glusterfs (Pranith Kumar K, 2012-05-31, 1 file, -5/+0)
Change-Id: Ia2944f891dd62e72f3c79678c3a1fed389854a90
BUG: 811970
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3158
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Assign gfid path if path is NULL in lookup (Pranith Kumar K, 2012-05-21, 1 file, -9/+14)
Change-Id: I45be4ea7f04ee79b67a83134fe8ebd18067a707f
BUG: 820355
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3373
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Enforce order in pre/post op (Pranith Kumar K, 2012-05-18, 1 file, -7/+6)
The xattrop order in pre/post op on all the subvols is client-0, client-1, ... client-n, where n is (replica-count - 1). This order can lead to invalid split-brains if a brick dies in the middle of the xattrops.

Example: The transaction has completed pre-op, so on all the subvolumes the xattrs have changelog '1'. Now post-op is sent to both subvols. On subvol-0, the change-log of client-0 is decremented to 0, but the brick dies before the change-log of client-1 is decremented to 0. This change-log state on subvol-0 means that the change succeeded on subvol-0 but failed on subvol-1, which is not what happened. Changes made while subvol-0 was down will, correctly, lead to a pending change-log on subvol-1 for subvol-0. When subvol-0 is brought back up, the change-logs will be in a split-brain state even though it is not a legitimate split-brain. If the brick dies in the middle of the xattrops, it should remain fool.

Pre-op should perform the xattrop of the local change-log first, and post-op should perform the xattrop of the local change-log last. In the case of optimistic changelogs, txn_changelog should be done last on local if it succeeds, and first if it fails.

Change-Id: Ib6eeb20cdc49b0b1fd2f454f25a9c8e08388c6e7
BUG: 765194
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3226
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
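The ordering rule for the changelog keys on one subvolume can be sketched as follows. xattrop_order is a hypothetical helper that only computes the order; the optimistic-changelog variant mentioned above is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Fill order[] (n entries) with the changelog key indices to update on
 * subvolume `self`, one key per replica.
 * Pre-op:  the subvolume's own ("local") key first.
 * Post-op: the local key last.
 * In the example above, a post-op interrupted part-way then leaves the
 * local key still pending, so subvol-0 does not wrongly blame subvol-1.
 */
static void
xattrop_order(int self, int n, bool pre_op, int *order)
{
    int pos = 0;

    if (pre_op)
        order[pos++] = self;
    for (int i = 0; i < n; i++)
        if (i != self)
            order[pos++] = i;
    if (!pre_op)
        order[pos++] = self;
}
```

For subvolume 1 of a replica-3 set, pre-op touches keys 1, 0, 2 and post-op touches keys 0, 2, 1.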
* cluster/replicate: check for 'loc->path' before dereferencing it (Amar Tumballi, 2012-05-16, 1 file, -1/+1)
Change-Id: I4dada6fd509aa289e97fdb0b50b28300a15e6a0e
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 820355
Reviewed-on: http://review.gluster.com/3325
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Return EIO if read-child < 0 in inode-read fops (Pranith Kumar K, 2012-05-16, 1 file, -1/+4)
Change-Id: I8fb2369caffae8f295774b8b12a086c66ec714c7
BUG: 800884
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/3332
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
* license: dual license under GPLV2 and LGPLV3+ (Kaleb KEITHLEY, 2012-05-10, 1 file, -14/+5)
Note that the license was not changed in any of the following:
  .../argp-standalone/...
  .../booster/...
  .../cli/...
  .../contrib/...
  .../extras/...
  .../glusterfsd/...
  .../glusterfs-hadoop/...
  .../mod_clusterfs/...
  .../scheduler/...
  .../swift/...

The license was not changed in any of the non-building xlators.

The license was not changed in any of the xlators that seemed, to me, to be clearly server-side only, e.g. protocol/server.

Note too that the copyright was changed along with the license; I did not change the copyright in files where the license did not change.

If you find any errors or omissions, please don't hesitate to let me know.
The complete list of files with the license change is: libglusterfs/src/byte-order.h libglusterfs/src/call-stub.c libglusterfs/src/call-stub.h libglusterfs/src/checksum.c libglusterfs/src/checksum.h libglusterfs/src/circ-buff.c libglusterfs/src/circ-buff.h libglusterfs/src/common-utils.c libglusterfs/src/common-utils.h libglusterfs/src/compat-errno.c libglusterfs/src/compat-errno.h libglusterfs/src/compat.c libglusterfs/src/compat.h libglusterfs/src/daemon.c libglusterfs/src/daemon.h libglusterfs/src/defaults.c libglusterfs/src/defaults.h libglusterfs/src/dict.c libglusterfs/src/dict.h libglusterfs/src/event-history.c libglusterfs/src/event-history.h libglusterfs/src/event.c libglusterfs/src/event.h libglusterfs/src/fd-lk.c libglusterfs/src/fd-lk.h libglusterfs/src/fd.c libglusterfs/src/fd.h libglusterfs/src/gf-dirent.c libglusterfs/src/gf-dirent.h libglusterfs/src/globals.c libglusterfs/src/globals.h libglusterfs/src/glusterfs.h libglusterfs/src/graph-print.c libglusterfs/src/graph-utils.h libglusterfs/src/graph.c libglusterfs/src/hashfn.c libglusterfs/src/hashfn.h libglusterfs/src/iatt.h libglusterfs/src/inode.c libglusterfs/src/inode.h libglusterfs/src/iobuf.c libglusterfs/src/iobuf.h libglusterfs/src/latency.c libglusterfs/src/latency.h libglusterfs/src/list.h libglusterfs/src/lkowner.h libglusterfs/src/locking.h libglusterfs/src/logging.c libglusterfs/src/logging.h libglusterfs/src/mem-pool.c libglusterfs/src/mem-pool.h libglusterfs/src/mem-types.h libglusterfs/src/options.c libglusterfs/src/options.h libglusterfs/src/rbthash.c libglusterfs/src/rbthash.h libglusterfs/src/run.c libglusterfs/src/run.h libglusterfs/src/scheduler.c libglusterfs/src/scheduler.h libglusterfs/src/stack.c libglusterfs/src/stack.h libglusterfs/src/statedump.c libglusterfs/src/statedump.h libglusterfs/src/syncop.c libglusterfs/src/syncop.h libglusterfs/src/syscall.c libglusterfs/src/syscall.h libglusterfs/src/timer.c libglusterfs/src/timer.h libglusterfs/src/trie.c 
libglusterfs/src/trie.h libglusterfs/src/xlator.c libglusterfs/src/xlator.h libglusterfsclient/src/libglusterfsclient-dentry.c libglusterfsclient/src/libglusterfsclient-internals.h libglusterfsclient/src/libglusterfsclient.c libglusterfsclient/src/libglusterfsclient.h rpc/rpc-lib/src/auth-glusterfs.c rpc/rpc-lib/src/auth-null.c rpc/rpc-lib/src/auth-unix.c rpc/rpc-lib/src/protocol-common.h rpc/rpc-lib/src/rpc-clnt.c rpc/rpc-lib/src/rpc-clnt.h rpc/rpc-lib/src/rpc-transport.c rpc/rpc-lib/src/rpc-transport.h rpc/rpc-lib/src/rpcsvc-auth.c rpc/rpc-lib/src/rpcsvc-common.h rpc/rpc-lib/src/rpcsvc.c rpc/rpc-lib/src/rpcsvc.h rpc/rpc-lib/src/xdr-common.h rpc/rpc-lib/src/xdr-rpc.c rpc/rpc-lib/src/xdr-rpc.h rpc/rpc-lib/src/xdr-rpcclnt.c rpc/rpc-lib/src/xdr-rpcclnt.h rpc/rpc-transport/rdma/src/name.c rpc/rpc-transport/rdma/src/name.h rpc/rpc-transport/rdma/src/rdma.c rpc/rpc-transport/rdma/src/rdma.h rpc/rpc-transport/socket/src/name.c rpc/rpc-transport/socket/src/name.h rpc/rpc-transport/socket/src/socket.c rpc/rpc-transport/socket/src/socket.h xlators/cluster/afr/src/afr-common.c xlators/cluster/afr/src/afr-dir-read.c xlators/cluster/afr/src/afr-dir-read.h xlators/cluster/afr/src/afr-dir-write.c xlators/cluster/afr/src/afr-dir-write.h xlators/cluster/afr/src/afr-inode-read.c xlators/cluster/afr/src/afr-inode-read.h xlators/cluster/afr/src/afr-inode-write.c xlators/cluster/afr/src/afr-inode-write.h xlators/cluster/afr/src/afr-lk-common.c xlators/cluster/afr/src/afr-mem-types.h xlators/cluster/afr/src/afr-open.c xlators/cluster/afr/src/afr-self-heal-algorithm.c xlators/cluster/afr/src/afr-self-heal-algorithm.h xlators/cluster/afr/src/afr-self-heal-common.c xlators/cluster/afr/src/afr-self-heal-common.h xlators/cluster/afr/src/afr-self-heal-data.c xlators/cluster/afr/src/afr-self-heal-entry.c xlators/cluster/afr/src/afr-self-heal-metadata.c xlators/cluster/afr/src/afr-self-heal.h xlators/cluster/afr/src/afr-self-heald.c xlators/cluster/afr/src/afr-self-heald.h 
xlators/cluster/afr/src/afr-transaction.c xlators/cluster/afr/src/afr-transaction.h xlators/cluster/afr/src/afr.c xlators/cluster/afr/src/afr.h xlators/cluster/afr/src/pump.c xlators/cluster/afr/src/pump.h xlators/cluster/dht/src/dht-common.c xlators/cluster/dht/src/dht-common.h xlators/cluster/dht/src/dht-diskusage.c xlators/cluster/dht/src/dht-hashfn.c xlators/cluster/dht/src/dht-helper.c xlators/cluster/dht/src/dht-inode-read.c xlators/cluster/dht/src/dht-inode-write.c xlators/cluster/dht/src/dht-layout.c xlators/cluster/dht/src/dht-linkfile.c xlators/cluster/dht/src/dht-mem-types.h xlators/cluster/dht/src/dht-rebalance.c xlators/cluster/dht/src/dht-rename.c xlators/cluster/dht/src/dht-selfheal.c xlators/cluster/dht/src/dht.c xlators/cluster/dht/src/nufa.c xlators/cluster/dht/src/switch.c xlators/cluster/stripe/src/stripe-helpers.c xlators/cluster/stripe/src/stripe-mem-types.h xlators/cluster/stripe/src/stripe.c xlators/cluster/stripe/src/stripe.h xlators/features/index/src/index-mem-types.h ¹ xlators/features/index/src/index.c ¹ xlators/features/index/src/index.h ¹ xlators/performance/io-cache/src/io-cache.c xlators/performance/io-cache/src/io-cache.h xlators/performance/io-cache/src/ioc-inode.c xlators/performance/io-cache/src/ioc-mem-types.h xlators/performance/io-cache/src/page.c xlators/performance/io-threads/src/io-threads.c xlators/performance/io-threads/src/io-threads.h xlators/performance/io-threads/src/iot-mem-types.h xlators/performance/md-cache/src/md-cache-mem-types.h xlators/performance/md-cache/src/md-cache.c xlators/performance/quick-read/src/quick-read-mem-types.h xlators/performance/quick-read/src/quick-read.c xlators/performance/quick-read/src/quick-read.h xlators/performance/read-ahead/src/page.c xlators/performance/read-ahead/src/read-ahead-mem-types.h xlators/performance/read-ahead/src/read-ahead.c xlators/performance/read-ahead/src/read-ahead.h xlators/performance/symlink-cache/src/symlink-cache.c 
xlators/performance/write-behind/src/write-behind-mem-types.h xlators/performance/write-behind/src/write-behind.c xlators/protocol/auth/addr/src/addr.c ¹ xlators/protocol/auth/login/src/login.c ¹ xlators/protocol/client/src/client-callback.c xlators/protocol/client/src/client-handshake.c xlators/protocol/client/src/client-helpers.c xlators/protocol/client/src/client-lk.c xlators/protocol/client/src/client-mem-types.h xlators/protocol/client/src/client.c xlators/protocol/client/src/client.h xlators/protocol/client/src/client3_1-fops.c

¹ Copyright only, license reverted to original

Change-Id: If560e826c61b6b26f8b9af7bed6e4bcbaeba31a8
BUG: 820551
Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.com/3304
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Self-heald: Dump the event history completely
  Pranith Kumar K, 2012-05-07, 1 file changed, -1/+0 lines

  Change-Id: Icf08ef1752795276f88c343d1d74af104095c6cb
  BUG: 796579
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3276
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle transient parent-entry xactions in lookup
  Pranith Kumar K, 2012-04-19, 1 file changed, -6/+95 lines

  This patch addresses the case where a lookup on an entry is performed
  while the entry is being renamed. The lookup can return two different
  gfids when it reaches one subvolume before the rename and the other
  after it. In such cases, conflicting-entry self-heal is triggered to
  resolve the issue, but if many entry transactions are in progress on
  the entry's parent directory, the self-heal's non-blocking locks can
  fail, resulting in EIO.

  To avoid this, lookup queries the locks xlator for any parent entrylk
  held on the entry's basename. If afr finds such locks and the gfids
  differ, it chooses the file with the latest ctime as the iatt of the
  entry.

  This solution is not foolproof, but it decreases the probability of
  hitting EIO. The correct solution is to take blocking locks on the
  parent entry to find the correct source, but taking blocking locks in
  lookup is not good: one stale entry lock can hang the whole
  filesystem. So we chose to go with this approach for now.

  Change-Id: Ibebb6c3074f56f80a96893b6bf5b77941e30d400
  BUG: 765551
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3179
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
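  A minimal C sketch of the source-selection rule this commit describes:
  if the successful replies agree on the gfid, use the first one; if they
  disagree and a parent entrylk is held, use the reply with the latest
  ctime. The `struct lookup_reply` type, its fields, and the
  `parent_entrylk_held` flag (standing in for the locks-xlator query) are
  hypothetical simplifications, not afr's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified per-subvolume lookup reply (real afr uses struct iatt). */
struct lookup_reply {
    int           op_ret;     /* 0 on success, -1 on failure */
    unsigned char gfid[16];   /* gfid reported by this subvolume */
    int64_t       ctime_sec;  /* change time, seconds */
    int64_t       ctime_nsec; /* change time, nanoseconds */
};

/* Index of the first successful reply, or -1 if every subvolume failed. */
static int first_success(const struct lookup_reply *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].op_ret == 0)
            return (int)i;
    return -1;
}

/* 1 if any two successful replies report different gfids, else 0. */
static int gfids_differ(const struct lookup_reply *r, size_t n)
{
    int first = first_success(r, n);
    if (first < 0)
        return 0;
    for (size_t i = (size_t)first + 1; i < n; i++)
        if (r[i].op_ret == 0 &&
            memcmp(r[i].gfid, r[first].gfid, sizeof(r[i].gfid)) != 0)
            return 1;
    return 0;
}

/* Index of the successful reply with the latest ctime, or -1 if none. */
static int latest_ctime(const struct lookup_reply *r, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (r[i].op_ret != 0)
            continue;
        if (best < 0 || r[i].ctime_sec > r[best].ctime_sec ||
            (r[i].ctime_sec == r[best].ctime_sec &&
             r[i].ctime_nsec > r[best].ctime_nsec))
            best = (int)i;
    }
    return best;
}

/* Pick the reply to use as the entry's iatt. parent_entrylk_held stands in for
 * the locks-xlator answer. Returns -1 when the gfids conflict and no parent
 * entrylk is held (the case left to conflicting-entry self-heal), and also
 * when every subvolume failed. */
static int choose_lookup_source(const struct lookup_reply *r, size_t n,
                                int parent_entrylk_held)
{
    if (!gfids_differ(r, n))
        return first_success(r, n);
    if (parent_entrylk_held)
        return latest_ctime(r, n);
    return -1;
}
```

  Comparing nanoseconds only on a seconds tie keeps the ordering correct
  without combining the two fields into one integer.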
* cluster/afr: Perform gfid-less lookup in afr
  Pranith Kumar K, 2012-04-12, 1 file changed, -0/+5 lines

  Change-Id: I78d9f0563e25047f392675ae32db38d2c94f6651
  BUG: 795355
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3129
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* self-heald: Find self-heal failures, split-brain
  Pranith Kumar K, 2012-04-05, 1 file changed, -2/+13 lines

  Change-Id: Ib967f0fe0b537fe60e51d7d05462b58a7f16596e
  BUG: 806745
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3077
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: adding extra data for fops
  Amar Tumballi, 2012-03-22, 1 file changed, -65/+78 lines

  With this change, the xlator APIs take a dictionary as an extra
  argument, which is passed between all the layers. This can be used to
  overload some of the operations.

  Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 782265
  Reviewed-on: http://review.gluster.com/2960
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
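  The effect of threading a dictionary through every layer can be
  sketched in C as below. `struct xdata`, the three layer functions, and
  the `want-extra` key are hypothetical; in GlusterFS the extra argument
  is a `dict_t`, which this sketch does not model. Layers that do not
  care about a key simply forward the dictionary, so only the layer that
  understands it changes behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for dict_t: a tiny table of string keys and int values. */
struct xdata {
    size_t      count;
    const char *keys[4];
    int         values[4];
};

/* Look up key; returns 0 and sets *out if found, -1 otherwise (also for NULL xdata). */
static int xdata_get(const struct xdata *x, const char *key, int *out)
{
    if (x == NULL)
        return -1;
    for (size_t i = 0; i < x->count; i++) {
        if (strcmp(x->keys[i], key) == 0) {
            *out = x->values[i];
            return 0;
        }
    }
    return -1;
}

/* Bottom layer: a key in xdata "overloads" the operation's behaviour. */
static int storage_stat(const char *path, struct xdata *xdata)
{
    int want_extra = 0;
    (void)path;
    if (xdata_get(xdata, "want-extra", &want_extra) == 0 && want_extra)
        return 2; /* reply would carry extra information */
    return 1;     /* plain reply */
}

/* Middle layers need not understand xdata; they forward it unchanged. */
static int cache_stat(const char *path, struct xdata *xdata)
{
    return storage_stat(path, xdata);
}

static int top_stat(const char *path, struct xdata *xdata)
{
    return cache_stat(path, xdata);
}
```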
* replicate: fix a glitch in up_count/down_count updates.
  Jeff Darcy, 2012-03-19, 1 file changed, -2/+24 lines

  Change-Id: I4919a98191bf7fe5edad9a149a129bcd177cd4a8
  BUG: 802522
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.com/2927
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Self-heald: Handle errors gracefully and show errors to users
  Pranith Kumar K, 2012-03-18, 1 file changed, -1/+1 lines

  Change-Id: I5424ebfadb5b2773ee6f7370cc2867a555aa48dd
  BUG: 800352
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2962
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: set_read_child when xactions in progress in fresh lookup
  Pranith Kumar K, 2012-03-18, 1 file changed, -3/+6 lines

  Change-Id: I33e0268635ae7a1f247b0052994e027f990083da
  BUG: 800755
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2963
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* afr: Copy loc->gfid independent of lookup being fresh or otherwise
  Krishnan Parthasarathi, 2012-03-18, 1 file changed, -5/+3 lines

  This change ensures that an entry self-heal following a lookup on
  that entry has loc->gfid filled in.

  Change-Id: If723c71ca43e1f062dcb99cbe5488342514dace0
  BUG: 786087
  Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
  Reviewed-on: http://review.gluster.com/2950
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* afr: Corrected getxattr 'key' matching in case of clrlk cmd
  Krishnan Parthasarathi, 2012-03-14, 1 file changed, -0/+3 lines

  - Added local->dict cleanup to afr_local_cleanup

  Change-Id: Ie1b96615735a9d2a2be1757cd016dbe225aae31c
  BUG: 800412
  Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
  Reviewed-on: http://review.gluster.com/2922
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: save the xattr obtained in the {f}xattrop_cbk in local
  Raghavendra Bhat, 2012-03-12, 1 file changed, -5/+19 lines

  If the {f}xattrop operation succeeds on one subvolume and fails on
  another (so the xattr dict received in the callback from the failed
  subvolume is NULL), afr unwinds with op_ret = 0, because the operation
  succeeded on one subvolume, but with a NULL xattr dict: afr does not
  save the xattr it receives in the callback in its local structure and
  instead sends whatever xattr it received in the last callback.

  Xlators above afr might segfault when they access the xattr, since
  they assume it is present when op_ret is 0.

  Change-Id: I50761a302150285f31dfdaa397f890c9370a989a
  BUG: 797119
  Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
  Reviewed-on: http://review.gluster.com/2813
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
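  A minimal C sketch of the idea behind this fix: keep an xattr from a
  successful reply in the per-fop local state, and unwind with that
  rather than with whatever the last callback delivered.
  `struct xattrop_local`, `struct xattr_dict`, and `xattrop_cbk` are
  hypothetical simplifications; reference counting of the saved dict is
  only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dict_t. */
struct xattr_dict {
    int refcount;
};

/* Hypothetical per-fop state kept in frame->local. */
struct xattrop_local {
    int                op_ret;     /* -1 until some subvolume succeeds */
    int                op_errno;   /* errno from a failed reply */
    struct xattr_dict *xattr;      /* saved from a successful reply */
    int                call_count; /* callbacks still outstanding */
};

/* Handle one subvolume's reply. Returns 1 when this was the last reply, meaning
 * the caller should unwind with local->op_ret and local->xattr, else 0. */
static int xattrop_cbk(struct xattrop_local *local, int op_ret, int op_errno,
                       struct xattr_dict *xattr)
{
    if (op_ret == 0) {
        local->op_ret = 0;
        /* Save the first non-NULL xattr from a success; real code would take a
         * reference here and release it during local cleanup. */
        if (local->xattr == NULL && xattr != NULL)
            local->xattr = xattr;
    } else if (local->op_ret != 0) {
        local->op_errno = op_errno;
    }
    return --local->call_count == 0;
}
```

  Because a failed reply never overwrites `local->xattr`, the order in
  which callbacks arrive no longer decides whether the caller sees a
  NULL dict alongside op_ret = 0.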
* cluster/afr: handle node failures in lookup
  Pranith Kumar K, 2012-03-05, 1 file changed, -2/+25 lines

  When a transaction is in progress, lookup depends on the inode ctx
  for the read-child. If the lookup fails on the read-child while
  another transaction is in progress, it should select as the new
  read-child the next child that is both a success_child and in
  fresh_children.

  Change-Id: I33a04b102966b63a64bacf8d2e29f0d0119fdac6
  BUG: 773225
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2858
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
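  The read-child fallback described above can be sketched in C as
  follows. The representation of child lists as arrays of child indices
  terminated by -1, and the function names, are assumptions made for
  illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: child lists are arrays of child indices terminated by -1. */
static int list_contains(const int *list, int child)
{
    for (size_t i = 0; list[i] != -1; i++)
        if (list[i] == child)
            return 1;
    return 0;
}

/* Keep the current read-child if lookup succeeded on it; otherwise fall back to
 * the first child in fresh_children on which lookup succeeded. -1 if none. */
static int select_read_child(int read_child, const int *success_children,
                             const int *fresh_children)
{
    if (read_child >= 0 && list_contains(success_children, read_child))
        return read_child;
    for (size_t i = 0; fresh_children[i] != -1; i++)
        if (list_contains(success_children, fresh_children[i]))
            return fresh_children[i];
    return -1;
}
```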
* cluster/afr: Add new option to know which process it is in
  Pranith Kumar K, 2012-03-01, 1 file changed, -3/+1 lines

  The afr xlator needs to maintain an inode table inside the xlator if
  it is running in the self-heal daemon. The code depended on the
  self-heal-daemon option to decide this, which is wrong because that
  option can be reconfigured on/off. Added a new option, which cannot be
  reconfigured, for this purpose.

  Change-Id: Idc42c403c4bd9b73d1f328427ae4158ff1420b3a
  BUG: 795741
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2787
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
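  A C sketch of why the decision belongs to a value that is read only at
  init time. The structs, field names, and the `priv_init` and
  `priv_reconfigure` functions are hypothetical; the point is that
  reconfiguration re-reads only the on/off toggle, so the inode-table
  decision cannot flip while the process is running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified xlator private state. */
struct afr_like_priv {
    bool shd_enabled;     /* reconfigurable: self-heal-daemon on/off */
    bool in_shd_process;  /* fixed at init: is this the self-heal daemon? */
    bool has_inode_table; /* derived from in_shd_process only */
};

/* Hypothetical option values handed to init and reconfigure. */
struct afr_like_opts {
    bool self_heal_daemon;
    bool in_shd_process;
};

static void priv_init(struct afr_like_priv *priv, const struct afr_like_opts *o)
{
    priv->shd_enabled = o->self_heal_daemon;
    priv->in_shd_process = o->in_shd_process;
    /* Base the inode-table decision on process identity, not the toggle. */
    priv->has_inode_table = priv->in_shd_process;
}

static void priv_reconfigure(struct afr_like_priv *priv,
                             const struct afr_like_opts *o)
{
    /* Only the toggle is re-read; process identity never changes after init. */
    priv->shd_enabled = o->self_heal_daemon;
}
```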
* core: utilize mempool for frame->local allocations
  Amar Tumballi, 2012-02-21, 1 file changed, -12/+12 lines

  In each translator that uses 'frame->local', we use GF_CALLOC/GF_FREE,
  which is costly given the number of allocations that happen during
  the lifetime of a fop. It is better to use the mem-pool framework for
  the xlators' local structures, so that there is no allocation
  overhead.

  Change-Id: Ida6e65039a24d9c219b380aa1c3559f36046dc94
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 765336
  Reviewed-on: http://review.gluster.com/2772
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
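  To illustrate why pooling removes per-fop allocation cost, here is a
  minimal fixed-capacity free-list pool in C. It is not the libglusterfs
  mem-pool API; `struct demo_local` and the `pool_*` functions are
  hypothetical. After one upfront allocation, getting and returning a
  local is a constant-time pointer swap with no call into the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-fop local structure. */
struct demo_local {
    int call_count;
    int op_ret;
};

/* Each slot holds either a free-list link or a live local, never both. */
union pool_slot {
    union pool_slot  *next;  /* valid while the slot is free */
    struct demo_local local; /* valid while the slot is in use */
};

struct local_pool {
    union pool_slot *slots;
    union pool_slot *free_list;
    size_t           capacity;
};

/* One upfront allocation; thread every slot onto the free list. */
static int pool_init(struct local_pool *pool, size_t capacity)
{
    pool->slots = calloc(capacity, sizeof(*pool->slots));
    if (pool->slots == NULL)
        return -1;
    pool->capacity = capacity;
    pool->free_list = NULL;
    for (size_t i = capacity; i > 0; i--) {
        pool->slots[i - 1].next = pool->free_list;
        pool->free_list = &pool->slots[i - 1];
    }
    return 0;
}

/* Pop a zeroed local; this sketch returns NULL when the pool is exhausted. */
static struct demo_local *pool_get(struct local_pool *pool)
{
    union pool_slot *slot = pool->free_list;
    if (slot == NULL)
        return NULL;
    pool->free_list = slot->next;
    slot->local = (struct demo_local){ 0 };
    return &slot->local;
}

/* Push a local back; a pointer to a union member converts to the union. */
static void pool_put(struct local_pool *pool, struct demo_local *local)
{
    union pool_slot *slot = (union pool_slot *)local;
    slot->next = pool->free_list;
    pool->free_list = slot;
}

static void pool_destroy(struct local_pool *pool)
{
    free(pool->slots);
    pool->slots = NULL;
    pool->free_list = NULL;
    pool->capacity = 0;
}
```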
* cluster/afr: Add commands to see self-heald ops
  Pranith Kumar K, 2012-02-20, 1 file changed, -6/+26 lines

  Change-Id: Id92d3276e65a6c0fe61ab328b58b3954ae116c74
  BUG: 763820
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2775
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Self-heald, Index integration
  Pranith Kumar K, 2012-02-20, 1 file changed, -13/+6 lines

  Change-Id: Ic68eb00b356a6ee3cb88fe2bde50374be7a64ba3
  BUG: 763820
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2749
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>