path: root/xlators/cluster/afr
Commit message | Author | Age | Files | Lines
* syncop: Include iatt to 'syncop_link' args | Soumya Koduri | 2015-07-10 | 1 | -1/+1

  Include iatt to 'syncop_link' args to fetch proper attributes of the
  newly linked inode.

  Signed-off-by: Soumya Koduri <skoduri@redhat.com>
  Change-Id: If6b92961bd7a89add3791ed3a9b494087348b492
  BUG: 1241788
  Reviewed-on: http://review.gluster.org/11611
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/afr : expunge first, impunge next in entry selfheal | Anuradha Talur | 2015-07-06 | 1 | -2/+5

  When entry self-heals are performed, the files/directories that are
  to be expunged should be removed first and then impunge should be done.

  Consider the following scenario:
  A volume with 2 bricks: b0 and b1.

  1) With following hierarchy on both bricks:
     olddir
     |__ oldfile
  2) Bring down b1 and do 'mv olddir newdir'.
  3) Bring up b1 and self-heal.
  4) Without patch, during self-heal the events occur in following order,
     a) Creation of newdir on the sink brick. Notice that gfid of olddir
        and newdir are same. As a result of which gfid-link file in
        .glusterfs directory still points to olddir and not to newdir.
     b) Deletion of olddir on the sink brick. As a part of this deletion,
        the gfid link file is also deleted. Now, there is no link file
        pointing to newdir.
  5) Files under newdir will not get listed as part of readdir.

  To tackle this kind of scenario, an expunge should be done first and
  impunge later; which is the purpose of this patch.

  Change-Id: Idc8546f652adf11a13784ff989077cf79986bbd5
  BUG: 1238508
  Signed-off-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/11498
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
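The ordering problem described in this commit message can be reproduced with a toy model. This is only a sketch and not AFR code: the `Brick` class, its `gfid_link` table (a stand-in for the gfid link files under `.glusterfs`), and the `heal` helper are all hypothetical names chosen for illustration.

```python
class Brick:
    """Toy sink brick: directory entries plus a gfid -> name link table."""

    def __init__(self):
        self.entries = {}    # name -> gfid
        self.gfid_link = {}  # gfid -> name (stand-in for the .glusterfs link)

    def impunge(self, name, gfid):
        # Create the entry; an already existing gfid link is left untouched.
        self.entries[name] = gfid
        self.gfid_link.setdefault(gfid, name)

    def expunge(self, name):
        # Remove the entry and, as part of the deletion, its gfid link.
        gfid = self.entries.pop(name)
        self.gfid_link.pop(gfid, None)


def heal(brick, expunge_first):
    """Apply the rename 'olddir -> newdir' to the sink in the chosen order."""
    steps = [("expunge", "olddir"), ("impunge", "newdir")]
    if not expunge_first:
        steps.reverse()
    for op, name in steps:
        if op == "expunge":
            brick.expunge(name)
        else:
            brick.impunge(name, "gfid-1")
    return brick.gfid_link.get("gfid-1")


def make_sink():
    sink = Brick()
    sink.impunge("olddir", "gfid-1")  # state of b1 before it went down
    return sink


old_order = heal(make_sink(), expunge_first=False)  # impunge, then expunge
new_order = heal(make_sink(), expunge_first=True)   # expunge, then impunge
print(old_order, new_order)
```

In the model, the impunge-first order leaves no link for the gfid at all, while the expunge-first order ends with the link pointing at `newdir`.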
* afr: Porting messages to new logging framework | arao | 2015-06-27 | 16 | -473/+748

  updated

  Change-Id: I94ac7b2cb0d43a82cf0eeee21407cff9b575c458
  BUG: 1194640
  Signed-off-by: arao <arao@redhat.com>
  Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com>
  Reviewed-on: http://review.gluster.org/9897
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : truncate all sinks files | Anuradha | 2015-06-26 | 1 | -14/+11

  Problem: During data self-heal of sparse files, sparseness of files
  is lost.

  Cause: Earlier, only files with larger ia_size in sinks were being
  truncated to ia_size of source. This caused checksum mismatch of
  sparse blocks when ia_size of files in sinks was less than ia_size
  of the source file, leading to unnecessary healing of sparse blocks.
  As a result, sparseness of files was lost.

  Solution: Truncate files in all the sinks irrespective of their size
  with respect to the source file. After this change, checksum won't
  mismatch for sparse blocks and heal won't be triggered. As a result,
  sparseness of the files will be preserved.

  Other fixes in this patch:
  1) in afr_does_size_mismatch(), check for mismatch only in sources.
     Previously, the check was being done for all children in a replica.
  2) in __afr_selfheal_data_checksums_match(), check checksum mismatch
     only for children with valid responses.

  Change-Id: Ifcdb1cdc9b16c4a8a7867aecf9fa94b66e5301c2
  BUG: 1232238
  Signed-off-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/11252
  Reviewed-by: Prasanna Kumar Kalever
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: Block fops when file is in split-brain | Ravishankar N | 2015-06-26 | 4 | -14/+76

  For directories, block metadata FOPS.
  For non-directories, block data and metadata FOPS.
  Do not block entry FOPS.

  Change-Id: Id7f656f4a513b9d33c457dd7f2d58028dbef8e61
  BUG: 1235007
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11371
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
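The three rules in this commit message amount to a small decision table. The sketch below is one way to write it down; `FopKind` and `fop_blocked` are hypothetical names and do not correspond to identifiers in the AFR source.

```python
from enum import Enum


class FopKind(Enum):
    DATA = "data"
    METADATA = "metadata"
    ENTRY = "entry"


def fop_blocked(is_directory, kind, in_split_brain):
    """Return True if the fop should be refused under the rules listed above."""
    if not in_split_brain or kind is FopKind.ENTRY:
        return False  # entry fops are never blocked
    if is_directory:
        return kind is FopKind.METADATA
    return kind in (FopKind.DATA, FopKind.METADATA)
```

For example, `fop_blocked(False, FopKind.DATA, True)` is true for a regular file in split-brain, while the same call with `FopKind.ENTRY` is false.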
* cluster/afr : set pending xattrs for replaced brick | Anuradha | 2015-06-25 | 5 | -10/+250

  This patch is part two change to prevent data loss in a replicate
  volume on doing a replace-brick commit force operation.

  Problem: After doing replace-brick commit force, there is a chance
  that self heal might happen from the replaced (sink) brick rather
  than the source brick leading to data loss.

  Solution: Mark pending changelogs on afr children for the replaced
  afr-child so that heal is performed in the correct direction.

  Change-Id: Icb9807e49b4c1c4f1dcab115318d9a58ccf95675
  BUG: 1207829
  Signed-off-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/10448
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/afr: Pick gfid from poststat during fresh lookup for read child calculation | Krutika Dhananjay | 2015-06-24 | 6 | -41/+68

  Change-Id: I12c1e4f67f4ec4affbe13d7daf871044a8a2a12e
  BUG: 1235216
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/11373
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* afr: complete conservative merge even in case of gfid split-brain. | Ravishankar N | 2015-06-22 | 1 | -2/+18

  Problem:
  While performing conservative merge, we bail out of the merge if we
  encounter a file with mismatching gfid or type. What this means is
  all entries that come after the mismatching file (during the merge)
  never get healed, no matter how many index heals are done.

  Fix:
  Continue with the merging of rest of the entries even if a gfid/type
  mismatch is found, but ensure that post-op does not happen on the
  parent dir in such a case.

  Change-Id: I9bbfccc8906007daa53a0750ddd401dcf83943f8
  BUG: 1180545
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/9429
  Reviewed-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
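The control-flow change in this fix can be sketched as a loop that records a mismatch instead of returning early. This is a simplified model under stated assumptions: `MismatchError`, `conservative_merge`, and the `heal_entry` callback are hypothetical and only mirror the behaviour described in the commit message.

```python
class MismatchError(Exception):
    """Raised by the hypothetical heal_entry callback on a gfid/type mismatch."""


def conservative_merge(entries, heal_entry):
    """Heal every entry; return True only if the parent post-op may run."""
    mismatch_seen = False
    for name in entries:
        try:
            heal_entry(name)
        except MismatchError:
            mismatch_seen = True  # remember it, but keep merging the rest
    return not mismatch_seen


healed = []


def demo_heal(name):
    if name == "bad":
        raise MismatchError(name)
    healed.append(name)


post_op_allowed = conservative_merge(["a", "bad", "c"], demo_heal)
print(healed, post_op_allowed)
```

Entries that follow the mismatching one are still healed; only the post-op on the parent directory is withheld.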
* cluster/afr: Do not attempt entry self-heal if the last lookup on entry failed on src | Krutika Dhananjay | 2015-06-08 | 1 | -2/+9

  Test bug-948686.t was causing shd to dump core due to gfid being NULL.
  This was due to the volume being stopped while index heal's in
  progress, causing afr_selfheal_unlocked_lookup_on() to fail sometimes
  on the src brick with ENOTCONN. And when afr_selfheal_newentry_mark()
  copies the gfid off the src iatt, it essentially copies null gfid.
  This was causing the assertion as part of xattrop in protocol/client
  to fail.

  Change-Id: I237a0d6b1849e4c48d7645a2cc16d9bc1441ef95
  BUG: 1229172
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/11119
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: honour selfheal enable/disable volume set options | Ravishankar N | 2015-06-03 | 2 | -3/+11

  afr-v1 had the following volume set options that are used to enable/
  disable self-heals from happening in AFR xlator when loaded in the
  client graph:

    cluster.metadata-self-heal
    cluster.data-self-heal
    cluster.entry-self-heal

  In afr-v2, these 3 heals can happen from the client if there is an
  inode refresh. This patch allows such heals to proceed only if the
  corresponding volume set options are set to true.

  Change-Id: I8d97d6020611152e73a269f3fdb607652c66cc86
  BUG: 1226507
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11012
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
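The gating described here can be pictured as a filter over the heal types that an inode refresh would otherwise trigger. The sketch below assumes the options arrive as a plain dictionary of booleans; `heals_to_run` is a hypothetical helper, and treating a missing option as disabled is a choice made for this sketch, not a statement about GlusterFS defaults.

```python
def heals_to_run(volume_options, heals_needed):
    """Keep only the heal types whose cluster.<type>-self-heal option is on."""
    allowed = []
    for heal_type in heals_needed:  # e.g. "metadata", "data", "entry"
        option = f"cluster.{heal_type}-self-heal"
        # Assumption of this sketch: an absent option counts as disabled.
        if volume_options.get(option, False):
            allowed.append(heal_type)
    return allowed


options = {
    "cluster.metadata-self-heal": True,
    "cluster.data-self-heal": False,
    "cluster.entry-self-heal": True,
}
print(heals_to_run(options, ["metadata", "data", "entry"]))
```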
* build: do not #include "config.h" in each file | Niels de Vos | 2015-05-29 | 16 | -79/+0

  Instead of including config.h in each file, have config.h included
  from the compiler command line (-include option).

  When a .c file tests for a certain #define, and config.h was not
  included, incorrect assumptions were made. With this change, it can
  not happen again.

  BUG: 1222319
  Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/10808
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: allow readdir to proceed for directories in split-brain | Ravishankar N | 2015-05-28 | 1 | -18/+22

  Problem:
  afr_read_txn() bails out if read_subvol==-1. This meant that for
  directories that were in entry split-brain, FOPS like readdir, access,
  stat etc were not allowed.

  Fix:
  Except for getxattr, all other FOPS are wound on the first up child
  of afr.

  Change-Id: Iacec8fbb1e75c4d2094baa304f62331c81a6f670
  BUG: 1221481
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10776
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Reviewed-by: Anuradha Talur <atalur@redhat.com>
  Tested-by: NetBSD Build System
* cluster/afr: Treat op_ret >= 0 as success in afr_final_errno() | Krutika Dhananjay | 2015-05-27 | 1 | -1/+1

  Change-Id: I7ec29428b7f7ef249014f948a5d616bfb8aaf80d
  BUG: 1225491
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/10946
  Tested-by: NetBSD Build System
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : Do not copy dict when it is NULL | Anuradha | 2015-05-13 | 1 | -1/+1

  In afr_lookup_xattr_req_prepare(), dict_copy was done even though
  source dict was NULL.

  Change-Id: I85a5d2823ba021e7f78c1ce13402a0f16b08cb51
  BUG: 1220332
  Signed-off-by: Anuradha <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/10755
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Susant Palai <spalai@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr : Prevent inode-evict during split-brain resolution | Anuradha | 2015-05-07 | 5 | -58/+303

  1) Provided setfattr command to set timeout for split-brain choice.
  2) If split-brain inspection/resolution is being done from the mount
     for a file, ref the inode when split-brain-choice is set. This
     inode will be unconditionally unref-ed after timeout seconds set
     by the user/default otherwise.
  3) Updated the doc and testcase to reflect the changes.

  Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1
  BUG: 1209104
  Signed-off-by: Anuradha <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/10134
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: add arbitration support | Ravishankar N | 2015-05-05 | 8 | -32/+207

  Add logic in afr to work in conjunction with the arbiter xlator when
  a replica 3 arbiter volume is created. More specifically, this patch:

  * Enables full locks for afr data transaction for such volumes.
  * Removes the upfront marking of pending xattrs at the time of pre-op
    and defer it to post-op. (This is an arbiter independent change and
    is made for all afr transactions.)
  * After pre-op stage, check if we can proceed with the fop stage
    without ending up in split-brain by examining the changelog xattrs.
  * Unwinds the fop with failure if only one source was available at
    the time of pre-op and the fop happened to fail on particular
    source brick.
  * Skips data self-heal if arbiter brick is the only source available.
  * Adds the arbiter-count option to the shd graph.

  This patch is a part of the arbiter logic implementation for 3 way
  AFR details of which can be found at
  http://review.gluster.org/#/c/9656/

  Change-Id: I9603db9d04de5626eb2f4d8d959ef5b46113561d
  BUG: 1199985
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10258
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix dictionary compare function | Pranith Kumar K | 2015-05-04 | 1 | -35/+6

  If both dicts are NULL then equal. If one of the dicts is NULL but
  the other has only ignorable keys then also they are equal. If both
  dicts are non-null then check if for each non-ignorable key, values
  are same or not.

  value_ignore function is used to skip comparing values for the keys
  which must be present in both the dictionaries but the value could
  be different.

  geo-rep's stime xattr doesn't need to be present in list xattr but
  when getxattr comes on stime xattr even if there aren't enough
  responses with the xattr we should still give out an answer which is
  maximum of the stimes available.

  Change-Id: I8de2ceaa2db785b797f302f585d88e73b154167d
  BUG: 1207712
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/10078
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
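The comparison rules in this commit message can be modelled with ordinary Python dictionaries. This is a minimal sketch of the described semantics, not the C implementation: `dicts_equal`, the `ignorable` set, and the `value_ignore` predicate signature are assumptions made for illustration.

```python
def dicts_equal(d1, d2, ignorable=frozenset(), value_ignore=lambda key: False):
    """Compare two dicts, where None plays the role of a NULL dict.

    Keys in `ignorable` are skipped entirely. Keys for which
    value_ignore(key) is true must exist on both sides, but their
    values are not compared.
    """
    d1 = d1 or {}
    d2 = d2 or {}
    keys = (set(d1) | set(d2)) - set(ignorable)
    for key in keys:
        if key not in d1 or key not in d2:
            return False
        if not value_ignore(key) and d1[key] != d2[key]:
            return False
    return True
```

With this model, `dicts_equal(None, {"skip": 1}, ignorable={"skip"})` is true, while `dicts_equal(None, {"k": 1})` is false because a non-ignorable key is present on one side only.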
* arbiter: load arbiter xlator on every 3rd brick of a replica 3 AFR subvol | Ravishankar N | 2015-04-27 | 2 | -0/+8

  Logic for adding the 'glusterd_brickinfo->group' member and using it
  to find the brick position has been taken from
  http://review.gluster.org/#/c/9919. Thanks to Jeff Darcy for that.

  This patch is a part of the arbiter logic implementation for 3 way
  AFR details of which can be found at
  http://review.gluster.org/#/c/9656/

  Change-Id: Idbfe4f29ee8e098e0102def8f38b32314316b188
  BUG: 1199985
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10257
  Tested-by: NetBSD Build System
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* cluster/afr,dht: Fix memleak after syncop_readlink | Pranith Kumar K | 2015-04-23 | 1 | -0/+1

  Change-Id: Ia71c14c2c2709c541075748c9011437e0d8cac4b
  BUG: 1213542
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/10305
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs/syncop: Add xdata to all syncop calls | Raghavendra Talur | 2015-04-08 | 6 | -39/+49

  This patch adds support for xdata in both the request and response
  path of syncops. Few calls like lookup already had the support; have
  renamed variables in few places to maintain uniformity.

  xdata passed downwards is known as xdata_in and xdata passed upwards
  is known as xdata_out.

  There is an old patch by Jeff Darcy at
  http://review.gluster.org/#/c/8769/3 which does the same for some
  selected calls. It also brings in xdata support at gfapi level.
  xdata support at gfapi level would be introduced in subsequent
  patches.

  Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc
  BUG: 1158621
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-on: http://review.gluster.org/9859
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr : null dereference coverity fix. | Manikandan Selvaganesh | 2015-04-08 | 1 | -14/+25

  CID : 1194648

  Change-Id: Ib26e7cdbf412d563240885fb3113bcc1fe5c9c49
  BUG: 789278
  Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
  Reviewed-on: http://review.gluster.org/9571
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Cluster/afr : Coverity fix. | Manikandan Selvaganesh | 2015-04-08 | 1 | -4/+0

  CID:1194644
  Childup[] value will not be equal to -1 when afr_xl_op() function
  gets called

  Change-Id: Iaf7a9d41a54f6b2d52d9ba5dadb638f328afe14b
  BUG: 789278
  Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
  Reviewed-on: http://review.gluster.org/9540
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Avoid conflict between contrib/uuid and system uuid | Emmanuel Dreyfus | 2015-04-04 | 12 | -64/+64

  glusterfs relies on the Linux uuid implementation, whose API is
  incompatible with most other systems' uuid. As a result, libglusterfs
  has to embed contrib/uuid, which is the Linux implementation, on
  non-Linux systems. This implementation is incompatible with the
  system's built-in one, but the symbols have the same names.

  Usually this is not a problem because when we link with -lglusterfs,
  libc's symbols are trumped. However there is a problem when a program
  not linked with -lglusterfs will dlopen() a glusterfs component. In
  such a case, libc's uuid implementation is already loaded in the
  calling program, and it will be used instead of libglusterfs's
  implementation, causing crashes.

  A possible workaround is to pre-load libglusterfs in the calling
  program (using LD_PRELOAD on NetBSD for instance), but such a
  mechanism is not portable, nor is it flexible. A much better approach
  is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid
  any possible conflict. This is what this change attempts.

  BUG: 1206587
  Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
  Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  Reviewed-on: http://review.gluster.org/10017
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Tests: fix spurious failure in read-subvol-entry.t | Emmanuel Dreyfus | 2015-04-02 | 1 | -1/+1

  read-subvol-entry.t tests that if a brick has pending operations, it
  is not used for readdir operations. On NetBSD this test exhibits
  spurious failures, with the wrong brick being used to perform readdir.

  It happens because when afr_replies_interpret() looks at xattr for
  pending attributes, it uses alternative behavior depending on whether
  it is working on a directory or another object. The decision is based
  on inode->ia_type, which may be IA_INVAL at that time if we come
  there from:

    afr_replies_interpret()
    afr_xattrs_are_equal()
    afr_lookup_metadata_heal_check()
    afr_lookup_entry_heal()
    afr_lookup_cbk()

  Using replies[i].poststat.ia_type, which is correctly set, works
  around the problem.

  BUG: 1129939
  Change-Id: Id9ccdd8604f79a69db5f1902697f8913acac50ad
  Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  Reviewed-on: http://review.gluster.org/9831
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Xlators : Fixed typos | Manikandan Selvaganesh | 2015-04-02 | 3 | -3/+3

  Change-Id: I948f85cb369206ee8ce8b8cd5e48cae9adb971c9
  BUG: 1075417
  Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
  Reviewed-on: http://review.gluster.org/9529
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
* cluster/ec: Implement heal info for ec | Pranith Kumar K | 2015-03-30 | 1 | -1/+1

  This also lists the files that have on-going I/O, which will be
  fixed later.

  Change-Id: Ib3f60a8b7e8798d068658cf38eaef2a904f9e327
  BUG: 1203581
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/10020
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* libxlator: Change marker xattr handling interface | Pranith Kumar K | 2015-03-25 | 2 | -72/+20

  - Changed the implementation of marker xattr handling to take just a
    function which populates important data that is different from
    default 'gauge' values and subvolumes where the call needs to be
    wound.
  - Removed duplicate code I found while reading the code and moved it
    to cluster_marker_unwind. Removed unused structure members.
  - Changed dht/afr/stripe implementations to follow the new
    implementation
  - Implemented marker xattr handling for ec.

  Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4
  BUG: 1200372
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/9892
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Implementation of bit-rot xlator | Venky Shankar | 2015-03-24 | 1 | -12/+0

  This is the "Signer" -- responsible for signing files with their
  checksums upon last file descriptor close (last release()). The event
  notification facility provided by the changelog xlator is made use of.

  Moreover, checksums are as of now the SHA256 hash of the object data,
  which is the only available hash at this point of time. Therefore,
  there is no special "what hash to use" type check, although it does
  not take much to add various hashing algorithms to sign objects with.

  Signatures are stored in extended attributes of the objects along
  with the type of hashing used to calculate the signature. This makes
  things future-proof when other hash types are added. The signature
  infrastructure is provided by bitrot stub: a little piece of code
  that sits over the POSIX xlator providing interfaces to "get or set"
  an object's signature and its staleness.

  Since objects are signed upon receiving release() notification,
  pre-existing data which are "never" modified would never be signed.
  To counter this, an initial crawler thread is spawned. The crawler
  scans the entire brick for objects that are unsigned or "missed"
  signing due to the server going offline (node reboots, crashes,
  etc..) and triggers an explicit sign. This would also sign objects
  when bit-rot is enabled for a volume and/or after upgrade.

  Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b
  BUG: 1170075
  Original-Author: Raghavendra Bhat <raghavendra@redhat.com>
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/9711
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : enable inspection & resolution of files in split-brain | Anuradha | 2015-03-19 | 5 | -33/+336

  Part 2/2 patch to enable users analyze and resolve split-brain.

  This patch enables:
  1) Users to inspect the files in data and metadata split-brain.
  2) Resolve the split-brain.
  Both using a series of setfattr commands.

  Consider a volume "test" with 2 bricks.

  1) To inspect a file f1:
       setfattr -n replica.split-brain-choice -v test-client-0 f1
     After the execution of this command, if no read_subvol is found,
     reads will be served from test-client-0 (corresponding to brick-0).

  2) To resolve split-brain:
       setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1
     Execution of this command will lead to the resolution of data and
     metadata split-brain with subvol mentioned in the command
     (test-client-0 here) as the source and the rest as sink.

  Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179
  BUG: 1191396
  Signed-off-by: Anuradha <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/9743
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/quota : Introducing inode quota | vmallika | 2015-03-18 | 1 | -18/+27

  ==========================================================================
                                Inode quota
  ==========================================================================
  Currently, the only way to retrieve the number of files/objects in a
  directory or volume is to do a crawl of the entire directory/volume.
  This is expensive and is not scalable.

  The proposed mechanism will provide an easier alternative to determine
  the count of files/objects in a directory or volume.

  The new mechanism proposes to store count of objects/files as part of
  an extended attribute of a directory. Each directory's extended
  attribute value will indicate the number of files/objects present
  in a tree with the directory being considered as the root of the tree.

  The count value can be accessed by performing a getxattr().
  Cluster translators like afr, dht and stripe will perform aggregation
  of count values from various bricks when getxattr() happens on the key
  associated with file/object count.

  A new interface is introduced:
  ------------------------------
  limit-objects  : limit the number of inodes at directory level
  list-objects   : list the directories where the limit is set
  remove-objects : remove the limit from the directory
  ==========================================================================
  CLI COMMAND:
  gluster volume quota <volname> limit-objects <path> <number> [<percent>]

  * <number> is a hard-limit for number of objects limitation for path
    "<path>". If hard-limit is exceeded, creation of file/directory is
    no longer permitted.
  * <percent> is a soft-limit for number of objects creation for path
    "<path>". If soft-limit is exceeded, a warning is issued for each
    creation.

  CLI COMMAND:
  gluster volume quota <volname> remove-objects [path]
  ==========================================================================
  CLI COMMAND:
  gluster volume quota <volname> list-objects [path] ...

  Sample output:
  ------------------
  Path  Hard-limit  Soft-limit  Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
  -------------------------------------------------------------------------------------------
  /dir  10          80%         10    0          Yes                   Yes
  ==========================================================================
  [root@snapshot-28 dir]# ls
  a b file11 file12 file13 file14 file15 file16 file17
  [root@snapshot-28 dir]# touch a1
  touch: cannot touch `a1': Disk quota exceeded

  * Nine files are created in directory "dir" and directory is included
    in the count too. Hence the limit "10" is reached and further file
    creation fails
  ==========================================================================
  Note: We have also done some re-factoring in cli for volume name
  validation. New function cli_validate_volname is created
  ==========================================================================

  Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b
  BUG: 1190108
  Signed-off-by: Sachin Pandit <spandit@redhat.com>
  Signed-off-by: vmallika <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/9769
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
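The counting rule and the two limits from this commit message can be worked through with the sample directory. The sketch below is a toy model only: `object_count` and `check_create` are hypothetical helpers, and the exact comparison used for "soft-limit exceeded" (strictly greater than the percentage of the hard limit) is an assumption of this sketch.

```python
def object_count(tree):
    """Count a directory plus everything beneath it.

    Nested dicts stand for subdirectories; any other value is a file.
    """
    return 1 + sum(
        object_count(child) if isinstance(child, dict) else 1
        for child in tree.values()
    )


def check_create(used, hard_limit, soft_percent):
    """Classify one creation attempt as 'denied', 'warn' or 'ok'."""
    if used >= hard_limit:
        return "denied"  # hard limit reached: creation is refused
    if used > hard_limit * soft_percent / 100:
        return "warn"    # assumed meaning of "soft-limit exceeded"
    return "ok"


# The sample directory from the commit message: nine files inside "dir".
names = ["a", "b", "file11", "file12", "file13",
         "file14", "file15", "file16", "file17"]
dir_tree = {name: None for name in names}

used = object_count(dir_tree)
print(used, check_create(used, hard_limit=10, soft_percent=80))
```

Nine files plus the directory itself give a count of 10, which matches the `Used` column in the sample output and explains why `touch a1` is refused.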
* cluster/afr: Make read child match check in afr optional | Krutika Dhananjay | 2015-03-18 | 3 | -0/+19

  Having this particular check which was introduced by commit
  c78998c39f0857ea7aacba360632c148afc54a55 causes a drop in performance
  in readdirp. So the behavior is made configurable with this patch.

  Change-Id: I2858fc18b3539df7aa6d3f489e0d5cfaeb8a9b3c
  BUG: 1202669
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/9917
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: remove stale index entries | Ravishankar N | 2015-03-17 | 3 | -4/+66

  Problem:
  During pre-op phase, the index xlator
  1. Creates the entry inside .glusterfs/indices/xattrop
  2. Winds the xattrop fop to posix to mark dirty/pending changelogs.

  If the brick crashes after 1, the xattrop entry becomes stale and
  never gets removed by shd during subsequent crawls because there is
  nothing to heal (changelogs are zero).

  Though the stale entry does not get displayed in the output of the
  'heal info' command, it nevertheless stays there forever unless a new
  write transaction is performed on the file.

  Fix:
  During index self-heal, if afr xattrs are found to be clean
  (indicated by ret value of 2 on a call to afr_shd_selfheal()), send a
  dummy post-op with all 0s for the xattr values, which makes the index
  xlator unlink the stale entry.

  Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919
  BUG: 1190069
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/9714
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Convert quota size from n/w to host order before use | Krutika Dhananjay | 2015-03-12 | 1 | -0/+4

  Change-Id: I3e4fe15716556441546fcd62b8ac2833869b21cf
  BUG: 1200670
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/9853
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Anuradha Talur <atalur@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
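Why the conversion matters can be seen by decoding the same bytes both ways. The sketch below assumes, purely for illustration, that the size travels as an 8-byte signed integer in network (big-endian) order; the commit message does not state the width, and `quota_size_from_wire` is a hypothetical helper.

```python
import struct


def quota_size_from_wire(raw):
    """Decode an 8-byte signed size sent in network (big-endian) order."""
    (size,) = struct.unpack("!q", raw)
    return size


wire = struct.pack("!q", 4096)  # what a brick would send for 4096 bytes

# Reading the same bytes as little-endian, which is what a little-endian
# host effectively does if it skips the conversion, gives a wildly
# different number.
misread = int.from_bytes(wire, "little", signed=True)

print(quota_size_from_wire(wire), misread)
```

Comparing or aggregating sizes before the conversion would therefore operate on meaningless values on little-endian hosts.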
* cluster/ec: Add self-heal-daemon command handlers | Pranith Kumar K | 2015-03-09 | 2 | -12/+12

  This patch introduces the changes required in ec xlator to handle
  index/full heal.

  Index healer threads:
  Ec xlator start an index healer thread per local brick. This thread
  keeps waking up every minute to check if there are any files to be
  healed based on the indices kept in index directory. Whenever
  child_up event comes, then also this index healer thread wakes up and
  crawls the indices and triggers heal. When self-heal-daemon is
  disabled on this particular volume then the healer thread keeps
  waiting until it is enabled again to perform heals.

  Full healer threads:
  Ec xlator starts a full healer thread for the local subvolume
  provided by glusterd to perform full crawl on the directory hierarchy
  to perform heals. Once the crawl completes the thread exits if no
  more full heals are issued.

  Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic.

  Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b
  BUG: 1177601
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/9787
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Handle getxattr of quota-size key  Pranith Kumar K  2015-03-09  3  -61/+103
| Afr needs to query QUOTA_SIZE_KEY from all the subvolumes and return the
| maximum value among the readable bricks.
|
| Change-Id: Ibb9064c8652aea0d984796e7a06f8adca72aa971
| BUG: 1199431
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/9820
| Reviewed-by: Anuradha Talur <atalur@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
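The aggregation rule described above can be sketched as follows. The data layout (one reply per subvolume, `None` for a failed reply, a parallel list of readable flags) and the function name are assumptions made for illustration.

```python
def aggregate_quota_size(replies, readable):
    """Return the largest size reported by a readable brick, or None if none replied.

    replies  -- one size per subvolume, None where the getxattr failed
    readable -- parallel list of booleans marking readable bricks
    """
    sizes = [
        size
        for size, ok in zip(replies, readable)
        if ok and size is not None
    ]
    return max(sizes) if sizes else None
```

A brick that is not readable is ignored even when it reports the largest number, since its copy may be stale.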
* cluster/afr: Implementation of quorum-reads  Pranith Kumar K  2015-03-05  5  -2/+32
| Provide a way of disabling reads when quorum is not met.
|
| Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5
| BUG: 1187885
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/9543
| Reviewed-by: Ravishankar N <ravishankar@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
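A rough sketch of what gating reads on quorum could look like. It assumes a plain majority rule for quorum and hypothetical function names; the actual quorum policy is configurable and is not described in this commit message.

```python
def quorum_met(up: int, total: int) -> bool:
    """Assumed rule: quorum means strictly more than half of the bricks are up."""
    return 2 * up > total


def read_allowed(up: int, total: int, quorum_reads: bool) -> bool:
    """With quorum-reads on, refuse reads without quorum; otherwise any up brick will do."""
    return quorum_met(up, total) if quorum_reads else up > 0
```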
* cluster/afr: Do not increment healed_count if no healing was performed  Krutika Dhananjay  2015-03-04  7  -67/+92
| PROBLEM:
| When file modifications are happening while index heal is launched, index
| healer could pick up entries which appeared in indices/xattrop transiently
| during the course of the operations on the mount point, and do not really
| need any heal. This will cause index healer to keep doing index-heal in a
| loop as long as it finds this entry, by believing that it did successfully
| heal some gfids even when it didn't.
|
| FIX:
| afr_selfheal() now returns a 1 to indicate that it did not (need to) heal a
| given gfid. afr_shd_selfheal() will not increment healed_count whenever
| afr_selfheal() returns a 1.
|
| Change-Id: I0d97e11392a032a852e8c6508f691300ef0e5b98
| BUG: 1194305
| Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-on: http://review.gluster.org/9713
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Reviewed-by: Ravishankar N <ravishankar@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
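The counting rule in the FIX paragraph can be sketched like this. The return value 1 for "no heal needed" is from the commit message; treating 0 as "healed", the function names, and the `max_passes` guard are assumptions for illustration.

```python
def index_heal_pass(entries, selfheal):
    """Run one pass over the index and count only gfids that were really healed."""
    healed_count = 0
    for gfid in entries:
        if selfheal(gfid) == 0:  # assumed: 0 = healed, 1 = nothing to heal
            healed_count += 1
    return healed_count


def index_heal(entries, selfheal, max_passes=10):
    """Repeat passes while the previous pass healed something; return passes run."""
    passes = 0
    while passes < max_passes:
        passes += 1
        if index_heal_pass(entries, selfheal) == 0:
            break
    return passes
```

If every entry is transient and reports 1, the healer stops after a single pass instead of looping on entries it never actually healed.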
* glfs_fini: Clean up all the resources allocated in glfs_new.  Poornima G  2015-03-04  1  -1/+0
| Initially even after calling glfs_fini(), all the threads created during
| init and many other resources like memory pool, iobuf pool, event pool and
| other memory allocs were not being freed. With this patch these resources
| are freed in glfs_fini().
|
| The two rules of thumb followed in this patch are:
| - The threads are not killed; they are made to exit voluntarily once the
|   queued tasks are completed. The main thread waits for the other threads
|   to exit.
| - Free the memory pools and destroy the graphs only after all the other
|   threads are stopped, so that there is less chance of hitting access
|   after free.
|
| Resources freed and their order:
| 1. Destroy the inode table of all the graphs - Call forget on all the
|    inodes. This will not be required when the cleanup during graph switch
|    is implemented to perform inode table destroy.
| 2. Deactivate the current graph, call fini of all the xlators.
| 3. Syncenv destroy - Join the synctask threads and clean up syncenv
|    resources. Sets the destroy mode, completes the existing synctasks,
|    then joins the synctask threads. After entering the destroy mode,
|    - if a new synctask is submitted, it fails.
|    - if syncenv_new() is called, it will end up creating new threads,
|      but this is called only during init.
| 4. Poller thread destroy - Register an event handler which sets the
|    destroy mode for the poller. Once the poller is done processing all
|    the events, it exits.
| 5. Tear down the logging framework - The log file is closed and the log
|    level is set to none; after this point no log messages appear either
|    in the log file or in stderr.
| 6. Destroy the timer thread - Set the destroy bit; once the pending timer
|    events are processed the timer thread exits.
|    Note: Log infrastructure should be shut down before destroying the
|    timer thread as gf_log uses timers.
| 7. Destroy the glusterfs_ctx_t - For all the graphs (active and passive),
|    free graph, xlator structs and few other lists. Free the memory pools -
|    iobuf pool, event pool, dict, logbuf pool, stub mem pool, stack mem
|    pool, frame mem pool.
|
| Few things not addressed in this patch:
| 1. rpc_transport object not destroyed; the PARENT_DOWN should have
|    destroyed this object but has not. Needs to be addressed as a part of
|    a different patch.
| 2. Each xlator fini should clean up the local pool allocated by its
|    xlator. Needs to be addressed as a part of a different patch.
| 3. Each xlator should implement forget to free its inode_ctx. Needs to be
|    addressed as a part of a different patch.
| 4. Few other leaks reported by valgrind.
| 5. fd and fd contexts
|
| The numbers:
| The resource usage by the test case in this patch:
| Without the fix, Memory: ~3GB; Threads: ~81
| With this fix, Memory: 300MB; Threads: 1 (main thread)
|
| Change-Id: I96b9277541737aa8372b4e6c9eed380cb871e7c2
| BUG: 1093594
| Signed-off-by: Poornima G <pgurusid@redhat.com>
| Reviewed-on: http://review.gluster.org/7642
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
| Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
| Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
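The shutdown discipline described in this commit (threads exit on their own after draining queued work, the main thread joins them, and shared resources are released only afterwards) can be illustrated with a generic sketch. Every name here is hypothetical and the "pool" is just a dictionary; none of it is libgfapi code.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit voluntarily


def worker(tasks, results):
    """Process queued items until the stop sentinel arrives, then return."""
    while True:
        item = tasks.get()
        if item is _STOP:
            return
        results.append(item * 2)


def shutdown_demo(n_workers=3, n_tasks=10):
    tasks = queue.Queue()
    results = []
    pool = {"freed": False}  # stands in for memory pools shared by workers
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for i in range(n_tasks):
        tasks.put(i)
    # Sentinels are queued behind the real work, so pending tasks finish first.
    for _ in threads:
        tasks.put(_STOP)
    # The main thread waits for every worker before touching shared state.
    for t in threads:
        t.join()
    pool["freed"] = True  # safe: no other thread is running any more
    return sorted(results), pool["freed"], [t.is_alive() for t in threads]
```

Freeing the pool before the joins would reintroduce the access-after-free risk that the second rule of thumb is meant to avoid.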
* libglusterfs: Moved common functions as utils in syncop/common-utils  Pranith Kumar K  2015-02-27  3  -138/+11
| These will be used by both afr and ec. Moved syncop_dirfd, syncop_ftw,
| syncop_dir_scan functions also into syncop-utils.c
|
| Change-Id: I467253c74a346e1e292d36a8c1a035775c3aa670
| BUG: 1177601
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/9740
| Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-by: Anuradha Talur <atalur@redhat.com>
| Reviewed-by: Ravishankar N <ravishankar@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : provide split-brain info by using getxattr  Anuradha  2015-02-25  3  -0/+140
| This patch is one part of enabling users to analyze and resolve split-brain.
|
| Problem:
| To know if a file is in data/metadata split-brain.
|
| Solution:
| Performing "getfattr -n afr.split-brain-status <path-to-file>" from the
| mount provides this information. It also provides the list of afr children
| to analyse to get more information.
|
| Change-Id: I4d9b429794759a906371416cb84c84a212e2c7b9
| BUG: 1191396
| Signed-off-by: Anuradha <atalur@redhat.com>
| Reviewed-on: http://review.gluster.org/9633
| Reviewed-by: Ravishankar N <ravishankar@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Enable auto-quorum for replicate with odd number of bricks  Pranith Kumar K  2015-02-09  1  -8/+16
| Change-Id: I908934f1f22cf7d2d0ceccc0dedf28a69861997f
| BUG: 1187885
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/9517
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-by: Anuradha Talur <atalur@redhat.com>
| Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Re-introduce heal-timeout option  Pranith Kumar K  2015-02-06  3  -1/+14
| Change-Id: I87484c810006a92ed7489284b6d74e9b0aecae80
| BUG: 1177601
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/9598
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Ravishankar N <ravishankar@redhat.com>
| Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: Provide syncop_ftw and syncop_dir_scan utils  Pranith Kumar K  2015-02-06  1  -257/+117
| ftw provides file tree walk. dir_scan does just a readdir, not a readdirp.
| Also changed Afr's self-heal-daemon's crawling functions to use this.
| These utils will be used by ec in future to do proactive/full healing.
|
| Change-Id: I05715ddb789592c1b79a71e98f1e8cc29aac5c26
| BUG: 1177601
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/9485
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Ravishankar N <ravishankar@redhat.com>
| Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
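To make the distinction between the two utilities concrete, here is an analogy using the Python standard library rather than the syncop API. `file_tree_walk` recurses through the whole hierarchy, while `dir_scan` lists the names in one directory without fetching per-entry attributes. Both function names are hypothetical.

```python
import os


def file_tree_walk(root):
    """Return every path under root, relative to root (an ftw-style crawl)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(dirnames + filenames):
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def dir_scan(path):
    """Return the entry names of a single directory; no stat is done per entry."""
    return sorted(os.listdir(path))
```

A full heal wants the first shape (visit everything), whereas an index crawl only needs the names found in one directory.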
* cluster/afr: Fix parent read subvol selection policy in lookup  Krutika Dhananjay  2015-02-04  1  -1/+1
| When lookup has succeeded on multiple subvols of AFR (including the read
| child of the parent dir) and all of them are "readable", ideally the call
| must be unwound with postparent from the parent's read child. But that is
| not the case, due to a bug introduced in the commit
| c78998c39f0857ea7aacba360632c148afc54a55. This patch fixes the issue.
|
| Change-Id: I83b0c26494a5d0bdbc30fcbe974fbdb6f7e9c84a
| BUG: 1179169
| Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-on: http://review.gluster.org/9569
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: When parent and entry read subvols are different, set entry->inode to NULL  Krutika Dhananjay  2015-02-02  2  -23/+94
| That way a lookup would be forced on the entry, and its attributes will
| always be selected from its read subvol.
|
| Change-Id: Iaba25e2cd5f83e983fc8b1a1f48da3850808e6b8
| BUG: 1179169
| Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
| Reviewed-on: http://review.gluster.org/9477
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr : Change in heal info split-brain command  Anuradha  2015-01-30  2  -5/+12
| Implementation of heal info split-brain command with glfs-heal.
|
| Change-Id: I233eb790de6eb5468a4cbb12a1cef0f97db2a1d2
| BUG: 1183019
| Signed-off-by: Anuradha <atalur@redhat.com>
| Reviewed-on: http://review.gluster.org/9459
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Don't write to sparse regions of sink.  Ravishankar N  2015-01-30  1  -2/+39
| Problem:
| When data-self-heal-algorithm is set to 'full', shd just reads from source
| and writes to sink. If the source file happens to be sparse (VM workloads),
| we end up actually writing 0s to the corresponding regions of the sink,
| causing it to lose its sparseness.
|
| Fix:
| If the source file is sparse, and the data read from source and sink are
| both zeros for that range, skip writing that range to the sink.
|
| Change-Id: I787b06a553803247f43a40c00139cb483a22f9ca
| BUG: 1166020
| Signed-off-by: Ravishankar N <ravishankar@redhat.com>
| Reviewed-on: http://review.gluster.org/9480
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
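The skip rule in the Fix paragraph can be sketched on in-memory buffers. This is a simplified model, not the self-heal code: source and sink are byte buffers of equal length, the block size is tiny for readability, and the function name is hypothetical.

```python
def heal_full(source: bytes, sink: bytearray, block: int = 4) -> int:
    """Copy source into sink block by block, skipping ranges where both
    sides are already all zeros. Returns the number of blocks written."""
    writes = 0
    for off in range(0, len(source), block):
        src = source[off:off + block]
        dst = bytes(sink[off:off + block])
        if not any(src) and not any(dst) and len(src) == len(dst):
            continue  # a hole on both sides: writing zeros would only fill it
        sink[off:off + len(src)] = src
        writes += 1
    return writes
```

A zero block in the source is still written when the sink holds non-zero data there, because skipping it would leave stale bytes behind.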
* cluster/afr: split-brain resolution CLI  Ravishankar N  2015-01-15  7  -54/+336
| Extend the AFR heal command to include automated split-brain resolution.
|
| This patch [3/3] is the final patch for afr automated split-brain
| resolution implementation.
|
| "gluster volume heal <VOLNAME> [full | statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [healed | heal-failed | split-brain]| split-brain {bigger-file <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]"
|
| The new additions being:
|
| 1. gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
|    Locates the replica containing the FILE, selects bigger-file as source
|    and completes heal.
|
| 2. gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>
|    Selects <FILE> present in <HOSTNAME:BRICKNAME> as source and completes
|    heal.
|
| 3. gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>
|    Selects all split-brained files in <HOSTNAME:BRICKNAME> as source and
|    completes heal.
|
| Note: <FILE> can be either the full file name as seen from the root of the
| volume or the gfid-string representation of the file, which sometimes
| gets displayed in the heal info command's output.
|
| Entry/gfid split-brain resolution is not supported.
| An example can be found in the test case.
|
| Change-Id: I4649733922d406f14f28ee9033a5cb627b9538b3
| BUG: 1136769
| Signed-off-by: Ravishankar N <ravishankar@redhat.com>
| Reviewed-on: http://review.gluster.org/9377
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
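The bigger-file policy amounts to choosing the copy with the largest size as heal source. A minimal sketch follows; the mapping from brick name to file size and the function name are hypothetical, and refusing to choose when the two largest copies are equal in size is an assumption of this sketch, not something the commit message states.

```python
def pick_bigger_file(sizes):
    """Return the brick holding the largest copy.

    sizes -- non-empty mapping of brick name to file size in bytes.
    Raises ValueError on a tie (assumed behaviour: no copy is 'bigger').
    """
    ranked = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("copies are the same size; no bigger file to pick")
    return ranked[0][0]
```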
* libglusterfs: change signature of syncop_(f)getxattr  Ravishankar N  2015-01-05  2  -6/+8
| Pass xdata dict to syncop_(f)getxattr calls.
|
| This patch [1/3] is required as a part of afr automated split-brain
| resolution implementation.
|
| Change-Id: I3970b3dd6daf64681a031e37f8e9afb14fb3d668
| BUG: 1136769
| Signed-off-by: Ravishankar N <ravishankar@redhat.com>
| Reviewed-on: http://review.gluster.org/9375
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Reviewed-by: Niels de Vos <ndevos@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: serialize inode locks  Pranith Kumar K  2015-01-04  2  -60/+186
| Problem:
| Afr winds inodelk calls without any order, so blocking inodelks from two
| different mounts can lead to deadlock when mount1 gets the lock on brick-1
| and is blocked on brick-2, whereas mount2 gets the lock on brick-2 and is
| blocked on brick-1.
|
| Fix:
| Serialize the inodelks whether they are blocking inodelks or non-blocking
| inodelks.
|
| Non-blocking locks also need to be serialized. Otherwise there is a chance
| that both the mounts which issued the same non-blocking inodelk may end up
| not acquiring the lock on any brick.
| Ex:
| Mount1 and Mount2 request a full-length lock on file f1. Mount1 afr may
| acquire the partial lock on brick-1 and may not acquire the lock on
| brick-2 because Mount2 already got the lock on brick-2, and vice versa.
| Since both the mounts only got partial locks, afr treats them as failure
| in gaining the locks and unwinds with EAGAIN errno.
|
| Change-Id: Ie6cc3d564638ab3aad586f9a4064d81e42d52aef
| BUG: 1176008
| Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
| Reviewed-on: http://review.gluster.org/9372
| Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
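The deadlock described above is the classic lock-ordering problem, and serializing the requests means every client takes the brick locks in the same order. The following sketch uses in-process locks to stand in for per-brick inodelks; all names are hypothetical and it illustrates the ordering idea only, not the AFR implementation.

```python
import threading


def lock_all_in_order(brick_locks):
    """Acquire every brick lock one after another, always in index order."""
    held = []
    for lk in brick_locks:
        lk.acquire()  # blocks here, but never while holding a later lock
        held.append(lk)
    return held


def run_mount(brick_locks, log, name):
    """One 'mount': take all locks serially, do the work, release in reverse."""
    held = lock_all_in_order(brick_locks)
    log.append(name)
    for lk in reversed(held):
        lk.release()


def demo(clients=50):
    brick_locks = [threading.Lock(), threading.Lock()]  # brick-1, brick-2
    log = []
    threads = [
        threading.Thread(target=run_mount, args=(brick_locks, log, f"mount{i}"))
        for i in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(log)
```

Because no client ever holds brick-2 while waiting for brick-1, the circular wait from the Problem paragraph cannot form, and every client eventually completes.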