summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* glusterd/geo-rep: Fix glusterd crashKotresh HR2016-12-143-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: glusterd crashes when geo-rep mountbroker setup is created if the slave user length is more than 8 characters. Cause: _POSIX_LOGIN_NAME_MAX is used which is 9 including NULL byte. Analysis: While the man page says it sufficient for portability, but acutally it's not. Linux allows the creation of username upto 32 characters by default where the max length is 256. And NetBSD's max is 17. Linux: #getconf LOGIN_NAME_MAX 256 NetBSD: #getconf LOGIN_NAME_MAX 17 Fix: Use LOGIN_NAME_MAX instead of _POSIX_LOGIN_NAME_MAX >Reviewed-on: http://review.gluster.org/16053 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Aravinda VK <avishwan@redhat.com> >Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Change-Id: I26b7230433ecbbed6e6914ed39221a478c0266a8 BUG: 1403108 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/16084 Tested-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/ec: Fix lk-owner set race in ec_unlockPranith Kumar K2016-12-147-34/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Rename does two locks. There is a case where when it tries to unlock it sends xattrop of the directory with new version, callback of these two xattrops can be picked up by two separate epoll threads. Both of them will try to set the lk-owner for unlock in parallel on the same frame so one of these unlocks will fail because the lk-owner doesn't match. Fix: Specify the lk-owner which will be set on inodelk frame which will not be over written by any other thread/operation. >BUG: 1402710 >Change-Id: I666ffc931440dc5253d72df666efe0ef1d73f99a >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/16074 >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> BUG: 1404572 Change-Id: Iff4f0c1364e6533f3c07f192138bcd321789b4cd Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16130 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* common-ha: Create portblock RA as part of add/delete-nodeSoumya Koduri2016-12-141-8/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When a node is added to or deleted from existing nfs-ganesha cluster, we need to create or cleanup portblock RA as well. This patch is to address the same. Also we need to adjust the quorum-policy with increase/decrease in the number of nodes in the cluster. >Change-Id: I31a896715b9b7fc931009723d1570bf7aa4da9b6 >BUG: 1403130 >Signed-off-by: Soumya Koduri <skoduri@redhat.com> >Reviewed-on: http://review.gluster.org/16089 >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Smoke: Gluster Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> >Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> >(cherry picked from commit 885ecce6e2df6464b388f42c91211ed31e17654d) Change-Id: I4572ddb8dee2596d81c33c3685c808bb9bf4d38f BUG: 1404133 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/16115 Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* protocol/server: Remove unused variableAnoop C S2016-12-131-2/+0
| | | | | | | | | | | | | | | | | | | | | >Change-Id: I0d0a786b2d02d4db37c4da6194ee4b4feac31b63 >BUG: 1198849 >Signed-off-by: Anoop C S <anoopcs@redhat.com> >Reviewed-on: http://review.gluster.org/15899 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Vijay Bellur <vbellur@redhat.com> BUG: 1396780 Change-Id: I78cb34d0f91c2f375c2e4e413337253b34987baa Signed-off-by: Anoop C S <anoopcs@redhat.com> Reviewed-on: http://review.gluster.org/16058 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* protocol/server: Print error-xlator namePranith Kumar K2016-12-132-182/+277
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: At the moment from which xlator the errors are stemming from is a mystery. Fix: With this patch we can find on the server side which xlator first gave the errno received by server xlator. I am not yet sure how to get this done for client side which has lot of copy_frame()s. May be another patch. >Change-Id: Ie13307b965facf2a496123e81ce0bd6756f98ac9 >BUG: 1394548 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/15836 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Vijay Bellur <vbellur@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> BUG: 1396780 Change-Id: I3b293d21528da391eafd0fbaa5b451a1d3ddc237 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15886 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* protocol/server: Print pargfid in logs for rename errorPranith Kumar K2016-12-131-2/+2
| | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/15872 >BUG: 1394548 >Change-Id: I42ee627c8cdf54158f083f9019a096ace449e3cc >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> BUG: 1396780 Change-Id: I80c3d3c7994982b812857c6705b3bf4716bf0824 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15887 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* extras: Include shard and full-data-heal in virt groupKrutika Dhananjay2016-12-131-0/+2
| | | | | | | | | | | | | Backport of: http://review.gluster.org/15995 Change-Id: I06aa500d2b173980d6c6bd024a998fbd238868ff BUG: 1402216 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16121 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: Handle volinfo->refcnt properly during volume start commandAvra Sengupta2016-12-131-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | While running the volume start command, the refcnt of the volume is incremented. At the end of the command, the refcnt should also be decremented. This is currently not the case. This patch, makes sure the refcnt is also decremented at the end of the volume start command. > Reviewed-on: http://review.gluster.org/16108 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit 9b1c9395a397e337e4a0acac55b935cb1ce094b7) Change-Id: I017b5039be5948df41dde6bc89d2955d5d18971f BUG: 1404104 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/16113 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UPPoornima G2016-12-137-38/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/15764/ Currently these are few events related to child_up/down: GF_EVENT_CHILD_UP : Issued when any of the protocol client connects. GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec GF_EVENT_CHILD_DOWN : Issued when any of the protocol client disconnects. These events get modified at the dht/afr/ec layers. Here is a brief on the same. DHT: - All the subvolumes reported once, and atleast one child came up, then GF_EVENT_CHILD_UP is issued - connect GF_EVENT_CHILD_UP is issued - disconnect GF_EVENT_CHILD_MODIFIED is issued - All the subvolumes disconnected, GF_EVENT_CHILD_DOWN is issued AFR: - First subvolume came up, then GF_EVENT_CHILD_UP is issued - Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED - Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued - Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP, GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes go up or down. Now with md-cache changes, there is a necessity to differentiate between child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and getting rid of GF_EVENT_CHILD_MODIFIED. [1] http://review.gluster.org/12573 >Reviewed-on: http://review.gluster.org/15764 >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: N Balachandran <nbalacha@redhat.com> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> (cherry picked from commit f7ab6c45963fa0da68acedfb14281cd2456abc68) Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58 BUG: 1396880 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15890 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* libglusterfs: Fix a read hangPoornima G2016-12-133-3/+164
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/15923 Issue: ===== In certain cases, there was no unwind of read from read-ahead xlator, thus resulting in hang. RCA: ==== In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead of STACK_WIND(). One such case is when inode_ctx for that file is not present (can happen if readdirp was called, and populates md-cache and serves all the lookups from cache). Consider the following graph: ... io-cache (parent) | readdir-ahead | read-ahead ... Below is the code snippet of ioc_readv calling STACK_WIND_TAIL: ioc_readv() { ... if (!inode_ctx) STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this), FIRST_CHILD (frame->this)->fops->readv, fd, size, offset, flags, xdata); /* Ideally, this stack_wind should wind to readdir-ahead:readv() but it winds to read-ahead:readv(). See below for explaination. */ ... } STACK_WIND_TAIL (frame, obj, fn, ...) { frame->this = obj; /* for the above mentioned graph, frame->this will be readdir-ahead * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which * is as expected */ ... THIS = obj; /* THIS will be read-ahead instead of readdir-ahead!, as obj expands * to "FIRST_CHILD (frame->this)" and frame->this was pointing * to readdir-ahead in the previous statement. */ ... fn (frame, obj, params); /* fn will call read-ahead:readv() instead of readdir-ahead:readv()! * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and * frame->this was pointing ro readdir-ahead in the first statement */ ... } Thus, the readdir-ahead's readv() implementation will be skipped, and ra_readv() will be called with frame->this = "readdir-ahead" and this = "read-ahead". This can lead to corruption / hang / other problems. But in this perticular case, when 'frame->this' and 'this' passed to ra_readv() doesn't match, it causes ra_readv() to call ra_readv() again!. Thus the logic of read-ahead readv() falls apart and leads to hang. Solution: ========= Modify STACK_WIND_TAIL() as: STACK_WIND_TAIL (frame, obj, fn, ...) { next_xl = obj /* resolve obj as the variables passed in obj macro can be overwritten in the further instrucions */ next_xl_fn = fn /* resolve fn and store in a tmp variable, before modifying any variables */ frame->this = next_xl; ... THIS = next_xl; ... next_xl_fn (frame, next_xl, params); ... } >Reviewed-on: http://review.gluster.org/15923 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> (Cherry picked from commit 8943c19a2ef51b6e4fa66cb57211d469fe558579) BUG: 1399015 Change-Id: Ie662ac8f18fa16909376f1e59387bc5b886bd0f9 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15933 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Fix the EIO that can occur in afr_inode_refresh as a resultPoornima G2016-12-134-18/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of cache invalidation(upcall). Issue: ------ When a cache invalidation is recieved as a result of changing pending xattr, the read_subvol is reset. Consider the below chain of execution: CHILD_DOWN ... afr_readv ... afr_inode_refresh ... afr_inode_read_subvol_reset <- as a result of pending xattr set by some other client GF_EVENT_UPCALL will be sent afr_refresh_done -> this results in an EIO, as the read subvol was reset by the end of the afr_inode_refresh Solution: --------- When GF_EVENT_UPCALL is recieved, instead of resetting read_subvol, set a variable need_refresh in inode_ctx, the next time some one starts a txn, along with event gen, need_rrefresh also needs to be checked. >Reviewed-on: http://review.gluster.org/15892 >Reviewed-by: Ravishankar N <ravishankar@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Signed-off-by: Poornima G <pgurusid@redhat.com> Change-Id: Ifda21a7a8039b8874215e1afa4bdf20f7d991b58 BUG: 1399450 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15959 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: allow I/O when favorite-child-policy is enabledRavishankar N2016-12-129-61/+469
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently, I/O on a split-brained file fails even when the favorite-child-policy is set until the self-heal is complete. Fix: If a valid 'source' is found using the set favorite-child-policy,inspect and reset the afr pending xattrs on the 'sinks' (inside appropriate locks),refresh the inode and then proceed with the read or write transaction. The resetting itself happens in the self-heal code and hence can also happen in the client side background-heal or by the shd's index-heal in addition to the txn code path explained above. When it happens in via heal, we also add checks in undo-pending to not reset the sink xattrs again. > Reviewed-on: http://review.gluster.org/15673 > Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626 BUG: 1403121 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin@ubisoft.com> Reviewed-on: http://review.gluster.org/16088 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Fix per-txn optimistic changelog initialisationKrutika Dhananjay2016-12-123-29/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/16075 Incorrect initialisation of local->optimistic_change_log was leading to skipped pre-op and post-op even when a brick didn't participate in the txn because it was down. The result - missing granular name index resulting in some entries never getting healed. FIX: Initialise local->optimistic_change_log just before pre-op. Also fixed granular entry heal to create the granular name index in pre-op as opposed to post-op. This is to prevent loss of granular information when during an entry txn, the good (src) brick goes offline before the post-op is done. This would cause self-heal to do conservative merge (since dirty xattr is the only information available), which when granular-entry-heal is enabled, expects granular indices, the lack of which can lead to loss of data in the worst case. Change-Id: I213d98ca9b3c4604b095478bf427fa69c04a7d64 BUG: 1403743 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/16106 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* build: python site-packages vs. dist-packagesKaleb S. KEITHLEY2016-12-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike Fedora and RHEL/CentOS, where python has /usr/{lib,lib64}/python2.7/site-packages/..., Debian-based distributions have /usr/lib/python2.7/dist-packages/... On Debian our python scriptlet to determine this is broken and incorrectly returns .../site-packages, not .../dist-packages. Furthermore, the automake/autoconf/pkgconfig bits ahead of our own scriptlet have already determined the correct location -- all we need to do is use it. This change also slightly reworks the BUILD_PYTHON_INC scriptlet to bring it in line with the style/pattern of the other python scriptlets provided by autoconf/pkgconfig. (I do wonder though, why we have two sets of {C,CPP}FLAGS and LD_FLAGS - PYTHONDEV_CPPFLAGS and BUILD_PYTHON_INC, and PYTHONDEV_LDFLAGS and BUILD_PYTHON_LIB. They both have very similar values, but, e.g. BUILD_PYTHON_INC misses /usr/lib/x86_64-linux-gnu/python2.7 on Debian; even if that ommission doesn't seem to be hurting us.) Change-Id: I309a5c781a1d9aee4d7be2867223781bd2ae18fa BUG: 1389740 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15752 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* syncop: fix conditional wait bug in parallel dir scanRavishankar N2016-12-112-6/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The issue as seen by the user is detailed in the BZ but what is happening is if the no. of items in the wait queue == max-qlen, syncop_mt_dir_scan() does a pthread_cond_wait until the launched synctask workers dequeue the queue. But if for some reason the worker fails, the queue is never emptied due to which further invocations of syncop_mt_dir_scan() are blocked forever. Fix: Made some changes to _dir_scan_job_fn - If a worker encounters error while processing an entry, notify the readdir loop in syncop_mt_dir_scan() of the error but continue to process other entries in the queue, decrementing the qlen as and when we dequeue elements, and ending only when the queue is empty. - If the readdir loop in syncop_mt_dir_scan() gets an error form the worker, stop the readdir+queueing of further entries. > Reviewed-on: http://review.gluster.org/16073 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit 2d012c4558046afd6adb3992ff88f937c5f835e4) Change-Id: I39ce073e01a68c7ff18a0e9227389245a6f75b88 BUG: 1403187 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/16095 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tools/glusterfind: avoid deleting keys directoryMilind Changire2016-12-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: gluster volume delete mistakenly deletes the .keys directory under /var/lib/glusterd/glusterfind. Solution: Check for ".keys" directory and avoid deleting it. > BUG: 1402369 > Reviewed-on: http://review.gluster.org/16052 > Reviewed-by: Aravinda VK <avishwan@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 2b4b928ed350286192b63b10b18f85c669b741f8) Change-Id: Ia595c8bf3f423c1ad5d6faa183a29598c07a11f9 BUG: 1402671 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/16062 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* cluster/afr: Serialize conflicting locks on all subvolsPranith Kumar K2016-12-082-33/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: 1) When a blocking lock is issued and the parallel lock phase fails on all subvolumes with EAGAIN, it is not switching to serialized locking phase. 2) When quorum is enabled and locks fail partially it is better to give errno returned by brick rather than the default quorum errno. Fix: Handled this error case and changed op_errno to reflect the actual errno in case of quorum error. >BUG: 1369077 >Change-Id: Ifac2e4a13686e9fde601873012700966d56a7f31 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/15984 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> BUG: 1402482 Change-Id: Ib1ca577bfa52ae537ab7186d10bfa2ae755813e3 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16057 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Fix bugs in [f]inodelk/[f]entrylkPranith Kumar K2016-12-076-336/+468
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problems: 1) Inodelk is not taking quorum into account 2) finodelk, [f]entrylk are not implemented correctly 3) By default afr doesn't go for non-blocking parallel locks. Fix: Implemented a common framework which can be used by [f]inodelk/[f]entrylk. Used quorum for the same. >Change-Id: I239f13875a065298630d266941df10cfa3addc85 >BUG: 1369077 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/15802 >Tested-by: Krutika Dhananjay <kdhananj@redhat.com> >Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Ravishankar N <ravishankar@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> BUG: 1402482 Change-Id: I0c5fed6ca87c6432bb20d00f76cdf5c328a52a85 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16056 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* glusterd/ganesha : handle volume reset properly for ganesha optionsJiffin Tony Thottan2016-12-076-87/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The "gluster volume reset" should first unexport the volume and then delete export configuration file. Also reset option is not applicable for ganesha.enable if volume value is "all". This patch also changes the name of create_export_config into manange_export_config Upstream reference : >Change-Id: Ie81a49e7d3e39a88bca9fbae5002bfda5cab34af >BUG: 1397795 >Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> >Reviewed-on: http://review.gluster.org/15914 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: soumya k <skoduri@redhat.com> >Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> >Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Change-Id: Ie81a49e7d3e39a88bca9fbae5002bfda5cab34af BUG: 1402366 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/16054 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cli: Print to screen frequentlyPranith Kumar K2016-12-071-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: CLI appears to be hung because XML document is not flushed periodically Fix: Flush the buffer as soon as we print something >BUG: 1395993 >Change-Id: Ic5f61d4c7d29ee162a124a049e60ceb810d6da6d >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/15863 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Atin Mukherjee <amukherj@redhat.com> BUG: 1396779 Change-Id: Ie7b4093d54afc7dafc4d1d7c151a9b383cfe3830 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15885 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/dht: Check for null inodeN Balachandran2016-12-071-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Check for NULL inode before attempting to set dht inode ctx. > Change-Id: I7693c18445f138221d8417df5e95b118cedb818a > BUG: 1395261 > Signed-off-by: N Balachandran <nbalacha@redhat.com> > Reviewed-on: http://review.gluster.org/15847 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit 8313d53accaa22feb14d284fb91245be0a32e16e) Change-Id: Id8c7bfe181bb40a02cd49b0f5fc3b45cabf5afa6 BUG: 1395517 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/15851 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* eventsapi: Push Messages to Webhooks in parallelAravinda VK2016-12-072-5/+67
| | | | | | | | | | | | | | | | | | | | | | | | | With this patch, glustereventsd will maintain one thread per webhook. If a webhook is slow, then all the events to that worker will be delayed but it will not affect the other webhooks. Note: Webhook in transit may get missed if glustereventsd reloads due to new Webhook addition or due configuration changes. >Reviewed-on: http://review.gluster.org/15966 >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Prashanth Pai <ppai@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> BUG: 1401261 Change-Id: I2d11e01c7ac434355bc356ff75396252f51b339b Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/16021 Tested-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com>
* dht/rename : Incase of failure remove linkto file properlyJiffin Tony Thottan2016-12-062-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generally linkto file is created using root user. Consider following case, a user is trying to rename a file which he is not permitted. So the rename fails with EACESS and when rename tries to cleanup the linkto file, it fails. The above issue happens when rename/00.t test executed on nfs-ganesha clients : Steps executed in script * create a file "abc" using root * rename the file "abc" to "xyz" using a non root user, it fails with EACESS * delete "abc" * create directory "abc" using root * again try ot rename "abc" to "xyz" using non root user, test hungs here which slowly leds to OOM kill of ganesha process RCA put forwarded by Du for OOM kill of ganesha Note that when we hit this bug, we've a scenario of a dentry being present as: * a linkto file on one subvol * a directory on rest of subvols When a lookup happens on the dentry in such a scenario, the control flow goes into an infinite loop of: dht_lookup_everywhere dht_lookup_everywhere_cbk dht_lookup_unlink_cbk dht_lookup_everywhere_done dht_lookup_directory (as local->dir_count > 0) dht_lookup_dir_cbk (sets to local->need_selfheal = 1 as the entry is a linkto file on one of the subvol) dht_lookup_everywhere (as need_selfheal = 1). This infinite loop can cause increased consumption of memory due to: 1) dht_lookup_directory assigns a new layout to local->layout unconditionally 2) Most of the functions in this loop do a stack_wind of various fops. This results in growing of call stack (note that call-stack is destroyed only after lookup response is received by fuse - which never happens in this case) Thanks Du for root causing the oom kill and Sushant for suggesting the fix Upstream reference : >Change-Id: I1e16bc14aa685542afbd21188426ecb61fd2689d >BUG: 1397052 >Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> >Reviewed-on: http://review.gluster.org/15894 >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> >(cherry picked from commit 57d59f4be205ae0c7888758366dc0049bdcfe449) Change-Id: I1e16bc14aa685542afbd21188426ecb61fd2689d BUG: 1401023 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/16014 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* ganesha/scripts : avoid incrementing Export Id value for already exported ↵Jiffin Tony Thottan2016-12-041-32/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | volumes Currently a volume will unexport when it stops and reexport it during volume start using hook script. And also it increments the value for export id for each reexport. Since a hook script is called from every node parallely which may led inconsistency for export id value. Upstream reference : >Change-Id: Ib9f19a3172b2ade29a3b4edc908b3267c68c0b20 >BUG: 1399186 >Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> >Reviewed-on: http://review.gluster.org/15948 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: soumya k <skoduri@redhat.com> >Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> >(cherry picked from commit 76eef16d762f500df500de0d3187aff23dc39ac6) Change-Id: Ib9f19a3172b2ade29a3b4edc908b3267c68c0b20 BUG: 1401011 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/16013 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* selfheal: fix memory leak on client side healing queueMateusz Slupny2016-12-043-3/+8
| | | | | | | | | | | | | | | | | | | | | | > Reviewed-on: http://review.gluster.org/15968 > Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> > Reviewed-by: Ravishankar N <ravishankar@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit fb95eb4da6f4fc0b9c69e3b159a2214fe47e6d1d) Change-Id: I2beaba829710565a3246f7449a5cd21755cf5f7d BUG: 1400926 Signed-off-by: Mateusz Slupny <mateusz.slupny@appeartv.com> Reviewed-on: http://review.gluster.org/16011 Tested-by: Ravishankar N <ravishankar@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/tier: fix op-version for tier-query-limitMilind Changire2016-12-022-2/+4
| | | | | | | | | | | | | | | | | | | | | | Correct the op-version for tier-query-limit option from 3.9.0 to 3.9.1 > BUG: 1366648 > Reviewed-on: http://review.gluster.org/15990 > Reviewed-by: Dan Lambright <dlambrig@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit 530453c78146e8ba4f13636e1dec1ea59849c783) Change-Id: I3a52a94c2708a97c18377e945d559a51d8025c41 BUG: 1394482 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/16000 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* common-ha: IPaddr RA is not stopped when pacemaker quorum is lostKaleb S. KEITHLEY2016-12-021-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ken Gaillot writes: The other is pacemaker's no-quorum-policy cluster property. The default (which has not changed) is "stop" (stop all resources). Other values are "ignore" (act as if quorum was not lost), "freeze" (continue running existing resources but don't recover resources from unseen nodes) or "suicide" (shut down). But on my four node cluster % pcs property show no-quorum-policy Cluster Properties: % i.e. shows nothing. But: % pcs property list --all Cluster Properties: ... no-quorum-policy: stop ... % Seems to think it knows about it. and then % pcs property set no-quorum-policy=stop % pcs property show no-quorum-policy Cluster Properties: no-quorum-policy: stop % Which looks rather inconsistent. So we will try explicitly setting it to "stop" when there are three or more nodes. master bug 1400237 master patch http://review.gluster.org/#/c/15981/ Change-Id: I47fc7ee84fcd6ad52ccb776913511978a8d517b4 BUG: 1400572 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15996 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* eventsapi: Auto reload Webhooks data when modifiedAravinda VK2016-12-021-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | glustereventsd depends on reload signal to reload the Webhooks configurations. But if reload signal missed, no events will be sent to newly added Webhook. Added auto reload based on webhooks file mtime. Before pushing events to Webhooks, reloads webhooks configurations if previously recorded mtime is different than current mtime. > Reviewed-on: http://review.gluster.org/15731 > Reviewed-by: Prashanth Pai <ppai@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> BUG: 1399482 Change-Id: I83a41d6a52d8fa1d70e88294298f4a5c396d4158 Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit b7ebffbda9ba784ccfae6d1a90766d5310cdaa15) Reviewed-on: http://review.gluster.org/15963 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com>
* geo-rep: Fix Last synced status column issue during Hybrid CrawlAravinda VK2016-12-021-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | During Hybrid crawl, Geo-rep maintains stime xattr in subdirectories along with the Brick root. This is done to skip directories if Geo-rep crashes before Hybrid crawl completes. Update Last synced status only when stime xattr updated in brick root. Status output will mislead if it shows sub directory stime as last synced time. > Reviewed-on: http://review.gluster.org/15869 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Kotresh HR <khiremat@redhat.com> BUG: 1399470 Change-Id: I5b73aee7ae4a1c1e2d1001d1f55559b9f9efd6e6 Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit 1876454d2e7950f25d1e5bb8e2c07ab27d521498) Reviewed-on: http://review.gluster.org/15962 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
* geo-rep: Handle ENOENT during unlinkAravinda VK2016-12-021-2/+4
| | | | | | | | | | | | | | | | | | | | | Do not raise traceback if a file/dir not exists during unlink or rmdir > Reviewed-on: http://review.gluster.org/15868 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> BUG: 1399092 Change-Id: Idd43ca1fa6ae6056c3cd493f0e2f151880a3968c Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit ecd6da0a754f21909dbbd8189228f5a27a15df3e) Reviewed-on: http://review.gluster.org/15940 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
* md-cache, afr: Reduce the window of stale readPoornima G2016-12-017-116/+370
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Consider a replica setup, where one mount writes data to a file and the other mount reads the file. In afr, read operations are not transaction based, a brick(read subvolume) is chosen as a part of lookup or other operations, read is always wound only to the read subvolume, even if there was write from a different client that failed on this brick. This stale read continues until there is a lookup or any write operation from the mount point. Currently, this is not a major issue, as a lookup is issued before every read and it will switch the read subvolume to a correct one. But with the plan of increasing md-cache timeout to 600s, the stale read problem will be more pronounced, i.e. stale read can continue for 600s(or more if cascaded with readdirp), as there will be no lookups. Solution: Afr doesn't have any built-in solution for stale read(without affecting the performance). The solution that came up, was to use upcall. When a file on any brick is marked bad for the first time, upcall sends a notification to all the clients that had recently accessed the file. The solution has 2 parts: - Identifying when a file is marked bad, on any of the bricks, for the first time - Client side actions on recieving the notifications Identifying when a file is marked bad on any of the bricks for the first time: ----------------------------------------------------------------------------- The idea is to track xattrop in upcall. xattrop currently comes with 2 afr xattrs - afr dirty bit and afr pending xattrs. Dirty xattr is set to 1 before every write, and is unset if write succeeds. In certain scenarios, dirty xattr can be 0 and still the file could be bad copy. Hence do not track dirty xattr. Pending xattr is set on the good copy, indicating the other bricks that have bad copy. It is still not as simple as, notifying when any of the pending xattrs change. It could lead to flood of notifcations, in case the other brick is completely down or consistantly failing. Hence it is important to notify only once, the first time a good copy is marked bad. Client side actions on recieving pending xattr change, notification: -------------------------------------------------------------------- md-cache will invalidate the cache of that file, so that further lookup is passed down to afr and hence update the read subvolume. Invalidating only in md-cache is not enough, consider the folling oder of opertaions: - pending xattr invalidation - invalidate md-cache - readdirp on the bad read subvolume - fill md-cache - lookup (served from md-cache) - read - wound to the old read subvol. Hence, along with invalidating md-cache, it is very important to reset the read subvolume for that file, in afr. Design Credit: Anuradha Talur, Ravishankar N 1. xattrop doesn't carry info saying post op/pre op. 2. Pre xattrop will have 0 value for all pending xattrs, the cbk of pre xattrop carries the on-disk xattr value. Non zero indicated healing is required. 3. Post xattrop will have non zero value for any of the pending xattrs, if the fop failed on any of the bricks. >Reviewed-on: http://review.gluster.org/15398 >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Signed-off-by: Poornima G <pgurusid@redhat.com> Change-Id: I469cbc111714c433984fe1c922be2ef113c25804 BUG: 1399450 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15958 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Implement IPC fopPoornima G2016-12-012-0/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/15378 Currently ipc() is not implemented in afr. md-cache and upcall uses ipc to register the list of xattrs, [1] for more details. For the ipc op GF_IPC_TARGET_UPCALL, it has to be wound to all the replica subvolumes. ipc() is failed when any of the subvolumes fails with other than ENOTCONN or all of the subvolumes are down. [1] http://review.gluster.org/#/c/15002/ >Reviewed-on: http://review.gluster.org/15378 >Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >Signed-off-by: Poornima G <pgurusid@redhat.com> Change-Id: I0f651330eafda64e4d922043fe53bd0014536247 BUG: 1399450 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15957 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd, cli: Fix volume options output format in get-state cliSamikshan Bairagya2016-12-011-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the get-state cli outputs the volume options in the following format: Volume1.rebalance.skipped: 0 Volume1.rebalance.lookedup: 0 Volume1.rebalance.files: 0 Volume1.rebalance.data: 0Bytes [Volume1.options] features.barrier: on transport.address-family: inet performance.readdir-ahead: on nfs.disable: on Volume2.name: tv2 Volume2.id: 35854708-bb72-45a5-bdbd-77c51e5ebfb9 Volume2.type: Distribute This above format is a valid ini file format syntactically, but is not very easily parseable. This patch changes the format to look like the following and should be more easily parseable: Volume1.rebalance.skipped: 0 Volume1.rebalance.lookedup: 0 Volume1.rebalance.files: 0 Volume1.rebalance.data: 0Bytes Volume1.options.features.barrier: on Volume1.options.transport.address-family: inet Volume1.options.performance.readdir-ahead: on Volume1.options.nfs.disable: on Volume2.name: tv2 Volume2.id: 35854708-bb72-45a5-bdbd-77c51e5ebfb9 Volume2.type: Distribute > Reviewed-on: http://review.gluster.org/15975 > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Shubhendu Tripathi <shtripat@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 331a45942333e27f596b1930f7d459f59850c8a4) Change-Id: I9768b45de288d9817ec669d3a801874eb1914750 BUG: 1400545 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: http://review.gluster.org/15993 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* build: add systemd dependency to the glusterfs sub-packageMilind Changire2016-12-011-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: /bin/systemctl is not available at install time of primary glusterfs package. Solution: Add %{?systemd_requires} to the glusterfs sub-package install time requirements. Replace all "Requires: systemd" and "Requires: systemd-units" with %{?systemd_requires}. %systemd_requires is defined in /usr/lib/rpm/macros.d/macros.systemd systemd-units is provided by systemd. Add BuildRequires: systemd for the definition of %systemd_requires as well. > BUG: 1399031 > Reviewed-on: http://review.gluster.org/15936 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra Talur <rtalur@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> (cherry picked from commit 6138f4a2fc835bc94aa66543f5aee4f92081f1c7) Change-Id: I980ece7d538ea177ca6b0e70c1c169e6f04c46d4 BUG: 1400635 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/15999 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* common-ha: add cluster HA status to --status output for gdeployKaleb S. KEITHLEY2016-12-011-28/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | gdeploy desires a one-liner "health" assessment. If all the VIP and port block/unblock RAs are located on their prefered nodes and 'Started', then the cluster is deemed to be good (healthy). N.B. status originally only checked the "online" nodes obtained from `pcs status` but we really want to consider all the configured nodes, whether they are online or not. Also one `pcs status` is enough. master bug 1395648 master http://review.gluster.org/15882 Change-Id: Id0e0380b6982e23763edeb0488843b5363e370b8 BUG: 1395649 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-by: Arthy Loganathan <aloganat@redhat.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15991 Tested-by: soumya k <skoduri@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* common-HA: Increase timeout for portblock RA of action=unblockSoumya Koduri2016-12-011-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Portblock RA of action type unblock stores the information about the client/server IPs connection in tickle_dir folder created in the shared storage. In case of node shutdown/reboot there could be cases wherein shared_storage may become unavailable for sometime. Hence increase the timeout to avoid that resource agent going into FAILED state. This is backport of below mainline patch - - http://review.gluster.org/15947 >Change-Id: I4f98f819895cb164c3a82ba8084c7c11610f35ff >BUG: 1399154 >Signed-off-by: Soumya Koduri <skoduri@redhat.com> >Reviewed-on: http://review.gluster.org/15947 >Smoke: Gluster Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> >Reviewed-by: Niels de Vos <ndevos@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> >(cherry picked from commit 1b2b5be970f78cc32069516fa347d9943dc17d3e) Change-Id: I8e590a1b21c2a73d324e0a20c17378c593c5ebd5 BUG: 1400546 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15994 Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* geo-rep/cli: Validate Checkpoint labelAravinda VK2016-11-301-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | Checkpoint command accepts "now" or any other Time in "%Y-%m-%d %H:%M:%S" format as label. Validation added with this patch for the input label. Checkpoint set will fail for invalid label. > Reviewed-on: http://review.gluster.org/15721 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> > Reviewed-by: Kotresh HR <khiremat@redhat.com> BUG: 1395626 Change-Id: I23518c151ab4b294f64cae3b78baaacb3d8f7b82 Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit 8a1993b32f476765f9f5c9294e7c3f2ae75198a0) Reviewed-on: http://review.gluster.org/15854 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com>
* protocol/server: Fix mem-leaks in compound fopsKrutika Dhananjay2016-11-301-187/+116
| | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/15965 * Remove spurious 'return' statement. * Free up 'compound_rsp_array_val' as well in the end. * Remove multiple refs on this_args->xdata. Change-Id: I207b5f3d87e7cfdd0d0997547cc4a0e9e32de8e1 BUG: 1399891 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15973 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* ec: Implement ipc fopPoornima G2016-11-304-1/+150
| | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/15387/ The ipc will be wound to all the bricks, but for it to be successfull, the fop should succeed on minimum number of bricks. >Reviewed-on: http://review.gluster.org/15387 >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Smoke: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Ashish Pandey <aspandey@redhat.com> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> >(cherry picked from commit 359b72a57b7c92fc2a11236ac05f5d740db2f540) Change-Id: I3f8cb6a349e87bafd0773583def9d4e3765aa140 BUG: 1399450 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15956 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tools/glusterfind: xml parsing fix for tiered volumesMilind Changire2016-11-301-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | gluster volume info <vol> --xml for non-tiered volumes have 'bricks/brick' elements under the 'volInfo/volumes/volume' element. However, tiered volumes have a 'bricks/hotBricks/brick' and 'bricks/coldBricks/brick' elements under the 'volInfo/volumes/volume' element. Fix main.py::get_nodes() > BUG: 1389481 > Reviewed-on: http://review.gluster.org/15746 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit 592fd52d5889cf8a46727b3609cdff60e9ef5c00) BUG: 1396142 Change-Id: I2f4465bfa8a55e7fa87917d3ec3e69b05d5241b9 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/15871 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* eventsapi: Fix sending event during volume set helpAravinda VK2016-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Event sent when `gluster volume set help` command is run with Volume name as "help" and empty options list With this patch, event sent only when volume set on a real volume > Reviewed-on: http://review.gluster.org/15685 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> BUG: 1388070 Change-Id: Ia8785d6108cb86f7d89ecf9ea552df0334b41398 Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit f31b3213e2a97259faa7dcae2354d2535732068b) Reviewed-on: http://review.gluster.org/15714 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/afr: CLI for granular entry heal enablement/disablementKrutika Dhananjay2016-11-2913-63/+407
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/15747 When there are already existing non-granular indices created that are yet to be healed, if granular-entry-heal option is toggled from 'off' to 'on', AFR self-heal whenever it kicks in, will try to look for granular indices in 'entry-changes'. Because of the absence of name indices, granular entry healing logic will fail to heal these directories, and worse yet unset pending extended attributes with the assumption that are no entries that need heal. To get around this, a new CLI is introduced which will invoke glfsheal program to figure whether at the time an attempt is made to enable granular entry heal, there are pending heals on the volume OR there are one or more bricks that are down. If either of them is true, the command will be failed with the appropriate error. New CLI: gluster volume heal <VOL> granular-entry-heal {enable,disable} Change-Id: Ic79519468a087cd337df664b968188c4adcba43a BUG: 1398500 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15941 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: A hard link is lost during rebalance + lookupMohit Agrawal2016-11-291-30/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: A hard link is lost during rebalance + lookup.Rebalance skip files if file has hardlink.In dht_migrate_file __is_file_migratable () function checks if a file has hardlink, if yes file is not migrated but if link is created after call this function then link will lost. Solution: Call __check_file_has_hardlink to check hardlink existence after (S+T) bits in migration process ,if file has hardlink then skip the file for migrate rebalance process. > BUG: 1396048 > Change-Id: Ia53c07ef42f1128c2eedf959a757e8df517b9d12 > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > (cherry picked from commit 4b8ccbed28837bd78894cb5ce3cf15bc8f364a93) BUG: 1399430 Change-Id: Idc869f2cf2355dacf54c36008840092b8e77acb9 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: http://review.gluster.org/15955 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* protocol/server: capture offset in seekRavishankar N2016-11-283-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: http://review.gluster.org/11482 implemented seek FOP but http://review.gluster.org/#/c/14137/ 'undid' the change where we pack the offset returned by seek in server xlator before sending it to the client. As a result, seek always returns zero to the client for SEEK_HOLE/ SEEK_DATA. Fix: I think 14137 removed it unintentionally, hence adding it back again. > Reviewed-on: http://review.gluster.org/15920 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> (cherry picked from commit cc37e5929d1e3ea4eaf4c4576a82066bf131ad05) Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: I67a1f7b53214b043c5291f5704be4a50b698f91c BUG: 1399131 Reviewed-on: http://review.gluster.org/15944 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/tier: handle fast demotionsMilind Changire2016-11-2813-69/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Demote files on priority if hi-watermark has been breached and continue to demote until the watermark drops below hi-watermark. Monitor watermark more frequently. Trigger demotion as soon as hi-watermark is breached. Add cluster.tier-query-limit option to limit number of files returned from the database query for every iteration of tier_migrate_using_query_file(). If watermark hasn't dropped below hi-watermark during the first iteration, the next iteration will be triggered approximately 1 second after tier_demote() returns to the main tiering loop. Update changetimerecorder xlator to handle query for emergency demote mode. Add tier-ctr-interface.h: Move tier and ctr interface specific macros and struct definition from libglusterfs/src/gfdb/gfdb_data_store.h to new header libglusterfs/src/tier-ctr-interface.h > Reviewed-on: http://review.gluster.org/15158 > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Dan Lambright <dlambrig@redhat.com> (cherry picked from commit 460016428cf27484c333227f534c2e2f73a37fb1) Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811 BUG: 1394482 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-on: http://review.gluster.org/15835 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* libglusterfs:Now mempool is added to ctx pool list under a lockRajesh Joseph2016-11-282-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | mempool is added to ctx pool list without any lock. This can cause undefined behaviour in case of multithreaded environment. Fix: modify the list only under ctx->lock > Reviewed-on: http://review.gluster.org/15842 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra Talur <rtalur@redhat.com> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> > Reviewed-by: Poornima G <pgurusid@redhat.com> > Reviewed-by: Niels de Vos <ndevos@redhat.com> (cherry picked from commit 277008a3a8583ef10cec9e4182960792e56c5c10) Change-Id: I7bdbb3db48a899bb0e41427e149b13c0facaedba Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> BUG: 1397664 Reviewed-on: http://review.gluster.org/15908 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* cluster/afr: Fix deadlock due to compound fopsKrutika Dhananjay2016-11-271-14/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/15929 When an afr data transaction is eligible for using eager-lock, this information is represented in local->transaction.eager_lock_on. However, if non-blocking inodelk attempt (which is a full lock) fails, AFR falls back to blocking locks which are range locks. At this point, local->transaction.eager_lock[] per brick is reset but local->transaction.eager_lock_on is still true. When AFR decides to compound post-op and unlock, it is after confirming that the transaction did not use eager lock (well, except for a small bug where local->transaction.locks_acquired[] is not considered). But within afr_post_op_unlock_do(), afr again incorrectly sets the lock range to full-lock based on local->transaction.eager_lock_on value. This is a bug and can lead to deadlock since the locks acquired were range locks and a full unlock is being sent leading to unlock failure and thereby every other lock request (be it from SHD or other clients or glfsheal) getting blocked forever and the user perceives a hang. FIX: Unconditionally rely on the range locks in inodelk object for unlocking when using compounded post-op + unlock. Big thanks to Pranith for helping with the debugging. Change-Id: I2edcc13ac00bc1ba2e3558891ba98d0cd410b47a BUG: 1398888 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15932 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/index: Delete granular entry indices of already healed directories ↵Krutika Dhananjay2016-11-262-2/+97
| | | | | | | | | | | | | | | | | | | | | | | | during crawl Backport of: http://review.gluster.org/15880 If granular name indices are already in existence for a volume, and before they are healed, granular entry heal be disabled, a crawl on indices/xattrop will clear the changelogs on these directories. When their corresponding entry-changes indices are crawled subsequently, if it is found that the directories don't need heal anymore, the granular indices are not cleaned up. This patch fixes that problem by ensuring that the zero-xattrop also deletes the stale indices at the level of index translator. Change-Id: If4a2f14e33a78f2217e9fea8733ebb552af56059 BUG: 1398500 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15926 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Handle rpc errors, xdr failures etc with proper NULL checksKrutika Dhananjay2016-11-251-6/+17
| | | | | | | | | | | | | Backport of: http://review.gluster.org/15924 Change-Id: Ie2a181e113ba24abca8cd4fd6bb722d048f014a8 BUG: 1398499 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15925 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* performance/io-threads: Exit threads in fini() as wellPranith Kumar K2016-11-242-11/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: io-threads starts the thread in 'init()' but doesn't clean them up on 'fini()'. It relies on PARENT_DOWN to exit threads but there can be cases where event before PARENT_UP the graph init code can think of issuing fini(). This code path is hit when glfs_init() is called on a volume that is in 'stopped' state. It leads to a crash in ganesha process, because the io-thread tries to access freed memory. Fix: Ideal fix would be to wait for all fops in io-thread list to be completed on PARENT_DOWN, and have fini() do cleanup of threads. Because there is no proper documentation about how PARENT_DOWN/fini are supposed to be used, we are getting different kinds of sequences in different higher level protocols. So for now cleaning up in both PARENT_DOWN and fini(). Fuse doesn't call fini() gfapi is not calling PARENT_DOWN in some cases, so for now I don't see another way out. >BUG: 1396793 >Change-Id: I9c9154e7d57198dbaff0f30d3ffc25f6d8088aec >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: http://review.gluster.org/15888 >Smoke: Gluster Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> >(cherry picked from commit 25817a8c868b6c1b8149117f13e4216a99e453aa) Change-Id: Id55e7c2f3e90c013d40e59bfbfb3f1583b8c4061 BUG: 1397381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15921 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>