| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Race is explained at
https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0
This patch also handles performing of self-heal with shd-pid.
Also performs the healing with this->itable's inode rather than
main itable.
BUG: 1337405
Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/14422
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If a named fresh-lookup is done on an loc and the fop fails on one of the
bricks or not sent on one of the bricks, but by the time response comes to afr,
if the brick is up, 'can_interpret' will be set to false in afr_lookup_done(),
this will lead to inode-ctx for that inode to be not set, this can lead to EIO
in case of a transaction as it depends on 'readable' array to be available by
that point.
Fix:
Refresh inode for inode-write fops for the ctx to be set if it is not already
done at the time of named fresh-lookup or if the file is in split-brain where
we need to perform one more refresh before failing the fop to check if the file
is still in split-brain or not.
BUG: 1336612
Change-Id: I5c50b62c8de06129b8516039f7c252e5008c47a5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/14368
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Spurious entries are reported in heal info when the mount is on second/third
brick of the replica pair because local-child is given preference in selecting
source. The code is supposed to suggest the file needs heal if the (source < 0)
(failure code path), but instead it is written as if any non-zero value
is considered failure.
Fix:
Treat +ve source as success case
BUG: 1335429
Change-Id: I1be7f9defef2ae03be7eec8d7d49bf34adeca82c
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/14302
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I52da41dff5619492b656c2217f4716a6cdadebe0
BUG: 1269461
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12442
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: During mount, afr waits for response from all its children before
notifying the parent xlator. In a 1x2 replica volume , if one of the nodes is
down, the mount will hang for more than a minute until child down is received
from the client xlator for that node.
Fix:
When parent up is received by afr, start a 10 second timer. In the timer call
back, if we receive a successful child up from atleast one brick, propagate the
event to the parent xlator.
Change-Id: I31e57c8802c1a03a4a5d581ee4ab82f3a9c8799d
BUG: 1054694
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11113
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Locking schemes in afr-v1 were locking the directory/file completely during
self-heal. Newer schemes of locking don't require Full directory, file locking.
But afr-v2 still has compatibility code to work-well with older clients, where
in entry-self-heal it takes a lock on a special 256 character name which can't
be created on the fs. Similarly for data self-heal there used to be a lock on
(LLONG_MAX-2, 1). Old locking scheme requires heal info to take sh-domain locks
before examining heal-state. If it doesn't take sh-domain locks, then there is
a possibility of heal-info hanging till self-heal completes because of
compatibility locks. But the problem with heal-info taking sh-domain locks is
that if two heal-info or shd, heal-info try to inspect heal state in parallel
using trylocks on sh-domain, there is a possibility that both of them assuming
a heal is in progress. This was leading to spurious entries being shown in
heal-info.
Fix:
As long as there is afr-v1 way of locking, we can't fix this problem with
simple solutions. If we know that the cluster is running newer versions of
locking schemes, in those cases we can give accurate information in heal-info.
So introduce a new option called 'locking-scheme' which if it is 'granular'
will give correct information in heal-info. Not only that, Extra network hops
for taking compatibility locks, sh-domain locks in heal info will not be
necessary anymore. Thus it improves performance.
BUG: 1322850
Change-Id: Ia563c5f096b5922009ff0ec1c42d969d55d827a3
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13873
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
All inodes that are looked-up are always forgotten without fail in
afr removing the benefits of them being in lru. This same code can
cause crashes if between inode_lookup, inode_forget in afr if the
top xlator does inode_forget(0).
Fix:
Don't use lookup/forget in afr. No benefits are there at the moment
for keeping this code. It is impossible to prevent top xlators to
do inode_forget(0). Found similar instances in ec
and removed them even though those code paths are not going to
be executed in any place other than heal-daemon.
BUG: 1321554
Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13834
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is part two change to prevent data loss
in a replicate volume on doing a add-brick operation.
Problem: After doing add-brick, there is a chance
that self heal might happen from the newly added
brick rather than the source brick, leading to data loss.
Solution: Mark pending changelogs on afr children for
the new afr-child so that heal is performed in the
correct direction.
Change-Id: I11871e55eef3593aec874f92214a2d97da229b17
BUG: 1276203
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12454
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is better to choose local brick as source if possible to prevent
over the wire read thus saving on bandwidth. Also changed code to not
attempt data-heal if 'source' is selected as arbiter.
Change-Id: I9a328d0198422280b13a30ab99545370a301dfea
BUG: 1314150
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13585
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Tested-by: Krutika Dhananjay <kdhananj@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. In afr_getxattr_cbk, consider the errno value before blindly
launching an inode refresh and a subsequent retry on other children.
2. We want to accuse small files only when we know for sure that there is no
IO happening on that inode. Otherwise, the ia_sizes obtained in the
post-inode-refresh replies may mismatch due to a race between
inode-refresh and ongoing writes, causing spurious heal launches.
Change-Id: Ife180f4fa5e584808c1077aacdc2423897675d33
BUG: 1309462
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/13595
Smoke: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If afr_lookup_done() or afr_read_subvol_select_by_policy() chooses the
arbiter brick to serve the stat() data, file size will be reported as
zero from the mount, despite other data bricks being available. This can
break programs like tar which use the stat info to decide how much to read.
Fix:
In the inode-context, mark arbiter as a non-readable subvol for both
data and metadata.
It it to be noted that by making this fix, we are *not* going to serve
metadata FOPS anymore from the arbiter brick despite the brick storing
the metadata. It makes sense to do this because the ever increasing
over-loaded FOPs (getxattr returning stat data etc.) and compound FOPS
in gluster will otherwise make it difficult to add checks in code to
handle corner cases.
Change-Id: Ic60b25d77fd05e0897481b7fcb3716d4f2101001
BUG: 1310171
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Mat Clayton <mat@mixcloud.com>
Reviewed-on: http://review.gluster.org/13539
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Afr does dict_ref of the xattr_req that comes to it and deletes "gfid-req" key.
Dht uses same dict to send lookup to other subvolumes. So in case of
directories and more than 1 dht subvolumes, second subvolume till the last
subvolume won't get a lookup request with "gfid-req". So gfid reset never
happens on the directories in distributed replicate subvolume for 2nd till last
subvolumes.
Fix:
Make a copy of lookup xattr request.
Also fixed replies_wipe possibly resetting gfid to NULL gfid
BUG: 1312816
Change-Id: Ic16260e5a4664837d069c1dc05b9e96ca05bda88
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13545
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a heal is needed after inode refresh (lookup, read_txn), launch it in
the background instead of blocking the fop (that triggered refresh) until the
heal happens.
afr_replies_interpret() is modified such that the heal is
launched only if atleast one sink brick is up.
Max. no of heals that can happen in parallel is configurable via the
'background-self-heal-count' volume option. Any number greater than that
is put in a wait queue whose length is configurable via
'heal-wait-queue-leng' volume option. If the wait queue is also full,
further heals will be ignored.
Default values: background-self-heal-count=8, heal-wait-queue-leng=128
Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70
BUG: 1297172
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/13207
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now heal-info does an open() on the file being examined so that
the client at some point sees open-fd count being > 1 and releases
the eager-lock so that heal-info doesn't remain blocked forever
until IO completes.
Change-Id: Icc478098e2bc7234408728b54d8185102b3540dc
BUG: 1297695
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/13326
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I48d9e5313bd3ccf9fe26c90a7051a8a174d75c49
BUG: 1296818
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/13195
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
posix_mkdir does not send response xdata. So even though replies are
valid, the response xdata dict is NULL. Check if dict is non-null in
afr_replies_interpret before doing dict_get
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Change-Id: If543d68d8bfd2433519105839d5be106076cc276
BUG: 1294053
Reviewed-on: http://review.gluster.org/13185
Tested-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 2b7226f9d3470d8fe4c98c1fddb06e7f641e364d did not check for the
validity of a dict before doing a dict_get. Fix that.
Change-Id: Ie21f4da19256b17196f242cd8fd5bb76b0a69c1e
BUG: 1294053
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/13077
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an object (file) is marked bad by bitrot, do not consider the brick
on which the object is present as a potential read subvolume for AFR
irrespective of the pending xattr values.
Also do not consider the brick containing the bad object while
performing afr_accuse_smallfiles(). Otherwise if the bad object's size
is bigger, we may end up considering that as the source.
Change-Id: I4abc68e51e5c43c5adfa56e1c00b46db22c88cf7
BUG: 1290965
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12955
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For fd based operations (fgetxattr, readv etc.) if an inode refresh is
required, do so using fstat instead of lookup. This is because the file
might have been deleted by another client before refresh but posix
mandates that FOPS using already open fds must still succeed.
Change-Id: Id5f71c3af4892b648eb747f363dffe6208e7ac09
BUG: 1285230
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12894
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ida863844e14309b6526c1b8434273fbf05c410d2
BUG: 1250803
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12658
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Things done :
1) during lookup and inode_refresh as part of read_txn,
request is sent to detect if heal is required or not.
2) If heal is required, be conservative in setting the
readdirp entry inodes to NULL, otherwise don't be.
3) Self-heal-daemon now crawls both indices/xattrop
and indices/dirty directory while healing.
Change-Id: Ic4a4da63fb7e0726eab5f341a200859b29cf7eb7
BUG: 1250803
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12507
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I8ae7af266d3e00460f0cfdc9389a926e5f2fee36
BUG: 1282761
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12598
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As a part of CHILD_MODIFIED event DHT forgets the current layout and
performs fresh lookup. However this is not required when a replica pair
goes offline as the xattrs can be read from other replica pairs. Hence
setting different event to handle replica pair going down.
Change-Id: I5ede2a6398e63f34f89f9d3c9bc30598974402e3
BUG: 1281230
Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/12573
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. When winding the pre-op, transaction.pre_op[i] is set. If the pre-op fails,
transaction.failed_subvols[i] is set. If if fails on all chidren, we can
directly proceed to unlock (via afr_changelog_post_op_now) without trying
to wind the write, fail and then go to unlock.
2. 'fop_subvols' seems to be an unused variable, hence removing it.
Change-Id: I9525628daf48082e979b0093fa0478934495e61f
BUG: 1272362
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12368
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On executing `getfattr -n replica.split-brain-status <file>` on mount,
there is a possibility that the mount hangs. To avoid this hang,
fetch the split-brain-status of a file in synctask.
Change-Id: I87b781419ffc63248f915325b845e3233143d385
BUG: 1262345
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/12163
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are three kinds of inline functions: plain inline, extern inline,
and static inline. All three have been removed from .c files, except
those in "contrib" which aren't our problem. Inlines in .h files, which
are overwhelmingly "static inline" already, have generally been left
alone. Over time we should be able to "lower" these into .c files, but
that has to be done in a case-by-case fashion requiring more manual
effort. This part was easy to do automatically without (as far as I can
tell) any ill effect.
In the process, several pieces of dead code were flagged by the
compiler, and were removed.
Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155
BUG: 1245331
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/11769
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When xlators above afr do [f]xattrop when one of the bricks is down, after the
brick comes backup, the metadata is not healed because [f]xattrop is not
considered a transaction.
Fix:
Treat [f]xattrop as transaction so that changes done by xlators above afr are
marked for heal when some of the bricks were down at the time of [f]xattrop.
Change-Id: Iea180f9a456509847c3cd8d5d59a0cdc2712d334
BUG: 1248887
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/11809
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During lookup and discover, currently read_subvol is based
only on data_readable. read_subvol should be decided based
on both data_readable and metadata_readable.
Credits to Ravishankar N for the logic of afr_first_up_child
from http://review.gluster.org/10905/ .
Change-Id: I98580b23c278172ee2902be08eeaafb6722e830c
BUG: 1240244
Signed-off-by: Anuradha Talur <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/11551
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When a replica's child goes down and comes up, the index heal is
triggered only on the child that just came up. This does not serve the
intended purpose as the list of files that need to be healed
to this child is actually captured on the other child of the replica.
Fix:
Launch index-heal on all local children of the replica xlator which just
received a child up. Note that afr_selfheal_childup() eventually calls
afr_shd_index_healer() which will not run the heal on non-local
children.
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Change-Id: Ia23e47d197f983c695ec0bcd283e74931119ee55
BUG: 1253309
Reviewed-on: http://review.gluster.org/11912
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: AFR serves statfs from the brick having the least free space
available. Since the size to be allocated to the arbiter brick in a 3
way replica is supposed to be considerably lesser than the other 2
bricks, statfs will be served from this brick which is incorrect.
Fix: Don't serve statfs from the arbiter brick.
Change-Id: I5af098b9c50626f52cf3d7dbb060bf754c797f05
BUG: 1251346
Reported-by: Fredrik Brandt <fredrikb@denlillaplaneten.se>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/11857
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
updated
Change-Id: I94ac7b2cb0d43a82cf0eeee21407cff9b575c458
BUG: 1194640
Signed-off-by: arao <arao@redhat.com>
Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com>
Reviewed-on: http://review.gluster.org/9897
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For directories, block metadata FOPS.
For non-directories, block data and metadata FOPS.
Do not block entry FOPS.
Change-Id: Id7f656f4a513b9d33c457dd7f2d58028dbef8e61
BUG: 1235007
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/11371
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
| |
calculation
Change-Id: I12c1e4f67f4ec4affbe13d7daf871044a8a2a12e
BUG: 1235216
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/11373
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
afr-v1 had the following volume set options that are used to enable/ disable
self-heals from happening in AFR xlator when loaded in the client graph:
cluster.metadata-self-heal
cluster.data-self-heal
cluster.entry-self-heal
In afr-v2, these 3 heals can happen from the client if there is an inode
refresh. This patch allows such heals to proceed only if the corresponding
volume set options are set to true.
Change-Id: I8d97d6020611152e73a269f3fdb607652c66cc86
BUG: 1226507
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/11012
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of including config.h in each file, and have the additional
config.h included from the compiler commandline (-include option).
When a .c file tests for a certain #define, and config.h was not
included, incorrect assumtions were made. With this change, it can not
happen again.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I7ec29428b7f7ef249014f948a5d616bfb8aaf80d
BUG: 1225491
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/10946
Tested-by: NetBSD Build System
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In afr_lookup_xattr_req_prepare(), dict_copy was
done even though source dict was NULL.
Change-Id: I85a5d2823ba021e7f78c1ce13402a0f16b08cb51
BUG: 1220332
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/10755
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Provided setfattr command to set timeout for split-brain
choice.
2) If split-brain inspection/resolution is being done
from the mount for a file, ref the inode when
split-brain-choice is set.
This inode will be unconditionally unref-ed after timeout
seconds set by the user/default otherwise.
3) Updated the doc and testcase to reflect the changes.
Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1
BUG: 1209104
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/10134
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add logic in afr to work in conjunction with the arbiter xlator when a
replica 3 arbiter volume is created. More specifically, this patch:
* Enables full locks for afr data transaction for such volumes.
* Removes the upfront marking of pending xattrs at the time of pre-op
and defer it to post-op. (This is an arbiter independent change and is made for all afr transactions.)
* After pre-op stage, check if we can proceed with the fop stage without
ending up in split-brain by examining the changelog xattrs.
* Unwinds the fop with failure if only one source was available at the
time of pre-op and the fop happened to fail on particular source brick.
* Skips data self-heal if arbiter brick is the only source available.
* Adds the arbiter-count option to the shd graph.
This patch is a part of the arbiter logic implementation for 3 way AFR
details of which can be found at http://review.gluster.org/#/c/9656/
Change-Id: I9603db9d04de5626eb2f4d8d959ef5b46113561d
BUG: 1199985
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/10258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If both dicts are NULL then equal. If one of the dicts is NULL but the other
has only ignorable keys then also they are equal. If both dicts are non-null
then check if for each non-ignorable key, values are same or not. value_ignore
function is used to skip comparing values for the keys which must be present in
both the dictionaries but the value could be different.
geo-rep's stime xattr doesn't need to be present in list xattr but when
getxattr comes on stime xattr even if there aren't enough responses with the
xattr we should still give out an answer which is maximum of the stimes
available.
Change-Id: I8de2ceaa2db785b797f302f585d88e73b154167d
BUG: 1207712
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/10078
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
|
|
|
|
|
|
|
|
|
|
|
| |
CID : 1194648
Change-Id: Ib26e7cdbf412d563240885fb3113bcc1fe5c9c49
BUG: 789278
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/9571
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterfs relies on Linux uuid implementation, which
API is incompatible with most other systems's uuid. As
a result, libglusterfs has to embed contrib/uuid,
which is the Linux implementation, on non Linux systems.
This implementation is incompatible with systtem's
built in, but the symbols have the same names.
Usually this is not a problem because when we link
with -lglusterfs, libc's symbols are trumped. However
there is a problem when a program not linked with
-lglusterfs will dlopen() glusterfs component. In
such a case, libc's uuid implementation is already
loaded in the calling program, and it will be used
instead of libglusterfs's implementation, causing
crashes.
A possible workaround is to use pre-load libglusterfs
in the calling program (using LD_PRELOAD on NetBSD for
instance), but such a mechanism is not portable, nor
is it flexible. A much better approach is to rename
libglusterfs's uuid_* functions to gf_uuid_* to avoid
any possible conflict. This is what this change attempts.
BUG: 1206587
Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/10017
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
read-subvol-entry.t tests that if a brick has pending operations,
it is not used for readdir operations. On NetBSD this test exhibits
spurious failures, with the wrong brick being used to perform readdir.
It happens because when afr_replies_interpret() looks at xattr for
pending attributes, it uses alternative bahvior whether it is working
on a directory or another object. The decision is based on inode->ia_type,
which may be IA_INVAL at that time if we come there from:
afr_replies_interpret.()
afr_xattrs_are_equal()
afr_lookup_metadata_heal_chec()
afr_lookup_entry_heal()
afr_lookup_cbk()
Using replies[i].poststat.ia_type, which is correctly set, works around
the problem.
BUG: 1129939
Change-Id: Id9ccdd8604f79a69db5f1902697f8913acac50ad
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/9831
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the "Signer" -- responsible for signing files with their
checksums upon last file descriptor close (last release()).
The event notification facility provided by the changelog xlator
is made use of.
Moreover, checksums are as of now SHA256 hash of the object data
and is the only available hash at this point of time. Therefore,
there is no special "what hash to use" type check, although it's
does not take much to add various hashing algorithms to sign
objects with. Signatures are stored in extended attributes of the
objects along with the the type of hashing used to calculate the
signature. This makes thing future proof when other hash types
are added. The signature infrastructure is provided by bitrot
stub: a little piece of code that sits over the POSIX xlator
providing interfaces to "get or set" objects signature and it's
staleness.
Since objects are signed upon receiving release() notification,
pre-existing data which are "never" modified would never be
signed. To counter this, an initial crawler thread is spawned
The crawler scans the entire brick for objects that are unsigned
or "missed" signing due to the server going offline (node reboots,
crashes, etc..) and triggers an explicit sign. This would also
sign objects when bit-rot is enabled for a volume and/or after
upgrade.
Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b
BUG: 1170075
Original-Author: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9711
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Part 2/2 patch to enable users analyze and resolve
split-brain.
This patch enables :
1) Users to inspect the files in data and metadata split-brain.
2) Resolve the split-brain.
Both using a series of setfattr commands.
Consider a volume "test" with 2 bricks.
1) To inspect a file f1:
setfattr -n replica.split-brain-choice -v test-client-0 f1
After the execution of this command, if no read_subvol
is found, reads will be served from test-client-0 (corresponding
to brick-0).
2) To resolve split-brain :
setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1
Execution of this command will lead to the resolution
of data and metadata split-brain with subvol mentioned in the
command (test-client-0 here) as the source and the rest as sink.
Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179
BUG: 1191396
Signed-off-by: Anuradha <atalur@redhat.com>
Reviewed-on: http://review.gluster.org/9743
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
During pre-op phase, the index xlator
1. Creates the entry inside .glusterfs/indices/xattrop
2. Winds the xattrop fop to posix to mark dirty/pending changelogs.
If the brick crashes after 1, the xattrop entry becomes stale and never
gets removed by shd during subsequent crawls because there is nothing to
heal (changelogs are zero).
Though the stale entry does not get displayed in the output of 'heal
info' command, it nevertheless stays there forever unless a new write
transaction is performed on the file.
Fix:
During index self-heal if afr xattrs are found to be clean (indicated by
ret value of 2 on a call to afr_shd_selfheal(), send a dummy
post-op with all 0s for the xattr values, which makes the index xlator
to unlink the stale entry.
Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919
BUG: 1190069
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/9714
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Afr needs to query QUOTA_SIZE_KEY from all the subvolumes and return the
value which is maximum of the readable bricks.
Change-Id: Ibb9064c8652aea0d984796e7a06f8adca72aa971
BUG: 1199431
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9820
Reviewed-by: Anuradha Talur <atalur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Provide a way of disabling reads when quorum is not met.
Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5
BUG: 1187885
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9543
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PROBLEM:
When file modifications are happening while index heal is launched,
index healer could pick up entries which appeared in indices/xattrop
transiently during the course of the operations on the mount point, and
do not really need any heal. This will cause index healer to keep doing
index-heal in a loop as long as it finds this entry, by believing that
it did successfully heal some gfids even when it didn't.
FIX:
afr_selfheal() now returns a 1 to indicate that it did not (need to)
heal a given gfid. afr_shd_selfheal() will not increment healed_count
whenever afr_selfheal() returns a 1.
Change-Id: I0d97e11392a032a852e8c6508f691300ef0e5b98
BUG: 1194305
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/9713
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Initially even after calling glfs_fini(), all the threads created
during init and many other resources like memory pool, iobuf pool,
event pool and other memory allocs were not being freed.
With this patch these resources are freed in glfs_fini().
The two thumb rules followed in this patch are:
- The threads are not killed, they are made to exit voluntarily,
once the queued tasks are completed. The main thread waits for
the other threads to exit.
- Free the memory pools and destroy the graphs only after all the
other threads are stopped, so that there are less chances of
hitting access after free.
Resources freed and its order:
1. Destroy the inode table of all the graphs - Call forget on all the inodes.
This will not be required when the cleanup during graph switch is
implemented to perform inode table destroy.
2. Deactivate the current graph, call fini of all the xlators.
3. Syncenv destroy - Join the synctask threads and cleanup syncenv resources
Sets the destroy mode, complete the existing synctasks, then join the
synctask threads.
After entering the destroy mode,
-if a new synctask is submitted, it fails.
-if syncenv_new() is called, it will end up creating new threads,
but this is called only during init.
4. Poller thread destroy
Register an event handler which sets the destroy mode for the poller.
Once the poller is done processing all the events, it exits.
5. Tear down the logging framework
The log file is closed and the log level is set to none, after this
point no log messages appear either in log file or in stderr.
6. Destroy the timer thread
Set the destroy bit, once the pending timer events are processed
the timer thread exits.
Note: Log infrastructure should be shutdown before destroying the timer
thread as gf_log uses timers.
7. Destroy the glusterfs_ctx_t
For all the graphs(active and passive), free graph, xlator structs and few other lists.
Free the memory pools - iobuf pool, event pool, dict, logbuf pool,
stub mem pool, stack mem pool, frame mem pool.
Few things not addressed in this patch:
1. rpc_transport object not destroyed, the PARENT_DOWN should have
destroyed this object but has not, needs to be addressed as a part
of different patch
2. Each xlator fini should clean up the local pool allocated by its xlator.
Needs to be addresses as a part of different patch.
3. Each xlator should implement forget to free its inode_ctx.
Needs to be addresses as a part of different patch.
3. Few other leaks reported by valgrind.
4. fd and fd contexts
The numbers:
The resource usage by the test case in this patch:
Without the fix, Memory: ~3GB; Threads: ~81
With this fix, Memory: 300MB; Threads: 1(main thread)
Change-Id: I96b9277541737aa8372b4e6c9eed380cb871e7c2
BUG: 1093594
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/7642
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|