| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.
Change #1:
For a read txn, in inode refresh cbk, if event_generation is
found zero, we are failing the read fop. This is not needed
because change in event gen is only a marker for the next inode refresh to
happen and should not be taken into account by the current read txn.
Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets even get (in afr_lookup_done) if there are non zero afr
xattrs. The resetting is done only to trigger an inode refresh and a
possible client side heal on the next lookup. That can be acheived by
setting the need_refresh flag in the inode ctx. So replaced all
occurences of resetting even gen to zero with a call to
afr_inode_need_refresh_set().
Change #3:
In both lookup and discover path, we are doing an inode refresh which is
not required since all 3 essentially do the same thing- update the inode
ctx with the good/bad copies from the brick replies. Inode refresh also
triggers background heals, but I think it is okay to do it when we call
refresh during the read and write txns and not in the lookup path.
The .ts which relied on inode refresh in lookup path to trigger heals are
now changed to do read txn so that inode refresh and the heal happens.
Change-Id: Iebf39a9be6ffd7ffd6e4046c96b0fa78ade6c5ec
Fixes: #1179
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Erik Jacobson <erik.jacobson at hpe.com>
(cherry picked from commit f0fcd909ad4535b60c9208d4804ebe6afe421a09)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After add-brick and rebalance, the ctime xattr is not present
on rebalanced directories on new brick. This patch fixes the
same.
Note that ctime still doesn't support consistent time across
distribute sub-volume.
This patch also fixes the in-memory inconsistency of time attributes
when metadata is self healed.
Backport of:
> Patch: https://review.gluster.org/23127
> Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
> BUG: 1734026
> Signed-off-by: Kotresh HR <khiremat@redhat.com>
Patch: https://review.gluster.org/23127
Change-Id: Ia20506f1839021bf61d4753191e7dc34b31bb2df
fixes: bz#1752413
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
On a EC volume, during upgrade from the older version where
ctime feature is not enabled(or not present) to the newer
version where the ctime feature is available (enabled default),
the self heal hangs and doesn't complete.
Cause:
The ctime feature has both client side code (utime) and
server side code (posix). The feature is driven from client.
Only if the client side sets the time in the frame, should
the server side sets the time attributes in xattr. But posix
setattr/fseattr was not doing that. When one of the server
nodes is updated, since ctime is enabled by default, it
starts setting xattr on setattr/fseattr on the updated node/brick.
On a EC volume the first two updated nodes(bricks) are not a
problem because there are 4 other bricks with consistent data.
However once the third brick is updated, the new attribute(mdata xattr)
will cause an inconsistency on metadata on 3 bricks, which
prevents the file to be repaired.
Fix:
Don't create mdata xattr with utimes/utimensat system call.
Only update if already present.
Backport of:
> Patch: https://review.gluster.org/22858
> Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c
> BUG: 1720201
> Signed-off-by: Kotresh HR <khiremat@redhat.com>
Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c
fixes: bz#1722805
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a fop to create an entry fails on one of the data brick,
we mark the pending changelog on the entry on brick for which
it was successful. This is done as part of post op phase to
make sure that entry gets healed even if it gets renamed to
some other path where its parent was not marked as bad.
As it happens as part of post op, we should consider thin-arbiter
to check if the brick, which was successful, is the good brick or not.
This will avoide split brain and other issues.
Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1662264
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
| |
Fixes: bz#1665358
Change-Id: Idbf88ec3ac683733b32c313377eeb72f2819bf0d
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
| |
With this changeset, default value for the AFR client side
heal volume option is set to "off"
fixes: bz#1663102
Change-Id: Ie4016932339c4896487e3e7cb5caca68739b7ba2
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
|
|
|
|
|
|
| |
Change-Id: Iec47856ce2819e7d7d38a60279602e53ba45858d
updates: bz#1624332
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Test : Check success/failure of write fop while
different bricks/ta process are down.
Change-Id: I3c376935df93ebf1f794c964bd19bc1280d91c59
updates: bz#1624332
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the proposal to remove few features as they are not
actively maintained [1], removing tier translator from the
build. Also make sure there are no regression tests involving
tiering feature are present.
[1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html
Change-Id: I2c177f711f9b54b7b24e1a13525ff3132bd9a9c5
updates: bz#1642807
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and
AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases
AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad.
- All further write txn failures are handled based on this in-memory
value without querying the TA.
- When shd heals the files, it does so by requesting full lock on
AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall),
releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on TA
to again update the in-memory state.
- Any incomplete write txns before the AFR_TA_DOM_NOTIFY upcall release
request is got is completed before the lock is released.
- Any write txns got after the release request are maintained in a ta_waitq.
- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.
- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to TA on wire. The other ones are maintained
in a ta_onwireq which is then processed after we get the response from
TA.
Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1579788
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change when SHD starts the index crawl it requests
all the clients to release the AFR_TA_DOM_NOTIFY lock so that
clients will know the in memory state is no more valid and
any new operations needs to query the thin-arbiter if required.
When SHD completes healing all the files without any failure, it
will again take the AFR_TA_DOM_NOTIFY lock and gets the xattrs on
TA to see whether there are any new failures happened by that time.
If there are new failures marked on TA, SHD will start the crawl
immediately to heal those failures as well. If there are no new
failures, then SHD will take the AFR_TA_DOM_MODIFY lock and unsets
the xattrs on TA, so that both the data bricks will be considered
as good there after.
Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
fixes: bz#1579788
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If both data bricks are up, read subvol will be based on read_subvols.
If only one data brick is up:
- First qeury the data-brick that is up. If it blames the other brick,
allow the reads.
- If if doesn't, query the TA to obtain the source of truth.
TODO: See if in-memory state can be maintained for read txns (BZ 1624358).
updates: bz#1579788
Change-Id: I61eec35592af3a1aaf9f90846d9a358b2e4b2fcc
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs name heal and all of them trigger heals waiting
for other clients to complete heal.
Fix:
When a name-heal is needed but quorum number of names have the
file and pending xattrs exist on the parent, then better to
delegate the heal to SHD which will be completed as part of
entry-heal of the parent directory. We could also do the same
for quorum-number of names not present but we don't have
any known use-case where this is a frequent occurrence so
not changing that part at the moment. When there is a gfid
mismatch or missing gfid it is important to complete the heal
so that next rename doesn't assume everything is fine and
perform a rename etc
fixes bz#1622821
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When metadata-self-heal is triggered on the mount, it blocks
lookup until metadata-self-heal completes. But that can lead
to hangs when lot of clients are accessing a directory which
needs metadata heal and all of them trigger heals waiting
for other clients to complete heal.
Fix:
Only when the heal is needed but the pending xattrs are not set,
trigger metadata heal that could block lookup. This is the only
case where different clients may give different metadata to the
clients without heals, which should be avoided.
Updates bz#1622821
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
modifications
PROBLEM:
Stats of dentries that are readdirp'd ahead can become stale due to
fops like writes, truncate etc that modify the file pointed by
dentries. When a readdir is finally wound at offset corresponding to
these entries, the iatts that are returned to the application come
from readdir-ahead's cache, which are stale by now. This problem gets
further aggravated when caching translators/modules cache and continue
to serve this stale information.
FIX:
* Store the iatt in context of the inode pointed by dentry.
* Whenever the inode pointed by dentry undergoes modification, in cbk
of modification fop, update the iatt stored in inode-ctx to reflect
the modification.
* When serving a readdirp response from application, update iatts of
dentries with the iatts stored in the context of inodes pointed by
these dentries.
* Some fops don't have valid iatts in their responses. For eg., write
response whose data is still cached in write-behind will have zeroed
out stat. In this case keep only ia_type and ia_gfid and reset rest
of the iatt members to zero.
- fuse-bridge in this case just sends "entry" information back to
kernel and attr is not sent.
- gfapi sets entry->inode to NULL and zeroes out the entire stat
* There is one tiny race between the entry creation and a readdirp on
its parent dir, which could cause the inode-ctx setting and inode
ctx reading to happen on two different inode objects. To prevent
this, when entry->inode doesn't eqaul to linked_inode,
- fuse-bridge is made to send only "entry" information without
attributes
- gfapi sets entry->inode to NULL and zeroes out the entire stat.
Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
| |
fixes bz#1615789
Change-Id: I1f42e78fec5ddaf2a425dc4b82c9a20472aa146d
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This test was retried once on build
https://build.gluster.org/job/regression-on-demand-multiplex/174/
(logs for the first try is not available with this build)
Test case was failing in line #47 where it was was checking for the
heal count to be 0. Line #51 had passed that means file got the gfid
split brain resolved, and both the bricks had same gfids.
At line #54 it again failed which checks for the md5sum on both the
bricks. At this point the md5sum of the brick where the file got
impunged had the md5sum same as the newly created empty file. This
means the data heal has not happened for the file.
At line #64 enabling granular-entry-heal faild, but without the logs
it is not possible to debug this issue.
Change-Id: I56d854dbb9e188cafedfd24a9d463603ae79bd06
fixes: bz#1615331
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
| |
Please see BZ for details.
Change-Id: Id9273432874bc6a452ac96b2b8c7a61ea6c5b98d
Fixes: bz#1615239
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
| |
Please see bug description for details.
Change-Id: Ieb6bce6d1d5c4c31f1878dd1a1c3d007d8ff81d5
fixes: bz#1614654
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shd keeps doing heals in a loop until it heals at least one entry in the
previous run. A heal is termed successful only if it heals both metadata and
entry/data heal i.e. the entry needs to be completely healed by just that healer.
In tests/basic/afr/granular-esh/replace-brick.t test, brick-0 is old and brick-1
is new. After replace-brick only root-gfid will be present in brick-0's index
1) shd-thread corresponding to brick-0 does metadata heal, this creates
root-gfid in brick-0's 'dirty' index.
2) Both healer threads corresponding to brick-0 and brick-1 now try to heal
root-gfid and brick-1 gets the heal-domain lock. brick-0's shd-thread will
experience a failure and it goes back to waiting for 10 minutes
(cluster.heal-timeout).
3) When brick-1's healer-thread completes healing root-gfid it creates 5 files
which create indices in brick-0, so until brick-0 doesn't trigger one more
heal, heal won't happen. $HEAL_TIMEOUT is set at 120 seconds, which is lesser
than cluster.heal-timeout, so decreasing this to 5 seconds so that the next
heal is triggered which will do the heals.
fixes bz#1613807
Change-Id: I881133fc28880d8615fbc4558a0dfa0dc63d7798
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
modified while in cache"
This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c.
With the latest master, I created a single brick volume and some files
inside it.
[root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
again"; ls -l /mnt/fuse1
umount: /mnt/fuse1: not mounted
total 0
----------. 0 root root 0 Jan 1 1970 file-1
----------. 0 root root 0 Jan 1 1970 file-2
----------. 0 root root 0 Jan 1 1970 file-3
----------. 0 root root 0 Jan 1 1970 file-4
----------. 0 root root 0 Jan 1 1970 file-5
d---------. 0 root root 0 Jan 1 1970 subdir
Trying again
total 3
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-1
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-2
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-3
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-4
-rw-r--r--. 1 root root 33 Aug 3 14:06 file-5
d---------. 0 root root 0 Jan 1 1970 subdir
[root@rhgs313-6 ~]#
Conversation can be followed on gluster-devel on thread with subj:
tests/bugs/distribute/bug-1122443.t - spurious failure. git-bisected
pointed this patch as culprit.
Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Updates: bz#1390050
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
invalidation
Invalidations are triggered mainly by two codepaths - upcall and
write-behind unwinding a cached write with zeroed out stat. For the
case of upcall, following race can happen:
* stat s1 is fetched from brick
* invalidation is detected on brick
* invalidation is propagated to md-cache and cache is invalidated
* s1 updates md-cache with a stale state
For the case of write-behind, imagine following sequence of operations,
* A stat s1 was issued from application thread t1 when size of file
was s1
* stat s1 completes on brick stack, but yet to reach md-cache
* A write w1 from application thread t2 extends file to size s2 is
cached in write-behind and response is unwound with zeroed out stat
* md-cache while handling write-cbk, invalidates cache
* md-cache receives response for s1, updates cache with stale stat
with size of s1 overwriting invalidation state
Fix is to remember when s1 was incident on md-cache and update cache
with results of s1 only if the it was incident after invalidation of
cache.
This patch identified some bugs in regression tests which is tracked
in https://bugzilla.redhat.com/show_bug.cgi?id=1608158. As a stop gap
measure I am marking following tests as bad
basic/afr/split-brain-resolution.t
bugs/bug-1368312.t
bugs/replicate/bug-1238398-split-brain-resolution.t
bugs/replicate/bug-1417522-block-split-brain-resolution.t
bugs/replicate/bug-1438255-do-not-mark-self-accusing-xattrs.t
Change-Id: Ia4bb9dd36494944e2d91e9e71a79b5a3974a8c77
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Updates: bz#1512691
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libargp or argp-standalone is available on all commonly used
distributions. There is no need to bundle an unmaintained version of
argp-standalone in this repository anymore.
FreeBSD places the argp.h file in /usr/local/include when
argp-standalone is installed. This path is not added to CPPFLAGS by
default, so thats done in configure.ac as well.
Change-Id: I384a53ab0a008ec9d48fd83afeaf8fbc197e91ee
Fixes: bz#1609337
Signed-off-by: Niels de Vos <ndevos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
while in cache
PROBLEM:
Entries that are readdirp'd ahead can undergo modification in terms
of writes, truncates which could modify their iatts. When a readdir
is finally wound at offset corresponding to these entries, the iatts
that are returned to the application come from readdir-ahead's cache,
which are stale by now. This problem gets further aggravated when caching
translators/modules cache and continue to serve this stale information.
FIX:
Whenever a dentry undergoes modification, in the cbk of the modification fop,
a "dirty" flag (default 0) is set in its inode ctx. When it's time for
readdir-ahead to serve these entries, it will read the inode ctx and check
if the entry is "dirty", and if it is, set the entry's attrs to all zeroes,
as an indicator to fuse, md-cache etc not to cache these attributes.
Also there is one tiny race between the entry creation and a readdirp on its
parent dir, which could cause the inode-ctx setting and inode ctx reading to
happen on two different inode objects. To prevent this, fuse-bridge is made to
drop entries for which dentry->inode is not the same as linked inode,
in readdirp cbk.
Change-Id: If7396507632b5268442ca580473d5155fee9cbef
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
| |
fixes bz#1597156
Change-Id: I323eb9190e40b12df216698dcdba74a6d336beeb
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
| |
fixes: bz#1596524
updates: gluster/glusterd2#515
Change-Id: I8a46fa2fd1fd2b0e9fbcecd3bb18d348aed9c6a9
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
BZ 1337791 marked this .t as bad based on the tar version being a likely
suspect. Undoing this to check as it passes on the latest jenkins slaves.
Change-Id: Ia581064a9c620351d3fe7aeef95d2644337952e1
fixes: bz#1595492
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reported-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Updates: #363
This new value (3) will try to wind read requests to the child of AFR
having the least amount of pending requests in its queue.
Change-Id: If6bda2aac9bf7aec3fc39622f78659313c4b6508
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
| |
BUG: 1557932
Change-Id: I3783e41b3812267bc10c0d05d062a31396ce135b
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
| |
We are not seeing much improvement with this change. So removing the
feature so that it doesn't need to be maintained anymore.
Fixes: #414
Change-Id: Ic7969b151544daf2547bd262a9fa03f575626411
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses 'timeout' command with 300 seconds default. Right now,
there is just 1 test which takes more than that in a properly
setup machine.
Ideally best case is set the default to something like 30 seconds,
and if a test is supposed to take more than that, owner should add
a timeout line to test knowingly. That way, it makes test writers
think about a time limit too.
Change-Id: I747005ce1f208aeb2ecbf899e8feea487ecd21a0
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not
met. Though the FOP is still unwound with failure, the xattrs remain on
the disk. Due to these partial post-ops and partial heals (healing only when
2 bricks are up), we can end up in split-brain purely from the afr
xattrs point of view i.e each brick is blamed by atleast one of the
others. These scenarios are hit when there is frequent
connect/disconnect of the client/shd to the bricks while I/O or heal
are in progress.
Fix:
Instead of undoing the post-op, pick a source based on the xattr values.
If 2 bricks blame one, the blamed one must be treated as sink.
If there is no majority, all are sources. Once we pick a source,
self-heal will then do the heal instead of erroring out due to
split-brain.
Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
BUG: 1539358
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This test was failing due to an infra issue. The infra issue is now
fixed.
BUG: 1517961
Change-Id: I09dfab9c0a3ebe73c738222e6269d9e35c85eddb
Signed-off-by: Nigel Babu <nigelb@redhat.com>
|
|
|
|
|
|
| |
Change-Id: If6c36dc6c395730dfb17b5b4df6f24629d904926
BUG: 1517961
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In an arbiter volume, lookup was being served from one of the sink
bricks (source brick was down). shard uses the iatt values from lookup cbk
to calculate the size and block count, which in this case were incorrect
values. shard_local_t->last_block was thus initialised to -1, resulting
in an infinite while loop in shard_common_resolve_shards().
Fix:
Use client quorum logic to allow or fail the lookups from afr if there
are no readable subvolumes. So in replica-3 or arbiter vols, if there is
no good copy or if quorum is not met, fail lookup with ENOTCONN.
With this fix, we are also removing support for quorum-reads xlator
option. So if quorum is not met, neither read nor write txns are allowed
and we fail the fop with ENOTCONN.
Change-Id: Ic65c00c24f77ece007328b421494eee62a505fa0
BUG: 1467250
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
| |
Update .t tier tests to use the new tier CLI.
Change-Id: I0e7f1769071108d8266fc86378c4466bcaf96e7d
BUG: 1505253
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Append on a file with split-brain succeeds. Open is intercepted by open-behind,
when write comes on the file, open-behind does open+write. Open succeeds
because afr doesn't fail it. Then write succeeds because write-behind
intercepts it. Flush is also intercepted by write-behind, so the application
never gets to know that the write failed.
Fix:
Fail open on split-brain, so that when open-behind does open+write open fails
which leads to write failure. Application will know about this failure.
Change-Id: I4bff1c747c97bb2925d6987f4ced5f1ce75dbc15
BUG: 1294051
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
| |
BUG: 1501390
Change-Id: I9a04c094783ec33e617baeae3d0e0cbedb1d6c3b
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable gfid2path feature by default. The basic
performance tests are carried out and it doesn't
show significant depreciation. The results are
updated in issue.
Updates: #139
Change-Id: I5f1949a608d0827018ef9d548d5d69f3bb7744fd
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/17950
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterd shouldn't concern itself with creating directories specific to
certain xlators.
The index xlator will now proceed creating './glusterfs/indices' dir
only if the parent '.glusterfs' directory exists, which still fixes the
original problem reported i.e 'volume start force' command shouldn't
create brick path if it doesn't exist (BUG 1457202)
This reverts most of the changes done by the commit
b58a15948fb3fc37b6c0b70171482f50ed957f42
Change-Id: I7fc52ad64dce220e336c218fb4d85933ca2e61c0
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Reviewed-on: https://review.gluster.org/18003
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Currently there is no way for the admin from CLI to resolve gfid
split-brain based on some policy like choice of the brick, mtime
or size.
Fix:
With the existing CLI options based on size, mtime, and choice of
brick, we do lookup on the parent for the specified file. As
part of the lookup, if we find gfid mismatch, we resolve them
based on the policy and return. If the file is not in gfid split-
brain, then we check for the data and metadata split-brain in the
getxattr code path, and resolve if any.
This will work provided absolute path to the file with the CLI
and not with gfid of the file. Hence the source-brick policy
without any file path will also not resolve the gfid split-brain
since it uses the gfid of the files. But it can resolve any other
type of split-brains and skip the gfid mismatch resolution with
the usual error message.
Reverting the change https://review.gluster.org/17290. This patch
resolves the issue.
Fixes gluster/glusterfs#135
Change-Id: Iaeba6fc32f184a34255d03be87cda02773130a09
BUG: 1459530
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: https://review.gluster.org/17485
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
At the moment when we have replica 3 or arbiter setup, even when
lk succeeds on just one brick we give success to application which
is wrong
Fix:
Consider quorum-number of successes as success when quorum is enabled.
BUG: 1461792
Change-Id: I5789e6eb5defb68f8a0eb9cd594d316f5cdebaea
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17524
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
..or else when a volume start force is given, we end up creating
/brick-path/.glusterfs/indices folder and various subdirs under it and
eventually starting the brick process.
As a part of this patch, glusterd_get_index_basepath() is added in
glusterd, who will then use it to create the basepath during
volume-create, add-brick, replace-brick and reset-brick. It also uses this
function to set the 'index-base' xlator option for the index translator.
Change-Id: Id018cf3cb6f1e2e35b5c4cf438d1e939025cb0fc
BUG: 1457202
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/17426
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gfid-mismatch-resolution-with-fav-child-policy.t does a `TEST ls
$M0/f3` (line #170) to trigger healing of a file in gfid split-brain in
a rep-3 volume. But the code to trigger name heal of gfid split-brain
file is not yet there. The test is passing due a lookup/ stat on $M0
which triggers a background entry self heal (which has the code to heal
gfid split-brain files) which may or may not complete the heal before
line 170. If it doesn't, lookup on f3 is failing with EIO.
Add the .t to bad tests until Karthik's patch for CLI based gfid
split-brain resolution fixes name heal also.
Change-Id: Iba6e9d81db386bc406aff1ecb6a18851f09bf7c0
BUG: 1450730
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/17290
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Currently the automatic split brain resolution with favorite child policy
is not resolving the GFID split brains.
Fix:
When there is a GFID split brain and the favorite child policy is set to
size/mtime/ctime/majority, based on the policy decide on the source and
sinks. Delete the entry from the sinks and recreate it from the source.
Mark the appropriate pending attributes and resolve the GFID split brain.
When the heal takes place it will complete the pending heals and reset
the attributes.
Change-Id: Ie30e5373f94ca6f276745d9c3ad662b8acca6946
BUG: 1430719
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: https://review.gluster.org/16878
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: add-brick command to increase replica count in an arbiter
volume succeeds, causing undesirable effects like the 4th brick being
loaded with the arbiter xlator, the 3rd one losing the arbiter xlator
(when the brick process is restarted), arbitration logic in afr going
for a toss etc.
Fix: Arbiter configuration should always be a replica 3 volume (of
which 3rd brick is arbiter). Hence disallow increasing replica count for
arbiter volume configurations.
Change-Id: I9fe4edac880d0f711e6d44324ad5562974e53e51
BUG: 1429200
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/16845
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This test was commented out with the belief that it depended
on utimensat() support, but in fact it was not necessary because
`stat -c %Y` only outputs second resolution.
Simply commenting in the test made it fail because it checked
the values *before* the heal, while intended was to check them
*after* the heal. This commit fixes that.
Change-Id: I4194ac645b365a1f906a3ac9bcbbdb1f05000e27
BUG: 1422074
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
Reviewed-on: https://review.gluster.org/16789
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niklas Hambüchen
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for multiple brick translator stacks running
in a single brick server process. This reduces our per-brick memory usage by
approximately 3x, and our appetite for TCP ports even more. It also creates
potential to avoid process/thread thrashing, and to improve QoS by scheduling
more carefully across the bricks, but realizing that potential will require
further work.
Multiplexing is controlled by the "cluster.brick-multiplex" global option. By
default it's off, and bricks are started in separate processes as before. If
multiplexing is enabled, then *compatible* bricks (mostly those with the same
transport options) will be started in the same process.
Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
BUG: 1385758
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: https://review.gluster.org/14763
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently STACK_UNWIND is performnd within ctx->lock.
If readdir-ahead is loaded as a child of dht, then there
can be scenarios where the function calling STACK_UNWIND
becomes re-entrant. Its a good practice to not call
STACK_WIND/UNWIND within local mutex's
Change-Id: If4e869849d99ce233014a8aad7c4d5eef8dc2e98
BUG: 1401812
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/16068
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
1. Have a replica 2 volume with bricks b1 and b2
2. Before setting the layout, b1 goes down
3. Set the layout write some data, which gets populated on b2
4. b2 goes down then b1 comes up
5. Add another brick b3, and heal will take place from b1 to b3, which
basically have no data
6. Write some data. Both b1 and b3 will mark b2 for pending writes
7. b1 goes down, and b2 comes up
8. b2 gets heald from b1. During heal it removes the data which is already
in b2, considering that as stale data. This leads to data loss.
Solution:
1. In glusterd stage-op, while adding bricks, check whether the replica
count is being increased
2. If yes, then check whether any of the bricks are down at that time
3. If yes, then fail the add-brick to avoid such data loss
4. Else continue the normal operation.
This check will work enen when we convert plain distribute volume to replicate
Test:
1. Create a replica 2 volume
2. Kill one brick from the volume
3. Try adding a brick to the volume
4. It should fail with all bricks are not up error
5. Cretae a distribute volume and kill one of the brick
6. Try to convert it to replicate volume, by adding bricks.
7. This should also fail.
Change-Id: I9c8d2ab104263e4206814c94c19212ab914ed07c
BUG: 1406411
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: http://review.gluster.org/16330
Tested-by: Ravishankar N <ravishankar@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|