| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the logging levels from WARNING to DEBUG in the lookup path to
minimize incessant logging in case of gfid mismatch errors.
Change-Id: I631b16df3249cf826606f547531f985dac696088
BUG: 959083
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/4939
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, nufa wouldn't work on volume topologies such as
distribute-replicate or distribute-stripe.
Change-Id: Ia89ed4412a00601022c1fc94f046056ce4820fe8
BUG: 980838
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/5262
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
At the moment data-self-heal acquires locks in following
pattern. It takes full file lock then gets xattrs on files on both
replicas. Decides sources/sinks based on the xattrs. Now it acquires
lock from 0-128k then unlocks the full file lock. Syncs 0-128k range
from source to sink now acquires lock 128k+1 till 256k then unlocks
0-128k, syncs 128k+1 till 256k block... so on finally it takes full file
lock again then unlocks the final small range block.
It decrements pending counts and then unlocks the full file lock.
This pattern of locks is chosen to avoid more than 1 self-heal
to be in progress. BUT if another self-heal tries to take a full
file lock while a self-heal is already in progress it will be put in
blocked queue, further inodelks from writes by the application will
also be put in blocked queue because of the way locks xlator grants
inodelks. So until the self-heal is complete writes are blocked.
Here is the code:
xlators/features/locks/src/inodelk.c - line 225
if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
ret = -EAGAIN;
if (can_block == 0)
goto out;
gettimeofday (&lock->blkd_time, NULL);
list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);
}
This leads to hangs in applications.
Fix:
Since we want to prevent two parallel self-heals. We let them compete
in a separate "domain". Lets call the domain on which the locks have
been taken on in previous approach as "data-domain".
In the new approach When a self-heal is triggered,
it acquires a full lock in the new domain "self-heal-domain".
After this it performs data-self-heal using the locks in
"data-domain" as before.
unlock the full file lock in "self-heal-domain"
With this approach, application's writevs don't have to wait
in pending queue when more than 1 self-heal is triggered.
Change-Id: Id79aef3dfa888945977fb9758374ac41c320d0d5
BUG: 967717
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5100
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- afr_local_copy should not be memduping locked nodes, that would
mean that lock is taken in self-heal on those nodes even before
it actually takes the lock. So removed memdup code. Even entry
lock related copying (lockee info) is also not necessary for
self-heal functionality, so removing that as well. Since it is
not local_copy anymore changed its name.
- My editor changed tabs to spaces.
Change-Id: I8dfb92cb8338e9a967c06907a8e29a8404782d61
BUG: 967717
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5099
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I40eec20ca6b3f857245a2438883822e251077ee9
BUG: 979365
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5269
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
At the moment afr-flush makes sure that a delayed post-op
is woken up but it does not wait for it to complete the
post-op before flush unwinds.
These are the steps that are happening:
1) flush fop comes on an fd which wakes up a delayed post-op
and continues with the flush fop.
2) post-op sends fsync on the wire.
3) flush completes and unwinds to fuse.
4) graph switch happens on the fuse mount disconnecting the
old graph's client connections to bricks.
5) xattrop after fsync fails with ENOTCONN because the
connections from old graph are taken down now.
Fix:
Wait for post-op to complete before starting to flush.
We could make flush act similar to fsync (i.e.) wind
flush as is but wait for post-op to complete before unwinding
flush, but it is better to send flush as the final fop. So
wind of flush will start after post-op is complete. Had to
change fsync to accommodate this change.
Change-Id: I93aa642647751969511718b0e137afbd067b388a
BUG: 980548
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5274
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Currently whenever there is metadata split-brain, a variable
sh->op_failed is set to 1 to denote that self heal got failed.
But if we proceed for data self heal, even code-path of data
self heal also relies on the sh->op_failed variable. So if will
check for sh->op_failed variable and will eventually fails to
do data self heal. So needed a mechanism to allow data self heal
even if metadata is in split brain.
Fix:
Some data structure revamp is done in
http://review.gluster.com/#/c/5106/ fix and this patch is
based on the above fix. Now we can store which particular self-heal
got failed i.e GFID_OR_MISSING_ENTRY_SELF_HEAL, METADATA, DATA,
ENTRY. And we can do two types of self heal failure check.
1. Individual type check: We can check which among all four
(Metadata, Data, Gfid or missing entry, entry self heal)
got failed.
2. In afr_self_heal_completion_cbk, we need to make check
based on the fact that if any specific self heal got failed treat
the complete self heal as failure so that it will populate
corresponding circular buffer of event history accordingly.
Change-Id: Icb91e513bcc752386fc8a78812405cfabe5cac2d
BUG: 977797
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5253
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
If fdctx is NULL in afr_fsync, process crashes because
of NULL dereference.
Fix:
if fdctx is NULL, always say witnessed unstable write so
that fsyncs are done properly. Handled fdctx being null
in afr_delayed_changelog_post_op otherwise fsync stub is
never resumed and the mount was hanging.
Change-Id: Icacc900e9be63c29db3325cb0e19cc250adebaac
BUG: 978794
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We added this code as an interim fix until afr can
handle split-brains even when opens are not issued.
Afr code has matured to reject fd based fops when
there are split-brains so we can remove it.
Change-Id: Ib337f78eccee86469a5eaabed1a547a2cea2bdcf
BUG: 974972
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5227
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I764883811e30ca9d9c249ad00b6762101083a2fe
BUG: 976800
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5248
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If no decommissioned nodes options are set in the options, then
clear the conf->decommissioned_bricks.
Change-Id: I426d2bcc874aab21b2eba0b16a580b9a26672ea2
Signed-off-by: shishir gowda <sgowda@redhat.com>
BUG: 973073
Reviewed-on: http://review.gluster.org/5199
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Nfs xlator never does open on a file for performing writes,
afr does not perform changelog wakeup for this fd so operations
which do metadata operations as soon as the data operations are
completed perceive a delay of 'post-op-delay-secs'.
Fix:
Perform changelog wakeup on anon-fd if the fd with same pid is
not present in inode-list.
Note:
This approach is a short-term fix. A proper fix needs a new domain
for taking metadata locks so that data/metadata locks don't compete
with each other.
Change-Id: I253afb289eadf30c7951e56fb2c4840d7132f5e4
BUG: 966018
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5066
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
As the end of the self heal, message logged by
"afr_self_heal_completion_cbk" is inadequate to determine what exactly failed
during the course of afr self heal. It is worth to have knowledge of what all
types of self heal got triggered for an entity and whether the status is success
or failure.
Fix:
At the end of self heal, it will log information about out of 4 types of self
heal (gfid or missing entry self heal, metadata, data and entry self heal),
who all got triggered and who all got failed or successful at the end.
Change-Id: I5360762fbd7d391ac4c6af6706b4835c5801835a
BUG: 968301
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5106
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for the DISCARD file operation. Discard punches a hole
in a file in the provided range. Block de-allocation is implemented
via fallocate() (as requested via fuse and passed on to the brick
fs) but a separate fop is created within gluster to emphasize the
fact that discard changes file data (the discarded region is
replaced with zeroes) and must invalidate caches where appropriate.
BUG: 963678
Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/5090
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement support for the fallocate file operation. fallocate
allocates blocks for a particular inode such that future writes
to the associated region of the file are guaranteed not to fail
with ENOSPC.
This patch adds fallocate support to the following areas:
- libglusterfs
- mount/fuse
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
BUG: 949242
Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-on: http://review.gluster.org/4969
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In some code paths neither loc->gfid nor loc->inode->gfid
is populated which leads to EINVAL for linkfile setattr
in dht_linkfile_attr_heal.
Fix:
Populate loc->gfid before dht_linkfile_attr_heal.
Change-Id: I062770e6f6eaead304eff1dae81f8588a3b97eed
BUG: 971805
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5178
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ie2b2dc54aa0d500c35752c72d3b562bcc05b1fc2
BUG: 969336
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5123
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If unlink of linkfile returns ENOENT, do not fail unlink.
Proceed with unlinking of cached file.
Change-Id: If7cec92b40c39d68dd9c3606c6c2c3a6bd67d27b
BUG: 966848
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4971
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When taking blocking entrylks, afr orders the entrylks based on
uuid_compare of gfids of parent dirs, if they are equal then it orders
them based on the basenames. While this approach works fine, the
implementation assumes loc->gfids to be populated at the time of
the comparison, but loc may have gfid in loc->inode->gfid instead
of loc->gfid which was leading to order mismatches and dead-locks.
Fix:
Implemented loc_gfid which gives gfid by checking both loc->gfid,
loc->inode->gfid. Used this for ordering the blocking entrylks.
Change-Id: Ib0db36bbaf0df09fa87c3c3bb6a834db74fc2154
BUG: 965987
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5062
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assign local = frame->local before dereferencing
local->linkfile.linkfile_cbk. Additionally, fail if
op_ret is non_zero.
Change-Id: I96a2f34ba29887da9ccaae38a644431cf7c43265
BUG: 966858
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5141
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Lookups in discovery fail with ENOENT so local->inode
is never set. dht_layout_set logs the callstack when
the function is called in that state.
Fix:
Don't set layout when lookups fail in discovery.
Change-Id: I5d588314c89e3575fcf7796d57847e35fd20f89a
BUG: 965434
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/5055
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We cannot heal in dht_discover, as it is a gfid based lookup, and not
path based. So, returning error here would lead to app's to see failure.
Also, update the layout in inode_ctx even if it has anomalies. Let
subsequent heals fix the issue.
Change-Id: I2358aadacf9a24e20a22ab0a6055c38c5eb6485c
BUG: 960348
Original-author: shishir gowda <sgowda@redhat.com>
Signed-off-by: shishir gowda <sgowda@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4959
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: If it is fresh volume with no files created the xattrop directory is
not present. If crawl happens in that time, lookup for xattrop
directory will fail and it results in printing a misleading log.
Fix: Changed misleading log.
Change-Id: Iae5da3e8423564d64096f88abdaf8c98e4935840
BUG: 928575
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4993
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If linkfile create fails with EEXISTS, then check if the file
is a linkfile for the same file. If not, return the error
Change-Id: Iab42db54422dea69de0049b5196365e65edadd91
BUG: 966858
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5060
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If renames are done with different uid/gid (non-owners), then we would
end up with incorrect uid/gid.
The fix is to create linkfiles, and heal the uid/gid as root:root. This
preserves our notion of creation as root:root and heal the uid/gid as
root:root in all paths. Additionally, we need to consider uid/gid from
only src_cached subvol, and not from linkfiles.
rename is also done as root:root if done on linkfile, as setattr of ownership
on linkfile is done after the rename
Change-Id: Icb5d431dc42da9c02dfae81980e3fe769a47a274
BUG: 884597
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4682
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if local->fd == NULL, then in dht_migration_check_complete, do not do
open call. Let the layout get updated, and proceed with invoking the
registered target_fn.
if local->fd == NULL, do not call dht_rebalance_in_progress_check for
truncate fop, but proceed with truncate2.
Change-Id: Ia5a5d40bcea7bfb320ef7096af1e035b8847d4ff
BUG: 960055
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4958
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In path based op's like truncate, we use getxattr instead of
fgetxattr call. These can fail with permission denied issues
as linkto file creation, and setattr of ownership is not atomic,
and in cases where setattr failed (subvols down..)
The fix is to perform getxattr as root:root as it is a internal
fop. fgetxattr, bypass the access check, as it already has a valid
open fd.
Change-Id: Ie221c9172e3c1c7ed4e50c8782d362826910756f
BUG: 957074
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4890
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Additionally, update op_errno to the lasted failure. If failures
found in complete_check, error returned would be EUCLEAN instead
of the right failure (in this case ENOENT)
Change-Id: Ib813867f4b817af651627b9ea07b0b09fa2b26ce
BUG: 966852
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4989
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mem_get function: Log message related to mem_pool
calloc is removed as its been calculated in mempool
'stats'. This messgae is consuming nearly half of
the total log messages in DEBUG mode.
dht_hash_compute function: Changed log level from DEBUG
to TRACE.
client_fdctx_destroy function: Changed log level from DEBUG
to TRACE.
Change-Id: Ic948db0419e76df4e95ebd0cabaf66eadbaada6b
BUG: 966851
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/5086
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is support for discovering a filename in a given directory
which has a case insensitive match of a given name. It is implemented
as a virtual extended attribute on the directory where the required
filename is specified in the key.
E.g:
sh# getfattr -e "text" -n user.glusterfs.get_real_filename:FiLe-B /mnt/samba/patchy
getfattr: Removing leading '/' from absolute path names
# file: mnt/samba/patchy
user.glusterfs.get_real_filename:FiLe-B="file-b"
In reality, there can be multiple "answers" as the backend filesystem is
case sensitive and there can be multiple files which can strcasecamp()
successfully. In this case we pick the first matched file from the first
responding server.
If a matching file does not exist, we return ENOENT (and NOT ENODATA).
This way the caller can differentiate between "unsupported" glusterfs
API and file not existing.
This API is used by Samba VFS to perform efficient discovery of the real
filename without doing a full scan at the Samba level.
Change-Id: I53054c4067cba69e585fd0bbce004495bc6e39e8
BUG: 953694
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4941
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not let inode linking to happen only in lookup(). While
that works, it is inefficient.
Change-Id: I51bbfb6255ec4324ab17ff00566375f49d120c06
BUG: 953694
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4931
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ib727948c6e21b19fd509f258ff0aea1c5d1a84d1
BUG: 966845
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/5056
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
syncop_open used to perform a ref in syncop_open_cbk so the extra
unref was needed but now syncop_open_cbk does not take a ref so no
need to do extra unref.
Fix:
remove the extra fd_unref and let dht_local_wipe do the final unref.
Change-Id: Ibe8f9a678d456a0c7bff175306068b5cd297ecc4
BUG: 961615
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4974
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See https://git.gluster.org/gluster-swift.git for the new location of
the Gluster-Swift code.
With this patch, no OpenStack Swift related RPMs are constructed.
This patch also removes the unused code that references the
user.ufo-test xattr key in the DHT translator.
Change-Id: I2da32642cbd777737a41c5f9f6d33f059c85a2c1
BUG: 961902 (https://bugzilla.redhat.com/show_bug.cgi?id=961902)
Signed-off-by: Peter Portante <peter.portante@redhat.com>
Reviewed-on: http://review.gluster.org/4970
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Luis Pabon <lpabon@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
gfid-self-heal always assigns the gfid(GFID-1) it gets from lookup.
Between the time of lookup to triggering the gfid-self-heal the
entry could have changed. Now lets say there is a case where
one of the files of the replica subolumes already has a gfid
(GFID-2) and the other does not. In that case healing should
happen with GFID-2 instead of GFID-1.
Fix:
Missing-entry-self-heal already handles all these cases. So removed
separate handling of gfid-self-heal.
Change-Id: Ie96261e9036c8f3cb4cad89347f9bf7b681cdc1a
BUG: 767585
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/2670
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since removexattr() fails to remove "security.selinux" in a system
where SELinux is enforcing, xattr self-healing fails.
As a consequence of this, user extended attributes are not being healed.
Added a check in afr to prune SELinux xattr from the dictionary
used for removing xattrs from the sink.
Minor changes in tests and md-cache as well.
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I854bfc0098dde812ce2afe64b125ee40c04bdeb1
BUG: 957877
Reviewed-on: http://review.gluster.org/4905
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: Ifa42762adde8b55ef1e2b51a59c93cebd983343f
BUG: 912581
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4792
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When subvols-per-directory is < available subvols, then there are layouts
which are not populated. This leads to incorrect identification of holes or
overlaps. We need to ignore layouts, which have err == 0, and start == stop.
In the current scenario (start == stop == 0).
Additionally, in layout-merge, treat missing xattrs as err = 0. In case of
missing layouts, anomalies will reset them.
For any other valid subvoles, err != 0 in case of layouts being zeroed out.
Also reverted back dht_selfheal_dir_xattr, which does layout calculation only
on subvols which have errors.
Change-Id: I9f57062722c9e8a26285e10675c31a78921115a1
BUG: 921408
Signed-off-by: shishir gowda <sgowda@redhat.com>
Reviewed-on: http://review.gluster.org/4668
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today there is a non-obvious dependence of eager-locking on
write-behind. The reason is that eager-locking works as long
as the inheriting transaction has no overlaps with any of the
transactions already in progress. While write-behind provides
non-overlapping writes as a side-effect most of times (and only
guarantees it when strict-write-ordering option is enabled,
which is not on by default) eager-lock needs the behavior
as a guarantee. This is leading to complex and unwanted checks
for the presence of write-behind in the graph, for the simple
task of checking for overlaps.
This patch removes the interdependence between eager-locking
and write-behind by making eager-locking do its own overlap checks
with in-progress writes.
Change-Id: Iccba1185aeb5f1e7f060089c895a62840787133f
BUG: 912581
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4782
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For example:
If a new entry creation fop fails with EEXIST or a delete entry fop
fails with ENOENT, on all the subvols the fop is wound, then no
change took place to the directory. So we can treat that case as no
change happened to the directory.
Change-Id: I3b3a7931954da2166a9cba19ff9f76f37739d751
BUG: 860210
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4626
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FIX:
In missing entry self heal, once the source directories are determined
after the lookup and if file is not present on any of the brick which
contains the souce directory, the entry is removed from the directory.
So log message should give information of "Purging of entry".
Change-Id: I4d3deb602e0812dc1c9c8ba0a466716d81dede7e
BUG: 947312
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/4753
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In Pump entry self-heal happens for each directory during the
first opendir using conservative merge. But in entry-self-heal
readdir is issued with '0' size. So entry self-heal is not
creating any files. After pump thinks entry self-heal is complete
it proceeds to heal each of the file in the directory it just
healed. Fortunately most of the times it chooses source-brick
in pump as read-child for readdir. This happens because readchild is the
subvolume on which lookup succeeds first. In pump lookup succeeds
faster in local process than on the destination brick process most
of the times. For all the entries pump finds in readdir it does a
lookup. During this lookup the entry on the destination brick is
created and healed. This is the reason why replace-brick
succeeds whenever read-child for the directory is chosen as the
source-brick. Which is most of the times. When read-child is chosen
as the destination brick, readdir returns no entries so replace-brick
completes without syncing the whole data.
Fix:
Set readdir-size in pump so that entry self-heal happens with
64k size. This ensures that entry self-heal triggered from
opendir actually creates the files on the destination brick.
Change-Id: I65ea45d3c2735a9578f3aa34eff771b6563241ca
BUG: 909800
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4712
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions keep changing as new functionality is added, so copying
and pasting the code is not a good solution. This way ensures that all
fields get initialized properly no matter how much new stuff we throw in.
Change-Id: I9e9b043d2d305d31e80cf5689465555b70312756
BUG: 924488
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/4710
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if any subvol returned ENOENT while parent entrylk lock was held,
yield and return ENOENT for the entire lookup.
This is how the issue happens:
Multiple clients A, B and C are attempting 'mkdir -p /mnt/a/b/c'
1 Client A is in the middle of mkdir(/a). It has acquired lock.
It has performed mkdir(/a) on one subvol, and second one is still
in progress
2 Client B performs a lookup, sees directory /a on one,
ENOENT on the other, succeeds lookup.
3 Client B performs lookup on /a/b on both subvols, both return ENOENT
(one subvol because /a/b does not exist, another because /a
itself does not exist)
4 Client B proceeds to mkdir /a/b. It obtains entrylk on inode=/a with
basename=b on one subvol, but fails on other subvol as /a is yet to
be created by Client A.
5 Client A finishes mkdir of /a on other subvol
6 Client C also attempts to create /a/b, lookup returns ENOENT on
both subvols.
7 Client C tries to obtain entrylk on on inode=/a with basename=b,
obtains on one subvol (where B had failed), and waits for B to unlock
on other subvol.
8 Client B finishes mkdir() on one subvol with GFID-1 and completes
transaction and unlocks
9 Client C gets the lock on the second subvol, At this stage second
subvol already has /a/b created from Client B, but Client C does not
check that in the middle of mkdir transaction
10 Client C attempts mkdir /a/b on both subvols. It succeeds on
ONLY ONE (where Client B could not get lock because of
missing parent /a dir) with GFID-2, and gets EEXIST from ONE subvol.
This way we have /a/b in GFID mismatch. One subvol got GFID-1 because
Client B performed transaction on only one subvol (because entrylk()
could not be obtained on second subvol because of missing parent dir --
caused by premature/speculative succeeding of lookup() on /a when locks
are detected). Other subvol gets GFID-2 from Client C because while
it was waiting for entrylk() on both subvols, Client B was in the
middle of creating mkdir() on only one subvol, and Client C does not
"expect" this when it is between lock() and pre-op()/op() phase of the
transaction.
Original-author: Anand Avati <avati@redhat.com>
Change-Id: Idca475dbbc2a51e09da6fa0f9e1e37148caef208
BUG: 860210
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4625
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xattrs are first removed from sink followed by setting
source xattrs.
Change-Id: I181cb5b785b667bbfc6e40787a2183a8f45de06b
BUG: 906646
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/4656
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here are the logs of a file on which we saw EIO because of size mismatch:
[root@lizzie ~]# grep 38f18204 /var/log/glusterfs/mnt-x-.log
Reporting Unstable write for 38f18204-2840-408e-ae65-c01f4106b8c4
for offset: 0, len: 7680
Cleared unstable write flag for 38f18204-2840-408e-ae65-c01f4106b8c4:
offset 0 length 7680
Reporting Unstable write for 38f18204-2840-408e-ae65-c01f4106b8c4 for
offset: 7680, len: 71680
Reporting Unstable write for 38f18204-2840-408e-ae65-c01f4106b8c4 for
offset: 79360, len: 15716
fsync completed on 38f18204-2840-408e-ae65-c01f4106b8c4 for
offset 0 length 7680 with changelog status: -1 -1
According to these logs fsync did not happen after writev with
offset: 79360, len: 15716. Which is the reason for this problem.
In total 3 writes came. lets call them w1, w2, w3
w1 does pre_op so pre_op_done[0], pre_op_done[1] counts become 1 and 1
then is_piggyback_post_op() is called for w1 and it returns *false*
w1's fsync is fired
Now w2 and w3 come and see that pre_op_done[0], pre_op_done[1] are both 1,
so pre_op_piggyback[0] and pre_op_piggyback[1] are both incremented twice,
once by w2, one more time by w3 and become 2, 2 ------- Step-A
Now fsync of w1 is complete and it goes ahead with post op and decrements
pre_op_done[0], pre_op_done[1] to 0, 0
Now w2, w3 writevs complete and is_piggyback_post_op will return *true* for
both w2, w3.
So fsync is not fired for both w2, w3
this patch prevents Step-A from happening.
Change-Id: I8b6af1f1875b2cf5f718caa3c16ee7ff3dc96b5c
BUG: 927146
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4752
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scheme to encode brick d_off and brick id into global d_off has
two approaches. Since both brick d_off and global d_off are both 64-bit
wide, we need to be careful about how the brick id is encoded.
Filesystems like XFS always give a d_off which fits within 32bits. So
we have another 32bits (actually 31, in this scheme, as seen ahead) to
encode the brick id - which is typically plenty.
Filesystems like the recent EXT4 utilize the upto 63 low bits in d_off,
as the d_off is calculated based on a hash function value. This leaves
us no "unused" bits to encode the brick id.
However both these filesystmes (EXT4 more importantly) are "tolerant" in
terms of the accuracy of the value presented back in seekdir(). i.e, a
seekdir(val) actually seeks to the entry which has the "closest" true
offset.
This "two-prong" scheme exploits this behavior - which seems to be the
best middle ground amongst various approaches and has all the advantages
of the old approach:
- Works against XFS and EXT4, the two most common filesystems out there.
(which wasn't an "advantage" of the old approach as it is borken against
EXT4)
- Probably works against most of the others as well. The ones which would
NOT work are those which return HUGE d_offs _and_ NOT tolerant to
seekdir() to "closest" true offset.
- Nothing to "remember in memory" or evict "old entries".
- Works fine across NFS server reboots and also NFS head failover.
- Tolerant to seekdir() to arbitrary locations.
Algorithm:
Each d_off can be encoded in either of the two schemes. There is no
requirement to encode all d_offs of a directory or a reply-set in
the same scheme.
The topmost bit of the 64 bits is used to specify the "type" of encoding
of this particular d_off. If the topmost bit (bit-63) is 1, it indicates
that the encoding scheme holds a HUGE d_off. If the topmost bit is is 0,
it indicates that the "small" d_off encoding scheme is used.
The goal of the "small" d_off encoding is to stay as dense as possible
towards the lower bits even in the global d_off.
The goal of the HUGE d_off encoding is to stay as accurate (close) as
possible to the "true" d_off after a round of encoding and decoding.
If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick
ID (call it "n").
SMALL d_off
===========
Encoding
--------
If the top n + 1 bits are free in a brick offset, then we leave the
top bit as 0 and set the remaining bits based on the old formula:
hi_mask = 0xffffffffffffffff
hi_mask = ~(hi_mask >> (n + 1))
if ((hi_mask & d_off_brick) != 0)
do_large_d_off_encoding ()
d_off_global = (d_off_brick * N) + brick_id
Decoding
--------
If the top bit in the global offset is 0, it indicates that this
is the encoding formula used. So decoding such a global offset will
be like the old formula:
if ((d_off_global & 0x8000000000000000) != 0)
do_large_d_off_decoding()
d_off_brick = (d_off_global % N)
brick_id = d_off_global / N
HUGE d_off
==========
Encoding
--------
If the top n + 1 bits are NOT free in a given brick offset, then we
set the top bit as 1 in the global offset. The low n bits are replaced
by brick_id.
low_mask = 0xffffffffffffffff << n // where n is ROOF(Log2(N))
d_off_global = (0x8000000000000000 | d_off_brick & low_mask) + brick_id
if (d_off_global == 0xffffffffffffffff)
discard_entry();
Decoding
--------
If the top bit in the global offset is set 1, it indicates that
the encoding formula used is above. So decoding would look like:
hi_mask = (0xffffffffffffffff << n)
low_mask = ~(hi_mask)
d_off_brick = (global_d_off & hi_mask & 0x7fffffffffffffff)
brick_id = global_d_off & low_mask
If "losing" the low n bits in this decoding of d_off_brick looks
"scary", we need to realize that till recently EXT4 used to only
return what can now be expressed as (d_off_global >> 32). The extra
31 bits of hash added by EXT recently, only decreases the probability
of a collision, and not eliminate it completely, anyways. In a way,
the "lost" n bits are made up by decreasing the probability of
collision by sharding the files into N bricks / EXT directories
-- call it "hash hedging", if you will :-)
Thanks-to: Zach Brown <zab@redhat.com>
Change-Id: Ieba9a7071829d51860b7c131982f12e0136b9855
BUG: 838784
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4711
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce AFR_CALL_RESUME macro which cleans up frame->local, like
how AFR_STACK_UNWIND etc. do.
Therefore fix leak in afr_fsync() path.
Change-Id: I3855d8e7e84dbc44e05f507563b7f722bf9621b8
BUG: 927146
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/4745
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added extra fsync to data self-heal code to make sure the
data reached disk before erasing the changelogs
Change-Id: I9e7e6e55cdc49de2b991705d1638946464a9d4f9
BUG: 927146
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4744
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) pre_op_piggyback should always be decremented.
2) Move fsync resume to just after post_op.
3) fsync stub should be created from afr's local
not from the final response.
Change-Id: I220bb532eb03bea584292f4dd2e816ad0c3e0cf7
BUG: 927146
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/4741
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|