| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, the xlator APIs take a dictionary as an extra
argument, which is passed between all the layers. It can be
used to overload some of the operations.
Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40
Signed-off-by: Amar Tumballi <amarts@redhat.com>
BUG: 782265
Reviewed-on: http://review.gluster.com/2960
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
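The idea of threading an extra dictionary through every layer can be sketched as below. This is only an illustration: `toy_dict_t`, `cluster_unlink`, `storage_unlink` and the "skip-log" key are hypothetical names, not the real xlator API or dict_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a key/value dictionary; not the real dict_t. */
typedef struct {
    const char *key;
    const char *value;
} toy_dict_t;

/* Bottom layer: inspects the extra dictionary to alter its behaviour. */
static int storage_unlink(const char *path, const toy_dict_t *xdata)
{
    (void)path;
    /* Return 1 when the caller passed the hypothetical "skip-log" hint. */
    if (xdata != NULL && strcmp(xdata->key, "skip-log") == 0)
        return 1;
    return 0;
}

/* Middle layer: does not interpret xdata, it only forwards it unchanged. */
static int cluster_unlink(const char *path, const toy_dict_t *xdata)
{
    return storage_unlink(path, xdata);
}
```

A caller that has nothing extra to say passes NULL, so existing call sites keep their behaviour while a lower layer can still react to a hint set higher up.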
| |
Eager-lock is disabled by default.
Use cluster.eager-lock on/off to change the config.
write-behind off with eager-lock on is not a supported configuration.
In afr, when eager-lock is enabled, the inode lock on the fd is taken
using the fd address as the lk-owner, so the lock is
interchangeable between the inode-locks on the same fd.
Change-Id: I7eef1ecd510f8028f5395dee882782da53c0de3f
BUG: 802515
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/2925
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
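A minimal sketch of deriving a lock owner from the fd address, so that two transactions on the same fd present the same owner. The types `toy_fd_t` and `toy_lkowner_t` are assumptions made for illustration and do not reflect the real lk-owner layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lock-owner blob; the real lk-owner layout is not shown here. */
typedef struct {
    unsigned char data[sizeof(uintptr_t)];
} toy_lkowner_t;

typedef struct { int dummy; } toy_fd_t;

/* Derive the owner from the address of the fd object. */
static toy_lkowner_t lkowner_from_fd(const toy_fd_t *fd)
{
    toy_lkowner_t owner;
    uintptr_t addr = (uintptr_t)fd;
    memcpy(owner.data, &addr, sizeof(addr));
    return owner;
}

/* Two owners are the same lock holder when their bytes match. */
static int lkowner_equal(const toy_lkowner_t *a, const toy_lkowner_t *b)
{
    return memcmp(a->data, b->data, sizeof(a->data)) == 0;
}
```

Because the owner depends only on the fd, a lock taken by one write on that fd can be treated as held by the next write on the same fd, while a different fd yields a different owner.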
|
|
|
|
|
|
|
|
|
|
|
| |
'enclosed' fop."
This reverts commit 2e80fdbeb6abbb23ff6789c2b98c82704883af0a.
Change-Id: I417fd43e4195d63e5b8b83dd3beb712887130e1e
Reviewed-on: http://review.gluster.com/2860
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
afr 'mangles' the lkowner in order to ensure [f]inodelk/[f]entrylk fops from the
same application contend. But other fops that are 'visible' to the application
should operate with the lkowner provided by fuse for correct functioning of
posix-locks xlator.
Change-Id: I7e71f35ae7df2a070f1f46d4fc77eed26a717673
BUG: 790743
Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
Reviewed-on: http://review.gluster.com/2752
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Self-heal does not happen if the file has change log xattr
only for one of the subvol keys. This patch makes sure that
xattrop is done for all the afr subvol keys after a new entry
is created in entry-self-heal.
1) Added matrix create/cleanup functions
2) Impunging a new file does multiple xattrops on the source
subvol, one per sink. The code can do a single xattrop after
the entry is created on all the sinks.
3) Missing entry self-heal uses one frame per sink to heal
the file. This leads to multiple xattrops on the source subvol.
That code is changed now to use one frame which will
create the file on all subvols.
Change-Id: I65a42f9779b03f7efae283479f8653fb2cb8046b
BUG: 762680
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/2503
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Krishnan Parthasarathi <kp@gluster.com>
| |
1. What
--------
This change introduces an infrastructure change in the filesystem
which lets filesystem operations address objects (inodes) just by their
GFID. Thus far GFID has been a unique identifier of a user-visible
inode. But in terms of addressability the only mechanism thus far has
been the backend filesystem path, which could be derived from the
GFID only if it was cached in the inode table along with the entire set
of dentry ancestry leading up to the root.
This change essentially decouples addressability from the namespace. It
is no longer necessary to be aware of the parent directory to address a
file or directory.
2. Why
-------
The biggest use case for such a feature is NFS for generating
persistent filehandles. So far the technique for generating filehandles
in NFS has been to encode path components so that the appropriate
inode_t can be repopulated into the inode table by means of a recursive
lookup of each component top-down.
Another use case is the ability to perform more intelligent self-healing
and rebalancing of inodes with hardlinks and also to detect renames.
A derived feature from GFID filehandles is anonymous FDs. An anonymous FD
is an internal USABLE "fd_t" which does not map to a user opened file
descriptor or to an internal ->open()'d fd. The ability to address a file
by the GFID eliminates the need to have a persistent ->open()'d fd for the
purpose of avoiding the namespace. This improves NFS read/write performance
significantly by eliminating open/close calls and also fixes some of today's
limitations (like keeping an FD open longer than necessary, resulting
in disk space leakage).
3. How
-------
At each storage/posix translator level, every file is hardlinked inside
a hidden .glusterfs directory (under the top level export) with the name
as the ascii-encoded standard UUID format string. For reasons of performance
and scalability there is a two-tier classification of those hardlinks
under directories with the initial parts of the UUID string as the directory
names.
For directories (which cannot be hardlinked), the approach is to use a symlink
which dereferences the parent GFID path along with basename of the directory.
The parent GFID dereference will in turn be a dereference of the grandparent
with the parent's basename, and so on recursively up to the root export.
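The two-tier layout for regular files can be sketched as a path-building helper. The split used here (first two hex characters, then the next two) is an assumption about what "the initial parts of the UUID string" means, and `gfid_handle_path` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<export>/.glusterfs/xx/yy/<uuid>" from a canonical 36-char UUID
 * string. The 2+2 character split is an assumption of this sketch.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
static int gfid_handle_path(const char *export_dir, const char *uuid,
                            char *out, size_t outlen)
{
    if (strlen(uuid) != 36)
        return -1;
    int n = snprintf(out, outlen, "%s/.glusterfs/%.2s/%.2s/%s",
                     export_dir, uuid, uuid + 2, uuid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Spreading the hardlinks over two directory levels keeps any single directory from growing with the total number of files on the export.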
4. Development
---------------
4a. To leverage the ability to address an inode by its GFID, the technique is
to perform a "nameless lookup". This means, to populate a loc_t structure as:
loc_t {
    pargfid: NULL
    parent: NULL
    name: NULL
    path: NULL
    gfid: GFID to be looked up [out parameter]
    inode: inode_new () result [in parameter]
}
Performing such a lookup returns, in its callback, an inode_t
populated with the right contexts and a struct iatt which can be
used to perform an inode_link () on the inode (without a parent and
basename). The inode will now be hashed and linked in the inode table
and findable via inode_find().
A fundamental change moving forward is that the primary fields in a
loc_t structure are now going to be (pargfid, name) and (gfid) depending
on the kind of FOP. So far path had been the primary field for operations.
The remaining fields only serve as hints/helpers.
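Populating a location for a nameless lookup might look like the sketch below. `toy_loc_t`, `toy_inode_t` and the helper names are hypothetical mirrors of the fields listed in 4a, not the real loc_t definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char toy_uuid_t[16];
typedef struct { int linked; } toy_inode_t;

/* Hypothetical mirror of the loc_t fields listed in 4a. */
typedef struct {
    toy_uuid_t   pargfid;
    toy_inode_t *parent;
    const char  *name;
    const char  *path;
    toy_uuid_t   gfid;
    toy_inode_t *inode;
} toy_loc_t;

/* Fill loc for a nameless lookup: only gfid and a fresh inode are set. */
static void loc_fill_nameless(toy_loc_t *loc, const toy_uuid_t gfid,
                              toy_inode_t *fresh_inode)
{
    memset(loc, 0, sizeof(*loc));      /* zero pargfid and gfid */
    loc->parent = NULL;
    loc->name = NULL;
    loc->path = NULL;
    memcpy(loc->gfid, gfid, sizeof(toy_uuid_t));
    loc->inode = fresh_inode;
}

/* A loc is "nameless" when it carries a gfid but no parent, name or path. */
static int loc_is_nameless(const toy_loc_t *loc)
{
    static const toy_uuid_t null_gfid = {0};
    return loc->parent == NULL && loc->name == NULL && loc->path == NULL &&
           memcmp(loc->pargfid, null_gfid, sizeof(null_gfid)) == 0 &&
           memcmp(loc->gfid, null_gfid, sizeof(null_gfid)) != 0;
}
```

A translator receiving such a loc can use a check like `loc_is_nameless` to decide whether to resolve by (pargfid, name) or by gfid alone.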
4b. If read/write is to be performed on an inode_t, the approach so far
has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and
then perform STACK_WIND(read, fd) etc. With anonymous fds you can now do
fd_anonymous (inode), STACK_WIND (read, fd). This results in a great
performance boost in the built-in NFS server.
5. Misc
-------
The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent
with the rest of the codebase.
Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f
BUG: 781318
Reviewed-on: http://review.gluster.com/669
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In rename the changelog modification needs to happen both on
old parent-dir and new parent-dir, so 2 stack winds are
done per brick.
Change-Id: I43f34661e397c4288162213944529e18b7724b1d
BUG: 766603
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/783
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I00c714a89575023f6dbdd3430dcbf191e5d08019
BUG: 3650
Reviewed-on: http://review.gluster.com/740
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Used a #pragma to kill ~170 in rpcgen code. Added GF_UNUSED to deal with
a few more from macros elsewhere. The remainder are function return values
(mostly context and dict calls) that really should be checked. Those would
be harder to fix without real understanding of the code where they occur,
so they remain as reminders.
(Patchset 2: deal with older gcc that doesn't handle #pragma GCC diagnostic)
(Patchset 3: fix include paths in generated files)
(Patchset 4: keep up with trunk, squash 9 new warnings)
(Patchset 5: six more, all in AFR)
Change-Id: I29760c8c81be4d7e6489312c5d0e92cc24814b7b
BUG: 2550
Reviewed-on: http://review.gluster.com/378
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
An AFR transaction performs the lock, pre-op, op, post-op and unlock steps in that
order. The child_up[] array is overloaded to record the children on which
the first two steps succeeded. This works fine for
transactions, but the locking/unlocking part of the code is re-used by
data self-heal, where each loop_frame performs the lock, rchecksum,
read-from-source, write-to-sinks and unlock steps.
The rchecksum fop assumes that it needs to happen on one source plus all
sinks and sets the call_count to that number. But if the lock step fails
on any of the sinks, the child_up of that child is set to 0, which
results in a call_count mismatch, and the frame hangs waiting for
callbacks that will never come. When this happens the loop_frame never reaches
the unlock step, leading to hangs on that file.
Change-Id: I3dd0449cc6193a980bacf637d935881f4b22210a
BUG: 3597
Reviewed-on: http://review.gluster.com/474
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
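The call_count mismatch described above can be reproduced with a small model. The message does not spell out the fix, so deriving the expected count from the same child_up[] snapshot that drives the winds is only one possible approach; both function names are hypothetical.

```c
#include <assert.h>

/* Count how many children a fop will actually be wound to. */
static int count_wind_targets(const unsigned char *child_up, int child_count)
{
    int n = 0;
    for (int i = 0; i < child_count; i++)
        if (child_up[i])
            n++;
    return n;
}

/* Simulate one fop: returns the number of callbacks still outstanding
 * after every wound child has replied. Non-zero means the frame hangs. */
static int outstanding_after_replies(int expected_calls,
                                     const unsigned char *child_up,
                                     int child_count)
{
    int call_count = expected_calls;
    for (int i = 0; i < child_count; i++)
        if (child_up[i])
            call_count--;   /* one callback per child that was wound to */
    return call_count;
}
```

With one source and two sinks the naive expectation is 3 calls; if the lock failed on one sink, only 2 callbacks arrive and one stays outstanding forever, whereas counting from child_up[] leaves nothing outstanding.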
| |
This patch is a change in the way write transactions hold a lock
which optimizes the case of sequential writes from a single writer.
The lock phase of a transaction has two sub-phases. The first is an attempt
to acquire locks in parallel by broadcasting non-blocking lock
requests. If lock acquisition fails on any server, the held locks
are released and the transaction reverts to blocking locks taken sequentially on
one server after another.
The change in this patch is to make the initial broadcasting lock
request attempt to acquire lock on the entire file. If this fails,
we revert back to the sequential "regional" blocking lock as before.
In the case where such an "eager" lock is granted in the non-blocking
phase, it gives rise to an opportunity for optimization, i.e., if
the next write transaction on the same FD arrives before the unlock
phase of the first transaction, it "takes over" the full file lock.
Similarly if yet another transaction arrives before the unlock phase
of the "optimized" transaction, that in turn "takes over" the lock
as well. The actual unlock now happens at the end of the last
"optimized" transaction.
Any operation which arrives before the unlock phase of the previous
transaction is a potential candidate to become an "optimized"
transaction. In cases where the previous transaction had acquired
lock as a "regional" blocking lock, and the next transaction comes
in before its unlock phase, then it would not be an "optimized"
transaction.
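The take-over behaviour can be modelled with a small per-fd state machine. This is a sketch under simplifying assumptions: the non-blocking full-file lock always succeeds, "before the unlock phase" is modelled as "another transaction is still in flight", and `eager_fd_t`, `txn_begin` and `txn_end` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-fd state for the eager-lock sketch. */
typedef struct {
    bool full_file_lock_held;  /* granted in the non-blocking phase */
    int  in_flight;            /* transactions currently sharing the lock */
    int  lock_calls;           /* lock requests actually sent */
    int  unlock_calls;         /* unlock requests actually sent */
} eager_fd_t;

/* Start a write transaction; returns true if it piggybacks on a held lock. */
static bool txn_begin(eager_fd_t *fd)
{
    if (fd->full_file_lock_held && fd->in_flight > 0) {
        fd->in_flight++;       /* take over the existing full-file lock */
        return true;
    }
    fd->lock_calls++;          /* assumed: the eager lock is granted */
    fd->full_file_lock_held = true;
    fd->in_flight = 1;
    return false;
}

/* Finish a transaction; the real unlock happens only for the last one. */
static void txn_end(eager_fd_t *fd)
{
    if (--fd->in_flight == 0) {
        fd->unlock_calls++;
        fd->full_file_lock_held = false;
    }
}
```

Three overlapping writes on one fd then cost a single lock and a single unlock, while a write that arrives after the last unlock starts a fresh lock cycle.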
Implied assumption
------------------
Since two or more transactions can now operate within the same
large lock, there is a possibility that overlapping transactions
can arrive in opposite orders on the servers. However in the
larger picture this is not possible as write-behind already
ensures that no two overlapping writes on an inode are in transit
at the same time. Overlapping writes across clients are not a
problem as they compete for locks anyway.
Theoretical benefits and potential harms
----------------------------------------
In case of a single writer: The benefits are large for sequential
writes. In the best case the entire file write can happen with just
one lock and unlock per server, provided writes are coming in fast
enough and getting pipelined by write-behind soon enough (which is
usually the case). If the writes are not coming in fast enough, then
the optimization "kicks in" for only those subsets of writes which
are close enough to get "piggybacked". For random writes the benefits
are the same as well. In any case the overall performance is better
than or equal to the performance without this optimization for a single
writer.
In case of multiple writers: When multiple writers are not writing
concurrently, there is no negative performance impact. When multiple
writers are writing concurrently to the same region, there is no
negative impact either, as they were previously getting arbitrated
at the locks translator too. In the case of multiple writers writing
to different regions concurrently, there will be an increased number
of "failovers" from failed parallel non-blocking to sequential blocking
regional locks. This "worst case" has a simple workaround:
as soon as we detect > 1 open-fd-count in lookup xattr, we can disable
this optimization on those fds.
Beneficial side-effects
-----------------------
There is another similar optimization in AFR for changelogs which goes
by the name of "changelog-piggybacking". That works in a similar way where
pending flags get 'taken over' or 'piggybacked' by the next transaction
if its 'pre-op' phase kicks in before the 'post-op' phase of the
previous transaction. It has been observed that this changelog-piggybacking
optimization saves about 55% of the xattr calls hitting
the wire, measured across various types of network interfaces. The side
effect of this eager-lock optimization is that it gives an almost 100%
saving of xattr calls by making the optimistic-changelog work much more
efficiently as it gives a wider overlap of the xattr phases of two
consecutive transactions.
Change-Id: I41c02eb3b64c14c68ef66a344610ec3f024cd59d
BUG: 3409
Reviewed-on: http://review.gluster.com/240
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes ~200 such warnings, but leaves three categories untouched.
(1) Rpcgen code.
(2) Macros which set variables in the outer (calling function) scope.
(3) Variables which are set via function calls which may have side effects.
Change-Id: I6554555f78ed26134251504b038da7e94adacbcd
BUG: 2550
Reviewed-on: http://review.gluster.com/371
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code is checking for priv->child_up[i], which can change while the fop
is in progress. Since pending[child][id-of-transaction] alone is enough
to tell if the child became stale or not, use just that.
Change-Id: I494bf02cca66f4fd41526195fafce86a202c6bd1
BUG: 3455
Reviewed-on: http://review.gluster.com/293
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I206571c77f2d7b3c9f9d7bb82a936366fd99ce5c
BUG: 3182
Reviewed-on: http://review.gluster.com/141
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If write/truncate fails on a child, that child should be removed
from the fresh children. The previous code assumed that every child
on which the fop succeeded was fresh, which is wrong. Fixed that
in this patch.
Change-Id: I1e6e21e20faea00516a0fdd2e95f2d7e9cf9076d
BUG: 3411
Reviewed-on: http://review.gluster.com/263
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
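The difference between "succeeded the fop" and "is fresh" can be shown in a few lines. The array representation and the name `update_fresh_children` are assumptions of this sketch, not the actual AFR data structures.

```c
#include <assert.h>

/* fresh[i] != 0 means child i was considered fresh before the fop.
 * failed[i] != 0 means the write/truncate failed on child i.
 * Afterwards, only children that were fresh and did not fail stay fresh. */
static void update_fresh_children(unsigned char *fresh,
                                  const unsigned char *failed, int child_count)
{
    for (int i = 0; i < child_count; i++)
        if (failed[i])
            fresh[i] = 0;
}
```

A child that was already stale stays stale even if the fop succeeded on it, which is exactly the case the earlier assumption got wrong.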
|
|
|
|
|
|
|
|
| |
Change-Id: I2d10f2be44f518f496427f257988f1858e888084
BUG: 3348
Reviewed-on: http://review.gluster.com/200
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: I3914467611e573cccee0d22df93920cf1b2eb79f
BUG: 3348
Reviewed-on: http://review.gluster.com/182
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2840 (files not getting self-healed when the first child goes down)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2840
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2840 (files not getting self-healed when the first child goes down)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2840
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2346 (Log message enhancements in GlusterFS - phase 1)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
|
|
|
|
|
|
|
|
| |
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2346 (Log message enhancements in GlusterFS - phase 1)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 2517 (the size of allocated memory may be wrong)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2517
| |
The standard way of maintaining changelog in replicate has been to
write out pending flags and to unset the pending flag post the
actual operation.
This new optimization kicks in only when all subvolumes are up.
The optimization is that, during pre-op, no changelog is written for
METADATA and ENTRY/RENAME operations. If during the operation nothing
failed, no changelog is updated in post-op either. If however,
something does fail during an operation, then, pending flags get
written during post op pointing only towards the failed nodes.
DATA transactions continue to work the way they are.
If one subvolume is down, pending flags are written in pre-op changelog
itself as before.
The impact of this optimization is only in the case when both servers
die or the client dies while the 'FOP' stage of the transaction is
in progress. By nature of METADATA and ENTRY operations, detecting a
mismatch later is not dependent on the presence of changelog. Changelog
only determines the direction in which self-heal happens for these types
of transactions. For the direction too this optimization does not have
a major impact because in the cases of failure (both servers dying or
client dying) the final state (direction of self-heal) would be
arbitrary anyway as the syscall wouldn't have completed.
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 2068 (performance enhancements)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2068
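The pre-op and post-op rules of this optimization can be written as two predicates. This is a sketch of the decision logic as described in the message; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TXN_DATA, TXN_METADATA, TXN_ENTRY, TXN_ENTRY_RENAME } txn_type_t;

/* Pre-op: should pending flags be written before the operation? */
static bool preop_writes_changelog(txn_type_t type, bool all_subvols_up)
{
    if (type == TXN_DATA)
        return true;            /* DATA transactions are unchanged */
    return !all_subvols_up;     /* skip only when every subvolume is up */
}

/* Post-op: is any changelog update needed after the operation? */
static bool postop_writes_changelog(bool preop_written, int failed_count)
{
    if (preop_written)
        return true;            /* flags written in pre-op must be updated */
    return failed_count > 0;    /* otherwise mark only the failed nodes */
}
```

In the common case (all subvolumes up, nothing fails) a METADATA or ENTRY transaction therefore touches the changelog neither before nor after the operation.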
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vijay Bellur <vijay@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 971 (dynamic volume management)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1388 ()
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1388
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In every transaction check if the currently set read child in the
inode context failed in the fop and set it to another subvol on
which the latest fop has passed. This will prevent read fops landing
on subvols which have witnessed a failure.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1172 (ls -lh on NFS mount of 2-mirror replicate gives incorrect file size)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1172
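Switching the read child after a failed fop can be sketched as follows. Picking the first subvolume on which the fop passed is an assumption of this example, and `next_read_child` is a hypothetical name.

```c
#include <assert.h>

/* Return the read child to use after a transaction.
 * succeeded[i] != 0 means the latest fop passed on subvolume i.
 * Returns -1 when the fop failed everywhere. */
static int next_read_child(int current, const unsigned char *succeeded,
                           int child_count)
{
    if (current >= 0 && current < child_count && succeeded[current])
        return current;         /* current read child is still good */
    for (int i = 0; i < child_count; i++)
        if (succeeded[i])
            return i;           /* first subvolume where the fop passed */
    return -1;
}
```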
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a changelog piggybacking optimization instead of the first-write-to-flush
optimization and do other cleanups (removal of the post-post-op hook etc.).
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1235 (Bug for all pump/migrate commits)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1235
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
Signed-off-by: Vijay Bellur <vijay@dev.gluster.com>
BUG: 1235 (Bug for all pump/migrate commits)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1235
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pavan Vilas Sondur <pavan@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 960 ()
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=960
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pavan Vilas Sondur <pavan@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 1235 (Bug for all pump/migrate commits)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1235
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vijay Bellur <vijay@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 824 (Crash in afr rename transaction)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=824
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reset pre_op_done[i] to 0 after issuing a postop in flush. This was
missed during the introduction of the pre_op_done[] array and was resulting
in a lot of spurious self-heals when spurious flushes were received.
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
|
|
|
| |
alloca.h should be included on a platform-specific basis.
Let common-utils.h handle that.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 349 (FreeBSD compilation error (alloca.h).)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=349
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch completes the previous patch for self-heal of
open fds in replicate.
If an fd was never opened on a subvolume, we remember that
and do the open after we've done self-heal on that fd.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
| |
This patch brings in partial support for self-heal of open
fds. The precondition is that the fd should have been opened
successfully during the initial open() (or create()), and we
assume that protocol/client has successfully reopened the fd
when the subvolume comes back up.
It works by doing an "up/down flush" (a dummy flush transaction
to do post-op wherever necessary) and then triggering
data self-heal on the file in the post-post-op hook of the
dummy flush transaction. This ensures that any writes
that come in during self-heal will wait until self-heal completes.
The up/down flush is also done when a subvolume goes down,
so that post-op is done on all subvolumes where pre-op was done.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
| |
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
|
|
|
| |
Data self-heal now holds blocking locks, and instead of locking
on all subvolumes, it only locks on {data-lock-server-count} subvolumes.
Signed-off-by: Vikas Gorur <vikas@gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 170 (Auto-heal fails on files that are open()-ed/mmap()-ed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=170
|
|
|
|
|
|
|
|
|
|
|
| |
For ENTRY_RENAME_TRANSACTIONs, keep track separately whether the
lower_path and the higher_path have been locked, and unlock only
those which have been.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
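Tracking the two rename locks separately could look like this sketch. `rename_locks_t` and the callback names are hypothetical; the real transaction code carries more state.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool lower_locked;
    bool higher_locked;
    int  unlock_calls;
} rename_locks_t;

/* Record the outcome of each lock request (op_ret == 0 means granted). */
static void on_lower_lock_reply(rename_locks_t *l, int op_ret)
{
    l->lower_locked = (op_ret == 0);
}

static void on_higher_lock_reply(rename_locks_t *l, int op_ret)
{
    l->higher_locked = (op_ret == 0);
}

/* Unlock only the paths whose lock was actually granted. */
static void unlock_held(rename_locks_t *l)
{
    if (l->higher_locked) { l->unlock_calls++; l->higher_locked = false; }
    if (l->lower_locked)  { l->unlock_calls++; l->lower_locked = false; }
}
```

If the lock on the higher path fails after the lower path was locked, exactly one unlock is sent, rather than an unlock for a lock that was never held.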
|
|
|
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
|
|
|
|
|
|
|
|
|
| |
mark a subvol with held lock only if op_ret == 0
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
|
|
|
|
|
|
|
|
|
|
|
|
| |
transactions.
Hold the lock on the {higher_path} only after the lock on the
{lower_path} has been granted successfully.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 112 (parallel deletion of files mounted by different clients on the same back-end hangs and/or does not completely delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=112
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
each of the subvolume
- This patch fixes bug #29.
- Using separate copies of dictionaries also eliminates a potential bug in a
setup consisting of afr with a posix and a client child, each having io-threads on
top. Since posix_xattrop, after performing the required operations
on the xattr array passed in the dictionary, sets the result at the same key
in the same dictionary passed as the input argument,
there can be race conditions in which the results of the operation on the
posix child are sent to the other child as the input argument for xattrop,
which is of course wrong.
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
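The hazard of sharing one dictionary can be modelled with a toy xattrop that, like the one described, writes its result back into its input. `toy_xattr_t`, `toy_xattrop` and `wind_xattrop_with_copies` are hypothetical names, and the race is shown sequentially for simplicity.

```c
#include <assert.h>

#define CHILDREN 2

/* Hypothetical one-key dictionary holding an xattr counter array. */
typedef struct { int pending[CHILDREN]; } toy_xattr_t;

/* Stand-in for an xattrop that adds to the stored value and writes the
 * result back into the dictionary it was given. */
static void toy_xattrop(int *stored, toy_xattr_t *dict)
{
    for (int i = 0; i < CHILDREN; i++) {
        stored[i] += dict->pending[i];
        dict->pending[i] = stored[i];   /* result overwrites the input */
    }
}

/* Apply the same request to every child, giving each its own copy. */
static void wind_xattrop_with_copies(int stored[][CHILDREN], int nchildren,
                                     const toy_xattr_t *request)
{
    for (int c = 0; c < nchildren; c++) {
        toy_xattr_t copy = *request;    /* private copy per subvolume */
        toy_xattrop(stored[c], &copy);
    }
}
```

With a shared dictionary the second child would receive the first child's result as its input; with per-child copies each child sees the original request.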
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
| |
__if_fd_pre_op_done - reset fd_ctx->pre_op_done to 0 so that double flushes do not result in two xattrop() calls
|
|
|
|
|
|
|
|
| |
Save the original pid while locking and restore it
after the FOP is done. This ensures posix-locks can
release locks (fcntl) properly.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
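Saving and restoring the pid around the locking phase can be sketched as below. The frame and local structures are hypothetical slices reduced to the one field that matters, and the use of a separate internal pid during locking is an assumption of this example.

```c
#include <assert.h>

/* Hypothetical slice of a call frame: only the pid field matters here. */
typedef struct { int pid; } toy_frame_t;

/* Hypothetical per-transaction scratch space. */
typedef struct { int saved_pid; } toy_local_t;

/* Before locking: remember the caller's pid, then use an internal one. */
static void lock_begin(toy_frame_t *frame, toy_local_t *local, int internal_pid)
{
    local->saved_pid = frame->pid;
    frame->pid = internal_pid;
}

/* After the FOP: put the caller's pid back so lock release matches. */
static void fop_done(toy_frame_t *frame, const toy_local_t *local)
{
    frame->pid = local->saved_pid;
}
```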
|
|
|
|
|
|
| |
subvolumes while keeping existing data.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
| |
If a writev fails, remember it by marking it in the fd context.
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
| |
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|
|
|
|
|
|
|
| |
Earlier the check was in afr_flush(), which caused race conditions
with writev().
Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
|