| Commit message (Collapse) | Author | Age | Files | Lines |
| |
1. What
--------
This change introduces infrastructure in the filesystem which lets
filesystem operations address objects (inodes) by GFID alone. Thus far
the GFID has been a unique identifier of a user-visible inode. But in
terms of addressability, the only mechanism thus far has been the
backend filesystem path, which could be derived from the GFID only if
the inode was cached in the inode table along with the entire dentry
ancestry leading up to the root.
This change essentially decouples addressability from the namespace. It
is no longer necessary to know the parent directory in order to address
a file or directory.
2. Why
-------
The biggest use case for such a feature is NFS for generating
persistent filehandles. So far the technique for generating filehandles
in NFS has been to encode path components so that the appropriate
inode_t can be repopulated into the inode table by means of a recursive
lookup of each component top-down.
Another use case is the ability to perform more intelligent self-healing
and rebalancing of inodes with hardlinks and also to detect renames.
A derived feature from GFID filehandles is anonymous FDs. An anonymous FD
is an internal USABLE "fd_t" which does not map to a user-opened file
descriptor or to an internal ->open()'d fd. The ability to address a file
by its GFID eliminates the need to keep a persistent ->open()'d fd for the
purpose of avoiding the namespace. This improves NFS read/write performance
significantly by eliminating open/close calls, and also fixes some of
today's limitations (like keeping an FD open longer than necessary,
resulting in disk space leakage).
3. How
-------
At each storage/posix translator level, every file is hardlinked inside
a hidden .glusterfs directory (under the top-level export), with the
ASCII-encoded standard UUID format string as its name. For reasons of
performance and scalability, those hardlinks are classified in two tiers,
under directories named after the initial parts of the UUID string.
For directories (which cannot be hardlinked), the approach is to use a symlink
which dereferences the parent GFID path along with basename of the directory.
The parent GFID dereference will in turn be a dereference of the grandparent
with the parent's basename, and so on recursively up to the root export.
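The layout described above can be sketched as follows. This is a minimal illustration, not the posix translator's code: it assumes the two tiers are the first two and the next two hex digits of the UUID string (the text only says "initial parts"), and that a directory's symlink target is written relative to its own second-tier directory. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UUID_STR_LEN 36   /* canonical 8-4-4-4-12 form, without the NUL */

/* Build "<export>/.glusterfs/<aa>/<bb>/<uuid>" for a file's hardlink.
 * Assumption: tier one is uuid[0..1], tier two is uuid[2..3]. */
static int
gfid_handle_path (const char *export_dir, const char *uuid_str,
                  char *buf, size_t buflen)
{
        int n;

        if (!export_dir || !uuid_str || !buf ||
            strlen (uuid_str) != UUID_STR_LEN)
                return -1;

        n = snprintf (buf, buflen, "%s/.glusterfs/%.2s/%.2s/%s",
                      export_dir, uuid_str, uuid_str + 2, uuid_str);

        return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Build the symlink target for a directory: the parent's GFID handle
 * followed by the directory's basename. Assumption: the target is
 * relative, climbing out of the two tier directories first. */
static int
gfid_dir_symlink_target (const char *pargfid_str, const char *basename,
                         char *buf, size_t buflen)
{
        int n;

        if (!pargfid_str || !basename || !buf ||
            strlen (pargfid_str) != UUID_STR_LEN)
                return -1;

        n = snprintf (buf, buflen, "../../%.2s/%.2s/%s/%s",
                      pargfid_str, pargfid_str + 2, pargfid_str, basename);

        return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

Resolving a directory's symlink walks into the parent's handle, which is itself a symlink unless the parent is the root, so the chain unwinds up to the export as the text describes.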
4. Development
---------------
4a. To leverage the ability to address an inode by its GFID, the technique is
to perform a "nameless lookup". This means, to populate a loc_t structure as:
    loc_t {
        pargfid: NULL
        parent: NULL
        name: NULL
        path: NULL
        gfid: GFID to be looked up [out parameter]
        inode: inode_new () result [in parameter]
    }
and performing such lookup will return in its callback an inode_t
populated with the right contexts and a struct iatt which can be
used to perform an inode_link () on the inode (without a parent and
basename). The inode will now be hashed and linked in the inode table
and findable via inode_find().
A fundamental change moving forward is that the primary fields in a
loc_t structure are now going to be (pargfid, name) and (gfid) depending
on the kind of FOP. So far path had been the primary field for operations.
The remaining fields only serve as hints/helpers.
4b. If read/write is to be performed on an inode_t, the approach so far
has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and
then perform STACK_WIND(read, fd) etc. With anonymous fds you can now do
fd_anonymous (inode), STACK_WIND (read, fd). This results in a great boost
in performance in the built-in NFS server.
5. Misc
-------
The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent
with the rest of the codebase.
Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f
BUG: 781318
Reviewed-on: http://review.gluster.com/669
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
In afr_open_fd_fix, if we were not able to get the fd context, we were
unlocking local->fd->lock without ever having taken it. Now we jump
directly to out and return in that case, instead of going to unlock
without holding the lock.
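The control flow of the fix can be sketched as below. This is a toy model rather than the AFR code: struct toy_lock, struct toy_fd and open_fd_fix_sketch are hypothetical stand-ins, and the lock merely records whether it is held so that an unbalanced release becomes visible.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock that counts releases made while not held. */
struct toy_lock {
        int held;
        int bad_unlocks;
};

static void
toy_lock_acquire (struct toy_lock *l)
{
        l->held = 1;
}

static void
toy_lock_release (struct toy_lock *l)
{
        if (!l->held)
                l->bad_unlocks++;   /* the bug described in the message */
        l->held = 0;
}

/* Hypothetical fd with an optional context pointer. */
struct toy_fd {
        struct toy_lock lock;
        void           *ctx;
        int             fixed;
};

static int
open_fd_fix_sketch (struct toy_fd *fd)
{
        int ret = -1;

        if (fd->ctx == NULL)
                goto out;       /* lock was never taken: skip the unlock */

        toy_lock_acquire (&fd->lock);
        {
                fd->fixed = 1;
                ret = 0;
        }
        toy_lock_release (&fd->lock);
out:
        return ret;
}
```

Keeping a separate out label below the unlock means every early exit taken before the lock is acquired bypasses the release.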
Change-Id: I0da638bbd2c269127cf111b3aac707e4a95d20c6
BUG: 783036
Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Reviewed-on: http://review.gluster.com/2658
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
* Each xlator prevents the user from setting glusterfs-internal
xattrs like trusted.gfid by handling it in the respective setxattr
functions. The special case of trusted.gfid is handled in
fuse (not in posix, because posix_setxattr is used to set the gfid).
* For xlators which did not define setxattr and/or fsetxattr,
the functions have been implemented with appropriate checks.
xlator         | fops-added
_______________|__________________________
               |
1. afr         | fsetxattr
2. stripe      | setxattr and fsetxattr
3. quota       | setxattr and fsetxattr
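The kind of check each setxattr/fsetxattr entry point performs might look like the sketch below. It is illustrative only: the function name is hypothetical, the key list contains just trusted.gfid because that is the only key the message names, and returning EPERM is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Only "trusted.gfid" is named in the text; a real list would be longer. */
static const char *const internal_xattr_keys[] = { "trusted.gfid", NULL };

/* Return 0 if a user may set this key, or a negative errno if the key
 * is reserved for internal use. EPERM is an assumption. */
static int
user_setxattr_key_check (const char *key)
{
        size_t i;

        if (key == NULL)
                return -EINVAL;

        for (i = 0; internal_xattr_keys[i] != NULL; i++) {
                if (strcmp (key, internal_xattr_keys[i]) == 0)
                        return -EPERM;
        }

        return 0;
}
```

A translator would run such a check on every key of the incoming dict before winding the fop, and unwind with the error instead of winding when the check fails.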
Change-Id: Ib62abb7067415b23a708002f884d30e8866fbf48
BUG: 765487
Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
Reviewed-on: http://review.gluster.com/685
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
| |
Change-Id: I239128c51b728fbb7814fd6a41020b76c88fbd93
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
BUG: 772876
Reviewed-on: http://review.gluster.com/2623
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Idc0a05a8a25f278a7ab05e242263e0a5001bde18
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
BUG: 767862
Reviewed-on: http://review.gluster.com/800
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
If lookup sees a gfid mismatch along with some ENOENTs, and self-heal
can't remove the stale entry, self-heal tells lookup to unwind with
EIO. But since ENOENT has higher priority, the errno is not
overwritten. This patch fixes that case.
Change-Id: Icd68c4a5cf05dd97c568964ab647a34fdb6e26f4
BUG: 765528
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/2541
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
- Fop should unwind with appropriate errno
- Local is de-allocated on errors
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Change-Id: I4db40342ae184fe1cc29e51072e8fea72ef2cb15
BUG: 770513
Reviewed-on: http://review.gluster.com/2539
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
In case of split-brain/all-fool xattrs perform conservative merge.
Don't treat ignorant subvol as fool.
Change-Id: I6ddf89949cd5793c2abbead7c47f091e8461f1d4
BUG: 765528
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/2521
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
The client asserts on a missing pargfid in the case of unlink, so
AFR needs to make sure it is present in that fop.
Change-Id: Iea0ad65e1e7254c8df412942c52d5870e853aa51
BUG: 769055
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/2495
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I7615f31309c6c8f5373e1ff0535d84396dfa1455
BUG: 765430
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/807
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
In rename the changelog modification needs to happen both on
old parent-dir and new parent-dir, so 2 stack winds are
done per brick.
Change-Id: I43f34661e397c4288162213944529e18b7724b1d
BUG: 766603
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/783
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Ia52ddb551e24c27969f7f5fa0f94c1044789731f
BUG: 3823
Reviewed-on: http://review.gluster.com/743
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I00c714a89575023f6dbdd3430dcbf191e5d08019
BUG: 3650
Reviewed-on: http://review.gluster.com/740
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I9888d8a0b86fdaf6589885766f2de7222d8c8ba2
BUG: 3802
Reviewed-on: http://review.gluster.com/705
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
Reviewed-on: http://review.gluster.com/745
Reviewed-by: Anand Avati <avati@gluster.com>
| |
Change-Id: I0d87f06f989b2d4b971967c52d4898331693a801
BUG: 3675
Reviewed-on: http://review.gluster.com/735
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Iee12828ca515d44ed71d9cf97dcb8627c85f0593
BUG: 3740
Reviewed-on: http://review.gluster.com/725
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I79a1c70c47649fbcf236191f174d766d5806545c
BUG: 3805
Reviewed-on: http://review.gluster.com/719
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I6295245a7f40ba4f786f1f9f35b337f3f711128d
BUG: 3783
Reviewed-on: http://review.gluster.com/739
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I2f123ef93989862aa796903a45682981d5d7fc3c
BUG: 3533
Reviewed-on: http://review.gluster.com/473
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I2319258743e478cc3a932d8ff0b2204a97cd4f8e
BUG: 3760
Reviewed-on: http://review.gluster.com/680
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
*) removed uuid_generate usage in pump and afr, self-heald
*) filled the gfids for the fops which were sending no gfid in loc
Change-Id: I85da3c10f5ee2006248b0123155a60867870d202
BUG: 3760
Reviewed-on: http://review.gluster.com/679
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
in the entire glusterfs codebase.
This patch fixes many spelling mistakes and typos across the entire
glusterfs codebase and all supported modules.
Change-Id: I83238a41aa08118df3cf4d1d605505dd3cda35a1
BUG: 3809
Signed-off-by: Harshavardhana <fharshav@redhat.com>
Reviewed-on: http://review.gluster.com/731
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I53b007fbdb42313d207d5d63fbfaaa6aaf033f95
BUG: 3518
Reviewed-on: http://review.gluster.com/523
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I0f078d1753db65d2f2e0380d1b0450c114cf40dd
BUG: 3518
Reviewed-on: http://review.gluster.com/522
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Fixes self-heal of special files like device files, fifo files, socket files
etc. It does so as follows:
* Prevent setting of the pending data xattr on a special file during entry
self-heal when a new file is created.
* Allow data self-heal to be started on all file types other than directories.
During data self-heal, for special files just erase the pending xattrs, if
those xattrs were set by previous releases of glusterfs.
Change-Id: I34d8121e23ad00e85371ae2a36ef30cf3bd5db7a
BUG: 3525
Reviewed-on: http://review.gluster.com/618
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
| |
Change-Id: I425e2d23e9e45f10ddeff2eacf918dd90f8baee7
BUG: 3744
Reviewed-on: http://review.gluster.com/639
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I600120252445c06d9cc3e7aa24022c2559b6abe2
BUG: 3747
Reviewed-on: http://review.gluster.com/638
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Iddf5b59d3534c517dcd3c0d7b819e3768f6e915a
BUG: 3747
Reviewed-on: http://review.gluster.com/637
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
With this patch, there are no more warnings with gcc (GCC) 4.6.1 20110908.
Change-Id: Ice0d52d304b9846395f8a4a191c98eb53125f792
BUG: 2550
Reviewed-on: http://review.gluster.com/607
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
Clearing pump status on migration complete is futile because we would be
'replacing' src brick with destination brick in the volume anyway.
Change-Id: Ib12fee84bd5445c4a20dac1cf10555331d7b8ebd
BUG: 3653
Reviewed-on: http://review.gluster.com/585
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: If67f726f21b713fa9312dc499a1aca4cb00f71de
BUG: 3682
Reviewed-on: http://review.gluster.com/589
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Ifdf0db71594ce526ad85c21103726798d9aceef4
BUG: 3639
Reviewed-on: http://review.gluster.com/556
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Since the gfid is used to uniquely identify an inode, printing the inode
number in the statedump is not necessary. It is sufficient if the gfid
of the inode is printed. Also, do not print the inodelks, entrylks
and posixlks if the lock count is 0.
Change-Id: Idac115fbce3a5684a0f02f8f5f20b194df8fb27f
BUG: 3476
Reviewed-on: http://review.gluster.com/530
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
| |
Change-Id: I0ef541c1f387c397c345e3f2bc9a57f1eff282a1
BUG: 3647
Reviewed-on: http://review.gluster.com/527
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Ica845035781f47de990e9dcfefdeb37bed99d515
BUG: 3637
Reviewed-on: http://review.gluster.com/536
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Used a #pragma to kill ~170 in rpcgen code. Added GF_UNUSED to deal with
a few more from macros elsewhere. The remainder are function return values
(mostly context and dict calls) that really should be checked. Those would
be harder to fix without real understanding of the code where they occur,
so they remain as reminders.
(Patchset 2: deal with older gcc that doesn't handle #pragma GCC diagnostic)
(Patchset 3: fix include paths in generated files)
(Patchset 4: keep up with trunk, squash 9 new warnings)
(Patchset 5: six more, all in AFR)
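The two techniques mentioned above can be sketched as follows. Several things here are assumptions: that the warnings in question are GCC's set-but-unused variable warnings (the warning name does not appear in the message), that GF_UNUSED expands to the compiler's "unused" attribute, and that a version guard is how older compilers were accommodated. The low_bit function is purely illustrative.

```c
#include <assert.h>

/* Assumption: GF_UNUSED expands to the "unused" variable attribute. */
#define GF_UNUSED __attribute__ ((unused))

/* File-wide suppression, the kind of thing suited to generated code that
 * is impractical to edit by hand. Assumption: only GCC 4.6 and later
 * understand this warning name, so other compilers skip the pragma. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))
#pragma GCC diagnostic ignored "-Wunused-but-set-variable"
#endif

/* Per-variable suppression for handwritten code. */
static int
low_bit (int raw)
{
        GF_UNUSED int scratch;   /* assigned below but never read */

        scratch = raw >> 1;

        return raw & 1;
}
```

The pragma silences a whole translation unit, while the attribute documents one specific variable as intentionally unread, which keeps the warning useful everywhere else.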
Change-Id: I29760c8c81be4d7e6489312c5d0e92cc24814b7b
BUG: 2550
Reviewed-on: http://review.gluster.com/378
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Ibcaaa9c928195939ff1e31b28b592e524e63a423
BUG: 3557
Reviewed-on: http://review.gluster.com/519
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
This command is used in the context of proactive self-heal for replicated
volumes. The user invokes the following command when (s)he suspects that
self-heal needs to be done on a particular volume:
gluster volume heal <VOLNAME>
Change-Id: I3954353b53488c28b70406e261808239b44997f3
BUG: 3602
Reviewed-on: http://review.gluster.com/454
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
An AFR transaction performs lock, pre-op, op, post-op and unlock steps in
that order. The child_up[] array is overloaded with the information about
which children the first two steps succeeded on. This works perfectly fine
for transactions, but the locking/unlocking part of the code is re-used by
data self-heal. There, each loop_frame performs lock, rchecksum,
read-from-source, write-to-sinks and unlock steps.
The rchecksum fop assumes that it needs to happen on one source + all
sinks and sets the call_count to that number. But if the lock step fails
on any of the sinks, it sets child_up for that child to 0, which
results in a call_count mismatch, and the frame hangs waiting for
more cbks to come. When this happens the loop_frame never goes
to the unlock step, leading to hangs on that file.
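The mismatch can be illustrated with two small counting functions. This is a sketch of the failure mode only, not of the patch itself; the array names and both functions are hypothetical.

```c
#include <assert.h>

/* Calls the fop expects to complete: one source plus every sink. */
static int
expected_calls (const int *is_sink, int child_count)
{
        int i;
        int count = 1;          /* the source */

        for (i = 0; i < child_count; i++) {
                if (is_sink[i])
                        count++;
        }

        return count;
}

/* Calls actually wound: the source and the sinks, but only those
 * children whose child_up entry is still set after the lock step. */
static int
wound_calls (const int *is_sink, const int *child_up, int source,
             int child_count)
{
        int i;
        int count = 0;

        for (i = 0; i < child_count; i++) {
                if ((i == source || is_sink[i]) && child_up[i])
                        count++;
        }

        return count;
}
```

With three children, source 0, sinks 1 and 2, and a lock failure on child 2, the frame expects three callbacks but only two calls are wound, so the last callback it waits for never arrives.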
Change-Id: I3dd0449cc6193a980bacf637d935881f4b22210a
BUG: 3597
Reviewed-on: http://review.gluster.com/474
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: I96db0d94566ceabf1649f890318363f738c06553
BUG: 2458
Reviewed-on: http://review.gluster.com/403
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Id8a1dffa3c3200234ad154d1749278a2d7c7021b
BUG: 3502
Reviewed-on: http://review.gluster.com/336
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
By default, lookup triggers data self-heal, but that is not the preferred way
of operating replicated volumes. We would like data self-heals to be
triggered in open instead.
The number of background self-heals allowed is 16, and lookups block until
self-heal is completed. We want to prevent blocking in fops. We cannot make
lookups independent of self-heal frames because, when there are gfid
conflicts, the decision of which file is correct is made in the self-heal
phase. So in afr, lookup self-heal is going to guarantee namespace
consistency, and open/fd fops will take responsibility for data consistency;
these are non-blocking. The user needs to set the option
cluster.data-self-heal to "open" for this behavior.
Change-Id: If9463cdb9ebac114708558ec13bbca0270acd659
BUG: 3503
Reviewed-on: http://review.gluster.com/334
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
This patch is a change in the way write transactions hold a lock
which optimizes the case of sequential writes from a single writer.
The lock phase of a transaction has two sub-phases. First is an attempt
to acquire locks in parallel by broadcasting non-blocking lock
requests. If lock acquisition fails on any server, the held locks
are released and the transaction reverts to a blocking lock mode,
acquiring locks sequentially on one server after another.
The change in this patch is to make the initial broadcasting lock
request attempt to acquire lock on the entire file. If this fails,
we revert back to the sequential "regional" blocking lock as before.
In the case where such an "eager" lock is granted in the non-blocking
phase, it gives rise to an opportunity for optimization, i.e., if
the next write transaction on the same FD arrives before the unlock
phase of the first transaction, it "takes over" the full file lock.
Similarly, if yet another transaction arrives before the unlock phase
of the "optimized" transaction, that in turn "takes over" the lock
as well. The actual unlock now happens at the end of the last
"optimized" transaction.
Any operation which arrives before the unlock phase of the previous
transaction is a potential candidate to become an "optimized"
transaction. In cases where the previous transaction had acquired
its lock as a "regional" blocking lock, and the next transaction comes
in before its unlock phase, the next transaction would not be an
"optimized" transaction.
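The take-over behaviour described above can be modelled with a small amount of per-fd bookkeeping. This is a simplified sketch under stated assumptions, not the AFR implementation: struct eager_fd and both functions are hypothetical, "arrives before the unlock phase" is approximated by a count of in-flight transactions, and the regional fallback path is only counted, not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-fd bookkeeping; none of these names come from AFR. */
struct eager_fd {
        bool full_lock_held;   /* full-file lock currently owned via this fd */
        int  in_flight;        /* transactions running under that lock */
        int  lock_calls;       /* full-file lock requests attempted */
        int  unlock_calls;     /* full-file unlock requests sent */
        int  regional_txns;    /* transactions that fell back to regional locks */
};

/* Start a write transaction. Returns true if it runs under the full-file
 * lock ("optimized"), false if it must use a regional blocking lock. */
static bool
txn_begin (struct eager_fd *fd, bool full_lock_granted)
{
        if (fd->full_lock_held) {
                /* Arrived before the previous unlock: take over the lock. */
                fd->in_flight++;
                return true;
        }

        /* The non-blocking full-file request is attempted either way. */
        fd->lock_calls++;
        if (full_lock_granted) {
                fd->full_lock_held = true;
                fd->in_flight = 1;
                return true;
        }

        /* Fall back to the sequential regional lock (not modelled here). */
        fd->regional_txns++;
        return false;
}

/* Finish an optimized transaction; only the last one really unlocks. */
static void
txn_end (struct eager_fd *fd)
{
        assert (fd->full_lock_held && fd->in_flight > 0);

        if (--fd->in_flight == 0) {
                fd->full_lock_held = false;
                fd->unlock_calls++;
        }
}
```

In this model, three overlapping writes on one fd cost a single lock and a single unlock, which is the saving the message describes for a sequential single writer.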
Implied assumption
------------------
Since two or more transactions can now operate within the same
large lock, there is a possibility that overlapping transactions
can arrive in opposite orders on the servers. However, in the
larger picture this is not possible, as write-behind already
ensures that no two overlapping writes on an inode are in transit
at the same time. Overlapping writes across clients are not a
problem, as they compete for locks anyway.
Theoretical benefits and potential harms
----------------------------------------
In case of a single writer: The benefits are large for sequential
writes. In the best case the entire file write can happen with just
one lock and unlock per server, provided writes are coming in fast
enough and getting pipelined by write-behind soon enough (which is
usually the case). If the writes are not coming in fast enough, then
the optimization "kicks in" for only those subsets of writes which
are close enough to get "piggybacked". For random writes the benefits
are the same as well. In any case the overall performance is better
than or equal to the performance without this optimization for a single
writer.
In case of multiple writers: When multiple writers are not writing
concurrently, there is no negative performance impact. When multiple
writers are writing concurrently to the same region, there is no
negative impact either, as they were previously getting arbitrated
at the locks translator too. In the case of multiple writers writing
to different regions concurrently, there will be an increased number
of "failovers" from failed parallel non-blocking to sequential blocking
regional locks. This "worst case" has a simple workaround: as soon as
we detect an open-fd-count > 1 in the lookup xattr, we can disable
this optimization on those fds.
Beneficial side-effects
-----------------------
There is another similar optimization in AFR for changelogs which goes
by the name of "changelog-piggybacking". That works in a similar way where
pending flags get 'taken over' or 'piggybacked' by the next transaction
if its 'pre-op' phase kicks in before the 'post-op' phase of the
previous transaction. It has been observed that this changelog-piggybacking
optimization saves about 55% of the xattr calls hitting the wire,
measured across various types of network interfaces. The side
effect of this eager-lock optimization is that it gives an almost 100%
saving of xattr calls by making the optimistic-changelog work much more
efficiently, as it gives a wider overlap of the xattr phases of two
consecutive transactions.
Change-Id: I41c02eb3b64c14c68ef66a344610ec3f024cd59d
BUG: 3409
Reviewed-on: http://review.gluster.com/240
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
This fixes ~200 such warnings, but leaves three categories untouched.
(1) Rpcgen code.
(2) Macros which set variables in the outer (calling function) scope.
(3) Variables which are set via function calls which may have side effects.
Change-Id: I6554555f78ed26134251504b038da7e94adacbcd
BUG: 2550
Reviewed-on: http://review.gluster.com/371
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
The steps in a normal data self-heal:
1) The self-heal frame takes a big lock and gets the xattrs/stat to decide
source and sink information.
2) Loop frames are spawned which perform self-heal by taking small locks on
the file. Each time, a new lock is taken and the old lock is released.
3) Before the final small lock is released, the self-heal frame takes a big
lock and then unlocks the small lock. The pending xattrs are erased, then
the big unlock happens, and that is the end of the data self-heal.
Consider a file that needs data self-heal where the fop that triggers
the self-heal is an open with O_TRUNC. Fuse sends an open and then
an explicit truncate for this. Open triggers the self-heal, but by the
time it tries to spawn the loops the file size is truncated to 0, so
no loops are formed.
These are the steps:
1) Take big lock by self-heal frame. Get the xattrs/stat to decide
source, sink information.
2) Loop frames are not spawned. The big lock is not released.
3) One more big lock is taken by the same self-heal frame and the
pending xattrs etc. are erased. It then does two big unlocks, but after
the first unlock the information about which locks were taken is
forgotten, so the next unlock becomes a no-op. A stale big lock is
therefore left on that file, preventing further writes.
As a fix, if the loops are not spawned, use the previous big lock to
perform the rest of the operations needed in completing the data
self-heal. No need to have one more big lock.
Change-Id: Id03171269594e447b2b6d1331e362d83bd1e3430
BUG: 3506
Reviewed-on: http://review.gluster.com/339
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
This is brought in as an effort to go easy on system resources while
self-heal is in progress.
Change-Id: I123f1eb4d8000613a35c0117f0aa27f926f3a921
BUG: 3503
Reviewed-on: http://review.gluster.com/333
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
| |
Change-Id: I66362a3087a635fb7b759d7836a1f6564a6a7fc9
BUG: 3456
Reviewed-on: http://review.gluster.com/294
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
The code is checking for priv->child_up[i], which can change while the fop
is in progress. Since pending[child][id-of-transaction] alone is enough
to tell if the child became stale or not, use just that.
Change-Id: I494bf02cca66f4fd41526195fafce86a202c6bd1
BUG: 3455
Reviewed-on: http://review.gluster.com/293
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Idce22a6266c354e327d5d717715d2e62533eec58
BUG: 3448
Reviewed-on: http://review.gluster.com/292
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
| |
Change-Id: Ie026ebed98cf5ff75ae1a13437d29f67d0e0254a
BUG: 3448
Reviewed-on: http://review.gluster.com/286
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Reviewed-by: Vijay Bellur <vijay@gluster.com>
|