| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This patch changes the way write transactions hold a lock, which
optimizes the case of sequential writes from a single writer.
The lock phase of a transaction has two sub-phases. The first is an
attempt to acquire locks in parallel by broadcasting non-blocking lock
requests. If lock acquisition fails on any server, the locks already
held are released and the transaction reverts to blocking locks,
acquired sequentially on one server after another.
The change in this patch is to make the initial broadcast lock
request attempt to acquire a lock on the entire file. If this fails,
we revert to the sequential "regional" blocking lock as before.
When such an "eager" lock is granted in the non-blocking phase, it
gives rise to an opportunity for optimization: if the next write
transaction on the same FD arrives before the unlock phase of the
first transaction, it "takes over" the full file lock.
Similarly, if yet another transaction arrives before the unlock phase
of the "optimized" transaction, that in turn "takes over" the lock
as well. The actual unlock now happens at the end of the last
"optimized" transaction.
Any operation which arrives before the unlock phase of the previous
transaction is a potential candidate to become an "optimized"
transaction. If the previous transaction acquired its lock as a
"regional" blocking lock, and the next transaction comes in before
its unlock phase, then the next transaction would not be an
"optimized" transaction.
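
The take-over rule described above can be sketched as a small state
model. This is only an illustration in Python, not the AFR code: the
names EagerLockFd, begin_write and end_write are hypothetical, and the
result of the non-blocking full-file lock attempt is reduced to a
boolean supplied by the caller.

```python
class EagerLockFd:
    """Toy model of one FD's lock state (hypothetical names, not AFR code)."""

    def __init__(self):
        self.eager_held = False   # full-file lock currently held for this fd
        self.in_flight = 0        # transactions sharing the eager lock
        self.lock_calls = 0       # lock rounds sent to the servers
        self.unlock_calls = 0     # unlock rounds sent to the servers

    def begin_write(self, full_file_lock_granted=True):
        """Start a write transaction; return 'eager', 'taken-over' or 'regional'."""
        if self.eager_held and self.in_flight > 0:
            # The previous transaction has not reached its unlock phase yet:
            # take over its full-file lock without sending a lock request.
            self.in_flight += 1
            return "taken-over"
        self.lock_calls += 1      # non-blocking full-file attempt
        if full_file_lock_granted:
            self.eager_held = True
            self.in_flight = 1
            return "eager"
        self.lock_calls += 1      # fall back to the blocking regional lock
        return "regional"

    def end_write(self, mode):
        """Unlock phase; only the last transaction under an eager lock unlocks."""
        if mode == "regional":
            self.unlock_calls += 1
            return
        self.in_flight -= 1
        if self.in_flight == 0:
            self.eager_held = False
            self.unlock_calls += 1


# Three pipelined writes on the same fd share one lock and one unlock.
fd = EagerLockFd()
modes = [fd.begin_write() for _ in range(3)]
for mode in modes:
    fd.end_write(mode)
print(modes, fd.lock_calls, fd.unlock_calls)
```

In this model a transaction that follows a "regional" one never takes
over a lock, which matches the rule stated above.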
Implied assumption
------------------
Since two or more transactions can now operate within the same
large lock, there is a possibility that overlapping transactions
can arrive in opposite orders on the servers. However, in the
larger picture this is not possible, as write-behind already
ensures that no two overlapping writes on an inode are in transit
at the same time. Overlapping writes across clients are not a
problem, as they compete for the locks anyway.
Theoretical benefits and potential harms
----------------------------------------
In case of a single writer: The benefits are large for sequential
writes. In the best case the entire file write can happen with just
one lock and unlock per server, provided writes are coming in fast
enough and getting pipelined by write-behind soon enough (which is
usually the case). If the writes are not coming in fast enough, then
the optimization "kicks in" for only those subsets of writes which
are close enough to get "piggybacked". For random writes the benefits
are the same as well. In any case the overall performance is better
than or equal to the performance without this optimization for a single
writer.
In case of multiple writers: When multiple writers are not writing
concurrently, there is no negative performance impact. When multiple
writers are writing concurrently to the same region, there is no
negative impact either, as they were previously getting arbitrated
at the locks translator too. In the case of multiple writers writing
to different regions concurrently, there will be an increased number
of "failovers" from failed parallel non-blocking locks to sequential
blocking regional locks. This "worst case" has a simple workaround:
as soon as we detect an open-fd-count > 1 in the lookup xattr, we can
disable this optimization on those fds.
Beneficial side-effects
-----------------------
There is another similar optimization in AFR for changelogs which goes
by the name of "changelog-piggybacking". That works in a similar way where
pending flags get 'taken over' or 'piggybacked' by the next transaction
if its 'pre-op' phase kicks in before the 'post-op' phase of the
previous transaction. It has been observed that this changelog-piggybacking
optimization saves about 55% of the xattr calls hitting the wire,
measured across various types of network interfaces. The side
effect of this eager-lock optimization is that it gives an almost 100%
saving of xattr calls by making the optimistic-changelog work much more
efficiently, as it gives a wider overlap of the xattr phases of two
consecutive transactions.
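
To make the piggybacking idea concrete, here is a deliberately
simplified counting model in Python. It is not derived from the AFR
code and does not reproduce the measured figures quoted above. The
function name xattr_calls is hypothetical, and the model assumes that
every transaction normally costs one pre-op and one post-op xattr
call, and that each take-over saves exactly one post-op and one pre-op.

```python
def xattr_calls(overlaps):
    """Count xattr calls for len(overlaps) + 1 consecutive transactions.

    overlaps[i] is True when transaction i+1 starts its pre-op before
    transaction i reaches its post-op, so the pending flag is taken over.
    Assumption: a take-over saves one post-op and one pre-op call.
    """
    transactions = len(overlaps) + 1
    calls = 2 * transactions                 # one pre-op + one post-op each
    calls -= 2 * sum(1 for taken_over in overlaps if taken_over)
    return calls


no_overlap = xattr_calls([False, False, False])    # four isolated transactions
full_overlap = xattr_calls([True, True, True])     # four fully pipelined ones
print(no_overlap, full_overlap)
```

Under these assumptions the saving grows with the number of
consecutive overlaps, which is why a wider overlap window helps.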
Change-Id: I41c02eb3b64c14c68ef66a344610ec3f024cd59d
BUG: 3409
Reviewed-on: http://review.gluster.com/243
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Iec8b609e66ef21f4fdd6ee2ff3060f0b71d47ca0
BUG: 3046
Reviewed-on: http://review.gluster.com/237
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
|
|
|
|
|
|
|
| |
Change-Id: I0c54fd1c15550e5e5551e95ed32adb14d8029fab
Reviewed-on: http://review.gluster.com/238
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Ie67c4da49876555c3162909e474b9089a85f99a6
BUG: 3182
Reviewed-on: http://review.gluster.com/256
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
so stripe is more acl friendly
Porting patch ded0a9a2a0a9024def7a4b199ac3bbfa5d66485a from master
Change-Id: I0c7d8fb90714a4d92620646d940a58be58a3cf66
BUG: 3368
Signed-off-by: Amar Tumballi <amar@gluster.com>
Reviewed-on: http://review.gluster.com/202
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Id1f1a91cf15d933d5621a0073ddaebe02df0f159
BUG: 3348
Reviewed-on: http://review.gluster.com/198
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Ibf5f45431d7a55b70d7304649af652d6f25bb688
BUG: 3348
Reviewed-on: http://review.gluster.com/183
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The presence of local->cached_subvol makes dht_lookup_everywhere_done behave
as though it was a lookup on a file where linkfile needs to be recreated. In
a fresh lookup, local->cached_subvol should be NULL.
Change-Id: Ie6bd6ad536def03d970526d51e20c6daeb00922b
BUG: 3317
Reviewed-on: http://review.gluster.com/186
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is done so that there is no gfid mismatch. Unlink the older
linkfile if it exists, and recreate it with the correct gfid.
Also removed unused rename-related code.
BUG: 2522
Change-Id: Ia880adda5a94351f30971576b4faa861fac4682d
Reviewed-on: http://review.gluster.com/144
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Change-Id: Ib9cac6ed1635203802f089986f8acb1ce416265d
BUG: 3215
Reviewed-on: http://review.gluster.com/97
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
| |
also do some cleanups
Change-Id: Id792ac11b61627201ca08b9f271724dc3e9c5cd7
BUG: 3253
Reviewed-on: http://review.gluster.com/111
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the pre-GFID era, the linkfile of the destination file could be reused
as the linkfile for the renamed file when dst_cached == src_cached.
This patch handles this situation and reverts the previous (wrong) fix.
Change-Id: Iba57b5eb91cf8b1fb40e74f6399cdf99b8b00410
BUG: 2464
Reviewed-on: http://review.gluster.com/89
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amar@gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the code path enters the 'subvol_filled()' case, local->params is set,
which contains the 'gfid-req' value, but the linkfile creation was not
checking for its existence.
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3159 (mknod (linkfile creation) with no 'gfid-req' key)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3159
|
|
|
|
|
|
|
|
|
| |
let the race get arbitrated at the dst_hashed subvolume.
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2522 ([glusterfs-3.1.3qa8]: rm -rf shows invalid argument)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2522
|
|
|
|
|
|
|
|
|
|
| |
Return the values received from the subvol
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3057 (acl permissions don't work on nfs mount)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3057
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@gluster.com>
BUG: 3138 ([release-3.2]: ls shows 2 entries)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3138
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lookup uses the sources array to decide if a child is read_child or not.
So if afr_mark_sources returns 0, i.e. all children are sources,
explicitly mark them as sources.
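
The rule above can be illustrated with a short Python sketch. The
function finalize_sources is hypothetical, and nsources stands in for
the return value of afr_mark_sources; the real data structures in AFR
are not reproduced here.

```python
def finalize_sources(sources, nsources):
    """Return the sources array that lookup should consult.

    nsources == 0 is taken to mean that no child is blamed, i.e. all
    children are sources, so every entry is marked explicitly.
    """
    if nsources == 0:
        return [1] * len(sources)
    return list(sources)


all_clean = finalize_sources([0, 0, 0], 0)   # nothing was marked
one_source = finalize_sources([0, 1, 0], 1)  # child 1 is the only source
print(all_clean, one_source)
```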
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Vijay Bellur <vijay@gluster.com>
BUG: 3138 ([release-3.2]: ls shows 2 entries)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3138
|
|
|
|
|
|
|
| |
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2815 (Server-enforced ACLs)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2815
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3072 (Crash in afr_access_cbk)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3072
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3077 (afr [f]truncate locks wrong region in transaction)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3077
|
|
|
|
|
|
|
|
| |
Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3050 ('replace-brick' hangs on vm's)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3050
|
|
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 3036 (self-heal problem in replace-brick)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3036
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These fields are used mainly in the selfheal path, where
'inode->gfid' or 'parent->gfid' is not yet set.
These fields in 'loc' will have lower precedence than 'inode->gfid'
in client protocol.
Also contains Pranith <pranithk@gluster.com>'s patch to set the proper
loc->gfid during afr selfheal.
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2346 (Log message enhancements in GlusterFS - phase 1)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
|
|
|
|
|
|
|
|
|
|
| |
If locks could not be held on any of the servers, then propagate the
errno returned by the lock FOPs instead of hardcoding EAGAIN/EINVAL.
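
A minimal Python sketch of this behaviour follows. The function
lock_phase_errno and the reply format are hypothetical, and which
errno wins when servers report different errors is an assumption made
for illustration (the last one seen).

```python
import errno


def lock_phase_errno(replies):
    """Decide the result of the lock phase.

    replies: non-empty list of (op_ret, op_errno) pairs, one per server.
    Returns 0 if a lock was acquired on at least one server, otherwise
    the errno reported by the lock FOPs instead of a hardcoded value.
    Assumption: when servers disagree, the last reported errno is used.
    """
    last_errno = 0
    for op_ret, op_errno in replies:
        if op_ret == 0:
            return 0
        last_errno = op_errno
    return last_errno


all_failed = lock_phase_errno([(-1, errno.ENOTCONN), (-1, errno.ENOTCONN)])
one_ok = lock_phase_errno([(-1, errno.ENOENT), (0, 0)])
print(all_failed == errno.ENOTCONN, one_ok)
```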
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2993 ([glusterfs-3.2.0qa2]: hang while doing the selfheal)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2993
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
.. to perform lookups on remaining subvolumes. This way, if there is a
race between two clients to 'fix' GFIDs with gfid-req, then the hashed
subvolume will arbitrate and return the winner in stbuf->ia_gfid. This
patch uses the returned gfid as the new gfid-req thereby preventing
mismatching GFIDs on other servers due to further races.
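
The arbitration described above can be sketched in Python as below.
Subvol and lookup_everywhere are toy, hypothetical names; the model
assumes a subvolume assigns the requested GFID only when the file has
none yet, and always returns the GFID it ends up with.

```python
import uuid


class Subvol:
    """Toy subvolume: applies gfid-req only if the file has no GFID yet."""

    def __init__(self, gfid=None):
        self.gfid = gfid

    def lookup(self, gfid_req):
        if self.gfid is None:
            self.gfid = gfid_req
        return self.gfid


def lookup_everywhere(hashed, others, gfid_req):
    """Look up on the hashed subvolume first, then reuse the winning GFID."""
    winner = hashed.lookup(gfid_req)          # hashed subvolume arbitrates
    return [winner] + [s.lookup(winner) for s in others]


# Two clients race with different gfid-req values.
hashed, others = Subvol(), [Subvol(), Subvol()]
gfid_a, gfid_b = uuid.uuid4(), uuid.uuid4()
hashed.lookup(gfid_a)                          # client A wins on the hashed subvol
seen_by_b = lookup_everywhere(hashed, others, gfid_b)
print(set(seen_by_b) == {gfid_a})
```

Because client B forwards the GFID returned by the hashed subvolume
rather than its own request, the remaining subvolumes end up with the
same GFID.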
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2522 ([glusterfs-3.1.3qa8]: rm -rf shows invalid argument)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2522
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2840 (files not getting self-healed when the first child goes down)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2840
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2840 (files not getting self-healed when the first child goes down)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2840
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2840 (files not getting self-healed when the first child goes down)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2840
|
|
|
|
|
|
|
|
| |
Signed-off-by: Kaushik BV <kaushikbv@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2978 (Geo-replication fails on stripe(Master) setup.)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2978
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When selfheal of dir is triggered, make sure the dirs are recreated
with the correct gfid, to prevent mismatch of gfids in the backend.
Also, remove the spurious memcpy to inode.gfid
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2994 ([glusterfs-3.2.1qa2]: untar and rm in parallel hangs untar)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2994
|
|
|
|
|
|
|
|
| |
Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2489 (GlusterFS crashing with replace-brick)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2489
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2986 (Failed operations should should be logged `E' or `W')
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2986
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Junaid <junaid@gluster.com>
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2760 (Quota: stripe volume not showing the quota size properly)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2760
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The earlier logic for determining whether different subvolumes have
different gfids for the same file had a flaw. It could end up comparing
against an empty gfid field if a reply from another subvolume arrived
before the reply from the first subvolume.
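
One way to avoid the ordering problem described above is to compare
each reply against the first reply that actually arrived, rather than
against the first subvolume's slot. The Python sketch below shows that
idea only; it is an assumption about a possible approach, not the code
of this patch, and gfids_differ is a hypothetical name.

```python
def gfids_differ(replies):
    """Report whether subvolumes returned different gfids.

    replies: iterable of (subvol_index, gfid) pairs in arrival order.
    The reference is the first reply to arrive, whatever its index, so
    an unfilled (empty) gfid is never used for the comparison.
    """
    reference = None
    for _index, gfid in replies:
        if reference is None:
            reference = gfid
        elif gfid != reference:
            return True
    return False


g1, g2 = b"\x01" * 16, b"\x02" * 16
same_out_of_order = gfids_differ([(1, g1), (0, g1)])   # subvol 1 replies first
mismatch = gfids_differ([(1, g1), (0, g2)])
print(same_out_of_order, mismatch)
```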
Signed-off-by: Amar Tumballi <amar@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2773 ([glusterfs-3.2.0qa12]: stripe lookup says gfid different)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2773
|
|
|
|
|
|
|
|
|
|
|
| |
If an older linkfile exists, unlink it and then create the linkfile.
This prevents gfid mismatches.
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2464 ([7b07d444a77526f27f860210930bf1d4c7fbea9b]: rm -rf gives Invalid argumenrt error)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2464
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Anand Avati <avati@gluster.com>
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2522 ([glusterfs-3.1.3qa8]: rm -rf shows invalid argument)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2522
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2870 (Inconsistent xattr values when creating bricks)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2870
|
|
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Krishnan Parthasarathi <kp@gluster.com>
BUG: 2909 (replace brick of empty brick never says migration completed)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2909
Signed-off-by: Anand Avati <avati@gluster.com>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2949 (self-heal hangs)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2949
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2870 (Inconsistent xattr values when creating bricks)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2870
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2870 (Inconsistent xattr values when creating bricks)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2870
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier, rmdir would succeed on all up subvols, but fuse would get an
error if one of the subvols was down. In the follow-up lookup, self-heal
would be triggered, and since st_mode would be 0, the permissions
would be bad. The behaviour now is to fail rmdir if a subvol is down.
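
The new behaviour can be sketched in Python as follows. The function
rmdir_everywhere and its arguments are hypothetical, and the errno
returned when a subvolume is down (ENOTCONN here) is an assumption
chosen for illustration; the message above does not say which errno
the patch uses.

```python
import errno


def rmdir_everywhere(subvols_up, do_rmdir):
    """Remove a directory on every subvolume, or on none.

    subvols_up: dict mapping subvolume name -> True if it is up.
    do_rmdir:   callable invoked with each subvolume name.
    Returns (op_ret, op_errno). Fails before touching any subvolume
    if one of them is down (errno is an illustrative assumption).
    """
    if not all(subvols_up.values()):
        return -1, errno.ENOTCONN
    for name in subvols_up:
        do_rmdir(name)
    return 0, 0


removed = []
result_down = rmdir_everywhere({"s0": True, "s1": False}, removed.append)
removed_when_down = list(removed)
result_up = rmdir_everywhere({"s0": True, "s1": True}, removed.append)
print(result_down, removed_when_down, result_up, removed)
```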
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2591 (Directories changing to d--------- permission after trying to delete)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2591
|
|
|
|
|
|
|
|
|
|
| |
calls.
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2760 (Quota: stripe volume not showing the quota size properly)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2760
|
|
|
|
|
|
|
|
| |
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2765 (geo-replication should have mercy on brick failure)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2765
|
|
|
|
|
|
|
|
|
|
| |
If this is set, when CHILD_DOWN event is received, call exit
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2536 (gsync service introspection)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2536
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2750 ([glusterfs-3.2.0qa11]: nfs server crashed in afr_sh_entry_expunge_cbk)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2750
|
|
|
|
|
|
|
|
| |
Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2726 ([glusterfs-3.2.0qa11]: glusterfs server crashed due to stack overflow)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2726
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the afr_private_t structure, favorite child is declared as unsigned int.
In the init function of afr we set favorite child to -1 if that option is
not found in the volfile. Because the field is an unsigned int, the stored
value becomes a huge number instead of -1, and the statedump file displays
that huge value.
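
The wraparound can be reproduced with Python's ctypes module. This is
an illustration only; it assumes a 32-bit unsigned int and uses
hypothetical variable names rather than the afr_private_t field itself.

```python
import ctypes

# Storing -1 in an unsigned 32-bit integer wraps around to the maximum value.
favorite_child_unsigned = ctypes.c_uint32(-1).value

# The same assignment to a signed 32-bit integer keeps -1.
favorite_child_signed = ctypes.c_int32(-1).value

print(favorite_child_unsigned, favorite_child_signed)
```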
Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2668 ([glusterfs-3.2.9qa7]: createbench error)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2668
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
path is a directory while aggregating quota-xattrs.
- The total number of lookups sent for a directory is equal to
(no of children + 1). Hence we should not aggregate the xattrs
from the first lookup.
Signed-off-by: Raghavendra G <raghavendra@gluster.com>
Signed-off-by: Junaid <junaid@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2604 (Quota: crossing the set limit)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2604
|
|
|
|
|
|
|
|
|
|
|
| |
Reverting commit 23d9783a192669b638d42b8dd127ad69ea36f950.
When first subvolume is down, mount point becomes inaccessible.
Signed-off-by: shishir gowda <shishirng@gluster.com>
Signed-off-by: Anand Avati <avati@gluster.com>
BUG: 2532 ([glusterfs-3.1.3qa8]: bringing first subvolume down makes mount point inaccessible)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2532
|