glusterfs.git/xlators/cluster, branch v3.4.0alpha2

cluster/afr: do complete split-brain check in all the fd based fops

2013-03-05T10:22:46+00:00

fd-based operations such as readv checked only for data split-brain
instead of complete split-brain (i.e. both data and metadata), assuming
that open would already have done the complete split-brain check. However,
open-behind may unwind open without winding it to afr, which skips the
complete split-brain check, so some applications can read the contents
of a file even though the file is in metadata split-brain. So let all
the fd-based fops do a defensive check for complete split-brain.
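A minimal C sketch of the defensive check. It assumes a hypothetical per-inode struct that records data and metadata split-brain separately. `inode_sb_state`, `sb_check_complete()` and `sketch_readv()` are illustrative names, not AFR's actual code, and returning `-EIO` is an assumption about the error reported.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-inode split-brain flags (AFR keeps richer state). */
struct inode_sb_state {
    bool data_split_brain;
    bool metadata_split_brain;
};

/* "Complete" check: data OR metadata split-brain. */
static bool sb_check_complete(const struct inode_sb_state *st)
{
    return st->data_split_brain || st->metadata_split_brain;
}

/* Hypothetical readv entry point: refuses to serve data when the inode is
 * in any kind of split-brain, whether or not open() ever reached afr
 * (e.g. because open-behind unwound it early). */
static int sketch_readv(const struct inode_sb_state *st, char *buf, int len)
{
    if (sb_check_complete(st))
        return -EIO; /* assumed error code */
    /* ...winding the read to a healthy subvolume is elided... */
    (void)buf;
    return len;
}
```

The point of the design is that the check no longer depends on open having run through afr: every fd-based fop repeats it.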

Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28
BUG: 846240
Signed-off-by: Raghavendra Bhat 
Reviewed-on: http://review.gluster.org/4620
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

cluster/distribute: Reopen fds in migration internally as root:root

2013-03-04T10:42:30+00:00

Although linkfile_create and the rebalance dst file create send a setattr
with the correct ownership, there is still a race window in which the
linkfile open (a client open triggered by migration) fails, because the
file's ownership is still root:root. Hence, reopen such fds internally
as root:root.
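A sketch of reopening with root credentials and then restoring the caller's identity. `call_creds`, `reopen_as_root()` and `toy_open_root_owned()` are hypothetical stand-ins for the frame credentials and the open path. The toy open only models the race window, in which the file is still root:root and only root can open it.

```c
#include <sys/types.h>

/* Hypothetical stand-in for the uid/gid carried on a call frame. */
struct call_creds {
    uid_t uid;
    gid_t gid;
};

typedef int (*open_fn)(const struct call_creds *creds, const char *path);

/* Run an internal reopen as root:root, then restore the caller's identity
 * so later operations on the frame are unaffected. */
static int reopen_as_root(struct call_creds *creds, const char *path,
                          open_fn do_open)
{
    struct call_creds saved = *creds;

    creds->uid = 0;
    creds->gid = 0;
    int ret = do_open(creds, path);
    *creds = saved;
    return ret;
}

/* Toy open for illustration: a file still owned by root:root (before the
 * setattr lands) can only be opened by root. */
static int toy_open_root_owned(const struct call_creds *creds, const char *path)
{
    (void)path;
    return creds->uid == 0 ? 0 : -1;
}
```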

BUG: 884597
Change-Id: Iba73681eae4f280d39ee6c9a40009e195768bee7
Signed-off-by: shishir gowda 
Reviewed-on: http://review.gluster.org/4612
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy

cluster/distribute: Prevent spurious multiple defrag crawls

2013-03-04T10:42:10+00:00

In dht_notify, we used to create a thread to start the defrag crawl
after we had heard from all child subvols. This was incorrect, as a later
event could trigger the crawl again (since all subvols had already
responded).

The fix is to make sure the thread is started only once, after all
subvols have responded for the first time.
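The start-once rule can be sketched as a latch guarded by a mutex. `defrag_state`, `on_child_event()` and `MAX_CHILDREN` are hypothetical. Where a real xlator would call pthread_create() for the crawler, the sketch only counts launches.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_CHILDREN 8 /* hypothetical bound for the sketch */

struct defrag_state {
    pthread_mutex_t lock;
    int  child_count;          /* number of child subvols (<= MAX_CHILDREN) */
    bool seen[MAX_CHILDREN];   /* children heard from at least once */
    bool crawl_started;        /* latched: once true, never cleared */
    int  crawls_launched;      /* for illustration only */
};

static bool all_seen(const struct defrag_state *s)
{
    for (int i = 0; i < s->child_count; i++)
        if (!s->seen[i])
            return false;
    return true;
}

/* Called for every child event; launches the crawl at most once, even if
 * later events arrive after all children have already responded. */
static void on_child_event(struct defrag_state *s, int child)
{
    bool launch = false;

    pthread_mutex_lock(&s->lock);
    s->seen[child] = true;
    if (all_seen(s) && !s->crawl_started) {
        s->crawl_started = true;
        launch = true;
    }
    pthread_mutex_unlock(&s->lock);

    if (launch)
        s->crawls_launched++; /* real code: pthread_create() the crawler */
}
```

Without the `crawl_started` latch, every event after the first full round would satisfy `all_seen()` and start another crawl, which is the bug described above.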

BUG: 916449
Change-Id: I1619344fbb1cb51d5e1db38d8a29821fa870fa8b
Signed-off-by: shishir gowda 
Reviewed-on: http://review.gluster.org/4610
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy

cluster/distribute: Preserve file size during rebalance migration

2013-03-04T10:42:00+00:00

Holes encountered in the src are not written to the dst, which can
leave the dst file smaller than the src. Data is not corrupted, since
any data read back as non-zero is written to the dst.

Call truncate with the src file size so that the dst is not left shorter
than the src when the file ends in a hole.
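A POSIX sketch of a hole-skipping copy followed by the size-preserving truncate. `sparse_copy()` and `fd_size()` are hypothetical helpers and the 4 KiB block size is arbitrary. The real rebalance code goes through GlusterFS syncop calls rather than raw pread()/pwrite().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>      /* tmpfile()/fileno() in the usage example */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static off_t fd_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : -1;
}

/* Copy src_fd to dst_fd, skipping all-zero blocks (holes), then truncate
 * dst to src_size so a trailing hole does not leave dst shorter. */
static int sparse_copy(int src_fd, int dst_fd, off_t src_size)
{
    char buf[4096];
    off_t off = 0;
    ssize_t n;

    while ((n = pread(src_fd, buf, sizeof buf, off)) > 0) {
        bool all_zero = true;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0) { all_zero = false; break; }
        /* Non-zero data is always written, so nothing is lost. */
        if (!all_zero && pwrite(dst_fd, buf, (size_t)n, off) != n)
            return -1;
        off += n;
    }
    if (n < 0)
        return -1;
    return ftruncate(dst_fd, src_size);
}
```

For example, a source holding 3 bytes of data followed by a hole up to 8 KiB yields an 8 KiB destination; without the final ftruncate() the copy would stop at 4 KiB.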

Thanks to Brian Foster for providing the test case

BUG: 915554
Change-Id: I7e1e0c475118b073c3ebb87e93220c1ec22e8b7d
Signed-off-by: shishir gowda 
Reviewed-on: http://review.gluster.org/4609
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

cluster/distribute: Remove spurious fd_unref call

2013-03-04T10:40:32+00:00

After the fix http://review.gluster.org/4282 (libglusterfs/syncop: do not
hold ref on the fd in cbk) was merged, syncop_open no longer takes a ref on
the fd, so the extra fd_unref is spurious.
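A toy refcount model of why the extra unref had to go. `toy_fd`, `toy_syncop_open()` and both migrate functions are hypothetical. The only property taken from the text is that syncop_open no longer takes a ref of its own.

```c
/* Hypothetical refcounted fd. */
struct toy_fd {
    int refcount;
};

static void toy_fd_ref(struct toy_fd *fd)   { fd->refcount++; }
static void toy_fd_unref(struct toy_fd *fd) { fd->refcount--; }

/* Models syncop_open after review 4282: no ref is taken on the fd. */
static int toy_syncop_open(struct toy_fd *fd)
{
    (void)fd;
    return 0;
}

/* Balanced: one ref, one unref. */
static void migrate_open(struct toy_fd *fd)
{
    toy_fd_ref(fd);
    toy_syncop_open(fd);
    /* ... use fd ... */
    toy_fd_unref(fd);
}

/* Old pattern: a second unref that used to balance syncop_open's ref.
 * Now it drops the fd one ref too many. */
static void migrate_open_with_spurious_unref(struct toy_fd *fd)
{
    toy_fd_ref(fd);
    toy_syncop_open(fd);
    toy_fd_unref(fd);
    toy_fd_unref(fd); /* spurious */
}
```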

BUG: 910661
Change-Id: Idedff91270966e6e70e71ee83785c0228e238d31
Signed-off-by: shishir gowda 
Reviewed-on: http://review.gluster.org/4608
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy

cluster/dht: Create linkfile with file uid/gid

2013-03-04T10:40:01+00:00

Currently, linkfile creation happens as root.

Use the uid/gid returned in the _cbk (link/rename) to set the correct
ownership of the link files.

Also added test/dht.rc to implement common dht functions.
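A POSIX sketch of creating a linkfile and then giving it the owner reported by the callback. `cbk_attrs` and `create_linkfile_owned()` are hypothetical. The 01000 (sticky-bit-only) mode mirrors how DHT linkfiles usually appear, but treat it as an assumption here. The real code sends a setattr through the xlator stack rather than calling fchown().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical subset of the attributes returned to the link/rename cbk. */
struct cbk_attrs {
    uid_t uid;
    gid_t gid;
};

/* Create the linkfile, then hand ownership to the data file's owner
 * instead of leaving it as root:root. */
static int create_linkfile_owned(const char *path, const struct cbk_attrs *a)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 01000);
    if (fd < 0)
        return -1;
    int ret = fchown(fd, a->uid, a->gid);
    close(fd);
    return ret;
}
```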

BUG: 884597
Change-Id: I6bc0e04f62d4716fc033681e5678e852a1be7a2f
Signed-off-by: shishir gowda 
Reviewed-on: http://review.gluster.org/4607
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy

cluster/dht: pathinfo xattr changes for directories

2013-02-09T03:09:46+00:00

Since directories are present on all subvolumes, ->hashed_subvol and
->cached_subvol have no definite meaning for them. The getxattr() code
path chooses ->cached_subvol for the pathinfo extended attribute. While
this makes sense for files, it makes less sense for directories.
Further, if the hashed or cached subvolume is down, a getxattr request
for a directory fails with an errno.

This patch changes pathinfo extended attribute contents by
aggregating information from all subvolumes that are up.
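A sketch of aggregating pathinfo over every subvolume that is up. `subvol_info` and `aggregate_pathinfo()` are hypothetical, and the space-separated output format is an assumption, not the exact pathinfo syntax GlusterFS produces.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct subvol_info {
    bool up;
    const char *pathinfo; /* this subvolume's pathinfo string */
};

/* Join the pathinfo of all subvolumes that are up into out (outlen > 0).
 * Returns how many were included (0 if none are up), or -1 if out is too
 * small. A single down subvolume no longer makes the whole request fail. */
static int aggregate_pathinfo(const struct subvol_info *sv, int n,
                              char *out, size_t outlen)
{
    size_t used = 0;
    int included = 0;

    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (!sv[i].up)
            continue;
        int w = snprintf(out + used, outlen - used, "%s%s",
                         included ? " " : "", sv[i].pathinfo);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
        included++;
    }
    return included;
}
```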

Change-Id: I58adb741d63ccfd1d0239af75eb65f26f0fb384d
Signed-off-by: Venky Shankar 
BUG: 856455
Reviewed-on: http://review.gluster.org/4047
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

Use proper libtool option -avoid-version instead of bogus -avoidversion

2013-02-07T23:12:56+00:00

Change-Id: I1c9541058c7d07786539a3266ca125a6a15287d8
BUG: 859835
Signed-off-by: Anand Avati 
Original-author: Kacper Kowalik (Xarthisius) 
Signed-off-by: Kacper Kowalik (Xarthisius) 
Reviewed-on: http://review.gluster.org/3967
Tested-by: Gluster Build System

afr: serialize modification of {entrylk,inodelk}_lock_count

2013-02-07T19:09:35+00:00

Normally this lock was not needed in practice, but with
http://review.gluster.org/3842 this code gets executed in multiple
threads for different servers, and concurrent updates lose a count.
This results in a leaked lock and a hang for a future transaction.
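A pthreads sketch of serializing the count update. `lock_counts`, `count_lock_reply()` and `reply_worker()` are hypothetical and not AFR's actual structures. Without the mutex, the unsynchronized `++` from several threads can lose increments, which is the lost count described above.

```c
#include <pthread.h>

struct lock_counts {
    pthread_mutex_t mutex;
    int inodelk_count;
};

/* The read-modify-write of the count is done under the mutex, so replies
 * handled concurrently in different threads cannot lose an update. */
static void count_lock_reply(struct lock_counts *c)
{
    pthread_mutex_lock(&c->mutex);
    c->inodelk_count++;
    pthread_mutex_unlock(&c->mutex);
}

struct worker_arg {
    struct lock_counts *c;
    int replies;
};

/* Simulates one thread processing lock replies from one server. */
static void *reply_worker(void *p)
{
    struct worker_arg *w = p;
    for (int i = 0; i < w->replies; i++)
        count_lock_reply(w->c);
    return NULL;
}
```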

Change-Id: I377ed20e44f2a45cff522289dfef181f0653eca2
BUG: 765564
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/4480
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System

dht: better layout-optimization algorithm

2013-02-07T16:27:40+00:00

This method deals with the case where swapping might gain a bigger overlap
for the xlator currently under consideration, but sacrifices even more from
the xlator we're swapping with. For example:

A = 0x00000000 - 0x44444443 (new 0x00000000 - 0x55555554)
B = 0x44444444 - 0x77777776 (new 0x55555555 - 0xaaaaaaa9)
C = 0x77777777 - 0xffffffff (new 0xaaaaaaaa - 0xffffffff)

Here, the new range for B has a bigger overlap with the old C than with the
old B (0x33333333 vs. 0x22222222, to be precise), so looking only at that
might lead us to swap. However, such a swap turns the new C's overlap from
0x55555556 (vs. old C) to *zero* (vs. old B).  In other words, we've gained
0x11111111 for B but lost 0x55555556 for C, so it's a bad idea.
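A small C check of the arithmetic above. `range_overlap()` is a hypothetical helper, not dht_overlap_calc itself. It counts the hash values shared by two inclusive ranges and uses 64-bit arithmetic, because a full range such as 0x00000000 - 0xffffffff holds 2^32 values, one more than a uint32_t can represent.

```c
#include <stdint.h>

/* Number of hash values in both [a_start, a_stop] and [b_start, b_stop]. */
static uint64_t range_overlap(uint32_t a_start, uint32_t a_stop,
                              uint32_t b_start, uint32_t b_stop)
{
    uint32_t lo = a_start > b_start ? a_start : b_start;
    uint32_t hi = a_stop  < b_stop  ? a_stop  : b_stop;

    if (lo > hi)
        return 0;                  /* disjoint ranges */
    return (uint64_t)hi - lo + 1;  /* inclusive count, cannot overflow */
}
```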

The new algorithm accounts for all effects of the swap, so it not only avoids
bad swaps but can make some good ones that would have been missed previously.
For example, if swapping a range X with a later range Y would not increase the
overlap for X we would previously have skipped it even if the swap would
increase Y's overlap without affecting X's.  This is the normal case when we're
adding a new brick (which initially has zero overlap with any old range) so
finding more good swaps is probably even more important than avoiding bad ones.
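A sketch of a criterion that "accounts for all effects of the swap". It treats a swap as two xlators exchanging their proposed new ranges (our reading of the text) and accepts it only if their combined overlap with their old ranges grows. `struct range`, `overlap()` and `swap_is_beneficial()` are hypothetical names, not the actual dht code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive 32-bit hash range, as in a DHT layout. */
struct range {
    uint32_t start, stop;
};

static uint64_t overlap(struct range a, struct range b)
{
    uint32_t lo = a.start > b.start ? a.start : b.start;
    uint32_t hi = a.stop  < b.stop  ? a.stop  : b.stop;
    return lo > hi ? 0 : (uint64_t)hi - lo + 1;
}

/* Compare the total overlap of BOTH xlators before and after exchanging
 * their new ranges, so a gain for one is weighed against the loss for
 * the other (and a gain for one with no loss for the other is taken). */
static bool swap_is_beneficial(struct range x_old, struct range x_new,
                               struct range y_old, struct range y_new)
{
    uint64_t before = overlap(x_old, x_new) + overlap(y_old, y_new);
    uint64_t after  = overlap(x_old, y_new) + overlap(y_old, x_new);
    return after > before;
}
```

With the B/C ranges above, exchanging would lower the combined overlap from 0x77777778 to 0x33333333, so the swap is rejected.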

Also, the logic in dht_overlap_calc was completely broken before, causing
integer overflows instead of providing correct values, so no matter what
higher-level algorithm was in place, the garbage-in, garbage-out (GIGO)
effect would have resulted in bad decisions.

Change-Id: If61ed513cfcb931916c6b51da293e3efbaaf385f
BUG: 853258
Signed-off-by: Jeff Darcy 
Reviewed-on: http://review.gluster.org/3908
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati