glusterfs.git/xlators/cluster/afr/src/afr-common.c, branch v3.7.0beta1

afr : null dereference coverity fix.

2015-04-08T05:37:12+00:00

CID : 1194648

Change-Id: Ib26e7cdbf412d563240885fb3113bcc1fe5c9c49
BUG: 789278
Signed-off-by: Manikandan Selvaganesh 
Reviewed-on: http://review.gluster.org/9571
Tested-by: Gluster Build System 
Reviewed-by: Krutika Dhananjay 
Reviewed-by: Kaleb KEITHLEY

Avoid conflict between contrib/uuid and system uuid

2015-04-04T17:48:35+00:00

glusterfs relies on the Linux uuid implementation, whose
API is incompatible with most other systems' uuid. As
a result, libglusterfs has to embed contrib/uuid,
which is the Linux implementation, on non-Linux systems.
This implementation is incompatible with the system's
built-in one, but the symbols have the same names.

Usually this is not a problem because when we link
with -lglusterfs, libc's symbols are trumped. However,
there is a problem when a program not linked with
-lglusterfs dlopen()s a glusterfs component. In
such a case, libc's uuid implementation is already
loaded in the calling program, and it will be used
instead of libglusterfs's implementation, causing
crashes.

A possible workaround is to pre-load libglusterfs
in the calling program (using LD_PRELOAD on NetBSD for
instance), but such a mechanism is not portable, nor
is it flexible. A much better approach is to rename
libglusterfs's uuid_* functions to gf_uuid_* to avoid
any possible conflict. This is what this change attempts.
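
A minimal sketch of the renaming idea, not the actual libglusterfs code:
helpers carrying a gf_ prefix cannot be resolved to a uuid_* symbol that
libc already loaded in the calling program. The gf_uuid_t typedef and the
function bodies below are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical 16-byte UUID type, for illustration only. */
typedef unsigned char gf_uuid_t[16];

/* Prefixed names cannot collide with a libc uuid_* symbol that is
 * already loaded in a program that dlopen()s a glusterfs component. */
static inline void
gf_uuid_copy (gf_uuid_t dst, const gf_uuid_t src)
{
        memcpy (dst, src, sizeof (gf_uuid_t));
}

static inline int
gf_uuid_compare (const gf_uuid_t a, const gf_uuid_t b)
{
        return memcmp (a, b, sizeof (gf_uuid_t));
}

static inline int
gf_uuid_is_null (const gf_uuid_t u)
{
        static const gf_uuid_t zero = {0};

        return memcmp (u, zero, sizeof (gf_uuid_t)) == 0;
}
```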

BUG: 1206587
Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a
Signed-off-by: Emmanuel Dreyfus 
Reviewed-on: http://review.gluster.org/10017
Tested-by: Gluster Build System 
Reviewed-by: Niels de Vos

Tests: fix spurious failure in read-subvol-entry.t

2015-04-02T11:04:38+00:00

read-subvol-entry.t tests that if a brick has pending operations,
it is not used for readdir operations. On NetBSD this test exhibits
spurious failures, with the wrong brick being used to perform readdir.

It happens because when afr_replies_interpret() looks at xattrs for
pending attributes, it behaves differently depending on whether it is
working on a directory or another object. The decision is based on
inode->ia_type, which may be IA_INVAL at that time if we come there from:
  afr_replies_interpret()
  afr_xattrs_are_equal()
  afr_lookup_metadata_heal_check()
  afr_lookup_entry_heal()
  afr_lookup_cbk()

Using replies[i].poststat.ia_type, which is correctly set, works around
the problem.
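
The difference between the two sources of the file type can be sketched
as follows. The structs are trimmed-down, hypothetical stand-ins for the
glusterfs types, and the two helper functions are invented for this
illustration; only the field names ia_type and poststat come from the
description above.

```c
/* Local stand-ins for the glusterfs types, for illustration only. */
typedef enum { IA_INVAL = 0, IA_IFREG, IA_IFDIR } ia_type_t;

struct iatt_sketch  { ia_type_t ia_type; };
struct inode_sketch { ia_type_t ia_type; };
struct reply_sketch { struct iatt_sketch poststat; };

/* Before: the inode may not be linked yet during lookup, so its type
 * can still be IA_INVAL and a directory is not recognised as one. */
static int
is_dir_from_inode (const struct inode_sketch *inode)
{
        return inode->ia_type == IA_IFDIR;
}

/* After: the reply's poststat carries the type reported by the brick. */
static int
is_dir_from_reply (const struct reply_sketch *replies, int i)
{
        return replies[i].poststat.ia_type == IA_IFDIR;
}
```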

BUG: 1129939
Change-Id: Id9ccdd8604f79a69db5f1902697f8913acac50ad
Signed-off-by: Emmanuel Dreyfus 
Reviewed-on: http://review.gluster.org/9831
Tested-by: Gluster Build System 
Reviewed-by: Kaleb KEITHLEY 
Reviewed-by: Ravishankar N 
Reviewed-by: Vijay Bellur

features/bit-rot: Implementation of bit-rot xlator

2015-03-24T17:55:32+00:00

This is the "Signer" -- responsible for signing files with their
checksums upon last file descriptor close (last release()).
The event notification facility provided by the changelog xlator
is made use of.

Moreover, checksums are as of now the SHA256 hash of the object data,
which is the only available hash at this point in time. Therefore,
there is no special "what hash to use" type check, although it
would not take much to add other hashing algorithms to sign
objects with. Signatures are stored in extended attributes of the
objects along with the type of hashing used to calculate the
signature. This makes things future-proof when other hash types
are added. The signature infrastructure is provided by the bitrot
stub: a little piece of code that sits over the POSIX xlator,
providing interfaces to "get or set" an object's signature and its
staleness.
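
One way to picture a signature that records its own hash type is the
sketch below. The struct layout, the SIGN_T_SHA256 tag and both helpers
are hypothetical; this is not the on-disk xattr format used by the
bitrot stub, and the digest is supplied by the caller rather than
computed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIGN_T_SHA256 1   /* hypothetical tag for the only hash so far */
#define SHA256_LEN    32  /* a SHA256 digest is 32 bytes */

struct signature_sketch {
        uint8_t type;             /* which hash produced sum[] */
        uint8_t stale;            /* non-zero: object changed since signing */
        uint8_t sum[SHA256_LEN];
};

/* Store a digest together with its type; unknown types are rejected,
 * which is where support for another algorithm would be added. */
static int
signature_set (struct signature_sketch *s, uint8_t type,
               const uint8_t *sum, size_t len)
{
        if (type != SIGN_T_SHA256 || len != SHA256_LEN)
                return -1;
        s->type = type;
        s->stale = 0;
        memcpy (s->sum, sum, len);
        return 0;
}

/* A stale signature never matches, whatever the digest says. */
static int
signature_matches (const struct signature_sketch *s, uint8_t type,
                   const uint8_t *sum, size_t len)
{
        if (s->stale || s->type != type || len != SHA256_LEN)
                return 0;
        return memcmp (s->sum, sum, len) == 0;
}
```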

Since objects are signed upon receiving release() notification,
pre-existing data which is "never" modified would never be
signed. To counter this, an initial crawler thread is spawned.
The crawler scans the entire brick for objects that are unsigned
or "missed" signing due to the server going offline (node reboots,
crashes, etc.) and triggers an explicit sign. This also
signs objects when bit-rot is enabled for a volume and/or after
an upgrade.

Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b
BUG: 1170075
Original-Author: Raghavendra Bhat 
Signed-off-by: Venky Shankar 
Reviewed-on: http://review.gluster.org/9711
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

cluster/afr : enable inspection & resolution of files in split-brain

2015-03-19T13:33:12+00:00

Part 2/2 patch to enable users to analyze and resolve
split-brain.

This patch enables users to:
1) Inspect the files in data and metadata split-brain.
2) Resolve the split-brain.
Both are done using a series of setfattr commands.

Consider a volume "test" with 2 bricks.

1) To inspect a file f1:
setfattr -n replica.split-brain-choice -v test-client-0 f1
        After the execution of this command, if no read_subvol
is found, reads will be served from test-client-0 (corresponding
to brick-0).

2) To resolve split-brain :
setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1
        Execution of this command will lead to the resolution
of data and metadata split-brain with subvol mentioned in the
command (test-client-0 here) as the source and the rest as sink.
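
The value passed to both xattr keys names a client subvolume. The helper
below is a hypothetical sketch that builds such a name; the
"<volname>-client-<index>" pattern is an assumption generalised from the
"test-client-0 corresponds to brick-0" example above.

```c
#include <stdio.h>

/* xattr keys taken from the setfattr commands above. */
#define SBRAIN_CHOICE_KEY   "replica.split-brain-choice"
#define SBRAIN_FINALIZE_KEY "replica.split-brain-heal-finalize"

/* Build the client subvolume name for a brick index.
 * Returns 0 on success, -1 if the buffer is too small. */
static int
client_subvol_name (char *buf, size_t len, const char *volname, int brick_idx)
{
        int n = snprintf (buf, len, "%s-client-%d", volname, brick_idx);

        return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```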

Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179
BUG: 1191396
Signed-off-by: Anuradha 
Reviewed-on: http://review.gluster.org/9743
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System

afr: remove stale index entries

2015-03-17T16:14:44+00:00

Problem:
During pre-op phase, the index xlator
1. Creates the entry inside .glusterfs/indices/xattrop
2. Winds the xattrop fop to posix to mark dirty/pending changelogs.
If the brick crashes after 1, the xattrop entry becomes stale and never
gets removed by shd during subsequent crawls because there is nothing to
heal (changelogs are zero).

Though the stale entry does not get displayed in the output of 'heal
info' command, it nevertheless stays there forever unless a new write
transaction is performed on the file.

Fix:
During index self-heal, if afr xattrs are found to be clean (indicated by
a ret value of 2 on a call to afr_shd_selfheal()), send a dummy
post-op with all 0s for the xattr values, which makes the index xlator
unlink the stale entry.
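
The decision described in the fix can be sketched as below. Only the
return value 2 comes from the description; the enum, the macro name and
both helpers are hypothetical, and the real post-op is wound through the
xlator stack rather than built from a plain array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHD_XATTRS_CLEAN 2   /* value from the text; macro name is made up */

enum index_action_sketch { INDEX_LEAVE_ALONE, INDEX_SEND_DUMMY_POSTOP };

/* Map the self-heal return value to what happens to the index entry. */
static enum index_action_sketch
index_action_for (int selfheal_ret)
{
        return (selfheal_ret == SHD_XATTRS_CLEAN) ? INDEX_SEND_DUMMY_POSTOP
                                                  : INDEX_LEAVE_ALONE;
}

/* The dummy post-op carries all-zero xattr values. */
static void
fill_dummy_postop (int32_t *pending, size_t count)
{
        memset (pending, 0, count * sizeof (*pending));
}
```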

Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919
BUG: 1190069
Signed-off-by: Ravishankar N 
Reviewed-on: http://review.gluster.org/9714
Tested-by: Gluster Build System 
Reviewed-by: Anuradha Talur 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri

cluster/afr: Handle getxattr of quota-size key

2015-03-09T09:40:53+00:00

AFR needs to query QUOTA_SIZE_KEY from all the subvolumes and return the
maximum of the values reported by the readable bricks.
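
A minimal sketch of that selection, assuming the per-subvolume sizes and
readability flags have already been collected into arrays. The function
name and signature are hypothetical.

```c
#include <stdint.h>

/* Pick the largest size among readable children.
 * Returns 0 and stores the result in *out, or -1 if no child is readable. */
static int
max_readable_quota_size (const int64_t *sizes, const unsigned char *readable,
                         int child_count, int64_t *out)
{
        int found = 0;
        int i;

        for (i = 0; i < child_count; i++) {
                if (!readable[i])
                        continue;       /* ignore bricks we cannot read from */
                if (!found || sizes[i] > *out) {
                        *out = sizes[i];
                        found = 1;
                }
        }
        return found ? 0 : -1;
}
```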

Change-Id: Ibb9064c8652aea0d984796e7a06f8adca72aa971
BUG: 1199431
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/9820
Reviewed-by: Anuradha Talur 
Tested-by: Gluster Build System 
Reviewed-by: Krutika Dhananjay

cluster/afr: Implementation of quorum-reads

2015-03-06T05:56:20+00:00

Provide a way of disabling reads when quorum is not met.
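
In sketch form, the option gates reads on the quorum state. The function
and parameter names are hypothetical, and how quorum itself is computed
is deliberately left out.

```c
#include <stdbool.h>

/* Reads are refused only when the option is on and quorum is lost. */
static bool
read_allowed (bool quorum_reads_enabled, bool quorum_met)
{
        return !quorum_reads_enabled || quorum_met;
}
```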

Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5
BUG: 1187885
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/9543
Reviewed-by: Ravishankar N 
Tested-by: Gluster Build System

cluster/afr: Do not increment healed_count if no healing was performed

2015-03-05T00:37:56+00:00

PROBLEM:
When file modifications are happening while index heal is launched,
index healer could pick up entries which appeared in indices/xattrop
transiently during the course of the operations on the mount point, and
do not really need any heal. This will cause index healer to keep doing
index-heal in a loop as long as it finds this entry, because it believes
that it successfully healed some gfids even when it didn't.

FIX:
afr_selfheal() now returns a 1 to indicate that it did not (need to)
heal a given gfid. afr_shd_selfheal() will not increment healed_count
whenever afr_selfheal() returns a 1.
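
The accounting can be sketched as follows. The return value 1 comes from
the description above; treating 0 as "healed" and negative values as
failure is an assumption of this sketch, as is the helper name.

```c
/* Count a gfid as healed only when self-heal actually did the work. */
static void
account_heal_result (int selfheal_ret, int *healed_count)
{
        if (selfheal_ret == 0)
                (*healed_count)++;
        /* 1: nothing needed healing; negative: heal failed (assumed).
         * Neither case counts, so a crawl that healed nothing reports
         * healed_count == 0 and the healer does not loop again. */
}
```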

Change-Id: I0d97e11392a032a852e8c6508f691300ef0e5b98
BUG: 1194305
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/9713
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
Reviewed-by: Ravishankar N 
Tested-by: Gluster Build System

glfs_fini: Clean up all the resources allocated in glfs_new.

2015-03-04T15:15:12+00:00

Previously, even after calling glfs_fini(), the threads created
during init and many other resources like the memory pool, iobuf pool,
event pool and other memory allocations were not freed.
With this patch these resources are freed in glfs_fini().

The two rules of thumb followed in this patch are:
- The threads are not killed; they are made to exit voluntarily
once the queued tasks are completed. The main thread waits for
the other threads to exit.
- Free the memory pools and destroy the graphs only after all the
other threads are stopped, so that there is less chance of
hitting access after free.
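
The first rule can be sketched with plain pthreads: the worker drains its
queue, notices the destroy flag and returns on its own, while the main
thread joins it before freeing anything. The struct and function names
are hypothetical and the "task" is reduced to a counter; this is not the
syncenv, poller or timer code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct worker_sketch {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        int             queued;     /* tasks still waiting */
        int             completed;  /* tasks already run */
        bool            destroy;    /* set by the main thread on shutdown */
        pthread_t       thread;
};

static void *
worker_main (void *arg)
{
        struct worker_sketch *w = arg;

        pthread_mutex_lock (&w->lock);
        for (;;) {
                while (w->queued == 0 && !w->destroy)
                        pthread_cond_wait (&w->cond, &w->lock);
                if (w->queued == 0)
                        break;          /* destroy set and queue drained */
                w->queued--;
                w->completed++;         /* stand-in for running the task */
        }
        pthread_mutex_unlock (&w->lock);
        return NULL;                    /* voluntary exit, no cancellation */
}

static int
worker_start (struct worker_sketch *w, int queued)
{
        pthread_mutex_init (&w->lock, NULL);
        pthread_cond_init (&w->cond, NULL);
        w->queued = queued;
        w->completed = 0;
        w->destroy = false;
        return pthread_create (&w->thread, NULL, worker_main, w);
}

/* Ask the worker to finish, wait for it, and only then free resources. */
static int
worker_stop (struct worker_sketch *w)
{
        int ret;

        pthread_mutex_lock (&w->lock);
        w->destroy = true;
        pthread_cond_signal (&w->cond);
        pthread_mutex_unlock (&w->lock);

        ret = pthread_join (w->thread, NULL);
        pthread_cond_destroy (&w->cond);
        pthread_mutex_destroy (&w->lock);
        return ret;
}
```

Because the flag is only honoured once the queue is empty, tasks queued
before worker_stop() still run to completion.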

Resources freed, in order:
1. Destroy the inode table of all the graphs - Call forget on all the inodes.
This will not be required when the cleanup during graph switch is
implemented to perform inode table destroy.
2. Deactivate the current graph, call fini of all the xlators.
3. Syncenv destroy - Join the synctask threads and clean up syncenv resources.
Set the destroy mode, complete the existing synctasks, then join the
synctask threads.
After entering the destroy mode:
- if a new synctask is submitted, it fails.
- if syncenv_new() is called, it will end up creating new threads,
but this is called only during init.
4. Poller thread destroy
Register an event handler which sets the destroy mode for the poller.
Once the poller is done processing all the events, it exits.
5. Tear down the logging framework
The log file is closed and the log level is set to none; after this
point no log messages appear in either the log file or stderr.
6. Destroy the timer thread
Set the destroy bit; once the pending timer events are processed,
the timer thread exits.
Note: Log infrastructure should be shut down before destroying the timer
thread as gf_log uses timers.
7. Destroy the glusterfs_ctx_t
For all the graphs (active and passive), free the graph, xlator structs and a few other lists.
Free the memory pools - iobuf pool, event pool, dict, logbuf pool,
stub mem pool, stack mem pool, frame mem pool.
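
The ordering above can be written down as a table, which makes the
constraint "logging goes down before the timer thread" easy to check.
The enum names and the lookup helper are hypothetical; glfs_fini() does
not necessarily use such a table.

```c
#include <stddef.h>

enum fini_step_sketch {
        FINI_INODE_TABLES,      /* 1. forget all inodes of all graphs */
        FINI_GRAPH_DEACTIVATE,  /* 2. call fini of all the xlators */
        FINI_SYNCENV,           /* 3. join synctask threads */
        FINI_POLLER,            /* 4. poller exits after pending events */
        FINI_LOGGING,           /* 5. close log file, level set to none */
        FINI_TIMER,             /* 6. timer thread exits */
        FINI_CTX                /* 7. free graphs and memory pools */
};

/* Steps in the order they are listed above. */
static const enum fini_step_sketch fini_order[] = {
        FINI_INODE_TABLES, FINI_GRAPH_DEACTIVATE, FINI_SYNCENV,
        FINI_POLLER, FINI_LOGGING, FINI_TIMER, FINI_CTX,
};

#define FINI_STEP_COUNT (sizeof (fini_order) / sizeof (fini_order[0]))

/* Position of a step in the teardown sequence, or -1 if absent. */
static int
fini_step_position (enum fini_step_sketch step)
{
        size_t i;

        for (i = 0; i < FINI_STEP_COUNT; i++) {
                if (fini_order[i] == step)
                        return (int) i;
        }
        return -1;
}
```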

A few things not addressed in this patch:
1. The rpc_transport object is not destroyed; PARENT_DOWN should have
destroyed this object but has not. This needs to be addressed as part
of a different patch.
2. Each xlator fini should clean up the local pool allocated by its xlator.
Needs to be addressed as part of a different patch.
3. Each xlator should implement forget to free its inode_ctx.
Needs to be addressed as part of a different patch.
4. A few other leaks reported by valgrind.
5. fd and fd contexts.

The numbers:
The resource usage by the test case in this patch:
Without the fix, Memory: ~3GB; Threads: ~81
With this fix, Memory: 300MB; Threads: 1(main thread)

Change-Id: I96b9277541737aa8372b4e6c9eed380cb871e7c2
BUG: 1093594
Signed-off-by: Poornima G
Reviewed-on: http://review.gluster.org/7642
Tested-by: Gluster Build System
Reviewed-by: Rajesh Joseph
Reviewed-by: Raghavendra Talur
Reviewed-by: Krishnan Parthasarathi
Reviewed-by: Pranith Kumar Karampuri
Reviewed-by: Shyamsundar Ranganathan