glusterfs.git/xlators/mgmt/glusterd/src/glusterd-rebalance.c, branch v3.4.2qa2

glusterd: big lock - a coarse-grained locking to prevent races

2013-04-17T12:48:50+00:00

There are primarily three lists in the glusterd process that are
accessed concurrently: priv->volumes, priv->peers and
volinfo->bricks_list.

Big-lock approach
-----------------
WHAT IS IT?
The big lock is a coarse-grained lock that protects all three
lists mentioned above from racy access.

HOW DOES IT WORK?
At any given point in time, glusterd's thread(s) are in execution
_iff_ there is a preceding, inbound network event. Of course, the
sigwaiter thread and timer thread are exceptions.
A network event is an external trigger to glusterd, via the epoll
thread, in the form of POLLIN and POLLERR.
As long as we take the big-lock at all such entry points and yield
it when we are done, we are guaranteed that all the network events,
accessing the global lists, are serialised.

This amounts to holding the big lock at
- all the handlers of all the actors in glusterd. (POLLIN)
- all the cbks in glusterd. (POLLIN)
- rpc_notify (DISCONNECT event), if we access/modify
  one of the three lists. (POLLERR)
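A minimal sketch of the entry-point rule above, using a plain pthread
mutex as the big lock. The counters standing in for the three lists and
the handler names (handle_volume_create, peer_probe_cbk,
rpc_notify_disconnect) are hypothetical; glusterd's own lock and
handlers are not reproduced here.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the shared lists (priv->volumes,
 * priv->peers, volinfo->bricks_list); plain counters here. */
static int volume_count;
static int peer_count;

/* The one coarse-grained lock that guards all of them. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* POLLIN entry point (an actor handler): lock on entry, unlock on exit. */
static int handle_volume_create(void)
{
    pthread_mutex_lock(&big_lock);
    volume_count++;
    pthread_mutex_unlock(&big_lock);
    return 0;
}

/* POLLIN entry point (a cbk): a peer probe reply adds a peer. */
static int peer_probe_cbk(void)
{
    pthread_mutex_lock(&big_lock);
    peer_count++;
    pthread_mutex_unlock(&big_lock);
    return 0;
}

/* POLLERR entry point (rpc_notify on DISCONNECT): it modifies a shared
 * list, so it takes the big lock too. Every return path releases it. */
static int rpc_notify_disconnect(void)
{
    int ret = -1;

    pthread_mutex_lock(&big_lock);
    if (peer_count > 0) {
        peer_count--;
        ret = 0;
    }
    assert(peer_count >= 0);
    pthread_mutex_unlock(&big_lock);
    return ret;
}
```

Since every entry point goes through the same mutex, two epoll events
touching the lists can never interleave; the cost is that unrelated
operations are serialised as well.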

For synctask'ized volume operations, remember that if we held the big
lock for the entire duration of the handler, we could block other
non-synctask RPC actors from executing. For example, volume-start would
block in PMAP SIGNIN if done incorrectly. To prevent this, we must yield
the big lock when we yield the synctask, and reacquire it when the
synctask wakes up.
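The yield/reacquire rule can be illustrated with a pthread condition
variable, assuming a simplified model in which the synctask is an
ordinary thread. glusterd's synctasks are not pthreads, so this is an
analogy rather than its actual mechanism; volume_start_synctask,
pmap_signin_actor and brick_signed_in are hypothetical names.
pthread_cond_wait() releases the big lock while sleeping and reacquires
it before returning, which is the behaviour described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t signin_cond = PTHREAD_COND_INITIALIZER;
static bool brick_signed_in;

/* Hypothetical PMAP SIGNIN actor: a non-synctask handler that also
 * needs the big lock before it can record the sign-in. */
static void *pmap_signin_actor(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    brick_signed_in = true;
    pthread_cond_signal(&signin_cond);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Hypothetical volume-start that must wait for the brick to sign in.
 * If it slept while still holding big_lock, the SIGNIN actor could never
 * run. pthread_cond_wait() drops big_lock while sleeping and takes it
 * back on wake-up. */
static int volume_start_synctask(void)
{
    pthread_t signin;

    pthread_mutex_lock(&big_lock);
    if (pthread_create(&signin, NULL, pmap_signin_actor, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return -1;
    }
    while (!brick_signed_in)
        pthread_cond_wait(&signin_cond, &big_lock); /* yield + reacquire */
    assert(brick_signed_in);
    pthread_mutex_unlock(&big_lock);
    pthread_join(signin, NULL);
    return 0;
}
```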

BUG: 948686
Change-Id: I429832f1fed67bcac0813403d58346558a403ce9
Signed-off-by: Krishnan Parthasarathi 
Reviewed-on: http://review.gluster.org/4835
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

glusterd: Fix check for task-id existence in 'volume status'

2013-03-05T09:05:59+00:00

This fixes the randomly failing task-id tests. The condition used to check
whether rebalance/remove-brick was running was wrong, which could cause the
task-id for these tasks not to be displayed even when the actual commit hadn't
occurred.

BUG: 857330
Change-Id: I0f86c6bbe7acec586ee0ea6e663369ea26171904
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/4617
Reviewed-by: Jeff Darcy 
Tested-by: Gluster Build System

glusterd: do dict unref after sending reply to cli

2013-02-03T20:35:09+00:00

This patch channels the unrefs of dictionaries created from the cli
request during volume ops into one common function, glusterd_to_cli(),
which is guaranteed to be called irrespective of whether the command
succeeds or fails.

This patch also removes extra unrefs in a few places.
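A sketch of the single-cleanup-point pattern, with a hypothetical
refcounted struct req_dict in place of GlusterFS's dict_t and a
hypothetical reply_to_cli() in place of glusterd_to_cli(). Every path
through the handler ends in the reply function, which always drops the
request dict, so no handler needs an unref of its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal refcounted dictionary (not GlusterFS's dict_t). */
struct req_dict {
    int refcount;
};

static int live_dicts; /* dicts allocated but not yet freed */

static struct req_dict *req_dict_new(void)
{
    struct req_dict *d = calloc(1, sizeof(*d));
    if (d) {
        d->refcount = 1;
        live_dicts++;
    }
    return d;
}

static void req_dict_unref(struct req_dict *d)
{
    if (!d)
        return;
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        live_dicts--;
    }
}

/* Single exit point: send the reply, then drop the request dict,
 * whether op_ret reports success or failure. */
static int reply_to_cli(int op_ret, const char *op_errstr,
                        struct req_dict *dict)
{
    printf("op_ret=%d errstr=%s\n", op_ret, op_errstr ? op_errstr : "");
    req_dict_unref(dict);
    return op_ret;
}

/* Handler: no unrefs of its own; every path funnels into reply_to_cli(). */
static int handle_volume_op(int should_fail)
{
    struct req_dict *dict = req_dict_new();

    if (!dict)
        return reply_to_cli(-1, "out of memory", NULL);
    if (should_fail)
        return reply_to_cli(-1, "validation failed", dict);
    return reply_to_cli(0, NULL, dict);
}
```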

Change-Id: Ic8ba7166387b5dfd1f5ae860539e1b7093a94662
BUG: 861044
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/4003
Tested-by: Gluster Build System 
Reviewed-by: Amar Tumballi 
Reviewed-by: Anand Avati

glusterd: Add GF_ASSERT check in glusterd volume op handlers

2013-01-09T07:58:41+00:00

Change-Id: Iea6ac1e612812ba8ffc4b60899a9e574a3b09ea6
BUG: 873549
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/4346
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Vijay Bellur 
Tested-by: Vijay Bellur

glusterd, cli: Task id's for async tasks

2012-12-19T21:32:49+00:00

This patch introduces task-ids for async tasks like rebalance, remove-brick and
replace-brick. An id is generated for each task when it is started and displayed
to the user in the cli output. The status of running tasks is also included in the
output of "volume status" along with their ids, so that a user can easily track
the progress of an async task.

Also,
 * added tests for this feature into the regression test suite.
 * added a python script for creating files, 'create-files.py', courtesy
   Vijaykumar Koppad (vkoppad@redhat.com) into the test suite.

This patch reverts the revert commit 698deb33d731df6de84da8ae8ee4045e1543a168.
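A minimal sketch of assigning an id when an async task starts, assuming
the id is a random (version 4) UUID string. struct async_task,
gen_task_id() and the printed format are hypothetical, not glusterd's
real structures or CLI output, and rand() is used only to keep the
example self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical task record. */
struct async_task {
    char id[37];        /* 36-char UUID string + NUL */
    const char *type;   /* e.g. "rebalance", "remove-brick" */
    const char *status;
};

/* Build a version-4 UUID string from 16 random bytes. rand() is NOT a
 * good entropy source; it only keeps the sketch dependency-free. */
static void gen_task_id(char out[37])
{
    unsigned char b[16];

    for (int i = 0; i < 16; i++)
        b[i] = (unsigned char)(rand() & 0xff);
    b[6] = (unsigned char)((b[6] & 0x0f) | 0x40); /* version 4 */
    b[8] = (unsigned char)((b[8] & 0x3f) | 0x80); /* RFC 4122 variant */
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
    assert(strlen(out) == 36);
}

/* Start a task: generate its id and report it, as the cli would. */
static void task_start(struct async_task *t, const char *type)
{
    gen_task_id(t->id);
    t->type = type;
    t->status = "in progress";
    printf("%s started, ID: %s\n", type, t->id);
}

/* Illustrative "volume status" task section. */
static void print_task_status(const struct async_task *t)
{
    printf("Task: %s  ID: %s  Status: %s\n", t->type, t->id, t->status);
}
```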

BUG: 857330
Change-Id: Id43d7cb629a38f47f733fbc18cb4c5f2f0327c7a
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/4294
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/dht: Add "afr.readdir-failover=off" option to the rebalance process

2012-12-17T03:45:53+00:00

When readdir fails over (the default behaviour), rebalance could see duplicate
files, as readdir re-reads from offset 0. Rebalance should not attempt to
migrate these files again.

Additionally, we need to treat these cases as failures in the rebalance crawl.
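The failure mode can be sketched as a crawl that refuses to continue once
readdir offsets go backwards, as they would after a failover restarted
the listing from offset 0. This is a hypothetical illustration (struct
entry and crawl_dir() are invented names); the patch itself avoids the
situation by disabling afr.readdir-failover for the rebalance process.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical directory entry as returned by readdir, in order. */
struct entry {
    long offset;        /* readdir offset of this entry */
    const char *name;
};

/* Crawl one directory's entries. An offset that does not advance means
 * the listing restarted, so stop and report failure instead of
 * migrating files a second time. */
static int crawl_dir(const struct entry *entries, int n, int *migrated)
{
    long last_off = -1;

    assert(entries != NULL || n == 0);
    *migrated = 0;
    for (int i = 0; i < n; i++) {
        if (entries[i].offset <= last_off) {
            fprintf(stderr, "readdir restarted at %s; failing crawl\n",
                    entries[i].name);
            return -1;
        }
        last_off = entries[i].offset;
        (*migrated)++; /* stand-in for migrating entries[i].name */
    }
    return 0;
}
```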

No test case provided, as we cannot determine the read child in afr.

Change-Id: If07508b4f92dacc17e0f695b48a866c7c66004be
BUG: 859387
Signed-off-by: shishir gowda 
Reviewed-on: http://review.gluster.org/4300
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

glusterd: log appropriate message when locking fails

2012-12-12T00:06:08+00:00

PROBLEM:

When a transaction is already in progress and the user tries to
execute another glusterd operation, the second operation fails because
glusterd cannot acquire the lock. But a message like "Operation failed"
does not tell the user why the operation failed.

FIX:

Made glusterd_op_txn_begin use and initialise an error string, which is
needed to capture failures in the "lock" phase.

Also made gd_sync_task_begin set the error string appropriately when
locking fails.

In the process, I had to introduce an error string in some
glusterd_handle_* functions. Having introduced it in these handlers, I
also set it in the other places where these handlers could fail.
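A sketch of the idea behind the fix, assuming a local pthread mutex in
place of glusterd's cluster-wide lock: the txn-begin step uses
pthread_mutex_trylock() and, when the lock is busy, records the specific
reason so the reply can carry it instead of a generic "Operation
failed". op_txn_begin(), op_txn_end() and cli_message() are hypothetical
names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the cluster-wide glusterd lock. */
static pthread_mutex_t cluster_lock = PTHREAD_MUTEX_INITIALIZER;

/* On lock failure, write a specific reason into errbuf. */
static int op_txn_begin(char *errbuf, size_t len)
{
    assert(errbuf != NULL && len > 0);
    errbuf[0] = '\0';
    if (pthread_mutex_trylock(&cluster_lock) != 0) {
        snprintf(errbuf, len, "%s",
                 "Another transaction is in progress. "
                 "Please try again after sometime.");
        return -1;
    }
    return 0;
}

static void op_txn_end(void)
{
    pthread_mutex_unlock(&cluster_lock);
}

/* Message for the cli: the recorded reason if there is one, otherwise
 * the generic fallback. */
static const char *cli_message(int ret, const char *errbuf)
{
    if (ret == 0)
        return "success";
    return strlen(errbuf) > 0 ? errbuf : "Operation failed";
}
```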

HOW I TESTED IT:

For want of a better idea, I "commented out" the call to
"glusterd_unlock", recompiled glusterd and ran two glusterd volume
operations, one after the other. The second operation fails with the
message "Another transaction is in progress. Please try again after
sometime." as expected.

The tests were performed on two volume ops: one synctask'ized
(volume start) and one NOT (volume create).

Change-Id: Ia862972929872ae2f053707a544824d9cadc37be
BUG: 873549
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/4197
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

Fix xdr_to_generic success check

2012-12-10T05:59:56+00:00

This patch fixes the success check for the xdr_to_generic function across the
codebase.

It also cleans up the brick_op actors table in glusterfsd-mgmt.c so that the
actors are called directly by rpcsvc.

Change-Id: I3086585f30c44f69f1bc83665f89e30025f76d3a
BUG: 884452
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/4278
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

Revert "glusterd, cli: Task id's for async tasks"

2012-12-04T23:59:52+00:00

This reverts commit ed15521d4e5af2b52b78fd33711e7562f5273bc6

Strangely, the test scripts "silently" pass even on failures. Reverting the patch for now.

Change-Id: I802ec1634c7863dc373cc7dc4a47bd4baa72764e
Reviewed-on: http://review.gluster.org/4267
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

glusterd, cli: Task id's for async tasks

2012-12-04T22:44:36+00:00

This patch introduces task-ids for async tasks like rebalance, remove-brick and
replace-brick. An id is generated for each task when it is started and displayed
to the user in the cli output. The status of running tasks is also included in the
output of "volume status" along with their ids, so that a user can easily track
the progress of an async task.

Also,
 * added tests for this feature into the regression test suite.
 * added a python script for creating files, 'create-files.py', courtesy
   Vijaykumar Koppad (vkoppad@redhat.com) into the test suite.

Change-Id: Ib0c0d12e0d6c8f72ace48d303d7ff3102157e876
BUG: 857330
Signed-off-by: Kaushal M 
Reviewed-on: http://review.gluster.org/3942
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati