| author | Poornima G <pgurusid@redhat.com> | 2016-06-06 06:29:40 -0400 | 
|---|---|---|
| committer | Jeff Darcy <jdarcy@redhat.com> | 2016-06-16 04:57:42 -0700 | 
| commit | b8ac20e888fbacad9d90cd8f1c6ff8579a5cefe9 (patch) | |
| tree | f7befa6b0e065afb87d9876f731e963d065acd40 /api | |
| parent | c04df79dc453ef5cb7b3a0ca8ba14598da6189ac (diff) | |
gfapi: Fix IO error caused when there are consecutive graph switches
Issue:
Consider a simple situation where glfs_init() is done, i.e. the initial
graph is up. Now perform 2 volume sets that result in 2 client-side
graph changes. After this, perform some IO; the IO fails with ENOTCONN.
The only way to recover this client is, I guess, another graph switch
or a restart.
What is actually happening from the code perspective:
Initial graph, let's say A, followed by 2 consecutive graph switches
to B and C without any IO between those two switches.
- graph_setup (A) as a result of GF_EVENT_CHILD_UP, and
fs->next_subvol = A
- glfs_init() results in fs->active_subvol = A, fs->next_subvol = NULL
- graph_setup (B) as a result of GF_EVENT_CHILD_UP, and
fs->next_subvol = B
- graph_setup (C) as a result of GF_EVENT_CHILD_UP, and
fs->next_subvol = C. It also sees that the previous graph B was never
set as fs->active_subvol, i.e. no IO or anything happened on B, so
it can safely send GF_EVENT_PARENT_DOWN (by calling glfs_subvol_done(B)).
This parent down on B results in child_down(B), which is fine.
But child_down also triggers graph_setup(B).
- graph_setup(B) as a result of GF_EVENT_CHILD_DOWN, and
fs->next_subvol = B, and GF_EVENT_PARENT_DOWN on C as explained
above. This again leads to GF_EVENT_CHILD_DOWN on C.
- graph_setup(C) as a result of GF_EVENT_CHILD_DOWN, and
fs->next_subvol = C, and GF_EVENT_PARENT_DOWN on B as explained
above.
Thus both graphs B and C are disconnected, hence the ENOTCONN.
Solution:
Remove the call to graph_setup() when the event is GF_EVENT_CHILD_DOWN.
I don't see any reason why graph_setup should be called when there is a
child_down. Not sure what the original reason was for having graph_setup
in child_down. git history shows the first patch itself had this call.
Change-Id: I9de86555f66cc94a05649ac863b40ed3426ffd4b
BUG: 1343038
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/14656
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Diffstat (limited to 'api')
| -rw-r--r-- | api/src/glfs-master.c | 1 | 
1 files changed, 0 insertions, 1 deletions
diff --git a/api/src/glfs-master.c b/api/src/glfs-master.c
index ff8f68f452b..9f11a6a0c9c 100644
--- a/api/src/glfs-master.c
+++ b/api/src/glfs-master.c
@@ -105,7 +105,6 @@ notify (xlator_t *this, int event, void *data, ...)
                         pthread_cond_broadcast (&fs->child_down_cond);
                 }
                 pthread_mutex_unlock (&fs->mutex);
-		graph_setup (fs, graph);
 		glfs_init_done (fs, 1);
 		break;
 	case GF_EVENT_CHILD_CONNECTING:
