author | Anand Avati <avati@gluster.com> | 2010-11-09 05:27:02 +0000 |
---|---|---|
committer | Anand V. Avati <avati@dev.gluster.com> | 2010-11-09 03:07:07 -0800 |
commit | 6fb49f18a9bbfd1266b4773e757e459519c6719c (patch) | |
tree | fff8ff41717114ead7a7e2b848e83058d6d8b15a /xlators/cluster/afr/src/afr.h | |
parent | 667c5e22467cbecd371bfc052e7f65b6b6b41e2d (diff) |
replicate: optimistic changelog
The standard way of maintaining the changelog in replicate has been to
write out pending flags before the operation (pre-op) and to unset the
pending flags after the actual operation (post-op).
This new optimization kicks in only when all subvolumes are up.
The optimization is that, during pre-op, no changelog is written for
METADATA and ENTRY/RENAME operations. If nothing failed during the
operation, no changelog is updated in post-op either. If, however,
something does fail during an operation, pending flags get written
during post-op, pointing only towards the failed nodes.
DATA transactions continue to work the way they are.
If one subvolume is down, pending flags are written in pre-op changelog
itself as before.
The impact of this optimization is only in the case when both servers
die or the client dies while the 'FOP' stage of the transaction is
in progress. By nature of METADATA and ENTRY operations, detecting a
mismatch later is not dependent on the presence of changelog. Changelog
only determines the direction in which self-heal happens for these types
of transactions. For the direction too, this optimization does not have
a major impact, because in the cases of failure (both servers dying or
the client dying) the final state (direction of self-heal) would be
arbitrary anyway, as the syscall wouldn't have completed.
Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com>
Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
BUG: 2068 (performance enhancements)
URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2068
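The pre-op/post-op rules in the commit message above can be sketched as two small predicates. This is only an illustrative model, not code from the patch: the names txn_type_t, preop_writes_changelog and postop_writes_changelog are hypothetical, and the sketch assumes that "optimistic" is true only when the option is enabled and all subvolumes were up when the transaction started.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction types, named after the commit message. */
typedef enum { TXN_DATA, TXN_METADATA, TXN_ENTRY } txn_type_t;

/* Pre-op: is the changelog (pending flags) written before the operation?
 * DATA transactions, and transactions that are not optimistic (for
 * example, a subvolume was down), keep the old behaviour. */
static bool
preop_writes_changelog (bool optimistic, txn_type_t type)
{
        return !optimistic || type == TXN_DATA;
}

/* Post-op: is the changelog touched after the operation?
 * In the old path the pending flags set in pre-op must be cleared.
 * In the optimistic path it is written only if some child failed, and
 * then only to point at the failed nodes. */
static bool
postop_writes_changelog (bool optimistic, txn_type_t type,
                         int failed_children)
{
        assert (failed_children >= 0);

        if (!optimistic || type == TXN_DATA)
                return true;

        return failed_children > 0;
}
```

Under these assumptions, a successful METADATA or ENTRY transaction with all subvolumes up performs no changelog writes at all, which is where the saving comes from.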
Diffstat (limited to 'xlators/cluster/afr/src/afr.h')
-rw-r--r-- | xlators/cluster/afr/src/afr.h | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/xlators/cluster/afr/src/afr.h b/xlators/cluster/afr/src/afr.h
index 758ac789aff..a7359f26963 100644
--- a/xlators/cluster/afr/src/afr.h
+++ b/xlators/cluster/afr/src/afr.h
@@ -88,6 +88,7 @@ typedef struct _afr_private {
 
         pthread_mutex_t  mutex;
         struct list_head saved_fds;   /* list of fds on which locks have succeeded */
+        gf_boolean_t     optimistic_change_log;
 } afr_private_t;
 
 typedef struct {
@@ -312,6 +313,7 @@ typedef struct _afr_local {
         int32_t lock_recovery_child;
 
         dict_t  *dict;
+        int      optimistic_change_log;
 
         int (*openfd_flush_cbk) (call_frame_t *frame, xlator_t *this);
 
@@ -805,6 +807,8 @@ AFR_BASENAME (const char *str)
 static inline int
 AFR_LOCAL_INIT (afr_local_t *local, afr_private_t *priv)
 {
+        int child_up_count = 0;
+
         local->child_up = GF_CALLOC (sizeof (*local->child_up),
                                      priv->child_count,
                                      gf_afr_mt_char);
@@ -815,6 +819,10 @@ AFR_LOCAL_INIT (afr_local_t *local, afr_private_t *priv)
 
         memcpy (local->child_up, priv->child_up,
                 sizeof (*local->child_up) * priv->child_count);
+        child_up_count = afr_up_children_count (priv->child_count, local->child_up);
+
+        if (priv->optimistic_change_log && child_up_count == priv->child_count)
+                local->optimistic_change_log = 1;
 
         local->call_count = afr_up_children_count (priv->child_count, local->child_up);
         if (local->call_count == 0)
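The gating added to AFR_LOCAL_INIT in the last hunk can be tried outside the GlusterFS tree with a small stand-alone model. This is a sketch under stated assumptions: up_children_count is a hypothetical stand-in for afr_up_children_count (assumed here to count the non-zero entries of the child_up array), the array element type is assumed to be unsigned char, and use_optimistic_change_log is an illustrative name that does not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for afr_up_children_count(): counts the
 * subvolumes whose child_up entry is non-zero (assumed behaviour). */
static int
up_children_count (int child_count, const unsigned char *child_up)
{
        int i  = 0;
        int up = 0;

        assert (child_up != NULL);

        for (i = 0; i < child_count; i++)
                if (child_up[i])
                        up++;

        return up;
}

/* Mirrors the condition in the hunk: the per-transaction flag is set
 * only if the option is enabled AND every subvolume is up. */
static bool
use_optimistic_change_log (bool option_enabled, int child_count,
                           const unsigned char *child_up)
{
        return option_enabled &&
               up_children_count (child_count, child_up) == child_count;
}
```

With two subvolumes, the flag would be set for a child_up array of {1, 1} and left unset for {1, 0}, matching the commit message's statement that the optimization applies only when all subvolumes are up.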