author Richard Wareing <rwareing@fb.com> 2016-04-01 20:42:50 -0700
committer Kevin Vigor <kvigor@fb.com> 2016-12-29 11:15:39 -0800
commit 02f8b7300bc635dea9ae1fee6ef14c0d4725591a
tree 3d1b55756f09cbcd6763d87ad1c26ec3b7a5ddf9
parent 141e879d83da3e6fa2f3891d8aa1c7895017c401
cluster/afr: Hybrid mounts must honor _marked_ up/down states
Summary:
- Qsorted latencies must also take into account that a host can be marked down yet still have a (slightly) better latency. This can happen if max-replicas is reached, and there are multiple bricks which meet the halo threshold. In such a case we must prefer the brick which is marked up albeit with a perhaps slightly higher latency. It's not worth doing any sort of swap in this case as it's going to be jarring to the cluster and introduce flapping behavior.

Test Plan:
- Halo prove tests @ https://phabricator.fb.com/P56251020

Reviewers: sshreyas, kvigor
Reviewed By: kvigor
Blame Revision:

Change-Id: I858bbf44638a4978093dc56d4a98c96be4f8b45e
Change-Id: Ie3b79700ec63398bc30e7bcf75fb05931dd5d0d0
FB-commit-id: deb2f35db6f8b979857b4e3779b2a41c59a2e416
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16307
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
-rw-r--r-- xlators/cluster/afr/src/afr-common.c 23
1 files changed, 23 insertions, 0 deletions
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index 0c621271405..d5002a2070b 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -4293,6 +4293,17 @@ __get_heard_from_all_status (xlator_t *this)
*
* Passed to the qsort function to order a list of children by the latency
* and/or up/down states.
+ *
+ * Note: This isn't as simple as taking the latencies and calling it a
+ * day. Children can be marked down, which overrides their latency
+ * signal. Having a lower-latency child available doesn't guarantee this
+ * child shall be marked up: we don't want to constantly be swapping
+ * slightly better bricks for others...this is jarring to clients and
+ * could cause all sorts of issues. Plus, the fail-over, max-replicas
+ * flags must all be honored which manage the up/down state of children.
+ *
+ * In short, the (as marked) up/down state of the brick shall always
+ * take precedence when sorting by latency.
*/
static int
_afr_cmp_child (const void *child1, const void *child2)
@@ -4300,6 +4311,18 @@ _afr_cmp_child (const void *child1, const void *child2)
struct afr_child *child11 = (struct afr_child *)child1;
struct afr_child *child22 = (struct afr_child *)child2;
+ /* If both children are _marked_ down they are equal */
+ if (!child11->child_up && !child22->child_up)
+ return 0;
+
+ /* Prefer child 2, child 1 is _marked_ down, child 2 is not */
+ if (!child11->child_up && child22->child_up)
+ return 1;
+
+ /* Prefer child 1, child 2 is _marked_ down, child 1 is not */
+ if (child11->child_up && !child22->child_up)
+ return -1;
+
if (child11->latency > child22->latency) {
return 1;
}
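
The ordering this patch introduces can be tried in isolation with a small standalone sketch. It is not the AFR code: `struct demo_child`, `demo_cmp_child` and `demo_sort_children` are hypothetical names, the `long` latency type is an assumption, and because the hunk above is cut off after the first latency comparison, the "lower latency" and "equal latency" branches are assumed to be the obvious counterparts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct afr_child: only the two fields the
 * comparator reads are modelled. The latency type is an assumption. */
struct demo_child {
    int  child_up;  /* non-zero when the child is _marked_ up */
    long latency;   /* measured latency; lower is better */
};

/* Same precedence as the patch: marked up/down state first, latency second. */
static int
demo_cmp_child(const void *p1, const void *p2)
{
    const struct demo_child *c1 = p1;
    const struct demo_child *c2 = p2;

    /* Two children that are both marked down compare equal. */
    if (!c1->child_up && !c2->child_up)
        return 0;

    /* A child marked down sorts after one marked up, whatever its latency. */
    if (!c1->child_up)
        return 1;
    if (!c2->child_up)
        return -1;

    /* Both marked up: fall back to latency. The last two branches are
     * assumed, since the diff is truncated before them. */
    if (c1->latency > c2->latency)
        return 1;
    if (c1->latency < c2->latency)
        return -1;
    return 0;
}

/* Sort an array of children in place: marked-up children first (by
 * ascending latency), marked-down children last. */
static void
demo_sort_children(struct demo_child *children, size_t n)
{
    assert(children != NULL || n == 0);
    if (n > 1)
        qsort(children, n, sizeof(children[0]), demo_cmp_child);
}
```

Sorting the three children `{down, latency 1}`, `{up, latency 9}`, `{up, latency 4}` with `demo_sort_children` places the two marked-up children first in latency order (4, then 9) and the marked-down child last, even though it has the lowest latency of the three.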