glusterd: socketfile & pidfile related fixes for brick multiplexing feature

Problem: With brick multiplexing enabled, the CLI does not show the PIDs of all brick processes in all volumes after glusterd is restarted.

Solution: With brick-mux enabled, all local brick processes communicate through a single UNIX socket. The current code (glusterd_brick_start), however, tries to communicate over a separate UNIX socket for each volume, with the socket path derived from the brick name and volume name. Because the multiplexing design opens only one UNIX socket, these attempts raise poller errors and the CLI cannot fetch the correct status of the brick processes. To resolve the problem, add a new function, glusterd_set_socket_filepath_for_mux, which is called by glusterd_brick_start to validate that the socket path exists. To avoid continuous EPOLLERR errors in the logs, also update the socket_connect code.

Test: To reproduce the issue, follow these steps:
1) Create two distributed volumes (dist1 and dist2)
2) Set cluster.brick-multiplex to on
3) Kill glusterd
4) Run the command gluster v status

After applying the patch, the correct PID is shown for all volumes.

BUG: 1444596
Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://review.gluster.org/17101
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
author: Mohit Agrawal <moagrawa@redhat.com> 2017-05-08 19:29:22 +0530
committer: Atin Mukherjee <amukherj@redhat.com> 2017-05-09 01:30:01 +0000
commit: 21c7f7baccfaf644805e63682e5a7d2a9864a1e6 (patch)
tree: 01bbbd50d13f609eb8f7d2cbe2ce5e3af1652e42 /xlators/storage
parent: 18e07cf01f975c80152e5469fb4e4274f08dc636 (diff)
1 files changed, 27 insertions, 9 deletions
diff --git a/xlators/storage/posix/src/posix-helpers.c b/xlators/storage/posix/src/posix-helpers.c
index ae07b28e48a..ca2c2b923d3 100644
--- a/xlators/storage/posix/src/posix-helpers.c
+++ b/xlators/storage/posix/src/posix-helpers.c
@@ -51,6 +51,7 @@
 #include "hashfn.h"
 #include "glusterfs-acl.h"
 #include "events.h"
+#include <sys/types.h>
 
 char *marker_xattrs[] = {"trusted.glusterfs.quota.*",
                          "trusted.glusterfs.*.xtime",
@@ -1829,6 +1830,9 @@ posix_health_check_thread_proc (void *data)
         struct posix_private *priv               = NULL;
         uint32_t              interval           = 0;
         int                   ret                = -1;
+        xlator_t                *top             = NULL;
+        xlator_list_t           **trav_p         = NULL;
+        int                     count            = 0;
 
         this = data;
         priv = this->private;
@@ -1840,7 +1844,6 @@ posix_health_check_thread_proc (void *data)
 
         gf_msg_debug (this->name, 0, "health-check thread started, "
                 "interval = %d seconds", interval);
-
         while (1) {
                 /* aborting sleep() is a request to exit this thread, sleep()
                  * will normally not return when cancelled */
@@ -1877,18 +1880,33 @@ abort:
 
         xlator_notify (this->parents->xlator, GF_EVENT_CHILD_DOWN, this);
 
-        ret = sleep (30);
-        if (ret == 0) {
+        /* Below code is use to ensure if brick multiplexing is enabled if
+           count is more than 1 it means brick mux has enabled
+        */
+        if (this->ctx->active) {
+                top = this->ctx->active->first;
+                for (trav_p = &top->children; *trav_p;
+                                               trav_p = &(*trav_p)->next) {
+                        count++;
+                }
+        }
+
+        if (count == 1) {
                 gf_msg (this->name, GF_LOG_EMERG, 0, P_MSG_HEALTHCHECK_FAILED,
                         "still alive! -> SIGTERM");
-                kill (getpid(), SIGTERM);
-        }
+                ret = sleep (30);
 
-        ret = sleep (30);
-        if (ret == 0) {
+                /* Need to kill the process only while brick mux has not enabled
+                */
+                if (ret == 0)
+                        kill (getpid(), SIGTERM);
+
+                ret = sleep (30);
                 gf_msg (this->name, GF_LOG_EMERG, 0, P_MSG_HEALTHCHECK_FAILED,
-                        "still alive! -> SIGKILL");
-                kill (getpid(), SIGKILL);
+                        "still alive! -> SIGTERM");
+                if (ret == 0)
+                        kill (getpid(), SIGTERM);
+
         }
 
         return NULL;
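
The hunk above decides whether brick multiplexing is active by counting the children of the first translator in the active graph. The following is a minimal, standalone sketch of that counting idea only. The types `struct node` and `struct child_list` and the helpers `count_children` and `is_multiplexed` are hypothetical stand-ins invented for illustration, not GlusterFS definitions; the "more than one child means brick mux is enabled" rule is taken from the comment in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for xlator_t / xlator_list_t; only the fields
 * needed by the counting loop are modelled here. */
struct node;

struct child_list {
        struct node       *xlator;
        struct child_list *next;
};

struct node {
        const char        *name;
        struct child_list *children;
};

/* Count the entries in top's child list; a NULL top counts as zero. */
static int
count_children (const struct node *top)
{
        int                      count = 0;
        const struct child_list *trav  = NULL;

        if (!top)
                return 0;

        for (trav = top->children; trav; trav = trav->next)
                count++;

        return count;
}

/* Assumption taken from the patch comment: more than one child under
 * the top translator means brick multiplexing is in effect. */
static int
is_multiplexed (const struct node *top)
{
        return count_children (top) > 1;
}
```

Unlike the loop in the patch, this sketch checks `top` for NULL before walking the list; that guard is an addition made here for safety and is not something the commit itself does.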