author | Kaleb S KEITHLEY <kkeithle@redhat.com> | 2016-06-01 14:40:13 -0400 |
---|---|---|
committer | Atin Mukherjee <amukherj@redhat.com> | 2016-06-03 21:10:02 -0700 |
commit | 04b5886132ee0fe84011033cd2db08285cc75e31 (patch) | |
tree | 5e88c9d76bdfd8dfa9df9193fbeb1cdd75d9455c /extras/ganesha/ocf/ganesha_mon | |
parent | 124425aef8118116d0bd1daa8269ab2c348b2cb9 (diff) |
common-ha: race/timing issue setting up cluster
The ganesha_grace resource agent can start before the ganesha_mon
resource agent, with the result that the crm_attribute that
ganesha_grace expects to find has not been created yet.
With four nodes this has never been seen during development (it may
be impossible, or merely very rare), but with just two nodes it is
easily reproduced.
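
One generic way to tolerate this kind of startup ordering race is to retry the query for a short time instead of failing on the first miss. The helper below is a minimal sketch of that idea only; `retry_until_ok` is a hypothetical name and is not part of this commit, which changes ganesha_mon rather than adding a retry loop.

```sh
# Hypothetical helper, not part of the commit: run a command repeatedly
# until it succeeds or the attempts run out.
# usage: retry_until_ok <attempts> <delay-seconds> <command> [args...]
retry_until_ok()
{
        local attempts=$1
        local delay=$2
        shift 2

        while [ "${attempts}" -gt 0 ]; do
                # Stop as soon as the command succeeds.
                if "$@" 2> /dev/null; then
                        return 0
                fi
                attempts=$((attempts - 1))
                # Only sleep when another attempt remains.
                if [ "${attempts}" -gt 0 ]; then
                        sleep "${delay}"
                fi
        done

        return 1
}
```

A caller that expects the attribute to appear shortly could, for example, run `retry_until_ok 5 1 crm_attribute --node="$(hostname -s)" --name="${OCF_RESKEY_grace_active}" --query`; the attempt count and delay here are arbitrary illustrative values.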
Note that when long (FQDN) names are used, Failed Actions may appear
in the output of `pcs status`, e.g.:
* nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com
'unknown error' (1): call=20, status=complete, exitreason='none',
last-rc-change='Wed Jun 1 12:32:32 2016', queued=0ms, exec=0ms
* nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com
'unknown error' (1): call=18, status=complete, exitreason='none',
last-rc-change='Wed Jun 1 12:32:42 2016', queued=0ms, exec=0ms
As long as all the ganesha_grace_clone and cluster_ip-1
resource agents are in the Started state, this is okay.
Change-Id: I726c9946ceb1ca92872b321612eb0f4c3cc039d8
BUG: 1341768
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/14607
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Diffstat (limited to 'extras/ganesha/ocf/ganesha_mon')
-rw-r--r-- | extras/ganesha/ocf/ganesha_mon | 33 |
1 files changed, 18 insertions, 15 deletions
diff --git a/extras/ganesha/ocf/ganesha_mon b/extras/ganesha/ocf/ganesha_mon
index 974eb86a07a..7d0eb6b9cb8 100644
--- a/extras/ganesha/ocf/ganesha_mon
+++ b/extras/ganesha/ocf/ganesha_mon
@@ -124,8 +124,7 @@ ganesha_mon_stop()
 
 ganesha_mon_monitor()
 {
-        local short_host=$(hostname -s)
-        local long_host=$(hostname)
+        local host=$(hostname -s)
         local pid_file="/var/run/ganesha.nfsd.pid"
 
         # RHEL6 /etc/init.d/nfs-ganesha adds -p /var/run/ganesha.nfsd.pid
@@ -154,13 +153,15 @@ ganesha_mon_monitor()
                 # track grace-active crm_attr (attr != crm_attr)
                 # we can't just use the attr as there's no way to query
                 # its value in RHEL6 pacemaker
-                crm_attribute --node=${short_host} --lifetime=forever --name=${OCF_RESKEY_grace_active} --update=1 2> /dev/null
-                if [ $? -ne 0 ]; then
-                        crm_attribute --node=${long_host} --lifetime=forever --name=${OCF_RESKEY_grace_active} --update=1 2> /dev/null
-                        if [ $? -ne 0 ]; then
-                                ocf_log info "warning: crm_attribute --node=${short_host} --lifetime=forever --name=${OCF_RESKEY_grace_active} --update=1 failed"
-                        fi
-                fi
+
+                crm_attribute --node=${host} --lifetime=forever --name=${OCF_RESKEY_grace_active} --update=1 2> /dev/null
+                if [ $? -ne 0 ]; then
+                        host=$(hostname)
+                        crm_attribute --node=${host} --lifetime=forever --name=${OCF_RESKEY_grace_active} --update=1 2> /dev/null
+                        if [ $? -ne 0 ]; then
+                                ocf_log info "mon monitor warning: crm_attribute --node=${host} --lifetime=forever --name=${OCF_RESKEY_grace_active} --update=1 failed"
+                        fi
+                fi
 
                 return ${OCF_SUCCESS}
         fi
@@ -182,13 +183,15 @@ ganesha_mon_monitor()
                 ocf_log info "warning: attrd_updater -D -n ${OCF_RESKEY_grace_active} failed"
         fi
 
-        crm_attribute --node=${short_host} --name=${OCF_RESKEY_grace_active} --update=0 2> /dev/null
+        host=$(hostname -s)
+        crm_attribute --node=${host} --name=${OCF_RESKEY_grace_active} --update=0 2> /dev/null
         if [ $? -ne 0 ]; then
-                crm_attribute --node=${long_host} --name=${OCF_RESKEY_grace_active} --update=0 2> /dev/null
-                if [ $? -ne 0 ]; then
-                        ocf_log info "warning: crm_attribute --node=${short_host} --name=${OCF_RESKEY_grace_active} --update=0 failed"
-                fi
-        fi
+                host=$(hostname)
+                crm_attribute --node=${host} --name=${OCF_RESKEY_grace_active} --update=0 2> /dev/null
+                if [ $? -ne 0 ]; then
+                        ocf_log info "mon monitor warning: crm_attribute --node=${host} --name=${OCF_RESKEY_grace_active} --update=0 failed"
+                fi
+        fi
 
         sleep ${OCF_RESKEY_grace_delay}
 
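
Both hunks in ganesha_mon_monitor() repeat the same pattern: try `crm_attribute` with the short host name, fall back to the long name, and log a warning only if both fail. As an illustration only, that pattern could be factored into one helper. The function name `crm_attr_update` and its argument order are hypothetical and do not appear in the commit; `crm_attribute`, `ocf_log`, and `OCF_RESKEY_grace_active` are used as in the diff.

```sh
# Hypothetical refactoring sketch, not part of the commit.
# usage: crm_attr_update <value> [extra crm_attribute options...]
crm_attr_update()
{
        local value=$1
        shift
        local host

        # Try the short name first, then the name reported by plain hostname.
        for host in "$(hostname -s)" "$(hostname)"; do
                if crm_attribute --node="${host}" "$@" \
                        --name="${OCF_RESKEY_grace_active}" \
                        --update="${value}" 2> /dev/null; then
                        return 0
                fi
        done

        # Both attempts failed; report the last node name that was tried.
        ocf_log info "mon monitor warning: crm_attribute --node=${host} --name=${OCF_RESKEY_grace_active} --update=${value} failed"
        return 1
}
```

Under these assumptions the first call site would become `crm_attr_update 1 --lifetime=forever` and the second `crm_attr_update 0`. The committed code only logs on failure and carries on, so a caller that wants the same behavior would ignore the helper's return value.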