summaryrefslogtreecommitdiffstats
path: root/extras/ganesha/ocf/ganesha_grace
Commit message (Collapse)AuthorAgeFilesLines
* Revert "packaging: (ganesha) remove glusterfs-ganesha subpackage and related ↵Jiffin Tony Thottan2019-08-241-0/+221
| | | | | | | | | | | | | | | | | | | files" Until 3.12, glusterd had an option to setup HA cluster for nfs-ganesha using pacemaker and corosync. But later infavour of "Storhaug" Project, this functionality was removed from the codebase. Since there is not much development happening towards storhaug, it better to keep back old working HA solution for nfs-ganesha with bit improvements. Planned improvements : * add an option in nfs-ganesha enable to set ganesha without HA. * Handle usage of export id's properly in the scripts. Change-Id: I1d60c8970bfc20035cf674d7b2705dfd4819647e updates: #663 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* storhaug HA: first step, remove resource agents and setup scriptKaleb S. KEITHLEY2017-01-171-222/+0
| | | | | | | | | | | | | | | | resource agents and setup script(s) are now in storhaug This is a phased switch-over to storhaug. Ultimately all components here should be (re)moved to the storhaug project and its packages. But for now some will linger here. Change-Id: Ied3956972b14b14d8a76e22c583b1fe25869f8e7 BUG: 1410843 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/16349 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* common-ha: nfs-grace-monitor timed out unknown error messagesKaleb S. KEITHLEY2016-11-161-3/+2
| | | | | | | | | | | | | | | grace_mon_monitor() occasionally returns OCF_ERR_GENERIC, but it ought to return OCF_NOT_RUNNING instead. Change-Id: I3d550e33cc3d0a8dce4333ced72db19b3b2f4f2e BUG: 1394224 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15831 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
* common-ha: race/timing issue setting up clusterKaleb S KEITHLEY2016-06-031-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ganesha_grace resource agent can start before the ganesha_mon resource agent, with the result that the crm_attribute that ganesha_grace expects to find has not been created yet. This is never (never? Or just so rarely that it has never actually been seen during development) seen with four nodes, but with just two nodes it's very repeatable. Note that when long (FQDN) names are used it is not unexpected to see Failed Actions in the output of `pcs status`, e.g.: * nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com 'unknown error' (1): call=20, status=complete, exitreason='none', last-rc-change='Wed Jun 1 12:32:32 2016', queued=0ms, exec=0ms * nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com 'unknown error' (1): call=18, status=complete, exitreason='none', last-rc-change='Wed Jun 1 12:32:42 2016', queued=0ms, exec=0ms and as long as all the ganesha_grace_clone and cluster_ip-1 resource agents are in Started state then this is okay. Change-Id: I726c9946ceb1ca92872b321612eb0f4c3cc039d8 BUG: 1341768 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14607 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* common-ha: post fail-back, ganesha.nfsds are not put into NFS-GRACEKaleb S KEITHLEY2016-05-241-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A little known, rarely used feature of pacemaker called "notification" is used to follow the status of the ganesha.nfsds in the cluster. This is done with location constraints and other Black Magick. When a nfsd dies, the ganesha-active attribute is cleared, the associated floating IP (VIP) fails over to another node, and the ganesha_grace notify method is invoked with post-stop on all the nodes where the ganesha.nfsd is still running. The notify methods send dbus msgs to put their nfsds into NFS-GRACE, and the nfsds perform their grace processing, e.g. taking over locks from the failed nfsd. N.B. Fail-back was originally not planned to be a feature for glusterfs-3.7, but we sorta got it for free. For fail-back, the opposite occurs. The ganesha-active attribute is recreated, the floating IP fails back, and the notify method is invoked with pre-start on all the nodes where the surviving ganesha.nfsds continue to run. The notify methods send dbus msgs again to put their nsfds into NFS-GRACE again, and the nfsds clean up their locks. Change-Id: I3fc64afa20ae3a928143d69aa533a8df68dd680e BUG: 1338967 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14506 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* common-ha: log flooded with Could not map name=xxxx to a UUIDKaleb S KEITHLEY2016-05-231-2/+16
| | | | | | | | | | | | | | | | When the cluster is configured with long (FQDN) cluster members the log is flooded with "Could not map name=$shortname to a UUID" notices, and setting/getting the attribute is failing Change-Id: I954d8cef7115659cc9c8b23dae75a5a247dc5db7 BUG: 1337650 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14435 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: soumya k <skoduri@redhat.com>
* common-ha: continuous grace_mon log messages in /var/log/messagesKaleb S KEITHLEY2016-04-281-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | messages are seen on RHEL6.x and RHEL7.1 and earlier versions of pacemaker. (And RHEL7.2 with RHEL7.1 pacemaker packages.) It's not possible to query attrd attributes in the older version, only set/update/clear them. The messages come from invalid attempts to query the attributes. However it is possible to query crm attributes. The fix here is to create a "shadow" crm attribute for the attrd attribute. Changes are made to both, queries are made on the crm attribute. (Resource Agents "follow" the attrd attribute using constraint locations, so we must keep the attrd attribute.) Change-Id: I84ac1a80673e528d98b67b7d5062e21dcf744d4a BUG: 1324509 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/13919 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com>
* common-ha: reliable grace using pacemaker notify actionsKaleb S KEITHLEY2016-03-141-61/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using *-dead_ip-1 resources to track on which nodes the ganesha.nfsd had died was found to be unreliable. Running `pcs status` in the ganesha_grace monitor action was seen to time out during failover; the HA devs opined that it was, generally, not a good idea to run `pcs status` in a monitor action in any event. They suggested using the notify feature, where the resources on all the nodes are notified when a clone resource agent dies. This change adds a notify action to the ganesha_grace RA. The ganesha_mon RA monitors its ganesha.nfsd daemon. While the daemon is running, it creates two attributes: ganesha-active and grace-active. When the daemon stops for any reason, the attributes are deleted. Deleting the ganesha-active attribute triggers the failover of the virtual IP (the IPaddr RA) to another node where ganesha.nfsd is still running. The ganesha_grace RA monitors the grace-active attribute. When the grace-active attibute is deleted, the ganesha_grace RA stops, and will not restart. This triggers pacemaker to trigger the notify action in the ganesha_grace RAs on the other nodes in the cluster; which send a DBUS message to their ganesha.nfsd. (N.B. grace-active is a bit of a misnomer. while the grace-active attribute exists, everything is normal and healthy. Deleting the attribute triggers putting the surviving ganesha.nfsds into GRACE.) To ensure that the remaining/surviving ganesha.nfsds are put into NFS-GRACE before the IPaddr (virtual IP) fails over there is a short delay (sleep) between deleting the grace-active attribute and the ganesha-active attribute. To summarize: 1. on node 2 ganesha_mon:monitor notices that ganesha.nfsd has died 2. on node 2 ganesha_mon:monitor deletes its grace-active attribute 3. on node 2 ganesha_grace:monitor notices that grace-active is gone and returns OCF_ERR_GENERIC, a.k.a. new error. When pacemaker tries to (re)start ganesha_grace, its start action will return OCF_NOT_RUNNING, a.k.a. known error, don't attempt further restarts. 4. on nodes 1, 3, etc., ganesha_grace:notify receives a post-stop notification indicating that node 2 is gone, and sends a DBUS message to its ganesha.nfsd putting it into NFS-GRACE. 5. on node 2 ganesha_mon:monitor waits a short period, then deletes its ganesha-active attribute. This triggers the IPaddr (virt IP) failover according to constraint location rules. ganesha_nfsd modified to run for the duration, start action is invoked to setup the /var/lib/nfs symlink, stop action is invoked to restore it. ganesha-ha.sh modified accordingly to create it as a clone resource. BUG: 1290865 Change-Id: I1ba24f38fa4338b3aeb17c65645e9f439387ff57 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/12964 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/13725
* common-ha: cluster setup issues on RHEL7Kaleb S. KEITHLEY2015-06-191-2/+10
| | | | | | | | | | | | | | | | | | | | * use --name on RHEL7 (later versions of pcs drop --name) we guessed wrong and did not get the version that dropped use of --name option * more robust config file param parsing for n/v with ""s in the value after not sourcing the config file * pid file fix. RHEL6 init.d adds -p /var/run/ganesha.nfsd.pid to cmdline options. RHEL7 systemd does not, so defaults to /var/run/ganesha.pid. Change-Id: I575aa13c98f05523cca10c55f2c387200bad3f93 BUG: 1229948 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/11257 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Meghana M <mmadhusu@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* common-ha: delete-node implementationKaleb S. KEITHLEY2015-04-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | omnibus patch consisting of: + completed implemenation of delete-node (BZ 1212823) + teardown leaves /var/lib/nfs symlink (BZ 1210712) + setup copy config, teardown clean /etc/cluster (BZ 1212823) setup for copy config, teardown clean /etc/cluster: 1. on one (primary) node in the cluster, run: `ssh-keygen -f /var/lib/glusterd/nfs/secret.pem` Press Enter twice to avoid passphrase. 2. deploy the pubkey ~root/.ssh/authorized keys on _all_ nodes, run: `ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@$node` 3. copy the keys to _all_ nodes in the cluster, run: `scp /var/lib/glusterd/nfs/secret.* $node:/var/lib/glusterd/nfs/` N.B. this allows setup, teardown, etc., to be run on any node Change-Id: I9fcd3a57073ead24cd2d0ef0ee7a67c524f3d4b0 BUG: 1213933 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/10234 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: soumya k <skoduri@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ganesha-ha: more robust pid file handlingKaleb S. KEITHLEY2015-04-081-1/+2
| | | | | | | | | | | fix bug with reading pid file to determine if ganesha.nfsd is running Change-Id: I4050a119e2be93578045a221b67f616e152546d9 BUG: 1188184 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/10163 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* NFS-Ganesha: Install scripts, config files, and resource agent scriptsKaleb S. KEITHLEY2015-03-171-0/+168
Resubmitting after a gerrit bug bungled the merge of http://review.gluster.org/9621 (was it really a gerrit bug?) Scripts related to NFS-Ganesha are in extras/ganesha/scripts. Config files are in extras/ganesha/config. Resource Agent files are in extras/ganesha/ocf Files are copied to appropriate locations. Change-Id: I137169f4d653ee2b7d6df14d41e2babd0ae8d10c BUG: 1188184 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/9912 Tested-by: Gluster Build System <jenkins@build.gluster.com>