[ClusterLabs] Stonith stops after vSphere restart
jota at disroot.org
jota at disroot.org
Tue Apr 3 02:09:02 EDT 2018
Hi again,
After restarting the vCenter, everything worked as expected.
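For the record, the change that did it was the failure-timeout meta-attribute suggested below. A rough sketch (the 900s value here is only an illustration; anything comfortably longer than a vCenter restart should work):

pcs resource meta vmware_soap failure-timeout=900s

With that set, the failed start expires on its own and Pacemaker retries the stonith resource once vCenter is reachable again, with no manual cleanup needed.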
Thanks to all.
Have a nice day.
On 23 February 2018 at 7:59, jota at disroot.org wrote:
> Hi all,
>
> Thanks for your responses.
> With your advice, I was able to configure it. I still have to test how it behaves; when it is
> possible to restart the vCenter, I will post the results.
> Have a nice weekend!
>
> On 22 February 2018 at 16:00, "Tomas Jelinek" <tojeline at redhat.com> wrote:
>
>> Try this:
>>
>> pcs resource meta vmware_soap failure-timeout=<desired timeout value>
>>
>> Tomas
>>
>> On 22 February 2018 at 14:55, jota at disroot.org wrote:
>>
>>> Hi,
>>>
>>> I am trying to configure the failure-timeout for stonith, but I can only do it for the other
>>> resources.
>>> When I try to enable it for stonith, I get this error: "Error: resource option(s): 'failure-timeout',
>>> are not recognized for resource type: 'stonith::fence_vmware_soap'".
>>>
>>> Thanks.
>>>
>>> On 22 February 2018 at 13:46, "Andrei Borzenkov" <arvidjaar at gmail.com> wrote:
>>
>> On Thu, Feb 22, 2018 at 2:40 PM, <jota at disroot.org> wrote:
>>> Thanks for the responses.
>>>
>>> So, if I understand correctly, this is the right behaviour and it does not affect the stonith mechanism.
>>>
>>> If I remember correctly, the failure status persists for hours until I fix it manually.
>>> Is there any way to modify the expiry time so it cleans itself up?
>>
>> Yes, as mentioned, set the failure-timeout resource meta-attribute.
>>> On 22 February 2018 at 12:28, "Andrei Borzenkov" <arvidjaar at gmail.com> wrote:
>>>
>>> Stonith resource state should have no impact on actual stonith
>>> operation. It only reflects whether the monitor was successful or not and
>>> serves as a warning to the administrator that something may be wrong. It
>>> should automatically clear itself after the failure-timeout has expired.
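>>>
>>> For example, the recorded failure count can be inspected with:
>>>
>>> pcs resource failcount show vmware_soap
>>>
>>> and the failed actions also show up under "Failed Actions" in "pcs status".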
>>>
>>> On Thu, Feb 22, 2018 at 1:58 PM, <jota at disroot.org> wrote:
>>>
>>> Hi,
>>>
>>> I have a 2 node pacemaker cluster configured with the fence agent
>>> vmware_soap.
>>> Everything works fine until the vCenter is restarted. After that, stonith
>>> fails and stops.
>>>
>>> [root at node1 ~]# pcs status
>>> Cluster name: psqltest
>>> Stack: corosync
>>> Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with
>>> quorum
>>> Last updated: Thu Feb 22 11:30:22 2018
>>> Last change: Mon Feb 19 09:28:37 2018 by root via crm_resource on node1
>>>
>>> 2 nodes configured
>>> 6 resources configured
>>>
>>> Online: [ node1 node2 ]
>>>
>>> Full list of resources:
>>>
>>> Master/Slave Set: ms_drbd_psqltest [drbd_psqltest]
>>> Masters: [ node1 ]
>>> Slaves: [ node2 ]
>>> Resource Group: pgsqltest
>>> psqltestfs (ocf::heartbeat:Filesystem): Started node1
>>> psqltest_vip (ocf::heartbeat:IPaddr2): Started node1
>>> postgresql-94 (ocf::heartbeat:pgsql): Started node1
>>> vmware_soap (stonith:fence_vmware_soap): Stopped
>>>
>>> Failed Actions:
>>> * vmware_soap_start_0 on node1 'unknown error' (1): call=38, status=Error,
>>> exitreason='none',
>>> last-rc-change='Thu Feb 22 10:55:46 2018', queued=0ms, exec=5374ms
>>> * vmware_soap_start_0 on node2 'unknown error' (1): call=56, status=Error,
>>> exitreason='none',
>>> last-rc-change='Thu Feb 22 10:55:39 2018', queued=0ms, exec=5479ms
>>>
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>>
>>> [root at node1 ~]# pcs stonith show --full
>>> Resource: vmware_soap (class=stonith type=fence_vmware_soap)
>>> Attributes: inet4_only=1 ipaddr=192.168.1.1 ipport=443 login=MYDOMAIN\User
>>> passwd=mypass pcmk_host_list=node1,node2 power_wait=3 ssl_insecure=1 action=
>>> pcmk_list_timeout=120s pcmk_monitor_timeout=120s pcmk_status_timeout=120s
>>> Operations: monitor interval=60s (vmware_soap-monitor-interval-60s)
>>>
>>> I need to manually perform a "resource cleanup vmware_soap" to put it online
>>> again.
>>> Is there any way to do this automatically?
>>> Is it possible to detect that vSphere is online again and re-enable stonith?
>>>
>>> Thanks.
>>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
More information about the Users mailing list