[Pacemaker] Cluster with two STONITH devices

Thu Apr 9 18:39:22 UTC 2015

Any thoughts on this would be much appreciated :)

On Wed, Apr 8, 2015 at 5:16 PM, Jorge Lopes <jmclopes at gmail.com> wrote:

> (I'm a bit confused because I received an auto-reply from
> pacemaker-bounces at oss.clusterlabs.org saying this list is inactive now,
> but I just received a digest with my mail. As it happens, I have resent the
> email to the new list with a bit more information, which was missing from the
> first message. So here is that extra bit anyway.)
>
> I have also noticed this pattern (with both STONITH resources running):
> 1. With the cluster running without errors, I run "stop docker" on node
> cluster-a-1.
> 2. This causes the vCenter STONITH to act as expected.
> 3. After the cluster is running again without errors, I run "stop
> docker" again on node cluster-a-1.
> 4. Now the vCenter STONITH doesn't run and, instead, it is the IPMI
> STONITH that runs. This is unexpected to me, as I was expecting the
> vCenter STONITH to run again.
>
>
> On Wed, Apr 8, 2015 at 4:20 PM, Jorge Lopes <jmclopes at gmail.com> wrote:
>
>> Hi all.
>>
>> I'm having difficulties orchestrating two STONITH devices in my cluster.
>> I have been struggling with this for the past few days and I need some help, please.
>>
>> A simplified version of my cluster and its goals is as follows:
>> - The cluster has two physical servers, each with two nodes (VMWare
>> virtual machines): overall, there are 4 nodes in this simplified version.
>> - There are two resource groups: group-cluster-a and group-cluster-b.
>> - To achieve a good CPU balance in the physical servers, the cluster is
>> asymmetric, with one group running in one server and the other group
>> running on the other server.
>> - If the VM on one host becomes unusable, then its resources are
>> started on its sister VM deployed on the other physical host.
>> - If one physical host becomes unusable, then all resources are started
>> on the other physical host.
>> - Two STONITH levels are used to fence the problematic nodes.
>>
>> The resources have the following behavior:
>> - If the resource monitor detects a problem, then Pacemaker tries to
>> restart the resource in the same node.
>> - If it fails, then STONITH takes place (vcenter reboots the VM) and
>> Pacemaker starts the resource in the sister VM present in the other
>> physical host.
>> - If restarting the VM fails, I want to power off the physical server and
>> Pacemaker will start all resources in the other physical host.
>>
>>
>> The HA stack is:
>> Ubuntu 14.04 (the node OS, which is a virtualized guest running in VMWare
>> ESXi 5.5)
>> Pacemaker 1.1.12
>> Corosync  2.3.4
>> CRM 2.1.2
>>
>> The 4 nodes are:
>> cluster-a-1
>> cluster-a-2
>> cluster-b-1
>> cluster-b-2
>>
>> The relevant configuration is:
>>
>> property symmetric-cluster=false
>> property stonith-enabled=true
>> property no-quorum-policy=stop
>>
>> group group-cluster-a vip-cluster-a docker-web
>> location loc-group-cluster-a-1 group-cluster-a inf: cluster-a-1
>> location loc-group-cluster-a-2 group-cluster-a 500: cluster-a-2
>>
>> group group-cluster-b vip-cluster-b docker-srv
>> location loc-group-cluster-b-1 group-cluster-b 500: cluster-b-1
>> location loc-group-cluster-b-2 group-cluster-b inf: cluster-b-2
>>
>>
>> # stonith vcenter definitions for host 1
>> # run in any of the host2 VM
>> primitive stonith-vcenter-host1 stonith:external/vcenter \
>>   params \
>>     VI_SERVER="192.168.40.20" \
>>     VI_CREDSTORE="/etc/vicredentials.xml" \
>>     HOSTLIST="cluster-a-1=cluster-a-1;cluster-a-2=cluster-a-2" \
>>     RESETPOWERON="1" \
>>   priority="2" \
>>   pcmk_host_check="static-list" \
>>   pcmk_host_list="cluster-a-1 cluster-a-2" \
>>   op monitor interval="60s"
>>
>> location loc1-stonith-vcenter-host1 stonith-vcenter-host1 500: cluster-b-1
>> location loc2-stonith-vcenter-host1 stonith-vcenter-host1 501: cluster-b-2
>>
>> # stonith vcenter definitions for host 2
>> # run in any of the host1 VM
>> primitive stonith-vcenter-host2 stonith:external/vcenter \
>>   params \
>>     VI_SERVER="192.168.40.21" \
>>     VI_CREDSTORE="/etc/vicredentials.xml" \
>>     HOSTLIST="cluster-b-1=cluster-b-1;cluster-b-2=cluster-b-2" \
>>     RESETPOWERON="1" \
>>   priority="2" \
>>   pcmk_host_check="static-list" \
>>   pcmk_host_list="cluster-b-1 cluster-b-2" \
>>   op monitor interval="60s"
>>
>> location loc1-stonith-vcenter-host2 stonith-vcenter-host2 500: cluster-a-1
>> location loc2-stonith-vcenter-host2 stonith-vcenter-host2 501: cluster-a-2
>>
>>
>> # stonith IPMI definitions for host 1 (DELL with iDRAC 7 enterprise
>> # interface at 192.168.40.15)
>> # run in any of the host2 VM
>> primitive stonith-ipmi-host1 stonith:external/ipmi \
>>     params hostname="host1" ipaddr="192.168.40.15" userid="root" \
>>     passwd="mypassword" interface="lanplus" \
>>     priority="1" \
>>     pcmk_host_check="static-list" \
>>     pcmk_host_list="cluster-a-1 cluster-a-2" \
>>     op start interval="0" timeout="60s" requires="nothing" \
>>     op monitor interval="3600s" timeout="20s" requires="nothing"
>>
>> location loc1-stonith-ipmi-host1 stonith-ipmi-host1 500: cluster-b-1
>> location loc2-stonith-ipmi-host1 stonith-ipmi-host1 501: cluster-b-2
>>
>>
>> # stonith IPMI definitions for host 2 (DELL with iDRAC 7 enterprise
>> # interface at 192.168.40.16)
>> # run in any of the host1 VM
>> primitive stonith-ipmi-host2 stonith:external/ipmi \
>>     params hostname="host2" ipaddr="192.168.40.16" userid="root" \
>>     passwd="mypassword" interface="lanplus" \
>>     priority="1" \
>>     pcmk_host_check="static-list" \
>>     pcmk_host_list="cluster-b-1 cluster-b-2" \
>>     op start interval="0" timeout="60s" requires="nothing" \
>>     op monitor interval="3600s" timeout="20s" requires="nothing"
>>
>> location loc1-stonith-ipmi-host2 stonith-ipmi-host2 500: cluster-a-1
>> location loc2-stonith-ipmi-host2 stonith-ipmi-host2 501: cluster-a-2
>>
>>
>> What is working:
>> - When an error is detected in one resource, the resource restarts on the
>> same node, as expected.
>> - With the STONITH external/ipmi resource *stopped*, a failure in one node
>> makes vcenter reboot it and the resources start on the sister node.
>>
>>
>> What is not so good:
>> - When vcenter reboots one node, the resources start on the other
>> node as expected, but they return to the original node as soon as it
>> comes back online. This causes a bit of ping-pong and I think it is a
>> consequence of how the locations are defined. Any suggestion to avoid this?
>> After a resource has moved to another node, I would prefer that it stay
>> there instead of returning to the original node. I can think of playing
>> with the resource affinity scores - is this the way it should be done?
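>>
>> A minimal sketch of the score-based approach, assuming the inf: preference
>> is lowered to a finite value (the 1000 scores below are illustrative and
>> untested on this cluster). An inf: location score outweighs any finite
>> stickiness, so stickiness can only keep the group on the sister node if
>> the preferred node's score is finite and the stickiness exceeds the gap
>> between the two location scores:
>>
>> rsc_defaults resource-stickiness=1000
>> location loc-group-cluster-a-1 group-cluster-a 1000: cluster-a-1
>> location loc-group-cluster-a-2 group-cluster-a 500: cluster-a-2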
>>
>> What is wrong:
>> Let's consider this scenario.
>> I have a set of resources provided by a docker agent. My test consists of
>> stopping the docker service on the node cluster-a-1, which makes the docker
>> agent return OCF_ERR_INSTALLED to Pacemaker (this is a change I made in
>> the docker agent, compared to the github repository version). With the
>> IPMI STONITH resource stopped, this leads to the node cluster-a-1 restarting,
>> which is expected.
>>
>> But with the IPMI STONITH resource started, I notice erratic behavior:
>> - Sometimes, the resources on the node cluster-a-1 are stopped and no
>> STONITH happens. Also, the resources are not moved to the node cluster-a-2.
>> In this situation, if I manually restart the node cluster-a-1 (virtual
>> machine restart), then the IPMI STONITH takes place and restarts the
>> corresponding physical server.
>> - Sometimes, the IPMI STONITH starts before the vCenter STONITH, which is
>> not expected because the vCenter STONITH has higher priority.
>>
>> I might have something wrong in my stonith definition, but I can't figure
>> out what.
>> Any idea how to correct this?
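>>
>> One way the intended escalation could be expressed is with crmsh's
>> fencing_topology, which lists the fencing levels for each node in the
>> order they should be tried. This is a sketch only, reusing the resource
>> names defined above and assuming the vCenter device is level 1 and the
>> IPMI device is level 2; it has not been tested on this cluster:
>>
>> fencing_topology \
>>   cluster-a-1: stonith-vcenter-host1 stonith-ipmi-host1 \
>>   cluster-a-2: stonith-vcenter-host1 stonith-ipmi-host1 \
>>   cluster-b-1: stonith-vcenter-host2 stonith-ipmi-host2 \
>>   cluster-b-2: stonith-vcenter-host2 stonith-ipmi-host2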
>>
>> And how can I set external/ipmi to power off the physical host, instead
>> of rebooting it?
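>>
>> A possible approach, assuming Pacemaker's pcmk_reboot_action device
>> parameter behaves as documented (untested here): it replaces the action
>> sent to a device when a reboot is requested, so setting it to "off" on
>> the IPMI primitive might power the host off rather than cycle it. The
>> host1 definition is repeated with only that parameter added:
>>
>> primitive stonith-ipmi-host1 stonith:external/ipmi \
>>     params hostname="host1" ipaddr="192.168.40.15" userid="root" \
>>     passwd="mypassword" interface="lanplus" \
>>     priority="1" \
>>     pcmk_host_check="static-list" \
>>     pcmk_host_list="cluster-a-1 cluster-a-2" \
>>     pcmk_reboot_action="off" \
>>     op start interval="0" timeout="60s" requires="nothing" \
>>     op monitor interval="3600s" timeout="20s" requires="nothing"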
>>
>>
>