[Pacemaker] STONITH is not performed after stonithd reboots
Kazunori INOUE
inouekazu at intellilink.co.jp
Mon May 7 09:52:50 UTC 2012
Hi,
On the Pacemaker-1.1 + Corosync stack, once stonithd has been restarted
after an abnormal termination, STONITH is no longer performed.
I am using the latest devel code.
- pacemaker : db5e16736cc2682fbf37f81cd47be7d17d5a2364
- corosync : 88dd3e1eeacd64701d665f10acbc40f3795dd32f
- glue : 2686:66d5f0c135c9
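(For reference, these revisions can be read from the source checkouts,
for example:
  git rev-parse HEAD   # in the pacemaker and corosync git trees
  hg parents           # in the cluster-glue mercurial tree
The checkout locations are of course site-specific.)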
* 0. cluster's state.
[root@vm1 ~]# crm_mon -r1
============
Last updated: Wed May 2 16:07:29 2012
Last change: Wed May 2 16:06:35 2012 via cibadmin on vm1
Stack: corosync
Current DC: vm1 (1) - partition WITHOUT quorum
Version: 1.1.7-db5e167
2 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ vm1 vm2 ]
Full list of resources:
prmDummy (ocf::pacemaker:Dummy): Started vm2
prmStonith1 (stonith:external/libvirt): Started vm2
prmStonith2 (stonith:external/libvirt): Started vm1
[root@vm1 ~]# crm configure show
node $id="1" vm1
node $id="2" vm2
primitive prmDummy ocf:pacemaker:Dummy \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="fence" \
op stop interval="0s" timeout="60s" on-fail="stop"
primitive prmStonith1 stonith:external/libvirt \
params hostlist="vm1" hypervisor_uri="qemu+ssh://f/system" \
op start interval="0s" timeout="60s" \
op monitor interval="3600s" timeout="60s" \
op stop interval="0s" timeout="60s"
primitive prmStonith2 stonith:external/libvirt \
params hostlist="vm2" hypervisor_uri="qemu+ssh://g/system" \
op start interval="0s" timeout="60s" \
op monitor interval="3600s" timeout="60s" \
op stop interval="0s" timeout="60s"
location rsc_location-prmDummy prmDummy \
rule $id="rsc_location-prmDummy-rule" 200: #uname eq vm2
location rsc_location-prmStonith1 prmStonith1 \
rule $id="rsc_location-prmStonith1-rule" 200: #uname eq vm2 \
rule $id="rsc_location-prmStonith1-rule-0" -inf: #uname eq vm1
location rsc_location-prmStonith2 prmStonith2 \
rule $id="rsc_location-prmStonith2-rule" 200: #uname eq vm1 \
rule $id="rsc_location-prmStonith2-rule-0" -inf: #uname eq vm2
property $id="cib-bootstrap-options" \
dc-version="1.1.7-db5e167" \
cluster-infrastructure="corosync" \
no-quorum-policy="ignore" \
stonith-enabled="true" \
startup-fencing="false" \
stonith-timeout="120s"
rsc_defaults $id="rsc-options" \
resource-stickiness="INFINITY" \
migration-threshold="1"
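As a side note, assuming the stonith_admin tool from this build behaves
as I expect, the devices currently registered with stonith-ng can be
listed with:
[root@vm1 ~]# stonith_admin -L        # list all registered devices
[root@vm1 ~]# stonith_admin -l vm2    # list devices able to fence vm2
I use the same check again after step 2 below.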
* 1. terminate stonithd forcibly.
[root@vm1 ~]# pkill -9 stonithd
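(stonithd is respawned automatically - by pacemakerd, as far as I
understand this stack - which can be confirmed by checking for a new
stonith-ng process:
[root@vm1 ~]# ps -ef | grep stonith | grep -v grep
)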
* 2. trigger STONITH; stonith-ng reports that no matching device is
found, and the node is never fenced.
[root@vm1 ~]# ssh vm2 'rm /var/run/Dummy-prmDummy.state'
[root@vm1 ~]# grep Found /var/log/ha-debug
May 2 16:13:07 vm1 stonith-ng[15115]: debug: stonith_query: Found 0 matching devices for 'vm2'
May 2 16:13:19 vm1 stonith-ng[15115]: debug: stonith_query: Found 0 matching devices for 'vm2'
May 2 16:13:31 vm1 stonith-ng[15115]: debug: stonith_query: Found 0 matching devices for 'vm2'
May 2 16:13:43 vm1 stonith-ng[15115]: debug: stonith_query: Found 0 matching devices for 'vm2'
(snip)
[root@vm1 ~]#
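This looks as if the devices were never re-registered with the restarted
stonith-ng; assuming stonith_admin works as noted above, that can be
checked directly:
[root@vm1 ~]# stonith_admin -L    # presumably empty at this point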
After stonithd restarts, it seems that either the STONITH resource or
lrmd needs to be restarted as well.. is this the designed behavior?
# crm resource restart <STONITH resource (prmStonith2)>
or
# /usr/lib64/heartbeat/lrmd -r (on the node where stonithd restarted)
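(On this cluster the first workaround would concretely be, for example:
# crm resource restart prmStonith2
and - if my stonith_admin assumption above holds - the device should
then show up again in 'stonith_admin -L'.)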
----
Best regards,
Kazunori INOUE