[Pacemaker] Self-Fence???
David Pendell
lostogre at gmail.com
Thu Mar 28 20:42:43 UTC 2013
I have a two-node CentOS 6.4 based cluster, using Pacemaker 1.1.8 with a
CMAN backend, running primarily libvirt-controlled KVM VMs. For the VMs, I
am using CLVM volumes for the virtual hard drives and a single GFS2 volume
for shared storage of the VM config files and other shared data.
For fencing, I use IPMI and an APC master switch to provide redundant
fencing. There are location constraints that do not allow the fencing
resources to run on the node they fence. I am *not* using sbd or any other
software-based fencing device.
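A toy model may make the fencing layout easier to reason about. This is a sketch, not Pacemaker's placement logic: it assumes one IPMI and one APC device per node, uses hypothetical resource names (`fence_ipmi_virtualhost1` and so on), and encodes only the rule that a fencing resource may not run on the node it fences.

```python
from dataclasses import dataclass

# Toy model of the fencing layout described above. Device names are
# hypothetical; each node gets one IPMI and one APC device, and a location
# constraint keeps every device off the node it is meant to fence.
NODES = ("virtualhost1.delta-co.gov", "virtualhost2.delta-co.gov")


@dataclass(frozen=True)
class FenceDevice:
    name: str    # hypothetical resource name
    kind: str    # "ipmi" or "apc"
    target: str  # node this device powers off


def build_devices(nodes):
    """One IPMI and one APC device per node."""
    return [
        FenceDevice(f"fence_{kind}_{node.split('.')[0]}", kind, node)
        for node in nodes
        for kind in ("ipmi", "apc")
    ]


def allowed_hosts(device, online):
    """Online nodes the device may run on: any except its own target."""
    return sorted(n for n in online if n != device.target)


def devices_for(target, devices, online):
    """Names of devices that fence `target` and still have a node to run on."""
    return [
        d.name
        for d in devices
        if d.target == target and allowed_hosts(d, online)
    ]


if __name__ == "__main__":
    devs = build_devices(NODES)
    only_vh2 = {"virtualhost2.delta-co.gov"}  # virtualhost1 powered off
    print(devices_for("virtualhost2.delta-co.gov", devs, only_vh2))  # []
    print(devices_for("virtualhost1.delta-co.gov", devs, only_vh2))
    # ['fence_ipmi_virtualhost1', 'fence_apc_virtualhost1']
```

The model ignores how stonith-ng actually routes requests; it only shows that, under the stated constraints, once virtualhost1 is off there is no node left to host the devices aimed at virtualhost2.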
I had a very bizarre situation this morning: I had one of the nodes
powered off, and then the other one self-fenced. I thought that was impossible.
Excerpts from the logs:
Mar 28 13:10:01 virtualhost2 stonith-ng[4223]: notice: remote_op_done: Operation reboot of virtualhost2.delta-co.gov by virtualhost1.delta-co.gov for crmd.4430 at virtualhost1.delta-co.gov.fc5638ad: Timer expired
[...]
virtualhost1 was offline, so I expected that line.
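The `remote_op_done` line quoted above packs several fields into one sentence. The sketch below splits it with a regular expression. The field layout is inferred from that single line, the field names are illustrative labels, and reading `fc5638ad` as a shortened operation reference is an assumption. The list archive renders `@` as ` at `, so both spellings are accepted.

```python
import re

# Layout inferred from the quoted line (an assumption, not a documented format):
#   Operation <action> of <target> by <by_host> for <client>@<origin>.<ref>: <result>
REMOTE_OP_DONE = re.compile(
    r"Operation (?P<action>\S+) of (?P<target>\S+) by (?P<by_host>\S+) "
    r"for (?P<client>\S+)(?:@| at )(?P<origin>\S+)\.(?P<ref>[0-9a-f]+): "
    r"(?P<result>.+)$"
)


def parse_remote_op_done(line):
    """Return the named fields as a dict, or None if the line doesn't match."""
    m = REMOTE_OP_DONE.search(line)
    return m.groupdict() if m else None
```

Applied to the line above, `by_host` and `origin` both come out as virtualhost1 and `result` as `Timer expired`, which fits a request that was still pending when virtualhost1 went away.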
[...]
Mar 28 13:13:30 virtualhost2 pengine[4226]: notice: unpack_rsc_op: Preventing p_ns2 from re-starting on virtualhost2.delta-co.gov: operation monitor failed 'not installed' (rc=5)
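The `rc=5` in this pengine line is a standard OCF resource-agent exit code: 5 is `OCF_ERR_INSTALLED`, which the log renders as 'not installed', and the same line shows Pacemaker then refusing to restart `p_ns2` on that node. A small lookup, assuming the standard OCF code list (8 and 9 are the master/slave extensions), can decode these values while reading the rest of the logs:

```python
import re

# Standard OCF resource-agent exit codes (8 and 9 are the master/slave
# extensions used by Pacemaker).
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}


def decode_rc(line):
    """Find "(rc=N)" in a Pacemaker log line and return (N, OCF name)."""
    m = re.search(r"\(rc=(\d+)\)", line)
    if m is None:
        return None
    rc = int(m.group(1))
    return rc, OCF_RC.get(rc, "unknown")
```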
[...]
If my GFS2 volume had a brief interruption, would that show up here? And
could it have caused a fencing operation?
[...]
Mar 28 13:13:30 virtualhost2 pengine[4226]: warning: pe_fence_node: Node virtualhost2.delta-co.gov will be fenced to recover from resource failure(s)
Mar 28 13:13:30 virtualhost2 pengine[4226]: warning: stage6: Scheduling Node virtualhost2.delta-co.gov for STONITH
[...]
Why is it still trying to fence if all of the fencing resources are
offline?
[...]
Mar 28 13:13:30 virtualhost2 crmd[4227]: notice: te_fence_node: Executing reboot fencing operation (43) on virtualhost2.delta-co.gov (timeout=60000)
Mar 28 13:13:30 virtualhost2 stonith-ng[4223]: notice: handle_request: Client crmd.4227.9fdec3bd wants to fence (reboot) 'virtualhost2.delta-co.gov' with device '(any)'
[...]
What does 'crmd.4227.9fdec3bd' mean? I figure 4227 is a process ID,
but I don't know what the next number is.
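The PID guess checks out against the log itself: the `te_fence_node` line just before it comes from `crmd[4227]`. The trailing `9fdec3bd` looks like a shortened client or connection ID; that reading is an assumption, not something the log states. The sketch below splits the tag on that assumption and cross-checks the daemon and PID against a syslog prefix (the helper names are hypothetical):

```python
import re
from typing import NamedTuple


class ClientTag(NamedTuple):
    daemon: str  # e.g. "crmd"
    pid: int     # should match the PID in "crmd[4227]" in the syslog prefix
    suffix: str  # assumed: shortened client/connection ID


def parse_client_tag(tag):
    """Split "<daemon>.<pid>.<hex suffix>" into its three parts."""
    daemon, pid, suffix = tag.rsplit(".", 2)
    return ClientTag(daemon, int(pid), suffix)


def pid_matches(tag, line):
    """True if the syslog prefix of `line` shows the same daemon[pid]."""
    m = re.search(r"\s(\w[\w-]*)\[(\d+)\]:", line)
    return bool(m) and (m.group(1), int(m.group(2))) == (tag.daemon, tag.pid)
```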
[...]
Mar 28 13:13:30 virtualhost2 stonith-ng[4223]: error: check_alternate_host: No alternate host available to handle complex self fencing request
[...]
Where did that come from?
[...]
Mar 28 13:13:30 virtualhost2 stonith-ng[4223]: notice: check_alternate_host: Peer[1] virtualhost1.delta-co.gov
Mar 28 13:13:30 virtualhost2 stonith-ng[4223]: notice: check_alternate_host: Peer[2] virtualhost2.delta-co.gov
Mar 28 13:13:30 virtualhost2 stonith-ng[4223]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for virtualhost2.delta-co.gov: 648ca743-6cda-4c81-9250-21c9109a51b9 (0)
[...]
The next logs are the reboot logs.