[Pacemaker] Self-Fence???

Thu Mar 28 20:32:23 EDT 2013

Ok, then. I learned something new. Thanks.

d.p.

On Thu, Mar 28, 2013 at 6:28 PM, Andrew Beekhof <andrew at beekhof.net> wrote:

> On Fri, Mar 29, 2013 at 7:42 AM, David Pendell <lostogre at gmail.com> wrote:
> > I have a two-node CentOS 6.4-based cluster, using pacemaker 1.1.8 with a
> > cman backend, running primarily libvirt-controlled KVM VMs. For the VMs, I
> > am using clvm volumes for the virtual hard drives and a single gfs2 volume
> > for shared storage of the VM config files and other shared data. For
> > fencing, I use ipmi and an APC master switch to provide redundant fencing.
> > There are location constraints that do not allow the fencing resources to
> > run on their own node. I am *not* using sbd or any other software-based
> > fencing device.
> >
> > I had a very bizarre situation this morning -- I had one of the nodes
> > powered off. Then the other self-fenced. I thought that was impossible.
>
> No. Not when a node is by itself.
>
> >
> > Excerpts from the logs:
> >
> > Mar 28 13:10:01 virtualhost2 stonith-ng[4223]:   notice: remote_op_done:
> > Operation reboot of virtualhost2.delta-co.gov by virtualhost1.delta-co.gov
> > for crmd.4430@virtualhost1.delta-co.gov.fc5638ad: Timer expired
> >
> > [...]
> > Virtualhost1 was offline, so I expect that line.
> > [...]
> >
> > Mar 28 13:13:30 virtualhost2 pengine[4226]:   notice: unpack_rsc_op:
> > Preventing p_ns2 from re-starting on virtualhost2.delta-co.gov: operation
> > monitor failed 'not installed' (rc=5)
> >
> > [...]
> > If I had a brief interruption of my gfs2 volume, would that show up? And
> > would it be the cause of a fencing operation?
> > [...]
> >
> > Mar 28 13:13:30 virtualhost2 pengine[4226]:  warning: pe_fence_node: Node
> > virtualhost2.delta-co.gov will be fenced to recover from resource failure(s)
> > Mar 28 13:13:30 virtualhost2 pengine[4226]:  warning: stage6: Scheduling
> > Node virtualhost2.delta-co.gov for STONITH
> >
> > [...]
> > Why is it still trying to fence if all of the fencing resources are
> > offline?
> > [...]
> >
> > Mar 28 13:13:30 virtualhost2 crmd[4227]:   notice: te_fence_node: Executing
> > reboot fencing operation (43) on virtualhost2.delta-co.gov(timeout=60000)
> >
> > Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice: handle_request:
> > Client crmd.4227.9fdec3bd wants to fence (reboot)
> > 'virtualhost2.delta-co.gov' with device '(any)'
> >
> > [...]
> > What does "crmd.4227.9fdec3bd" mean? I figure 4227 is a process number,
> > but I don't know what the next number is.
> > [...]
> >
> > Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:    error:
> > check_alternate_host: No alternate host available to handle complex self
> > fencing request
> >
> > [...]
> > Where did that come from?
>
> It was scheduled by the policy engine (because a resource failed to
> stop by the looks of it) and, as per the logs above, initiated by the
> crmd.
>
> > [...]
> >
> > Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:
> > check_alternate_host: Peer[1] virtualhost1.delta-co.gov
> > Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:
> > check_alternate_host: Peer[2] virtualhost2.delta-co.gov
> > Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:
> > initiate_remote_stonith_op: Initiating remote operation reboot for
> > virtualhost2.delta-co.gov: 648ca743-6cda-4c81-9250-21c9109a51b9 (0)
> >
> > [...]
> > The next logs are the reboot logs.
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
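For anyone digging through similar logs, here is a rough Python sketch that splits syslog lines like the excerpts above into fields and pulls out the fencing-related entries in order, mirroring the sequence Andrew describes (scheduled by pengine, executed by crmd, handled by stonith-ng). Assumptions: the line layout matches the quoted excerpts once the email wrapping is undone; the helper names (parse_line, split_client, fencing_chain) and the FENCE_FUNCS set are hypothetical, hand-picked from this thread; and the trailing hex token in client names such as crmd.4227.9fdec3bd is treated as an opaque string, since the thread does not say what it is.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed layout, taken from the excerpts in this thread:
# "<Mon> <day> <HH:MM:SS> <host> <daemon>[<pid>]: <level>: <function>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+(?P<level>\w+):\s+"
    r"(?P<func>\w+):\s+(?P<msg>.*)$"
)

# Client names as seen in stonith-ng handle_request lines, e.g. "crmd.4227.9fdec3bd".
CLIENT_RE = re.compile(
    r"Client (?P<name>\w+)\.(?P<pid>\d+)\.(?P<token>[0-9a-f]+) wants to fence"
)

# Hypothetical selection of function names that appear in the fencing path above.
FENCE_FUNCS = {
    "pe_fence_node", "stage6", "te_fence_node", "handle_request",
    "check_alternate_host", "initiate_remote_stonith_op",
}


@dataclass
class LogEntry:
    ts: str
    host: str
    daemon: str
    pid: int
    level: str
    func: str
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one unwrapped log line into fields; None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["host"], m["daemon"], int(m["pid"]),
                    m["level"], m["func"], m["msg"])


def split_client(msg: str) -> Optional[Tuple[str, int, str]]:
    """Return (daemon name, pid, opaque token) from a handle_request message."""
    m = CLIENT_RE.search(msg)
    if m is None:
        return None
    return m["name"], int(m["pid"]), m["token"]


def fencing_chain(entries: List[LogEntry]) -> List[Tuple[str, str]]:
    """Keep only fencing-related entries, in log order, as (daemon, function)."""
    return [(e.daemon, e.func) for e in entries if e.func in FENCE_FUNCS]


if __name__ == "__main__":
    sample = [
        "Mar 28 13:13:30 virtualhost2 pengine[4226]:  warning: stage6: Scheduling Node virtualhost2.delta-co.gov for STONITH",
        "Mar 28 13:13:30 virtualhost2 crmd[4227]:   notice: te_fence_node: Executing reboot fencing operation (43) on virtualhost2.delta-co.gov(timeout=60000)",
        "Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice: handle_request: Client crmd.4227.9fdec3bd wants to fence (reboot) 'virtualhost2.delta-co.gov' with device '(any)'",
    ]
    entries = [e for e in map(parse_line, sample) if e is not None]
    for daemon, func in fencing_chain(entries):
        print(daemon, func)
    print(split_client(entries[-1].msg))
```

Note that the pid in the client name (4227) matches the crmd[4227] process in the te_fence_node line, which is consistent with Andrew's point that the request came from the crmd.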