[Pacemaker] 3 node cluster - two nodes get fenced/rebooted when one dies?

Andrew Beekhof andrew at beekhof.net
Sun Jul 29 22:37:18 EDT 2012


On Fri, Jul 6, 2012 at 8:25 AM, Errol Neal <eneal at businessgrade.com> wrote:
> Hi again. I was hoping to get some insight into why two nodes get rebooted in my cluster when I halt one of them.
>
> I'm running corosync 1.1.4 and pacemaker-1.1.6 on CentOS 6.2. I've put my configuration up on pastebin if anyone would like to take a look:
>
> http://pastebin.com/raw.php?i=6cAkJ3Qk

Not really enough, I'm afraid. We'd need a crm_report archive, which
has the logs and other data necessary to debug an issue of this kind.

>
> Could this be related?

No.

>
> ERROR: native_create_actions: Resource st-xenapi-nas1-dev3-fence (stonith::fence_xenapi) is active on 2 nodes attempting recovery
>
> I noticed that during such times, multiple nodes are running the same resource. Incidentally, even if this isn't the cause, is there a way to prevent this?

Not really, although I have been thinking about how to mask it in the
PE (Policy Engine).

Basically, if there is a fencing device active on nodeX and nodeX is
about to be fenced, then under some conditions we start the device on
nodeY before stopping it on nodeX.
This is cheating a little, but it is the only way to make progress if
nodeY needs the device to fence nodeX, or to fence another node that
failed at the same time.
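
To make the ordering concrete, here is a minimal sketch of the
start-before-stop sequence described above. It is a toy model and not
Pacemaker's actual code: the function names recovery_order and
active_nodes_after_each_step are hypothetical, and the "some
conditions" that trigger this ordering are not modelled. It only shows
that between the two actions the device is briefly active on both
nodes, which would be consistent with the "active on 2 nodes" message.

```python
def recovery_order(device, doomed_node, surviving_node):
    """Hypothetical helper: order the actions for a fencing device that
    is active on a node about to be fenced.

    The device is started on the surviving node first, and only then
    stopped on the doomed node.
    """
    return [
        ("start", device, surviving_node),
        ("stop", device, doomed_node),
    ]


def active_nodes_after_each_step(initial_nodes, actions):
    """Replay the actions and record where the device is active after
    each one (a sorted list of node names per step)."""
    active = set(initial_nodes)
    history = []
    for action, _device, node in actions:
        if action == "start":
            active.add(node)
        elif action == "stop":
            active.discard(node)
        history.append(sorted(active))
    return history


if __name__ == "__main__":
    actions = recovery_order("st-xenapi-nas1-dev3-fence", "nodeX", "nodeY")
    for step, nodes in zip(actions, active_nodes_after_each_step({"nodeX"}, actions)):
        print(step, "-> active on", nodes)
```

After the first action the device is active on both nodeX and nodeY;
after the second it is active on nodeY only.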

> Thanks in advance..
>
> -Errol