[Pacemaker] Problems when quorum lost for a short period of time
Andrew Beekhof
andrew at beekhof.net
Wed Oct 2 04:55:49 UTC 2013
On 02/10/2013, at 6:26 AM, Lev Sidorenko <levs at securemedia.co.nz> wrote:
> Hello All!
>
> I have a 4-node cluster setup.
>
> It is actually 2 nodes for main+standby and another two nodes just to
> provide quorum.
One extra node would have been enough.
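To spell out the arithmetic, here is a rough sketch assuming the usual majority rule with one vote per node (quorum needs more than half of the total votes). The function names are made up for illustration and are not part of Pacemaker or corosync.

```python
# Majority-quorum arithmetic, assuming one vote per node.
# Both helper names are hypothetical, for illustration only.

def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many nodes can drop out before quorum is lost."""
    return total_votes - quorum_threshold(total_votes)


if __name__ == "__main__":
    for nodes in (2, 3, 4, 5):
        print(nodes, quorum_threshold(nodes), tolerated_failures(nodes))
```

Under that assumption, 3 nodes and 4 nodes both survive the loss of exactly one node, so the fourth node buys no extra tolerance.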
>
> So, all resources run on the main node but only DRBD-slave runs on the
> standby node.
>
> I have no-quorum-policy="stop"
>
> So, sometimes the main node loses its connection to the cluster and reports
> "quorum lost", but after 1-2 seconds the connection is re-established and it
> reports "quorum retained".
> This causes a big problem: as soon as the main node loses quorum it starts to
> stop all resources. At the same time the second node starts to start
> resources. After a couple of seconds the main node rejoins the cluster, but it
> has still not managed to stop all resources, and some resources have already
> started on the second node. So, I have lots of conflicts between
> resources on these two nodes.
>
> I tried setting no-quorum-policy="suicide", hoping that as soon as the main
> node loses its connection to the cluster it will reboot itself, which will
> give the second node enough time to start all of the processes and become
> the main one.
> But with no-quorum-policy="suicide" the main node just tries to STONITH all
> of the other nodes and does not reboot itself.
It will reboot itself last, IIRC.
>
> So, the question is: how can I set things up to instantly reboot a node when
> the node detects that quorum is lost?
Why don't you tweak the timings in corosync.conf (a guess, since you don't say what you're using) to be more tolerant of these blips instead?
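As a rough sketch only, and assuming the cluster really is running corosync: the main knob is the totem token timeout, which controls how long a node may go unheard before it is declared lost. The 5000 below is an illustrative value, not a recommendation; pick something comfortably longer than the 1-2 second blips and check corosync.conf(5) for your version's defaults.

```
totem {
    # ... keep the rest of your existing totem settings as they are ...

    # Token timeout in milliseconds. A longer timeout means a short
    # network interruption no longer triggers a membership change,
    # at the cost of slower detection of a real node failure.
    # 5000 is an assumed example value.
    token: 5000
}
```

The file needs to be changed on every node, and corosync restarted or reloaded according to your version's documentation.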
>
> Thank you in advance!
>
> With the best regards,
> Lev Sidorenko.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org