[ClusterLabs] Antw: After reboot each node thinks the other is offline.
Stephen Carville (HA List)
62d2a7ca at opayq.com
Tue Aug 1 04:05:00 EDT 2017
On 07/31/2017 11:13 PM, Ulrich Windl [Masked] wrote:
>> I am experimenting with pacemaker for high availability for some load
>> balancers. I was able to successfully get two CentOS (6.9) machines
>> (scahadev01da and scahadev01db) to form a cluster and the shared IP was
>> assigned to scahadev01da. I simulated a failure by halting the primary
>> and the secondary eventually noticed, bringing up the shared IP on its
>> eth0. So far, so good.
>>
>> A problem arises when the primary comes back up and, for some reason,
>> each node thinks the other is offline. This leads to both nodes adding
>
> If a node thinks the other is unexpectedly offline, it will fence it, and then it will be offline! Thus the IP can't run there. I guess you have no fencing configured, right?
No. I didn't realize fencing was necessary unless there was shared
storage involved. I guess it is time to go back to the drawing board.
Can clustering even be done reliably on CentOS 6? I have no objection to
moving to 7, but I was hoping I could get this up quicker than building
out a bunch of new balancers.

On a related note: I tried rebooting both nodes, and each node still
thinks the other is offline. For future reference, is there a way to
clear that?
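
For anyone comparing the two status listings quoted further down, here
is a minimal sketch of how the "each node thinks the other is offline"
condition could be checked mechanically. It only parses the
"Online:" / "OFFLINE:" lines in the form pcs printed them in this
thread; the function names nodes_in_state and is_split are hypothetical
helpers made up for this illustration, not part of pcs, and the sketch
assumes exactly one node name per list.

```shell
#!/bin/sh
# Print the node names found on the "Online:" or "OFFLINE:" line of
# `pcs status` output read on stdin. $1 is the label to look for.
nodes_in_state() {
    sed -n "s/^$1: \[ \(.*\) ]\$/\1/p"
}

# Succeed when the two views are mirror images: whatever the first
# view reports online, the second reports offline, and vice versa.
is_split() {
    a_on=$(printf '%s\n' "$1" | nodes_in_state Online)
    a_off=$(printf '%s\n' "$1" | nodes_in_state OFFLINE)
    b_on=$(printf '%s\n' "$2" | nodes_in_state Online)
    b_off=$(printf '%s\n' "$2" | nodes_in_state OFFLINE)
    [ -n "$a_on" ] && [ "$a_on" = "$b_off" ] && [ "$b_on" = "$a_off" ]
}

# Sample input copied from the two status listings in this thread.
view_a='Online: [ scahadev01da ]
OFFLINE: [ scahadev01db ]'
view_b='Online: [ scahadev01db ]
OFFLINE: [ scahadev01da ]'

if is_split "$view_a" "$view_b"; then
    echo "split: each node reports the other as offline"
fi
```

In practice the two views would come from running pcs status on each
node and saving the output; the sample strings above stand in for that.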
> Regards,
> Ulrich
>
>> the duplicate IP to its own eth0. I probably do not need to tell you
>> the mischief that can cause if these were production servers.
>>
>> I tried restarting cman, pcsd and pacemaker on both machines with no
>> effect on the situation.
>>
>> I've found several mentions of it in the search engines but I've been
>> unable to find how to fix it. Any help is appreciated.
>>
>> Both nodes have quorum disabled in /etc/sysconfig/cman
>>
>> CMAN_QUORUM_TIMEOUT=0
>>
>> #------------------------------------------------
>> Node 1
>>
>> scahadev01da# sudo pcs status
>> Cluster name: scahadev01d
>> Stack: cman
>> Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
>> Last updated: Mon Jul 31 10:43:54 2017 Last change: Mon Jul 31 10:30:46 2017 by root via cibadmin on scahadev01da
>>
>> 2 nodes and 1 resource configured
>>
>> Online: [ scahadev01da ]
>> OFFLINE: [ scahadev01db ]
>>
>> Full list of resources:
>>
>> VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01da
>>
>> Daemon Status:
>> cman: active/enabled
>> corosync: active/disabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>>
>> #------------------------------------------------
>> Node 2
>>
>> scahadev01db ~]$ sudo pcs status
>> Cluster name: scahadev01d
>> Stack: cman
>> Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
>> Last updated: Mon Jul 31 10:43:47 2017 Last change: Sat Jul 29 13:45:15 2017 by root via cibadmin on scahadev01da
>>
>> 2 nodes and 1 resource configured
>>
>> Online: [ scahadev01db ]
>> OFFLINE: [ scahadev01da ]
>>
>> Full list of resources:
>>
>> VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01db
>>
>> Daemon Status:
>> cman: active/enabled
>> corosync: active/disabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>>
>> --
>> Stephen Carville
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org