[ClusterLabs] two node cluster not behaving right

Fri Nov 6 11:00:27 CET 2015

> On 6. nov. 2015, at 08.42, Jan Friesse <jfriesse at redhat.com> wrote:
> 
> user.clusterlabs.org at siimnet.dk wrote:
>> Being new to Pacemaker, I’m trying to create my first two-node cluster, but it seems to behave a little strangely.
>> Following this guide: http://clusterlabs.org/quickstart-redhat-6.html
>> 
>> but I am unable to run, for example:
>> 
>> [root at afnA ~]# pcs property set stonith-enabled=false
>> Error: Unable to update cib
>> Call cib_replace failed (-62): Timer expired
>> 
>> 
>> The only thing I find in the logs is a stream of repeated corosync events:
>> 
>> Nov 06 01:30:54 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:56 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:57 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:30:59 corosync [TOTEM ] Retransmit List: 96 97
>> Nov 06 01:31:01 corosync [TOTEM ] Retransmit List: 96 97
> 
> This means something is blocking successful delivery of packets. Make sure to:
> - Properly configure the firewall (for testing you can disable it completely)
> - Make sure multicast is properly configured. As an alternative, you can try udpu. Udpu is usually more compatible with switches, and for the two-node use case performance is the same.

I found this thread: http://www.gossamer-threads.com/lists/linuxha/pacemaker/90203

It seems that multicast between my two KVM nodes stops after 180s:

afnA :   unicast, seq=178, size=69 bytes, dist=0, time=0.238ms
afnA : multicast, seq=178, size=69 bytes, dist=0, time=0.324ms
afnA :   unicast, seq=179, size=69 bytes, dist=0, time=0.243ms
afnA : multicast, seq=179, size=69 bytes, dist=0, time=0.313ms
afnA :   unicast, seq=180, size=69 bytes, dist=0, time=0.273ms
afnA :   unicast, seq=181, size=69 bytes, dist=0, time=0.449ms
afnA :   unicast, seq=182, size=69 bytes, dist=0, time=0.266ms
afnA :   unicast, seq=183, size=69 bytes, dist=0, time=0.367ms

I can then just restart omping and get another 180s of multicast. Hmm, might this have something to do with the Open vSwitch used between the nodes? I seem to remember reading about issues with Open vSwitch and multicast; I will dig more.
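
To pin down where multicast stops in omping output like the above, a small filter can report the last sequence number seen for each traffic type. This is only a sketch: the function name omping_last_seq and the second node name afnB are made up for illustration, and it assumes response lines in the same format as the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: read omping output on stdin and print the last
# sequence number seen for unicast and for multicast responses.
omping_last_seq() {
  awk '
    # Response lines look like:
    #   afnA : multicast, seq=179, size=69 bytes, dist=0, time=0.313ms
    match($0, /(unicast|multicast), seq=[0-9]+/) {
      hit = substr($0, RSTART, RLENGTH)   # e.g. "multicast, seq=179"
      split(hit, parts, ", seq=")         # parts[1]=type, parts[2]=seq
      last[parts[1]] = parts[2]           # later lines overwrite earlier ones
    }
    END {
      printf "unicast=%s multicast=%s\n", last["unicast"], last["multicast"]
    }
  '
}

# Example use (afnB is an assumed name for the second node):
#   omping -c 300 afnA afnB | omping_last_seq
```

Fed the excerpt above, this prints unicast=183 multicast=179, i.e. multicast replies stop a few probes before the unicast ones do.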

Meanwhile, since I only have a two-node cluster, how do I configure it to use unicast in /etc/cluster/cluster.conf? The cman stack doesn’t seem to use /etc/corosync/corosync.conf (I tested with a deliberately malformed corosync.conf, and cman still forms quorum initially).
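
For reference on the unicast question, the cman(5) man page describes a transport attribute on the <cman> element, with udpu selecting UDP unicast. The following is a minimal sketch, assuming the installed cman version supports that attribute. The cluster name, the config_version value, the second node name afnB and the helper name write_udpu_sketch are placeholders, and fencing and the other sections of a real file are left out. It writes to a scratch path rather than to /etc/cluster/cluster.conf.

```shell
#!/bin/sh
# Hypothetical helper: write a sketch of a two-node cluster.conf that asks
# cman for UDP unicast. $1 is the output path (use a scratch file).
write_udpu_sketch() {
  cat > "$1" <<'EOF'
<?xml version="1.0"?>
<!-- name and config_version are placeholders; config_version must be
     higher than the value in the currently active configuration -->
<cluster name="mycluster" config_version="2">
  <!-- transport="udpu" selects UDP unicast instead of multicast -->
  <cman transport="udpu"/>
  <clusternodes>
    <clusternode name="afnA" nodeid="1"/>
    <!-- afnB is an assumed name for the second node -->
    <clusternode name="afnB" nodeid="2"/>
  </clusternodes>
</cluster>
EOF
}

# Example use:
#   write_udpu_sketch /tmp/cluster.conf.sketch
```

The transport change would need to be present on both nodes, and cman restarted, before it could take effect.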

TIA
