[Pacemaker] Corosync crashes when cluster NIC disabled (Something strange happened)

Steven Dake sdake at redhat.com
Wed Mar 31 20:34:18 UTC 2010


On Wed, 2010-03-31 at 16:07 -0400, Simpson, John R wrote:
> Greetings all,
> 
> I have a lab cluster using Pacemaker 1.0.8 and Corosync 1.2.0-1
> (see packages below) on CentOS 5.4 (32-bit) VMs running under
> VMware ESXi 3.5.  My location constraints and connectivity
> tests were working well, so I was feeling really good when 
> I decided to shut down the interface used for cluster 
> communication and verify that it resulted in a split-brain cluster.
> 
> Much to my dismay, corosync crashed almost immediately on the node
> where I shut down the Ethernet interface.  I can recreate the issue
> at will on this cluster and a different cluster running a slightly
> more recent version of Pacemaker 1.0.8 and the same version of 
> Corosync on CentOS 5.4 64-bit VMs.
> 
> I've attached the log, but here is the most suspicious message:
> 
> Mar 31 15:35:16 corosync [pcmk  ] ERROR: pcmk_peer_update: Something strange happened: 1
> 
> Cluster communication is on 172.16.0.0/24 (eth1) and Apache, etc. are on 10.127.252.0/24 (eth0).
> 
> I've tried to include or attach all the relevant information -- please let me know if there's anything else that would be useful.
> 
> Regards,
> 
> John Simpson
> 

I've answered this so many times on the mailing list that I've created
a FAQ for it.  If the FAQ is unclear, let me know, and we can add to it.

http://www.corosync.org/doku.php?id=faq:ifdown
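
For reference, here is a minimal sketch of one common way to simulate a
network failure without taking the interface down: leave eth1 up and
drop its traffic with iptables.  This is not copied from the FAQ.  It
assumes eth1 is the cluster interface, as in the original post, and the
helper names (run, isolate, restore) and the IFACE/APPLY variables are
made up for illustration.  By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper, not part of corosync or pacemaker.
# Prints the iptables commands by default; set APPLY=1 (as root) to run them.

IFACE="${IFACE:-eth1}"   # cluster interface; eth1 is taken from the original post

# Echo the command, or execute it when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Drop all traffic in and out of the cluster interface.
isolate() {
    run iptables -A INPUT -i "$IFACE" -j DROP
    run iptables -A OUTPUT -o "$IFACE" -j DROP
}

# Remove the two rules added by isolate.
restore() {
    run iptables -D INPUT -i "$IFACE" -j DROP
    run iptables -D OUTPUT -o "$IFACE" -j DROP
}
```

Calling isolate on one node and watching crm_mon on the other should show
whether the cluster partitions; restore removes the rules afterwards.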

You mentioned Corosync crashed (segfault?), which it should not do.

To report that crash, see the following FAQ:

http://www.corosync.org/doku.php?id=faq:crash


> [root@cy-ha01 ~]# netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0     0 eth3
> 172.16.0.0      0.0.0.0         255.255.255.0   U         0 0     0 eth1
> 192.168.0.0     0.0.0.0         255.255.255.0   U         0 0     0 eth2
> 10.127.252.0    0.0.0.0         255.255.255.0   U         0 0     0 eth0
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0     0 eth3
> 224.0.0.0       0.0.0.0         240.0.0.0       U         0 0     0 eth1
> 0.0.0.0         10.127.252.1    0.0.0.0         UG        0 0     0 eth0
> 
> [root@cy-ha01 ~]# date ; ifconfig eth1 down
> Wed Mar 31 15:35:03 EDT 2010
> 
> Output from crm_mon when eth1 is shut down.
> ============
> Last updated: Wed Mar 31 15:31:50 2010
> Stack: openais
> Current DC: cy-ha02 - partition with quorum
> Version: 1.0.8-2a76c6ac04bcccf42b89a08e55bfbd90da2fb49a
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ cy-ha01 cy-ha02 ]
> 
>  Resource Group: WebSiteGroup
>      ServiceIP  (ocf::heartbeat:IPaddr2):       Started cy-ha01
>      WebSite    (ocf::heartbeat:apache):        Started cy-ha01
>  Clone Set: CloneConnectivityTest
>      Started: [ cy-ha02 cy-ha01 ]
> Connection to the CIB terminated
> Reconnecting................................
> 
> [root@cy-ha01 ~]# rpm -qa | grep pace
> pacemaker-libs-devel-1.0.8-1.el5
> pacemaker-1.0.8-1.el5
> pacemaker-libs-1.0.8-1.el5
> [root@cy-ha01 ~]# rpm -qa | grep coros
> corosynclib-1.2.0-1.el5
> corosync-1.2.0-1.el5
> corosynclib-devel-1.2.0-1.el5
> 
> --
> John Simpson 
> Senior Software Engineer, I. T. Engineering and Operations
> 
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker




