[Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

Marco Felettigh marco at nucleus.it
Fri Apr 11 08:54:29 EDT 2014


On Fri, 11 Apr 2014 17:17:57 +1000
Andrew Beekhof <andrew at beekhof.net> wrote:

> 
> On 8 Apr 2014, at 8:37 pm, marco at nucleus.it wrote:
> 
> > On Tue, 8 Apr 2014 10:49:16 +1000
> > Andrew Beekhof <andrew at beekhof.net> wrote:
> > 
> >> 
> >> On 7 Apr 2014, at 8:46 pm, marco at nucleus.it wrote:
> >> 
> >>> Hi,
> >>> in a production environment with 2 nodes (nodeA, nodeB) we had
> >>> a hardware failure, so we restarted nodeB.
> >>> After the restarted nodeB came up we restarted corosync/pacemaker
> >>> on it, but for 2 days now the corosync/pacemaker stack has been
> >>> looping.
> >>> 
> >>> crm_mon NodeA:
> >>> 
> >>> Stack: openais
> >>> Current DC: nodeA - partition with quorum
> >>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> >>> 2 Nodes configured, 2 expected votes
> >>> 17 Resources configured.
> >>> ============
> >>> 
> >>> Online: [ nodeA ]
> >>> OFFLINE: [ nodeB ]
> >>> 
> >>> 
> >>> crm_mon NodeB:
> >>> 
> >>> Stack: openais
> >>> Current DC: NONE
> >>> 2 Nodes configured, 2 expected votes
> >>> 17 Resources configured.
> >>> ============
> >>> 
> >>> OFFLINE: [ nodeA nodeB ]
> >>> 
> >>> This loop on nodeB reports:
> >>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner:
> >>> nodeA) lost: vote from nodeA (Age)
> >>> 
> >>> Investigating further, I found this message on nodeA:
> >>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
> >>> 
> >>> This message is now repeated for every operation.
> >>> Is it a corosync problem or a cib/pacemaker one?
> >>> Any suggestions on what happened?
> >> 
> >> For some reason the cib can't connect to corosync anymore.
> >> No software got upgraded recently?
> >> 
> >> Are there any logs from corosync?
> >> Which distro is this?
> >> 
> >>> And why did starting a cluster node crash the DC stuff? :(
> >>> 
> >>> 
> >>> Bye Marco
> >>> 
> >>> _______________________________________________
> >>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>> 
> >>> Project Home: http://www.clusterlabs.org
> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> Bugs: http://bugs.clusterlabs.org
> >> 
> > 
> > Hi,
> > the distro is openSUSE 11.1, and there are no updates because
> > the distro is out of maintenance.
> 
> A good reason to be using SLES (or RHEL/CentOS).

Better Gentoo ;)

> 
> > We are planning an upgrade, but the interesting thing is to figure
> > out the reason for the problem.
> > The log is attached; thanks for the support.
> 
> There's nothing obvious in the logs.  Just that as far as pacemaker
> could tell, corosync suddenly went away. Was the corosync process
> still running?
> 

Yes, corosync was still running.
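
For anyone triaging a similar log, here is a minimal Python sketch that counts repeated ERROR messages such as the "Not connected to AIS" line quoted above. The regular expression is modelled only on the two log lines quoted in this thread; real syslog prefixes and other Pacemaker versions may differ. The helper names parse_line and count_errors are hypothetical and not part of any Pacemaker tool.

```python
import re
from collections import Counter

# Pattern assumed from the two log lines quoted in this thread, e.g.
#   cib: [28755]: ERROR: send_ais_message: Not connected to AIS
LOG_RE = re.compile(
    r"(?P<daemon>\w+): \[(?P<pid>\d+)\]: (?P<level>\w+): "
    r"(?P<func>\w+): (?P<msg>.*)"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None


def count_errors(lines):
    """Count ERROR-level messages, keyed by (daemon, function, message)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "ERROR":
            counts[(rec["daemon"], rec["func"], rec["msg"].strip())] += 1
    return counts


# Sample input: the two messages quoted earlier in this thread.
sample = [
    "cib: [28755]: ERROR: send_ais_message: Not connected to AIS",
    "crmd: [7149]: debug: do_election_count_vote: Election 3 "
    "(owner: nodeA) lost: vote from nodeA (Age)",
    "cib: [28755]: ERROR: send_ais_message: Not connected to AIS",
]

if __name__ == "__main__":
    for key, n in count_errors(sample).most_common():
        print(n, *key)
```

Running the same count over the full log from each node would show which daemon first lost its connection and how often the error repeats, which may help narrow down when corosync stopped answering.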




