[Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

Andrew Beekhof andrew at beekhof.net
Fri Apr 11 03:17:57 EDT 2014


On 8 Apr 2014, at 8:37 pm, marco at nucleus.it wrote:

> On Tue, 8 Apr 2014 10:49:16 +1000
> Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>> 
>> On 7 Apr 2014, at 8:46 pm, marco at nucleus.it wrote:
>> 
>>> Hi,
>>> in a production environment with 2 nodes ( nodeA , nodeB ) we had a
>>> hardware failure, so we restarted nodeB.
>>> After nodeB came back up we restarted corosync/pacemaker on
>>> it, but for 2 days now the corosync/pacemaker stuff has been looping.
>>> 
>>> crm_mon NodeA:
>>> 
>>> Stack: openais
>>> Current DC: nodeA - partition with quorum
>>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>> 2 Nodes configured, 2 expected votes
>>> 17 Resources configured.
>>> ============
>>> 
>>> Online: [ nodeA ]
>>> OFFLINE: [ nodeB ]
>>> 
>>> 
>>> crm_mon NodeB:
>>> 
>>> Stack: openais
>>> Current DC: NONE
>>> 2 Nodes configured, 2 expected votes
>>> 17 Resources configured.
>>> ============
>>> 
>>> OFFLINE: [ nodeA nodeB ]
>>> 
>>> This loop on nodeB reports:
>>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner:
>>> nodeA) lost: vote from nodeA (Age)
>>> 
>>> So investigating around I found this message on nodeA:
>>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
>>> 
>>> Now this message is repeated for every operation.
>>> Is it a corosync problem or a cib/pacemaker one?
>>> Any suggestions on what happened?
>> 
>> For some reason the cib can't connect to corosync anymore.
>> No software got upgraded recently?
>> 
>> Are there any logs from corosync?
>> Which distro is this?
>> 
>>> And why did starting a cluster node crash the DC stuff? :(
>>> 
>>> 
>>> Bye Marco
>>> 
>> 
> 
> Hi,
> the distro is openSUSE 11.1 and there are no updates, also because the
> distro is out of maintenance.

A good reason to be using SLES (or RHEL/CentOS).

> We are planning an upgrade, but the interesting thing is to figure out
> the reasons for the problem.
> The log is attached, thanks for the support.

There's nothing obvious in the logs.  Just that as far as pacemaker could tell, corosync suddenly went away.
Was the corosync process still running?
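
One way to answer that question on a Linux node is to look for a process named corosync under /proc. The following is only a minimal sketch, assuming Python 3 is available and that /proc/<pid>/stat is readable; the helper name find_pids and its proc_root parameter are made up for illustration and are not part of Pacemaker or corosync.

```python
import os


def find_pids(name, proc_root="/proc"):
    """Return sorted PIDs of processes whose command name equals `name`.

    Hypothetical helper: it reads <proc_root>/<pid>/stat, whose second
    field is the command name wrapped in parentheses.
    """
    pids = []
    if not os.path.isdir(proc_root):
        return pids  # no procfs here (not Linux, or not mounted)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                content = fh.read()
            # The command name sits between the first "(" and the last ")".
            comm = content[content.index("(") + 1:content.rindex(")")]
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the line is malformed
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    pids = find_pids("corosync")
    if pids:
        print("corosync is running, pid(s): %s" % pids)
    else:
        print("no corosync process found")
```

If this finds no corosync process on nodeA while the cib keeps logging "Not connected to AIS", that would fit the picture of corosync having gone away underneath pacemaker. If it does find one, the process is alive and the lost connection needs another explanation.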
