[Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS

Andrew Beekhof andrew at beekhof.net
Mon Apr 14 00:40:43 EDT 2014


On 11 Apr 2014, at 10:54 pm, Marco Felettigh <marco at nucleus.it> wrote:

> On Fri, 11 Apr 2014 17:17:57 +1000
> Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>> 
>> On 8 Apr 2014, at 8:37 pm, marco at nucleus.it wrote:
>> 
>>> On Tue, 8 Apr 2014 10:49:16 +1000
>>> Andrew Beekhof <andrew at beekhof.net> wrote:
>>> 
>>>> 
>>>> On 7 Apr 2014, at 8:46 pm, marco at nucleus.it wrote:
>>>> 
>>>>> Hi,
>>>>> In a production environment with 2 nodes (nodeA, nodeB) we had
>>>>> a hardware failure, so we restarted nodeB.
>>>>> After nodeB came back up we restarted corosync/pacemaker on
>>>>> it, but for the past 2 days the corosync/pacemaker stack has been
>>>>> looping.
>>>>> 
>>>>> crm_mon NodeA:
>>>>> 
>>>>> Stack: openais
>>>>> Current DC: nodeA - partition with quorum
>>>>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>>>> 2 Nodes configured, 2 expected votes
>>>>> 17 Resources configured.
>>>>> ============
>>>>> 
>>>>> Online: [ nodeA ]
>>>>> OFFLINE: [ nodeB ]
>>>>> 
>>>>> 
>>>>> crm_mon NodeB:
>>>>> 
>>>>> Stack: openais
>>>>> Current DC: NONE
>>>>> 2 Nodes configured, 2 expected votes
>>>>> 17 Resources configured.
>>>>> ============
>>>>> 
>>>>> OFFLINE: [ nodeA nodeB ]
>>>>> 
>>>>> This loop on nodeB reports:
>>>>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner:
>>>>> nodeA) lost: vote from nodeA (Age)
>>>>> 
>>>>> Investigating further, I found this message on nodeA:
>>>>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
>>>>> 
>>>>> This message now repeats for every operation.
>>>>> Is it a corosync problem or a cib/pacemaker one?
>>>>> Any suggestions as to what happened?
>>>> 
>>>> For some reason the cib can't connect to corosync anymore.
>>>> No software got upgraded recently?
>>>> 
>>>> Are there any logs from corosync?
>>>> Which distro is this?
>>>> 
>>>>> And why did starting a cluster node crash the DC stuff? :(
>>>>> 
>>>>> 
>>>>> Bye Marco
>>>>> 
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>> 
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>> 
>>> Hi,
>>> the distro is openSUSE 11.1 and there have been no updates, in part
>>> because the distro is out of maintenance.
>> 
>> A good reason to be using SLES (or RHEL/CentOS).
> 
> Better Gentoo ;)
> 
>> 
>>> We are planning an upgrade, but we would still like to figure
>>> out the cause of the problem.
>>> The log is attached; thanks for the support.
>> 
>> There's nothing obvious in the logs.  Just that as far as pacemaker
>> could tell, corosync suddenly went away. Was the corosync process
>> still running?
>> 
> 
> Yes, corosync was still running.

Stopping pacemaker and restarting it didn't help?
