[Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS
Andrew Beekhof
andrew at beekhof.net
Mon Apr 14 04:40:43 UTC 2014
On 11 Apr 2014, at 10:54 pm, Marco Felettigh <marco at nucleus.it> wrote:
> On Fri, 11 Apr 2014 17:17:57 +1000
> Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>
>> On 8 Apr 2014, at 8:37 pm, marco at nucleus.it wrote:
>>
>>> On Tue, 8 Apr 2014 10:49:16 +1000
>>> Andrew Beekhof <andrew at beekhof.net> wrote:
>>>
>>>>
>>>> On 7 Apr 2014, at 8:46 pm, marco at nucleus.it wrote:
>>>>
>>>>> Hi,
>>>>> in a production environment with 2 nodes (nodeA, nodeB) we had
>>>>> a hardware failure, so we restarted nodeB.
>>>>> After nodeB came back up we restarted corosync/pacemaker on
>>>>> it, but for 2 days now the corosync/pacemaker stuff has been
>>>>> looping.
>>>>>
>>>>> crm_mon NodeA:
>>>>>
>>>>> Stack: openais
>>>>> Current DC: nodeA - partition with quorum
>>>>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>>>> 2 Nodes configured, 2 expected votes
>>>>> 17 Resources configured.
>>>>> ============
>>>>>
>>>>> Online: [ nodeA ]
>>>>> OFFLINE: [ nodeB ]
>>>>>
>>>>>
>>>>> crm_mon NodeB:
>>>>>
>>>>> Stack: openais
>>>>> Current DC: NONE
>>>>> 2 Nodes configured, 2 expected votes
>>>>> 17 Resources configured.
>>>>> ============
>>>>>
>>>>> OFFLINE: [ nodeA nodeB ]
>>>>>
>>>>> This loop on nodeB reports:
>>>>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner:
>>>>> nodeA) lost: vote from nodeA (Age)
>>>>>
>>>>> Investigating further, I found this message on nodeA:
>>>>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
>>>>>
>>>>> This message now repeats for every operation.
>>>>> Is it a corosync problem or a cib/pacemaker one?
>>>>> Any suggestions on what happened?
>>>>
>>>> For some reason the cib can't connect to corosync anymore.
>>>> No software got upgraded recently?
>>>>
>>>> Are there any logs from corosync?
>>>> Which distro is this?
>>>>
>>>>> And why did starting a cluster node crash the DC stuff? :(
>>>>>
>>>>>
>>>>> Bye Marco
>>>>>
>>>>
>>>
>>> Hi,
>>> the distro is openSUSE 11.1 and there have been no updates, because
>>> the distro is out of maintenance.
>>
>> A good reason to be using SLES (or RHEL/CentOS).
>
> Better Gentoo ;)
>
>>
>>> We are planning an upgrade, but the interesting thing is to figure
>>> out the cause of the problem.
>>> The log is attached; thanks for the support.
>>
>> There's nothing obvious in the logs. Just that as far as pacemaker
>> could tell, corosync suddenly went away. Was the corosync process
>> still running?
>>
>
> Yes, corosync was still running.
Stopping pacemaker and restarting it didn't help?