[Pacemaker] Corosync 1.4.7: zombie (defunct)
Sergey Arlashin
sergeyarl.maillist at gmail.com
Tue Jan 6 07:04:22 UTC 2015
Thank you!
I'll try 1.1.12.
--
Best regards,
Sergey Arlashin
On Jan 6, 2015, at 3:23 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> Yeah, I can imagine 1.1.6 behaving like this.
> I'd highly recommend 1.1.12
>
>> On 5 Jan 2015, at 5:14 pm, Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>
>> Pacemaker 1.1.6
>>
>> It runs on Ubuntu 12.04 LTS 64bit.
>>
>> Linux lb-node1 3.11.0-23-generic #40~precise1-Ubuntu SMP Wed Jun 4 22:06:36 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>>
>> --
>> Best regards,
>> Sergey Arlashin
>>
>>
>> On Jan 5, 2015, at 7:59 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>>> pacemaker version? it looks familiar but it depends on the version number.
>>>
>>>> On 29 Dec 2014, at 10:24 pm, Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
>>>>
>>>> Hi!
>>>> Recently I noticed that one of my nodes showed OFFLINE status in the 'crm status' output, although it was actually up: I could ssh to the node and run 'crm status' from its console. After some time it came back online. The same thing has happened several times with other nodes, without any obvious reason.
>>>>
>>>> Still, there are no error or fatal messages in the logs. The only warning messages I could find in corosync.log were the following:
>>>>
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1346 -> 0.233.1347 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1347 -> 0.233.1348 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1348 -> 0.233.1349 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1349 -> 0.233.1350 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1350 -> 0.233.1351 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1351 -> 0.233.1352 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1352 -> 0.233.1353 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1353 -> 0.233.1354 not applied to 0.233.1354: current "num_updates" is greater than required
>>>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 491 for last-failure-Cachier=1419729443 failed: Application of an update diff failed
>>>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 494 for fail-count-Cachier=1 failed: Application of an update diff failed
>>>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 497 for probe_complete=true failed: Application of an update diff failed
>>>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 500 for last-failure-Cachier=1419729443 failed: Application of an update diff failed
>>>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 503 for fail-count-Cachier=1 failed: Application of an update diff failed
>>>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1338 -> 0.233.1339 not applied to 0.233.1382: current "num_updates" is greater than required
>>>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1339 -> 0.233.1340 not applied to 0.233.1382: current "num_updates" is greater than required
>>>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1340 -> 0.233.1341 not applied to 0.233.1382: current "num_updates" is greater than required
>>>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1341 -> 0.233.1342 not applied to 0.233.1382: current "num_updates" is greater than required
>>>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1342 -> 0.233.1343 not applied to 0.233.1382: current "num_updates" is greater than required
>>>>
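For readers skimming the warnings above: each cib_process_diff line names a diff (source version -> target version) and the version the local CIB already holds. A minimal sketch, assuming the three dotted numbers are admin_epoch.epoch.num_updates (the log text itself only names "num_updates"), that pulls those versions out of such lines and flags diffs whose source is older than the local copy. The names DIFF_RE, parse_version and stale_diffs are made up for this illustration and are not part of Pacemaker.

```python
import re

# Matches the cib_process_diff warnings quoted in this thread.
# The group names (src, dst, current) are labels chosen for this sketch.
DIFF_RE = re.compile(
    r"Diff (?P<src>\d+\.\d+\.\d+) -> (?P<dst>\d+\.\d+\.\d+) "
    r"not applied to (?P<current>\d+\.\d+\.\d+)"
)


def parse_version(text):
    """Turn a dotted version such as '0.233.1346' into a tuple of ints.

    Assumption: the fields are admin_epoch, epoch, num_updates.
    """
    return tuple(int(part) for part in text.split("."))


def stale_diffs(lines):
    """Yield (src, dst, current) for diffs older than the local CIB."""
    for line in lines:
        match = DIFF_RE.search(line)
        if match is None:
            continue  # not a cib_process_diff warning
        src = parse_version(match.group("src"))
        dst = parse_version(match.group("dst"))
        current = parse_version(match.group("current"))
        # Tuples compare field by field, so this is an ordered version check.
        if src < current:
            yield src, dst, current
```

Feeding it the lines of corosync.log (for example via open(path) on whatever log path your setup uses) gives a quick count of how far behind the rejected diffs were. It only reads the log and does not explain why the diffs arrived late.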
>>>> After exploring corosync processes with ps I found out that on all my nodes there are zombie corosync procs like:
>>>>
>>>> root 13892 0.0 0.0 0 0 ? Z Dec26 0:04 [corosync] <defunct>
>>>> root 21793 0.0 0.0 0 0 ? Z Dec26 0:00 [corosync] <defunct>
>>>> root 27009 1.3 1.0 714292 10784 ? Ssl Dec18 223:38 /usr/sbin/corosync
>>>>
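A zombie (state Z, shown as <defunct>) is a process that has exited but has not yet been reaped by its parent. A rough sketch of how one could pick such entries out of ps output like the lines above, assuming the usual 11-column `ps aux` layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND). The function name find_zombies is hypothetical, chosen for this example.

```python
def find_zombies(ps_lines):
    """Return (pid, command) pairs for lines whose STAT starts with 'Z'.

    Assumes `ps aux`-style columns, with COMMAND as the 11th field.
    """
    zombies = []
    for line in ps_lines:
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # too short to be a ps aux row
        pid, stat, command = fields[1], fields[7], fields[10]
        if not pid.isdigit():
            continue  # skips the header row
        if stat.startswith("Z"):
            zombies.append((int(pid), command.strip()))
    return zombies
```

On a live node the input could come from something like subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines(). The sketch only lists the zombies; it says nothing about which parent failed to reap them.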
>>>> Is it OK to have zombie corosync procs on nodes? Or does it suggest that something is going wrong?
>>>>
>>>> Thanks in advance
>>>>
>>>> --
>>>> Best regards,
>>>> Sergey Arlashin
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org