[Pacemaker] 1) attrd, crmd, cib, stonithd going to 100% CPU after standby 2) monitoring bug 3) meta failure-timeout issue

Proskurin Kirill k.proskurin at corp.mail.ru
Mon Oct 3 07:31:39 UTC 2011


On 10/03/2011 05:32 AM, Andrew Beekhof wrote:

I am attaching the logs for this error, as you asked on IRC.
These logs are from the third occurrence of this error; the resource name is
different, but the problem and the environment are the same.

Resource name is: tranprocessor
The primitive configuration is the same as below, except for the path to the script.

>> 2)
>> This one is scary.
>> Twice I have run into a situation where pacemaker thinks that a resource is
>> started, but it is not.
>
> RA is misbehaving.  Pacemaker will only consider a resource running if
> the RA tells us it is (running or in a failed state).
>
>> We use a slightly modified version of the "anything" agent for our
>> scripts, but they are aware of OCF return codes and other stuff.
>>
>> I ran the monitor action of our agent from the console:
>> # env -i ; OCF_ROOT=/usr/lib/ocf
>> OCF_RESKEY_binfile=/usr/local/mpop/bin/my/dialogues_notify.pl
>> /usr/lib/ocf/resource.d/mail.ru/generic monitor
>> # generic[14992]: DEBUG: default monitor : 7
>>
>> So our agent said that it is not running, but pacemaker still thinks it is.
>> It stayed like this for 2 days, until I forced a cleanup of it. Then it found
>> within seconds that it was not running.
>
> Did you configure a recurring monitor operation?
>
>>
>> This is a really scary situation. I can't reproduce it, but I have already hit it
>> twice... maybe more often without noticing, who knows.
>>
>> I attach our agent script, and this is how we run the script:
>>
>> primitive dialogues_notify.pl ocf:mail.ru:generic \
>>         op monitor interval="30" timeout="300" on-fail="restart" \
>>         op start interval="0" timeout="300" \
>>         op stop interval="0" timeout="300" \
>>         params binfile="/usr/local/mpop/bin/my/dialogues_notify.pl" \
>>         meta failure-timeout="120"
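
For anyone who wants to repeat the manual monitor check quoted above, here is
a minimal sketch. It assumes the agent path and the binfile parameter from our
configuration; the function names ocf_rc_name and check_monitor are made up
for illustration and are not part of the agent or of pacemaker. The variables
are passed as arguments to env -i, so the agent really starts with a clean
environment, and the exit code is printed together with its OCF name
(7 is OCF_NOT_RUNNING).

```shell
#!/bin/sh
# Hypothetical helper: map an OCF exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Hypothetical helper: run "<agent> monitor" in a clean environment.
#   $1 = path to the resource agent
#   $2 = value for OCF_RESKEY_binfile
# The PATH value is an assumption; adjust it to what the agent needs.
check_monitor() {
    env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin \
        OCF_ROOT=/usr/lib/ocf \
        OCF_RESKEY_binfile="$2" \
        "$1" monitor
    rc=$?
    echo "monitor rc=$rc ($(ocf_rc_name "$rc"))"
    return "$rc"
}
```

Example call:
check_monitor /usr/lib/ocf/resource.d/mail.ru/generic /usr/local/mpop/bin/my/dialogues_notify.pl
The result can then be compared with what the cluster reports in crm_mon -1.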


-- 
Best regards,
Proskurin Kirill
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.log.gz
Type: application/x-gzip
Size: 345664 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20111003/184e4ded/attachment-0002.bin>

