[Pacemaker] [Problem or Enhancement] When attrd restarts, the fail-count is reset.
renayama19661014 at ybb.ne.jp
Mon Sep 27 05:26:17 UTC 2010
Hi,
I discovered this phenomenon while investigating another problem.
The problem occurs only when the attrd process fails and is restarted; otherwise it does not occur.
Step 1) After startup, cause a monitor error in UmIPaddr twice.
Online: [ srv01 srv02 ]
Resource Group: UMgroup01
UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
UmIPaddr (ocf::heartbeat:Dummy2): Started srv01
Migration summary:
* Node srv02:
* Node srv01:
UmIPaddr: migration-threshold=10 fail-count=2
Step 2) Kill attrd; attrd is restarted.
Online: [ srv01 srv02 ]
Resource Group: UMgroup01
UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
UmIPaddr (ocf::heartbeat:Dummy2): Started srv01
Migration summary:
* Node srv02:
* Node srv01:
UmIPaddr: migration-threshold=10 fail-count=2
Step 3) Cause one more monitor error in UmIPaddr.
Online: [ srv01 srv02 ]
Resource Group: UMgroup01
UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
UmIPaddr (ocf::heartbeat:Dummy2): Started srv01
Migration summary:
* Node srv02:
* Node srv01:
UmIPaddr: migration-threshold=10 fail-count=1 -----> fail-count goes back to 1 instead of increasing to 3.
The cause is that attrd loses the fail-count when it restarts (its hash table is lost).
It is a serious problem that the failure count is reset.
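As a rough illustration of the steps above, here is a toy model in Python. It is only a sketch under my assumptions: the class and function names are hypothetical and this is not attrd's real code. It keeps the counts in an in-memory table, and a restart is modelled by creating a new object with an empty table. The display in Step 2 still shows 2, so the stored value apparently survives and only attrd's in-memory copy is gone; the model ignores that stored copy.

```python
# Toy model only: all names here are hypothetical, not Pacemaker's code.

class ToyAttrd:
    """Keeps per-resource fail-counts in an in-memory dict (the 'hash table')."""

    def __init__(self):
        # A freshly started process begins with an empty table.
        self.fail_counts = {}

    def record_failure(self, resource):
        # Increment from whatever the table holds; a missing entry counts as 0.
        self.fail_counts[resource] = self.fail_counts.get(resource, 0) + 1
        return self.fail_counts[resource]


def reproduce():
    attrd = ToyAttrd()
    attrd.record_failure("UmIPaddr")
    before = attrd.record_failure("UmIPaddr")  # Step 1: two monitor errors
    attrd = ToyAttrd()                         # Step 2: attrd is killed and restarted
    after = attrd.record_failure("UmIPaddr")   # Step 3: one more monitor error
    return before, after
```

In this model, reproduce() returns the count after Step 1 and the count after Step 3, which matches the fail-count=2 and fail-count=1 values shown above.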
I can think of the following methods.
method 1) attrd saves the fail-count to a file under the "/var/run" directory and reads it back when it restarts.
method 2) When attrd starts, it communicates with the cib and retrieves the fail-count.
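To make method 2 more concrete, here is a small Python sketch. It is only an illustration under my assumptions: ToyCib and ToyAttrd are hypothetical stand-ins, not the real cib or attrd interfaces, and the attribute key format is chosen just for the example. The idea is that attrd writes every update to the cib and, on startup, seeds its table from what the cib already holds.

```python
# Hypothetical sketch of method 2; not the real cib/attrd API.

class ToyCib:
    """Stand-in for the cib: a store that outlives attrd restarts."""

    def __init__(self):
        self.status = {}

    def write(self, name, value):
        self.status[name] = value

    def read_all(self):
        return dict(self.status)


class ToyAttrd:
    def __init__(self, cib):
        self.cib = cib
        # Method 2: on startup, seed the in-memory table from the cib
        # instead of starting with an empty table.
        self.values = cib.read_all()

    def record_failure(self, resource):
        key = "fail-count-" + resource  # key format assumed for illustration
        self.values[key] = self.values.get(key, 0) + 1
        self.cib.write(key, self.values[key])
        return self.values[key]


def demo():
    cib = ToyCib()
    attrd = ToyAttrd(cib)
    attrd.record_failure("UmIPaddr")
    attrd.record_failure("UmIPaddr")          # fail-count is now 2
    attrd = ToyAttrd(cib)                     # attrd restarts and re-reads the cib
    return attrd.record_failure("UmIPaddr")   # counting continues from 2
```

With this approach the count after the restart continues from the stored value. Method 1 would look similar, with a file under "/var/run" in place of the cib.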
Is there a better method?
Please consider a solution to this problem.
Best Regards,
Hideo Yamauchi.