[Pacemaker] Monitor ops do not get cancelled
Phil Armstrong
pma at sgi.com
Thu Sep 23 18:49:45 UTC 2010
I posted earlier asking for help because I had a primitive whose monitor
operation was not getting canceled when a manual relocation was
performed. I updated pacemaker (as was suggested) to
pacemaker-1.1.2-0.6.1, which is the latest I could find for an IA64
platform without having to build from source. If anyone knows of a later
IA64 binary version I would appreciate that information.
The monitor problem persisted after the upgrade, though the error
messages I was seeing earlier were no longer present. They were
apparently unrelated. Painful trial and error led me to the conclusion
that the cause was the primitive's start-op timeout and monitor-op
start-delay values. When I had these values set to 480s, the monitor-op
did not get canceled for a manual relocation. It was then rescheduled
after the relocation, found the resource not operational (it had been
relocated), and set the fail-count to non-zero, fencing the
resource. If I set the values to 240s, everything went smoothly and the
monitor-op was canceled.
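For illustration, a primitive of the kind described above might be
written roughly as follows in crm shell syntax. This is only a sketch:
the resource name p_example, the ocf:heartbeat:Dummy agent, and the 30s
monitor interval are placeholders assumed for the example. Only the
480s start timeout and monitor start-delay come from the description.

```
# Hypothetical resource name and agent; substitute the real primitive.
# 480s is the value that showed the failing behavior; 240s did not.
primitive p_example ocf:heartbeat:Dummy \
    op start interval="0" timeout="480s" \
    op monitor interval="30s" start-delay="480s"
```

Changing both 480s values to 240s gives the variant that behaved
correctly.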
As a test, I changed a different primitive's values to 480s and that
primitive then displayed the failing behavior.
If anyone knows why this might be the case (perhaps there are rules I am
unaware of that prohibit larger values) I would appreciate the
information. If not, I guess I should file a bug.
Thanks for any help in advance.
Phil
--
Phil Armstrong pma at sgi.com
Phone: 651-683-5561 VNET 233-5561