[Pacemaker] Monitor ops do not get cancelled

Phil Armstrong pma at sgi.com
Thu Sep 23 14:49:45 EDT 2010


I posted earlier asking for help because I had a primitive whose monitor
operation was not getting canceled when a manual relocation was
performed. I updated pacemaker (as was suggested) to
pacemaker-1.1.2-0.6.1, which is the latest I could find for an IA64
platform without having to build from source. If anyone knows of a later
IA64 binary version I would appreciate that information.

The monitor problem persisted after the upgrade, though the error
messages I was seeing earlier were no longer present. They were
apparently unrelated. Painful trial and error led me to the conclusion
that the cause was the primitive's start-op timeout and monitor-op
start-delay values. When I had these values set to 480s, the monitor-op
did not get canceled for a manual relocation, so it would get
rescheduled after the relocation only to find the resource not
operational (it had been relocated) and thus set the fail-count to
non-zero, fencing the resource. If I set the values to 240s, everything
went smoothly and the monitor-op was canceled.
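
For illustration, a minimal sketch of the kind of primitive definition
being described, in crm shell syntax. The resource name, the Dummy
agent, and the monitor interval and timeout are hypothetical
placeholders, not the actual configuration; only the 480s values for the
start timeout and the monitor start-delay come from the description
above.

```
# Hypothetical example: name, agent, and monitor interval/timeout are
# placeholders. Only the two 480s values reflect the failing case.
primitive p_example ocf:heartbeat:Dummy \
    op start interval="0" timeout="480s" \
    op monitor interval="30s" timeout="60s" start-delay="480s"
```

Under the same assumptions, the working case would differ only in using
240s for both the start timeout and the monitor start-delay.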

As a test, I changed a different primitive's values to 480s and that 
primitive then displayed the failing behavior.

If anyone knows why this might be the case (perhaps there are rules I am
unaware of that prohibit larger values) I would appreciate the
information. If not, I guess I should file a bug.

Thanks for any help in advance.

Phil

-- 
	Phil Armstrong       pma at sgi.com
	Phone: 651-683-5561  VNET 233-5561




