[Pacemaker] Resource monitoring actions when a resource dies uncleanly

Thu Jan 6 17:41:21 UTC 2011

Hi- 

First off, I'm new to Pacemaker and there's a tremendous amount of information to sift through, so my apologies if this has been answered already. 

I'm trying to set up a simple 2-node active/passive cluster that runs squid (reverse proxy for web services) on a service IP address. I'm not using STONITH because there's no shared data, so nothing horrible would happen if squid somehow ends up running on both boxes. So, there are just two resources, squid itself and the IP address, configured as a resource group because they must be on the same machine. 
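
A configuration like the one described might look roughly like the text produced by the sketch below. It is a minimal sketch that makes several assumptions. It uses crm shell syntax. It assumes an LSB init script installed as /etc/init.d/squid (hence lsb:squid) and the ocf:heartbeat:IPaddr2 agent for the address. The resource names squid-ip and squid-group and the address 192.0.2.10 are hypothetical placeholders. Python only assembles the text; the crm lines are the part that matters.

```python
def squid_cluster_config(service_ip: str, netmask: int = 24,
                         monitor_interval: str = "15s") -> str:
    """Build a crm shell snippet for the squid + service IP group (hypothetical names)."""
    lines = [
        # No fencing, per the assumption that squid running on both nodes is harmless.
        'property stonith-enabled="false"',
        # Hypothetical service IP resource, monitored periodically.
        f'primitive squid-ip ocf:heartbeat:IPaddr2 params ip="{service_ip}" '
        f'cidr_netmask="{netmask}" op monitor interval="{monitor_interval}"',
        # squid via its LSB init script; a failed monitor moves everything off the node.
        f'primitive squid lsb:squid op monitor interval="{monitor_interval}" '
        'on-fail="standby"',
        # A group keeps its members on one node and starts them in the listed order.
        "group squid-group squid-ip squid",
    ]
    return "\n".join(lines)


print(squid_cluster_config("192.0.2.10"))
```

Listing squid-ip before squid in the group means the address is brought up first and squid is started after it, on the same node.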

I've looked into setting up resource monitoring for squid. Ideally, if squid dies for any reason on the currently active node, I would like both resources (squid and the IP) to fail over to the other node. For monitor operations there is an on-fail action called "standby", which is described as: "Move all resources away from the node on which the resource failed." That sounded like what I want, so I tested it.

Unfortunately, I found that if squid dies uncleanly (simulated by sending its process a kill -9), Pacemaker gets stuck in an endless loop, repeatedly calling the init script to "stop" squid. The init script returns an error code because, in its words, "squid is dead but pid file exists". squid is never started on the other node, because Pacemaker is never satisfied that it has actually stopped on the original node.
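
The message "squid is dead but pid file exists" corresponds to a specific LSB status exit code. The LSB spec defines status codes 0 (running), 1 (program is dead and /var/run pid file exists), 2 (dead and /var/lock lock file exists), 3 (not running) and 4 (unknown). The Python sketch below models how a pid-file-based status action might classify the state. The function names, the injectable is_alive hook (present only for testing), and the choice to treat a corrupt pid file as "unknown" are illustrative assumptions, not squid's actual init script.

```python
import os

# Exit codes for the LSB "status" action.
LSB_STATUS_RUNNING = 0
LSB_STATUS_DEAD_PIDFILE = 1   # program is dead and /var/run pid file exists
LSB_STATUS_DEAD_LOCKFILE = 2  # program is dead and /var/lock lock file exists (unused here)
LSB_STATUS_NOT_RUNNING = 3
LSB_STATUS_UNKNOWN = 4


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lsb_status(pidfile: str, is_alive=pid_alive) -> int:
    """Classify the daemon's state from its pid file, as a pid-file-based status action might."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return LSB_STATUS_NOT_RUNNING
    except (ValueError, OSError):
        return LSB_STATUS_UNKNOWN  # unreadable or corrupt pid file (a design choice)
    if pid <= 0:
        return LSB_STATUS_UNKNOWN
    if is_alive(pid):
        return LSB_STATUS_RUNNING
    # After kill -9 the daemon never gets to delete its pid file, so we land here.
    return LSB_STATUS_DEAD_PIDFILE
```

After a kill -9 the pid file is left behind while the process is gone, so this check reports 1 rather than 3.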

Since a typical unexpected software failure would be an unclean one (a segfault or similar), this monitoring doesn't seem very useful if it always gets stuck trying to "stop" the crashed service before taking any further action. Is there a generally accepted way around this? Should the (LSB) init script be changed to handle this situation differently, or is there some way to get Pacemaker to respond differently?
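
If the init script is the culprit, the relevant rule is in the LSB spec. It says running stop on a service that is already stopped or not running should be treated as success (exit 0), and Pacemaker expects LSB init scripts to follow that rule. The Python sketch below models such a stop action. Several details are assumptions, not squid's real init script: the pid-file handling, the 10-second SIGTERM timeout, treating a corrupt pid file as stale, and the injectable is_alive/kill hooks (present only to make it testable).

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0 (existence check only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, owned by another user
    return True


def remove_quietly(path: str) -> None:
    """Delete a file, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def lsb_stop(pidfile: str, is_alive=pid_alive, kill=os.kill, timeout: float = 10.0) -> int:
    """Stop the daemon recorded in pidfile; 'already stopped' counts as success (exit 0)."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return 0  # no pid file: nothing is running
    except PermissionError:
        return 4  # LSB: user had insufficient privilege
    except ValueError:
        remove_quietly(pidfile)  # corrupt pid file: treat it as stale (a design choice)
        return 0
    if pid > 0 and is_alive(pid):
        kill(pid, signal.SIGTERM)
        deadline = time.monotonic() + timeout
        while is_alive(pid) and time.monotonic() < deadline:
            time.sleep(0.1)
        if is_alive(pid):
            return 1  # generic failure: the daemon ignored SIGTERM
    # Either we just stopped it, or it had already died (e.g. kill -9):
    # clean up the stale pid file and report success instead of an error.
    remove_quietly(pidfile)
    return 0
```

The key difference from the behaviour described above is the "dead but pid file exists" path: this sketch removes the stale pid file and returns 0. That allows the cluster to treat the resource as stopped and start it elsewhere.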

Thanks, 

-Andrew L 