[Pacemaker] Pacemaker resource management

Sun Jun 6 14:47:35 UTC 2010

Hello all,

I have a few questions for which I haven't found any relevant 
documentation, so I would appreciate any answers on the matter.

I'm using drbd 8.3.2-6 with pacemaker 1.0.5-4.2, openais 0.80.5-15.2 and 
heartbeat 3.0.0-33.3 for a two-node high-availability cluster running 
mysql and apache on drbd partitions.

When a resource such as apache fails, pacemaker tries to restart the 
service. From what I can see in the logs, this has to do with 
"common_apply_stickiness". What I want to know is:

1. How many times does pacemaker try to restart a resource before 
declaring it "down" and migrating the resource (and dependencies) to the 
other node?
2. How can I alter this behavior, i.e. set the number of times pacemaker 
attempts to restart a resource before migrating it to the other 
available node?

I've noticed that sometimes, if there is a problem with the block device 
(drbd), the cluster goes into a state where it migrates all resources 
in a group from A to B. However, when it tries to start the resources on 
B, there is a synchronization issue: one block device is still in the 
process of being updated from drbd0 on node A to drbd0 on node B. In this 
case the group resources don't start until the synchronization is complete.

3. Can I "force" a group of resources to migrate to another node if any 
of the resources fails to start within a number of retries or after a 
timeout? This includes the case where the group has just been migrated 
from A to B but one resource fails to start on B, so that the group 
should be migrated back to A. If so, how?
4. Is there a Resource Agent out there that can be configured to send 
SNMP traps?

Thank you in advance for your replies.

Best regards.