[Pacemaker] Resources not migrating on node failure?

Wed Dec 1 11:32:39 UTC 2010

On 12/1/2010 at 05:11 AM, Anton Altaparmakov <aia21 at cam.ac.uk> wrote: 
> Hi, 
>  
> I have set up a three node cluster (running Ubuntu 10.04 LTS server with  
> Corosync 1.2.0, Pacemaker 1.0.8, drbd 8.3.7), where one node is only present  
> to provide quorum to the other two nodes in case one node fails but it itself  
> cannot run any resources.  The other two nodes are running drbd in  
> master/slave to provide replicated storage, then XFS file system on top of  
> the drbd storage on the master, together with an NFS server on top of the XFS  
> mount, and a service IP address on which the NFS export is shared.  This is  
> all working brilliantly and I can cause the resources to move to the slave  
> node by running "crm_standby -U cerberus -v on" where cerberus is the master  
> node and everything then migrates to the slave node "minotaur". 
>  
> My problem is if I pull the power plug on the master node "cerberus".  Then  
> nothing happens!  minotaur continues to run as slave and it never takes over. 
>  
> And I don't get why.  )-: 

Probably because STONITH is disabled.  minotaur can't take over the resources
unless it knows they're stopped on cerberus, and without a clean shutdown,
there's no way to guarantee they're stopped without STONITH.
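
A quick way to check this on minotaur is to look for the stonith-enabled
property in the output of "crm configure show".  The helper below is only a
rough sketch: stonith_state is a made-up name, and it merely pattern-matches
the property in the configuration text rather than querying the cluster.

```shell
# stonith_state (hypothetical helper): read crm configuration text on stdin
# and print "enabled", "disabled", or "unset" for the stonith-enabled property.
stonith_state() {
    # Pull out the value, with or without surrounding double quotes.
    state=$(sed -n 's/.*stonith-enabled="\{0,1\}\([A-Za-z0-9]*\)"\{0,1\}.*/\1/p' | tail -n 1)
    case "$state" in
        true|yes|on|1)  echo enabled ;;
        false|no|off|0) echo disabled ;;
        *)              echo unset ;;   # property not present in the text
    esac
}

# Usage on a cluster node (needs the crm shell installed):
#   crm configure show | stonith_state
# Once a STONITH device resource is configured ("stonith -L" lists the
# available plugin types), fencing can be switched on with:
#   crm configure property stonith-enabled=true
```

If this prints "unset", the property is not in the configuration at all and
Pacemaker falls back to its default, which is to have STONITH enabled.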

> Also, a second question, possibly related to the first problem, is do I need  
> to define monitor actions for each resource or is that done automatically?

No, you need to define them.

> If I need to do it specifically, how do I do that now that I have it all up  
> and running without defining monitor actions? 

Run "crm configure edit" and add whichever monitor ops you need.
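
As a rough illustration, the primitives in the editor might end up looking
something like this once the "op monitor" lines are appended.  The resource
names (p_ip, p_drbd), the address, the drbd resource name and the intervals
are all placeholders, not taken from your configuration:

```
# Placeholder names and values; only the "op monitor" lines are the addition.
primitive p_ip ocf:heartbeat:IPaddr2 \
        params ip="192.0.2.10" cidr_netmask="24" \
        op monitor interval="30s" timeout="20s"
primitive p_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
```

For the master/slave drbd resource, the two monitor ops need different
intervals, one per role.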

Have a look at Clusters from Scratch at:

  http://www.clusterlabs.org/wiki/Documentation

HTH,

Tim

-- 
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.