[Pacemaker] Action "unknown exec error" and unmanaged/failed resources, how to migrate?

Mon Apr 29 23:05:32 EDT 2013

On 29/04/2013, at 2:48 PM, Mark Williams <mwp at mwp.id.au> wrote:

> Hi all,
> 
> My two node cluster (qemu VMs, drbd) is now in quite a messy state.
> 
> The problem started with an unresponsive qemu VM, which appeared to be
> caused by a libvirtd problem/bug.
> Others said the solution was to kill & restart libvirtd, which didn't help.
> To fix the problem, I figured putting node1 into a clean state via a
> server reboot would be the best idea, so I issued a crm standby
> command.

That doesn't initiate a reboot.
It only tells pacemaker to try to stop all the resources running on node1.
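
To illustrate the difference, here is a rough sketch of the standby workflow, assuming crmsh is installed and the node is called node1 as in the question. The run() helper is only an illustrative wrapper: it prints each command unless RUN=1 is set, so nothing touches the cluster by accident.

```shell
# Dry-run by default: print each command; set RUN=1 to execute it on a cluster node.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

NODE="node1"   # node name taken from the question

run crm node standby "$NODE"   # asks pacemaker to stop resources on the node; no reboot happens
run crm_mon -1                 # one-shot status, to see whether the stops actually succeeded
run crm node online "$NODE"    # later: take the node out of standby again
```

If a reboot is still wanted, it has to be done separately once crm_mon shows nothing left running on the node.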

> 
> I now have node1 in a standby state, but the resources/VMs that were
> (and still are) running on it have a "Master Started  (unmanaged)
> FAILED" state according to crm_mon.

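Most likely because they refused to stop and you have no fencing.
When a stop fails and the node can't be fenced, pacemaker has no safe way to recover the resource, so it leaves it alone (unmanaged).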
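
A quick way to check both halves of that guess, sketched with the same illustrative dry-run run() wrapper (prints the commands unless RUN=1 is set). It assumes crmsh and the standard pacemaker command-line tools are available on the node.

```shell
# Dry-run by default: print each command; set RUN=1 to execute it on a cluster node.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# Is fencing enabled at all?
run crm_attribute --type crm_config --name stonith-enabled --query

# Are any stonith resources actually configured?
run crm configure show

# Which resources have failed operations (e.g. a failed stop), and how often?
run crm_mon -1 --failcounts
```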

> 
> Any action I try to perform on that node (for example, moving a
> resource to the other node) results in an "unknown exec error".
> 
> I tried using crm_resource -C on a node1 "Started  (unmanaged) FAILED"
> resource, which changed its state to "Master (unmanaged) FAILED" (it
> did shut down the running qemu VM).
> Trying to move that resource to node2 still fails with "unknown exec error".
> 
> How do I get out of this problem?

Step 1 is to provide more information, most likely from your logs.
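
One possible way to collect that information, as a sketch only. The start time and destination path below are placeholders, not values from this thread; pick a time shortly before the VM became unresponsive. As before, run() is an illustrative wrapper that prints the commands unless RUN=1 is set.

```shell
# Dry-run by default: print each command; set RUN=1 to execute it on a cluster node.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

SINCE="2013-04-29 00:00:00"   # placeholder: start of the period to collect
DEST="/tmp/pcmk-report"       # hypothetical destination for the report archive

run crm_report -f "$SINCE" "$DEST"       # gather logs and cluster state from the nodes
run crm_verify --live-check --verbose    # check the running configuration for errors
run crm_mon -1 --failcounts              # current status, including failed operations
```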