[Pacemaker] Cannot start VirtualDomain resource after restart
Phil Frost
phil at macprofessionals.com
Wed Jun 20 16:40:29 CEST 2012
On 06/20/2012 10:11 AM, emmanuel segura wrote:
> I don't know, but the failure is in the operation lx0_monitor_0, so
> I'd ask someone with more experience than me: does Pacemaker do a
> monitor operation before start?
I'm just learning Pacemaker myself, so I could be wrong on some points.
I don't have any specific solutions to give, but I can share some
troubleshooting techniques that might give some deeper insight into what
is happening.
Firstly, I'd try running "crm_simulate -LS -D pacemaker.dot", then
viewing the generated pacemaker.dot with graphviz [1] (specifically its
"dot" program; passing pacemaker.dot through "tred" first can also make
the graph more readable). This asks crm_simulate to simulate what
Pacemaker would like to do (-S), given the current live state (-L).
Probably it will tell you it would do nothing, because everything is
already running in the state Pacemaker considers desired. However, I have
seen instances in testing where Pacemaker gets stuck in a start ->
monitor -> timeout loop that's not immediately obvious in crm_mon. This
will reveal that.
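If the graph is large, a quick first check is simply whether it contains any actions at all. The sketch below lists the action nodes declared in pacemaker.dot. It assumes each action is declared on its own line as a quoted name followed by an attribute list (e.g. "lx0_start_0 node01" [ ... ]) and that ordering edges use "a" -> "b"; the function name planned_actions is hypothetical.

```python
import re

# Assumed DOT layout: an action declaration is a quoted name followed by "[",
# e.g.  "lx0_start_0 node01" [ style=bold color="green" fontcolor="black"]
# Edge lines ("a" -> "b" [ ... ]) do not match because "->" follows the name.
ACTION_DECL = re.compile(r'^\s*"([^"]+)"\s*\[')


def planned_actions(dot_text: str) -> list[str]:
    """Return the action names declared in a crm_simulate DOT file."""
    actions = []
    for line in dot_text.splitlines():
        match = ACTION_DECL.match(line)
        if match:
            actions.append(match.group(1))
    return actions
```

For example, planned_actions(open("pacemaker.dot").read()) returning an empty list suggests Pacemaker has nothing left to do; a non-empty list shows what it still wants to change.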
You can also use crm_simulate to see what Pacemaker would do if you
rebooted everything. This can give you some insight because it removes
the current state of all your nodes from the equation. To do this, you
have to generate a CIB dump without a status section. You can do that by
manually editing the output of "cibadmin -Q", but an easier way is to
run "crm configure show xml". Since there's no status section,
crm_simulate will assume the nodes are offline, so you also have to use
the "-u" option to tell it to simulate the nodes coming online. Putting
that all together, you get something like this:
crm configure show xml | crm_simulate -Sp -D pacemaker.dot -u node01 -u node02 [-u node03 ...]
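If you would rather take the "cibadmin -Q" route, the status section can be dropped programmatically instead of by hand. This is a minimal sketch, assuming <status> is a direct child of the <cib> root element; the helper names strip_status, simulate_command and run_offline_simulation are hypothetical, and the crm_simulate flags are the same ones used in the command above.

```python
import subprocess
import xml.etree.ElementTree as ET


def strip_status(cib_xml: str) -> str:
    """Remove the <status> section from a CIB dump such as `cibadmin -Q` output."""
    root = ET.fromstring(cib_xml)
    # <status> sits directly under the <cib> root, next to <configuration>.
    for status in root.findall("status"):
        root.remove(status)
    return ET.tostring(root, encoding="unicode")


def simulate_command(nodes: list[str], dot_file: str = "pacemaker.dot") -> list[str]:
    """Build the crm_simulate argument list: read the CIB from stdin, one -u per node."""
    cmd = ["crm_simulate", "-Sp", "-D", dot_file]
    for node in nodes:
        cmd += ["-u", node]
    return cmd


def run_offline_simulation(nodes: list[str]) -> None:
    """Dump the live CIB, strip its status, and simulate the nodes coming online."""
    cib = subprocess.run(["cibadmin", "-Q"], capture_output=True,
                         text=True, check=True).stdout
    subprocess.run(simulate_command(nodes), input=strip_status(cib),
                   text=True, check=True)
```

On a cluster node, run_offline_simulation(["node01", "node02"]) should behave like the one-liner above, with the node names adjusted as usual.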
Of course you will have to adjust the node names to suit your
environment. You should see Pacemaker wanting to start all your
resources. If not, there's probably something in your configuration that
prevents it from doing so. Coincidentally, you will also see here the
answer to your question: Pacemaker does run a monitor of a resource on
all nodes before starting it. This one-off monitor with interval 0 (a
"probe", hence the name lx0_monitor_0) lets Pacemaker avoid starting a
resource that is already running without its knowledge.
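To confirm the probe-before-start ordering in the generated graph, you can check whether the start action is reachable from the probe along the ordering edges. The probe may be linked to the start through intermediate pseudo-actions rather than a direct edge, so this sketch does a breadth-first search. It assumes edges are written one per line as "before" -> "after"; the function names are hypothetical.

```python
import re
from collections import defaultdict, deque

# Assumed edge format: "before_action node" -> "after_action node" [ style = bold]
EDGE = re.compile(r'^\s*"([^"]+)"\s*->\s*"([^"]+)"')


def parse_edges(dot_text: str) -> dict[str, list[str]]:
    """Map each action to the actions that must run after it."""
    graph = defaultdict(list)
    for line in dot_text.splitlines():
        match = EDGE.match(line)
        if match:
            before, after = match.groups()
            graph[before].append(after)
    return graph


def happens_before(dot_text: str, first: str, second: str) -> bool:
    """True if `second` is reachable from `first` via ordering edges (BFS)."""
    graph = parse_edges(dot_text)
    seen = {first}
    queue = deque([first])
    while queue:
        node = queue.popleft()
        if node == second:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For example, happens_before(text, "lx0_monitor_0 node01", "lx0_start_0 node01") should be True in a graph where Pacemaker probes lx0 before starting it.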
If all that proves unfruitful, you can continue to run other "what-if"
tests by dumping the current CIB with "cibadmin -Q", editing it, and
passing it into crm_simulate. In this way you can make some guesses
about what's wrong and test your hypotheses.
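As one example of such a what-if test, you might suspect that a particular constraint is keeping the resource from starting. The sketch below removes one constraint by id from a "cibadmin -Q" dump and feeds the result to crm_simulate. It assumes constraints live under configuration/constraints. The constraint id is whatever your configuration uses, and the function names are hypothetical. Because a "cibadmin -Q" dump still contains the status section, no -u options are needed here.

```python
import subprocess
import xml.etree.ElementTree as ET


def without_constraint(cib_xml: str, constraint_id: str) -> str:
    """Return the CIB with the constraint whose id is `constraint_id` removed."""
    root = ET.fromstring(cib_xml)
    constraints = root.find("configuration/constraints")
    if constraints is None:
        raise ValueError("CIB has no <constraints> section")
    for element in list(constraints):
        if element.get("id") == constraint_id:
            constraints.remove(element)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(constraint_id)


def simulate_without(constraint_id: str) -> None:
    """What-if: would Pacemaker start things if this constraint were gone?"""
    cib = subprocess.run(["cibadmin", "-Q"], capture_output=True,
                         text=True, check=True).stdout
    edited = without_constraint(cib, constraint_id)
    # -S simulates the transition; -p reads the (edited) CIB from stdin.
    subprocess.run(["crm_simulate", "-Sp"], input=edited, text=True, check=True)
```

If the simulation starts the resource only once the constraint is removed, that constraint is a good place to look next.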
[1] http://www.graphviz.org/