[Pacemaker] are stopped resources monitored?

James Harper james.harper at bendigoit.com.au
Tue Nov 29 21:26:38 EST 2011


> >
> > That thread goes around in circles and completely contradicts what I'm
> > seeing. What I'm seeing is that unmanaged resources are never monitored.
>
> would be strange and how do you verify this? A look at your config may also
> help to shed some light on this ...
> 

The relevant portions of the config are:

primitive p_xen_smtp2 ocf:heartbeat:Xen \
        params name="smtp2" xmfile="/configs/xen/smtp2" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="300s" \
        op migrate_from interval="0" timeout="300s" \
        op migrate_to interval="0" timeout="300s" \
        op monitor interval="10s" timeout="30s" \
        meta allow-migrate="true"

property $id="cib-bootstrap-options" \
        dc-version="1.0.11-6e010d6b0d49a6b929d17c0114e9d2d934dc8e04" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1322100376"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"

I just tested the following (it actually contradicts some of my previous
statements... but I'm including it anyway as it wasn't what I expected):
 
. VM is running on node bitvs6 as a managed resource
. I type "crm resource unmanage p_xen_smtp2"
. crm status is "Started bitvs6 (unmanaged)"
. I manually stop the VM outside crm
. A few seconds later, the status is "Started bitvs6 (unmanaged)
FAILED" with a failed action "p_xen_smtp2_monitor_10000 (node=bitvs6,
call=70, rc=7, status=complete): not running"... so okay... it did
monitor an unmanaged and _running_ resource, even though it resulted in
an error
. I type "crm resource cleanup p_xen_smtp2"
. hangs for ages at "Waiting for 3 replies from the CRMd.No messages
received in 60 seconds.." then finally says "aborting"
. I type "crm resource stop p_xen_smtp2"
. hangs for a bit then says "Call cib_replace failed (-41): Remote node
did not respond"

Any further attempt to do anything with this resource just hangs...
maybe the Xen RA monitor script is broken? I can only fix it by starting
the VM manually so that the actual status matches crm's expected
resource status.

So starting again to demonstrate the problem:
. VM is running on node bitvs6 as a managed resource
. I type "crm resource stop p_xen_smtp2"
. VM shuts down as expected
. I type "crm resource unmanage p_xen_smtp2"
. I manually start the VM outside of crm
. crm _never_ notices that the resource is started unless I do something
like "crm resource cleanup p_xen_smtp2" to manually cause the monitoring
script to be run

Now the above is all about unmanaged resources, but this VM is one I
could rebuild easily enough so now I'm going to get tricky:

. VM is running on node bitvs6 as a managed resource
. I type "crm resource stop p_xen_smtp2"
. VM shuts down as expected
. I manually start the VM outside of crm
. crm still _never_ notices that the resource is started unless I do
something like "crm resource cleanup p_xen_smtp2" to manually cause the
monitoring script to be run
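
For anyone who wants to reproduce this last case, the steps can be
sketched roughly as the script below. It is only a sketch under some
assumptions: the crm commands are the ones quoted above, but starting
the VM by hand with "xm create", the 30 second wait and the run()
helper are my own illustration, not something from the config. By
default it only prints the commands; set DRY_RUN=0 on a test cluster to
actually run them.

```shell
#!/bin/sh
# Sketch of the managed-resource case: stop via crm, then start the VM
# behind the cluster's back and see whether the cluster notices.
RSC="p_xen_smtp2"            # resource name from the config above
XMFILE="/configs/xen/smtp2"  # xmfile parameter from the config above
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands (default)

# Hypothetical helper: print the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run crm resource stop "$RSC"     # VM shuts down as expected
run xm create "$XMFILE"          # assumption: VM started by hand with xm
run sleep 30                     # a few 10s monitor intervals
run crm_mon -1                   # one-shot status; here it still showed stopped
run crm resource cleanup "$RSC"  # forces a re-probe; only now is the VM noticed
```

The dry-run default is deliberate, since the stop and cleanup steps
change cluster state and should not be run by accident on a production
node.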

This really is unexpected behaviour... starting the resource in crm
causes the right things to happen (notices that the resource is running)
but I still expected that a stopped resource would be monitored...

Thanks

James





