[Pacemaker] are stopped resources monitored?

Andrew Beekhof andrew at beekhof.net
Thu Dec 8 22:17:39 CET 2011


On Wed, Nov 30, 2011 at 1:26 PM, James Harper
<james.harper at bendigoit.com.au> wrote:
>> >
>> > That thread goes around in circles and completely contradicts what I'm
>> > seeing. What I'm seeing is that unmanaged resources are never monitored.
>>
>> would be strange and how do you verify this? A look at your config may also
>> help to shed some light on this ...
>>
>
> The relevant portions of the config are:
>
> primitive p_xen_smtp2 ocf:heartbeat:Xen \
>        params name=" smtp2" xmfile="/configs/xen/smtp2" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="300s" \
>        op migrate_from interval="0" timeout="300s" \
>        op migrate_to interval="0" timeout="300s" \
>        op monitor interval="10s" timeout="30s" \
>        meta allow-migrate="true"
>
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.11-6e010d6b0d49a6b929d17c0114e9d2d934dc8e04" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore" \
>        last-lrm-refresh="1322100376"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="200"
>
> I just tested the following (it actually contradicts some of my previous
> statements... but I'm including it anyway as it wasn't what I expected):
>
> . VM is running on node bitvs6 as a managed resource
> . I type "crm resource unmanage p_xen_smtp2"
> . crm status is "Started bitvs6 (unmanaged)"
> . I manually stop the VM outside crm
> . A few seconds later, the status is " Started bitvs6 (unmanaged)
> FAILED" with a failed action " p_xen_smtp2_monitor_10000 (node=bitvs6,
> call=70, rc=7, status=complete): not running"... so okay... it did
> monitor an unmanaged and _running_ resource, even though it resulted in an
> error

So far so good.

> . I type "crm resource cleanup p_xen_smtp2"

What for?
This has the side effect of stopping any recurring monitor action that
was running.

> . hangs for ages at "Waiting for 3 replies from the CRMd.No messages
> received in 60 seconds.." then finally says "aborting"
> . I type "crm resource stop p_xen_smtp2"
> . hangs for a bit then says " Call cib_replace failed (-41): Remote node
> did not respond"

That doesn't look good at all.
At a guess, it seems like something crashed.  If you want to file a
bug and attach a crm_report I'll take a look.

>
> Any further attempt to do anything with this resource just hangs...
> maybe the Xen RA monitor script is broken? I can only fix it by starting
> the VM manually so that the actual status matches crm's expected
> resource status.
>
> So starting again to demonstrate the problem:
> . VM is running on node bitvs6 as a managed resource
> . I type "crm resource stop p_xen_smtp2"
> . VM shuts down as expected
> . I type "crm resource unmanage p_xen_smtp2"
> . I manually start the VM outside of crm
> . crm _never_ notices that the resource is started unless I do something
> like "crm resource cleanup p_xen_smtp2" to manually cause the monitoring
> script to be run

The 1.1.x series will detect this if you specify a recurring monitor
with role=Stopped, but it's not the default behaviour because, well,
"don't do that".
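
As a rough sketch only, assuming crm shell syntax on a 1.1.x cluster, the
extra operation could sit next to the existing monitor in the primitive
quoted above. The 15s interval is an arbitrary illustrative choice; it is
picked to differ from the existing 10s monitor, since two monitor
operations on the same resource should not share an interval:

       op monitor interval="10s" timeout="30s" \
       op monitor interval="15s" timeout="30s" role="Stopped" \

With something like this in place, the role=Stopped monitor runs where the
cluster expects the resource to be stopped, which is how a VM started by
hand outside crm would be noticed.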

>
> Now the above is all about unmanaged resources, but this VM is one I
> could rebuild easily enough so now I'm going to get tricky:
>
> . VM is running on node bitvs6 as a managed resource
> . I type "crm resource stop p_xen_smtp2"
> . VM shuts down as expected
> . I manually start the VM outside of crm
> . crm still _never_ notices that the resource is started unless I do
> something like "crm resource cleanup p_xen_smtp2" to manually cause the
> monitoring script to be run

As above.

>
> This really is unexpected behaviour... starting the resource in crm
> causes the right things to happen (notices that the resource is running)
> but I still expected that a stopped resource would be monitored...

No, not by default.
There should be only one point of control. You're creating an internal
split-brain by telling the cluster to control the resource AND doing
so yourself in parallel.


