[Pacemaker] monitor operation stopped running
Chris Picton
chris at ecntelecoms.com
Fri Dec 17 09:56:53 UTC 2010
On Thu, 16 Dec 2010 08:27:51 +0100, Andrew Beekhof wrote:
> On Wed, Dec 15, 2010 at 8:30 AM, Chris Picton
>> Why would a resource cleanup remove the resource from the lrm, even
>> though it is still running correctly,
>
> Thats what cleanup does.
> What is supposed to happen next however, is that the cluster runs a
> non-recurring monitor operation to re-determine the current state of the
> cluster and go from there.
> Also, any recurring actions should have been cancelled at the point the
> resource was removed from the lrm.
>
> What versions of pacemaker and cluster-glue do you have? Distro?
>
I am using the clusterlabs rpms:
pacemaker-1.0.9.1-1.15.el5
cluster-glue-1.0.6-1.6.el5
I see the following in the output of crm_mon -rf1t (I'm only showing the
resources which have operations with rc != 0):
* Node sbc-tpna2-06.ecntelecoms.za.net: pingd=100
  megaswitch:5: migration-threshold=1000000
    + (53) probe: last-rc-change='Fri Nov 26 09:17:38 2010' last-run='Fri Nov 26 09:17:38 2010' exec-time=30ms queue-time=0ms rc=1 (unknown error)
    + (55) stop: last-rc-change='Fri Nov 26 09:17:41 2010' last-run='Fri Nov 26 09:17:41 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
    + (56) start: last-rc-change='Fri Nov 26 09:17:42 2010' last-run='Fri Nov 26 09:17:42 2010' exec-time=1040ms queue-time=0ms rc=0 (ok)
    + (57) monitor: interval=8000ms last-rc-change='Fri Nov 26 09:17:44 2010' last-run='Fri Nov 26 09:17:44 2010' exec-time=260ms queue-time=0ms rc=0 (ok)
* Node sbc-tpna2-05.ecntelecoms.za.net: pingd=100
  megaswitch:4: migration-threshold=1000000
    + (58) probe: last-rc-change='Fri Nov 26 09:17:38 2010' last-run='Fri Nov 26 09:17:38 2010' exec-time=30ms queue-time=0ms rc=1 (unknown error)
    + (60) stop: last-rc-change='Fri Nov 26 09:17:41 2010' last-run='Fri Nov 26 09:17:41 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
    + (61) start: last-rc-change='Fri Nov 26 09:17:42 2010' last-run='Fri Nov 26 09:17:42 2010' exec-time=1040ms queue-time=0ms rc=0 (ok)
    + (62) monitor: interval=8000ms last-rc-change='Fri Nov 26 09:17:44 2010' last-run='Fri Nov 26 09:17:44 2010' exec-time=260ms queue-time=0ms rc=0 (ok)
Would this affect the result of the 'non-recurring monitor
operation' (the probe operations having rc=1)?
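For anyone who wants to pick the rc != 0 entries out of a longer
operation history, here is a minimal Python sketch. It is not part of
Pacemaker. It only assumes the text layout shown in the output above,
including lines wrapped by the mail client. The function name failed_ops
and the SAMPLE text are illustrative.

```python
import re

# Shortened copy of the operation history above, including the mail wrapping.
SAMPLE = """\
* Node sbc-tpna2-06.ecntelecoms.za.net: pingd=100
megaswitch:5: migration-threshold=1000000
+ (53) probe: last-rc-change='Fri Nov 26 09:17:38 2010' last-run='Fri
Nov 26 09:17:38 2010' exec-time=30ms queue-time=0ms rc=1 (unknown error)
+ (55) stop: last-rc-change='Fri Nov 26 09:17:41 2010' last-run='Fri
Nov 26 09:17:41 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
"""

NODE_RE = re.compile(r"^\* Node (\S+):")
RSC_RE = re.compile(r"^(\S+): migration-threshold=")
OP_RE = re.compile(r"^\+ \((\d+)\) (\w+):.*\brc=(\d+) \(([^)]*)\)")


def failed_ops(text):
    """Return (node, resource, call_id, op, rc, reason) for each op with rc != 0."""
    node = resource = None
    records = []  # each entry: [node, resource, joined operation line]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        node_match = NODE_RE.match(line)
        rsc_match = RSC_RE.match(line)
        if node_match:
            node = node_match.group(1)
        elif rsc_match:
            resource = rsc_match.group(1)
        elif line.startswith("+ ("):
            records.append([node, resource, line])
        elif records:
            # Anything else is treated as the wrapped tail of the previous op.
            records[-1][2] += " " + line

    failed = []
    for rec_node, rec_rsc, op_line in records:
        match = OP_RE.match(op_line)
        if match and int(match.group(3)) != 0:
            call_id, op, rc, reason = match.groups()
            failed.append((rec_node, rec_rsc, int(call_id), op, int(rc), reason))
    return failed


if __name__ == "__main__":
    for entry in failed_ops(SAMPLE):
        print(entry)
```

On the sample this reports only the probe (call 53) on megaswitch:5. The
later stop, start and monitor entries have rc=0 and are skipped.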
I am not 100% sure why the errors are there - the log on the server for
that day shows:
----
Nov 26 09:17:39 sbc-tpna2-06 crmd: [29893]: info: do_lrm_rsc_op: Performing key=36:2184:7:c83a06e0-913e-4546-92e5-19f784dcaf5c op=megaswitch:5_monitor_0 )
Nov 26 09:17:39 sbc-tpna2-06 lrmd: [29890]: info: rsc:megaswitch:5:53: probe
Nov 26 09:17:39 sbc-tpna2-06 lrmd: [29890]: WARN: Managed megaswitch:5:monitor process 24823 exited with return code 1.
Nov 26 09:17:39 sbc-tpna2-06 lrmd: [29890]: WARN: Managed megaswitch:5:monitor process 24823 exited with return code 1.
Nov 26 09:17:39 sbc-tpna2-06 crmd: [29893]: info: process_lrm_event: LRM operation megaswitch:5_monitor_0 (call=53, rc=1, cib-update=68, confirmed=true) unknown error
----
If they are affecting it, how would I clear them, so pacemaker sees
everything as OK?
Thanks for the help
Chris