[Pacemaker] failure of a monitor operation would be disappeared from crm_mon
Junko IKEDA
ikedaj at intellilink.co.jp
Tue Sep 9 01:34:35 UTC 2008
> because it is no longer in the current start/stop series for the
> resource.
>
> >
> > It might be an expected behavior for now,
>
> it is.
>
> >
> > it would be convenient if crm_mon can keep showing some past failures.
>
> it can't display them forever. they are not (and should not be) kept
> in the CIB forever as it would cause the CIB size to explode.
It's true that the CIB would get larger if it kept them.
I'd like to ask one more question about this case:
(1) The resource starts on the DC
============
Last updated: Tue Sep 9 09:51:51 2008
Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
2 Nodes configured.
1 Resources configured.
============
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online
Full list of resources:
dummy (ocf::heartbeat:Dummy): Started node-b
Operations:
* Node node-b:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
* Node node-a:
(2) The resource fails over from the DC to the non-DC node
============
Last updated: Tue Sep 9 09:52:17 2008
Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
2 Nodes configured.
1 Resources configured.
============
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online
Full list of resources:
dummy (ocf::heartbeat:Dummy): Started node-a
Operations:
* Node node-b:
dummy: fail-count=1
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=7 (not running)
+ stop: rc=0 (ok)
* Node node-a:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
Failed actions:
dummy_monitor_10000 (node=node-b, call=4, rc=7): complete
(3) Stop the non-DC node
============
Last updated: Tue Sep 9 09:52:45 2008
Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
2 Nodes configured.
1 Resources configured.
============
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): OFFLINE
Full list of resources:
dummy (ocf::heartbeat:Dummy): Stopped
Operations:
* Node node-b:
dummy: fail-count=1
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=7 (not running)
+ stop: rc=0 (ok)
* Node node-a:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
+ stop: rc=0 (ok)
Failed actions:
dummy_monitor_10000 (node=node-b, call=4, rc=7): complete
It seems that the DC keeps its own failure history even after the resource has moved away.
Does that mean dummy_monitor_10000 (call=4) is still in the current start/stop
series?
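Since crm_mon cannot keep past failures forever, one workaround might be to record them outside the CIB. Below is a minimal Python sketch, not anything Pacemaker provides: it only assumes the "Failed actions:" line format shown in the output above, and the names parse_failed_actions and FAILED_ACTION are made up for illustration. It reads saved crm_mon text and returns the failed actions so they could be appended to a log.

```python
import re

# One entry under "Failed actions:", in the format pasted above, e.g.
#   dummy_monitor_10000 (node=node-b, call=4, rc=7): complete
# (assumption: other crm_mon versions may print this differently)
FAILED_ACTION = re.compile(
    r"^\s*(?P<op>\S+) \(node=(?P<node>[^,]+), "
    r"call=(?P<call>\d+), rc=(?P<rc>\d+)\): (?P<status>\S+)\s*$"
)


def parse_failed_actions(crm_mon_text):
    """Return the entries listed after 'Failed actions:' in saved crm_mon text."""
    failures = []
    in_section = False
    for line in crm_mon_text.splitlines():
        if line.strip() == "Failed actions:":
            in_section = True
            continue
        if not in_section or not line.strip():
            continue  # outside the section, or a blank line inside it
        match = FAILED_ACTION.match(line)
        if match is None:
            in_section = False  # first non-matching line ends the section
            continue
        failures.append({
            "op": match.group("op"),
            "node": match.group("node"),
            "call": int(match.group("call")),
            "rc": int(match.group("rc")),
            "status": match.group("status"),
        })
    return failures


sample = """\
Operations:
* Node node-a:
Failed actions:
dummy_monitor_10000 (node=node-b, call=4, rc=7): complete
"""
print(parse_failed_actions(sample))
```

Running this periodically against saved crm_mon output and appending any new entries to a file would preserve the history even after the entry is gone from the CIB.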
Thanks,
Junko