[Pacemaker] monitoring action fails

Raoul Bhatia [IPAX] r.bhatia at ipax.at
Wed Nov 12 12:19:21 UTC 2008


hi,

i have a cluster with several resources.

i issued crm_resource -P (reprobe) and now the cluster is stuck in a
strange state, which it cannot resolve by itself:

> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): standby
...
> Master/Slave Set: ms_drbd_www
>     drbd_www:0  (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>     drbd_www:1  (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
...
> Master/Slave Set: ms_drbd_mysql
>     drbd_mysql:0        (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>     drbd_mysql:1        (ocf::heartbeat:drbd) Master [  wc01    wc02 ]

failed actions:
> Failed actions:
>     drbd_www:1_monitor_0 (node=wc02, call=13666, rc=0): complete
>     drbd_www:0_monitor_0 (node=wc02, call=13665, rc=0): complete
>     drbd_mysql:1_monitor_0 (node=wc02, call=13672, rc=0): complete
>     drbd_mysql:0_monitor_0 (node=wc02, call=13671, rc=0): complete

those monitoring failures repeat continuously. in the logfiles i find:
...
> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16 (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:0_monitor_0, magic=0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort priority upgraded from 0 to 1
> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort action done superceeded by restart
> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:0_monitor_0 (16) confirmed on wc02 (rc=4)
> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17 (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:1_monitor_0, magic=0:0;17:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:1_monitor_0 (17) confirmed on wc02 (rc=4)
...
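
for reference, "target: 8 vs. rc: 0" means the probe was expected to
return 8 (OCF_RUNNING_MASTER) but 0 (OCF_SUCCESS) was recorded instead.
below is a minimal sketch for pulling these mismatches out of a log.
the function name summarise_probe_failures is made up for illustration,
and the sed pattern has only been matched against the lines quoted above,
so other versions may format the message differently:

```shell
#!/bin/sh
# summarise "status_from_rc" warnings read from stdin:
# operation id, node, the rc the crm expected (target) and the rc recorded.
# (hypothetical helper, not part of pacemaker)
summarise_probe_failures() {
    sed -n 's/.*status_from_rc: Action [0-9]* (\([^)]*\)) on \([^ ]*\) failed (target: \([0-9]*\) vs\. rc: \([0-9]*\)).*/\1 \2 expected=\3 got=\4/p'
}

# sample input: log lines copied from the excerpt above
summary=$(summarise_probe_failures <<'EOF'
crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16 (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort priority upgraded from 0 to 1
crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17 (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
EOF
)
echo "$summary"
```

on a real node the same function could be fed the actual logfile on
stdin instead of the here-document.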

i put some debug information into the drbd ocf ra:
> #!/bin/sh
> echo "----" >> /tmp/lalala

but /tmp/lalala stays empty. if i manually call the drbd ra with
all parameters i get the expected rc 8.
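
a rough sketch of such a manual call, for anyone who wants to reproduce
it. the ra path, the OCF_ROOT default and the drbd resource name "www"
are assumptions for illustration, and a master/slave resource may need
further OCF_RESKEY_* variables than the single one set here. the helper
names ocf_rc_name and run_monitor are made up; the numeric codes are the
standard ocf return codes:

```shell
#!/bin/sh
# map a numeric ocf return code to its symbolic name
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

# ASSUMPTION: location of the drbd ra; adjust to the local installation
RA=${RA:-/usr/lib/ocf/resource.d/heartbeat/drbd}

# run the ra's monitor action by hand; $1 is the drbd resource name
run_monitor() {
    OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}" \
    OCF_RESKEY_drbd_resource="$1" \
        "$RA" monitor
}

if [ -x "$RA" ]; then
    run_monitor www          # "www" is an assumed resource name
    rc=$?
    echo "monitor returned $rc ($(ocf_rc_name "$rc"))"
else
    echo "no resource agent at $RA, skipping the manual call"
fi
```

printing the symbolic name next to the number makes it easier to compare
a manual run against the "target" value in the crmd log.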

hb_report http://ip52.ipax.at/~raoul/cluster/no_monitor_action.tar.gz
(it's kinda big as a lot of actions failed)

cheers,
raoul

ps: i already tried to revoke the crm_standby, but this does not
resolve the error messages and does not call the drbd ocf ra.
-- 
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia at ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            office at ipax.at
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________



