[Pacemaker] monitoring action fails
Raoul Bhatia [IPAX]
r.bhatia at ipax.at
Wed Nov 12 12:19:21 UTC 2008
hi,
i have a cluster with several resources.
i issued crm_resource -P and the cluster is now in a strange
state that it cannot resolve by itself:
> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): standby
...
> Master/Slave Set: ms_drbd_www
> drbd_www:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
> drbd_www:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
...
> Master/Slave Set: ms_drbd_mysql
> drbd_mysql:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
> drbd_mysql:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
failed actions:
> Failed actions:
> drbd_www:1_monitor_0 (node=wc02, call=13666, rc=0): complete
> drbd_www:0_monitor_0 (node=wc02, call=13665, rc=0): complete
> drbd_mysql:1_monitor_0 (node=wc02, call=13672, rc=0): complete
> drbd_mysql:0_monitor_0 (node=wc02, call=13671, rc=0): complete
those monitoring failures repeat continuously. in the logfiles i find:
...
> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16 (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:0_monitor_0, magic=0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort priority upgraded from 0 to 1
> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort action done superceeded by restart
> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:0_monitor_0 (16) confirmed on wc02 (rc=4)
> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17 (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:1_monitor_0, magic=0:0;17:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:1_monitor_0 (17) confirmed on wc02 (rc=4)
...
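for reference, the "target: 8 vs. rc: 0" in these lines compares two ocf
return codes: the probe was expected to report a running master (8) but
reported plain success (0). a minimal sketch that names the standard ocf
return codes; the function name ocf_rc_name is made up for illustration
and is not part of pacemaker or heartbeat:

```sh
#!/bin/sh
# ocf_rc_name RC
# Print the symbolic name of a standard OCF resource agent return code.
# (ocf_rc_name is a hypothetical helper, not an existing tool.)
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown ($1)" ;;
    esac
}
```

calling "ocf_rc_name 8" and "ocf_rc_name 0" names the target and the
reported code from the log lines.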
i put some debug information into the drbd ocf ra:
> #!/bin/sh
> echo "----" >> /tmp/lalala
but /tmp/lalala stays empty. if i manually call the drbd ra with
all parameters i get the expected rc 8.
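a rough sketch of such a manual call, for anyone who wants to reproduce
it. the helper name run_ra_monitor, the default OCF_ROOT and the agent
path in the usage line are assumptions and may differ per installation;
a real call may also need further OCF_RESKEY_* variables (e.g. the clone
meta attributes), which are left out here:

```sh
#!/bin/sh
# run_ra_monitor AGENT DRBD_RESOURCE
# Run the "monitor" action of an OCF resource agent by hand and print
# its return code. Resource parameters are handed to the agent as
# OCF_RESKEY_* environment variables.
# (run_ra_monitor is a hypothetical helper, not an existing tool.)
run_ra_monitor() {
    agent=$1
    drbd_resource=$2
    # Subshell, so the exported variables do not leak into the caller.
    (
        OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}   # assumed default location
        OCF_RESKEY_drbd_resource=$drbd_resource
        export OCF_ROOT OCF_RESKEY_drbd_resource
        "$agent" monitor
    )
    rc=$?
    echo "monitor rc=$rc"
    return "$rc"
}
```

usage, with an assumed agent path and the drbd resource name "www":
run_ra_monitor /usr/lib/ocf/resource.d/heartbeat/drbd www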
hb_report: http://ip52.ipax.at/~raoul/cluster/no_monitor_action.tar.gz
(it's rather big, as a lot of actions failed)
cheers,
raoul
ps: i already tried to revoke the crm_standby, but this does not
resolve the error messages and does not call the drbd ocf ra.
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
Technischer Leiter
IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office at ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________