[Pacemaker] monitoring action fails
Andrew Beekhof
beekhof at gmail.com
Wed Nov 19 13:25:06 UTC 2008
My suspicion here is that the RA is messing up the monitoring action.
I'd suggest trying with just one of the drbd clones to see if that works.
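
For testing a single clone by hand, something like the sketch below might help. It runs a command, then compares its exit code with the one the cluster expects, which is the same comparison behind the "target: 8 vs. rc: 0" warning in the log. The numeric codes are the standard OCF ones (0 = success, 7 = not running, 8 = running as master, 9 = failed master). The helper names ocf_rc_name and check_rc are made up for this example, and the RA path and parameter in the trailing comment are assumptions that will need adjusting for the actual setup.

```sh
#!/bin/sh
# Map an OCF exit code to its symbolic name.
# Only the codes relevant to a master/slave monitor are listed.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "OTHER($1)" ;;
    esac
}

# Run a command and compare its exit code with the expected one.
# Usage: check_rc EXPECTED CMD [ARGS...]
# Returns 0 if the codes match, non-zero otherwise.
check_rc() {
    expected=$1
    shift
    "$@"
    rc=$?
    echo "expected $(ocf_rc_name "$expected"), got $(ocf_rc_name "$rc")"
    [ "$rc" -eq "$expected" ]
}

# Hypothetical invocation (path and parameter value are assumptions,
# and a real master/slave RA may need further OCF_RESKEY_* variables):
# check_rc 8 env OCF_ROOT=/usr/lib/ocf OCF_RESKEY_drbd_resource=www \
#     /usr/lib/ocf/resource.d/heartbeat/drbd monitor
```

If the manual call reports the expected code while the cluster keeps logging rc 0, that would suggest the result is coming from somewhere other than a fresh run of the RA.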
On Wed, Nov 12, 2008 at 13:19, Raoul Bhatia [IPAX] <r.bhatia at ipax.at> wrote:
> hi,
>
> i have a cluster with several resources.
>
> i issued crm_resource -P and now have got the cluster in some strange
> state, which it cannot resolve by itself:
>
>> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
>> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): standby
> ...
>> Master/Slave Set: ms_drbd_www
>> drbd_www:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>> drbd_www:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
> ...
>> Master/Slave Set: ms_drbd_mysql
>> drbd_mysql:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>> drbd_mysql:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>
> failed actions:
>> Failed actions:
>> drbd_www:1_monitor_0 (node=wc02, call=13666, rc=0): complete
>> drbd_www:0_monitor_0 (node=wc02, call=13665, rc=0): complete
>> drbd_mysql:1_monitor_0 (node=wc02, call=13672, rc=0): complete
>> drbd_mysql:0_monitor_0 (node=wc02, call=13671, rc=0): complete
>
> those monitoring failures repeat continuously. in the logfiles i find:
> ...
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16 (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:0_monitor_0, magic=0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort priority upgraded from 0 to 1
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort action done superceeded by restart
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:0_monitor_0 (16) confirmed on wc02 (rc=4)
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17 (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:1_monitor_0, magic=0:0;17:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:1_monitor_0 (17) confirmed on wc02 (rc=4)
> ...
>
> i put some debug information into the drbd ocf ra:
>> #!/bin/sh
>> echo "----" >> /tmp/lalala
>
> but /tmp/lalala stays empty. if i manually call the drbd ra with
> all parameters i get the expected rc 8.
>
> hb_report http://ip52.ipax.at/~raoul/cluster/no_monitor_action.tar.gz
> (it's kind of big, as a lot of actions failed)
>
> cheers,
> raoul
>
> ps: i already tried to revoke the crm_standby, but this does not
> resolve the error messages and does not call the drbd ocf ra.
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. office at ipax.at
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________
>