[Pacemaker] monitoring action fails

Andrew Beekhof beekhof at gmail.com
Wed Nov 19 08:25:06 EST 2008


My suspicion here is that the RA is messing up the monitoring action.
I'd suggest trying with just one of the drbd clones and seeing if that works.
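
For reading the "target: 8 vs. rc: 0" lines in the log below, it may help to
translate the numbers into the standard OCF return code names: the probe
expected 8 (running as master) and the agent reported 0 (plain success).
A minimal sketch follows; the helper names ocf_rc_name and compare_rc are
made up for illustration and are not part of Pacemaker or the drbd RA.

```shell
#!/bin/sh
# Map an OCF return code to its symbolic name (standard OCF RA values).
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

# compare_rc TARGET ACTUAL (hypothetical helper)
# Prints one line; returns 0 if the codes match, 1 otherwise.
compare_rc() {
    if [ "$1" -eq "$2" ]; then
        echo "match: rc=$2 ($(ocf_rc_name "$2"))"
        return 0
    fi
    echo "mismatch: target=$1 ($(ocf_rc_name "$1")) vs. rc=$2 ($(ocf_rc_name "$2"))"
    return 1
}

# The case from the log: the probe expected 8, the agent returned 0.
compare_rc 8 0 || true
```

When running the agent by hand (with its parameters exported as
OCF_RESKEY_* variables), passing $? as the second argument right after the
monitor call gives the same comparison for the manual run.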

On Wed, Nov 12, 2008 at 13:19, Raoul Bhatia [IPAX] <r.bhatia at ipax.at> wrote:
> hi,
>
> i have a cluster with several resources.
>
> i issued crm_resource -P and now have got the cluster in some strange
> state, which it cannot resolve by itself:
>
>> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
>> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): standby
> ...
>> Master/Slave Set: ms_drbd_www
>>     drbd_www:0  (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>>     drbd_www:1  (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
> ...
>> Master/Slave Set: ms_drbd_mysql
>>     drbd_mysql:0        (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>>     drbd_mysql:1        (ocf::heartbeat:drbd) Master [  wc01    wc02 ]
>
> failed actions:
>> Failed actions:
>>     drbd_www:1_monitor_0 (node=wc02, call=13666, rc=0): complete
>>     drbd_www:0_monitor_0 (node=wc02, call=13665, rc=0): complete
>>     drbd_mysql:1_monitor_0 (node=wc02, call=13672, rc=0): complete
>>     drbd_mysql:0_monitor_0 (node=wc02, call=13671, rc=0): complete
>
> those monitoring failures repeat continuously. in the logfiles i find:
> ...
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16 (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:0_monitor_0, magic=0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort priority upgraded from 0 to 1
>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort action done superceeded by restart
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:0_monitor_0 (16) confirmed on wc02 (rc=4)
>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17 (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph: __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=drbd_www:1_monitor_0, magic=0:0;17:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action drbd_www:1_monitor_0 (17) confirmed on wc02 (rc=4)
> ...
>
> i put some debug information into the drbd ocf ra:
>> #!/bin/sh
>> echo "----" >> /tmp/lalala
>
> but /tmp/lalala stays empty. if i manually call the drbd ra with
> all parameters i get the expected rc 8.
>
> hb_report http://ip52.ipax.at/~raoul/cluster/no_monitor_action.tar.gz
> (it's kinda big as a lot of actions failed)
>
> cheers,
> raoul
>
> ps: i already tried to revoke the crm_standby, but this does not
> resolve the error messages and does not call the drbd ocf ra.
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia at ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.            office at ipax.at
> 1190 Wien                           tel.               +43 1 3670030
> FN 277995t HG Wien                  fax.            +43 1 3670030 15
> ____________________________________________________________________
>