[Pacemaker] warning log is outputted after pacemaker stopped

Fri Mar 7 03:03:07 UTC 2014

On 25 Feb 2014, at 7:23 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:

> 2014-02-24 11:09 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>> 
>> On 24 Feb 2014, at 12:59 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>> 
>>> 
>>> On 21 Feb 2014, at 9:36 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> The following WARNING is logged after pacemaker has stopped in
>>>> Pacemaker-1.1.11.
>>>> 
>>>> Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not
>>>> update default_ping_set = 100: rc=141
>>>> 
>>>> 
>>>> This is because pacemaker does not wait for the 'monitor' operation of
>>>> the ping resource to complete.
>>>> 
>>>> Feb 21 18:22:52 bl460g1n6 lrmd[9100]:    debug:
>>>> recurring_action_timer: Scheduling another invokation of
>>>> prmPing_monitor_30000
>>>> Feb 21 18:22:52 bl460g1n6 pacemakerd[9094]:     info:
>>>> crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>>> (snip)
>>>> Feb 21 18:22:54 bl460g1n6 crmd[9103]:   notice: process_lrm_event: LRM
>>>> operation prmPing_stop_0 (call=9, rc=0, cib-update=29, confirmed=true)
>>>> ok
>>>> (snip)
>>>> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info:
>>>> services_action_cancel: Cancelling op: prmPing_monitor_30000 will
>>>> occur once operation completes
>>>> (snip)
>>>> Feb 21 18:22:54 bl460g1n6 pacemakerd[9094]:     info: main: Exiting pacemakerd
>>>> (snip)
>>>> Feb 21 18:22:55 bl460g1n6 corosync[9083]:  [MAIN  ] Corosync Cluster
>>>> Engine exiting normally
>>>> Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not
>>>> update default_ping_set = 100: rc=141
>>>> 
>>>> 
>>>> As with pacemaker-1.0, I think it would be better to perform 'stop'
>>>> only after 'monitor' has completed.
>>> 
>>> Absolutely.  If it is not then we have a problem.
>> 
>> Feb 21 18:22:20 bl460g1n6 crmd[9103]:   notice: te_rsc_command: Initiating action 6: monitor prmPing_monitor_30000 on bl460g1n6 (local)
>> Feb 21 18:22:20 bl460g1n6 crmd[9103]:     info: do_lrm_rsc_op: Performing key=6:1:0:82c5fc6f-611d-4ba9-8a4f-13155b12536e op=prmPing_monitor_30000
>> Feb 21 18:22:22 bl460g1n6 crmd[9103]:   notice: process_lrm_event: LRM operation prmPing_monitor_30000 (call=7, rc=0, cib-update=27, confirmed=false) ok
>> 
>> so far so good
>> 
>> Feb 21 18:22:54 bl460g1n6 crmd[9103]:   notice: te_rsc_command: Initiating action 5: stop prmPing_stop_0 on bl460g1n6 (local)
>> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: services_action_cancel: Cancelling op: prmPing_monitor_30000 will occur once operation completes
>> Feb 21 18:22:54 bl460g1n6 crmd[9103]:     info: do_lrm_rsc_op: Performing key=5:2:0:82c5fc6f-611d-4ba9-8a4f-13155b12536e op=prmPing_stop_0
>> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: log_execute: executing - rsc:prmPing action:stop call_id:9
>> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: log_finished: finished - rsc:prmPing action:stop call_id:9 pid:9207 exit-code:0 exec-time:36ms queue-time:0ms
>> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: services_action_cancel: Cancelling op: prmPing_monitor_30000 will occur once operation completes
>> Feb 21 18:22:54 bl460g1n6 crmd[9103]:   notice: process_lrm_event: LRM operation prmPing_stop_0 (call=9, rc=0, cib-update=29, confirmed=true) ok
>> Feb 21 18:22:54 bl460g1n6 crmd[9103]:     info: match_graph_event: Action prmPing_stop_0 (5) confirmed on bl460g1n6 (rc=0)
>> Feb 21 18:22:54 bl460g1n6 crmd[9103]:     info: stop_recurring_actions: Cancelling op 7 for prmPing (prmPing:7)
>> 
>> David: I think we have the order wrong here... the recurring monitor should be cancelled prior to the stop, not after.
>> Also, does this mean we ran the stop action while the monitor was still running?
>> 
>> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: services_action_cancel: Cancelling op: prmPing_monitor_30000 will occur once operation completes
>> Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not update default_ping_set = 100: rc=141
>> 
>> 
>> Inoue-san: could we get a copy of this please?
>>   Feb 21 18:22:54 bl460g1n6 pengine[9102]:   notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-13.bz2
>> 
> 
> Since the file was lost, I reproduced the same event.

Thanks. I'm reasonably certain this is an lrmd bug; I've passed it on to David.
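
For anyone following along, the ordering problem discussed in this thread can be checked mechanically against the log lines quoted here. The following is only a minimal sketch, not part of Pacemaker: `check_order` and `first_index` are hypothetical helper names, and the sample combines lines from the two excerpts in this thread, keeping each pair that is compared in the order it was posted. It reports three things: the recurring monitor was cancelled after the stop finished, the agent logged its warning after pacemakerd exited, and the warning came from a different process than the stop action.

```python
import re

# Log lines copied verbatim from this thread, in the order they were posted.
LOG = """\
Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: log_finished: finished - rsc:prmPing action:stop call_id:9 pid:9207 exit-code:0 exec-time:36ms queue-time:0ms
Feb 21 18:22:54 bl460g1n6 crmd[9103]:     info: stop_recurring_actions: Cancelling op 7 for prmPing (prmPing:7)
Feb 21 18:22:54 bl460g1n6 pacemakerd[9094]:     info: main: Exiting pacemakerd
Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not update default_ping_set = 100: rc=141
"""


def first_index(lines, needle):
    """Return the index of the first line containing needle, or None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


def check_order(log_text):
    """Hypothetical helper: list the ordering problems visible in an excerpt."""
    lines = log_text.splitlines()
    stop_done = first_index(lines, "action:stop")
    cancel = first_index(lines, "stop_recurring_actions")
    exited = first_index(lines, "Exiting pacemakerd")
    warned = first_index(lines, "WARNING: Could not update")

    findings = []
    # The recurring monitor should be cancelled before the stop runs.
    if None not in (stop_done, cancel) and cancel > stop_done:
        findings.append("monitor cancelled after stop finished")
    # No agent output is expected once pacemakerd has exited.
    if None not in (exited, warned) and warned > exited:
        findings.append("agent warning logged after pacemakerd exited")
    # Compare the pid that logged the warning with the pid of the stop action.
    if None not in (stop_done, warned):
        stop_pid = re.search(r"pid:(\d+)", lines[stop_done]).group(1)
        warn_pid = re.search(r"\[(\d+)\]:", lines[warned]).group(1)
        if stop_pid != warn_pid:
            findings.append(
                f"warning came from pid {warn_pid}, not stop pid {stop_pid}"
            )
    return findings


for finding in check_order(LOG):
    print(finding)
```

The check relies on line order rather than timestamps because several of the relevant events share the same second (18:22:54), so syslog timestamps alone cannot order them.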

> 
>> I'd like to see if we knew the recurring operation was running
>> 
>> 
>>> 
>>>> (Who manages 'monitor' after pacemaker stopped?)
>>>> 
>>>> Regards,
>>>> Kazunori INOUE
>>>> <ha-log>_______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>> 
>> 
> <pcmk-Tue-25-Feb-2014.tar.bz2>
