[Pacemaker] warning log is outputted after pacemaker stopped

Tue Feb 25 03:23:57 EST 2014

2014-02-24 11:09 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>
> On 24 Feb 2014, at 12:59 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>
>> On 21 Feb 2014, at 9:36 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> WARNING of the following is outputted after pacemaker stopped in
>>> Pacemaker-1.1.11.
>>>
>>> Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not
>>> update default_ping_set = 100: rc=141
>>>
>>>
>>> This is because pacemaker does not wait for completion of 'monitor' of
>>> the ping resource.
>>>
>>> Feb 21 18:22:52 bl460g1n6 lrmd[9100]:    debug:
>>> recurring_action_timer: Scheduling another invokation of
>>> prmPing_monitor_30000
>>> Feb 21 18:22:52 bl460g1n6 pacemakerd[9094]:     info:
>>> crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> (snip)
>>> Feb 21 18:22:54 bl460g1n6 crmd[9103]:   notice: process_lrm_event: LRM
>>> operation prmPing_stop_0 (call=9, rc=0, cib-update=29, confirmed=true)
>>> ok
>>> (snip)
>>> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info:
>>> services_action_cancel: Cancelling op: prmPing_monitor_30000 will
>>> occur once operation completes
>>> (snip)
>>> Feb 21 18:22:54 bl460g1n6 pacemakerd[9094]:     info: main: Exiting pacemakerd
>>> (snip)
>>> Feb 21 18:22:55 bl460g1n6 corosync[9083]:  [MAIN  ] Corosync Cluster
>>> Engine exiting normally
>>> Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not
>>> update default_ping_set = 100: rc=141
>>>
>>>
>>> As with pacemaker-1.0, I think that it's better to perform a 'stop'
>>> after completion of 'monitor'.
>>
>> Absolutely.  If it is not then we have a problem.
>
> Feb 21 18:22:20 bl460g1n6 crmd[9103]:   notice: te_rsc_command: Initiating action 6: monitor prmPing_monitor_30000 on bl460g1n6 (local)
> Feb 21 18:22:20 bl460g1n6 crmd[9103]:     info: do_lrm_rsc_op: Performing key=6:1:0:82c5fc6f-611d-4ba9-8a4f-13155b12536e op=prmPing_monitor_30000
> Feb 21 18:22:22 bl460g1n6 crmd[9103]:   notice: process_lrm_event: LRM operation prmPing_monitor_30000 (call=7, rc=0, cib-update=27, confirmed=false) ok
>
> so far so good
>
> Feb 21 18:22:54 bl460g1n6 crmd[9103]:   notice: te_rsc_command: Initiating action 5: stop prmPing_stop_0 on bl460g1n6 (local)
> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: services_action_cancel: Cancelling op: prmPing_monitor_30000 will occur once operation completes
> Feb 21 18:22:54 bl460g1n6 crmd[9103]:     info: do_lrm_rsc_op: Performing key=5:2:0:82c5fc6f-611d-4ba9-8a4f-13155b12536e op=prmPing_stop_0
> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: log_execute: executing - rsc:prmPing action:stop call_id:9
> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: log_finished: finished - rsc:prmPing action:stop call_id:9 pid:9207 exit-code:0 exec-time:36ms queue-time:0ms
> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: services_action_cancel: Cancelling op: prmPing_monitor_30000 will occur once operation completes
> Feb 21 18:22:54 bl460g1n6 crmd[9103]:   notice: process_lrm_event: LRM operation prmPing_stop_0 (call=9, rc=0, cib-update=29, confirmed=true) ok
> Feb 21 18:22:54 bl460g1n6 crmd[9103]:     info: match_graph_event: Action prmPing_stop_0 (5) confirmed on bl460g1n6 (rc=0)
> Feb 21 18:22:54 bl460g1n6 crmd[9103]:     info: stop_recurring_actions: Cancelling op 7 for prmPing (prmPing:7)
>
> David: I think we have the order wrong here... this should happen prior to the stop not after.
> Also, does this mean we ran the stop action while the monitor was running?
>
> Feb 21 18:22:54 bl460g1n6 lrmd[9100]:     info: services_action_cancel: Cancelling op: prmPing_monitor_30000 will occur once operation completes
> Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not update default_ping_set = 100: rc=141
>
>
> Inoue-san: could we get a copy of this please?
>    Feb 21 18:22:54 bl460g1n6 pengine[9102]:   notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-13.bz2
>

Since the file was lost, I reproduced the same event.

> I'd like to see if we knew the recurring operation was running
>
>
>>
>>> (Who manages 'monitor' after pacemaker stopped?)
>>>
>>> Regards,
>>> Kazunori INOUE
>>> <ha-log>_______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pcmk-Tue-25-Feb-2014.tar.bz2
Type: application/x-bzip2
Size: 60040 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140225/fd06e66c/attachment-0003.bz2>