[Pacemaker] Action from a different CRMD transition results in restarting services

Andrew Beekhof andrew at beekhof.net
Wed Dec 12 18:52:42 EST 2012


On Thu, Dec 13, 2012 at 6:31 AM, Latrous, Youssef
<YLatrous at broadviewnet.com> wrote:
> Hi,
>
>
>
> I ran into the following issue and I couldn’t find what it really means:
>
>
>
>         Detected action msgbroker_monitor_10000 from a different transition: 16048 vs. 18014

18014 is the transition we're up to now; 16048 is the (old) one that
scheduled the recurring monitor operation.
I suspect you'll find that the action failed earlier in the logs, and
that's why the resource needed to be restarted.

Not the best log message though :(
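
To make the two numbers concrete: the magic= field in the abort line
appears to encode "op_status:rc;action_id:transition_id:target_rc:uuid",
which is where 16048 and rc=7 come from. The sketch below assumes that
layout; parse_magic, is_old_event and TransitionMagic are illustrative
names for this example only, not Pacemaker functions.

```python
from typing import NamedTuple


class TransitionMagic(NamedTuple):
    """Fields of a magic string, assuming the layout described above."""
    op_status: int
    rc: int
    action_id: int
    transition_id: int
    target_rc: int
    uuid: str


def parse_magic(magic: str) -> TransitionMagic:
    # Assumed layout: "op_status:rc;action_id:transition_id:target_rc:uuid"
    result, key = magic.split(";", 1)
    op_status, rc = result.split(":")
    action_id, transition_id, target_rc, uuid = key.split(":", 3)
    return TransitionMagic(
        int(op_status), int(rc), int(action_id),
        int(transition_id), int(target_rc), uuid,
    )


def is_old_event(magic: str, current_transition: int) -> bool:
    # A result is "old" when it was scheduled by an earlier transition
    # than the one the cluster is currently on.
    return parse_magic(magic).transition_id != current_transition


magic = "0:7;104:16048:0:5fb57f01-3397-45a8-905f-c48cecdc8692"
parsed = parse_magic(magic)
print(parsed.transition_id, parsed.rc, parsed.target_rc)  # 16048 7 0
print(is_old_event(magic, 18014))  # True
```

Read this way, the monitor scheduled in transition 16048 returned 7
(not running) where 0 was expected, which matches the update_failcount
and "Recover msgbroker" lines in the quoted log.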

>
>
>
> I can see that its impact is to stop/start a service but I’d like to
> understand it a bit more.
>
>
>
> Thank you in advance for any information.
>
>
>
>
>
> Logs about this issue:
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: process_graph_event: Detected
> action msgbroker_monitor_10000 from a different transition: 16048 vs. 18014
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph:
> process_graph_event:477 - Triggered transition abort (complete=1,
> tag=lrm_rsc_op, id=msgbroker_monitor_10000,
> magic=0:7;104:16048:0:5fb57f01-3397-45a8-905f-c48cecdc8692, cib=0.971.5) :
> Old event
>
> Dec  6 22:55:05 Node1 crmd: [5235]: WARN: update_failcount: Updating
> failcount for msgbroker on Node0 after failed monitor: rc=7 (update=value++,
> time=1354852505)
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: do_state_transition: All 2 cluster
> nodes are eligible to run resources.
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28069:
> Requesting the current CIB: S_POLICY_ENGINE
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph:
> te_update_diff:142 - Triggered transition abort (complete=1, tag=nvpair,
> id=status-Node0-fail-count-msgbroker, magic=NA, cib=0.971.6) : Transient
> attribute: update
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28070:
> Requesting the current CIB: S_POLICY_ENGINE
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: abort_transition_graph:
> te_update_diff:142 - Triggered transition abort (complete=1, tag=nvpair,
> id=status-Node0-last-failure-msgbroker, magic=NA, cib=0.971.7) : Transient
> attribute: update
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke: Query 28071:
> Requesting the current CIB: S_POLICY_ENGINE
>
> Dec  6 22:55:05 Node1 attrd: [5232]: info: find_hash_entry: Creating hash
> entry for last-failure-msgbroker
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: do_pe_invoke_callback: Invoking
> the PE: query=28071, ref=pe_calc-dc-1354852505-39407, seq=12, quorate=1
>
> Dec  6 22:55:05 Node1 pengine: [5233]: notice: unpack_config: On loss of CCM
> Quorum: Ignore
>
> Dec  6 22:55:05 Node1 pengine: [5233]: notice: unpack_rsc_op: Operation
> txpublisher_monitor_0 found resource txpublisher active on Node1
>
> Dec  6 22:55:05 Node1 pengine: [5233]: WARN: unpack_rsc_op: Processing
> failed op msgbroker_monitor_10000 on Node0: not running (7)
>
> Dec  6 22:55:05 Node1 pengine: [5233]: notice: common_apply_stickiness:
> msgbroker can fail 999999 more times on Node0 before being forced off
>
> Dec  6 22:55:05 Node1 pengine: [5233]: notice: RecurringOp:  Start recurring
> monitor (10s) for msgbroker on Node0
>
> Dec  6 22:55:05 Node1 pengine: [5233]: notice: LogActions: Recover msgbroker
> (Started Node0)
>
> Dec  6 22:55:05 Node1 crmd: [5235]: info: te_rsc_command: Initiating action
> 37: stop msgbroker_stop_0 on Node0
>
>
>
>
>
> Transition 18014 details:
>
>
>
> Dec  6 22:52:18 Node1 pengine: [5233]: notice: process_pe_message:
> Transition 18014: PEngine Input stored in:
> /var/lib/pengine/pe-input-3270.bz2
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: unpack_graph: Unpacked transition
> 18014: 0 actions in 0 synapses
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: do_te_invoke: Processing graph
> 18014 (ref=pe_calc-dc-1354852338-39406) derived from
> /var/lib/pengine/pe-input-3270.bz2
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: run_graph:
> ====================================================
>
> Dec  6 22:52:18 Node1 crmd: [5235]: notice: run_graph: Transition 18014
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-3270.bz2): Complete
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: te_graph_trigger: Transition 18014
> is now complete
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: notify_crmd: Transition 18014
> status: done - <null>
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> Dec  6 22:52:18 Node1 crmd: [5235]: info: do_state_transition: Starting
> PEngine Recheck Timer
>
>
>
>
>
> Youssef
>
>
>
>
>
>



