[Pacemaker] another "node rebooting too quickly" bug?
Andrew Beekhof
andrew at beekhof.net
Mon Apr 28 07:01:22 UTC 2014
On 25 Apr 2014, at 5:39 am, Brian J. Murrell <brian at interlinx.bc.ca> wrote:
> Hi,
>
> As was previously discussed, there is a bug in the handling of a STONITH
> when a node reboots too quickly. I had a different kind of failure that I
> suspect is the same kind of problem, just with a different symptom.
>
> The situation is a two-node cluster with two resources plus a fencing
> resource. Each node is running one of the two resources, and one of the
> nodes is also running the fencing resource. One of the nodes is killed.
> The remaining node should notice this, STONITH it and take over the
> resource, but what actually happens is all of the above except that the
> resource is restarted on the node that was killed.
>
> Here's the log from the surviving node -- the one that issues the
> STONITH and *should* take over the resource:
>
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0, lost=1
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: lost: node1 2068975114
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: lost
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 20 to 2 children
> Apr 22 19:47:26 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>
> node1 is gone
>
> Apr 22 19:47:26 node2 corosync[3204]: [CPG ] chosen downlist: sender r(0) ip(10.14.82.124) r(1) ip(10.128.2.124) ; members(old:2 left:1)
> Apr 22 19:47:26 node2 corosync[3204]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 22 19:47:26 node2 crmd[3255]: notice: plugin_handle_membership: Membership 20: quorum lost
> Apr 22 19:47:26 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:47:26 node2 crmd[3255]: warning: match_down_event: No match for shutdown action on node1
> Apr 22 19:47:26 node2 crmd[3255]: notice: peer_update_callback: Stonith/shutdown of node1 not matched
>
> Is this "Stonith/shutdown of node1 not matched" because the fencing
> resource was running on node1 but now it's gone?
No, that's pacemaker's obtuse way of saying "node1 went away when we weren't expecting it to".
>
> Apr 22 19:47:26 node2 cib[3250]: notice: plugin_handle_membership: Membership 20: quorum lost
> Apr 22 19:47:26 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:47:26 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
> Apr 22 19:47:26 node2 attrd[3253]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Apr 22 19:47:26 node2 attrd[3253]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Apr 22 19:47:26 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:47:26 node2 pengine[3254]: warning: pe_fence_node: Node node1 will be fenced because the node is no longer part of the cluster
> Apr 22 19:47:26 node2 pengine[3254]: warning: determine_online_status: Node node1 is unclean
> Apr 22 19:47:26 node2 pengine[3254]: warning: custom_action: Action st-fencing_stop_0 on node1 is unrunnable (offline)
> Apr 22 19:47:26 node2 pengine[3254]: warning: custom_action: Action resource1_stop_0 on node1 is unrunnable (offline)
>
> Resources from node1 unrunnable because it's AWOL!
Right, that probably shouldn't be a warning.
>
> Apr 22 19:47:26 node2 pengine[3254]: warning: stage6: Scheduling Node node1 for STONITH
>
> So kill it!
>
> Apr 22 19:47:26 node2 pengine[3254]: notice: LogActions: Move st-fencing#011(Started node1 -> node2)
> Apr 22 19:47:26 node2 pengine[3254]: notice: LogActions: Move resource1#011(Started node1 -> node2)
>
> So here are the resources that were on node1 being scheduled to move to
> node2.
>
> Apr 22 19:47:26 node2 crmd[3255]: notice: te_fence_node: Executing reboot fencing operation (13) on node1 (timeout=60000)
> Apr 22 19:47:26 node2 stonith-ng[3251]: notice: handle_request: Client crmd.3255.0a3d203a wants to fence (reboot) 'node1' with device '(any)'
> Apr 22 19:47:26 node2 stonith-ng[3251]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 7660cf0e-10da-4e96-b3bd-4b415c72201c (0)
>
> The actual killing of node1.
>
> Apr 22 19:47:26 node2 pengine[3254]: warning: process_pe_message: Calculated Transition 23: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0, lost=0
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: member
Ah! You're running the custom (pacemaker) corosync plugin, which I don't test anymore.
Please don't do that.
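For context, "the custom plugin" is the pacemaker plugin for corosync 1.x,
typically enabled by a stanza like the following in corosync.conf (or in a
file under /etc/corosync/service.d/; the exact location varies by
distribution):

    service {
        # load the pacemaker plugin into corosync 1.x
        name: pacemaker
        # ver: 0 = corosync also spawns the pacemaker daemons (plugin mode)
        # ver: 1 = pacemaker is started separately via pacemakerd
        ver:  0
    }

The usual alternatives at the time were corosync 1.x with CMAN, or
corosync 2.x with pacemaker running as its own daemons.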
>
> And this is where node1 comes back.
>
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: NEW: node1 2068975114
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node1 2068975114
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
> Apr 22 19:48:22 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: update_member: 0x170c920 Node 2068975114 (node1) born on: 24
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
> Apr 22 19:48:22 node2 cib[3250]: notice: plugin_handle_membership: Membership 24: quorum acquired
> Apr 22 19:48:22 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
>
> And the CIB sees it.
>
> Apr 22 19:48:22 node2 crmd[3255]: notice: plugin_handle_membership: Membership 24: quorum acquired
> Apr 22 19:48:22 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
> Apr 22 19:48:22 node2 corosync[3204]: [CPG ] chosen downlist: sender r(0) ip(10.14.82.124) r(1) ip(10.128.2.124) ; members(old:1 left:0)
> Apr 22 19:48:22 node2 corosync[3204]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 22 19:48:43 node2 corosync[3204]: [TOTEM ] A processor failed, forming new configuration.
>
> And it goes AWOL again? It shouldn't have.
That's corosync failing, nothing to do with pacemaker.
>
> Apr 22 19:48:44 node2 stonith-ng[3251]: notice: log_operation: Operation 'reboot' [4110] (call 2 from crmd.3255) for host 'node1' with device 'st-fencing' returned: 0 (OK)
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 28: memb=1, new=0, lost=1
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: lost: node1 2068975114
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 28: memb=1, new=0, lost=0
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: lost
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 28 to 2 children
> Apr 22 19:48:49 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Apr 22 19:48:49 node2 crmd[3255]: notice: plugin_handle_membership: Membership 28: quorum lost
> Apr 22 19:48:49 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:48:49 node2 cib[3250]: notice: plugin_handle_membership: Membership 28: quorum lost
> Apr 22 19:48:49 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 32: memb=1, new=0, lost=0
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 32: memb=2, new=1, lost=0
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: member
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: NEW: node1 2068975114
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node1 2068975114
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 32 to 2 children
> Apr 22 19:49:26 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: update_member: 0x170c920 Node 2068975114 (node1) born on: 32
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 32 to 2 children
> Apr 22 19:49:26 node2 crmd[3255]: notice: plugin_handle_membership: Membership 32: quorum acquired
> Apr 22 19:49:26 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
> Apr 22 19:49:26 node2 cib[3250]: notice: plugin_handle_membership: Membership 32: quorum acquired
> Apr 22 19:49:26 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
>
> And it's back again?
>
> Apr 22 19:49:26 node2 corosync[3204]: [CPG ] chosen downlist: sender r(0) ip(10.14.82.124) r(1) ip(10.128.2.124) ; members(old:2 left:1)
> Apr 22 19:49:26 node2 corosync[3204]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 22 19:49:26 node2 stonith-ng[3251]: notice: remote_op_done: Operation reboot of node1 by node2 for crmd.3255 at node2.7660cf0e: OK
> Apr 22 19:49:26 node2 crmd[3255]: notice: tengine_stonith_callback: Stonith operation 2/13:23:0:e63ab1bb-b839-48d5-9a4d-b27677d3787f: OK (0)
> Apr 22 19:49:26 node2 crmd[3255]: notice: tengine_stonith_notify: Peer node1 was terminated (reboot) by node2 for node2: OK (ref=7660cf0e-10da-4e96-b3bd-4b415c72201c) by client crmd.3255
> Apr 22 19:49:26 node2 crmd[3255]: notice: run_graph: Transition 23 (Complete=1, Pending=0, Fired=0, Skipped=7, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-0.bz2): Stopped
> Apr 22 19:49:33 node2 attrd[3253]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Apr 22 19:49:33 node2 attrd[3253]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Apr 22 19:49:34 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:49:34 node2 pengine[3254]: notice: LogActions: Start st-fencing#011(node1)
> Apr 22 19:49:34 node2 pengine[3254]: notice: LogActions: Start resource1#011(node1)
>
> In any case, here we can see that resource1 and the fencing resource
> were started up on node1 again.
>
> Why would these start back up on a node that has clearly failed, rather
> than on this node, node2?
>
> Apr 22 19:49:34 node2 crmd[3255]: notice: te_rsc_command: Initiating action 5: monitor st-fencing_monitor_0 on node1
> Apr 22 19:49:34 node2 crmd[3255]: notice: te_rsc_command: Initiating action 6: monitor resource1_monitor_0 on node1
> Apr 22 19:49:34 node2 crmd[3255]: notice: te_rsc_command: Initiating action 7: monitor resource2_monitor_0 on node1
> Apr 22 19:49:34 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 24: /var/lib/pacemaker/pengine/pe-input-95.bz2
> Apr 22 19:49:40 node2 crmd[3255]: notice: te_rsc_command: Initiating action 4: probe_complete probe_complete on node1 - no waiting
> Apr 22 19:49:40 node2 crmd[3255]: notice: te_rsc_command: Initiating action 9: start st-fencing_start_0 on node1
> Apr 22 19:49:40 node2 crmd[3255]: notice: te_rsc_command: Initiating action 10: start resource1_start_0 on node1
> Apr 22 19:50:24 node2 crmd[3255]: notice: te_rsc_command: Initiating action 11: monitor resource1_monitor_5000 on node1
>
> And here are the cluster configuration messages, so that you can see how
> resource1 and resource2 were added, what constraints were set on them,
> etc.:
>
> Apr 22 19:45:46 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.23.1
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="22" num_updates="1"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <primitive class="ocf" provider="foo" type="Target" id="resource1">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <meta_attributes id="resource1-meta_attributes">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource1-meta_attributes-target-role" value="Stopped"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </meta_attributes>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <operations id="resource1-operations">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <op id="resource1-monitor-5" interval="5" name="monitor" timeout="60"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <op id="resource1-start-0" interval="0" name="start" timeout="300"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <op id="resource1-stop-0" interval="0" name="stop" timeout="300"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </operations>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <instance_attributes id="resource1-instance_attributes">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <nvpair id="resource1-instance_attributes-target" name="target" value="bddf8d7c-5b5c-4c24-9e55-4704dd336a8b"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </instance_attributes>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </primitive>
> Apr 22 19:45:46 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 6: monitor resource1_monitor_0 on node2 (local)
> Apr 22 19:45:46 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 15: /var/lib/pacemaker/pengine/pe-input-87.bz2
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 4: monitor resource1_monitor_0 on node1
> Apr 22 19:45:46 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.24.1
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="23" num_updates="1"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource1-primary" node="node1" rsc="resource1" score="20"/>
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on node1 - no waiting
> Apr 22 19:45:46 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource1_monitor_0 (call=10, rc=7, cib-update=58, confirmed=true) not running
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 5: probe_complete probe_complete on node2 (local) - no waiting
> Apr 22 19:45:46 node2 crmd[3255]: notice: run_graph: Transition 15 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Stopped
> Apr 22 19:45:46 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:46 node2 crmd[3255]: notice: run_graph: Transition 16 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-88.bz2): Complete
> Apr 22 19:45:46 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:46 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 16: /var/lib/pacemaker/pengine/pe-input-88.bz2
> Apr 22 19:45:47 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:47 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:47 node2 cibadmin[3520]: notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="resource1-secondary" node="node2" rsc="resource1" score="10"/>
> Apr 22 19:45:47 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: Diff: --- 0.24.3
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: Diff: +++ 0.25.1 1e5f079657dde6f6085a149c0cc5398b
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="24" num_updates="3"/>
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource1-secondary" node="node2" rsc="resource1" score="10"/>
> Apr 22 19:45:47 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:47 node2 crmd[3255]: notice: run_graph: Transition 17 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-89.bz2): Complete
> Apr 22 19:45:47 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:47 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 17: /var/lib/pacemaker/pengine/pe-input-89.bz2
> Apr 22 19:45:48 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:48 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:48 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:48 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.26.1
> Apr 22 19:45:48 node2 cib[3250]: notice: cib:diff: -- <nvpair value="Stopped" id="resource1-meta_attributes-target-role"/>
> Apr 22 19:45:48 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource1-meta_attributes-target-role" value="Started"/>
> Apr 22 19:45:48 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:48 node2 pengine[3254]: notice: LogActions: Start resource1#011(node1)
> Apr 22 19:45:48 node2 crmd[3255]: notice: te_rsc_command: Initiating action 7: start resource1_start_0 on node1
> Apr 22 19:45:48 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 18: /var/lib/pacemaker/pengine/pe-input-90.bz2
> Apr 22 19:45:49 node2 crmd[3255]: notice: te_rsc_command: Initiating action 8: monitor resource1_monitor_5000 on node1
> Apr 22 19:45:49 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:49 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:49 node2 crmd[3255]: notice: run_graph: Transition 18 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-90.bz2): Complete
> Apr 22 19:45:49 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:50 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:53 node2 cibadmin[3597]: notice: crm_log_args: Invoked: cibadmin -o resources -C -x /tmp/tmpoSwAoH
> Apr 22 19:45:53 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:53 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: Diff: --- 0.26.3
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: Diff: +++ 0.27.1 c0a07b74061c440082677bacfd9eff2a
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="26" num_updates="3"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <primitive class="ocf" provider="foo" type="Target" id="resource2">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <meta_attributes id="resource2-meta_attributes">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource2-meta_attributes-target-role" value="Stopped"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </meta_attributes>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <operations id="resource2-operations">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <op id="resource2-monitor-5" interval="5" name="monitor" timeout="60"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <op id="resource2-start-0" interval="0" name="start" timeout="300"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <op id="resource2-stop-0" interval="0" name="stop" timeout="300"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </operations>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <instance_attributes id="resource2-instance_attributes">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <nvpair id="resource2-instance_attributes-target" name="target" value="38716497-b4a2-4ad7-8307-d9829857c7fc"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </instance_attributes>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </primitive>
> Apr 22 19:45:53 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:53 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 19: /var/lib/pacemaker/pengine/pe-input-91.bz2
> Apr 22 19:45:53 node2 crmd[3255]: notice: te_rsc_command: Initiating action 7: monitor resource2_monitor_0 on node2 (local)
> Apr 22 19:45:53 node2 crmd[3255]: notice: te_rsc_command: Initiating action 5: monitor resource2_monitor_0 on node1
> Apr 22 19:45:53 node2 cibadmin[3601]: notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="resource2-primary" node="node2" rsc="resource2" score="20"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.28.1
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="27" num_updates="1"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource2-primary" node="node2" rsc="resource2" score="20"/>
> Apr 22 19:45:54 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource2_monitor_0 (call=15, rc=7, cib-update=63, confirmed=true) not running
> Apr 22 19:45:54 node2 crmd[3255]: notice: te_rsc_command: Initiating action 6: probe_complete probe_complete on node2 (local) - no waiting
> Apr 22 19:45:54 node2 crmd[3255]: notice: te_rsc_command: Initiating action 4: probe_complete probe_complete on node1 - no waiting
> Apr 22 19:45:54 node2 crmd[3255]: notice: run_graph: Transition 19 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-91.bz2): Stopped
> Apr 22 19:45:54 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:54 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 20: /var/lib/pacemaker/pengine/pe-input-92.bz2
> Apr 22 19:45:54 node2 crmd[3255]: notice: run_graph: Transition 20 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-92.bz2): Complete
> Apr 22 19:45:54 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:54 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: Diff: --- 0.28.3
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: Diff: +++ 0.29.1 a5aa96d3ac5932d62dd542c74956adaa
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="28" num_updates="3"/>
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource2-secondary" node="node1" rsc="resource2" score="10"/>
> Apr 22 19:45:54 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:54 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 21: /var/lib/pacemaker/pengine/pe-input-93.bz2
> Apr 22 19:45:54 node2 crmd[3255]: notice: run_graph: Transition 21 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-93.bz2): Complete
> Apr 22 19:45:54 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:54 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:54 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:55 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:55 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:55 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:55 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.30.1
> Apr 22 19:45:55 node2 cib[3250]: notice: cib:diff: -- <nvpair value="Stopped" id="resource2-meta_attributes-target-role"/>
> Apr 22 19:45:55 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource2-meta_attributes-target-role" value="Started"/>
> Apr 22 19:45:55 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:55 node2 pengine[3254]: notice: LogActions: Start resource2#011(node2)
> Apr 22 19:45:55 node2 crmd[3255]: notice: te_rsc_command: Initiating action 10: start resource2_start_0 on node2 (local)
> Apr 22 19:45:55 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 22: /var/lib/pacemaker/pengine/pe-input-94.bz2
> Apr 22 19:45:56 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:56 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:56 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource2_start_0 (call=18, rc=0, cib-update=67, confirmed=true) ok
> Apr 22 19:45:56 node2 crmd[3255]: notice: te_rsc_command: Initiating action 11: monitor resource2_monitor_5000 on node2 (local)
> Apr 22 19:45:57 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource2_monitor_5000 (call=21, rc=0, cib-update=68, confirmed=false) ok
> Apr 22 19:45:57 node2 crmd[3255]: notice: run_graph: Transition 22 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-94.bz2): Complete
> Apr 22 19:45:57 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:57 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
>
>
> Cheers,
> b.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org