[Pacemaker] stonith in pacemaker clarification

Pentarh Udi pentarh at gmail.com
Fri Mar 4 13:33:29 EST 2011


This is the log. I ran "crm node fence node4". This restarted node4, then
shut down node3, after which the cluster lost quorum and all information
about running resources ("crm status" reported no resources configured,
although 6 groups are configured).

-------------------------

Mar 04 12:36:40 node1 crmd: [2717]: info: abort_transition_graph:
te_update_diff:146 - Triggered transition abort (complete=1,
tag=transient_attributes, id=node4, magic=NA, cib=0.633.101) : Transient
attribute: update
Mar 04 12:36:40 node1 crmd: [2717]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Mar 04 12:36:40 node1 crmd: [2717]: info: do_state_transition: All 4 cluster
nodes are eligible to run resources.
Mar 04 12:36:40 node1 crmd: [2717]: info: do_pe_invoke: Query 1938:
Requesting the current CIB: S_POLICY_ENGINE
Mar 04 12:36:40 node1 crmd: [2717]: info: do_pe_invoke_callback: Invoking
the PE: query=1938, ref=pe_calc-dc-1299260200-1188, seq=400, quorate=1
Mar 04 12:36:40 node1 pengine: [2716]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Mar 04 12:36:40 node1 pengine: [2716]: info: unpack_config: Node scores:
'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Mar 04 12:36:40 node1 pengine: [2716]: WARN: pe_fence_node: Node node4 will
be fenced because termination was requested
Mar 04 12:36:40 node1 pengine: [2716]: WARN: determine_online_status: Node
node4 is unclean
Mar 04 12:36:40 node1 pengine: [2716]: info: determine_online_status: Node
node2 is online
....
Mar 04 12:36:40 node1 pengine: [2716]: WARN: stage6: Scheduling Node node4
for STONITH
Mar 04 12:36:40 node1 pengine: [2716]: info: native_stop_constraints:
st-node3_stop_0 is implicit after node4 is fenced
....
Mar 04 12:36:40 node1 pengine: [2716]: notice: LogActions: Leave resource
st-node4     (Started node3)
Mar 04 12:36:40 node1 pengine: [2716]: notice: LogActions: Move resource
st-node3      (Started node4 -> node2)
Mar 04 12:36:40 node1 crmd: [2717]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Mar 04 12:36:40 node1 crmd: [2717]: info: unpack_graph: Unpacked transition
471: 6 actions in 6 synapses
Mar 04 12:36:40 node1 crmd: [2717]: info: do_te_invoke: Processing graph 471
(ref=pe_calc-dc-1299260200-1188) derived from /var/lib/pengine/pe-warn-7.bz2
Mar 04 12:36:40 node1 crmd: [2717]: info: te_pseudo_action: Pseudo action 87
fired and confirmed
Mar 04 12:36:40 node1 crmd: [2717]: info: te_rsc_command: Initiating action
88: start st-node3_start_0 on node2
Mar 04 12:36:40 node1 pengine: [2716]: WARN: process_pe_message: Transition
471: WARNINGs found during PE processing. PEngine Input stored in:
/var/lib/pengine/pe-warn-7.bz2
Mar 04 12:36:40 node1 pengine: [2716]: info: process_pe_message:
Configuration WARNINGs found during PE processing.  Please run "crm_verify
-L" to identify issues.
Mar 04 12:36:40 node1 crmd: [2717]: info: match_graph_event: Action
st-node3_start_0 (88) confirmed on node2 (rc=0)
Mar 04 12:36:40 node1 crmd: [2717]: info: te_pseudo_action: Pseudo action 89
fired and confirmed
Mar 04 12:36:40 node1 crmd: [2717]: info: te_fence_node: Executing reboot
fencing operation (91) on node4 (timeout=60000)
Mar 04 12:36:40 node1 stonithd: [2712]: info: client tengine [pid: 2717]
requests a STONITHoperation RESET on node node4
...
Mar 04 12:36:40 node1 stonithd: [2712]: info: we can't manage node4,
broadcast request to other nodes
Mar 04 12:36:40 node1 stonithd: [2712]: info: Broadcasting the message
succeeded: require others to stonith node node4.
Mar 04 12:36:43 node1 cib: [2713]: info: ais_dispatch: Membership 404:
quorum retained
Mar 04 12:36:43 node1 cib: [2713]: info: crm_update_peer: Node node4:
id=67152064 state=lost (new) addr=r(0) ip(192.168.0.4)  votes=1 born=388
seen=400 proc=00000000000000000000000000013312
Mar 04 12:36:43 node1 crmd: [2717]: info: ais_dispatch: Membership 404:
quorum retained
Mar 04 12:36:43 node1 crmd: [2717]: info: ais_status_callback: status: node4
is now lost (was member)
Mar 04 12:36:43 node1 crmd: [2717]: info: crm_update_peer: Node node4:
id=67152064 state=lost (new) addr=r(0) ip(192.168.0.4)  votes=1 born=388
seen=400 proc=00000000000000000000000000013312
Mar 04 12:36:43 node1 crmd: [2717]: info: erase_node_from_join: Removed node
node4 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Mar 04 12:36:43 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/1939,
version=0.633.102): ok (rc=0)
Mar 04 12:36:43 node1 crmd: [2717]: info: crm_ais_dispatch: Setting expected
votes to 4
Mar 04 12:36:43 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/1942,
version=0.633.103): ok (rc=0)
Mar 04 12:36:46 node1 stonithd: [2712]: info: Succeeded to STONITH the node
node4: optype=RESET. whodoit: node3
Mar 04 12:36:46 node1 stonithd: [2712]: info: Node node3 fenced node node4:
result=SUCCEEDED.
...
Mar 04 12:36:46 node1 pengine: [2716]: info: process_pe_message: Transition
472: PEngine Input stored in: /var/lib/pengine/pe-input-3427.bz2
Mar 04 12:37:25 node1 crmd: [2717]: info: process_graph_event: Detected
action sites-httpd_monitor_30000 from a different transition: 466 vs. 472
Mar 04 12:37:25 node1 crmd: [2717]: info: abort_transition_graph:
process_graph_event:462 - Triggered transition abort (complete=1,
tag=lrm_rsc_op, id=sites-httpd_monitor_30000,
magic=2:-2;50:466:0:56d43a96-f91d-4456-840c-8030f2ffd6ac, cib=0.633.107) :
Old event
Mar 04 12:37:25 node1 crmd: [2717]: WARN: update_failcount: Updating
failcount for sites-httpd on node2 after failed monitor: rc=-2
(update=value++, time=1299260245)
Mar 04 12:37:25 node1 crmd: [2717]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
...
Mar 04 12:37:25 node1 crmd: [2717]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Mar 04 12:37:25 node1 crmd: [2717]: info: unpack_graph: Unpacked transition
473: 11 actions in 11 synapses
Mar 04 12:37:25 node1 crmd: [2717]: info: do_te_invoke: Processing graph 473
(ref=pe_calc-dc-1299260245-1192) derived from
/var/lib/pengine/pe-input-3428.bz2
Mar 04 12:37:25 node1 crmd: [2717]: info: te_pseudo_action: Pseudo action 31
fired and confirmed
Mar 04 12:37:25 node1 crmd: [2717]: info: te_rsc_command: Initiating action
27: stop sites-nginx_stop_0 on node2
Mar 04 12:37:26 node1 pengine: [2716]: info: process_pe_message: Transition
473: PEngine Input stored in: /var/lib/pengine/pe-input-3428.bz2
Mar 04 12:37:26 node1 crmd: [2717]: info: match_graph_event: Action
sites-nginx_stop_0 (27) confirmed on node2 (rc=0)
Mar 04 12:37:26 node1 crmd: [2717]: info: te_rsc_command: Initiating action
4: stop sites-httpd_stop_0 on node2
Mar 04 12:37:31 node1 crmd: [2717]: info: handle_shutdown_request: Creating
shutdown request for node3 (state=S_TRANSITION_ENGINE)
Mar 04 12:37:31 node1 cib: [2713]: info: cib_process_shutdown_req: Shutdown
REQ from node3
Mar 04 12:37:31 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_shutdown_req for section 'all' (origin=node3/node3/(null),
version=0.633.110): ok (rc=0)
Mar 04 12:37:31 node1 cib: [2713]: info: ais_dispatch: Membership 404:
quorum retained
Mar 04 12:37:31 node1 crmd: [2717]: info: ais_dispatch: Membership 404:
quorum retained
Mar 04 12:37:31 node1 cib: [2713]: info: crm_update_peer: Node node3:
id=50374848 state=member addr=r(0) ip(192.168.0.3)  votes=1 born=400
seen=404 proc=00000000000000000000000000000002 (new)
Mar 04 12:37:31 node1 crmd: [2717]: info: crm_update_peer: Node node3:
id=50374848 state=member addr=r(0) ip(192.168.0.3)  votes=1 born=400
seen=404 proc=00000000000000000000000000000002 (new)
Mar 04 12:37:31 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/1952,
version=0.633.111): ok (rc=0)
Mar 04 12:37:31 node1 cib: [2713]: notice: ais_dispatch: Membership 408:
quorum lost
Mar 04 12:37:31 node1 crmd: [2717]: info: crm_ais_dispatch: Setting expected
votes to 4
Mar 04 12:37:31 node1 cib: [2713]: info: crm_update_peer: Node node3:
id=50374848 state=lost (new) addr=r(0) ip(192.168.0.3)  votes=1 born=400
seen=404 proc=00000000000000000000000000000002
Mar 04 12:37:31 node1 crmd: [2717]: notice: ais_dispatch: Membership 408:
quorum lost
Mar 04 12:37:31 node1 crmd: [2717]: info: ais_status_callback: status: node3
is now lost (was member)
Mar 04 12:37:31 node1 crmd: [2717]: info: crm_update_peer: Node node3:
id=50374848 state=lost (new) addr=r(0) ip(192.168.0.3)  votes=1 born=400
seen=404 proc=00000000000000000000000000000002
Mar 04 12:37:31 node1 crmd: [2717]: info: erase_node_from_join: Removed node
node3 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Mar 04 12:37:31 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/1955,
version=0.633.112): ok (rc=0)
Mar 04 12:37:31 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/1956,
version=0.633.112): ok (rc=0)
Mar 04 12:37:31 node1 cib: [2713]: info: log_data_element: cib:diff: - <cib
have-quorum="1" admin_epoch="0" epoch="633" num_updates="113" />
Mar 04 12:37:31 node1 cib: [2713]: info: log_data_element: cib:diff: + <cib
have-quorum="0" admin_epoch="0" epoch="634" num_updates="1" />
Mar 04 12:37:31 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/1958,
version=0.634.1): ok (rc=0)
Mar 04 12:37:31 node1 crmd: [2717]: info: crm_ais_dispatch: Setting expected
votes to 4
Mar 04 12:37:31 node1 crmd: [2717]: WARN: match_down_event: No match for
shutdown action on node3
Mar 04 12:37:31 node1 crmd: [2717]: info: te_update_diff: Stonith/shutdown
of node3 not matched
Mar 04 12:37:31 node1 crmd: [2717]: info: abort_transition_graph:
te_update_diff:191 - Triggered transition abort (complete=0, tag=node_state,
id=node3, magic=NA, cib=0.633.113) : Node failure
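
For anyone reading a log like the one above, the fencing and quorum entries
can be pulled out with a small filter. This is only a sketch: the function
names and the keyword list are illustrative and not part of Pacemaker, and it
assumes the timestamp layout shown here and that long entries were wrapped
across lines by the mail client.

```python
import re

# Start of an entry as it appears in this log, e.g.
# "Mar 04 12:36:40 node1 crmd: [2717]: info: ..."
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \S+ \S+: \[\d+\]: ")

# Substrings picked by hand from the log above (an assumption, not a complete set).
KEYWORDS = ("STONITH", "fenc", "quorum lost", "is now lost", "shutdown request")


def join_wrapped(lines):
    """Rejoin entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line):
            entries.append(line)
        elif entries and line.strip() and not set(line.strip()) <= {"."}:
            # Continuation of the previous entry; lines made only of dots
            # ("....") mark elided output and are skipped.
            entries[-1] += " " + line.strip()
    return entries


def key_events(lines):
    """Return only the entries that mention one of KEYWORDS."""
    return [e for e in join_wrapped(lines) if any(k in e for k in KEYWORDS)]
```

Calling key_events(open(path)) on a saved copy of the log (path being
wherever the log was saved) returns the matching entries in their original
order.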

-- 
Regards, Pentarh Udi