[Pacemaker] Beginner Fencing Help
imnotpc
imnotpc at rock3d.net
Mon Jun 6 14:02:24 UTC 2011
On Monday, June 06, 2011 03:11:24 Errol Neal wrote:
> On Fri, 06/03/2011 12:31 PM, imnotpc <imnotpc at rock3d.net> wrote:
> > I have a working 3-node cluster with a couple of resources defined. If I
> > shut down a node, crm_mon shows the cluster correctly identifies the node,
> > marks it as offline, and moves any resources on it. The fencing resource
> > (I've tried both ssh and meatware) also sees it as down and marks it
> > stopped. So far so good. I was expecting a console warning or a shutdown
> > attempt but nothing happens. I checked the logs and can see that stonith
> > sees the event but I don't see any actions taken. "crm_verify -L"
> > doesn't show any problems. What else should I do to
> > troubleshoot/configure this?
>
> You should probably begin by posting your config so we can have some
> additional context. What stonith devices do you have configured?
Right now I have meatware as the stonith device.
<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Mon Jun 6 08:45:09 2011"
crm_feature_set="3.0.5" dc-uuid="JeffDesk.LAN" epoch="17" have-quorum="1"
num_updates="81" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.5-1.fc15-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes"
name="expected-quorum-votes" value="3"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="true"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="Server4.LAN" type="normal" uname="Server4.LAN"/>
<node id="JeffDesk.LAN" type="normal" uname="JeffDesk.LAN"/>
<node id="Server2.LAN" type="normal" uname="Server2.LAN"/>
</nodes>
<resources>
<primitive class="ocf" id="ClusterIP" provider="heartbeat"
type="IPaddr2">
<instance_attributes id="ClusterIP-instance_attributes">
<nvpair id="ClusterIP-instance_attributes-ip" name="ip"
value="192.168.0.200"/>
<nvpair id="ClusterIP-instance_attributes-cidr_netmask"
name="cidr_netmask" value="32"/>
</instance_attributes>
<operations>
<op id="ClusterIP-monitor-30s" interval="30s" name="monitor"/>
</operations>
</primitive>
<clone id="Fencing">
<primitive class="stonith" id="meatware-fence" type="meatware">
<instance_attributes id="meatware-fence-instance_attributes">
<nvpair id="meatware-fence-instance_attributes-hostlist"
name="hostlist" value="JeffDesk.LAN Server2.LAN Server4.LAN"/>
</instance_attributes>
</primitive>
</clone>
</resources>
<constraints/>
</configuration>
</cib>
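For anyone who wants to check the same fencing-related settings in their own dump (for example the output of `cibadmin -Q`), here is a minimal sketch that pulls out the `stonith-enabled` property and every primitive of class `stonith`. The function name `summarize_fencing` and the trimmed `SAMPLE_CIB` are only for illustration; the sample keeps just the fencing-related parts of the config above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the CIB above: only the parts relevant to fencing.
SAMPLE_CIB = """<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-stonith-enabled"
                name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <resources>
      <clone id="Fencing">
        <primitive class="stonith" id="meatware-fence" type="meatware">
          <instance_attributes id="meatware-fence-instance_attributes">
            <nvpair id="meatware-fence-instance_attributes-hostlist"
                    name="hostlist"
                    value="JeffDesk.LAN Server2.LAN Server4.LAN"/>
          </instance_attributes>
        </primitive>
      </clone>
    </resources>
  </configuration>
</cib>"""


def summarize_fencing(cib_xml):
    """Return the stonith-enabled value and all stonith primitives in a CIB dump."""
    root = ET.fromstring(cib_xml.strip())

    # Cluster-wide properties live under crm_config/cluster_property_set.
    props = {
        nv.get("name"): nv.get("value")
        for nv in root.findall(".//crm_config/cluster_property_set/nvpair")
    }

    devices = []
    for prim in root.iter("primitive"):
        if prim.get("class") != "stonith":
            continue  # ordinary resource such as ClusterIP
        attrs = {
            nv.get("name"): nv.get("value")
            for nv in prim.findall("./instance_attributes/nvpair")
        }
        devices.append({
            "id": prim.get("id"),
            "type": prim.get("type"),
            "hosts": attrs.get("hostlist", "").split(),
        })

    return {"stonith_enabled": props.get("stonith-enabled"), "devices": devices}


if __name__ == "__main__":
    print(summarize_fencing(SAMPLE_CIB))
```

This only reports what is configured; it says nothing about whether the device would actually be triggered.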
When I shut down a node, I see this in the logs:
[...]
Jun 6 09:53:00 Server2 crmd: [2362]: info: handle_shutdown_request: Creating
shutdown request for Server4.LAN (state=S_IDLE)
Jun 6 09:53:00 Server2 crmd: [2362]: info: abort_transition_graph:
te_update_diff:149 - Triggered transition abort (complete=1, tag=nvpair,
id=status-Server4.LAN-shutdown, magic=NA, cib=0.17.208) : Transient attribute:
update
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: All 3 cluster
nodes are eligible to run resources.
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_pe_invoke: Query 84: Requesting
the current CIB: S_POLICY_ENGINE
Jun 6 09:53:00 Server2 pengine: [2361]: notice: native_print:
ClusterIP#011(ocf::heartbeat:IPaddr2):#011Started Server2.LAN
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_pe_invoke_callback: Invoking
the PE: query=84, ref=pe_calc-dc-1307368380-52, seq=252, quorate=1
Jun 6 09:53:00 Server2 pengine: [2361]: notice: clone_print: Clone Set:
Fencing [meatware-fence]
Jun 6 09:53:00 Server2 pengine: [2361]: notice: short_print: Started: [
Server2.LAN JeffDesk.LAN Server4.LAN ]
Jun 6 09:53:00 Server2 pengine: [2361]: notice: stage6: Scheduling Node
Server4.LAN for shutdown
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave
ClusterIP#011(Started Server2.LAN)
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave
meatware-fence:0#011(Started Server2.LAN)
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave
meatware-fence:1#011(Started JeffDesk.LAN)
Jun 6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Stop
meatware-fence:2#011(Server4.LAN)
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jun 6 09:53:00 Server2 crmd: [2362]: info: unpack_graph: Unpacked transition
4: 4 actions in 4 synapses
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_te_invoke: Processing graph 4
(ref=pe_calc-dc-1307368380-52) derived from /var/lib/pengine/pe-input-67.bz2
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_pseudo_action: Pseudo action 16
fired and confirmed
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_rsc_command: Initiating action
13: stop meatware-fence:2_stop_0 on Server4.LAN
Jun 6 09:53:00 Server2 crmd: [2362]: info: match_graph_event: Action
meatware-fence:2_stop_0 (13) confirmed on Server4.LAN (rc=0)
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_pseudo_action: Pseudo action 17
fired and confirmed
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_crm_command: Executing
crm-event (20): do_shutdown on Server4.LAN
Jun 6 09:53:00 Server2 crmd: [2362]: info: run_graph:
====================================================
Jun 6 09:53:00 Server2 crmd: [2362]: notice: run_graph: Transition 4
(Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-67.bz2): Complete
Jun 6 09:53:00 Server2 crmd: [2362]: info: te_graph_trigger: Transition 4 is
now complete
Jun 6 09:53:00 Server2 crmd: [2362]: info: notify_crmd: Transition 4 status:
done - <null>
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: Starting
PEngine Recheck Timer
Jun 6 09:53:00 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000111112 (was
00000000000000000000000000111312)
Jun 6 09:53:00 Server2 stonith-ng: [2357]: info: crm_update_peer: Node
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 attrd: [2360]: info: crm_update_peer: Node
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 crmd: [2362]: notice: crmd_peer_update: Status update:
Client Server4.LAN/crmd now has status [offline] (DC=true)
Jun 6 09:53:00 Server2 crmd: [2362]: info: erase_node_from_join: Removed node
Server4.LAN from join calculations: welcomed=0 itegrated=0 finalized=0
confirmed=1
Jun 6 09:53:00 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000111112 (new)
Jun 6 09:53:00 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000101112 (was
00000000000000000000000000111112)
Jun 6 09:53:00 Server2 stonith-ng: [2357]: info: crm_update_peer: Node
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000101112 (new)
Jun 6 09:53:00 Server2 attrd: [2360]: info: crm_update_peer: Node
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000101112 (new)
Jun 6 09:53:00 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000101112 (new)
Jun 6 09:53:00 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000101112 (new)
Jun 6 09:53:01 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000100112 (was
00000000000000000000000000101112)
Jun 6 09:53:01 Server2 attrd: [2360]: info: crm_update_peer: Node
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 stonith-ng: [2357]: info: crm_update_peer: Node
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000100112 (new)
Jun 6 09:53:01 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000100102 (was
00000000000000000000000000100112)
Jun 6 09:53:01 Server2 attrd: [2360]: info: crm_update_peer: Node
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 stonith-ng: [2357]: info: crm_update_peer: Node
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000100102 (new)
Jun 6 09:53:01 Server2 cib: [2358]: info: cib_process_shutdown_req: Shutdown
REQ from Server4.LAN
Jun 6 09:53:01 Server2 cib: [2358]: info: cib_process_request: Operation
complete: op cib_shutdown_req for section 'all'
(origin=Server4.LAN/Server4.LAN/(null), version=0.17.210): ok (rc=0)
Jun 6 09:53:06 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000100002 (was
00000000000000000000000000100102)
Jun 6 09:53:06 Server2 stonith-ng: [2357]: info: crm_update_peer: Node
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 attrd: [2360]: info: crm_update_peer: Node
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000100002 (new)
Jun 6 09:53:06 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000000002 (was
00000000000000000000000000100002)
Jun 6 09:53:06 Server2 attrd: [2360]: info: crm_update_peer: Node
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000000002 (new)
Jun 6 09:53:06 Server2 stonith-ng: [2357]: info: crm_update_peer: Node
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0
proc=00000000000000000000000000000002 (new)
Jun 6 09:53:06 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000000002 (new)
Jun 6 09:53:06 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN:
id=67152064 state=member addr=r(0) ip(192.168.0.4) votes=1 born=244 seen=252
proc=00000000000000000000000000000002 (new)
[...]
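The repeated "process list" lines are easier to follow when reduced to what changed between each pair. The sketch below does that, assuming the process list is a hexadecimal bitmask with one bit per cluster daemon (the log format suggests this, but it is an assumption here, and the sketch does not try to name the daemons). The names `cleared_flags` and `process_changes` are made up for this example.

```python
import re

# Matches the pacemakerd update_node_processes message once line wrapping is undone.
PROC_LIST_RE = re.compile(
    r"Node (\S+) now has process list: ([0-9a-fA-F]+) \(was ([0-9a-fA-F]+)\)"
)

# Two of the entries from the log above, wrapped exactly as the mail client left them.
SAMPLE_LOG = """Jun 6 09:53:00 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000111112 (was
00000000000000000000000000111312)
Jun 6 09:53:06 Server2 pacemakerd: [2353]: info: update_node_processes: Node
Server4.LAN now has process list: 00000000000000000000000000000002 (was
00000000000000000000000000100002)"""


def cleared_flags(new_hex, old_hex):
    """Return the individual bits set in old_hex but no longer set in new_hex.

    Assumption: both values are hexadecimal bitmasks.
    """
    old, new = int(old_hex, 16), int(new_hex, 16)
    gone = old & ~new
    return [1 << i for i in range(gone.bit_length()) if (gone >> i) & 1]


def process_changes(log_text):
    """Return (node, [cleared bits as hex strings]) for each process-list update."""
    flat = " ".join(log_text.split())  # undo the 80-column wrapping
    return [
        (node, [hex(bit) for bit in cleared_flags(new, old)])
        for node, new, old in PROC_LIST_RE.findall(flat)
    ]


if __name__ == "__main__":
    for node, bits in process_changes(SAMPLE_LOG):
        print(node, "lost", bits)
```

Run over the full log, each update clears exactly one bit, which is what you would expect if the daemons on Server4.LAN are exiting one at a time.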
The references to pseudo actions (16 and 17) look suspicious to me.
Jeff