[Pacemaker] fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4
John McCabe
john at johnmccabe.net
Wed May 22 09:31:30 UTC 2013
Hi,
I've been trying to get fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64)
working within pacemaker (pacemaker-1.1.8-7.el6.x86_64) but am unable to
get it to work as intended. Running fence_rhevm on the command line works
as expected, as does stonith_admin, but when the fence is triggered from
within pacemaker (by deliberately killing corosync on the node to be
fenced) it fails, and I see the following logged:
May 21 22:21:32 defiant corosync[1245]: [TOTEM ] A processor failed,
forming new configuration.
May 21 22:21:34 defiant corosync[1245]: [QUORUM] Members[1]: 1
May 21 22:21:34 defiant corosync[1245]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
May 21 22:21:34 defiant kernel: dlm: closing connection to node 2
May 21 22:21:34 defiant corosync[1245]: [CPG ] chosen downlist: sender
r(0) ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 defiant corosync[1245]: [MAIN ] Completed service
synchronization, ready to provide service.
May 21 22:21:34 defiant crmd[1749]: notice: crm_update_peer_state:
cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 defiant crmd[1749]: warning: match_down_event: No match
for shutdown action on enterprise
May 21 22:21:34 defiant crmd[1749]: notice: peer_update_callback:
Stonith/shutdown of enterprise not matched
May 21 22:21:34 defiant crmd[1749]: notice: do_state_transition: State
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL
origin=check_join_state ]
May 21 22:21:34 defiant fenced[1302]: fencing node enterprise
May 21 22:21:34 defiant logger: fence_pcmk[2219]: Requesting Pacemaker
fence enterprise (reset)
May 21 22:21:34 defiant stonith_admin[2220]: notice: crm_log_args:
Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:35 defiant attrd[1747]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
May 21 22:21:35 defiant attrd[1747]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:36 defiant pengine[1748]: notice: unpack_config: On loss of
CCM Quorum: Ignore
May 21 22:21:36 defiant pengine[1748]: notice: process_pe_message:
Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 defiant crmd[1749]: notice: run_graph: Transition 64
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 defiant crmd[1749]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
May 21 22:21:44 defiant logger: fence_pcmk[2219]: Call to fence enterprise
(reset) failed with rc=255
May 21 22:21:45 defiant fenced[1302]: fence enterprise dev 0.0 agent
fence_pcmk result: error from agent
May 21 22:21:45 defiant fenced[1302]: fence enterprise failed
May 21 22:21:48 defiant fenced[1302]: fencing node enterprise
May 21 22:21:48 defiant logger: fence_pcmk[2239]: Requesting Pacemaker
fence enterprise (reset)
May 21 22:21:48 defiant stonith_admin[2240]: notice: crm_log_args:
Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:58 defiant logger: fence_pcmk[2239]: Call to fence enterprise
(reset) failed with rc=255
May 21 22:21:58 defiant fenced[1302]: fence enterprise dev 0.0 agent
fence_pcmk result: error from agent
May 21 22:21:58 defiant fenced[1302]: fence enterprise failed
May 21 22:22:01 defiant fenced[1302]: fencing node enterprise
corosync.log likewise shows "warning: match_down_event: No match for
shutdown action on enterprise" and "notice: peer_update_callback:
Stonith/shutdown of enterprise not matched":
May 21 22:21:32 corosync [TOTEM ] A processor failed, forming new
configuration.
May 21 22:21:34 corosync [QUORUM] Members[1]: 1
May 21 22:21:34 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
May 21 22:21:34 [1749] defiant crmd: info: cman_event_callback:
Membership 296: quorum retained
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership:
Left[5.0] cib.2
May 21 22:21:34 [1744] defiant cib: info: crm_update_peer_proc:
pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership:
Member[5.0] cib.1
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership:
Left[5.0] stonith-ng.2
May 21 22:21:34 [1745] defiant stonith-ng: info: crm_update_peer_proc:
pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 corosync [CPG ] chosen downlist: sender r(0)
ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 corosync [MAIN ] Completed service synchronization, ready
to provide service.
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership:
Member[5.0] stonith-ng.1
May 21 22:21:34 [1749] defiant crmd: notice: crm_update_peer_state:
cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback:
enterprise is now lost (was member)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/150, version=0.22.3): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership:
Left[5.0] crmd.2
May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_proc:
pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback:
Client enterprise/peer now has status [offline] (DC=true)
May 21 22:21:34 [1749] defiant crmd: warning: match_down_event: No
match for shutdown action on enterprise
May 21 22:21:34 [1749] defiant crmd: notice: peer_update_callback:
Stonith/shutdown of enterprise not matched
May 21 22:21:34 [1749] defiant crmd: info:
crm_update_peer_expected: peer_update_callback: Node enterprise[2] -
expected state is now down
May 21 22:21:34 [1749] defiant crmd: info:
abort_transition_graph: peer_update_callback:211 - Triggered transition
abort (complete=1) : Node failure
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership:
Member[5.0] crmd.1
May 21 22:21:34 [1749] defiant crmd: notice: do_state_transition:
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info:
abort_transition_graph: do_te_invoke:163 - Triggered transition abort
(complete=1) : Peer Halt
May 21 22:21:34 [1749] defiant crmd: info: join_make_offer:
Making join offers based on membership 296
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_offer_all:
join-7: Waiting on 1 outstanding join acks
May 21 22:21:34 [1749] defiant crmd: info: update_dc: Set
DC to defiant (3.0.7)
May 21 22:21:34 [1749] defiant crmd: info: do_state_transition:
State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_finalize:
join-7: Syncing the CIB from defiant to the rest of the cluster
May 21 22:21:34 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_sync for section 'all'
(origin=local/crmd/154, version=0.22.5): OK (rc=0)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/155, version=0.22.6): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: stonith_action_create:
Initiating action metadata for agent fence_rhevm (target=(null))
May 21 22:21:35 [1749] defiant crmd: info: do_dc_join_ack:
join-7: Updating node state to member for defiant
May 21 22:21:35 [1749] defiant crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='defiant']/lrm
May 21 22:21:35 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='defiant']/lrm (origin=local/crmd/156, version=0.22.7):
OK (rc=0)
May 21 22:21:35 [1749] defiant crmd: info: do_state_transition:
State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:35 [1749] defiant crmd: info:
abort_transition_graph: do_te_invoke:156 - Triggered transition abort
(complete=1) : Peer Cancelled
May 21 22:21:35 [1747] defiant attrd: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
May 21 22:21:35 [1747] defiant attrd: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/158, version=0.22.9): OK (rc=0)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section cib
(origin=local/crmd/160, version=0.22.11): OK (rc=0)
May 21 22:21:36 [1748] defiant pengine: info: unpack_config:
Startup probes: enabled
May 21 22:21:36 [1748] defiant pengine: notice: unpack_config: On
loss of CCM Quorum: Ignore
May 21 22:21:36 [1748] defiant pengine: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
May 21 22:21:36 [1748] defiant pengine: info: unpack_domains:
Unpacking domains
May 21 22:21:36 [1748] defiant pengine: info:
determine_online_status_fencing: Node defiant is active
May 21 22:21:36 [1748] defiant pengine: info:
determine_online_status: Node defiant is online
May 21 22:21:36 [1748] defiant pengine: info: native_print:
st-rhevm (stonith:fence_rhevm): Started defiant
May 21 22:21:36 [1748] defiant pengine: info: LogActions:
Leave st-rhevm (Started defiant)
May 21 22:21:36 [1748] defiant pengine: notice: process_pe_message:
Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May 21 22:21:36 [1749] defiant crmd: info: do_te_invoke:
Processing graph 64 (ref=pe_calc-dc-1369171296-118) derived from
/var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: notice: run_graph:
Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 [1749] defiant crmd: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
I can get the node enterprise to fence as expected from the command line
with either of the following:
stonith_admin --reboot enterprise --tolerance 5s
fence_rhevm -o reboot -a <hypervisor ip> -l <user>@<domain> -p <password>
-n enterprise -z
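When going via stonith_admin I can also bump the verbosity to capture more
detail from stonith-ng while the request runs (a rough check only; I'm
assuming the usual pacemaker -V flag behaves the same for stonith_admin on
this build):

stonith_admin --reboot enterprise --tolerance 5s -V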
My config is as follows:
cluster.conf -----------------------------------
<?xml version="1.0"?>
<cluster config_version="1" name="cluster">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="defiant" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="defiant"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="enterprise" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="enterprise"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <cman two_node="1" expected_votes="1">
  </cman>
</cluster>
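As a side note, this is roughly how the fenced to pacemaker redirect could
be exercised by hand, outside of a real node failure. It's only a sketch:
I'm assuming fence_pcmk accepts the usual action=/port= key=value pairs on
stdin like the other fence agents do, rather than only whatever fenced
passes it:

printf "action=reboot\nport=enterprise\n" | fence_pcmk
echo $?   # 0 on success; fenced is logging rc=255 when it goes this route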
pacemaker cib ---------------------------------
Stonith device created with:
pcs stonith create st-rhevm fence_rhevm login="<user>@<domain>"
passwd="<password>" ssl=1 ipaddr="<hypervisor ip>" verbose=1
debug="/tmp/debug.log"
<cib epoch="18" num_updates="88" admin_epoch="0" validate-with="pacemaker-1.2" update-origin="defiant" update-client="cibadmin" cib-last-written="Tue May 21 07:55:31 2013" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="defiant">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="defiant" uname="defiant"/>
      <node id="enterprise" uname="enterprise"/>
    </nodes>
    <resources>
      <primitive class="stonith" id="st-rhevm" type="fence_rhevm">
        <instance_attributes id="st-rhevm-instance_attributes">
          <nvpair id="st-rhevm-instance_attributes-login" name="login" value="<user>@<domain>"/>
          <nvpair id="st-rhevm-instance_attributes-passwd" name="passwd" value="<password>"/>
          <nvpair id="st-rhevm-instance_attributes-debug" name="debug" value="/tmp/debug.log"/>
          <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl" value="1"/>
          <nvpair id="st-rhevm-instance_attributes-verbose" name="verbose" value="1"/>
          <nvpair id="st-rhevm-instance_attributes-ipaddr" name="ipaddr" value="<hypervisor ip>"/>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status>
    <node_state id="defiant" uname="defiant" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <transient_attributes id="defiant">
        <instance_attributes id="status-defiant">
          <nvpair id="status-defiant-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="defiant">
        <lrm_resources>
          <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
            <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" transition-magic="0:0;2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" call-id="14" rc-code="0" op-status="0" interval="0" last-run="1369119332" last-rc-change="0" exec-time="232" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="enterprise" uname="enterprise" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <lrm id="enterprise">
        <lrm_resources>
          <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
            <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" transition-magic="0:7;5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1369170800" last-rc-change="0" exec-time="4" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
      <transient_attributes id="enterprise">
        <instance_attributes id="status-enterprise">
          <nvpair id="status-enterprise-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
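For reference, this is roughly how I plan to double-check what stonith-ng
has registered on the DC (again only a sketch; I'm assuming the
-L/--list-registered and -l/--list options of stonith_admin behave on 1.1.8
as documented):

stonith_admin -L              # list every registered stonith device
stonith_admin -l enterprise   # list devices stonith-ng thinks can fence "enterprise"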
The debug log output from fence_rhevm doesn't appear to show pacemaker
trying to request the reboot, only a "vms" command sent to the hypervisor,
which responds with XML listing the VMs.
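To compare against that debug output I can reproduce just the query side by
hand (assuming this fence_rhevm build accepts -o list and -o status in the
same way it accepts -o reboot):

fence_rhevm -o list -a <hypervisor ip> -l <user>@<domain> -p <password> -z
fence_rhevm -o status -a <hypervisor ip> -l <user>@<domain> -p <password> -n enterprise -z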
I can't quite see why it's failing. Are you aware of any issues
with fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with
pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4?
All the best,
/John