[Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.
Paul E Cain
pecain at us.ibm.com
Thu Jun 12 03:02:50 UTC 2014
Hi Andrew,
Thank you for your quick response.
I removed the on-fail="standby" items and re-tested, but the problem
persists. The cibadmin -Q output I gave you was actually taken after I had
done the STONITH on ha4, when ha3_fabric_ping tried to come up but failed.
In hindsight, I should have made that clear, or given you the cibadmin -Q
output from while the cluster is sitting there waiting for me to STONITH
and ha3_fabric_ping won't start. Do you have any other ideas on why this
would fail, or even just a way to get around the problem? I just need to
prevent the node from fencing and trying to bring up the cluster resources
if it cannot ping 10.10.0.1. Heartbeat had ping_group, but I know of no
similar feature in Corosync/Pacemaker.
Thanks again for your time.
Here is the information captured while the cluster was waiting for me to
fence. When I ran crm_simulate on the CIB, this is what I got:
[root@ha3 ~]# crm_simulate -x /tmp/cib.xml
Current cluster status:
Node ha4 (168427535): UNCLEAN (offline)
Online: [ ha3 ]
ha3_fabric_ping (ocf::pacemaker:ping): Stopped
ha4_fabric_ping (ocf::pacemaker:ping): Stopped
fencing_route_to_ha3 (stonith:meatware): Stopped
fencing_route_to_ha4 (stonith:meatware): Stopped
[root@ha3 ~]# crm_mon -1
Last updated: Wed Jun 11 21:48:16 2014
Last change: Wed Jun 11 21:38:54 2014 via crmd on ha3
Stack: corosync
Current DC: ha3 (168427534) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
4 Resources configured
Node ha4 (168427535): UNCLEAN (offline)
Online: [ ha3 ]
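
For anyone who wants to detect this waiting state from a script, the status output above can be scanned for nodes reported as UNCLEAN. This is only a minimal sketch: it assumes the "Node <name> (<id>): UNCLEAN (...)" line format shown in the crm_simulate and crm_mon output above, and the function name unclean_nodes is made up for illustration. It is not part of Pacemaker.

```python
import re

# Matches lines such as "Node ha4 (168427535): UNCLEAN (offline)".
# The format is an assumption taken from the output pasted above.
NODE_LINE = re.compile(r"^Node\s+(\S+)\s+\((\d+)\):\s+UNCLEAN\b")


def unclean_nodes(status_text):
    """Return the names of nodes reported as UNCLEAN, in order of appearance."""
    names = []
    for line in status_text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


# Sample text copied from the status output above.
SAMPLE = """\
Node ha4 (168427535): UNCLEAN (offline)
Online: [ ha3 ]
"""

if __name__ == "__main__":
    print(unclean_nodes(SAMPLE))
```

On a live node the text would come from the output of crm_mon -1 instead of the embedded sample.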
cibadmin -Q
<cib epoch="208" num_updates="11" admin_epoch="0"
validate-with="pacemaker-1.2" cib-last-written="Wed Jun 11 21:38:54 2014"
crm_feature_set="3.0.8" update-origin="ha3" update-client="crmd"
have-quorum="1" dc-uuid="168427534">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair name="symmetric-cluster" value="true"
id="cib-bootstrap-options-symmetric-cluster"/>
<nvpair name="stonith-enabled" value="true"
id="cib-bootstrap-options-stonith-enabled"/>
<nvpair name="stonith-action" value="reboot"
id="cib-bootstrap-options-stonith-action"/>
<nvpair name="no-quorum-policy" value="ignore"
id="cib-bootstrap-options-no-quorum-policy"/>
<nvpair name="stop-orphan-resources" value="true"
id="cib-bootstrap-options-stop-orphan-resources"/>
<nvpair name="stop-orphan-actions" value="true"
id="cib-bootstrap-options-stop-orphan-actions"/>
<nvpair name="default-action-timeout" value="20s"
id="cib-bootstrap-options-default-action-timeout"/>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-9d39a6b"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="168427534" uname="ha3"/>
<node id="168427535" uname="ha4"/>
</nodes>
<resources>
<primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker"
type="ping">
<instance_attributes id="ha3_fabric_ping-instance_attributes">
<nvpair name="host_list" value="10.10.0.1"
id="ha3_fabric_ping-instance_attributes-host_list"/>
<nvpair name="failure_score" value="1"
id="ha3_fabric_ping-instance_attributes-failure_score"/>
</instance_attributes>
<operations>
<op name="start" timeout="60s" requires="nothing" interval="0"
id="ha3_fabric_ping-start-0">
<instance_attributes
id="ha3_fabric_ping-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" interval="15s" requires="nothing"
timeout="15s" id="ha3_fabric_ping-monitor-15s">
<instance_attributes
id="ha3_fabric_ping-monitor-15s-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="stop" on-fail="fence" requires="nothing" interval="0"
id="ha3_fabric_ping-stop-0">
<instance_attributes
id="ha3_fabric_ping-stop-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
</primitive>
<primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker"
type="ping">
<instance_attributes id="ha4_fabric_ping-instance_attributes">
<nvpair name="host_list" value="10.10.0.1"
id="ha4_fabric_ping-instance_attributes-host_list"/>
<nvpair name="failure_score" value="1"
id="ha4_fabric_ping-instance_attributes-failure_score"/>
</instance_attributes>
<operations>
<op name="start" timeout="60s" requires="nothing" interval="0"
id="ha4_fabric_ping-start-0">
<instance_attributes
id="ha4_fabric_ping-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" interval="15s" requires="nothing"
timeout="15s" id="ha4_fabric_ping-monitor-15s">
<instance_attributes
id="ha4_fabric_ping-monitor-15s-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="stop" on-fail="fence" requires="nothing" interval="0"
id="ha4_fabric_ping-stop-0">
<instance_attributes
id="ha4_fabric_ping-stop-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
</primitive>
<primitive id="fencing_route_to_ha3" class="stonith" type="meatware">
<instance_attributes id="fencing_route_to_ha3-instance_attributes">
<nvpair name="hostlist" value="ha3"
id="fencing_route_to_ha3-instance_attributes-hostlist"/>
</instance_attributes>
<operations>
<op name="start" requires="nothing" interval="0"
id="fencing_route_to_ha3-start-0">
<instance_attributes
id="fencing_route_to_ha3-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" requires="nothing" interval="0"
id="fencing_route_to_ha3-monitor-0">
<instance_attributes
id="fencing_route_to_ha3-monitor-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
</primitive>
<primitive id="fencing_route_to_ha4" class="stonith" type="meatware">
<instance_attributes id="fencing_route_to_ha4-instance_attributes">
<nvpair name="hostlist" value="ha4"
id="fencing_route_to_ha4-instance_attributes-hostlist"/>
</instance_attributes>
<operations>
<op name="start" requires="nothing" interval="0"
id="fencing_route_to_ha4-start-0">
<instance_attributes
id="fencing_route_to_ha4-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" requires="nothing" interval="0"
id="fencing_route_to_ha4-monitor-0">
<instance_attributes
id="fencing_route_to_ha4-monitor-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
</primitive>
</resources>
<constraints>
<rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping"
score="INFINITY" node="ha3"/>
<rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping"
score="-INFINITY" node="ha4"/>
<rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping"
score="INFINITY" node="ha4"/>
<rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping"
score="-INFINITY" node="ha3"/>
<rsc_location id="fencing_route_to_ha4_location"
rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
<rsc_location id="fencing_route_to_ha4_not_location"
rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
<rsc_location id="fencing_route_to_ha3_location"
rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
<rsc_location id="fencing_route_to_ha3_not_location"
rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
<rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4"
score="INFINITY" first="ha3_fabric_ping" first-action="start"
then="fencing_route_to_ha4" then-action="start"/>
<rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3"
score="INFINITY" first="ha4_fabric_ping" first-action="start"
then="fencing_route_to_ha3" then-action="start"/>
</constraints>
<rsc_defaults>
<meta_attributes id="rsc-options">
<nvpair name="resource-stickiness" value="INFINITY"
id="rsc-options-resource-stickiness"/>
<nvpair name="migration-threshold" value="0"
id="rsc-options-migration-threshold"/>
<nvpair name="is-managed" value="true"
id="rsc-options-is-managed"/>
</meta_attributes>
</rsc_defaults>
</configuration>
<status>
<node_state id="168427534" uname="ha3" in_ccm="true" crmd="online"
crm-debug-origin="do_update_resource" join="member" expected="member">
<lrm id="168427534">
<lrm_resources>
<lrm_resource id="ha3_fabric_ping" type="ping" class="ocf"
provider="pacemaker">
<lrm_rsc_op id="ha3_fabric_ping_last_0"
operation_key="ha3_fabric_ping_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="4:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
transition-magic="0:7;4:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
call-id="5" rc-code="7" op-status="0" interval="0" last-run="1402540735"
last-rc-change="1402540735" exec-time="42" queue-time="0"
op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
</lrm_resource>
<lrm_resource id="ha4_fabric_ping" type="ping" class="ocf"
provider="pacemaker">
<lrm_rsc_op id="ha4_fabric_ping_last_0"
operation_key="ha4_fabric_ping_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="5:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
transition-magic="0:7;5:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
call-id="9" rc-code="7" op-status="0" interval="0" last-run="1402540735"
last-rc-change="1402540735" exec-time="10" queue-time="0"
op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
</lrm_resource>
<lrm_resource id="fencing_route_to_ha3" type="meatware"
class="stonith">
<lrm_rsc_op id="fencing_route_to_ha3_last_0"
operation_key="fencing_route_to_ha3_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="6:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
transition-magic="0:7;6:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402540735"
last-rc-change="1402540735" exec-time="1" queue-time="0"
op-digest="502fbd7a2366c2be772d7fbecc9e0351"/>
</lrm_resource>
<lrm_resource id="fencing_route_to_ha4" type="meatware"
class="stonith">
<lrm_rsc_op id="fencing_route_to_ha4_last_0"
operation_key="fencing_route_to_ha4_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="7:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
transition-magic="0:7;7:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1"
call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402540735"
last-rc-change="1402540735" exec-time="0" queue-time="0"
op-digest="5be26fbcfd648e3d545d0115645dde76"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="168427534">
<instance_attributes id="status-168427534">
<nvpair id="status-168427534-shutdown" name="shutdown"
value="0"/>
<nvpair id="status-168427534-probe_complete"
name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
</node_state>
<node_state id="168427535" in_ccm="true" crmd="offline" join="down"
crm-debug-origin="do_state_transition"/>
</status>
</cib>
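
To double-check that requires="nothing" and the prereq nvpair really are set on every operation in a dump like the one above, the CIB can be walked with a short script. This is a hedged sketch using only the Python standard library; the helper name op_requirements and the trimmed sample CIB are illustrative assumptions, and the script only reports what is configured. It says nothing about how the policy engine interprets those settings.

```python
import xml.etree.ElementTree as ET


def op_requirements(cib_xml):
    """Map (resource id, op name) -> (requires attribute, prereq nvpair value)."""
    root = ET.fromstring(cib_xml)
    result = {}
    for primitive in root.iter("primitive"):
        for op in primitive.iter("op"):
            prereq = None
            # The prereq setting lives in an nvpair nested under the op.
            for nvpair in op.iter("nvpair"):
                if nvpair.get("name") == "prereq":
                    prereq = nvpair.get("value")
            key = (primitive.get("id"), op.get("name"))
            result[key] = (op.get("requires"), prereq)
    return result


# Trimmed-down sample modelled on the ha3_fabric_ping start op above.
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
        <operations>
          <op name="start" timeout="60s" requires="nothing" interval="0"
              id="ha3_fabric_ping-start-0">
            <instance_attributes id="ha3_fabric_ping-start-0-instance_attributes">
              <nvpair name="prereq" value="nothing"
                      id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
            </instance_attributes>
          </op>
        </operations>
      </primitive>
    </resources>
  </configuration>
</cib>
"""

if __name__ == "__main__":
    for key, value in op_requirements(SAMPLE_CIB).items():
        print(key, value)
```

Feeding it the full cibadmin -Q output instead of the sample would list every operation of every primitive, which makes it easy to spot one that is missing the setting.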
/var/log/messages
Jun 11 21:38:32 ha3 systemd: Starting LSB: Starts and stops Pacemaker
Cluster Manager....
Jun 11 21:38:32 ha3 pacemaker: Starting Pacemaker Cluster Manager
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: mcp_read_config: Configured
corosync to accept connections from group 1000: OK (1)
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: main: Starting Pacemaker
1.1.10 (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc
lha-fencing nagios corosync-native libesmtp
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427534
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: cluster_connect_quorum:
Quorum acquired
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node ha3[168427534] - state is now member (was
(null))
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable
to get node name for nodeid 168427535
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427535
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable
to get node name for nodeid 168427535
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable
to get node name for nodeid 168427535
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427535
Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node (null)[168427535] - state is now member (was
(null))
Jun 11 21:38:32 ha3 pengine[12486]: warning:
crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by
group haclient
Jun 11 21:38:32 ha3 cib[12482]: warning:
crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group
haclient
Jun 11 21:38:32 ha3 cib[12482]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: corosync
Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jun 11 21:38:32 ha3 crmd[12487]: notice: main: CRM Git Version: 9d39a6b
Jun 11 21:38:32 ha3 crmd[12487]: warning:
crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by
group haclient
Jun 11 21:38:32 ha3 crmd[12487]: warning:
crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group
haclient
Jun 11 21:38:32 ha3 attrd[12485]: notice: crm_cluster_connect: Connecting
to cluster infrastructure: corosync
Jun 11 21:38:32 ha3 attrd[12485]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:32 ha3 attrd[12485]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
Jun 11 21:38:32 ha3 attrd[12485]: notice: crm_update_peer_state:
attrd_peer_change_cb: Node (null)[168427534] - state is now member (was
(null))
Jun 11 21:38:32 ha3 attrd[12485]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:32 ha3 attrd[12485]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427534
Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 21:38:32 ha3 cib[12482]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:32 ha3 cib[12482]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
Jun 11 21:38:32 ha3 cib[12482]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:32 ha3 cib[12482]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 21:38:33 ha3 crmd[12487]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: corosync
Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 21:38:33 ha3 crmd[12487]: notice: cluster_connect_quorum: Quorum
acquired
Jun 11 21:38:33 ha3 crmd[12487]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node ha3[168427534] - state is now member (was
(null))
Jun 11 21:38:33 ha3 stonith-ng[12483]: notice: setup_cib: Watching for
stonith topology changes
Jun 11 21:38:33 ha3 stonith-ng[12483]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427535
Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427535
Jun 11 21:38:33 ha3 crmd[12487]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node (null)[168427535] - state is now member (was
(null))
Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 21:38:33 ha3 crmd[12487]: notice: do_started: The local CRM is
operational
Jun 11 21:38:33 ha3 crmd[12487]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL
origin=do_started ]
Jun 11 21:38:34 ha3 stonith-ng[12483]: notice: stonith_device_register:
Added 'fencing_route_to_ha4' to the device list (1 active devices)
Jun 11 21:38:37 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ]
Jun 11 21:38:37 ha3 systemd: Started LSB: Starts and stops Pacemaker
Cluster Manager..
Jun 11 21:38:54 ha3 crmd[12487]: warning: do_log: FSA: Input I_DC_TIMEOUT
from crm_timer_popped() received in state S_PENDING
Jun 11 21:38:54 ha3 crmd[12487]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: Diff: --- 0.206.0
Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: Diff: +++ 0.207.1
6c3024691ae3d5b4c93705a5f2130993
Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="206" num_updates="0"/>
Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-9d39a6b"/>
Jun 11 21:38:54 ha3 cib[12482]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:54 ha3 cib[12482]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 21:38:54 ha3 crmd[12487]: warning: do_log: FSA: Input I_ELECTION_DC
from do_election_check() received in state S_INTEGRATION
Jun 11 21:38:54 ha3 cib[12482]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.208.1
Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="207" num_updates="1"/>
Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
Jun 11 21:38:54 ha3 attrd[12485]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 21:38:54 ha3 attrd[12485]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 21:38:54 ha3 attrd[12485]: notice: write_attribute: Sent update 2
with 1 changes for terminate, id=<n/a>, set=(null)
Jun 11 21:38:54 ha3 attrd[12485]: notice: write_attribute: Sent update 3
with 1 changes for shutdown, id=<n/a>, set=(null)
Jun 11 21:38:54 ha3 attrd[12485]: notice: attrd_cib_callback: Update 2 for
terminate[ha3]=(null): OK (0)
Jun 11 21:38:54 ha3 attrd[12485]: notice: attrd_cib_callback: Update 3 for
shutdown[ha3]=0: OK (0)
Jun 11 21:38:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:38:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4
for STONITH
Jun 11 21:38:55 ha3 pengine[12486]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 21:38:55 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:38:55 ha3 pengine[12486]: warning: process_pe_message: Calculated
Transition 0: /var/lib/pacemaker/pengine/pe-warn-82.bz2
Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
Jun 11 21:38:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot
fencing operation (12) on ha4 (timeout=60000)
Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: handle_request: Client
crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
f9cbe911-8a32-4abf-9d7a-08e34167c203 (0)
Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 21:38:55 ha3 stonith: [12503]: info: parse config info info=ha4
Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:38:55 ha3 stonith: [12508]: info: parse config info info=ha4
Jun 11 21:38:55 ha3 stonith: [12508]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 21:38:55 ha3 stonith: [12508]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 21:38:55 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=27, confirmed=true) not
running
Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
Jun 11 21:38:55 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=28, confirmed=true) not
running
Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
3: probe_complete probe_complete on ha3 (local) - no waiting
Jun 11 21:38:55 ha3 attrd[12485]: notice: write_attribute: Sent update 4
with 1 changes for probe_complete, id=<n/a>, set=(null)
Jun 11 21:38:55 ha3 attrd[12485]: notice: attrd_cib_callback: Update 4 for
probe_complete[ha3]=true: OK (0)
Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done:
Child process 12506 performing action 'reboot' timed out with signal 15
Jun 11 21:39:55 ha3 stonith-ng[12483]: error: log_operation: Operation
'reboot' [12506] (call 2 from crmd.12487) for host 'ha4' with device
'fencing_route_to_ha4' returned: -62 (Timer expired)
Jun 11 21:39:55 ha3 stonith-ng[12483]: warning: log_operation:
fencing_route_to_ha4:12506 [ Performing: stonith -t meatware -T reset ha4 ]
Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer:
Couldn't find anyone to fence ha4 with <any>
Jun 11 21:39:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487@ha3.f9cbe911: No route to host
Jun 11 21:39:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 2/12:0:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host
(-113)
Jun 11 21:39:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 2 for ha4 failed (No route to host): aborting transition.
Jun 11 21:39:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4
was not terminated (reboot) by ha3 for ha3: No route to host
(ref=f9cbe911-8a32-4abf-9d7a-08e34167c203) by client crmd.12487
Jun 11 21:39:55 ha3 crmd[12487]: notice: run_graph: Transition 0
(Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-82.bz2): Stopped
Jun 11 21:39:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:39:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4
for STONITH
Jun 11 21:39:55 ha3 pengine[12486]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 21:39:55 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:39:55 ha3 pengine[12486]: warning: process_pe_message: Calculated
Transition 1: /var/lib/pacemaker/pengine/pe-warn-83.bz2
Jun 11 21:39:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot
fencing operation (8) on ha4 (timeout=60000)
Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: handle_request: Client
crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
281b0cc2-f1cc-485f-aa03-c50704fc97f9 (0)
Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:39:55 ha3 stonith: [12536]: info: parse config info info=ha4
Jun 11 21:39:55 ha3 stonith: [12536]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 21:39:55 ha3 stonith: [12536]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done:
Child process 12535 performing action 'reboot' timed out with signal 15
Jun 11 21:40:55 ha3 stonith-ng[12483]: error: log_operation: Operation
'reboot' [12535] (call 3 from crmd.12487) for host 'ha4' with device
'fencing_route_to_ha4' returned: -62 (Timer expired)
Jun 11 21:40:55 ha3 stonith-ng[12483]: warning: log_operation:
fencing_route_to_ha4:12535 [ Performing: stonith -t meatware -T reset ha4 ]
Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer:
Couldn't find anyone to fence ha4 with <any>
Jun 11 21:40:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487@ha3.281b0cc2: No route to host
Jun 11 21:40:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 3/8:1:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host
(-113)
Jun 11 21:40:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 3 for ha4 failed (No route to host): aborting transition.
Jun 11 21:40:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4
was not terminated (reboot) by ha3 for ha3: No route to host
(ref=281b0cc2-f1cc-485f-aa03-c50704fc97f9) by client crmd.12487
Jun 11 21:40:55 ha3 crmd[12487]: notice: run_graph: Transition 1
(Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
Jun 11 21:40:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:40:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4
for STONITH
Jun 11 21:40:55 ha3 pengine[12486]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 21:40:55 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:40:55 ha3 pengine[12486]: warning: process_pe_message: Calculated
Transition 2: /var/lib/pacemaker/pengine/pe-warn-83.bz2
Jun 11 21:40:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot
fencing operation (8) on ha4 (timeout=60000)
Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: handle_request: Client
crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
1adec92a-c8f0-4087-9e51-e24c947ca171 (0)
Jun 11 21:40:55 ha3 stonith: [12543]: info: parse config info info=ha4
Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:40:55 ha3 stonith: [12545]: info: parse config info info=ha4
Jun 11 21:40:55 ha3 stonith: [12545]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 21:40:55 ha3 stonith: [12545]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done:
Child process 12544 performing action 'reboot' timed out with signal 15
Jun 11 21:41:55 ha3 stonith-ng[12483]: error: log_operation: Operation
'reboot' [12544] (call 4 from crmd.12487) for host 'ha4' with device
'fencing_route_to_ha4' returned: -62 (Timer expired)
Jun 11 21:41:55 ha3 stonith-ng[12483]: warning: log_operation:
fencing_route_to_ha4:12544 [ Performing: stonith -t meatware -T reset ha4 ]
Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer:
Couldn't find anyone to fence ha4 with <any>
Jun 11 21:41:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487@ha3.1adec92a: No route to host
Jun 11 21:41:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 4/8:2:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host
(-113)
Jun 11 21:41:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 4 for ha4 failed (No route to host): aborting transition.
Jun 11 21:41:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4
was not terminated (reboot) by ha3 for ha3: No route to host
(ref=1adec92a-c8f0-4087-9e51-e24c947ca171) by client crmd.12487
Jun 11 21:41:55 ha3 crmd[12487]: notice: run_graph: Transition 2
(Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
Jun 11 21:41:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:41:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4
for STONITH
Jun 11 21:41:55 ha3 pengine[12486]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 21:41:55 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:41:55 ha3 pengine[12486]: warning: process_pe_message: Calculated
Transition 3: /var/lib/pacemaker/pengine/pe-warn-83.bz2
Jun 11 21:41:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot
fencing operation (8) on ha4 (timeout=60000)
Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: handle_request: Client
crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
1511d2cb-2ab9-4a06-9676-807ed8b27f2b (0)
Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:41:55 ha3 stonith: [12548]: info: parse config info info=ha4
Jun 11 21:41:55 ha3 stonith: [12548]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 21:41:55 ha3 stonith: [12548]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done:
Child process 12547 performing action 'reboot' timed out with signal 15
Jun 11 21:42:55 ha3 stonith-ng[12483]: error: log_operation: Operation
'reboot' [12547] (call 5 from crmd.12487) for host 'ha4' with device
'fencing_route_to_ha4' returned: -62 (Timer expired)
Jun 11 21:42:55 ha3 stonith-ng[12483]: warning: log_operation:
fencing_route_to_ha4:12547 [ Performing: stonith -t meatware -T reset ha4 ]
Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer:
Couldn't find anyone to fence ha4 with <any>
Jun 11 21:42:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487 at ha3.1511d2cb: No route to host
Jun 11 21:42:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 5/8:3:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host
(-113)
Jun 11 21:42:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 5 for ha4 failed (No route to host): aborting transition.
Jun 11 21:42:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4
was not terminated (reboot) by ha3 for ha3: No route to host
(ref=1511d2cb-2ab9-4a06-9676-807ed8b27f2b) by client crmd.12487
Jun 11 21:42:55 ha3 crmd[12487]: notice: run_graph: Transition 3
(Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
Jun 11 21:42:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:42:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4
for STONITH
Jun 11 21:42:55 ha3 pengine[12486]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 21:42:55 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:42:55 ha3 pengine[12486]: warning: process_pe_message: Calculated
Transition 4: /var/lib/pacemaker/pengine/pe-warn-83.bz2
Jun 11 21:42:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot
fencing operation (8) on ha4 (timeout=60000)
Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: handle_request: Client
crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
4de370d1-dab3-4f8f-82cf-969899d6008c (0)
Jun 11 21:42:55 ha3 stonith: [12550]: info: parse config info info=ha4
Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:42:55 ha3 stonith: [12552]: info: parse config info info=ha4
Jun 11 21:42:55 ha3 stonith: [12552]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 21:42:55 ha3 stonith: [12552]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done:
Child process 12551 performing action 'reboot' timed out with signal 15
Jun 11 21:43:55 ha3 stonith-ng[12483]: error: log_operation: Operation
'reboot' [12551] (call 6 from crmd.12487) for host 'ha4' with device
'fencing_route_to_ha4' returned: -62 (Timer expired)
Jun 11 21:43:55 ha3 stonith-ng[12483]: warning: log_operation:
fencing_route_to_ha4:12551 [ Performing: stonith -t meatware -T reset ha4 ]
Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer:
Couldn't find anyone to fence ha4 with <any>
Jun 11 21:43:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487 at ha3.4de370d1: No route to host
Jun 11 21:43:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 6/8:4:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host
(-113)
Jun 11 21:43:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 6 for ha4 failed (No route to host): aborting transition.
Jun 11 21:43:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4
was not terminated (reboot) by ha3 for ha3: No route to host
(ref=4de370d1-dab3-4f8f-82cf-969899d6008c) by client crmd.12487
Jun 11 21:43:55 ha3 crmd[12487]: notice: run_graph: Transition 4
(Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
Jun 11 21:43:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:43:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4
for STONITH
Jun 11 21:43:55 ha3 pengine[12486]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 21:43:55 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:43:55 ha3 pengine[12486]: warning: process_pe_message: Calculated
Transition 5: /var/lib/pacemaker/pengine/pe-warn-83.bz2
Jun 11 21:43:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot
fencing operation (8) on ha4 (timeout=60000)
Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: handle_request: Client
crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
e766648d-20f0-4b94-b001-4873f9f8bb37 (0)
Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 21:43:55 ha3 stonith: [12554]: info: parse config info info=ha4
Jun 11 21:43:55 ha3 stonith: [12554]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 21:43:55 ha3 stonith: [12554]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 21:44:47 ha3 stonith: [12554]: info: node Meatware-reset: ha4
Jun 11 21:44:47 ha3 stonith-ng[12483]: notice: log_operation: Operation
'reboot' [12553] (call 7 from crmd.12487) for host 'ha4' with device
'fencing_route_to_ha4' returned: 0 (OK)
Jun 11 21:44:47 ha3 stonith-ng[12483]: notice: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487 at ha3.e766648d: OK
Jun 11 21:44:47 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith
operation 7/8:5:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: OK (0)
Jun 11 21:44:47 ha3 crmd[12487]: notice: crm_update_peer_state:
send_stonith_update: Node ha4[0] - state is now lost (was (null))
Jun 11 21:44:47 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4
was terminated (reboot) by ha3 for ha3: OK
(ref=e766648d-20f0-4b94-b001-4873f9f8bb37) by client crmd.12487
Jun 11 21:44:47 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
4: start ha3_fabric_ping_start_0 on ha3 (local)
Jun 11 21:45:07 ha3 attrd[12485]: notice: write_attribute: Sent update 5
with 1 changes for pingd, id=<n/a>, set=(null)
Jun 11 21:45:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 5 for
pingd[ha3]=0: OK (0)
Jun 11 21:45:07 ha3 ping(ha3_fabric_ping)[12560]: WARNING: pingd is less
than failure_score(1)
Jun 11 21:45:07 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=43, confirmed=true)
unknown error
Jun 11 21:45:07 ha3 crmd[12487]: warning: status_from_rc: Action 4
(ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
Jun 11 21:45:07 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541107)
Jun 11 21:45:07 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541107)
Jun 11 21:45:07 ha3 crmd[12487]: notice: run_graph: Transition 5
(Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
Jun 11 21:45:07 ha3 attrd[12485]: notice: write_attribute: Sent update 6
with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
Jun 11 21:45:07 ha3 attrd[12485]: notice: write_attribute: Sent update 7
with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
Jun 11 21:45:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:45:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:45:07 ha3 pengine[12486]: notice: process_pe_message: Calculated
Transition 6: /var/lib/pacemaker/pengine/pe-input-315.bz2
Jun 11 21:45:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 6 for
fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
Jun 11 21:45:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 7 for
last-failure-ha3_fabric_ping[ha3]=1402541107: OK (0)
Jun 11 21:45:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:45:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:45:07 ha3 pengine[12486]: notice: process_pe_message: Calculated
Transition 7: /var/lib/pacemaker/pengine/pe-input-316.bz2
Jun 11 21:45:07 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
1: stop ha3_fabric_ping_stop_0 on ha3 (local)
Jun 11 21:45:07 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=47, confirmed=true) ok
Jun 11 21:45:07 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
5: start ha3_fabric_ping_start_0 on ha3 (local)
Jun 11 21:45:12 ha3 attrd[12485]: notice: write_attribute: Sent update 8
with 1 changes for pingd, id=<n/a>, set=(null)
Jun 11 21:45:12 ha3 attrd[12485]: notice: attrd_cib_callback: Update 8 for
pingd[ha3]=(null): OK (0)
Jun 11 21:45:27 ha3 ping(ha3_fabric_ping)[12607]: WARNING: pingd is less
than failure_score(1)
Jun 11 21:45:27 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_start_0 (call=20, rc=1, cib-update=48, confirmed=true)
unknown error
Jun 11 21:45:27 ha3 crmd[12487]: warning: status_from_rc: Action 5
(ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
Jun 11 21:45:27 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541127)
Jun 11 21:45:27 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541127)
Jun 11 21:45:27 ha3 crmd[12487]: notice: run_graph: Transition 7
(Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-316.bz2): Stopped
Jun 11 21:45:27 ha3 attrd[12485]: notice: write_attribute: Sent update 9
with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
Jun 11 21:45:27 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:45:27 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:45:27 ha3 pengine[12486]: notice: process_pe_message: Calculated
Transition 8: /var/lib/pacemaker/pengine/pe-input-317.bz2
Jun 11 21:45:27 ha3 attrd[12485]: notice: attrd_cib_callback: Update 9 for
last-failure-ha3_fabric_ping[ha3]=1402541127: OK (0)
Jun 11 21:45:27 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:45:27 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:45:27 ha3 pengine[12486]: notice: process_pe_message: Calculated
Transition 9: /var/lib/pacemaker/pengine/pe-input-318.bz2
Jun 11 21:45:27 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
1: stop ha3_fabric_ping_stop_0 on ha3 (local)
Jun 11 21:45:27 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_stop_0 (call=21, rc=0, cib-update=51, confirmed=true) ok
Jun 11 21:45:27 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
5: start ha3_fabric_ping_start_0 on ha3 (local)
Jun 11 21:45:32 ha3 attrd[12485]: notice: write_attribute: Sent update 10
with 1 changes for pingd, id=<n/a>, set=(null)
Jun 11 21:45:32 ha3 attrd[12485]: notice: attrd_cib_callback: Update 10 for
pingd[ha3]=(null): OK (0)
Jun 11 21:45:47 ha3 ping(ha3_fabric_ping)[12654]: WARNING: pingd is less
than failure_score(1)
Jun 11 21:45:47 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_start_0 (call=22, rc=1, cib-update=52, confirmed=true)
unknown error
Jun 11 21:45:47 ha3 crmd[12487]: warning: status_from_rc: Action 5
(ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
Jun 11 21:45:47 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541147)
Jun 11 21:45:47 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541147)
Jun 11 21:45:47 ha3 crmd[12487]: notice: run_graph: Transition 9
(Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-318.bz2): Stopped
Jun 11 21:45:47 ha3 attrd[12485]: notice: write_attribute: Sent update 11
with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
Jun 11 21:45:47 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:45:47 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:45:47 ha3 pengine[12486]: notice: process_pe_message: Calculated
Transition 10: /var/lib/pacemaker/pengine/pe-input-319.bz2
Jun 11 21:45:47 ha3 attrd[12485]: notice: attrd_cib_callback: Update 11 for
last-failure-ha3_fabric_ping[ha3]=1402541147: OK (0)
Jun 11 21:45:47 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:45:47 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:45:47 ha3 pengine[12486]: notice: process_pe_message: Calculated
Transition 11: /var/lib/pacemaker/pengine/pe-input-320.bz2
Jun 11 21:45:47 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
1: stop ha3_fabric_ping_stop_0 on ha3 (local)
Jun 11 21:45:47 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_stop_0 (call=23, rc=0, cib-update=55, confirmed=true) ok
Jun 11 21:45:47 ha3 crmd[12487]: notice: te_rsc_command: Initiating action
5: start ha3_fabric_ping_start_0 on ha3 (local)
Jun 11 21:45:52 ha3 attrd[12485]: notice: write_attribute: Sent update 12
with 1 changes for pingd, id=<n/a>, set=(null)
Jun 11 21:45:52 ha3 attrd[12485]: notice: attrd_cib_callback: Update 12 for
pingd[ha3]=(null): OK (0)
Jun 11 21:46:07 ha3 ping(ha3_fabric_ping)[12700]: WARNING: pingd is less
than failure_score(1)
Jun 11 21:46:07 ha3 crmd[12487]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_start_0 (call=24, rc=1, cib-update=56, confirmed=true)
unknown error
Jun 11 21:46:07 ha3 crmd[12487]: warning: status_from_rc: Action 5
(ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
Jun 11 21:46:07 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541167)
Jun 11 21:46:07 ha3 crmd[12487]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402541167)
Jun 11 21:46:07 ha3 crmd[12487]: notice: run_graph: Transition 11
(Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-320.bz2): Stopped
Jun 11 21:46:07 ha3 attrd[12485]: notice: write_attribute: Sent update 13
with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
Jun 11 21:46:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:46:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 21:46:07 ha3 pengine[12486]: notice: process_pe_message: Calculated
Transition 12: /var/lib/pacemaker/pengine/pe-input-321.bz2
Jun 11 21:46:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 13 for
last-failure-ha3_fabric_ping[ha3]=1402541167: OK (0)
Jun 11 21:46:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 21:46:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Recover
ha3_fabric_ping (Started ha3)
Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
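The fencing attempts in the log above are easier to follow when reduced to one line per result. The following is only a minimal sketch, assuming the log text has been copied out of this mail with the same line wrapping; the helper names `unwrap` and `fence_results` are made up for illustration and are not part of Pacemaker.

```python
import re

# A syslog entry starts with a timestamp such as "Jun 11 21:42:55 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")

# Matches the stonith-ng "remote_op_done" summary line once it is unwrapped.
FENCE_RESULT = re.compile(
    r"^(?P<time>\w{3} +\d{1,2} [\d:]{8}) .*remote_op_done: Operation "
    r"(?P<action>\w+) of (?P<target>\S+) by (?P<by>\S+) for .+?: (?P<result>.+)$"
)


def unwrap(lines):
    """Join wrapped continuation lines back onto the entry they belong to."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def fence_results(lines):
    """Return (time, target, result) for every completed fencing operation."""
    results = []
    for entry in unwrap(lines):
        m = FENCE_RESULT.match(entry)
        if m:
            results.append((m["time"], m["target"], m["result"]))
    return results


# Two entries copied from the log above, wrapped as they appear in this mail.
SAMPLE = """\
Jun 11 21:42:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487 at ha3.1511d2cb: No route to host
Jun 11 21:44:47 ha3 stonith-ng[12483]: notice: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.12487 at ha3.e766648d: OK
""".splitlines()

for when, target, result in fence_results(SAMPLE):
    print(when, target, result)
```

Run against the full excerpt, this should show the 60-second attempts failing with "No route to host" until the meatware confirmation at 21:44:47. Lines quoted with a leading "> " would need that prefix stripped first.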
Paul Cain
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager
<pacemaker at oss.clusterlabs.org>
Date: 06/11/2014 07:20 PM
Subject: Re: [Pacemaker] When stonith is enabled, resources won't start
until after stonith, even though requires="nothing" and
prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled
from source.
On 12 Jun 2014, at 4:55 am, Paul E Cain <pecain at us.ibm.com> wrote:
> Hello,
>
> Overview
> I'm experimenting with a small two-node Pacemaker cluster on two RHEL 7
VMs. One of the things I need to do is ensure that my cluster can connect
to a certain IP address 10.10.0.1 because once I add the actual resources
that will need to be HA those resources will need access to 10.10.0.1 for
the cluster to function normally. To do that, I have one
ocf:pacemaker:ping resource for each node to check that connectivity. If
the ping fails, the node should go into standby mode and get fenced if
possible. Additionally, when a node first comes up I want that connectivity
check to happen before the fencing agents come up or a STONITH happens
because a node should not try to take over cluster resources if it cannot
connect to 10.10.0.1. To do this, I tried adding requires="nothing" and
prereq="nothing" to all the operations for both pinging resources. I also
have two meatware fencing agents to use for testing. I'm using order
constraints so they don't start until after the ping resources.
>
> Cluster When Functioning Normally
> [root at ha3 ~]# crm_mon -1
> Last updated: Wed Jun 11 13:10:54 2014
> Last change: Wed Jun 11 13:10:35 2014 via crmd on ha3
> Stack: corosync
> Current DC: ha3 (168427534) - partition with quorum
> Version: 1.1.10-9d39a6b
> 2 Nodes configured
> 4 Resources configured
>
>
> Online: [ ha3 ha4 ]
>
> ha3_fabric_ping (ocf::pacemaker:ping): Started ha3
> ha4_fabric_ping (ocf::pacemaker:ping): Started ha4
> fencing_route_to_ha3 (stonith:meatware): Started ha4
> fencing_route_to_ha4 (stonith:meatware): Started ha3
>
>
> Testing
> However, when I tested this by only starting up pacemaker on ha3 and also
preventing ha3 from connecting to 10.10.0.1, I found that ha3 would not
start until after ha4 was STONITHed. What I was aiming for was for
ha3_fabric_ping to fail to start, which would prevent the fencing agent
from starting and therefore prevent any STONITH.
>
>
> Question
> Any ideas why this is not working as expected? It's my understanding that
requires="nothing" should allow ha3_fabric_ping to start even before any
fencing operations. Maybe I'm misunderstanding something.
It's because the entire node is in standby mode.
Running crm_simulate with the cib.xml below shows:
Node ha3 (168427534): standby (on-fail)
In the config I see:
<op name="monitor" interval="15s" requires="nothing"
on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">
and:
<lrm_rsc_op id="ha3_fabric_ping_last_failure_0"
operation_key="ha3_fabric_ping_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641"
last-rc-change="1402509641" exec-time="20043" queue-time="0"
op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>
Note: rc-code="1"
The combination of on-fail="standby" and the failed operation put the node into standby and prevented resources from starting.
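The same check can be done mechanically on a saved CIB. The following is a rough sketch, not a Pacemaker tool: it assumes the output of cibadmin -Q has been saved to a file or string, and it treats any lrm_rsc_op whose id ends in "_last_failure_0" as a recorded failure, which is a heuristic. The function name `standby_triggers` and the trimmed sample CIB are invented for illustration.

```python
import xml.etree.ElementTree as ET


def standby_triggers(cib_xml):
    """List recorded failures of operations configured with on-fail="standby".

    Returns tuples of (node uname, resource id, operation, rc-code).
    """
    root = ET.fromstring(cib_xml)

    # Map each resource id to the operation names that use on-fail="standby".
    standby_ops = {}
    for prim in root.iter("primitive"):
        for op in prim.iter("op"):
            if op.get("on-fail") == "standby":
                standby_ops.setdefault(prim.get("id"), set()).add(op.get("name"))

    hits = []
    for node in root.iter("node_state"):
        for rsc in node.iter("lrm_resource"):
            for op in rsc.iter("lrm_rsc_op"):
                # Heuristic: the "_last_failure_0" entry records a failed op.
                failed = op.get("id", "").endswith("_last_failure_0")
                if failed and op.get("operation") in standby_ops.get(rsc.get("id"), ()):
                    hits.append((node.get("uname"), rsc.get("id"),
                                 op.get("operation"), op.get("rc-code")))
    return hits


# Trimmed-down CIB with only the attributes this sketch looks at.
SAMPLE_CIB = """
<cib>
  <configuration><resources>
    <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
      <operations>
        <op name="start" on-fail="standby" interval="0" id="ha3_fabric_ping-start-0"/>
        <op name="monitor" on-fail="standby" interval="15s" id="ha3_fabric_ping-monitor-15s"/>
      </operations>
    </primitive>
  </resources></configuration>
  <status>
    <node_state id="168427534" uname="ha3">
      <lrm id="168427534"><lrm_resources>
        <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" provider="pacemaker">
          <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation="start" rc-code="1"/>
        </lrm_resource>
      </lrm_resources></lrm>
    </node_state>
  </status>
</cib>
"""

for uname, rsc_id, operation, rc in standby_triggers(SAMPLE_CIB):
    print(f"{uname}: {rsc_id} {operation} failed (rc={rc}) with on-fail=standby")
```

An empty result only means this particular pattern was not found; crm_simulate remains the authoritative way to see how the policy engine interprets the CIB.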
>
> Thanks for any help you can offer.
>
> Below are the software versions, cibadmin -Q, the /var/log/messages
on ha3 during my test, and my corosync.conf file.
>
> Tell me if you need any more information.
>
> Software Versions (All Compiled From Source From The Website of the
Respective Projects)
> Cluster glue 1.0.11
> libqb 0.17.0
> Corosync 2.3.3
> Pacemaker 1.1.11
> Resources Agents 3.9.5
> crmsh 2.0
>
> cibadmin -Q
> <cib epoch="204" num_updates="18" admin_epoch="0"
validate-with="pacemaker-1.2" cib-last-written="Wed Jun 11 12:56:50 2014"
crm_feature_set="3.0.8" update-origin="ha3" update-client="crm_resource"
have-quorum="1" dc-uuid="168427534">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair name="symmetric-cluster" value="true"
id="cib-bootstrap-options-symmetric-cluster"/>
> <nvpair name="stonith-enabled" value="true"
id="cib-bootstrap-options-stonith-enabled"/>
> <nvpair name="stonith-action" value="reboot"
id="cib-bootstrap-options-stonith-action"/>
> <nvpair name="no-quorum-policy" value="ignore"
id="cib-bootstrap-options-no-quorum-policy"/>
> <nvpair name="stop-orphan-resources" value="true"
id="cib-bootstrap-options-stop-orphan-resources"/>
> <nvpair name="stop-orphan-actions" value="true"
id="cib-bootstrap-options-stop-orphan-actions"/>
> <nvpair name="default-action-timeout" value="20s"
id="cib-bootstrap-options-default-action-timeout"/>
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-9d39a6b"/>
> <nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="168427534" uname="ha3"/>
> <node id="168427535" uname="ha4"/>
> </nodes>
> <resources>
> <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker"
type="ping">
> <instance_attributes id="ha3_fabric_ping-instance_attributes">
> <nvpair name="host_list" value="10.10.0.1"
id="ha3_fabric_ping-instance_attributes-host_list"/>
> <nvpair name="failure_score" value="1"
id="ha3_fabric_ping-instance_attributes-failure_score"/>
> </instance_attributes>
> <operations>
> <op name="start" timeout="60s" requires="nothing"
on-fail="standby" interval="0" id="ha3_fabric_ping-start-0">
> <instance_attributes
id="ha3_fabric_ping-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" interval="15s" requires="nothing"
on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">
> <instance_attributes
id="ha3_fabric_ping-monitor-15s-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="stop" on-fail="fence" requires="nothing" interval="0"
id="ha3_fabric_ping-stop-0">
> <instance_attributes
id="ha3_fabric_ping-stop-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> <meta_attributes id="ha3_fabric_ping-meta_attributes">
> <nvpair id="ha3_fabric_ping-meta_attributes-requires"
name="requires" value="nothing"/>
> </meta_attributes>
> </primitive>
> <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker"
type="ping">
> <instance_attributes id="ha4_fabric_ping-instance_attributes">
> <nvpair name="host_list" value="10.10.0.1"
id="ha4_fabric_ping-instance_attributes-host_list"/>
> <nvpair name="failure_score" value="1"
id="ha4_fabric_ping-instance_attributes-failure_score"/>
> </instance_attributes>
> <operations>
> <op name="start" timeout="60s" requires="nothing"
on-fail="standby" interval="0" id="ha4_fabric_ping-start-0">
> <instance_attributes
id="ha4_fabric_ping-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" interval="15s" requires="nothing"
on-fail="standby" timeout="15s" id="ha4_fabric_ping-monitor-15s">
> <instance_attributes
id="ha4_fabric_ping-monitor-15s-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="stop" on-fail="fence" requires="nothing" interval="0"
id="ha4_fabric_ping-stop-0">
> <instance_attributes
id="ha4_fabric_ping-stop-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> <meta_attributes id="ha4_fabric_ping-meta_attributes">
> <nvpair id="ha4_fabric_ping-meta_attributes-requires"
name="requires" value="nothing"/>
> </meta_attributes>
> </primitive>
> <primitive id="fencing_route_to_ha3" class="stonith"
type="meatware">
> <instance_attributes
id="fencing_route_to_ha3-instance_attributes">
> <nvpair name="hostlist" value="ha3"
id="fencing_route_to_ha3-instance_attributes-hostlist"/>
> </instance_attributes>
> <operations>
> <op name="start" requires="nothing" interval="0"
id="fencing_route_to_ha3-start-0">
> <instance_attributes
id="fencing_route_to_ha3-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" requires="nothing" interval="0"
id="fencing_route_to_ha3-monitor-0">
> <instance_attributes
id="fencing_route_to_ha3-monitor-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> <primitive id="fencing_route_to_ha4" class="stonith"
type="meatware">
> <instance_attributes
id="fencing_route_to_ha4-instance_attributes">
> <nvpair name="hostlist" value="ha4"
id="fencing_route_to_ha4-instance_attributes-hostlist"/>
> </instance_attributes>
> <operations>
> <op name="start" requires="nothing" interval="0"
id="fencing_route_to_ha4-start-0">
> <instance_attributes
id="fencing_route_to_ha4-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" requires="nothing" interval="0"
id="fencing_route_to_ha4-monitor-0">
> <instance_attributes
id="fencing_route_to_ha4-monitor-0-instance_attributes">
> <nvpair name="prereq" value="nothing"
id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> </resources>
> <constraints>
> <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping"
score="INFINITY" node="ha3"/>
> <rsc_location id="ha3_fabric_ping_not_location"
rsc="ha3_fabric_ping" score="-INFINITY" node="ha4"/>
> <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping"
score="INFINITY" node="ha4"/>
> <rsc_location id="ha4_fabric_ping_not_location"
rsc="ha4_fabric_ping" score="-INFINITY" node="ha3"/>
> <rsc_location id="fencing_route_to_ha4_location"
rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
> <rsc_location id="fencing_route_to_ha4_not_location"
rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
> <rsc_location id="fencing_route_to_ha3_location"
rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
> <rsc_location id="fencing_route_to_ha3_not_location"
rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
> <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4"
score="INFINITY" first="ha3_fabric_ping" first-action="start"
then="fencing_route_to_ha4" then-action="start"/>
> <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3"
score="INFINITY" first="ha4_fabric_ping" first-action="start"
then="fencing_route_to_ha3" then-action="start"/>
> </constraints>
> <rsc_defaults>
> <meta_attributes id="rsc-options">
> <nvpair name="resource-stickiness" value="INFINITY"
id="rsc-options-resource-stickiness"/>
> <nvpair name="migration-threshold" value="0"
id="rsc-options-migration-threshold"/>
> <nvpair name="is-managed" value="true"
id="rsc-options-is-managed"/>
> </meta_attributes>
> </rsc_defaults>
> </configuration>
> <status>
> <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online"
crm-debug-origin="do_update_resource" join="member" expected="member">
> <lrm id="168427534">
> <lrm_resources>
> <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf"
provider="pacemaker">
> <lrm_rsc_op id="ha3_fabric_ping_last_0"
operation_key="ha3_fabric_ping_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:0;4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="19" rc-code="0" op-status="0" interval="0" last-run="1402509661"
last-rc-change="1402509661" exec-time="12" queue-time="0"
op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> <lrm_rsc_op id="ha3_fabric_ping_last_failure_0"
operation_key="ha3_fabric_ping_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641"
last-rc-change="1402509641" exec-time="20043" queue-time="0"
op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>
> </lrm_resource>
> <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf"
provider="pacemaker">
> <lrm_rsc_op id="ha4_fabric_ping_last_0"
operation_key="ha4_fabric_ping_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:7;5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="9" rc-code="7" op-status="0" interval="0" last-run="1402509565"
last-rc-change="1402509565" exec-time="10" queue-time="0"
op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> </lrm_resource>
> <lrm_resource id="fencing_route_to_ha3" type="meatware"
class="stonith">
> <lrm_rsc_op id="fencing_route_to_ha3_last_0"
operation_key="fencing_route_to_ha3_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:7;6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402509565"
last-rc-change="1402509565" exec-time="1" queue-time="0"
op-digest="502fbd7a2366c2be772d7fbecc9e0351"/>
> </lrm_resource>
> <lrm_resource id="fencing_route_to_ha4" type="meatware"
class="stonith">
> <lrm_rsc_op id="fencing_route_to_ha4_last_0"
operation_key="fencing_route_to_ha4_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:7;7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402509565"
last-rc-change="1402509565" exec-time="0" queue-time="0"
op-digest="5be26fbcfd648e3d545d0115645dde76"/>
> </lrm_resource>
> </lrm_resources>
> </lrm>
> <transient_attributes id="168427534">
> <instance_attributes id="status-168427534">
> <nvpair id="status-168427534-shutdown" name="shutdown"
value="0"/>
> <nvpair id="status-168427534-probe_complete"
name="probe_complete" value="true"/>
> <nvpair id="status-168427534-fail-count-ha3_fabric_ping"
name="fail-count-ha3_fabric_ping" value="INFINITY"/>
> <nvpair id="status-168427534-last-failure-ha3_fabric_ping"
name="last-failure-ha3_fabric_ping" value="1402509661"/>
> </instance_attributes>
> </transient_attributes>
> </node_state>
> <node_state id="168427535" in_ccm="false" crmd="offline" join="down"
crm-debug-origin="send_stonith_update" uname="ha4" expected="down"/>
> </status>
> </cib>
> [root at ha3 ~]#
>
>
> /var/log/messages from when pacemaker started on ha3 to when
ha3_fabric_ping failed.
> Jun 11 12:59:01 ha3 systemd: Starting LSB: Starts and stops Pacemaker
Cluster Manager....
> Jun 11 12:59:01 ha3 pacemaker: Starting Pacemaker Cluster Manager
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: mcp_read_config: Configured
corosync to accept connections from group 1000: OK (1)
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: main: Starting Pacemaker
1.1.10 (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc
lha-fencing nagios corosync-native libesmtp
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427534
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: cluster_connect_quorum:
Quorum acquired
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Defaulting
to uname -n for the local corosync node name
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node ha3[168427534] - state is now member (was
(null))
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable
to get node name for nodeid 168427535
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427535
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable
to get node name for nodeid 168427535
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable
to get node name for nodeid 168427535
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427535
> Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node (null)[168427535] - state is now member (was
(null))
> Jun 11 12:59:02 ha3 pengine[5013]: warning:
crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by
group haclient
> Jun 11 12:59:02 ha3 cib[5009]: warning:
crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group
haclient
> Jun 11 12:59:02 ha3 cib[5009]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: corosync
> Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
> Jun 11 12:59:02 ha3 crmd[5014]: notice: main: CRM Git Version: 9d39a6b
> Jun 11 12:59:02 ha3 crmd[5014]: warning:
crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by
group haclient
> Jun 11 12:59:02 ha3 crmd[5014]: warning:
crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group
haclient
> Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_cluster_connect: Connecting
to cluster infrastructure: corosync
> Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
> Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Could not obtain
a node name for corosync nodeid 168427534
> Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_update_peer_state:
attrd_peer_change_cb: Node (null)[168427534] - state is now member (was
(null))
> Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
> Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
> Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
> Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427534
> Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
> Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting
to uname -n for the local corosync node name
> Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
> Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
> Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
> Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
> Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_cluster_connect: Connecting
to cluster infrastructure: corosync
> Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
> Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
> Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: setup_cib: Watching for
stonith topology changes
> Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: unpack_config: On loss of
CCM Quorum: Ignore
> Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
> Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
> Jun 11 12:59:03 ha3 crmd[5014]: notice: cluster_connect_quorum: Quorum
acquired
> Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node ha3[168427534] - state is now member (was
(null))
> Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
> Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427535
> Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
> Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
> Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427535
> Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node (null)[168427535] - state is now member (was
(null))
> Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
> Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
> Jun 11 12:59:03 ha3 crmd[5014]: notice: do_started: The local CRM is
operational
> Jun 11 12:59:03 ha3 crmd[5014]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL
origin=do_started ]
> Jun 11 12:59:04 ha3 stonith-ng[5010]: notice: stonith_device_register:
Added 'fencing_route_to_ha4' to the device list (1 active devices)
> Jun 11 12:59:06 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ]
> Jun 11 12:59:06 ha3 systemd: Started LSB: Starts and stops Pacemaker
Cluster Manager..
> Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_DC_TIMEOUT
from crm_timer_popped() received in state S_PENDING
> Jun 11 12:59:24 ha3 crmd[5014]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_ELECTION_DC
from do_election_check() received in state S_INTEGRATION
> Jun 11 12:59:24 ha3 cib[5009]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
> Jun 11 12:59:24 ha3 cib[5009]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
> Jun 11 12:59:24 ha3 attrd[5012]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
> Jun 11 12:59:24 ha3 attrd[5012]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
> Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 2
with 1 changes for terminate, id=<n/a>, set=(null)
> Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 3
with 1 changes for shutdown, id=<n/a>, set=(null)
> Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 2 for
terminate[ha3]=(null): OK (0)
> Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 3 for
shutdown[ha3]=0: OK (0)
> Jun 11 12:59:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
> Jun 11 12:59:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4
for STONITH
> Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start
ha3_fabric_ping (ha3)
> Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
> Jun 11 12:59:25 ha3 pengine[5013]: warning: process_pe_message:
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-80.bz2
> Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
> Jun 11 12:59:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot
fencing operation (12) on ha4 (timeout=60000)
> Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: handle_request: Client
crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
b3ab6141-9612-4024-82b2-350e74bbb33d (0)
> Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable
to get node name for nodeid 168427534
> Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting
to uname -n for the local corosync node name
> Jun 11 12:59:25 ha3 stonith: [5027]: info: parse config info info=ha4
> Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 12:59:25 ha3 stonith: [5031]: info: parse config info info=ha4
> Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
> Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
> Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not
running
> Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
> Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not
running
> Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
> Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
> Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
3: probe_complete probe_complete on ha3 (local) - no waiting
> Jun 11 12:59:25 ha3 attrd[5012]: notice: write_attribute: Sent update 4
with 1 changes for probe_complete, id=<n/a>, set=(null)
> Jun 11 12:59:25 ha3 attrd[5012]: notice: attrd_cib_callback: Update 4 for
probe_complete[ha3]=true: OK (0)
> Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_action_async_done:
Child process 5030 performing action 'reboot' timed out with signal 15
> Jun 11 13:00:25 ha3 stonith-ng[5010]: error: log_operation: Operation
'reboot' [5030] (call 2 from crmd.5014) for host 'ha4' with device
'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jun 11 13:00:25 ha3 stonith-ng[5010]: warning: log_operation:
fencing_route_to_ha4:5030 [ Performing: stonith -t meatware -T reset ha4 ]
> Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_choose_peer:
Couldn't find anyone to fence ha4 with <any>
> Jun 11 13:00:25 ha3 stonith-ng[5010]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.5014 at ha3.b3ab6141: No route to host
> Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith
operation 2/12:0:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: No route to host
(-113)
> Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith
operation 2 for ha4 failed (No route to host): aborting transition.
> Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4
was not terminated (reboot) by ha3 for ha3: No route to host
(ref=b3ab6141-9612-4024-82b2-350e74bbb33d) by client crmd.5014
> Jun 11 13:00:25 ha3 crmd[5014]: notice: run_graph: Transition 0
(Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-80.bz2): Stopped
> Jun 11 13:00:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
> Jun 11 13:00:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4
for STONITH
> Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start
ha3_fabric_ping (ha3)
> Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
> Jun 11 13:00:25 ha3 pengine[5013]: warning: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-81.bz2
> Jun 11 13:00:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot
fencing operation (8) on ha4 (timeout=60000)
> Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: handle_request: Client
crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
eae78d4c-8d80-47fe-93e9-1a9261ec38a4 (0)
> Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 13:00:25 ha3 stonith: [5057]: info: parse config info info=ha4
> Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
> Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
> Jun 11 13:00:41 ha3 stonith: [5057]: info: node Meatware-reset: ha4
> Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: log_operation: Operation
'reboot' [5056] (call 3 from crmd.5014) for host 'ha4' with device
'fencing_route_to_ha4' returned: 0 (OK)
> Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.5014 at ha3.eae78d4c: OK
> Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith
operation 3/8:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: OK (0)
> Jun 11 13:00:41 ha3 crmd[5014]: notice: crm_update_peer_state:
send_stonith_update: Node ha4[0] - state is now lost (was (null))
> Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4
was terminated (reboot) by ha3 for ha3: OK
(ref=eae78d4c-8d80-47fe-93e9-1a9261ec38a4) by client crmd.5014
> Jun 11 13:00:41 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
4: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 11 13:01:01 ha3 systemd: Starting Session 22 of user root.
> Jun 11 13:01:01 ha3 systemd: Started Session 22 of user root.
> Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 5
with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 5 for
pingd[ha3]=0: OK (0)
> Jun 11 13:01:01 ha3 ping(ha3_fabric_ping)[5060]: WARNING: pingd is less
than failure_score(1)
> Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true)
unknown error
> Jun 11 13:01:01 ha3 crmd[5014]: warning: status_from_rc: Action 4
(ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402509661)
> Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402509661)
> Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 1
(Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-81.bz2): Stopped
> Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 6
with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 7
with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
> Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop
ha3_fabric_ping (ha3)
> Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated
Transition 2: /var/lib/pacemaker/pengine/pe-input-304.bz2
> Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 6 for
fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
> Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 7 for
last-failure-ha3_fabric_ping[ha3]=1402509661: OK (0)
> Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
> Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop
ha3_fabric_ping (ha3)
> Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated
Transition 3: /var/lib/pacemaker/pengine/pe-input-305.bz2
> Jun 11 13:01:01 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
4: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok
> Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 3
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-305.bz2): Complete
> Jun 11 13:01:01 ha3 crmd[5014]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun 11 13:01:06 ha3 attrd[5012]: notice: write_attribute: Sent update 8
with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
> Jun 11 13:01:06 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
> Jun 11 13:01:06 ha3 pengine[5013]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 13:01:06 ha3 pengine[5013]: notice: process_pe_message: Calculated
Transition 4: /var/lib/pacemaker/pengine/pe-input-306.bz2
> Jun 11 13:01:06 ha3 crmd[5014]: notice: run_graph: Transition 4
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-306.bz2): Complete
> Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun 11 13:01:06 ha3 attrd[5012]: notice: attrd_cib_callback: Update 8 for
pingd[ha3]=(null): OK (0)
>
> /etc/corosync/corosync.conf
> # Please read the corosync.conf.5 manual page
> totem {
> version: 2
>
> crypto_cipher: none
> crypto_hash: none
>
> interface {
> ringnumber: 0
> bindnetaddr: 10.10.0.0
> mcastport: 5405
> ttl: 1
> }
> transport: udpu
> }
>
> logging {
> fileline: off
> to_logfile: no
> to_syslog: yes
> #logfile: /var/log/cluster/corosync.log
> debug: off
> timestamp: on
> logger_subsys {
> subsys: QUORUM
> debug: off
> }
> }
>
> nodelist {
> node {
> ring0_addr: 10.10.0.14
> }
>
> node {
> ring0_addr: 10.10.0.15
> }
> }
>
> quorum {
> # Enable and configure quorum subsystem (default: off)
> # see also corosync.conf.5 and votequorum.5
> provider: corosync_votequorum
> expected_votes: 2
> }
> [root at ha3 ~]#
>
> Paul Cain
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org