[Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.
Andrew Beekhof
andrew at beekhof.net
Fri Jun 13 01:49:42 CEST 2014
On 12 Jun 2014, at 1:02 pm, Paul E Cain <pecain at us.ibm.com> wrote:
> Hi Andrew,
>
> Thank you for your quick response.
>
> I removed the on-fail="standby" items and re-tested but the problem persists.
That's because ha4 is still partially up.
Corosync is still running, so we'd like to know the state of any resources that might be there before starting them on ha3.
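
As a rough illustration of "partially up", the sketch below scans a trimmed copy of the <status> section quoted further down and lists nodes that are still cluster members but have no crmd running. It assumes that in_ccm="true" reflects corosync membership and that crmd="offline" means Pacemaker is not running there; the function name is made up for this example, and this is not how the policy engine itself decides that a node is unclean.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <status> section from the quoted cibadmin -Q output
# (only the node_state attributes relevant here are kept).
STATUS_XML = """
<status>
  <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online" join="member" expected="member"/>
  <node_state id="168427535" in_ccm="true" crmd="offline" join="down"/>
</status>
"""


def partially_up_nodes(status_xml):
    """Return ids of nodes that are cluster members but have no crmd online.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(status_xml)
    found = []
    for state in root.iter("node_state"):
        is_member = state.get("in_ccm") == "true"
        crmd_online = state.get("crmd") == "online"
        if is_member and not crmd_online:
            found.append(state.get("id"))
    return found


print(partially_up_nodes(STATUS_XML))  # ['168427535'], the node id of ha4
```

Here node 168427535 (ha4) is the only one reported, which matches the "UNCLEAN (offline)" line in the crm_mon output.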
> The cibadmin -Q I gave you was actually from after I did the STONITH on ha4 and ha3_fabric_ping tried to come up but failed. In hindsight, maybe I should have made that clear or given you cibadmin -Q from while the cluster is sitting there waiting for me to STONITH and ha3_fabric_ping won't start. Any other ideas on why this would fail or even just a way to get around this problem? I just need to prevent the node from fencing and trying to bring up the cluster resources if it cannot ping 10.10.0.1. Heartbeat had ping_group but I know of no similar feature with Corosync/Pacemaker.
>
> Thanks again for your time.
>
> Info from while the cluster was waiting for me to fence:
> When I ran crm_simulate on it this is what I got:
> [root@ha3 ~]# crm_simulate -x /tmp/cib.xml
>
> Current cluster status:
> Node ha4 (168427535): UNCLEAN (offline)
> Online: [ ha3 ]
>
> ha3_fabric_ping (ocf::pacemaker:ping): Stopped
> ha4_fabric_ping (ocf::pacemaker:ping): Stopped
> fencing_route_to_ha3 (stonith:meatware): Stopped
> fencing_route_to_ha4 (stonith:meatware): Stopped
>
>
> [root@ha3 ~]# crm_mon -1
> Last updated: Wed Jun 11 21:48:16 2014
> Last change: Wed Jun 11 21:38:54 2014 via crmd on ha3
> Stack: corosync
> Current DC: ha3 (168427534) - partition with quorum
> Version: 1.1.10-9d39a6b
> 2 Nodes configured
> 4 Resources configured
>
>
> Node ha4 (168427535): UNCLEAN (offline)
> Online: [ ha3 ]
>
>
> cibadmin -Q
> <cib epoch="208" num_updates="11" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Wed Jun 11 21:38:54 2014" crm_feature_set="3.0.8" update-origin="ha3" update-client="crmd" have-quorum="1" dc-uuid="168427534">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair name="symmetric-cluster" value="true" id="cib-bootstrap-options-symmetric-cluster"/>
> <nvpair name="stonith-enabled" value="true" id="cib-bootstrap-options-stonith-enabled"/>
> <nvpair name="stonith-action" value="reboot" id="cib-bootstrap-options-stonith-action"/>
> <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
> <nvpair name="stop-orphan-resources" value="true" id="cib-bootstrap-options-stop-orphan-resources"/>
> <nvpair name="stop-orphan-actions" value="true" id="cib-bootstrap-options-stop-orphan-actions"/>
> <nvpair name="default-action-timeout" value="20s" id="cib-bootstrap-options-default-action-timeout"/>
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-9d39a6b"/>
> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="168427534" uname="ha3"/>
> <node id="168427535" uname="ha4"/>
> </nodes>
> <resources>
> <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> <instance_attributes id="ha3_fabric_ping-instance_attributes">
> <nvpair name="host_list" value="10.10.0.1" id="ha3_fabric_ping-instance_attributes-host_list"/>
> <nvpair name="failure_score" value="1" id="ha3_fabric_ping-instance_attributes-failure_score"/>
> </instance_attributes>
> <operations>
> <op name="start" timeout="60s" requires="nothing" interval="0" id="ha3_fabric_ping-start-0">
> <instance_attributes id="ha3_fabric_ping-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" interval="15s" requires="nothing" timeout="15s" id="ha3_fabric_ping-monitor-15s">
> <instance_attributes id="ha3_fabric_ping-monitor-15s-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha3_fabric_ping-stop-0">
> <instance_attributes id="ha3_fabric_ping-stop-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> <instance_attributes id="ha4_fabric_ping-instance_attributes">
> <nvpair name="host_list" value="10.10.0.1" id="ha4_fabric_ping-instance_attributes-host_list"/>
> <nvpair name="failure_score" value="1" id="ha4_fabric_ping-instance_attributes-failure_score"/>
> </instance_attributes>
> <operations>
> <op name="start" timeout="60s" requires="nothing" interval="0" id="ha4_fabric_ping-start-0">
> <instance_attributes id="ha4_fabric_ping-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" interval="15s" requires="nothing" timeout="15s" id="ha4_fabric_ping-monitor-15s">
> <instance_attributes id="ha4_fabric_ping-monitor-15s-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha4_fabric_ping-stop-0">
> <instance_attributes id="ha4_fabric_ping-stop-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> <primitive id="fencing_route_to_ha3" class="stonith" type="meatware">
> <instance_attributes id="fencing_route_to_ha3-instance_attributes">
> <nvpair name="hostlist" value="ha3" id="fencing_route_to_ha3-instance_attributes-hostlist"/>
> </instance_attributes>
> <operations>
> <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha3-start-0">
> <instance_attributes id="fencing_route_to_ha3-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha3-monitor-0">
> <instance_attributes id="fencing_route_to_ha3-monitor-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> <primitive id="fencing_route_to_ha4" class="stonith" type="meatware">
> <instance_attributes id="fencing_route_to_ha4-instance_attributes">
> <nvpair name="hostlist" value="ha4" id="fencing_route_to_ha4-instance_attributes-hostlist"/>
> </instance_attributes>
> <operations>
> <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha4-start-0">
> <instance_attributes id="fencing_route_to_ha4-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha4-monitor-0">
> <instance_attributes id="fencing_route_to_ha4-monitor-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> </resources>
> <constraints>
> <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping" score="INFINITY" node="ha3"/>
> <rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping" score="-INFINITY" node="ha4"/>
> <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping" score="INFINITY" node="ha4"/>
> <rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping" score="-INFINITY" node="ha3"/>
> <rsc_location id="fencing_route_to_ha4_location" rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
> <rsc_location id="fencing_route_to_ha4_not_location" rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
> <rsc_location id="fencing_route_to_ha3_location" rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
> <rsc_location id="fencing_route_to_ha3_not_location" rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
> <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4" score="INFINITY" first="ha3_fabric_ping" first-action="start" then="fencing_route_to_ha4" then-action="start"/>
> <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3" score="INFINITY" first="ha4_fabric_ping" first-action="start" then="fencing_route_to_ha3" then-action="start"/>
> </constraints>
> <rsc_defaults>
> <meta_attributes id="rsc-options">
> <nvpair name="resource-stickiness" value="INFINITY" id="rsc-options-resource-stickiness"/>
> <nvpair name="migration-threshold" value="0" id="rsc-options-migration-threshold"/>
> <nvpair name="is-managed" value="true" id="rsc-options-is-managed"/>
> </meta_attributes>
> </rsc_defaults>
> </configuration>
> <status>
> <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
> <lrm id="168427534">
> <lrm_resources>
> <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> <lrm_rsc_op id="ha3_fabric_ping_last_0" operation_key="ha3_fabric_ping_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" transition-magic="0:7;4:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1402540735" last-rc-change="1402540735" exec-time="42" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> </lrm_resource>
> <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> <lrm_rsc_op id="ha4_fabric_ping_last_0" operation_key="ha4_fabric_ping_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="5:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" transition-magic="0:7;5:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1402540735" last-rc-change="1402540735" exec-time="10" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> </lrm_resource>
> <lrm_resource id="fencing_route_to_ha3" type="meatware" class="stonith">
> <lrm_rsc_op id="fencing_route_to_ha3_last_0" operation_key="fencing_route_to_ha3_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="6:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" transition-magic="0:7;6:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402540735" last-rc-change="1402540735" exec-time="1" queue-time="0" op-digest="502fbd7a2366c2be772d7fbecc9e0351"/>
> </lrm_resource>
> <lrm_resource id="fencing_route_to_ha4" type="meatware" class="stonith">
> <lrm_rsc_op id="fencing_route_to_ha4_last_0" operation_key="fencing_route_to_ha4_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="7:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" transition-magic="0:7;7:0:7:7901aff3-92ef-40e6-9193-5f396f4a06f1" call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402540735" last-rc-change="1402540735" exec-time="0" queue-time="0" op-digest="5be26fbcfd648e3d545d0115645dde76"/>
> </lrm_resource>
> </lrm_resources>
> </lrm>
> <transient_attributes id="168427534">
> <instance_attributes id="status-168427534">
> <nvpair id="status-168427534-shutdown" name="shutdown" value="0"/>
> <nvpair id="status-168427534-probe_complete" name="probe_complete" value="true"/>
> </instance_attributes>
> </transient_attributes>
> </node_state>
> <node_state id="168427535" in_ccm="true" crmd="offline" join="down" crm-debug-origin="do_state_transition"/>
> </status>
> </cib>
>
>
> /var/log/messages
> Jun 11 21:38:32 ha3 systemd: Starting LSB: Starts and stops Pacemaker Cluster Manager....
> Jun 11 21:38:32 ha3 pacemaker: Starting Pacemaker Cluster Manager
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: mcp_read_config: Configured corosync to accept connections from group 1000: OK (1)
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: main: Starting Pacemaker 1.1.10 (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc lha-fencing nagios corosync-native libesmtp
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: cluster_connect_quorum: Quorum acquired
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> Jun 11 21:38:32 ha3 pacemakerd[12480]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[168427535] - state is now member (was (null))
> Jun 11 21:38:32 ha3 pengine[12486]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Jun 11 21:38:32 ha3 cib[12482]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> Jun 11 21:38:32 ha3 cib[12482]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 11 21:38:32 ha3 crmd[12487]: notice: main: CRM Git Version: 9d39a6b
> Jun 11 21:38:32 ha3 crmd[12487]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Jun 11 21:38:32 ha3 crmd[12487]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> Jun 11 21:38:32 ha3 attrd[12485]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 11 21:38:32 ha3 attrd[12485]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 attrd[12485]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 11 21:38:32 ha3 attrd[12485]: notice: crm_update_peer_state: attrd_peer_change_cb: Node (null)[168427534] - state is now member (was (null))
> Jun 11 21:38:32 ha3 attrd[12485]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 attrd[12485]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 stonith-ng[12483]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:32 ha3 cib[12482]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 cib[12482]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 11 21:38:32 ha3 cib[12482]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:32 ha3 cib[12482]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:33 ha3 crmd[12487]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:33 ha3 crmd[12487]: notice: cluster_connect_quorum: Quorum acquired
> Jun 11 21:38:33 ha3 crmd[12487]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> Jun 11 21:38:33 ha3 stonith-ng[12483]: notice: setup_cib: Watching for stonith topology changes
> Jun 11 21:38:33 ha3 stonith-ng[12483]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> Jun 11 21:38:33 ha3 crmd[12487]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[168427535] - state is now member (was (null))
> Jun 11 21:38:33 ha3 crmd[12487]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:33 ha3 crmd[12487]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:33 ha3 crmd[12487]: notice: do_started: The local CRM is operational
> Jun 11 21:38:33 ha3 crmd[12487]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jun 11 21:38:34 ha3 stonith-ng[12483]: notice: stonith_device_register: Added 'fencing_route_to_ha4' to the device list (1 active devices)
> Jun 11 21:38:37 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ]
> Jun 11 21:38:37 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster Manager..
> Jun 11 21:38:54 ha3 crmd[12487]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jun 11 21:38:54 ha3 crmd[12487]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: Diff: --- 0.206.0
> Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: Diff: +++ 0.207.1 6c3024691ae3d5b4c93705a5f2130993
> Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: -- <cib admin_epoch="0" epoch="206" num_updates="0"/>
> Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-9d39a6b"/>
> Jun 11 21:38:54 ha3 cib[12482]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:54 ha3 cib[12482]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:54 ha3 crmd[12487]: warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> Jun 11 21:38:54 ha3 cib[12482]: notice: log_cib_diff: cib:diff: Local-only Change: 0.208.1
> Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: -- <cib admin_epoch="0" epoch="207" num_updates="1"/>
> Jun 11 21:38:54 ha3 cib[12482]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
> Jun 11 21:38:54 ha3 attrd[12485]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:54 ha3 attrd[12485]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:54 ha3 attrd[12485]: notice: write_attribute: Sent update 2 with 1 changes for terminate, id=<n/a>, set=(null)
> Jun 11 21:38:54 ha3 attrd[12485]: notice: write_attribute: Sent update 3 with 1 changes for shutdown, id=<n/a>, set=(null)
> Jun 11 21:38:54 ha3 attrd[12485]: notice: attrd_cib_callback: Update 2 for terminate[ha3]=(null): OK (0)
> Jun 11 21:38:54 ha3 attrd[12485]: notice: attrd_cib_callback: Update 3 for shutdown[ha3]=0: OK (0)
> Jun 11 21:38:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:38:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 11 21:38:55 ha3 pengine[12486]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 11 21:38:55 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:38:55 ha3 pengine[12486]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-82.bz2
> Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
> Jun 11 21:38:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot fencing operation (12) on ha4 (timeout=60000)
> Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: handle_request: Client crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: f9cbe911-8a32-4abf-9d7a-08e34167c203 (0)
> Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 11 21:38:55 ha3 stonith: [12503]: info: parse config info info=ha4
> Jun 11 21:38:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:38:55 ha3 stonith: [12508]: info: parse config info info=ha4
> Jun 11 21:38:55 ha3 stonith: [12508]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 11 21:38:55 ha3 stonith: [12508]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 11 21:38:55 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=27, confirmed=true) not running
> Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
> Jun 11 21:38:55 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=28, confirmed=true) not running
> Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
> Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
> Jun 11 21:38:55 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on ha3 (local) - no waiting
> Jun 11 21:38:55 ha3 attrd[12485]: notice: write_attribute: Sent update 4 with 1 changes for probe_complete, id=<n/a>, set=(null)
> Jun 11 21:38:55 ha3 attrd[12485]: notice: attrd_cib_callback: Update 4 for probe_complete[ha3]=true: OK (0)
> Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done: Child process 12506 performing action 'reboot' timed out with signal 15
> Jun 11 21:39:55 ha3 stonith-ng[12483]: error: log_operation: Operation 'reboot' [12506] (call 2 from crmd.12487) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jun 11 21:39:55 ha3 stonith-ng[12483]: warning: log_operation: fencing_route_to_ha4:12506 [ Performing: stonith -t meatware -T reset ha4 ]
> Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> Jun 11 21:39:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487@ha3.f9cbe911: No route to host
> Jun 11 21:39:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 2/12:0:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host (-113)
> Jun 11 21:39:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 2 for ha4 failed (No route to host): aborting transition.
> Jun 11 21:39:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=f9cbe911-8a32-4abf-9d7a-08e34167c203) by client crmd.12487
> Jun 11 21:39:55 ha3 crmd[12487]: notice: run_graph: Transition 0 (Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-82.bz2): Stopped
> Jun 11 21:39:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:39:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 11 21:39:55 ha3 pengine[12486]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 11 21:39:55 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:39:55 ha3 pengine[12486]: warning: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-83.bz2
> Jun 11 21:39:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: handle_request: Client crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 281b0cc2-f1cc-485f-aa03-c50704fc97f9 (0)
> Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:39:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:39:55 ha3 stonith: [12536]: info: parse config info info=ha4
> Jun 11 21:39:55 ha3 stonith: [12536]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 11 21:39:55 ha3 stonith: [12536]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done: Child process 12535 performing action 'reboot' timed out with signal 15
> Jun 11 21:40:55 ha3 stonith-ng[12483]: error: log_operation: Operation 'reboot' [12535] (call 3 from crmd.12487) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jun 11 21:40:55 ha3 stonith-ng[12483]: warning: log_operation: fencing_route_to_ha4:12535 [ Performing: stonith -t meatware -T reset ha4 ]
> Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> Jun 11 21:40:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487@ha3.281b0cc2: No route to host
> Jun 11 21:40:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 3/8:1:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host (-113)
> Jun 11 21:40:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 3 for ha4 failed (No route to host): aborting transition.
> Jun 11 21:40:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=281b0cc2-f1cc-485f-aa03-c50704fc97f9) by client crmd.12487
> Jun 11 21:40:55 ha3 crmd[12487]: notice: run_graph: Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
> Jun 11 21:40:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:40:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 11 21:40:55 ha3 pengine[12486]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 11 21:40:55 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:40:55 ha3 pengine[12486]: warning: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-83.bz2
> Jun 11 21:40:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: handle_request: Client crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 1adec92a-c8f0-4087-9e51-e24c947ca171 (0)
> Jun 11 21:40:55 ha3 stonith: [12543]: info: parse config info info=ha4
> Jun 11 21:40:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:40:55 ha3 stonith: [12545]: info: parse config info info=ha4
> Jun 11 21:40:55 ha3 stonith: [12545]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 11 21:40:55 ha3 stonith: [12545]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done: Child process 12544 performing action 'reboot' timed out with signal 15
> Jun 11 21:41:55 ha3 stonith-ng[12483]: error: log_operation: Operation 'reboot' [12544] (call 4 from crmd.12487) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jun 11 21:41:55 ha3 stonith-ng[12483]: warning: log_operation: fencing_route_to_ha4:12544 [ Performing: stonith -t meatware -T reset ha4 ]
> Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> Jun 11 21:41:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487@ha3.1adec92a: No route to host
> Jun 11 21:41:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 4/8:2:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host (-113)
> Jun 11 21:41:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 4 for ha4 failed (No route to host): aborting transition.
> Jun 11 21:41:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=1adec92a-c8f0-4087-9e51-e24c947ca171) by client crmd.12487
> Jun 11 21:41:55 ha3 crmd[12487]: notice: run_graph: Transition 2 (Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
> Jun 11 21:41:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:41:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 11 21:41:55 ha3 pengine[12486]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 11 21:41:55 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:41:55 ha3 pengine[12486]: warning: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-warn-83.bz2
> Jun 11 21:41:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: handle_request: Client crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 1511d2cb-2ab9-4a06-9676-807ed8b27f2b (0)
> Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:41:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:41:55 ha3 stonith: [12548]: info: parse config info info=ha4
> Jun 11 21:41:55 ha3 stonith: [12548]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 11 21:41:55 ha3 stonith: [12548]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done: Child process 12547 performing action 'reboot' timed out with signal 15
> Jun 11 21:42:55 ha3 stonith-ng[12483]: error: log_operation: Operation 'reboot' [12547] (call 5 from crmd.12487) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jun 11 21:42:55 ha3 stonith-ng[12483]: warning: log_operation: fencing_route_to_ha4:12547 [ Performing: stonith -t meatware -T reset ha4 ]
> Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> Jun 11 21:42:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487 at ha3.1511d2cb: No route to host
> Jun 11 21:42:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 5/8:3:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host (-113)
> Jun 11 21:42:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 5 for ha4 failed (No route to host): aborting transition.
> Jun 11 21:42:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=1511d2cb-2ab9-4a06-9676-807ed8b27f2b) by client crmd.12487
> Jun 11 21:42:55 ha3 crmd[12487]: notice: run_graph: Transition 3 (Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
> Jun 11 21:42:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:42:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 11 21:42:55 ha3 pengine[12486]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 11 21:42:55 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:42:55 ha3 pengine[12486]: warning: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-warn-83.bz2
> Jun 11 21:42:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: handle_request: Client crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 4de370d1-dab3-4f8f-82cf-969899d6008c (0)
> Jun 11 21:42:55 ha3 stonith: [12550]: info: parse config info info=ha4
> Jun 11 21:42:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:42:55 ha3 stonith: [12552]: info: parse config info info=ha4
> Jun 11 21:42:55 ha3 stonith: [12552]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 11 21:42:55 ha3 stonith: [12552]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: stonith_action_async_done: Child process 12551 performing action 'reboot' timed out with signal 15
> Jun 11 21:43:55 ha3 stonith-ng[12483]: error: log_operation: Operation 'reboot' [12551] (call 6 from crmd.12487) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jun 11 21:43:55 ha3 stonith-ng[12483]: warning: log_operation: fencing_route_to_ha4:12551 [ Performing: stonith -t meatware -T reset ha4 ]
> Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> Jun 11 21:43:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487 at ha3.4de370d1: No route to host
> Jun 11 21:43:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 6/8:4:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: No route to host (-113)
> Jun 11 21:43:55 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 6 for ha4 failed (No route to host): aborting transition.
> Jun 11 21:43:55 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=4de370d1-dab3-4f8f-82cf-969899d6008c) by client crmd.12487
> Jun 11 21:43:55 ha3 crmd[12487]: notice: run_graph: Transition 4 (Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
> Jun 11 21:43:55 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:43:55 ha3 pengine[12486]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 11 21:43:55 ha3 pengine[12486]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 11 21:43:55 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:43:55 ha3 pengine[12486]: warning: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-warn-83.bz2
> Jun 11 21:43:55 ha3 crmd[12487]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: handle_request: Client crmd.12487.3b5db081 wants to fence (reboot) 'ha4' with device '(any)'
> Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: e766648d-20f0-4b94-b001-4873f9f8bb37 (0)
> Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:43:55 ha3 stonith-ng[12483]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 11 21:43:55 ha3 stonith: [12554]: info: parse config info info=ha4
> Jun 11 21:43:55 ha3 stonith: [12554]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 11 21:43:55 ha3 stonith: [12554]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 11 21:44:47 ha3 stonith: [12554]: info: node Meatware-reset: ha4
> Jun 11 21:44:47 ha3 stonith-ng[12483]: notice: log_operation: Operation 'reboot' [12553] (call 7 from crmd.12487) for host 'ha4' with device 'fencing_route_to_ha4' returned: 0 (OK)
> Jun 11 21:44:47 ha3 stonith-ng[12483]: notice: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487 at ha3.e766648d: OK
> Jun 11 21:44:47 ha3 crmd[12487]: notice: tengine_stonith_callback: Stonith operation 7/8:5:0:7901aff3-92ef-40e6-9193-5f396f4a06f1: OK (0)
> Jun 11 21:44:47 ha3 crmd[12487]: notice: crm_update_peer_state: send_stonith_update: Node ha4[0] - state is now lost (was (null))
> Jun 11 21:44:47 ha3 crmd[12487]: notice: tengine_stonith_notify: Peer ha4 was terminated (reboot) by ha3 for ha3: OK (ref=e766648d-20f0-4b94-b001-4873f9f8bb37) by client crmd.12487
> Jun 11 21:44:47 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 4: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 11 21:45:07 ha3 attrd[12485]: notice: write_attribute: Sent update 5 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 11 21:45:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 5 for pingd[ha3]=0: OK (0)
> Jun 11 21:45:07 ha3 ping(ha3_fabric_ping)[12560]: WARNING: pingd is less than failure_score(1)
> Jun 11 21:45:07 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=43, confirmed=true) unknown error
> Jun 11 21:45:07 ha3 crmd[12487]: warning: status_from_rc: Action 4 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 11 21:45:07 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541107)
> Jun 11 21:45:07 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541107)
> Jun 11 21:45:07 ha3 crmd[12487]: notice: run_graph: Transition 5 (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-83.bz2): Stopped
> Jun 11 21:45:07 ha3 attrd[12485]: notice: write_attribute: Sent update 6 with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 11 21:45:07 ha3 attrd[12485]: notice: write_attribute: Sent update 7 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:45:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-315.bz2
> Jun 11 21:45:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 6 for fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
> Jun 11 21:45:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 7 for last-failure-ha3_fabric_ping[ha3]=1402541107: OK (0)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:45:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:45:07 ha3 pengine[12486]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-316.bz2
> Jun 11 21:45:07 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 11 21:45:07 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=47, confirmed=true) ok
> Jun 11 21:45:07 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 5: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 11 21:45:12 ha3 attrd[12485]: notice: write_attribute: Sent update 8 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 11 21:45:12 ha3 attrd[12485]: notice: attrd_cib_callback: Update 8 for pingd[ha3]=(null): OK (0)
> Jun 11 21:45:27 ha3 ping(ha3_fabric_ping)[12607]: WARNING: pingd is less than failure_score(1)
> Jun 11 21:45:27 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=20, rc=1, cib-update=48, confirmed=true) unknown error
> Jun 11 21:45:27 ha3 crmd[12487]: warning: status_from_rc: Action 5 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 11 21:45:27 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541127)
> Jun 11 21:45:27 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541127)
> Jun 11 21:45:27 ha3 crmd[12487]: notice: run_graph: Transition 7 (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-316.bz2): Stopped
> Jun 11 21:45:27 ha3 attrd[12485]: notice: write_attribute: Sent update 9 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:45:27 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: process_pe_message: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-317.bz2
> Jun 11 21:45:27 ha3 attrd[12485]: notice: attrd_cib_callback: Update 9 for last-failure-ha3_fabric_ping[ha3]=1402541127: OK (0)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:45:27 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:45:27 ha3 pengine[12486]: notice: process_pe_message: Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-318.bz2
> Jun 11 21:45:27 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 11 21:45:27 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=21, rc=0, cib-update=51, confirmed=true) ok
> Jun 11 21:45:27 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 5: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 11 21:45:32 ha3 attrd[12485]: notice: write_attribute: Sent update 10 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 11 21:45:32 ha3 attrd[12485]: notice: attrd_cib_callback: Update 10 for pingd[ha3]=(null): OK (0)
> Jun 11 21:45:47 ha3 ping(ha3_fabric_ping)[12654]: WARNING: pingd is less than failure_score(1)
> Jun 11 21:45:47 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=22, rc=1, cib-update=52, confirmed=true) unknown error
> Jun 11 21:45:47 ha3 crmd[12487]: warning: status_from_rc: Action 5 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 11 21:45:47 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541147)
> Jun 11 21:45:47 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541147)
> Jun 11 21:45:47 ha3 crmd[12487]: notice: run_graph: Transition 9 (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-318.bz2): Stopped
> Jun 11 21:45:47 ha3 attrd[12485]: notice: write_attribute: Sent update 11 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:45:47 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: process_pe_message: Calculated Transition 10: /var/lib/pacemaker/pengine/pe-input-319.bz2
> Jun 11 21:45:47 ha3 attrd[12485]: notice: attrd_cib_callback: Update 11 for last-failure-ha3_fabric_ping[ha3]=1402541147: OK (0)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:45:47 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:45:47 ha3 pengine[12486]: notice: process_pe_message: Calculated Transition 11: /var/lib/pacemaker/pengine/pe-input-320.bz2
> Jun 11 21:45:47 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 11 21:45:47 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=23, rc=0, cib-update=55, confirmed=true) ok
> Jun 11 21:45:47 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 5: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 11 21:45:52 ha3 attrd[12485]: notice: write_attribute: Sent update 12 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 11 21:45:52 ha3 attrd[12485]: notice: attrd_cib_callback: Update 12 for pingd[ha3]=(null): OK (0)
> Jun 11 21:46:07 ha3 ping(ha3_fabric_ping)[12700]: WARNING: pingd is less than failure_score(1)
> Jun 11 21:46:07 ha3 crmd[12487]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=24, rc=1, cib-update=56, confirmed=true) unknown error
> Jun 11 21:46:07 ha3 crmd[12487]: warning: status_from_rc: Action 5 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 11 21:46:07 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541167)
> Jun 11 21:46:07 ha3 crmd[12487]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402541167)
> Jun 11 21:46:07 ha3 crmd[12487]: notice: run_graph: Transition 11 (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-320.bz2): Stopped
> Jun 11 21:46:07 ha3 attrd[12485]: notice: write_attribute: Sent update 13 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 11 21:46:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:46:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 11 21:46:07 ha3 pengine[12486]: notice: process_pe_message: Calculated Transition 12: /var/lib/pacemaker/pengine/pe-input-321.bz2
> Jun 11 21:46:07 ha3 attrd[12485]: notice: attrd_cib_callback: Update 13 for last-failure-ha3_fabric_ping[ha3]=1402541167: OK (0)
> Jun 11 21:46:07 ha3 pengine[12486]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 11 21:46:07 ha3 pengine[12486]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 11 21:46:07 ha3 pengine[12486]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
>
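
For anyone reading the log above: fencing with meatware keeps timing out after 60 seconds until the operator runs meatclient, and only then does ha3_fabric_ping get its start action. A minimal sketch for pulling the fencing outcomes out of such a log follows. The sample lines are copied from the log in this thread (in the archive's " at " form), while the regular expression and the function name fencing_attempts are illustrative assumptions, not part of Pacemaker.

```python
import re

# Sample lines copied from the log quoted in this thread.
LOG = """\
> Jun 11 21:42:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487 at ha3.1511d2cb: No route to host
> Jun 11 21:43:55 ha3 stonith-ng[12483]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487 at ha3.4de370d1: No route to host
> Jun 11 21:44:47 ha3 stonith-ng[12483]: notice: remote_op_done: Operation reboot of ha4 by ha3 for crmd.12487 at ha3.e766648d: OK
> Jun 11 21:44:47 ha3 crmd[12487]: notice: te_rsc_command: Initiating action 4: start ha3_fabric_ping_start_0 on ha3 (local)
"""

# Assumed pattern for the remote_op_done lines shown above; other
# Pacemaker versions may word these messages differently.
PATTERN = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+stonith-ng\[\d+\]:\s+\w+:\s+"
    r"remote_op_done:\s+Operation (?P<action>\w+) of (?P<target>\S+) "
    r"by \S+ for .+?: (?P<result>.+)$"
)


def fencing_attempts(log_text):
    """Return (time, target, result) for each completed fencing operation."""
    attempts = []
    for line in log_text.splitlines():
        # Drop mail quoting ("> ") before matching.
        match = PATTERN.match(line.lstrip("> "))
        if match:
            attempts.append((match["time"], match["target"], match["result"]))
    return attempts


for when, target, result in fencing_attempts(LOG):
    print(when, target, result)
```

Run against the full log, this would list one line per fencing attempt, which makes it easier to see that the resource start only follows the first attempt that returns OK.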
> Paul Cain
>
> Andrew Beekhof ---06/11/2014 07:20:44 PM---On 12 Jun 2014, at 4:55 am, Paul E Cain <pecain at us.ibm.com> wrote: > Hello,
>
> From: Andrew Beekhof <andrew at beekhof.net>
> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Date: 06/11/2014 07:20 PM
> Subject: Re: [Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.
>
>
>
>
> On 12 Jun 2014, at 4:55 am, Paul E Cain <pecain at us.ibm.com> wrote:
>
> > Hello,
> >
> > Overview
> > I'm experimenting with a small two-node Pacemaker cluster on two RHEL 7 VMs. One of the things I need to do is ensure that my cluster can connect to a certain IP address, 10.10.0.1, because once I add the actual resources that will need to be HA, those resources will need access to 10.10.0.1 for the cluster to function normally. To do that, I have one ocf:pacemaker:ping resource for each node to check that connectivity. If the ping fails, the node should go into standby mode and get fenced if possible. Additionally, when a node first comes up, I want that connectivity check to happen before the fencing agents come up or a STONITH happens, because a node should not try to take over cluster resources if it cannot connect to 10.10.0.1. To do this, I tried adding requires="nothing" and prereq="nothing" to all the operations for both pinging resources. I also have two meatware fencing agents to use for testing. I'm using order constraints so they don't start until after the ping resources.
> >
> > Cluster When Functioning Normally
> > [root at ha3 ~]# crm_mon -1
> > Last updated: Wed Jun 11 13:10:54 2014
> > Last change: Wed Jun 11 13:10:35 2014 via crmd on ha3
> > Stack: corosync
> > Current DC: ha3 (168427534) - partition with quorum
> > Version: 1.1.10-9d39a6b
> > 2 Nodes configured
> > 4 Resources configured
> >
> >
> > Online: [ ha3 ha4 ]
> >
> > ha3_fabric_ping (ocf::pacemaker:ping): Started ha3
> > ha4_fabric_ping (ocf::pacemaker:ping): Started ha4
> > fencing_route_to_ha3 (stonith:meatware): Started ha4
> > fencing_route_to_ha4 (stonith:meatware): Started ha3
> >
> >
> > Testing
> > However, when I tested this by only starting up pacemaker on ha3 and also preventing ha3 from connecting to 10.10.0.1, I found that ha3 would not start until after ha4 was STONITHed. What I was aiming for was for ha3_fabric_ping to fail to start, which would prevent the fencing agent from starting and therefore prevent any STONITH.
> >
> >
> > Question
> > Any ideas why this is not working as expected? It's my understanding that requires="nothing" should allow ha3_fabric_ping to start even before any fencing operations. Maybe I'm misunderstanding something.
>
>
> It's because the entire node is in standby mode.
> Running crm_simulate with the cib.xml below shows:
>
> Node ha3 (168427534): standby (on-fail)
>
> In the config I see:
>
> <op name="monitor" interval="15s" requires="nothing" on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">
>
> and:
>
> <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation_key="ha3_fabric_ping_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641" last-rc-change="1402509641" exec-time="20043" queue-time="0" op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>
>
> Note: rc-code="1"
>
> The combination put the node into standby and prevented resources from starting.
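
The check described above (an operation configured with on-fail="standby" plus a recorded failure of that operation) can be sketched roughly as follows. This is only an illustration, not how the policy engine is actually implemented: the XML is an abbreviated fragment modelled on the CIB in this thread, the function name standby_triggers is made up, and treating rc 7 from a probe as "not a failure" is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# Abbreviated fragment modelled on the CIB quoted in this thread.
CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
        <operations>
          <op name="start" interval="0" on-fail="standby" id="ha3_fabric_ping-start-0"/>
          <op name="monitor" interval="15s" on-fail="standby" id="ha3_fabric_ping-monitor-15s"/>
          <op name="stop" interval="0" on-fail="fence" id="ha3_fabric_ping-stop-0"/>
        </operations>
      </primitive>
    </resources>
  </configuration>
  <status>
    <node_state id="168427534" uname="ha3">
      <lrm id="168427534">
        <lrm_resources>
          <lrm_resource id="ha3_fabric_ping">
            <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation="start" rc-code="1" interval="0"/>
          </lrm_resource>
          <lrm_resource id="ha4_fabric_ping">
            <lrm_rsc_op id="ha4_fabric_ping_last_0" operation="monitor" rc-code="7" interval="0"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>
"""


def standby_triggers(cib_xml):
    """Return (node, resource, operation) for failed ops with on-fail="standby"."""
    root = ET.fromstring(cib_xml)

    # Map (resource id, operation name) -> on-fail policy from the configuration.
    policy = {}
    for primitive in root.iter("primitive"):
        for op in primitive.iter("op"):
            policy[(primitive.get("id"), op.get("name"))] = op.get("on-fail")

    hits = []
    for node in root.iter("node_state"):
        for rsc in node.iter("lrm_resource"):
            for op in rsc.iter("lrm_rsc_op"):
                rc = int(op.get("rc-code"))
                is_probe = op.get("operation") == "monitor" and op.get("interval") == "0"
                # Assumption: rc 7 (not running) from a probe is a normal result.
                if rc == 0 or (is_probe and rc == 7):
                    continue
                if policy.get((rsc.get("id"), op.get("operation"))) == "standby":
                    hits.append((node.get("uname"), rsc.get("id"), op.get("operation")))
    return hits


print(standby_triggers(CIB))
```

With the fragment above, the only hit is the failed start of ha3_fabric_ping on ha3, which matches the rc-code="1" entry pointed out in the status section.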
>
> >
> > Thanks for any help you can offer.
> >
> > Below are the software versions, cibadmin -Q, the /var/log/messages on ha3 during my test, and my corosync.conf file.
> >
> > Tell me if you need any more information.
> >
> > Software Versions (All Compiled From Source From The Website of the Respective Projects)
> > Cluster glue 1.0.11
> > libqb 0.17.0
> > Corosync 2.3.3
> > Pacemaker 1.1.11
> > Resource Agents 3.9.5
> > crmsh 2.0
> >
> > cibadmin -Q
> > <cib epoch="204" num_updates="18" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Wed Jun 11 12:56:50 2014" crm_feature_set="3.0.8" update-origin="ha3" update-client="crm_resource" have-quorum="1" dc-uuid="168427534">
> > <configuration>
> > <crm_config>
> > <cluster_property_set id="cib-bootstrap-options">
> > <nvpair name="symmetric-cluster" value="true" id="cib-bootstrap-options-symmetric-cluster"/>
> > <nvpair name="stonith-enabled" value="true" id="cib-bootstrap-options-stonith-enabled"/>
> > <nvpair name="stonith-action" value="reboot" id="cib-bootstrap-options-stonith-action"/>
> > <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
> > <nvpair name="stop-orphan-resources" value= "true" id="cib-bootstrap-options-stop-orphan-resources"/>
> > <nvpair name="stop-orphan-actions" value="true" id="cib-bootstrap-options-stop-orphan-actions"/>
> > <nvpair name="default-action-timeout" value="20s" id="cib-bootstrap-options-default-action-timeout"/>
> > <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-9d39a6b"/>
> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
> > </cluster_property_set>
> > </crm_config>
> > <nodes>
> > <node id="168427534" uname="ha3"/>
> > <node id="168427535" uname="ha4"/>
> > </nodes>
> > <resources>
> > <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> > <instance_attributes id="ha3_fabric_ping-instance_attributes">
> > <nvpair name="host_list" value="10.10.0.1" id="ha3_fabric_ping-instance_attributes-host_list"/>
> > <nvpair name="failure_score" value="1" id="ha3_fabric_ping-instance_attributes-failure_score"/>
> > </instance_attributes>
> > <operations>
> > <op name="start" timeout="60s" requires="nothing" on-fail="standby" interval="0" id="ha3_fabric_ping-start-0">
> > <instance_attributes id="ha3_fabric_ping-start-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > <op name="monitor" interval="15s" requires="nothing" on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">
> > <instance_attributes id="ha3_fabric_ping-monitor-15s-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha3_fabric_ping-stop-0">
> > <instance_attributes id="ha3_fabric_ping-stop-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > </operations>
> > <meta_attributes id="ha3_fabric_ping-meta_attributes">
> > <nvpair id="ha3_fabric_ping-meta_attributes-requires" name="requires" value="nothing"/>
> > </meta_attributes>
> > </primitive>
> > <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> > <instance_attributes id="ha4_fabric_ping-instance_attributes">
> > <nvpair name="host_list" value="10.10.0.1" id="ha4_fabric_ping-instance_attributes-host_list"/>
> > <nvpair name="failure_score" value="1" id="ha4_fabric_ping-instance_attributes-failure_score"/>
> > </instance_attributes>
> > <operations>
> > <op name="start" timeout="60s" requires="nothing" on-fail="standby" interval="0" id="ha4_fabric_ping-start-0">
> > <instance_attributes id="ha4_fabric_ping-start-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > <op name="monitor" interval="15s" requires="nothing" on-fail="standby" timeout="15s" id="ha4_fabric_ping-monitor-15s">
> > <instance_attributes id="ha4_fabric_ping-monitor-15s-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha4_fabric_ping-stop-0">
> > <instance_attributes id="ha4_fabric_ping-stop-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > </operations>
> > <meta_attributes id="ha4_fabric_ping-meta_attributes">
> > <nvpair id="ha4_fabric_ping-meta_attributes-requires" name="requires" value="nothing"/>
> > </meta_attributes>
> > </primitive>
> > <primitive id="fencing_route_to_ha3" class="stonith" type="meatware">
> > <instance_attributes id="fencing_route_to_ha3-instance_attributes">
> > <nvpair name="hostlist" value="ha3" id="fencing_route_to_ha3-instance_attributes-hostlist"/>
> > </instance_attributes>
> > <operations>
> > <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha3-start-0">
> > <instance_attributes id="fencing_route_to_ha3-start-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha3-monitor-0">
> > <instance_attributes id="fencing_route_to_ha3-monitor-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > </operations>
> > </primitive>
> > <primitive id="fencing_route_to_ha4" class="stonith" type="meatware">
> > <instance_attributes id="fencing_route_to_ha4-instance_attributes">
> > <nvpair name="hostlist" value="ha4" id="fencing_route_to_ha4-instance_attributes-hostlist"/>
> > </instance_attributes>
> > <operations>
> > <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha4-start-0">
> > <instance_attributes id="fencing_route_to_ha4-start-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha4-monitor-0">
> > <instance_attributes id="fencing_route_to_ha4-monitor-0-instance_attributes">
> > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
> > </instance_attributes>
> > </op>
> > </operations>
> > </primitive>
> > </resources>
> > <constraints>
> > <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping" score="INFINITY" node="ha3"/>
> > <rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping" score="-INFINITY" node="ha4"/>
> > <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping" score="INFINITY" node="ha4"/>
> > <rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping" score="-INFINITY" node="ha3"/>
> > <rsc_location id="fencing_route_to_ha4_location" rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
> > <rsc_location id="fencing_route_to_ha4_not_location" rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
> > <rsc_location id="fencing_route_to_ha3_location" rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
> > <rsc_location id="fencing_route_to_ha3_not_location" rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
> > <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4" score="INFINITY" first="ha3_fabric_ping" first-action="start" then="fencing_route_to_ha4" then-action="start"/>
> > <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3" score="INFINITY" first="ha4_fabric_ping" first-action="start" then="fencing_route_to_ha3" then-action="start"/>
> > </constraints>
> > <rsc_defaults>
> > <meta_attributes id="rsc-options">
> > <nvpair name="resource-stickiness" value="INFINITY" id="rsc-options-resource-stickiness"/>
> > <nvpair name="migration-threshold" value="0" id="rsc-options-migration-threshold"/>
> > <nvpair name="is-managed" value="true" id="rsc-options-is-managed"/>
> > </meta_attributes>
> > </rsc_defaults>
> > </configuration>
> > <status>
> > <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
> > <lrm id="168427534">
> > <lrm_resources>
> > <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> > <lrm_rsc_op id="ha3_fabric_ping_last_0" operation_key="ha3_fabric_ping_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:0;4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="19" rc-code="0" op-status="0" interval="0" last-run="1402509661" last-rc-change="1402509661" exec-time="12" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> > <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation_key="ha3_fabric_ping_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641" last-rc-change="1402509641" exec-time="20043" queue-time="0" op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>
> > </lrm_resource>
> > <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> > <lrm_rsc_op id="ha4_fabric_ping_last_0" operation_key="ha4_fabric_ping_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:7;5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1402509565" last-rc-change="1402509565" exec-time="10" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> > </lrm_resource>
> > <lrm_resource id="fencing_route_to_ha3" type="meatware" class="stonith">
> > <lrm_rsc_op id="fencing_route_to_ha3_last_0" operation_key="fencing_route_to_ha3_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:7;6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402509565" last-rc-change="1402509565" exec-time="1" queue-time="0" op-digest="502fbd7a2366c2be772d7fbecc9e0351"/>
> > </lrm_resource>
> > <lrm_resource id="fencing_route_to_ha4" type="meatware" class="stonith">
> > <lrm_rsc_op id="fencing_route_to_ha4_last_0" operation_key="fencing_route_to_ha4_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:7;7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402509565" last-rc-change="1402509565" exec-time="0" queue-time="0" op-digest="5be26fbcfd648e3d545d0115645dde76"/>
> > </lrm_resource>
> > </lrm_resources>
> > </lrm>
> > <transient_attributes id="168427534">
> > <instance_attributes id="status-168427534">
> > <nvpair id="status-168427534-shutdown" name="shutdown" value="0"/>
> > <nvpair id="status-168427534-probe_complete" name="probe_complete" value="true"/>
> > <nvpair id="status-168427534-fail-count-ha3_fabric_ping" name="fail-count-ha3_fabric_ping" value="INFINITY"/>
> > <nvpair id="status-168427534-last-failure-ha3_fabric_ping" name="last-failure-ha3_fabric_ping" value="1402509661"/>
> > </instance_attributes>
> > </transient_attributes>
> > </node_state>
> > <node_state id="168427535" in_ccm="false" crmd="offline" join="down" crm-debug-origin="send_stonith_update" uname="ha4" expected="down"/>
> > </status>
> > </cib>
> > [root at ha3 ~]#
> >
> >
> > /var/log/messages from when pacemaker started on ha3 to when ha3_fabric_ping failed.
> > Jun 11 12:59:01 ha3 systemd: Starting LSB: Starts and stops Pacemaker Cluster Manager....
> > Jun 11 12:59:01 ha3 pacemaker: Starting Pacemaker Cluster Manager
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: mcp_read_config: Configured corosync to accept connections from group 1000: OK (1)
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: main: Starting Pacemaker 1.1.10 (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc lha-fencing nagios corosync-native libesmtp
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: cluster_connect_quorum: Quorum acquired
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[168427535] - state is now member (was (null))
> > Jun 11 12:59:02 ha3 pengine[5013]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> > Jun 11 12:59:02 ha3 cib[5009]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> > Jun 11 12:59:02 ha3 cib[5009]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > Jun 11 12:59:02 ha3 crmd[5014]: notice: main: CRM Git Version: 9d39a6b
> > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_update_peer_state: attrd_peer_change_cb: Node (null)[168427534] - state is now member (was (null))
> > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: setup_cib: Watching for stonith topology changes
> > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: cluster_connect_quorum: Quorum acquired
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[168427535] - state is now member (was (null))
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_started: The local CRM is operational
> > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> > Jun 11 12:59:04 ha3 stonith-ng[5010]: notice: stonith_device_register: Added 'fencing_route_to_ha4' to the device list (1 active devices)
> > Jun 11 12:59:06 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ]
> > Jun 11 12:59:06 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster Manager..
> > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Jun 11 12:59:24 ha3 crmd[5014]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> > Jun 11 12:59:24 ha3 cib[5009]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:24 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:24 ha3 attrd[5012]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:24 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 2 with 1 changes for terminate, id=<n/a>, set=(null)
> > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 3 with 1 changes for shutdown, id=<n/a>, set=(null)
> > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 2 for terminate[ha3]=(null): OK (0)
> > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 3 for shutdown[ha3]=0: OK (0)
> > Jun 11 12:59:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jun 11 12:59:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for STONITH
> > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start ha3_fabric_ping (ha3)
> > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> > Jun 11 12:59:25 ha3 pengine[5013]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-80.bz2
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot fencing operation (12) on ha4 (timeout=60000)
> > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: handle_request: Client crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: b3ab6141-9612-4024-82b2-350e74bbb33d (0)
> > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > Jun 11 12:59:25 ha3 stonith: [5027]: info: parse config info info=ha4
> > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > Jun 11 12:59:25 ha3 stonith: [5031]: info: parse config info info=ha4
> > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not running
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not running
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
> > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on ha3 (local) - no waiting
> > Jun 11 12:59:25 ha3 attrd[5012]: notice: write_attribute: Sent update 4 with 1 changes for probe_complete, id=<n/a>, set=(null)
> > Jun 11 12:59:25 ha3 attrd[5012]: notice: attrd_cib_callback: Update 4 for probe_complete[ha3]=true: OK (0)
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_action_async_done: Child process 5030 performing action 'reboot' timed out with signal 15
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: log_operation: Operation 'reboot' [5030] (call 2 from crmd.5014) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: warning: log_operation: fencing_route_to_ha4:5030 [ Performing: stonith -t meatware -T reset ha4 ]
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.5014 at ha3.b3ab6141: No route to host
> > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 2/12:0:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: No route to host (-113)
> > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 2 for ha4 failed (No route to host): aborting transition.
> > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=b3ab6141-9612-4024-82b2-350e74bbb33d) by client crmd.5014
> > Jun 11 13:00:25 ha3 crmd[5014]: notice: run_graph: Transition 0 (Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-80.bz2): Stopped
> > Jun 11 13:00:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jun 11 13:00:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for STONITH
> > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start ha3_fabric_ping (ha3)
> > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> > Jun 11 13:00:25 ha3 pengine[5013]: warning: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-81.bz2
> > Jun 11 13:00:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: handle_request: Client crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: eae78d4c-8d80-47fe-93e9-1a9261ec38a4 (0)
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > Jun 11 13:00:25 ha3 stonith: [5057]: info: parse config info info=ha4
> > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> > Jun 11 13:00:41 ha3 stonith: [5057]: info: node Meatware-reset: ha4
> > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: log_operation: Operation 'reboot' [5056] (call 3 from crmd.5014) for host 'ha4' with device 'fencing_route_to_ha4' returned: 0 (OK)
> > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: remote_op_done: Operation reboot of ha4 by ha3 for crmd.5014 at ha3.eae78d4c: OK
> > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 3/8:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: OK (0)
> > Jun 11 13:00:41 ha3 crmd[5014]: notice: crm_update_peer_state: send_stonith_update: Node ha4[0] - state is now lost (was (null))
> > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was terminated (reboot) by ha3 for ha3: OK (ref=eae78d4c-8d80-47fe-93e9-1a9261ec38a4) by client crmd.5014
> > Jun 11 13:00:41 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: start ha3_fabric_ping_start_0 on ha3 (local)
> > Jun 11 13:01:01 ha3 systemd: Starting Session 22 of user root.
> > Jun 11 13:01:01 ha3 systemd: Started Session 22 of user root.
> > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 5 with 1 changes for pingd, id=<n/a>, set=(null)
> > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 5 for pingd[ha3]=0: OK (0)
> > Jun 11 13:01:01 ha3 ping(ha3_fabric_ping)[5060]: WARNING: pingd is less than failure_score(1)
> > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true) unknown error
> > Jun 11 13:01:01 ha3 crmd[5014]: warning: status_from_rc: Action 4 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402509661)
> > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402509661)
> > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 1 (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-81.bz2): Stopped
> > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 6 with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
> > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 7 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop ha3_fabric_ping (ha3)
> > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-304.bz2
> > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 6 for fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
> > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 7 for last-failure-ha3_fabric_ping[ha3]=1402509661: OK (0)
> > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop ha3_fabric_ping (ha3)
> > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-305.bz2
> > Jun 11 13:01:01 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: stop ha3_fabric_ping_stop_0 on ha3 (local)
> > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok
> > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 3 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-305.bz2): Complete
> > Jun 11 13:01:01 ha3 crmd[5014]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Jun 11 13:01:06 ha3 attrd[5012]: notice: write_attribute: Sent update 8 with 1 changes for pingd, id=<n/a>, set=(null)
> > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Jun 11 13:01:06 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jun 11 13:01:06 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > Jun 11 13:01:06 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-306.bz2
> > Jun 11 13:01:06 ha3 crmd[5014]: notice: run_graph: Transition 4 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-306.bz2): Complete
> > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Jun 11 13:01:06 ha3 attrd[5012]: notice: attrd_cib_callback: Update 8 for pingd[ha3]=(null): OK (0)
> >
> > /etc/corosync/corosync.conf
> > # Please read the corosync.conf.5 manual page
> > totem {
> >     version: 2
> >
> >     crypto_cipher: none
> >     crypto_hash: none
> >
> >     interface {
> >         ringnumber: 0
> >         bindnetaddr: 10.10.0.0
> >         mcastport: 5405
> >         ttl: 1
> >     }
> >     transport: udpu
> > }
> >
> > logging {
> >     fileline: off
> >     to_logfile: no
> >     to_syslog: yes
> >     #logfile: /var/log/cluster/corosync.log
> >     debug: off
> >     timestamp: on
> >     logger_subsys {
> >         subsys: QUORUM
> >         debug: off
> >     }
> > }
> >
> > nodelist {
> >     node {
> >         ring0_addr: 10.10.0.14
> >     }
> >
> >     node {
> >         ring0_addr: 10.10.0.15
> >     }
> > }
> >
> > quorum {
> >     # Enable and configure quorum subsystem (default: off)
> >     # see also corosync.conf.5 and votequorum.5
> >     provider: corosync_votequorum
> >     expected_votes: 2
> > }
> > [root at ha3 ~]#
> >
> > Paul Cain
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org