[Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.
Andrew Beekhof
andrew at beekhof.net
Fri Jun 27 08:27:51 CEST 2014
On 14 Jun 2014, at 7:37 am, Paul E Cain <pecain at us.ibm.com> wrote:
> Hi Andrew,
>
> Thank you for your quick response. This time, I completely shut down ha4 and then started corosync and pacemaker on ha3. However, the problem still persisted. It's my understanding that using requires="nothing" or prereq="nothing" should allow the cluster to start resources without needing to fence. Is this not correct?
Apparently not without this patch:
https://github.com/ClusterLabs/pacemaker/commit/2a5bbf9
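
To double-check which operations in a CIB actually carry the setting, a minimal sketch along these lines may help. It assumes Python's standard xml.etree module and a trimmed-down, hypothetical fragment modelled on the CIB quoted below. The function name list_op_requirements is made up for illustration and is not part of any Pacemaker tooling. It only reports what is configured. It says nothing about how the policy engine interprets it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened fragment modelled on the quoted CIB (not the full file).
CIB_FRAGMENT = """
<resources>
  <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
    <operations>
      <op name="start" timeout="60s" requires="nothing" interval="0" id="ha3_fabric_ping-start-0">
        <instance_attributes id="ha3_fabric_ping-start-0-instance_attributes">
          <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
        </instance_attributes>
      </op>
      <op name="monitor" interval="15s" timeout="15s" id="ha3_fabric_ping-monitor-15s"/>
    </operations>
  </primitive>
</resources>
"""


def list_op_requirements(xml_text):
    """Return (resource id, op name, requires, prereq) for every <op> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for primitive in root.iter("primitive"):
        for op in primitive.iter("op"):
            # The legacy "prereq" setting lives in an nvpair below the op.
            prereq = None
            for nvpair in op.iter("nvpair"):
                if nvpair.get("name") == "prereq":
                    prereq = nvpair.get("value")
            rows.append((primitive.get("id"), op.get("name"),
                         op.get("requires"), prereq))
    return rows


if __name__ == "__main__":
    for row in list_op_requirements(CIB_FRAGMENT):
        print(row)
```

Feeding it the contents of a saved CIB file (for example the /tmp/cib.xml shown below) instead of the inline fragment should list every operation with its requires attribute and prereq nvpair, with None where either is unset.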
>
>
> [root@ha3 ~]# cat /tmp/cib.xml
> <cib epoch="216" num_updates="9" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Thu Jun 12 21:25:13 2014" crm_feature_set="3.0.8" update-origin="ha3" update-client="crmd" have-quorum="0" dc-uuid="168427534">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair name="symmetric-cluster" value="true" id="cib-bootstrap-options-symmetric-cluster"/>
> <nvpair name="stonith-enabled" value="true" id="cib-bootstrap-options-stonith-enabled"/>
> <nvpair name="stonith-action" value="reboot" id="cib-bootstrap-options-stonith-action"/>
> <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
> <nvpair name="stop-orphan-resources" value="true" id="cib-bootstrap-options-stop-orphan-resources"/>
> <nvpair name="stop-orphan-actions" value="true" id="cib-bootstrap-options-stop-orphan-actions"/>
> <nvpair name="default-action-timeout" value="20s" id="cib-bootstrap-options-default-action-timeout"/>
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-9d39a6b"/>
> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="168427534" uname="ha3"/>
> <node id="168427535" uname="ha4"/>
> </nodes>
> <resources>
> <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> <instance_attributes id="ha3_fabric_ping-instance_attributes">
> <nvpair name="host_list" value="10.10.0.1" id="ha3_fabric_ping-instance_attributes-host_list"/>
> <nvpair name="failure_score" value="1" id="ha3_fabric_ping-instance_attributes-failure_score"/>
> </instance_attributes>
> <operations>
> <op name="start" timeout="60s" requires="nothing" interval="0" id="ha3_fabric_ping-start-0">
> <instance_attributes id="ha3_fabric_ping-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" interval="15s" requires="nothing" timeout="15s" id="ha3_fabric_ping-monitor-15s">
> <instance_attributes id="ha3_fabric_ping-monitor-15s-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha3_fabric_ping-stop-0">
> <instance_attributes id="ha3_fabric_ping-stop-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> <instance_attributes id="ha4_fabric_ping-instance_attributes">
> <nvpair name="host_list" value="10.10.0.1" id="ha4_fabric_ping-instance_attributes-host_list"/>
> <nvpair name="failure_score" value="1" id="ha4_fabric_ping-instance_attributes-failure_score"/>
> </instance_attributes>
> <operations>
> <op name="start" timeout="60s" requires="nothing" interval="0" id="ha4_fabric_ping-start-0">
> <instance_attributes id="ha4_fabric_ping-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" interval="15s" requires="nothing" timeout="15s" id="ha4_fabric_ping-monitor-15s">
> <instance_attributes id="ha4_fabric_ping-monitor-15s-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha4_fabric_ping-stop-0">
> <instance_attributes id="ha4_fabric_ping-stop-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> <primitive id="fencing_route_to_ha3" class="stonith" type="meatware">
> <instance_attributes id="fencing_route_to_ha3-instance_attributes">
> <nvpair name="hostlist" value="ha3" id="fencing_route_to_ha3-instance_attributes-hostlist"/>
> </instance_attributes>
> <operations>
> <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha3-start-0">
> <instance_attributes id="fencing_route_to_ha3-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha3-monitor-0">
> <instance_attributes id="fencing_route_to_ha3-monitor-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> <primitive id="fencing_route_to_ha4" class="stonith" type="meatware">
> <instance_attributes id="fencing_route_to_ha4-instance_attributes">
> <nvpair name="hostlist" value="ha4" id="fencing_route_to_ha4-instance_attributes-hostlist"/>
> </instance_attributes>
> <operations>
> <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha4-start-0">
> <instance_attributes id="fencing_route_to_ha4-start-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha4-monitor-0">
> <instance_attributes id="fencing_route_to_ha4-monitor-0-instance_attributes">
> <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
> </instance_attributes>
> </op>
> </operations>
> </primitive>
> </resources>
> <constraints>
> <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping" score="INFINITY" node="ha3"/>
> <rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping" score="-INFINITY" node="ha4"/>
> <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping" score="INFINITY" node="ha4"/>
> <rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping" score="-INFINITY" node="ha3"/>
> <rsc_location id="fencing_route_to_ha4_location" rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
> <rsc_location id="fencing_route_to_ha4_not_location" rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
> <rsc_location id="fencing_route_to_ha3_location" rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
> <rsc_location id="fencing_route_to_ha3_not_location" rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
> <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4" score="INFINITY" first="ha3_fabric_ping" first-action="start" then="fencing_route_to_ha4" then-action="start"/>
> <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3" score="INFINITY" first="ha4_fabric_ping" first-action="start" then="fencing_route_to_ha3" then-action="start"/>
> </constraints>
> <rsc_defaults>
> <meta_attributes id="rsc-options">
> <nvpair name="resource-stickiness" value="INFINITY" id="rsc-options-resource-stickiness"/>
> <nvpair name="migration-threshold" value="0" id="rsc-options-migration-threshold"/>
> <nvpair name="is-managed" value="true" id="rsc-options-is-managed"/>
> </meta_attributes>
> </rsc_defaults>
> </configuration>
> <status>
> <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
> <lrm id="168427534">
> <lrm_resources>
> <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> <lrm_rsc_op id="ha3_fabric_ping_last_0" operation_key="ha3_fabric_ping_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" transition-magic="0:7;4:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1402626507" last-rc-change="1402626507" exec-time="42" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> </lrm_resource>
> <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> <lrm_rsc_op id="ha4_fabric_ping_last_0" operation_key="ha4_fabric_ping_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="5:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" transition-magic="0:7;5:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1402626507" last-rc-change="1402626507" exec-time="8" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> </lrm_resource>
> <lrm_resource id="fencing_route_to_ha3" type="meatware" class="stonith">
> <lrm_rsc_op id="fencing_route_to_ha3_last_0" operation_key="fencing_route_to_ha3_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="6:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" transition-magic="0:7;6:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402626507" last-rc-change="1402626507" exec-time="0" queue-time="0" op-digest="502fbd7a2366c2be772d7fbecc9e0351"/>
> </lrm_resource>
> <lrm_resource id="fencing_route_to_ha4" type="meatware" class="stonith">
> <lrm_rsc_op id="fencing_route_to_ha4_last_0" operation_key="fencing_route_to_ha4_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="7:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" transition-magic="0:7;7:1:7:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95" call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402626507" last-rc-change="1402626507" exec-time="0" queue-time="0" op-digest="5be26fbcfd648e3d545d0115645dde76"/>
> </lrm_resource>
> </lrm_resources>
> </lrm>
> <transient_attributes id="168427534">
> <instance_attributes id="status-168427534">
> <nvpair id="status-168427534-shutdown" name="shutdown" value="0"/>
> <nvpair id="status-168427534-probe_complete" name="probe_complete" value="true"/>
> </instance_attributes>
> </transient_attributes>
> </node_state>
> </status>
> </cib>
>
>
> Jun 12 21:27:59 ha3 systemd: Starting LSB: Starts and stops Corosync Cluster Engine....
> Jun 12 21:27:59 ha3 corosync[4346]: [MAIN ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
> Jun 12 21:27:59 ha3 corosync[4346]: [MAIN ] Corosync built-in features: pie relro bindnow
> Jun 12 21:27:59 ha3 corosync[4347]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jun 12 21:27:59 ha3 corosync[4347]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Jun 12 21:27:59 ha3 corosync[4347]: [TOTEM ] The network interface [10.10.0.14] is now up.
> Jun 12 21:27:59 ha3 corosync[4347]: [SERV ] Service engine loaded: corosync configuration map access [0]
> Jun 12 21:27:59 ha3 corosync[4347]: [QB ] server name: cmap
> Jun 12 21:27:59 ha3 corosync[4347]: [SERV ] Service engine loaded: corosync configuration service [1]
> Jun 12 21:27:59 ha3 corosync[4347]: [QB ] server name: cfg
> Jun 12 21:27:59 ha3 corosync[4347]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Jun 12 21:27:59 ha3 corosync[4347]: [QB ] server name: cpg
> Jun 12 21:27:59 ha3 corosync[4347]: [SERV ] Service engine loaded: corosync profile loading service [4]
> Jun 12 21:27:59 ha3 corosync[4347]: [QUORUM] Using quorum provider corosync_votequorum
> Jun 12 21:27:59 ha3 corosync[4347]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Jun 12 21:27:59 ha3 corosync[4347]: [QB ] server name: votequorum
> Jun 12 21:27:59 ha3 corosync[4347]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Jun 12 21:27:59 ha3 corosync[4347]: [QB ] server name: quorum
> Jun 12 21:27:59 ha3 corosync[4347]: [TOTEM ] adding new UDPU member {10.10.0.14}
> Jun 12 21:27:59 ha3 corosync[4347]: [TOTEM ] adding new UDPU member {10.10.0.15}
> Jun 12 21:27:59 ha3 corosync[4347]: [TOTEM ] A new membership (10.10.0.14:980) was formed. Members joined: 168427534
> Jun 12 21:27:59 ha3 corosync[4347]: [QUORUM] Members[1]: 168427534
> Jun 12 21:27:59 ha3 corosync[4347]: [MAIN ] Completed service synchronization, ready to provide service.
> Jun 12 21:28:00 ha3 corosync: Starting Corosync Cluster Engine (corosync): [ OK ]
> Jun 12 21:28:00 ha3 systemd: Started LSB: Starts and stops Corosync Cluster Engine..
> Jun 12 21:28:05 ha3 systemd: Starting LSB: Starts and stops Pacemaker Cluster Manager....
> Jun 12 21:28:05 ha3 pacemaker: Starting Pacemaker Cluster Manager
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: mcp_read_config: Configured corosync to accept connections from group 1000: OK (1)
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: main: Starting Pacemaker 1.1.10 (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc lha-fencing nagios corosync-native libesmtp
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: cluster_connect_quorum: Quorum lost
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:05 ha3 pacemakerd[4375]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> Jun 12 21:28:05 ha3 pengine[4381]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Jun 12 21:28:05 ha3 cib[4377]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> Jun 12 21:28:05 ha3 cib[4377]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 12 21:28:05 ha3 stonith-ng[4378]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 12 21:28:05 ha3 crmd[4382]: notice: main: CRM Git Version: 9d39a6b
> Jun 12 21:28:05 ha3 crmd[4382]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Jun 12 21:28:05 ha3 crmd[4382]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> Jun 12 21:28:05 ha3 attrd[4380]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 12 21:28:05 ha3 attrd[4380]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 attrd[4380]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 12 21:28:05 ha3 attrd[4380]: notice: crm_update_peer_state: attrd_peer_change_cb: Node (null)[168427534] - state is now member (was (null))
> Jun 12 21:28:05 ha3 attrd[4380]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 attrd[4380]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:05 ha3 stonith-ng[4378]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 stonith-ng[4378]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 12 21:28:05 ha3 stonith-ng[4378]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 stonith-ng[4378]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:05 ha3 cib[4377]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 cib[4377]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 12 21:28:05 ha3 cib[4377]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:05 ha3 cib[4377]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:06 ha3 crmd[4382]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jun 12 21:28:06 ha3 crmd[4382]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:06 ha3 crmd[4382]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> Jun 12 21:28:06 ha3 crmd[4382]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:06 ha3 crmd[4382]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:06 ha3 stonith-ng[4378]: notice: setup_cib: Watching for stonith topology changes
> Jun 12 21:28:06 ha3 stonith-ng[4378]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:28:06 ha3 crmd[4382]: notice: cluster_connect_quorum: Quorum lost
> Jun 12 21:28:06 ha3 crmd[4382]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> Jun 12 21:28:06 ha3 crmd[4382]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:06 ha3 crmd[4382]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:06 ha3 crmd[4382]: notice: do_started: The local CRM is operational
> Jun 12 21:28:06 ha3 crmd[4382]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jun 12 21:28:07 ha3 stonith-ng[4378]: notice: stonith_device_register: Added 'fencing_route_to_ha4' to the device list (1 active devices)
> Jun 12 21:28:10 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ]
> Jun 12 21:28:10 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster Manager..
> Jun 12 21:28:27 ha3 crmd[4382]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jun 12 21:28:27 ha3 crmd[4382]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jun 12 21:28:27 ha3 crmd[4382]: warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> Jun 12 21:28:27 ha3 cib[4377]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:27 ha3 cib[4377]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:27 ha3 attrd[4380]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:27 ha3 attrd[4380]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:27 ha3 attrd[4380]: notice: write_attribute: Sent update 2 with 1 changes for terminate, id=<n/a>, set=(null)
> Jun 12 21:28:27 ha3 attrd[4380]: notice: write_attribute: Sent update 3 with 1 changes for shutdown, id=<n/a>, set=(null)
> Jun 12 21:28:27 ha3 attrd[4380]: notice: attrd_cib_callback: Update 2 for terminate[ha3]=(null): OK (0)
> Jun 12 21:28:27 ha3 attrd[4380]: notice: attrd_cib_callback: Update 3 for shutdown[ha3]=0: OK (0)
> Jun 12 21:28:27 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:28:27 ha3 pengine[4381]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 12 21:28:27 ha3 pengine[4381]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 12 21:28:27 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:28:27 ha3 pengine[4381]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-89.bz2
> Jun 12 21:28:27 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:28:27 ha3 pengine[4381]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 12 21:28:27 ha3 pengine[4381]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 12 21:28:27 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:28:27 ha3 pengine[4381]: warning: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-90.bz2
> Jun 12 21:28:27 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
> Jun 12 21:28:27 ha3 crmd[4382]: notice: te_fence_node: Executing reboot fencing operation (12) on ha4 (timeout=60000)
> Jun 12 21:28:27 ha3 stonith-ng[4378]: notice: handle_request: Client crmd.4382.407ee05a wants to fence (reboot) 'ha4' with device '(any)'
> Jun 12 21:28:27 ha3 stonith-ng[4378]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: eefd3564-988e-408a-a423-1be83ef5bdbc (0)
> Jun 12 21:28:27 ha3 stonith-ng[4378]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> Jun 12 21:28:27 ha3 stonith-ng[4378]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Jun 12 21:28:27 ha3 stonith: [4393]: info: parse config info info=ha4
> Jun 12 21:28:27 ha3 stonith-ng[4378]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 12 21:28:27 ha3 stonith: [4398]: info: parse config info info=ha4
> Jun 12 21:28:27 ha3 stonith: [4398]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 12 21:28:27 ha3 stonith: [4398]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 12 21:28:27 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not running
> Jun 12 21:28:27 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
> Jun 12 21:28:27 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not running
> Jun 12 21:28:27 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
> Jun 12 21:28:27 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
> Jun 12 21:28:27 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on ha3 (local) - no waiting
> Jun 12 21:28:27 ha3 attrd[4380]: notice: write_attribute: Sent update 4 with 1 changes for probe_complete, id=<n/a>, set=(null)
> Jun 12 21:28:27 ha3 attrd[4380]: notice: attrd_cib_callback: Update 4 for probe_complete[ha3]=true: OK (0)
> Jun 12 21:29:27 ha3 stonith-ng[4378]: notice: stonith_action_async_done: Child process 4395 performing action 'reboot' timed out with signal 15
> Jun 12 21:29:27 ha3 stonith-ng[4378]: error: log_operation: Operation 'reboot' [4395] (call 2 from crmd.4382) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jun 12 21:29:27 ha3 stonith-ng[4378]: warning: log_operation: fencing_route_to_ha4:4395 [ Performing: stonith -t meatware -T reset ha4 ]
> Jun 12 21:29:27 ha3 stonith-ng[4378]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> Jun 12 21:29:27 ha3 stonith-ng[4378]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4382 at ha3.eefd3564: No route to host
> Jun 12 21:29:27 ha3 crmd[4382]: notice: tengine_stonith_callback: Stonith operation 2/12:1:0:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95: No route to host (-113)
> Jun 12 21:29:27 ha3 crmd[4382]: notice: tengine_stonith_callback: Stonith operation 2 for ha4 failed (No route to host): aborting transition.
> Jun 12 21:29:27 ha3 crmd[4382]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=eefd3564-988e-408a-a423-1be83ef5bdbc) by client crmd.4382
> Jun 12 21:29:27 ha3 crmd[4382]: notice: run_graph: Transition 1 (Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-90.bz2): Stopped
> Jun 12 21:29:27 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:29:27 ha3 pengine[4381]: warning: stage6: Scheduling Node ha4 for STONITH
> Jun 12 21:29:27 ha3 pengine[4381]: notice: LogActions: Start ha3_fabric_ping (ha3)
> Jun 12 21:29:27 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:29:27 ha3 pengine[4381]: warning: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-91.bz2
> Jun 12 21:29:27 ha3 crmd[4382]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> Jun 12 21:29:27 ha3 stonith-ng[4378]: notice: handle_request: Client crmd.4382.407ee05a wants to fence (reboot) 'ha4' with device '(any)'
> Jun 12 21:29:27 ha3 stonith-ng[4378]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: e5bee870-55de-4f22-b104-74556075cc99 (0)
> Jun 12 21:29:27 ha3 stonith-ng[4378]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 12 21:29:27 ha3 stonith-ng[4378]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> Jun 12 21:29:27 ha3 stonith: [4426]: info: parse config info info=ha4
> Jun 12 21:29:27 ha3 stonith: [4426]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jun 12 21:29:27 ha3 stonith: [4426]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jun 12 21:30:16 ha3 stonith: [4426]: info: node Meatware-reset: ha4
> Jun 12 21:30:16 ha3 stonith-ng[4378]: notice: log_operation: Operation 'reboot' [4425] (call 3 from crmd.4382) for host 'ha4' with device 'fencing_route_to_ha4' returned: 0 (OK)
> Jun 12 21:30:16 ha3 stonith-ng[4378]: notice: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4382 at ha3.e5bee870: OK
> Jun 12 21:30:16 ha3 crmd[4382]: notice: tengine_stonith_callback: Stonith operation 3/8:2:0:a2fa2eff-30ee-4f05-a458-7d23b0fa4c95: OK (0)
> Jun 12 21:30:16 ha3 crmd[4382]: notice: crm_update_peer_state: send_stonith_update: Node ha4[0] - state is now lost (was (null))
> Jun 12 21:30:16 ha3 crmd[4382]: notice: tengine_stonith_notify: Peer ha4 was terminated (reboot) by ha3 for ha3: OK (ref=e5bee870-55de-4f22-b104-74556075cc99) by client crmd.4382
> Jun 12 21:30:16 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 4: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 12 21:30:36 ha3 attrd[4380]: notice: write_attribute: Sent update 5 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 12 21:30:36 ha3 attrd[4380]: notice: attrd_cib_callback: Update 5 for pingd[ha3]=0: OK (0)
> Jun 12 21:30:36 ha3 ping(ha3_fabric_ping)[4429]: WARNING: pingd is less than failure_score(1)
> Jun 12 21:30:36 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true) unknown error
> Jun 12 21:30:36 ha3 crmd[4382]: warning: status_from_rc: Action 4 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 12 21:30:36 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626636)
> Jun 12 21:30:36 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626636)
> Jun 12 21:30:36 ha3 crmd[4382]: notice: run_graph: Transition 2 (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-91.bz2): Stopped
> Jun 12 21:30:36 ha3 attrd[4380]: notice: write_attribute: Sent update 6 with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 12 21:30:36 ha3 attrd[4380]: notice: write_attribute: Sent update 7 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:30:36 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-350.bz2
> Jun 12 21:30:36 ha3 attrd[4380]: notice: attrd_cib_callback: Update 6 for fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
> Jun 12 21:30:36 ha3 attrd[4380]: notice: attrd_cib_callback: Update 7 for last-failure-ha3_fabric_ping[ha3]=1402626636: OK (0)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:30:36 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:30:36 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-351.bz2
> Jun 12 21:30:36 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 12 21:30:36 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok
> Jun 12 21:30:36 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 5: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 12 21:30:41 ha3 attrd[4380]: notice: write_attribute: Sent update 8 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 12 21:30:41 ha3 attrd[4380]: notice: attrd_cib_callback: Update 8 for pingd[ha3]=(null): OK (0)
> Jun 12 21:30:56 ha3 ping(ha3_fabric_ping)[4476]: WARNING: pingd is less than failure_score(1)
> Jun 12 21:30:56 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=20, rc=1, cib-update=42, confirmed=true) unknown error
> Jun 12 21:30:56 ha3 crmd[4382]: warning: status_from_rc: Action 5 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 12 21:30:56 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626656)
> Jun 12 21:30:56 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626656)
> Jun 12 21:30:56 ha3 crmd[4382]: notice: run_graph: Transition 4 (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-351.bz2): Stopped
> Jun 12 21:30:56 ha3 attrd[4380]: notice: write_attribute: Sent update 9 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:30:56 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-352.bz2
> Jun 12 21:30:56 ha3 attrd[4380]: notice: attrd_cib_callback: Update 9 for last-failure-ha3_fabric_ping[ha3]=1402626656: OK (0)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:30:56 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:30:56 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-353.bz2
> Jun 12 21:30:56 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 12 21:30:56 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=21, rc=0, cib-update=45, confirmed=true) ok
> Jun 12 21:30:56 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 5: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 12 21:31:01 ha3 attrd[4380]: notice: write_attribute: Sent update 10 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 12 21:31:01 ha3 attrd[4380]: notice: attrd_cib_callback: Update 10 for pingd[ha3]=(null): OK (0)
> Jun 12 21:31:16 ha3 ping(ha3_fabric_ping)[4522]: WARNING: pingd is less than failure_score(1)
> Jun 12 21:31:16 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=22, rc=1, cib-update=46, confirmed=true) unknown error
> Jun 12 21:31:16 ha3 crmd[4382]: warning: status_from_rc: Action 5 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 12 21:31:16 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626676)
> Jun 12 21:31:16 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626676)
> Jun 12 21:31:16 ha3 crmd[4382]: notice: run_graph: Transition 6 (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-353.bz2): Stopped
> Jun 12 21:31:16 ha3 attrd[4380]: notice: write_attribute: Sent update 11 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:31:16 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-354.bz2
> Jun 12 21:31:16 ha3 attrd[4380]: notice: attrd_cib_callback: Update 11 for last-failure-ha3_fabric_ping[ha3]=1402626676: OK (0)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:31:16 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:31:16 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-355.bz2
> Jun 12 21:31:16 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 12 21:31:16 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=23, rc=0, cib-update=49, confirmed=true) ok
> Jun 12 21:31:16 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 5: start ha3_fabric_ping_start_0 on ha3 (local)
> Jun 12 21:31:21 ha3 attrd[4380]: notice: write_attribute: Sent update 12 with 1 changes for pingd, id=<n/a>, set=(null)
> Jun 12 21:31:21 ha3 attrd[4380]: notice: attrd_cib_callback: Update 12 for pingd[ha3]=(null): OK (0)
> Jun 12 21:31:36 ha3 ping(ha3_fabric_ping)[4568]: WARNING: pingd is less than failure_score(1)
> Jun 12 21:31:36 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=24, rc=1, cib-update=50, confirmed=true) unknown error
> Jun 12 21:31:36 ha3 crmd[4382]: warning: status_from_rc: Action 5 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jun 12 21:31:36 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626696)
> Jun 12 21:31:36 ha3 crmd[4382]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402626696)
> Jun 12 21:31:36 ha3 crmd[4382]: notice: run_graph: Transition 8 (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-355.bz2): Stopped
> Jun 12 21:31:36 ha3 attrd[4380]: notice: write_attribute: Sent update 13 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:31:36 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-356.bz2
> Jun 12 21:31:36 ha3 attrd[4380]: notice: attrd_cib_callback: Update 13 for last-failure-ha3_fabric_ping[ha3]=1402626696: OK (0)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jun 12 21:31:36 ha3 pengine[4381]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: LogActions: Recover ha3_fabric_ping (Started ha3)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> Jun 12 21:31:36 ha3 pengine[4381]: notice: process_pe_message: Calculated Transition 10: /var/lib/pacemaker/pengine/pe-input-357.bz2
> Jun 12 21:31:36 ha3 crmd[4382]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jun 12 21:31:36 ha3 crmd[4382]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=25, rc=0, cib-update=53, confirmed=true) ok
>
> Paul Cain
>
> From: Andrew Beekhof <andrew at beekhof.net>
> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Date: 06/12/2014 06:53 PM
> Subject: Re: [Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.
>
> > > </crm_config>
> > > <nodes>
> > > <node id="168427534" uname="ha3"/>
> > > <node id="168427535" uname="ha4"/>
> > > </nodes>
> > > <resources>
> > > <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> > > <instance_attributes id="ha3_fabric_ping-instance_attributes">
> > > <nvpair name="host_list" value="10.10.0.1" id="ha3_fabric_ping-instance_attributes-host_list"/>
> > > <nvpair name="failure_score" value="1" id="ha3_fabric_ping-instance_attributes-failure_score"/>
> > > </instance_attributes>
> > > <operations>
> > > <op name="start" timeout="60s" requires="nothing" on-fail="standby" interval="0" id="ha3_fabric_ping-start-0">
> > > <instance_attributes id="ha3_fabric_ping-start-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > <op name="monitor" interval="15s" requires="nothing" on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">
> > > <instance_attributes id="ha3_fabric_ping-monitor-15s-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha3_fabric_ping-stop-0">
> > > <instance_attributes id="ha3_fabric_ping-stop-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > </operations>
> > > <meta_attributes id="ha3_fabric_ping-meta_attributes">
> > > <nvpair id="ha3_fabric_ping-meta_attributes-requires" name="requires" value="nothing"/>
> > > </meta_attributes>
> > > </primitive>
> > > <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker" type="ping">
> > > <instance_attributes id="ha4_fabric_ping-instance_attributes">
> > > <nvpair name="host_list" value="10.10.0.1" id="ha4_fabric_ping-instance_attributes-host_list"/>
> > > <nvpair name="failure_score" value="1" id="ha4_fabric_ping-instance_attributes-failure_score"/>
> > > </instance_attributes>
> > > <operations>
> > > <op name="start" timeout="60s" requires="nothing" on-fail="standby" interval="0" id="ha4_fabric_ping-start-0">
> > > <instance_attributes id="ha4_fabric_ping-start-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > <op name="monitor" interval="15s" requires="nothing" on-fail="standby" timeout="15s" id="ha4_fabric_ping-monitor-15s">
> > > <instance_attributes id="ha4_fabric_ping-monitor-15s-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha4_fabric_ping-stop-0">
> > > <instance_attributes id="ha4_fabric_ping-stop-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > </operations>
> > > <meta_attributes id="ha4_fabric_ping-meta_attributes">
> > > <nvpair id="ha4_fabric_ping-meta_attributes-requires" name="requires" value="nothing"/>
> > > </meta_attributes>
> > > </primitive>
> > > <primitive id="fencing_route_to_ha3" class="stonith" type="meatware">
> > > <instance_attributes id="fencing_route_to_ha3-instance_attributes">
> > > <nvpair name="hostlist" value="ha3" id="fencing_route_to_ha3-instance_attributes-hostlist"/>
> > > </instance_attributes>
> > > <operations>
> > > <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha3-start-0">
> > > <instance_attributes id="fencing_route_to_ha3-start-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha3-monitor-0">
> > > <instance_attributes id="fencing_route_to_ha3-monitor-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > </operations>
> > > </primitive>
> > > <primitive id="fencing_route_to_ha4" class="stonith" type="meatware">
> > > <instance_attributes id="fencing_route_to_ha4-instance_attributes">
> > > <nvpair name="hostlist" value="ha4" id="fencing_route_to_ha4-instance_attributes-hostlist"/>
> > > </instance_attributes>
> > > <operations>
> > > <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha4-start-0">
> > > <instance_attributes id="fencing_route_to_ha4-start-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha4-monitor-0">
> > > <instance_attributes id="fencing_route_to_ha4-monitor-0-instance_attributes">
> > > <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
> > > </instance_attributes>
> > > </op>
> > > </operations>
> > > </primitive>
> > > </resources>
> > > <constraints>
> > > <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping" score="INFINITY" node="ha3"/>
> > > <rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping" score="-INFINITY" node="ha4"/>
> > > <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping" score="INFINITY" node="ha4"/>
> > > <rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping" score="-INFINITY" node="ha3"/>
> > > <rsc_location id="fencing_route_to_ha4_location" rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
> > > <rsc_location id="fencing_route_to_ha4_not_location" rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
> > > <rsc_location id="fencing_route_to_ha3_location" rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
> > > <rsc_location id="fencing_route_to_ha3_not_location" rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
> > > <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4" score="INFINITY" first="ha3_fabric_ping" first-action="start" then="fencing_route_to_ha4" then-action="start"/>
> > > <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3" score="INFINITY" first="ha4_fabric_ping" first-action="start" then="fencing_route_to_ha3" then-action="start"/>
> > > </constraints>
> > > <rsc_defaults>
> > > <meta_attributes id="rsc-options">
> > > <nvpair name="resource-stickiness" value="INFINITY" id="rsc-options-resource-stickiness"/>
> > > <nvpair name="migration-threshold" value="0" id="rsc-options-migration-threshold"/>
> > > <nvpair name="is-managed" value="true" id="rsc-options-is-managed"/>
> > > </meta_attributes>
> > > </rsc_defaults>
> > > </configuration>
> > > <status>
> > > <node_state id="168427534" uname="ha3" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
> > > <lrm id="168427534">
> > > <lrm_resources>
> > > <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> > > <lrm_rsc_op id="ha3_fabric_ping_last_0" operation_key="ha3_fabric_ping_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:0;4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="19" rc-code="0" op-status="0" interval="0" last-run="1402509661" last-rc-change="1402509661" exec-time="12" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> > > <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation_key="ha3_fabric_ping_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641" last-rc-change="1402509641" exec-time="20043" queue-time="0" op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>
> > > </lrm_resource>
> > > <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf" provider="pacemaker">
> > > <lrm_rsc_op id="ha4_fabric_ping_last_0" operation_key="ha4_fabric_ping_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:7;5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1402509565" last-rc-change="1402509565" exec-time="10" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
> > > </lrm_resource>
> > > <lrm_resource id="fencing_route_to_ha3" type="meatware" class="stonith">
> > > <lrm_rsc_op id="fencing_route_to_ha3_last_0" operation_key="fencing_route_to_ha3_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:7;6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402509565" last-rc-change="1402509565" exec-time="1" queue-time="0" op-digest="502fbd7a2366c2be772d7fbecc9e0351"/>
> > > </lrm_resource>
> > > <lrm_resource id="fencing_route_to_ha4" type="meatware" class="stonith">
> > > <lrm_rsc_op id="fencing_route_to_ha4_last_0" operation_key="fencing_route_to_ha4_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.8" transition-key="7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" transition-magic="0:7;7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa" call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402509565" last-rc-change="1402509565" exec-time="0" queue-time="0" op-digest="5be26fbcfd648e3d545d0115645dde76"/>
> > > </lrm_resource>
> > > </lrm_resources>
> > > </lrm>
> > > <transient_attributes id="168427534">
> > > <instance_attributes id="status-168427534">
> > > <nvpair id="status-168427534-shutdown" name="shutdown" value="0"/>
> > > <nvpair id="status-168427534-probe_complete" name="probe_complete" value="true"/>
> > > <nvpair id="status-168427534-fail-count-ha3_fabric_ping" name="fail-count-ha3_fabric_ping" value="INFINITY"/>
> > > <nvpair id="status-168427534-last-failure-ha3_fabric_ping" name="last-failure-ha3_fabric_ping" value="1402509661"/>
> > > </instance_attributes>
> > > </transient_attributes>
> > > </node_state>
> > > <node_state id="168427535" in_ccm="false" crmd="offline" join="down" crm-debug-origin="send_stonith_update" uname="ha4" expected="down"/>
> > > </status>
> > > </cib>
> > > [root@ha3 ~]#
> > >
> > >
> > > /var/log/messages from when pacemaker started on ha3 to when ha3_fabric_ping failed.
> > > Jun 11 12:59:01 ha3 systemd: Starting LSB: Starts and stops Pacemaker Cluster Manager....
> > > Jun 11 12:59:01 ha3 pacemaker: Starting Pacemaker Cluster Manager
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: mcp_read_config: Configured corosync to accept connections from group 1000: OK (1)
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: main: Starting Pacemaker 1.1.10 (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc lha-fencing nagios corosync-native libesmtp
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: cluster_connect_quorum: Quorum acquired
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > > Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[168427535] - state is now member (was (null))
> > > Jun 11 12:59:02 ha3 pengine[5013]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> > > Jun 11 12:59:02 ha3 cib[5009]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> > > Jun 11 12:59:02 ha3 cib[5009]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > > Jun 11 12:59:02 ha3 crmd[5014]: notice: main: CRM Git Version: 9d39a6b
> > > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> > > Jun 11 12:59:02 ha3 crmd[5014]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> > > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > > Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_update_peer_state: attrd_peer_change_cb: Node (null)[168427534] - state is now member (was (null))
> > > Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > > Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427534
> > > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: setup_cib: Watching for stonith topology changes
> > > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: cluster_connect_quorum: Quorum acquired
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[168427535] - state is now member (was (null))
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_started: The local CRM is operational
> > > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> > > Jun 11 12:59:04 ha3 stonith-ng[5010]: notice: stonith_device_register: Added 'fencing_route_to_ha4' to the device list (1 active devices)
> > > Jun 11 12:59:06 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ]
> > > Jun 11 12:59:06 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster Manager..
> > > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > > Jun 11 12:59:24 ha3 crmd[5014]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> > > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> > > Jun 11 12:59:24 ha3 cib[5009]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:24 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:24 ha3 attrd[5012]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:24 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 2 with 1 changes for terminate, id=<n/a>, set=(null)
> > > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 3 with 1 changes for shutdown, id=<n/a>, set=(null)
> > > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 2 for terminate[ha3]=(null): OK (0)
> > > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 3 for shutdown[ha3]=0: OK (0)
> > > Jun 11 12:59:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Jun 11 12:59:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for STONITH
> > > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start ha3_fabric_ping (ha3)
> > > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> > > Jun 11 12:59:25 ha3 pengine[5013]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-80.bz2
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot fencing operation (12) on ha4 (timeout=60000)
> > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: handle_request: Client crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: b3ab6141-9612-4024-82b2-350e74bbb33d (0)
> > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > Jun 11 12:59:25 ha3 stonith: [5027]: info: parse config info info=ha4
> > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > > Jun 11 12:59:25 ha3 stonith: [5031]: info: parse config info info=ha4
> > > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> > > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not running
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not running
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
> > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on ha3 (local) - no waiting
> > > Jun 11 12:59:25 ha3 attrd[5012]: notice: write_attribute: Sent update 4 with 1 changes for probe_complete, id=<n/a>, set=(null)
> > > Jun 11 12:59:25 ha3 attrd[5012]: notice: attrd_cib_callback: Update 4 for probe_complete[ha3]=true: OK (0)
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_action_async_done: Child process 5030 performing action 'reboot' timed out with signal 15
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: log_operation: Operation 'reboot' [5030] (call 2 from crmd.5014) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: warning: log_operation: fencing_route_to_ha4:5030 [ Performing: stonith -t meatware -T reset ha4 ]
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.5014@ha3.b3ab6141: No route to host
> > > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 2/12:0:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: No route to host (-113)
> > > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 2 for ha4 failed (No route to host): aborting transition.
> > > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=b3ab6141-9612-4024-82b2-350e74bbb33d) by client crmd.5014
> > > Jun 11 13:00:25 ha3 crmd[5014]: notice: run_graph: Transition 0 (Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-80.bz2): Stopped
> > > Jun 11 13:00:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Jun 11 13:00:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for STONITH
> > > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start ha3_fabric_ping (ha3)
> > > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start fencing_route_to_ha4 (ha3)
> > > Jun 11 13:00:25 ha3 pengine[5013]: warning: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-81.bz2
> > > Jun 11 13:00:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: handle_request: Client crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: eae78d4c-8d80-47fe-93e9-1a9261ec38a4 (0)
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > > Jun 11 13:00:25 ha3 stonith: [5057]: info: parse config info info=ha4
> > > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> > > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> > > Jun 11 13:00:41 ha3 stonith: [5057]: info: node Meatware-reset: ha4
> > > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: log_operation: Operation 'reboot' [5056] (call 3 from crmd.5014) for host 'ha4' with device 'fencing_route_to_ha4' returned: 0 (OK)
> > > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: remote_op_done: Operation reboot of ha4 by ha3 for crmd.5014 at ha3.eae78d4c: OK
> > > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 3/8:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: OK (0)
> > > Jun 11 13:00:41 ha3 crmd[5014]: notice: crm_update_peer_state: send_stonith_update: Node ha4[0] - state is now lost (was (null))
> > > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was terminated (reboot) by ha3 for ha3: OK (ref=eae78d4c-8d80-47fe-93e9-1a9261ec38a4) by client crmd.5014
> > > Jun 11 13:00:41 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: start ha3_fabric_ping_start_0 on ha3 (local)
> > > Jun 11 13:01:01 ha3 systemd: Starting Session 22 of user root.
> > > Jun 11 13:01:01 ha3 systemd: Started Session 22 of user root.
> > > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 5 with 1 changes for pingd, id=<n/a>, set=(null)
> > > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 5 for pingd[ha3]=0: OK (0)
> > > Jun 11 13:01:01 ha3 ping(ha3_fabric_ping)[5060]: WARNING: pingd is less than failure_score(1)
> > > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true) unknown error
> > > Jun 11 13:01:01 ha3 crmd[5014]: warning: status_from_rc: Action 4 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> > > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402509661)
> > > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402509661)
> > > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 1 (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-81.bz2): Stopped
> > > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 6 with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
> > > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 7 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> > > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop ha3_fabric_ping (ha3)
> > > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-304.bz2
> > > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 6 for fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
> > > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 7 for last-failure-ha3_fabric_ping[ha3]=1402509661: OK (0)
> > > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop ha3_fabric_ping (ha3)
> > > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-305.bz2
> > > Jun 11 13:01:01 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: stop ha3_fabric_ping_stop_0 on ha3 (local)
> > > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok
> > > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 3 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-305.bz2): Complete
> > > Jun 11 13:01:01 ha3 crmd[5014]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > > Jun 11 13:01:06 ha3 attrd[5012]: notice: write_attribute: Sent update 8 with 1 changes for pingd, id=<n/a>, set=(null)
> > > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > > Jun 11 13:01:06 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Jun 11 13:01:06 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > > Jun 11 13:01:06 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-306.bz2
> > > Jun 11 13:01:06 ha3 crmd[5014]: notice: run_graph: Transition 4 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-306.bz2): Complete
> > > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > > Jun 11 13:01:06 ha3 attrd[5012]: notice: attrd_cib_callback: Update 8 for pingd[ha3]=(null): OK (0)
> > >
> > > /etc/corosync/corosync.conf
> > > # Please read the corosync.conf.5 manual page
> > > totem {
> > >     version: 2
> > >
> > >     crypto_cipher: none
> > >     crypto_hash: none
> > >
> > >     interface {
> > >         ringnumber: 0
> > >         bindnetaddr: 10.10.0.0
> > >         mcastport: 5405
> > >         ttl: 1
> > >     }
> > >     transport: udpu
> > > }
> > >
> > > logging {
> > >     fileline: off
> > >     to_logfile: no
> > >     to_syslog: yes
> > >     #logfile: /var/log/cluster/corosync.log
> > >     debug: off
> > >     timestamp: on
> > >     logger_subsys {
> > >         subsys: QUORUM
> > >         debug: off
> > >     }
> > > }
> > >
> > > nodelist {
> > >     node {
> > >         ring0_addr: 10.10.0.14
> > >     }
> > >
> > >     node {
> > >         ring0_addr: 10.10.0.15
> > >     }
> > > }
> > >
> > > quorum {
> > >     # Enable and configure quorum subsystem (default: off)
> > >     # see also corosync.conf.5 and votequorum.5
> > >     provider: corosync_votequorum
> > >     expected_votes: 2
> > > }
> > > [root at ha3 ~]#
> > >
> > > Paul Cain
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org