[Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.
Paul E Cain
pecain at us.ibm.com
Wed Jun 11 20:55:22 CEST 2014
Hello,
Overview
I'm experimenting with a small two-node Pacemaker cluster on two RHEL 7
VMs. One thing I need to ensure is that the cluster can reach a certain IP
address, 10.10.0.1, because once I add the actual resources that need to be
HA, those resources will require access to 10.10.0.1 for the cluster to
function normally. To check that connectivity, I have one
ocf:pacemaker:ping resource for each node. If the ping fails, the node
should go into standby mode and be fenced if possible. Additionally, when a
node first comes up, I want that connectivity check to happen before the
fencing agents start or any STONITH happens, because a node should not try
to take over cluster resources if it cannot connect to 10.10.0.1. To
accomplish this, I added requires="nothing" and prereq="nothing" to all the
operations on both ping resources. I also have two meatware fencing agents
for testing, with order constraints so they don't start until after the
ping resources.
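For reference, the relevant pieces look roughly like this in crmsh syntax
(just a paraphrase of the full cibadmin -Q output further down, which is
authoritative; only the ha3 side is shown, the ha4 side mirrors it):

primitive ha3_fabric_ping ocf:pacemaker:ping \
        params host_list=10.10.0.1 failure_score=1 \
        op start timeout=60s interval=0 requires=nothing on-fail=standby \
        op monitor interval=15s timeout=15s requires=nothing on-fail=standby \
        op stop interval=0 requires=nothing on-fail=fence \
        meta requires=nothing
primitive fencing_route_to_ha4 stonith:meatware \
        params hostlist=ha4
order ha3_fabric_ping_before_fencing_route_to_ha4 inf: \
        ha3_fabric_ping:start fencing_route_to_ha4:start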
Cluster When Functioning Normally
[root@ha3 ~]# crm_mon -1
Last updated: Wed Jun 11 13:10:54 2014
Last change: Wed Jun 11 13:10:35 2014 via crmd on ha3
Stack: corosync
Current DC: ha3 (168427534) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
4 Resources configured
Online: [ ha3 ha4 ]
ha3_fabric_ping (ocf::pacemaker:ping): Started ha3
ha4_fabric_ping (ocf::pacemaker:ping): Started ha4
fencing_route_to_ha3 (stonith:meatware): Started ha4
fencing_route_to_ha4 (stonith:meatware): Started ha3
Testing
However, when I tested this by starting Pacemaker only on ha3 while also
preventing ha3 from connecting to 10.10.0.1, I found that ha3_fabric_ping
would not start until after ha4 was STONITHed. What I was aiming for was
for ha3_fabric_ping to fail to start, which would prevent the fencing agent
from starting and therefore prevent any STONITH.
Question
Any ideas why this is not working as expected? It's my understanding that
requires="nothing" should allow ha3_fabric_ping to start even before any
fencing operations. Maybe I'm misunderstanding something.
Thanks for any help you can offer.
Below are the software versions, the output of cibadmin -Q, the
/var/log/messages from ha3 during my test, and my corosync.conf file.
Let me know if you need any more information.
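In case it helps with reproducing this, the failing transition should also
be replayable offline from the policy-engine files referenced in the log
below (the pe-warn path comes from those log entries; exact output may
differ by version), for example:

crm_simulate -S -x /var/lib/pacemaker/pengine/pe-warn-80.bz2

which should show the same decision to schedule ha4 for STONITH before
ha3_fabric_ping is started.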
Software Versions (All Compiled From Source From the Respective Projects'
Websites)
Cluster glue 1.0.11
libqb 0.17.0
Corosync 2.3.3
Pacemaker 1.1.11
Resource Agents 3.9.5
crmsh 2.0
cibadmin -Q
<cib epoch="204" num_updates="18" admin_epoch="0"
validate-with="pacemaker-1.2" cib-last-written="Wed Jun 11 12:56:50 2014"
crm_feature_set="3.0.8" update-origin="ha3" update-client="crm_resource"
have-quorum="1" dc-uuid="168427534">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair name="symmetric-cluster" value="true"
id="cib-bootstrap-options-symmetric-cluster"/>
<nvpair name="stonith-enabled" value="true"
id="cib-bootstrap-options-stonith-enabled"/>
<nvpair name="stonith-action" value="reboot"
id="cib-bootstrap-options-stonith-action"/>
<nvpair name="no-quorum-policy" value="ignore"
id="cib-bootstrap-options-no-quorum-policy"/>
<nvpair name="stop-orphan-resources" value="true"
id="cib-bootstrap-options-stop-orphan-resources"/>
<nvpair name="stop-orphan-actions" value="true"
id="cib-bootstrap-options-stop-orphan-actions"/>
<nvpair name="default-action-timeout" value="20s"
id="cib-bootstrap-options-default-action-timeout"/>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-9d39a6b"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="168427534" uname="ha3"/>
<node id="168427535" uname="ha4"/>
</nodes>
<resources>
<primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker"
type="ping">
<instance_attributes id="ha3_fabric_ping-instance_attributes">
<nvpair name="host_list" value="10.10.0.1"
id="ha3_fabric_ping-instance_attributes-host_list"/>
<nvpair name="failure_score" value="1"
id="ha3_fabric_ping-instance_attributes-failure_score"/>
</instance_attributes>
<operations>
<op name="start" timeout="60s" requires="nothing"
on-fail="standby" interval="0" id="ha3_fabric_ping-start-0">
<instance_attributes
id="ha3_fabric_ping-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" interval="15s" requires="nothing"
on-fail="standby" timeout="15s" id="ha3_fabric_ping-monitor-15s">
<instance_attributes
id="ha3_fabric_ping-monitor-15s-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="stop" on-fail="fence" requires="nothing" interval="0"
id="ha3_fabric_ping-stop-0">
<instance_attributes
id="ha3_fabric_ping-stop-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
<meta_attributes id="ha3_fabric_ping-meta_attributes">
<nvpair id="ha3_fabric_ping-meta_attributes-requires"
name="requires" value="nothing"/>
</meta_attributes>
</primitive>
<primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker"
type="ping">
<instance_attributes id="ha4_fabric_ping-instance_attributes">
<nvpair name="host_list" value="10.10.0.1"
id="ha4_fabric_ping-instance_attributes-host_list"/>
<nvpair name="failure_score" value="1"
id="ha4_fabric_ping-instance_attributes-failure_score"/>
</instance_attributes>
<operations>
<op name="start" timeout="60s" requires="nothing"
on-fail="standby" interval="0" id="ha4_fabric_ping-start-0">
<instance_attributes
id="ha4_fabric_ping-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" interval="15s" requires="nothing"
on-fail="standby" timeout="15s" id="ha4_fabric_ping-monitor-15s">
<instance_attributes
id="ha4_fabric_ping-monitor-15s-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="stop" on-fail="fence" requires="nothing" interval="0"
id="ha4_fabric_ping-stop-0">
<instance_attributes
id="ha4_fabric_ping-stop-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
<meta_attributes id="ha4_fabric_ping-meta_attributes">
<nvpair id="ha4_fabric_ping-meta_attributes-requires"
name="requires" value="nothing"/>
</meta_attributes>
</primitive>
<primitive id="fencing_route_to_ha3" class="stonith" type="meatware">
<instance_attributes id="fencing_route_to_ha3-instance_attributes">
<nvpair name="hostlist" value="ha3"
id="fencing_route_to_ha3-instance_attributes-hostlist"/>
</instance_attributes>
<operations>
<op name="start" requires="nothing" interval="0"
id="fencing_route_to_ha3-start-0">
<instance_attributes
id="fencing_route_to_ha3-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" requires="nothing" interval="0"
id="fencing_route_to_ha3-monitor-0">
<instance_attributes
id="fencing_route_to_ha3-monitor-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
</primitive>
<primitive id="fencing_route_to_ha4" class="stonith" type="meatware">
<instance_attributes id="fencing_route_to_ha4-instance_attributes">
<nvpair name="hostlist" value="ha4"
id="fencing_route_to_ha4-instance_attributes-hostlist"/>
</instance_attributes>
<operations>
<op name="start" requires="nothing" interval="0"
id="fencing_route_to_ha4-start-0">
<instance_attributes
id="fencing_route_to_ha4-start-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
<op name="monitor" requires="nothing" interval="0"
id="fencing_route_to_ha4-monitor-0">
<instance_attributes
id="fencing_route_to_ha4-monitor-0-instance_attributes">
<nvpair name="prereq" value="nothing"
id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
</instance_attributes>
</op>
</operations>
</primitive>
</resources>
<constraints>
<rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping"
score="INFINITY" node="ha3"/>
<rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping"
score="-INFINITY" node="ha4"/>
<rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping"
score="INFINITY" node="ha4"/>
<rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping"
score="-INFINITY" node="ha3"/>
<rsc_location id="fencing_route_to_ha4_location"
rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
<rsc_location id="fencing_route_to_ha4_not_location"
rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
<rsc_location id="fencing_route_to_ha3_location"
rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
<rsc_location id="fencing_route_to_ha3_not_location"
rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
<rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4"
score="INFINITY" first="ha3_fabric_ping" first-action="start"
then="fencing_route_to_ha4" then-action="start"/>
<rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3"
score="INFINITY" first="ha4_fabric_ping" first-action="start"
then="fencing_route_to_ha3" then-action="start"/>
</constraints>
<rsc_defaults>
<meta_attributes id="rsc-options">
<nvpair name="resource-stickiness" value="INFINITY"
id="rsc-options-resource-stickiness"/>
<nvpair name="migration-threshold" value="0"
id="rsc-options-migration-threshold"/>
<nvpair name="is-managed" value="true"
id="rsc-options-is-managed"/>
</meta_attributes>
</rsc_defaults>
</configuration>
<status>
<node_state id="168427534" uname="ha3" in_ccm="true" crmd="online"
crm-debug-origin="do_update_resource" join="member" expected="member">
<lrm id="168427534">
<lrm_resources>
<lrm_resource id="ha3_fabric_ping" type="ping" class="ocf"
provider="pacemaker">
<lrm_rsc_op id="ha3_fabric_ping_last_0"
operation_key="ha3_fabric_ping_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:0;4:3:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="19" rc-code="0" op-status="0" interval="0" last-run="1402509661"
last-rc-change="1402509661" exec-time="12" queue-time="0"
op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
<lrm_rsc_op id="ha3_fabric_ping_last_failure_0"
operation_key="ha3_fabric_ping_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:1;4:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="18" rc-code="1" op-status="0" interval="0" last-run="1402509641"
last-rc-change="1402509641" exec-time="20043" queue-time="0"
op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>
</lrm_resource>
<lrm_resource id="ha4_fabric_ping" type="ping" class="ocf"
provider="pacemaker">
<lrm_rsc_op id="ha4_fabric_ping_last_0"
operation_key="ha4_fabric_ping_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:7;5:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="9" rc-code="7" op-status="0" interval="0" last-run="1402509565"
last-rc-change="1402509565" exec-time="10" queue-time="0"
op-digest="91b00b3fe95f23582466d18e42c4fd58"/>
</lrm_resource>
<lrm_resource id="fencing_route_to_ha3" type="meatware"
class="stonith">
<lrm_rsc_op id="fencing_route_to_ha3_last_0"
operation_key="fencing_route_to_ha3_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:7;6:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="13" rc-code="7" op-status="0" interval="0" last-run="1402509565"
last-rc-change="1402509565" exec-time="1" queue-time="0"
op-digest="502fbd7a2366c2be772d7fbecc9e0351"/>
</lrm_resource>
<lrm_resource id="fencing_route_to_ha4" type="meatware"
class="stonith">
<lrm_rsc_op id="fencing_route_to_ha4_last_0"
operation_key="fencing_route_to_ha4_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.8"
transition-key="7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
transition-magic="0:7;7:0:7:0ebf14dc-cfcf-425a-a507-65ed0ee060aa"
call-id="17" rc-code="7" op-status="0" interval="0" last-run="1402509565"
last-rc-change="1402509565" exec-time="0" queue-time="0"
op-digest="5be26fbcfd648e3d545d0115645dde76"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="168427534">
<instance_attributes id="status-168427534">
<nvpair id="status-168427534-shutdown" name="shutdown"
value="0"/>
<nvpair id="status-168427534-probe_complete"
name="probe_complete" value="true"/>
<nvpair id="status-168427534-fail-count-ha3_fabric_ping"
name="fail-count-ha3_fabric_ping" value="INFINITY"/>
<nvpair id="status-168427534-last-failure-ha3_fabric_ping"
name="last-failure-ha3_fabric_ping" value="1402509661"/>
</instance_attributes>
</transient_attributes>
</node_state>
<node_state id="168427535" in_ccm="false" crmd="offline" join="down"
crm-debug-origin="send_stonith_update" uname="ha4" expected="down"/>
</status>
</cib>
[root@ha3 ~]#
/var/log/messages from when pacemaker started on ha3 to when
ha3_fabric_ping failed.
Jun 11 12:59:01 ha3 systemd: Starting LSB: Starts and stops Pacemaker
Cluster Manager....
Jun 11 12:59:01 ha3 pacemaker: Starting Pacemaker Cluster Manager
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: mcp_read_config: Configured
corosync to accept connections from group 1000: OK (1)
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: main: Starting Pacemaker
1.1.10 (Build: 9d39a6b): agent-manpages ncurses libqb-logging libqb-ipc
lha-fencing nagios corosync-native libesmtp
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427534
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: cluster_connect_quorum:
Quorum acquired
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node ha3[168427534] - state is now member (was
(null))
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to
get node name for nodeid 168427535
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427535
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to
get node name for nodeid 168427535
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: corosync_node_name: Unable to
get node name for nodeid 168427535
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427535
Jun 11 12:59:01 ha3 pacemakerd[5007]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node (null)[168427535] - state is now member (was
(null))
Jun 11 12:59:02 ha3 pengine[5013]: warning:
crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by
group haclient
Jun 11 12:59:02 ha3 cib[5009]: warning:
crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group
haclient
Jun 11 12:59:02 ha3 cib[5009]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: corosync
Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jun 11 12:59:02 ha3 crmd[5014]: notice: main: CRM Git Version: 9d39a6b
Jun 11 12:59:02 ha3 crmd[5014]: warning:
crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by
group haclient
Jun 11 12:59:02 ha3 crmd[5014]: warning:
crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group
haclient
Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: corosync
Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
Jun 11 12:59:02 ha3 attrd[5012]: notice: crm_update_peer_state:
attrd_peer_change_cb: Node (null)[168427534] - state is now member (was
(null))
Jun 11 12:59:02 ha3 attrd[5012]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:02 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Could not
obtain a node name for corosync nodeid 168427534
Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
Jun 11 12:59:02 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
Jun 11 12:59:02 ha3 cib[5009]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:02 ha3 cib[5009]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: corosync
Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427534
Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: setup_cib: Watching for
stonith topology changes
Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 12:59:03 ha3 crmd[5014]: notice: cluster_connect_quorum: Quorum
acquired
Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node ha3[168427534] - state is now member (was
(null))
Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427535
Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427535
Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a
node name for corosync nodeid 168427535
Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node (null)[168427535] - state is now member (was
(null))
Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 12:59:03 ha3 crmd[5014]: notice: do_started: The local CRM is
operational
Jun 11 12:59:03 ha3 crmd[5014]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL
origin=do_started ]
Jun 11 12:59:04 ha3 stonith-ng[5010]: notice: stonith_device_register:
Added 'fencing_route_to_ha4' to the device list (1 active devices)
Jun 11 12:59:06 ha3 pacemaker: Starting Pacemaker Cluster Manager[ OK ]
Jun 11 12:59:06 ha3 systemd: Started LSB: Starts and stops Pacemaker
Cluster Manager..
Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_DC_TIMEOUT
from crm_timer_popped() received in state S_PENDING
Jun 11 12:59:24 ha3 crmd[5014]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_ELECTION_DC
from do_election_check() received in state S_INTEGRATION
Jun 11 12:59:24 ha3 cib[5009]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:24 ha3 cib[5009]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 12:59:24 ha3 attrd[5012]: notice: corosync_node_name: Unable to get
node name for nodeid 168427534
Jun 11 12:59:24 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname
-n for the local corosync node name
Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 2
with 1 changes for terminate, id=<n/a>, set=(null)
Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 3
with 1 changes for shutdown, id=<n/a>, set=(null)
Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 2 for
terminate[ha3]=(null): OK (0)
Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 3 for
shutdown[ha3]=0: OK (0)
Jun 11 12:59:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 12:59:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for
STONITH
Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 12:59:25 ha3 pengine[5013]: warning: process_pe_message: Calculated
Transition 0: /var/lib/pacemaker/pengine/pe-warn-80.bz2
Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
Jun 11 12:59:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot
fencing operation (12) on ha4 (timeout=60000)
Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: handle_request: Client
crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
b3ab6141-9612-4024-82b2-350e74bbb33d (0)
Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to
get node name for nodeid 168427534
Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to
uname -n for the local corosync node name
Jun 11 12:59:25 ha3 stonith: [5027]: info: parse config info info=ha4
Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 12:59:25 ha3 stonith: [5031]: info: parse config info info=ha4
Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not
running
Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not
running
Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
3: probe_complete probe_complete on ha3 (local) - no waiting
Jun 11 12:59:25 ha3 attrd[5012]: notice: write_attribute: Sent update 4
with 1 changes for probe_complete, id=<n/a>, set=(null)
Jun 11 12:59:25 ha3 attrd[5012]: notice: attrd_cib_callback: Update 4 for
probe_complete[ha3]=true: OK (0)
Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_action_async_done:
Child process 5030 performing action 'reboot' timed out with signal 15
Jun 11 13:00:25 ha3 stonith-ng[5010]: error: log_operation: Operation
'reboot' [5030] (call 2 from crmd.5014) for host 'ha4' with device
'fencing_route_to_ha4' returned: -62 (Timer expired)
Jun 11 13:00:25 ha3 stonith-ng[5010]: warning: log_operation:
fencing_route_to_ha4:5030 [ Performing: stonith -t meatware -T reset ha4 ]
Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_choose_peer: Couldn't
find anyone to fence ha4 with <any>
Jun 11 13:00:25 ha3 stonith-ng[5010]: error: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.5014@ha3.b3ab6141: No route to host
Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith
operation 2/12:0:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: No route to host
(-113)
Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith
operation 2 for ha4 failed (No route to host): aborting transition.
Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4
was not terminated (reboot) by ha3 for ha3: No route to host
(ref=b3ab6141-9612-4024-82b2-350e74bbb33d) by client crmd.5014
Jun 11 13:00:25 ha3 crmd[5014]: notice: run_graph: Transition 0
(Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-80.bz2): Stopped
Jun 11 13:00:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 13:00:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for
STONITH
Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start
ha3_fabric_ping (ha3)
Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start
fencing_route_to_ha4 (ha3)
Jun 11 13:00:25 ha3 pengine[5013]: warning: process_pe_message: Calculated
Transition 1: /var/lib/pacemaker/pengine/pe-warn-81.bz2
Jun 11 13:00:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot
fencing operation (8) on ha4 (timeout=60000)
Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: handle_request: Client
crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op:
Initiating remote operation reboot for ha4:
eae78d4c-8d80-47fe-93e9-1a9261ec38a4 (0)
Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device:
fencing_route_to_ha4 can fence ha4: dynamic-list
Jun 11 13:00:25 ha3 stonith: [5057]: info: parse config info info=ha4
Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: OPERATOR INTERVENTION REQUIRED
to reset ha4.
Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: Run "meatclient -c ha4" AFTER
power-cycling the machine.
Jun 11 13:00:41 ha3 stonith: [5057]: info: node Meatware-reset: ha4
Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: log_operation: Operation
'reboot' [5056] (call 3 from crmd.5014) for host 'ha4' with device
'fencing_route_to_ha4' returned: 0 (OK)
Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: remote_op_done: Operation
reboot of ha4 by ha3 for crmd.5014@ha3.eae78d4c: OK
Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith
operation 3/8:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: OK (0)
Jun 11 13:00:41 ha3 crmd[5014]: notice: crm_update_peer_state:
send_stonith_update: Node ha4[0] - state is now lost (was (null))
Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4
was terminated (reboot) by ha3 for ha3: OK
(ref=eae78d4c-8d80-47fe-93e9-1a9261ec38a4) by client crmd.5014
Jun 11 13:00:41 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
4: start ha3_fabric_ping_start_0 on ha3 (local)
Jun 11 13:01:01 ha3 systemd: Starting Session 22 of user root.
Jun 11 13:01:01 ha3 systemd: Started Session 22 of user root.
Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 5
with 1 changes for pingd, id=<n/a>, set=(null)
Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 5 for
pingd[ha3]=0: OK (0)
Jun 11 13:01:01 ha3 ping(ha3_fabric_ping)[5060]: WARNING: pingd is less
than failure_score(1)
Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true)
unknown error
Jun 11 13:01:01 ha3 crmd[5014]: warning: status_from_rc: Action 4
(ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402509661)
Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating
failcount for ha3_fabric_ping on ha3 after failed start: rc=1
(update=INFINITY, time=1402509661)
Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 1
(Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-81.bz2): Stopped
Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 6
with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 7
with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop
ha3_fabric_ping (ha3)
Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated
Transition 2: /var/lib/pacemaker/pengine/pe-input-304.bz2
Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 6 for
fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 7 for
last-failure-ha3_fabric_ping[ha3]=1402509661: OK (0)
Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop
ha3_fabric_ping (ha3)
Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated
Transition 3: /var/lib/pacemaker/pengine/pe-input-305.bz2
Jun 11 13:01:01 ha3 crmd[5014]: notice: te_rsc_command: Initiating action
4: stop ha3_fabric_ping_stop_0 on ha3 (local)
Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation
ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok
Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 3
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-305.bz2): Complete
Jun 11 13:01:01 ha3 crmd[5014]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 11 13:01:06 ha3 attrd[5012]: notice: write_attribute: Sent update 8
with 1 changes for pingd, id=<n/a>, set=(null)
Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jun 11 13:01:06 ha3 pengine[5013]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 11 13:01:06 ha3 pengine[5013]: warning: unpack_rsc_op_failure:
Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
Jun 11 13:01:06 ha3 pengine[5013]: notice: process_pe_message: Calculated
Transition 4: /var/lib/pacemaker/pengine/pe-input-306.bz2
Jun 11 13:01:06 ha3 crmd[5014]: notice: run_graph: Transition 4
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-306.bz2): Complete
Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 11 13:01:06 ha3 attrd[5012]: notice: attrd_cib_callback: Update 8 for
pingd[ha3]=(null): OK (0)
/etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
totem {
version: 2
crypto_cipher: none
crypto_hash: none
interface {
ringnumber: 0
bindnetaddr: 10.10.0.0
mcastport: 5405
ttl: 1
}
transport: udpu
}
logging {
fileline: off
to_logfile: no
to_syslog: yes
#logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: QUORUM
debug: off
}
}
nodelist {
node {
ring0_addr: 10.10.0.14
}
node {
ring0_addr: 10.10.0.15
}
}
quorum {
# Enable and configure quorum subsystem (default: off)
# see also corosync.conf.5 and votequorum.5
provider: corosync_votequorum
expected_votes: 2
}
[root@ha3 ~]#
Paul Cain