[Pacemaker] Problem configuring Heartbeat with CRM : Abnormal Failover test results
Deneux Olivier
odeneux at oxya.com
Fri Jul 29 14:14:31 CET 2011
Hello,
First of all, please excuse my approximate English.
I'm facing a problem configuring a simple two-node cluster with one
resource (a virtual IP).
I've read a lot of threads and documentation, but I haven't found the answer.
I have to say the cluster world is pretty new to me...
I've installed the following packages on my two RHEL (4.1.2-48) Linux servers:
cluster-glue-1.0.5-1.el5.x86_64.rpm
cluster-glue-libs-1.0.5-1.el5.x86_64.rpm
corosync-1.2.5-1.3.el5.x86_64.rpm
corosynclib-1.2.5-1.3.el5.x86_64.rpm
heartbeat-3.0.3-2.el5.x86_64.rpm
heartbeat-libs-3.0.3-2.el5.x86_64.rpm
libesmtp-1.0.4-5.el5.x86_64.rpm
pacemaker-1.0.9.1-1.el5.x86_64.rpm
pacemaker-libs-1.0.9.1-1.el5.x86_64.rpm
resource-agents-1.0.3-2.el5.x86_64.rpm
(corosync is not running; it seems I don't need it)
Below is the ha.cf of node 1:
autojoin none
keepalive 2
deadtime 10
initdead 80
udpport 694
ucast bond0 <@IP node2>
auto_failback off
node node1
node node2
use_logd yes
crm yes
Below is the ha.cf of node 2:
autojoin none
keepalive 2
deadtime 10
initdead 80
udpport 694
ucast bond0 <@IP node1>
auto_failback off
node node1
node node2
use_logd yes
crm yes
I used crm to configure the cluster; here is the cib.xml file:
<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1"
have-quorum="1" admin_epoch="0" epoch="190"
dc-uuid="85f5f8dc-6ccf-4478-8a89-a3d7c952c0e4" num_updates="0"
cib-last-written="Fri Jul 29 14:18:28 2011">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.0.9-89bd754939df5150de7cd76835f98fe90851b677"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="Heartbeat"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh"
name="last-lrm-refresh" value="1311941556"/>
</cluster_property_set>
</crm_config>
<nodes>
<node type="normal" uname="node2" id="85f5f8dc-6ccf-4478-8a89-a3d7c952c0e4">
<instance_attributes id="nodes-85f5f8dc-6ccf-4478-8a89-a3d7c952c0e4">
<nvpair name="standby"
id="nodes-85f5f8dc-6ccf-4478-8a89-a3d7c952c0e4-standby" value="off"/>
</instance_attributes>
</node>
<node id="813121d2-360b-4532-8883-7f1330ed2c39" type="normal" uname="node1">
<instance_attributes id="nodes-813121d2-360b-4532-8883-7f1330ed2c39">
<nvpair id="nodes-813121d2-360b-4532-8883-7f1330ed2c39-standby"
name="standby" value="off"/>
</instance_attributes>
</node>
</nodes>
<resources>
<primitive class="ocf" id="ClusterIP" provider="heartbeat" type="IPaddr2">
<instance_attributes id="ClusterIP-instance_attributes">
<nvpair id="ClusterIP-instance_attributes-ip" name="ip" value="<@IP Virtual>"/>
<nvpair id="ClusterIP-instance_attributes-cidr_netmask"
name="cidr_netmask" value="32"/>
</instance_attributes>
<operations>
<op id="ClusterIP-monitor-30s" interval="30s" name="monitor"/>
</operations>
<meta_attributes id="ClusterIP-meta_attributes">
<nvpair id="ClusterIP-meta_attributes-target-role" name="target-role"
value="Started"/>
</meta_attributes>
</primitive>
</resources>
<constraints/>
<rsc_defaults/>
<op_defaults/>
</configuration>
</cib>
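For reference, the CIB above corresponds roughly to the following crm shell commands I ran (the virtual IP is left as a placeholder, as above):

```shell
# Disable STONITH (test setup only) and define the virtual-IP resource
crm configure property stonith-enabled=false
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=<@IP Virtual> cidr_netmask=32 \
    op monitor interval=30s
```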
The Heartbeat daemon starts fine on both nodes; here is the output of crm_mon:
============
Last updated: Fri Jul 29 14:49:47 2011
Stack: Heartbeat
Current DC: node1 (813121d2-360b-4532-8883-7f1330ed2c39) - partition with
quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ node2 node1 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started node2
To test that everything works, I launch a script that stops the network
on node2, waits 50 s, and then starts the network again.
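The test script is essentially the following (run on node2; `service network` is the RHEL network init script):

```shell
#!/bin/sh
# Failover test: take node2's network down, wait, bring it back.
service network stop     # node2 disappears from the cluster
sleep 50                 # well beyond deadtime (10 s)
service network start    # node2 rejoins
```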
When the network goes down on node2, the resource migrates to node1 as
expected.
But when the network is back up, the resource does not move back
to node2 (it should, as there is no stickiness option defined yet).
I see the following error in crm_mon:
============
Last updated: Fri Jul 29 14:52:15 2011
Stack: Heartbeat
Current DC: node2 (85f5f8dc-6ccf-4478-8a89-a3d7c952c0e4) - partition with
quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ node1 node2 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started node1
Failed actions:
ClusterIP_start_0 (node=node2, call=6, rc=2, status=complete):
invalid parameter
The behaviour is the same if I move the resource to node1 and stop/start
the network on node1.
Why do I get this "invalid parameter" error?
Here is an extract from the ha-log:
Jul 29 14:48:30 node2 pengine: [30752]: ERROR: unpack_rsc_op: Hard error
- ClusterIP_start_0 failed with rc=2: Preventing ClusterIP from
re-starting on node2
Jul 29 14:48:30 node2 pengine: [30752]: WARN: unpack_rsc_op: Processing
failed op ClusterIP_start_0 on node2: invalid parameter (2)
Jul 29 14:48:30 node2 pengine: [30752]: notice: native_print:
ClusterIP (ocf::heartbeat:IPaddr2): Started node1
Jul 29 14:48:30 node2 pengine: [30752]: info: get_failcount: ClusterIP
has failed INFINITY times on node2
Jul 29 14:48:30 node2 pengine: [30752]: WARN: common_apply_stickiness:
Forcing ClusterIP away from node2 after 1000000 failures (max=1000000)
Jul 29 14:48:30 node2 pengine: [30752]: notice: LogActions: Leave
resource ClusterIP (Started node1)
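From what I've read, once the fail count reaches INFINITY the resource is banned from that node until the failure is cleaned up, so I assume something like this would be needed before it can move back (is that right?):

```shell
# Clear the failed start action and reset ClusterIP's fail count
crm resource cleanup ClusterIP
```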
If you need more info, please ask!
Thanks in advance,
Olivier