[Pacemaker] standby attribute and same resources running at the same time
Leon Fauster
leonfauster at googlemail.com
Mon Mar 4 17:20:41 UTC 2013
Dear list,
apologies if this is trivial: I have only just started deploying an HA environment
in a test lab, so I do not have much experience yet.
I started by setting up a 2-node cluster
corosync-1.4.1-15.el6.x86_64
pacemaker-1.1.8-7.el6.x86_64
cman-3.0.12.1-49.el6.x86_64
on RHEL 6.3 and then switched to RHEL 6.4.
This update brings some differences: the crm shell is gone and pcs has been added.
Anyway, I found equivalent commands to set up and configure resources.
So far, so good. I am now doing some stress tests and noticed that when I reboot
one node (n2), that node is marked as standby in the CIB (as shown on the
other node, n1).
After the reboot, crm_mon on n2 shows the other node (n1) as offline, and n2
begins to start the resources, while n1, which was not rebooted, still shows n2
as standby. At that point both nodes are running the "same"
resources. After a couple of minutes the cluster notices the situation and both nodes
renegotiate the current state. Then one node takes over responsibility for providing
the resources. On both nodes the previously rebooted node is still listed as standby.
cat /var/log/messages |grep error
Mar 4 17:32:33 cn1 pengine[1378]: error: native_create_actions: Resource resIP (ocf::IPaddr2) is active on 2 nodes attempting recovery
Mar 4 17:32:33 cn1 pengine[1378]: error: native_create_actions: Resource resApache (ocf::apache) is active on 2 nodes attempting recovery
Mar 4 17:32:33 cn1 pengine[1378]: error: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-error-6.bz2
Mar 4 17:32:48 cn1 crmd[1379]: notice: run_graph: Transition 1 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-6.bz2): Complete
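To pull just the affected resources out of such log lines, a small filter along the following lines might help. This is only a sketch: the function name active_on_multiple_nodes is made up for illustration, and the pattern assumes the pengine message format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: print "<resource> <node count>" for every pengine
# error of the form "Resource X (...) is active on N nodes".
# Reads syslog-style lines on stdin.
active_on_multiple_nodes() {
    sed -n 's/.*Resource \([^ ]*\) (.*) is active on \([0-9][0-9]*\) nodes.*/\1 \2/p'
}

# Sample input copied from the log excerpt above; on a real node one would
# feed it /var/log/messages instead.
printf '%s\n' \
  'Mar 4 17:32:33 cn1 pengine[1378]: error: native_create_actions: Resource resIP (ocf::IPaddr2) is active on 2 nodes attempting recovery' \
  'Mar 4 17:32:33 cn1 pengine[1378]: error: native_create_actions: Resource resApache (ocf::apache) is active on 2 nodes attempting recovery' \
  'Mar 4 17:32:33 cn1 pengine[1378]: error: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-error-6.bz2' \
  | active_on_multiple_nodes
```

With the sample lines this prints one line per resource (resIP and resApache), each followed by the node count 2.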
crm_mon -1
Last updated: Mon Mar 4 17:49:08 2013
Last change: Mon Mar 4 10:22:53 2013 via crm_resource on cn1.localdomain
Stack: cman
Current DC: cn1.localdomain - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Node cn2.localdomain: standby
Online: [ cn1.localdomain ]
resIP (ocf::heartbeat:IPaddr2): Started cn1.localdomain
resApache (ocf::heartbeat:apache): Started cn1.localdomain
I checked the init scripts and found that the standby "behavior" comes
from a function that is called on "service pacemaker stop" (added in RHEL 6.4).
cman_pre_stop()
{
    cname=`crm_node --name`
    crm_attribute -N $cname -n standby -v true -l reboot
    echo -n "Waiting for shutdown of managed resources"
...
I could not delete the standby attribute with
crm_attribute -G --node=cn2.localdomain -n standby
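For reference, crm_attribute's -G option only queries a value; deletion is done with -D. Since the init script sets the attribute with "-l reboot", a delete would presumably need the same lifetime. The sketch below only prints the two command lines rather than running them; the helper name standby_cmd is made up for illustration, and the commands have not been tested on this cluster.

```shell
#!/bin/sh
# Hypothetical helper: build the crm_attribute command line for a node's
# standby attribute. action "query" maps to -G, "delete" maps to -D.
standby_cmd() {
    node=$1
    action=$2
    case "$action" in
        query)  flag=-G ;;
        delete) flag=-D ;;
        *) echo "usage: standby_cmd NODE query|delete" >&2; return 1 ;;
    esac
    # -l reboot matches the lifetime used by cman_pre_stop in the init
    # script (assumption: the attribute must be addressed the same way).
    echo "crm_attribute -N $node -n standby -l reboot $flag"
}

# Print the commands; nothing is executed against the cluster here.
standby_cmd cn2.localdomain query
standby_cmd cn2.localdomain delete
```

Running the printed delete command on a cluster node, and then checking crm_mon -1, would show whether cn2 leaves standby.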
Okay, to recap:
1st: there is a delay after rebooting during which the two nodes do not see each
other, and as a result the resources run on both
nodes although they should run on only one. The cluster corrects this
by itself, but the situation should not happen in the first place.
2nd: the standby attribute (and there must be a reason why Red Hat
added this) prevents resources from migrating to that node. How
do I delete this attribute?
I appreciate any comments.
--
Leon
A. $ cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="HA" config_version="5">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="cn1.localdomain" votes="1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cn1.localdomain"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cn2.localdomain" votes="1" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cn2.localdomain"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
B. $ pcs config
Corosync Nodes:
Pacemaker Nodes:
 cn1.localdomain cn2.localdomain
Resources:
 Resource: resIP (provider=heartbeat type=IPaddr2 class=ocf)
  Attributes: ip=192.168.201.220 nic=eth0 cidr_netmask=24
  Operations: monitor interval=30s
 Resource: resApache (provider=heartbeat type=apache class=ocf)
  Attributes: httpd=/usr/sbin/httpd configfile=/etc/httpd/conf/httpd.conf
  Operations: monitor interval=1min
Location Constraints:
Ordering Constraints:
 start resApache then start resIP
Colocation Constraints:
 resIP with resApache
Cluster Properties:
 dc-version: 1.1.8-7.el6-394e906
 cluster-infrastructure: cman
 expected-quorum-votes: 2
 stonith-enabled: false
 no-quorum-policy: ignore