[Pacemaker] Duplicate node after corosync / pacemaker upgrade
Mistina Michal
Michal.Mistina@virte.sk
Tue Aug 13 07:42:15 UTC 2013
Hi.
I tried to set things up according to Andrew's third suggestion in
http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/ .
Everyone Talks to Corosync 2.0
Requirements:
- Filesystems supported: GFS2
- Corosync: 2.x
- Pacemaker: 1.1.7 or later
- Other: none
I'm running RHEL 6.3. I have a 2-node cluster without fencing; the nodes
are VMware virtual machines.
I had previously installed corosync and pacemaker using yum, so the old
versions were installed (corosync 1.4.x, pacemaker 1.1.7). Everything worked
after I created the config files. I used crmsh.
Then I compiled and installed the latest versions from git source:
- libqb
- resource-agents
- corosync
- pacemaker
Pacemaker was compiled with ./configure --without-cman --without-heartbeat.
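For reference, the build went roughly like this on each node, with the same
pattern for libqb, resource-agents and corosync (a sketch from memory; the
checkout tags and exact packaging step may differ, and the --without-* flags
were used only for pacemaker):

git clone https://github.com/ClusterLabs/pacemaker.git
cd pacemaker
./autogen.sh
./configure --without-cman --without-heartbeat
make
make install    # I actually packaged and installed RPMs, see rpm -qa below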
Now everything new is installed:
[root@tjtcaps01 ~]# rpm -qa | grep libqb
libqb-devel-0.16.0-1.el6.x86_64
libqb-0.16.0-1.el6.x86_64
[root@tjtcaps01 ~]# rpm -qa | grep resource-agents
resource-agents-3.9.5-1.158.1a87e.el6.x86_64
[root@tjtcaps01 ~]# rpm -qa | grep corosync
corosync-2.3.1-1.el6.x86_64
corosynclib-2.3.1-1.el6.x86_64
corosynclib-devel-2.3.1-1.el6.x86_64
[root@tjtcaps01 ~]# rpm -qa | grep pacemaker
pacemaker-cts-1.1.10-1.el6.x86_64
pacemaker-remote-1.1.10-1.el6.x86_64
pacemaker-cli-1.1.10-1.el6.x86_64
pacemaker-1.1.10-1.el6.x86_64
drbd-pacemaker-8.4.3-2.el6.x86_64
pacemaker-libs-1.1.10-1.el6.x86_64
pacemaker-libs-devel-1.1.10-1.el6.x86_64
pacemaker-doc-1.1.10-1.el6.x86_64
pacemaker-cluster-libs-1.1.10-1.el6.x86_64
What I had left were the configuration files. The CIB was not adjusted. The
corosync config (/etc/corosync/corosync.conf) was altered like this (the
commented lines are from the previous version):
[root@tjtcaps01 ~]# cat /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.105.0
        mcastaddr: 226.95.1.1
        mcastport: 4000
        ttl: 1
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 1
    two_node: 1
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

#amf {
#    mode: disabled
#}

#aisexec {
#    user: root
#    group: root
#}

#service {
#    # Load the Pacemaker Cluster Resource Manager
#    name: pacemaker
#    ver: 0
#}
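Note that I did not add a nodelist section. If explicit node definitions are
needed with corosync 2.x, I assume they would look something like the
following (the nodeids here are my own choice; the addresses are the nodes'
ring0 IPs):

nodelist {
    node {
        ring0_addr: 192.168.105.248
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.105.249
        nodeid: 2
    }
}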
I started the services on both nodes in the following manner:
1. service corosync start
2. service pacemaker start
Then I checked the status with pcs:
[root@tjtcaps01 ~]# pcs status
Cluster name:
Last updated: Tue Aug 13 09:24:24 2013
Last change: Tue Aug 13 09:24:21 2013 via cibadmin on tjtcaps01
Stack: corosync
Current DC: tjtcaps01 (3232262648) - partition with quorum
Version: 1.1.10-1.el6-368c726
4 Nodes configured
6 Resources configured

Online: [ tjtcaps01 tjtcaps02 ]
OFFLINE: [ tjtcaps01 tjtcaps02 ]

Full list of resources:

 Resource Group: PGServer
     pg_lvm   (ocf::heartbeat:LVM):          Started tjtcaps01
     pg_fs    (ocf::heartbeat:Filesystem):   Started tjtcaps01
     pg_lsb   (lsb:postgresql-9.2):          Started tjtcaps01
     pg_vip   (ocf::heartbeat:IPaddr2):      Started tjtcaps01
 Master/Slave Set: ms_drbd_pg [drbd_pg]
     Masters: [ tjtcaps01 ]
     Slaves: [ tjtcaps02 ]

PCSD Status:
  192.168.105.248: Offline
  192.168.105.249: Offline
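If it helps, I can also post what corosync itself reports about membership;
as far as I know the corosync 2.x commands for that are:

# quorum and membership as corosync sees it
corosync-quorumtool -s
# raw member list from the cmap runtime keys
corosync-cmapctl | grep runtime.totem.pg.mrp.srp.members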
Here are my questions:
1. Was installing Corosync 2.x + Pacemaker 1.1.10 the right path to take
even though I am using RHEL? Suggestion 3 on the aforementioned blog seemed
nicer to me than option 2 (Everyone Talks to CMAN).
2. Why did the duplicate nodes appear? Did corosync import those two
additional nodes into the CIB, or did pacemaker automatically add a new
definition of the same nodes to the CIB? (See also my note on node IDs right
after this list.)
3. Which node entries should I delete from the CIB? (Please see the CIB
query later in this mail, and my cibadmin guess after it.)
   a. This definition? <node id="tjtcaps01" type="normal"
uname="tjtcaps01"/>
   b. Or this definition? <node id="3232262648" uname="tjtcaps01"/>
4. Why does pcs status show "PCSD Status: ... Offline"? If this is not the
right place to ask about corosync, I will contact the corosync mailing list.
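A side note that may be relevant to questions 2 and 3: the numeric node IDs
look like the decimal form of the nodes' IP addresses. As far as I
understand, corosync 2.x derives the nodeid from the ring0 address when no
nodelist with explicit nodeids is configured. A quick check in the shell:

# 192.168.105.248 = 0xC0A869F8, 192.168.105.249 = 0xC0A869F9
printf '%d\n' 0xC0A869F8    # -> 3232262648
printf '%d\n' 0xC0A869F9    # -> 3232262649

So, if my reading is right, <node id="3232262648" uname="tjtcaps01"/> was
created by the new stack, while <node id="tjtcaps01" .../> is from the old
one.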
[root@tjtcaps01 ~]# cibadmin -Ql
<cib epoch="26" num_updates="11" admin_epoch="0"
validate-with="pacemaker-1.2" crm_feature_set="3.0.7"
update-origin="tjtcaps01" update-client="cibadmin" cib-last-written="Tue Aug
13 09:24:21 2013" have-quorum="1" dc-uuid="3232262648">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-1.el6-368c726"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes"
name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-maintenance-mode"
name="maintenance-mode" value="false"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="tjtcaps01" type="normal" uname="tjtcaps01"/>
<node id="tjtcaps02" type="normal" uname="tjtcaps02"/>
<node id="3232262648" uname="tjtcaps01"/>
<node id="3232262649" uname="tjtcaps02"/>
</nodes>
<resources>
<group id="PGServer">
<primitive class="ocf" id="pg_lvm" provider="heartbeat" type="LVM">
<instance_attributes id="pg_lvm-instance_attributes">
<nvpair id="pg_lvm-instance_attributes-volgrpname"
name="volgrpname" value="vg_drbd"/>
</instance_attributes>
<operations>
<op id="pg_lvm-start-0" interval="0" name="start" timeout="30"/>
<op id="pg_lvm-stop-0" interval="0" name="stop" timeout="30"/>
</operations>
</primitive>
<primitive class="ocf" id="pg_fs" provider="heartbeat"
type="Filesystem">
<instance_attributes id="pg_fs-instance_attributes">
<nvpair id="pg_fs-instance_attributes-device" name="device"
value="/dev/vg_drbd/lv_pgsql"/>
<nvpair id="pg_fs-instance_attributes-directory"
name="directory" value="/var/lib/pgsql/9.2/data"/>
<nvpair id="pg_fs-instance_attributes-options" name="options"
value="noatime,nodiratime"/>
<nvpair id="pg_fs-instance_attributes-fstype" name="fstype"
value="xfs"/>
</instance_attributes>
<operations>
<op id="pg_fs-start-0" interval="0" name="start" timeout="60"/>
<op id="pg_fs-stop-0" interval="0" name="stop" timeout="120"/>
</operations>
</primitive>
<primitive class="lsb" id="pg_lsb" type="postgresql-9.2">
<operations>
<op id="pg_lsb-monitor-30" interval="30" name="monitor"
timeout="60"/>
<op id="pg_lsb-start-0" interval="0" name="start" timeout="60"/>
<op id="pg_lsb-stop-0" interval="0" name="stop" timeout="60"/>
</operations>
</primitive>
<primitive class="ocf" id="pg_vip" provider="heartbeat"
type="IPaddr2">
<instance_attributes id="pg_vip-instance_attributes">
<nvpair id="pg_vip-instance_attributes-ip" name="ip"
value="192.168.105.252"/>
<nvpair id="pg_vip-instance_attributes-iflabel" name="iflabel"
value="tjtcapsvip"/>
</instance_attributes>
<operations>
<op id="pg_vip-monitor-5" interval="5" name="monitor"/>
</operations>
</primitive>
</group>
<master id="ms_drbd_pg">
<meta_attributes id="ms_drbd_pg-meta_attributes">
<nvpair id="ms_drbd_pg-meta_attributes-master-max"
name="master-max" value="1"/>
<nvpair id="ms_drbd_pg-meta_attributes-master-node-max"
name="master-node-max" value="1"/>
<nvpair id="ms_drbd_pg-meta_attributes-clone-max" name="clone-max"
value="2"/>
<nvpair id="ms_drbd_pg-meta_attributes-clone-node-max"
name="clone-node-max" value="1"/>
<nvpair id="ms_drbd_pg-meta_attributes-notify" name="notify"
value="true"/>
</meta_attributes>
<primitive class="ocf" id="drbd_pg" provider="linbit" type="drbd">
<instance_attributes id="drbd_pg-instance_attributes">
<nvpair id="drbd_pg-instance_attributes-drbd_resource"
name="drbd_resource" value="postgres"/>
</instance_attributes>
<operations>
<op id="drbd_pg-monitor-15" interval="15" name="monitor"
role="Master"/>
<op id="drbd_pg-monitor-16" interval="16" name="monitor"
role="Slave"/>
<op id="drbd_pg-start-0" interval="0" name="start"
timeout="240"/>
<op id="drbd_pg-stop-0" interval="0" name="stop" timeout="120"/>
</operations>
</primitive>
</master>
</resources>
<constraints>
<rsc_location id="master-prefer-node1" node="tjtcaps01" rsc="pg_vip"
score="50"/>
<rsc_colocation id="col_pg_drbd" rsc="PGServer" score="INFINITY"
with-rsc="ms_drbd_pg" with-rsc-role="Master"/>
<rsc_order first="ms_drbd_pg" first-action="promote" id="ord_pg"
score="INFINITY" then="PGServer" then-action="start"/>
<rsc_location id="cli-prefer-PGServer" rsc="PGServer" node="tjtcaps01"
score="INFINITY"/>
</constraints>
<rsc_defaults>
<meta_attributes id="rsc-options">
<nvpair id="rsc-options-resource-stickiness"
name="resource-stickiness" value="100"/>
</meta_attributes>
</rsc_defaults>
</configuration>
<status>
<node_state id="3232262648" uname="tjtcaps01" in_ccm="true"
crmd="online" crm-debug-origin="do_update_resource" join="member"
expected="member">
<transient_attributes id="3232262648">
<instance_attributes id="status-3232262648">
<nvpair id="status-3232262648-master-drbd_pg"
name="master-drbd_pg" value="10000"/>
<nvpair id="status-3232262648-probe_complete"
name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
<lrm id="3232262648">
<lrm_resources>
<lrm_resource id="pg_vip" type="IPaddr2" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pg_vip_last_failure_0"
operation_key="pg_vip_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
transition-key="6:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;6:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="17" rc-code="0" op-status="0" interval="0" last-run="1376320020"
last-rc-change="1376320020" exec-time="257" queue-time="6"
op-digest="a0e257157c7f43b5dbaea731697d31ca"/>
<lrm_rsc_op id="pg_vip_monitor_5000"
operation_key="pg_vip_monitor_5000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
transition-key="14:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;14:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="31" rc-code="0" op-status="0" interval="5000"
last-rc-change="1376378662" exec-time="145" queue-time="0"
op-digest="ae3a464c19aa8cd5b27dfe56422f45f1"/>
</lrm_resource>
<lrm_resource id="pg_lvm" type="LVM" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pg_lvm_last_failure_0"
operation_key="pg_lvm_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
transition-key="3:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;3:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="5" rc-code="0" op-status="0" interval="0" last-run="1376320020"
last-rc-change="1376320020" exec-time="147" queue-time="0"
op-digest="8d4be0b9171df5ac3b4484891fe0f160"/>
</lrm_resource>
<lrm_resource id="pg_lsb" type="postgresql-9.2" class="lsb">
<lrm_rsc_op id="pg_lsb_last_failure_0"
operation_key="pg_lsb_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
transition-key="5:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;5:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="13" rc-code="0" op-status="0" interval="0" last-run="1376320020"
last-rc-change="1376320020" exec-time="92" queue-time="0"
op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
<lrm_rsc_op id="pg_lsb_monitor_30000"
operation_key="pg_lsb_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
transition-key="11:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;11:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="29" rc-code="0" op-status="0" interval="30000"
last-rc-change="1376378662" exec-time="66" queue-time="0"
op-digest="873ed4f07792aa8ff18f3254244675ea"/>
</lrm_resource>
<lrm_resource id="pg_fs" type="Filesystem" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pg_fs_last_failure_0"
operation_key="pg_fs_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
transition-key="4:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;4:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="9" rc-code="0" op-status="0" interval="0" last-run="1376320020"
last-rc-change="1376320020" exec-time="256" queue-time="0"
op-digest="8d338a386e15f64e7389c329553bbead"/>
</lrm_resource>
<lrm_resource id="drbd_pg" type="drbd" class="ocf"
provider="linbit">
<lrm_rsc_op id="drbd_pg_last_failure_0"
operation_key="drbd_pg_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
transition-key="7:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:8;7:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="22" rc-code="8" op-status="0" interval="0" last-run="1376320020"
last-rc-change="1376320020" exec-time="301" queue-time="1"
op-digest="aced06114de28a9ed9baeef6ca82fda7"/>
<lrm_rsc_op id="drbd_pg_monitor_15000"
operation_key="drbd_pg_monitor_15000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
transition-key="23:69:8:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:8;23:69:8:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="33" rc-code="8" op-status="0" interval="15000"
last-rc-change="1376378662" exec-time="160" queue-time="0"
op-digest="1b574855f35af4f42926160a697d4dac"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
<node_state id="3232262649" in_ccm="true" crmd="online" join="member"
crm-debug-origin="do_update_resource" uname="tjtcaps02" expected="member">
<transient_attributes id="3232262649">
<instance_attributes id="status-3232262649">
<nvpair id="status-3232262649-master-drbd_pg"
name="master-drbd_pg" value="10000"/>
<nvpair id="status-3232262649-probe_complete"
name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
<lrm id="3232262649">
<lrm_resources>
<lrm_resource id="pg_vip" type="IPaddr2" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pg_vip_last_0" operation_key="pg_vip_monitor_0"
operation="monitor" crm-debug-origin="build_active_RAs"
crm_feature_set="3.0.7"
transition-key="7:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:7;7:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="17" rc-code="7" op-status="0" interval="0" last-run="1376320287"
last-rc-change="1376320287" exec-time="254" queue-time="0"
op-digest="a0e257157c7f43b5dbaea731697d31ca"/>
</lrm_resource>
<lrm_resource id="pg_lvm" type="LVM" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pg_lvm_last_0" operation_key="pg_lvm_monitor_0"
operation="monitor" crm-debug-origin="build_active_RAs"
crm_feature_set="3.0.7"
transition-key="4:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:7;4:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="5" rc-code="7" op-status="0" interval="0" last-run="1376320287"
last-rc-change="1376320287" exec-time="162" queue-time="0"
op-digest="8d4be0b9171df5ac3b4484891fe0f160"/>
</lrm_resource>
<lrm_resource id="pg_lsb" type="postgresql-9.2" class="lsb">
<lrm_rsc_op id="pg_lsb_last_0" operation_key="pg_lsb_monitor_0"
operation="monitor" crm-debug-origin="build_active_RAs"
crm_feature_set="3.0.7"
transition-key="6:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:7;6:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="13" rc-code="7" op-status="0" interval="0" last-run="1376320287"
last-rc-change="1376320287" exec-time="116" queue-time="0"
op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
</lrm_resource>
<lrm_resource id="pg_fs" type="Filesystem" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pg_fs_last_0" operation_key="pg_fs_monitor_0"
operation="monitor" crm-debug-origin="build_active_RAs"
crm_feature_set="3.0.7"
transition-key="5:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:7;5:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="9" rc-code="7" op-status="0" interval="0" last-run="1376320287"
last-rc-change="1376320287" exec-time="285" queue-time="0"
op-digest="8d338a386e15f64e7389c329553bbead"/>
</lrm_resource>
<lrm_resource id="drbd_pg" type="drbd" class="ocf"
provider="linbit">
<lrm_rsc_op id="drbd_pg_last_failure_0"
operation_key="drbd_pg_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
transition-key="8:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;8:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="22" rc-code="0" op-status="0" interval="0" last-run="1376320287"
last-rc-change="1376320287" exec-time="283" queue-time="1"
op-digest="aced06114de28a9ed9baeef6ca82fda7"/>
<lrm_rsc_op id="drbd_pg_monitor_16000"
operation_key="drbd_pg_monitor_16000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
transition-key="26:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
transition-magic="0:0;26:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99"
call-id="29" rc-code="0" op-status="0" interval="16000"
last-rc-change="1376378662" exec-time="115" queue-time="0"
op-digest="1b574855f35af4f42926160a697d4dac"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
</status>
</cib>
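Regarding question 3: once I know which pair is stale, I assume the removal
would go through cibadmin, something like this (untested; the IDs are taken
from the dump above, here assuming the text-ID entries are the stale ones):

# delete one duplicate <node> definition from the nodes section
cibadmin -D -o nodes -X '<node id="tjtcaps01" uname="tjtcaps01"/>'
cibadmin -D -o nodes -X '<node id="tjtcaps02" uname="tjtcaps02"/>'

Please correct me if crm_node or pcs is the better tool for this.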
Best regards,
Michal Mistina