[Pacemaker] crm resource move doesn't move the resource

Fri Oct 8 20:05:42 UTC 2010

On 8 October 2010 09:29, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis
> <pavlos.parissis at gmail.com> wrote:
>> On 8 October 2010 08:29, Andrew Beekhof <andrew at beekhof.net> wrote:
>>> On Thu, Oct 7, 2010 at 9:58 PM, Pavlos Parissis
>>> <pavlos.parissis at gmail.com> wrote:
>>>>
>>>>
>>>> On 7 October 2010 09:01, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>
>>>>> On Sat, Oct 2, 2010 at 6:31 PM, Pavlos Parissis
>>>>> <pavlos.parissis at gmail.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I am having the same issue again, on a different set of 3 nodes. When I
>>>>> > try to manually fail over the resource group to the standby node, the
>>>>> > ms-drbd resource is not moved with it, and as a result the resource group
>>>>> > is not fully started; only the ip resource is started.
>>>>> > Any ideas why I am having this issue?
>>>>>
>>>>> I think it's a bug that was fixed recently.  Could you try the latest
>>>>> code from Mercurial?
>>>>
>>>> 1.1 or 1.2 branch?
>>>
>>> 1.1
>>>
>> To save time on compiling, I want to use the available 1.1.3 rpms
>> from the rpm-next repo.
>> But before I go and recreate the scenario, which means rebuilding 3
>> nodes, I would like to know if this bug is fixed in 1.1.3.
>
> As I said, I believe so.
>

I've just upgraded[1] my pacemaker to 1.1.3 and stonithd cannot be
started. Am I missing something?

Oct 08 21:08:01 node-02 heartbeat: [14192]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 14192)
Oct 08 21:08:01 node-02 heartbeat: [14193]: info: Starting "/usr/lib/heartbeat/attrd" as uid 101  gid 103 (pid 14193)
Oct 08 21:08:01 node-02 heartbeat: [14194]: info: Starting "/usr/lib/heartbeat/crmd" as uid 101  gid 103 (pid 14194)
Oct 08 21:08:01 node-02 ccm: [14189]: info: Hostname: node-02
Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM Connection failed 1 times (30 max)
Oct 08 21:08:01 node-02 attrd: [14193]: info: Invoked: /usr/lib/heartbeat/attrd
Oct 08 21:08:01 node-02 stonith-ng: [14192]: info: Invoked: /usr/lib/heartbeat/stonithd
Oct 08 21:08:01 node-02 stonith-ng: [14192]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Client [stonith-ng] pid 14192 failed authorization [no default client auth]
Oct 08 21:08:01 node-02 heartbeat: [14158]: ERROR: api_process_registration_msg: cannot add client(stonith-ng)
Oct 08 21:08:01 node-02 stonith-ng: [14192]: ERROR: register_heartbeat_conn: Cannot sign on with heartbeat:
Oct 08 21:08:01 node-02 stonith-ng: [14192]: CRIT: main: Cannot sign in to the cluster... terminating
Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Managed /usr/lib/heartbeat/stonithd process 14192 exited with return code 100.
Oct 08 21:08:01 node-02 crmd: [14194]: info: Invoked: /usr/lib/heartbeat/crmd
Oct 08 21:08:01 node-02 crmd: [14194]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 08 21:08:02 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM Connection failed 2 times (30 max)
Oct 08 21:08:05 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
[..snip...]
Oct 08 21:08:33 node-02 crmd: [14194]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry
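
To pull only the WARN/ERROR/CRIT entries out of a log like this one, something along these lines might help. It is only a rough sketch: the regular expression is guessed from the line layout shown here, and the names `unwrap` and `failures` are made up for illustration, not part of heartbeat or pacemaker. Mail clients tend to wrap long log lines, so continuation lines are joined back onto their entry first.

```python
import re

# Layout assumed from the lines in this post, e.g.
# "Oct 08 21:08:01 node-02 stonith-ng: [14192]: CRIT: main: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<daemon>[\w-]+): "
    r"\[(?P<pid>\d+)\]: (?P<level>[A-Za-z]+):\s*(?P<msg>.*)$"
)


def unwrap(lines):
    """Join wrapped continuation lines onto the log entry they belong to."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if LINE_RE.match(line) or not entries:
            # A line that starts with a timestamp begins a new entry.
            entries.append(line)
        else:
            # Anything else is treated as the tail of the previous entry.
            entries[-1] += " " + line.strip()
    return entries


def failures(lines, levels=("WARN", "ERROR", "CRIT")):
    """Return (daemon, level, message) for each entry at one of the given levels."""
    found = []
    for entry in unwrap(lines):
        m = LINE_RE.match(entry)
        if m and m.group("level") in levels:
            found.append((m.group("daemon"), m.group("level"), m.group("msg")))
    return found
```

Called as `failures(open(path))` on a saved copy of the log (the path is whatever your heartbeat logfile is), it returns one tuple per problem entry, which makes the stonith-ng authorization failure easier to spot among the info lines.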

[1] I use CentOS 5.4, and for the initial installation I used the
following repository:
[root@node-02 ~]# cat /etc/yum.repos.d/pacemaker.repo
[clusterlabs]
name=High Availability/Clustering server technologies (epel-5)
baseurl=http://www.clusterlabs.org/rpm/epel-5
type=rpm-md
gpgcheck=0
enabled=1

and in order to perform the upgrade I added the following repository:

[clusterlabs-next]
name=High Availability/Clustering server technologies (epel-5-next)
baseurl=http://www.clusterlabs.org/rpm-next/epel-5
metadata_expire=45m
type=rpm-md
gpgcheck=0
enabled=1

and here is the installation/upgrade log, where you can see that only
pacemaker-libs and pacemaker were upgraded:
Oct 03 21:06:20 Installed: libibverbs-1.1.3-2.el5.i386
Oct 03 21:06:25 Installed: lm_sensors-2.10.7-9.el5.i386
Oct 03 21:06:31 Installed: 1:net-snmp-5.3.2.2-9.el5_5.1.i386
Oct 03 21:06:31 Installed: librdmacm-1.0.10-1.el5.i386
Oct 03 21:06:32 Installed: openhpi-libs-2.14.0-5.el5.i386
Oct 03 21:06:33 Installed: OpenIPMI-libs-2.0.16-7.el5.i386
Oct 03 21:06:35 Installed: libesmtp-1.0.4-5.el5.i386
Oct 03 21:06:36 Installed: cluster-glue-libs-1.0.6-1.6.el5.i386
Oct 03 21:06:37 Installed: heartbeat-libs-3.0.3-2.3.el5.i386
Oct 03 21:06:39 Installed: corosynclib-1.2.7-1.1.el5.i386
Oct 03 21:06:42 Installed: cluster-glue-1.0.6-1.6.el5.i386
Oct 03 21:06:45 Installed: resource-agents-1.0.3-2.6.el5.i386
Oct 03 21:06:46 Installed: heartbeat-3.0.3-2.3.el5.i386
Oct 03 21:06:47 Installed: pacemaker-libs-1.0.9.1-1.15.el5.i386
Oct 03 21:06:49 Installed: pacemaker-1.0.9.1-1.15.el5.i386
Oct 03 21:06:50 Installed: corosync-1.2.7-1.1.el5.i386
Oct 08 21:06:37 Updated: pacemaker-libs-1.1.3-1.el5.i386
Oct 08 21:06:43 Updated: pacemaker-1.1.3-1.el5.i386
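
To double-check which packages were actually touched by the upgrade, the yum log can be grouped by action. This is a minimal sketch that assumes the "Mon DD HH:MM:SS Action: package" layout seen in the log above; `packages_by_action` is a made-up helper name, not a yum API.

```python
def packages_by_action(lines):
    """Group yum log lines into {action: [package, ...]}."""
    result = {}
    for line in lines:
        # Expected fields: month, day, time, "Action:", package
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[3].endswith(":"):
            action = parts[3].rstrip(":")
            result.setdefault(action, []).append(parts[4].strip())
    return result
```

On the log above, the list under the "Updated" key would hold only the two pacemaker packages, while heartbeat, cluster-glue and the rest appear only under "Installed".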

and my conf
[root@node-02 log]# cibadmin -Ql|grep vali
<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.2" have-quorum="1" dc-uuid="b7764e7b-0a00-4745-8d9e-6911271eefb2" admin_epoch="0" epoch="319" num_updates="60">
[root@node-02 log]# crm configure show
node $id="80275014-5efe-4825-a29c-d42610f08cd1" node-02
node $id="b7764e7b-0a00-4745-8d9e-6911271eefb2" node-03
node $id="c7459ab3-55b6-4155-946d-5c1ba783507f" node-01
primitive drbd_01 ocf:linbit:drbd \
        params drbd_resource="drbd_pbx_service_1" \
        op monitor interval="30s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="120s"
primitive drbd_02 ocf:linbit:drbd \
        params drbd_resource="drbd_pbx_service_2" \
        op monitor interval="30s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="120s"
primitive fs_01 ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
        meta migration-threshold="3" failure-timeout="60" \
        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive fs_02 ocf:heartbeat:Filesystem \
        params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \
        meta migration-threshold="3" failure-timeout="60" \
        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive ip_01 ocf:heartbeat:IPaddr2 \
        params ip="192.168.78.10" cidr_netmask="24" broadcast="192.168.78.255" \
        meta failure-timeout="120" migration-threshold="3" \
        op monitor interval="5s"
primitive ip_02 ocf:heartbeat:IPaddr2 \
        params ip="192.168.78.20" cidr_netmask="24" broadcast="192.168.78.255" \
        meta failure-timeout="120" migration-threshold="3" \
        op monitor interval="5s"
primitive pbx_01 lsb:znd-pbx_01 \
        meta failure-timeout="120" migration-threshold="3" target-role="Started" \
        op monitor interval="20s" timeout="40s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive pbx_02 ocf:heartbeat:Dummy \
        params state="/pbx_service_02/Dummy.state" \
        meta failure-timeout="120" migration-threshold="3" \
        op monitor interval="20s" timeout="40s"
primitive sshd-pbx_01 lsb:sshd-pbx_01 \
        meta target-role="Started" \
        op monitor interval="10m" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive sshd-pbx_02 lsb:sshd-pbx_02 \
        meta target-role="Started" \
        op monitor interval="10m" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive stonith-meatware stonith:meatware \
        params hostlist="node-01 node-02 node-03" stonith-timeout="60" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
group pbx_service_01 ip_01 fs_01 pbx_01 sshd-pbx_01 \
        meta target-role="Started"
group pbx_service_02 ip_02 fs_02 pbx_02 sshd-pbx_02 \
        meta target-role="Started"
ms ms-drbd_01 drbd_01 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
ms ms-drbd_02 drbd_02 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone stonith-clone stonith-meatware \
        meta clone-max="3" clone-node-max="1" target-role="Started" globally_unique="false"
location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
location PrimaryNode-drbd_02 ms-drbd_02 100: node-02
location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02
location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
location SecondaryNode-drbd_02 ms-drbd_02 0: node-03
location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03
location stonith-node-01 stonith-clone 100: node-01
location stonith-node-02 stonith-clone 100: node-02
location stonith-node-03 stonith-clone 100: node-03
colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
order pbx_service_02-after-drbd_02 inf: ms-drbd_02:promote pbx_service_02:start
property $id="cib-bootstrap-options" \
        stonith-enabled="true" \
        symmetric-cluster="false" \
        dc-version="1.1.3-9c2342c0378140df9bed7d192f2b9ed157908007" \
        cluster-infrastructure="Heartbeat" \
        last-lrm-refresh="1286195722"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"
[root@node-02 log]#