[Pacemaker] Seems to be working but fails to transition to other node.

Steven Silk steven.silk at noaa.gov
Thu May 31 00:16:56 UTC 2012


All Concerned;

I have been getting slapped around all day with this problem - I can't
solve it.

The system is only half done - I have not yet implemented the NFS portion -
and the DRBD part is not yet cooperating with Corosync.

It appears to be working OK - but when I stop Corosync on the DC, the
other node does not take over the DRBD resources.

Here is how I am setting things up....


Configure quorum and stonith (see
<http://docs.homelinux.org/doku.php?id=create_high-available_drbd_device_with_pacemaker#fn__1>
and
<http://docs.homelinux.org/doku.php?id=create_high-available_drbd_device_with_pacemaker#fn__2>):

property no-quorum-policy="ignore"
property stonith-enabled="false"

On wms1, configure the DRBD resource

primitive drbd_drbd0 ocf:linbit:drbd \
                    params drbd_resource="drbd0" \
                    op monitor interval="30s"

Configure DRBD Master/Slave

ms ms_drbd_drbd0 drbd_drbd0 \
                    meta master-max="1" master-node-max="1" \
                         clone-max="2" clone-node-max="1" \
                         notify="true"

Configure filesystem mountpoint

primitive fs_ftpdata ocf:heartbeat:Filesystem \
                    params device="/dev/drbd0" \
                    directory="/mnt/drbd0" fstype="ext3"
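
For completeness: I have not yet added any constraints tying the
filesystem to the DRBD master. If that turns out to be the problem, I
assume they would look something like the following (the constraint
names fs_on_drbd and fs_after_drbd are just placeholders of my own):

colocation fs_on_drbd inf: fs_ftpdata ms_drbd_drbd0:Master
order fs_after_drbd inf: ms_drbd_drbd0:promote fs_ftpdata:start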


When I check the status on the DC....

[root@wms2 ~]# crm
crm(live)# status
============
Last updated: Wed May 30 23:58:43 2012
Last change: Wed May 30 23:52:42 2012 via cibadmin on wms1
Stack: openais
Current DC: wms2 - partition with quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ wms1 wms2 ]

 Master/Slave Set: ms_drbd_drbd0 [drbd_drbd0]
     Masters: [ wms2 ]
     Slaves: [ wms1 ]
 fs_ftpdata    (ocf::heartbeat:Filesystem):    Started wms2

[root@wms2 ~]# mount -l | grep drbd

/dev/drbd0 on /mnt/drbd0 type ext3 (rw)

So I stop Corosync on wms2 - but on the other node...

[root@wms1 ~]# crm
crm(live)# status
============
Last updated: Thu May 31 00:12:17 2012
Last change: Wed May 30 23:52:42 2012 via cibadmin on wms1
Stack: openais
Current DC: wms1 - partition WITHOUT quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ wms1 ]
OFFLINE: [ wms2 ]

 Master/Slave Set: ms_drbd_drbd0 [drbd_drbd0]
     Masters: [ wms1 ]
     Stopped: [ drbd_drbd0:1 ]

It fails to mount /dev/drbd0 - fs_ftpdata does not start on wms1 at all.

Any ideas?

I tailed /var/log/cluster/corosync.log and get this....

May 31 00:02:36 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 22 for
master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:03:06 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 25 for
master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:03:10 wms1 crmd: [1268]: WARN: cib_rsc_callback: Resource update
15 failed: (rc=-41) Remote node did not respond
May 31 00:03:36 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 28 for
master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:06 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 31 for
master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 34 for
master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 37 for
master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 40 for
master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:08:02 wms1 cib: [1257]: info: cib_stats: Processed 58 operations
(0.00us average, 0% utilization) in the last 10min
May 31 00:08:02 wms1 cib: [1264]: info: cib_stats: Processed 117 operations
(256.00us average, 0% utilization) in the last 10min

[root@wms2 ~]# tail /var/log/cluster/corosync.log
May 31 00:02:16 corosync [pcmk  ] info: update_member: Node wms2 now has
process list: 00000000000000000000000000000002 (2)
May 31 00:02:16 corosync [pcmk  ] notice: pcmk_shutdown: Shutdown complete
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: Pacemaker
Cluster Manager 1.1.6
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync
extended virtual synchrony service
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync
configuration service
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync cluster
closed process group service v1.01
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync cluster
config database access v1.01
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync profile
loading service
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync cluster
quorum service v0.1
May 31 00:02:16 corosync [MAIN  ] Corosync Cluster Engine exiting with
status 0 at main.c:1858.



The example that I am working from talks about doing the following:


 group services fs_drbd0

But this fails miserably... "services" being undefined?
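
My guess is that the group should reference the primitive I actually
defined (fs_ftpdata) rather than the fs_drbd0 from the example, so
presumably the line should read:

group services fs_ftpdata

But I am not sure that is the whole story.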

-- 
Steven Silk
CSC
303 497 3112