[Pacemaker] Failover constraint problem

Sandor Feher sfeher at bluesystem.hu
Sat Apr 17 04:29:31 EDT 2010


Hi,

First of all, my goal is to set up a two-node cluster with Pacemaker to
serve our web hosting service.
For testing purposes, this configuration currently runs on two VMware
virtual machines. Both of them run Debian Lenny.

Here are the basic rules I set up:

node0 has:

virtual IP
DRBD primary, with the filesystem mounted under /mnt
NFS server exporting the /mnt mount point to node1

node1 has:

DRBD secondary
nfs_client, which mounts node0's /mnt directory (it should be read-write for both nodes)

If node0 fails, node1 should become the primary DRBD node, take over the
virtual IP and mount the DRBD partition under /mnt. It should not start
the nfs_client resource, because that makes no sense there (nfs_client
must be stopped before the DRBD partition gets mounted under /mnt).
If node1 fails, nothing should happen, because nfs_client only runs on
the node that holds the secondary DRBD partition.
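The intended behaviour above boils down to a small decision table. Below is a hypothetical Python sketch (not Pacemaker code, and not how Pacemaker actually scores placement) that encodes these rules, assuming that the DRBD primary always runs apache-group (fs0, virtual-ip, nfs_server, as in the config below), that nfs_client runs only on an online DRBD secondary, and that a surviving primary keeps its role when the peer comes back. The function name `intended_placement` is made up for illustration.

```python
from typing import Dict, Optional, Set

NODES = ("node0", "node1")


def intended_placement(online: Set[str], current_primary: Optional[str]) -> Dict[str, Optional[str]]:
    """Hypothetical model of the desired placement (None means stopped).

    Assumed rules, taken from the description above:
    - the DRBD primary runs apache-group (fs0, virtual-ip, nfs_server);
    - nfs_client runs only on the online DRBD secondary;
    - a primary that is still online keeps its role (no failback).
    """
    if current_primary in online:
        # The current primary survived, so it keeps the role.
        primary = current_primary
    else:
        # Promote a surviving node, if there is one.
        survivors = [n for n in NODES if n in online]
        primary = survivors[0] if survivors else None
    # The other online node (if any) is the DRBD secondary.
    secondary = next((n for n in NODES if n in online and n != primary), None)
    return {
        "drbd_primary": primary,
        "apache-group": primary,
        "nfs_client": secondary,
    }
```

For example, `intended_placement({"node0", "node1"}, "node0")` keeps apache-group on node0 after node1 rejoins, which is the behaviour expected in problem 2 below.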

My problems are the following:

1. If I migrate the apache-group resource to another node, nfs_client
won't release the /mnt mount point (I know that, according to this
config, it shouldn't).
   I think I need some clever constraint to achieve this.

2. If I shut down node1 (suppose node0 is the master at that moment and
runs apache-group), nothing happens, as expected. But when node1 comes
back online, apache-group starts to migrate to node1. I don't
understand why, because there is a constraint that makes apache-group
run on the node holding the primary DRBD resource, and in this
situation that is node0.


crm configure show

node node0 \
         attributes standby="off"
node node1 \
         attributes standby="off"
primitive drbd0 ocf:heartbeat:drbd \
         params drbd_resource="r0" \
         op monitor interval="59s" role="Master" timeout="30s" \
         op monitor interval="60s" role="Slave" timeout="30s"
primitive fs0 ocf:heartbeat:Filesystem \
         params fstype="ext3" directory="/mnt" device="/dev/drbd0" \
         meta target-role="Started"
primitive nfs_client ocf:heartbeat:Filesystem \
         params fstype="nfs" directory="/mnt/" device="192.168.1.40:/mnt/" options="hard,intr,noatime,rw,nolock,tcp,timeo=50" \
         meta target-role="Stopped"
primitive nfs_server lsb:nfs-kernel-server \
         op monitor interval="1min"
primitive virtual-ip ocf:heartbeat:IPaddr2 \
         params ip="192.168.1.40" broadcast="192.168.1.255" nic="eth0" cidr_netmask="24" \
         op monitor interval="21s" timeout="5s" target-role="Started"
group apache-group fs0 virtual-ip nfs_server \
         meta target-role="Started"
ms ms-drbd0 drbd0 \
         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
location cli-prefer-apache-group apache-group \
         rule $id="cli-prefer-rule-apache-group" inf: #uname eq node0
colocation apache-group-on-ms-drbd0 inf: apache-group ms-drbd0:Master
colocation co_nfs_client inf: nfs_client ms-drbd0:Slave
order ms-drbd0-before-apache-group inf: ms-drbd0:promote apache-group:start
order ms-drbd0-before-nfs_client inf: ms-drbd0:promote nfs_client:start
property $id="cib-bootstrap-options" \
         dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
         cluster-infrastructure="openais" \
         stonith-enabled="false" \
         no-quorum-policy="ignore" \
         expected-quorum-votes="2" \
         last-lrm-refresh="1271453094"

node1:~# crm_mon -1
============
Last updated: Fri Apr 16 23:49:30 2010
Stack: openais
Current DC: node0 - partition with quorum
Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ node0 node1 ]

  Resource Group: apache-group
      fs0        (ocf::heartbeat:Filesystem):    Started node1 (unmanaged) FAILED
      virtual-ip (ocf::heartbeat:IPaddr2):       Stopped
      nfs_server (lsb:nfs-kernel-server):        Stopped
  Master/Slave Set: ms-drbd0
      Masters: [ node0 ]
      Slaves: [ node1 ]
  nfs_client     (ocf::heartbeat:Filesystem):    Started node1 (unmanaged) FAILED

Failed actions:
     nfs_client_start_0 (node=node0, call=98, rc=1, status=complete): unknown error
     fs0_stop_0 (node=node1, call=9, rc=-2, status=Timed Out): unknown exec error
     nfs_client_stop_0 (node=node1, call=7, rc=-2, status=Timed Out): unknown exec error


I'd really appreciate any ideas. Thank you in advance.

Regards,
Sandor
_______________________________________________
Openais mailing list
Openais at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais



