[Pacemaker] Collocating resource with a started clone instance

Andreas Kurz andreas at hastexo.com
Fri Jun 22 12:31:01 CEST 2012


On 06/22/2012 11:58 AM, Sergey Tachenov wrote:
> Hi!
> 
> I'm trying to set up a 2-node cluster. I'm new to pacemaker, but
> things are getting better and better. However, I am completely at a
> loss here.
> 
> I have a cloned tomcat resource, which runs on both nodes and doesn't
> really depend on anything (it doesn't use DRBD or anything else of
> that sort). But I'm trying to get pacemaker to move the cluster IP to
> another node in case tomcat fails. Here are the relevant parts of my
> config:
> 
> node srvplan1
> node srvplan2
> primitive DBIP ocf:heartbeat:IPaddr2 \
>        params ip="1.2.3.4" cidr_netmask="24" \
>        op monitor interval="10s"
> primitive drbd_pgdrive ocf:linbit:drbd \
>        params drbd_resource="pgdrive" \
>        op start interval="0" timeout="240" \
>        op stop interval="0" timeout="100"
> primitive pgdrive_fs ocf:heartbeat:Filesystem \
>        params device="/dev/drbd0" directory="/hd2" fstype="ext4"
> primitive ping ocf:pacemaker:ping \
>        params host_list="193.233.59.2" multiplier="1000" \
>        op monitor interval="10"
> primitive postgresql ocf:heartbeat:pgsql \
>        params pgdata="/hd2/pgsql" \
>        op monitor interval="30" timeout="30" depth="0" \
>        op start interval="0" timeout="60" \
>        op stop interval="0" timeout="60" \
>        meta target-role="Started"
> primitive tomcat ocf:heartbeat:tomcat \
>        params java_home="/usr/lib/jvm/jre" catalina_home="/usr/share/tomcat" \
>        tomcat_user="tomcat" script_log="/home/tmo/log/tomcat.log" \
>        statusurl="http://127.0.0.1:8080/status/" \
>        op start interval="0" timeout="60" \
>        op stop interval="0" timeout="120" \
>        op monitor interval="30" timeout="30"
> group postgres pgdrive_fs DBIP postgresql
> ms ms_drbd_pgdrive drbd_pgdrive \
>        meta master-max="1" master-node-max="1" clone-max="2" \
>        clone-node-max="1" notify="true"
> clone pings ping \
>        meta interleave="true"
> clone tomcats tomcat \
>        meta interleave="true" target-role="Started"
> location DBIPcheck DBIP \
>        rule $id="DBIPcheck-rule" 10000: defined pingd and pingd gt 0
> location master-prefer-node1 DBIP 50: srvplan1
> colocation DBIP-on-web 1000: DBIP tomcats

try inf: ... 1000: will not be enough ... because DBIP is also part of the
postgres group and that group must follow the DRBD Master
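
A minimal sketch of that change with the crm shell (untested here, and
assuming the rest of the configuration stays exactly as posted):

  crm configure delete DBIP-on-web
  crm configure colocation DBIP-on-web inf: DBIP tomcats

With a mandatory score the policy engine can no longer trade the 1000
points against the other constraints, so DBIP (and with it the postgres
group and the DRBD master, provided DRBD can be promoted there) has to
follow a running tomcats instance. The resulting placement can be
previewed with ptest -L -s (or crm_simulate -Ls) before committing.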

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> colocation postgres_on_drbd inf: postgres ms_drbd_pgdrive:Master
> order postgres_after_drbd inf: ms_drbd_pgdrive:promote postgres:start
> 
> As you can see, there are three explicit constraints for the DBIP
> resource: preferred node (srvplan1, score 50), successful ping (score
> 10000) and running tomcat (score 1000). There's also the resource
> stickiness set to 100. Implicit constraints include collocation of the
> postgres group with the DRBD master instance.
> 
> The ping check works fine: if I unplug the external LAN cable or use
> iptables to block pings, everything gets moved to another node.
> 
> The check for tomcat isn't working for some reason, though:
> 
> [root at srvplan1 bin]# crm_mon -1
> ============
> Last updated: Fri Jun 22 10:06:59 2012
> Last change: Fri Jun 22 09:43:16 2012 via cibadmin on srvplan1
> Stack: openais
> Current DC: srvplan1 - partition with quorum
> Version: 1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 17 Resources configured.
> ============
> 
> Online: [ srvplan1 srvplan2 ]
> 
>  Master/Slave Set: ms_drbd_pgdrive [drbd_pgdrive]
>     Masters: [ srvplan1 ]
>     Slaves: [ srvplan2 ]
>  Resource Group: postgres
>     pgdrive_fs (ocf::heartbeat:Filesystem):    Started srvplan1
>     DBIP       (ocf::heartbeat:IPaddr2):       Started srvplan1
>     postgresql (ocf::heartbeat:pgsql): Started srvplan1
>  Clone Set: pings [ping]
>     Started: [ srvplan1 srvplan2 ]
>  Clone Set: tomcats [tomcat]
>     Started: [ srvplan2 ]
>     Stopped: [ tomcat:0 ]
> 
> Failed actions:
>    tomcat:0_start_0 (node=srvplan1, call=37, rc=-2, status=Timed Out): unknown exec error
> 
> As you can see, tomcat is stopped on srvplan1 (I have deliberately
> messed up the startup scripts), but everything else still runs there.
> ptest -L -s shows:
> 
> clone_color: ms_drbd_pgdrive allocation score on srvplan1: 10350
> clone_color: ms_drbd_pgdrive allocation score on srvplan2: 10000
> clone_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
> clone_color: drbd_pgdrive:0 allocation score on srvplan2: 0
> clone_color: drbd_pgdrive:1 allocation score on srvplan1: 0
> clone_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
> native_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
> native_color: drbd_pgdrive:0 allocation score on srvplan2: 0
> native_color: drbd_pgdrive:1 allocation score on srvplan1: -INFINITY
> native_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
> drbd_pgdrive:0 promotion score on srvplan1: 30700
> drbd_pgdrive:1 promotion score on srvplan2: 30000
> group_color: postgres allocation score on srvplan1: 0
> group_color: postgres allocation score on srvplan2: 0
> group_color: pgdrive_fs allocation score on srvplan1: 100
> group_color: pgdrive_fs allocation score on srvplan2: 0
> group_color: DBIP allocation score on srvplan1: 10150
> group_color: DBIP allocation score on srvplan2: 10000
> group_color: postgresql allocation score on srvplan1: 100
> group_color: postgresql allocation score on srvplan2: 0
> native_color: pgdrive_fs allocation score on srvplan1: 20450
> native_color: pgdrive_fs allocation score on srvplan2: -INFINITY
> clone_color: tomcats allocation score on srvplan1: -INFINITY
> clone_color: tomcats allocation score on srvplan2: 0
> clone_color: tomcat:0 allocation score on srvplan1: -INFINITY
> clone_color: tomcat:0 allocation score on srvplan2: 0
> clone_color: tomcat:1 allocation score on srvplan1: -INFINITY
> clone_color: tomcat:1 allocation score on srvplan2: 100
> native_color: tomcat:1 allocation score on srvplan1: -INFINITY
> native_color: tomcat:1 allocation score on srvplan2: 100
> native_color: tomcat:0 allocation score on srvplan1: -INFINITY
> native_color: tomcat:0 allocation score on srvplan2: -INFINITY
> native_color: DBIP allocation score on srvplan1: 9250
> native_color: DBIP allocation score on srvplan2: -INFINITY
> native_color: postgresql allocation score on srvplan1: 100
> native_color: postgresql allocation score on srvplan2: -INFINITY
> clone_color: pings allocation score on srvplan1: 0
> clone_color: pings allocation score on srvplan2: 0
> clone_color: ping:0 allocation score on srvplan1: 100
> clone_color: ping:0 allocation score on srvplan2: 0
> clone_color: ping:1 allocation score on srvplan1: 0
> clone_color: ping:1 allocation score on srvplan2: 100
> native_color: ping:0 allocation score on srvplan1: 100
> native_color: ping:0 allocation score on srvplan2: 0
> native_color: ping:1 allocation score on srvplan1: -INFINITY
> native_color: ping:1 allocation score on srvplan2: 100
> 
> Why is the score for DBIP -INFINITY on srvplan2? The only INF
> rule in my config is the collocation rule for the postgres group. I
> can surmise that DBIP can't be run on srvplan2 because the DRBD isn't
> Master there, but there's nothing preventing it from being promoted,
> and this rule doesn't stop the DBIP from being moved in case of ping
> failure either. So there must be something else.
> 
> I also don't quite understand why the DBIP score is 9250 on srvplan1.
> It should be at least 10000 for the ping, and 250 more for preference
> and stickiness. If I migrate the DBIP to srvplan2 manually, the score
> is 10200 there, which makes me think that 1000 gets subtracted because
> tomcat is stopped on srvplan1. But why? This is a positive rule, not a
> negative one. It should just add 1000 if tomcat is running, but it
> shouldn't subtract anything if it isn't, or am I wrong?
> 
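One back-of-the-envelope reading of the ptest output above (my own guess
at the accounting, not taken from the policy engine source):

  10000 (pingd rule) + 50 (location) + 2 x 100 (stickiness) = 10250
  10250 - 9250 = 1000

i.e. the DBIP score on srvplan1 comes out exactly 1000 short of what the
positive rules alone would give, which fits the reading that the
DBIP-on-web score is counted against a node where the tomcats instance is
stopped, not merely left out.
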
> Does this have anything to do with the fact that I'm trying to
> collocate the IP with a clone? Or am I looking in the wrong direction?
> 
> I tried removing DBIP from the group, and it got moved to another
> node. Obviously, everything else was left on the first one. Then I
> tried adding a collocation of DBIP with postgres resources (and the
> other way around), and if the score of that rule is high enough, the
> IP gets moved back, but I was never able to get postgres moved to the
> second node (where the IP is) instead.
> 
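
For what it's worth, one hypothetical way to express "move postgres to
wherever tomcat runs" in crm syntax (constraint name and score are my
own, not something actually tried in this thread):

  colocation postgres-with-tomcat inf: postgres tomcats

i.e. colocating the whole group rather than only DBIP with the clone, so
postgresql and the filesystem are only allowed on a node with a running
tomcats instance. Whether everything can actually move then still depends
on the DRBD master being able to follow, because of the mandatory
postgres_on_drbd colocation.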