[Pacemaker] Collocating resource with a started clone instance
Andreas Kurz
andreas at hastexo.com
Fri Jun 22 10:31:01 UTC 2012
On 06/22/2012 11:58 AM, Sergey Tachenov wrote:
> Hi!
>
> I'm trying to set up a 2-node cluster. I'm new to pacemaker, but
> things are getting better and better. However, I am completely at a
> loss here.
>
> I have a cloned tomcat resource, which runs on both nodes and doesn't
> really depend on anything (it doesn't use DRBD or anything else of
> that sort). But I'm trying to get pacemaker to move the cluster IP to
> another node in case tomcat fails. Here are the relevant parts of my
> config:
>
> node srvplan1
> node srvplan2
> primitive DBIP ocf:heartbeat:IPaddr2 \
> params ip="1.2.3.4" cidr_netmask="24" \
> op monitor interval="10s"
> primitive drbd_pgdrive ocf:linbit:drbd \
> params drbd_resource="pgdrive" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100"
> primitive pgdrive_fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/hd2" fstype="ext4"
> primitive ping ocf:pacemaker:ping \
> params host_list="193.233.59.2" multiplier="1000" \
> op monitor interval="10"
> primitive postgresql ocf:heartbeat:pgsql \
> params pgdata="/hd2/pgsql" \
> op monitor interval="30" timeout="30" depth="0" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60" \
> meta target-role="Started"
> primitive tomcat ocf:heartbeat:tomcat \
> params java_home="/usr/lib/jvm/jre" \
> catalina_home="/usr/share/tomcat" tomcat_user="tomcat" \
> script_log="/home/tmo/log/tomcat.log" \
> statusurl="http://127.0.0.1:8080/status/" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="120" \
> op monitor interval="30" timeout="30"
> group postgres pgdrive_fs DBIP postgresql
> ms ms_drbd_pgdrive drbd_pgdrive \
> meta master-max="1" master-node-max="1" clone-max="2" \
> clone-node-max="1" notify="true"
> clone pings ping \
> meta interleave="true"
> clone tomcats tomcat \
> meta interleave="true" target-role="Started"
> location DBIPcheck DBIP \
> rule $id="DBIPcheck-rule" 10000: defined pingd and pingd gt 0
> location master-prefer-node1 DBIP 50: srvplan1
> colocation DBIP-on-web 1000: DBIP tomcats
Try inf: here ... 1000: will not be enough, because DBIP is also part of
the postgres group, and that group must follow the DRBD Master.
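
In crm shell syntax that means raising the existing constraint to INFINITY,
i.e. something like:

  colocation DBIP-on-web inf: DBIP tomcats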
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
> colocation postgres_on_drbd inf: postgres ms_drbd_pgdrive:Master
> order postgres_after_drbd inf: ms_drbd_pgdrive:promote postgres:start
>
> As you can see, there are three explicit constraints for the DBIP
> resource: preferred node (srvplan1, score 50), successful ping (score
> 10000) and running tomcat (score 1000). There's also the resource
> stickiness set to 100. Implicit constraints include collocation of the
> postgres group with the DRBD master instance.
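
The stickiness mentioned here is presumably configured via rsc_defaults,
which is not shown in the snippet above; something along the lines of:

  rsc_defaults resource-stickiness="100"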
>
> The ping check works fine: if I unplug the external LAN cable or use
> iptables to block pings, everything gets moved to another node.
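
Blocking the pings with iptables can be done, for example, with a rule like
this on the node under test (the exact command is not given in the mail,
this is just one possibility):

  iptables -A OUTPUT -p icmp -d 193.233.59.2 -j DROP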
>
> The check for tomcat isn't working for some reason, though:
>
> [root at srvplan1 bin]# crm_mon -1
> ============
> Last updated: Fri Jun 22 10:06:59 2012
> Last change: Fri Jun 22 09:43:16 2012 via cibadmin on srvplan1
> Stack: openais
> Current DC: srvplan1 - partition with quorum
> Version: 1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 17 Resources configured.
> ============
>
> Online: [ srvplan1 srvplan2 ]
>
> Master/Slave Set: ms_drbd_pgdrive [drbd_pgdrive]
> Masters: [ srvplan1 ]
> Slaves: [ srvplan2 ]
> Resource Group: postgres
> pgdrive_fs (ocf::heartbeat:Filesystem): Started srvplan1
> DBIP (ocf::heartbeat:IPaddr2): Started srvplan1
> postgresql (ocf::heartbeat:pgsql): Started srvplan1
> Clone Set: pings [ping]
> Started: [ srvplan1 srvplan2 ]
> Clone Set: tomcats [tomcat]
> Started: [ srvplan2 ]
> Stopped: [ tomcat:0 ]
>
> Failed actions:
> tomcat:0_start_0 (node=srvplan1, call=37, rc=-2, status=Timed Out): unknown exec error
>
> As you can see, tomcat is stopped on srvplan1 (I have deliberately
> messed up the startup scripts), but everything else still runs there.
> ptest -L -s shows:
>
> clone_color: ms_drbd_pgdrive allocation score on srvplan1: 10350
> clone_color: ms_drbd_pgdrive allocation score on srvplan2: 10000
> clone_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
> clone_color: drbd_pgdrive:0 allocation score on srvplan2: 0
> clone_color: drbd_pgdrive:1 allocation score on srvplan1: 0
> clone_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
> native_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
> native_color: drbd_pgdrive:0 allocation score on srvplan2: 0
> native_color: drbd_pgdrive:1 allocation score on srvplan1: -INFINITY
> native_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
> drbd_pgdrive:0 promotion score on srvplan1: 30700
> drbd_pgdrive:1 promotion score on srvplan2: 30000
> group_color: postgres allocation score on srvplan1: 0
> group_color: postgres allocation score on srvplan2: 0
> group_color: pgdrive_fs allocation score on srvplan1: 100
> group_color: pgdrive_fs allocation score on srvplan2: 0
> group_color: DBIP allocation score on srvplan1: 10150
> group_color: DBIP allocation score on srvplan2: 10000
> group_color: postgresql allocation score on srvplan1: 100
> group_color: postgresql allocation score on srvplan2: 0
> native_color: pgdrive_fs allocation score on srvplan1: 20450
> native_color: pgdrive_fs allocation score on srvplan2: -INFINITY
> clone_color: tomcats allocation score on srvplan1: -INFINITY
> clone_color: tomcats allocation score on srvplan2: 0
> clone_color: tomcat:0 allocation score on srvplan1: -INFINITY
> clone_color: tomcat:0 allocation score on srvplan2: 0
> clone_color: tomcat:1 allocation score on srvplan1: -INFINITY
> clone_color: tomcat:1 allocation score on srvplan2: 100
> native_color: tomcat:1 allocation score on srvplan1: -INFINITY
> native_color: tomcat:1 allocation score on srvplan2: 100
> native_color: tomcat:0 allocation score on srvplan1: -INFINITY
> native_color: tomcat:0 allocation score on srvplan2: -INFINITY
> native_color: DBIP allocation score on srvplan1: 9250
> native_color: DBIP allocation score on srvplan2: -INFINITY
> native_color: postgresql allocation score on srvplan1: 100
> native_color: postgresql allocation score on srvplan2: -INFINITY
> clone_color: pings allocation score on srvplan1: 0
> clone_color: pings allocation score on srvplan2: 0
> clone_color: ping:0 allocation score on srvplan1: 100
> clone_color: ping:0 allocation score on srvplan2: 0
> clone_color: ping:1 allocation score on srvplan1: 0
> clone_color: ping:1 allocation score on srvplan2: 100
> native_color: ping:0 allocation score on srvplan1: 100
> native_color: ping:0 allocation score on srvplan2: 0
> native_color: ping:1 allocation score on srvplan1: -INFINITY
> native_color: ping:1 allocation score on srvplan2: 100
>
> Why is the score for DBIP -INFINITY on srvplan2? The only INF
> rule in my config is the collocation rule for the postgres group. I
> can surmise that DBIP can't run on srvplan2 because DRBD isn't
> Master there, but there's nothing preventing it from being promoted
> there, and that rule doesn't stop DBIP from being moved in case of a
> ping failure either. So there must be something else.
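
That something else is the group itself: members of a group are chained
together with implicit INFINITY colocation and ordering constraints, so
DBIP inherits the -INFINITY that pgdrive_fs has on srvplan2 (the non-Master
node). Roughly, the group behaves as if constraints like the following
existed (illustrative IDs, not part of the actual CIB):

  colocation DBIP-with-fs inf: DBIP pgdrive_fs
  order fs-before-DBIP inf: pgdrive_fs DBIP
  colocation pgsql-with-DBIP inf: postgresql DBIP
  order DBIP-before-pgsql inf: DBIP postgresql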
>
> I also don't quite understand why the DBIP score is 9250 on srvplan1.
> It should be at least 10000 for the ping, and 250 more for preference
> and stickiness. If I migrate the DBIP to srvplan2 manually, the score
> is 10200 there, which makes me think that 1000 gets subtracted because
> tomcat is stopped on srvplan1. But why? This is a positive rule, not a
> negative one. It should just add 1000 if tomcat is running, but it
> shouldn't subtract anything if it isn't, or am I wrong?
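
One plausible breakdown of the 9250, assuming a finite colocation score is
indeed subtracted when the with-resource cannot run on a node (which is the
behaviour the inf: suggestion above works around):

  10150  (group_color of DBIP on srvplan1: pingd + location + stickiness)
  - 1000 (DBIP-on-web: tomcats cannot run on srvplan1)
  +  100 (stickiness contributed back by another member of the group)
  = 9250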
>
> Does this have anything to do with the fact that I'm trying to collocate
> the IP with a clone? Or am I looking in the wrong direction?
>
> I tried removing DBIP from the group, and it got moved to the other
> node. Obviously, everything else stayed on the first one. Then I
> tried adding a collocation of DBIP with the postgres resources (and the
> other way around), and if the score of that rule is high enough, the
> IP gets moved back, but I was never able to get postgres moved to the
> second node (where the IP is) instead.
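
The two directions tried presumably looked something like this
(illustrative IDs and scores; the actual constraints are not shown in the
mail):

  colocation ip-with-postgres inf: DBIP postgres
  colocation postgres-with-ip inf: postgres DBIP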
>