[Pacemaker] Pacemaker 1.1.6 order possible bug ?

Wed Sep 5 09:44:09 UTC 2012

On Wed, Sep 05, 2012 at 07:51:35AM +1000, Andrew Beekhof wrote:
> On Mon, Sep 3, 2012 at 3:41 PM, Tomáš Vavřička <vavricka at ttc.cz> wrote:
> > Hello,
> >
> > Sorry if I am sending the same question twice, but my message did not
> > appear on the mailing list.
> >
> > I have a problem with order constraints in pacemaker 1.1.6 and corosync 1.4.1.
> >
> > The order below works for failover, but it does not work when one cluster
> > node starts up (DRBD stays in Slave state and ms_toponet is started before
> > DRBD gets promoted).
> >
> > order o_start inf: ms_drbd_postgres:promote postgres:start ms_toponet:promote monitor_cluster:start
> >
> > The order below does not work for failover (it kills the slave toponet app
> > and starts it again), but it works correctly when the cluster starts up.
> >
> > order o_start inf: ms_drbd_postgres:promote postgres:start ms_toponet:start ms_toponet:promote monitor_cluster:start
> 
> I would recommend breaking this into "basic" constraints.
> The shell syntax for constraint sets has been a source of confusion for a while.

Nothing's wrong with the shell syntax here. I believe that this
has been discussed before. When in doubt about what the shell does,
just use "show xml".

> order o1 inf: ms_drbd_postgres:promote postgres:start
> order o2 inf: postgres:start ms_toponet:start
> order o3 inf: ms_toponet:start ms_toponet:promote
> order o4 inf: ms_toponet:promote monitor_cluster:start
> 
> If you still have problems with the expanded form, let us know.

Resource sets are not an issue in order constraints, but rather
in colocation constraints.
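
To illustrate why the chained order and the expanded form quoted above
should describe the same sequence, here is a minimal Python sketch that
splits a chain into its adjacent pairs. It is not part of the crm shell;
the function name expand_order_chain and the o1, o2, ... ids are made up
for illustration, and it assumes a plain sequence of resource:action items
with no extra options such as symmetrical.

```python
def expand_order_chain(constraint, prefix="o"):
    """Split 'order <id> <score>: r1:act r2:act ...' into adjacent pairs.

    Hypothetical helper for illustration only; it does no validation
    beyond checking the leading keyword.
    """
    # Everything before the first ':' is the header, the rest is the chain.
    head, _, chain = constraint.partition(":")
    keyword, _old_id, score = head.split()
    if keyword != "order":
        raise ValueError("not an order constraint: %r" % constraint)
    steps = chain.split()
    # Pair each step with its successor: (s1, s2), (s2, s3), ...
    return [
        "order %s%d %s: %s %s" % (prefix, i, score, first, then)
        for i, (first, then) in enumerate(zip(steps, steps[1:]), start=1)
    ]


chain = ("order o_start inf: ms_drbd_postgres:promote postgres:start "
         "ms_toponet:start ms_toponet:promote monitor_cluster:start")
for line in expand_order_chain(chain):
    print(line)
```

For the second o_start variant from the original post, this prints the
same four o1..o4 constraints that Andrew suggested.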

Thanks,

Dejan

> >
> > I want Pacemaker to behave as it did in version 1.0.12:
> > * when the toponet master app is killed, move the postgres resource to the
> > other node and promote ms_toponet and ms_drbd_postgres to Master
> > * when a node is starting, promote DRBD to Master if it is UpToDate
> >
> > Am I doing something wrong?
> >
> > It looks to me as if pacemaker ignores some order constraints (pacemaker
> > should wait for DRBD promotion before starting the toponet app, but the
> > toponet app is started right after DRBD starts as Slave). I tried to solve
> > this with different orders combined with symmetrical=false, split orders,
> > and separate orders for start and stop, but without success (it seems to me
> > that the symmetrical=false directive is completely ignored).
> >
> > Pacemaker 1.1.7 does not work for me, because its on-fail directive is
> > broken.
> >
> > crm_mon output:
> >
> > ============
> > Last updated: Fri Aug 31 14:51:11 2012
> > Last change: Fri Aug 31 14:50:27 2012 by hacluster via crmd on toponet30
> > Stack: openais
> > Current DC: toponet30 - partition WITHOUT quorum
> > Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e
> > 2 Nodes configured, 2 expected votes
> > 10 Resources configured.
> > ============
> >
> > Online: [ toponet30 toponet31 ]
> >
> > st_primary      (stonith:external/xen0):        Started toponet30
> > st_secondary    (stonith:external/xen0):        Started toponet31
> >  Master/Slave Set: ms_drbd_postgres
> >      Masters: [ toponet30 ]
> >      Slaves: [ toponet31 ]
> >  Resource Group: postgres
> >      pg_fs      (ocf::heartbeat:Filesystem):    Started toponet30
> >      PGIP       (ocf::heartbeat:IPaddr2):       Started toponet30
> >      postgresql (ocf::heartbeat:pgsql): Started toponet30
> > monitor_cluster (ocf::heartbeat:monitor_cluster):       Started toponet30
> >  Master/Slave Set: ms_toponet
> >      Masters: [ toponet30 ]
> >      Slaves: [ toponet31 ]
> >
> > configuration:
> >
> > node toponet30
> > node toponet31
> > primitive PGIP ocf:heartbeat:IPaddr2 \
> >         params ip="192.168.100.3" cidr_netmask="29" \
> >         op monitor interval="5s"
> > primitive drbd_postgres ocf:linbit:drbd \
> >         params drbd_resource="postgres" \
> >         op start interval="0" timeout="240s" \
> >         op stop interval="0" timeout="120s" \
> >         op monitor interval="5s" role="Master" timeout="10s" \
> >         op monitor interval="10s" role="Slave" timeout="20s"
> > primitive monitor_cluster ocf:heartbeat:monitor_cluster \
> >         op monitor interval="30s" \
> >         op start interval="0" timeout="30s" \
> >         meta target-role="Started"
> > primitive pg_fs ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"
> > primitive postgresql ocf:heartbeat:pgsql \
> >         op start interval="0" timeout="80s" \
> >         op stop interval="0" timeout="60s" \
> >         op monitor interval="10s" timeout="10s" depth="0"
> > primitive st_primary stonith:external/xen0 \
> >         op start interval="0" timeout="60s" \
> >         params hostlist="toponet31:/etc/xen/vm/toponet31" dom0="172.16.103.54"
> > primitive st_secondary stonith:external/xen0 \
> >         op start interval="0" timeout="60s" \
> >         params hostlist="toponet30:/etc/xen/vm/toponet30" dom0="172.16.103.54"
> > primitive toponet ocf:heartbeat:toponet \
> >         op start interval="0" timeout="180s" \
> >         op stop interval="0" timeout="60s" \
> >         op monitor interval="10s" role="Master" timeout="20s" on-fail="standby" \
> >         op monitor interval="20s" role="Slave" timeout="40s" \
> >         op promote interval="0" timeout="120s" \
> >         op demote interval="0" timeout="120s"
> > group postgres pg_fs PGIP postgresql
> > ms ms_drbd_postgres drbd_postgres \
> >         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> > ms ms_toponet toponet \
> >         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" target-role="Master"
> > location loc_st_pri st_primary -inf: toponet31
> > location loc_st_sec st_secondary -inf: toponet30
> > location master-prefer-node1 postgres 100: toponet30
> > colocation pg_on_drbd inf: monitor_cluster ms_toponet:Master postgres ms_drbd_postgres:Master
> > order o_start inf: ms_drbd_postgres:start ms_drbd_postgres:promote postgres:start ms_toponet:start ms_toponet:promote monitor_cluster:start
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         no-quorum-policy="ignore" \
> >         stonith-enabled="true"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="5000"
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org