[Pacemaker] Decreasing failover time when running DRBD+OCFS2+XEN in dual primary mode

Fri Jun 27 08:14:35 CEST 2014

On 13 Jun 2014, at 1:25 pm, kamal kishi <kamal.kishi at gmail.com> wrote:

> Fine Andrew, I'll check it out, but do the timeouts configured in Pacemaker affect this?

No, the timeouts only put an upper bound on how long an action can take before we decide it has failed.
Well, OK, the monitor timeout could play a role here.

> Which part of the timing configuration does Pacemaker use to decide that the other node is actually down and that its resources should be taken over?

None. Corosync/cman/heartbeat have their own timings which they use to decide if a node is dead or not.
All pacemaker gets is "up" or "down".
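
To see where the 90 to 110 seconds actually go, it helps to measure the gaps between consecutive log lines around the failover. The following is a minimal Python sketch, not something shipped with Pacemaker or Corosync. It assumes the log uses the classic syslog timestamp prefix (e.g. "Jun 13 05:22:01"), and the names parse_timestamp and largest_gaps are made up for illustration. A long gap before the surviving node notices the peer is gone would point at the membership layer's timings; long gaps after that would point at individual stop/start actions.

```python
#!/usr/bin/env python3
"""Report the largest time gaps between consecutive lines of a syslog-style log."""
import sys
from datetime import datetime


def parse_timestamp(line, year=2014):
    """Parse a leading 'Mon DD HH:MM:SS' stamp; return None if the line has none.

    Syslog stamps carry no year, so one is assumed (hypothetical default).
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gaps(lines, top=5):
    """Return up to `top` (gap_seconds, line_before, line_after), largest first."""
    stamped = []
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:  # skip lines without a recognisable timestamp
            stamped.append((ts, line.rstrip("\n")))
    gaps = [
        ((b_ts - a_ts).total_seconds(), a_line, b_line)
        for (a_ts, a_line), (b_ts, b_line) in zip(stamped, stamped[1:])
    ]
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for seconds, before, after in largest_gaps(handle):
            print(f"{seconds:6.0f}s between:\n    {before}\n    {after}")
```

Run it as `python3 log_gaps.py <logfile>` (the script name is arbitrary) against whichever file your cluster stack logs to. Because the year is assumed, a gap that spans New Year would be measured wrongly.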

> 
> And Alexis, I'm not facing any issues when putting a node into standby mode.
> I'm using DRBD 8.3.11 (apt-get install drbd8-utils=2:8.3.11-0ubuntu1)
> I had to force the install to that particular version because the current download/patch is not compatible with Pacemaker.
> Try installing 8.3.11 yourself and check again. All the best.
> 
> 
> On Fri, Jun 13, 2014 at 5:22 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> On 12 Jun 2014, at 9:15 pm, kamal kishi <kamal.kishi at gmail.com> wrote:
> 
> > Hi All,
> >
> > This might be a basic question, but I'm not sure what is taking so long during failover.
> > I hope someone can figure it out.
> 
> How about looking in the logs and seeing when the various stop/start actions occur and which ones take the longest?
> 
> >
> > Scenario -
> > Pacemaker running DRBD (dual-primary mode) + OCFS2 + XEN for a Windows virtual machine
> >
> > Pacemaker starts the resources in this order -
> > DRBD -> OCFS2 -> XEN
> > Let's say that on Server1, DRBD, OCFS2 (clone) and XEN are started
> >
> > On Server2, DRBD and OCFS2 (clone) are started
> >
> > Now suppose Server1 is powered off
> >
> > The XEN resource that was running on Server1 should fail over to Server2.
> >
> > In my case, it's taking almost 90 to 110 seconds to do this.
> >
> > Can anyone suggest ways to reduce it to within 30 to 40 seconds?
> >
> > My pacemaker configuration is -
> > crm configure
> > property no-quorum-policy=ignore
> > property stonith-enabled=false
> > property default-resource-stickiness=1000
> >
> > primitive resDRBDr1 ocf:linbit:drbd \
> >     params drbd_resource="r0" \
> >     op start interval="0" timeout="240s" \
> >     op stop interval="0" timeout="100s" \
> >     op monitor interval="20s" role="Master" timeout="240s" \
> >     op monitor interval="30s" role="Slave" timeout="240s" \
> >     meta migration-threshold="3" failure-timeout="60s"
> > primitive resOCFS2r1 ocf:heartbeat:Filesystem \
> >     params device="/dev/drbd/by-res/r0" directory="/cluster" fstype="ocfs2" \
> >     op monitor interval="10s" timeout="60s" \
> >     op start interval="0" timeout="90s" \
> >     op stop interval="0" timeout="60s" \
> >     meta migration-threshold="3" failure-timeout="60s"
> > primitive resXen1 ocf:heartbeat:Xen \
> >     params xmfile="/home/cluster/xen/win7.cfg" name="xenwin7" \
> >     op monitor interval="20s" timeout="60s" \
> >     op start interval="0" timeout="90s" \
> >     op stop interval="0" timeout="60s" \
> >     op migrate_from interval="0" timeout="120s" \
> >     op migrate_to interval="0" timeout="120s" \
> >     meta allow-migrate="true" target-role="started"
> >
> > ms msDRBDr1 resDRBDr1 \
> >     meta notify="true" master-max="2" interleave="true" target-role="Started"
> > clone cloOCFS2r1 resOCFS2r1 \
> >     meta interleave="true" ordered="true" target-role="Started"
> >
> > colocation colOCFS12-with-DRBDrMaster inf: cloOCFS2r1 msDRBDr1:Master
> > colocation colXen-with-OCFSr1 inf: resXen1 cloOCFS2r1
> > order ordDRBD-before-OCFSr1 inf: msDRBDr1:promote cloOCFS2r1:start
> > order ordOCFS2r1-before-Xen1 inf: cloOCFS2r1:start resXen1:start
> >
> > commit
> > bye
> >
> > --
> > Regards,
> > Kamal Kishore B V
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
> -- 
> Regards,
> Kamal Kishore B V
