[Pacemaker] Question regarding starting of master/slave resources and ELECTIONs
Bob Schatz
bschatz at yahoo.com
Wed Apr 13 17:19:50 UTC 2011
Andrew,
Thanks for responding. Comments inline, marked with <Bob>.
________________________________
From: Andrew Beekhof <andrew at beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Cc: Bob Schatz <bschatz at yahoo.com>
Sent: Tue, April 12, 2011 11:23:14 PM
Subject: Re: [Pacemaker] Question regarding starting of master/slave resources
and ELECTIONs
On Wed, Apr 13, 2011 at 4:54 AM, Bob Schatz <bschatz at yahoo.com> wrote:
> Hi,
> I am running Pacemaker 1.0.9 with Heartbeat 3.0.3.
> I create 5 master/slave resources in /etc/ha.d/resource.d/startstop during
> post-start.
I had no idea this was possible. Why would you do this?
<Bob> We, and a couple of other companies I know of, bundle Linux-HA/Pacemaker
into an appliance. In our case, when the appliance boots, it creates HA resources
based on the hardware it discovers. I assumed that once POST-START was called
in the startstop script and we have a DC, the cluster is up and running. I
then use "crm" commands to create the configuration, etc. I further assumed
that, since there is one DC in the cluster, all "crm" commands which modify
the configuration would be ordered, even if the DC fails over to a different
node. Is this incorrect?
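For illustration, a boot-time provisioning step along these lines might look like the sketch below. The readiness check and the discovered-device variable are assumptions, not Bob's actual script; the resource definition mirrors the configuration quoted later in the message.

```shell
#!/bin/sh
# Hypothetical sketch of appliance boot-time provisioning.
# Assumptions: crmadmin/crm from Pacemaker 1.0 are on PATH, and
# DEV_ID stands in for whatever hardware discovery actually finds.

# Wait until the local crmd reports a settled state (DC elected).
while ! crmadmin -S "$(uname -n)" 2>/dev/null | grep -q 'S_IDLE\|S_NOT_DC'; do
    sleep 2
done

DEV_ID="J000030312"   # discovered from hardware in the real script

# Create the primitive and its master/slave wrapper via the crm shell.
crm configure primitive "SS${DEV_ID}" ocf:omneon:ss \
    params ss_resource="SS${DEV_ID}" \
           ssconf="/var/omneon/config/config.${DEV_ID}" \
    op monitor interval="3s" role="Master" timeout="7s" \
    op monitor interval="10s" role="Slave" timeout="7s" \
    op stop interval="0" timeout="20" \
    op start interval="0" timeout="300"

crm configure ms "ms-SS${DEV_ID}" "SS${DEV_ID}" \
    meta clone-max="2" notify="true" globally-unique="false"
```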
> I noticed that 4 of the master/slave resources start right away, but the
> 5th master/slave resource seems to take a minute or so, and I am only running
> with one node.
> Is this expected?
Probably, if the other 4 take around a minute each to start.
There is an lrmd config variable that controls how much parallelism it
allows (but I forget the name).
<Bob> It's max-children, and I set it to 40 for this test to see if it would
change the behavior (/sbin/lrmadmin -p max-children 40).
> My configuration is below and I have also attached ha-debug.
> Also, what triggers a crmd election?
Node up/down events and whenever someone replaces the cib (which the
shell used to do a lot).
<Bob> For my test, I only started one node so that I could avoid node up/down
events. I believe the log shows the CIB being replaced. Since I am using crm,
I assume it must be due to crm. Do the crm_resource, etc. commands also
replace the CIB? Would using them avoid the elections caused by CIB replacement?
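As a sketch of the distinction being asked about (assuming the Pacemaker 1.0 command-line tools): targeted updates are shipped to the CIB as diffs, whereas a full replace is what Andrew describes the shell doing and what can trigger a new election.

```shell
# Targeted update: change a single instance attribute of one resource.
# Only the matched element is modified; no full CIB replacement occurs.
crm_resource --resource SSJ000030312 \
    --set-parameter ssconf \
    --parameter-value /var/omneon/config/config.J000030312

# By contrast, this replaces the entire CIB (the operation the crm shell
# used to perform internally, and a trigger for re-election):
cibadmin --replace --xml-file /tmp/new-cib.xml
```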
Thanks,
Bob
> I seemed to have a lot of elections in
> the attached log. I was assuming that on a single node I would only run the
> election once in the beginning and then there would not be another one until
> a new node joined.
>
> Thanks,
> Bob
>
> My configuration is:
> node $id="856c1f72-7cd1-4906-8183-8be87eef96f2" mgraid-s000030311-1
> primitive SSJ000030312 ocf:omneon:ss \
> params ss_resource="SSJ000030312" ssconf="/var/omneon/config/config.J000030312" \
> op monitor interval="3s" role="Master" timeout="7s" \
> op monitor interval="10s" role="Slave" timeout="7" \
> op stop interval="0" timeout="20" \
> op start interval="0" timeout="300"
> primitive SSJ000030313 ocf:omneon:ss \
> params ss_resource="SSJ000030313" ssconf="/var/omneon/config/config.J000030313" \
> op monitor interval="3s" role="Master" timeout="7s" \
> op monitor interval="10s" role="Slave" timeout="7" \
> op stop interval="0" timeout="20" \
> op start interval="0" timeout="300"
> primitive SSJ000030314 ocf:omneon:ss \
> params ss_resource="SSJ000030314" ssconf="/var/omneon/config/config.J000030314" \
> op monitor interval="3s" role="Master" timeout="7s" \
> op monitor interval="10s" role="Slave" timeout="7" \
> op stop interval="0" timeout="20" \
> op start interval="0" timeout="300"
> primitive SSJ000030315 ocf:omneon:ss \
> params ss_resource="SSJ000030315" ssconf="/var/omneon/config/config.J000030315" \
> op monitor interval="3s" role="Master" timeout="7s" \
> op monitor interval="10s" role="Slave" timeout="7" \
> op stop interval="0" timeout="20" \
> op start interval="0" timeout="300"
> primitive SSS000030311 ocf:omneon:ss \
> params ss_resource="SSS000030311" ssconf="/var/omneon/config/config.S000030311" \
> op monitor interval="3s" role="Master" timeout="7s" \
> op monitor interval="10s" role="Slave" timeout="7" \
> op stop interval="0" timeout="20" \
> op start interval="0" timeout="300"
> primitive icms lsb:S53icms \
> op monitor interval="5s" timeout="7" \
> op start interval="0" timeout="5"
> primitive mgraid-stonith stonith:external/mgpstonith \
> params hostlist="mgraid-canister" \
> op monitor interval="0" timeout="20s"
> primitive omserver lsb:S49omserver \
> op monitor interval="5s" timeout="7" \
> op start interval="0" timeout="5"
> ms ms-SSJ000030312 SSJ000030312 \
> meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> ms ms-SSJ000030313 SSJ000030313 \
> meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> ms ms-SSJ000030314 SSJ000030314 \
> meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> ms ms-SSJ000030315 SSJ000030315 \
> meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> ms ms-SSS000030311 SSS000030311 \
> meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> clone Fencing mgraid-stonith
> clone cloneIcms icms
> clone cloneOmserver omserver
> location ms-SSJ000030312-master-w1 ms-SSJ000030312 \
> rule $id="ms-SSJ000030312-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
> location ms-SSJ000030313-master-w1 ms-SSJ000030313 \
> rule $id="ms-SSJ000030313-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
> location ms-SSJ000030314-master-w1 ms-SSJ000030314 \
> rule $id="ms-SSJ000030314-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
> location ms-SSJ000030315-master-w1 ms-SSJ000030315 \
> rule $id="ms-SSJ000030315-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
> location ms-SSS000030311-master-w1 ms-SSS000030311 \
> rule $id="ms-SSS000030311-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
> order orderms-SSJ000030312 0: cloneIcms ms-SSJ000030312
> order orderms-SSJ000030313 0: cloneIcms ms-SSJ000030313
> order orderms-SSJ000030314 0: cloneIcms ms-SSJ000030314
> order orderms-SSJ000030315 0: cloneIcms ms-SSJ000030315
> order orderms-SSS000030311 0: cloneIcms ms-SSS000030311
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
> cluster-infrastructure="Heartbeat" \
> dc-deadtime="5s" \
> stonith-enabled="true"
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>