[Pacemaker] Question regarding starting of master/slave resources and ELECTIONs

Andrew Beekhof andrew at beekhof.net
Sat Apr 16 09:32:08 UTC 2011


On Sat, Apr 16, 2011 at 12:30 AM, Bob Schatz <bschatz at yahoo.com> wrote:
> Andrew,
> Comments at end with <BS>
> ________________________________
> From: Andrew Beekhof <andrew at beekhof.net>
> To: Bob Schatz <bschatz at yahoo.com>
> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Sent: Fri, April 15, 2011 4:28:52 AM
> Subject: Re: [Pacemaker] Question regarding starting of master/slave
> resources and ELECTIONs
>
> On Fri, Apr 15, 2011 at 5:58 AM, Bob Schatz <bschatz at yahoo.com> wrote:
>> Andrew,
>> Thanks for the help
>> Comments inline with <BS>
>> ________________________________
>> From: Andrew Beekhof <andrew at beekhof.net>
>> To: Bob Schatz <bschatz at yahoo.com>
>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>> Sent: Thu, April 14, 2011 2:14:40 AM
>> Subject: Re: [Pacemaker] Question regarding starting of master/slave
>> resources and ELECTIONs
>>
>> On Thu, Apr 14, 2011 at 10:49 AM, Andrew Beekhof <andrew at beekhof.net>
>> wrote:
>>
>>>>> I noticed that 4 of the master/slave resources start right away, but
>>>>> the 5th master/slave resource seems to take a minute or so, and I am
>>>>> only running with one node.
>>>>> Is this expected?
>>>>
>>>> Probably, if the other 4 take around a minute each to start.
>>>> There is an lrmd config variable that controls how much parallelism it
>>>> allows (but I forget the name).
>>>> <Bob> It's max-children and I set it to 40 for this test to see if it
>>>> would
>>>> change the behavior.  (/sbin/lrmadmin -p max-children 40)
>>>
>>> That's surprising.  I'll have a look at the logs.
>>
>> Looking at the logs, I see a couple of things:
>>
>>
>> This is very bad:
>> Apr 12 19:33:42 mgraid-S000030311-1 crmd: [17529]: WARN: get_uuid:
>> Could not calculate UUID for mgraid-s000030311-0
>> Apr 12 19:33:42 mgraid-S000030311-1 crmd: [17529]: WARN:
>> populate_cib_nodes_ha: Node mgraid-s000030311-0: no uuid found
>>
>> For some reason Pacemaker can't get the node's UUID from Heartbeat.
>>
>> <BS> I create the UUID when the node comes up.
>
> Heartbeat should have already created it before Pacemaker even got
> started, though.
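>
> For reference, a quick way to check (assuming the stock Heartbeat paths
> under /var/lib/heartbeat) is to look at what Heartbeat has cached before
> Pacemaker starts:
>
>       # ls -l /var/lib/heartbeat/hb_uuid
>       # cat /var/lib/heartbeat/hostcache
>
> hb_uuid holds this node's own UUID and hostcache maps node names to
> UUIDs; if mgraid-s000030311-0 is missing from hostcache, the crmd
> warnings above are what you would expect to see.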
>
>>
>> So we start a few things:
>>
>> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
>> do_lrm_rsc_op: Performing
>> key=23:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
>> op=SSS000030311:0_start_0 )
>> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
>> do_lrm_rsc_op: Performing
>> key=49:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
>> op=SSJ000030312:0_start_0 )
>> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
>> do_lrm_rsc_op: Performing
>> key=75:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
>> op=SSJ000030313:0_start_0 )
>> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
>> do_lrm_rsc_op: Performing
>> key=101:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
>> op=SSJ000030314:0_start_0 )
>>
>> But then another change comes in:
>>
>> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
>> abort_transition_graph: need_abort:59 - Triggered transition abort
>> (complete=0) : Non-status change
>>
>> Normally we'd recompute and keep going, but it was a(nother) replace
>> operation, so:
>>
>> Apr 12 19:33:42 mgraid-S000030311-1 crmd: [17529]: info:
>> do_state_transition: State transition S_TRANSITION_ENGINE ->
>> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
>> origin=do_cib_replaced ]
>>
>> All the time goes here:
>>
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Timer popped (timeout=20000,
>> abort_level=1000000, complete=true)
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Ignoring timeout while not in transition
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Timer popped (timeout=20000,
>> abort_level=1000000, complete=true)
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Ignoring timeout while not in transition
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Timer popped (timeout=20000,
>> abort_level=1000000, complete=true)
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Ignoring timeout while not in transition
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Timer popped (timeout=20000,
>> abort_level=1000000, complete=true)
>> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
>> action_timer_callback: Ignoring timeout while not in transition
>> Apr 12 19:37:00 mgraid-S000030311-1 crmd: [17529]: ERROR:
>> crm_timer_popped: Integration Timer (I_INTEGRATED) just popped!
>>
>> but it's not at all clear to me why - although certainly avoiding the
>> election would help.
>> Is there any chance to load all the changes at once?
>>
>> <BS> Yes.  That worked.  I created the configuration in a file and then
>> did a "crm configure load update <filename>" to avoid the election.
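>> Roughly, I dumped the configuration from an already configured system
>> and loaded it on the new one (the file name here is just an example):
>>
>>       # crm configure show > /tmp/mgraid.cli
>>       # crm configure load update /tmp/mgraid.cli
>>
>> That way the whole configuration goes in as a single update instead of
>> a series of replace operations, each of which was triggering an
>> election.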
>> Possibly the delay is related to the UUID issue above, or it might
>> be related to one of these two patches that went in after 1.0.9:
>>
>> andrew (stable-1.0)    High: crmd: Make sure we always poke the FSA after
>> a transition to clear any TE_HALT actions CS: 9187c0506fd3 On:
>> 2010-07-07
>> andrew (stable-1.0)    High: crmd: Reschedule the PE_START action if its
>> not already running when we try to use it CS: e44dfe49e448 On:
>> 2010-11-11
>>
>> Could you try turning on debug and/or a more recent version?
>>
>> <BS>  I turned on debug and grabbed the logs, configuration and the
>> /var/lib/pengine directory.   They are attached.
>>      Unfortunately I cannot try a new version with this hardware at this
>> time. :(
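>> (One way to do both, assuming a stock Heartbeat/Pacemaker install: add
>> "debug 1" to /etc/ha.d/ha.cf and restart Heartbeat, then collect the
>> logs, CIB and pengine files in one go with hb_report; the timestamp
>> and destination below are only examples.)
>>
>>       # hb_report -f "2011-04-14 20:00" /tmp/mgraid-report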
>
> Based on the new logs, it looks like the problem is resolved by
> loading everything at once.
> All start actions appear to occur within a second of Apr 14 20:35:43.
>
> <BS>  This run did not create the resources all at once.  It did start the
> resources at Apr 14 20:35:43.
> However, when looking at the log debug/ctlr0/ha-debug you will see these
> entries (note that this node is not running the pengine):
> Apr 14 20:35:52 mgraid-S000030311-0 lrmd: [15228]: debug:
> on_msg_get_state:state of rsc mgraid-stonith:0 is LRM_RSC_IDLE
> Apr 14 20:35:52 mgraid-S000030311-0 lrmd: [15228]: debug:
> on_msg_get_state:state of rsc SSJ000030313:0 is LRM_RSC_BUSY
> Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: WARN: msg_to_op(1324):
> failed to get the value of field lrm_opstatus from a ha_msg
> Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: msg_to_op: Message
> follows:
> Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG: Dumping
> message with 16 fields
> Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG[0] : [lrm_t=op]
> Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG[1] :
> [lrm_rid=SSJ000030313:0]
> Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG[2] :
> [lrm_op=start]
> This was reported as bug report 2580.
> Later, this resource was stopped by the pengine for some reason.
> Unfortunately, I grabbed the logs before the resources such
> as SSJ000030313:0 were stopped.  :( :(

Oh.

> Anyway, I was able to get things to work correctly as summarized below.
> I think that 2580 covers all of the issues that need to be fixed.  Do you
> agree?

Yes.

> In the meantime, here is a summary of the workaround steps I used:
> 1. Increase the number of children lrmd can fork to run the resource agent
> scripts for start, monitor, etc., with this command in
> /etc/ha.d/resource.d/startstop (post-start action):
>        #  /sbin/lrmadmin -p max-children 8
> 2. Create all of the resources needed in one file (basically take the output
> from a "crm configure show" on an already configured system and copy it to a
> file, as sketched below) as opposed to creating each resource and its
> associated linkage with individual crm commands.  This avoids the "replace"
> step which may happen in crm.  The "replace" operation causes the crmds to
> hold an election, which is what causes my problems.  (I assume it is a bug
> related to 2580, but I am not sure.)
> 3. Do an "update" as opposed to a "replace" of the HA configuration with
> this step:
>       # crm configure load update <filename>
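>
> A minimal sketch of what one master/slave entry in <filename> can look
> like (the agent name, ms name and parameters here are placeholders, not
> my actual configuration):
>
>       primitive SSS000030311 ocf:heartbeat:SomeAgent \
>               op monitor interval="10s" role="Master" timeout="20s" \
>               op monitor interval="20s" role="Slave" timeout="20s"
>       ms ms-SSS000030311 SSS000030311 \
>               meta master-max="1" clone-max="2" notify="true"
>
> Repeat such a block for each of the five master/slave resources in the
> same file before running step 3.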
>
> Thanks Andrew for your help!
> Bob