[Pacemaker] Pacemaker failover delays (followup)

Andrew Beekhof andrew at beekhof.net
Tue Mar 12 21:35:18 EDT 2013


On Sat, Mar 9, 2013 at 9:50 AM, Michael Powell <Michael.Powell at harmonicinc.com> wrote:

> Andrew,
>
> Thanks for the feedback to my earlier questions from March 6th.  I've done
> some further investigation wrt the timing of what I'd call the "simple"
> failover case: where an SSID that is master on the DC node is killed, and
> it takes 10-12 seconds before the slave SSID on the other node transitions
> to master.  (Recall that "SSID" is a SliceServer app instance, each of
> which is abstracted as a Pacemaker resource.)
>
> Before going into my findings, I want to clear up a couple of
> misstatements on my part.
>
> - WRT my mention of "notifications" in my earlier e-mail, I misused the
>   term.  I was simply referring to the "notify" events passed from the DC
>   to the other node.
> - I also misspoke when I said that the failed SSID was subsequently
>   restarted as a result of a monitor event.  In fact, the SSID process is
>   restarted by the "ss" resource agent script in response to a "start"
>   event from lrmd (a rough sketch of how the agent is driven follows below).
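>
> (For context, an OCF-style agent such as ss is simply executed by lrmd
> with the requested action as its first argument; the overall shape is
> roughly the following.  The comments and stubbed-out bodies are
> illustrative, not the actual ss internals.)
>
>     case "$1" in
>         start)                  # lrmd asking the agent to (re)start the SSID process
>             # ... launch the SSID process here ...
>             exit 0 ;;           # OCF_SUCCESS
>         monitor)
>             # ... check the health/role of the SSID process ...
>             exit 0 ;;
>         promote|demote|stop)
>             # ... role changes and shutdown ...
>             exit 0 ;;
>         notify)                 # "notify" events passed from the DC land here
>             exit 0 ;;
>         *)
>             exit 3 ;;           # OCF_ERR_UNIMPLEMENTED
>     esac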
>
> The key issue, however, is the time required – 10 to 12 seconds – from the
> time the master SSID is killed until the slave fails over to become
> master.  You opined that the time required would largely depend upon the
> behavior of the resource agent, which in our case is a script called "ss".
> To determine what effect the ss script's execution would have, I modified it
> to log the current monotonic system clock value each time it starts, and
> just before it exits.  The log messages specify the clock value in ms.
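>
> (In case the detail matters: the instrumentation amounts to something like
> the following.  The helper name and the use of /proc/uptime are just one
> way of getting a monotonic millisecond value from a shell script; treat it
> as a sketch rather than the exact code now in ss.)
>
>     # Milliseconds since boot, taken from the first field of /proc/uptime
>     # (seconds with 10 ms resolution); this clock never jumps backwards.
>     now_ms() {
>         awk '{ printf "%d", $1 * 1000 }' /proc/uptime
>     }
>
>     logger -t ss "ss $1 begin t=$(now_ms)ms"
>     # ... existing body of the ss agent ...
>     logger -t ss "ss $1 end t=$(now_ms)ms"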
>
> From this, I did find several instances where the ss script would take
> just over a second to complete execution.  In each such case, the "culprit"
> is an exec of "crm_node -p", which is called to determine how many nodes
> are presently in the cluster.  (I've verified this timing independently by
> executing "crm_node -p" from a command line when the cluster is
> quiescent.)  This seems like a rather long time for a simple objective.
> What would "crm_node -p" do that would take so long?
>

If support for corosync is compiled in, probably trying to connect to it
first.  It's not as smart as 1.1.x.

Otherwise it's just waiting for heartbeat to answer.
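
For what it's worth, crm_node -p just prints the space-separated list of
members of the current partition, so both the cost of the call and the node
count it is being used for can be checked straight from a shell, e.g.:

    # how long one invocation takes (what the ss agent pays per exec)
    time crm_node -p

    # the node count the agent is after is just a word count of the output
    crm_node -p | wc -w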


>
> That notwithstanding, from the POV of the slave during the failover, there
> are delays of several hundred ms to about 1400 ms between the completion of
> the ss script and its invocation for the next event.  To explain, I've
> attached an Excel spreadsheet (which I've verified is virus-free) that
> documents two experiments.  In each case, there's an SSID instance that's
> master on node-0, the DC, and which is killed.  The spreadsheet includes a
> synopsis of the log messages that follow on both nodes, interleaved into a
> timeline.
>
> By way of explanation, columns B-D contain timestamp information for
> node-0 and columns E-G for node-1.  Columns B/E show the current time of
> day, C/F show the monotonic clock value when the ss script begins execution
> (in ms, truncated to the least-significant 5 digits), and D/G show the
> duration of the ss script execution for the relevant event.  Column H shows
> the key text extracted from the log.  In some cases there is a significant
> amount of information in the log file relating to pengine behavior, but I
> omitted such information from the spreadsheet.  Column I contains
> explanatory comments.
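>
> (For anyone trying to reproduce the gap numbers from the raw logs:
> something like the following would pair each "end" timestamp with the next
> "begin".  It assumes the begin/end log lines sketched earlier, so treat the
> field handling as illustrative.)
>
>     awk -F't=' '
>         /end t=/   { sub(/ms.*/, "", $2); prev_end = $2 }
>         /begin t=/ { if (prev_end != "") {
>                          sub(/ms.*/, "", $2)
>                          printf "%d ms from previous end to this begin\n", $2 - prev_end
>                      } }
>     ' /var/log/messages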
>
> Realizing that we need to look forward to upgrading our Pacemaker version
> (from 1.0.9), I wonder if you can clear up a couple of questions.  We are
> presently using Heartbeat, which I believe restricts our upgrade to the 1.0
> branch, correct?
>

Nope.  http://blog.clusterlabs.org/blog/2012/can-pacemaker-118-be-used-with/


> In other words, if we want to upgrade to the 1.1 branch, are we required
> to replace Heartbeat with Corosync?  Secondly, when upgrading, are there
> kernel dependencies to worry about?
>

Not unless you're using OCFS2/GFS2.


> We are presently running on the open source kernel version 2.6.18.  We
> plan to migrate to the most current 2.8 or 3.0 version later this year, at
> which time it would probably make sense to bring Pacemaker up to date.
>
> I apologize for the length of this posting, and again appreciate any
> assistance you can offer.
>
> Regards,
>
>   Michael Powell
>   Staff Engineer
>
>   15220 NW Greenbrier Pkwy, Suite 290
>   Beaverton, OR 97006
>   T 503-372-7327    M 503-789-3019   H 503-625-5332
>   www.harmonicinc.com
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>