[Pacemaker] Pacemaker failover delays (followup)

David Vossel dvossel at redhat.com
Tue Mar 12 13:56:52 EDT 2013



----- Original Message -----
> From: "Michael Powell" <Michael.Powell at harmonicinc.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Friday, March 8, 2013 4:50:10 PM
> Subject: [Pacemaker] Pacemaker failover delays (followup)
> 
> Andrew,
> 
> Thanks for the feedback on my earlier questions from March 6th. I’ve
> done some further investigation into the timing of what I’d call the
> “simple” failover case: an SSID that is master on the DC node is
> killed, and it takes 10-12 seconds before the slave SSID on the other
> node transitions to master. (Recall that an “SSID” is a SliceServer
> app instance, each of which is abstracted as a Pacemaker resource.)
> 
> Before going into my findings, I want to clear up a couple of
> misstatements on my part.
> 
> · With respect to my mention of “notifications” in my earlier e-mail,
> I misused the term. I was simply referring to the “notify” events
> passed from the DC to the other node.
> 
> · I also misspoke when I said that the failed SSID was subsequently
> restarted as a result of a monitor event. In fact, the SSID process is
> restarted by the “ss” resource agent script in response to a “start”
> operation from lrmd (see the dispatch sketch just below).
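
For context: lrmd invokes the resource agent with the operation name as
its first argument, so a "start" is just another invocation of the
script. A minimal sketch of that dispatch pattern, purely illustrative,
with hypothetical helpers standing in for the real SSID logic (this is
not the actual ss agent):

    #!/bin/sh
    # Illustrative OCF-style dispatch; not the real "ss" agent.
    # lrmd runs the agent as: <agent> <operation>, e.g. "ss start".

    OCF_SUCCESS=0
    OCF_ERR_UNIMPLEMENTED=3
    OCF_NOT_RUNNING=7

    # Hypothetical stand-ins for the real SSID management logic:
    start_ssid() { echo "would (re)launch the SSID process here"; return $OCF_SUCCESS; }
    stop_ssid()  { echo "would stop the SSID process here";       return $OCF_SUCCESS; }
    check_ssid() { echo "would probe the SSID process here";      return $OCF_NOT_RUNNING; }

    case "$1" in
        start)   start_ssid ;;   # the path that relaunches a killed SSID
        stop)    stop_ssid ;;
        monitor) check_ssid ;;
        *)       exit $OCF_ERR_UNIMPLEMENTED ;;
    esac
    exit $?
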
> 
> The key issue, however, is the 10 to 12 seconds required from the time
> the master SSID is killed until the slave fails over to become master.
> You opined that the time required would largely depend upon the
> behavior of the resource agent, which in our case is a script called
> “ss”. To determine how much the ss script’s execution contributes, I
> modified it to log the current monotonic system clock value each time
> it starts, and again just before it exits. The log messages specify
> the clock value in ms.
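
A minimal sketch of that kind of instrumentation, assuming /proc/uptime
as the millisecond source and logger(1) for output; the names and log
format are illustrative, not taken from the real ss script:

    #!/bin/sh
    # Timing-instrumentation sketch for a resource agent; illustrative only.

    now_ms() {
        # Milliseconds since boot, read from the first field of /proc/uptime.
        awk '{ printf "%d", $1 * 1000 }' /proc/uptime
    }

    START_MS=$(now_ms)
    logger -t ss "op=$1 enter mono_ms=$START_MS"

    # ... the real resource-agent work would happen here ...

    END_MS=$(now_ms)
    logger -t ss "op=$1 exit mono_ms=$END_MS elapsed_ms=$((END_MS - START_MS))"
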
> 
> From this, I did find several instances where the ss script took just
> over a second to complete. In each such case, the “culprit” is an exec
> of “crm_node -p”, which is called to determine how many nodes are
> presently in the cluster. (I’ve verified this timing independently by
> executing “crm_node -p” from a command line when the cluster is
> quiescent.) This seems like a rather long time for such a simple
> query. What is “crm_node -p” doing that takes so long?

What crm_node -p does depends on which pacemaker stack you are running. With pacemaker+corosync 2.0 and votequorum (the stack described in the 'pacemaker 1.1 for corosync 2.x' documentation found at http://clusterlabs.org/doc/), crm_node -p is nearly instantaneous. If you have determined that your resource agent is not at fault, I would recommend updating to a more recent cluster stack. It is going to be difficult for us to support you if you are using the aging pacemaker+heartbeat stack.
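
If you want to pin down where that second goes on your current stack,
it is worth timing the call in isolation; something along these lines,
using standard tools (the loop count and trace file name are just for
illustration):

    # Time crm_node -p a few times on a quiet cluster:
    for i in 1 2 3 4 5; do
        time crm_node -p > /dev/null
    done

    # If it is consistently slow, a syscall trace usually shows where it
    # blocks (for example, waiting on the membership layer):
    strace -ttT -o /tmp/crm_node.trace crm_node -p > /dev/null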

> 
> 
> That notwithstanding, from the point of view of the slave during the
> failover, there are delays of several hundred to about 1400 ms between
> the completion of the ss script and its invocation for the next event.
> To explain, I’ve attached an Excel spreadsheet (which I’ve verified is
> virus-free) that documents two experiments. In each case, there’s an
> SSID instance that’s master on node-0, the DC, and which is killed.
> The spreadsheet includes a synopsis of the log messages that follow on
> both nodes, interleaved into a timeline.
> 
> By way of explanation, columns B-D contain timestamp information for
> node-0 and columns E-G for node-1. Columns B/E show the current time
> of day, C/F show the monotonic clock value when the ss script begins
> execution (in ms, truncated to the 5 least-significant digits), and
> D/G show the duration of the ss script’s execution for the relevant
> event. Column H contains the key text extracted from the log. In some
> cases there is a significant amount of information in the log file
> relating to pengine behavior, but I omitted such information from the
> spreadsheet. Column I contains explanatory comments.
> 
> Recognizing that we will eventually need to upgrade our Pacemaker
> version (from 1.0.9), I wonder if you can answer a couple of
> questions. We are presently using Heartbeat, which I believe restricts
> our upgrade to the 1.0 branch, correct? In other words, if we want to
> upgrade to the 1.1 branch, are we required to replace Heartbeat with
> Corosync?

Technically, no. Realistically, yes. If you update to the latest 1.1.x version of pacemaker, I would not recommend attempting to use heartbeat. We put forth quite a bit of effort to verify the pacemaker+corosync 2.0 and pacemaker+corosync plugin stacks. There is no effort that I am aware of to test the heartbeat stack; it is largely deprecated and unsupported at this point.
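
For reference, on the corosync 2.x stack quorum comes from votequorum;
the relevant corosync.conf fragment looks roughly like this (values are
illustrative, and the documentation linked above walks through complete
examples):

    # Fragment of /etc/corosync/corosync.conf (corosync 2.x); illustrative only.
    quorum {
        provider: corosync_votequorum
        two_node: 1    # set this only for a genuine two-node cluster
    }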

> Secondly, when upgrading, are there kernel
> dependencies to worry about?

I can't speak to that; Andrew may know.

-- Vossel

> We are presently running the
> open-source 2.6.18 kernel. We plan to migrate to the most current 2.8
> or 3.0 version later this year, at which time it would probably make
> sense to bring Pacemaker up to date as well.
> 
> I apologize for the length of this posting, and again appreciate any
> assistance you can offer.
> 
> Regards,
> 
> Michael Powell
> 
> Michael Powell
> Staff Engineer
> 15220 NW Greenbrier Pkwy
> Suite 290
> Beaverton, OR 97006
> T 503-372-7327 M 503-789-3019 H 503-625-5332
> www.harmonicinc.com
> 



