[Pacemaker] [Openais] very slow pacemaker/corosync shutdown

Andrew Beekhof andrew at beekhof.net
Fri Sep 20 02:17:15 UTC 2013


On 20/09/2013, at 10:46 AM, Lists <lists at benjamindsmith.com> wrote:

> On 09/19/2013 04:50 PM, Andrew Beekhof wrote:
>> From this we can infer that corosync has gotten horribly confused and, as a consequence, pacemaker can't talk to its peers anymore.
>> 
>>> this is a test cluster and not being monitored by a netmon. Any other details I could provide that would be useful/helpful?
>> Shortly before this, Corosync claims:
>> 
>> Sep 19 00:47:07 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com       crmd:     info: pcmk_cpg_membership: 	Left[2.0] crmd.1
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com       crmd:     info: crm_update_peer_proc: 	pcmk_cpg_membership: Node bender.schoolpathways.com[1] - corosync-cpg is now offline
>> Sep 19 00:56:09 [9004] nomad.schoolpathways.com       crmd:     info: peer_update_callback: 	Client bender.schoolpathways.com/peer now has status [offline] (DC=true)
>> 
>> Is this true?
>> If not, perhaps some timeouts need to be adjusted.  A switch to udpu (instead of multicast) may also be helpful.
> 
> The times you specifically mention were probably due to intentionally created failures, but similar messages later on were clearly outside the window in which I was testing. I've updated corosync.conf to use udpu, based on an example config, and will continue testing.
> 
> What timeout values might be useful to consider?

Try 'man corosync.conf' and look for 'milliseconds' :)
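
As a rough sketch of where such settings live, here is a totem section for a two-node corosync 1.x cluster using udpu. The option names (token, consensus, transport, member/memberaddr) come from the corosync.conf man page. The values and all addresses are illustrative assumptions, not recommendations from this thread.

```
totem {
	version: 2

	# Token timeout in milliseconds (assumed value, for illustration only).
	token: 5000

	# Consensus timeout in milliseconds; the man page requires
	# at least 1.2 * token.
	consensus: 6000

	# UDP unicast instead of multicast.
	transport: udpu

	interface {
		ringnumber: 0
		# Hypothetical network and node addresses; replace with your own.
		bindnetaddr: 192.168.1.0
		mcastport: 5405
		member {
			memberaddr: 192.168.1.10
		}
		member {
			memberaddr: 192.168.1.11
		}
	}
}
```

Raising token makes corosync slower to declare a peer lost, so it trades failover speed for tolerance of short stalls.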

> These two machines are next to each other, on the same gigabit switch, and no packet loss has ever been detected. Truth is that I'm unsure what would be waiting.

It's quite possibly an algorithm issue. We've seen a few like this in the past.