[Pacemaker] Need to relax corosync due to backup of VM through snapshot

Thu Nov 21 08:09:32 UTC 2013

On 2013-11-20T16:58:01, Gianluca Cecchi <gianluca.cecchi at gmail.com> wrote:

> Based on the docs, I thought that the timeout should be
> 
> token x token_retransmits_before_loss_const

No, the comments in corosync.conf.example and in the corosync.conf man
page should be pretty clear, I hope. Can you recommend which phrasing we
should improve?
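
To illustrate the reading above, here is a minimal sketch of the timer
arithmetic. It assumes the relations given in the corosync.conf man page
as I recall them: the retransmit interval is derived as
token / (token_retransmits_before_loss_const + 0.2), and consensus
defaults to 1.2 * token. The function name and the returned keys are
hypothetical, and newer corosync releases may adjust the effective token
further, so please check the man page for your installed version.

```python
def totem_timeouts(token_ms, retransmits_before_loss=4):
    """Derive dependent totem timers (in ms) from the token timeout.

    Assumed relations (verify against your corosync.conf man page):
      token_retransmit = token / (retransmits_before_loss + 0.2)
      consensus        = 1.2 * token   (when not set explicitly)
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    token_retransmit_ms = token_ms * 10 // (retransmits_before_loss * 10 + 2)
    consensus_ms = token_ms * 12 // 10
    return {
        # The token timeout itself is the loss-detection time. It is
        # NOT multiplied by token_retransmits_before_loss_const.
        "failure_detection_ms": token_ms,
        "token_retransmit_ms": token_retransmit_ms,
        "consensus_ms": consensus_ms,
    }


if __name__ == "__main__":
    for token in (1000, 5000, 120000):
        print(token, totem_timeouts(token))
```

Under these assumptions, token: 1000 with the default of 4 retransmits
gives a retransmit interval of 238 ms, and the retransmit constant only
changes how often the token is resent within the token window, not the
length of the window itself.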

> So my current test config is:
>   # diff corosync.conf corosync.conf.pre181113
> 24,25c24
> < #token: 5000
> < token: 120000

A 120s node timeout? That is really, really long. Why is the backup tool
interfering with the scheduling of high priority processes so much? That
sounds like the real bug.

> Any comment?
> Have any different strategies been used successfully in similar
> environments, where high latencies occur at snapshot deletion while
> the disk consolidation phase is executed?

A setup where a VM apparently can freeze for almost 120s is not suitable
for HA.

Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde