[Pacemaker] Need to relax corosync due to backup of VM through snapshot

Wed Nov 20 15:58:01 UTC 2013

Hello,
I am trying to relax the corosync timeouts because backups that run using NetBackup
and the VMware storage API are causing cluster reconfigurations.
Based on the docs, I thought that the timeout should be

token x token_retransmits_before_loss_const

but in my tests I see that, whether or not I set the second
parameter, the timeout is equal to the token value...
Is this correct and expected? What, then, is the meaning of
token_retransmits_before_loss_const?

My test system is SLES 11 SP2 with the HA extension.

So my current test config is:
  # diff corosync.conf corosync.conf.pre181113
24,25c24
< #token: 5000
< token: 120000
---
> token: 5000
28c27
< #token_retransmits_before_loss_const: 10
---
> token_retransmits_before_loss_const: 10
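
To spell out the two readings with the pre-change values from the diff above
(token 5000 ms, token_retransmits_before_loss_const 10), here is a minimal
Python sketch. The function names are made up for illustration, and the
product formula is only my reading of the docs, not confirmed corosync
behaviour:

```python
# Values from the pre-change corosync.conf in the diff above.
TOKEN_MS = 5000
RETRANSMITS_BEFORE_LOSS = 10


def timeout_as_i_read_the_docs_ms(token_ms, retransmits):
    """Reading A (my assumption): token x token_retransmits_before_loss_const."""
    return token_ms * retransmits


def timeout_as_observed_ms(token_ms, retransmits):
    """Reading B (what my tests show): the token value alone.

    The retransmits argument is deliberately ignored here.
    """
    return token_ms


if __name__ == "__main__":
    a = timeout_as_i_read_the_docs_ms(TOKEN_MS, RETRANSMITS_BEFORE_LOSS)
    b = timeout_as_observed_ms(TOKEN_MS, RETRANSMITS_BEFORE_LOSS)
    print("reading A: %.1f s" % (a / 1000))
    print("reading B: %.1f s" % (b / 1000))
```

With the old settings, reading A would give 50 seconds and reading B gives
5 seconds, which is why I raised token itself in the new config.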

Also, because DRBD is in place too (as a master/slave resource in Pacemaker), I set this:
/etc/drbd.d # diff global_common.conf global_common.conf.pre181113
47,50c47
< connect-int 61;
< ping-int 61;
< timeout 600;
< ping-timeout 600;
---
> ping-timeout 100;

This way I allow 60 seconds for the DRBD timeouts.
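
The 60-second figure can be checked with a small Python sketch. It assumes
the units given in the drbd.conf man page as I understand them (timeout and
ping-timeout in tenths of a second, connect-int and ping-int in seconds) and
the rule that timeout must stay below connect-int and ping-int; please verify
both for your DRBD version. The helper names are hypothetical:

```python
# Divisor that converts each option's configured value to seconds.
# Assumption: timeout/ping-timeout are in 0.1 s units, the *-int options in s.
UNITS_PER_SECOND = {
    "connect-int": 1,
    "ping-int": 1,
    "timeout": 10,
    "ping-timeout": 10,
}

# New values from the global_common.conf diff above.
NEW_SETTINGS = {
    "connect-int": 61,
    "ping-int": 61,
    "timeout": 600,
    "ping-timeout": 600,
}


def to_seconds(settings):
    """Return a dict mapping each option name to its value in seconds."""
    return {name: value / UNITS_PER_SECOND[name] for name, value in settings.items()}


def timeout_is_consistent(settings):
    """Check the assumed rule: timeout < connect-int and timeout < ping-int."""
    secs = to_seconds(settings)
    return secs["timeout"] < secs["connect-int"] and secs["timeout"] < secs["ping-int"]


if __name__ == "__main__":
    for name, seconds in to_seconds(NEW_SETTINGS).items():
        print("%-12s %5.1f s" % (name, seconds))
    print("consistent:", timeout_is_consistent(NEW_SETTINGS))
```

This is also why connect-int and ping-int are set to 61 rather than 60: they
have to remain just above the 60-second timeout.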

Any comments?
Has anyone successfully used different strategies in similar environments,
where high latencies occur at snapshot deletion while the disk
consolidation phase is executed?

Thanks in advance,
Gianluca