[ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

Andrei Borzenkov arvidjaar at gmail.com
Sun Nov 26 15:31:26 CET 2017


22.11.2017 22:45, Klaus Wenninger wrote:
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
> Don't know it from sbd but have seen where fencing using
> the cycle-method with machines that boot quickly leads to
> strange behavior.
> If you configure sbd to not clear the disk-slot on startup
> (SBD_STARTMODE=clean) it should be left to the other
> side to do that which should prevent the other node from
> coming up while the one fencing is still waiting. You might
> set the method from cycle to off/on to make the fencing
> side clean the slot.
> 
>>
>> I can provide full logs tomorrow if needed.
> Yes would be interesting to see more ...
> 

crm_report attached (it's from a different, trivial test cluster). Actually,
I can reliably reproduce it whenever the node is rebooted and pacemaker is
started before the stonith agent has confirmed the node kill.

Unfortunately, with SBD I cannot set the stonith timeout too low, as we
need to account for a possible storage path failover.
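
To make the timing concrete, here is a rough sketch of the race as I
understand it. It is plain arithmetic, not Pacemaker or sbd code. The 120s
msgwait comes from the report above; the 30s boot time is a made-up figure
for a fast VM, and the model assumes the fencing side only treats the kill
as confirmed once msgwait has elapsed.

```python
# Illustrative timeline only; this is not Pacemaker or sbd code.

MSGWAIT_S = 120            # msgwait timeout quoted in the original report
BOOT_TO_PACEMAKER_S = 30   # ASSUMPTION: hypothetical reboot-to-pacemaker time


def rejoins_before_fence_confirmed(boot_s, msgwait_s):
    """True if the fenced node starts pacemaker before the fencing node
    considers the kill confirmed (modelled here as msgwait elapsing)."""
    return boot_s < msgwait_s


def race_window_s(boot_s, msgwait_s):
    """Seconds during which the rebooted node is already up while the
    fencing operation is still pending; 0 if it boots slowly enough."""
    return max(0, msgwait_s - boot_s)


if __name__ == "__main__":
    early = rejoins_before_fence_confirmed(BOOT_TO_PACEMAKER_S, MSGWAIT_S)
    window = race_window_s(BOOT_TO_PACEMAKER_S, MSGWAIT_S)
    print("rejoins early:", early, "- window:", window, "s")
```

With these numbers the node is back for 90s before fencing is reported as
complete, which is the window in which crmd sees "we were allegedly just
fenced" and pacemakerd shuts down.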
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report-Sun-26-Nov-2017.tar.bz2
Type: application/x-bzip
Size: 91583 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20171126/59897608/attachment-0001.bin>

