[ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

Thu Nov 30 14:11:39 CET 2017

On 11/30/2017 01:41 PM, Ulrich Windl wrote:
>
>>>> "Gao,Yan" <ygao at suse.com> wrote on 30.11.2017 at 11:48 in message
> <e71afccc-06e3-97dd-c66a-1b4bac550c23 at suse.com>:
>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>>> corosync and forcing STONITH pacemaker was not started after reboot.
>>> In logs I see during boot
>>>
>>> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly
>>> just fenced by sapprod01p for sapprod01p
>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>>> process (3151) can no longer be respawned,
>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker
>>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>>> stonith with SBD always takes msgwait (at least, visually host is not
>>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>>> and is up and running long before timeout expires.
> As msgwait was intended for the message to arrive, and not for the reboot time (I guess), this just shows a fundamental problem in SBD design: Receipt of the fencing command is not confirmed (other than by seeing the consequences of its execution).

The 2 x msgwait is not for confirmations but for writing the poison-pill
and for having it read by the target-side.
Thus it is assumed that within a single msgwait the data is written and
confirmed.
And if the target-side doesn't manage to do the read within that time it
will suicide via watchdog.
Thus a working watchdog is a fundamental precondition for sbd to work
properly and storage-solutions that are doing caching, replication and
stuff without proper syncing are just not suitable for sbd.
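
To make the timing in this thread concrete, here is a toy model in Python. It is
not sbd code and makes simplifying assumptions: the reboot is taken to start at
the moment the poison-pill is written, the 20s boot time is a made-up figure
for a fast VM, and SBD_DELAY_START=yes is modelled as simply adding msgwait to
the time at which the node comes back. The function and class names are
hypothetical; only the 60s/120s timeouts come from the report.

```python
from dataclasses import dataclass


@dataclass
class SbdTimeouts:
    watchdog: int  # seconds, watchdog timeout from the report (60s)
    msgwait: int   # seconds, msgwait timeout from the report (120s)


def rejoin_time(boot_seconds: int, t: SbdTimeouts, delay_start: bool) -> int:
    """Seconds after the poison-pill write at which the fenced node is back.

    Assumption: with delayed start the node waits an extra msgwait on boot.
    """
    return boot_seconds + (t.msgwait if delay_start else 0)


def rejoins_too_early(boot_seconds: int, t: SbdTimeouts, delay_start: bool) -> bool:
    """True if the node is back while the fencing side is still waiting msgwait."""
    return rejoin_time(boot_seconds, t, delay_start) < t.msgwait


timeouts = SbdTimeouts(watchdog=60, msgwait=120)
boot = 20  # hypothetical boot time of a fast VM, in seconds

print("without delayed start, back too early:", rejoins_too_early(boot, timeouts, False))
print("with delayed start, back too early:   ", rejoins_too_early(boot, timeouts, True))
```

In this model the fast-booting node is back well inside the 120s window unless
the start is delayed, which matches the symptom described in the original
report.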

Regards,
Klaus

>
> So the fencing node will see the other host is down (on the network), but it won't believe it until SBD msgwait is over. OTOH if your msgwait is very low, and the storage has a problem (exceeding msgwait), the node will assume a successful fencing when in fact it didn't complete.
>
> So maybe there should be two timeouts: One for the command to be delivered (without needing a confirmation, but the confirmation could shorten the wait), and another for executing the command (how long will it take from receipt of the command until the host is definitely down). Again a confirmation could stop waiting before the timeout is reached.
>
> Regards,
> Ulrich
>
>
>>> I think I have seen similar report already. Is it something that can
>>> be fixed by SBD/pacemaker tuning?
>> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>>
>> Regards,
>>    Yan
>>
>>> I can provide full logs tomorrow if needed.
>>>
>>> TIA
>>>
>>> -andrei
>>>