[Pacemaker] How to really deal with gateway restarts?

Tue Jun 15 15:21:35 UTC 2010

>>>> I thought the "dampen" attribute could help with some of the options, but
>>>> actually it does not.
>>>
>>> It should do.  Hard to say without any logs from the two machines.
>>
>> Unfortunately I don't have the log files here; I can provide them if that
>> would help. Are you sure dampen should help here? From my testing it only does:
>> "Unless the next attribute value is stable for the dampen interval, do not
>> change the attribute value in the CIB". However, the pingd attribute is
>> set for two nodes, so the values are stored in separate XML sections,
>> meaning they are not correlated by dampen.
>
> But the attribute that is set in both cases is called "pingd", so yes,
> dampen should definitely apply here.
> What version of pacemaker do you have?  That would also be relevant.

# rpm -qa|egrep 'pacem|heart|glue'
heartbeat-3.0.1-1.el5.x86_64
cluster-glue-libs-1.0.1-1.el5.x86_64
cluster-glue-1.0.1-1.el5.x86_64
pacemaker-1.0.7-2.el5.x86_64
heartbeat-libs-3.0.1-1.el5.x86_64
pacemaker-libs-1.0.7-2.el5.x86_64

Please find attached ha-debug logs from the following tests:
 1. dc and non-dc from the situation where the VM was restarted. The
currently active node was the DC.
 2. dc_noreboot and non-dc_noreboot from the situation where the VM was not
restarted (this is what I would like to achieve). The currently active
node was not the DC.
Both tests used dampen=5s.

So it seems it would be better to have the DC role assigned to the standby
node every time (this would also make failovers faster). But there is
no option to force a DC election or to assign the role. Am I right?
If dampen handling was fixed in a later version, that would be great. I don't
think this is the case; the only relevant commit seems to be:
http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/214f0fc258f2
A more verbose description of how dampen is supposed to work would also be handy.
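
To make the concern concrete, here is a minimal Python sketch of dampen as described above from testing: each node's pingd value is held back on its own timer and written only after it has been stable for the dampen interval. This is a toy model, not attrd's actual implementation. The class and function names, the score of 100, and the detection times (1s and 4s after the gateway goes down) are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DampenedAttribute:
    """Toy model of one node's "pingd" attribute with its own dampen timer."""
    dampen: float               # seconds a new value must stay unchanged
    committed: int              # value currently visible in the CIB (in this model)
    pending: int                # latest value reported by the node's monitor
    pending_since: float = 0.0  # time at which `pending` last changed

    def update(self, now: float, value: int) -> None:
        # Only a *different* value restarts the stability timer.
        if value != self.pending:
            self.pending = value
            self.pending_since = now

    def tick(self, now: float) -> int:
        # Commit the pending value once it has been stable for `dampen` seconds.
        if self.pending != self.committed and now - self.pending_since >= self.dampen:
            self.committed = self.pending
        return self.committed


def simulate(dampen, seen_by_active, seen_by_standby, times):
    """Return (time, active value, standby value) as seen in the CIB."""
    active = DampenedAttribute(dampen, committed=100, pending=100)
    standby = DampenedAttribute(dampen, committed=100, pending=100)
    timeline = []
    for t in sorted(times):
        # Each node notices the gateway outage at its own monitor time.
        if t >= seen_by_active:
            active.update(seen_by_active, 0)
        if t >= seen_by_standby:
            standby.update(seen_by_standby, 0)
        timeline.append((t, active.tick(t), standby.tick(t)))
    return timeline


if __name__ == "__main__":
    # dampen=5s; active node notices the outage at t=1s, standby at t=4s.
    for row in simulate(5.0, 1.0, 4.0, [0.0, 6.0, 9.0]):
        print(row)
```

In this model the two timers never look at each other, so there is a window (between 6s and 9s with the numbers above) in which the active node already shows 0 while the standby still shows 100. If the real implementation behaves similarly, a larger dampen value only shifts that window; it does not close it.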

>
>> I tried dampen 5s and 0s
>> without any effect.
>> My issue is that Pacemaker acts immediately after getting the local
>> update request, no matter what the remote node is trying to say. The remote
>> update is processed only after the migration decision is made, and that is
>> a bit late.
>>
>>>
>>>> The only thing that worked for me was
>>>> restarting the standby CRM until its monitoring cycle was a bit behind
>>>> the active one's. But I would not be happy with that.
>>>> Does anybody have any idea if there could be some option like "Hey,
>>>> change of this attribute can trigger resource migration. Let's wait a
>>>> while (configured) for standby value update..."? Or any other crazy
>>>> ideas?
>>>>
>>>> Thanks,
>>>> Tino
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dc
Type: application/octet-stream
Size: 17427 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100615/98b29cdc/attachment-0016.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: non-dc
Type: application/octet-stream
Size: 623 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100615/98b29cdc/attachment-0017.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dc_noreboot
Type: application/octet-stream
Size: 15374 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100615/98b29cdc/attachment-0018.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: non-dc_noreboot
Type: application/octet-stream
Size: 436 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100615/98b29cdc/attachment-0019.obj>