[Pacemaker] How to really deal with gateway restarts?

Wed Jun 16 05:46:49 EDT 2010

On Tue, Jun 15, 2010 at 5:21 PM, Maros Timko <timkom at gmail.com> wrote:
>>>>> I thought "dampen" attribute could help with some of the options, but
>>>>> actually it does not.
>>>>
>>>> It should do. Hard to say without any logs from the two machines.
>>>
>>> Unfortunately I don't have the log files here, but I can provide them
>>> if that would help.
>>> Are you sure dampen should help here? From my testing it only does:
>>> "Unless the next attribute value is stable for the dampen interval, do
>>> not change the attribute value in the CIB". However, the pingd attribute
>>> is set for two nodes, thus the values are stored in separate XML
>>> sections, meaning they are not correlated by dampen.
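
The behaviour described in the quoted paragraph above can be sketched as a
small simulation. This is only a model of the poster's description, not
Pacemaker or attrd code: the class name, the `simulate` helper, the score
of 100 and the event times are all hypothetical and chosen for
illustration. It shows why a per-node dampen window does not help when the
two nodes notice the gateway outage at different times.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DampenedAttribute:
    """Hypothetical per-node model of a dampened attribute (not Pacemaker code)."""
    dampen: float                    # seconds a new value must stay unchanged
    committed: Optional[int] = None  # value currently visible "in the CIB"
    pending: Optional[int] = None    # value waiting for the dampen window to pass
    pending_since: float = 0.0

    def update(self, now: float, value: int) -> None:
        """Record a value observed locally at time `now`."""
        if value == self.committed:
            self.pending = None      # back to the committed value: nothing to write
        elif value != self.pending:
            self.pending = value     # new candidate: restart the dampen window
            self.pending_since = now

    def tick(self, now: float) -> Optional[int]:
        """Commit the pending value once it has been stable for `dampen` seconds."""
        if self.pending is not None and now - self.pending_since >= self.dampen:
            self.committed = self.pending
            self.pending = None
        return self.committed


def simulate(dampen: float,
             events: List[Tuple[int, str, int]],
             horizon: int) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (time, active_value, standby_value) for each second up to `horizon`."""
    # Each node keeps its own attribute, so the dampen windows are independent.
    nodes = {name: DampenedAttribute(dampen, committed=100)
             for name in ("active", "standby")}
    timeline = []
    for t in range(horizon + 1):
        for when, node, value in events:
            if when == t:
                nodes[node].update(t, value)
        timeline.append((t, nodes["active"].tick(t), nodes["standby"].tick(t)))
    return timeline


# Assumed scenario: the gateway goes away; the active node's monitor notices
# at t=0, the standby's monitor (running out of phase) notices at t=8.
events = [(0, "active", 0), (8, "standby", 0)]
for t, active, standby in simulate(5, events, 15):
    print(t, active, standby)
```

In this model, from the moment the active node's window expires until the
standby's window expires, the active node looks worse than the standby,
which matches the unwanted migration described in this thread.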
>>
>> But the attribute that is set in both cases is called "pingd", so yes,
>> dampen should definitely apply here.
>> What version of pacemaker do you have?  That would also be relevant.
>
> # rpm -qa|egrep 'pacem|heart|glue'
> heartbeat-3.0.1-1.el5.x86_64
> cluster-glue-libs-1.0.1-1.el5.x86_64
> cluster-glue-1.0.1-1.el5.x86_64
> pacemaker-1.0.7-2.el5.x86_64
> heartbeat-libs-3.0.1-1.el5.x86_64
> pacemaker-libs-1.0.7-2.el5.x86_64
>
> Please find attached ha-debug logs from following tests:

An hb_report archive would be preferred; it contains everything needed
to figure out what's going on, and I wouldn't need to ask for your
configuration ;-)

>  1. dc and non-dc from situation where VM was restarted. Currently
> active node was DC.
>  2. dc_noreboot and non-dc_noreboot from situation where VM was not
> restarted (I would like to achieve this). Currently active node was
> not the DC.
> Both tests used dampen=5s
>
> So it seems like it would be better to have the DC assigned to the
> standby node every time (this would also make failovers faster).

Nope.

> But there is
> no option to force a DC election or assign the role. Am I right?

Right, and for good reason, because it's not relevant.

> If dampen handling was fixed later, it would be great. I don't think
> this is the case, the only relevant commit seems to be:
> http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/214f0fc258f2
> A more verbose description of how dampen is supposed to work would
> also be handy.
>
>>
>>> I tried dampen 5s and 0s
>>> without any effect.
>>> My issue is that Pacemaker acts immediately after getting the local
>>> update request, no matter what the remote is trying to say. The remote
>>> update is processed only after the migration decision is made, and that
>>> is a bit late.
>>>
>>>>
>>>>> The only thing that worked for me was
>>>>> restarting the standby CRM until its monitoring cycle was a bit behind
>>>>> the active one's. But I would not be happy with that.
>>>>> Does anybody have any idea if there could be some option like "Hey,
>>>>> change of this attribute can trigger resource migration. Let's wait a
>>>>> while (configured) for standby value update..."? Or any other crazy
>>>>> ideas?
>>>>>
>>>>> Thanks,
>>>>> Tino
>>>>>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>