[Pacemaker] How to really deal with gateway restarts?

Fri Jun 18 06:44:15 UTC 2010

On Thu, Jun 17, 2010 at 5:01 PM, Maros Timko <timkom at gmail.com> wrote:
>> On Tue, Jun 15, 2010 at 5:21 PM, Maros Timko <timkom at gmail.com> wrote:
>>>>>>> I thought "dampen" attribute could help with some of the options, but
>>>>>>> actually it does not.
>>>>>>
>>>>>> It should do.  Hard to say without any logs from the two machines.
>>>>>
>>>>> Unfort. I don't have log files here, can provide you if that would help.
>>>>> Are you sure dampen should help here? From my testing it only does:
>>>>> "Unless the next attribute value is stable for dampen interval, do not
>>>>> change the attribute value in CIB". However, the pingd attribute is
>>>>> set for two nodes, thus they are stored in separate XML sections,
>>>>> meaning they are not correlated by dampen.
>>>>
>>>> But the attribute that is set in both cases is called "pingd", so yes,
>>>> dampen should definitely apply here.
>>>> What version of pacemaker do you have?  That would also be relevant.
>>>
>>> # rpm -qa|egrep 'pacem|heart|glue'
>>> heartbeat-3.0.1-1.el5.x86_64
>>> cluster-glue-libs-1.0.1-1.el5.x86_64
>>> cluster-glue-1.0.1-1.el5.x86_64
>>> pacemaker-1.0.7-2.el5.x86_64
>>> heartbeat-libs-3.0.1-1.el5.x86_64
>>> pacemaker-libs-1.0.7-2.el5.x86_64
>>>
>>> Please find attached ha-debug logs from following tests:
>>
>> A hb_report archive would be preferred, it contains everything needed
>> to figure out what's going on and I wouldn't need to ask for your
>> configuration ;-)
>
> OK, for simplicity I have created an academic configuration that uses
> dampen 10 seconds because it uses monitor interval 10 seconds. So if
> attrd should wait dampen period until all nodes send updates before
> updating CIB (and possibly triggering failover), this should work
> no matter what delay there could be between monitoring cycles of the nodes.
> However, it proves that:
>  - if current node is DC and is active (running resources), it moves
> resources (or restarts if stop would take longer) on gateway failure
>  - if current node is not a DC and is active (running resources), it
> moves resources (or restarts if stop would take longer) when gateway
> connection is re-established
> Please find attached hb-report as well as ha-debug files from both
> nodes because I increased debug level for attrd but they did not get
> into merged ha-log file in the report.
> What I did:
>  1. DC was active, I disconnected both public cables at the same time.
>     Resources migrated to standby
>  2. DC was not active, I reconnected both public cables at the same time.
>     Resources migrated to standby DC
>
> Let me know if you would need anything else.

Despite your clocks being a bit out, "dampen" looks to be doing what
it's supposed to...

Jun 17 15:13:27 vsp7 attrd_updater: [31719]: info: Invoked:
attrd_updater -n pingd -v 0 -d 10s
Jun 17 15:13:37 vsp7 attrd: [30997]: info: attrd_trigger_update:
Sending flush op to all hosts for: pingd (0)
Jun 17 15:13:37 vsp7 attrd: [30997]: info: attrd_ha_callback: flush
message from vsp7.example.com
Jun 17 15:13:37 vsp7 attrd: [30997]: info: attrd_perform_update: Sent
update 16: pingd=0

Jun 17 15:13:33 vsp8 attrd_updater: [22137]: info: Invoked:
attrd_updater -n pingd -v 0 -d 10s
Jun 17 15:13:36 vsp8 attrd: [21295]: info: attrd_ha_callback: flush
message from vsp7.example.com
Jun 17 15:13:36 vsp8 attrd: [21295]: info: attrd_perform_update: Sent
update 17: pingd=0

vsp7 notices the down link first, waits 10s and then tells everyone to
write to the cib.
vsp8 noticed the down link only 3s before that flush arrived (by its
own clock), but writes to the cib anyway.
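
To make that concrete, here is a rough toy model in Python (not attrd's
actual code). It assumes the first node to notice a change starts the
dampen timer, and that its flush makes every node write whatever value
it has pending. The notice times are hypothetical offsets on a shared
clock, since the real clocks are not in sync.

```python
def pending_time_at_flush(noticed, dampen):
    """Return how long each node's value had been pending when the flush fired.

    noticed: node name -> time (seconds, shared clock) the node saw the change.
    dampen:  dampen interval in seconds.
    Assumption: the earliest node's timer triggers one flush for everyone.
    """
    flush_at = min(noticed.values()) + dampen
    return {node: flush_at - t for node, t in noticed.items() if t <= flush_at}


# Hypothetical offsets: vsp7 notices at t=0, vsp8 at t=7, dampen is 10s.
waits = pending_time_at_flush({"vsp7": 0, "vsp8": 7}, dampen=10)
print(waits)
```

In this model only the first node waits the full dampen interval; any
other node waits only for whatever is left of it.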

The problem is that the two writes aren't truly atomic and they're
happening just far apart enough[1] for the PE to complete its
calculation and for the crmd to execute it.
One day we're probably going to have to rewrite attrd to elect a
leader which gathers all the values and writes them in one go.
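
A toy illustration in Python of why one combined write would help. This
is not the PE's real placement logic: the rule "stay put unless another
node has a strictly higher pingd" and the starting value of 1 are
assumptions made up for the sketch.

```python
def place(cib, current):
    """Toy rule: pick the highest pingd score, preferring the current node on ties."""
    return max(cib, key=lambda node: (cib[node], node == current))


def apply_writes(cib, writes, current):
    """Apply each write in turn and re-run placement after every one.

    writes: list of dicts (node -> new pingd value), one dict per cib write.
    Returns the node chosen after each write.
    """
    cib = dict(cib)
    history = []
    for write in writes:
        cib.update(write)
        current = place(cib, current)
        history.append(current)
    return history


start = {"vsp7": 1, "vsp8": 1}  # hypothetical values before the gateway fails

# Two separate writes: placement runs in between and moves the resources.
separate = apply_writes(start, [{"vsp7": 0}, {"vsp8": 0}], current="vsp7")

# One combined write: both nodes drop together, so nothing moves.
combined = apply_writes(start, [{"vsp7": 0, "vsp8": 0}], current="vsp7")

print(separate, combined)
```

With separate writes the intermediate state (one node at 0, the other
still at 1) is visible to the placement step, and that is enough to
trigger the move.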

[1] Hard to say how long because, as I said, the times on both
machines aren't in sync.
Either that or heartbeat has gained the ability to send messages
backwards in time :)