[Pacemaker] Pacemaker Digest, Vol 31, Issue 56

Fri Jun 18 07:29:51 EDT 2010

On Fri, Jun 18, 2010 at 12:18 PM, Maros Timko <timkom at gmail.com> wrote:
>> Despite your clocks being a bit out, "dampen" looks to be doing what
>> its supposed to...
>
> OK, vsp7 is more than 1 second ahead, still not that bad
> [root at vsp7 ~]# date +%H:%M:%S.%N; ssh 135.64.30.29 date +%H:%M:%S.%N
> 10:07:15.093943000
> 10:07:13.924642000

Oh thats not the cause of the problem.  It just made looking at the
logs "more interesting".

And actually I looked at it different way, and you can see that the
time between the two updates is at most a second.

 ### Update 1
Jun 17 15:13:36 vsp8 crmd: [21296]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]

Jun 17 15:13:36 vsp8 crmd: [21296]: info: te_rsc_command: Initiating
action 7: stop Dummy1_stop_0 on vsp8.example.com (local)
Jun 17 15:13:36 vsp8 crmd: [21296]: info: te_rsc_command: Initiating
action 8: start Dummy1_start_0 on vsp7.example.com

 ### Update 2
Jun 17 15:13:37 vsp8 crmd: [21296]: info: abort_transition_graph:
te_update_diff:146 - Triggered transition abort (complete=0,
tag=transient_attributes, id=e20a4cb1-c2a2-41b1-910b-0f2b79db0ef8,
magic=NA, cib=0.9.13) : Transient attribute: update

>>
>> Jun 17 15:13:27 vsp7 attrd_updater: [31719]: info: Invoked:
>> attrd_updater -n pingd -v 0 -d 10s
>> Jun 17 15:13:37 vsp7 attrd: [30997]: info: attrd_trigger_update:
>> Sending flush op to all hosts for: pingd (0)
>> Jun 17 15:13:37 vsp7 attrd: [30997]: info: attrd_ha_callback: flush
>> message from vsp7.example.com
>> Jun 17 15:13:37 vsp7 attrd: [30997]: info: attrd_perform_update: Sent
>> update 16: pingd=0
>>
>> Jun 17 15:13:33 vsp8 attrd_updater: [22137]: info: Invoked:
>> attrd_updater -n pingd -v 0 -d 10s
>> Jun 17 15:13:36 vsp8 attrd: [21295]: info: attrd_ha_callback: flush
>> message from vsp7.example.com
>> Jun 17 15:13:36 vsp8 attrd: [21295]: info: attrd_perform_update: Sent
>> update 17: pingd=0
>>
>> vsp7 notices the down link first, waits 10s and then tells everyone to
>> write to the cib.
>> vsp8 only noticed the down link 3s earlier but writes to the cib anyway.
>>
>
> The explanation of "dampen" from Pacemaker docs:
> "The time to wait (dampening) for further changes occur. Use this to
> prevent a resource from bouncing around the cluster when cluster nodes
> notice the loss of connectivity at slightly different times."
> I used 10 seconds dampen to prevent attribute updates at "slightly
> different times",

And it did.
Instead of the updates form both nodes coming 7s apart, they came in
"about" 1s apart.

We know about the problem but its a major rewrite to fix and hasn't
really been much of an issue for people to become a priority.
If the config is a bit larger, the PE takes a bit longer and the two
updates appear to occur "instaneously".
If the resources take more than 1s to stop, the DC doesn't get as much
done either.

And if the network is down, then no-one is going to notice the
resources restarting anyway.

But if anyone would like to volunteer to write the patch... I'd
certainly point them in the right direction :-)

> but what it is actually doing is:
>  If I am a DC
>   - wait dampen period if there is any additional local update
>   - ignore (or buffer?) remote updates, but stop dampen timer on
> remote timer stop
>   - write our update to the CIB
>   - process remote updates now
>
> Non-DC reported the issue as the first one:
> Jun 17 15:13:27 vsp7.example.com attrd_updater: [31719]: info:
> Invoked: attrd_updater -n pingd -v 0 -d 10s
> Jun 17 15:13:27 vsp7.example.com attrd: [30997]: debug:
> attrd_local_callback: New value of pingd is 0
> Jun 17 15:13:37 vsp7.example.com attrd: [30997]: debug: debug2:
> stop_attrd_timer: Stopping pingd timer
> Jun 17 15:13:37 vsp7.example.com attrd: [30997]: info:
> attrd_trigger_update: Sending flush op to all hosts for: pingd (0)
>   ^--- after dampen period (10s), sent an update
> Jun 17 15:13:37 vsp7.example.com attrd: [30997]: info:
> attrd_ha_callback: flush message from vsp7.example.com
> Jun 17 15:13:37 vsp7.example.com attrd: [30997]: debug:
> log_data_element: find_nvpair_attr: Match <nvpair
> id="status-e20a4cb1-c2a2-41b1-910b-0f2b79db0ef8-pingd" name="pingd"
> value="300" />
> Jun 17 15:13:37 vsp7.example.com attrd: [30997]: debug: debug2:
> log_data_element: update_attr: update_attr <nvpair
> id="status-e20a4cb1-c2a2-41b1-910b-0f2b79db0ef8-pingd" name="pingd"
> value="0" />
> DC (with fixed time stamps):
> Jun 17 15:13:32 vsp8.example.com attrd_updater: [22137]: info:
> Invoked: attrd_updater -n pingd -v 0 -d 10s
> Jun 17 15:13:32 vsp8.example.com attrd: [21295]: debug:
> attrd_local_callback: update message from attrd_updater: pingd=0
> Jun 17 15:13:32 vsp8.example.com attrd: [21295]: debug:
> attrd_local_callback: New value of pingd is 0
> Jun 17 15:13:34 vsp8.example.com attrd: [21295]: info:
> attrd_ha_callback: flush message from vsp7.example.com
> Jun 17 15:13:34 vsp8.example.com attrd: [21295]: debug: debug2:
> stop_attrd_timer: Stopping pingd timer
>  ^--- stop timer request after 2s, did it come from remote Non-DC?
>        Remote value did not get through
> Jun 17 15:13:34 vsp8.example.com attrd: [21295]: debug:
> log_data_element: find_nvpair_attr: Match <nvpair
> id="status-bf90630d-6485-4713-bd0f-07fc669f938f-pingd" name="pingd"
> value="300" />
> Jun 17 15:13:34 vsp8.example.com attrd: [21295]: debug: debug2:
> log_data_element: update_attr: update_attr <nvpair
> id="status-bf90630d-6485-4713-bd0f-07fc669f938f-pingd" name="pingd"
> value="0" />
>
> I am just wondering if it worked at all somewhere in the past. I hope
> yes as I think if somebody was testing it when it was implemented.
> Don't get me wrong Andrew, I don't want to bark here as an old dog.
> Just wanted to point out that there is some issue here and wanted to
> ensure if I really need to workaround it (that is always a bit ugly).
> Now I know, I have to.
>
>> The problem is that the two writes aren't truly atomic and they're
>> happening just far apart enough[1] for the PE to complete its
>> calculation and for the crmd to execute it.
>> One day we're probably going to have to rewrite attrd to elect a
>> leader which gathers all the values and writes them in one go.
>
> Seems like attrd running on DC is trying to do something like that but
> it does not work properly.
>
>>
>> [1] Hard to say how long because, as I said, the times on both
>> machines aren't in sync.
>> Either that or heartbeat has gained the ability to send messages
>> backwards in time :)
>
> No, I tried it number of times with different intervals and delays. I
> could do the test again with synced time if you want.
>
>>
>
> Thanks,
> Tino
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>