[Pacemaker] About behavior in "Action Lost".

Wed Sep 29 13:53:46 UTC 2010

Sorry, it probably got rebased before I pushed it.

http://hg.clusterlabs.org/pacemaker/1.1/rev/dd8e37df3e96 should be the
right link

On Wed, Sep 29, 2010 at 2:51 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> Hi Andrew,
>
>> Pushed as:
>>    http://hg.clusterlabs.org/pacemaker/1.1/rev/8433015faf18
>>
>> Not sure about applying to 1.0 though, it's a dramatic change in behavior.
>
> I cannot find the change at this link.
> Where did you push it?
>
> Best Regards,
> Hideo Yamauchi.
>
> --- Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> Pushed as:
>>    http://hg.clusterlabs.org/pacemaker/1.1/rev/8433015faf18
>>
>> Not sure about applying to 1.0 though, it's a dramatic change in behavior.
>>
>> On Wed, Sep 22, 2010 at 11:18 AM,  <renayama19661014 at ybb.ne.jp> wrote:
>> > Hi Andrew,
>> >
>> > Thank you for your comment.
>> >
>> >> A long time ago in a galaxy far away, some messaging layers used to
>> >> lose quite a few actions, including stops.
>> >> About the same time, we decided that fencing because a stop action was
>> >> lost wasn't a good idea.
>> >>
>> >> The rationale was that if the operation eventually completed, it would
>> >> end up in the CIB anyway.
>> >> And even if it didn't, the PE would continue to try the operation
>> >> again until the whole node fell over at which point it would get shot
>> >> anyway.
>> >
>> > Sorry...
>> > I did not know that there had been such a discussion in the past.
>> >
>> >
>> >> Now, having said that, things have improved since then and perhaps,
>> >> in the interest of speeding up recovery in these situations, it is time
>> >> to stop treating stop operations differently.
>> >> Would you agree?
>> >
>> > Does that mean you will change it so that an "Action Lost" on a stop
>> > operation now leads to stonith?
>> > If my understanding is right, I agree too.
>> >
>> > if(timer->action->type != action_type_rsc) {
>> >     send_update = FALSE;
>> > } else if(safe_str_eq(task, "cancel")) {
>> >     /* we don't need to update the CIB with these */
>> >     send_update = FALSE;
>> > }
>> > ---> delete the "else if(safe_str_eq(task, "stop")){..}" branch here?
>> >
>> > if(send_update) {
>> >     /* cib_action_update(timer->action, LRM_OP_PENDING, EXECRA_STATUS_UNKNOWN); */
>> >     cib_action_update(timer->action, LRM_OP_TIMEOUT, EXECRA_UNKNOWN_ERROR);
>> > }
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> > --- Andrew Beekhof <andrew at beekhof.net> wrote:
>> >
>> >> On Tue, Sep 21, 2010 at 8:59 AM,  <renayama19661014 at ybb.ne.jp> wrote:
>> >> > Hi,
>> >> >
>> >> > We checked how Pacemaker's monitor behaves while a node is under very high load.
>> >> > After a monitor error occurred, the stop operation that followed ended in "Action Lost".
>> >> >
>> >> > Sep  8 20:02:22 cgl54 crmd: [3507]: ERROR: print_elem: Aborting transition, action lost: [Action 9]: In-flight (id: prmApPostgreSQLDB1_stop_0, loc: cgl49, priority: 0)
>> >> > Sep  8 20:02:22 cgl54 crmd: [3507]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
>> >> >
>> >> >
>> >> > We think the stop operation did not complete because of the load on the node.
>> >> > But can't the node be fenced with stonith in this case?
>> >>
>> >> A long time ago in a galaxy far away, some messaging layers used to
>> >> lose quite a few actions, including stops.
>> >> About the same time, we decided that fencing because a stop action was
>> >> lost wasn't a good idea.
>> >>
>> >> The rationale was that if the operation eventually completed, it would
>> >> end up in the CIB anyway.
>> >> And even if it didn't, the PE would continue to try the operation
>> >> again until the whole node fell over at which point it would get shot
>> >> anyway.
>> >>
>> >> Now, having said that, things have improved since then and perhaps,
>> >> in the interest of speeding up recovery in these situations, it is time
>> >> to stop treating stop operations differently.
>> >> Would you agree?
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> >>
>