[Pacemaker] 'stop' operation passes outdated set of instance attributes to RA
Andrew Beekhof
andrew at beekhof.net
Mon Feb 23 19:21:00 UTC 2015
> On 24 Feb 2015, at 5:53 am, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>
> 23.02.2015 05:20, Andrew Beekhof wrote:
>>
>>> On 14 Feb 2015, at 1:10 am, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>
>>> Hi,
>>>
>>> I believe it is a bug that the 'stop' operation uses the set of instance attributes from the original 'start' op, not the set that the successful 'reload' had.
>>> The corresponding pe-input has the correct set of attributes, and the pre-stop 'notify' op uses the updated set of attributes too.
>>> This is easily reproducible with 3.9.6 resource agents and trace_ra.
>>>
>>> pacemaker is c529898.
>>>
>>> Should I provide more information?
>>
>> Yes please.
>
> I am not sure what would be needed to reproduce and fix that.
> On the one hand, everything from crm_report (maybe except the digest hashes) will be OK. On the other, the variables are set to outdated values, and that is visible in the RA traces. Maybe it is enough just to try to reproduce with my latest patch to resource agents (included in 3.9.6)?
> The steps are:
> * create a clone resource (it is enough to set clone-max=1) with an RA that supports both reload and notify (it may be simpler to unconditionally set OCF_RESKEY_trace_ra=1 at the very beginning of the resource agent, before the OCF framework is imported, to get traces of all RA executions)
> * enable notifications (and trace_ra) for that resource
> * start the resource
> * change parameters for the resource; that should cause a reload
> * stop the resource
> * compare the printenv output at the very beginning of the start, reload, pre-stop notify and stop action traces.
>
> I think everything should be clear once that is done.
General rule of thumb: add one month to the turnaround if I need to set up a cluster to reproduce, compared to looking at logs/PE files.
That's not me being mean, I simply don't have the bandwidth. Yesterday I did nothing but answer emails and I barely scratched the surface.
So the easier it is for me to reply, the sooner it's going to happen.
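
The comparison in the last quoted step could be scripted so that only the differences need to be posted. What follows is a rough sketch, not a tested tool: it assumes the trace begins with plain NAME=value lines from printenv, the function names are made up for illustration, and the sample values are invented. It skips the OCF_RESKEY_CRM_meta_* variables, since those legitimately differ from one operation to the next.

```python
import re

# Matches "NAME=value" lines for OCF_RESKEY_* variables only.
_RESKEY = re.compile(r"^(OCF_RESKEY_\w+)=(.*)$")
# Meta attributes change per operation, so they are not compared.
_META_PREFIX = "OCF_RESKEY_CRM_meta_"


def reskeys(env_dump: str) -> dict:
    """Extract instance attributes from printenv-style text."""
    found = {}
    for line in env_dump.splitlines():
        match = _RESKEY.match(line.strip())
        if match and not match.group(1).startswith(_META_PREFIX):
            found[match.group(1)] = match.group(2)
    return found


def differing(reference: dict, other: dict) -> dict:
    """Map each differing name to (reference value, other value)."""
    names = sorted(set(reference) | set(other))
    return {
        name: (reference.get(name), other.get(name))
        for name in names
        if reference.get(name) != other.get(name)
    }


if __name__ == "__main__":
    # Invented dumps; in practice, paste the printenv part of each trace.
    reload_env = "OCF_RESKEY_port=8081\nOCF_RESKEY_trace_ra=1\nHOME=/root"
    stop_env = "OCF_RESKEY_port=8080\nOCF_RESKEY_trace_ra=1\nHOME=/root"
    diff = differing(reskeys(reload_env), reskeys(stop_env))
    for name, (new, old) in diff.items():
        print(f"{name}: reload saw {new!r}, stop saw {old!r}")
```

Running the same comparison for start vs. reload and for pre-stop notify vs. stop would show which operations still see the old values.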
>
> Best,
> Vladislav
>
>
>> I suspect the lrmd needs to update its parameter cache for the reload operation.
Did you try David's fix?
(See, I didn't even find time to hunt down the right place for a one-line change)
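
To spell out the suspicion quoted above, here is a toy model. It is not lrmd code; the class, its methods and the sample parameters are all hypothetical. It only assumes that an executor keeps the parameters from 'start' and hands them back at 'stop', and shows how skipping the cache update on 'reload' would produce the stale values described in this thread.

```python
class ParamCache:
    """Toy executor that remembers the parameters last used per resource."""

    def __init__(self, refresh_on_reload: bool):
        self.refresh_on_reload = refresh_on_reload
        self._params = {}  # resource id -> parameters used for later ops

    def start(self, rsc: str, params: dict) -> None:
        # The cache is populated when the resource is started.
        self._params[rsc] = dict(params)

    def reload(self, rsc: str, params: dict) -> None:
        # The suspected fix: refresh the cached copy on reload as well.
        if self.refresh_on_reload:
            self._params[rsc] = dict(params)

    def stop(self, rsc: str) -> dict:
        # Return what the agent would be given for 'stop'.
        return self._params.pop(rsc)


def stop_params(refresh_on_reload: bool) -> dict:
    """Run start -> reload -> stop and report what 'stop' receives."""
    cache = ParamCache(refresh_on_reload)
    cache.start("rsc1", {"port": "8080"})
    cache.reload("rsc1", {"port": "8081"})
    return cache.stop("rsc1")


if __name__ == "__main__":
    print("without refresh:", stop_params(False))
    print("with refresh:   ", stop_params(True))
```

Without the refresh, 'stop' in this model gets the parameters from 'start'; with it, 'stop' gets the ones from the last 'reload'.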
>>
>> David?
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>