[Pacemaker] Question about the behavior when a pacemaker's process crashed

Wed Jul 24 05:40:09 EDT 2013

(13.07.18 19:23), Andrew Beekhof wrote:
>
> On 17/07/2013, at 6:53 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>
>> (13.07.16 21:18), Andrew Beekhof wrote:
>>>
>>> On 16/07/2013, at 7:04 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>>>
>>>> (13.07.15 11:00), Andrew Beekhof wrote:
>>>>>
>>>>> On 12/07/2013, at 6:28 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using pacemaker-1.1.10.
>>>>>> When one of Pacemaker's processes crashes, the node is sometimes fenced and sometimes not.
>>>>>> Is this the expected behavior?
>>>>>
>>>>> Yes.
>>>>>
>>>>> Sometimes dev1 respawns the processes fast enough that dev2 gets the "hey, i'm back" notification before the PE gets run and fencing can be initiated.
>>>>> In such cases, there is nothing to be gained from fencing - dev1 is reachable and responding.
>>>>
>>>> OK... but I want Pacemaker to behave consistently (either always fence or never fence), because the inconsistency makes operation troublesome.
>>>> I think it would be better if the user could specify the behavior as an option.
>>>
>>> This makes no sense. Sorry.
>>> It is wrong to induce more downtime than absolutely necessary just to make a test pass.
>>
>> If the concern is increased downtime, isn't it better to prevent fencing in this case?
>
> With hindsight, yes.
> But we have no way of knowing at the time.
> If you want pacemaker to wait some time for it to come back, you can set crmd-transition-delay which will achieve the same thing it does for attrd.

I think crmd-transition-delay only partly meets my needs, because it is merely a delay.
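
For reference, a minimal sketch of how that option could be set with the crm shell. The 2s value is only an assumed example, not a value recommended anywhere in this thread, and the guard around the command is there so the snippet does nothing harmful on a machine without the crm shell:

```shell
# Assumed example value; choose a delay appropriate for your cluster.
DELAY="2s"

if command -v crm >/dev/null 2>&1; then
    # Set the cluster-wide property that delays the start of a transition.
    crm configure property crmd-transition-delay="$DELAY"
else
    echo "crm shell not found; would set crmd-transition-delay=$DELAY"
fi
```

This only postpones the policy engine run; it does not make the fence/no-fence outcome selectable.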

>
>> pacemakerd respawns a crashed child process, so the cluster will return to an online state.
>> If so, doesn't the subsequent fencing increase downtime?
>
> Yes, but only we know that because we have more knowledge than the cluster.

Is it because the stack is corosync?
In pacemaker-1.0 with heartbeat, the behavior when a child process crashes can be specified in ha.cf.
- when 'pacemaker respawn' is specified, the cluster recovers to online.
- when 'pacemaker on' is specified, the node reboots itself.
I want to set up and operate the cluster in an equivalent way (this is our established practice).
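
To illustrate what I mean, the relevant part of ha.cf looks roughly like this (a sketch showing only the directive in question; the rest of the file is omitted):

```
# ha.cf (heartbeat stack, pacemaker-1.0)

# Crashed child processes are respawned; the cluster recovers to online.
pacemaker respawn

# Alternative: a child process crash makes the node reboot itself.
# pacemaker on
```

Only one of the two lines would be active at a time.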

>
>>
>> Best regards.
>>
>>>
>>>>>
>>>>> It makes writing CTS tests hard, but it is not incorrect.
>>>>>
>>>>>>
>>>>>> procedure:
>>>>>> $ systemctl start pacemaker
>>>>>> $ crm configure load update test.cli
>>>>>> $ pkill -9 lrmd
>>>>>>
>>>>>> attachment:
>>>>>> STONITH.tar.bz2 : it's crm_report when fenced
>>>>>> notSTONITH.tar.bz2 : it's crm_report when not fenced
>>>>>>
>>>>>> Best regards.
>>>>>> <notSTONITH.tar.bz2><STONITH.tar.bz2>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org