[Pacemaker] Question about the behavior when a pacemaker's process crashed

Thu Jul 25 02:00:45 UTC 2013

On 24/07/2013, at 7:40 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:

> (13.07.18 19:23), Andrew Beekhof wrote:
>> 
>> On 17/07/2013, at 6:53 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>> 
>>> (13.07.16 21:18), Andrew Beekhof wrote:
>>>> 
>>>> On 16/07/2013, at 7:04 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>>>> 
>>>>> (13.07.15 11:00), Andrew Beekhof wrote:
>>>>>> 
>>>>>> On 12/07/2013, at 6:28 PM, Kazunori INOUE <inouekazu at intellilink.co.jp> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm using pacemaker-1.1.10.
>>>>>>> When a pacemaker process crashes, the node is sometimes fenced and sometimes not.
>>>>>>> Is this the expected behavior?
>>>>>> 
>>>>>> Yes.
>>>>>> 
>>>>>> Sometimes dev1 respawns the processes fast enough that dev2 gets the "hey, I'm back" notification before the PE gets run and fencing can be initiated.
>>>>>> In such cases, there is nothing to be gained from fencing - dev1 is reachable and responding.
>>>>> 
>>>>> OK... but I want pacemaker to behave consistently (either always fence or never fence), since the inconsistency makes operation troublesome.
>>>>> I think it would be better if the user could specify the behavior as an option.
>>>> 
>>>> This makes no sense. Sorry.
>>>> It is wrong to induce more downtime than absolutely necessary just to make a test pass.
>>> 
>>> If the concern is the increase in downtime, isn't it better to prevent fencing in this case?
>> 
>> With hindsight, yes.
>> But we have no way of knowing at the time.
>> If you want pacemaker to wait some time for it to come back, you can set crmd-transition-delay which will achieve the same thing it does for attrd.
> 
> I think crmd-transition-delay only partly meets my needs, because it is just a delay.

The only alternative to a delay, either by crmd-transition-delay or some other means, is that the crmd predicts the future.

> 
>> 
>>> Because pacemakerd respawns the crashed child process, the cluster will return to an online state.
>>> If so, doesn't the subsequent fencing increase downtime?
>> 
>> Yes, but we only know that because we have more knowledge than the cluster.
> 
> Is it because the stack is corosync?

No.

> In pacemaker-1.0 with heartbeat, the behavior when a child process crashes can be specified in ha.cf.
> - when 'pacemaker respawn' is specified, the cluster will recover to online.

The node may still end up being fenced even with "pacemaker respawn".

If the node does not recover fast enough, relative to the "some process died" notification, then the node will get fenced.
If the "hey the process is back again" notification gets held up due to network congestion, then the node will get fenced.
Like most things in clustering, timing is hugely significant: consider a resource that fails just before vs. just after a monitor action is run.
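
To make the race concrete, here is a toy model in Python. It is not Pacemaker code: the field names, the fencing rule, and all of the numbers are made up for illustration, and the `transition_delay` field only assumes that crmd-transition-delay acts as extra waiting time before the PE runs.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Hypothetical timings, in seconds after the process died."""
    died_notice: float              # peer receives "some process died"
    back_notice: float              # peer receives "the process is back again"
    pe_latency: float               # time from died_notice until the PE would run
    transition_delay: float = 0.0   # assumed extra wait (crmd-transition-delay)


def is_fenced(t: Timing) -> bool:
    """Toy rule: fence if the PE runs before the peer hears the process is back."""
    pe_runs_at = t.died_notice + t.pe_latency + t.transition_delay
    return t.back_notice > pe_runs_at


# Same crash, four different timings (all values are illustrative).
scenarios = {
    "fast respawn": Timing(died_notice=0.1, back_notice=0.3, pe_latency=0.5),
    "slow respawn": Timing(died_notice=0.1, back_notice=0.9, pe_latency=0.5),
    "congested network": Timing(died_notice=0.1, back_notice=1.3, pe_latency=0.5),
    "slow respawn + delay": Timing(died_notice=0.1, back_notice=0.9,
                                   pe_latency=0.5, transition_delay=2.0),
}

for name, timing in scenarios.items():
    outcome = "fenced" if is_fenced(timing) else "not fenced"
    print(f"{name}: {outcome}")
```

In this model the only thing that differs between the fenced and non-fenced cases is when the "back" notification arrives relative to the PE run, which is why a delay is the only knob available.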

Now it could be that heartbeat is consistently slow sending out the "some process died" notification (I recall it does not send them at all sometimes), but that would be a bug, not a feature.

> - when 'pacemaker on' is specified, the node will reboot by oneself.

"by oneself"?  Not because the other side fences it?

> I want to set up and operate the cluster in an equivalent way (it is our established practice).
> 
>> 
>>> 
>>> Best regards.
>>> 
>>>> 
>>>>>> 
>>>>>> It makes writing CTS tests hard, but it is not incorrect.
>>>>>> 
>>>>>>> 
>>>>>>> procedure:
>>>>>>> $ systemctl start pacemaker
>>>>>>> $ crm configure load update test.cli
>>>>>>> $ pkill -9 lrmd
>>>>>>> 
>>>>>>> attachment:
>>>>>>> STONITH.tar.bz2 : it's crm_report when fenced
>>>>>>> notSTONITH.tar.bz2 : it's crm_report when not fenced
>>>>>>> 
>>>>>>> Best regards.
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>> 
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org