[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
renayama19661014 at ybb.ne.jp
Tue Oct 11 08:58:26 UTC 2016
Hi Klaus,
Thank you for your comment.
I am preparing a prototype patch that uses the WD service.
Please wait a little.
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: Klaus Wenninger <kwenning at redhat.com>
> To: users at clusterlabs.org
> Cc:
> Date: 2016/10/10, Mon 21:03
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>
> On 10/07/2016 11:10 PM, renayama19661014 at ybb.ne.jp wrote:
>> Hi All,
>>
>> Our users may not necessarily use sbd.
>>
>> I confirmed that the WD service of corosync offers one way to do this without sbd.
>> Pacemaker can have its own processes watched by the WD service through CMAP
>> and so make use of the watchdog.
>
> Have to have a look at that...
> But if we establish some in-between-layer in pacemaker we could have this
> as one of the possibilities besides e.g. sbd (with enhanced API), going for
> a watchdog-device directly, ...
>
>>
>>
>> We can prepare a patch for pacemaker.
>
> Always helpful to discuss/clarify an idea once some code is available ...
>
>> Has the discussion about using the WD service already ended?
>
> Not from my pov. Just a day off ;-)
>
>>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> ----- Original Message -----
>>> From: Klaus Wenninger <kwenning at redhat.com>
>>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
>>> Cc:
>>> Date: 2016/10/7, Fri 17:47
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>
>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
>>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>>> On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>> Hi All,
>>>>>>
>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>>
>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>> Thank you for your comment.
>>>>>>
>>>>>> We are examining a watchdog for crmd, too.
>>>>>> I will comment further once the examination has advanced.
>>>>> Was thinking of doing a small test implementation going
>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>
>>>>> a couple of thoughts I had so far:
>>>>>
>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>> an application can use to create a watchdog within sbd
>>>> Why has it to be done within sbd?
>>> Not necessarily, could be spawned out as well into an own project or
>>> something already existent could be taken.
>>> I remember having added a dbus-interface to
>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>> If you have a suggestion I'm open.
>>> Going off sbd would have the advantage of a smooth start:
>>>
>>> - cluster/pacemaker-watcher are there already and can
>>> be replaced/moved over time
>>> - the lifecycle of the daemon (when started/stopped) is
>>> already something that is in the code and in the people's minds
>>>
>>>>> - parameters for the first are a name and a timeout
>>>>>
>>>>> - first use-case would be crmd observation
>>>>>
>>>>> - later on we could think of removing pacemaker dependencies
>>>>> from sbd by moving the actual implementation of
>>>>> pacemaker-watcher and probably cluster-watcher as well
>>>>> into pacemaker - using the new API
>>>>>
>>>>> - this of course creates sbd dependency within pacemaker so
>>>>> that it would make sense to offer a simpler and self-contained
>>>>> implementation within pacemaker as an alternative
>>>> I think the watchdog interface is so simple that you don't need a relay
>>>> for it. The only limit I can imagine is the number of watchdogs available on
>>>> some specific hardware.
>>> That is the point ;-)
>>>>> thus it would be favorable to have the dependency
>>>>> within a non-compulsory pacemaker-rpm so that
>>>>> we can offer an alternative that doesn't use sbd
>>>>> at maybe the cost of being less reliable or one
>>>>> that owns a hardware-watchdog by itself for systems
>>>>> where this is still unused.
>>>>>
>>>>> - e.g. via some kind of plugin (Andrew forgive me -
>>>>> no pils ;-) )
>>>>> - or via an additional daemon
>>>>>
>>>>> What did you have in mind?
>>>>> Maybe it makes sense to synchronize...
>>>>>
>>>>> Regards,
>>>>> Klaus
>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>>>>> Cc:
>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52 in message
>>>>>>> <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Has a final conclusion been reached about this problem?
>>>>>>>>
>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>> will reboot the node (unless the watchdog fails).
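
The idea of a process feeding a watchdog can be illustrated with the Linux watchdog character device, where any write to /dev/watchdog restarts the timer. The following is a minimal sketch under assumptions, not pacemaker code: the helper name feed_watchdog and the is_alive health check are hypothetical, and the device path and interval are placeholders. If the feeding process is frozen (for example by SIGSTOP), the writes stop and the hardware is expected to reset the node once the timeout expires.

```python
import time


def feed_watchdog(is_alive, device="/dev/watchdog", interval=1.0, sleep=time.sleep):
    """Pet the watchdog device for as long as is_alive() returns True.

    Returns the number of keepalives written. All names here are illustrative.
    """
    feeds = 0
    # Unbuffered binary mode so every write reaches the driver immediately.
    with open(device, "wb", buffering=0) as wd:
        while is_alive():
            wd.write(b"\0")  # any written byte counts as a keepalive
            feeds += 1
            sleep(interval)
        # Deliberately no b"V" (magic close) before closing: on drivers that
        # support magic close the timer keeps running, so the node is reset
        # after the timeout. Behaviour on other drivers may differ.
    return feeds
```

Opening the real device usually requires root and arms the timer, so such a loop should only be tried on a test machine.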
>>>>>>>
>>>>>>>> We are interested in this problem, too.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>
>>>>
>>>
>>>
>
>
>
>