[ClusterLabs] Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Thu Oct 6 16:03:23 UTC 2016

On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi All,
>
>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>  
>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd 
>> will reboot the node (unless the watchdog fails).
>
> Thank you for the comment.
>
> We are examining a watchdog for crmd, too.
> I will comment further once the examination has advanced.

I was thinking of doing a small test implementation, going
a little in the direction Lars Ellenberg had pointed out.

A couple of thoughts I have had so far:

- add an API to sbd (via DBus or libqb; favoring libqb at the moment)
  that an application can use to create a watchdog within sbd

- for a start, the parameters are a name and a timeout

- first use-case would be crmd observation

- later on we could think of removing pacemaker dependencies
  from sbd by moving the actual implementation of
  pacemaker-watcher and probably cluster-watcher as well
  into pacemaker - using the new API

- this of course creates an sbd dependency within pacemaker, so
  it would make sense to offer a simpler, self-contained
  implementation within pacemaker as an alternative

  thus it would be preferable to keep the dependency
  in an optional pacemaker rpm, so that we can offer
  an alternative that doesn't use sbd, maybe at the
  cost of being less reliable, or one that owns a
  hardware watchdog by itself on systems where the
  watchdog is still unused.

  - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
  - or via an additional daemon
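
To make the name/timeout idea a bit more concrete, here is a rough
sketch of the bookkeeping such an API would need. It is Python only
for brevity and is not sbd code; WatchdogRegistry and its methods are
made-up names, and the IPC transport (DBus or libqb) is left out
entirely. The clock is injectable so the behaviour can be checked
without sleeping.

```python
import time


class WatchdogRegistry:
    """Toy model of named software watchdogs kept inside a supervisor.

    Hypothetical illustration only; this is not an existing sbd interface.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._dogs = {}  # name -> (timeout_seconds, last_feed_timestamp)

    def register(self, name, timeout):
        """Create a watchdog; the client must feed it within `timeout` seconds."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._dogs[name] = (timeout, self._clock())

    def feed(self, name):
        """Record a sign of life from the client that owns `name`."""
        timeout, _ = self._dogs[name]
        self._dogs[name] = (timeout, self._clock())

    def expired(self):
        """Return the sorted names of watchdogs that were not fed in time."""
        now = self._clock()
        return sorted(
            name
            for name, (timeout, last) in self._dogs.items()
            if now - last > timeout
        )

    def healthy(self):
        """True while every registered watchdog has been fed in time.

        A supervisor would stop feeding the hardware watchdog once this
        turns False, so that the node gets rebooted.
        """
        return not self.expired()
```

A crmd that is frozen by SIGSTOP simply stops calling feed("crmd"),
so after its timeout the name shows up in expired() and the
supervisor can let the hardware watchdog fire.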

What did you have in mind?
Maybe it makes sense to synchronize...

Regards,
Klaus

>
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> ----- Original Message -----
>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>> Cc: 
>> Date: 2016/10/5, Wed 23:08
>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>>>>>  <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52 
>> in message
>> <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>  Hi All,
>>>
>>>  Was the final conclusion given about this problem?
>>>
>>>  If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd 
>> will reboot the node (unless the watchdog fails).
>>
>>>  We are interested in this problem, too.
>>>
>>>  Best Regards,
>>>
>>>  Hideo Yamauchi.
>>>
>>>
>>>  _______________________________________________
>>>  Users mailing list: Users at clusterlabs.org 
>>>  http://clusterlabs.org/mailman/listinfo/users 
>>>
>>>  Project Home: http://www.clusterlabs.org 
>>>  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>>  Bugs: http://bugs.clusterlabs.org 