[ClusterLabs] Antw: stonithd: stonith_choose_peer: Couldn't find anyone to fence <node> with <any>

Thu Aug 13 22:18:38 UTC 2015

> On 13 Aug 2015, at 11:36 pm, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
>>>> Kostiantyn Ponomarenko <konstantin.ponomarenko at gmail.com> wrote on 13.08.2015
> at 13:39 in message
> <CAEnTH0fxLzwZw4jmoYK_GO0W9O6e2Gdd-ZfdfOhZRAhwCGV3bg at mail.gmail.com>:
>> Hi,
>> 
>> Brief description of the STONITH problem:
>> 
>> I see two different behaviors with two different STONITH configurations. If
>> Pacemaker cannot find a device that can STONITH a problematic node, the
>> node remains up and running, which is bad because it must be STONITHed.
> 
> Correct observation. I wonder whether cloning a STONITH resource would help;

no

> for a symmetric STONITH like SBD, any node can fence any other node at the same time. Still, Pacemaker waits until the STONITH resource (which is something different from SBD) is confirmed running on one node (hard to achieve if the node with the STONITH resource in a two-node cluster went down unexpectedly).
> 
>> In contrast, if Pacemaker finds a device that it thinks can STONITH
>> a problematic node, even if the device actually cannot, Pacemaker goes down
>> after STONITH returns a false positive. Pacemaker shuts itself down right
>> after STONITH.
>> Is it the expected behavior?
> 
> I'd be surprised if it were.
> 
>> Do I need to configure two more STONITH agents just for rebooting the nodes
>> on which they are running (e.g. with # reboot -f)?
> 
> Good question ;-)
> 
> [...]