[Pacemaker] Proposed new stonith topology syntax
Vladislav Bogdanov
bubble at hoster-ok.com
Tue Feb 7 06:51:43 UTC 2012
07.02.2012 00:22, Andrew Beekhof wrote:
> Stonith is never a SPOF.
Sorry for being unclear.
I meant that having a redundant PSU connected to two outlets of the same
PDU (which is in turn connected to a single power source) is a SPOF for a
node, not for the cluster.
So I assume that everybody connects each redundant PSU to two different
PDUs and to two different power sources.
Then it is impossible to do a reset-like operation (all 'off's, then all
'on's) on two power outlets from within a single fencing agent instance,
which knows about only one PDU. So that logic should be moved one layer
up.
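
Just to illustrate what I mean by "one layer up" (a rough sketch only:
the node and device names are made up, and the exact element and
attribute names depend on what finally lands in stonith-ng), both PDU
devices would be registered as a single fencing level for the node, so
that stonith-ng, rather than either agent, is responsible for sequencing
the outlets:

  <fencing-topology>
    <!-- both pdu-a and pdu-b must succeed for this level to succeed -->
    <fencing-level id="fl-node1-1" target="node1" index="1"
                   devices="pdu-a,pdu-b"/>
  </fencing-topology>

In crmsh terms something like 'fencing_topology node1: pdu-a,pdu-b' could
express the same thing, assuming a comma keeps both devices in one level.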
>
> Something else needs to have failed before fencing has even a chance to do so.
>
> Unless you put all the nodes on the same PDU... but that would be silly.
>
> On Mon, Feb 6, 2012 at 3:29 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>> 06.02.2012 01:55, Andrew Beekhof wrote:
>>> On Sat, Feb 4, 2012 at 5:50 AM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>> Hi Andrew, Dejan, all,
>>>>
>>>> 25.01.2012 03:24, Andrew Beekhof wrote:
>>>> [snip]
>>>>>>> If they're for the same host but different devices, then at most
>>>>>>> you'll get the commands sent in parallel; guaranteeing they are
>>>>>>> simultaneous is near impossible.
>>>>>>
>>>>>> Yes, what I meant is almost simultaneous, i.e. that both ports
>>>>>> are turned "off" at the same time for a while. I'm not sure how
>>>>>> it works in reality. For instance, how long does the reset
>>>>>> command keep the power off on the outlet? So, it should be
>>>>>> "simultaneous enough" :)
>>>>>
>>>>> I don't think 'reboot' is an option if you're using multiple devices.
>>>>> You have to use 'off' (followed by a manual 'on') for any kind of reliability.
>>>>>
>>>>
>>>> Why not implement the subsequent 'on's after all 'off's are confirmed?
>>>
>>> That could be possible in the future.
>>> However, since none of this was possible in the old stonithd, it's not
>>> something I plan for the initial implementation.
>>>
>>> Also, you're requiring an extra level of intelligence in stonith-ng:
>>> to know that even though the admin asked for 'reboot' and the devices
>>> support 'reboot', we should ignore that and do 'off' + 'on' in
>>> some specific scenarios.
>>>
I just think that extra level of intelligence is exactly what is needed here.
>>>> With some configurable delay, for example.
>>>> That would be great for careful admins who keep their fencing device
>>>> lists up to date.
>>>> From the admin's PoV, a reset and a reset-like off-on sequence should
>>>> not differ in their result: the offending host should be restarted if
>>>> the admin specifies 'restart' or 'reboot' in the fencing parameters
>>>> for that host (sorry, I do not remember which one is used).
>>>> Needing a manual 'on' looks like a limitation to me, so I wouldn't use
>>>> such a fencing mechanism. I prefer to have everything as automated and
>>>> predictable as possible.
>>>
>>> Then don't put a node under the control of two devices.
>>> Have it be two ports on the same host and you won't hit this limitation.
>>
>> It's a SPOF in the case of PDUs.
>>
>> I do not use PDUs at all; I have everything ready to short the 'reset'
>> lines on the servers instead of pulling power cords, and am just waiting
>> for linear fencing topology to be implemented in both stonith-ng and crmsh.
>>
>> So, I just care about the generic admin who wants to use PDUs for fencing.
>>
>>>
>>>> If the 'on' is not done, then fencing is not doing what you've
>>>> specified (for the 'reboot'/'reset' action).
>>>>
>>>> Even more, if we need to do a 'reset' of a host which has two PSUs
>>>> connected to two different PDUs, then it should be translated to
>>>> 'all-off' - 'delay' - 'all-on' automatically. I would like such a
>>>> powerful fencing system very much (yes, I'm a careful admin).
>>>>
>>>> I understand that the implementation will require some effort (even
>>>> for as great a programmer as you, Andrew), but it would be a really
>>>> useful feature.
>>>>
>>>> Best,
>>>> Vladislav
>>>>