[Pacemaker] Shooting and diagnosis of stonith plugins

Thu Oct 16 09:00:16 UTC 2008

Hi Dejan.

>> If 'ibmrsa-telnet' does the right thing, it means that any stonith
>> plugin that can't shoot a host machine after a power fault must not
>> be used alone. It must be used with some other plugin which checks
>> whether its target machine is running or not.
>
> This is an inherent problem of the lights-out devices such as IBM
> RSA or HP iLO, i.e. that they share power source with the node
> they manage. Power failure renders this kind of stonith device
> useless. Unfortunately, there's nothing one can do about it.

But something must be done.

In this case, what a plugin can do is one of the following:

   A) Check the target some other way.
   B) Retry forever.
   C) Return failure to the caller.

A is what 'ssh' does.
   And you said 'ssh' isn't meant for production.
   Does that mean no real stonith plugin may do A?

B is described at http://www.linux-ha.org/STONITH.
   It says:

     3. When given a RESET or OFF command it must not return
        control to its caller until the node is no longer running.

   Any plugin that follows B keeps running on an error until
   stonithd kills it.

C is what 'ibmrsa-telnet' does.
   Any plugin that follows C returns failure immediately on an error.
   But I don't know of any document which encourages C.

Which is the right choice for real stonith plugins?
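
To make the three options concrete, here is a minimal Python sketch of
how each one behaves when the stonith device cannot be reached. It is
only an illustration: every name in it (DeviceUnreachable, policy_a,
policy_b, policy_c, the callbacks) is hypothetical and not part of any
real stonith plugin API, and the deadline in policy_b merely stands in
for stonithd killing the plugin.

```python
import time
from typing import Callable


class DeviceUnreachable(Exception):
    """Hypothetical error: the stonith device itself cannot be contacted."""


def policy_a(reset: Callable[[], None], target_is_alive: Callable[[], bool]) -> bool:
    """A) Ignore the device result and decide from an independent check."""
    try:
        reset()
    except DeviceUnreachable:
        pass  # the failure is deliberately ignored, as described for 'ssh'
    return not target_is_alive()


def policy_b(reset: Callable[[], None], deadline_s: float,
             interval_s: float = 0.01) -> bool:
    """B) Keep retrying; the deadline models the caller killing the plugin."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            reset()
            return True
        except DeviceUnreachable:
            time.sleep(interval_s)
    return False  # never succeeded before the (assumed) caller timeout


def policy_c(reset: Callable[[], None]) -> bool:
    """C) Report failure to the caller as soon as the device fails."""
    try:
        reset()
        return True
    except DeviceUnreachable:
        return False


# Power-fault scenario: the device is unreachable and the host is silent.
def unreachable_reset() -> None:
    raise DeviceUnreachable("management board has no power")


def host_is_alive() -> bool:
    return False  # no ping reply, because the host lost power too


results = {
    "A": policy_a(unreachable_reset, host_is_alive),
    "B": policy_b(unreachable_reset, deadline_s=0.05),
    "C": policy_c(unreachable_reset),
}
print(results)
```

In this scenario only A reports success, so only A lets the suspended
resources resume on the other nodes. B and C both end in failure; they
differ only in how long the caller has to wait for it.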

Dejan Muhamedagic wrote:
> Hi Takenaka-san,
> 
> On Wed, Oct 15, 2008 at 02:09:17PM +0900, Takenaka Kazuhiro wrote:
>> Hi Dejan.
>>
>>> Hi Takenaka-san,
>>>
>>> On Fri, Oct 10, 2008 at 03:30:27PM +0900, Takenaka Kazuhiro wrote:
>>>> Hi all.
>>>>
>>>> So far as I know, every stonith plugin is expected to diagnose if
>>>> its target is fenced out from the other nodes before it returns
>>>> successful status on 'reset' or 'off'.
>>> It depends on the stonith device. Sometimes it is enough just to
>>> send the reset command and let the device deal with it. Sometimes
>>> it is necessary to check the current power state. However, it
>>> looks like this is not what you want to talk about.
>> You said "The point of a stonith operation is to ensure that a host
>> is down or rebooted." in the following thread.
>>
>> http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034323.html
>>
>> So I have thought any stonith plugin should make sure if its target
>> is down or rebooted before it returns. However, this isn't a main
>> issue just as you understood.
>>
>>>> However, I think this diagnosis is a somewhat excessive burden for an
>>>> individual plugin.
>>> Actually, the stonith plugins are not required to know the state
>> ... snip ...
>>>>   <primitive type="external/ssh" class="stonith" task="shoot" ...>
>>>>
>>>> I hope some kind of agreement will be made about this problem.
>> Please let me put aside your comments above for now.
>> I have a question about your comments below and I'd like
>> you to answer it first.
>>
>>> This new concept does make sense with the ssh plugin. However,
>>> all other plugins function in a significantly different way and I
>>> don't see how this can apply to them.
>>>
>>> Thanks,
>>>
>>> Dejan
>> Yes. 'ssh' is very different from 'ibmrsa-telnet'.
>>
>> 'ssh' shoots a target via a NIC.
>> 'ibmrsa-telnet' shoots a target via an RSA.
> 
> Actually, I'd rather leave ssh out of this discussion. It was
> never meant for production, just for testing.
> 
>> So, these devices necessarily lose their power when a power fault
>> occurs on their host machine.
>>
>> In this case, neither 'ssh' nor 'ibmrsa-telnet' can reach
>> their target devices. They get an explicit connection failure
>> in this situation.
>>
>> But what actually follows is very different.
>>
>> In the case where 'ssh' is used as a stonith plugin, it returns a
>> successful status and the suspended resources are resumed on the
>> other nodes.
>>
>> On the other hand, in the case where 'ibmrsa-telnet' is used,
>> it returns an error status and the suspended resources are not
>> resumed anywhere. (I think 'ibmrsa-telnet' isn't the only plugin
>> that works in this way. 'ibmrsa' and 'ipmi' should also work in
>> the same way.)
>>
>> 'ssh' and 'ibmrsa-telnet' measure the success or failure of
>> shooting a target in different ways, and that is what makes
>> these results differ.
>>
>> 'ssh' never checks whether it could deal with its target device.
>> Even if the attempt fails explicitly, 'ssh' ignores it.
>> Instead, 'ssh' always returns its status according to a subsequent
>> ping check.
>>
>> On the other hand, 'ibmrsa-telnet' returns its status according
>> to whether it could deal with the device. Whenever 'ibmrsa-telnet'
>> gets any explicit failure while doing so, it returns an error
>> status. 'ibmrsa-telnet' never checks the target's status in any way.
>>
>> Which is the correct implementation for a stonith plugin?
> 
> Both. Note that ssh relies on the network, hence using ping to
> verify the host status is fine. However, for a "real" stonith
> device such as RSA doing that would be wrong.
> 
>> In other words, when an explicit connection error occurs
>> during a stonith action, what should a stonith plugin do?
>>
>> I have believed 'ssh' does the right thing, because I have thought
>> a stonith plugin which fails a failover on a power fault
>> is out of the question.
> 
> If the stonith device cannot be reached then we don't know if the
> host is running or not. Hence we have to assume the worst case.
> 
>> If 'ibmrsa-telnet' does the right thing, it means that any stonith
>> plugin that can't shoot a host machine after a power fault must not
>> be used alone. It must be used with some other plugin which checks
>> whether its target machine is running or not.
> 
> This is an inherent problem of the lights-out devices such as IBM
> RSA or HP iLO, i.e. that they share power source with the node
> they manage. Power failure renders this kind of stonith device
> useless. Unfortunately, there's nothing one can do about it.
> 
> Thanks,
> 
> Dejan
> 
>> Dejan Muhamedagic wrote:
>>> Hi Takenaka-san,
>>>
>>> On Fri, Oct 10, 2008 at 03:30:27PM +0900, Takenaka Kazuhiro wrote:
>>>> Hi all.
>>>>
>>>> So far as I know, every stonith plugin is expected to diagnose if
>>>> its target is fenced out from the other nodes before it returns
>>>> successful status on 'reset' or 'off'.
>>> It depends on the stonith device. Sometimes it is enough just to
>>> send the reset command and let the device deal with it. Sometimes
>>> it is necessary to check the current power state. However, it
>>> looks like this is not what you want to talk about.
>>>
>>>> However, I think this diagnosis is a somewhat excessive burden for an
>>>> individual plugin.
>>> Actually, the stonith plugins are not required to know the state
>>> of the host. They just make sure that the host is in a certain
>>> state or that it is reset. This normally doesn't involve the host
>>> itself, just the device which can manage it. Put in other words:
>>> If you pull the power plug or press the reset button there's no
>>> need to try ping or ssh or whatever else to verify that the host
>>> really went down.
>>>
>>>> Because authors of plugins know how to deal with the stonith devices
>>>> for which they write plugins, but they can't always anticipate the
>>>> structure of the clusters on which their plugins will work.
>>>>
>>>> When a cluster administrator tries to use some plugin but the diagnosis
>>>> of the plugin doesn't match the cluster, the administrator can't help
>>>> but alter the plugin directly.
>>>>
>>>> This reduces the plugins' adaptability and is not desirable.
>>>> One idea to avoid this problem is to establish schemes or conventions
>>>> which enable plugins to delegate the diagnosis to other plugins.
>>>>
>>>> The two attached plugins are a sample of this idea. They work
>>>> cooperatively through the attached cib.xml.
>>> It is an interesting idea. It seems like it would require that
>>> all existing stonith plugins return false so that the next, the
>>> "test status" plugin can report the state of the host.
>>>
>>>> 'sshAltered' only shoots its targets and 'pingAllAddr' only diagnoses
>>>> activity of its targets.
>>>>
>>>> The following is a slightly more detailed explanation:
>>>>
>>>>   When some accident makes it necessary for one node to shoot
>>>>   a corrupted node, the shooter node first uses 'sshAltered' to
>>>>   shoot the target node.
>>>>
>>>>   'sshAltered' shoots its targets but never exits with a successful
>>>>   status if the value of the attribute 'shoot_only' is "yes", as it
>>>>   is in the attached cib.xml. So the next plugin will always be
>>>>   used if one is defined.
>>>>
>>>>   'pingAllAddr' confirms activity of the IP addresses of its targets
>>>>   specified in cib.xml. If any of the IP addresses don't respond,
>>>>   'pingAllAddr' exits with a successful status, otherwise it
>>>>   exits with an error status.
>>>>
>>>> Once 'external/ssh' has been rewritten into 'sshAltered', there
>>>> is no need to rewrite it again to use other conditions to
>>>> confirm the targets' death.
>>>>
>>>> For example, if a cluster uses iSCSI shared storage and
>>>> a failover action on this cluster must wait for the iSCSI target
>>>> devices to sweep connections to the corrupted node, this can be done
>>>> by another type of plugin instead of 'pingAllAddr'. Its task is to
>>>> ask the iSCSI target devices whether connection sweeping has completed.
>>>>
>>>> The reverse is also true. Any plugin which follows the convention
>>>> explained above can work together with 'pingAllAddr'.
>>>>
>>>> It could also be made available through another tag attribute like this:
>>>>
>>>>   <primitive type="external/ssh" class="stonith" task="shoot" ...>
>>>>
>>>> I hope some kind of agreement will be made about this problem.
>>> This new concept does make sense with the ssh plugin. However,
>>> all other plugins function in a significantly different way and I
>>> don't see how this can apply to them.
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>>>
>>>> Best regards.
>>>> -- 
>>>> Takenaka Kazuhiro <takenaka.kazuhiro at oss.ntt.co.jp>
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at clusterlabs.org
>> http://list.clusterlabs.org/mailman/listinfo/pacemaker

-- 
竹中 一博
Takenaka Kazuhiro <takenaka.kazuhiro at oss.ntt.co.jp>
NTT OSS Center, Technology Unit
TEL 03-5860-5135