[Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels
Andrew Beekhof
andrew at beekhof.net
Wed Jul 9 22:00:52 UTC 2014
On 9 Jul 2014, at 10:43 pm, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
> On Tue, Jul 8, 2014, at 06:06, Andrew Beekhof wrote:
>>
>> On 5 Jul 2014, at 1:00 am, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
>>
>>> From: andrew at beekhof.net
>>> Date: Fri, 4 Jul 2014 22:50:28 +1000
>>> To: pacemaker at oss.clusterlabs.org
>>> Subject: Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels
>>>
>>>
>>> On 4 Jul 2014, at 1:29 pm, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
>>>
>>>>>> Hi all,
>>>>>> while creating a cloned stonith resource
>>>>>
>>>>> Any particular reason you feel the need to clone it?
>>>>
>>>> In the end, I suppose it's only a "purist mindset" :) because it is a PDU whose power outlets control both nodes, so
>>>> its resource "should be" active (and monitored) on both nodes "independently".
>>>> I understand that it would work anyway if left uncloned and without location constraints,
>>>> just as regular, "dedicated" stonith devices do not need to be location-constrained, right?
>>>>
>>>>>> for multi-level STONITH on a fully-up-to-date CentOS 6.5 (pacemaker-1.1.10-14.el6_5.3.x86_64):
>>>>>>
>>>>>> pcs cluster cib stonith_cfg
>>>>>> pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
>>>>>>   ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
>>>>>>   pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
>>>>>>   pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" op monitor interval="240s"
>>>>>> pcs -f stonith_cfg resource clone pdu1 pdu1Clone
>>>>>> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1Clone
>>>>>> pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1Clone
>>>>>>
>>>>>>
>>>>>> the last two lines do not succeed unless I add the "--force" option, and even then I still get errors when running verify:
>>>>>>
>>>>>> [root at cluster1 ~]# pcs stonith level verify
>>>>>> Error: pdu1Clone is not a stonith id
>>>>>
>>>>> If you check, I think you'll find there is no such resource as 'pdu1Clone'.
>>>>> I don't believe pcs lets you decide what the clone name is.
>>>>
>>>> You're right! (obviously ;> )
>>>> It's been automatically named pdu1-clone
>>>>
>>>> I suppose that there's still too much crmsh in my memory :)
>>>>
>>>> Anyway, removing the stonith level (to start from scratch) and using the correct clone name does not change the result:
>>>>
>>>> [root at cluster1 etc]# pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1-clone
>>>> Error: pdu1-clone is not a stonith id (use --force to override)
>>>
>>> I bet we didn't think of that.
>>> What if you just do:
>>>
>>> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
>>>
>>> Does that work?
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Yes, no errors at all and verify successful.
>
> I initially took this as a simple sanity check, but on second read I think you were suggesting that I could clone as usual and then reference the primitive resource when configuring the level (something I usually avoid with regular clones), and it would automatically use the clone "at runtime", correct?
right. but also consider not cloning it at all :)
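i.e. something like the following (an untested sketch; the final cib-push just assumes you're still using the CIB-file workflow from your first mail and want to push stonith_cfg back once it verifies cleanly):

  pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
  pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1
  pcs cluster cib-push stonith_cfg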
>
>>> Remember that a full real test (to verify that the second level actually kicks in when the first level fails)
>>> is still pending for both the plain and cloned setups.
>>>
>>> Apropos: I read in the list archives that stonith resources (being resources, after all)
>>> could themselves cause fencing (!) if they fail (start, monitor, stop)
>>
>> stop just unsets a flag in stonithd.
>> start does perform a monitor op though, which could fail.
>>
>> but by default only stop failure would result in fencing.
>
> I thought that start-failure-is-fatal was true by default, but maybe not for stonith resources.
fatal in the sense of "won't attempt to run it there again", not the "fence the whole node" way
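you can check what the cluster is actually using with something like the following (the property only shows up in the CIB if it has been set explicitly; otherwise the built-in default of true applies):

  pcs property list --all | grep start-failure-is-fatal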
>
>>> and that an ad-hoc
>>> on-fail setting could be used to prevent that.
>>> Maybe my aforementioned naive testing procedure (pulling the iLO cable) could provoke that?
>>
>> _shouldn't_ do so
>>
>>> Would you suggest to configure such an on-fail option?
>>
>> again, shouldn't be necessary
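fwiw, on-fail is just a per-operation option, so if you ever did want it, it would hang off the op definition when creating (or updating) the device. an untested sketch, reusing your earlier device options in place of "...":

  pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
    ... \
    op monitor interval="240s" on-fail="restart"

(restart is already the default for monitor failures, which is why it shouldn't be necessary)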
>
> Thanks again.
>
> Regards,
> Giuseppe
>
>>> Many thanks again for your help (and all your valuable work, of course!).
>>>
>>> Regards,
>>> Giuseppe